Most startup infrastructure advice is written for companies you are not. It comes from talks by engineers serving thousands of requests per second, or from whichever framework is being hyped on X this month. Neither tells you what to build when you need to ship, test in production, and survive until Series B.
This is the stack I’d pick. Slight tilt towards AI features because most products being built now have an LLM call, an eval, or an embedding somewhere in the workflow. Credit to Arjay the Dev, whose system design tier list got me thinking about how I’d order this for a builder rather than a learner.
None of this is a new argument. It sits in the tradition of Dan McKinley’s ‘Choose Boring Technology’ and LeadDev’s ‘Boring Stack’ piece. What’s new here is the 2026 cut: fitting AI features into a stack that still has to be debuggable on a Sunday.
The non-negotiable foundation: REST and Postgres
REST APIs and a Postgres database. The two most boring picks on the list and the two I’d defend the hardest.
REST. Don’t reach for GraphQL unless you have a specific reason: mobile clients pulling deeply nested data, or a public API where consumers genuinely need flexible querying. Most startups are neither. JSON over HTTP with predictable endpoints gets you to Series A without breaking a sweat.
Postgres. My rule: Postgres everything until something proves you need otherwise. Transactions, JSON, full-text search, pgvector for embeddings, queues if you want them. The number of AI startups I’ve seen reach for Pinecone when they had 10,000 vectors is depressing. pgvector handles that, and into the millions if you index and query properly.
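Here is what pgvector is doing under the hood. Its `<=>` operator returns cosine distance (1 minus cosine similarity), so `ORDER BY embedding <=> $1 LIMIT k` is a nearest-neighbour query. The pure-Python sketch below computes the same ordering over a toy in-memory list. The `docs` rows and the `nearest` helper are made up for illustration. Without an index, Postgres also does an exact scan. An HNSW or IVFFlat index is what keeps it fast into the millions, at the cost of approximate results.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """The quantity pgvector's `<=>` operator returns: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query: list[float], rows, k: int = 2) -> list[str]:
    """Exact k-NN, the in-memory equivalent of (hypothetical `docs` table):
    SELECT id FROM docs ORDER BY embedding <=> %(query)s LIMIT k;
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]

# Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
docs = [
    ("refund-policy", [0.9, 0.1, 0.0]),
    ("shipping-times", [0.1, 0.9, 0.0]),
    ("returns-how-to", [0.8, 0.2, 0.1]),
]
print(nearest([1.0, 0.0, 0.0], docs))
```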
The upgrade path when it eventually matters: read replicas, then PgBouncer, then partitioning. Sharding is a Series C problem, not Series A.
Shipping speed: containers and CI/CD
These two belong together. The fastest startups I’ve worked with have a Dockerfile, a CI pipeline that runs tests on every push, and a deploy that runs on merge to main. No tickets, no approvals, no Friday deploy freeze.
Docker for everything. Local dev, CI, prod. The “containers are slower than running natively” argument is mostly noise at startup scale. The consistency is worth the 200ms.
GitHub Actions for CI/CD. You don’t need Jenkins or a platform team. A Dockerfile, a workflow file, a deploy target, and you’re done. The first version can be ugly. Iterate when it breaks.
Concrete example. I built a Polymarket market-making bot recently and the only reason I could validate the strategy fast was the CI/CD setup. Push, deploy, watch the next round of trades, tweak, push again. Without that, the bot would have died half-finished on my laptop.
For an AI startup, your CI pipeline should be running your evals. Skip that and you’ll ship a regression and find out from a customer. Evals are the new tests.
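Below is a minimal sketch of an eval gate that a CI step could run. It assumes a hand-written golden set and the crudest possible evaluator, a substring check. `CASES`, `call_model`, and the 0.9 threshold are all hypothetical. In a real pipeline, `call_model` would go through your gateway or replay recorded responses, and the evaluators would be richer.

```python
# Hypothetical golden cases: a prompt plus a substring the answer must contain.
CASES = [
    {"prompt": "What is your refund window?", "must_contain": "30 days"},
    {"prompt": "Do you ship to Canada?", "must_contain": "Canada"},
]

def call_model(prompt: str) -> str:
    """Placeholder for the real model call (via your gateway, or a recorded fixture in CI)."""
    canned = {
        "What is your refund window?": "Refunds are accepted within 30 days.",
        "Do you ship to Canada?": "Yes, we ship to Canada and the US.",
    }
    return canned.get(prompt, "")

def pass_rate(cases) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = sum(1 for c in cases if c["must_contain"] in call_model(c["prompt"]))
    return passed / len(cases)

def gate(cases, threshold: float = 0.9) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(cases)
    print(f"eval pass rate: {rate:.0%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1
```

In the CI job, end the script with `sys.exit(gate(CASES))`. A failing pass rate then returns a non-zero exit code, which fails the build and blocks the deploy.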
Later, this becomes staging environments, canary deploys, and feature flags. Feature flags matter earlier for AI startups because the pace of model and prompt iteration means you want to ship dark, test on a slice, and roll back fast. PostHog or a flag table in Postgres works; LaunchDarkly is overkill early.
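The part of “test on a slice” that’s easy to get wrong is keeping each user consistently in or out of the slice. The sketch below assumes a hypothetical `feature_flags` table that holds a `rollout_percent` per flag. Users are bucketed by a stable hash of the flag name and user ID. Hosted flag tools do something along these lines for percentage rollouts.

```python
import hashlib

def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100

def in_rollout(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the flag's rollout slice.

    rollout_percent would come from a hypothetical table, e.g.
    SELECT rollout_percent FROM feature_flags WHERE name = %s;
    """
    return bucket(flag_name, user_id) < rollout_percent

# Ship dark at 0%, then widen the slice; rollback is setting it to 0 again.
users = [f"user-{i}" for i in range(1000)]
enabled = sum(in_rollout("new-prompt-v2", u, 10) for u in users)
print(f"{enabled} of {len(users)} users see new-prompt-v2")
```

The bucket depends only on the flag and the user. Widening the rollout from 10% to 20% therefore keeps everyone who already had the feature.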
Don’t roll your own auth
My pick: Clerk by default. WorkOS if B2B SSO is core to your customer profile. Don’t roll your own.
The amount of founder time I’ve seen burned on custom JWT and password reset flows is absurd. Every one of those flows has a security failure mode. The providers have spent millions on these. You haven’t. Pay the bill.
The trap is “but we have weird auth requirements.” Usually you don’t. Standard email/password or SSO with maybe a custom claim or two, and every provider handles it. Custom auth only makes sense for multi-tenant federation, weird compliance constraints, or if you’re building auth itself.
Monitoring (and evals) before you need them
Instrument before launch, not after. That’s the whole rule.
The minimum: Sentry for errors, structured logs, and one LLM observability tool if you’re shipping AI features. APM (Datadog, New Relic, OpenTelemetry → Grafana) is nice to have but not on the critical path day one.
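“Structured logs” here means one machine-parseable record per event, usually a JSON line. You can then filter by field instead of grepping prose. Below is a stdlib-only Python sketch. The context fields (`tenant_id`, `feature`, `latency_ms`) are hypothetical examples of what you’d want on every AI-related log line.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Fields passed via extra={"context": {...}} become an attribute on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info(
    "llm call finished",
    extra={"context": {"tenant_id": "t_123", "feature": "summarise", "latency_ms": 840}},
)
```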
Concrete example. I built an RFID linen tracking system for SPL where Grafana was doing the heavy lifting from day one. The dashboards caught bad reads and offline scanners before the ops team did. The system felt more reliable than it was, because we saw issues first.
For AI features, three things matter: LLM call logging, cost tracking per feature or per tenant, and eval scores sampled from production traffic. You don’t have to build any of that yourself in 2026. My shortlist: Helicone (fastest setup, change one base URL), Langfuse (open-source MIT, the full-featured free option), Braintrust (commercial, used by Stripe, Notion, Vercel, Perplexity). Pick one and wire it in the day you ship the AI feature, not later.
The eval pipeline is the one founders skip. Sample production traffic, run it through evaluators (LLM-as-judge, regex, human review), store the results, alert on regressions. Langfuse and Braintrust handle this as a core feature, so you usually don’t need a separate evals stack. Without it, you don’t know if your last prompt change made things better or worse.
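Langfuse and Braintrust run this loop for you. The sketch below only makes its shape visible: sample a fraction of production outputs, score them, and alert when the mean drops below a baseline. Everything named here is hypothetical, including the regex rule (it flags leaked internal ticket IDs), the sample rate, and the baseline. An LLM-as-judge evaluator would return a score in the same way.

```python
import random
import re

# Hypothetical cheap evaluator: answers must not leak internal ticket IDs like INT-20931.
LEAK_PATTERN = re.compile(r"\bINT-\d{4,}\b")

def regex_eval(output: str) -> float:
    """1.0 if the output passes, 0.0 if it matches the forbidden pattern."""
    return 0.0 if LEAK_PATTERN.search(output) else 1.0

def sample(traces, sample_rate: float, rng: random.Random):
    """Keep roughly `sample_rate` of production traces for scoring."""
    return [t for t in traces if rng.random() < sample_rate]

def check_regression(scores, baseline: float, tolerance: float = 0.05) -> bool:
    """True if the mean score has dropped more than `tolerance` below baseline."""
    if not scores:
        return False
    return sum(scores) / len(scores) < baseline - tolerance

rng = random.Random(42)  # seeded so reruns sample the same traces
traces = ["Your order ships Monday."] * 90 + ["See INT-20931 for details."] * 10
scores = [regex_eval(t) for t in sample(traces, sample_rate=0.5, rng=rng)]
# In practice, persist (trace_id, evaluator, score) rows so trends are queryable.
if check_regression(scores, baseline=0.98):
    print("ALERT: eval score regressed")
```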
Cost tracking is the one founders forget until the bill arrives. Anthropic and OpenAI will happily charge you $50k in a month if a customer loops your agent. You want a dashboard that tells you which feature is burning which dollars, and an alert when any one customer exceeds a threshold.
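Attribution comes down to tagging every call with a tenant and a feature, then summing. This sketch rests on explicit assumptions. The model names and per-token prices are placeholders, not real provider rates. The alert threshold is a running total with no per-month reset. In practice, gateways and observability tools can report per-request cost, and you’d aggregate their numbers instead of hard-coding prices.

```python
from collections import defaultdict

# Placeholder prices in USD per million tokens; substitute your providers' current rates.
PRICE_PER_MTOK = {
    "model-small": {"in": 0.5, "out": 1.5},
    "model-large": {"in": 3.0, "out": 15.0},
}

class CostTracker:
    def __init__(self, tenant_alert_usd: float):
        self.tenant_alert_usd = tenant_alert_usd
        self.by_feature = defaultdict(float)
        self.by_tenant = defaultdict(float)
        self.alerted = set()

    def record(self, tenant: str, feature: str, model: str, tokens_in: int, tokens_out: int) -> float:
        """Attribute one call's cost to its tenant and feature; alert once per tenant over threshold."""
        price = PRICE_PER_MTOK[model]
        cost = (tokens_in * price["in"] + tokens_out * price["out"]) / 1_000_000
        self.by_feature[feature] += cost
        self.by_tenant[tenant] += cost
        if self.by_tenant[tenant] > self.tenant_alert_usd and tenant not in self.alerted:
            self.alerted.add(tenant)  # alert once, not on every subsequent call
            print(f"ALERT: tenant {tenant} has spent ${self.by_tenant[tenant]:.2f}")
        return cost

tracker = CostTracker(tenant_alert_usd=100.0)
tracker.record("acme", "agent", "model-large", tokens_in=2_000_000, tokens_out=500_000)
tracker.record("acme", "search", "model-small", tokens_in=1_000_000, tokens_out=100_000)
print(dict(tracker.by_feature))
```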
Object storage
S3, R2, or GCS. Pick one. Anything that’s a file goes there: user uploads, generated artefacts, model outputs, embedding dumps, eval datasets, training data, fine-tune snapshots.
The mistake I see is people storing files on the server’s local disk because it’s faster to wire up. Once you scale horizontally, the file lives on one server, the next request hits a different one, and you’ve got a “we lost a customer’s upload” bug. Use object storage from day one.
R2 for AI startups specifically: no egress fees. If you’re serving model outputs or large generated files, S3 egress costs will eat your margins.
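Here is rough arithmetic for a hypothetical image-generation product. It assumes about $0.09/GB for S3 internet egress, the long-standing first-tier list price. Check current pricing, since tiers and free allowances change. That is compared against R2’s zero egress fee. Storage and per-request charges apply to both and are left out.

```python
def monthly_egress_cost(gb_served: float, usd_per_gb: float) -> float:
    """Egress bill for one month, ignoring free tiers and volume discounts."""
    return gb_served * usd_per_gb

# Hypothetical workload: 2M generated images a month, ~2 MB each, each viewed 5 times.
gb_served = 2_000_000 * 2 * 5 / 1000  # MB -> GB (decimal units)
s3 = monthly_egress_cost(gb_served, usd_per_gb=0.09)  # assumed S3 first-tier list price
r2 = monthly_egress_cost(gb_served, usd_per_gb=0.0)   # R2 charges no egress fee
print(f"{gb_served:,.0f} GB served: S3 ~${s3:,.0f}/month, R2 ~${r2:,.0f}/month")
```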
Queues and the AI gateway
Queues are not optional for an AI startup. LLM calls are slow (a complex agent run can take 30+ seconds, and you cannot block an HTTP request that long), and providers have rate limits you need to back off and retry against, not throw 500s on.
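Backing off against a rate limit usually means exponential backoff with jitter. You wait longer after each failure, and the randomness stops a burst of failed jobs from retrying in lockstep. In this sketch, `RateLimitError` and the `call` argument are hypothetical stand-ins for your provider SDK’s 429 error and the model call. Provider SDKs and most job queues generally ship their own retry settings, which you’d reach for first.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 error."""

def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry `call` on RateLimitError using "full jitter" exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up; let the queue mark the job failed or reschedule it
            cap = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... up to max_delay
            sleep(random.uniform(0, cap))  # wait anywhere between 0 and the cap

# Usage: a call that is rate-limited twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

print(call_with_backoff(flaky, sleep=lambda seconds: None))  # no real sleeping in the demo
```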
My picks: Postgres-backed queue first (Oban for Elixir, River for Go). SQS if you’re on AWS and want managed. Redis with BullMQ or Sidekiq only if you already have Redis for other reasons. Pick by language, not by hype.
And in front of the model providers, run an AI gateway: LiteLLM (open-source) or Portkey (commercial). Cloudflare AI Gateway if you’re already on Cloudflare. The gateway handles multi-provider routing, retries, caching, and cost attribution before your queue ever sees a rate-limit problem. Calling OpenAI or Anthropic directly from your API handler with no gateway is a bug waiting to happen.
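Stripped to its core, a gateway’s routing does three things. It tries providers in order, falls through on failure, and records which provider answered so cost can be attributed. This is a toy illustration, not LiteLLM’s or Portkey’s API. The provider functions are hypothetical, and with a real gateway you’d configure fallbacks rather than write this loop.

```python
from typing import Callable

Provider = tuple[str, Callable[[str], str]]

class AllProvidersFailed(Exception):
    pass

def route(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Return (provider_name, completion) from the first provider that succeeds."""
    errors = []
    for name, complete in providers:
        try:
            return name, complete(prompt)
        except Exception as exc:  # a real gateway would only fall through on retryable errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))

def primary(prompt: str) -> str:
    raise TimeoutError("upstream timeout")  # simulate an outage at the first provider

def secondary(prompt: str) -> str:
    return f"answer to {prompt!r}"

served_by, text = route("hello", [("primary", primary), ("secondary", secondary)])
print(served_by, text)  # served_by is what you'd log for cost attribution
```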
Load balancers and CDNs
Both relevant the day you go from one server to two.
Load balancers are usually free from your hosting provider. Configure health checks. Make your app handle SIGTERM properly so deploys don’t drop requests. Move on.
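Handling SIGTERM properly matters when the platform stops an old instance during a deploy. The instance should finish in-flight work and stop taking new work, instead of dying mid-request. Below is a minimal worker-loop sketch using Python’s `signal` module, where `handle_job` is hypothetical. Production HTTP servers such as gunicorn and uvicorn already shut down gracefully on SIGTERM, so this matters most for your own background workers.

```python
import signal
import threading

shutting_down = threading.Event()

def on_sigterm(signum, frame):
    # Don't exit here: just flag it, so the current job can finish cleanly.
    shutting_down.set()

signal.signal(signal.SIGTERM, on_sigterm)  # must be called from the main thread

def handle_job(job):
    """Hypothetical unit of work, e.g. one queued LLM call."""
    return f"done {job}"

def worker_loop(jobs):
    results = []
    for job in jobs:
        if shutting_down.is_set():
            break  # stop taking new work; unstarted jobs stay queued for the next instance
        results.append(handle_job(job))
    return results
```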
CDNs are where founders under-invest. Cloudflare in front of most things is cheap enough that there’s usually no good excuse to ignore it, and it makes your app feel faster everywhere in the world. For AI startups serving generated images, audio, or video, it’s the difference between a 4-second load and an instant one.
The less obvious CDN play for AI: cache common LLM responses. If 80% of your users are asking the same five questions of your support agent, cache the answers in front of the model.
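A minimal exact-match version of that cache normalises the question, keys it by model and text, and serves the stored answer until it expires. It makes a few assumptions. An in-process dict stands in for Redis or a Postgres table, `ask_model` is a hypothetical model call, and normalisation only folds case and whitespace. Semantic caching, which matches on embedding similarity, is a further step some gateways offer.

```python
import hashlib
import time

class ResponseCache:
    def __init__(self, ttl_seconds: float = 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: dict[str, tuple[float, str]] = {}  # key -> (stored_at, answer)

    @staticmethod
    def key(model: str, question: str) -> str:
        """Fold case and whitespace so trivially different phrasings share an entry."""
        normalised = " ".join(question.lower().split())
        return hashlib.sha256(f"{model}|{normalised}".encode()).hexdigest()

    def get_or_call(self, model: str, question: str, ask_model) -> str:
        k = self.key(model, question)
        hit = self.store.get(k)
        if hit and self.clock() - hit[0] < self.ttl:
            return hit[1]  # cache hit: no tokens spent
        answer = ask_model(question)
        self.store[k] = (self.clock(), answer)
        return answer

calls = []

def ask_model(question):
    calls.append(question)
    return "You can reset your password from Settings."

cache = ResponseCache(ttl_seconds=600)
cache.get_or_call("support-bot", "How do I reset my password?", ask_model)
# Same question, different casing and spacing: served from the cache.
cache.get_or_call("support-bot", "how do I   reset my password?", ask_model)
print(len(calls))
```

Only cache answers that don’t depend on who is asking. Anything personalised or tenant-specific needs the tenant in the key, or no cache at all.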
The boring safety layer people skip
Backups, migrations, secrets, hard usage limits, idempotency, tenant boundaries. None of these sound like system design until they fail.
Backups are not real until you’ve restored from one. Schedule a quarterly drill where someone pulls last night’s backup into a scratch environment and brings the app up against it. You’ll find something broken in the process, which is the point.
Migrations belong in your deployment story, not in a Slack DM at 11pm. Wire your migration tool into CI/CD so schema changes ship with the code that needs them. Manual production migrations are how teams lose data.
Secrets live in a real secrets manager: AWS Secrets Manager, Doppler, Infisical, or your PaaS’s built-in store. Not in a .env file passed around Slack. This matters double for AI startups, where you’re typically wiring together six third-party APIs with their own keys and their own blast radius if leaked.
Hard usage limits, not just dashboards. A dashboard tells you the bill is bad. A per-user, per-tenant, or per-job ceiling stops it getting bad. For LLM features specifically, set a daily token cap per tenant and a circuit breaker per job. Then test the limits actually fire.
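This sketch covers both limits. A per-tenant daily token cap is checked before each LLM call. A per-job circuit breaker trips after too many model calls, for example when an agent is stuck in a loop. The class names, limits, and in-memory counters are hypothetical. In production the counters would live in Postgres or Redis, so every instance sees the same totals.

```python
import datetime as dt
from collections import defaultdict
from typing import Optional

class LimitExceeded(Exception):
    """Raised instead of making the call, so the caller can fail or defer the job."""

class TenantTokenCap:
    """Hard daily token ceiling per tenant (UTC days)."""

    def __init__(self, daily_cap: int):
        self.daily_cap = daily_cap
        self.used = defaultdict(int)  # (tenant, date) -> tokens charged

    def charge(self, tenant: str, tokens: int, today: Optional[dt.date] = None) -> None:
        """Call before the model call with an estimate (e.g. prompt tokens + max output tokens)."""
        today = today or dt.datetime.now(dt.timezone.utc).date()
        key = (tenant, today)
        if self.used[key] + tokens > self.daily_cap:
            raise LimitExceeded(f"tenant {tenant} would exceed {self.daily_cap} tokens today")
        self.used[key] += tokens

class JobCircuitBreaker:
    """Trips after `max_calls` model calls within one job."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def before_call(self) -> None:
        self.calls += 1
        if self.calls > self.max_calls:
            raise LimitExceeded(f"job exceeded {self.max_calls} model calls")
```

Testing that the limits actually fire can be a plain unit test: charge past the cap and assert that `LimitExceeded` is raised.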
Idempotency from day one. Anything that can retry can run twice. If a job sends an email, charges a customer, generates a document, writes to a CRM, or calls a model, assume it will eventually be retried at the worst possible time.
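One common pattern is to derive an idempotency key for each side effect. You record the key under a unique constraint in the same transaction as the work, and skip the work if the key already exists. The sketch uses sqlite3 as a stand-in for Postgres, where you’d write `INSERT ... ON CONFLICT DO NOTHING`. `send_invoice_email` and the key format are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")

sent = []

def send_invoice_email(customer_id: str, invoice_id: str) -> None:
    """Hypothetical side effect that must not happen twice."""
    sent.append((customer_id, invoice_id))

def run_once(key: str, effect) -> bool:
    """Run `effect` only if `key` hasn't been recorded. Returns True if it ran."""
    with db:  # commits on success, rolls back (and re-raises) if `effect` raises
        cur = db.execute("INSERT OR IGNORE INTO processed VALUES (?)", (key,))
        if cur.rowcount == 0:
            return False  # an earlier attempt already did this work
        effect()
        return True

key = "invoice-email:inv_42"
run_once(key, lambda: send_invoice_email("cus_1", "inv_42"))
run_once(key, lambda: send_invoice_email("cus_1", "inv_42"))  # a retry: skipped
print(len(sent))
```

Because the key insert and the effect share a transaction, a failed effect rolls the key back and the retry runs again. One gap remains: a crash after an external effect but before the commit. This is why calls to payment providers should also carry the key; Stripe accepts an `Idempotency-Key` header for this.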
Deliberate tenant boundaries from the first table, if you’re building B2B SaaS. A tenant_id column on every relevant table and a query-layer guard that refuses to run without one. Not sophisticated. Impossible to forget.
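Here is a minimal version of that query-layer guard, using sqlite3 as a stand-in for Postgres. `TenantScopedDB` is hypothetical, and the string check on the SQL is deliberately crude. In Postgres, row-level security can enforce the same rule inside the database, e.g. with a policy using `USING (tenant_id = current_setting('app.tenant_id'))`.

```python
import sqlite3

class MissingTenant(Exception):
    pass

class TenantScopedDB:
    """Hypothetical wrapper: every query must filter on, and bind, :tenant_id."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def query(self, sql: str, tenant_id, **params):
        if not tenant_id:
            raise MissingTenant("refusing to query without a tenant_id")
        if "tenant_id = :tenant_id" not in sql:
            raise MissingTenant("query does not filter on tenant_id")
        return self.conn.execute(sql, {**params, "tenant_id": tenant_id}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT)")
conn.executemany(
    "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
    [("acme", "Q1 plan"), ("globex", "Secret roadmap")],
)

db = TenantScopedDB(conn)
rows = db.query("SELECT title FROM documents WHERE tenant_id = :tenant_id", tenant_id="acme")
print(rows)
```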
None of this is glamorous. It’s the difference between “we shipped fast” and “we shipped fast and didn’t wake up to a production incident we could have prevented.”
What to defer
Microservices. One service, one repo, one deploy until you have a real team-size reason to split.
Kubernetes. Not until you’ve outgrown the simpler platforms. Fly, Railway, ECS Fargate, and Cloud Run get you a long way without the operational tax.
Elaborate caching. Add a cache when you have a measured performance problem, not before. Cache invalidation is genuinely hard.
WebSockets. Use them only when you need bidirectional real-time. Server-sent events handle most “stream tokens from the model to the user” cases just fine.
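Server-sent events are plain HTTP: a `text/event-stream` response in which each message is one or more `data:` lines followed by a blank line. That is why they fit one-way token streaming so well. Below is a framework-agnostic sketch of the framing. The token list is hypothetical; you’d hand the generator to whatever streaming response type your web framework provides. The final `[DONE]` message is a convention borrowed from OpenAI’s streaming API, not part of the SSE spec.

```python
import json
from typing import Iterable, Iterator, Optional

def sse_format(data: str, event: Optional[str] = None) -> str:
    """Frame one SSE message; multi-line data becomes several `data:` lines."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"  # the blank line terminates the message

def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per model token, then a completion sentinel."""
    for token in tokens:
        yield sse_format(json.dumps({"token": token}))
    yield sse_format("[DONE]")

for frame in stream_tokens(["Hel", "lo"]):
    print(frame, end="")
```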
A custom vector database. pgvector or pgvectorscale will hold you longer than you think. Beyond that, Turbopuffer has picked up real traction in 2026, but you do not need it on day one.
One last thing: just use a PaaS
For most startup validation work, a PaaS deploy will carry you all the way to your first thousand paying users. Fly, Railway, Render, Vercel. Pick one. Point it at your repo. Ship.
Validation is the only thing that matters before product-market fit. Not your stack. Not your infrastructure choices. Whether anyone actually wants this thing you’re building.
I leaned on this hard in the early days of Brokerloop. The whole point was to put something real in front of my cofounder and a handful of brokers, fast, to test whether the GTM thesis held. If I’d spent two weeks setting up Kubernetes and a service mesh, the conversations that actually mattered would have died before they started. PaaS deploy, working product, real signal back inside a week. The cofounder buy-in came from seeing the thing work, not from a deck.
You move off the third-party managed platform when revenue actually justifies running your own infrastructure. The 2026 break-even for self-managed cloud sits around $2.5M of annual compute spend; below that, the savings don’t cover the engineers you need to run it safely. Most startups never get close before PMF.
(AWS itself has PaaS services too: Elastic Beanstalk, App Runner, Fargate. The move I’m describing is leaving someone else’s fully-managed platform for your own AWS account, where you choose your level of abstraction and own the IaC. You’re trading their opinions for your controls.)
The escape plan nobody hands you: when you do make the move, go specifically to AWS, with Terraform as your IaC layer. Both for the same reason. Hiring pool.
AWS has around 55,000 active cloud engineering postings globally, versus about 42,000 for Azure and 20,000 for GCP. That’s roughly 1.3x Azure and 2.75x GCP, and postings are a reasonable proxy for how many engineers already know the platform when you’re hiring your first infra engineer. Same logic as picking Postgres over the cool new database. Being able to hire matters more than which option is technically nicer.
Terraform has the dominant IaC ecosystem. 4,800+ providers, around 26 million weekly downloads. Pulumi is genuinely better if your team writes TypeScript or Python daily and wants testable infra in a real language. OpenTofu is the safer licensing bet if you don’t trust HashiCorp’s BSL change. But for hiring leverage, Terraform still wins. You can re-platform later if you outgrow it.
On AWS itself, start at App Runner for most apps. It’s the closest equivalent to what Railway gave you: push a container, get a URL, autoscaling included. Climb to Fargate when you need orchestration: coordinated services, sidecars, scaling at the service level. Raw EC2 only when App Runner and Fargate genuinely can’t do what you need (GPU workloads, specific networking, compliance constraints). Most startups never make it past App Runner.
Almost everything startups over-engineer at the start gets thrown out by month six anyway.
The boring conclusion
Written out, the stack I’d actually pick before Series B:
- API: REST.
- Database: Postgres, with pgvector if you need vectors.
- Object storage: R2 for AI-heavy workloads (no egress), S3 otherwise.
- Queue: Postgres-backed (Oban, River). SQS if you’re on AWS.
- Containers: Docker.
- CI/CD: GitHub Actions.
- Deploy: Railway, Render, or Fly until revenue justifies AWS via Terraform.
- Auth: Clerk. WorkOS if B2B SSO is core.
- Errors: Sentry.
- Logs: structured, into whatever APM you grow into.
- CDN: Cloudflare in front of everything.
- AI gateway: LiteLLM or Portkey. Don’t call providers directly.
- AI observability: Helicone, Langfuse, or Braintrust. One of them, wired in day one.
Boring on purpose. Hard to beat at this stage.
Startup velocity comes from not being clever in places where clever isn’t required. Save your cleverness for the product. Use the boring infrastructure that the rest of the industry has already debugged for you.
Related
- Why layered codebases punish humans and AI coding agents: once the stack is boring, this is the shape that keeps the code boring too.
- Why “judgement is the differentiator” advice fails before PMF: the same pre-Series-B pragmatism, applied to product judgement instead of infrastructure.