What Polymarket taught me about event-driven

A Polymarket market-making bot looks event-driven on paper. In production it became streams, queues, schedulers, and state machines stitched together. This is the hybrid that survived.

I built a Polymarket market-making bot. The pitch I’d bought was that event-driven would simplify it. It did the opposite. The bus helped at the edges, but every decision that actually had to be right wanted to be boring, synchronous, and explicit.

The codebase that shipped is a three-layer hybrid: an async bus for facts, a synchronous hot path for decisions, an append-only SQLite log off to the side for memory. Facts are the stream of things that happened: order-book updates, user fills, boundary events. Decisions are the executor’s hot path: strategy intent, risk checks, capital reservation, submit, position update. Memory is the durable log of decisions, fills, errors, and reconciliations. None of the three is optional, and none lives inside a single pattern.

Pure event-driven architecture fails at the moment the bot has to decide, because risk and submit cannot be a loose chain of passive handlers. Pure synchronous architecture fails at the inputs, because market data, fills, reconnects, and resolution signals arrive as streams. The hybrid is what survived contact with production.

The rule: stream the facts, synchronise the decision, append the memory. The literature has been saying a version of this since LMAX in 2011 and Fowler in 2017. The 2025 vendor pitch, a Kafka shape for everything, quietly drops the middle layer.

Problem and requirements

The bot’s job is to make markets on Polymarket. That means quoting both sides of an outcome, capturing the spread, and unwinding cleanly when the underlying event resolves. The strategy I focused on is complete-set market-making, where the prices of the YES and NO tokens for a given outcome should sum to one dollar because every pair is fully collateralised by USDC in a smart contract. When the market drifts away from that sum, there is a small risk-free margin to capture if you can act before someone else does. Polymarket’s CLOB trading docs describe the order book, and the NautilusTrader Polymarket integration docs are a useful neutral reference for how the venue behaves.
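The arithmetic behind that margin fits in a few lines. This is a minimal sketch under two assumptions. First, you can read the best bid and ask for both tokens. Second, a YES+NO pair is always worth exactly $1, so you can either buy both and redeem the pair, or mint a pair for $1 and sell both. `TopOfBook`, `complete_set_edge`, and `fee_per_pair` are hypothetical names, not Polymarket's API. Sizes, partial fills, and gas are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopOfBook:
    # Hypothetical best prices for one binary market, in dollars per token.
    yes_bid: float
    yes_ask: float
    no_bid: float
    no_ask: float


def complete_set_edge(book: TopOfBook, fee_per_pair: float = 0.0) -> tuple[str, float]:
    """Return (action, edge per pair in dollars) for a complete-set opportunity.

    Assumes a YES+NO pair is worth exactly $1 and that fee_per_pair covers
    all costs. Sizes and partial fills are deliberately ignored.
    """
    # Buy one YES and one NO at the asks, then redeem the pair for $1.
    buy_edge = 1.0 - (book.yes_ask + book.no_ask) - fee_per_pair
    # Mint a pair for $1, then sell YES and NO at the bids.
    sell_edge = (book.yes_bid + book.no_bid) - 1.0 - fee_per_pair
    if buy_edge > 0 and buy_edge >= sell_edge:
        return "buy_pair", buy_edge
    if sell_edge > 0:
        return "sell_pair", sell_edge
    return "none", 0.0
```

For example, asks of 0.48 (YES) and 0.49 (NO) give a buy-side edge of about $0.03 per pair before fees. Floats are fine for a sketch. Real order logic would more likely work in integer ticks or `Decimal`.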

The non-negotiable requirements were:

React to order-book changes fast enough that prices have not already moved by the time an order lands.
Survive feed disconnects, rate limits, and partial fills without leaking position.
Keep risk checks deterministic and synchronous, no matter how busy the rest of the system is.
Be possible to reason about after the fact. Every decision, every fill, every reconciliation has to leave a record.
Be testable. Strategies, the executor, the reconciliation logic, and the feed handlers all have to be swappable in isolation.

Those last two are the constraints that quietly do the most architectural work.

Product context

Market-making is not the same problem as latency arbitrage. The bot is not chasing microseconds against HFT shops. It is trying to be present and well-behaved across many small markets and outwait the people who quote and run. That changes the design. The hot path matters, but throughput-on-one-symbol matters less than reliability across many.

The other relevant detail: most of the code was written with heavy AI assistance. That had consequences I will come back to.

Architecture

At the centre of the system is an in-memory message bus. One process, one bus, multiple async tasks reading and publishing. The bus is unremarkable except for two policies: coalescing on high-volume topics, and unbounded queues on critical ones. Feeds publish market data and user data. Strategies consume market data and publish orders. The executor consumes orders, runs risk checks, submits to a backend, and publishes fills. The reconciliation service consumes the user-fills feed in live mode and corrects the executor’s optimistic view of the world. A boundary service publishes typed interval-boundary events that drive resolution and redemption.

That sentence sounds tidy. The code is not. Five core supervised tasks at boot (FeedManager, ExecutionEngine, Cache, BoundaryPriceService, ResolutionService), plus MarketFinder, RedemptionSweeper, and ReconciliationService in live mode. Each runs under a supervisor that restarts on crash with exponential backoff capped at thirty seconds and trips a circuit breaker after three crashes in five minutes. The bus is injected into roughly a dozen components. The boundary cases are most of the engineering: what happens when a feed disconnects, what happens when the bus drops an update under load, what happens when the executor’s optimistic fill diverges from what the venue reports.
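The restart policy can be sketched in asyncio. Two numbers come from the paragraph above: the 30-second backoff cap and the breaker that trips after three crashes in five minutes. The rest are illustrative choices and should be read as assumptions: the one-second base delay, the `Supervisor` and `CircuitOpen` names, and the injectable clock and sleep. A supervised task is assumed to be a zero-argument coroutine factory.

```python
import asyncio
import time
from collections import deque
from typing import Awaitable, Callable


class CircuitOpen(Exception):
    """Raised when a task has crashed too often to keep restarting."""


class Supervisor:
    def __init__(
        self,
        name: str,
        factory: Callable[[], Awaitable[None]],
        base_delay: float = 1.0,     # assumed; the text only gives the cap
        max_delay: float = 30.0,     # backoff cap from the text
        max_crashes: int = 3,        # breaker threshold from the text
        window: float = 300.0,       # five-minute sliding window
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self.name = name
        self._factory = factory
        self._base, self._max = base_delay, max_delay
        self._max_crashes, self._window = max_crashes, window
        self._clock, self._sleep = clock, sleep
        self._crashes: deque[float] = deque()  # timestamps of recent crashes

    def _backoff(self, attempt: int) -> float:
        # 1s, 2s, 4s, ... capped at max_delay.
        return min(self._max, self._base * 2 ** attempt)

    async def run(self) -> None:
        attempt = 0
        while True:
            try:
                await self._factory()
                return  # clean exit: nothing to restart
            except asyncio.CancelledError:
                raise  # shutdown is not a crash
            except Exception:
                now = self._clock()
                self._crashes.append(now)
                # Forget crashes that fell out of the sliding window.
                while self._crashes and now - self._crashes[0] > self._window:
                    self._crashes.popleft()
                if len(self._crashes) >= self._max_crashes:
                    raise CircuitOpen(f"{self.name}: too many crashes in {self._window}s")
                await self._sleep(self._backoff(attempt))
                attempt += 1
```

Injecting the clock and sleep keeps the policy unit-testable without real waiting. That is the same reason the post gives for injecting everything else.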

Here is the diagram view of the main flow.

Fig. 1: System diagram. An async message bus (left) feeds a synchronous executor hot path (middle: strategy intent → risk → submit), with an append-only event log branching off to the right. The bus carries facts, the executor runs the synchronous decision, and the append-only log is the separate memory.

The actual flow, with the topic names from the codebase:

```text
book_update  →  strategy  →  orders  →  executor
                                            ↓
                  risk  →  backend.submit  →  fills
                                                ↓
                                       event_log (SQLite)
                                                ↑
                                    user_fills  →  reconcile
```

The thing I want to draw attention to: the bus is not the whole system. The executor calls risk checks directly. The executor calls the backend submit directly. The position tracker is updated synchronously inside the executor before any fill event hits the bus. Three of the most consequential moves are deliberately synchronous.

Polymarket bot three-layer hybrid: facts, decisions, memory

Layer	What it owns	Implementation in the bot
Facts	Streaming inputs and things that happened: market data, user fills, boundary events, fills, rejections.	The in-memory bus carries topics including `book_update`, `orders`, `fills`, and `user_fills`. `FeedManager` publishes feed data; `BoundaryPriceService` publishes typed interval-boundary events; `ResolutionService` reacts to resolution state.
Decisions	The synchronous decision hot path where strategy intent becomes risk-checked action.	The executor / `ExecutionEngine` consumes `orders`, calls risk directly, reserves capital, calls `backend.submit_orders`, updates positions synchronously, then publishes `fills` or rejections. The `execution-engine` box is deliberately not a choreographed event chain.
Memory	The durable record used to explain, audit, reconcile, and replay what happened after the fact.	A separate append-only event log records decisions, fills, errors, and reconciliations. `ReconciliationService` compares optimistic executor state against the live `user_fills` feed and writes corrections into the durable history.

Key technical decisions

This is the section that took me longest to get honest about, because it is the section where the answer is “the literature was already right”.

The bus carries facts, not commands

Bernd Ruecker has been saying this since 2019. Commands and events are not interchangeable. An event describes something that happened; a command requests an action. They can both ride on messaging infrastructure. They differ in intent, not transport.

The strategy publishes orders to a topic called orders. By Ruecker’s definition that is a command, not an event. There is one consumer. The producer wants the consumer to do something specific. The honest topic name would be something like submit_order or order_command. I left it as orders because the code reads better that way, but I no longer pretend it is an event. Treating it as a command in my head changed how I wrote the consumer side, because commands deserve direct error handling and explicit acknowledgement, where events do not.

The opposite case is book_update. There can be many consumers (cache, strategies, recorder). The producer does not care who responds. Stale messages can be dropped when consumers fall behind because only the latest book matters. That is an event. More specifically, it is the event-carried state transfer pattern with compaction layered on top. Coalescing high-volume topics is not a hack; it is a documented pattern that the vendor pitch decks rarely call by name.
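One way to give the orders topic real command semantics is to have each command carry an `asyncio.Future` that its single consumer must resolve. The producer then gets an explicit acknowledgement or an error instead of fire-and-forget. This is a sketch under assumptions: `OrderCommand`, `BookUpdate`, `submit`, `executor_loop`, and the field names are hypothetical, not the bot's types.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BookUpdate:
    """Event: a fact about the world. Any number of consumers, no reply expected."""
    token_id: str
    best_bid: float
    best_ask: float


@dataclass
class OrderCommand:
    """Command: a request for one consumer to act, with an explicit acknowledgement."""
    token_id: str
    side: str
    price: float
    size: float
    # Must be constructed inside a running event loop.
    ack: asyncio.Future = field(
        default_factory=lambda: asyncio.get_running_loop().create_future()
    )


async def submit(queue: asyncio.Queue, cmd: OrderCommand, timeout: float = 5.0) -> str:
    # The producer waits for the consumer's verdict instead of assuming success.
    await queue.put(cmd)
    return await asyncio.wait_for(cmd.ack, timeout)


async def executor_loop(queue: asyncio.Queue) -> None:
    while True:
        cmd: OrderCommand = await queue.get()
        if not cmd.ack.done():  # skip commands whose producer already timed out
            if cmd.size > 0:
                cmd.ack.set_result(f"accepted {cmd.side} {cmd.size}@{cmd.price}")
            else:
                # The error travels straight back to the producer, not onto a topic.
                cmd.ack.set_exception(ValueError("size must be positive"))
        queue.task_done()
```

A `BookUpdate` needs none of this machinery: it is published and forgotten, and dropping a stale one is harmless.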

The whole policy is two screens of code:

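```python
import asyncio
from typing import Any


class CoalescingBuffer:
    """Drop-in queue that keeps only the latest value per key."""

    def __init__(self, key_attr: str) -> None:
        self._key_attr = key_attr
        self._latest: dict[str, Any] = {}  # key -> newest unconsumed item
        self._event = asyncio.Event()      # wakes a consumer waiting on the buffer

    def put_nowait(self, item: Any) -> None:
        key = getattr(item, self._key_attr)
        self._latest[key] = item  # a newer snapshot overwrites the stale one
        self._event.set()
```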

book_update is keyed by token_id: every new snapshot for a token overwrites the last one. orders and fills are not coalesced; they sit on asyncio.Queue(maxsize=0), truly unbounded. That two-line policy difference is the architecture, written down.
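Here is a minimal sketch of how one in-process bus might apply those two policies per topic, assuming each subscriber gets its own buffer. Only the policy comes from the text above: latest-per-`token_id` for `book_update`, and an unbounded `asyncio.Queue(maxsize=0)` for everything else. `Bus`, `LatestByKey`, `COALESCE_KEYS`, and the `get()` method are hypothetical illustrations.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


class LatestByKey:
    """Coalescing buffer in the spirit of CoalescingBuffer, plus a consumer side."""

    def __init__(self, key_attr: str) -> None:
        self._key_attr = key_attr
        self._latest: dict[Any, Any] = {}
        self._event = asyncio.Event()

    def put_nowait(self, item: Any) -> None:
        self._latest[getattr(item, self._key_attr)] = item  # overwrite stale value
        self._event.set()

    async def get(self) -> Any:
        while not self._latest:
            self._event.clear()
            await self._event.wait()
        key = next(iter(self._latest))  # oldest key that still has a pending value
        return self._latest.pop(key)

    def qsize(self) -> int:
        return len(self._latest)


# Topics allowed to drop stale values, and the attribute to coalesce on.
COALESCE_KEYS = {"book_update": "token_id"}


class Bus:
    def __init__(self) -> None:
        self._subs: dict[str, list[Any]] = {}

    def subscribe(self, topic: str) -> Any:
        key_attr = COALESCE_KEYS.get(topic)
        buf = LatestByKey(key_attr) if key_attr else asyncio.Queue(maxsize=0)
        self._subs.setdefault(topic, []).append(buf)
        return buf

    def publish(self, topic: str, msg: Any) -> None:
        # Fan out to every subscriber; neither policy ever blocks the publisher.
        for buf in self._subs.get(topic, []):
            buf.put_nowait(msg)


@dataclass(frozen=True)
class Book:
    token_id: str
    mid: float
```

With per-subscriber buffers, a slow strategy only loses intermediate snapshots for itself. A fast subscriber on the same topic still sees every snapshot it can keep up with.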

The hot path is synchronous on purpose

Inside the executor the sequence is: receive an order command, run risk checks, reserve capital, submit to the backend, apply the fill to the position tracker, publish a fill or rejection. Stripped to the load-bearing lines:

```python
# inside ExecutionEngine.execute() (excerpt: total_cost and the on_fill arguments are elided)
risk_result = await self.risk.check(orders)        # direct call, not a bus topic
if not risk_result.passed:
    await self._reject_orders(orders, risk_result.reason)
    return []

self._backend.commit_capital(total_cost)           # sync reservation
fills = await self._backend.submit_orders(orders)  # the one network round-trip

for fill in fills:
    self.positions.on_fill(fill, ...)              # sync update
```

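There is no risk_check_requested topic. Risk is a function call. Capital reservation is a function call. The position tracker is updated before the fill ever hits the bus. The risk check and the rejection path are awaited, but they are direct calls inside the executor, not hops through the bus. The only await that leaves the process is the network round-trip to the venue.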
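To make the ordering concrete, here is a self-contained sketch of the same shape built with fakes. `FakeRisk`, `FakeBackend`, `Executor`, and the `max_cost` limit are hypothetical stand-ins. In this sketch the risk check is a plain synchronous method, whereas the excerpt above awaits it. That difference does not change the shape, because either way it is a direct call rather than a round-trip through the bus.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    token_id: str
    price: float
    size: float


@dataclass(frozen=True)
class Fill:
    token_id: str
    price: float
    size: float


class FakeRisk:
    def __init__(self, max_cost: float) -> None:
        self.max_cost = max_cost  # hypothetical single limit

    def check(self, orders: list[Order]) -> tuple[bool, str]:
        cost = sum(o.price * o.size for o in orders)
        if cost > self.max_cost:
            return False, f"cost {cost:.2f} exceeds {self.max_cost:.2f}"
        return True, ""


class FakeBackend:
    def __init__(self, capital: float) -> None:
        self.free_capital = capital

    def commit_capital(self, amount: float) -> None:
        self.free_capital -= amount  # synchronous reservation, no await

    async def submit_orders(self, orders: list[Order]) -> list[Fill]:
        await asyncio.sleep(0)  # stands in for the network round-trip
        return [Fill(o.token_id, o.price, o.size) for o in orders]


class Executor:
    def __init__(self, risk: FakeRisk, backend: FakeBackend,
                 publish: Callable[[str, object], None]) -> None:
        self.risk, self.backend, self.publish = risk, backend, publish
        self.positions: dict[str, float] = {}

    async def execute(self, orders: list[Order]) -> list[Fill]:
        ok, reason = self.risk.check(orders)               # direct call, no topic
        if not ok:
            self.publish("rejections", reason)
            return []
        self.backend.commit_capital(sum(o.price * o.size for o in orders))
        fills = await self.backend.submit_orders(orders)  # the one network await
        for f in fills:                                    # positions first...
            self.positions[f.token_id] = self.positions.get(f.token_id, 0.0) + f.size
        for f in fills:                                    # ...then tell the bus
            self.publish("fills", f)
        return fills
```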

This is the same shape that LMAX uses at very different scale. The Business Logic Processor in their architecture handles roughly six million orders per second on a single thread, with the single-writer principle eliminating concurrency bugs. Their input and output are event-driven. Their hot path is not. The Polymarket bot is operating at a much smaller scale and a much higher level of language abstraction, but the design rhymes: events at the edges, deterministic synchronous code in the middle.

This is the move the 2025 vendor pitches for event-driven AI agents consistently underplay. Loose coupling for decision-making is a liability. You want the loose coupling around the decision-maker, not inside it.

The event log is not the bus

The bus lives in memory. The event log is a separate SQLite-backed append-only record of decisions, fills, errors, and reconciliations. It is the thing I look at after a session to understand what happened.

Conflating these two is one of the most common ways event-driven systems go wrong. Greg Young, who coined CQRS and built much of the event-sourcing literature on top of his algorithmic-trading work, has made this point repeatedly in talks and interviews: event sourcing is a state-derivation pattern, and append-only logs are bad at answering queries like “what are all the currently open orders” without a projection. You need both: an in-memory state model fed by the bus for the hot path, and an immutable log for analysis and replay. Kleppmann’s Chapter 11 of Designing Data-Intensive Applications is the longest version of this argument.

Treating the durable log as a side product of the bus, not as the bus itself, made the system debuggable.
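A minimal sketch of that split, using Python's built-in `sqlite3` module. An append-only `events` table acts as memory, and a projection answers "which orders are open right now" by folding over the log. The table layout, the event kinds, and the function names are assumptions for illustration, not the bot's schema. In the bot, the hot path reads an in-memory model fed by the bus; a projection like this one is the after-the-fact view.

```python
import json
import sqlite3


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # total order of appends
        " kind TEXT NOT NULL,"
        " order_id TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def append(conn: sqlite3.Connection, kind: str, order_id: str, **payload) -> None:
    # Append only: nothing in this module issues UPDATE or DELETE.
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO events (kind, order_id, payload) VALUES (?, ?, ?)",
            (kind, order_id, json.dumps(payload)),
        )


def open_orders(conn: sqlite3.Connection) -> dict[str, float]:
    """Projection: replay the log to get the remaining size of each open order."""
    remaining: dict[str, float] = {}
    rows = conn.execute("SELECT kind, order_id, payload FROM events ORDER BY seq")
    for kind, order_id, payload in rows:
        data = json.loads(payload)
        if kind == "order_submitted":
            remaining[order_id] = data["size"]
        elif kind == "fill" and order_id in remaining:
            remaining[order_id] -= data["size"]
            if remaining[order_id] <= 0:
                del remaining[order_id]
        elif kind in ("order_cancelled", "order_rejected"):
            remaining.pop(order_id, None)
    return remaining
```

Every call to `open_orders` rescans the whole log, which is exactly the cost Young's point is about. A longer-lived system would maintain the projection incrementally or snapshot it.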

A trade-off table for the choices that mattered

Most of the decisions came down to picking one of two reasonable things. Here is the short version.

Decision	Option A	Option B	Choice
Bus delivery for high-volume market data	Unbounded queue	Coalesce, drop stale	Coalesce
Bus delivery for orders, fills, rejections	Coalesce	Unbounded queue	Unbounded
Risk checks	Topic + handler	Synchronous function call inside executor	Synchronous
Backend submit	Publish, async ack	Direct call, await response	Direct
Durable event log	Replay-the-bus	Separate append-only sink	Separate
Strategy ↔ executor coupling	Direct method calls	Bus topic with explicit command semantics	Bus + command intent
Reconciliation strategy	Block on venue confirmation	Optimistic fill, correct from user-feed later	Optimistic with correction

None of those is novel. Each one has a name in the literature. The work was knowing which name went where.

What broke

The system did not fall over in production in any spectacular way. The thing that broke was the development process itself.

The first version looked clean because every file was readable in isolation. The problem only surfaced when I tried to add paper trading and a second strategy. Suddenly the strategy knew too much about the executor, the executor owned too much risk logic, and swapping one part meant touching three others.

Most of the code was written with heavy AI assistance. The default the model kept reaching for was direct function calls between modules. Strategies imported the executor and called it directly. Risk checks lived as methods on the executor itself. Reconciliation mutated the executor’s state object in place. Every diff was locally readable, and the call graph was a hairball.

This is the AI-assistant failure mode I now watch for: the model optimises for the local readable change because it is looking at one file at a time. It does not have a reason to enforce architectural seams. It produces code that works in the small and tangles in the large.

How it was fixed

The fix was discipline, not cleverness.

I split the system into explicit engines: a feed engine, a cache, a strategy engine, an execution engine, a reconciliation service, a boundary service, a resolution service, and a metrics collector. Each engine has a single responsibility and a clean interface. They communicate through the bus by default, and where they communicate directly (executor calling risk checks, executor calling backend submit) the dependency is explicit and injected at construction time.

Dependency injection is the lever. The trading node owns one message bus and one set of engines, and it injects them into each other at startup. Strategies do not import the executor. The executor does not import the venue’s HTTP client; it gets handed a backend object. Risk checks are a separate engine with their own state, not a method on the executor.
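Here is a sketch of that wiring. Every name in it is hypothetical: `TradingNode`, `PaperBackend`, `LiveBackend`, `Risk`, `Executor`, and `build_node`. It is only meant to show that the paper/live switch happens once, at construction, and that nothing downstream imports a concrete backend or the venue client. It is synchronous and omits the bus to stay short.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Backend(Protocol):
    name: str

    def submit(self, order: dict) -> dict: ...


class PaperBackend:
    name = "paper"

    def submit(self, order: dict) -> dict:
        return {"status": "filled", **order}  # simulated immediate fill


class LiveBackend:
    name = "live"

    def __init__(self, client: object) -> None:
        self._client = client  # the venue HTTP client lives only here

    def submit(self, order: dict) -> dict:
        raise NotImplementedError("would call the venue through self._client")


@dataclass
class Risk:
    max_size: float

    def check(self, order: dict) -> bool:
        return order["size"] <= self.max_size


@dataclass
class Executor:
    risk: Risk        # injected, not imported
    backend: Backend  # injected, not imported

    def execute(self, order: dict) -> dict:
        if not self.risk.check(order):
            return {"status": "rejected", **order}
        return self.backend.submit(order)


@dataclass
class TradingNode:
    executor: Executor
    strategies: list = field(default_factory=list)


def build_node(mode: str, max_size: float = 100.0, client: object = None) -> TradingNode:
    # The only place that knows which backend is in play.
    backend: Backend = PaperBackend() if mode == "paper" else LiveBackend(client)
    return TradingNode(executor=Executor(Risk(max_size), backend))
```

A test can build an `Executor` with any object that has `name` and `submit`, which is the "fake set of collaborators" idea in practice.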

The ReconciliationService is the cleanest example of what this discipline buys you. The executor still applies fills optimistically the moment it submits, but the reconciliation service holds onto each one and matches it against the venue’s user_fills topic when the truth arrives:

```python
def register_fill(self, fill: Fill, order: OrderRequest) -> None:
    """Register an assumed fill for future reconciliation."""
    if not fill.clob_order_id:
        return
    buffered = self._buffered.pop(fill.clob_order_id, None)
    if buffered:  # venue beat us
        # asyncio keeps only weak references to tasks; production code should
        # hold this task somewhere (e.g. a set) so it cannot be garbage-collected.
        asyncio.create_task(self.reconcile(fill, buffered, order))
    else:  # we beat the venue
        pending = (fill, order, self._clock.now())
        self._pending[fill.clob_order_id] = pending
```

Most of the time both buffers are empty within milliseconds. The interesting bugs live in the times they aren’t.
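The excerpt shows only the executor's side of the handshake. Below is a self-contained sketch of both sides, assuming the venue's `user_fills` messages carry the same `clob_order_id`. `FillMatcher`, `on_user_fill`, and the correction tuples are hypothetical. The real service also tracks the order and a timestamp, and it runs `reconcile` as a task rather than inline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    clob_order_id: str
    size: float
    price: float


class FillMatcher:
    """Pairs assumed fills with venue-reported fills, whichever arrives first."""

    def __init__(self, tolerance: float = 1e-9) -> None:
        self._pending: dict[str, Fill] = {}   # assumed, waiting for the venue
        self._buffered: dict[str, Fill] = {}  # reported, waiting for the executor
        self._tol = tolerance
        self.corrections: list[tuple[str, float, float]] = []

    def register_fill(self, assumed: Fill) -> None:
        reported = self._buffered.pop(assumed.clob_order_id, None)
        if reported is not None:  # venue beat us
            self._reconcile(assumed, reported)
        else:                     # we beat the venue
            self._pending[assumed.clob_order_id] = assumed

    def on_user_fill(self, reported: Fill) -> None:
        assumed = self._pending.pop(reported.clob_order_id, None)
        if assumed is not None:   # executor registered first
            self._reconcile(assumed, reported)
        else:                     # venue report arrived first
            self._buffered[reported.clob_order_id] = reported

    def _reconcile(self, assumed: Fill, reported: Fill) -> None:
        # Record a correction only when the optimistic view was wrong.
        size_diff = reported.size - assumed.size
        price_diff = reported.price - assumed.price
        if abs(size_diff) > self._tol or abs(price_diff) > self._tol:
            self.corrections.append((assumed.clob_order_id, size_diff, price_diff))
```

The interesting bugs live in entries that never get matched. A real service would need a sweep that times them out, which this sketch omits.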

What this does in practice:

Every engine can be unit-tested by handing it a fake bus and a fake set of collaborators. No global state. No “import the whole world to run one test”.
Switching between paper trading and live trading is a backend swap at startup, not a change anywhere downstream.
New strategies are added by writing a class that subscribes to market data and publishes orders. The executor never learns about them.
Reconciliation can be tested with synthesised user-fill streams against a fake executor state.

The architectural lesson I would not have learned without doing this build is that AI-assisted development pushes hardest in the wrong direction at exactly the moment when the right move costs the most to enforce. Every “could you just call this directly” felt cheaper at the time and would have been five times more expensive to undo six months later.

Performance, cost, and reliability notes

I have not run formal benchmarks I would publish. Qualitatively, the bus comfortably handles the rate of Polymarket order-book updates I see, even with coalescing turned off on the critical topics, because order-book churn on prediction markets is much lower than in equities or perpetual futures. The append-only SQLite log is the only piece I would worry about under sustained load. At high event rates, batched commits or a switch to a log-structured store would matter. None of this is at LMAX scale, and the post would be dishonest if it implied otherwise.
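If the SQLite log did become the bottleneck, batching is the obvious first thing to try. Events are buffered in memory and written with one `executemany` and one commit per batch, instead of one transaction per event. This is a sketch with the standard-library `sqlite3` module; `BatchedLog`, the batch size, and the two-column schema are assumptions. The trade-off is durability: anything still in the buffer is lost if the process dies before `flush()`.

```python
import sqlite3


class BatchedLog:
    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self._conn = conn
        self._batch_size = batch_size
        self._buffer: list[tuple[str, str]] = []
        conn.execute("CREATE TABLE IF NOT EXISTS events (kind TEXT, payload TEXT)")

    def append(self, kind: str, payload: str) -> None:
        self._buffer.append((kind, payload))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        with self._conn:  # one transaction, one commit for the whole batch
            self._conn.executemany(
                "INSERT INTO events (kind, payload) VALUES (?, ?)", self._buffer
            )
        self._buffer.clear()
```

Calling `flush()` on a timer and at shutdown bounds how much can be lost.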

The most reliable single decision was the coalescing policy. The strategies do not need every book update; they need the latest one. Dropping intermediate snapshots when the consumer falls behind keeps the system from queueing useless work and protects the hot path. The opposite call on orders and fills (never drop, ever) keeps the executor’s view of the world consistent with the venue’s.

Where this hybrid shows up in practice

The same three-layer shape shows up wherever automation systems have to be reliable.

Automation system	Reactive edge (bus / stream)	Synchronous decision layer	Append-only memory
Trading bot	Market data, order book, user fills	Strategy intent → risk → submit	Event log of fills, rejects, reconciliations
Voice AI workflow	ASR partials, call state, agent transcripts	Routing, eligibility, payment decisions	Call transcript log, audit of decisions
Document processing pipeline	Ingestion queues, OCR events, classifier outputs	Extraction rules, validation, escalation	Decision audit trail per document
Internal agent system	Tool-call results, context updates, user inputs	Orchestrator, policy checks, action dispatch	Run history, prompt and response log
RPA / scheduled workflow	Triggers, source-system events, queue items	Business rule engine, idempotency checks	Run log with inputs, outputs, retries

Different stacks, same logical shape. The reactive edge keeps the system responsive. The synchronous layer makes the decisions that have to be right. The append-only memory makes the system explicable after the fact.

Sam Newman’s orchestration-vs-choreography distinction maps onto this directly. Choreography (events flowing between services) lives on the reactive edge. Orchestration (a service that knows the sequence and tells participants what to do) lives in the decision layer. Both are needed, but they belong in different parts of the system. The saga pattern with compensating actions bridges them when state spans services. Backpressure is the unnamed thing the coalescing policy implements.

The 2025 vendor narrative pushes Kafka-shaped event streaming as the architecture for everything an automation system might need. It is half right. The reactive edges of an automation system benefit enormously from streaming. The decision points do not, and treating them as event chains is what Fowler called the passive-aggressive command trap and what Bernd Ruecker has spent years arguing the field keeps reinventing. The durable record is not the bus either, and treating it as the bus loses the ability to query state cheaply.

When I look at the automation systems I have helped teams ship across voice, documents, and internal tooling, the ones that survive past launch look like the trading bot. The ones that struggle look like the vendor pitch.

The shorter version: pick the pattern by the job, not by the marketing.

What I would do differently

A handful of decisions I would revisit on the next build.

The metrics collector writes to SQLite from inside the executor’s process. That works for a single-node bot. The next version should treat the durable log as an explicit out-of-process sink, even at small scale, because the development cost of swapping it later is non-trivial and the code is the same shape either way.

I leaned on the in-process bus too long before adding tracing. Adding correlation IDs once the system was already running was painful. Adding them on day one, with every published message tagged, would have cost an hour and saved a week.
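Here is what tagging every published message on day one might look like. Each message gets an envelope with two IDs. The correlation ID follows one decision end to end, from book update to order to fill. The causation ID points at the message that triggered this one. The `Envelope` type, `make_envelope`, `handle`, and the `contextvars`-based propagation are a sketch under assumptions, not the bot's code.

```python
import contextvars
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Envelope:
    topic: str
    body: Any
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    correlation_id: str = ""            # shared by every message in one decision chain
    causation_id: Optional[str] = None  # msg_id of the message that triggered this one


# The envelope currently being handled; each asyncio task sees its own copy.
_current: contextvars.ContextVar = contextvars.ContextVar("current", default=None)


def make_envelope(topic: str, body: Any) -> Envelope:
    """Inherit the correlation ID from the message being handled, or start a new chain."""
    parent = _current.get()
    if parent is None:
        msg_id = uuid.uuid4().hex
        return Envelope(topic, body, msg_id=msg_id, correlation_id=msg_id)
    return Envelope(topic, body, correlation_id=parent.correlation_id,
                    causation_id=parent.msg_id)


def handle(env: Envelope, handler: Callable[[Envelope], Any]) -> Any:
    """Run a handler with env as the current message, so anything it publishes is linked."""
    token = _current.set(env)
    try:
        return handler(env)
    finally:
        _current.reset(token)
```

If the bus wraps every handler call in `handle`, no engine has to pass IDs around by hand.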

Risk checks live in the executor’s process. For a single-strategy bot that is fine. For a multi-strategy version sharing one risk envelope, risk should be a separate engine with its own bus topic and its own state. I drew that line at the wrong place the first time.

The boundary service is good. The resolution service is good. The reconciliation service is good. The metrics collector is the one I would re-design from scratch.

If you’re building this

If you’re building automation that has to make decisions in production, the conversation worth having is which patterns belong in which layer, not which vendor’s stream gets the contract. I help teams move from impressive demos to systems that survive contact with users.

Caveats and claim-safety notes

Personal project, not exchange-scale infrastructure. The architecture lesson generalises; the throughput numbers do not.
No PnL, latency, or fill-rate figures are claimed. The reliability claims are qualitative.
The “AI-assisted development biases toward function calls” claim is an observation from this build, not a controlled study. Treat it as a hypothesis worth testing on your own.

References

Canonical / empirical:

Patterns / supporting:

Vendor and AI-context references (cited as the perspective the post pushes back on):

Polymarket / CLOB context:
