how to quote a solana swap in under 5ms
jupiter is quoting in 400ms.
mobula in 5.
how?
the fetch-on-quote floor.
to quote a route on solana you need: the candidate pools for tokenIn → tokenOut, their full state (vault balances, AMM config, CLMM tick bitmap, current sqrt_price, observation slots, meteora bin arrays, pumpfun bonding curve params), the decoded layout per protocol, and the resolved account list to encode the instruction. a 3-hop CLMM route easily references 30–50 accounts.
getMultipleAccounts caps at 100 keys per call and is rate-limited by every credible rpc. each call is one network round-trip plus rocksdb i/o on the validator: 80–150ms in the best regional case. you parallelize 2–3 calls, take the max latency under Promise.all, deserialize 30 buffers against IDLs, score n paths through per-protocol math, build the calldata, encode the address lookup tables, send back the http response. the floor isn't a bug in jupiter's code — it's the physical lower bound of any architecture that reads pool state from an rpc at request time. that's why their p50 hovers around 400ms and the p-min never breaks ~300.
the move: every active pool, in ram, on every api instance.
the pivot is mundane in principle and operationally hard. instead of resolving state when a quote arrives, you keep it resident. every api pod maintains an in-memory store keyed by pool address. lookup is Map.get(address): sub-microsecond, no syscall, no allocation in the hot path. miss falls through to redis, then postgres, then on-chain re-resolution — but the hot path is ram.
the hard part is keeping that ram coherent across 30+ api instances, 5000+ swaps/s on solana alone, with no per-message global synchronization. ingest services own the chain — websocket subs, yellowstone grpc, raw account subs for clmm tick updates. they decode each event with the right protocol math, persist to postgres, then publish a delta on a pub/sub channel. every api pod subscribes and updates its local map. the on-the-wire payload is shaped so a consumer can decide whether to parse the message before parsing it — at 5000 swaps/s fanned out across 200+ subscribers, an unconditional JSON.parse on every delivery would burn the entire cpu budget of every api pod. early-skip is the difference between viable and not.
routing once state is local.
phase 1, enumeration: for tokenIn → tokenOut, intersect the pool index by side. paths can go through any intermediate token in principle, but in practice you generate length 1, 2, 3 paths through the usual majors (SOL, USDC, USDT) and prune by combined liquidity to keep the candidate set bounded. exotic intermediates get tried only when no major-routed path clears the threshold.
phase 2, scoring: simulate each path hop-by-hop with the protocol-correct math. raydium clmm walks the tick bitmap with computeSwapStep, ceiling-rounds amount_in and floor-rounds amount_out per the cl-amm spec; pumpfun applies the bonding curve as f(supply, sol_in); meteora dlmm steps bin by bin; constant-product is x·y = k. all of this is straight arithmetic on data already in cache lines — no awaits, no rpc.
phase 3, splits: for parallel multi-leg routes, ternary search over basis-point allocations. seven iterations converges within 1bp on smooth curves; for clmm with active liquidity discontinuities, a finer linear sweep on the suspect range catches the kinks.
warm: 2–4ms end-to-end through all three phases. cold (first quote after boot, lazy-load pool fan-in): <50ms, dominated by parallel redis hgetall plus postgres for the long tail.
calldata without rpc.
the versioned transaction needs every account address pre-encoded. for each pool in the route, the cache already holds vault A, vault B, AMM config, observation, tick array PDAs, etc. — zero getAccountInfo. the three things you can't pin in ram, you keep on background refreshers:
• recent blockhash — refreshed every 45s. a blockhash is valid for ~150 slots (~60s); 45s sits comfortably inside the window with margin for clock skew.
• priority fee — p75 of the prioritization fees from the last 150 slots, refreshed every 30s. handed to ComputeBudgetProgram.setComputeUnitPrice.
• address lookup tables — protocol-published ALTs loaded at boot, occasionally refreshed. a greedy set-cover picks the minimum ALT subset covering the route's accounts so the encoded tx stays under the 1232-byte v0 limit.
the router instruction itself uses a compact binary layout — 21-byte header, 3 bytes per hop — so the calldata for a 3-hop route fits in ~50 bytes excluding accounts. construction takes ~1ms, dominated by min_out computation and the encoder.
the principle.
latency in a quote pipeline is bounded by the latency of the data it consumes. the quote/build phase is entirely an architectural choice. fetch-on-quote plateaus at 300ms. ram-resident state with realtime pubsub propagation runs at 3–5ms. the underlying principle is one line: you cannot be fast at consuming a piece of data you only go fetch when someone asks for it.