how to quote a solana swap in under 5ms

jupiter is quoting in 400ms. mobula in 5. how? the fetch-on-quote floor. to quote a route on solana you need: the candidate pools for tokenIn → tokenOut, their full state (vault balances, AMM config, CLMM tick bitmap, current sqrt_price, observation slots, meteora bin arrays, pumpfun bonding curve params), the decoded layout per protocol, and the resolved account list to encode the instruction. a 3-hop CLMM route easily references 30–50 accounts. getMultipleAccounts caps at 100 keys per call and is rate-limited by every credible rpc. each call is one network round-trip plus rocksdb i/o on the validator: 80–150ms in the best regional case. you parallelize 2–3 calls, take the max latency under Promise.all, deserialize 30 buffers against IDLs, score n paths through per-protocol math, build the calldata, encode the address lookup tables, send back the http response. the floor isn't a bug in jupiter's code — it's the physical lower bound of any architecture that reads pool state from an rpc at request time. that's why their p50 hovers around 400ms and the p-min never breaks ~300. the move: every active pool, in ram, on every api instance. the pivot is mundane in principle and operationally hard. instead of resolving state when a quote arrives, you keep it resident. every api pod maintains an in-memory store keyed by pool address. lookup is Map.get(address): sub-microsecond, no syscall, no allocation in the hot path. miss falls through to redis, then postgres, then on-chain re-resolution — but the hot path is ram. the hard part is keeping that ram coherent across 30+ api instances, 5000+ swaps/s on solana alone, with no per-message global synchronization. ingest services own the chain — websocket subs, yellowstone grpc, raw account subs for clmm tick updates. they decode each event with the right protocol math, persist to postgres, then publish a delta on a pub/sub channel. every api pod subscribes and updates its local map. the on-the-wire payload is shaped so a consumer can decide whether to parse the message before parsing it — at 5000 swaps/s fanned out across 200+ subscribers, an unconditional JSON.parse on every delivery would burn the entire cpu budget of every api pod. early-skip is the difference between viable and not. routing once state is local. phase 1, enumeration: for tokenIn → tokenOut, intersect the pool index by side. paths can go through any intermediate token in principle, but in practice you generate length 1, 2, 3 paths through the usual majors (SOL, USDC, USDT) and prune by combined liquidity to keep the candidate set bounded. exotic intermediates get tried only when no major-routed path clears the threshold. phase 2, scoring: simulate each path hop-by-hop with the protocol-correct math. raydium clmm walks the tick bitmap with computeSwapStep, ceiling-rounds amount_in and floor-rounds amount_out per the cl-amm spec; pumpfun applies the bonding curve as f(supply, sol_in); meteora dlmm steps bin by bin; constant-product is x·y = k. all of this is straight arithmetic on data already in cache lines — no awaits, no rpc. phase 3, splits: for parallel multi-leg routes, ternary search over basis-point allocations. seven iterations converges within 1bp on smooth curves; for clmm with active liquidity discontinuities, a finer linear sweep on the suspect range catches the kinks. warm: 2–4ms end-to-end through all three phases. cold (first quote after boot, lazy-load pool fan-in): <50ms, dominated by parallel redis hgetall plus postgres for the long tail. calldata without rpc. the versioned transaction needs every account address pre-encoded. for each pool in the route, the cache already holds vault A, vault B, AMM config, observation, tick array PDAs, etc. — zero getAccountInfo. the three things you can't pin in ram, you keep on background refreshers: • recent blockhash — refreshed every 45s. a blockhash is valid for ~150 slots (~60s); 45s sits comfortably inside the window with margin for clock skew. • priority fee — p75 of the prioritization fees from the last 150 slots, refreshed every 30s. handed to ComputeBudgetProgram.setComputeUnitPrice. • address lookup tables — protocol-published ALTs loaded at boot, occasionally refreshed. a greedy set-cover picks the minimum ALT subset covering the route's accounts so the encoded tx stays under the 1232-byte v0 limit. the router instruction itself uses a compact binary layout — 21-byte header, 3 bytes per hop — so the calldata for a 3-hop route fits in ~50 bytes excluding accounts. construction takes ~1ms, dominated by min_out computation and the encoder. the principle. latency in a quote pipeline is bounded by the latency of the data it consumes. the quote/build phase is entirely an architectural choice. fetch-on-quote plateaus at 300ms. ram-resident state with realtime pubsub propagation runs at 3–5ms. the underlying principle is one line: you cannot be fast at consuming a piece of data you only go fetch when someone asks for it.

More in Tech

Sacha Delhoux