Inside Cortex: The Engineering Behind Sub-50ms, 99.995% Uptime Infrastructure
Author: Alchemy

Reliable blockchain connectivity is non-negotiable. Cortex is the intelligent blockchain engine powering the Alchemy developer platform: RPC APIs, data APIs, rollups, and the infrastructure underneath all of them.
It was built from first principles, trained on trillions of requests and more than seven years of data, with one goal: deliver infrastructure that performs reliably at any scale.
The result is sub-50ms average response times, 99.995% uptime, and throughput that scales to tens of millions of users without migrations, new tooling, or configuration changes on your end.
This post takes a closer look at the engineering decisions behind Cortex: the systems-level work across latency, throughput, and reliability that makes those numbers possible.
How does Cortex reduce latency?
Pod colocation with Kubernetes affinity
When your app sends a request, that request travels through several internal services before it gets a response. Every time it moves from one server to another, it pays a latency cost.
Pod colocation means placing the services that communicate most frequently on the same physical machine. Kubernetes supports affinity rules that make this possible, so latency-sensitive workloads run together instead of spreading across machines or racks.
The result: requests stay local, which reduces network hops and cuts response time.
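As a concrete sketch, a Kubernetes affinity rule of this shape co-schedules a service next to the backend it calls most. The service names and labels here are hypothetical, not Cortex's actual manifests; the affinity block is what does the work.

```yaml
# Hypothetical example: "request-router" and "rpc-backend" are illustrative names.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: request-router
spec:
  selector:
    matchLabels:
      app: request-router
  template:
    metadata:
      labels:
        app: request-router
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: rpc-backend          # the service it talks to most
              topologyKey: kubernetes.io/hostname   # i.e. the same physical node
      containers:
        - name: router
          image: example/request-router:latest
```

The `topologyKey` is the knob: `kubernetes.io/hostname` pins pods to the same node, while a zone-level key would relax colocation to the same availability zone.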
Istio locality-aware routing
Even within a single data center, services are distributed across nodes and racks. Istio is the networking layer that manages how internal services communicate.
With locality-aware routing, Istio knows the physical location of each service instance and routes traffic to the closest available one whenever possible. If the service handling your request is running on the same node, traffic stays there instead of crossing the network to a farther instance.
This reduces unnecessary network hops without requiring any changes to application code.
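In Istio, locality-aware routing is switched on through a DestinationRule. The sketch below is illustrative (the hostname and thresholds are assumptions, not Cortex's configuration); note that Istio requires outlier detection to be set for locality load balancing to take effect.

```yaml
# Hypothetical example: host and detection thresholds are illustrative.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: rpc-backend-locality
spec:
  host: rpc-backend.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true               # prefer endpoints in the caller's locality
    outlierDetection:               # required for locality failover to activate
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

With this in place, Envoy sidecars weight traffic toward endpoints in the same node, zone, and region, and fail over outward only when local endpoints become unhealthy.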
Microkernel proxy architecture
As infrastructure grows in complexity, tightly coupled systems become harder to change safely. Updating one component can introduce risk across the rest of the stack.
A microkernel approach keeps the core small and stable, responsible only for the critical path. Everything else, including logging, rate limiting, authentication, and routing logic, is implemented as modular components layered on top.
Our proxy layer sits between incoming requests and backend systems. Because it is built on a microkernel model, individual modules can be updated and deployed independently without touching the core request path. That makes it easier to iterate quickly while reducing the risk of regressions.
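The core idea can be shown in a few lines. This is a minimal sketch of the microkernel pattern, not Cortex's proxy: the kernel only forwards requests through a module chain, and the module names (authentication, rate checking) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class MicrokernelProxy {
    // The "kernel": it knows nothing about individual modules, only how
    // to pass a request through whatever chain is currently installed.
    private final List<UnaryOperator<String>> modules = new ArrayList<>();

    // Modules plug in at runtime; each can be added, replaced, or removed
    // without touching the forwarding path below.
    public void install(UnaryOperator<String> module) {
        modules.add(module);
    }

    public String handle(String request) {
        String r = request;
        for (UnaryOperator<String> module : modules) {
            r = module.apply(r);   // each module transforms or annotates the request
        }
        return r;                  // a real proxy would now forward this upstream
    }

    public static void main(String[] args) {
        MicrokernelProxy proxy = new MicrokernelProxy();
        proxy.install(req -> req + " [authenticated]");
        proxy.install(req -> req + " [rate-checked]");
        System.out.println(proxy.handle("eth_blockNumber"));
        // prints: eth_blockNumber [authenticated] [rate-checked]
    }
}
```

Because the kernel's contract never changes, swapping a module is a local deployment rather than a change to the request path itself.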
Direct-to-datacenter routing
For enterprise workloads, routing traffic over the public internet introduces variability: inconsistent latency, packet loss, and congested paths that shift based on conditions outside your control.
Direct-to-datacenter routing avoids that by sending enterprise traffic over private, optimized network paths that connect directly to our infrastructure. This removes the unpredictability of shared public routing and delivers more consistent low-latency performance at every hop.
How does Cortex scale?
Thousands of globally deployed bare-metal servers
Most cloud infrastructure runs on virtual machines. Virtualization is flexible, but it adds overhead because CPU, memory bandwidth, and I/O are shared with other workloads on the same host.
Bare-metal means running directly on physical hardware with no hypervisor in between. Every resource is dedicated to your requests. That removes the performance variability of shared environments and gives Cortex consistent throughput under heavy load.
Java Virtual Threads
Handling large numbers of concurrent requests puts pressure on a server's threading model. Traditional OS-level threads are memory-intensive, and the cost of context switching grows as concurrency rises.
Java Virtual Threads, introduced in Java 21, are lightweight threads managed by the JVM rather than the operating system. The cost of creating and switching between them is far lower than traditional threads, which lets a single machine handle far more concurrent tasks without the memory and CPU overhead that usually limits throughput.
In practice, that means more efficient use of existing hardware and more headroom to absorb traffic spikes without degrading performance.
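A minimal sketch of the pattern (illustrative, not Cortex's code): each task gets its own virtual thread via `Executors.newVirtualThreadPerTaskExecutor()`, and a blocked virtual thread unmounts from its OS carrier thread instead of occupying it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    // Run `tasks` blocking jobs concurrently, one virtual thread per task,
    // and return how many completed.
    static int runBlockingTasks(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // Simulated blocking I/O: while sleeping, the virtual thread
                    // unmounts from its carrier, freeing the OS thread for others.
                    try { Thread.sleep(5); } catch (InterruptedException e) { }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        // 10,000 concurrent blocking tasks would exhaust a platform-thread pool;
        // virtual threads run them on a handful of OS carrier threads.
        System.out.println(runBlockingTasks(10_000)); // prints 10000
    }
}
```

The same code with platform threads would need roughly a megabyte of stack per thread; virtual threads start with stacks of a few hundred bytes that grow on demand.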
Predictive scaling logic
Standard auto-scaling watches a threshold like CPU usage and adds capacity after that threshold is crossed. That makes it reactive by design: demand rises first, infrastructure catches up second.
Our scaling logic monitors multiple signals at once, including CPU, memory, disk I/O, and traffic patterns, then uses historical data to anticipate spikes before they happen. Capacity is provisioned ahead of demand instead of after it, which narrows the gap between a traffic increase and the infrastructure's ability to absorb it.
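The difference from threshold-based autoscaling can be sketched in a few lines. This is a deliberately simplified illustration, not Cortex's actual model: real predictive scaling would use trained models over many signals, while this toy version extrapolates from a fixed hourly traffic profile.

```java
public class PredictiveScaler {
    // hourlyFactor[h]: fraction of daily peak traffic typically seen at hour h.
    // (Illustrative stand-in for learned historical patterns.)
    private final double[] hourlyFactor;
    private final double headroom;   // fraction of spare capacity to keep

    PredictiveScaler(double[] hourlyFactor, double headroom) {
        this.hourlyFactor = hourlyFactor;
        this.headroom = headroom;
    }

    // Provision for the *next* hour's expected load, not the current one:
    // that is what makes the scaler proactive instead of reactive.
    int desiredReplicas(int currentRps, int rpsPerReplica, int hourOfDay) {
        double now = hourlyFactor[hourOfDay];
        double next = hourlyFactor[(hourOfDay + 1) % 24];
        double predictedRps = currentRps * (next / now);
        return (int) Math.ceil(predictedRps * (1 + headroom) / rpsPerReplica);
    }

    public static void main(String[] args) {
        double[] profile = new double[24];
        java.util.Arrays.fill(profile, 1.0);
        profile[8] = 0.5;   // traffic historically doubles between 08:00 and 09:00
        PredictiveScaler scaler = new PredictiveScaler(profile, 0.25);
        // At 08:00 with 1,000 RPS: expect 2,000 RPS next hour, plus 25% headroom,
        // at 100 RPS per replica -> provision 25 replicas now.
        System.out.println(scaler.desiredReplicas(1000, 100, 8)); // prints 25
    }
}
```

A reactive scaler would still be running the replica count sized for 1,000 RPS when the 2,000 RPS hour arrives; the predictive version has already closed that gap.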
AI-managed node fleet
Our node fleet is managed by a fully automated agentic system that handles upgrades, testing, and monitoring. Updates are detected and applied in real time, with automated testing built into the pipeline, so mandatory network upgrades and hard forks are never missed.
That improves uptime and removes the risk of human delay when critical chain events happen.
Why do these systems compound?
These are not isolated optimizations. Pod colocation reduces latency. Locality-aware routing reduces it further. The microkernel architecture makes it possible to improve both without introducing instability. Bare-metal hardware gives Java Virtual Threads the resources they need to operate efficiently. Predictive scaling makes sure capacity is available before it is needed.
Each layer was designed to make the layers above it faster and more resilient. That is what it means to rethink infrastructure from first principles, and that is what Cortex is built on.
All of this runs under the hood. There is no migration, no new tooling, and no configuration required on your end. We keep building the infrastructure so you can keep building your product.
Start building today and contact sales for custom pricing, integration questions, and more.