Inside Cortex: The Engineering Behind Sub-50ms, 99.995% Uptime Infrastructure
Author: Alchemy

Reliable blockchain connectivity is non-negotiable. Cortex is the intelligent blockchain engine powering the Alchemy developer platform: RPC APIs, data APIs, rollups, and the infrastructure underneath all of them.
It was built from first principles, trained on trillions of requests and more than seven years of data, with one goal: deliver infrastructure that performs reliably at any scale.
The result is sub-50ms average response times, 99.995% uptime, and throughput that scales to tens of millions of users without migrations, new tooling, or configuration changes on your end.
This post takes a closer look at the engineering decisions behind Cortex: the systems-level work across latency, throughput, and reliability that makes those numbers possible.
How does Cortex reduce latency?
Pod colocation with Kubernetes affinity
When your app sends a request, that request travels through several internal services before it gets a response. Every time it moves from one server to another, it pays a latency cost.
Pod colocation means placing the services that communicate most frequently on the same physical machine. Kubernetes supports affinity rules that make this possible, so latency-sensitive workloads run together instead of spreading across machines or racks.
The result: requests stay local, which reduces network hops and cuts response time.
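As a concrete sketch, a Kubernetes affinity rule of this shape co-schedules a service next to the backend it calls most. The service names and labels here are hypothetical, not Cortex's actual manifests; the affinity block is what does the work.

```yaml
# Hypothetical example: "request-router" and "rpc-backend" are illustrative names.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: request-router
spec:
  selector:
    matchLabels:
      app: request-router
  template:
    metadata:
      labels:
        app: request-router
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: rpc-backend          # the service it talks to most
              topologyKey: kubernetes.io/hostname   # i.e. the same physical node
      containers:
        - name: router
          image: example/request-router:latest
```

The `topologyKey` is the knob: `kubernetes.io/hostname` pins pods to the same node, while a zone-level key would relax colocation to the same availability zone.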
Istio locality-aware routing
Even within a single data center, services are distributed across nodes and racks. Istio is the networking layer that manages how internal services communicate.
With locality-aware routing, Istio knows the physical location of each service instance and routes traffic to the closest available one whenever possible. If the service handling your request is running on the same node, traffic stays there instead of crossing the network to a farther instance.
This reduces unnecessary network hops without requiring any changes to application code.
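In Istio, locality-aware routing is switched on through a DestinationRule. The sketch below is illustrative (the hostname and thresholds are assumptions, not Cortex's configuration); note that Istio requires outlier detection to be set for locality load balancing to take effect.

```yaml
# Hypothetical example: host and detection thresholds are illustrative.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: rpc-backend-locality
spec:
  host: rpc-backend.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true               # prefer endpoints in the caller's locality
    outlierDetection:               # required for locality failover to activate
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

With this in place, Envoy sidecars weight traffic toward endpoints in the same node, zone, and region, and fail over outward only when local endpoints become unhealthy.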
Microkernel proxy architecture
As infrastructure grows in complexity, tightly coupled systems become harder to change safely. Updating one component can introduce risk across the rest of the stack.
A microkernel approach keeps the core small and stable, responsible only for the critical path. Everything else, including logging, rate limiting, authentication, and routing logic, is implemented as modular components layered on top.
Our proxy layer sits between incoming requests and backend systems. Because it is built on a microkernel model, individual modules can be updated and deployed independently without touching the core request path. That makes it easier to iterate quickly while reducing the risk of regressions.
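The core idea can be shown in a few lines. This is a minimal sketch of the microkernel pattern, not Cortex's proxy: the kernel only forwards requests through a module chain, and the module names (authentication, rate checking) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class MicrokernelProxy {
    // The "kernel": it knows nothing about individual modules, only how
    // to pass a request through whatever chain is currently installed.
    private final List<UnaryOperator<String>> modules = new ArrayList<>();

    // Modules plug in at runtime; each can be added, replaced, or removed
    // without touching the forwarding path below.
    public void install(UnaryOperator<String> module) {
        modules.add(module);
    }

    public String handle(String request) {
        String r = request;
        for (UnaryOperator<String> module : modules) {
            r = module.apply(r);   // each module transforms or annotates the request
        }
        return r;                  // a real proxy would now forward this upstream
    }

    public static void main(String[] args) {
        MicrokernelProxy proxy = new MicrokernelProxy();
        proxy.install(req -> req + " [authenticated]");
        proxy.install(req -> req + " [rate-checked]");
        System.out.println(proxy.handle("eth_blockNumber"));
        // prints: eth_blockNumber [authenticated] [rate-checked]
    }
}
```

Because the kernel's contract never changes, swapping a module is a local deployment rather than a change to the request path itself.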
Direct-to-datacenter routing
For enterprise workloads, routing traffic over the public internet introduces variability: inconsistent latency, packet loss, and congested paths that shift based on conditions outside your control.
Direct-to-datacenter routing avoids that by sending enterprise traffic over private, optimized network paths that connect directly to our infrastructure. This removes the unpredictability of shared public routing and delivers more consistent low-latency performance at every hop.
How does Cortex scale?
Thousands of globally deployed bare-metal servers
Most cloud infrastructure runs on virtual machines. Virtualization is flexible, but it adds overhead because CPU, memory bandwidth, and I/O are shared with other workloads on the same host.
Bare-metal means running directly on physical hardware with no hypervisor in between. Every resource is dedicated to your requests. That removes the performance variability of shared environments and gives Cortex consistent throughput under heavy load.
Java Virtual Threads
Handling large numbers of concurrent requests puts pressure on a server's threading model. Traditional OS-level threads are memory-intensive, and the cost of context switching grows as concurrency rises.
Java Virtual Threads, introduced in Java 21, are lightweight threads managed by the JVM rather than the operating system. The cost of creating and switching between them is far lower than traditional threads, which lets a single machine handle far more concurrent tasks without the memory and CPU overhead that usually limits throughput.
In practice, that means more efficient use of existing hardware and more headroom to absorb traffic spikes without degrading performance.
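A minimal sketch of the pattern (illustrative, not Cortex's code): each task gets its own virtual thread via `Executors.newVirtualThreadPerTaskExecutor()`, and a blocked virtual thread unmounts from its OS carrier thread instead of occupying it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    // Run `tasks` blocking jobs concurrently, one virtual thread per task,
    // and return how many completed.
    static int runBlockingTasks(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // Simulated blocking I/O: while sleeping, the virtual thread
                    // unmounts from its carrier, freeing the OS thread for others.
                    try { Thread.sleep(5); } catch (InterruptedException e) { }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        // 10,000 concurrent blocking tasks would exhaust a platform-thread pool;
        // virtual threads run them on a handful of OS carrier threads.
        System.out.println(runBlockingTasks(10_000)); // prints 10000
    }
}
```

The same code with platform threads would need roughly a megabyte of stack per thread; virtual threads start with stacks of a few hundred bytes that grow on demand.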
Predictive scaling logic
Standard auto-scaling watches a threshold like CPU usage and adds capacity after that threshold is crossed. That makes it reactive by design: demand rises first, infrastructure catches up second.
Our scaling logic monitors multiple signals at once, including CPU, memory, disk I/O, and traffic patterns, then uses historical data to anticipate spikes before they happen. Capacity is provisioned ahead of demand instead of after it, which narrows the gap between a traffic increase and the infrastructure's ability to absorb it.
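The difference from threshold-based autoscaling can be sketched in a few lines. This is a deliberately simplified illustration, not Cortex's actual model: real predictive scaling would use trained models over many signals, while this toy version extrapolates from a fixed hourly traffic profile.

```java
public class PredictiveScaler {
    // hourlyFactor[h]: fraction of daily peak traffic typically seen at hour h.
    // (Illustrative stand-in for learned historical patterns.)
    private final double[] hourlyFactor;
    private final double headroom;   // fraction of spare capacity to keep

    PredictiveScaler(double[] hourlyFactor, double headroom) {
        this.hourlyFactor = hourlyFactor;
        this.headroom = headroom;
    }

    // Provision for the *next* hour's expected load, not the current one:
    // that is what makes the scaler proactive instead of reactive.
    int desiredReplicas(int currentRps, int rpsPerReplica, int hourOfDay) {
        double now = hourlyFactor[hourOfDay];
        double next = hourlyFactor[(hourOfDay + 1) % 24];
        double predictedRps = currentRps * (next / now);
        return (int) Math.ceil(predictedRps * (1 + headroom) / rpsPerReplica);
    }

    public static void main(String[] args) {
        double[] profile = new double[24];
        java.util.Arrays.fill(profile, 1.0);
        profile[8] = 0.5;   // traffic historically doubles between 08:00 and 09:00
        PredictiveScaler scaler = new PredictiveScaler(profile, 0.25);
        // At 08:00 with 1,000 RPS: expect 2,000 RPS next hour, plus 25% headroom,
        // at 100 RPS per replica -> provision 25 replicas now.
        System.out.println(scaler.desiredReplicas(1000, 100, 8)); // prints 25
    }
}
```

A reactive scaler would still be running the replica count sized for 1,000 RPS when the 2,000 RPS hour arrives; the predictive version has already closed that gap.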
AI-managed node fleet
Our node fleet is managed by a fully automated agentic system that handles upgrades, testing, and monitoring. Updates are detected and applied in real time, with automated testing built into the pipeline, so mandatory network upgrades and hard forks are never missed.
That improves uptime and removes the risk of human delay when critical chain events happen.
Why do these systems compound?
These are not isolated optimizations. Pod colocation reduces latency. Locality-aware routing reduces it further. The microkernel architecture makes it possible to improve both without introducing instability. Bare-metal hardware gives Java Virtual Threads the resources they need to operate efficiently. Predictive scaling makes sure capacity is available before it is needed.
Each layer was designed to make the layers above it faster and more resilient. That is what it means to rethink infrastructure from first principles, and that is what Cortex is built on.
All of this runs under the hood. There is no migration, no new tooling, and no configuration required on your end. We keep building the infrastructure so you can keep building your product.
Start building today and contact sales for custom pricing, integration questions, and more.