Real-Time Ride-Hailing Architecture: Uber & Grab

Answer-first: This series covers the high-concurrency real-time architecture required to power ride-hailing services like Uber and Grab, discussing GPS ingestion, spatial indexing, Kafka event streams, dispatch matching, dynamic pricing, and WebSocket gateways.

This series dives deep into the technical architecture behind the most critical feature of ride-hailing applications: real-time capabilities.

Seeing a car move smoothly on a map might seem simple, but behind it lies a massive distributed system: battery-optimized GPS transport protocols, hexagonal map gridding (H3), a Kafka backbone processing millions of events per second, the DISCO system for ride matching, and RAMEN, Uber’s real-time push notification network.

All content is synthesized from the official engineering blogs of Uber, Grab, and Lyft.

Series Contents

Implementation Deep Dive

Building on Part 5’s theory with a full architectural implementation:

Surge Pricing Algorithm & Spatial Indexing Architecture — End-to-end implementation of a surge pricing engine: H3 hex grid demand/supply aggregation, Kafka real-time event pipeline, Redis geospatial caching, and multiplier computation at sub-50ms latency.

Executive Summary — The Big Picture of Real-time Ride-Hailing Systems

Answer-first: Real-time ride-hailing systems process millions of location updates and matching requests per second. Using highly optimized ingestion pipelines, H3 geospatial indexing, and real-time Kafka event streams, these architectures match riders and drivers in under 100ms.

The Engineering Challenge

Imagine you are an engineer at Uber or Grab. Your system must:

- Ingest GPS coordinates from millions of drivers every 4 seconds.
- Store and index all these positions in memory so they can be queried in under 10ms.
- When a user requests a ride, find and rank the best drivers within a few kilometers, calculate the Estimated Time of Arrival (ETA) based on real-time traffic, and push the ride offer to the driver’s phone instantly, all within 2 seconds.
- Simultaneously, continuously calculate dynamic pricing (surge pricing) based on the supply-demand ratio in each area, updating every few seconds.

This is not a typical CRUD application. It is one of the most complex distributed systems in the world. ...

GPS Ingestion at Scale: gRPC Streaming, MQTT & Kalman Filter

Answer-first: High-concurrency location ingestion processes millions of concurrent GPS pings by using lightweight transport protocols (gRPC or MQTT) terminated at scalable load balancers. Routing updates to Apache Kafka and caching active coordinates in-memory (Redis) achieves sub-100ms end-to-end latency and protects core systems from write bottlenecks.

The Challenge: Millions of Drivers, Every 4 Seconds

Grab has approximately 5 million drivers operating in Southeast Asia. Uber has over 5 million drivers globally. If every driver sends a GPS coordinate every 4 seconds, the system must receive: ...

Uber H3 Geospatial Indexing: Find Nearest Driver in <100ms with Redis (Production Guide)

Answer-first: Uber and Grab find the nearest available driver in under 100ms by dividing the Earth’s surface into hexagonal cells (H3 index at Resolution 8, each ~0.74 km²). Instead of calculating the distance to every driver, they look up only the 7 cells nearest to the rider (the rider’s own cell plus its 6 neighbors), reducing millions of comparisons to dozens.

The Problem: Finding a Needle in a Haystack

When you tap “Book” on Grab, the system must find the most suitable driver within a radius of a few kilometers. But the system is tracking millions of drivers simultaneously. The naive approach, calculating the distance from you to every driver, is impossible: ...
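The 7-cell lookup described above can be illustrated with a minimal sketch. It does not call the H3 library: it uses a flat, planar hexagon grid in axial coordinates as a stand-in, and a Python dict in place of Redis. The names `point_to_cell`, `ring_cells`, `DriverIndex`, and the `HEX_SIZE_KM` value are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Assumption: circumradius of each hexagon in km. 0.53 km gives roughly
# 0.73 km^2 per cell, close to the Resolution 8 figure quoted in the text.
HEX_SIZE_KM = 0.53

# The six neighbors of a hexagon in axial (q, r) coordinates.
NEIGHBOR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def point_to_cell(x_km, y_km, size=HEX_SIZE_KM):
    """Map a planar point (in km) to the axial cell of a pointy-top hex grid."""
    q = (math.sqrt(3) / 3 * x_km - y_km / 3) / size
    r = (2 / 3 * y_km) / size
    return _cube_round(q, r)


def ring_cells(cell):
    """Return the cell itself plus its 6 neighbors: 7 cells in total."""
    q, r = cell
    return [cell] + [(q + dq, r + dr) for dq, dr in NEIGHBOR_OFFSETS]


class DriverIndex:
    """In-memory stand-in for a cell-keyed driver store."""

    def __init__(self):
        self._cells = defaultdict(dict)  # cell -> {driver_id: (x_km, y_km)}
        self._where = {}                 # driver_id -> current cell

    def update(self, driver_id, x_km, y_km):
        cell = point_to_cell(x_km, y_km)
        old = self._where.get(driver_id)
        if old is not None and old != cell:
            self._cells[old].pop(driver_id, None)  # driver left the old cell
        self._cells[cell][driver_id] = (x_km, y_km)
        self._where[driver_id] = cell

    def nearby(self, x_km, y_km):
        """Rank only the drivers found in the 7 cells around the rider."""
        candidates = []
        for cell in ring_cells(point_to_cell(x_km, y_km)):
            for driver_id, (dx, dy) in self._cells.get(cell, {}).items():
                candidates.append((math.hypot(dx - x_km, dy - y_km), driver_id))
        return [driver_id for _, driver_id in sorted(candidates)]
```

The point of the design is that `nearby` never touches drivers outside the 7 candidate cells, so its cost depends on local driver density rather than on the total number of tracked drivers.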

Kafka & Flink in Ride-Hailing: Event Streaming at Scale

Answer-first: Apache Kafka and Flink form the real-time event streaming backbone of ride-hailing architectures. By partitioning stream topics by location and processing events in stateful sliding windows, the system routes GPS locations and matches rides with sub-second latency.

Why Do We Need Event Streaming?

Millions of events occur every second in a ride-hailing system:

- Driver A updates their GPS coordinates.
- Customer B opens the app and requests a ride.
- Driver C accepts a ride offer and starts moving.
- Customer D cancels a ride.
- Surge pricing updates the multiplier in the Downtown area.

If every service called the others directly (synchronous communication), the system would become tightly coupled and fragile: one slow service would bring down the entire chain. The solution is Event Streaming: every event is pushed into a central “pipeline,” and services independently subscribe to the events they care about. ...
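As a rough illustration of the decoupling and location-based partitioning described above, here is a small in-memory sketch. It is not Kafka client code: `EventLog` and `Consumer` are hypothetical names, and the CRC32-modulo partitioning is an assumption standing in for whatever partitioner a real deployment uses. The only property it demonstrates is that events with the same location key land in the same partition, and that each consumer reads at its own pace using its own offsets.

```python
import zlib


class EventLog:
    """Tiny in-memory stand-in for a partitioned, append-only topic."""

    def __init__(self, num_partitions=4):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: a stable hash of the location key picks the partition,
        # so all events for one area stay ordered in one partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, event_type, payload):
        partition = self.partition_for(key)
        self.partitions[partition].append(
            {"key": key, "type": event_type, "payload": payload}
        )
        return partition


class Consumer:
    """A service that reads only the event types it cares about."""

    def __init__(self, log, event_types):
        self.log = log
        self.event_types = set(event_types)
        self.offsets = [0] * len(log.partitions)  # one read position per partition

    def poll(self):
        received = []
        for index, events in enumerate(self.log.partitions):
            new_events = events[self.offsets[index]:]
            self.offsets[index] = len(events)
            received.extend(e for e in new_events if e["type"] in self.event_types)
        return received


log = EventLog()
log.publish("cell-downtown", "gps_update", {"driver": "A"})
log.publish("cell-downtown", "ride_requested", {"customer": "B"})

surge_service = Consumer(log, ["ride_requested"])
tracking_service = Consumer(log, ["gps_update"])
# Each service sees only its own event types, and neither blocks the other.
```

The producer never calls a consumer directly, so a slow `surge_service` only falls behind on its own offsets instead of stalling the publisher.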

Ride-Hailing Dispatch Engine: Bipartite Matching, Uber DISCO & Grab DispatchGym (2026)

Answer-first: Dispatch and matching engines resolve spatial routing in real-time by querying active drivers within localized H3 rings. By running parallel bipartite matching algorithms, the engine pairs riders and drivers to minimize pickup ETA and passenger wait times.

Every time you tap “Book Ride,” a system makes dozens of decisions in under two seconds: Which driver? What route? What’s the real ETA? This article breaks down exactly how the dispatch algorithm works, from the greedy approach that fails at scale to the bipartite graphs, batched matching, and surge pricing mechanics that power Uber, Lyft, Grab, and Gojek today. ...
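The difference between greedy and batched bipartite matching can be seen in a toy example. The sketch below is an assumption-laden illustration, not the DISCO or DispatchGym algorithm: the function names are hypothetical, the ETA matrix is made up, and the batched matcher simply brute-forces every assignment, which is only feasible for a handful of riders. It also assumes there are at least as many drivers as riders.

```python
from itertools import permutations


def greedy_match(eta):
    """Assign each rider, in arrival order, the closest driver still free."""
    taken = set()
    pairs = []
    for rider, row in enumerate(eta):
        free = [driver for driver in range(len(row)) if driver not in taken]
        if not free:
            break
        best = min(free, key=lambda driver: row[driver])
        taken.add(best)
        pairs.append((rider, best))
    return pairs


def batch_match(eta):
    """Pick the rider-to-driver assignment with the lowest total pickup ETA.

    Brute force over all assignments; an illustration of the objective,
    not a production solver.
    """
    riders = range(len(eta))
    drivers = range(len(eta[0]))
    best = min(
        permutations(drivers, len(eta)),
        key=lambda perm: sum(eta[r][d] for r, d in zip(riders, perm)),
    )
    return list(zip(riders, best))


def total_eta(eta, pairs):
    return sum(eta[rider][driver] for rider, driver in pairs)


# Hypothetical pickup ETAs in minutes: eta[rider][driver].
eta = [
    [2, 3],   # rider 0
    [4, 10],  # rider 1
]

greedy = greedy_match(eta)   # rider 0 grabs driver 0, rider 1 is stuck with driver 1
batched = batch_match(eta)   # rider 0 takes driver 1 so rider 1 can have driver 0
print(total_eta(eta, greedy), total_eta(eta, batched))  # 12 7
```

Greedy gives rider 0 the best possible driver but leaves rider 1 with a 10-minute pickup. Considering both requests together accepts a slightly worse ETA for rider 0 and lowers the total from 12 to 7 minutes.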

Surge Pricing Algorithm: Real-Time Surge Rate Calculation

Answer-first: Surge pricing engines compute dynamic multipliers in real-time by analyzing supply-demand ratios within H3 hex cells. These engines ingest location data to update prices dynamically, balancing market availability during peak demand hours.

Series context: This is Part 5 of the Real-Time Ride-Hailing Architecture series. For location ingestion and geospatial indexing, start at Part 1.

What Is a Surge Multiplier (Surge Rate)?

A surge multiplier (or surge rate) is a dynamic price multiplier (e.g., 2.0×) that ride-hailing platforms apply automatically in real-time when the demand for rides in a specific geographic zone exceeds the available supply of drivers. For example, if the base fare is $10 and the surge multiplier is 2.0×, the rider pays $20. This multiplier is recalculated every 30–60 seconds for each localized zone (H3 hexagon cell) using Machine Learning models. ...
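To make the supply-demand idea and the $10 × 2.0 = $20 example concrete, here is a deliberately simple sketch. The text says production multipliers come from Machine Learning models; the plain demand-to-supply ratio, the 1.0 floor, and the 3.0 cap used here are assumptions chosen only for illustration, and all function names are hypothetical.

```python
from collections import Counter


def surge_multiplier(demand, supply, cap=3.0):
    """Map a cell's open requests and free drivers to a price multiplier.

    Assumption: multiplier = demand / supply, clamped to [1.0, cap].
    """
    if demand <= 0:
        return 1.0
    ratio = demand / max(supply, 1)  # avoid division by zero when no driver is free
    return round(min(max(ratio, 1.0), cap), 1)


def multipliers_by_cell(ride_requests, available_drivers):
    """Aggregate per-cell counts, then compute one multiplier per cell.

    Both arguments are lists of cell IDs, one entry per request or driver.
    """
    demand = Counter(ride_requests)
    supply = Counter(available_drivers)
    return {cell: surge_multiplier(demand[cell], supply[cell]) for cell in demand}


def fare(base_fare, multiplier):
    return round(base_fare * multiplier, 2)


# Hypothetical snapshot: 4 requests and 2 free drivers in "downtown".
cells = multipliers_by_cell(
    ["downtown"] * 4 + ["suburb"],
    ["downtown"] * 2 + ["suburb"] * 2,
)
print(cells)                          # {'downtown': 2.0, 'suburb': 1.0}
print(fare(10.0, cells["downtown"]))  # 20.0
```

Rerunning `multipliers_by_cell` on each fresh snapshot of requests and drivers is the toy equivalent of the periodic recalculation described above.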

Uber RAMEN: Real-Time Push to Millions of Devices

Answer-first: Scaling real-time dispatch pushes requires a stateful WebSocket gateway layer that maintains millions of persistent TCP connections. Terminating mTLS at high-performance reverse proxies (Envoy) and tracking socket locations in a distributed Redis connection registry allows backend dispatchers to push targeted ride offers in under 10ms.

The Problem: Pushing Instant Notifications to Millions of Devices

When DISCO decides to match you with Driver John Doe, the system must:

- Send the ride offer to exactly John Doe’s phone (out of millions of connected phones).
- Deliver it in milliseconds (not seconds).
- Ensure the driver receives it even if their 4G connection is weak.
- Simultaneously push the driver’s location back to your app so you can watch the car move on the map.

There are two main approaches: Polling (asking continuously) and Push (proactively sending). ...
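The connection-registry idea, knowing which gateway holds a given device's socket so a ride offer goes to exactly one phone, can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not RAMEN's implementation: `Gateway` and `ConnectionRegistry` are hypothetical names, a dict replaces the distributed Redis registry, and a list per device stands in for a live WebSocket.

```python
class Gateway:
    """Stand-in for one gateway node holding persistent device connections."""

    def __init__(self, name):
        self.name = name
        self.sockets = {}  # device_id -> list of delivered messages (fake socket)

    def connect(self, device_id):
        self.sockets[device_id] = []

    def disconnect(self, device_id):
        self.sockets.pop(device_id, None)

    def send(self, device_id, message):
        self.sockets[device_id].append(message)


class ConnectionRegistry:
    """Maps each connected device to the gateway that owns its socket."""

    def __init__(self):
        self._owner = {}  # device_id -> Gateway

    def register(self, device_id, gateway):
        gateway.connect(device_id)
        self._owner[device_id] = gateway

    def unregister(self, device_id):
        gateway = self._owner.pop(device_id, None)
        if gateway is not None:
            gateway.disconnect(device_id)

    def push(self, device_id, message):
        """Deliver to exactly one device; report False if it is offline."""
        gateway = self._owner.get(device_id)
        if gateway is None:
            return False  # caller decides whether to retry or fall back
        gateway.send(device_id, message)
        return True


gateway_1, gateway_2 = Gateway("gw-1"), Gateway("gw-2")
registry = ConnectionRegistry()
registry.register("driver-john-doe", gateway_2)

delivered = registry.push("driver-john-doe", {"type": "ride_offer", "rider": "you"})
# delivered is True, and only gateway_2 holds the message for John Doe's phone.
```

Because the dispatcher asks the registry first, it sends one targeted message through one gateway instead of broadcasting to every node, and a `False` result gives it a clear signal for handling a dropped connection.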
