Kafka Worker Pool in Go — Backpressure & Exactly-Once

Q: "What is the difference between Kafka and RabbitMQ?"

"Kafka is a distributed log — messages persist indefinitely, consumers manage offsets, replay is possible, throughput is in millions/s. RabbitMQ is a message broker — messages deleted after ACK, broker handles complex routing, push-based. Choose Kafka for event sourcing, audit trails, and fan-out to multiple consumers. Choose RabbitMQ for task queues, request-reply, and complex routing patterns."

Q: "How do you implement backpressure in Go?"

"Use a bounded buffered channel. When the channel is full, the sender blocks — this is natural backpressure. Combine with select { case jobChan \u0026lt;- msg: default: // backpressure handling } for non-blocking sends with explicit backpressure logic (e.g., pause Kafka consumption, increment a metrics counter)."

Q: "How do you guarantee Exactly-Once in Kafka?"

"True end-to-end exactly-once for external side effects (DB writes, API calls) requires an idempotent consumer: store the Kafka offset and business data in the same DB transaction. If the consumer crashes and restarts, the duplicate message is detected via the offset check and safely skipped."

Prerequisite: Part 5 of the System Design Masterclass. Read Part 4: Database Scaling first.

Kafka Worker Pool in Go — Backpressure & Exactly-Once

Executive Summary & Quick Answer: High-throughput event streaming in Go leverages Kafka zero-copy sendfile() kernel transfers combined with bounded goroutine worker pools. Natural backpressure is achieved using buffered Go channels, while partition-pinned workers preserve message ordering without distributed locks.
Key Takeaways:
Zero-Copy Performance: Kafka bypasses user-space buffer copies via sendfile(), routing data directly from Linux page cache to network socket buffers.
Channel Backpressure: Bounded Go channels automatically throttle poll loops when downstream workers reach memory capacity limits.
Partition-Aware Ordering: Pinning specific Kafka partition IDs to dedicated worker goroutines maintains strict message sequence guarantees.

What You’ll Learn That AI Won’t Tell You

Kernel-Level sendfile() Mechanics: How zero-copy I/O bypasses the context switches between user and kernel space, preventing CPU cache invalidation.
Worker Pool Partition Pinning: Why mapping partitions to specific workers is the only way to maintain order processing sequences without locking.
Offset Commit Transaction Math: Implementing transactional offset commits inside Go consumers to guarantee idempotency under broker rebalances.

Kafka vs RabbitMQ — When to Use Each?

Key Concept: Kafka is a distributed commit log — messages are retained indefinitely, consumers manage their own offsets, and replay is possible. RabbitMQ is a message broker — messages are deleted after acknowledgment, the broker handles routing complexity, push-based delivery. They solve different problems.

Architectural Comparison

Property	Apache Kafka	RabbitMQ
Message Model	Distributed commit log (append-only, immutable)	Message broker (queue/exchange, mutable)
Message Retention	Configurable (default 7 days, can be indefinite)	Deleted after ACK
Delivery Model	Pull — consumers poll, manage offsets	Push — broker delivers to consumer
Ordering Guarantee	Within a partition only	Not guaranteed (multiple consumers)
Throughput	Millions of messages/s (zero-copy kernel optimization)	~100k messages/s
Replay	✅ Yes — rewind offset to any position	❌ No — ACK’d messages are gone
Routing	Topic + partition (simple)	Exchange types: direct, fanout, topic, headers
Best Use Case	Event sourcing, stream processing, audit log, fan-out	Task queue, RPC, complex routing, work queues

[!NOTE] Use both together. Shopee uses Kafka for order event streaming (audit trail, analytics fan-out, replay capability) and RabbitMQ for inventory task queues (worker-based processing, dead-letter queue retry). They solve different problems — not competitors.

Kafka Zero-Copy Internals — Why Kafka Is So Fast

Kernel Level Strategy: Kafka achieves extreme throughput using the sendfile() system call — zero-copy data transfer from the OS page cache directly to the NIC socket buffer, bypassing user space completely. Combined with sequential disk writes and sparse index lookups, Kafka eliminates most CPU and memory copy overhead.

Traditional I/O vs Zero-Copy

graph LR
    subgraph traditional["Traditional I/O (4 copies, 4 context switches)"]
        D1[Disk] -->|"DMA copy"| KC1[Kernel Page Cache]
        KC1 -->|"CPU copy"| US1[User Buffer]
        US1 -->|"CPU copy"| SK1[Socket Buffer]
        SK1 -->|"DMA copy"| NIC1[NIC]
    end

    subgraph zerocopy["Zero-Copy sendfile() (2 DMA copies, 0 CPU copies)"]
        D2[Disk] -->|"DMA copy"| KC2[Kernel Page Cache]
        KC2 -->|"DMA scatter-gather"| NIC2[NIC]
    end

Traditional: 4 memory copies + 4 user/kernel context switches.
Zero-copy: 0 CPU copies + 2 DMA copies + 2 context switches.
Real-world impact: 2–4× throughput improvement on I/O-bound workloads.

Sparse Index — Fast Offset Lookup Without Full Scan

Kafka doesn’t index every message. It maintains a sparse index mapping every $X$ bytes of log data to a file offset:

.index file (sparse):
  Offset 0       → File position 0
  Offset 1,234   → File position 4,096
  Offset 2,468   → File position 8,192

.log file:
  [Message offset=0]
  [Message offset=1]
  ...
  [Message offset=1,234]  ← Jump here via binary search on .index

Lookup: binary search on .index file → sequential scan from closest entry. Combining O(log N) index search with O(M) sequential scan where M is tiny (bounded by index density).

Implementing Backpressure in Go

This practical Implementing Backpressure in Go section details production-grade Go code, middleware setup, and architectural patterns designed to ensure high performance and system resilience under peak load.

Go Design Pattern: Backpressure in Go is implemented naturally via buffered channels — when the buffer is full, the sender blocks, propagating pressure back to the upstream producer. Combined with a bounded worker pool, the system automatically throttles ingest when consumers are slower than producers.

Bounded Worker Pool Pattern

package kafka

import (
    "context"
    "fmt"
    "log"
    "sync"
    "time"
)

type Message struct {
    Key       string
    Value     []byte
    Partition int32
    Offset    int64
}

// StartWorkerPool creates a bounded pool with natural backpressure
// workers: number of concurrent goroutines
// bufferSize: channel buffer size — when full, sends to jobChan block (backpressure)
func StartWorkerPool(
    ctx context.Context,
    workers int,
    bufferSize int,
    process func(ctx context.Context, msg Message) error,
) chan<- Message {
    jobChan := make(chan Message, bufferSize) // Buffered channel = backpressure mechanism

    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func(workerID int) {
            defer wg.Done()
            for {
                select {
                case <-ctx.Done():
                    return
                case msg, ok := <-jobChan:
                    if !ok {
                        return
                    }
                    if err := process(ctx, msg); err != nil {
                        log.Printf("worker %d: failed to process offset=%d: %v",
                            workerID, msg.Offset, err)
                        // Production: send to dead-letter queue
                    }
                }
            }
        }(i)
    }

    go func() { wg.Wait() }()
    return jobChan
}

// KafkaConsumerLoop feeds messages into the worker pool
func KafkaConsumerLoop(ctx context.Context, jobChan chan<- Message) {
    msgOffset := int64(0)
    for {
        select {
        case <-ctx.Done():
            return
        default:
            // Simulate a Kafka poll batch
            for i := 0; i < 10; i++ {
                msg := Message{
                    Key:    fmt.Sprintf("order-%d", msgOffset),
                    Value:  []byte(`{"event":"order_created"}`),
                    Offset: msgOffset,
                }
                select {
                case jobChan <- msg:
                    msgOffset++
                case <-ctx.Done():
                    return
                }
            }
        }
    }
}

[!IMPORTANT] Partition ordering constraint: Kafka guarantees ordering within a single partition only. A generic worker pool processes messages from the same partition in arbitrary order (any free worker picks up the next message). If ordering matters (e.g., ORDER_CREATED must be processed before ORDER_CANCELLED for the same order), use a partition-aware pool that assigns one dedicated goroutine per partition.

Partition-Aware Ordered Worker Pool

// OrderedPartitionWorkerPool: each partition → one dedicated goroutine
// Guarantees in-order processing within each partition
type OrderedPartitionWorkerPool struct {
    mu             sync.RWMutex
    partitionChans map[int32]chan Message
}

func (p *OrderedPartitionWorkerPool) Submit(
    ctx context.Context,
    msg Message,
    process func(ctx context.Context, msg Message) error,
) {
    p.mu.Lock()
    ch, exists := p.partitionChans[msg.Partition]
    if !exists {
        ch = make(chan Message, 100)
        p.partitionChans[msg.Partition] = ch
        // Spawn a dedicated goroutine for this partition
        go func(partCh <-chan Message) {
            for m := range partCh {
                process(ctx, m) // Sequential processing — ordering guaranteed
            }
        }(ch)
    }
    p.mu.Unlock()

    ch <- msg
}

Exactly-Once Semantics in Kafka

Reliability Standard: Kafka Exactly-Once Semantics requires an idempotent producer (prevents duplicate publishes) plus a consumer that commits the Kafka offset atomically with the business operation. True exactly-once end-to-end requires an idempotency key for any side effects outside Kafka.

At-Least-Once Is the Default

graph LR
    Consumer -->|"1. DB write SUCCESS"| DB
    Consumer -->|"2. Crash before offset commit"| X[💥 Crash]
    Consumer -->|"3. Restart: re-reads same offset"| Kafka
    Consumer -->|"4. DB write DUPLICATE!"| DB

    style X fill:#f8d7da,stroke:#dc3545

Exactly-Once via Transactional Offset Commit

The production pattern: save the Kafka offset in the same DB transaction as the business write:

package consumer

import (
    "context"
    "database/sql"
    "fmt"
    "log"
)

type OrderEventConsumer struct {
    db *sql.DB
}

// ProcessOrderEvent — Exactly-Once via transactional offset storage
func (c *OrderEventConsumer) ProcessOrderEvent(
    ctx context.Context,
    partition int32,
    offset int64,
    orderJSON []byte,
) error {
    tx, err := c.db.BeginTx(ctx, nil)
    if err != nil {
        return fmt.Errorf("begin tx: %w", err)
    }
    defer tx.Rollback()

    // 1. Check idempotency — has this offset already been processed?
    var exists bool
    err = tx.QueryRowContext(ctx,
        `SELECT EXISTS(
            SELECT 1 FROM kafka_offsets
            WHERE topic='order-events' AND partition=$1 AND offset=$2
        )`, partition, offset,
    ).Scan(&exists)
    if err != nil {
        return fmt.Errorf("offset check: %w", err)
    }
    if exists {
        log.Printf("Offset %d already processed, skipping (idempotent)", offset)
        return nil
    }

    // 2. Business logic — insert order
    _, err = tx.ExecContext(ctx,
        `INSERT INTO orders (data, created_at) VALUES ($1, NOW())`, orderJSON,
    )
    if err != nil {
        return fmt.Errorf("insert order: %w", err)
    }

    // 3. Commit Kafka offset in the SAME transaction
    _, err = tx.ExecContext(ctx,
        `INSERT INTO kafka_offsets (topic, partition, offset)
         VALUES ('order-events', $1, $2)
         ON CONFLICT (topic, partition) DO UPDATE SET offset = EXCLUDED.offset`,
        partition, offset,
    )
    if err != nil {
        return fmt.Errorf("save offset: %w", err)
    }

    // 4. Atomic commit: business data + offset both committed or both rolled back
    return tx.Commit()
}

[!TIP] Schema for Kafka offset tracking:

CREATE TABLE kafka_offsets (
    topic     VARCHAR(255) NOT NULL,
    partition INT          NOT NULL,
    offset    BIGINT       NOT NULL,
    PRIMARY KEY (topic, partition)
);

Case Study: Shopee Flash Sale Peak Shaving

🔥 [Production Pattern]: Shopee’s order event pipeline Problem: Flash sale midnight burst: 500,000 orders/minute. Database cannot absorb this synchronous write volume. Architecture: User Request → API Gateway → Kafka (Order Topic) → Worker Pool (Go) → DB Write Result: DB receives a steady ~5,000 writes/s regardless of burst size. Kafka absorbs the spike; workers drain it at a controlled rate. Config: 50 workers × 10 partitions = 500 concurrent DB writes. Buffer size = 10,000 messages. Backpressure: When buffer fills, Kafka consumer pauses automatically. Orders queue in Kafka (7-day retention) — zero data loss. (Source: Shopee Engineering Blog, 2021)

FAQ

What is the difference between Kafka and RabbitMQ?

Kafka is a distributed log — messages persist indefinitely, consumers manage offsets, replay is possible, throughput is in millions/s. RabbitMQ is a message broker — messages deleted after ACK, broker handles complex routing, push-based. Choose Kafka for event sourcing, audit trails, and fan-out to multiple consumers. Choose RabbitMQ for task queues, request-reply, and complex routing patterns.

How do you implement backpressure in Go?

Use a bounded buffered channel. When the channel is full, the sender blocks — this is natural backpressure. Combine with select { case jobChan <- msg: default: // backpressure handling } for non-blocking sends with explicit backpressure logic (e.g., pause Kafka consumption, increment a metrics counter).

How do you guarantee Exactly-Once in Kafka?

True end-to-end exactly-once for external side effects (DB writes, API calls) requires an idempotent consumer: store the Kafka offset and business data in the same DB transaction. If the consumer crashes and restarts, the duplicate message is detected via the offset check and safely skipped.

← Previous Part Next Part →

🔗 Next Step: Continue to Part 6: Distributed Locks — Redlock, etcd & Split-Brain Prevention in Go

Kafka Worker Pool in Go — Backpressure & Exactly-Once#

What You’ll Learn That AI Won’t Tell You#

Kafka vs RabbitMQ — When to Use Each?#

Architectural Comparison#

Kafka Zero-Copy Internals — Why Kafka Is So Fast#

Traditional I/O vs Zero-Copy#

Sparse Index — Fast Offset Lookup Without Full Scan#

Implementing Backpressure in Go#

Bounded Worker Pool Pattern#

Partition-Aware Ordered Worker Pool#

Exactly-Once Semantics in Kafka#

At-Least-Once Is the Default#

Exactly-Once via Transactional Offset Commit#

Case Study: Shopee Flash Sale Peak Shaving#

FAQ#

What is the difference between Kafka and RabbitMQ?#

How do you implement backpressure in Go?#

How do you guarantee Exactly-Once in Kafka?#

Navigation & Next Steps#

Kafka Worker Pool in Go — Backpressure & Exactly-Once

What You’ll Learn That AI Won’t Tell You

Kafka vs RabbitMQ — When to Use Each?

Architectural Comparison

Kafka Zero-Copy Internals — Why Kafka Is So Fast

Traditional I/O vs Zero-Copy

Sparse Index — Fast Offset Lookup Without Full Scan

Implementing Backpressure in Go

Bounded Worker Pool Pattern

Partition-Aware Ordered Worker Pool

Exactly-Once Semantics in Kafka

At-Least-Once Is the Default

Exactly-Once via Transactional Offset Commit

Case Study: Shopee Flash Sale Peak Shaving

FAQ

What is the difference between Kafka and RabbitMQ?

How do you implement backpressure in Go?

How do you guarantee Exactly-Once in Kafka?

Navigation & Next Steps