Kavod Technologies
How We Process 2M Transactions per Day Across Kavod Platforms
Infrastructure | Karat Dollar

Sarah Johnson
February 1, 2026 · 13 min read

Principal Engineer

Sarah architects Kavod's shared payments infrastructure, handling millions of daily transactions with 99.99% uptime.

The Scale of Kavod Payments

Every ride on Buslyft, every stream on BantuStream, every property token purchase on GrandEstate, every royalty payout on Sonora Beats — they all flow through a single payment backbone: Karat Dollar.

As of February 2026, we process an average of 2 million transactions per day with a combined daily volume of $4.7 million. These transactions span:

  • 12 currencies (USD, NGN, KES, GHS, ZAR, TZS, UGX, and more)
  • 28 payment methods (cards, mobile money, bank transfer, USDC)
  • 10 Kavod platforms (each with different transaction patterns and requirements)

This post explains the architecture that makes it all work — reliably, at scale, with 99.99% uptime over the past 12 months.

Event-Driven Architecture

Why Events?

Traditional payment systems use synchronous request-response patterns: the caller sends a payment request, waits for the payment processor to respond, and then continues. This works fine at low volume, but it creates several problems at scale:

  • Tight coupling: Every upstream service needs to know the exact API of the payment service and handle its error codes
  • Cascading failures: If the payment service is slow, every calling service gets slow
  • Lost transactions: If the caller crashes between sending the request and receiving the response, the transaction status is unknown

Karat Dollar is built on an event-driven architecture where every state change is published as an immutable event to Apache Kafka. Services communicate by producing and consuming events, not by calling each other directly.

The Payment Event Stream

Every transaction progresses through a well-defined state machine, with each transition producing an event:

INITIATED → VALIDATED → ROUTED → SUBMITTED → PROCESSING → COMPLETED
                                                              │
                                                    (or) → FAILED → RETRY → SUBMITTED → ...
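
The transitions in the diagram can be encoded as a lookup table so that a consumer can reject out-of-order events. The following is a minimal sketch, not the production code: `ALLOWED`, `canTransition`, and `isValidHistory` are illustrative names, and the assumption that FAILED can be reached from either SUBMITTED or PROCESSING is our reading of the diagram.

```typescript
type State =
  | "INITIATED" | "VALIDATED" | "ROUTED" | "SUBMITTED"
  | "PROCESSING" | "COMPLETED" | "FAILED" | "RETRY";

// Allowed next states for each state (assumed from the diagram).
// FAILED is terminal for hard failures but may lead to RETRY for soft ones.
const ALLOWED: Record<State, readonly State[]> = {
  INITIATED: ["VALIDATED"],
  VALIDATED: ["ROUTED"],
  ROUTED: ["SUBMITTED"],
  SUBMITTED: ["PROCESSING", "FAILED"],
  PROCESSING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: ["RETRY"],
  RETRY: ["SUBMITTED"],
};

function canTransition(from: State, to: State): boolean {
  return ALLOWED[from].includes(to);
}

// A history is valid if it starts at INITIATED and every step is allowed.
function isValidHistory(states: State[]): boolean {
  if (states.length === 0 || states[0] !== "INITIATED") return false;
  for (let i = 1; i < states.length; i++) {
    if (!canTransition(states[i - 1], states[i])) return false;
  }
  return true;
}
```

Keeping the table as data rather than scattered `if` statements makes it straightforward to add states without touching the validation logic.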

Each event contains the full context needed to process it:

interface PaymentEvent {
  eventId: string;              // Globally unique, used for idempotency
  transactionId: string;        // Groups all events for a single transaction
  eventType: PaymentEventType;  // INITIATED, VALIDATED, ROUTED, etc.
  timestamp: string;            // ISO 8601
  payload: {
    amount: number;
    currency: string;
    sourceMethod: PaymentMethod;
    destinationMethod: PaymentMethod;
    platformId: string;         // Which Kavod platform initiated this
    userId: string;
    metadata: Record<string, string>;
  };
  previousEventId: string | null; // Linked list of events for this transaction; null on the first (INITIATED) event
}
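
Because `eventId` is globally unique, a consumer can use it to skip redelivered events. The sketch below shows one way this might look. `makeIdempotent` and `EventLike` are hypothetical names, and the in-memory `Set` is an assumption for illustration only; a real consumer would need a durable store that survives restarts.

```typescript
// Minimal event shape for this sketch: only the fields used here.
interface EventLike {
  eventId: string;
  transactionId: string;
}

// Wraps a handler so that events with an already-seen eventId are skipped.
// Returns true if the event was handled, false if it was a duplicate.
function makeIdempotent<E extends EventLike>(
  handler: (event: E) => void,
): (event: E) => boolean {
  const seen = new Set<string>(); // assumption: in-memory only
  return (event: E): boolean => {
    if (seen.has(event.eventId)) return false; // duplicate delivery, skip
    handler(event);
    seen.add(event.eventId); // mark only after the handler succeeded
    return true;
  };
}

// Usage: the second delivery of evt-1 is ignored.
const processed: string[] = [];
const handle = makeIdempotent((e: EventLike) => {
  processed.push(e.eventId);
});
handle({ eventId: "evt-1", transactionId: "tx-1" });
handle({ eventId: "evt-1", transactionId: "tx-1" });
handle({ eventId: "evt-2", transactionId: "tx-1" });
```

Marking the event as seen only after the handler returns means a handler that throws will be retried on redelivery, which matches at-least-once delivery semantics.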

Service Topology

The payment pipeline consists of six core services, each consuming and producing events:

  1. Gateway Service — Accepts payment requests from platforms, validates input, assigns a transaction ID, and produces an INITIATED event
  2. Validation Service — Consumes INITIATED events. Checks fraud rules, verifies account balance, validates payment method. Produces a VALIDATED or REJECTED event
  3. Routing Service — Consumes VALIDATED events. Determines the optimal payment processor based on method, currency, cost, and availability. Produces a ROUTED event
  4. Processor Adapters — One adapter per external payment processor (Paystack, Flutterwave, M-Pesa API, Stripe, Circle). Each consumes ROUTED events for its processor, submits to the external API, and produces SUBMITTED / PROCESSING / COMPLETED / FAILED events
  5. Reconciliation Service — Continuously compares our internal ledger against processor settlement reports to detect discrepancies
  6. Notification Service — Consumes terminal events (COMPLETED, FAILED) and notifies the originating platform and user

Platform ──> Gateway ──> Kafka ──> Validation ──> Kafka ──> Routing ──> Kafka ──> Processor
                                                                                      │
                                                                                      ▼
User <── Notification <── Kafka <── Reconciliation <────────────────────────── Kafka (terminal)

Payment Routing

The routing service is where much of the business intelligence lives. It selects the optimal processor for each transaction based on a scoring function:

function scoreProcessor(tx: Transaction, processor: Processor): number {
  if (!processor.supports(tx.currency, tx.sourceMethod)) return -Infinity;
  if (!processor.isHealthy()) return -Infinity;

  return (
    WEIGHT_COST * (1 - normalize(processor.feeForTx(tx))) +
    WEIGHT_SUCCESS_RATE * processor.recentSuccessRate(tx.currency) +
    WEIGHT_SPEED * (1 - normalize(processor.avgSettlementTime(tx.currency))) +
    WEIGHT_AVAILABILITY * processor.uptimeScore()
  );
}

When a processor is experiencing issues (elevated error rates or latency), the routing service automatically shifts traffic to alternatives. This active-active routing is a key factor in our 99.99% uptime — no single processor failure can bring down payments.
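
To make the traffic shift concrete, here is a self-contained sketch of picking the highest-scoring healthy processor. It is a simplification under stated assumptions: the `Candidate` shape, the `WEIGHTS` values, the min-max `normalize`, and the processor names are all illustrative, since this post does not give the real weights or normalization.

```typescript
interface Candidate {
  name: string;
  healthy: boolean;
  fee: number;               // fee for this transaction
  successRate: number;       // recent success rate, 0..1
  settlementSeconds: number; // average settlement time
  uptime: number;            // uptime score, 0..1
}

// Illustrative weights only (assumption).
const WEIGHTS = { cost: 0.3, successRate: 0.4, speed: 0.2, availability: 0.1 };

// Min-max normalization to 0..1; returns 0 when all values are equal.
function normalize(value: number, min: number, max: number): number {
  return max === min ? 0 : (value - min) / (max - min);
}

// Returns the best healthy candidate, or null if none is healthy.
function pickProcessor(candidates: Candidate[]): Candidate | null {
  const healthy = candidates.filter((c) => c.healthy);
  if (healthy.length === 0) return null;

  const fees = healthy.map((c) => c.fee);
  const times = healthy.map((c) => c.settlementSeconds);
  const minFee = Math.min(...fees);
  const maxFee = Math.max(...fees);
  const minTime = Math.min(...times);
  const maxTime = Math.max(...times);

  let best = healthy[0];
  let bestScore = -Infinity;
  for (const c of healthy) {
    const score =
      WEIGHTS.cost * (1 - normalize(c.fee, minFee, maxFee)) +
      WEIGHTS.successRate * c.successRate +
      WEIGHTS.speed * (1 - normalize(c.settlementSeconds, minTime, maxTime)) +
      WEIGHTS.availability * c.uptime;
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}

// Usage with two hypothetical processors.
const processors: Candidate[] = [
  { name: "processor-a", healthy: true, fee: 1.5, successRate: 0.99, settlementSeconds: 60, uptime: 0.999 },
  { name: "processor-b", healthy: true, fee: 1.0, successRate: 0.9, settlementSeconds: 120, uptime: 0.999 },
];
const chosen = pickProcessor(processors);
```

With these sample numbers the cheaper `processor-b` wins while it is healthy; marking it unhealthy moves the choice to `processor-a` without any change on the caller's side.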

Fallback and Retry Logic

When a transaction fails at the processor level, the system distinguishes between:

  • Hard failures (invalid card number, insufficient funds, account closed): Not retried. The user is notified immediately
  • Soft failures (timeout, rate limit, temporary processor error): Retried up to 3 times with exponential backoff. If all retries fail, the routing service re-routes to an alternative processor
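
import { randomUUID } from "node:crypto";

const RETRY_DELAYS = [2000, 8000, 30000]; // ms, roughly exponential backoff

// `produce` (publishes an event) and `delay` (resolves after N ms) are assumed
// to be defined elsewhere in the service.

// Derive a follow-up event: a fresh eventId (needed for idempotency), a link
// to its predecessor, and any extra metadata merged into the payload.
function nextEvent(
  prev: PaymentEvent,
  eventType: PaymentEventType,
  extraMetadata: Record<string, string> = {},
): PaymentEvent {
  return {
    ...prev,
    eventId: randomUUID(),
    previousEventId: prev.eventId,
    eventType,
    timestamp: new Date().toISOString(),
    payload: { ...prev.payload, metadata: { ...prev.payload.metadata, ...extraMetadata } },
  };
}

async function handleFailure(event: PaymentEvent, error: ProcessorError) {
  if (error.isHardFailure) {
    produce(nextEvent(event, "FAILED", { failureReason: String(error.code) }));
    return;
  }
  // Metadata values are strings, so parse the counter before doing arithmetic.
  const retryCount = Number(event.payload.metadata.retryCount ?? "0");
  if (retryCount < RETRY_DELAYS.length) {
    await delay(RETRY_DELAYS[retryCount]);
    produce(nextEvent(event, "RETRY", { retryCount: String(retryCount + 1) }));
  } else {
    // Retries exhausted: reroute to an alternative processor
    produce(nextEvent(event, "REROUTE"));
  }
}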

Reconciliation

Financial systems must be exactly correct. Our reconciliation service runs continuously, comparing three sources of truth:

  1. Internal ledger — Our event-sourced record of every transaction
  2. Processor reports — Settlement files from each payment processor (received daily or hourly depending on the processor)
  3. Bank statements — Actual funds movement in our settlement accounts

The reconciler flags three types of discrepancies:

  • Missing transactions — Present in processor report but not in our ledger (or vice versa)
  • Amount mismatches — Transaction exists in both systems but amounts differ
  • Status mismatches — We show COMPLETED but the processor shows FAILED (or vice versa)

Each discrepancy generates an alert. Critical discrepancies (amount mismatches above $100, status mismatches) are escalated to the finance team immediately. In practice, our discrepancy rate is less than 0.002% of transactions, and most are resolved automatically within 24 hours.
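
The three discrepancy types map naturally onto a keyed comparison of two record sets. The sketch below compares only the internal ledger against a single processor report; it leaves out bank statements, alerting, and escalation thresholds. All type and function names are illustrative, and storing amounts in minor units is an assumption made here to keep equality checks exact.

```typescript
type Status = "COMPLETED" | "FAILED";

interface TxRecord {
  transactionId: string;
  amount: number; // minor units (e.g. cents); assumption for exact comparison
  status: Status;
}

type Discrepancy =
  | { kind: "MISSING_IN_LEDGER"; transactionId: string }
  | { kind: "MISSING_IN_REPORT"; transactionId: string }
  | { kind: "AMOUNT_MISMATCH"; transactionId: string; ledger: number; report: number }
  | { kind: "STATUS_MISMATCH"; transactionId: string; ledger: Status; report: Status };

function reconcile(ledger: TxRecord[], report: TxRecord[]): Discrepancy[] {
  const ledgerById = new Map<string, TxRecord>();
  for (const rec of ledger) ledgerById.set(rec.transactionId, rec);

  const out: Discrepancy[] = [];
  const seenInReport = new Set<string>();

  for (const rep of report) {
    const id = rep.transactionId;
    seenInReport.add(id);
    const ours = ledgerById.get(id);
    if (ours === undefined) {
      out.push({ kind: "MISSING_IN_LEDGER", transactionId: id });
      continue;
    }
    if (ours.amount !== rep.amount) {
      out.push({ kind: "AMOUNT_MISMATCH", transactionId: id, ledger: ours.amount, report: rep.amount });
    }
    if (ours.status !== rep.status) {
      out.push({ kind: "STATUS_MISMATCH", transactionId: id, ledger: ours.status, report: rep.status });
    }
  }

  // Anything we recorded that the processor never reported.
  for (const rec of ledger) {
    if (!seenInReport.has(rec.transactionId)) {
      out.push({ kind: "MISSING_IN_REPORT", transactionId: rec.transactionId });
    }
  }
  return out;
}

// Usage with made-up sample records.
const sampleLedger: TxRecord[] = [
  { transactionId: "tx-1", amount: 1000, status: "COMPLETED" },
  { transactionId: "tx-2", amount: 2500, status: "COMPLETED" },
  { transactionId: "tx-3", amount: 500, status: "COMPLETED" },
  { transactionId: "tx-5", amount: 900, status: "COMPLETED" },
];
const sampleReport: TxRecord[] = [
  { transactionId: "tx-1", amount: 1000, status: "COMPLETED" },
  { transactionId: "tx-2", amount: 2400, status: "COMPLETED" },
  { transactionId: "tx-3", amount: 500, status: "FAILED" },
  { transactionId: "tx-4", amount: 700, status: "COMPLETED" },
];
const issues = reconcile(sampleLedger, sampleReport);
```

In this sample, `tx-2` is an amount mismatch, `tx-3` a status mismatch, `tx-4` is missing from the ledger, and `tx-5` is missing from the report.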

Achieving 99.99% Uptime

99.99% uptime allows at most about 52.6 minutes of downtime per year (0.01% of 525,600 minutes). Here's how we achieve it:

Multi-Region Deployment

Karat Dollar runs in two active regions (AWS eu-west-1 and af-south-1) with automatic failover. Kafka is replicated across regions with MirrorMaker 2. If an entire region goes down, traffic is redirected within 30 seconds.

Circuit Breakers

Every external call (to payment processors, to databases) is wrapped in a circuit breaker. When the error rate for a dependency exceeds 50% over a 10-second window, the circuit opens and all subsequent calls fail fast. This prevents a slow dependency from consuming all our threads and bringing down the entire system.
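
A minimal sketch of such a breaker follows. The 50% threshold and 10-second window come from the description above; the minimum sample count, the cooldown, and every name in the code are assumptions for illustration. The sketch also closes fully once the cooldown elapses, whereas a production breaker would more likely probe with a half-open state first.

```typescript
interface BreakerOptions {
  windowMs: number;       // rolling window for error-rate calculation
  errorThreshold: number; // open when the error rate exceeds this
  minSamples: number;     // assumption: do not trip on too few calls
  cooldownMs: number;     // assumption: how long the circuit stays open
  now: () => number;      // injectable clock (milliseconds)
}

const BREAKER_DEFAULTS: BreakerOptions = {
  windowMs: 10_000,
  errorThreshold: 0.5,
  minSamples: 10,
  cooldownMs: 30_000,
  now: () => Date.now(),
};

class CircuitBreaker {
  private readonly opts: BreakerOptions;
  private samples: { at: number; ok: boolean }[] = [];
  private openedAt: number | null = null;

  constructor(opts: Partial<BreakerOptions> = {}) {
    this.opts = { ...BREAKER_DEFAULTS, ...opts };
  }

  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.opts.now() - this.openedAt >= this.opts.cooldownMs) {
      // Cooldown elapsed: close and start with a clean window.
      this.openedAt = null;
      this.samples = [];
      return false;
    }
    return true;
  }

  recordResult(ok: boolean): void {
    const t = this.opts.now();
    this.samples.push({ at: t, ok });
    // Keep only samples inside the rolling window.
    this.samples = this.samples.filter((s) => t - s.at < this.opts.windowMs);
    const failures = this.samples.filter((s) => !s.ok).length;
    if (
      this.samples.length >= this.opts.minSamples &&
      failures / this.samples.length > this.opts.errorThreshold
    ) {
      this.openedAt = t;
    }
  }

  // Wraps an async call: fails fast while open, records the outcome otherwise.
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      this.recordResult(true);
      return result;
    } catch (err) {
      this.recordResult(false);
      throw err;
    }
  }
}
```

A caller would wrap each processor request as `breaker.call(() => submitToProcessor(tx))`, where `submitToProcessor` stands in for whatever client function the adapter uses.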

Chaos Engineering

We run weekly chaos experiments using Litmus Chaos:

  • Pod kill: Randomly terminate payment service pods during peak hours
  • Network partition: Simulate network splits between regions
  • Processor outage: Simulate a payment processor going completely offline
  • Kafka broker failure: Kill individual Kafka brokers

Every chaos experiment must result in zero user-visible errors and zero lost transactions. When an experiment fails, we fix the root cause before the next week's run.

Monitoring and Alerting

We monitor approximately 2,400 metrics across the payment stack. Key SLOs:

| SLO | Target | Current |
|---|---|---|
| End-to-end success rate | ≥ 99.5% | 99.72% |
| P99 latency (card payments) | ≤ 5s | 3.1s |
| P99 latency (mobile money) | ≤ 15s | 11.2s |
| Reconciliation discrepancy rate | ≤ 0.01% | 0.002% |
| System availability | ≥ 99.99% | 99.993% |

Explore Karat Dollar at karatdollar.com.

#payments #infrastructure #event-driven #karat-dollar #scalability
