Building reliable systems in West Africa: the constraints that matter
Reliability engineering looks different when your infrastructure has to account for intermittent connectivity, unreliable power, and a narrower pool of specialised vendors. Here's what we've learned.
Building reliable software in West Africa is not harder than building it anywhere else -- but it requires a clear-eyed acknowledgement of the constraints that are different. The engineers who struggle here are usually the ones who apply playbooks designed for AWS us-east-1, where 99.9% internet connectivity and redundant power are assumptions rather than targets.
This post covers the three constraints we encounter most often, and how we account for them in architecture and operational decisions.
1. Connectivity is intermittent, not reliable
In the UK or US, a mobile network or broadband connection failing for more than a minute is a meaningful incident. In Greater Accra, it's Tuesday afternoon. Applications that assume continuous connectivity will fail in ways their designers did not anticipate.
The design principle that follows is: build for offline-first, not connectivity-assumed. For any system that has end-users in the field -- clinic staff, warehouse workers, sales teams -- we now default to local-first data storage with a sync architecture rather than a thin client that depends on a live API connection.
The technology choices this implies are not exotic: SQLite with sync, or a lightweight write-ahead log that reconciles when connectivity resumes, handles most cases. The important thing is treating offline as a first-class state, not an edge case.
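As a rough sketch of that pattern -- the names, record shapes, and `pushToServer` callback here are illustrative, not a real API:

```typescript
// Minimal outbox sketch: writes land locally first and are drained to
// the server whenever connectivity allows. Illustrative only.

type PendingWrite = {
  id: string;          // client-generated, so retries are idempotent
  table: string;
  payload: unknown;
  createdAt: number;   // ms since epoch, used for ordering
};

class Outbox {
  private queue: PendingWrite[] = [];

  // A local write always succeeds; the network is handled separately.
  write(table: string, payload: unknown): PendingWrite {
    const entry: PendingWrite = {
      id: crypto.randomUUID(),
      table,
      payload,
      createdAt: Date.now(),
    };
    this.queue.push(entry);
    return entry;
  }

  // Called on a timer or a connectivity-restored event. Stops at the
  // first failure so writes replay in order on the next attempt.
  async drain(pushToServer: (w: PendingWrite) => Promise<void>): Promise<void> {
    while (this.queue.length > 0) {
      try {
        await pushToServer(this.queue[0]);
        this.queue.shift(); // only remove after the server confirms
      } catch {
        return; // offline again; keep the queue intact and retry later
      }
    }
  }
}
```

In a real deployment the queue would live in SQLite rather than memory, so a power cut or app restart does not lose pending writes -- which is where this constraint and the next one intersect.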
What this looks like in practice
When we built MediConnect GH's data platform, every site runs a local sync agent. It writes patient data locally first, queues the write for synchronisation, and reconciles with the central database when the connection is stable. Clinicians have never experienced data loss due to a connectivity drop, even though connectivity drops happen regularly.
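Reconciliation is where most sync designs go wrong. A deliberately simplified version of the idea -- not MediConnect GH's actual code -- is last-write-wins keyed on a per-record timestamp:

```typescript
// Simplified last-write-wins reconciliation, keyed on a per-record
// updatedAt timestamp. Illustrative only -- real clinical data usually
// needs field-level merging and an audit trail, not blind overwrites.

type Versioned = { recordId: string; updatedAt: number };

function reconcile<T extends Versioned>(local: T[], remote: T[]): T[] {
  const merged = new Map<string, T>();
  // Remote entries are seen first, so they win exact-timestamp ties.
  for (const r of [...remote, ...local]) {
    const existing = merged.get(r.recordId);
    if (!existing || r.updatedAt > existing.updatedAt) {
      merged.set(r.recordId, r);
    }
  }
  return [...merged.values()];
}
```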
2. Power reliability varies significantly
Inconsistent power supply is a constraint that affects both end-users and your own infrastructure. For user-facing applications, this means designing for sessions that end unexpectedly and resume later -- form state that survives a browser close, a checkout flow that recovers gracefully from a payment timeout.
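A minimal sketch of the form-state half of that, assuming a browser context -- the draft key and `FormState` shape are made up for illustration:

```typescript
// Sketch: autosave form state so a power cut or browser close costs
// the user nothing. Call saveDraft (debounced) on every input change.

type FormState = Record<string, string>;
const DRAFT_KEY = "checkout-draft-v1";

function saveDraft(state: FormState): void {
  localStorage.setItem(DRAFT_KEY, JSON.stringify(state));
}

function loadDraft(): FormState | null {
  const raw = localStorage.getItem(DRAFT_KEY);
  return raw ? (JSON.parse(raw) as FormState) : null;
}

function clearDraft(): void {
  localStorage.removeItem(DRAFT_KEY); // only after a confirmed submit
}
```

The detail that matters is when the draft is cleared: only after the server confirms the submission, which is also what lets the checkout flow recover from a payment timeout rather than losing the session.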
For infrastructure, it means thinking carefully about where you host. Cloud hosting for the primary application layer solves a lot of this, but it does not solve the problem of access: if your users' devices lose power, they cannot use your application. Designing for low-bandwidth, mobile-first interfaces reduces the energy cost of interacting with your product and makes it more accessible across device types.
3. The vendor ecosystem is narrower
Finding a specialist in Kubernetes networking or PostgreSQL replication at short notice in Accra is harder than finding one in Amsterdam or Austin. This has two practical implications.
Choose boring technology. The risk-adjusted recommendation for most workloads in this market is to pick well-understood, widely-supported tools over cutting-edge ones. PostgreSQL over exotic NewSQL databases. Nginx over the newest service mesh. React over whichever framework got a Hacker News post last week. The person who needs to debug your production incident at 11pm might be good but not necessarily specialised in your stack -- do not make their job harder than it needs to be.
Invest heavily in documentation and runbooks. Every system we build has documented runbooks for the ten most likely operational scenarios. This is not just good practice -- in a market where you might not be able to reach a specialist immediately, the difference between a 30-minute recovery and a 4-hour recovery is often whether the runbook exists and is up to date.
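As a rough illustration of the shape we aim for -- the scenario and steps below are a made-up skeleton, not one of our actual runbooks:

```
Runbook: primary database not accepting writes
Last reviewed: <date>

1. Confirm the symptom: app returns 5xx on writes; reads may still work.
2. Check replication status on the standby (exact command documented here).
3. If the primary host is down, promote the standby (steps listed in order).
4. Update the connection string or failover DNS entry.
5. Verify: run the smoke-test checklist; watch error rates for 15 minutes.
6. Escalation: who to call if step 3 fails, with current phone numbers.
```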
The broader point
None of these constraints are insurmountable, and none of them mean that ambitious systems cannot be built here. They mean that the people building those systems need to be honest about the environment they are designing for, and make deliberate choices that account for it.
The worst failure mode we see is the offshore team that builds to a European or American standard and hands it to a local team that then struggles to maintain it in conditions the original design did not consider. Reliability engineering here has to be done here, with these constraints as first-class inputs.
Found this useful? Talk to us about what you're building -- we're always up for a direct conversation.