About +254
Understanding the socioeconomic factors that shape everyday life starts with data. In Kenya, that data is scattered across PDFs and messy Excel files, inconsistently formatted, and buried in portals that are unintuitive and rarely updated. It can take hours just to locate, extract, and clean data before any analysis can begin.
+254 makes data from official Kenyan sources easier to find, retrieve, and use, in formats that are ready for real work. The goal is straightforward: public data should be accessible, not just available. We hope this makes it easier for researchers, journalists, developers, and policymakers to work with the numbers that shape Kenya.
All our data comes exclusively from official Kenyan government institutions and their public publications. Each dataset links back to its original source, so you can always trace where it came from and cite it if needed.
Datasets are updated live: whenever a source publishes new data, it flows through our extraction pipeline automatically. Everything is cleaned and standardized from extraction, with no imputation of any kind. Check the full catalog for details on what's available and how each dataset is structured.
Our catalog is always growing. Know a source that should be added? Open a GitHub issue with the URL and a description of what it contains.
Extraction
Raw data comes in a dozen different formats. We extract it all into a single consistent structure, so you never have to deal with the original mess.
Validation
Sources contain errors, gaps, and inconsistencies. We validate every column against a defined schema to catch problems before they reach you.
Cleaning
Date formats, units, column names: we standardize the presentation without touching the underlying values.
Publication
Every dataset is served through the same REST API, with the same JSON format, pagination, and query patterns. Learn one, and you know them all.
Long-form data
Every dataset uses a metric and value column pair instead of one column per indicator.
snake_case columns
All column names are snake_case: fiscal_year, area_name, sale_week.
Lowercased values
Every string value is lowercased: months (january), metric names (total revenue), country names (united states dollar), counties (nairobi).
Clean numeric values
Commas and other formatting are stripped. Every value column is numeric. Rows where a value was not reported are omitted.
Schema patterns
Monthly data has year and month, quarterly has year and quarter, annual has just year, and geographic datasets include area_name.
No imputation
Values are reported as published. Nothing is estimated, interpolated, or filled in.
+254 is an open-source project. The code, pipelines, and dataset definitions are available on GitHub.