About +254

Mission

Understanding the socioeconomic factors that shape everyday life starts with data. In Kenya, that data is scattered across PDFs and messy Excel files, inconsistently formatted, and buried in portals that are unintuitive and rarely updated. It can take hours just to locate, extract, and clean data before any analysis can begin.

+254 makes data from official Kenyan sources easier to find, retrieve, and use, in formats that are ready for real work. The goal is straightforward: public data should be accessible, not just available. We hope this makes it easier for researchers, journalists, developers, and policymakers to work with the numbers that shape Kenya.

How we source data

Our data comes exclusively from official Kenyan government institutions and their publications. Each dataset links back to its original source, so you can always trace where it came from and cite it if needed.

Datasets are updated automatically: whenever a source publishes new data, it flows through our extraction pipeline. Once extracted, everything is cleaned and standardized, with no imputation of any kind. Check the full catalog for details on what's available and how each dataset is structured.

Our catalog is always growing. Know a source that should be added? Open a GitHub issue with the URL and a description of what it contains.

How we process data
01

Extraction

Raw data comes in a dozen different formats. We extract it all into a single consistent structure, so you never have to deal with the original mess.

02

Validation

Sources contain errors, gaps, and inconsistencies. We validate every column against a defined schema to catch problems before they reach you.
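
As a rough illustration of what checking columns against a schema can look like, here is a minimal sketch. The schema contents, the column types, and the validate_row function are assumptions made for this example; they are not taken from the project's actual pipeline code.

```python
from typing import Any

# Hypothetical schema: column name -> expected Python type.
# The real +254 schemas are not shown on this page; these columns are illustrative.
EXAMPLE_SCHEMA: dict[str, type] = {
    "year": int,
    "month": str,
    "metric": str,
    "value": float,
}


def validate_row(row: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of human-readable problems found in one row."""
    problems = []
    # Every schema column must be present and have the expected type.
    for column, expected_type in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    # Columns that the schema does not know about are reported too.
    for column in row:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems
```

An empty list means the row passed. Returning the problems instead of raising on the first one makes it possible to report everything wrong with a source file in a single pass.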

03

Cleaning

Date formats, units, column names: we standardize the presentation without touching the underlying values.

04

Publication

Every dataset is served through the same REST API, with the same JSON format, pagination, and query patterns. Learn one, and you know them all.
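
Because every dataset shares the same pagination pattern, one small helper can walk any of them. The sketch below is only an illustration: the response shape (a "data" list and a "next_page" field) is a guess, and fetch_page stands in for whatever HTTP call you use. Check the API documentation for the real field names and endpoints.

```python
from typing import Callable, Iterator, Optional

# A page is assumed to look like {"data": [...], "next_page": int or None}.
# This shape is hypothetical and used for illustration only.
Page = dict


def iter_rows(fetch_page: Callable[[int], Page], first_page: int = 1) -> Iterator[dict]:
    """Yield every row from a paginated endpoint, one page at a time."""
    page_number: Optional[int] = first_page
    while page_number is not None:
        page = fetch_page(page_number)
        yield from page["data"]
        # Stop when the (assumed) next_page field is missing or None.
        page_number = page.get("next_page")


def fake_fetch_page(page_number: int) -> Page:
    """Stand-in for an HTTP request so the sketch runs without network access."""
    pages = {
        1: {"data": [{"year": 2022, "value": 1.0}], "next_page": 2},
        2: {"data": [{"year": 2023, "value": 2.0}], "next_page": None},
    }
    return pages[page_number]


rows = list(iter_rows(fake_fetch_page))
```

Passing the fetch function in as an argument keeps the pagination loop independent of any particular HTTP library, so the same loop works for every dataset.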

Conventions

Long-form data

Every dataset uses a metric and value column pair instead of one column per indicator.
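
To make the difference concrete, the sketch below reshapes a wide row (one column per indicator) into the long form described above. The indicator names and the wide_to_long helper are made up for this example.

```python
def wide_to_long(wide_rows: list[dict], id_columns: list[str]) -> list[dict]:
    """Turn one-column-per-indicator rows into metric/value rows."""
    long_rows = []
    for row in wide_rows:
        # Identifier columns (for example year and month) are copied to every output row.
        ids = {column: row[column] for column in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            long_rows.append({**ids, "metric": column, "value": value})
    return long_rows


# Illustrative data, not taken from any real dataset.
wide = [{"year": 2023, "month": "january", "total revenue": 10.0, "total expenditure": 8.0}]
long_rows = wide_to_long(wide, id_columns=["year", "month"])
```

Here the single wide row becomes two long rows, one per indicator. A new indicator adds rows rather than columns, so the schema stays the same.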

snake_case columns

All column names are snake_case: fiscal_year, area_name, sale_week.
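
If you are joining +254 data with your own files, it can help to normalize your column names the same way. This is one possible approach, not the project's own implementation; to_snake_case is a name chosen for this example.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column header such as 'Fiscal Year' or 'areaName' to snake_case."""
    # Insert an underscore at camelCase boundaries: "areaName" -> "area_Name".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result.
    return name.strip("_").lower()
```

For example, to_snake_case("Fiscal Year") gives "fiscal_year", and a name that is already snake_case is returned unchanged.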

Lowercased values

Every string value is lowercased: months (january), metric names (total revenue), currency names (united states dollar), counties (nairobi).

Clean numeric values

Commas and other formatting are stripped. Every value column is numeric. Rows where a value was not reported are omitted.
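
A minimal sketch of this kind of cleaning is shown below. The set of "not reported" placeholders and the function names are assumptions for illustration; the actual pipeline may recognize different markers.

```python
from typing import Optional

# Placeholders treated as "not reported". This set is an assumption for the example.
NOT_REPORTED = {"", "-", "..", "n/a", "na"}


def parse_value(raw: str) -> Optional[float]:
    """Parse a published figure such as '1,234.5'; return None if not reported."""
    text = raw.strip()
    if text.lower() in NOT_REPORTED:
        return None
    # Strip thousands separators; float() raises ValueError on anything unparseable.
    return float(text.replace(",", ""))


def clean_rows(rows: list[dict]) -> list[dict]:
    """Convert the value column to a number and omit rows with no reported value."""
    cleaned = []
    for row in rows:
        value = parse_value(row["value"])
        if value is None:
            continue  # omitted, never filled in
        cleaned.append({**row, "value": value})
    return cleaned
```

In this sketch an unparseable value raises an error instead of being guessed at, which keeps it in line with the no-imputation rule.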

Schema patterns

Monthly data has year and month, quarterly has year and quarter, annual has just year, and geographic datasets include area_name.
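
These patterns make a dataset's columns predictable from its frequency. The helper below sketches that idea; the frequency labels, the column order, and the expected_columns function are assumptions for this example, so treat the catalog as the authority on each dataset's actual structure.

```python
# Time columns by publication frequency, as described above.
TIME_COLUMNS = {
    "monthly": ["year", "month"],
    "quarterly": ["year", "quarter"],
    "annual": ["year"],
}


def expected_columns(frequency: str, geographic: bool = False) -> list[str]:
    """Return the columns a dataset is expected to have (order is illustrative)."""
    columns = list(TIME_COLUMNS[frequency])
    if geographic:
        columns.append("area_name")
    # Every dataset is long-form, so it ends with the metric/value pair.
    return columns + ["metric", "value"]
```

For example, expected_columns("quarterly", geographic=True) returns year, quarter, area_name, metric, and value.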

No imputation

Values are reported as published. Nothing is estimated, interpolated, or filled in.

Support & contributions

+254 is an open-source project. The code, pipelines, and dataset definitions are available on GitHub.

Report an issue. If you find incorrect data, missing values, or unexpected API behavior, open a GitHub issue.
Suggest a dataset. Know a source that should be added? Open an issue with the URL and a description of its contents.
Contribute code. Pull requests are welcome. The codebase uses descriptive variable names and consistent pipeline patterns, making it straightforward to extend.