Quality Tracking
OSM is crowd-sourced and uneven. Every power=plant element that
cannot be resolved into a usable unit — missing capacity, unparseable
tag, unknown source — is dropped from the output and recorded as a
rejection. You can surface those rejections alongside the accepted
plants to diagnose short results and to feed triage targets back to
OSM contributors.
The easy path: rejected_output_path
process_units writes a CSV + sibling GeoJSON when rejected_output_path
is set. Rejection data is only populated on API fetches (cache hits
don't re-run the filter), so force a refresh when you need a full
report.
from osm_powerplants import process_units, get_config, get_cache_dir
config = get_config()
config["force_refresh"] = True
df = process_units(
    countries=["Kenya"],
    config=config,
    cache_dir=str(get_cache_dir(config)),
    output_path="kenya.csv",
    rejected_output_path="kenya_rejected.csv",  # also writes kenya_rejected.geojson
)
The CSV lists each rejection with:
| Column | Description |
|---|---|
| id | type/id OSM identifier |
| element_type | node, way, relation |
| country, lat, lon | Location |
| unit_type | plant or generator |
| reason | One of the codes below |
| keywords | The offending tag value (e.g. yes, photovoltaic) |
| details | Free-text context (e.g. raw tag dict) |
| url | Direct link to the OSM element |
The GeoJSON has the same fields as feature properties and loads directly into JOSM or QGIS as a triage layer.
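Before opening the triage layer in JOSM or QGIS, it can help to summarise it with the standard library alone. The sketch below assumes the file is an ordinary GeoJSON FeatureCollection whose features carry the reason property described above. count_reasons is an illustrative helper, not part of osm_powerplants, and the sample features are made up.

```python
import json
from collections import Counter


def count_reasons(feature_collection: dict) -> Counter:
    """Count features per rejection reason in a GeoJSON FeatureCollection."""
    counts = Counter()
    for feature in feature_collection.get("features", []):
        # GeoJSON allows "properties": null, so fall back to an empty dict.
        props = feature.get("properties") or {}
        counts[props.get("reason", "unknown")] += 1
    return counts


# In-memory stand-in for kenya_rejected.geojson; the values are made up.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [36.82, -1.29]},
     "properties": {"id": "node/1", "unit_type": "plant", "reason": "Missing output tag"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [39.66, -4.04]},
     "properties": {"id": "node/2", "unit_type": "generator", "reason": "Missing source tag"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [34.75, -0.09]},
     "properties": {"id": "way/3", "unit_type": "plant", "reason": "Missing output tag"}}
  ]
}
""")

reason_counts = count_reasons(sample)
for reason, n in reason_counts.most_common():
    print(f"{n:3d}  {reason}")
```

For a real report, replace sample with the result of json.load on the opened .geojson file.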
Rejection reasons
| Reason | What to fix in OSM |
|---|---|
| Missing output tag | Add plant:output:electricity (e.g. 50 MW) |
| Capacity placeholder value | Replace stubs like yes with a real number |
| Capacity regex no match / Capacity non-numeric | Use a parseable format: number + unit |
| Missing source tag | Add plant:source |
| Missing technology tag | Add plant:method |
| Missing source type / Missing technology type | Extend source_mapping / technology_mapping in config, or normalise the OSM value |
| Capacity zero | Replace with an actual measurement |
| Element within existing plant geometry | Already counted: a generator polygon lies inside a parent plant |
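Not every reason in the table calls for an OSM edit: unmapped source or technology types can be handled in config, and elements inside an existing plant need no action. A minimal sketch of that split follows. It assumes the reason column holds the strings exactly as written in the table, which may not match the codes in your report. triage_bucket, group_by_bucket and the bucket names are hypothetical.

```python
from collections import defaultdict

# Reason strings copied from the table above; whether the report's `reason`
# column uses exactly these spellings is an assumption.
CONFIG_SIDE = {"Missing source type", "Missing technology type"}
NO_ACTION = {"Element within existing plant geometry"}


def triage_bucket(reason: str) -> str:
    """Return a hypothetical bucket name for a rejection reason."""
    if reason in CONFIG_SIDE:
        return "config"  # extend source_mapping / technology_mapping
    if reason in NO_ACTION:
        return "ignore"  # already counted under a parent plant
    return "osm"  # needs a tag fix by an OSM contributor


def group_by_bucket(rows):
    """Group rejection rows (dicts with 'id' and 'reason') by triage bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[triage_bucket(row["reason"])].append(row["id"])
    return dict(buckets)


# Made-up rows standing in for csv.DictReader output on a rejection report.
rows = [
    {"id": "node/1", "reason": "Missing output tag"},
    {"id": "way/2", "reason": "Missing source type"},
    {"id": "way/3", "reason": "Element within existing plant geometry"},
    {"id": "node/4", "reason": "Capacity zero"},
]

grouped = group_by_bucket(rows)
for bucket, ids in grouped.items():
    print(bucket, ids)
```

Only the osm bucket is worth handing to mappers. The config bucket points at mappings you can extend locally.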
Analysis
import pandas as pd
rej = pd.read_csv("kenya_rejected.csv")
# Top reasons
print(rej["reason"].value_counts())
# Plants only (filter out generators)
plants = rej[rej["unit_type"] == "plant"]
print(plants[["url", "reason", "details"]].head())
Low-level: re-using a tracker
For custom pipelines, instantiate RejectionTracker directly and hand
it to a Workflow:
from osm_powerplants import Units, get_config, get_cache_dir
from osm_powerplants.quality.rejection import RejectionTracker
from osm_powerplants.retrieval.client import OverpassAPIClient
from osm_powerplants.workflow import Workflow
config = get_config()
config["missing_name_allowed"] = False # strict mode
tracker = RejectionTracker()
units = Units()
with OverpassAPIClient(cache_dir=str(get_cache_dir(config))) as client:
    workflow = Workflow(client, tracker, units, config)
    workflow.process_country_data("Malta")
print(tracker.get_summary_string())
# Slice by reason
from osm_powerplants.models import RejectionReason
missing = tracker.get_rejections_by_reason(RejectionReason.MISSING_OUTPUT_TAG)
# Export
tracker.generate_report().to_csv("rejections.csv", index=False)
tracker.save_geojson("rejections.geojson")
tracker.save_geojson_by_reasons("output/") # one file per reason
The feedback loop
osm-powerplants → rejection report → fix in JOSM → upload to OSM → re-run
       ↑                                                              │
       └──────────────────────────────────────────────────────────────┘
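One way to check a pass through this loop is to diff the rejection CSVs from two runs by element id. The sketch below uses only the id and reason columns described earlier. reasons_by_id and compare_runs are hypothetical helpers, and the two in-memory CSVs are made-up stand-ins for reports written before and after an OSM upload.

```python
import csv
import io


def reasons_by_id(handle) -> dict:
    """Map each rejected element id to its reason from an open CSV handle."""
    return {row["id"]: row["reason"] for row in csv.DictReader(handle)}


def compare_runs(before: dict, after: dict) -> dict:
    """Split element ids into resolved, remaining and newly rejected."""
    return {
        "resolved": sorted(before.keys() - after.keys()),
        "remaining": sorted(before.keys() & after.keys()),
        "new": sorted(after.keys() - before.keys()),
    }


# Made-up reports from two consecutive runs.
before_csv = io.StringIO(
    "id,reason\n"
    "node/1,Missing output tag\n"
    "way/2,Missing source tag\n"
    "way/3,Capacity placeholder value\n"
)
after_csv = io.StringIO(
    "id,reason\n"
    "way/2,Missing source tag\n"
    "node/4,Missing technology tag\n"
)

diff = compare_runs(reasons_by_id(before_csv), reasons_by_id(after_csv))
for status, ids in diff.items():
    print(f"{status}: {ids}")
```

An id listed as resolved has only left the rejection report. It may have been accepted after a fix, or the element may have been deleted from OSM, so check it against the accepted output before counting it as a win.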
See the Workshop notebook for an end-to-end walkthrough.