Quality Tracking

OSM is crowd-sourced and uneven. Every power=plant element that cannot be resolved into a usable unit (missing capacity, unparseable tag, unknown source) is dropped from the output and recorded as a rejection. You can export those rejections alongside the accepted plants to diagnose unexpectedly short results and to give OSM contributors a concrete list of elements to fix.

The easy path: `rejected_output_path`

`process_units` writes a CSV and a sibling GeoJSON file when `rejected_output_path` is set. Rejection data is populated only when data is fetched from the API (cache hits do not re-run the filter), so force a refresh when you need a full report.

from osm_powerplants import process_units, get_config, get_cache_dir

config = get_config()
config["force_refresh"] = True

df = process_units(
    countries=["Kenya"],
    config=config,
    cache_dir=str(get_cache_dir(config)),
    output_path="kenya.csv",
    rejected_output_path="kenya_rejected.csv",    # also writes kenya_rejected.geojson
)

The CSV lists each rejection with:

Column	Description
`id`	`type/id` OSM identifier
`element_type`	`node`, `way`, `relation`
`country`, `lat`, `lon`	Location
`unit_type`	`plant` or `generator`
`reason`	One of the codes below
`keywords`	The offending tag value (e.g. `yes`, `photovoltaic`)
`details`	Free-text context (e.g. raw tag dict)
`url`	Direct link to the OSM element

The GeoJSON has the same fields as feature properties and loads directly into JOSM or QGIS as a triage layer.
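Before loading the layer into an editor, it can help to narrow it to a single reason. The following is a minimal sketch using only the standard library. It assumes the file is an ordinary GeoJSON `FeatureCollection` whose feature properties include `reason`, as described above. The helper name `filter_by_reason`, the sample features, and their coordinates are invented for illustration and are not part of osm-powerplants.

```python
import json


def filter_by_reason(collection: dict, reason: str) -> dict:
    """Return a new FeatureCollection holding only features with the given reason."""
    kept = [
        feature
        for feature in collection.get("features", [])
        # `properties` may legally be null in GeoJSON, hence the `or {}`
        if (feature.get("properties") or {}).get("reason") == reason
    ]
    return {"type": "FeatureCollection", "features": kept}


# Invented sample; in practice this would come from
# json.load(open("kenya_rejected.geojson"))
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [36.8, -1.3]},
            "properties": {"id": "node/1", "reason": "Missing output tag"},
        },
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [37.0, -0.5]},
            "properties": {"id": "way/2", "reason": "Capacity zero"},
        },
    ],
}

subset = filter_by_reason(sample, "Missing output tag")
print(json.dumps(subset, indent=2))
```

Writing `subset` back out with `json.dump` gives a smaller layer that a mapper can work through one reason at a time.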

Rejection reasons

Reason	What to fix in OSM
`Missing output tag`	Add `plant:output:electricity` (e.g. `50 MW`)
`Capacity placeholder value`	Replace stubs like `yes` with a real number
`Capacity regex no match` / `Capacity non-numeric`	Use a parseable format: a number followed by a unit (e.g. `50 MW`)
`Missing source tag`	Add `plant:source`
`Missing technology tag`	Add `plant:method`
`Missing source type` / `Missing technology type`	Extend `source_mapping` / `technology_mapping` in config, or normalise the OSM value
`Capacity zero`	Replace with an actual measurement
`Element within existing plant geometry`	Already counted — a generator polygon lies inside a parent plant
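The table above splits into three kinds of follow-up: fixes that belong in OSM, fixes that may belong in your config instead, and rows that need no action. The sketch below tallies a list of reasons into those buckets. It assumes the `reason` column holds the strings exactly as written in the table; the bucket labels and the `triage` helper are hypothetical names chosen for this example.

```python
from collections import Counter

# Where each fix belongs, following the table above.
# Bucket labels are illustrative, not part of the library.
FIX_LOCATION = {
    "Missing output tag": "osm",
    "Capacity placeholder value": "osm",
    "Capacity regex no match": "osm",
    "Capacity non-numeric": "osm",
    "Capacity zero": "osm",
    "Missing source tag": "osm",
    "Missing technology tag": "osm",
    "Missing source type": "config-or-osm",
    "Missing technology type": "config-or-osm",
    "Element within existing plant geometry": "no-action",
}


def triage(reasons):
    """Count rejections per bucket; unknown reasons land in 'unclassified'."""
    return Counter(FIX_LOCATION.get(reason, "unclassified") for reason in reasons)


# Invented sample of values from the `reason` column
sample = [
    "Missing output tag",
    "Capacity zero",
    "Missing source type",
    "Element within existing plant geometry",
    "Something new",
]

counts = triage(sample)
print(counts)
```

A large `config-or-osm` count suggests checking the mappings in your config before asking contributors to retag anything.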

Analysis

import pandas as pd

rej = pd.read_csv("kenya_rejected.csv")

# Top reasons
print(rej["reason"].value_counts())

# Plants only (filter out generators)
plants = rej[rej["unit_type"] == "plant"]
print(plants[["url", "reason", "details"]].head())

Low-level: re-using a tracker

For custom pipelines, instantiate `RejectionTracker` directly and hand it to a `Workflow`:

from osm_powerplants import Units, get_config, get_cache_dir
from osm_powerplants.quality.rejection import RejectionTracker
from osm_powerplants.retrieval.client import OverpassAPIClient
from osm_powerplants.workflow import Workflow

config = get_config()
config["missing_name_allowed"] = False   # strict mode

tracker = RejectionTracker()
units = Units()

with OverpassAPIClient(cache_dir=str(get_cache_dir(config))) as client:
    workflow = Workflow(client, tracker, units, config)
    workflow.process_country_data("Malta")

print(tracker.get_summary_string())

# Slice by reason
from osm_powerplants.models import RejectionReason
missing = tracker.get_rejections_by_reason(RejectionReason.MISSING_OUTPUT_TAG)

# Export
tracker.generate_report().to_csv("rejections.csv", index=False)
tracker.save_geojson("rejections.geojson")
tracker.save_geojson_by_reasons("output/")   # one file per reason

The feedback loop

osm-powerplants → rejection report → fix in JOSM → upload to OSM → re-run
       ↑                                                              │
       └──────────────────────────────────────────────────────────────┘
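One way to check that the loop is making progress is to compare the rejection reports from two successive runs by their `id` column. This is a minimal sketch with the standard library only. The two inline reports are invented stand-ins for real report files, and `rejected_ids` is a hypothetical helper, not a library function.

```python
import csv
import io


def rejected_ids(csv_text: str) -> set:
    """Collect the `id` column of a rejection report given as CSV text."""
    return {row["id"] for row in csv.DictReader(io.StringIO(csv_text))}


# Invented reports standing in for the CSV files of two successive runs
before = "id,reason\nnode/1,Missing output tag\nway/2,Capacity zero\n"
after = "id,reason\nway/2,Capacity zero\nrelation/3,Missing source tag\n"

ids_before = rejected_ids(before)
ids_after = rejected_ids(after)

resolved = ids_before - ids_after         # no longer in the report
still_open = ids_before & ids_after       # rejected in both runs
newly_rejected = ids_after - ids_before   # appeared since the last run

print("resolved:", sorted(resolved))
print("still open:", sorted(still_open))
print("newly rejected:", sorted(newly_rejected))
```

An `id` that drops out of the report is no longer rejected, but that alone does not prove it was accepted: the element may also have been deleted from OSM, so cross-check against the accepted output when it matters.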

See the Workshop notebook for an end-to-end walkthrough.