XGBoost
"XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework." (XGBoost documentation)
Motivation
The core motivation for using XGBoost to generate hourly electricity demand forecasts comes from previous work in the literature. Our approach passes socioeconomic and weather parameters to an XGBoost model to predict hourly electricity demand. XGBoost is fast both to train and to run inference with, which makes it a good baseline to build on in future work.
Features
The following features are used by the model. See serve.py for more details:
```python
from pydantic import BaseModel, Field


class PredictionInput(BaseModel):
    """Define input data for prediction."""

    is_weekend: int = Field(ge=0, le=1)
    hour: int = Field(ge=0, lt=24)
    month: int = Field(ge=1, le=12)
    month_temp_avg: float
    month_temp_rank: int = Field(ge=1, le=12)
    year_electricity_demand_per_capita: float
    year_gdp_ppp: float = Field(gt=0)
    year_temp_percentile_5: float
    year_temp_percentile_95: float
    year_temp_top3: float
```
Temporal
Hour of the day
hour: int = Field(ge=0, lt=24)
Month of the year
month: int = Field(ge=1, le=12)
Weekend indicator
is_weekend: int = Field(ge=0, le=1)
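As an illustration, the three temporal features could be derived from a single timestamp as below. This is a minimal sketch, not the project's code: it assumes timestamps are already in the country's local time and that the weekend is Saturday and Sunday, which does not hold for every country. The function name `temporal_features` is hypothetical.

```python
from datetime import datetime


def temporal_features(ts: datetime) -> dict[str, int]:
    """Derive hour, month and weekend flag from a local-time timestamp."""
    return {
        "hour": ts.hour,    # 0-23, matches Field(ge=0, lt=24)
        "month": ts.month,  # 1-12, matches Field(ge=1, le=12)
        # Assumption: weekend = Saturday (weekday 5) and Sunday (weekday 6).
        "is_weekend": int(ts.weekday() >= 5),
    }


# 15 June 2019 was a Saturday.
print(temporal_features(datetime(2019, 6, 15, 13)))
```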
Electricity Demand
Yearly electricity demand per capita
year_electricity_demand_per_capita: float
Monetary
Yearly gross domestic product (GDP) at purchasing power parity (PPP)
year_gdp_ppp: float = Field(gt=0)
Weather
The grid cells used for these features have a resolution of 0.25° × 0.25° and are bounded by the respective country's borders.
Average temperature for the month
month_temp_avg: float
Calculated based on the temperature in the most populous grid cell.
Temperature rank of the month
month_temp_rank: int = Field(ge=1, le=12)
Calculated based on the temperature in the most populous grid cell.
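A minimal sketch of how `month_temp_avg` and `month_temp_rank` might be computed from one year of temperature readings for the most populous grid cell. The input format (timestamp, temperature pairs), the function name, and the ranking direction (1 = warmest month) are assumptions; the source does not say which end of the ranking is 1.

```python
from collections import defaultdict
from datetime import datetime


def monthly_temperature_features(
    samples: list[tuple[datetime, float]],
) -> dict[int, tuple[float, int]]:
    """Map each month (1-12) to (month_temp_avg, month_temp_rank).

    `samples` holds (timestamp, temperature) pairs for the most populous
    grid cell over one year (assumed input format).
    """
    by_month: dict[int, list[float]] = defaultdict(list)
    for ts, temp in samples:
        by_month[ts.month].append(temp)

    averages = {month: sum(v) / len(v) for month, v in by_month.items()}

    # Assumption: rank 1 is the warmest month and rank 12 the coldest.
    # sorted() is stable, so tied months keep the order they first appeared in.
    ordered = sorted(averages, key=averages.get, reverse=True)
    ranks = {month: i + 1 for i, month in enumerate(ordered)}
    return {month: (averages[month], ranks[month]) for month in averages}
```

Reversing the sort (`reverse=False`) would give the opposite convention, with rank 1 as the coldest month.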
Yearly temperature percentiles
year_temp_percentile_5: float
year_temp_percentile_95: float
Calculated based on the temperature in the most populous grid cell.
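The two percentile features can be computed with the standard library alone. This sketch assumes the input is the year's temperature series for the most populous grid cell; the function name is hypothetical, and the interpolation method is a choice the source does not specify.

```python
from statistics import quantiles


def yearly_temperature_percentiles(temps: list[float]) -> tuple[float, float]:
    """Return (year_temp_percentile_5, year_temp_percentile_95)."""
    # n=20 yields the 19 cut points at 5%, 10%, ..., 95%.
    # Assumption: method="inclusive" (linear interpolation between order
    # statistics, the same convention as numpy.percentile's default).
    cuts = quantiles(temps, n=20, method="inclusive")
    return cuts[0], cuts[-1]
```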
Average temperature in the most populous grid cell
year_temp_top1: float
Average temperature in the 3 most populous grid cells
year_temp_top3: float
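Both `year_temp_top1` and `year_temp_top3` can be read as one operation: sort the country's grid cells by population and average the temperature of the top k. The sketch below assumes each cell carries a population count and a yearly mean temperature, and uses an unweighted mean over the k cells; a population-weighted mean would be another plausible reading. `GridCell` and `top_k_temperature` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    """One 0.25° x 0.25° cell inside the country borders (hypothetical)."""

    population: float
    mean_temp: float  # assumed: yearly mean temperature of the cell


def top_k_temperature(cells: list[GridCell], k: int) -> float:
    """Unweighted average temperature over the k most populous cells."""
    if not 1 <= k <= len(cells):
        raise ValueError("k must be between 1 and the number of cells")
    top = sorted(cells, key=lambda c: c.population, reverse=True)[:k]
    return sum(c.mean_temp for c in top) / k
```

With this helper, `year_temp_top1` would be `top_k_temperature(cells, 1)` and `year_temp_top3` would be `top_k_temperature(cells, 3)`.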
Implementation
You can find all the relevant files in the models/xgboost folder.
inference.py
Run inference for an XGBoost model that outputs electricity demand.
This module loads the input data, loads the pre-trained XGBoost model, performs inference using the loaded model, and saves the results.
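The load, predict, save flow described for `inference.py` could look roughly like the sketch below. It is not the contents of `inference.py`: the CSV input and output files, the `demand` output column, the feature column order (copied from the field order of `PredictionInput`), and all function names are assumptions. `XGBRegressor.load_model` is part of XGBoost's scikit-learn interface; `xgboost` is imported inside `load_model` so the rest of the sketch runs without it. If the model was trained on a pandas DataFrame, XGBoost stores the feature names and may reject plain arrays, so a DataFrame with matching column names would be the safer input in that case.

```python
import csv
from collections.abc import Sequence
from typing import Protocol

# Assumption: column order mirrors the field order of PredictionInput.
FEATURES = [
    "is_weekend", "hour", "month", "month_temp_avg", "month_temp_rank",
    "year_electricity_demand_per_capita", "year_gdp_ppp",
    "year_temp_percentile_5", "year_temp_percentile_95", "year_temp_top3",
]


class Regressor(Protocol):
    """Anything with a scikit-learn style predict method."""

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[float]: ...


def load_model(path: str) -> Regressor:
    """Load a model saved with XGBoost's save_model (requires xgboost)."""
    import xgboost as xgb  # imported lazily so the rest runs without xgboost

    model = xgb.XGBRegressor()
    model.load_model(path)
    return model


def run_inference(model: Regressor, input_csv: str, output_csv: str) -> None:
    """Read feature rows, predict demand, write rows plus a `demand` column."""
    with open(input_csv, newline="") as f:
        rows = list(csv.DictReader(f))

    # One row of floats per record, in the (assumed) training column order.
    X = [[float(row[name]) for name in FEATURES] for row in rows]
    predictions = model.predict(X)

    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[*FEATURES, "demand"])
        writer.writeheader()
        for row, y in zip(rows, predictions):
            writer.writerow({**{k: row[k] for k in FEATURES}, "demand": float(y)})
```

Passing the model in as an argument keeps `run_inference` testable with any object that has a `predict` method, independent of how the model was loaded.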
serve.py
Serve an XGBoost model for electricity demand prediction using FastAPI.
This module provides a REST API service to serve predictions from a pre-trained XGBoost model.
It includes endpoints for health checks, model information, and making predictions based on passed features.
XGBoost.ipynb
A Jupyter notebook containing the code to train and run an XGBoost model based on the Toktarova et al. (2019) data. It includes visualizations and cross-validation.