Enedis Data Challenge
Reconstruction of missing electricity consumption data using Machine Learning and Time Series Analysis.
Objective: Develop a robust algorithm to reconstruct missing values in 1,000 electricity load curves (Linky-like data). The dataset contained approximately 69,000 synthetic curves generated by DeepCourbogen. The challenge was to restore coherent temporal dynamics, evaluated by the Mean Absolute Error (MAE) metric calculated exclusively on missing points.
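To make the metric concrete, here is a minimal sketch of an MAE restricted to the missing points. The function name `masked_mae` and the boolean `missing_mask` argument are illustrative assumptions, not the challenge's official scoring code.

```python
import numpy as np

def masked_mae(y_true, y_pred, missing_mask):
    """Mean absolute error computed only where missing_mask is True."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    missing_mask = np.asarray(missing_mask, dtype=bool)
    if not missing_mask.any():
        raise ValueError("no missing points to score")
    return float(np.mean(np.abs(y_true[missing_mask] - y_pred[missing_mask])))

# Toy example: only positions 1 and 3 were missing, so only they are scored.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.5, 3.0, 3.0])
mask = np.array([False, True, False, True])
score = masked_mae(y_true, y_pred, mask)  # (0.5 + 1.0) / 2
```

Errors on observed points do not affect the score, so a method is judged only on how well it fills the gaps.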
The Time Series Challenge
Electricity consumption is highly periodic. A standard integer encoding of the hour creates an artificial discontinuity at midnight: 23:30 (hour 23) and 00:00 (hour 0) are 23 units apart, even though the two moments are physically adjacent.
Integer encoding (23h → 0h): Distance = 23 (Huge Gap)
Cyclic projection (23h → 0h): Distance ≈ 0 (Continuous)
# Cyclic Time Encoding (h = hour of day in [0, 24), fractional for half-hour steps)
import numpy as np
h_sin = np.sin(2 * np.pi * h / 24)
h_cos = np.cos(2 * np.pi * h / 24)
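The effect of the projection on distances can be checked numerically. The sketch below is only an illustration: the helper names `cyclic_encode` and `cyclic_distance` are hypothetical, and the distance is taken as the Euclidean distance between the (sin, cos) pairs.

```python
import numpy as np

def cyclic_encode(h, period=24.0):
    """Map an hour of day to a point (sin, cos) on the unit circle."""
    angle = 2.0 * np.pi * np.asarray(h, dtype=float) / period
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

def cyclic_distance(h1, h2, period=24.0):
    """Euclidean distance between two hours after cyclic encoding."""
    return float(np.linalg.norm(cyclic_encode(h1, period) - cyclic_encode(h2, period)))

d_raw = abs(23 - 0)               # integer encoding: 23 units apart
d_cyc = cyclic_distance(23, 0)    # chord length 2 * sin(pi / 24), about 0.26
d_noon = cyclic_distance(0, 12)   # opposite points on the circle: 2.0
```

After encoding, 23h and 0h are about 0.26 apart on the unit circle, while midnight and noon sit at the maximum distance of 2, which matches the physical intuition.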
1. Weighted k-NN
Standard interpolation fails on complex consumption patterns. We implemented a custom k-NN regressor that, for each incomplete curve, selects its 5 nearest neighbors using only the time steps where the curve has valid (non-missing) values.
- Calculates vertical bias (offset) to adjust neighbor curves.
- Applies inverse distance weighting for final prediction.
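The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the function name `knn_impute`, the RMS distance over observed points, the use of fully observed reference curves, and the `eps` smoothing term are all choices made for the example.

```python
import numpy as np

def knn_impute(target, reference, k=5, eps=1e-8):
    """Fill NaNs in `target` from its k nearest curves in `reference`.

    target:    1-D array with NaN at missing time steps
    reference: 2-D array (n_curves, n_steps) of complete curves (assumption)
    """
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    observed = ~np.isnan(target)
    if not observed.any():
        raise ValueError("target has no valid points to match on")

    # Distance uses only the time steps where the target is observed.
    diff = reference[:, observed] - target[observed]
    dist = np.sqrt(np.mean(diff ** 2, axis=1))
    nearest = np.argsort(dist)[: min(k, reference.shape[0])]

    # Vertical bias: mean(target - neighbor) over observed points.
    offsets = -diff[nearest].mean(axis=1)
    adjusted = reference[nearest] + offsets[:, None]

    # Inverse distance weighting of the offset-corrected neighbors.
    weights = 1.0 / (dist[nearest] + eps)
    weights /= weights.sum()
    estimate = weights @ adjusted

    filled = target.copy()
    filled[~observed] = estimate[~observed]
    return filled

# Toy data: six shifted copies of one daily shape, target with a gap.
t = np.arange(48)
base = np.sin(2 * np.pi * t / 48)
reference = np.stack([base + c for c in range(6)])
target = base + 0.3
target[10:15] = np.nan
filled = knn_impute(target, reference, k=5)
```

In this toy case every neighbor has the same shape as the target, so the offset correction alone recovers the gap exactly; on real curves the weighting matters because neighbors differ in shape as well as level.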
2. Matrix Completion
We assumed the consumption matrix is approximately low-rank, since users share common behaviors, and completed it with the SoftImpute algorithm.
- Iterative SVD with singular value thresholding.
- Captures global trends across all 69,000 curves simultaneously.
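As an illustration of the idea, here is a small NumPy sketch of a SoftImpute-style iteration. It is not the code used in the challenge: the function name `soft_impute`, the zero initialization, the regularization value `lam`, and the toy rank-2 data are assumptions for the example. A full SVD per iteration is also unlikely to scale to 69,000 curves, where a truncated SVD would probably be needed.

```python
import numpy as np

def soft_impute(X, lam=1.0, max_iter=200, tol=1e-5):
    """Complete a matrix with NaNs by iterative soft-thresholded SVD."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    Z = np.zeros_like(X)  # current low-rank estimate (zero start is an assumption)
    for _ in range(max_iter):
        # Keep observed entries, use the current estimate elsewhere.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold singular values
        Z_new = (U * s_shrunk) @ Vt
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    # Observed values are returned untouched; only gaps are filled.
    return np.where(observed, X, Z)

# Toy data: rank-2 matrix with about 20% of entries hidden.
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))
missing = rng.random(A.shape) < 0.2
X = A.copy()
X[missing] = np.nan

completed = soft_impute(X, lam=5.0)
mae_soft = np.mean(np.abs(completed[missing] - A[missing]))
mae_zero = np.mean(np.abs(A[missing]))  # baseline: fill gaps with zeros
```

The threshold `lam` trades bias for rank: larger values discard more singular directions and shrink the retained ones, so in practice it would be tuned on held-out points.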