Example of Use#

[1]:
import pandas as pd

import powerplantmatching as pm
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
ERROR 1: PROJ: proj_create_from_database: Open of /home/fabian/.miniconda3/share/proj failed

Load open source data published by the Global Energy Observatory, GEO. As you might know, this is not the original format of the database but the standardized format of powerplantmatching.

[2]:
geo = pm.data.GEO()

geo.head()
[2]:
GEO Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
0 Duernrohr Chp Hard Coal CCGT CHP Austria 373.384467 NaN 1985.0 NaN NaN 48.3264 15.9246 NaN NaN NaN NaN NaN GEO-45151
1 Duernrohr Chp Hard Coal CCGT CHP Austria 324.521809 NaN 1985.0 NaN NaN 48.3264 15.9246 NaN NaN NaN NaN NaN GEO-45151
2 Mellach Chp Hard Coal Steam Turbine CHP Austria 226.796491 NaN 1986.0 NaN NaN 46.9115 15.4884 NaN NaN NaN NaN NaN GEO-45150
3 Lenzing Hard Coal NaN PP Austria 11.063243 NaN 1955.0 NaN NaN 47.9767 13.6201 NaN NaN NaN NaN NaN GEO-45719
4 Lenzing Hard Coal NaN PP Austria 19.360676 NaN 1972.0 NaN NaN 47.9767 13.6201 NaN NaN NaN NaN NaN GEO-45719

Load the data published by the ENTSOE which has the same format as the GEO data.

[3]:
entsoe = pm.data.ENTSOE()

entsoe.head()
[3]:
ENTSOE Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
0 Aanekoski Bioenergy NaN PP Finland 260.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 44W-T-YT-000017B 44W-T-YT-000017B
1 Abono Hard Coal NaN PP Spain 561.8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 18WABO2-12345-0N 18WABO2-12345-0N
2 Abono Hard Coal NaN PP Spain 341.7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 18WABO1-12345-0X 18WABO1-12345-0X
3 Abthb Hard Coal NaN PP United Kingdom 1590.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 48WSTN0000ABTHBN 48WSTN0000ABTHBN
4 Abthgt Oil NaN PP United Kingdom 51.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 48WSTN000ABTHGTK 48WSTN000ABTHGTK

Data Inspection#

Whereas various options of inspection are provided by the pandas package, some more powerplant-specific methods are applicable via an accessor ‘powerplant’. It gives you a convenient way to inspect, manipulate the data:

[4]:
geo.powerplant.plot_map(figsize=(11, 8));
_images/example_8_0.png
[5]:
geo.powerplant.lookup().head(20).to_frame()
[5]:
Capacity
Country Fueltype
Albania Hydro 1458.488
Oil 89.855
Austria Hard Coal 990.160
Hydro 7495.837
Natural Gas 1112.211
Oil 2935.705
Wind 0.000
Belgium Hard Coal 1726.788
Hydro 1310.298
Natural Gas 3034.023
Nuclear 4966.076
Oil 1093.076
Waste 366.526
Wind 0.000
Bosnia and Herzegovina Hard Coal 414.872
Hydro 2184.036
Lignite 1226.446
Bulgaria Hard Coal 1742.461
Hydro 2457.894
Lignite 2747.977
[6]:
geo.powerplant.fill_missing_commissioning_years().head()
/tmp/ipykernel_227814/846699350.py:1: DeprecatedWarning: fill_missing_commyears is deprecated as of 0.5.0 and will be removed in 0.6.0. This function was renamed to `fill_missing_commissioning_years`
  geo.powerplant.fill_missing_commyears().head()
[6]:
GEO Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
0 Duernrohr Chp Hard Coal CCGT CHP Austria 373.384467 NaN 1985 1985.0 NaN 48.3264 15.9246 NaN NaN NaN NaN NaN GEO-45151
1 Duernrohr Chp Hard Coal CCGT CHP Austria 324.521809 NaN 1985 1985.0 NaN 48.3264 15.9246 NaN NaN NaN NaN NaN GEO-45151
2 Mellach Chp Hard Coal Steam Turbine CHP Austria 226.796491 NaN 1986 1986.0 NaN 46.9115 15.4884 NaN NaN NaN NaN NaN GEO-45150
3 Lenzing Hard Coal NaN PP Austria 11.063243 NaN 1955 1955.0 NaN 47.9767 13.6201 NaN NaN NaN NaN NaN GEO-45719
4 Lenzing Hard Coal NaN PP Austria 19.360676 NaN 1972 1972.0 NaN 47.9767 13.6201 NaN NaN NaN NaN NaN GEO-45719

Of course the pandas functions are also very convenient:

[7]:
print(f"Total capacity of GEO is: \n  {geo.Capacity.sum()} MW \n")
print(f"The technology types are: \n {geo.Technology.unique()} ")
Total capacity of GEO is:
  621580.8665191324 MW

The technology types are:
 ['CCGT' 'Steam Turbine' nan 'OCGT' 'Reservoir' 'Run-Of-River'
 'Pumped Storage' 'PV' 'CSP']

Incomplete data#

All open databases are so far not complete and cover only a part of overall European powerplants. We perceive the capacity gaps looking at the ENTSOE SO&AF Statistics.

[8]:
stats = pm.data.Capacity_stats()
[9]:
pm.plot.fueltype_totals_bar([geo, entsoe, stats], keys=["ENTSOE", "GEO", "Statistics"]);
_images/example_16_0.png

The gaps for both datasets are unmistakable. Adding both datasets on top of each other would not be a solution, since the intersections of both sources are two high, and the resulting dataset would include many duplicates. A better approach is to merge the incomplete datasets together, respecting intersections and differences of each dataset.

Merging datasets#

Before comparing two lists of power plants, we need to make sure that the data sets are on the same level of aggregation. That is, we ensure that all power plant blocks are aggregated to power plant stations.

[10]:
dfs = [geo.powerplant.aggregate_units(), entsoe.powerplant.aggregate_units()]
intersection = pm.matching.combine_multiple_datasets(dfs)
INFO:powerplantmatching.cleaning:Aggregating blocks in data source 'GEO'.
INFO:powerplantmatching.cleaning:Aggregating blocks in data source 'ENTSOE'.
INFO:powerplantmatching.matching:Comparing data sources `GEO` and `ENTSOE`
[11]:
intersection.head()
[11]:
GEO Name Fueltype Technology Set Country ... Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE ... GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE
0 Fierza Albania Fierzag Hydro Hydro Reservoir Reservoir PP PP Albania Albania ... 0.0 0.0 0.0 0.0 0.0 0.0 {nan, nan, nan, nan} {54W-FIERZ000001A} {GEO-42688} {54W-FIERZ000001A}
1 Dalkia Poznan Karolin Chp Karolin Hard Coal Hard Coal NaN NaN CHP PP Poland Poland ... 0.0 0.0 0.0 0.0 0.0 0.0 {nan, nan, nan} {19W0000000000725, 19W0000000000741} {GEO-42494} {19W0000000000725, 19W0000000000741}
2 Alqueva Alqueva Hydro Hydro Reservoir Pumped Storage PP Store Portugal Portugal ... 0.0 0.0 0.0 0.0 0.0 0.0 {nan, nan} {16WALQUE-------F} {GEO-43534} {16WALQUE-------F}
3 Aguieira Brazil Aguieira Hydro Hydro Reservoir Pumped Storage PP Store Portugal Portugal ... 0.0 0.0 0.0 0.0 0.0 0.0 {nan, nan, nan} {16WAGUIE-------3} {GEO-43566} {16WAGUIE-------3}
4 Zydowo Zydowo Hydro Hydro Pumped Storage Pumped Storage Store Store Poland Poland ... 0.0 0.0 0.0 0.0 0.0 0.0 {nan, nan, nan} {19W0000000002426} {GEO-42470} {19W0000000002426}

5 rows × 36 columns

The result of the matching process is a multi-indexed dataframe. To bring the matched dataframe into a convenient format, we combine the information of the two sources.

[12]:
intersection = intersection.powerplant.reduce_matched_dataframe()
intersection.head()
[12]:
GEO Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
0 Fierzag Hydro Reservoir PP Albania 500.0 NaN 1978.0 2003.0 NaN 42.251390 20.04306 NaN 0.0 0.0 0.0 {54W-FIERZ000001A} {'ENTSOE': {'54W-FIERZ000001A'}, 'GEO': {'GEO-...
1 Karolin Hard Coal NaN CHP Poland 261.0 NaN 1985.0 NaN NaN 52.436300 16.98790 NaN 0.0 0.0 0.0 {19W0000000000725, 19W0000000000741} {'ENTSOE': {'19W0000000000725', '19W0000000000...
2 Alqueva Hydro Pumped Storage Store Portugal 508.0 NaN 2004.0 NaN NaN 38.197500 -7.49640 NaN 0.0 0.0 0.0 {16WALQUE-------F} {'ENTSOE': {'16WALQUE-------F'}, 'GEO': {'GEO-...
3 Aguieira Hydro Pumped Storage Store Portugal 336.0 NaN 1981.0 NaN NaN 40.340200 -8.19700 NaN 0.0 0.0 0.0 {16WAGUIE-------3} {'ENTSOE': {'16WAGUIE-------3'}, 'GEO': {'GEO-...
4 Zydowo Hydro Pumped Storage Store Poland 167.0 NaN 1971.0 NaN NaN 54.024965 16.70690 NaN 0.0 0.0 0.0 {19W0000000002426} {'ENTSOE': {'19W0000000002426'}, 'GEO': {'GEO-...

As you can see in the very last column, we can track which original data entries flew into the resulting one.

We can have a look into the Capacity statistics.

[13]:
pm.plot.fueltype_totals_bar([intersection, stats], keys=["Intersection", "Statistics"]);
_images/example_25_0.png
[14]:
combined = intersection.powerplant.extend_by_non_matched(
    entsoe
).powerplant.extend_by_non_matched(geo)
INFO:powerplantmatching.cleaning:Aggregating blocks in data source 'ENTSOE'.
INFO:powerplantmatching.cleaning:Aggregating blocks in data source 'GEO'.
[15]:
pm.plot.fueltype_totals_bar([combined, stats], keys=["Combined", "Statistics"]);
_images/example_27_0.png

The aggregated capacities roughly match the SO&AF for all conventional powerplants.

Processed Data#

powerplantmatching comes along with already matched data, this includes data from GEO, ENTSOE, OPSD, CARMA, GPD and ESE (ESE, only if you have followed the instructions).

[16]:
m = pm.collection.powerplants()
/tmp/ipykernel_227814/3660418824.py:1: DeprecatedWarning: matched_data is deprecated as of 5.5 and will be removed in 0.6. Use `powerplants` instead.
  m = pm.collection.matched_data()
[17]:
m.powerplant.plot_map(figsize=(11, 8));
_images/example_32_0.png
[18]:
pm.plot.fueltype_totals_bar([m, stats], keys=["Processed", "Statistics"]);
_images/example_33_0.png
[19]:
pm.plot.factor_comparison([m, stats], keys=["Processed", "Statistics"])
/home/fabian/vres/py/powerplantmatching/powerplantmatching/plot.py:230: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  compare.append(
[19]:
(<Figure size 864x648 with 1 Axes>, <AxesSubplot:ylabel='Capacity [GW]'>)
_images/example_34_2.png
[20]:
m.head()
[20]:
Matched Data Name Fueltype Technology Set Country Capacity Efficiency DateIn DateRetrofit DateOut lat lon Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh EIC projectID
id
0 Emsland Nuclear Steam Turbine CHP Germany 1336.000000 0.33 1988.0 1988.0 2022.0 52.481878 7.306658 NaN 0.0 0.0 0.0 {'11WD7KKE-1K--KW5'} {'ENTSOE': {'11WD7KKE-1K--KW5'}, 'OPSD': {'BNA...
1 Mellach Hard Coal Steam Turbine CHP Austria 200.000000 NaN 1986.0 1986.0 2020.0 46.911700 15.488300 NaN 0.0 0.0 0.0 {'14W-WML-KW-----0'} {'BEYONDCOAL': {'BEYOND-AT-11'}, 'ENTSOE': {'1...
2 Eemshaven Hard Coal CCGT PP Netherlands 1604.170304 0.58 2015.0 NaN 2029.0 53.440500 6.861200 NaN 0.0 0.0 0.0 {'49W000000000044-'} {'BEYONDCOAL': {'BEYOND-NL-12'}, 'ENTSOE': {'4...
3 Emile Huchet Hard Coal CCGT PP France 596.493211 NaN 1958.0 2010.0 2022.0 49.152500 6.698100 NaN 0.0 0.0 0.0 {'17W100P100P0345B', '17W100P100P0344D'} {'BEYONDCOAL': {'BEYOND-FR-67'}, 'ENTSOE': {'1...
4 Fusina Hard Coal Steam Turbine PP Italy 899.810470 NaN 1964.0 NaN 2025.0 45.431400 12.245800 NaN 0.0 0.0 0.0 {'26WIMPI-S05FTSNK'} {'BEYONDCOAL': {'BEYOND-IT-24'}, 'ENTSOE': {'2...
[21]:
pd.concat(
    [
        m[m.DateIn.notnull()].groupby("Fueltype").DateIn.count(),
        m[m.DateIn.isna()].fillna(1).groupby("Fueltype").DateIn.count(),
    ],
    keys=["DateIn existent", "DateIn missing"],
    axis=1,
)
[21]:
DateIn existent DateIn missing
Fueltype
Hard Coal 191 10.0
Hydro 1743 1549.0
Lignite 98 9.0
Natural Gas 727 54.0
Nuclear 63 NaN
Oil 62 33.0
Other 103 75.0
Solar 12 191.0
Waste 84 17.0
Wind 220 163.0