Example of Use
[1]:
import powerplantmatching as pm
import pandas as pd
Load open source data published bei the Global Energy Observation, GEO. As you might know, this is not the original format of the database but the standardized format of powerplantmatching.
[2]:
geo = pm.data.GEO()
geo.head()
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
[2]:
GEO | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | DateIn | DateRetrofit | DateMothball | DateOut | lat | lon | EIC | projectID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aarberg | Hydro | Run-Of-River | PP | Switzerland | 10.504635 | NaN | NaN | NaN | NaN | NaN | 1968.0 | 1968.0 | NaN | NaN | 47.03780 | 7.2720 | NaN | GEO45222 |
1 | Aberthaw | Hard Coal | Steam Turbine | PP | United Kingdom | 1382.987474 | NaN | NaN | NaN | NaN | NaN | 1971.0 | 2008.0 | NaN | NaN | 51.38731 | -3.4049 | NaN | GEO1972 |
2 | Ablaincourt pressoir | Wind | NaN | PP | France | NaN | NaN | NaN | NaN | NaN | NaN | 2009.0 | 2009.0 | NaN | NaN | 49.84330 | 2.8017 | NaN | GEO40074 |
3 | Abono | Hard Coal | Steam Turbine | PP | Spain | 849.799703 | NaN | NaN | NaN | NaN | NaN | 1974.0 | 2005.0 | NaN | NaN | 43.55280 | -5.7231 | NaN | GEO43673 |
4 | Abwinden asten | Hydro | Run-Of-River | PP | Austria | 113.856689 | NaN | NaN | NaN | NaN | NaN | 1979.0 | 1979.0 | NaN | NaN | 48.24798 | 14.4305 | NaN | GEO45165 |
Load the data published by the ENTSOE which has the same format as the geo data.
[3]:
entsoe = pm.data.ENTSOE()
entsoe.head()
WARNING:powerplantmatching.core:Geoparsing not possible as no google api key was found, please add the key to your config.yaml if you want to enable it.
[3]:
ENTSOE | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | DateIn | DateRetrofit | DateMothball | DateOut | lat | lon | EIC | projectID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aboño | Hard Coal | NaN | PP | Spain | 341.7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 43.564759 | -5.718301 | 18WABO1-12345-0X | 18WABO1-12345-0X |
1 | Aboño | Hard Coal | NaN | PP | Spain | 535.8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 43.564759 | -5.718301 | 18WABO2-12345-0N | 18WABO2-12345-0N |
2 | Abthb | Hard Coal | NaN | PP | United Kingdom | 1590.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 51.381064 | -2.359017 | 48WSTN0000ABTHBN | 48WSTN0000ABTHBN |
3 | Abthgt | Oil | NaN | PP | United Kingdom | 51.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 55.902400 | -3.643118 | 48WSTN000ABTHGTK | 48WSTN000ABTHGTK |
4 | Abwinden asten | Hydro | NaN | PP | Austria | 168.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 48.218823 | 14.420281 | 14W-BAA-KW-----D | 14W-BAA-KW-----D |
Data Inspection
Whereas various options of inspection of provided by the pandas package, some more powerplant specific methods are applicable via an accessor ‘powerplant’. It gives you a convenient way to inspect, manipulate the data:
[4]:
geo.powerplant.plot_map();

[5]:
geo.powerplant.lookup().head(20).to_frame()
[5]:
Capacity | ||
---|---|---|
Country | Fueltype | |
Albania | Hydro | 1455.726 |
Natural Gas | 88.305 | |
Oil | 162.836 | |
Austria | Hard Coal | 989.297 |
Hydro | 7256.102 | |
Natural Gas | 2883.085 | |
Oil | 481.991 | |
Wind | 0.000 | |
Belgium | Hard Coal | 1971.218 |
Hydro | 1305.442 | |
Natural Gas | 4475.025 | |
Nuclear | 5793.162 | |
Wind | 0.000 | |
Bosnia and Herzegovina | Hydro | 2101.863 |
Lignite | 1629.448 | |
Bulgaria | Hard Coal | 1853.203 |
Hydro | 2271.509 | |
Lignite | 2709.644 | |
Nuclear | 1901.892 | |
Wind | 0.000 |
[6]:
geo.powerplant.fill_missing_commyears().head()
[6]:
GEO | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | DateIn | DateRetrofit | DateMothball | DateOut | lat | lon | EIC | projectID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aarberg | Hydro | Run-Of-River | PP | Switzerland | 10.504635 | NaN | NaN | NaN | NaN | NaN | 1968 | 1968.0 | NaN | NaN | 47.03780 | 7.2720 | NaN | GEO45222 |
1 | Aberthaw | Hard Coal | Steam Turbine | PP | United Kingdom | 1382.987474 | NaN | NaN | NaN | NaN | NaN | 1971 | 2008.0 | NaN | NaN | 51.38731 | -3.4049 | NaN | GEO1972 |
2 | Ablaincourt pressoir | Wind | NaN | PP | France | NaN | NaN | NaN | NaN | NaN | NaN | 2009 | 2009.0 | NaN | NaN | 49.84330 | 2.8017 | NaN | GEO40074 |
3 | Abono | Hard Coal | Steam Turbine | PP | Spain | 849.799703 | NaN | NaN | NaN | NaN | NaN | 1974 | 2005.0 | NaN | NaN | 43.55280 | -5.7231 | NaN | GEO43673 |
4 | Abwinden asten | Hydro | Run-Of-River | PP | Austria | 113.856689 | NaN | NaN | NaN | NaN | NaN | 1979 | 1979.0 | NaN | NaN | 48.24798 | 14.4305 | NaN | GEO45165 |
Of course the pandas function are also very convenient:
[7]:
print('Total capacity of GEO is: \n {} MW \n'.format(geo.Capacity.sum()));
print('The technology types are: \n {} '.format(geo.Technology.unique()))
Total capacity of GEO is:
604029.214937292 MW
The technology types are:
['Run-Of-River' 'Steam Turbine' nan 'CCGT' 'Reservoir' 'OCGT'
'Pumped Storage' 'Reciprocating Engine' 'Oil Engine' 'Gas Turbine'
'CCGT, Thermal' 'Tidal' 'Gas Engines']
Incomplete data
All open databases are so far not complete and cover only an part of overall European powerplants. We perceive the capacity gaps looking at the ENTSOE SO&AF Statistics.
[8]:
stats = pm.data.Capacity_stats()
INFO:powerplantmatching.core:Retrieving data from https://data.open-power-system-data.org/national_generation_capacity/2019-02-22/national_generation_capacity_stacked.csv
[9]:
pm.plot.fueltype_totals_bar([geo, entsoe, stats], keys=["ENTSOE", "GEO", 'Statistics']);

The gaps for both datasets are unmistakable. Adding both datasets on top of each other would not be a solution, since the intersection of both sources are two high, and the resulting dataset would include many duplicates. A better approach is to merge the incomplete datasets together, respecting intersections and differences of each dataset.
Merging datasets
Before comparing two lists of power plants, we need to make sure that the data sets are on the same level of aggretation. That is, we ensure that all power plants blocks are aggregated to powerplant stations.
[10]:
dfs = [geo.powerplant.aggregate_units(), entsoe.powerplant.aggregate_units()]
intersection = pm.matching.combine_multiple_datasets(dfs)
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'GEO'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'GEO'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'ENTSOE'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'ENTSOE'.
INFO:powerplantmatching.matching:Comparing GEO with ENTSOE
[11]:
intersection.head()
[11]:
GEO | Name | Fueltype | Technology | Set | Country | ... | DateOut | lat | lon | EIC | projectID | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GEO | ENTSOE | GEO | ENTSOE | GEO | ENTSOE | GEO | ENTSOE | GEO | ENTSOE | ... | GEO | ENTSOE | GEO | ENTSOE | GEO | ENTSOE | GEO | ENTSOE | GEO | ENTSOE | |
0 | Abwinden asten | Abwinden asten | Hydro | Hydro | Run-Of-River | NaN | PP | PP | Austria | Austria | ... | NaN | NaN | 48.24798 | 48.218823 | 14.430500 | 14.420281 | [nan] | [14W-BAA-KW-----D] | [GEO45165] | [14W-BAA-KW-----D] |
1 | Jaworzno | Jaworzno | Hard Coal | Hard Coal | Steam Turbine | NaN | CHP | PP | Poland | Poland | ... | NaN | NaN | 50.21000 | 50.204987 | 19.215000 | 19.273931 | [nan, nan] | [19W0000000000636, 19W0000000000628] | [GEO40601, GEO40602] | [19W0000000000636, 19W0000000000628] |
2 | Tauron stalowa wola | Stalowa wola | Hard Coal | Hard Coal | Steam Turbine | NaN | PP | PP | Poland | Poland | ... | NaN | NaN | 50.55250 | 50.582600 | 22.081111 | 22.053586 | [nan] | [19W000000000193U] | [GEO42464] | [19W000000000193U] |
3 | Tauron siersza | Siersza | Hard Coal | Hard Coal | Steam Turbine | NaN | PP | PP | Poland | Poland | ... | NaN | NaN | 50.20860 | 50.201076 | 19.460900 | 19.419812 | [nan] | [19W0000000002361] | [GEO42463] | [19W0000000002361] |
4 | Solina | Solina | Hydro | Hydro | Pumped Storage | NaN | Store | PP | Poland | Poland | ... | NaN | NaN | 49.39560 | 49.399332 | 22.454000 | 22.467730 | [nan] | [19W0000000002515] | [GEO42474] | [19W0000000002515] |
5 rows × 38 columns
The result of the matching process is a multiindexed dataframe. To bring the matched dataframe into a convenient format, we combine the information of the two source sources.
[12]:
intersection = intersection.powerplant.reduce_matched_dataframe()
intersection.head()
[12]:
Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | DateIn | DateRetrofit | DateMothball | DateOut | lat | lon | EIC | projectID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Abwinden asten | Hydro | Run-Of-River | PP | Austria | 168.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 1979.0 | 1979.0 | NaN | NaN | 48.218823 | 14.420281 | [14W-BAA-KW-----D] | {'ENTSOE': ['14W-BAA-KW-----D'], 'GEO': ['GEO4... |
1 | Jaworzno | Hard Coal | Steam Turbine | PP | Poland | 1534.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 1977.0 | 1999.0 | NaN | NaN | 50.204987 | 19.273931 | [19W0000000000636, 19W0000000000628] | {'ENTSOE': ['19W0000000000636', '19W0000000000... |
2 | Stalowa wola | Hard Coal | Steam Turbine | PP | Poland | 250.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 1958.0 | 1966.0 | NaN | NaN | 50.582600 | 22.053586 | [19W000000000193U] | {'ENTSOE': ['19W000000000193U'], 'GEO': ['GEO4... |
3 | Siersza | Hard Coal | Steam Turbine | PP | Poland | 557.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 1962.0 | 1993.0 | NaN | NaN | 50.201076 | 19.419812 | [19W0000000002361] | {'ENTSOE': ['19W0000000002361'], 'GEO': ['GEO4... |
4 | Solina | Hydro | Pumped Storage | PP | Poland | 198.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 1968.0 | 2000.0 | NaN | NaN | 49.399332 | 22.467730 | [19W0000000002515] | {'ENTSOE': ['19W0000000002515'], 'GEO': ['GEO4... |
As you can see in the very last column, we can track which original data entries flew into the resulting one.
We can have a look into the Capacity statisitcs
[13]:
pm.plot.fueltype_totals_bar([intersection, stats], keys=["Intersection", 'Statistics']);

[14]:
combined = intersection.powerplant.extend_by_non_matched(entsoe).powerplant.extend_by_non_matched(geo)
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'ENTSOE'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'ENTSOE'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'GEO'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'GEO'.
[15]:
pm.plot.fueltype_totals_bar([combined, stats], keys=["Combined", 'Statistics']);

The aggregated capacities roughly match the SO&AF for all conventional powerplants
Processed Data
powerplantmatching comes along with already matched data, this includes data from GEO, ENTSOE, OPSD, CARMA, GPD and ESE (ESE, only if you have followed the instructions)
[16]:
m = pm.collection.matched_data()
[17]:
m.powerplant.plot_map(figsize=(13,13));

[18]:
pm.plot.fueltype_totals_bar([m, stats], keys=["Processed", 'Statistics']);

[19]:
pm.plot.factor_comparison([m, stats], keys=['Processed', 'Statistics'])
[19]:
(<Figure size 864x648 with 1 Axes>,
<matplotlib.axes._subplots.AxesSubplot at 0x7fc506f9cb10>)

[20]:
m.head()
[20]:
Matched Data | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | DateIn | DateRetrofit | DateMothball | DateOut | lat | lon | EIC | projectID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
0 | Alta | Hydro | Reservoir | Store | Norway | 150.0 | NaN | 389.700000 | 135.0 | 185.0 | 58455.0 | 1987.0 | 1987.0 | NaN | NaN | 69.718538 | 23.807803 | [nan] | {'JRC': ['N278'], 'OPSD': ['OEU5361'], 'ENTSOE... |
1 | Porabka porabka porabka | Hydro | Pumped Storage | Store | Poland | 551.0 | NaN | 3.629764 | 0.0 | 440.0 | 2000.0 | 1953.0 | 1998.0 | NaN | NaN | 49.788898 | 19.206832 | [nan, nan] | {'JRC': ['H1649', 'H50'], 'OPSD': ['OEU301', '... |
2 | Żydowo zydowops zydowo | Hydro | Pumped Storage | Store | Poland | 157.0 | NaN | 42.038217 | 0.0 | 0.0 | 6600.0 | 1971.0 | 1971.0 | NaN | NaN | 54.021671 | 16.738544 | [nan] | {'JRC': ['H315'], 'OPSD': ['OEU332', 'OEU333',... |
3 | Żarnowiec zarnowiec zarnowiec | Hydro | Pumped Storage | Store | Poland | 716.0 | NaN | 8.379888 | 0.0 | 119.3 | 6000.0 | 1983.0 | 2006.0 | NaN | NaN | 54.722272 | 18.082356 | [nan] | {'JRC': ['H36'], 'OPSD': ['OEU330', 'OEU327', ... |
4 | Svartisen | Hydro | Reservoir | Store | Norway | 600.0 | NaN | 8116.390000 | 3506.0 | 543.0 | 4869834.0 | 1993.0 | 2011.0 | NaN | NaN | 66.724624 | 13.941961 | [nan] | {'JRC': ['N250'], 'OPSD': ['OEU5416', 'OEU5417... |
[22]:
pd.concat([m[m.DateIn.notnull()].groupby('Fueltype').DateIn.count(),
m[m.DateIn.isna()].fillna(1).groupby('Fueltype').DateIn.count()],
keys=['DateIn existent', 'DateIn missing'], axis=1)
[22]:
DateIn existent | DateIn missing | |
---|---|---|
Fueltype | ||
Hard Coal | 195 | 30 |
Hydro | 1730 | 3230 |
Lignite | 63 | 13 |
Natural Gas | 505 | 154 |
Nuclear | 64 | 1 |
Oil | 91 | 52 |
Other | 157 | 164 |
Solar | 5 | 90 |
Waste | 75 | 58 |
Wind | 382 | 171 |