Example of Use

[1]:
import powerplantmatching as pm
import pandas as pd

Load open source data published bei the Global Energy Observation, GEO. As you might know, this is not the original format of the database but the standardized format of powerplantmatching.

[2]:
geo = pm.data.GEO()

geo.head()
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
[2]:
GEO Name Fueltype Technology Set Country Capacity Efficiency Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh DateIn DateRetrofit DateMothball DateOut lat lon EIC projectID
0 Aarberg Hydro Run-Of-River PP Switzerland 10.504635 NaN NaN NaN NaN NaN 1968.0 1968.0 NaN NaN 47.03780 7.2720 NaN GEO45222
1 Aberthaw Hard Coal Steam Turbine PP United Kingdom 1382.987474 NaN NaN NaN NaN NaN 1971.0 2008.0 NaN NaN 51.38731 -3.4049 NaN GEO1972
2 Ablaincourt pressoir Wind NaN PP France NaN NaN NaN NaN NaN NaN 2009.0 2009.0 NaN NaN 49.84330 2.8017 NaN GEO40074
3 Abono Hard Coal Steam Turbine PP Spain 849.799703 NaN NaN NaN NaN NaN 1974.0 2005.0 NaN NaN 43.55280 -5.7231 NaN GEO43673
4 Abwinden asten Hydro Run-Of-River PP Austria 113.856689 NaN NaN NaN NaN NaN 1979.0 1979.0 NaN NaN 48.24798 14.4305 NaN GEO45165

Load the data published by the ENTSOE which has the same format as the geo data.

[3]:
entsoe = pm.data.ENTSOE()

entsoe.head()
WARNING:powerplantmatching.core:Geoparsing not possible as no google api key was found, please add the key to your config.yaml if you want to enable it.
[3]:
ENTSOE Name Fueltype Technology Set Country Capacity Efficiency Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh DateIn DateRetrofit DateMothball DateOut lat lon EIC projectID
0 Aboño Hard Coal NaN PP Spain 341.7 NaN NaN NaN NaN NaN NaN NaN NaN NaN 43.564759 -5.718301 18WABO1-12345-0X 18WABO1-12345-0X
1 Aboño Hard Coal NaN PP Spain 535.8 NaN NaN NaN NaN NaN NaN NaN NaN NaN 43.564759 -5.718301 18WABO2-12345-0N 18WABO2-12345-0N
2 Abthb Hard Coal NaN PP United Kingdom 1590.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 51.381064 -2.359017 48WSTN0000ABTHBN 48WSTN0000ABTHBN
3 Abthgt Oil NaN PP United Kingdom 51.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 55.902400 -3.643118 48WSTN000ABTHGTK 48WSTN000ABTHGTK
4 Abwinden asten Hydro NaN PP Austria 168.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 48.218823 14.420281 14W-BAA-KW-----D 14W-BAA-KW-----D

Data Inspection

Whereas various options of inspection of provided by the pandas package, some more powerplant specific methods are applicable via an accessor ‘powerplant’. It gives you a convenient way to inspect, manipulate the data:

[4]:
geo.powerplant.plot_map();
_images/example_8_0.png
[5]:
geo.powerplant.lookup().head(20).to_frame()
[5]:
Capacity
Country Fueltype
Albania Hydro 1455.726
Natural Gas 88.305
Oil 162.836
Austria Hard Coal 989.297
Hydro 7256.102
Natural Gas 2883.085
Oil 481.991
Wind 0.000
Belgium Hard Coal 1971.218
Hydro 1305.442
Natural Gas 4475.025
Nuclear 5793.162
Wind 0.000
Bosnia and Herzegovina Hydro 2101.863
Lignite 1629.448
Bulgaria Hard Coal 1853.203
Hydro 2271.509
Lignite 2709.644
Nuclear 1901.892
Wind 0.000
[6]:
geo.powerplant.fill_missing_commyears().head()
[6]:
GEO Name Fueltype Technology Set Country Capacity Efficiency Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh DateIn DateRetrofit DateMothball DateOut lat lon EIC projectID
0 Aarberg Hydro Run-Of-River PP Switzerland 10.504635 NaN NaN NaN NaN NaN 1968 1968.0 NaN NaN 47.03780 7.2720 NaN GEO45222
1 Aberthaw Hard Coal Steam Turbine PP United Kingdom 1382.987474 NaN NaN NaN NaN NaN 1971 2008.0 NaN NaN 51.38731 -3.4049 NaN GEO1972
2 Ablaincourt pressoir Wind NaN PP France NaN NaN NaN NaN NaN NaN 2009 2009.0 NaN NaN 49.84330 2.8017 NaN GEO40074
3 Abono Hard Coal Steam Turbine PP Spain 849.799703 NaN NaN NaN NaN NaN 1974 2005.0 NaN NaN 43.55280 -5.7231 NaN GEO43673
4 Abwinden asten Hydro Run-Of-River PP Austria 113.856689 NaN NaN NaN NaN NaN 1979 1979.0 NaN NaN 48.24798 14.4305 NaN GEO45165

Of course the pandas function are also very convenient:

[7]:
print('Total capacity of GEO is: \n  {} MW \n'.format(geo.Capacity.sum()));
print('The technology types are: \n {} '.format(geo.Technology.unique()))
Total capacity of GEO is:
  604029.214937292 MW

The technology types are:
 ['Run-Of-River' 'Steam Turbine' nan 'CCGT' 'Reservoir' 'OCGT'
 'Pumped Storage' 'Reciprocating Engine' 'Oil Engine' 'Gas Turbine'
 'CCGT, Thermal' 'Tidal' 'Gas Engines']

Incomplete data

All open databases are so far not complete and cover only an part of overall European powerplants. We perceive the capacity gaps looking at the ENTSOE SO&AF Statistics.

[8]:
stats = pm.data.Capacity_stats()
INFO:powerplantmatching.core:Retrieving data from https://data.open-power-system-data.org/national_generation_capacity/2019-02-22/national_generation_capacity_stacked.csv
[9]:
pm.plot.fueltype_totals_bar([geo, entsoe, stats], keys=["ENTSOE", "GEO", 'Statistics']);
_images/example_16_0.png

The gaps for both datasets are unmistakable. Adding both datasets on top of each other would not be a solution, since the intersection of both sources are two high, and the resulting dataset would include many duplicates. A better approach is to merge the incomplete datasets together, respecting intersections and differences of each dataset.

Merging datasets

Before comparing two lists of power plants, we need to make sure that the data sets are on the same level of aggretation. That is, we ensure that all power plants blocks are aggregated to powerplant stations.

[10]:
dfs = [geo.powerplant.aggregate_units(), entsoe.powerplant.aggregate_units()]
intersection = pm.matching.combine_multiple_datasets(dfs)
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'GEO'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'GEO'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'ENTSOE'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'ENTSOE'.
INFO:powerplantmatching.matching:Comparing GEO with ENTSOE
[11]:
intersection.head()
[11]:
GEO Name Fueltype Technology Set Country ... DateOut lat lon EIC projectID
GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE ... GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE GEO ENTSOE
0 Abwinden asten Abwinden asten Hydro Hydro Run-Of-River NaN PP PP Austria Austria ... NaN NaN 48.24798 48.218823 14.430500 14.420281 [nan] [14W-BAA-KW-----D] [GEO45165] [14W-BAA-KW-----D]
1 Jaworzno Jaworzno Hard Coal Hard Coal Steam Turbine NaN CHP PP Poland Poland ... NaN NaN 50.21000 50.204987 19.215000 19.273931 [nan, nan] [19W0000000000636, 19W0000000000628] [GEO40601, GEO40602] [19W0000000000636, 19W0000000000628]
2 Tauron stalowa wola Stalowa wola Hard Coal Hard Coal Steam Turbine NaN PP PP Poland Poland ... NaN NaN 50.55250 50.582600 22.081111 22.053586 [nan] [19W000000000193U] [GEO42464] [19W000000000193U]
3 Tauron siersza Siersza Hard Coal Hard Coal Steam Turbine NaN PP PP Poland Poland ... NaN NaN 50.20860 50.201076 19.460900 19.419812 [nan] [19W0000000002361] [GEO42463] [19W0000000002361]
4 Solina Solina Hydro Hydro Pumped Storage NaN Store PP Poland Poland ... NaN NaN 49.39560 49.399332 22.454000 22.467730 [nan] [19W0000000002515] [GEO42474] [19W0000000002515]

5 rows × 38 columns

The result of the matching process is a multiindexed dataframe. To bring the matched dataframe into a convenient format, we combine the information of the two source sources.

[12]:
intersection = intersection.powerplant.reduce_matched_dataframe()
intersection.head()
[12]:
Name Fueltype Technology Set Country Capacity Efficiency Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh DateIn DateRetrofit DateMothball DateOut lat lon EIC projectID
0 Abwinden asten Hydro Run-Of-River PP Austria 168.0 NaN 0.0 0.0 0.0 0.0 1979.0 1979.0 NaN NaN 48.218823 14.420281 [14W-BAA-KW-----D] {'ENTSOE': ['14W-BAA-KW-----D'], 'GEO': ['GEO4...
1 Jaworzno Hard Coal Steam Turbine PP Poland 1534.0 NaN 0.0 0.0 0.0 0.0 1977.0 1999.0 NaN NaN 50.204987 19.273931 [19W0000000000636, 19W0000000000628] {'ENTSOE': ['19W0000000000636', '19W0000000000...
2 Stalowa wola Hard Coal Steam Turbine PP Poland 250.0 NaN 0.0 0.0 0.0 0.0 1958.0 1966.0 NaN NaN 50.582600 22.053586 [19W000000000193U] {'ENTSOE': ['19W000000000193U'], 'GEO': ['GEO4...
3 Siersza Hard Coal Steam Turbine PP Poland 557.0 NaN 0.0 0.0 0.0 0.0 1962.0 1993.0 NaN NaN 50.201076 19.419812 [19W0000000002361] {'ENTSOE': ['19W0000000002361'], 'GEO': ['GEO4...
4 Solina Hydro Pumped Storage PP Poland 198.0 NaN 0.0 0.0 0.0 0.0 1968.0 2000.0 NaN NaN 49.399332 22.467730 [19W0000000002515] {'ENTSOE': ['19W0000000002515'], 'GEO': ['GEO4...

As you can see in the very last column, we can track which original data entries flew into the resulting one.

We can have a look into the Capacity statisitcs

[13]:
pm.plot.fueltype_totals_bar([intersection, stats], keys=["Intersection", 'Statistics']);
_images/example_25_0.png
[14]:
combined = intersection.powerplant.extend_by_non_matched(entsoe).powerplant.extend_by_non_matched(geo)
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'ENTSOE'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'ENTSOE'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'GEO'.
INFO:powerplantmatching.cleaning:Not using saved aggregation groups for dataset 'GEO'.
[15]:
pm.plot.fueltype_totals_bar([combined, stats], keys=["Combined", 'Statistics']);
_images/example_27_0.png

The aggregated capacities roughly match the SO&AF for all conventional powerplants

Processed Data

powerplantmatching comes along with already matched data, this includes data from GEO, ENTSOE, OPSD, CARMA, GPD and ESE (ESE, only if you have followed the instructions)

[16]:
m = pm.collection.matched_data()
[17]:
m.powerplant.plot_map(figsize=(13,13));
_images/example_32_0.png
[18]:
pm.plot.fueltype_totals_bar([m, stats], keys=["Processed", 'Statistics']);
_images/example_33_0.png
[19]:
pm.plot.factor_comparison([m, stats], keys=['Processed', 'Statistics'])
[19]:
(<Figure size 864x648 with 1 Axes>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7fc506f9cb10>)
_images/example_34_1.png
[20]:
m.head()
[20]:
Matched Data Name Fueltype Technology Set Country Capacity Efficiency Duration Volume_Mm3 DamHeight_m StorageCapacity_MWh DateIn DateRetrofit DateMothball DateOut lat lon EIC projectID
id
0 Alta Hydro Reservoir Store Norway 150.0 NaN 389.700000 135.0 185.0 58455.0 1987.0 1987.0 NaN NaN 69.718538 23.807803 [nan] {'JRC': ['N278'], 'OPSD': ['OEU5361'], 'ENTSOE...
1 Porabka porabka porabka Hydro Pumped Storage Store Poland 551.0 NaN 3.629764 0.0 440.0 2000.0 1953.0 1998.0 NaN NaN 49.788898 19.206832 [nan, nan] {'JRC': ['H1649', 'H50'], 'OPSD': ['OEU301', '...
2 Żydowo zydowops zydowo Hydro Pumped Storage Store Poland 157.0 NaN 42.038217 0.0 0.0 6600.0 1971.0 1971.0 NaN NaN 54.021671 16.738544 [nan] {'JRC': ['H315'], 'OPSD': ['OEU332', 'OEU333',...
3 Żarnowiec zarnowiec zarnowiec Hydro Pumped Storage Store Poland 716.0 NaN 8.379888 0.0 119.3 6000.0 1983.0 2006.0 NaN NaN 54.722272 18.082356 [nan] {'JRC': ['H36'], 'OPSD': ['OEU330', 'OEU327', ...
4 Svartisen Hydro Reservoir Store Norway 600.0 NaN 8116.390000 3506.0 543.0 4869834.0 1993.0 2011.0 NaN NaN 66.724624 13.941961 [nan] {'JRC': ['N250'], 'OPSD': ['OEU5416', 'OEU5417...
[22]:
pd.concat([m[m.DateIn.notnull()].groupby('Fueltype').DateIn.count(),
          m[m.DateIn.isna()].fillna(1).groupby('Fueltype').DateIn.count()],
          keys=['DateIn existent', 'DateIn missing'], axis=1)
[22]:
DateIn existent DateIn missing
Fueltype
Hard Coal 195 30
Hydro 1730 3230
Lignite 63 13
Natural Gas 505 154
Nuclear 64 1
Oil 91 52
Other 157 164
Solar 5 90
Waste 75 58
Wind 382 171