Data Processing Modules¶

data ¶

Collection of power plant data bases and statistical data

Functions:

BEYONDCOAL –

Importer for the BEYOND COAL database.
BNETZA –

Importer for the database put together by Germany's 'Federal Network Agency' (BNetzA).
CARMA –

Importer for the Carma database.
Capacity_stats –

Standardize the aggregated capacity statistics provided by the ENTSO-E.
EESI –

Get the European Energy Storage Inventory (EESI) dataset.
ENTSOE –

Importer for the list of installed generators provided by the ENTSO-E Transparency Project.
ENTSOE_EIC –

Importer for the meta data given for each ENTSOE entry.
EXTERNAL_DATABASE –

Importer for external custom databases.
GBPT –

Importer for the global bioenergy powerplant tracker from global energy monitor.
GCPT –

Importer for the global coal powerplant tracker from global energy monitor.
GEM –

Get the combined dataset of all GEM (https://globalenergymonitor.org/) datasets.
GEM_GGPT –
GEO –

Importer for the GEO database.
GGPT –

Importer for the global gas powerplant tracker from global energy monitor.
GGTPT –

Importer for the global geothermal powerplant tracker from global energy monitor.
GHPT –

Importer for the global hydropower tracker from global energy monitor.
GHR –

Get the GloHydroRes (GHR) dataset.
GND –

Get the GeoNuclearData (GND) dataset.
GNPT –

Importer for the global nuclear energy powerplant tracker from global energy monitor.
GPD –

Importer for the Global Power Plant Database.
GSPT –

Importer for the global solar powerplant tracker from global energy monitor.
GWPT –

Importer for the global wind powerplant tracker from global energy monitor.
IRENASTAT –

Importer for the IRENASTAT renewable capacity statistics.
IWPDCY –

This data is not yet available. Was extracted manually from the 'International Water Power & Dam Country Yearbook'.
JRC –

Importer for the JRC Hydro-power plants database, retrieved from https://github.com/energy-modelling-toolkit/hydro-power-database.
MASTR –

Get the Marktstammdatenregister (MaStR) dataset.
OPSD –

Importer for the OPSD (Open Power Systems Data) database.
OPSD_VRE –

Importer for the OPSD (Open Power Systems Data) renewables (VRE) database.
OPSD_VRE_country –

Get country specific data from OPSD for renewables, if available.
OSM –

Importer for the OpenStreetMap power plant data.
UBA –

Importer for the UBA Database. Please download the data from https://www.umweltbundesamt.de/dokument/datenbank-kraftwerke-in-deutschland.
WEPP –

Importer for the standardized WEPP (Platts, World Electric Power Plants Database).
WIKIPEDIA –

Importer for the WIKIPEDIA nuclear power plant database.

Attributes:

cget –
net_caps –

cget `module-attribute` ¶

cget = pycountry.countries.get

net_caps `module-attribute` ¶

net_caps = get_config()['display_net_caps']

BEYONDCOAL ¶

BEYONDCOAL(raw=False, update=False, config=None)

Importer for the BEYOND COAL database.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
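
Most importers in this module share the raw, update and config arguments. The following is a minimal usage sketch, assuming the package is installed and importable as powerplantmatching, that the importer is reachable as powerplantmatching.data.BEYONDCOAL, and that get_config is exposed at the package top level. The wrapper name load_beyondcoal is hypothetical. The import is deferred so that defining the helper needs neither the package nor a network connection.

```python
def load_beyondcoal(countries=None, raw=False, update=False):
    """Load the BEYOND COAL data, optionally restricted to some countries."""
    # Deferred import: the package is only needed when the helper is called.
    import powerplantmatching as pm

    # None lets the importer fall back to the default configuration.
    config = None
    if countries is not None:
        # Assumption: keyword overrides such as target_countries are accepted,
        # as in the docstring example get_config(target_countries='Italy').
        config = pm.get_config(target_countries=countries)
    return pm.data.BEYONDCOAL(raw=raw, update=update, config=config)
```

Calling load_beyondcoal("Italy") would then request the processed dataset for Italy only, while raw=True would return the original dataset.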

BNETZA ¶

BNETZA(raw=False, update=False, config=None, header=9, sheet_name='Gesamtkraftwerksliste BNetzA', prune_wind=True, prune_solar=True)

Importer for the database put together by Germany's 'Federal Network Agency' (German: 'Bundesnetzagentur' (BNetzA)). Please download the data from https://www.bundesnetzagentur.de/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/Versorgungssicherheit/Erzeugungskapazitaeten/Kraftwerksliste/kraftwerksliste-node.html.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
header (int, default: 9 ) –

The zero-indexed row in which the column headings are found.

CARMA ¶

CARMA(raw=False, update=False, config=None)

Importer for the Carma database.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

Capacity_stats ¶

Capacity_stats(raw=False, config=None, update=False, source='ENTSO-E SOAF', year=2015)

Standardize the aggregated capacity statistics provided by the ENTSO-E.

Parameters:

year (int, default: 2015 ) –

Year of the data (range usually 2013-2017) (defaults to 2015)
source (str, default: 'ENTSO-E SOAF' ) –

Which statistics source from {'ENTSO-E SOAF', 'ENTSO-E Transparency Platform', 'EUROSTAT', ...} (defaults to 'ENTSO-E SOAF')

Returns:

df ( DataFrame ) –

Capacity statistics per country and fuel-type

EESI ¶

EESI(raw=False, update=False, config=None)

Get the European Energy Storage Inventory (EESI) dataset.

Provided by the European Commission's Joint Research Centre. Contains chemical, electrochemical, thermal and mechanical energy storage technologies in Europe.

https://ses.jrc.ec.europa.eu/storage-inventory-maps

https://ses.jrc.ec.europa.eu/storage-inventory-tool/api/projects

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

ENTSOE ¶

ENTSOE(raw=False, update=False, config=None, entsoe_token=None, entsoe_session=None, **fill_geoposition_kwargs)

Importer for the list of installed generators provided by the ENTSO-E Transparency Project. Geographical information is not given. If update=True, the dataset is parsed through a request to 'https://transparency.entsoe.eu/generation/r2/installedCapacityPerProductionUnit/show'; an internet connection is required. If raw=True, the same request is made, but the unprocessed data is returned.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
entsoe_token –

Security token of the ENTSO-E Transparency platform. If None, it will be read from the config file. A token is required for any update.
entsoe_session –

Whether to pass a session to the ENTSO-E client. This can be useful for some networks with proxy settings. Check the client documentation for more information. This argument is just passed to entsoe.EntsoePandasClient.
fill_geoposition_kwargs –

Keyword arguments passed to fill_geoposition.
Note: the security token refers to the RESTful web API of the ENTSO-E Transparency platform.

ENTSOE_EIC ¶

ENTSOE_EIC(raw=False, update=False, config=None, entsoe_token=None)

Importer for the meta data given for each ENTSOE entry.

This data serves to fill up geographical information. If update=True an internet connection is required.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
entsoe_token –

Security token of the ENTSO-E Transparency platform
Note: the security token refers to the RESTful web API of the ENTSO-E Transparency platform.

EXTERNAL_DATABASE ¶

EXTERNAL_DATABASE(raw=False, update=True, config=None)

Importer for external custom databases.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GBPT ¶

GBPT(raw=False, update=False, config=None)

Importer for the global bioenergy powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GCPT ¶

GCPT(raw=False, update=False, config=None)

Importer for the global coal powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GEM ¶

GEM(raw=False, update=False, config=None)

Get the combined dataset of all GEM (https://globalenergymonitor.org/) datasets.

Parameters:

raw (bool, default: False ) –

Whether to return the raw dataset, by default False
update (bool, default: False ) –

Whether to update the raw dataset, by default False
config (dict, default: None ) –

Custom configuration, by default None

GEM_GGPT ¶

GEM_GGPT(*args, **kwargs)

GEO ¶

GEO(raw=False, update=False, config=None)

Importer for the GEO database.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GGPT ¶

GGPT(raw=False, update=False, config=None)

Importer for the global gas powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GGTPT ¶

GGTPT(raw=False, update=False, config=None)

Importer for the global geothermal powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GHPT ¶

GHPT(raw=False, update=False, config=None)

Importer for the global hydropower tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GHR ¶

GHR(raw=False, update=False, config=None)

Get the GloHydroRes (GHR) dataset.

https://www.nature.com/articles/s41597-025-04975-0

https://zenodo.org/records/14526360

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GND ¶

GND(raw=False, update=False, config=None)

Get the GeoNuclearData (GND) dataset.

https://github.com/cristianst85/GeoNuclearData

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GNPT ¶

GNPT(raw=False, update=False, config=None)

Importer for the global nuclear energy powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GPD ¶

GPD(raw=False, update=False, config=None, filter_other_dbs=True)

Importer for the Global Power Plant Database.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GSPT ¶

GSPT(raw=False, update=False, config=None)

Importer for the global solar powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GWPT ¶

GWPT(raw=False, update=False, config=None)

Importer for the global wind powerplant tracker from global energy monitor.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

IRENASTAT ¶

IRENASTAT(raw=False, update=False, config=None)

Importer for the IRENASTAT renewable capacity statistics.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

IWPDCY ¶

IWPDCY(config=None)

This data is not yet available. Was extracted manually from the 'International Water Power & Dam Country Yearbook'.

Parameters:

config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

JRC ¶

JRC(raw=False, update=False, config=None)

Importer for the JRC Hydro-power plants database, retrieved from https://github.com/energy-modelling-toolkit/hydro-power-database.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

MASTR ¶

MASTR(raw=False, update=False, config=None)

Get the Marktstammdatenregister (MaStR) dataset.

Provided by the German Federal Network Agency (Bundesnetzagentur / BNetzA) and contains data on Germany, Austria and Switzerland.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

OPSD ¶

OPSD(raw=False, update=False, statusDE=None, config=None, **fill_geoposition_kwargs)

Importer for the OPSD (Open Power Systems Data) database.

Parameters:

raw (Boolean, default: False ) –

Whether to return a dictionary of the raw databases.
update –

Whether to update the data from the url.
statusDE (list, default: ['operating', 'reserve', 'special_case'] ) –

Filter DE entries by operational status ['operating', 'shutdown', 'reserve', etc.]
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
fill_geoposition_kwargs –

Keyword arguments for fill_geoposition.

OPSD_VRE ¶

OPSD_VRE(raw=False, update=False, config=None)

Importer for the OPSD (Open Power Systems Data) renewables (VRE) database.

This sqlite database is very big and hence not part of the package. It needs to be obtained from http://data.open-power-system-data.org/renewable_power_plants/.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

OPSD_VRE_country ¶

OPSD_VRE_country(country, raw=False, update=False, config=None)

Get country specific data from OPSD for renewables, if available. Available for DE, FR, PL, CH, DK, CZ and SE (last update: 09/2020).

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
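
Because country-specific data exists only for the codes listed in the description, a caller may want to validate the code before invoking the importer. The sketch below assumes that the importer expects two-letter codes as written above and that it is importable from powerplantmatching.data. The helper names check_country and load_opsd_vre are hypothetical.

```python
# Country codes listed in the description above (last update: 09/2020).
AVAILABLE_COUNTRIES = ("DE", "FR", "PL", "CH", "DK", "CZ", "SE")


def check_country(country):
    """Return the upper-cased code, or raise if it is not in the list."""
    code = country.strip().upper()
    if code not in AVAILABLE_COUNTRIES:
        raise ValueError(f"No OPSD renewables data for {country!r}")
    return code


def load_opsd_vre(country, **kwargs):
    """Validate the code first, then call the importer."""
    code = check_country(country)
    # Deferred import: the package is only needed when the helper is called.
    from powerplantmatching.data import OPSD_VRE_country

    return OPSD_VRE_country(code, **kwargs)
```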

OSM ¶

OSM(raw=False, update=False, config=None)

Importer for the OpenStreetMap power plant data.

Downloads pre-processed OSM data from the osm-powerplants repository.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update (bool, default: False ) –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

Returns:

DataFrame –

Power plant data from OpenStreetMap

UBA ¶

UBA(raw=False, update=False, config=None, header=9, skipfooter=26, prune_wind=True, prune_solar=True)

Importer for the UBA Database. Please download the data from https://www.umweltbundesamt.de/dokument/datenbank-kraftwerke-in-deutschland.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
header (int, default: 9 ) –

The zero-indexed row in which the column headings are found.
skipfooter (int, default: 26 ) –

WEPP ¶

WEPP(raw=False, config=None)

Importer for the standardized WEPP (Platts, World Electric Power Plants Database). This database is not provided by this repository because of its restrictive licence.

Parameters:

raw (Boolean, default: False ) –

Whether to return the original dataset
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

WIKIPEDIA ¶

WIKIPEDIA(raw=False, update=False, config=None)

Importer for the WIKIPEDIA nuclear power plant database.

Parameters:

raw (boolean, default: False ) –

Whether to return the original dataset
update –

Whether to update the data from the url.
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

cleaning ¶

Functions for vertically cleaning a dataset.

Functions:

aggregate_units –

Vertical cleaning of the database. Cleans the "Name"-column, sums up the capacity of powerplant units which are determined to belong to the same plant.
clean_name –

Clean the name of a power plant list.
clean_powerplantname –
clean_technology –

Clean the 'Technology' by condensing down the value into one claim.
cliques –

Locate cliques of units which are determined to belong to the same powerplant.
config_target_key –

Convert a column name to the key that is used to specify the target values in the config.
gather_and_replace –

Search for patterns in multiple columns and return a series of representative keys.
gather_fueltype_info –

Parses a set of columns for distinct fueltype specifications.
gather_set_info –

Parses a set of columns for distinct Set specifications.
gather_specifications –

Parse columns to collect representative keys.
gather_technology_info –

Parses a set of columns for distinct technology specifications.
mode –

Get the most common value of a series.

Attributes:

AGGREGATION_FUNCTIONS –

AGGREGATION_FUNCTIONS `module-attribute` ¶

AGGREGATION_FUNCTIONS = {'Name': mode, 'Fueltype': mode, 'Technology': mode, 'Set': mode, 'Country': mode, 'Capacity': 'sum', 'lat': 'mean', 'lon': 'mean', 'DateIn': 'min', 'DateRetrofit': 'max', 'DateMothball': 'min', 'DateOut': 'max', 'File': mode, 'projectID': set, 'EIC': set, 'Duration': 'sum', 'Volume_Mm3': 'sum', 'DamHeight_m': 'sum', 'StorageCapacity_MWh': 'sum', 'Efficiency': 'sum'}
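
The mapping above pairs each column with a reducer that is applied when several units are merged into one plant. The following plain-Python sketch illustrates the idea on a hypothetical subset of the columns. It is not the library's implementation, which works on dataframes. The names RULES, most_common and aggregate are assumptions made for illustration, and most_common stands in for the module's mode.

```python
from collections import Counter
from statistics import mean


def most_common(values):
    """Return the most frequent value (the first one seen wins on ties)."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical subset of the rules, mirroring the reducers named above.
RULES = {
    "Name": most_common,
    "Fueltype": most_common,
    "Capacity": sum,
    "lat": mean,
    "lon": mean,
    "DateIn": min,
    "DateOut": max,
    "projectID": set,
}


def aggregate(units):
    """Collapse a list of unit records (dicts) into one plant record."""
    return {col: reduce([u[col] for u in units]) for col, reduce in RULES.items()}


units = [
    {"Name": "Alpha", "Fueltype": "Hard Coal", "Capacity": 300.0,
     "lat": 50.0, "lon": 8.0, "DateIn": 1985, "DateOut": 2030, "projectID": "A1"},
    {"Name": "Alpha", "Fueltype": "Hard Coal", "Capacity": 450.0,
     "lat": 51.0, "lon": 9.0, "DateIn": 1990, "DateOut": 2035, "projectID": "A2"},
]
plant = aggregate(units)
```

In this example the capacities add up, the coordinates are averaged, the earliest commissioning year and the latest decommissioning year are kept, and the project identifiers are collected in a set.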

aggregate_units ¶

aggregate_units(df, dataset_name=None, pre_clean_name=False, country_wise=True, config=None, threads=1, **kwargs)

Vertical cleaning of the database. Cleans the "Name"-column, sums up the capacity of powerplant units which are determined to belong to the same plant.

Parameters:

df (Dataframe or string) –

Dataframe or name to use for the resulting database
dataset_name (str, default: None ) –

Specify the name of your df, required if use_saved_aggregation is set to True.
pre_clean_name (Boolean, default: False ) –

Whether to clean the 'Name'-column before aggregating.
country_wise (Boolean, default: True ) –

Whether to aggregate only entries with an identical country.
threads (int, default: 1 ) –

Number of threads to use

clean_name ¶

clean_name(df, config=None)

Clean the name of a power plant list.

Cleans the column "Name" of the database by deleting very frequent words and non-alphanumeric characters. Returns a reduced dataframe with a non-empty Name-column.

Parameters:

df (Dataframe) –

dataframe to be cleaned
config (dict, default: None ) –

Custom configuration, defaults to powerplantmatching.config.get_config().
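
The cleaning step described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the word list FREQUENT_WORDS is invented for the example, and the helpers clean and clean_names are hypothetical names that operate on lists of dicts rather than on a dataframe.

```python
import re

# Hypothetical list of very frequent words to delete.
FREQUENT_WORDS = {"power", "plant", "station", "kraftwerk"}


def clean(name):
    """Strip non-alphanumeric characters and frequent words from a name."""
    # Replace everything that is not a letter, digit or whitespace by a space.
    alnum = re.sub(r"[^\w\s]|_", " ", name)
    words = [w for w in alnum.split() if w.lower() not in FREQUENT_WORDS]
    return " ".join(words)


def clean_names(records):
    """Clean the 'Name' field and drop records whose name ends up empty."""
    cleaned = [{**r, "Name": clean(r["Name"])} for r in records]
    return [r for r in cleaned if r["Name"]]


result = clean_names([{"Name": "Power Plant"}, {"Name": "Kraftwerk Neckar-West (Block 2)"}])
```

The first record is dropped because nothing remains of its name, which mirrors the "reduced dataframe" mentioned above.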

clean_powerplantname ¶

clean_powerplantname(df, config=None)

clean_technology ¶

clean_technology(df, generalize_hydros=False)

Clean the 'Technology' by condensing down the value into one claim. This procedure might reduce the scope of information, however is crucial for comparing different data sources.

Parameters:

search_col (list, default: ['Name', 'Fueltype', 'Technology'] ) –

Specify the columns to be parsed
config (dict, default: None ) –

Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

cliques ¶

cliques(df, dataduplicates)

Locate cliques of units which are determined to belong to the same powerplant. Return the same dataframe with an additional column "grouped" which indicates the group that the powerplant is belonging to.

Parameters:

df (Dataframe or string) –

dataframe or csv-file which should be analysed
dataduplicates (Dataframe or string) –

dataframe or name of the csv-linkfile which determines the link within one dataset
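
The grouping idea can be illustrated without any graph library. Note that the sketch below labels connected components with a union-find structure, which is a simplification: a connected component may be larger than a clique when links are not transitive. The function name group_units and the input format (a list of ids plus a list of linked id pairs) are assumptions made for the example.

```python
def group_units(ids, links):
    """Assign a group label to every id; linked ids share one label."""
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        parent[find(a)] = find(b)

    labels = {}
    grouped = {}
    for i in ids:
        root = find(i)
        # Number the groups in order of first appearance.
        grouped[i] = labels.setdefault(root, len(labels))
    return grouped


grouped = group_units([0, 1, 2, 3, 4], [(0, 1), (1, 2), (3, 4)])
```

The returned mapping plays the role of the additional "grouped" column: units 0, 1 and 2 end up in one group, units 3 and 4 in another.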

config_target_key ¶

config_target_key(column)

Convert a column name to the key that is used to specify the target values in the config.

Parameters:

column (str) –

Name of the column.

Returns:

str –

Name of the key used in the config file.

gather_and_replace ¶

gather_and_replace(df, mapping)

Search for patterns in multiple columns and return a series of representative keys.

The function will return a series of unique identifiers given by the keys of the mapping dictionary. The order in the mapping dictionary determines which representative keys are calculated first. Note that these may be overwritten by the following mappings.

Parameters:

df (DataFrame) –

DataFrame with columns that should be parsed.
mapping (dict) –

Dictionary mapping the representative keys to the regex patterns.
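
The precedence rule described above (later mappings overwrite earlier ones) can be sketched in plain Python. The function gather_keys is a hypothetical stand-in that works on a list of dicts instead of a dataframe. Case-insensitive matching and returning None for rows without a match are assumptions of this sketch.

```python
import re


def gather_keys(rows, mapping):
    """Return one key per row; rows without any match map to None."""
    result = [None] * len(rows)
    for key, pattern in mapping.items():  # dict order defines precedence
        regex = re.compile(pattern, flags=re.IGNORECASE)
        for i, row in enumerate(rows):
            if any(regex.search(str(value)) for value in row.values()):
                result[i] = key  # a later key overwrites an earlier one
    return result


rows = [
    {"Name": "Lippendorf", "Technology": "steam turbine"},
    {"Name": "Irsching CCGT", "Technology": ""},
    {"Name": "Combined cycle steam block", "Technology": ""},
    {"Name": "Unknown", "Technology": ""},
]
mapping = {"Steam Turbine": r"steam", "CCGT": r"ccgt|combined cycle"}
keys = gather_keys(rows, mapping)
```

The third row matches both patterns and receives the key of the later mapping entry.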

gather_fueltype_info ¶

gather_fueltype_info(df, search_col=['Name', 'Fueltype', 'Technology'], config=None)

Parses a set of columns for distinct fueltype specifications.

This function uses the mappings (key -> regex pattern) given by the config under the section target_fueltypes. The representative keys are set if any of the columns in search_col matches the regex pattern.

Parameters:

df (pandas.DataFrame) –

DataFrame to be parsed.
search_col (list, default: ['Name', 'Fueltype', 'Technology'] ) –

Set of columns to be parsed. Must be in df.
config (dict, default: None ) –

Custom configuration, defaults to powerplantmatching.config.get_config().

gather_set_info ¶

gather_set_info(df, search_col=['Name', 'Fueltype', 'Technology'], config=None)

Parses a set of columns for distinct Set specifications.

This function uses the mappings (key -> regex pattern) given by the config under the section target_sets. The representative keys are set if any of the columns in search_col matches the regex pattern.

Parameters:

df (pandas.DataFrame) –

DataFrame to be parsed.
search_col (list, default: ['Name', 'Fueltype', 'Technology'] ) –

Set of columns to be parsed. Must be in df.
config (dict, default: None ) –

Custom configuration, defaults to powerplantmatching.config.get_config().

gather_specifications ¶

gather_specifications(df, target_columns=['Fueltype', 'Technology', 'Set'], parse_columns=['Name', 'Fueltype', 'Technology', 'Set'], config=None)

Parse columns to collect representative keys.

This function will parse the columns specified in parse_columns and collects the representative keys for each row in target_columns. The parsing is based on the config file.

Parameters:

df (DataFrame) –

Power plant dataframe.
target_columns (list, default: ['Fueltype', 'Technology', 'Set'] ) –

Columns where the representative keys will be collected, by default ["Fueltype", "Technology", "Set"]
parse_columns (list, default: ['Name', 'Fueltype', 'Technology', 'Set'] ) –

Columns that should be parsed, by default ["Name", "Fueltype", "Technology", "Set"]
config (dict, default: None ) –

Custom configuration, defaults to powerplantmatching.config.get_config().

Returns:

DataFrame –

gather_technology_info ¶

gather_technology_info(df, search_col=['Name', 'Fueltype', 'Technology', 'Set'], config=None)

Parses a set of columns for distinct technology specifications.

This function uses the mappings (key -> regex pattern) given by the config under the section target_technologies. The representative keys are set if any of the columns in search_col matches the regex pattern.

Parameters:

df (pandas.DataFrame) –

DataFrame to be parsed.
search_col (list, default: ['Name', 'Fueltype', 'Technology', 'Set'] ) –

Set of columns to be parsed. Must be in df.
config (dict, default: None ) –

Custom configuration, defaults to powerplantmatching.config.get_config().

mode ¶

mode(x)

Get the most common value of a series.

matching ¶

Functions for linking and combining different datasets

Functions:

best_matches –

Subsequent to duke() with singlematch=True. Returns reduced list of matches on the basis of the highest score for each duplicated entry.
combine_multiple_datasets –

Duke-based horizontal match of multiple databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame.
compare_two_datasets –

Duke-based horizontal match of two databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame.
cross_matches –

Combines multiple sets of pairs and returns one consistent dataframe.
link_multiple_datasets –

Duke-based horizontal match of multiple databases. Returns the matching indices of the datasets.
reduce_matched_dataframe –

Reduce a matched dataframe to a unique set of columns. For each entry take the value of the most reliable data source included in that match.

best_matches ¶

best_matches(links)

Subsequent to duke() with singlematch=True. Returns reduced list of matches on the basis of the highest score for each duplicated entry.

Parameters:

links (DataFrame) –

Links as returned by duke
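
The reduction to the highest-scoring link can be sketched as follows. The input format (tuples of two ids and a score) and the choice to treat the second id as the duplicated entry are assumptions of this example. The name keep_best is hypothetical.

```python
def keep_best(links):
    """links: iterable of (id_a, id_b, score); keep the top link per id_b."""
    best = {}
    for id_a, id_b, score in links:
        # Replace the stored link only if the new one scores strictly higher.
        if id_b not in best or score > best[id_b][2]:
            best[id_b] = (id_a, id_b, score)
    return list(best.values())


links = [(0, 10, 0.7), (1, 10, 0.9), (2, 11, 0.8)]
reduced = keep_best(links)
```

Entry 10 was linked twice, and only the link with score 0.9 survives.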

combine_multiple_datasets ¶

combine_multiple_datasets(datasets, labels=None, config=None, **dukeargs)

Duke-based horizontal match of multiple databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to identify the same powerplant in different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link to the other database. This leads to unique entries in the resulting dataframe.

Parameters:

datasets (list of pandas.Dataframe or strings) –

dataframes or csv-files to use for the matching
labels (list of strings, default: None ) –

Names of the databases in alphabetical order and corresponding order to the datasets

compare_two_datasets ¶

compare_two_datasets(dfs, labels, country_wise=True, config=None, **dukeargs)

Duke-based horizontal match of two databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to identify the same powerplant in two different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link, in order to obtain unique entries in the resulting dataframe. Attention: when aborting this command, the duke process will continue in the background; wait until the process is finished before restarting.

Parameters:

dfs (list of pandas.Dataframe or strings) –

dataframes or csv-files to use for the matching
labels (list of strings) –

Names of the databases for the resulting dataframe

cross_matches ¶

cross_matches(sets_of_pairs, labels=None)

Combines multiple sets of pairs and returns one consistent dataframe. Identifiers of two datasets can appear in one row even though they did not match directly but indirectly through a connecting identifier of another database.

Parameters:

sets_of_pairs (list) –

list of pd.Dataframe's containing only the matches (without scores), obtained from the linkfile (duke() and best_matches())
labels (list of strings, default: None ) –

list of names of the databases, used for specifying the order of the output
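
Indirect matches can be resolved by treating every (label, id) combination as a node and merging nodes that appear in the same pair. The sketch below uses a union-find structure for this. It is a simplification under stated assumptions: each pair is given as a dict with exactly two labels, the name cross_match is hypothetical, and if two ids of the same dataset end up in one group the later one silently wins.

```python
def cross_match(sets_of_pairs, labels):
    """Merge pairwise matches into rows with one id per dataset label."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for pairs in sets_of_pairs:
        for pair in pairs:  # a pair maps two labels to their matched ids
            (label_a, id_a), (label_b, id_b) = pair.items()
            parent[find((label_a, id_a))] = find((label_b, id_b))

    groups = {}
    for label, ident in list(parent):
        groups.setdefault(find((label, ident)), {})[label] = ident
    # One row per group, ordered by the given labels; None marks a gap.
    return [[group.get(label) for label in labels] for group in groups.values()]


ab = [{"A": 1, "B": 7}]
bc = [{"B": 7, "C": 3}, {"B": 8, "C": 4}]
rows = cross_match([ab, bc], ["A", "B", "C"])
```

Here the ids 1 (dataset A) and 3 (dataset C) land in the same row although they were never matched directly; they are connected through id 7 of dataset B.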

link_multiple_datasets ¶

link_multiple_datasets(datasets, labels, use_saved_matches=False, config=None, **dukeargs)

Duke-based horizontal match of multiple databases. Returns the matching indices of the datasets. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to identify the same powerplant in different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link to the other database. This leads to unique entries in the resulting dataframe.

Parameters:

datasets (list of pandas.Dataframe or strings) –

dataframes or csv-files to use for the matching
labels (list of strings) –

Names of the databases in alphabetical order and corresponding order to the datasets

reduce_matched_dataframe ¶

reduce_matched_dataframe(df, show_orig_names=False, config=None)

Reduce a matched dataframe to a unique set of columns. For each entry take the value of the most reliable data source included in that match.

Parameters:

df (Dataframe) –

MultiIndex dataframe with the matched powerplants, as obtained from combined_dataframe() or match_multiple_datasets()
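
The rule "take the value of the most reliable data source" can be illustrated with a small plain-Python sketch. The reliability scores, the nested-dict input format and the name reduce_match are all assumptions made for this example. Skipping missing values before ranking the sources is likewise an assumption.

```python
# Hypothetical reliability scores; a higher score means a more trusted source.
RELIABILITY = {"OPSD": 5, "GEO": 3, "CARMA": 1}


def reduce_match(match):
    """match: {column: {source: value}}; pick the most reliable known value."""
    reduced = {}
    for column, by_source in match.items():
        # Ignore sources that have no value for this column.
        candidates = {s: v for s, v in by_source.items() if v is not None}
        if candidates:
            best = max(candidates, key=RELIABILITY.__getitem__)
            reduced[column] = candidates[best]
        else:
            reduced[column] = None
    return reduced


match = {
    "Name": {"OPSD": "Isar 2", "GEO": "Isar II", "CARMA": "Isar"},
    "Capacity": {"OPSD": None, "GEO": 1410.0, "CARMA": 1400.0},
}
reduced = reduce_match(match)
```

The name comes from the highest-ranked source, while the capacity falls back to the next source because the highest-ranked one has no value.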

collection ¶

Processed datasets of merged and/or adjusted data

Functions:

collect –

Return the collection for a given list of datasets in matched or reduced form.
matched_data –
powerplants –

Return the full matched dataset including all data sources listed in config.yaml/matching_sources.

collect ¶

collect(datasets, update=False, reduced=True, config=None, **dukeargs)

Return the collection for a given list of datasets in matched or reduced form.

Parameters:

datasets (list or str) –

list containing the dataset identifiers as str, or single str
update (bool, default: False ) –

Do a horizontal update (True) or read from the cache file (False)
reduced (bool, default: True ) –

Switch as to return the reduced (True) or matched (False) dataset.
config (dict, default: None ) –

Configuration file of powerplantmatching
**dukeargs (keyword-args for duke, default: {} ) –

matched_data ¶

matched_data(config=None, update=False, from_url=False, extend_by_vres=False, extendby_kwargs={}, extend_by_kwargs={}, fill_geopositions=True, filter_missing_geopositions=True, **collection_kwargs)

powerplants ¶

powerplants(config=None, config_update=None, update=False, from_url=False, extend_by_vres=False, extendby_kwargs={}, extend_by_kwargs={}, fill_geopositions=True, filter_missing_geopositions=True, **collection_kwargs)

Return the full matched dataset including all data sources listed in config.yaml/matching_sources. The combined data is additionally extended by non-matched entries of sources given in config.yaml/fully_included_sources.

Parameters:

update (Boolean, default: False ) –

Whether to rerun the matching process. Overrides stored to False
if True.

from_url –

Whether to parse and store the already built data from the repo
website.

config (Dict, default: None ) –

Define a configuration varying from the setting in config.yaml.
Relevant keywords are 'matching_sources', 'fully_included_sources'.

config_update (Dict, default: None ) –

Configuration input dictionary to be merged into the default
configuration data

extend_by_vres (Boolean, default: False ) –

Whether to extend the dataset by variable renewable energy sources
given by powerplantmatching.data.OPSD_VRE()

extendby_kwargs (Dict, default: {} ) –

Dict of keyword arguments passed to
powerplantmatching.heuristics.extend_by_non_matched

fill_geopositions –

Whether to fill geo coordinates by calling
`df.powerplant.fill_geoposition()` after the matching process
and before the optional extension by VRES. Only active if
`update` is true.

filter_missing_geopositions –

Whether to filter out resulting entries without geo coordinates. The
filtering happens after the matching process and the optional filling of
geo coordinates and before the optional extension by VRES. Only active
if `update` is true.

**collection_kwargs (kwargs, default: {} ) –

Arguments passed to powerplantmatching.collection.Collection.
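
A minimal usage sketch for the function documented above, assuming the package is installed and that powerplants can be imported from powerplantmatching.collection. The wrapper name load_powerplants is hypothetical, and passing 'target_countries' through config_update is an assumption based on the configuration key used in the get_config examples elsewhere on this page. The import is deferred so that defining the helper requires neither the package nor a network connection.

```python
def load_powerplants(countries=None, from_url=True):
    """Fetch the matched dataset, optionally restricted to some countries."""
    # Deferred import: the package is only needed when the helper is called.
    from powerplantmatching.collection import powerplants

    config_update = None
    if countries is not None:
        # Merged into the default configuration by the library.
        config_update = {"target_countries": countries}
    return powerplants(config_update=config_update, from_url=from_url)
```

With from_url=True the already built data is parsed from the repository website instead of rerunning the matching process, which is what update=True would trigger.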
