Data Processing Modules

data

Collection of power plant data bases and statistical data

Functions:

  • BEYONDCOAL

    Importer for the BEYOND COAL database.

  • BNETZA

    Importer for the database put together by Germany's 'Federal Network Agency' (German: 'Bundesnetzagentur' (BNetzA)).

  • CARMA

    Importer for the Carma database.

  • Capacity_stats

    Standardize the aggregated capacity statistics provided by the ENTSO-E.

  • EESI

    Get the European Energy Storage Inventory (EESI) dataset.

  • ENTSOE

    Importer for the list of installed generators provided by the ENTSO-E Transparency Project.

  • ENTSOE_EIC

    Importer for the meta data given for each ENTSOE entry.

  • EXTERNAL_DATABASE

    Importer for external custom databases.

  • GBPT

    Importer for the global bioenergy powerplant tracker from global energy monitor.

  • GCPT

    Importer for the global coal powerplant tracker from global energy monitor.

  • GEM

    Get the combined dataset of all GEM (https://globalenergymonitor.org/) datasets.

  • GEM_GGPT
  • GEO

    Importer for the GEO database.

  • GGPT

    Importer for the global gas powerplant tracker from global energy monitor.

  • GGTPT

    Importer for the global geothermal powerplant tracker from global energy monitor.

  • GHPT

    Importer for the global hydropower tracker from global energy monitor.

  • GHR

    Get the GloHydroRes (GHR) dataset.

  • GND

    Get the GeoNuclearData (GND) dataset.

  • GNPT

    Importer for the global nuclear energy powerplant tracker from global energy monitor.

  • GPD

    Importer for the Global Power Plant Database.

  • GSPT

    Importer for the global solar powerplant tracker from global energy monitor.

  • GWPT

    Importer for the global wind powerplant tracker from global energy monitor.

  • IRENASTAT

    Importer for the IRENASTAT renewable capacity statistics.

  • IWPDCY

    This data is not yet available. It was extracted manually from the 'International Water Power & Dam Country Yearbook'.

  • JRC

    Importer for the JRC Hydro-power plants database, retrieved from https://github.com/energy-modelling-toolkit/hydro-power-database.

  • MASTR

    Get the Marktstammdatenregister (MaStR) dataset.

  • OPSD

    Importer for the OPSD (Open Power Systems Data) database.

  • OPSD_VRE

    Importer for the OPSD (Open Power Systems Data) renewables (VRE) database.

  • OPSD_VRE_country

    Get country specific data from OPSD for renewables, if available.

  • OSM

    Importer for the OpenStreetMap power plant data.

  • UBA

    Importer for the UBA Database. Please download the data from https://www.umweltbundesamt.de/dokument/datenbank-kraftwerke-in-deutschland.

  • WEPP

    Importer for the standardized WEPP (Platts, World Electric Power Plants Database).

  • WIKIPEDIA

    Importer for the WIKIPEDIA nuclear power plant database.

Attributes:

cget module-attribute

cget = get

net_caps module-attribute

net_caps = get_config()['display_net_caps']

BEYONDCOAL

BEYONDCOAL(raw=False, update=False, config=None)

Importer for the BEYOND COAL database.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

BNETZA

BNETZA(raw=False, update=False, config=None, header=9, sheet_name='Gesamtkraftwerksliste BNetzA', prune_wind=True, prune_solar=True)

Importer for the database put together by Germany's 'Federal Network Agency' (German: 'Bundesnetzagentur' (BNetzA)). Please download the data from https://www.bundesnetzagentur.de/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/Versorgungssicherheit/Erzeugungskapazitaeten/Kraftwerksliste/kraftwerksliste-node.html.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

  • header (int, default: 9 ) –

    The zero-indexed row in which the column headings are found.

CARMA

CARMA(raw=False, update=False, config=None)

Importer for the Carma database.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

Capacity_stats

Capacity_stats(raw=False, config=None, update=False, source='ENTSO-E SOAF', year=2015)

Standardize the aggregated capacity statistics provided by the ENTSO-E.

Parameters:

  • year (int, default: 2015 ) –

    Year of the data (range usually 2013-2017).

  • source (str, default: 'ENTSO-E SOAF' ) –

    Which statistics source to use, from {'ENTSO-E Transparency Platform', 'EUROSTAT', ...}.

Returns:

  • df ( DataFrame ) –

    Capacity statistics per country and fuel-type

EESI

EESI(raw=False, update=False, config=None)

Get the European Energy Storage Inventory (EESI) dataset.

Provided by the European Commission's Joint Research Centre. Contains chemical, electrochemical, thermal and mechanical energy storage technologies in Europe.

https://ses.jrc.ec.europa.eu/storage-inventory-maps

https://ses.jrc.ec.europa.eu/storage-inventory-tool/api/projects

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

ENTSOE

ENTSOE(raw=False, update=False, config=None, entsoe_token=None, entsoe_session=None, **fill_geoposition_kwargs)

Importer for the list of installed generators provided by the ENTSO-E Transparency Project. Geographical information is not given. If update=True, the dataset is parsed through a request to 'https://transparency.entsoe.eu/generation/r2/installedCapacityPerProductionUnit/show', which requires an internet connection. If raw=True, the same request is made, but the unprocessed data is returned.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

  • entsoe_token

    Security token of the ENTSO-E Transparency platform. If None, it will be read from the config file. A token is required for any update.

  • entsoe_session

    Whether to pass a session to the ENTSO-E client. This can be useful for some networks with proxy settings. Check the client documentation for more information. This argument is just passed to entsoe.EntsoePandasClient.

  • fill_geoposition_kwargs

    Keyword arguments passed to fill_geoposition.

ENTSOE_EIC

ENTSOE_EIC(raw=False, update=False, config=None, entsoe_token=None)

Importer for the meta data given for each ENTSOE entry.

This data serves to fill up geographical information. If update=True an internet connection is required.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

  • entsoe_token

    Security token of the ENTSO-E Transparency platform

EXTERNAL_DATABASE

EXTERNAL_DATABASE(raw=False, update=True, config=None)

Importer for external custom databases.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GBPT

GBPT(raw=False, update=False, config=None)

Importer for the global bioenergy powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GCPT

GCPT(raw=False, update=False, config=None)

Importer for the global coal powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GEM

GEM(raw=False, update=False, config=None)

Get the combined dataset of all GEM (https://globalenergymonitor.org/) datasets.

Parameters:

  • raw (bool, default: False ) –

    Whether to return the raw dataset.

  • update (bool, default: False ) –

    Whether to update the raw dataset.

  • config (dict, default: None ) –

    Custom configuration.

GEM_GGPT

GEM_GGPT(*args, **kwargs)

GEO

GEO(raw=False, update=False, config=None)

Importer for the GEO database.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GGPT

GGPT(raw=False, update=False, config=None)

Importer for the global gas powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GGTPT

GGTPT(raw=False, update=False, config=None)

Importer for the global geothermal powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GHPT

GHPT(raw=False, update=False, config=None)

Importer for the global hydropower tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GHR

GHR(raw=False, update=False, config=None)

Get the GloHydroRes (GHR) dataset.

https://www.nature.com/articles/s41597-025-04975-0

https://zenodo.org/records/14526360

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GND

GND(raw=False, update=False, config=None)

Get the GeoNuclearData (GND) dataset.

https://github.com/cristianst85/GeoNuclearData

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GNPT

GNPT(raw=False, update=False, config=None)

Importer for the global nuclear energy powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GPD

GPD(raw=False, update=False, config=None, filter_other_dbs=True)

Importer for the Global Power Plant Database.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GSPT

GSPT(raw=False, update=False, config=None)

Importer for the global solar powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

GWPT

GWPT(raw=False, update=False, config=None)

Importer for the global wind powerplant tracker from global energy monitor.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

IRENASTAT

IRENASTAT(raw=False, update=False, config=None)

Importer for the IRENASTAT renewable capacity statistics.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

IWPDCY

IWPDCY(config=None)

This data is not yet available. Was extracted manually from the 'International Water Power & Dam Country Yearbook'.

Parameters:

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

JRC

JRC(raw=False, update=False, config=None)

Importer for the JRC Hydro-power plants database, retrieved from https://github.com/energy-modelling-toolkit/hydro-power-database.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

MASTR

MASTR(raw=False, update=False, config=None)

Get the Marktstammdatenregister (MaStR) dataset.

Provided by the German Federal Network Agency (Bundesnetzagentur / BNetzA) and contains data on Germany, Austria and Switzerland.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

OPSD

OPSD(raw=False, update=False, statusDE=None, config=None, **fill_geoposition_kwargs)

Importer for the OPSD (Open Power Systems Data) database.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return a dictionary of the raw databases.

  • update

    Whether to update the data from the url.

  • statusDE (list, default: ['operating', 'reserve', 'special_case'] ) –

    Filter DE entries by operational status ['operating', 'shutdown', 'reserve', etc.]

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

  • fill_geoposition_kwargs

    Keyword arguments for fill_geoposition.

OPSD_VRE

OPSD_VRE(raw=False, update=False, config=None)

Importer for the OPSD (Open Power Systems Data) renewables (VRE) database.

This sqlite database is very large and hence not part of the package. It needs to be obtained from http://data.open-power-system-data.org/renewable_power_plants/.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

OPSD_VRE_country

OPSD_VRE_country(country, raw=False, update=False, config=None)

Get country specific data from OPSD for renewables, if available. Available for DE, FR, PL, CH, DK, CZ and SE (last update: 09/2020).

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

OSM

OSM(raw=False, update=False, config=None)

Importer for the OpenStreetMap power plant data.

Downloads pre-processed OSM data from the osm-powerplants repository.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update (bool, default: False ) –

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

Returns:

  • DataFrame

    Power plant data from OpenStreetMap

UBA

UBA(raw=False, update=False, config=None, header=9, skipfooter=26, prune_wind=True, prune_solar=True)

Importer for the UBA Database. Please download the data from https://www.umweltbundesamt.de/dokument/datenbank-kraftwerke-in-deutschland.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

  • header (int, default: 9 ) –

    The zero-indexed row in which the column headings are found.

  • skipfooter (int, default: 26 ) –

WEPP

WEPP(raw=False, config=None)

Importer for the standardized WEPP (Platts, World Electric Power Plants Database). This database is not provided by this repository because of its restrictive licence.

Parameters:

  • raw (Boolean, default: False ) –

    Whether to return the original dataset

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

WIKIPEDIA

WIKIPEDIA(raw=False, update=False, config=None)

Importer for the WIKIPEDIA nuclear power plant database.

Parameters:

  • raw (boolean, default: False ) –

    Whether to return the original dataset

  • update

    Whether to update the data from the url.

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

cleaning

Functions for vertically cleaning a dataset.

Functions:

Attributes:

AGGREGATION_FUNCTIONS module-attribute

AGGREGATION_FUNCTIONS = {'Name': mode, 'Fueltype': mode, 'Technology': mode, 'Set': mode, 'Country': mode, 'Capacity': 'sum', 'lat': 'mean', 'lon': 'mean', 'DateIn': 'min', 'DateRetrofit': 'max', 'DateMothball': 'min', 'DateOut': 'max', 'File': mode, 'projectID': set, 'EIC': set, 'Duration': 'sum', 'Volume_Mm3': 'sum', 'DamHeight_m': 'sum', 'StorageCapacity_MWh': 'sum', 'Efficiency': 'sum'}
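As a rough illustration of how such a column-to-aggregator mapping can be applied to one group of units, the sketch below uses plain Python rather than pandas. The reduced AGGREGATORS mapping, the aggregate_group helper and the unit records are hypothetical and are not part of powerplantmatching.

```python
from collections import Counter
from statistics import mean


def mode(values):
    """Return the most common value (the first one seen wins on ties)."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical, reduced column -> aggregator mapping (assumption for this sketch).
AGGREGATORS = {
    "Name": mode,
    "Fueltype": mode,
    "Capacity": sum,
    "lat": mean,
    "DateIn": min,
    "projectID": set,
}


def aggregate_group(units):
    """Collapse a list of unit records (dicts) into a single plant record."""
    return {
        column: func([unit[column] for unit in units])
        for column, func in AGGREGATORS.items()
    }


# Two made-up units that are assumed to belong to the same plant.
units = [
    {"Name": "Alpha", "Fueltype": "Hard Coal", "Capacity": 300.0,
     "lat": 50.0, "DateIn": 1985, "projectID": "A1"},
    {"Name": "Alpha", "Fueltype": "Hard Coal", "Capacity": 350.0,
     "lat": 50.2, "DateIn": 1990, "projectID": "A2"},
]
plant = aggregate_group(units)
```

In this sketch the capacities are summed, the coordinates averaged, the earliest commissioning date kept, and the project identifiers collected into a set.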

aggregate_units

aggregate_units(df, dataset_name=None, pre_clean_name=False, country_wise=True, config=None, threads=1, **kwargs)

Vertical cleaning of the database: cleans the "Name" column and sums up the capacity of power plant units that are determined to belong to the same plant.

Parameters:

  • df (Dataframe or string) –

    Dataframe or name to use for the resulting database

  • dataset_name (str, default: None ) –

    Specify the name of your df, required if use_saved_aggregation is set to True.

  • pre_clean_name (Boolean, default: False ) –

    Whether to clean the 'Name'-column before aggregating.

  • country_wise (Boolean, default: True ) –

    Whether to aggregate only entries with an identical country.

  • threads (int, default: 1 ) –

    Number of threads to use

clean_name

clean_name(df, config=None)

Clean the name of a power plant list.

Cleans the "Name" column of the database by deleting very frequent words and non-alphanumeric characters. Returns a reduced dataframe containing only entries with a non-empty "Name" column.

Parameters:

  • df (Dataframe) –

    dataframe to be cleaned

  • config (dict, default: None ) –

    Custom configuration, defaults to powerplantmatching.config.get_config().
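The cleaning idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: the clean_names helper, the max_share threshold for what counts as "very frequent", and the sample names are hypothetical, and the library's own rules (taken from its config) may differ.

```python
import re
from collections import Counter


def clean_names(names, max_share=0.5):
    """Drop non-alphanumeric characters and very frequent words.

    A word counts as "very frequent" if it occurs in more than
    ``max_share`` of all names (an assumption made for this sketch).
    """
    # Replace everything that is not a letter, digit or whitespace by a space.
    stripped = [re.sub(r"[^\w\s]|_", " ", name) for name in names]
    tokens = [text.split() for text in stripped]

    # Count in how many names each lower-cased word appears.
    counts = Counter(
        word for words in tokens for word in {w.lower() for w in words}
    )
    frequent = {w for w, n in counts.items() if n / len(names) > max_share}

    cleaned = [
        " ".join(w for w in words if w.lower() not in frequent)
        for words in tokens
    ]
    # Keep only entries whose name is still non-empty.
    return [name for name in cleaned if name]


names = ["Kraftwerk Nord (Block 1)", "Kraftwerk Süd", "Kraftwerk West-2", "Kraftwerk"]
cleaned = clean_names(names)
```

Here "Kraftwerk" appears in every name and is removed; the last entry becomes empty and is dropped, which mirrors the "reduced dataframe" mentioned above.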

clean_powerplantname

clean_powerplantname(df, config=None)

clean_technology

clean_technology(df, generalize_hydros=False)

Clean the 'Technology' column by condensing each value into a single term. This procedure might reduce the scope of information, but it is crucial for comparing different data sources.

Parameters:

  • search_col (list, default: ['Name', 'Fueltype', 'Technology'] ) –

    Specify the columns to be parsed

  • config (dict, default: None ) –

    Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()

cliques

cliques(df, dataduplicates)

Locate cliques of units which are determined to belong to the same powerplant. Return the same dataframe with an additional column "grouped" which indicates the group that the powerplant is belonging to.

Parameters:

  • df (Dataframe or string) –

    dataframe or csv-file which should be analysed

  • dataduplicates (Dataframe or string) –

    dataframe or name of the csv-linkfile which determines the link within one dataset
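A simplified way to picture the grouping step is shown below. Note the swap: instead of cliques in the strict graph-theoretic sense, this sketch groups units by connected components, which is easier to follow in plain Python. The group_units helper and the sample links are hypothetical and do not reproduce the library's implementation.

```python
def group_units(unit_ids, links):
    """Assign a group number to every unit; linked units share a group.

    ``links`` is an iterable of (unit_a, unit_b) pairs, comparable to the
    rows of a link file. Grouping is done by connected components, a
    simplification chosen for this sketch.
    """
    neighbours = {uid: set() for uid in unit_ids}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    grouped = {}
    next_group = 0
    for uid in unit_ids:
        if uid in grouped:
            continue
        # Depth-first traversal collects everything reachable from uid.
        stack = [uid]
        while stack:
            current = stack.pop()
            if current in grouped:
                continue
            grouped[current] = next_group
            stack.extend(n for n in neighbours[current] if n not in grouped)
        next_group += 1
    return grouped


# Units 0, 1 and 2 are linked in a chain; units 3 and 4 form a second plant.
grouped = group_units([0, 1, 2, 3, 4], links=[(0, 1), (1, 2), (3, 4)])
```

The returned mapping plays the role of the additional "grouped" column: every unit identifier is paired with the number of the group it belongs to.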

config_target_key

config_target_key(column)

Convert a column name to the key that is used to specify the target values in the config.

Parameters:

  • column (str) –

    Name of the column.

Returns:

  • str

    Name of the key used in the config file.

gather_and_replace

gather_and_replace(df, mapping)

Search for patterns in multiple columns and return a series of representative keys.

The function will return a series of unique identifiers given by the keys of the mapping dictionary. The order in the mapping dictionary determines which representative keys are calculated first. Note that these may be overwritten by the following mappings.

Parameters:

  • df (DataFrame) –

    DataFrame with columns that should be parsed.

  • mapping (dict) –

    Dictionary mapping the representative keys to the regex patterns.
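The overwrite behaviour described above can be sketched without pandas. Everything here is an assumption made for illustration: the gather_keys helper, the case-insensitive matching, the sample rows and the regex patterns are hypothetical and not taken from the library's config.

```python
import re


def gather_keys(rows, columns, mapping):
    """Return one representative key (or None) per row.

    ``mapping`` maps a key to a regex pattern. Keys are processed in
    dictionary order, so a later match overwrites an earlier one.
    """
    result = [None] * len(rows)
    for key, pattern in mapping.items():
        # Case-insensitive matching is an assumption of this sketch.
        regex = re.compile(pattern, flags=re.IGNORECASE)
        for i, row in enumerate(rows):
            if any(regex.search(str(row.get(col, ""))) for col in columns):
                result[i] = key
    return result


rows = [
    {"Name": "Hydro Station Reservoir", "Technology": ""},
    {"Name": "Plant B", "Technology": "CCGT"},
    {"Name": "Plant C", "Technology": "unknown"},
]
# Hypothetical patterns; the order matters because later keys win.
mapping = {
    "Run-Of-River": r"run.?of.?river|hydro",
    "Reservoir": r"reservoir",
    "CCGT": r"ccgt|combined cycle",
}
keys = gather_keys(rows, columns=["Name", "Technology"], mapping=mapping)
```

The first row matches both "Run-Of-River" and "Reservoir"; because "Reservoir" comes later in the mapping, it is the key that remains.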

gather_fueltype_info

gather_fueltype_info(df, search_col=['Name', 'Fueltype', 'Technology'], config=None)

Parse a set of columns for distinct fueltype specifications.

This function uses the mappings (key -> regex pattern) given by the config under the section target_fueltypes. The representative keys are set if any of the columns in search_col matches the regex pattern.

Parameters:

  • df (DataFrame) –

    DataFrame to be parsed.

  • search_col (list, default: ['Name', 'Fueltype', 'Technology'] ) –

    Set of columns to be parsed. Must be in df.

  • config (dict, default: None ) –

    Custom configuration, defaults to powerplantmatching.config.get_config().

gather_set_info

gather_set_info(df, search_col=['Name', 'Fueltype', 'Technology'], config=None)

Parse a set of columns for distinct Set specifications.

This function uses the mappings (key -> regex pattern) given by the config under the section target_sets. The representative keys are set if any of the columns in search_col matches the regex pattern.

Parameters:

  • df (DataFrame) –

    DataFrame to be parsed.

  • search_col (list, default: ['Name', 'Fueltype', 'Technology'] ) –

    Set of columns to be parsed. Must be in df.

  • config (dict, default: None ) –

    Custom configuration, defaults to powerplantmatching.config.get_config().

gather_specifications

gather_specifications(df, target_columns=['Fueltype', 'Technology', 'Set'], parse_columns=['Name', 'Fueltype', 'Technology', 'Set'], config=None)

Parse columns to collect representative keys.

This function will parse the columns specified in parse_columns and collects the representative keys for each row in target_columns. The parsing is based on the config file.

Parameters:

  • df (DataFrame) –

    Power plant dataframe.

  • target_columns (list, default: ['Fueltype', 'Technology', 'Set'] ) –

    Columns where the representative keys will be collected, by default ["Fueltype", "Technology", "Set"]

  • parse_columns (list, default: ['Name', 'Fueltype', 'Technology', 'Set'] ) –

    Columns that should be parsed, by default ["Name", "Fueltype", "Technology", "Set"]

  • config (dict, default: None ) –

    Custom configuration, defaults to powerplantmatching.config.get_config().

Returns:

  • DataFrame

gather_technology_info

gather_technology_info(df, search_col=['Name', 'Fueltype', 'Technology', 'Set'], config=None)

Parse a set of columns for distinct technology specifications.

This function uses the mappings (key -> regex pattern) given by the config under the section target_technologies. The representative keys are set if any of the columns in search_col matches the regex pattern.

Parameters:

  • df (DataFrame) –

    DataFrame to be parsed.

  • search_col (list, default: ['Name', 'Fueltype', 'Technology', 'Set'] ) –

    Set of columns to be parsed. Must be in df.

  • config (dict, default: None ) –

    Custom configuration, defaults to powerplantmatching.config.get_config().

mode

mode(x)

Get the most common value of a series.

matching

Functions for linking and combining different datasets

Functions:

best_matches

best_matches(links)

Intended to be called after duke() with singlematch=True. Returns a reduced list of matches, keeping the highest-scoring match for each duplicated entry.

Parameters:

  • links (DataFrame) –

    Links as returned by duke
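The reduction can be pictured with a few lines of plain Python. This is a sketch under assumptions: links are represented as (id_a, id_b, score) tuples, and the "duplicated entry" is taken to be the identifier of the second dataset. The best_links helper is hypothetical and not the library's function.

```python
def best_links(links):
    """Keep only the highest-scoring link for each entry of the second dataset.

    ``links`` is an iterable of (id_a, id_b, score) tuples (assumed layout).
    """
    best = {}
    for id_a, id_b, score in links:
        # Replace the stored link only if this one scores strictly higher.
        if id_b not in best or score > best[id_b][2]:
            best[id_b] = (id_a, id_b, score)
    return list(best.values())


# Entry 10 of the second dataset is linked twice; only the better link survives.
links = [(0, 10, 0.91), (1, 10, 0.97), (2, 11, 0.88)]
reduced = best_links(links)
```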

combine_multiple_datasets

combine_multiple_datasets(datasets, labels=None, config=None, **dukeargs)

Duke-based horizontal match of multiple databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to determine the same powerplant in different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link to the other database. This leads to unique entries in the resulting dataframe.

Parameters:

  • datasets (list of pandas.Dataframe or strings) –

    dataframes or csv-files to use for the matching

  • labels (list of strings, default: None ) –

    Names of the databases, in alphabetical order and in the same order as the datasets

compare_two_datasets

compare_two_datasets(dfs, labels, country_wise=True, config=None, **dukeargs)

Duke-based horizontal match of two databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to determine the same powerplant in two different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link, in order to obtain unique entries in the resulting dataframe. Attention: when aborting this command, the duke process will still continue in the background; wait until the process is finished before restarting.

Parameters:

  • dfs (list of pandas.Dataframe or strings) –

    dataframes or csv-files to use for the matching

  • labels (list of strings) –

    Names of the databases for the resulting dataframe

cross_matches

cross_matches(sets_of_pairs, labels=None)

Combines multiple sets of pairs and returns one consistent dataframe. Identifiers of two datasets can appear in one row even though they did not match directly but indirectly through a connecting identifier of another database.

Parameters:

  • sets_of_pairs (list) –

    list of pd.Dataframe's containing only the matches (without scores), obtained from the linkfile (duke() and best_matches())

  • labels (list of strings, default: None ) –

    list of names of the databases, used for specifying the order of the output
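Indirect matching through a connecting identifier can be sketched with a small union-find structure. The cross_match helper, the (label, id) node layout and the sample pairs are hypothetical; the sketch also ignores conflicts where one group would contain two identifiers from the same database, which a real implementation would have to resolve.

```python
def cross_match(sets_of_pairs, labels):
    """Merge pairwise matches into rows with one identifier per database.

    Every node is a (label, identifier) tuple. Two nodes end up in the
    same row if they are connected directly or through other matches.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union step: join the groups of both ends of every pair.
    for pairs in sets_of_pairs:
        for left, right in pairs:
            parent[find(left)] = find(right)

    # Collect the members of every group, keyed by database label.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), {})[node[0]] = node[1]
    return [[group.get(label) for label in labels] for group in groups.values()]


# A0 matches B5 and B5 matches C9, so A0 and C9 share a row although
# they were never matched directly.
pairs_ab = [(("A", 0), ("B", 5))]
pairs_bc = [(("B", 5), ("C", 9)), (("B", 6), ("C", 7))]
rows = cross_match([pairs_ab, pairs_bc], labels=["A", "B", "C"])
```

Databases without a matching entry in a row are filled with None, and the labels argument fixes the column order of the output.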

link_multiple_datasets

link_multiple_datasets(datasets, labels, use_saved_matches=False, config=None, **dukeargs)

Duke-based horizontal match of multiple databases. Returns the matching indices of the datasets. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to determine the same powerplant in different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link to the other database. This leads to unique entries in the resulting dataframe.

Parameters:

  • datasets (list of pandas.Dataframe or strings) –

    dataframes or csv-files to use for the matching

  • labels (list of strings) –

    Names of the databases, in alphabetical order and in the same order as the datasets

reduce_matched_dataframe

reduce_matched_dataframe(df, show_orig_names=False, config=None)

Reduce a matched dataframe to a unique set of columns. For each entry take the value of the most reliable data source included in that match.

Parameters:

  • df (Dataframe) –

    MultiIndex dataframe with the matched powerplants, as obtained from combined_dataframe() or match_multiple_datasets()
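The "most reliable source wins" rule can be illustrated as follows. The reduce_match helper, the nested-dict layout and the reliability scores are hypothetical; in the library the ranking comes from the configuration, and individual columns may be reduced by other rules.

```python
def reduce_match(match, reliability):
    """Pick, per column, the value of the most reliable source that has one.

    ``match`` maps column -> {source: value}; ``reliability`` maps
    source -> score, where a higher score means more reliable (assumed).
    """
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    reduced = {}
    for column, by_source in match.items():
        # Walk the sources from most to least reliable, skipping missing values.
        reduced[column] = next(
            (by_source[src] for src in ranked if by_source.get(src) is not None),
            None,
        )
    return reduced


reliability = {"ENTSOE": 5, "GEO": 3}  # made-up scores
match = {
    "Name": {"ENTSOE": "Alpha", "GEO": "Alpha Power Station"},
    "lat": {"ENTSOE": None, "GEO": 50.1},
}
plant = reduce_match(match, reliability)
```

The name is taken from the higher-ranked source, while the latitude falls back to the lower-ranked one because the higher-ranked source has no value.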

collection

Processed datasets of merged and/or adjusted data

Functions:

  • collect

    Return the collection for a given list of datasets in matched or reduced form.

  • matched_data
  • powerplants

    Return the full matched dataset including all data sources listed in config.yaml/matching_sources.

collect

collect(datasets, update=False, reduced=True, config=None, **dukeargs)

Return the collection for a given list of datasets in matched or reduced form.

Parameters:

  • datasets (list or str) –

    list containing the dataset identifiers as str, or single str

  • update (bool, default: False ) –

    Do a horizontal update (True) or read from the cache file (False)

  • reduced (bool, default: True ) –

    Whether to return the reduced (True) or matched (False) dataset.

  • config (dict, default: None ) –

    Configuration file of powerplantmatching

  • **dukeargs (keyword-args for duke, default: {} ) –

matched_data

matched_data(config=None, update=False, from_url=False, extend_by_vres=False, extendby_kwargs={}, extend_by_kwargs={}, fill_geopositions=True, filter_missing_geopositions=True, **collection_kwargs)

powerplants

powerplants(config=None, config_update=None, update=False, from_url=False, extend_by_vres=False, extendby_kwargs={}, extend_by_kwargs={}, fill_geopositions=True, filter_missing_geopositions=True, **collection_kwargs)

Return the full matched dataset including all data sources listed in config.yaml/matching_sources. The combined data is additionally extended by non-matched entries of sources given in config.yaml/fully_included_sources.

Parameters:

  • update (Boolean, default: False ) –

    Whether to rerun the matching process. If True, this overrides stored to False.

  • from_url

    Whether to parse and store the already built data from the repo website.

  • config (Dict, default: None ) –

    Define a configuration varying from the setting in config.yaml. Relevant keywords are 'matching_sources', 'fully_included_sources'.

  • config_update (Dict, default: None ) –

    Configuration input dictionary to be merged into the default configuration data

  • extend_by_vres (Boolean, default: False ) –

    Whether to extend the dataset by variable renewable energy sources given by powerplantmatching.data.OPSD_VRE()

  • extendby_kwargs (Dict, default: {} ) –

    Dict of keyword arguments passed to powerplantmatching.heuristics.extend_by_non_matched

  • fill_geopositions

    Whether to fill geo coordinates by calling `df.powerplant.fill_geoposition()` after the matching process and before the optional extension by VRES. Only active if `update` is true.

  • filter_missing_geopositions

    Whether to filter out resulting entries without geo coordinates. The filtering happens after the matching process and the optional filling of geo coordinates and before the optional extension by VRES. Only active if `update` is true.

  • **collection_kwargs (kwargs, default: {} ) –

    Arguments passed to powerplantmatching.collection.Collection.