Data Processing Modules
data
Collection of power plant databases and statistical data
Functions:
-
BEYONDCOAL–Importer for the BEYOND COAL database.
-
BNETZA–Importer for the database put together by Germany's 'Federal Network Agency' (German: 'Bundesnetzagentur' (BNetzA)).
-
CARMA–Importer for the Carma database.
-
Capacity_stats–Standardize the aggregated capacity statistics provided by the ENTSO-E.
-
EESI–Get the European Energy Storage Inventory (EESI) dataset.
-
ENTSOE–Importer for the list of installed generators provided by the ENTSO-E Transparency Project.
-
ENTSOE_EIC–Importer for the meta data given for each ENTSOE entry.
-
EXTERNAL_DATABASE–Importer for external custom databases.
-
GBPT–Importer for the global bioenergy powerplant tracker from global energy monitor.
-
GCPT–Importer for the global coal powerplant tracker from global energy monitor.
-
GEM–Get the combined dataset of all GEM (https://globalenergymonitor.org/) datasets.
-
GEM_GGPT
-
GEO–Importer for the GEO database.
-
GGPT–Importer for the global gas powerplant tracker from global energy monitor.
-
GGTPT–Importer for the global geothermal powerplant tracker from global energy monitor.
-
GHPT–Importer for the global hydropower tracker from global energy monitor.
-
GHR–Get the GloHydroRes (GHR) dataset.
-
GND–Get the GeoNuclearData (GND) dataset.
-
GNPT–Importer for the global nuclear energy powerplant tracker from global energy monitor.
-
GPD–Importer for the Global Power Plant Database.
-
GSPT–Importer for the global solar powerplant tracker from global energy monitor.
-
GWPT–Importer for the global wind powerplant tracker from global energy monitor.
-
IRENASTAT–Importer for the IRENASTAT renewable capacity statistics.
-
IWPDCY–This data is not yet available. It was extracted manually from the 'International Water Power & Dam Country Yearbook'.
-
JRC–Importer for the JRC Hydro-power plants database, retrieved from https://github.com/energy-modelling-toolkit/hydro-power-database.
-
MASTR–Get the Marktstammdatenregister (MaStR) dataset.
-
OPSD–Importer for the OPSD (Open Power Systems Data) database.
-
OPSD_VRE–Importer for the OPSD (Open Power Systems Data) renewables (VRE) database.
-
OPSD_VRE_country–Get country specific data from OPSD for renewables, if available.
-
OSM–Importer for the OpenStreetMap power plant data.
-
UBA–Importer for the UBA Database. Please download the data from https://www.umweltbundesamt.de/dokument/datenbank-kraftwerke-in-deutschland.
-
WEPP–Importer for the standardized WEPP (Platts, World Electric Power Plants Database).
-
WIKIPEDIA–Importer for the WIKIPEDIA nuclear power plant database.
Attributes:
BEYONDCOAL
BEYONDCOAL(raw=False, update=False, config=None)
Importer for the BEYOND COAL database.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
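Most importers on this page use the same `raw`/`update`/`config` keywords. There are exceptions: according to their signatures, `WEPP(raw, config)` and `IWPDCY(config)` take no `update`. The sketch below shows one way to call several importers with a shared config. It assumes a hypothetical helper, `load_sources`, which inspects each importer's signature and forwards `update` only where the importer accepts it. The two importers here are offline fakes that mirror the documented GPD and WEPP signatures, so nothing is downloaded.

```python
import inspect
from types import SimpleNamespace


def load_sources(data_module, names, config=None, update=False):
    """Call each importer listed in `names` with one shared config.

    Hypothetical helper: `update` is forwarded only to importers whose
    signature accepts it.
    """
    frames = {}
    for name in names:
        importer = getattr(data_module, name)
        kwargs = {"config": config}
        if "update" in inspect.signature(importer).parameters:
            kwargs["update"] = update
        frames[name] = importer(**kwargs)
    return frames


# Offline stand-ins that mirror the documented GPD and WEPP signatures.
def fake_gpd(raw=False, update=False, config=None, filter_other_dbs=True):
    return ("GPD", update, config)


def fake_wepp(raw=False, config=None):
    return ("WEPP", config)


fake_data = SimpleNamespace(GPD=fake_gpd, WEPP=fake_wepp)
frames = load_sources(fake_data, ["GPD", "WEPP"], config={"target_countries": "Italy"}, update=True)
```

The real package should work the same way, for example `load_sources(powerplantmatching.data, ["GPD", "OPSD"], config=powerplantmatching.config.get_config(target_countries='Italy'))`. Such a call may need network access.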
BNETZA
BNETZA(raw=False, update=False, config=None, header=9, sheet_name='Gesamtkraftwerksliste BNetzA', prune_wind=True, prune_solar=True)
Importer for the database put together by Germany's 'Federal Network
Agency' (German: 'Bundesnetzagentur' (BNetzA)).
Please download the data from
https://www.bundesnetzagentur.de/DE/Sachgebiete/ElektrizitaetundGas/Unternehmen_Institutionen/Versorgungssicherheit/Erzeugungskapazitaeten/Kraftwerksliste/kraftwerksliste-node.html.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
-
header(int, default:9) –The zero-indexed row in which the column headings are found.
CARMA
CARMA(raw=False, update=False, config=None)
Importer for the Carma database.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
Capacity_stats
Capacity_stats(raw=False, config=None, update=False, source='ENTSO-E SOAF', year=2015)
Standardize the aggregated capacity statistics provided by the ENTSO-E.
Parameters:
-
year(int, default:2015) –Year of the data (range usually 2013-2017)
-
source(str, default:'ENTSO-E SOAF') –Which statistics source to use, from {'ENTSO-E Transparency Platform', 'EUROSTAT', ...}
Returns:
-
df(DataFrame) –Capacity statistics per country and fuel-type
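To compare these country-level statistics with plant-level data, you can sum plant capacities per country and fuel type. The following is a minimal pandas sketch. It assumes a plant table with the `Country`, `Fueltype` and `Capacity` columns that appear in AGGREGATION_FUNCTIONS further down this page. It does not assume the exact column layout that Capacity_stats returns, and the numbers are made up.

```python
import pandas as pd

# Hypothetical plant-level data (capacities in MW).
plants = pd.DataFrame(
    {
        "Country": ["Germany", "Germany", "Italy", "Italy"],
        "Fueltype": ["Hard Coal", "Wind", "Natural Gas", "Natural Gas"],
        "Capacity": [800.0, 50.0, 400.0, 390.0],
    }
)

# Sum capacity per (Country, Fueltype) and pivot fuel types into columns;
# combinations without any plant are filled with 0.
totals = plants.groupby(["Country", "Fueltype"])["Capacity"].sum().unstack(fill_value=0.0)
```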
EESI
EESI(raw=False, update=False, config=None)
Get the European Energy Storage Inventory (EESI) dataset.
Provided by the European Commission's Joint Research Centre. Contains chemical, electrochemical, thermal and mechanical energy storage technologies in Europe.
https://ses.jrc.ec.europa.eu/storage-inventory-maps
https://ses.jrc.ec.europa.eu/storage-inventory-tool/api/projects
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
ENTSOE
ENTSOE(raw=False, update=False, config=None, entsoe_token=None, entsoe_session=None, **fill_geoposition_kwargs)
Importer for the list of installed generators provided by the ENTSO-E Transparency Project. Geographical information is not given. If update=True, the dataset is parsed through a request to 'https://transparency.entsoe.eu/generation/r2/installedCapacityPerProductionUnit/show'; an Internet connection is required. If raw=True, the same request is done, but the unprocessed data is returned.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
-
entsoe_token–Security token of the ENTSO-E Transparency platform. If None, it will be read from the config file. A token is required for any update.
-
entsoe_session–Whether to pass a session to the ENTSO-E client. This can be useful for some networks with proxy settings. Check the client documentation for more information. This argument is just passed to entsoe.EntsoePandasClient.
-
fill_geoposition_kwargs–Keyword arguments passed to fill_geoposition.
Note: For obtaining a security token, refer to the RESTful web API documentation of the ENTSO-E Transparency platform.
ENTSOE_EIC
ENTSOE_EIC(raw=False, update=False, config=None, entsoe_token=None)
Importer for the metadata given for each ENTSOE entry.
This data serves to fill up geographical information. If update=True, an Internet connection is required.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
-
entsoe_token–Security token of the ENTSO-E Transparency platform
Note: For obtaining a security token, refer to the RESTful web API documentation of the ENTSO-E Transparency platform.
EXTERNAL_DATABASE
EXTERNAL_DATABASE(raw=False, update=True, config=None)
Importer for external custom databases.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GBPT
GBPT(raw=False, update=False, config=None)
Importer for the global bioenergy powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GCPT
GCPT(raw=False, update=False, config=None)
Importer for the global coal powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GEM
GEM(raw=False, update=False, config=None)
Get the combined dataset of all GEM (https://globalenergymonitor.org/) datasets.
Parameters:
-
raw(bool, default:False) –Whether to return the raw dataset, by default False
-
update(bool, default:False) –Whether to update the raw dataset, by default False
-
config(dict, default:None) –Custom configuration, by default None
GEO
GEO(raw=False, update=False, config=None)
Importer for the GEO database.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GGPT
GGPT(raw=False, update=False, config=None)
Importer for the global gas powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GGTPT
GGTPT(raw=False, update=False, config=None)
Importer for the global geothermal powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GHPT
GHPT(raw=False, update=False, config=None)
Importer for the global hydropower tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GHR
GHR(raw=False, update=False, config=None)
Get the GloHydroRes (GHR) dataset.
https://www.nature.com/articles/s41597-025-04975-0
https://zenodo.org/records/14526360
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GND
GND(raw=False, update=False, config=None)
Get the GeoNuclearData (GND) dataset.
https://github.com/cristianst85/GeoNuclearData
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GNPT
GNPT(raw=False, update=False, config=None)
Importer for the global nuclear energy powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GPD
GPD(raw=False, update=False, config=None, filter_other_dbs=True)
Importer for the Global Power Plant Database.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GSPT
GSPT(raw=False, update=False, config=None)
Importer for the global solar powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
GWPT
GWPT(raw=False, update=False, config=None)
Importer for the global wind powerplant tracker from global energy monitor.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
IRENASTAT
IRENASTAT(raw=False, update=False, config=None)
Importer for the IRENASTAT renewable capacity statistics.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
IWPDCY
IWPDCY(config=None)
This data is not yet available. It was extracted manually from the 'International Water Power & Dam Country Yearbook'.
Parameters:
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
JRC
JRC(raw=False, update=False, config=None)
Importer for the JRC Hydro-power plants database, retrieved from https://github.com/energy-modelling-toolkit/hydro-power-database.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
MASTR
MASTR(raw=False, update=False, config=None)
Get the Marktstammdatenregister (MaStR) dataset.
Provided by the German Federal Network Agency (Bundesnetzagentur / BNetzA) and contains data on Germany, Austria and Switzerland.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
OPSD
OPSD(raw=False, update=False, statusDE=None, config=None, **fill_geoposition_kwargs)
Importer for the OPSD (Open Power Systems Data) database.
Parameters:
-
raw(Boolean, default:False) –Whether to return a dictionary of the raw databases.
-
update–Whether to update the data from the url.
-
statusDE(list, default:None) –Filter DE entries by operational status, e.g. ['operating', 'shutdown', 'reserve']. If None, ['operating', 'reserve', 'special_case'] is used.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
-
fill_geoposition_kwargs–Keyword arguments for fill_geoposition.
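The statusDE option filters only the German entries by status. Here is a minimal pandas sketch of that kind of filter. It assumes hypothetical lowercase `country` and `status` columns rather than the real OPSD schema, and it uses the default list ['operating', 'reserve', 'special_case'] from the parameter description. Rows from other countries pass through unchanged.

```python
import pandas as pd

DEFAULT_STATUS_DE = ["operating", "reserve", "special_case"]


def filter_de_status(df, status=None, country_col="country", status_col="status"):
    """Keep DE rows whose status is in `status`; leave other countries untouched.

    Column names are hypothetical placeholders.
    """
    if status is None:
        status = DEFAULT_STATUS_DE
    is_de = df[country_col] == "DE"
    return df[~is_de | df[status_col].isin(status)]


raw = pd.DataFrame(
    {
        "country": ["DE", "DE", "DE", "FR"],
        "status": ["operating", "shutdown", "reserve", "shutdown"],
    }
)
kept = filter_de_status(raw)
```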
OPSD_VRE
OPSD_VRE(raw=False, update=False, config=None)
Importer for the OPSD (Open Power Systems Data) renewables (VRE) database.
This sqlite database is very big and hence not part of the package.
It needs to be obtained from http://data.open-power-system-data.org/renewable_power_plants/.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
OPSD_VRE_country
OPSD_VRE_country(country, raw=False, update=False, config=None)
Get country specific data from OPSD for renewables, if available. Available for DE, FR, PL, CH, DK, CZ and SE (last update: 09/2020).
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
OSM
OSM(raw=False, update=False, config=None)
Importer for the OpenStreetMap power plant data.
Downloads pre-processed OSM data from the osm-powerplants repository.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update(bool, default:False) –Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
Returns:
-
DataFrame–Power plant data from OpenStreetMap
UBA
UBA(raw=False, update=False, config=None, header=9, skipfooter=26, prune_wind=True, prune_solar=True)
Importer for the UBA Database. Please download the data from https://www.umweltbundesamt.de/dokument/datenbank-kraftwerke-in-deutschland.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
-
header(int, default:9) –The zero-indexed row in which the column headings are found.
-
skipfooter(int, default:26) –Number of rows at the end of the sheet to skip.
WEPP
WEPP(raw=False, config=None)
Importer for the standardized WEPP (Platts, World Electric Power Plants Database). This database is not provided by this repository because of its restrictive licence.
Parameters:
-
raw(Boolean, default:False) –Whether to return the original dataset
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
WIKIPEDIA
WIKIPEDIA(raw=False, update=False, config=None)
Importer for the WIKIPEDIA nuclear power plant database.
Parameters:
-
raw(boolean, default:False) –Whether to return the original dataset
-
update–Whether to update the data from the url.
-
config(dict, default:None) –Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
cleaning
Functions for vertically cleaning a dataset.
Functions:
-
aggregate_units–Vertical cleaning of the database. Cleans the "Name"-column and sums up the capacity of powerplant units which are determined to belong to the same plant.
-
clean_name–Clean the name of a power plant list.
-
clean_powerplantname
-
clean_technology–Clean the 'Technology' column by condensing its value down into one claim.
-
cliques–Locate cliques of units which are determined to belong to the same powerplant.
-
config_target_key–Convert a column name to the key that is used to specify the target values in the config.
-
gather_and_replace–Search for patterns in multiple columns and return a series of representative keys.
-
gather_fueltype_info–Parse a set of columns for distinct fueltype specifications.
-
gather_set_info–Parse a set of columns for distinct Set specifications.
-
gather_specifications–Parse columns to collect representative keys.
-
gather_technology_info–Parse a set of columns for distinct technology specifications.
-
mode–Get the most common value of a series.
Attributes:
AGGREGATION_FUNCTIONS
module-attribute
AGGREGATION_FUNCTIONS = {'Name': mode, 'Fueltype': mode, 'Technology': mode, 'Set': mode, 'Country': mode, 'Capacity': 'sum', 'lat': 'mean', 'lon': 'mean', 'DateIn': 'min', 'DateRetrofit': 'max', 'DateMothball': 'min', 'DateOut': 'max', 'File': mode, 'projectID': set, 'EIC': set, 'Duration': 'sum', 'Volume_Mm3': 'sum', 'DamHeight_m': 'sum', 'StorageCapacity_MWh': 'sum', 'Efficiency': 'sum'}
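AGGREGATION_FUNCTIONS maps each column to the function applied when several units are merged into one plant. The sketch below applies a mapping of that shape with pandas `groupby().agg()`. It assumes the units already carry a group label, such as the "grouped" column that cliques produces. Only a subset of the mapping is used, and the `mode` helper is a local stand-in rather than the module's own function.

```python
import pandas as pd


def mode(series):
    """Most common value of a series (local stand-in for the module's `mode`)."""
    counts = series.value_counts()
    return counts.index[0] if len(counts) else None


# A subset of the documented AGGREGATION_FUNCTIONS mapping.
agg_funcs = {
    "Name": mode,
    "Fueltype": mode,
    "Country": mode,
    "Capacity": "sum",
    "lat": "mean",
    "lon": "mean",
    "DateIn": "min",
    "DateOut": "max",
}

# Hypothetical units; "grouped" marks which units form the same plant.
units = pd.DataFrame(
    {
        "grouped": [0, 0, 0, 1],
        "Name": ["Plant A", "Plant A", "Plant A Unit 3", "Plant B"],
        "Fueltype": ["Hard Coal", "Hard Coal", "Hard Coal", "Wind"],
        "Country": ["Germany", "Germany", "Germany", "Italy"],
        "Capacity": [300.0, 300.0, 200.0, 50.0],
        "lat": [51.0, 51.0, 51.2, 45.0],
        "lon": [7.0, 7.0, 7.3, 9.0],
        "DateIn": [1975, 1978, 1985, 2010],
        "DateOut": [2020, 2022, 2030, 2035],
    }
)

# One row per plant: capacities summed, coordinates averaged, dates bounded.
plants = units.groupby("grouped").agg(agg_funcs)
```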
aggregate_units
aggregate_units(df, dataset_name=None, pre_clean_name=False, country_wise=True, config=None, threads=1, **kwargs)
Vertical cleaning of the database. Cleans the "Name"-column, sums up the capacity of powerplant units which are determined to belong to the same plant.
Parameters:
-
df(Dataframe or string) –Dataframe or name to use for the resulting database
-
dataset_name(str, default:None) –Specify the name of your df, required if use_saved_aggregation is set to True.
-
pre_clean_name(Boolean, default:False) –Whether to clean the 'Name'-column before aggregating.
-
country_wise(Boolean, default:True) –Whether to aggregate only entries with an identical country.
-
threads(int, default:1) –Number of threads to use
clean_name
clean_name(df, config=None)
Clean the name of a power plant list.
Cleans the column "Name" of the database by deleting very frequent words and non-alphanumeric characters. Returns a reduced dataframe with a nonempty Name-column.
Parameters:
-
df(Dataframe) –dataframe to be cleaned
-
config(dict, default:None) –Custom configuration, defaults to
powerplantmatching.config.get_config().
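Here is a minimal sketch of the cleaning steps described above. It lowercases names, replaces punctuation with spaces, drops frequent words and removes rows whose cleaned name is empty. The `FREQUENT_WORDS` list is hypothetical, because the real list is configuration-driven and not reproduced here. The sketch uses `\w` instead of `[a-z0-9]` so that non-ASCII letters such as "ü" survive.

```python
import re

import pandas as pd

# Hypothetical frequent words; the real list comes from the configuration.
FREQUENT_WORDS = ["power", "plant", "station", "kraftwerk", "unit"]


def clean_names(df, frequent_words=FREQUENT_WORDS):
    """Return df with a cleaned "Name" column and empty names dropped."""
    words = r"\b(?:" + "|".join(map(re.escape, frequent_words)) + r")\b"
    names = (
        df["Name"]
        .str.lower()
        .str.replace(r"[^\w\s]|_", " ", regex=True)  # punctuation -> space
        .str.replace(words, " ", regex=True)  # drop frequent words
        .str.replace(r"\s+", " ", regex=True)  # collapse whitespace
        .str.strip()
    )
    out = df.assign(Name=names)
    return out[out["Name"] != ""]


plants = pd.DataFrame({"Name": ["Kraftwerk München-Nord", "Power Plant", "Drax Power Station (Unit 1)"]})
cleaned = clean_names(plants)
```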
clean_technology
clean_technology(df, generalize_hydros=False)
Clean the 'Technology' column by condensing its value down into one claim. This procedure might reduce the scope of information; however, it is crucial for comparing different data sources.
Parameter
search_col : list, default is ['Name', 'Fueltype', 'Technology']
Specify the columns to be parsed.
config : dict, default None
Add custom specific configuration, e.g. powerplantmatching.config.get_config(target_countries='Italy'), defaults to powerplantmatching.config.get_config()
cliques
cliques(df, dataduplicates)
Locate cliques of units which are determined to belong to the same powerplant. Return the same dataframe with an additional column "grouped" which indicates the group that the powerplant belongs to.
Parameters:
-
df(Dataframe or string) –dataframe or csv-file which should be analysed
-
dataduplicates(Dataframe or string) –dataframe or name of the csv-linkfile which determines the link within one dataset
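The following sketch illustrates the grouping idea. It assumes the duplicate links are available as plain pairs of row indices, a hypothetical stand-in for the linkfile. It also swaps in connected components via union-find, which is looser than true clique detection: two units that are linked only through a third unit still end up in the same group. `assign_cliques` is a hypothetical name.

```python
import pandas as pd


def assign_cliques(df, duplicate_pairs):
    """Add a 'grouped' column labelling connected groups of units.

    `duplicate_pairs` is an iterable of (index, index) pairs.
    """
    parent = {idx: idx for idx in df.index}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    roots = [find(idx) for idx in df.index]
    # Relabel roots as consecutive integers in order of first appearance.
    labels = {root: i for i, root in enumerate(dict.fromkeys(roots))}
    return df.assign(grouped=[labels[r] for r in roots])


units = pd.DataFrame({"Name": ["A1", "A2", "A3", "B1"]}, index=[10, 11, 12, 20])
grouped = assign_cliques(units, [(10, 11), (11, 12)])
```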
config_target_key
config_target_key(column)
Convert a column name to the key that is used to specify the target values in the config.
Parameters:
-
column(str) –Name of the column.
Returns:
-
str–Name of the key used in the config file.
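The config section names mentioned on this page (`target_fueltypes`, `target_technologies`, `target_sets`, and `target_countries` in the config examples) suggest a simple naming rule. The sketch below reproduces those four names by lowercasing the column name, pluralizing it and adding a `target_` prefix. The rule is inferred from these examples only. `config_target_key_sketch` is a hypothetical name and may differ from the actual implementation.

```python
def config_target_key_sketch(column):
    """Map a column name such as 'Fueltype' to a key such as 'target_fueltypes'."""
    name = column.lower()
    if name.endswith("y"):
        name = name[:-1] + "ie"  # 'technology' -> 'technologie' (+ 's')
    return "target_" + name + "s"


print(config_target_key_sketch("Technology"))
```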
gather_and_replace
gather_and_replace(df, mapping)
Search for patterns in multiple columns and return a series of representative keys.
The function will return a series of unique identifiers given by the keys of the
mapping dictionary. The order in the mapping dictionary determines which
representative keys are calculated first. Note that these may be overwritten by
the following mappings.
Parameters:
-
df(DataFrame) –DataFrame with columns that should be parsed.
-
mapping(dict) –Dictionary mapping the representative keys to the regex patterns.
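Here is a minimal pandas sketch of an ordered regex mapping. It assumes case-insensitive matching and uses a made-up mapping, and `gather_and_replace_sketch` is a hypothetical name. A row receives a key if any of its columns matches that key's pattern. Keys are applied in mapping order, so a later key overwrites an earlier one where both patterns match, as with row 0 below.

```python
import pandas as pd


def gather_and_replace_sketch(df, mapping):
    """Return a Series of representative keys found by regex search."""
    result = pd.Series(pd.NA, index=df.index, dtype="object")
    for key, pattern in mapping.items():
        # True where any column of the row contains the pattern.
        hit = df.apply(
            lambda col: col.fillna("").astype(str).str.contains(pattern, case=False, regex=True)
        ).any(axis=1)
        result.loc[hit] = key  # later keys overwrite earlier ones
    return result


plants = pd.DataFrame(
    {
        "Name": ["Offshore Windpark Nord", "Kraftwerk Lippendorf", "Solarpark Süd"],
        "Technology": ["Offshore", "Steam Turbine", None],
    }
)
mapping = {"Wind": r"wind", "Solar": r"solar|photovoltaic", "Offshore": r"offshore"}
keys = gather_and_replace_sketch(plants, mapping)
```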
gather_fueltype_info
gather_fueltype_info(df, search_col=['Name', 'Fueltype', 'Technology'], config=None)
Parse a set of columns for distinct fueltype specifications.
This function uses the mappings (key -> regex pattern) given
by the config under the section target_fueltypes.
The representative keys are set if any of the columns
in search_col matches the regex pattern.
Parameter
df : pandas.DataFrame
DataFrame to be parsed.
search_col : list, default is ["Name", "Fueltype", "Technology"]
Set of columns to be parsed. Must be in df.
config : dict, default None
Custom configuration, defaults to
powerplantmatching.config.get_config().
gather_set_info
gather_set_info(df, search_col=['Name', 'Fueltype', 'Technology'], config=None)
Parse a set of columns for distinct Set specifications.
This function uses the mappings (key -> regex pattern) given
by the config under the section target_sets.
The representative keys are set if any of the columns
in search_col matches the regex pattern.
Parameter
df : pandas.DataFrame
DataFrame to be parsed.
search_col : list, default is ["Name", "Fueltype", "Technology"]
Set of columns to be parsed. Must be in df.
config : dict, default None
Custom configuration, defaults to
powerplantmatching.config.get_config().
gather_specifications
gather_specifications(df, target_columns=['Fueltype', 'Technology', 'Set'], parse_columns=['Name', 'Fueltype', 'Technology', 'Set'], config=None)
Parse columns to collect representative keys.
This function will parse the columns specified in parse_columns and collect
the representative keys for each row in target_columns. The parsing is based
on the config file.
Parameters:
-
df(DataFrame) –Power plant dataframe.
-
target_columns(list, default:['Fueltype', 'Technology', 'Set']) –Columns where the representative keys will be collected, by default ["Fueltype", "Technology", "Set"]
-
parse_columns(list, default:['Name', 'Fueltype', 'Technology', 'Set']) –Columns that should be parsed, by default ["Name", "Fueltype", "Technology", "Set"]
-
config(dict, default:None) –Custom configuration, defaults to
powerplantmatching.config.get_config().
Returns:
-
DataFrame–
gather_technology_info
gather_technology_info(df, search_col=['Name', 'Fueltype', 'Technology', 'Set'], config=None)
Parse a set of columns for distinct technology specifications.
This function uses the mappings (key -> regex pattern) given
by the config under the section target_technologies.
The representative keys are set if any of the columns
in search_col matches the regex pattern.
Parameter
df : pandas.DataFrame
DataFrame to be parsed.
search_col : list, default is ["Name", "Fueltype", "Technology", "Set"]
Set of columns to be parsed. Must be in df.
config : dict, default None
Custom configuration, defaults to
powerplantmatching.config.get_config().
matching
Functions for linking and combining different datasets
Functions:
-
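best_matches–Subsequent to duke() with singlematch=True. Returns a reduced list of matches on the basis of the highest score for each duplicated entry.
-
combine_multiple_datasets–Duke-based horizontal match of multiple databases. Returns the matched dataframe including only the matched entries.
-
compare_two_datasets–Duke-based horizontal match of two databases. Returns the matched dataframe including only the matched entries.
-
cross_matches–Combines multiple sets of pairs and returns one consistent dataframe.
-
link_multiple_datasets–Duke-based horizontal match of multiple databases. Returns the matching indices of the datasets.
-
reduce_matched_dataframe–Reduce a matched dataframe to a unique set of columns. For each entry, take the value of the most reliable data source included in that match.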
best_matches
best_matches(links)
Subsequent to duke() with singlematch=True. Returns a reduced list of matches on the basis of the highest score for each duplicated entry.
Parameters:
-
links(DataFrame) –Links as returned by duke
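This minimal sketch keeps the highest-scoring link per entry. It assumes a link table with hypothetical columns `one`, `two` and `scores`, and `best_matches_sketch` is a hypothetical name. The sketch is greedy: it sorts by score and then drops duplicates on each side. An entry whose best partner is already taken is dropped rather than re-matched, as happens to the 0.75 link below.

```python
import pandas as pd

# Hypothetical candidate links between dataset "one" and dataset "two".
links = pd.DataFrame(
    {
        "one": [0, 0, 1, 2],
        "two": [5, 6, 6, 7],
        "scores": [0.90, 0.75, 0.95, 0.60],
    }
)


def best_matches_sketch(links):
    """Keep only the highest-scoring link for every entry on both sides."""
    ranked = links.sort_values("scores", ascending=False)
    ranked = ranked.drop_duplicates("one").drop_duplicates("two")
    return ranked.sort_index()


result = best_matches_sketch(links)
```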
combine_multiple_datasets
combine_multiple_datasets(datasets, labels=None, config=None, **dukeargs)
Duke-based horizontal match of multiple databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to determine the same powerplant in different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link to the other database. This leads to unique entries in the resulting dataframe.
Parameters:
-
datasets(list of pandas.Dataframe or strings) –dataframes or csv-files to use for the matching
-
labels(list of strings, default:None) –Names of the databases in alphabetical order and corresponding order to the datasets
compare_two_datasets
compare_two_datasets(dfs, labels, country_wise=True, config=None, **dukeargs)
Duke-based horizontal match of two databases. Returns the matched dataframe including only the matched entries in a multi-indexed pandas.DataFrame. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to determine the same powerplant in two different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link, in order to obtain unique entries in the resulting dataframe. Attention: When aborting this command, the duke process will still continue in the background; wait until the process is finished before restarting.
Parameters:
-
dfs(list of pandas.Dataframe or strings) –dataframes or csv-files to use for the matching
-
labels(list of strings) –Names of the databases for the resulting dataframe
cross_matches
cross_matches(sets_of_pairs, labels=None)
Combines multiple sets of pairs and returns one consistent dataframe. Identifiers of two datasets can appear in one row even though they did not match directly but indirectly through a connecting identifier of another database.
Parameters:
-
sets_of_pairs(list) –list of pd.DataFrames containing only the matches (without scores), obtained from the linkfile (duke() and best_matches())
-
labels(list of strings, default:None) –list of names of the databases, used for specifying the order of the output
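The following sketch shows the indirect linking described above. It assumes each set of pairs is a DataFrame whose columns are dataset labels holding matched row indices, which is a hypothetical layout, and `chain_pairs` is a hypothetical name. Successive outer merges on the shared label put GPD entry 1 and JRC entry 100 into one row through OPSD entry 10, although those two entries never matched directly. Unlike a full implementation, the sketch does not resolve conflicting links.

```python
import pandas as pd

# Hypothetical pairwise matches between datasets.
gpd_opsd = pd.DataFrame({"GPD": [1, 2], "OPSD": [10, 20]})
opsd_jrc = pd.DataFrame({"OPSD": [10, 30], "JRC": [100, 300]})


def chain_pairs(pairs):
    """Outer-merge pairwise matches on their shared dataset columns."""
    combined = pairs[0]
    for frame in pairs[1:]:
        shared = [c for c in frame.columns if c in combined.columns]
        if not shared:
            raise ValueError("each set of pairs must share a dataset column with the previous ones")
        combined = combined.merge(frame, on=shared, how="outer")
    return combined


combined = chain_pairs([gpd_opsd, opsd_jrc])
```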
link_multiple_datasets
link_multiple_datasets(datasets, labels, use_saved_matches=False, config=None, **dukeargs)
Duke-based horizontal match of multiple databases. Returns the matching indices of the datasets. Compares all properties of the given columns ['Name','Fueltype', 'Technology', 'Country', 'Capacity','lat', 'lon'] in order to determine the same powerplant in different datasets. The match is in one-to-one mode, that is, every entry of the initial databases has at most one link to the other database. This leads to unique entries in the resulting dataframe.
Parameters:
-
datasets(list of pandas.Dataframe or strings) –dataframes or csv-files to use for the matching
-
labels(list of strings) –Names of the databases in alphabetical order and corresponding order to the datasets
reduce_matched_dataframe
reduce_matched_dataframe(df, show_orig_names=False, config=None)
Reduce a matched dataframe to a unique set of columns. For each entry take the value of the most reliable data source included in that match.
Parameters:
-
df(Dataframe) –MultiIndex dataframe with the matched powerplants, as obtained from combine_multiple_datasets()
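This sketch illustrates "take the value of the most reliable source". It assumes a matched frame whose columns are (attribute, source) pairs and an explicit reliability order, both made up for illustration. `reduce_by_reliability` is a hypothetical name. For each attribute, the first non-null value in reliability order wins.

```python
import pandas as pd

# Hypothetical matched frame with (attribute, source) column pairs.
columns = pd.MultiIndex.from_product([["Name", "Capacity"], ["OPSD", "GPD"]])
matched = pd.DataFrame(
    [
        ["Plant A", "PLANT A", 800.0, 790.0],
        [None, "Plant B", None, 50.0],
    ],
    columns=columns,
)

# Sources ordered from most to least reliable (assumed for this sketch).
reliability = ["OPSD", "GPD"]


def reduce_by_reliability(matched, reliability):
    """For each attribute, take the first non-null value in reliability order."""
    reduced = {}
    for attribute in matched.columns.get_level_values(0).unique():
        values = matched[attribute][reliability]
        # Back-fill along the row, then read the most reliable column.
        reduced[attribute] = values.bfill(axis=1).iloc[:, 0]
    return pd.DataFrame(reduced)


reduced = reduce_by_reliability(matched, reliability)
```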
collection
Processed datasets of merged and/or adjusted data
Functions:
-
collect–Return the collection for a given list of datasets in matched or reduced form.
-
matched_data
-
powerplants–Return the full matched dataset including all data sources listed in config.yaml/matching_sources.
collect
collect(datasets, update=False, reduced=True, config=None, **dukeargs)
Return the collection for a given list of datasets in matched or reduced form.
Parameters:
-
datasets(list or str) –list containing the dataset identifiers as str, or single str
-
update(bool, default:False) –Do a horizontal update (True) or read from the cache file (False)
-
reduced(bool, default:True) –Switch as to return the reduced (True) or matched (False) dataset.
-
config(dict, default:None) –Configuration file of powerplantmatching
-
**dukeargs(keyword-args for duke, default:{}) –Keyword arguments passed to duke.
matched_data
matched_data(config=None, update=False, from_url=False, extend_by_vres=False, extendby_kwargs={}, extend_by_kwargs={}, fill_geopositions=True, filter_missing_geopositions=True, **collection_kwargs)
powerplants
powerplants(config=None, config_update=None, update=False, from_url=False, extend_by_vres=False, extendby_kwargs={}, extend_by_kwargs={}, fill_geopositions=True, filter_missing_geopositions=True, **collection_kwargs)
Return the full matched dataset including all data sources listed in config.yaml/matching_sources. The combined data is additionally extended by non-matched entries of sources given in config.yaml/fully_included_sources.
Parameters:
-
update(Boolean, default:False) –Whether to rerun the matching process. Overrides stored to False if True.
-
from_url–Whether to parse and store the already built data from the repo website.
-
config(Dict, default:None) –Define a configuration varying from the setting in config.yaml. Relevant keywords are 'matching_sources', 'fully_included_sources'.
-
config_update(Dict, default:None) –Configuration input dictionary to be merged into the default configuration data.
-
extend_by_vres(Boolean, default:False) –Whether to extend the dataset by variable renewable energy sources given by powerplantmatching.data.OPSD_VRE().
-
extendby_kwargs(Dict, default:{}) –Dict of keyword arguments passed to powerplantmatching.heuristics.extend_by_non_matched.
-
fill_geopositions–Whether to fill geo coordinates by calling `df.powerplant.fill_geoposition()` after the matching process and before the optional extension by VRES. Only active if `update` is true.
-
filter_missing_geopositions–Whether to filter out resulting entries without geo coordinates. The filtering happens after the matching process and the optional filling of geo coordinates and before the optional extension by VRES. Only active if `update` is true.
-
**collection_kwargs(kwargs, default:{}) –Arguments passed to powerplantmatching.collection.Collection.
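config_update is described as being merged into the default configuration. The sketch below shows one plausible merge rule: nested dicts are merged key by key, and every other value, lists included, is replaced. Whether powerplantmatching merges exactly this way is an assumption. `merge_config` is a hypothetical name, and apart from `matching_sources` and `fully_included_sources` the config keys below are made up.

```python
import copy


def merge_config(default, update):
    """Recursively merge `update` into a copy of `default`."""
    merged = copy.deepcopy(default)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # replace scalars and lists
    return merged


default = {
    "matching_sources": ["GPD", "OPSD"],
    "fully_included_sources": ["OPSD"],
    "GPD": {"reliability_score": 3, "fn": "gpd.csv"},  # hypothetical section
}
update = {"matching_sources": ["GPD", "OPSD", "GEM"], "GPD": {"reliability_score": 4}}
merged = merge_config(default, update)
```

Replacing lists instead of appending to them keeps the result predictable: the updated `matching_sources` is exactly the list passed in.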