Utility Modules

utils

Utility functions for checking data completeness and supporting other functions

Functions:

Attributes:

cc module-attribute

cc = CountryConverter()

country_map module-attribute

country_map = replace({'name': {'Czechia': 'Czech Republic'}})

breakdown_matches

breakdown_matches(df)

Inspect the grouped and matched entries of a matched dataframe by breaking them down into all underlying input data at the detailed level.

Parameters:

  • df (DataFrame) –

    Matched data with a non-empty projectID column. Keys of projectID must be specified in powerplantmatching.data.data_config

config_filter

config_filter(df, config)

Convenience function to filter data source according to the config.yaml file. Individual query filters are applied if argument 'name' is given.

Parameters:

  • df (DataFrame) –

    Data to be filtered

  • name (str, default: None ) –

    Name of the data source to identify query in the config.yaml file

  • config (dict, default: None ) –

    Configuration overrides varying from the config.yaml file

convert_alpha2_to_country

convert_alpha2_to_country(df)

convert_country_to_alpha2

convert_country_to_alpha2(df)

convert_to_short_name

convert_to_short_name(df)

correct_manually

correct_manually(df, name, config=None)

Update powerplant data based on stored corrections in powerplantmatching/data/in/manual_corrections.csv. Specify the name of the data by the second argument.

Parameters:

  • df (DataFrame) –

    Powerplant data

  • name (str) –

    Name of the data source, should be in columns of manual_corrections.csv

country_alpha2

country_alpha2(country)

Convenience function for converting country name into alpha 2 codes

fill_geoposition

fill_geoposition(df, use_saved_locations=True, saved_only=True, config=None)

Fill missing 'lat' and 'lon' values. Uses geoparsing with the value given in 'Name' and restricts the search to the value in 'Country'. df must contain 'Name', 'lat', 'lon' and 'Country' as columns.

Parameters:

  • df (DataFrame) –

    DataFrame of power plants

  • use_saved_locations (Boolean, default: True ) –

    Whether to first check the cached results in powerplantmatching/data/parsed_locations.csv

  • saved_only

    Whether to only add geo-positions which are stored at pm.core._package_data("parsed_locations.csv")

fun

fun(f, q_in, q_out)

Helper function for multiprocessing in classes/functions

get_name

get_name(df)

Helper function to associate a dataframe with a name. This is done with the columns-axis name, as pd.DataFrame objects do not have a name attribute.

get_raw_file

get_raw_file(name, update=False, config=None, skip_retrieve=False)

lookup

lookup(df, keys=None, by='Country, Fueltype', exclude=None, unit='MW')

Returns a lookup table of the dataframe df with rounded numbers. Use the 'by' argument ('Country', 'Fueltype' or 'Country, Fueltype') to choose how the table is grouped.

Parameters:

  • df (pandas.DataFrame or list of pandas.DataFrames) –

    powerplant databases to be analysed. If multiple dataframes are passed, the lookup table will display them in a MultiIndex

  • by (string out of 'Country, Fueltype', 'Country' or 'Fueltype', default: 'Country, Fueltype' ) –

    Define the type of lookup table you want to obtain.

  • keys (list of strings, default: None ) –

    labels of the different datasets, only necessary if multiple dataframes passed

  • exclude

    list of fueltypes to exclude from the analysis
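The grouping described above can be illustrated without the library. The following is a minimal sketch in plain Python, not the implementation of lookup: rows are plain dicts standing in for DataFrame rows, the helper name lookup_sketch is hypothetical, the 'Capacity' column is an assumption, and 'by' is passed as a tuple instead of the comma-separated string shown in the signature.

```python
from collections import defaultdict

def lookup_sketch(plants, by=("Country", "Fueltype"), exclude=None):
    """Sum capacities per group and round the totals (illustrative only)."""
    exclude = exclude or []
    table = defaultdict(float)
    for plant in plants:
        if plant["Fueltype"] in exclude:
            continue
        key = tuple(plant[column] for column in by)
        table[key] += plant["Capacity"]
    return {key: round(total) for key, total in table.items()}

# Made-up sample data, capacities in MW.
plants = [
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 400.4},
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 600.3},
    {"Country": "France", "Fueltype": "Nuclear", "Capacity": 900.2},
]

by_country_and_fuel = lookup_sketch(plants)
by_country = lookup_sketch(plants, by=("Country",))
without_nuclear = lookup_sketch(plants, exclude=["Nuclear"])
```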

parmap

parmap(f, arg_list, config=None, threads=None)

Parallel mapping function. Use this function to map the function f onto the arguments in arg_list in parallel. The maximum number of parallel threads is taken from config.yaml:parallel_duke_processes.

Parameters:

  • f (function) –

    python function with one argument

  • arg_list (list) –

    list of arguments mapped to f

  • config (dict, default: None ) –

    configuration dictionary

  • threads (int, default: None ) –

    number of parallel threads
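The idea of a parallel map can be sketched with the standard library. This is not how parmap is implemented (the signature of fun(f, q_in, q_out) suggests it works with queues); the sketch swaps in concurrent.futures.ThreadPoolExecutor as a stand-in, and the names parmap_sketch and square are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def parmap_sketch(f, arg_list, threads=None):
    """Apply f to every element of arg_list in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(f, arg_list))

def square(x):
    return x * x

results = parmap_sketch(square, [1, 2, 3, 4], threads=2)
```

As with parmap, f takes exactly one argument; several values per call would have to be packed into a tuple.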

parse_Geoposition

parse_Geoposition(location, zipcode='', country='', use_saved_locations=False, saved_only=False)

Nominatim request for the Geoposition of a specific location in a country. Returns a tuple (latitude, longitude, country) if the request was successful, returns np.nan otherwise.

ToDo: There are further online sources for lat/long data which could be used if this one fails, e.g. Google Geocoding API, Yahoo! Placefinder, https://askgeo.com (??)

Parameters:

  • location (string) –

    description of the location, can be city, area etc.

  • country (string, default: '' ) –

    name of the country which will be used as a bounding area

  • use_saved_locations (Boolean, default: False ) –

    Whether to first check the cached results in powerplantmatching/data/parsed_locations.csv

parse_string_to_dict

parse_string_to_dict(df, cols)

Convenience function to convert string of dict to dict type for specified columns.

Parameters:

  • df (DataFrame) –

    DataFrame on which to apply the parsing

  • cols ((str, list)) –

    Column(s) to be parsed to dict type

Returns:

  • DataFrame

    DataFrame with specified columns parsed to dict type
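Converting a string representation of a dict back into a dict can be done safely with ast.literal_eval from the standard library. The sketch below only illustrates that idea and is not the library's code: the helper name to_dict_if_string_sketch is hypothetical, rows are plain dicts standing in for DataFrame rows, and the projectID values are made up.

```python
import ast

def to_dict_if_string_sketch(value):
    """Parse a dict literal stored as text; pass any other value through unchanged."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals, never arbitrary code.
        return ast.literal_eval(value)
    return value

rows = [
    {"Name": "Plant A", "projectID": "{'ESE': ['p1'], 'OPSD': ['o7']}"},
    {"Name": "Plant B", "projectID": {"ESE": ["p2"]}},  # already a dict
]
for row in rows:
    row["projectID"] = to_dict_if_string_sketch(row["projectID"])
```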

read_csv_if_string

read_csv_if_string(df)

Convenience function to import powerplant data source if a string is given.

restore_blocks

restore_blocks(df, mode=2, config=None)

Restore blocks of powerplants from a matched dataframe.

This function breaks down all matches. For each match separately, it selects blocks from only one input data source. The following modes are available for this selection:

1. Select the source with the largest number of blocks in the match

2. Select the source with the highest reliability score

Parameters:

  • df (DataFrame) –

    Matched data with a non-empty projectID column. Keys of projectID must be specified in powerplantmatching.data.data_config
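The two selection modes can be sketched for a single match as follows. This is a plain-Python illustration, not the implementation of restore_blocks: the helper pick_source, the shape of the match dict, and the reliability scores are all hypothetical.

```python
def pick_source(match, reliability, mode=2):
    """Choose one source for a match.

    match maps a source name to its list of block ids;
    reliability maps a source name to a numeric score.
    """
    if mode == 1:
        # Mode 1: the source contributing the most blocks.
        return max(match, key=lambda source: len(match[source]))
    if mode == 2:
        # Mode 2: the source with the highest reliability score.
        return max(match, key=lambda source: reliability[source])
    raise ValueError("mode must be 1 or 2")

# Made-up example data.
match = {"ESE": ["e1"], "OPSD": ["o1", "o2", "o3"], "GEO": ["g1", "g2"]}
reliability = {"ESE": 6, "OPSD": 5, "GEO": 3}

most_blocks = pick_source(match, reliability, mode=1)
most_reliable = pick_source(match, reliability, mode=2)
```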

select_by_projectID

select_by_projectID(df, projectID, dataset_name=None)

Convenience function to select data by its projectID

set_column_name

set_column_name(df, name)

Helper function to associate a dataframe with a name. This is done with the columns-axis name, as pd.DataFrame objects do not have a name attribute.
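The columns-axis name mentioned above is a standard pandas feature. The following sketch shows the mechanism directly on a DataFrame, without calling set_column_name or get_name; the sample data and the name 'ESE' are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Plant A"], "Capacity": [100.0]})

# Store the dataset name on the columns axis ...
df.columns.name = "ESE"

# ... and read it back later.
name = df.columns.name
```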

set_uncommon_fueltypes_to_other

set_uncommon_fueltypes_to_other(df, fillna_other=True, config=None, **kwargs)

Replace uncommon fueltype specifications with 'Other'. This helps to compare data sources with the capacity statistics given by powerplantmatching.data.Capacity_stats().

Parameters:

  • df (DataFrame) –

    DataFrame to replace 'Fueltype' argument

  • fillna_other (Boolean, default: True ) –

    Whether to replace NaN values in 'Fueltype' with 'Other'

  • fueltypes (list) –

    list of replaced fueltypes, defaults to ['Mixed fuel types', 'Electro-mechanical', 'Hydrogen Storage']
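The replacement rule can be sketched on a plain list of fueltypes. This is an illustration under assumptions, not the library's code: the helper name is hypothetical, and None stands in for the NaN values a DataFrame column would contain.

```python
# Default list taken from the parameter description above.
UNCOMMON = ["Mixed fuel types", "Electro-mechanical", "Hydrogen Storage"]

def set_uncommon_to_other_sketch(fueltypes, fillna_other=True, uncommon=UNCOMMON):
    """Map uncommon fueltypes (and optionally missing ones) to 'Other'."""
    result = []
    for fueltype in fueltypes:
        if fueltype is None:
            result.append("Other" if fillna_other else None)
        elif fueltype in uncommon:
            result.append("Other")
        else:
            result.append(fueltype)
    return result

cleaned = set_uncommon_to_other_sketch(["Hard Coal", "Hydrogen Storage", None])
kept_missing = set_uncommon_to_other_sketch(["Hard Coal", None], fillna_other=False)
```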

to_categorical_columns

to_categorical_columns(df)

Helper function to set datatype of columns 'Fueltype', 'Country', 'Set', 'File', 'Technology' to categorical.

to_dict_if_string

to_dict_if_string(s)

Convenience function to ensure dict-like output

to_list_if_other

to_list_if_other(obj)

Convenience function to ensure list-like output
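A minimal sketch of what "ensuring list-like output" can mean, assuming that lists pass through unchanged and any other object is wrapped in a one-element list. The helper name is hypothetical and the behaviour is an assumption, not taken from the library source.

```python
def to_list_if_other_sketch(obj):
    """Return obj unchanged if it is a list, otherwise wrap it in a list."""
    if isinstance(obj, list):
        return obj
    return [obj]

single = to_list_if_other_sketch("ESE")
several = to_list_if_other_sketch(["ESE", "OPSD"])
```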

update_saved_matches_for_

update_saved_matches_for_(name)

Update your saved matches for a single source. This is very helpful if you modified/updated a data source and do not want to run the whole matching again.

Example

Assume data source 'ESE' changed a little:

    pm.utils.update_saved_matches_for_('ESE')
    ...
    ...
    pm.collection.matched_data(update=True)

Now the matched_data is updated with the modified version of ESE.

export

Functions:

  • fueltype_to_abbrev

    Return the fueltype-specific abbreviation.

  • map_bus

    Assign a 'bus' column to the dataframe based on a list of coordinates.

  • map_country_bus

    Assign a 'bus' column based on a list of coordinates and countries.

  • store_open_dataset

  • timestype_to_life

    Returns the timestype-specific technical lifetime.

  • to_TIMES

    Transform a given dataset into the TIMES format and export as .xlsx.

  • to_pypsa_names

    Rename the columns of the powerplant data according to the convention in PyPSA.

  • to_pypsa_network

    Export a powerplant dataframe to a pypsa.Network(), specify specific buses to allocate the plants (buslist).

Attributes:

cget module-attribute

cget = get

fueltype_to_abbrev

fueltype_to_abbrev()

Return the fueltype-specific abbreviation.

map_bus

map_bus(df, buses)

Assign a 'bus' column to the dataframe based on a list of coordinates.

Parameters:

  • df (DataFrame) –

    power plant list with coordinates 'lat' and 'lon'

  • buses (DataFrame) –

    bus list with coordinates 'x' and 'y'

Returns:

  • DataFrame with an extra column 'bus' indicating the nearest bus.
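Assigning the nearest bus can be sketched with a brute-force search. This is an illustration only, not the implementation of map_bus: it assumes that 'x' corresponds to longitude and 'y' to latitude, uses plain Euclidean distance in degrees as a simplification, and all names and coordinates are made up.

```python
import math

def nearest_bus(lat, lon, buses):
    """Return the name of the bus closest to (lat, lon).

    buses maps a bus name to an (x, y) tuple, with x as longitude
    and y as latitude (assumption).
    """
    return min(
        buses,
        key=lambda name: math.hypot(buses[name][0] - lon, buses[name][1] - lat),
    )

buses = {"DE0": (10.0, 51.0), "FR0": (2.5, 47.0)}
plants = [
    {"Name": "Plant A", "lat": 50.1, "lon": 8.7},
    {"Name": "Plant B", "lat": 48.8, "lon": 2.3},
]
for plant in plants:
    plant["bus"] = nearest_bus(plant["lat"], plant["lon"], buses)
```

For large plant and bus lists, a spatial index would scale better than comparing every pair.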

map_country_bus

map_country_bus(df, buses)

Assign a 'bus' column based on a list of coordinates and countries.

Parameters:

  • df (DataFrame) –

    power plant list with coordinates 'lat', 'lon' and 'Country'

  • buses (DataFrame) –

    bus list with coordinates 'x', 'y', 'country'

Returns:

  • DataFrame with an extra column 'bus' indicating the nearest bus.

store_open_dataset

store_open_dataset()

timestype_to_life

timestype_to_life()

Returns the timestype-specific technical lifetime.

to_TIMES

to_TIMES(df=None, use_scaled_capacity=False, baseyear=2015)

Transform a given dataset into the TIMES format and export as .xlsx.

to_pypsa_names

to_pypsa_names(df)

Rename the columns of the powerplant data according to the convention in PyPSA.

Parameters:

  • df (DataFrame) –

    powerplant data

Returns:

  • DataFrame

    Column renamed dataframe

to_pypsa_network

to_pypsa_network(df, network, buslist=None)

Export a powerplant dataframe to a pypsa.Network(), specify specific buses to allocate the plants (buslist).

heuristics

Functions to modify and adjust power plant datasets

Functions:

PLZ_to_LatLon_map

PLZ_to_LatLon_map()

aggregate_VRE_by_commissioning_year

aggregate_VRE_by_commissioning_year(df, target_fueltypes=None, agg_geo_by=None)

Aggregate the vast number of VRE units (e.g. from data.OPSD_VRE()) to one specific (Fueltype + Technology) cohort per commissioning year.

Parameters:

  • df (DataFrame) –

    DataFrame containing the data to aggregate

  • target_fueltypes (list, default: None ) –

    list of fueltypes to be aggregated (Others are cut!)

  • agg_geo_by (str) –

    How to deal with lat/lon positions. Allowed:

      • NoneType : Do not show geoposition at all
      • 'mean' : Average geoposition
      • 'wm' : Average geoposition weighted by capacity
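The difference between the 'mean' and 'wm' options can be shown with a small worked example. The sketch below is plain Python under assumptions: the helper aggregate_position is hypothetical, the 'Capacity' column name is assumed, and the units are made up.

```python
def aggregate_position(units, agg_geo_by=None):
    """Aggregate the positions of several units into one (lat, lon) pair."""
    if agg_geo_by is None:
        return None  # no geoposition at all
    if agg_geo_by == "mean":
        weights = [1.0] * len(units)
    elif agg_geo_by == "wm":
        weights = [unit["Capacity"] for unit in units]
    else:
        raise ValueError("agg_geo_by must be None, 'mean' or 'wm'")
    total = sum(weights)
    lat = sum(w * unit["lat"] for w, unit in zip(weights, units)) / total
    lon = sum(w * unit["lon"] for w, unit in zip(weights, units)) / total
    return lat, lon

units = [
    {"Capacity": 1.0, "lat": 50.0, "lon": 8.0},
    {"Capacity": 3.0, "lat": 54.0, "lon": 12.0},
]

plain_mean = aggregate_position(units, "mean")   # (52.0, 10.0)
weighted_mean = aggregate_position(units, "wm")  # (53.0, 11.0)
```

The weighted result is pulled towards the larger unit, which is the point of the 'wm' option.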

aggregate_VRE_by_commyear

aggregate_VRE_by_commyear(df, config=None)

derive_vintage_cohorts_from_statistics

derive_vintage_cohorts_from_statistics(df, base_year=2015, config=None)

This function assumes an age distribution for the given capacity statistics and returns a dataframe stating how much capacity was built in each year.

extend_by_VRE

extend_by_VRE(df, config=None, base_year=2017, prune_beyond=True)

Extends a given reduced dataframe by externally given VREs.

Parameters:

  • df (DataFrame) –

    The dataframe to be extended

  • base_year (int, default: 2017 ) –

    Needed for deriving cohorts from IRENA's capacity statistics

Returns:

  • df ( DataFrame ) –

    Extended dataframe

extend_by_non_matched

extend_by_non_matched(df, extend_by, label=None, query=None, aggregate_added_data=True, config=None, **aggkwargs)

Returns the matched dataframe with additional entries of non-matched powerplants of a reliable source.

Parameters:

  • df (DataFrame) –

    Already matched dataset which should be extended

  • extend_by (DataFrame | str) –

    Database which is partially included in the matched dataset, but which should be included totally. If str is passed, it will be used to call the corresponding data from data.py

  • label (str, default: None ) –

    Column name of the additional database within the matched dataset, this string is used if the columns of the additional database do not correspond to the ones of the dataset

fill_missing_commissioning_years

fill_missing_commissioning_years(df)

Fills the empty commissioning years with averages.

fill_missing_commyears

fill_missing_commyears(df)

fill_missing_decommissioning_years

fill_missing_decommissioning_years(df, config=None)

Function which sets/fills a column 'DateOut' with roughly estimated values for decommissioning years, based on the estimated lifetimes per Fueltype given in the config and corresponding commissioning years. Note that the latter is filled up using fill_missing_commissioning_years.
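The estimate described above amounts to adding a fueltype-specific lifetime to the commissioning year. A minimal sketch under assumptions: the helper name is hypothetical, the commissioning year is assumed to be stored in a 'DateIn' column, and the lifetimes are made-up values rather than those from the config.

```python
def fill_date_out_sketch(plants, lifetimes):
    """Fill missing 'DateOut' values with DateIn + lifetime of the fueltype."""
    for plant in plants:
        if plant.get("DateOut") is None and plant.get("DateIn") is not None:
            plant["DateOut"] = plant["DateIn"] + lifetimes[plant["Fueltype"]]
    return plants

# Hypothetical lifetimes in years.
lifetimes = {"Nuclear": 50, "Hard Coal": 45}

plants = [
    {"Fueltype": "Nuclear", "DateIn": 1985, "DateOut": None},
    {"Fueltype": "Hard Coal", "DateIn": 1990, "DateOut": 2030},  # known, kept as is
]
fill_date_out_sketch(plants, lifetimes)
```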

fill_missing_decommyears

fill_missing_decommyears(df, config=None)

fill_missing_duration

fill_missing_duration(df)

gross_to_net_factors

gross_to_net_factors(reference='opsd', aggfunc='median', return_entire_data=False)

isin

isin(df, matched, label=None)

Checks if a given dataframe is included in a matched dataframe.

Parameters:

  • df (DataFrame) –

    The dataframe to be checked

  • matched (DataFrame) –

    The matched dataframe

Returns:

  • bool

    True if all dataframes are included in the matched dataframe, False otherwise

remove_oversea_areas

remove_oversea_areas(df, lat=[36, 72], lon=[-10.6, 31])

Remove plants outside continental Europe, such as those on the Canary Islands.
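The default arguments describe a latitude/longitude bounding box. The following sketch applies such a box to plain dicts; it is not the library's code, the helper name is hypothetical, treating the bounds as inclusive is an assumption, and the coordinates are approximate.

```python
def remove_oversea_areas_sketch(plants, lat=(36, 72), lon=(-10.6, 31)):
    """Keep only plants whose coordinates fall inside the bounding box."""
    return [
        plant
        for plant in plants
        if lat[0] <= plant["lat"] <= lat[1] and lon[0] <= plant["lon"] <= lon[1]
    ]

plants = [
    {"Name": "Berlin plant", "lat": 52.5, "lon": 13.4},
    {"Name": "Tenerife plant", "lat": 28.3, "lon": -16.5},
]
continental = remove_oversea_areas_sketch(plants)
```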

rescale_capacities_to_country_totals

rescale_capacities_to_country_totals(df, fueltypes=None)

Returns an extra column 'Scaled Capacity' with an up or down scaled capacity in order to match the statistics of the ENTSO-E country totals. For every country the information about the total capacity of each fueltype is given. The scaling factor is determined by the ratio of the aggregated capacity of the fueltype within each country and the ENTSO-E statistics about the fueltype capacity total within each country.

Parameters:

  • df (DataFrame) –

    Data set that should be modified

  • fueltypes (str or list of strings) –

    fueltype that should be scaled
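The scaling can be written out as a small worked example. This sketch assumes that the factor is the statistical total divided by the aggregated capacity of the dataset, so that the scaled capacities sum to the statistic; the helper name, the 'Capacity' column and all numbers are illustrative.

```python
from collections import defaultdict

def scale_capacities_sketch(plants, stats):
    """Add 'Scaled Capacity' so that group sums match the given statistics.

    stats maps (Country, Fueltype) to the total capacity from the statistics.
    """
    totals = defaultdict(float)
    for plant in plants:
        totals[(plant["Country"], plant["Fueltype"])] += plant["Capacity"]
    for plant in plants:
        key = (plant["Country"], plant["Fueltype"])
        factor = stats[key] / totals[key]  # assumed direction of the ratio
        plant["Scaled Capacity"] = plant["Capacity"] * factor
    return plants

plants = [
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 400.0},
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 600.0},
]
stats = {("Germany", "Hard Coal"): 1500.0}  # made-up statistic in MW

scale_capacities_sketch(plants, stats)
```

Here the dataset sums to 1000 MW against a statistic of 1500 MW, so every plant is scaled by 1.5.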

scale_to_net_capacities

scale_to_net_capacities(df, is_gross=True, catch_all=True)

set_denmark_region_id

set_denmark_region_id(df)

Used to set the Region column to DKE/DKW (East/West) for electricity models based on lat,lon-coordinates and a heuristic for unknowns.

set_known_retire_years

set_known_retire_years(df)

Integrate known retire years, e.g. for German nuclear plants with fixed decommissioning dates.

plot

Functions:

Attributes:

cartopy_present module-attribute

cartopy_present = True

boxplot_gross_to_net

boxplot_gross_to_net(axes_style='darkgrid', **kwargs)

boxplot_matchcount

boxplot_matchcount(df)

Makes a boxplot for the capacities grouped by the number of matches. Attention: Currently only works for the full dataset with original names as the last columns.

country_totals_hbar

country_totals_hbar(dfs, keys=None, exclude_fueltypes=['Solar', 'Wind'], figsize=(7, 5), unit='GW', axes_style='whitegrid')

draw_basemap

draw_basemap(resolution=True, ax=None, country_linewidth=0.3, coast_linewidth=0.4, zorder=None, fillcontinents=True, **kwds)

factor_comparison

factor_comparison(dfs, keys=None, figsize=(12, 9))

fueltype_and_country_totals_bar

fueltype_and_country_totals_bar(dfs, keys=None, figsize=(18, 8))

fueltype_stats

fueltype_stats(df)

fueltype_totals_bar

fueltype_totals_bar(dfs, keys=None, figsize=(7, 4), unit='GW', last_as_marker=False, axes_style='whitegrid', exclude=[], **kwargs)

gather_nrows_ncols

gather_nrows_ncols(x, orientation='landscape')

Derives [nrows, ncols] for x plots so that the subplot grid looks well balanced.

Parameters:

  • x (int) –

    Number of subplots, between 0 and 42
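One possible way to derive such a grid is a near-square layout. The sketch below is an assumption about a reasonable heuristic and not the rule used by gather_nrows_ncols; the helper name is hypothetical.

```python
import math

def gather_nrows_ncols_sketch(x, orientation="landscape"):
    """Return [nrows, ncols] for a near-square grid holding x subplots."""
    long_side = max(1, math.ceil(math.sqrt(x)))
    short_side = max(1, math.ceil(x / long_side))
    if orientation == "landscape":
        return [short_side, long_side]  # more columns than rows
    return [long_side, short_side]      # portrait: more rows than columns

landscape = gather_nrows_ncols_sketch(5)
portrait = gather_nrows_ncols_sketch(5, orientation="portrait")
```

The result can be passed on to matplotlib, for example as plt.subplots(*landscape).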

make_handler_map_to_scale_circles_as_in

make_handler_map_to_scale_circles_as_in(ax, dont_resize_actively=False)

make_legend_circles_for

make_legend_circles_for(sizes, scale=1.0, **kw)

powerplant_map

powerplant_map(df, scale=20.0, alpha=0.6, european_bounds=True, fillcontinents=False, legendscale=1, resolution=True, figsize=None, ncol=2, loc='upper left')