Utility Modules¶
utils
¶
Utility functions for checking data completeness and supporting other functions
Functions:
-
breakdown_matches–Function to inspect grouped and matched entries of a matched
-
config_filter–Convenience function to filter data source according to the config.yaml
-
convert_alpha2_to_country–
-
convert_country_to_alpha2–
-
convert_to_short_name–
-
correct_manually–Update powerplant data based on stored corrections in
-
country_alpha2–Convenience function for converting country name into alpha 2 codes
-
fill_geoposition–Fill missing 'lat' and 'lon' values. Uses geoparsing with the value given
-
fun–Helper function for multiprocessing in classes/functions
-
get_name–Helper function to associate dataframe with a name. This is done with the
-
get_raw_file–
-
lookup–Returns a lookup table of the dataframe df with rounded numbers.
-
parmap–Parallel mapping function. Use this function to parallelly map function
-
parse_Geoposition–Nominatim request for the Geoposition of a specific location in a country.
-
parse_string_to_dict–Convenience function to convert string of dict to dict type for specified columns.
-
read_csv_if_string–Convenience function to import powerplant data source if a string is given.
-
restore_blocks–Restore blocks of powerplants from a matched dataframe.
-
select_by_projectID–Convenience function to select data by its projectID
-
set_column_name–Helper function to associate dataframe with a name. This is done with the
-
set_uncommon_fueltypes_to_other–Replace uncommon fueltype specifications with 'Other'. This helps to
-
to_categorical_columns–Helper function to set datatype of columns 'Fueltype', 'Country', 'Set',
-
to_dict_if_string–Convenience function to ensure dict-like output
-
to_list_if_other–Convenience function to ensure list-like output
-
update_saved_matches_for_–Update your saved matches for a single source. This is very helpful if you
Attributes:
-
cc–
-
country_map–
breakdown_matches
¶
breakdown_matches(df)
Function to inspect the grouped and matched entries of a matched dataframe. Breaks each match down into all of its input data at a detailed level.
Parameters:
-
df(DataFrame) –Matched data with a non-empty projectID column. Keys of projectID must be specified in powerplantmatching.data.data_config
config_filter
¶
config_filter(df, config)
Convenience function to filter data source according to the config.yaml file. Individual query filters are applied if argument 'name' is given.
Parameters:
-
df(DataFrame) –Data to be filtered
-
name(str, default:None) –Name of the data source to identify query in the config.yaml file
-
config(dict, default:None) –Configuration overrides varying from the config.yaml file
correct_manually
¶
correct_manually(df, name, config=None)
Update powerplant data based on stored corrections in powerplantmatching/data/in/manual_corrections.csv. Specify the name of the data by the second argument.
Parameters:
-
df(DataFrame) –Powerplant data
-
name(str) –Name of the data source, should be in columns of manual_corrections.csv
country_alpha2
¶
country_alpha2(country)
Convenience function for converting country name into alpha 2 codes
fill_geoposition
¶
fill_geoposition(df, use_saved_locations=True, saved_only=True, config=None)
Fill missing 'lat' and 'lon' values. Uses geoparsing with the value given in 'Name', limits the search through value in 'Country'. df must contain 'Name', 'lat', 'lon' and 'Country' as columns.
Parameters:
-
df(DataFrame) –DataFrame of power plants
-
use_saved_locations(Boolean, default:True) –Whether to first compare with cached results in powerplantmatching/data/parsed_locations.csv
-
saved_only–Whether to only add geo-positions which are stored at pm.core._package_data("parsed_locations.csv")
get_name
¶
get_name(df)
Helper function to associate a dataframe with a name. This is done with the columns-axis name, as pd.DataFrame objects do not have a name attribute.
lookup
¶
lookup(df, keys=None, by='Country, Fueltype', exclude=None, unit='MW')
Returns a lookup table of the dataframe df with rounded numbers. Use the by argument to choose the grouping: "Country", "Fueltype" or "Country, Fueltype".
Parameters:
-
df(pandas.DataFrame or list of pandas.DataFrames) –powerplant databases to be analysed. If multiple dataframes are passed, the lookup table will display them in a MultiIndex
-
by(string out of 'Country, Fueltype', 'Country' or 'Fueltype', default:'Country, Fueltype') –Define the type of lookup table you want to obtain.
-
keys(list of strings, default:None) –labels of the different datasets, only necessary if multiple dataframes passed
-
exclude–list of fueltypes to exclude from the analysis
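As a rough illustration of what a lookup by 'Country, Fueltype' contains, the sketch below sums capacities per group and rounds the totals. Plain dictionaries stand in for DataFrame rows so that it runs without dependencies. The helper name capacity_lookup, the 'Capacity' key and the sample plants are assumptions made for this example, and the code is not the library's implementation.

```python
from collections import defaultdict


def capacity_lookup(rows, by="Country, Fueltype", exclude=None):
    """Sum 'Capacity' per group and round the totals (hypothetical helper)."""
    columns = [col.strip() for col in by.split(",")]
    excluded = set(exclude or [])
    totals = defaultdict(float)
    for row in rows:
        if row["Fueltype"] in excluded:
            continue  # skip fueltypes the caller asked to leave out
        key = tuple(row[col] for col in columns)
        totals[key] += row["Capacity"]
    return {key: round(total) for key, total in totals.items()}


# Made-up sample data, capacities in MW
plants = [
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 750.4},
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 300.2},
    {"Country": "Germany", "Fueltype": "Wind", "Capacity": 12.0},
    {"Country": "France", "Fueltype": "Nuclear", "Capacity": 1300.0},
]
table = capacity_lookup(plants, exclude=["Wind"])
by_country = capacity_lookup(plants, by="Country")
```

Here table is keyed by (country, fueltype) tuples, while by_country is keyed by one-element tuples holding only the country.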
parmap
¶
parmap(f, arg_list, config=None, threads=None)
Parallel mapping function. Use this function to map function f onto the arguments in arg_list in parallel. The maximum number of parallel threads is taken from config.yaml:parallel_duke_processes.
Parameters:
-
f(function) –python function with one argument
-
arg_list(list) –list of arguments mapped to f
-
config(dict, default:None) –configuration dictionary
-
threads(int, default:None) –number of parallel threads
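The general pattern of a parallel map can be sketched with the standard library. This example uses concurrent.futures.ThreadPoolExecutor as a stand-in: the helper name parallel_map is hypothetical, the worker limit is passed directly instead of being read from config.yaml, and the library itself may rely on processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(f, arg_list, threads=None):
    """Apply f to every item of arg_list in parallel (hypothetical helper).

    Results are returned in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(f, arg_list))


def square(x):
    return x * x


results = parallel_map(square, [1, 2, 3, 4], threads=2)
```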
parse_Geoposition
¶
parse_Geoposition(location, zipcode='', country='', use_saved_locations=False, saved_only=False)
Nominatim request for the Geoposition of a specific location in a country. Returns a tuple (latitude, longitude, country) if the request was successful, returns np.nan otherwise.
ToDo: There exist further online sources for lat/long data which could be used if this one fails, e.g. the Google Geocoding API, Yahoo! Placefinder, or https://askgeo.com (??)
Parameters:
-
location(string) –description of the location, can be city, area etc.
-
country(string, default:'') –name of the country which will be used as a bounding area
-
use_saved_locations(Boolean, default:False) –Whether to first compare with cached results in powerplantmatching/data/parsed_locations.csv
parse_string_to_dict
¶
parse_string_to_dict(df, cols)
Convenience function to convert string of dict to dict type for specified columns.
Parameters:
-
df(DataFrame) –DataFrame on which to apply the parsing
-
cols((str, list)) –Column(s) to be parsed to dict type
Returns:
-
DataFrame–DataFrame with specified columns parsed to dict type
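A minimal sketch of the string-to-dict conversion, assuming the strings are Python dict literals as produced when a dict column is written to CSV. It uses ast.literal_eval from the standard library and a list of dictionaries in place of a DataFrame. The helper name parse_columns_to_dict and the sample identifiers are made up for illustration.

```python
import ast


def parse_columns_to_dict(rows, cols):
    """Parse dict literals stored as strings in the given columns (hypothetical helper)."""
    if isinstance(cols, str):
        cols = [cols]
    parsed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for col in cols:
            value = new_row.get(col)
            if isinstance(value, str):
                # literal_eval only accepts Python literals, unlike eval
                new_row[col] = ast.literal_eval(value)
        parsed.append(new_row)
    return parsed


# Made-up row with a projectID stored as a string
rows = [{"Name": "Plant A", "projectID": "{'ENTSOE': ['id-1'], 'GEO': ['id-2']}"}]
parsed = parse_columns_to_dict(rows, "projectID")
```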
read_csv_if_string
¶
read_csv_if_string(df)
Convenience function to import powerplant data source if a string is given.
restore_blocks
¶
restore_blocks(df, mode=2, config=None)
Restore blocks of powerplants from a matched dataframe.
This function breaks down all matches. For each match separately, it selects blocks from only one input data source. The following modes are available for this selection:
1. Select the source with the largest number of blocks in the match
2. Select the source with the highest reliability score
Parameters:
-
df(DataFrame) –Matched data with a non-empty projectID column. Keys of projectID must be specified in powerplantmatching.data.data_config
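The two selection modes can be illustrated on a single match. The sketch assumes that a projectID entry maps each source name to a list of block identifiers. The helper name select_source, the source names and the reliability scores are all hypothetical, and the library's own selection logic may differ.

```python
def select_source(project_id, mode=1, reliability=None):
    """Pick one source of a match (hypothetical helper).

    mode 1: the source contributing the largest number of blocks
    mode 2: the source with the highest reliability score
    """
    if mode == 1:
        return max(project_id, key=lambda source: len(project_id[source]))
    if mode == 2:
        return max(project_id, key=lambda source: reliability[source])
    raise ValueError(f"unknown mode: {mode}")


# Made-up match: SourceA lists three blocks, SourceB a single one
match = {"SourceA": ["a1", "a2", "a3"], "SourceB": ["b1"]}
scores = {"SourceA": 3, "SourceB": 5}  # made-up reliability scores

by_blocks = select_source(match, mode=1)
by_score = select_source(match, mode=2, reliability=scores)
```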
select_by_projectID
¶
select_by_projectID(df, projectID, dataset_name=None)
Convenience function to select data by its projectID
set_column_name
¶
set_column_name(df, name)
Helper function to associate a dataframe with a name. This is done with the columns-axis name, as pd.DataFrame objects do not have a name attribute.
set_uncommon_fueltypes_to_other
¶
set_uncommon_fueltypes_to_other(df, fillna_other=True, config=None, **kwargs)
Replace uncommon fueltype specifications with 'Other'. This helps to compare data sources with the capacity statistics given by powerplantmatching.data.Capacity_stats().
Parameters:
-
df(DataFrame) –DataFrame to replace 'Fueltype' argument
-
fillna_other(Boolean, default:True) –Whether to replace NaN values in 'Fueltype' with 'Other'
-
fueltypes(list) –list of replaced fueltypes, defaults to ['Mixed fuel types', 'Electro-mechanical', 'Hydrogen Storage']
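The replacement rule can be sketched on a plain list of fueltype labels. The default list is taken from the parameter description above. The helper name uncommon_to_other is hypothetical, and None stands in for NaN to keep the example free of dependencies.

```python
DEFAULT_UNCOMMON = ["Mixed fuel types", "Electro-mechanical", "Hydrogen Storage"]


def uncommon_to_other(fueltypes, uncommon=None, fillna_other=True):
    """Map uncommon fueltypes, and optionally missing ones, to 'Other' (hypothetical helper)."""
    uncommon = set(DEFAULT_UNCOMMON if uncommon is None else uncommon)
    result = []
    for fueltype in fueltypes:
        if fueltype is None:
            # None plays the role of NaN in this sketch
            result.append("Other" if fillna_other else None)
        elif fueltype in uncommon:
            result.append("Other")
        else:
            result.append(fueltype)
    return result


cleaned = uncommon_to_other(["Hard Coal", "Hydrogen Storage", None, "Wind"])
kept_missing = uncommon_to_other(["Hydrogen Storage", None], fillna_other=False)
```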
to_categorical_columns
¶
to_categorical_columns(df)
Helper function to set datatype of columns 'Fueltype', 'Country', 'Set', 'File', 'Technology' to categorical.
update_saved_matches_for_
¶
update_saved_matches_for_(name)
Update your saved matches for a single source. This is very helpful if you modified/updated a data source and do not want to run the whole matching again.
Example
Assume data source 'ESE' changed a little:
pm.utils.update_saved_matches_for_('ESE') ...
... pm.collection.matched_data(update=True)
Now the matched_data is updated with the modified version of ESE.
export
¶
Functions:
-
fueltype_to_abbrev–Return the fueltype-specific abbreviation.
-
map_bus–Assign a 'bus' column to the dataframe based on a list of coordinates.
-
map_country_bus–Assign a 'bus' column based on a list of coordinates and countries.
-
store_open_dataset–
-
timestype_to_life–Returns the timestype-specific technical lifetime.
-
to_TIMES–Transform a given dataset into the TIMES format and export as .xlsx.
-
to_pypsa_names–Rename the columns of the powerplant data according to the
-
to_pypsa_network–Export a powerplant dataframe to a pypsa.Network(), specify specific buses
Attributes:
-
cget–
map_bus
¶
map_bus(df, buses)
Assign a 'bus' column to the dataframe based on a list of coordinates.
Parameters:
-
df(DataFrame) –power plant list with coordinates 'lat' and 'lon'
-
buses(DataFrame) –bus list with coordinates 'x' and 'y'
Returns:
-
DataFrame with an extra column 'bus' indicating the nearest bus.–
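A nearest-bus assignment can be sketched with a brute-force search. The example assumes that 'x' is the longitude and 'y' the latitude of a bus, and it compares squared distances in degrees, which is a simplification. The helper name nearest_bus, the bus names and the coordinates are made up, and the library may use a different search method.

```python
def nearest_bus(plants, buses):
    """Add a 'bus' entry naming the closest bus to every plant (hypothetical helper)."""
    assigned = []
    for plant in plants:
        closest = min(
            buses,
            key=lambda name: (buses[name]["x"] - plant["lon"]) ** 2
            + (buses[name]["y"] - plant["lat"]) ** 2,
        )
        assigned.append({**plant, "bus": closest})
    return assigned


# Made-up buses (x = longitude, y = latitude) and plants
buses = {
    "bus_north": {"x": 13.4, "y": 52.5},
    "bus_south": {"x": 11.6, "y": 48.1},
}
plants = [
    {"Name": "Plant A", "lat": 52.0, "lon": 13.0},
    {"Name": "Plant B", "lat": 48.5, "lon": 11.0},
]
mapped = nearest_bus(plants, buses)
```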
map_country_bus
¶
map_country_bus(df, buses)
Assign a 'bus' column based on a list of coordinates and countries.
Parameters:
-
df(DataFrame) –power plant list with coordinates 'lat', 'lon' and 'Country'
-
buses(DataFrame) –bus list with coordinates 'x', 'y', 'country'
Returns:
-
DataFrame with an extra column 'bus' indicating the nearest bus.–
to_TIMES
¶
to_TIMES(df=None, use_scaled_capacity=False, baseyear=2015)
Transform a given dataset into the TIMES format and export as .xlsx.
to_pypsa_names
¶
to_pypsa_names(df)
Rename the columns of the powerplant data according to the convention in PyPSA.
Arguments: df {pandas.DataFrame} -- powerplant data
Returns: pandas.DataFrame -- Column renamed dataframe
to_pypsa_network
¶
to_pypsa_network(df, network, buslist=None)
Export a powerplant dataframe to a pypsa.Network(), specify specific buses to allocate the plants (buslist).
heuristics
¶
Functions to modify and adjust power plant datasets
Functions:
-
PLZ_to_LatLon_map–
-
aggregate_VRE_by_commissioning_year–Aggregate the vast number of VRE (e.g. from data.OPSD_VRE()) units to one
-
aggregate_VRE_by_commyear–
-
derive_vintage_cohorts_from_statistics–This function assumes an age-distribution for given capacity statistics
-
extend_by_VRE–Extends a given reduced dataframe by externally given VREs.
-
extend_by_non_matched–Returns the matched dataframe with additional entries of non-matched
-
fill_missing_commissioning_years–Fills the empty commissioning years with averages.
-
fill_missing_commyears–
-
fill_missing_decommissioning_years–Function which sets/fills a column 'DateOut' with roughly
-
fill_missing_decommyears–
-
fill_missing_duration–
-
gross_to_net_factors–
-
isin–Checks if a given dataframe is included in a matched dataframe.
-
remove_oversea_areas–Remove plants outside continental Europe such as the Canary Islands etc.
-
rescale_capacities_to_country_totals–Returns an extra column 'Scaled Capacity' with an up or down scaled capacity
-
scale_to_net_capacities–
-
set_denmark_region_id–Used to set the Region column to DKE/DKW (East/West) for electricity models
-
set_known_retire_years–Integrate known retire years, e.g. for German nuclear plants with fixed
aggregate_VRE_by_commissioning_year
¶
aggregate_VRE_by_commissioning_year(df, target_fueltypes=None, agg_geo_by=None)
Aggregate the vast number of VRE (e.g. from data.OPSD_VRE()) units to one specific (Fueltype + Technology) cohort per commissioning year.
Parameters:
-
df(DataFrame) –DataFrame containing the data to aggregate
-
target_fueltypes(list, default:None) –list of fueltypes to be aggregated (Others are cut!)
-
agg_geo_by(str, default:None) –How to deal with lat/lon positions. Allowed values: None (do not show geoposition at all), 'mean' (average geoposition), 'wm' (average geoposition weighted by capacity)
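The difference between the 'mean' and 'wm' options comes down to whether capacity is used as a weight. The following sketch computes both for one cohort. The helper names and the sample units are hypothetical.

```python
def mean_position(units):
    """Plain average of the lat/lon values (hypothetical helper)."""
    n = len(units)
    return (
        sum(unit["lat"] for unit in units) / n,
        sum(unit["lon"] for unit in units) / n,
    )


def weighted_mean_position(units):
    """Average of the lat/lon values weighted by capacity (hypothetical helper)."""
    total = sum(unit["Capacity"] for unit in units)
    return (
        sum(unit["lat"] * unit["Capacity"] for unit in units) / total,
        sum(unit["lon"] * unit["Capacity"] for unit in units) / total,
    )


# Made-up cohort: the larger unit pulls the weighted position towards itself
units = [
    {"Capacity": 1.0, "lat": 50.0, "lon": 8.0},
    {"Capacity": 3.0, "lat": 54.0, "lon": 12.0},
]
plain = mean_position(units)
weighted = weighted_mean_position(units)
```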
derive_vintage_cohorts_from_statistics
¶
derive_vintage_cohorts_from_statistics(df, base_year=2015, config=None)
This function assumes an age-distribution for given capacity statistics and returns a df, containing how much of capacity has been built for every year.
extend_by_VRE
¶
extend_by_VRE(df, config=None, base_year=2017, prune_beyond=True)
Extends a given reduced dataframe by externally given VREs.
Parameters:
-
df(DataFrame) –The dataframe to be extended
-
base_year(int, default:2017) –Needed for deriving cohorts from IRENA's capacity statistics
Returns:
-
df(DataFrame) –Extended dataframe
extend_by_non_matched
¶
extend_by_non_matched(df, extend_by, label=None, query=None, aggregate_added_data=True, config=None, **aggkwargs)
Returns the matched dataframe with additional entries of non-matched powerplants of a reliable source.
Parameters:
-
df(DataFrame) –Already matched dataset which should be extended
-
extend_by(DataFrame | str) –Database which is partially included in the matched dataset, but which should be included totally. If str is passed, it will be used to call the corresponding data from data.py
-
label(str, default:None) –Column name of the additional database within the matched dataset, this string is used if the columns of the additional database do not correspond to the ones of the dataset
fill_missing_commissioning_years
¶
fill_missing_commissioning_years(df)
Fills the empty commissioning years with averages.
fill_missing_decommissioning_years
¶
fill_missing_decommissioning_years(df, config=None)
Function which sets/fills a column 'DateOut' with roughly estimated values for decommissioning years, based on the estimated lifetimes per Fueltype given in the config and corresponding commissioning years. Note that the latter is filled up using fill_missing_commissioning_years.
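The estimate described here amounts to adding a fueltype-specific lifetime to the commissioning year. A minimal sketch, assuming the commissioning year is stored under 'DateIn' and that only missing 'DateOut' values are filled. The helper name fill_date_out and the lifetime values are made up and do not come from the package's config.

```python
def fill_date_out(plants, lifetimes):
    """Fill missing 'DateOut' values with DateIn plus the fueltype lifetime (hypothetical helper)."""
    filled = []
    for plant in plants:
        new_plant = dict(plant)
        if new_plant.get("DateOut") is None and new_plant.get("DateIn") is not None:
            new_plant["DateOut"] = new_plant["DateIn"] + lifetimes[new_plant["Fueltype"]]
        filled.append(new_plant)
    return filled


lifetimes = {"Nuclear": 50, "Wind": 25}  # made-up lifetimes in years
plants = [
    {"Fueltype": "Nuclear", "DateIn": 1980, "DateOut": None},
    {"Fueltype": "Wind", "DateIn": 2005, "DateOut": 2028},
]
filled = fill_date_out(plants, lifetimes)
```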
gross_to_net_factors
¶
gross_to_net_factors(reference='opsd', aggfunc='median', return_entire_data=False)
isin
¶
isin(df, matched, label=None)
Checks if a given dataframe is included in a matched dataframe.
Parameters:
-
df(DataFrame) –The dataframe to be checked
-
matched(DataFrame) –The matched dataframe
Returns:
-
bool–True if all dataframes are included in the matched dataframe, False otherwise
remove_oversea_areas
¶
remove_oversea_areas(df, lat=[36, 72], lon=[-10.6, 31])
Remove plants outside continental Europe such as the Canary Islands etc.
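The default arguments describe a latitude/longitude bounding box. The sketch below applies such a box to a list of plants. The helper name within_bounds and the sample coordinates are hypothetical, and treating the bounds as inclusive is an assumption.

```python
def within_bounds(plants, lat=(36, 72), lon=(-10.6, 31)):
    """Keep only plants inside the lat/lon bounding box (hypothetical helper)."""
    return [
        plant
        for plant in plants
        if lat[0] <= plant["lat"] <= lat[1] and lon[0] <= plant["lon"] <= lon[1]
    ]


# Made-up plants: one on the mainland, one far south-west of the box
plants = [
    {"Name": "Mainland plant", "lat": 40.4, "lon": -3.7},
    {"Name": "Island plant", "lat": 28.1, "lon": -15.4},
]
kept = within_bounds(plants)
```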
rescale_capacities_to_country_totals
¶
rescale_capacities_to_country_totals(df, fueltypes=None)
Returns an extra column 'Scaled Capacity' with an up or down scaled capacity in order to match the statistics of the ENTSOe country totals. For every country the information about the total capacity of each fueltype is given. The scaling factor is determined by the ratio of the aggregated capacity of the fueltype within each country and the ENTSOe statistics about the fueltype capacity total within each country.
Parameters:
-
df(DataFrame) –Data set that should be modified
-
fueltypes(str or list of strings, default:None) –fueltypes that should be scaled
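The scaling idea can be sketched as follows: for each (country, fueltype) group, the factor is the statistical total divided by the aggregated capacity of the dataset, so that the scaled capacities add up to the statistic. The helper name scale_to_totals and all numbers are made up, and the direction of the ratio is an assumption derived from the stated goal of matching the totals.

```python
from collections import defaultdict


def scale_to_totals(plants, totals):
    """Add 'Scaled Capacity' so that group sums match the given totals (hypothetical helper)."""
    aggregated = defaultdict(float)
    for plant in plants:
        aggregated[(plant["Country"], plant["Fueltype"])] += plant["Capacity"]

    scaled = []
    for plant in plants:
        key = (plant["Country"], plant["Fueltype"])
        factor = totals[key] / aggregated[key]  # above 1 scales up, below 1 scales down
        scaled.append({**plant, "Scaled Capacity": plant["Capacity"] * factor})
    return scaled


# Made-up data: the dataset holds 1000 MW, the statistic reports 1200 MW
plants = [
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 400.0},
    {"Country": "Germany", "Fueltype": "Hard Coal", "Capacity": 600.0},
]
totals = {("Germany", "Hard Coal"): 1200.0}
scaled = scale_to_totals(plants, totals)
```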
set_denmark_region_id
¶
set_denmark_region_id(df)
Used to set the Region column to DKE/DKW (East/West) for electricity models based on lat,lon-coordinates and a heuristic for unknowns.
set_known_retire_years
¶
set_known_retire_years(df)
Integrate known retire years, e.g. for German nuclear plants with fixed decommissioning dates.
plot
¶
Functions:
-
boxplot_gross_to_net–
-
boxplot_matchcount–Makes a boxplot for the capacities grouped by the number of matches.
-
country_totals_hbar–
-
draw_basemap–
-
factor_comparison–
-
fueltype_and_country_totals_bar–
-
fueltype_stats–
-
fueltype_totals_bar–
-
gather_nrows_ncols–Derives [nrows, ncols] based on x plots, so that a subplot looks nice.
-
make_handler_map_to_scale_circles_as_in–
-
make_legend_circles_for–
-
powerplant_map–
boxplot_matchcount
¶
boxplot_matchcount(df)
Makes a boxplot for the capacities grouped by the number of matches. Attention: Currently only works for the full dataset with original names as the last columns.
country_totals_hbar
¶
country_totals_hbar(dfs, keys=None, exclude_fueltypes=['Solar', 'Wind'], figsize=(7, 5), unit='GW', axes_style='whitegrid')
draw_basemap
¶
draw_basemap(resolution=True, ax=None, country_linewidth=0.3, coast_linewidth=0.4, zorder=None, fillcontinents=True, **kwds)
fueltype_totals_bar
¶
fueltype_totals_bar(dfs, keys=None, figsize=(7, 4), unit='GW', last_as_marker=False, axes_style='whitegrid', exclude=[], **kwargs)
gather_nrows_ncols
¶
gather_nrows_ncols(x, orientation='landscape')
Derives [nrows, ncols] based on x plots, so that a subplot looks nice.
Parameters:
-
x(int) –Number of subplots, between 0 and 42
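One common way to derive a subplot grid from a plot count is to choose a near-square layout. The sketch below shows that idea only. The helper name grid_shape is hypothetical, and the rule used here is an assumption that may give different shapes than the library function.

```python
import math


def grid_shape(x, orientation="landscape"):
    """Return [nrows, ncols] of a near-square grid holding x plots (hypothetical helper)."""
    if x <= 0:
        return [0, 0]
    short = math.isqrt(x)  # length of the shorter side
    long = math.ceil(x / short)  # enough cells on the longer side for all plots
    if orientation == "landscape":
        return [short, long]  # at least as many columns as rows
    return [long, short]


shape_6 = grid_shape(6)
shape_7 = grid_shape(7)
shape_6_portrait = grid_shape(6, orientation="portrait")
```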
make_handler_map_to_scale_circles_as_in
¶
make_handler_map_to_scale_circles_as_in(ax, dont_resize_actively=False)
powerplant_map
¶
powerplant_map(df, scale=20.0, alpha=0.6, european_bounds=True, fillcontinents=False, legendscale=1, resolution=True, figsize=None, ncol=2, loc='upper left')