General Structure

The dataset combines the data of all the data sources listed in Data-Sources and provides the following information:

  • Power plant name - as claimed by each database

  • Fueltype - {Bioenergy, Geothermal, Hard Coal, Hydro, Lignite, Nuclear, Natural Gas, Oil, Solar, Wind, Other}

  • Technology - {CCGT, OCGT, Steam Turbine, Combustion Engine, Run-Of-River, Pumped Storage, Reservoir}

  • Set - {Power Plant (PP), Combined Heat and Power (CHP), Storages (Stores)}

  • Capacity - [MW]

  • Duration - Maximum state of charge capacity in terms of hours at full output capacity

  • Dam Information - Dam volume [Mm^3] and Dam Height [m]

  • Geo-position - Latitude, Longitude

  • Country - EU-27 + CH + NO (+ UK) minus Cyprus and Malta

  • YearCommissioned - Commissioning year of the power plant

  • RetroFit - Year of last retrofit

  • projectID - Immutable identifier of the power plant

All data files of the package will be stored in the folder given by pm.core.package_config['data_dir']

Data Sources

Not available but supported sources:

  • IWPDCY (International Water Power & Dam Country Yearbook)

  • WEPP (Platts, World Electric Power Plants Database)

Reliability Score

When the matched power plant entries from different sources are combined, the resulting value per column is determined by the most reliable source. The corresponding reliability scores are:

Dataset    Reliability score
JRC        6
ESE        6
UBA        5
OPSD       5
OPSD_EU    5
OPSD_DE    5
WEPP       4
ENTSOE     4
IWPDCY     3
GPD        3
GEO        3
BNETZA     3
CARMA      1
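The selection rule above can be sketched in a few lines of plain Python. This is only an illustration, not the package's implementation: the function name combine_matched, the record layout and the example values are hypothetical, and falling back to a less reliable source when the best one has no value is an assumption.

```python
# Reliability scores copied from the table above.
RELIABILITY = {
    "JRC": 6, "ESE": 6, "UBA": 5, "OPSD": 5, "OPSD_EU": 5, "OPSD_DE": 5,
    "WEPP": 4, "ENTSOE": 4, "IWPDCY": 3, "GPD": 3, "GEO": 3, "BNETZA": 3,
    "CARMA": 1,
}


def combine_matched(records):
    """Merge matched records given as {source name: {column: value}}.

    For every column, the value of the most reliable source that has one
    (i.e. is not None) is kept. The fallback is an assumption of this sketch.
    """
    # Most reliable source first; ties keep the insertion order of `records`.
    ranked = sorted(records, key=lambda src: RELIABILITY[src], reverse=True)

    # Collect all column names in order of first appearance.
    columns = []
    for src in ranked:
        for col in records[src]:
            if col not in columns:
                columns.append(col)

    merged = {}
    for col in columns:
        merged[col] = next(
            (records[src][col] for src in ranked
             if records[src].get(col) is not None),
            None,
        )
    return merged


# Made-up values for a hypothetical plant, for illustration only.
matched = {
    "CARMA": {"Name": "Plant A", "Capacity": 1552.5, "Technology": None},
    "OPSD": {"Name": "Plant A", "Capacity": 1500.0, "Technology": "Steam Turbine"},
    "GEO": {"Name": "Plant A", "Capacity": None, "RetroFit": 2005},
}
result = combine_matched(matched)
print(result)
```

Here OPSD (score 5) outranks GEO (3) and CARMA (1), so its capacity is kept, while the RetroFit value comes from GEO because no more reliable source provides one.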

How it works

Whereas single databases such as CARMA, GEO or OPSD provide non-standardized and incomplete information, the datasets can complement each other and improve their reliability. In a first step, powerplantmatching converts all power plant datasets into a standardized format with a defined set of columns and values. The second step aggregates power plant blocks into units. Since some of the data sources provide their power plant records on unit level, without detailed information about lower-level blocks, comparing with other sources is only possible on unit level. In the third and name-giving step, the tool combines (or matches) the different standardized and aggregated input sources, keeping only power plant units which appear in more than one source. The matched data is afterwards complemented by entries from reliable sources which have not matched.
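As a rough illustration of the second step, the sketch below sums block capacities into one unit record. It is a simplification under stated assumptions: the function name aggregate_blocks is hypothetical, blocks are grouped by an exact match on Name, Country and Fueltype (the actual tool detects related blocks with DUKE), and the block records are invented.

```python
from collections import defaultdict


def aggregate_blocks(blocks):
    """Sum block capacities per unit and average the block positions.

    Blocks are treated as one unit when Name, Country and Fueltype are
    equal; this exact-match key is an assumption made for this sketch.
    Capacities are expected to be positive.
    """
    groups = defaultdict(list)
    for block in blocks:
        key = (block["Name"], block["Country"], block["Fueltype"])
        groups[key].append(block)

    units = []
    for (name, country, fueltype), members in groups.items():
        total = sum(b["Capacity"] for b in members)
        units.append({
            "Name": name,
            "Country": country,
            "Fueltype": fueltype,
            "Capacity": total,
            # Capacity-weighted mean position of the blocks.
            "lat": sum(b["lat"] * b["Capacity"] for b in members) / total,
            "lon": sum(b["lon"] * b["Capacity"] for b in members) / total,
        })
    return units


# Hypothetical block-level records, not taken from any real source.
blocks = [
    {"Name": "Plant A", "Country": "Germany", "Fueltype": "Lignite",
     "Capacity": 300.0, "lat": 51.0, "lon": 12.0},
    {"Name": "Plant A", "Country": "Germany", "Fueltype": "Lignite",
     "Capacity": 100.0, "lat": 51.2, "lon": 12.4},
    {"Name": "Plant B", "Country": "Spain", "Fueltype": "Hydro",
     "Capacity": 50.0, "lat": 40.0, "lon": -3.0},
]
units = aggregate_blocks(blocks)
for unit in units:
    print(unit)
```

The two "Plant A" blocks end up as a single 400 MW unit, which can then be compared with sources that only report on unit level.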

The aggregation and matching process heavily relies on DUKE, a Java application specialized for deduplicating and linking data. It provides many built-in comparators such as numerical, string or geoposition comparators. The engine does a detailed comparison for each single argument (power plant name, fuel type etc.) using adjusted comparators and weights. From the individual scores for each column it computes a compound score for the likelihood that the two power plant records refer to the same power plant. If the score exceeds a given threshold, the two records of the power plant are linked and merged into one data set.

Let’s make that a bit more concrete by giving a quick example. Consider the following two data sets:
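The idea of per-column comparators feeding a compound score can be sketched as follows. This is not DUKE's algorithm or API: the comparators, the weighted average, the weights and the threshold are all assumptions chosen for illustration, and every function name is hypothetical. The record values are taken from the example that follows.

```python
import math
from difflib import SequenceMatcher

# Assumed weights and threshold; DUKE's configuration differs.
WEIGHTS = {"name": 0.4, "fueltype": 0.2, "capacity": 0.1, "geo": 0.3}
THRESHOLD = 0.8


def name_score(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def capacity_score(a, b):
    """Ratio of the smaller to the larger capacity (both must be positive)."""
    return min(a, b) / max(a, b)


def geo_score(lat1, lon1, lat2, lon2, max_km=10.0):
    """1 at zero distance, falling linearly to 0 at max_km (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    km = 2 * 6371.0 * math.asin(math.sqrt(h))
    return max(0.0, 1.0 - km / max_km)


def compound_score(r1, r2):
    """Weighted average of the individual column scores."""
    scores = {
        "name": name_score(r1["Name"], r2["Name"]),
        "fueltype": 1.0 if r1["Fueltype"] == r2["Fueltype"] else 0.0,
        "capacity": capacity_score(r1["Capacity"], r2["Capacity"]),
        "geo": geo_score(r1["lat"], r1["lon"], r2["lat"], r2["lon"]),
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def is_match(r1, r2):
    """Link two records when the compound score exceeds the threshold."""
    return compound_score(r1, r2) > THRESHOLD


aarberg_1 = {"Name": "Aarberg", "Fueltype": "Hydro", "Capacity": 14.609,
             "lat": 47.0444, "lon": 7.27578}
aarberg_2 = {"Name": "Aarberg", "Fueltype": "Hydro", "Capacity": 15.5,
             "lat": 47.0378, "lon": 7.272}
abono_1 = {"Name": "Abono", "Fueltype": "Coal", "Capacity": 921.7,
           "lat": 43.5588, "lon": -5.72287}
aceca_2 = {"Name": "Aceca", "Fueltype": "Oil", "Capacity": 629.0,
           "lat": 39.941, "lon": -3.8569}

print(is_match(aarberg_1, aarberg_2))  # the two Aarberg records are linked
print(is_match(abono_1, aceca_2))      # different plants stay separate
```

A weighted average is used here only because it is easy to follow; any scheme that turns the individual column scores into a single number compared against a threshold would fit the description above.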

Dataset 1:

   Name                 Fueltype  Classification  Country         Capacity  lat      lon         File
0  Aarberg              Hydro     nan             Switzerland     14.609    47.0444  7.27578     nan
1  Abbey mills pumping  Oil       nan             United Kingdom  6.4       51.687   -0.0042057  nan
2  Abertay              Other     nan             United Kingdom  8         57.1785  -2.18679    nan
3  Aberthaw             Coal      nan             United Kingdom  1552.5    51.3875  -3.40675    nan
4  Ablass               Wind      nan             Germany         18        51.2333  12.95       nan
5  Abono                Coal      nan             Spain           921.7     43.5588  -5.72287    nan

and

Dataset 2:

   Name            Fueltype     Classification  Country         Capacity  lat      lon      File
0  Aarberg         Hydro        nan             Switzerland     15.5      47.0378  7.272    nan
1  Aberthaw        Coal         Thermal         United Kingdom  1500      51.3873  -3.4049  nan
2  Abono           Coal         Thermal         Spain           921.7     43.5528  -5.7231  nan
3  Abwinden asten  Hydro        nan             Austria         168       48.248   14.4305  nan
4  Aceca           Oil          CHP             Spain           629       39.941   -3.8569  nan
5  Aceca fenosa    Natural Gas  CCGT            Spain           400       39.9427  -3.8548  nan

where Dataset 2 has the higher reliability score. Entries 0, 3 and 5 of Dataset 1 evidently refer to the same power plants as entries 0, 1 and 2 of Dataset 2. The toolset detects those similarities and combines them into the following set, prioritising the values of Dataset 2:

   Name      Country         Fueltype  Classification  Capacity  lat      lon      File
0  Aarberg   Switzerland     Hydro     nan             15.5      47.0378  7.272    nan
1  Aberthaw  United Kingdom  Coal      Thermal         1500      51.3873  -3.4049  nan
2  Abono     Spain           Coal      Thermal         921.7     43.5528  -5.7231  nan