duke
- powerplantmatching.duke.duke(datasets, labels=['one', 'two'], singlematch=False, showmatches=False, keepfiles=False, showoutput=False)
Run duke in one of two modes: Deduplication Mode, to locate duplicates within one database, or Record Linkage Mode, to find similar entries across two different datasets. In Record Linkage Mode (matching two databases), set singlematch=True and use best_matches() afterwards.
- Parameters:
datasets (pd.DataFrame or [pd.DataFrame]) – A single dataframe is run in Deduplication Mode, while a list of dataframes is linked in Record Linkage Mode.
labels ([str], default ['one', 'two']) – Labels for the linked dataframes.
singlematch (boolean, default False) – Only used in Record Linkage Mode. Report only the best match for each entry of the first named dataset. This does not guarantee a unique match in the second named dataset.
keepfiles (boolean, default False) – If True, temporary files are not deleted.
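The difference between the two modes, and why singlematch=True does not make matches unique in the second dataset, can be illustrated with a toy matcher. This is a minimal sketch under stated assumptions, not how duke scores records: `difflib.SequenceMatcher` string similarity and a fixed threshold of 0.6 stand in for duke's comparators, and `similarity`, `link` and `deduplicate` are hypothetical helpers that work on plain lists of plant names instead of DataFrames.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in scorer: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(records, threshold=0.6):
    """Toy Deduplication Mode: compare every pair of entries within one dataset."""
    pairs = []
    for i, j in combinations(range(len(records)), 2):
        score = similarity(records[i], records[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def link(first, second, threshold=0.6, singlematch=False):
    """Toy Record Linkage Mode: return (i, j, score) for entries of `first` and `second`.

    With singlematch=True only the highest-scoring candidate in `second` is kept
    for each entry of `first`; several entries of `first` may still point to the
    same entry of `second`, so the result is not necessarily one-to-one.
    """
    pairs = [
        (i, j, similarity(a, b))
        for i, a in enumerate(first)
        for j, b in enumerate(second)
    ]
    pairs = [p for p in pairs if p[2] >= threshold]
    if not singlematch:
        return pairs
    best = {}  # index in `first` -> best (i, j, score) seen so far
    for i, j, score in pairs:
        if i not in best or score > best[i][2]:
            best[i] = (i, j, score)
    return sorted(best.values())


if __name__ == "__main__":
    first = ["Gundremmingen B", "Gundremmingen C", "Isar 2"]
    second = ["Gundremmingen B", "Gundremmingen", "Isar"]
    print([(i, j) for i, j, _ in link(first, second)])
    # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]
    print([(i, j) for i, j, _ in link(first, second, singlematch=True)])
    # [(0, 0), (1, 0), (2, 2)]  entry 0 of `second` is matched twice
    print(deduplicate(["Isar 2", "ISAR 2", "Gundremmingen B"]))
    # [(0, 1, 1.0)]
```

Keeping only the best candidate per entry of the first dataset removes the weaker pairs (0, 1) and (1, 1), but both "Gundremmingen B" and "Gundremmingen C" still end up matched to entry 0 of the second dataset. A one-to-one mapping would need an additional resolution step, which this toy code does not implement.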