An Unbiased Estimator for Hamming LSH Blocking
Record linkage has recently received much attention because of the many data sources that are now available and need to be integrated to enable large-scale data analyses. Accurately estimating the proportion of matching record pairs, and obtaining a sample of such pairs, would greatly assist data custodians in compiling appropriate business plans. This paper presents the first sampling method specifically tailored to the characteristics of Hamming Locality-Sensitive Hashing (LSH) blocking. The method estimates the proportion of matching record pairs between two data sets, within a specified acceptable level of error and with a given probabilistic confidence. Through experimental evaluation on four real-world data sets, we show the effectiveness and efficiency of our approach in estimating the number of matching pairs in record linkage tasks.
Karapiperis, Dimitrios (author) / Gkoulalas-Divanis, Aris (author)
2021-09-07
147415 bytes
Conference paper
Electronic Resource
English
A best linear unbiased estimator for multi-seam deposits
Online Contents | 1988
An unbiased probability estimator to determine Weibull modulus by the linear regression method
British Library Online Contents | 2006
Approximately Unbiased Estimator for Non-Normal Process Capability Index C~N~p~k
British Library Online Contents | 2014