fuzzylink: Probabilistic Record Linkage Using Pretrained Text Embeddings

Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <doi:10.1017/pan.2025.10016>.
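The parenthetical claim about lexical distance can be checked with a plain edit-distance calculation. The minimal sketch below is a stdlib Python illustration, not fuzzylink's R API or its internal method. It computes Levenshtein distance, a common lexical metric, and shows that "patricia" is fewer edits from "patrick" than from "trish". It also includes a generic cosine-similarity helper of the kind typically used to compare embedding vectors. The names `levenshtein` and `cosine_similarity` are hypothetical, and the package's own embedding model and linkage steps are not reproduced here.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


if __name__ == "__main__":
    target = "patricia"  # lowercased so case differences do not add edits
    for name in ("patrick", "trish"):
        print(name, levenshtein(target, name))
    # prints:
    # patrick 2
    # trish 5
```

By this lexical measure "Patrick" looks like the better match. An embedding-based approach instead compares vectors produced by a pretrained model (hypothetical here, not shown) with a function such as `cosine_similarity`, so a nickname can score highly even when few characters overlap.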

Version: 0.3.0
Depends: R (≥ 4.1.0)
Imports: stats, utils, dplyr, Rfast, reshape2, stringdist, stringr, httr, jsonlite, httr2 (≥ 1.2.1), ranger, ellmer (≥ 0.4.0)
Published: 2026-01-23
DOI: 10.32614/CRAN.package.fuzzylink
Author: Joe Ornstein [aut, cre, cph]
Maintainer: Joe Ornstein <jornstein at uga.edu>
BugReports: https://github.com/joeornstein/fuzzylink/issues
License: MIT + file LICENSE
URL: https://github.com/joeornstein/fuzzylink
NeedsCompilation: no
Materials: README, NEWS
CRAN checks: fuzzylink results

Documentation:

Reference manual: fuzzylink.html, fuzzylink.pdf

Downloads:

Package source: fuzzylink_0.3.0.tar.gz
Windows binaries: r-devel: fuzzylink_0.3.0.zip, r-release: fuzzylink_0.3.0.zip, r-oldrel: fuzzylink_0.3.0.zip
macOS binaries: r-release (arm64): fuzzylink_0.3.0.tgz, r-oldrel (arm64): fuzzylink_0.3.0.tgz, r-release (x86_64): fuzzylink_0.3.0.tgz, r-oldrel (x86_64): fuzzylink_0.3.0.tgz
Old sources: fuzzylink archive

Linking:

Please use the canonical form https://CRAN.R-project.org/package=fuzzylink to link to this page.