Evaluates re-identification risk of synthetic data through multiple privacy metrics: nearest-neighbor distance ratio, membership inference accuracy, and attribute disclosure risk.
Arguments
- x
A
synthetic_dataobject fromsynthesize.- sensitive_cols
Character vector (optional). Columns considered sensitive for attribute disclosure assessment.
References
Snoke J, et al. (2018). General and specific utility measures for synthetic data. Journal of the Royal Statistical Society A, 181(3):663–688. doi:10.1111/rssa.12358
Examples
set.seed(42)
real <- data.frame(age = rnorm(100, 65, 10),
sbp = rnorm(100, 130, 20))
syn <- synthesize(real, seed = 42)
privacy_risk(syn)
#>
#> ── Privacy risk assessment
#> [!] nn_distance_ratio: 0.8892 (Medium)
#> [OK] membership_inference_acc: 0.535 (Low)