Trains a predictive model on synthetic data and evaluates it on real data, then compares the result to a model trained on the real data (the gold standard). The comparison measures whether the synthetic data preserves predictive signal.
Value
A tibble with one row per training set and columns train_data ("real" or "synthetic"), metric, and value.
For binary outcomes the metric is AUC; for continuous outcomes it is R-squared.
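As a rough illustration of the two metrics, the sketch below computes a rank-based AUC and an R-squared in base R. The helper names auc_rank and r_squared are hypothetical and are not part of this package; the package may compute either metric differently (for example, R-squared as a squared correlation).

```r
# Hypothetical helpers for illustration; not the package's internals.

# AUC via the Mann-Whitney rank formulation: the probability that a randomly
# chosen positive case gets a higher score than a randomly chosen negative.
auc_rank <- function(y, score) {
  pos <- y == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(score)  # average ranks, so ties count as one half
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# R-squared as 1 - SSE / SST against the observed outcome.
r_squared <- function(y, pred) {
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))
#> [1] 0.75
r_squared(c(1, 2, 3), c(1, 2, 3))
#> [1] 1
```

An AUC of 0.5 corresponds to chance-level ranking, so a synthetic-trained value near or below 0.5 suggests little predictive signal was carried over.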
Details
The real-data baseline is evaluated in-sample (trained and tested on the same real data), so it is an optimistic estimate that serves as an upper bound on achievable performance. The synthetic-data model is also evaluated on the real data, so the gap between the two values reflects how well the synthetic data preserves predictive signal.
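The comparison described above can be sketched in base R for a continuous outcome. This is a minimal illustration under stated assumptions, not the function's actual implementation: the linear model, the helper eval_on_real, and the bootstrap resample used as a stand-in for a synthesizer are all hypothetical.

```r
set.seed(1)

# Toy "real" data with a continuous outcome.
n <- 300
real <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
real$y <- 2 * real$x1 - real$x2 + rnorm(n)

# Stand-in for a synthesizer (assumption for illustration only):
# a bootstrap resample of the real rows.
syn <- real[sample(n, replace = TRUE), ]

# R-squared of predictions against the observed real outcome.
r_squared <- function(y, pred) {
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

# Fit on the given training set, then always score on the real data.
eval_on_real <- function(train) {
  fit <- lm(y ~ x1 + x2, data = train)
  r_squared(real$y, predict(fit, newdata = real))
}

result <- data.frame(
  train_data = c("real", "synthetic"),
  metric = "r_squared",
  value = c(eval_on_real(real), eval_on_real(syn))
)
result
```

With ordinary least squares the upper-bound property is exact: the real-trained fit minimizes squared error on the real data, so no other linear fit on the same predictors, including the synthetic-trained one, can score a higher R-squared there.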
References
Jordon J, et al. (2022). Synthetic Data – what, why and how? arXiv preprint arXiv:2205.03257. doi:10.48550/arXiv.2205.03257
Examples
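set.seed(42)
# Simulated "real" data: two numeric predictors and a binary outcome
real <- data.frame(
  x1 = rnorm(200), x2 = rnorm(200),
  y = rbinom(200, 1, 0.3)
)
# Generate a synthetic copy, then compare predictive performance on the real data
syn <- synthesize(real, seed = 42)
model_fidelity(syn, outcome = "y")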
#> Warning: non-integer #successes in a binomial glm!
#> # A tibble: 2 × 3
#>   train_data metric value
#>   <chr>      <chr>  <dbl>
#> 1 real       auc    0.577
#> 2 synthetic  auc    0.438