Cuiwei Gao
Health Data Analyst & R Developer
I develop open-source R packages for problems at the intersection of biostatistics, public health, and clinical informatics. My current work focuses on genomic surveillance methodology, algorithmic fairness in clinical AI, and privacy-preserving data generation for multi-site clinical research.
R Packages on CRAN
lineagefreq
Models pathogen lineage frequency dynamics, estimates growth advantages, and generates short-term forecasts from genomic surveillance count data. Five estimation engines (frequentist and Bayesian) sit behind a unified interface, with built-in rolling-origin backtesting. Ships with real CDC SARS-CoV-2 surveillance data.
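As a rough illustration of what a growth advantage means here, the sketch below fits a two-lineage logistic regression to weekly counts in base R. It is one common frequentist approach and is not lineagefreq's own interface; the counts and variable names are made up for the example.

```r
# Toy weekly counts (invented for illustration): sequences assigned to a
# focal lineage out of all sequences collected that week.
week    <- 0:5
variant <- c(2, 5, 11, 24, 45, 70)
total   <- rep(100, 6)

# Binomial GLM on (successes, failures); the slope on `week` is the focal
# lineage's growth advantage per week on the log-odds scale.
fit <- glm(cbind(variant, total - variant) ~ week, family = binomial())
growth_per_week <- unname(coef(fit)["week"])

# One-week-ahead forecast of the focal lineage's frequency.
forecast_week6 <- unname(predict(fit, newdata = data.frame(week = 6),
                                 type = "response"))
```

A positive slope means the focal lineage is gaining share relative to the rest; exponentiating it gives the weekly multiplicative change in the odds.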
survinger
Design-adjusted inference for pathogen lineage surveillance under unequal sequencing rates and reporting delays. Implements Horvitz–Thompson, Hájek, and post-stratified estimators with Wilson score intervals; right-truncation-corrected delay distributions for nowcasting; and Neyman-optimal resource allocation.
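To make the difference between the first two estimators concrete, here is a minimal base R sketch of a Horvitz–Thompson and a Hájek estimate of a lineage proportion. It does not use survinger's functions; the data, the inclusion probabilities, and the population size are hypothetical.

```r
# Toy sample (invented): 1 = sequence belongs to the target lineage.
is_lineage <- c(1, 0, 1, 1, 0, 0)
# Assumed known probability that each case was selected for sequencing.
incl_prob  <- c(0.5, 0.5, 0.1, 0.1, 0.25, 0.25)
# Assumed known number of cases in the population.
N <- 30

# Design weights are the inverse inclusion probabilities.
w <- 1 / incl_prob

# Horvitz-Thompson: weighted total divided by the known population size.
ht_prop <- sum(w * is_lineage) / N

# Hajek: weighted total divided by the estimated population size (sum of weights).
hajek_prop <- sum(w * is_lineage) / sum(w)
```

The Hájek form always stays within [0, 1] because it normalises by the sum of the weights, whereas the Horvitz–Thompson form relies on the stated population size being correct.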
clinicalfair
Post-hoc fairness auditing for clinical prediction models. Computes group-stratified performance metrics with bootstrap confidence intervals, detects four-fifths rule violations, visualises calibration and ROC disparities, and performs threshold-based mitigation. Model-agnostic and aligned with FDA AI/ML guidance.
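The four-fifths rule check mentioned above can be sketched in a few lines of base R. This is an illustration under assumed toy data, not clinicalfair's API: it compares the rate of positive predictions across groups and flags a ratio below 0.8.

```r
# Toy binary predictions (invented) for two patient groups.
pred  <- c(1, 1, 1, 0, 0,  1, 0, 0, 0, 0)
group <- rep(c("A", "B"), each = 5)

# Positive-prediction (selection) rate within each group.
rates <- tapply(pred, group, mean)

# Ratio of the lowest to the highest group rate.
impact_ratio <- min(rates) / max(rates)

# The four-fifths rule flags a disparity when the ratio falls below 0.8.
violates_four_fifths <- impact_ratio < 0.8
```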
syntheticdata
Generates synthetic clinical datasets with integrated privacy auditing. Implements Gaussian copula, bootstrap, and Laplace noise synthesis, with built-in distributional validation, membership inference testing, attribute disclosure risk assessment, and downstream model fidelity comparison. Designed for HIPAA/GDPR-constrained data sharing.
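As a small illustration of the Laplace noise idea, the base R sketch below perturbs a numeric column. It is not syntheticdata's implementation: the values and the noise scale are hypothetical, and base R has no Laplace sampler, so the noise is drawn as the difference of two exponential variates.

```r
set.seed(1)

# Toy numeric column (invented values).
age <- c(34, 51, 47, 62, 29)

# Hypothetical noise scale; in a differential-privacy setting this is
# usually chosen as sensitivity / epsilon.
scale_b <- 2

# The difference of two independent Exponential(rate = 1 / b) draws
# follows a Laplace(0, b) distribution.
n     <- length(age)
noise <- rexp(n, rate = 1 / scale_b) - rexp(n, rate = 1 / scale_b)

synthetic_age <- age + noise
```

A larger scale gives stronger perturbation at the cost of fidelity, which is the trade-off the distributional validation and model fidelity checks are meant to quantify.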