A Counterfactual Analysis Framework for Algorithmic Discrimination
Innovative solutions for data curation and counterfactual analysis in critical domains.
Data Curation
Domain-Specific Datasets: Curate datasets from critical domains (e.g., legal documents, medical notes, job descriptions) where algorithmic discrimination has societal consequences.
Counterfactual Augmentation
Use GPT-4 to generate synthetic counterfactuals by systematically varying protected attributes (e.g., race, gender, age) in real-world texts. For example, a patient's symptom description can be rewritten to remove gendered language ("breast cancer" → "chest cancer").
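The perturbation step can be illustrated with a minimal sketch that uses a rule-based term swap as a stand-in for the GPT-4 rewrite described above. The swap table `GENDER_SWAPS` and the helper `swap_terms` are hypothetical names introduced for illustration only; a real study would need a much larger, validated lexicon and a check that each rewrite preserves the meaning of the text.

```python
import re

# Hypothetical swap table (an assumption for illustration, not part of the framework).
# Ambiguous forms such as "her" (his/him) are deliberately left out.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "himself": "herself", "herself": "himself",
    "mr": "ms", "ms": "mr",
    "male": "female", "female": "male",
}

def swap_terms(text, swaps=GENDER_SWAPS):
    """Return a counterfactual copy of `text` with each listed term swapped.

    All matches are replaced in a single pass, so a term that has already
    been swapped is never swapped back.
    """
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, swaps)) + r")\b", re.IGNORECASE
    )

    def replace(match):
        word = match.group(0)
        new = swaps[word.lower()]
        # Preserve a leading capital letter (e.g. at the start of a sentence).
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(replace, text)

original = "He said she was ready."
counterfactual = swap_terms(original)
```

Pairing each `original` with its `counterfactual` gives the matched inputs on which model decisions can later be compared.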
Attention Analysis
Track cross-attention patterns between perturbed tokens (e.g., names, pronouns) and output decisions to identify "bias hotspots" in the model architecture.
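One way to make the "bias hotspot" idea concrete is sketched below, assuming the attention weights have already been extracted from the model as an array of shape (layers, heads, sequence, sequence) whose rows are query positions. The functions `attention_to_perturbed` and `top_hotspots` are hypothetical helpers, and taking the final token as the decision position is an assumption; whether the weights are cross-attention or self-attention depends on the architecture under study.

```python
import numpy as np

def attention_to_perturbed(attn, perturbed_idx, decision_idx=-1):
    """Attention mass flowing from the decision position to the perturbed tokens.

    attn: array of shape (layers, heads, seq, seq); attn[l, h, i, j] is the
          weight that query position i places on key position j.
    perturbed_idx: list of token positions that were varied (names, pronouns).
    Returns an array of shape (layers, heads).
    """
    decision_rows = attn[:, :, decision_idx, :]           # (layers, heads, seq)
    return decision_rows[..., perturbed_idx].sum(axis=-1)  # (layers, heads)

def top_hotspots(score, k=3):
    """Return the k (layer, head) pairs with the highest score, best first."""
    flat_order = np.argsort(score, axis=None)[::-1][:k]
    return [
        tuple(int(i) for i in np.unravel_index(f, score.shape))
        for f in flat_order
    ]
```

Computing the same score for an original text and its counterfactual, then ranking heads by the difference, is one possible way to flag heads whose behavior depends on the protected attribute.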
Theoretical Advances
Formalize a counterfactual fairness framework for LLMs, extending causal inference principles to generative AI.
Prove that counterfactual-aware training can align model behavior with normative fairness criteria (e.g., demographic parity).
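For orientation, the two criteria named here can be written out in their standard forms from the causal-inference and fairness literature. The mapping to LLMs is an assumption made for illustration: $A$ is the protected attribute, $X$ the remaining features of the input text, $U$ the latent background variables, and $\hat{Y}$ the model's output decision.

```latex
% Counterfactual fairness: intervening on A leaves the distribution of the
% prediction unchanged, for every outcome y and every attribute value a'.
P\big(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\big)
  = P\big(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\big)

% Demographic parity: the positive-decision rate is equal across groups.
P\big(\hat{Y} = 1 \mid A = a\big) = P\big(\hat{Y} = 1 \mid A = a'\big)
```

The first criterion is defined per individual through counterfactuals, while the second is a group-level statistic, so any proof connecting them has to state which causal assumptions link the two.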

