Selected Work
Publications & Preprints
- ICLR 2026 Workshop on LLM Interpretability & Transparency (LIT) · arXiv:2602.11201
- Submitted · ACL 2026 TrustNLP Workshop · arXiv:2602.01442
I build tools to understand what language models actually do when they reason: not what their weights suggest, but what causally drives their outputs.
I develop mechanistic methods that move beyond gradient-based attribution to identify what drives a model's outputs, and where that process breaks down. The goal: interpretability tools grounded in causal rather than correlational evidence, so AI reasoning can be audited, trusted, and improved.
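To make the causal-versus-correlational distinction concrete, here is a minimal sketch (illustrative only, not code from the papers above; the toy model and every name in it are hypothetical). Gradient attribution asks which hidden units the output is locally sensitive to; a causal intervention such as activation patching instead swaps a unit's value in from a counterfactual input and measures how much the output actually moves.

```python
# Minimal sketch: correlational (gradient) vs. causal (patching) evidence
# on a toy two-layer network. All names and shapes are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
x_clean = torch.randn(1, 8)    # input of interest
x_corrupt = torch.randn(1, 8)  # counterfactual input

# --- Correlational evidence: gradient of the output w.r.t. hidden units.
acts = {}
def save_hook(module, inp, out):
    out.retain_grad()          # keep gradients on this non-leaf tensor
    acts["hidden"] = out
hook = model[1].register_forward_hook(save_hook)

y_clean = model(x_clean)
y_clean.backward()
grad_score = acts["hidden"].grad[0]  # per-unit sensitivity (correlational)
hook.remove()

# --- Causal evidence: patch each hidden unit with its value from the
# corrupt run and measure how much the output actually changes.
with torch.no_grad():
    h_clean = model[1](model[0](x_clean))
    h_corrupt = model[1](model[0](x_corrupt))
    causal_effect = torch.zeros(16)
    for i in range(16):
        patched = h_clean.clone()
        patched[0, i] = h_corrupt[0, i]  # intervene on a single unit
        y_patched = model[2](patched)
        causal_effect[i] = (y_patched - y_clean).abs().item()

print("top-3 by |gradient|:    ", grad_score.abs().topk(3).indices.tolist())
print("top-3 by causal effect: ", causal_effect.topk(3).indices.tolist())
```

The two rankings often disagree: units the gradient flags as important are not always the ones whose patching actually changes the output, which is precisely the gap between correlational and causal evidence.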
Currently an Election Data Engineer at The Associated Press and a researcher with Algoverse AI Research, mentored by Jonas Rohweder at LMU Munich.