Selected Work
Publications & Preprints
- ICLR 2026 Workshop on LLM Interpretability & Transparency (LIT) · arXiv:2602.11201
- Submitted · ACL 2026 TrustNLP Workshop · arXiv:2602.01442
I build tools to understand what language models actually do when they reason: not what their weights suggest, but what causally drives their outputs.
I develop mechanistic methods that move beyond gradient-based attribution to identify what drives a model's outputs, and where that process breaks down. The goal: interpretability tools grounded in causal rather than correlational evidence, so AI reasoning can be audited, trusted, and improved.
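To make the causal-versus-correlational distinction concrete, here is a minimal sketch (illustrative only, not code from the papers above; the toy model and every name in it are hypothetical). Gradient attribution asks which hidden units the output is locally sensitive to; a causal intervention such as activation patching instead swaps a unit's value in from a counterfactual input and measures how much the output actually moves.

```python
# Minimal sketch: correlational (gradient) vs. causal (patching) evidence
# on a toy two-layer network. All names and shapes are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
x_clean = torch.randn(1, 8)    # input of interest
x_corrupt = torch.randn(1, 8)  # counterfactual input

# --- Correlational evidence: gradient of the output w.r.t. hidden units.
acts = {}
def save_hook(module, inp, out):
    out.retain_grad()          # keep gradients on this non-leaf tensor
    acts["hidden"] = out
hook = model[1].register_forward_hook(save_hook)

y_clean = model(x_clean)
y_clean.backward()
grad_score = acts["hidden"].grad[0]  # per-unit sensitivity (correlational)
hook.remove()

# --- Causal evidence: patch each hidden unit with its value from the
# corrupt run and measure how much the output actually changes.
with torch.no_grad():
    h_clean = model[1](model[0](x_clean))
    h_corrupt = model[1](model[0](x_corrupt))
    causal_effect = torch.zeros(16)
    for i in range(16):
        patched = h_clean.clone()
        patched[0, i] = h_corrupt[0, i]  # intervene on a single unit
        y_patched = model[2](patched)
        causal_effect[i] = (y_patched - y_clean).abs().item()

print("top-3 by |gradient|:    ", grad_score.abs().topk(3).indices.tolist())
print("top-3 by causal effect: ", causal_effect.topk(3).indices.tolist())
```

The two rankings often disagree: units the gradient flags as important are not always the ones whose patching actually changes the output, which is precisely the gap between correlational and causal evidence.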
Currently an Election Data Engineer at The Associated Press and a researcher with Algoverse AI Research, mentored by Jonas Rohweder at LMU Munich.