Paperdrop by Constella
Monday · May 4, 2026
Step 1 of 2 · Set up your drop

What papers do you want to wake up to?

Describe in your own words what you care about. Stella will turn it into a feed and surface a handful of new arXiv papers each morning — just the ones worth reading.

Your topics (5)
#mechanistic interpretability · #agent evaluation · #long-context retrieval · #sparse autoencoders · #reasoning models
Or pick from trending in cs.LG · cs.AI
tool use · reward hacking · in-context learning · speculative decoding · world models · multimodal grounding · constitutional AI · circuit analysis
Daily at 8:00 AM · 6 papers
Monday's drop · 6 new papers

6 papers landed for you overnight.

★ Top match
matched #mechanistic interpretability · 94%
2h ago
Locating and Editing Refusal Circuits in 70B Models
Mei Zhao, Jonas Lindqvist, Priya Ranjan, + 4 · arXiv · cs.LG · arXiv 2604.01723
Identifies a 12-head circuit responsible for refusal behavior in Llama-3 70B and shows targeted fine-tuning of those heads alone reproduces 89% of full-model refusal training.
  • Localizes refusal to layers 47–58, predominantly in 12 attention heads.
  • Editing only those heads recovers 89% of safety-tuning behavior at 0.4% of the parameter cost.
  • Generalizes across 4 model families with a transfer rate of 76%.
More for you
matched #agent evaluation · 88%
AgentBench-X: Robustness Evaluation for Tool-Using LLMs
Sho Tanaka, Aria Bell, Wenjun Liu · arXiv · cs.AI · 5h ago
Introduces 240 perturbation suites across 14 agent benchmarks. GPT-class agents lose an average of 31% accuracy under nominal-looking input drift; CoT helps less than expected.
arXiv 2604.01198
matched #sparse autoencoders · 86%
Sparse Autoencoders Don’t Find What You Think They Find
Eli Carrington, Yuki Park, David Osei · arXiv · cs.LG · 7h ago
Shows that 60% of features extracted by leading SAE training recipes are spurious correlations of tokenizer artifacts. Proposes a debiased loss that drops the rate to 11%.
arXiv 2604.00891
matched #long-context retrieval · 79%
Million-token Retrieval with Hierarchical Position Pruning
Ananya Rao, Marcus Vance · arXiv · cs.CL · 11h ago
Hierarchical pruning at the position-embedding level allows 1M-token recall at 2.1× the speed of YaRN with no quality drop on RULER.
arXiv 2604.00422
matched #reasoning models · 74%
Process Reward Models Generalize Worse Than We Reported
Lina Park, Ramon Diaz, Esther Whyte, + 2 · arXiv · cs.LG · 18h ago
Re-evaluates 7 popular PRMs on out-of-domain math problems. Reports a 22–38% gap that does not appear on the original benchmarks. Releases ProcessOOD-1k.
arXiv 2604.00118
matched #agent evaluation · 71%
A Calculus of Self-Correction in Long-Horizon Agents
Hiroshi Abe, Tara Quill · arXiv · cs.AI · 21h ago
Formal framework for measuring how often agents revise plans mid-execution. Finds revision rate is anti-correlated with task success above 8 steps.
arXiv 2603.27744
That's the drop.
Stella scanned 1,847 new arXiv submissions and picked these 6. Next drop tomorrow at 8:00 AM.