ml-debug/docs/evidence/nanda_research_process_shared_draft.md

# Shared Publicly: My Model of the Research Process - Neel Nanda draft/local copy

- Source file: /home/wassname/Downloads/[Shared Publicly] My Model of the Research Process_ Explore, Understand, Distill.md
- Author shown in content: Neel Nanda
- Date: not stated in local file; contains published posts dated 2025-04-26, 2025-04-27, and 2025-05-01 plus expanded stage-guide material.
- Fetch-status: local user-provided/downloaded markdown. Treat as a shared draft/local copy, not identical to the public LessWrong pages.
- Use: practical stage guide for agents. This is the most operational source for ideation, exploration, understanding, distillation, failure modes, and mentor role.

## Why this matters for agents

The published posts establish the frame. This local draft contains the useful agent checklist: when to ideate, when to explore, what counts as surface area, how to test hypotheses, how to refine evidence, and when to go back a stage.

## Quotes

> You can't do research without a question or a domain. Ideation is about finding fertile ground. It might be quick, eg deferring to a mentor, or it might involve significant exploration itself, with explorations of many unpromising domains before you settle on one.

> While research taste is important, there are many other crucial skills, and research taste itself comprises several distinct abilities that shouldn't be naively conflated. Rather than focusing solely on research taste, I’ve tried to break down the research process into concrete and specific skills.

> Chris Olah has an excellent short post on what research taste is and exercises to learn it. In this spirit, for each of the aspects of the below, I highly recommend predicting a mentor’s answer before asking.

> Ideation ends when you have a clear enough question or domain that you can start generating concrete experiments to run.

> Leverage Mentors: Especially early on, it’s fine to let someone else do the work here, i.e. have a mentor recommend a problem.

> This is basically borrowing someone else’s research taste, and IMO is one of the most valuable things I do for my mentees.

> Goal: Gain understanding of the problem/domain, start to identify and crystallise interesting hypotheses.

> Your north star is information gained per unit time/effort.

> Crucially, Exploration is not about testing a specific hypothesis. Exploration is about gaining enough of an understanding of a domain that you know what the interesting hypotheses even are.

> It’s OK to be confused: It’s totally normal to spend a large fraction of this stage feeling pretty confused about what’s going on. This is fine and does not mean that you’re failing! The key question is whether you feel like you are learning things and becoming less confused.

> Reach for a tool that might show you something interesting, and can be employed fast. Don’t hold yourself to the standard of tools that you’re confident are good.

> Notice Weirdness: This is critical. Pay close attention to results that are surprising, counter-intuitive, inconsistent, or just feel off. Ask "Why?" relentlessly.

> Research Log: Keep a detailed log (daily or per session). Note down: goals for the session, what you tried, observations (especially weird ones!), links to code/plots (eg to notebooks or git commits or saved plots), brief thoughts/interpretations, ideas for next steps.

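The research-log quote above lists the fields an entry should carry. For an agent, one way to make that concrete is a small record type that renders to markdown. This is a minimal sketch, not something from the source: the class name `LogEntry`, its field names, and the example entry are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class LogEntry:
    """One research-log session. All names here are illustrative assumptions."""

    day: date
    goals: List[str]
    tried: List[str] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)
    weird: List[str] = field(default_factory=list)      # surprising results, kept separate
    links: List[str] = field(default_factory=list)      # notebooks, commits, saved plots
    thoughts: str = ""
    next_steps: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the entry as markdown, skipping empty sections."""

        def section(title: str, items: List[str]) -> List[str]:
            if not items:
                return []
            return [f"### {title}"] + [f"- {item}" for item in items] + [""]

        lines = [f"## {self.day.isoformat()}", ""]
        lines += section("Goals", self.goals)
        lines += section("Tried", self.tried)
        lines += section("Observations", self.observations)
        lines += section("Weird", self.weird)
        lines += section("Links", self.links)
        if self.thoughts:
            lines += ["### Thoughts", self.thoughts, ""]
        lines += section("Next steps", self.next_steps)
        return "\n".join(lines).rstrip() + "\n"


# Made-up example entry, for illustration only.
entry = LogEntry(
    day=date(2025, 1, 2),
    goals=["Check whether the probe result survives a reshuffled label control"],
    tried=["Re-ran the probe with shuffled labels"],
    weird=["Shuffled-label accuracy is well above chance"],
    next_steps=["Look for label leakage in the data split"],
)
print(entry.to_markdown())
```

Keeping weird observations in their own field is a design choice of this sketch; it makes them easy to scan when deciding what to pursue next.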
> Mentorship Role: Suggesting initial explorations & relevant resources, distinguishing genuinely weird results from known artifacts, providing sanity checks, helping prioritize which weirdness to pursue first.

> Goal: Rigorously testing specific, plausible hypotheses.

> Design High Information Experiments: Design experiments specifically to differentiate between your main hypothesis and the most plausible alternatives. Ask: "What prediction does H1 make that H2 contradicts?" Think like a Bayesian: what evidence is most likely under H1 relative to H2?

> Avoid the mistake of looking for evidence predicted by H1 that’s also predicted by a bunch of other things!

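The two quotes above can be read as a statement about likelihood ratios: evidence only moves you if H1 and H2 assign it different probabilities. The following is a minimal sketch under the assumption that exactly two hypotheses are in play; the function names and the probabilities in the usage lines are illustrative, not from the source.

```python
def likelihood_ratio(p_e_given_h1: float, p_e_given_h2: float) -> float:
    """How much more likely the evidence is under H1 than under H2."""
    if not 0.0 <= p_e_given_h1 <= 1.0:
        raise ValueError("p_e_given_h1 must be in [0, 1]")
    if not 0.0 < p_e_given_h2 <= 1.0:
        raise ValueError("p_e_given_h2 must be in (0, 1]")
    return p_e_given_h1 / p_e_given_h2


def posterior_prob_h1(prior_h1: float, p_e_given_h1: float, p_e_given_h2: float) -> float:
    """Posterior P(H1 | E), assuming H1 and H2 are the only two hypotheses."""
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior_h1 must be strictly between 0 and 1")
    prior_odds = prior_h1 / (1.0 - prior_h1)
    odds = prior_odds * likelihood_ratio(p_e_given_h1, p_e_given_h2)
    return odds / (1.0 + odds)


# Illustrative numbers only.
# Evidence that both hypotheses predict barely moves a 50/50 prior ...
weak = posterior_prob_h1(0.5, p_e_given_h1=0.9, p_e_given_h2=0.85)
# ... while evidence that H2 rarely predicts moves it a lot.
strong = posterior_prob_h1(0.5, p_e_given_h1=0.9, p_e_given_h2=0.1)
print(f"shared prediction: {weak:.3f}, discriminating prediction: {strong:.3f}")
```

An experiment is high information in this sense when the outcomes you might observe have ratios far from 1.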
> Use appropriate baselines - e.g. it’s not enough to show that your technique helps to lower a model’s performance on harmful tasks. Does a random vector do worse?

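One way to run the random-vector baseline the quote asks for is to compare the measured effect of a candidate vector against the effects of random vectors with the same norm. This is a sketch under stated assumptions: `effect` stands in for whatever metric you measure (for example, the drop in task performance), and the toy effect and candidate vector below are invented for illustration. It uses only the standard library.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def random_vector_like(v: Sequence[float], rng: random.Random) -> List[float]:
    """Sample a Gaussian direction and rescale it to the norm of v."""
    g = [rng.gauss(0.0, 1.0) for _ in v]
    g_norm = math.sqrt(sum(x * x for x in g))
    v_norm = math.sqrt(sum(x * x for x in v))
    return [x * v_norm / g_norm for x in g]


def baseline_comparison(
    effect: Callable[[Sequence[float]], float],
    v: Sequence[float],
    n_random: int = 200,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed effect, mean random effect, fraction of random >= observed)."""
    rng = random.Random(seed)
    observed = effect(v)
    baseline = [effect(random_vector_like(v, rng)) for _ in range(n_random)]
    mean_random = sum(baseline) / n_random
    frac_at_least = sum(b >= observed for b in baseline) / n_random
    return observed, mean_random, frac_at_least


# Toy stand-in for a real metric: the effect is how strongly a vector
# projects onto a hidden direction. Entirely hypothetical.
DIM = 16
hidden = [1.0] + [0.0] * (DIM - 1)


def toy_effect(v: Sequence[float]) -> float:
    return abs(sum(a * b for a, b in zip(v, hidden)))


candidate = [2.0 * h for h in hidden]
observed, mean_random, frac = baseline_comparison(toy_effect, candidate)
print(f"observed={observed:.2f} mean_random={mean_random:.2f} frac_random>=observed={frac:.3f}")
```

If a large fraction of random vectors match the candidate's effect, the effect is probably not specific to the technique. Matching the norm is a choice made in this sketch so that magnitude alone cannot explain the difference.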
> Actively Seek Alternatives: Explicitly brainstorm other ways your observations could be explained. What are the simplest explanations? What known circuits or phenomena could be involved? What would a strong skeptic argue?

> Mentorship Role: Aggressively red teaming hypotheses and experimental designs. Suggesting crucial alternative hypotheses or experiments. Helping interpret confusing results. Conveying conceptual frameworks to make sense of findings. Pushing for higher standards of rigor and clarity.

> Goal: Distill all the messy insights from your research into concise, rigorous truth to communicate it to the world.

> Select Strongest Evidence: To start, choose the clearest, most convincing experiments, visualizations, and analyses that directly support your main claims.

> Acknowledge limitations: Inevitably, your results will have some limitations - edge cases, ways your evidence could be wrong, etc. I strongly encourage you to discuss these clearly and prominently in a write-up, even if you don’t have good counters to it.

> Your goal is to inform not persuade.

> The truth is what it is, and you should strive to understand it, even if it is inconvenient.

## Source graph

Links visible in this local draft that are worth following up:
- Chris Olah, research taste: https://colah.github.io/notes/taste/
- Jacob Steinhardt, Research as a Stochastic Decision Process: https://cs.stanford.edu/~jsteinhardt/ResearchasaStochasticDecisionProcess.html
- Nanda paper reading list: https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite
- Nanda Othello research process: https://www.alignmentforum.org/s/nhGNHyJHbrofpPbRG/p/TAz44Lb9n9yf52pv8
- Nanda five-minute timers: https://www.neelnanda.io/blog/post-28-on-creativity-the-joys-of-5-minute-timers
- Nanda weekly reflection: https://www.neelnanda.io/blog/39-reflection
- Negative results for SAEs: https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/negative-results-for-saes-on-downstream-tasks
- Research Debt: referenced by name in local draft; URL not included in visible excerpt.