Claude Code for RAs - Menu
Chapter 5 · Pitfalls

Common Failures

Where things break, and how to spot them before they ship.

AI fails in predictable ways. Once you learn the patterns, you catch them faster. This page is a taxonomy of how things go wrong and what to do about it.

The Failure Taxonomy

Most AI failures fall into eight categories:

Hallucination: making up things that do not exist (file paths, variable names, functions, citations)
Laziness shortcuts: taking the path of least resistance and presenting it as the right answer
Memory over observation: using training data instead of checking the current state of your data or code
Scope creep: doing more than you asked, silently changing things you did not mention
Context degradation: quality dropping in long conversations as it forgets earlier instructions
Confident domain errors: getting econometric methodology or domain knowledge wrong with complete confidence
Silent data decisions: dropping observations, changing merge types, or altering samples without flagging it
Configuration bloat: your CLAUDE.md or skill files grow so long that Claude starts skimming and missing rules

Hallucination

Claude will invent things that do not exist and present them as fact. File paths that do not exist on your machine. Variable names that are not in your dataset. Stata commands with options that were never implemented. Functions in packages that do not have those functions.

The danger is that hallucinated output often looks correct. The file path has a plausible structure. The variable name sounds reasonable. The Stata syntax is almost right.

Example: hallucinated variable
* Claude writes:
merge 1:1 firm_id year using "data/trade_flows.dta"
regress log_revenue trade_exposure i.year, cluster(firm_id)

* Reality:
* - The variable is called "rev", not "log_revenue"
* - The file is "data/trade/bilateral_flows.dta", not "data/trade_flows.dta"
* - Claude invented both names because they sounded plausible

The Laziness Shortcut

AI takes the path of least resistance and presents it as the right answer. You ask it to download a large dataset with multiple tables. Instead of downloading everything, it claims some tables are "redundant" or "reconstructable from joins" to justify skipping them. It sounds plausible. It might even be technically accurate about the schema. But it has not actually verified whether those tables contain the same data.

When you push back, it immediately concedes. "You're right, let me actually check." This is the tell. If the AI concedes instantly when challenged, it was never confident in the first place; it was taking a shortcut.

Memory Over Observation

This is different from hallucination, and more insidious. AI defaults to what it "remembers" from training data rather than using its available tools to check the current state. The knowledge may be factually correct in general, but wrong for your specific situation.

Example: you ask whether a database table can bridge two datasets. AI says "yes, that table has a gvkey column," which is true based on the schema it learned during training. But it never actually checks whether the column is populated in your data. When you run a count, you find 3 rows. The schema knowledge was correct; the usefulness claim was not.

The tell is the same: ask "did you actually check, or are you telling me what you expect to be true?" If the answer starts with "You're right, let me verify...," it was working from memory.

Scope Creep

You ask Claude to fix one thing. It fixes that thing, and also refactors two functions, renames a variable, and adds error handling you did not ask for. Each individual change might be reasonable. Together, they make it impossible to verify what happened.

This is especially dangerous with data work. You ask Claude to add a variable to a regression. It adds the variable, and also changes the sample restriction, drops an interaction term, and switches from areg to reghdfe. The coefficient you get is different, and you do not know why.

Context Degradation

Long conversations lose quality. Around 30 to 40 exchanges in, Claude starts to:

This is not a bug you can work around. It is a fundamental limitation. The fix is simple: clear the conversation and start fresh. See Managing Conversations in Getting Started.

Confident Domain Errors

Claude will state econometric methodology with complete confidence, and be wrong. It will suggest an identification strategy that does not address your endogeneity concern. It will recommend clustering at the wrong level. It will describe an estimator's properties incorrectly.

This is the hardest failure to catch because it requires domain knowledge. A first-year RA may not know that clustering at the state level is wrong when the treatment varies at the firm level. Claude will not flag this. It will present the wrong approach with the same confidence as the right one.

Skills Not Loading

You installed skills from the team repository, but when you type / nothing shows up. Claude does not recognize any of the custom commands. This is almost always a directory structure problem.

Claude Code's ~/.claude/skills/ directory requires each skill to be a directory containing a file called SKILL.md. If you put flat .md files directly in ~/.claude/skills/, they are silently ignored. No error, no warning, just nothing loads.

Will not load

~/.claude/skills/prompt.md

~/.claude/skills/audit-code.md

Correct structure

~/.claude/skills/prompt/SKILL.md

~/.claude/skills/audit-code/SKILL.md

If you followed the installation instructions on the Making It Yours page (clone + symlink), the structure is correct automatically. If you copied files manually, check the directory layout.

Another common cause: you need to restart Claude Code after creating new skill directories. Edits to existing skills are picked up live, but brand-new directories are only detected on session start.

Configuration Bloat

Remember the mental model: Claude is an over-eager, overconfident collaborator. When you hand it a 10-page briefing document, it does exactly what a rushed human assistant would do. It skims. It reads the first few sections carefully, then starts skipping over details buried deeper in the file. It will never tell you "this file is too long for me to follow reliably." It just quietly stops following some of your rules.

This happens with your CLAUDE.md files and your skill files. You start with a lean, focused configuration. Over months, you keep adding rules, clarifications, edge cases. The file grows from 50 lines to 200 to 500 to 800. At some point, Claude starts missing instructions that are right there in the file. You catch it breaking a rule you know you wrote down. You check, and the rule is indeed there, on line 347. Claude just did not process it.

How to keep files lean

Bad Prompts vs. Good Prompts

The way you ask matters. Not because you need special syntax, but because vague requests produce vague output.

Vague

"Clean this data."

Specific

"Drop observations where revenue is missing or negative. Keep all other observations. Report how many were dropped."

Vague

"Run a regression."

Specific

"Regress log revenue on trade_exposure with firm and year fixed effects, clustering at the firm level. Use the sample from 2005-2020."

Vague

"Make a table."

Specific

"Create a LaTeX table with columns (1)-(3), showing progressively richer fixed effects. Include N, R-squared, and fixed effect indicators at the bottom."

Vague

"Fix the error."

Specific

"The merge is failing because firm_id has leading zeros in the master dataset but not in the using dataset. Fix the merge keys to match."

Verification Recipes

Quick checklists for common tasks:

After a Stata do-file runs

After a merge

After a LaTeX table

After any file edit

When to Stop and Do It Yourself

AI is not always the right tool. Stop using it and switch to manual work when:

There is no shame in doing something manually. AI is a multiplier, not a replacement for your skills.

When to Escalate

Some situations require help. Escalate to Adrien when:

Learn when to hand off tasks and let AI work without supervision.