Chapter 5 · Pitfalls

Common Failures

Where things break, and how to spot them before they ship.

AI fails in predictable ways. Once you learn the patterns, you catch them faster. This page is a taxonomy of how things go wrong and what to do about it.

The Failure Taxonomy

Most AI failures fall into eight categories:

Hallucination: making up things that do not exist (file paths, variable names, functions, citations)

Laziness shortcuts: taking the path of least resistance and presenting it as the right answer

Memory over observation: relying on what it learned in training instead of checking the current state of your data or code

Scope creep: doing more than you asked, silently changing things you did not mention

Context degradation: quality dropping in long conversations as it forgets earlier instructions

Confident domain errors: getting econometric methodology or domain knowledge wrong with complete confidence

Silent data decisions: dropping observations, changing merge types, or altering samples without flagging it

Configuration bloat: your CLAUDE.md or skill files grow so long that Claude starts skimming and missing rules

Hallucination

Claude will invent things that do not exist and present them as fact. File paths that do not exist on your machine. Variable names that are not in your dataset. Stata commands with options that were never implemented. Functions attributed to packages that do not provide them.

The danger is that hallucinated output often looks correct. The file path has a plausible structure. The variable name sounds reasonable. The Stata syntax is almost right.

Example: hallucinated variable

* Claude writes:
merge 1:1 firm_id year using "data/trade_flows.dta"
regress log_revenue trade_exposure i.year, cluster(firm_id)

* Reality:
* - The variable is called "rev", not "log_revenue"
* - The file is "data/trade/bilateral_flows.dta", not "data/trade_flows.dta"
* - Claude invented both names because they sounded plausible
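A cheap guard is to make Stata confirm every name before the real code runs. The sketch below builds a small toy dataset (made-up values) that contains rev but not log_revenue, then wraps confirm variable and confirm file in capture, so a missing name is reported instead of silently trusted. The data and the path are illustrative only.

```stata
* Toy stand-in for your real data (made-up values)
clear
input long firm_id int year double rev
1 2010 120.5
1 2011 131.0
2 2010  98.2
end

* confirm variable fails if the name is absent; capture keeps the return code in _rc
capture confirm variable log_revenue
local rc_var = _rc
if `rc_var' != 0 {
    display as error "log_revenue not found. Variables actually in memory:"
    describe, simple
}

* Same check for the file path Claude wrote
capture confirm file "data/trade_flows.dta"
local rc_file = _rc
if `rc_file' != 0 {
    display as error "data/trade_flows.dta does not exist"
}
```

A nonzero _rc is your cue to ask Claude where the name came from before running anything else.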

The Laziness Shortcut

AI takes the path of least resistance and presents it as the right answer. You ask it to download a large dataset with multiple tables. Instead of downloading everything, it claims some tables are "redundant" or "reconstructable from joins" to justify skipping them. It sounds plausible. It might even be technically accurate about the schema. But it has not actually verified whether those tables contain the same data.

When you push back, it immediately concedes. "You're right, let me actually check." This is the tell. If the AI concedes instantly when challenged, it was never confident in the first place; it was taking a shortcut.

Memory Over Observation

This is different from hallucination, and more insidious. AI defaults to what it "remembers" from training data rather than using its available tools to check the current state. The knowledge may be factually correct in general, but wrong for your specific situation.

Example: you ask whether a database table can bridge two datasets. AI says "yes, that table has a gvkey column," which is true based on the schema it learned during training. But it never actually checks whether the column is populated in your data. When you run a count, only 3 rows have a gvkey. The schema knowledge was correct; the usefulness claim was not.
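The remedy is to run that count yourself, or ask Claude to run it, before building anything on the claim. A minimal sketch, assuming gvkey is stored as a string that is blank when missing; the five-row linking table and the link_id variable are made up for illustration:

```stata
* Made-up linking table; gvkey is a string, blank when missing
clear
input long link_id str6 gvkey
1 "001690"
2 ""
3 ""
4 "012141"
5 ""
end

* Observe, do not assume: how many rows actually carry a gvkey?
quietly count
local n_total = r(N)
quietly count if !missing(gvkey)
local n_gvkey = r(N)
display "gvkey populated in `n_gvkey' of `n_total' rows"
```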

The tell is the same: ask "did you actually check, or are you telling me what you expect to be true?" If the answer starts with "You're right, let me verify...," it was working from memory.

Scope Creep

You ask Claude to fix one thing. It fixes that thing, and also refactors two functions, renames a variable, and adds error handling you did not ask for. Each individual change might be reasonable. Together, they make it impossible to verify what happened.

This is especially dangerous with data work. You ask Claude to add a variable to a regression. It adds the variable, and also changes the sample restriction, drops an interaction term, and switches from areg to reghdfe. The coefficient you get is different, and you do not know why.
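One way to keep a requested change verifiable is to estimate the old and the new specification on the same sample and compare them side by side, so the only difference is the one you asked for. The sketch uses Stata's bundled auto dataset purely for illustration; price, mpg, and weight stand in for your own variables, and in_base is a hypothetical helper name.

```stata
sysuse auto, clear

* Specification you already had
regress price mpg, vce(robust)
estimates store base
* Flag the baseline estimation sample
generate byte in_base = e(sample)

* Requested change only: add one regressor, same sample, same SEs
regress price mpg weight if in_base, vce(robust)
estimates store requested

* Stop if the new regressor silently dropped observations
quietly count if in_base
assert e(N) == r(N)

* Coefficients, SEs, and N for both models side by side
estimates table base requested, b se stats(N)
```

If Claude's version of the regression differs from requested in anything other than the added coefficient, something else changed.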

Context Degradation

Long conversations lose quality. Around 30 to 40 exchanges in, Claude starts to:

Forget instructions you gave earlier in the conversation
Contradict its own previous output
Repeat the same suggestion after you already rejected it
Produce increasingly generic or formulaic responses

You cannot prompt your way around this; it is a fundamental limitation of long conversations. The fix is simple: clear the conversation and start fresh. See Managing Conversations in Getting Started.

Confident Domain Errors

Claude will state econometric methodology with complete confidence, and be wrong. It will suggest an identification strategy that does not address your endogeneity concern. It will recommend clustering at the wrong level. It will describe an estimator's properties incorrectly.

This is the hardest failure to catch because it requires domain knowledge. A first-year RA may not know that clustering at the firm level understates standard errors when the treatment varies at the state level. Claude will not flag this. It will present the wrong approach with the same confidence as the right one.

Skills Not Loading

You installed skills from the team repository, but when you type / nothing shows up. Claude does not recognize any of the custom commands. This is almost always a directory structure problem.

Claude Code's ~/.claude/skills/ directory requires each skill to be a directory containing a file called SKILL.md. If you put flat .md files directly in ~/.claude/skills/, they are silently ignored. No error, no warning, just nothing loads.

Will not load

~/.claude/skills/prompt.md

~/.claude/skills/audit-code.md

Correct structure

~/.claude/skills/prompt/SKILL.md

~/.claude/skills/audit-code/SKILL.md

If you followed the installation instructions on the Making It Yours page (clone + symlink), the structure is correct automatically. If you copied files manually, check the directory layout.

Another common cause: you created new skill directories but did not restart Claude Code. Edits to existing skills are picked up live, but brand-new directories are only detected on session start.

Configuration Bloat

Remember the mental model: Claude is an over-eager, overconfident collaborator. When you hand it a 10-page briefing document, it does exactly what a rushed human assistant would do. It skims. It reads the first few sections carefully, then starts skipping over details buried deeper in the file. It will never tell you "this file is too long for me to follow reliably." It just quietly stops following some of your rules.

This happens with your CLAUDE.md files and your skill files. You start with a lean, focused configuration. Over months, you keep adding rules, clarifications, edge cases. The file grows from 50 lines to 200 to 500 to 800. At some point, Claude starts missing instructions that are right there in the file. You catch it breaking a rule you know you wrote down. You check, and the rule is indeed there, on line 347. Claude just did not process it.

How to keep files lean

Your global CLAUDE.md should contain universal rules only. Anything project-specific belongs in the project CLAUDE.md, not the global one.
If a multi-step instruction keeps growing, extract it into a skill (a separate command file). One job per file.
Remove rules you no longer need. If a project is done, clean out the project CLAUDE.md. If a rule was specific to a dataset you no longer use, delete it.
Prioritize: put your most important rules at the top of the file, not buried at the bottom.

Bad Prompts vs. Good Prompts

The way you ask matters. Not because you need special syntax, but because vague requests produce vague output.

Vague

"Clean this data."

Specific

"Drop observations where revenue is missing or negative. Keep all other observations. Report how many were dropped."

Vague

"Run a regression."

Specific

"Regress log revenue on trade_exposure with firm and year fixed effects, clustering at the firm level. Use the sample from 2005-2020."

Vague

"Make a table."

Specific

"Create a LaTeX table with columns (1)-(3), showing progressively richer fixed effects. Include N, R-squared, and fixed effect indicators at the bottom."

Vague

"Fix the error."

Specific

"The merge is failing because firm_id has leading zeros in the master dataset but not in the using dataset. Fix the merge keys to match."
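For the last prompt, the fix Claude should produce is small and easy to check. A minimal sketch with made-up data, assuming firm_id is stored as a string of digits in both files and that converting it to numeric is acceptable (IDs containing letters would need a string-based fix instead); the tempfile name trade is hypothetical:

```stata
* Using file: firm_id as a string without leading zeros (made-up data)
clear
input str8 firm_id double exports
"123"  10.5
"4567" 22.0
end
* Convert the key to numeric so "00123" and "123" both become 123
destring firm_id, replace
tempfile trade
save "`trade'"

* Master file: same IDs, but padded with leading zeros
clear
input str8 firm_id double revenue
"00123" 100
"04567" 250
"00089"  75
end
destring firm_id, replace

merge 1:1 firm_id using "`trade'"
tabulate _merge
```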

Verification Recipes

Quick checklists for common tasks:

After a Stata do-file runs

Check the log for error messages (even if Claude says it ran fine)
Verify observation counts at each step
Confirm variable names match the actual data
Check that file paths resolve to real files
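Most of the do-file checklist can be built into the do-file itself. A sketch using Stata's bundled auto dataset as a stand-in; the log name pipeline_check.log is hypothetical:

```stata
* Keep a plain-text log you can read yourself
capture log close _all
log using "pipeline_check.log", text replace

sysuse auto, clear

* Record the observation count before and after each step
quietly count
local n_start = r(N)
drop if missing(rep78)
quietly count
local n_clean = r(N)
local n_dropped = `n_start' - `n_clean'
display "Dropped `n_dropped' of `n_start' observations with missing rep78"

* Fail loudly if a variable the later code needs is not there
confirm variable price mpg rep78

log close
```

Afterwards, search the log for "r(": Stata prints every error with its return code in that form, for example r(111); when a variable is not found.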

After a merge

Tabulate _merge: how many matched vs. unmatched?
Compare observation count before and after
Check for duplicates on the merge key
Spot-check a few observations manually
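The merge checklist maps onto a handful of Stata commands. A sketch with made-up data, assuming a firm-year panel merged m:1 onto firm-level characteristics; the variable names and tempfile names are hypothetical:

```stata
* Made-up firm-year master data
clear
input long firm_id int year double revenue
1 2010 100
1 2011 110
2 2010  90
3 2010  80
end
isid firm_id year
tempfile panel
save "`panel'"

* Made-up firm-level using data
clear
input long firm_id str10 sector
1 "retail"
2 "manuf"
4 "services"
end
* Duplicates on the key: report them, then stop if there are any
duplicates report firm_id
isid firm_id
tempfile firms
save "`firms'"

* Count before, merge, count after
use "`panel'", clear
local n_before = _N
merge m:1 firm_id using "`firms'"
tabulate _merge
display "Observations: `n_before' before, " _N " after"

* Spot-check the rows that did not match
list firm_id year sector _merge if _merge != 3
```

Here one master firm and one using firm fail to match, and the row count grows from 4 to 5 because the unmatched using observation is kept. Both are things to notice and decide on, not to let pass silently.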

After a LaTeX table

Does the table compile without errors?
Do the numbers match the Stata output?
Are the fixed effect indicators correct?
Does the table fit on one page?

After any file edit

Run git diff: are the changes exactly what you asked for?
Are there changes you did not request?
Did it modify files you did not mention?

When to Stop and Do It Yourself

AI is not always the right tool. Stop using it and switch to manual work when:

You have corrected Claude on the same issue three times and it keeps making the same mistake
The task requires judgment that you cannot verify (e.g., "is this identification strategy valid?")
You are spending more time fixing Claude's output than it would take to do the task yourself
The task involves sensitive data that should not be processed by external tools

There is no shame in doing something manually. AI is a multiplier, not a replacement for your skills.

When to Escalate

Some situations require help. Escalate to Adrien when:

Claude is stuck in a loop, producing the same error three or more times
You have been trying to fix something for more than 15 minutes and you are not making progress
Claude is suggesting something you do not understand and cannot verify
An MCP connection is broken (Stata, Google Workspace, etc.)
You are unsure whether a methodological choice is correct
