Lesson 9: Memory, Context, and the #1 Problem in Vibe Coding
Course: AI-Powered Development (Dev Track) | Duration: 2 hours | Level: Intermediate
Overview
This is the most important lesson in the course. Everything you learn about prompting, tools, and workflows eventually collides with one hard constraint: the context window. When you understand how AI memory works — and what happens when it fills up — you become a fundamentally better agent director. When you don't understand it, you spend hours debugging problems that aren't bugs.
Session: A5.1 — "What Are Memory and Context? — Visual Explainer"
What you will be able to do after this lesson:
- Explain what a context window is and why it is finite
- Describe the three types of agent memory and when each one persists
- Recognize the symptoms of context rot before it derails a session
- Measure context usage in Claude Code and Cursor
- Adjust your workflow to stay inside the quality zone
Prerequisites
- Lesson 7 (Agent Architecture) — you understand what an agent is
- Lesson 8 (Tool Use) — you understand how tool results flow back into the agent
- You have Claude Code installed and can run a basic session
Part 1: The Context Window — The Agent's Working Memory (25 min)
What Is a Context Window?
A language model does not have persistent memory between API calls. Every time the model generates a response, it reads a single large block of text — the context window — and produces a reply. That block contains everything: your instructions, the full conversation so far, every file the agent read, every tool result it received.
The context window is the agent's only source of truth in that moment.
Think of it as a whiteboard. The model can see everything written on the whiteboard, and it can write back. But the whiteboard has a fixed size. Once it is full, you either erase something or stop writing.
Visual: The Context Window as a Container
┌─────────────────────────────────────────────────────────────────┐
│ CONTEXT WINDOW (200,000 tokens) │
├─────────────────────────────────────────────────────────────────┤
│ SYSTEM PROMPT / RULES ~2,000 tokens │
│ (CLAUDE.md, project info, agent instructions) │
├─────────────────────────────────────────────────────────────────┤
│ CONVERSATION HISTORY │
│ Turn 1: User message + Assistant response ~1,000 tokens │
│ Turn 2: User message + Assistant response ~1,500 tokens │
│ Turn 3: User message + Assistant response ~2,000 tokens │
│ ... │
│ Turn N: User message + Assistant response ~2,000 tokens │
├─────────────────────────────────────────────────────────────────┤
│ TOOL RESULTS │
│ File read: src/api/users.ts ~3,000 tokens │
│ File read: src/api/auth.ts ~2,500 tokens │
│ Bash output: npm test ~1,200 tokens │
│ File read: src/models/user.model.ts ~1,800 tokens │
│ Search results: 5 files matching "auth" ~4,000 tokens │
├─────────────────────────────────────────────────────────────────┤
│ AVAILABLE SPACE ████████████░░░░░░░░░░░░░░░░░ 60% used │
│ ↑ shrinks every single turn │
└─────────────────────────────────────────────────────────────────┘
What Goes Into the Context Window
Every token in the context window falls into one of four categories:
| Category | Examples | Typical Size |
|---|---|---|
| System prompt | CLAUDE.md, agent rules, project description | 500 – 5,000 tokens |
| Conversation history | Every user message and assistant reply | 500 – 3,000 tokens per turn |
| Tool results | File contents, bash output, search results | 500 – 10,000 tokens per tool call |
| Current turn | Your new message | 10 – 500 tokens |
The key insight: none of this ever leaves the window. Every turn adds more. The agent cannot selectively forget Turn 3 while remembering Turn 10. It is all-or-nothing.
What Is a Token?
A token is the smallest unit the model processes. It is not a word, not a character, not a line — it is a subword unit produced by a tokenizer.
Rough rules of thumb:
- 1 token ≈ 4 characters of English text
- 1 token ≈ 0.75 words
- 1,000 tokens ≈ 750 words ≈ 1.5 pages of text
- A typical source code file (200 lines) ≈ 1,500 – 3,000 tokens
- A long README ≈ 1,000 – 2,000 tokens
"Hello, world!" → 4 tokens ["Hello", ",", " world", "!"]
"authentication" → 3 tokens ["auth", "enti", "cation"]
"src/api/users" → 5 tokens ["src", "/", "api", "/", "users"]
Code is often token-dense. A 300-line TypeScript file with types, generics, and imports can cost 4,000+ tokens. JSON configuration files are expensive. Minified code is very expensive.
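The chars-per-token rule of thumb can be turned into a quick sanity-check estimator. This is only a sketch of the ~4-characters-per-token heuristic above, not a real tokenizer; exact counts require the model's own tokenizer (such as the OpenAI tokenizer tool).

```typescript
// Rough token estimate from raw text, using the ~4 chars/token heuristic.
// Real tokenizers will differ, especially on code and non-English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// "Hello, world!" is 13 characters -> estimate of 4, matching the example above.
console.log(estimateTokens("Hello, world!"));

// A 300-line TypeScript file at an assumed ~55 chars/line:
console.log(estimateTokens("x".repeat(300 * 55))); // 4125 -- in the "4,000+" range cited above
```

For anything where the count matters (billing, hard limits), measure with the actual tokenizer rather than this heuristic.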
Current Context Window Sizes (as of 2026)
| Model | Context Window | Notes |
|---|---|---|
| Claude 3.5 Sonnet / Claude 3.7 | 200,000 tokens | ~150,000 words |
| Claude 3 Opus | 200,000 tokens | |
| GPT-4o | 128,000 tokens | |
| GPT-4 Turbo | 128,000 tokens | |
| Gemini 1.5 Pro | 1,000,000 tokens | 1M token window |
| Gemini 1.5 Flash | 1,000,000 tokens | |
| Gemini 2.0 Flash | 1,000,000 tokens | |
Larger is not always better. A 1M token window does not mean a 1M token conversation is coherent. Models begin to lose precision in the middle of very long contexts — a phenomenon called the "lost in the middle" problem. Information at the start and end of the context is recalled more reliably than information buried in the middle.
Visual: The Glass Analogy
Turn 1 Turn 5 Turn 15 Turn 25
│ │ │ │
▼ ▼ ▼ ▼
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ │ │▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│ ← overflow
│ │ │▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│
│ │ │▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│
│▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│
│▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│ │▓▓▓▓▓│
└─────┘ └─────┘ └─────┘ └─────┘
8% full 35% full 72% full 100% full
Each turn: System prompt + all history + all tool results + new message
Once the glass overflows, one of three things happens:
- The model hard-fails with a context limit error
- The API silently truncates the oldest messages
- The agent starts exhibiting context rot symptoms (covered in Part 3)
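The glass-filling arithmetic can be sketched directly. The constants below are assumptions drawn from the rough per-turn sizes discussed above, not measurements from any specific tool:

```typescript
// Toy model of context fill over a session. All constants are assumed
// averages, not properties of any particular agent.
const WINDOW = 200_000;      // Claude-class context window
const SYSTEM_PROMPT = 2_000; // CLAUDE.md + rules, loaded every turn
const PER_TURN = 3_000;      // message + reply + tool results, averaged

function fillPercent(turns: number): number {
  const used = SYSTEM_PROMPT + turns * PER_TURN;
  return Math.min(100, Math.round((used / WINDOW) * 100)); // capped at the brim
}

for (const t of [5, 15, 25, 45]) {
  console.log(`Turn ${t}: ${fillPercent(t)}% full`);
}
```

The exact percentages depend on your per-turn sizes, but the shape is always the same: monotonic growth toward the cap, because nothing ever leaves the window.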
Part 2: Three Types of Agent Memory (25 min)
Agents operating in Claude Code and similar tools do not have one type of memory — they have three. Understanding which type to use for what purpose is a core skill for anyone directing agents.
Memory Type 1: Working Memory
What it is: The content currently inside the context window.
Scope: Single session only. Cleared when you close the conversation.
What lives here:
- Everything the agent has read this session
- Every message you sent
- Every tool result returned
- The current state of your thinking
Analogy: Your desk while you are working. Everything you need right now is spread out on it. When you leave for the day, the desk is cleared.
Characteristics:
- Instantly accessible — the model can reference anything in it
- Finite — fills up over time
- Volatile — gone when the session ends
- Expensive — every item in it costs tokens every turn
Memory Type 2: Short-term Memory
What it is: Information written to files that persist between sessions, designed to be read at the start of the next session.
Scope: Cross-session, but intentionally temporary. Updated frequently.
What lives here:
- Session summaries ("what we did today")
- Current task status — GSD's PLAN.md, PROGRESS.md
- In-progress notes, decisions made, blockers encountered
- Compact representations of working memory
Analogy: The notepad on your desk that you fill in before you leave. Tomorrow you read it to rebuild context fast.
How to use it: When you start a new session, read the summary file into context. This reconstructs working memory cheaply (a 500-token summary vs. re-reading 50,000 tokens of previous conversation).
Examples in practice:
- PLAN.md — current milestone, active phase, next steps
- PROGRESS.md — what was done, what is blocked
- .claude/session-summary.md — handoff notes
Memory Type 3: Long-term Memory
What it is: Permanent configuration and instruction files that are loaded at the start of every session, every time.
Scope: Permanent. Changed deliberately, not frequently.
What lives here:
- Agent behavior rules (CLAUDE.md)
- Editor preferences (.cursorrules, .cursor/rules/)
- Project standards (SKILL.md, CONVENTIONS.md)
- User preferences and style guides
- Project architecture decisions
Analogy: The employee handbook and your personal notes about how you work. You don't rewrite them every day, but they shape every session.
How to use it: Write decisions here once so you never have to re-explain them. "Always use TypeScript strict mode" in CLAUDE.md means you never spend context tokens re-stating it.
Memory Type Comparison
┌──────────────────┬───────────────────┬──────────────────┬──────────────────┐
│ Property │ Working Memory │ Short-term │ Long-term │
├──────────────────┼───────────────────┼──────────────────┼──────────────────┤
│ Location │ Context window │ Files (PLAN.md) │ Files (CLAUDE.md)│
│ Scope │ Current session │ Cross-session │ Permanent │
│ Update freq │ Every turn │ End of session │ Rarely │
│ Auto-loaded │ Yes (always) │ Manually / agent │ Yes (always) │
│ Size limit │ Context window │ Unlimited │ Keep it concise │
│ Cost │ Tokens every turn │ Load once │ Tokens every turn│
│ Example │ File you just read│ PLAN.md │ CLAUDE.md │
│ Analogy │ Desk right now │ Notepad │ Employee handbook│
└──────────────────┴───────────────────┴──────────────────┴──────────────────┘
ASCII Diagram: Memory Hierarchy
╔══════════════════════════╗
║ LONG-TERM MEMORY ║
║ CLAUDE.md, .cursorrules ║
║ SKILL.md, CONVENTIONS.md ║
║ Permanent. Authoritative.║
╚══════════════╤═══════════╝
│ loaded every session
╔══════════════▼═══════════╗
║ SHORT-TERM MEMORY ║
║ PLAN.md, PROGRESS.md, ║
║ session-summary.md ║
║ Persists across sessions ║
╚══════════════╤═══════════╝
│ loaded at session start
╔═══════════════════▼═══════════════════╗
║ WORKING MEMORY ║
║ Current context window contents ║
║ All messages + tool results + files ║
║ Session only. Gone when you close. ║
╚═══════════════════════════════════════╝
Small & stable Medium & managed Large & volatile
(100s of tokens) (1K–5K tokens) (10K–150K tokens)
Why This Hierarchy Matters
Without this mental model, developers make two common mistakes:
Mistake 1: Trusting working memory across sessions. You build up 30 turns of context in a long session, then close it. The next day you open a fresh session expecting the agent to "remember" — but it doesn't. Working memory is gone.
Mistake 2: Re-explaining the same rules every session. If you spend 500 tokens every session explaining your coding style, move it to CLAUDE.md. That 500-token investment pays off in every future session.
Part 3: Context Rot — Live Demo (25 min)
What Is Context Rot?
Context rot is the gradual degradation of agent output quality as the context window fills up. It is not a bug. It is not a model failure. It is a predictable consequence of finite context combined with the statistical nature of attention in transformer models.
As the context fills:
- Early information becomes harder to attend to (it is "further away" mathematically)
- The model must compress its understanding of a larger history
- Contradictions in the context create ambiguity the model resolves by guessing
- The signal-to-noise ratio of the context decreases
The result: an agent that was doing excellent work at Turn 5 starts making mistakes at Turn 20, and is unreliable by Turn 40.
Live Demo Walkthrough
Follow these steps in an actual Claude Code session to observe context rot directly.
Setup:
# Create a demo project with multiple files
mkdir context-rot-demo && cd context-rot-demo
git init
# Create 15 source files with distinct content
for i in $(seq 1 15); do
  # -e makes echo expand the \n escapes into real newlines
  echo -e "// Module $i\nexport const module${i}Name = 'module-$i';\nexport const module${i}Config = { id: $i, enabled: true };" > "module-$i.ts"
done
Step 1: Start a fresh session
Open Claude Code. Note the context usage: it should show near 0%.
Step 2: Read files and make edits (turns 1–10)
You: Read all 15 module files and tell me what they contain.
[Claude reads 15 files — this adds ~30,000 tokens to the context]
You: Add a 'version' field to module-1, module-2, and module-3 configs.
[Claude edits 3 files — adds more context]
You: Now add a 'deprecated' flag to module-7 through module-10.
[More edits, more context]
You: Add a logging statement to every module's config.
[More edits — context is now at ~60%+]
Step 3: Ask about something from Turn 1
You: What was the original content of module-1.ts before any edits?
Watch what happens. A degraded agent will:
- Describe a mix of the original and the edited version
- Confuse module-1 with module-3 (they look similar)
- State a version number that does not match what was actually edited
- Say it "doesn't have the original" even though it read it in turn 1
Step 4: Observe context usage
In Claude Code, run /cost to see token usage. You will typically see 60,000 – 100,000 tokens consumed.
Step 5: The contradiction test
You: Does module-7 have a 'deprecated' flag set to true or false?
If context rot has set in, the agent may contradict what it told you two turns ago.
Signs of Context Rot
You can diagnose context rot without measuring tokens directly:
SYMPTOM │ LIKELY CAUSE
─────────────────────────────────┼──────────────────────────────────────────
Mixes up two similar file names │ Early file reads pushed out of attention
Forgets a constraint you stated │ System prompt or early instruction buried
Contradicts a previous answer │ Conflicting information in long history
Writes code that ignores rules │ CLAUDE.md tokens compete with tool results
"I don't have information about" │ Tool result truncated or scrolled past
Suddenly changes style/format │ Style rules no longer in effective range
Makes the same fix twice │ Cannot recall it already made that edit
Asks you to re-state context │ It knows it is missing information
Why "Vibe Coding Doesn't Scale" Without Context Management
The phrase "vibe coding" describes the experience of iterating with an AI using natural language, riding the creative flow without a formal structure. This works beautifully for small tasks. It fails predictably for long tasks.
The math is simple:
- Each turn in a complex session adds ~2,000–5,000 tokens
- A session with 40 turns consumes 80,000–200,000 tokens
- At 200K tokens, you are at the hard limit for Claude
- Before that limit, quality degrades significantly starting around 60–80% fill
Without context management, a two-hour "vibe coding" session produces:
- Excellent results for the first 30 minutes
- Acceptable results for the next 30 minutes
- Increasingly unreliable results for the final hour
- Significant rework as you discover the agent broke earlier code
Part 4: Why This Is the #1 Problem (20 min)
Long Sessions Degrade in Quality
This is not an opinion — it is a measurable phenomenon. The quality of agent output is a function of:
- Model capability (fixed for a given model)
- Quality of your prompt and instructions (you control this)
- Context fill level (you control this, but many developers don't realize it)
Quality vs. context fill is roughly inverse: as fill goes up, quality goes down, and the relationship accelerates near the limit.
A developer who runs 20-turn sessions manages this intuitively. A developer who runs 100-turn sessions is fighting a losing battle, even with a great model.
The Compounding Problem
Context rot creates a compounding problem. The agent makes a mistake (say, it forgets a constraint). You correct it. The correction adds more context. The correction and the original mistake both live in the window, creating ambiguity about what is correct. The agent now must navigate a context that contains both the wrong answer and the correction. This makes the next mistake more likely.
Turn 10: Agent makes mistake A (forgot constraint X)
Turn 11: You correct it ("remember, X must be true")
Turn 12: Agent makes mistake B (forgot constraint Y — window too full)
Turn 13: You correct it
Turn 14: Agent re-introduces mistake A (original correction is now far back)
Turn 15: You re-correct A
Turn 16: Agent introduces mistake C
...
This spiral is familiar to anyone who has tried to build a complex feature in a single long session. You feel like you are going in circles. You are — the agent is cycling through a degraded context.
Real-World Case Studies
Case 1: The Disappearing Authentication Rule
A developer was building a REST API with Claude Code. At Turn 2, they stated: "All endpoints must check for a valid JWT token before processing." Twenty turns later, after reading a dozen files and generating several new endpoints, the agent generated an endpoint with no auth check. The developer did not notice. The API shipped to staging with an unauthenticated endpoint.
Root cause: the auth requirement was stated early in the conversation. By Turn 22, it was buried under 80,000 tokens of code, tool results, and back-and-forth. The agent's effective attention on that instruction had dropped significantly.
Case 2: The File Name Swap
A developer asked an agent to refactor two similar services: UserService and AdminService. The agent read both files, then began making edits. By turn 15, when it edited UserService, it was accidentally applying logic intended for AdminService. Both files had similar names and structure. As context filled, the model's disambiguation between them degraded.
Case 3: The Style Reversal
A team had a strict CLAUDE.md rule: "Always use named exports, never default exports." After 45 turns of refactoring (reading 30 files, editing 20), the agent began generating export default in new files. It had not forgotten the rule — the rule was still in the context. But with 180,000 tokens filling the window, the system prompt instructions were competing with 179,000 tokens of other content for attention.
The Math
Session with 50 turns, mixed complexity:
Turn 1–10: ~2,000 tokens/turn = 20,000 tokens
Turn 11–20: ~3,000 tokens/turn = 30,000 tokens (files being read)
Turn 21–30: ~3,500 tokens/turn = 35,000 tokens (edits + bash output)
Turn 31–40: ~3,000 tokens/turn = 30,000 tokens
Turn 41–50: ~2,500 tokens/turn = 25,000 tokens
─────────────────────────────────────────────────
Total: 140,000 tokens
At 200K context: 70% full by Turn 50
Quality begins degrading around Turn 30 (60% fill)
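The breakdown above can be checked mechanically:

```typescript
// Recomputing the 50-turn session breakdown above.
const segments: Array<{ turns: number; perTurn: number }> = [
  { turns: 10, perTurn: 2_000 }, // turns 1-10
  { turns: 10, perTurn: 3_000 }, // turns 11-20: files being read
  { turns: 10, perTurn: 3_500 }, // turns 21-30: edits + bash output
  { turns: 10, perTurn: 3_000 }, // turns 31-40
  { turns: 10, perTurn: 2_500 }, // turns 41-50
];

const total = segments.reduce((sum, s) => sum + s.turns * s.perTurn, 0);
console.log(total);                                           // 140000
console.log(`${(total / 200_000) * 100}% of a 200K window`);  // 70%
```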
This math is unavoidable. Every professional developer using AI agents needs to internalize it.
The Productivity Inversion
The cruelest aspect of context rot is that it inverts your productivity curve. In a normal coding session, you get faster as you go — you understand the codebase better, you have momentum, you remember what you did. In a context-degraded AI session, you get slower:
Without context management:
Productivity
│
100%│ ▓▓▓
│ ▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓▓
50%│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
25%│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
└──────────────────────────────── turns
10 20 30 40 50
You spend more time correcting AI than coding.
Part 5: Measuring Context Usage (15 min)
Measuring in Claude Code
Claude Code gives you three ways to see or estimate context usage:
Method 1: The /cost command
At any point in a session, type /cost in Claude Code. You will see:
Session cost: $0.23
Tokens used: 45,231 (input) + 8,122 (output) = 53,353 total
Context: 45,231 / 200,000 = 22.6% used
Run /cost regularly — every 5–10 turns — to track your usage.
Method 2: Visual context bar
In newer versions of Claude Code, a context usage bar appears in the session header. It fills as you use more tokens. The color shifts from green to yellow to red.
Method 3: Token estimation
If you want to estimate before running /cost:
- Count the files you have read (each ~2,000 tokens average)
- Count your turns (each ~2,000 tokens average for mixed conversation)
- Add system prompt (~2,000 tokens for a typical CLAUDE.md)
- Sum = rough token estimate
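The checklist above can be written as a hypothetical helper. The per-item averages are the assumptions stated in the list, not constants of any real tool:

```typescript
// Hypothetical back-of-envelope estimator for the checklist above.
interface SessionShape {
  filesRead: number;           // files the agent has read this session
  turns: number;               // user/assistant exchanges so far
  systemPromptTokens?: number; // CLAUDE.md etc., ~2,000 assumed by default
}

function roughContextTokens(s: SessionShape): number {
  const FILE_AVG = 2_000; // assumed average file size in tokens
  const TURN_AVG = 2_000; // assumed average turn size in tokens
  return (s.systemPromptTokens ?? 2_000) + s.filesRead * FILE_AVG + s.turns * TURN_AVG;
}

// Example: 10 files read across 15 turns of conversation.
console.log(roughContextTokens({ filesRead: 10, turns: 15 })); // 52000
```

Treat the result as an order-of-magnitude signal only; /cost is the ground truth.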
Measuring in Cursor
Cursor does not expose context usage as directly, but you can:
- Open Settings → AI → see the model's context window size
- Use @file references carefully — each one adds the file's full content to context
- Watch for the "context too long" warning Cursor shows when nearing limits
- Use @codebase sparingly — it performs semantic search rather than loading all files
Rules of Thumb
Context Fill │ Status │ Action
────────────────┼─────────────┼────────────────────────────────────────
0% – 40% │ Green │ Work freely, no management needed
40% – 60% │ Yellow │ Be intentional about what you add
60% – 80% │ Orange │ Actively manage context (summarize, compact)
80% – 95% │ Red │ Start a new session soon
95% – 100% │ Critical │ Context rot is active, quality unreliable
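The table maps naturally to a small lookup, sketched here with the same thresholds:

```typescript
// Traffic-light zones from the rules-of-thumb table above.
type Zone = "green" | "yellow" | "orange" | "red" | "critical";

function contextZone(fillPercent: number): Zone {
  if (fillPercent < 40) return "green";    // work freely
  if (fillPercent < 60) return "yellow";   // be intentional
  if (fillPercent < 80) return "orange";   // actively manage context
  if (fillPercent < 95) return "red";      // start a new session soon
  return "critical";                       // context rot is active
}

console.log(contextZone(22.6)); // "green" -- matches the /cost example earlier
console.log(contextZone(75));   // "orange"
```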
Quality Degradation Curve
Output Quality
│
100%├──────────────────────
│ ╲
85%│ ╲
│ ╲
70%│ ╲─────────
│ ╲
50%│ ╲────
│ ╲
30%│ ╲───
│ ╲
10%│ ╲──────
└──────┬────────────────┬──────────────┬───────────────┬──────
20% 40% 60% 80% 100%
Context Window Fill Level
Zone 1 (0–40%): Full quality. Agent has full context, full attention.
Zone 2 (40–60%): Minor degradation. Occasional imprecision.
Zone 3 (60–80%): Noticeable degradation. Plan context management.
Zone 4 (80%+): Significant degradation. Start new session.
Note: this curve varies by model. Larger context windows (Gemini 1M) do not automatically give you more quality zones — they give you more space before the same degradation pattern begins.
When to Start a New Session
Start a new session when:
- Context fill exceeds 70–80%
- You are about to start a significantly different task
- You notice any symptoms of context rot
- The session has been running for more than 1.5–2 hours of active work
- You have more than 30–40 turns of meaningful back-and-forth
Before closing the session, have the agent write a handoff summary:
Write a 500-word summary of everything we have done this session:
- What files we changed and why
- What decisions we made
- What the current state of the code is
- What the next steps are
Write this to PLAN.md.
Then in the new session:
Read PLAN.md and continue from where we left off.
This is short-term memory management in action.
Part 6: Hands-on Exercise (10 min)
Exercise: Measure Your Context Degradation Point
Goal: Find the point in a real session where your agent's quality begins to drop.
Setup (2 min)
Create a small project:
mkdir context-exercise && cd context-exercise
git init
# Create 10 small files, each with a distinct module
for i in $(seq 1 10); do
cat > "service-$i.ts" << EOF
// Service $i
export interface Service${i}Config {
id: number;
name: string;
enabled: boolean;
}
export class Service${i} {
private config: Service${i}Config;
constructor(config: Service${i}Config) {
this.config = config;
}
getStatus(): string {
return this.config.enabled ? 'active' : 'inactive';
}
}
EOF
done

Task sequence (5 min)
Run these tasks in a single Claude Code session. After each task, run /cost and record the token count.
Task 1: Read service-1.ts and service-2.ts. Describe what they do.
Record: tokens used after task 1 = ____
Task 2: Add a 'version: string' field to Service1Config.
Record: tokens used after task 2 = ____
Task 3: Read service-3.ts through service-7.ts.
Record: tokens used after task 3 = ____
Task 4: What was the original interface for Service1Config?
(This tests recall of task 1)
Task 5: Add a 'timeout: number' field to services 3, 5, and 7.
Record: tokens used after task 5 = ____
Task 6: Read service-8.ts through service-10.ts.
Record: tokens used after task 6 = ____
Task 7: List all the fields you added across all services in this session.
(This tests accumulated recall — compare to your notes)
Task 8: Which services have a 'version' field?
(This tests specific recall from early in the session)
Analysis (3 min)
Compare the agent's answers to tasks 4, 7, and 8 against what actually happened. Identify:
- At what token count did the first error or imprecision appear?
- Which task produced the first sign of context rot?
- What was the context fill percentage at that point?
Fill in your degradation profile:
My degradation profile for this session:
- First sign of imprecision at: ____ tokens (____ % fill)
- Definite context rot at: ____ tokens (____ % fill)
- My personal "start new session" threshold: ____ %
This threshold is yours to own. Different models, different project types, and different task complexity all shift it. Knowing your threshold makes you a dramatically more effective agent director.
Checkpoint
Answer these questions before moving to Lesson 10. If you cannot answer confidently, re-read the relevant section.
Concept checks:
- What are the four categories of content that fill a context window?
- A file is 300 lines of TypeScript. Approximately how many tokens is it?
- What is the difference between short-term and long-term memory in an agent workflow?
- Name three symptoms of context rot.
- At what context fill percentage should you begin actively managing context?
Application checks:
- How do you measure context usage in a Claude Code session right now?
- You are at Turn 35 of a complex refactoring session. The context is at 75%. What do you do?
- You want the agent to always use a specific code style in every session, forever. Which type of memory do you use?
Reflection: Think of a past AI coding session that went badly — where the agent started making mistakes or going in circles. In hindsight, was context rot a factor? What would you do differently now?
Key Takeaways
- The context window is finite. Every model has a hard token limit. Every turn adds tokens. The window fills up.
- Tokens = working memory. Everything the agent knows right now — files read, instructions given, conversation history — lives in the context window and costs tokens on every single turn.
- There are three types of memory. Working memory (context window, session only), short-term memory (files like PLAN.md, cross-session), and long-term memory (files like CLAUDE.md, permanent). Use all three deliberately.
- Context rot is predictable and measurable. Quality begins degrading around 60% fill. By 80%, you are in degraded territory. At 100%, you are at the hard limit. This is not a bug — it is arithmetic.
- Long sessions degrade. Task 1 in a 50-turn session will be excellent. Task 40 may be unreliable. Without context management, you spend more time correcting the AI than writing code.
- Measure it. Use /cost in Claude Code. Know your token count. Start worrying at 60%, act at 80%.
- Start fresh strategically. A new session with a good handoff summary outperforms a degraded long session every time. Short-term memory files (PLAN.md) are how you carry knowledge across session boundaries.
- This is the #1 problem in vibe coding at scale. Every other technique in this course — prompting, tool use, agent workflows — only works well when you are managing context. Master this and everything else improves.
Further Reading and References
- Anthropic Claude documentation: context window and token limits
- OpenAI tokenizer tool (platform.openai.com/tokenizer) — paste any text to count tokens
- "Lost in the Middle: How Language Models Use Long Contexts" (Liu et al., 2023) — academic paper on attention degradation in long contexts
- GSD workflow documentation — PLAN.md and context handoff patterns
- Cursor documentation: context management with @file and @codebase
Next Lesson: Lesson 10 — Context Management Strategies: Compact, Summarize, and Reset
In Lesson 10, we cover the practical playbook for managing context before rot sets in: when to compact, how to summarize, when to start a new session, and how to use PLAN.md as your session handoff mechanism.
Module 5: Memory and Context | AI-Powered Development — Developer Track