Lesson 4: AI Is a Tool, Not a Decision-Maker
Course: AI-Powered Development (PM Track) | Duration: 2 hours | Level: Beginner
Learning Objectives
By the end of this lesson, participants will be able to:
- Identify the five most dangerous anti-patterns when teams delegate too much to AI
- Explain what "context rot" is and why it reliably derails AI-assisted projects
- Apply the PM's DO/DON'T framework to their own team's AI workflow
- Define concrete quality gates for AI-generated work
- Write a short "3 Rules for AI Use" policy for their team
Prerequisites
- Lesson 1: Understanding ML, AI, and LLMs
- Lesson 2: Document Intelligence
- Lesson 3: Normal vs. Pro AI Usage
- No technical background required. This lesson is written entirely in business language.
Lesson Outline
Part 1: The Trap — Delegating Everything to AI (20 min)
The Core Problem
There is a seductive lie at the center of AI hype: "Just tell the AI what you want, and it will build it."
This is not how AI works. AI is a powerful tool. It is not a colleague, not a strategist, not a decision-maker. When product managers treat AI as a decision-maker, predictable and costly failures follow.
This section maps out the five most common failure modes, the anti-patterns that cost teams time, money, and credibility.
The Anti-Patterns Table
| Anti-Pattern | What It Looks Like | Real Business Cost |
|---|---|---|
| Blind Acceptance | Dev accepts AI output without review and merges it directly | Bugs in production, security vulnerabilities, customer-facing failures |
| Context Rot | AI session goes on for hours or days; output becomes inconsistent and contradictory | Hours wasted, team rebuilds features from scratch, sprint collapses |
| Feature Hallucination | AI "invents" functionality that was never specified | Wasted dev time, bloated codebase, technical debt accumulates |
| Specification Drift | Requirements evolve through conversation with AI instead of a written spec | Final product does not match what the stakeholder wanted |
| The 80/20 Trap | AI rapidly completes 80% of a feature, then progress halts | False sense of progress, deadline missed, morale crash at the finish line |
Anti-Pattern Deep Dives
1. Blind Acceptance — "It looked right, so we shipped it"
A fintech startup hired a junior developer and gave them GitHub Copilot to build a user authentication flow. The developer accepted AI suggestions without reading them carefully. The AI-generated code contained a well-known vulnerability: it stored user passwords in plain text in the application log.
The error was not caught in code review because there was no code review policy for AI-generated code. It reached production. Three months later, an audit flagged the vulnerability. The cost: two weeks of emergency remediation, a security audit, and a delayed product launch.
The PM's mistake was assuming that because the code "worked" in testing, it was safe. AI-generated code can pass functional tests and still contain logic errors, security flaws, and compliance violations that only a human expert will catch.
2. Context Rot — "The AI forgot what we were building"
A SaaS product team used a single AI chat session to design and implement their onboarding flow. On Day 1, the AI produced excellent, coherent work. By Day 3, the AI was generating code that contradicted its Day 1 decisions: variable names changed, the data model shifted, and error handling was inconsistent.
The team spent two weeks debugging issues that turned out to be internal contradictions introduced by the AI itself. The senior developer's verdict: "We would have been faster building it the old way." Context rot is covered in detail in Part 2.
3. Feature Hallucination — "It built things we never asked for"
A product team asked their AI coding assistant to "build the reporting module." The AI produced a reporting module with eight sub-features, including a custom scheduling engine and a PDF export pipeline. The team had only asked for a simple table with filters.
Three weeks of developer time went into features that users never requested and the product roadmap never included. When the PM reviewed the sprint output, 60% of the work had to be discarded. The root cause: the PM gave an open-ended prompt with no constraints, and the AI filled the ambiguity with invented scope.
4. Specification Drift — "Requirements grew through the conversation"
A retail company used an AI assistant to refine their checkout flow requirements. Over two weeks of back-and-forth, the requirements document evolved through dialogue with the AI. The original spec called for a three-step checkout. The final spec — never reviewed by a human business analyst — called for a six-step checkout with loyalty points integration, gift wrapping options, and a recommendation engine.
The stakeholder had approved the original three-step design. Nobody had approved the drift. Development was 70% complete before anyone noticed. The fix cost six weeks of rework.
5. The 80/20 Trap — "We were so close, and then everything stopped"
A logistics company celebrated after two days of AI-assisted development. The developer reported that the core feature was "basically done — about 80% complete." The PM communicated this optimistically to the executive sponsor.
What happened next is a pattern so common it has a name: the last 20% took six weeks. The remaining work was everything AI is bad at: edge cases, error handling, integration with legacy systems, and performance under real load. AI had produced a prototype that looked impressive in a demo but could not handle real-world conditions. The executive sponsor lost confidence in both the team and the project.
Key Principle
AI accelerates the easy parts and hides the hard parts. A PM's job is to ensure the hard parts are not hidden.
Part 2: Context Rot Explained for PMs (25 min)
What Is Context Rot?
Every AI session has a context window — the amount of text the AI can "hold in mind" at once. Think of it as working memory. When you start a fresh conversation with an AI assistant, its working memory is empty and focused. As the conversation grows — as you add requirements, ask follow-up questions, paste in code, and iterate — the context window fills up.
When the context window fills, the AI does not crash. It does not warn you. It simply begins to forget earlier details, make assumptions to fill gaps, and produce output that is subtly inconsistent with what it produced earlier.
This is context rot: the gradual degradation of AI coherence over a long session.
The "72-Hour Developer" Analogy
Imagine hiring a brilliant developer who has one unusual condition: after 72 hours of continuous work, they begin to forget the early decisions they made. They remember the general goal, but forget the specific constraints. They start rewriting things they already built. They make architectural decisions that contradict ones from the first day.
You would not keep that developer working on the same task for two weeks without a break and a reset. But teams do exactly this with AI: they run single sessions for days, adding more and more context, and wonder why the output becomes inconsistent.
The fix for the 72-hour developer is the same as the fix for context rot: structured breaks, written handoffs, and fresh starts with a clear brief.
How Context Rot Affects Project Timelines
Teams experiencing context rot follow a recognizable timeline:
```
PROJECT TIMELINE WITH CONTEXT ROT

Day 1-2:   [####################]  "80% done!" (Vibe coding phase)
           AI is coherent, productive, impressive output
           Team is excited, PM reports strong progress

Day 3-5:   [####                ]  Slowdown begins
           Small inconsistencies appear
           Developers notice but don't flag it yet

Day 6-10:  [##                  ]  Context rot aftermath
           AI output contradicts Day 1 decisions
           Developers spend more time debugging than building
           PM wonders why velocity dropped

Day 11-20: [#                   ]  Rebuild phase
           Team abandons AI-generated code and rebuilds manually
           Morale is low, deadline is missed

Net result: 3 weeks spent on work that could have taken 1 week
            with proper AI session management
```
AI Session Quality Over Time

```
Code quality / coherence (single AI session, no resets)

High |   **
     |  *  **
     | *     **
Med  |         **
     |           ***
Low  |              *********
     +----------------------------------> Time
       Hour 1      Hour 8      Hour 24+
```

Key: Quality peaks early and degrades steadily unless the session is reset with a fresh, structured brief.
Signs Your Dev Team Is Experiencing Context Rot
As a PM, watch for these signals in sprint reviews and daily standups:
In sprint reviews:
- "We had to redo the [X] module because it wasn't consistent with [Y]"
- Velocity slows sharply after a fast start
- Developers struggle to explain why a feature is built the way it is
- The demo works but the developer looks nervous about edge cases
- Technical debt items multiply suddenly mid-sprint
In code reviews:
- Variable naming is inconsistent across files created in the same sprint
- Error handling is present in some places and entirely absent in others
- Functions do similar things but are implemented differently
- Comments reference requirements that do not match the current spec
- The architecture shifts mid-feature without explanation
In conversation:
- "The AI kept changing its mind"
- "We had to start a new chat because it stopped making sense"
- "The AI built it, but nobody is quite sure how it works"
- "It was easier to just rewrite it than figure out what the AI did"
What PMs Can Do About Context Rot
Context rot is a manageable engineering workflow problem, not an inherent flaw in AI. The solutions are organizational:
- Require session boundaries. Each AI session should address one clearly bounded task. No multi-day sessions.
- Require written briefs before each session. The developer should write down what they are asking AI to do before they start.
- Require output reviews at session end. What was produced? Does it match the brief? Is it consistent with prior work?
- Limit session length. A useful rule of thumb: if an AI session is longer than 2-3 hours, it should be split.
- Require context summaries. Before starting a new session on the same feature, the developer should provide AI with a written summary of decisions made so far.
Part 3: The PM's Role — Strategic AI Direction (25 min)
The Core Principle
A product manager does not need to understand AI technically. But they must understand how to direct AI-assisted teams. The PM's job is not to use AI directly — it is to set the conditions under which their team uses AI well.
This means defining scope, requiring checkpoints, managing costs, and enforcing quality. The following DO/DON'T framework gives PMs a concrete starting point.
The PM's AI Direction Framework
| Action | DO or DON'T | Why (detailed below) |
|---|---|---|
| Define scope before AI touches code | DO | Prevents specification drift and hallucinated scope |
| Require human review checkpoints | DO | Catches errors before they compound |
| Set token/cost budgets per task | DO | Makes runaway scope and context rot visible early |
| Break work into small, verifiable chunks | DO | Small tasks are easier to verify, review, and redo |
| Track AI-generated vs. human-reviewed code | DO | Makes unreviewed AI output a visible, managed liability |
| Use spec-driven frameworks | DO | Structured briefs remove ambiguity before the session starts |
| Let AI "explore" without clear requirements | DON'T | Requirement discovery is a human job |
| Auto-merge AI-generated PRs | DON'T | Tests do not catch security, logic, or compliance flaws |
| Let agents run without resource limits | DON'T | Unbounded agents expand scope and burn budget |
| Give "build the whole feature" prompts | DON'T | Maximizes context rot and hallucination risk |
The DOs — Explained
DO: Define scope BEFORE AI touches code
What it means: Before a developer opens an AI assistant to work on a feature, the feature requirements must already be written down. The AI session scope should be defined by a human, not discovered through conversation with AI.
Example in practice: Instead of having a developer chat with AI to figure out what the checkout flow should do, the PM writes a one-page requirements brief first. The developer then uses AI to implement what the brief specifies. If something is unclear, the developer asks the PM, not the AI.
Why it matters: When AI is used to discover requirements, you get specification drift. When AI is given clear requirements to implement, you get a tool doing what it is good at: structured execution.
DO: Require human review checkpoints
What it means: AI-generated work must pass through a human expert at defined intervals. These checkpoints should be written into the sprint process, not left to developer discretion.
Example in practice: Your team's definition of "done" for any AI-assisted task requires: (1) developer review, (2) a second developer code review, and (3) a PM acceptance check against the requirements brief. No AI-generated feature ships without all three.
Why it matters: AI errors compound. A small inconsistency in Hour 2 of a session becomes a structural problem by Hour 8. Regular checkpoints catch errors before they multiply.
DO: Set token/cost budgets per task
What it means: AI API usage has a direct financial cost. Each task assigned to AI should have an estimated token budget. If a task is consuming significantly more tokens than estimated, that is a signal that either the task scope is too large or context rot is occurring.
Example in practice: Your team uses an AI API that costs per token. The PM works with the tech lead to set a rough token budget for each user story (e.g., "the search feature should cost no more than X tokens to implement"). Actual costs are tracked and compared to estimates in sprint reviews.
Why it matters: Unlimited AI use is not just expensive — it is a symptom of poor task definition. Tasks with clear scope are completed in predictable token budgets. Runaway costs signal ambiguity or drift.
DO: Break work into small, verifiable chunks
What it means: AI performs best on small, well-defined tasks. User stories should be broken down to tasks that can be completed in a single, bounded AI session and verified against a specific acceptance criterion.
Example in practice: Instead of the user story "Build the user profile module," break it into: "Build the profile display page," "Build the profile edit form," "Build the profile photo upload," and "Build the profile delete confirmation." Each is a separate task with its own requirements brief and review checkpoint.
Why it matters: Large tasks increase context rot risk, make review harder, and produce outputs that are difficult to test. Small tasks are easier to verify, easier to review, and easier to redo if something goes wrong.
DO: Track AI-generated vs. human-reviewed code
What it means: Your team should maintain a record of which parts of the codebase were produced by AI and which have been reviewed and approved by a human expert. This is your AI audit trail.
Example in practice: Your team uses a tagging convention in pull requests: [AI-generated] for code produced by AI that has not yet been reviewed, and [AI-reviewed] for code that a human has verified. The PM can see at a glance how much of any feature is still in unreviewed AI-generated status.
Why it matters: Without tracking, you have no idea how much of your codebase is unreviewed AI output. This is a hidden liability. Tracking makes the liability visible so it can be managed.
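As a minimal sketch of how such a tagging convention could be checked, the snippet below scans PR titles for the two tags described above. The tag strings and PR titles are illustrative assumptions for the example, not output from any real tool or repository.

```python
# Hypothetical check for the [AI-generated] / [AI-reviewed] tagging
# convention described above. Titles and tags are invented examples.

AI_GENERATED = "[AI-generated]"
AI_REVIEWED = "[AI-reviewed]"

def unreviewed_ai_prs(pr_titles: list[str]) -> list[str]:
    """Return PR titles still tagged as unreviewed AI output."""
    return [t for t in pr_titles if AI_GENERATED in t and AI_REVIEWED not in t]

titles = [
    "[AI-generated] Add profile photo upload",
    "[AI-generated] [AI-reviewed] Add profile edit form",
    "Fix pagination off-by-one",
]
print(unreviewed_ai_prs(titles))  # -> ['[AI-generated] Add profile photo upload']
```

In practice a team might wire a check like this into CI or a weekly report so the PM sees the count of unreviewed AI-generated PRs at a glance.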
DO: Use spec-driven frameworks
What it means: Require developers to provide AI with structured, standardized input — a spec — rather than conversational prompts. Spec-driven frameworks (sometimes called "structured prompting" or "prompt templates") reduce hallucination, reduce context rot, and produce more consistent output.
Example in practice: Your team has a standard "AI task brief" template. Before using AI on any task, the developer fills in: task goal, inputs available, outputs expected, constraints (e.g., must use existing data model), and out-of-scope items. The AI is given this brief, not a freeform request. This is covered in depth in Lesson 6.
Why it matters: Conversational prompts invite the AI to interpret ambiguity. Structured briefs eliminate ambiguity before the session starts.
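A structured brief can even be represented as a record with required fields, which makes "is the brief complete?" a mechanical check rather than a judgment call. The sketch below is one possible shape, with field names taken from the template fields listed in the example above; the sample values are invented.

```python
# Minimal sketch of an "AI task brief" as a structured record.
# Field names mirror the template described above; values are examples.

from dataclasses import dataclass

@dataclass
class AITaskBrief:
    goal: str
    inputs: list[str]
    outputs: list[str]
    constraints: list[str]
    out_of_scope: list[str]

    def is_complete(self) -> bool:
        """A session should not start until every field is filled in."""
        return bool(self.goal and self.inputs and self.outputs
                    and self.constraints and self.out_of_scope)

brief = AITaskBrief(
    goal="Add a filterable results table to the reporting page",
    inputs=["existing reports API", "current table component"],
    outputs=["table with column filters"],
    constraints=["must use the existing data model"],
    out_of_scope=["scheduling engine", "PDF export"],
)
assert brief.is_complete()
```

The point is not the code itself but the discipline it encodes: an empty `out_of_scope` list is exactly the gap the AI will fill with invented features.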
The DON'Ts — Explained
DON'T: Let AI "explore" without clear requirements
What it looks like: "Hey AI, here's our app — what should the dashboard show? Just explore and build something." The developer uses AI to discover what to build, not just how to build it.
Why it is dangerous: AI will produce something. It might look impressive. But it will not be what your stakeholders actually need, because AI does not know your users, your business constraints, or your product strategy. Requirement discovery is a human job. When AI does it, you get feature hallucination and specification drift.
DON'T: Auto-merge AI-generated PRs
What it looks like: The team sets up automation so that if an AI agent's PR passes automated tests, it is automatically merged to the main branch.
Why it is dangerous: Automated tests catch some bugs. They do not catch security vulnerabilities, logic errors that pass tests but fail in production, compliance violations, or architectural decisions that will create technical debt. Every AI-generated PR needs a human reviewer. Always.
DON'T: Let agents run without resource limits
What it looks like: An AI coding agent is given a task and allowed to run until it completes, with no time limit, no cost limit, and no check-in requirement.
Why it is dangerous: AI agents working without limits will expand scope, consume budget, and produce work that drifts further and further from the original requirement. In documented cases, unmonitored agents have generated thousands of lines of code, incurred significant API costs, and produced output that was entirely discarded. Always set a maximum runtime, cost ceiling, and mandatory human check-in point.
DON'T: Give "build the whole feature" prompts
What it looks like: "Build the entire onboarding flow for our SaaS app." One prompt, one large output, one big review at the end.
Why it is dangerous: This maximizes context rot risk, produces output that is impossible to review thoroughly, and creates a situation where errors at the beginning of the session have propagated through every subsequent decision. Large prompts also invite feature hallucination: the AI will interpret "the entire onboarding flow" using its own assumptions, not yours.
Part 4: Building AI Quality Gates (20 min)
What Is an AI Quality Gate?
A quality gate is a checkpoint that work must pass before it moves to the next stage. Quality gates for AI-generated work serve the same purpose as quality gates for any other work: they catch problems before they become expensive.
The difference is that AI-generated work has specific failure modes — the anti-patterns from Part 1 — that traditional quality processes were not designed to catch. AI quality gates must be designed with those failure modes in mind.
Gate 1: Code Review Requirements for AI-Generated Code
Minimum standard: Every line of AI-generated code must be read and understood by a human developer before merging.
This sounds obvious. In practice, teams under deadline pressure frequently skim AI-generated code or rely solely on automated test results. Establish this as a non-negotiable policy.
What reviewers must check:
- Does the code do what the requirements brief specified? (Not what the developer hoped — what was written down.)
- Are there security vulnerabilities? (Input validation, authentication, data exposure, dependency risks)
- Does the code introduce technical debt that will slow the team down later?
- Is the code consistent with the existing architecture and conventions?
- Are there edge cases the AI did not handle?
Practical tip for PMs: Include AI code review in your definition of "done." A task is not done until the review is complete and documented. Track this in your project management tool.
Gate 2: Testing Requirements — What Must Be Tested
AI-generated code must meet the same testing standards as human-written code — and in practice, should often be tested more thoroughly because AI errors can be subtle.
Required for every AI-generated feature:
| Test Type | What It Checks | Why AI Specifically Needs It |
|---|---|---|
| Unit tests | Individual functions work correctly | AI often writes functions that work in the happy path but fail on edge cases |
| Integration tests | Components work together | AI-generated components can be internally consistent but fail when connected to real systems |
| Security tests | No known vulnerability patterns | AI learns from code on the internet, including insecure code |
| Acceptance tests | Feature meets business requirements | Verifies no feature hallucination or specification drift occurred |
| Regression tests | New code doesn't break existing features | AI changes can have unexpected downstream effects |
PM checkpoint question: "Has this feature been tested against the original requirements brief?" If the answer is "the developer tested what the AI built," that is a red flag.
Gate 3: Spec Compliance Checking
Before any AI-generated feature is accepted, it must be checked against the original requirements brief. This is a structural check, not a vibe check.
The spec compliance checklist:
- Does the feature do everything in the requirements brief?
- Does the feature do only what is in the requirements brief? (Check for hallucinated features)
- Do the UX flows match what was specified?
- Are all edge cases from the spec handled?
- Are there any features present that were explicitly marked as out of scope?
- Does the feature match what was demonstrated in stakeholder sign-off?
If any item is unchecked: The feature goes back to development. It does not ship.
Gate 4: Token/Cost Monitoring
AI API usage should be tracked at the task level. Build this into your project management process.
What to track:
- Estimated token budget per task (set before the sprint)
- Actual token consumption per task (reported at sprint review)
- Cost per completed feature (cumulative)
- Cost overruns (actual > 150% of estimate) flagged for review
Simple dashboard view:

```
SPRINT 14 — AI COST TRACKING

Feature              Estimated Tokens   Actual Tokens   Status
----------------------------------------------------------------
User Auth Module           50,000            48,000     OK
Search Feature             30,000            95,000     FLAG: 3x budget
Notification Flow          40,000            42,000     OK
Reporting Module           20,000           180,000     FLAG: 9x budget
----------------------------------------------------------------
Sprint Total              140,000           365,000     OVER: Review required
```
A task consuming three or more times its estimated token budget is a signal: the scope was wrong, context rot occurred, or the developer is using AI as a search engine rather than a focused tool.
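The flag rule is simple enough to automate. The sketch below applies it to the sample numbers from the dashboard; the feature names and figures are illustrative, and the 3x threshold is the rule of thumb stated above, not a universal constant.

```python
# Hypothetical sketch of the 3x-budget flag rule described above.
# Feature names and token counts are illustrative sample data.

def budget_status(estimated: int, actual: int, flag_ratio: float = 3.0) -> str:
    """Return 'OK', or a FLAG string when actual tokens hit the ratio."""
    ratio = actual / estimated
    if ratio >= flag_ratio:
        return f"FLAG: {ratio:.0f}x budget"
    return "OK"

sprint = {
    "User Auth Module": (50_000, 48_000),
    "Search Feature": (30_000, 95_000),
    "Notification Flow": (40_000, 42_000),
    "Reporting Module": (20_000, 180_000),
}

for feature, (estimated, actual) in sprint.items():
    print(f"{feature:<20} {budget_status(estimated, actual)}")
```

A spreadsheet column works just as well; what matters is that the comparison happens every sprint, not only when costs become alarming.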
Gate 5: AI Audit Trail
Your team should maintain a record of AI involvement in every part of the codebase. This is not about blame — it is about risk management.
What the audit trail records:
- Which features were built with AI assistance
- Which sessions were used (dates, tools, models)
- Which outputs were reviewed and by whom
- Which outputs were rejected and why
- Total AI cost per feature and per sprint
Why this matters:
- When a bug is found in production, the audit trail tells you whether it was in AI-generated code or human-written code — helping your team learn which processes need improvement
- When a security audit is required, you can show reviewers exactly what AI produced and what humans verified
- When stakeholders ask "how much of this was built by AI?", you have a factual answer
- It builds organizational knowledge about where AI adds value and where it creates risk
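One lightweight way to keep such a trail is a simple structured record per feature, covering the items listed above. The sketch below is a hypothetical shape with invented values, not a prescribed schema or real tool output.

```python
# Hypothetical audit-trail record matching the items the text says
# the trail should capture. All field values are invented examples.

audit_record = {
    "feature": "Search Feature",
    "sessions": [
        {"date": "2026-04-02", "tool": "coding assistant", "model": "example-model"},
    ],
    "reviewed_by": "second developer",
    "rejected_outputs": ["first draft: inconsistent data model"],
    "total_cost_tokens": 95_000,
}

def unverified(records: list[dict]) -> list[str]:
    """Features with AI involvement but no recorded human reviewer."""
    return [r["feature"] for r in records if not r.get("reviewed_by")]

print(unverified([audit_record]))  # -> []
```

Whether the trail lives in a spreadsheet, a wiki page, or a script like this is a team choice; the requirement is that it exists and is kept current.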
Part 5: Case Study — Vibe Coding Gone Wrong (15 min)
Background
A 12-person product team at a B2B software company was building a client reporting portal. The PM had seen demos of AI-assisted development and was excited to accelerate the timeline. The tech lead proposed using an AI coding agent to build the first version.
The decision: let the AI agent work with minimal constraints. No spec-driven briefs, no token budget, no session boundaries. "Let's see what it can do."
The Timeline
Day 1 — Excitement
The AI agent produced a working prototype of the reporting portal in eight hours. It included a data table, filtering, sorting, and a chart module. The tech lead showed the PM a demo. The PM sent a message to the executive sponsor: "We're ahead of schedule. Core feature is 80% done."
What was not visible: the AI had made significant architectural decisions on its own — decisions that were not in the requirements brief and not reviewed by a human.
Day 5 — First Warning Signs
The developer continued working with the AI to add features. The AI session was now five days old. Subtle inconsistencies had begun to appear: the filtering logic used different data formats in different parts of the module. Some error messages referenced variables that did not exist.
The developer flagged this in the daily standup: "There are some weird inconsistencies I'm working through." The PM noted it but did not escalate. Timeline pressure was high.
Day 10 — The Wall
The developer reported that progress had stopped. The AI was generating code that contradicted its earlier work. Fixing one inconsistency introduced two more. The tech lead reviewed the codebase and identified the core problem: the AI had used three different data models in the same module, and the entire architecture was internally inconsistent.
The tech lead's assessment: "We can't patch this. The foundation is wrong."
The PM sent a revised timeline to the executive sponsor. The original estimate of "2 weeks total" was now "at least 4 more weeks."
Day 20 — The Rebuild
The team made the decision to discard the AI-generated code and rebuild the module from scratch — this time with a human-written spec and proper review checkpoints.
The rebuild took two weeks and produced a codebase that was stable, consistent, and maintainable.
Final count:
- Total elapsed time: 5 weeks
- Time lost to context rot and rework: 3 weeks
- Original human estimate without AI: 3 weeks
- Net result: AI made the project take longer
What Went Wrong — Root Causes
| Stage | What Went Wrong | Warning Sign That Was Missed |
|---|---|---|
| Day 1 | AI made architectural decisions without a spec | "80% done" after Day 1 should have been a flag, not a celebration |
| Day 1-5 | Session ran for 5 days without reset or review | No session boundary policy |
| Day 5 | Inconsistencies flagged but not escalated | PM treated it as a normal development hiccup |
| Day 10 | No audit trail made diagnosis slow | Could not identify when the architectural decision was made |
| Day 10 | No spec to check against | Could not define "correct" because requirements were never written down |
What They Would Do Differently
The tech lead documented the lessons:
- "We would write the requirements brief before touching AI. One page, specific, with explicit out-of-scope items."
- "We would set a session boundary of half a day maximum. New brief, new session, every time."
- "We would require a code review checkpoint after every session, not at the end of the sprint."
- "We would track token costs. The token spike on Day 3 would have told us something was wrong."
- "The PM would own the spec. If it's not written down and approved, the developer doesn't start the session."
The Broader Pattern
This case is not unusual. The pattern — fast early progress followed by a slow, painful rework phase — appears across AI-assisted projects that lack governance. The cause is always some combination of the five anti-patterns from Part 1.
The solution is always the same: AI is a tool. Tools require operators. Operators require processes. Processes require owners. In product development, the PM is the process owner.
Part 6: Hands-on — Write Your Team's AI Policy (15 min)
Why a Policy Matters
A policy is not bureaucracy. It is a decision made once so it does not have to be made again under pressure. When a developer is facing a deadline and asks "is it okay to skip the code review on this AI-generated PR?" the answer should already exist. It should be in the policy.
The best AI policies are short, specific, and written by the people who will follow them.
The "3 Rules for AI Use" Template
Use this template to draft your team's AI use policy. Keep each rule to 2-3 sentences: one sentence for the rule, one for the reason, and one for the consequence of breaking it.
```
============================================================
[TEAM NAME] AI USE POLICY
Version: ____    Date: ____    Owner: ____
============================================================

RULE 1: [Scope Rule]
Before any developer uses AI on a task, _________________.
This rule exists because _________________________________.
If this rule is not followed, ____________________________.

RULE 2: [Review Rule]
Before any AI-generated work is merged or shipped, _______.
This rule exists because _________________________________.
If this rule is not followed, ____________________________.

RULE 3: [Limit Rule]
AI sessions and AI agents must ___________________________.
This rule exists because _________________________________.
If this rule is not followed, ____________________________.

============================================================
APPROVED BY: ________________    DATE: ____________________
============================================================
```
Example — Completed Policy
Here is an example of a completed policy from a hypothetical product team:
```
============================================================
ACME PRODUCT TEAM — AI USE POLICY
Version: 1.0    Date: April 2026    Owner: PM Lead
============================================================

RULE 1: Spec First
Before any developer uses AI on a task, a written
requirements brief must exist and have PM sign-off.
This rule exists because AI expands scope when requirements
are ambiguous, and undefined scope creates context rot.
If this rule is not followed, the output is not accepted
into the sprint and the developer restarts with a brief.

RULE 2: Human Review Before Merge
Before any AI-generated code is merged to main, it must
pass a review by a second developer who certifies they
have read and understood every line.
This rule exists because AI produces plausible-looking
code that can contain subtle security and logic errors.
If this rule is not followed, the PR is reverted and
the developer and reviewer complete a process review.

RULE 3: Session Limits
AI sessions may not exceed 3 hours without a reset.
AI agents must have a cost ceiling set before they run.
This rule exists because long sessions cause context rot
and unlimited agent runs incur unpredictable costs.
If this rule is not followed, the tech lead is notified
and the output is subject to mandatory full review.

============================================================
APPROVED BY: [PM Lead signature]    DATE: April 1, 2026
============================================================
```
Workshop Instructions
- Individual work (8 minutes): Using the blank template above, write your team's 3 Rules for AI Use. Be specific to your actual team and product context. Avoid generic rules — write rules you would actually enforce.
- Pair share (4 minutes): Share your policy with one other participant. Give each other one piece of feedback: one rule that is strong, and one rule that could be more specific.
- Group discussion (3 minutes): Two or three volunteers share their policies. Discuss: What is the hardest rule to enforce? What makes a rule actually work in practice?
Key Takeaways
- AI is a tool, not a decision-maker. It executes well when given clear instructions. It fails unpredictably when given decision-making authority.
- The five anti-patterns — blind acceptance, context rot, feature hallucination, specification drift, and the 80/20 trap — are predictable and preventable.
- Context rot is the most insidious failure mode: it is invisible until significant damage has occurred, and it makes AI-assisted development slower than traditional development.
- The PM's job is to define scope before AI touches anything, require review checkpoints, manage costs, and track what AI has produced.
- Quality gates — code review, testing, spec compliance, cost monitoring, and audit trail — make AI risk visible and manageable.
- A short, specific AI use policy is the most practical first step a PM can take to improve their team's AI discipline.
Common Mistakes to Avoid
- "I trust my developer to use AI responsibly." Trust is not a process. Processes are what make trust scalable across a team. Give your developer a framework, not just trust.
- "If the tests pass, the code is fine." Automated tests catch functional errors. They do not catch security vulnerabilities, architectural inconsistencies, or compliance violations. Tests are necessary but not sufficient.
- "We're already behind, so we don't have time for review checkpoints." This is the reasoning that turns a two-week delay into a six-week delay. Review checkpoints are how you avoid the rebuild phase, not a luxury for when timelines are comfortable.
- "AI wrote it, so the developer doesn't need to understand it." Every line of code in your codebase is your team's responsibility. If a developer cannot explain what AI-generated code does, it should not be in your codebase.
- "We'll write the policy later." The best time to write an AI use policy is before your team starts using AI heavily. The second-best time is now.
Homework / Self-Study
- Finalize your policy: Revise the 3 Rules for AI Use policy you drafted in the workshop. Have it reviewed by your tech lead or a developer on your team. Share it at your next team meeting.
- Conduct a retrospective: If your team has already been using AI-assisted development, run a short retrospective. Ask: "Have we seen any of the five anti-patterns?" Even identifying one is valuable.
- Read: Search for post-mortems or case studies on "AI-generated code in production failures" or "context window limitations LLM." Note the patterns that match what we covered in this lesson.
- Prepare for Lesson 5: The next lesson covers context engineering — how to structure your prompts and sessions to get consistently good results from AI. The groundwork you laid in this lesson (spec-driven briefs, session boundaries) is the foundation for Lesson 5.
Checkpoint
Deliverable: A completed "3 Rules for AI Use" policy for your team.
Format: Use the template from Part 6. The policy must include:
- Three specific, enforceable rules
- A reason for each rule
- A stated consequence for each rule
- Your name as owner and today's date
Sharing: Post your policy in the course discussion channel before the next session. Participants who share their policy will receive feedback from the instructor and peer reviewers.
Evaluation criteria:
- Are the rules specific to your team context, or are they generic?
- Would a developer on your team know exactly what to do (and not do) based on this policy?
- Would a PM on your team know exactly what to check in a sprint review based on this policy?
- Is each rule actually enforceable, or is it aspirational?
Next Lesson Preview
In Lesson 5: Context Engineering for PMs, we will:
- Learn what "context" means in AI and why it is the most important input a PM controls
- Build a standard AI task brief template for your team
- Practice writing briefs that eliminate ambiguity before the AI session starts
- Understand how context engineering connects to the quality gates from this lesson
Bring your completed AI use policy — it will be the foundation for the context engineering framework you build in Lesson 5.
Back to Course Overview | Next Lesson: Context Engineering →