
Lesson 1: ML, AI, and LLM — Clearing the Fog

Course: AI-Powered Development (PM Track) | Duration: 2 hours | Level: Beginner

Learning Objectives

By the end of this lesson, you will be able to:

  • Explain the relationship between AI, Machine Learning, Deep Learning, and LLMs without getting lost in jargon
  • Describe the key differences between traditional ML and LLMs — and why that distinction changes your project planning
  • Understand how LLMs work at a conceptual level (tokens, temperature, context window)
  • Identify the major AI providers, pricing models, and the right questions to ask vendors
  • Correctly classify at least 8 out of 10 real business problems as "traditional ML", "LLM", or "not AI at all"

Prerequisites

  • No AI or technical background required
  • Basic familiarity with software projects (you have shipped or managed software before)
  • Curiosity and willingness to question assumptions about what "AI" means

Part 1: The Hierarchy — Not Everything Is the Same (25 min)

Why This Confusion Exists

When your CEO says "we need to add AI", when a vendor pitches "our ML-powered platform", and when an engineer mentions "the LLM API" — they are not talking about the same things. These terms get used interchangeably in marketing, headlines, and hallway conversations. That imprecision costs PMs real money: wrong estimates, wrong vendor choices, wrong build-vs-buy decisions.

Here is the truth: these terms have a specific relationship to each other. They are nested. Each one is a more specific version of the one above it.

The Nested Structure

+----------------------------------------------------------+
|                  ARTIFICIAL INTELLIGENCE                 |
|   Any computer system doing "smart" things               |
|                                                          |
|   +--------------------------------------------------+   |
|   |             MACHINE LEARNING                     |   |
|   |   Computers learning from data instead of        |   |
|   |   being explicitly programmed                    |   |
|   |                                                  |   |
|   |   +------------------------------------------+   |   |
|   |   |           DEEP LEARNING                  |   |   |
|   |   |   ML using neural networks               |   |   |
|   |   |   (many layers of math)                  |   |   |
|   |   |                                          |   |   |
|   |   |   +----------------------------------+   |   |   |
|   |   |   |   LARGE LANGUAGE MODELS (LLMs)   |   |   |   |
|   |   |   |   Deep learning trained on text  |   |   |   |
|   |   |   |   Generates text, follows        |   |   |   |
|   |   |   |   instructions                   |   |   |   |
|   |   |   |   e.g. ChatGPT, Claude, Gemini   |   |   |   |
|   |   |   +----------------------------------+   |   |   |
|   |   +------------------------------------------+   |   |
|   +--------------------------------------------------+   |
+----------------------------------------------------------+

Every LLM is a form of deep learning. Every deep learning system is a form of machine learning. Every machine learning system is a form of artificial intelligence. But not every AI system is an LLM — not even close.

The AI Hierarchy — AI, Machine Learning, Deep Learning, and LLMs

Each Layer Explained

Artificial Intelligence (the whole outer box)

The broadest possible term. AI means any computer system that does something we would previously have considered to require human intelligence: recognizing images, translating text, playing chess, navigating a car through traffic.

AI does not have to learn from data. A chess program from 1997 that follows hand-coded rules (if the queen is threatened, move the queen) is technically AI. A thermostat that adjusts temperature based on rules is a very simple AI. The word means "smart-ish computer behavior", nothing more.

Everyday analogy: AI is like a car that drives itself. The car is doing something that, until recently, only a human could do. But "self-driving car" tells you nothing about how the car figures out where to go — rules, cameras, sensors, learning from millions of miles of data — those are implementation details.

Machine Learning (the second box)

A subset of AI where the system learns from data rather than being explicitly programmed with rules.

Old way (traditional software): A programmer writes rules. "If email contains the word 'Nigerian prince', mark as spam." The programmer must anticipate every case.

Machine learning way: Show the system thousands of emails labeled "spam" and "not spam". The system figures out the patterns on its own. It may notice correlations no programmer would have thought to code.

The key shift: you do not program the rules. You provide the data and let the machine find the rules.

Everyday analogy: A spam filter that gets smarter over time. When you mark an email as spam, your email client learns. Over millions of users doing this, the system builds a model that catches spam patterns you never explicitly described.

Deep Learning (the third box)

A subset of machine learning that uses structures called neural networks — layers of mathematical functions loosely inspired by how neurons in the brain connect. More layers = "deeper" = more complex patterns the system can learn.

Deep learning is what made facial recognition work well, what powers voice assistants, what translates languages, and what generates images. Before deep learning, these tasks were either impossible or required painstaking human engineering of features.

The reason deep learning is its own category: it changed what was computationally possible. Traditional machine learning is good at learning patterns from structured data (rows and columns). Deep learning excels at unstructured data: images, audio, text, video.

Everyday analogy: Face recognition on your phone. Your phone learned what your face looks like from a few photos. A traditional ML system would have required someone to manually define what a "face feature" is. Deep learning figured that out itself.

Large Language Models — LLMs (the innermost box)

A specific type of deep learning model trained on text — enormous amounts of text scraped from the internet, books, and other sources. Trained to predict what word (technically, what "token") comes next in a sequence.

LLMs are what most people mean when they say "AI" today. ChatGPT is an LLM. Claude is an LLM. Gemini is an LLM.

The word "large" refers to the number of parameters (billions to trillions of internal numerical weights) and the scale of training data. Size matters because larger models learn more subtle patterns.

Everyday analogy: ChatGPT, Claude, Gemini — the "AI" most people mean today. You type a question in plain English. You get a response in plain English. You can ask it to write code, summarize a document, translate text, explain a concept, or draft an email. This is what has captured public attention since late 2022.

The PM Implication

When someone says "let's use AI for this", your first question should be: "What kind of AI?" The answer determines your entire project approach — the data you need, the team you need, the timeline, the cost, and the risk. We will unpack that fully in Part 2.

Part 2: Traditional ML vs LLM — What PMs Must Know (25 min)

Why the Distinction Changes Everything

These are fundamentally different technologies that happen to both get called "AI". Using one when you need the other is like specifying concrete when you need steel — both are building materials, but the choice determines the architecture, the timeline, and the cost.

Head-to-Head Comparison

Traditional ML vs LLMs — Key differences for project managers

| Dimension | Traditional ML | Large Language Models (LLMs) |
|---|---|---|
| What you feed it | Structured data: tables, numbers, categories, rows and columns | Natural language: text, documents, code, questions, instructions |
| What it produces | A prediction or classification: a number, a category, a probability | Generated text: a response, a document draft, code, a summary |
| How it was trained | On your specific data (your company's transaction history, your product catalog, your customer records) | On massive internet-scale text (billions of web pages, books, code repositories) |
| How you customize it | You collect labeled data, train a model, retrain when things drift | You write prompts; optionally fine-tune on your data for specialized behavior |
| Cost structure | High upfront cost (data labeling, model training), low ongoing cost per prediction | Low setup cost, ongoing cost per use (pay per "token" — roughly per word processed) |
| Latency | Very fast predictions (milliseconds) — good for real-time | Slower, seconds per response — not suitable for real-time decisions |
| Explainability | Varies; some models (decision trees) are interpretable | Generally a black box; can explain reasoning but cannot be audited like code |
| Data privacy | Your data stays in your infrastructure | Data sent to an external API unless you self-host or use enterprise agreements |
| Maintenance | Requires monitoring for model drift, periodic retraining | Prompt maintenance, new model versions, API deprecation |

Real-World Examples: Traditional ML

These problems have been solved with traditional ML for years. They are often called "AI" but have nothing to do with LLMs.

Spam filter. Input: email metadata + content features. Output: spam or not spam. Trained on labeled emails. Runs in milliseconds. Has existed since the early 2000s.

Fraud detection. Input: transaction data (amount, location, merchant category, time of day, historical patterns). Output: fraud probability score. Trained on historical fraud cases. Processes millions of transactions per second.

Recommendation engine. Input: your purchase history, browsing behavior, similar users' behavior. Output: ranked list of products you might buy. Trained on historical interaction data. Drives significant revenue for e-commerce companies.

Demand forecasting. Input: historical sales data, seasonality, marketing events, external factors (weather, holidays). Output: predicted sales for next period. Trained on time-series data. Used in supply chain planning.

What these have in common: they consume structured data (rows and columns), they produce a prediction or score, and they require your company's own historical data to train.

Real-World Examples: LLMs

These problems require understanding and generating natural language. They are what people mean when they say "use AI" today.

Code generation. A developer describes what a function should do in plain English. The LLM writes the code. GitHub Copilot, Claude Code, Cursor — all LLM-powered.

Document drafting. A PM uploads a bunch of notes and asks the LLM to produce a first draft of a requirements document, a status report, or a project brief.

Customer service chatbot. A customer types a question in natural language. The LLM reads the question, retrieves relevant policy information, and generates a helpful answer. Unlike old chatbots (rule trees), LLMs handle unexpected phrasings.

Data analysis assistant. A business analyst uploads a CSV and asks "what are the top three trends in this data?" The LLM interprets the data and explains patterns in plain English.

The PM Takeaway

"Which kind?" changes everything.

| Question | Traditional ML answer | LLM answer |
|---|---|---|
| What data do I need to collect? | Labeled examples of your specific problem | Little or none — the model is already trained |
| How long until we have a working prototype? | Weeks to months (data labeling, training, evaluation) | Days (write prompts, call an API) |
| What does it cost to get started? | High (data infrastructure, ML engineers, compute) | Low (API key, per-token pricing) |
| What does it cost at scale? | Low per prediction once trained | Grows linearly with usage (every query costs tokens) |
| What kind of specialist do I need? | ML engineer, data scientist | Prompt engineer, software developer who can call APIs |
| What is the main risk? | Model not performing well enough on your data; data quality | Hallucination (making things up); prompt injection; data privacy |

Part 3: How LLMs Actually Work (Simplified) (20 min)

You do not need to understand the mathematics. But you do need to understand the mechanism, because it directly explains both the capabilities and the failure modes you will encounter in your projects.

The Fundamental Mechanism: Next-Word Prediction

At its core, an LLM does one thing: given some text, it predicts what text should come next.

That sounds trivial. It is not.

To predict the next word well across all possible text — code, legal contracts, poetry, physics papers, recipes, sales emails — the model must learn an enormous amount about the world. It must learn grammar, factual knowledge, logical inference, cause and effect, how code works, how arguments are structured. All of this emerges from the training process.

Imagine this: if you showed a student every book ever written and trained them to predict the next word in any passage, they would develop an extraordinarily deep understanding of language, facts, and reasoning — not because you taught them directly, but because that is what the task requires.

Training: How the Knowledge Gets In

Training works roughly like this:

  1. Collect a massive corpus of text — web pages, books, code repositories, academic papers, Wikipedia, legal documents.
  2. Repeatedly show the model a piece of text with the last few words hidden.
  3. Ask the model: "What should come next?"
  4. Check how close the model's prediction was to the actual text.
  5. Adjust the model's internal parameters to make it slightly better at predicting.
  6. Repeat billions of times.
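
The real training loop adjusts billions of parameters with gradient descent, but the core idea in the steps above, learning next-word statistics from example text, can be shown with a toy frequency-based predictor. This is an illustrative simplification, not how production LLMs are built:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    model = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def predict_next(model: dict, word: str) -> str:
    """Return the most frequent follower of `word` seen during training."""
    followers = model.get(word.lower())
    if not followers:
        return "<unknown>"
    return followers.most_common(1)[0][0]

corpus = (
    "the model predicts the next word . "
    "the model learns patterns from data . "
    "the model predicts the next token ."
)
model = train_bigram_model(corpus)
print(predict_next(model, "model"))  # → "predicts" (seen twice after "model")
```

A real LLM differs in scale and mechanism (neural networks over billions of documents rather than bigram counts), but the objective is the same: given what came before, predict what comes next.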

The result: a model that has compressed an enormous amount of human knowledge and reasoning into its parameters.

After this base training, the model is further refined through a process called RLHF (Reinforcement Learning from Human Feedback) — human trainers rate responses for quality and safety, and the model is adjusted to produce better, safer outputs. This is what turns a raw text predictor into a helpful assistant.

Why This Is Powerful

Because the training data includes reasoning, the model learns to reason. Because the training data includes code, the model learns to write code. Because the training data includes arguments, the model learns to argue. The pattern-learning generalizes far beyond simple word prediction.

This is why LLMs can do things that seem magical: explaining a concept in multiple ways, writing in a specific style, debugging code, drafting legal language, translating across languages it has never seen side-by-side. All of it emerges from the same underlying training process.

Why This Is Limited

Prediction is not understanding. The model does not "know" things in the way a human expert knows things. It has learned patterns — and sometimes those patterns lead it confidently in the wrong direction.

Hallucination: The model generates text that is plausible-sounding but factually wrong. It invents citations that do not exist. It produces code that looks right but fails to run. It states incorrect facts with confident language. This is not malice — it is what happens when a prediction system is asked about something outside its reliable patterns.

No real-time knowledge: The model's knowledge has a training cutoff date. It does not know about events after that date unless you provide the information in your prompt.

No persistent memory by default: Each conversation starts fresh. The model does not remember your previous conversations unless the application explicitly provides that context.

Three Concepts Every PM Must Know

Tokens

The model does not process words — it processes "tokens". A token is roughly a word or word fragment. "Unbelievable" might be 3 tokens. "cat" is 1 token. Code and technical text tend to use more tokens per word than plain English.

Why this matters for PMs: You pay per token. Input tokens (what you send the model) and output tokens (what it generates) are both billed. A prompt that includes a 200-page PDF costs far more tokens than a prompt that includes a 2-page summary. Controlling token usage is directly controlling cost.

Rough guideline: 1,000 tokens is approximately 750 words of English text.
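
That guideline can be turned into a quick back-of-envelope estimator. The 4/3 tokens-per-word ratio below is a rough average for English; real tokenizers vary by model and content type, so treat results as estimates only:

```python
def estimate_tokens(word_count: int, tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate: ~1,000 tokens per 750 English words."""
    return round(word_count * tokens_per_word)

# A 200-page PDF at ~500 words per page vs. a 2-page summary
print(estimate_tokens(200 * 500))  # 133333 tokens
print(estimate_tokens(2 * 500))    # 1333 tokens
```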

Temperature

Temperature is a setting that controls how "creative" or "random" the model's outputs are. It ranges from 0 to 1 (or sometimes higher).

  • Temperature 0: The model always picks the most probable next token. Output is effectively deterministic and consistent: the same input produces the same (or very nearly the same) output. Use this for tasks where consistency matters: classifying data, extracting structured information, answering factual questions.
  • Temperature 1 (or higher): The model introduces randomness, sometimes picking less probable tokens. Output varies between runs. Use this for creative tasks: brainstorming, generating multiple draft options, creative writing.

Why this matters for PMs: If your use case requires reliable, repeatable outputs (parsing invoices, classifying support tickets), specify low temperature. If your use case benefits from variety (generating five different email subject lines), use higher temperature.
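
Conceptually, temperature rescales the model's scores for candidate next tokens before one is chosen. A minimal sketch of that mechanism with invented scores (a real model chooses among tens of thousands of tokens):

```python
import math

def apply_temperature(logits: dict, temperature: float) -> dict:
    """Convert raw scores to probabilities; low temperature sharpens, high flattens."""
    if temperature == 0:
        # Degenerate case: all probability mass on the single best token
        best = max(logits, key=logits.get)
        return {tok: float(tok == best) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    total = sum(math.exp(s) for s in scaled.values())
    return {tok: math.exp(s) / total for tok, s in scaled.items()}

logits = {"the": 2.0, "a": 1.0, "cat": 0.5}  # invented scores for three candidates
print(apply_temperature(logits, 0))    # {'the': 1.0, 'a': 0.0, 'cat': 0.0}
print(apply_temperature(logits, 1.0))  # "the" most likely, but others possible
```

At temperature 0 the top token always wins; at higher temperatures lower-scored tokens keep meaningful probability, which is where the variety in creative outputs comes from.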

Context Window

The context window is the amount of text the model can "see" at once — its short-term memory. Everything you send it (your instructions, the conversation history, any documents you include) plus everything it generates must fit within this window.

Current context windows range from about 8,000 tokens for some models to 200,000+ tokens for models like Claude 3.5. 200,000 tokens is roughly 150,000 words — about two full novels.

Why this matters for PMs: If your use case requires processing long documents, context window size is a key constraint. If you need to analyze a 500-page contract, you need a model with a large enough context window or you need a strategy for chunking the document. Context window directly affects what use cases are feasible and which model you should choose.
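
When a document will not fit, a common chunking strategy is to split it into overlapping pieces and process each one separately. A minimal word-based sketch (production systems usually chunk by tokens or by document structure such as sections, and the sizes here are arbitrary):

```python
def chunk_words(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into word chunks that overlap, so content at a boundary
    is never seen only half-way. Assumes overlap < chunk_size."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ("word " * 2500).strip()  # stand-in for a long contract
pieces = chunk_words(doc, chunk_size=1000, overlap=100)
print(len(pieces))  # 3
```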

Part 4: The Current AI Landscape for PMs (20 min)

The Major Providers (as of Q1 2026)

You do not need to pick one and stick with it. Many production applications use multiple providers for different tasks. But you do need to understand who the players are.

Anthropic — Claude

Models: Claude 3.5 Haiku (fast, cheap), Claude 3.5 Sonnet (balanced), Claude 3 Opus (most capable, most expensive), Claude 3.7 Sonnet (strong reasoning).

Strengths: Long context windows (up to 200K tokens), strong reasoning, safety-focused, excellent at following complex instructions and producing well-structured documents. Popular for enterprise use cases involving document analysis and code.

API access: api.anthropic.com. Also available through AWS Bedrock and Google Cloud Vertex AI.

OpenAI — ChatGPT / GPT-4

Models: GPT-4o (current flagship), GPT-4o mini (faster and cheaper), o1 and o3 series (specialized reasoning models).

Strengths: Largest ecosystem, most third-party integrations, widest awareness. GPT-4o includes image understanding and voice. The o1/o3 reasoning models are specifically optimized for complex multi-step reasoning tasks like math and code.

API access: api.openai.com. Also available through Azure OpenAI Service.

Google — Gemini

Models: Gemini 1.5 Flash (fast), Gemini 1.5 Pro (1 million token context), Gemini 2.0 series.

Strengths: The largest context window in the industry (1M tokens for Gemini 1.5 Pro — enough to process an entire codebase). Strong multimodal capabilities (text, images, audio, video). Deep integration with Google Workspace and Google Cloud.

API access: Google AI Studio (free tier), Vertex AI (enterprise).

Open Source — Llama, Mistral, and others

Meta's Llama 3 series, Mistral, Qwen, and many others are open-weight models — meaning the model weights are publicly available and can be run on your own infrastructure.

Why this matters: No per-token cost once deployed. Data never leaves your infrastructure. Full control. But: you need infrastructure to run it, you need to manage updates, and performance is generally below the frontier commercial models for complex tasks.

Popular platforms for running open-source models: Ollama (local), Together.ai, Fireworks.ai, Replicate (managed hosting).

Pricing Models

Per-token pricing (most common for APIs)

You pay for input tokens + output tokens. Prices vary enormously.

Rough current pricing ranges (per 1 million tokens, as of early 2026):

| Model tier | Input tokens | Output tokens |
|---|---|---|
| Fast/cheap (GPT-4o mini, Claude Haiku, Gemini Flash) | $0.10 – $0.30 | $0.30 – $0.60 |
| Balanced (GPT-4o, Claude Sonnet) | $2.50 – $5.00 | $10 – $15 |
| Most capable (Claude Opus, o1) | $15 – $30 | $60 – $80 |

Why output is more expensive: generating tokens requires more compute than reading them.

For PM budget planning: Think about your expected usage. If your application runs 10,000 queries per day, each processing 1,000 input tokens and generating 500 output tokens, you need to calculate:

daily cost = (10,000 x 1,000 input tokens / 1,000,000) x input price per million
           + (10,000 x 500 output tokens / 1,000,000) x output price per million

Note that input and output tokens are priced separately, so they must be calculated separately.

This scales predictably, which is useful for forecasting.
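
That calculation can be wrapped in a small helper. The $3/$12 rates below are placeholder figures within the "balanced" range in the table above, not any provider's actual prices:

```python
def daily_llm_cost(queries_per_day: int, input_tokens: int, output_tokens: int,
                   input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate daily API spend; input and output tokens are priced separately."""
    input_cost = queries_per_day * input_tokens / 1_000_000 * input_price_per_m
    output_cost = queries_per_day * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# 10,000 queries/day, 1,000 input + 500 output tokens each,
# at placeholder rates of $3/M input and $12/M output
print(daily_llm_cost(10_000, 1_000, 500, 3.00, 12.00))  # 90.0 (dollars/day)
```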

Subscription pricing (for end-user products)

ChatGPT Plus: $20/month for individual users. Claude Pro: $20/month for individual users. Copilot for Microsoft 365: $30/user/month.

These are irrelevant for building products (you cannot resell a personal subscription). They are relevant for evaluating tools for your team's own use.

Enterprise licensing

For large organizations, providers offer enterprise agreements with:

  • Volume discounts
  • Data processing agreements (GDPR compliance)
  • Guaranteed data isolation (your data is not used to train the model)
  • SLAs for uptime and response time
  • Dedicated support

Enterprise pricing is negotiated, not published. Expect meaningful discounts at significant scale.

Deployment Options

Cloud API (simplest)

You call the provider's API. The model runs on their infrastructure. You pay per token. Setup takes hours. No infrastructure to manage. Your data goes to the provider's servers.

Best for: prototyping, small to medium scale, teams without ML infrastructure, use cases where data sensitivity is manageable.

Managed cloud (more control)

Run the model through a cloud platform (AWS Bedrock, Azure OpenAI, Google Vertex AI). The model still runs on cloud infrastructure, but through your cloud account. Better data governance. Often satisfies enterprise compliance requirements.

Best for: enterprise environments already on AWS/Azure/GCP, where data residency requirements rule out direct API calls.

Self-hosted open source (maximum control)

Run an open-source model on your own servers or cloud VMs. No per-token cost. Data never leaves your infrastructure. But: significant infrastructure cost and operational burden. Requires GPU hardware (or cloud GPU instances).

Best for: high-volume use cases where per-token costs become prohibitive, regulated industries with strict data residency requirements, organizations with AI infrastructure teams.

Hybrid

Use a cloud API for non-sensitive tasks (drafting, summarization of public information) and a self-hosted model for tasks involving sensitive data. Complex to manage but increasingly common in large enterprises.

Security Considerations and Vendor Questions

When evaluating an AI vendor, these are the questions that matter for enterprise use:

Data retention: Does the provider store my inputs and outputs? For how long? Who has access?

Model training: Will my data be used to train future versions of the model? (Default answer for most enterprise plans: no. Default for consumer tiers: often yes.)

Data residency: Where are the servers? Does my data leave my country or region? Does this comply with GDPR, HIPAA, or other applicable regulations?

Compliance certifications: SOC 2 Type II? ISO 27001? HIPAA BAA available? These are table stakes for enterprise use.

SLA: What is the guaranteed uptime? What are the rate limits? What happens when the API is slow or down — does your product degrade gracefully?

Model versioning and deprecation: When a new model version is released, does the old one stay available? For how long? (APIs that silently upgrade model versions can break your application's behavior.)

Audit logging: Can you get logs of all API calls — inputs, outputs, timestamps — for compliance and debugging?

Part 5: Hands-on — Classify 10 Real Problems (20 min)

Read each problem. Before reading the answer, try to classify it yourself: "traditional ML", "LLM", or "not AI at all".

Problem 1: Predict which customers will churn next month

A B2B SaaS company wants to identify which customers are likely to cancel their subscription in the next 30 days so the customer success team can intervene.

Answer: Traditional ML.

Why: This is a classification problem using structured data — login frequency, feature usage, support ticket volume, contract value, days since last active use, payment history. You train a model on historical data: customers who churned and their leading indicators. The model outputs a probability score. This has nothing to do with language. LLMs are not the right tool here — they cannot analyze rows of behavioral data. A gradient boosting model (like XGBoost) or logistic regression handles this well.

PM consideration: You need labeled historical data (who churned, when) and feature engineering (building the right input variables). Timeline: 4–8 weeks minimum to get to a usable model.
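
To make "outputs a probability score" concrete, here is a toy logistic scoring function. The features and weights are invented for illustration; a real model learns its weights from labeled churn history:

```python
import math

def churn_probability(logins_per_week: float, open_tickets: int,
                      days_since_active: int) -> float:
    """Toy logistic model: weights below are invented, not learned from data."""
    score = -1.0 - 0.4 * logins_per_week + 0.3 * open_tickets + 0.05 * days_since_active
    return 1 / (1 + math.exp(-score))  # squash the score into a 0–1 probability

# A disengaged account vs. a healthy one (hypothetical values)
print(round(churn_probability(0, 3, 30), 2))   # ≈ 0.80, high risk
print(round(churn_probability(10, 0, 1), 2))   # ≈ 0.01, low risk
```

The customer success team then works the list sorted by this score, which is exactly the output shape an LLM cannot reliably produce from behavioral data.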

Problem 2: Draft weekly status reports from Jira tickets

A development team wants to automatically generate a plain-English status report every Friday by reading the week's closed Jira tickets, in-progress items, and blockers.

Answer: LLM.

Why: This is a text generation task using structured source data as input. You feed the LLM a formatted list of Jira ticket titles, statuses, and descriptions, and ask it to produce a narrative summary in the format you specify. LLMs excel at this: transforming structured data into readable prose. Traditional ML cannot generate coherent narrative text.

PM consideration: This is a quick win. A working prototype can be built in 1–2 days using the Jira API to pull data and an LLM API to generate the report. Key risk: the LLM might misinterpret technical ticket language — review outputs before automating distribution.
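
A sketch of how such a prompt might be assembled. The ticket fields and instruction wording here are hypothetical; the real Jira API returns much richer objects:

```python
def build_report_prompt(tickets: list) -> str:
    """Format a week's tickets into an LLM prompt requesting a narrative summary."""
    lines = [f"- [{t['status']}] {t['key']}: {t['title']}" for t in tickets]
    return (
        "Write a concise weekly status report in plain English for a "
        "non-technical audience. Group items by Done / In Progress / Blocked.\n\n"
        "Tickets:\n" + "\n".join(lines)
    )

tickets = [
    {"key": "PROJ-101", "title": "Fix login timeout", "status": "Done"},
    {"key": "PROJ-102", "title": "Migrate billing DB", "status": "Blocked"},
]
prompt = build_report_prompt(tickets)
print(prompt)
```

The resulting string is what gets sent to the LLM API; the generated report comes back as text for human review before distribution.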

Problem 3: Detect fraudulent credit card transactions

A financial services company wants to flag suspicious transactions in real time — before the payment is processed.

Answer: Traditional ML.

Why: Real-time fraud detection has strict latency requirements — decisions must be made in milliseconds. LLMs take seconds per response and cost per token, making them completely unsuitable for processing millions of transactions per day. Traditional ML models (trained on historical fraud patterns, transaction velocity, merchant categories, geographic anomalies) are the right tool. This is a mature, well-solved problem — specialized fintech ML platforms exist for exactly this.

PM consideration: This is a highly regulated domain. Explainability of fraud decisions may be legally required. LLMs are even less suitable here — they cannot be audited for decision logic.

Problem 4: Generate API documentation from code

A platform team wants to automatically produce human-readable API documentation from their Python/TypeScript source code, including method descriptions, parameter explanations, and usage examples.

Answer: LLM.

Why: Reading code and generating natural language explanations is precisely what LLMs were trained to do — code and documentation appear together across millions of GitHub repositories in training data. LLMs can read a function signature, understand its purpose, and write a clear explanation with examples. Traditional ML cannot generate natural language.

PM consideration: This is a high-value quick win for engineering teams. The main risk is the LLM misunderstanding domain-specific code — human review of generated docs before publishing is recommended. Tools like GitHub Copilot already do this at the IDE level; building a batch version for your whole codebase is straightforward.

Problem 5: Route support tickets to the right team

A company receives thousands of support tickets per day and wants to automatically assign each one to the correct team: billing, technical support, account management, or feature requests.

Answer: Both are valid — but they differ.

Traditional ML approach: Train a text classification model on historical labeled tickets. Fast, cheap per inference, highly accurate if you have labeled data. Works well when categories are stable and well-defined.

LLM approach: Use an LLM to read the ticket and classify it. More flexible — handles novel phrasings, edge cases, ambiguous tickets. Can also extract more information (urgency, sentiment, affected product). Higher cost per ticket.

PM consideration: For high volume (100K+ tickets/day), the per-token cost of LLMs adds up — traditional ML may be more economical. For lower volume or where flexibility and accuracy are critical, LLMs are compelling. Many real systems use an LLM to build the labeled dataset for a traditional ML classifier.

Problem 6: Translate product descriptions into 10 languages

An e-commerce company has 50,000 product descriptions in English and wants translations in French, German, Spanish, Japanese, Mandarin, Arabic, Portuguese, Italian, Korean, and Dutch.

Answer: LLM (or specialized translation APIs, which are also ML-powered).

Why: Language translation is a core LLM capability. Modern LLMs produce high-quality translations with awareness of tone, cultural nuance, and product terminology. Specialized translation APIs (DeepL, Google Translate) use a different form of ML optimized for translation and are often cheaper and faster for pure translation at scale.

PM consideration: For 50,000 descriptions, pure cost matters. Compare LLM pricing (can preserve product tone and style) against specialized translation API pricing (usually cheaper, less flexible). A hybrid — translation API for the bulk, LLM review for high-priority products — is often optimal. Key risk: brand-specific terms may not translate correctly; human review by native speakers is important for launch.

Problem 7: Forecast next quarter's revenue from historical data

A finance team wants a data-driven model that predicts next quarter's revenue based on historical sales data, pipeline data, macro indicators, and seasonal patterns.

Answer: Traditional ML (specifically, time-series forecasting).

Why: Revenue forecasting uses structured numerical data (dates, revenue figures, deal stages, market indicators). The task is predicting a number, not generating text. Time-series ML models — from simple statistical methods (ARIMA) to more sophisticated gradient boosting models — are the right tool. LLMs cannot do reliable quantitative time-series forecasting; they have no special ability to analyze numerical patterns in historical data.

PM consideration: Common mistake — feeding revenue data to ChatGPT and asking for a forecast. The LLM will produce a plausible-sounding answer, but it is not actually doing time-series analysis. It is generating text that sounds like a forecast. Use a real forecasting tool.

Problem 8: Summarize 100-page contracts into 2-page briefs

A legal team receives large contracts and needs to quickly produce executive summaries highlighting key terms, obligations, dates, and risk clauses.

Answer: LLM.

Why: Reading long documents and producing concise summaries is a core LLM strength. A 100-page contract is approximately 75,000–100,000 words — within the context window of modern LLMs like Gemini 1.5 Pro or Claude 3 with extended context. You can instruct the model to focus on specific clause types and produce structured output in a consistent format.

PM consideration: This is a strong enterprise use case with high value and clear ROI. Key risks: LLMs can miss important clauses or misinterpret legal language. Human legal review of the summary before reliance is essential — treat the LLM output as a first draft, not a final product. For regulated industries, ensure your data processing agreement with the LLM vendor covers confidential legal documents.

Problem 9: Recommend products based on purchase history

An e-commerce retailer wants to show each customer personalized product recommendations on the homepage and in emails.

Answer: Traditional ML.

Why: Product recommendation is one of the most mature traditional ML use cases. Collaborative filtering (finding customers similar to you, recommending what they bought), matrix factorization, and deep learning recommendation models have powered Amazon, Netflix, and Spotify for years. The input is structured behavioral data (user IDs, product IDs, purchase timestamps, ratings). The output is a ranked list of product IDs. LLMs cannot efficiently process the millions of user-item interactions these systems rely on.

PM consideration: Recommendation systems require ongoing data collection and model updating as the catalog and user behavior evolve. This is a significant data infrastructure investment, not just an AI project.

Problem 10: Build a chatbot that answers HR policy questions

An HR team wants employees to be able to ask natural language questions ("How many days of parental leave do I get?" "What is the policy on remote work?") and receive accurate answers from the employee handbook.

Answer: LLM (specifically, a Retrieval-Augmented Generation or RAG system).

Why: Employees ask questions in natural language; the answers live in documents. LLMs excel at reading documents and answering questions about them. The standard architecture — called RAG — works like this: (1) break the HR policy documents into chunks, (2) when a question arrives, find the relevant chunks, (3) feed the question and relevant chunks to the LLM, (4) the LLM generates an answer grounded in the retrieved text.

PM consideration: This is one of the highest-ROI quick wins for enterprise AI. A working prototype can be built in a week. Key risks: the LLM might answer confidently even when the policy is ambiguous — add "I'm not certain, please confirm with HR" language for edge cases. Keep the knowledge base updated when policies change. Include a feedback mechanism so HR can flag incorrect answers.
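The four RAG steps above can be sketched end to end. The handbook chunks are invented, retrieval is simplified to keyword overlap (production systems use embedding-based vector search), and the final LLM call (step 4) is deliberately left out:

```python
# Minimal RAG sketch with the actual LLM call stubbed out.

HANDBOOK_CHUNKS = [  # step 1: policy documents split into chunks
    "Parental leave: employees receive 16 weeks of paid parental leave.",
    "Remote work: employees may work remotely up to 3 days per week.",
    "Expenses: submit receipts within 30 days of purchase.",
]

def retrieve(question, chunks, top_k=1):
    # Step 2: score each chunk by shared words with the question
    # (a stand-in for real vector similarity search)
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, chunks):
    # Step 3: ground the question in the retrieved policy text
    context = "\n".join(retrieve(question, chunks))
    return (f"Answer using only this policy text:\n{context}\n\n"
            f"Question: {question}")

# Step 4 would send this prompt to an LLM API (not shown):
print(build_prompt("How many days of remote work are allowed?", HANDBOOK_CHUNKS))
```

The key property for PMs: the answer is grounded in text you control, so updating the policy means updating the chunks, not retraining a model.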

Part 6: Discussion (10 min)

Use these questions to connect the lesson content to your real work.

Discussion Question 1: Think of one AI initiative you have heard about or been asked to support in your current organization. Based on what you learned today, is it a traditional ML problem, an LLM problem, or actually something that does not need AI at all?

Discussion Question 2: Using the comparison table from Part 2, what would change about your project planning if you correctly identified the problem type before the project started? Think specifically about data needs, timeline, and team composition.

Discussion Question 3: Which of the 10 classification problems was most surprising to you? What assumption did you have going in that changed?

Exercise (if time permits): Write down one problem from your current work that you think AI could help with. In 2 minutes, classify it and identify:

  • Is it traditional ML or LLM?
  • What data would you need?
  • What would success look like?

Share with a partner and discuss whether you agree on the classification.

Key Takeaways

  • AI, ML, Deep Learning, and LLMs are nested terms — each is a specific subset of the one above it. LLMs are not the same as AI; they are one type of AI.
  • Traditional ML processes structured data to make predictions. LLMs process language to generate text. Mixing them up leads to wrong estimates, wrong vendor choices, and failed projects.
  • LLMs work by next-word prediction trained on internet-scale text. This makes them powerful at language tasks and limited at quantitative reasoning, real-time data, and guaranteed factual accuracy.
  • Tokens, temperature, and context window are the three LLM settings that matter most for PM decisions: cost, consistency, and feasibility.
  • The major providers (Anthropic, OpenAI, Google, open source) have different strengths, pricing models, and deployment options. The right questions to ask are about data retention, model training on your data, and SLAs.
  • Classifying a problem correctly before you start changes everything: timeline, cost, data needs, team composition, and risk profile.

Common PM Mistakes to Avoid

Assuming everything is an LLM now. The LLM hype cycle has led many PMs to try to solve every problem with ChatGPT. Churn prediction, fraud detection, demand forecasting, recommendation systems — these are traditional ML problems. Using an LLM for them is slower, more expensive, and less accurate than the right tool.

Underestimating the data requirement for traditional ML. Traditional ML sounds cheaper than LLMs because there is no per-query cost. But collecting and labeling the training data (often thousands to millions of examples) is expensive and slow. LLMs have largely removed this barrier for language tasks.

Treating LLM output as ground truth. Hallucination is real. Any LLM output used for important decisions — legal, financial, medical — must be reviewed by a human with domain expertise. Build this review step into your process, not as an afterthought.

Ignoring the context window in your project plan. "We'll feed the whole database to the LLM" is not a plan. Every token you send to an LLM costs money, and oversized inputs will not fit in the context window at all. Design your application to be economical with context — summarize before you send, retrieve before you include.
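One way to make "retrieve before you include" concrete is a simple budget guard in the application code. The 8,000-token budget and the word-based token estimate below are assumptions for illustration:

```python
# Sketch of a context-budget guard: refuse to send oversized input and
# flag it for retrieval or summarization first.

TOKEN_BUDGET = 8_000  # assumed per-call budget for this example

def approx_tokens(text):
    # Rough heuristic from this lesson: 1 token ~= 0.75 words
    return int(len(text.split()) / 0.75)

def prepare_context(text, budget=TOKEN_BUDGET):
    tokens = approx_tokens(text)
    if tokens <= budget:
        return text  # small enough to send whole
    # Too big: in a real app, retrieve only relevant passages or
    # summarize sections first instead of truncating blindly.
    raise ValueError(f"~{tokens} tokens exceeds budget of {budget}; "
                     "retrieve or summarize before sending")

print(approx_tokens("the quick brown fox jumps over the lazy dog"))  # -> 12
```

Even this crude check prevents the most common failure mode: a feature that works in demos on short inputs and breaks or blows the budget on real documents.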

Skipping the vendor security questions. Sending customer data to an LLM API without checking the data retention and training policies is a compliance risk. This is especially true for companies in regulated industries (healthcare, finance, legal).

Checkpoint

Pass criteria: Participants can correctly categorize 8 out of 10 use cases from Part 5.

Self-assessment: Without looking at the answers, re-read problems 1–10. For each one, write your classification. Check against the answers. If you scored 8 or more, you are ready to move to Lesson 2.

If you scored below 8, review the comparison table in Part 2 and re-read the explanations for the problems you missed. The most common errors:

  • Classifying routing/categorization tasks as LLM (they can be traditional ML if you have labeled data)
  • Classifying translation as "not AI" (it is ML-powered)
  • Thinking revenue forecasting is an LLM task (it is time-series ML)

Glossary

  • Artificial Intelligence: Any computer system doing things we previously thought required human intelligence
  • Machine Learning: A subtype of AI where the system learns patterns from data rather than following hand-coded rules
  • Deep Learning: A subtype of ML using neural networks — many layers of mathematical functions
  • Large Language Model (LLM): A deep learning model trained on text, capable of generating text, answering questions, and following instructions
  • Token: The unit LLMs use to process text. Roughly 1 token = 0.75 words. You pay per token.
  • Context window: The maximum amount of text an LLM can process in one call — its short-term memory
  • Temperature: A setting controlling how random (creative) or deterministic (consistent) the LLM's output is
  • Hallucination: When an LLM generates confident-sounding text that is factually incorrect
  • RAG (Retrieval-Augmented Generation): An architecture where relevant documents are retrieved and fed to an LLM before generating an answer — grounding responses in specific source material
  • Fine-tuning: Further training a pre-trained LLM on your specific data to improve its behavior for your use case
  • Parameter: A numerical weight inside an LLM. Billions of parameters store the model's "learned knowledge".
  • RLHF: Reinforcement Learning from Human Feedback — the process of having humans rate model outputs and adjusting the model to produce better responses

Next Lesson: In Lesson 2: Document Intelligence — Making Sense of Unstructured Data, we go deep on one of the highest-value LLM applications in enterprise settings — turning documents, PDFs, emails, and reports into structured, actionable information.
