AI Practical Applications - Phase 1

Note: Learning plans age quickly in the fast-moving AI landscape.

For a personalized, up-to-date learning plan tailored to what you already know, check out the AI Learning Plan Generator Claude Project.

Learning Plan: AI/LLM Practical Applications for Solution Architecture

Goal: Develop detailed conceptual understanding of AI solution architectures to evaluate technical and economic viability, maintain technical credibility as a sanity-check resource, and position strategically in an AI-transformed landscape.

Target Depth: "AI-literate decision-maker and architect" - sufficient understanding to evaluate whether proposed solutions make sense before they're built, identify architectural limitations vs. implementation problems, and translate between technical possibilities and business requirements.

Time Commitment: 1 hour/day, sustained learning
Background: 15 years in education data/tech consulting, familiar with Karpathy's LLM content, regular Claude/ChatGPT user, data engineering background

Note on Structure: Phase 1 is designed to be completable in ~15 days before your strategy meeting. It front-loads actionable architectural knowledge. Phases 2-5 build deeper foundations and expand into specialized topics.


Phase 1: Fast Track to AI Architecture Credibility (Weeks 1-2, ~15 days)

Purpose: Equip you to evaluate RAG proposals, understand token economics, and assess tool usage architectures before your meeting. You'll learn just enough about embeddings and LLM mechanics to ask intelligent questions and spot architectural red flags.


Days 1-3: Embeddings & Vector Search Fundamentals

Primary Resource:

  • MIT Lecture: "Embeddings, RAG, and Vector Databases" (Mike Cafarella, March 2024)
  • PDF: https://dsg.csail.mit.edu/6.S079/lectures/lec10-s24.pdf
  • This is a ~50-slide academic lecture deck
  • Read through systematically, ~20 minutes per session
  • Focus on slides covering: what embeddings are, how similarity search works, trade-offs in embedding model selection

Supplementary Video:

  • "What are Embeddings" by StatQuest (search YouTube, ~15 min)
  • Watch at 1.25x speed
  • Provides intuitive visual explanation of how words become vectors

Why this matters: When someone proposes "embedding all your documents in a vector database," you need to understand: (1) why vectors enable semantic search vs. keyword search, (2) what "distance" means and why it matters, (3) embedding model implications (size, cost, vocabulary), and (4) retrieval precision limitations. This is the foundation for evaluating RAG proposals.
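To make the "search by meaning" idea concrete, here is a minimal Python sketch (not part of the MIT lecture) that embeds a few phrases with OpenAI's embeddings API and compares them with cosine similarity. The model name and phrases are illustrative choices, not prescriptions; any embedding model behaves the same way at this level.

```python
# Minimal sketch: semantic similarity via embeddings (assumes OPENAI_API_KEY is set).
# pip install openai numpy
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

phrases = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What is the refund policy?",
]

# One API call returns one embedding vector per input phrase.
resp = client.embeddings.create(model="text-embedding-3-small", input=phrases)
vectors = np.array([item.embedding for item in resp.data])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 = similar meaning, near 0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Phrases 0 and 1 share no keywords but should score higher than phrases 0 and 2.
print("password vs. credentials:", cosine_similarity(vectors[0], vectors[1]))
print("password vs. refunds:    ", cosine_similarity(vectors[0], vectors[2]))
```

The exact scores depend on the embedding model; what matters for evaluating proposals is the ordering: semantically related phrases score higher even with zero keyword overlap.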

Key concepts to master:

  • Embeddings as semantic representations (king - man + woman ≈ queen)
  • Vector similarity metrics (cosine similarity, Euclidean distance)
  • Embedding model trade-offs (performance vs. cost vs. dimensionality)
  • Why embeddings can't be straightforwardly reversed back to the original text
  • Vocabulary limitations and out-of-vocabulary handling

Hands-on exercise (1-2 hours):

Option A: Use OpenAI's Embeddings API playground

  • Create embeddings for 5-10 related phrases
  • Visualize similarity scores
  • Change one word and observe embedding changes

Option B: Follow Microsoft's "Generate Embeddings" tutorial

Daily breakdown:

  • Day 1: MIT lecture slides 1-25 (embeddings fundamentals), StatQuest video
  • Day 2: MIT lecture slides 26-50 (vector databases, similarity search)
  • Day 3: Hands-on exercise + review notes

Success metric: Can you explain to a colleague why "search by meaning" works differently from "search by keywords", and what the cost/performance implications are?


Days 4-7: RAG Architecture End-to-End

Primary Resource:

Microsoft's RAG tutorial (the tutorial referenced in the daily breakdown below; its sections cover RAG overview, architecture, ingestion, retrieval, and generation)

Supplementary:

"Beyond Vector Databases: RAG Without Embeddings" (DigitalOcean, Aug 2025)

Why this matters: RAG is the most common proposal you'll evaluate. You need to understand: (1) the full pipeline (ingestion → chunking → embedding → retrieval → generation), (2) where costs accumulate, (3) common failure modes (retrieval precision, context window limits), (4) when RAG is appropriate vs. fine-tuning or long-context models, and (5) architectural alternatives.
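To anchor the pipeline stages before the key concepts below, here is a compressed Python sketch of a toy RAG loop (chunk → embed → store → retrieve → augment → generate). It deliberately uses an in-memory array instead of a real vector database, and the helpers `embed` and `generate` are placeholders for real embedding and LLM API calls, so it shows the shape of the flow rather than a production design.

```python
# Toy RAG pipeline sketch: the shape of the flow, not a production design.
# `embed()` and `generate()` stand in for real embedding / LLM API calls.
import numpy as np

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size chunking with overlap so content split across chunk boundaries isn't lost."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: call an embeddings API here; returns one vector per text."""
    rng = np.random.default_rng(0)           # fake vectors, for illustration only
    return rng.normal(size=(len(texts), 8))

def generate(prompt: str) -> str:
    """Placeholder: call an LLM here with the augmented prompt."""
    return f"[LLM answer grounded in {prompt.count('SOURCE')} retrieved chunks]"

# 1. Ingest + chunk + embed + store (the one-time / batch side of the pipeline).
documents = ["...textbook one full text...", "...textbook two full text..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed(chunks)                         # a real system writes this to a vector DB

# 2. Retrieve + augment + generate (the per-query side, where ongoing costs live).
def answer(question: str, top_k: int = 3) -> str:
    q = embed([question])[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:top_k]
    context = "\n".join(f"SOURCE: {chunks[i]}" for i in best)
    return generate(f"Answer using only these sources:\n{context}\n\nQuestion: {question}")

print(answer("What is photosynthesis?"))
```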

Key concepts:

  • RAG pipeline stages: prepare → embed → store → retrieve → augment → generate
  • Chunking strategies (fixed size vs. semantic, overlap considerations)
  • Vector database options (Pinecone, Chroma, Weaviate, Qdrant, pgvector) and selection criteria
  • Retrieval strategies: similarity search, hybrid search (keyword + semantic), reranking
  • Cost structure: embedding API calls, vector storage, retrieval latency, LLM inference
  • Failure modes: poor chunking loses context, retrieval misses relevant docs, hallucination despite grounding
  • When to use: dynamic knowledge bases, need for citations, cost-prohibitive to fine-tune
  • When NOT to use: stable/small knowledge base, need deterministic responses, ultra-low latency requirements

Analytical exercise (2-3 hours total across days):

Create a cost/architecture analysis document for a hypothetical RAG system:

Scenario: Education company wants to build "AI tutor" that answers questions using 10,000 textbook PDFs

Your analysis should include:

  1. Chunking approach recommendation and rationale
  2. Embedding model selection (OpenAI vs. open-source, dimension count)
  3. Vector database choice and why
  4. Estimated costs: embedding generation (one-time), storage (ongoing), retrieval+generation (per query)
  5. Identified failure modes specific to this use case
  6. When this should be RAG vs. fine-tuning vs. long-context Claude

This exercise forces you to make real architectural decisions with trade-off justifications.
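If you prefer code to a spreadsheet, a starter skeleton for the cost portion of the analysis might look like the sketch below. Every price is a placeholder to fill in from current provider pricing pages, and the corpus assumptions (pages per PDF, tokens per page, queries per month) are guesses you should replace with real numbers.

```python
# RAG cost model skeleton for the "AI tutor over 10,000 textbook PDFs" scenario.
# Every *_PRICE value is a placeholder -- look up current provider pricing.

# --- corpus assumptions (replace with measured values) ---
NUM_PDFS = 10_000
PAGES_PER_PDF = 300
TOKENS_PER_PAGE = 500
CHUNK_TOKENS = 800               # chunk size from your chunking recommendation

# --- placeholder prices (USD) ---
EMBED_PRICE_PER_MTOK = 0.0       # embedding model, per million tokens
LLM_INPUT_PRICE_PER_MTOK = 0.0   # generation model, input tokens
LLM_OUTPUT_PRICE_PER_MTOK = 0.0  # generation model, output tokens
VECTOR_STORAGE_PER_MONTH = 0.0   # vector DB hosting / managed service fee

corpus_tokens = NUM_PDFS * PAGES_PER_PDF * TOKENS_PER_PAGE
num_chunks = corpus_tokens // CHUNK_TOKENS

# One-time: embed the whole corpus.
one_time_embedding = corpus_tokens / 1e6 * EMBED_PRICE_PER_MTOK

# Per-query: embed the question, send top-k chunks + prompt, get an answer back.
TOP_K = 4
PROMPT_OVERHEAD_TOKENS = 400     # system prompt + instructions
ANSWER_TOKENS = 300
per_query_input = TOP_K * CHUNK_TOKENS + PROMPT_OVERHEAD_TOKENS
per_query_cost = (per_query_input / 1e6 * LLM_INPUT_PRICE_PER_MTOK
                  + ANSWER_TOKENS / 1e6 * LLM_OUTPUT_PRICE_PER_MTOK)

QUERIES_PER_MONTH = 50_000
print(f"Corpus: {corpus_tokens / 1e9:.1f}B tokens in ~{num_chunks:,} chunks")
print(f"One-time embedding cost: ${one_time_embedding:,.2f}")
print(f"Monthly: ${per_query_cost * QUERIES_PER_MONTH + VECTOR_STORAGE_PER_MONTH:,.2f}")
```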

Daily breakdown:

  • Day 4: Microsoft tutorial sections 1-3 (RAG overview, architecture, ingestion)
  • Day 5: Microsoft tutorial sections 4-5 (retrieval, generation), begin cost exercise
  • Day 6: DigitalOcean "Beyond Vector Databases" article, continue cost exercise
  • Day 7: Complete cost exercise, create reference architecture diagram

Days 8-10: Token Economics & Context Management

Primary Resource:

Karpathy's "Let's build the GPT Tokenizer" (YouTube, ~2 hours)

  • https://www.youtube.com/watch?v=zduSFxRajkE
  • Watch at 1.25-1.5x speed
  • Focus on: why tokenization matters, BPE algorithm intuition, token count implications
  • Skip the detailed coding sections unless you find them clarifying

Supplementary:

Anthropic's Prompt Caching Documentation

OpenAI's Token Counting Tool

Why this matters: Token economics drive cost and feasibility. You need to understand: (1) why tokens ≠ words, (2) how context window size affects what's possible, (3) prompt caching mechanics and cost implications, (4) when to use smaller vs. larger context windows, and (5) cost optimization strategies (batching, caching, model selection). This knowledge is critical for evaluating whether proposed solutions are economically viable at scale.
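A quick way to internalize "tokens ≠ words" is to count them yourself. The sketch below uses OpenAI's tiktoken library as one convenient counter; Anthropic and other providers use their own tokenizers, so counts differ slightly across models, but the pattern (code and non-English text tokenize differently than English prose) is what to notice.

```python
# Token counting sketch: the same idea works with any provider's tokenizer.
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI models

samples = {
    "english prose": "The quick brown fox jumps over the lazy dog.",
    "python code": "def add(a: int, b: int) -> int:\n    return a + b",
    "non-english": "Schülerinnen und Schüler lösen Übungsaufgaben.",
}

for label, text in samples.items():
    tokens = enc.encode(text)
    print(f"{label:14s} {len(text.split()):2d} words -> {len(tokens):2d} tokens")
```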

Key concepts:

  • Tokenization fundamentals: Byte Pair Encoding (BPE), subword tokens
  • Token count implications: code uses more tokens than prose, non-English text varies
  • Context window vs. useful context (just because you can fit 200k tokens doesn't mean you should)
  • Prompt caching: how it works, when it saves money, what doesn't cache (see the sketch after this list)
  • Cost structures: input tokens, output tokens, cached tokens, rate limits
  • Conversation design for efficiency: minimize redundant context, use caching strategically
  • Model selection based on task: when to use Haiku vs. Sonnet vs. Opus
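Building on the prompt caching bullet above, here is a minimal sketch of what cache control looks like in Anthropic's Messages API, in the style of the caching docs listed earlier. Treat the exact field names, minimum cacheable sizes, and the model ID as things to verify against the current documentation.

```python
# Prompt caching sketch (Anthropic Messages API): mark the large, stable part of the
# prompt as cacheable so repeated requests reuse it at a reduced input-token price.
# Verify field names, model IDs, and minimum cacheable sizes against current docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
LONG_REFERENCE_DOC = "..."      # e.g. product docs resent on every request

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",   # substitute a current model ID
    max_tokens=500,
    system=[
        {"type": "text", "text": "You are a support assistant."},
        {
            "type": "text",
            "text": LONG_REFERENCE_DOC,
            # Cache breakpoint: everything up to here can be served from cache
            # on subsequent calls within the cache's time-to-live.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)
print(response.usage)  # usage reports cache-creation and cache-read token counts
```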

Practical exercise (1-2 hours):

Build a token economics spreadsheet (a starter Python version of the model follows the lists below):

Scenario: Customer service chatbot handling 100k conversations/month

Variables to model:

  • Average conversation length (turns)
  • Context needed per turn (customer history, product docs)
  • With/without prompt caching
  • Model selection (Haiku, Sonnet, Opus)

Calculate:

  • Monthly token consumption (input, output, cached)
  • Monthly costs per model choice
  • Break-even point for caching implementation
  • ROI of conversation design optimization
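If a spreadsheet feels heavy, the same model fits in a few lines of Python. All prices and conversation-shape numbers below are placeholders for you to replace with current pricing and your own assumptions.

```python
# Chatbot token economics skeleton: 100k conversations/month, with vs. without caching.
# All *_PRICE values and conversation-shape numbers are placeholders.

CONVERSATIONS_PER_MONTH = 100_000
TURNS_PER_CONVERSATION = 6
CONTEXT_TOKENS_PER_TURN = 2_000      # customer history + product docs resent each turn
OUTPUT_TOKENS_PER_TURN = 250
CACHEABLE_FRACTION = 0.8             # share of input that is stable across turns

INPUT_PRICE_PER_MTOK = 0.0           # fill in for Haiku / Sonnet / Opus
OUTPUT_PRICE_PER_MTOK = 0.0
CACHED_INPUT_PRICE_PER_MTOK = 0.0    # cached-read price (typically much lower)
# Note: real caching also charges a premium to write cache entries; add that if you
# want a more precise break-even calculation.

def monthly_cost(use_caching: bool) -> float:
    turns = CONVERSATIONS_PER_MONTH * TURNS_PER_CONVERSATION
    input_tokens = turns * CONTEXT_TOKENS_PER_TURN
    output_tokens = turns * OUTPUT_TOKENS_PER_TURN
    if use_caching:
        cached = input_tokens * CACHEABLE_FRACTION
        fresh = input_tokens - cached
        input_cost = (fresh / 1e6 * INPUT_PRICE_PER_MTOK
                      + cached / 1e6 * CACHED_INPUT_PRICE_PER_MTOK)
    else:
        input_cost = input_tokens / 1e6 * INPUT_PRICE_PER_MTOK
    return input_cost + output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK

print(f"Without caching: ${monthly_cost(False):,.2f}/month")
print(f"With caching:    ${monthly_cost(True):,.2f}/month")
```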

Daily breakdown:

  • Day 8: Karpathy tokenizer video (first half)
  • Day 9: Karpathy tokenizer video (second half), Anthropic caching docs
  • Day 10: Token economics spreadsheet exercise, experiment with OpenAI tokenizer

Days 11-13: Tool Usage & Agentic Patterns

Primary Resource:

Anthropic's "Tool Use (Function Calling)" Documentation

Supplementary Video:

"Function Calling is All You Need -- Full Day Workshop with Ilan Bigio of OpenAI" (YouTube, ~1.75 hr)

  • https://www.youtube.com/watch?v=KUEmEb71vzQ
  • Workshop from someone who built OpenAI's 2024 AI phone ordering demo and led technical development for Swarm
  • Covers: function calling history, agent loop architecture, memory management, delegation patterns, async delegation, and dynamic agent-written tools
  • Includes hands-on demonstrations with the Swarm framework

Why this matters: "AI agents" proposals often boil down to tool usage patterns. You need to understand: (1) how LLMs decide when to use tools vs. respond directly, (2) tool orchestration complexity, (3) error handling and retry logic, (4) security/safety considerations, and (5) realistic team requirements for building and maintaining agentic systems. This helps you evaluate "we'll build an AI agent that..." proposals.

Key concepts:

  • Tool use mechanics: LLM outputs structured tool call, environment executes, result fed back to LLM
  • Decision logic: when models choose to use tools (based on training + instructions)
  • Tool definition patterns: clear descriptions, parameter schemas, example use cases
  • Multi-tool orchestration: sequential vs. parallel, dependency handling
  • Error handling: tool failures, malformed responses, retry strategies
  • Security: tool access controls, rate limiting, sandboxing
  • Tradeoffs: tool use vs. in-context information (latency, complexity, reliability)

Architectural analysis exercise (2-3 hours):

Evaluate a proposed "AI agent" architecture:

Scenario: "We'll build an AI agent that monitors student assignment submissions, checks for plagiarism using TurnItIn API, analyzes writing quality, and posts feedback to our LMS"

Your evaluation should cover:

  1. Tool requirements: what tools does the LLM need?
  2. Orchestration complexity: sequential dependencies, error cases
  3. Developer requirements: junior vs. senior, full-stack vs. specialized
  4. Testing strategy: unit tests for tools, integration tests for orchestration
  5. Failure modes: API downtime, ambiguous cases, hallucinated tool calls
  6. Cost structure: LLM calls, tool API costs, error retries
  7. Alternative approaches: could this be simpler without "agentic" framing?

Daily breakdown:

  • Day 11: Anthropic tool use docs (introduction, basics, advanced patterns)
  • Day 12: Video supplement, start architectural analysis exercise
  • Day 13: Complete architectural analysis exercise, create decision framework

Days 14-15: Critical Evaluation & BS Detection

Primary Resources:

"Hallucination Detection Strategies" (multiple sources to triangulate)

  • Search for: "LLM hallucination detection 2024 2025"
  • Read 2-3 recent articles/papers (30-45 min total)
  • Focus on practical detection methods, not just definitions

LLM Benchmarks & Leaderboards

  • Browse: LMSYS Chatbot Arena (https://chat.lmsys.org/?leaderboard)
  • Read: How to interpret Elo ratings, what benchmarks actually measure
  • ~30 minutes

Anthropic's "Evaluating Claude" Documentation

Why this matters: The most valuable skill is knowing when AI is appropriate vs. traditional approaches, and detecting when proposals make impossible claims. You need to: (1) recognize hallucination patterns, (2) understand output validation techniques, (3) know what benchmarks actually measure vs. what they claim, and (4) identify when proposals contradict technical constraints.
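One concrete output-validation pattern worth recognizing in proposals: force the model to produce structured output and validate it before anything downstream trusts it. The sketch below uses pydantic for the validation step; the schema, thresholds, and "route to human review" handling are illustrative, not a standard.

```python
# Output validation sketch: never let raw LLM text flow into downstream systems.
# The schema and thresholds are illustrative; the pattern is schema + reject path.
# pip install pydantic  (v2 API)
import json
from pydantic import BaseModel, Field, ValidationError

class EssayFeedback(BaseModel):
    score: int = Field(ge=0, le=100)          # reject out-of-range scores
    strengths: list[str]
    improvements: list[str]
    confidence: float = Field(ge=0.0, le=1.0)

def handle_llm_output(raw: str) -> None:
    try:
        feedback = EssayFeedback.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as err:
        # Malformed JSON, hallucinated fields, impossible scores -> human review queue.
        print(f"Rejected, route to human review: {err}")
        return
    if feedback.confidence < 0.7:             # threshold is a policy choice, not magic
        print("Low confidence: route to human review")
        return
    print("Validated feedback:", feedback.model_dump())

handle_llm_output('{"score": 140, "strengths": [], "improvements": [], "confidence": 0.9}')
```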

Key concepts:

  • Hallucination types: factual errors, fabricated citations, confident incorrectness
  • Detection strategies: fact-checking, consistency checks, citation verification, ensemble methods
  • When to use AI: ambiguous/creative tasks, natural language interface valuable, acceptable error rate
  • When NOT to use AI: need 100% accuracy, deterministic logic required, liability for errors
  • Benchmark limitations: what MMLU/HumanEval/etc. actually test, distribution shift
  • Impossible claims: "perfect accuracy on subjective tasks," "eliminates need for human review," "real-time without lag"
  • Output validation: structured outputs, confidence scoring, human-in-the-loop

Synthesis exercise (2-3 hours):

Create a "BS Detection Checklist" for AI proposals:

Section 1: Technical Red Flags

  • Claims that violate known constraints (e.g., "real-time analysis of video with no latency")
  • Misunderstanding of model capabilities ("it understands like humans")
  • Benchmark misinterpretation

Section 2: Economic Red Flags

  • Underestimated token costs at scale
  • Ignored API rate limits
  • Missing ongoing maintenance costs

Section 3: Architectural Red Flags

  • Over-complicated "agent" when simple prompt would work
  • RAG when fine-tuning makes more sense
  • No validation/testing strategy

Section 4: Evaluation Questions to Ask

  • How will you measure success?
  • What's your validation strategy?
  • What happens when it's wrong?
  • What's the cost at 10x scale?
  • Why AI instead of traditional approach?

Daily breakdown:

  • Day 14: Read hallucination detection articles, explore LLM leaderboards, Anthropic eval docs
  • Day 15: Create BS Detection Checklist, review all Phase 1 notes

Phase 1 Summary & Pre-Meeting Prep

After Day 15, you should be able to:

  1. Explain how RAG works and when it's appropriate vs. alternatives
  2. Estimate token costs and identify economic red flags in proposals
  3. Evaluate tool usage architectures and spot over-complicated "agent" designs
  4. Ask intelligent questions about validation, failure modes, and scalability
  5. Recognize claims that contradict technical constraints

Pre-meeting review (30 minutes):

  • Skim your notes from Days 1-15
  • Review your RAG cost analysis exercise
  • Review your BS Detection Checklist
  • Prepare 3-5 clarifying questions you might ask about AI proposals

Reference Materials (Keep Accessible)

Essential Documentation

  • Anthropic API Docs (tool use, caching, models): https://docs.anthropic.com
  • OpenAI Platform Docs (embeddings, fine-tuning): https://platform.openai.com/docs
  • MCP Specification (protocol details): https://modelcontextprotocol.io
  • Pinecone RAG Guide (RAG best practices): https://www.pinecone.io/learn/

Video Resources

  • Karpathy, "Let's build the GPT Tokenizer" (Days 8-10): https://www.youtube.com/watch?v=zduSFxRajkE
  • "Function Calling is All You Need" workshop with Ilan Bigio (Days 11-13): https://www.youtube.com/watch?v=KUEmEb71vzQ
  • StatQuest embeddings explainer (Days 1-3; search YouTube)

Cost Calculators & Tools

  • OpenAI's Token Counting Tool (Days 8-10)
  • Your token economics spreadsheet (Days 8-10)

Your Created Materials

Keep these in an accessible reference folder:

  • RAG Cost Analysis Exercise (Phase 1, Days 4-7)
  • Token Economics Spreadsheet (Phase 1, Days 8-10)
  • BS Detection Checklist (Phase 1, Days 14-15)
  • MCP Implementation Assessment (Phase 3, Days 1-3)
  • Production RAG Checklist (Phase 4, Week 6)
  • All Phase 5 Decision Frameworks

Pacing Notes & Adjustments

If you're moving faster:

  • Deep dive into Karpathy's full "Neural Networks: Zero to Hero" course
  • Implement actual RAG system (LangChain + Chroma + OpenAI embeddings)
  • Take fast.ai's full Practical Deep Learning course
  • Build actual MCP server for a real use case

If you're moving slower:

  • Phase 1 is the priority—extend it to 3 weeks if needed
  • Phase 2 (foundations) can be compressed or skipped if time-pressured
  • Phases 4-5 can be done "on-demand" when you encounter those specific needs
  • Focus on exercises over reading—hands-on builds intuition faster

The key metric: Can you evaluate an AI solution proposal and write a 1-page technical assessment covering: viability, cost structure, failure modes, alternative approaches, and team requirements? That's the goal.


Cost Summary

  • All video courses (YouTube, fast.ai, Coursera auditing): Free
  • Documentation (Anthropic, OpenAI, Microsoft, etc.): Free
  • API experimentation (OpenAI, Anthropic playgrounds): ~$5-10 (optional)
  • Coursera verified certificates (optional): ~$49 each
  • Hands-on RAG implementation (optional): ~$20 (API credits)

Minimum cost: $0 (all core resources are free; API experimentation is optional)


Success Indicators by Phase

After Phase 1 (Pre-meeting):

  • You can explain RAG to a non-technical executive and identify when it's appropriate
  • You can estimate token costs for a proposed AI solution and spot economic red flags
  • You can distinguish between genuine architectural complexity and unnecessary "agentic" framing
  • You have a checklist of questions to ask about any AI proposal

After Phase 2 (Foundations):

  • You understand why fine-tuning differs from RAG at a mechanical level
  • You can explain when more training data helps vs. when it doesn't
  • You understand model behavior (sampling, temperature) well enough to configure systems appropriately

After Phase 3 (MCP & Claude Tooling):

  • You can evaluate MCP server proposals and estimate implementation effort
  • You understand when to use system prompts/skills vs. RAG for knowledge injection
  • You know what's possible with computer use and what requires custom infrastructure

After Phase 4 (Production RAG):

  • You can design evaluation frameworks for RAG systems
  • You understand production considerations beyond MVP (monitoring, iteration, cost optimization)
  • You can recommend specific architectural patterns for RAG use cases

After Phase 5 (Decision Frameworks):

  • You have reusable frameworks for rapid evaluation of AI proposals
  • You can generate technical assessments of proposals in <30 minutes
  • You can confidently recommend offshore-suitable vs. senior-required work
  • You maintain technical credibility while translating between technical and business stakeholders

Meta Notes on Learning Approach

Why this structure:

  1. Front-loaded actionability: Phase 1 gets you to "credible evaluator" in 15 days, even though it's pedagogically backwards
  2. Foundations when they're most useful: After seeing practical applications, foundations make more sense
  3. Exercise-heavy: Each phase includes hands-on work because concepts without application don't stick
  4. Reference-optimized: Materials chosen for ongoing utility, not just one-time reading
  5. Economic focus: Unusual for learning plans, but critical for your role as solution architect

Learning philosophy: You're not trying to become an ML engineer—you're building "informed buyer" expertise. The goal is knowing enough to ask the right questions, spot impossible claims, and translate between technical possibilities and business requirements. This requires deeper understanding than typical "intro to AI" content, but different depth than an implementer needs.