Learning Plan: AI/LLM Practical Applications for Solution Architecture
Goal: Develop detailed conceptual understanding of AI solution architectures to evaluate technical and economic viability, maintain technical credibility as a sanity-check resource, and position strategically in an AI-transformed landscape.
Target Depth: "AI-literate decision-maker and architect" - sufficient understanding to evaluate whether proposed solutions make sense before they're built, identify architectural limitations vs. implementation problems, and translate between technical possibilities and business requirements.
Time Commitment: 1 hour/day, sustained learning
Background: 15 years in education data/tech consulting, familiar with Karpathy's LLM content, regular Claude/ChatGPT user, data engineering background
Note on Structure: Phase 1 is designed to be completable in ~15 days before your strategy meeting. It front-loads actionable architectural knowledge. Phases 2-5 build deeper foundations and expand into specialized topics.
Phase 2: Neural Network & Statistics Foundations (Weeks 3-4)
Purpose: Fill in the "why" behind what you learned in Phase 1. Understanding how neural networks learn and basic probability helps you reason about model limitations, training trade-offs, and when fine-tuning makes sense.
Week 3: Neural Networks Fundamentals
Primary Resource:
3Blue1Brown: "Neural Networks" Playlist (YouTube, 4 videos, ~1 hour total)
- https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- Watch at 1.25x speed
- Best visual explanations of backprop, gradient descent, activation functions
- Videos: "What is a Neural Network?", "Gradient Descent", "Backpropagation", "Backpropagation Calculus"
Supplementary:
Fast.ai: Practical Deep Learning Lesson 1 (video + notebooks)
- https://course.fast.ai/Lessons/lesson1.html
- Watch video (~2 hours at 1.5x)
- Skim notebooks to see code structure
- Focus on: what models actually do, transfer learning intuition, training loop
Why this matters: You learned what fine-tuning and training are from Karpathy. Now you need to understand how learning happens (gradient descent, backpropagation) to reason about: (1) why fine-tuning differs from RAG, (2) why more data/compute improves performance, (3) what can go wrong in training (overfitting, underfitting, vanishing gradients), and (4) realistic resource requirements for training.
Key concepts:
- Forward pass: inputs → layers → outputs
- Loss functions: measuring how wrong the model is
- Backpropagation: computing gradients (how to adjust weights)
- Gradient descent: actually adjusting weights
- Learning rate: trade-off between speed and stability
- Overfitting vs. underfitting
- Why deep networks work: hierarchical feature learning
- Transfer learning: starting from pretrained model
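To ground these concepts, here is a minimal NumPy sketch of the full loop (forward pass, loss, gradient, update) on a toy linear model with made-up data; it is illustrative only and nothing like LLM-scale training:

```python
import numpy as np

# Toy data: 100 examples, 3 input features, 1 target value (all made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)          # parameters (weights) the model will learn
learning_rate = 0.1      # hyperparameter: step size for each update

for step in range(200):
    y_pred = X @ w                           # forward pass: inputs -> outputs
    loss = np.mean((y_pred - y) ** 2)        # loss: how wrong the model is (mean squared error)
    grad = 2 * X.T @ (y_pred - y) / len(y)   # gradient: which direction to adjust each weight
    w -= learning_rate * grad                # gradient descent: actually adjust the weights
    if step % 50 == 0:
        print(f"step {step}: loss={loss:.4f}")

print("learned weights:", w)  # should approach [1.5, -2.0, 0.5]
```

Backpropagation is what generalizes the single `grad` line to networks with many layers, via the chain rule; learning-rate choice and overfitting are about how this same loop behaves on real models and data.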
Conceptual exercise (1-2 hours):
Explain in plain language to a non-technical colleague:
- Why does a model need millions of examples to learn?
- What's actually happening when we "train" a model?
- Why does fine-tuning cost less than training from scratch?
- What's the difference between parameters and hyperparameters?
Write this out as if preparing a briefing document.
Daily breakdown:
- Days 1-2: 3Blue1Brown videos 1-2, begin fast.ai Lesson 1
- Days 3-4: 3Blue1Brown videos 3-4, finish fast.ai Lesson 1
- Days 5-6: Review, conceptual exercise
- Day 7: Buffer/catch-up day
Week 4: Statistics & Probability for LLMs
Primary Resource:
StatQuest: "Machine Learning Fundamentals" Playlist (YouTube, selected videos, ~2 hours total)
- https://www.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF
- Watch videos on: probability distributions, cross-entropy, logistic regression, overfitting/regularization
- StatQuest's style is accessible and visual
- Watch at 1.25x
Supplementary:
"Understanding LLM Sampling" Article (search for recent explainer)
- Look for articles explaining temperature, top-p, top-k
- ~20 minutes reading
- Anthropic's docs on sampling might also help
Why this matters: LLM outputs are probabilistic. Understanding sampling (temperature, top-k, top-p) helps you: (1) configure models for different tasks (creative vs. factual), (2) understand why models sometimes produce different outputs for the same input, (3) reason about confidence and uncertainty, and (4) evaluate claims about model behavior.
Key concepts:
- Probability distributions: what it means when a model outputs "probabilities"
- Sampling strategies: greedy, temperature, top-k, top-p (nucleus)
- Cross-entropy: how model "confidence" is measured
- Temperature: controlling randomness (high = creative, low = deterministic)
- Why models are stochastic: sampling from distribution vs. always choosing most likely
- Confidence vs. correctness: models can be confidently wrong
- Regularization intuition: preventing overfitting
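To make the sampling controls concrete, here is a minimal NumPy sketch over an invented next-token distribution (the vocabulary and logits are made up; real providers implement these controls server-side):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "a", "mat"]          # toy vocabulary
logits = np.array([2.0, 1.5, 0.5, 0.2, -0.5, -1.0])     # made-up model scores

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    if temperature == 0:
        return int(np.argmax(logits))          # greedy: always pick the most likely token
    # Temperature: divide logits before softmax; low temp sharpens, high temp flattens
    probs = softmax(np.asarray(logits, dtype=float) / temperature)

    order = np.argsort(probs)[::-1]            # tokens from most to least likely
    keep = np.ones_like(probs, dtype=bool)
    if top_k is not None:                      # top-k: keep only the k most likely tokens
        keep[order[top_k:]] = False
    if top_p is not None:                      # top-p (nucleus): smallest set with cumulative prob >= p
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, top_p) + 1
        keep[order[cutoff:]] = False

    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()                       # renormalize over the kept tokens
    return int(rng.choice(len(probs), p=probs))

for t in (0, 0.3, 1.0):
    picks = [vocab[sample(logits, temperature=t, top_p=0.9)] for _ in range(5)]
    print(f"temperature={t}: {picks}")
```

Note that temperature=0 collapses to the greedy, effectively deterministic case, which is why low temperatures suit factual or extraction tasks while higher temperatures suit creative ones.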
Practical exercise (1-2 hours):
Test sampling parameters yourself:
- Use Claude/ChatGPT API playground (or similar)
- Same prompt, vary temperature from 0 to 1
- Observe output differences
- Document: when would you use temp=0? temp=0.7? temp=1?
- Create guideline: "For task X, use temperature Y because..."
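If you prefer to script the comparison rather than click through a playground, here is a minimal sketch using the Anthropic Python SDK; it assumes an `ANTHROPIC_API_KEY` environment variable and a small credit balance, and the model name is an example to check against current docs:

```python
# pip install anthropic  (minimal sketch; assumes ANTHROPIC_API_KEY is set in your environment)
import anthropic

client = anthropic.Anthropic()
prompt = "Suggest a title for a report on district-wide reading assessment results."

for temperature in (0.0, 0.7, 1.0):
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",   # example model alias; check current docs for availability
        max_tokens=100,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"temperature={temperature}: {response.content[0].text}")
```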
Daily breakdown:
- Days 1-2: StatQuest videos on probability, distributions
- Days 3-4: StatQuest videos on cross-entropy, regularization
- Day 5: Sampling explainer articles, sampling exercise
- Days 6-7: Review, consolidate notes on when to use different sampling strategies
Phase 3: MCP, Skills, and Claude-Specific Tooling (Week 5)
Purpose: Understand Anthropic's contributions to the ecosystem (MCP, Skills) as case studies for general patterns. These are increasingly industry-standard, not just Claude-specific.
Days 1-3: Model Context Protocol (MCP)
Primary Resource:
Official MCP Documentation
- https://modelcontextprotocol.io/
- Read: Introduction, Quickstart, Core Concepts
- ~1 hour total reading
"Model Context Protocol Explained" by Nir Diamant (Substack)
- https://diamantai.substack.com/p/model-context-protocol-mcp-explained
- ~30 minutes
- Excellent practical examples
Supplementary Video:
"Building Agents with Model Context Protocol" Workshop (AI Engineer Summit)
- Search for Mahesh Murag's workshop on YouTube
- ~1 hour at 1.5x speed
- Demos of building simple MCP servers
Why this matters: MCP is becoming an industry standard for connecting LLMs to data sources and tools. You need to understand: (1) client-server architecture, (2) what MCP servers provide (prompts, resources, tools), (3) implementation realities (how big is the task, what languages, frameworks available), (4) security considerations, and (5) when MCP makes sense vs. custom API integration.
Key concepts:
- Architecture: host (the AI app) runs one or more clients, each connected 1:1 to a server (data/tools)
- MCP primitives: Prompts (templates), Resources (data), Tools (functions)
- JSON-RPC protocol: how messages are structured
- Server implementation: Python or TypeScript SDK, ~100-500 lines for basic server
- Security: authentication, authorization, sandboxing
- Testing: unit tests for tools, integration tests for full flow
- When to use MCP: standard integrations, multiple clients, model-agnostic
- When NOT to use MCP: simple one-off integration, extreme latency requirements
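For a sense of scale, a "hello world" MCP server in Python is short. Below is a minimal sketch assuming the official Python SDK's `FastMCP` helper, with a hypothetical student-lookup tool standing in for a real integration:

```python
# pip install mcp  — minimal sketch using the official Python SDK's FastMCP helper.
# The student-lookup tool and its data are hypothetical, for illustration only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("student-info")   # server name shown to MCP clients

# Stand-in for a real student information system API call
FAKE_RECORDS = {"12345": {"name": "A. Student", "grade": 7, "reading_level": "on track"}}

@mcp.tool()
def get_student_summary(student_id: str) -> dict:
    """Return a summary record for a student ID (a Tool exposed to the LLM)."""
    return FAKE_RECORDS.get(student_id, {"error": "student not found"})

@mcp.resource("students://roster")
def roster() -> str:
    """Expose the current roster as a read-only Resource."""
    return ", ".join(FAKE_RECORDS.keys())

if __name__ == "__main__":
    mcp.run()   # serves over stdio by default, so a local client (e.g., Claude Desktop) can connect
```

Most of the protocol plumbing (JSON-RPC framing, capability negotiation) is handled by the SDK; in a real project the custom work is the API integration, authentication, and error handling inside each tool.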
Implementation realities assessment (2-3 hours):
For a hypothetical MCP server project, estimate:
Scenario: Build an MCP server that connects to the company's student information system API
Your assessment:
- Developer skill level needed: junior/mid/senior? Specialization?
- Implementation time: hours/days/weeks?
- What's handled by SDK vs. custom code?
- Testing strategy: what needs to be tested?
- Ongoing maintenance: what breaks when APIs change?
- Cost to run: server hosting, API calls, monitoring
- Alternative approaches: when would you not use MCP here?
Daily breakdown:
- Day 1: Official MCP docs (Introduction, Quickstart)
- Day 2: Official MCP docs (Core Concepts), Nir Diamant article
- Day 3: Workshop video, implementation realities assessment
Days 4-5: Claude Skills & System Prompts
Primary Resource:
Anthropic's Skills Documentation
- Available in Claude desktop app under Skills
- Examine existing skills: docx, pptx, xlsx, pdf skills
- Read SKILL.md files to understand structure
- ~1 hour exploration
"Building Effective System Prompts" Guide
- Search Anthropic docs for prompt engineering guidance
- Focus on: instruction hierarchy, example patterns, structured outputs
- ~30 minutes
Why this matters: Skills are structured knowledge injection. Understanding them helps you: (1) recognize when to provide context via skills vs. RAG, (2) design effective system prompts for custom applications, (3) understand how LLMs follow layered instructions, and (4) evaluate "custom GPT" proposals.
Key concepts:
- Skill architecture: markdown files with instructions, examples, best practices
- Why skills work: injected into system prompt, high-priority instructions
- SKILL.md design patterns: clear objectives, step-by-step guidance, examples, troubleshooting
- Interaction with computer use: skills guide file creation, tool usage
- Portability: skills are just markdown, transferable to other LLMs
- When to use skills: reusable workflows, best practices for specific tasks
- When to use RAG instead: dynamic data, large knowledge bases
Design exercise (2-3 hours):
Create a SKILL.md file for a specific education use case:
Scenario: "Analyzing standardized test data to identify learning gaps"
Your skill should include:
- Clear objective statement
- Step-by-step analysis workflow
- Examples of good analysis patterns
- Common pitfalls to avoid
- Output format requirements
- When to escalate to human review
This exercise forces you to think through structured instruction design.
Daily breakdown:
- Day 4: Explore existing Claude skills, read SKILL.md files
- Day 5: System prompt guidance, skill design exercise
Days 6-7: Claude Computer Use Environment
Primary Resource:
Anthropic's Computer Use Documentation
- https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool
- Read all sections
- ~45 minutes
Computer Use Demo Videos
- Search for "Claude computer use demo" on YouTube
- Watch 2-3 recent demos
- ~30 minutes total
Why this matters: Understanding the computer use environment helps you: (1) evaluate proposals involving file creation/manipulation, (2) understand what's possible vs. what requires special setup, (3) reason about security and sandboxing, and (4) assess developer requirements for building computer-use applications.
Key concepts:
- Container environment: Ubuntu, isolated, ephemeral
- File system: /home/claude (work), /mnt/user-data/uploads (inputs), /mnt/user-data/outputs (deliverables)
- Available tools: bash, file operations, package installation
- Limitations: network restrictions, no persistent state between sessions
- Caching behavior: what persists, what resets
- Security model: sandboxing, what Claude can/cannot access
- Use cases: document creation, data analysis, code generation
Architectural pattern exercise (1-2 hours):
Design a workflow using computer use:
Scenario: "Automated report generation from uploaded CSV data"
Your design:
- Input: what does user provide?
- Processing steps: bash commands, file operations, tools used
- Output: what gets delivered to /outputs?
- Error handling: what could go wrong?
- Skill integration: what SKILL.md instructions are needed?
- Cost/time estimate: per report generated
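As a reference point for the processing step, here is a minimal sketch of the kind of script Claude might write and run inside the container; the directory paths follow the layout listed earlier, while the CSV columns and analysis are invented:

```python
# Sketch of a report-generation script run inside the computer-use container.
# Paths follow the documented layout; the CSV columns are invented for illustration.
from pathlib import Path
import pandas as pd

UPLOADS = Path("/mnt/user-data/uploads")     # where the user's files arrive
OUTPUTS = Path("/mnt/user-data/outputs")     # where deliverables must be written
OUTPUTS.mkdir(parents=True, exist_ok=True)

csv_files = sorted(UPLOADS.glob("*.csv"))
if not csv_files:
    raise SystemExit("No CSV uploaded; nothing to report on.")  # error handling: fail loudly

df = pd.read_csv(csv_files[0])

# Hypothetical analysis: per-school averages on an assumed 'score' column
summary = df.groupby("school")["score"].agg(["count", "mean"]).round(1)

report = ["# Assessment Summary", "", summary.to_string()]
(OUTPUTS / "report.md").write_text("\n".join(report))
print("Report written to", OUTPUTS / "report.md")
```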
Daily breakdown:
- Day 6: Computer use docs, demo videos
- Day 7: Architectural pattern exercise, Phase 3 review
Reference Materials (Keep Accessible)
Essential Documentation
| Resource | Purpose | URL |
|---|---|---|
| Anthropic API Docs | Tool use, caching, models | https://docs.anthropic.com |
| OpenAI Platform Docs | Embeddings, fine-tuning | https://platform.openai.com/docs |
| MCP Specification | Protocol details | https://modelcontextprotocol.io |
| Pinecone RAG Guide | RAG best practices | https://www.pinecone.io/learn/ |
Video Resources
- Karpathy's "Deep Dive into LLMs" (3.5 hours): https://youtube.com/watch?v=7xTGNNLPyMI
- Karpathy's "Let's Build GPT Tokenizer" (2 hours): https://youtube.com/watch?v=zduSFxRajkE
- 3Blue1Brown Neural Networks Playlist: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- StatQuest Machine Learning Playlist: https://www.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF
Cost Calculators & Tools
- OpenAI Tokenizer: https://platform.openai.com/tokenizer
- Anthropic Pricing: https://www.anthropic.com/pricing
- Model comparison (LMSYS): https://chat.lmsys.org/?leaderboard
Your Created Materials
Keep these in an accessible reference folder:
- RAG Cost Analysis Exercise (Phase 1, Days 4-7)
- Token Economics Spreadsheet (Phase 1, Days 8-10)
- BS Detection Checklist (Phase 1, Days 14-15)
- MCP Implementation Assessment (Phase 3, Days 1-3)
- Production RAG Checklist (Phase 4, Week 6)
- All Phase 5 Decision Frameworks
Pacing Notes & Adjustments
If you're moving faster:
- Deep dive into Karpathy's full "Neural Networks: Zero to Hero" course
- Implement actual RAG system (LangChain + Chroma + OpenAI embeddings)
- Take fast.ai full Practical Deep Learning course
- Build actual MCP server for a real use case
If you're moving slower:
- Phase 1 is the priority—extend it to 3 weeks if needed
- Phase 2 (foundations) can be compressed or skipped if time-pressured
- Phases 4-5 can be done "on-demand" when you encounter those specific needs
- Focus on exercises over reading—hands-on builds intuition faster
The key metric: Can you evaluate an AI solution proposal and write a 1-page technical assessment covering: viability, cost structure, failure modes, alternative approaches, and team requirements? That's the goal.
Cost Summary
| Resource | Cost |
|---|---|
| All video courses (YouTube, fast.ai, Coursera auditing) | Free |
| Documentation (Anthropic, OpenAI, Microsoft, etc.) | Free |
| API experimentation (OpenAI, Anthropic playgrounds) | ~$5-10 (optional) |
| Optional: Coursera verified certificates | ~$49 each |
| Optional: Hands-on RAG implementation | ~$20 (API credits) |
Minimum cost: $0 (all core resources are free; API experimentation is optional)
Success Indicators by Phase
After Phase 1 (Pre-meeting):
- You can explain RAG to a non-technical executive and identify when it's appropriate
- You can estimate token costs for a proposed AI solution and spot economic red flags
- You can distinguish between genuine architectural complexity and unnecessary "agentic" framing
- You have a checklist of questions to ask about any AI proposal
After Phase 2 (Foundations):
- You understand why fine-tuning differs from RAG at a mechanical level
- You can explain when more training data helps vs. when it doesn't
- You understand model behavior (sampling, temperature) well enough to configure systems appropriately
After Phase 3 (MCP & Claude Tooling):
- You can evaluate MCP server proposals and estimate implementation effort
- You understand when to use system prompts/skills vs. RAG for knowledge injection
- You know what's possible with computer use and what requires custom infrastructure
After Phase 4 (Production RAG):
- You can design evaluation frameworks for RAG systems
- You understand production considerations beyond MVP (monitoring, iteration, cost optimization)
- You can recommend specific architectural patterns for RAG use cases
After Phase 5 (Decision Frameworks):
- You have reusable frameworks for rapid evaluation of AI proposals
- You can generate technical assessments of proposals in <30 minutes
- You can confidently recommend offshore-suitable vs. senior-required work
- You maintain technical credibility while translating between technical and business stakeholders
Meta Notes on Learning Approach
Why this structure:
- Front-loaded actionability: Phase 1 gets you to "credible evaluator" in 15 days, even though it's pedagogically backwards
- Foundations when they're most useful: After seeing practical applications, foundations make more sense
- Exercise-heavy: Each phase includes hands-on work because concepts without application don't stick
- Reference-optimized: Materials chosen for ongoing utility, not just one-time reading
- Economic focus: Unusual for learning plans, but critical for your role as solution architect
Learning philosophy: You're not trying to become an ML engineer—you're building "informed buyer" expertise. The goal is knowing enough to ask the right questions, spot impossible claims, and translate between technical possibilities and business requirements. This requires deeper understanding than typical "intro to AI" content, but different depth than an implementer needs.
