Updated June 2026

Generative AI and LLMs for Python Programmers

Class Duration

35 hours of live training delivered over 5 days.

Student Prerequisites

Experience with Python is required.
No generative AI experience is required.

Target Audience

Python programmers and software engineers who want a comprehensive, end-to-end understanding of generative AI and large language models (LLMs), from Transformer fundamentals and prompt engineering to fine-tuning, RAG, agentic patterns, and responsible deployment.

Description

This course is a broad survey that takes Python developers from no generative AI experience to a working command of the field as it stands in 2026. It builds the foundations first: the evolution of text generation from N-grams and RNNs to the Transformer architecture, attention and positional encoding, and how text generation actually works. Students then apply those foundations to today's frontier models (the Claude 5 and 4.x, GPT-5.x, and Gemini 3.x families) through the Anthropic, OpenAI, and Gemini Python SDKs, covering prompt and context engineering, structured outputs, tool calling, multimodal inputs, embeddings, and RAG. The course then goes under the hood with pre-training, fine-tuning and PEFT, alignment via RLHF, and model evaluation, closing each major theme with the practical question of which technique to reach for, and when. The final day covers running open-weight models locally, evals and observability, cost engineering, and the ethical, privacy, and security concerns of responsible AI practice. Students who want to go deeper afterward have two natural follow-ons: LLM Application Development with Python for production application engineering, and Building AI Agents with Python and MCP for autonomous multi-step systems.

Learning Outcomes

Understand the fundamentals of generative AI and LLMs, including the evolution from N-grams and RNNs to the Transformer architecture, attention mechanisms, and positional encoding.
Gain practical skills in text generation, prompt engineering, context engineering, and generative configuration.
Call frontier models (Claude 5 and 4.x, GPT-5.x, and Gemini 3.x) from Python with the Anthropic, OpenAI, and Gemini SDKs, including streaming, structured outputs with Pydantic, tool calling, and multimodal inputs.
Build semantic search with embeddings, and implement RAG with chunking, vector stores, hybrid search, and reranking.
Explore pre-training, domain adaptation, fine-tuning, PEFT, and alignment with human values, and choose confidently between prompting, RAG, and fine-tuning for a given problem.
Explain agentic patterns and the Model Context Protocol (MCP), and recognize when a problem calls for an agent.
Evaluate generative AI applications with evals and observability tooling, control costs with prompt caching, and run open-weight models locally with Ollama and vLLM.
Understand the ethical considerations, biases, privacy, and security concerns in generative AI, and navigate the full project lifecycle with responsible AI practices.

Training Materials

Comprehensive courseware is distributed online at the start of class. All students receive a downloadable MP4 recording of the training.

Software Requirements

Students will need a free, personal GitHub account to access the courseware. Students will also need permission to install Docker Desktop, Python, Visual Studio Code, and Visual Studio Code extensions on their computers.

Training Topics

Introduction to Generative AI & LLMs

Overview of Generative AI
Introduction to Large Language Models (LLMs)
Historical Perspective on Text Generation
Use Cases and Tasks for LLMs
How LLMs fit into the 2026 software landscape

Text Generation before Transformers

N-grams and Statistical Language Models
Recurrent Neural Networks (RNNs)
Long Short-Term Memory (LSTM) Networks
Limitations of Pre-Transformer Models
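
As a small illustration of the statistical language models listed above, the sketch below counts bigrams in a toy corpus and turns the counts into next-word probabilities. The corpus and the function names are made up for illustration and are not taken from the courseware.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}


# Toy corpus (hypothetical); a real model would be trained on far more text.
corpus = "the cat sat on the mat the cat ran".split()
model = train_bigrams(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

The model only ever sees one word of history, which is one way to see the context limitation that motivates RNNs and, later, Transformers.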

Transformer Architecture

Introduction to Transformer Models
Attention Mechanism
Encoder-Decoder Architecture
Self-Attention and Multi-Head Attention
Positional Encoding
Mixture-of-Experts (MoE) and why frontier models use it
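
The attention topics above can be made concrete with a minimal sketch of scaled dot-product self-attention in plain Python. This is a simplification under stated assumptions: it omits the learned query, key, and value projections and the multiple heads that a real Transformer layer uses, and the input vectors are invented.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Three toy token vectors (hypothetical); self-attention uses them as Q, K, and V.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and shows how much one token attends to every other token.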

Tokens, Sampling, and Generative Configuration

Tokenization and context windows
Text generation techniques: greedy decoding to controlled sampling
Beam Search, Sampling, and Top-k/Top-p Sampling
Temperature and inference configurations
Model hyperparameters, training configurations, and fine-tuning configurations
Practical examples of text generation
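
The decoding options listed above (greedy decoding, temperature, top-k, and top-p) can be sketched over a toy set of logits. This is a minimal illustration using only the standard library. The logits and function names are hypothetical, and hosted model APIs may apply these settings differently.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=None, top_p=None):
    """Keep the most likely token ids, then renormalize what is left."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest set covering top_p of the mass
                break
        ranked = kept
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token id; temperature 0 is treated as greedy decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    filtered = top_k_top_p_filter(probs, top_k, top_p)
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a four-token vocabulary
probs = softmax_with_temperature(logits)
print(sample(logits, temperature=0))
print(sample(logits, temperature=0.8, top_k=3, top_p=0.9))
```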

Provider Setup and SDK Fundamentals

Anthropic, OpenAI, and Gemini Python SDK setup
API keys, environments, and request/response anatomy
Integrating LLMs into applications
Streaming responses
Provider SDKs and the move toward standardized message APIs
Error handling and rate limits

Frontier Models in 2026

Claude 5 and 4.x families (Anthropic): Fable, Opus, Sonnet, Haiku
GPT-5.x family (OpenAI), including GPT-5.5 and GPT-5.2-Codex
Gemini 3.x family (Google)
Reasoning and extended thinking modes
Choosing models by task: capability, latency, cost
Open-weight alternatives and where they fit

Prompting and Prompt Engineering

Introduction to Prompt Engineering
Designing Effective Prompts
System prompts, roles, and few-shot examples
Helping LLMs reason and plan with Chain-of-Thought
Techniques for Prompt Optimization
Examples and Best Practices

Context Engineering

From prompt engineering to context engineering
What belongs in the context window - and what doesn't
Managing context budgets across long interactions
Summarization and compaction strategies
Grounding with retrieved and injected content

Structured Outputs with Pydantic

Structured outputs (JSON Schema-constrained generation)
Pydantic models as output contracts
Validation and error recovery for malformed responses
Extraction, classification, and routing use cases
When structure helps - and when it hurts quality

Tool Use and Function Calling

Interacting with external applications via tool use / function calling
Defining tools: schemas, names, and descriptions
The tool-call round trip: invoke, execute, return
Sequential and parallel tool calls
Program-Aided Language Models (PAL)

Multimodal Inputs

Vision: images as model input
Document understanding: PDFs and rich documents
Audio inputs and transcription workflows
Multimodal prompting patterns
Cost and latency considerations for multimodal calls

Embeddings and Semantic Search

Vector embeddings and similarity search
Embedding models and choosing dimensions
Distance metrics and nearest-neighbor search
Semantic search vs. keyword search
Embeddings beyond search: clustering and classification
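
As a rough sketch of similarity search, the example below ranks a few hand-written vectors by cosine similarity to a query vector. The three-dimensional vectors and document names are invented for illustration. Real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, top_n=2):
    """Brute-force nearest-neighbor search over a small in-memory index."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:top_n]


# Hypothetical document embeddings.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
results = nearest([1.0, 0.0, 0.0], index)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

A vector store replaces the brute-force loop with an approximate nearest-neighbor index, but the ranking idea is the same.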

RAG Fundamentals

Retrieval-augmented generation: why and when
Chunking strategies and metadata
pgvector, Qdrant, Chroma, and other vector stores
Grounded prompting and citations
The naive-RAG baseline and its failure modes
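
One simple chunking strategy is fixed-size character windows with overlap, sketched below with a start offset kept as metadata. The sizes, the sample text, and the function name are assumptions for illustration. Production pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows, keeping each start offset."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"text": text[start:start + chunk_size], "start": start})
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 50-character document.
document = "abcdefghij" * 5
chunks = chunk_text(document, chunk_size=20, overlap=5)
for chunk in chunks:
    print(chunk["start"], chunk["text"])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.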

RAG Quality: Hybrid Search, Reranking, and Evaluation

Hybrid search (BM25 + vector)
Reranking
RAG evaluation harnesses
Retrieval metrics: precision, recall, and groundedness
Iterating on retrieval quality systematically
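
Hybrid search needs a way to merge a keyword ranking with a vector ranking. One common choice is reciprocal rank fusion (RRF), sketched here. The outline above does not name a fusion method, so treat RRF, the constant k=60, and the document ids as assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document scores 1 / (k + rank) per list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a BM25 query and a vector query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.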

Agentic Patterns and MCP

Agents vs. workflows
Tool use and the agent loop
ReAct: Combining Reasoning and Action
Model Context Protocol (MCP): standardized tool/server integrations
Multi-agent orchestration
Background agents and long-horizon tasks
Going deeper: Building AI Agents with Python and MCP

Generative AI Project Lifecycle

Project Planning and Scoping
Data Collection and Preprocessing
Model Selection and Training
Evaluation and Iteration
Deployment and Monitoring
LLM application architectures
Lifecycle cheat sheet: key steps, best practices, common pitfalls and solutions

Pre-training Large Language Models

Pre-training Objectives
Datasets for Pre-training
Computational Challenges
Scaling Laws and Compute-Optimal Models

Domain Adaptation, Fine-Tuning, and PEFT

Domain Adaptation Techniques
Instruction Fine-Tuning
Fine-Tuning on a Single Task
Multi-Task Instruction Fine-Tuning
Introduction to Parameter-Efficient Fine-Tuning (PEFT)
PEFT Technique 1: LoRA (Low-Rank Adaptation)
PEFT Technique 2: Soft Prompts
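
To give a sense of why LoRA is parameter-efficient, the arithmetic sketch below compares the trainable parameters of one full weight matrix with those of a rank-r LoRA update, which trains two small matrices B (d_out x r) and A (r x d_in). The 4096 x 4096 layer size and the rank of 8 are hypothetical values chosen for illustration.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters for B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank.
d_in = d_out = 4096
rank = 8
full = full_finetune_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumptions the LoRA update trains well under one percent of the parameters in the layer.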

Aligning Models with Human Values

Introduction to Model Alignment
Reinforcement Learning from Human Feedback (RLHF)
Obtaining Feedback from Humans
Reward Model and Fine-Tuning with Reinforcement Learning
Addressing Reward Hacking
Scaling Human Feedback

Prompting vs. RAG vs. Fine-Tuning

A decision framework for adapting model behavior
When prompting and context engineering are enough
When retrieval beats fine-tuning - and vice versa
Combining approaches in one system
Cost, maintenance, and data requirements of each path

Model Optimization and Local Models

Model Compression Techniques
Quantization and Pruning
Optimizing Inference Performance
Deployment Strategies
Open-weight models and serving locally with Ollama and vLLM
Privacy, data residency, and restricted environments

Evals, Observability, and Cost Engineering

Evaluation Metrics for LLMs
Standard Benchmarks
Evaluating Model Performance
Application-level evals: golden datasets and LLM-as-judge
Evals and observability (Langfuse, Braintrust)
Cost engineering and prompt caching

Responsible AI and Keeping Current

Ethical Considerations in Generative AI
Bias and Fairness in LLMs
Privacy and Security Concerns
Prompt injection and emerging threat patterns
Developing Responsible AI Practices
Keeping current: tracking model releases and evaluating what matters
Where to go next: LLM Application Development with Python