Why capable agents still fail
We take a frontier model, give it a “build a clone of claude.ai” prompt, run it in a naive loop, and watch it fail in slow motion. Then we name the failure modes every later module addresses: one-shotting, context loss between sessions, premature victory, undocumented progress, environment drift.
Lab · Run a baseline agent against a non-trivial task. Record every place it gets stuck. Write a one-page failure mode taxonomy — your private rubric for the rest of the course.
Deliverable · failure-modes.md