15-113 · Effective Coding with AI · Spring 2026

Best Practices and Lessons Learned

...from the first pioneers of 15-113 Effective Coding with AI


A report distilled from student surveys after each assignment, including the techniques, tools, and tough lessons that emerged as the class moved from browser-chatbot HTML prompts to full agentic workflows.

An important note from Mike

Hi everyone! First, thanks for all of your tremendous effort over these last 14 weeks. It's been an honor to pilot this new course alongside you all, and you've made such wonderful things!

We also have so much insight from all of you, and we're finally close to being able to synthesize those into a set of student-driven best practices that you can carry forward, and that other students will benefit from in future semesters.

This document is a first attempt at summarizing your insight from 15-113 Spring 2026. It's still somewhat incomplete because your capstone is due after the final class where we'll present this, but we'll update it with your feedback from that project as soon as we have it. In the meantime, it's important to note that aside from this first section, this preliminary report is mostly generated with Claude Opus 4.7, with subsequent manual revisions. I've made a reasonable attempt to make sure that it's aligned in its representation of the assignments and your comments, and to make sure that it's not inventing quotes or making unfounded conclusions, but one should still approach these conclusions with a somewhat critical eye. I'll be writing my own summary of this semester in the coming weeks, which I'll of course share with you, and everything in that document will be carefully validated. At the moment I agree with essentially all of the conclusions in this report, but especially where AI is concerned, we know that transparency is always best!

How to read this report

A short legend before we dive in.

Each section below covers one assignment, in chronological order. The structure for every section is the same:

The brief. A short description of what the assignment asked for, drawn from the published course writeup where available.

A stat strip. Averages from the Likert-style questions on the survey. "Satisfaction" and "Understanding" are on a 1–7 scale; "No-bugs confidence" and, where asked, "Challenge" are on a 1–5 scale.

Themes from the class. The substantive advice that showed up in the free-response field, clustered into themes. Each theme is tagged with how well it's represented in the data:

Example theme. Dominant

A dominant theme (5 dots) shows up in roughly a third or more of the comments on that assignment. A strong theme (4 dots) is the plurality view but doesn't swamp everything else. A common theme (3 dots) is clearly repeated. Some (2 dots) means a handful of students said it. One student (1 dot) flags an observation worth sharing even if only one person raised it.

When perspectives genuinely conflict, both are shown with their own representation tags. Where a quote is used, the student identifier and assignment are cited as [S## · HW#] so the instructor can trace responses back to the original survey.

Project 1 · Personal Portfolio Website

Week 1–2 · first major assignment · most students' first-ever website

The brief

Build and deploy a professional portfolio website with an About section, a Projects section (placeholders are fine for now), and contact info. Responsive design, live on GitHub Pages, AI usage documented in code comments. Students were steered toward browser-based chat tools (ChatGPT, Claude, Gemini) rather than IDE integrations, specifically so they would notice the limitations of that workflow.

Responses 42 · Satisfaction 5.3/7 · Understood code 4.4/7 · No-bugs confidence 3.8/5 · Can update later 4.3/5

Tools the class reached for

Gemini 31% · ChatGPT 26% · Claude 26% · GitHub Copilot 10% · Cursor 5%

Themes

Ask the AI to produce a plan before any code, and read the plan. Strong

The most consistent advice on Project 1 was to treat the AI as an architect first and a coder second. Students who asked for plans, outlines, or section-by-section strategies before letting the model write anything reported smoother work and cleaner code.

"Having the AI make plans for large features is helpful." [S31 · P1] · "Get AI to give you their plan before telling it to execute it." [S43 · P1] · "Using an AI tool to plan out your project first." [S1 · P1] · "When using GitHub Copilot, I found using Plan Mode first and then switching over to Agent Mode to be very effective." [S37 · P1]

When the chat gets muddied, start a new one. Common

Several students independently arrived at the same conclusion: chats accumulate bad context, and prompting harder doesn't fix it; starting fresh does.

"Refreshing the chat helps not muddy the code — don't just keep prompting, and safe often." [S2 · P1] · "Starting a new chat even with the same model fixes problems." [S39 · P1] · "Try start new conversation if the current one won't be able to give what you want." [S42 · P1]

Specificity pays, but a minority got better results by loosening the reins. Strong, with a genuine counter-view

Most students said detailed, explicit prompts yielded the best output: telling the model exactly what to change, what not to change, and where.

"Ask for one change (or group of very similar changes) at a time." [S45 · P1] · "When prompting, specify what need to be changed and what don't need to be changed. Giving sample is generally helpful." [S35 · P1] · "You need to be pretty specific about what u want or it will give you a really simply result." [S26 · P1]

But a visible counter-current, particularly around design work, argued the opposite: hand over aesthetic judgement and let the model surprise you.

"AI performed better the less specific I was, and the more 'free rein' it had to complete a task." [S24 · P1] · "Classmate prompted it to do more general things rather than specific and let AI take more control of the design." [S41 · P1] · "It was easier to use AI in an open-ended way to start and for design features." [S18 · P1]

Show, don't (just) tell: feed concrete references. Common

Visual references and links to inspiration websites outperformed pure text descriptions, especially for layout and style.

"If you want to combine multiple pieces for inspiration, rather than describing it in the prompt, actually provide the sources you want to combine (the link of a cool hackathon website with the link of a cool portfolio you saw)." [S15 · P1] · "Make a sketch in Figma and then tell LLM to make a website that looks like the sketch really helps a lot." [S32 · P1] · "Use AI to view other websites and then generate the format and slowly prompt new elements onto the page." [S5 · P1]

Verify everything: the model will confabulate. Common

Hallucinated details (especially personal info, instructions, and small styling choices) showed up often enough that several students made "check its output" a standing rule.

"Check anything that it writes because it really likes to make things up." [S17 · P1] · "It remembered everything, even from previous prompts. When I made my first skeleton of the website, it had information about me I didn't explicitly write in." [S33 · P1] · "I often had to walk the AI through issues I was facing. It had difficulty finding its own bugs when they were not described in a lot of detail." [S25 · P1]

Ask for explanations so you understand what's being shipped. Some

Treating the assistant as a teacher, not just a typist, was a recurring recommendation; it's useful for the learning goals of the course, and also for being able to reasonably explain the code later.

"When prompting, emphasize that you want to understand why we're changing certain code and what certain code does for us implementing our desired feature." [S21 · P1] · "Asking AI to make the code readable to beginners is a good way to learn." [S40 · P1]

IDE-integrated tools beat the browser window, when you can use them. Some

A small but vocal group found the browser-chat workflow the course prescribed frustrating compared to IDE integrations they already knew, confirming exactly the "notice the limitations" lesson the project was designed to teach.

"Using Claude Code in the IDE was way better than using the browser version." [S8 · P1] · "Using cursor, it is very powerful." [S4 · P1] · "The live preview extension worked well with GitHub copilot, enabling you to directly select elements on your website and let AI modify it." [S36 · P1] · "Use vs code extensions it makes the work flow a lot easier." [S28 · P1]

Try to solve it yourself before pasting the error in. One student

Advice worth highlighting even though only one student voiced it this assignment (the pattern returns later in the semester):

"Always try to solve an error first before copying into the prompt. Even if you just spend 20 seconds looking at it, you will learn a lot more." [S22 · P1]

Model choice matters; credit / rate limits will bite you. Some

Different models excel at different things, and free-tier rate limits surprise people mid-deadline.

"The model you use really makes a huge difference." [S34 · P1] · "Claude is good for good ui design." [S29 · P1] · "GitHub Copilot is very convenient but the credits are gobbled up really quick." [S14 · P1] · "Putting in a contact form is relatively easy with AI — Gemini specific." [S12 · P1]

"AI is better suited for iterating and optimizing after an MVP is designed, rather than creating everything from scratch. The key is that you must have your own ideas first, it shouldn't be completely handed over to AI." [S44 · Project 1]

HW 2 · Crossy Road

Week 3 · "Core prompting strategies" · building a playable 2D (or, if brave, 3D) game

The brief

After a lecture series on prompting strategies (naive prompting, plan-adjust-execute, detailed prompts, "Tetris three ways"), students were given one hour to prompt their way to as much of a Crossy-Road-style game as they could. The survey asked about Pygame, cmu_graphics, library setup, and whether a 3D version would be feasible; the responses make clear this was a Pygame-centric build in which setting up the right Python environment and assembling a working game loop from scratch were both part of the challenge, all under a strict time budget.

Responses 41 · Satisfaction 4.4/7 · Read code 2.9/7 · Understood 3.6/7 · No-bugs confidence 2.2/5

The metrics should be read against the one-hour clock. This has the lowest bug-confidence score all semester (2.2/5), but that is partly an artifact of students shipping genuinely unfinished work. Many reflections explicitly mention running out of time before they could debug what the AI produced. The advice that came out of this assignment is, accordingly, dominated by how to extract maximum useful output from an AI under time pressure.

Tools the class reached for

Gemini 44% · Claude 34% · ChatGPT 34% · Cursor 7% · GitHub Copilot 7%

Themes

Work incrementally. Small prompts, one feature at a time, MVP first. Dominant

The single strongest message across HW2, and almost certainly shaped by the one-hour constraint: students who asked the model to build the whole game in one shot ended the hour with output too big to debug; students who asked for one segment at a time (player movement, then a lane of traffic, then the score display) got something playable within the time budget and felt they understood what they had.

"Prompting it to complete segments of the game rather than fixing all the mistakes at once." [S5 · HW2] · "Control small and important parts without letting it jump into making the whole thing." [S17 · HW2] · "Start simple, and add more features incrementally." [S31 · HW2] · "Do not let AI generate too much content in one time, and care about the in-context surrounding." [S27 · HW2] · "Giving it bullet points of what features to implement at the start got me a great initial product." [S9 · HW2] · "I think building from simpler ideas and then constructing larger ideas would be a good approach." [S23 · HW2] · "Creating a simple plan first and then adding features." [S41 · HW2]

Don't let AI write one giant file. Ask for multiple files. Some

Several students independently landed on the same specific tactic: force modularity at prompt time.

"Don't create crossy road all in one file." [S3 · HW2] · "Tell AI to separate the code in multiple files — I think this makes understanding the logic of the code easier." [S40 · HW2] · "Telling AI to write code in a specific structure helps with debugging." [S36 · HW2]

Screenshots and sketches work better than words for UI bugs. Some, strongly held

When something looked wrong, students who uploaded a screenshot of the broken thing got fixes faster than students who tried to describe the bug in text.

"Taking screenshots of bad UI components and uploading them to the AI was super useful." [S22 · HW2] · "Give screenshots are powerful for AI to recreate." [S35 · HW2]

One AI writes the prompt, another AI writes the code. Some

A two-model workflow appeared for the first time and will recur in later assignments.

"Using an AI to write a prompt for another AI." [S33 · HW2] · "I tried this process where I asked Gemini for prompts and planning while giving prompts to Claude. The process itself was less buggy, I think I encountered less bugs than I would have if I just prompted myself, but it took much longer than my peers to get to the same point." [S24 · HW2]

Student S24's note is important: the quality was better, but the speed was worse. This tradeoff matters.

When a bug won't go away, switching models doesn't always help. Sometimes you need to read the code. Some

Students noticed a specific failure mode: re-prompting, or swapping to a different chatbot, can dig a deeper hole instead of escaping the current one.

"AI is really bad with sticking with small changes you made, such as using pythonRound and not round." [S7 · HW2] · "I had to actually tell the AI to fix itself for it to fix itself." [S10 · HW2] · "Turning towards another model when there's a bug doesn't necessarily help fix it I think you need to manually debug it." [S29 · HW2] · "AI struggled with listening to specific prompts in my crossy road game. Even with multiple, repeated prompts, it was difficult to debug and change as I had to repeatedly play the game to find the bugs." [S38 · HW2]

Environment / library issues ate more time than code did. Some

The AI is not always aware of your Python version or OS. Several students hit wall after wall over package setup, and one explicitly called out that LLMs struggle with system-level requirements.

"AI struggles to understand system requirements. I spent a lot of time struggling because pygame could not work with Python 3.14 and I ended up having to revert to 3.13." [S13 · HW2] · "Downloading libraries was a nightmare for some reason." [S14 · HW2] · "Having your AI model interfaced with your hardware in order to ensure proper portability. Say 'I want this to be able to play from an html page' and then ensuring that the proper libraries are installed and configured for this task." [S21 · HW2]

Different models have different quirks, even on the same prompt. Some, nuanced

The most interesting observations came from students who ran the same assignment through different tools and compared:

"I compared the output of both chatgpt and gemini and I found that chatgpt took longer and tried to create something more sophisticated while gemini was alot quicker however took alot of prompts to get a product but it made less mistakes." [S20 · HW2] · "Different models still have similar ideas of what features to add, such as an eagle that ends the game when the player doesn't move for a while. Different models also had common issues with player locations and moving platforms." [S18 · HW2] · "Even the same model creates very different problems and outputs." [S39 · HW2]

"Ask" mode before "Agent" mode, when the IDE supports it. One student

A specific IDE-workflow tip that anticipates the agentic-development assignments later:

"Use 'ask' mode before 'agent' mode really helps." [S32 · HW2]

Rate limits and Pro tiers become real and brutal when you're on a clock. Some

Students hit model caps during iterative game-building for the first time this assignment, and with a one-hour budget, each minute spent waiting on a throttled response was material.

"Claude is very very slow and has a tendency to max out quickly." [S2 · HW2] · "Using pro versions is much more helpful. The free version of Claude was not very helpful." [S4 · HW2] · "If you fix small details all the time, most of your time is going to go for you to wait for the model's output." [S15 · HW2]

"Implementation with LLMs requires a lot of playing/working with the project." [S28 · HW2]

HW 3 · Explore an API

Week 4 · first contact with authentication, keys, and external data

The brief

Find a public API that interests you; build a small Python script, game, or web widget that fetches data from it and does something interactive. Write a README explaining how the API is called, a prompt log, and a short demo video. API keys must be kept out of the repo (environment variables or .gitignore'd config). "Prioritize a thoughtful implementation that does something interesting and more complex than just getting and printing data."
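
For orientation, the core mechanic of the assignment is small: call an endpoint, parse the JSON, do something with it. A minimal sketch using the requests library against Open-Meteo, a keyless public API (endpoint shape current as of this writing); the "interesting and more complex" part the brief asks for is whatever you build on top.

    import requests

    # Open-Meteo: a keyless public API, handy for a first integration.
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": 40.44,          # Pittsburgh
            "longitude": -79.99,
            "current_weather": "true",
        },
        timeout=10,
    )
    resp.raise_for_status()             # surface HTTP errors loudly
    weather = resp.json()["current_weather"]

    # The brief asks for more than get-and-print; even a small decision
    # layer on top of the data starts to qualify.
    temp_c = weather["temperature"]
    print(f"{temp_c} C:", "bring a coat" if temp_c < 10 else "you're fine")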

Responses 39 · Satisfaction 5.1/7 · Read code 3.5/7 · Understood 3.9/7 · No-bugs confidence 3.0/5 · vs. Project 1 difficulty 3.2/5

Tools the class reached for

Gemini 51% · ChatGPT 36% · Claude 26% · GitHub Copilot 23% · Cursor 10% · Antigravity 8%

First assignment where IDE-integrated tools crossed 30% collectively; also the first appearance of Antigravity.

Themes

Two-model workflow: brainstorm and plan with one, implement with another. Common

The pattern that appeared tentatively in HW2 got clearer here: students split "what should I build" from "now build it."

"Planning with Claude / chat / Gemini and using cursor and antigravity to write actual code." [S4 · HW3] · "It was useful for me to develop a plan and brainstorm features with one AI model first, gaining ideas for the project itself. Then implementing with a new AI and incorporating those details in the description helped refine it I think." [S38 · HW3] · "Use general AI like gpt to certain idea and do brainstorm, and use codex or cursor to code and implement the ideas." [S27 · HW3]

One dissent is worth recording:

"It's hard for me to use different ai when I do one project. The logic of different AI is different. I can't intergrade the idea from different AI tools." [S44 · HW3]

Start with a simple, clear idea: don't go with the flow. Common

This is essentially the "incremental / MVP first" advice from HW2, but with a new twist: students warned each other against aimless drift when there's no strong starting vision.

"Having a clear idea / vision was very important. I went with the flow too hard and my project was just very random." [S43 · HW3] · "Start from a simple idea and then build up." [S23 · HW3] · "Implementing code step by step to avoid major bugs." [S20 · HW3]

Be very specific, because the AI can't read your mind. Common

Several students ran into precise phrasing traps, where the same ambiguous English generated wildly different projects.

"For more developed projects, if you don't clearly specify what you mean AI won't be able to do it. It can't read my mind like I thought it would." [S3 · HW3] · "Be very specific when trying to add the api in the project." [S5 · HW3] · "I found that when I would phrase something with 'I want a project that does X so that I can learn Y while building it' I would receive a literal learning module app, rather than an interactive project that would teach concepts by working through the project." [S21 · HW3]

Feed the AI the actual API documentation, or a URL. Common

A concrete technique that multiple students credited for easier integration:

"It felt good to first ask ai to find a database with well define api documentation." [S11 · HW3] · "Make AI read parts of the api file." [S36 · HW3] · "Ask AI to deeply explain the process of executing the AI, which clarified a lot of things in the code, methods and my understanding of an API." [S9 · HW3]

Plus a related finding about model recency:

"I find Google Gemini had some really great recent APIs that I could not only use, but recognized were beginner friendly. It was more updated than things I found in older lists." [S24 · HW3]

When the prompt loop gets stuck, read the code. Some

The HW2 lesson that re-prompting doesn't always work was stated even more crisply here:

"Once there's a bug, it doesn't go away with just prompting you need to read the code to figure it out to be more specific on prompting." [S29 · HW3]

Screenshots / example pages work for UI, color palettes for style. Some

Same show-don't-tell principle as Project 1, applied to HW3 contexts:

"Attaching screenshots really helps the AI with design." [S31 · HW3] · "Its often useful to provide the AI model with an example website interface to model your own of off." [S40 · HW3] · "It's really good have a color palette to start with, and tell LLM a specific style you want (eg. Pixel), otherwise it's going to make it too 'AI-like'." [S32 · HW3]

Don't trust the AI to handle secrets correctly by default. Some

Given that this was the first assignment with API keys, it's notable that the warning came from students who had almost been burned:

"AI is not very specific in what to do to not push an api key unless you tell it explicitly." [S18 · HW3] · "Don't trust that AI tools will persist environment variables correctly." [S22 · HW3]

AI-in-IDE is better at debugging than browser chat. One student

"AI built in in your IDE can be better at debugging." [S35 · HW3]

Miscellaneous observations worth carrying forward. One student each

"At the very least, ChatGPT is quite familiar with Streamlit, a very cool tool for creating easily launch-able websites. I did not even ask it to use it but it automatically did so and performed quite well." [S13 · HW3]

"AI not good at giving terminal related instructions." [S39 · HW3]

"Jetbrains Junie choked trying to do it." [S19 · HW3]

HW 4 · Frontend + Backend

Week 5 · "Server-side development" · two things that have to talk to each other

The brief

Build an application with a separate backend and frontend, deploy it, and make the two halves actually talk to each other in production. The survey asked whether students tested the backend locally before deploying, and the responses make clear the deployment stack included Render, Vercel, PythonAnywhere, Supabase, and a range of APIs (including Google Maps and Google Calendar). Students found it more challenging than HW3 (mean 3.8/5 on the comparison scale).
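
As a reference point for what "the two halves actually talk" means, here is a minimal backend-side sketch using Flask and flask-cors (Flask shows up in student responses; the route and port here are invented for illustration): one JSON endpoint that a frontend deployed on another origin can fetch.

    from flask import Flask, jsonify
    from flask_cors import CORS  # pip install flask-cors

    app = Flask(__name__)
    CORS(app)  # without this, a frontend on another domain gets blocked

    @app.route("/api/ping")
    def ping():
        # The deployed frontend calls fetch("https://<backend-host>/api/ping")
        # and renders the JSON it gets back.
        return jsonify({"status": "ok", "message": "backend is awake"})

    if __name__ == "__main__":
        app.run(port=5000, debug=True)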

Responses 37 · Satisfaction 4.8/7 · Read code 3.8/7 · Understood 3.6/7 · No-bugs confidence 2.8/5 · vs. HW3 difficulty 3.8/5

Tools the class reached for

Gemini 51% · ChatGPT 32% · Claude 27% · Cursor 14% · GitHub Copilot 11% · Antigravity 8%

Themes

Run separate chats (or agents) for frontend, backend, and data. Strong

The clearest architectural lesson of the assignment: when a project has cleanly separable concerns, give each one its own context window.

"For a project like this, split your work into multiple chats. I had the backend, frontend, game logic and architect chats. Every time you switch, you must tell it the current state of the project." [S15 · HW4] · "I found starting with the backend and then moving on to the frontend (once the backend was working) was a good workflow for this assignment." [S37 · HW4] · "It helps to have your AI model break down the process for which it should implement each of the frontend and backend, and then implement them. So it has more direction." [S11 · HW4]

Plan with one LLM, implement with another: now with explicit hand-offs. Common

The brainstorm-then-build pattern got sharper and more specific about what to pass between models:

"Have one llm generate a step by step plan and then give that plan to another llm to implement it." [S4 · HW4] · "Sometimes I'd ask Claude to generate the best cursor prompts for me. Don't underestimate an AI's ability to write prompts." [S45 · HW4] · "I recommend building ideas with GPT/Gemini and coding with codeX/ Cursor." [S27 · HW4]

Deployment is its own beast. Free-tier hosts will surprise you. Strong

This was the first time deployment itself (not the code) was the main source of student frustration. Render spinning down on free tiers came up repeatedly, as did the Vercel serverless model forcing code changes after local work had already succeeded.

"Render shuts down every time I leave it unattended for several minutes, making it very hard to generate results consistently." [S34 · HW4] · "Render went to sleep often." [S6 · HW4] · "It was challenging to deploy render since it takes a while." [S26 · HW4] · "So when using Vercel vs local testing I can't have a server.js file and that forced me to change a couple of things after finally getting everything to be stable after local testing." [S14 · HW4] · "From my conversation with [REDACTED] I found that some APIs are more specific on how you should display your API key. I also found that using a tool like uptime robot to decrease the amount of time to load the page was useful for demonstration." [S21 · HW4]

Test the backend locally and with curl before deploying. Some, hard-won

The clearest version of this lesson came from a student who didn't do it and paid for it:

"Manually check your API calls using Curl on the host server. I ran into a long problem with pythonanywhere denying SMTP requests, but the AI code ate the logs." [S22 · HW4]

"The AI code ate the logs" (i.e., the generated code swallowed error information instead of surfacing it). A good concrete reminder that AI-written error handling can hide the very information you need to debug.

Database, auth, and cross-service config are where AI stops being helpful. Common

A consistent finding: when a bug lives outside the code (in a database schema, in Supabase dashboard settings, in Google Cloud key restrictions), AI assistance gets much weaker.

"I found setting up supabase to be challenging at first as I couldn't have Ai do it automatically for me. So when I ran into issues with the data storage I couldn't just prompt GitHub copilot to fix the issue as the errors weren't in the code but rather in the way I set up my tables in supabase." [S5 · HW4] · "It's harder when it involves jumping between different services, especially for the database configuration part." [S32 · HW4] · "Setting up auth and database was tricky and AI wasn't super helpful." [S31 · HW4] · "Google Maps APIs force you to expose your API key on the front end source code because it's required to make direct requests to Google's services." [S8 · HW4]

The AI won't tell you if what you're asking is feasible. You have to check. Some

A failure mode students hadn't really seen before: the model confidently tries an approach that cannot work in the environment they're in.

"One thing I found challenging is AI not determining if my request is feasible before trying to implement." [S3 · HW4] · "For some reason, AI (or at least Gemini) really doesn't want to suggest that you do something to find the available API models. The error message I was giving Gemini had to do with models, and it included a function I could call to check available models. Gemini never suggested I run that even when none of the models it suggested were working." [S18 · HW4]

When a library fights you: swap libraries or swap models. Some

Concrete tactics for getting unstuck:

"Sometimes when the AI struggles with debugging when using a specific setup, it helps to change the external library or modules used in the code." [S36 · HW4] · "Switching between models to cope with running out of credits." [S16 · HW4] · "It seems to be much easier to get AI to work with Javascript for the backend rather than Python, perhaps because it was trained on that more?" [S17 · HW4]

Understand the platforms you're gluing together, not just the code. Some

"Honestly, the hardest part for me was understanding how ports worked and how to run things locally. Otherwise, I found AI to be pretty helpful in understanding how the backend should connect to the front end. It just struggled to tell me why." [S24 · HW4] · "Sometimes, when I read AI responses, some words are like hyperlink, I can read it but do not know what it represents for. So I spent more time to understand these professional nouns name." [S27 · HW4]

"For a project like this, split your work into multiple chats. Backend, frontend, game logic, and architect chats. Every time you switch, you must tell it the current state of the project." [S15 · HW 4]

Project 2 · Creative Web App

The mid-semester project · ~5–6 hours of new work · first self-directed scope since Project 1

The brief

Design and implement a creative, portfolio-ready web app using AI effectively in the process. Must include at least one of: frontend-backend communication, thoughtful third-party API usage with secure keys, a database, substantial data analysis or visualization, rich interactivity, or computer vision / ML. Deliverables include a deployed app, a public repo, a README, a prompt log, a demo video, and a midpoint check-in. A specific instruction: "you'll need to write or substantially modify at least some of your code, so be careful not to just vibe-code until it's too complicated for you to grasp." This is the first project that combined everything from the first four homeworks into a single scoped deliverable; the heavy agentic and SPEC-driven work (HW8, HW9) was still to come.

Responses 30 · Satisfaction 5.2/7 · Read code 4.4/7 · Understood 4.4/7 · No-bugs confidence 3.4/5 · Challenge 3.6/5

Tools the class reached for

Gemini 47% · ChatGPT 40% · Claude 40% · Cursor 37% · Antigravity 13% · GitHub Copilot 7%

The first assignment where an IDE-integrated agent (Cursor) sat in the top tier alongside the browser chatbots. Four tools pulled at roughly equal share; students were experimenting hard.

Themes

Use multiple AI models deliberately, for what each is best at. Strong

The multi-model workflow had been emerging in quieter form since HW3 and HW4; Project 2 is where it got articulated cleanly. S38's version is worth treating as the class's working definition of the practice:

"Using multiple AI models helped with splitting up tasks; for example using ChatGPT to brainstorm and get ideas then using Claude for a majority of the code and Gemini to debug certain parts. This helped with the model limits and balancing my AI usage, though one downside is that it might not pick up where you left off for other models." [S38 · Project 2]

Related:

"I found it really helpful to plan with cursor, but still found it better to get actual code with Claude over cursor." [S24 · P2] · "Using multiple families of ai models really helps." [S20 · P2] · "Switched from Gemini to claude." [S16 · P2] · "I recommend building ideas with GPT/Gemini and coding with codeX/ Cursor." (carry-over tactic from HW3/HW4, applied again here by multiple students.)

Feed the AI documentation and links, not just descriptions. Common

A direct carry-forward from HW3: pointing the AI at real sources consistently outperforms asking it to remember.

"AI is very good when you give it documents to reference such as API documentation." [S1 · P2] · "It helps so much to paste the link to the website for unique APIs so that the AI can parse it to better understand the nuances and help you code." [S14 · P2]

Understand the platforms, not just the code. Common (clearer than on HW4)

Deployment pain continued; Project 2 brought an explicit diagnosis. The class's advice: if you can't explain the platform to a friend, you'll struggle when it fails.

"You should understand the platforms that you are using to prevent repetitive debugging." [S29 · P2] · "One challenge was working with and understanding the supabase system I used. Also working between multiple platforms — GitHub, vercel, supabase." [S10 · P2] · "When you have to build product ready output having user challenges in mind, its hard to explain AI. It messes up the backend interlinking and fixing it step by step takes a lot of time." [S30 · P2] · "AI has not made setting up live websites easier yet." [S22 · P2]

Sometimes the right answer is to read the docs yourself. Some

By Project 2, students had hit situations where AI's help was actually slowing them down relative to reading the source.

"It wasn't the best to use AI to teach me how to make API calls since it would try to give me the answer many times, so I probably should've read the documentation myself." [S7 · P2] · "I found that Gemini was particularly helpful when I was creating my Python/Flask backend and provided very detailed explanations of the code." [S37 · P2] — the inverse example, where AI-as-tutor worked.

Free-tier limits bite: for models and for APIs. Some

"Free tier APIs really has a lot of limitations…" [S34 · P2] · "Free APIs for making graphs and visualizing data are challenging to use." [S35 · P2] · "Sometimes it might be slower if youre running an ai backend on render than local because a lot of the time free tiers have limited processing like 0.1 cpu or somthing." [S28 · P2]

AI still struggles with cross-screen, cross-file consistency. Some

A specific and persistent limitation for larger projects:

"When there are multiple elements or layers of screens, AI isn't very good at applying formatting changes to all of them consistently." [S18 · P2] · "Antigravity was great but it often created a whole lot of files and folders with barely any code in them. Even now, I don't think I fully understand what each of them do..." [S39 · P2] · "Chat doesn't like to deviate from its own designs." [S6 · P2]

The hardest part wasn't AI or code: it was the idea. One student (echoes HW3's S43)

"I think the hardest part of the project was finding something meaningful to make." [S4 · P2]

Worth noting that the project writeup itself pre-empts this: "Having trouble coming up with an idea? Talk to an actual human! Maybe even our TAs!"

Gemini is forming opinions across your chats: that's a design hazard. One student

"Gemini started thinking I had the same preferences for the code it wrote for this project and other things, even when I said I didn't (like not leaving any comments, which is for Google Sheets formulas)." [S17 · P2]

Memory across sessions is helpful, except when it isn't. Something to be aware of as models build up user-specific context.

HW 6 · Databases / SQL

A database-backed app: the class's favourite assignment by satisfaction and understanding

The brief

Build an application backed by a database (SQLite features prominently in responses), with meaningful CRUD operations and search/filter features. This is the first assignment where student comments start to explicitly celebrate how much they understood what they built, and the numbers back them up.
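
For orientation, the assignment's core loop is compact. A minimal sketch of create-and-search with Python's built-in sqlite3 module (table and column names invented for illustration); the parameterized ? placeholders are the habit worth carrying out of this assignment.

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " done INTEGER DEFAULT 0)"
    )

    # Create: '?' placeholders keep user input out of the SQL string itself.
    conn.execute("INSERT INTO tasks (title) VALUES (?)", ("write README",))
    conn.commit()

    # Read, with the kind of search/filter feature the brief asked for.
    rows = conn.execute(
        "SELECT id, title, done FROM tasks WHERE title LIKE ?", ("%README%",)
    ).fetchall()
    for task_id, title, done in rows:
        print(task_id, title, "done" if done else "open")

    conn.close()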

Responses 35 · Satisfaction 5.3/7 · Read code 5.0/7 · Understood 4.8/7 · No-bugs confidence 3.6/5 · Challenge 2.6/5

Across the whole semester, HW6 has the highest "how much did you read the code" score (5.0/7) and the highest "how much do you understand" score (4.8/7). It's also the least challenging of the four assignments from HW6 onward (2.6 vs. 2.7 on HW7, 2.8 on HW8, 3.3 on HW9). Whatever the class was doing here, it was working.

Tools the class reached for

Gemini 43% · Claude 37% · ChatGPT 31% · Cursor 23% · GitHub Copilot 9% · Antigravity 6%

Notably more browser-chat-heavy than Project 2 was: Cursor actually drops from 37% to 23% here, and the themes below are rich with comments about using AI as a tutor for SQL rather than as an agent that writes it for you.

Themes

Plan first: incrementally, with the AI writing docstrings or scaffolding you then fill in. Strong

The "plan first" theme that started in Project 1 reached its most mature form here. The standout technique, described by S5, is the one the class should carry forward:

"I found using Claude to explain the project and create a plan by initializing all the functions with doctrings and then using GitHub copilot to read each functions doctoring and implement it worked very well." [S5 · HW6]

Several students said similar things in different words:

"When you tell AI how to incrementally build it becomes really easy." [S3 · HW6] · "Starting small is the way to go." [S29 · HW6] · "Working through the steps in order, as suggested, was very helpful." [S9 · HW6] · "By prompting AI with smaller tasks and telling them what to do exactly, they tend to give more stable codes." [S35 · HW6] · "Detailed specs are really important here." [S20 · HW6]

Separate CRUD operations into separate prompts. Some

A specific modularity tactic for this assignment shape:

"I found that separating the different CRUD operations across different prompts was most effective." [S37 · HW6]

The "DHH sandwich": AI plans → you write → AI reviews. One student (but the best single framework in the survey)

Only one student articulated this, and they wrote it up beautifully. Highlighted here because it directly explains why HW6 scored so well on understanding:

"One really helpful thing I learned from this homework was how effective a framework I learned from David Hanson (DHH) is. Essentially, I had AI help generate my plan for a manually coded feature, then gave it my best shot without the AI's help, then gave it all the code I wrote and had it comment on it. This way, I really understood the code from the start, and any edits/improvements the AI suggested, I actually could directly understand why it was making those changes, because I knew how it impacted the lines around it." [S13 · HW6]

AI is noticeably weaker at database-related changes that ripple. Some

A specific and useful limitation the class surfaced: database logic is often touched from many code paths, and AI fails to propagate one change to all of them.

"AI seems to be noticeably worse at noticing how changes with a database might require changes in the rest of a program than it is for non-database related changes." [S17 · HW6] · "Generative ai is usually bad at sql so it takes a lot of prompt refining." [S12 · HW6] · "Database has more dependencies so it's less easy for ai or just agent to work on." [S42 · HW6]

AI became a genuine learning tool for SQL this time. Common (very positive tone)

This is the assignment where appreciation for AI as a tutor (rather than just a code generator) showed up most often and most positively.

"AI was a useful learning tool this time around. It was able to explain everything to me clearly and could communicate well why things were the way they were." [S24 · HW6] · "One thing I found challenging was the unorthodox structure of SQLite, but Claude and the documentation was helpful to understand it." [S45 · HW6] · "One thing that was helpful was uploading docs about SQL for ChatGPT." [S1 · HW6] · "I think the structure of this assignment made me interact more meaningfully with the code. I liked the involvement I had with this assignment." [S21 · HW6]

Know what you're committing to git. One student (worth passing along)

"It's about time to have a talk about what should be tracked in git. Your db likely shouldn't be unless you have a good reason to." [S22 · HW6]

Watch out when the tool tries to go beyond the assignment. Some

Multiple students using Cursor / Antigravity ran into the opposite of the HW2 problem: the agent wants to do too much.

"First time using Cursor and it went well. But it was always trying to do more and more, wasn't perfect for a assignment testing your fundamentals." [S15 · HW6] · "Antigravity is too good. I'm hooked." [S39 · HW6]

AI still can't reliably read designs from unclear descriptions. Some

"AI can have issues with formatting." [S18 · HW6] · "Description on UI is difficult understanding for AI." [S27 · HW6] · "AI is getting better at generating UI based on pictures." [S14 · HW6] — suggesting screenshots remain the fix.

"I found using Claude to explain the project and create a plan by initializing all the functions with doctrings and then using GitHub copilot to read each function's docstring and implement it worked very well." [S5 · HW 6]

HW 7 · Code Handoff

Build the start of a game in phase 1; inherit someone else's half-finished code in phase 2

The brief

In phase 1, students picked a game to start (Pacman and tower defense come up repeatedly in responses, alongside other arcade-style projects). In phase 2 they swapped code with a classmate and had to fix, finish, or extend what they received. The survey asked three unusual things: what did you do to make your code easier to inherit, what made the inherited code easier to work with, and what made it harder. Those three questions together surface the class's collective wisdom on "leaving code for someone else to finish," which is one of the most underrated skills in AI-assisted development.

Responses 33 · Satisfaction 4.4/7 · Read code 3.8/7 · Understood 3.7/7 · No-bugs confidence 2.8/5 · Challenge 2.7/5

Tools the class reached for

Claude 42% · Cursor 36% · Gemini 24% · ChatGPT 18% · GitHub Copilot 18% · Antigravity 12%

Cursor climbs back to 36% (near its Project 2 high) while Gemini and ChatGPT both drop by roughly half. This is the first assignment where the IDE-agent tools collectively rival the browser chatbots, a shift the handoff structure of the assignment rewards: the work lives inside the editor.

Themes: what students did for the next person

A detailed README with a to-do list is the single highest-leverage thing you can leave behind. Dominant

This was the overwhelming consensus. The phrase "READMEs with a to-do list" or close paraphrases appear in well over a third of responses. The key refinement: don't just say what you did; list what's left.

"Yes I tried to clearly state in the read me the features that have already been implemented and created a list of future steps clearly defining what needs to be finished." [S5 · HW7] · "I created a guide document explaining the code and what additions or pieces are missing. I also added very in depth notes and comments throughout the code making it a very clear structure and understandable at most levels." [S10 · HW7] · "Yes, I put in the details in README.md including how to test locally, what functions are there now, what's the project's structure, and suggestions for the future dev." [S32 · HW7] · "Made a detailed README with all details of desired product and to do list." [S39 · HW7] · "I left some specific features out and listed them as starting points." [S45 · HW7] · "I unfortunately was not able to get as much done as I would have like during phase 1 because of a prompt timeout, so I added a to-do list in my reader to help give my partner a starting point." [S37 · HW7]

Ask AI to write meaningful comments as it writes the code. Strong

The companion to a good README is good in-line commentary, and students were explicit about prompting the AI to produce it.

"I asked the AI to write comments in the code. Specifically for tower defense, I asked it to make a clean way to add more tower classes and to tell me how." [S24 · HW7] · "I added documentation throughout the code with comments to explain what each part was doing." [S1 · HW7] · "I explained what features I'd implemented in the readme and made sure my comments were informative and placed wherever needed." [S7 · HW7] · "Made sure to comment all the functions." [S28 · HW7] · "I made a Documentation.me file that explains every code files' usage and purpose." [S47 · HW7]

Architect for extension: split into files and functions that future work can slot into. Common

The students who had thought explicitly about extension points came out ahead.

"Generated and reviewed a nice readme, tried to use best practice packages and break things into separate functions/files." [S31 · HW7] · "I tried to build a very basic but thorough implementation of a basic tower defense game so that the person could easily build upon it." [S4 · HW7] · "I asked [the AI] to make a clean way to add more tower classes and to tell me how." [S24 · HW7]

Document the prompts you used. One student (worth copying)

"I documented my prompts, known issues and things to add, the goal of the app, and made sure the AI coded in a readable way." [S18 · HW7]

The inheritor doesn't just need your code; they benefit from knowing how it got to this state.

A meta-note: "phases" in the README help the next AI as much as the next person. One student

"I included a plan in the read me with all the 'phases' I had planned for my game for AI to build incrementally." [S40 · HW7]

This prefigures the SPEC.md work of HW8 and HW9.

Themes: what made inheriting code easier or harder

Brief free-text advice from inheritors, consolidated.

The class broadly agreed: clarity of README, quality of comments, and modular file structure made inherited code workable. The inverse made it painful. Specific items students called out as making their job easier or harder (paraphrased, because these came from many short responses across two survey questions):

  • Easier: Clear README with file-by-file explanation; a to-do list of remaining work; inline comments near complex logic; consistent naming; a running deployed demo to sanity-check behavior.
  • Harder: Large single files, cryptic variable names (the qp-for-"question points" complaint recurs in HW8), no comments at all, unstated assumptions about the environment, features described in the README that weren't actually implemented.

HW 8 · Agentic Development

Write a SPEC.md, then let the agents build from it

The brief

The explicit structural shift: instead of live conversational prompting, students wrote a SPEC.md document describing the intended software (acceptance criteria and all) and then handed that spec to one or more agents (IDE-based, like Cursor or GitHub Copilot in agent mode) to build, review, and fix. The survey asked about both the spec-writing step and the agent-wrangling step separately, and students rated spec-writing as the harder of the two (3.3/5 vs. 2.5/5 for agent-wrangling). The recommendation score for agentic development for similar-complexity projects was moderately positive: 4.5/7.

Responses

33

Satisfaction

4.9/7

Read code

3.2/7

Understood

3.6/7

Bug confidence

3.0/5

Spec hard?

3.3/5

Agents hard?

2.5/5

Recommend

4.5/7

Tools the class reached for

GitHub Copilot 52% · Cursor 30% · ChatGPT 18% · Claude 15% · Gemini 15% · Antigravity 6%

The biggest tool shift of the semester: GitHub Copilot nearly triples from 18% on HW7 to 52% here. Collectively, over 80% of the class used an IDE-integrated agentic tool on this assignment.

Themes

A detailed SPEC.md pays off, but getting it right takes judgement. Dominant, with real nuance

The most common single claim on HW8: the quality of the spec determined the quality of everything downstream.

"I found that making the spec very detailed from the start helped me to be able to save time later and led to a more desirable output." [S37 · HW8] · "Thoroughness and creativity go a long way in the SPEC.md file. Prompting precisely is incredible important." [S45 · HW8] · "AI works best when you outline a specific plan. It is extremely helpful for them to have the structure built up and only having to fill in the gap." [S35 · HW8] · "AI, like humans, work best when they have plans and are able to organize their own work flow." [S34 · HW8] · "The more specific at first, the easier it gets later." [S42 · HW8]

But the class also identified a concrete failure mode: too much specificity doesn't improve results, it boxes the agent in. One of the best pieces of writing in the whole dataset:

"I found I had to strike a balance between specificity and freedom in my spec. If given too much specificity, the AI could not do its job and the benefit of time saving was removed. If I wasn't specific enough, fixes to the code later on would no longer be simple syntax/functionality changes, but rather fundamental structural changes that the AI is not as good with." [S13 · HW8]

A related observation: underspecify deliberately when you'll want to iterate later.

"It was good at producing the result I described, but it might be helpful to be a little vague with the spec and fix the app later, for time's sake." [S17 · HW8] · "If you don't have something specific to say, don't write it in the plan." [S29 · HW8]

Acceptance criteria are read literally. Everything not listed will be skipped. Common

A specific, technical thing about the SPEC.md format that students learned the hard way:

"The acceptance criteria was taken very seriously and features mentioned elsewhere were not implemented." [S7 · HW8] · "AI is very strict about what counts as passing the acceptance criteria." [S18 · HW8] · "I found it a little challenging to get the agent to follow what was outlined in the spec.md rather than doing its own thing." [S10 · HW8]

Practical implication: if you want something, put it in the acceptance criteria, not in a paragraph somewhere else.
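
A hypothetical SPEC.md excerpt showing the implication above: the details you care about live as checkable acceptance criteria, not in the overview prose.

    ## Overview
    A flashcard study app with decks, a practice mode, and score tracking.

    ## Acceptance criteria
    - [ ] User can create a deck and add, edit, and delete cards in it
    - [ ] Practice mode shows each card once per round, in random order
    - [ ] A round-end screen shows the score and a "practice again" button
    - [ ] Refreshing the page never loses an in-progress round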

Design the user flow before writing the spec. Some (pointed)

Some students got to the end of a spec and realized the design was the missing step all along:

"Designing is an important step we may miss in this way if we want to control the design. I think it should come before writing the spec cleanly. When I realized after completing the spec, I would've made a better spec if i spent time on designing the user flow better." [S30 · HW8] · "I noticed that basic technical understanding is helpful and even necessary for a good spec. I failed to specify file structure. I hope I can obtain enough knowledge about architecture to be able to do so in future projects though." [S39 · HW8]

Use AI to help draft the spec itself. Some

Several students (one with a slight "I know this wasn't really allowed but…" caveat) pointed out that the SPEC.md itself is a document an LLM can help scaffold.

"I know this wasnt really allowed but have an llm brainstorm and write a general skeleton for spec md helped alot actually and helped it to be alot more detailed." [S20 · HW8] · "Asking chat what would be useful to include in spec.md made that step a lot easier." [S40 · HW8]

Conversational → agentic is a reasonable progression. Common (predictive of later HWs)

The best summary of the meta-workflow:

"I think going from conversational to an agentic build might be the best strategy for building with ai." [S4 · HW8] · "With projects that spans multiple files or have bigger complexity, I think using agentic development is a really good starting point and I think writing the spec out actually really helps with organizing your thoughts for what exact functionalities you want versus just going with the flow in increments in conversational AI." [S38 · HW8]

Spawn a new agent when context / code quality deteriorates. Some

Students hit context-window rot differently than in previous assignments; the agentic build ran long enough that context started breaking down.

"I found it very challenging to read and understand the code that the AI wrote. I had to create a new agent so that it could fix all the shitty named variables across the project. e.g. using variable name qp for 'question points' it becomes very hard to read when working with a large file." [S8 · HW8] · "The context is too short for a chat window, if we are plan to do a huge project, agent may not read SPEC to stop the whole project." [S27 · HW8] · "Try different agents." [S23 · HW8] · "I would try using different models to help AI unstuck itself." [S14 · HW8] · "Agents don't share memories." [S36 · HW8]

The real danger of agentic development: you stop reading the code. Some (the most important critique)

This is the single most self-aware observation in the entire HW8 dataset, and echoes through HW9 as well. It's worth quoting at length:

"I found this way of working caused me to not read my code at all. The only reason I understand what's happening at all is because I am fluent in Python. Spec to project was trivially easy, and I thought the finding bugs like this to be more efficient, but maybe in retrospect now, I would've preferred finding the bugs myself through physical testing rather than just trusting the bug existed and then trusting the AI solved it." [S24 · HW8]

"I did not like that it wasn't as conversational. a lot of the commits and file generation was kind of lost on me." [S12 · HW8]

The drop in the "how much of the code did you read" score (5.0 on HW6 → 3.2 here) is the data signature of this concern.

Writing a SPEC.md in a single 45-minute window felt too short. Some (instructor-actionable)

"Writing the Spec in only 45 mins was difficult to encompass everything." [S6 · HW8] · "When writing the SPEC, I was constantly thinking what I was missing." [S15 · HW8] · "Writing SPEC was hard." [S3 · HW8]

HW 9 · Phone Apps

Agentic development, but on a new platform (Expo / React Native / Android Studio)

The brief

From the survey questions and responses, HW9 is "HW8's lessons on a phone app." SPEC.md was optional but most students wrote one (23 of 30). Tools mentioned in responses: Expo, React Native, Android Studio, plus the same agentic build tools (Cursor, Copilot) carried over from HW8. Students rated this the most challenging assignment of the semester (3.3/5) and had the lowest understanding scores of any assignment (3.0/7). It's the hardest thing the class had done.

Responses

30

Satisfaction

4.4/7

Read code

2.9/7

Understood

3.0/7

Bug confidence

2.8/5

Challenge

3.3/5

Tools the class reached for

Claude 40% · Cursor 40% · Gemini 27% · GitHub Copilot 20% · Antigravity 13% · ChatGPT 13%

Themes

Platform setup was harder than coding. Strong

The single most repeated complaint on HW9 isn't about AI at all; it's about tooling. Expo, Android Studio, the live-reload loop, the emulator.

"Setting up expo and having the live updates work was a bit difficult at first." [S41 · HW9] · "The most difficult part was setting up the expo and the novelty of tools working with. The agentic part and prototyping and testing was not that difficult." [S15 · HW9] · "The actual coding part was easier for AI than figuring out the setup." [S18 · HW9] · "Just getting Android Studio to work." [S14 · HW9] · "Make the application run is hard for me." [S44 · HW9] · "It was quite challenging to go through the tutorials and learn how to develop an app for the first time." [S45 · HW9] · "Making a phone app was so different than making a web app — i was very surprised." [S12 · HW9]

SPEC.md still earns its keep: focus it on an MVP. Strong

Carried forward from HW8, with a new emphasis on scoping the spec tightly:

"Spec is the most important thing you can do with your time." [S13 · HW9] · "Writing a good spec made the process very easy, I spent most of my time thinking about the design of the app and writing the spec." [S4 · HW9] · "I found that writing a very very detail spec.md was generally extremely useful." [S1 · HW9] · "When writing the spec, focus on an MVP first." [S31 · HW9] · "The SPEC.md helps give clear instructions to the agents." [S5 · HW9] · "The spec file was really helpful." [S26 · HW9]

A counter-voice (represented, but unusual) questioned its value on this assignment:

"For this one the spec.md is less useful for completing it." [S42 · HW9]

And a caution: writing a spec for a domain you don't understand (mobile) is harder than writing one for a domain you do.

"I found it quite challenging to write a spec when I already wasn't entirely sure how things were working. Like I knew I needed to use a camera function, but had no idea how to write that into my spec other than literally and hope that is made enough sense." [S24 · HW9]

Plan + build + review: one agent builds, another agent reviews. Common (the most mature workflow described)

S30's description is essentially the end-state workflow this course was trying to surface, stated crisply:

"I made a plan after spec using cursor ai's plan mode and created a detailed plan first and then asked one agent to build and other to review. It got done in 3-4 agent prompts." [S30 · HW9]

Related specifics:

"Asking non-agentic model for specific debugging and then pasting its response into agentic model is helpful." [S29 · HW9] · "Having agent actually be able to test the output so that it can find and fix bugs on its own." [S39 · HW9] · "Adding rules to cursor is quite helpful." [S17 · HW9]

Managing agent context is now a recurring design problem. Some

S8's comment here is useful because it names an issue that will get larger, not smaller, as projects scale:

"I found it challenging that as the code base grew the usage for each agent became higher and higher. I would keep spinning up new agents in order to combat this but this doesn't seem sustainable once you have a large enough code base with enterprise code." [S8 · HW9]

"You stop reading the code": the HW8 warning returns. Some

"This might just be me, but whenever I write a SPEC or write something agentically, I naturally never look that deep into the code." [S3 · HW9]

Mobile-specific asides worth sharing. One student each

"It might be more efficient to have a website demo running first, with backend (database, apis, etc) all configured, than start everything from scratch on a mobile dev." [S32 · HW9]

"If you ever need custom icons, typically the ai generated ones arent the best." [S28 · HW9]

"3D graphics is still really hard for Gen-AI." [S6 · HW9]

An honest admission worth sharing. One student

"I found it challenging to improve and figure out where to better the human interaction and ux capabilities and features of the app." [S10 · HW9]


How the class's toolkit shifted

Two pictures that tell the semester's story at a glance.

The same students, surveyed across nine assignments, didn't just get better at writing prompts; what they were prompting changed. Project 1 was done in a browser tab; by HW9 most of the class was inside an agentic IDE. The two figures below trace that transition, assignment by assignment.

Figure 1

Tool adoption by assignment (share of students naming each tool as one of the most useful)

Tool             Project 1   HW 2   HW 3   HW 4   Project 2   HW 6   HW 7   HW 8   HW 9
ChatGPT              26%      34%    36%    32%      40%       31%    18%    18%    13%
Claude               26%      34%    26%    27%      40%       37%    42%    15%    40%
Gemini               31%      44%    51%    51%      47%       43%    24%    15%    27%
GitHub Copilot       10%       7%    23%    11%       7%        9%    18%    52%    20%
Cursor                5%       7%    10%    14%      37%       23%    36%    30%    40%
Antigravity           ·        ·      8%     8%      13%        6%    12%     6%    13%
Three things jump out when the columns are read in chronological order. First, Project 2 is where Cursor makes its biggest single jump (from 14% on HW4 to 37%), but it doesn't stick: adoption dips again on HW6 (a SQL-learning assignment where students gravitated back to browser chatbots as tutors) before climbing to 40% by HW9. Second, GitHub Copilot hovers in the low teens for most of the semester (with a brief bump on HW3) and then more than doubles to 52% on HW8, where its agent-mode features were exactly what the agentic-development assignment called for. Third, the browser chatbots (ChatGPT, Claude, Gemini) don't disappear: they peak on Project 2 and HW6 and only fall off during the agentic arc (HW7–HW9), as students moved the bulk of their work inside the editor.

Figure 2

Student-reported satisfaction, understanding, and code-reading across the semester (1–7 scale)

[Figure: three lines (Satisfaction, Understanding, and How much code was read), plotted on a 1–7 scale across Project 1 through HW 9.]
With the assignments in correct chronological order, a very clean pattern emerges. All three measures rise together through the first five assignments, peak at HW6 (satisfaction 5.3, understanding 4.8, code-reading 5.0, the highest values the class ever reached), and then fall together across the three agentic-workflow assignments that end the semester. The gap between the satisfaction line (how happy students were with the result) and the code-reading line (how much of the code they actually read) widens visibly on HW8 and HW9, and several students named this gap in their comments: "I naturally never look that deep into the code." Understanding on HW9 (3.03 on a 7-point scale) is the lowest the class recorded all semester. That is the data signature of a pattern worth naming in next year's course.

What to tell next year's class

Twelve practices the class earned by doing, ordered with the highest-evidence, latest-stage lessons first.

What follows is a synthesis of the assignment-by-assignment themes above, weighted toward the later work where agentic tools and larger projects stressed the class's methods most. Each practice is tagged with where in the semester the strongest evidence for it came from.

1. Write a detailed plan (ideally a SPEC.md) before any code.

This is the most consistently reinforced habit in the entire dataset. On Project 1 it was "have AI make a plan before it executes." On Project 2 it was "plan with Cursor, then code with Claude." By HW6 it was Claude writing function docstrings that GitHub Copilot then filled in. By HW8 it was a full SPEC.md with acceptance criteria. By HW9, students were calling the spec "the most important thing you can do with your time."

The important refinement from the end of the semester: match the specificity of the spec to how much you want the AI to decide. Too vague and downstream fixes become structural, which AI handles badly. Too detailed and the agent boxes itself in (and you've done the work of writing the code in English instead of Python). Acceptance criteria are read literally; if something isn't there, it won't ship.
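For concreteness, here is a minimal sketch of what an MVP-scoped SPEC.md in this spirit could look like. The app and all of its criteria are invented for illustration; nothing below is drawn from a student submission.

```markdown
# SPEC: Study Timer (MVP)

## Goal
A single-screen countdown timer: start, pause, reset. Nothing else ships in v1.

## Acceptance criteria (the agent reads these literally)
- Start begins a countdown from 25:00.
- Pause freezes the display; Start resumes from the paused time.
- Reset returns the display to 25:00 from any state.
- Runs on one test device; responsive layout is deferred to v2.

## Out of scope for the MVP
Notifications, sounds, history, settings, theming.

## Left to the agent's judgment
Component structure, state management, styling approach.
```

Note how the last two sections encode the balance point directly: the criteria pin down what must ship, while the final section says explicitly which decisions the model is free to make.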

Strongest on · Project 2 · HW 6 · HW 8 · HW 9

2. Use multiple models deliberately, for what each one is best at.

By the time Project 2 arrived (just past the halfway mark), roughly four tools were being used by roughly the same fraction of the class, because students had learned to use them together rather than choose between them. The cleanest articulation came from a Project 2 response: "ChatGPT to brainstorm and get ideas, Claude for a majority of the code, and Gemini to debug certain parts." Cursor-for-planning plus Claude-for-coding was another common split that persisted through the later assignments.

Practical hazards: different models have different conventions (variable names, framework choices, code style) and won't always agree; and none of them share memory across sessions. If it matters, pass the earlier model's output into the next one's prompt rather than hoping it'll pick up where the other left off.

Strongest on · HW 3 · HW 4 · Project 2 · HW 8

3. Move into IDE-integrated agentic tools by mid-semester.

The clearest data story of the course: Cursor adoption rises from 5% on Project 1 to 37% on Project 2, dips briefly on the database assignment, then climbs to 40% by HW9, and over 80% of students used at least one IDE-integrated agent on HW8. Browser chatbots are a legitimate starting point (they are genuinely simpler and better for reading, explaining, and learning); but large projects with multiple files, databases, deployment, or phone-app tooling are where the editor integration earns its keep.

A specific pattern worth naming: Plan Mode first, then Agent Mode. Several students across Project 1, HW2, and HW9 independently arrived at this as their go-to workflow in Cursor / GitHub Copilot.

Strongest on · Project 2 · HW 7 · HW 8 · HW 9

4. Separate concerns. One chat (or agent) per module.

S15's HW4 description is the template: "Backend, frontend, game logic, and architect chats. Every time you switch, you must tell it the current state of the project." Applied to HW6 this meant each CRUD operation in its own prompt. Applied to HW2 it meant multiple files rather than one monster file. Applied to HW7 it became a design principle for what to leave for the next person.

The cognitive load of keeping one clean context per concern is real, but much less than the cognitive load of debugging a single giant AI-written file.
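One lightweight way to make the "tell it the current state" step repeatable is a short hand-off blurb pasted at the top of each new chat. The template below is an invented illustration, not something a student submitted:

```
You are working on the BACKEND module of <project>.
Current state: <what is built, what is deployed, what is broken>.
Relevant files: <paths this chat is allowed to touch>.
Off limits: the frontend and game-logic modules; other chats own those.
Task: <one scoped change>.
```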

Strongest on · HW 2 · HW 4 · HW 6 · HW 7

5. Build incrementally. Get to an MVP before you get fancy.

Present in some form on every assignment. The concrete advice: ask for one feature, one function, or one "change of similar kind" at a time. The prompt "build me a Crossy Road game" produced debugging nightmares; the prompt "add a scoring display to the top bar" got used and reused. For specs (HW8/9), "focus the MVP in the SPEC" is the same principle applied upstream.

Strongest on · HW 2 · HW 6 · HW 9 · every assignment

6. Keep reading the code.

The single most self-aware piece of advice from the late semester, named directly by multiple students on HW8 and HW9: agentic development creates a strong temptation to skim or skip the code entirely. The gap between Figure 2's satisfaction line (how happy you are) and its code-reading line (how much you actually read) on HW8–HW9 is the visual signature of this problem.

The HW6 counter-pattern (read the code, understand it, catch bugs through hands-on testing rather than trusting the AI's fixes blindly) is why HW6 scored highest on every understanding-related metric all semester. The course's own advice applies: "don't just vibe-code until it's too complicated for you to grasp."

Strongest on · Project 2 · HW 6 · HW 8 · HW 9

7. Document for the next person and for the next you.

HW7 made this concrete: a clean README, a to-do list of remaining work, inline comments on non-obvious logic, modular file structure, and (one student added) a log of the prompts you used to get here. A sibling HW9 observation: "phases" in the README help the next agent as much as the next person. Students who inherited code with these things called them life-savers; students who inherited code without them did not use gentle language.
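As a sketch of the shape students described (the section names here are one plausible arrangement, not a course-mandated format), a hand-off README might look like:

```markdown
# <Project name>

## What works today
One paragraph on the current state, including where it is deployed.

## Phases
- Phase 1 (done): data model and CRUD endpoints
- Phase 2 (done): basic UI
- Phase 3 (next): authentication  <- start here

## To-do / known issues
- Validation on the signup form is missing.
- The deployment env vars below are untested on the free tier.

## Prompt log
The prompts (or links to chats) that produced the non-obvious parts.
```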

Strongest on · HW 7 · HW 9

8. Feed the AI concrete references, not descriptions.

Screenshots of broken UI beat text descriptions of the bug. Sketches in Figma beat paragraphs about layout. Links to the actual API documentation beat asking the model to remember what the API looks like. Example websites pasted in beat vibe-based style descriptions. This one pays dividends from Project 1 through Project 2 and gets only more valuable the more niche or recent the thing you're asking about.

Strongest on · Project 1 · HW 2 · HW 3 · Project 2

9. When the chat gets stuck, start a new one.

Context accumulates; eventually it sours. Multiple students across Project 1, HW8, and HW9 independently arrived at "spin up a new chat" or "spawn a new agent" as their most-used debugging move. The HW8 version includes "new agent to clean up the variable names the first agent left behind." The reverse of this practice (prompting harder inside the same polluted context) rarely works.

Strongest on · Project 1 · HW 8 · HW 9

10. Before pasting an error, spend 20 seconds reading it.

S22's framing on Project 1 stayed true all semester: "Even if you just spend 20 seconds looking at it, you will learn a lot more." HW4's version ("manually check your API calls with curl: the AI code ate the logs") is the same principle in a scarier context: when AI-written error handling swallows the error, the only path forward is to step outside the loop and look for yourself.
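The same move works from Python if curl isn't handy. A minimal sketch using the requests library (the endpoint and payload here are hypothetical) prints exactly what the server said, before any app-side error handling gets a chance to swallow it:

```python
import requests

# Hypothetical endpoint and payload: substitute the call your app makes.
resp = requests.post(
    "https://api.example.com/v1/items",
    json={"name": "test"},
    timeout=10,
)

# Look at the raw response yourself before pasting anything into a chat.
print(resp.status_code)                  # e.g. 422, not a generic "request failed"
print(resp.headers.get("content-type"))  # JSON error body, or an HTML error page?
print(resp.text[:2000])                  # raw body, truncated for readability
```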

Strongest on · Project 1 · HW 2 · HW 3 · HW 4

11. Understand the platforms, not just the code.

Deployment ate more time than coding on HW4 and Project 2. Render spinning down on free tiers, Vercel forcing serverless-shaped code, Supabase table settings that aren't visible to the AI, Expo setup for phone apps: these are the places where the AI stopped being useful and the student had to know what was happening. The Project 2 advice is worth taping to a wall: "Understand the platforms that you are using to prevent repetitive debugging."

Strongest on · HW 4 · Project 2 · HW 9

12. Guard your secrets and your rate limits.

API keys won't stay out of your repository unless you explicitly tell the AI to put them in a file that .gitignore excludes (HW3). Free-tier credits on Copilot, Claude, Gemini, and Render all ran out mid-assignment for someone in the class. The students who survived this unscathed tended to switch models when they hit a wall rather than keep pushing the same one, and tended to check what was in their commit history before pushing.
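The pattern that keeps keys out of the repository is small enough to ask for by name. A minimal Python sketch (the variable name MY_SERVICE_API_KEY is hypothetical) reads the key from the environment and fails loudly if it's missing:

```python
import os

# Read the key from the environment instead of hardcoding it in source.
# If you load it from a local .env file (e.g. with python-dotenv), make
# sure ".env" is listed in .gitignore *before* the first commit.
api_key = os.environ.get("MY_SERVICE_API_KEY")

if not api_key:
    raise RuntimeError(
        "MY_SERVICE_API_KEY is not set; export it in your shell or add it "
        "to an untracked .env file."
    )
```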

Strongest on · HW 3 · HW 4 · Project 2 · HW 6


One closing thought

There is a through-line in the second-half comments that is less tactical than the twelve practices above, and it's worth stating plainly: the students who learned the most were the ones who refused to let the AI think for them. The Project 2 reminder that sometimes the right answer is to read the documentation yourself. The DHH sandwich from HW6 (AI plans, you write, AI critiques). The SPEC.md balance point from HW8 (specific enough to constrain, loose enough to leave judgment to the model). The HW9 warning that you'll "naturally never look that deep into the code" if you're not careful.

Future cohorts will inherit more capable tools; the adoption table in Figure 1 will look even more shifted toward agentic modes next year. The advice that will age best isn't about any specific tool; it's about keeping a hand on the steering wheel while the AI does more of the typing.