Name(s): ________________________________________________

15-494/694 Cognitive Robotics Lab 8:
Convolutional Neural Networks

I. Software Update and PyTorch Setup

At the beginning of every lab you should update your copy of the vex-aim-tools package. Do this:
$ cd ~/vex-aim-tools
$ git pull
In addition, you will need to install PyTorch and Matplotlib. First, activate the same Python virtual environment you use to run simple_cli. Then do the following:
$ pip install torch torchvision matplotlib
Note: if you're using Virtual Andrew and run out of disk space, you may need to create a Python environment in C:\Users\myuserid instead of in the default Desktop location in the andrew.ad.cmu.edu file system.

II. Experiments with the MNIST Dataset and Linear Models

You can do this part in teams of two if you wish. When answering the questions below, you are encouraged to refer back to the lecture slides.
  1. Make a lab8 directory.

  2. Download the files mnist_data.zip, mnist1.py, mnist2.py, mnist3.py into your lab8 directory.

  3. Unzip the mnist_data.zip file.

  4. Skim the mnist1.py source code. This is a linear neural network with one layer of trainable weights.

  5. Have a look at the PyTorch documentation, and specifically the documentation for torch.nn.Linear.
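As a warm-up for reading the documentation, here is a minimal sketch of a torch.nn.Linear layer with the same dimensions as mnist1 (784 inputs for a flattened 28x28 image, 10 outputs, one per digit class):

```python
import torch
import torch.nn as nn

# A single linear layer: 784 inputs (one per pixel), 10 outputs (one per digit).
layer = nn.Linear(784, 10)

x = torch.zeros(1, 784)        # a batch containing one flattened image
y = layer(x)

print(y.shape)                 # the 10 output activations for that image
print(layer.weight.shape)      # PyTorch stores the weights as (out, in) = (10, 784)
print(layer.bias.shape)        # one bias per output unit
```

Note that PyTorch stores the weight matrix with one row per output unit, which is worth remembering when you inspect parameters in the next step.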

  6. Run the model by typing "python3 -i mnist1.py". The "-i" switch tells python not to exit after running the program. Press Enter to see each output unit's weight matrix, or type control-C and Enter to abort that part.

  7. Try typing the following expressions to Python:
    • model
    • params = list(model.parameters())
    • params
    • [p.size() for p in params]
    The first parameter is the weight matrix, which PyTorch stores as 10x784 (one row of 784 weights per output unit); the second one is the vector of 10 biases.

  8. How long did each epoch of training take, on average? ________________

  9. If your laptop has a GPU, modify the model to use the GPU instead of the CPU. (You just have to uncomment one line and comment out another.)

  10. Run the model on the GPU if you can. How long does each epoch take now? ________________
    Are you surprised? GPUs don't help for small models, and a model with only a few thousand weights is small.
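The one-line change in the mnist scripts follows the standard PyTorch device idiom. A minimal sketch (the actual lines in mnist1 may be organized differently):

```python
import torch
import torch.nn as nn

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)   # move the weights to the device
x = torch.zeros(32, 784).to(device)     # inputs must live on the same device
y = model(x)

print(y.device)
```

The key point is that the model and its input tensors must be on the same device, or PyTorch will raise an error.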

  11. If you run mnist1 a second time, you won't get exactly the same result. Give two reasons for this: ________________________________________________
    ________________________________________________________________

  12. Skim the code for the mnist2 model. This model has a hidden layer with 20 units. Each hidden unit is fully connected to the input and output layers.
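For reference while you skim, a network of this shape can be sketched as below. This is only an illustration in the spirit of mnist2; the actual script may differ in details such as the choice of nonlinearity:

```python
import torch.nn as nn

# A fully connected network with one hidden layer of 20 units.
model = nn.Sequential(
    nn.Linear(784, 20),   # input layer -> hidden layer
    nn.Tanh(),            # nonlinearity (mnist2 may use a different one)
    nn.Linear(20, 10),    # hidden layer -> output layer
)

# Tally the trainable weights: 784*20 + 20 + 20*10 + 10 = 15910.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```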

  13. Run the mnist2 model on the CPU. How long does each epoch of training take, on average? ________________

  14. You can use the show_hidden_weights() and show_output_weights() functions to display the learned weights.

  15. If you have a GPU available, modify the mnist2 code to run on the GPU. How long does each epoch take now? ________________

III. Experiments with the MNIST Dataset and a Convolutional Model

You can do this part in teams of two if you wish.
  1. Skim the code for the mnist3 model.

  2. Run the model on the CPU. Look at some of the kernels the model learns.

  3. How many parameters does this model have, where each parameter is a tensor? ________________

  4. What are the parameters of this model? Describe them in English. ________________________________________________
    ________________________________________________________________

  5. Note that two of the parameters are batch normalization values (means and variances) created by the BatchNorm2D layer. The rest are weights. (Biases are considered to be weights.) Looking at the sizes of the various weight and bias tensors, how many total weights does this model have? Show your calculation. ____________________________________

    A convolutional neural network is a "virtual" network where each kernel is replicated many times, but we don't actually build out all the units and connections as individual data structures, since they share the same weights. When running data through the network, though, we still have to do all the multiply and accumulate operations as if we had built out the network, so the number of "effective" weights is many times the number of weight parameters. How many effective weights are in the mnist3 model? Show your calculation. ________________________________________________
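A handy way to do this kind of tally is to enumerate the parameter tensors programmatically. The model below is just a stand-in with one convolutional layer; in the lab you would apply the same loop to the mnist3 model. Note that in PyTorch the batch-norm running means and variances are stored as buffers rather than trainable parameters, so they show up under named_buffers() instead of named_parameters():

```python
import torch.nn as nn

# Stand-in model: one conv layer plus batch normalization.
model = nn.Sequential(nn.Conv2d(1, 8, 5), nn.BatchNorm2d(8))

# Trainable parameters: conv weights (8*1*5*5), conv biases (8),
# batch-norm scale (8) and shift (8).
for name, p in model.named_parameters():
    print(name, tuple(p.size()), p.numel())
print("total weights:", sum(p.numel() for p in model.parameters()))

# Batch-norm running statistics live in buffers, not parameters.
for name, b in model.named_buffers():
    print("buffer:", name, tuple(b.size()))
```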

  6. If you are able to run this model on the GPU, how long does each epoch of training take, on average? ________________

IV. Object Recognition with MobileNet

You can do this part in teams of two if you wish.
  1. Run the MobileNet demo on the robot. Note: to install this demo you must download both MobileNet.fsm and the labels.py file found in the same directory.

  2. Use your cellphone to call up a picture of a cat and show it to the robot.

  3. Type "tm" to tell the program to proceed with recognition. Did it recognize the cat?

  4. Try some dog breeds, and some other object classes such as airplanes or cars.

V. Homework

These homework problems should be done individually, not as a team.

  1. Teach Celeste to use Flash(). You implemented the Flash() node in lab 4. We now have a version of Flash included in nodes.py. By constructing the right list structure, it's possible to generate complex flashing behaviors, such as the alternating red and blue example given in the lab:
    	blue = vex.Color.BLUE
    	red = vex.Color.RED
    	Flash([ ((blue, red, blue, red, blue, red), 2),
    	        ((red, blue, red, blue, red, blue), 2) ])
        
    For this FlashPattern problem, create a modified version of GPT_test by implementing a #flash action and teaching Celeste to use it as part of your preamble. If successful, you should be able to describe a flashing pattern to Celeste in plain English and have the robot perform that pattern. For example, you might say "Flash all LEDs alternately green and off." This is an open-ended problem, since there is no limit to the complexity of potential patterns. But see how far you can get using a carefully crafted prompt. Hopefully you can get Celeste to implement the red and blue example above. Your prompt may need to explain concepts like "alternating positions" and give an example or two of each concept. You can make up any syntax you like for the #flash action Celeste will generate.

    To keep things simple, you can limit yourself to the built-in colors like vex.Color.RED instead of dealing with arbitrary (r,g,b) triples, and don't worry about specifying the duration of each pattern. You can use a fixed duration for the Flash action so that the Flash node will post a completion event.

  2. How good is GPT-4o at recognizing digits? Download the file Lab8a.fsm, which asks GPT to look at an image and report the digits that it sees. This program uses a new node GPTOneShot that is simpler than AskGPT: it does not include the preamble, maintain the conversational history, or include world map information. But it can send an image along with the query.

    Write a program ReadNumber based on Lab8a that tests GPT-4o's digit recognition abilities. It should look in the camera image and return an integer based on the digits it sees. You will need to modify the query text to make GPT's answer less verbose, and do some post-processing of the response to convert the string to an integer. The robot should speak the integer.


    Test your program by generating images of digits and seeing how well it does. You can point the robot at your laptop screen or phone to show it the images. How well does it work if the digits are in a strange font, such as a script or Gothic or OCR font? What if they're upside-down? What if they're hand-drawn like in the image above? Try at least six different images. Note: you can take a snapshot of the camera viewer by typing "s" in the camera viewer window.
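The post-processing step can be as simple as pulling the first run of digits out of GPT's reply. The helper below is hypothetical (the real reply format depends on how you word your query), but it shows one robust approach:

```python
import re

def extract_number(reply):
    """Return the first run of digits in GPT's reply as an int, or None.

    Hypothetical helper for ReadNumber; the actual reply format depends
    on your query wording."""
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else None

print(extract_number("I see the digits 42."))   # 42
print(extract_number("No digits are visible."))  # None
```

Writing the query so GPT answers with digits only makes this conversion more reliable than parsing a full sentence.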

Hand In

Hand in your written responses to the MNIST questions at the end of today's lab. Be sure to put your name on the sheet.

For FlashPattern, hand in your source code plus a brief writeup listing the requests you gave to Celeste that it was able to successfully implement. Grading will be based on the richness of the patterns your program can handle.

For ReadNumber, hand in your source code plus a set of images you tried, and the results for each image. It's okay if GPT doesn't get every image correct. For example, it may have trouble with upside-down digits. Grading will be based in part on how wide a variety of digit images you generate.


Dave Touretzky