Name(s): ________________________________________________
15-494/694 Cognitive Robotics Lab 8: Convolutional Neural Networks
I. Software Update and PyTorch Setup
At the beginning of every lab you should update your copy of the
vex-aim-tools package. Do this:
$ cd ~/vex-aim-tools
$ git pull
In addition, you will need to install PyTorch and Matplotlib. First, activate the
same Python virtual environment you use to run simple_cli. Then do the following:
$ pip install torch torchvision matplotlib
Note: if you're using Virtual Andrew and run out of disk space, you
may need to create a Python environment in C:\Users\myuserid instead of
in the default Desktop location in the andrew.ad.cmu.edu file
system.
II. Experiments with the MNIST Dataset and Linear Models
You can do this part in teams of two if you wish. When answering the
questions below, you are encouraged to refer back to the lecture
slides.
- Make a lab8 directory.
- Download the files
mnist_data.zip,
mnist1.py,
mnist2.py,
mnist3.py
into your lab8 directory.
- Unzip the mnist_data.zip file.
- Skim the mnist1.py source code. This is a linear neural
network with one layer of trainable weights.
- Have a look at
the PyTorch
documentation, and specifically the documentation
for torch.nn.Linear.
- Run the model by typing "python3 -i mnist1.py". The "-i" switch tells Python not to exit
after running the program. Press Enter to see each output unit's weight matrix, or
type control-C and Enter to abort that part.
- Try typing the following expressions to Python:
- model
- params = list(model.parameters())
- params
- [p.size() for p in params]
The first parameter is the weight matrix, which PyTorch stores as 10x784
(out_features x in_features); the second is the vector of 10 biases.
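For reference, here is a minimal sketch of what that inspection looks
like (this assumes model is the single Linear layer that mnist1.py
builds; the actual script may differ in details):

import torch
import torch.nn as nn

# One layer of trainable weights: 28x28 = 784 pixel inputs, 10 digit classes.
model = nn.Linear(784, 10)

params = list(model.parameters())
print([p.size() for p in params])
# Prints: [torch.Size([10, 784]), torch.Size([10])]
# PyTorch stores Linear weights as (out_features, in_features).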
- How long did each epoch of training take, on average? ________________
- If your laptop has a GPU, modify the model to use the GPU
instead of the CPU. (You just have to uncomment one line and
comment out another.)
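If you're unsure which lines those are, the usual pattern looks
something like the sketch below (the exact variable names in mnist1.py
are assumptions here):

import torch

device = torch.device('cpu')
#device = torch.device('cuda')   # uncomment (and comment out the line above) to use the GPU

model = model.to(device)         # move the weights to the chosen device
# Each batch of inputs must be moved the same way: images = images.to(device)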
- Run the model on the GPU if you can. How long does each epoch
take now? ________________
Are you surprised? GPUs offer little speedup for small models, and a
network with only a few thousand weights is small: the overhead of
moving data to and from the GPU can outweigh any gain from parallelism.
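If the script doesn't report per-epoch times itself, you can bracket
the training loop with a timer; a minimal sketch (the loop structure
here is an assumption about the script):

import time

num_epochs = 10                  # however many epochs the script runs
for epoch in range(num_epochs):
    start = time.time()
    # ... one full pass over the training set ...
    print(f"epoch {epoch}: {time.time() - start:.2f} seconds")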
- If you run mnist1 a second time, you won't get exactly the same result. Give
two reasons for this: ________________________________________________
________________________________________________________________
- Skim the code for the mnist2 model. This model has a hidden
layer with 20 units. Each hidden unit is fully connected to the
input and output layers.
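The architecture just described can be sketched as follows (the real
mnist2.py may use different layer names or a different activation
function):

import torch.nn as nn

# 784 inputs -> 20 fully connected hidden units -> 10 outputs.
model = nn.Sequential(
    nn.Linear(784, 20),   # hidden layer: 784*20 weights + 20 biases
    nn.ReLU(),            # assumed nonlinearity; check mnist2.py for the real one
    nn.Linear(20, 10),    # output layer: 20*10 weights + 10 biases
)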
- Run the mnist2 model on the CPU. How long does each epoch of
training take, on average? ________________
- You can use the show_hidden_weights() and show_output_weights() functions to display
the learned weights.
- If you have a GPU available, modify the mnist2 code to run on
the GPU. How long does each epoch take now? ________________
III. Experiments with the MNIST Dataset and a Convolutional Model
You can do this part in teams of two if you wish.
- Skim the code for the mnist3 model.
- Run the model on the CPU. Look at some of the kernels the
model learns.
- How many parameters does this model have, where each parameter
is a tensor? ________________
- What are the parameters of this model? Describe them in
English. ________________________________________________
________________________________________________________________
- Note that two of the parameters are batch normalization values
(means and variances) created by the BatchNorm2d layer. The rest
are weights. (Biases are considered to be weights.) Looking at
the sizes of the various weight and bias tensors, how many total
weights does this model have? Show your calculation.
____________________________________
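One way to tally the tensors yourself, after running with
"python3 -i" (note that PyTorch keeps BatchNorm's running mean and
variance as buffers rather than trainable parameters, so they are
listed separately):

for name, p in model.named_parameters():
    print(name, tuple(p.size()))          # weight and bias tensors
for name, b in model.named_buffers():
    print(name, tuple(b.size()))          # e.g. BatchNorm running statistics
print(sum(p.numel() for p in model.parameters()))   # total trainable weights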
A convolutional neural network is a "virtual" network where each
kernel is replicated many times, but we don't actually build out
all the units and connections as individual data structures, since
they share the same weights. When running data through the
network, though, we still have to do all the multiply and
accumulate operations as if we had built out the network, so the
number of "effective" weights is many times the number of weight
parameters. How many effective weights are in the mnist3 model?
Show your calculation.
________________________________________________
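As a worked example of this kind of calculation (with made-up layer
sizes, not the actual mnist3 architecture): suppose a convolutional
layer has 8 kernels of size 5x5 applied to a 28x28 image with no
padding, so each kernel slides over a 24x24 grid of positions.

kernel_weights   = 8 * 5 * 5        # 200 weight parameters
output_positions = 24 * 24          # 576 positions per kernel
effective        = kernel_weights * output_positions
print(effective)                    # 115200 effective weights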
- If you are able to run this model on the GPU, how long
does each epoch of training take, on average? ________________
IV. Object Recognition with MobileNet
You can do this part in teams of two if you wish.
- Run the MobileNet demo
on the robot. Note: to install this demo you must download both MobileNet.fsm
and the labels.py file found in the same directory.
- Use your cellphone to call up a picture of a cat and show it to the robot.
- Type "tm" to tell the program to proceed with recognition. Did it recognize the cat?
- Try some dog breeds, and some other object classes such as airplanes or cars.
V. Homework
These homework problems should be done individually, not as a team.
- Teach Celeste to use Flash(). You implemented the
Flash() node in lab 4. We now have a version of Flash included in
nodes.py. By constructing the right list structure, it's possible
to generate complex flashing behaviors, such as the alternating
red and blue example given in the lab:
blue = vex.Color.BLUE
red = vex.Color.RED
Flash([ ((blue, red, blue, red, blue, red), 2),
        ((red, blue, red, blue, red, blue), 2) ])
For this FlashPattern problem, create a modified version of
GPT_test by implementing a #flash action and teaching Celeste to
use it as part of your preamble. If successful, you should be
able to describe a flashing pattern to Celeste in plain English
and have the robot perform that pattern. For example, you might
say "Flash all LEDs alternately green and off." This is an
open-ended problem, since there is no limit to the complexity of
potential patterns. But see how far you can get using a carefully
crafted prompt. Hopefully you can get Celeste to implement the
red and blue example above. Your prompt may need to explain
concepts like "alternating positions" and give an example or two
of each concept. You can make up any syntax you like for the
#flash action Celeste will generate.
To keep things simple, you can limit yourself to the built-in
colors like vex.Color.RED instead of dealing with arbitrary
(r,g,b) triples, and don't worry about specifying the duration of
each pattern. You can use a fixed duration for the Flash action
so that the Flash node will post a completion event.
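For instance, the "alternately green and off" request above might map
to a Flash list like this sketch (assuming vex.Color.BLACK serves as
"off"; check nodes.py for the actual convention your Flash node
expects):

green = vex.Color.GREEN
off   = vex.Color.BLACK   # assumption: black = LEDs off
Flash([ ((green, off, green, off, green, off), 2),
        ((off, green, off, green, off, green), 2) ])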
- How good is GPT-4o at recognizing digits? Download the
file Lab8a.fsm, which asks GPT to look at
an image and report the digits that it sees. This program uses a
new node GPTOneShot that is simpler than AskGPT: it does not
include the preamble, maintain the conversational history, or
include world map information. But it can send an image along
with the query.
Write a program ReadNumber based on Lab8a that tests
GPT-4o's digit recognition abilities. It should look in the
camera image and return an integer based on the digits it sees.
You will need to modify the query text to make GPT's answer less
verbose, and do some post-processing of the response to convert
the string to an integer. The robot should speak the integer.
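A simple way to do that post-processing is to pull the first run of
digits out of the reply; a sketch (the function name here is
hypothetical):

import re

def extract_integer(response):
    # e.g. "I see the number 42." -> 42; returns None if no digits found.
    match = re.search(r'\d+', response)
    return int(match.group()) if match else None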

Test your program by generating images of digits and seeing how
well it does. You can point the robot at your laptop screen or
phone to show it the images. How well does it work if the digits
are in a strange font, such as a script or Gothic or OCR font?
What if they're upside-down? What if they're hand-drawn like in
the image above? Try at least six different images. Note: you
can take a snapshot of the camera viewer by typing "s" in the
camera viewer window.
Hand In
Hand in your written responses to the MNIST questions at the end of
today's lab. Be sure to put your name on the sheet.
For FlashPattern, hand in your source code plus a brief writeup
listing the requests you gave to Celeste that it was able to
successfully implement. Grading will be based on the richness of the
patterns your program can handle.
For ReadNumber, hand in your source code plus a set of images you
tried, and the results for each image. It's okay if GPT doesn't get
every image correct. For example, it may have trouble with
upside-down digits. Grading will be based in part on how wild a
variety of digit images you generate.