Welcome
Welcome to 15-213. We're really excited to be here, and we hope that
you are, too. Randy Bryant and I (Gregory Kesden) will be co-teaching
the course this semester. I'm particularly excited, because it is
my first time teaching the course.
I've taught the OS course as either the lab instructor or instructor
for five of the last six semesters. And, in that time, I have really
observed the impact of 15-213. We now require it as a prerequisite
for the other systems classes -- and with good reason. The impact
has been tremendous. I'm very excited to be here, because I'll
get to see what is "inside of the box" and how the magic happens.
So, Why Are We Here?
So far in the curriculum, you have studied computer science from
the perspective of many different and powerful abstractions.
You have studied High Level Languages (HLLs) such as
Java. You have learned to specify and use Abstract Data Types
(ADTs), such as classes. You have also studied powerful generic
algorithms for solving problems, and learned to analyze their
performance through amortized analysis and upper-bound (big-O)
complexity analysis.
No doubt about it, through abstraction you have, and will continue,
to learn many tremendous things. But, what does abstraction mean?
Abstraction: From Latin, abstractus and abstrahere.
The prefix ab- means "from or away" and the root trahere (past
participle tractus) means "to drag or pull".
Abstract (V) means "to drag or pull away". And Abstraction (N)
is what is left after something has been pulled away and isolated
from its own, or any, real circumstances.
That's what we've done so far. We've dragged important ideas away
from some distracting realities to make them clearer. But, in
the real world, the ugly details sometimes matter. That's why we're
here. We're here to learn about real computers -- the machines,
themselves, with all of their real-world parts and esoteric features.
Let's take a few minutes to look at a few examples of real-world
details that are inconsistent with common abstractions.
I think these examples will illustrate at least a few good reasons
to study the inner workings of real-world hardware and software
systems.
"ints" are not Integers and "floats" are not Reals
In the abstract, "ints" are just like Integers. But, in reality,
nothing is infinite. All of our data types have finite capacities.
If an "int" gets too big for its storage, it no longer models an
integer. Its value may change erratically -- and may even become
negative. This is what we call "overflow".
Many people dismiss overflow and blame the problem on the machine.
But the machine is not at fault -- it is functioning correctly.
Instead, the problem is that the abstraction doesn't exactly match
the real world of the machine.
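To see this concretely, here is a small C sketch; the values are just
an illustration, and it assumes the common case of 32-bit ints:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        int big = INT_MAX;                 /* 2147483647 with 32-bit ints */
        printf("big     = %d\n", big);
        /* One more than the maximum: technically undefined behavior for
           signed ints in C, but on typical machines the value wraps
           around and suddenly looks very negative. */
        printf("big + 1 = %d\n", big + 1);

        unsigned int ubig = UINT_MAX;      /* unsigned wrap-around is well
                                              defined: it goes back to 0 */
        printf("ubig + 1 = %u\n", ubig + 1);
        return 0;
    }
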
Here's another one. I coach our team for the world-wide ACM
programming competition. Although it was slightly before my time,
the team actually lost the world finals in Amsterdam because of a
problem like this. And last year, when we finished 20-something
in the world, we missed out on a top-10 standing because of an
error like this.
Here's the question:
(x + y) + z ?= x + (y + z)
Almost everyone in the class immediately reacted "yes" -- and a few
added "associative property of addition". But, this isn't necessarily
the case if x, y, and z are floating point numbers.
Floating point numbers are designed to make the best use they can of
limited storage. We'll take a look at the details very soon. But for
now, let's think of things this way. If a number is very small,
the decimal point floats left and the bulk of the space is used to
store the fraction with great precision. If the number is very
large, the decimal point floats right, and the space is used to
track the integer component of the number. As the decimal point
moves around with computation, "rounding error" is introduced.
In the example above, we can run into problems if the numbers are
of vastly different sizes. Due to "rounding error", a small value can
be absorbed and lost when it is added to a much larger one, so the
grouping matters. The left side and right side may end up with
slightly different values.
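Here is a minimal C sketch of the effect; the particular constants
are made up, but any mix of very large and very small magnitudes
will show it:

    #include <stdio.h>

    int main(void) {
        float x = 1e20f, y = -1e20f, z = 3.14f;

        /* x and y cancel exactly, so z survives. */
        printf("(x + y) + z = %f\n", (x + y) + z);   /* about 3.14 */

        /* z is tiny next to y, so y + z rounds back to y, and the
           final sum is exactly 0 -- z has vanished. */
        printf("x + (y + z) = %f\n", x + (y + z));   /* 0.000000 */

        return 0;
    }
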
This is why we tell the programming team that they are never
allowed to use == with floats or doubles. Instead, they should
subtract and check that the absolute value of the result is less
than some small epsilon. This makes sure that "noise" doesn't become
meaningful.
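In code, the rule looks roughly like this; the tolerance EPSILON
below is a hypothetical value, and choosing a sensible one depends
on the scale of your data:

    #include <math.h>

    #define EPSILON 1e-6   /* hypothetical tolerance -- tune to your data */

    /* Use this instead of (a == b) for floats and doubles. */
    int nearly_equal(double a, double b) {
        return fabs(a - b) < EPSILON;
    }
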
I also experienced a related problem when doing research with
neural networks. The decision to implement the network using
floating point numbers instead of integers led to a problem with
rounding error (almost) dominating the computation. We fixed it by
scaling and normalizing our input to a range that suffered less
rounding...but the real solution (pardon the pun) would have been
to recognize the problem and use ints. ints can represent just as
many distinct values -- all of them exactly -- and don't have the
rounding problems.
Memory is complex, and it comes in all different shapes, sizes, and
styles
Computers have different types of storage. Disk. Main memory. L2-cache,
L1-cache. Registers. They work together, and in the common case
can appear to form a nearly-ideal storage system. But, in certain
edge cases, things may fall apart. Although caches, small amounts
of very fast memory, are designed to hold onto the data that is
most critically needed, in reality, the process is speculative and
reactive, not perfect. Bad memory access patterns can ruin a cache's
effectiveness. You'll see this in the optimization lab.
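As a hedged preview of what that lab explores, the two loops below
compute the same sum over the same array, but the first walks memory
in the order C lays it out (row by row) while the second hops across
rows and tends to miss the cache far more often for large N. (The
size N is arbitrary, and actual timings depend on the machine.)

    #define N 2048
    double a[N][N];

    /* Row-major traversal: consecutive accesses are adjacent in memory,
       so the cache line fetched for a[i][j] also serves a[i][j+1]. */
    double sum_rowwise(void) {
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    /* Column-major traversal: each access jumps N * sizeof(double)
       bytes ahead, so for large N nearly every access misses. */
    double sum_colwise(void) {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        return sum;
    }
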
Many people take the effectiveness of the memory system for granted.
Most of the time, it works well by itself. But, if your application
is one that breaks it -- you will suffer unless you know how to fix
it. I had this experience when I was doing Automated Target
Recognition (ATR) research for a company called Neural Technologies,
Inc. (NTI). Prior to using neural networks to classify images, we
did extensive preprocessing. This preprocessing used wavelet filters
that looked at pixel values, performed some computation, and then
made adjustments. It had to consider each pixel in the image many,
many times. Memory access latency -- often from cache misses --
contributed to weeks' worth of delay during testing. Perhaps if we had
paid more attention to this part of our design before the problem
got so large, we could have reduced the problem (or, perhaps,
that was just as good as it got).
Another memory-related reality is that objects that are unrelated in
your mind might be right-next-door in memory. This doesn't matter,
until something goes wrong and one object scribbles outside of its
box into a neighboring object's memory. The results can be very, very
hard to debug. Unless you have an understanding about how memory is
organized and where objects live within memory, it can be hard to
find the offending code.
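Here is a deliberately broken C sketch of the kind of bug this
describes. The struct is made up, and exactly which neighbor gets
clobbered depends on how the compiler lays things out -- which is
precisely why these bugs are so hard to track down:

    #include <stdio.h>
    #include <string.h>

    struct record {
        char name[8];    /* too small for the string below */
        int  balance;    /* the unsuspecting neighbor      */
    };

    int main(void) {
        struct record r;
        r.balance = 100;
        /* Copies 19 bytes (18 characters plus the terminator) into an
           8-byte field. The extra bytes scribble into whatever lives
           next door -- quite possibly balance. Undefined behavior. */
        strcpy(r.name, "much_too_long_name");
        printf("balance = %d\n", r.balance);   /* probably not 100 */
        return 0;
    }
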
My longest all-nighter in college or grad school was the result of
an error like this -- 4 days straight! I actually suffered delusions.
Only an effective understanding of memory -- and skill using tools
like debuggers -- can make this type of situation manageable.
Assembly Is the Only Language the Computer Understands
We usually program in an HLL like C, C++, or Java. We do this because
these languages map well to our problem domain. Unfortunately, they
don't map well to the actual organization of the machine. So, the
compiler translates them into assembly code (and ultimately machine
code, which is roughly equivalent) so the computer can execute the
instructions. Although we're not likely to program a lot in assembly,
one thing is true: no high-level language can be more powerful than
assembly.
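You can peek at that translation yourself. A tiny, hedged example --
the exact assembly you get back depends on your compiler, its
version, and the optimization flags:

    /* add.c -- run "gcc -S add.c" (or "gcc -O2 -S add.c") and the
       compiler stops after translation, leaving its assembly output
       in add.s instead of producing an object file or executable. */
    int add(int a, int b) {
        return a + b;
    }
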
But, more importantly, we may need to look at assembly from time to
time. It is a useful debugging tool. As a graduate student, I found
an incorrect optimization in the Sun Workshop C compiler by
reading a piece of the assembly. The problem caused my team's project
to hang occasionally if it was compiled with "-O". Without the ability
to do this, we would have been dead in the water -- our running
hypothesis had been a pointer-related memory error.
There are also other times when assembly might be useful. Sometimes
you find yourself debugging an application that is linked against
someone else's libraries. Although debugging in this environment isn't
fun, being able to read the assembly can often provide valuable hints
about what is happening "at the other side" (as can looking at the
names of variables preserved as part of the debugging information).
Perhaps most importantly, studying assembly helps us to learn how
the compiler works. By understanding that process, we can write better
code in HLLs. We can select constructs that map better to the
hardware, avoid constructs that don't optimize well, &c.
I/O is critical
Computers are pretty close to useless unless they can exchange
information with their environment. But, there is little in the
computer world that is more diverse or more esoteric than
I/O devices. From disks to networks, each device, and the software
systems we build around them, can have a very different "personality".
Only by understanding the machinery can we write software that is
truly correct.
I shudder to think about how many "network errors" are really
programming errors. The same goes for "heat related" problems
associated with disks. The edge cases are tough and programmers
often squint and ignore them -- but for truly important or
dependable systems, this isn't a valid approach. Things must
really be done right.
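One classic example of a "network error" that is really a programming
error: assuming a single write() pushes out every byte you asked it
to. On a socket it may not -- a "short count" -- and code that ignores
this works in the lab and fails in the field. A hedged sketch of the
standard fix (the function name write_all is made up):

    #include <errno.h>
    #include <unistd.h>

    /* Keep writing until all n bytes have been sent, retrying after
       short counts and signal interruptions. Returns -1 on a real error. */
    ssize_t write_all(int fd, const void *buf, size_t n) {
        const char *p = buf;
        size_t left = n;
        while (left > 0) {
            ssize_t written = write(fd, p, left);
            if (written < 0) {
                if (errno == EINTR)    /* interrupted by a signal: retry */
                    continue;
                return -1;             /* a genuine error */
            }
            left -= (size_t) written;
            p    += written;
        }
        return (ssize_t) n;
    }
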
I should also note that many problems are caused
by incompatible or mis-used communication paradigms. Again here,
many programmers assume an ideal world -- even though bad things
can happen. The overhead of communication can easily cripple
a system -- and the lack of communication can leave valuable
resources idle.
In the Real-World, Errors Happen
Typically, we just ignore the error cases when we code. They probably
won't happen. And, besides that, coding to handle them takes time
and makes the code look uglier.
But, in real-world systems, errors are commonplace. Data is corrupted.
Communications fail. Users do stupid things. Life happens. We need
to be able to manage errors and problems, instead of allowing them
to cause catastrophic failures.
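As a small illustration, compare the optimistic code most of us write
with code that admits errors exist; the file name here is hypothetical:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Optimistic version (don't do this):
           FILE *fp = fopen("data.txt", "r");
           int c = fgetc(fp);    -- crashes if the file is missing */

        /* Realistic version: check, report, and fail cleanly. */
        FILE *fp = fopen("data.txt", "r");
        if (fp == NULL) {
            perror("fopen data.txt");
            exit(EXIT_FAILURE);
        }
        int c = fgetc(fp);
        if (c == EOF && ferror(fp)) {
            perror("fgetc");
            fclose(fp);
            exit(EXIT_FAILURE);
        }
        printf("first byte: %d\n", c);
        fclose(fp);
        return 0;
    }
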
Syllabus
We talked our way through the class syllabus and the course schedule.
Both are available on-line.