Solutions to Example Floating Point Problem
Part I A:
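a) With 3 exp bits, Bias = 2^(3-1) - 1 = 2^2 - 1 = 3.
For denormalized numbers, E = 1 - Bias = -2.
b) When all three frac bits are 1, we have
M = 1/2 + 1/4 + 1/8 = 7/8
(denormalized numbers get no implied leading 1).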
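As a quick check on Part I A, the sketch below derives the same quantities from the field widths. It assumes the 7-bit format used throughout (1 sign bit, 3 exp bits, 3 frac bits). The names K_EXP, N_FRAC, bias, e_denorm and m_denorm_max are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# Assumed field widths for this format: 1 sign bit, 3 exp bits, 3 frac bits.
K_EXP, N_FRAC = 3, 3

bias = 2 ** (K_EXP - 1) - 1   # 2^2 - 1 = 3
e_denorm = 1 - bias           # denormalized E = 1 - Bias = -2

# Largest denormalized M: frac = 111, i.e. 1/2 + 1/4 + 1/8 (no implied 1).
m_denorm_max = sum(Fraction(1, 2 ** i) for i in range(1, N_FRAC + 1))

print(bias, e_denorm, m_denorm_max)  # 3 -2 7/8
```

Using Fraction keeps every value exact, so 7/8 prints as a fraction rather than 0.875.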
Part I B:
a) We get the smallest value of E when exp = [0 0 1]
(remember that we can't have exp = [0 0 0] for
normalized numbers).
To calculate E, we take the number that the binary
sequence [0 0 1] represents and subtract the Bias.
We get E = 1 - Bias = -2
b) We want to make exp as large as possible without
hitting infinity. We get this when exp = [1 1 0]
(exp = [1 1 1] would be infinity).
E = exp - Bias = 6 - 3 = 3
c) We get the largest value of M when all frac bits
are 1. We get the fraction:
1 + 1/2 + 1/4 + 1/8 = 15/8
The extra 1 is the implied leading 1 that every
normalized number gets for free.
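The Part I B answers can be reproduced the same way. This is a minimal sketch under the same assumed 3-bit exp and 3-bit frac fields; the variable names are hypothetical. It takes the normalized exp range to be [0 0 1] through [1 1 0], as argued above, since [0 0 0] and [1 1 1] are reserved.

```python
from fractions import Fraction

# Assumed field widths for this format.
K_EXP, N_FRAC = 3, 3
bias = 2 ** (K_EXP - 1) - 1   # 3

# Normalized exp runs from [0 0 1] to [1 1 0]; [0 0 0] and [1 1 1] are reserved.
exp_min, exp_max = 1, 2 ** K_EXP - 2        # 1 and 6
e_min, e_max = exp_min - bias, exp_max - bias   # -2 and 3

# Largest M: implied leading 1 plus all frac bits set (7/8).
m_max = 1 + Fraction(2 ** N_FRAC - 1, 2 ** N_FRAC)   # 15/8

print(e_min, e_max, m_max)  # -2 3 15/8
```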
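Part II: Since all the numbers here (except NaN) are nonnegative,
the sign bit is always 0. Each answer ends with a line giving
E, M, the value M * 2^E, and the bit pattern (sign exp frac).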
Zero:
Zero (positive zero) is encoded by all zeroes.
Because we have a denormalized number, E = 1 - Bias = -2.
E = -2, M = 0, value = 0; bits: 0 000 000
Smallest Positive (Nonzero):
We want to make the number as small as possible,
so we should leave exp bits set to [0 0 0]. We
must change a frac bit to 1, though, or
else we would get the floating point representation
for 0. Make the least significant frac bit 1.
E = -2, M = 1/8, value = 1/32; bits: 0 000 001
Largest denormalized:
We want to make the largest number possible while
still having all exp bits set to 0. Simply set all
frac bits to 1.
E = -2, M = 7/8, value = 7/32; bits: 0 000 111
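The three denormalized answers (zero, smallest positive, largest denormalized) can be checked by decoding every pattern whose exp bits are [0 0 0]. This sketch assumes Bias = 3 and 3 frac bits as above; denorm_value is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

BIAS = 3    # assumed: 3 exp bits, Bias = 2^2 - 1
N_FRAC = 3  # assumed: 3 frac bits

def denorm_value(frac: int) -> Fraction:
    """Value of 0 000 fff: M = frac / 2^3 (no implied 1), E = 1 - Bias."""
    m = Fraction(frac, 2 ** N_FRAC)
    return m * Fraction(2) ** (1 - BIAS)

# One value per frac pattern 000 .. 111.
values = [denorm_value(f) for f in range(2 ** N_FRAC)]
print(values[0], values[1], values[-1])  # 0 1/32 7/32
```

Every denormalized value is a multiple of 1/32, so the values are evenly spaced from 0 up to 7/32.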
Smallest positive normalized:
We must increase exp to [0 0 1], or we would have a
denormalized number. However, we can leave the frac
bits at 0.
E = -2, M = 1, value = 1/4; bits: 0 001 000
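A small decoder for normalized patterns confirms the 1/4 result. It also shows that the step from the largest denormalized value (7/32) to the smallest normalized value is 1/32, the same spacing as between neighbouring denormalized values. The sketch assumes Bias = 3 and 3 frac bits; norm_value is a hypothetical helper.

```python
from fractions import Fraction

BIAS = 3    # assumed: 3 exp bits, Bias = 2^2 - 1
N_FRAC = 3  # assumed: 3 frac bits

def norm_value(exp: int, frac: int) -> Fraction:
    """Value of 0 eee fff for normalized exp (1..6): M = 1 + frac/2^3, E = exp - Bias."""
    m = 1 + Fraction(frac, 2 ** N_FRAC)
    return m * Fraction(2) ** (exp - BIAS)

smallest_norm = norm_value(0b001, 0b000)  # M = 1, E = -2, so 1/4
largest_denorm = Fraction(7, 32)          # value of 0 000 111

step = smallest_norm - largest_denorm
print(smallest_norm, step)  # 1/4 1/32
```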
One:
Since the smallest normalized number is 1/4 (see above),
we know that 1 must be normalized. We know that the M
for a normalized number has a free 1, so we can leave the
frac bits at [0 0 0]. Now we must find what to use for
exp. We know that the final floating point number will
have a value of M * 2^E, and that M = 1. So we need
2^E = 1 --> E = 0. We know that E = exp - Bias,
so we must set exp equal to the Bias. The Bias is
3 = [0 1 1], so exp = [0 1 1].
E = 0, M = 1, value = 1; bits: 0 011 000
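The encoding step for 1 amounts to choosing exp = E + Bias and packing the three fields together. Here is a minimal sketch, assuming the same 3-bit exp and 3-bit frac fields; pack is a hypothetical helper that places sign, exp and frac into one 7-bit integer.

```python
K_EXP, N_FRAC = 3, 3                 # assumed field widths
BIAS = 2 ** (K_EXP - 1) - 1          # 3

def pack(sign: int, exp: int, frac: int) -> int:
    """Concatenate sign | exp | frac into a 7-bit integer."""
    return (sign << (K_EXP + N_FRAC)) | (exp << N_FRAC) | frac

# For 1: M = 1 (frac = 000) and E = 0, so exp = E + Bias = Bias = 3.
one_bits = pack(0, 0 + BIAS, 0b000)
print(format(one_bits, "07b"))  # 0011000
```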
Largest finite number:
We want to make exp and frac as big as possible without having
exp = [1 1 1] (since this would be infinity). Make all frac
bits 1. Make exp = [1 1 0].
E = 3, M = 15/8, value = 15; bits: 0 110 111
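The arithmetic for the largest finite value, written out with exact fractions. This assumes Bias = 3 and 3 frac bits; the variable names are illustrative only.

```python
from fractions import Fraction

BIAS, N_FRAC = 3, 3  # assumed field parameters for this format

exp, frac = 0b110, 0b111              # largest exp below all ones, all frac bits set
E = exp - BIAS                        # 6 - 3 = 3
M = 1 + Fraction(frac, 2 ** N_FRAC)   # 1 + 7/8 = 15/8
value = M * 2 ** E                    # 15/8 * 8 = 15
print(E, M, value)                    # 3 15/8 15
```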
NaN:
There are many possible answers: set all exp bits to 1
and set at least one frac bit to 1.
E = n/a, M = n/a, value = NaN; bits: 0 111 111 (one possible pattern)
Infinity:
With the sign bit fixed at 0, there is only one way to do this:
set all exp bits to 1 and all frac bits to 0.
E = n/a, M = n/a, value = +infinity; bits: 0 111 000
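To tie Part II together, the sketch below decodes any 7-bit pattern, including the special cases, and checks the answers above against every pattern with sign bit 0. It is a teaching sketch, not a real IEEE library: decode is a hypothetical helper, it assumes 3 exp bits and 3 frac bits, and it returns exact Fractions for finite values and Python floats (math.inf, math.nan) for the special values.

```python
import math
from fractions import Fraction

K_EXP, N_FRAC = 3, 3                 # assumed field widths
BIAS = 2 ** (K_EXP - 1) - 1          # 3
EXP_ALL_ONES = 2 ** K_EXP - 1        # [1 1 1]

def decode(bits: int):
    """Decode a 7-bit pattern s eee fff into a Fraction, or a float for inf/NaN."""
    sign = -1 if (bits >> (K_EXP + N_FRAC)) & 1 else 1
    exp = (bits >> N_FRAC) & EXP_ALL_ONES
    frac = bits & (2 ** N_FRAC - 1)
    if exp == EXP_ALL_ONES:          # special values
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                     # denormalized: no implied 1, E = 1 - Bias
        m, e = Fraction(frac, 2 ** N_FRAC), 1 - BIAS
    else:                            # normalized: implied leading 1, E = exp - Bias
        m, e = 1 + Fraction(frac, 2 ** N_FRAC), exp - BIAS
    # Note: Fraction has no negative zero, so 1 000 000 also decodes to 0.
    return sign * m * Fraction(2) ** e

# Patterns 0..63 are exactly those with sign bit 0.
positives = [decode(b) for b in range(2 ** (K_EXP + N_FRAC))]
finite = [v for v in positives if isinstance(v, Fraction)]
nan_count = sum(1 for v in positives if isinstance(v, float) and math.isnan(v))
print(max(finite), nan_count)  # 15 7
```

With the sign bit fixed at 0, exactly one pattern (0 111 000) gives infinity, while the seven patterns 0 111 001 through 0 111 111 all give NaN.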