# 15110 Fall 2012 [Touretzky/Kaynar]

## Problem Set 7 - due Monday, October 22 in class

Read sections 7.1-7.6 in chapter 7 of the textbook Explorations in Computing and chapter 3 of Blown To Bits.

### Instructions

• Type or neatly write the answers to the following problems.
• On the first page of your homework, include your name, andrew ID, lab section (A-N), and the assignment number.
• You must hand in your homework at the start of class on the given due date.

### Exercises

1. (2 pts) Consider the 8-bit value 11000011. This might be interpreted as an unsigned integer, signed integer, or ASCII character transmitted with even parity.

1. What is its value if it is interpreted as an unsigned integer? Show your work.

2. What is its value if it is interpreted as a signed integer? Show your work.

3. What is its the value if it is interpreted as an ASCII character where the leftmost bit is the parity? Notice that the code of the ASCII character is the 7 bits following the leftmost (parity) bit. You may find it useful to convert the last 7 digits to hexadecimal and then use the ASCII table from lecture slides.

4. If the binary value above represented a character and was transmitted and then received as 11000111, would the receiver be able to detect an error, why or why not?

5. If the binary value above represented a character and was transmitted and then received as 11001111, would the receiver be able to detect an error, why or why not?

2. (1 pt) As we discussed in class, floating point numbers are stored in binary using various formats, and one popular format is IEEE-754. How are the following values stored in binary using IEEE-754 format?

1. The binary value: 1110.0101
(HINT: First, convert this into the following form: 1.______ × 2exponent)

2. +infinity
(HINT: You can use the Internet to research this question.)

3. (1 pt) (Typo corrected) As discussed in class, we can use the notion of parity to detect when an error occurs after transmission of data. Universal Product Codes (UPCs) also use this idea, often called a check digit. A UPC barcode is made up of 11 digits followed by a check digit. The purpose of a check digit is to verify that a barcode reader reads the code correctly. The barcode reader's decoder calculates the check digit by performing a series of mathematical operations on the digits that precede it. If the calculated digit and the check digit in the barcode match, the reader concludes that it has read the barcode correctly.

The barcode reader calculates the check digit using the following algorithm:

1. Add the first, third, fifth, seventh, ninth and eleventh digits of the UPC barcode and store this result in x.
2. Add the second, fourth, sixth, eighth and tenth digits of the UPC barcode and store this result in y.
3. Set z equal to the last digit of the value 3x + y.
4. If z is not equal to 0, then set check_digit equal to 10 - z. Otherwise, set check_digit equal to z.

Do the following:

1. Determine the check digit needed to complete the following UPC barcode:

```78013713559_
```
2. Given an example of a 12-digit erroneous barcode and explain why it is erroneous.

4. (3 pts) Using the Huffman tree for the Hawaiian alphabet shown on page 186 of your textbook (and in our course slides):

1. Decode the following Hawaiian word that was encoded into binary using the given Huffman tree: 0001011000111110000100110.

Once you decode the word, you can use RubyLabs to check your answer:

```>> include BitLab
=> Object
=> {"K"=>0.106, "W"=>0.009, "L"=>0.044, ... }
>> tree = build_tree(f)
=> { 1.000 { 0.428 { 0.195 { ... } } } }
>> hc = assign_codes(tree)
=> {"K"=>001, "W"=>110010, "A"=>10, ... }
=> 0001011000111110000100110
```

See sections 7.4 and 7.5 for more information on how to build a Huffman tree, and how to encode and decode messages using RubyLabs.

(NOTE: Be sure you understand what each of the steps are doing above. Read through the relevant sections for a thorough explanation and try out the tutorial in the textbook for yourself.)

2. The Hawaiian alphabet has 13 characters (5 vowels, 7 consonants and 1 apostrophe). If we used a fixed-width encoding for characters (i.e. every character is encoded using the same number of bits), we would need a 4-bit code for every character so we could get at least 13 unique codes for the 13 characters of the Hawaiian alphabet:

```A   0000       U   0100       L   1000       W   1100
E   0001       '   0101       M   1001
I   0010       H   0110       N   1010
O   0011       K   0111       P   1011
```

Go to Popular Hawaiian Words and Phrases and scroll down to the vocabulary list at the end. Find a word that would be longer if encoded with the Huffman tree than with the 4-bit fixed-width code above. (Hint: look at words that use lots of low-frequency letters.) Give the encoding using the Huffman tree and the 4-bit fixed-width code above to justify your answer.

3. In general, when do you get shorter binary encodings using Huffman trees?

5. (1 pt) Image encodings:

1. The RGB color for "Sea Green" is given in hexadecimal as 2E8B57. Express the red, green and blue values of this color in decimal. Show your work.

2. An image format takes a 24-bit RGB image and compresses it as follows. For each pixel, we average the red, green and blue components and store only this average. This yields a file that is one-third the size of the original image file. Is this compression algorithm a lossless or lossy compression technique? Explain.

6. (2 pt) Answer the following questions about data representation based on Chapter 3 of Blown to Bits.

1. Document redaction by black highlighting is ineffective and has led to several politically embarassing incidents. One redaction method that is guaranteed to work is to print out the document, black out the sensitive sections with a real magic marker, and then scan the pages. What are two disadvantages of this approach?

2. Why should government agencies be required to use "open" document formats instead of proprietary formats such as Microsoft's doc files?