Newsgroups: comp.ai
From: stevem@comtch.iea.com (Steve McGrew)
Subject: Q re: Physical Objects as Characters
Organization: New Light Industries, Ltd.
X-Newsreader: News Xpress 2.0 Beta #2
Date: Thu, 07 Nov 96 23:09:36 GMT
NNTP-Posting-Host: spk1a-14.iea.com
Message-ID: <32826c58.0@news.iea.net>
Lines: 107
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!newsstand.cit.cornell.edu!portc01.blue.aol.com!newsxfer2.itd.umich.edu!www.nntp.primenet.com!nntp.primenet.com!howland.erols.net!newsfeed.internetmci.com!news.iea.net!sluggo

        Over the past few days I've engaged in an email discussion with a 
person from this newsgroup, in an attempt to get some coaching on a basic 
point of data compression.  The person was abusive and insulting from the 
outset.  I merely wanted to understand two things:

        1) why a series of random numbers would still be considered random if 
all the individually compressible numbers were removed, and
        2) why a long random string of bits would not be very slightly 
compressible by just waiting for a long unbroken run of 1's to show 
up, then replacing that run with a new character followed by a 14-bit 
binary number giving the length of the run.
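        The scheme in (2) can be sketched as a toy encoder.  The marker 
character "A", the 14-bit count field, and the break-even threshold of 16 
(a run must be longer than the 15-character replacement to be worth 
replacing) are my own illustrative choices:

```python
def encode(bits, min_run=16):
    """Replace each run of '1' of length min_run..2**14-1 with
    'A' followed by the run length as a 14-bit binary number."""
    out, i, n = [], 0, len(bits)
    while i < n:
        if bits[i] == '1':
            j = i
            while j < n and bits[j] == '1':
                j += 1
            run = j - i
            if min_run <= run < 2 ** 14:
                out.append('A' + format(run, '014b'))  # 15 characters total
            else:
                out.append('1' * run)
            i = j
        else:
            out.append(bits[i])
            i += 1
    return ''.join(out)

def decode(s):
    """Invert encode(): expand each 'A' + 14-bit count back into 1's."""
    out, i = [], 0
    while i < len(s):
        if s[i] == 'A':
            out.append('1' * int(s[i + 1:i + 15], 2))
            i += 15
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)

msg = '0' * 5 + '1' * 100 + '0101'
packed = encode(msg)
assert decode(packed) == msg
assert len(packed) < len(msg)   # the 100-run shrinks to 15 characters
```

Note that the output is a string over a *three*-character alphabet 
{0, 1, A}, which is exactly where the density question below comes from.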

        The rude fellow asserted in effect that, even though the one 
subsequence may be compressed, the rest of the long sequence will necessarily 
be expanded.  I then asked him where the expansion would occur, since the only 
place the new character appears is in the compressed subsequence.  I proposed 
an example of an alphabet consisting of red, green and blue marbles, with a 
blue marble serving to announce that the next 14 (red & green) marbles 
represent (in binary) the number of repeated red marbles.

        He gave no reason for his assertion, though I got the impression that 
it's one of the basic theorems of information theory.  He gave no proofs and 
no references to books or articles containing proofs.  He did give a lot of 
abuse and profanity.  Then he insisted that red, green and blue marbles 
constituted an analog encoding.  

        But now I understand why he was so rude, abusive and loud:  his 
ability to present proofs is limited, and he makes up for it with his 
belligerent attitude.  He really should have either helped me or ignored me, 
rather than trying to mug me.

        After considering it a while, I think the answer he could have given 
for the second question above is that compression is a change of data density, 
and data density is measurable as the ratio of information contained to 
information *capacity*.  A 3-character alphabet (of "trits") has more capacity 
per character than a 2-character alphabet, so an N-bit message needs to be 
converted to a smaller number (M) of "trits", where M=N*log3(2) if it is to 
keep the same density.  If M is larger than N*log3(2), then a net expansion 
has occurred.  In this sense, my suggested "compression" would have resulted 
in reduced data density, even though the number of characters would be 
reduced.
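        That bookkeeping is easy to check numerically; N and M below are 
made-up example sizes:

```python
import math

N = 1000                          # bits in the original binary message
M_equal = N * math.log(2, 3)      # trit count at equal density, ~630.9

# Suppose the run replacement leaves M = 900 three-letter characters.
# That is fewer characters than N, but measured against the 3-letter
# alphabet the capacity has grown, so the data density has dropped.
M = 900
capacity_bits = M * math.log2(3)  # ~1426.5 bits of capacity
assert M < N                      # fewer characters...
assert capacity_bits > N          # ...but more capacity: lower density
```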

        For question (1), the correct answer seems to be that a series of 
random numbers, represented in a particular code (e.g., binary), is NOT really 
random if all the compressible numbers are deleted from the series; but that 
in the limit, as the number of bits per number is increased (or, equivalently, 
as the size range of the numbers is increased or the resolution of each number 
is increased), the deviation from randomness becomes negligible, since the 
fraction of compressible numbers approaches zero. 
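        One counting argument behind that conclusion (my own sketch, not 
from the thread): an N-bit string can only be compressed by k bits if it 
has some description shorter than N-k bits, and there are fewer than 
2^(N-k) such descriptions, so the compressible fraction is below 2^-k no 
matter how large N gets:

```python
def compressible_fraction_bound(N, k):
    """Fraction of N-bit strings that can have ANY description
    shorter than N - k bits: at most (2**(N-k) - 1) / 2**N."""
    short_descriptions = sum(2 ** i for i in range(N - k))  # = 2**(N-k) - 1
    return short_descriptions / 2 ** N

# The bound shrinks geometrically in k, independent of N:
assert compressible_fraction_bound(30, 10) < 2 ** -10
assert compressible_fraction_bound(60, 10) < 2 ** -10
```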

        Since I figured these out for myself rather than having a kindly guru 
lead me to them, I'd like some confirmation from someone -- or, equally good, 
an explanation of how and why my reasoning is wrong.

        If the above reasoning and conclusions are right, though, I would like 
to explore the issue of data compression in a real world system which 
contains real, distinctly different objects that can be used to represent 
data.  For example, suppose we represent a "1" with a deuterium atom, a "0" 
with a helium-4 atom, and an "A" with an argon atom.  An original 
binary-encoded message would consist of a string of D and H atoms with data 
encoded in the sequence.  That message would be converted into a different 
form as described in (2) above, by using the additional character, "A". 

        Imagine a naive observer who is handed the two versions of the 
message.  In the first version he sees a string of N atoms.  In the second one 
he sees a string of M atoms, where M < N.  He doesn't know what the message 
is, or what the A in the second version means, but he knows ahead of 
time that both versions contain the same message.  It would be very hard for 
him to believe, though, that the second version of the message has a *lower* 
information density, since he can clearly see that it contains fewer 
characters.  If it is correct to say that there is a net *reduction* of data 
density in the second case compared to the first, then somehow the data 
capacity of the original two characters, D and H, has been *increased* by 
adding the third character, A.  However, that does not seem likely, since the 
atoms themselves have not been altered.
        
        If someone points out that in the first version the alphabet did, 
after all, contain the "A" even though it was not used,  then it is easy to 
conclude that there was a net *increase* of data density (i.e., net 
compression) since the information capacity of the characters has not been 
changed but the number of characters in the message has decreased.  The 
important distinction between the physical case and the abstract case seems to 
be that in the physical world we can use real, indivisible objects as 
characters, while in the abstract world each character is ultimately 
decomposable into a string of bits.  
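        Under that reading -- the alphabet is {D, H, A} in both versions, so 
the capacity per character is log2(3) bits throughout -- the arithmetic does 
come out as net compression.  The message sizes below are made-up numbers:

```python
import math

cap = math.log2(3)   # capacity per character over the alphabet {D, H, A}
info = 1000          # both versions carry the same 1000 bits of information

N = 1000             # characters in the original message (D and H only)
M = 915              # after replacing one run of 100 with A + a 14-bit count

density_before = info / (N * cap)   # ~0.63
density_after = info / (M * cap)    # ~0.69: density rose, net compression
assert density_after > density_before
```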

        At the moment, I don't see how the two (abstract vs physical) 
encodings can be treated the same, since in the above situation the physical 
and abstract encodings seem to change their data densities in opposite 
directions. 

        My question: How does information theory reconcile this apparent 
fundamental difference between abstract binary encodings and encodings based 
on real physical objects? Certainly the decompression system needs to contain 
more information a priori in order to read the second version of the message, 
regardless of whether the message is sent as a temporal sequence of voltages 
or as a string of atoms.  So, should the information required to decompress a 
message be included in the formula for calculating data compression?   

Steve

==================================================================
| Steve McGrew, President    |   stevem@comtch.iea.com           |
| New Light Industries, Ltd. |   Phone: (509) 456-8321           |
| 9713 W. Sunset Hwy         |   Fax: (509) 456-8351             |
| Spokane, WA 99204 USA      |   http://www.iea.com/~nli         |
==================================================================
