Newsgroups: comp.ai,comp.ai.philosophy,comp.ai.alife
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!news.sprintlink.net!news-peer.sprintlink.net!howland.erols.net!www.nntp.primenet.com!nntp.primenet.com!netcom.com!jqb
From: jqb@netcom.com (Jim Balter)
Subject: Re: rand() - implementation ideas [Q]
Message-ID: <jqbE081H6.KKE@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: <54lr8o$ndm@nntp.seflin.lib.fl.us> <P1ZDtGA4sIeyEwCM@wandana.demon.co.uk> <jqbE05zJw.BAK@netcom.com> <gLjoCMARBmeyEwFY@wandana.demon.co.uk>
Date: Sat, 2 Nov 1996 02:27:06 GMT
Lines: 65
Sender: jqb@netcom23.netcom.com
Xref: glinda.oz.cs.cmu.edu comp.ai:41833 comp.ai.philosophy:48152 comp.ai.alife:6796

In article <gLjoCMARBmeyEwFY@wandana.demon.co.uk>,
Jim Barr  <JimBarr@wandana.demon.co.uk> wrote:
>In article <jqbE05zJw.BAK@netcom.com>, Jim Balter <jqb@netcom.com>
>writes
>>In article <P1ZDtGA4sIeyEwCM@wandana.demon.co.uk>,
>>>I would go so far as to say ALL large patterns CAN be compressed even when NO
>>>global pattern can be found, simply because if you take small enough local
>>>sections of the stream, there are bound to be repeating groups;  These 
>>repeating
>>>groups MAY be reverse engineered to create an algorithm.....
>>
>>There is a fallacy here.  I suggest you check out the comp.compression FAQ.
>>The number of sequences that can be expressed in n bits is 2**n.  Any
>>algorithm that compresses some to a smaller number of bits will necessarily
>>expand others to a greater number of bits.
>
>I may be wrong, but I understand that certain compression techniques rely
>solely on repeating groups.

But they aren't *universal*; they will always make *some* strings longer, not
shorter.  Compression is based upon removing redundancy; compression works in
practice because the *interesting* data we deal with is full of redundancy.  A
really good compression algorithm will compress almost any interesting string
and only expand uninteresting, information-free strings that we don't care
about.
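
The counting argument is easy to check for yourself.  This little sketch (mine,
purely illustrative) just counts: there are 2**n strings of n bits, but fewer
than 2**n strings strictly shorter than n bits, so no invertible compressor can
shrink them all.

```python
# Pigeonhole count for n = 8:
# there are 2**8 = 256 bit strings of length 8, but only
# 2**0 + 2**1 + ... + 2**7 = 2**8 - 1 = 255 bit strings
# strictly shorter than 8 bits.  A lossless compressor must map
# distinct inputs to distinct outputs, so at least one 8-bit
# string cannot get shorter.
n = 8
inputs = 2 ** n
shorter_outputs = sum(2 ** k for k in range(n))  # all strings of length 0..n-1
assert shorter_outputs < inputs
print(inputs, shorter_outputs)  # prints: 256 255
```

The same count works for any n, which is why no "magic size" rescues the idea.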

Think about this: you have to represent a repeating group somehow; but however
you represent it, *that* string (the representation) has to be represented
*differently*, or there is an ambiguity.  So if you represent 111, which is a
repeating group, as 13, how do you represent 13, which isn't a repeating
group?  Whenever you push something in, something else pops out.
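
Here's a toy run-length encoder (my own throwaway, not anything from the FAQ)
that makes the "push something in, something pops out" trade concrete: the very
notation that shortens 111 lengthens 13.

```python
def rle_encode(s):
    """Naive run-length encoding: each run of a repeated character
    becomes <count><character>.  Decodable, but not always shorter."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(str(j - i) + s[i])  # e.g. "111" -> "31"
        i = j
    return "".join(out)

print(rle_encode("111"))  # prints: 31    (3 chars down to 2)
print(rle_encode("13"))   # prints: 1113  (2 chars up to 4)
```

The string with the repeating group compresses; the string without one expands,
exactly as the ambiguity argument predicts.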

If that weren't the case, I could take the contents of my disk, compress it,
then compress the result, ... and reduce the whole thing down to 1 bit.  If
you think universal compression only applies to "large patterns", how large?
What is the magic number of bits such that all messages can be compressed to
no more than that number of bits?  The point above about 2**n sequences
*should* make it obvious that there is no such number, to anyone willing to put
aside their intuitions.
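
You can watch the compress-the-compressed fantasy fail directly.  A quick
demonstration (mine; any deflate-style compressor would do) on random,
redundancy-free bytes: the output almost surely comes out *longer* than the
input, and compressing it again only adds more overhead.

```python
import os
import zlib

data = os.urandom(4096)        # random bytes: no redundancy to remove
once = zlib.compress(data)
twice = zlib.compress(once)

# Incompressible input gains header/framing overhead instead of
# shrinking, and the already-compressed output is itself near-random,
# so a second pass expands it further still.
print(len(data), len(once), len(twice))
assert len(once) > len(data)
assert len(twice) > len(once)
```

If iterated compression worked, the lengths would march down to 1 bit; instead
they march up.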

>The point is that I should not have used the term compression, as I was
>trying to show that in ANY random sequence there WILL be repeating
>subsequences.

That doesn't say anything useful.  Any sequence of 4 bits has at least one
repetition of a 1- or 2-bit sequence.
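
That's cheap to verify exhaustively; with only two symbols, four bits must
repeat one of them by pigeonhole.  A quick sketch (names mine):

```python
from itertools import product

def has_repeated_substring(bits, max_len=2):
    """True if some substring of length 1..max_len occurs at least twice."""
    for k in range(1, max_len + 1):
        subs = [bits[i:i + k] for i in range(len(bits) - k + 1)]
        if len(subs) != len(set(subs)):
            return True
    return False

# Check all 16 four-bit strings: 4 bits drawn from {0, 1} must
# repeat some single bit, so every one of them passes.
assert all(has_repeated_substring("".join(s)) for s in product("01", repeat=4))
print("all 16 four-bit strings have a repeated 1- or 2-bit substring")
```

True of every string, so it distinguishes nothing, which is the problem.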

>I apologise for the error; is my qualified statement more acceptable, or
>have I missed the point?

It's trivially true; I'm not sure what you want to follow from it.

>What do you think of the other point I made, regarding the value of
>predictable RNG's?

I forget.

Really, if you care about this stuff, you might start by reading the
comp.compression FAQ, which has pointers to information theory, noise,
predictability, algorithmic complexity, and the like.  Good intuitions are
based upon knowledge.

-- 
<J Q B>

