Newsgroups: comp.ai.philosophy
From: Lupton@luptonpj.demon.co.uk (Peter Lupton)
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!news.sprintlink.net!demon!luptonpj.demon.co.uk!Lupton
Subject: Re: Strong AI and consciousness
References: <3bg9o6$mmq@newsbf01.news.aol.com> <340778893wnr@luptonpj.demon.co.uk>
Distribution: world
Organization: No Organisation
Reply-To: Lupton@luptonpj.demon.co.uk
X-Newsreader: Newswin Alpha 0.6
Lines:  125
Date: Wed, 30 Nov 1994 22:47:20 +0000
Message-ID: <637666762wnr@luptonpj.demon.co.uk>
Sender: usenet@demon.co.uk

In article: <3bg9o6$mmq@newsbf01.news.aol.com>  jrstern@aol.com (JRStern) writes:
> 
> In <340778893wnr@luptonpj.demon.co.uk> Lupton@luptonpj.demon.co.uk (Peter
> Lupton) writes:
> <
> In article: <3b1h3j$4tb@news1.shell>  hfinney@shell.portal.com (Hal)
> writes:
> > 
> > 
> > I wonder if the question of whether a given computer is running a given
> > program is really as unanswerable as has been assumed in this thread.
> > Maybe Algorithmic Information Theory (as discussed in the writings of
> > Chaitin or Kolmogorov) can shed some light on the matter.
> > 
> 
> I think you are absolutely right!
> >
> 
> Could one of you gentlemen say more?
> 
Sure. 

Algorithmic Complexity (AC) - aka Algorithmic Information Theory aka 
Kolmogorov Complexity - can be viewed as an abstract version of
data compression:
 
take a sequence of bits s and find the shortest program p (wrt
some universal computer U) which produces s.

The complexity of s (wrt U) is defined as the number of bits in p.
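
The shortest-program search is uncomputable in general, but an
ordinary compressor gives a computable upper bound, which is enough
to play with the idea. A minimal sketch in Python - the function
name and the choice of zlib as a stand-in for U are mine, not part
of the theory:

  import zlib

  def complexity_upper_bound(s: bytes) -> int:
      """Bits in a zlib description of s - an upper bound on K(s),
      which is itself uncomputable, plus some fixed header overhead."""
      return 8 * len(zlib.compress(s, 9))

  highly_regular = b"01" * 500              # 1000 bytes with an obvious pattern
  print(complexity_upper_bound(highly_regular))   # far fewer than 8000 bits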

This theory can be seen as a vast generalisation of Shannon's
Information Theory - information theory applies only to
distributions of data, whereas AC applies to individual sequences.
There is a theorem that the expected complexity of sequences drawn
from a source is just the Shannon entropy of the source (up to an
additive constant). AC thus looks like the appropriate way of
generalising Information Theory from distributions to individual
sequences.
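
A rough numerical illustration of that theorem, again with zlib
standing in (a real compressor only approaches the entropy bound
from above, and the bias p = 0.1 is just an arbitrary choice):

  import math, random, zlib

  p = 0.1                                    # bias of a 0/1 source
  entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # bits per symbol

  random.seed(0)
  n, trials, total_bits = 100_000, 5, 0
  for _ in range(trials):
      s = bytes(int(random.random() < p) for _ in range(n))    # one byte per symbol
      total_bits += 8 * len(zlib.compress(s, 9))

  print("Shannon entropy      : %.3f bits/symbol" % entropy)
  print("Mean compressed size : %.3f bits/symbol" % (total_bits / (trials * n)))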

We can make contact with the real world by noting that:

(1) the world contains discreteness. Indeed, our ability
    to create digital computers hangs on quantum effects.
(2) Instead of the arbitrary choice of U, we can talk about
    compact dynamical systems.
(3) Take the *sensory data* entering the system and compress
    it a la AC.

The result of this is a system which continually produces 
programs p which bear the above relation to the input sensory
data. 

If we ask what sort of structures p would have if the sensory
data were the sort of data you and I receive, I think the answer
is that p would contain all sorts of structure which would
classify the data in ways remarkably similar to the ways
we do, in fact, classify data.

Thus, for example, if one considered data of a rotating cube, say,
a very short program for that very data would involve a 3-d
data structure, rotation matrices and perspective mappings.
From a rasterized image - ultimately a 1-d stream of bits - we
infer a 3-d representation without having to introduce any extra
hypotheses: we just ask ourselves what a short program for that
stream would be.
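
To make the cube example concrete, here is a hedged sketch of what
such a short program might look like: a handful of 3-d vertices, a
rotation matrix and a perspective mapping regenerate every frame of
2-d data, so the whole stream costs little more than this listing
(all names and the particular projection are mine, purely for
illustration):

  import math

  CUBE = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]

  def rotate_y(v, theta):
      """Rotate a 3-d point about the y axis."""
      x, y, z = v
      c, s = math.cos(theta), math.sin(theta)
      return (c * x + s * z, y, -s * x + c * z)

  def project(v, distance=4.0):
      """Simple perspective mapping onto the image plane."""
      x, y, z = v
      scale = distance / (distance - z)
      return (scale * x, scale * y)

  # The "sensory data": 2-d points for each frame of the rotation.
  frames = [[project(rotate_y(v, f * math.pi / 30)) for v in CUBE]
            for f in range(60)]
  print(len(frames), "frames regenerated from a few lines of 3-d description")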

There is also contact with induction - the detection and projection
of regularities. If s can be shortened, so that p is shorter than s
(contains fewer bits), then there is a residue of #s - #p bits (the
length of s minus the length of p). On average it takes about #p
bits to single out p, and once p is in hand the remaining #s - #p
bits of s are predicted rather than observed.
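
One crude but workable way to turn this into prediction is to ask
which continuation keeps the description short. A sketch, leaning
on zlib again (the particular history string is invented for the
example, and ties are broken toward 0):

  import zlib

  def predict_next_bit(history: bytes) -> int:
      """Guess the continuation that compresses better with the history."""
      def size(bit: bytes) -> int:
          return len(zlib.compress(history + bit, 9))
      return 0 if size(b"0") <= size(b"1") else 1

  history = b"01" * 500                      # an obviously regular stream
  print(predict_next_bit(history))           # prints 0, continuing the pattern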

There is also contact with randomness - incompressible sequences
pass tests for randomness and compressible sequences don't.
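
The same stand-in compressor makes that point quickly (os.urandom
merely plays the role of an incompressible source here):

  import os, zlib

  random_bytes = os.urandom(1000)
  patterned    = b"abcd" * 250               # same length, obvious regularity

  print(len(zlib.compress(random_bytes, 9)))   # about 1000 bytes or a little more
  print(len(zlib.compress(patterned, 9)))      # a small fraction of that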

In the Philosophical Investigations, W. keeps pointing out that
there are any number of ways of following the rule. One wants
to say that there aren't so many ways of doing it simply - that
is, with low complexity.

If we consider classifications brought about by AC, we will notice
a number of things (a small compression-based sketch follows this
list):

(1) Whether classifications are sharply defined or over-lapping
    will be entirely contingent on the nature of the data.
(2) They will rarely correspond to necessary and sufficient
    conditions - the world is too jumbled up for that. However,
    a classification as an X *says something*, despite the fact
    that one does not have necessary and sufficient conditions
    for X-ness.
(3) Classifications will be *revised* in order to preserve
    shortness - the receipt of more data may well force previously
    classified data to be re-classified.
(4) Classifications may need to be revised as a result of further
    analysis and computation. 
(5) Classification will depend crucially upon the history of
    the system - all the previous sensory data of the system
    is potentially relevant to how the very next item of data
    is to be classified.
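
One concrete way of getting classifications that behave like
(1)-(5) out of compression is the 'normalized compression
distance': measure how much two items help to describe each other,
and group items that lie close. A hedged sketch - the toy strings
are invented and zlib again stands in for the ideal compressor:

  import zlib

  def c(b: bytes) -> int:
      return len(zlib.compress(b, 9))

  def ncd(x: bytes, y: bytes) -> float:
      """Normalized compression distance: small when x and y share structure."""
      return (c(x + y) - min(c(x), c(y))) / max(c(x), c(y))

  # Invented toy "sensory data": two families of sequences.
  items = [b"the cat sat on the mat " * 20,
           b"the cat lay on the rug " * 20,
           b"0110100110010110" * 30,
           b"1001011001101001" * 30]

  for a in items:
      print(["%.2f" % ncd(a, b) for b in items])
  # within-family distances tend to come out smaller than cross-family
  # ones, and adding new items can shift where the boundaries fall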

AC provides an account of how classification can come about
which has no dependency upon any logicist notions. Nor does
AC depend on propositions, content or any such notion. AC
is a computational, not a logical notion.

The relevance to interpretation should be clear - when we 
interpret streams of sensory data as produced by this, 
produced by that, the 'this' and 'that' are classifications
produced by the compression of sensory data.

The structures defined really depend on three things:

(1) the continuous structure of the world
(2) the discrete structure of the world
(3) compactness

To the extent that these are subjective - to the extent that there
might be Moravec-like entities with different topologies, different
quanta, different locality - there will be different AC-like
systems.

Cheers,
Pete Lupton

