Newsgroups: comp.ai
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!cs.utexas.edu!utnut!nott!cunews!freenet.carleton.ca!FreeNet.Carleton.CA!an995
From: an995@FreeNet.Carleton.CA (Paul Deane)
Subject: Indexing vector spaces
Message-ID: <CzGso4.J25@freenet.carleton.ca>
Sender: an995@freenet3.carleton.ca (Paul Deane)
Reply-To: an995@FreeNet.Carleton.CA (Paul Deane)
Organization: The National Capital FreeNet
Date: Fri, 18 Nov 1994 13:12:04 GMT
Lines: 33



In the project I am associated with, we are considering a vector space
representation to solve some problems that arise in information retrieval
tasks. I would therefore be very interested if someone could refer me to
any work on the construction of indices for a multidimensional vector
space.

       Roughly speaking, what I am looking for is something that will
perform a hierarchical cluster analysis on an arbitrary set of points
in an n-dimensional space. The hierarchical structure thus identified
would be used to construct indices. Each index would itself be a vector
locating the `center of gravity' (centroid) of the cluster or subcluster
with which it is associated. Subindexes would express the difference
between the center of gravity of the higher-level index and that of the
indexed subcluster. Thus, given a vector to be matched, one could find
the closest match by traversing (and summing) the indices, at each level
choosing the index whose summed position lies nearest the target vector.
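
       To make the idea concrete, here is a minimal sketch in Python of
the scheme described above. Everything in it is an assumption made for
illustration: the names (build, search, split_in_two), the use of
Euclidean distance, and the choice of a simple top-down 2-means split as
a stand-in for whatever hierarchical clustering method is actually
appropriate. Note also that a greedy descent of this kind is not
guaranteed to return the true nearest point, only a nearby one.

```python
import math
import random

def centroid(points):
    """Center of gravity of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_in_two(points, rng, iters=10):
    """Plain 2-means split; either returned group may be empty."""
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_first = dist(p, centers[0]) <= dist(p, centers[1])
            groups[0 if nearer_first else 1].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [centroid(g) for g in groups]
    return groups

def build(points, parent_center=None, leaf_size=2, rng=None):
    """Build the index tree. Each node stores only the offset of its
    centroid from its parent's centroid (the root's offset is measured
    from the origin)."""
    rng = rng or random.Random(0)
    center = centroid(points)
    base = [0.0] * len(center) if parent_center is None else parent_center
    node = {"offset": [c - b for c, b in zip(center, base)],
            "children": [], "points": points}
    if len(points) > leaf_size:
        left, right = split_in_two(points, rng)
        if left and right:  # otherwise keep this node as a leaf
            node["children"] = [build(g, center, leaf_size, rng)
                                for g in (left, right)]
    return node

def search(root, target):
    """Greedy descent: sum the offsets along the path and, at each level,
    follow the child whose reconstructed centroid is nearest the target."""
    center = list(root["offset"])
    node = root
    while node["children"]:
        best, best_center = None, None
        for child in node["children"]:
            c = [a + b for a, b in zip(center, child["offset"])]
            if best is None or dist(c, target) < dist(best_center, target):
                best, best_center = child, c
        node, center = best, best_center
    # Within the chosen leaf, compare against the stored points directly.
    return min(node["points"], key=lambda p: dist(p, target))

if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    tree = build(pts)
    print(search(tree, [10.2, 10.1]))
```

       Storing offsets rather than absolute centroids mirrors the
"subindex" idea: the position of any cluster is recovered by summing the
offsets on the path from the root down to it.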

       Vector space representations are very far from my primary field of
expertise, so I have no idea what is available. At the same time, this
looks to me like the kind of algorithm that has surely been developed
somewhere. I would greatly appreciate references to the literature or,
better yet, information about where I could obtain working code that
performs the requisite calculations.

       If you do not know of anything directly appropriate, I would still
appreciate references either to statistical software that could perform
the multivariate clustering part of the task, or to neural network or
feature map algorithms that perform a similar mapping. We are working
within a PC (486) environment, however, so I would particularly
appreciate information about software that could easily be installed on
a 486.
Thanks in advance.
