Newsgroups: comp.lang.dylan
From: norvig@meteor.harlequin.com (Peter Norvig)
Subject: Re: Q: sets in Dylan?
In-Reply-To: aml@gia.ist.utl.pt's message of 10 Jan 1996 11:12:21 +0100
Message-ID: <NORVIG.96Jan10120006@meteor.harlequin.com>
Sender: usenet@harlequin.com (Usenet Maintainer)
Nntp-Posting-Host: meteor.menlo.harlequin.com
Organization: Harlequin, Inc., Menlo Park, CA
References: <won383ihu4.fsf@sol.gia.ist.utl.pt> <page-0601962338020001@page.vip.best.com>
	<wowx70mk8q.fsf@sol.gia.ist.utl.pt>
Date: Wed, 10 Jan 1996 20:00:05 GMT



It does appear to be a hole in Dylan that there is no <set> class.
It is certainly possible to implement sets as lists or vectors, but
doing so costs both efficiency and data abstraction.  The question is
where <set> belongs in the class hierarchy, and the answer depends on
what you want to do with sets.  If you just want to know the size of
a set and iterate over its members, then the following generic
function signatures show that it is enough for <set> to be a
<collection>:

	empty? (<object>)
	size   (<object>)
	forward-iteration-protocol (<collection>)
	\= (<object>, <object>)

However, if you want to add or remove elements, to do union and
intersection, and to test for set membership, then you either have to
invent gratuitous new names for these operations, or accept that
<set>s are <sequence>s:

	add (<sequence>, <object>)
	remove (<sequence>, <object>)
	intersection (<sequence>, <sequence>)
	union (<sequence>, <sequence>)
	member? (<object>, <sequence>)
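
To make these operations concrete, here is a minimal sketch in Python
(not Dylan) of all five on a plain sequence.  The function names only
mirror the Dylan generics (member stands in for member?), and the
duplicate check in add is an assumption about what a set-flavored
method would do, not something taken from the language definition.

```python
# Hypothetical helpers that mirror the Dylan generic names.
# Sets are represented as plain tuples; every operation returns a new tuple.

def member(x, seq):
    """Linear scan: O(n) for a sequence of n elements."""
    return any(x == y for y in seq)

def add(seq, x):
    """Return seq with x included; never introduces a duplicate."""
    return seq if member(x, seq) else seq + (x,)

def remove(seq, x):
    """Return seq without any element equal to x."""
    return tuple(y for y in seq if y != x)

def union(a, b):
    """Elements of a, followed by the elements of b not already present."""
    result = a
    for x in b:
        result = add(result, x)
    return result

def intersection(a, b):
    """Elements of a that also occur in b, in a's order."""
    return tuple(x for x in a if member(x, b))
```

Because every operation falls back on a linear scan, union and
intersection here cost O(nm).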

This does not mean that we have to use one of the existing <sequence>
classes (like <list> or <vector>) for sets.  Indeed, if we did, we'd
be stuck with O(n) performance for member?, and O(n log n + m log m)
for intersection and union of sets of sizes n and m (sorting both
first; the naive nested loop is O(nm)).  It is possible to implement
a <set> class as a subclass of <stretchy-collection> such that it
includes a hash table for fast member? tests, and maintains sorted
sequences to make intersection and union O(m + n).  (This would make
add and remove slower.)  If the universe is known and small, it would
also be possible to represent sets compactly as bit vectors.  (This
would be a separate subclass of <set>.)
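
The hash-table-plus-sorted-sequence idea can be sketched as follows,
again in Python rather than Dylan.  The class name SortedHashSet and
its slots are hypothetical, and the sketch assumes the elements are
both hashable and totally ordered.

```python
import bisect

class SortedHashSet:
    """Hypothetical set: a hash table for membership, a sorted list for merging."""

    def __init__(self, items=()):
        self._index = set(items)             # hash table: O(1) expected member test
        self._sorted = sorted(self._index)   # sorted, duplicate-free sequence

    @classmethod
    def _from_sorted(cls, items):
        # items must already be sorted and duplicate-free.
        result = cls()
        result._sorted = items
        result._index = set(items)
        return result

    def member(self, x):
        return x in self._index

    def add(self, x):
        if x not in self._index:
            self._index.add(x)
            bisect.insort(self._sorted, x)   # O(n): later elements are shifted

    def remove(self, x):
        if x in self._index:
            self._index.remove(x)
            del self._sorted[bisect.bisect_left(self._sorted, x)]  # O(n)

    def union(self, other):
        # Single merge pass over both sorted sequences: O(m + n).
        a, b, out, i, j = self._sorted, other._sorted, [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                out.append(a[i]); i += 1
            elif b[j] < a[i]:
                out.append(b[j]); j += 1
            else:
                out.append(a[i]); i += 1; j += 1
        out.extend(a[i:])
        out.extend(b[j:])
        return self._from_sorted(out)

    def intersection(self, other):
        # Same merge pass, keeping only elements present in both: O(m + n).
        a, b, out, i, j = self._sorted, other._sorted, [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                i += 1
            elif b[j] < a[i]:
                j += 1
            else:
                out.append(a[i]); i += 1; j += 1
        return self._from_sorted(out)

    def __iter__(self):
        return iter(self._sorted)

    def __len__(self):
        return len(self._sorted)
```

The slower add and remove show up as the O(n) insertion into, and
deletion from, the sorted list.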

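For the bit-vector representation, a minimal Python sketch might look
like this.  BitSet and its slot names are hypothetical, the universe
is assumed to be a small fixed sequence of hashable values, and union
and intersection assume both sets were built over the same universe.

```python
class BitSet:
    """Hypothetical set over a small, fixed universe, stored as one integer."""

    def __init__(self, universe, items=()):
        self.universe = tuple(universe)
        # Map each universe element to its bit position.
        self._position = {x: i for i, x in enumerate(self.universe)}
        self.bits = 0
        for x in items:
            self.bits |= 1 << self._position[x]

    def _with_bits(self, bits):
        result = BitSet(self.universe)
        result.bits = bits
        return result

    def member(self, x):
        i = self._position.get(x)
        return i is not None and (self.bits >> i) & 1 == 1

    def union(self, other):
        return self._with_bits(self.bits | other.bits)

    def intersection(self, other):
        return self._with_bits(self.bits & other.bits)

    def __iter__(self):
        # Iterate in universe order over the elements whose bit is set.
        return (x for i, x in enumerate(self.universe) if (self.bits >> i) & 1)

    def __len__(self):
        return bin(self.bits).count("1")
```

Union and intersection reduce to a single bitwise operation on the
underlying integers.
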
If we accept that a <set> is a <sequence>, then we need to provide
methods for element, first, etc.  Antonio Leitao wants these to
signal errors "because a set don't have such thing as 'a second
element'". My understanding is that this would be legal, but I think
most users would rather have the following two fragments do the same
thing:

	for (x in set) f(x) end;
	for (i from 0 below set.size) f(set[i]) end;

with the understanding that the former might be more efficient.  In
other words, a set may not have a second element on its own, but it
does have a second element with respect to a given iteration order.
All that element, first, etc. do is pick out elements by position in
the order defined by the forward-iteration-protocol.
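
As a rough illustration in Python (not Dylan), positional access can
be defined purely in terms of iteration.  IterSet, element and first
are hypothetical names; an insertion-ordered dict stands in for
whatever order the iteration protocol would define.

```python
from itertools import islice

class IterSet:
    """Hypothetical set whose positional access is derived from iteration."""

    def __init__(self, items=()):
        # dict keys are duplicate-free and keep insertion order.
        self._items = dict.fromkeys(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def element(self, i):
        # The i-th element is whatever the iterator yields at step i: O(i).
        if not 0 <= i < len(self):
            raise IndexError(i)
        return next(islice(iter(self), i, None))

    def first(self):
        return self.element(0)
```

With this definition the indexed loop visits the same elements in the
same order as direct iteration, but costs O(n^2) in total rather than
O(n), since each element(i) call restarts the iterator.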
-- 
Peter Norvig                  | Phone: 415-833-4022           FAX: 415-833-4111
Harlequin Inc.                | Email: norvig@harlequin.com
1010 El Camino Real, #310     | http://www.harlequin.com
Menlo Park CA 94025           | http://www.cs.berkeley.edu/~russell/norvig.html
