Type inference in dynamic languages like Dylan (and CL) seems to have a natural partition into two facets:

1] Bottom-up derivation of result types from the types of the operands. With arithmetic operations, this primarily exploits the mathematical properties of the operation, via "interval arithmetic", as in X1..X2 + Y1..Y2 = X1+Y1..X2+Y2.

2] Top-down propagation of type assertions from the receivers of values to the generators of those values. These assertions can be either explicit declarations or "domain restrictions" implicit in an operation, such as "the argument to RANDOM is positive" or "the argument to AREF must be between 0 and some limit."

As far as type inference is concerned, there is a significant asymmetry between these two facets in a dynamically typed language, since a type assertion only affects values which are dominated by (can only be reached via) the point at which the assertion is made. Thus correct top-down type inference is fundamentally a flow analysis problem, whereas bottom-up derivation is basically symbolic evaluation.

In the absence of loops, bottom-up arithmetic inference is pretty obvious, basically amounting to integer arithmetic (although good approximations for division and potentially-signed logical operations can be rather complex.) Loops are significant for three reasons:

1] Efficiency of loops across small integers is crucial.

2] If the loop range can be established, it can serve as an input to inference on expressions derived inside the loop, potentially eliminating bounds checks, etc.

3] The reasoning needed to recognize loops and infer their ranges is complex (involving flow analysis as well as some induction techniques.)

CL (and probably Dylan) throws three unusual barriers in the way of loop range analysis:

1] Loops are expressed in many different ways using a collection of low-level non-loop primitives (like function call.) Unlike in other languages, standard "for loop" constructs are not considered primitive and can't be directly exploited to determine iteration intervals. Non-reducible loops must be handled correctly. Some iteration idioms are pretty difficult to recognize, like:

    for (x = length(v) then x - 1 until x = start - 1)

Here x can go arbitrarily negative unless we can show that start < length, that x is an integer, and that the stride is 1.

2] Arrays are generally dynamically allocated, making it unlikely that there is a useful constant bound on a given array's size.

3] Even determining that a loop is iterating over integers (let alone small ones) requires special optimization techniques or an explicit declaration.

Dylan makes life somewhat more difficult than CL by allowing new methods to be defined on arithmetic and collection operations. The higher degree of polymorphism of Dylan operations creates the potential for cascading type vagueness. We need to attach small-integer assertions to the sequence/array generic functions, not just to particular sealed methods. Otherwise typing becomes increasingly brittle: not knowing that a sequence is a built-in type can cause type inference for a loop index to fail, resulting in a cascade of basically spurious (and confusing) compiler efficiency notes.

Of course, there are interactions between top-down and bottom-up inference. The simplest interaction is that bottom-up inference through an assertion can intersect its result type with the assertion (and emit a type check.)
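To make that interaction concrete, here is a minimal sketch (all names made up; nothing like the compiler's actual representation) of interval derivation for + and of narrowing a derived interval by an assertion:

    ;;; Sketch only.  Bounds are inclusive; NIL means unbounded.
    (defstruct (interval (:constructor make-interval (low high)))
      low
      high)

    (defun bound+ (x y)
      ;; An unbounded (NIL) operand makes the result unbounded.
      (and x y (+ x y)))

    (defun interval+ (x y)
      ;; Bottom-up derivation: X1..X2 + Y1..Y2 = X1+Y1..X2+Y2.
      (make-interval (bound+ (interval-low x) (interval-low y))
                     (bound+ (interval-high x) (interval-high y))))

    (defun interval-intersect (derived asserted)
      ;; Top-down narrowing.  A real compiler would also emit a type
      ;; check here when DERIVED is not a subinterval of ASSERTED.
      (flet ((lo (a b) (cond ((null a) b) ((null b) a) (t (max a b))))
             (hi (a b) (cond ((null a) b) ((null b) a) (t (min a b)))))
        (make-interval (lo (interval-low derived) (interval-low asserted))
                       (hi (interval-high derived) (interval-high asserted)))))

For example, if both operands of + are known to be in 0..9, INTERVAL+ derives 0..18; intersecting that with an asserted 0..15 (say, an index into a vector of known length 16) is exactly the point where the type check gets emitted.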
If assertions depend on the values of expressions (are not static, e.g. a dynamic array bounds check), then the interaction can become quite complex.

Top-down and loop inference are crucial to having any success in inferring integer ranges -- this requires a variety of flow analysis techniques. The main reason for this is the semantics of addition: integer subranges are not closed under +, so interval arithmetic rapidly becomes uselessly pessimistic. Integers are kept bounded only by the frequent encounter of a bounded type assertion, from a declaration or from a non-generic operation, or by reaching a loop exit test against a known bound.

Specific CMU CL experience: loop range analysis is harder than it looks. Some intuitive understandings of the range of a loop are rather complex to discover from the code. Also, flow analysis technology tends to break down in unexpected ways. There is no one simple technique; you need a bag of several complex tricks. The current CMU CL only recognizes non-nested loops bounded by < where the loop index is known to be an integer from an assertion. I was surprised to discover the problem with nested loops, but there isn't any easy fix. The problem is that flow analysis can fail to find the "best" solution to the data flow equations. More recent flow analysis techniques (such as static single assignment) can discover optimistic solutions, but have problems with non-reducible flow graphs. There are no "cookbook" solutions to these problems, though I am sure they can be solved.

CMU CL gains significantly from having small integers (fixnums) smaller than the machine word size (i.e. C int), 30 bits vs. 32:

1] The extra two bits serve as guard bits, allowing intermediate results in an expression to still be represented in a single register. For example, in

    (the fixnum (+ fixnum1 fixnum2 fixnum3))

the intermediate result is provably less than 32 bits, so the extra cost of this type vagueness is small (3 shifts.)

2] Small integers have a non-heap-allocated "immediate" representation, greatly reducing number-consing.

Python (the CMU CL compiler) has the best technology for minimizing consing of 32-bit or larger numbers of any Lisp compiler I know of, but there is still a big, big difference between the ease of efficient coding with floats and with small integers.

The most troublesome operations to infer across are addition and subtraction. Multiplication and division are theoretically worse, but are much rarer. Comparison causes very little difficulty, since it doesn't have a numeric result. One subtle problem does arise with casual comparisons of floats and rationals, like "some-float = 0". Initially we only open-coded comparisons of same-type operands, but this caused problems with ZERO? in float code. A viable solution seems to be to check whether the rational has a precise float representation (which is true for 0, 1, etc.), and if so, convert it to a float; a sketch of this test appears below. This only works when the rational is a constant.

Inference over logical operations works fairly well, since integer-length is preserved and logical operands are often known to be quite small, like 8 bits. Problems can arise when manipulating unsigned full-word integers: for example, adding a signed small integer to an unsigned word integer can't be done using word-integer operations (signed or unsigned). This requires masks to be specified as large positive constants rather than small negative ones.
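For instance (a minimal sketch; the function names are mine, and the declarations stand in for whatever the inference has established):

    (defun clear-low-byte/unsigned (w)
      ;; Mask written as a large positive constant: both operands
      ;; stay in (unsigned-byte 32), so unsigned word operations
      ;; suffice.
      (declare (type (unsigned-byte 32) w))
      (logand w #xFFFFFF00))

    (defun clear-low-byte/signed (w)
      ;; Computes the same value, but the mask is a small negative
      ;; fixnum; mixing a signed operand with an unsigned word
      ;; defeats word-integer open-coding.
      (declare (type (unsigned-byte 32) w))
      (logand w -256))

Both functions return the same value for any 32-bit w; only the first keeps every operand unsigned.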
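Going back to the float/rational comparison point, the "precise float representation" test might look something like this (again just a sketch, with a made-up name):

    (defun exact-float-p (x format)
      ;; Return the float of the given FORMAT that is exactly equal
      ;; to the rational X, or NIL if no such float exists.
      (let ((f (coerce x format)))
        (and (= (rational f) x) f)))

(EXACT-FLOAT-P 0 'single-float) returns 0.0, so "some-float = 0" can be open-coded as a same-type float comparison, while (EXACT-FLOAT-P 1/3 'single-float) returns NIL, so that comparison has to take the general path.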
This interest in integer type inference suggests that someone still favors the "do it like Common Lisp" approach to small integers. I think that this would be a really bad idea. Integer type inference is nonintuitive and frustrating for users. I have had CS PhDs tell me that they proved Lisp arithmetic was 20x slower (because they didn't know how to put in the right declarations, e.g. to allow the DOTIMES hidden temp to be correctly typed.) A closed small-integer type is the norm; Lisp is the only exception I know of. Consider the alternatives:

-- If we choose a closed small-integer type, then we sometimes get overflow errors at run time. These overflows will be localized, clearly identified, and easily fixed, either by changing to extended arithmetic or (more often) by rearranging an expression to avoid spurious overflow.

-- If we choose quiet overflow into extended integers, then people will write slow Dylan programs and blame it on the language (with some fairness.) Those who know better will waste time locating type inference problems and putting in meaningless declarations.

I'd also argue that we should in effect make short-integer arithmetic the default by making integer literals short integers. In my experience, when you really want an extended integer, you know it -- the only times I've ever actually wanted extended integers were:

1] Doing integer type inference.

2] Printing floating point numbers.

More commonly, quiet overflow has introduced obscure efficiency bugs because an expression was inadvertently written in a way that could generate an extended intermediate value when a short result was ultimately being computed.
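As a concrete illustration of that last point (my example, not from CMU CL): the classic midpoint computation.

    ;; The intermediate (+ lo hi) can exceed the short-integer range
    ;; even though the midpoint itself always fits.  Under quiet
    ;; overflow this silently conses an extended intermediate; under
    ;; a closed type it signals a localized overflow error.
    (defun midpoint/naive (lo hi)
      (floor (+ lo hi) 2))

    ;; Rearranged (assuming lo <= hi) so that no intermediate result
    ;; exceeds the range of the operands:
    (defun midpoint/safe (lo hi)
      (+ lo (floor (- hi lo) 2)))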