Type inference in dynamic languages like Dylan (and CL) seems to have a natural partition into two facets:

1] Bottom-up derivation of result types from the types of the operands. With arithmetic operations, this primarily exploits the mathematical properties of the operation, via "interval arithmetic", as in X1..X2 + Y1..Y2 = X1+Y1..X2+Y2.

2] Top-down propagation of type assertions from the receivers of values to the generators of those values. These assertions can be either explicit declarations or "domain restrictions" implicit in an operation, such as "the argument to RANDOM is positive" or "the argument to AREF must be between 0 and some limit."

As far as type inference is concerned, there is a significant asymmetry between these two facets in a dynamically typed language, since a type assertion only affects values which are dominated by (can only be reached via) the point at which the assertion is made. Thus correct top-down type inference is fundamentally a flow analysis problem, whereas bottom-up derivation is basically symbolic evaluation.

In the absence of loops, bottom-up arithmetic inference is pretty obvious, basically amounting to integer arithmetic (although good approximations for division and potentially-signed logical operations can be rather complex.) Loops are significant for three reasons:

1] Efficiency of loops across small integers is crucial.

2] If the loop range can be established, it can serve as an input to inference on expressions derived inside the loop, potentially eliminating bounds checks, etc.

3] The reasoning needed to recognize loops and infer their ranges is complex (involving flow analysis as well as some induction techniques.)

CL (and probably Dylan) throws three unusual barriers in the way of loop range analysis:

1] Loops are expressed in many different ways using a collection of low-level non-loop primitives (like function call.) Unlike in other languages, standard "for loop" constructs are not considered primitive and can't be directly exploited to determine iteration intervals. Non-reducible loops must be handled correctly. Some iteration idioms are pretty difficult to recognize, like:

    for (x = length(v) then x - 1 until x = start - 1)

Here x can go arbitrarily negative unless we can show that start < length, that x is an integer, and that the stride is 1.

2] Arrays are generally dynamically allocated, making it unlikely that there is a useful constant bound on a given array's size.

3] Even determining that a loop is iterating over integers (let alone small ones) requires special optimization techniques or an explicit declaration.

Dylan makes life somewhat more difficult than CL by allowing new methods to be defined on arithmetic and collection operations. The higher degree of polymorphism of Dylan operations creates the potential for cascading type vagueness. We need to attach small-integer assertions to the sequence/array generic functions, not just to particular sealed methods. Otherwise typing becomes increasingly brittle: not knowing that a sequence is a built-in type can cause type inference for a loop index to fail, resulting in a cascade of basically spurious (and confusing) compiler efficiency notes.

Of course, there are interactions between top-down and bottom-up inference. The simplest interaction is that bottom-up inference through an assertion can intersect its result type with the assertion (and emit a type check.)
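To make that interaction concrete, here is a minimal sketch (all names made up; nothing like the compiler's actual representation) of interval derivation for + and of narrowing a derived interval by an assertion:

    ;;; Sketch only.  Bounds are inclusive; NIL means unbounded.
    (defstruct (interval (:constructor make-interval (low high)))
      low
      high)

    (defun bound+ (x y)
      ;; An unbounded (NIL) operand makes the result unbounded.
      (and x y (+ x y)))

    (defun interval+ (x y)
      ;; Bottom-up derivation: X1..X2 + Y1..Y2 = X1+Y1..X2+Y2.
      (make-interval (bound+ (interval-low x) (interval-low y))
                     (bound+ (interval-high x) (interval-high y))))

    (defun interval-intersect (derived asserted)
      ;; Top-down narrowing.  A real compiler would also emit a type
      ;; check here when DERIVED is not a subinterval of ASSERTED.
      (flet ((lo (a b) (cond ((null a) b) ((null b) a) (t (max a b))))
             (hi (a b) (cond ((null a) b) ((null b) a) (t (min a b)))))
        (make-interval (lo (interval-low derived) (interval-low asserted))
                       (hi (interval-high derived) (interval-high asserted)))))

For example, if both operands of + are known to be in 0..9, INTERVAL+ derives 0..18; intersecting that with an asserted 0..15 (say, an index into a vector of known length 16) is exactly the point where the type check gets emitted.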
If assertions depend on the values of expressions (are not static, e.g. a dynamic array bounds check), then the interaction can become quite complex.

Top-down and loop inference are crucial to having any success in inferring integer ranges -- this requires a variety of flow analysis techniques. The main reason for this is the semantics of addition: integer subranges are not closed under +, so interval arithmetic rapidly becomes uselessly pessimistic. Integers are kept bounded only by the frequent encounter of a bounded type assertion, from a declaration or from a non-generic operation, or by reaching a loop exit test against a known bound.

Specific CMU CL experience: loop range analysis is harder than it looks. Some intuitive understandings of the range of a loop are rather complex to discover from the code. Also, flow analysis technology tends to break down in unexpected ways. There is no one simple technique; you need a bag of several complex tricks. The current CMU CL only recognizes non-nested loops bounded by < where the loop index is known to be an integer from an assertion. I was surprised to discover the problem with nested loops, but there isn't any easy fix. The problem is that flow analysis can fail to find the "best" solution to the data flow equations. More recent flow analysis techniques (such as static single assignment) can discover optimistic solutions, but have problems with non-reducible flow graphs. There are no "cookbook" solutions to these problems, though I am sure they can be solved.

CMU CL gains significantly from having small integers (fixnums) smaller than the machine word size (i.e. C int), 30 bits vs. 32:

1] The extra two bits serve as guard bits, allowing intermediate results in an expression to still be represented in a single register. For example, in

    (the fixnum (+ fixnum1 fixnum2 fixnum3))

the intermediate result is provably less than 32 bits, so the extra cost of this type vagueness is small (3 shifts.)

2] Small integers have a non-heap-allocated "immediate" representation, greatly reducing number-consing.

Python (the CMU CL compiler) has the best technology for minimizing consing of 32-bit or larger numbers of any Lisp compiler I know of, but there is still a big, big difference between the ease of efficient coding with floats and with small integers.

The most troublesome operations to infer across are addition and subtraction. Multiplication and division are theoretically worse, but are much rarer. Comparison causes very little difficulty, since it doesn't have a numeric result. One subtle problem does arise with casual comparisons of floats and rationals, like "some-float = 0". Initially we only open-coded comparisons of same-type operands, but this caused problems with ZERO? in float code. A viable solution seems to be to check whether the rational has a precise float representation (which is true for 0, 1, etc.), and if so, convert it to a float; a sketch of this test appears below. This only works when the rational is a constant.

Inference over logical operations works fairly well, since integer-length is preserved and logical operands are often known to be quite small, like 8 bits. Problems can arise when manipulating unsigned full-word integers: for example, adding a signed small integer to an unsigned word integer can't be done using word-integer operations (signed or unsigned). This requires masks to be specified as large positive constants rather than small negative ones.
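For instance (a minimal sketch; the function names are mine, and the declarations stand in for whatever the inference has established):

    (defun clear-low-byte/unsigned (w)
      ;; Mask written as a large positive constant: both operands
      ;; stay in (unsigned-byte 32), so unsigned word operations
      ;; suffice.
      (declare (type (unsigned-byte 32) w))
      (logand w #xFFFFFF00))

    (defun clear-low-byte/signed (w)
      ;; Computes the same value, but the mask is a small negative
      ;; fixnum; mixing a signed operand with an unsigned word
      ;; defeats word-integer open-coding.
      (declare (type (unsigned-byte 32) w))
      (logand w -256))

Both functions return the same value for any 32-bit w; only the first keeps every operand unsigned.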
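Going back to the float/rational comparison point, the "precise float representation" test might look something like this (again just a sketch, with a made-up name):

    (defun exact-float-p (x format)
      ;; Return the float of the given FORMAT that is exactly equal
      ;; to the rational X, or NIL if no such float exists.
      (let ((f (coerce x format)))
        (and (= (rational f) x) f)))

(EXACT-FLOAT-P 0 'single-float) returns 0.0, so "some-float = 0" can be open-coded as a same-type float comparison, while (EXACT-FLOAT-P 1/3 'single-float) returns NIL, so that comparison has to take the general path.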
This interest in integer type inference suggests that someone still favors the "do it like Common Lisp" approach to small integers. I think that this would be a really bad idea. Integer type inference is nonintuitive and frustrating for users. I have had CS PhDs tell me that they proved Lisp arithmetic was 20x slower (because they didn't know how to put in the right declarations, e.g. to allow the DOTIMES hidden temp to be correctly typed.) A closed small-integer type is the norm; Lisp is the only exception I know of. Consider the alternatives:

-- If we choose a closed small-integer type, then we sometimes get overflow errors at run time. These overflows will be localized, clearly identified, and easily fixed, either by changing to extended arithmetic or (more often) by rearranging an expression to avoid spurious overflow.

-- If we choose quiet overflow into extended integers, then people will write slow Dylan programs and blame it on the language (with some fairness.) Those who know better will waste time locating type inference problems and putting in meaningless declarations.

I'd also argue that we should in effect make short-integer arithmetic the default by making integer literals short integers. In my experience, when you really want an extended integer, you know it -- the only times I've ever actually wanted extended integers were:

1] Doing integer type inference.

2] Printing floating point numbers.

More commonly, quiet overflow has introduced obscure efficiency bugs because an expression was inadvertently written in a way that could generate an extended intermediate value when a short result was ultimately being computed.
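As a concrete illustration of that last point (my example, not from CMU CL): the classic midpoint computation.

    ;; The intermediate (+ lo hi) can exceed the short-integer range
    ;; even though the midpoint itself always fits.  Under quiet
    ;; overflow this silently conses an extended intermediate; under
    ;; a closed type it signals a localized overflow error.
    (defun midpoint/naive (lo hi)
      (floor (+ lo hi) 2))

    ;; Rearranged (assuming lo <= hi) so that no intermediate result
    ;; exceeds the range of the operands:
    (defun midpoint/safe (lo hi)
      (+ lo (floor (- hi lo) 2)))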