does the info increment test really emphasize the worst-localized or
least responsive axis?  It seems we should combine the two dimensions
with min, or something, not take the mean.
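As a toy illustration of why min beats mean here (both combiners and the per-axis info values are hypothetical, not names from the code):

```python
# Sketch: combining per-axis info increments with min instead of mean.
# info_x, info_y are hypothetical per-axis information increments; the
# mean can hide one poorly observed axis, while min flags it.

def combined_info_mean(info_x, info_y):
    return 0.5 * (info_x + info_y)

def combined_info_min(info_x, info_y):
    return min(info_x, info_y)

# A feature well localized in x but not in y: the mean looks healthy
# (~1.05), while min exposes the weak axis (0.1).
ix, iy = 2.0, 0.1
```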

note: rear scanner at normal sensitivity, others at high.

fix shmemscantrackout to respect the preallocated size limits by dropping data if necessary.


todo:
 -- make sure float traps are off for demo.
 -- retune sensitivity parameters to not be too liberal, especially
    min info, min moving sigmas, etc.
 -- increase noise-adaptive statistics time constant?
 -- talk to christoph about quantitative evaluation using bus data.
 -- problems with bogusly moving track being valid yet having high
    velocity (2 m/sec), even ratcheting up
 -- quantitative eval
 -- Nicolas about GD state replay


reduce number of friend classes, especially eliminate friend class
datmomod by adding angular velocity accessor.

history_filter_order should really be a parameter, perhaps a time.
Currently there is no way short of recompiling to compensate for
different scan rates.

initialize start position in history analysis from latest measurement
or average of a few recent measurements.  If there is a current
*position* error, we don't want that to be interpreted as a velocity
error.

Replay:
 -- still a lot of spin-looping going on, perhaps partly why we can't replay
    faster than about 0.4x real time.  Suppress rendering if nothing
    has actually changed e.g. with some sort of something-changed
    flag?
 -- add debug output in tracker when we drop scans (e.g. time
    increment too big.)
 -- Do object overlay in BoschCam?
 -- object IDs in side cams?
 -- point side cams backwards?

What is the point?
 -- support experiments with:
    -- multi-scanner
    -- off-road, occlusion, etc.
 -- generate new movies and presentation materials showing 360
    performance, non-urban operation.
 -- support anything for paper?  I don't know.  Maybe we could do some
    NL11 experiments for the paper, but it's not clear we need it
    unless we work in the new capabilities.  The improved
    occlusion/dropout doesn't even exist yet.


Was ped accel messed up even before velocity correction?

Multi-model stuff?

Read up more on tracking (IMM, etc.) and non-linear filtering (unscented.)

improve response time of angular estimate?
better solution to rotation estimate (e.g. nonlinear filter)?

problems:
  spurious longitudinal convergence

Why does KF converge to highly symmetrical (uncorrelated) covariance
even though all the measurements are highly correlated?  Is this
intrinsic to the KF, or is this due to e.g. the random rotation of the
feature.  Perhaps make the lateral error also large when the
longitudinal is large to reflect orientation uncertainty.  Could also
use actual fit error in finding lateral error estimate.

If we ever get a longitudinal motion estimate on a vague line, it
sticks around, and even becomes validated by covariance convergence.
This is wrong.  If we keep resetting (residue mean), this can go on
forever.

0.5 degree support.
    pose interpolation problem.
    time-synch each point individually using motion estimate?


If Mahalanobis is too big (but less than associate threshold),
suppress velocity change?

for moving test, compare current pos with actual first seen position
by e.g. likelihood.  If different, evidence of motion.  Or compare
initial segment updated by motion estimate with current pos (or
equivalently, project current state back in time).
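The back-projection variant could be sketched like this (function and argument names are illustrative, and the squared-Mahalanobis form is my assumption):

```python
import numpy as np

def moving_evidence(pos_now, vel_now, t_now, pos_first, t_first, cov):
    """Project the current state back to the first-seen time and compare
    with the actual first-seen position.  A large squared Mahalanobis
    distance is evidence that the track really moved.  All names here
    are illustrative, not from the existing tracker."""
    dt = t_now - t_first
    predicted_first = pos_now - vel_now * dt      # back-project current state
    d = pos_first - predicted_first
    return float(d @ np.linalg.solve(cov, d))     # squared Mahalanobis distance
```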

Add some sort of "score" scheme where bad track events like resetting
and d/dt limiting count against the track, and normal good behavior is
in its favor.  Min score is required to go valid.  Perhaps this can
replace the feature_associated_count, which is now somewhat degraded.
also last_associated
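A minimal sketch of such a score scheme, with made-up event weights and threshold:

```python
# Sketch of a track "score" that could replace feature_associated_count:
# good cycles add credit, bad events (reset, d/dt limiting) subtract,
# and a minimum score gates going valid.  All weights are made up.

GOOD_CYCLE = 1.0
RESET_PENALTY = -5.0
DDT_LIMIT_PENALTY = -2.0
MIN_VALID_SCORE = 10.0

class TrackScore:
    def __init__(self):
        self.score = 0.0
    def good_cycle(self):
        self.score += GOOD_CYCLE
    def reset_event(self):
        self.score = max(0.0, self.score + RESET_PENALTY)
    def ddt_limited(self):
        self.score = max(0.0, self.score + DDT_LIMIT_PENALTY)
    def can_go_valid(self):
        return self.score >= MIN_VALID_SCORE
```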

for vague line ends, reset the longitudinal position on each cycle.
This way vague ends can track o.k., so can have reasonable residue.


what were we getting from the complex->compact association taboo, and
are we still getting it?

maybe put back factor relating line tolerance to size?

maybe not really right to use closeness with 1-1 matching.  or at
least that way, we may have startup problems.

allow older (tenured!) tracks more slack w.r.t. data association or
something.

go back to stealth car:
    /home/ram/data/10-06-03.04.12.08_improve_mar24/parked_car/

consider rotation distance in association test.

check that rotation of covariance and residue is working right.

noise-adaptive heading estimation.



ideas:

We could do better with clutter if we did a better job of associating
clutter tracks.  Then we would be in a position to recognize that a
track looks clutter-like over time, and also to continue to disqualify
that track even if it changes into a shape that is not so clutter-like.
For ground returns, we could for example do a polynomial fit, and the
smoothness could also be an indicator of ground return, as well as the
model providing a basis for data association.

We could also probably do a pretty good job of assessing the
similarity of complex tracks by correlating their point lists in scan
order, with a bit of searching to find the best-match position.  I
guess this is like the idea of finding the closest line-point distance,
but with the idea that scan order imposes a 1d topological constraint
so that it isn't an NxN problem.
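A sketch of the scan-order correlation, assuming both tracks supply their points as Nx2 arrays in scan order; the offset search window is arbitrary:

```python
import numpy as np

def best_alignment_error(pts_a, pts_b, max_shift=3):
    """Compare two complex tracks' point lists in scan order by sliding
    one against the other over a small window of offsets and taking the
    mean point-to-point distance at the best shift.  The 1-D scan order
    avoids an NxN nearest-point search.  Illustrative sketch."""
    best = float("inf")
    for s in range(-max_shift, max_shift + 1):
        a = pts_a[max(s, 0):]
        b = pts_b[max(-s, 0):]
        n = min(len(a), len(b))
        if n == 0:
            continue
        err = float(np.mean(np.linalg.norm(a[:n] - b[:n], axis=1)))
        best = min(best, err)
    return best
```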

We could do quadratic fitting without doing major violence to the
current code, mainly affecting segmentation itself.  The idea is to do
quadratic fit instead of linear fit, but then use the tangent line at
the center of the parabola.  This should work better with nominally
rectangular objects having rounded sides.  We'd want to limit our fit
to rather flat parabolas, and also we'd probably want to stick with
linear fitting for small numbers of points so that we don't overfit
the data e.g. for long-range tracks.  We could also then do a better
vague test, as we can currently be confused by round noses.
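A possible shape of the quadratic-then-tangent fit, with illustrative curvature and point-count thresholds (I use the mid-x of the data as the "center"):

```python
import numpy as np

def quad_tangent_fit(x, y, max_curvature=0.1, min_points=6):
    """Fit a quadratic and return the tangent line (slope, intercept) at
    the mid-x of the data; fall back to a plain linear fit for few
    points or overly curved parabolas, to avoid overfitting long-range
    tracks.  Thresholds are illustrative."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if len(x) < min_points:
        b, c = np.polyfit(x, y, 1)
        return b, c
    a, b, c = np.polyfit(x, y, 2)
    if abs(a) > max_curvature:            # reject strongly curved fits
        b, c = np.polyfit(x, y, 1)
        return b, c
    x0 = 0.5 * (x.min() + x.max())
    slope = 2.0 * a * x0 + b              # derivative at the fit center
    intercept = (a * x0 ** 2 + b * x0 + c) - slope * x0
    return slope, intercept
```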

Vague test that looks at actual end point spacing.  Should help with
rounded car noses.

problems with car tracks going complex, esp. with rounded ends.  Perhaps
relax the linear-fit criterion a bit, and add a smoothness criterion?
quadratic fit?  Also sometimes, car returns are a bit noisy.  add
other criteria to discriminate against veg-type returns?  Is the return
density as high as it should be?  Are we seeing through it?

Even when we have really bad residue, it doesn't seem to affect the
velocity covariance much.  More process noise needed, I guess.  Also,
our min noise is often larger than the actual residue.

sometimes tracks get wedged without ever hitting the residue-mean
reset.  Should we try harder to save wedged tracks to keep them from
getting wedged?   

probably should discriminate against complex tracks more than we do.
we should be pretty skeptical about velocity acquired when a track
initializes as complex.

track center seems odd sometimes with corners.  I guess this is just
the expected effect of line extending, and since nobody is using the
center, no big deal.

tracks break up too much.  At a minimum, we should try to handle the
track-with-hole problem better.  When the track comes back together,
give preference to the older track.

I don't like that we reinitialize so often.  It seems that "forced fixed"
(i.e. reinitializing features) is the main way that tracks become
non-moving.  What does this mean?

consider orientation in likelihood, e.g. via rotation_noise.  also,
line/corner distance might benefit from being rewritten, in that it currently
considers feature matches that don't make geometric sense, and also does
unnecessary direction-fu.

occasional transient weirdness where line goes between two old corner ends.

force tracked corners to be right-angle.

is SquareMatrix being instantiated so that the matrix size is constant
in the code?  Why does transpose take so long?

for non-vague points, the noise minimum isn't having a whole lot of
effect because the info increment is >>1 even with it.

Perhaps consider track center in data association test.  Look at
Mahalanobis distance of position increment too.  higher min measurement
noise for center.  different distance threshold.

make sure we can deal with cycle counter wrapping around.
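Wraparound-safe comparison is standard; assuming a 32-bit counter, a signed difference does it:

```python
# Wraparound-safe cycle-counter comparison, assuming a 32-bit counter
# (the actual width is an assumption).  Interpreting the difference as
# a signed value makes "later than" and elapsed-cycle computations
# survive the wrap.

WRAP = 1 << 32

def cycle_diff(a, b):
    """Signed number of cycles from b to a, correct across wraparound."""
    d = (a - b) % WRAP
    return d - WRAP if d >= WRAP // 2 else d
```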

there may be more efficient KF equations than the ones I'm currently using.
Also, maybe the "stabilized Kalman filter" would be more robust.

add associate likelihood factor related to relative number of points,
i.e. multiply by point count ratio.  

If we add rotational estimation (theta_motion, v_theta) we need to
consider how this interacts with the linear motion.  It might make
some sense to have different measurement and state frames, where the
state is position (motion), scalar speed, and theta_motion, v_theta.
This way, there would be no need to explicitly rotate the linear
velocity in phi.  However, it seems pretty equivalent, as we'd still
need to have the system represent the way that theta mixes the speed
into the position prediction.  Or in fact, it seems it may be worse
because theta must be the true heading, and not just a theta-motion.
With the velocity vector scheme, we don't need to disambiguate the
actual heading in order to find the angular velocity.


ideally we should represent the fact that line ends have drastically
different measurement error along the line vs. normal to the line.
Note that the distribution of the resolution-related lateral drift is
more of a bounded uniform distribution, and not gaussian.  However,
gaussian is probably a reasonable approximation.
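Building the anisotropic end-point covariance is just a rotation of a diagonal matrix into the line frame (sigmas illustrative):

```python
import numpy as np

def line_end_noise(direction, sigma_along, sigma_normal):
    """Build a 2x2 measurement covariance for a line end with large
    error along the line and small error normal to it, by rotating a
    diagonal covariance into the line frame.  Sigmas are illustrative;
    the gaussian shape is the approximation discussed above."""
    ux, uy = direction / np.linalg.norm(direction)
    R = np.array([[ux, -uy], [uy, ux]])          # rotation into line frame
    D = np.diag([sigma_along ** 2, sigma_normal ** 2])
    return R @ D @ R.T
```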


extend segments along lines, or somehow jump over holes.  either
combine segments before association, or associate a track with
multiple segments.  Or associate raw points with existing track before
or during segmenting.  Or split a track in two, then recombine?


If the track is moving, we do know which is the front and which is the
side, so we could use different length guesses.  Also (easier) we can
guess which side we are looking at by its length.

vary process noise based on bus speed to give a better moving test.

maybe we could get better velocity estimates by comparing positions
across a time-lag.
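A sketch of the time-lag comparison, with a small ring buffer of (time, position) samples; lag length is arbitrary:

```python
from collections import deque

class LaggedVelocity:
    """Velocity estimate from positions compared across a fixed time
    lag, as an alternative to per-cycle differencing.  Illustrative
    sketch, not the existing filter."""
    def __init__(self, lag_cycles):
        # keep lag_cycles+1 samples so first and last are lag_cycles apart
        self.buf = deque(maxlen=lag_cycles + 1)
    def update(self, t, pos):
        self.buf.append((t, pos))
        if len(self.buf) < 2:
            return None
        t0, p0 = self.buf[0]
        t1, p1 = self.buf[-1]
        return (p1 - p0) / (t1 - t0)
```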

match all points when doing data association, not just corners.  This
could at least cause tracks to die when we get a bad V estimate. 

do an ICP-like thing where we use the match error to form the motion
estimate.  probably slow.


The natural history of a track:

Track is created at long range.  Actual measurement error is high due to
resolution limits and scanner/pose discrepancy.  We have no track history to
establish its reliability, so we start out with our priors.

For line/corner tracks, the ends will be vague, and the shape will switch
between line and corner.  As long as the ends are vague and there is no
corner, we really have only a speculative idea of the lateral V (e.g. assuming
what we see really is the end.)

As soon as we get a corner or non-vague end, we can start to get a V fix.  The
corner/outer end should associate with each other, so the corner/line flipping
doesn't prevent a fix as long as the line-end doesn't go vague.  The flipping
does add some measurement error as the corner/last end wanders around.

Eventually we get a solid corner, and we should soon get a valid fix as long
as the residue is good enough.  If the car is manuvering, this will increase
the residue and tend to delay getting a good fix.  A high residue mean at this
point suggests manuvering.  

Then, in some order, we lose the front corner, but still have a non-vague front,
then the back corner goes non-vague, and the front goes vague.  There is
considerable risk of spurious motion in the direction of the bus as we wipe
our scan across the vehicle.  The time with no corner, and first getting the
rear corner are particularly bad.  These changes are particularly problematic
because:
 1] they happen when close, so even small velocities can create a short time
    to impact.
 2] they happen when we formerly had a good stable fix, so we might tend to
    assume that the fix is still good if we aren't careful.

The process of losing the track is similar to gaining the track.  There is
perhaps greater risk of bad motion detection because an existing track gets
more credibility than a brand-new one, but the risk of false alarm is
nil at this point.


Requirements and methods: what do we really want and how are we getting it?

-- pos and V with covariance output.  How covariance is used is an open
   question (perhaps not, currently.)
-- low rate of false alarms, and good chance of predicting collisions.  It may
   not be important to track extreme maneuvering.  We'd like to be able to
   track anything, of course, but I think it's safe to assume that most
   accidents are caused by vehicles in a normal dynamic regime who just happen
   to make the wrong maneuver (e.g. fail to stop.)

the best way to understand the application requirements is to get Christophe
using this code.


What are we getting from KF?
 -- estimates velocity from noisy data.
 -- allows us to get a fix faster than a fixed TC filter by going through a
    startup period where the convergence is faster but the output is known to
    possibly be noisy.
 -- in comparison to a fixed large delta-T scheme I think it should get better
    performance w.r.t. noise, and also permits the possibility of extracting
    useful info earlier when the magnitude of the motion is large.
 -- provides a well-founded framework for adapting behavior when measurement
    noise is dynamic.
 -- covariance output can be input to another KF!

What do we get from state residue?
 -- adaptive estimate of state covariance for output and validity testing.
 -- high mean indicates modelling error (settling or acceleration)
 -- can be used to estimate adaptive process noise.
 -- does not incorporate any prior information or information from sensor
    model (scan angle)
    [### not entirely true.  the state residue depends on the Kalman gain,
    which is computed from the covariance.  If the KF covariance is low, this
    reduces the Kalman gain, which reduces the state variation, and thus its
    covariance.  So the KF covariance is sort of multiplied with the observed
    process noise.  This is a nonideality for purposes of process noise
    estimation, but does combine the covariance estimates in a possibly useful
    way. However, the interaction may not be what one would hope.  If our
    input noise model is increased while holding the observed noise constant,
    this will cause a *reduction* in the state residue, whereas we would
    prefer an increase because we increased the noise model due to some
    situation which increases possible worst case error.]
 -- If time-invariant, a reasonable estimate of output noise. However, not
    time-invariant.  There is lag in responding to any change in behavior.  If
    a feature suddenly becomes noisy, we will see spurious V.
    

What would we get from adaptive process noise?
 -- this would fold residue information back into the KF covariance, creating
    a more accurate covariance output, and also helping us to track
    maneuvers better.  However, it could eliminate the KF's response to changes
    in the noise model, as in the case where the noise is conservative, we
    would reduce the process noise to create a covariance that matches recent
    behavior.
 -- however, we can't let the process noise get too low, so this linkage comes
    unhooked when the apparent process noise is low.
 -- If measurement noise is conservative, we will infer meaningless negative
    process noise, uncoupling the linkage entirely.
 -- A multi-model approach (fixed, CV, CA) would likely be the best answer to
    time-varying process disturbance, but that may be computationally
    prohibitive. 


I think that the conclusion is that:
 -- Process and measurement noise should be set by worst-tractable-case
    criteria.  Because behavior is time-varying, we can't allow past behavior
    to have much effect on filter response.  We can incorporate known
    situation dependence of scanner performance, and could possibly use
    multiple process noise models for e.g. compact objects (pedestrians)
    vs. others, as in this case there is [1] reasonable classification
    accuracy, and [2] being a pedestrian is not time-varying.
 -- If state residue is low, we can have confidence in our fix.  Although the
    residue test will fail if a track suddenly becomes unreliable, it works in
    the common case where the track is bad all along.

KF covariance -- quasi-worst-case
residue -- apparent typical behavior

under this interpretation one would expect that the residue would normally be
less than the KF covariance.  (this is reinforced by the fact that increasing
covariance reduces the residue.)  However, since the KF behavior depends only
on the ratio of process and measurement noise, no particular relation need hold
if the filter parameters are chosen by ad-hoc tuning.

But supposing that the noise model was in some sense correct, then the residue
exceeding the KF covariance could be regarded as an indication of significant
deviation from the prior quasi-worst-case model, and a loss of confidence in
the output.

It seems that under this theory, the validity test should be:
  residue < KF covariance < threshold parm

This not only forces a particular scale factor on the noise model, but also
tends to lead to adjustments in the process/measurement noise ratio.  If, for
example, maneuvering caused a larger residue which exceeded the KF covariance,
then the track would go invalid.  If this was deemed too conservative, the fix
would have to be to increase the process noise (or possibly increase the
residue TC.)  It is not clear whether this would result in conflicting demands
on the noise model for output covariance vs. filter dynamics.  It seems it
might work, reduces the number of parameters, and perhaps most importantly,
assigns a meaning to the covariance.
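The proposed test is a one-liner; names are illustrative:

```python
def track_valid(residue_var, kf_var, cov_limit):
    """Validity test proposed above: the observed residue should not
    exceed the KF covariance (deviation from the quasi-worst-case model),
    and the covariance itself must be under an absolute threshold.
    Names are illustrative, not from the code."""
    return residue_var < kf_var < cov_limit
```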

What should the residue TC be?  If we accept we are hosed when a track
suddenly becomes unreliable when it was formerly reliable, and if we have a
startup mode for t<TC, then there is not much pressure in the direction of a
short TC.  A longer TC will be able to ride through short periods of
maneuvering, and will also be able to maintain invalidity across short periods
of stability.

The main reason that I can see for any decay is that after passing,
reliability can degrade again with increasing distance (not such an important
error in our app), and also scenarios with long tracks due to the bus being
stopped or same-direction traffic.  For changes in reliability due to range
change, it might make most sense to have a distance-related decay, on
range-to-target or odometer.  In the case of loss of reliability due to
increasing range, we have a chance of handling this because it is gradual.
However, we'd also want a time decay when stopped.  Probably simplest would be
to decay based on time or distance traveled, whichever is greater.  However,
at high speeds we just won't have enough time to adapt twice as we drive by
while still keeping a useful sample size. So maybe it makes more sense to
stick with pure time decay.


Noise adaptive & covariance estimate:

Not clear that being measurement noise adaptive is really what we want.  For
one thing, a big part of what we want is an output estimate of state
covariance, and this doesn't get it for us.  The statistics of the state
residual (innovation) do however estimate the state covariance.  This is what
you would use for estimating process noise.
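One way to accumulate those innovation statistics is a simple exponential filter (sketch; the time-constant handling is my assumption):

```python
class ResidueStats:
    """Exponentially filtered statistics of the state residue
    (innovation), usable as an adaptive estimate of state covariance and
    hence process noise.  The time constant tc is in the same units as
    dt.  Illustrative sketch, not the existing estimator."""
    def __init__(self, tc):
        self.tc = tc
        self.mean = 0.0
        self.var = 0.0
    def update(self, residue, dt):
        k = min(1.0, dt / self.tc)
        self.mean += k * (residue - self.mean)
        self.var += k * (residue ** 2 - self.var)
```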

We could try being process noise adaptive, but this is worrying because we
have to take care not to let the process noise get too low, or when the object
starts accelerating we will be in denial.

In some ways, being measurement noise adaptive is more conservative than being
process adaptive.  If an object is well behaved for a while, the measurement
noise will go down, causing us to increase the credibility of any maneuvering,
whereas noisy tracks will get slow response, tending to minimize any inference
of spurious velocity from them.  The main risk is that if a track becomes
noisy (e.g. due to data association problem) after being well behaved, we may
infer a spurious velocity at that time.

Really there is no point in allowing the measurement noise to go below a level
that allows "fast enough" response to maneuvering.

There could be some benefit in allowing the noise to go up for noisy tracks,
in that it would tend to minimize spurious V's.  And insofar as the track is
just noisy, and not pure garbage, it would allow us to estimate V more
accurately (albeit slowly.)  However, it would also tend to lead to spurious
confidence in the result in cases where we heavily filter random clutter.

In contrast, with process noise adaptive, with a well-behaved track, we acquire
a high confidence that we know the true state, which is very dangerous given
the no-manuvering model.  So the process noise could not be allowed to go
below a certain level, or we risk false negatives, whereas with measurement
adaptive we risk false positives.

Well, so strike the argument about measurement adaptive being more
conservative.  It is in the sense of being most likely to alarm, but given
that we are mainly concerned about too many false alarms, that is probably not
a win.

Once the KF has reached steady-state, there are two fixed time constants, the
KF response and the residue filter response.  Note that the KF TC of position
and velocity will differ, with velocity much slower.  We would want the
residue TC to roughly match the velocity response TC.  The steady-state
velocity response TC has to do with the process noise, i.e. really modelling
error. 

It would be mostly harmless to adapt the process noise as long as a good floor
is chosen.  However, if the measurement noise is unrealistically large, we
will estimate small or negative process noise, in which case the adaptation
would be having no effect.


Normal V can be estimated accurately up to
the limits of the scanner/pose discrepancy.  Is normal V even useful?  Not in
the parked car case, since what we really care about is the lateral V.

However, since cars can't move sideways, when a car starts pulling out of
parking, a component of the normal velocity will intersect with our path.  So
maybe it would be useful.  However, if the car is turned, we should see the
corner.  So maybe it isn't useful.

That's an interesting point, that we could embed the idea of cars not moving
sideways in an assumption that vague lines are not moving longitudinally.
However, cars do move forwards, and in that case we could still fail to see
the corner if the car is driving normal to us at an intersection.


Fusion:

maybe we really want just one KF per track, and then come up with several
measurements per cycle based on feature motion. 

For example, generally the features are a much better indication of the motion
than they are of the vehicle center position.  Yet the measurements
materialize as position measurements.  What we can do is present the
measurements as position, but transform them so that they don't affect the
relative location of the position estimate w.r.t. the feature.  Basically, if
the delta position of a feature is D, then we say that the new measurement is
the current position estimate + D.  So the feature motion is a measure of the
motion of the target, but not of the target's actual location.

Then we need some way to establish the track position.  We can do this more or
less as now, and present this as a measurement with high enough process and
measurement noise so that we get appropriate position smoothing without
affecting the velocity estimate much.  Or there's probably some way to hack it
so that there's no contribution to relative motion.  For one, we could just
directly zero the velocity state increment on the position update (or the V
component of the Kalman gain.)
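A sketch of both pieces of the fusion idea, with assumed state layout [x, y, vx, vy] and a 4x2 gain:

```python
import numpy as np

def feature_motion_measurement(track_pos_est, feature_delta):
    """Fusion idea from above: a feature's frame-to-frame motion D is
    presented as a position measurement equal to the current position
    estimate plus D, so it constrains motion but not absolute location.
    Names are illustrative."""
    return track_pos_est + feature_delta

def position_only_update(state, K, innovation):
    """Position update with the velocity rows of the Kalman gain zeroed,
    so establishing the track position does not perturb the velocity
    estimate.  Assumed shapes: state = [x, y, vx, vy], K is 4x2."""
    K = K.copy()
    K[2:, :] = 0.0          # zero the velocity component of the gain
    return state + K @ innovation
```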

For this fusion filter, we *do* want adaptive measurement noise.  We'd need to
compute the measurement residue for each feature separately.  I'm nervous,
though, about the business of subtracting the KF covariance from the residue to
get the measurement part, because of the problem that if process noise is too
high, then the adaptive measurement noise will be too low.  Though it lacks
any theoretical rationale, it might work better to use the entire residue as
measurement noise.  I guess this is what would happen if process noise were
negligible w.r.t. measurement noise.


However, this would kill, or at least require revision of, my beautiful theory
about how the covariance is worst case, residue typical, and residue <
covariance means validity.  Because the covariance would then reflect the
typical recent behavior, and also if we let measurement noise get too small,
then this reduces smoothing of glitches (increases their apparent
significance.)  Not clear what we would get from residue < covariance < limit
that we don't get from residue < limit && covariance < limit, and if residue
were incorporated into covariance by measurement adaptive, just
covariance < limit.

I suppose if we just put a floor on the measurement noise, things would
probably work o.k.  Then the validity test would be only on the covariance.
We do have to be careful that our min measurement noise is high enough and
process noise low enough that the low early residue doesn't cause significant
spurious convergence.  Or we could run with a fixed initial measurement noise
until the residue sample is large enough that we can compute a meaningful
residue variance.

Why did I think that was so bad before?  For one, in this scheme, we don't
care that the measurement residual doesn't estimate velocity error.  And I was
worried that the result would be negative when we use conservative process
noise.

Do we want the state residual of the fusion filter?  This is perhaps not so
necessary, as we can use the KF covariance for our output, and for validity
testing.  It would already incorporate residue information via the measurement
noise adaptation.  Process noise adaptation is questionable because the
disturbance is so strongly time-varying.

What do we do with vague?  I guess it has to be considered an extra
noisy measurement.  We'd need to override the adaptive stuff in this
case, with a fixed high value of noise longitudinally.

if a feature goes from non-vague to vague, we are losing that feature,
and may want to reset our confidence, residue, etc.

The idea of making vague points be considered noisy works better with
the fused filter, since the vague points no longer have an independent
track which can wander out of control.


Tuning:

difficulties with track initialization.  Hard to find parameters that
give slow enough velocity convergence at start of track, yet otherwise
reasonable, e.g. for info increment.  Now, the main reason that we
care about the early velocity sigma is for the
initial_d_dt_sigmas_per_sec, which is rather ill-founded anyway.  But
the problem is that when we set initial_d_dt_sigmas_per_sec high
enough so that we initialize well, then the V sigma never goes low
enough for the normal max_acceleration limit to take effect.  We could
force this by only using the d_dt_sigmas when the track feature
associate count is low.

Also, we'd like slower response on V and (especially) A once the track
is initialized.  The accel estimate seems pretty noisy currently.
However, the more we do this, the poorer tracking is when accel
changes, e.g. in rotational motion.

It seems that we can force both sorts of change by increasing the
position measurement noise, but this results in values very much at
odds with the adaptive estimate.  I suppose we could add an ad-hoc
scale factor multiplying the measurement residue to get the desired
dynamics while keeping the adaptive effect of weighting noisy features
less. 

It seems wrong that the acceleration sigma scarcely decreases during
track startup.  Yet, if we increase the initial acceleration noise,
this weakens the no-acceleration prior and we tend to get more
acceleration overshooting on startup.  This can possibly work if we
increase velocity noise even more, but that didn't seem to work all
that well even when initial V noise reached somewhat nonsensical
levels.


Rotation:

We already have the track basis vectors, so the rotation is
straightforward to determine, except that we aren't forcing the
corner tracks to remain right-angle.

If you look at the corner with rotation model, our state is really
3rd order, the two arm lengths and the rotation. (?)  But the lateral
motion of the arms definitely says something about linear motion.  So
it's true that the shape only has that many parameters, but our motion
is overdetermined by our features, and the whole idea is to extract
all the useful info.

We could add a squaring step of some sort.  But we don't have any good
idea of how confident we are of the feature positions, since they
don't each have their own covariance.  But as long as the error is
small, it may not matter much if we just evenly weight the two sides.
Or we could just ignore the problem entirely under the small-error
assumption.

Note that in reality with current corner fitting, the orientation is
entirely determined by the long side.  Note that the track basis is
just an encoding of the theta.  In tracks, we can update the basis
purely from the theta estimate and not have any explicit coupling
between the track endpoints and the rotational estimate (?).  I guess
this can work because all we really want to estimate is angular
velocity.  Well, except that theta


Overlap stuff:

Change data association to be based on overlap rather than distance.

To see if a segment overlaps a track, grow the track's box by a nudge
(~segment distance), then see if any of the segment's points fall in the
expanded track box.
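With axis-aligned boxes this is a few lines (names and the box tuple layout are my choices):

```python
def grow_box(box, nudge):
    """box = (xmin, ymin, xmax, ymax); expand by ~segment distance."""
    x0, y0, x1, y1 = box
    return (x0 - nudge, y0 - nudge, x1 + nudge, y1 + nudge)

def segment_overlaps_track(track_box, segment_points, nudge):
    """A segment overlaps a track if any of its points falls inside the
    track's box grown by a nudge.  Axis-aligned boxes for simplicity;
    the oriented-box refinement discussed below is not sketched here."""
    x0, y0, x1, y1 = grow_box(track_box, nudge)
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in segment_points)
```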

For linear tracks, the box is based on the features.  We could either grow up
to a canonical object model, or we could just turn a line into a strip 2x
segment distance wide.  There doesn't seem to be any very compelling reason to
use a prior object model, and it could create problems with spurious overlap.

For compact/complex, we can just use the bounding box, though especially for
valid moving complex tracks, we'd probably get better results by using a box
with orientation based on the heading.  

To first order, we do data association by finding all of the segments that
overlap with each existing track.  If just one segment overlaps, that's the
easy case, associate with that.  If multiple segments overlap, then the
first-order interpretation is that the track has split.  One of the segments
is associated with the old track, and one creates a new track.  The split
track should be initialized with the same state, and probably the same
covariance as the old.  Normally when moving tracks split it is spurious, so
the two tracks are really part of the same object, and if the objects are at
rest, it is probably harmless to start at zero speed with low covariance.

Splitting needs to be somewhat intelligent.  We need to look at what
association makes the most sense, and probably want to forcibly reinitialize
the end feature on the split side so that we don't freak if the end was
formerly considered non-vague.  If only one of the segments has the same
shape-class as the track, then that is probably the best match.  If a line
splits in two, then the old track should get the better end (the non-vague end
closer to the scanner.)  The issue of giving the better data to the old track
isn't crucial, though, as we can keep the old id if the tracks remerge.
Anyway, ID continuity is secondary to the issue of preserving the dynamic
state.

We can optimize the overlap test quite a bit by comparing the world bounding
boxes first.  If the tracks overlap, then the bounding boxes will also.
Usually, (but not always), tracks with overlapping boxes do overlap.

To detect tracks merging, we need to be able to determine when two tracks
overlap.  To keep it the same as segment/track overlap, we'd need to have
valid points for tracks.  So we'd need to predict the track point locations
using the dynamic model.  I guess the simplest thing is to check all n^2
overlaps after each iteration.  However, it would probably also work to only
check whether dying tracks overlap with some other track.  The only problem
would be if the track death timeout is long enough so that a track can
entirely cruise through an object before dying.  Or we could only check the
tracks that failed to associate.

Of course, we are already doing a n^2 comparison for association.  There are
of course smarter algorithms that could be applied if necessary.

There may be problems with clutter tracks due to e.g. walls, as no matter what
orientation you draw the bounding box in, it can include substantial area that
is outside the object.  One thing we can do is give non-complex tracks first
crack at associating, and be pretty skeptical about any belief that anything
overlaps with a complex track.

We could do better with complex tracks that fit line or corner reasonably
well, but not well enough to be linear.  In that case, we could use the
oriented bounding box with reasonable confidence.  Perhaps we want to
distinguish such tracks.  APPROXIMATE_LINE, APPROXIMATE_CORNER, COMPLEX?
Or a bit flag which can say that LINE/CORNER is fuzzy, then we could treat
more like complex, e.g. being skeptical about setting to valid/moving.
Or it could really be quantitative based on fit quality, except there are also
those decision criteria in segmentation.  e.g. if it is ambiguous, then don't
consider orientation for omega.


Use fit quality in determining prior noise?


Can use position covariance to determine outline grow increment.

 -- Do pre-pass finding all track/segment overlaps, and annotating the tracks
    and segments with their overlappers.
 -- scan tracks in order from oldest to newest (_tracks already sorted this
    way.)
 -- track/seg overlap if the world bounding boxes overlap (for quick test),
    and some points in each track fall in the other's oriented outline box.
 -- If the match is 1 to 1, associate the track and segment as normal.
 -- Of the segs that overlap, find the seg most similar (sum of inverse
    feature distance.)  Reinitialize any features in the track that are too
    far from the segment.  Associate any features close enough.  This case
    also handles reinitialization due to track joining.
 -- In the other segments not associated, set split_from and preinitialize
    filters from our state.  If at the end they have not been associated with
    any other track, then create a new track with the split annotation and the
    copied filter state.
 -- If we want to associate a segment, but that segment has already been
    associated (with an older track due to scan order), then there may be a
    join.  If we have another overlap segment (next closest), we can go on and
    try that one.  If we use up all of our overlap segments without
    associating, then annotate our track as having joined the segment's track,
    and don't associate.  When our track dies we will report it as having
    merged.  Whenever we associate a track, we clear the merged annotation.
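The scan-order claim/join logic above can be sketched roughly like this. This is an illustrative Python sketch, not the real implementation: the `Track` class, the `joined_with` annotation, and the assumption that `overlaps[track]` is ordered most-similar-first are all hypothetical.

```python
class Track:
    """Minimal stand-in; only the fields the sketch needs."""
    def __init__(self, name):
        self.name = name
        self.joined_with = None  # track we appear to have merged into

def associate(tracks, overlaps):
    """Sketch of the association pass.  `tracks` is ordered oldest
    first; `overlaps[track]` lists its overlapping segments, most
    similar first."""
    seg_owner = {}                        # segment -> claiming track
    for track in tracks:
        for seg in overlaps.get(track, []):
            if seg not in seg_owner:
                seg_owner[seg] = track    # normal association
                track.joined_with = None  # associated: clear merge note
                break
            # Segment already claimed by an older track: possible join.
            # Remember it, then go on and try the next-closest segment.
            track.joined_with = seg_owner[seg]
    return seg_owner
```

If a track exhausts its overlap segments without claiming one, it is left annotated as having joined the claiming track, and the merge is reported when it dies.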


track death notification requires some thought, since currently the track is
deleted immediately, so all info is lost before it can be reported out.
Move the dying track to a separate delete list.  Report out tracks on the
delete list.  When we get to the reap pass, delete the old dying tracks and
find the new dying tracks.
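A two-phase reap along those lines might look like this; the names (`Reaper`, `is_dead`, `report`) are hypothetical, and this only illustrates the lifecycle, not the real track bookkeeping.

```python
class Reaper:
    """Dying tracks spend one pass on a delete list, during which they
    are reported out with their info intact; only on the next reap
    pass are they actually freed."""
    def __init__(self):
        self.delete_list = []

    def reap(self, tracks, is_dead, report):
        freed = self.delete_list            # old dying tracks: free now
        self.delete_list = [t for t in tracks if is_dead(t)]
        for t in self.delete_list:          # report before deletion
            report(t)
        live = [t for t in tracks if not is_dead(t)]
        return live, freed
```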


Rotation part II

What should the track basis vectors be?  What are the invariants?  Currently
nothing is forcing feature orientations to track the rotation except by
tracking.  When rotation is settling the basis vectors may be arbitrarily far
out of alignment with the features.

And with heading from motion, the alignment mismatch may be moderately large
for arbitrary periods of time.  Now, if we track orientation and heading
separately, that can be finessed.

Who uses the basis vectors?

overlaps -- wants alignment with features.
feature_normal -- wants alignment with features.
find_match_dir -- features
limit_acceleration -- features
check_moving -- features

angle_diff uses direction as representation of kf_rotation to find innovation.

check our set_direction theory.

Track THETA_ORIENTATION and THETA_HEADING separately.  If a measurement is
not there, use a large measurement noise.  
    THETA_ORIENTATION -- linear, not fuzzy
    THETA_HEADING -- moving

Annoying because we don't have both measurements at the same place.  I
guess we could do the thing where we give the current state as the
unmeasured input so that we don't bias the state by giving e.g. 0.
Or we can delay the orientation measurement step until when we have
the motion estimate for this cycle.  This is of course more efficient
in the case where both values are measured.

As currently, reset THETA_xxx if there is a big jump in the
measurement.

set_direction only done for THETA_ORIENTATION?  How does that work for
THETA_HEADING innovation?  I guess we could add an analogous separate
heading vector, or just compute it each time.
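The large-measurement-noise trick for an absent measurement can be seen in a one-dimensional sketch (hypothetical function, scalar state; the real filter is multivariate):

```python
def kf_update(x, P, z, R):
    """Scalar Kalman measurement update.  Passing a huge R for an
    unmeasured channel makes the update a near no-op, which is the
    idea above for updating THETA_ORIENTATION and THETA_HEADING
    separately when only one of them is observed this cycle."""
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # innovation-weighted correction
    P = (1.0 - K) * P        # covariance shrinks by (1 - K)
    return x, P
```

With R around 1e12 the state and covariance are essentially untouched, so feeding in any placeholder z (even the current state) does not bias the estimate.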
 


Back-check movement test:

We can make a very powerful test for the correctness of our motion
estimate by predicting the position backward assuming current
velocities and accelerations, and seeing if the past measurements
match up with this prediction.

One way to see this is if we have the correct dynamics, then we should
be able to superimpose all of the past observations *in the coordinates
of the track* and get something that looks like a rigid object,
rectangle, etc.

The most general way to do this would be to use the original point
observations, as this would eliminate artifacts due to bad line
matching. 

This is somewhat like forward-backward estimation.

Conceivably, one could also use monte-carlo estimation to refine our
estimate and to deal with uncertainty accumulation and unknown
variables such as center of rotation.



A simple version would be to record the initial segment, then
back-project the current track to the start time.  If we are moving,
then the back-projected track should match the initial segment
significantly better than the current track does.  If not better, then
perhaps we aren't moving after all.

Well, the very first segment might not be ideal, since it's apt to be
complex and sparse.
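The back-projection itself is just the constant-acceleration model run in reverse; a minimal sketch with illustrative names (2-D tuples for position, velocity, acceleration):

```python
def back_project(pos, vel, acc, dt):
    """Back-project a current track state by dt seconds under the
    constant-acceleration model: x(t - dt) = x - v*dt + 0.5*a*dt^2.
    If the motion estimate is right, the result should land near the
    recorded initial segment; if it matches no better than the
    current track does, perhaps we aren't moving after all."""
    return tuple(p - v * dt + 0.5 * a * dt * dt
                 for p, v, a in zip(pos, vel, acc))
```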


Multi scanner and interlace:

My top favorite idea for interlace and multi-scanner is to process each scan
separately, just as we do now.  Slight generalization would be required so
that scanner position, etc., can change.  This avoids the problem with
trying to time-synch the points (any effect is no worse than it is now.)  The
time to scan across an object is usually small because the object doesn't
cover a wide range of azimuth.


If we stick with 1 scanner at a time, we can also keep the one-corner model,
at least as far as segmentation goes.  Note also that corner + 2 ends does in
fact completely define a rectangular object.  We might be able to deal with
multi scanners by only minor enhancements in feature association, such as
associating all the points in the "bookend" corner-corner association assuming
a rectangular object, instead of throwing out the points that don't directly
correspond.


In principle time-synch could be done across many scans, not just the
two in an interlaced scan, but this couldn't really be done in the
polar coordinates currently used for segmentation.  However,
shape-classification is done in cartesian coords.  Except there are
some assumptions that points are in the right order in the points
vector, especially w.r.t. corner matching.

However, there could be some benefit in making this work.  It is
similar to the retroactive moving test idea.  If the motion estimate
is good, then we get a better line fit than when the estimate is bad. 

I suppose it would be possible to update the points in Cartesian, then
convert back to polar and sort by azimuth.  This is probably not
entirely auspicious from an efficiency point of view, but it would
minimize changes to segmentation and shape-classification.
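The Cartesian-update-then-resort step is simple enough to sketch (assumed representation: motion-corrected (x, y) points, scanner at the origin):

```python
import math

def to_polar_sorted(points):
    """Convert motion-corrected Cartesian points back to
    (azimuth, range) and sort by azimuth, so that segmentation's
    assumption that points are in azimuth order still holds."""
    polar = [(math.atan2(y, x), math.hypot(x, y)) for x, y in points]
    polar.sort()  # sorts on azimuth, the first tuple element
    return polar
```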

hmmn.  we can't update points by motion estimate until after data
association, which depends on shape classification.  At the very
least, points need to be associated before they can be time-synched.
And to individually time-synch each point, I think we would have to
associate each point individually.  So that would be a big change.
However, this seems like it would be compatible with top-down segmentation
where tracks get to claim individual points.


Testing:

Basic idea: see how current estimate predicts future measurements.

Simplest to implement if we see how well we can predict future feature
positions.  Since feature position tracks fairly closely, we could compare
extrapolated feature position to either measured or estimated position.  Using
the filtered feature position should filter out some measurement noise, but
also reeks a bit of circular reasoning.

On one hand, it would be nice to have a general framework for analysis of
tracker performance which could do retrospective analysis by crunching entire
tracks.  On the other hand, I don't want to have to duplicate stuff such as
the basic track update.  Also, the current history based analysis is very
similar.  But I don't want to cruft up datmo with a lot of testing code.  One
approach would be to export some needed functionality from Datmo, then fix up
datmotest to do the analysis.

back up a bit.  What analysis do we want to do?  We want to histogram the
match error, probably breaking down into longitudinal and lateral components.
This data can be accumulated over entire single tracks, and also combined for
all tracks in a run (or particular interesting subsets of tracks.)  We would
make pictures very similar to the ones Christoph has currently, but for moving
objects, and with error components expressed in the coordinates of the track,
not the bus.

It would be interesting to pull out subsets of the tracks where the tracker is
working badly to see if we can substantiate the intuition that the overall
result is the sum of two different distributions, one where the tracking is
working well (perhaps Gaussian) and one where it is working badly.

I guess one way to do this would be to separately dump out the histograms for
each track (or stick at the end of the current track trace data.)  We could
also get at time-varying behavior by periodically logging incremental
histograms.

One major thing we are interested in is seeing how the prediction accuracy
varies as the prediction horizon changes.  On each cycle we compare the
current measurement or position estimate with predicted positions based on
several different lags.  The error at each time horizon needs to be computed
and separately histogrammed.  So we have several histograms for each track.
This could be visualized as a 3D histogram surface.
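A per-track accumulator for that might be shaped like this; a sketch with hypothetical names, using 1-D positions and constant-velocity prediction as a stand-in for the real dynamic model:

```python
from collections import defaultdict

class HorizonErrors:
    """Accumulate prediction error at several lags for one track.
    On each cycle, compare the current position to what the state
    from `lag` cycles ago would have predicted; one error list
    (to be histogrammed) per horizon."""
    def __init__(self, lags=(1, 3, 10)):
        self.lags = lags
        self.history = []                # (pos, vel) per cycle
        self.errors = defaultdict(list)  # lag -> [error, ...]

    def update(self, pos, vel, dt):
        for lag in self.lags:
            if len(self.history) >= lag:
                p0, v0 = self.history[-lag]
                pred = p0 + v0 * lag * dt   # constant-velocity predict
                self.errors[lag].append(abs(pos - pred))
        self.history.append((pos, vel))
```

Plotting the per-lag histograms side by side gives the 3D histogram surface mentioned above.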

We need to know how far we can extrapolate track positions.  This test
methodology fairly directly tells us this.  However, the resulting error is
from two conceptually different sources: 
 -- estimation error: dynamics not accurately estimated at the time reported.
 -- modelling error: the tracked object violated the constant accel, constant
    omega model.

In practice, the distinction is not so clear because current estimation error
is often caused by modelling error in the recent past.  However, there are
definitely other strong contributors to estimation error, primarily due to bad
feature extraction.


I guess any validation approach more or less amounts to applying a non-causal
batch mode estimator to the same data, and seeing how well they agree (?)  
Not really, I think.  Admittedly, my scheme amounts to using the future
measurement or position estimate as a proxy ground truth, and having a really
good estimator could only help with this approach.  However, we aren't just
comparing a good estimate to a better estimate at the same time, we are
comparing two estimates across time to see how good prediction is.

Having a "gold standard" estimator would be the most general and comprehensive
approach, and would allow one to better separate the effects of estimation
error and modelling error.

Of course, the mismatch is a combination of the error of the "more
sophisticated" filter and the one under test, so the error estimates will
probably be conservative as long as there is no implicit correlation between
the two estimates.  But shared errors are actually likely.  My DATMO shows
similar errors to Justin's old version, but with lower magnitude.

One thing that we could do is on each cycle dump out the error vectors for the
different prediction horizons in a way that can be easily read into matlab.
This would of course facilitate visualization, but would also allow things
like frequency domain analysis of the error.  However, this might conflict
with the desire to crunch large data sets.

We could have a different log mode which replaces the current output with a
more terse machine readable format.  We could also suppress logging of
short-lived junk tracks, as we can't output any data until some time has
passed anyway.


This idea only applies directly on cycles where we were able to associate.  If
association fails, we don't have any proxy ground truth.  I guess if we only
used the estimate, then this problem is in some technical sense avoided,
but it's not clear what it does to the results.  Note that in any case, we can
log the estimate if there is no current measurement, we just may have
difficulty testing the validity of the prediction on cycles where there is no
measurement.  Of course, not having a reading in the past does presumably
affect the prediction accuracy, but this is a real effect, not an artifact.

I guess one thing you would have to look out for when comparing results is
that if one tracker is failing association a lot in cases where the other
associates but gets worse-than-average results, then it will degrade the
apparent performance of the less picky tracker, which doesn't seem fair.

Generally, I guess you need to look at the sensitivity/accuracy tradeoff.  We
can trivially get no prediction error by refusing to make any predictions
(by not creating any tracks.)  Though just collecting statistics on
association rates, etc., would give some info, to really get at these kinds of
issues it seems that you need to find corresponding tracks between the two
different trackers so that you can observe that one tracker had less
consistent detection.

I guess finding corresponding tracks is not so intractable.  Corresponding
tracks should be alive at the same time, and spatially close at any given
time-step.  The ugliest bit seems to be setting up a framework where this kind
of examination can be done.

Not counting short-lived tracks is another measure that could make poor
performing trackers look good.


issues:

why do_validate on so many cycles?

For tracks not valid and moving, what prediction should we be testing against?
I guess for tracks we say definitely not moving, we should use not moving.
Less clear with unknown tracks, though currently we are assuming non-moving.
Could dump both moving and fixed errors.

todo:

refine which tracks are actually moving in dataset by vetting tracks flagged
as moving.  Also look at false negatives by higher sensitivity setting, or
even all do_validate tracks.

find bad matching tracks.


goal: detection sensitivity: false positive v.s. false negative.

What is the definition?  I'm reasonably happy with the false positive
definition that any non-moving object flagged as moving is a false
positive.  It's fairly reasonable to assume that any pedestrian
flagged as moving was truly moving, also, though we can look at video
and the raw data too.

False negative is much more problematic.  Is this on a cycle by cycle
basis?  Does this include startup time?  That is, if we ever flag a
track as moving, is this a detection?  It makes some sense to take a
hit if the classification is unstable, but there's definitely always
going to be some startup time, and it seems silly to equate that with
a false negative.  I guess for the full story you could present the
data excluding the first bit of time, as well as with it, and then the
ultimate detection (ever moving).


Do test on velocity distribution for fixed targets.  This does let you
see the distribution of the velocity error (except we are now forcing
V to zero so this won't work in the existing test runs.)  This test
does (1) get at the performance of segmentation, data association,
error modeling, etc., and (2) directly gets at the number of false
positives, which is also interesting.  We can hopefully also show how
the track classification knocks out some of the outliers.

What are those weird outliers?  On the speed of fixed objects, we have
a few points at 8-12 m/sec.  Also, before the change to using only the
nearest match, it seemed that we had one match error greater than 1
kilometer!

pictures:

prediction error v.s. time:
 -- assessment of modelling error, state accuracy for time-varying objects.

speed distribution of non-moving objects (fixed + unknown + false positives)
 -- measure of performance of feature extraction, tracking for time-invariant objects. 
 -- directly relevant to false positive performance, but outliers don't always create
    false positive due to moving test.
 -- need to know false positives if we include.

speed distribution for other track subsets?  It seems we could show the speed
distribution of e.g. do-validate tracks, showing all the tracks we rejected as
moving despite their velocity.  However, some of those are surely false
negatives.

Some sort of stacked line (or perhaps bar if there is no real continuity)
graph showing how the split of ratios of track classification change depending
on various sensitivity settings.  This would be most compelling with real
accuracy data, e.g. false positive is a category.

actual moving
false positive
unknown - actually moving (false negative)
unknown - fixed
fixed

probably base fractions on tracker cycles rather than track counts, as this
minimizes the contribution from short-lived clutter tracks, and also captures
classification instability.

If we could look at all do-validate tracks, this would accurately assess the
performance of do_validate in terms of false negative, though not of the
entire system.  

can do false-positive/negative test for several levels with just one run at
max sensitivity.  The track IDs themselves are stable, so we can just watch
how the tracks move between classes.  Special attention for tracks that are
classified both as fixed and moving.

There's no point in using a larger data set than gives us whatever
statistical power we need, other than also perhaps to sample a wider
range of conditions.  However, if we need to be able to say something
about the nature of some large subset of the tracks, we can
statistically sub-sample.  So, for example we could sample from the
fixed or unknown tracks and see if we think they are moving, in which
case we have a false negative.  If there are none, we can at least set
an upper bound on the number of false negatives.
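The upper bound from a clean sub-sample is a standard binomial computation ("rule of three"); a sketch, where the function name is made up:

```python
def zero_fn_upper_bound(n_sampled, confidence=0.95):
    """If none of n_sampled vetted tracks turns out to be a false
    negative, the exact binomial upper bound on the false-negative
    rate at the given confidence is 1 - (1 - confidence)**(1/n).
    For 95% confidence this is close to the rule-of-three value 3/n."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_sampled)
```

So vetting 100 randomly sampled fixed/unknown tracks and finding no movers bounds the false-negative rate at roughly 3% with 95% confidence.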


classification doesn't seem to respond all that strongly to the
parameters (< 2x.)  Actually, what would be particularly interesting
would be to look at all of the do_validate tracks as our most
sensitive test, since we can then directly assess the performance of
the motion test pass.

We could further increase the sensitivity to slow moving objects by
changing the do_validate test parameters, and would then surely find
some more slower moving tracks.  At some point we are down in the
noise, and the classifier pass is doing all the work.

It might be interesting to look at how the output classification is
changed by frobbing the do_validate test without changing the classifier
parameters, but that would be a bit esoteric to explain.  

What we do want a larger data set for is the hist_speed thingie, but
for that we really do want to know the false positives, which requires
manual tagging.  However, this need not be done at high sensitivity as
long as we are willing to accept what the false negatives do to the
distribution.  Also, this should really be done on another data set
(or different part than clip1) where we really don't think there is
any moving stuff.  That way the false negatives don't get us so much.
However, as Christoph points out, it is *not* kosher to select the
segment based on the absence of any identified moving tracks.

The little old noisy = (disoriented && !compact) test seems to be
doing about 1/3 the work of validate.  Maybe this test is too strict
for bicycles, etc.


todo:

track classification test (false positive/false negative)
 -- classify false positives highest sensitivity (in do_validate?)
 -- see how classification changes at different sensitivities
    sensitivities: do_validate, xhi, hi, med, low
 -- left and right side?  Let's do one side first, right I guess.

residual velocity:
 -- find new test sequence with few moving objects, 5-10 minutes long
 -- check for any moving objects by running with very high
    sensitivity.  exclude the actual movers from the data, leave in
    the false positives.
 -- look at outliers in the graph and see if they are really moving,
    if so exclude.
 -- make speed histogram of this data.  Do separate X&Y in bus coords?
    This would show contribution from odometry error, and also give a
    nominally symmetrical distribution.
 -- also graph the false positives at a reasonable sensitivity level
    (medium).  The idea is to show a lower line representing the
    effect of the moving test.


History stuff:

Is there a less ad-hoc way to do this analysis?  Batch nonlinear KF?
Rao-Blackwellization? 

If we had an efficient batch-mode estimator, it could possibly even
replace the recursive estimation.  Maybe use a recursive constant-velocity
estimator for track startup and filtering of stationary tracks, then
batch-mode estimator for acceleration and turn-rate?

sometimes V correction makes things worse.  it seems that when it is large, it
is usually wrong.  This is partly addressed by invalidating when the
correction is large.

Also, V correction is not filtered, so is not lagged, whereas the moving/fixed
errors are.  Then when something goes wrong, we run with a bad V correction
until the new situation percolates through the filter.

history outlier thingie problematic.


input history V correction to KF?
  treat as V measurement?

make V correction more KF-like so that we use covariance weighting?
  forward-reverse filter?
    Would be good for developing more accurate historic path.  Could
    extend the path back earlier to times where we had only a poor
    return, not good enough for track to be formed initially.

Should we be using corner->corner features?
  In theory this is redundant info, but we seem to do worse without,
  perhaps partly due to history V correction not being stochastic.
  The original motivation (info becoming very high causing high error
  sigmas) has been addressed by normalizing the sigma by the info.



Occlusion:

Be aware of occlusion and possible overload dropouts.  Perhaps segment
across them, and also if a track is entirely occluded, allow it to
coast for a good while with no data.  The zero-order idea would be to
take the updated old points and see if they all more or less fall
behind a known occlusion, in which case we don't kill off the track.

However, even ignoring "more or less", etc., this doesn't work because
the object won't instantaneously disappear behind an occlusion,
rather it will gradually move behind.  We need to keep the object from
fading away as it gradually moves behind the occlusion.  If we do it
based on points, then we need to merge the new and old point list
when there is occlusion at the end of the list (which we can tell
directly by looking at the occluded flag in the endpoint.)

Alternatively, instead of using the points we could possibly develop
a rectangular size model, and keep the size from being affected by
disappearance.  point-based seems tractable, though, and probably more
precise especially w.r.t. non-rectangular objects.

To merge, we could keep updated old points which fall outside of the
new occluded end, testing by taking the dot product of the predicted
ray to the old point and the normal of the occluded endpoint ray.
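That dot-product test is just a half-plane check; a 2-D sketch with the scanner at the origin, where the function name and the choice of which normal (sign convention) are assumptions:

```python
def point_beyond_end(point, end_ray_dir):
    """Test whether an updated old point falls outside the occluded
    end of the new segment: dot the ray to the old point with the
    normal of the occluded endpoint ray.  Vectors are 2-D from the
    scanner; a positive dot product means the point is past the end
    and should be kept in the merged point list."""
    # Left normal of the endpoint ray (rotate 90 degrees CCW).
    nx, ny = -end_ray_dir[1], end_ray_dir[0]
    return point[0] * nx + point[1] * ny > 0.0
```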

Should point merging be done even before data association? That way
the end feature would track right behind the object.  What then would
that do to vague?


Other segmentation ideas:

special segmentation mode when stopped based on motion detection?  Motion
detection is very easy when the scanner is not moving, and this could help us
to detect objects near clutter.  Points with significantly different radial
velocities may not be the same object (but consider legs.)  Also, anything
moving at all is a potential track.

It does seem there could be some benefit in making the segment threshold range
adaptive, however this would increase splitting due to small dropouts.  We
could also try to be smarter about fellow-traveller tracks, combining them
back into one object.

I don't know whether this would be good or not, but you could also dynamically
clump nearer points based on mutual closeness.  For example, if we assumed
there were three objects in a point set, then if we started combining points
by joining closest pairs, then when we are down to three objects, we are done.
It would be possible to combine this with a multi-model approach where we
assign different hypotheses for different plausible collections of cut
points.  This kind of approach would only need to be applied after gross
segmentation based on a large fixed threshold, reducing the search space.

Note also that when there is a known occlusion, this break should be largely
discounted as a cut point, as it is at least as reasonable to assume that the
object is continuous behind the front occlusion as to assume that it is
completely interrupted.  We could also entertain the idea of de-weighting
no-return points so that we can hold together better when there are dropouts.
As long as the hole is less than our gross segment threshold, we could more or
less ignore it for segmentation refinement.

Segmentation revamp ideas:
 -- top-down segmentation: existing tracks get first dibs on new points.
    Would help keep tracking if we initially acquire out of clutter, then go
    into clutter.  Could possibly also reduce problems with spurious track
    splitting due to segmentation vagaries, as we could potentially hold together
    points into a single track even when they develop a hole.
 -- motion detection mode: When vehicle is halted, look for range change in
    individual azimuth bins.  Segment clumps of moving points separately from
    adjacent non-moving points.
 -- Adaptive segmentation distance: when objects are close, points should be
    close together.  If we ignore gaps due to occlusions or no-return, then we
    may be able to get objects to cohere well even with much smaller segment
    threshold.
 -- other adaptive approaches: find clustering and cut points.
    segment outliers separately (e.g. ped in front of a wall.)
 -- Identify fellow-traveler tracks and combine them into a single track.
    Might be important with pedestrians once we segment more finely, as legs
    will turn into two tracks.  Fellow traveler might be best identified by
    seeing if the change in position is similar over a reasonably long time
    horizon, e.g. a second.
 -- could try to special-case legs to keep pedestrians together.



Top-down segmentation:

What exactly is the idea?  

For each track:
 -- find new points that are a good match (close to feature or near
    bounding box.)  These points are the new points for that track.
    shape-classify then associate them with the track.
 -- segment remaining points basically as now and create new tracks.

hmmn, does this entirely replace existing data association approach?
It seems so.  Is that a good thing?  This would seem to be throwing
away a bunch.  It would reduce the need for robust fitting because
bad-fitting points would never be associated in the first place.  But
it does seem there is a risk of becoming completely wedged if the
orientation estimate is ever significantly wrong, as we would
disregard any evidence to the contrary.

An alternative especially w.r.t. the spurious splits would be to merge
the several segments before association.  A concern with merging is
that we don't want to suppress true splits.  This could be dealt with
to some degree by just limiting the maximum hole size, but we could
perhaps be cleverer, such as if we have a good measurement on the
size, then if a hole develops, the size would be limited to the former
size.

Note that there is a difference in the appearance of a true split
v.s. a dropout.  In a true split, the total measured length of the
parts is the same or larger than the original (minus one inter-point
spacing), whereas in a dropout, the sum of the part lengths is
significantly less than the previous length, while the hole length
plus the part lengths approximately equals the previous length.

It seems that this could be a quantitative criterion for whether a gap
looks more like a dropout or a true split.  We could look at whether
the measured length is a better match for the decreasing part length
or same part length model.  Another way to look at it is that we would
be willing to tolerate holes such that the overall length does not
increase from the max credibly observed value.  The total amount of
hole is thus bounded.
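As a quantitative criterion, we can just ask which length model the measurement matches better; a sketch with hypothetical names (the thresholds and the single inter-point spacing correction are illustrative):

```python
def looks_like_dropout(part_lengths, hole_lengths, prev_length, spacing):
    """Gap test from the notes: in a dropout the parts alone sum to
    significantly less than the previous length while parts + holes
    roughly reproduce it; in a true split the parts alone already sum
    to about the previous length (minus one inter-point spacing).
    Returns True if the dropout model fits better."""
    parts = sum(part_lengths)
    with_holes = parts + sum(hole_lengths)
    split_err = abs(parts - (prev_length - spacing))
    dropout_err = abs(with_holes - prev_length)
    return dropout_err < split_err
```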

Of course, in a true split, the parts must also have different
velocities.

Note also that actual splits of cars don't happen, so for some subset
of objects we don't need to worry too much about this possibility.  We
need worry mainly about pedestrians pulling away from cars or other
clutter objects.  It is certainly the case that false splits are the
overwhelming majority now.

w.r.t. the point about dropouts looking different from splits.  This
is another way of saying that dropouts violate the rigid-object model.


occlusion, dropouts, etc.

The idea of a general solution for dropouts seems to lead away from
the idea of point-based occlusion handling and toward a rigid
geometric model (rectangle) approach to modelling the unseen parts of
an object.

Rather than creating a new mechanism for modeling the right shape, we
should use the existing features.  We should hold the position of
unseen features fixed w.r.t. the rest of the object.  This is more or
less a matter of the association procedure.  Probably the way to do
this is to effectively force the longitudinal position of the
measured vague ends to the old (track) dimension.  Some care needed in
the feature resetting.  This doesn't entirely get at shape-constancy,
though.  We do, for example via the measurement residue, get statistics
on feature performance, but since each feature is done independently,
we don't entirely see the correlation of feature motion that should
result from rigid motion.

Also, unless we keep the track as corner even when only seeing one
side, we have no way to remember the size of any side not currently
seen.  If we see the end, then the side, we really should remember the
end size somehow (though what effect this knowledge would have on
anything is unclear.)

As in the point-based idea, we are clued in to occlusion by an
occluded end of the new segment.  So feature change from non-occluded
to occluded marks a time that we need to start processing that feature
as occluded.

Though we could blow this off at first, ideally we also want to detect
when an occluded feature should once again be visible, having passed
behind the occlusion (or whatever), and if it isn't there, then
disappear it.

A corner can't be vague or occluded.  If the true corner is occluded
this doesn't have a whole lot of effect until the segment splits in
two.  We then have two line segments that should be associated with
one track.

It seems there are a number of things we need:
 1] Merging multiple segments that overlap an existing track,
    especially when there is a reason (such as occlusion) to believe that
    the split is spurious.  In itself, this can deal with the case
    where a hole appears in the middle of an object, but the ends are
    visible.
 2] Keep the longitudinal part of vague feature position when a
    feature goes behind occlusion.  This (1) maintains continuity of
    object size (if anyone cares), and (2) keeps track of the object
    outline so that we can predict when it should become visible
    again.
 3] Keep alive tracks which are entirely occluded (at least for
    quite a bit longer than currently, if not forever.)


What about pedestrians?

hmmn, well hardly any of this applies, except for [3].  Pedestrians
are compact, so they don't get split up, even if they do walk behind a
small occlusion.  And the rectangular model doesn't really work.  So
we are left with keeping occluded peds alive longer.  I mean, we could
work a little harder to keep the box from being smashed by going
behind an occlusion, but it's not clear that it matters hugely, since
when it reappears, it will also be smashed.  But I guess the position
will be in error by the size of the box.

[...]

segment occluded flag.  If good linear fit, only at ends.  Otherwise
any occluded point.  This would probably recognize all cases of going
occluded, but doesn't deal with them becoming visible again.  I
suppose this could be a completely separate pass killing occluded
tracks.

Let's figure out how the occlusion killing would work, since we pretty
much need it, and it might be usable for initial occlusion
detection too.

I guess the simplest way would be to extract some notion of the
track's extent, then convert these into the scanner polar coords, and
look at the segmentation data to see if there is anything in front in
that azimuth interval.

For "some notion of extent" it is tempting to just use the old
first/last points, but if the track has rotated a lot since loss of
signal, this becomes meaningless, and the first/last could even be
swapped w.r.t. the actual scan order.  I suppose most general would be
to look at all of the old points, and take min/max azimuth.  The
bounding box is fairly meaningless because it is not oriented normal
to the scan ray.  However, if you took the min/max of the four corners
you would get a conservative result.
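A minimal sketch of this check, with hypothetical names (`is_occluded`, the (azimuth, range) scan format) standing in for the real segmentation data structures: convert the old track points to scanner polar coordinates, take min/max azimuth, and see whether everything the scan returns in that interval lies in front of the track.

```python
import math

def track_azimuth_interval(points):
    """Min/max azimuth (radians) over all old track points,
    in scanner polar coordinates."""
    az = [math.atan2(y, x) for x, y in points]
    return min(az), max(az)

def is_occluded(points, scan, margin=0.5):
    """True if every scan return inside the track's azimuth interval
    lies in front of the track (closer by at least `margin` meters).
    `scan` is a list of (azimuth, range) returns."""
    az_lo, az_hi = track_azimuth_interval(points)
    track_range = min(math.hypot(x, y) for x, y in points)
    covered = [r for a, r in scan if az_lo <= a <= az_hi]
    if not covered:
        return False  # nothing in the interval: no occluder seen
    return all(r < track_range - margin for r in covered)
```

Using the nearest old point for the track range is the conservative choice; an occluder must be in front of all of it.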

Should really have a concept of full v.s. partial occlusion.  It would
be ridiculous to treat the track for a huge wall as fully hidden just
because one end was occluded.  If we see something and it has an
occluded end, it is partially occluded.  Only if a track is fully
occluded should we let it live a long time.  However, given the
rapidly building position uncertainty during the occlusion, it's not
so easy to say when the object should definitely reappear.  But if the
track was *ever* fully occluded during the pass behind, then that
rules out the silly wall case.

Kind of hackish, but a fair solution would be to require the track to
be partially occluded and compact at the time it disappears.  This
would of course work for pedestrians, and fairly often for larger
objects too, since the last visible bit will be small.  If we then
only kill it once it becomes completely non-occluded, then that
provides some margin for the position uncertainty.  

Explicitly considering the buildup of position uncertainty also is
relevant.  If we made the overlap region dependent on the position
uncertainty, then we would increase the chance of successful
association after coming out of the shadow.  But if we allow the
uncertainty to get really large, then we also risk ridiculous
associations. 

multiple scanners confuse the occlusion issue somewhat, in that for
any given scanner, many tracks will be outside the FOV, hence nominally
occluded.  A track is only "really occluded" if it is not visible from
any scanner, but the tracking is done one scanner at a time.  This
seems to mostly work o.k., except for any heuristic related to "ever
being totally occluded", since most objects will be totally occluded
in some scanner on any given update.  This could work if we require
the track to both be totally contained in the FOV and totally
occluded.
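A sketch of that combined requirement; the single (lo, hi) azimuth intervals for track, FOV, and occlusion are illustrative simplifications of the real data:

```python
def totally_occluded_for_heuristic(track_interval, fov, occluded_interval):
    """Per-scanner test: count a track as 'totally occluded' only if its
    azimuth interval is fully inside the scanner FOV *and* fully covered
    by the occluded interval.  Tracks outside the FOV are nominally
    occluded but should not trigger the ever-totally-occluded heuristic.
    All intervals are (lo, hi) azimuth pairs in scanner coordinates."""
    lo, hi = track_interval
    in_fov = fov[0] <= lo and hi <= fov[1]
    covered = occluded_interval[0] <= lo and hi <= occluded_interval[1]
    return in_fov and covered
```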


going behind idea.  If a track appears behind an object, associate it
with the track that last went behind, even if it wouldn't otherwise
associate.   If a track appears when occluded, then it is coming out
from behind.  We can treat splitting off similarly.  If a track joins,
and then there is later a split, then keep the same track.   What is
the value, really?  looks good.  Really for this purpose, you could
just remember the track ID, rather than keeping the whole track
data structure alive.  If you're going to force the association, then
there is not a whole lot of useful info in the old track.

Note that keeping occluded tracks alive may interfere with merge
detection, in that the track may appear occluded before the merge, so
won't die, at least until it wanders out into the open again.

hmmn, for robust compact segmentation (e.g. ped) we could minimize
code changes by continuing to use the bounds, but making the bounds be
taken after outliers are discarded, based on a centroid approach
(K-means, etc.)
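One cheap version of the outlier-discarding bounds, assuming a simple distance-from-centroid rejection rather than a real K-means pass (names hypothetical):

```python
import math

def robust_bounds(points, k=2.0):
    """Bounding box after discarding outliers: drop points farther than
    k * mean distance from the centroid, then take min/max.  A cheap
    stand-in for a K-means style compact-cluster extraction."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    d = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_d = sum(d) / n
    kept = [p for p, di in zip(points, d) if mean_d == 0 or di <= k * mean_d]
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return (min(xs), min(ys)), (max(xs), max(ys))
```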


Model-based segmentation:

possibly separable parts:
 split ObjectTrack into ObjectBase/ObjectTrack/Segment (done)
 get rid of bounds tracking:
    Do we need to quantitatively evaluate this, or can we just do it?
    As long as we don't infer heading from the orientation it's hard
    to see how this could work worse than the bounding box.

issues:
 -- what model to use for COMPLEX segments.  Answer: always use
    line/corner fit, even in complex.  No matter what, this will be no
    worse than the bounding box (as long as we don't get confused and
    believe that we have rotation info when we don't.)  Possibly relax
    right-angle convex corner constraint.
 -- line/corner segmentation/association problem.  [1] we need to
    either choose which model to use or try both.  Trying both is
    intractable when the mapping is not 1-1, since there are 2^N
    possible line/corner combinations when there are N tracks.  I
    guess on each fitting iteration, we can try both fits, and choose
    the one that seems more justified.  For the initial model, we use
    whichever the track had.  "more justified" could even consider
    whether either side of a corner is roughly parallel to the
    velocity, and should also consider which better matches the track.
 -- In k-means, use 2 independent lines for corner, or require right
    angle constraint?  This would work better for non-right-angle
    complex, and would support the angle-sharpness test in corner fit,
    but would mean that we would have to basically start over to do
    the final right-angle corner fit.  Could have a separate class of
    corners that are concave or non-right-angle.
    
    Semi-separable issue is whether we have k-means assign the points
    to legs, which could be done even with right-angle fit.  This
    would be good, since it would be robust against outliers,
    unlike the knuckle-point scheme.

Split ObjectTrack class for segments and tracks?  This would mainly be
an efficiency thing so that segments which never become tracks don't
have to carry around all of the track baggage.

And quite a bit of stuff is computed in segments which may never be
used at all in the case where the segment doesn't directly become a
track and is never associated with a track (covariances, direction
vectors, etc.)  This will happen much more often with the model based
segmentation, since when there is complex overlap, the initial
segments are discarded, keeping only the points.

There would also be clarity gain by partitioning the private variables
and methods.  It would also support segments having state that tracks
don't.

note that even when we have 1-1 overlap, there could possibly be some
benefit in using the track configuration as the initial model for the
fit.  This could avoid some extra fitting steps, and could also help
stabilize line/corner fit flipping.  Line/corner association and
segmentation could be a bit more intertwined.  We could possibly get
away with doing just one fit instead of three because we don't keep
discarding what we learned in the last measurement cycle.  In effect,
the iterative segmentation procedure can span multiple tracker cycles.

This would result in a cleaner more efficient overall implementation
than adding on model-based segmentation without changing 1-1
segmentation, but would also result in code where we could no longer
change between the old and new approach using a simple switch.

There would also be some issue with finding the initial overlaps,
which does use fit info.  Either we need to live with not using fit
info or we need to generate an initial rough fit.

This segmentation revamp could also provide a path to proper
multi-scanner support where points from multiple scanners are
considered all together.  We would just pool all of the raw segments
from all of the scanners before looking for overlaps.  If an object is
visible in multiple scanners, it will have multiple segments.  We can
take all of these points and fit them to the track model.  After
points are associated with tracks, we can deal with the scan time skew
by applying a velocity correction to each measured point based on the
time skew.  This could also handle interlace effects.  During
K-means iteration, we would correct each point based on which track it
is tentatively assigned to.
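The per-point skew correction could be as simple as this sketch (hypothetical names; assumes a constant-velocity translation is an adequate correction over one scan period):

```python
def deskew_point(point, point_time, ref_time, track_velocity):
    """Correct a measured point for scan time skew by translating it
    along the velocity of the track it is tentatively assigned to,
    bringing it to a common reference time.  `point` and
    `track_velocity` are (x, y); times are in seconds."""
    dt = ref_time - point_time
    return (point[0] + track_velocity[0] * dt,
            point[1] + track_velocity[1] * dt)
```

During K-means iteration this would be re-applied whenever a point's tentative track assignment changes.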

split_from/merged_with

Is it such a terrible thing if we never set merged_with?  We'll have
to see.

Currently a track is split_from if it was created for a segment which
overlaps an existing track, but which did not end up being associated
with that track because another was better.

Model-based segmentation is going to need some other way to detect
splitting, because as the track is splitting we will keep growing the
track so that it always overlaps both actual objects.  The simplest
approach would be to have a max hole size.  Simple refinements treat
see-thru holes, no-return and occlusion holes differently.  Can also
look at whether the object seems to be growing, especially
proportional to hole size growth.

This is fairly obvious in the 1->2 split case, but how does this work
in the general MBS case where we have n->n+1?

It might be good to just implement the general MBS framework before
pondering this too much, because I don't currently have that great an
intuition into how the K-means segmentation + tracker is going to
behave. 

The simplest approach would be to split tracks at the time we are
initializing the model for segmentation when the track has a big hole
(looking at longitudinal spacing) or well-separated clump of outliers
(worst fitting point is far from the line and there is a hole in the
point spacing along a lateral line to that point, and there are enough
points on the far side of the hole.)
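The longitudinal-hole part of that criterion might look like this sketch, assuming the points have already been projected onto the fitted line (names illustrative):

```python
def find_split_hole(longitudinal, max_hole):
    """Given point positions projected onto the fitted line
    (longitudinal coordinates), return the index (in sorted order) of
    the first gap larger than max_hole, or None.  A hole here suggests
    splitting the track at that gap."""
    s = sorted(longitudinal)
    for i in range(len(s) - 1):
        if s[i + 1] - s[i] > max_hole:
            return i
    return None
```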

That is, a track can have a hole in it even though it has all
consecutive scan points.  It can be a big range step.  But if the gap
exceeds the segment size, the back point will have occluded set.

I guess a question is to what degree we can do hole analysis on the
entire scan before segmentation.  This is most analogous to the
current PointData::occluded.  But for example, the just suggested
approach to seeing range holes depends on the line fit, so has to be
recomputed after fitting.

Based on the hole size criterion, this could be done as an iterative
procedure in segmentation.  However track/hole growth can only be
measured on tracks.

Note it is probably easier to do the hole analysis on segments, as we
can more easily then look at adjoining points.

Would there be any value in a "point" segment type?  The main
advantage I can see is that when the elongation is minimal then the
line model may not work so well.  The lateral/longitudinal is not well
defined, so the vague criterion and prior noise model may not work
well.  It is possible that taking the mean or a robust mean might
localize better.  This seems to mainly apply to compact noisy
objects like bushes which don't move, however the centroid could be
used for tracking compact objects in general, and potentially would be
more efficient due to the simpler model.

For tracking a ped, the line model has the advantage of working well
with separate legs which can be the endpoints.  I suppose if you
wanted to get real tricky, you could model this as two points
belonging to the same track.


Model based part 2: over-fitting

problem especially for under-bus, as over-fit tracks are apt to
spuriously disappear.
 -- avoid spurious splits
 -- don't report out early tracks
 -- merge tracks where there is no/minimal detectable hole and sides
    are very similar.  Could also trial fit combined tracks and look
    at the fit quality.
 -- can also look at the raw top-down tracks that are input to
    model-based segmentation and see if the model is compellingly better
    than the raw segmentation.
 -- reason about occlusion.  If foreground object actually existed, it
    would shadow background object and we would never see it.
 -- clearly the track we are split_from is a good candidate for
    re-merging. 
 -- when a track dies due to too few points, we can look around to see
    who we could merge with.  Also when a track goes down to two
    points during k-means iteration.

It seems that died_without_descendents should really look in both
directions in the split_from chain, since it is rather arbitrary which
track becomes the splittee. 

It is also not really right that we suppress under-bus due to died
without descendents, because something might split off and then
disappear far enough from the original object that it is clear it
hasn't remerged.  What we really want is a more accurate merged
indication.


Estimation:

random incompatible ideas:
 1] switch to simple constant velocity filter to find moving tracks,
    then more sophisticated filter to characterize moving tracks
    for turn rate and acceleration. (batch filter/smoother)
 2] Either feed lateral acceleration into rotation filter as an
    angular velocity innovation or change model to allow only forward
    acceleration.
 3] Batch estimator can be considered a fixed-lag smoother.  If I
    understand this literature, it (1) helps to give theory to what I
    am doing, and (2) might perform better and generalize to
    rotation/acceleration estimation.
 4] Feed velocity correction into linear KF as velocity innovation.
    This allows dynamics to be used to filter the V correction and
    should significantly improve the acceleration estimate.  Might
    also help the rot estimation a bit because the
    heading-from-direction would be smoothed.
 5] PI KF?  Is this anything like the effect of e.g. increasing from CV
    to CA model?

Is there a simplified version of STI that would be computationally
tractable, e.g. using an efficient circle fit algorithm and no knot
optimization? 

Note that the reason why path fluctuations are currently non-physical
is that physical limits are not applied.  The filter makes no attempt
to smooth out jumps in position that are non-physical due to excess
implied acceleration, etc.  Instead, we just suppress the contribution
to the velocity and acceleration estimates.  Also, the center
innovation is not subject to Mahalanobis limiting; a large feature
jump will cause the track to break up, but the center can do anything.

We could try to smooth the center more, but I think it would be better
to reformulate things so that we are not dependent on something that
is intrinsically difficult to know.  This has two parts:
 1] internally, estimation of e.g. curvature should not strongly
    depend on vagueness in center position, and
 2] our position output should be expressed in a way that makes clear
    what we do know with precision, namely the position of the near
    parts of the object (and not the center), which happens to be
    precisely what we need for collision avoidance.

[1] seems more or less doable, in a way similar to how the current
motion filter works, by effectively finding the innovation in the
coordinates of the individual features.  We can understand the motion
without having an explicit metric concept of the position.

[2] also seems moderately well addressed by emphasizing a rectangular
model as the output, and not explicitly outputting the center.
However, there are two issues.  One is that for prediction to be done
we need some center of rotation, and for this to be done on the client
end, we should probably give the same center we use internally
(however derived.)  The other issue is how to express position
uncertainty of the corners in a way that accurately reflects our true
knowledge.  That is, it doesn't make sense to give a general position
uncertainty for all the corners.  We might for example pick the best
known corner, give an uncertainty for that, then express the
uncertainties of the side lengths.  I suppose it would also be
possible to e.g. give covariances for all four corners.

Note also that when we report out curvature, this needs to be w.r.t.
some established reference point, most naturally the object center.
An error in the center position would not affect the prediction of the
location of the pointy end (which we are actually tracking) because
the client would use the same kinematics and center position to
reconstruct the pointy end feature position that we used in converting
the feature position curvature into the center curvature.

However, once we start looking at the path over time, we need to more
directly deal with the shift in the part of the object that we are
tracking.

Perhaps most problematically, consider the issue of visual path
display.  It seems that no matter what we do, we need to tolerate
either:
 1] jogs in the path that do not correspond to actual maneuvering, or
 2] retrospective changes in the entire historic path.

Whatever our reference point is, it is always subject to changes in
position either due to:
 1] explicit decision to change the reference point due to viewpoint
    change, or
 2] if we attempt to always follow the path of any particular point on
    the object (which could be the center, or in principle any other
    point, such as the first well seen feature), then we are subject
    to position changes due to the chosen feature position not being
    well known.

For either sort of position glitching, if we know that the position
change is spurious, e.g. due to reference point switching, then we can
transform it out, leaving the path shape correct, but causing the
position itself to become basically unknown.  If we use the current
reference point position to transform the path into world coordinates,
then the entire path will jump when the reference point changes.

If we do not know that reference point motion is spurious, then this
inevitably causes jogs in the path, though they can be smoothed and
reduced by using a dynamic model.

Visually the entire path jumping around is unpleasant, but for actual
system purposes keeping the path in the coordinates of the current
position (whatever that is) is preferable because we are only
interested in the path shape, both for estimating the dynamics for
prediction, and for analyzing the past motion to classify track
behavior.

I suppose for display purposes, we can still attempt to maintain a
best guess of the center position (dynamic constraints, etc.), and use
this to transform the path into the world coordinates.  This tends to
reduce the magnitude of the path jumping.  If it is really that
strongly preferable to have a glitched path rather than one that is always
squirming around on the screen, then it would also be possible to
display the trajectory of the center as we do now, even though this
path doesn't actually have the same shape as the internal path.

Of course, the path shape for the four corners is not the same.  When
the turn radius approaches the size, there is a significant difference
in the radii of the corners.  We really should compensate for this
when changing reference points.  When you consider the curvature
effects of reference point change, it becomes attractive to maintain
the path in relatively fixed coordinates, e.g. of the center.
However, this is only tolerable as long as any path distortion effects
of the center uncertainty are less than the effects of the changing
actual radius of the reference feature.

If we consider the center uncertainty to be length and width
uncertainty, and separately estimate size v.s. position, then we could
transform out the path shape effect of changes in the size by recognizing
the spurious center translation due to the change in size estimate,
and effectively offsetting the historic path by this amount.  Then,
insofar as the center position changes less than the change in the
reference feature position, the path curvature will be more consistent
across reference point changes.

Conceptually, it makes sense to store the path in the coordinates of
the current center (whose world position is not accurately known.)
However, to avoid having to continually transform the entire path when
we only care about the shape, it seems to be more efficient to store
the path in the coordinates of the initial unknown center position.

There is some similarity between the concept of the path maintained in
arbitrary relative coordinates and the current incremental motion
state in the KF.  In principle, the distance between the path ends is
the same as the incremental motion.

Either way, since the path shape is based on the accumulation of
incremental motions, we can end up with a path which diverges from the
actual historic feature positions as you go back in time.  I guess, in
fact, to avoid the ridiculous effect of the current path not being
near the current position, we must in effect repeatedly transform the
path, at least for display.  

Note also that a path shape that is out of whack w.r.t. the recorded
feature positions might be an indicator of a bad track.

Perhaps we could do batch analysis of the trajectory of the
incremental motion state.

What is the point behind STI, etc?
 -- better turn rate estimation
 -- maybe better acceleration 
 -- in general, better prediction
 -- move complexity out of the Kalman filter (flush accel
    states).  May buy back some of the cycles lost in extra smoother
    complexity.
 -- give smoother (prettier), more compact output paths.  better for
    path shape analysis.

can we reduce false positives w/ better tracker?  Not clear how.  It
would help to maintain validity when maneuvering if we back projected
along the estimated path, rather than using just the current estimate.

The main idea I have w.r.t. validity is to more explicitly use the
right object model, directly estimating the length and width.  Lots of
times when we go astray there is a silly shape change.

Also, we need to reduce the spurious path breakups.  Fixing the
multi-scanner support would probably go a long way here, but there
could be more general approaches to recognize that a new track has
formed similar to a track that is dying.

In summary, batch estimation could potentially work on a single path
(rather than multiple feature paths) if appropriate coordinates are
used.  This would reduce the work required in compute intensive
smoothing algorithms.  However, this would tend to sacrifice the
current effect of the motion test of implicitly verifying the rigid
object assumption.

Another possibility would be to run a smoother on the single path to
develop a refined estimate, but then continue to evaluate the
individual feature fits to the model more or less as we do now
(possible enhancement by recording feature correspondences,
rather than effectively re-associating.)  Rather than requiring that
the features correspond to a simple circular path, when validating, we
can re-match to the inferred multi-segment maneuver.

How to proceed w.r.t. estimation improvements?
 -- Evaluate how well approaches work regardless of performance,
    e.g. in matlab.  multi-model adaptive EKF, STI, simplified STI,
    Kalman smoother.
 -- evaluate possible performance, see how fast circle fitting is.
 -- actual code changes:
      - remove accel states, rotation
     - change/enhance/replace moving test.


Feature correspondence/shape modeling:

It seems like it would be an improvement to the moving test/batch
estimation to remember the feature correspondence used at association,
rather than trying to guess which was used.  

This might be easier if we used a model where corners are identified
by a fixed position on the rectangle outline, rather than the feature
index changing as the scanner/track aspect changes (and copying the
feature data from one position to another, e.g. in line/corner
association.)  This would in itself be a good thing.

Of course, when we first see an object, we have little idea which is
the front and which the side, so the feature assignment is basically
arbitrary.

To deal with multiple scanners, we really want a full rectangle
model anyway, since we can see all four corners.  And with doing
merging before segmentation, we need to be able to fit full
rectangles.  This seems an unpleasantly large change to the rather
involved segmentation code, but there is no good alternative I can
see.

With a rectangle model we still need to be able to deal with the
possibility that anywhere from 0 to 4 corners are actually observed.


Texture classification:

If we one-dimensionalize, how?
 -- Line mismatch causes problems with non-straight surfaces.  If we
    sort points, it creates spurious HF artifacts in areas where we see a
    surface behind the line.  If we drop out-of order points, this
    suppresses some of the fuzzy nature of veg.
 -- Use polar scanner coords?  What is the advantage of the linear
    approach? Natural approach to interpolating to a fixed spatial
    frequency.  Looks at the spectrum of the fit error, not of the
    shape itself.  Approximately insensitive to rotation of the
    surface w.r.t. scanner ray.  Note however that orientation away
    from normal contributes basically a LF component (modulo end effects.)

Interpolate?


Not clear that there are often enough points in one scan to tell
much.  How can we combine multiple scans?  One way would be just to
average the spectra of the last N scans.  This would at least help
avoid being fooled when a particular scan looks pretty.

You could also combine multiple scans into a single point cloud and
then look at that.  If there is a clean surface and motion is
accurate, then all points will tend to fall on the surface.  With veg
it still looks pretty random.  This is another way of saying that the
scans should all look the same, the shape should be fixed.

Another weird way to combine the scans would be to append them with
alternating forward-reverse direction.  This should introduce spurious
energy mainly at the repetition rate, since as long as the scans are
similar the starts/ends should match up with each other.
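The alternating append is trivial to state; a sketch (scans as lists of range samples):

```python
def concat_alternating(scans):
    """Append successive scans with alternating direction, so that if
    consecutive scans are similar, their ends line up with each other
    and the spurious spectral energy concentrates at the repetition
    rate rather than spreading across the spectrum."""
    out = []
    for i, scan in enumerate(scans):
        out.extend(scan if i % 2 == 0 else list(reversed(scan)))
    return out
```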


over-splitting, especially with model-based segmentation, breaks up
vegetation a lot, reducing the amount of info in any given scan.
It might be a win to enable model-based segmentation only on segment
sets that contain a worthwhile track (e.g. ever_moving.)  This would
both reduce the over-segmentation of veg and also the CPU load
thereof.

A different tack would be some sort of geometric local smoothness
estimate.  For example, consider the area of the triangle formed by
each three consecutive points.  Not clear how it would actually work,
but it has several nice properties:
 -- independent of the model mismatch error, which we can also
    separately use to classify.
 -- efficient.
 -- doesn't require interpolation.
 -- deals reasonably with variable point spacing.  Wide spacing is
    penalized, unlike in the frequency schemes where wide spacing
    suppresses the HF component.
In some sense, this acts as a bandpass filter near the spatial nyquist
frequency, whatever the actual spatial resolution is.  If the return
actually resembles white noise, then we see pretty much the same thing
at any frequency, however concentrating at the highest frequency
available (the nyquist freq) gives the strongest discrimination of the
HF v.s. LF component.
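A sketch of the triangle-area roughness measure (shoelace formula for each consecutive triple of points); `local_roughness` is a hypothetical name:

```python
def local_roughness(points):
    """Mean area of the triangles formed by each three consecutive
    points (shoelace formula).  Near zero for smooth surfaces; large
    for noisy returns such as vegetation."""
    def tri_area(a, b, c):
        return abs((b[0] - a[0]) * (c[1] - a[1])
                   - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
    areas = [tri_area(points[i], points[i + 1], points[i + 2])
             for i in range(len(points) - 2)]
    return sum(areas) / len(areas) if areas else 0.0
```

Collinear (smooth) points score zero; a zigzag at the point spacing scores highest, consistent with the bandpass-at-Nyquist interpretation above.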

Whatever we do, we need some sort of story about how we're going to
get training data.  Basically I need to visually classify a lot of
tracks. 

I think that fuzzy classification might have the advantage of not
requiring huge amounts of training data, as it's more a matter of
empirical parameter tweaking.
 
near term, I need some better test cases to explore and evaluate
the idea.  It would be nice to have data with vehicles, pedestrians,
linear-but-not-rectangular (buildings), smooth-but-not-linear (tree,
some ground returns).  non-bus off-road data would be preferable.

Why do we want to classify?
 -- falsing
 -- client classification


If we're interested mainly in falsing, then we can look only at false
and true positive tracks, perhaps at higher validation sensitivity.


I think the thing to do is to replace the current validation rules
with a general classifier (such as fuzzy), with the current
characteristics as inputs:
 -- fit quality (disoriented)
 -- size (compact)
 -- history match error
 -- history info
 -- history covariance? (redundant w/ match error?)
 -- speed

To this we can add:
 -- local roughness
 -- holeyness, sparseness
 -- range (compensate for scanner dropouts.)
 -- time tracked
 -- longer time-frame dynamics, e.g. consistency of motion, random or
    purposeful motion.
 -- pedestrian wiggle
 -- shape constancy?  rapid variation in size, texture, fit quality,
    match error
 -- number of points/point density?  probably somewhat redundant
    with history info.
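As a strawman for the fuzzy combination, a weighted mean of per-feature [0, 1] memberships; the feature names and weights below are purely illustrative, not tuned values:

```python
def validity_score(features, weights):
    """Toy fuzzy combination: each feature value is a [0, 1]
    'looks like a valid track' membership; combine with a weighted
    mean over whatever features are present."""
    total = sum(weights[k] for k in features)
    return sum(weights[k] * v for k, v in features.items()) / total

# Illustrative weights only -- not tuned parameters.
weights = {"fit_quality": 2.0, "history_info": 2.0,
           "speed": 1.0, "local_roughness": 1.0}
```

A min combination instead of the mean would make any single bad characteristic veto validity, closer to the current rule-based behavior.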

Falsing classes:
 It will probably help classifier to have several classes of bad
 tracks, as they may fall into disjoint clusters in the feature
 space.  For example, ground return, brush, low-reflectivity.

User output classes:
 pedestrian
 vehicle (cycle, car, truck)
 fixed rectangle (artificial, poss fixed veh.)
 ground return (nothing actually there.)
 brush, grass etc.  noisy veg
 other fixed


segmentation/association:

for a lot of this stuff we need to be able to consistently associate
bad objects.  Currently we fail association when there is poor feature
match.  It might be better to associate but flag the track as dubious,
perhaps with a timestamp so that we can detect if a track becomes well
behaved.  Or more generally, I guess we can log the tracker
mahalanobis distance.  This will presumably be rather redundant with
the history match error, but could be an input to the classifier.

It seems though that most of the track crunchiness is due to
non-overlap rather than refused association.  In overlap segmentation
it only makes about a 10% reduction in the number of tracks to defeat
this test.

Changes to model-based segmentation might be more successful in that
it has the potential to prevent tracks from splitting.  However,
somewhat strangely even with very liberal split thresholds, MBS seems
to still be much more splitty.  It may be that we are losing
tracks due to all the points going away.  We could force the track to
stay alive by moving it to the nearest point.

We can work harder to control the size of the model, rather than just
allowing it to evolve uncontrolled.  Rather than punting MBS when
there are a lot of tracks, just limit the number of tracks, perhaps
proportional to the number of points.

hmmn, anyway, is it really all that crucial that we associate clutter
objects?  At least for falsing, no, since if we don't associate them
we can't show false motion.


infrastructure:

Need tools & configuration for classifying experiments with stable
track IDs so that we can code training data once then mess with
classifiers w/o breaking the track association.  Of course, as long as
we're messing with segmentation tracks will not be stable.  So I need
to settle on some segmentation I'm reasonably happy with before coding
lots of data.

Need a way to code tracks with deterministic playback.  Did this using
bus replay before using the track outputs.  Need a way to make this
work with NL11 and XUV data.  Think christoph already made this work
with GD data.

It would be possible to make tools considerably more helpful in
coding, e.g. by detecting tracks that need to be coded, highlighting
them, then accepting and logging the coding.

Does model-based segmentation inherently cause problems with
determinism?  I don't really think so as long as the classification
has no feedback effect on segmentation.  Perhaps in reality it should,
but for classifier training purposes, this could be suppressed.  Note
however that a top idea I had for improving MBS on veg was to only do
it when there was an ever_moving track in the segment set, which is
exactly the kind of feedback from classification to segmentation that
would break determinism across classifier changes.

Also it would be reasonable to do classifier training with MBS off
then hope there would be comparable classification performance with it
on, especially after the veg splitting issue is resolved.  

Another approach to track correspondence between runs would be to
somehow associate tracks based on their space-time location rather
than ID.  However this would be nontrivial and could never work 100%,
especially in the presence of segmentation changes.

Hmmn, another way to relate independent tracker runs would be to look
at which raw points were associated with which tracks, and this might
also help to give insight into how the segmentation changes.  So if
two tracks had pretty much the same points in them, then they are the
same tracks.
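A minimal sketch of this point-overlap matching, assuming each run can
report the set of raw point IDs per track (the data layout, function
name and 0.5 threshold are all invented for illustration):

```python
# Sketch: relate tracks from two independent tracker runs by the raw
# points they contain.  Jaccard overlap of the point-ID sets decides
# whether two tracks are "the same".

def match_tracks(run_a, run_b, min_overlap=0.5):
    """run_a, run_b: dict mapping track ID -> set of raw point IDs.
    Returns (a_id, b_id, jaccard) triples where the point-set overlap
    is at least min_overlap (an arbitrary placeholder threshold)."""
    matches = []
    for a_id, a_pts in run_a.items():
        for b_id, b_pts in run_b.items():
            union = a_pts | b_pts
            if not union:
                continue
            jaccard = len(a_pts & b_pts) / len(union)
            if jaccard >= min_overlap:
                matches.append((a_id, b_id, jaccard))
    return matches
```

Comparing which points migrated between unmatched tracks would also
give the insight into segmentation changes mentioned above.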


Actions:

coding any significant amount of data demands some resolution of
segmentation issues so that the coding effort is not thrown away when
segmentation is changed.

any kind of even vaguely rigorous classification will require a fair
amount of coded training data.


Point-level scanner fusion:

Issues:
 1] Places where scan order is used.
 2] Time skew.  Two effects: self-motion and track motion.



Places where scan order is used:
 1] In segmentation to assign points to the two sides of the corner.
    A more general algorithm could replace this.  Or maybe we don't
    even care which point is on which leg; it's not used outside of
    segmentation, only in the corner fit itself.
 2] adjacent_point in vague determination.  Could continue to work as
    now, using whatever scanner happens to be the one for the endpoint.
 3] Segment::find_gap for splitting.  Topo order can find gaps, but
    not so good for sight-line reasoning about the nature of the gap.
    This is more on a per-scanner basis.
 4] Datmo::split_tracks assumes points for child tracks contiguous.
    Track splitting is already a MBS thing, so not a big deal to defer
    to MBS to do the point assignment.  All we need to do is add a new
    model in about the right place.
 5] Datmo::closest_segment, prev point assumed near.  A heuristic.
    Would work with any reasonable topological ordering.
 6] simple segmentation; this can continue as-is, since MBS does the
    fusion (that means we need MBS to get fusion, though, which
    matters for classification testing where MBS over-splitting is a
    problem.)



Time skew:

self motion: just correctly convert into absolute coordinates based
on per-scanner shot time.  Annotate points with their shot time and
then linearly adjust point position for time skew based on track
velocity.  We could adjust all points to an arbitrary time, but to
minimize error it is probably better to take the mean time of all
points and adjust them to that time.  That delta_t is then used in
the actual update.  This would at last give sound time semantics.
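A sketch of that de-skew step, assuming points carry a shot time and
we already have a per-track velocity estimate (names and the tuple
layout are invented):

```python
# Sketch: shift every point to the mean shot time of the scan set,
# using the track's velocity estimate to extrapolate positions.

def deskew(points, velocity):
    """points: list of (x, y, t) shot samples; velocity: (vx, vy)
    track velocity.  Returns (adjusted_points, mean_time); the caller
    would use mean_time to compute delta_t for the filter update."""
    t_mean = sum(t for _, _, t in points) / len(points)
    vx, vy = velocity
    adjusted = [(x + vx * (t_mean - t), y + vy * (t_mean - t), t_mean)
                for x, y, t in points]
    return adjusted, t_mean
```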


Scan order:

A fairly general solution for topological (e.g. segmentation) uses is
to sort by position along the line.  We can have the same monotonicity
issues that showed up in one-dimensionalization, but any resulting
fuzz probably won't affect the assignment to sides.  Sorting would,
however, mess up one-dimensionalization itself!  Note that we can
also use MBS to assign points to sides.


MBS basically works on point clouds already, so there isn't any big
problem there.

The concept of a topological surface point ordering is clearly
valuable for finding gaps in the object, etc.  Unless this is
completely intractable, we should generate a new one based on point
position along the segment.
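The ordering by position along the segment could be as simple as
sorting on the scalar projection onto the fitted line direction (a
hypothetical sketch; the line fit itself is assumed done elsewhere):

```python
# Sketch: topological surface ordering by projection along the
# segment's fitted line direction.

def surface_order(points, direction):
    """points: list of (x, y); direction: unit vector (dx, dy) from
    the line fit.  Returns points sorted by scalar projection along
    the line."""
    dx, dy = direction
    return sorted(points, key=lambda p: p[0] * dx + p[1] * dy)
```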

Perhaps we should use a constraint that the points from any given
scanner still be in order.  In other words this is a merge of point
lists, not really a sort.  Discard funky points?  Only use linear
position for resolving order of cross-scanner pairs?

Note that during MBS we move only one point at a time, so it might
make more sense to keep the ordering valid rather than sorting all the
time.   

Also, clearly when we add a point to a segment, the point should go
adjacent in the ordering to whatever point it is closest to in a 2D
sense.

Note that if the ordering depends on the line fit, then it can change
fairly arbitrarily on each MBS iteration, since we refit.  This means
we would have to do a sort.  The closest point is not going to
change, however.

Now, I'm not sure I really want to bite off 2D closest-point, but we
are already doing 2D closest-point stuff in MBS, aren't we?  If we
add another dumb closest-point operation it won't be much worse than
now, and if we do anything clever for the MBS iteration, similar
technology should apply here.

More precisely the point should go in-between the two closest points.
However it is possible that the closest two points may not be
consecutive.  I suppose we can also insert it in the line segment that
it is closest to, which provides a reasonable way to disambiguate
complex situations.
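A sketch of this closest-line-segment insertion rule (brute force, no
cleverness; helper names are invented):

```python
# Sketch: insert a new point into an existing surface ordering,
# between the endpoints of the line segment it is closest to.  This
# disambiguates cases where the two closest points are not
# consecutive.

def insert_point(ordered, p):
    """ordered: list of 2D points already in surface order; p: new
    point.  Returns a new list with p inserted."""
    def seg_dist2(a, b, q):
        # squared distance from q to the segment a-b
        ax, ay = a; bx, by = b; qx, qy = q
        vx, vy = bx - ax, by - ay
        L2 = vx * vx + vy * vy
        t = 0.0 if L2 == 0 else max(0.0, min(1.0,
            ((qx - ax) * vx + (qy - ay) * vy) / L2))
        cx, cy = ax + t * vx, ay + t * vy
        return (qx - cx) ** 2 + (qy - cy) ** 2

    if len(ordered) < 2:
        return ordered + [p]
    best = min(range(len(ordered) - 1),
               key=lambda i: seg_dist2(ordered[i], ordered[i + 1], p))
    return ordered[:best + 1] + [p] + ordered[best + 1:]
```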

It is not immediately obvious that the resulting ordering is
independent of the order in which points are added to the set.
Probably it is, given a single POV for the scan and geometric
constraints, but when there are multiple POVs all bets are off.

Note that with multiple scanners it is possible to observe that an
object contains disconnected parts, invalidating the idea of
representing the object as a single surface.  Also, if you can see
all sides of an object, then there are no definite start/end points
to the surface; it is a cycle.

So, given that in general we are hosed, does it really matter?  What
do we want from the surface order anyway?  Mainly to locate split
points and then divide the point lists.  If we could somehow detect
the split point, then we could just split the model and let MBS
reassign the points (it's going to do that anyway.)  For corners, the
knuckle-point idea is not dependent on the ordering.  There are
probably ways to make do without the topological ordering using
e.g. k-means, but it's not clear that would be a win in any of
performance, code simplicity or runtime.

MBS already has problems with *not* using topological constraints,
e.g. fitting across holes.  Of course it's impossible to correctly
model a 3D world in 2D, and given this it's questionable to try to
perfect handling of obscure problems that can arise even when 2D
geometry does hold.  In general we want to be able to handle multiple
scan planes, and we in fact already have that on the XUV due to the
low front scanner.

It seems that given multiple beams the best you can do to form a 2d
model is to average the surfaces seen by each beam.  Of course, that
is basically what is done by the line fit.

Since one of our main purposes is to detect holes, we want a
representation that makes that easy.  Separate per-beam point lists
don't do that.  We need to merge them, and we don't really care all
that much about the validity of the implied surface in the non-hole
areas (though this does affect stable structure analysis.)

Hmmn, well, for there to be a gap on all scanners taken together,
there must also be a gap on each individual scanner.  So there may be
a way to efficiently detect gaps without computing a full topological
point ordering.
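One way this could look: find candidate gap intervals per scanner,
then intersect them, since a fused gap must be a gap on every
individual scanner.  A sketch, assuming positions are already
projected to 1D along the line (FOV limits of each scanner are
ignored here):

```python
# Sketch: detect fused gaps as the intersection of per-scanner gap
# intervals, avoiding a full topological ordering of all points.

def scanner_gaps(positions, min_gap):
    """positions: sorted 1D positions along the line for ONE scanner.
    Returns (lo, hi) intervals where consecutive points are more than
    min_gap apart."""
    return [(a, b) for a, b in zip(positions, positions[1:])
            if b - a > min_gap]

def fused_gaps(per_scanner, min_gap):
    """per_scanner: list of sorted position lists, one per scanner.
    A gap in the fused set must be a gap on every scanner, so
    intersect the per-scanner intervals."""
    if not per_scanner:
        return []
    result = scanner_gaps(per_scanner[0], min_gap)
    for positions in per_scanner[1:]:
        gaps = scanner_gaps(positions, min_gap)
        result = [(max(lo1, lo2), min(hi1, hi2))
                  for lo1, hi1 in result for lo2, hi2 in gaps
                  if min(hi1, hi2) - max(lo1, lo2) > min_gap]
    return result
```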


Blather blather.  O.k., anyway the idea is that we add points one at a
time, inserting them between the two points that define the closest
line segment.  All else is just optimization.  We can do some
optimizations based on exploiting the scan ordering (e.g. binary
search or merge), and there might be more general ways to optimize
based on a tree structure or whatever.  However, there isn't any way
to define a sort predicate based on just two points, other than by
deferring to some auxiliary data structure that would have to amount
to what we want already.

So we have a point ordering, so the scan order problem is solved.


It does seem that there's some reasoning that should really be done
on single-beam data, though.  We could assess a gap based simply on
the size of a hole in the line, but it really means something
different depending on whether there is a missing return, a hole or
occluding object, or just a large gap due to resolution limits.  And
that classification needs to be done on a per-beam or per-scanner
basis because it relates to sight-lines.


Another thought.  Perhaps we could express topological constraints to
MBS by inputting lines that are sight-lines through holes.  Segments
should not cross these lines.  For example, if you saw a point in a
different segment set through a hole in a segment, then you could add
the scan sight-line as a restriction.  Probably more important is
enforcing the restriction that segments themselves don't overlap.


blah, blah, o.k., yeah all of the problems could be solved without a
topo order, but as long as they can be solved with, why bother?  The
only thing topo order does not completely solve is find_gap, where we
classify missing returns (due to hole, occlusion or whatever), in
addition to the simple geometric analysis of the point spacing.  If
we're to keep this, then more work is needed.

At least locally in the code, we'd need to keep track of the previous
scan pos for all beams that have a previous point in the segment, not
just one beam.  Could we just drop this missing return classification,
or simplify?  The only way we use the info is in weighing the severity
of the hole.

As long as we have some data, do we really care if there is a gap of
any sort on some of the beams?  What does it mean to have some data,
though?  If all beams are missing data it means something.  If any
one doesn't have a gap, then we can ignore the others.  The topo
ordering does make missing-return gaps uglier to detect, but it is
nice for the geometric assessment.


Hmmn, just to make life difficult, what about multi-beam slope
assessment?  We're still interested in the surface, but we are also
interested in associating points on different beams that have the same
azimuth.  If we have points nicely lined up with similar XY but
different Z's, then that is nice evidence for verticality.  However,
when we don't have that we need to prove that we should be seeing that
before we get too alarmed.
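A toy test of this stacked-return evidence, assuming we can already
group same-azimuth returns across beams (the function name and the
thresholds are invented):

```python
# Sketch: if same-azimuth returns from different beams have small XY
# scatter but large Z extent, that is evidence of a vertical surface.

import math

def verticality_evidence(points, xy_tol=0.2):
    """points: (x, y, z) returns from different beams believed to hit
    the same object at the same azimuth.  xy_tol is an illustrative
    threshold (meters)."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    xy_spread = max(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    z_extent = max(p[2] for p in points) - min(p[2] for p in points)
    return xy_spread < xy_tol and z_extent > xy_tol
```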

Though it seems possible in principle to make use of widely spaced
scanners to assess verticality, it's not as straightforward as when
the sight-lines are the same because we can't use azimuth to determine
which points should lie on the same surface. 

But anyway, if we did want to exploit this, then we'd probably want a
point representation where we have parallel lists for each beam, but
associate each point with the closest points on the other beams.


Found a pointer on how to do least-squares fit on corner or rectangle
using SVD (constrained least squares.)  It does *not* address how to
assign points to sides, so I guess this is not magically going to fall
out of a better fit algorithm, but it could work in combination with
K-means. 


Note also that it is definitely not the case that we can do a merge
combination of entire scans with different POV, because objects may
appear in different orders in the two scans.  In particular, if two
objects lie on a line that passes between the two scanners, then the
objects appear in opposite orders in the two scans.  This can be
considered an effect of seeing opposing sides of the same thing.

Note that we definitely do have useful information above and beyond
the point cloud present in the scan orderings.  In particular, we can
infer a surface normal from consecutive points in one scan.  This can
be used to disambiguate situations like the one above because we can
recognize that we are seeing opposite sides of an object when the
surface normals of the two scans are opposite rather than (nominally)
the same.  In practice there are issues with determining how best to
estimate a surface normal in the presence of noise, and what criteria
to use to decide whether the normals are "approximately the same" or
not.
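A sketch of per-segment normal estimation and the "approximately the
same" test (the 30-degree threshold is an arbitrary placeholder, and
the inside/outside sign of the normal is left unresolved):

```python
# Sketch: infer a normal for each consecutive point pair in one scan
# by rotating the segment direction 90 degrees, then compare normals
# by angle.

import math

def segment_normals(points):
    """points: list of (x, y) in scan order.  Returns one unit normal
    per consecutive pair.  Which of the two possible normals faces the
    scanner is not resolved here."""
    normals = []
    for (ax, ay), (bx, by) in zip(points, points[1:]):
        dx, dy = bx - ax, by - ay
        n = math.hypot(dx, dy)
        normals.append((-dy / n, dx / n))
    return normals

def normals_agree(n1, n2, max_angle_deg=30.0):
    """True if two unit normals differ by less than max_angle_deg."""
    dot = max(-1.0, min(1.0, n1[0] * n2[0] + n1[1] * n2[1]))
    return math.degrees(math.acos(dot)) < max_angle_deg
```

Opposite-side observations would then show up as normals near 180
degrees apart in the overlap region.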
   
This is all useful insight into the general problem, but not clear how
it helps with the specific one.

First, I think that if scan-merging is done per-track rather than on
entire scans, then many order anomalies may be avoided.  Also, if we
find the closest point pair straddling the point we are adding, then
we will usually build the correct surface ordering as long as the
scans overlap.  There can definitely be local minima during the scan.

I suppose we could also consider the apparent surface normals in the
concept of "closeness".  We could also use them in a test to drop
points that are problematic.  W.r.t. closeness, it's hard to see how
to weigh orientation vs. distance; there's got to be a more or less
arbitrary blending factor.  Also, it seems nonsensical to pick a
position far away just because of orientation.

Another thing you could do is after finding a surface order based on
closest line, you could assess the normals in the apparent overlap
regions for general agreement.  With well-behaved objects there should
be a good match, even with non-vertical objects and different scan
planes.  If the match is poor, then either the object does not have a
consistent surface (veg) or we botched assembly of the surface
ordering.  Either way this is a problem indicator.

In some cases the mean and covariance of the surface normals for each
scan could provide a pretty clear indication that we're seeing
opposite sides of the same object.  Really the surface normal is
partially encoding info about the scanner ray (object inside/outside),
and purely geometric considerations can tell us a lot about how scans
should align without thinking a whole lot about the object.

That is, if we know which side of an object the scanner is on (which
we do), then with simple objects (no holes, etc.) no occlusion, we
automatically know which scanner covers which side of the overlap
region.  That is, if the points are AAAAABABABABABABBBBBBB, we know
which scanner is A and which is B given only the scanner
orientations.  If B is clockwise from A, then the clockwise end of the
object will be covered by B.  Not clear what use this idea is, as with
occlusion, basically anything can happen in the overlap region.  You
could get AAAAAAABABABABABBBBAAABBBABABABBBBBBBBB, and this sequence
can be truncated from either end by occlusions, meaning that basically
any sequence is possible.  Of course, if a part of an object is not in
a scanner's FOV, then we know we won't see any points there.

With surface normals, it might be more convenient to assign each point
a normal which is the mean of the two segments on either side.  This
provides a bit of smoothing, in particular by suppressing single
front/back outliers, and also may make some code simpler by providing
a good place to stick the normals.  I guess smoothing by taking means
of normals can also be generalized to a sort of LPF by taking weighted
means based on distance or just equal means over a larger window.
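The per-point smoothing by averaging the two adjacent segment normals
might look like this (a sketch; the endpoint handling is a guess):

```python
# Sketch: assign each point the renormalized mean of its two adjacent
# segment normals.  Endpoints keep their single segment normal.

import math

def point_normals(seg_normals):
    """seg_normals: unit normals, one per consecutive point pair.
    Returns len(seg_normals) + 1 per-point normals, lightly smoothed
    to suppress single front/back outliers."""
    if not seg_normals:
        return []
    result = [seg_normals[0]]
    for (ax, ay), (bx, by) in zip(seg_normals, seg_normals[1:]):
        mx, my = (ax + bx) / 2.0, (ay + by) / 2.0
        n = math.hypot(mx, my)
        # if adjacent normals are opposed the mean degenerates; fall
        # back to the first one rather than divide by ~zero
        result.append((mx / n, my / n) if n > 1e-9 else (ax, ay))
    result.append(seg_normals[-1])
    return result
```

The wider-window weighted-mean LPF mentioned above would be a direct
generalization of the two-neighbor mean used here.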

The front/back side issue is the same one that was causing problems
in one-dimensionalization, but this is a different way to get at the
problem which does not depend on the linear fit.  If we have a
consistent surface ordering, then when we go to one-dimensionalize
and see retrograde points along the line, we just flip the sign and
convert it into further positive point displacement.  In other words,
always dump the points in surface order, and take the absolute value
of the X increment.  We could of course continue dropping retrograde
points, but I think this test would preserve useful information.
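A sketch of this folded one-dimensionalization, accumulating absolute
increments rather than dropping retrograde points (positions are
assumed already projected onto the line, in surface order):

```python
# Sketch: one-dimensionalize by summing |increments|, so retrograde
# steps contribute positive displacement instead of being discarded.

def one_dimensionalize(positions):
    """positions: scalar positions along the fitted line, in surface
    order.  Returns cumulative 1D coordinates starting at 0."""
    if not positions:
        return []
    out = [0.0]
    for prev, cur in zip(positions, positions[1:]):
        out.append(out[-1] + abs(cur - prev))
    return out
```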

With noisy stuff like veg, we will probably tend to find that
different scanners are not giving a story consistent with a smooth
surface.  This is in itself valuable information if we can figure out
how to preserve it.

Note also that if an object is not substantially vertical and the
scan planes differ in height or orientation, then we get two
different surfaces at different ranges, and when we combine the scans
we will infer a surface roughness that doesn't really exist.  For one
thing, the LPF in stable structure analysis will tend to reduce this
in many cases, and maybe it is not such a bad thing anyway to be able
to detect non-vertical objects.

Hmmn, I think that as long as we are using scanners mounted on a
single vehicle pointing away from each other (e.g. dot product of scan
rays always positive), then ordering by inserting points on the
closest line will always work because the sort of indentations in the
surface that could cause opposite sides of the surface to appear
closer than the truly connected edges *would not be visible*.  The
closest points must always be connected.

In fact, even when objects are disconnected we compute a reasonably
consistent surface ordering, but we may assign an ordering where
groups of points are out of scan order due to the front/back problem.
This is I think sufficient (but not necessary) to establish that the
object is disconnected.  In any case, we should not infer a surface
normal at positions where the surface order jumps backward in scan
order.

Occlusions can be a problem, though, since the segment that we would
have been closest to may not be visible.  I think it's not too bad.

