CONTENTS
BASICS:
The Major AMBL datatypes
----- USIN
----- USOUT
----- REGDEGS
----- FACODES
----- EXTENT
----- ATTNAMES
----- SIDAT
----- ATREE
----- TERMS and COEFFS
----- APRIOR
----- AREQUEST
----- BREQUEST
----- LOC (Locators)
All data structures have a "make a copy of me and recursively copy all my contents" function. And all data structures have a "free me and recursively free all my contents" function.
If the data structure is called "plop", then the two functions above will be called
plop *mk_copy_plop(plop *p)
and
free_plop(plop *p)
Furthermore, any function that returns a plop and has mk_ in its name, e.g.
plop *mk_plop_from_quibble_and_squibble(quibble *q,squibble *sq)
is guaranteed to produce a newly allocated plop in which all subfields
of the plop are also newly allocated (if necessary by including copies of q and sq),
so that after its creation nothing that happens to q or sq affects the plop, and nothing
that happens to the plop affects q or sq.
Any plop that you create by calling a mk_ function, you must also eventually free with free_plop(p).
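The ownership convention above can be sketched in plain C. This is only an illustration: the layout of plop (a size and an owned array) and the constructor mk_plop_from_array are hypothetical and are not part of AMBL, and free_plop is assumed here to return void.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, for illustration only: a plop owns one heap array. */
typedef struct plop {
  int size;
  double *values; /* owned by the plop, never shared */
} plop;

/* mk_ function: returns a newly allocated plop holding its own copy of src. */
plop *mk_plop_from_array(const double *src, int size)
{
  plop *p;
  assert(src != NULL && size > 0);
  p = malloc(sizeof *p);
  if (p == NULL) abort();
  p->size = size;
  p->values = malloc((size_t)size * sizeof *p->values);
  if (p->values == NULL) abort();
  memcpy(p->values, src, (size_t)size * sizeof *p->values);
  return p;
}

/* Deep copy: the result shares no memory with p. */
plop *mk_copy_plop(plop *p)
{
  assert(p != NULL);
  return mk_plop_from_array(p->values, p->size);
}

/* Frees the plop and everything it owns. */
void free_plop(plop *p)
{
  if (p == NULL) return;
  free(p->values);
  free(p);
}
```

Because each mk_ call copies its inputs, changing the source array or one plop afterwards leaves every other plop untouched, and each plop needs exactly one matching free_plop.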
A usin is represented as a dyv (dynamic vector; amdm.h). It is an unscaled (i.e. raw) point in input space.
A usout is represented as a dyv (dynamic vector; amdm.h). It is an unscaled (i.e. raw) point in output space.
A regdeg describes which terms are to be used in a polynomial.
A facode describes pretty much everything you need to know about a given choice of function approximator. Gmstrings can be turned into facodes, and facodes can be turned into gmstrings.
/* usin_size says how many inputs there are. Calls my_error() and prints an explanation of the problem if the string is illegal. */
facode *mk_facode_from_string(char *string,int usin_size);
/* Makes a gmstring. Must be freed with free_string() */
char *mk_string_from_facode(facode *fc);
fc->rd : This field of a facode gives the regdeg associated with the facode.
An extent denotes the rough, rounded minimum and maximum ranges of the input and output features in a dataset.
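As a rough illustration of the min/max part of an extent, the sketch below scans one column of a dataset for its range. The range struct and column_range are hypothetical stand-ins, not the AMBL extent type, and the rounding step is left out because its rule is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one feature's range; not the AMBL extent type. */
typedef struct range {
  double lo;
  double hi;
} range;

/* Scans column `col` of a row-major table with `rows` rows and `cols` columns
   and returns the smallest and largest value found in that column. */
range column_range(const double *data, int rows, int cols, int col)
{
  range r;
  int i;
  assert(data != NULL && rows > 0 && col >= 0 && col < cols);
  r.lo = r.hi = data[col];
  for (i = 1; i < rows; i++) {
    double v = data[(size_t)i * (size_t)cols + (size_t)col];
    if (v < r.lo) r.lo = v;
    if (v > r.hi) r.hi = v;
  }
  return r;
}
```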
An attnames stores the names of the input and output columns.
A sidat denotes a dataset of numeric inputs and outputs, a set of attribute names, and a rough sketch of the minimum and maximum ranges of the inputs and outputs.
si -> ext is the extent of the sidat
si -> ans is the attribute names of the sidat
si -> usins is the matrix of unscaled input vectors
dym_ref(si->usins,i,j) is the j'th component of the i'th input datapoint.
si -> usouts is the matrix of unscaled output vectors
dym_ref(si->usouts,i,j) is the j'th component of the i'th output datapoint.
sidat *mk_sidat_from_filename_simple(char *fname);
Loads a sidat from the named file. Calls my_error() if there is a problem.
An atree is a kd-tree that allows fast predictions. It contains a dataset in which all points are scaled and stored for efficient access.
atree *mk_atree_from_sidat_and_facode(sidat *si,facode *fc);
dyv *mk_predict_from_atree(atree *at, facode *fc, dyv *query_usin);
A term is a dyv holding the values of the terms of a multivariate polynomial. E.g. if the input space is 2-d with inputs x1 and x2, linearly scale x1 to z1 so that z1 lies between 0 and 1, and linearly scale x2 to z2 so that z2 lies between 0 and 1. Then, with all terms up to degree 2 in use, term = (1,z1,z2,z1*z1,z1*z2,z2*z2).
dyv *mk_term_from_usin(dyv *usin, extent *ext, regdeg *rd);
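The 2-d example above can be written out with plain arrays. This is a sketch only, not how mk_term_from_usin is implemented: scale_to_unit and quadratic_term_2d are hypothetical names, the lo/hi arrays stand in for the ranges held in an extent, and the full set of degree-2 terms is assumed.

```c
#include <assert.h>

/* Linearly maps x from [lo,hi] onto [0,1]. */
double scale_to_unit(double x, double lo, double hi)
{
  assert(hi > lo);
  return (x - lo) / (hi - lo);
}

/* Fills term[0..5] with (1, z1, z2, z1*z1, z1*z2, z2*z2), where z1 and z2
   are the two inputs after scaling onto [0,1]. */
void quadratic_term_2d(const double x[2], const double lo[2],
                       const double hi[2], double term[6])
{
  double z1 = scale_to_unit(x[0], lo[0], hi[0]);
  double z2 = scale_to_unit(x[1], lo[1], hi[1]);
  term[0] = 1.0;
  term[1] = z1;
  term[2] = z2;
  term[3] = z1 * z1;
  term[4] = z1 * z2;
  term[5] = z2 * z2;
}
```

For example, the input (5,30) with ranges [0,10] and [20,40] scales to z1 = z2 = 0.5, giving term = (1, 0.5, 0.5, 0.25, 0.25, 0.25).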
A coeffs holds the coefficients of a multi-input and possibly multi-output linear map in which
scaled predicted output = coeffs^T term
This can be implemented by
dyv *sout = mk_dym_transpose_times_dyv(coeffs,term);
The unscaled output (or "usout") is computed from the sout as follows:
dyv *usout = mk_usout_from_sout(ext,sout);
(See mk_predict_from_atree in atree.c for example)
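The two steps above (apply coeffs^T to the term, then unscale) can be sketched with plain arrays. The function names are hypothetical. The unscaling rule shown is an assumption: it simply inverts a linear scaling onto [0,1], which may not be exactly what mk_usout_from_sout does.

```c
#include <assert.h>
#include <stddef.h>

/* Computes sout = coeffs^T term, where coeffs is a row-major matrix with
   n_terms rows and n_outs columns: sout[j] = sum_i coeffs[i][j] * term[i]. */
void transpose_times(const double *coeffs, const double *term,
                     int n_terms, int n_outs, double *sout)
{
  int i, j;
  assert(coeffs != NULL && term != NULL && sout != NULL);
  for (j = 0; j < n_outs; j++) {
    sout[j] = 0.0;
    for (i = 0; i < n_terms; i++)
      sout[j] += coeffs[(size_t)i * (size_t)n_outs + (size_t)j] * term[i];
  }
}

/* ASSUMPTION: a scaled output s in [0,1] maps back linearly onto [lo,hi]. */
double unscale_from_unit(double s, double lo, double hi)
{
  return lo + s * (hi - lo);
}
```

With three terms and two outputs, each column of coeffs holds the coefficients for one output, so a single pass over the matrix yields every scaled output.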
The following data structures are probably not necessary for you to know about:
An aprior represents a prior for the Bayesian regression.
aprior *mk_aprior_from_facode(facode *fc);
An arequest is data representing about half the information in a facode. Specifically, it is the information needed to build an atree (see above), including distance metric info and regdeg info, but not kernel width info or number of neighbors.
arequest *mk_arequest_from_facode(facode *fc);
A brequest is data representing the other information in a facode. Specifically, it is everything except the information needed to build an atree (see above), so it includes kernel width info, number of neighbors, and weight function info.
brequest *mk_brequest_from_facode(facode *fc);
A locator represents a point in a distance metric space. Jeff, please document.