SIMPLE INTRO TO THE PRIMARY DATA TYPES AND OPERATIONS IN MBL

CONTENTS

   [1]BASICS:
   [2]The Major MBL datatypes
   ----- [3]USIN
   ----- [4]USOUT
   ----- [5]REGDEGS
   ----- [6]FACODES
   ----- [7]EXTENT
   ----- [8]ATTNAMES
   ----- [9]SIDAT
   ----- [10]ATREE
   ----- [11]TERMS and COEFFS
   ----- [12]APRIOR
   ----- [13]AREQUEST
   ----- [14]BREQUEST
   ----- [15]LOC (Locators)
   [16]EXAMPLE CODE

BASICS:

   All data structures have a "make a copy of me and recursively copy all
   my contents" function. And all data structures have a "free me and
   recursively free all my contents" function. If the data structure is
   called "plop", then these two functions will be called

      plop *mk_copy_plop(plop *p)

   and

      free_plop(plop *p)

   Furthermore, any function that returns a plop and has mk_ in its
   title, e.g.

      plop *mk_plop_from_quibble_and_squibble(quibble *q,squibble *sq)

   is guaranteed to produce a newly allocated plop, in which all
   subfields of plop are also allocated (if necessary by including copies
   of q and sq), so that after its creation nothing that happens to q or
   sq affects plop, nothing that happens to plop affects q or sq, etc.

   Any plop that you create by calling a mk_ function, you must also
   eventually free with free_plop(p).

The Major MBL datatypes

USIN

   A usin is represented as a dyv (dynamic vector; amdm.h). It is an
   unscaled (i.e. raw) point in input space.

USOUT

   A usout is represented as a dyv (dynamic vector; amdm.h). It is an
   unscaled (i.e. raw) point in output space.

REGDEGS

   Describe which terms are to be used in a polynomial.

FACODES

   Describe pretty much everything you need to know about a given choice
   of a function approximator. Gmstrings can be turned into facodes, and
   facodes can be turned into gmstrings.

      /* usin_size says how many inputs there are.
         Calls my_error() and prints an explanation of the problem if
         the string is illegal.
         Use mk_facode_from_user_string( ) if you want to do graceful
         error handling. */
      facode *mk_facode_from_string(char *string,int usin_size);

      /* Makes a gmstring. Must be freed with free_string() */
      char *mk_string_from_facode(facode *fc);

   fc->rd : This field of a facode gives the regdeg associated with the
   facode.

EXTENT

   An extent denotes the rough, rounded, minimum and maximum ranges of
   the input and output features in a dataset.

ATTNAMES

   An attnames stores the names of the input and output columns.

SIDAT

   A sidat denotes a dataset of numeric inputs and outputs, a set of
   attribute names, and a rough sketch of the minimum and maximum ranges
   of the inputs and outputs.

      si -> ext    is the extent of the sidat
      si -> ans    is the attribute names of the sidat
      si -> usins  is the matrix of unscaled input vectors.
                   dym_ref(si->usins,i,j) is the j'th component of the
                   i'th input datapoint.
      si -> usouts is the matrix of unscaled output vectors.
                   dym_ref(si->usouts,i,j) is the j'th component of the
                   i'th output datapoint.

      sidat *mk_sidat_from_filename_simple(char *fname);

   Loads a sidat. Calls my_error() if there is a problem.

ATREE

   A kdtree that allows fast predictions. Contains a dataset in which all
   points are scaled and stored in an efficient access manner.

      atree *mk_atree_from_sidat_and_facode(sidat *si,facode *fc);

      dyv *mk_predict_from_atree(atree *at, facode *fc, dyv *query_usin);

TERMS and COEFFS

   A term is a dyv representing the terms in a multivariate polynomial.
   E.g. if the input space is 2-d with inputs x1 and x2, then linearly
   scale x1 to z1 so that z1 lies between 0 and 1, and linearly scale x2
   to z2 so that z2 lies between 0 and 1. Then, for a full quadratic,

      terms = (1,z1,z2,z1*z1,z1*z2,z2*z2).
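   As a rough illustration of the term construction just described, the
   sketch below builds the six quadratic terms for a 2-d input in plain
   C, without using the MBL library. The helper names scale_to_unit and
   quadratic_terms_2d and the explicit lo/hi range arguments are
   hypothetical stand-ins chosen for this example (in MBL the ranges
   would presumably come from an extent and the choice of terms from a
   regdeg); it is not a description of how MBL itself computes terms.

```c
/* Hypothetical helper (not part of MBL): linearly map x from the range
   [lo,hi] onto [0,1]. Assumes hi > lo. */
double scale_to_unit(double x, double lo, double hi)
{
  return (x - lo) / (hi - lo);
}

/* Hypothetical helper (not part of MBL): fill terms[0..5] with
   (1, z1, z2, z1*z1, z1*z2, z2*z2), where z1 and z2 are the inputs
   x1 and x2 scaled to lie between 0 and 1. */
void quadratic_terms_2d(double x1, double lo1, double hi1,
                        double x2, double lo2, double hi2,
                        double terms[6])
{
  double z1 = scale_to_unit(x1, lo1, hi1);
  double z2 = scale_to_unit(x2, lo2, hi2);

  terms[0] = 1.0;      /* constant term */
  terms[1] = z1;       /* linear terms */
  terms[2] = z2;
  terms[3] = z1 * z1;  /* quadratic terms */
  terms[4] = z1 * z2;
  terms[5] = z2 * z2;
}
```

   For example, with x1 = 5 in the range [0,10] and x2 = 1 in the range
   [0,4], z1 = 0.5 and z2 = 0.25, so
   terms = (1, 0.5, 0.25, 0.25, 0.125, 0.0625).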
      dyv *mk_term_from_usin(dyv *usin, extent *ext, regdeg *rd);

   A coeffs holds the coefficients of a multi-input and possibly
   multi-output linear map in which

      scaled predicted output = coeffs^T term

   This can be implemented by

      dyv *sout = mk_dym_transpose_times_dyv(coeffs,term);

   The unscaled output (or "usout") is computed from the sout as follows:

      dyv *usout = mk_usout_from_sout(ext,sout);

   (See mk_predict_from_atree in atree.c for an example.)

   The following data structures are probably not necessary for you to
   know about:

APRIOR

   Represents a prior for the Bayesian regression.

      aprior *mk_aprior_from_facode(facode *fc);

AREQUEST

   Data representing about half the information in a facode.
   Specifically, it is the information needed to build an atree (see
   above), including distance metric info and regdeg info, but not kernel
   width info or number of neighbors.

      arequest *mk_arequest_from_facode(facode *fc);

BREQUEST

   Data representing the other information in a facode. Specifically, it
   is everything except the information needed to build an atree (see
   above), so it includes kernel width info, number of neighbors and
   weight function info.

      brequest *mk_brequest_from_facode(facode *fc);

LOC (Locators)

   A locator represents a point in a distance metric space. Jeff, please
   document.

EXAMPLE CODE

#include "sidat.h"
#include "ammarep.h"
#include "stats.h"
#include "mblcli.h"

void test_mbl_main(int argc,char *argv[])
{
  char *filename = "test.mbl";
  sidat *si = mk_sidat_from_filename_simple(filename);
    /* See http://www.cs.cmu.edu/~AUTON/??? for legal sidat file format. */
  facode *fc = mk_facode_from_string("A30:SN:{9}",sidat_num_inputs(si));
    /* See http://www.cs.cmu.edu/~AUTON/??? for metacode strings. */
  atree *at = mk_atree_from_sidat_and_facode(si,fc);
  int i;
  bool classify = FALSE;

  explain_sidat(si);
  wait_for_key();
  explain_facode(si,fc);
  wait_for_key();

  for ( i = 0 ; i < 10 ; i++ )
  {
    /* Query point with every input set to i/10 */
    dyv *query_usin = mk_constant_dyv(sidat_num_inputs(si),(double)i/10.0);
    /* Note: this call passes a classify flag that the prototype listed
       under ATREE does not show; see atree.c for the actual signature. */
    dyv *predict = mk_predict_from_atree(at,fc,classify,query_usin);

    fprintf_dyv(stdout,"Query",query_usin,"\n");
    fprintf_dyv(stdout,"Predict",predict,"\n");

    /* Both dyvs came from mk_ functions, so both must be freed */
    free_dyv(query_usin);
    free_dyv(predict);
    wait_for_key();
  }

  free_atree(at);
  free_facode(fc);
  free_sidat(si);

  close_statistics();
    /* The prediction code calls stats.h code which allocates some
       global structures (containing interpolated tables). This
       frees all of those things. */

  am_malloc_report();
  wait_for_key();
}

References

   1. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#0
   2. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#1
   3. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#2
   4. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#3
   5. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#4
   6. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#5
   7. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#6
   8. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#7
   9. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#8
  10. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#9
  11. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#10
  12. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#11
  13. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#12
  14. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#13
  15. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#14
  16. file://localhost/afs/cs.cmu.edu/project/learn/group/doc/ambl.html#15