Consensus-tree based Likelihood Estimation for AdmiXture (CLEAX) is a program that 
automatically detects population structure, identifies the population history, and 
learns divergence times and admixture fractions. 

Reference: To be updated.

Version History:
Version 1.0 - Capable of inferring only three-population admixture/non-admixture 
              scenarios.  Assumes that the population with the least supporting 
              weight from the observed data is the admixed population.
Version 2.0 - Capable of inferring admixture/non-admixture scenarios for three or
              more populations.  Automatically infers population history by
              incorporating all possible admixed/non-admixed scenarios into 
              the MCMC chain.

Compatibility:
The program has been compiled and tested on both Windows and Linux with the GNU C++ 
compiler and GNU make.

Compilation:
To compile the program, go to the build directory and type:

make clean
make all

This should produce a program called cleax (or cleax.exe on Windows) in the build 
directory.

Usage:
cleax property-file

property-file is a space-delimited configuration text file.  Each line specifies one 
parameter supplied to the program.  Below is a list of the options the property-file 
understands:

Mode		Program execution mode.  Four modes are currently supported (Normal/
		ConsensusOnly/MarkovOnly/ComputeOnly).  "Normal" mode reads a 
		ConsensusInputFile containing the SNP data and performs automatic 
		identification of subpopulations followed by history inference.  
		"ConsensusOnly" mode performs only the automatic identification of 
		subpopulations from the SNP dataset read from a ConsensusInputFile.  
		"MarkovOnly" mode reads a specialized MCMCInputFile containing model 
		bipartitions and their associated weights and performs only the history 
		inference.  "ComputeOnly" mode reads both SNP data from a 
		ConsensusInputFile and a set of model bipartitions from a 
		ModelPartitionsInputFile.  Using the SNP data, the program then computes 
		the weight associated with each model bipartition in the 
		ModelPartitionsInputFile.

ConsensusInputFile	Location of the genetic variation data.  The program assumes that the
		input consists of a space-delimited bi-allelic variation dataset 
		where 0 represents one allele and 1 represents the other.  (See 
		examples/example-0.6-0.05-0.2.hap for an example.)
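
As a rough illustration of the input format described above, the following Python sketch 
reads space-delimited 0/1 rows and checks that every row has the same number of entries.  
The helper name read_biallelic_rows is hypothetical and not part of CLEAX, the rectangular 
shape check is an assumption, and the sketch makes no claim about whether rows correspond 
to haplotypes or to sites.

```python
def read_biallelic_rows(lines):
    """Parse space-delimited 0/1 rows (hypothetical helper, not part of CLEAX)."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if any(tok not in ("0", "1") for tok in tokens):
            raise ValueError(f"line {lineno}: expected only 0 or 1")
        rows.append([int(tok) for tok in tokens])
    # Assumption: a valid dataset is rectangular (all rows equally long).
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows have differing lengths")
    return rows


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = ["0 1 1 0", "1 1 0 0", "", "0 0 0 1"]
    print(read_biallelic_rows(sample))
```

To check a real file, the same function can be called on an open file handle, since it 
only iterates over lines.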

MCMCInputFile	Location of the input file used for running MarkovOnly mode.  The file
		consists of two sections: Weights and Models.  The Weights section begins
		with a line containing the word "Weights", followed by a line of weights 
		associated with the k model bipartitions.  Weights are separated by one or 
		more spaces.  The Models section begins with a line containing the word 
		"Models", followed by k lines of model bipartitions.  Each model bipartition 
		line consists of 0s and 1s without any spaces.
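
The Weights/Models layout described above can be sketched in Python as follows.  This is 
only an illustrative parser, not CLEAX code: the function name parse_mcmc_input is 
hypothetical, the sample weights and bipartitions are made up, and the sketch assumes that 
the Weights section comes first and that all bipartition lines have the same length.

```python
def parse_mcmc_input(text):
    """Parse the Weights/Models layout (hypothetical helper, not part of CLEAX)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or lines[0] != "Weights" or lines[2] != "Models":
        raise ValueError("expected 'Weights', one line of weights, then 'Models'")
    # One weight per model bipartition, separated by one or more spaces.
    weights = [float(tok) for tok in lines[1].split()]
    models = lines[3:]
    if len(models) != len(weights):
        raise ValueError("number of models must equal number of weights")
    for model in models:
        if set(model) - {"0", "1"}:
            raise ValueError(f"model {model!r} must contain only 0s and 1s")
    # Assumption: every bipartition covers the same set, so lengths must match.
    if len({len(model) for model in models}) > 1:
        raise ValueError("all model bipartitions must have the same length")
    return weights, models


if __name__ == "__main__":
    # Made-up example with k = 3 bipartitions.
    example = "Weights\n0.5 0.3 0.2\nModels\n0011\n0101\n0111\n"
    weights, models = parse_mcmc_input(example)
    for weight, model in zip(weights, models):
        print(model, weight)
```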

ModelPartitionsInputFile	Location of the input file used for running ComputeOnly mode. 
		The file lists the k model bipartitions whose weights the user wants the 
		program to compute.  Each line in the file represents one model bipartition, 
		written as 0s and 1s without any spaces. 

OutputFile	Location of the file to which the program writes its output.

MaxParts*	Maximum number of model bipartitions the program will identify.  If
		no value is specified, the program will identify the optimal number
		of model bipartitions based on the minimum description length (MDL) criterion.

NumEMIters	Number of simulated annealing/expectation maximization iterations the program
		will go through before returning the best-scoring consensus tree.

NumMCMCIters*   Number of MCMC iterations the program will sample before returning the 
                average expected parameters.  The default is 20,000 iterations.

NumGenealogies* Number of genealogies each sample consists of.  Default value is 30.  
                Ideally, the number of genealogies should be at least the number of 
                recombinant sites.

Penalty		Penalty score added to the tree score to penalize large, 
		complicated consensus trees.

PopSize+	Effective population size.

MutationRate+	Mutation rate.

SeqLength+	Length of the sequence dataset.  NOTE: This is the sequence length in
		base pairs, not the length of the SNP dataset.

* Optional parameter
+ These three parameters are used to determine theta, which is used to compute the 
  expected number of variant sites.  By default, theta is sampled by CLEAX.  To fix 
  theta instead, all three parameters must be specified.
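
To make the property-file description concrete, here is a minimal Python sketch that 
renders a set of options as text.  It assumes that each line has the form "Name value" 
separated by a single space, which is one reading of "space delimited" and is not 
confirmed by the program's documentation.  The function name format_property_file and 
the output path example.out are hypothetical, and all numeric values other than the 
documented defaults are placeholders rather than recommended settings.

```python
def format_property_file(settings):
    """Render options as one 'Name value' pair per line (assumed layout)."""
    lines = []
    for name, value in settings.items():
        text = str(value)
        # Assumption: values cannot contain whitespace in a space-delimited file.
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"{name}: value must be non-empty without whitespace")
        lines.append(f"{name} {text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    settings = {
        "Mode": "Normal",
        "ConsensusInputFile": "examples/example-0.6-0.05-0.2.hap",
        "OutputFile": "example.out",  # hypothetical output path
        "NumEMIters": 100,            # placeholder value
        "NumMCMCIters": 20000,        # documented default
        "NumGenealogies": 30,         # documented default
        "Penalty": 1.0,               # placeholder value
    }
    print(format_property_file(settings), end="")
```

The printed text can be saved to a file and passed to the program as described under 
Usage (cleax property-file).  PopSize, MutationRate, and SeqLength are omitted here, so 
under the rules above theta would be sampled rather than fixed.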