/*******************************************************
 * Copyright 2012-2022 Dannie Durand  durand@cmu.edu
 *
 * This distribution is for use by peer reviewers, only.
 *
 * Files in this distribution may not be copied and/or distributed,
 * without the express permission of Dannie Durand
 *
 * DomArchov was designed and written by Collin McCormick, Yifan
 * Xue, Xiaoyue Cui, Yangi Yi, Alejandro Garces, Maureen Stolzer,
 * and Dannie Durand.
 *******************************************************/


DomArchov is a data-driven domain architecture simulator designed to
reproduce the constraints on domain order and adjacency observed in
nature.  Transition probabilities are estimated from domain pair frequencies
derived from a corpus of genuine domain architectures.
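
As a rough illustration of estimating transition probabilities from pair
frequencies, the sketch below normalizes adjacent-pair counts so that the
probabilities of all domains following a given domain sum to one. This is a
plain first-order normalization chosen for illustration; the function name,
the input layout, and the placeholder domain names are assumptions, and
DomArchov's own estimator may differ.

```python
from collections import defaultdict


def transition_probabilities(pair_counts):
    """Turn adjacent-pair counts into P(next domain | current domain).

    pair_counts maps (current_domain, next_domain) -> observed count.
    This layout is an assumption made for the sketch.
    """
    # Total number of observed transitions leaving each domain.
    totals = defaultdict(int)
    for (current, _), count in pair_counts.items():
        totals[current] += count

    # Divide each pair count by the total for its first domain.
    probs = defaultdict(dict)
    for (current, nxt), count in pair_counts.items():
        probs[current][nxt] = count / totals[current]
    return {current: dict(row) for current, row in probs.items()}


# Hypothetical counts; "A", "B", and "C" are placeholder domain names.
example_counts = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2}
example_probs = transition_probabilities(example_counts)
```

In this toy input, domain "A" is followed by "B" three times out of four, so
example_probs["A"]["B"] is 0.75.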

DomArchov is described in "Simulating Domain Architecture Evolution",
currently under review by the ISMB 2022 program committee.


This distribution contains the following directories:

 - bin

    In addition to the simulator itself, this directory contains
    functions to extract domain statistics required for estimation
    of the simulator parameters.

 - ExamplePrimates

    DomArchov takes as input (1) a text file containing an
    "experiment descriptor" in JSON format and (2) files containing
    domain combination statistics.  This directory contains an
    example experiment descriptor and primate genome test data.

Main modules

  - ConsolidateData.py

      Pre-processes raw genome data downloaded from Superfamily. Its main
      tasks are to retrieve all unique domain architectures from the raw
      datasets and to calculate basic domain statistics. These are the counts
      of each domain, each pair of domains, and each triple of domains, which
      are used to calculate transition probabilities, and the frequencies of
      domains, which are used to assess the results of the simulator.

  - PreCalculation.py

      To make the simulator more efficient, values that are used repeatedly
      in a typical simulation are pre-calculated by PreCalculation.py. Note
      that the "domainCounts", "doubleCounts", and "tripleCounts" files
      generated by ConsolidateData.py are required to run PreCalculation.py.
      Therefore, when running ConsolidateData.py during the pre-processing
      step, the options "-s", "-d", and "-t" must be specified.

  - DomainArchitectureGenerator.py

      The main body of the simulator. This program reads all of its input
      parameters, in JSON format, from a text file (the experiment
      descriptor).
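
The domain, pair, and triple counts described for ConsolidateData.py can be
sketched as follows. This is only an illustration, not the code in
ConsolidateData.py: it assumes that each architecture is an ordered list of
domain identifiers, that duplicate architectures are counted once, and that
pairs and triples mean adjacent domains. The function name and the
placeholder domain names are hypothetical.

```python
from collections import Counter


def count_domain_ngrams(architectures):
    """Count single domains, adjacent pairs, and adjacent triples.

    architectures is an iterable of ordered domain-identifier sequences.
    Duplicate architectures are collapsed first (an assumption).
    """
    single_counts = Counter()
    pair_counts = Counter()
    triple_counts = Counter()

    # Convert to tuples so that identical architectures can be deduplicated.
    for arch in set(map(tuple, architectures)):
        single_counts.update(arch)
        pair_counts.update(zip(arch, arch[1:]))
        triple_counts.update(zip(arch, arch[1:], arch[2:]))

    return single_counts, pair_counts, triple_counts


# Hypothetical input; the first and third architectures are duplicates.
example_architectures = [["A", "B", "C"], ["A", "B"], ["A", "B", "C"]]
singles, pairs, triples = count_domain_ngrams(example_architectures)
```

With this input there are two unique architectures, so the pair ("A", "B")
is counted twice and the triple ("A", "B", "C") once.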

The following commands will reproduce the primate experiments in the manuscript.
They must be executed from the DomArchov/ directory.

    python3 ConsolidateData.py -usdtaflc ExamplePrimates/
    python3 PreCalculation.py ExamplePrimates/
    python3 DomainArchitectureGenerator.py ExamplePrimates/expdesc.txt
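
The last command passes an experiment descriptor, a text file in JSON
format, to the simulator. A minimal sketch of reading such a file is shown
below. The descriptor's parameter names are not listed in this README, so
the sketch assumes only that the top level is a JSON object; both function
names are hypothetical and are not part of DomArchov.

```python
import json


def parse_experiment_descriptor(text):
    """Parse descriptor text and check that it is a JSON object."""
    descriptor = json.loads(text)
    if not isinstance(descriptor, dict):
        # Assumption: parameters are stored as key/value pairs.
        raise ValueError("expected a JSON object at the top level")
    return descriptor


def load_experiment_descriptor(path):
    """Read a descriptor file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_experiment_descriptor(handle.read())
```

For example, load_experiment_descriptor("ExamplePrimates/expdesc.txt")
would return the parameters as a Python dictionary, provided that the file
holds a JSON object.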
