This folder contains output from DomArchov's preprocessing and precalculation
modules. The following files are generated for each lineage.

  - alphabet

      A list of all superfamily IDs present in the corpus. Generated with
      ConsolidateData.py.

  - domainArchs

      A list of domain architectures. Generated with ConsolidateData.py.

  - domainCounts

      Unigram (single-domain) counts. Generated with ConsolidateData.py.

  - doubleCounts

      Bigram counts. Generated with ConsolidateData.py.

  - tripleCounts

      Trigram counts. Generated with ConsolidateData.py.

  - rawDomainArchLength

      Domain architecture lengths, measured in number of domains. Generated
      with ConsolidateData.py.

  - rawDomainArchCompactLength

      Domain architecture lengths, with tandem repeats collapsed. Generated with
      ConsolidateData.py.

  - sequences

      Domain architectures of all sequences present in the SUPFAM annotations.
      Generated with ConsolidateData.py.

  - onlySingleton

      The list of domains that have only been observed as singletons in the
      data set. Each of these domains can make up a protein on its own, and
      none is observed together with any other domain in any domain
      architecture. Generated with PreCalculation.py.

  - startDomain

      The list of domains that can be inserted as the first domain given an
      empty domain architecture. Generated with PreCalculation.py.

  - doubleEndCount

      Bigram counts with the second position flexible. Generated with
      PreCalculation.py.

  - doubleEndType

      The number of distinct domains that are not observed after a certain
      domain in the raw dataset. Generated with PreCalculation.py.

  - trippleMiddleDomain

      The domains that can exist between two given domains. Generated with
      PreCalculation.py.
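
The quantities described above can be illustrated with a minimal Python sketch.
It is not taken from ConsolidateData.py or PreCalculation.py: the function
names, the toy domain IDs ("A" to "D") and the representation of an architecture
as a list of domain IDs are all assumptions made for illustration, and the real
scripts may count differently (for example, per distinct architecture rather
than per sequence).

```python
from collections import Counter
from itertools import groupby


def ngram_counts(archs, n):
    """Count contiguous runs of n domains over all architectures.

    n=1, 2, 3 correspond in spirit to domainCounts, doubleCounts and
    tripleCounts. Hypothetical helper, not the project's own code.
    """
    counts = Counter()
    for arch in archs:
        for i in range(len(arch) - n + 1):
            counts[tuple(arch[i:i + n])] += 1
    return counts


def compact(arch):
    """Collapse tandem repeats: each run of identical adjacent domains
    is reduced to a single domain."""
    return [dom for dom, _run in groupby(arch)]


def only_singletons(archs):
    """Domains seen alone and never next to a different domain.

    Assumption: an architecture made of repeats of one domain does not
    count as the domain being seen "together with another domain".
    """
    alone = {arch[0] for arch in archs if len(arch) == 1}
    with_others = {d for arch in archs if len(set(arch)) > 1 for d in arch}
    return alone - with_others


# Toy data: each inner list is one domain architecture (placeholder IDs).
archs = [["A"], ["B", "B", "C"], ["B", "C"], ["D"], ["D", "B"]]

singles = ngram_counts(archs, 1)
doubles = ngram_counts(archs, 2)
triples = ngram_counts(archs, 3)
raw_lengths = [len(a) for a in archs]               # cf. rawDomainArchLength
compact_lengths = [len(compact(a)) for a in archs]  # cf. rawDomainArchCompactLength
singleton_only = only_singletons(archs)
```

On this toy input, "B" is counted four times, the bigram ("B", "C") twice, and
the architecture B-B-C has raw length 3 but compact length 2. Only "A" qualifies
as singleton-only, because "D" also appears in the architecture D-B.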
