CAFE
Computational Analysis of gene Family Evolution
CAFE Usage

CAFE commands can be given interactively through CAFE's own shell, or by providing CAFE with a shell script that lists the commands. To run CAFE interactively, type cafe at your shell prompt. If all is well, the prompt changes to #, and you can begin entering commands. To exit the shell, type exit. If you need to run multiple analyses using similar inputs, you can instead provide CAFE with a shell script. Scripts should be saved as text files with UNIX line endings, and can then be executed from your operating system's shell or from CAFE's shell. Here is an example:

#! cafe
# version
# date
load -i data/example2.tab -t 10 -l logfile.txt -p 0.05
tree (((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)
lambda -s -t (((1,1)1,(2,2)2)2,2)
report resultfile

In this example, the first line (starting with #!) indicates the location of the CAFE shell program. All subsequent lines beginning with "#" are treated as comments, so only lines 4, 5, 6 and 7 (load, tree, lambda and report) are executed. To run a script directly, you must first make the file executable from your OS shell prompt (in a UNIX shell, this is done with chmod a+x filename). CAFE automatically exits after the last command in the script completes, so the script does not need to end with the exit command.
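As an illustration of the comment rule described above, here is a minimal Python 3 sketch that lists which lines of the example script CAFE would execute, plus a helper that saves a script with UNIX line endings and adds the executable bits the way chmod a+x does. The function names are hypothetical, and skipping blank lines is an assumption; this is not code shipped with CAFE.

```python
import os
import stat

# The example script from above, reproduced verbatim.
EXAMPLE_SCRIPT = """#! cafe
# version
# date
load -i data/example2.tab -t 10 -l logfile.txt -p 0.05
tree (((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)
lambda -s -t (((1,1)1,(2,2)2)2,2)
report resultfile
"""


def executed_commands(script_text):
    """Return (line_number, command) pairs for the lines CAFE would run.

    Lines beginning with '#' (including the '#!' first line) are comments.
    Skipping blank lines is an assumption made for this sketch.
    """
    commands = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        commands.append((number, line.strip()))
    return commands


def write_executable_script(path, script_text):
    """Write the script with UNIX ('\\n') line endings, then add execute
    permission for user, group and others (like `chmod a+x path`)."""
    with open(path, "w", newline="\n") as handle:
        handle.write(script_text)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Prints the line number and command name of each executed line.
    for number, command in executed_commands(EXAMPLE_SCRIPT):
        print(number, command.split()[0])
```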

Running an analysis will generally require at least four commands:

  1. load to specify the gene families to analyze
  2. tree to specify the structure of the phylogenetic tree
  3. lambda to set a specific lambda value or to have CAFE search for one
  4. report to return the results of running the analysis
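Because most analyses use these same four commands, generating scripts programmatically can help when running many similar analyses. The Python 3 sketch below assembles a script from caller-supplied arguments and checks that none of the four commands is missing. The helper names and the default "#! cafe" first line (copied from the example) are assumptions for illustration only.

```python
# The four commands an analysis generally needs, in the order listed above.
REQUIRED_COMMANDS = ("load", "tree", "lambda", "report")


def build_script(load_args, newick_tree, lambda_args, report_name,
                 first_line="#! cafe"):
    """Assemble a CAFE script; every argument string is passed through
    unchanged, so any load/lambda options must be supplied by the caller."""
    lines = [
        first_line,
        "load " + load_args,
        "tree " + newick_tree,
        "lambda " + lambda_args,
        "report " + report_name,
    ]
    return "\n".join(lines) + "\n"  # UNIX line endings


def missing_commands(script_text):
    """Return the required commands that never appear as a command line."""
    used = {line.split()[0] for line in script_text.splitlines()
            if line.strip() and not line.startswith("#")}
    return [name for name in REQUIRED_COMMANDS if name not in used]
```

For example, build_script("-i data/example2.tab -p 0.05", "(((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)", "-s -t (((1,1)1,(2,2)2)2,2)", "resultfile") yields a script equivalent to the example apart from the omitted -t and -l load options.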

Commands

Caferror

caferror.py is a Python script, included with CAFE v4.0 and later, that runs the errormodel command iteratively to estimate the error in an input data set without any prior knowledge of the error distribution. It uses the likelihood scores of runs with different error models to perform a precise grid search of the likelihood surface. The program first estimates the average global error across all species in the input phylogeny and then, depending on the -s option, may go on to estimate error for individual species.
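To make the idea of searching the likelihood surface more concrete, here is a simplified Python 3 sketch of a one-dimensional, coarse-to-fine search for a single global error value. It starts at an initial value (0.4, matching the documented -e default) and refines with successively smaller step sizes. This is not caferror.py's actual algorithm: the score callable is a hypothetical stand-in for "run CAFE with this error model and read its score", the assumption that lower scores are better (as for a negative log-likelihood) is ours, and per-species estimation is not modeled.

```python
def stepwise_grid_search(score, start=0.4, steps=(0.1, 0.01, 0.001),
                         low=0.0, high=1.0):
    """Minimise score(error) for error strictly between low and high.

    For each step size (coarse to fine), keep moving from the current best
    value to a neighbouring grid point while that lowers the score, then
    continue with the next, finer step size.
    """
    best = start
    best_score = score(best)
    for step in steps:
        improved = True
        while improved:
            improved = False
            # Rounding keeps candidates on the decimal grid despite float error.
            for candidate in (round(best - step, 10), round(best + step, 10)):
                if low < candidate < high:
                    candidate_score = score(candidate)
                    if candidate_score < best_score:
                        best, best_score = candidate, candidate_score
                        improved = True
                        break
    return best, best_score


if __name__ == "__main__":
    # Toy score with a made-up minimum at 0.07, standing in for a CAFE run.
    print(stepwise_grid_search(lambda error: (error - 0.07) ** 2))
```

Each call to score corresponds to one CAFE run in a real search, which is why refining locally is cheaper than evaluating a fine grid over the whole interval.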

Example

$ python caferror.py [-i shell script filename] [-e initial error value] [-d output directory name] [-l log filename] [-o output filename] [-s value]

Note that:

  1. Python 2.6 or newer must be installed on your machine. You can download it from https://www.python.org;
  2. caferror.py must be run from the directory containing CAFE, because it uses that path to invoke CAFE.
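A hedged Python 3 sketch of launching caferror.py from another program: it builds the command line from the options shown in the synopsis and runs it with the working directory set to the CAFE directory, as the second note requires. The helper names are hypothetical, it assumes caferror.py sits in that same directory, and the -s value is passed through unchanged because its meaning is not described in this section.

```python
import subprocess


def caferror_argv(script, error=None, outdir=None, log=None, output=None,
                  s_value=None, python="python"):
    """Build the caferror.py command line; options left as None are omitted
    so that caferror.py falls back to its own defaults."""
    # The -e value should lie between 0 and 1 (inclusive bounds assumed).
    if error is not None and not 0.0 <= error <= 1.0:
        raise ValueError("initial error value must be between 0 and 1")
    argv = [python, "caferror.py", "-i", script]
    for flag, value in (("-e", error), ("-d", outdir), ("-l", log),
                        ("-o", output), ("-s", s_value)):
        if value is not None:
            argv += [flag, str(value)]
    return argv


def run_caferror(cafe_dir, argv):
    """Run caferror.py from the directory that contains CAFE."""
    return subprocess.run(argv, cwd=cafe_dir, check=True)
```

For example, run_caferror("/path/to/cafe", caferror_argv("myscript.sh", error=0.4)) would run the equivalent of the synopsis with only -i and -e set; the path and script name are placeholders.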

Command line options

-i shell script filename: The main input to caferror.py is a CAFE shell script like the one shown above. From this script, caferror.py extracts the information it needs to run CAFE repeatedly while estimating error: the input gene family file from the load command, the tree command, and the lambda command. caferror.py does not overwrite the input script; instead, it writes its own.
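The following Python 3 sketch shows the kind of extraction described for -i: it pulls the gene family file (the argument of load -i), the full tree command, and the full lambda command out of a script such as the example above. It is an illustration only, not caferror.py's own parser; the function name is hypothetical, and it assumes each command fits on a single line.

```python
# The example script from the CAFE Usage section, reproduced verbatim.
EXAMPLE_SCRIPT = """#! cafe
# version
# date
load -i data/example2.tab -t 10 -l logfile.txt -p 0.05
tree (((chimp:6,human:6):81,(mouse:17,rat:17):70):6,dog:93)
lambda -s -t (((1,1)1,(2,2)2)2,2)
report resultfile
"""


def extract_caferror_inputs(script_text):
    """Return (family_file, tree_command, lambda_command); None if absent."""
    family_file = tree_command = lambda_command = None
    for line in script_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        if words[0] == "load" and "-i" in words:
            index = words.index("-i")
            if index + 1 < len(words):
                family_file = words[index + 1]
        elif words[0] == "tree":
            tree_command = line.strip()
        elif words[0] == "lambda":
            lambda_command = line.strip()
    return family_file, tree_command, lambda_command
```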

-e initial error value: This is the value at which caferror.py begins the grid search. It should be a floating point value between 0 and 1. Default: 0.4.

-d output directory name: Because caferror.py runs CAFE many times, it creates many error model files and CAFE log files. All of these, together with caferror.py's own output files, are stored in the directory specified with this option. If the directory does not exist, caferror.py creates it automatically. Default: caferror_tmp_dir_x, where x is an integer one higher than that of the previous default directory.
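The default naming rule can be sketched in Python 3 as follows. This is one interpretation, not caferror.py's code: it assumes x is one higher than the highest number among existing caferror_tmp_dir_ directories, and that numbering starts at 1 when none exist. The function name is hypothetical.

```python
import os
import re

# Matches names such as caferror_tmp_dir_3 and captures the number.
DEFAULT_DIR_PATTERN = re.compile(r"caferror_tmp_dir_(\d+)$")


def next_default_dir(existing_names):
    """Return the next caferror_tmp_dir_x name (see assumptions above)."""
    numbers = []
    for name in existing_names:
        match = DEFAULT_DIR_PATTERN.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return "caferror_tmp_dir_%d" % (max(numbers, default=0) + 1)


if __name__ == "__main__":
    # Inspect the current directory for existing default directories.
    print(next_default_dir(os.listdir(".")))
```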

-l log filename: caferror.py records its error estimates and scores in its own log file, which is also where the final error estimates are reported. Use this option to set the name of that file. Default: caferrorLog.txt.

-o output filename: Use this option to set the name of the output file. The error estimation produces a curve that can be plotted for visualization; this file contains two tab-delimited columns: the error model and the score obtained with that error model. Simply copy and paste these data points into your favorite graphing software to see how caferror.py estimated the error. Default: caferror_default_output.txt.

Note: this option outputs data points for the global error estimation.
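If you would rather inspect the curve programmatically than in graphing software, a minimal Python 3 sketch can read the two tab-delimited columns and pick the best-scoring error model. It assumes the second column is numeric and that a lower score is better; the first column is kept as text because its exact format is not specified here. The function names are hypothetical.

```python
import csv


def read_error_curve(lines):
    """Parse (error model, score) pairs from tab-delimited lines."""
    points = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            points.append((row[0], float(row[1])))
        except ValueError:
            continue  # skip a non-numeric line such as a header, if present
    return points


def best_point(points):
    """Return the point with the lowest score (lower assumed better)."""
    return min(points, key=lambda point: point[1])
```

For example: with open("caferror_default_output.txt", newline="") as handle: print(best_point(read_error_curve(handle))).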