CAFE
Computational Analysis of gene Family Evolution
simerror.cpp File Reference
#include <string>
#include <vector>
#include <algorithm>
#include <sstream>
#include <fstream>
#include <stdexcept>
#include <iterator>
#include "simerror.h"

Functions

int backup_original_count (pCafeFamily pcf)
 
int restore_original_count (pCafeFamily pcf)
 
size_t get_random (std::vector< double > misclassification_probability)
 
void write_strings (std::ostream &ost, char **items, int size, std::string delimiter)
 
void write_species_counts (pCafeFamily pcf, std::ostream &ost)
 
void simulate_misclassification (pCafeFamily pcf)
 
double simerror (pCafeFamily pcf, std::string prefix, int repeat)
 

Function Documentation

int backup_original_count (pCafeFamily pcf)

size_t get_random (std::vector< double > misclassification_probability)

Random sampling based on the misclassification probabilities. This may have a bug: nothing inherently prevents r from going over the limit. Is there an unstated assumption that the entries of misclassification_probability add up to 1?
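One way to remove the dependence on the probabilities summing to exactly 1 is inverse-CDF sampling that scales the uniform draw by the actual total and clamps to the last index to absorb floating-point round-off. This is a minimal sketch, not CAFE's implementation; `sample_index` and `get_random_sketch` are hypothetical names, and passing the random engine in explicitly is an assumption about how the caller supplies randomness.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: map a uniform draw u in [0, 1) to an index chosen with
// probability proportional to weights[i]. Scaling u by the total means the
// weights do not have to sum to exactly 1.
std::size_t sample_index(const std::vector<double>& weights, double u)
{
    assert(!weights.empty());
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(total > 0.0);
    const double target = u * total;
    double cumulative = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        cumulative += weights[i];
        if (target < cumulative) {
            return i;
        }
    }
    // Round-off can leave target >= cumulative; never return an index past the end.
    return weights.size() - 1;
}

// Hypothetical stand-in for get_random: draws u from a caller-supplied engine.
std::size_t get_random_sketch(const std::vector<double>& misclassification_probability,
                              std::mt19937& rng)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return sample_index(misclassification_probability, uniform(rng));
}
```

For example, with weights {2.0, 6.0} a draw of u = 0.5 gives target 4.0, which falls in the second bucket, so index 1 is returned. The standard library's `std::discrete_distribution` performs the same normalization and could be used instead.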

int restore_original_count (pCafeFamily pcf)

double simerror (pCafeFamily pcf, std::string prefix, int repeat)

To check how SIMEX is working, generate data with additional error (Ystar) by applying the misclassification matrix to the true data (Y): each true value f is resampled with probabilities given by column f of the misclassification matrix. Then run SIMEX on the error-prone variable (Ystar) and compare the result with the true model (based on Y) and the naive model (based on Ystar).
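The resampling step (each true value f is replaced by a draw from column f of the misclassification matrix) can be sketched as follows. The matrix layout `error_matrix[observed][true_value]`, the square-matrix assumption, and the name `add_misclassification` are hypothetical; CAFE's own data structures (for example pCafeFamily) are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical layout: error_matrix[observed][true_value] is the probability of
// observing `observed` when the true count is `true_value`, so column f holds
// the distribution used to resample a true count f. Assumed square.
using Matrix = std::vector<std::vector<double>>;

// Returns Ystar: each true count in y is replaced by a draw from column y[i].
std::vector<int> add_misclassification(const std::vector<int>& y,
                                       const Matrix& error_matrix,
                                       std::mt19937& rng)
{
    std::vector<int> ystar;
    ystar.reserve(y.size());
    for (int f : y) {
        assert(f >= 0 && static_cast<std::size_t>(f) < error_matrix.size());
        // Copy column f into a weight vector.
        std::vector<double> column;
        column.reserve(error_matrix.size());
        for (const auto& row : error_matrix) {
            column.push_back(row[f]);
        }
        // discrete_distribution normalizes the weights and returns a row index.
        std::discrete_distribution<int> draw(column.begin(), column.end());
        ystar.push_back(draw(rng));
    }
    return ystar;
}
```

With an identity matrix no error is added and Ystar equals Y; any off-diagonal mass in column f lets a true count f be observed as a different count.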

First, find the naive estimator by assuming there is no error in the data. Then add progressively more error and store the estimates for each dataset with increasing error (k = 0.5, 1, 1.5, 2), so that in the end there are (1 + number of lambda values) * (number of parameters) estimates. For each value k, run the estimation B times (j = 1, ..., B). Each time, add error by applying the misclassification matrix raised to the power k (errormatrix^k), re-estimate the parameters from the error-added data, and store the mean of the estimates over all B runs. Finally, extrapolate the estimates back to k = -1.
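The procedure can be sketched for a single parameter as below. `estimate_with_error` is a hypothetical callback standing in for "apply errormatrix^k and re-estimate" (computing a fractional matrix power is not shown), and the quadratic extrapolant is an assumption: the text only says the estimates are extrapolated to k = -1, and a quadratic fit is a common SIMEX choice rather than something this page specifies. With several parameters, the same fit would be repeated per parameter.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Fit y = a + b*k + c*k^2 by least squares (3x3 normal equations) and return
// the fitted value at k = -1. The quadratic form is an assumption.
double extrapolate_quadratic(const std::vector<std::pair<double, double>>& points)
{
    assert(points.size() >= 3);
    std::array<double, 5> sk{};  // sums of k^0 .. k^4
    std::array<double, 3> sy{};  // sums of y*k^0 .. y*k^2
    for (const auto& pt : points) {
        double p = 1.0;
        for (int d = 0; d < 5; ++d) {
            sk[d] += p;
            if (d < 3) sy[d] += pt.second * p;
            p *= pt.first;
        }
    }
    // Augmented matrix [A | b], solved by Gaussian elimination with partial pivoting.
    double m[3][4];
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) m[r][c] = sk[r + c];
        m[r][3] = sy[r];
    }
    for (int col = 0; col < 3; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 3; ++r)
            if (std::fabs(m[r][col]) > std::fabs(m[pivot][col])) pivot = r;
        for (int c = 0; c < 4; ++c) std::swap(m[col][c], m[pivot][c]);
        for (int r = col + 1; r < 3; ++r) {
            const double factor = m[r][col] / m[col][col];
            for (int c = col; c < 4; ++c) m[r][c] -= factor * m[col][c];
        }
    }
    double coef[3];  // a, b, c
    for (int r = 2; r >= 0; --r) {
        double s = m[r][3];
        for (int c = r + 1; c < 3; ++c) s -= m[r][c] * coef[c];
        coef[r] = s / m[r][r];
    }
    return coef[0] - coef[1] + coef[2];  // a + b*(-1) + c*(-1)^2
}

// Hypothetical callback estimate_with_error(k): add error at level k
// (k = 0 uses the data as-is), re-estimate, and return one parameter.
double simex_estimate(const std::function<double(double)>& estimate_with_error,
                      const std::vector<double>& levels, int B)
{
    std::vector<std::pair<double, double>> points;
    points.emplace_back(0.0, estimate_with_error(0.0));  // naive estimator
    for (double k : levels) {
        double sum = 0.0;
        for (int j = 0; j < B; ++j) sum += estimate_with_error(k);
        points.emplace_back(k, sum / B);  // mean over the B runs
    }
    return extrapolate_quadratic(points);
}
```

For instance, if the mean estimate at each level follows 1 + 2k + 3k^2, the fit is exact and the extrapolated value at k = -1 is 1 - 2 + 3 = 2.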

void simulate_misclassification (pCafeFamily pcf)

void write_species_counts (pCafeFamily pcf, std::ostream &ost)

void write_strings (std::ostream &ost, char **items, int size, std::string delimiter)