#include "muscle_tree.h"
#include "seq.h"
int Mbed(tree_t **tree, mseq_t *prMSeq, const int iPairDistType, const char *pcGuidetreeOutfile, int iClustersizes, const char *pcClusterFile)
| From scratch reimplementation of mBed: Blackshields et al. (2010); PMID 20470396.
Mbed()
extern int Mbed(tree_t **prMbedTree_p,
                mseq_t *prMSeq,
                const int iPairDistType,
                const char *pcGuidetreeOut,
                int iClustersizes,
                const char *pcClusterFile)
The idea is as follows:
- convert the sequences into vectors of distances
- cluster the vectors using k-means
- cluster each of the k clusters using UPGMA (use the cached distances from above?)
- join the sub-clusters into one tree (use UPGMA on the k-means medoids)
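The first two steps (embedding the sequences as distance vectors, then k-means) can be sketched in C as below. This is a minimal illustration under stated assumptions, not the code behind Mbed(): toy_dist() is a hypothetical stand-in for the real pairwise distance measure, the seed sequences are simply passed in by index, and k-means uses plain Lloyd iterations initialised from the first k vectors. The names toy_dist, embed and kmeans are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy distance (NOT mBed's measure): fraction of
 * mismatching positions, counting any length overhang as mismatches. */
static double toy_dist(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t lmax = la > lb ? la : lb;
    size_t lmin = la < lb ? la : lb;
    size_t mism = lmax - lmin;
    if (lmax == 0)
        return 0.0;
    for (size_t i = 0; i < lmin; i++)
        if (a[i] != b[i])
            mism++;
    return (double)mism / (double)lmax;
}

/* Step 1: row i of vec (n x nseeds, row-major) holds the distances from
 * sequence i to each seed sequence; seeds are given by index. */
static void embed(const char **seqs, int n, const int *seeds, int nseeds,
                  double *vec)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < nseeds; j++)
            vec[i * nseeds + j] = toy_dist(seqs[i], seqs[seeds[j]]);
}

/* Squared Euclidean distance between two dim-dimensional vectors. */
static double sqdist(const double *x, const double *y, int dim)
{
    double s = 0.0;
    for (int d = 0; d < dim; d++)
        s += (x[d] - y[d]) * (x[d] - y[d]);
    return s;
}

/* Step 2: Lloyd's k-means on the n vectors; centroids start at the first
 * k vectors. Writes a cluster index per vector to assign[].
 * Returns 0 on success, non-zero on error. */
static int kmeans(const double *vec, int n, int dim, int k, int maxiter,
                  int *assign)
{
    assert(k > 0 && k <= n);
    double *cent = malloc(sizeof(double) * (size_t)k * dim);
    double *sum = malloc(sizeof(double) * (size_t)k * dim);
    int *count = malloc(sizeof(int) * (size_t)k);
    if (!cent || !sum || !count) {
        free(cent); free(sum); free(count);
        return -1;
    }
    memcpy(cent, vec, sizeof(double) * (size_t)k * dim);
    for (int i = 0; i < n; i++)
        assign[i] = -1;
    for (int it = 0; it < maxiter; it++) {
        int changed = 0;
        /* Assignment step: move each vector to its nearest centroid. */
        for (int i = 0; i < n; i++) {
            int best = 0;
            double bestd = sqdist(&vec[i * dim], cent, dim);
            for (int c = 1; c < k; c++) {
                double d = sqdist(&vec[i * dim], &cent[c * dim], dim);
                if (d < bestd) {
                    bestd = d;
                    best = c;
                }
            }
            if (assign[i] != best) {
                assign[i] = best;
                changed = 1;
            }
        }
        if (!changed)
            break; /* converged: no vector changed cluster */
        /* Update step: centroid = mean of its members (empty ones stay put). */
        memset(sum, 0, sizeof(double) * (size_t)k * dim);
        memset(count, 0, sizeof(int) * (size_t)k);
        for (int i = 0; i < n; i++) {
            count[assign[i]]++;
            for (int d = 0; d < dim; d++)
                sum[assign[i] * dim + d] += vec[i * dim + d];
        }
        for (int c = 0; c < k; c++)
            if (count[c] > 0)
                for (int d = 0; d < dim; d++)
                    cent[c * dim + d] = sum[c * dim + d] / count[c];
    }
    free(cent); free(sum); free(count);
    return 0;
}
```

Calling embed() and then kmeans() on the resulting n x nseeds matrix fills assign[] with one cluster index per sequence; each of those clusters is what steps 3 and 4 would then build UPGMA subtrees from.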
- Parameters
[out] | prMbedTree_p | Created UPGMA tree. Will be allocated here; use FreeMuscleTree() to free it. |
[in] | prMSeq | Multiple sequences |
[in] | iPairDistType | Distance measure for pairwise alignments |
[in] | pcGuidetreeOut | Passed down to GuideTreeUpgma() |
- Note
- If the number of sequences is smaller than MAX_ALLOWED_SEQ_PER_PRECLUSTER, there is no need to do the subclustering first; in fact, it costs some extra time. However, the overhead is insignificant, and for simplicity's sake we don't do any special checks.
- Returns
- Zero on success, non-zero on error
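Steps 3 and 4 above rely on UPGMA, and pcGuidetreeOut is handed to GuideTreeUpgma(). For readers unfamiliar with the method, a minimal self-contained sketch of textbook UPGMA on a plain distance matrix follows. It is not GuideTreeUpgma() and does not use the MUSCLE tree_t type; the function name upgma and the flat row-major matrix with left/right/height output arrays are hypothetical. Like Mbed(), it returns zero on success and non-zero on error.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical UPGMA sketch. dist is an n x n symmetric matrix (row-major).
 * Leaves are nodes 0..n-1; merge m (0 <= m < n-1) creates node n+m with
 * children left[m] and right[m] at height[m] (half the joining distance).
 * Returns 0 on success, non-zero on error. */
static int upgma(const double *dist, int n, int *left, int *right,
                 double *height)
{
    if (n < 2)
        return -1;
    int nn = 2 * n - 1; /* total number of nodes in the final tree */
    double *d = malloc(sizeof(double) * (size_t)nn * nn);
    int *size = malloc(sizeof(int) * (size_t)nn);
    int *active = malloc(sizeof(int) * (size_t)nn);
    if (!d || !size || !active) {
        free(d); free(size); free(active);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        size[i] = 1;
        active[i] = 1;
        for (int j = 0; j < n; j++)
            d[i * nn + j] = dist[i * n + j];
    }
    for (int i = n; i < nn; i++)
        active[i] = 0;

    for (int m = 0; m < n - 1; m++) {
        int bi = -1, bj = -1;
        double best = 0.0;
        /* Find the closest pair of active clusters. */
        for (int i = 0; i < n + m; i++) {
            if (!active[i])
                continue;
            for (int j = i + 1; j < n + m; j++) {
                if (!active[j])
                    continue;
                if (bi < 0 || d[i * nn + j] < best) {
                    best = d[i * nn + j];
                    bi = i;
                    bj = j;
                }
            }
        }
        assert(bi >= 0); /* at least two active clusters remain */
        int u = n + m;   /* new internal node */
        left[m] = bi;
        right[m] = bj;
        height[m] = best / 2.0;
        size[u] = size[bi] + size[bj];
        active[bi] = active[bj] = 0;
        /* Size-weighted average distance from u to every other active cluster. */
        for (int k = 0; k < u; k++) {
            if (!active[k])
                continue;
            double dk = (size[bi] * d[bi * nn + k] + size[bj] * d[bj * nn + k])
                        / size[u];
            d[u * nn + k] = d[k * nn + u] = dk;
        }
        d[u * nn + u] = 0.0;
        active[u] = 1;
    }
    free(d); free(size); free(active);
    return 0;
}
```

The closest-pair search here is a naive scan over all active pairs per merge, which keeps the sketch short; it is only meant to show the merge rule (size-weighted average linkage), not an efficient guide-tree builder.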