Saturday, January 23, 2010

Phylogenetic tree


A phylogenetic tree, or evolutionary tree, is a tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor. In a phylogenetic tree, each node with descendants represents the most recent common ancestor of those descendants, and in some trees the edge lengths correspond to time estimates. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units (HTUs), as they cannot be directly observed.

A rooted phylogenetic tree is a directed tree with a unique node corresponding to the (usually imputed) most recent common ancestor of all the entities at the leaves of the tree. The most common method for rooting trees is the use of an uncontroversial outgroup: close enough to allow inference from sequence or trait data, but far enough to be a clear outgroup.

Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about common ancestry. While an unrooted tree can always be generated from a rooted one by simply omitting the root, a root cannot be inferred from an unrooted tree without some means of identifying ancestry. This is normally done by including an outgroup in the input data or by introducing additional assumptions about the relative rates of evolution on each branch, such as an application of the molecular clock hypothesis. Figure 1 depicts an unrooted phylogenetic tree for myosin, a superfamily of proteins.

Both rooted and unrooted phylogenetic trees can be either bifurcating or multifurcating, and either labeled or unlabeled. A bifurcating tree has exactly two descendants arising from each internal node, while a multifurcating tree may have more than two. A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, defines only a topology. The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but there are always more multifurcating than bifurcating trees, more labeled than unlabeled trees, and more rooted than unrooted trees. The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root. Among labeled bifurcating trees, the number of unrooted trees with n leaves is equal to the number of rooted trees with n - 1 leaves.
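The counting relationship in the last sentence can be checked numerically. The sketch below relies on the standard closed forms for labeled bifurcating trees, which the text above does not state: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees on n leaves, where !! is the double factorial. The function names are chosen here for illustration only.

```python
def double_factorial(k):
    """Return k * (k - 2) * (k - 4) * ... down to 1 (taken as 1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted(n):
    """Labeled, rooted, bifurcating trees on n >= 2 leaves: (2n - 3)!!"""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n - 3)


def count_unrooted(n):
    """Labeled, unrooted, bifurcating trees on n >= 3 leaves: (2n - 5)!!"""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)


for n in range(3, 11):
    print(n, count_rooted(n), count_unrooted(n))
```

For 10 leaves this already gives 34,459,425 rooted and 2,027,025 unrooted trees, which illustrates how quickly the space of candidate trees grows.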

A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

A cladogram is a tree formed using cladistic methods. This type of tree represents only a branching pattern, i.e., its branch lengths do not represent time.

A phylogram is a phylogenetic tree that explicitly represents the number of character changes through its branch lengths.

An ultrametric tree or chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.
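One practical consequence of the distinction between a phylogram and an ultrametric tree is that, in an ultrametric tree, every leaf lies at the same distance from the root. The sketch below checks that property. The tree encoding (a leaf is a string, an internal node is a list of `(child, branch_length)` pairs), the function names, and the example branch lengths are all assumptions made here for illustration, not a standard format or real data.

```python
def leaf_depths(node, depth=0.0):
    """Collect the root-to-leaf distance of every leaf below `node`.

    A leaf is a string label; an internal node is a list of
    (child, branch_length) pairs. This encoding is hypothetical.
    """
    if isinstance(node, str):
        return [depth]
    depths = []
    for child, length in node:
        depths.extend(leaf_depths(child, depth + length))
    return depths


def is_ultrametric(tree, tol=1e-9):
    """True if all leaves are equally far from the root (within tol)."""
    depths = leaf_depths(tree)
    return max(depths) - min(depths) <= tol


# Made-up branch lengths: the first tree has all tips at distance 3.0,
# the second has tips at different distances from the root.
chronogram = [([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)]
phylogram = [([("A", 0.5), ("B", 1.5)], 2.0), ("C", 1.0)]

print(is_ultrametric(chronogram))
print(is_ultrametric(phylogram))
```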

Phylogenetic trees among a nontrivial number of input sequences are constructed using computational phylogenetics methods. Distance-matrix methods such as neighbor-joining or UPGMA, which calculate genetic distance from multiple sequence alignments, are the simplest to implement, but do not invoke an evolutionary model. Many sequence alignment methods, such as ClustalW, also create trees using the simpler (i.e., distance-based) algorithms of tree construction. Maximum parsimony is another simple method of estimating phylogenetic trees, but it carries an implicit model of evolution (i.e., parsimony). More advanced methods use the optimality criterion of maximum likelihood, often within a Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation. Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.
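To make the distance-matrix idea concrete, here is a minimal sketch of UPGMA: repeatedly merge the two closest clusters, place the new node at half the distance between them, and recompute distances to the merged cluster as a size-weighted average. The function name `upgma`, the input format (a dict keyed by pairs of labels), the nested-tuple output `(left, right, height)`, and the toy distances are assumptions for illustration; this is not the implementation used by ClustalW or any other package.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster leaves by UPGMA and return a nested (left, right, height) tuple.

    names: list of leaf labels.
    dist:  dict mapping a pair (x, y) of labels to their distance;
           every unordered pair must appear exactly once.
    """
    # Store distances under unordered keys so lookup ignores pair order.
    d = {frozenset(pair): float(value) for pair, value in dist.items()}
    sizes = {name: 1 for name in names}  # cluster -> number of leaves in it

    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b, d[frozenset((a, b))] / 2.0)

        # Distance to the new cluster is the size-weighted mean of the
        # distances to the two clusters it replaces.
        for c in sizes:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])

        sizes[merged] = sizes.pop(a) + sizes.pop(b)

    return next(iter(sizes))


# Toy distances, made up for this example.
toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

With these toy distances, A and B are joined first at height 1.0, C joins them at height 3.0, and D attaches last at height 5.0. UPGMA always produces a rooted, ultrametric tree, which is why it is usually considered appropriate only when rates of change are roughly constant across lineages; neighbor-joining relaxes that assumption.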

Tree-building methods can be assessed on the basis of several criteria:

* Efficiency (how long does it take to compute the answer, how much memory does it need?)
* Power (does it make good use of the data, or is information being wasted?)
* Consistency (will it converge on the same answer repeatedly, if given different data each time for the same model problem?)
* Robustness (does it cope well with violations of the assumptions of the underlying model?)
* Falsifiability (does it alert us when it is not good to use, i.e., when assumptions are violated?)

Tree-building techniques have also gained the attention of mathematicians. Trees can also be built using T-theory.
