Bioinformatics and Phylogenetic Analysis


1 Bioinformatics and Phylogenetic Analysis
Edgar Scott, Multicampus Bioinformatics Education Specialist

2 What is Bioinformatics?
An interdisciplinary field that applies principles and techniques from computer science, probability and statistics, and linguistics to the study of genomic and proteomic sequences.
Biological databases for storing and organizing DNA and protein sequences
Computational tools for analyzing sequences

3 Phylogenetic Analysis and Bioinformatics
Phylogenetics – the study of evolutionary relationships
Phylogenetic trees are used to represent evolutionary relationships
Protein or DNA sequences, rather than morphological characters, are used to detect relationships
Bioinformatics provides both sequence repositories and sequence analysis software.

4 Overview
Acquiring Data Set
  Text searching at the National Center for Biotechnology Information (NCBI)
  Sequence similarity and homology
  Sequence similarity searching with Basic Local Alignment Search Tool (BLAST)
Analyzing Data Set
  Phylogenetic Analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
  Build multiple sequence alignments of sequences using ClustalW
  Build phylogenetic trees

5 Text Searching at NCBI
NCBI provides molecular information and bioinformatic tools to the scientific community
GenBank – an archival DNA and protein sequence database
RefSeq – a curated DNA and protein sequence database
Entrez Gene – a gene centered database
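As a rough illustration of running the same kind of text search from a script rather than the web pages, the sketch below builds a query URL for NCBI's E-utilities `esearch` service. E-utilities is not mentioned in the slides; the helper name `build_esearch_url` and the example search term are illustrative assumptions. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service (an assumption relative to
# the slides, which only describe searching through the NCBI web site).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_records=20):
    """Return an esearch URL for a text query against one Entrez database.

    build_esearch_url is a hypothetical helper written for this example.
    """
    params = {"db": database, "term": term, "retmax": max_records}
    return ESEARCH_BASE + "?" + urlencode(params)


# Example: look for human hemoglobin records in the protein database.
url = build_esearch_url("protein", "hemoglobin AND Homo sapiens[Organism]")
print(url)
```

Fetching the URL returns a list of record identifiers, which can then be used to download the sequences that make up the data set.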

6 Sequence Similarity and Homology
Homologs – sequences that share a common ancestral sequence
  Paralogs – arise via gene duplication
  Orthologs – arise via speciation event
  Xenologs – arise via horizontal gene transfer
Evolutionarily related sequences tend to be similar.
Sequence differences reflect the amount of change that has occurred since the sequences last shared a common ancestral sequence.

7 Sequence Alignments
Sequence Alignment – a process that identifies a series of characters or character patterns that are in the same order in both sequences.
  Pairwise Global alignment
  Pairwise Local alignment
Optimal alignment – an alignment between sequences in which the number of matching characters is maximized and the number of mismatching characters is minimized.
Quantifying alignments
  Alignment score of the optimal alignment
  Percent identity scores
  Percent similarity scores
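The idea of an optimal pairwise global alignment and its score can be sketched with the classic Needleman-Wunsch dynamic programming approach. This is a minimal illustration, not the method used by any tool named in the slides; the scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for the example. Percent identity is computed here over the full alignment length, which is only one of several conventions.

```python
def global_alignment(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one alignment.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same non-gap character."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


best_score, top, bottom = global_alignment("ACGT", "AGT")
print(best_score, top, bottom, percent_identity(top, bottom))
```

A local alignment differs mainly in that cell scores are never allowed to drop below zero and the traceback starts from the highest-scoring cell rather than the corner.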

8 Sequence Similarity Searching
Basic Local Alignment Search Tool (BLAST)
  Blastp, Blastn, Blastx, Tblastn, & TblastX
  Local alignments are reported
  Expectation Value – the number of alignments with a score as good as or better than the alignment score under consideration that an investigator can expect to find by chance.
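To make the expectation value more concrete, the sketch below uses the standard relationship between a normalized bit score S' and the size of the search space, E = m × n × 2^(-S'), where m is the query length and n is the total database length. The function name is an assumption, and BLAST itself applies corrections (such as effective lengths) that this sketch leaves out, so the numbers will not match real BLAST output exactly.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2 ** (-S'), ignoring the edge-effect corrections
    that real BLAST applies to the query and database lengths.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Each additional bit of score halves the expected number of chance hits,
# while doubling the database size doubles it.
for bits in (30, 40, 50):
    print(bits, expect_value(bits, query_length=300, database_length=10**9))
```

An expectation value close to zero means an alignment that good is unlikely to arise by chance in a search space of that size, which is why small values are taken as evidence of homology.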

9 Steps to Build a Tree
Build a multiple sequence alignment of the data set.
Analyze the multiple sequence alignment using either distance based methods or character based methods.

10 Molecular Evolutionary Genetics Analysis (MEGA) 3.1
Phylogenetic Analysis program
Constructs multiple sequence alignment using ClustalW
Provides tree building methods
  Distance based Methods
    UPGMA
    Neighbor-joining method
    Minimum Evolution
  Character based Method
    Maximum Parsimony
Provides a great help document!

11 Multiple Sequence Alignment
Multiple Sequence Alignment – an alignment between three or more sequences.
Finding an optimal multiple sequence alignment is computationally classified as NP-hard
Programs
  ClustalW – fast, applies a progressive method
  T-Coffee – slower, applies an advanced progressive method
  Dialign – slow, applies an iterative method
  Combine – combines multiple sequence alignments

12 Tree Building methods
Distance based methods (UPGMA, Neighbor-Joining, Minimum Evolution)
  Analyze the multiple sequence alignment to calculate a distance matrix.
  Clustering algorithm analyzes the distance matrix to determine which sequences should be clustered.
Maximum parsimony
  Character based method
  Analyze the multiple sequence alignment to create a tree whose tree length has been minimized.
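The two steps of a distance based method can be sketched as follows: compute a distance matrix from the alignment, then cluster it. The example uses the p-distance (the proportion of differing sites) and a bare-bones UPGMA that returns only the tree topology as nested tuples, with no branch lengths. The function names and the toy alignment are assumptions for illustration, and MEGA's implementation offers distance corrections that are not shown here.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; returns the topology as nested tuples."""
    names = list(alignment)
    # Each cluster id maps to (subtree, number of sequences in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distance matrix stored as {(smaller_id, larger_id): distance}.
    dist = {
        (i, j): p_distance(alignment[names[i]], alignment[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


toy_alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
print(upgma(toy_alignment))
```

UPGMA assumes a constant rate of change along all lineages; neighbor-joining relaxes that assumption but follows the same overall pattern of repeatedly joining entries of a distance matrix.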

13 Tree Reliability
Bootstrapping – method for assessing the reliability of trees.
Steps
  The columns of the original data set are resampled with replacement many times (e.g. 1000).
  For each resampling, a tree is built.
  The trees created from the resampling iterations are compared to the original tree.
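The bootstrap steps can be sketched in a few lines. To keep the example short, a full tree builder is replaced by a stand-in, `closest_pair`, which reports only the two most similar sequences; the support value is then the fraction of resampled alignments in which that same grouping is recovered. All function names, the stand-in itself, and the toy data are assumptions for illustration.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to build a pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of sequences with fewest differences."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=differences))


def bootstrap_support(alignment, replicates=1000, seed=0):
    """Fraction of replicates in which the original grouping is recovered."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    original = closest_pair(alignment)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == original
        for _ in range(replicates)
    )
    return original, hits / replicates


toy_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACTTAGGTCT",
}
grouping, support = bootstrap_support(toy_alignment, replicates=200)
print(sorted(grouping), support)
```

In a real analysis the same comparison is made for every branch of the original tree, and the resulting percentages are shown on the branches as bootstrap values.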

14 Review
Acquiring Data Set
  Text searching at the National Center for Biotechnology Information (NCBI)
  Sequence similarity and homology
  Sequence similarity searching with Basic Local Alignment Search Tool (BLAST)
Analyzing Data Set
  Phylogenetic Analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
  Build multiple sequence alignments of sequences using ClustalW
  Build phylogenetic trees

