Biological Sequence Analysis

A major application of bioinformatics is the analysis of biological sequences. The field gained wide momentum with the release of the Basic Local Alignment Search Tool (BLAST) in 1990, although its origins go back further, to work done between 1969 and 1977. In 1969, for example, sequence analysis of tRNAs was used to infer interactions from correlated changes in nucleotide sequences, which eventually led to a model of tRNA secondary structure.

Another major driver of the field was the Human Genome Project, a large international effort. Before the project, only limited information about human genes was available: the volume of data was small, and not all chromosomes had been examined. Yet understanding genetic disorders, including chromosomal conditions such as Down syndrome, requires studying all chromosomes. The project aimed to map and locate an estimated 20,000 to 25,000 human genes (distributed across 23 pairs of chromosomes) and to determine the function of each of them. This had not been feasible with non-computational methods alone.

Biological sequence analysis is the process of analyzing a DNA, RNA, or peptide sequence with one of the available analytical methods. Relationships between sequences, and the meaning of those relationships, are studied to identify evolutionary relationships (common ancestry) and to predict the structure and function of uncharacterized sequences.

These goals can be pursued with database search tools such as BLAST and FASTA, phylogenetic tree construction methods, iterative heuristics such as genetic algorithms, statistical models such as Markov chains, dynamic programming algorithms such as Smith-Waterman and Needleman-Wunsch, scoring matrices such as PAM and BLOSUM, Bayesian alignment algorithms, progressive alignment methods such as ClustalW, sampling methods such as the Gibbs sampler, transformational grammars, neural networks, and others.
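
To make the dynamic programming idea concrete, the following is a minimal sketch of Needleman-Wunsch global alignment scoring. The function name and the scoring values (match = 1, mismatch = -1, linear gap = -2) are illustrative assumptions; real analyses typically use a substitution matrix such as PAM or BLOSUM and affine gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not recommended defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(needleman_wunsch_score("ACGT", "ACT"))
```

The table is filled in O(len(a) × len(b)) time. Recovering the alignment itself, rather than only its score, requires an additional traceback through the table.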

Applying such methods to biological sequence problems has reduced the time and cost of analysis and has made it possible to solve problems that laboratory methods alone could not. Work in this field includes:

  1. RNA structure analysis and prediction
  2. Comparison of sequences
  3. Global and local sequence alignment
  4. Improving database searching by sequence
  5. Multiple-sequence alignment
  6. Pattern and profile methods of identifying distant homologs
  7. Genomic analysis
  8. Protein structure prediction
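
As an illustration of local alignment (item 3), here is a minimal sketch of the Smith-Waterman algorithm with traceback. The function name and scoring values (match = 2, mismatch = -1, linear gap = -2) are assumptions chosen for the example, not standard parameters. Unlike global alignment, cell scores are floored at zero, so the best-scoring region can start and end anywhere in either sequence.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table; a score of 0 means "start a new alignment here".
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

When several local alignments share the best score, this sketch reports only the first one encountered while filling the table.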

For more information, please refer to the following sources:

  1. Singh D. B., Pathak R. K. (2021) Bioinformatics: Methods and Applications. Academic Press.
  2. Borodovsky M., Ekisheva S. (2006) Problems and Solutions in Biological Sequence Analysis. Cambridge University Press.
  3. Durbin R., Eddy S. R., Krogh A., Mitchison G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. Available at: https://books.google.com/books?id=R5P2GlJvigQC.