Computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology. Since the 1980s, numerous algorithms have been proposed to analyze DNA and protein sequences. For example, dynamic programming has been widely used in sequence comparison and alignment, as have hidden Markov models in protein profiling. More recently, machine learning techniques have found success in fields such as microarray analysis, literature searching, and structure prediction.

Non-coding RNA (ncRNA) projects

It has long been known that some genes do not encode proteins (rRNA, tRNA, etc.). However, during the last decade an increasing number of genes, tentatively grouped as snoRNA, miRNA, siRNA, antisense RNA, long mRNA-like RNAs, tncRNA, etc., have revealed that ncRNAs play a major role in gene regulation and other cellular processes. A consequence of this is that the central dogma, that genetic information flows from gene > RNA > protein, is outdated or at least needs substantial adjustment.

Even though the function of many ncRNAs has been found, the function of the majority is still unknown. Finding the targets of these ncRNA genes, along with their spatial and temporal expression profiles, will help us elucidate their function.

miRNA and the tumor suppressor p53

MicroRNAs have been the subject of intense recent interest. They derive from endogenous short hairpin precursor structures and usually target other loci with similar but not identical sequences for translational repression.

Recent reports suggest that at least 30% of human genes may be targeted by miRNAs (Lewis, B.P., et al., Cell, 120, 15-20, 2005).

This project aims to find miRNA targets (usually located within the 3'-UTR of a protein-coding gene) in p53-related genes. For example, we could study the miRNA targets among the transcription factors that are known to regulate the expression of TP53. Alternatively, we may study the p53 downstream genes that could be targets of miRNAs.
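As a rough illustration of what a first-pass target scan could look like, the sketch below searches a 3'-UTR for exact matches to the reverse complement of the miRNA seed. It assumes that the seed is nucleotides 2-8 of the miRNA and that perfect seed pairing is required; both are simplifying assumptions. The function names and the toy UTR are hypothetical and are not taken from the project's actual pipeline.

```python
# Minimal seed-match scan (illustrative only).
# Assumptions: seed = miRNA nucleotides 2-8, exact Watson-Crick pairing,
# sequences given 5'->3'. DNA input is accepted and converted to RNA.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions where the seed could pair perfectly."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]   # 1-based, inclusive bounds
    site = reverse_complement(seed)         # what the UTR must contain
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)       # allow overlapping matches
    return hits


# Toy example: an example miRNA sequence against a made-up UTR fragment.
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGCUACCUCUU"
print(seed_sites(example_mirna, toy_utr))   # [3, 14]
```

Exact seed matching alone produces many false positives, so a real analysis would normally add further filters such as cross-species conservation of the site.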

The antisense project: AISTAR

Recently, an increasing number of eukaryotic genes have been found to have transcription on the antisense strand, overlapping the sense transcript to various degrees. What is the consequence of such transcription? Many sense-antisense pairs with small overlap and/or differential transcription will probably not affect each other. However, in many situations sense-antisense transcription has a regulatory effect. For instance, a correlation between imprinting and antisense transcription has been shown. One of the challenges in this area is to determine which of these sense-antisense pairs influence each other, which do not, and which are experimental or computational artifacts (e.g., non-overlapping or spurious transcripts, or genomic contamination). In this project we use an in silico approach to identify functionally significant sense-antisense pairs.
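The first step of such an in silico screen, finding transcripts on opposite strands whose genomic coordinates overlap, can be sketched as follows. This is only a minimal illustration: the Transcript record, the half-open coordinate convention, and the min_overlap threshold are assumptions made for the example and do not describe the AISTAR implementation.

```python
# Illustrative detection of candidate sense-antisense pairs.
# Assumptions: coordinates are half-open [start, end) on the forward
# strand, and a pair is any two transcripts on the same chromosome but
# opposite strands that share at least `min_overlap` bases.
from collections import namedtuple

Transcript = namedtuple("Transcript", "name chrom start end strand")


def antisense_overlap(a, b):
    """Number of overlapping bases if a and b lie on opposite strands."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def find_pairs(transcripts, min_overlap=1):
    """Return (name_a, name_b, overlap) for every qualifying pair."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            overlap = antisense_overlap(a, b)
            if overlap >= min_overlap:
                pairs.append((a.name, b.name, overlap))
    return pairs
```

Raising min_overlap is one simple way to discard pairs with small overlap, which the text above notes are unlikely to affect each other. The quadratic loop is fine for a sketch; a genome-scale screen would sort the transcripts or use an interval index.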

The trans-acting RNA project

Most known ncRNAs act by complementary binding in trans to other RNAs (e.g. snoRNA and miRNA/siRNA). In some situations both participants in an interaction are known; in others the target (and function!) is unknown. A major task today is to discover the targets of these ncRNAs. Often these genes are classified as ncRNA based on known expression without any potential ORF. In other situations, they are classified based on similarity in primary and/or secondary structure to other known ncRNA classes, but here too the target is often unknown. The major goal of this project is to discover novel RNA-RNA interactions using comparative genomics.
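To make the notion of complementary binding in trans concrete, the sketch below finds the longest ungapped antiparallel stretch of base pairs between an ncRNA and a candidate target, allowing G-U wobble pairs. It covers only the pairing scan, not the comparative genomics step, and the function name and scoring are hypothetical choices made for illustration.

```python
# Longest ungapped antiparallel duplex between two RNAs (illustrative).
# Assumptions: both sequences are written 5'->3' in the RNA alphabet;
# Watson-Crick and G-U wobble pairs are allowed; no bulges or loops.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
         ("G", "U"), ("U", "G")}


def longest_duplex(ncrna, target):
    """Return (length, target_start) of the longest paired stretch.

    target_start is the 0-based position in `target` where the duplex
    begins; it is -1 when no base pair is possible at all.
    """
    query = ncrna[::-1]  # reverse so that pairing runs antiparallel
    best_len, best_start = 0, -1
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if (query[i - 1], target[j - 1]) in PAIRS:
                # Extend the run that ended at the previous diagonal cell.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_start = curr[j], j - curr[j]
        prev = curr
    return best_len, best_start
```

For example, longest_duplex("GGCAUCG", "AAACGAUGCCAAA") reports a 7-base duplex starting at target position 3. Candidate duplexes found this way could then be checked for conservation across species.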

Protein functional analysis using amino acid physicochemical properties

conserved protein motifs

protein subcellular localization

classifying unknown proteins using amino acid property-based approaches

assessing the significance of deleterious missense mutations in a protein

predicting the potential allergenicity of unknown proteins


Old projects

Metaheuristics

We have also tried to apply metaheuristic algorithms to bioinformatics problems. Metaheuristics are a class of approximate methods designed to solve hard combinatorial optimization problems. These problems traditionally belong to a discipline called operations research. We found that many bioinformatics problems could also benefit from appropriate metaheuristic algorithms.

We have also investigated the possibility of using signal processing techniques to study DNA and protein sequences. The idea is to convert biological sequences into numerical sequences and to carry out the analysis in the numerical space. The idea is not completely new, but it has certainly not undergone extensive study.
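One way to picture the conversion from a biological sequence to a numerical one is the sketch below, which maps a DNA sequence to four binary indicator sequences (one per base) and evaluates the discrete Fourier transform power at a chosen frequency. The indicator encoding is one common choice among several and is an assumption here, as are the function names; the text above does not say which numerical mapping was actually used.

```python
# DNA -> numerical sequences -> spectral power (illustrative sketch).
# Assumption: binary indicator encoding, i.e. for each base b the signal
# is 1 where the sequence has b and 0 elsewhere.
import cmath

BASES = "ACGT"


def indicator_sequences(dna):
    """Map a DNA string to four 0/1 signals keyed by base."""
    dna = dna.upper()
    return {b: [1 if ch == b else 0 for ch in dna] for b in BASES}


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real signal."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def total_power(dna, k):
    """Sum of the DFT power at frequency index k over the four signals."""
    return sum(dft_power(sig, k)
               for sig in indicator_sequences(dna).values())
```

With a sequence of length N, a strong value of total_power at k = N/3 indicates a period-3 pattern. For the toy input "ATG" repeated ten times, the power at k = 10 is 300 (up to rounding) while the power at k = 1 is essentially zero.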

Multiple sequence alignment

Multiple sequence alignment is an important tool for computational biology. We try to use a metaheuristic called tabu search for this problem. Assuming an appropriate objective function can be defined, multiple sequence alignment becomes a combinatorial optimization problem. Tabu search is a guided global optimization strategy which has many engineering applications. The focus is on iterative approaches, since they separate the computation of objective functions from the procedures used to reach an optimized solution.

Signal processing is another discipline that appears to be promising in bioinformatics. A few applications have been found in matching DNA sequences. There have been few attempts, however, at the analysis of protein sequences. Since proteins are composed of twenty amino acids, the conversion from a protein's primary sequence to a numerical representation becomes more complicated. This will be our first research objective. Subsequent attempts will be to extract hidden features, such as conserved motifs, from the numerical representation of proteins.

To link pure algorithm development to real-life applications, we aim to establish a bacterial genome annotation platform that will automate functional annotation for bacterial genomes. In-house developed algorithms will also be applied to search for protein motifs responsible for human allergenicity.
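The remark above that multiple sequence alignment becomes a combinatorial optimization problem once an objective function is defined can be illustrated with a sum-of-pairs score, a widely used objective for alignments. The sketch below is a minimal version with made-up scoring parameters (match, mismatch, gap); it is not necessarily the objective function used in this project, and a tabu search or any other iterative optimizer would simply call such a function on each candidate alignment.

```python
# Sum-of-pairs objective for a multiple alignment (illustrative sketch).
# Assumptions: rows are equal-length strings with '-' as the gap symbol;
# the match/mismatch/gap values are arbitrary example parameters.
from itertools import combinations


def column_score(column, match=1, mismatch=-1, gap=-2):
    """Score one alignment column over all pairs of rows."""
    score = 0
    for x, y in combinations(column, 2):
        if x == "-" and y == "-":
            continue                      # gap-gap pairs are ignored
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Total sum-of-pairs score of an alignment given as a list of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(col, match, mismatch, gap)
               for col in zip(*alignment))
```

Because the score depends only on the finished alignment, the search procedure can be changed without touching the objective, which is the separation that iterative approaches rely on.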

A distributed and parallel implementation of the popular multiple alignment software ClustalW has been completed. It is written in C/MPI and is open-source software. It runs on massively parallel multiprocessor systems as well as simple PC clusters. Our implementation has been used by various research groups, including groups at the European Bioinformatics Institute, UK, and the Institute for Infocomm Research, Singapore.

We have designed and implemented an adaptive iterative algorithm for protein multiple sequence alignment. The algorithm has been benchmarked using a manually curated database called BAliBASE. In terms of alignment quality, our algorithm outperforms other popular multiple alignment tools in three of the five benchmark sequence sets. In the other two sets, it produces comparable alignments.

Last updated on July 14, 2005.