Computational biology uses
mathematical and computational approaches to address theoretical and experimental questions in biology.
Since the 1980s, numerous algorithms have been proposed to analyze DNA and protein sequences. For example,
dynamic programming has been widely used in sequence comparison and alignment, as have hidden Markov models
in protein profiling. More recently, machine learning techniques have found success
in fields such as microarray analysis, literature searching and structure prediction.
non-coding RNA (ncRNA) projects
It has long been known that some genes do not encode proteins
(rRNA, tRNA, etc.). However, during the last decade an increasing number of
genes, tentatively grouped as snoRNA, miRNA, siRNA, antisense RNA, long
mRNA-like RNAs, tncRNA, etc., have revealed that ncRNAs play a major role in gene
regulation and other cellular processes. A consequence of this is that the
central dogma, in which genetic information flows from gene to RNA to protein, is
outdated or at least needs substantial adjustment.
Even though the function of many ncRNAs has been found, the function of the
majority of them is still unknown. Finding the targets of these ncRNA genes,
along with their spatial and temporal expression profiles, will help us elucidate
the function of these genes.
miRNA and the tumor suppressor p53
MicroRNAs (miRNAs) have been the subject of intense recent interest. They are derived from
endogenous short hairpin precursor structures and usually target other loci with
similar but not identical sequences for translational repression.
Recent reports suggest that at least 30% of human genes may be targeted by
miRNAs (Lewis, B.P., et al., Cell, 120, 15-20, 2005).
This project aims at
finding miRNA targets (usually located within the 3'-UTR of a protein-coding
gene) in p53-related genes.
For example, we could study the miRNA targets among
the transcription factors that are known to regulate the expression of TP53. Alternatively,
we may study the p53 downstream genes that could be the targets of miRNAs.
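As a rough illustration of what a target scan over a 3'-UTR can look like, the sketch below searches for perfect Watson-Crick matches to a miRNA "seed". The seed definition (nucleotides 2-8 of the miRNA), the function names and the toy sequences are assumptions made for this example only; they are not the method used in the project, and real target prediction normally adds conservation and pairing-energy criteria.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def find_seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that perfectly match the miRNA seed.

    Assumption: the seed is miRNA nucleotides 2-8 (1-based), i.e. the
    Python slice [1:8]. DNA input is accepted and converted to RNA.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement_rna(mirna[seed_start:seed_end])
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping sites
    return hits


# Toy sequences, for illustration only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCUU"
print(find_seed_sites(mirna, utr))  # [3, 14]
```

Candidate sites found this way would still need to be filtered, for example by checking whether they are conserved in orthologous UTRs.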
The antisense project: AISTAR
Recently, an increasing number of eukaryotic genes have been found to have
transcription on the antisense strand, overlapping the sense transcript to
various degrees. What is the consequence of such transcription? Many
sense-antisense pairs with small overlap and/or differential transcription will
probably not affect each other. However, in many situations sense-antisense
transcription has a regulatory effect. For instance, a correlation has been shown
between imprinting and antisense transcription.
One of the challenges in this area is to determine which of these
sense-antisense pairs influence each other, which do not, and which are
experimental/computational artifacts (e.g., non-overlapping pairs, spurious
transcripts or genomic contamination). In this project we use an in silico
approach to identify functionally significant sense-antisense pairs.
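A first computational filter for such pairs is simply to check that two transcripts lie on opposite strands of the same chromosome and really overlap. The sketch below shows one way this could be done. The `Transcript` record, the half-open coordinate convention and the `min_overlap` threshold are all assumptions made for this example; they do not describe the AISTAR pipeline.

```python
from collections import namedtuple

# Hypothetical record; coordinates are 0-based and half-open [start, end).
Transcript = namedtuple("Transcript", "name chrom start end strand")


def overlap_length(a, b):
    """Number of genomic positions shared by two transcripts."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def sense_antisense_pairs(transcripts, min_overlap=1):
    """Return (plus_name, minus_name, overlap) for opposite-strand overlaps."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    pairs = []
    for s in plus:
        for a in minus:
            if s.chrom != a.chrom:
                continue  # different chromosomes cannot overlap
            n = overlap_length(s, a)
            if n >= min_overlap:
                pairs.append((s.name, a.name, n))
    return pairs


# Toy annotation, for illustration only.
toy = [
    Transcript("geneA", "chr1", 100, 500, "+"),
    Transcript("asA", "chr1", 400, 900, "-"),
    Transcript("geneB", "chr1", 1000, 1500, "+"),
    Transcript("asB", "chr2", 1000, 1500, "-"),
]
print(sense_antisense_pairs(toy))  # [('geneA', 'asA', 100)]
```

Raising `min_overlap` is a crude way to discard pairs with only marginal overlap. The all-against-all loop is quadratic, so a genome-scale analysis would more likely sort the intervals or use an interval tree.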
The transacting RNA project
Most known ncRNAs act by complementary binding in trans to other RNAs
(e.g. snoRNA and miRNA/siRNA). In some situations both participants of an
interaction are known; in others the target (and function!) is unknown. A major
task today is to discover the targets of these ncRNAs. Often these genes are
classified as ncRNA based on known expression without any potential ORF. In
other situations, they are classified based on similarity to other known
ncRNA classes in primary and/or secondary structure, but here too the
target is often unknown. The major goal of this project is to discover novel
RNA-RNA interactions using comparative genomics.
Protein functional analysis using amino acid physicochemical properties
conserved protein motifs
protein subcellular localization
classifying unknown proteins using amino acid property-based approaches
assessing the significance of deleterious missense mutations in a protein
predicting the potential allergenicity of unknown proteins
Old projects
Metaheuristics
We also try to apply
metaheuristic algorithms
to bioinformatics problems. Metaheuristics are a class of approximate methods designed to
solve hard combinatorial optimization problems. These problems traditionally belong to a discipline called
operations research. We found that many bioinformatics problems could also benefit from appropriate
metaheuristic algorithms.
We have also investigated the possibility of using signal processing techniques to study DNA and protein
sequences. The idea is to convert biological sequences into numerical sequences and
to carry out the analysis in the numerical space. The idea is not completely new but has certainly not
undergone extensive study.
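To make the conversion step concrete, the sketch below uses one common choice, binary indicator sequences (one 0/1 track per nucleotide), and sums their discrete Fourier power spectra. This particular encoding and the function names are assumptions chosen for illustration; the text above does not specify which numerical mapping was used.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four 0/1 lists, one per nucleotide."""
    dna = dna.upper()
    return {base: [1 if c == base else 0 for c in dna] for base in "ACGT"}


def power_spectrum(dna):
    """Sum of the DFT power spectra of the four indicator sequences.

    A direct O(n^2) DFT is used to stay dependency-free; an FFT library
    would be the practical choice for long sequences.
    """
    n = len(dna)
    tracks = indicator_sequences(dna)
    spectrum = []
    for k in range(n):
        total = 0.0
        for x in tracks.values():
            coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


# A toy sequence with a repeating unit of length 3 (n = 30).
spec = power_spectrum("ACG" * 10)
# The periodicity shows up as a peak at k = n / 3 = 10.
print(max(range(1, 15), key=lambda k: spec[k]))  # 10
```

Once the sequence lives in numerical space, standard tools such as filtering or spectral peak detection can be applied to it directly.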
Multiple sequence alignment
Multiple sequence alignment is an important tool for computational biology. We try to use a metaheuristic
called tabu search for this
problem. Assuming an appropriate objective function can be defined, multiple sequence alignment
becomes a combinatorial optimization problem. Tabu search is a guided global optimization strategy that
has many engineering applications. The focus is on iterative approaches since they
separate the computation of the objective function from the procedure used to reach an optimized solution.
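The separation between objective function and search procedure can be sketched as follows. The sum-of-pairs scoring values, the neighborhood move (swapping a gap with an adjacent residue) and the tabu list of whole alignments are all simplifying assumptions made for this toy example; they are not the algorithm developed in the project.

```python
from collections import deque
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Objective: score every pair of symbols in every column."""
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are ignored
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


def neighbors(alignment):
    """Yield alignments that differ by one gap/residue swap in one row."""
    for r, row in enumerate(alignment):
        for i in range(len(row) - 1):
            if (row[i] == "-") != (row[i + 1] == "-"):
                swapped = row[:i] + row[i + 1] + row[i] + row[i + 2:]
                yield alignment[:r] + (swapped,) + alignment[r + 1:]


def tabu_search(start, score, neighbors, iterations=50, tabu_size=20):
    """Generic tabu search (maximization); it knows nothing about alignments."""
    current = best = start
    tabu = deque([start], maxlen=tabu_size)  # recently visited solutions
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        # Take the best non-tabu neighbor, even if it is worse than current.
        current = max(candidates, key=score)
        tabu.append(current)
        if score(current) > score(best):
            best = current
    return best


start = ("ACGT-", "-ACGT")  # alignments are tuples of equal-length rows
best = tabu_search(start, sum_of_pairs, neighbors)
print(sum_of_pairs(start), sum_of_pairs(best))  # -7 4
```

Because `tabu_search` only receives `score` and `neighbors` as arguments, a different objective function can be swapped in without touching the search loop.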
Signal processing is another discipline that appears to be promising in bioinformatics. A few applications
have been found in matching DNA sequences. There have been few attempts, however, at the analysis of
protein sequences. Since proteins are composed of twenty amino acids, the conversion from a protein's
primary sequence to a numerical representation becomes more complicated. This will be our first research
objective. Subsequent attempts will be to extract hidden features, such as conserved motifs, from
the numerical representation of proteins.
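One simple way to turn a protein into a numerical signal is to replace each residue with a physicochemical property value and smooth the result. The sketch below uses the Kyte-Doolittle hydropathy scale as an example property. The choice of scale, the window size and the function names are assumptions for illustration, not the representation adopted in the project.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numeric(protein, scale=KYTE_DOOLITTLE):
    """Replace each amino acid by its value on the given scale.

    Raises KeyError for symbols outside the twenty standard residues.
    """
    return [scale[aa] for aa in protein.upper()]


def sliding_mean(values, window=3):
    """Moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy peptide, for illustration only.
signal = to_numeric("MKTAYIAK")
profile = sliding_mean(signal, window=3)
print(len(signal), len(profile))  # 8 6
```

Passing a different dictionary as `scale` (for example charge or side-chain volume) gives another numerical view of the same sequence, which is one way to handle the twenty-letter alphabet.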
To link pure algorithm development to real-life applications, we aim to establish a bacterial genome
annotation platform that will automate the functional annotation of bacterial genomes. Algorithms developed in-house
will also be applied to search for protein motifs responsible for human allergenicity.
A distributed and parallel implementation of the popular multiple alignment software, ClustalW,
has been completed. It is written in C/MPI and is open source software. It runs on massively parallel processors
as well as simple PC clusters. Our implementation has been used by various research groups, including
groups at the European Bioinformatics Institute, UK and the Institute for Infocomm Research, Singapore.
We have designed and implemented an adaptive iterative algorithm for protein multiple
sequence alignment. The algorithm has been benchmarked using a manually curated database called
BAliBASE. In terms of alignment quality, our algorithm outperforms
other popular multiple alignment tools in three of the five
benchmark sequence sets. In the other two sets, it produces comparable alignments.
Last updated on July 14, 2005.