The enormous amount
of information generated by genome projects, appropriately organized, is causing
a paradigm shift in theoretical and experimental research. Increasingly,
computational methods no longer need to start from first principles, an
approach that is often extremely difficult. Reliable statistics derived from large data
sets can be used to obtain predictive rules, and experiments, as in the physical
sciences, begin at the computer workstation, move to the lab bench, and then
back again to the computer for analysis.
My research program
currently consists of the following components:
Gene Regulation (Gene Hub at ZLAB)
Cis-elements are the atomic sites in DNA (or RNA) that control gene
expression. Since these sites are usually highly variable, they are
difficult to find in a genomic sequence more than a few thousand base
pairs long. Our general approach is to assume that they occur in
clusters, and search for clusters of cis-elements. We have developed a
number of algorithms in the area of gene regulation, falling into several
key sub-areas: (1) PromoSer for proximal promoter retrieval; (2) Cister,
Comet and Cluster-buster for cis-element cluster search; (3) HugeIndex
for microarray analysis and detection of co-regulated genes; (4) Glam, a
Gibbs sampling algorithm for ab initio motif detection; and (5) Clover
for detection of overrepresentation of previously known motifs.
In addition, we have developed a versatile sequence and annotation
visualization program called SeqVISTA.
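The cluster assumption can be illustrated with a minimal sketch. This is not the method used by Cister, Comet or Cluster-buster; it is a simple sliding-window count, and the function names, the regular-expression motifs, the window size and the hit threshold are all illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple


def find_motif_hits(sequence: str, motifs: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return sorted (position, motif_name) pairs for every motif match.

    Each motif is a regular expression standing in for a variable site.
    """
    hits = []
    for name, pattern in motifs.items():
        # A lookahead reports overlapping matches as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((match.start(), name))
    return sorted(hits)


def find_clusters(hits: List[Tuple[int, str]], window: int,
                  min_hits: int) -> List[Tuple[int, int, int]]:
    """Return (first_hit, last_hit, count) for each window that starts at a
    hit and contains at least `min_hits` hits within `window` bases."""
    positions = [pos for pos, _ in hits]
    clusters = []
    j = 0
    for i, start in enumerate(positions):
        j = max(j, i)
        # Advance j to the first hit that falls outside the window.
        while j < len(positions) and positions[j] < start + window:
            j += 1
        count = j - i
        if count >= min_hits:
            clusters.append((start, positions[j - 1], count))
    return clusters


# Toy example: two sites close together, one isolated site far away.
motifs = {"TATA-like": "TATA[AT]A", "GC-box-like": "GGGCGG"}
seq = "TATAAA" + "CC" + "GGGCGG" + "C" * 40 + "TATATA"
hits = find_motif_hits(seq, motifs)
print(find_clusters(hits, window=30, min_hits=2))  # [(0, 8, 2)]
```

The isolated match at position 54 is ignored because no second site falls within the window, which is the point of searching for clusters rather than single sites.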
Protein-Protein Interactions (Docking)
The ability to computationally predict whether and how two proteins bind
to each other (protein docking) has broad applications in functional genomics.
We develop accurate binding free energy target functions and fast search
algorithms. We have put substantial effort into developing a fully
automated rigid-body docking algorithm ZDOCK, which focuses on the
initial stage of docking unbound protein structures. Due to the diverse
nature of protein interactions, the goal of ZDOCK is to retain at least
one hit in the first two thousand predictions for as many test cases as
possible. To facilitate our own as well as others’ docking efforts, we
have developed a large benchmark of non-redundant test cases. All of our
ZDOCK developments have been extensively tested on this benchmark. ZDOCK
performed competitively at the CAPRI challenge, a community-wide blind
test of docking algorithms. Subsequent to CAPRI, we developed a
refinement method, RDOCK, which applies sophisticated scoring functions
and time-consuming search algorithms to the predictions made by ZDOCK in
order to rank hits at the top. We also investigate various types of
protein-protein interactions and develop methods to distinguish obligate
oligomers from transient recognition complexes.
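The ZDOCK goal stated above (at least one hit in the first two thousand predictions for as many test cases as possible) can be expressed as a success rate over a benchmark. The sketch below assumes that each test case is a list of RMSD values in rank order and that a "hit" is a prediction whose RMSD is at or below a cutoff. The 2.5 Å default, the function names and the data layout are assumptions for illustration, not a description of the ZDOCK software.

```python
from typing import Dict, List, Optional


def first_hit_rank(rmsds: List[float], hit_cutoff: float) -> Optional[int]:
    """Return the 1-based rank of the first prediction with RMSD <= cutoff,
    or None if no prediction qualifies."""
    for rank, rmsd in enumerate(rmsds, start=1):
        if rmsd <= hit_cutoff:
            return rank
    return None


def success_rate(cases: Dict[str, List[float]], top_n: int = 2000,
                 hit_cutoff: float = 2.5) -> float:
    """Fraction of test cases with at least one hit among the top_n predictions.

    `cases` maps a test-case name to its predictions' RMSDs in rank order.
    The cutoff is an assumed definition of a hit.
    """
    if not cases:
        raise ValueError("no test cases supplied")
    successes = 0
    for rmsds in cases.values():
        # Only the first top_n ranked predictions are retained.
        if first_hit_rank(rmsds[:top_n], hit_cutoff) is not None:
            successes += 1
    return successes / len(cases)


# Toy benchmark with three cases and three ranked predictions each.
benchmark = {
    "caseA": [10.0, 2.0, 8.0],   # hit at rank 2
    "caseB": [9.0, 7.0, 6.0],    # no hit
    "caseC": [12.0, 11.0, 1.5],  # hit at rank 3
}
print(success_rate(benchmark, top_n=2))  # one of three cases succeeds
print(success_rate(benchmark, top_n=3))  # two of three cases succeed
```

Reporting `first_hit_rank` per case alongside the overall rate shows how far down the list a refinement step would need to look.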
Protein structure building blocks
Protein structures can be dissected into recurring functional units called
domains. When all currently known domain structures are compared, approximately
500 are found to have distinct folds. It has been estimated that roughly 500
more folds are yet to be discovered. We are carrying out an all-against-all
comparison of these domain structures using a structural alignment algorithm
(K2) recently developed by our group. Statistically significant matches
should correspond to the building blocks of protein structures.
Identification of immunogenic peptides
The ability to determine which protein segments will bind particular MHC
molecules with high affinity is of importance for the development of
peptide vaccines. We have recently developed a novel statistical method
(SMM) to predict peptide sequences that bind strongly to the MHC allele
HLA-A2. We have also extended the method to take into account
interactions between neighboring positions, and applied it to other
major human alleles such as A1, A3, A11 and A24. In the meantime,
we collaborate with experimental laboratories to identify immunogenic peptides
in tumor-specific antigens.
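A common way to frame this kind of prediction is a position-specific scoring matrix: each residue contributes a score that depends on its position in the peptide, and optional pair terms capture interactions between neighboring positions. The sketch below illustrates only that general idea. It is not the SMM implementation, and the matrix values, the pair term, the three-residue peptide length and the convention that a higher score means stronger predicted binding are toy assumptions.

```python
from typing import Dict, List, Optional, Tuple

PairTerms = Dict[Tuple[int, str, str], float]


def score_peptide(peptide: str, matrix: List[Dict[str, float]],
                  pair_terms: Optional[PairTerms] = None) -> float:
    """Sum per-position scores, plus optional terms for adjacent residue pairs.

    matrix[i] maps an amino acid to its score at position i (missing = 0.0).
    pair_terms maps (i, residue_i, residue_i_plus_1) to an extra score.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the matrix length")
    score = sum(column.get(aa, 0.0) for aa, column in zip(peptide, matrix))
    if pair_terms:
        for i in range(len(peptide) - 1):
            score += pair_terms.get((i, peptide[i], peptide[i + 1]), 0.0)
    return score


def scan_protein(protein: str, matrix: List[Dict[str, float]],
                 pair_terms: Optional[PairTerms] = None
                 ) -> List[Tuple[float, int, str]]:
    """Score every window of the protein and return (score, start, peptide)
    tuples sorted from highest to lowest score."""
    k = len(matrix)
    scored = [
        (score_peptide(protein[i:i + k], matrix, pair_terms), i, protein[i:i + k])
        for i in range(len(protein) - k + 1)
    ]
    return sorted(scored, reverse=True)


# Toy 3-position matrix; the values are made up for illustration.
toy_matrix = [{"L": 1.0}, {"A": 0.5}, {"V": 2.0}]
toy_pairs = {(0, "L", "A"): 0.25}
print(score_peptide("LAV", toy_matrix))             # 3.5
print(score_peptide("LAV", toy_matrix, toy_pairs))  # 3.75
print(scan_protein("GLAVG", toy_matrix)[0])         # (3.5, 1, 'LAV')
```

Scanning a whole antigen sequence in this way gives a ranked list of candidate peptides that could then be passed to experimental testing.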
Engineering of high-affinity T cell receptors
Cytotoxic T cell receptors (TCRs) are major players in cellular immunity.
They specifically recognize foreign peptides presented by a class I MHC
molecule and stimulate a series of events culminating in the destruction
of the infected cell. We develop computationally based methodologies
for designing high-affinity TCRs or TCR-like molecules. We are
developing algorithms that can consider all possible side-chain
substitutions, which will allow us to evaluate the most important
replacements at the key binding-site residues.
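To give a sense of the search space involved in considering all side-chain substitutions, the sketch below enumerates every variant with up to a chosen number of simultaneous substitutions at a set of positions. It is a plain enumeration, not the design algorithm under development, and the function name, the 0-based position convention and the toy sequence are assumptions.

```python
from itertools import combinations, product
from typing import Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def enumerate_variants(sequence: str, positions: List[int],
                       max_mutations: int) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (variant_sequence, mutation_labels) for every combination of
    1..max_mutations substitutions at the given 0-based positions.

    Labels use the usual wild-type / 1-based position / new residue form,
    for example "G1A".
    """
    for n in range(1, max_mutations + 1):
        for sites in combinations(positions, n):
            # At each chosen site, try every residue except the wild type.
            choices = [[aa for aa in AMINO_ACIDS if aa != sequence[p]]
                       for p in sites]
            for replacement in product(*choices):
                residues = list(sequence)
                for p, aa in zip(sites, replacement):
                    residues[p] = aa
                labels = tuple(f"{sequence[p]}{p + 1}{aa}"
                               for p, aa in zip(sites, replacement))
                yield "".join(residues), labels


# Two positions: 2 * 19 single mutants plus 19 * 19 double mutants.
variants = list(enumerate_variants("GAS", positions=[0, 2], max_mutations=2))
print(len(variants))  # 399
print(variants[0])    # ('AAS', ('G1A',))
```

The count grows as 19 to the power of the number of mutated sites, which is why exhaustive consideration needs efficient algorithms rather than brute-force evaluation.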
Engineering of soluble protein surface
Some proteins function only as multimers. The interfaces between the
monomers of multimeric proteins are usually highly hydrophobic, which is the
main reason that these proteins cannot exist as stable monomers. However,
many current technologies for protein engineering, such as phage display
and gene shuffling, are only applicable to single-chain proteins. Therefore,
methods for designing stable monomers from multimeric proteins will be of
broad interest. We are developing statistical approaches to search
computationally for mutations at the monomer interfaces of multimeric proteins
that make them soluble.
For a complete list of publications, please visit the ZLAB publication page.