Research


The enormous amount of information generated by genome projects, appropriately organized, is causing a paradigm shift in theoretical and experimental research. Increasingly, computational methods no longer need to start from first principles, which is often extremely difficult. Reliable statistics derived from large data sets can be used to obtain predictive rules, and experiments, as in the physical sciences, begin at the computer workstation, move to the lab bench, and then return to the computer for analysis.

My research program currently consists of the following components:
 

Gene Regulation (Gene Hub at ZLAB)

Cis-elements are the basic sequence sites in DNA (or RNA) that control gene expression. Because these sites are usually highly variable, they are difficult to find in genomic sequences longer than a few thousand base pairs. Our general approach is to assume that they occur in clusters and to search for clusters of cis-elements. We have developed a number of algorithms for gene regulation, spanning several key sub-areas: (1) PromoSer for proximal promoter retrieval; (2) Cister, Comet and Cluster-buster for cis-element cluster search; (3) HugeIndex for microarray analysis and detection of co-regulated genes; (4) Glam, a Gibbs sampling algorithm for ab initio motif detection; and (5) Clover for detecting overrepresentation of previously known motifs. In addition, we have developed a versatile sequence and annotation visualization program called SeqVISTA.
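To make the cluster idea concrete, the sketch below scores every position of a sequence with a log-odds position weight matrix and then reports windows that contain several strong matches. It is a deliberately simple stand-in, not how Cister, Comet or Cluster-buster work (they rely on more sophisticated statistical models). The motif counts, the score threshold, the window size and the function names (`log_odds_matrix`, `motif_hits`, `dense_windows`) are all hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-position motif (base counts per position); illustrative numbers only.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 1, "T": 8},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({base: math.log2((n + pseudocount) / total / background)
                       for base, n in column.items()})
    return matrix


def motif_hits(seq, matrix, threshold):
    """Start positions whose motif score reaches the threshold (forward strand only)."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        # Unknown characters (e.g. N) score -inf, so they never produce a hit.
        score = sum(matrix[j].get(seq[i + j], float("-inf")) for j in range(width))
        if score >= threshold:
            hits.append(i)
    return hits


def dense_windows(hits, width, seq_len, window, min_hits):
    """Start positions of windows that fully contain at least min_hits motif occurrences."""
    starts = []
    for s in range(seq_len - window + 1):
        inside = sum(1 for h in hits if s <= h and h + width <= s + window)
        if inside >= min_hits:
            starts.append(s)
    return starts


matrix = log_odds_matrix(COUNTS)
seq = "ACGT" + "TT" + "ACGT" + "T" * 20 + "ACGT"
hits = motif_hits(seq, matrix, threshold=5.0)
clusters = dense_windows(hits, len(matrix), len(seq), window=12, min_hits=2)
print(hits, clusters)  # [0, 6, 30] [0]
```

With a 12-bp window and at least two hits required, only the window starting at position 0 (covering the two closely spaced matches) qualifies; the isolated third match is ignored, which is the intuition behind searching for clusters rather than single sites.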

Protein-Protein Interactions (Docking)

The ability to computationally predict whether and how two proteins bind to each other (protein docking) has broad applications in functional genomics. We develop accurate binding free energy target functions and fast search algorithms. We have put substantial effort into developing ZDOCK, a fully automated rigid-body docking algorithm that focuses on the initial stage of docking unbound protein structures. Because protein interactions are diverse, the goal of ZDOCK is to retain at least one hit among its top two thousand predictions for as many test cases as possible. To facilitate our own as well as others’ docking efforts, we have developed a large benchmark of non-redundant test cases, on which all of our ZDOCK developments have been extensively tested. ZDOCK performed competitively at the CAPRI challenge, a community-wide blind test of docking algorithms. Following CAPRI, we developed a refinement method, RDOCK, which applies more sophisticated scoring functions and computationally expensive search algorithms to ZDOCK predictions in order to rank the hits at the top. We also investigate various types of protein-protein interactions and develop methods to distinguish obligate oligomers from transient recognition complexes.
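Fast rigid-body docking searches commonly evaluate all translations at once with a grid correlation computed by fast Fourier transforms (FFT), the approach introduced by Katchalski-Katzir and colleagues. The sketch below shows that idea on a toy 3D grid with a simple shape score: empty cells touching the receptor reward contact, and cells inside the receptor are penalized. It is not ZDOCK's target function; the grid size, the penalty `RHO`, the toy shapes and the single fixed ligand orientation are all assumptions, and a real search would repeat the correlation for each sampled ligand rotation.

```python
import numpy as np

N = 16        # grid points per side (hypothetical resolution)
RHO = -15.0   # penalty for ligand cells overlapping the receptor core (assumed value)


def dilate6(mask):
    """Grow a boolean grid by one cell along each axis (6-neighbour dilation, periodic)."""
    out = mask.copy()
    for axis in range(3):
        for shift in (1, -1):
            out |= np.roll(mask, shift, axis=axis)
    return out


# Toy receptor: an 8x8x8 block with a 4x4x4 pocket opening on its +z face.
receptor = np.zeros((N, N, N), dtype=bool)
receptor[2:10, 2:10, 2:10] = True
receptor[4:8, 4:8, 6:10] = False

# Toy ligand: a 4x4x4 cube placed at the grid origin.
ligand = np.zeros((N, N, N), dtype=bool)
ligand[0:4, 0:4, 0:4] = True

# Receptor grid: +1 on the shell of empty cells touching the receptor, RHO inside it.
shell = dilate6(receptor) & ~receptor
rec_grid = np.where(receptor, RHO, 0.0) + np.where(shell, 1.0, 0.0)
lig_grid = ligand.astype(float)

# scores[t] = sum_y lig_grid[y] * rec_grid[y + t] for every translation t at once,
# computed as a circular cross-correlation with FFTs.
scores = np.real(np.fft.ifftn(np.conj(np.fft.fftn(lig_grid)) * np.fft.fftn(rec_grid)))
best = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
print(best, round(float(scores[best])))  # (4, 4, 6) 52
```

The best translation places the cube exactly inside the pocket, where it touches 52 shell cells without overlapping the receptor. The FFT makes each orientation cost O(N³ log N) instead of the O(N⁶) of a direct scan over translations.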
 
Protein structure building blocks 

Protein structures can be dissected into recurring functional units called domains.  When all currently known domain structures are compared, approximately 500 are found to have distinct folds. It has been estimated that roughly 500 more folds are yet to be discovered.  We are carrying out an all-against-all comparison of these domain structures using a structural alignment algorithm (K2) recently developed by our group.  Statistically significant matches should correspond to the building blocks of protein structures. 
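Any all-against-all structural comparison ultimately scores how well two sets of matched residues superimpose. As a minimal sketch, assuming the residue correspondence is already known (finding it is the hard part that a structural alignment algorithm such as K2 addresses), the Kabsch method below finds the optimal rotation and reports the root-mean-square deviation (RMSD) of the coordinates. This is a generic building block, not the K2 algorithm itself, and the function name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition.

    Rows of P and Q are assumed to be in correspondence (e.g. aligned C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)              # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                         # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])          # forbid reflections: proper rotations only
    R = Vt.T @ D @ U.T                  # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])   # rotated and translated copy
mirror = P * np.array([1.0, 1.0, -1.0])      # reflected copy
print(kabsch_rmsd(P, Q), kabsch_rmsd(P, mirror))
```

A rotated and translated copy superimposes with essentially zero RMSD, while a mirror image does not, because the determinant correction restricts the fit to proper rotations.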
 
Identification of immunogenic peptides 

The ability to determine which protein segments will bind particular MHC molecules with high affinity is important for the development of peptide vaccines. We have recently developed a novel statistical method (SMM) to predict peptide sequences that bind strongly to the MHC allele HLA-A2. We have also extended the method to account for interactions between neighboring positions, and have applied it to other major human alleles such as A1, A3, A11 and A24. In parallel, we collaborate with experimental laboratories to identify immunogenic peptides in tumor-specific antigens.
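A scoring-matrix predictor of this kind assigns each amino acid at each peptide position an additive contribution, and a regularization term keeps the matrix stable when training data are limited. The sketch below fits such a matrix by ridge regression on one-hot encoded nonamers. It is a simplified illustration in the spirit of a stabilized scoring matrix, not the published SMM procedure, and it ignores the neighboring-position interactions mentioned above. The training data are synthetic, and `fit_matrix`, `predict` and `lam` are hypothetical names.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
LENGTH = 9  # nonamer peptides, the typical length for class I MHC binders


def one_hot(peptide):
    """Flat indicator vector of length LENGTH * 20 (one block of 20 per position)."""
    x = np.zeros(LENGTH * 20)
    for pos, aa in enumerate(peptide):
        x[pos * 20 + INDEX[aa]] = 1.0
    return x


def fit_matrix(peptides, targets, lam=1.0):
    """Ridge fit of a position-specific matrix plus an unpenalized offset."""
    X = np.array([one_hot(p) for p in peptides])
    X = np.hstack([X, np.ones((len(peptides), 1))])  # last column = offset
    penalty = lam * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0                            # do not shrink the offset
    w = np.linalg.solve(X.T @ X + penalty, X.T @ np.asarray(targets, dtype=float))
    return w[:-1].reshape(LENGTH, 20), w[-1]


def predict(matrix, offset, peptide):
    """Score = offset + sum of per-position contributions."""
    return offset + sum(matrix[pos, INDEX[aa]] for pos, aa in enumerate(peptide))


rng = np.random.default_rng(1)
true_matrix = rng.normal(size=(LENGTH, 20))  # synthetic "ground truth", not real binding data


def random_peptide():
    return "".join(rng.choice(list(AMINO_ACIDS), size=LENGTH))


train = [random_peptide() for _ in range(2000)]
y = [predict(true_matrix, 0.0, p) + rng.normal(scale=0.1) for p in train]
matrix, offset = fit_matrix(train, y, lam=1.0)

test = [random_peptide() for _ in range(200)]
truth = [predict(true_matrix, 0.0, p) for p in test]
pred = [predict(matrix, offset, p) for p in test]
print(np.corrcoef(truth, pred)[0, 1])  # close to 1 on this synthetic data
```

The offset column is left unpenalized so that shrinkage pulls the per-position contributions toward zero without biasing the average predicted score; in practice the strength `lam` would be chosen on held-out data.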
 
Engineering of high-affinity T cell receptors 

Cytotoxic T cell receptors (TCRs) are major players in cellular immunity. They specifically recognize foreign peptides presented by class I MHC molecules and trigger a series of events culminating in the destruction of the infected cell. We develop computational methods for designing high-affinity TCRs or TCR-like molecules. We are developing algorithms that can consider all possible side-chain substitutions, allowing us to identify the most promising replacements at key binding-site residues.
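Considering every substitution at several binding-site residues quickly becomes combinatorial: with 20 amino acids at n positions there are 20^n sequences, and far more once side-chain rotamers are included. One standard way to make such searches tractable without losing the optimum is dead-end elimination (DEE). The sketch below applies Goldstein's criterion to random placeholder energies and checks that pruning preserves the brute-force global minimum. We do not claim this is the algorithm under development here: the energies, the three positions and all function names are hypothetical, and each position chooses among amino acids only (no rotamers).

```python
import itertools

import numpy as np


def total_energy(assign, E1, E2):
    """Self energies plus each pairwise energy counted once."""
    M = len(assign)
    e = sum(E1[i, assign[i]] for i in range(M))
    e += sum(E2[i, j, assign[i], assign[j]] for i in range(M) for j in range(i + 1, M))
    return float(e)


def goldstein_dee(E1, E2):
    """Prune choices that Goldstein's criterion proves cannot be in the global minimum."""
    M, K = E1.shape
    alive = np.ones((M, K), dtype=bool)
    changed = True
    while changed:
        changed = False
        for i in range(M):
            for r in np.flatnonzero(alive[i]):
                for t in np.flatnonzero(alive[i]):
                    if t == r:
                        continue
                    # Lower bound on E(with r) - E(with t) over all surviving choices elsewhere.
                    gap = E1[i, r] - E1[i, t]
                    for j in range(M):
                        if j != i:
                            gap += np.min(E2[i, j, r, alive[j]] - E2[i, j, t, alive[j]])
                    if gap > 0:  # swapping r for t always lowers the energy: dead end
                        alive[i, r] = False
                        changed = True
                        break
    return alive


def exhaustive_min(E1, E2, alive=None):
    """Brute-force global minimum over all (optionally only surviving) combinations."""
    M, K = E1.shape
    choices = [np.flatnonzero(alive[i]) if alive is not None else range(K) for i in range(M)]
    best = min(itertools.product(*choices), key=lambda a: total_energy(a, E1, E2))
    return tuple(int(x) for x in best), total_energy(best, E1, E2)


rng = np.random.default_rng(7)
M, K = 3, 20  # 3 hypothetical key positions, 20 amino acids each
E1 = rng.normal(size=(M, K))
E2 = rng.normal(scale=0.5, size=(M, M, K, K))
E2 = (E2 + E2.transpose(1, 0, 3, 2)) / 2  # enforce E2[i, j, a, b] == E2[j, i, b, a]
for i in range(M):
    E2[i, i] = 0.0

alive = goldstein_dee(E1, E2)
print(int(alive.sum()), "of", M * K, "choices survive")
print(exhaustive_min(E1, E2) == exhaustive_min(E1, E2, alive))  # True
```

DEE only removes choices that provably cannot appear in the lowest-energy combination, so the pruned search returns the same optimum while enumerating fewer combinations.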
 
Engineering of soluble protein surface 

Some proteins function only as multimers. The interfaces between the monomers of multimeric proteins are usually highly hydrophobic, which is the main reason these proteins cannot exist as stable monomers. However, many current protein engineering technologies, such as phage display and gene shuffling, are applicable only to single-chain proteins. Methods for designing stable monomeric versions of multimeric proteins would therefore be of broad interest. We are developing statistical approaches to computationally search for mutations at the monomer-monomer interfaces of multimeric proteins that would make the monomers soluble.
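As a crude illustration of searching interface mutations, the sketch below ranks substitutions at given interface positions by how much they lower Kyte-Doolittle hydropathy. This heuristic is a stand-in for the statistical approaches described above, not an implementation of them: the sequence, the interface positions, the set of allowed polar replacements and the function name are all hypothetical, and a real screen would also have to check that a mutation does not destabilize the fold.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982); higher = more hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

POLAR_TARGETS = "DEKNQRST"  # hypothetical set of allowed replacement residues


def rank_interface_mutations(sequence, interface_positions, top=5):
    """Rank point mutations at interface positions by how much they lower hydropathy."""
    candidates = []
    for pos in interface_positions:  # 0-based indices into sequence
        wt = sequence[pos]
        for mut in POLAR_TARGETS:
            delta = KD[mut] - KD[wt]
            if mut != wt and delta < 0:
                # Standard notation: wild type, 1-based position, mutant (e.g. I7R).
                candidates.append((delta, f"{wt}{pos + 1}{mut}"))
    candidates.sort()  # most negative change first; ties broken by name
    return [name for _, name in candidates[:top]]


# Hypothetical sequence and interface positions, for illustration only.
print(rank_interface_mutations("MKLVAGIFE", [2, 6]))
# ['I7R', 'I7K', 'L3R', 'I7D', 'I7E']
```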

For a complete list of publications, please visit the ZLAB publication page.