Boris Steipe

Associate Professor

PhD, Ludwig-Maximilians-Universität, Munich, 1990

Address: Medical Sciences Building, 5271
1 King's College Circle
Toronto, ON M5S 1A8
Lab: Steipe Lab
Office Phone: (416) 946-7741

Boris Steipe was born in Munich, Germany, where he graduated from the medical school of the Ludwig-Maximilians University in 1985. He joined Andreas Plückthun’s lab at the Gene Center of the University for his PhD thesis on the recombinant expression and structure determination of an immunoglobulin fragment. Subsequently, his interests turned to protein engineering, and he joined Robert Huber’s department at the Max-Planck Institute for Biochemistry in Martinsried, Germany in 1990. It was there that his “Canonical Sequence Approximation”, the hypothesis that sequence propensities can be used to predict stability changes in a very general way, was first formulated.

Steipe was appointed Research Fellow at the Gene Center of the University in 1990, where his group worked on the rational stabilisation of immunoglobulin domains, on sequence determinants of protein folding and on the interplay of the protein matrix with the fluorophore in Green Fluorescent Protein. He was awarded his Habilitation in Biochemistry at the Faculty for Chemistry and Pharmacy of the University in 2000, when he was appointed lecturer.

In 2001 Steipe moved to Toronto, where he holds an appointment as associate professor in the Department of Biochemistry and the Department of Molecular Genetics, University of Toronto. His present work focusses on structural bioinformatics with an emphasis on structural motifs.

Research Lab

My work focusses primarily on the discovery and analysis of structural motifs: recurring, local patterns of protein structure that reflect the sequence/structure relationships of protein folding.

Learn more: Steipe Lab

Research Description


The cohesive element of my research projects is the quest to understand complexity in biomolecular systems. Complexity arises from the context-dependent behaviour of system components, and we observe it in many hierarchical layers of structure formation and generation of function, from the genome to the living cell. My work mainly focusses on proteins, since protein folding is the quintessential paradigm of self-organising molecular systems. Based on our concepts for addressing complexity, my lab has developed strategies and algorithms to analyse proteins and engineer them in predictable ways.

My current work is mainly focussed on bioinformatics.

The "Canonical Sequence Approximation"

Theoretical and applied bioinformatics provides core technologies for protein engineering. My lab has contributed two strategies to address the complexity issues that limit rational protein engineering.

The Canonical Sequence Approximation

A first-order approximation is to view amino acids as context-independent elements of protein structure. The hypothesis of a canonical sequence approximation, which we have developed, views mutation and selection of the immunoglobulin sequence repertoire in analogy to the concept of an ensemble in statistical thermodynamics. To the degree that mutations are independent and randomly distributed, the most probable distribution of amino acid residues (states) in a canonical immunoglobulin sequence will be described by Boltzmann’s law, where the concept of “energy” is replaced by the “fitness” of a domain in selection. To a large degree, the contribution to fitness will be a free energy contribution to the thermodynamic stability of the protein. In the simplest application of this hypothesis, the consensus residues of a domain sequence are predicted to be the most stabilizing residues in their respective positions. This is essentially a mean-field approach, in which amino acid residues are approximated to interact with a context that evolution has averaged over a large number of specific sequences.
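The idea can be illustrated with a minimal sketch, which is not the lab's actual implementation. It assumes a gap-free toy alignment, a uniform pseudocount, and an inverted Boltzmann relation in which the relative frequencies of two residues at one position are converted to a pseudo-energy difference, -RT ln(f_mutant / f_wild_type). The function names, the pseudocount and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
RT_KJ_PER_MOL = 2.479  # R*T at roughly 298 K, in kJ/mol


def column_frequencies(alignment, column, pseudocount=1.0):
    """Residue frequencies in one alignment column, smoothed by a pseudocount."""
    counts = Counter(seq[column] for seq in alignment if seq[column] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def consensus_residue(alignment, column):
    """Most frequent residue in the column (ties resolved alphabetically)."""
    freqs = column_frequencies(alignment, column)
    return max(freqs, key=freqs.get)


def pseudo_energy_change(alignment, column, wild_type, mutant):
    """Pseudo-energy difference for wild_type -> mutant at this column.

    Negative values mean the mutant residue is the more frequent one,
    which the hypothesis reads as a predicted gain in stability.
    """
    freqs = column_frequencies(alignment, column)
    return -RT_KJ_PER_MOL * math.log(freqs[mutant] / freqs[wild_type])


# Hypothetical toy alignment: four sequences, three positions.
toy_alignment = ["ASK", "ASR", "GSK", "ASK"]
best = consensus_residue(toy_alignment, 0)
delta = pseudo_energy_change(toy_alignment, 0, wild_type="G", mutant="A")
print(best, round(delta, 2))
```

Each column is treated independently here, which is exactly the context-independence that the first-order approximation assumes; the numerical values are only as meaningful as the alignment and the smoothing behind them.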

Motif engineering

A second-order approximation considers local interactions of amino acid residues only. The concept is the same as above, but this time the aligned sequences come from recurring, similar structural fragments drawn from a database of unrelated protein structures. We have compiled consensus sequences for these structural motifs and we have been able to show that these sequences can be used for protein engineering.

Non-redundant Subsets for Protein Structure Statistics

The statistical analysis of protein structures requires datasets in which structural features can be considered independently distributed, i.e. not related through common ancestry, and that fulfil minimal requirements regarding the experimental quality of the structures they contain. However, non-redundant datasets based on sequence similarity invariably contain distantly related homologues. We have compiled a reference dataset of non-homologous protein domains, assuming that structural dissimilarity at the topology level is incompatible with recognizable common ancestry. The dataset is based on domains at the Topology level of the CATH database, which hierarchically classifies all protein structures. It contains the best-refined representatives of each Topology level; structural dissimilarity has been validated and internally duplicated fragments have been removed. We are currently building a computational pipeline to automatically update the dataset.
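The selection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published pipeline: it uses crystallographic resolution as the only quality criterion, an arbitrary 2.5 Å cutoff, and hypothetical domain identifiers. It also omits the validation of structural dissimilarity and the removal of internally duplicated fragments. CATH codes are read as Class.Architecture.Topology.Homologous superfamily, so the first three fields identify the Topology level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    domain_id: str     # hypothetical identifier
    cath_code: str     # Class.Architecture.Topology.Homologous superfamily
    resolution: float  # in Angstrom; lower is better


def topology_key(cath_code):
    """Truncate a four-field CATH code to its Topology level (C.A.T)."""
    return ".".join(cath_code.split(".")[:3])


def best_per_topology(domains, max_resolution=2.5):
    """Keep one domain per Topology: the one with the best resolution.

    The cutoff and the use of resolution alone are assumptions made
    for this sketch.
    """
    best = {}
    for domain in domains:
        if domain.resolution > max_resolution:
            continue  # fails the minimal quality requirement
        key = topology_key(domain.cath_code)
        if key not in best or domain.resolution < best[key].resolution:
            best[key] = domain
    return best


# Hypothetical input: two domains share the 3.40.50 topology.
domains = [
    Domain("dom_a", "3.40.50.300", 1.8),
    Domain("dom_b", "3.40.50.720", 1.5),
    Domain("dom_c", "1.10.10.10", 2.0),
    Domain("dom_d", "2.60.40.10", 3.1),
]
representatives = best_per_topology(domains)
for key, domain in sorted(representatives.items()):
    print(key, domain.domain_id, domain.resolution)
```

Grouping by Topology rather than by sequence identity is what distinguishes this from a sequence-based non-redundant set: two domains with no detectable sequence similarity still collapse to one representative if they share a topology.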


Protein Structure Motifs


If a “protein folding code” exists, it ought to give rise to detectable sequence propensities that are associated with low energy conformations, i.e. native structure. To the degree that the frequency of structure patterns in folded proteins has a Boltzmann-like behaviour, such conformations should be detectable by their excess occurrence over random expectation. We have mined a database of non-homologous, well-resolved protein structure domains, Nh3D, and have discovered an abundance of such sets of overrepresented, structurally similar patterns. We designate the best representatives of such a set as a motif. Our motif dictionary, schematikon, shows significant and interesting sequence propensities and is predictive regarding the experimentally determined consequences of sequence change on stability.


View all publications on PubMed

schematikon: Detailed Sequence-Structure Relationships from Mining a Non-redundant Protein Structure Database
Boris Steipe and Bhooma Thiruv
Bioinformatics Research And Applications - Springer Lecture Notes in Computer Science Volume 8492, 2014, pp 357-366

Nh3D: a reference dataset of non-homologous protein structures
Bhooma Thiruv, Gerald Quon, S. Adrian Saldanha and Boris Steipe
BMC Struct Biol. 2005 Jul 12;5:12.

Consensus-based engineering of protein stability: from intrabodies to thermostable enzymes
Boris Steipe
Methods Enzymol. 2004;388:176-86.