The power of simple sequence pattern analysis in predicting protein behavior is illustrated by the case of cathepsin K, a papain-like cysteine protease involved in many degenerative diseases, bone development and pycnodysostosis (Toulouse-Lautrec syndrome). Triplets of basic amino acids are typical of heparin-binding proteins that are internalized.
I admit that I am obsessed with inflammation and heparin. My daughters automatically yell out “Give him heparin!” when a patient on ER has a severe migraine attack. They think it is a good joke until the savvy doc reveals the latest approach and actually injects heparin with satisfying results. Heparin and inflammation are intimately linked, and I predict that blood tests that determine the quality and quantity of protein-bound heparin will ultimately be used as measures of chronic inflammation, as well as revealing a variety of diseases.
I have a habit of examining the molecular basis of diseases that I encounter on TV, in newspapers or in books. Wikipedia is my first source, followed by the National Center for Biotechnology Information (NCBI). As soon as I find the genes/proteins involved, I check to see if the structure has been determined by X-ray crystallography or NMR, and then I look at the amino acid sequence. I check for pairs or triplets of basic amino acids. Invariably the pairs are matched with a neighboring basic amino acid, and that is a putative heparin-binding domain. Triplets almost always indicate secreted proteins that are brought back into cells because of their strong affinity for recycled heparan sulfate proteoglycans. Within minutes of hearing about a new disease, I usually know something about the molecules involved and particularly whether or not inflammation is going to be a major factor.
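The check for pairs and triplets can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: basic residues are taken to be K and R in one-letter code, "neighboring" is arbitrarily defined as within six residues on either side (the `window` parameter is a made-up cutoff, not an established rule), and the `toy` sequence is invented for illustration and is not a real protein.

```python
import re

BASIC = "KR"  # lysine and arginine, one-letter code


def basic_runs(sequence, min_len=2):
    """Return (0-based start, run) for each maximal run of K/R of at least min_len."""
    pattern = re.compile(r"[KR]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


def pairs_with_neighbor(sequence, window=6):
    """Return basic pairs that have another K/R within `window` residues.

    The window size is an illustrative assumption, not a measured value.
    """
    seq = sequence.upper()
    hits = []
    for start, run in basic_runs(seq, min_len=2):
        if len(run) != 2:
            continue  # triplets and longer runs are handled separately
        end = start + len(run)
        flank = seq[max(0, start - window):start] + seq[end:end + window]
        if any(aa in BASIC for aa in flank):
            hits.append((start, run))
    return hits


# Hypothetical toy sequence, invented for this example.
toy = "MAGSKRLLTAKDEEGRKKPLNVQ"
print(basic_runs(toy))           # all runs of two or more basic residues
print(pairs_with_neighbor(toy))  # pairs with a nearby third basic residue
```

A hit from `pairs_with_neighbor` is only a candidate heparin-binding domain to follow up in the literature, and a run of three from `basic_runs` is only a hint that the protein may be internalized.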
I was just reading Outlander by Diana Gabaldon and one of her characters has the short stature and disability of Toulouse-Lautrec syndrome. I literally ran to my computer, because I am particularly interested in diseases of cartilage and bone. Since TLS is a genetic disease, I checked the NCBI Online Mendelian Inheritance in Man (OMIM) site and found that the genetic defect is in the cathepsin K gene. Cathepsin K is a protease similar to papain, which is intimately involved in many different facets of development, as well as cartilage and bone production. The cells that degrade cartilage to remodel bone, osteoclasts, use cathepsin K to degrade collagen.
I found a structure for cathepsin K bound to chondroitin sulfate. The structure looked all wrong, based on my prejudices -- the sugars of the polysaccharide should have been bound to the basic amino acids or to surface aromatic amino acids. The accompanying amino acid sequence told the whole story:
There were two triplets of basic amino acids (R, arginine or K, lysine), indicative of internalization and strong heparin binding. I performed a quick literature search for heparin binding and internalization and found a reference that confirmed my hunches (note the title):
Nascimento FD, Rizzi CC, Nantes IL, Stefe I, Turk B, Carmona AK, Nader HB, Juliano L, Tersariol IL. Cathepsin K binds to cell surface heparan sulfate proteoglycans. Arch Biochem Biophys. 2005 Apr 15;436(2):323-32.
The article demonstrated that cathepsin K bound only to the surface of cells that produced heparan sulfate and was internalized only by heparin-producing cells. Moreover, cathepsin K changed shape as it bound to heparin, but not to chondroitin sulfate.
This story underscores the predictive power of simple generalizations derived from the dominating interactions between heparin and proteins. Heparin-binding domains, because of their positive charges, stay on the surface of the protein, don’t tend to fold well into helices or other secondary structures and are readily recognized in amino acid sequences of proteins. Stronger heparin-binding domains involved in internalization or transport into nuclei are even more stereotyped as triplets or quadruplets, respectively, of basic amino acids.
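As a rough illustration of the triplet versus quadruplet rule of thumb, here is a minimal Python sketch. The labels simply restate the generalization above and are not a validated predictor; the function name `classify_basic_runs` and the example sequence are my own inventions for the purpose of the example.

```python
import re


def classify_basic_runs(sequence):
    """Label each maximal run of three or more K/R residues.

    Returns (1-based position, run, label) tuples. The labels encode only
    the rule of thumb from the text: three in a row suggests internalization,
    four or more suggests transport into the nucleus.
    """
    labels = []
    for m in re.finditer(r"[KR]{3,}", sequence.upper()):
        run = m.group()
        if len(run) == 3:
            label = "triplet: possible internalization signal"
        else:
            label = "quadruplet or longer: possible nuclear transport signal"
        labels.append((m.start() + 1, run, label))
    return labels


# Hypothetical sequence, invented for illustration only.
example = "MSTRKRAALPGKKRKVEDL"
for position, run, label in classify_basic_runs(example):
    print(position, run, label)
```

Any hit is a starting point for a literature search, since basic residues have many other roles.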
There are some complicating special cases involving basic amino acids, since these amino acids are also involved in glycosylation, nucleic acid binding, inositol phosphate interactions, phospholipid interactions, protein folding/chaperone binding, and protease action, but the generalizations outlined here provide a starting point for exploring the exciting area of heparin binding.
Knowing just a few typical patterns, you can just look at a protein sequence and amaze people by telling them that they should be able to purify their protein on heparin Sepharose! You can also point and gasp at the fact that in the early 1990s in China the bird flu hemagglutinin picked up a new sequence with a quartet of basic amino acids. When I saw that, I called the CDC and explained the new cell receptor! It still has me scared -- am I the first one to notice the potential for a pandemic much more severe than the 1918 Spanish flu?