In its native state each type of protein molecule has a characteristic three-dimensional shape, referred to as its conformation.

Conformation refers to the spatial arrangement of substituent groups that are free to assume many different positions, without breaking bonds, because of rotation about the single bonds in the molecule. The covalent backbone of a polypeptide chain is formally single-bonded.

Depending on their conformation, proteins can be placed in two major classes, fibrous and globular.

The fibrous proteins consist of polypeptide chains arranged in parallel along a single axis, to yield long fibers or sheets. In globular proteins, on the other hand, the polypeptide chains are tightly folded into compact spherical or globular shapes.

There are different levels of protein structure. Primary structure refers to the covalent backbone of the polypeptide chain and the sequence of its amino acid residues.

Amino acids are joined covalently through peptide bonds. Much evidence supports the conclusion that the peptide bond is the sole covalent linkage between amino acids in the linear backbone structure of proteins.


Very important in the procedure for establishing amino acid sequence are methods for identifying the terminal amino acid residues.

The first useful method for identifying the N-terminal residue of polypeptides was described by Sanger, who found that the free unprotonated α-amino group of peptides reacts with 2,4-dinitrofluorobenzene (DNFB) to form yellow 2,4-dinitrophenyl derivatives. When such a derivative of a peptide, regardless of its length, is subjected to hydrolysis with 6 N HCl, all the peptide bonds are hydrolyzed, but the bond between the 2,4-dinitrophenyl group and the α-amino group of the N-terminal amino acid is relatively stable to acid hydrolysis. Consequently, the hydrolyzate of such a dinitrophenyl peptide contains all the amino acid residues of the peptide chain as free amino acids except the N-terminal one, which appears as the yellow 2,4-dinitrophenyl derivative. This labeled residue can easily be separated from the unsubstituted amino acids and identified by chromatographic comparison with known dinitrophenyl derivatives of the different amino acids.

The most important and most widely used labeling reaction for the N-terminal residue is that designed by P. Edman. In the Edman procedure phenylisothiocyanate reacts quantitatively with the free amino group of a peptide to yield the corresponding phenylthiocarbamoyl peptide. On treatment with anhydrous acid the N-terminal residue is split off as a phenylthiocarbamoyl amino acid, leaving the rest of the peptide chain intact. The phenylthiocarbamoyl amino acid is then cyclized to the corresponding phenylthiohydantoin, which can be separated and identified, usually by gas-liquid chromatography. Alternatively, the N-terminal residue removed as the phenylthiocarbamoyl derivative can be identified simply by determining the amino acid composition of the peptide before and after removal of the N-terminal residue; this is called the subtractive Edman method.

The great advantage of the Edman method is that the rest of the peptide chain after removal of the N-terminal amino acid is left intact for further cycles of this procedure; thus the Edman method can be used in a sequential fashion to identify several or even many consecutive amino acid residues starting from the N-terminal end.
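The logic of the subtractive Edman method, applied cycle after cycle, can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of the chemistry: the peptide is written as a string of one-letter amino acid codes, the function name subtractive_edman is made up for this example, and the sample peptide is hypothetical.

```python
from collections import Counter


def subtractive_edman(peptide, cycles):
    """Identify N-terminal residues by comparing the amino acid
    composition before and after each simulated Edman cycle."""
    identified = []
    remaining = peptide
    for _ in range(min(cycles, len(peptide))):
        before = Counter(remaining)      # composition before the cycle
        shortened = remaining[1:]        # one cycle removes the N-terminal residue
        after = Counter(shortened)       # composition after the cycle
        lost = before - after            # the single residue that disappeared
        identified.append(next(iter(lost)))
        remaining = shortened            # the rest of the chain is left intact
    return "".join(identified), remaining


# Hypothetical hexapeptide, three cycles
print(subtractive_edman("GASKLV", 3))    # ('GAS', 'KLV')
```

Because each cycle leaves the shortened chain intact, the same comparison can simply be repeated, which is the property that makes the method sequential.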

Complete Hydrolysis of Polypeptide Chains

Peptide bonds are readily hydrolyzed by heating with either acid or base. Heating polypeptides with excess 6 N hydrochloric acid at 100 to 120°C for 10 to 24 h is the usual procedure for complete hydrolysis. Polypeptides can also be hydrolyzed by boiling with strong sodium hydroxide solutions. The amino acid composition of hydrolyzates of polypeptides and proteins is determined by chromatography.

Partial Hydrolysis of Polypeptide Chains

Once the N-terminal and C-terminal amino acid residues of a polypeptide chain have been identified, the next step in the grand strategy for determining the sequence of amino acids is to fragment the chain to yield a set of short peptides which can be separated and identified. This is accomplished by the partial or selective hydrolysis of the polypeptide chain. The method of choice for partial hydrolysis is to use proteases, enzymes that hydrolyze peptide bonds. Several highly purified proteases have been used for this purpose. The most specific is trypsin, a digestive enzyme secreted into the small intestine from the pancreas. It catalyzes the hydrolysis of only those peptide bonds in a polypeptide chain whose carbonyl function is donated by either a lysine or an arginine residue. Other enzymes useful for partial hydrolysis of polypeptide chains are chymotrypsin, pepsin, and thermolysin. Thermolysin, a heat-stable bacterial protease, can hydrolyze peptide bonds in which the amino function is contributed by the nonpolar amino acids leucine, isoleucine, and valine.
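The specificity of trypsin described above can be illustrated with a short Python sketch that cuts a sequence after every lysine (K) or arginine (R), that is, at bonds whose carbonyl group comes from one of those residues. The function name tryptic_digest and the sample sequence are hypothetical, and the sketch applies only the rule stated in the text; any further exceptions to trypsin cleavage are ignored.

```python
def tryptic_digest(sequence):
    """Split a one-letter amino acid sequence after each K or R."""
    fragments = []
    current = []
    for residue in sequence:
        current.append(residue)
        if residue in "KR":              # bond on the carboxyl side of Lys/Arg is cut
            fragments.append("".join(current))
            current = []
    if current:                          # C-terminal fragment, if any residues remain
        fragments.append("".join(current))
    return fragments


# Hypothetical peptide
print(tryptic_digest("GAKLVRSTEKW"))     # ['GAK', 'LVR', 'STEK', 'W']
```

Every fragment except possibly the last ends in lysine or arginine, which is one reason tryptic peptides are convenient to work with.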

Protein is an important nutrient that builds muscles and bones and provides energy. Protein can help with weight control because it helps you feel full and satisfied from your meals.

The healthiest proteins are the leanest. This means that they have the least fat and calories. The best protein choices are fish or shellfish, skinless chicken or turkey, low-fat or fat-free dairy (skim milk, low-fat cheese), and egg whites or egg substitute. The best red meats are the leanest cuts (loin and tenderloin). Other healthy options are beans, legumes (lentils and peanut butter), and soy foods such as tofu or soymilk.

The unique structure and chemical composition of each protein is important for its function; it is also important for separating proteins in a protein purification strategy. Proteins differ from one another in properties such as size, charge, and hydrophobicity, and each of these differences can be used as a basis for the separation methods that are used to purify proteins. Because these differences in protein properties originate from differences in the chemical structure of the amino acids that make up the protein, we need to explore the structure of amino acids and their contribution to protein properties in more detail.

Amino acid structure:

Amino acids are composed of carbon, hydrogen, oxygen, and nitrogen. Two amino acids, cysteine and methionine, also contain sulfur. The generic form of an amino acid is shown in Figure 2.1. Atoms of these elements are arranged into 20 kinds of amino acids that are commonly found in proteins. All proteins in all species, from bacteria to humans, are constructed from the same set of twenty amino acids. All amino acids have an amino group (NH2) and a carboxyl group (COOH) bonded to the same carbon atom, known as the alpha carbon. Amino acids differ in the side chain or R group that is bonded to the alpha carbon (Figure 2.2). Glycine, the simplest amino acid, has a single hydrogen atom as its R group; alanine has a methyl (-CH3) group.

The chemical composition of the unique R groups is responsible for the important characteristics of amino acids such as chemical reactivity, ionic charge and relative hydrophobicity. In Figure 2.2, the amino acids are grouped according to their polarity and charge. They are divided into four categories: those with polar uncharged R groups, those with apolar (nonpolar) R groups, those with acidic (negatively charged) R groups, and those with basic (positively charged) R groups.


The polar amino acids are soluble in water because their R groups can form hydrogen bonds with water. For example, serine, threonine and tyrosine all have hydroxyl groups (OH). Amino acids that carry a net negative charge at neutral pH contain a second carboxyl group. These are the acidic amino acids, aspartic acid and glutamic acid, also called aspartate and glutamate, respectively. The basic amino acids have R groups with a net positive charge at pH 7.0. These include lysine, arginine and histidine. There are eight amino acids with nonpolar R groups. As a group, these amino acids are less soluble in water than the polar amino acids. If a protein has a greater percentage of nonpolar R groups, the protein will be more hydrophobic (water hating) in character.

A protein is formed by amino acid subunits linked together in a chain. The bond between two amino acids is formed by the removal of an H2O molecule from two different amino acids, forming a dipeptide (Figure 2.3). The bond between two amino acids is called a peptide bond and the chain of amino acids is called a peptide (20 amino acids or fewer) or a polypeptide.

Each protein consists of one or more unique polypeptide chains. Most proteins do not remain as linear sequences of amino acids; rather, the polypeptide chain undergoes a folding process. The process of protein folding is driven by thermodynamic considerations. This means that each protein folds into a configuration that is the most stable for its particular chemical structure and its particular environment. The final shape will vary but the majority of proteins assume a globular configuration. Many proteins such as myoglobin consist of a single polypeptide chain; others contain two or more chains. For example, hemoglobin is made up of two chains of one type (amino acid sequence) and two of another type.

Although the primary amino acid sequence determines how the protein folds, this process is not completely understood. Although certain amino acid sequences can be identified as more likely to form a particular conformation, it is still not possible to completely predict how a protein will fold based on its amino acid sequence alone, and this is an active area of biochemical research.

The final folded 3-D arrangement of the protein is referred to as its conformation. In order to maintain their function, proteins must maintain this conformation. To describe this complex conformation, scientists describe four levels of organization: primary, secondary, tertiary, and quaternary (Figure 2.4). The overall conformation of a protein is the combination of its primary, secondary, tertiary and quaternary elements.

Four levels of Organization of Protein Structure:

Primary Structure refers to the linear sequence of amino acids that make up the polypeptide chain. This sequence is determined by the genetic code, the sequence of nucleotide bases in the DNA. The bond between two amino acids is a peptide bond. This bond is formed by the removal of an H2O molecule from two different amino acids, forming a dipeptide. The sequence of amino acids determines the positioning of the different R groups relative to each other. This positioning therefore determines the way that the protein folds and the final structure of the molecule.

The secondary structure of protein molecules refers to the formation of a regular pattern of twists or kinks of the polypeptide chain. The regularity is due to hydrogen bonds forming between the atoms of the amino acid backbone of the polypeptide chain. The two most common types of secondary structure are called the alpha helix and β pleated sheet (Figure 2.4).

Tertiary structure refers to the three dimensional globular structure formed by bending and twisting of the polypeptide chain. This process often means that the linear sequence of amino acids is folded into a compact globular structure. The folding of the polypeptide chain is stabilized by multiple weak, noncovalent interactions. These interactions include:

o        Hydrogen bonds that form when a hydrogen atom is shared by two other atoms.

o        Electrostatic interactions that occur between charged amino acid side chains. Electrostatic interactions are attractions between positive and negative sites on macromolecules.

o        Hydrophobic interactions: During folding of the polypeptide chain, amino acids with a polar (water soluble) side chain are often found on the surface of the molecule while amino acids with non polar (water insoluble) side chain are buried in the interior. This means that the folded protein is soluble in water or aqueous solutions.


Covalent bonds may also contribute to tertiary structure. The amino acid cysteine has an SH group as part of its R group, and therefore a disulfide bond (S-S) can form with a nearby cysteine. For example, insulin has two polypeptide chains that are joined by two disulfide bonds.

Quaternary structure refers to the fact that some proteins contain more than one polypeptide chain, adding an additional level of structural organization: the association of the polypeptide chains. Each polypeptide chain in the protein is called a subunit. The subunits can be the same polypeptide chain or different ones. For example, the enzyme β-galactosidase is a tetramer, meaning that it is composed of four subunits, and, in this case, the subunits are identical - each polypeptide chain has the same sequence of amino acids. Hemoglobin, the oxygen carrying protein in the blood, is also a tetramer but it is composed of two polypeptide chains of one type (141 amino acids) and two of a different type (146 amino acids). In chemical shorthand, this is referred to as α2β2. For some proteins, quaternary structure is required for full activity (function) of the protein.

 Some proteins combine with other kinds of molecules such as carbohydrates, lipids, iron and other metals, or nucleic acids, to form glycoproteins, lipoproteins, hemoproteins, metalloproteins, and nucleoproteins respectively. The presence of these other biomolecules affects the protein properties. For example, a protein that is conjugated to carbohydrate, called a glycoprotein, would be more hydrophilic in character while a protein conjugated to a lipid would be more hydrophobic in character.

Proteins are typically characterized by their size (molecular weight) and shape, amino acid composition and sequence, isoelectric point (pI), hydrophobicity, and biological affinity. Differences in these properties can be used as the basis for separation methods in a purification strategy (Chapter 4). The chemical composition of the unique R groups is responsible for the important characteristics of amino acids: chemical reactivity, ionic charge and relative hydrophobicity. Therefore protein properties relate back to the number and type of amino acids that make up the protein.


The size of a protein is usually expressed as its molecular weight (mass), although occasionally the length or diameter of a protein is given in angstroms. The molecular weight of a protein is the mass of one molecule, usually measured in units called daltons (numerically equal to the mass of one mole in grams). One dalton is approximately the mass of one proton or neutron. The molecular weight can be estimated by a number of different methods, including electrophoresis, gel filtration, and, more recently, mass spectrometry. The molecular weight of proteins varies over a wide range. For example, insulin is 5,700 daltons while snail hemocyanin is 6,700,000 daltons. The average molecular weight of a protein is between 40,000 and 50,000 daltons. Molecular weights are commonly reported in kilodaltons (kD), a unit of mass equal to 1,000 daltons. Most proteins have a mass between 10 and 100 kD. A small protein consists of about 50 amino acids, while larger proteins may contain 3,000 amino acids or more. One of the larger amino acid chains is myosin, found in muscles, which has 1,750 amino acids.
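The relationship between chain length and molecular weight can be approximated with a rule of thumb. The Python sketch below assumes an average mass of about 110 daltons per amino acid residue; that figure is a commonly used approximation and is not taken from this chapter, and the function names are made up for the example. The results are rough estimates only.

```python
# Assumed rule-of-thumb average mass of one residue in a protein (daltons).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_da(n_residues):
    """Approximate molecular weight in daltons from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA


def estimate_residues(mass_da):
    """Approximate residue count from a molecular weight in daltons."""
    return round(mass_da / AVERAGE_RESIDUE_MASS_DA)


def to_kilodaltons(mass_da):
    """Convert daltons to kilodaltons (1 kD = 1,000 daltons)."""
    return mass_da / 1000.0


print(to_kilodaltons(estimate_mass_da(50)))    # small protein of 50 residues: 5.5
print(to_kilodaltons(estimate_mass_da(1750)))  # chain of 1,750 residues: 192.5
print(estimate_residues(5700))                 # insulin at 5,700 daltons: 52
```

The estimate for insulin, roughly 52 residues, shows how far such a shortcut can be trusted: it gives the right order of magnitude, not an exact count.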

Separation methods that are based on size and shape include gel filtration chromatography (size exclusion chromatography) and polyacrylamide gel electrophoresis.

The amino acid composition is the percentage of the constituent amino acids in a particular protein while the sequence is the order in which the amino acids are arranged.
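The difference between composition and sequence can be made concrete with a small Python sketch. The sequences are hypothetical four-residue peptides in one-letter code, and composition_percent is a name chosen for this example.

```python
from collections import Counter


def composition_percent(sequence):
    """Return the percentage of each amino acid in a one-letter sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * n / total for residue, n in counts.items()}


print(composition_percent("GGAK"))   # {'G': 50.0, 'A': 25.0, 'K': 25.0}

# Two different sequences can share exactly the same composition.
print(composition_percent("GGAK") == composition_percent("KAGG"))   # True
```

The second comparison shows why composition alone cannot identify a protein: the order of the residues carries information that the percentages do not.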

Each protein has an amino group at one end and a carboxyl group at the other end as well as numerous amino acid side chains, some of which are charged. Therefore each protein carries a net charge. The net protein charge is strongly influenced by the pH of the solution. To explain this phenomenon, consider the hypothetical protein in Figure 2.5. At pH 6.8, this protein has an equal number of positive and negative charges and so there is no net charge on the protein. As the pH drops, more H+ ions are available in the solution. These hydrogen ions bind to negative sites on the amino acids. Therefore, as the pH drops, the protein as a whole becomes positively charged. Conversely, at a basic pH, the protein becomes negatively charged. pH 6.8 is called the pI, or isoelectric point, for this protein; that is, the pH at which there are an equal number of positive and negative charges. Different proteins have different numbers of each of the amino acid side chains and therefore have different isoelectric points. So, in a buffer solution at a particular pH, some proteins will be positively charged, some proteins will be negatively charged and some will have no charge.
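The dependence of net charge on pH can be sketched numerically. The Python example below treats every ionizable group independently and uses the Henderson-Hasselbalch relationship to estimate what fraction of each group is charged at a given pH. The pKa values are approximate textbook figures chosen for illustration (they are assumptions, they differ between sources, and they shift inside a folded protein), the function names are made up, and the sequences are hypothetical.

```python
# Approximate pKa values, assumed for illustration only.
ACIDIC_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # negative when deprotonated
BASIC_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}             # positive when protonated
N_TERMINUS_PKA = 8.0
C_TERMINUS_PKA = 3.1


def positive_fraction(pka, ph):
    """Fraction of a basic group that carries a positive charge."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def negative_fraction(pka, ph):
    """Fraction of an acidic group that carries a negative charge."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(sequence, ph):
    """Estimated net charge of a one-letter sequence at the given pH."""
    charge = positive_fraction(N_TERMINUS_PKA, ph)
    charge -= negative_fraction(C_TERMINUS_PKA, ph)
    for residue in sequence:
        if residue in BASIC_PKA:
            charge += positive_fraction(BASIC_PKA[residue], ph)
        elif residue in ACIDIC_PKA:
            charge -= negative_fraction(ACIDIC_PKA[residue], ph)
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


peptide = "MKDEHRGAYK"                      # hypothetical sequence
pi = isoelectric_point(peptide)
print(net_charge(peptide, pi - 2.0) > 0)    # True: positive below the pI
print(net_charge(peptide, pi + 2.0) < 0)    # True: negative above the pI
```

A chain rich in lysine gives a high estimated pI and a chain rich in aspartate a low one, which is the basis for separating proteins by charge at a chosen buffer pH.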


Separation techniques that are based on charge include ion exchange chromatography, isoelectric focusing and chromatofocusing.

 Literally, hydrophobic means fear of water. In aqueous solutions, proteins tend to fold so that areas of the protein with hydrophobic regions are located in internal surfaces next to each other and away from the polar water molecules of the solution. Polar groups on the amino acid are called hydrophilic (water loving) because they will form hydrogen bonds with water molecules. The number, type and distribution of nonpolar amino acid residues within the protein determines its hydrophobic character. (Chart of hydrophobicity or hydropathy)
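One common way to put a number on hydrophobic character is to average a per-residue hydropathy value over the whole sequence. The Python sketch below uses the Kyte-Doolittle hydropathy scale, in which positive values are hydrophobic and negative values hydrophilic. Choosing this particular scale is an assumption made for illustration; the chart referred to above may use a different one. The function name and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a one-letter amino acid sequence."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


print(mean_hydropathy("LIVF") > 0)   # True: nonpolar residues, hydrophobic
print(mean_hydropathy("DEKR") < 0)   # True: charged residues, hydrophilic
```

An average over the whole chain ignores where the nonpolar residues sit, so it captures their number and type but not their distribution.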

A separation method that is based on the hydrophobic character of proteins is hydrophobic interaction chromatography.


As the name implies, solubility is the amount of a solute that can be dissolved in a solvent. The 3-D structure of a protein affects its solubility properties. Cytoplasmic proteins have mostly hydrophilic (polar) amino acids on their surface and are therefore water soluble, with more hydrophobic groups located on the interior of the protein, sheltered from the aqueous environment. In contrast, proteins that reside in the lipid environment of the cell membrane have mostly hydrophobic amino acids (non polar) on their exterior surface and are not readily soluble in aqueous solutions.

Each protein has a distinct and characteristic solubility in a defined environment and any changes to those conditions (buffer or solvent type, pH, ionic strength, temperature, etc.) can cause proteins to lose the property of solubility and precipitate out of solution. The environment can be manipulated to bring about a separation of proteins; for example, the ionic strength of the solution can be increased or decreased, which will change the solubility of some proteins.

 Biological Affinity (Function):

Proteins often interact with other molecules in vivo in a specific way; in other words, they have a biological affinity for that molecule. These molecular counterparts, termed ligands, can be used as “bait” to “fish” out the target protein that you want to purify. For example, one such molecular pair is insulin and the insulin receptor. If you want to purify (or catch) the insulin receptor, you could couple many insulin molecules to a solid support and then run an extract (containing the receptor) over that column. The receptor would be “caught” by the insulin bait. These specific interactions are often exploited in protein purification procedures. Affinity chromatography is a very common method for purifying recombinant proteins (proteins produced by genetic engineering). Several histidine residues can be engineered at the end of a polypeptide chain. Since repeated histidines have an affinity for metals, a column of the metal can be used as bait to “catch” the recombinant protein.

Although DNA can be isolated and amplified from thousand year old mummies, most proteins are more fragile biomolecules. Therefore, laboratory reagents and storage solutions must provide suitable conditions so that the normal structure and function of the protein is maintained. To understand how the structure of proteins is protected in laboratory solutions, it is necessary to understand how that structure can be destroyed.

Proteins can denature, or unfold, so that their three dimensional structure is altered but their primary structure remains intact (Figure 2.7). Many of the interactions that stabilize the 3-D conformation of the protein are relatively weak and are sensitive to various environmental factors including high temperature, low or high pH and high ionic strength. Proteins vary greatly in the degree of their sensitivity to these factors. Sometimes proteins can be renatured but often the denaturation is irreversible.



Figure 2.7. A figure showing the process of denaturation. The polypeptide chain has lost its higher order structure and is now a random coil.

Proteins can also be broken apart by enzymes, called proteases, that digest the covalent peptide bonds between amino acids that are responsible for the primary structure. This process is called proteolysis and is irreversible. Cells contain proteases that are found in lysosomes, membrane bound organelles inside the cell. When cells are disrupted, lysosomes break and release these proteases, which can damage the other proteins in the cell. In the laboratory, it is therefore necessary to minimize the activities of cellular proteases to protect proteins from proteolysis. Methods used to minimize proteolysis include working at lower temperatures (4°C), and adding chemicals that inhibit protease activity.

Sulfur groups on cysteines may undergo oxidation to form disulfide bonds that are not normally present. Extra disulfide bonds can form when proteins are removed from their normal environment. Reducing agents such as dithiothreitol or β-mercaptoethanol are often added to prevent undesirable disulfide bond formation.

Proteins readily adsorb (stick to) surfaces, thereby reducing their available activity. To prevent significant loss, do not store dilute solutions of proteins for prolonged periods of time. Always dilute them right before use.

The composition of the extraction buffer is important for maintaining structure and function of the target protein. To prevent denaturation, the buffering pH is based on the pH stability range of the protein. Other factors such as a suitable ionic strength, divalent cations (Ca++ and Mg++), or reducing agents (dithiothreitol or β-mercaptoethanol) may be needed to maintain activity. In making the extract, cells are lysed and proteases (enzymes that degrade proteins) are released from their intracellular compartments. To prevent proteases from digesting the target protein, two strategies are commonly followed: 1) The extract is kept cold. The activity of proteolytic enzymes is greatly reduced by cold temperatures. For this reason, the protein purification process is often conducted in cold rooms. At the very least, an effort is made to keep the extract at 4°C. 2) Protease inhibitors are sometimes added to the mixture to prevent degradation by proteases. The drawback to this strategy is that the inhibitors must eventually be removed, along with other contaminant proteins.


 That means that the two simplest amino acids, glycine and alanine, would be shown as:


Glycine and alanine can combine together with the elimination of a molecule of water to produce a dipeptide. It is possible for this to happen in one of two different ways - so you might get two different dipeptides.

In each case, the linkage shown in blue in the structure of the dipeptide is known as a peptide link. In chemistry, this would also be known as an amide link, but since we are now in the realms of biochemistry and biology, we'll use their terms.

If you joined three amino acids together, you would get a tripeptide. If you joined lots and lots together (as in a protein chain), you get a polypeptide.

A protein chain will have somewhere in the range of 50 to 2000 amino acid residues. You have to use this term because strictly speaking a peptide chain isn't made up of amino acids. When the amino acids combine together, a water molecule is lost. The peptide chain is made up from what is left after the water is lost - in other words, is made up of amino acid residues.
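The two points made above, that glycine and alanine can join in two different orders and that a molecule of water is lost for each peptide link formed, can be checked with a short Python sketch. The molecular masses of the free amino acids and of water are standard approximate values supplied for this example (they are not given in the text), and the function names are made up.

```python
from itertools import permutations

WATER_MASS = 18.015                                   # approximate, g/mol
FREE_AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09}   # approximate, g/mol


def dipeptides(names):
    """List every ordered pair of different amino acids, N-terminal first."""
    return ["-".join(pair) for pair in permutations(names, 2)]


def peptide_mass(residue_names):
    """Mass of a chain: sum of the free amino acids minus one water per peptide link."""
    total = sum(FREE_AMINO_ACID_MASS[name] for name in residue_names)
    return total - WATER_MASS * (len(residue_names) - 1)


print(dipeptides(["Gly", "Ala"]))                     # ['Gly-Ala', 'Ala-Gly']
print(round(peptide_mass(["Gly", "Ala"]), 3))         # 146.145
```

Both dipeptides have the same mass because they contain the same residues; they differ only in which amino acid keeps its free -NH2 group and which keeps its free -COOH group.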

By convention, when you are drawing peptide chains, the -NH2 group which hasn't been converted into a peptide link is written at the left-hand end. The unchanged -COOH group is written at the right-hand end.

The end of the peptide chain with the -NH2 group is known as the N-terminal, and the end with the -COOH group is the C-terminal.

A protein chain (with the N-terminal on the left) will therefore look like this:

 The "R" groups come from the 20 amino acids which occur in proteins. The peptide chain is known as the backbone, and the "R" groups are known as side chains.

 Note:  In the case where the "R" group comes from the amino acid proline, the pattern is broken. In this case, the hydrogen on the nitrogen nearest the "R" group is missing, and the "R" group loops around and is attached to that nitrogen as well as to the carbon atom in the chain.

I mention this for the sake of completeness - not because you would be expected to know about it in chemistry at this introductory level.