UBC Theses and Dissertations


A method to characterize formaldehyde cross-linking in proteins by mass spectrometry Ding, Xuan 2011


Full Text

A METHOD TO CHARACTERIZE FORMALDEHYDE CROSS-LINKING IN PROTEINS BY MASS SPECTROMETRY

by

XUAN DING
B.Sc., Nanjing University, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Chemistry)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2011

© Xuan Ding, 2011

Abstract

The formaldehyde cross-linking approach has been used to identify protein interactions in living cells and organisms, and has the potential to map the geometry of interactions based on cross-linked peptides. However, the identification of cross-linked peptides has not been realized in native proteins, not even in model proteins. In this study, a method to identify and characterize cross-linked peptides in model proteins is developed.

The method was initially developed in an insulin model system. Candidate cross-linked peptides were identified by matching a list of putative cross-linked peptides to experimental MS signals. Signals in the MS/MS spectrum of a candidate were matched with proposed fragment ions, and confirmation of all proposed structural components verified a candidate to be a cross-linked peptide. As a result, three cross-linked insulin peptides were identified for the first time. The CID fragmentation of a formaldehyde cross-linked peptide proved to occur at both the cross-link bridge and the peptide backbones. Fragment ions containing the cross-link bridge allowed the localization of cross-link sites, which revealed a specific N-terminus to tyrosine cross-link.

The method was then refined using two model protein systems of equivalent and higher complexity. Five cross-linked insulin peptides and three cross-linked myoglobin peptides were identified, with cross-link sites localized. The fragmentation patterns of cross-linked peptides were further confirmed.
The localization of cross-link sites in proteins revealed the N-terminus to tyrosine/asparagine and lysine to tyrosine cross-links, cross-links on arginine, and two cross-links forming on one single N-terminus. Furthermore, monitoring the progression of the two reaction steps at cross-link sites revealed the chemistry of the formaldehyde cross-linking reaction in proteins for the first time. In addition, with more complex data as the size of the model protein increased, the method was refined by applying programming to data processing and a bar-graph visualization to localize cross-link sites on isomeric peptides.

In the future, this method can be applied to other model protein systems for a more comprehensive understanding of formaldehyde cross-linking. The fragmentation patterns and reaction chemistry revealed by this method can be used to facilitate the identification of cross-linked peptides in native proteins.

Preface

This project was initiated by my supervisor, Professor Juergen Kast. I helped my supervisor in the design of the research program, performed all the bench work, and did all the data analysis and literature investigations. Samples were loaded onto the mass spectrometers by Jason Rogalski and Shujun Lin. Professor Juergen Kast provided invaluable guidance and suggestions during the entire course of research.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
1 Introduction
1.1 Mass Spectrometry of Proteins and Peptides
1.1.1 Protein Characterization
1.1.2 Peptide Sequencing
1.2 Protein Interactions
1.2.1 Affinity Enrichment Coupled with Mass Spectrometry
1.2.2 Protein Cross-linking
1.2.3 Advantages of Formaldehyde as the Cross-linker
1.2.4 Formaldehyde Cross-linking in Living Cells and Organisms
1.3 Identification of Cross-linked Peptides
1.3.1 Challenges in the Identification of Cross-linked Peptides
1.3.2 Experimental Strategies and Bioinformatics Software to Facilitate the Identification of Cross-linked Peptides
1.4 Model Studies of Formaldehyde Cross-linking Reactions
1.4.1 Two-step Reactions
1.4.2 Residue Reactivity
1.4.3 Model Proteins
1.5 Thesis Theme and Overview
2 Method Development in a Model Protein System to Identify Cross-linked Peptides and Localize Cross-links
2.1 Introduction
2.2 Experimental
2.2.1 Materials
2.2.2 Preparation of Formaldehyde Solution
2.2.3 Cross-linking of the Model Protein
2.2.4 SDS-PAGE Analysis of Cross-linked Insulin
2.2.5 Glu-C Digestion of Cross-linked Insulin
2.2.6 Mass Spectrometric Analysis of Peptides
2.2.7 Labeling of MS/MS Spectra
2.3 Results and Discussion
2.3.1 Cross-linking of Insulin
2.3.2 Complexity of Insulin Peptide Mixture
2.3.3 Candidates of Cross-linked Peptides
2.3.4 Verification of the Candidate 505.61³⁺ by the MS/MS Spectrum
2.3.5 Verification of Candidates 903.42²⁺ and 602.62³⁺ by MS/MS Spectra
2.3.6 Verification of Candidates 549.79⁴⁺, 757.88⁴⁺ and 763.88⁴⁺ by MS/MS Spectra
2.3.7 Partial Stability of Cross-link Bridges in CID Fragmentation
2.3.8 Localization of Cross-link Sites
2.4 Conclusions and Outlook
3 Method Refinement Using Other Model Protein Systems
3.1 Introduction
3.2 Experimental
3.2.1 Materials
3.2.2 Preparation of Formaldehyde Solution
3.2.3 Cross-linking of the Model Protein
3.2.4 SDS-PAGE Analysis of Cross-linked Insulin
3.2.5 Mass Spectrometric Analysis of Cross-linked Myoglobin
3.2.6 Glu-C Digestion of Cross-linked Proteins
3.2.7 Preparation of Modified Insulin α-Chain
3.2.8 Mass Spectrometric Analysis of Peptides
3.2.9 Labeling of MS/MS Spectra of Cross-linked Peptides
3.2.10 Localization of Modification Sites by Degree of Modification (D.O.M)
3.3 Results and Discussion
3.3.1 Model Protein Insulin with Alternative Processing
3.3.1.1 Identification of Five Cross-linked Insulin Peptides
3.3.1.2 Localization of the Cross-link Bridge
3.3.1.3 Determination of the Extra Structure of 12 Da Mass Shift by Reactivity Considerations
3.3.1.4 The Two-Step Reaction between the N-terminus or Lysine and Tyrosine or Asparagine Residues
3.3.1.5 Physiological Relevance of the Identified Cross-links in Insulin
3.3.2 Myoglobin as the Larger Model Protein
3.3.2.1 Cross-linking of Myoglobin
3.3.2.2 Identification of Three Cross-linked Myoglobin Peptides
3.3.2.3 Localization of Cross-link Sites to One Individual Residue
3.3.2.4 Localization of Cross-link Sites to Several Residues
3.3.2.5 Complexity of Extra Modifications/Cross-links
3.3.2.6 The Physiological Relevance and Reaction Chemistry of Identified Cross-links in Myoglobin
3.4 Conclusions and Outlook
4 Conclusions and Future Perspectives
References
Appendices
A.1 The List of Natural Amino Acids
A.2 The List of Assigned MS Signals in Figure 3-12
A.3 The MatLab Program for Data Processing in Chapter 3.3.2.2

List of Tables

Table 1-1. Names, structures, properties and spacer arm lengths of cross-linkers applicable to living cells.

Table 2-1. The theoretical mass list of possible cross-linked peptides in the digest of the formaldehyde treated 6 hr sample, made by considering unmodified and modified insulin peptides as possible component peptides and summing up their masses one by one. Masses in bold are used as examples to illustrate proposed structural components of putative cross-linked peptides. Underlined masses are those of candidates of cross-linked peptides, identified by matching this table with the masses of unknown signals in Figure 2-3a. "Mod" is short for modification(s).

Table 2-2. The m/z value, mass and proposed structural components of candidates of cross-linked peptides in the digest of the formaldehyde treated 6 hr insulin sample. The ^ represents the cross-link bridge, while * represents an extra Schiff-base modification or intra-peptide cross-link on one of the component peptides, or a second cross-link bridge between two peptides.

Table 2-3. Proposed fragment ions, their masses and the matching MS/MS signals, derived by assuming fragmentations along the backbone of the proposed component peptide II of the candidate 505.61³⁺ (its MS/MS spectrum is shown in Figure 2-4).

Table 3-1. The m/z value, mass and structural components of cross-linked peptides identified in the digest of the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced). The ^ represents the cross-link bridge, while the * represents an extra Schiff-base modification or cross-link.

Table 3-2. Distance constraints between cross-link sites in dimeric and hexameric insulin (PDB# 2A3G and 3AIY).

List of Figures

Figure 1-1. The scheme of the nomenclature system to describe different types of ions from fragmentations along the peptide backbone.

Figure 1-2. (a) The scheme of the two-step reaction to form cross-links between proteins; (b) The scheme of cross-linked proteins, its digest and types of peptides in the peptide mixture.

Figure 1-3. The scheme of the two-step formaldehyde cross-linking reaction in proteins.

Figure 1-4. The 3D structures (a and b) and sequences (c and d) of bovine insulin (a and c) and horse myoglobin (b and d). In (a) and (b), red highlights alpha-helix regions, while green highlights flexible regions. In (c) and (d), solid lines denote disulfide bonds, and dashed lines denote Glu-C cleavage sites. Residues highlighted with colors are reactive in the modification step (orange), potentially reactive in the cross-linking step (hot pink), and (potentially) reactive in both steps (blue).

Figure 2-1. The sequence of bovine insulin. Solid lines denote disulfide bonds, dashed lines denote Glu-C cleavage sites.
Figure 2-2. The SDS-PAGE gel of 300 µM insulin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.

Figure 2-3. (a) The 3D plot (LC retention time, m/z, signal intensity represented by grayscale) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr sample. (b) Zoom-ins of regions (i), (ii), (iii), (iv) and (v) in (a). Circled signals in (i), (ii), (iii), (iv) and (v) are assigned to unmodified (solid circles) and modified (dashed circles) insulin peptides α1-4, β22-30, α18-21β14-21, α5-17β1-13 and α5-17β1-13.

Figure 2-4. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 505.61³⁺.

Figure 2-5. The MS/MS fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 505.61³⁺ (Figure 2-4), to illustrate 3 types of fragment ion series.

Figure 2-6. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 903.42²⁺.

Figure 2-7. The fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 903.42²⁺ (Figure 2-6).

Figure 2-8. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 602.62³⁺.

Figure 2-9. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 549.79⁴⁺. The * represents an extra Schiff-base modification or cross-link.

Figure 2-10. The fragmentation patterns, types of fragment ions and MS/MS spectra of candidates (a) 757.88²⁺ and (b) 763.88²⁺. The * represents an extra Schiff-base modification or cross-link.

Figure 3-1. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide α18NYCNα21 ^ α1GIVEα4, with an MS signal of 499.69²⁺.

Figure 3-2. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide β22RGFFYTPKAβ30 ^ α18NYCNα21, with an MS signal of 556.61³⁺.

Figure 3-3. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide (α18NYCNα21 ^ α1GIVEα4)*, with an MS signal of 505.69²⁺.

Figure 3-4. Structures of cross-linked peptides identified in the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced), with cross-link sites localized to individual residues.

Figure 3-5. MS/MS spectra of the singly modified insulin α-chain (α1GIVEQCCASVCSLYQLENYCNα21 +12) after (a) 0.5 hr, (b) 6 hr of formaldehyde exposure. In both spectra, the Schiff-base modification is localized to Gα1, and proves not to be on Nα18 or Yα19.

Figure 3-6. The two-step reactions (a and c) and proposed reaction schemes (b and d) to form the Gα1 to Yα19 (a and b) and Gα1 to Nα18 (c and d) cross-links. Symbol ^ represents the cross-linker.

Figure 3-7. The MS/MS spectrum of the singly modified β22RGFFYTPKAβ30 (+12 Da) after 0.5 hr of formaldehyde exposure. The yn+12 ion series indicates that Kβ29 was modified within 0.5 hr.

Figure 3-8. The (a) two-step reactions and (b) proposed reaction schemes to form the Kβ29 to Yα19 cross-link. Symbol ^ represents the cross-linker.

Figure 3-9. (a) The numbering of b and y ions along peptide β22RGFFYTPKAβ30. (b) The MS/MS spectrum of the modified β22RGFFYTPKAβ30 (+24 Da) after 6 hr of formaldehyde exposure. (c) A sample calculation of the DOM value of the b6 ion. PA is the abbreviation of peak area. (d) The bar graphs of DOM values of the b and y ions against the peptide sequence. Series of b and y ions together suggest that the two Schiff-base modifications are on Rβ22 and Kβ29 but not on Yβ26.

Figure 3-10. The (a) two-step reactions and (b) proposed reaction schemes to form the Gα1 to Yβ26 cross-link. Symbol ^ represents the cross-linker.

Figure 3-11. MS spectra of 100 µM myoglobin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.

Figure 3-12. The 3D plot (LC retention time, m/z, signal intensity) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr myoglobin sample.

Figure 3-13. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide 138LFRNDIAAKYKE149 ^ 150LGFQG154, with an MS signal of 667.37³⁺.

Figure 3-14. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)*, with an MS signal of 767.97⁴⁺. The * represents an extra Schiff-base modification or cross-link.

Figure 3-15. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)*, with an MS signal of 817.97⁴⁺. The * represents an extra Schiff-base modification or cross-link.

Figure 3-16. (a) The numbering of b and y ions along the myoglobin peptide 138LFRNDIAAKYKE149. (b) A sample calculation of the DOM values of detectable b and y ions. PA is the abbreviation of peak area. (c) Bar graphs of DOM values of b and y ions from peptide 138LFRNDIAAKYKE149 with a cross-link bridge attached against the peptide sequence, derived from the MS/MS spectrum of peptide 138LFRNDIAAKYKE149 ^ 150LGFQG154 (Figure 3-13b).

List of Abbreviations

MS  mass spectrometry(ic)
MS/MS  tandem mass spectrometry
m  mass
z  charge state
m/z  mass to charge ratio
SDS-PAGE  sodium dodecyl sulfate polyacrylamide gel electrophoresis
HPLC  high performance liquid chromatography
MALDI  matrix-assisted laser desorption/ionization
ESI  electrospray ionization
Q  quadrupole mass analyzer
TOF  time of flight mass analyzer
CID  collision-induced dissociation
PFA  paraformaldehyde
PBS  phosphate buffered saline
DOM  degree of modification
hr  hour(s)
DTT  dithiothreitol
IAA  2-iodoacetamide
BS3  bis(sulfosuccinimidyl)suberate
DTSSP  3,3´-dithiobis(sulfosuccinimidylpropionate)
Sulfo-EGS  ethylene glycol bis[sulfosuccinimidylsuccinate]
DSG  disuccinimidyl glutarate
DSP  dithiobis[succinimidyl propionate]
DSS  disuccinimidyl suberate
EGS  ethylene glycol bis[succinimidylsuccinate]
DMSO  dimethyl sulfoxide

Acknowledgements

I would like to thank foremost my supervisor, Dr. Juergen Kast, for training me as an M.Sc. student and allowing me to work on an exciting project. His expertise in the research field and guidance from start to finish enabled me to develop skills in many aspects of research. The completion of this research project has benefited from the assistance and support of many present and past lab-mates (Jason, Shujun, Cordula, Qing, Chengcheng, Arash, Liwen, Davin, Savita, Jiqing, Ru, Geraldine, Judy). Their kindness in helping others and their patience with me created a great atmosphere in the lab and helped me to solve many problems in research. Last and most importantly, I would like to thank my parents for their emotional support from the other side of the Pacific Ocean, and my husband Peng, who went through all the difficult times together with me in Canada.
Dedication

To my parents and husband

1 Introduction

1.1 Mass Spectrometry of Proteins and Peptides

1.1.1 Protein Characterization

Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge ratio (m/z) of ions. An MS analysis consists of three distinct steps: analytes are ionized in the ion source, the ions are separated by m/z in the mass analyzer, and the number of ions at each m/z value is recorded by the detector. MS analysis has become an essential tool for the characterization of proteins, which are vital biomolecules in living organisms, since the development of electrospray ionization (ESI) [1] and matrix-assisted laser desorption/ionization (MALDI) [2-3]. These soft ionization methods made possible the vaporization and ionization of proteins and peptides, which are nonvolatile and thermally unstable. ESI and MALDI have enabled large-scale protein characterization by mass spectrometry, i.e. proteomics. MS is highly sensitive and requires as little as zeptomole amounts of proteins [4-5], making it useful when only small amounts of proteins at low concentrations can be isolated from living cells or organisms. MS also provides rapid analysis of many proteins at once, which is particularly useful for biological samples [6].

A typical workflow of MS-based protein characterization starts with the isolation of proteins from cell lysate or fractions thereof, which are enzymatically digested into peptides. Peptide mixtures are usually complex and require separation by liquid chromatography (LC) prior to introduction into a mass spectrometer [7]. The m/z values of all detectable peptides, which contain information on their masses and charge states, are recorded in the MS spectra. Proteins can be identified by comparing mass spectra (m/z values) to reference lists of peptide masses that are generated from theoretical digestion of proteins in protein databases, an approach called peptide mass fingerprinting [8-10].
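As a rough illustration of how an m/z value encodes both mass and charge state, and of how peptide mass fingerprinting compares measured masses with a theoretical list, the following Python sketch may help. The proton mass is the standard value. The function names, the 0.05 Da tolerance, and the peptide labels and masses in the usage note are hypothetical and chosen only for illustration; they are not taken from this thesis.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (standard value)


def mz_from_mass(neutral_mass, charge):
    """m/z of a peptide of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral mass recovered from an observed m/z and its charge state."""
    return mz * charge - charge * PROTON_MASS


def match_masses(observed_masses, theoretical, tol_da=0.05):
    """Pair each observed mass with every theoretical peptide within tol_da.

    `theoretical` maps a peptide label to its theoretical neutral mass.
    The tolerance is an assumed, illustrative value.
    """
    matches = []
    for obs in observed_masses:
        for label, theo in theoretical.items():
            if abs(obs - theo) <= tol_da:
                matches.append((obs, label))
    return matches
```

For example, a peptide with a neutral mass of 1000.0 Da observed as a doubly protonated ion gives `mz_from_mass(1000.0, 2)` of about 501.007, and `mass_from_mz` inverts that calculation. Calling `match_masses([1000.02], {"peptide A": 1000.0, "peptide B": 1200.5})` with these made-up values would pair the observed mass with "peptide A" only.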
However, the mass of a peptide is not a unique identifier, as different amino acid sequences can produce the same mass. In order to confirm its identity (sequence), a peptide is fragmented into diagnostic fragment ions using tandem mass spectrometry (MS/MS). In MS/MS, a peptide ion of a specific m/z value is selected in a first mass analyzer as the precursor ion and passed into a collision cell, where fragmentations along the peptide backbone are induced in a predictable manner. Fragment ions are drawn out of the collision cell and enter a second mass analyzer, where their m/z values are measured. MS/MS spectra of various peptide ions and peptide masses are searched against reference spectra and masses stored in protein databases using software such as Mascot [11], SEQUEST [12] and X!Tandem [13], to identify the proteins in the sample. Disulfide bonds in proteins are usually reduced and alkylated during the enzymatic digestion, because MS/MS spectra of disulfide-connected peptides are too complex and thus not included in databases.

1.1.2 Peptide Sequencing

The confirmation of peptide sequences by MS/MS is a key step in MS-based protein characterization. Although there are various peptide fragmentation methods, such as electron-capture dissociation (ECD), electron-transfer dissociation (ETD) and infrared multiphoton dissociation (IRMPD), collision-induced dissociation (CID) is the most common fragmentation method in commercial MS instruments. In CID, peptide ions are activated by energetic collisions with an inert gas to initiate the fragmentation.

The CID fragmentation of peptides occurs mostly along the peptide backbones [14-15]. A nomenclature system [16-17] describes the different types of backbone ions formed from fragmentation at different types of bonds (Figure 1-1).
Backbone ions that contain the N-terminus and the C-terminus of the peptide are labeled (a, b, c) and (x, y, z) respectively, followed by numerical subscripts that identify the position of the amino acid where each fragmentation occurs. The b and y ions, formed by fragmentation at peptide bonds, are the major fragment ions in CID [18-19]. During multiple collisions with the inert gas molecules, some b and y ions fragment further into internal ions that can complicate spectral analysis [20]. The peptide sequence can be determined by comparing the MS/MS spectrum to reference spectra in protein databases, as described in Chapter 1.1.1. An alternative is de novo sequencing, which matches mass differences between successive b, y and internal ions to the masses of amino acids. MS/MS spectra can also be used to localize modifications, based on the mass shifts they produce on modified residues and on the series of ions that contain modified residues [21-23].

Figure 1-1. The scheme of the nomenclature system to describe different types of ions from fragmentations along the peptide backbone.

CID fragmentations of peptides are mainly charge-directed processes [24-25]. The fragmentation at a peptide bond and the formation of b and y ions is initiated by a proton attached to the nitrogen or oxygen of that peptide bond. Protonation sites on a peptide include energetically more favored ones, the N-terminus and the side chains of basic residues (arginine and lysine), and less favored ones, the oxygen and nitrogen atoms at peptide bonds. When the peptide is ionized, protons tend to attach to the energetically more favored N-terminus and side chains of basic residues. Upon activation of the peptide by collisions, the protons migrate to the less favored oxygen and nitrogen atoms throughout the peptide, which leads to fragmentation at various peptide bonds [26]. The resulting fragmentation pattern contains successive b and y ions.
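The b and y ion series described above can be sketched in Python as follows. The residue masses are the standard monoisotopic values. The function name `fragment_ions`, the `mods` argument, and the nominal +12 Da shift used in the usage note are illustrative assumptions, not the labeling procedure used in this thesis. Only singly protonated fragments are considered.

```python
# Standard monoisotopic residue masses (Da) of the 20 natural amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def fragment_ions(sequence, mods=None):
    """Return (b_ions, y_ions) as dicts of singly charged m/z values.

    `mods` is a hypothetical helper argument: it maps a 0-based residue
    index to a mass shift in Da placed on that residue.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0)
              for i, aa in enumerate(sequence)]
    n = len(masses)
    b_ions, y_ions = {}, {}
    for i in range(1, n):
        # b_i holds the first i residues (N-terminal side).
        b_ions[f"b{i}"] = sum(masses[:i]) + PROTON
        # y_i holds the last i residues plus water (C-terminal side).
        y_ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON
    return b_ions, y_ions
```

Calling `fragment_ions("GIVE")` and then `fragment_ions("GIVE", mods={0: 12.0})` shows the localization idea from the text: a shift placed on the first residue moves every b ion by 12 Da while y1 to y3 stay unchanged, so the pattern of shifted and unshifted ions points to the modified residue.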
Although fragmentation generally occurs at all peptide bonds, predominant fragmentation at certain peptide bonds is observed in some arginine-containing peptides [26-29].

1.2 Protein Interactions

1.2.1 Affinity Enrichment Coupled with Mass Spectrometry

Significant efforts have been made to apply the prototype workflow of protein characterization to large-scale protein profiling of various organelles [30-33] and even the whole yeast proteome [34-35]. Protein profiling studies reveal a whole map of proteins, including their identities and locations. However, profiling studies provide little information about protein-protein interactions, which are extraordinarily important in basic cellular processes such as protein synthesis [36] and signal transduction [37]. Changes in these interactions can reflect abnormal states of protein function and therefore disease states.

An MS-compatible technique to study protein-protein interactions is affinity enrichment. Affinity enrichment isolates a target protein together with its interaction partners from cell lysates, e.g. through interactions between the target protein and specific antibodies. After other cellular components are washed away, the purified interacting proteins can be submitted to enzymatic digestion and the subsequent steps of the MS-based protein characterization workflow to identify the interacting proteins. Affinity enrichment coupled with MS analysis has recently been widely used in studies of protein-protein interactions [38-39] in yeast [40], E. coli [41] and human cell lines [42].

Affinity enrichment, however, has three limitations. Firstly, it tends to retain interaction partners with strong interactions that can survive the washing step, causing a loss of transient or weak interactions and therefore false negative identifications of interacting proteins.
Secondly, as cell lysis removes the temporal and spatial constraints on protein-protein interactions, proteins that are separated by organelle membranes can come into contact and interact with each other, causing false positive identifications of interacting proteins [43-44]. Thirdly, this approach does not provide any geometry information about protein-protein interactions.

1.2.2 Protein Cross-linking

In order to reduce false positive and negative identifications of interacting proteins, and to preserve geometry information about protein-protein interactions, a cross-linking approach can be applied to living cells before cell lysis and affinity enrichment [38,45-46].

Figure 1-2. (a) The scheme of the two-step reaction to form cross-links between proteins; (b) The scheme of cross-linked proteins, their digest and the types of peptides in the peptide mixture.

In the cross-linking process, two proteins that are in close proximity are linked together through the formation of covalent bonds via a small bifunctional molecule known as a chemical cross-linker. The covalent bonds between interacting proteins not only “freeze” the protein-protein interactions that occur in living cells with all spatial and temporal constraints, but also preserve weak and transient interactions, thereby reducing false positives and negatives in the identification of interacting proteins. A cross-linker can be considered as a bridge with one reactive group at each end. Cross-linking reactions occur in two steps (Figure 1-2a): one end of the bridge forms a covalent bond with a protein molecule to generate a modification on the protein; the other end then forms a covalent bond with another or the same protein to generate a cross-link bridge between proteins or within one protein. Consequently, cross-linked proteins may contain multiple modifications and cross-links (Figure 1-2b).
Upon enzymatic digestion, cross-linked proteins turn into a mixture of unmodified peptides, modified peptides, peptides with intra-peptide cross-links and cross-linked peptides. The peptide mixture reveals different aspects of protein-protein interactions. Unmodified peptides are predominant and can be used to identify interacting proteins. Cross-linked peptides indicate the regions in proteins that are in close proximity, which are usually regions of interaction. Assigning the cross-link bridges to small segments on the component peptides or even individual residues can further pinpoint regions of interaction. Cross-linked peptides and cross-link sites therefore reveal how the interacting proteins interact with each other [47], information that is hidden without cross-linking.

A number of commercially available cross-linkers can be applied to living cells, as shown in Table 1-1. They form cross-link bridges of different lengths (spacer arm length), ranging from 2.5 Å to 16.1 Å in these examples. Generally, a shorter spacer arm is preferred: longer spacer arms allow cross-linking of proteins that may not interact, and lower the resolution of the geometry information provided by cross-link sites. Most of these cross-linkers, except formaldehyde which will be discussed in Chapter 1.2.3, can be classified into two types by their properties. Water-soluble cross-linkers, such as BS3, DTSSP and sulfo-EGS, can be added to living cells under aqueous physiological conditions but cannot permeate cell membranes. They have been used to study proteins on cell surfaces [48-50]. Water-insoluble cross-linkers, such as DSG, DSP, DSS and EGS, can permeate cell membranes and have been used to cross-link intracellular proteins [51-53]. They need to be dissolved in organic solvents such as DMSO, however, which disturbs the physiological conditions of living cells.

Table 1-1. Names, structures, properties and spacer arm lengths of cross-linkers applicable to living cells.
| Name | Structure | Properties | Spacer Arm Length (Å) |
|---|---|---|---|
| BS3 | (image) | Water-Soluble & Membrane-Impermeable | 11.4 |
| DTSSP | (image) | Water-Soluble & Membrane-Impermeable | 12.0 |
| Sulfo-EGS | (image) | Water-Soluble & Membrane-Impermeable | 16.1 |
| DSG | (image) | Water-Insoluble & Membrane-Permeable | 7.7 |
| DSP | (image) | Water-Insoluble & Membrane-Permeable | 12.0 |
| DSS | (image) | Water-Insoluble & Membrane-Permeable | 11.4 |
| EGS | (image) | Water-Insoluble & Membrane-Permeable | 16.1 |
| Formaldehyde | (image) | Water-Soluble & Membrane-Permeable | 2.5 |

1.2.3 Advantages of Formaldehyde as the Cross-linker

Formaldehyde was found to be able to cross-link proteins decades ago. This feature has been widely used to preserve tissues, such as clinical biopsies, by cross-linking proteins, DNA and RNA so that they are fixed in position. In the last decade, it has been used as an especially suitable cross-linker for in vivo cross-linking of proteins coupled with affinity enrichment and MS analysis [54-61]. As a small molecule, formaldehyde can quickly permeate cell membranes and diffuse quickly inside the cells, resulting in efficient cross-linking of cellular proteins [44] that is thought to capture transient interactions [43]. The corresponding cross-link bridge is approximately 2.5 Å [43]. Therefore, it only cross-links residues within very close proximity, reducing false positive identifications of interaction partners and allowing high-resolution geometry mapping by cross-link sites. Formaldehyde is also water-soluble, so it does not need the organic solvent required to dissolve all other membrane-permeable cross-linkers [53,62-65]. Practically, formaldehyde is inexpensive and widely available. These advantages make formaldehyde a useful cross-linker for the study of protein-protein interactions in living cells.

1.2.4 Formaldehyde Cross-linking in Living Cells and Organisms

Formaldehyde cross-linking coupled with affinity enrichment and MS analysis has been applied to identify interacting proteins in various living cells and even organisms: bacteria [54], yeast [57-58], mammalian cells [55,59,61], and whole mouse brains [56,60]. In these studies, formaldehyde cross-linking has been shown to be compatible with different protocols of affinity enrichment, e.g.
co-immunoprecipitation of endogenous proteins [56,59-61] and enrichment of tagged proteins [55,57-58]. Furthermore, cross-links can survive both non-denaturing [55-56,59-61] and denaturing [57-59] washing conditions. Although the protein of interest varies in both location and function in these studies, the cross-linking approach allowed identification of both known and novel potential interaction partners for all of them. Therefore, formaldehyde cross-linking is a versatile approach that couples well with different cells and organisms, target proteins, and experimental designs. However, it is currently limited to the identification of interacting proteins, and mapping the regions of interaction by cross-linked peptides and cross-link sites has yet to be realized.

1.3 Identification of Cross-linked Peptides

1.3.1 Challenges in the Identification of Cross-linked Peptides

The identification of cross-linked peptides and cross-link sites by MS and MS/MS analysis generally meets two major challenges with all cross-linkers. The first arises from the complexity of the digest of cross-linked proteins. It contains not only unmodified peptides from all interacting proteins, but also modified peptides, peptides with intra-peptide cross-links, cross-linked peptides, and cross-linked peptides with additional modifications and/or cross-links [66]. Cross-linked peptides are usually of low abundance and form only a subset of this peptide mixture [67-68]. As a result, it is difficult to identify the MS signals of cross-linked peptides. In addition, cross-link sites are usually identified from the MS/MS spectra of cross-linked peptides, which are beyond the scope of commonly used software tools for the identification of user defined modifications on peptides, such as Mascot and Protein Prospector [69].
MS/MS spectra of cross-linked peptides usually contain backbone ions from component peptides, backbone ions with (part of) the cross-link attached, and backbone ions with both the cross-link and (part of) the other component peptide attached [67]. Significant efforts have been devoted to the development of experimental strategies and computational tools to facilitate the identification of low abundance cross-linked peptides from the complex peptide mixture [67,70-79].

1.3.2 Experimental Strategies and Bioinformatics Software to Facilitate the Identification of Cross-linked Peptides

Several strategies, including chromatographic enrichment of cross-linked peptides and cross-linkers with signature patterns, have been developed to distinguish cross-linked peptides from other types of peptides. The chromatographic enrichment can be done directly by ion exchange chromatography, which takes advantage of differences in charged groups between cross-linked and other peptides. For tryptic peptides, whose N-terminus and C-terminal basic side chain each carry a positive charge, cross-linked peptides can carry twice as many positive charges as other peptides. Cross-linked peptides have been shown to elute at higher salt concentrations from strong cation exchange (SCX) material than other peptides [70-72]. Similarly, for peptides that end with an acidic amino acid, such as Glu-C peptides, cross-linked peptides could carry twice as many negative charges as other peptides, so that they elute at higher salt concentrations from strong anion exchange (SAX) material. Size exclusion chromatography (SEC), which capitalizes on the higher molar mass and bulkier size of cross-linked peptides, is also a possible chromatographic enrichment method [67]. In addition, affinity chromatography is used for cross-linkers with affinity tags, to enrich both cross-linked peptides and modified peptides from the predominant unmodified peptides [73-74].
Other signature patterns have also been developed to distinguish cross-linked peptides from other peptides, such as isotopic labeling of the cross-linker [75-76], and chemically cleavable [77] or MS/MS cleavable cross-linkers [78-79]. A number of bioinformatics software platforms have been developed in the last decade to automate the data interpretation [71,80-90]. These programs share a general working philosophy to identify cross-linked peptides from the peptide mixture. A library of possible cross-linked peptides is created according to user defined parameters such as the identities of the proteins involved, the specificity of enzymatic digestion, the cross-linker structure and the reactive amino acids. Matches between this library and the experimental peak list are reported as cross-linked peptides. Some programs [66,71,80-83,85,87,89] allow further confirmation of cross-linked peptides by matching their MS/MS spectra to theoretical MS/MS spectra, which are generated from preset fragmentation models [19,91] and residue reactivity. The assignment of MS/MS signals helps to elucidate the structures of cross-linked peptides, in other words, the cross-link sites. The results from software can be manually interpreted for the final verification. Combining the experimental strategies and bioinformatics software, cross-linked peptides can be identified with cross-link sites assigned, thereby providing information on the geometry of protein-protein interactions.

1.4 Model Studies of Formaldehyde Cross-linking Reactions

1.4.1 Two-step Reactions

Besides the general challenges in the identification of cross-linked peptides, the formaldehyde cross-linking approach poses the extra challenge of unclarified reaction chemistry, which has limited the application of bioinformatics software because such software requires residue reactivity for the analysis of both MS and MS/MS data.
Unlike designed cross-linkers with defined cross-linking reactions, the reaction chemistry of formaldehyde cross-linking has so far been studied only in non-physiological models: small model molecules and amino acids [92-99], as well as peptides and proteins [100-104].

Figure 1-3. The scheme of the two-step formaldehyde cross-linking reaction in proteins.

Formaldehyde cross-linking of proteins consists of two steps (Figure 1-3). An amino group reacts with formaldehyde to form a methylol intermediate (+30 Da), which can subsequently dehydrate into a Schiff-base structure (+12 Da). The Schiff-base structure can then cross-link to a reactive residue in another protein, and turn into a methylene bridge that adds 12 Da to the combined mass of the two proteins. According to studies in small molecule and amino acid model systems under physiological pH and temperature [92-98], reactive residues in the cross-linking step include the N-terminus, lysine (K), tyrosine (Y), arginine (R), asparagine (N), glutamine (Q), tryptophan (W), histidine (H) and cysteine (C). (Abbreviations of amino acids are listed in Appendix A.1.) Chemical structures connected to the methylene bridges have been revealed by X-ray crystallography [92] and NMR spectroscopy [99] in model molecules and amino acids.

1.4.2 Residue Reactivity

Residue reactivity discovered in small model molecules and amino acids does not necessarily apply to proteins, as local environments around reactive residues in proteins are very different from those around separate amino acids in solution. Three recent reports on model peptides and proteins shed some light on the residue reactivity in each reaction step [101-102,104]. The solvent accessible N-termini and Lys residues have been shown to be major sites to form Schiff-base modifications in several model proteins, and Arg residues are also reactive but less reactive than N-termini and Lys residues [104].
This study was performed under reaction conditions (formaldehyde concentration, reaction time, temperature and pH) that closely resemble those applied to living cells and organisms; therefore, the N-termini, Lys and Arg residues on the surface of cellular proteins are very likely involved in the modification step that initiates the cross-linking. Residues that could be reactive in the cross-linking step have been revealed by cross-linking glycine carrying a Schiff-base modification to various model peptides [102] and a model protein [101]. The major potentially reactive residues in model peptides are N-termini, Arg, Tyr, Asn, Gln, His and Trp residues, among which N-termini, Arg, Tyr and Gln residues also show potential reactivity in the cross-linking step in one model protein. However, these studies were performed with prolonged formaldehyde incubation: two days for peptides and one week for the protein. Furthermore, the observed cross-links are amino acid to peptide or protein, which do not represent protein to protein cross-links well due to a lack of proper local environment at both ends of the cross-link. Therefore, the cross-linking step on cellular proteins within short formaldehyde incubation (<1 hr) is hypothesized to occur on a subset of these reactive residues: N-termini, Arg, Tyr, Asn, Gln, His and Trp.

1.4.3 Model Proteins

In recent reports of formaldehyde induced reactions in non-physiological model proteins, insulin and myoglobin are the two major model proteins. Insulin and myoglobin are both properly folded proteins with secondary and tertiary structures (Figure 1-4a and 1-4b), which keep them stable when they are secreted into the blood [105-106] and exposed to various enzymes. Also, they are small proteins of 5.7 kDa (insulin) and 17 kDa (myoglobin), and only generate 6 and 14 peptides upon enzymatic digestion (sequences and enzyme cleavage sites shown in Figure 1-4c and 1-4d).
Therefore, the peptide mixtures of unmodified, modified and cross-linked peptides from formaldehyde treated model proteins are relatively simple, and more manageable than those from average-size proteins. Lastly, both proteins contain a number of residues that can be involved in the two-step cross-linking reactions, as highlighted with colors in Figure 1-4c and 1-4d. The properly folded structure, relatively simple peptide mixture, and existence of reactive residues make insulin and myoglobin suitable model proteins for studies of formaldehyde induced reactions in proteins.

Figure 1-4. The 3D structures (a and b) and sequences (c and d) of bovine insulin (a and c) and horse myoglobin (b and d). In (a) and (b), red highlights alpha-helix regions, while green highlights flexible regions. In (c) and (d), solid lines denote disulfide bonds, and dashed lines denote Glu-C cleavage sites. Residues highlighted with colors are reactive in the modification step (orange), potentially reactive in the cross-linking step (hot pink), and (potentially) reactive in both steps (blue).

1.5 Thesis Theme and Overview

Formaldehyde cross-linking has been shown to preserve protein interactions in living cells and organisms, but the analysis of the interactions is currently limited to identification of interacting partners based on unmodified peptides. The geometry information of native protein-protein interactions contained in cross-linked peptides is already in the MS data of the digest of cross-linked proteins, but is obscured by several related challenges: the low abundance of cross-linked peptides among the complex peptide mixture, the lack of knowledge about their mass spectrometric (MS) properties, and limited understanding of the reaction chemistry. The first challenge can be solved by a combination of experimental strategies and bioinformatics software.
However, so little is known about the MS properties of formaldehyde cross-linked peptides that they have not been identified even in cross-linked non-physiological model proteins. The identification and characterization of cross-linked peptides in a model protein is the key issue. It can open the door to clarification of the reaction chemistry (residue reactivity) required by bioinformatics software, as well as to the development of chromatographic enrichment methods in the future.

This work aims to develop a method to identify cross-linked peptides and characterize the formaldehyde cross-linking reactions in model protein systems. Model proteins are cross-linked in aqueous buffers and enzymatically digested. The resulting peptide mixture is separated by LC to reduce sample complexity, and analyzed by an ESI mass spectrometer that is directly coupled to the LC. In order to reduce the complexity of model protein systems, one protein is used in each system. A simple protein is used for the initial method development (Chapter 2), while other model proteins of equivalent and higher complexity are used to refine the method (Chapter 3). Along with the identification of cross-linked peptides, the fragmentation patterns shown in the MS/MS spectra of cross-linked peptides are also investigated. Also studied are the localization of cross-link sites by MS/MS spectra and the chemistry of formaldehyde cross-linking reactions on these sites.

2 Method Development in a Model Protein System to Identify Cross-linked Peptides and Localize Cross-links

2.1 Introduction

The formaldehyde cross-linking approach has the potential to reveal the geometry of native protein-protein interactions by the identification of cross-linked peptides [43].
It has yet to be realized, however, because of three challenges: the low abundance of cross-linked peptides in the complex peptide mixture, lack of knowledge about the mass-spectrometric (MS) properties of cross-linked peptides, and limited understanding of the chemistry of formaldehyde cross-linking in proteins. The latter two challenges have obstructed the development of bioinformatics software to automate data interpretation, which usually helps to overcome the former [107]. Therefore, clarifying the MS properties and the reaction chemistry in model protein systems are the key issues to tackle. So far, there have been no reports on the identification of cross-linked peptides in a model protein system, no investigations of MS/MS data of cross-linked peptides, and no further studies of the residues involved in the cross-linking reaction.

In this chapter, the aim is to develop a method to identify cross-linked peptides in a model protein system and to gain knowledge about their MS properties. A model protein is cross-linked in aqueous buffer, and digested to produce a mixture of unmodified, modified and cross-linked peptides. The peptide mixture derived from a single model protein is less complex than that derived from cross-linked native proteins, making it easier to identify cross-linked peptides. However, the cross-linking only occurs on model protein molecules that randomly come into close contact. The cross-linking yield is therefore hypothesized to be much lower than that formed on interacting proteins. As a result, cross-linked peptides in a model protein are also of low abundance, even in a relatively simple peptide mixture. To simplify the model system, a small protein discussed in Chapter 1.4.3, insulin, is chosen as the model protein. It is composed of an α and a β chain, with two inter-chain disulfide bonds (Figure 2-1).
Only four peptides are produced upon Glu-C digestion when the disulfide bonds are not reduced; they are referred to hereafter by their sequence positions: α1-4, α5-17β1-13, α18-21β14-21 and β22-30. Consequently, the formaldehyde treated insulin generates a relatively simple peptide mixture after digestion, making it a good model protein for the method development.

Figure 2-1. The sequence of insulin from bovine. Solid lines denote disulfide bonds, dashed lines denote Glu-C cleavage sites.

2.2 Experimental

2.2.1 Materials

Insulin from bovine pancreas, α-cyano-4-hydroxycinnamic acid (CHCA), Trizma base, ammonium bicarbonate, sodium hydroxide, sodium dodecyl sulfate (SDS), tetramethylethylenediamine (TEMED) and glycerol were all obtained from Sigma (St. Louis, MO). Paraformaldehyde (PFA), formic acid (FA, 88%) and acetonitrile (ACN, HPLC grade) were purchased from Fisher (Fair Lawn, NJ). Acrylamide, ammonium persulfate (APS), bromophenol blue, Coomassie Brilliant Blue R250, and gel casting and running systems were purchased from Bio-Rad (Hercules, CA). Endoproteinase Glu-C was obtained from Roche Applied Science (Penzberg, Germany). 3 kDa MW-cut-off filters were purchased from Millipore Corporation (Cork, Ireland), and syringe filters (0.22 µm) were purchased from Pall Corporation (Ann Arbor, MI). Deionized water (18 MΩ cm) was prepared using a Nanopure Ultrapure Water System from Barnstead (Dubuque, IA).

2.2.2 Preparation of Formaldehyde Solution

A 4% (w/v) (1.3 M) formaldehyde stock solution was prepared by heating (80 ℃) PFA in phosphate buffered saline (PBS) at pH 7.5 for 30 min, cooling to room temperature and filtering through a 0.22 µm filter.

2.2.3 Cross-linking of the Model Protein

The model protein, insulin (300 µM), was incubated with formaldehyde (1%, w/v) in PBS (37 ℃, pH 7.5) for 0, 0.5, 2 and 6 hr. The reaction was quenched by the addition of 1 M Tris buffer (pH 7.5) to a final Tris concentration of 0.5 M.
The 0 hr sample was prepared by quenching 1% formaldehyde with 1 M Tris buffer 10 minutes before the addition of protein. Control samples of all four time points were prepared by replacing the formaldehyde volume with PBS. Three repeats of the reaction were performed, and all of the following experimental steps were applied to each sample. The model protein was concentrated with 3 kDa MW-cut-off filters, and the buffer was replaced with 0.01% formic acid in water.

2.2.4 SDS-PAGE Analysis of Cross-linked Insulin

Insulin samples (8.6 µg) in 0.01% formic acid were mixed with PBS and 4× non-reducing SDS loading buffer (500 mM Tris pH 6.8, 8% SDS, 40% glycerol, 5 mg/mL bromophenol blue) with a final pH of 7.5, and incubated at 65 ℃ for 5 min. Proteins were then separated on a 15% acrylamide gel and visualized by Coomassie Brilliant Blue R250.

2.2.5 Glu-C Digestion of Cross-linked Insulin

Insulin in 0.01% formic acid was digested overnight at 25 ℃ in 50 mM ammonium bicarbonate (pH 7.8) with endoproteinase Glu-C (enzyme:substrate = 1:20 (w/w)). The digestion was quenched by decreasing the pH using 5% formic acid in water (Vdigestion:Vacid = 10:1), and samples were stored at -20 ℃.

2.2.6 Mass Spectrometric Analysis of Peptides

Peptide samples were diluted in water to 3 µM, then separated and analyzed by nano-HPLC MS and MS/MS on a nanospray-ESI-Q-TOF (QStar XL, Applied Biosystems, Foster City, CA) in the information-dependent-acquisition mode. The 15 cm long, 75 µm I.D. HPLC column was lab-made, packed with 3 µm reverse phase C18 beads (Dr. Maisch, Ammerbuch-Entringen, Germany). Water:acetonitrile:formic acid with a 100 min gradient elution (from 0.1% formic acid to 80% acetonitrile with 0.1% formic acid) was used as the mobile phase. MS/MS spectra were collected with nitrogen as the collision gas, and the collision energy varied as an optimized function of m/z and z.
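The concentrations quoted in Sections 2.2.2 and 2.2.3 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes a formaldehyde molar mass of 30.03 g/mol and, for the quench step, that nothing but the 1 M Tris stock is added, so that reaching 0.5 M Tris implies a two-fold dilution of the reaction mixture; that dilution factor is an inference and is not stated in the protocol.

```python
FORMALDEHYDE_MW = 30.03  # g/mol for CH2O


def percent_wv_to_molar(percent_wv, mw):
    """Convert a % (w/v) concentration (g per 100 mL) to mol/L."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / mw


stock_molar = percent_wv_to_molar(4.0, FORMALDEHYDE_MW)    # stock, Section 2.2.2
working_molar = percent_wv_to_molar(1.0, FORMALDEHYDE_MW)  # reaction, Section 2.2.3
insulin_molar = 300e-6                                     # 300 µM insulin

# Molar excess of formaldehyde over insulin in the reaction mixture.
molar_excess = working_molar / insulin_molar

# Assumption: quenching adds only the 1 M Tris stock, so a final Tris
# concentration of 0.5 M means formaldehyde is diluted two-fold.
dilution = 0.5 / 1.0
quenched_formaldehyde = working_molar * dilution
tris_excess = 0.5 / quenched_formaldehyde

print(f"stock: {stock_molar:.2f} M, working: {working_molar:.2f} M")
print(f"formaldehyde:insulin = {molar_excess:.0f}:1")
print(f"Tris:formaldehyde after quench = {tris_excess:.1f}:1")
```

Under these assumptions the 4% stock comes out close to the 1.3 M stated in Section 2.2.2, and formaldehyde is present in roughly a thousand-fold molar excess over insulin during the reaction.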
2.2.7 Labeling of MS/MS Spectra

To label MS/MS spectra of candidates of cross-linked peptides, the widely accepted b and y nomenclature system was modified to identify component peptides and disulfide bonds. Component peptides of each precursor ion were labeled by Roman numerals, I, II, III etc. Disulfide bonds were represented by “-”, for example, I-II. The cross-link bridge and modifications are represented by their mass, 12 or 30. As an example of this modified nomenclature system, I-IIy5+12+III represents the following fragment ion:

2.3 Results and Discussion

2.3.1 Cross-linking of Insulin

At the beginning of the cross-linking experiments, insulin was incubated with or without formaldehyde (control) for 0, 0.5, 2 and 6 hours, and quenched with Tris buffer. These samples were separated by SDS-PAGE with a 15% gel (Figure 2-2) to monitor the reactions.

Figure 2-2. The SDS-PAGE gel of 300 µM insulin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.

In the gel, control samples remained unchanged when incubated for different periods of time. As the reaction between insulin and formaldehyde proceeded from 0 to 6 hr, higher mass species appeared and made up a larger proportion of the sample. This indicated the formation of cross-links between insulin molecules and an increase in yield with longer formaldehyde treatment. The majority of the protein in the formaldehyde treated 6 hr sample was cross-linked. This sample was selected for identification of cross-linked peptides after digestion due to its high yield of cross-links.

2.3.2 Complexity of Insulin Peptide Mixture

The formaldehyde treated samples were digested and analyzed by LC-MS/MS. The LC-MS data of the 6 hr sample is presented as a 3D plot (Figure 2-3a) where the X axis is the LC retention time, the Y axis is the m/z ratio, and grayscale represents the intensity of peptide signals.
The number of signals demonstrated the complexity of the peptide mixture from formaldehyde treated insulin. Some of the signals could be clearly assigned to unmodified insulin peptides by mass and MS/MS spectra. Five signals, 417.23¹⁺, 543.76²⁺, 689.28²⁺, 731.85⁴⁺ and 975.46³⁺, were assigned to insulin peptides α1-4, β22-30, α18-21β14-21, α5-17β1-13 and α5-17β1-13, respectively. For easier visualization of MS signals, zoom-ins of regions around these signals are shown in Figure 2-3b, with these five signals from unmodified insulin peptides highlighted by solid circles. Since formaldehyde induced modifications cause a (12m+30n) Da mass shift (m and n are zero or positive integers and cannot both be zero), signals that show such mass shifts relative to those five signals could be considered as derived from modified insulin peptides. LC retention times similar to those of the respective unmodified peptides, or MS/MS spectra, further supported that they were the corresponding modified forms. Signals assigned to modified insulin peptides are highlighted by dashed circles in Figure 2-3b, including those roughly assigned by mass only.

A large number of signals in Figure 2-3b could not be assigned to unmodified or modified insulin peptides. Nor were any of the signals in Figure 2-3a outside of the zoom-in regions from unmodified or modified insulin peptides. Possible source peptides of these unknown signals are: cross-linked insulin peptides, Glu-C autolysis products, digested impurities of the insulin sample, oxidized peptides and common contaminants in MS spectra of protein digests [108]. In theory, source peptides of unknown MS signals, including cross-linked peptides, could be sequenced and identified by MS/MS spectra.
However, it is not feasible to identify cross-linked peptides by determining source peptides of all the unknown signals, for three reasons: there are a large number of unknown signals; most unknown signals are of low abundance; and little is known about CID fragmentation patterns of formaldehyde cross-linked peptides. Therefore, the large pool of unknown signals needs to be reduced to a smaller pool of candidates of cross-linked peptides.

Figure 2-3. (a) The 3D plot (LC retention time, m/z, signal intensity represented by grayscale) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr sample. (b) Zoom-ins of regions (i), (ii), (iii), (iv) and (v) in (a). Circled signals in (i), (ii), (iii), (iv) and (v) are assigned to unmodified (solid circles) and modified (dashed circles) insulin peptides α1-4, β22-30, α18-21β14-21, α5-17β1-13 and α5-17β1-13.

2.3.3 Candidates of Cross-linked Peptides

Candidates of cross-linked peptides can be identified by matching unknown experimental MS signals to a theoretical library of possible cross-linked peptides. The theoretical list of cross-linked peptides is made by considering all of the observed modified and unmodified insulin peptides as possible component peptides and combining them pairwise. Table 2-1 shows the combinatory table of insulin peptides in the Glu-C digest of formaldehyde treated insulin. The first column/row lists the four insulin peptides and their modified forms assigned to the circled signals in Figure 2-3b, which are possible component peptides in this sample. The second column/row lists the masses of these peptides. These masses are summed pairwise to make the rest of the table, i.e. a theoretical mass list of possible cross-linked peptides in this insulin sample, as the cross-linking step does not introduce any further mass shift relative to the component peptides.

Table 2-1.
The theoretical mass list of possible cross-linked peptides in the digest of formaldehyde treated 6 hr sample, made by considering unmodified and modified insulin peptides as possible component peptides and summing up their masses one by one. Masses in bold are used as examples to illustrate proposed structural components of putative cross-linked peptides. Underlined masses are those of candidates of cross-linked peptides, identified by matching this table with the masses of unknown signals in Figure 2-3a. “Mod” is short for modification(s).

| Origin | Mass | α1-4 | α1-4 +1mod | α1-4 +2mod | α1-4 +2mod | β22-30 | β22-30 +1mod | β22-30 +1mod | β22-30 +2mod | β22-30 +3mod | α18-21 β14-21 | α18-21 β14-21 +1mod | α18-21 β14-21 +1mod | α5-17 β1-13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mass | | 416.23 | 416.23 +12 | 416.23 +24 | 416.23 +42 | 1085.57 | 1085.57 +12 | 1085.57 +30 | 1085.57 +24 | 1085.57 +54 | 1376.61 | 1376.61 +12 | 1376.61 +30 | 2923.33 |
| α1-4 | 416.23 | 832.46 | | | | | | | | | | | | |
| α1-4+1mod | 416.23+12 | 844.46 | 856.46 | | | | | | | | | | | |
| α1-4+2mod | 416.23+24 | 856.46 | 868.46 | 880.46 | | | | | | | | | | |
| α1-4+2mod | 416.23+42 | 874.48 | 886.48 | 898.48 | 916.5 | | | | | | | | | |
| β22-30 | 1085.57 | 1501.8 | 1513.8 | 1525.8 | 1543.82 | 2171.14 | | | | | | | | |
| β22-30+1mod | 1085.57+12 | 1513.8 | 1525.8 | 1537.8 | 1555.82 | 2183.14 | 2195.14 | | | | | | | |
| β22-30+1mod | 1085.57+30 | 1531.82 | 1543.82 | 1555.82 | 1573.84 | 2201.16 | 2213.16 | 2231.18 | | | | | | |
| β22-30+2mod | 1085.57+24 | 1525.8 | 1537.8 | 1549.8 | 1567.82 | 2195.14 | 2207.14 | 2225.16 | 2219.14 | | | | | |
| β22-30+3mod | 1085.57+54 | 1555.82 | 1567.82 | 1579.82 | 1597.84 | 2225.16 | 2237.16 | 2255.18 | 2249.16 | 2279.18 | | | | |
| α18-21β14-21 | 1376.61 | 1792.84 | 1804.84 | 1816.84 | 1834.86 | 2462.18 | 2474.18 | 2492.2 | 2486.18 | 2516.2 | 2753.22 | | | |
| α18-21β14-21+1mod | 1376.61+12 | 1804.84 | 1816.84 | 1828.84 | 1846.86 | 2474.18 | 2486.18 | 2504.2 | 2498.18 | 2528.2 | 2765.22 | 2777.22 | | |
| α18-21β14-21+1mod | 1376.61+30 | 1822.86 | 1834.86 | 1846.86 | 1864.88 | 2492.2 | 2504.2 | 2522.22 | 2516.2 | 2546.22 | 2783.24 | 2795.24 | 2813.26 | |
| α5-17β1-13 | 2923.33 | 3339.56 | 3351.56 | 3363.56 | 3381.58 | 4008.9 | 4020.9 | 4038.92 | 4032.9 | 4062.92 | 4299.94 | 4311.94 | 4329.96 | 5846.66 |

The table proposes masses of possible cross-linked peptides by
combining possible component peptides, so it implies the structural components of the predicted cross-linked peptides. Three masses in Table 2-1 are highlighted in bold as examples to illustrate this. The mass 844.46 is in the column of α1-4 and the row of α1-4+1mod (12 Da) (“mod” is the abbreviation of modification), which means that a putative cross-linked peptide of this mass would consist of two α1-4 peptides and the cross-link bridge (12 Da) formed from the 12 Da modification. The mass 856.46 is in the column of α1-4 and the row of α1-4+2mod (24 Da), indicating that the component structures are two α1-4 peptides, the cross-link bridge formed from one 12 Da modification, and an extra 12 Da structure formed from another 12 Da modification. This extra 12 Da structure can be the original 12 Da Schiff-base modification, an intra-peptide cross-link bridge on one component peptide, or a second cross-link bridge between the two component peptides. A putative cross-linked peptide of m=874.48 Da, in the column of α1-4 and the row of α1-4+2mod (42 Da), is proposed to contain two α1-4 peptides, a cross-link bridge formed from the 12 Da modification, and a 30 Da modification on one component peptide.

This theoretical mass list and the masses of unknown signals in Figure 2-3a were matched, and background signals that also appeared in the control samples were eliminated, generating a small pool of six candidates of cross-linked peptides (masses underlined in Table 2-1). Their masses, m/z of MS signals, and proposed structural components are listed in Table 2-2. Later, these candidates are referred to by their MS signals: 505.61³⁺, 757.88²⁺, 763.91²⁺, 602.62³⁺, 903.42²⁺ and 549.79⁴⁺.

Table 2-2. The m/z value, mass and proposed structural components of candidates of cross-linked peptides in the digest of the formaldehyde treated 6 hr insulin sample.
The ^ represents the cross-link bridge, while * represents an extra Schiff-base modification or intra-peptide cross-link on one of the component peptides, or a second cross-link bridge between two peptides.

m/z                     Mass (Da)   Proposed structural components
505.61³⁺, 757.88²⁺      1513.80     β22RGFFYTPKAβ30 ^ α1GIVEα4
763.91²⁺                1525.80     (β22RGFFYTPKAβ30 ^ α1GIVEα4)*
602.62³⁺, 903.42²⁺      1804.80     (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4
549.79⁴⁺                2195.14     (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*

2.3.4 Verification of the Candidate 505.61³⁺ by the MS/MS Spectrum

MS/MS spectra of the six candidates were collected to verify the proposed structural components, including component peptides, cross-links and modifications. More specifically, fragment ions were proposed based on fragmentations along the peptide bonds of the proposed component peptides and at the cross-link bridge, and matched with signals in the MS/MS spectra. A series of matches between theoretical fragment ions and experimental signals could indicate the correctness of the proposed structural components.

The interpretation of MS/MS spectra is illustrated by the spectrum of the candidate 505.61³⁺ (Figure 2-4). This candidate is proposed to be insulin peptide I, β22RGFFYTPKAβ30, cross-linked with peptide II, α1GIVEα4. Firstly, fragmentation at the cross-link bridge could generate pairs of ions, each ion being one intact component peptide with or without the cross-link bridge: I and II+12, or I+12 and II. The masses of these four proposed fragment ions were compared with the masses of MS/MS signals. Signals 417.25¹⁺ and 549.80²⁺ matched II and I+12, while no matching signals for I and II+12 were found. This indicated that the source peptide of the signal 505.61³⁺ contained two parts whose masses matched those of peptide II and of peptide I with a cross-link bridge, respectively.

Figure 2-4. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 505.61³⁺.
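The type 1 check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the proton mass constant, the 0.05 Da matching tolerance and the helper names `neutral_mass` and `match_ions` are not taken from the thesis; the component masses, the 12 Da bridge and the two MS/MS signals are the values quoted in the text.

```python
PROTON = 1.00728  # mass of a proton in Da (assumed constant)

def neutral_mass(mz, z):
    """Neutral mass of a species observed as an [M + zH]z+ ion."""
    return (mz - PROTON) * z

# Values quoted in the text (Da).
MASS_I = 1085.57   # peptide I, beta22-30
MASS_II = 416.23   # peptide II, alpha1-4
BRIDGE = 12.00     # mass shift of the cross-link bridge

# The four proposed type 1 ions: each component with or without the bridge.
type1_ions = {
    "I": MASS_I,
    "II+12": MASS_II + BRIDGE,
    "I+12": MASS_I + BRIDGE,
    "II": MASS_II,
}

# MS/MS signals quoted in the text, as (m/z, charge).
signals = [(417.25, 1), (549.80, 2)]

def match_ions(ions, observed, tol=0.05):
    """Map each proposed ion to the first signal within tol Da (assumed tolerance)."""
    hits = {}
    for name, mass in ions.items():
        for mz, z in observed:
            if abs(neutral_mass(mz, z) - mass) <= tol:
                hits[name] = (mz, z)
                break
    return hits

precursor_mass = neutral_mass(505.61, 3)   # candidate 505.61 (3+)
hits = match_ions(type1_ions, signals)
print(round(precursor_mass, 2), sorted(hits))
```

With these inputs the precursor mass agrees with I+12+II, and only the pair I+12 and II finds matching signals, which is the situation described for this candidate.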
Secondly, fragmentation along the backbones of the proposed component peptides could generate series of ions consisting of one component peptide cross-linked with part of the other, and series of their counterparts. Examples of this type of proposed fragment ions, generated by fragmentation at each peptide bond along peptide II, are shown in Table 2-3. By assuming fragmentation at each peptide bond along peptide I, a similar table was generated. The masses of proposed fragment ions in both tables were compared to the masses of MS/MS signals, and three pairs of matches were identified: I+12+IIb1 and IIy3, I+12+IIb2 and IIy2, I+12+IIb3 and IIy1. Mass differences between these signals and the previous two signals, I+12 and II, confirmed that the source peptide contained a part with a sequence of GIVE, which matched the sequence of the proposed component peptide II.

Table 2-3. Proposed fragment ions, their masses and the matching MS/MS signals, derived by assuming fragmentations along the backbone of the proposed component peptide II of the candidate 505.61³⁺ (its MS/MS spectrum is shown in Figure 2-4).

Proposed Fragment Ion    Mass (Da)   Matching MS/MS Signal
I+12+IIb1                1154.59     578.29²⁺
IIy3                     359.21      360.21¹⁺
I+12+IIb2                1267.67     634.86²⁺
IIy2                     246.13      247.12¹⁺
I+12+IIb3                1366.73     684.39²⁺
IIy1                     147.07      148.12¹⁺
I+12+IIy1                1244.62     N/A
IIb3                     269.18      270.18¹⁺
I+12+IIy2                1343.69     N/A
IIb2                     170.11      171.10¹⁺
I+12+IIy3                1456.77     N/A
IIb1 (Immonium Ion)      29.03       N/A

Finally, as internal ions from multiple fragmentations are common in CID fragmentation, fragmentations at both the cross-link and the peptide backbones could generate b and y ions of peptides I and II with or without the cross-link bridge. This type of proposed fragment ions matched 18 MS/MS signals: Ib2, Ib3, Ib4, Ib5+12, Ib6+12, Ib7+12, Ib8+12, Iy1, Iy2, Iy3, Iy4, Iy6+12, Iy8+12, IIb2, IIb3, IIy1, IIy2 and IIy3.
These b and y ion series confirmed that the two parts revealed by signals I+12 and II had the sequences RGFFYTPKA and GIVE, the same as the sequences of the proposed component peptides I and II.

All three types of information together confirmed that the source peptide of signal 505.61³⁺ contained insulin peptides β22RGFFYTPKAβ30 and α1GIVEα4, and a cross-link bridge. This candidate was therefore verified to be a cross-linked peptide. This cross-linked peptide is later referred to as β22RGFFYTPKAβ30 ^ α1GIVEα4, using the symbol ^ to represent the cross-link bridge.

Fragmentations on this cross-linked peptide and the resulting fragment ions can be classified into three types (Figure 2-5). Type 1 ions are from fragmentation at the cross-link bridge, which are pairs of ions of component peptides, one of each pair with the methylene bridge attached. Type 2 ions are from fragmentation in the backbones of component peptides, which are series of ions that consist of one component peptide cross-linked to part of the other (of different lengths), and another ion series which are their counterparts. Two fragmentations, one in a peptide backbone and one at the cross-link bridge, produce type 3 ions, the b and y ion series of component peptides. These fragmentations and fragment ions are similar to those of peptides cross-linked by other cross-linkers⁶⁷, supporting my conclusion that candidate 505.61³⁺ is a cross-linked peptide. These types of fragmentations and fragment ions are later examined in other candidates of cross-linked peptides.

Figure 2-5. The MS/MS fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 505.61³⁺ (Figure 2-4), to illustrate 3 types of fragment ion series.
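As an illustration of how type 2 ions can be proposed, the sketch below enumerates backbone cleavages along peptide II (GIVE) while peptide I and the bridge stay attached to the N-terminal fragment. It is a minimal sketch under assumptions: the monoisotopic residue masses and the water mass are standard values that do not appear in the thesis, the function name `type2_pairs` is hypothetical, and fragments are treated as neutral masses so that each pair sums to the precursor mass.

```python
# Standard monoisotopic residue masses (Da); assumed values, not from the thesis.
RESIDUE = {"G": 57.02146, "I": 113.08406, "V": 99.06841, "E": 129.04259}
WATER = 18.01056

MASS_I = 1085.57   # peptide I (beta22-30), value used in the text
BRIDGE = 12.00     # cross-link bridge mass shift

def type2_pairs(seq, partner_mass, bridge=BRIDGE):
    """Propose type 2 ion pairs for cleavage at each peptide bond of seq.

    Assumes the bridge sits on the N-terminal residue of seq, so the b-type
    fragment keeps the partner peptide and the y-type fragment is released.
    Returns (linked_label, linked_mass, free_label, free_mass) tuples.
    """
    pairs = []
    for n in range(1, len(seq)):
        b_part = sum(RESIDUE[aa] for aa in seq[:n])          # N-terminal piece
        y_part = sum(RESIDUE[aa] for aa in seq[n:]) + WATER  # C-terminal piece
        pairs.append((f"I+12+IIb{n}", partner_mass + bridge + b_part,
                      f"IIy{len(seq) - n}", y_part))
    return pairs

pairs = type2_pairs("GIVE", MASS_I)
for linked, m_linked, free, m_free in pairs:
    print(f"{linked}: {m_linked:.2f}   {free}: {m_free:.2f}")
```

The masses produced this way are close to those listed for the matched pairs in Table 2-3, and each pair adds up to the mass of the intact cross-linked peptide, which is a convenient sanity check when proposing fragment ions by hand.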
2.3.5 Verification of Candidates 903.42²⁺ and 602.62³⁺ by MS/MS Spectra

Following this general approach, the MS/MS spectrum of the candidate 903.42²⁺ (Figure 2-6) was also interpreted by matching signals to proposed fragmentations and fragment ions (matches shown in Figure 2-7). Candidate 903.42²⁺ was proposed to be the disulfide bond connected peptide I-II, α18NYCNα21-β14ALYLVCGEβ21, cross-linked with peptide III, α1GIVEα4, as shown in Table 2-2. Fragmentation at the cross-link bridge could generate type 1 ions, I-II and III+12, or I-II+12 and III. The masses of two MS/MS signals, 417.24¹⁺ and 1389.62¹⁺, matched the masses of proposed fragment ions III and I-II+12. A list of type 2 ions was derived by assuming fragmentations along the proposed component peptides I, II and III. There were eight MS/MS signals that matched the proposed type 2 ions from fragmentations along the backbone of peptide II: IIb2 and I-IIy6+12+III, IIb3 and I-IIy5+12+III, IIb4 and I-IIy4+12+III, IIb5 and I-IIy3+12+III. There were 15 MS/MS signals that matched the proposed type 3 ions; their corresponding fragmentation sites are shown in Figure 2-7. Examples are signals 830.27¹⁺, 929.34¹⁺ and 1042.43¹⁺, matching the proposed fragment ions I-IIy3+12, I-IIy4+12 and I-IIy5+12, generated by fragmentation at both the backbone of peptide I-II and the cross-link bridge. As with the verification of the candidate 505.61³⁺, three types of proposed fragment ions together confirmed that the source peptide of signal 903.42²⁺ contained insulin peptides α18NYCNα21-β14ALYLVCGEβ21 and α1GIVEα4 and a cross-link bridge, and was therefore a cross-linked peptide.

Figure 2-6. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 903.42²⁺.

Figure 2-7. The fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 903.42²⁺ (Figure 2-6).
The candidate 602.62³⁺ (Figure 2-8) has the same mass and LC retention time as the candidate 903.42²⁺, suggesting that they are the same peptide. Therefore, the proposed fragment ions of candidate 602.62³⁺ were the same as those derived for candidate 903.42²⁺. Signals in the MS/MS spectrum of candidate 602.62³⁺ (Figure 2-8b) that matched proposed fragment ions (matches are shown in Figure 2-8a) contained all 3 types of fragment ions, which together confirmed that the candidate was a cross-linked insulin peptide, (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4.

Figure 2-8. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 602.62³⁺.

Furthermore, most of the assigned MS/MS signals and corresponding fragmentation sites on the source peptides were the same between candidates 602.62³⁺ and 903.42²⁺, supporting the hypothesis that 602.62³⁺ and 903.42²⁺ represented different charge states of the same cross-linked peptide. In terms of the CID fragmentation process, different charge states could cause different charge distributions along the peptide backbones on different protonation sites. However, the fragmentation sites along the peptide backbones were not significantly affected by the different charge distribution between these two charge states, consistent with the general observation that the singly, doubly or triply protonated forms of the same peptide often produce similar fragmentation patterns¹⁸,¹⁰⁹. In both charge states, no MS/MS signal was matched to a proposed fragment ion from fragmentation at the disulfide bond, suggesting that the disulfide bond was not a preferential fragmentation site. This produced fragment ions that contained up to three peptide chains, adding complexity to proposing fragment ions and to the spectrum interpretation.
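The statement that two signals share the same mass can be checked with a short calculation. The sketch below is illustrative only: the proton mass, the 0.05 Da tolerance and the function names are assumptions, and the LC retention time criterion used in the text is not modelled because no retention times are quoted.

```python
PROTON = 1.00728  # mass of a proton in Da (assumed constant)

def neutral_mass(mz, z):
    """Neutral mass of a species observed as an [M + zH]z+ ion."""
    return (mz - PROTON) * z

def same_neutral_mass(signal_a, signal_b, tol=0.05):
    """True if two (m/z, charge) signals agree in neutral mass within tol Da."""
    return abs(neutral_mass(*signal_a) - neutral_mass(*signal_b)) <= tol

triply_charged = (602.62, 3)
doubly_charged = (903.42, 2)
other_candidate = (505.61, 3)

print(round(neutral_mass(*triply_charged), 2),
      round(neutral_mass(*doubly_charged), 2))
print(same_neutral_mass(triply_charged, doubly_charged))
print(same_neutral_mass(triply_charged, other_candidate))
```

Both signals deconvolute to roughly 1804.8 Da, whereas the 505.61 signal does not, which is the mass-based part of the argument for treating the two candidates as one peptide in two charge states.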
2.3.6 Verification of Candidates 549.79⁴⁺, 757.88²⁺ and 763.91²⁺ by MS/MS Spectra

The previous three candidates were verified by matching MS/MS signals to putative fragment ions derived from proposed structural components, revealing three types of fragment ions generated by cross-linked peptides. MS/MS spectra of the remaining three candidates, 549.79⁴⁺ (Figure 2-9), 757.88²⁺ and 763.91²⁺ (Figure 2-10), were interpreted in the same way using the same classification of fragment ions.

The MS/MS spectrum of the candidate 549.79⁴⁺ (Figure 2-9b) was found to contain only type 1 and 3 ions; their corresponding fragmentation sites on the peptide are shown in Figure 2-9a. Type 1 ions I+12 and II+12 (corresponding signals are 549.78²⁺ and 1098.46¹⁺) suggested that the precursor ion consisted of two parts of m=1097.56 Da each. This mass equaled that of β22RGFFYTPKAβ30 plus a cross-link bridge or a Schiff-base modification. Type 3 ions, the b and y ion series, confirmed that both parts had a sequence of RGFFYTPKA and a structure of +12 Da mass shift attached. Type 1 and type 3 fragment ions together suggested that the source peptide of the signal 549.79⁴⁺ was (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*. Here, the symbol * represents an extra modification or intra-peptide cross-link on one component peptide, or an extra inter-peptide cross-link.

Figure 2-9. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 549.79⁴⁺. The * represents an extra Schiff-base modification or cross-link.

The candidate 757.88²⁺ is proposed to be peptide β22RGFFYTPKAβ30 cross-linked with α1GIVEα4 (Table 2-2). The MS/MS spectrum (Figure 2-10a) contains signals from all three types of ions. Type 1 ion I (corresponding signals are 543.71²⁺ and 1086.48¹⁺) indicated that the peptide consisted of a component of m=1085.48 Da, the same mass as the proposed component peptide I.
Observed type 2 ions were IIy3 and I+12+IIb1, IIy2 and I+12+IIb2, IIy1 and I+12+IIb3. The mass differences between type 1 and 2 ions, and the precursor, I+12+II, suggested a peptide segment of GIVE attached to the part of m=1085.48 Da via a 12 Da structure. Type 1 and 2 ions together suggested that the candidate consisted of a part of m=1085.48 Da, possibly β22RGFFYTPKAβ30, cross-linked to α1GIVEα4. However, there were few type 3 ions, not enough to prove that the part of m=1085.48 Da has a sequence of RGFFYTPKA. Therefore, the candidate 757.88²⁺ was not confirmed to be a cross-linked peptide, because not all the proposed structural components were confirmed by the MS/MS spectrum, but it is still a likely candidate.

The MS/MS spectrum of candidate 763.91²⁺ (Figure 2-10b) presents a very similar situation to that of candidate 757.88²⁺. The candidate is proposed to be β22RGFFYTPKAβ30 cross-linked with α1GIVEα4, with an extra modification or cross-link (Table 2-2). The MS/MS spectrum (Figure 2-10b) contained signals from type 1 and 2 ions that suggested that the candidate consisted of a component of m=1097.60 Da, possibly β22RGFFYTPKAβ30 with an extra modification or cross-link, cross-linked to α1GIVEα4. However, there were too few type 3 ions to sequence the part of m=1097.60 Da. Therefore, the candidate 763.91²⁺ was not confirmed to be a cross-linked peptide, but it is still a likely candidate.

Figure 2-10. The fragmentation patterns, types of fragment ions and MS/MS spectra of candidates (a) 757.88²⁺ and (b) 763.91²⁺. The * represents an extra Schiff-base modification or cross-link.

Taking the MS/MS spectra of candidates 549.79⁴⁺, 757.88²⁺ and 763.91²⁺ together, the absence or a limited number of one type of fragment ions did not necessarily affect the confirmation of proposed structural components.
Type 1 and type 3 ions in the MS/MS spectrum of candidate 549.79⁴⁺, with the absence of type 2 ions, were enough to confirm the sequences of the proposed component peptides. Although candidates 757.88²⁺ and 763.91²⁺ contained all three types of fragment ions, the lack of type 3 ions from one proposed component peptide obstructed the confirmation of its sequence. Therefore, these candidates were possibly cross-linked peptides, but not confirmed. My skeptical attitude towards these candidates is supported by the fact that a lack of type 3 ions from one component peptide has been shown to be a common source of false positive assignments of cross-linked peptides from other cross-linkers⁶⁷,¹¹⁰.

2.3.7 Partial Stability of Cross-link Bridges in CID Fragmentation

The formaldehyde induced cross-links were considered to be vulnerable to CID fragmentation, as the cross-link between glycine and a peptide was reported to break easily in one study¹⁰². However, the MS/MS spectra of cross-linked peptides β22RGFFYTPKAβ30 ^ α1GIVEα4 and (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4 (Figures 2-4, 2-6 and 2-8b) showed fragmentation along the backbone of insulin peptide α1GIVEα4 or α18NYCNα21-β14ALYLVCGEβ21 without affecting the cross-link bridge, which generated type 2 ions. Therefore, formaldehyde induced cross-links are shown to be more stable than previously hypothesized. At the same time, fragmentation also occurred at the cross-link bridge, producing type 1 and 3 ions in the MS/MS spectra of cross-linked peptides β22RGFFYTPKAβ30 ^ α1GIVEα4, (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4 and (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* (Figures 2-4, 2-6, 2-8b and 2-9b). In conclusion, formaldehyde induced cross-links between peptides are less vulnerable to CID fragmentation than previously predicted but not absolutely stable. How this partial stability facilitates the localization of cross-links on the component peptides is discussed in the following section.
2.3.8 Localization of Cross-link Sites

MS/MS spectra have been used to localize modifications based on the mass shifts they produce on series of ions that contain modified residues²¹⁻²³. Therefore, it is hypothesized that in the MS/MS spectrum of a cross-linked peptide, series of fragment ions that contain the cross-link bridge (+12 Da mass shift) could be used to localize the cross-link on component peptides.

The localization of cross-link sites is illustrated with the MS/MS spectrum of the cross-linked peptide β22RGFFYTPKAβ30 ^ α1GIVEα4 (Figure 2-4). In the following discussion, peptide I represents β22RGFFYTPKAβ30, while peptide II represents α1GIVEα4. The mass differences between the ion series I+12, I+12+IIb1, I+12+IIb2, I+12+IIb3 and the precursor ion I+12+II indicated that a peptide segment of GIVE was attached to peptide I, with a cross-link bridge of +12 Da mass shift. In other words, β22RGFFYTPKAβ30 was cross-linked to α1GIVEα4 at the Gα1 residue. The cross-link site on β22RGFFYTPKAβ30 was localized to Yβ26, as suggested by the following conclusions derived from the ion series Ib2, Ib3, Ib4, Ib5+12, Ib6+12, Ib7+12, Ib8+12 and I+12. Specifically, Ib2, Ib3 and Ib4 suggested that the cross-link bridge was not attached to β22RGFFβ25. The mass difference between Ib4 (508.28¹⁺) and Ib5+12 (683.37¹⁺), 175.09 Da, indicated that the cross-link bridge of +12 Da mass shift was attached to Yβ26. Additionally, the ion series Ib5+12, Ib6+12, Ib7+12, Ib8+12 and I+12 indicated that the cross-link bridge was attached to β22RGFFYβ26 and not to β27TPKAβ30. Together these indicate that Yβ26 was the cross-link site on β22RGFFYTPKAβ30. The y ion series of β22RGFFYTPKAβ30 localized the cross-link bridge in the same way.
The ion series Iy1, Iy2, Iy3, Iy4, Iy6+12, Iy8+12 and I+12 indicated that the cross-link bridge was on β25FYβ26 and not on β22RGFβ24 or β27TPKAβ30, supporting the identification of Yβ26 as the cross-link site by the b ion series. Consequently, the cross-linked peptide β22RGFFYTPKAβ30 ^ α1GIVEα4 was formed by Gα1 cross-linked to Yβ26.

The cross-link sites in the cross-linked peptide (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4 were also localized by fragment ions containing the cross-link bridge. In the following discussion, peptide I-II represents α18NYCNα21-β14ALYLVCGEβ21, while peptide III represents α1GIVEα4. In the MS/MS spectrum of the doubly charged form (Figure 2-6), I-IIy3+12+III, I-IIy4+12+III, I-IIy5+12+III and I-IIy6+12+III indicated that peptide III was cross-linked to the I-IIy3 segment of peptide I-II. The series of y ions of peptide I-II with the cross-link attached, I-IIy3+12, I-IIy4+12 and I-IIy5+12, also localized the cross-link to the I-IIy3 segment of peptide I-II. The ion series IIb2, IIb3, IIb4, IIb5, IIb6 and I-II+12 suggested that the cross-link bridge was not attached to the IIb6 segment but to the I-IIy3 segment. This evidence, taken together, showed that the cross-linked peptide (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4 was formed by α1GIVEα4 (III) cross-linked to the α18NYCNα21-β19CGEβ21 (I-IIy3) segment. The α18NYCNα21-β19CGEβ21 segment was also assigned as the cross-link site by similar series of fragment ions from the triply charged form (MS/MS spectrum shown in Figure 2-8b).

Localization of the cross-link bridge in the cross-linked peptide (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* (Figure 2-9) is more complicated. Because the two component peptides are the same, and both the cross-link bridge and the extra modification/cross-link induce a mass shift of +12 Da, it is difficult to discriminate which structure is attached to which component peptide based on the b/y ion series.
The detailed discussion of the extra structure and possible cross-link sites, based on additional information, is provided in Chapter 3 instead.

In these cross-linked peptides, cross-link sites were localized by series of type 2 ions that contained one component peptide cross-linked to part of the other (of different lengths), and by type 3 ion series that were b/y ions of component peptides with the cross-link bridge attached. Type 2 and type 3 ions, which provided complementary information about the cross-link sites, were produced by the cross-link bridge staying intact or breaking, respectively. Therefore, the partial stability of cross-link bridges in CID fragmentation facilitated the localization of cross-link sites. In addition, the localization of cross-link sites in peptide β22RGFFYTPKAβ30 ^ α1GIVEα4 revealed a Gα1 (N-terminus) to Yβ26 cross-link in proteins, consistent with the high reactivity of the N-terminus in the modification step, and the potential reactivity of Tyr (Y) in the cross-linking step¹⁰²,¹⁰⁴.

2.4 Conclusions and Outlook

In this chapter, cross-linked peptides have been studied in a model protein to gain knowledge about the MS properties of cross-linked peptides and the reaction chemistry of formaldehyde cross-linking.

As a first step, a method was developed to identify cross-linked peptides in model protein systems. Matches between a theoretical list of cross-linked peptides and experimental MS signals, with background signals subtracted, were considered as candidates for cross-linked peptides. Signals in the MS/MS spectra of a candidate were matched with proposed structural components of the candidate, and the confirmation of all of the proposed structural components then verified the candidate to be a cross-linked peptide.

The CID fragmentation of a formaldehyde cross-linked peptide proved to occur at both the cross-link bridge and peptide backbones, generating three types of fragment ions: 1.
Ions of component peptides; 2. Series of ions that contain one component peptide cross-linked to part of the other, and another ion series which are their counterparts; 3. The b and y ion series of component peptides, with or without the cross-link bridge. Type 2 and type 3 ions allowed localization of cross-link sites on component peptides.

The whole approach allowed the identification of three cross-linked peptides in the insulin model protein system: β22RGFFYTPKAβ30 ^ α1GIVEα4, (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4 and (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*. Also, series of type 2 and type 3 ions allowed the localization of cross-link sites to Gα1 and Yβ26 in β22RGFFYTPKAβ30 ^ α1GIVEα4, and to the α18NYCNα21-β19CGEβ21 segment in (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4.

This method is generally applicable to other model proteins, to identify candidates of formaldehyde cross-linked peptides, to interpret the MS/MS spectra based on proposed structural components, and to localize the cross-link sites. The interpretation of the MS/MS spectra of formaldehyde cross-linked peptides is also applicable to data from cross-linked native proteins, once candidates of cross-linked peptides are found in their much more complex peptide mixtures.

This model study also provided valuable information on the MS properties of cross-linked peptides and the reaction chemistry, which could help to overcome challenges in the identification of cross-linked peptides from cross-linked native proteins. The challenge of identifying low abundance cross-linked peptides in a high complexity peptide mixture could be addressed by a combination of enrichment methods for cross-linked peptides and bioinformatics software to automate the interpretation of complex data. On the one hand, bioinformatics software requires user-input parameters such as a fragmentation model of cross-linked peptides and the residue reactivity of the cross-linker.
CID fragmentation patterns of formaldehyde cross-linked peptides were revealed in this study for the first time, and can therefore be submitted to bioinformatics software. Also, an N-terminus to Tyr (Y) cross-link induced by formaldehyde was revealed in proteins, which can be submitted to the residue reactivity part of bioinformatics software. On the other hand, the identification of three cross-linked peptides in a model protein makes it possible to develop enrichment methods for cross-linked peptides. Different chromatographic strategies that are established for other cross-linkers, such as strong cation exchange (SCX), strong anion exchange (SAX) and size exclusion chromatography (SEC)⁶⁷,⁷⁰⁻⁷², can be adapted to the digest of formaldehyde treated model protein samples to examine whether the signal-to-noise ratio of signals from cross-linked peptides is improved.

The method was developed in a very simple model protein system, and requires verification in other model proteins which could generate larger and more complex peptide mixtures. Moreover, only one pair of reactive residues was revealed in one cross-linked peptide, while cross-link sites in two cross-linked peptides were not localized to individual residues due to the complexity added by disulfide bonds and an extra 12 Da structure. Therefore, investigation of other model proteins and refinement of the method are necessary.

3 Method Refinement Using Other Model Protein Systems

3.1 Introduction

A method to identify formaldehyde cross-linked peptides in the digest of a model protein was developed in Chapter 2. This method allowed the identification of three cross-linked peptides. Moreover, three types of CID fragmentation and resulting fragment ions from cross-linked peptides were revealed: 1. Fragmentation at the cross-link that produces pairs of signals from component peptides; 2.
Fragmentation at peptide backbones that generates a series of ions that contain one component peptide cross-linked to part of the other and another ion series consisting of their counterparts; 3. Fragmentation at both peptide backbones and the cross-link bridge that produces b and y ion series of component peptides. Type 2 and 3 ions helped to localize the cross-link to small regions of component peptides and even individual residues.

To validate my method, its applicability to more complex model systems needs to be tested. In addition, a deeper understanding of the formaldehyde cross-linking reactions could be gained if more cross-linked peptides are identified with cross-links localized to individual residues. In this chapter I apply this method to two other model protein systems, to assess its applicability to model proteins of equivalent and higher complexity, identify more cross-linked peptides, explore the reaction chemistry in more depth, and learn to tackle issues associated with increasing sample complexity.

Insulin, if its disulfide bonds are reduced during Glu-C digestion, produces six peptides: α1-4, α5-17, α18-21, β1-13, β14-21 and β22-30. This makes a well-suited model system to verify the method, because an effective experimental workflow has been established with this protein, but the resulting peptide mixture is different from the one used for method development (Chapter 2).

The method is also applied to a slightly larger model protein, myoglobin. Myoglobin has a molecular weight of 17 kDa, and 14 peptides are generated upon Glu-C digestion. Also, myoglobin has been shown to produce various modified peptides, because it contains 19 lysine residues that are very reactive in the modification step¹⁰⁴. These facts together lead to a much more complex mixture of unmodified, modified and cross-linked peptides than that from the insulin model system, which is suitable for further method verification and refinement.
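The six peptides expected from reduced insulin can be reproduced with a simple in silico digest. The sketch below is an illustration under assumptions: the full bovine insulin chain sequences are written out from the commonly reported sequence (only the fragments GIVE, NYCN, ALYLVCGE and RGFFYTPKA appear in this text), cleavage is assumed to occur C-terminal to every glutamate with no missed cleavages, and the function name `glu_c_digest` is hypothetical.

```python
# Bovine insulin chains (assumed from the commonly reported sequence).
A_CHAIN = "GIVEQCCASVCSLYQLENYCN"            # alpha chain, 21 residues
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"   # beta chain, 30 residues

def glu_c_digest(sequence, chain_label):
    """Cleave C-terminal to every E; return (label, peptide) tuples.

    Labels follow the notation of the text, e.g. 'α1-4'. Assumes complete
    digestion and no cleavage at other residues.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue == "E" or i == len(sequence) - 1:
            label = f"{chain_label}{start + 1}-{i + 1}"
            peptides.append((label, sequence[start:i + 1]))
            start = i + 1
    return peptides

digest = glu_c_digest(A_CHAIN, "α") + glu_c_digest(B_CHAIN, "β")
for label, peptide in digest:
    print(label, peptide)
```

Under these assumptions the digest yields α1-4, α5-17, α18-21, β1-13, β14-21 and β22-30, matching the six peptides listed above; in the non-reduced digest of Chapter 2 the disulfide bonds would instead keep pairs such as α18-21 and β14-21 connected.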
3.2 Experimental

3.2.1 Materials

Insulin from bovine pancreas, myoglobin from horse heart, α-cyano-4-hydroxycinnamic acid (CHCA), Trizma base, ammonium bicarbonate, sodium hydroxide, sodium dodecyl sulfate (SDS), tetramethylethylenediamine (TEMED) and glycerol were all obtained from Sigma (St. Louis, MO). Paraformaldehyde (PFA), formic acid (FA, 88%) and acetonitrile (ACN, HPLC grade) were purchased from Fisher (Fair Lawn, NJ). Acrylamide, ammonium persulfate (APS), bromophenol blue, Coomassie Brilliant Blue R250, and gel casting and running systems were purchased from Bio-Rad (Hercules, CA). Endoproteinase Glu-C was obtained from Roche Applied Science (Penzberg, Germany). 3 kDa MW-cut-off filters were purchased from Millipore Corporation (Cork, Ireland), and syringe filters (0.22 µm) were purchased from Pall Corporation (Ann Arbor, MI). Deionized water (18 MΩ cm) was prepared using a Nanopure Ultrapure Water System from Barnstead (Dubuque, IA).

3.2.2 Preparation of Formaldehyde Solution

A 4% (w/v) (1.3 M) formaldehyde stock solution was prepared by heating (80 ℃) PFA in PBS at pH 7.5 for 30 min, cooling to room temperature and filtering through a 0.22 µm filter.

3.2.3 Cross-linking of the Model Protein

The model protein, insulin (300 µM) or myoglobin (100 µM), was incubated with formaldehyde (1%, w/v) in PBS (37 ℃, pH 7.5) for 0, 0.5, 2 and 6 hr. The reaction was quenched by the addition of 1 M Tris buffer (pH 7.5) to a final Tris concentration of 0.5 M. The 0 hr sample was prepared by quenching 1% formaldehyde with 1 M Tris buffer 10 minutes before the addition of protein. Control samples of all four time points were prepared by replacing the formaldehyde volume with PBS. Three repeats of the reaction were performed, and all of the following experimental steps were applied to each sample. The model protein was concentrated with 3 kDa MW-cut-off filters, and the buffer was replaced with 0.01% formic acid in water.
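The concentrations quoted in this protocol can be cross-checked with a short calculation. This is only a sketch: the molar mass of formaldehyde (30.026 g/mol) is a standard value rather than one given in the text, the 100 µL sample volume is a hypothetical example, and the function names are illustrative.

```python
FORMALDEHYDE_MOLAR_MASS = 30.026  # g/mol for CH2O (assumed standard value)

def percent_wv_to_molar(percent_wv, molar_mass):
    """Convert % (w/v), i.e. grams per 100 mL, to mol/L."""
    return percent_wv * 10.0 / molar_mass

def quench_volume(sample_volume, stock_conc, final_conc):
    """Volume of quencher stock needed to reach final_conc after mixing."""
    return sample_volume * final_conc / (stock_conc - final_conc)

stock_molar = percent_wv_to_molar(4.0, FORMALDEHYDE_MOLAR_MASS)    # 4% stock
working_molar = percent_wv_to_molar(1.0, FORMALDEHYDE_MOLAR_MASS)  # 1% reaction

# Molar excess of formaldehyde over protein in the reaction mixture.
excess_over_insulin = working_molar / 300e-6
excess_over_myoglobin = working_molar / 100e-6

# Tris volume for a hypothetical 100 µL reaction: 1 M stock, 0.5 M final.
tris_volume_ul = quench_volume(100.0, 1.0, 0.5)

print(round(stock_molar, 2), round(working_molar, 3))
print(round(excess_over_insulin), round(excess_over_myoglobin))
print(tris_volume_ul)
```

The 4% stock works out to about 1.3 M, in line with the value stated above, and reaching 0.5 M Tris from a 1 M stock corresponds to adding an equal volume of Tris buffer to the reaction.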
3.2.4 SDS-PAGE Analysis of Cross-linked Insulin

Insulin samples (8.6 µg) in 0.01% formic acid were mixed with PBS and 4× nonreducing SDS loading buffer (500 mM Tris pH 6.8, 8% SDS, 40% glycerol, 5 mg/mL bromophenol blue), and incubated at 65 ℃ for 5 min. Proteins were then separated on a 15% acrylamide gel and visualized by Coomassie Brilliant Blue R250.

3.2.5 Mass Spectrometric Analysis of Cross-linked Myoglobin

Myoglobin samples were mixed with a saturated solution of CHCA (in 50:50 ACN:5% FA) to a final concentration of 5 µM. Each sample was spotted onto a MALDI plate, air dried and analyzed by MALDI-TOF MS (4700 Proteomics Analyzer, Applied Biosystems, Foster City, CA) in linear mode. The centroid mass was recorded.

3.2.6 Glu-C Digestion of Cross-linked Proteins

Proteins, insulin or myoglobin, in 0.01% formic acid were digested overnight at 25 ℃ in 50 mM ammonium bicarbonate (pH 7.8) with endoproteinase Glu-C (enzyme:substrate = 1:20 (w/w)). Disulfide bonds in insulin were reduced by DTT at 56 ℃ for 1 hr, and alkylated by IAA at 25 ℃ for 0.5 hr before addition of the enzyme. The digestion was quenched by decreasing the pH using 5% formic acid in water (Vdigestion:Vacid = 10:1), and samples were stored at -20 ℃.

3.2.7 Preparation of Modified Insulin α-Chain

The formaldehyde treated 0.5 hr and 6 hr insulin samples were reduced by DTT at 56 ℃ for 1 hr, and alkylated by IAA at 25 ℃ for 0.5 hr. Thus insulin α- and β-chains containing modifications were generated.

3.2.8 Mass Spectrometric Analysis of Peptides

Peptide and α-/β-chain samples were diluted in water to 3 µM, then separated and analyzed by nano-HPLC MS and MS/MS on a nanospray-ESI-Q-TOF (QStar XL, Applied Biosystems, Foster City, CA) in the information-dependent-acquisition mode. The 15 cm long, 75 µm I.D. HPLC column was lab-made, packed with 3 µm reverse phase C18 beads (Dr. Maisch, Ammerbuch-Entringen, Germany).
Water:acetonitrile:formic acid with 100 min gradient elution (0.1% formic acid to 80% acetonitrile 0.1% formic acid) was used as the mobile phase. MS/MS spectra were collected with nitrogen as the collision gas, and the collision energy varied as an optimized function of m/z and z.

3.2.9 Labeling of MS/MS Spectra of Cross-linked Peptides
To label MS/MS spectra of candidates of cross-linked peptides, the widely accepted b and y nomenclature system was modified to clarify component peptides and disulfide bonds. Component peptides of each precursor ion were labeled by Roman numerals, I, II, III etc. The cross-link bridge and modifications were represented by their mass, 12 or 30.

3.2.10 Localization of Modification Sites by Degree of Modification (D.O.M)
A graphic visualization has been devised by Toews et al.103 to localize multiple modifications on a modified peptide by the average number of modifications, the degree of modification (DOM), on fragment ions. The DOM of a fragment ion is calculated as follows: the peak area (PA) of each modification state is multiplied by the number of modifications that state contains and divided by the total PA of all modification states of that fragment ion, and these terms are summed across all modification states. A sample calculation of the DOM value of the b4 ion from a doubly modified peptide (+24 Da) is shown in the following equation.

\[
\mathrm{DOM}_{b4} = \frac{0 \times PA_{b4}}{PA_{b4} + PA_{b4+12} + PA_{b4+24}} + \frac{1 \times PA_{b4+12}}{PA_{b4} + PA_{b4+12} + PA_{b4+24}} + \frac{2 \times PA_{b4+24}}{PA_{b4} + PA_{b4+12} + PA_{b4+24}}
\]

In order to localize modification sites in a multiply modified peptide, DOM values are calculated for each detectable b and y ion, and plotted as bar graphs against the peptide sequence. Along the peptide sequence, the DOM values show a significant difference (a step in the bar graph) at each modified residue and stay unchanged (a plateau in the bar graph) at unmodified residues.
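The DOM calculation and the step/plateau reading described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Toews et al.: the function names, the peak areas, the DOM values and the step threshold of 0.5 are all hypothetical choices made for the example.

```python
def degree_of_modification(peak_areas):
    """Weighted average number of modifications on one fragment ion.

    peak_areas maps the number of modifications (0, 1, 2, ...) to the
    peak area of that modification state of the ion.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return sum(n * area for n, area in peak_areas.items()) / total


def modified_ranges(dom_by_ion, threshold=0.5):
    """Find residue ranges where the DOM rises by at least `threshold`.

    dom_by_ion maps the ion number n (b_n, or y_n counted from the
    C-terminus) to its DOM. A rise between two consecutive detectable ions
    localizes one modification to the residues gained between them. The
    threshold of 0.5 is an assumption made for this sketch.
    """
    ranges = []
    prev_n, prev_dom = 0, 0.0
    for n in sorted(dom_by_ion):
        dom = dom_by_ion[n]
        if dom - prev_dom >= threshold:
            ranges.append((prev_n + 1, n))
        prev_n, prev_dom = n, dom
    return ranges


# Hypothetical peak areas for b4, b4+12 and b4+24 of a doubly modified peptide.
dom_b4 = degree_of_modification({0: 10.0, 1: 30.0, 2: 60.0})

# Hypothetical DOM values for the b2 to b8 ions of a nine-residue peptide.
b_series = {2: 1.0, 3: 1.0, 4: 1.05, 5: 0.98, 6: 1.0, 7: 1.0, 8: 1.95}
steps = modified_ranges(b_series)
```

With these made-up numbers, `dom_b4` is 1.5 and `steps` contains two ranges: residues 1 to 2 (b1 is not in the series, so the first step cannot be narrowed further) and residue 8. When an ion is missing from the series, a step can only be assigned to a range of residues, which is why the function returns ranges rather than single positions.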
In the case of a singly modified peptide, which is a mixture of the same peptide with modifications on several reactive residues, the DOM values of detectable b and y ions are calculated in the same way but expressed as percentage values. DOM values are also plotted against the peptide sequence as bar graphs, where a significant difference in DOM values (a step in the bar graph) is indicative of a modification site.  3.3 Results and Discussion 3.3.1 Model Protein Insulin with Alternative Processing 3.3.1.1 Identification of Five Cross-linked Insulin Peptides The experimental workflow and data analysis were the same as described in Chapter 2, except disulfide bonds were reduced and alkylated before the addition of GluC. Briefly, insulin was incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hours, and quenched with Tris buffer. Samples were separated by SDS-PAGE on a 15% gel to verify the progression of cross-linking between insulin molecules. All samples were digested and analyzed by LC-MS/MS, and the formaldehyde treated 6 hr insulin sample was selected for the identification of cross-linked peptides due to the high yield of cross-links. The LC-MS data of formaldehyde treated 6 hr sample also contained a lot of  53  unknown signals, besides the signals assigned to unmodified and modified insulin peptides by mass, MS/MS spectra and LC retention time. Unknown MS signals were matched with a theoretical list of possible cross-linked peptides made by combining all the unmodified and modified insulin peptides in this sample, and therefore were reduced to a few candidates of cross-linked peptides with proposed structural components. MS/MS spectra of candidate peptides were collected. Matching MS/MS signals to proposed fragment ions demonstrated that five candidate peptides (Table 3-1) contained all proposed structural components, and therefore were verified to be cross-linked peptides.  
They are later referred to as β22RGFFYTPKAβ30 ^ α1GIVEα4, (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*, β22RGFFYTPKAβ30 ^ α18NYCNα21, α18NYCNα21 ^ α1GIVEα4 and (α18NYCNα21 ^ α1GIVEα4)*, with ^ representing the cross-link, and * representing an extra modification or cross-link.

Table 3-1. The m/z value, mass and structural components of cross-linked peptides identified in the digest of the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced). The ^ represents the cross-link bridge, while the * represents an extra Schiff-base modification or cross-link.

m/z (charge)    Mass (Da)    Structural components
505.61 (3+)     1513.83      β22RGFFYTPKAβ30 ^ α1GIVEα4
549.79 (4+)     2195.16      (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*
556.61 (3+)     1666.83      β22RGFFYTPKAβ30 ^ α18NYCNα21
499.69 (2+)     997.38       α18NYCNα21 ^ α1GIVEα4
505.69 (2+)     1009.38      (α18NYCNα21 ^ α1GIVEα4)*

Among the five cross-linked peptides, β22RGFFYTPKAβ30 ^ α1GIVEα4 and (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* were also identified in Chapter 2, with MS/MS spectra (not shown) similar to Figure 2-4 and 2-9. The remaining three cross-linked peptides, β22RGFFYTPKAβ30 ^ α18NYCNα21, α18NYCNα21 ^ α1GIVEα4 and (α18NYCNα21 ^ α1GIVEα4)*, were new findings in this disulfide-reduced model system. However, peptide α18NYCNα21 ^ α1GIVEα4 could be the disulfide-reduced form of peptide (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4, which was identified in the disulfide-non-reduced model system in Chapter 2. The disulfide-non-reduced forms of the other two newly found cross-linked peptides, β22RGFFYTPKAβ30 ^ (α18NYCNα21-β14ALYLVCGEβ21) (m=4311.94 Da) and ((α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4)* (m=1816.84 Da), should also exist in the disulfide-non-reduced insulin model system described in Chapter 2, although they were not identified.
It seems that breaking disulfide bridges makes α18NYCNα21-containing cross-linked peptides easier to ionize and generate an MS signal, thereby providing complementary information to the disulfide-non-reduced insulin model system.

3.3.1.2 Localization of the Cross-link Bridge
After the identification of cross-linked peptides, cross-link sites on component peptides were localized by the method developed in 2.3.8. For the two cross-linked peptides that were also identified in Chapter 2, the MS/MS spectrum of β22RGFFYTPKAβ30 ^ α1GIVEα4 also localized the cross-link to Gα1 and Yβ26, while cross-link sites in (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* will be discussed in 3.3.1.3. The three newly identified cross-linked peptides produced all of the three types of fragment ions (fragmentation patterns shown in Figure 3-1a, 3-2a and 3-3a): 1. ions of whole component peptides; 2. series of ions that contain one whole component peptide cross-linked to part of the other, and another ion series of their counterparts; 3. series of b and y ions of component peptides. Type 2 and 3 ions were examined in detail to localize cross-link sites.

The cross-link sites in α18NYCNα21 ^ α1GIVEα4 (Figure 3-1) were localized by both type 2 and type 3 ions. Fragmentations in the backbone of peptide II produced type 2 ions I+12+IIb2 and I+12+IIb3, suggesting that the cross-link site was on α1GIα2. Since isoleucine was shown not to be involved in the formaldehyde cross-linking reactions92, the N-terminus Gα1 was determined to be the cross-link site. Fragmentations of peptide I generated type 2 ions Iy3+12+II, Ib2+12+II and Ib3+12+II, suggesting that the cross-link was on Yα19. The b and y ion series from peptide I (type 3 ions: Ib2+12, Ib3+12, Iy1, Iy2 and Iy3+12) also localized the cross-link to Yα19.
Therefore, a Yα19 to Gα1 cross-link formed in insulin, and produced the peptide α18NYCNα21 ^ α1GIVEα4 upon digestion. Considering the peptide (α18NYCNα21-β14ALYLVCGEβ21) ^ α1GIVEα4, in which the cross-link was localized to the α18NYCNα21-β19CGEβ21 segment (see Chapter 2.3.8), it is likely also formed by the Yα19 to Gα1 cross-link.

Figure 3-1. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide α18NYCNα21 ^ α1GIVEα4, with an MS signal of 499.69 (2+).

The cross-link sites on both component peptides of β22RGFFYTPKAβ30 ^ α18NYCNα21 (Figure 3-2) were determined by both type 2 and type 3 ions. Fragmentation of the backbone of peptide I produced type 2 ions Iy2+12+II, Iy3+12+II and Iy4+12+II, which localized the cross-link to β29KAβ30. The cross-link site was further localized to Kβ29, as alanine has been shown not to be reactive in formaldehyde induced reactions92. Type 3 ions (Iy2+12, Iy3+12, Iy4+12 and Iy5+12) also assigned the cross-link to Kβ29. The cross-link site on peptide II was determined to be Yα19, by type 2 ions I+12+IIb2, I+12+IIb3 and I+12+IIy3, and type 3 ions IIb2+12, IIb3+12 and IIy3+12. Therefore, a Yα19 to Kβ29 cross-link formed in insulin, and produced the peptide β22RGFFYTPKAβ30 ^ α18NYCNα21 upon digestion. It should be noted that fragmentation occurred at both ends of the cross-link bridge, which together with fragmentation along the peptide backbones generated ion series of both Ib/y and Ib/y+12, as well as both IIb/y and IIb/y+12. A seemingly similar observation will be discussed in Chapter 3.3.2.4.

Figure 3-2. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide β22RGFFYTPKAβ30 ^ α18NYCNα21, with an MS signal of 556.61 (3+).
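The localization reasoning used in this section (a shifted b or y ion places the cross-link inside that fragment, an unshifted one places it outside) can be expressed as a set intersection. The sketch below is an illustration with a hypothetical function name, not the procedure of Chapter 2.3.8. It carries one important assumption: an unshifted ion only excludes residues if the bridge itself did not fragment. For β22RGFFYTPKAβ30 ^ α18NYCNα21 both shifted and unshifted series were observed, so only shifted ions are used in that example.

```python
def localize_shift(length, shifted_b=(), unshifted_b=(), shifted_y=(), unshifted_y=()):
    """Return 1-based residue positions consistent with the observed ions.

    b_n covers residues 1..n; y_n covers the last n residues.
    Shifted ions (carrying +12) must contain the site; unshifted ions are
    taken to exclude it, which assumes the bridge did not cleave.
    """
    candidates = set(range(1, length + 1))
    for n in shifted_b:
        candidates &= set(range(1, n + 1))
    for n in unshifted_b:
        candidates -= set(range(1, n + 1))
    for n in shifted_y:
        candidates &= set(range(length - n + 1, length + 1))
    for n in unshifted_y:
        candidates -= set(range(length - n + 1, length + 1))
    return sorted(candidates)


# NYCN in NYCN ^ GIVE: type 3 ions b2+12, b3+12, y1, y2 and y3+12.
nycn = "NYCN"
site = localize_shift(len(nycn), shifted_b=(2, 3), shifted_y=(3,), unshifted_y=(1, 2))
site_residues = [nycn[i - 1] for i in site]

# RGFFYTPKA in RGFFYTPKA ^ NYCN: shifted y2, y3 and y4 ions only.
rgff = "RGFFYTPKA"
segment = localize_shift(len(rgff), shifted_y=(2, 3, 4))
segment_residues = [rgff[i - 1] for i in segment]
```

For NYCN the only position left is residue 2, the tyrosine. For RGFFYTPKA the ions narrow the site to the last two residues, K and A, and the final assignment to lysine rests on the reported unreactivity of alanine rather than on the spectrum.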
3.3.1.3 Determination of the Extra Structure of 12 Da Mass Shift by Reactivity Considerations
The determination of cross-link sites in (α18NYCNα21 ^ α1GIVEα4)* is complex, as it is difficult to discriminate among three possibilities for the 12 Da structure represented by the asterisk (*): a Schiff-base modification, an intra-peptide cross-link, or a second cross-link between the component peptides. This difficulty in determining the extra 12 Da structure directly from MS/MS spectra also occurred for peptide (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*, as discussed in Chapter 2.3.8. Here I combine considerations on the reported reactivity of different residues and the MS/MS spectra in this study to determine the corresponding peptide structures.

The structure of peptide (α18NYCNα21 ^ α1GIVEα4)* was partly resolved without taking into account the extra 12 Da structure. The cross-link site on α1GIVEα4 was localized to Gα1 by both type 2 ions I+24+IIb1, I+24+IIb2 and I+24+IIb3, and type 3 ions IIb2+12 and IIb3+12. The extra 12 Da structure was also localized to one component peptide in the following way. Pairs of type 1 ions I+12 and II+12, and I+24 and II, indicated that the extra 12 Da structure was on peptide I. Type 3 ions Ib2+24 and Ib4+24 also indicated that both the cross-link bridge and the extra 12 Da structure were attached to peptide I. This assignment was also supported by the lack of IIb/y+24 ions, which suggested that the extra 12 Da structure was not attached to peptide II. In this case, the extra 12 Da structure was still possibly a second cross-link bridge between peptide I and II, with the end attached to peptide II easily breaking in CID, resulting in the cross-link bridge only being attached to peptide I in fragment ions.
Taken together, this peptide was composed of the N-terminus Gα1 on α1GIVEα4 cross-linked to α18NYCNα21, with an extra Schiff-base modification or intra-peptide cross-link on α18NYCNα21, or a second cross-link connecting α1GIVEα4 and α18NYCNα21.

At this stage, considering the reactivity of different residues over the course of the 6 hr of formaldehyde cross-linking reactions helps exclude some possible peptide structures. The possibility of a Schiff-base modification on α18NYCNα21 can be eliminated, as none of Asn (N), Tyr (Y) or Cys (C) (in disulfide bonds) has been shown to form a Schiff-base modification with formaldehyde exposure for less than 6 hr in model peptides and proteins103-104. Since none of Asn, Tyr or Cys (in disulfide bonds) can be modified into a Schiff-base structure to initiate the cross-linking step, an intra-peptide cross-link is unlikely to form between Asn and Tyr, Cys and Tyr or Asn and Cys. This leaves a second inter-peptide cross-link as the most reasonable assumption. This possibility is supported by the ability of the N-terminus Gα1 to form two cross-links102. Furthermore, the possibility of forming two cross-links between the N-terminus and Asn, or two cross-links between the N-terminus and Tyr, can be excluded, because neither Asn nor Tyr has been shown to form two cross-links with one glycine molecule102. Therefore, the source peptide is likely the N-terminus Gα1 at α1GIVEα4 cross-linked with Nα18 and Yα19, Yα19 and Nα21, or Nα18 and Nα21. The first scenario is supported by the MS/MS spectrum. Type 2 ions, Ib2+24+II and Ib3+24+II, indicated that the two cross-links were on the α18NYα19 segment. Type 3 ions, Ib2+24 and Ib3+24, also confirmed two cross-link bridges attached to α18NYα19. Moreover, type 3 ions, Iy1, Iy2 and Iy3+12, indicated that there is one cross-link on Yα19. In summary, the structure of peptide (α18NYCNα21 ^ α1GIVEα4)* can be assumed to be Nα18 and Yα19 both cross-linked with Gα1.
61  Figure 3-3. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide (α18NYCNα21 ^ α1GIVEα4)* , with an MS signal of 505.692+.  62  For peptide (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* (Figure 2-9), the extra 12 Da structure could not be assigned to one component peptide by type 1 or type 3 ions, as the two component peptides were the same. Additionally, both component peptides contained Rβ22 that was shown to be reactive in both the modification and the crosslinking step, Yβ26 that was potentially reactive in the cross-linking step, and Kβ29 that was reactive in the modification step. Together, this suggested many possible cross-link sites, as well as identities and locations of the extra 12 Da structure. If the possibility of isomeric peptides with the same component peptides but different modification and cross-link sites were to be considered, resolving structures of isomeric peptides would be even more complex. The complexity in the clarification of cross-linked peptide structures that also contain an extra 12 Da structure is discussed in Chapter 3.3.2.5. 3.3.1.4 The Two-Step Reaction between the N-terminus or Lysine and Tyrosine or Asparagine Residues In this model system, structures of four cross-linked peptides (Figure 3-4) revealed the N-terminus to Tyr (Y), the N-terminus to Asn (N) and Lys (K) to Tyr (Y) cross-links. The progression of both the modification and the cross-linking step on these residues was examined to clarify the order of the two-step reactions on these reactive residues, thereby revealing the chemistry of the formaldehyde cross-linking reactions in proteins.  Figure 3-4. Structures of cross-linked peptides identified in the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced), with cross-link sites localized to individual residues.  63  The progression of the cross-linking step was examined first. 
All four cross-linked peptides identified in the formaldehyde treated 6 hr insulin sample appeared in formaldehyde treated 0.5 hr and 2 hr samples as well, with similar LC retention time and MS/MS spectra. Therefore, the observed N-terminus to Tyr/Asn and Lys to Tyr crosslinks were produced in insulin within 0.5 hr of formaldehyde exposure. The progression of the modification step on each cross-link site was then examined, and correlated with the progression of the cross-linking step. Monitoring the modification on Gα1, Yα19 and Nα18 is shown here to illustrate the two-step reactions that form the Gα1 to Yα19 and Gα1 to Nα18 cross-links. Since neither  α18  NYCNα21 nor  α1  GIVEα4 produced  good MS/MS spectra, the progression of the modification on the whole α-chain was examined instead. Figure 3-5a shows the MS/MS spectrum of the singly modified insulin α-chain,  α1  GIVEQCCASVCSLYQLENYCNα21+12, after 0.5 hr of formaldehyde  treatment. The b ion series from b2+12 to b6+12 indicated that the modification was on α1GIα2 and not on α3VEQCα6 (Figure 3-5). The modification was further localized to Gα1 because isoleucine was shown not to be involved in formaldehyde-induced reactions92. The y ions from y1 to y5 suggested that none of the residues in the  α17  ENYCNα21 segment was  modified within 0.5 hr of formaldehyde exposure, including Yα19 and Nα18. In the MS/MS spectrum of the singly modified α-chain after 6 hr of formaldehyde treatment (Figure 35b), the b2+12 to b9+12 and y1 to y6 also suggested that the modification was on G α1 and not on Yα19 or Nα18. Thus, G α1 was modified within 0.5 hr of formaldehyde exposure, while neither Yα19 nor Nα18 was modified after 6 hr of exposure. These facts, combined with the formation of Gα1 to Yα19 and Gα1 to Nα18 cross-links within 0.5 hr of 64  formaldehyde exposure, indicated that Gα1 was modified and then cross-linked to Yα19 or Nα18.  Figure 3-5. 
MS/MS spectra of the singly modified insulin α-chain (α1GIVEQCCASVCSLYQLENYCNα21+12) after (a) 0.5 hr, (b) 6 hr of formaldehyde exposure. In both spectra, the Schiff-base modification is localized to Gα1, and proves not to be on Nα18 or Yα19.  The two-step reaction to form the Gα1 to Yα19 cross-link in insulin is shown in Figure 3-6a. It is consistent with the reported residue reactivity: N-termini in model proteins have been shown to be modified by formaldehyde within 20 min104; Ys in model peptides have been shown to form cross-links with Schiff-base structures on glycine within 2 days of formaldehyde exposure102. Moreover, reaction schemes of the modification and cross-linking steps (Figure 3-6b) could be derived from chemical structures of cross-linked model molecules and amino acids92,99. In this reaction, the amino group of the N-terminus went through an addition with formaldehyde and formed 65  a methylol modification, which then dehydrated into a Schiff-base structure. In the crosslinking step, the Schiff-base structure turned into a methylene bridge connected to the aromatic ring of the tyrosine side chain. Upon Glu-C digestion, this cross-link produced peptide α18NYCNα21 ^ α1GIVEα4 (Figure 3-4a). The two-step reaction to form a Gα1 to Nα18 cross-link in insulin is shown in Figure 3-6c. The Gα1 to Nα18 cross-link was suggested to form as a second cross-link after the formation of the Gα1 to Yα19 cross-link, based on the fact that the Gα1 to Yα19 cross-link was identified alone, while the Gα1 to Nα18 cross-link was only identified together with the Gα1 to Yα19 cross-link. This hypothesis was also supported by a report that, in model peptides, Ns were less reactive than Ys in the cross-linking step102. After the formation of the Gα1 to Yα19 cross-link, Gα1 was modified again and cross-linked to Nα18 (Figure 3-6c). 
The order of the two reaction steps is also consistent with the reported reactivity of Ntermini in the modification step and Ns in the cross-linking step102,104. This together with the clarified chemical structures proposed detailed reaction schemes (Figure 3-6d). After the formation of the Gα1 to Yα19 cross-link, another formaldehyde molecule was added to the amino group of the N-terminus to form a methylol modification, which then dehydrated into a Schiff-base structure. The Schiff-base structure then turned into a second methylene bridge connected to the primary amide group at the asparagine side chain. Upon Glu-C digestion, this cross-link produced peptide (α18NYCNα21 ^ α1GIVEα4)* (Figure 3-4b).  66  Figure 3-6. The two-step reactions (a and c) and proposed reaction schemes (b and d) to form the Gα1 to Yα19 (a and b) and Gα1 to Nα18 (c and d) cross-links. Symbol ^ represents the cross-linker.  The formation of the Kβ29 to Yα19 cross-link (Figure 3-4c) was investigated following the progression of the modification on Kβ29 and Yα19. In the MS/MS spectrum of the singly modified peptide β22RGFFYTPKAβ30+12 in the formaldehyde treated 0.5 hr insulin sample (Figure 3-7), y2+12, y3+12, y4+12, y7+12 and y8+12 ions localized the modification to  β29  KAβ30. The modification was further assigned to Kβ29, as alanine was  shown not to be reactive in formaldehyde-induced reactions92. Thus, Kβ29 was modified within 0.5 hr of formaldehyde exposure, while Yα19 was not modified after 6 hr of 67  exposure as discussed above (Figure 3-5b). Moreover, the Kβ29 to Yα19 cross-link formed within 0.5 hr of formaldehyde exposure. Together this indicated that Kβ29 was modified and then cross-linked to Yα19 (Figure 3-8a), consistent with the reported reactivity of Ks in the modification step and Ys in the cross-linking step102,104.  Figure 3-7. The MS/MS spectrum of the singly modified β22RGFFYTPKAβ30 (+12Da) after 0.5 hr of formaldehyde exposure. 
The yn+12 ion series indicates that Kβ29 was modified within 0.5 hr.

Detailed schemes of reactions producing the Kβ29 to Yα19 cross-link were also proposed (Figure 3-8b): the ε-amino group of the lysine side chain reacted with formaldehyde and formed a methylol modification, which then dehydrated into a Schiff-base structure; the Schiff-base structure then turned into a methylene bridge connecting to the aromatic ring of the tyrosine side chain. Upon Glu-C digestion, this cross-link produced peptide β22RGFFYTPKAβ30 ^ α18NYCNα21 (Figure 3-4c).

Figure 3-8. The (a) two-step reactions and (b) proposed reaction schemes to form the Kβ29 to Yα19 cross-link. Symbol ^ represents the cross-linker.

The formation of the Gα1 to Yβ26 cross-link (Figure 3-4d) was investigated by the progression of the modification on Gα1 and Yβ26. Since Gα1 was proven to be modified within 0.5 hr of formaldehyde exposure (Figure 3-5a), the modification on Yβ26 was examined here. In the MS/MS spectrum of the singly modified β22RGFFYTPKAβ30 (+12 Da) after 0.5 hr of formaldehyde treatment (Figure 3-7), y2+12 to y8+12 ions localized the modification to β29KAβ30, while b2+12 to b8+12 ions localized the modification to β22RGβ23. Therefore, multiple residues in peptide β22RGFFYTPKAβ30 were modified within 0.5 hr of formaldehyde exposure. Individual modification sites could be localized by the average number of modifications, the degree of modification (DOM), of b and y ions from a more heavily modified β22RGFFYTPKAβ30. The modification sites on the doubly modified β22RGFFYTPKAβ30 after 6 hr of formaldehyde exposure were examined (Figure 3-9). The DOM was calculated for each detectable b and y ion in the MS/MS spectrum (Figure 3-9a) and plotted as bar graphs along the peptide sequence (Figure 3-9c), with the calculation of DOMb6 shown in Figure 3-9b as an example.
In Figure 3-9c, each significant difference (about 1) in DOM values between adjacent b/y ions indicated a modification. The DOM values of b ions suggested that one modification was on β22RGβ23, and the other was on Kβ29. DOM values corresponding to y ions localized one modification to β29KAβ30 and the other to Rβ22. Therefore, Yβ26 was not modified even after 6 hr of formaldehyde treatment.

Figure 3-9. (a) The numbering of b and y ions along peptide β22RGFFYTPKAβ30. (b) The MS/MS spectrum of the modified β22RGFFYTPKAβ30 (+24Da) after 6 hr of formaldehyde exposure. (c) A sample calculation of the DOM value of the b6 ion. PA is the abbreviation of peak area. (c) The bar graphs of DOM values of the b and y ions against the peptide sequence. Series of b and y ions together suggest that the two Schiff-base modifications are on Rβ22 and Kβ29 but not on Yβ26.

The modification occurring on Gα1 within 0.5 hr and not on Yβ26 after 6 hr of formaldehyde exposure indicated that Gα1 was modified first and then cross-linked to Yβ26 (Figure 3-10a). This order of the N-terminus reacting before Tyr (Y) is the same as that of the Gα1 to Yα19 cross-link (Figure 3-6a), and is also consistent with the reactivity of N-termini and Ys in the modification and cross-linking step. Therefore, the proposed reaction schemes (Figure 3-10b) are the same as in Figure 3-6b.

Figure 3-10. The (a) two-step reactions and (b) proposed reaction schemes to form the Gα1 to Yβ26 cross-link. Symbol ^ represents the cross-linker.

Taking all four cross-linked peptides (Figure 3-4) into account, the progression of both the modification and the cross-linking step demonstrated that the N-terminus/Lys to Tyr/Asn cross-links were formed by modification of the N-terminus or Lys first, followed by cross-linking to Tyr or Asn.
This conclusion supports a previously published hypothesis that the formaldehyde cross-linking of proteins includes a Mannich type reaction: a primary amino group is modified and then forms a methylene bridge with the side chain of Asn or Tyr. This hypothesis has been demonstrated with amino acids and small molecules98. Here it is shown in proteins for the first time.

3.3.1.5 Physiological Relevance of the Identified Cross-links in Insulin
The characterization of formaldehyde cross-linking reactions in the insulin model system should indicate characteristics of these reactions in living cells. In this part, I discuss the relevance of the Gα1 to Yα19, Gα1 to Nα18, Kβ29 to Yα19 and Gα1 to Yβ26 cross-links observed in the insulin model system to formaldehyde cross-linking under physiological conditions. The four observed cross-links are formed within 30 min of exposure to 1% formaldehyde, at 37 ℃ and pH 7.5. These conditions closely resemble those applied when attempting formaldehyde cross-linking in living cells and organisms54-61. Therefore, the N-terminus to Y/N and K to Y cross-links very likely form during the formaldehyde cross-linking of native proteins, as long as these residues in interacting proteins are in close proximity. Additionally, the formation of two cross-links on the N-terminus Gα1 under the same reaction conditions suggests that two cross-links can form on the N-terminus of a native protein during formaldehyde cross-linking in living cells and organisms. A further question is whether the Gα1 to Yα19, Gα1 to Nα18, Kβ29 to Yα19 and Gα1 to Yβ26 cross-links in insulin reflect any physiologically relevant interactions. Although the model system does not contain interaction partners of insulin such as the insulin receptor family111, insulin is well known to interact with itself and form non-covalent dimers and hexamers in aqueous solution.
The interaction interfaces in dimeric and hexameric insulin are the basis for designing insulin mutants for the treatment of diabetes112-113. Therefore, the possibility of formaldehyde cross-linking capturing the physiologically relevant dimer- or hexamer-forming interactions is examined. One common way to verify the formation of cross-links between interacting proteins is to compare the length of the cross-link bridge to the distances between cross-link sites, the distance constraints. To determine whether formaldehyde induced cross-links formed between monomeric subunits of the dimeric or the hexameric insulin, distances between cross-link sites located in two monomers of a dimeric or hexameric insulin molecule were measured in their 3D crystal structures (PDB No.: 2A3G and 3AIY), as shown in Table 3-2. All of the distances were much longer than the length of the methylene bridge (2.5 Å), indicating that the observed Gα1 to Yα19, Gα1 to Nα18, Kβ29 to Yα19 and Gα1 to Yβ26 cross-links in aqueous insulin were not likely formed within dimeric or hexameric insulin. However, considering the flexibility of peptide backbones and side chains in aqueous solution, the possibility of these residues coming into close proximity (< 2.5 Å) and becoming cross-linked cannot be excluded. Therefore, although exceptions might occur, the cross-links identified in the model protein system are unlikely to provide information about physiologically relevant interactions.

Table 3-2. Distance constraints between cross-link sites in dimeric and hexameric insulin (PDB# 2A3G and 3AIY).
Cross-link      Distance constraint in dimer (Å)    Distance constraint in hexamer (Å)
Gα1 to Yα19     19.82                               20.58
Gα1 to Nα18     23.42                               23.58
Kβ29 to Yα19    18.18                               16.13
Gα1 to Yβ26     15.73                               17.22

3.3.2 Myoglobin as the Larger Model Protein
3.3.2.1 Cross-linking of Myoglobin
In a separate series of experiments, the method for the identification and characterization of cross-linked peptides was applied to a larger model protein, myoglobin. Just as in the insulin model system, the experimental workflow started with cross-linking of the model protein, ensuring a high yield of cross-links to allow identification of cross-linked peptides. Myoglobin was incubated with formaldehyde or without formaldehyde (control), for 0, 0.5, 2 and 6 hr, and then quenched with Tris buffer. The formation of cross-links was confirmed and the yield was determined, this time by acquiring MS spectra (Figure 3-11).

Figure 3-11. MS spectra of 100 µM myoglobin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.

In the four control samples, the signal at around 17 kDa was from unmodified myoglobin. The signal at around 34 kDa was from non-specific non-covalent dimers of myoglobin, which were shown to form during the storage of myoglobin at -20 ℃ as lyophilized powder and to be a common observation in MALDI-MS analysis of proteins114-115. Neither the mass nor the peak intensity of the two peaks changed significantly as incubation time increased, indicating that the myoglobin stayed unchanged for 6 hr without formaldehyde treatment. For the four formaldehyde treated samples, there was a significant increase in the signal intensity of dimeric myoglobin as the reaction between myoglobin and formaldehyde proceeded from 0 to 6 hr. In addition, trimeric myoglobin (around 51 kDa) appeared at 2 hr, and its signal intensity significantly increased at 6 hr of formaldehyde incubation. The m/z values of monomeric, dimeric and trimeric myoglobin also increased with the duration of formaldehyde treatment.
These were signs of myoglobin being modified and cross-linked by formaldehyde into dimers and trimers, and signs of an increase in the extent of modification and the yield of cross-links as the reaction proceeded. The formaldehyde treated 6 hr sample with highest cross-linking yield was later selected for the identification of cross-linked peptides. The MS spectrum proved an alternative way to visualize the formation and yield of cross-linking in model proteins. Compared to SDS-PAGE separation, it showed exact changes in protein mass that demonstrate the extent of modification, but it could not provide absolute quantification of the yield of cross-links. However, neither the exact quantification of the extent of modification nor that of the yield of cross-linking is necessary at this stage. Either method can be used to track the formation and yield of cross-links. 3.3.2.2 Identification of Three Cross-linked Myoglobin Peptides The steps of the experimental workflow and data processing in the myoglobin model system were similar to those applied to the insulin model system. All samples were digested and analyzed by LC-MS/MS, and the formaldehyde treated 6 hr myoglobin sample was selected for the identification of cross-linked peptides. The 3D plot of its LCMS data (Figure 3-12) showed a much more complex pattern than that of insulin (Figure 76  2-3a), as it contained many more signals with a lot of overlapping m/z values. Only 55 signals were assigned to unmodified and modified myoglobin peptides based on mass, MS/MS spectra and LC retention time, as listed in Appendix A.2.  Figure 3-12. The 3D plot (LC retention time, m/z, signal intensity) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr myoglobin sample.  The remaining unknown signals were compared to a theoretical list of possible cross-linked peptides to generate a list of candidates of cross-linked peptides. 
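The comparison of unknown signals against a theoretical list of cross-linked peptide masses can be sketched as follows. This is a hedged Python illustration, not the thesis's own program: the function names, the 0.1 Da tolerance and the limit of one extra 12 Da structure are assumptions, and only two unmodified peptides are used as input, whereas the full list would also contain modified peptides. Residue masses are standard monoisotopic values.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
BRIDGE = 12.0  # mass added per cross-link bridge or Schiff base, as in the text


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[res] for res in seq) + WATER


def theoretical_crosslinks(peptides, max_extra=1):
    """List (peptide_a, peptide_b, extra, mass) for every pair of peptides.

    `extra` counts additional 12 Da structures (the * in the text).
    """
    entries = []
    for a, b in combinations_with_replacement(sorted(peptides), 2):
        for extra in range(max_extra + 1):
            mass = peptides[a] + peptides[b] + BRIDGE * (1 + extra)
            entries.append((a, b, extra, mass))
    return entries


def match_candidates(entries, observed_masses, tol=0.1):
    """Pair each theoretical entry with observed masses within `tol` Da."""
    return [(entry, obs) for entry in entries for obs in observed_masses
            if abs(entry[3] - obs) <= tol]


peptides = {seq: peptide_mass(seq) for seq in ("LFRNDIAAKYKE", "LGFQG")}
observed = [(667.37 - PROTON) * 3]  # neutral mass of the reported 3+ signal
hits = match_candidates(theoretical_crosslinks(peptides), observed)
```

In this example the single observed mass matches only the LFRNDIAAKYKE and LGFQG pair joined by one bridge. A mass match of this kind only nominates a candidate. As the text stresses, confirmation still requires the MS/MS spectrum, since different peptide combinations can share the same mass within tolerance.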
Since the list of myoglobin peptides was longer and the pool of unknown signals was much larger than those of the insulin system, a small program was developed using MatLab (Appendix A.3) to generate the theoretical mass list of cross-linked peptides and compare it to the masses of experimental LC-MS signals. Eighty-one masses were found in common between the theoretical and experimental lists. After eliminating background signals that also appeared in control samples, 27 candidates of cross-linked peptides remained. These candidates were verified by matching their MS/MS signals to putative fragment ions derived from their proposed structural components.

Shown in Figures 3-13, 3-14 and 3-15 are the fragmentation patterns and MS/MS spectra of three candidates that were identified to be cross-linked. The MS/MS spectrum of each of these three candidates proved to contain type 1 and type 3 ions only, the same situation as the MS/MS spectrum of the cross-linked insulin peptide (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* (Figure 29). For candidate 138LFRNDIAAKYKE149 ^ 150LGFQG154 (Figure 3-13), type 1 ions I+12 and II (signals at m/z 740.41 (2+) and 521.27 (1+)) indicated that the precursor ion contained two parts of m=1478.8 Da and m=520.3 Da. These masses equaled the mass of 138LFRNDIAAKYKE149 with a cross-link bridge attached and that of 150LGFQG154, respectively. Nearly complete IIb and IIy ion series verified that the part of m=520.3 Da had the sequence LGFQG. Similarly, Ib, Ib+12, Iy and Iy+12 ion series verified that the part of m=1478.8 Da had the sequence LFRNDIAAKYKE and a 12 Da structure attached. Type 1 and type 3 ions together confirmed that the candidate was 138LFRNDIAAKYKE149 cross-linked with 150LGFQG154. For the MS/MS spectra of candidates (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* (Figure 3-14b) and (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)* (Figure 3-15b), the proposed structural components were confirmed by type 1 ions demonstrating the mass of the two parts and type 3 ions verifying the amino acid sequence of both parts. The * symbol in these two peptides represents an extra 12 Da structure, a Schiff-base modification or a cross-link bridge.

Figure 3-13. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide 138LFRNDIAAKYKE149 ^ 150LGFQG154, with an MS signal at m/z 667.37 (3+).

Figure 3-14. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)*, with an MS signal at m/z 767.97 (4+). The * represents an extra Schiff-base modification or cross-link.

Figure 3-15. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)*, with an MS signal at m/z 817.97 (4+). The * represents an extra Schiff-base modification or cross-link.

The method for identifying candidates of cross-linked peptides and verifying their proposed structural components proved to work in this larger model protein, although the sample complexity required computational power for data processing. The fragmentation patterns observed in cross-linked myoglobin peptides contained fragmentations at both the cross-link bridge and the peptide backbones, generating two types of fragment ions: ions of whole component peptides (type 1) and b and y ions of component peptides (type 3). Although type 2 ions, in which one component peptide is cross-linked to part of the other, were not observed, the analysis of type 1 and type 3 ions together provided enough information to verify all the proposed structural components. 
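The candidate search described above, in which a theoretical mass list of cross-linked peptides is compared with experimental masses, can be illustrated with a minimal Python sketch. It is not the MatLab program of Appendix A.3. The function names, the short peptide list and the 0.05 Da tolerance are assumptions made for illustration, and each cross-link bridge or extra 12 Da structure is taken to add the mass of one carbon atom (12.000 Da).

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) for the amino acids used below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056    # added once per linear peptide
PROTON = 1.00728    # charge carrier in positive-ion mode
BRIDGE = 12.00000   # assumed net mass of one bridge or Schiff base


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def neutral_mass(mz, z):
    """Convert an observed m/z at charge z to a neutral mass."""
    return z * (mz - PROTON)


def theoretical_crosslinks(peptides, max_extra=1):
    """Yield (pep_a, pep_b, n_extra, mass) for every peptide pair.

    n_extra counts additional 12 Da structures (the * in the text).
    """
    for a, b in combinations_with_replacement(peptides, 2):
        for n_extra in range(max_extra + 1):
            mass = peptide_mass(a) + peptide_mass(b) + BRIDGE * (1 + n_extra)
            yield a, b, n_extra, mass


def match_candidates(peptides, signals, tol_da=0.05):
    """Return the theoretical cross-links that match each (m/z, z) signal."""
    hits = []
    for mz, z in signals:
        observed = neutral_mass(mz, z)
        for a, b, n_extra, mass in theoretical_crosslinks(peptides):
            if abs(observed - mass) <= tol_da:
                hits.append((mz, z, a, b, n_extra))
    return hits


# Illustrative peptide list: only the component peptides named in this section.
peptides = ["LFRNDIAAKYKE", "LGFQG", "VLIRLFTGHPETLE",
            "KFDKFKHLKTE", "KFDKFKHLKTEAE"]
hits = match_candidates(peptides, [(667.37, 3)])
print(hits)
```

With these assumptions, the signal at m/z 667.37 (3+) matches only LFRNDIAAKYKE ^ LGFQG with a single bridge, and the same `neutral_mass` conversion applied to the type 1 ion at m/z 740.41 (2+) gives about 1478.8 Da. In practice the tolerance would have to reflect the mass accuracy of the instrument, and hits would still need the background subtraction and MS/MS verification described in the text.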
It should be noted that in Chapter 2.3.6, candidates with m/z of 757.88 (2+) and 763.88 (2+) were not verified to be cross-linked peptides although all three types of ions appeared in their MS/MS spectra. Therefore, it is the completeness of information from fragment ions, rather than the appearance of all types of fragment ions, that can verify a candidate to be cross-linked.

The remaining 24 candidates that were not confirmed as cross-linked peptides, however, highlighted other issues that accompany increasing sample complexity. Twelve candidates did not generate MS/MS spectra of sufficient quality to allow the confirmation of proposed structural components, most likely because they were of low abundance and co-eluted with many abundant peptides. The remaining twelve, according to their MS/MS spectra, did not contain the proposed structural components. This reflected overlaps of peptide masses in complex peptide mixtures. Five of these twelve candidates were confirmed by analyzing MS/MS spectra to be modified myoglobin peptides, rather than the proposed component peptides. The remaining seven candidates were not assigned to modified myoglobin peptides. A possible origin of these seven candidates was myoglobin peptides with Tris-formaldehyde adducts, formed by the primary amino group on Tris cross-linked to myoglobin via formaldehyde during the quenching of the formaldehyde reaction by concentrated (1 M) Tris buffer. They could also be solvent cluster ions of modified, cross-linked or Tris-formaldehyde-adduct myoglobin peptides, which were not eliminated by background subtraction based on control samples.

In the future, developing an enrichment method may help to relieve some of these issues. Enrichment of low abundance cross-linked peptides would improve the quality of their MS/MS spectra to allow the confirmation of proposed structural components. 
This is especially important for four candidates, which are likely cross-linked peptides because the major signals in their poor quality MS/MS spectra matched some fragment ions from the proposed structural components. In addition, if part of the unmodified and modified peptides could be depleted as the cross-linked peptides are enriched before LC-MS/MS analysis, overlaps of peptide masses and false positive identifications of candidates of cross-linked peptides could be reduced.

3.3.2.3 Localization of Cross-link Sites to One Individual Residue

Cross-link sites were localized by type 2 and type 3 ions in the insulin model system. Here, this method was applied to the three cross-linked myoglobin peptides. Since their fragmentation patterns only contain type 1 and type 3 ions, cross-link sites were localized based on type 3 ions alone, that is, b and y ions of component peptides with or without the cross-link bridge attached. In the MS/MS spectrum of (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* (Figure 3-14b), the b ion series Ib2, Ib3, Ib4+12, Ib5+12, Ib7+12, Ib8+12, Ib9+12, Ib11+12, Ib12+12 and Ib13+12 localized the cross-link bridge to R32. The y ion series Iy1 to Iy3, Iy5 and Iy7 to Iy10 indicated that the cross-link was not on the 33LFTGHPETLE42 segment but on the 29VLIR32 segment, supporting the localization of the cross-link to R32. Therefore, peptide II 43KFDKFKHLKTE53 was cross-linked to R32 on peptide I 29VLIRLFTGHPETLE42. In the MS/MS spectrum of (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)* (Figure 3-15b), the cross-link was also localized to R32 by the b and y ions of 29VLIRLFTGHPETLE42.

The localization of the cross-link to R32 is consistent with reported residue reactivity. Arginines have been shown to be potentially reactive in the cross-linking step of formaldehyde induced reactions within 2 days of formaldehyde exposure102. 
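The reasoning that narrows the cross-link to R32 can be written as a short Python sketch. It assumes a single cross-link site on the peptide (not an isomeric mixture) and takes as input the indices of b and y ions observed without and with the 12 Da shift. The function name and argument names are hypothetical and are not part of the method in this thesis.

```python
def localize_bridge(seq, first_residue, b_plain, b_shifted, y_plain, y_shifted):
    """Return the residues that can carry a single +12 Da bridge.

    b_n covers residues 1..n and y_n covers the last n residues, so:
      * an unshifted b_n places the bridge after position n
      * a shifted b_n places the bridge at or before position n
      * an unshifted y_n places the bridge before the last n residues
      * a shifted y_n places the bridge within the last n residues
    """
    n = len(seq)
    lo, hi = 1, n
    if b_plain:
        lo = max(lo, max(b_plain) + 1)
    if b_shifted:
        hi = min(hi, min(b_shifted))
    if y_plain:
        hi = min(hi, n - max(y_plain))
    if y_shifted:
        lo = max(lo, n - min(y_shifted) + 1)
    if lo > hi:
        raise ValueError("ion series are inconsistent with a single site")
    return [(seq[i - 1], first_residue + i - 1) for i in range(lo, hi + 1)]


# Ion series reported for peptide I of (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)*
sites = localize_bridge(
    "VLIRLFTGHPETLE", 29,
    b_plain=[2, 3],
    b_shifted=[4, 5, 7, 8, 9, 11, 12, 13],
    y_plain=[1, 2, 3, 5, 7, 8, 9, 10],
    y_shifted=[],
)
print(sites)
```

Using the b ions alone already narrows the site to R32, while the y ions alone restrict it to the 29VLIR32 segment, which mirrors the two lines of evidence given in the text.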
Arginines are not considered to be very reactive during the modification step, but this R32 in myoglobin has been reported to be modified by formaldehyde under the same reaction conditions as in this study104.

The two cross-linked peptides (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* and (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)* not only had the same cross-link site (R32), but also had the same LC retention time and were similar in structural components except for the extra segment 54AE55 in the latter. Therefore, the cross-link sites and the identity and location of the extra 12 Da structure could be considered to be the same in these two peptides. The cross-link site on peptide 43KFDKFKHLKTE53 or 43KFDKFKHLKTEAE55 was not localized because ions that contained part of the peptide backbone and the cross-link bridge were not observed. However, possible cross-link sites could be proposed by considering the reactivity of residues in both component peptides. On the one hand, one of the four Lys (K) residues was very likely modified and then cross-linked to R32. On the other hand, it is also possible that R32 was modified and cross-linked to H49. Other residues in the component peptides were excluded as they were shown not to be reactive in the formaldehyde cross-linking reactions of model molecules92-104.

3.3.2.4 Localization of Cross-link Sites to Several Residues

In the MS/MS spectrum of peptide 138LFRNDIAAKYKE149 ^ 150LGFQG154 (Figure 3-13b), many b and y ions of peptide I 138LFRNDIAAKYKE149 (type 3 ions) had both Ib/y and Ib/y+12 forms. This seemed similar to the cross-linked insulin peptide β22RGFFYTPKAβ30 ^ α18NYCNα21 discussed in 3.3.1.2, which had an MS/MS spectrum (Figure 3-2b) that contained signals from both Ib/y and Ib/y+12 ions, as well as both IIb/y and IIb/y+12 ions. The cause of these observations concerning peptide β22RGFFYTPKAβ30 ^ α18NYCNα21 was fragmentation at both ends of the cross-link bridge. 
This did not seem to apply to peptide 138LFRNDIAAKYKE149 ^ 150LGFQG154, because Ib/y ions were not accompanied by series of IIb/y+12 ions. An alternative explanation is that the cross-link on 138LFRNDIAAKYKE149 was not at one specific residue but at two or more residues. In other words, the MS/MS spectrum was from a mixture of isomeric cross-linked peptides that had the same component peptides but different cross-link sites.

Localizing multiple cross-link sites in isomeric cross-linked peptides was not straightforward. Therefore, the bar graph visualization used to determine multiple modification sites on a singly modified peptide was adapted, by considering the cross-link bridge as a “modification” on peptide I 138LFRNDIAAKYKE149. DOM values of Ib/y ions were calculated in the same way as for the sample calculation shown in Figure 3-16a, and plotted as bar graphs against the peptide sequence (Figure 3-16b). The DOM values of b ions showed significant differences between b8 and b9, b9 and b10, and b10 and b12, suggesting K146, Y147 and 148KE149 as the sites of “modification”, that is, of the cross-link. The plateau of DOM values around 25% at the b5 to b8 ions (142DIAA145) indicated that the 138LFRN141 segment was also a cross-link site while 142DIAA145 was not. Therefore, the b ion series indicated that 150LGFQG154 was cross-linked to the 138LFRN141 segment, K146, Y147 or 148KE149. The significant differences of DOM values along the y ion series, however, localized the cross-link bridge to K148, Y147, K146 or the 138LFRNDIAA145 segment. The DOM bar graphs of b and y ions together localized the cross-link to 138LFRN141, K146, Y147 or K148.

Figure 3-16. (a) The numbering of b and y ions along the myoglobin peptide 138LFRNDIAAKYKE149. (b) A sample calculation of the DOM values of detectable b and y ions. PA is the abbreviation of peak area. 
(c) Bar graphs of DOM values of b and y ions from peptide 138LFRNDIAAKYKE149 with a cross-link bridge attached against the peptide sequence, derived from the MS/MS spectrum of peptide 138LFRNDIAAKYKE149 ^ 150LGFQG154 (Figure 3-13b).

Assigned cross-link sites are consistent with reported residue reactivity in formaldehyde cross-linking. Lys (K) and Tyr (Y) residues have been shown to be reactive in the modification step and the cross-linking step, as discussed in Chapter 3.3.1.4. The 138LFRN141 segment contains R140 and N141, which are reactive in the modification step and/or potentially reactive in the cross-linking step102,104. Considering the reactivity of residues in peptide II 150LGFQG154, Q153, which is potentially reactive in the cross-linking step, is likely the residue that is cross-linked to R140, K146 or K148, which are reactive in the modification step.

It is noteworthy that the DOM values of b2 to b8 ions broke the usual increasing step and plateau pattern in DOM bar graphs. High values of 60% appeared at the b2 and b3 ions, which could be explained in two ways. First, CID fragmentation is known to generate various internal ions that can overlap with b and y ions. A mixture of large and highly charged isomeric cross-linked peptides generates an especially complex MS/MS spectrum, with a high probability of overlapping m/z values between fragment ions. High DOM values of b2 and b3 ions could be caused by overlaps between b2/3+12 and other fragment ions. Second, the extent of CID fragmentation at selective peptide bonds has been shown to be affected by basic residues, especially arginine27,116-117. The cross-link bridge attached to R140 could change its basicity, and therefore alter the extent of fragmentation at some peptide bonds, causing high DOM values of b2 and b3 ions. Unfortunately, the effect of basic residues on fragmentation patterns is not well studied in non-tryptic peptides. 
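The way steps in the DOM bar graph translate into candidate cross-link sites can be sketched in Python. This sketch assumes that the DOM value of an ion is the peak area of its +12 form divided by the summed peak areas of both forms, expressed in percent, which may differ in detail from the sample calculation in Figure 3-16. The peak areas and the 10-point step threshold below are invented for illustration and are not measured data.

```python
def dom(pa_plain, pa_shifted):
    """DOM in percent: share of the peak area carried by the +12 form (assumed definition)."""
    return 100.0 * pa_shifted / (pa_plain + pa_shifted)


def candidate_segments(b_areas, first_residue, min_step=10.0):
    """Return residue ranges where the DOM of the b ion series steps up.

    b_areas maps a b ion index to (PA of b, PA of b+12). Since b_n covers
    residues 1..n, a rise between two consecutive detected ions b_i and b_j
    points to a site among residues i+1..j.
    """
    segments = []
    prev_index, prev_dom = 0, 0.0
    for index in sorted(b_areas):
        value = dom(*b_areas[index])
        if value - prev_dom >= min_step:
            segments.append((first_residue + prev_index,
                             first_residue + index - 1))
        prev_index, prev_dom = index, value
    return segments


# Hypothetical peak areas shaped like the pattern described in the text:
# a plateau near 25% up to b8, then steps at b9, b10 and b12.
b_areas = {
    4: (75, 25), 5: (75, 25), 6: (75, 25), 7: (75, 25), 8: (75, 25),
    9: (50, 50), 10: (25, 75), 12: (0, 100),
}
segments = candidate_segments(b_areas, first_residue=138)
print(segments)
```

For these invented values the sketch reports the ranges 138-141, 146, 147 and 148-149, which correspond to 138LFRN141, K146, Y147 and 148KE149. Applying the same idea to the y ion series, read from the C-terminus, would give the complementary constraints. The anomalous b2 and b3 values discussed above are deliberately left out of the example data.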
There have been a few examples in non-tryptic peptides that examine possible changes in the extent of fragmentation due to formaldehyde-induced modifications on basic residues103. Modifications on basic residues do not seem to alter the extent of fragmentation at peptide bonds. However, effects caused by cross-links on basic residues have not been studied and cannot be ruled out. In order to fully understand the complex fragmentation patterns and facilitate the determination of cross-link sites in isomeric cross-linked peptides, more isomeric cross-linked peptides need to be identified and investigated in the future.

3.3.2.5 Complexity of Extra Modifications/Cross-links

Cross-linked peptides (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* and (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)* both contain an extra 12 Da structure, which can be a Schiff-base modification, an intra-peptide cross-link or a second cross-link between component peptides. Their structures were partially resolved before considering the identity and location of the extra 12 Da structure.

In the MS/MS spectrum of peptide (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* (Figure 3-14b), the type 1 ion pairs I and II+24, and I+12 and II+12, indicated that the extra 12 Da structure was attached to peptide II 43KFDKFKHLKTE53. Series of b and y ions IIb9+12, IIb10+12, IIy1 and IIy2 localized the extra 12 Da structure to the 43KFDKFKHLK51 segment. These, combined with the cross-link site R32, indicated that R32 on 29VLIRLFTGHPETLE42 is cross-linked to 43KFDKFKHLKTE53 with an extra 12 Da structure attached to the 43KFDKFKHLK51 segment, as shown in Figure 3-14a. In the same way, peptide (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)* was determined to be R32 on 29VLIRLFTGHPETLE42 cross-linked to 43KFDKFKHLKTEAE55 with an extra 12 Da structure attached to the 43KFDKFKHL50 segment, as shown in Figure 3-15a. 
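The inference from type 1 ions about which component peptide carries the extra 12 Da structure can be illustrated with a small Python sketch. It assumes that the bridge can break at either end, leaving its 12 Da on one peptide or the other, and that the extra 12 Da structure stays with a single peptide. A second inter-peptide cross-link is therefore not covered. The function names are hypothetical.

```python
def expected_type1_ions(extra_on):
    """Type 1 ions expected if the extra 12 Da structure sits on `extra_on`.

    Each ion is written as (peptide label, nominal mass shift in Da).
    """
    base = {"I": 0, "II": 0}
    base[extra_on] += 12            # the extra Schiff base or cross-link
    ions = set()
    for keeps_bridge in ("I", "II"):
        shifts = dict(base)
        shifts[keeps_bridge] += 12  # the bridge stays on this peptide
        ions.update(shifts.items())
    return ions


def locate_extra(observed):
    """Return the peptides whose full set of expected type 1 ions was observed."""
    observed = set(observed)
    return [p for p in ("I", "II") if expected_type1_ions(p) <= observed]


# Type 1 ions reported for (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)*
observed = [("I", 0), ("II", 24), ("I", 12), ("II", 12)]
print(locate_extra(observed))
```

Under these assumptions the observed pairs I and II+24, and I+12 and II+12, are only consistent with the extra structure on peptide II, since an extra structure on peptide I would require an I+24 ion.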
The similarity between these two partially resolved structures further confirmed the assumption in Chapter 3.3.2.3 that these two cross-linked peptides had the same cross-link sites and the same identity and location of the extra 12 Da structure.

The partially resolved structures and reported residue reactivity can be used to suggest possible cross-link sites and the identity and location of the extra 12 Da structure. As discussed in Chapter 3.3.2.3, R32 is likely cross-linked to K43/46/48/51 or H49. If the cross-link is R32 to H49, the extra 12 Da structure on the 43KFDKFKHL50 segment could be either a Schiff-base modification on K43/46/48, or a second cross-link bridge connecting H37 and K43/46/48. If the cross-link is R32 to K43/46/48/51, the extra 12 Da structure could be a Schiff-base modification on K43/46/48, an intra-peptide cross-link between K43/46/48 and H49, or a second cross-link bridge connecting H37 and K43/46/48 or R32 and K43/46/48. In summary, there is still a significant number of possibilities, and all of them are supported by the MS/MS spectra available at the moment.

As shown both above and in Chapter 3.3.1.3, an extra 12 Da structure adds much complexity when attempting to clarify the structure of the cross-linked peptide. Combining residue reactivity with MS/MS spectra helped to resolve the structure of (α18NYCNα21 ^ α1GIVEα4)*, in which the component peptides were short and each contained residues reactive in either the modification or the cross-linking step. However, this method could not determine the exact structures of peptides (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*, (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* and (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)*, in which the component peptides were long and each contained several residues that were reactive in both steps.

These difficulties in resolving the extra 12 Da structure could become a challenge in other model proteins and in living cells and organisms. 
This is because the average-length Glu-C digest peptides (15 residues)118 are a bit longer than the component peptides of the three unresolved cross-linked peptides, and they likely contain many reactive residues. Table 3-3 illustrates the abundance (numbers) of formaldehyde-reactive residues in an average-length Glu-C peptide, that is, their abundance in proteins119 multiplied by the peptide length. In a 15-residue peptide, the total number of modifiable residues is 1.66, while the total number of residues that are reactive in the cross-linking step is 3.12. The number of reactive residues in the modification step and the cross-linking step, and the fact that some residues can form two cross-links or one cross-link plus one modification92,99,102, together indicate a high probability of forming an extra modification or cross-link in a cross-linked peptide.

Table 3-3. The abundance of formaldehyde reactive residues in proteins and average-length Glu-C peptides (15 residues).

Reactive Step                  Amino Acid   Abundance1   Abundance in a 15-Residue Peptide
Modification                   K            0.059        0.89
Modification & Cross-linking   R            0.051        0.77
Cross-linking                  Y            0.032        0.48
Cross-linking                  N            0.043        0.65
Cross-linking                  Q            0.043        0.65
Cross-linking                  H            0.023        0.35
Cross-linking                  W            0.014        0.22

1 Creighton, T. E. Proteins: Structures and Molecular Properties; second ed.; W. H. Freeman, 1992

Further investigation into more cases of cross-linked peptides with an extra 12 Da structure is necessary in order to find a more effective way of resolving their structures. Many more formaldehyde cross-linked peptides with the extra 12 Da structure need to be identified for a comprehensive understanding. This could be achieved by identification of cross-linked peptides in more model proteins. Additionally, enrichment of cross-linked peptides from current model proteins may also allow the identification of more cross-linked peptides that currently escape detection due to low abundance, or that were identified as candidates but not confirmed due to poor quality MS/MS spectra.

3.3.2.6 The Physiological Relevance and Reaction Chemistry of Identified Cross-links in Myoglobin

The three cross-linked peptides identified in the formaldehyde treated 6 hr myoglobin sample appeared in the formaldehyde treated 0.5 hr and 2 hr samples as well, with similar LC retention times and MS/MS spectra. Therefore, the observed cross-links on Arg (R), Lys (K) and Tyr (Y) residues in myoglobin were produced within 0.5 hr of exposure to formaldehyde. More specifically, these cross-links were formed within 30 min of exposure to 1% formaldehyde, at 37 °C and pH 7.5, conditions that closely resemble those applied to living cells and organisms54-61. Therefore, cross-links would very likely form on Arg, Lys or Tyr during the formaldehyde cross-linking of native proteins. Compared to the studies in the insulin model system, the myoglobin model system revealed one more formaldehyde-reactive residue, Arg, under near-physiological conditions.

The reaction chemistry was determined by the progression of both the modification and the cross-linking step on cross-link sites in Chapter 3.3.1.4. Here, the same analysis is applied to the myoglobin model system. The cross-linking step in which the observed cross-links in myoglobin were formed was known to proceed within 0.5 hr of formaldehyde exposure. However, only the cross-link site on one end of the cross-link bridge was determined for each cross-linked peptide due to the complexity of the MS/MS spectra caused by multiple cross-link sites (see Chapter 3.3.2.4) or the extra 12 Da structure (see Chapter 3.3.2.5). The lack of information on the progression of the modification step on unassigned cross-link sites obstructed clarification of the two-step reaction that produced the cross-links observed in myoglobin. 
In order to study the reaction chemistry in myoglobin or other proteins, challenges in the determination of cross-link and modification sites in complex cross-linked peptides need to be overcome in the future.

3.4 Conclusions and Outlook

In this chapter the method to identify cross-linked peptides and localize cross-link sites was refined in two model protein systems: insulin (disulfide bonds reduced during digestion) and myoglobin. The whole approach allowed the identification of 5 cross-linked insulin peptides and 3 cross-linked myoglobin peptides, and the partial localization of the cross-link sites (underlined residues): (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)*, β22RGFFYTPKAβ30 ^ α1GIVEα4, α18NYCNα21 ^ α1GIVEα4, (α18NYCNα21 ^ α1GIVEα4)*, β22RGFFYTPKAβ30 ^ α18NYCNα21, 138LFRNDIAAKYKE149 ^ 150LGFQG154, (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTE53)* and (29VLIRLFTGHPETLE42 ^ 43KFDKFKHLKTEAE55)*. Therefore the method has proven to be readily applicable to other model proteins.

The fragmentation patterns of cross-linked peptides observed in Chapter 2 were further confirmed in these model proteins. Fragmentations occurred at both the cross-link bridge and in the peptide backbones, generating two or three types of fragment ions. The improved understanding of the fragmentation patterns could be applied to bioinformatics software that automates the data interpretation of cross-linking experiments, and to the manual interpretation of MS/MS spectra from experiments in living cells and organisms.

The localization of cross-link sites by type 2 and type 3 ions revealed the N-terminus to Tyr/Asn and Lys to Tyr cross-links, cross-links on Lys, Tyr and Arg, and two cross-links forming on one single N-terminus in proteins. This valuable and direct information about residue reactivity might also be used in bioinformatics software. 
Furthermore, monitoring the progression of both the modification and the cross-linking step on reactive residues revealed the reactivity of the N-terminus and Lys in the modification step and the reactivity of Tyr and Asn in the cross-linking step. The reaction chemistry was revealed in proteins for the first time, and the results were consistent with studies conducted in simpler model systems.

As the size of the model protein increased, complexity was added to both the MS and MS/MS data, and prompted refinement of the method. MatLab programming was applied to speed up the processing of MS data. For isomeric cross-linked peptides with multiple cross-link sites and complex fragmentation patterns, DOM values and a bar graph visualization were introduced to localize the cross-link.

Increased complexity of the model protein also revealed remaining issues which have to be solved: false positive identification of candidates of cross-linked peptides due to overlapping m/z values in the complex peptide mixture, and insufficient information to verify candidates due to poor quality MS/MS spectra. To help relieve these issues in the future, an enrichment method should be developed to increase the proportion of cross-linked peptides and reduce the complexity of the peptide mixture.

As observed in both model proteins, an extra 12 Da structure added complexity to the clarification of the cross-linked peptide structures. Since this issue would very likely occur in the average-length cross-linked peptides in model protein systems and physiologically relevant systems, the structure determination needs to be fully understood by investigation of more cross-linked peptides. In order to identify more cross-linked peptides, the application of the approach to more model proteins or the enrichment of low abundance cross-linked peptides in current model proteins is necessary.  
4 Conclusions and Future Perspectives

The formaldehyde cross-linking approach is a powerful tool to study protein-protein interactions in living cells and organisms, which could reveal both interacting proteins and the geometry of interactions. However, its potential to map the geometry of protein-protein interactions is limited by challenges in the identification of cross-linked peptides in digests of cross-linked proteins. This study of formaldehyde cross-linking in model proteins is aimed at establishing a method to study the MS properties of cross-linked peptides and the chemistry of cross-linking reactions, in order to facilitate surmounting these challenges.

A method to identify and characterize cross-linked peptides in model proteins has been established as follows. Model proteins are cross-linked, with yields confirmed by SDS-PAGE or MS spectra. Protein samples are digested and analyzed by LC-MS/MS. Candidates of cross-linked peptides are identified by matching a theoretical list of possible cross-linked peptides to experimental MS signals using a MatLab program (Appendix A.3). MS/MS spectra of candidates are collected and interpreted by matching signals to proposed fragment ions. A candidate is confirmed to be cross-linked if all of the proposed structural components (component peptides, cross-link and extra modifications or cross-links) are confirmed by the MS/MS spectrum. Series of fragment ions which contain the cross-link bridge localize the cross-link to individual residues or small segments on the component peptides. Multiple cross-link sites on isomeric cross-linked peptides can be localized by a bar graph visualization. After the localization of cross-link sites to individual residues at both ends of the cross-link bridge, examining the progression of both the modification and the cross-linking steps on cross-link sites reveals the chemistry of the two-step cross-linking reaction and reaction schemes. 
Cross-linked peptides identified in this study are formed in non-physiological protein models and do not reflect physiologically relevant geometry information. However, valuable knowledge about the formaldehyde cross-linking reactions is gained directly in proteins. The deeper understanding of the reactions can be used in several ways toward the ultimate goal of revealing the geometry of protein-protein interactions in their cellular environments by identifying cross-linked peptides and determining cross-link sites.

A number of bioinformatics programs have been developed to automate the identification of cross-linked peptides and the determination of cross-link sites in cross-linking experiments in living cells68,71,80-90. However, these programs have not been combined with the formaldehyde cross-linking approach due to a limited understanding of the residue reactivity and the fragmentation characteristics. In this study, the N-terminus to Tyr/Asn and Lys to Tyr cross-links, cross-links on Lys, Tyr or Arg, and two cross-links forming on a single N-terminus are revealed in proteins, under reaction conditions that closely resemble those applied when studying living cells and organisms. These types of cross-links very likely form during the formaldehyde cross-linking of native proteins in their cellular environment; therefore, they can be submitted to bioinformatics software as residue reactivity parameters. In addition, the fragmentation patterns of formaldehyde cross-linked peptides are observed for the first time. This information can be used to establish the fragmentation model part of the bioinformatics software. However, the residue reactivity and fragmentation patterns have been studied in fewer than 10 cross-linked peptides so far. Moreover, the structure determination of a cross-linked peptide with an extra 12 Da structure via the CID MS/MS spectrum is still a puzzle, as only 4 cases have been investigated. 
For a comprehensive understanding of the reactive residues and a fragmentation model that would contribute to the development of unbiased bioinformatics software, investigation of more model proteins with the established method is necessary in the near future.

An enrichment method is usually necessary to facilitate the identification of low abundance cross-linked peptides from the complex digest of cross-linked native proteins67,70-72. Additionally, the complexity of MS data even in the myoglobin model system suggests a need for enrichment methods. When model proteins of equivalent and larger size are investigated, enrichment could reduce false positive identification of candidate cross-linked peptides and improve the quality of MS/MS spectra. The identification of cross-linked insulin and myoglobin peptides opens the door to testing different enrichment methods in model proteins. Chromatography methods such as strong cation exchange (SCX), strong anion exchange (SAX) and size exclusion chromatography (SEC) can be applied to the digest of formaldehyde treated insulin or myoglobin, to determine which of these, if any, can increase the proportion of formaldehyde cross-linked peptides in the peptide mixture. If an enrichment method proves to work in model proteins, it can also be used to enrich formaldehyde cross-linked peptides from experiments in living cells and organisms.

Besides the development of enrichment methods and bioinformatics software, an interactome and interface (2IP) strategy has also been developed to facilitate mapping the geometry of cross-linked interacting proteins120. In this strategy, cross-linked proteins are cleaved by chemicals and digested by enzymes into peptides of different sizes. The comparative analysis of these peptides allows localization of cross-links to a digest-sized peptide segment. 
This low-resolution localization of cross-link sites has allowed low-resolution geometry mapping of protein-protein interactions in several model proteins. However, it does not include the direct verification of identified cross-linked peptides and determination of cross-link sites by MS/MS spectra. Our knowledge about the MS/MS characteristics of cross-linked peptides can be combined with this strategy and applied to cross-linked native proteins, for a further confirmation of the identification of cross-linked peptides by MS/MS spectra, and high resolution mapping of the geometry of protein-protein interactions by localization of the exact cross-link sites.

Aside from assisting the development of experimental strategies and bioinformatics software for the formaldehyde cross-linking approach, my study also suggests considerations for the general experimental design of protein cross-linking. As shown in the insulin model system, reducing or not reducing the disulfide bonds, and digesting or not digesting, provide complementary information about cross-linked peptides and cross-linking reactions. More specifically, a smaller peptide mixture was produced when disulfide bonds were not reduced during the enzymatic digestion, which is suitable for method development but complicates the MS/MS spectra of cross-linked peptides. Later, reducing disulfide bonds produced a different peptide mixture that not only provided a good model system for method verification and refinement, but also allowed the localization of cross-links to individual residues due to the reduced complexity of the MS/MS spectra of cross-linked peptides. Moreover, although cross-linked insulin was digested for identification and characterization of cross-linked peptides, monitoring the progression of modifications on Gα1, Yα19 and Nα18 was performed on the whole insulin α-chain, which was easier to ionize and fragment than short peptides after digestion. 
These observations suggest that for more comprehensive information about a model protein or a physiological system, digestion with or without reducing disulfide bonds and different proteolytic enzymes or chemical cleavage reagents can be used to produce peptides and peptide mixtures of different complexity. In fact, the 2IP strategy is an excellent example of this idea, because it is based on complementary information from longer peptides produced by chemical cleavage and shorter peptides produced by enzymatic digestion. Therefore, altering the routine workflow of proteomic studies (mentioned in 1.1.1) is a possible way to creatively overcome challenges in protein cross-linking studies.

Last but not least, different MS instrumentation can be used to optimize the detection of cross-linked peptides. Ion mobility spectrometry (IMS), which separates gaseous ions according to their collision cross-sections, is a possible alternative to simplify peptide mixtures produced by cross-linked proteins. IMS coupled with MS has already been used to separate isomeric modified peptides with different modification sites121. It is therefore a promising technique to separate isomeric cross-linked peptides such that their structures can be resolved individually in the subsequent MS/MS analysis. In this way, individual structures of the isomeric cross-linked peptides 138LFRNDIAAKYKE149 ^ 150LGFQG154 and (β22RGFFYTPKAβ30 ^ β22RGFFYTPKAβ30)* could be resolved. Currently, the application of IMS-MS to cross-linking samples is mainly for simple mixtures such as modified standard peptides121 or intact cross-linked model proteins122. This technique has yet to be adapted to more complex peptide mixtures.

Applying different fragmentation methods to cross-linked peptides can also be considered. 
Electron-capture dissociation (ECD) and electron-transfer dissociation (ETD) are suitable for long, highly charged peptides, and tend to cleave along peptide backbones rather than at modifications or cross-links. Collision-induced dissociation (CID), infrared multiphoton dissociation (IRMPD), ECD and ETD have been used together for complementary and unambiguous determination of cross-link sites for other cross-linkers (123-125). Instruments equipped with these fragmentation techniques are therefore an alternative option for the analysis of formaldehyde cross-linking samples.

In conclusion, a method has been established in this study to identify and characterize formaldehyde-induced cross-links in model proteins. Knowledge gained in this study can be used for the development of enrichment methods and bioinformatics software. These future directions, combined with proper instrumentation and careful experimental design, can facilitate the identification of formaldehyde cross-linked peptides and the determination of cross-link sites in cross-linked native proteins. This should allow high-resolution mapping of the geometry of protein-protein interactions in their native cellular environment.

The formaldehyde cross-linking approach has been successfully applied to the study of various living cells and organisms with versatile experimental designs. Therefore, it is not difficult to envision that questions of which proteins interact, and how they interact, in various biological systems can be answered by this approach. Furthermore, clinical biopsies preserved by formaldehyde cross-linking can be studied to examine abnormal changes in protein-protein interactions associated with disease states. Considering the variety of tissue banks and the number of stored disease tissues, studying formaldehyde cross-linked proteins stands a good chance of revealing characteristic abnormal protein-protein interactions associated with various diseases.

References

(1) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64.
(2) Karas, M.; Hillenkamp, F. Anal Chem 1988, 60, 2299.
(3) Chait, B. T. Structure 1994, 2, 465.
(4) Wilm, M.; Shevchenko, A.; Houthaeve, T.; Breit, S.; Schweigerer, L.; Fotsis, T.; Mann, M. Nature 1996, 379, 466.
(5) Belov, M. E.; Gorshkov, M. V.; Udseth, H. R.; Anderson, G. A.; Smith, R. D. Anal Chem 2000, 72, 2271.
(6) McLafferty, F. W.; Fridriksson, E. K.; Horn, D. M.; Lewis, M. A.; Zubarev, R. A. Science 1999, 284, 1289.
(7) Yates, J. R.; McCormack, A. L.; Eng, J. Abstr Pap Am Chem S 1994, 207, 101.
(8) Henzel, W.; Watanabe, C.; Stults, J. J Am Soc Mass Spectr 2003, 14, 931.
(9) Pan, S. Q.; Gu, S.; Bradbury, E. M.; Chen, X. Anal Chem 2003, 75, 1316.
(10) Kleno, T. G.; Leonardsen, L. R.; Kjeldal, H. O.; Laursen, S. M.; Jensen, O. N.; Baunsgaard, D. Proteomics 2004, 4, 868.
(11) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551.
(12) Eng, J. K.; McCormack, A. L.; Yates, J. R. J Am Soc Mass Spectr 1994, 5, 976.
(13) Craig, R.; Beavis, R. C. Bioinformatics 2004, 20, 1466.
(14) Biemann, K.; Scoble, H. A. Science 1987, 237, 992.
(15) Eckart, K. Mass Spectrom Rev 1994, 13, 23.
(16) Roepstorff, P.; Fohlman, J. Biomedical Mass Spectrometry 1984, 11, 601.
(17) Biemann, K. Biomed Environ Mass 1988, 16, 99.
(18) Papayannopoulos, I. A. Mass Spectrom Rev 1995, 14, 49.
(19) Paizs, B.; Suhai, S. Mass Spectrom Rev 2005, 24, 508.
(20) Ballard, K. D.; Gaskell, S. J. International Journal of Mass Spectrometry and Ion Processes 1991, 111, 173.
(21) Gaskell, S. J.; Bolgar, M. S.; Cox, K. A. Methods in Protein Structure Analysis 1995, 141.
(22) Wysocki, V. H.; Resing, K. A.; Zhang, Q. F.; Cheng, G. L. Methods 2005, 35, 211.
(23) Johnson, H.; Eyers, C. E. In LC-MS/MS in Proteomics; Cutillas, P. R., Timms, J. F., Eds.; Humana Press: 2010; Vol. 658, p 93.
(24) Tang, X. J.; Boyd, R. K. Rapid Commun Mass Sp 1992, 6, 651.
(25) Cox, K. A.; Gaskell, S. J.; Morris, M.; Whiting, A. J Am Soc Mass Spectr 1996, 7, 759.
(26) Wysocki, V. H.; Tsaprailis, G.; Smith, L. L.; Breci, L. A. J Mass Spectrom 2000, 35, 1399.
(27) Tsaprailis, G.; Nair, H.; Somogyi, Á.; Wysocki, V. H.; Zhong, W.; Futrell, J. H.; Summerfield, S. G.; Gaskell, S. J. J Am Chem Soc 1999, 121, 5142.
(28) Tsaprailis, G.; Somogyi, A.; Nikolaev, E. N.; Wysocki, V. H. International Journal of Mass Spectrometry 2000, 196, 467.
(29) Farrugia, J. M.; Taverner, T.; O'Hair, R. A. J. International Journal of Mass Spectrometry 2001, 209, 99.
(30) Kollmann, K.; Mutenda, K. E.; Balleininger, M.; Eckermann, E.; von Figura, K.; Schmidt, B.; Lübke, T. Proteomics 2005, 5, 3966.
(31) Foster, L. J.; de Hoog, C. L.; Zhang, Y.; Xie, X.; Mootha, V. K.; Mann, M. Cell 2006, 125, 187.
(32) Dosemeci, A.; Makusky, A. J.; Jankowska-Stephens, E.; Yang, X.; Slotta, D. J.; Markey, S. P. Mol Cell Proteomics 2007, 6, 1749.
(33) Yan, W.; Aebersold, R.; Raines, E. W. J Proteomics 2009, 72, 4.
(34) Ghaemmaghami, S.; Huh, W.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O'Shea, E. K.; Weissman, J. S. Nature 2003, 425, 737.
(35) de Godoy, L. M.; Olsen, J. V.; Cox, J.; Nielsen, M. L.; Hubner, N. C.; Frohlich, F.; Walther, T. C.; Mann, M. Nature 2008, 455, 1251.
(36) Hall, D. B.; Struhl, K. J Biol Chem 2002, 277, 46043.
(37) Blagoev, B.; Kratchmarova, I.; Ong, S. E.; Nielsen, M.; Foster, L. J.; Mann, M. Nat Biotechnol 2003, 21, 315.
(38) Gingras, A. C.; Gstaiger, M.; Raught, B.; Aebersold, R. Nat Rev Mol Cell Bio 2007, 8, 645.
(39) Vasilescu, J.; Figeys, D. Curr Opin Biotech 2006, 17, 394.
(40) Krogan, N. J.; Cagney, G.; Yu, H. Y.; Zhong, G. Q.; Guo, X. H.; Ignatchenko, A.; Li, J.; Pu, S. Y.; Datta, N.; Tikuisis, A. P.; Punna, T.; Peregrin-Alvarez, J. M.; Shales, M.; Zhang, X.; Davey, M.; Robinson, M. D.; Paccanaro, A.; Bray, J. E.; Sheung, A.; Beattie, B.; Richards, D. P.; Canadien, V.; Lalev, A.; Mena, F.; Wong, P.; Starostine, A.; Canete, M. M.; Vlasblom, J.; Wu, S.; Orsi, C.; Collins, S. R.; Chandran, S.; Haw, R.; Rilstone, J. J.; Gandi, K.; Thompson, N. J.; Musso, G.; St Onge, P.; Ghanny, S.; Lam, M. H. Y.; Butland, G.; Altaf-Ui, A. M.; Kanaya, S.; Shilatifard, A.; O'Shea, E.; Weissman, J. S.; Ingles, C. J.; Hughes, T. R.; Parkinson, J.; Gerstein, M.; Wodak, S. J.; Emili, A.; Greenblatt, J. F. Nature 2006, 440, 637.
(41) Butland, G.; Peregrin-Alvarez, J. M.; Li, J.; Yang, W. H.; Yang, X. C.; Canadien, V.; Starostine, A.; Richards, D.; Beattie, B.; Krogan, N.; Davey, M.; Parkinson, J.; Greenblatt, J.; Emili, A. Nature 2005, 433, 531.
(42) Bouwmeester, T. Nat Cell Biol 2004, 6.
(43) Sutherland, B. W.; Toews, J.; Kast, J. J Mass Spectrom 2008, 43, 699.
(44) Sinz, A. Anal Bioanal Chem 2010, 397, 3433.
(45) Back, J. W.; de Jong, L.; Muijsers, A. O.; de Koster, C. G. Journal of Molecular Biology 2003, 331, 303.
(46) Melcher, K. Curr Protein Pept Sci 2004, 5, 287.
(47) Aebersold, R.; Mann, M. Nature 2003, 422, 198.
(48) Staros, J. V.; Anjaneyulu, P. S. R. Method Enzymol 1989, 172, 609.
(49) Staros, J. V.; Kotite, N. J.; Cunningham, L. W. Method Enzymol 1992, 215, 403.
(50) Tomaska, L.; Resnick, R. J. J Biol Chem 1993, 268, 5317.
(51) Suchanek, M.; Radzikowska, A.; Thiele, C. Nat Methods 2005, 2, 261.
(52) Kobayashi, T.; Hearing, V. J. J Cell Sci 2007, 120, 4261.
(53) Zhang, H. Z.; Tang, X. T.; Munske, G. R.; Tolic, N.; Anderson, G. A.; Bruce, J. E. Mol Cell Proteomics 2009, 8, 409.
(54) Layh-Schmitt, G.; Podtelejnikov, A.; Mann, M. Microbiology 2000, 146 (Pt 3), 741.
(55) Vasilescu, J.; Guo, X.; Kast, J. Proteomics 2004, 4, 3845.
(56) Schmitt-Ulms, G.; Hansen, K.; Liu, J. L.; Cowdrey, C.; Yang, J.; DeArmond, S. J.; Cohen, F. E.; Prusiner, S. B.; Baldwin, M. A. Nat Biotechnol 2004, 22, 724.
(57) Guerrero, C.; Tagwerker, C.; Kaiser, P.; Huang, L. Mol Cell Proteomics 2006, 5, 366.
(58) Tagwerker, C.; Flick, K.; Cui, M.; Guerrero, C.; Dou, Y.; Auer, B.; Baldi, P.; Huang, L.; Kaiser, P. Mol Cell Proteomics 2006, 5, 737.
(59) Hájek, P.; Chomyn, A.; Attardi, G. J Biol Chem 2007, 282, 5670.
(60) Bai, Y.; Markham, K.; Chen, F. S.; Weerasekera, R.; Watts, J.; Horne, P.; Wakutani, Y.; Bagshaw, R.; Mathews, P. M.; Fraser, P. E.; Westaway, D.; George-Hyslop, P. S.; Schmitt-Ulms, G. Mol Cell Proteomics 2008, 7, 15.
(61) Klockenbusch, C.; Kast, J. J Biomed Biotechnol 2010.
(62) Meunier, L.; Usherwood, Y. K.; Chung, K. T.; Hendershot, L. M. Mol Biol Cell 2002, 13, 4456.
(63) Agou, F.; Ye, F.; Véron, M. In Protein-Protein Interactions; Fu, H., Ed.; Humana Press: 2004; Vol. 261, p 427.
(64) Zeng, P. Y.; Vakoc, C. R.; Chen, Z. C.; Blobel, G. A.; Berger, S. L. Biotechniques 2006, 41, 694.
(65) Bomgarden, R. D. Genet Eng Biotechn N 2008, 28, 24.
(66) Schilling, B.; Row, R. H.; Gibson, B. W.; Guo, X.; Young, M. M. J Am Soc Mass Spectr 2003, 14, 834.
(67) Leitner, A.; Walzthoeni, T.; Kahraman, A.; Herzog, F.; Rinner, O.; Beck, M.; Aebersold, R. Mol Cell Proteomics 2010, 9, 1634.
(68) Mayne, S. L. N.; Patterton, H.-G. Briefings in Bioinformatics 2011.
(69) Fabris, D.; Yu, E. T. J Mass Spectrom 2010, 45, 841.
(70) Maiolica, A.; Cittaro, D.; Borsotti, D.; Sennels, L.; Ciferri, C.; Tarricone, C.; Musacchio, A.; Rappsilber, J. Mol Cell Proteomics 2007, 6, 2200.
(71) Rinner, O.; Seebacher, J.; Walzthoeni, T.; Mueller, L.; Beck, M.; Schmidt, A.; Mueller, M.; Aebersold, R. Nat Methods 2008, 5, 748.
(72) Chen, Z. A.; Jawhari, A.; Fischer, L.; Buchen, C.; Tahir, S.; Kamenski, T.; Rasmussen, M.; Lariviere, L.; Bukowski-Wills, J.-C.; Nilges, M.; Cramer, P.; Rappsilber, J. EMBO J 2010, 29, 717.
(73) Trester-Zedlitz, M.; Kamada, K.; Burley, S. K.; Fenyo, D.; Chait, B. T.; Muir, T. W. J Am Chem Soc 2003, 125, 2416.
(74) Sinz, A.; Kalkhof, S.; Ihling, C. J Am Soc Mass Spectr 2005, 16, 1921.
(75) Muller, D. R.; Schindler, P.; Towbin, H.; Wirth, U.; Voshol, H.; Hoving, S.; Steinmetz, M. O. Anal Chem 2001, 73, 1927.
(76) Schulz, D. M.; Kalkhof, S.; Schmidt, A.; Ihling, C.; Stingl, C.; Mechtler, K.; Zschoernig, O.; Sinz, A. Proteins 2007, 69, 254.
(77) Kasper, P. T.; Back, J. W.; Vitale, M.; Hartog, A. F.; Roseboom, W.; de Koning, L. J.; van Maarseveen, J. H.; Muijsers, A. O.; de Koster, C. G.; de Jong, L. Chembiochem 2007, 8, 1281.
(78) Soderblom, E. J.; Goshe, M. B. Anal Chem 2006, 78, 8059.
(79) Petrotchenko, E. V.; Xiao, K. H.; Cable, J.; Chen, Y. W.; Dokholyan, N. V.; Borchers, C. H. Mol Cell Proteomics 2009, 8, 273.
(80) McIlwain, S.; Draghicescu, P.; Singh, P.; Goodlett, D. R.; Noble, W. S. J Proteome Res 2010, 9, 2488.
(81) Lee, Y. J.; Lackner, L. L.; Nunnari, J. M.; Phinney, B. S. J Proteome Res 2007, 6, 3908.
(82) Lee, Y. J. J Am Soc Mass Spectr 2009, 20, 1896.
(83) Nadeau, O. W.; Wyckoff, G. J.; Paschall, J. E.; Artigues, A.; Sage, J.; Villar, M. T.; Carlson, G. M. Mol Cell Proteomics 2008, 7, 739.
(84) Heymann, M.; Paramelle, D.; Subra, G.; Forest, E.; Martinez, J.; Geourjon, C.; Deleage, G. Bioinformatics 2008, 24, 2782.
(85) Anderson, G. A.; Tolic, N.; Tang, X. T.; Zheng, C. X.; Bruce, J. E. J Proteome Res 2007, 6, 3412.
(86) de Koning, L. J.; Kasper, P. T.; Back, J. W.; Nessen, M. A.; Vanrobaeys, F.; Van Beeumen, J.; Gherardi, E.; de Koster, C. G.; de Jong, L. Febs J 2006, 273, 281.
(87) Gao, Q. X.; Xue, S.; Doneanu, C. E.; Shaffer, S. A.; Goodlett, D. R.; Nelson, S. D. Anal Chem 2006, 78, 2145.
(88) Tang, Y.; Chen, Y. F.; Lichti, C. F.; Hall, R. A.; Raney, K. D.; Jennings, S. F. Bmc Bioinformatics 2005, 6.
(89) Peri, S.; Steen, H.; Pandey, A. Trends Biochem Sci 2001, 26, 687.
(90) Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal Chem 1999, 71, 2871.
(91) Barton, S. J.; Richardson, S.; Perkins, D. N.; Bellahn, I.; Bryant, T. N.; Whittaker, J. C. Anal Chem 2007, 79, 5601.
(92) French, D.; Edsall, J. T. Adv Protein Chem 1945, 2, 277.
(93) Fraenkel-Conrat, H.; Brandon, B. A.; Olcott, H. S. J Biol Chem 1947, 168, 99.
(94) Fraenkel-Conrat, H.; Cooper, M.; Olcott, H. S. J Am Chem Soc 1945, 67, 950.
(95) Fraenkel-Conrat, H.; Mecham, D. K. J Biol Chem 1949, 177, 477.
(96) Fraenkel-Conrat, H.; Olcott, H. S. J Am Chem Soc 1946, 68, 34.
(97) Fraenkel-Conrat, H.; Olcott, H. S. J Biol Chem 1948, 174, 827.
(98) Fraenkel-Conrat, H.; Olcott, H. S. J Am Chem Soc 1948, 70, 2673.
(99) Kelly, D. P.; Dewar, M. K.; Johns, R. B.; Wei-Let, S.; Yates, J. F. Adv Exp Med Biol 1977, 86A, 641.
(100) Heck, A. J.; Bonnici, P. J.; Breukink, E.; Morris, D.; Wills, M. Chemistry 2001, 7, 910.
(101) Metz, B.; Kersten, G. F. A.; Baart, G. J. E.; de Jong, A.; Meiring, H.; ten Hove, J.; van Steenbergen, M. J.; Hennink, W. E.; Crommelin, D. J. A.; Jiskoot, W. Bioconjugate Chem 2006, 17, 815.
(102) Metz, B.; Kersten, G. F. A.; Hoogerhout, P.; Brugghe, H. F.; Timmermans, H. A. M.; de Jong, A.; Meiring, H.; ten Hove, J.; Hennink, W. E.; Crommelin, D. J. A.; Jiskoot, W. J Biol Chem 2004, 279, 6235.
(103) Toews, J.; Rogalski, J. C.; Clark, T. J.; Kast, J. Anal Chim Acta 2008, 618, 168.
(104) Toews, J.; Rogalski, J. C.; Kast, J. Anal Chim Acta 2010, 676, 60.
(105) Nelson, D. L.; Cox, M. M. Lehninger Principles of Biochemistry; 3rd ed.; W. H. Freeman: New York, 2000.
(106) Nolan, C.; Margoliash, E.; Peterson, J. D.; Steiner, D. F. J Biol Chem 1971, 246, 2780.
(107) Tang, X. T.; Bruce, J. E. Mol Biosyst 2010, 6, 939.
(108) Keller, B. O.; Suj, J.; Young, A. B.; Whittal, R. M. Anal Chim Acta 2008, 627, 71.
(109) Tang, X. J.; Thibault, P.; Boyd, R. K. Anal Chem 1993, 65, 2824.
(110) Lee, Y. J. Mol Biosyst 2008, 4, 816.
(111) Geetha, T.; Langlais, P.; Luo, M.; Mapes, R.; Lefort, N.; Chen, S.-C.; Mandarino, L.; Yi, Z. J Am Soc Mass Spectr 2011, 22, 457.
(112) Diepen, M. G. W. T.-v., University of York, 1996.
(113) Layloff, T. In American Genomic/Proteomic Technology 2001; Vol. 1, p 10.
(114) van den Oord, A. H. A.; Wesdorp, J. J.; van Dam, A. F.; Verheij, J. A. European Journal of Biochemistry 1969, 10, 140.
(115) Hardman, K. D.; Eylar, E. H.; Ray, D. K.; Banaszak, L. J.; Gurd, F. R. N. J Biol Chem 1966, 241, 432.
(116) Engel, B. J.; Pan, P.; Reid, G. E.; Wells, J. M.; McLuckey, S. A. International Journal of Mass Spectrometry 2002, 219, 171.
(117) Hogan, J. M.; McLuckey, S. A. J Mass Spectrom 2003, 38, 245.
(118) Roland Kellner, F. L., Helmut E. Meyer Chemical and enzymatic fragmentation of proteins; second ed.; Wiley-VCH: New York, 1999.
(119) Creighton, T. E. Proteins: Structures and Molecular Properties; second ed.; W. H. Freeman, 1992.
(120) Weerasekera, R.; She, Y. M.; Markham, K. A.; Bai, Y.; Opalka, N.; Orlicky, S.; Sicheri, F.; Kislinger, T.; Schmitt-Ulms, G. Proteomics 2007, 7, 3835.
(121) Santos, L. F. A.; Iglesias, A. H.; Pilau, E. J.; Gomes, A. F.; Gozzo, F. C. J Am Soc Mass Spectr 2010, 21, 2062.
(122) Smith, D. P.; Anderson, J.; Plante, J.; Ashcroft, A. E.; Radford, S. E.; Wilson, A. J.; Parker, M. J. Chem Commun 2008, 5728.
(123) Novak, P.; Haskins, W. E.; Ayson, M. J.; Jacobsen, R. B.; Schoeniger, J. S.; Leavell, M. D.; Young, M. M.; Kruppa, G. H. Anal Chem 2005, 77, 5101.
(124) Trnka, M. J.; Burlingame, A. L. Mol Cell Proteomics 2010, 9, 2306.
(125) Santos, L. F. A.; Eberlin, M. N.; Gozzo, F. C. J Mass Spectrom 2011, 46, 262.

Appendices

A.1 The List of Natural Amino Acids

Shown in the following are the natural amino acids, including their abbreviations and residue masses. The residue mass of an amino acid is calculated as its molecular weight minus that of water.
Name            3-Letter Abbreviation   1-Letter Abbreviation   Residue Mass (Da)
Alanine         Ala                     A                       71.04
Arginine        Arg                     R                       156.10
Asparagine      Asn                     N                       114.04
Aspartic acid   Asp                     D                       115.03
Cysteine        Cys                     C                       103.01
Glutamic acid   Glu                     E                       129.04
Glutamine       Gln                     Q                       128.06
Glycine         Gly                     G                       57.02
Histidine       His                     H                       137.06
Isoleucine      Ile                     I                       113.08
Leucine         Leu                     L                       113.08
Lysine          Lys                     K                       128.09
Methionine      Met                     M                       131.04
Phenylalanine   Phe                     F                       147.07
Proline         Pro                     P                       97.05
Serine          Ser                     S                       87.03
Threonine       Thr                     T                       101.05
Tryptophan      Trp                     W                       186.08
Tyrosine        Tyr                     Y                       163.06
Valine          Val                     V                       99.07

A.2 The List of Assigned MS Signals in Figure 3-12

Shown in the following table are the origin, mass, m/z and charge state (z) of unmodified and modified myoglobin peptides assigned to 55 of all the MS signals in Figure 3-12. Where a peptide was observed in two charge states, the second m/z value is listed on the line below it.

Position in Myoglobin   Origin              Mass        m/z       z
2-7                     576.24              576.24      577.26    1
2-7                     576.24+12           588.24      589.23    1
2-7                     576.24+42.016       618.256     619.25    1
2-7                     576.24+90.048       666.288     667.18    1
2-7                     576.24+72.032       648.272     649.3     1
8-19                    1484.78             1484.78     743.41    2
8-19                    1484.78+12          1496.78     749.41    2
8-19                    1484.78+30.016      1514.796    758.42    2
20-28                   896.4               896.4       896.4     1
20-28                   896.4+24            920.4       920.4     1
20-28                   896.4+36            932.4       932.4     1
29-39                   1280.72             1280.72     641.37    2
29-39                   1280.72+12          1292.72     647.36    2
29-39                   1280.72+30.016      1310.736    656.35    2
29-42                   1623.9              1623.9      542.32    3
                                                        812.97    2
29-42                   1623.9+30.016       1653.916    552.29    3
29-42                   1623.9+24           1647.9      550.3     3
                                                        824.94    2
29-42                   1623.9+54.016       1677.916    560.33    3
43-53                   1419.79             1419.79     474.26    3
43-53                   1419.79+12          1431.79     478.27    3
43-53                   1419.79+42.016      1461.806    488.28    3
43-53                   1419.79+54.016      1473.806    492.27    3
43-53                   1419.79+72.032      1491.822    498.27    3
43-55                   1619.87             1619.87     540.96    3
43-55                   1619.87+12          1631.87     544.95    3
43-55                   1619.87+42.016      1661.886    554.97    3
43-55                   1619.87+72.032      1691.902    564.99    3
56-84                   3124.75             3124.75     625.97    5
56-84                   3124.75+42.016      3166.766    634.37    5
56-84                   3124.75+36          3160.75     633.16    5
61-84                   2578.51             2578.51     645.64    4
61-84                   2578.51+42.016      2620.526    656.16    4
61-86                   2778.59             2778.59     695.67    4
                                                        556.73    5
61-86                   2778.59+12          2790.59     698.68    4
61-86                   2778.59+42.016      2820.606    706.17    4
                                                        565.15    5
87-106                  2314.35             2314.35     463.88    5
87-106                  2314.35+42.016      2356.366    472.29    5
107-137                 3275.64             3275.64     819.93    4
107-137                 3275.64+12          3287.64     822.95    4
107-137                 3275.64+30.016      3305.656    827.45    4
107-137                 3275.64+42.016      3317.656    830.45    4
107-137                 3275.64+72.032      3347.672    837.98    4
138-149                 1466.79             1466.79     489.93    3
138-149                 1466.79+12          1478.79     493.94    3
138-149                 1466.79+30.016      1496.806    499.94    3
138-149                 1466.79+24          1490.79     497.94    3
138-149                 1466.79+42.016      1508.806    503.95    3
138-149                 1466.79+36          1502.79     501.94    3
138-149                 1466.79+90.048      1556.838    519.94    3
138-149                 1466.79+54.016      1520.806    507.94    3
138-149                 1466.79+72.032      1538.822    513.92    3
150-154                 520.26              520.26      521.25    1
150-154                 520.26+12           532.26      533.27    1
150-154                 520.26+30.016       550.276     551.28    1
150-154                 520.26+54.016       574.276     575.32    1

A.3 The MATLAB Program for Data Processing in Chapter 3.3.2.2

Shown as follows is the MATLAB program used to speed up data processing for the identification of cross-linked peptides in the formaldehyde-treated model protein. It builds the theoretical list of possible cross-linked peptides by combining unmodified and modified peptides, compares this list to the LC-MS signals, and generates a list of matches as the output. Lines starting with the % symbol are not part of the program, but annotations.
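% The program expects two matrices to exist in the workspace:
%   ExtPep - column 1 holds the masses of the observed unmodified and modified peptides
%   LCMS   - column 1 holds the masses of the experimental LC-MS signals
% The placeholders x, y and z must be set by the user before running.

% Make the theoretical list of possible cross-linked peptides
NoPep = x;   % x is the number of observed unmodified and modified peptides
a = 0;
for j = 1:NoPep
    for k = j:NoPep
        a = a + 1;
        ExtPepComb(a,1) = j;                          % index of the first peptide
        ExtPepComb(a,2) = k;                          % index of the second peptide
        ExtPepComb(a,3) = ExtPep(j,1) + ExtPep(k,1);  % combined mass
    end
end

% Compare the theoretical list to experimental LC-MS signals
b = 0;
for m = y:z   % y and z define the mass range of the LC-MS signals to be compared
    for n = 1:a
        % Accept a match within a tolerance of +/- 0.2
        if LCMS(m,1) >= ExtPepComb(n,3)-0.2 && LCMS(m,1) <= ExtPepComb(n,3)+0.2
            b = b + 1;
            CandiPep(b,1) = ExtPepComb(n,1);
            CandiPep(b,2) = ExtPepComb(n,2);
            CandiPep(b,3) = ExtPepComb(n,3);
        end
    end
end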
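
The same pairing-and-matching logic can also be written without explicit loops. The following is only a minimal sketch under stated assumptions, not the program used in this study: the variable names pepMass, lcmsMass, tol and candidates are hypothetical, the three peptide masses are taken from Appendix A.2, and the three LC-MS masses are invented purely for illustration. It uses only the built-in functions numel, meshgrid, find, abs and disp.

```matlab
% Toy inputs (hypothetical; chosen only to illustrate the matching step)
pepMass  = [520.26; 532.26; 1466.79];     % peptide masses from Appendix A.2
lcmsMass = [1040.50; 1999.05; 2933.60];   % invented LC-MS masses
tol      = 0.2;                           % same tolerance as the program above

% Enumerate all peptide pairs (j,k) with j <= k, including self-pairs
n = numel(pepMass);
[K, J] = meshgrid(1:n, 1:n);              % J holds row indices, K column indices
keep   = J <= K;
pairJ  = J(keep);
pairK  = K(keep);
pairMass = pepMass(pairJ) + pepMass(pairK);

% Compare every LC-MS mass with every pair mass in one step
[P, L] = meshgrid(pairMass, lcmsMass);    % rows: signals, columns: pairs
[sigIdx, pairIdx] = find(abs(L - P) <= tol);

% Columns: first peptide index, second peptide index, pair mass, matched signal mass
candidates = [pairJ(pairIdx), pairK(pairIdx), pairMass(pairIdx), lcmsMass(sigIdx)];
disp(candidates);
```

With these toy values, three candidate pairs are reported: peptide 1 with itself, peptide 2 with peptide 3, and peptide 3 with itself. Two design points are worth noting. The matches come out ordered by pair rather than by signal, unlike the looped program, and the comparison matrix holds one entry per signal-pair combination, so for large lists the looped version may use less memory.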
