UBC Theses and Dissertations

A method to characterize formaldehyde cross-linking in proteins by mass spectrometry Ding, Xuan 2011

A METHOD TO CHARACTERIZE FORMALDEHYDE CROSS-LINKING IN PROTEINS BY MASS SPECTROMETRY

by

XUAN DING
B.Sc., Nanjing University, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Chemistry)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2011

© Xuan Ding, 2011

Abstract

The formaldehyde cross-linking approach has been used to identify protein interactions in living cells and organisms, and has the potential to map the geometry of interactions based on cross-linked peptides. However, the identification of cross-linked peptides has not been realized in native proteins, not even in model proteins. In this study, a method to identify and characterize cross-linked peptides in model proteins is developed.

The method was initially developed in an insulin model system. Candidate cross-linked peptides were identified by matching a list of putative cross-linked peptides to experimental MS signals. Signals in the MS/MS spectrum of a candidate were matched with proposed fragment ions, and confirmation of all proposed structural components verified a candidate to be a cross-linked peptide. As a result, three cross-linked insulin peptides were identified for the first time. The CID fragmentation of a formaldehyde cross-linked peptide proved to occur at both the cross-link bridge and peptide backbones. Fragment ions containing the cross-link bridge allowed the localization of cross-link sites, which revealed a specific N-terminus to tyrosine cross-link.

The method was then refined using two model protein systems of equivalent and higher complexity. Five cross-linked insulin peptides and three cross-linked myoglobin peptides were identified, with cross-link sites localized. The fragmentation patterns of cross-linked peptides were further confirmed.
The localization of cross-link sites in proteins revealed N-terminus to tyrosine/asparagine and lysine to tyrosine cross-links, cross-links on arginine, and two cross-links forming on a single N-terminus. Furthermore, monitoring the progression of the two reaction steps at cross-link sites revealed the chemistry of the formaldehyde cross-linking reaction in proteins for the first time. In addition, as the data became more complex with the increasing size of the model protein, the method was refined by applying programming to data processing and a bar-graph visualization to localize cross-link sites on isomeric peptides.

In the future, this method can be applied to other model protein systems for a more comprehensive understanding of formaldehyde cross-linking. The fragmentation patterns and reaction chemistry revealed by this method can be used to facilitate the identification of cross-linked peptides in native proteins.

Preface

This project was initiated by my supervisor, Professor Juergen Kast. I helped my supervisor design the research program, performed all the benchwork, and did all the data analysis and literature investigations. Samples were loaded onto mass spectrometers by Jason Rogalski and Shujun Lin. Professor Juergen Kast provided invaluable guidance and suggestions during the entire course of research.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
1 Introduction
1.1 Mass Spectrometry of Proteins and Peptides
1.1.1 Protein Characterization
1.1.2 Peptide Sequencing
1.2 Protein Interactions
1.2.1 Affinity Enrichment Coupled with Mass Spectrometry
1.2.2 Protein Cross-linking
1.2.3 Advantages of Formaldehyde as the Cross-linker
1.2.4 Formaldehyde Cross-linking in Living Cells and Organisms
1.3 Identification of Cross-linked Peptides
1.3.1 Challenges in the Identification of Cross-linked Peptides
1.3.2 Experimental Strategies and Bioinformatics Software to Facilitate the Identification of Cross-linked Peptides
1.4 Model Studies of Formaldehyde Cross-linking Reactions
1.4.1 Two-step Reactions
1.4.2 Residue Reactivity
1.4.3 Model Proteins
1.5 Thesis Theme and Overview
2 Method Development in a Model Protein System to Identify Cross-linked Peptides and Localize Cross-links
2.1 Introduction
2.2 Experimental
2.2.1 Materials
2.2.2 Preparation of Formaldehyde Solution
2.2.3 Cross-linking of the Model Protein
2.2.4 SDS-PAGE Analysis of Cross-linked Insulin
2.2.5 Glu-C Digestion of Cross-linked Insulin
2.2.6 Mass Spectrometric Analysis of Peptides
2.2.7 Labeling of MS/MS Spectra
2.3 Results and Discussion
2.3.1 Cross-linking of Insulin
2.3.2 Complexity of Insulin Peptide Mixture
2.3.3 Candidates of Cross-linked Peptides
2.3.4 Verification of the Candidate 505.61 (3+) by the MS/MS Spectrum
2.3.5 Verification of Candidates 903.42 (2+) and 602.62 (3+) by MS/MS Spectra
2.3.6 Verification of Candidates 549.79 (4+), 757.88 (4+) and 763.88 (4+) by MS/MS Spectra
2.3.7 Partial Stability of Cross-link Bridges in CID Fragmentation
2.3.8 Localization of Cross-link Sites
2.4 Conclusions and Outlook
3 Method Refinement Using Other Model Protein Systems
3.1 Introduction
3.2 Experimental
3.2.1 Materials
3.2.2 Preparation of Formaldehyde Solution
3.2.3 Cross-linking of the Model Protein
3.2.4 SDS-PAGE Analysis of Cross-linked Insulin
3.2.5 Mass Spectrometric Analysis of Cross-linked Myoglobin
3.2.6 Glu-C Digestion of Cross-linked Proteins
3.2.7 Preparation of Modified Insulin α-Chain
3.2.8 Mass Spectrometric Analysis of Peptides
3.2.9 Labeling of MS/MS Spectra of Cross-linked Peptides
3.2.10 Localization of Modification Sites by Degree of Modification (D.O.M)
3.3 Results and Discussion
3.3.1 Model Protein Insulin with Alternative Processing
3.3.1.1 Identification of Five Cross-linked Insulin Peptides
3.3.1.2 Localization of the Cross-link Bridge
3.3.1.3 Determination of the Extra Structure of 12 Da Mass Shift by Reactivity Considerations
3.3.1.4 The Two-Step Reaction between the N-terminus or Lysine and Tyrosine or Asparagine Residues
3.3.1.5 Physiological Relevance of the Identified Cross-links in Insulin
3.3.2 Myoglobin as the Larger Model Protein
3.3.2.1 Cross-linking of Myoglobin
3.3.2.2 Identification of Three Cross-linked Myoglobin Peptides
3.3.2.3 Localization of Cross-link Sites to One Individual Residue
3.3.2.4 Localization of Cross-link Sites to Several Residues
3.3.2.5 Complexity of Extra Modifications/Cross-links
3.3.2.6 The Physiological Relevance and Reaction Chemistry of Identified Cross-links in Myoglobin
3.4 Conclusions and Outlook
4 Conclusions and Future Perspectives
References
Appendices
A.1 The List of Natural Amino Acids
A.2 The List of Assigned MS Signals in Figure 3-12
A.3 The MatLab Program for Data Processing in Chapter 3.3.2.2

List of Tables

Table 1-1. Names, structures, properties and spacer arm lengths of cross-linkers applicable to living cells.
Table 2-1. The theoretical mass list of possible cross-linked peptides in the digest of the formaldehyde treated 6 hr sample, made by considering unmodified and modified insulin peptides as possible component peptides and summing up their masses one by one. Masses in bold are used as examples to illustrate proposed structural components of putative cross-linked peptides. Underlined masses are those of candidates of cross-linked peptides, identified by matching this table with the masses of unknown signals in Figure 2-3a. “Mod” is short for modification(s).
Table 2-2. The m/z value, mass and proposed structural components of candidates of cross-linked peptides in the digest of the formaldehyde treated 6 hr insulin sample. The ^ represents the cross-link bridge, while * represents an extra Schiff-base modification or intra-peptide cross-link on one of the component peptides, or a second cross-link bridge between two peptides.
Table 2-3. Proposed fragment ions, their masses and the matching MS/MS signals, derived by assuming fragmentations along the backbone of the proposed component peptide II of the candidate 505.61 (3+). Its MS/MS spectrum is shown in Figure 2-4.
Table 3-1.
The m/z value, mass and structural components of cross-linked peptides identified in the digest of the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced). The ^ represents the cross-link bridge, while the * represents an extra Schiff-base modification or cross-link.
Table 3-2. Distance constraints between cross-link sites in dimeric and hexameric insulin (PDB# 2A3G and 3AIY).

List of Figures

Figure 1-1. The scheme of the nomenclature system to describe different types of ions from fragmentations along the peptide backbone.
Figure 1-2. (a) The scheme of the two-step reaction to form cross-links between proteins; (b) The scheme of cross-linked proteins, its digest and types of peptides in the peptide mixture.
Figure 1-3. The scheme of the two-step formaldehyde cross-linking reaction in proteins.
Figure 1-4. The 3D structures (a and b) and sequences (c and d) of bovine insulin (a and c) and horse myoglobin (b and d). In (a) and (b), red highlights alpha-helix regions, while green highlights flexible regions. In (c) and (d), solid lines denote disulfide bonds, and dashed lines denote Glu-C cleavage sites. Residues highlighted with colors are reactive in the modification step (orange), potentially reactive in the cross-linking step (hot pink), and (potentially) reactive in both steps (blue).
Figure 2-1. The sequence of bovine insulin.
Solid lines denote disulfide bonds, dashed lines denote Glu-C cleavage sites.
Figure 2-2. The SDS-PAGE gel of 300 µM insulin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.
Figure 2-3. (a) The 3D plot (LC retention time, m/z, signal intensity represented by grayscale) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr sample. (b) Zoom-ins of regions (i), (ii), (iii), (iv) and (v) in (a). Circled signals in (i), (ii), (iii), (iv) and (v) are assigned to unmodified (solid circles) and modified (dashed circles) insulin peptides α1-4, β22-30, α18-21β14-21, α5-17β1-13 and α5-17β1-13.
Figure 2-4. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 505.61 (3+).
Figure 2-5. The MS/MS fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 505.61 (3+) (Figure 2-4), to illustrate 3 types of fragment ion series.
Figure 2-6. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 903.42 (2+).
Figure 2-7. The fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 903.42 (2+) (Figure 2-6).
Figure 2-8. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 602.62 (3+).
Figure 2-9. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 549.79 (4+). The * represents an extra Schiff-base modification or cross-link.
Figure 2-10. The fragmentation patterns, types of fragment ions and MS/MS spectra of candidates (a) 757.88 (2+) and (b) 763.88 (2+). The * represents an extra Schiff-base modification or cross-link.
Figure 3-1. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide α18 NYCN α21 ^ α1 GIVE α4, with an MS signal of 499.69 (2+).
Figure 3-2. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide β22 RGFFYTPKA β30 ^ α18 NYCN α21, with an MS signal of 556.61 (3+).
Figure 3-3. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide (α18 NYCN α21 ^ α1 GIVE α4)*, with an MS signal of 505.69 (2+).
Figure 3-4. Structures of cross-linked peptides identified in the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced), with cross-link sites localized to individual residues.
Figure 3-5.
MS/MS spectra of the singly modified insulin α-chain (α1 GIVEQCCASVCSLYQLENYCN α21 +12) after (a) 0.5 hr, (b) 6 hr of formaldehyde exposure. In both spectra, the Schiff-base modification is localized to Gα1, and proves not to be on Nα18 or Yα19.
Figure 3-6. The two-step reactions (a and c) and proposed reaction schemes (b and d) to form the Gα1 to Yα19 (a and b) and Gα1 to Nα18 (c and d) cross-links. Symbol ^ represents the cross-linker.
Figure 3-7. The MS/MS spectrum of the singly modified β22 RGFFYTPKA β30 (+12 Da) after 0.5 hr of formaldehyde exposure. The yn+12 ion series indicates that Kβ29 was modified within 0.5 hr.
Figure 3-8. The (a) two-step reactions and (b) proposed reaction schemes to form the Kβ29 to Yα19 cross-link. Symbol ^ represents the cross-linker.
Figure 3-9. (a) The numbering of b and y ions along peptide β22 RGFFYTPKA β30. (b) The MS/MS spectrum of the modified β22 RGFFYTPKA β30 (+24 Da) after 6 hr of formaldehyde exposure. (c) A sample calculation of the DOM value of the b6 ion. PA is the abbreviation of peak area. (d) The bar graphs of DOM values of the b and y ions against the peptide sequence. Series of b and y ions together suggest that the two Schiff-base modifications are on Rβ22 and Kβ29 but not on Yβ26.
Figure 3-10. The (a) two-step reactions and (b) proposed reaction schemes to form the Gα1 to Yβ26 cross-link. Symbol ^ represents the cross-linker.
Figure 3-11.
MS spectra of 100 µM myoglobin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.
Figure 3-12. The 3D plot (LC retention time, m/z, signal intensity) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr myoglobin sample.
Figure 3-13. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154, with an MS signal of 667.37 (3+).
Figure 3-14. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide (29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53)*, with an MS signal of 767.97 (4+). The * represents an extra Schiff-base modification or cross-link.
Figure 3-15. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide (29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55)*, with an MS signal of 817.97 (4+). The * represents an extra Schiff-base modification or cross-link.
Figure 3-16. (a) The numbering of b and y ions along the myoglobin peptide 138 LFRNDIAAKYKE 149. (b) A sample calculation of the DOM values of detectable b and y ions. PA is the abbreviation of peak area. (c) Bar graphs of DOM values of b and y ions from peptide 138 LFRNDIAAKYKE 149 with a cross-link bridge attached against the peptide sequence, derived from the MS/MS spectrum of peptide 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154 (Figure 3-13b).
List of Abbreviations

MS         mass spectrometry(ic)
MS/MS      tandem mass spectrometry
m          mass
z          charge state
m/z        mass to charge ratio
SDS-PAGE   sodium dodecyl sulfate polyacrylamide gel electrophoresis
HPLC       high performance liquid chromatography
MALDI      matrix-assisted laser desorption/ionization
ESI        electrospray ionization
Q          quadrupole mass analyzer
TOF        time of flight mass analyzer
CID        collision-induced dissociation
PFA        paraformaldehyde
PBS        phosphate buffered saline
DOM        degree of modification
hr         hour(s)
DTT        dithiothreitol
IAA        2-iodoacetamide
BS3        bis(sulfosuccinimidyl)suberate
DTSSP      3,3´-dithiobis(sulfosuccinimidylpropionate)
Sulfo-EGS  ethylene glycol bis[sulfosuccinimidylsuccinate]
DSG        disuccinimidyl glutarate
DSP        dithiobis[succinimidyl propionate]
DSS        disuccinimidyl suberate
EGS        ethylene glycol bis[succinimidylsuccinate]
DMSO       dimethyl sulfoxide

Acknowledgements

I would like to thank foremost my supervisor Dr. Juergen Kast for training me as an M.Sc. student, and allowing me to work on an exciting project. His expertise in the research field and guidance from start to finish enabled me to develop skills in many aspects of research. The completion of this research project has benefited from the assistance and support of many present and past lab-mates (Jason, Shujun, Cordula, Qing, Chengcheng, Arash, Liwen, Davin, Savita, Jiqing, Ru, Geraldine, Judy). Their kindness in helping others and their patience with me created a great atmosphere in the lab and helped me to solve many problems in research. Last and most importantly, I would like to thank my parents for their emotional support from the other side of the Pacific Ocean, and my husband Peng, who went through all the difficult times together with me in Canada.

Dedication

To my parents and husband

1 Introduction

1.1 Mass Spectrometry of Proteins and Peptides

1.1.1 Protein Characterization

Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge ratio (m/z) of ions.
An MS measurement consists of three distinct steps: analytes are ionized in the ion source, separated according to their m/z in the mass analyzer, and the number of ions at each m/z value is recorded by the detector. MS analysis has become an essential tool for the characterization of proteins, which are vital biomolecules in living organisms, since the development of electrospray ionization (ESI) [1] and matrix-assisted laser desorption/ionization (MALDI) [2-3]. These soft ionization methods made possible the vaporization and ionization of proteins and peptides, which are involatile and thermally unstable. ESI and MALDI have enabled large-scale protein characterization by mass spectrometry, i.e. proteomics. MS is highly sensitive and requires as little as zeptomole amounts of protein [4-5], making it useful when only small amounts of proteins at low concentrations can be isolated from living cells or organisms. Particularly useful for biological samples, MS provides rapid analysis of many proteins at once [6].

A typical workflow of MS-based protein characterization starts with the isolation of proteins from a cell lysate or fractions thereof, followed by enzymatic digestion of the proteins into peptides. Peptide mixtures are usually complex and require separation by liquid chromatography (LC) prior to introduction into a mass spectrometer [7]. The m/z values of all detectable peptides, which contain information on their masses and charge states, are recorded in the MS spectra. Proteins can be identified by comparing mass spectra (m/z values) to reference lists of peptide masses that are generated from the theoretical digestion of proteins in protein databases, an approach called peptide mass fingerprinting [8-10]. However, the mass of a peptide is not a unique identifier, as different amino acid sequences can have the same mass.
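The relation between peptide mass, charge state and m/z, and the reason why mass alone cannot identify a sequence, can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `peptide_mass` and `mz` are hypothetical, the residue masses are standard monoisotopic values, and the two example peptides are the insulin Glu-C peptides GIVE and NYCN that appear in the figure captions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge-carrying proton


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the protonated ion [M + zH]z+ for a neutral mass M."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    for seq in ("GIVE", "NYCN"):
        m = peptide_mass(seq)
        print(seq, round(m, 4), [round(mz(m, z), 4) for z in (1, 2, 3)])
    # A scrambled sequence has the same composition and therefore the
    # same mass, so a mass match alone does not confirm the sequence.
    print(abs(peptide_mass("GIVE") - peptide_mass("EVIG")) < 1e-9)
```

The last line shows the ambiguity discussed above: GIVE and EVIG are indistinguishable by mass, which is why fragmentation data are needed to confirm a sequence.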
In order to confirm its identity (sequence), a peptide is fragmented into diagnostic fragment ions using tandem mass spectrometry (MS/MS). In MS/MS, a peptide ion of a specific m/z value is selected in a first mass analyzer as the precursor ion and passed into a collision cell, where fragmentations along the peptide backbone are induced in a predictable manner. Fragment ions are drawn out of the collision cell and enter a second mass analyzer, where their m/z values are measured. MS/MS spectra of various peptide ions and peptide masses are searched against reference spectra and masses stored in protein databases using software such as Mascot [11], SEQUEST [12] and X!Tandem [13], to identify the proteins in the sample. Disulfide bonds in proteins are usually reduced and alkylated during the enzymatic digestion, because MS/MS spectra of disulfide-connected peptides are too complex and thus not included in databases.

1.1.2 Peptide Sequencing

The confirmation of peptide sequences by MS/MS is a key step in MS-based protein characterization. Although there are various peptide fragmentation methods, such as electron-capture dissociation (ECD), electron-transfer dissociation (ETD) and infrared multiphoton dissociation (IRMPD), collision-induced dissociation (CID) is the most common fragmentation method in commercial MS instruments. In CID, peptide ions are activated by energetic collisions with an inert gas to initiate the fragmentation.

The CID fragmentation of peptides occurs mostly along the peptide backbones [14-15]. A nomenclature system [16-17] describes the different types of backbone ions formed from fragmentation at different types of bonds (Figure 1-1). Backbone ions that contain the N-terminus and C-terminus of the peptide are labeled (a, b, c) and (x, y, z) respectively, followed by numerical subscripts that identify the position of the amino acid where each fragmentation occurs.
The b and y ions, which arise from fragmentation at peptide bonds, are the major fragment ions in CID [18-19]. During multiple collisions with the inert gas molecules, some b and y ions fragment further into internal ions that can complicate spectral analysis [20]. The peptide sequence can be determined by comparing the MS/MS spectrum to reference spectra in protein databases, as described in Chapter 1.1.1. An alternative is de novo sequencing, which matches mass differences between successive b, y and internal ions to the masses of amino acids. MS/MS spectra can also be used to localize modifications, based on the mass shifts they produce on modified residues and on the series of ions that contain modified residues [21-23].

Figure 1-1. The scheme of the nomenclature system to describe different types of ions from fragmentations along the peptide backbone.

CID fragmentations of peptides are mainly charge-directed processes [24-25]. The fragmentation at a peptide bond and the formation of b and y ions is initiated by a proton attached to the nitrogen or oxygen of that peptide bond. Protonation sites on a peptide include energetically more favored ones, the N-terminus and the side chains of basic residues (arginine and lysine), and less favored ones, the oxygen and nitrogen atoms of peptide bonds. When the peptide is ionized, protons tend to attach to the energetically more favored N-terminus and side chains of basic residues. Upon activation of the peptide by collisions, the protons migrate to the less favored oxygen and nitrogen atoms throughout the peptide, which leads to fragmentation of various peptide bonds [26]. The resulting fragmentation pattern contains successive b and y ions. Although fragmentation generally occurs at all peptide bonds, predominant fragmentation at certain peptide bonds is observed in some arginine-containing peptides [26-29].
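The b and y ion series, and the way a mass shift on one residue shows up in only one part of those series, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the thesis: `fragment_ions` and its `shifts` argument are hypothetical names, only singly charged ions are computed, and the shift of 12.0 Da on the N-terminal glycine of GIVE is the nominal +12 Da Schiff-base shift mentioned in the figure captions, used here purely as example input.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(sequence, shifts=None):
    """Return (b, y) lists of singly charged m/z values.

    b[i] is b(i+1) and y[i] is y(i+1). `shifts` (hypothetical helper
    argument) maps a 0-based residue index to an added mass in Da.
    """
    shifts = shifts or {}
    masses = [RESIDUE_MASS[aa] + shifts.get(i, 0.0)
              for i, aa in enumerate(sequence)]
    b, y = [], []
    running = 0.0
    for m in masses[:-1]:            # N-terminal fragments b1 .. b(n-1)
        running += m
        b.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments y1 .. y(n-1)
        running += m
        y.append(running + WATER + PROTON)
    return b, y


if __name__ == "__main__":
    b, y = fragment_ions("GIVE")
    # De novo step: the difference between successive b ions is a residue mass.
    print([round(b[i + 1] - b[i], 5) for i in range(len(b) - 1)])
    # Example input only: +12 Da placed on the N-terminal residue (index 0).
    b_mod, y_mod = fragment_ions("GIVE", {0: 12.0})
    print([round(m - u, 3) for m, u in zip(b_mod, b)])  # every b ion shifts
    print([round(m - u, 3) for m, u in zip(y_mod, y)])  # y1..y3 do not shift
```

Because y1 to y(n-1) never contain the first residue, a shift that appears on all b ions but on none of these y ions points to the N-terminal residue; the same logic, applied residue by residue, underlies the localization of modifications from ion series.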
1.2 Protein Interactions

1.2.1 Affinity Enrichment Coupled with Mass Spectrometry

Significant efforts have been made to apply the prototype workflow of protein characterization to large-scale protein profiling of various organelles [30-33] and even the whole yeast proteome [34-35]. Protein profiling studies reveal a whole map of proteins, including their identities and locations. However, profiling studies provide little information about protein-protein interactions, which are extraordinarily important in basic cellular processes such as protein synthesis [36] and signal transduction [37]. Changes in these interactions can reflect abnormal states of protein function and therefore disease states.

An MS-compatible technique to study protein-protein interactions is affinity enrichment. Affinity enrichment isolates a target protein together with its interaction partners from cell lysates, e.g. through interactions between the target protein and specific antibodies. After other cellular components are washed away, the purified interacting proteins can be submitted to enzymatic digestion and the subsequent steps of the MS-based protein characterization workflow to identify the interacting proteins. Affinity enrichment coupled with MS analysis has recently been widely used in studies of protein-protein interactions [38-39] in yeast [40], E. coli [41] and human cell lines [42].

Affinity enrichment, however, has three limitations. Firstly, it tends to retain only interaction partners whose strong interactions survive the washing step, causing a loss of transient or weak interactions and therefore false negative identifications of interacting proteins. Secondly, as cell lysis removes the temporal and spatial constraints on protein-protein interactions, proteins that are normally separated by organelle membranes can come into contact and interact with each other, causing false positive identifications of interacting proteins [43-44].
Thirdly, this approach does not provide any information on the geometry of protein-protein interactions.

1.2.2 Protein Cross-linking

In order to reduce false positive and negative identifications of interacting proteins, and to preserve information on the geometry of protein-protein interactions, a cross-linking approach can be applied to living cells before cell lysis and affinity enrichment [38,45-46].

Figure 1-2. (a) The scheme of the two-step reaction to form cross-links between proteins; (b) The scheme of cross-linked proteins, their digest and the types of peptides in the peptide mixture.

In the cross-linking process, two proteins that are in close proximity are linked together through the formation of covalent bonds via a small bifunctional molecule known as a chemical cross-linker. The covalent bonds between interacting proteins not only “freeze” the protein-protein interactions that occur in living cells with all spatial and temporal constraints, but also keep weak and transient interactions, thereby reducing false positives and negatives in the identification of interacting proteins. A cross-linker can be considered as a bridge with one reactive group at each end. Cross-linking reactions occur in two steps (Figure 1-2a): one end of the bridge forms a covalent bond with a protein molecule to generate a modification on the protein, and the other end then forms another covalent bond with another or the same protein to generate a cross-link bridge between proteins or within one protein. Consequently, cross-linked proteins may contain multiple modifications and cross-links (Figure 1-2b). Upon enzymatic digestion, cross-linked proteins turn into a mixture of unmodified peptides, modified peptides, peptides with intra-peptide cross-links and cross-linked peptides. The peptide mixture reveals different aspects of protein-protein interactions. Unmodified peptides are predominant and can be used to identify interacting proteins.
Cross-linked peptides indicate the regions in proteins that are in close proximity, which are usually regions of interaction. Assigning the cross-link bridges to small segments on the component peptides or even to individual residues can further pinpoint regions of interaction. Cross-linked peptides and cross-link sites therefore reveal how the interacting proteins interact with each other [47], which is hidden without cross-linking. A number of commercially available cross-linkers can be applied to living cells, as shown in Table 1-1. They form cross-link bridges of different lengths (spacer arm lengths), ranging from 2.5 Å to 16.1 Å in these examples. Generally, a shorter spacer arm is preferred. Longer spacer arms allow cross-linking of proteins that may not interact, and lower the resolution of the geometry information provided by cross-link sites. Most of these cross-linkers, except formaldehyde, which will be discussed in Chapter 1.2.3, can be classified into two types by their properties. Water-soluble cross-linkers, such as BS3, DTSSP and sulfo-EGS, can be added to living cells under aqueous physiological conditions but cannot permeate cell membranes. They have been used to study proteins on cell surfaces [48-50]. Water-insoluble cross-linkers, such as DSG, DSP, DSS and EGS, can permeate cell membranes and have been used to cross-link intracellular proteins [51-53]. However, they need to be dissolved in organic solvents such as DMSO, which disturbs the physiological conditions of living cells.

Table 1-1. Names, structures, properties and spacer arm lengths of cross-linkers applicable to living cells.
Name | Properties | Spacer Arm Length (Å)
BS3 | Water-soluble & membrane-impermeable | 11.4
DTSSP | Water-soluble & membrane-impermeable | 12.0
Sulfo-EGS | Water-soluble & membrane-impermeable | 16.1
DSG | Water-insoluble & membrane-permeable | 7.7
DSP | Water-insoluble & membrane-permeable | 12.0
DSS | Water-insoluble & membrane-permeable | 11.4
EGS | Water-insoluble & membrane-permeable | 16.1
Formaldehyde | Water-soluble & membrane-permeable | 2.5

1.2.3 Advantages of Formaldehyde as the Cross-linker

Formaldehyde was found to be able to cross-link proteins decades ago. This feature has been widely used to preserve tissues, such as clinical biopsies, by cross-linking proteins, DNA and RNA so that they are fixed in position. In the last decade, it has been used as an especially suitable cross-linker for in vivo cross-linking of proteins coupled with affinity enrichment and MS analysis [54-61]. As a small molecule, formaldehyde can quickly permeate cell membranes and diffuse inside the cells, resulting in efficient cross-linking of cellular proteins [44] that is thought to capture transient interactions [43]. The corresponding cross-link bridge is approximately 2.5 Å [43]. Therefore, it only cross-links residues within very close proximity, reducing false positive identifications of interaction partners and allowing high-resolution geometry mapping by cross-link sites. Formaldehyde is also water-soluble, so no organic solvent is needed to dissolve it, unlike all other membrane-permeable cross-linkers [53,62-65]. Practically, formaldehyde is inexpensive and widely available. These advantages make formaldehyde a useful cross-linker for the study of protein-protein interactions in living cells.

1.2.4 Formaldehyde Cross-linking in Living Cells and Organisms

Formaldehyde cross-linking coupled with affinity enrichment and MS analysis has been applied to identify interacting proteins in various living cells and even organisms: bacteria [54], yeast [57-58], mammalian cells [55,59,61], and whole mouse brains [56,60].
In these studies, formaldehyde cross-linking has been shown to be compatible with different protocols of affinity enrichment, e.g. co-immunoprecipitation of endogenous proteins [56,59-61] and enrichment of tagged proteins [55,57-58]. Furthermore, cross-links can survive both non-denaturing [55-56,59-61] and denaturing [57-59] washing conditions. Although the protein of interest varies in both location and function in these studies, the cross-linking approach allowed identification of both known and novel potential interaction partners in all of them. Therefore, formaldehyde cross-linking is a versatile approach that couples well with different cells and organisms, target proteins, and experimental designs. However, it is currently limited to the identification of interacting proteins, and mapping the regions of interaction by cross-linked peptides and cross-link sites has yet to be realized.

1.3 Identification of Cross-linked Peptides

1.3.1 Challenges in the Identification of Cross-linked Peptides

The identification of cross-linked peptides and cross-link sites by MS and MS/MS analysis generally faces two major challenges, whichever cross-linker is used. The first arises from the complexity of the digest of cross-linked proteins. It contains not only unmodified peptides from all interacting proteins, but also peptides of these types: modified peptides, peptides with intra-peptide cross-links, cross-linked peptides, and cross-linked peptides with additional modifications and/or cross-links [66]. Cross-linked peptides are usually of low abundance and form only a subset of this peptide mixture [67-68]. As a result, it is difficult to identify the MS signals of cross-linked peptides. In addition, cross-link sites are usually identified from the MS/MS spectra of cross-linked peptides, which are beyond the scope of commonly used software tools for the identification of user-defined modifications on peptides, such as Mascot and Protein Prospector [69].
MS/MS spectra of cross-linked peptides usually contain backbone ions from component peptides, backbone ions with (part of) the cross-link attached, and backbone ions with both the cross-link and (part of) the other component peptide attached [67]. Significant effort has been devoted to the development of experimental strategies and computational tools to facilitate the identification of low abundance cross-linked peptides from the complex peptide mixture [67,70-79].

1.3.2 Experimental Strategies and Bioinformatics Software to Facilitate the Identification of Cross-linked Peptides

Several strategies, including chromatographic enrichment of cross-linked peptides and cross-linkers with signature patterns, have been developed to distinguish cross-linked peptides from other types of peptides. The chromatographic enrichment can be done directly by ion exchange chromatography, which takes advantage of differences in charged groups between cross-linked and other peptides. For tryptic peptides, in which the N-terminus and the basic side chain of the C-terminal residue each carry a positive charge, cross-linked peptides can carry twice the positive charges of other peptides. Cross-linked peptides have been shown to elute at higher salt concentrations from strong cation exchange (SCX) material than other peptides [70-72]. Similarly, cross-linked peptides that end with an acidic amino acid, such as Glu-C peptides, could carry twice the negative charges of other peptides, resulting in cross-linked peptides eluting at higher salt concentrations from strong anion exchange (SAX) material. Size exclusion chromatography (SEC), which capitalizes on the higher molar mass and bulkier size of cross-linked peptides, is also a possible chromatographic enrichment method [67]. In addition, affinity chromatography is used for cross-linkers with affinity tags, to enrich both cross-linked peptides and modified peptides from the predominant unmodified peptides [73-74].
Other signature patterns have also been developed to distinguish cross-linked peptides from other peptides, such as isotopic labeling of the cross-linker [75-76], and chemically cleavable [77] or MS/MS-cleavable cross-linkers [78-79]. A number of bioinformatics software platforms have been developed in the last decade to automate the data interpretation [71,80-90]. These programs share a general working philosophy to identify cross-linked peptides from the peptide mixture. A library of possible cross-linked peptides is created according to user-defined parameters such as the identities of the involved proteins, the specificity of enzymatic digestion, the cross-linker structure and the reactive amino acids. Matches between this library and the experimental peak list are reported as cross-linked peptides. Some programs [66,71,80-83,85,87,89] allow further confirmation of cross-linked peptides by matching their MS/MS spectra to theoretical MS/MS spectra, which are generated from preset fragmentation models [19,91] and residue reactivity. The assignment of MS/MS signals helps to elucidate the structures of cross-linked peptides, in other words, the cross-link sites. The results from software can be manually interpreted for the final verification. By combining the experimental strategies and bioinformatics software, cross-linked peptides can be identified with cross-link sites assigned, thereby providing information on the geometry of protein-protein interactions.

1.4 Model Studies of Formaldehyde Cross-linking Reactions

1.4.1 Two-step Reactions

Besides the general challenges in the identification of cross-linked peptides, the formaldehyde cross-linking approach poses an extra challenge of unclarified reaction chemistry, which has limited the application of bioinformatics software because such software requires residue reactivity for the analysis of both MS and MS/MS data.
Unlike designed cross-linkers with defined cross-linking reactions, formaldehyde has reaction chemistry that has been studied in non-physiological models: small model molecules and amino acids [92-99], as well as peptides and proteins [100-104].

Figure 1-3. The scheme of the two-step formaldehyde cross-linking reaction in proteins.

Formaldehyde cross-linking of proteins consists of two steps (Figure 1-3). An amino group reacts with formaldehyde to form a methylol intermediate (+30 Da), which can subsequently dehydrate into a Schiff-base structure (+12 Da). The Schiff-base structure can then cross-link to a reactive residue in another protein, and turn into a methylene bridge which adds 12 Da to the two proteins. According to studies in small molecule and amino acid model systems under physiological pH and temperature [92-98], reactive residues in the cross-linking step include the N-terminus, lysine (K), tyrosine (Y), arginine (R), asparagine (N), glutamine (Q), tryptophan (W), histidine (H) and cysteine (C). (Abbreviations of amino acids are listed in Appendix A.1.) Chemical structures connected to the methylene bridges have been revealed by X-ray crystallography [92] and NMR spectroscopy [99] in model molecules and amino acids.

1.4.2 Residue Reactivity

Residue reactivity discovered in small model molecules and amino acids does not necessarily apply to proteins, as local environments around reactive residues in proteins are very different from those around separate amino acids in solution. Three recent reports on model peptides and proteins shed some light on the residue reactivity in each reaction step [101-102,104]. The solvent-accessible N-termini and Lys residues have been shown to be major sites of Schiff-base modification in several model proteins, and Arg residues are also reactive but less reactive than N-termini and Lys residues [104].
This study was performed under reaction conditions (formaldehyde concentration, reaction time, temperature and pH) that closely resemble those applied to living cells and organisms; therefore the N-termini, Lys and Arg residues on the surface of cellular proteins are very likely involved in the modification step which initiates the cross-linking. Residues that could be reactive in the cross-linking step have been revealed by cross-linking glycine carrying a Schiff-base modification to various model peptides [102] and a model protein [101]. The major potentially reactive residues in model peptides are N-termini, Arg, Tyr, Asn, Gln, His and Trp residues, among which N-termini, Arg, Tyr and Gln residues also show potential reactivity in the cross-linking step in one model protein. However, these studies have been performed with prolonged formaldehyde incubation: two days for peptides and one week for the protein. Furthermore, the observed cross-links are amino acid to peptide or protein, which does not represent protein to protein cross-links well because a proper local environment is lacking at one end of the cross-link. Therefore, the cross-linking step on cellular proteins within a short formaldehyde incubation (<1 hr) is hypothesized to occur on a subset of these reactive residues: N-termini, Arg, Tyr, Asn, Gln, His and Trp.

1.4.3 Model Proteins

In recent reports of formaldehyde induced reactions in non-physiological model proteins, insulin and myoglobin are two major model proteins. Insulin and myoglobin are both properly folded proteins with secondary and tertiary structures (Figure 1-4a and 1-4b), which keep them stable when they are secreted into the blood [105-106] and exposed to various enzymes. Also, they are small proteins of 5.7 kDa (insulin) and 17 kDa (myoglobin), and only generate 6 and 14 peptides upon enzymatic digestion (sequences and enzyme cleavage sites shown in Figure 1-4c and 1-4d).
Therefore, the peptide mixtures of unmodified, modified and cross-linked peptides from formaldehyde treated model proteins are relatively simple, and more manageable than those from average-size proteins. Lastly, both proteins contain a number of residues that can be involved in the two-step cross-linking reactions, as highlighted with colors in Figure 1-4c and 1-4d. The properly folded structure, relatively simple peptide mixture, and existence of reactive residues make insulin and myoglobin suitable model proteins for studies of formaldehyde induced reactions in proteins.

Figure 1-4. The 3D structures (a and b) and sequences (c and d) of bovine insulin (a and c) and horse myoglobin (b and d). In (a) and (b), red highlights alpha-helix regions, while green highlights flexible regions. In (c) and (d), solid lines denote disulfide bonds, and dashed lines denote Glu-C cleavage sites. Residues highlighted with colors are reactive in the modification step (orange), potentially reactive in the cross-linking step (hotpink), and (potentially) reactive in both steps (blue).

1.5 Thesis Theme and Overview

Formaldehyde cross-linking has been shown to preserve protein interactions in living cells and organisms, but the analysis of the interactions is currently limited to the identification of interacting partners based on unmodified peptides. The geometry information of native protein-protein interactions contained in cross-linked peptides is already in the MS data of the digest of cross-linked proteins, but is obscured by several related challenges: the low abundance of cross-linked peptides among the complex peptide mixture, the lack of knowledge about their mass spectrometric (MS) properties, and limited understanding of the reaction chemistry. The first challenge can be solved by a combination of experimental strategies and bioinformatics software.
However, so little is known about the MS properties of formaldehyde cross-linked peptides that they have not been identified even in cross-linked non-physiological model proteins. The identification and characterization of cross-linked peptides in a model protein is the key issue. It can open the door to clarification of the reaction chemistry (residue reactivity), which is required by bioinformatics software, as well as to the development of chromatographic enrichment methods in the future. This work aims to develop a method to identify cross-linked peptides and characterize the formaldehyde cross-linking reactions in model protein systems. Model proteins are cross-linked in aqueous buffers and enzymatically digested. The resulting peptide mixture is separated by LC to reduce sample complexity, and analyzed by an ESI mass spectrometer which is directly coupled to the LC. In order to reduce the complexity of model protein systems, one protein is used in each system. A simple protein is used for the initial method development (Chapter 2), while other model proteins of equivalent and higher complexity are used to refine the method (Chapter 3). Along with the identification of cross-linked peptides, the fragmentation patterns shown in the MS/MS spectra of cross-linked peptides are also investigated. Also studied are the localization of cross-link sites by MS/MS spectra and the chemistry of formaldehyde cross-linking reactions on these sites.

2 Method Development in a Model Protein System to Identify Cross-linked Peptides and Localize Cross-links

2.1 Introduction

The formaldehyde cross-linking approach has the potential to reveal the geometry of native protein-protein interactions by the identification of cross-linked peptides [43].
It has yet to be realized, however, because of three challenges: the low abundance of cross-linked peptides in the complex peptide mixture, lack of knowledge about the mass spectrometric (MS) properties of cross-linked peptides, and limited understanding of the chemistry of formaldehyde cross-linking in proteins. The latter two challenges have obstructed the development of bioinformatics software to automate data interpretation, which usually helps to overcome the first [107]. Therefore, clarifying the MS properties and the reaction chemistry in model protein systems is the key issue to tackle. So far, there have been no reports on the identification of cross-linked peptides in a model protein system, nor investigations of MS/MS data of cross-linked peptides or further studies of the residues involved in the cross-linking reaction. In this chapter, the aim is to develop a method to identify cross-linked peptides in a model protein system and to gain knowledge about their MS properties. A model protein is cross-linked in aqueous buffer, and digested to produce a mixture of unmodified, modified and cross-linked peptides. The peptide mixture derived from a single model protein is less complex than that derived from cross-linked native proteins, making it easier to identify cross-linked peptides. However, the cross-linking only occurs on model protein molecules that randomly come into close contact. The cross-linking yield is therefore hypothesized to be much lower than that formed on interacting proteins. As a result, cross-linked peptides in a model protein are also of low abundance, even in a relatively simple peptide mixture. To simplify the model system, a small protein discussed in Chapter 1.4.3, insulin, is chosen as the model protein. It is composed of an α and a β chain, with two inter-chain disulfide bonds (Figure 2-1).
Only four peptides are produced upon Glu-C digestion when the disulfide bonds are not reduced. They are referred to hereafter by their sequences: α1-4, α5-17 β1-13, α18-21 β14-21 and β22-30. Consequently, the formaldehyde treated insulin generates a relatively simple peptide mixture after digestion, making it a good model protein for the method development.

Figure 2-1. The sequence of bovine insulin. Solid lines denote disulfide bonds, dashed lines denote Glu-C cleavage sites.

2.2 Experimental

2.2.1 Materials

Insulin from bovine pancreas, α-cyano-4-hydroxycinnamic acid (CHCA), Trizma base, ammonium bicarbonate, sodium hydroxide, sodium dodecyl sulfate (SDS), tetramethylethylenediamine (TEMED) and glycerol were all obtained from Sigma (St. Louis, MO). Paraformaldehyde (PFA), formic acid (FA, 88%) and acetonitrile (ACN, HPLC grade) were purchased from Fisher (Fair Lawn, NJ). Acrylamide, ammonium persulfate (APS), bromophenol blue, Coomassie Brilliant Blue R250, and gel casting and running systems were purchased from Biorad (Hercules, CA). Endoproteinase Glu-C was obtained from Roche Applied Science (Penzberg, Germany). 3 kDa MW-cut-off filters were purchased from Millipore Corporation (Cork, Ireland), and syringe filters (0.22 µm) were purchased from Pall Corporation (Ann Arbor, MI). Deionized water (18 MΩ cm) was prepared using a Nanopure Ultrapure Water System from Barnstead (Dubuque, IA).

2.2.2 Preparation of Formaldehyde Solution

A 4% (w/v) (1.3 M) formaldehyde stock solution was prepared by heating (80 ℃) PFA in phosphate-buffered saline (PBS) at pH 7.5 for 30 min, cooling to room temperature and filtering through a 0.22 µm filter.

2.2.3 Cross-linking of the Model Protein

The model protein, insulin (300 µM), was incubated with formaldehyde (1%, w/v) in PBS (37 ℃, pH 7.5) for 0, 0.5, 2 and 6 hr. The reaction was quenched by the addition of 1 M Tris buffer (pH 7.5) to a final Tris concentration of 0.5 M.
The 0 hr sample was prepared by quenching 1% formaldehyde with 1 M Tris buffer 10 minutes before the addition of protein. Control samples of all four time points were prepared by replacing the formaldehyde volume with PBS. Three repeats of the reaction were performed, and all of the following experimental steps were applied to each sample. The model protein was concentrated with 3 kDa MW-cut-off filters, and the buffer was replaced with 0.01% formic acid in water.

2.2.4 SDS-PAGE Analysis of Cross-linked Insulin

Insulin samples (8.6 µg) in 0.01% formic acid were mixed with PBS and 4× non-reducing SDS loading buffer (500 mM Tris pH 6.8, 8% SDS, 40% glycerol, 5 mg/mL bromophenol blue) to a final pH of 7.5, and incubated at 65 ℃ for 5 min. Proteins were then separated on a 15% acrylamide gel and visualized with Coomassie Brilliant Blue R250.

2.2.5 Glu-C Digestion of Cross-linked Insulin

Insulin in 0.01% formic acid was digested overnight at 25 ℃ in 50 mM ammonium bicarbonate (pH 7.8) with endoproteinase Glu-C (enzyme:substrate = 1:20 (w/w)). The digestion was quenched by decreasing the pH with 5% formic acid in water (Vdigestion:Vacid = 10:1), and samples were stored at -20 ℃.

2.2.6 Mass Spectrometric Analysis of Peptides

Peptide samples were diluted in water to 3 µM, then separated and analyzed by nano-HPLC MS and MS/MS on a nanospray-ESI-Q-TOF (QStar XL, Applied Biosystems, Foster City, CA) in the information-dependent-acquisition mode. The 15 cm long, 75 µm I.D. HPLC column was lab-made, packed with 3 µm reverse phase C18 beads (Dr. Maisch, Ammerbuch-Entringen, Germany). A water:acetonitrile:formic acid mobile phase was used with a 100 min gradient elution (from 0.1% formic acid to 80% acetonitrile with 0.1% formic acid). MS/MS spectra were collected with nitrogen as the collision gas, and the collision energy varied as an optimized function of m/z and z.
2.2.7 Labeling of MS/MS Spectra

To label MS/MS spectra of candidates of cross-linked peptides, the widely accepted b and y nomenclature system was modified to clarify component peptides and disulfide bonds. Component peptides of each precursor ion were labeled by Roman numerals: I, II, III etc. Disulfide bonds were represented by “-”, for example, I-II. The cross-link bridge and modifications were represented by their mass, 12 or 30. As an example of this modified nomenclature system, I-IIy5+12+III represents the following fragment ion:

2.3 Results and Discussion

2.3.1 Cross-linking of Insulin

At the beginning of the cross-linking experiments, insulin was incubated with or without formaldehyde (control) for 0, 0.5, 2 and 6 hours, and quenched with Tris buffer. These samples were separated by SDS-PAGE on a 15% gel (Figure 2-2) to monitor the reactions.

Figure 2-2. The SDS-PAGE gel of 300 µM insulin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.

In the gel, control samples remained unchanged when incubated for different periods of time. As the reaction between insulin and formaldehyde proceeded from 0 to 6 hr, higher mass species appeared and became a larger proportion of the sample. This indicated the formation of cross-links between insulin molecules and an increase in yield with longer formaldehyde treatment. The majority of the protein in the formaldehyde treated 6 hr sample was cross-linked. This sample was selected for identification of cross-linked peptides after digestion because of its high yield of cross-links.

2.3.2 Complexity of Insulin Peptide Mixture

The formaldehyde treated samples were digested and analyzed by LC-MS/MS. The LC-MS data of the 6 hr sample are presented as a 3D plot (Figure 2-3a) in which the X axis is the LC retention time, the Y axis is the m/z ratio, and grayscale represents the intensity of peptide signals.
The number of signals demonstrated the complexity of the peptide mixture from formaldehyde treated insulin. Some of the signals could be clearly assigned to unmodified insulin peptides by mass and MS/MS spectra. Five signals, 417.23 1+, 543.76 2+, 689.28 2+, 731.85 4+ and 975.46 3+, were assigned to insulin peptides α1-4, β22-30, α18-21β14-21, α5-17β1-13 and α5-17β1-13, respectively. For easier visualization of MS signals, zoom-ins of the regions around these signals are shown in Figure 2-3b, with these five signals from unmodified insulin peptides highlighted by solid circles. Since formaldehyde induced modifications cause a mass shift of (12m+30n) Da (where m and n are zero or positive integers and cannot both be zero), signals that show such mass shifts relative to those five signals could be considered as derived from modified insulin peptides. An LC retention time similar to that of the respective unmodified peptide, or the MS/MS spectrum, further supported that they were the corresponding modified forms. Signals assigned to modified insulin peptides are highlighted by dashed circles in Figure 2-3b, including those roughly assigned by mass only. A large number of signals in Figure 2-3b could not be assigned to unmodified or modified insulin peptides. None of the signals in Figure 2-3a outside of the zoom-in regions were from unmodified or modified insulin peptides, either. Possible source peptides of these unknown signals are: cross-linked insulin peptides, Glu-C autolysis products, digested impurities of the insulin sample, oxidized peptides and common contaminants in MS spectra of protein digests [108]. In theory, the source peptides of unknown MS signals, including cross-linked peptides, could be sequenced and identified by MS/MS spectra.
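The mass bookkeeping used to assign modified peptides can be sketched as follows: a signal is converted from m/z and charge to a neutral mass, and the difference from an unmodified peptide is tested against the (12m+30n) Da rule. This is a minimal illustration; the function names, the tolerance and the upper limit on m and n are assumptions, and the nominal shifts of 12 and 30 Da follow the text.

```python
PROTON = 1.00728  # mass of a proton (Da)


def neutral_mass(mz, z):
    """Convert an observed m/z at charge z (positive mode) to a neutral mass."""
    return z * (mz - PROTON)


def shift_combinations(delta, max_count=4, tol=0.05):
    """Return all (m, n) with 12*m + 30*n within tol of delta, excluding (0, 0).

    m counts 12 Da structures (Schiff base or methylene bridge) and
    n counts 30 Da methylol modifications.
    """
    hits = []
    for m in range(max_count + 1):
        for n in range(max_count + 1):
            if (m, n) != (0, 0) and abs(12.0 * m + 30.0 * n - delta) <= tol:
                hits.append((m, n))
    return hits


# Signal 417.23 1+ was assigned to unmodified peptide α1-4 (mass 416.23).
print(round(neutral_mass(417.23, 1), 2))

# Mass shifts listed for modified peptides in Table 2-1.
for delta in (12.0, 24.0, 30.0, 42.0, 54.0):
    print(delta, shift_combinations(delta))
```

Some shifts are ambiguous by mass alone (for example, 60 Da could be five 12 Da structures or two 30 Da modifications), which is one reason retention time and MS/MS data are used as supporting evidence.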
However, it is not feasible to identify cross-linked peptides by determining the source peptides of all the unknown signals, for three reasons: there are a large number of unknown signals; most unknown signals are of low abundance; and little is known about the CID fragmentation patterns of formaldehyde cross-linked peptides. Therefore, the large pool of unknown signals needs to be reduced to a smaller pool of candidates of cross-linked peptides.

Figure 2-3. (a) The 3D plot (LC retention time, m/z, signal intensity represented by grayscale) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr sample. (b) Zoom-ins of regions (i), (ii), (iii), (iv) and (v) in (a). Circled signals in (i), (ii), (iii), (iv) and (v) are assigned to unmodified (solid circles) and modified (dashed circles) insulin peptides α1-4, β22-30, α18-21β14-21, α5-17β1-13 and α5-17β1-13.

2.3.3 Candidates of Cross-linked Peptides

Candidates of cross-linked peptides can be identified by matching unknown experimental MS signals to a theoretical library of possible cross-linked peptides. The theoretical list of cross-linked peptides is made by considering all of the observed modified and unmodified insulin peptides as possible component peptides and combining them one by one. Table 2-1 shows the combinatorial table of insulin peptides in the Glu-C digest of formaldehyde treated insulin. The first column/row lists the four insulin peptides and their modified forms assigned to the circled signals in Figure 2-3b, which are the possible component peptides in this sample. The second column/row lists the masses of these peptides. These masses are summed one by one to make the rest of the table, i.e. a theoretical mass list of possible cross-linked peptides in this insulin sample, since the cross-linking step does not introduce any further mass shift relative to the component peptides.

Table 2-1.
The theoretical mass list of possible cross-linked peptides in the digest of formaldehyde treated 6 hr sample, made by considering unmodified and modified insulin peptides as possible component peptides and summing up their masses one by one. Masses in bold are used as examples to illustrate proposed structural components of putative cross-linked peptides. Underlined masses are those of candidates of cross-linked peptides, identified by matching this table with the masses of unknown signals in Figure 2-3a. “Mod” is short for modification(s).

Origin | Origin Mass | Summed masses with component peptides 1 to 13 (columns in the same order as the rows; lower triangle only)
1. α1-4 | 416.23 | 832.46
2. α1-4+1mod | 416.23+12 | 844.46, 856.46
3. α1-4+2mod | 416.23+24 | 856.46, 868.46, 880.46
4. α1-4+2mod | 416.23+42 | 874.48, 886.48, 898.48, 916.5
5. β22-30 | 1085.57 | 1501.8, 1513.8, 1525.8, 1543.82, 2171.14
6. β22-30+1mod | 1085.57+12 | 1513.8, 1525.8, 1537.8, 1555.82, 2183.14, 2195.14
7. β22-30+1mod | 1085.57+30 | 1531.82, 1543.82, 1555.82, 1573.84, 2201.16, 2213.16, 2231.18
8. β22-30+2mod | 1085.57+24 | 1525.8, 1537.8, 1549.8, 1567.82, 2195.14, 2207.14, 2225.16, 2219.14
9. β22-30+3mod | 1085.57+54 | 1555.82, 1567.82, 1579.82, 1597.84, 2225.16, 2237.16, 2255.18, 2249.16, 2279.18
10. α18-21β14-21 | 1376.61 | 1792.84, 1804.84, 1816.84, 1834.86, 2462.18, 2474.18, 2492.2, 2486.18, 2516.2, 2753.22
11. α18-21β14-21+1mod | 1376.61+12 | 1804.84, 1816.84, 1828.84, 1846.86, 2474.18, 2486.18, 2504.2, 2498.18, 2528.2, 2765.22, 2777.22
12. α18-21β14-21+1mod | 1376.61+30 | 1822.86, 1834.86, 1846.86, 1864.88, 2492.2, 2504.2, 2522.22, 2516.2, 2546.22, 2783.24, 2795.24, 2813.26
13. α5-17β1-13 | 2923.33 | 3339.56, 3351.56, 3363.56, 3381.58, 4008.9, 4020.9, 4038.92, 4032.9, 4062.92, 4299.94, 4311.94, 4329.96, 5846.66

The table proposes masses of possible cross-linked peptides by combining possible component peptides, so it implicates structural components of
the predicted cross-linked peptides. Three masses in Table 2-1 are highlighted in bold as examples to illustrate this. The mass 844.46 is in the column of α1-4 and the row of α1-4+1mod (12 Da) (“mod” is the abbreviation of modification), which means that a putative cross-linked peptide of this mass would consist of two α1-4 peptides and the cross-link bridge (12 Da) formed from the 12 Da modification. The mass 856.46 is in the column of α1-4 and the row of α1-4+2mod (24 Da), indicating that the structural components are two α1-4 peptides, the cross-link bridge formed from one 12 Da modification, and an extra 12 Da structure formed from another 12 Da modification. This extra 12 Da structure can be the original 12 Da Schiff-base modification or an intra-peptide cross-link bridge on one component peptide, or a second cross-link bridge between the two component peptides. A putative cross-linked peptide of m = 874.48 Da, in the column of α1-4 and the row of α1-4+2mod (42 Da), is proposed to contain two α1-4 peptides, a cross-link bridge formed from the 12 Da modification, and a 30 Da modification on one component peptide. This theoretical mass list was matched against the masses of unknown signals in Figure 2-3a, and background signals that also appeared in the control samples were eliminated, generating a small pool of six candidates of cross-linked peptides (masses underlined in Table 2-1). Their masses, the m/z of their MS signals, and their proposed structural components are listed in Table 2-2. These candidates are referred to hereafter by their MS signals: 505.61 3+, 757.88 2+, 763.91 2+, 602.62 3+, 903.42 2+ and 549.79 4+.

Table 2-2. The m/z value, mass and proposed structural components of candidates of cross-linked peptides in the digest of the formaldehyde treated 6 hr insulin sample.
The ^ represents the cross-link bridge, while * represents an extra Schiff-base modification or intra-peptide cross-link on one of the component peptides, or a second cross-link bridge between two peptides.

m/z | Mass | Proposed structural components
505.61 3+, 757.88 2+ | 1513.80 | β22 RGFFYTPKA β30 ^ α1 GIVE α4
763.91 2+ | 1525.80 | (β22 RGFFYTPKA β30 ^ α1 GIVE α4)*
602.62 3+, 903.42 2+ | 1804.80 | (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4
549.79 4+ | 2195.14 | (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)*

2.3.4 Verification of the Candidate 505.61 3+ by the MS/MS Spectrum

MS/MS spectra of the six candidates were collected to verify proposed structural components including component peptides, cross-links and modifications. More specifically, fragment ions were proposed based on fragmentations along peptide bonds of proposed component peptides and at the cross-link bridge, and matched with signals in the MS/MS spectra. Series of matches between theoretical fragment ions and experimental signals could indicate the correctness of proposed structural components.

The interpretation of MS/MS spectra is illustrated by the spectrum of the candidate 505.61 3+ (Figure 2-4). This candidate is proposed to be insulin peptide I, β22 RGFFYTPKA β30, cross-linked with peptide II, α1 GIVE α4. Firstly, fragmentations at the cross-link bridge could generate pairs of ions, each an intact component peptide with or without the cross-link bridge: I and II+12, or I+12 and II. The masses of these four proposed fragment ions were compared with masses of MS/MS signals. Signals 417.25 1+ and 549.80 2+ turned out to match II and I+12, while no matching signals for I and II+12 were found. This indicated that the source peptide of the signal 505.61 3+ contained two parts whose masses matched those of peptide II and of peptide I with a cross-link bridge, respectively.

Figure 2-4. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 505.61 3+.
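The type 1 matching step described above can be sketched in a few lines. This is only an illustration, not the procedure used in the thesis: the function names, the 0.1 Da tolerance and the proton mass constant are assumptions, while the peptide masses, the 12 Da bridge and the two MS/MS signals are taken from the text.

```python
PROTON = 1.00728  # proton mass in Da (assumed constant for this sketch)

def neutral_mass(mz, z):
    """Neutral mass of an ion observed at a given m/z and charge z."""
    return (mz - PROTON) * z

def matches(theoretical_mass, mz, z, tol=0.1):
    """True if the observed signal agrees with the mass within tol (Da).
    The tolerance is an assumed value, not one stated in the thesis."""
    return abs(neutral_mass(mz, z) - theoretical_mass) <= tol

MASS_I = 1085.57    # beta22-30, RGFFYTPKA (Table 2-1)
MASS_II = 416.23    # alpha1-4, GIVE (Table 2-1)
BRIDGE = 12.00      # cross-link bridge mass shift

# The four proposed type 1 ions from fragmentation at the cross-link bridge.
type1_ions = {
    "I": MASS_I,
    "I+12": MASS_I + BRIDGE,
    "II": MASS_II,
    "II+12": MASS_II + BRIDGE,
}

# MS/MS signals quoted in the text as (m/z, charge).
observed_signals = [(417.25, 1), (549.80, 2)]

assigned = {}
for name, mass in type1_ions.items():
    for mz, z in observed_signals:
        if matches(mass, mz, z):
            assigned[name] = (mz, z)

print(assigned)
```

Under these assumptions the two signals are assigned to II and I+12, in line with the assignment reported for candidate 505.61 3+. The same `neutral_mass` helper can be used to check whether two precursor signals with different charges, such as 602.62 3+ and 903.42 2+, correspond to the same mass.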
Secondly, fragmentation along the backbones of proposed component peptides could generate series of ions from one component peptide cross-linked with part of the other, and series of their counterparts. Examples of this type of proposed fragment ions generated by fragmentations at each peptide bond along peptide II are shown in Table 2-3. By assuming fragmentations at each peptide bond along peptide I, a similar table was generated. The masses of proposed fragment ions in both tables were compared to masses of MS/MS signals, and three pairs of matches were identified: I+12+IIb1 and IIy3, I+12+IIb2 and IIy2, I+12+IIb3 and IIy1. Mass differences between these signals and the previous two signals I+12 and II confirmed that the source peptide contained a part with a sequence of GIVE, which matched the sequence of the proposed component peptide II.

Table 2-3. Proposed fragment ions, their masses and the matching MS/MS signals, derived by assuming fragmentations along the backbone of the proposed component peptide II of the candidate 505.61 3+ (its MS/MS spectrum is shown in Figure 2-4).

Proposed fragment ion | Mass (Da) | Matching MS/MS signal
I+12+IIb1 | 1154.59 | 578.29 2+
IIy3 | 359.21 | 360.21 1+
I+12+IIb2 | 1267.77 | 634.86 2+
IIy2 | 246.13 | 247.12 1+
I+12+IIb3 | 1366.73 | 684.39 2+
IIy1 | 147.07 | 148.12 1+
I+12+IIy1 | 1244.62 | N/A
IIb3 | 269.18 | 270.18 1+
I+12+IIy2 | 1343.69 | N/A
IIb2 | 170.11 | 171.10 1+
I+12+IIy3 | 1456.77 | N/A
IIb1 (immonium ion) | 29.03 | N/A

Finally, as internal ions from multiple fragmentations are common in CID fragmentation, fragmentations at both the cross-link and peptide backbones could generate b and y ions of peptide I and II with or without the cross-link bridge. This type of proposed fragment ions matched with 18 MS/MS signals: Ib2, Ib3, Ib4, Ib5+12, Ib6+12, Ib7+12, Ib8+12, Iy1, Iy2, Iy3, Iy4, Iy6+12, Iy8+12, IIb2, IIb3, IIy1, IIy2 and IIy3.
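As a rough illustration of how a list of type 2 ions like Table 2-3 can be derived, the sketch below enumerates backbone cleavages of peptide II while peptide I stays attached through the bridge at the N-terminal side of II. It is a sketch under stated assumptions, not the calculation used in the thesis: the residue masses are standard monoisotopic values, the function names are hypothetical, and fragment masses are expressed as neutral masses (b = sum of residues, y = sum of residues plus water), which is the convention the tabulated values appear to follow.

```python
# Standard monoisotopic residue masses (Da) for the residues needed here.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "V": 99.06841, "I": 113.08406,
    "E": 129.04259, "R": 156.10111, "F": 147.06841, "Y": 163.06333,
    "T": 101.04768, "P": 97.05276, "K": 128.09496,
}
WATER = 18.01056
BRIDGE = 12.0  # cross-link bridge mass shift

def peptide_mass(seq):
    """Neutral mass of an intact peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def b_mass(seq, n):
    """Neutral mass of the first n residues (b-type fragment)."""
    return sum(RESIDUE[aa] for aa in seq[:n])

def y_mass(seq, n):
    """Neutral mass of the last n residues plus water (y-type fragment)."""
    return sum(RESIDUE[aa] for aa in seq[-n:]) + WATER

def type2_ions(seq_i, seq_ii):
    """Pairs of ions from each backbone cleavage of peptide II, assuming
    the bridge links peptide I to the N-terminal fragment of peptide II."""
    m_i = peptide_mass(seq_i)
    n_ii = len(seq_ii)
    pairs = []
    for n in range(1, n_ii):
        pairs.append((f"I+12+IIb{n}", m_i + BRIDGE + b_mass(seq_ii, n),
                      f"IIy{n_ii - n}", y_mass(seq_ii, n_ii - n)))
    return pairs

for name_a, mass_a, name_b, mass_b in type2_ions("RGFFYTPKA", "GIVE"):
    print(f"{name_a:12s}{mass_a:10.2f}   {name_b:6s}{mass_b:8.2f}")
```

The printed masses can be compared against Table 2-3; small differences in the second decimal place are expected because the sketch uses unrounded residue masses.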
These b and y ion series confirmed that the two parts revealed by signals I+12 and II had the sequences RGFFYTPKA and GIVE, the same as the sequences of proposed component peptides I and II. All three types of information together confirmed that the source peptide of signal 505.61 3+ contained insulin peptides β22 RGFFYTPKA β30 and α1 GIVE α4, and a cross-link bridge. This candidate was therefore verified to be a cross-linked peptide. This cross-linked peptide is later referred to as β22 RGFFYTPKA β30 ^ α1 GIVE α4, using the symbol ^ to represent the cross-link bridge.

Fragmentations on this cross-linked peptide and resulting fragment ions can be classified into three types (Figure 2-5). Type 1 ions are from fragmentations at the cross-link bridge, which are pairs of ions of component peptides, one of each pair with the methylene bridge attached. Type 2 ions are from fragmentation in the backbones of component peptides, which are series of ions that consist of one component peptide cross-linked to part of the other (of different lengths), and another ion series which are their counterparts. Two fragmentations, in the peptide backbones and at the cross-link bridge, produce type 3 ions, the b and y ion series of component peptides. These fragmentations and fragment ions are similar to those of peptides cross-linked by other cross-linkers 67, supporting my conclusion that candidate 505.61 3+ is a cross-linked peptide. These types of fragmentations and fragment ions are later examined in other candidates of cross-linked peptides.

Figure 2-5. The MS/MS fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 505.61 3+ (Figure 2-4), to illustrate 3 types of fragment ion series.
2.3.5 Verification of Candidates 903.42 2+ and 602.62 3+ by MS/MS Spectra

Following this general approach, the MS/MS spectrum of the candidate 903.42 2+ (Figure 2-6) was also interpreted by matching signals to proposed fragmentations and fragment ions (matches shown in Figure 2-7). Candidate 903.42 2+ was proposed to be the disulfide bond connected peptide I-II, α18 NYCN α21 - β14 ALYLVCGE β21, cross-linked with peptide III, α1 GIVE α4, as shown in Table 2-2. Fragmentation at the cross-link bridge could generate type 1 ions, I-II and III+12, or I-II+12 and III. The masses of two MS/MS signals, 417.24 1+ and 1389.62 1+, matched the masses of proposed fragment ions III and I-II+12. A list of type 2 ions was derived by assuming fragmentations along the proposed component peptides I, II and III. There were eight MS/MS signals that matched the proposed type 2 ions from fragmentations along the backbone of peptide II: IIb2 and I-IIy6+12+III, IIb3 and I-IIy5+12+III, IIb4 and I-IIy4+12+III, IIb5 and I-IIy3+12+III. There were 15 MS/MS signals that matched the proposed type 3 ions; their corresponding fragmentation sites are shown in Figure 2-7. Examples are signals 830.27 1+, 929.34 1+ and 1042.43 1+, matching the proposed fragment ions I-IIy3+12, I-IIy4+12 and I-IIy5+12, generated by fragmentation at both the backbone of peptide I-II and the cross-link bridge. As with the verification of the candidate 505.61 3+, three types of proposed fragment ions together confirmed that the source peptide of signal 903.42 2+ contained insulin peptides α18 NYCN α21 - β14 ALYLVCGE β21 and α1 GIVE α4 and a cross-link bridge, and was therefore a cross-linked peptide.

Figure 2-6. The MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 903.42 2+.

Figure 2-7. The fragmentation patterns deduced from the MS/MS spectrum of a cross-linked peptide with an MS signal of 903.42 2+ (Figure 2-6).
The candidate 602.62 3+ (Figure 2-8) has the same mass and LC retention time as the candidate 903.42 2+, suggesting that they are the same peptide. Therefore, proposed fragment ions of candidate 602.62 3+ were the same as those derived from candidate 903.42 2+. Signals in the MS/MS spectrum of candidate 602.62 3+ (Figure 2-8b) that matched proposed fragment ions (matches are shown in Figure 2-8a) contained all 3 types of fragment ions, which together confirmed the candidate was a cross-linked insulin peptide, (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4.

Figure 2-8. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 602.62 3+.

Furthermore, most of the assigned MS/MS signals and corresponding fragmentation sites on source peptides were the same between candidates 602.62 3+ and 903.42 2+, supporting the hypothesis that 602.62 3+ and 903.42 2+ represented different charge states of the same cross-linked peptide. In terms of the CID fragmentation process, different charge states could cause different charge distributions along the peptide backbones on different protonation sites. However, the fragmentation sites along peptide backbones were not significantly affected by the different charge distribution between these two cross-linked peptides, consistent with a general observation that the singly, doubly or triply protonated forms of the same peptide often produce similar fragmentation patterns 18,109. In both charge states, no MS/MS signal was matched to a proposed fragment ion from fragmentation at the disulfide bond, suggesting that the disulfide bond was not a preferential fragmentation site. This produced fragment ions that contained up to three peptide chains, adding complexity to proposing fragment ions and the spectrum interpretation.
2.3.6 Verification of Candidates 549.79 4+, 757.88 2+ and 763.88 2+ by MS/MS Spectra

The previous three candidates were verified by matching MS/MS signals to putative fragment ions derived from proposed structural components, revealing three types of fragment ions generated by cross-linked peptides. MS/MS spectra of the remaining three candidates, 549.79 4+ (Figure 2-9), 757.88 2+ and 763.88 2+ (Figure 2-10), were interpreted in the same way using the same classification of fragment ions.

The MS/MS spectrum of the candidate 549.79 4+ (Figure 2-9b) was found to contain only type 1 and 3 ions, with their corresponding fragmentation sites on the peptide shown in Figure 2-9a. Type 1 ions I+12 and II+12 (corresponding signals are 549.78 2+ and 1098.46 1+) suggested that the precursor ion consisted of two parts of m=1097.56 Da each. This mass equaled that of β22 RGFFYTPKA β30 plus a cross-link bridge or a Schiff-base modification. Type 3 ions, b and y ion series, confirmed that both parts had a sequence of RGFFYTPKA and a structure of +12 Da mass shift attached. Type 1 and type 3 fragment ions together suggested that the source peptide of the signal 549.79 4+ was (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)*. Here, the symbol * represents an extra modification or intra-peptide cross-link on one component peptide, or an extra inter-peptide cross-link.

Figure 2-9. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of a candidate of cross-linked peptide, with an MS signal of 549.79 4+. The * represents an extra Schiff-base modification or cross-link.

The candidate 757.88 2+ is proposed to be peptide β22 RGFFYTPKA β30 cross-linked with α1 GIVE α4 (Table 2-2). The MS/MS spectrum (Figure 2-10a) contains signals from all three types of ions. Type 1 ion I (corresponding signals are 543.71 2+ and 1086.48 1+) indicated that the peptide consisted of a component of m=1085.48 Da, the same mass as the proposed component peptide I.
Observed type 2 ions were IIy3 and I+12+IIb1, IIy2 and I+12+IIb2, IIy1 and I+12+IIb3. The mass differences between type 1 and 2 ions, and the precursor, I+12+II, suggested a peptide segment of GIVE attached to the part of m=1085.48 Da via a 12 Da structure. Type 1 and 2 ions together suggested that the candidate consisted of a part of m=1085.48 Da, possibly β22 RGFFYTPKA β30, cross-linked to α1 GIVE α4. However, there were few type 3 ions, not enough to prove that the part of m=1085.48 Da has a sequence of RGFFYTPKA. Therefore, the candidate 757.88 2+ was not confirmed to be a cross-linked peptide, because not all the proposed structural components were confirmed by the MS/MS spectrum, but it is still a likely candidate.

The MS/MS spectrum of candidate 763.88 2+ (Figure 2-10b) presents a very similar situation to that of candidate 757.88 2+. The candidate is proposed to be β22 RGFFYTPKA β30 cross-linked with α1 GIVE α4, with an extra modification or cross-link (Table 2-2). The MS/MS spectrum (Figure 2-10b) contained signals from type 1 and 2 ions that suggested that the candidate consisted of a component of m=1097.60 Da, possibly β22 RGFFYTPKA β30 with an extra modification or cross-link, cross-linked to α1 GIVE α4. However, there were too few type 3 ions to sequence the part of m=1097.60 Da. Therefore, the candidate 763.88 2+ was not confirmed to be a cross-linked peptide, but it is still a likely candidate.

Figure 2-10. The fragmentation patterns, types of fragment ions and MS/MS spectra of candidates (a) 757.88 2+ and (b) 763.88 2+. The * represents an extra Schiff-base modification or cross-link.

Taking MS/MS spectra of candidates 549.79 4+, 757.88 2+ and 763.88 2+ together, absence or a limited number of one type of fragment ions did not necessarily affect the confirmation of proposed structural components.
Type 1 and type 3 ions in the MS/MS spectrum of candidate 549.79 4+, with the absence of type 2 ions, were enough to confirm sequences of the proposed component peptides. Although the MS/MS spectra of candidates 757.88 2+ and 763.88 2+ contained all three types of fragment ions, the lack of type 3 ions from one proposed component peptide obstructed the confirmation of its sequence. Therefore, these candidates were possibly cross-linked peptides, but not confirmed. My skeptical attitude towards these candidates is supported by the fact that a lack of type 3 ions from one component peptide has been shown to be a common source of false positive assignment of cross-linked peptides from other cross-linkers 67,110.

2.3.7 Partial Stability of Cross-link Bridges in CID Fragmentation

The formaldehyde induced cross-links were considered to be vulnerable to CID fragmentation, as the cross-link between glycine and a peptide was reported to break easily in one study 102. However, the MS/MS spectra of cross-linked peptides β22 RGFFYTPKA β30 ^ α1 GIVE α4 and (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4 (Figure 2-4, 2-6 and 2-8b) showed fragmentation along the backbone of insulin peptide α1 GIVE α4 or α18 NYCN α21 - β14 ALYLVCGE β21 without affecting the cross-link bridge, which generated type 2 ions. Therefore, formaldehyde induced cross-links are shown to be more stable than previously hypothesized. At the same time, fragmentation also occurred at the cross-link bridge, producing type 1 and 3 ions in MS/MS spectra of cross-linked peptides β22 RGFFYTPKA β30 ^ α1 GIVE α4, (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4 and (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)* (Figure 2-4, 2-6, 2-8b and 2-9b). In conclusion, formaldehyde induced cross-links between peptides are less vulnerable to CID fragmentation than previously predicted but not absolutely stable.
This partial stability, which facilitates the localization of cross-links on the component peptides, is discussed in the following section.

2.3.8 Localization of Cross-link Sites

MS/MS spectra have been used to localize modifications based on the mass shifts they produce on series of ions that contain modified residues 21-23. Therefore, it is hypothesized that in the MS/MS spectrum of a cross-linked peptide, series of fragment ions that contain the cross-link bridge (+12 Da mass shift) could be used to localize the cross-link on component peptides.

The localization of cross-link sites is illustrated in the MS/MS spectrum of the cross-linked peptide β22 RGFFYTPKA β30 ^ α1 GIVE α4 (Figure 2-4). In the following discussion, peptide I represents β22 RGFFYTPKA β30, while peptide II represents α1 GIVE α4. The mass differences between the ion series I+12, I+12+IIb1, I+12+IIb2, I+12+IIb3 and the precursor ion I+12+II indicated that a peptide segment of GIVE was attached to peptide I, with a cross-link bridge of +12 Da mass shift. In other words, β22 RGFFYTPKA β30 was cross-linked to α1 GIVE α4 at the G α1 residue. The cross-link site on β22 RGFFYTPKA β30 was localized to Y β26, as suggested by the following conclusions derived from the ion series Ib2, Ib3, Ib4, Ib5+12, Ib6+12, Ib7+12, Ib8+12 and I+12. Specifically, Ib2, Ib3 and Ib4 suggested that the cross-link bridge was not attached to β22 RGFF β25. The mass difference between Ib4 (508.28 1+) and Ib5+12 (683.37 1+), 175.09 Da, equals the residue mass of tyrosine plus 12 Da and so indicated that the cross-link bridge of +12 Da mass shift was attached on Y β26. Additionally, the ion series Ib5+12, Ib6+12, Ib7+12, Ib8+12 and I+12 indicated that the cross-link bridge was attached to β22 RGFFY β26 and not to β27 TPKA β30. These together indicate that Y β26 was the cross-link site on β22 RGFFYTPKA β30. The y ion series of β22 RGFFYTPKA β30 localized the cross-link bridge in the same way.
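The localization logic can be written down as a small window-narrowing routine. The following is a minimal sketch rather than the procedure used in the thesis: the function name and the input format (fragment index mapped to whether the +12 Da shift was observed) are assumptions, and the example inputs are the type 3 ions of peptide I listed for candidate 505.61 3+.

```python
def localize_bridge(seq, first_residue, b_shifted, y_shifted):
    """Narrow down the residues that can carry the +12 Da bridge.

    b_shifted / y_shifted map a fragment index n to True if the ion was
    observed with the +12 Da shift and False if it was observed without.
    Returns the remaining candidate residues as (letter, residue number).
    """
    n_res = len(seq)
    lo, hi = 1, n_res  # candidate window, 1-based positions in seq
    for n, shifted in b_shifted.items():
        if shifted:
            hi = min(hi, n)          # bridge lies within the first n residues
        else:
            lo = max(lo, n + 1)      # bridge lies after residue n
    for n, shifted in y_shifted.items():
        if shifted:
            lo = max(lo, n_res - n + 1)  # bridge lies within the last n residues
        else:
            hi = min(hi, n_res - n)      # bridge lies before the last n residues
    if lo > hi:
        raise ValueError("fragment ions are mutually inconsistent")
    return [(seq[i - 1], first_residue + i - 1) for i in range(lo, hi + 1)]

# Type 3 ions of peptide I (beta22-30) reported for candidate 505.61 3+.
b_ions = {2: False, 3: False, 4: False, 5: True, 6: True, 7: True, 8: True}
y_ions = {1: False, 2: False, 3: False, 4: False, 6: True, 8: True}

print(localize_bridge("RGFFYTPKA", 22, b_ions, {}))
print(localize_bridge("RGFFYTPKA", 22, {}, y_ions))
print(localize_bridge("RGFFYTPKA", 22, b_ions, y_ions))
```

With the b ions alone the window closes on a single residue, whereas the y ions alone leave a two-residue window because Iy5 was not observed; combining both series gives the narrowest window. The sketch assumes a single bridge per peptide, so it does not apply to cases with an additional 12 Da structure.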
The ion series Iy1, Iy2, Iy3, Iy4, Iy6+12, Iy8+12 and I+12 indicated that the cross-link bridge was on β25 FY β26 and not on β22 RGF β24 or β27 TPKA β30, supporting the identification of Y β26 as the cross-link site by the b ion series. Consequently, the cross-linked peptide β22 RGFFYTPKA β30 ^ α1 GIVE α4 was formed by G α1 cross-linked to Y β26.

The cross-link sites in the cross-linked peptide (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4 were also localized by fragment ions containing the cross-link bridge. In the following discussion, peptide I-II represents α18 NYCN α21 - β14 ALYLVCGE β21, while peptide III represents α1 GIVE α4. In the MS/MS spectrum of the doubly charged form (Figure 2-6), I-IIy3+12+III, I-IIy4+12+III, I-IIy5+12+III and I-IIy6+12+III indicated that peptide III was cross-linked to the I-IIy3 segment of peptide I-II. Series of y ions of peptide I-II with the cross-link attached, I-IIy3+12, I-IIy4+12 and I-IIy5+12, also localized the cross-link to the I-IIy3 segment of peptide I-II. The ion series IIb2, IIb3, IIb4, IIb5, IIb6 and I-II+12 suggested that the cross-link bridge was not attached to the IIb6 segment but to the I-IIy3 segment. This evidence taken together showed that the cross-linked peptide (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4 was formed by α1 GIVE α4 (III) cross-linked to the α18 NYCN α21 - β19 CGE β21 (I-IIy3) segment. The α18 NYCN α21 - β19 CGE β21 segment as the cross-link site was also assigned by similar series of fragment ions from the triply charged form (MS/MS spectrum shown in Figure 2-8b).

Localization of the cross-link bridge in the cross-linked peptide (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)* (Figure 2-9) is more complicated. Because the two component peptides are the same, and both the cross-link bridge and the extra modification/cross-link induce a mass shift of +12 Da, it is difficult to discriminate which structure is attached to which component peptide based on the b/y ion series.
The detailed discussion of the extra structure and possible cross-link sites, based on additional information, will be provided in Chapter 3 instead.

According to these cross-linked peptides, cross-link sites were localized by series of type 2 ions that contained one component peptide cross-linked to part of the other of different lengths, and by type 3 ion series that were b/y ions of component peptides with the cross-link bridge attached. Type 2 and type 3 ions, which provided complementary information about the cross-link sites, were produced by the cross-link bridge staying intact or breaking, respectively. Therefore, the partial stability of cross-link bridges in CID fragmentation facilitated the localization of cross-link sites. In addition, the localization of cross-link sites in peptide β22 RGFFYTPKA β30 ^ α1 GIVE α4 revealed a G α1 (N-terminus) to Y β26 cross-link in proteins, consistent with the high reactivity of the N-terminus in the modification step, and the potential reactivity of Tyr (Y) in the cross-linking step 102,104.

2.4 Conclusions and Outlook

In this chapter, cross-linked peptides have been studied in a model protein to gain knowledge about the MS properties of cross-linked peptides and the reaction chemistry of formaldehyde cross-linking. As a first step, a method was developed to identify cross-linked peptides in model protein systems. Matches between a theoretical list of cross-linked peptides and experimental MS signals, with background signals subtracted, were considered as candidates for cross-linked peptides. Signals in the MS/MS spectra of a candidate were matched with proposed structural components of the candidates, and the confirmation of all of the proposed structural components then verified the candidate to be a cross-linked peptide. The CID fragmentation of a formaldehyde cross-linked peptide proved to occur at both the cross-link bridge and peptide backbones, generating three types of fragment ions: 1.
Ions of component peptides; 2. Series of ions that contain one component peptide cross-linked to part of the other, and another ion series which are their counterparts; 3. The b and y ion series of component peptides, with or without the cross-link bridge. Type 2 and type 3 ions allowed localization of cross-link sites on component peptides.

The whole approach allowed the identification of three cross-linked peptides in the insulin model protein system: β22 RGFFYTPKA β30 ^ α1 GIVE α4, (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4 and (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)*. Also, series of type 2 and type 3 ions allowed the localization of cross-link sites to G α1 and Y β26 in β22 RGFFYTPKA β30 ^ α1 GIVE α4, and to the α18 NYCN α21 - β19 CGE β21 segment in (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4.

This method is generally applicable to other model proteins, to identify candidates of formaldehyde cross-linked peptides, to interpret the MS/MS spectra based on proposed structural components, and to localize the cross-link sites. The interpretation of the MS/MS spectra of formaldehyde cross-linked peptides is also applicable to data of cross-linked native proteins, once candidates of cross-linked peptides are found in their much more complex peptide mixture. This model study also provided valuable information on the MS properties of cross-linked peptides and the reaction chemistry, which could help to overcome challenges in the identification of cross-linked peptides from cross-linked native proteins.

The challenges of identifying low abundance cross-linked peptides from a high complexity peptide mixture could be solved by a combination of enrichment methods for cross-linked peptides and bioinformatics software to automate interpretation of complex data. On the one hand, bioinformatics software requires user-input parameters such as a fragmentation model of cross-linked peptides and residue reactivity of the cross-linker.
CID fragmentation patterns of formaldehyde cross-linked peptides were revealed in this study for the first time, which can therefore be submitted to bioinformatics software. Also, an N-terminus to Tyr (Y) cross-link induced by formaldehyde was revealed in proteins, which can be submitted to the residue reactivity part of bioinformatics software. On the other hand, the identification of three cross-linked peptides in a model protein makes it possible to develop enrichment methods for cross-linked peptides. Different chromatographic strategies that are established for other cross-linkers, such as strong cation exchange (SCX), strong anion exchange (SAX) and size exclusion chromatography (SEC) 67,70-72, can be adapted to the digest of formaldehyde treated model protein samples to examine whether signals from cross-linked peptides are improved in signal to noise.

The method was developed in a very simple model protein system, and requires verification in other model proteins which could generate larger and more complex peptide mixtures. Moreover, only one pair of reactive residues was revealed in one cross-linked peptide, while cross-link sites in two cross-linked peptides were not localized to individual residues due to the complexity added by disulfide bonds and an extra 12 Da structure. Therefore, investigation of other model proteins and refinement of the method is necessary.

3 Method Refinement Using Other Model Protein Systems

3.1 Introduction

A method to identify formaldehyde cross-linked peptides in the digest of a model protein was developed in Chapter 2. This method allowed the identification of three cross-linked peptides. Moreover, three types of CID fragmentation and resulting fragment ions from cross-linked peptides were revealed: 1. Fragmentation at the cross-link that produces pairs of signals from component peptides; 2.
Fragmentation at peptide backbones that generates a series of ions that contain one component peptide cross-linked to part of the other and another ion series consisting of their counterparts; 3. Fragmentation at both peptide backbones and the cross-link bridge that produces b and y ion series of component peptides. Type 2 and 3 ions helped to localize the cross-link to small regions of component peptides and even individual residues.

To validate my method, its applicability to more complex model systems needs to be tested. In addition, a deeper understanding of the formaldehyde cross-linking reactions could be gained if more cross-linked peptides are identified with cross-links localized to individual residues. In this chapter I apply this method to two other model protein systems, to assess its applicability to model proteins of equivalent and higher complexity, identify more cross-linked peptides, explore the reaction chemistry in more depth, and learn to tackle issues associated with increasing sample complexity.

Insulin, if its disulfide bonds are reduced during Glu-C digestion, produces six peptides: α1-4, α5-17, α18-21, β1-13, β14-21 and β22-30. This makes a perfect model system to verify the method, because an effective experimental workflow has been established with this protein, but the resulting peptide mixture is different from the one used for method development (Chapter 2). The method is also applied to a slightly larger model protein, myoglobin. Myoglobin has a molecular weight of 17 kDa, and 14 peptides are generated upon Glu-C digestion. Also, myoglobin has been shown to produce various modified peptides, because it contains 19 lysine residues that are very reactive in the modification step 104. These facts together lead to a much more complex mixture of unmodified, modified and cross-linked peptides than that from the insulin model system, which is suitable for further method verification and refinement.
3.2 Experimental

3.2.1 Materials

Insulin from bovine pancreas, myoglobin from horse heart, α-cyano-4-hydroxycinnamic acid (CHCA), trizma base, ammonium bicarbonate, sodium hydroxide, sodium dodecyl sulfate (SDS), tetramethylethylenediamine (TEMED) and glycerol were all obtained from Sigma (St. Louis, MO). Paraformaldehyde (PFA), formic acid (FA, 88%) and acetonitrile (ACN, HPLC grade) were purchased from Fisher (Fair Lawn, NJ). Acrylamide, ammonium persulfate (APS), bromophenol blue, Coomassie Brilliant Blue R250, and gel casting and running systems were purchased from Bio-Rad (Hercules, CA). Endoproteinase Glu-C was obtained from Roche Applied Science (Penzberg, Germany). 3 kDa MW-cut-off filters were purchased from Millipore Corporation (Cork, Ireland), and syringe filters (0.22 µm) were purchased from Pall Corporation (Ann Arbor, MI). Deionized water (18 MΩ cm) was prepared using a Nanopure Ultrapure Water System from Barnstead (Dubuque, IA).

3.2.2 Preparation of Formaldehyde Solution

A 4% (w/v) (1.3 M) formaldehyde stock solution was prepared by heating PFA in PBS (80 ℃, pH 7.5) for 30 min, cooling to room temperature and filtering through a 0.22 µm filter.

3.2.3 Cross-linking of the Model Protein

The model protein, insulin (300 µM) or myoglobin (100 µM), was incubated with formaldehyde (1%, w/v) in PBS (37 ℃, pH 7.5) for 0, 0.5, 2 and 6 hr. The reaction was quenched by adding 1 M Tris buffer (pH 7.5) to a final Tris concentration of 0.5 M. The 0 hr sample was prepared by quenching 1% formaldehyde with 1 M Tris buffer 10 minutes before the addition of protein. Control samples of all four time points were prepared by replacing the formaldehyde volume with PBS. Three repeats of the reaction were performed, and all of the following experimental steps were applied to each sample. The model protein was concentrated with 3 kDa MW-cut-off filters, and the buffer was replaced with 0.01% formic acid in water.
3.2.4 SDS-PAGE Analysis of Cross-linked Insulin

Insulin samples (8.6 µg) in 0.01% formic acid were mixed with PBS and 4× non-reducing SDS loading buffer (500 mM Tris pH 6.8, 8% SDS, 40% glycerol, 5 mg/mL bromophenol blue), and incubated at 65 ℃ for 5 min. Proteins were then separated on a 15% acrylamide gel and visualized by Coomassie Brilliant Blue R250.

3.2.5 Mass Spectrometric Analysis of Cross-linked Myoglobin

Myoglobin samples were mixed with a saturated solution of CHCA (in 50:50 ACN:5% FA) to a final concentration of 5 µM. Each sample was spotted onto a MALDI plate, air dried and analyzed by MALDI-TOF MS (4700 Proteomics Analyzer, Applied Biosystems, Foster City, CA) in linear mode. The centroid mass was recorded.

3.2.6 Glu-C Digestion of Cross-linked Proteins

Proteins, insulin or myoglobin, in 0.01% formic acid were digested overnight at 25 ℃ in 50 mM ammonium bicarbonate (pH 7.8) with endoproteinase Glu-C (enzyme:substrate = 1:20 (w/w)). Disulfide bonds in insulin were reduced by DTT at 56 ℃ for 1 hr, and alkylated by IAA at 25 ℃ for 0.5 hr before addition of the enzyme. The digestion was quenched by decreasing the pH using 5% formic acid in water (Vdigestion:Vacid = 10:1), and samples were stored at -20 ℃.

3.2.7 Preparation of Modified Insulin α-Chain

The formaldehyde treated 0.5 hr and 6 hr insulin samples were reduced by DTT at 56 ℃ for 1 hr, and alkylated by IAA at 25 ℃ for 0.5 hr. Thus insulin α- and β-chains containing modifications were generated.

3.2.8 Mass Spectrometric Analysis of Peptides

Peptide and α-/β-chain samples were diluted in water to 3 µM, then separated and analyzed by nano-HPLC MS and MS/MS on a nanospray-ESI-Q-TOF (QStar XL, Applied Biosystems, Foster City, CA) in the information-dependent-acquisition mode. The 15 cm long, 75 µm I.D. HPLC column was lab-made, packed with 3 µm reverse phase C18 beads (Dr. Maisch, Ammerbuch-Entringen, Germany).
Water:acetonitrile:formic acid with a 100 min gradient elution (0.1% formic acid to 80% acetonitrile, 0.1% formic acid) was used as the mobile phase. MS/MS spectra were collected with nitrogen as the collision gas, and the collision energy varied as an optimized function of m/z and z.

3.2.9 Labeling of MS/MS Spectra of Cross-linked Peptides

To label MS/MS spectra of candidates of cross-linked peptides, the widely accepted b and y nomenclature system was modified to clarify component peptides and disulfide bonds. Component peptides of each precursor ion were labeled by Roman numerals, I, II, III etc. The cross-link bridge and modifications were represented by their mass, 12 or 30.

3.2.10 Localization of Modification Sites by Degree of Modification (DOM)

A graphic visualization has been devised by Toews et al. 103 to localize multiple modifications on a modified peptide by the average number of modifications, the degree of modification (DOM), on fragment ions. The DOM of a fragment ion is calculated by multiplying the peak area (PA) of each modification state by the number of modifications it contains, dividing by the total PA of all modification states of that fragment ion, and summing across all modification states. A sample calculation of the DOM value of the b4 ion from a doubly modified peptide (+24 Da) is shown in the following equation.

$$\mathrm{DOM}_{b_4} = 0 \times \frac{PA_{b_4}}{PA_{b_4} + PA_{b_4+12} + PA_{b_4+24}} + 1 \times \frac{PA_{b_4+12}}{PA_{b_4} + PA_{b_4+12} + PA_{b_4+24}} + 2 \times \frac{PA_{b_4+24}}{PA_{b_4} + PA_{b_4+12} + PA_{b_4+24}}$$

In order to localize modification sites in a multiply modified peptide, DOM values are calculated for each detectable b and y ion, and plotted as bar graphs against the peptide sequence. Along the peptide sequence, the DOM values show a significant difference (a step in the bar graph) at each modified residue and stay unchanged (a plateau in the bar graph) at unmodified residues.
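The DOM calculation is a weighted average and can be sketched directly from its definition. The code below is an illustration only: the function names, the step threshold and all peak areas are hypothetical values chosen for the example, not data from this study.

```python
def degree_of_modification(peak_areas):
    """DOM of one fragment ion.

    peak_areas maps the number of modifications carried by the ion
    (0, 1, 2, ...) to the peak area of that modification state.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return sum(n_mod * area for n_mod, area in peak_areas.items()) / total

def find_steps(dom_by_index, threshold=0.5):
    """Fragment indices where DOM rises by more than threshold relative
    to the previous detected fragment (the threshold is an assumed value)."""
    steps = []
    indices = sorted(dom_by_index)
    for prev, curr in zip(indices, indices[1:]):
        if dom_by_index[curr] - dom_by_index[prev] > threshold:
            steps.append(curr)
    return steps

# Hypothetical peak areas for b4, b4+12 and b4+24 of a doubly modified peptide.
b4_areas = {0: 100.0, 1: 300.0, 2: 600.0}
print(degree_of_modification(b4_areas))

# Hypothetical DOM profile along a b ion series.
dom_profile = {1: 0.0, 2: 0.0, 3: 1.0, 4: 1.0, 5: 2.0}
print(find_steps(dom_profile))
```

In this made-up profile the DOM value steps up at b3 and b5, which would point to the third and fifth residues as modification sites, while the plateaus in between correspond to unmodified residues.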
In the case of a singly modified peptide, which is a mixture of the same peptide carrying the modification on different reactive residues, the DOM values of detectable b and y ions are calculated in the same way but expressed as percentages. DOM values are again plotted against the peptide sequence as bar graphs, where a significant difference in DOM values (a step in the bar graph) indicates a modification site.

3.3 Results and Discussion

3.3.1 Model Protein Insulin with Alternative Processing

3.3.1.1 Identification of Five Cross-linked Insulin Peptides

The experimental workflow and data analysis were the same as described in Chapter 2, except that disulfide bonds were reduced and alkylated before the addition of Glu-C. Briefly, insulin was incubated with formaldehyde or without formaldehyde (control) for 0, 0.5, 2 and 6 hr, and quenched with Tris buffer. Samples were separated by SDS-PAGE on a 15% gel to verify the progression of cross-linking between insulin molecules. All samples were digested and analyzed by LC-MS/MS, and the formaldehyde treated 6 hr insulin sample was selected for the identification of cross-linked peptides because of its high yield of cross-links. Besides the signals assigned to unmodified and modified insulin peptides by mass, MS/MS spectra and LC retention time, the LC-MS data of the formaldehyde treated 6 hr sample contained many unknown signals. These unknown MS signals were matched with a theoretical list of possible cross-linked peptides, made by combining all the unmodified and modified insulin peptides in this sample, and were thereby reduced to a few candidates of cross-linked peptides with proposed structural components. MS/MS spectra of candidate peptides were collected. Matching MS/MS signals to proposed fragment ions demonstrated that five candidate peptides (Table 3-1) contained all proposed structural components, and therefore were verified to be cross-linked peptides.
They are later referred to as β22 RGFFYTPKA β30 ^ α1 GIVE α4, (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)*, β22 RGFFYTPKA β30 ^ α18 NYCN α21, α18 NYCN α21 ^ α1 GIVE α4 and (α18 NYCN α21 ^ α1 GIVE α4)*, with ^ representing the cross-link, and * representing an extra modification or cross-link.

Table 3-1. The m/z value, mass and structural components of cross-linked peptides identified in the digest of the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced). The ^ represents the cross-link bridge, while the * represents an extra Schiff-base modification or cross-link.

m/z (charge)    Mass (Da)    Structural components
505.61 (3+)     1513.83      β22 RGFFYTPKA β30 ^ α1 GIVE α4
549.79 (4+)     2195.16      (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)*
556.61 (3+)     1666.83      β22 RGFFYTPKA β30 ^ α18 NYCN α21
499.69 (2+)     997.38       α18 NYCN α21 ^ α1 GIVE α4
505.69 (2+)     1009.38      (α18 NYCN α21 ^ α1 GIVE α4)*

Among the five cross-linked peptides, β22 RGFFYTPKA β30 ^ α1 GIVE α4 and (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)* were also identified in Chapter 2, with MS/MS spectra (not shown) similar to Figure 2-4 and 2-9. The remaining three cross-linked peptides, β22 RGFFYTPKA β30 ^ α18 NYCN α21, α18 NYCN α21 ^ α1 GIVE α4 and (α18 NYCN α21 ^ α1 GIVE α4)*, were new findings in this disulfide-reduced model system. However, peptide α18 NYCN α21 ^ α1 GIVE α4 could be the disulfide-reduced form of peptide (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4, which was identified in the disulfide-non-reduced model system in Chapter 2. The disulfide-non-reduced forms of the other two newly found cross-linked peptides, β22 RGFFYTPKA β30 ^ (α18 NYCN α21 - β14 ALYLVCGE β21) (m=4311.94 Da) and ((α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4)* (m=1816.84 Da), should also exist in the disulfide-non-reduced insulin model system described in Chapter 2, although they were not identified.
It seems that breaking disulfide bridges makes cross-linked peptides containing α18 NYCN α21 easier to ionize and to generate an MS signal, thereby providing information complementary to that from the disulfide-non-reduced insulin model system.

3.3.1.2 Localization of the Cross-link Bridge

After the identification of cross-linked peptides, cross-link sites on component peptides were localized by the method developed in Chapter 2.3.8. For the two cross-linked peptides that were also identified in Chapter 2, the MS/MS spectrum of β22 RGFFYTPKA β30 ^ α1 GIVE α4 again localized the cross-link to G α1 and Y β26, while cross-link sites in (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)* will be discussed in 3.3.1.3. The three newly identified cross-linked peptides produced all three types of fragment ions (fragmentation patterns shown in Figure 3-1a, 3-2a and 3-3a):
1. ions of whole component peptides;
2. series of ions that contain one whole component peptide cross-linked to part of the other, and another ion series of their counterparts;
3. series of b and y ions of component peptides.
Type 2 and 3 ions were examined in detail to localize cross-link sites.

The cross-link sites in α18 NYCN α21 ^ α1 GIVE α4 (Figure 3-1) were localized by both type 2 and type 3 ions. Fragmentations in the backbone of peptide II produced type 2 ions I+12+IIb2 and I+12+IIb3, suggesting that the cross-link site was on α1 GI α2. Since isoleucine was shown not to be involved in the formaldehyde cross-linking reactions [92], the N-terminus G α1 was determined to be the cross-link site. Fragmentations of peptide I generated type 2 ions Iy3+12+II, Ib2+12+II and Ib3+12+II, suggesting that the cross-link was on Y α19. The b and y ion series from peptide I (type 3 ions: Ib2+12, Ib3+12, Iy1, Iy2 and Iy3+12) also localized the cross-link to Y α19. Therefore, a Y α19 to G α1 cross-link formed in insulin, and produced the peptide α18 NYCN α21 ^ α1 GIVE α4 upon digestion.
Considering the peptide (α18 NYCN α21 - β14 ALYLVCGE β21) ^ α1 GIVE α4, in which the cross-link was localized to the α18 NYCN α21 - β19 CGE β21 segment (see Chapter 2.3.8), it is likely also formed by the Y α19 to G α1 cross-link.

Figure 3-1. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide α18 NYCN α21 ^ α1 GIVE α4, with an MS signal of 499.69 (2+).

The cross-link sites on both component peptides of β22 RGFFYTPKA β30 ^ α18 NYCN α21 (Figure 3-2) were determined by both type 2 and type 3 ions. Fragmentations of the backbone of peptide I produced type 2 ions Iy2+12+II, Iy3+12+II and Iy4+12+II, which localized the cross-link to β29 KA β30. The cross-link site was further localized to K β29, as alanine has been shown not to be reactive in formaldehyde-induced reactions [92]. Type 3 ions (Iy2+12, Iy3+12, Iy4+12 and Iy5+12) also assigned the cross-link to K β29. The cross-link site on peptide II was determined to be Y α19, by type 2 ions I+12+IIb2, I+12+IIb3 and I+12+IIy3, and type 3 ions IIb2+12, IIb3+12 and IIy3+12. Therefore, a Y α19 to K β29 cross-link formed in insulin, and produced the peptide β22 RGFFYTPKA β30 ^ α18 NYCN α21 upon digestion. It should be noted that fragmentation occurred at both ends of the cross-link bridge, which together with fragmentation along the peptide backbones generated ion series of both Ib/y and Ib/y+12, as well as both IIb/y and IIb/y+12. A seemingly similar observation will be discussed in Chapter 3.3.2.4.

Figure 3-2. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide β22 RGFFYTPKA β30 ^ α18 NYCN α21, with an MS signal of 556.61 (3+).
3.3.1.3 Determination of the Extra Structure of 12 Da Mass Shift by Reactivity Considerations

The determination of cross-link sites in (α18 NYCN α21 ^ α1 GIVE α4)* is complex, as it is difficult to discriminate among three possibilities for the 12 Da structure represented by the asterisk (*): a Schiff-base modification, an intra-peptide cross-link, or a second cross-link between the component peptides. This difficulty in determining the extra 12 Da structure directly from MS/MS spectra also occurred for peptide (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)*, as discussed in Chapter 2.3.8. Here I combine considerations on the reported reactivity of different residues with the MS/MS spectra in this study to determine the corresponding peptide structures.

The structure of peptide (α18 NYCN α21 ^ α1 GIVE α4)* was partly resolved without taking into account the extra 12 Da structure. The cross-link site on α1 GIVE α4 was localized to G α1 by both type 2 ions I+24+IIb1, I+24+IIb2 and I+24+IIb3, and type 3 ions IIb2+12 and IIb3+12. The extra 12 Da structure was also localized to one component peptide in the following way. Pairs of type 1 ions, I+12 and II+12, and I+24 and II, indicated that the extra 12 Da structure was on peptide I. Type 3 ions Ib2+24 and Ib4+24 also indicated that both the cross-link bridge and the extra 12 Da structure were attached to peptide I. This assignment was also supported by the lack of IIb/y+24 ions, which suggested that the extra 12 Da structure was not attached to peptide II. Even so, the extra 12 Da structure could still be a second cross-link bridge between peptide I and II, with the end attached to peptide II easily breaking in CID, so that the bridge remains attached only to peptide I in fragment ions.
Taken together, this peptide was composed of the N-terminus G α1 on α1 GIVE α4 cross-linked to α18 NYCN α21, with an extra Schiff-base modification or intra-peptide cross-link on α18 NYCN α21, or a second cross-link connecting α1 GIVE α4 and α18 NYCN α21.

At this stage, considering the reactivity of different residues over the course of the 6 hr of formaldehyde cross-linking reactions helps exclude some possible peptide structures. The possibility of a Schiff-base modification on α18 NYCN α21 can be eliminated, as none of Asn (N), Tyr (Y) or Cys (C) (in disulfide bonds) has been shown to form a Schiff-base modification with formaldehyde exposure for less than 6 hr in model peptides and proteins [103-104]. Since none of Asn, Tyr or Cys (in disulfide bonds) can be modified into a Schiff-base structure to initiate the cross-linking step, an intra-peptide cross-link is unlikely to form between Asn and Tyr, Cys and Tyr or Asn and Cys. This leaves a second inter-peptide cross-link as the most reasonable assumption. This possibility is supported by the ability of the N-terminus G α1 to form two cross-links [102]. Furthermore, the possibility of forming two cross-links between the N-terminus and Asn, or two cross-links between the N-terminus and Tyr, can be excluded, because neither Asn nor Tyr has been shown to form two cross-links with one glycine molecule [102]. Therefore, the source peptide is likely the N-terminus G α1 on α1 GIVE α4 cross-linked with N α18 and Y α19, Y α19 and N α21, or N α18 and N α21. The first scenario is supported by the MS/MS spectrum. Type 2 ions, Ib2+24+II and Ib3+24+II, indicated that the two cross-links were on the α18 NY α19 segment. Type 3 ions, Ib2+24 and Ib3+24, also confirmed two cross-link bridges attached to α18 NY α19. Moreover, type 3 ions, Iy1, Iy2 and Iy3+12, indicated that there is one cross-link on Y α19.
In summary, the structure of peptide (α18 NYCN α21 ^ α1 GIVE α4)* can be assumed to be N α18 and Y α19 both cross-linked with G α1.

Figure 3-3. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of cross-linked peptide (α18 NYCN α21 ^ α1 GIVE α4)*, with an MS signal of 505.69 (2+).

For peptide (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)* (Figure 2-9), the extra 12 Da structure could not be assigned to one component peptide by type 1 or type 3 ions, as the two component peptides were the same. Additionally, both component peptides contained R β22, which was shown to be reactive in both the modification and the cross-linking step, Y β26, which was potentially reactive in the cross-linking step, and K β29, which was reactive in the modification step. Together, this suggested many possible cross-link sites, as well as identities and locations of the extra 12 Da structure. If the possibility of isomeric peptides with the same component peptides but different modification and cross-link sites were to be considered, resolving the structures of isomeric peptides would be even more complex. The complexity in the clarification of cross-linked peptide structures that also contain an extra 12 Da structure is discussed in Chapter 3.3.2.5.

3.3.1.4 The Two-Step Reaction between the N-terminus or Lysine and Tyrosine or Asparagine Residues

In this model system, structures of four cross-linked peptides (Figure 3-4) revealed the N-terminus to Tyr (Y), the N-terminus to Asn (N) and Lys (K) to Tyr (Y) cross-links. The progression of both the modification and the cross-linking step on these residues was examined to clarify the order of the two-step reactions on these reactive residues, thereby revealing the chemistry of the formaldehyde cross-linking reactions in proteins.

Figure 3-4.
Structures of cross-linked peptides identified in the formaldehyde treated 6 hr insulin sample (disulfide bonds reduced), with cross-link sites localized to individual residues.

The progression of the cross-linking step was examined first. All four cross-linked peptides identified in the formaldehyde treated 6 hr insulin sample appeared in formaldehyde treated 0.5 hr and 2 hr samples as well, with similar LC retention time and MS/MS spectra. Therefore, the observed N-terminus to Tyr/Asn and Lys to Tyr cross-links were produced in insulin within 0.5 hr of formaldehyde exposure.

The progression of the modification step on each cross-link site was then examined, and correlated with the progression of the cross-linking step. Monitoring the modification on G α1, Y α19 and N α18 is shown here to illustrate the two-step reactions that form the G α1 to Y α19 and G α1 to N α18 cross-links. Since neither α18 NYCN α21 nor α1 GIVE α4 produced good MS/MS spectra, the progression of the modification on the whole α-chain was examined instead. Figure 3-5a shows the MS/MS spectrum of the singly modified insulin α-chain, α1 GIVEQCCASVCSLYQLENYCN α21 +12, after 0.5 hr of formaldehyde treatment. The b ion series from b2+12 to b6+12 indicated that the modification was on α1 GI α2 and not on α3 VEQC α6 (Figure 3-5). The modification was further localized to G α1 because isoleucine was shown not to be involved in formaldehyde-induced reactions [92]. The y ions from y1 to y5 suggested that none of the residues in the α17 ENYCN α21 segment was modified within 0.5 hr of formaldehyde exposure, including Y α19 and N α18. In the MS/MS spectrum of the singly modified α-chain after 6 hr of formaldehyde treatment (Figure 3-5b), the b2+12 to b9+12 and y1 to y6 ions also suggested that the modification was on G α1 and not on Y α19 or N α18. Thus, G α1 was modified within 0.5 hr of formaldehyde exposure, while neither Y α19 nor N α18 was modified after 6 hr of exposure.
These facts, combined with the formation of G α1 to Y α19 and G α1 to N α18 cross-links within 0.5 hr of formaldehyde exposure, indicated that G α1 was modified and then cross-linked to Y α19 or N α18.

Figure 3-5. MS/MS spectra of the singly modified insulin α-chain (α1 GIVEQCCASVCSLYQLENYCN α21 +12) after (a) 0.5 hr, (b) 6 hr of formaldehyde exposure. In both spectra, the Schiff-base modification is localized to G α1, and proves not to be on N α18 or Y α19.

The two-step reaction to form the G α1 to Y α19 cross-link in insulin is shown in Figure 3-6a. It is consistent with the reported residue reactivity: N-termini in model proteins have been shown to be modified by formaldehyde within 20 min [104]; Tyr residues in model peptides have been shown to form cross-links with Schiff-base structures on glycine within 2 days of formaldehyde exposure [102]. Moreover, reaction schemes of the modification and cross-linking steps (Figure 3-6b) could be derived from chemical structures of cross-linked model molecules and amino acids [92,99]. In this reaction, the amino group of the N-terminus went through an addition with formaldehyde and formed a methylol modification, which then dehydrated into a Schiff-base structure. In the cross-linking step, the Schiff-base structure turned into a methylene bridge connected to the aromatic ring of the tyrosine side chain. Upon Glu-C digestion, this cross-link produced peptide α18 NYCN α21 ^ α1 GIVE α4 (Figure 3-4a).

The two-step reaction to form a G α1 to N α18 cross-link in insulin is shown in Figure 3-6c. The G α1 to N α18 cross-link was suggested to form as a second cross-link after the formation of the G α1 to Y α19 cross-link, based on the fact that the G α1 to Y α19 cross-link was identified alone, while the G α1 to N α18 cross-link was only identified together with the G α1 to Y α19 cross-link.
This hypothesis was also supported by a report that, in model peptides, Asn residues were less reactive than Tyr residues in the cross-linking step [102]. After the formation of the G α1 to Y α19 cross-link, G α1 was modified again and cross-linked to N α18 (Figure 3-6c). The order of the two reaction steps is also consistent with the reported reactivity of N-termini in the modification step and Asn residues in the cross-linking step [102,104]. Together with the clarified chemical structures, this allowed detailed reaction schemes to be proposed (Figure 3-6d). After the formation of the G α1 to Y α19 cross-link, another formaldehyde molecule was added to the amino group of the N-terminus to form a methylol modification, which then dehydrated into a Schiff-base structure. The Schiff-base structure then turned into a second methylene bridge connected to the primary amide group of the asparagine side chain. Upon Glu-C digestion, this cross-link produced peptide (α18 NYCN α21 ^ α1 GIVE α4)* (Figure 3-4b).

Figure 3-6. The two-step reactions (a and c) and proposed reaction schemes (b and d) to form the G α1 to Y α19 (a and b) and G α1 to N α18 (c and d) cross-links. Symbol ^ represents the cross-linker.

The formation of the K β29 to Y α19 cross-link (Figure 3-4c) was investigated by following the progression of the modification on K β29 and Y α19. In the MS/MS spectrum of the singly modified peptide β22 RGFFYTPKA β30 +12 in the formaldehyde treated 0.5 hr insulin sample (Figure 3-7), y2+12, y3+12, y4+12, y7+12 and y8+12 ions localized the modification to β29 KA β30. The modification was further assigned to K β29, as alanine was shown not to be reactive in formaldehyde-induced reactions [92]. Thus, K β29 was modified within 0.5 hr of formaldehyde exposure, while Y α19 was not modified after 6 hr of exposure as discussed above (Figure 3-5b). Moreover, the K β29 to Y α19 cross-link formed within 0.5 hr of formaldehyde exposure.
Together this indicated that K β29 was modified and then cross-linked to Y α19 (Figure 3-8a), consistent with the reported reactivity of Lys residues in the modification step and Tyr residues in the cross-linking step [102,104].

Figure 3-7. The MS/MS spectrum of the singly modified β22 RGFFYTPKA β30 (+12 Da) after 0.5 hr of formaldehyde exposure. The yn+12 ion series indicates that K β29 was modified within 0.5 hr.

Detailed schemes of reactions producing the K β29 to Y α19 cross-link were also proposed (Figure 3-8b): the ε-amino group of the lysine side chain reacted with formaldehyde and formed a methylol modification, which then dehydrated into a Schiff-base structure; the Schiff-base structure then turned into a methylene bridge connecting to the aromatic ring of the tyrosine side chain. Upon Glu-C digestion, this cross-link produced peptide β22 RGFFYTPKA β30 ^ α18 NYCN α21 (Figure 3-4c).

Figure 3-8. The (a) two-step reactions and (b) proposed reaction schemes to form the K β29 to Y α19 cross-link. Symbol ^ represents the cross-linker.

The formation of the G α1 to Y β26 cross-link (Figure 3-4d) was investigated by the progression of the modification on G α1 and Y β26. Since G α1 was proven to be modified within 0.5 hr of formaldehyde exposure (Figure 3-5a), the modification on Y β26 was examined here. In the MS/MS spectrum of the singly modified β22 RGFFYTPKA β30 (+12 Da) after 0.5 hr of formaldehyde treatment (Figure 3-7), y2+12 to y8+12 ions localized the modification to β29 KA β30, while b2+12 to b8+12 ions localized the modification to β22 RG β23. Therefore, multiple residues in peptide β22 RGFFYTPKA β30 were modified within 0.5 hr of formaldehyde exposure. Individual modification sites could be localized by the average number of modifications, the degree of modification (DOM), of b and y ions from a more heavily modified β22 RGFFYTPKA β30.
The modification sites on the doubly modified β22 RGFFYTPKA β30 after 6 hr of formaldehyde exposure were examined (Figure 3-9). The DOM was calculated for each detectable b and y ion in the MS/MS spectrum (Figure 3-9a) and plotted as bar graphs along the peptide sequence (Figure 3-9c), with the calculation of DOMb6 shown in Figure 3-9b as an example. In Figure 3-9c, each significant difference (about 1) in DOM values between adjacent b/y ions indicated a modification. The DOM values of b ions suggested that one modification was on β22 RG β23, and the other was on K β29. DOM values corresponding to y ions localized one modification to β29 KA β30 and the other to R β22. Therefore, Y β26 was not modified even after 6 hr of formaldehyde treatment.

Figure 3-9. (a) The numbering of b and y ions along peptide β22 RGFFYTPKA β30. (b) The MS/MS spectrum of the modified β22 RGFFYTPKA β30 (+24 Da) after 6 hr of formaldehyde exposure. (c) A sample calculation of the DOM value of the b6 ion. PA is the abbreviation of peak area. (c) The bar graphs of DOM values of the b and y ions against the peptide sequence. Series of b and y ions together suggest that the two Schiff-base modifications are on R β22 and K β29 but not on Y β26.

The modification occurring on G α1 within 0.5 hr and not on Y β26 after 6 hr of formaldehyde exposure indicated that G α1 was modified first and then cross-linked to Y β26 (Figure 3-10a). This order of the N-terminus reacting before Tyr (Y) is the same as that of the G α1 to Y α19 cross-link (Figure 3-6a), and is also consistent with the reactivity of N-termini and Tyr residues in the modification and cross-linking step. Therefore, the proposed reaction schemes (Figure 3-10b) are the same as in Figure 3-6b.

Figure 3-10. The (a) two-step reactions and (b) proposed reaction schemes to form the G α1 to Y β26 cross-link. Symbol ^ represents the cross-linker.
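The reasoning used in this section to localize a single +12 Da modification from b and y ion series can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, exactly one modified residue per molecule is assumed, and the inputs are simply which fragment ions were observed with or without the mass shift.

```python
def consistent_sites(seq, b_shifted=None, y_shifted=None):
    """Return 1-based residue positions compatible with the observed ions.

    b_shifted / y_shifted map an ion index to True (observed with the +12
    shift) or False (observed without it). Assumes a single modified residue.
    """
    b_shifted = b_shifted or {}
    y_shifted = y_shifted or {}
    n = len(seq)
    sites = []
    for site in range(1, n + 1):
        # b_i spans residues 1..i, so it carries the shift when site <= i.
        ok_b = all((site <= i) == seen for i, seen in b_shifted.items())
        # y_j spans residues n-j+1..n, so it carries the shift when site > n-j.
        ok_y = all((site > n - j) == seen for j, seen in y_shifted.items())
        if ok_b and ok_y:
            sites.append(site)
    return sites


# y2+12, y3+12, y4+12, y7+12 and y8+12 of RGFFYTPKA (Figure 3-7)
beta_sites = consistent_sites(
    "RGFFYTPKA", y_shifted={2: True, 3: True, 4: True, 7: True, 8: True}
)
# b2+12..b6+12 and unshifted y1..y5 of the insulin alpha-chain (Figure 3-5a)
alpha_sites = consistent_sites(
    "GIVEQCCASVCSLYQLENYCN",
    b_shifted={i: True for i in range(2, 7)},
    y_shifted={j: False for j in range(1, 6)},
)
```

With these inputs the sketch returns positions 8 and 9 (the KA segment) for the β-chain peptide and positions 1 and 2 (the GI segment) for the α-chain, matching the segments named in the text. The final assignment to K β29 and G α1 still relies on the reported lack of reactivity of alanine and isoleucine.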
Taking all four cross-linked peptides (Figure 3-4) into account, the progression of both the modification and the cross-linking step demonstrated that the N-terminus/Lys to Tyr/Asn cross-links were formed by modification of the N-terminus or Lys first, followed by cross-linking to Tyr or Asn. This conclusion supports a previously published hypothesis that the formaldehyde cross-linking of proteins includes a Mannich type reaction: a primary amino group is modified and then forms a methylene bridge with the side chain of Asn or Tyr. This hypothesis has been demonstrated with amino acids and small molecules [98]. Here, however, it has been shown in proteins for the first time.

3.3.1.5 Physiological Relevance of the Identified Cross-links in Insulin

The characterization of formaldehyde cross-linking reactions in the insulin model system should indicate characteristics of these reactions in living cells. In this part, I discuss the relevance of the G α1 to Y α19, G α1 to N α18, K β29 to Y α19 and G α1 to Y β26 cross-links observed in the insulin model system to formaldehyde cross-linking under physiological conditions.

The four observed cross-links are formed within 30 min of exposure to 1% formaldehyde, at 37 ℃ and pH 7.5. These conditions closely resemble those applied when attempting formaldehyde cross-linking in living cells and organisms [54-61]. Therefore, the N-terminus to Y/N and K to Y cross-links very likely form during the formaldehyde cross-linking of native proteins, as long as these residues in interacting proteins are in close proximity. Additionally, the formation of two cross-links on the N-terminus G α1 under the same reaction conditions suggests that two cross-links can form on the N-terminus of a native protein during formaldehyde cross-linking in living cells and organisms.
A further question is whether the G α1 to Y α19, G α1 to N α18, K β29 to Y α19 and G α1 to Y β26 cross-links in insulin reflect any physiologically relevant interactions. Although the model system does not contain interaction partners of insulin such as the insulin receptor family [111], insulin is well known to interact with itself and form non-covalent dimers and hexamers in aqueous solution. The interaction interfaces in dimeric and hexameric insulin are the basis for designing insulin mutants for the treatment of diabetes [112-113]. Therefore, the possibility of formaldehyde cross-linking capturing the physiologically relevant dimer- or hexamer-forming interactions is examined. One common way to verify the formation of cross-links between interacting proteins is to compare the length of the cross-link bridge to the distances between cross-link sites, the distance constraints. To determine whether formaldehyde-induced cross-links formed between monomeric subunits of the dimeric or the hexameric insulin, distances between cross-link sites located in two monomers of a dimeric or hexameric insulin molecule were measured in their 3D crystal structures (PDB No.: 2A3G and 3AIY), as shown in Table 3-2. All of the distances were much longer than the length of the methylene bridge (2.5 Å), indicating that the observed G α1 to Y α19, G α1 to N α18, K β29 to Y α19 and G α1 to Y β26 cross-links in aqueous insulin were not likely formed within dimeric or hexameric insulin. However, considering the flexibility of peptide backbones and side chains in aqueous solution, the possibility of these residues coming into close proximity (< 2.5 Å) and becoming cross-linked cannot be excluded. Therefore, although exceptions might occur, the cross-links identified in this model protein system are unlikely to provide information about physiologically relevant interactions.

Table 3-2.
Distance constraints between cross-link sites in dimeric and hexameric insulin (PDB# 2A3G and 3AIY).

The Cross-link      Distance Constraint/Å
                    Dimer     Hexamer
G α1 to Y α19       19.82     20.58
G α1 to N α18       23.42     23.58
K β29 to Y α19      18.18     16.13
G α1 to Y β26       15.73     17.22

3.3.2 Myoglobin as the Larger Model Protein

3.3.2.1 Cross-linking of Myoglobin

In a separate series of experiments, the method for the identification and characterization of cross-linked peptides was applied to a larger model protein, myoglobin. Just as in the insulin model system, the experimental workflow started with cross-linking of the model protein, ensuring a high yield of cross-links to allow identification of cross-linked peptides. Myoglobin was incubated with formaldehyde or without formaldehyde (control), for 0, 0.5, 2 and 6 hr, and then quenched with Tris buffer. The formation of cross-links was confirmed and the yield was determined, this time by acquiring MS spectra (Figure 3-11).

Figure 3-11. MS spectra of 100 µM myoglobin incubated with or without formaldehyde (control), for 0, 0.5, 2 and 6 hr.

In the four control samples, the signal at around 17 kDa was from unmodified myoglobin. The signal at around 34 kDa was from non-specific non-covalent dimers of myoglobin, which were shown to form during the storage of myoglobin at -20 ℃ as lyophilized powder and to be a common observation in MALDI-MS analysis of proteins [114-115]. Neither the mass nor the peak intensity of the two peaks changed significantly as incubation time increased, indicating that myoglobin stayed unchanged for 6 hr without formaldehyde treatment. For the four formaldehyde treated samples, there was a significant increase in the signal intensity of dimeric myoglobin as the reaction between myoglobin and formaldehyde proceeded from 0 to 6 hr. In addition, trimeric myoglobin (around 51 kDa) appeared at 2 hr, and its signal intensity increased significantly at 6 hr of formaldehyde incubation.
The m/z values of monomeric, dimeric and trimeric myoglobin also increased with the duration of formaldehyde treatment. These were signs of myoglobin being modified and cross-linked by formaldehyde into dimers and trimers, and signs of an increase in the extent of modification and the yield of cross-links as the reaction proceeded. The formaldehyde treated 6 hr sample, which had the highest cross-linking yield, was later selected for the identification of cross-linked peptides.

The MS spectrum proved to be an alternative way to visualize the formation and yield of cross-linking in model proteins. Compared to SDS-PAGE separation, it showed exact changes in protein mass that demonstrate the extent of modification, but it could not provide absolute quantification of the yield of cross-links. However, neither the exact quantification of the extent of modification nor that of the yield of cross-linking is necessary at this stage. Either method can be used to track the formation and yield of cross-links.

3.3.2.2 Identification of Three Cross-linked Myoglobin Peptides

The steps of the experimental workflow and data processing in the myoglobin model system were similar to those applied to the insulin model system. All samples were digested and analyzed by LC-MS/MS, and the formaldehyde treated 6 hr myoglobin sample was selected for the identification of cross-linked peptides. The 3D plot of its LC-MS data (Figure 3-12) showed a much more complex pattern than that of insulin (Figure 2-3a), as it contained many more signals with many overlapping m/z values. Only 55 signals were assigned to unmodified and modified myoglobin peptides based on mass, MS/MS spectra and LC retention time, as listed in Appendix A.2.

Figure 3-12. The 3D plot (LC retention time, m/z, signal intensity) of LC-MS/MS data from the digest of the formaldehyde treated 6 hr myoglobin sample.
The remaining unknown signals were compared to a theoretical list of possible cross-linked peptides to generate a list of candidates of cross-linked peptides. Since the list of myoglobin peptides was longer and the pool of unknown signals was much larger than those of the insulin system, a small program was developed using MATLAB (Appendix A.3) to generate the theoretical mass list of cross-linked peptides and compare it to masses of experimental LC-MS signals. Eighty-one masses were found in common between the theoretical and experimental lists. After eliminating background signals that also appeared in control samples, 27 candidates of cross-linked peptides remained. These candidates were verified by matching their MS/MS signals to putative fragment ions derived from their proposed structural components. Figures 3-13, 3-14 and 3-15 show the fragmentation patterns and MS/MS spectra of three candidates that were identified to be cross-linked.

Each of the MS/MS spectra of these three candidates proved to contain type 1 and type 3 ions only, the same situation as in the MS/MS spectrum of the cross-linked insulin peptide (β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30)* (Figure 2-9). For candidate 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154 (Figure 3-13), type 1 ions I+12 and II (signals at 740.41 (2+) and 521.27 (1+)) indicated that the precursor ion contained two parts of m=1478.8 Da and m=520.3 Da. These masses equaled the mass of 138 LFRNDIAAKYKE 149 with a cross-link bridge attached and that of 150 LGFQG 154, respectively. Nearly complete Ib and Iy ion series verified that the part of m=520.3 Da had the sequence LGFQG. Similarly, IIb, IIb+12, IIy and IIy+12 ion series verified that the part of m=1478.8 Da had the sequence LFRNDIAAKYKE and a 12 Da structure attached. Type 1 and type 3 ions together confirmed that the candidate was 138 LFRNDIAAKYKE 149 cross-linked with 150 LGFQG 154.
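The candidate search described above (building a theoretical mass list of peptide pairs and comparing it with experimental signals) can be sketched in Python. This is not the MATLAB program of Appendix A.3; it is a minimal illustration under stated assumptions: only unmodified peptides are paired, one methylene bridge adds 12.000 Da (formaldehyde added, water lost), standard monoisotopic residue masses are used without any cysteine alkylation, and the 0.1 Da tolerance and all function names are hypothetical choices.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses (Da); no alkylation is applied to Cys.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
BRIDGE = 12.0  # net mass of one methylene cross-link (assumption stated above)


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def neutral_mass(mz, z):
    """Neutral mass of a signal observed at m/z with charge z."""
    return z * (mz - PROTON)


def crosslink_candidates(peptides, signals, tol=0.1):
    """Pair peptides, add one bridge, and match against (m/z, z) signals."""
    hits = []
    for p1, p2 in combinations_with_replacement(peptides, 2):
        theoretical = peptide_mass(p1) + peptide_mass(p2) + BRIDGE
        for mz, z in signals:
            if abs(neutral_mass(mz, z) - theoretical) <= tol:
                hits.append((p1, p2, mz, z))
    return hits


# The 667.37 (3+) signal from Figure 3-13 and its two proposed components.
hits = crosslink_candidates(["LFRNDIAAKYKE", "LGFQG"], [(667.37, 3)])
```

Under these assumptions the only pairing that matches the 667.37 (3+) signal is LFRNDIAAKYKE with LGFQG. A fuller version would also include modified peptides (extra +12 or +30 Da) in the peptide list, as the theoretical list in this study did.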
For the MS/MS spectra of candidates ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* (Figure 3-14b) and ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )* (Figure 3-15b), the proposed structural components were confirmed by type 1 ions demonstrating the mass of the two parts and type 3 ions verifying the amino acid sequence of both parts. The * symbol in these two peptides represents an extra 12 Da structure, a Schiff-base modification or a cross-link bridge.

Figure 3-13. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154, with an MS signal of 667.37 3+.

Figure 3-14. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )*, with an MS signal of 767.97 4+. The * represents an extra Schiff-base modification or cross-link.

Figure 3-15. The (a) fragmentation patterns and types of fragment ions and (b) MS/MS spectrum of the cross-linked peptide ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )*, with an MS signal of 817.97 4+. The * represents an extra Schiff-base modification or cross-link.

The method for identifying candidates of cross-linked peptides and verifying their proposed structural components proved to work in this larger model protein, although the sample complexity required computational power for data processing. The fragmentation patterns observed in cross-linked myoglobin peptides contained fragmentations at both the cross-link bridge and peptide backbones, generating two types of fragment ions: ions of whole component peptides (type 1) and b and y ions of component peptides (type 3). Although type 2 ions (one component peptide cross-linked to part of the other) were not observed, the analysis of type 1 and type 3 ions together provided enough information to verify all the proposed structural components.
It should be noted that in Chapter 2.3.6, candidates with m/z of 757.88 2+ and 763.88 2+ were not verified to be cross-linked peptides although all three types of ions appeared in their MS/MS spectra. Therefore, it is the completeness of the information from fragment ions, rather than the appearance of all types of fragment ions, that verifies a candidate to be cross-linked.

The remaining 24 candidates that were not confirmed as cross-linked peptides, however, highlighted other issues that accompany increasing sample complexity. Twelve candidates did not generate MS/MS spectra of sufficient quality to confirm the proposed structural components, most likely because they were of low abundance and co-eluted with many abundant peptides. The remaining twelve, according to their MS/MS spectra, did not contain the proposed structural components. This reflected overlaps of peptide masses in complex peptide mixtures. Five of these twelve candidates were confirmed by analysis of their MS/MS spectra to be modified myoglobin peptides, rather than the proposed component peptides. The remaining seven candidates were not assigned to modified myoglobin peptides. A possible origin of these seven candidates was myoglobin peptides with Tris-formaldehyde adducts, formed when the primary amino group of Tris was cross-linked to myoglobin via formaldehyde during the quenching of the formaldehyde reaction by concentrated (1 M) Tris buffer. They could also be solvent cluster ions of modified, cross-linked or Tris-formaldehyde-adduct myoglobin peptides, which were not eliminated by background subtraction based on control samples.

In the future, developing an enrichment method may help to relieve some of these issues. Enrichment of low abundance cross-linked peptides would improve the quality of their MS/MS spectra to allow the confirmation of proposed structural components.
This is especially important for four candidates, which are likely cross-linked peptides because the major signals in their poor quality MS/MS spectra matched some fragment ions from the proposed structural components. In addition, if part of the unmodified and modified peptides could be depleted as the cross-linked peptides are enriched before LC-MS/MS analysis, overlaps of peptide masses and false positive identifications of candidates of cross-linked peptides could be reduced.

3.3.2.3 Localization of Cross-link Sites to One Individual Residue

Cross-link sites were localized by type 2 and type 3 ions in the insulin model system. Here, this method was applied to the three cross-linked myoglobin peptides. Since their fragmentation patterns only contain type 1 and type 3 ions, cross-link sites were localized based on type 3 ions alone, that is, b and y ions of component peptides with or without the cross-link bridge attached.

In the MS/MS spectrum of ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* (Figure 3-14b), the b ion series Ib2, Ib3, Ib4+12, Ib5+12, Ib7+12, Ib8+12, Ib9+12, Ib11+12, Ib12+12 and Ib13+12 localized the cross-link bridge to R 32. The y ion series Iy1 to Iy3, Iy5 and Iy7 to Iy10 indicated that the cross-link was not on the 33 LFTGHPETLE 42 segment but on the 29 VLIR 32 segment, supporting the localization of the cross-link to R 32. Therefore, peptide II 43 KFDKFKHLKTE 53 was cross-linked to R 32 on peptide I 29 VLIRLFTGHPETLE 42. In the MS/MS spectrum of ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )* (Figure 3-15b), the cross-link was also localized to R 32 by the b and y ions of 29 VLIRLFTGHPETLE 42.

The localization of the cross-link to R 32 is consistent with reported residue reactivity. Arginines have been shown to be potentially reactive in the cross-linking step of formaldehyde induced reactions within 2 days of formaldehyde exposure 102.
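The b ion argument used above to place the bridge on R 32 can be written as a short Python sketch. The function name and interface are hypothetical, and the logic assumes a single cross-link site on the peptide (it does not handle the isomeric mixtures discussed in 3.3.2.4): the bridge must lie after the longest b ion seen without the +12 Da shift and no later than the shortest b ion seen with it.

```python
def localize_from_b_ions(sequence, first_residue, unshifted, shifted):
    """Return (residue, position) pairs that can carry the +12 Da bridge.

    sequence:      peptide sequence, N- to C-terminus
    first_residue: protein numbering of the first residue of the peptide
    unshifted:     indices n of b ions observed without the bridge (b_n)
    shifted:       indices n of b ions observed with the bridge (b_n+12)
    Assumes one cross-link site per peptide.
    """
    lower = max(unshifted, default=0)            # residues 1..lower are bridge-free
    upper = min(shifted, default=len(sequence))  # bridge lies within residues 1..upper
    return [(sequence[i - 1], first_residue + i - 1)
            for i in range(lower + 1, upper + 1)]


# b ion series reported for peptide I in Figure 3-14b
sites = localize_from_b_ions(
    "VLIRLFTGHPETLE", 29,
    unshifted=[2, 3],
    shifted=[4, 5, 7, 8, 9, 11, 12, 13],
)
```

For the reported series the window collapses to one residue, the arginine at position 32. When b ions are missing around the site, the function returns a segment rather than a single residue, which mirrors how the text reports segments such as 43 KFDKFKHLK 51.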
Arginines are not considered to be very reactive during the modification step, but this R 32 in myoglobin has been reported to be modified by formaldehyde under the same reaction conditions as in this study 104.

The two cross-linked peptides ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* and ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )* not only had the same cross-link site (R 32), but also had the same LC retention time and were similar in structural components except for the extra segment 54 AE 55 in the latter. Therefore, the cross-link sites and the identity and location of the extra 12 Da structure could be considered the same in these two peptides.

The cross-link site on peptide 43 KFDKFKHLKTE 53 or 43 KFDKFKHLKTEAE 55 was not localized because ions that contained part of the peptide backbone and the cross-link bridge were not observed. However, possible cross-link sites could be proposed by considering the reactivity of residues in both component peptides. On the one hand, one of the four Lys (K) residues was very likely modified and then cross-linked to R 32. On the other hand, it is also possible that R 32 was modified and cross-linked to H 49. Other residues in the component peptides were excluded as they were shown not to be reactive in the formaldehyde cross-linking reactions of model molecules 92-104.

3.3.2.4 Localization of Cross-link Sites to Several Residues

In the MS/MS spectrum of peptide 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154 (Figure 3-13b), many b and y ions of peptide I 138 LFRNDIAAKYKE 149 (type 3 ions) had both Ib/y and Ib/y+12 forms. This seemed similar to the cross-linked insulin peptide β22 RGFFYTPKA β30 ^ α18 NYCN α21 discussed in 3.3.1.2, which had an MS/MS spectrum (Figure 3-2b) that contained signals from both Ib/y and Ib/y+12 ions, as well as both IIb/y and IIb/y+12 ions.
The cause of these observations concerning peptide β22 RGFFYTPKA β30 ^ α18 NYCN α21 was fragmentation at both ends of the cross-link bridge. This did not seem to apply to peptide 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154, because Ib/y ions were not accompanied by series of IIb/y+12 ions. An alternative explanation is that the cross-link on 138 LFRNDIAAKYKE 149 was not at one specific residue but at two or more residues. In other words, the MS/MS spectrum was from a mixture of isomeric cross-linked peptides that had the same component peptides but different cross-link sites.

Localizing multiple cross-link sites in isomeric cross-linked peptides was not straightforward. Therefore, the bar graph visualization used to determine multiple modification sites on a singly modified peptide was adapted, by considering the cross-link bridge as a “modification” on peptide I 138 LFRNDIAAKYKE 149. DOM values of Ib/y ions were calculated in the same way as in the sample calculation shown in Figure 3-16b, and plotted as bar graphs against the peptide sequence (Figure 3-16c). The DOM values of b ions showed significant differences between b8 and b9, b9 and b10, and b10 and b12, suggesting K 146, Y 147 and 148 KE 149 as the sites of “modification”, that is, of the cross-link. The plateau of DOM values around 25% at the b5 to b8 ions ( 142 DIAA 145 ) indicated that the 138 LFRN 141 segment was also a cross-link site while 142 DIAA 145 was not. Therefore, the b ion series indicated that 150 LGFQG 154 was cross-linked to the 138 LFRN 141 segment, K 146, Y 147 or 148 KE 149. The significant differences of DOM values along the y ion series, however, localized the cross-link bridge to K 148, Y 147, K 146 or the 138 LFRNDIAA 145 segment. The DOM bar graphs of b and y ions together localized the cross-link to 138 LFRN 141, K 146, Y 147 or K 148.

Figure 3-16. (a) The numbering of b and y ions along the myoglobin peptide 138 LFRNDIAAKYKE 149.
(b) A sample calculation of the DOM values of detectable b and y ions. PA is the abbreviation of peak area. (c) Bar graphs of DOM values of b and y ions from peptide 138 LFRNDIAAKYKE 149 with a cross-link bridge attached against the peptide sequence, derived from the MS/MS spectrum of peptide 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154 (Figure 3-13b).

The assigned cross-link sites are consistent with reported residue reactivity in formaldehyde cross-linking. Lys (K) and Tyr (Y) residues have been shown to be reactive in the modification step and the cross-linking step, as discussed in Chapter 3.3.1.4. The 138 LFRN 141 segment contains R 140 and N 141, which are reactive in the modification step and/or potentially reactive in the cross-linking step 102,104. Considering the reactivity of residues in peptide II 150 LGFQG 154, Q 153, which is potentially reactive in the cross-linking step, is likely the residue that is cross-linked to R 140, K 146 or K 148, which are reactive in the modification step.

It is noteworthy that the DOM values of the b2 to b8 ions broke the usual pattern of increasing steps and plateaus in DOM bar graphs. High values of 60% appeared at the b2 and b3 ions, which could be explained in two ways. First, CID fragmentation is known to generate various internal ions that can overlap with b and y ions. A mixture of large and highly charged isomeric cross-linked peptides generates an especially complex MS/MS spectrum, with a high probability of overlapping m/z values between fragment ions. High DOM values of the b2 and b3 ions could be caused by overlaps between b2/3+12 and other fragment ions. Second, the extent of CID fragmentation at selective peptide bonds has been shown to be affected by basic residues, especially arginine 27,116-117. The cross-link bridge attached to R 140 could change its basicity, and therefore alter the extent of fragmentation at some peptide bonds, causing high DOM values of the b2 and b3 ions.
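The reading of the DOM bar graph for b ions can be sketched in Python as follows. The formula used here, DOM = PA(ion+12) / (PA(ion) + PA(ion+12)), is an assumption based on the caption of Figure 3-16 (the sample calculation itself is not reproduced in this text), and the peak areas and the 10-point step threshold are made-up illustrative values, not data from the thesis.

```python
def dom(pa_unshifted, pa_shifted):
    """Percentage of an ion's signal that carries the +12 Da bridge (assumed definition)."""
    return 100.0 * pa_shifted / (pa_unshifted + pa_shifted)


def dom_steps(peak_areas, min_step=10.0):
    """Find rises in DOM between consecutive detectable b ions.

    peak_areas: dict of b ion index n -> (PA of b_n, PA of b_n+12)
    min_step:   smallest rise, in percentage points, treated as significant (assumed)
    Returns the DOM values and a list of (previous ion, current ion, rise).
    """
    ions = sorted(peak_areas)
    values = {n: dom(*peak_areas[n]) for n in ions}
    steps = []
    for prev, cur in zip(ions, ions[1:]):
        rise = values[cur] - values[prev]
        if rise >= min_step:
            steps.append((prev, cur, round(rise, 1)))
    return values, steps


# Hypothetical peak areas chosen to mimic a plateau near 25% followed by three steps.
example = {5: (75, 25), 8: (75, 25), 9: (50, 50), 10: (25, 75), 12: (0, 100)}
values, steps = dom_steps(example)
```

Each step between b(m) and b(n) points to a site among residues m+1 to n of the peptide, so steps at b8 to b9, b9 to b10 and b10 to b12 correspond to the ninth, tenth, and eleventh or twelfth residues. A non-zero plateau before the first step indicates an additional site nearer the N-terminus.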
Unfortunately, the effect of basic residues on fragmentation patterns is not well studied in non-tryptic peptides. A few examples in non-tryptic peptides have examined possible changes in the extent of fragmentation due to formaldehyde-induced modifications on basic residues 103. Modifications on basic residues do not seem to alter the extent of fragmentation at peptide bonds. However, effects caused by cross-links on basic residues have not been studied and cannot be ruled out. In order to fully understand the complex fragmentation patterns and facilitate the determination of cross-link sites in isomeric cross-linked peptides, more isomeric cross-linked peptides need to be identified and investigated in the future.

3.3.2.5 Complexity of Extra Modifications/Cross-links

Cross-linked peptides ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* and ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )* both contain an extra 12 Da structure, which can be a Schiff-base modification, an intra-peptide cross-link or a second cross-link between component peptides. Their structures were partially resolved before considering the identity and location of the extra 12 Da structure. In the MS/MS spectrum of peptide ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* (Figure 3-14b), type 1 ions I and II+24, and I+12 and II+12, indicated that the extra 12 Da structure was attached to peptide II 43 KFDKFKHLKTE 53. The b and y ions IIb9+12, IIb10+12, IIy1 and IIy2 localized the extra 12 Da structure to the 43 KFDKFKHLK 51 segment. These, combined with the cross-link site R 32, indicated that R 32 on 29 VLIRLFTGHPETLE 42 is cross-linked to 43 KFDKFKHLKTE 53 with an extra 12 Da structure attached to the 43 KFDKFKHLK 51 segment, as shown in Figure 3-14a.
In the same way, peptide ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )* was determined to be R 32 on 29 VLIRLFTGHPETLE 42 cross-linked to 43 KFDKFKHLKTEAE 55 with an extra 12 Da structure attached to the 43 KFDKFKHL 50 segment, as shown in Figure 3-15a. The similarity between these two partially resolved structures further confirmed the assumption in Chapter 3.3.2.3 that these two cross-linked peptides had the same cross-link sites and the same identity and location of the extra 12 Da structure.

The partially resolved structures and reported residue reactivity can be used to suggest possible cross-link sites and the identity and location of the extra 12 Da structure. As discussed in Chapter 3.3.2.3, R 32 is likely cross-linked to K 43/46/48/51 or H 49. If the cross-link is R 32 to H 49, the extra 12 Da structure on the 43 KFDKFKHL 50 segment could be either a Schiff-base modification on K 43/46/48, or a second cross-link bridge connecting H 37 and K 43/46/48. If the cross-link is R 32 to K 43/46/48/51, the extra 12 Da structure could be a Schiff-base modification on K 43/46/48, an intra-peptide cross-link between K 43/46/48 and H 49, or a second cross-link bridge connecting H 37 and K 43/46/48 or R 32 and K 43/46/48. In summary, a significant number of possibilities remain, and all of them are supported by the MS/MS spectra available at the moment.

As shown both above and in Chapter 3.3.1.3, an extra 12 Da structure adds much complexity when attempting to clarify the structure of a cross-linked peptide. Combining residue reactivity with MS/MS spectra helped to resolve the structure of ( α18 NYCN α21 ^ α1 GIVE α4 )*, in which the component peptides were short and each contained residues reactive in either the modification or the cross-linking step.
However, this method could not determine the exact structure of peptides ( β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30 )*, ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* and ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )*, in which the component peptides were long and each contained several residues that were reactive in both steps.

These difficulties in resolving the extra 12 Da structure could become a challenge in other model proteins and in living cells and organisms. This is because average-length Glu-C digest peptides (15 residues) 118 are a bit longer than those of the three unresolved cross-linked peptides, and they likely contain many reactive residues. Table 3-3 illustrates the abundance (numbers) of formaldehyde-reactive residues in an average-length Glu-C peptide, that is, their abundance in proteins 119 multiplied by the peptide length. In a 15-residue peptide, the total number of modifiable residues is 1.66, while the total number of residues that are reactive in the cross-linking step is 3.12. The number of reactive residues in the modification step and the cross-linking step, and the fact that some residues can form two cross-links or one cross-link plus one modification 92,99,102, together indicate a high probability of forming an extra modification or cross-link in a cross-linked peptide.

Table 3-3. The abundance of formaldehyde reactive residues in proteins and average-length Glu-C peptides (15 residues).

Amino Acid    Reactive Step                   Abundance 1    Abundance in a 15-Residue Peptide
K             Modification                    0.059          0.89
R             Modification & Cross-linking    0.051          0.77
Y             Cross-linking                   0.032          0.48
N             Cross-linking                   0.043          0.65
Q             Cross-linking                   0.043          0.65
H             Cross-linking                   0.023          0.35
W             Cross-linking                   0.014          0.22

Further investigation into more cases of cross-linked peptides with an extra 12 Da structure is necessary in order to find a more effective way of resolving their structures. Many more formaldehyde cross-linked peptides with the extra 12 Da structure need to be identified for a comprehensive understanding.
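The arithmetic behind Table 3-3 is simple enough to check with a few lines of Python. This is a sketch under stated assumptions: the assignment of residues to the modification and cross-linking groups is inferred from the totals quoted in the text (1.66 and 3.12), and the variable and function names are hypothetical.

```python
# Fractional abundance of each reactive residue in proteins, as listed in Table 3-3.
ABUNDANCE = {"K": 0.059, "R": 0.051, "Y": 0.032, "N": 0.043,
             "Q": 0.043, "H": 0.023, "W": 0.014}

# Grouping inferred from the totals in the text, not stated residue by residue.
MODIFIABLE = ("K", "R")
CROSS_LINKABLE = ("R", "Y", "N", "Q", "H", "W")


def expected_counts(length=15):
    """Expected number of each reactive residue in a peptide of the given length."""
    return {aa: fraction * length for aa, fraction in ABUNDANCE.items()}


counts = expected_counts()
modifiable_total = sum(counts[aa] for aa in MODIFIABLE)
cross_linkable_total = sum(counts[aa] for aa in CROSS_LINKABLE)
```

Computed from the rounded abundances, the totals come out a few hundredths below the values printed in the table, which presumably reflects rounding of the individual entries before they were summed.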
This could be achieved by identification of cross-linked peptides in more model proteins. Additionally, enrichment of cross-linked peptides from current model proteins may also allow the identification of more cross-linked peptides that currently escape detection due to low abundance, or that were identified as candidates but not confirmed due to poor quality MS/MS spectra.

1 Creighton, T. E. Proteins: Structures and Molecular Properties; second ed.; W. H. Freeman, 1992

3.3.2.6 The Physiological Relevance and Reaction Chemistry of Identified Cross-links in Myoglobin

The three cross-linked peptides identified in the formaldehyde treated 6 hr myoglobin sample appeared in the formaldehyde treated 0.5 hr and 2 hr samples as well, with similar LC retention times and MS/MS spectra. Therefore, the observed cross-links on Arg (R), Lys (K) and Tyr (Y) residues in myoglobin were produced within 0.5 hr of exposure to formaldehyde. More specifically, these cross-links were formed within 30 min of exposure to 1% formaldehyde, at 37 °C and pH 7.5, conditions that closely resemble those applied to living cells and organisms 54-61. Therefore, cross-links would very likely form on Arg, Lys or Tyr during the formaldehyde cross-linking of native proteins. Compared to the studies in the insulin model system, the myoglobin model system revealed one more formaldehyde-reactive residue, Arg, under near-physiological conditions.

The reaction chemistry was determined by the progression of both the modification and the cross-linking step on cross-link sites in Chapter 3.3.1.4. Here, the same analysis is applied to the myoglobin model system. The cross-linking step in which the observed cross-links in myoglobin were formed was known to proceed within 0.5 hr of formaldehyde exposure.
However, only the cross-link site on one end of the cross-link bridge was determined for each cross-linked peptide due to the complexity of the MS/MS spectra caused by multiple cross-link sites (see Chapter 3.3.2.4) or the extra 12 Da structure (see Chapter 3.3.2.5). The lack of information on the progression of the modification step on unassigned cross-link sites obstructed clarification of the two-step reaction that produced the cross-links observed in myoglobin. In order to study the reaction chemistry in myoglobin or other proteins, challenges in the determination of cross-link and modification sites in complex cross-linked peptides need to be overcome in the future.

3.4 Conclusions and Outlook

In this chapter the method to identify cross-linked peptides and localize cross-link sites was refined in two model protein systems: insulin (disulfide bonds reduced during digestion) and myoglobin. The whole approach allowed the identification of 5 cross-linked insulin peptides and 3 cross-linked myoglobin peptides, and the partial localization of the cross-link sites (underlined residues): β22 RGFFYTPKA β30 ^ α1 GIVE α4, ( β22 RGFFYTPKA β30 ^ β22 RGFFYTPKA β30 )*, β22 RGFFYTPKA β30 ^ α18 NYCN α21, α18 NYCN α21 ^ α1 GIVE α4, ( α18 NYCN α21 ^ α1 GIVE α4 )*, 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154, ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTE 53 )* and ( 29 VLIRLFTGHPETLE 42 ^ 43 KFDKFKHLKTEAE 55 )*. Therefore the method has proven to be readily applicable to other model proteins.

The fragmentation patterns of cross-linked peptides observed in Chapter 2 were further confirmed in these model proteins. Fragmentations occurred at both the cross-link bridge and in the peptide backbones, generating two or three types of fragment ions.
The improved understanding of the fragmentation patterns could be applied to bioinformatics software that automates the data interpretation of cross-linking experiments, and to the manual interpretation of MS/MS spectra from experiments in living cells and organisms.

The localization of cross-link sites by type 2 and type 3 ions revealed the N-terminus to Tyr/Asn and Lys to Tyr cross-links, cross-links on Lys, Tyr and Arg, and two cross-links forming on one single N-terminus in proteins. This valuable and direct information about residue reactivity might also be used in bioinformatics software. Furthermore, monitoring the progression of both the modification and the cross-linking step on reactive residues revealed the reactivity of the N-terminus and Lys in the modification step and the reactivity of Tyr and Asn in the cross-linking step. The reaction chemistry was revealed in proteins for the first time, and the results were consistent with studies conducted in simpler model systems.

As the size of the model protein increased, complexity was added to both the MS and MS/MS data, which prompted refinement of the method. MATLAB programming was applied to speed up the processing of MS data. For isomeric cross-linked peptides with multiple cross-link sites and complex fragmentation patterns, DOM values and a bar graph visualization were introduced to localize the cross-link.

Increased complexity of the model protein also revealed remaining issues which have to be solved: false positive identification of candidates of cross-linked peptides due to overlapping m/z values in the complex peptide mixture, and insufficient information to verify candidates due to poor quality MS/MS spectra. To help relieve these issues in the future, an enrichment method should be developed to increase the proportion of cross-linked peptides and reduce the complexity of the peptide mixture.
As observed in both model proteins, an extra 12 Da structure added complexity to the clarification of the cross-linked peptide structures. Since this issue would very likely occur in the average-length cross-linked peptides in model protein systems and physiologically relevant systems, the structure determination needs to be fully understood by investigation of more cross-linked peptides. In order to identify more cross-linked peptides, the application of the approach to more model proteins or the enrichment of low abundance cross-linked peptides in current model proteins is necessary.

4 Conclusions and Future Perspectives

The formaldehyde cross-linking approach is a powerful tool to study protein-protein interactions in living cells and organisms, which could reveal both interacting proteins and the geometry of interactions. However, its potential to map the geometry of protein-protein interactions is limited by challenges in the identification of cross-linked peptides in digests of cross-linked proteins. This study of formaldehyde cross-linking in model proteins is aimed at establishing a method to study the MS properties of cross-linked peptides and the chemistry of cross-linking reactions, in order to facilitate surmounting these challenges.

A method to identify and characterize cross-linked peptides in model proteins has been established as follows. Model proteins are cross-linked, with yields confirmed by SDS-PAGE or MS spectra. Protein samples are digested and analyzed by LC-MS/MS. Candidates of cross-linked peptides are identified by matching a theoretical list of possible cross-linked peptides to experimental MS signals using a MATLAB program (Appendix A.3). MS/MS spectra of candidates are collected and interpreted by matching signals to proposed fragment ions.
A candidate is confirmed to be cross-linked if all of the proposed structural components (component peptides, cross-link and extra modifications or cross-links) are confirmed by the MS/MS spectrum. Series of fragment ions which contain the cross-link bridge localize the cross-link to individual residues or small segments on the component peptides. Multiple cross-link sites on isomeric cross-linked peptides can be localized by a bar graph visualization. After the localization of cross-link sites to individual residues at both ends of the cross-link bridge, examining the progression of both the modification and the cross-linking steps on cross-link sites reveals the chemistry of the two-step cross-linking reaction and reaction schemes.

Cross-linked peptides identified in this study are formed in non-physiological protein models and do not reflect physiologically relevant geometry information. However, valuable knowledge about the formaldehyde cross-linking reactions is gained directly in proteins. The deeper understanding of the reactions can be used in several ways toward the ultimate goal of revealing the geometry of protein-protein interactions in their cellular environments by identifying cross-linked peptides and determining cross-link sites.

A number of bioinformatics programs have been developed to automate the identification of cross-linked peptides and the determination of cross-link sites in cross-linking experiments in living cells 68,71,80-90. However, these programs have not been combined with the formaldehyde cross-linking approach due to a limited understanding of the residue reactivity and the fragmentation characteristics. In this study, the N-terminus to Tyr/Asn and Lys to Tyr cross-links, cross-links on Lys, Tyr or Arg, and two cross-links forming on a single N-terminus are revealed in proteins, under reaction conditions that closely resemble those applied when studying living cells and organisms.
These types of cross-links very likely form during the formaldehyde cross-linking of native proteins in their cellular environment, and can therefore be submitted to bioinformatics software as residue reactivity parameters. In addition, the fragmentation patterns of formaldehyde cross-linked peptides are observed for the first time. This information can be used to establish the fragmentation model part of the bioinformatics software. However, the residue reactivity and fragmentation patterns have been studied in fewer than 10 cross-linked peptides so far. Moreover, the structure determination of a cross-linked peptide with an extra 12 Da structure via the CID MS/MS spectrum is still a puzzle, as only 4 cases have been investigated. For a comprehensive understanding of the reactive residues and a fragmentation model that would contribute to the development of unbiased bioinformatics software, investigation of more model proteins with the established method is necessary in the near future.

An enrichment method is usually necessary to facilitate the identification of low abundance cross-linked peptides from the complex digest of cross-linked native proteins 67,70-72. Additionally, the complexity of MS data even in the myoglobin model system suggests a need for enrichment methods. When model proteins of equivalent and larger size are investigated, enrichment could reduce false positive identification of candidate cross-linked peptides and improve the quality of MS/MS spectra. The identification of cross-linked insulin and myoglobin peptides opens the door to testing different enrichment methods in model proteins. Chromatography methods such as strong cation exchange (SCX), strong anion exchange (SAX) and size exclusion chromatography (SEC) can be applied to the digest of formaldehyde treated insulin or myoglobin, to determine which of these, if any, can increase the proportion of formaldehyde cross-linked peptides in the peptide mixture.
If an enrichment method proves to work in model proteins, it can also be used to enrich formaldehyde cross-linked peptides from experiments in living cells and organisms.

Besides the development of enrichment methods and bioinformatics software, an interactome and interface (2IP) strategy has also been developed to facilitate mapping the geometry of cross-linked interacting proteins 120. In this strategy, cross-linked proteins are cleaved by chemicals and digested by enzymes into peptides of different sizes. The comparative analysis of these peptides allows localization of cross-links to a digest-sized peptide segment. This low-resolution localization of cross-link sites has allowed low-resolution geometry mapping of protein-protein interactions in several model proteins. However, it does not include the direct verification of identified cross-linked peptides and determination of cross-link sites by MS/MS spectra. Our knowledge about the MS/MS characteristics of cross-linked peptides can be combined with this strategy and applied to cross-linked native proteins, for a further confirmation of the identification of cross-linked peptides by MS/MS spectra, and high resolution mapping of the geometry of protein-protein interactions by localization of the exact cross-link sites.

Aside from assisting the development of experimental strategies and bioinformatics software for the formaldehyde cross-linking approach, my study also suggests considerations for the general experimental design of protein cross-linking. As shown in the insulin model system, reducing or not reducing the disulfide bonds, and digesting or not digesting, provide complementary information about cross-linked peptides and cross-linking reactions. More specifically, a smaller peptide mixture was produced when disulfide bonds were not reduced during the enzymatic digestion, which is suitable for method development but complicates the MS/MS spectra of cross-linked peptides.
Later, reducing disulfide bonds produced a different peptide mixture that not only provided a good model system for method verification and refinement, but also allowed the localization of cross-links to individual residues due to the reduced complexity of the MS/MS spectra of cross-linked peptides. Moreover, although cross-linked insulin was digested for identification and characterization of cross-linked peptides, monitoring the progression of modifications on G α1, Y α19 and N α18 was performed on the whole insulin α-chain, which was easier to ionize and fragment than short peptides after digestion. These observations suggest that for more comprehensive information about a model protein or a physiological system, digestion with or without reducing disulfide bonds and different proteolytic enzymes or chemical cleavage reagents can be used to produce peptides and peptide mixtures of different complexity. In fact, the 2IP strategy is an excellent example of this idea, because it is based on complementary information from longer peptides produced by chemical cleavage and shorter peptides produced by enzymatic digestion. Therefore, altering the routine workflow of proteomic studies (mentioned in 1.1.1) is a possible way to creatively overcome challenges in protein cross-linking studies.

Last but not least, different MS instrumentation can be used to optimize the detection of cross-linked peptides. Ion mobility spectrometry (IMS), which separates gaseous ions according to their collision cross-sections, is a possible alternative to simplify peptide mixtures produced by cross-linked proteins. IMS coupled with MS has already been used to separate isomeric modified peptides with different modification sites 121. It is therefore a promising technique to separate isomeric cross-linked peptides such that their structures can be resolved individually in the subsequent MS/MS analysis.
In this way, individual structures of isomeric cross-linked peptides 138 LFRNDIAAKYKE 149 ^ 150 LGFQG 154  and ( β22 RGFFYTPKA β30  ^ β22 RGFFYTPKA β30 )* could be resolved. Currently, the application of IMS-MS to cross-linking samples is mainly for simple mixtures such as modified standard peptides 121  or intact cross-linked model proteins 122 . This technique has yet to be adapted to more complex peptide mixtures. Applying different fragmentation methods to cross-linked peptides can also be considered. Electron-capture dissociation (ECD) and electron-transfer dissociation (ETD)  100  are suitable for long, highly charged peptides, and tend to cleave only along peptide backbones but not modifications or cross-links. Collision-induced dissociation (CID), infrared multiphoton dissociation (IRMPD), ECD and ETD have been used together for complementary and unambiguous determination of cross-link sites for other cross- linkers 123-125 . Instruments equipped with these fragmentation techniques are therefore an alternative option for the analysis of formaldehyde cross-linking samples. In conclusion, a method has been established in this study to identify and characterize formaldehyde-induced cross-links in model proteins. Knowledge gained in this study can be used for the development of enrichment methods and bioinformatics software. These future directions combined with proper instrumentation and careful experimental design can facilitate the identification of formaldehyde cross-linked peptides and the determination of cross-link sites in cross-linked native proteins. This shall allow the high-resolution mapping of the geometry of protein-protein interactions in their native cellular environment. The formaldehyde cross-linking approach has been successfully applied to the study of various living cells and organisms with versatile experimental designs. 
Therefore, it is not difficult to envision that questions of which proteins interact and how they interact in various biological systems can be answered by this approach. Furthermore, clinical biopsies that are preserved by formaldehyde cross-linking can be studied to examine abnormal changes in protein-protein interactions associated with disease states. Considering the variety of tissue banks and the number of stored disease tissues, studying formaldehyde cross-linked proteins stands a good chance of revealing characteristic abnormal protein-protein interactions associated with various diseases.

References

(1) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64.
(2) Karas, M.; Hillenkamp, F. Anal Chem 1988, 60, 2299.
(3) Chait, B. T. Structure 1994, 2, 465.
(4) Wilm, M.; Shevchenko, A.; Houthaeve, T.; Breit, S.; Schweigerer, L.; Fotsis, T.; Mann, M. Nature 1996, 379, 466.
(5) Belov, M. E.; Gorshkov, M. V.; Udseth, H. R.; Anderson, G. A.; Smith, R. D. Anal Chem 2000, 72, 2271.
(6) McLafferty, F. W.; Fridriksson, E. K.; Horn, D. M.; Lewis, M. A.; Zubarev, R. A. Science 1999, 284, 1289.
(7) Yates, J. R.; McCormack, A. L.; Eng, J. Abstr Pap Am Chem S 1994, 207, 101.
(8) Henzel, W.; Watanabe, C.; Stults, J. J Am Soc Mass Spectr 2003, 14, 931.
(9) Pan, S. Q.; Gu, S.; Bradbury, E. M.; Chen, X. Anal Chem 2003, 75, 1316.
(10) Kleno, T. G.; Leonardsen, L. R.; Kjeldal, H. O.; Laursen, S. M.; Jensen, O. N.; Baunsgaard, D. Proteomics 2004, 4, 868.
(11) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551.
(12) Eng, J. K.; McCormack, A. L.; Yates, J. R. J Am Soc Mass Spectr 1994, 5, 976.
(13) Craig, R.; Beavis, R. C. Bioinformatics 2004, 20, 1466.
(14) Biemann, K.; Scoble, H. A. Science 1987, 237, 992.
(15) Eckart, K. Mass Spectrom Rev 1994, 13, 23.
(16) Roepstorff, P.; Fohlman, J. Biomedical Mass Spectrometry 1984, 11, 601.
(17) Biemann, K. Biomed Environ Mass 1988, 16, 99.
(18) Papayannopoulos, I. A. Mass Spectrom Rev 1995, 14, 49.
(19) Paizs, B.; Suhai, S. Mass Spectrom Rev 2005, 24, 508.
(20) Ballard, K. D.; Gaskell, S. J. International Journal of Mass Spectrometry and Ion Processes 1991, 111, 173.
(21) Gaskell, S. J.; Bolgar, M. S.; Cox, K. A. Methods in Protein Structure Analysis 1995, 141.
(22) Wysocki, V. H.; Resing, K. A.; Zhang, Q. F.; Cheng, G. L. Methods 2005, 35, 211.
(23) Johnson, H.; Eyers, C. E. In LC-MS/MS in Proteomics; Cutillas, P. R., Timms, J. F., Eds.; Humana Press: 2010; Vol. 658, p 93.
(24) Tang, X. J.; Boyd, R. K. Rapid Commun Mass Sp 1992, 6, 651.
(25) Cox, K. A.; Gaskell, S. J.; Morris, M.; Whiting, A. J Am Soc Mass Spectr 1996, 7, 759.
(26) Wysocki, V. H.; Tsaprailis, G.; Smith, L. L.; Breci, L. A. J Mass Spectrom 2000, 35, 1399.
(27) Tsaprailis, G.; Nair, H.; Somogyi, Á.; Wysocki, V. H.; Zhong, W.; Futrell, J. H.; Summerfield, S. G.; Gaskell, S. J. J Am Chem Soc 1999, 121, 5142.
(28) Tsaprailis, G.; Somogyi, A.; Nikolaev, E. N.; Wysocki, V. H. International Journal of Mass Spectrometry 2000, 196, 467.
(29) Farrugia, J. M.; Taverner, T.; O'Hair, R. A. J. International Journal of Mass Spectrometry 2001, 209, 99.
(30) Kollmann, K.; Mutenda, K. E.; Balleininger, M.; Eckermann, E.; von Figura, K.; Schmidt, B.; Lübke, T. Proteomics 2005, 5, 3966.
(31) Foster, L. J.; de Hoog, C. L.; Zhang, Y.; Xie, X.; Mootha, V. K.; Mann, M. Cell 2006, 125, 187.
(32) Dosemeci, A.; Makusky, A. J.; Jankowska-Stephens, E.; Yang, X.; Slotta, D. J.; Markey, S. P. Mol Cell Proteomics 2007, 6, 1749.
(33) Yan, W.; Aebersold, R.; Raines, E. W. J Proteomics 2009, 72, 4.
(34) Ghaemmaghami, S.; Huh, W.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O'Shea, E. K.; Weissman, J. S. Nature 2003, 425, 737.
(35) de Godoy, L. M.; Olsen, J. V.; Cox, J.; Nielsen, M. L.; Hubner, N. C.; Frohlich, F.; Walther, T. C.; Mann, M. Nature 2008, 455, 1251.
(36) Hall, D. B.; Struhl, K. J Biol Chem 2002, 277, 46043.
(37) Blagoev, B.; Kratchmarova, I.; Ong, S. E.; Nielsen, M.; Foster, L. J.; Mann, M. Nat Biotechnol 2003, 21, 315.
(38) Gingras, A. C.; Gstaiger, M.; Raught, B.; Aebersold, R. Nat Rev Mol Cell Bio 2007, 8, 645.
(39) Vasilescu, J.; Figeys, D. Curr Opin Biotech 2006, 17, 394.
(40) Krogan, N. J.; Cagney, G.; Yu, H. Y.; Zhong, G. Q.; Guo, X. H.; Ignatchenko, A.; Li, J.; Pu, S. Y.; Datta, N.; Tikuisis, A. P.; Punna, T.; Peregrin-Alvarez, J. M.; Shales, M.; Zhang, X.; Davey, M.; Robinson, M. D.; Paccanaro, A.; Bray, J. E.; Sheung, A.; Beattie, B.; Richards, D. P.; Canadien, V.; Lalev, A.; Mena, F.; Wong, P.; Starostine, A.; Canete, M. M.; Vlasblom, J.; Wu, S.; Orsi, C.; Collins, S. R.; Chandran, S.; Haw, R.; Rilstone, J. J.; Gandi, K.; Thompson, N. J.; Musso, G.; St Onge, P.; Ghanny, S.; Lam, M. H. Y.; Butland, G.; Altaf-Ui, A. M.; Kanaya, S.; Shilatifard, A.; O'Shea, E.; Weissman, J. S.; Ingles, C. J.; Hughes, T. R.; Parkinson, J.; Gerstein, M.; Wodak, S. J.; Emili, A.; Greenblatt, J. F. Nature 2006, 440, 637.
(41) Butland, G.; Peregrin-Alvarez, J. M.; Li, J.; Yang, W. H.; Yang, X. C.; Canadien, V.; Starostine, A.; Richards, D.; Beattie, B.; Krogan, N.; Davey, M.; Parkinson, J.; Greenblatt, J.; Emili, A. Nature 2005, 433, 531.
(42) Bouwmeester, T. Nat Cell Biol 2004, 6.
(43) Sutherland, B. W.; Toews, J.; Kast, J. J Mass Spectrom 2008, 43, 699.
(44) Sinz, A. Anal Bioanal Chem 2010, 397, 3433.
(45) Back, J. W.; de Jong, L.; Muijsers, A. O.; de Koster, C. G. Journal of Molecular Biology 2003, 331, 303.
(46) Melcher, K. Curr Protein Pept Sci 2004, 5, 287.
(47) Aebersold, R.; Mann, M. Nature 2003, 422, 198.
(48) Staros, J. V.; Anjaneyulu, P. S. R. Method Enzymol 1989, 172, 609.
(49) Staros, J. V.; Kotite, N. J.; Cunningham, L. W. Method Enzymol 1992, 215, 403.
(50) Tomaska, L.; Resnick, R. J. J Biol Chem 1993, 268, 5317.
(51) Suchanek, M.; Radzikowska, A.; Thiele, C. Nat Methods 2005, 2, 261.
(52) Kobayashi, T.; Hearing, V. J. J Cell Sci 2007, 120, 4261.
(53) Zhang, H. Z.; Tang, X. T.; Munske, G. R.; Tolic, N.; Anderson, G. A.; Bruce, J. E. Mol Cell Proteomics 2009, 8, 409.
(54) Layh-Schmitt, G.; Podtelejnikov, A.; Mann, M. Microbiology 2000, 146 (Pt 3), 741.
(55) Vasilescu, J.; Guo, X.; Kast, J. Proteomics 2004, 4, 3845.
(56) Schmitt-Ulms, G.; Hansen, K.; Liu, J. L.; Cowdrey, C.; Yang, J.; DeArmond, S. J.; Cohen, F. E.; Prusiner, S. B.; Baldwin, M. A. Nat Biotechnol 2004, 22, 724.
(57) Guerrero, C.; Tagwerker, C.; Kaiser, P.; Huang, L. Mol Cell Proteomics 2006, 5, 366.
(58) Tagwerker, C.; Flick, K.; Cui, M.; Guerrero, C.; Dou, Y.; Auer, B.; Baldi, P.; Huang, L.; Kaiser, P. Mol Cell Proteomics 2006, 5, 737.
(59) Hájek, P.; Chomyn, A.; Attardi, G. J Biol Chem 2007, 282, 5670.
(60) Bai, Y.; Markham, K.; Chen, F. S.; Weerasekera, R.; Watts, J.; Horne, P.; Wakutani, Y.; Bagshaw, R.; Mathews, P. M.; Fraser, P. E.; Westaway, D.; George-Hyslop, P. S.; Schmitt-Ulms, G. Mol Cell Proteomics 2008, 7, 15.
(61) Klockenbusch, C.; Kast, J. J Biomed Biotechnol 2010.
(62) Meunier, L.; Usherwood, Y. K.; Chung, K. T.; Hendershot, L. M. Mol Biol Cell 2002, 13, 4456.
(63) Agou, F.; Ye, F.; Véron, M. In Protein-Protein Interactions; Fu, H., Ed.; Humana Press: 2004; Vol. 261, p 427.
(64) Zeng, P. Y.; Vakoc, C. R.; Chen, Z. C.; Blobel, G. A.; Berger, S. L. Biotechniques 2006, 41, 694.
(65) Bomgarden, R. D. Genet Eng Biotechn N 2008, 28, 24.
(66) Schilling, B.; Row, R. H.; Gibson, B. W.; Guo, X.; Young, M. M. J Am Soc Mass Spectr 2003, 14, 834.
(67) Leitner, A.; Walzthoeni, T.; Kahraman, A.; Herzog, F.; Rinner, O.; Beck, M.; Aebersold, R. Mol Cell Proteomics 2010, 9, 1634.
(68) Mayne, S. L. N.; Patterton, H.-G. Briefings in Bioinformatics 2011.
(69) Fabris, D.; Yu, E. T. J Mass Spectrom 2010, 45, 841.
(70) Maiolica, A.; Cittaro, D.; Borsotti, D.; Sennels, L.; Ciferri, C.; Tarricone, C.; Musacchio, A.; Rappsilber, J. Mol Cell Proteomics 2007, 6, 2200.
(71) Rinner, O.; Seebacher, J.; Walzthoeni, T.; Mueller, L.; Beck, M.; Schmidt, A.; Mueller, M.; Aebersold, R. Nat Methods 2008, 5, 748.
(72) Chen, Z. A.; Jawhari, A.; Fischer, L.; Buchen, C.; Tahir, S.; Kamenski, T.; Rasmussen, M.; Lariviere, L.; Bukowski-Wills, J.-C.; Nilges, M.; Cramer, P.; Rappsilber, J. EMBO J 2010, 29, 717.
(73) Trester-Zedlitz, M.; Kamada, K.; Burley, S. K.; Fenyo, D.; Chait, B. T.; Muir, T. W. J Am Chem Soc 2003, 125, 2416.
(74) Sinz, A.; Kalkhof, S.; Ihling, C. J Am Soc Mass Spectr 2005, 16, 1921.
(75) Muller, D. R.; Schindler, P.; Towbin, H.; Wirth, U.; Voshol, H.; Hoving, S.; Steinmetz, M. O. Anal Chem 2001, 73, 1927.
(76) Schulz, D. M.; Kalkhof, S.; Schmidt, A.; Ihling, C.; Stingl, C.; Mechtler, K.; Zschoernig, O.; Sinz, A. Proteins 2007, 69, 254.
(77) Kasper, P. T.; Back, J. W.; Vitale, M.; Hartog, A. F.; Roseboom, W.; de Koning, L. J.; van Maarseveen, J. H.; Muijsers, A. O.; de Koster, C. G.; de Jong, L. Chembiochem 2007, 8, 1281.
(78) Soderblom, E. J.; Goshe, M. B. Anal Chem 2006, 78, 8059.
(79) Petrotchenko, E. V.; Xiao, K. H.; Cable, J.; Chen, Y. W.; Dokholyan, N. V.; Borchers, C. H. Mol Cell Proteomics 2009, 8, 273.
(80) McIlwain, S.; Draghicescu, P.; Singh, P.; Goodlett, D. R.; Noble, W. S. J Proteome Res 2010, 9, 2488.
(81) Lee, Y. J.; Lackner, L. L.; Nunnari, J. M.; Phinney, B. S. J Proteome Res 2007, 6, 3908.
(82) Lee, Y. J. J Am Soc Mass Spectr 2009, 20, 1896.
(83) Nadeau, O. W.; Wyckoff, G. J.; Paschall, J. E.; Artigues, A.; Sage, J.; Villar, M. T.; Carlson, G. M. Mol Cell Proteomics 2008, 7, 739.
(84) Heymann, M.; Paramelle, D.; Subra, G.; Forest, E.; Martinez, J.; Geourjon, C.; Deleage, G. Bioinformatics 2008, 24, 2782.
(85) Anderson, G. A.; Tolic, N.; Tang, X. T.; Zheng, C. X.; Bruce, J. E. J Proteome Res 2007, 6, 3412.
(86) de Koning, L. J.; Kasper, P. T.; Back, J. W.; Nessen, M. A.; Vanrobaeys, F.; Van Beeumen, J.; Gherardi, E.; de Koster, C. G.; de Jong, L. Febs J 2006, 273, 281.
(87) Gao, Q. X.; Xue, S.; Doneanu, C. E.; Shaffer, S. A.; Goodlett, D. R.; Nelson, S. D. Anal Chem 2006, 78, 2145.
(88) Tang, Y.; Chen, Y. F.; Lichti, C. F.; Hall, R. A.; Raney, K. D.; Jennings, S. F. Bmc Bioinformatics 2005, 6.
(89) Peri, S.; Steen, H.; Pandey, A. Trends Biochem Sci 2001, 26, 687.
(90) Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal Chem 1999, 71, 2871.
(91) Barton, S. J.; Richardson, S.; Perkins, D. N.; Bellahn, I.; Bryant, T. N.; Whittaker, J. C. Anal Chem 2007, 79, 5601.
(92) French, D.; Edsall, J. T. Adv Protein Chem 1945, 2, 277.
(93) Fraenkel-Conrat, H.; Brandon, B. A.; Olcott, H. S. J Biol Chem 1947, 168, 99.
(94) Fraenkel-Conrat, H.; Cooper, M.; Olcott, H. S. J Am Chem Soc 1945, 67, 950.
(95) Fraenkel-Conrat, H.; Mecham, D. K. J Biol Chem 1949, 177, 477.
(96) Fraenkel-Conrat, H.; Olcott, H. S. J Am Chem Soc 1946, 68, 34.
(97) Fraenkel-Conrat, H.; Olcott, H. S. J Biol Chem 1948, 174, 827.
(98) Fraenkel-Conrat, H.; Olcott, H. S. J Am Chem Soc 1948, 70, 2673.
(99) Kelly, D. P.; Dewar, M. K.; Johns, R. B.; Wei-Let, S.; Yates, J. F. Adv Exp Med Biol 1977, 86A, 641.
(100) Heck, A. J.; Bonnici, P. J.; Breukink, E.; Morris, D.; Wills, M. Chemistry 2001, 7, 910.
(101) Metz, B.; Kersten, G. F. A.; Baart, G. J. E.; de Jong, A.; Meiring, H.; ten Hove, J.; van Steenbergen, M. J.; Hennink, W. E.; Crommelin, D. J. A.; Jiskoot, W. Bioconjugate Chem 2006, 17, 815.
(102) Metz, B.; Kersten, G. F. A.; Hoogerhout, P.; Brugghe, H. F.; Timmermans, H. A. M.; de Jong, A.; Meiring, H.; ten Hove, J.; Hennink, W. E.; Crommelin, D. J. A.; Jiskoot, W. J Biol Chem 2004, 279, 6235.
(103) Toews, J.; Rogalski, J. C.; Clark, T. J.; Kast, J. Anal Chim Acta 2008, 618, 168.
(104) Toews, J.; Rogalski, J. C.; Kast, J. Anal Chim Acta 2010, 676, 60.
(105) Nelson, D. L., Cox, M. M. Lehninger Principles of Biochemistry; 3rd ed.; W. H. Freeman: New York, 2000.
(106) Nolan, C.; Margoliash, E.; Peterson, J. D.; Steiner, D. F. J Biol Chem 1971, 246, 2780.
(107) Tang, X. T.; Bruce, J. E. Mol Biosyst 2010, 6, 939.
(108) Keller, B. O.; Suj, J.; Young, A. B.; Whittal, R. M. Anal Chim Acta 2008, 627, 71.
(109) Tang, X. J.; Thibault, P.; Boyd, R. K. Anal Chem 1993, 65, 2824.
(110) Lee, Y. J. Mol Biosyst 2008, 4, 816.
(111) Geetha, T.; Langlais, P.; Luo, M.; Mapes, R.; Lefort, N.; Chen, S.-C.; Mandarino, L.; Yi, Z. J Am Soc Mass Spectr 2011, 22, 457.
(112) Diepen, M. G. W. T.-v., University of York, 1996.
(113) Layloff, T. In American Genomic/Proteomic Technology 2001; Vol. 1, p 10.
(114) van den Oord, A. H. A.; Wesdorp, J. J.; van Dam, A. F.; Verheij, J. A. European Journal of Biochemistry 1969, 10, 140.
(115) Hardman, K. D.; Eylar, E. H.; Ray, D. K.; Banaszak, L. J.; Gurd, F. R. N. J Biol Chem 1966, 241, 432.
(116) Engel, B. J.; Pan, P.; Reid, G. E.; Wells, J. M.; McLuckey, S. A. International Journal of Mass Spectrometry 2002, 219, 171.
(117) Hogan, J. M.; McLuckey, S. A. J Mass Spectrom 2003, 38, 245.
(118) Roland Kellner, F. L., Helmut E. Meyer Chemical and enzymatic fragmentation of proteins; second ed.; Wiley-VCH: New York, 1999.
(119) Creighton, T. E. Proteins: Structures and Molecular Properties; second ed.; W. H. Freeman, 1992.
(120) Weerasekera, R.; She, Y. M.; Markham, K. A.; Bai, Y.; Opalka, N.; Orlicky, S.; Sicheri, F.; Kislinger, T.; Schmitt-Ulms, G. Proteomics 2007, 7, 3835.
(121) Santos, L. F. A.; Iglesias, A. H.; Pilau, E. J.; Gomes, A. F.; Gozzo, F. C. J Am Soc Mass Spectr 2010, 21, 2062.
(122) Smith, D. P.; Anderson, J.; Plante, J.; Ashcroft, A. E.; Radford, S. E.; Wilson, A. J.; Parker, M. J. Chem Commun 2008, 5728.
(123) Novak, P.; Haskins, W. E.; Ayson, M. J.; Jacobsen, R. B.; Schoeniger, J. S.; Leavell, M. D.; Young, M. M.; Kruppa, G. H. Anal Chem 2005, 77, 5101.
(124) Trnka, M. J.; Burlingame, A. L. Mol Cell Proteomics 2010, 9, 2306.
(125) Santos, L. F. A.; Eberlin, M. N.; Gozzo, F. C. J Mass Spectrom 2011, 46, 262.
Appendices

A.1 The List of Natural Amino Acids

The natural amino acids are listed below with their abbreviations and residue masses. The residue mass of an amino acid is its molecular weight minus that of water.

Name            3-Letter Abbreviation   1-Letter Abbreviation   Residue Mass (Da)
Alanine         Ala                     A                       71.04
Arginine        Arg                     R                       156.10
Asparagine      Asn                     N                       114.04
Aspartic acid   Asp                     D                       115.03
Cysteine        Cys                     C                       103.01
Glutamic acid   Glu                     E                       129.04
Glutamine       Gln                     Q                       128.06
Glycine         Gly                     G                       57.02
Histidine       His                     H                       137.06
Isoleucine      Ile                     I                       113.08
Leucine         Leu                     L                       113.08
Lysine          Lys                     K                       128.09
Methionine      Met                     M                       131.04
Phenylalanine   Phe                     F                       147.07
Proline         Pro                     P                       97.05
Serine          Ser                     S                       87.03
Threonine       Thr                     T                       101.05
Tryptophan      Trp                     W                       186.08
Tyrosine        Tyr                     Y                       163.06
Valine          Val                     V                       99.07

A.2 The List of Assigned MS Signals in Figure 3-12

The following table lists the origin, mass, m/z and charge state (z) of the unmodified and modified myoglobin peptides assigned to 55 of the MS signals in Figure 3-12.
Position in Myoglobin   Origin             Mass       m/z      z
2-7                     576.24             576.24     577.26   1
2-7                     576.24+12          588.24     589.23   1
2-7                     576.24+42.016      618.256    619.25   1
2-7                     576.24+90.048      666.288    667.18   1
2-7                     576.24+72.032      648.272    649.3    1
8-19                    1484.78            1484.78    743.41   2
8-19                    1484.78+12         1496.78    749.41   2
8-19                    1484.78+30.016     1514.796   758.42   2
20-28                   896.4              896.4      896.4    1
20-28                   896.4+24           920.4      920.4    1
20-28                   896.4+36           932.4      932.4    1
29-39                   1280.72            1280.72    641.37   2
29-39                   1280.72+12         1292.72    647.36   2
29-39                   1280.72+30.016     1310.736   656.35   2
29-42                   1623.9             1623.9     542.32   3
                                                      812.97   2
29-42                   1623.9+30.016      1653.916   552.29   3
29-42                   1623.9+24          1647.9     550.3    3
                                                      824.94   2
29-42                   1623.9+54.016      1677.916   560.33   3
43-53                   1419.79            1419.79    474.26   3
43-53                   1419.79+12         1431.79    478.27   3
43-53                   1419.79+42.016     1461.806   488.28   3
43-53                   1419.79+54.016     1473.806   492.27   3
43-53                   1419.79+72.032     1491.822   498.27   3
43-55                   1619.87            1619.87    540.96   3
43-55                   1619.87+12         1631.87    544.95   3
43-55                   1619.87+42.016     1661.886   554.97   3
43-55                   1619.87+72.032     1691.902   564.99   3
56-84                   3124.75            3124.75    625.97   5
56-84                   3124.75+42.016     3166.766   634.37   5
56-84                   3124.75+36         3160.75    633.16   5
61-84                   2578.51            2578.51    645.64   4
61-84                   2578.51+42.016     2620.526   656.16   4
61-86                   2778.59            2778.59    695.67   4
                                                      556.73   5
61-86                   2778.59+12         2790.59    698.68   4
61-86                   2778.59+42.016     2820.606   706.17   4
                                                      565.15   5
87-106                  2314.35            2314.35    463.88   5
87-106                  2314.35+42.016     2356.366   472.29   5
107-137                 3275.64            3275.64    819.93   4
107-137                 3275.64+12         3287.64    822.95   4
107-137                 3275.64+30.016     3305.656   827.45   4
107-137                 3275.64+42.016     3317.656   830.45   4
107-137                 3275.64+72.032     3347.672   837.98   4
138-149                 1466.79            1466.79    489.93   3
138-149                 1466.79+12         1478.79    493.94   3
138-149                 1466.79+30.016     1496.806   499.94   3
138-149                 1466.79+24         1490.79    497.94   3
138-149                 1466.79+42.016     1508.806   503.95   3
138-149                 1466.79+36         1502.79    501.94   3
138-149                 1466.79+90.048     1556.838   519.94   3
138-149                 1466.79+54.016     1520.806   507.94   3
138-149                 1466.79+72.032     1538.822   513.92   3
150-154                 520.26             520.26     521.25   1
150-154                 520.26+12          532.26     533.27   1
150-154                 520.26+30.016      550.276    551.28   1
150-154                 520.26+54.016      574.276    575.32   1

A.3 The MATLAB Program for Data Processing in Chapter 3.3.2.2

The following MATLAB program speeds up data processing for the identification of cross-linked peptides in the formaldehyde-treated model protein. It builds the theoretical list of possible cross-linked peptides by combining unmodified and modified peptides, then compares this list with the LC-MS signals and generates a list of matches as the output. Lines starting with the % symbol are annotations, not part of the program.

    % Inputs expected in the workspace before running:
    %   ExtPep(:,1)  masses of the observed unmodified and modified peptides
    %   LCMS(:,1)    mass values of the experimental LC-MS signals

    % Make the theoretical list of possible cross-linked peptides
    NoPep = x;   % x is the number of observed unmodified and modified peptides
    a = 0;
    for j = 1:NoPep
        for k = j:NoPep   % k starts at j, so each pair of peptides is listed once
            a = a + 1;
            ExtPepComb(a,1) = j;                          % index of the first peptide
            ExtPepComb(a,2) = k;                          % index of the second peptide
            ExtPepComb(a,3) = ExtPep(j,1) + ExtPep(k,1);  % combined mass
        end
    end

    % Compare the theoretical list to experimental LC-MS signals
    b = 0;
    for m = y:z   % y and z define the mass range of the LC-MS signals to be compared
        for n = 1:a
            % keep combinations whose mass lies within 0.2 of the LC-MS signal
            if LCMS(m,1) >= ExtPepComb(n,3)-0.2 && LCMS(m,1) <= ExtPepComb(n,3)+0.2
                b = b + 1;
                CandiPep(b,1) = ExtPepComb(n,1);
                CandiPep(b,2) = ExtPepComb(n,2);
                CandiPep(b,3) = ExtPepComb(n,3);
            end
        end
    end
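
As a rough illustration of how the program in A.3 behaves, the sketch below runs the same two steps (pairwise combination, then matching within 0.2 Da) on a toy input. The four peptide masses are taken from Appendix A.2 (peptides 138-149 and 150-154, each unmodified and with +12 Da); the three LC-MS masses are hypothetical values chosen only for illustration, and the names tol and diffs are introduced here. The steps are written in vectorized form instead of nested loops, so this is an alternative formulation under those assumptions and not the program used in the thesis.

```matlab
% Toy inputs. Peptide masses come from Appendix A.2; the LC-MS masses
% are hypothetical and serve only to illustrate the matching step.
ExtPep = [1466.79; 1478.79; 520.26; 532.26];  % 138-149, 138-149+12, 150-154, 150-154+12
LCMS   = [1500.00; 1999.10; 2400.00];         % hypothetical observed masses (Da)
tol    = 0.2;                                 % mass tolerance (Da), as in A.3

% All index pairs (j,k) with j <= k: each peptide pair is listed once,
% including a peptide combined with a second copy of itself.
NoPep  = numel(ExtPep);
[j, k] = find(triu(ones(NoPep)));
ExtPepComb = [j, k, ExtPep(j) + ExtPep(k)];

% Absolute mass difference between every LC-MS signal (rows) and every
% combined mass (columns); bsxfun expands the two vectors into a matrix.
diffs = abs(bsxfun(@minus, LCMS, ExtPepComb(:,3).'));

% Column indices of the combinations that match any signal within tol.
[~, n] = find(diffs <= tol);
CandiPep = ExtPepComb(n, :);                  % one row per match

disp(CandiPep)
```

With these toy inputs, two candidates are returned: the pairs (2,3) and (1,4), both with a combined mass of 1999.05 Da, which lies within 0.2 Da of the hypothetical signal at 1999.10 Da. The two rows differ only in which peptide carries the +12 Da modification, so a mass match alone cannot distinguish them.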
