UBC Theses and Dissertations
Theory and Application of Eigenvalue Independent Partitioning in Theoretical Chemistry
Sabo, David Warren
1977-02-25

Full Text
THEORY AND APPLICATION OF EIGENVALUE INDEPENDENT PARTITIONING IN THEORETICAL CHEMISTRY

by

DAVID WARREN SABO

B. Sc. (Hons.), University of Alberta, 1972

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the Department of CHEMISTRY

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

July, 1977

© David Warren Sabo, 1977

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Chemistry
The University of British Columbia
Vancouver, British Columbia, Canada, V6T 1W5

ABSTRACT

This work concerns the description of eigenvalue independent partitioning theory, and its application to quantum mechanical calculations of interest in chemistry. The basic theory for an m-fold partitioning of a hermitian matrix H (2 ≤ m ≤ n, the dimension of the matrix) is developed in detail, with particular emphasis on the 2 x 2 partitioning, which is the most useful. It consists of the partitioning of the basis space into two subspaces — an nA-dimensional subspace (nA ≥ 1), and the complementary n − nA = nB-dimensional subspace. Various nA- (or nB-) dimensional effective operators, and projections onto nA- (or nB-) dimensional eigenspaces of H, are defined in terms of a mapping, f, relating the parts of eigenvectors lying in each of the partitioned subspaces.
This mapping is shown to be determined by a simple nonlinear operator equation, which can be solved by iterative methods exactly, or by using a perturbation expansion. Properties of approximate solutions, and various alternative formulas for effective operators, are examined. The theory is developed for use with both orthonormal and non-orthonormal bases.

Being a generalization of well known one-dimensional partitioning formalisms, this eigenvalue independent partitioning theory has a number of important areas of application. New and efficient methods are developed for the simultaneous determination of several eigenvalues and eigenvectors of a large hermitian matrix, which are based on the construction and diagonalization of an appropriate effective operator. Perturbation formulas are developed both for effective operators defined in terms of f, and for projections onto whole eigenspaces of H. The usefulness of these formulas, especially when the zero order states of interest are degenerate, is illustrated by a number of examples, including a formal uncoupling of the four component Dirac hamiltonian to obtain a two component hamiltonian for electrons only, the construction of an effective nuclear spin hamiltonian in esr theory, and the derivation of perturbation series for the one-particle density matrix in molecular orbital theory (in both Huckel-type and closed shell self-consistent field contexts).

A procedure is developed for the direct minimization of the total electronic energy in closed shell self-consistent field theory in terms of the elements of f, which are unconstrained and contain no redundancies. This formalism is extended straightforwardly to the general multi-shell single determinant case.
The resulting formulas, along with refinements of the basic conjugate gradient minimization algorithm, which involve the use of scaled variables and frequent basis modification, lead to efficient, rapidly convergent methods for the determination of stationary values of the electronic energy. This is illustrated by some numerical calculations in the closed shell and unrestricted Hartree-Fock cases.

TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgements
Chapter 1  Eigenvalue Independent Partitioning, An Introduction
Chapter 2  2 x 2 Partitioning Theory
  2.1  Basic Theory
    2.1.a  The f-operator
    2.1.b  The Defining Condition for f
    2.1.c  Rederivation From a Projection Point of View
    2.1.d  The Relationship Between T and the Eigenprojections — Covariant and Contravariant Representations
    2.1.e  Variational Formulation of D(f) = 0
    2.1.f  Relation Between o and D(f) — Eigenvalue Dispersion
    2.1.g  Transformation of f Under a Change of Basis
  2.2  Effective Operators
    2.2.a  Basic Definitions
    2.2.b  Eigenvectors and Eigenvalues of the Effective Operators
    2.2.c  Relationships With Other Formulations
  2.3  Generalization to a Nonorthonormal Basis
Chapter 3  The Effective Hamiltonians — Practical Considerations
  3.1  Alternative Formulas
  3.2  Implications of Inexact Solutions
  3.3  Perturbation Theory for H̄A, GA, and HA
    3.3.a  The H̄A Scheme
    3.3.b  The GA Scheme
    3.3.c  The HA Scheme
Chapter 4  Multiple Partitioning Theory
  4.1  The m x m Partitioning Formalism
    4.1.a  Basic Theory
    4.1.b  The Defining Condition on the fIJ
    4.1.c  Variational Formulation of the Equations for the fIJ
    4.1.d  Transformation of the fIJ Under a Change of Basis
  4.2  Effective Operators
    4.2.a  Basic Definitions
    4.2.b  Eigenvalues and Eigenvectors of the Effective Operators
  4.3  Generalization to a Non-orthonormal Basis
  4.4  Practical Considerations
    4.4.a  Alternative Formulas
    4.4.b  Implications of Inexact Solutions
Chapter 5  Exact Determination of T
  5.1  The Calculation of a Few Eigenvalues of a Large Hermitian Matrix
  5.2  2 x 2 Partitioning — Orthonormal Basis
    5.2.a  General Considerations
    5.2.b  Methods Based on D(1)(f)
    5.2.c  Methods Based on D(2)(f)
    5.2.d  Solution of the Newton-Raphson Equations by Descent Methods
    5.2.e  Extremizing the Trace
    5.2.f  Minimization of the Norm of D
    5.2.g  Test Calculations
  5.3  Generalization to a Non-orthonormal Basis — 2 x 2 Case
    5.3.a  General Considerations
    5.3.b  Methods Based on GBA and gBA — A Generalization of Algorithm SDNR
    5.3.c  Methods Based on D(1)(f) — Generalized Nesbet Procedures
    5.3.d  Other Methods
    5.3.e  Choice of an Initial Estimate, and Improvement of Convergence Rates
    5.3.f  Test Calculations With Overlap
  5.4  Multiple Partitioning
Chapter 6  Perturbation Theory
  6.1  Introduction
  6.2  2 x 2 Partitioning — Orthonormal Basis
    6.2.a  General Discussion
    6.2.b  A-states Degenerate
    6.2.c  A-states Non-degenerate
  6.3  Examples
    6.3.a  The Dirac Equation
    6.3.b  Derivation of a Spin Hamiltonian — Strong Field Case
  6.4  Non-orthonormal Basis — 2 x 2 Partitioning
Chapter 7  Eigenvalue Independent Partitioning and Molecular Orbital Theory
  7.1  Introduction
  7.2  Perturbation of the Density Matrix — Orthonormal Basis
    7.2.a  General Theory
    7.2.b  Huckel Molecular Orbital Theory
    7.2.c  Numerical Example — Huckel Theory
  7.3  Perturbation of the Density Matrix — Non-orthonormal Basis
    7.3.a  General Theory
    7.3.b  Extended Huckel Molecular Orbital Theory
  7.4  Self-Consistent Perturbation Theory
    7.4.a  General Theory
    7.4.b  Coupled Hartree-Fock Perturbation Theory
    7.4.c  Uncoupled Hartree-Fock Perturbation Theory
Chapter 8  Direct Minimization Self-Consistent Field Theory
  8.1  Introduction
  8.2  Closed Shell Systems
    8.2.a  Orthonormal Basis
    8.2.b  Non-orthonormal Basis
    8.2.c  Results of Test Calculations
  8.3  Unrestricted Hartree-Fock Theory
    8.3.a  Energy Derivatives
    8.3.b  Test Calculations and Computational Refinements
    8.3.c  Use of Scaled Variables
  8.4  Theory for the General Single Determinant Case
    8.4.a  The Basic Variables of the Calculation
    8.4.b  The Energy Variation and First Derivatives
    8.4.c  The Second Derivatives
    8.4.d  Incorporation of the Intershell Orthogonality Constraints
    8.4.e  Example — The Two Shell System
Bibliography
Appendices
  Appendix 1  Proofs of Alternative Formulas — 2 x 2 Partitioning
  Appendix 2  The 3 x 3 and 4 x 4 Case — Orthonormal Basis
  Appendix 3  Proofs of Alternative Formulas — Multiple Partitioning
  Appendix 4  Description of Algorithms — 2 x 2 Case
  Appendix 5  Rates of Convergence and Asymptotic Error Constants
  Appendix 6  Algorithms for the Determination of T — Multiple Partitioning Case
    A6.1  Methods Based on D(1)(T) = 0
    A6.2  Methods Based on D(2)(T) = 0
    A6.3  Methods Based on the Simultaneous Solution of GJI(T) = 0 and gJI(T) = 0
    A6.4  Methods Based on D(T) = 0
  Appendix 7  Additional Perturbation Series — Orthonormal Basis
  Appendix 8  Non-relativistic Approximation of the Dirac Hamiltonian
  Appendix 9  Additional Perturbation Series — Non-orthonormal Basis
  Appendix 10  Self-consistent Perturbation Theory When F(0) is not Block Diagonal
  Appendix 11  Minimization Algorithms
    A11.1  Method of Conjugate Gradients
    A11.2  The Newton-Raphson Method
  Appendix 12  Derivatives With Respect to Real and Imaginary Parts of f
  Appendix 13  Covariant and Contravariant Representations — the General Case

LIST OF TABLES

Table 5.1  Linear Convergence Rates of the Algorithms in Selected Calculations
Table 5.2  Rates of Convergence
Table 5.3  Rates of Convergence
Table 5.4  Rates of Convergence
Table 6.1  D(n)(f)
Table 6.2
Table 6.3
Table 6.4
Table 6.5  H(N)
Table 6.6  f(n) (A-states degenerate)
Table 6.7  f(n) (A-states degenerate)
Table 6.8  H(N) — Reduced Formulas (A-states degen.)
Table 6.9  (A-states degenerate)
Table 6.10  G(n) (A-states degenerate)
Table 6.11  HA(n) (A-states degenerate)
Table 6.12  f(n)
Table 6.13  S(n)
Table 6.14  G(n)
Table 6.15  H(N)
Table 6.16  D(n)(f) — Non-orthonormal Basis
Table 6.17  — Non-orthonormal Basis
Table 6.18  H(N) — Non-orthonormal Basis
Table 6.19  G(n) — Non-orthonormal Basis
Table 6.20  g(n) — Non-orthonormal Basis
Table 6.21  f(n) — Non-orthonormal Basis (A-states degenerate)
Table 6.22  H(n) — Non-orthonormal Basis (A-states degenerate)
Table 6.23  — Non-orthonormal Basis (A-states degenerate)
Table 6.24  HA(n) — Non-orthonormal Basis (A-states degenerate)
Table 6.25  f(n) — Non-orthonormal Basis
Table 6.26  HA — Non-orthonormal Basis
Table 6.27  GA(n) — Non-orthonormal Basis
Table 6.28  — Non-orthonormal Basis
Table 7.1  (PA)AA — Molecular Orbital Basis
Table 7.2  (PA)BA — Molecular Orbital Basis
Table 7.3  (PA)BB — Molecular Orbital Basis
Table 7.4  E(n) — Molecular Orbital Basis
Table 7.5  E(n) — Molecular Orbital Basis
Table 7.6  (PA0)(i) for an A6 System
Table 7.7  (PA0)(i) for an A3B3 System (H'11 = -1)
Table 7.8  (PA0)(i) for an A3B3 System (H'11 = 0)
Table 7.9  (PA)(n) — Non-orthonormal Basis
Table 7.10  (PA)(n) — Non-orthonormal Basis
Table 7.11  (PA)BB — Non-orthonormal Basis
Table 7.12  E(n) — Non-orthonormal Basis
Table 7.13  E(n) — Non-orthonormal Basis
Table 8.1  Closed Shell Case — Test Calculations
Table 8.2  Details of Direct Minimization Calculations, CN Molecule (r = 2.0 a.u.)
Table 8.3  Details of Direct Minimization Calculations, CN Molecule (r = 2.2 a.u.)
Table A7.1  g(n)
Table A7.2  g(n)
Table A7.3
Table A7.4  gA(n)
Table A7.5  g(n)
Table A7.6  g(n)
Table A7.7  H(n) in Terms of the g(n) and f(n)
Table A7.8  H(n) in Terms of the g(n) and G(n)
Table A7.9  G(n)
Table A8.1  Pauli Hamiltonian (adapted from DeVries (1970))
Table A8.2  gA — Non-relativistic Approximation
Table A8.3  gA — Non-relativistic Approximation
Table A8.4  gA — Non-relativistic Approximation
Table A8.5  Eriksen Hamiltonian (adapted from DeVries (1970))
Table A8.6  Transformation Connecting HPauli and HEriksen (adapted from DeVries (1970))
Table A9.1  — Non-orthonormal Basis
Table A9.2  g(n) — Non-orthonormal Basis
Table A9.3  g(n) — Non-orthonormal Basis
Table A9.4  — Non-orthonormal Basis
Table A9.5  S(n) — Non-orthonormal Basis
Table A9.6  — Non-orthonormal Basis

LIST OF FIGURES

Figure 5.1  Algorithm SDNRS
Figure 5.2  Algorithm SDNRS
Figure 7.1  P11 vs. H11 for the A6 System
Figure 7.2  P12 vs. H11 for the A6 System
Figure 7.3  Variation of P11 with H11 for the A3B3 System, H'11 = -1.0
Figure 7.4  P21 vs. H11 for the A3B3 System, H'11 = -1.0
Figure 7.5  Variation of P11 with H11 for the A3B3 System, H'11 = 0.0
Figure 7.6  Variation of P21 with H11 for the A3B3 System, H'11 = 0.0
Figure 8.1  Total electronic energy as a function of iteration number for the CN molecule, bond length = 2.0 a.u.
Figure 8.2  Total electronic energy as a function of iteration number for the CN molecule, bond length = 2.0 a.u.
Figure 8.3  Total electronic energy as a function of iteration number for the CN molecule, bond length = 2.0 a.u.
Figure 8.4  Total electronic energy as a function of iteration number for the CN molecule, bond length = 2.2 a.u.
Figure 8.5  Total electronic energy as a function of iteration number for the CN molecule, bond length = 2.2 a.u.
Figure 8.6  Total electronic energy as a function of iteration number for the CN molecule, bond length = 2.2 a.u.

ACKNOWLEDGEMENTS

I would like to take this opportunity to express my gratitude and sincere thanks to Dr. John A. R. Coope for his guidance and many helpful suggestions during my time as a graduate student at U. B. C., and especially during the preparation of this thesis. His accessibility and willingness to become involved in my research, and to demonstrate how to communicate the results of that work, have made this a rewarding and productive time.

I would like to thank my wife, Marlene, not only for transcribing the figures and hand drawn tables in this thesis, but also for her support, and her patience, tolerance, and understanding of a husband often obsessed with writing a Ph. D. thesis. My parents also deserve much credit for their support and wise counsel over the many years of my education.

I would like to thank the National Research Council of Canada, the H. R. MacMillan Foundation, and the University of British Columbia, for financial support, without which these studies would not have been possible. I would also like to acknowledge the contribution of the U. B. C. Computing Center to the more practical aspects of this work.
Their extensive program library, and their extensive and powerful hardware facilities, have made this part of the work far less painful than could be expected at many other institutions.

CHAPTER 1

EIGENVALUE INDEPENDENT PARTITIONING, AN INTRODUCTION

"The average Ph. D. thesis is nothing but a transference of bones from one graveyard to another." (J. Frank Dobie, A Texan in England, 1945)

Matrix partitioning is a well established technique in linear algebra, and such techniques have been found to be very useful in quantum chemistry. In a series of papers, Löwdin (1968, and references cited therein) has demonstrated the power and generality of a one-dimensional partitioning formalism, which contains, as special cases, many conventional methods used in quantum mechanical calculations. Through the partitioning of the basis space into two subspaces — a one-dimensional space spanned by a chosen reference function, and the complementary n−1 dimensional space — he obtains an expression for the eigenvalues, ε_a, of the matrix H, as

   ε_a = H_aa + H_ab (ε_a·1_b − H_bb)^(-1) H_ba,   (1.1)

where the right hand side, H_aa(ε_a), is a function not only of the elements of H, but also of ε_a itself. Further development of the formalism leads to a variety of perturbation formulas (including, among others, the Rayleigh-Schrodinger and Brillouin-Wigner types), iterative methods for determining a single eigenvalue, formulas for upper and lower bounds to eigenvalues, and many other useful results.

The function H_aa(ε) in eq. (1.1) can be regarded as a one-dimensional effective operator which depends implicitly on the eigenvalues ε_a of H. A number of attempts have been made to construct effective operators without implicit eigenvalues (see Klein (1974) and references cited therein), one of which is the eigenvalue independent partitioning of Coope (1970), which has some similarities to a non-canonical approach to the construction of effective operators in elementary particle theory, first formulated by Okubo (1954).
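The implicit character of eq. (1.1) is easily checked numerically. The following is a minimal sketch, assuming a random real symmetric test matrix and NumPy for the linear algebra (the matrix, the seed, and the helper name lowdin_rhs are illustrative, not from the thesis):

```python
import numpy as np

# Sketch of Lowdin's one-dimensional partitioning, eq. (1.1): every
# eigenvalue eps of H satisfies the implicit scalar equation
#     eps = H_aa + H_ab (eps*1_b - H_bb)^(-1) H_ba ,
# provided eps is not also an eigenvalue of the H_bb block.

rng = np.random.default_rng(0)
n = 6
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                       # a real symmetric (hermitian) test matrix

H_aa = H[0, 0]                          # 1 x 1 "a" block (reference function)
H_ab = H[0:1, 1:]                       # 1 x (n-1) coupling row
H_ba = H[1:, 0:1]                       # (n-1) x 1 coupling column
H_bb = H[1:, 1:]                        # (n-1) x (n-1) "b" block

def lowdin_rhs(eps):
    """Right-hand side of eq. (1.1)."""
    resolvent = np.linalg.inv(eps * np.eye(n - 1) - H_bb)
    return H_aa + (H_ab @ resolvent @ H_ba).item()

for eps in np.linalg.eigvalsh(H):
    assert abs(lowdin_rhs(eps) - eps) < 1e-6 * (1 + abs(eps))
```

The unknown ε_a appears inside the resolvent on the right hand side; removing this implicitness is precisely the point of the eigenvalue independent formalism.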
This thesis is primarily concerned with the development of this partitioning formalism, and its application in quantum mechanical calculations. The basic theory is described in considerable detail in chapters 2 - 4. In the simplest (2 x 2) case, the basis space is partitioned into two subspaces — an nA-dimensional subspace and the complementary n − nA = nB dimensional subspace, where 1 ≤ nA ≤ n−1 — but now, the fundamental quantity is taken to be a mapping, f, relating the parts of the eigenvectors lying in these two subspaces. It is possible to define a variety of nA-dimensional (and also, nB-dimensional) effective operators in terms of this mapping. The set of eigenvalues of these effective operators forms a subset of the eigenvalues of the matrix H, but the effective operators themselves no longer depend explicitly or implicitly on these eigenvalues. Also, the corresponding eigenvectors of the full matrix H are obtained straightforwardly from those of the effective operators using the mapping f.

Löwdin and Goscinski (1971) are quite correct in pointing out that implicitness of some sort is unavoidable in a partitioning formalism, and that this eigenvalue independent partitioning formalism could be described, in a particular sense, as an eigenvector implicit partitioning. This implicitness is basically a result of the fact that the eigenvalues (and through them, the eigenvectors) of a matrix are nonlinear functions of the elements of the matrix.
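The two properties just described — the effective operator's eigenvalues forming a subset of those of H, and the recovery of full eigenvectors through f — can be illustrated directly. A minimal sketch, assuming f is built from known eigenvectors by the construction f = X_BA X_AA^(-1) of chapter 2; the test matrix and names are illustrative:

```python
import numpy as np

# Sketch: for a hermitian H, the mapping f built from any nA chosen
# eigenvectors yields an nA x nA effective operator
#     Heff = H_AA + H_AB f
# whose eigenvalues are exactly the chosen eigenvalues of H, and whose
# eigenvectors c extend to full eigenvectors [c; f c] of H.

rng = np.random.default_rng(5)
n, nA = 6, 2
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
w, X = np.linalg.eigh(H)

sel = [0, 1]                                  # eigenvectors chosen to span the eigenspace
f = X[nA:, sel] @ np.linalg.inv(X[:nA, sel])  # nB x nA mapping

Heff = H[:nA, :nA] + H[:nA, nA:] @ f          # effective operator (generally non-hermitian)
assert np.allclose(np.sort(np.linalg.eigvals(Heff).real), w[sel], atol=1e-6)

# recover a full eigenvector of H from an eigenvector of Heff
mu, c = np.linalg.eig(Heff)
x = np.vstack([c[:, [0]], f @ c[:, [0]]])
x = x / np.linalg.norm(x)
assert np.allclose(H @ x, mu[0].real * x, atol=1e-6)
```

Note that Heff itself contains no eigenvalue, explicitly or implicitly: it is an algebraic function of the blocks of H and of f alone.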
As indicated by Coope (1970), the one-dimensional partitioning formalism of Löwdin can be obtained as a special case of this eigenvalue independent partitioning formalism when nA = 1 (as is, in fact, also demonstrated, but not emphasized, by Löwdin and Goscinski (1971)). The adoption of this more general point of view, in which the partitioning theory is formulated in terms of a mapping between the partitioned spaces rather than in terms of the eigenvalues and eigenvectors of the matrix, leads to new and important areas of application. In particular, it is especially suitable when groups of eigenvalues or eigenvectors are to be treated simultaneously. In chapter 2, it is shown that the mapping f can be used to define projections onto whole eigenspaces of H. The condition defining f can be formulated variationally, and is also seen to be related to measures of errors in such eigenprojections. It is also shown that f transforms nonlinearly under a linear transformation of the basis vectors, and that this has important practical implications. The simplest (2 x 2) case of the eigenvalue independent partitioning described above is straightforwardly generalized to partitioning of the basis space (and eigenvector space) into m (2 ≤ m ≤ n) subspaces, as is demonstrated in chapter 4.

There are two main areas of application of this partitioning formalism. One of them is in the construction of effective operators in nA-dimensional spaces, with nA > 1. For eigenvalues which are well separated from all others, one-dimensional partitioning formalisms, as in eq. (1.1), are useful, but when degeneracy or near degeneracy occurs, these formulas become ill-conditioned. Traditionally, multi-dimensional effective operators have been constructed using a canonical procedure,

   H̄ = U† H U,   (1.2)

requiring the calculation
of a unitary transformation, U, which uncouples the desired eigenspace of the operator from the rest of the eigenvector space (for example, Van Vleck perturbation theory (Van Vleck, 1929); also see Tani (1954) and Klein (1974)). The unitarity of U is commonly ensured by writing it as

   U = e^(iS),   (1.3)

where S is a hermitian operator. Thus, in obtaining the desired uncoupled operator H̄, one must determine the exponential operator, e^(iS). This can be done straightforwardly using a perturbation formalism when that is appropriate, but it is very difficult, in general, to calculate S exactly otherwise. On the other hand, the mapping f, in this partitioning formalism, is defined by a much simpler, though still nonlinear, equation, which can not only be solved using a perturbation expansion, when appropriate, but can also be solved iteratively in a very straightforward manner to obtain f to any desired level of accuracy. Methods for the iterative determination of f, and its generalization in a multi-partitioning formalism, are given in chapter 5 and accompanying appendices. The particular application to the calculation of a small number of eigenvalues and eigenvectors of a large hermitian matrix is considered in detail, and test calculations demonstrate the usefulness of this new approach to the problem.

Because of the simple algebraic form of the condition defining f, compared to those defining the operator S in eq. (1.3), perturbation formulas for f, and for effective operators defined in terms of f, are obtained straightforwardly for arbitrary order, unlike the involved step by step procedure required in the canonical approach. Certain of the more useful series are developed in chapter 6. Two examples are included to demonstrate the scope and ease of use of these formulas.
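For comparison with the canonical route through e^(iS), the perturbative use of f can be sketched in a few lines. This is an assumed illustration (not a reproduction of the chapter 6 tables): with H = H0 + V, H0 diagonal, and the A-states degenerate in zero order, the first-order mapping follows the standard quasi-degenerate expansion, f(1)_bj = V_bj/(e_j − e_b), and the effective operator truncated at f(1) is correct through second order:

```python
import numpy as np

# Sketch: degenerate A-states cause no difficulty as long as the whole
# degenerate group is kept in the A space.  Heff = H_AA + H_AB f1, with
# f1 the first-order mapping, reproduces the A-eigenvalues of H through
# second order in the perturbation.

rng = np.random.default_rng(4)
e0 = np.array([0.0, 0.0, 1.0, 2.0, 3.0])   # first two (A) levels degenerate
nA, n = 2, 5
V = rng.standard_normal((n, n)); V = (V + V.T) / 2
lam = 1e-2
H = np.diag(e0) + lam * V

f1 = np.zeros((n - nA, nA))                # first-order mapping, B x A
for b in range(nA, n):
    for j in range(nA):
        f1[b - nA, j] = lam * V[b, j] / (e0[j] - e0[b])

Heff = H[:nA, :nA] + H[:nA, nA:] @ f1        # correct through O(lam^2)
approx = np.sort(np.linalg.eigvals(Heff).real)
exact = np.sort(np.linalg.eigvalsh(H))[:nA]  # the two levels emerging from e0 = 0
assert np.allclose(approx, exact, atol=1e-4) # residual error is O(lam^3)
```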
It is shown that a formal uncoupling of the four-component Dirac hamiltonian, to obtain a two-component relativistic wave equation for electrons, is achieved by a particularly simple application of the basic formulas derived in the early part of chapter 6. Also, a nuclear spin hamiltonian for the strong field case is derived to second order. In all cases, the presence of degeneracy in zero order is of no concern as long as all degenerate or nearly degenerate levels are treated at the same time.

Another major application of this eigenvalue independent partitioning formalism is in the use of the mapping operators to describe projections onto particular eigenspaces. As shown in chapter 2, projections onto eigenspaces can be written in terms of f in a form which is automatically idempotent and self-adjoint for any value of f. Because the elements of f are required to satisfy only a single simple defining condition, perturbation formulas to arbitrarily high order are again obtained straightforwardly. In chapter 7, perturbation formulas for such projections are developed with reference to molecular orbital theory. In particular, perturbation formulas for the density matrix in Huckel, extended Huckel, and closed shell self-consistent field theory are produced.
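That the projection is idempotent and self-adjoint for any value of f whatever can be verified mechanically. A minimal sketch, assuming the 2 x 2 form P_A = [1_A; f] (1_A + f†f)^(-1) [1_A f†] of chapter 2 and an arbitrary (random, real) f:

```python
import numpy as np

# Sketch: the projection built from an ARBITRARY mapping f is
# automatically self-adjoint and idempotent with trace nA, whether or
# not f satisfies its defining condition for any particular H.

rng = np.random.default_rng(1)
nA, nB = 2, 4
f = rng.standard_normal((nB, nA))          # unconstrained nB x nA block

gA = np.eye(nA) + f.T @ f                  # the metric g_A = 1_A + f'f
B = np.vstack([np.eye(nA), f])             # the n x nA block [1_A; f]
PA = B @ np.linalg.inv(gA) @ B.T           # the projection in terms of f

assert np.allclose(PA, PA.T)               # self-adjoint
assert np.allclose(PA @ PA, PA)            # idempotent
assert np.isclose(np.trace(PA), nA)        # trace = nA
```

This is the sense in which the elements of f are unconstrained variables: the constraints P² = P, P† = P are built into the form itself.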
The density matrix (the projection onto the occupied orbitals) in closed shell self-consistent field theory can be written solely in terms of the operator f corresponding to a partitioning of the eigenvectors of the Fock operator, F, into two sets, consisting, respectively, of the occupied and the unoccupied orbitals, and thus, the total electronic energy is completely specified by f. The application of this partitioning formalism in self-consistent field theory represents a generalization of the simple matrix partitioning described above, in that the operator, F, to be brought to block diagonal form, itself depends on the partitioning operator f through its dependence on the density matrix R. Since the matrix elements of f are not constrained in any way and do not contain any redundancy (see section 8.2), they are a particularly suitable set of variables in terms of which to determine the stationary values of the energy directly. The derivatives of the Hartree-Fock energy with respect to these variables are given very compactly using the columns of the density matrix and its complement. This formalism is extended straightforwardly to the general multi-shell single determinant case using the multi-partitioning formalism described in chapter 4. Some numerical calculations in the closed shell and unrestricted Hartree-Fock cases are described in chapter 8, and they indicate that refinements involving the use of scaled variables and the adoption of bases which nearly diagonalize the Fock matrices result in practical procedures which are superior to the Roothaan procedure and to other currently available direct minimization self-consistent field procedures.

CHAPTER 2

2 x 2 PARTITIONING THEORY

"The White Rabbit put on his spectacles. 'Where shall I begin, please your Majesty?' he asked. 'Begin at the beginning,' the King said gravely, 'and go on till you come to the end: then stop.'" (Alice's Adventures in Wonderland,
Lewis Carroll)

2.1 Basic Theory

2.1.a The f-operator

Consider the matrix eigenvalue equation,

   H X = X E,   (2.1a)

   X†X = 1_n,   (2.1b)

where H is an n x n hermitian matrix, X is the n x n unitary matrix whose columns are the orthonormal eigenvectors of H, and E is the n x n diagonal matrix whose elements are the corresponding real eigenvalues of H. If the n-dimensional basis set being used is partitioned into two subsets spanning spaces S_A and S_B of dimensions nA and nB, respectively, and the eigenvectors of H are similarly partitioned into two sets X(A) and X(B), spanning spaces S̄_A and S̄_B, also of dimensions nA and nB, respectively, then the matrix, X, above can be written in the block form,

   X = [X(A) X(B)] = [ X_AA  X_AB ]  =  [ 1_A   h  ] [ X_AA   0   ]  =  T X̄.   (2.2)
                     [ X_BA  X_BB ]     [  f   1_B ] [  0    X_BB ]

Formally, one has,

   f = X_BA X_AA^(-1),   (2.3a)
and
   h = X_AB X_BB^(-1).   (2.3b)

The operator f maps the part of an eigenvector x_r^(A) lying in S_A into the part lying in S_B. It can be considered as a generalization of the operator f(E), defined by Löwdin (1962), in connection with a partitioning formalism with nA = 1 (that is, with the space S_A one-dimensional). The function of the space S_A here is analogous to that of the so-called reference function in one-dimensional partitioning formalisms. Similarly, the operator h maps the part of an eigenvector x_r^(B) lying in S_B into the part lying in S_A. From eqs. (2.3), it is seen that f exists if the matrix block X_AA is non-singular, while h exists if the matrix block X_BB is non-singular. Since the eigenvectors of a hermitian matrix are orthogonal if the basis functions are linearly independent, the above conditions on X_AA and X_BB are satisfied simultaneously for at least one partitioning of the basis functions. The orthonormality condition, (2.1b), on X can be used to show that

   h = -f†.   (2.4)

Thus, in the simple 2 x 2 case,

   T = [ 1_A  -f† ]
       [  f   1_B ].   (2.5)

The operator f is the fundamental quantity in this 2 x 2 partitioning formalism.
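A minimal numerical sketch of eqs. (2.2) - (2.4), assuming a random real symmetric H (so that the adjoint reduces to transposition); the construction is illustrative, not one of the thesis's test calculations:

```python
import numpy as np

# Sketch: f and h extracted from the eigenvector blocks of a hermitian H
# via eqs. (2.3) satisfy h = -f' (eq. (2.4)), and X factors as T Xbar
# (eq. (2.2)) with Xbar block diagonal.

rng = np.random.default_rng(2)
n, nA = 5, 2
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
evals, X = np.linalg.eigh(H)            # columns of X: orthonormal eigenvectors

XAA, XAB = X[:nA, :nA], X[:nA, nA:]
XBA, XBB = X[nA:, :nA], X[nA:, nA:]

f = XBA @ np.linalg.inv(XAA)            # eq. (2.3a)
h = XAB @ np.linalg.inv(XBB)            # eq. (2.3b)
assert np.allclose(h, -f.T)             # eq. (2.4)

T = np.block([[np.eye(nA), h],
              [f, np.eye(n - nA)]])     # eq. (2.5)
Xbar = np.block([[XAA, np.zeros((nA, n - nA))],
                 [np.zeros((n - nA, nA)), XBB]])
assert np.allclose(T @ Xbar, X)         # eq. (2.2)
```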
Because of (2.4), it completely determines projection operators, P_A and P_B, onto the two eigenspaces S̄_A and S̄_B. One has, since X(A) = [1_A; f] X_AA from eq. (2.2),

   P_A = X(A) X(A)†  =  [ 1_A ]  X_AA X_AA†  [ 1_A   f† ].
                        [  f  ]

However, from the orthonormality condition, (2.1b), on X, one can write,

   X†X = X̄† g X̄ = 1_n,   (2.6)

where,

   g = T†T = [ 1_A + f†f      0      ]  =  [ g_A   0  ]
             [     0      1_B + ff†  ]     [  0   g_B ].   (2.7)

The matrices g_A and g_B define metrics, with respect to which the truncated eigenvectors in X_AA and X_BB are orthonormal. That is,

   X_AA† g_A X_AA = 1_A,   (2.8a)
and
   X_BB† g_B X_BB = 1_B.   (2.8b)

These truncated eigenvectors are not orthonormal with respect to unity unless f = 0. Since X is invertible, from (2.6) or (2.8), one has,

   g_A = (X_AA X_AA†)^(-1) = 1_A + f†f,   (2.9a)
and
   g_B = (X_BB X_BB†)^(-1) = 1_B + ff†.   (2.9b)

Using (2.9a), the projection P_A can now be written,

   P_A = [ 1_A ] g_A^(-1) [ 1_A   f† ]  =  [  g_A^(-1)      g_A^(-1) f†   ]
         [  f  ]                           [ f g_A^(-1)   f g_A^(-1) f†   ].   (2.10)

In a similar manner, the projection P_B onto the eigenspace S̄_B can be written solely in terms of f as,
However, the space SA is also spanned by any (A) (A) other set of n. vectors vi ' related to the xj, ' by a non-A r r singular nA x nA linear transformation. This transformation 14 corresponds merely to a change of basis in SA» Therefore, 2 there are nA arbitrary, or redundant, complex parameters • (A) present in the specification of SA using Xx '• Thus only nA(n - nA) = nAnB complex parameters are necessary to specify the eigenspace SA* But this is exactly the number of degrees of freedom (or matrix elements) in f. Thus the operator f represents the minimum amount of information necessary to specify a projection onto the eigenspace SA (and therefore, also onto Sfi, which is the complement of SA) of HV This parti tioning formalism is therefore particularly useful in situations in which only eigenspaces have significance, rather than specific eigenvectors. 2»l.b The Defining Condition For f The matrices f and h, defined in the previous subsection, can be obtained by diagonalizing H to get its eigenvectors, X, and then applying the formulas (2.3) directly. However, it is possible to formulate a system of equations for f and h, which do not require knowledge of the eigenvectors of H.. The eigenvalue equation (2.1a) is rewritten as H T = T H, (2.13) where H = X ? X V — A HA 0 0 HB (2.14) is to be block diagonal. The diagonal blocks of eq. (2.13) give expressions for HA and Hfi in terms of f, h, and H, and, FTA " HAA + HABF- (2-15A> »B * HBAH * HBB' (2'15B) If these expressions are substituted back into eq. (2.13), the two off-diagonal blocks become nonlinear matrix equations, and, D(f) = HBA • HBBf - f HA(f) * HBA + HBBf * f HAA - f HABf " °' D'(h) = HMh + HAB - h Hfi(h) " HAB + HAAh ' h HBB * h HBAh = °-(2.16) (2.17) Equations (2.16) and (2.17) are both systems of nAnfi simulta neous nonlinear equations, the first for the matrix elements of f, and the second for the matrix elements of h. 
It is noteworthy that the two systems are not coupled, and thus can be solved independently. Of course, in this case, it is not necessary to solve both (2.16) and (2.17), because if one or the other has been solved, the solution of the remaining system is given by eq. (2.4). In fact, it can easily be seen that D'(h)^\dagger is of the same form in -h^\dagger as D(f) is in f, implying eq. (2.4) without explicitly making use of orthogonality (the hermiticity of H is used, and this, of course, implies the orthogonality condition anyway).

In the 2 x 2 partitioning formalism, eq. (2.16) is the fundamental equation determining the operator f, if an orthonormal basis is used. A number of efficient iterative techniques for the exact solution of (2.16) will be detailed later. The quantity D(f) is closely related to other more commonly used quantities in the determination of eigenprojections. In particular, D(f) will be seen to be related in several ways to the error in an eigenprojection.

2.1.c Rederivation From a Projection Point of View

An alternative approach to this partitioning formalism can be made via the projection operators themselves. The objective is to determine the eigenprojection P_A onto a space S_A spanned by n_A eigenvectors of H, in terms of some minimal set of variables, which number n_A n_B, as shown previously. It is useful to examine this approach in some detail, not only because it provides a different point of view, but also because the projections themselves are manifestly basis independent.
The conditions that P_A be an eigenprojection of H are that P_A commute with H,

[ H , P_A ] = 0,   (2.18)

and that P_A be a projection operator, that is,

P_A^2 = P_A,   P_A^\dagger = P_A,   tr P_A = n_A.   (2.19)

It is convenient to define a partitioning of the basis functions into two sets, spanning spaces S_A and S_B of dimensions n_A and n_B, respectively. Projections onto the spaces S_A and S_B are given by

P_A^{(0)} = \begin{pmatrix} 1_A & 0 \\ 0 & 0 \end{pmatrix},   P_B^{(0)} = \begin{pmatrix} 0 & 0 \\ 0 & 1_B \end{pmatrix}.   (2.20)

This partitioning of the basis functions implies that the projection P_A can be written in block form,

P_A = \begin{pmatrix} P_{AA} & P_{AB} \\ P_{BA} & P_{BB} \end{pmatrix},   (2.21a)

where

P_{AA} = P_A^{(0)} P_A P_A^{(0)},   P_{AB} = P_A^{(0)} P_A P_B^{(0)},
P_{BA} = P_B^{(0)} P_A P_A^{(0)},   P_{BB} = P_B^{(0)} P_A P_B^{(0)}.   (2.21b)

In terms of the partitioned matrix, (2.21), the idempotency condition, P_A^2 = P_A, is equivalent to the three block equations,

P_{AA} - P_{AA}^2 - P_{AB} P_{BA} = 0,
P_{BA} - P_{BA} P_{AA} - P_{BB} P_{BA} = 0,   (2.22)
P_{BB} - P_{BB}^2 - P_{BA} P_{AB} = 0,

the remaining block equation being just the adjoint of the second one. Since there are only n_A n_B independent variables in P_A, it is possible, in principle, to express P_{AA} and P_{BB} in terms of P_{BA}. However, the equations are nonlinear, and while formal general solutions can be written down,

P_{AA} = (1/2)[ 1_A ± (1_A - 4 P_{AB} P_{BA})^{1/2} ],   (2.23a)

and

P_{BB} = (1/2)[ 1_B ± (1_B - 4 P_{BA} P_{AB})^{1/2} ],   (2.23b)

they are seen to contain the ambiguity in the square root, and are generally difficult to evaluate. A more useful result is reached by a different route. The matrix P_A is of rank n_A, because the projection operator onto an n_A-dimensional space has precisely n_A non-zero eigenvalues, corresponding to the n_A eigenvectors of P_A which span the image space S_A. This means that there is at least one n_A x n_A submatrix of P_A which is non-singular. It will be assumed that the partitioning of the basis set, (2.20), is carried out so that P_{AA} is such a submatrix, that is, det(P_{AA}) ≠ 0.
With this assumption, the first equation of (2.22) can be rewritten as

P_{AA} = P_{AA} ( 1_A + P_{AA}^{-1} P_{AB} P_{BA} P_{AA}^{-1} ) P_{AA}.   (2.24)

The quantity inside the brackets in (2.24) will be greatly simplified if P_{BA} is written as some factor times P_{AA}, that is, if

P_{BA} = f P_{AA},   P_{AB} = P_{AA} f^\dagger,   (2.25a)

where f is an n_B x n_A matrix, and thus represents a suitable quantity in terms of which the matrix P_A could be expressed. The existence of f is assured by the invertibility of P_{AA},

f = P_{BA} P_{AA}^{-1}.   (2.25b)

Now, (2.24) yields

P_{AA} = (1_A + f^\dagger f)^{-1}.   (2.26)

From (2.25),

P_{BA} = f (1_A + f^\dagger f)^{-1}.   (2.27)

Finally, substituting (2.25) into the second of eqs. (2.22), and multiplying from the right by f^\dagger (f f^\dagger)^{-1}, yields

P_{BB} = f (1_A + f^\dagger f)^{-1} f^\dagger.   (2.28)

Equation (2.18) can now be used to derive an equation defining the operator f. Expansion of the commutator again yields three unique block equations,

(EQ)_{AA} = (1_A + f^\dagger f)^{-1}(H_{AA} + f^\dagger H_{BA}) - (H_{AA} + H_{AB} f)(1_A + f^\dagger f)^{-1} = 0,

(EQ)_{BA} = (1_B + f f^\dagger)^{-1}(f H_{AA} + f f^\dagger H_{BA}) - (H_{BA} + H_{BB} f)(1_A + f^\dagger f)^{-1} = 0,   (2.29)

(EQ)_{BB} = (1_B + f f^\dagger)^{-1}(f H_{AB} + f f^\dagger H_{BB}) - (H_{BA} + H_{BB} f)(1_A + f^\dagger f)^{-1} f^\dagger = 0.

Here, use has been made of the relations,

f (1_A + f^\dagger f)^{-1} = (1_B + f f^\dagger)^{-1} f,   and   (1_A + f^\dagger f)^{-1} f^\dagger = f^\dagger (1_B + f f^\dagger)^{-1},   (2.30)

to move all of the inverse operators to the outside of each term. It is then seen that

(1_B + f f^\dagger)^{-1} [ f (1_A + f^\dagger f)(EQ)_{AA}(1_A + f^\dagger f) - (1_B + f f^\dagger)(EQ)_{BA}(1_A + f^\dagger f) ] = H_{BA} + H_{BB} f - f H_{AA} - f H_{AB} f = D(f),   (2.31)

and also a corresponding combination of (EQ)_{BA} and (EQ)_{BB},

-[ (1_B + f f^\dagger)(EQ)_{BA}(1_A + f^\dagger f) + (1_B + f f^\dagger)(EQ)_{BB}(1_B + f f^\dagger) f ] (1_A + f^\dagger f)^{-1} = D(f),

where the quantity D(f) has been defined in eq. (2.16). Since the factors 1_A + f^\dagger f and 1_B + f f^\dagger are positive definite, the block equations (2.29) are satisfied if and only if D(f) = 0. That is, the operator f defined in eq. (2.25) is of the same size and satisfies the same defining equation as the partitioning operator f described in the previous two subsections. This result re-emphasizes the fact that this partitioning formalism is based on the idea of defining an eigenspace of a hermitian operator, rather than individual eigenvectors. The 'pull-through' relations, (2.30), are used extensively in the 2 x 2 partitioning formalism.
They are most usefully written as

f g_A^{-1} = g_B^{-1} f,   (2.32a)

and

g_A^{-1} f^\dagger = f^\dagger g_B^{-1},   (2.32b)

in the notation established in the previous subsections.

2.1.d The Relationship Between T and the Eigenprojections — Covariant and Contravariant Representations

The columns of the partitioning matrix T, of eq. (2.2) or (2.5), can be regarded as a set of non-orthonormal basis vectors spanning the original n-dimensional basis space. These vectors will be denoted here by e_r, (r = 1, ..., n), that is,

T = [ e_1, e_2, ..., e_n ] = \begin{pmatrix} 1_A & -f^\dagger \\ f & 1_B \end{pmatrix}.   (2.33)

The metric defined by the scalar products of these vectors, g_{rs} = e_r^\dagger e_s, is given by

g = T^\dagger T = \begin{pmatrix} g_A & 0 \\ 0 & g_B \end{pmatrix},   (2.34)

using the notation developed in eq. (2.7). Using the inverse metric, g^{rs} = (g^{-1})_{rs}, a set of contragredient basis vectors e^r, (r = 1, ..., n), can be defined by

e^r = \sum_{s=1}^{n} g^{sr} e_s,   (2.35)

or

[ e^1, e^2, ..., e^n ] = T g^{-1} = \begin{pmatrix} 1_A & -f^\dagger \\ f & 1_B \end{pmatrix} \begin{pmatrix} g_A^{-1} & 0 \\ 0 & g_B^{-1} \end{pmatrix} = \begin{pmatrix} g_A^{-1} & -f^\dagger g_B^{-1} \\ f g_A^{-1} & g_B^{-1} \end{pmatrix}.   (2.36)

On comparison with eq. (2.10), the first n_A of these vectors e^r can be identified as the first n_A columns of the projection P_A onto S_A. Similarly, the last n_B of the e^r are the last n_B columns of P_B = (1 - P_A), the projection onto the complementary subspace S_B. Thus the two sets of n_A vectors,

e_A = \begin{pmatrix} 1_A \\ f \end{pmatrix}   and   e^A = \begin{pmatrix} (P_A)_{AA} \\ (P_A)_{BA} \end{pmatrix},   (2.37)

are dual (contragredient), both spanning the eigenspace S_A, while the two sets of n_B vectors,

e_B = \begin{pmatrix} -f^\dagger \\ 1_B \end{pmatrix}   and   e^B = \begin{pmatrix} (P_B)_{AB} \\ (P_B)_{BB} \end{pmatrix},   (2.38)

are also dual, both sets spanning the eigenspace S_B. From a different point of view, a metric \Delta can be defined, with respect to which the e_r are orthonormal, namely,

e_r^\dagger \Delta^{-1} e_s = \delta_{rs},   (r, s = 1, ..., n).   (2.39)

That is,

\Delta = \sum_r e_r e_r^\dagger = T T^\dagger.   (2.40)

Here \Delta^{-1} is the same numerically as g^{-1} above. Similarly, the e^r are orthonormal with respect to \Delta, which is the same numerically as g above. It should be noted that \Delta and g, and \Delta^{-1} and g^{-1}, as denoted here are, in principle, quite different quantities. They happen to be numerically identical here because \Delta = T T^\dagger (and g = T^\dagger T). Such is not the case, however, if the original basis is non-orthonormal.
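The pull-through relations (2.32) and the covariant/contravariant construction of eqs. (2.33)-(2.40) can be checked directly. The sketch below (NumPy; all names are illustrative) verifies both pull-through identities, the block-diagonal metric (2.34), the identification (2.37) of the first n_A contragredient vectors with the leading columns of P_A, and the numerical coincidence T T† = T†T = g noted above.

```python
import numpy as np

rng = np.random.default_rng(1)
nA, nB = 2, 3
f = rng.standard_normal((nB, nA))
gA = np.eye(nA) + f.T @ f                          # eq. (2.9a)
gB = np.eye(nB) + f @ f.T                          # eq. (2.9b)

# 'Pull-through' relations, eq. (2.32).
assert np.allclose(f @ np.linalg.inv(gA), np.linalg.inv(gB) @ f)
assert np.allclose(np.linalg.inv(gA) @ f.T, f.T @ np.linalg.inv(gB))

# Covariant vectors e_r = columns of T (2.33); metric g = T†T (2.34).
T = np.block([[np.eye(nA), -f.T], [f, np.eye(nB)]])
g = T.T @ T
assert np.allclose(g, np.block([[gA, np.zeros((nA, nB))],
                                [np.zeros((nB, nA)), gB]]))

# Contragredient vectors e^r = columns of T g^{-1}, eq. (2.36).
Ec = T @ np.linalg.inv(g)

# First nA contragredient vectors = first nA columns of P_A, eq. (2.37).
PA_cols = np.vstack([np.linalg.inv(gA), f @ np.linalg.inv(gA)])
assert np.allclose(Ec[:, :nA], PA_cols)

# Duality e^{r†} e_s = delta_rs, and the coincidence TT† = T†T = g.
assert np.allclose(Ec.T @ T, np.eye(nA + nB))
assert np.allclose(T @ T.T, g)
```

The last assertion is exactly the reason the metric Delta of (2.40) and g of (2.34) agree numerically in an orthonormal basis.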
These sets of contragredient vectors are very useful for writing a number of important relations, to be developed later, in a very compact manner.

2.1.e Variational Formulation of D(f) = 0

The expectation value of an operator with respect to one of its eigenvectors is stationary with respect to arbitrary small variations in that eigenvector. As a result, if P_A is a projection onto the eigenspace S_A of the operator H, then the expectation value of H over S_A, given by

E = tr P_A H,   (2.41)

will be stationary with respect to arbitrary small variations in P_A. That is,

E(P_A + \delta P_A) - E(P_A) = tr[ P_A(f + \delta f) - P_A(f) ] H = tr \delta P_A H + O(\delta^2),   (2.42)

must vanish to first order in the infinitesimals. It is assumed here that H is independent of P_A or f. From eqs. (2.10), to first order,

(\delta P_A)_{AA} = -g_A^{-1} \delta g_A g_A^{-1},

and

(\delta P_A)_{BB} = -f g_A^{-1} \delta g_A g_A^{-1} f^\dagger + \delta f g_A^{-1} f^\dagger + f g_A^{-1} \delta f^\dagger,   (2.43a)

with the off-diagonal blocks treated similarly, where, to first order,

\delta g_A = \delta f^\dagger f + f^\dagger \delta f.   (2.43b)

Substitution of (2.43a,b) into eq. (2.42), followed by use of the cyclic property of the trace, and the 'pull-through' relations (2.32) for f and f^\dagger, results in an expression of the form

\delta E = tr \delta f^\dagger \bar{D} + tr \delta f \bar{D}^\dagger + O(\delta^2),   (2.44)

where

\bar{D} = g_B^{-1} D(f) g_A^{-1}.   (2.45)

Because \delta f and \delta f^\dagger are arbitrary variations in f and f^\dagger, the condition that \delta E vanish in first order is that the matrix \bar{D} vanish. The matrices g_A^{-1} and g_B^{-1} are positive definite, however, and thus \bar{D} can vanish only if D(f) itself vanishes. Thus, the condition that the expectation value of H over the image space of the projection P_A be stationary is equivalent to the condition D(f) = 0, eq. (2.16).

It is also interesting to note here that the quantity \bar{D} in eq. (2.45) is the BA block of the hamiltonian H, in the basis of contragredient, non-orthonormal vectors e^r of eq. (2.36). Thus one can write
\bar{D} = [ (1 - P_A) H P_A ]_{BA},   (2.46a)

or

\bar{D}_{\rho t} = (e^\rho)^\dagger H e^t,   (e^\rho in {e^B}, e^t in {e^A}),   (2.46b)

and the rate of change of the expectation value E with an element of f is seen to be proportional to the corresponding element of the off-diagonal block of the hamiltonian in this particular basis.

2.1.f Relation Between \sigma^2 and D(f) — Eigenvalue Dispersion

In the study of matrix eigenvalue problems, it is useful to define the variance \sigma^2, which is a measure of the error in an approximate eigenvector, x, of a matrix H, given by

\sigma^2 = \frac{(Hx - \lambda x)^\dagger (Hx - \lambda x)}{x^\dagger x}.   (2.47)

If the approximate eigenvalue, \lambda, is calculated as the Rayleigh quotient of H with respect to x,

\lambda = \frac{x^\dagger H x}{x^\dagger x},   (2.48)

then eq. (2.47) becomes

\sigma^2 = \frac{x^\dagger H^2 x}{x^\dagger x} - \lambda^2,   (2.49)

which is in the form of the usual definition of variance. In terms of the projection P_x = x x^\dagger, onto the one-dimensional space spanned by the normalized vector x, eq. (2.49) can be written as

\sigma^2 = tr H(1 - P_x) H P_x.   (2.50)

Equation (2.50) suggests a generalization of the concept of the variance \sigma^2 to apply to projections P_A onto a multidimensional space spanned by several approximate eigenvectors of H. Substitution of eq. (2.10) for P_A into eq. (2.50), and use of (2.16), gives

\sigma^2 = tr H(1 - P_A) H P_A = tr g_B^{-1} D(f) g_A^{-1} D(f)^\dagger = || g_B^{-1/2} D(f) g_A^{-1/2} ||^2,   (2.51)

where ||A||^2 = \sum_{r,s} |A_{rs}|^2 denotes the Hilbert-Schmidt norm of the matrix A. This may also be written in the form

\sigma^2 = -\frac{1}{2} tr( [ H , P_A ]^2 ).   (2.52)

If P_A is an exact eigenprojection of H, the variance \sigma^2 in (2.51) must vanish, because then [ H , P_A ] = 0. Since g_A^{-1} and g_B^{-1} are positive definite matrices, \sigma^2 can vanish only if D(f) = 0. In this case, D(f) is seen to give a quantitative measure of the error in P_A, rather than merely a criterion for the presence or absence of error. In terms of matrix elements, one has

\sigma^2 = \sum_{\rho,t} |( g_B^{-1/2} D(f) g_A^{-1/2} )_{\rho t}|^2 = \sum_{\rho,t} |\langle \phi_\rho^0 | H | \phi_t^0 \rangle|^2,   (2.53)

where \phi_\rho, (\rho = 1, ..., n_B), and \phi_t, (t = 1, ..., n_A), are basis elements in the subspaces S_B and S_A, respectively. The \phi_\rho^0, (\rho = 1, ...,
n_B), and the \phi_t^0, (t = 1, ..., n_A), are the orthogonalized transformed basis vectors,

( \phi_1^0, ..., \phi_n^0 ) = ( \phi_1, ..., \phi_n ) T g^{-1/2} = ( \phi_1, ..., \phi_n ) \begin{pmatrix} 1_A & -f^\dagger \\ f & 1_B \end{pmatrix} \begin{pmatrix} g_A^{-1/2} & 0 \\ 0 & g_B^{-1/2} \end{pmatrix},   (2.54)

in the basis of the \phi_t and \phi_\rho above. Thus \sigma^2 is seen to be a measure of the smallness of the elements of the off-diagonal block of H in this basis. Using the closure relation \sum_\rho |\phi_\rho^0\rangle\langle\phi_\rho^0| = 1 - \sum_t |\phi_t^0\rangle\langle\phi_t^0|, eq. (2.53) can be rewritten as

\sigma^2 = \sum_{t \in S_A} \langle \phi_t^0 | H^2 | \phi_t^0 \rangle - \sum_{t,s \in S_A} |\langle \phi_t^0 | H | \phi_s^0 \rangle|^2,   (2.55)

where \phi_t^0, \phi_s^0 in these summations run over vectors in the space S_A only. On transforming these vectors to a new set \psi_n, (n = 1, ..., n_A), which diagonalizes H in S_A, eq. (2.55) becomes

\sigma^2 = \sum_{n=1}^{n_A} [ \langle \psi_n | H^2 | \psi_n \rangle - \langle \psi_n | H | \psi_n \rangle^2 ] = \sum_{n=1}^{n_A} \sigma_n^2.   (2.56)

If f is an exact solution of (2.16), uncoupling the parts of the \phi_t^0, (t = 1, ..., n_A), in S_A, and the \phi_\rho^0, (\rho = 1, ..., n_B), in S_B, exactly, then the \psi_n in (2.56) are exact eigenvectors of H, and each \sigma_n is identically zero. If f is not exact, then the \psi_n will be only approximate eigenvectors of H, and \sigma_n^2 is the variance of H with respect to the single approximate eigenvector \psi_n. Thus \sigma^2 is the sum over these individual variances, and is useful not only as a quantitative measure of the accuracy of f, but also as an upper bound to the individual \sigma_n^2.

2.1.g Transformation of f Under a Change of Basis

The quantity f defined by eq. (2.2) is clearly dependent upon the basis set being used. Because of eq. (2.3), it does not transform linearly under a linear transformation of the basis vectors. Consider the linear transformation,

\phi_i' = \sum_j \phi_j (V^{-1})_{ji},   (2.57)

of the basis vectors, so that the eigenvectors of H, referred to the new basis {\phi_i'}, have coefficients

X' = V X.   (2.58)

In the new basis, partitionings of eigenvectors and basis vectors similar to those described in section 2.1.a can be carried out, yielding

X' = \begin{pmatrix} X'_{AA} & X'_{AB} \\ X'_{BA} & X'_{BB} \end{pmatrix} = \begin{pmatrix} 1_A & -f'^\dagger \\ f' & 1_B \end{pmatrix} \begin{pmatrix} X'_{AA} & 0 \\ 0 & X'_{BB} \end{pmatrix} = T' \tilde{X}',   (2.59a)

where

f' = X'_{BA} X'^{-1}_{AA},   (2.59b)

analogous to eqs. (2.2) and (2.3).
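Before deriving the transformation law, the variance identities of section 2.1.f can be confirmed numerically. The sketch below (NumPy; names are illustrative) uses a deliberately inexact f and checks that the three expressions for the dispersion in eqs. (2.51) and (2.52) agree.

```python
import numpy as np

def inv_sqrt(M):
    """Inverse square root of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(w ** -0.5) @ U.T

rng = np.random.default_rng(3)
n, nA = 5, 2
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
f = 0.1 * rng.standard_normal((n - nA, nA))        # a deliberately inexact f

gA = np.eye(nA) + f.T @ f
gB = np.eye(n - nA) + f @ f.T
PA = np.block([[np.linalg.inv(gA), np.linalg.inv(gA) @ f.T],
               [f @ np.linalg.inv(gA), f @ np.linalg.inv(gA) @ f.T]])

HAA, HAB = H[:nA, :nA], H[:nA, nA:]
HBA, HBB = H[nA:, :nA], H[nA:, nA:]
D = HBA + HBB @ f - f @ HAA - f @ HAB @ f          # eq. (2.16), nonzero here

s2_trace = np.trace(H @ (np.eye(n) - PA) @ H @ PA)               # eq. (2.51)
s2_norm = np.sum((inv_sqrt(gB) @ D @ inv_sqrt(gA)) ** 2)         # HS-norm form
C = H @ PA - PA @ H
s2_comm = -0.5 * np.trace(C @ C)                                 # eq. (2.52)

assert np.isclose(s2_trace, s2_norm)
assert np.isclose(s2_trace, s2_comm)
```

All three expressions give the same positive number for an inexact f, and all vanish when f is exact, which is what makes sigma^2 a practical error measure for f.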
To obtain the relationship between f' and f, we proceed as follows. From (2.2),

X' = V X = V T \tilde{X} = T' \tilde{X}',

or

T' = V T \tilde{X} \tilde{X}'^{-1},   (2.60)

where the right hand side of eq. (2.60) is independent of f', but does depend on the truncated new eigenvectors \tilde{X}'. However, from the AA block equation of (2.60), it follows that

X'_{AA} = (V_{AA} + V_{AB} f) X_{AA}.   (2.61)

Substitution of this equation into the BA block equation of (2.60) gives f' in terms of f and V only,

f' = (V_{BA} + V_{BB} f)(V_{AA} + V_{AB} f)^{-1}.   (2.62)

While such a complicated transformation can be very inconvenient in some cases, it is also a feature which can be usefully exploited. In calculations in which the elements of f are acting as coordinates, the metric character of the object function can be radically altered by a simple basis change, because of the nonlinear dependence of f' on V. For quantities transforming linearly in V, such a basis change merely results in a rotation of the object function. This point is discussed in greater detail in chapters 5 and 8. If f is small, the inverse matrix in (2.62) can be expanded as

(V_{AA} + V_{AB} f)^{-1} = [ V_{AA}(1_A + V_{AA}^{-1} V_{AB} f) ]^{-1} = V_{AA}^{-1} - V_{AA}^{-1} V_{AB} f V_{AA}^{-1} + ...,   (2.63)

and thus, to first order in f,

f' = V_{BA} V_{AA}^{-1} + (V_{BB} - V_{BA} V_{AA}^{-1} V_{AB}) f V_{AA}^{-1} + ... .   (2.64)

Thus, if f is small, the transformation (2.62) is nearly linear, although not homogeneous.

2.2 Effective Operators

2.2.a Basic Definitions

The primary application of the partitioning formalism just described is in the construction of effective operators. In this context, such operators are defined in either of the subspaces S_A or S_B of the full basis space, but their eigenvalues form a subset of the eigenvalues of the original operator in the full basis space, and the corresponding eigenvectors are related in some way to those of the original operator. There are two ways of regarding the matrices of such effective operators.
They can be regarded as the matrix of a transformed operator in the old basis (active sense), or, alternatively, as the matrix of the old operator in a transformed basis (passive sense). Both points of view are equivalent, but in what follows, the former will be emphasized. The simplest set of such effective operators for the matrix H has already been defined in equations (2.14) and (2.15). In S_A, we have the operator

\tilde{H}_A = H_{AA} + H_{AB} f,   (2.65a)

with the eigenvalue equation

\tilde{H}_A X_{AA} = X_{AA} E^{(A)},   (2.65b)

and in S_B,

\tilde{H}_B = H_{BB} - H_{BA} f^\dagger,   (2.66a)

with the eigenvalue equation,

\tilde{H}_B X_{BB} = X_{BB} E^{(B)}.   (2.66b)

Both \tilde{H}_A and \tilde{H}_B are non-hermitian in general, although their eigenvalues E^{(A)} and E^{(B)} are real, since they are subsets of the eigenvalues of the hermitian operator H. The eigenvectors X_{AA} and X_{BB} are not orthonormal in general, because they are truncations of the orthonormal eigenvectors X of the full hamiltonian H. It is possible to derive a pair of self-adjoint effective operators directly from the eigenvalue equation (2.1a). Premultiplication by T^\dagger, and use of eq. (2.2), yields

G_A X_{AA} = g_A X_{AA} E^{(A)},   (2.67a)

where

G_A = (T^\dagger H T)_{AA} = H_{AA} + H_{AB} f + f^\dagger H_{BA} + f^\dagger H_{BB} f,   (2.67b)

and

G_B X_{BB} = g_B X_{BB} E^{(B)},   (2.68a)

where

G_B = (T^\dagger H T)_{BB} = H_{BB} - H_{BA} f^\dagger - f H_{AB} + f H_{AA} f^\dagger.   (2.68b)

The off-diagonal block of T^\dagger H T is given by

G_{BA} = H_{BA} + H_{BB} f - f H_{AA} - f H_{AB} f,   (2.69)

which is just the quantity D(f), defined in eq. (2.16). When G_{BA} = 0, it can be shown that

G_A = g_A \tilde{H}_A,   G_B = g_B \tilde{H}_B,   (2.70)

using eqs. (2.9) and the definitions of the effective operators presented above. Thus, when f is known exactly, the self-adjoint effective operators G_A and G_B could be considered to be obtained from the non-selfadjoint effective operators \tilde{H}_A and \tilde{H}_B by orthonormalizing the eigenvectors of the latter. It is also possible to obtain self-adjoint effective operators in S_A and S_B by orthonormalizing the truncated eigenvectors.
The effective operators \tilde{H}_A, \tilde{H}_B, G_A, and G_B above are uniquely determined once particular partitionings of the basis and eigenvector spaces are defined. The self-adjoint effective operators obtained by orthonormalization are not unique, however, in that they depend on the particular orthonormalization procedure employed. The symmetrical orthogonalization procedure of Lowdin (1970) and others has the feature that the new orthonormalized vectors resemble the initial vectors as closely as possible, in a particular sense.¹ Applied to the present case, the new orthonormal eigenvectors are given by

C_{AA} = g_A^{1/2} X_{AA},   (2.71a)

in S_A, and

C_{BB} = g_B^{1/2} X_{BB},   (2.71b)

in S_B. Thus one has

C_{AA}^\dagger C_{AA} = X_{AA}^\dagger g_A X_{AA} = 1_A,
C_{BB}^\dagger C_{BB} = X_{BB}^\dagger g_B X_{BB} = 1_B,   (2.72)

by eq. (2.8). The eigenvalue equation in S_A is obtained either by premultiplying (2.65b) by g_A^{1/2} or (2.67a) by g_A^{-1/2}, to get

\bar{H}_A C_{AA} = C_{AA} E^{(A)},   (2.73)

where

\bar{H}_A = g_A^{1/2} \tilde{H}_A g_A^{-1/2}   (2.74a)
         = g_A^{-1/2} G_A g_A^{-1/2}.   (2.74b)

Similarly, premultiplication of (2.66b) by g_B^{1/2} or (2.68a) by g_B^{-1/2} gives the equation

\bar{H}_B C_{BB} = C_{BB} E^{(B)},   (2.75)

where

\bar{H}_B = g_B^{1/2} \tilde{H}_B g_B^{-1/2}   (2.76a)
         = g_B^{-1/2} G_B g_B^{-1/2}.   (2.76b)

It is also possible to define effective operators in either S_A or S_B for any other operator defined in the unpartitioned space. For some operator

M = \begin{pmatrix} M_{AA} & M_{AB} \\ M_{BA} & M_{BB} \end{pmatrix},   (2.77)

one defines

\hat{M}_A = M_{AA} + M_{AB} f + f^\dagger M_{BA} + f^\dagger M_{BB} f,   (2.78)

so that \hat{M}_A has the same form in M as G_A, defined in (2.67b), has in H. Here \hat{M}_A has the same expectation values for the truncated eigenvectors X_{AA} as the operator M has for the full eigenvectors X. An effective operator with the same properties with respect to the orthonormalized eigenvectors is clearly given by

\bar{M}_A = g_A^{-1/2} \hat{M}_A g_A^{-1/2},   (2.79)

which is analogous to \bar{H}_A defined in eq. (2.74). The analogue of the effective operator \tilde{H}_A of eq. (2.65a) can be obtained by premultiplying \hat{M}_A by g_A^{-1}, following eq. (2.70).

¹In the notation used above, the difference between the two sets is measured by \sum_{ij} |C_{ij} - X_{ij}|^2, which is minimized if C is given by eq. (2.71) (Lowdin, 1970).
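The relationships among the three A-space effective operators can be demonstrated concretely. The sketch below (NumPy; names are illustrative) checks that, for an exact f, the non-hermitian H-tilde of (2.65a), the hermitian G_A of (2.67b), and the Lowdin-orthonormalized H-bar of (2.74b) all carry the same n_A eigenvalues of H, that (2.70) and (2.72) hold, and that the eigenvectors C_AA of (2.71a) satisfy (2.73).

```python
import numpy as np

def mat_pow(M, p):
    """Real power of a symmetric positive definite matrix via its eigensystem."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(w ** p) @ U.T

rng = np.random.default_rng(2)
n, nA = 6, 3
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
E, X = np.linalg.eigh(H)
XAA = X[:nA, :nA]
f = X[nA:, :nA] @ np.linalg.inv(XAA)               # exact f, eq. (2.3)

HAA, HAB = H[:nA, :nA], H[:nA, nA:]
HBA, HBB = H[nA:, :nA], H[nA:, nA:]
gA = np.eye(nA) + f.T @ f

HtA = HAA + HAB @ f                                # eq. (2.65a), non-hermitian
GA = HtA + f.T @ (HBA + HBB @ f)                   # eq. (2.67b), hermitian
HbA = mat_pow(gA, -0.5) @ GA @ mat_pow(gA, -0.5)   # eq. (2.74b), hermitian

assert np.allclose(GA, GA.T)
assert np.allclose(GA, gA @ HtA)                   # eq. (2.70), since D(f) = 0
assert np.allclose(np.sort(np.linalg.eigvals(HtA).real), E[:nA])
assert np.allclose(np.linalg.eigvalsh(HbA), E[:nA])

CAA = mat_pow(gA, 0.5) @ XAA                       # eq. (2.71a)
assert np.allclose(CAA.T @ CAA, np.eye(nA))        # eq. (2.72)
assert np.allclose(HbA @ CAA, CAA @ np.diag(E[:nA]))   # eq. (2.73)
```

Here S_A is taken as the span of the lowest n_A eigenvectors of H, so the common spectrum is the lowest n_A eigenvalues; any other choice of eigenspace works the same way.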
Effective operators for M restricted to S_B, analogous to (2.77) - (2.79), can be obtained in a similar manner.

2.2.b Eigenvectors and Eigenvalues of the Effective Operators

In order to amplify the material in the immediately preceding subsection, the connection between the eigenvalues and eigenvectors of the operator H and those of the effective operators \tilde{H}, G, and \bar{H} will be illustrated here from a different point of view. The full operator H has the eigenvalue equation

H \psi_i = \epsilon_i \psi_i.   (2.80)

Once the two basis spaces, S_A and S_B, are defined, each eigenvector can be written as a sum of two parts,

\psi_i = \psi_{iA} + \psi_{iB},   (2.81)

one part in S_A and one in S_B. The eigenvectors are themselves divided into two sets, \psi_i^{(A)}, (i = 1, ..., n_A), and \psi_i^{(B)}, (i = 1, ..., n_B), where n_A + n_B = n, according as they lie in S_A or S_B. The basic property of f is to map the part of \psi_i^{(A)} in S_A into the part in S_B, according to

\psi_{iB}^{(A)} = f \psi_{iA}^{(A)}.   (2.82a)

Similarly, one has

\psi_{iA}^{(B)} = -f^\dagger \psi_{iB}^{(B)}.   (2.82b)

Combination with eq. (2.81) yields

\psi_i^{(A)} = \psi_{iA}^{(A)} + \psi_{iB}^{(A)} = \psi_{iA}^{(A)} + f \psi_{iA}^{(A)} = (1_A + f) \psi_{iA}^{(A)},   (2.83a)

and

\psi_i^{(B)} = (1_B - f^\dagger) \psi_{iB}^{(B)}.   (2.83b)

In the notation used here and throughout this subsection, the operator f is to be regarded, when necessary, as embedded in the n-dimensional basis space, but will be denoted by the same symbol as before. The eigenvalue equation for \tilde{H}_A is

\tilde{H}_A \psi_{iA}^{(A)} = \epsilon_i^{(A)} \psi_{iA}^{(A)},   (i = 1, ..., n_A),   (2.84a)

where the eigenvectors satisfy

\langle \psi_{iA}^{(A)} | g_A | \psi_{jA}^{(A)} \rangle = \delta_{ij},   (i, j = 1, ..., n_A).   (2.84b)

For the effective operator G_A, it is

G_A \psi_{iA}^{(A)} = \epsilon_i^{(A)} g_A \psi_{iA}^{(A)},   (i = 1, ..., n_A),   (2.85)

with the same orthonormality conditions (2.84b). Finally, for the effective operator \bar{H}_A, the eigenvalue equation is

\bar{H}_A \chi_i^{(A)} = \epsilon_i^{(A)} \chi_i^{(A)},   (i = 1, ..., n_A),   (2.86a)

where

\chi_i^{(A)} = g_A^{1/2} \psi_{iA}^{(A)},   (2.86b)

and

\langle \chi_i^{(A)} | \chi_j^{(A)} \rangle = \delta_{ij},   (i, j = 1, ..., n_A).   (2.86c)

In terms of the eigenvectors of \bar{H}_A, the eigenvectors of the original operator H are

\psi_i^{(A)} = (1_A + f) g_A^{-1/2} \chi_i^{(A)}.   (2.87)
(2.87) In all of these equations, the eigenvalues j» (i s i» •-•>•§ nA), are exactly the nA eigenvalues of the original operator H corresponding to the eigenvectors f'j^* (i a 1, nA) • The eigenvalue equations for the effective operators Hg, Gg.,, and Hg, defined in; Sg,, are of the same form as those given above for the corresponding effective operators in SA» Finally, consider the projections PA and Pg, onto the eigenspaces SA, and Sg., respectively. For PA, "A PA * f l^^x^^^l "A » . E (1A • f) Ifi^xfiVl (1A • ff) (2.88a) Here r E |^ [^x^l^l * g(A)-1 „ (2.88b) defines an embedding of the inverse of the metric gA in the n>-dimensional basis space. Similarly, Pl. E •|VjB)»«tiB>l i=l 37. - £ ' (1B- ff) l^iftX^i^ldB - *>.- (2.89a) where„ X ltiB><fifrl * g<B?M. (2.89b) is an embedding of the inverse of the metric gfi in the full n-dimensional basis space. 2.2.C Relationships With Other Formulations Many of the quantities defined or derived above have appeared in one form or another in the literature, usually in connection with the calculation of effective operators in a perturbation formalism.. The treatment by Friedrichs (1965) of an isolated part of the spectrum of an operator H, is particularly interesting in this regard. Several interrela tions between the current non-canonical formulation and the more commonly used unitary methods are illustrated by rewriting some of the quantities introduced in that treatment, using the block notation employed here.. Following Friedrichs, the aim; here is to obtain an expres sion for a projection operator P^ onto a space spanned by a set of eigenvectors which correspond to an isolated part of the spectrum of some perturbed operator H.. Rather than requiring that the projection P- be orthogonal (that is,, that the operator Fc be hermitian), or explicitly idempotent, it is 38:. 
required only that

P_\epsilon P_o = P_\epsilon,   P_o P_\epsilon = P_o,   (2.90)

where P_o is a projection onto the corresponding eigenspace of the unperturbed operator H_o. These linear conditions, (2.90), imply idempotency,

P_\epsilon^2 = P_\epsilon P_o P_\epsilon = P_\epsilon P_o = P_\epsilon,   and   P_o^2 = P_o P_\epsilon P_o = P_o P_\epsilon = P_o,

thus verifying that P_o and P_\epsilon are projections. However, by themselves, they do not imply that P_\epsilon^\dagger = P_\epsilon, or that P_o^\dagger = P_o. Equations (2.90) represent the minimal conditions for P_\epsilon to be a projection, without prescribing any information about the internal structure of its image space. In a basis adapted to the solution of the zero order problem, that is, with the matrix representation,

P_o = \begin{pmatrix} 1_A & 0 \\ 0 & 0 \end{pmatrix},   (2.91)

where the subscript A denotes the space spanned by the zero order eigenvectors of interest, the form of the matrix representation of P_\epsilon is restricted by (2.90) to

P_\epsilon = \begin{pmatrix} 1_A & 0 \\ f_\epsilon & 0 \end{pmatrix},   (2.92)

where f_\epsilon is a matrix undetermined by (2.90). It is now possible to define mappings, U_\epsilon^+ (S_o -> S_\epsilon) and U_\epsilon^- (S_\epsilon -> S_o), between the space S_\epsilon spanned by the eigenfunctions of the perturbed operator, and the space S_o spanned by the corresponding eigenfunctions of the unperturbed operator. In terms of the projections P_o and P_\epsilon, these mappings are

U_\epsilon^\pm = 1 \pm (P_\epsilon - P_o),   (2.93)

as given by Friedrichs. It then follows that the operator

\hat{H}_\epsilon = U_\epsilon^- H_\epsilon U_\epsilon^+   (2.94)

maps S_o to S_o, but has the same spectrum as H_\epsilon, the perturbed operator. That is, \hat{H}_\epsilon is an effective operator in the space S_o. In the matrix notation introduced above, the mappings U_\epsilon^\pm are

U_\epsilon^\pm = \begin{pmatrix} 1_A & 0 \\ \pm f_\epsilon & 1_B \end{pmatrix},   (2.95)

where the subscript B denotes the space of all eigenvectors of H_o except those of interest. Thus, in the notation developed in the previous sections,

\hat{H}_\epsilon = \begin{pmatrix} \tilde{H}_A(f_\epsilon) & H_{AB} \\ D(f_\epsilon) & H_{BB} - f_\epsilon H_{AB} \end{pmatrix}.   (2.96)

It is possible to define a new set of unitary mappings, \hat{U}_\epsilon^\pm, which map between S_o and S_\epsilon, and vice versa, as

\hat{U}_\epsilon^+ = \begin{pmatrix} 1_A & -f_\epsilon^\dagger \\ f_\epsilon & 1_B \end{pmatrix} g_\epsilon^{-1/2},   \hat{U}_\epsilon^- = (\hat{U}_\epsilon^+)^\dagger,   (2.97)

where g_\epsilon = diag(g_A(f_\epsilon), g_B(f_\epsilon)). Using (2.97) instead of (2.95) in eq. (2.94), a new transformed perturbed operator is obtained,

\hat{H}_\epsilon' = \hat{U}_\epsilon^- H_\epsilon \hat{U}_\epsilon^+ = g_\epsilon^{-1/2} \begin{pmatrix} G_A(f_\epsilon) & D(f_\epsilon)^\dagger \\ D(f_\epsilon) & G_B(f_\epsilon) \end{pmatrix} g_\epsilon^{-1/2},   (2.98)

which is self-adjoint. The operators G_A and G_B are given by eqs. (2.67b) and (2.68b).
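The similarity transform (2.94)-(2.96) is simple to verify in finite dimensions. The sketch below (NumPy; names are illustrative) applies the non-unitary mappings U_eps^± of (2.95) to a random symmetric H and checks the block structure of the transformed operator and the preservation of the spectrum.

```python
import numpy as np

rng = np.random.default_rng(9)
n, nA = 5, 2
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
fe = rng.standard_normal((n - nA, nA))             # an arbitrary f_eps

# Non-unitary mappings U_eps^+ and U_eps^-, eq. (2.95); note U^- U^+ = 1.
Up = np.block([[np.eye(nA), np.zeros((nA, n - nA))],
               [fe, np.eye(n - nA)]])
Um = np.block([[np.eye(nA), np.zeros((nA, n - nA))],
               [-fe, np.eye(n - nA)]])
Heff = Um @ H @ Up                                 # eq. (2.94)

HAA, HAB = H[:nA, :nA], H[:nA, nA:]
HBA, HBB = H[nA:, :nA], H[nA:, nA:]
D = HBA + HBB @ fe - fe @ HAA - fe @ HAB @ fe      # eq. (2.16)

# Block structure of eq. (2.96): the BA block is exactly D(f_eps).
assert np.allclose(Heff[:nA, :nA], HAA + HAB @ fe)
assert np.allclose(Heff[:nA, nA:], HAB)
assert np.allclose(Heff[nA:, :nA], D)

# Similarity transform: same spectrum as H for any f_eps.
assert np.allclose(np.sort(np.linalg.eigvals(Heff).real), np.linalg.eigvalsh(H))
```

When f_eps solves D(f_eps) = 0 the BA block vanishes and the transform is block upper triangular, which is the partial reduction toward Hessenberg form mentioned below.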
These results are in accord with the fact that the non-selfadjointness of the operators \tilde{H}_A and \tilde{H}_B, introduced in the previous section, is associated with the fact that the mappings between the two spaces S_A and S_B are not unitary (that is, they do not leave the inner product unchanged). We point out that \hat{H}_\epsilon is block diagonal when the matrix block f_\epsilon of U_\epsilon^\pm satisfies D(f_\epsilon) = 0, eq. (2.16). It is interesting to note that choosing the matrix block f_\epsilon in U_\epsilon^\pm of eq. (2.95) to satisfy (2.16) is equivalent to a partial reduction of H_\epsilon toward the upper Hessenberg form, the result of a non-unitary procedure used in numerical matrix diagonalization. However, \hat{H}_\epsilon is not exactly upper Hessenberg even if D(f_\epsilon) vanishes, because the diagonal blocks \tilde{H}_A and \tilde{H}_B are not upper triangular in general. Finally, note that Friedrichs introduces an operator (P_\epsilon^\dagger P_\epsilon)^{-1}, which is defined only in the image space S_o of P_o. In the matrix notation used above,

P_\epsilon^\dagger P_\epsilon = 1_A + f_\epsilon^\dagger f_\epsilon = (g_\epsilon)_A.   (2.99)

Thus the orthogonal projection onto S_\epsilon, given by Friedrichs as P_\epsilon (P_\epsilon^\dagger P_\epsilon)^{-1} P_\epsilon^\dagger, is written in matrix notation here as

P = \begin{pmatrix} 1_A \\ f_\epsilon \end{pmatrix} (g_\epsilon)_A^{-1} \begin{pmatrix} 1_A & f_\epsilon^\dagger \end{pmatrix} = \begin{pmatrix} (g_\epsilon)_A^{-1} & (g_\epsilon)_A^{-1} f_\epsilon^\dagger \\ f_\epsilon (g_\epsilon)_A^{-1} & f_\epsilon (g_\epsilon)_A^{-1} f_\epsilon^\dagger \end{pmatrix},   (2.100)

which is identical to the projection P_A of eq. (2.10). Finally, we also point out that the operator \bar{H}, defined by symmetrical orthogonalization in eqs. (2.74) and (2.76), coincides with operators of Sz.-Nagy (1946/47; see also Riesz and Sz.-Nagy, 1955, §136), Primas (1961, 1963), and also Kato (1966, Remark 4.4 of chapter 2).

2.3 Generalization to a Nonorthonormal Basis Set

The formalism presented in the first part of this chapter can easily be generalized to the situation in which the basis functions \phi_i, (i = 1, ..., n), being used are not orthonormal. In this case, the eigenvalue equation has the form

H X = S X E,   (2.101a)

with normalization

X^\dagger S X = 1_n,   (2.101b)

where the elements of the matrix S are the inner products of the basis functions, S_{ij} = \langle \phi_i | \phi_j \rangle.
The partitioning of the basis set, and of the eigenvectors of H into two sets of dimensions n_A and n_B, respectively, is carried out exactly as before, leading to eq. (2.2),

X = \begin{pmatrix} X_{AA} & X_{AB} \\ X_{BA} & X_{BB} \end{pmatrix} = T \tilde{X},

where f and h are again formally given by (2.3). However, as a result of the more complicated normalization condition (2.101b), the simple relation (2.4) is now replaced by

h = -(S_{AA} + f^\dagger S_{BA})^{-1}(S_{AB} + f^\dagger S_{BB}).   (2.102)

Because of the complexity of (2.102), it is convenient here to retain the notation h and f throughout, rather than eliminate h entirely, as was done for the orthonormal case. The metric matrices for the truncated eigenvectors, as in eq. (2.8), are given by the diagonal blocks of the product T^\dagger S T,

g_A = S_{AA} + S_{AB} f + f^\dagger S_{BA} + f^\dagger S_{BB} f,   (2.103a)

and

g_B = S_{BB} + S_{BA} h + h^\dagger S_{AB} + h^\dagger S_{AA} h.   (2.103b)

The projection P_A still has the form (2.10), but the projection P_B must here be written

P_B = \begin{pmatrix} h \\ 1_B \end{pmatrix} g_B^{-1} \begin{pmatrix} h^\dagger & 1_B \end{pmatrix} = \begin{pmatrix} h g_B^{-1} h^\dagger & h g_B^{-1} \\ g_B^{-1} h^\dagger & g_B^{-1} \end{pmatrix}.   (2.104)

These projections are self-adjoint, but now the idempotency conditions become (P_A S)^2 = P_A S, and (P_B S)^2 = P_B S, as can be verified by direct matrix multiplication. Also, it can easily be shown that tr P_A S = n_A, and tr P_B S = n_B. The defining conditions on f and h can be obtained from the analogue of eq. (2.13), namely,

H T = S T \tilde{H},   (2.105a)

where

\tilde{H} = \tilde{X} E \tilde{X}^{-1}   (2.105b)

is to be block diagonal. The non-selfadjoint effective operators \tilde{H}_A and \tilde{H}_B are given by the diagonal blocks of (2.105a) as

\tilde{H}_A = (S_{AA} + S_{AB} f)^{-1}(H_{AA} + H_{AB} f),   (2.106)

and

\tilde{H}_B = (S_{BA} h + S_{BB})^{-1}(H_{BA} h + H_{BB}).   (2.107)

With these definitions, the eigenvalue equations for these effective operators have exactly the same form as in the orthonormal case. Alternatively, the inverse matrices in (2.106) and (2.107) could be transferred to the right hand sides of the eigenvalue equations for \tilde{H}_A and \tilde{H}_B, respectively, and be regarded as effective overlap matrices, giving eigenvalue equations of the form

\hat{H}_A X_{AA} = \hat{S}_A X_{AA} E^{(A)},   (2.108)

and

\hat{H}_B X_{BB} = \hat{S}_B X_{BB} E^{(B)},   (2.109)

where now \hat{H}_A and \hat{H}_B are given formally by eq.
(2.15), as

\hat{H}_A = H_{AA} + H_{AB} f,   (2.110)

and

\hat{H}_B = H_{BB} + H_{BA} h.   (2.111)

The operators \hat{S}_A and \hat{S}_B are of the same form in S,

\hat{S}_A = S_{AA} + S_{AB} f,   \hat{S}_B = S_{BB} + S_{BA} h.   (2.112)

Equations (2.108) and (2.109) are generalized eigenvalue equations for a non-selfadjoint operator. Using (2.106) and (2.107) in the off-diagonal blocks of eq. (2.105a), the defining equations for f and h, analogous to (2.16) and (2.17), are now found to be

D(f) = H_{BA} + H_{BB} f - (S_{BA} + S_{BB} f)(S_{AA} + S_{AB} f)^{-1}(H_{AA} + H_{AB} f) = 0,   (2.113)

and

D'(h) = H_{AB} + H_{AA} h - (S_{AA} h + S_{AB})(S_{BA} h + S_{BB})^{-1}(H_{BA} h + H_{BB}) = 0.   (2.114)

As for an orthonormal basis, the equations for f here are not coupled to those for h. Equations (2.113) and (2.114) are not the only useful equations defining f and h. An alternative approach is used in some detail in the next chapter. Self-adjoint effective operators can again be obtained by premultiplying the eigenvalue equation, (2.101a), by T^\dagger. The resulting operator, G_A, in S_A is given by eq. (2.67b), but the corresponding effective operator in S_B must now be written

G_B = H_{BB} + H_{BA} h + h^\dagger H_{AB} + h^\dagger H_{AA} h   (2.115a)
    = g_B \tilde{H}_B,   (2.115b)

with (2.115b) holding only if eqs. (2.113) and (2.114) are satisfied. The eigenvalue equations for these effective operators are as in eqs. (2.67a) and (2.68a), applicable also in an orthonormal basis. The BA block of T^\dagger H T is

G_{BA} = H_{BA} + H_{BB} f + h^\dagger(H_{AA} + H_{AB} f),   (2.116)

which becomes identical to D(f) if h is given by (2.102). The effective operators \bar{H}_A and \bar{H}_B are given by eqs. (2.74) and (2.76), respectively, in this case. Their eigenvalue equations are given by (2.73) and (2.75). Sets of contragredient vectors can be defined here in terms of the columns of P_A and (1 - P_A S), and their reciprocal vectors. These are useful in writing various quantities in a compact manner when a nonorthonormal basis is used. These vectors are considerably more complicated in this case than those given in section 2.1.d. Their detailed examination will be deferred until some motivation has been provided for defining them.
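The nonorthonormal generalization can be exercised end to end. The sketch below (NumPy; names are illustrative) solves the generalized problem H X = S X E for a random symmetric H and positive definite overlap S, builds f and h from the eigenvector blocks, and verifies the coupling relation (2.102), the defining conditions (2.113)-(2.114), and the trace property tr(P_A S) = n_A.

```python
import numpy as np

rng = np.random.default_rng(6)
n, nA = 6, 2
A = rng.standard_normal((n, n)); H = (A + A.T) / 2
B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)  # positive definite overlap

# Solve H X = S X E with X† S X = 1 via the symmetric S^{-1/2} transformation.
w, U = np.linalg.eigh(S)
Sih = U @ np.diag(w ** -0.5) @ U.T                 # S^{-1/2}
E, Y = np.linalg.eigh(Sih @ H @ Sih)
X = Sih @ Y                                        # satisfies (2.101a) and (2.101b)

f = X[nA:, :nA] @ np.linalg.inv(X[:nA, :nA])       # eq. (2.3)
h = X[:nA, nA:] @ np.linalg.inv(X[nA:, nA:])

SAA, SAB, SBA, SBB = S[:nA, :nA], S[:nA, nA:], S[nA:, :nA], S[nA:, nA:]
HAA, HAB, HBA, HBB = H[:nA, :nA], H[:nA, nA:], H[nA:, :nA], H[nA:, nA:]

# eq. (2.102): h is determined by f and the overlap blocks.
assert np.allclose(h, -np.linalg.inv(SAA + f.T @ SBA) @ (SAB + f.T @ SBB))

# eqs. (2.113) and (2.114): generalized defining conditions, still uncoupled.
D = HBA + HBB @ f - (SBA + SBB @ f) @ np.linalg.inv(SAA + SAB @ f) @ (HAA + HAB @ f)
Dp = HAB + HAA @ h - (SAA @ h + SAB) @ np.linalg.inv(SBA @ h + SBB) @ (HBA @ h + HBB)
assert np.allclose(D, 0) and np.allclose(Dp, 0)

# P_A keeps the form (2.10) with the metric (2.103a); now tr(P_A S) = n_A.
gA = SAA + SAB @ f + f.T @ SBA + f.T @ SBB @ f
PA = np.block([[np.linalg.inv(gA), np.linalg.inv(gA) @ f.T],
               [f @ np.linalg.inv(gA), f @ np.linalg.inv(gA) @ f.T]])
assert np.isclose(np.trace(PA @ S), nA)
```

Setting S = 1 reduces every assertion here to its orthonormal-basis counterpart from section 2.1.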
CHAPTER 3

THE EFFECTIVE HAMILTONIANS — PRACTICAL CONSIDERATIONS

"So I prophesied as I was commanded: and as I prophesied, there was a noise, and behold a shaking, and the bones came together, bone to his bone. And when I beheld, lo, the sinews and the flesh came up upon them, and the skin covered them above: but there was no breath in them."

Ezekiel 37: 7,8 (KJV)

3.1 Alternative Formulas

The purpose of this section is to examine some of the interrelationships between the effective operators described in sections 2.2 and 2.3 in somewhat greater detail, especially when f is known only approximately. As has been pointed out before, the two alternative expressions for the operators G_A and G_B given in eqs. (2.67b), (2.68b), and (2.70) are equivalent only if f satisfies D(f) = 0. If f satisfies D(f) = 0 only approximately, it is possible to distinguish two types of approximate effective operators, H_A and H_B, namely,

H_A^{(1)} = H_{AA} + H_{AB} f,   (3.1a)

H_B^{(1)} = H_{BB} - H_{BA} f^\dagger,   (3.1b)

and

H_A^{(2)} = g_A^{-1} G_A,   (3.2a)

H_B^{(2)} = g_B^{-1} G_B.   (3.2b)

These two types of operators are related by

H_A^{(2)} = H_A^{(1)} + g_A^{-1} f^\dagger D^{(1)}(f),   (3.3a)

and

H_B^{(2)} = H_B^{(1)} - g_B^{-1} f D^{(1)}(f)^\dagger,   (3.3b)

where the notation D^{(1)}(f) is defined below in eq. (3.4). Thus the two sets of formulas, (3.1) and (3.2), are equivalent only if D^{(1)}(f) = 0. In effect, H_A^{(2)} and H_B^{(2)} here are generalizations of the Rayleigh quotient \langle\psi|H|\psi\rangle / \langle\psi|\psi\rangle for a single eigenfunction. The operators H_A^{(1)} and H_B^{(1)} correspond to the use of an intermediate normalization, involving writing the expectation value of H as \langle\phi|H|\psi\rangle / \langle\phi|\psi\rangle, where \phi is some arbitrary reference function. The error in the Rayleigh quotient is second order in the error in \psi, while that of this intermediate normalization is only first order. In terms of the operators H_A^{(1)} and H_A^{(2)}, eq. (2.16) can be written in one of two forms,

D^{(1)}(f) = H_{BA} + H_{BB} f - f H_A^{(1)} = 0,   (3.4)

and

D^{(2)}(f) = H_{BA} + H_{BB} f - f H_A^{(2)} = 0.   (3.5)

These two equations are equivalent in that they both have the same solutions.
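The relations (3.3a) and the connection between the two defining residuals are straightforward to confirm numerically for an inexact f. The sketch below (NumPy; names are illustrative) works in an orthonormal basis and checks (3.3a) together with the proportionality D^(2) = g_B^{-1} D^(1) stated in eq. (3.6).

```python
import numpy as np

rng = np.random.default_rng(7)
n, nA = 5, 2
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
f = rng.standard_normal((n - nA, nA))              # an arbitrary (inexact) f

HAA, HAB = H[:nA, :nA], H[:nA, nA:]
HBA, HBB = H[nA:, :nA], H[nA:, nA:]
gA = np.eye(nA) + f.T @ f
gB = np.eye(n - nA) + f @ f.T

H1 = HAA + HAB @ f                                 # eq. (3.1a)
GA = H1 + f.T @ (HBA + HBB @ f)                    # eq. (2.67b)
H2 = np.linalg.inv(gA) @ GA                        # eq. (3.2a)

D1 = HBA + HBB @ f - f @ H1                        # eq. (3.4)
D2 = HBA + HBB @ f - f @ H2                        # eq. (3.5)

assert np.allclose(H2, H1 + np.linalg.inv(gA) @ f.T @ D1)   # eq. (3.3a)
assert np.allclose(D2, np.linalg.inv(gB) @ D1)              # eq. (3.6)
```

Since g_B is positive definite, the second assertion makes the claimed equivalence of (3.4) and (3.5) explicit: D^(2) vanishes exactly when D^(1) does.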
However, their detailed forms are quite different away from this solution. Equation (3.5) can be obtained directly by requiring that T^{-1}HT, rather than T†HT, be block diagonal, the latter being implicit in the derivation of (2.16). It can be shown that the relationship between these two quantities is given by

D^(2)(f) = g_B^{-1} D^(1)(f) ,   (3.6)

in the case of an orthonormal basis.

It is also possible to distinguish between three different formulas for calculating operators of the type designated H̃_A, depending on which form of eq. (2.74) and also on which form of H_A is used. Only one such form is useful, and for practical purposes it is given by either eq. (2.74b) or by

H̃_A = g_A^{-1/2} G_A g_A^{-1/2} .   (3.7)

In the case of a nonorthonormal basis, the situation is considerably more complicated. Because the orthogonality condition, (2.101b), is no longer simple, it is necessary to allow for the possibility that if f and h do not exactly satisfy eqs. (2.113) and (2.114), they may also fail to satisfy eq. (2.102). As a result, the off-diagonal blocks of both the matrices,

G = T†HT = [ G_A   G_AB ]
           [ G_BA  G_B  ] ,   (3.8a)

and

g = T†ST = [ g_A   g_AB ]
           [ g_BA  g_B  ] ,   (3.8b)

must be considered to be potentially nonzero in what follows. Two pairs of operators H_A and H_B are again defined in this case,

H_A^(1) = (S_AA + S_AB f)^{-1}(H_AA + H_AB f) ,   (3.9a)

and

H_B^(1) = (S_BB + S_BA h)^{-1}(H_BB + H_BA h) ,   (3.9b)

identical to eqs. (2.106) and (2.107), and

H_A^(2) = g_A^{-1} G_A ,   (3.10a)

H_B^(2) = g_B^{-1} G_B .   (3.10b)

These two sets of operators are shown in Appendix 1 to be related by the equations,

H_A^(2) = H_A^(1) + g_A^{-1} f† D^(1)(f) ,   (3.11)

and

H_B^(2) = H_B^(1) − g_B^{-1} f D^(1)(f)† ,   (3.12)

where D^(1)(f), given formally by eq. (3.4), is the quantity in eq. (2.113) defining f. Thus, these two pairs of operators are identical only when D^(1)(f) = 0. The two operators H_A^(1) and H_A^(2) give rise to two different defining conditions on f, given by

D^(1)(f) = H_BA + H_BB f − (S_BA + S_BB f) H_A^(1) ,   (3.13)

and

D^(2)(f) = H_BA + H_BB f − (S_BA + S_BB f) H_A^(2) .
(3.14)

In this case, the relationship between the two quantities D^(1)(f) and D^(2)(f) is

D^(2)(f) = (S_BB + S_BA h)(g_B + h† g_AB)^{-1} D^(1)(f) .   (3.15a)

When g_BA = 0, this reduces to

D^(2)(f) = (S_BB + S_BA h) g_B^{-1} D^(1)(f) ,   (3.15b)

or

D^(2)(f) = [ 1_B − (S_BA + S_BB f) g_A^{-1} f† ] D^(1)(f) .   (3.15c)

The derivations of eqs. (3.12) - (3.15) are quite long, and have been outlined in Appendix 1.

3.2 Implications of Inexact Solutions

Consider an approximate solution, f_approx, to eq. (2.16), given by

f_approx = f + δf ,   (3.16)

where f is an exact solution of (2.16). If the effective operators H_A, G_A, and H̃_A are calculated using f_approx, the error δf will result in errors in the effective operators at some order in δf. Starting with the operator G_A, and writing

G_A^approx = G_A + δG_A ,   (3.17a)

where G_A is exact, it is easily verified, using (2.16), that

δG_A = (δf† f) H_A + H_A† (f† δf) + O(δ²) ,   (3.17b)

to first order in the errors. Here the operator H_A is exact. Thus the error in G_A^approx is first order in δf. Similarly, from eq. (2.70),

G_A + δG_A = (g_A + δg_A)(H_A^(2) + δH_A^(2)) ,

or

δH_A^(2) = g_A^{-1} [ δG_A − δg_A H_A ] + O(δ²) .   (3.18)

Since

δg_A = δf† f + f† δf + O(δ²) ,   (3.19)

eq. (3.18) then yields

δH_A^(2) = g_A^{-1} [ H_A† (f† δf) − (f† δf) H_A ] + O(δ²) .   (3.20)

On the other hand, from (3.1a), one has

δH_A^(1) = H_AB δf ,   (3.21)

exactly. Except for n_A = 1, both these errors, δH_A^(1) and δH_A^(2), are first order in δf. However, (3.20) consists of the difference of two very similar terms, and it actually vanishes for n_A = 1. This corresponds to the familiar property that the error in an eigenvalue calculated as the Rayleigh quotient of an approximate eigenvector is second order in the error in the eigenvector. For n_A > 1, the first order error, (3.20), does not vanish in general, but, as will be shown presently, the first order correction to the eigenvalues does vanish. Using eq. (3.20),
and the result

δ(g_A^{-1/2}) = −g_A^{-1/2} δ(g_A^{1/2}) g_A^{-1/2} + O(δ²) ,   (3.22)

it is easy to show from (3.7) that

δH̃_A = ½ [ H̃_A , g_A^{-1/2} (f† δf − δf† f) g_A^{-1/2} ] + O(δ²) .   (3.23)

This also vanishes in first order when n_A = 1, but is in general non-vanishing when n_A > 1.

For a nonorthonormal basis, eqs. (3.17b) and (3.23) remain the same, because the formulas for the operators G_A and H̃_A used in deriving these results do not contain the overlap matrix explicitly. However, for the two operators H_A^(1) and H_A^(2), the form of the errors caused by an error in f does differ from (3.20) and (3.21). From eq. (3.9a),

δH_A^(1) = (S_AA + S_AB f)^{-1} (H_AB δf − S_AB δf H_A^(1)) + O(δ²) .   (3.24)

From (3.10a), and using the same procedure as was used to obtain eq. (3.20), one obtains,

δH_A^(2) = g_A^{-1} [ (H_AB + f† H_BB) δf − (S_AB + f† S_BB) δf H_A ] + O(δ²) .   (3.25)

Except for (3.25) vanishing when n_A = 1, both (3.24) and (3.25) are first order in δf. Although the first order term in (3.24) now involves a difference between two terms, the two terms are not very similar, as is the case in (3.25), and therefore H_A^(2) is still expected to be inherently more accurate for a nonorthonormal basis than H_A^(1).

The first order variation in the eigenvalues of G_A due to some variations δG_A, δg_A, in the operators G_A and g_A, respectively, is given by

δξ_i = ⟨ψ̃_i^(A)| δG_A − ξ_i δg_A |ψ̃_i^(A)⟩ + O(δ²) .   (3.26)

The functions ψ̃_i^(A) are the eigenfunctions of the exact operator G_A, and eq. (3.26) follows directly from the eigenvalue equation, (2.85), for G_A. But upon substitution of (3.17b) and (3.19) into (3.26), and using the eigenvalue equation (2.84a) for H_A, it is seen that δξ_i of (3.26) vanishes in first order in δf.

In the case of the operator H_A^(1), the first order error in the eigenvalues is given by

δξ_i^(1) = ⟨ψ̃_i^(A)| g_A δH_A^(1) |ψ̃_i^(A)⟩ = ⟨ψ̃_i^(A)| g_A H_AB δf |ψ̃_i^(A)⟩ + O(δ²) ,   (3.27)

which clearly does not vanish in general. On the other hand, one has,

δξ_i^(2) = ⟨ψ̃_i^(A)| g_A δH_A^(2) |ψ̃_i^(A)⟩
(3.28* (2) The operator HA ' is thus inherently more accurate than the operator HA , when evaluated using an inexact f» The first order errors in the eigenvalues of the operators HA are just given as the expectation values of the first order error operator,(3*23),with respect to the eigenfunctions of the exact operator HA, defined in eqs* (2.86b). Since 6HA can be written as a commutator to first order, its expectation value will be zero, and therefore, 6li a <^CU)|6SA - 0(62)- (3-29) For a nonorthonormal basis, only the expectation values of €HA ' and SKA ' are different in nature from those given above for an orthonormal basis* From (3*24),; one obtains, which does not vanish in first order in 6f under any obvious general conditions. However, from (3«25)t *fi2) • <^)|gA6fii2)i^u)> + 0(62 > " <^iA}I ^^\B+ftsBB>6f^SAL^ffW»A IV'&Wft2) * o(62).. (3.31Therefore the eigenvalues of HA ' in a non-orthonormal basis are affected only in second order by errors in f * 55 3*3 Perturbation Theory For H^,, G^.. and The purpose of this section is to outline perturbation formulas for the eigenvalues and eigenvectors of the effective operators HA„ GA, and HA„ defined in the space SA» For the most part, the formulas presented below are not new, however, those for HA and GA are not well known. These formulas are necessary if the eigenvalues and eigenvectors of the full operator are to be calculated via a perturbation procedure based on; this partitioning formalism. The formulas for HA will be derived in some detail, ; because they are unusual in that HA is a non-selfadjoint operator (but with real eigenvalues)* Those for GA and HA will then just be summarized* 3.31a The HA Scheme The eigenvalue equation in this case is written, SA*I " Mi • <^il«Al^r - 6ir <3-32) where the subscripts and superscripts 'A" on the ^ and the <y> ^ have been suppressed, and will be throughout this section. The metric matrix gA is selfadjbint. 
We have

H_A = Σ_{n=0}^∞ H_A^(n) ,   g_A = Σ_{n=0}^∞ g_A^(n) ,   ψ_i = Σ_{n=0}^∞ ψ_i^(n) ,   ξ_i = Σ_{n=0}^∞ ξ_i^(n) ,   (3.33)

where the superscript indicates the order of the term in the perturbation parameter (or parameters), and the solution of the zero order eigenvalue equation,

H_A^(0) ψ_i^(0) = ξ_i^(0) ψ_i^(0) ,   ⟨ψ_i^(0)| g_A^(0) |ψ_j^(0)⟩ = δ_ij ,   (3.34)

is known. The terms in the series for H_A and g_A are given, and the terms in the series for the ψ_i and ξ_i are to be calculated.

Consider the first order terms of the eigenvalue equation and normalization condition (3.32), given by

(H_A^(1) − ξ_i^(1)) ψ_i^(0) + (H_A^(0) − ξ_i^(0)) ψ_i^(1) = 0 ,   (3.35a)

and

2 ⟨ψ_i^(0)| g_A^(0) |ψ_i^(1)⟩ + ⟨ψ_i^(0)| g_A^(1) |ψ_i^(0)⟩ = 0 ,   (3.35b)

when all quantities are real. The first order eigenvalue is obtained by premultiplying (3.35a) by (g_A^(0) ψ_i^(0))†, and integrating, to give

ξ_i^(1) = ⟨ψ_i^(0)| g_A^(0) H_A^(1) |ψ_i^(0)⟩ .   (3.36)

No contribution is obtained from the second term of (3.35a), since, from (3.34), ⟨ψ_i^(0)| g_A^(0) H_A^(0) |ψ_i^(1)⟩ = ξ_i^(0) ⟨ψ_i^(0)| g_A^(0) |ψ_i^(1)⟩, cancelling the rest of the term. The first order wavefunction is obtained in a similar manner. Premultiplying (3.35a) by (g_A^(0) ψ_k^(0))†, k ≠ i, and integrating, gives

⟨ψ_k^(0)| g_A^(0) H_A^(1) |ψ_i^(0)⟩ = (ξ_i^(0) − ξ_k^(0)) ⟨ψ_k^(0)| g_A^(0) |ψ_i^(1)⟩ .

Writing ψ_i^(1) here as,

ψ_i^(1) = Σ_k a_k^(1) ψ_k^(0) ,   (3.37)

then gives

a_k^(1) = ⟨ψ_k^(0)| g_A^(0) H_A^(1) |ψ_i^(0)⟩ / (ξ_i^(0) − ξ_k^(0)) ,   (k ≠ i) .   (3.38a)

The coefficient a_i^(1) is obtained from eq. (3.35b) as

a_i^(1) = −½ ⟨ψ_i^(0)| g_A^(1) |ψ_i^(0)⟩ ,   (3.38b)

and thus

ψ_i^(1) = Σ_{k≠i} [ ⟨ψ_k^(0)| g_A^(0) H_A^(1) |ψ_i^(0)⟩ / (ξ_i^(0) − ξ_k^(0)) ] ψ_k^(0) − ½ ⟨ψ_i^(0)| g_A^(1) |ψ_i^(0)⟩ ψ_i^(0) .   (3.39)

The second order terms of eqs. (3.32) are

(H_A^(2) − ξ_i^(2)) ψ_i^(0) + (H_A^(1) − ξ_i^(1)) ψ_i^(1) + (H_A^(0) − ξ_i^(0)) ψ_i^(2) = 0 ,   (3.40a)

and

2 ⟨ψ_i^(2)| g_A^(0) |ψ_i^(0)⟩ + 2 ⟨ψ_i^(1)| g_A^(1) |ψ_i^(0)⟩ + ⟨ψ_i^(1)| g_A^(0) |ψ_i^(1)⟩ + ⟨ψ_i^(0)| g_A^(2) |ψ_i^(0)⟩ = 0 .   (3.40b)

The approach here is the same as that in the first order case. Premultiplication of (3.40a) by (g_A^(0) ψ_i^(0))†, and integration, leads to,

ξ_i^(2) = ⟨ψ_i^(0)| g_A^(0) H_A^(2) |ψ_i^(0)⟩ + ⟨ψ_i^(0)| g_A^(0) (H_A^(1) − ξ_i^(1)) |ψ_i^(1)⟩ .   (3.41a)

Substitution of eqs. (3.36) and (3.39) into (3.41a), to eliminate ξ_i^(1) and ψ_i^(1) from the latter, results in a formula somewhat
..Mill-fx " j/i fi(0) -(0) Jri " Ii (3.41b) f[0)|gi0)H{2)|T[0)>. The second order wavefunctiom ^ is expanded] im terms of the zero order wave functions,, (2) and the coefficients a^' determined from eqs. (3.40a,b) in the same manner as was used in the first order case. The final result is k/lL s(0) »(0) i " fit (3.A3) s -ic?^1'!^'! ^°W[°VA' i^W^ui0* if[1)>]V'i0> The pattern is now clear. The n order terms of eq. (3*32) cam be written as Z (ftp* - % <3>)^(n-J> « o, (3.44a) and Z Tj <ti'5)|ir;'""k) l^ik)> * 0* (3.44b) Premultiplying (3«44a) by (gA°^ )f» and integrating gives, |in) =<fi0)|gi0)Hin)|ti0)> (3.45) n-1 + ;i1<^i0)U<0)(5ij)-;ii,)|V'in-j>>. 13 X 59. The rr**1 order wavefunction, is expanded as a linear combination of the zero order wavefunctions, and the expansion coefficients are deduced from-eqs, (3.44a,b), The result is <^k^^ Is^^^^^i^^*^^^ k^ lgA^ (H^^-^^ ) |^n"*^> ^0) k/i ^(0) m ^(0) (3.46) j=0 k=0 1 A 1 1 k/n No attempt has been made in any of these formulas to eliminate higher order quantities in terms of lower order ones• because that leads to computationally less efficient formulas. All formulas above are given in terms of the eigenfunctions of HA» To obtain formulas applicable in a matrix notation, the functions are replaced by column vectors xj^, and all operators by their matrix representations. 3.3 .b The GA Scheme The eigenvalue equation in this case is written, Both G. and gr. are selfadjoint. We have (3.48) 6o* where the ajj^ and the g^ are given, and the and the % ^ are to be calculated. It is assumed that the zero order eigenvalue equation, 0(0) ^(0) . f(0)g<0) ,(o)r <V,(0)|g(0)| ^,(0),. . 4^ f (3<w has been solved* The n order term of (3.47) is then; n i E [o[3) - ( E S[j)gp-k))]tin-j0 a 0„ (3.50a) and E "s3 «+<i»| .(J-J-W | . o. 
(3.50b)

The n-th order eigenvalue can be obtained by premultiplying (3.50a) by ψ_i^(0)†, and integrating, to give

ξ_i^(n) = Σ_{j=1}^{n−1} ⟨ψ_i^(0)| G_A^(j) − Σ_{k=0}^{j} ξ_i^(k) g_A^(j−k) |ψ_i^(n−j)⟩ + ⟨ψ_i^(0)| G_A^(n) − Σ_{k=0}^{n−1} ξ_i^(k) g_A^(n−k) |ψ_i^(0)⟩ .   (3.51)

The n-th order wavefunction is expanded in terms of the ψ_k^(0), and the expansion coefficients deduced from eqs. (3.50a,b). The final result is

ψ_i^(n) = Σ_{k≠i} (ξ_i^(0) − ξ_k^(0))^{-1} [ Σ_{j=1}^{n} ⟨ψ_k^(0)| G_A^(j) − Σ_{l=0}^{j} ξ_i^(l) g_A^(j−l) |ψ_i^(n−j)⟩ ] ψ_k^(0) − ½ [ Σ_{j=0}^{n} Σ_{k=0}^{n−j} ⟨ψ_i^(j)| g_A^(n−j−k) |ψ_i^(k)⟩ ]′ ψ_i^(0) ,   (3.52)

the prime again indicating omission of the terms containing ψ_i^(n). These formulas can be shown to be equivalent to those derived in the H_A scheme, by expressing the G_A^(j) in terms of the g_A^(k) and the H_A^(l), according to eq. (2.70). These results also agree, to second order, with those given by A. Imamura (1968), in a different notation. The first order formulas here are

ξ_i^(1) = ⟨ψ_i^(0)| G_A^(1) − ξ_i^(0) g_A^(1) |ψ_i^(0)⟩ ,   (3.53a)

ψ_i^(1) = Σ_{j≠i} [ ⟨ψ_j^(0)| G_A^(1) − ξ_i^(0) g_A^(1) |ψ_i^(0)⟩ / (ξ_i^(0) − ξ_j^(0)) ] ψ_j^(0) − ½ ⟨ψ_i^(0)| g_A^(1) |ψ_i^(0)⟩ ψ_i^(0) .   (3.53b)

The second order formulas are

ξ_i^(2) = Σ_{j≠i} |⟨ψ_j^(0)| G_A^(1) − ξ_i^(0) g_A^(1) |ψ_i^(0)⟩|² / (ξ_i^(0) − ξ_j^(0)) + ⟨ψ_i^(0)| G_A^(2) − ξ_i^(0) g_A^(2) − ξ_i^(1) g_A^(1) |ψ_i^(0)⟩ ,   (3.54a)

and

ψ_i^(2) = Σ_{k≠i} (ξ_i^(0) − ξ_k^(0))^{-1} [ ⟨ψ_k^(0)| G_A^(1) − ξ_i^(0) g_A^(1) − ξ_i^(1) g_A^(0) |ψ_i^(1)⟩ + ⟨ψ_k^(0)| G_A^(2) − ξ_i^(0) g_A^(2) − ξ_i^(1) g_A^(1) |ψ_i^(0)⟩ ] ψ_k^(0)
          − [ ½ ⟨ψ_i^(0)| g_A^(2) |ψ_i^(0)⟩ + ⟨ψ_i^(1)| g_A^(1) |ψ_i^(0)⟩ + ½ ⟨ψ_i^(1)| g_A^(0) |ψ_i^(1)⟩ ] ψ_i^(0) .   (3.54b)

Here, eq. (3.53b) was used to eliminate ψ_i^(1) from the expression for ξ_i^(2). The resulting formula is longer, but its first term is now of the more familiar form for a second order energy formula. The extra terms here, compared to the usual Rayleigh-Schrodinger formulas, are due to the presence of g_A^(1) and g_A^(2).

3.3.c The H̃_A Scheme

The eigenvalue equation in this case is

H̃_A χ_i = ξ_i χ_i ,   ⟨χ_i|χ_j⟩ = δ_ij ,   (3.55)

where the ξ_i are the eigenvalues considered in the H_A and G_A schemes, and the χ_i = g_A^{1/2} ψ_i are the corresponding orthonormalized eigenfunctions. We have

H̃_A = Σ_{n=0}^∞ H̃_A^(n) ,   χ_i = Σ_{n=0}^∞ χ_i^(n) ,   ξ_i = Σ_{n=0}^∞ ξ_i^(n) .   (3.56)

The solution of the zero order eigenvalue equation,

H̃_A^(0) χ_i^(0) = ξ_i^(0) χ_i^(0) ,   ⟨χ_i^(0)|χ_j^(0)⟩ = δ_ij ,   (3.57)

is assumed known. The H̃_A^(n) are given, and the χ_i^(n) and ξ_i^(n) are to be calculated. Since H̃_A is selfadjoint, this is just the usual Rayleigh-Schrodinger perturbation theory.
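Since the H̃_A scheme reduces to standard Rayleigh-Schrodinger theory, the less familiar G_A-scheme formulas are the ones worth checking numerically. The sketch below (NumPy assumed; random symmetric matrices serve as hypothetical G_A^(0), G_A^(1), g_A^(1), with a unit zero-order metric for simplicity) compares the first order eigenvalue of eq. (3.53a) with a finite-difference derivative of the exact eigenvalues of the perturbed generalized problem.

```python
import numpy as np

rng = np.random.default_rng(2)
nA = 4
sym = lambda A: (A + A.T) / 2

# Hypothetical data: zero-order problem with unit metric, plus perturbations.
G0 = sym(rng.standard_normal((nA, nA)))
g0 = np.eye(nA)
G1 = sym(rng.standard_normal((nA, nA)))
g1 = 0.1 * sym(rng.standard_normal((nA, nA)))

xi0, psi0 = np.linalg.eigh(G0)   # G^(0) psi = xi^(0) g^(0) psi, with g^(0) = 1

def exact_eigs(lam):
    """Eigenvalues of (G0 + lam*G1) psi = xi (g0 + lam*g1) psi."""
    Linv = np.linalg.inv(np.linalg.cholesky(g0 + lam * g1))
    return np.linalg.eigvalsh(Linv @ (G0 + lam * G1) @ Linv.T)

# Eq. (3.53a): xi_i^(1) = <psi_i^(0)| G^(1) - xi_i^(0) g^(1) |psi_i^(0)>.
xi1 = np.array([psi0[:, i] @ (G1 - xi0[i] * g1) @ psi0[:, i] for i in range(nA)])

lam = 1e-6
slope = (exact_eigs(lam) - xi0) / lam
print(np.abs(slope - xi1).max())   # small, of order lam
```

The g^(1) term in (3.53a) is exactly what distinguishes these formulas from ordinary Rayleigh-Schrodinger theory; dropping it makes the comparison fail.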
The n-th order term of (3.55) is

Σ_{j=0}^{n} (H̃_A^(j) − ξ_i^(j)) χ_i^(n−j) = 0 ,   (3.58a)

and

Σ_{j=0}^{n} ⟨χ_i^(j)|χ_i^(n−j)⟩ = 0 ,   (n ≠ 0) .   (3.58b)

Premultiplying (3.58a) by χ_i^(0)† and integrating gives the n-th order eigenvalues,

ξ_i^(n) = ⟨χ_i^(0)| H̃_A^(n) |χ_i^(0)⟩ + Σ_{j=1}^{n−1} ⟨χ_i^(0)| H̃_A^(j) − ξ_i^(j) |χ_i^(n−j)⟩ .   (3.59)

The n-th order eigenfunction is expanded in terms of the zero order eigenfunctions,

χ_i^(n) = Σ_k a_k^(n) χ_k^(0) ,   (3.60)

where

a_k^(n) = Σ_{j=1}^{n} ⟨χ_k^(0)| H̃_A^(j) − ξ_i^(j) |χ_i^(n−j)⟩ / (ξ_i^(0) − ξ_k^(0)) ,   (k ≠ i) ,   (3.61a)

and

a_i^(n) = −½ Σ_{j=1}^{n−1} ⟨χ_i^(j)|χ_i^(n−j)⟩ .   (3.61b)

The coefficient a_i^(1) vanishes but, in general, the a_i^(n) for n > 1 are not zero. Equations (3.59) and (3.61) can be written in terms of the eigenfunctions ψ_i^(j) used in the H_A and G_A schemes using

χ_i^(j) = Σ_{k=0}^{j} g_A^{1/2,(k)} ψ_i^(j−k) .   (3.62)

The terms in the series

g_A^{1/2} = Σ_{j=0}^∞ g_A^{1/2,(j)} ,   (3.63)

are given below, in chapter 6.

CHAPTER 4

MULTIPLE PARTITIONING THEORY

"Great fleas have little fleas upon their backs to bite 'em, and little fleas have lesser fleas, and so ad infinitum. The great fleas themselves in turn have greater fleas to go on, while these again have greater still, and greater still, and so on."

(quoted in C.-E. Fröberg, Introduction to Numerical Analysis (1969))

In this chapter, the possibilities of generalizing the formulas derived in the preceding two chapters to a more extensive partitioning are examined. Such m × m partitioning formalisms, for m > 2, have a number of applications in the construction of effective operators and in the derivation of perturbation formulas in eigenvalue problems in which it is convenient or necessary to divide the eigenvalues and their eigenvectors into several distinct sets. The limiting partitioning formalism is that in which m = n, that is, the n-dimensional space spanned by the basis functions and by the eigenvectors is partitioned into n one-dimensional spaces. This is the ordinary eigenvalue problem.
4.1 The m × m Partitioning Formalism

4.1.a Basic Theory

For the present, it is assumed that the basis set used consists of orthonormal functions, so that the eigenvalue equation to be examined is

H X = X ξ ,   X†X = 1_n ,   (4.1)

where H is hermitian, X, the matrix of the eigenvectors of H, is unitary, and ξ is the real diagonal matrix of the eigenvalues of H. The set of basis functions is now divided into m subsets, each spanning one of m subspaces, S_1, S_2, ..., S_m, of dimensions n_1, n_2, ..., n_m, respectively. Here Σ_{I=1}^{m} n_I is equal to n, the dimension of the full space. Similarly, the set of n eigenvectors of H, which are represented as the columns of X above, is divided into m subsets, X^(1), X^(2), ..., X^(m), each spanning one of m eigenspaces S̄_1, S̄_2, ..., S̄_m, of the same respective dimensions n_1, n_2, ..., n_m. Because of this double partitioning, the matrices H and X can be written in an m × m block form,

H = [ H_11 H_12 ... H_1m ]        X = [ X_11 X_12 ... X_1m ]
    [ H_21 H_22 ... H_2m ]            [ X_21 X_22 ... X_2m ]
    [  ...  ...      ... ]            [  ...  ...      ... ]
    [ H_m1 H_m2 ... H_mm ] ,          [ X_m1 X_m2 ... X_mm ] ,   (4.2)

where the symbols H_IJ and X_IJ represent n_I × n_J dimensional matrix blocks. Let the diagonal part of the eigenvector matrix be denoted by

X̂ = diag( X_11 , X_22 , ... , X_mm ) .   (4.3)
(4,6) that all the fjj, (I,J * 1, m, I / J),, exist only if the matrix,.X, of eq. (4..3); is non-singular,, or alternatively,, only if the diagr onal blocks, (I = 1,, m)r are nonsingular., However, if the full eigenvector matrix,, X.,, is itself nonsingular (as It must be if H is hermitian, since then X is an orthogonal matrix, with inverse given by X*), then there is at least one partitioning of the basis functions for which X. is nonsingular:. A particular block Xj^ will be singular only if at least one of the eigenvectors x£^, (r - 1„ n^.)* is orthogonal to the basis subspace S^. A The blocks fjj of the partitioning operator T in eq, (4,4) are not entirely independent.. From the orthonormallty condition (4.1),, one has 4- A 4. A 4-A A XTX = XTTTT X = 1 m orr T T = (X XT); 1 = g„ (4.7) A which is to be block diagonal.. Thus the blocks of T are related by the equations, t nr. t SjK = fKJ + fJK + L^ fLJ fLK = °* (J»K=1»J/K)» L/J,K (4.8) Since g is symmetric,, eqs. (4.8) represent £m:(m-l) unique matrix block equations, involving the m(m-l) different off-A diagonal blocks of T.. Equations (4.8) could be used to eliminate half of the elements of the off-diagonal blocks of T in favour of the remaining half.. While this procedure leads 69. to; the very simple result f^2 = -f2i *n 2x2 case, when mi is larger than 2, the increase in complexity of eqs. (4.8) makes it impossible im practice to incorporate these equations explicitly into the general formalism. As a result, im what follows, a notatiom involving all m(m-l) off-diagonal blocks of T will be used (though eqs.. (4.8) are implicit, at least when the fjj are exact)* For the cases m * 3 and 4, eqs. (4.8) are examined in somewhat greater detail in Appendix 2. Equations (4.8) express the orthogonality of eigenvectors of H belonging to different sets X^ and X*K*. Thus it is not necessary to impose them explicitly, since if H is hermitian, this orthogonality is automatic. 
As a result,: in many appli cations,, the increasing, complexity of eqs* (4.8) with increasing mi is of no practical concern* The diagonal blocks g| of the matrix gr of (4.7) are metrics,, with respeot to which the corresponding; truncated eigenvectors,, X^j,, are orthonormal* That is, XII gI XII = h (^*9) where m % " <*f*>n " h + & fJI fJI #1 (4.10) The projections P^ onto the eigenspaces cam be written solely-in terms of the (J = 1* •••• nr, J / I), for each I* Using (4.7) and P^ = X(I)X(l)t = T^g"1^1**,, it is seen that <PI>KL " fKI & fll " 70 The definition-, (4.10),, of can be used to establish the rdempotency of Pj, and if eqs. (4*8) are satisfied, it is easy to show that PJPJ = 0* Equation (4.10) alone is suffic-lent to establish that tr Pj = n^.. Thus, in expressing the projection operators, P-j-,, (I = 1„ ».., m), im terms of the fjj as In (4.11),, it is necessary to constrain'the m(m-l) blocks fjj only If the Pj are to be mutually orthogonal. Furthermore,, this formalism provides an apparatus via eqs. (4.8),, however tedious it may be,, to express the Pj in terms of a minimum number of unconstrained variables. The minimum number of variables required to describe the * i eigenspaces S^,, ..., & m Is of considerable importance here, as in the 2x2 case. The projection Pj onto the eigenspace is completely specified by the n n^ complex components of (I) ' the eigenvectors X.N ,.. which spam S^.. However, the space is equally well spanned by any set of n^ vectors obtained from the x^r (r = 1,, n^), by a nonsingular-linear 2 transformation.. Thus, there are n^, complex variables lm (I) • X* ',, which serve only to specify a particular basis in S^.. Furthermore,, the orthogonality constraints im(4.1), written, x(I)tx(J) = 0„ I < J, nr. 
1-1 cam be used; to elinrinate a further Z nu E n\T complex 1=2 1 J=l J variables from all of the X^^„ The remaining number of nom-redundant and unconstrained variables required to simply specify the eigenspaces S^„ ••*, Sm, is thus 71 nr, nri 2 mi 1-1 m, 1-1 E numT - E ny - E m,- E m, = E E nTnT. I,,J=1 1 d 1=1 1 1=2 1 J=l J 1=2 J=l 1 J (4.12) This is just the number of elements in the upper (or lower) Mock triangle of T,. and is also the number of independent variables left in T when eqs. (4.8) are explicitly incorporated into the formalism.. This multiple partitioning formalism can be defined completely from the point of view of the determination of the eigenprojections P^,, (I = 1,, •>...„ nr,)„ in a manner analogous to: that used in section 2.I.e. From. eq. (4.11):,, it is easily seen that rKL " <Pi>KL (PL>£L ' (*-13) One of the difficulties in manipulating quantities in this multipartitioning formalism arises from: the fact that there is nxrcounterpart here to the "pull-through" relations, (2.32), which were used extensively in the 2x2 case to simplify the various expressions arising.. In. fact, in this case, the analogue of eqs., (2.33) is which,, for J / K,, gives fJK%1 = -gjlfK.T - ,£ f S"lft * (J"K = !»•••»> m„ J/K). (4.14) Equations (4.14) are not of great use in general because of the summation term on the right hand side., In the 2x2 case, this term-i does not occur,, leaving eqs;* (2.32)., 72. 4*1 »b: The Defining Conditions an-, the fjj. As^ im the 2x2 case, the off-diagonal blocks of the A partitioning matrix T can be determined by diagonalizing the matrix H„ to obtain its eigenvectors, X,. 
and then using eqs* (4.6) directly* However, it is again possible to formulate systems of nonlinear equations which can be solved to obtain the fJJ directly,, thus making it unnecessary to fully diag onal ize H* Consider first the eigenvalue equation (4.1),, written,, using (4.4)r as A A A _ A_1 A A , , V H T = T X f X 1 « T H„ (4.15) A where the: matrix H,, given as A A _ A H = X f X -1 A H, 0 (4.16) is to be block diagonal* Equation (4.15) is valid only if A the diagonal block part, X, of X is nonsingular,, which is exactly the condition that must be satisfied if the fJ;P are to exist* The diagonal block parts of (4.15) define the operators H-j- and the off-diagonal block parts provide equations for the Thus, one has A m. HI * HII + HIJfJI * (4.17) J/I 73. The equations determining the fjj are then, nn DJI<*) « 0 - HJJ + ^ HJKfKI - fjjHj K/I (4.18) m> mi = HJI + K^ HJKfKI " f JIHII " tJI1^1 HIKfKI 9 K/I K/I (I„J - 1, .»•,, mi, i/j). im Equations (4.18) consist of E n_niT coupled nonlinear I„J-1 1 J I/J equations In the matrix elements of the f The solutions of these equations will automatically also satisfy eqs. (4.8) because oi" the hermiticity of H.. Explicit incorporation of eqs. (4.8) into (4.18) could be used to reduce the total number of equations and variables by a factor of two, but at the expense of greatly increasing the complexity of the equations to be solved., In eqs. (4.18),, coupling occurs only between f.jj in the same block column of T.. Thus, the (n - n^Jn^ equations DJJ(T) = 0„ (J =1, ...,,m, J/I)> can be solved for th * the elements of the I block column of T, namely, the fjj, (J - 1* m* J/l)„ without having to determine any of the fKL for L=I* A somewhat different set of equations for the off-diagonal Mocks of T: result if the eigenvalue equation is rewritten as G = T HT = T T X f X. „ (4.19a) and ff = rfT = (X V)'1 r (4.19b) 74. where the second equality in (4.19a) is obtained using (4.15). 
Both G and gr are to be block diagonal, and the condition, that their off-diagonal blocks vanish provides equations for the fJJ. Since both; G and g are hermitian, the vanishing of their off-diagonal blocks cam each provide only £ £ n~.mT unique I/J x d equations,, and thus both of (4.19a) and (4.19b.) must be used together to determine all the fjj* This results in a set of coupled nonlinear equations of the form' °JI* " ° " % + jjfc HJLFLI + £ FKJHKI +RJ 'JAL^U ' (4.20> and. t m t «JI = 0: = FIJ+ FJI + Llt FLJFLI * <^21> L/J.K (T,J = 1, mi, J < l).)f where eqs. (4.21) have appeared before in (4.8). These equations A effectively couple all of the off-diagonal blocks of I, and A therefore,, the entire matrix T must be determined at once if (4.20) and (4.21) are used.. As a result, while the system of equations (4.20)-(4.21) has the same solutions as the system (4.18),, the two systems must be treated quite differently from1 a computational point of view. 4.1.c: Variational Formulation of the Equations for the fjj. In this multiple partitioning procedure, it is also possible to show that eqs* (4.18),. determining- the fjj, are equivalent 76* to a variational criterion,, in that the vanishing of the quantities DKI(T), (K = 1* ..., my K/l), implies that the trace of the operator H over the image space of the projection, operator Pj is stationary. This stationarity implies that Pj Is an> eigenprojection of H* The algebra required to demonstrate this is considerably mere tedious here than in the case of a 2 x 2 partitioning* The objective is to obtain: an expression, for the first order variation: of the quantity, EI * tr PIH " T * tr(PI>KJ%C * <*-22> with respect to small variations in the f^jt (K=l, .»., my K/l). Fromi (4*11), one obtains,, 6<PI>KJ " 6fKIgIlfJI + fKI*%lfJI + fKI%l6fJI + (4.23a) where, SgJ1 = -gj^gjgj1 + 0(62) 1 ill (4*23b:) " £l (6f^lfLI * ^I6^^!1 + 0(52)' Substitution of eqs. 
(4.23a,b) Into the equation, m , 6Ej = tr 2 (6PX)KJHJK •• (4.24) when H is independent of the fjjr leads to the rather compli cated expression, 6Ej = trrE 6fpigJ1[(TtH)IF - (T^Tj^g^f^] P/I .c (4.25) + tr 2 6fpI[(HT)pi - fpjgJ^T^T^jlgJ1 + 0(&2). P?I ?6 The passage from (4.24) to (4.25) uses the cyclic property of the trace. Consider the coefficient of 6fpi in (4.25). From (4.18), one has,, (HT)pi - fpjg^^HT)^ " Dpi(T) * fpjg^CgiHj - (T'^HT " DPI<*> + 'KSI^^IA - (i'HT)^] A 1 M" + » A • DPI<T) + fPIgI ^ fKlCfKIKI - <HT>Kl] * -1 m t * = DPI(T» - fPIgI ^ fKIDKI<T> Consequently, the vanishing of all the dJQ(T)» (K=l, my K/I)r Implies the vanishing of the coefficients of 6fpi in eq. (4.25). Since the coefficients of 6fpi in (4*25) are just the adjoints tt of those of &fpj»< they vanish also, causing oEj to vanish to first order im the infinitesimals* The vanishing of 6Ej to first order implies the converse, namely,, that all DKjt (K=l,; .... nr, K/l), vanish* This follows from the fact that the rank: of the matrix (1 - P^) must be n-n-j. if it is to project onto the complement of the eigenspace S^ of H of dimension n-nj. Because of this, the set of linear systems (one for each column D , of D T)» written compositely as m • * ^ (i - Pjjj^d) - o „ K/I has only the trivial solution DKI(T) = 0, (K=l, m, K/l). 77. 4.1. d Transformation of the fjj Under a Change of Basis Under a linear transformation of the basis functions {.0^; * (see section 2.1.g), the eigenvectors of K become (eq. (2.50)), x' = V X •IX , (4.27) where Tu • h -and (I,J = 1„ m). (4.28) At A The off-diagonal blocks of T in the new basis and those of T in the old basis are related by A All A A A ,1 1 r = v x x x= x T x x 1 from- which, fJI = (V?)JIXIIXH' (4'29> But, from (4.27)# one has, • A XII = (VT}IIXII and thus, which gives the fjj solely im terms of the transformation coefficients VJJ, and the fjj in the old basis. 
Again, the transformation for the fJJ under a linear basis change is complicated and nonlinear in both the coeffi cients of the transformation and the old variables. While 78. such a complicated transformation is doubtless disadvantageous under some circumstances, it can be also usefully exploited, as pointed out in section 2.1*g. Note also,, that, writing, <VII + VKl)"1 - <4+ ^ VliVIK*KI>"lvH * (1r & vriviKfKi + 0(f2),vli • it is seen that fji - vJivn + K% <VJK- vjivnviK^Kivn * °<f2>' W1 (4.31) which, for small f» is nearly linear* but not homogeneous. 79 4*2 Effective Operators 4*2.a Basic Definitions Like the 2x2 partitioning formalism, one of the primary applications of this multiple partitioning formalism is in the; construction), of effective operators. Such operators would be defined in one of the subspaces, S^, of the full basis space, but would have as eigenvalues, a particular subset of the eigenvalues of the original operator H in the full space. Those eigenvalues correspond to the eigenvectors of H spanning the space Sj* Since they are restricted to the subspaces in which the effective operators are defined, the corresponding eigen vectors of these operators are simple, or orthonormalized, truncations of eigenvectors of the operator H in the full space. However, given the matrix T and the eigenvectors of the effective operators, those of the original operator K im the full space can be obtained straightforwardly* The types of effective operators arising here are analogous to those defined previously in the 2x2 case* The simplest set of effective operators has been defined already in eq. (4.14)* These operators, , * nr HI = HII + 2 HufJi ' U = 1» .*., m);, (4.32) are defined; im the corresponding subspaces Sj, and have the' eigenvalue equations, % XXI = f(I) XI;[ , (I = 1, m), (4*33) 80. as seen from eq. (4.13). Here, f ^ is the I**1 diagonal block of the matrix of eigenvalues, f , in eq. (4.1). The operators Hj are non-selfadjoint, in general. 
However, their eigenvectors, X_II, are orthonormal with respect to the non-unit metrics g_I, according to eqs. (4.9). The corresponding basic set of self-adjoint effective operators are those defined by the diagonal blocks of eq. (4.19a), namely,

G_I = (T†HT)_II ,   (4.34)

with the eigenvalue equation,

G_I X_II = g_I X_II ξ^(I) ,   (4.35)

where

g_I = (T†T)_II .   (4.36)

In detail, one has,

G_I = H_II + Σ_{J≠I} ( f_JI† H_JI + H_IJ f_JI ) + Σ_{J≠I} Σ_{K≠I} f_JI† H_JK f_KI .   (4.37)

If eqs. (4.18) are satisfied, this can also be written as,

G_I = (T†HT)_II = (T†T Ĥ)_II = (T†T)_II H_I = g_I H_I .   (4.38)

As in the 2 × 2 case, other sets of self-adjoint effective operators can be obtained by orthogonalizing the truncated eigenvectors by other procedures. Löwdin's (1970) symmetric orthogonalization (see section 2.2.a) leads to the set of orthonormal eigenvectors,
(4.34), does in H, and \bar O_I is of the same form as \bar H_I given by (4.41b).

4.2.b Eigenvalues and Eigenvectors of the Effective Operators

Up to this point, the multiple partitioning formalism has been presented almost totally in matrix notation. It is instructive, however, to re-examine some of the relationships quoted previously from the point of view of the actual eigenfunctions of the operator H and the derived effective operators. The eigenvalue equation, (4.1), for H is written as

    H \psi_i = E_i \psi_i,    (i = 1, ..., n).    (4.44)

If a partitioning of the basis space into m subspaces, S_1, ..., S_m, is carried out, these eigenfunctions of H can be written as a sum of parts,

    \psi_i = \psi_{i1} + \psi_{i2} + ... + \psi_{im} = \sum_{J=1}^{m} \psi_{iJ},    (4.45)

with \psi_{iJ} being the part of \psi_i lying in the subspace S_J. The partitioning of the eigenvectors of H into m sets, spanning eigenspaces, merely divides the \psi_i into m sets, the notation \psi_{iJ}^{(I)}, (I = 1, ..., m; i = 1, ..., n_I), now denoting the ith member of the Ith such set. The basic equations, (4.6), of the partitioning formalism then are,

    \psi_{iJ}^{(I)} = f_{JI} \psi_{iI}^{(I)},    (J \ne I).    (4.46)

This means that the eigenfunctions of H in the full space can be written as

    \psi_i^{(I)} = (1 + \sum_{K \ne I} f_{KI}) \psi_{iI}^{(I)}.    (4.47)

In the notation used in this section, the symbol f_{KJ} represents an embedding of the mapping f_{KJ}: S_J -> S_K in the whole n-dimensional basis space.

It is a simple matter to write down the eigenvalue equations for the effective operators in this notation. The counterpart of eq. (4.33) for the operators H_I is,

    H_I \psi_{iI}^{(I)} = E_i^{(I)} \psi_{iI}^{(I)},    (i = 1, ..., n_I),
    \langle \psi_{iI}^{(I)} | g_I | \psi_{jI}^{(I)} \rangle = \delta_{ij},    (I = 1, ..., m).    (4.48)

The eigenvalue equation, (4.35), for the operators G_I becomes,

    G_I \psi_{iI}^{(I)} = E_i^{(I)} g_I \psi_{iI}^{(I)},    (i = 1, ..., n_I; I = 1, ..., m),    (4.49)

with the same orthonormality condition as in (4.48). The eigenfunctions obtained from the \psi_{iI}^{(I)} by the symmetric orthogonalization procedure are given by,

    \chi_{iI}^{(I)} = g_I^{1/2} \psi_{iI}^{(I)}.    (4.50)

Thus, the eigenvalue equation, (4.40), for the operators \bar H_I is

    \bar H_I \chi_{iI}^{(I)} = E_i^{(I)} \chi_{iI}^{(I)},    (i, j = 1, ..., n_I; I = 1, ..., m),
    \langle \chi_{iI}^{(I)} | \chi_{jI}^{(I)} \rangle = \delta_{ij}.    (4.51)

The full eigenfunctions of H are given in terms of the eigenfunctions, (4.50), of the operators \bar H_I by

    \psi_i^{(I)} = (1 + \sum_{K \ne I} f_{KI}) g_I^{-1/2} \chi_{iI}^{(I)}.    (4.52)

Finally, for the eigenprojections, P_I, it is seen that,

    P_I = \sum_{i=1}^{n_I} |\psi_i^{(I)}\rangle\langle\psi_i^{(I)}|
        = \sum_{i=1}^{n_I} (1 + \sum_{K \ne I} f_{KI}) |\psi_{iI}^{(I)}\rangle\langle\psi_{iI}^{(I)}| (1 + \sum_{K' \ne I} f_{K'I}^\dagger)
        = (1 + \sum_{K \ne I} f_{KI}) g_I^{-1} (1 + \sum_{K' \ne I} f_{K'I}^\dagger),    (4.53)

where

    g_I^{-1} \equiv \sum_{i=1}^{n_I} |\psi_{iI}^{(I)}\rangle\langle\psi_{iI}^{(I)}|,    (4.54)

defines an embedding of the inverse metric g_I^{-1} in the full n-dimensional basis space.

4.3 Generalization to a Non-orthonormal Basis

The generalization of the multiple partitioning formalism to the case of a non-orthonormal basis is straightforward. The eigenvalue equation is now

    H X = S X E,    (4.55a)

with

    X^\dagger S X = 1_n,    (4.55b)

as in eq. (2.90), where S is the matrix of overlap integrals of the basis functions. The set of basis functions, and the eigenvectors, X, of H, are each partitioned into m subsets, exactly as described in section 4.1.a, making it possible to write the eigenvector matrix X in the partitioned form (4.2). The full matrix X can then be written in terms of some matrix T and the diagonal block part, \hat X, of X, as given in eq. (4.4),

    X = T \hat X.    (4.56)

The matrix elements of T are given here also by eqs. (4.5). The conditions under which eq. (4.56) is valid are identical to those under which (4.4) is valid, namely, that the partitioning of the basis functions must be so defined that \hat X is invertible. While X is no longer a unitary matrix, the hermiticity of H implies that the columns of X are linearly independent (except possibly if S is singular), and thus there will be at least one partitioning of the basis functions for which \hat X is invertible.

The m(m-1) off-diagonal blocks of the matrix T are not all independent, as can be demonstrated using the orthogonality condition, (4.55b). The analogue of eq. (4.7) is

    T^\dagger S T = (\hat X \hat X^\dagger)^{-1} = g,    (4.57)

which must be block diagonal if the orthogonality condition is to be satisfied.
This implies the equations,

    g_{IJ} = S_{IJ} + \sum_{L \ne J} S_{IL} f_{LJ} + \sum_{L \ne I} f_{LI}^\dagger S_{LJ} + \sum_{K \ne I} \sum_{L \ne J} f_{KI}^\dagger S_{KL} f_{LJ} = 0,    (I, J = 1, ..., m; I \ne J).    (4.58)

These equations could be used in specific cases to eliminate half of the elements of the off-diagonal blocks of T from the formalism. However, they are considerably more complicated than the corresponding equations, (4.8), for an orthonormal basis. Therefore, the remarks following eqs. (4.8) apply here with even greater emphasis. From a practical point of view, such an elimination procedure is not to be recommended.

The diagonal blocks of g, given by

    g_I = S_{II} + \sum_{L \ne I} S_{IL} f_{LI} + \sum_{L \ne I} f_{LI}^\dagger S_{LI} + \sum_{K \ne I} \sum_{L \ne I} f_{KI}^\dagger S_{KL} f_{LI},    (4.59)

serve as metrics for the truncated eigenvectors X_{II}, as indicated in eq. (4.9). Because of the explicit presence of the overlap matrix, S, the leading term of g_I here is S_{II}, rather than a unit matrix of the same dimension, as occurs with an orthonormal basis, eq. (4.10).

The defining conditions on the off-diagonal blocks of T are obtained in a manner similar to that employed with an orthonormal basis. Direct substitution of (4.56) into (4.55), and use of the fact that \hat X is invertible, leads to

    H T = S T (\hat X E \hat X^{-1}) = S T \tilde H,    (4.60)

where \tilde H = \hat X E \hat X^{-1} is to be block diagonal, as in eq. (4.14). The diagonal blocks of (4.60) give \tilde H in terms of H, S, and T, as

    \tilde H_I = [(ST)_{II}]^{-1} (HT)_{II}
               = (S_{II} + \sum_{J \ne I} S_{IJ} f_{JI})^{-1} (H_{II} + \sum_{J \ne I} H_{IJ} f_{JI}).    (4.61)

If the overlap matrix is a unit matrix, the inverse factor in (4.61) reduces to an identity matrix, and eq. (4.17) for an orthonormal basis is recovered. The expressions, (4.61), for the effective operators \tilde H_I, (I = 1, ..., m), are of the same form as eqs. (2.95) and (2.96), given for the operators \tilde H_A and \tilde H_B in the 2 x 2 partitioning formalism. From (4.60), it is seen that the eigenvalue equations for these \tilde H_I are given by

    \tilde H_I X_{II} = X_{II} E^{(I)},    (I = 1, ..., m),    (4.62)

exactly as in (4.33) for an orthonormal basis.
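Eqs. (4.55)-(4.62) can be illustrated numerically. The sketch below (NumPy; the sizes, seed, and variable names are illustrative assumptions) generates a positive definite overlap S, solves the generalized eigenvalue problem by Cholesky reduction, and verifies that g = T^\dagger S T of eq. (4.57) comes out block diagonal and that \tilde H_1 of eq. (4.61) reproduces the eigenvalues belonging to S_1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6; dims = [2, 2, 2]; edges = np.cumsum([0] + dims)
H = rng.normal(size=(n, n)); H = (H + H.T) / 2
A = rng.normal(size=(n, n)); S = A @ A.T + n * np.eye(n)    # positive-definite overlap

L = np.linalg.cholesky(S)                 # solve H X = S X E with X^t S X = 1
Linv = np.linalg.inv(L)
Ew, Y = np.linalg.eigh(Linv @ H @ Linv.T)
X = Linv.T @ Y                            # eq. (4.55b) holds by construction

Xhat = np.zeros((n, n))                   # block-diagonal part of X
for I in range(3):
    s = slice(edges[I], edges[I + 1])
    Xhat[s, s] = X[s, s]
T = X @ np.linalg.inv(Xhat)               # X = T Xhat, eq. (4.56)

g = T.T @ S @ T                           # eq. (4.57): must be block diagonal
off = g.copy()
for I in range(3):
    s = slice(edges[I], edges[I + 1])
    off[s, s] = 0.0
assert np.allclose(off, 0.0)              # the conditions (4.58) are satisfied

Ht1 = np.linalg.inv((S @ T)[:2, :2]) @ (H @ T)[:2, :2]      # eq. (4.61)
assert np.allclose(np.sort(np.linalg.eigvals(Ht1).real), Ew[:2])
```

As before, T is constructed from the known eigenvectors only so that the identities can be checked directly.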
As pointed out in chapter 2, however, a new set of effective operators \hat H_I can be defined by

    \hat H_I = H_{II} + \sum_{J \ne I} H_{IJ} f_{JI},    (I = 1, ..., m),    (4.63)

which is identical in form to (4.17), but leads to an effective eigenvalue equation of the form,

    \hat H_I X_{II} = \tilde g_I X_{II} E^{(I)},    (4.64)

where

    \tilde g_I = S_{II} + \sum_{J \ne I} S_{IJ} f_{JI},    (4.65)

can be regarded as an effective overlap matrix. Equations (4.64) and (4.62) are simply restatements of the same eigenvalue equation, and one typical way of actually solving (4.64) is by using (4.62) as an intermediate.

Defining conditions on the f_{JI} are now obtained from the off-diagonal blocks of eq. (4.60), after substitution of (4.61). The result is

    D_{JI}(T) = (HT)_{JI} - (ST)_{JI} \tilde H_I
              = H_{JI} + \sum_{K \ne I} H_{JK} f_{KI}
                - (S_{JI} + \sum_{K \ne I} S_{JK} f_{KI}) (S_{II} + \sum_{K \ne I} S_{IK} f_{KI})^{-1} (H_{II} + \sum_{K \ne I} H_{IK} f_{KI}) = 0.    (4.66)

Clearly, the presence of the overlap matrix severely complicates the determination of the f_{JI}. It is seen, however, that these equations still retain the property of being separately soluble for individual block columns of T. Despite the complexity of eqs. (4.66), it is still possible to devise efficient iterative schemes for their solution.

An alternative set of defining conditions, analogous to eqs. (4.19), is obtained by premultiplying the eigenvalue equation, (4.60), by T^\dagger, to give

    G = T^\dagger H T = (T^\dagger S T) \hat X E \hat X^{-1},    (4.67a)

where

    T^\dagger S T = (\hat X \hat X^\dagger)^{-1} = g,    (4.67b)

must be block diagonal. Thus G itself must be block diagonal, and when this is so, its diagonal blocks form a second set of effective operators, with the eigenvalue equations

    G_I X_{II} = g_I X_{II} E^{(I)},    (I = 1, ..., m).    (4.68)

This is identical in form to eqs. (4.35), the effects of the presence of the overlap matrix being buried in the detailed form of g_I. The operator G_I here is identical in form with the corresponding quantity for an orthonormal basis. When eqs. (4.66) are satisfied, implying that the second equality in (4.67a) is satisfied, it is seen that

    G_I = g_I \tilde H_I,    (I = 1, ..., m).
    (4.69)

The matrices f_{JI} can therefore also be determined by the condition that the off-diagonal blocks of G and g vanish, eqs. (4.20) and (4.58). Since both G and g are hermitian, both eqs. (4.67a) and (4.67b) are required to determine all the f_{JI}. These equations effectively couple all of the off-diagonal blocks of T, which must therefore be completely determined simultaneously, rather than block column-wise, as is possible using (4.66). This drawback in using eqs. (4.67) is probably more than compensated for by the much simpler form of these equations.

Two of the three types of effective operators which were defined for an orthonormal basis have been introduced above, in eqs. (4.62) and (4.68), for the present case. The third type of effective operator, namely the \bar H_I, (I = 1, ..., m), given by eqs. (4.41), are identical in form here because the overlap matrix does not appear explicitly in their definitions. The corresponding eigenvalue equations are given by (4.40), with the eigenfunctions of \bar H_I being related to those of \tilde H_I and G_I by (4.39).

4.4 Practical Considerations

4.4.a Alternative Formulas

In the development of iterative procedures for the determination of the f_{JI}, either from eqs. (4.18), or from (4.20) and (4.21), or their counterparts in the case of a non-orthonormal basis, it is necessary to take into account the manner in which a given f-dependent quantity is evaluated. This point has been explored in detail in section 3.1 for a 2 x 2 partitioning. The purpose of this section is to outline the corresponding (more complicated) results for a multiple partitioning formalism.

Consider first the case of an orthonormal basis. The operators G_I are given by eq. (4.37) as

    G_I = (T^\dagger H T)_{II}.    (4.70)

When the D_{IJ}(T) are not all zero, it is possible to distinguish two distinct forms for the operators H_I, namely,

    H_I^{(1)} = H_{II} + \sum_{J \ne I} H_{IJ} f_{JI},    (I = 1, ..., m),    (4.71)

as in eq. (4.17), and,

    H_I^{(2)} = g_I^{-1} G_I,    (I = 1, ..., m),    (4.72)

from eq. (4.69).
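The practical distinction between these two forms can be seen numerically. In the sketch below (NumPy, for the two-subspace case; sizes, seed, and names are illustrative assumptions), the two forms coincide at an exact solution but differ for an inexact f:

```python
import numpy as np

rng = np.random.default_rng(5)
n, nA = 6, 2
H = rng.normal(size=(n, n)); H = (H + H.T) / 2
E, X = np.linalg.eigh(H)
f_exact = X[nA:, :nA] @ np.linalg.inv(X[:nA, :nA])   # an exact mapping block

def two_forms(f):
    T = np.vstack([np.eye(nA), f])                   # block column (1_A, f)
    H1 = H[:nA, :nA] + H[:nA, nA:] @ f               # eq. (4.71)
    H2 = np.linalg.solve(T.T @ T, T.T @ H @ T)       # g_I^{-1} G_I, eq. (4.72)
    return H1, H2

H1, H2 = two_forms(f_exact)
assert np.allclose(H1, H2)                 # identical where the defining conditions vanish

H1a, H2a = two_forms(f_exact + 0.05)       # a deliberately inexact f
assert not np.allclose(H1a, H2a)           # the two forms differ away from a solution
```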
It is a relatively simple matter to demonstrate that (see Appendix 3),

    H_I^{(2)} = H_I^{(1)} + g_I^{-1} \sum_{J \ne I} f_{JI}^\dagger D_{JI}^{(1)},    (I = 1, ..., m),    (4.73)

where the D_{JI}^{(1)}, defined below, are essentially the conditions (4.18) defining the f_{JI}. Thus, if all D_{JI}^{(1)}, (J = 1, ..., m; J \ne I), vanish for a particular value of I, the two operators H_I^{(1)} and H_I^{(2)} (for that particular value of I) are identical.

Given the two forms of the operator H_I, it is possible to write the first form of the defining conditions, (4.18), on the f_{JI} in one of two alternative ways, namely,

    D_{JI}^{(1)}(f) = H_{JI} + \sum_{K \ne I} H_{JK} f_{KI} - f_{JI} H_I^{(1)},    (4.74)

and,

    D_{JI}^{(2)}(f) = H_{JI} + \sum_{K \ne I} H_{JK} f_{KI} - f_{JI} H_I^{(2)}.    (4.75)

These two forms are equivalent in the sense that they both have the same zeros, by virtue of (4.73). In detailed form, however, they are quite different away from a zero. Substitution of (4.73) into (4.75) gives

    D_{JI}^{(2)} = D_{JI}^{(1)} - f_{JI} g_I^{-1} \sum_{K \ne I} f_{KI}^\dagger D_{KI}^{(1)},    (4.76)

verifying that (4.74) and (4.75) are only equal where they vanish, and that they do have all their zeros in common. Equation (4.76) is the generalization of eq. (3.6) in the 2 x 2 partitioning formalism.

In the 2 x 2 case, it was shown that the conditions D^{(2)} = 0 also arose out of the requirement that T^{-1} H T be block diagonal. In the present multiple partitioning case, this is no longer true, because of the increased complexity of the orthogonality conditions, (4.8). Since T^\dagger T = g, one has

    T^{-1} = g^{-1} T^\dagger,    (4.77)

and therefore,

    T^{-1} H T = g^{-1} T^\dagger H T = g^{-1} G.    (4.78)

In the 2 x 2 case, g could easily be made block diagonal, and in so doing, the relevant off-diagonal block of g^{-1} G became identical to D^{(2)}(f), leading to eq. (3.6). In the present case, the general block diagonalization of g is not possible, and therefore a result similar to that of the former case cannot be obtained.

Here, as in the 2 x 2 case, it is possible to calculate the operators \bar H_I using one of three different formulas, in terms of H_I^{(1)}, H_I^{(2)}, and G_I, respectively. The first form, in terms of H_I^{(1)}, as indicated in (4.41a), is not of practical interest, because it represents only a partial re-normalization of the truncated eigenvectors.
The latter two formulas are effectively identical from a practical point of view.

Consider now the case of a non-orthonormal basis. Many of the results presented earlier for the simple 2 x 2 partitioning have analogues in the present multiple partitioning formalism which are too complicated to be likely to be useful. The operators \bar H_I carry over as

    \bar H_I = g_I^{1/2} \tilde H_I g_I^{-1/2},    (4.79a)
             = g_I^{-1/2} G_I g_I^{-1/2}.    (4.79b)

Again, it is useful to distinguish two sets of effective operators of the \tilde H_I-type, namely,

    \tilde H_I^{(1)} = \tilde g_I^{-1} \hat H_I
                     = (S_{II} + \sum_{J \ne I} S_{IJ} f_{JI})^{-1} (H_{II} + \sum_{J \ne I} H_{IJ} f_{JI}),    (4.80)

and,

    \tilde H_I^{(2)} = g_I^{-1} G_I
                     = g_I^{-1} [ H_{II} + \sum_{J \ne I} (f_{JI}^\dagger H_{JI} + H_{IJ} f_{JI}) + \sum_{J \ne I} \sum_{K \ne I} f_{JI}^\dagger H_{JK} f_{KI} ],    (4.81)

I = 1, ..., m, in both cases. The relationship between these two sets of operators is found to be (see Appendix 3),

    \tilde H_I^{(2)} = \tilde H_I^{(1)} + g_I^{-1} \sum_{J \ne I} f_{JI}^\dagger D_{JI}^{(1)},    (4.82)

exactly as for an orthonormal basis. The two types of effective operators, \tilde H_I^{(1)} and \tilde H_I^{(2)}, lead to two different sets of defining conditions for the f_{JI} of the type (4.18). They are written

    D_{JI}^{(1)} = H_{JI} + \sum_{K \ne I} H_{JK} f_{KI} - (S_{JI} + \sum_{K \ne I} S_{JK} f_{KI}) \tilde H_I^{(1)},    (4.83)

and,

    D_{JI}^{(2)} = H_{JI} + \sum_{K \ne I} H_{JK} f_{KI} - (S_{JI} + \sum_{K \ne I} S_{JK} f_{KI}) \tilde H_I^{(2)},    (4.84)

where, in both (4.83) and (4.84), I, J = 1, ..., m, I \ne J. Direct application of eq. (4.82) to eq. (4.84) gives the relationship between these two types of quantities,

    D_{JI}^{(2)} = D_{JI}^{(1)} - (S_{JI} + \sum_{K \ne I} S_{JK} f_{KI}) g_I^{-1} \sum_{K \ne I} f_{KI}^\dagger D_{KI}^{(1)}.    (4.85)

Equation (4.85) is the generalization of eq. (3.15c) to the multiple partitioning case. A generalization of (3.15b) can also be obtained here, but only at the expense of a great deal of tedious algebra. The final result contains many additional terms not appearing in (3.15b), and thus is not likely to be useful.

4.4.b Implications of Inexact Solutions

The purpose of this section is to examine the errors in the effective operators H_I, G_I, and \bar H_I arising from the use of inexact f_{JI}, (I, J = 1, ..., m, I \ne J). These results closely correspond to those given in section 3.2, and thus only a brief summary is required here. Consider an approximate solution to eqs. (4.18), or (4.20) and (4.21), written as

    f_{JI}^{approx} = f_{JI} + \delta f_{JI},    (I, J = 1, ..., m; I \ne J),    (4.86)

where the f_{JI} here represent an exact solution of those equations. The error \delta f_{JI} in f_{JI} gives rise to errors in the effective operators H_I, G_I, and \bar H_I.
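The essential practical consequence, established below, is that eigenvalues computed from G_I (equivalently H_I^{(2)}) are in error only in second order in \delta f, while those computed from H_I^{(1)} are in error in first order. A small numerical experiment (NumPy, for the two-subspace case; sizes, seed, perturbation, and the ratio bounds are illustrative assumptions) makes the point:

```python
import numpy as np

rng = np.random.default_rng(3)
n, nA = 6, 2
H = rng.normal(size=(n, n)); H = (H + H.T) / 2
E, X = np.linalg.eigh(H)
f_exact = X[nA:, :nA] @ np.linalg.inv(X[:nA, :nA])

def lowest_eig_errors(eps):
    f = f_exact + eps                                # a uniform error of size eps
    H1 = H[:nA, :nA] + H[:nA, nA:] @ f               # H_I^(1)
    T = np.vstack([np.eye(nA), f])
    H2 = np.linalg.solve(T.T @ T, T.T @ H @ T)       # H_I^(2) = g_I^{-1} G_I
    e1 = abs(np.sort(np.linalg.eigvals(H1).real)[0] - E[0])
    e2 = abs(np.sort(np.linalg.eigvals(H2).real)[0] - E[0])
    return e1, e2

e1a, e2a = lowest_eig_errors(1e-3)
e1b, e2b = lowest_eig_errors(1e-4)
assert e2a < e1a              # H^(2) is already the more accurate at fixed eps
assert e1a / e1b < 50         # first-order error: shrinks roughly tenfold with eps
assert e2a / e2b > 25         # second-order error: shrinks roughly a hundredfold
```

Reducing the error in f tenfold reduces the H^{(1)} eigenvalue error roughly tenfold, but the H^{(2)} error roughly a hundredfold.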
The only complicating factor here, compared to a 2 x 2 partitioning, is that the errors in several f_{JI} will contribute to the overall error in a given effective operator.

From eq. (4.37), it is seen that

    \delta G_I = \sum_{J \ne I} (\delta f_{JI}^\dagger f_{JI} H_I + H_I^\dagger f_{JI}^\dagger \delta f_{JI}) + O(\delta^2).    (4.87)

Similarly, from (4.71),

    \delta H_I^{(1)} = \sum_{J \ne I} H_{IJ} \delta f_{JI}.    (4.88)

Using the equation,

    (g_I + \delta g_I)(H_I^{(2)} + \delta H_I^{(2)}) = G_I + \delta G_I,

obtained from the definition (4.72) of H_I^{(2)}, the error in H_I^{(2)} induced by the above errors in the f_{JI} is given by

    \delta H_I^{(2)} = g_I^{-1} [\delta G_I - \delta g_I H_I^{(2)}] + O(\delta^2).    (4.89)

In obtaining eq. (4.89), eq. (4.10) has been used to write

    \delta g_I = \sum_{J \ne I} (\delta f_{JI}^\dagger f_{JI} + f_{JI}^\dagger \delta f_{JI}) + O(\delta^2).

Using eq. (4.79), one also has

    \delta \bar H_I = \delta (g_I^{1/2} H_I^{(2)} g_I^{-1/2}).    (4.90)

A similar, though not identical, form for \delta \bar H_I is obtained if the formula (4.79) for \bar H_I in terms of G_I is used. In eqs. (4.87)-(4.90), all quantities on the right hand sides which are not incremental are exact. The formulas (4.87)-(4.90) exhibit substantial similarities with the corresponding formulas for a 2 x 2 partitioning. In fact, in most cases, it is seen that terms involving the single block f in the 2 x 2 case here contain sums over similar terms for each block f_{JI} in the Ith block column of T.

As for a 2 x 2 partitioning, the error expressions (4.87)-(4.90) are all first order in the errors \delta f_{JI}. However, the errors \delta G_I of (4.87), \delta H_I^{(2)} of (4.89), and \delta \bar H_I of (4.90) have a vanishing expectation value in first order with respect to the exact eigenvectors of these operators. The error \delta H_I^{(1)} of (4.88) does not have such an expectation value which vanishes in first order in the errors in the f_{JI}. For this reason, the H_I^{(1)} can be considered as inherently less accurate than the former three effective operators when inexact values for the elements of T are used.

CHAPTER 5

EXACT DETERMINATION OF T

"'Why,' said the Dodo, 'the best way to explain it is to do it.' (And, as you might like to try the thing yourself some winter day, I will tell you how the Dodo managed it.)"

(Alice's Adventures in Wonderland, Lewis Carroll)
Several sets of simultaneous non-linear equations, defining the off-diagonal blocks of the partitioning operator T, have been derived. In general, these equations can only be solved numerically. Some numerical iterative techniques are described in this chapter, and some assessment of their efficiency and reliability is made. A number of additional ways of defining T are also discussed, together with the numerical procedures they suggest.

The methods described can be applied in a wide range of quantum mechanical calculations. They are particularly useful when only a small number of the eigenvalues and eigenvectors, or only a projection onto a whole eigenspace (rather than the individual eigenvectors), of a hermitian operator are desired. The techniques described below represent new and practical approaches to such calculations.

5.1 The Calculation of a Few Eigenvalues of a Large Hermitian Matrix

The choice of algorithms to determine f depends to some extent on the nature of the applications which are anticipated. One important application of the methods of this chapter is the calculation of a small number of the lowest (or highest) eigenvalues, and corresponding eigenvectors, of a large hermitian matrix. Such applications arise in the determination of electronic wavefunctions for the lower lying energy levels of atoms and molecules in large scale configuration interaction calculations, and in a variety of calculations in applied mathematics and physics. The matrices arising may have dimensions up to tens of thousands (Roos, 1975).

Algorithms for the partial diagonalization of large matrices must satisfy a number of conditions to be practical.
With a matrix so large that it must be stored on some auxiliary device, rather than in the central computer memory, only small sections are available for random access at one time. Techniques which involve many successive modifications of the original matrix thus become very inefficient, and their vulnerability to significant cumulative round-off error increases with the dimension of the matrix. Further, in techniques in which the entire matrix must be brought to some standard form before the calculation of a single eigenvalue and eigenvector, the calculation of a small number of eigenvalues and eigenvectors may require nearly as much work as the calculation of all of them.

In iterative techniques, on the other hand, these difficulties can be minimized. With proper organization, small sections of the matrix can be used sequentially, and the work per iteration can be made proportional to the actual number of eigenvalues being calculated. For large matrices, this work should then also be roughly proportional to the square of the dimension of the matrix, rather than the third power.

Most iterative techniques now available for the partial diagonalization of large matrices are based on the calculation of successive corrections to some starting vector, to obtain a sequence of vectors converging to a single eigenvector.(1) Since these techniques typically use the maximization or minimization of the Rayleigh quotient with respect to the approximate eigenvector as the criterion for the calculation of the appropriate corrections, the single eigenvector obtained usually corresponds to the largest or smallest eigenvalue of the matrix.
To find other eigenvalues and eigenvectors of the matrix, the same procedure is repeated, but convergence onto previously calculated eigenvectors is prevented using one of several techniques (Shavitt, 1973).

A different approach to the partial diagonalization of a large hermitian matrix by iterative methods is provided by this eigenvalue independent partitioning formalism.

(1) See Shavitt et al. (1973); Shavitt (1970); Nesbet (1965); and Feler (1974).

If a matrix f, corresponding to the uncoupling of an n_A-dimensional subspace spanned by the desired eigenvectors, can be determined, then the calculation of these n_A eigenvalues and eigenvectors reduces to the construction and solution of an n_A-dimensional eigenvalue equation, to get truncated eigenvectors X_{AA} only, followed by the matrix multiplication,

    X^{(A)} = \begin{pmatrix} 1_A \\ f \end{pmatrix} X_{AA},

(see eq. (2.3)). The n_A eigenvalues and eigenvectors are determined simultaneously, and thus no error prone and time consuming deflation or eigenvalue shifting procedures need be employed to obtain eigenvalues greater than the smallest one. If the accuracy of the elements of f is uniform, the accuracy of the n_A eigenvalues and eigenvectors calculated should be uniform, rather than slowly deteriorating in the order in which they are calculated. These methods are especially useful when the desired eigenvalues are nearly, or exactly, equal, but well separated from the remaining eigenvalues of the matrix. Existing procedures, which consist of successive calculation of the desired eigenvalues one at a time, may perform very poorly in such a situation.

The major part of the procedures described here involves the calculation of f. In developing suitable algorithms for the iterative determination of f, two criteria were satisfied whenever possible, namely, that the amount of computation per iteration be proportional to n_A n_B^2, and that the columns or
With ng >>nj^» manipulations of nfix ng matrices (such as inversion, or the evaluation of the product of two of them) require of the order of computational operations, which is of the same order as the amount of work required to completely diagonalize the entire matrix by traditional methods.. To; maximize their accuracy, given f to some accuracy^ the; eigenvalues and eigenvectors should be computed from one of * (2) ~* *(l) the effective operators H"A » or HA. rather than from even though the latter is easier to calculate. The computed eigenvalues will then be accurate to second order in the error in f (see section 3*2).. For ng, >>J*A, the calculation of GA requires of the order of ng nA computational operations. The remainder of the calculation, including the calculation of HA ' or HA, if desired, the diagonalizatiom of the nA x nA (A) effective operator, and the determination: of Xx all represent negligible additional computation. 104 5.2 2x2 Partitioning — Orthonormal Basis 5.2.a General Considerations This section is concerned with the determination of f by solution of eq. (2.16). D(f) = HM • HBBf - fnA « 0. (5.D This matrix equation represents a system of nAnB simultaneous nonlinear equations for the individual matrix elements for* A general solution can be written down in only two special cases. If the hamiltonian is already block diagonal, then*, clearly, f 88 0. If the diagonal blocks of H vanish, so that H is block off-diagonal or "alternant",, then (5*1) reduces to HBA " "AR*" °» <5.2) which has the solution, f " <HBAKAB>"*HBA * HBA(HABHBA>'*' as can be verified by direct substitution.. When H does not have one of the special forms mentioned above, some iterative procedure or perturbation method must be used to solve (5*1)• Iterative methods to successively correct the approximation to a solution are considered here. Perturbation methods are discussed in the following chapter. Among the simplest iterative techniques to apply are those in which eq. 
(5.1) is rewritten as a fixed point problem,

    f = F(f) \equiv f + \mathcal{N}^{-1}[D(f)],    (5.4)

where \mathcal{N} is some non-singular, possibly f-dependent superoperator. Successive substitutions, f_{m+1} = F(f_m), starting from some initial guess f_0, give the scheme,

    \delta f_{m+1} = \mathcal{N}^{-1} D(f_m),    (5.5a)
    f_{m+1} = f_m + \delta f_{m+1},    (m = 0, 1, 2, ...),    (5.5b)

hopefully convergent to a solution of (5.1). If the sequence {f_m} converges, the rate of convergence will in general be linear if \mathcal{N} is independent of f.

Iterative procedures with better than linear convergence invariably involve the use of an f-dependent operator \mathcal{N}. The Newton-Raphson procedure is the simplest of this type. The generalized Newton-Raphson equations,

    J(f_m) \delta f_{m+1} = -D(f_m),    (5.6)

are a special case of (5.5a) in which \mathcal{N} is the negative of the Jacobian matrix, J(f), which consists of the first derivatives of the elements of D(f) with respect to the elements of f. Iteration on eqs. (5.6) and (5.5b) results in a second order convergent sequence {f_m}.(1) That is, the error in the estimate, f_m, of f after the mth iteration is given as a linear combination of second order products of the errors in f_{m-1}, the result of the previous iteration (in the sense described in Appendix 5), so that convergence becomes very rapid as the solution is approached. For eq. (5.6), as for any iteration formula of order greater than one, convergence will always occur if a sufficiently accurate initial approximation, f_0, can be obtained. For linear iteration functions, there need not be any initial estimate of f which will lead to convergence.

(1) See Rall (1969), especially section 12; Traub (1964); and also, Appendix 5 of this thesis.

A set of related iterative procedures with high order convergence properties can be generated according to the scheme,

    \delta f_m^{(1)} = -J^{-1}(f_{m-1}) D(f_{m-1}),
    \delta f_m^{(2)} = -J^{-1}(f_{m-1}) D(f_{m-1} + \delta f_m^{(1)}),    (5.7)
    ...
    \delta f_m^{(j)} = -J^{-1}(f_{m-1}) D(f_{m-1} + \delta f_m^{(1)} + ... + \delta f_m^{(j-1)}).

It can be shown that the error in f_m = f_{m-1} + \sum_{k=1}^{j} \delta f_m^{(k)} is a linear combination of (j+1)th order products of errors in f_{m-1} (Traub, 1964).
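For the 2 x 2 problem (5.1), the Newton-Raphson step (5.6) is a Sylvester-type linear equation for \delta f, and its second order convergence is easily exhibited. The sketch below (NumPy; the dense Kronecker-product solve shown is feasible only for small n_A n_B, which is precisely the practical difficulty discussed next; sizes, seed, and thresholds are illustrative assumptions) shows the residual norms collapsing quadratically:

```python
import numpy as np

rng = np.random.default_rng(6)
n, nA = 8, 2
nB = n - nA
H = rng.normal(size=(n, n)); H = (H + H.T) / 2
H += np.diag(np.r_[np.zeros(nA), 10.0 * np.ones(nB)])   # separate the two blocks
HAA, HAB = H[:nA, :nA], H[:nA, nA:]
HBA, HBB = H[nA:, :nA], H[nA:, nA:]

D = lambda f: HBA + HBB @ f - f @ HAA - f @ HAB @ f     # eq. (5.1)

f = np.zeros((nB, nA))
res = []
for _ in range(6):
    res.append(np.linalg.norm(D(f)))
    A = HBB - f @ HAB                  # multiplies delta-f on the left
    B = HAA + HAB @ f                  # multiplies delta-f on the right
    # Newton-Raphson step, eq. (5.6): A df - df B = -D(f), solved via vec/Kronecker
    J = np.kron(np.eye(nA), A) - np.kron(B.T, np.eye(nB))
    df = np.linalg.solve(J, -D(f).ravel(order="F")).reshape((nB, nA), order="F")
    f = f + df

assert res[3] < 1e-5 and res[4] < 1e-10    # errors roughly square from step to step
```

The n_A n_B x n_A n_B system solved here is exactly the object whose size makes the full Newton-Raphson method unattractive for large matrices.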
The advantage of using an iteration formula of the type (5.7) is that the Jacobian matrix, which is typically of large dimension (n_A n_B x n_A n_B here), need be constructed and inverted only once for each cycle of the type (5.7).

Iteration schemes with second order convergence require the evaluation and manipulation of the (n_A n_B)^2 first derivatives of D(f). Similarly, third order convergent iteration schemes generally require the evaluation and manipulation of the (n_A n_B)^3/2 second derivatives of D(f). Algebraic expressions for these sets of derivatives are easily obtained. Third and higher order derivatives of D(f), eq. (2.16), with respect to f are zero.

For the particular application to large matrices, these iteration schemes with better than linear convergence involve the manipulation of unacceptably large amounts of information. The solution of eq. (5.6) for \delta f_{m+1} involves of the order of (n_A n_B)^3 computational operations, and for n_B of the order of n >> n_A, this is comparable to the amount of work required to diagonalize H completely. For n_B >> n_A, a third order formula involves an even larger amount of work per iteration, equivalent to the complete diagonalization of the matrix many times over. For large matrices, it is therefore necessary to concentrate on computationally efficient, linearly convergent iteration procedures.

When H is diagonally dominant, with the diagonal elements of H_{AA} closely grouped about the value \lambda_A, the simple choice

    \mathcal{N} = (\lambda_A 1_B - H_{BB}^{(d)}) \otimes 1_A,    (5.8)

(direct product notation) suggests itself. Here H_{BB}^{(d)} is the diagonal part of H_{BB}. This gives an iteration scheme based on the correction

    \delta f_{\sigma r} = D_{\sigma r}(f) / (\lambda_A - H_{\sigma\sigma}),    (5.9)

closely related to degenerate perturbation theory. In eq. (5.9), and throughout the treatment of the 2 x 2 case, Greek letters refer to basis elements in S_B, and Roman letters to basis elements in S_A. The iteration index m will be dropped wherever the context does not require it. More generally, for diagonally dominant matrices, the simple choice,

    \mathcal{N}_{\sigma r, \rho t} = (H_{rr} - H_{\sigma\sigma}) \delta_{\sigma\rho} \delta_{rt},    (5.10)

leads to the corrections,

    \delta f_{\sigma r} = D_{\sigma r}(f) / (H_{rr} - H_{\sigma\sigma}),    (5.11)

also closely related to perturbation theory. The procedure based on (5.11) will be designated as the "Simple Perturbation" (SP) algorithm. Numerical calculations indicate that it converges well only when the diagonal elements of H are ordered monotonically, and when the diagonal elements of H_{AA} are well separated from those of H_{BB}. Details of test calculations, using this and other algorithms, are given in section 5.2.g.

A better approach is to base the choice of \mathcal{N} on approximations to the appropriate Newton-Raphson equations. As demonstrated in Appendix 5, these methods are still linearly convergent, but hopefully exhibit some of the stability of the Newton-Raphson equations over a range of problems. Different approximations to (5.6) lead to algorithms exhibiting different rates of linear convergence. In assessing the computational efficiency of such algorithms, however, it is necessary to consider both the amount of computation per iteration and the number of iterations required to obtain desired accuracy.

During the iterative solution of (5.1), the required f-dependent quantities must be evaluated using the current approximation to f. Thus, the considerations in sections 3.1 and 3.2 are relevant here, and it is useful to classify the algorithms developed below according to the way in which the f-dependent quantities involved are evaluated.

5.2.b Methods Based on D^{(1)}(f)

If f_0 is an approximation to the solution of eq. (5.1), and \delta f is the exact correction, so that f = f_0 + \delta f is the exact solution of D^{(1)}(f) = 0, then it follows from the definition, (3.4), of D^{(1)}(f), that

    H_B^{(1)}(f_0)^\dagger \delta f - \delta f H_A(f) = -D^{(1)}(f_0).
The iteration index mi will be dropped wherever the context does not require it* More generally, for diagonally dominant matrices, the simple choice,, 108 Heads to the corrections, Dar<f) &far = ~ffl . (5.U) Hrr ~ Hoo also closely related to perturbation theory. The procedure based on (5*11) will be designated as the "Simple Perturbation" (SP) algorithm* Numerical calculations indicate that it converges well only when the diagonal elements of H. are ordered monotonicallyj and when the diagonal elements of are well separated from those of K^g* Details of test calculations, using this and other algorithms, are given in section 5*2.g. A better approach is to base the choice of on approxi- ( mations to the appropriate Newton-Raphson equations. As demonstrated in Appendix 5? these methods are still linearly convergent, but hopefully exhibit some of the stability of the Newton-Raphson equations, over a range of problems. Different approximations to (5*6) lead to algorithms exhibiting different rates of linear convergence. In assessing the compu tational efficiency of such algorithms, however, it is neces sary to consider both the amount of computation per iteration and the number of iterations required to obtain desired accuracy. During the iterative solution of (5*1)* the required f-dependent quantities must be evaluated using the current approximation to f. Thus,, the considerations in sections 3*1 109. andi 3.2 are relevant here, and it is useful to classify the algorithms developed below according to the way in which the f-dependent quantities involved are evaluated. 5.2.b: Methods Based on D(1*(f) If f is an approximation, to the solution of eq. (5.1)t and 6f is the exact correction, so that f * f + 6f Is the o exact solution of D^^(f) « 0, then, it follows from the definition, (3.4), of D^Cf)* that HBl)(f0)t6f - 6fHA(f) » -D(1)(f0). 
(5.12) This is an exact equation for 6f• The Newton-Raphson equations for the system D* '(f) = 0 are 4l)(fo)t6f - ^A1>(fo) " -B(1)(V» (5-13) the matrix elements of the Jacobian in this case being, D(D ort Equation (5*13) differs from the exact equation (5«12) only in that the exact operator HA(f) appearing in (5*12) is replaced by the current approximation HA*^(fQ) in (5*13). Despite the sparseness of the Jacobian matrix here, the Newton-Raphson method is still computationally inefficient. A nom-iterative method, such as Gaussian elimination, for solution of (5*13)• does not easily allow proper exploitation 110. of the blocked structure of the Jacobian.. Straightforward application) of the Gauss-Seidel method and its refinements, to the determination of 6f from (5.13) with J^(f) and D^U) fixed* does allow the sparseness of the Jacobian matrix to be exploited* However, such a procedure is inefficient in that it does not make use of all the information available about f at all times if J*1^ and D^1^ are held fixed during the iteration to determine 6f* Thus, a modified Gauss-Seidel procedure applied to (5*13) is required* The simplest linear iteration formula based on the Newton-Raphson equations, (5*13)» is one in which the operator A" in (5.4) is taken as the negative of the diagonal part of the Jacobian matrix. The successive corrections to f„^ are then or given by D(D 6for » 22 (5.15) VflA ;rr v a 'aa Im view of the simplicity of the matrices involved, the most efficient computational procedure is to change only one element of f at a time, calculating B^} at that time, and updating H^ and the diagonal part of Kg1** continually* After a change im a single for» these quantities are easily updated because they are linear in fr (^'isr8 Hso6for • (s " 1* "- nA>» (5'l6a) and (&K(1)t)rt„ " -*f„ H « •.(fiHj^)^ . 
(5.16b) oo or ro A rr (1) Calculation of D^*' as required involves the same number of computational operations per sweep through 6f as the continual updating of (for which D^ must be stored), but there is a likelihood of significant accumulation! of round-off error as the solution is approached if such an updating procedure is used for D^. Where the diagonal elements of are fairly well separated from those of H^, the usual starting approxi mation is f = 0., In this case, the starting approximations to 7 and ' are simply and HfiB. The iterative scheme based on (5»15) and (5«l6) will be referred to here as the "Simple Diagonal Newton-Raphson" (SDNR) algorithm. A precise statement of computational details is given in Appendix 4.. The idea of the correction &fior,, calculated in (5.15), is; that it should reduce the corresponding B^} approximately ta> zero. This may be far from true early in the calculation if (1) 6for is large* The change &far required to reduce D^r' exactly to zero can be determined fronr (5.12)» The result is a quadratic equation in 6£0rt namely, «ro*4 + C<^il>>rr-<4B1>t>«.3i*«-DaJ> = °* <5'l7> The iterative scheme based on this equation will be referred to as the "Quadratic Diagonal Newton-Raphson" (QDNR) algorithm. Precise details are given in Appendix 4. If (5»17) has two real roots, the desired correction is the one of smallest magnitude numerically. When (H^1*) -(fig1**) is much greater than either or both of B^} or H^, this correction 112. differs negligibly from that given by (5.15). 
As the solution of (5»1) is approached, and the magnitude of D^J becomes progressively smaller compared to the other coefficients ih (5*17) (which are constant, or effectively constant once a reasonable approximation to f is achieved), it is necessary to use the formula for the root of a quadratic equation with a rationalized numerator to avoid serious round-off error, that is, 2XDii) or 6f or where (5.18) X* •«»C(Kil))rr-(Hi1)t)co]»- (5.19) when all coefficients are real. This equation can be used instead of eq. (5*15) incases where difficulty is experienced in establishing convergence. *(l) *(l) If diagonal elements of Hj^ and Kg ' become very nearly equal at some stage of the iterative calculation, eq. (5.15) nay lead to divergence. Such diverging tendencies may be damped if (5.18) is used. On the other hand, situations occur in which eqs. (5.18) accelerate the divergent process. The results of some numerical calculations using both of these algorithms are included in section 5*2.g and Table 5«1. If diagonal elements of H^ and Hgg. are equal, it is necessary either to use a non^zero starting approximation for f, or to use algorithm QDNR initially, since application of the 113. SDNR algorithm may lead to a division by zero early ih the calculation. It is unlikely that either of these stratagems will lead to a rapidly converging calculation, however, unless a reasonable separation is soon established between the diag-onal elements of and H^. In the limit tig > > n^, the quadratic algorithm, QDNR, requires effectively the same amount of computation per sweep through 6f: as the linear algorithm SDNR. In both cases,, the time consuming part of the calculation is the evaluation of Dor*, and'possibly, the updating of H^ and Hg1^, rather than the calculation of 6fffr from either (5»15) or (5.18). 
Iteration on (5.15) for δf_σr, while updating (H_A^(1))_rr but keeping (H_B^(1))_σσ and D_σr^(1) fixed, is equivalent, if convergent, to using (5.18). This is not necessarily an efficient procedure, however.

5.2.c Methods Based on D^(2)(f)

The operator H_A^(1)(f) appearing in D^(1)(f) must be considered to have errors of the same order as those in f itself. As shown in chapter 3, however, the error in H_A^(2) is smaller, in some sense, the eigenvalues being unaffected in first order by a first order error in f. This insensitivity can be exploited in iterative procedures for solving the equation

    D^(2)(f) = H_BA + H_BB f - f H_A^(2) = 0,    (5.19)

which has the same solutions as does eq. (5.1).

Because of the inverse operator g_A^{-1} in H_A^(2), the exact Jacobian matrix of D^(2)(f) is no longer simple,

    J^(2)_{σr,ρt} = H_σρ δ_rt - δ_σρ (H_A^(2))_tr - Σ_s f_σs ∂(H_A^(2))_sr / ∂f_ρt.    (5.20a)

Here, one has

    ∂(H_A^(2))_sr / ∂f_ρt = (g_A^{-1})_st D^(2)_ρr + [g_A^{-1}(H_AB + f†H_BB)]_sρ δ_tr - (g_A^{-1} f†)_sρ (H_A^(2))_tr.    (5.20b)

Because of the term involving the derivatives of the elements of H_A^(2), the exact Jacobian matrix is not at all sparse in general, unlike J^(1) of eq. (5.14). However, since H_A^(2) varies slowly with f near the solution of (5.19), it is expected that those elements of J^(2) arising solely from the third term of (5.20a) will be relatively smaller than the remaining non-zero ones. On neglecting this term in (5.20a), the approximation

    J̃^(2)_{σr,ρt} = H_σρ δ_rt - (H_A^(2))_tr δ_σρ,    (5.21)

is obtained. This gives the simple equation

    H_BB δf - δf H_A^(2) = -D^(2)(f),    (5.22)

as an approximation to the Newton-Raphson equations for the system (5.19). In contrast to (5.13), this equation involves the original H_BB only, and not some modified n_B x n_B matrix. On the other hand, it is more complicated to update H_A^(2) than H_A^(1).
For any change δf in f, the change in H_A^(2) is given exactly by

    δH_A^(2) = g_A^{-1(new)} G_A^(new) - g_A^{-1(old)} G_A^(old)
             = g_A^{-1(new)} [ δG_A - δg_A H_A^(2) ]
             = g_A^{-1(new)} [ δf†(D^(2) + H_BB δf - δf H_A^(2)) + W_BA† δf - f† δf H_A^(2) ],    (5.23)

where

    W_BA = H_BA + H_BB f.    (5.24)

All quantities on the right side of (5.23) are values before updating, except where explicitly indicated. Since an n_A x n_A matrix inversion is required for each updating of H_A^(2), the use of (5.22) is efficient only if groups of elements of f are changed simultaneously before updating H_A^(2). In application to large matrices, it is most efficient to change entire n_A-dimensional rows of f at one time. For n_B >> n_A, this leads to an algorithm requiring work per iterative sweep through δf comparable to algorithm SDNR (that is, of the order of n_A n_B² computational operations per sweep). As in SDNR, only single columns of the block H_BB are required at one time.

Two iterative methods based on eq. (5.22) appear useful. The first is the simplest diagonal approximation, which corresponds to taking M of eq. (5.4) as the negative of the diagonal part of J̃^(2). This leads to the iteration formula

    δf_σr = D^(2)_σr / [ (H_A^(2))_rr - H_σσ ],  (r = 1, ..., n_A).    (5.25)

When δf_σr is given by this equation, the expression (5.23) for δH_A^(2) simplifies somewhat to

    δH_A^(2) = g_A^{-1(new)} [ -(f^(new)†)_Aσ δf_σA H_A^(2) + (W_BA†)_Aσ δf_σA + (δf†)_Aσ δf_σA H_A^{(2)d} ],    (5.26)

where H_A^{(2)d} is the diagonal part of H_A^(2), and where (W_BA†)_Aσ, (f^(new)†)_Aσ, and (δf†)_Aσ refer, respectively, to the σth columns of W_BA†, f^(new)†, and δf†, while δf_σA denotes the σth row of δf.

The second method is to treat the n_A equations in (5.22) for each fixed σ as a system of simultaneous linear equations for the δf_σr, (r = 1, ..., n_A).
This corresponds to taking M to be block diagonal, each diagonal block being the negative of the diagonal block of J̃^(2) referring to a row of δf. The resulting iteration formula can be written

    δf_σA = -D^(2)_σA [ H_σσ 1_A - H_A^(2) ]^{-1},    (5.27)

which, in practice, involves the solution of a system of n_A simultaneous linear equations in n_A unknowns. For this change δf, the first term of (5.23) vanishes, so that the updating formula for H_A^(2) reduces to

    δH_A^(2) = g_A^{-1(new)} [ (W_BA†)_Aσ δf_σA - (f†)_Aσ δf_σA H_A^(2) ].    (5.28)

This method involves somewhat more computation per sweep through δf than the preceding one, but may be expected to converge in fewer overall iterations in certain cases where the off-diagonal elements of H_AA are large.

The two procedures described above will be referred to as the "Diagonal Generalized Nesbet" (DGN) and the "Full Generalized Nesbet" (FGN) algorithms, respectively. A precise statement of computational details is given in Appendix 4. In the case n_A = 1, they both reduce to an algorithm of Nesbet (1965). There are also certain similarities to that of Davidson (1975). Test calculations using them are described in section 5.2.g.

5.2.d Solution of the Newton-Raphson Equations by Descent Methods

The approximation of the full Newton-Raphson equations by much simpler equations, to avoid prohibitively costly calculations, reduces both the rate of convergence and the range of calculations for which convergence occurs. A major factor in non-convergence of any of the algorithms SDNR, QDNR, DGN, or FGN must be the neglect of some or all of the coupling between elements of δf in the Newton-Raphson equations. The successive correction of individual (or at most a few) elements of f can lead to very slow convergence ("spiralling"), and also to divergence in the case of systematic over-estimation of the elements of δf. It is desirable to vary all of the elements of f simultaneously, but using methods which are less costly than solving the Newton-Raphson equations exactly.
The Gauss-Seidel method applied to the full Newton-Raphson equations, with updating of J and D only after one or more sweeps through δf have been completed, is one possible way to approximate the coupling between the elements of δf. However, in this subsection, alternative methods based on the minimization of the residual, Jδf + D, of the full Newton-Raphson equations will be examined.

When δf is real, the solution of the Newton-Raphson equations is equivalent to determination of the stationary points of

    Q(δf) = ½ δf† J δf + δf† D,    (5.29)

considered as a function of δf. Further, if J is positive definite, the solutions of the Newton-Raphson equations are equivalent to local minima of (5.29), so that δf can be determined using a gradient minimization technique. The eigenvalues of J^(1), eq. (5.14), are evidently the differences between the eigenvalues of H_B^(1) and H_A^(1). Thus, as long as all the eigenvalues of H_B^(1) are larger than all the eigenvalues of H_A^(1), the Jacobian J^(1) will be positive definite. The eigenvalues of the Jacobian matrix J^(2) are less easy to deduce, because of the terms involving the derivatives of H_A^(2) in eq. (5.20a). If these derivatives are sufficiently small, then J^(2) will be positive definite as long as some minimum separation is maintained between the largest eigenvalue of H_A^(2) and the smallest eigenvalue of H_BB. The condition that the Jacobian be positive definite implies that it is the lowest eigenvalues which are sought. If the Jacobian matrix is not positive definite, the solution of the Newton-Raphson equations is equivalent to minimization of the functional,
    Q̃(δf) = (J δf + D)†(J δf + D) = D†D + D†J δf + δf†J†D + δf†J†J δf,    (5.30)

which is more difficult to handle than (5.29) because of the generally large dimension of J†J.

When J is positive definite, the use of an iterative gradient minimization technique (such as the method of steepest descents or the method of conjugate gradients) to calculate δf by minimizing (5.29), while holding J and D fixed, involves modifying δf as a whole by successive amounts a_i v_i, where a_i is a scalar step length chosen to minimize Q along the search direction v_i. The search directions v_i are chosen equal to, or related to, the directions along which Q changes most rapidly. Computational details of the application of these minimization techniques to quadratic forms like (5.29) are given by Ralston (1965, pp. 439-445).

If the steepest descent method is used with the Newton-Raphson equations based on D^(1)(f), the most costly part of the minimization iteration is the determination of the step lengths a_i, which involves evaluation of the scalar product

    v_i† J^(1) v_i = Σ_{σ,ρ,r} (v_i)_σr H_σρ (v_i)_ρr - Σ_{σ,r,t} (v_i)_σr (H_A^(1))_tr (v_i)_σt.    (5.31)

For n_B >> n_A, this requires of the order of n_A n_B² computational operations. In the conjugate gradient method, an additional n_A n_B² operations are required to evaluate the product J^(1)v_i, necessary in determining v_{i+1}. Thus, if m minimization iterations are carried out, the calculation of δf, including the initial evaluation of D^(1) and H_A^(1), requires of the order of (m+2) n_A n_B² operations using steepest descents, and a minimum of (2m+2) n_A n_B² operations for conjugate gradients.
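One inner minimization of this kind is easily sketched: steepest descents on the quadratic form (5.29), with the approximate Jacobian of eq. (5.14) applied as the operator x -> H_BB x - x H_A^(1), and the trace scalar product. The routine below is an illustration under the stated positive-definiteness assumption, not the thesis implementation.

```python
import numpy as np

def solve_nr_by_descent(HBB, HA1, D, iters=200):
    """Minimize Q(x) = 0.5 <x, Jx> + <x, D> by steepest descents, where
    J x = HBB @ x - x @ HA1 and <a, b> = trace(a^T b).  Requires every
    eigenvalue of HBB to exceed every eigenvalue of HA1, so that J is
    positive definite.  Returns an approximate solution of J x = -D."""
    x = np.zeros_like(D)
    for _ in range(iters):
        g = HBB @ x - x @ HA1 + D          # gradient of Q at x
        Jg = HBB @ g - g @ HA1             # J applied to the gradient
        denom = np.tensordot(g, Jg)        # <g, Jg>, trace inner product
        if denom <= 0.0:                   # guard: J not positive definite
            break
        alpha = np.tensordot(g, g) / denom # exact line minimization
        x -= alpha * g
    return x
```

Each iteration costs one application of J, i.e. of the order of n_A n_B² operations, matching the count quoted above.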
This is roughly equivalent to m+2 and 2m+2 iterations, respectively, of the algorithms discussed in the previous three subsections. The advantage here is that a very good estimate of δf may be obtained for small m, because the first few iterations in such minimization techniques frequently result in the greatest movement towards the minimum. Coupling between the elements of δf is taken into account here, while the computation per iteration is still proportional to n_A n_B² for n_B >> n_A, as is desired.

While such descent methods are not expected to be of much use in application to large matrices, they are very useful in the slightly more complicated self-consistent field problem in molecular orbital theory (see chapter 8), where it is considerably more costly to update D(f) and J(f), because of the complicated dependence on f of the matrix being block diagonalized.

5.2.e Extremizing the Trace

Another alternative to eq. (5.1) is to determine f such that the trace of the matrix H over the eigenspace S_A is stationary (see section 2.1.e), that is, such that

    E(f+δf) - E(f) = δE = tr[P_A(f+δf) - P_A(f)]H = δ tr P_A H,    (5.32)

vanishes to first order in δf. From eq. (2.39), this is equivalent to the vanishing of the quantity

    D̄ = ∇_f E = g_B^{-1} D^(1)(f) g_A^{-1},    (5.33)

D̄_σr being the derivative ∂E/∂f*_σr. The first-order variation of D̄ with respect to the elements of f and f† is given by

    δD̄ = g_B^{-1} [H_B^(1) δf - δf H_A^(1)] g_A^{-1} - g_B^{-1} [δf f† + f δf†] D̄ - D̄ [δf† f + f† δf] g_A^{-1}.    (5.34)

Thus, the Newton-Raphson equations for the system D̄ = 0 can be written

    δD̄ = -D̄,    (5.35)

with δD̄ given by (5.34). On multiplying from the right by g_A and from the left by g_B, these become

    H_B^(2) δf - δf H_A^(2) = -D^(1) + R,    (5.36)

where the remainder R collects the terms bilinear in δf and D^(1) or D^(2). If D^(1) and D^(2) are considered to be of the same order as δf as the solution is approached, then R is of higher order than the remaining three terms. If this term is neglected, the resulting equation is of the type (5.13), with the operators H_A^(1) and H_B^(1) replaced, respectively, by H_A^(2) and H_B^(2).
Because the difference between H_A^(2), H_B^(2) and H_A^(1), H_B^(1) is of the same order as the term neglected, the resulting approximate equation is not necessarily an improvement over (5.13), despite the presence of the more accurate effective operators.

The exact equations, (5.35) and (5.36), could be significant for gradient minimization techniques, which can be set up so that divergence cannot occur. However, the evaluation of H_B^(2) involves the inversion of the n_B x n_B matrix, g_B, as well as the formation of the product g_B^{-1}G_B. For n_B >> n_A, these two computations could be prohibitive.

5.2.f Minimization of the Norm of D

A further alternative to determining f by solution of (5.1) is to minimize the square of the Hilbert-Schmidt norm of D,

    ||D||² = Σ_{σ,r} |D_σr|²,    (5.37)

with respect to f. For real f, the required gradients are

    ∂||D||²/∂f_τs = 2 Σ_{σ,r} D_σr (∂D_σr/∂f_τs) = 2 (H_B^(1)† D - D H_A^(1)†)_τs.    (5.38)

This approach is attractive because it involves a suitable convergence criterion directly. If a gradient minimization technique is used, it is easy to ensure that a maximal rate of convergence is maintained. By minimizing ||D|| itself along some search direction in each iteration, problems of overshoot and undershoot can largely be avoided.

5.2.g Test Calculations

The algorithms described in sections 5.2.a - 5.2.c have been applied to a series of matrices based on that considered by Nesbet (1965). The off-diagonal elements of these matrices are all unity, and the diagonal elements are some combination of the first n odd integers, 1, 3, 5, ... . Matrices with dimensions up to 250 x 250 were considered, this being sufficient for testing. The calculations were carried out on an IBM 370/168 computer using double precision arithmetic. The convergence criterion was based on the Hilbert-Schmidt norm, ||D|| = (tr D†D)^½, of the particular form of D(f) used in each method. A criterion based on the maximum change δf_σr in the elements of f during an iterative sweep can also be useful.
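The test matrices just described are trivial to generate; the helper below is an illustrative reconstruction, not the original program.

```python
import numpy as np

def nesbet_matrix(n, order=None):
    """Test matrix of section 5.2.g: all off-diagonal elements unity,
    diagonal elements the first n odd integers (optionally permuted via
    `order` to vary the partitioned subspace S_A)."""
    H = np.ones((n, n))
    d = np.arange(1, 2 * n, 2, dtype=float)   # 1, 3, 5, ...
    if order is not None:
        d = d[list(order)]                    # re-order diagonal elements
    np.fill_diagonal(H, d)
    return H
```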
In all examples, the basis space, S_A, is defined by the first n_A basis functions in order, so that re-ordering the diagonal elements of H is equivalent to varying S_A. For convergent calculations, it was found, except for the first few iterations in some cases, that log||D|| is usually well approximated as a linear function of the iteration number. That is, convergence was linear once the calculation stabilized, with the value of ||D|| decreasing on the average by some constant factor for each iteration. This factor can be regarded as an average asymptotic error constant. Table 5.1 gives these error constants (or convergence rates) for a number of examples, to illustrate the effects of varying the size of the matrix, varying the differences between the diagonal elements of H_AA and H_BB, and varying the ordering of the diagonal elements of the full matrix to change S_A. For comparison, Nesbet's algorithm (Nesbet, 1965) was used to obtain a single eigenvalue of each of the matrices considered. The square root, σ, of the variance for the approximate eigenvector, as defined in eq. (2.49), was used as the convergence criterion in this case, and log σ was also found to be a linear function of the iteration number. Note that the smallest numbers in Table 5.1 represent the fastest convergence.

For the basic Nesbet matrix, with S_A the space corresponding to the n_A smallest (or largest) diagonal elements of H, all methods converge to give the eigenspace of the n_A smallest (or largest) eigenvalues. The rate of convergence varies little with n_A, either increasing or decreasing slightly as n_A increases. When the largest eigenvalues are sought (or, equivalently, when the off-diagonal elements are -1), convergence is considerably poorer for n_A = 1 than for n_A = 5, except for algorithm DGN. The five algorithms tested have rates of convergence generally comparable to the Nesbet method

TABLE 5.1  Linear(a) Convergence Rates of the Algorithms in Selected Calculations.
[The body of Table 5.1 — columns giving the method (SP, SDNR, QDNR, DGN, FGN, Nesbet), n_A, the matrix dimension, the ordering of the diagonal elements, and the average convergence factor per iteration — is not recoverable from this scan. The footnotes survive:]

(a) The tabulated numbers represent the average factor by which the norm ||D|| is decreased per iteration, once a linear convergence rate is established. S_A is spanned by the first n_A basis functions. All off-diagonal elements are unity. The numbers are obtained by a least squares calculation of the slope of log||D|| as a function of iteration number.
(b) The number of iterations before linear convergence is established is indicated in brackets to the right of the convergence factor, when not zero.
(c) The eigenvalues of this matrix are 0.386, 2.461, 4.519, 6.753, 8.629, 10.691, 12.766, 14.868, 17.037, 22.072.
(d) Converges to the eigenvalues 0.386, 2.461, 4.519, 6.753, 10.691.
(e) Converges to the eigenvalues 0.386, 2.461, 4.519, 8.629, 10.691.
(f) Converges to the eigenvalues 0.386, 2.461, 4.519, 10.691, 12.766.
(g) Apparently converges to the eigenvalues 0.386, 2.461, 4.519, 10.691, 14.868.
(h) Converges to the eigenvalues 10.691, 12.766, 14.868, 17.037, 22.072.
(i) Converges to the eigenvalues 8.629, 12.766, 14.868, 17.037, 22.072.
(j) ||D|| is oscillatory.
(k) ||D|| is apparently divergent.
(l) ||D|| becomes constant (≈ 4.34) after 25 iterations.
(m) σ becomes constant or increases very slowly after about 50 iterations.
in cases where convergence is straightforward, as it is when the diagonal elements of H are ordered monotonically and are well separated. Frequently, large matrices arising in various applications, for which only a few of the lowest eigenvalues and their eigenvectors are required, have diagonal elements arranged in roughly increasing order, with variation in the diagonal elements large compared to individual off-diagonal elements. As seen from the results in the first part of the table, these algorithms are well suited for such calculations.

The simple perturbation (SP) algorithm generally exhibits the poorest convergence rates in these examples, as may be expected, since it represents the crudest approximation to the exact Newton-Raphson equations. The algorithm DGN works relatively poorly in two cases for n_A = 5. Presumably, one of the diagonal elements of H_A^(2) approaches one of H_BB too closely during the calculation. The algorithm QDNR has convergence rates identical to SDNR, because these two calculations differ significantly only at initial stages, before linear convergence is established.

The effect of varying the spaces S_A and S_B, as defined by the associated diagonal elements of H, is illustrated by the third part of Table 5.1. Rates of convergence deteriorate markedly when one or more diagonal elements of H_AA exceed at least one diagonal element of H_BB. The algorithms DGN and FGN sometimes converge to different eigenspaces S_A than do SDNR and QDNR. It is noteworthy that QDNR gives no improvement over SDNR in these examples, and is actually non-convergent in one case where SDNR converges well. The uncertain convergence is presumably due to one of the differences (H_B^(1))_σσ - (H_A^(1))_rr, or H_σσ - (H_A^(2))_rr, appearing in the denominators of the iteration formulas, becoming small or changing sign. The iteration formulas become ill-conditioned or even singular under such circumstances.
The presence of an "induction period" before linear convergence is established is presumably associated with an initial uncertainty in the selection of a space S_A, when the diagonal elements of the approximate A-space and B-space effective operators are not well separated.

In principle, the space S_A specified by the calculated f may correspond to any group of n_A eigenvalues of the matrix H. Thus, in principle, any subset of n_A eigenvalues, none of whose eigenvectors are orthogonal to the subspace S_A of the full basis space, can be calculated without previous determination of any of the other eigenvalues. However, the first three sections of Table 5.1 show that the iterative calculation converges best when S_A corresponds to the n_A lowest (or highest) eigenvalues of the matrix, and S_A to the smallest (or largest) diagonal elements of the matrix. Deviations from this arrangement entail considerable risk of poor convergence or no convergence at all. If n_A of the lowest m eigenvalues (m > n_A) are desired, these convergence problems can be avoided by prediagonalizing a block of H containing the m smallest diagonal elements, as described in section 5.3.e.

The last section of Table 5.1 shows the superiority of the algorithms developed here over Nesbet's algorithm when the lowest few diagonal elements of the matrix are nearly equal, but well separated from the remaining ones. Generally, the calculated f-operator must correspond to the space S_A spanned by the eigenvectors belonging to all of the nearly equal eigenvalues if good convergence rates are to be obtained. However, a surprising feature of the results is that the algorithm SDNR performs very well even when a diagonal element of H_BB is relatively close to one of H_AA.

These computations do not indicate any clear-cut superiority of one algorithm in all cases. When convergence is straightforward, all converge effectively equally rapidly.
When convergence is not straightforward, any one of the methods may be more stable or rapidly convergent than the others. However, the algorithm DGN appears to be generally less successful than SDNR and FGN. The simple diagonal Newton-Raphson procedure, based on D^(1)(f) = 0, is somewhat easier to program efficiently for n_A > 1 than the methods based on D^(2)(f), and from this standpoint is particularly attractive. In fact, in most cases, the rates of convergence for this method compare very favourably with those of the other, more complex, methods. The extra computation involved in using QDNR rather than SDNR appears to be of little value in general, even though it represents a negligible amount of additional work as n_B becomes very large. While SDNR yields only the approximation H_A^(1) directly, a calculation of H_A^(2) at the end of the iterative sequence takes only of the order of the time of one iteration, and can be carried out if the eigenvalues and eigenvectors of H corresponding to S_A are desired. For n_A = 1, SDNR offers an alternative to Nesbet's method, of comparable efficiency.

5.3 Generalization to a Non-orthonormal Basis — 2x2 Case

5.3.a General Considerations

When the basis is not orthonormal, the relation of the two off-diagonal blocks of the partitioning operator is more complicated than before (see section 2.3). The off-diagonal blocks, f and h, of T, eq. (2.2), can be defined by the pair of simultaneous equations representing the vanishing of the off-diagonal blocks of G = T†HT and g = T†ST, namely,

    G_BA = H_BA + H_BB f + h† H'_A = 0,    (5.39a)

and

    g_BA = S_BA + S_BB f + h† S'_A = 0,    (5.39b)

where H'_A = H_AA + H_AB f and S'_A = S_AA + S_AB f. Alternatively, these equations may be combined to give separate equations for f and h,

    D_BA(f) = H_BA + H_BB f - (S_BA + S_BB f) Ĥ_A = 0,    (5.40a)

and

    D_AB(h) = H_AB + H_AA h - (S_AA h + S_AB) Ĥ_B = 0,    (5.40b)

as in eqs. (2.113) and (2.114).
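The content of eqs. (5.39) can be illustrated directly: iterate f and h together until both off-diagonal blocks of T†HT and T†ST vanish. The sketch below is a Jacobi-style simplification of the diagonal Newton-Raphson procedure developed in the next subsection (whole-matrix sweeps, Cramer's rule for each per-element 2x2 system, real symmetric H and S assumed); it is illustrative rather than the thesis algorithm.

```python
import numpy as np

def sdnrs(H, S, nA, tol=1e-10, max_sweeps=2000):
    """Simultaneous iteration on G_BA = H_BA + H_BB f + h^T H'_A = 0 and
    g_BA = S_BA + S_BB f + h^T S'_A = 0, where H'_A = H_AA + H_AB f,
    S'_A = S_AA + S_AB f, H'_B = H_BB + H_BA h, S'_B = S_BB + S_BA h.
    Each (sigma, r) pair gives a 2x2 linear system for the corrections
    to f and h^T, solved here by Cramer's rule for a whole sweep."""
    A, B = slice(0, nA), slice(nA, None)
    nB = H.shape[0] - nA
    f = np.zeros((nB, nA))
    h = np.zeros((nA, nB))
    for _ in range(max_sweeps):
        HpA = H[A, A] + H[A, B] @ f
        SpA = S[A, A] + S[A, B] @ f
        HpB = H[B, B] + H[B, A] @ h
        SpB = S[B, B] + S[B, A] @ h
        G = H[B, A] + H[B, B] @ f + h.T @ HpA   # residual G_BA
        g = S[B, A] + S[B, B] @ f + h.T @ SpA   # residual g_BA
        if np.linalg.norm(G) + np.linalg.norm(g) < tol:
            break
        hb = np.diag(HpB)[:, None]; sb = np.diag(SpB)[:, None]
        ha = np.diag(HpA)[None, :]; sa = np.diag(SpA)[None, :]
        delta = hb * sa - sb * ha               # 2x2 determinants, all pairs
        f += (-G * sa + g * ha) / delta
        h += ((-g * hb + G * sb) / delta).T
    return f, h
```

Because T†HT and T†ST are hermitian, driving the BA blocks to zero makes the AB blocks vanish as well, so the transformed matrices are fully block diagonal.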
Here, one has

    Ĥ_A = (S'_A)^{-1} H'_A,    (5.41a)

and

    Ĥ_B = (S'_B)^{-1} H'_B,    (5.41b)

with H'_B = H_BB + H_BA h and S'_B = S_BB + S_BA h. Algorithms to calculate f and h together, or either separately, have again been founded on approximations of the full Newton-Raphson equations by less costly, linearly convergent iteration schemes.

5.3.b Methods Based on G_BA and g_BA — A Generalization of Algorithm SDNR

A direct generalization of the SDNR algorithm, based on the equation D_BA(f) = 0, does not lead to an efficient computational scheme in a non-orthonormal basis. Because of the inverse matrix (S'_A)^{-1}, a change in even a single element of f changes all of the elements of Ĥ_A, making updating costly. However, a very simple procedure can be based on the simultaneous solution of eqs. (5.39), this procedure reducing to algorithm SDNR when the overlap matrix S is replaced by the unit matrix and h by -f†.

The Newton-Raphson equations corresponding to the system (5.39) can be written as the pair

    H'_B† δf + δh† H'_A = -G_BA,    (5.42a)

and

    S'_B† δf + δh† S'_A = -g_BA.    (5.42b)

These represent 2 n_A n_B equations for the n_A n_B elements of f and the n_A n_B elements of h. The diagonal parts of these equations are of the form

    [ (H'_B†)_σσ   (H'_A)_rr ] [ δf_σr  ]     [ -G_σr ]
    [ (S'_B†)_σσ   (S'_A)_rr ] [ δh†_σr ]  =  [ -g_σr ],    (5.43)

with the solution

    δf_σr = [ -G_σr (S'_A)_rr + g_σr (H'_A)_rr ] / Δ_σr,    (5.44a)

and

    δh†_σr = [ -(H'_B†)_σσ g_σr + (S'_B†)_σσ G_σr ] / Δ_σr,    (5.44b)

where

    Δ_σr = (H'_B†)_σσ (S'_A)_rr - (S'_B†)_σσ (H'_A)_rr.    (5.45)

A computational procedure based on these equations involves roughly double the work per iteration of algorithm SDNR. The quantities G_σr and g_σr are calculated as required, while H'_A, S'_A, and the diagonal elements of H'_B and S'_B are stored. These latter matrices are easily updated, because they are linear in f and h. For a change δf_σr in f and δh†_σr in h†, one has

    (δH'_A)_sr = H_sσ δf_σr,  and  (δS'_A)_sr = S_sσ δf_σr,  (s = 1, ..., n_A),    (5.46a)
while

    (δH'_B)_σσ = H_σr δh_rσ,  and  (δS'_B)_σσ = S_σr δh_rσ.    (5.46b)

Precise computational details of this procedure, designated the "Simple Diagonal Newton-Raphson with Overlap" (SDNRS) algorithm, are given in Appendix 4.

Again, a quadratic generalization (here QDNRS) can be obtained. Equations for the exact corrections δf and δh†, required to reduce G_BA and g_BA exactly to zero, are obtained from (5.39) in the form

    H'_B† δf + δh† H'_A + δh† H_AB δf = -G_BA,    (5.47a)

and

    S'_B† δf + δh† S'_A + δh† S_AB δf = -g_BA.    (5.47b)

The corresponding "diagonal equations",

    (H'_B†)_σσ δf_σr + δh†_σr (H'_A)_rr + δh†_σr H_rσ δf_σr = -G_σr,    (5.48a)

and

    (S'_B†)_σσ δf_σr + δh†_σr (S'_A)_rr + δh†_σr S_rσ δf_σr = -g_σr,    (5.48b)

can be combined to give δf_σr and δh†_σr as the roots of quadratic equations. The correction to f is the smallest root of

    A δf_σr² + B δf_σr + C = 0,    (5.49)

where

    A = (S'_B†)_σσ H_rσ - S_rσ (H'_B†)_σσ,
    B = -Δ_σr - S_rσ G_σr + H_rσ g_σr,    (5.50)
    C = -G_σr (S'_A)_rr + g_σr (H'_A)_rr.

The correction to h† is then

    δh†_σr = [ -G_σr - (H'_B†)_σσ δf_σr ] / [ (H'_A)_rr + H_rσ δf_σr ].    (5.51)

Precise computational details again appear in Appendix 4.

5.3.c Methods Based on D^(2)(f) — Generalized Nesbet Procedures

As explained in section 3.2, equation (5.40a) can be understood either as D^(1)(f) = 0 or as D^(2)(f) = 0, according as Ĥ_A is taken as the approximation Ĥ_A^(1) or Ĥ_A^(2). In either case, the Jacobian matrix has the elements

    J^(i)_{σr,ρt} = ∂D^(i)_σr/∂f_ρt = H_σρ δ_rt - S_σρ (Ĥ_A^(i))_tr - [ (S_BA + S_BB f) ∂Ĥ_A^(i)/∂f_ρt ]_σr.    (5.52)

Even without the last term involving the derivatives of Ĥ_A^(i), this matrix is no longer sparse, and the convergence of iterative methods based on "diagonal" approximations may be adversely affected. For D^(1)(f), this Jacobian matrix can be written

    J^(1)_{σr,ρt} = [ H_σρ - (Y_BA (S'_A)^{-1})_σA H_Aρ ] δ_rt - [ S_σρ - (Y_BA (S'_A)^{-1})_σA S_Aρ ] (Ĥ_A^(1))_tr,    (5.53)
where Y_BA = S_BA + S_BB f. This is considerably more complicated than before, and must be handled in a similar way to the D^(2) methods.

The D^(2) (generalized Nesbet) methods are extended to a non-orthonormal basis straightforwardly. As before, it is reasonable to neglect the derivatives of Ĥ_A^(2), giving

    J̃^(2)_{σr,ρt} = H_σρ δ_rt - S_σρ (Ĥ_A^(2))_tr.    (5.54)

For some change δf in f, the operator Ĥ_A^(2) is updated according to

    δĤ_A^(2) = g_A^{-1(new)} [ δf†(D^(2) + H_BB δf - S_BB δf Ĥ_A^(2)) + W_BA† δf - Y_BA† δf Ĥ_A^(2) ],    (5.55)

where W_BA = H_BA + H_BB f and Y_BA = S_BA + S_BB f. The "Diagonal Generalized Nesbet with Overlap" (DGNS) iteration formula is

    δf_σr = D^(2)_σr / [ S_σσ (Ĥ_A^(2))_rr - H_σσ ],  (r = 1, ..., n_A),    (5.56)

for which the updating formula for Ĥ_A^(2), eq. (5.55), becomes

    δĤ_A^(2) = g_A^{-1(new)} [ (W_BA†)_Aσ δf_σA - (Y_BA^(new)†)_Aσ δf_σA Ĥ_A^(2) + S_σσ (δf†)_Aσ δf_σA Ĥ_A^{(2)d} ],    (5.57)

where Ĥ_A^{(2)d} is the diagonal part of Ĥ_A^(2). The "Full Generalized Nesbet with Overlap" (FGNS) iteration formula is

    δf_σA = -D^(2)_σA [ H_σσ 1_A - S_σσ Ĥ_A^(2) ]^{-1},    (5.58)

which again, in practice, is treated as a system of n_A simultaneous linear equations. The first term of eq. (5.55) now vanishes, so that the updating formula for Ĥ_A^(2) becomes

    δĤ_A^(2) = g_A^{-1(new)} [ (W_BA†)_Aσ δf_σA - (Y_BA†)_Aσ δf_σA Ĥ_A^(2) ].    (5.59)

For both algorithms DGNS and FGNS, approximately twice the computation is required per sweep through δf as for their counterparts in an orthonormal basis. A precise statement of computational details is given in Appendix 4.

5.3.d Other Methods

For a non-orthonormal basis, the gradient of the trace of the matrix H over the image space, S_A, of the projection P_A is given by

    D̄_σr = ∂ tr(P_A H) / ∂f*_σr.    (5.60)

As this trace is stationary if and only if S_A is an eigenspace of H (section 2.1.e), one way to determine f is to solve the equation D̄ = 0. Using eqs. (3.11) - (3.15), the gradient can be transformed to

    D̄ = g_B^{-1} D^(2)(f) g_A^{-1},    (5.62)

which vanishes, as it should, when D^(1)(f) and D^(2)(f) vanish. This equation reduces to eq. (2.45) in an orthonormal basis.
Algebraic expressions for the derivatives of D̄ with respect to the elements of f can be obtained without difficulty, and give the Newton-Raphson equations for the system D̄ = 0 (eq. (5.63)); these are somewhat more complicated than the previous eq. (5.35), but not hopelessly so.

The Newton-Raphson equations, (5.42), or those arising from D^(1)(f) = 0 or D^(2)(f) = 0, can be solved for δf using descent methods, as described in section 5.2.d. As before, the costly part of the minimization iteration, the evaluation of products like v_i† J v_i, generally requires of the order of n_A² n_B² computational steps, in addition to the work required in calculating the Jacobian and the other vectors entering the product. For the Newton-Raphson equations, (5.42), a reduction in computation by a factor of ½n_A results if the blocked structure of the Jacobian is explicitly taken into account, yielding

    v† J v = tr( v_f† H'_B† v_f ) + tr( v_h S'_B† v_f ) + tr( v_f† v_h† H'_A ) + tr( v_h v_h† S'_A ),    (5.64a)

where the search vector, v, has been divided into an f part and an h part,

    v = (v_f, v_h).    (5.64b)

Equation (5.64a) represents of the order of 2 n_A n_B² computational operations for n_B >> n_A. As for an orthonormal basis, then, the approximate calculation of δf (or of δf and δh) from the Newton-Raphson equations using a gradient minimization procedure is as costly as several iterations of algorithms SDNRS, QDNRS, DGNS, or FGNS.

A third alternative is to determine f by minimizing the Hilbert-Schmidt norms of G_BA and g_BA, or of D^(1)(f) or D^(2)(f) directly. Only the scheme based on G_BA and g_BA is considered here, because the derivatives of G_BA and g_BA with respect to f and h are particularly simple, and because the form of the quantity to be minimized is not as simple as (5.37).
Since G_BA has dimensions of energy, whereas g_BA is dimensionless, the quantity to be minimized should be of the form

    N = ||G_BA||² + a² ||g_BA||² = Σ_{σ,r} ( |G_σr|² + a² |g_σr|² ),    (5.65)

where a is a constant scale factor with dimensions of energy. The first derivatives of N are

    ∂N/∂f_τs = 2 (H'_B G_BA + a² S'_B g_BA)_τs,    (5.66a)

and

    ∂N/∂(h†)_τs = 2 (G_BA H'_A† + a² g_BA S'_A†)_τs.    (5.66b)

Actual test calculations are required to develop criteria for choosing a. It is desirable to choose a in some way which maximizes the rate of convergence, but such a criterion is not easily translated into an algebraic condition on a. The interpretation of a as an average energy scaling factor suggests a = n_A^{-1} tr G_A.

5.3.e Choice of an Initial Estimate, and Improvement of Convergence Rates

When the off-diagonal elements of H are small compared to differences between diagonal elements in H_AA and H_BB, the matrix elements of f are small, and a reasonable (and practical) starting approximation in an iterative calculation is f_0 = 0. An improved starting approximation may be provided by the solution of a similar problem, when available or more easily calculated. For example, a possible starting estimate of f for a non-orthonormal basis is an approximate solution of the corresponding problem with S replaced by a unit matrix (orthonormal basis). Similarly, the operator

    f_0 = [ f_l ]
          [ 0  ],    (5.67a)

for the matrix H_0, related to H by

    H_0 = [ H_AA  H_Al  0      ]
          [ H_lA  H_ll  0      ]
          [ 0     0     H_B'B' ],    (5.67b)

with m > n_A, l = m - n_A, and n_B' = n_B - l, will also be an improved initial estimate for f, especially when H_lA contains the most significant elements of H_BA. If m is not too large, the (m - n_A) x n_A block f_l is easily calculated from the eigenvectors of the m x m block H_mm of eq. (5.67b), using eq. (2.3). The idea here is to improve the initial estimate of the larger elements of f.
The consideration of asymptotic error constants and rates of convergence (Appendix 5) implies that an improved initial estimate of f may make the difference between convergence and divergence but, in general, will have little effect on the rate of convergence eventually established. This has been borne out in test calculations.

Generally, the rate of linear convergence in these algorithms is inversely related to the ratio between the off-diagonal elements of H and the denominators occurring in the iteration formulas. Thus, the rate of convergence will be increased if these ratios are decreased by carrying out a linear transformation to reduce the size of the off-diagonal elements of H, and perhaps increase the size of the denominators in the iteration formulas. Therefore, a partial diagonalization of H, to reduce to zero those off-diagonal elements which are coefficients of the potentially largest errors in the error formulas given in Appendix 5, followed by the iterative calculation of f (with f_0 = 0) in this new basis, will result in improved rates of convergence. The desired mapping, f, in the original basis is obtained using the transformation equations given in section 2.1.f. Typically, this prediagonalization would involve an m x m block of H (m > n_A) containing H_AA, in addition to that part of the remainder of H with the strongest coupling to H_AA.

This prediagonalization is especially useful when some denominators in the iteration formulas are small (implying that H_AA and H_BB have some nearly equal or equal diagonal elements), since in the new basis these denominators may be much larger, and rates of convergence correspondingly become significantly improved. If the diagonal elements of H_AA and H_BB are initially well separated, the effect of prediagonalizing a small block of a relatively larger matrix may not be as noticeable.
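In the orthonormal case the prediagonalization step is only a few lines; the sketch below (illustrative; the overlap normalization carried by eqs. (5.68)-(5.71) is omitted) shows that the leading m x m block becomes diagonal while the spectrum of the full matrix is unchanged, so the iteration denominators become eigenvalue differences rather than raw diagonal differences.

```python
import numpy as np

def prediagonalize(H, m):
    """Diagonalize the leading m x m block of a real symmetric H
    (orthonormal basis) and transform the full matrix accordingly.
    Returns the transformed matrix and the orthogonal transformation V,
    which is the identity outside the leading m x m block."""
    n = H.shape[0]
    w, Vmm = np.linalg.eigh(H[:m, :m])  # eigenvectors of the leading block
    V = np.eye(n)
    V[:m, :m] = Vmm
    return V.T @ H @ V, V
```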
It must be emphasized that this procedure is not the same thing as the prediagonalization procedure described earlier to obtain the starting approximation f_0 of eq. (5.67a). The linear basis transformation corresponds to a nonlinear transformation on the elements of f, and the metric properties of the iteration formula are changed, thereby changing the entire character of the iterative calculation.

It is easily seen that for n_B ≫ m > n_A, the transformation of H to the new basis, given by the columns of the matrix V relative to the old basis, and the subsequent back-transformation of f, requires at most of the order of n_A n_B² operations, because the greater part of the forward transformation matrix is a unit matrix, that is,

V = [ V_AA   V_Am̄    0    ]
    [ V_m̄A   V_m̄m̄    0    ]          (5.68)
    [  0      0     1_B̄  ]

Here V_mm is the m × m matrix of the eigenvectors of the m × m block of H, 1_B̄ is an (n−m) × (n−m) unit matrix, and m̄ = m − n_A. The transformed matrix H̃ is then

H̃ = V†HV = [ V_mm† H_mm V_mm    V_mm† H_mB̄ ]          (5.69a)
            [ H_B̄m V_mm            H_B̄B̄     ]

           = [ H̃_mm    H̃_mB̄ ]          (5.69b)
             [ H̃_B̄m    H_B̄B̄ ]

in which H̃_mm = V_mm† H_mm V_mm is diagonal. The reverse transformation for f is

f = [ (V†⁻¹)_BA + (V†⁻¹)_BB f̃ ] [ (V†⁻¹)_AA + (V†⁻¹)_AB f̃ ]⁻¹.          (5.70)

The eigenvectors in V_mm are normalized with respect to the corresponding m × m block of S, that is, V_mm† S_mm V_mm = 1_m, and therefore the inverse of the transformation V is

V⁻¹ = [ V_mm† S_mm    0   ]          (5.71)
      [  0           1_B̄ ]

Using this, the transformation (5.70) becomes

f = [ (SV)_BA + (SV)_Bm̄ f̃_m̄ + (SV)_BB̄ f̃_B̄ ] [ (SV)_AA + (SV)_Am̄ f̃_m̄ + (SV)_AB̄ f̃_B̄ ]⁻¹,          (5.72)

where the operator f̃ in the partially diagonalizing basis has been partitioned as f̃ = (f̃_m̄, f̃_B̄). The evaluation of the right hand side of eq. (5.72) requires of the order of n_A n_B² operations when n_B ≫ n_A. No direct handling or manipulation of an n × n matrix is required.

5.3.f Test Calculations With Overlap

A series of test calculations have been carried out using algorithms SDNRS, QDNRS, DGNS, and FGNS.
In the model problems examined, the basic matrix H was the same as that used previously in the calculations without overlap, namely, with diagonal elements equal to the first n odd integers, and the off-diagonal elements all unity. The overlap matrices were of the form

S = [ 1      a      a²     ...   aⁿ⁻¹ ]
    [ a      1      a      ...   aⁿ⁻² ]
    [ a²     a      1      ...        ]          (5.73)
    [ ...                             ]
    [ aⁿ⁻¹   aⁿ⁻²   ...          1    ]

This matrix is positive definite for all a < 1. It resembles the overlap matrix for a linear chain of atoms, with overlap falling off with distance (S_ij = a^|i−j|), but it also serves to model a configuration interaction calculation in which overlaps fall off with energy differences. For a = 0, the orthonormal case is recovered, while as a approaches the maximum value unity, the eigenvalue equation (2.101) becomes ill-conditioned. At a = 1, all but one of the eigenvalues of S vanish, and the eigenvalue equation is singular.¹

All other computational details are the same as for the calculations of Table 5.1. As before, S_A is the space of the basis functions corresponding to the first n_A diagonal elements of H. The results of three series of calculations are given in Tables 5.2 − 5.4, which include information on the effect of varying the initial approximations to f and h, and of prediagonalization. As before, the rates of convergence decreased only slowly with increasing dimension of the eigenvalue problem.

Table 5.2 shows how convergence rates vary as the overlap integral a increases from zero to 0.9, with n_A and n held constant.

¹For large n, the eigenvalues of (5.73) will not differ significantly from those for the corresponding "circulant" matrix (Rutherford, 1949) of the same dimension. For such a matrix, it can be shown generally that the eigenvalues range between (1−a)/(1+a) and (1+a)/(1−a), with the greatest concentration of eigenvalues near the lower end as a approaches unity.
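The overlap matrix (5.73) and the spectral bounds quoted in the footnote are easy to verify numerically. A minimal sketch (modern Python/NumPy, not part of the original work):

```python
import numpy as np

def overlap(n, a):
    """Overlap matrix of eq. (5.73): S_ij = a**|i-j|, a linear chain whose
    overlap falls off geometrically with distance."""
    i = np.arange(n)
    return a ** np.abs(i[:, None] - i[None, :])

n, a = 20, 0.8
w = np.linalg.eigvalsh(overlap(n, a))
print(w.min() > 0.0)                          # positive definite for a < 1
# the eigenvalues lie strictly between (1-a)/(1+a) and (1+a)/(1-a),
# crowding toward the lower bound as a approaches unity
print((1 - a) / (1 + a), w.min(), w.max(), (1 + a) / (1 - a))
```

As a → 1 the smallest eigenvalues approach zero, which is the ill-conditioning of eq. (2.101) described in the text.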
It is seen that all the calculations diverge between a = 0.4 and a = 0.6, except those with prediagonalization, for which the upper limit overlap is between 0.8 and 0.9. In this case, the rate of convergence of the algorithms SDNRS and QDNRS at first deteriorates only slowly, but changes abruptly to divergence between a = 0.8 and a = 0.9. For DGNS and FGNS, the deterioration of convergence rates and onset of divergence is more gradual.

Initializations in this series of calculations tend to favour convergence to eigenspaces corresponding to the n_A lowest eigenvalues. Except where noted, convergence in all cases was to the space S_A corresponding to the n_A lowest eigenvalues. The only other combination of eigenvalues obtained from a convergent calculation consisted of the n_A − 1 lowest eigenvalues together with the largest eigenvalue. A possible explanation of this is that the iterative corrections are

TABLE 5.2  Rates of convergence of algorithms SDNRS, QDNRS, DGNS, and FGNS as the overlap parameter a increases from 0 to 0.9, with n_A and n held constant.

TABLE 5.2 (continued)

a. f_0 = 0, no basis change.
b. f_0 calculated from eigenvectors of upper diagonal 10 × 10 block.
c. f_0 = 0, iterative calculation carried out in basis diagonalizing upper diagonal 10 × 10 block.
d. f_0 = f calculated for a = 0 (non-overlap case).
e. Blank spaces indicate no calculation was carried out; bracketed numbers indicate the number of iterations before linear convergence is established (error constants were determined neglecting these points).
f. Convergence to lowest four eigenvalues, and largest eigenvalue; ||D|| slowly decreasing.
g. ||D|| = 0.34 after 50 iterations, but convergence apparently is to the lowest 5 eigenvalues.
h. ||D|| slowly increasing; calculation may be divergent.
i. ||D|| ≈ 10 after 50 iterations, and is oscillatory.
j. Calculation divergent if iterative scheme restarted in original basis. ||D|| diverges slowly in partially diagonalizing basis, but begins to converge in original basis.
k. Calculation is possibly converging slowly after 28 iterations.
l. Indicates that the iterative calculation is divergent; ||D|| increases over most of the first 50 iterations.

reflecting the situation which would occur if a actually were greater than one. As a increases through unity, the largest eigenvalue increases to +∞, re-emerges from −∞, and becomes the new lowest eigenvalue, while the corresponding eigenvector direction presumably changes little.

Table 5.2 shows that the rate of convergence, once linear convergence is established, is effectively independent of the starting f. An improvement of the initial f may slightly reduce the overall number of iterations, but does not increase the rate of convergence. These results also illustrate the substantial improvement in convergence rates (as well as the substantially wider range of values of a over which convergence is obtained) resulting from prediagonalization of a small block of H. This improvement is not due to the improved starting approximation, but to the change of basis.

Table 5.3 gives rates of convergence for a set of calculations in which the basis space S_A does not correspond to the n_A smallest diagonal elements of H. It is seen that convergence rates are very poor indeed, and that a large proportion of the calculations did not converge at all. When convergent, DGNS and FGNS did not give the lowest n_A eigenvalues in these calculations. However, except with prediagonalization, SDNRS and QDNRS still give the lowest n_A eigenvalues,
the rates of convergence being far superior to those of DGNS and FGNS in these cases.

TABLE 5.3  Rates of convergence for calculations in which the basis space S_A does not correspond to the n_A smallest diagonal elements of H.

TABLE 5.3 (continued)

a. Bracketed numbers indicate the number of iterations before linear convergence is established. These points are ignored in calculating the rates of convergence. Convergence is to the lowest n_A eigenvalues in all calculations, unless otherwise noted.
b. f_0 = 0, no basis change.
c. f_0 calculated from eigenvectors of upper diagonal 10 × 10 block.
d. f_0 = 0, iterative calculation carried out in basis diagonalizing upper diagonal 10 × 10 block.
e. f_0 = f calculated for a = 0 (non-overlap case).
f. Converges to eigenvalues #1, 2, 3, 4, and 6.
g. Slow convergence (or possibly slow divergence); eigenvalues after 50 iterations apparently #1, 2, 3, 5, and 6.
h. Slow oscillation in ||D||; eigenvalues after 50 iterations are #1, 2, 3, 5, and 6.
i. Convergence apparently to eigenvalues #1, 2, 3, 6, and 7; modification unstable after transformation back to original basis.
j. Convergence to eigenvalues #1, 2, 3, 6, and 7; convergence continues after back transformation.

On the other hand, comparatively good rates of convergence to eigenspaces S_A not corresponding to the lowest n_A eigenvalues of H were obtained if the iterative calculation was carried out after prediagonalization (m = 2n_A in Table 5.3). In fact, it is clear that prediagonalization is necessary if higher eigenvalues are to be obtained reliably and efficiently.
If the resulting back-transformed f-operator was used as an initial approximation for a calculation in the original basis, the calculation diverged rapidly in a number of cases, even though this initial approximation to f yielded values for ||D^(2)|| or [ Σ_{σ,r} ( G_σr² + g_σr² ) ]^½ which were less than 10⁻¹². This indicates that in certain cases, no improvement in the starting approximation for f (without also changing the basis) will lead to convergence; the asymptotic error constants defined in Appendix 5 must be predominantly greater than one, leading to an increase in the errors e_σr in f_σr regardless of how small the e_σr are initially. By transforming to the partially diagonalizing basis, the most important of these error constants are reduced to zero, and convergence occurs.

The calculations using the generalized Nesbet algorithms, DGNS and FGNS, frequently consisted of a few initial iterations during which ||D^(2)|| changed relatively rapidly, either increasing, or decreasing, or both, followed by a region of apparent convergence in which ||D^(2)|| decreased extremely slowly. In such cases, it was not unusual for ||D^(2)|| to decrease by only one part in 10⁴ − 10⁵ per iteration. In many of these calculations where convergence was very slow, certain of the n_A eigenvalues of the effective operator G_A were surprisingly accurate in view of the large value of ||D^(2)||. In several cases, with final ||D^(2)|| in the range 0.1 − 0.2, those eigenvalues of G_A belonging to the lowest n_A of H were obtained accurate to eight or more figures, whereas the remaining eigenvalues of G_A were much less accurate.
The poor convergence is thus apparently associated with determining that part of S_A corresponding to eigenvalues not among the lowest n_A.

For convergent calculations, also, the plots of log( ||G_BA||² + ||g_BA||² )^½, or log ||D^(2)||, as a function of iteration number often exhibited "induction periods" before linear convergence was established. Figures 5.1 and 5.2 show such plots for two groups of calculations. The shape and length of these induction periods depends strongly on the initial f. Typically, only 5 − 10 iterations are involved; the example in Fig. 5.2 is an extreme case in which over 30 iterations are required before convergence finally occurs. As indicated in Table 5.3, the two converging calculations in Fig. 5.2 are to different eigenspaces S_A. Figure 5.1 illustrates clearly the independence of the rate of convergence on the starting approximation of f.

Table 5.4 gives rates of convergence for a series of calculations in which the first n_A or n_A + 1 diagonal elements of H are nearly equal. When the first n_A diagonal elements of H are well separated from the rest, convergence is rapid.

FIGURE 5.1
Algorithm SDNRS. H_ii = 1, 3, 5, 7, 11, 9, 13, 15, ..., 39; n_A = 5, n = 20, a = 0.2.
1. f_0 = 0, iterative calculation in original basis.
2. f_0 calculated from eigenvectors of upper 10 × 10 block of H.
3. f_0 = 0, iterative calculation carried out in basis diagonalizing upper diagonal 10 × 10 block.
4. f_0 = f calculated for a = 0 (non-overlap case).

FIGURE 5.2
Algorithm SDNRS. H_ii = 1, 3, 5, 11, 13, 7, 9, 15, 17, ..., 39; n_A = 5, n = 20, a = 0.2.
1. f_0 = 0, iterative calculation in original basis.
2. f_0 calculated from eigenvectors of upper 10 × 10 block of H.
3. f_0 = 0, iterative calculation carried out in basis diagonalizing upper diagonal 10 × 10 block.
4. f_0 = f calculated for a = 0 (non-overlap case).

TABLE 5.4  Rates of convergence for calculations in which the first n_A or n_A + 1 diagonal elements of H are nearly equal.
TABLE 5.4 (continued)

a. All calculations converge to the lowest 5 eigenvalues unless otherwise noted; bracketed numbers indicate the number of iterations before linear convergence rates are established.
b. f_0 = 0, no basis change.
c. f_0 calculated from eigenvectors of upper diagonal 10 × 10 block.
d. f_0 = 0, iterative calculation carried out in basis diagonalizing upper diagonal 10 × 10 block.
e. f_0 = f calculated for a = 0 (non-overlap case).
f. Converges to second lowest eigenvalue; in cases with a long induction period, there is a shallow minimum in ||D|| after between 10 and 20 iterations, after which it increases to a maximum before decreasing again.
g. Apparently converging to eigenvalues #1, 2, 3, 4, 9; after 50 iterations, ||D|| = 14.8.
h. Apparently converging to eigenvalues #1, 2, 3, 4, and 8; after 50 iterations, ||D|| = 5.9.
i. Apparently converging to eigenvalues #1, 2, 3, 4, and 7; after 50 iterations, ||D|| = 5.8.

The denominators in the iteration formulas are large, so that the asymptotic error constants are small, and the iterative calculations well-conditioned. When the first diagonal element of H_BB is close to diagonal elements of H_AA (and a = 0.2), the rates of convergence of the algorithms SDNRS and QDNRS are virtually unaffected, whereas those of DGNS and FGNS deteriorate to a much greater extent.
The greater the number of diagonal elements of H_BB near those of H_AA, the slower the rate of convergence, as evidenced by the poor convergence here when n_A = 1 (Nesbet algorithm). The Nesbet algorithm is apparently converging here to the second lowest eigenvalue of H at the 50th iteration, but the convergence is very slow. In the calculations reported in Table 5.4, convergence is normally to the space S_A corresponding to the lowest n_A eigenvalues of H. The only exceptions are the most poorly converging generalized Nesbet calculations, in which the space S_A corresponds to the lowest n_A − 1 plus the (n_A+1)st, or (n_A+2)nd, eigenvalues of H. Generally, the rates of convergence shown in Table 5.4 decrease as the overlap a increases.

From Tables 5.2 − 5.4, it is seen that the two algorithms SDNRS and QDNRS are generally more reliable than the generalized Nesbet algorithms. However, when convergence occurs, the rates of all algorithms are similar. Again, algorithm SDNRS is easier to program efficiently than the others, and, unless the size of the problem makes the extra storage to hold h a critical factor, this is probably the algorithm of choice. While the eigenvalue problem is more difficult with overlap, these methods are useful, especially with prediagonalization. It appears that attempts to obtain improved starting approximations for f are not of great value, and in particular, the solution of the corresponding problem for an orthonormal basis was frequently the worst starting approximation tried in these calculations.

5.4 Multiple Partitioning

Finally, the possibilities for efficient, practical iteration procedures for solving the m × m partitioning equations of chapter 4 are considered.
The basic strategy is again to obtain efficient linearly convergent iterative schemes by approximating the second order convergent Newton-Raphson equations corresponding to the nonlinear system to be solved.

Three sets of equations were introduced in chapter 4 for the determination of the off-diagonal blocks of T in an orthonormal basis. They are:

(1)  D_JI^(1)(T) = H_JI + Σ_{K≠I} H_JK f_KI − f_JI H_I^(1) = 0,   (I, J = 1, ..., m; I ≠ J),   (5.74)

(2)  D_JI^(2)(T) = H_JI + Σ_{K≠I} H_JK f_KI − f_JI H_I^(2) = 0,   (I, J = 1, ..., m; I ≠ J),   (5.75)

and the pair of systems,

(3)  G_JI(T) = H_JI + Σ_{K≠I} H_JK f_KI + Σ_{K≠J} f_KJ† H_KI + Σ_{K≠J} Σ_{L≠I} f_KJ† H_KL f_LI = 0,   (5.76a)

     g_JI(T) = f_IJ† + f_JI + Σ_{L≠I,J} f_LJ† f_LI = 0,   (5.76b)

     (I < J = 1, ..., m).

A fourth set of equations, intermediate between (5.74), (5.75) and eqs. (5.76), is

D_JI^(3)(T) = (g⁻¹G)_JI = 0,   (I, J = 1, ..., m; I ≠ J),   (5.77)

which arises from the condition that (T†T)⁻¹(T†HT) be block diagonal. These four sets of equations have the same solutions; however, they lead to algorithms which are quite different. While the iteration schemes derived from eqs. (5.74) − (5.76) are very similar to those developed for a 2 × 2 partitioning, a major difference is that the complexity of the orthogonality conditions (4.8) makes it a practical impossibility to explicitly eliminate half the elements of the off-diagonal blocks of T before solving one of these sets of equations for the remaining elements. In all algorithms, therefore, it is assumed that the elements of all m(m−1) off-diagonal blocks of T are to be determined. A description of possible algorithms for solving the four sets of equations above is given in Appendix 6. No numerical testing of these algorithms has been carried out.

The determination of the matrix elements of the off-diagonal blocks of T for an m × m partitioning in a non-orthonormal basis involves complications only in detail, due to the presence of the overlap matrix. Equations for this case have been given in chapter 4, and they may be handled in essentially the same way as the defining equations given above.
CHAPTER 6

PERTURBATION THEORY

O polish'd perturbation! golden care!
That keep'st the ports of slumber open wide
To many a watchful night!
(Shakespeare, King Henry IV, Part II)

"I wouldn't lose any sleep over it."
(wise old saying)

6.1 Introduction

Only the simplest problems in quantum mechanics can be solved exactly. As a result, perturbation expansions of some sort are involved in many quantum mechanical calculations. Perturbation series for effective operators are useful in treating a set of degenerate, or nearly degenerate, levels with one or more perturbation parameters, especially when the degeneracy is not split in first order. Effective operator perturbation series are also useful in developing physical pictures, as, for example, in uncoupling the Dirac equation to obtain equations for electrons only.

In this chapter, perturbation series are developed for the effective operators H_A, G_A, and H̃_A, defined for a 2 × 2 partitioning in terms of the operator f. These series can be derived straightforwardly because of the relatively simple algebraic form of the relations defining the operators. The absence of constraints or auxiliary conditions on f makes possible efficient computational schemes for automatic sequential calculation of the terms in the perturbation series to arbitrary high order. The perturbation formulas are not complicated by degeneracies at any order, so long as all eigenfunctions in a given degenerate set in zero order are partitioned into the same space. In fact, as will be seen below, the presence of degeneracy tends to simplify the use of these perturbation series.

Two examples are presented to illustrate the perturbation formulas derived. These are the uncoupling of the Dirac equation for a spin-½ particle, and the construction of a nuclear spin hamiltonian in esr theory.
Perturbation of the projection P_A, which, in molecular orbital theory, becomes the one-particle density operator, is considered in the following chapter, and the formalism is extended there to the related, but more complicated, self-consistent field molecular orbital problem.

6.2 2 × 2 Partitioning — Orthonormal Basis

6.2.a General Discussion

A perturbation formalism based on the material presented in chapter 2, and in particular on the eigenvalue equation (2.1), will be considered first. It is assumed that the hamiltonian H can be written as an infinite series

H = Σ_{n=0}^∞ H^(n),   (6.1)

where the perturbation parameter or parameters are considered to be included implicitly in symbols like H^(n), which is of order n in the perturbation. The operator f is written as an infinite series,

f = Σ_{n=0}^∞ f^(n).   (6.2)

Substitution of these two series into the condition D(f) = 0, eq. (2.16), defining f, yields the series

D(f) = Σ_{n=0}^∞ D^(n)(f) = 0,   (6.3a)

where

D^(n)(f) = H_BA^(n) + Σ_{j=0}^{n} ( H_BB^(n−j) f^(j) − f^(j) H_AA^(n−j) ) − Σ_{j=0}^{n} Σ_{i=0}^{n−j} f^(i) H_AB^(n−i−j) f^(j)   (6.3b)

          = H_BA^(n) + Σ_{j=0}^{n} ( H_BB^(n−j) f^(j) − f^(j) H_A^(n−j) ).   (6.3c)

The series for H_A is given below. Since (6.3a) is implicitly a power series in one or more arbitrary perturbation parameters, D(f) will vanish as a whole only if each term vanishes. Thus, a hierarchy of perturbation equations is obtained,

D^(n)(f) = 0,   n = 0, 1, 2, ...,   (6.4)

from which the f^(n) can be determined. The zero order condition is formally

D^(0)(f) = H_BA^(0) + H_BB^(0) f^(0) − f^(0) H_AA^(0) − f^(0) H_AB^(0) f^(0) = 0,   (6.5)

which is just the original condition defining f for the zero order operator H^(0). Unless H^(0) is block diagonal, f^(0) will not vanish, and consequently the D^(n) will depend on f^(0), taking the form

D^(n)(f) = ( H_BB^(0) − f^(0) H_AB^(0) ) f^(n) − f^(n) ( H_AA^(0) + H_AB^(0) f^(0) ) + A^(n) = 0,   (6.6)

where A^(n) is a quantity depending on terms in the series for f of order n−1 or lower.
General solutions for the n_A n_B-dimensional system of simultaneous linear equations, (6.6), cannot usually be written down, and the f^(n) must therefore be determined by numerical methods. If H^(0) is block diagonal, then f^(0) = 0, and eqs. (6.6) become

H_BB^(0) f^(n) − f^(n) H_AA^(0) = −A^(n)( f^(n−1), ..., f^(1) ),   (6.7)

which is again a system of n_A n_B simultaneous linear equations, which, in general, must also be solved numerically. However, these equations are considerably simpler than eqs. (6.6).

Finally, if H^(0) is diagonal, eqs. (6.7) reduce to

( H_σσ^(0) − H_rr^(0) ) f_σr^(n) = −A_σr^(n),   (6.8a)

and in this case the solution can be given explicitly,

f_σr^(n) = A_σr^(n) / ( H_rr^(0) − H_σσ^(0) ).   (6.8b)

Here, again, Greek letters refer to basis elements in the subspace S_B, and Roman letters to basis elements in S_A. In general, for f^(0) = 0, the A^(n) are given by

A^(n) = H_BA^(n) + Σ_{j=1}^{n−1} ( H_BB^(n−j) f^(j) − f^(j) H_AA^(n−j) ) − Σ_{j=1}^{n−1} Σ_{i=1}^{n−j} f^(i) H_AB^(n−i−j) f^(j),   (6.9)

which is obtained by deleting terms depending on f^(n) and f^(0) from eq. (6.3b). When the series for H contains only a few terms, for example, when H = H^(0) + H^(1) only, it is more useful to group terms in the D^(n) and A^(n) according to the order of the hamiltonian, H, rather than f, in the term. For A^(n), this gives

A^(n) = H_BA^(n) + Σ_{k=1}^{n−1} ( H_BB^(k) f^(n−k) − f^(n−k) H_AA^(k) ) − Σ_{k=0}^{n−2} Σ_{j=1}^{n−k−1} f^(j) H_AB^(k) f^(n−k−j),   (6.10a)

which, for H = H^(0) + H^(1), reduces to

A^(n) = δ_{n1} H_BA^(1) + H_BB^(1) f^(n−1) − f^(n−1) H_AA^(1) − Σ_{k=1}^{n−2} f^(k) H_AB^(1) f^(n−1−k).   (6.10b)

Table 6.1 lists the first few members of the perturbation hierarchy, D^(n)(f), for the case f^(0) = 0. Explicit formulas, in the format of eq. (6.10a), for low order A^(n) are obtainable from Table 6.1 by deleting the f^(n) dependent terms in the D^(n).

Perturbation formulas, in terms of the f^(n) and H^(n), for all of the other quantities defined in sections 2.1 and 2.2 follow directly from their definitions. The formulas for them presented in the remainder of this section apply when f^(0) = 0. Using eq. (2.65a), the series for the effective operator H_A is found to be

H_A = Σ_{n=0}^∞ H_A^(n),   (6.11a)

where

H_A^(0) = H_AA^(0),   H_A^(1) = H_AA^(1),   (6.11b)

and

H_A^(n) = H_AA^(n) + Σ_{j=1}^{n−1} H_AB^(n−j) f^(j),   (n > 1).   (6.11c)
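For H = H^(0) + H^(1) with H^(0) diagonal, eqs. (6.8b) and (6.10b) form a recursion that can be coded directly, one order at a time. The sketch below (modern Python/NumPy, not part of the original work; the small random hermitian perturbation is invented purely for illustration) computes f^(1), ..., f^(8) sequentially and checks the resulting effective operator H_A = H_AA + H_AB f against direct diagonalization:

```python
import numpy as np

def f_series(H0diag, H1, nA, order):
    """Sequential solution of the hierarchy (6.4) for H = H0 + H1, H0 diagonal,
    f(0) = 0.  Each order: build A(n) from eq. (6.10b), then apply eq. (6.8b),
    f(n)_sr = A(n)_sr / (e_r - e_s)  (Greek index s in S_B, Roman r in S_A)."""
    eA, eB = H0diag[:nA], H0diag[nA:]
    denom = eA[None, :] - eB[:, None]                  # e_r - e_s
    H1AA, H1AB = H1[:nA, :nA], H1[:nA, nA:]
    H1BA, H1BB = H1[nA:, :nA], H1[nA:, nA:]
    fs = [np.zeros((len(eB), nA))]                     # f(0) = 0
    for n in range(1, order + 1):
        A = H1BB @ fs[n - 1] - fs[n - 1] @ H1AA        # linear terms of (6.10b)
        if n == 1:
            A = A + H1BA
        for k in range(1, n - 1):                      # quadratic terms
            A = A - fs[k] @ H1AB @ fs[n - 1 - k]
        fs.append(A / denom)
    return fs

n, nA = 8, 2
H0d = 2.0 * np.arange(n) + 1.0                         # distinct levels 1, 3, 5, ...
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
H1 = 0.02 * (M + M.T) / 2                              # small hermitian perturbation
H = np.diag(H0d) + H1
f = sum(f_series(H0d, H1, nA, 8))                      # f through 8th order
HA = H[:nA, :nA] + H1[:nA, nA:] @ f                    # effective operator H_A
print(np.sort(np.linalg.eigvals(HA).real))             # ≈ two lowest eigenvalues of H
```

The sequential structure is the point: no order of the calculation requires anything beyond matrix products and elementwise division by the fixed zero-order energy denominators, which is what makes automatic calculation to arbitrary high order practical.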
For G_A, given in eq. (2.67b), one obtains

G_A = Σ_{n=0}^∞ G_A^(n),   (6.12a)

where

G_A^(0) = H_AA^(0),   G_A^(1) = H_AA^(1),   (6.12b)

and

G_A^(n) = H_AA^(n) + Σ_{j=1}^{n−1} ( H_AB^(n−j) f^(j) + f^(j)† H_BA^(n−j) ) + Σ_{j=1}^{n−1} Σ_{i=1}^{n−j} f^(i)† H_BB^(n−i−j) f^(j),   (n > 1).   (6.12c)

The metric g_A has the very simple series

g_A = Σ_{n=0}^∞ g_A^(n),   (6.13a)

where

g_A^(0) = 1_A,   g_A^(1) = 0,   (6.13b)

and

g_A^(n) = Σ_{j=1}^{n−1} f^(n−j)† f^(j),   (n > 1).   (6.13c)

If the hierarchy, (6.4), is used to explicitly eliminate the terms in H_BB from eqs. (6.12), the resulting series for G_A will be identical to that obtained by expansion of the relation G_A = g_A H_A, eq. (2.70), that is,

G_A^(n) = Σ_{j=0}^{n} g_A^(j) H_A^(n−j).   (6.14)

That the two expressions, (6.12) and (6.14), for G_A are equivalent if and only if the equations of the hierarchy (6.4) are satisfied is in accord with the fact (section 3.1) that the two definitions, G_A = (T†HT)_AA and G_A = g_A H_A, are equivalent if and only if D(f) = 0. An advantage of eqs. (6.12) is that they are the same whether or not the basis is orthonormal, whereas any formulas incorporating the relations D^(n)(f) = 0 explicitly must be different in a non-orthonormal basis, since the condition D(f) depends explicitly on the overlap matrix in that case.

TABLE 6.1  D^(n)(f)  (f^(0) = 0)

D^(1)(f) = H_BA^(1) + H_BB^(0) f^(1) − f^(1) H_AA^(0)

D^(2)(f) = H_BA^(2) + H_BB^(0) f^(2) − f^(2) H_AA^(0) + H_BB^(1) f^(1) − f^(1) H_AA^(1)

D^(3)(f) = H_BA^(3) + H_BB^(0) f^(3) − f^(3) H_AA^(0) + H_BB^(1) f^(2) − f^(2) H_AA^(1) + H_BB^(2) f^(1) − f^(1) H_AA^(2) − f^(1) H_AB^(1) f^(1)

D^(4)(f) = H_BA^(4) + Σ_{j=1}^{4} ( H_BB^(4−j) f^(j) − f^(j) H_AA^(4−j) ) − f^(1) H_AB^(1) f^(2) − f^(2) H_AB^(1) f^(1) − f^(1) H_AB^(2) f^(1)

D^(5)(f) = H_BA^(5) + Σ_{j=1}^{5} ( H_BB^(5−j) f^(j) − f^(j) H_AA^(5−j) ) − f^(1) H_AB^(3) f^(1) − f^(1) H_AB^(2) f^(2) − f^(2) H_AB^(2) f^(1) − f^(1) H_AB^(1) f^(3) − f^(2) H_AB^(1) f^(2) − f^(3) H_AB^(1) f^(1)

TABLE 6.2  H_A^(n)

H_A^(0) = H_AA^(0)
H_A^(1) = H_AA^(1)
H_A^(2) = H_AA^(2) + H_AB^(1) f^(1)
H_A^(3) = H_AA^(3) + H_AB^(2) f^(1) + H_AB^(1) f^(2)
H_A^(4) = H_AA^(4) + H_AB^(3) f^(1) + H_AB^(2) f^(2) + H_AB^(1) f^(3)
H_A^(5) = H_AA^(5) + H_AB^(4) f^(1) + H_AB^(3) f^(2) + H_AB^(2) f^(3) + H_AB^(1) f^(4)
H_A^(6) = H_AA^(6) + H_AB^(5) f^(1) + H_AB^(4) f^(2) + H_AB^(3) f^(3) + H_AB^(2) f^(4) + H_AB^(1) f^(5)

Perturbation series for the powers g_A^{±½} of g_A can be obtained in several ways, outlined in Appendix 7. Given these, the series for the effective operator H̃_A can be written down from eqs. (2.74) as H̃_A
= Σ_{n=0}^∞ H̃_A^(n),   (6.15a)

where

H̃_A^(n) = Σ_{i=0}^{n} Σ_{j=0}^{n−i} g_A^{(−½)(i)} G_A^(n−i−j) g_A^{(−½)(j)},   (6.15b)

or

H̃_A^(n) = Σ_{i=0}^{n} Σ_{j=0}^{n−i} g_A^{(½)(i)} H_A^(n−i−j) g_A^{(−½)(j)},   (6.15c)

where g_A^{(±½)(i)} denotes the term of order i in the expansion of g_A^{±½}. Explicit expressions for the lower order H_A^(n), G_A^(n), and g_A^(n), according to eqs. (6.11), (6.12), and (6.13), and for the H̃_A^(n), are given in Tables 6.2 − 6.5. All three effective operators, H_A, G_A, and H̃_A, are identical in zero and first order. In general, they differ in second and higher order.

TABLE 6.3  G_A^(n)

G_A^(0) = H_AA^(0)
G_A^(1) = H_AA^(1)
G_A^(2) = H_AA^(2) + H_AB^(1) f^(1) + f^(1)† H_BA^(1) + f^(1)† H_BB^(0) f^(1)
G_A^(3) = H_AA^(3) + H_AB^(2) f^(1) + H_AB^(1) f^(2) + f^(1)† H_BA^(2) + f^(2)† H_BA^(1) + f^(1)† H_BB^(1) f^(1) + f^(1)† H_BB^(0) f^(2) + f^(2)† H_BB^(0) f^(1)

TABLE 6.4  g_A^(n)

g_A^(0) = 1_A
g_A^(1) = 0
g_A^(2) = f^(1)† f^(1)
g_A^(3) = f^(2)† f^(1) + f^(1)† f^(2)
g_A^(4) = f^(3)† f^(1) + f^(2)† f^(2) + f^(1)† f^(3)
g_A^(5) = f^(4)† f^(1) + f^(3)† f^(2) + f^(2)† f^(3) + f^(1)† f^(4)
g_A^(6) = f^(5)† f^(1) + f^(4)† f^(2) + f^(3)† f^(3) + f^(2)† f^(4) + f^(1)† f^(5)

TABLE 6.5  H̃_A^(n)

H̃_A^(0) = H_AA^(0)
H̃_A^(1) = H_AA^(1)
H̃_A^(2) = G_A^(2) − ½( g_A^(2) H_AA^(0) + H_AA^(0) g_A^(2) )
H̃_A^(3) = G_A^(3) − ½( g_A^(2) H_AA^(1) + H_AA^(1) g_A^(2) ) − ½( g_A^(3) H_AA^(0) + H_AA^(0) g_A^(3) )

Several alternative formulas for the terms of the perturbation series of these effective operators can be given, the usefulness of a given set depending on the situation. The formulas given above are not particularly well suited in some cases for the calculation of high order terms. Procedures for deriving alternative series are given in Appendix 7, along with a tabulation of some alternative formulas.

The perturbation series (6.9) through (6.15) are rather general, in that they give the various effective operators ultimately in terms of the f^(n) and H^(n). However, as indicated in eq. (6.8), if H^(0) is diagonal, the f^(n) can be written explicitly in terms of the matrix elements of the H^(n). Expansions corresponding to each of eqs. (6.8) through (6.15), in terms of only the perturbed operator H, will be given in the next two sections.

6.2.b A-states Degenerate

Explicit perturbation formulas are especially simple when the eigenvalues of H^(0) corresponding to the subspace S_A are all equal, say, to ε_A. In its eigenbasis, H_AA^(0) is just a multiple of the unit matrix, and the f^(n) are defined by

( H_BB^(0) − ε_A 1_B ) f^(n) = −A^(n),   (6.16a)

with the solution

f^(n) = L A^(n),   (6.16b)

where L = ( ε_A 1_B − H_BB^(0) )⁻¹ is the reduced resolvent matrix evaluated at ε_A and restricted to S_B. The useful point here is that L is a matrix, not a supermatrix. If H_BB^(0) is diagonal, the f^(n) are given simply as products of the matrices A^(n) with the n_B × n_B diagonal matrix L, and thus relatively simple matrix expressions can be given, in terms of H only, for the various perturbation series derived in the previous subsection. Perturbation formulas of this type are given in Tables 6.6 − 6.11. When H_BB^(0) is not diagonal, the f^(n) must be determined by solving a system of n_A n_B simultaneous equations.

Substitution of eq. (6.16b) into eqs. (6.10) yields

L⁻¹ f^(n) = A^(n),   (6.17)

which, for H = H^(0) + H^(1), becomes

L⁻¹ f^(n) = δ_{n1} H_BA^(1) + H_BB^(1) f^(n−1) − Σ_{k=1}^{n−1} f^(n−k) H_A^(k),   (6.18)

which can be used to eliminate high order f^(n) from perturbation

TABLE 6.6  L⁻¹ f^(n)  (A-states degenerate)

L⁻¹ f^(1) = H_BA^(1)
L⁻¹ f^(2) = H_BA^(2) + H_BB^(1) f^(1) − f^(1) H_AA^(1)
L⁻¹ f^(3) = H_BA^(3) + H_BB^(2) f^(1) + H_BB^(1) f^(2) − f^(1) H_AA^(2) − f^(2) H_AA^(1) − f^(1) H_AB^(1) f^(1)

TABLE 6.7  f^(n)  (A-states degenerate)

f^(1) = L H_BA^(1)
f^(2) = L H_BA^(2) + L H_BB^(1) L H_BA^(1) − L² H_BA^(1) H_AA^(1)

TABLE 6.8  H̃_A^(n), reduced formulas  (A-states degenerate)

TABLE 6.9  H_A^(n)  (A-states degenerate)

H_A^(0) = H_AA^(0)
H_A^(1) = H_AA^(1)
H_A^(2) = H_AA^(2) + H_AB^(1) L H_BA^(1)
H_A^(3) = H_AA^(3) + H_AB^(2) L H_BA^(1) + H_AB^(1) L H_BA^(2) + H_AB^(1) L H_BB^(1) L H_BA^(1) − H_AB^(1) L² H_BA^(1) H_AA^(1)

TABLE 6.10  G_A^(n)  (A-states degenerate)

G_A^(0) = H_AA^(0)
G_A^(1) = H_AA^(1)
G_A^(2) = H_AA^(2) + 2 H_AB^(1) L H_BA^(1) + H_AB^(1) L H_BB^(0) L H_BA^(1)

TABLE 6.11  H̃_A^(n)  (A-states degenerate; including formulas for H = H^(0) + H^(1) only)

H̃_A^(0) = H_AA^(0)
H̃_A^(1) = H_AA^(1)
H̃_A^(2) = H_AA^(2) + H_AB^(1) L H_BA^(1)
173 6.2.b A-states Degenerate Explicit perturbation formulas are especially simple when the eigenvalues of H^0^ corresponding to the subspace are all equal, say, to In its eigenbasis, is just a multiple of the unit matrix, and the f^n^ are defined by (HBB} " €A1B)fU) " -A<n)» <6-l6a> with the solution f(n) a ^(n^ (6.16b) where is the reduced resolvent matrix evaluated at 6^.and restricted to Sg.. The useful point here is that L is a matrix, not a supermatrix. If Hgg^ is diagonal, thef^n^ are given simply as the products of the matrices A^n^ with the nfi x nfi diagonal matrix L, and thus, relatively simple matrix expressions can be given in terms of H only, for the various perturbation series derived in the previous subsection. Perturbation formulas of this type are given in Tables 6.6 - 6.11. When HggV is not diagonal, the f^ must be determined by solving a system of nAnB; slmul'taneous equations. Substitution of eq. (6..l6b) into eqs. (6.10) yields 4i>= A< • ''-"fcH^f <n-*>-f («>-k)S< W) » f^JV^Vc^f^-f^^Hi^). (6.18) k*l BB A which can be used to eliminate high order f^ from perturbation TABLE 6.6 (A-states degenerate) •H<|>f<2>-f<2>&J2> CABLE 6.7 f(l" (A-states degenerate) + /W(1)TH.(D TH(l)H(l)wH(l) ' BB: LHBB "LHBA AB ,LHBA + rT „.(1)7 TH(1)H(1)+I2H(1)H(1)2 +lL»HBB i+LHBA AA +L BA AA 175 TABLE 6.8 H^n* ~ REDUCED FORMULAS (A-states degenerate) a{0) • < ap) = i^»««>'f(1>+*(1)ti^)«(1)ti^!,*<1)-*(l>t'(1)Mii) + H(2)f(2)+f(l)tK(2)f(l)-(f(l)tf(l)*(2) AB BB: A + f(l)tH(l)f(2)_f(l)tf(2)g(l) BB A •H<3>f<2W2,tHJ3>«<1>tl^>f(1>-f(1)tf(1>Hp> +f(l)tK(2)f(2)+f(2)tH(2)f(l)_f(l)ff(2)g(2) BB BB'. A .f(2)tf(l)j^2)+f(2)tH(l)f(2)-f(2)tf(2)j(l) . [f(i)tf(3)f5(i)L .. H(6)+Hi(5)f>(l)+f.(l)tHi(5) HA " AA AB f +f BA +HA|^f(3)+f(2)tHB3)f(l).f(2)tf(l)^3) +f(1)tl43)f(2Lf<l)tf(2)gp)+f(2)tH(2)f(2) +f(2)tf(2)5(2)+f(l)tH(2)f(3)-f(l)tf(2)J(3) A BB A +f(2)tHBB/f(3).f(2)tf(3)5(l)+ [5ADf f(l)tf •>)-,_ 176* TABLE 6.9 HAn^ (A-states degenerate) u(DTH(l) AB. LHBA „(2)TH(1) . 
„(1)TW(2) AB LHBA + AB LHBA H(1)TW(1)TH(1) W(1)T2H(1)H(1) AB LHBE LHBA " AB L BA AA H<3)L„U) + „U)M<3) + S(|)MU) . KU)^)^) H<£>LH&>LK<i> • ^^LHg) - H(|)L^)H(1)M(1J H(1)TH(1)TW(1)TH(1) H(1)TH(1)T2H(1)H(1) AB LHBB LHBB. LHBA ~ AB BB L BA AA .H(1)T2„(1)T„(1)W(1) . „(1)T3W(1)H(1)2 AB L BB LHBA AA + AB L BA AA TABLE 6.10 QAn) (A-states degenerate) ?H(1)TW(1) H(1)TM(0) (1) AB LHBA + AB! LHBB LHBA (2) (1) (1) (2) . o„(DTHU)TH(l) AB LHBA + AB LHBA * 3HAB LHBB LHBA H(1)T2H.(1)H(1) W(1)H(1)T2H(1) HAB^ L BA AA " AA AB L BA (1) (0)Trw(2) W(1)TH(1) T„,d)„(l)\ AB LHBB LCHBA + BB LHBA " LHBA AA } TABLE 6.11 HA  (A-states degenerate) 177. (n) *(0) HA - W<0) ~ AA HA AA Tj(2 ) HA . „(i)TH(i) + AB LHBA S(3) HA + HABl)LHB^> . „(1)T„,(2) + AB LHBA + HAai)LHB^)LHBJ) For H = H?.*0* + H(1* only* HA« - B<|'toS>l^>l«»Up8Al,.(Hi|»L L.H<B> H(ff<1'^H(|)L3„U)}+.4{IA2).„Al)L„<A)j + s('5) - H(i)TH(i)TW(i)*H(i)TH(i) HA " AB LHBB: LHBB: ^BB. LrtBA HlffAl)^H(|)L(t2HB|)+LH^)L.4|)L2)LH(A)! + ^H|B,^Bi)CH{B)^El,.-H{1)3J-1?8, expressions for the effective operators. Substitution of eqs. (6.18) for H|^, (J = 1, n-1), in the formulas for HA2N^ and HA2N+1\ and simplification of the resulting expressions -1 (n) using the formulas for L~ fv ' given in Table 6.6, results in *(n) the so-called "reduced" formulas for the HA given in Table 6.8. It is seen that, in the formula for nj2n) and H|2N"1"1^, all terms containing f(n+1^ through f^2n^ are in the form of A commutators with a lower order term in the expansion of HA. When nA * 1, these commutators vanish, and one obtains a 2n+l rule in the sense that f through order n is sufficient to A determine R*A correct through order 2n+l. Similar explicit results have not been obtained for the operators GA and HA, but the discussion of section 3«2 implies that errors in the eigenvalues of these two operators, when calculated from f correct through order n, should be of order 2n+2. 
For nA/l, none of these effective operators can be given correct to order n+2 or higher solely in terms of f^1^ through f^. * (2 ) ~(2) In this case, HA and HA are identical. In general, all three effective operators are different in third and higher order here. 6.2.C A-states Non-degenerate If the eigenvalues of corresponding to the subspace SA are not all equal, then the factorization (6.16a) is not possible, and it is necessary to calculate the matrix elements 179. ( Y\) of the fvn; via eq. (6.8), which can be written formally as f(n) a ^(A(n))4 (6.19) Here, X is a superoperator, which, when acting on the operator produces the operator f^n^. The superoperator mC can be represented as a four index matrix, so that (6.19) becomes An) _ E j> An) xor " * ^or.^ft * (6.20a) If H^ is diagonal, with eigenvalues €?, then or,^ot eo _ €o ap rt* r " o (6.20b) In this case, the perturbation formulas are for single matrix elements. Tables 6.12 - 6.15 give some low order formulas of this type. One application of these formulas is in molecular orbital theory. The application to the derivation of Coulson-Longuet Higgins type Huckel theory is outlined in the next chapter. 180, TABLE 6.12 f (n) .(1) _ 'or H (1) "or €° - €° r o .(2) _ or H<?> • or n H(DH(1) 2 ou- ur U = l 6 - € r (i L Hat Htr t=l €? - €° , , t o J r .(3) _ or or OU [XT n ,=1 - €; . E <>* tr _ 2 t a B 2 ox tu ur =i t=i («)(€°-€°) + E . -O^O ^ r u , . . nfi H(DH(1) nA (1)H(1) \ H(2)+ E HY Yr _ £ ut tr - - . €© y II / ur Y=l € t=i e; ^(2)+ £ ou ut _ £ os st ot .._« ,.0 ^o ^o S = l €" - €' s o H (1) tr t a r a 0(0) . „(0) Hrs " Hrs u(1) . H(l) Hrs " Hrs TABLE 6.13 H (n) •A— u( 2) Hrs - H<2) rs lA + E U=l H(1)H(1) Hru Hus €° - €° s IX 0(3) Hrs " Hrs nB H(2)H(1) • H(1)H(2) + I nru n|is nru "us U=l €o _ eo n B H. (1) ru U=l € - € ^ S U L B nUY nYS Y-l *° " *° 1 S Y nA Hut t=l € - € t U 181 TABLE 6.14 G (n) G(0) = H(o) rs rs Grs ~ Hrs .(2) _ rs H(2) • ZBH(1)H(1) (€^€a"€^ rs x ru us (€o_eo)( o 0) ^ v r u'v s u .(3) . 
rs rS c.l u(2)H(D u^M2* Hro H0S + Hro Has €° - e° e° - €° LB cr r 0 n + Z B nB H(1)H(1)„(1) Hro Hou Hus 0=1 ,=1 (e°-€°)(€°-€°) • Z r r° 0=1 r o H(2)+ EB H0U Hus nA H^H^ _ ox ts t=l - € o J S 0 + Z 0=1 nw H(1)H(D /« \ m rl„ n HV ; + Z nA HilM1^ ru uo _ E "rt "to t o €°-€° r o €oH(l) S OS €°-€° S 0 182. TABLE 6.15 H[n) 6(0) _ H(o) rs rs Hrs " Hrs *(2) Hrs rs + ^ [i<€°<)-€°] „(i)H(D Sri' rs = H. (3) rs H(DH(1) + L ro os o=l €" - € nB nA + i z z o=l t=l „(1)W(1)H(1) Hro Hot Hts LK<^1- <o> (DH(i)H(i) rt to os nB „ ^o o o=l € -€ s o r o as m, H(1)H(D nA H(1)H(D B Hctt Hus _ EA Hot Hts ,=1 €; - €° t=i - e° t o *B +| Z o=l n» H(1)M(D H(2)+ z _rji £§_ _ n. Hrt Hto H (1) as ra H=l e° -r t=l € 0 J <*>«S>(€»-€°) 183 6.3 Examples 6.3.a The Dirac Equation A particularly simple application of the perturbation formalism just developed is to the formal uncoupling of the Dirac equation for a spin-i particle in an electric and magnetic field. Not only are the A-states (E^°^ = mc2) degen erate in zero order here, but the B-states (E^°^ = -mc2) are degenerate as well. Historically, much effort has been expended on the problem of obtaining a two component effective operator describing the behaviour of a spin-i particle in an electromagnetic field, from the four component Dirac Hamilt onian. In several cases, special algebraic properties of the Dirac hamiltonian were used to construct the desired effective operators, so that it appeared that such operators were unique, in some sense, to the Dirac equation, and not necessarily analogous to effective operators constructed in other contexts. In this section and the accompanying Appendix 8, it is shown that the perturbation formulas tabulated in section 6.2.b yield the desired effective operators immediately. The Dirac equation is special in that if only a magnetic field is present, the condition D(f) * 0 can be solved exactly. 
Other methods for the exact uncoupling of the Dirac equation in the absence of an electric field have, of course, been known for many years (for example, Foldy and Wouthuysen, 1950). The Dirac hamiltonian, including electromagnetic inter actions, will be written here as 184 where, and (6.21) (6,22a) (6.22b) Here <t> is the electric potential, (o ta ,a ) are the Pauli spin A jf <w matrices, TT * p - A is the mechanical momentum of the system, and my e, c, are the mass and charge of the electron, and the velocity of light. With the perturbation defined in (6.22b), the implicit perturbation parameter is l/m. This is strictly not the usual non-relativistic approximation, in which the terms of the series are ordered by powers of v/cc, and which will be dealt with later. Both ordering schemes eventually give the same terms in the perturbation! series, but the order in. which specific terms occur may be different. The hamiltonian (6.22a,b) is blocked according to the partitioning of the basis space into the subspaces SA (E^°^*mcr2), and Sg (E^s-mc2). The reduced resolvent (6.17) is just a multiple of the unit matrix because both the A-states and the B-states are degenerate, that is, L » • 1 . (6.23) 2mc Referring now to Table 6.8, it is seen that 185. HA°> - mc2 , ft(2) - i_ (o.n)2 (6.24) and "A3> * ""IT (S'3MfiMJ) - (£'H)20 • 4m c Standard methods* can be used to transform these expressions into the more explicit forms, g(2) . 0,x+ 1 WBW and 2ic 2i (6.25) A 4m2c2 4m2c2 4m2c2 ~ ~ Here ^ = A is the magnetic field, and IS * -J^ is the electric field. The various terms in (6.24) are readily * (A) identifiable. HA ' is the rest energy of the system in the *( 1) absence of any fields, and HA ' is the electrostatic energy. * (2) HA includes both the kinetic energy of the system, and the magnetic dipole interaction. 
The non-hermiticity of the *Using the well known commutation properties of both the Pauli matrices and of differential operators, one obtains, O, TT]_ = ih£* - -ihg, (CJ»TT)0(O»TT) = -ho»(E x Jj)-~^g,« IML + ihE»n + 0TT»TT, (o.n)2 = 2io.(E x n) + ih^.E, (O«JT)20 « -•—0 £•>< + JT.TT0, O,, TT<«TT]_ = h2 V2<t> - 2ihE«TT. 186. operator is seen to appear first in third order. The first term in X^ * eq. (6.25), is the spin*-orbit interaction, and the second term is the so-called Darwin term. Equations (6.24) are identical to the results obtained using the Pauli elimina tion method to uncouple the Dirac hamiltonian. The correction to the kinetic energy due to the relativistic variation of mass 1 / x4 *(4) arises out of the term • j A (O»TT) , appearing in H: • 8m-V A Similarly, using Table 6,11, one obtains, r?(0) . 2 HA - mc , -(1) H^i; * e0 fc(2) = 1 /„m„\2 (6.26) A. 85 2m" (£'2) and »A3) m TTT fe-JJ. CS'H. *3J. om c 4m <r 8m c The second and third order terms can be rewritten S{2) is-*. and (6.27) Hi3) - c(E x TT) - -S^y V «E. 4m2c2 - - - 8m2c2 ~ ~ Except for the fourth order relativistic correction to the kinetic energy, which will appear here in , this is effect ively the result quoted by DeVries (1970), which was obtained from a perturbation series to fourth order in v/c calculated 187. via the Foldy-Wouthuysen procedure. The point of eqs. (6.24)-(6.27) is that, except for some algebraic manipulations necessary to obtain the effective hamiltonians in a more familiar form, these expressions could be written down without any other calculation, using the tabulated formulas in section 6.2. The first and second order terms in the expansion of f are particularly simple here also, being given by f(D = 1 o.n, 2mc ~ and (6.28) That is, each is made up of only one term. These two terms are sufficient to determine HA and HA to fourth order. Equations (6.28) are also useful for calculating effective operators for other properties of the system. 
For an expansion in powers of v/c (the non-relativistic approximation), the Dirac hamiltonian is usually rewritten as (DeVries, 1970)

    H = H^(0) + H^(1) + H^(2),        (6.29a)

where H^(0) is as in (6.22), but now the first order perturbation is the odd (block off-diagonal) part,

    H^(1) = [ 0      cσ·π ]
            [ cσ·π   0    ] ,        (6.29b)

and the second order perturbation is the even (block diagonal) part,

    H^(2) = [ eφ 1   0    ]
            [ 0      eφ 1 ] .        (6.29c)

Actual expressions for H_A, G_A, and H̃_A to sixth order, based on eqs. (6.29), are given in Appendix 8 in a somewhat more abstract notation. Because of the particular form of the first and second order perturbations here (the first order perturbation couples two states only if they are in different subspaces S_A and S_B, whereas the second order perturbation couples states in the same subspace only), the perturbation series for these effective operators have only even order terms nonzero, while the series for f has only odd order terms nonzero. To sixth order, the operator H_A is exactly equal to the result obtained to sixth order in v/c using the Pauli elimination method to obtain a non-relativistic approximation. Similarly, to sixth order, H̃_A is identical to the results of a canonical uncoupling of the Dirac hamiltonian, such as that carried out by Eriksen (1958). DeVries (1970) demonstrates that the Pauli hamiltonian is related to the Eriksen hamiltonian to sixth order by a fourth order similarity transformation defined in the space of positive energy states only. Such a relationship is evident from the definition of H̃_A, eq. (2.74a), in terms of H_A, namely that

    H̃_A = g_A^{1/2} H_A g_A^{-1/2}.        (6.30)

Thus, the required similarity transformation matrix is just g_A^{1/2}. Because H_A^(0) here is a multiple of the unit matrix, the terms in g_A^(6) in eq. (6.30) exactly cancel if H̃_A is desired to sixth order only. Since g_A^(5) = 0, as seen from the tabulations in Appendix 8, the similarity transformation g_A^{1/2} need be known only to fourth order to determine H̃_A to sixth order.

A treatment of the Dirac equation with some similarity to the above application of the partitioning formalism has been given by Morpurgo (1960).
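The magnetic-field-only uncoupling of the Dirac hamiltonian discussed in this section can be checked numerically. For a free particle, (σ·π)² = |p|² 1, so the operator square root in the exact solution f = σ·π/(mc + [m²c² + (σ·π)²]^{1/2}) of eq. (6.34) reduces to an ordinary number. A minimal sketch (not part of the thesis; the numerical values are illustrative):

```python
import numpy as np

# Exact uncoupling of the free-particle Dirac hamiltonian, eqs. (6.33)-(6.35).
# With no electric field, (sigma.p)^2 = |p|^2 * 1, so the operator square
# root in eq. (6.34) is an ordinary number.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

m, c = 1.0, 137.0                              # illustrative values
p = np.array([0.3, -0.4, 1.2])
sp = p[0] * sx + p[1] * sy + p[2] * sz         # sigma.p (hermitian 2x2)

root = np.sqrt(m**2 * c**2 + p @ p)            # sqrt(m^2 c^2 + (sigma.p)^2)
f = sp / (m * c + root)                        # eq. (6.34)

D = sp - 2 * m * c * f - f @ sp @ f            # residual of condition (6.33)
HA = m * c**2 * I2 + c * sp @ f                # H_A = H_AA + H_AB f, eq. (6.35)
E_plus = c * root                              # exact positive-energy eigenvalue
```

Both eigenvalues of H_A come out equal to E_plus = (m²c⁴ + c²|p|²)^{1/2}, and for |p| much smaller than mc the computed f agrees with the leading term f^(1) = σ·π/2mc of eq. (6.28).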
In the course of a rather complicated derivation of a unitary transformation to bring the Dirac hamiltonian to an uncoupled form, it becomes convenient for Morpurgo to define an operator of the form G - UBB , (6.31) where the quantities UBB and UAB are blocks of the unitary transformation matrix when partitioned in the same manner as H in eq. (6.22). It is not difficult to show from the defining condition given by Morpurgo that, for the Dirac hamiltonian only, G » (-f*)"1. (6.32) It is not known if the quantity G has useful generalizations in other contexts, as does the operator f. Certainly, the relationship (6.32) can possibly hold only in cases where and Sg have identical dimensions (so that UAB has an inverse). If an electric field is not present, these effective operators can be calculated exactly, since the perturbation is nonzero only in off-diagonal blocks. This was, in fact, the basis of Foldy and Wouthuysen's free particle calculation (19^0)• The equation D(f) = 0, defining the operator f, becomes, O«JJ - 2mcf - fft'Trf = 0. (6.33) Multiplication by O/JT from the right yields a quadratic equation for (g.«Tr)f, The desired solution is 190. f - r 2 %*J g-i . (6.34) mc + [m c + (g.Jj) J8 since the root with the plus sign in the denominator leads to the expansion (6.28). Given this exact expression for f, all other quantities defined in chapter 2 can be written, exactly, in terms of mc and £«TT. The operator is particularly simple, HA = HAA + HABf mc + [arc + (g'Jj) J The operators GA and H*A are obtained in the same way, but the expressions are much more complicated. It is also possible to write down an exact expression for the projection onto the space S^, spanned by the eigenvectors of the perturbed hamil tonian which have zero order energy E^0^ » mc2. Since *A - h * ftf * \ +f2 « ["MCT + 7(mc)2 + (g«B)21j2 + (g*g)2 t mc • [(mc)2 + (g.Tj)2]* by eq. (2.9a), one obtains, from eq. 
(2.10), PA « {[mc • [(mc)2 • (£»TT)2]*]2 + (fi-jj)2}"1 fmc+[(mc)2+(o.TT)2]* [mc+[(mc)2+(o.TT)2]*](2.Tr) (o.TT)[mc+[(mc)2+(o.Tr)2]i] (o-rr)2 (6.36) 191 6+3*o Derivation of a Spin Hamiltonian —> Strong Field Case Consider the hamiltonian operator H - Jj.S + t «5 + (6.37) where £ is the effective magnetic field, -3g*H,, and where I - I IU).4(;)>. (6.38) J the 4^ being hyperfine tensors. This hamiltonian describes the interaction of a system of nuclear spins with an electronic spin. In the strong field case, the term h/*§ (the electronic Zeeman interaction) is large enough that the energy separation between levels of different electronic spin is greater than the energy separations between nuclear spin levels. Therefore, a perturbation expansion with respect to H^0^ = h»S, is appropriate in examining the characteristics of the nuclear spin system. In this subsection, a nuclear spin hamiltonian is constructed from (6.37)» in which the electron spin quantum numbers are present only as parameters. In the strong field case, the electronic spin is quantized in the field direction, taken to be the z-axis in the notation adopted here. Thus, the zero order hamiltonian is H(0) = hSz, (6.39) where h « |h,|. It is convenient to expand the perturbation H(1^ = in the form, H(D . aS • S+ • i*+S_ Z Z (6.40) •D+2S? +D*l(S-Sz+SzS-)+Do(SHs2)+D-l(S*Sz+SzS+)+D-2 192. Here S± = S t S are the usual shift operators for the electronic x y spin, and $+ = 3 1 3„ are of the same form in the components — x y of 1 • The coefficients D„ are q Do = 2Dzz» Dil " *<Dxz * LV* I6AL) and D±2 S *C*<Dxx - Dyy) t iDxy]. The zero order levels having S2 = m define the space for the effective nuclear spin hamiltonian ^"m. Because the perturbation has matrix elements with Am = 0, il, ±2 only, there are only a small number of nonzero matrix elements in f^\. which can be written down directly using Table 6.6 and eq. (6.40). These are and #2.. " ^D.2[S2-B(m.l)]*[S2-(l»+l)(m+2)]* 4^2.. 
' ^>+2Ca2-^»-l)]*C52-(»-l)(^2)]* • (6.1*2) The A-states are degenerate, and so is identical to to second order. Therefore, Tables 6.8, 6.9» or 6.11 yield (to second order), 193. " "AA and #J2> * X^Z^t^^m^ll-^D^D^L^-Sm-l] 2 2 (6,43) This derivation appears to assume a special, and inconvenient, coordinate system. However, all reference to special coordinate directions (x and y) perpendicular to the field direction disappears on developing the terms in (6.43). If h, specifies a unit vector in the direction of h, then the effective hamil tonian to second order can be written - Cm + n'£ AU).I(j) • E (6.44) 0 i» J Here C is m 0, - hm • D0(„^2) * fffi^ - kg*] [252-2m2-l] -xCS*2z-£ - (E*H)2]t"§z-8m-13« (6-'»5a) the latter two terms giving the overall second order shift of the endor levels. The other effective parameters are 2(j)=i4(j)-h+|(3ra2-S2)D.(l4h)«4(j) + ^ (S2-m2)A(;5), (6.45b) and A(J}.(1 - Sh).^^. (6.45c) 194. Also, D and A are the cofactor matrices, D * D"1det(D), A » A-1det(A). (6.46) £6 SSL S. 9C £5. 2& The third order jty-^ has been obtained in a similar way. m 195. 6". 4 Non-orthonormal Basis — 2x2 Partitioning Many quantum mechanical calculations are carried out using a non-orthonormal basis. In such situations, it may be inconven ient or undesirable,, (or even impossible for certain kinds of perturbation) to transform to an orthonormal basis in order to carry out a perturbation calculation. This section outlines perturbation expansions based on the formalism in section 2.3,, applicable in a non-orthonormal basis. It is assumed that the given perturbed hamiltonian and overlap matrices have the expansions H = ? H(n)fl S = t S(n), (6.47) n=0 n=0 where H^n^ and S^n^ are implicitly of order n in the perturba tion parameter or parameters. As observed in section 2.3* there are two alternative types of conditions defining the off-diagonal blocks of the partitioning operator T, in this case, the first being eqs. (2.113) and (2.114). 
If perturbation expansions are desired only for effective operators in the A-space, then only eq. (2.113) need be considered, D(f) = HBA - HBBf - (SBA + S^fJS^S; (6.48a) •^+W-«SBA+SBBF^SAA+W)"1<HAA+HABF^°-(6.48b) Substitution of the series, (6.4?), for H and S, and expansion of f in a similar series then leads, as before, to a hierarchy 196. of equations, D(n)(f) = 0, n « 0,1,2 (6.49) It will be assumed in the remainder of this section that and are at least block diagonal, so that f^=0. Formulas for the D^(f) in terms of H, S, and f are then obtained in stages. Writing ^V^^^"1 ? (sU'^siS-^f"))]. (6.51) n=2 j=l one obtains A AA k=1 AA AA (6.52) *—1 from which low order terms in the perturbation series for SA can be obtained. Then for the operator HA= SA HA, one has K " »in)» (6.53a) A n=0 A where s(n). j S-KJ)S:^J) - ? ST^J^H^-^^T'H^^^^)- (6.53b) j=0 A AA k=l Aa Given (6.52) and (6.53)» eq. (6.48) can be expanded straight forwardly, and the hierarchy (6.49) can finally be written 19? 4B)f(n)-SBB)f<n)sM)"lHAA) * "A(n)* nSBl'2(6'54) where A(»).,4n)+^1^l)f(J).^1(S<n-J)+^Js(»-l-Wf(W)5(J) (6.55) The equations (6.54) for the f^n^ are more complicated than those for an orthonormal basis because of the blocks Sgg^and SAA^-1 appearing on the left hand side. This linear system can be solved for f^n^ using numerical methods, but a general solution in terms of S and H cannot be given. The expansions (6.51) - (6.55) become much simpler when H and S are diagonal in zero order. Then S^°^=ln (so that SBB^ = XB» SAA^"1= 1k^* and the ea.uations defining the f(n^ become identical inform to eqs. (6.8), A(n) • H(0)°r H(0) • <«•*> rr oo where the A are now given by (6.55). Low order terms in the expansions (6.48) and (6.53) are given in Tables 6.16 and 6.1? for the case S^°^=l • In terms n of f and H, the effective operator G^ is independent of S, and therefore the expansion (6.12) still holds (Table 6.3). Low order terms for the metric gA have been listed in Table A9.1 of Appendix 9. 
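The first order member of the hierarchy just described can be exercised numerically. With S^(0) = 1 and H^(0) diagonal, eqs. (6.55)-(6.56) give f^(1)_{σr} = (H^(1)_{σr} - ε_r^0 S^(1)_{σr})/(ε_r^0 - ε_σ^0), which can be compared with the exact f = C_B C_A^(-1) built from the occupied generalized eigenvectors. A minimal sketch (not part of the thesis; matrices and values are illustrative):

```python
import numpy as np

# First order f in a non-orthonormal basis (S(0) = 1, H(0) diagonal):
#   f(1)_{sr} = (H(1)_{sr} - e_r S(1)_{sr}) / (e_r - e_s),
# compared against the exact f = C_B C_A^(-1) built from the occupied
# generalized eigenvectors of (H0 + lam H1) C = (1 + lam S1) C diag(e).
rng = np.random.default_rng(2)
n, nA = 4, 2
eps = np.array([0.0, 1.0, 4.0, 6.0])
H0 = np.diag(eps)
V = rng.standard_normal((n, n)); H1 = (V + V.T) / 2
W = rng.standard_normal((n, n)); S1 = (W + W.T) / 2

f1 = np.array([[(H1[s, r] - eps[r] * S1[s, r]) / (eps[r] - eps[s])
                for r in range(nA)] for s in range(nA, n)])

lam = 1e-4
H = H0 + lam * H1
S = np.eye(n) + lam * S1
w, v = np.linalg.eigh(S)                       # S^(-1/2) by spectral decomposition
S_half_inv = v @ np.diag(w ** -0.5) @ v.T
e, U = np.linalg.eigh(S_half_inv @ H @ S_half_inv)
C = S_half_inv @ U[:, :nA]                     # occupied generalized eigenvectors
f_exact = C[nA:, :] @ np.linalg.inv(C[:nA, :])
err = np.max(np.abs(f_exact - lam * f1))       # expected to be O(lam**2)
```

Note that f = C_B C_A^(-1) is invariant under any rescaling of the eigenvector columns, so the arbitrary normalization and sign of the numerical eigenvectors do not affect the comparison.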
The formal definition of HA, eq. (2.74), in ±4 terms of gA 2 is also unaffected. However, the series for gA 198. TABLE 6.16 p("'(f) — Non-orthonormal Basis n(l) . „(0),(1) ,(1)„(0)+h(1) q(l)„(0) D " BB f "f AA BA BA AA D " BB f "f AA tl)(2).„(l)f(l) ,s(l}.f|l)Ul|(l) ,.(1)H(0), f.(2) „(1),(1),„(0) BA BB f *(SBA +f AA AA AA '-lSBA +SBB f ' AA +HB^HBBV1>+HBBV2> -(s(A)+f(^)[„{2).„(B)f(i>-s<A'HU)-(sAl)+s(B)f^))„^) ^)2HAA0>] -Cs^V^s^2',^ D<*> - HB°)f(«-f^»H|°» +HBit)t"BB)f(lJ+HBB,f(2)+HBB)f<3> -CSB3)+s<|)f(l)+s(B)f(2)+f(3))(H(l).s(l)H(0)) .(s^)+sB3)f(i)+sB|)f(2)+sB|)f(3))HA0) 199. TABLE 6.17 HAn^(f) ~ Non-orthonormal Basis „(D o(DW(0) AA AA AA .(s(l)«»),Ci))„U,^(i)2H(i) s(D3H(o) AA AA TABLE 6.18 HAn^ — Non-orthonormal Basis •Hi!' ' AA AB f +*Lf ' BA +f '' AA J-200. TABLE 6.19 GJ'}' — Non-orthonormal Basis JOl-ID.JDt 10).„(1) BB f *h AA aA „(0),O).h(3)tH(0) BB f +h AA +H(3>+H^f(l)+H(tl)f(2'+h<l)t(HAA»+H{l»f<1))+h'2»tHAA) <)tH(3)f(l)+„(|)f<2)+„(l)fO) th<3,tHAi) TABLE 6»20 g^n^ —- Non-orthonormal Basis  f(3)+h(3)t «»>«g>f<1>«»>f<2Wl>t(sA2>*<|>f«1>)*h<2»tsAi) +SBA)+SBB)f<1,+SBHf(2)tSBB)f(3> +h(1)t(s<3).s(?'f(1)+s(i)f(2))+h'2)Us(2^s<i>f(1)) AA AB AB AA AB +h<3>tsU> 201 TABLE 6.21 f^ — Non-orthonormal Basis (A-states degenerate) ,(D ,(2) ^BA^A3^ H H -«°S LH(A)+LH^I(H<i»-€AS^)) ^)*L<)-€^<i))](K<i>.€jsAA)) +€A1LSBA +SBB L(HBA A BA 'J .(3) +^(s(A)+sU)Ln(A),]SAi) -6o[sO)+s(|)^u)tS(i)l[Hu).Hu)ju).(sa)+Ja))5(1) AA now depends on S, and has a first order term, so that, while eqs. (6.15) hold here also, the formulas in Tables 6.5, A7.7» and A7.8 are no longer valid. Explicit expressions for low order terms in the series for H*A in terms of H, S, and f are given in Table 6.18. As in an orthonormal basis, the perturbation series for the effective operators H^, G^, and H^, are much more compact when the A-states are degenerate in zero order. 
Equations 202, TABLE 6.22 k[n^ — Non-orthonormal Basis A • 1 1 (A-states degenerate) H{0) - H<j>> - €jlA -(1) _ H(l) .0.(1) _ HA ~ AA " A AA " AA HA " AA AB LHBA -iB,^«Bi)<)^)-(s^,^BA>)a^)«0(sBi)+s<B)sBi')] -«)^I,^Bi,^i,*BA,<«Bi,-^Bi,^Bi))51i, "Afc AA ':AWBA T*BB ""Bk •S (1)2H(1) eos(l)3 ors(l) s(2)+s(l)L5(Dl 'AA AA A AA +6AlSAA »* AA AB LrtBA •* • (6.16) and (6.17) apply here, with the modified k{n) of eq. (6.55)» A and can be used to obtain explicit matrix formulas for f, HA, GA , and HA, solely in terms of H and S. The lower order terms for these expansions are given in Tables 6.21 - 6.24. If the A-states are not degenerate in zero order, use of eqs. (6.56) yields formulas for the individual matrix elements of the operators f, HA, GA, and HA. Low order formulas of this type are given in Tables 6.25 - 6.28. The second set of conditions defining f and h arise from the requirements 203. TABLE 6.23 o[n^ — Non-orthonormal Basis (A-states degenerate) HX2)+H(i)*(i)+2(iWi)+S(i)T„(o)TS(i) AA AB LKBA AE ^BA AB LHBB LHBA H,(3)+H(2)L2,(1)+S(1)LH(2) AA AB- LHBA AB LHBA X'LHBH)]C^JU) AB LHBB LLHBA BB LHBA MBBA +LHBA ' AA +<:oTK(2).q(l)*(l)n  +€AL(SBA +SBB: LHBA }-> *(1)LH(1) «(1) AB LHBB: LHBA TABLE 6.24 HJ"^ — Non>orthonormal Basis (A-states degenerate) »AA) - *A W(D -os(ii- _ a(D AA " A AA " AA H(2).Hl(l)Tg(l) *o,„(2)+s(l)T8(lK ifs(D H(l)7 AA AB; LHBA "€A(SBA +SAB LHBA AA " AA > + + A AA 201*. and «BA S (rS$)BA " SBA+ W+h^SAA+SABf) = °< (6^7b) When and are diagonal, expansion of these equations yields 4°)f(n)+h(n)tHA°) - -B<n). (6.58a) and f(n) + h.(n)t = -fiXn)^ (6.58b) where explicit use has been made of the condition S^=l. Here, one has, B < rri =H« n) n- j), (j } ) t„(n- j)+ 1 BA flit J- ., AA J"1 j 1 (6.59a) ^z"1 toC«tH(«-l-*)rC3>t i=l j«l and J'1 ' (6.59b.) 
i-i j-i m Equations (6.58) can be solved simultaneously for the matrix elements of f^ and h/n^, which are given by tM = 1 or S-2 2£ (6.60a) or €° e° r " o and n(n)% ->1 'or 'or (6.6ob) or c° c° r o 205 where the €^ are the eigenvalues of If the overlap matrix is not perturbed,, the quantities B^ all vanish, and eq. (6.60a) reduces to eq. (6.9), and (6.60b) implies eq. (2.4). If eqs. (6.57) are to be used as the basis of a perturbation formalism here, the series for h and f must both be considered simultaneously. This complication is offset by the simpler for of the expansions (6.59)• Several low order terms in the series for G'g^ and gfiA are given in Tables 6.19 and 6.20. Explicit expressions for the corresponding quantities Jijn^ and; Bg11^ can be obtained from these tables by deleting the terms in h(n) and f(n). If the A-states are degenerate, eqs.. (6.58) can be written as Lr(0)-.(n) co.(n)t _ R(n) BB "* €Ah ~ ~&1 » and (6.61) f(n) + hi(.n).t = -B|n)t which can be solved as a system of two matrix equations in. two unknown matrices. The solution is f(n) = L[B(n) _ €oB|n)]f and <6'62> h<n)t = (€°L - l)B<n) - LB<n). where L is given ineq.. (6.17)• If these equations are used to obtain expressions for the f^ solely in terms of H and S„ the expressions obtained are naturally identical to those froim (6.16) and (6.17) with (6.55). However, eqs. (6.62) provide a more 206 efficient computational scheme for the calculation! of high order terms in the series for f• A collection of alternative formulas for the terms in the series for H^ along with formulas for the metric gA and related quantities have been given in Appendix 9» This type of perturbation theory is useful, for example, in extended Huckel molecular orbital theory* 207. TABLE 6.25 — Non-orthonormal Basis 4V - <£0i) or IT or €° - €° r o Hor} + £ <»ej} / JJ( 1) €r*V ; €o - €°S(1)) + 6°S<2) r or r/sd) • ot ' €tSot ) (II(1) . 
€os(1)) 1 - J ( Sot + ToTo } (Htr " €rStr ' eo o TABLE 6.26 HJn^ — Non-orthonormal Basis w(0) Hrs Hd) eos(l) Hrs " €sSrs HS2) + 2 rs p rP o„dh(Hgs €sSPs > s rp ' €s " €<° ^ AA AA ;rs - <4V s rs + €r*SAA >rs 208, TABLE 6.27 G^n^ Non-orthonormal Basis H(0) Hrs H. (1) rs H. (2) + L n(l)(HPS -gsSPs > + £ 1*TP "€rSrP J H(l) rs €s €P ,H(1) foq(l)ww(l) e°<5(1)) + L €° (Hry> ~ €rSrP MHPS 6SSPS > TABLE 6.28 H*[n^ — Non-orthonormal Basis H H (0) rs (1) rs - i(€° + €°)S(1) s' rs H'il* + 2 H rs p rp /H(l) eoq(l)x (l)CHPs gsSPs } €s " €P +*<€°+e°> 1 r s p rp r rp S(D + _£s a Pa -O -o ps cO -O s(2) + E S(1)(H^ VVs > rs P rp fcs ^ 209. CHAPTER 7 EIGENVALUE INDEPENDENT PARTITIONING AND MOLECULAR ORBITAL THEORY "•We have applied, the same process,' Mein Herr continued, not noticing Bruno*s question, *to many other purposes. We have gone on selecting walking-sticks--always keeping those that walk best—till we have obtained some, that can walk by themselves! ,,,.,M (Sylvie and Bruno Concluded. Lewis Carroll r 210 7.1 Intro due ti ore The eigenvalue independent partitioning formalism developed in chapter 2 is particularly suited to situations in which only the whole space spanned by a subset of eigenvectors of some operator has significance, rather than the individual eigenvectors themselves,. The mapping f is sufficient to determine the proj ection. PA„ eq, (2,10),, onto the subspace of interest, so that,, im principle, all relevant properties can be determined once f has been calculated. One of the more important areas of quantum) chemistry in which these aspects of the partitioning formalism can be exploited is im molecular orbital theory. In molecular orbital theory,, a closed shell system containing 2nA electrons is represented by a Slater determinant made up fronc reA doubly occupied orbitals. 
Since this determinantal wavefunction changes by at most a complex scalar factor under an arbitrary linear transformation of these occupied orbitals, the individual orbitals have no direct significance. In an n-dimensional basis space, the n_A occupied molecular orbitals are specified by n_A n = n_A(n_A + n_B) complex numbers, the LCAO (linear combination of atomic orbitals) coefficients. Since these n_A molecular orbitals are arbitrary up to an n_A x n_A linear transformation, so that n_A^2 of these complex numbers must be redundant, there are only n_A n_B independent complex variables in the problem, which is exactly equal to the number of Brillouin conditions that must be satisfied. This is also the number of (complex) matrix elements in the mapping f defined in eq. (2.2), arising out of a partitioning of the eigenvector space of the hamiltonian into an n_A-dimensional subspace spanned by the occupied molecular orbitals, and an n_B-dimensional subspace spanned by the unoccupied orbitals. Thus, not only is the mapping f sufficient to determine the projection onto the space of the occupied orbitals, but it also represents the minimum amount of information required to specify that projection. The matrix elements of f contain no redundancies, and are subject to no constraints. These two properties of f are of considerable practical importance.

This chapter is primarily concerned with the derivation of perturbation formulas for the projection onto the space of the occupied molecular orbitals. This projection is also frequently referred to as the one-particle density matrix in molecular orbital theory, and is equal to the charge-bond order matrix except for a factor of two. Both the simple matrix (Huckel theory) and the self-consistent field cases are considered. The latter is more general than the matrix uncoupling considered hitherto in that the operator to be block diagonalized by f itself depends on f.
This chapter is restricted to consideration of closed shell systems only. The detailed nature of the partitioning of the basis space is not of central importance here, as long as f exists. Nevertheless, particular partitionings may be of special interest in certain cases because the elements of f then have a particular physical significance. One example is a basis made up of localized bond, lone pair, and antibond orbitals. When this space is partitioned into an n_A-dimensional subspace, S_A, spanned by the bond and lone pair orbitals, and an n_B-dimensional subspace, S_B, spanned by the antibonding orbitals, then the elements of f measure the delocalization of the bond and lone pair orbitals through the mixing in of antibond orbitals. In the same way, in a self-consistent field calculation, carried out in a Huckel basis, which diagonalizes the hamiltonian in the absence of explicit electron-electron interaction, and partitioned into occupied and unoccupied orbitals, the elements of f represent the magnitude of the mixing of these initially occupied and unoccupied orbitals because of the electron repulsion terms.

7.2 Perturbations of the Density Matrix -- Orthonormal Basis

7.2.a General Theory

Consider a partitioning of the basis space into two subspaces, S_A and S_B, spanned by the orbitals occupied and unoccupied, respectively, in zero order. The projection P_A' onto the subspace S_A', spanned by the occupied perturbed orbitals, can be written (eq. (2.10)) as

    P_A' = [ g_A^{-1}     g_A^{-1} f† ]
           [ f g_A^{-1}   f g_A^{-1} f† ] ,        (7.1)

where g_A = 1_A + f†f. The perturbation series for f therefore determines series for each of the blocks of P_A', given by

    (P_A')_AA^(n) = g_A^{-1(n)},
    (P_A')_BA^(n) = Σ_{j=1}^{n} f^(j) g_A^{-1(n-j)},        (7.2)
    (P_A')_BB^(n) = Σ_{i=1}^{n-1} Σ_{j=1}^{n-i} f^(i) g_A^{-1(n-i-j)} f^(j)†,

when the zero order hamiltonian is at least block diagonal (that is, when f^(0) = 0). In the simple matrix case, the terms in the perturbation series for f are determined from the hierarchy of conditions D^(n)(f) = 0, eqs. (6.4).
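The block structure of eq. (7.1) can be verified directly: given any f, the matrix built from g_A = 1_A + f†f is automatically hermitian and idempotent, and when f is taken from the occupied eigenvectors it reproduces the usual spectral projector. A minimal numerical sketch (not part of the thesis; a random real symmetric hamiltonian stands in for H):

```python
import numpy as np

# Rebuild the projection of eq. (7.1) from f alone, with g_A = 1_A + f^T f,
# and compare with the spectral projector onto the occupied eigenvectors.
rng = np.random.default_rng(3)
n, nA = 6, 3
V = rng.standard_normal((n, n)); H = (V + V.T) / 2
e, C = np.linalg.eigh(H)
Cocc = C[:, :nA]                               # occupied (lowest) eigenvectors
f = Cocc[nA:, :] @ np.linalg.inv(Cocc[:nA, :]) # f = C_B C_A^(-1)

gA = np.eye(nA) + f.T @ f
gAinv = np.linalg.inv(gA)
P = np.block([[gAinv,      gAinv @ f.T],
              [f @ gAinv,  f @ gAinv @ f.T]])  # eq. (7.1)
P_direct = Cocc @ Cocc.T                       # reference projector
```

The n_A n_B numbers in f are the only inputs needed to rebuild P, illustrating the parameter-counting argument of section 7.1.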
In a self-consistent field formalism, the equations defining the f^(n) may be more complicated. They are considered in some detail in section 7.4. From eqs. (7.2), it is seen that the first two terms in the series for P_A' are given by

    P_A'^(0) = [ 1_A   0 ]  =  P_A^0 ,        (7.3)
               [ 0     0 ]

and

    P_A'^(1) = [ 0       f^(1)† ] ,        (7.4)
               [ f^(1)   0      ]

where P_A^0 is the projection onto the basis space S_A. The blocks of the second and higher order terms of P_A' are all non-vanishing in general. The special form of P_A'^(1), in which the only non-vanishing matrix elements are those between the zero order occupied and unoccupied spaces, is a consequence of the absence of a first order contribution to the metric g_A.

Using the formulas given in section 6.2, the perturbation series for P_A' can be expressed solely in terms of the perturbed hamiltonian H. Tables 7.1 - 7.3 give these formulas for the elements of (P_A')_AA^(n), (P_A')_BA^(n), and (P_A')_BB^(n), for n = 0, 1, 2, and 3. The case in which the A-states are all degenerate is not of great importance in molecular orbital theory, and no formulas for (P_A')^(n) applicable to that case are included here.

The formulas in Tables 7.1 - 7.3 give the matrix elements of the perturbed density matrix in the basis of the zero order orbitals. These, in turn, may be known in terms of some more primitive basis functions, for example, as a linear combination of atomic orbitals. The coefficients with respect to such a basis (e.g. the LCAO coefficients) will be denoted here as the columns of a unitary matrix C. The perturbed density matrix in the original basis will be denoted by R. The terms in the perturbation series for R are given by

    R^(n)_{ij} = (C P_A'^(n) C†)_{ij}
               = Σ_{r,s}^{occ} C_{ir} [(P_A')_AA^(n)]_{rs} C*_{js}
               + Σ_{σ,ρ}^{unocc} C_{iσ'} [(P_A')_BB^(n)]_{σρ} C*_{jρ'}        (7.5)
               + Σ_{r}^{occ} Σ_{σ}^{unocc} ( C_{ir} [(P_A')_AB^(n)]_{rσ} C*_{jσ'} + C_{iσ'} [(P_A')_BA^(n)]_{σr} C*_{jr} ),
In the simple matrix case (Huckel theory), the energy of the system described by the determinantal wavefunction made up of orbitals which span the image space of PA is given by E = v tr: PAH, (7.6a) v being am occupation number for the orbitals. Using eqs. (7.2), a perturbation series for E in terms of f and H„ and ultimately, In terms of H only, can be derived.. The general formula1 Is E(n) = v Z tr["(p')(;3)H(rr^ +-(p,)U').H('n-j') + /p1) (0 )H(n«- j) -to A ^ A AB BA A BA rtAB J — u "A'BB "BB Formulas for E^ im terms of f ^ and H^ are given in Table) 7.4 through fifth order. By using the conditions D^(f) = 0, eqs. (6.4), it is possible to obtain, a 2n+l rule here, in that E^2N) and E^2*1*1* can be written in terms of f(1^ through f^ only, as is done in the formulas in Table 7.4. Table 7*5 gives (n) formulas for the EV im terms of the perturbed hamiltonian: only,, through third order. The formalism presented above corresponds to some extent to that developed by McWeeny (1962),, for the perturbation; of the density matrix in the context of self-consistent field theory. Since self-consistency terms are not indicated explicitly im much of that derivation1,, the resulting formulas: correspond closely to those derived above. The procedure used by McWeeny to derive a perturbation'series for P^ was to expand 2 the equations 1 PA [H„ p'] = 0„ (7.7a) and if " PA ' PAf " PA • in perturbation series, and then successively isclye the hierarchy of simultaneous equations effectively for the blocks of P^,, as it is partitioned in eq. (7.1). The series obtained by McWeeny for P^ is identical to that obtained here — only the derivation: is different. Here PA has been written im terms of a matrix f" ire such a way that eq. (7.7b) is automatically ( rt) satisfied. The hierarchy of equations, Dv (f) = 0,, defining the series for f„ is equivalent to the hierarchy resulting from eq. (7.?a)„ as showm im section. 2»-3. 
In his derivation, p McWeeny refers to the one-particle density matrix,, denoted as #> PA here-, by the symbol p im his 1962 paper. 216. McWeeny effectively takes the elements of (P^BA. as the independent variables to be determined in. the calculation,, ' (n) and calculates the elements of the other blocks of (PA) from themi. TABLE 7.1 (pA)^ Molecular: Orbital Basis (p')(0) = 1 ^ A'AA LA A AA u nw K(DW(1) [tp;>2)] -- - ^-s*-A'AA -"rs a=1 (€o,_ €o)(€o . ,o} %T nv, w(l)w(l) / nt, H(1)K(1) \ C(P;)(3)] .. ZB „H(i) ZB _ ( / ) H(i> -H (1) or n. + H^¥2) + H(2)H(1) or OS- or os WAV t-l (€°-] H (1) OS (€°-€°)(€°-€°) s or o 217/, TABLE 7*2 (P^)^ — Molecular Orbital Basis (p')(0) = 0 * A BA U L*rA'BA Jar H (1) or v r a' *- ^ A' BA Jor nSn. H(l)H(l) or Z at tr P=l (€°- €°) t=l <€°- €°) LvrA;BA Jor H(3) + nBH(2)H(l) nA:H(l)H(2) *op "pr lot "tr or P=l (€°- 6g) t=l (€°- €°) nfe K(1) %=1 <€j-€2> V ^ Y=l (€°- €°) fl (€°- $) E [H(2)+ E gp pt Z Hos Hsr } Htr + E E gt t<° P=l t=l(€°.€°)(€°-€°) - E E H(1)H(1)H(D as sP Pr 3=1 «o=l (€°-€°)(€^€°)(€°-€p°) 218. TABLE 7.3 (P^)!^ — Molecular Orbital Basis 1 A BB u n, ff(l)H(l) A BB ^ r=l (€°- €?>(€£- €°) raA r- nw H(1)H(1) rrp"^3)n . rALdW2) + K<2)H(1) + H(1) v py V ^VBB JA/> - ^ I Hor V H«r V- Y=i €°) •~—  Y 219. TABLE 7*4 E(n^ — Molecular Orbital Basis E(o) = V tr H<0) AA E(l) = V tr E(2) = V tr E(3) = V tr 'AB * -> -(Dt^dJud) . ^1)^1)^1) AA BB * t^WuM - f(1)f(1)tf(1)HJl> + KA^)f(2)tf<2) - f(2)f(2)tH<°J] +f(2)tH(3)+f(2)H(3) + (f(2)f(l)t+fd)f(2)t)H(2) .(f(2)tf(l)+f(l)tf(2))H(2)_f(2)tr(2)H(l)+f(2)f(2)^d) AA AA AA _f(l)tf(2)f(l)tH(l)_f(l)f(2)tf(l)H(l) BA AB + (f(l)f(l)t)2Hd).(fd)tfd))2Hd)+f(l)tf(l)f(2)tK(0)f(l) +fu)tfd)fd)tH(o)f(2>_fd)fd)tf(2)H(o)fd) BB AA .f(Df(l)tf(l)H(0)f(2)t-| 220 TABLE 7.5 Ev"' -- Molecular Orbital Basis ,(0) _ = v E € r=l r .(1) . = v E H .(2) . • (3) . 
E^{(1)} = v \sum_{r=1}^{n_A} H^{(1)}_{rr}

E^{(2)} = v \Big[ \sum_{r=1}^{n_A} H^{(2)}_{rr} + \sum_{r=1}^{n_A} \sum_{\sigma=1}^{n_B} \frac{H^{(1)}_{r\sigma} H^{(1)}_{\sigma r}}{\epsilon^0_r - \epsilon^0_\sigma} \Big]

E^{(3)} = v \Big[ \sum_{r=1}^{n_A} H^{(3)}_{rr} + 2 \sum_{r=1}^{n_A} \sum_{\sigma=1}^{n_B} \frac{H^{(1)}_{r\sigma} H^{(2)}_{\sigma r}}{\epsilon^0_r - \epsilon^0_\sigma} + \sum_{r=1}^{n_A} \sum_{\sigma,\rho=1}^{n_B} \frac{H^{(1)}_{r\sigma} H^{(1)}_{\sigma\rho} H^{(1)}_{\rho r}}{(\epsilon^0_r - \epsilon^0_\sigma)(\epsilon^0_r - \epsilon^0_\rho)} - \sum_{r,s=1}^{n_A} \sum_{\sigma=1}^{n_B} \frac{H^{(1)}_{r\sigma} H^{(1)}_{\sigma s} H^{(1)}_{sr}}{(\epsilon^0_r - \epsilon^0_\sigma)(\epsilon^0_s - \epsilon^0_\sigma)} \Big]

7.2.b Huckel Molecular Orbital Theory

As an illustration of the straightforward way in which the tabulated perturbation formulas for P_A can be used, expressions for the bond-bond, bond-atom, atom-bond, and atom-atom polarizabilities, as defined by Coulson and Longuet-Higgins (1947) in Huckel molecular orbital theory, will be derived. These quantities are proportional to the first order response of diagonal and off-diagonal elements of the charge-bond order matrix, P = 2R, where R denotes the density matrix in the atomic orbital basis (or the second order response of the energy, (7.6)), to a perturbation of diagonal or off-diagonal elements of the hamiltonian, H^{(0)}_{AO}, in the atomic orbital basis. Thus, the results below are determined by combining the formulas in Tables 7.1 - 7.3 with eq. (7.5).

First, consider the single center perturbation given by

(H^{(1)}_{AO})_{ij} = \delta\alpha_t\, \delta_{it}\delta_{jt},   (7.8a)

representing a change in (H^{(0)}_{AO})_{tt} by an amount \delta\alpha_t. On transforming to the zero order molecular orbital basis, this becomes

(H^{(1)}_{MO})_{ij} = \delta\alpha_t\, C_{ti} C_{tj},   (i,j = 1, ..., n),   (7.8b)

assuming the C_{ij} are all real. Substitution of (7.8b) into the first order formulas for P_A in Tables 7.1 - 7.3, and back-transformation to the atomic orbital basis via (7.5), then gives the results

\frac{\partial R_{ij}}{\partial H_{tt}} = \frac{1}{\delta\alpha_t} R^{(1)}_{ij},   (7.9a)

where

\frac{1}{\delta\alpha_t} R^{(1)}_{ij} = \sum_r^{occ} \sum_{\sigma'}^{unocc} \frac{C_{t\sigma'} C_{tr}}{\epsilon^0_r - \epsilon^0_{\sigma'}} [C_{i\sigma'} C_{jr} + C_{ir} C_{j\sigma'}],   (7.9b)

and, in particular, for the diagonal elements,

\frac{1}{\delta\alpha_t} R^{(1)}_{ii} = 2 \sum_r^{occ} \sum_{\sigma'}^{unocc} \frac{C_{t\sigma'} C_{tr} C_{i\sigma'} C_{ir}}{\epsilon^0_r - \epsilon^0_{\sigma'}}.   (7.9c)

These quantities are respectively the atom-bond and atom-atom polarizabilities (\pi_{ij,t} and \pi_{i,t}), to within a factor of two, as defined by Coulson and Longuet-Higgins (1947).
One may obtain second derivatives of the elements of R with respect to one or more diagonal elements of H_{AO} in an analogous manner. A summation over all t is incorporated into eq. (7.8), allowing for a simultaneous perturbation of all diagonal elements of H^{(0)}_{AO}. The second derivatives of elements of R with respect to two diagonal elements, H_{pp} and H_{qq}, of H^{(0)}_{AO} are obtained by isolating the coefficient of \delta\alpha_p \delta\alpha_q in R^{(2)}_{ii},

\frac{\partial^2 R_{ii}}{\partial H_{pp}\,\partial H_{qq}} = \Big[ - \sum_{r,s}^{occ} \sum_{\sigma'}^{unocc} \frac{C_{p\sigma'} C_{pr}\, C_{q\sigma'} C_{qs}\, C_{ir} C_{is}}{(\epsilon^0_r - \epsilon^0_{\sigma'})(\epsilon^0_s - \epsilon^0_{\sigma'})} + \sum_r^{occ} \sum_{\sigma',\tau'}^{unocc} \frac{C_{p\sigma'} C_{pr}\, C_{q\tau'} C_{qr}\, C_{i\sigma'} C_{i\tau'}}{(\epsilon^0_r - \epsilon^0_{\sigma'})(\epsilon^0_r - \epsilon^0_{\tau'})} + 2 \sum_r^{occ} \sum_{\sigma',\tau'}^{unocc} \frac{C_{i\sigma'} C_{ir}\, C_{p\sigma'} C_{p\tau'}\, C_{q\tau'} C_{qr}}{(\epsilon^0_r - \epsilon^0_{\sigma'})(\epsilon^0_r - \epsilon^0_{\tau'})} - 2 \sum_{r,t}^{occ} \sum_{\sigma'}^{unocc} \frac{C_{i\sigma'} C_{ir}\, C_{p\sigma'} C_{pt}\, C_{qt} C_{qr}}{(\epsilon^0_r - \epsilon^0_{\sigma'})(\epsilon^0_t - \epsilon^0_{\sigma'})} \Big] + (p \leftrightarrow q),   (7.10)

where (p \leftrightarrow q) denotes the same four terms with p and q interchanged. While this is considerably more complicated than the first order formula, it is nevertheless obtained quite straightforwardly from the formulas tabulated for P_A. This procedure can be continued to arbitrary order, but the explicit formulas rapidly become more complex and less useful. The computation of high order terms can be done more efficiently by successively calculating the f^{(i)} and g_A^{-1(i)}, evaluating the P_A^{(i)} in terms of these quantities numerically using eqs. (7.2), and then transforming to the atomic orbital basis. The attractive feature of the derivation of (7.9) and (7.10) above is that the various summations in the formulas for the R^{(n)}_{ij} appear automatically as being either over occupied orbitals or over unoccupied orbitals. This is not so when conventional perturbation formulas, based on the perturbation series for the occupied orbitals, are used, for which the derivation and simplification of formulas for R^{(n)} for n > 1 becomes very laborious.

For a two-center perturbation given by (H^{(1)}_{AO})_{pq} = (H^{(1)}_{AO})_{qp} = \delta\beta_{pq}, the matrix of the perturbation in the molecular orbital basis is

(H^{(1)}_{MO})_{rs} = \delta\beta_{pq} (C_{pr} C_{qs} + C_{qr} C_{ps}).   (7.11)

The formulas in Tables 7.1 - 7.3 and eq. (7.5) yield immediately the bond-bond and bond-atom polarizabilities, \pi_{ij,pq} and \pi_{i,pq} (to within a factor of 2), respectively,

\frac{\partial R_{ij}}{\partial H_{pq}} = \frac{1}{\delta\beta_{pq}} R^{(1)}_{ij},   (7.12a)

where

\frac{1}{\delta\beta_{pq}} R^{(1)}_{ij} = \sum_r^{occ} \sum_{\sigma'}^{unocc} \frac{C_{p\sigma'} C_{qr} + C_{q\sigma'} C_{pr}}{\epsilon^0_r - \epsilon^0_{\sigma'}} [C_{i\sigma'} C_{jr} + C_{ir} C_{j\sigma'}],   (7.12b)

and
\frac{1}{\delta\beta_{pq}} R^{(1)}_{ii} = 2 \sum_r^{occ} \sum_{\sigma'}^{unocc} \frac{C_{i\sigma'} C_{ir}}{\epsilon^0_r - \epsilon^0_{\sigma'}} [C_{p\sigma'} C_{qr} + C_{q\sigma'} C_{pr}].   (7.12c)

Higher order formulas are obtained straightforwardly here also, but even in second order, they are lengthy and none will be given here.

7.2.c Numerical Example -- Huckel Theory

To obtain some information on the nature and usefulness of high order perturbation series for the charge-bond order matrix, P = 2R, a number of numerical calculations were carried out, based on three Huckel-like hamiltonians. The first two of them were

H^{(0)}_{A_6} = \begin{pmatrix} 0&-1&0&0&0&-1 \\ -1&0&-1&0&0&0 \\ 0&-1&0&-1&0&0 \\ 0&0&-1&0&-1&0 \\ 0&0&0&-1&0&-1 \\ -1&0&0&0&-1&0 \end{pmatrix},   (7.13)

which could represent a six-membered ring of identical atoms in Huckel theory (e.g. benzene), and

H^{(0)}_{A_4B_4} = \begin{pmatrix} -1&-1&0&0&0&0&0&-1 \\ -1&0&-1&0&0&0&0&0 \\ 0&-1&-1&-1&0&0&0&0 \\ 0&0&-1&0&-1&0&0&0 \\ 0&0&0&-1&-1&-1&0&0 \\ 0&0&0&0&-1&0&-1&0 \\ 0&0&0&0&0&-1&-1&-1 \\ -1&0&0&0&0&0&-1&0 \end{pmatrix},   (7.14)

representing an eight-membered ring around which two kinds of atoms alternate (e.g. P_4N_4). The third example, denoted as H^{(0)}_{A_5B_3}, is obtained by setting the (1,1) element of H^{(0)}_{A_4B_4} to zero. Series for P_{11} and P_{12} for a single-center perturbation (H_{11} varying) only will be described. They can be written as

P_{11} = \sum_{n=0}^{\infty} P^{(n)}_{11} (\delta\alpha_1)^n,  P_{12} = \sum_{n=0}^{\infty} P^{(n)}_{12} (\delta\alpha_1)^n,  \delta\alpha_1 = H_{11} - H^{(0)}_{11},   (7.15)

and the coefficients in each case for n = 0, 1, 2, 3, and 4 are given in Tables 7.6 - 7.8. Because of the symmetry of H^{(0)}_{A_6}, the series for P_{11} contains only odd order terms, while that for P_{12} contains only even order terms. For the A_4B_4 and A_5B_3 systems, none of the coefficients in the series for P_{11} and P_{12} through fourth order are zero. The coefficients in the series obtained decrease in magnitude quite rapidly, with the fourth order coefficient being smaller than the zero order term by a factor of up to several hundred. The coefficients given in Tables 7.6 - 7.8 are both positive and negative, but no pattern in sign is recognizable to fourth order.
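The first order coefficient for the A_6 ring can be reproduced directly from the polarizability formula above. The sketch below is not part of the thesis; it evaluates eq. (7.9c) (times two, since P = 2R) for the benzene-like hamiltonian of eq. (7.13) and compares it with a finite-difference derivative of the exact charge-bond order matrix. Table 7.6 quotes P_{11}^{(1)} = -0.398148 for this system.

```python
import numpy as np

# Sketch (not from the thesis): dP_11/dH_11 for the A_6 ring, eq. (7.13),
# via the atom-atom polarizability formula (7.9c), with P = 2R.

n, n_occ = 6, 3
H0 = np.zeros((n, n))
for i in range(n):
    H0[i, (i + 1) % n] = H0[(i + 1) % n, i] = -1.0

eps, C = np.linalg.eigh(H0)

def P_matrix(H):
    e, U = np.linalg.eigh(H)
    occ = U[:, :n_occ]
    return 2.0 * occ @ occ.T          # charge-bond order matrix P = 2R

# dP_11/dH_11 = 2 * 2 * sum_{r occ, s unocc} C_1s C_1r C_1s C_1r / (eps_r - eps_s)
dP11 = 4.0 * sum(C[0, s] * C[0, r] * C[0, s] * C[0, r] / (eps[r] - eps[s])
                 for r in range(n_occ) for s in range(n_occ, n))

# Finite-difference comparison
h = 1e-5
Hp = H0.copy(); Hp[0, 0] += h
Hm = H0.copy(); Hm[0, 0] -= h
dP11_fd = (P_matrix(Hp)[0, 0] - P_matrix(Hm)[0, 0]) / (2 * h)
print(dP11, dP11_fd)
```

Both numbers agree with the tabulated first order coefficient, -0.398148, the familiar self-polarizability of a benzene carbon in units of the resonance integral.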
Plots of exact values of P_{11} and P_{12} as functions of H_{11}, along with plots of the first through fourth order series approximating these quantities, are given in Figures 7.1 - 7.6.

The two matrices, H^{(0)}_{A_4B_4} and H^{(0)}_{A_5B_3}, can be considered as alternative zero order terms when the (1,1) element only is to be perturbed. Thus, the exact quantities P_{11} and P_{12}, considered as functions of H_{11}, are identical in both cases. The alternative series given in Tables 7.7 and 7.8 are then seen to be power series expansions of the functions P_{11}(H_{11}) and P_{12}(H_{11}) around two different values of H_{11}.

Two potential pitfalls in the use of high order perturbation series, which warrant some emphasis, are illustrated by the results here. These are rather obvious dangers which apply quite generally to the use of any truncated power series expansion, which is what such finite perturbation approximations actually are. Firstly, as the size of the perturbation increases, the error in a higher (but finite) order partial sum eventually becomes larger than the errors in the lower order truncations of the series, although, by the time this occurs, none of the partial sums of lower order may be sufficiently accurate to be useful. Thus, while the inclusion of the next higher order term in a given series will generally increase the accuracy of the approximation when the perturbation is small, it may substantially decrease the accuracy if the perturbation is large. Secondly, the range of acceptable accuracy of an approximation to a given order depends significantly on the zero order approximation used. As seen from Figures 7.4 and 7.6, the first order approximation for P_{12}(H_{11}) has a considerably wider range of usefulness around H^{(0)}_{11} = -1 than around H^{(0)}_{11} = 0. If an approximation for P_{12} as a function of H_{11} is desired for -1 < H_{11} < 0, it is clear that the zero order hamiltonian H^{(0)}_{A_4B_4} is superior to H^{(0)}_{A_5B_3}.
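The first pitfall can be seen in miniature with any function known in closed form. The sketch below is purely illustrative (it does not use the thesis' hamiltonians): partial sums of the geometric series for 1/(1-x) stand in for a perturbation series in the perturbation strength x, showing that extra orders help for a small perturbation but make things worse for a large one.

```python
# Illustrative sketch: behavior of truncated power series, using
# 1/(1-x) = sum_n x^n as a stand-in for a perturbation expansion.

def partial_sum(x, order):
    return sum(x ** n for n in range(order + 1))

def exact(x):
    return 1.0 / (1.0 - x)

# Small "perturbation": each added order improves the estimate.
small = [abs(partial_sum(0.2, k) - exact(0.2)) for k in range(1, 5)]

# Large "perturbation" (outside the radius of convergence): higher order
# partial sums are eventually worse than lower order ones.
large = [abs(partial_sum(1.5, k) - exact(1.5)) for k in range(1, 5)]

print(small)
print(large)
```

The second pitfall, the dependence on the expansion point, corresponds to expanding the same function around a different x_0; the radius of useful accuracy moves with the expansion point, exactly as in Figures 7.4 and 7.6.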
A4f°4- 5 3 TABLE 7.6 (PA0)(i) for Ac- System (H<°} = 0) p£l) = -0.398148 pj2) = 0.0 P*3) = 0.031875 P<f = 0.0 p[P = 0.666667 (2/3) p<2) = -0.053626 p[l> =0.0 ,(4) 12 0.006489 TABLE 7.7 (pAQ)^ for an A^ System (H^^ = -1) (0) . p(0) _ rll ~ p(D _ p(2) _ Pll ~ pll " p<4) . *ll " 1.477301 -0.273157 -0.068263 -0.001732 0.005201 >(0) _ 12 " >(D = 12 3(2) _ 12 ~ 3C3) . 12 ~ ,(4) . 12 " 0.5758691 0.104447 -0.009937 -0.011540 -0.002410 TABLE 7.8 (PAQ)(l* for an A.B^ System: (HJ^ = 0) ,(0) _ 11 ~ 1.140825 pli* = -°»3&7179 Pli^ = -°»0301^7 P^3) = 0.027841 PJJ7 = O.OO5568 FlV = °«65?296 P^ = O.045056: p12* = -°»°^8380 p12^ = -°»oo8863 P1Z = °«00^3 230. 231. 232 233. 234 235 7.3 Perturbation of the Density Matrix -- Non-orthonormal Basis 7»3va General Theory In this section, perturbation formulas for the density-matrix are developed for the case in which the primitive (atomic orbital) basis is initially non-orthonormal, and in which the overlap between these orbitals may itself be perturbed. Such a situation would arise, for example, for a perturbation involving a bond length change in; a Huckel-type molecular orbital formalism. The major complications here are the explicit presence of the perturbed overlap matrix, and the fact that the transformation between the zero order molecular orbitals and the atomic orbital basis is now non-unitary. The projec;-tion, PA», onto the space of occupied orbitals, is still given by eq. (7.2), and therefore, the formal expansions, (7*3)$ still hold.. However, now the formulas for the f ^ and g^ J' must be obtained from section 6.4. The initial perturbation series are calculated in a basis of zero order molecular orbitals, with coefficients relative to the atomic orbital basis denoted here as the columns of the (generally-non-unitary) matrix C'. That is,, in the calculation of the f ^ 3) and gj^*^',. the perturbations are HM0) = C+HA0>G - SM0) = C+SA0)c» (n=0„l,2,,...)), (7.1S) where H^0^ and S'M°^ are to be at least block diagonal (so that. 
ff*°* =0). When CtSA^C = S^ = 1„ the transformation, t of the density matrix,, PA.,, from; the molecular orbital basis to 236. the original atomic^ orbital basis can be written as R = CPAcJ , (7.17) When HM0 Is diagonal,, explicit formulas for the elements of i PA (and R), in terms of those of H,, S,, and C, only, can be written down. It: will be assumed that Is diagonal and that SMQ ' = 1 In: what follows., The zero order termi of PA Is still given by eq. (7«3)» However,, the first order term is now i (1) . -(S(1)) f(l)t ^MO 'AA 1 f;(D o (7.18) The matrix elements between zero order occupied orbitals appear as a result of the perturbation: of the overlap matrix.. Explicit formulas for the matrix elements of the blocks of p^ni) In terms of HMQ and1 S'M0 only are given in; Tables 7«9», 7el0'» and 7.11 for m = 0,1„ and 2. A perturbation^ series for the Huckel energy E (eq. (7.6)) can again; be obtained'using eq. (7.7). Expressions for the E^n^ In terms of fV H„ and. S only are given in. Table 7.12 for n =0, (2) 1, 2, and' 3» No difficulty Is encountered in eliminating fA ' frorm the expression: obtained via (7*7) for Ew • However, no (h.) (<) attempt was made to) verify that Ev ' and Ew/ can; be written down; solely In; terms of f^ and f^2^ by; using the conditions defining the f ,, as was done- for the case of an; orthonormal basis. Formulas for the Ev ' In; terms of the elements of H and S only are given for m = 0,, 1,, and 2 in Table 7.12. 237 I (—A TABLE 7.9 (PA^AA — Won-orthonormal Basis (p')(0) = 1 * A AA lA (P*)(l) - -S(1) * A AA AA A AA Jrs rs 4 r , ,o ,Oi . ., rt ts P-1 Ug-€pj t=l nB (Hi^-C^yj^^-S^C?) + N rp r rp ps sp p' TABLE 7.10'' (PA^BA — Non-orthonormal Basis (P*)(0) = 0 * A BA U „(D eoq(l) ^ A BA -W " co co €r " 6c NB (HL^^ii^CHii^ii.)) ^ A BA -Jor; nor r-or ^o: ^ €0j "fl tr J £ - *° t=l (€?-€°) tr t o 238. 
TABLE 7.11 (P^)^ Non>orthonormal Basis (P')(0) = 0 * A BB U fp'y^1^ = o A BB J op nA (H^ r-l - S(1)€°)(H(1)  ore it7 v r<o - S(1)€°) rp ry TABLE 7.12 E^7 — Nion-orthonormal Basis E<°> * v tr HA°» "t2> • * trCHjr^Hji).,'^^^ -r<i>3ii>^>^Mi)*H(i)+1.u),ci)tH<i) +[-(^M2B^(1,^(1,t(S<A,^'f(1>)) + <SAi>+SAE f(1 ^f*1) f (S <A>+f <1 > ) )s£> ^(1)0(1)^(1)^(0) -r r BB 239. TABLE 7.13 E.^' — Non;-orthonormal Basis E<°> - v LX r=l ^ nA . rr r rr r=i E = v n B (H v Z 0 = 1 (1) cr; " €r?or- > H(l) 0^ -o.\. ro 240 7»3»b Extended' Huckel Molecular Orbital Theory The formulas: just developed for use with a non-orthonormal basis will new be used to derive explicit expressions for the first order response of the elements of the density matrix R (in the atomic orbital basis) — equal to the charge-bond order matrix divided by 2 — under a perturbation of both the hamil tonian and overlap; matrices* These formulas: are analogous to those of section 7»2,b. for ordinary Huckel theory, and would be applicable, for example, in an extended Huckel formalism. The increased complexity of the formulas due to the presence of the overlap matrix probably accounts in part for the lack, of a detailed treatment of this problem', in the literature» although a number of low order formulas have appeared in connect ion with particular applications (Fujimoto et: al,, 1974; Co ope, 1956; Libit and Hoffmann,, 1974).. For single center perturbations,, we; have n (H t^6at 6ptV' and (7.19a) n (S 2 6S t=l tt 6pt6qt so that (Hi t?1 6ttt CtiCtj and (7.19b>) n (S t^ 5SttCtICtj The derivatives a%j/3(H'A0^pp and aRij/3 ^SA0^pp are Siven bv 241. the coefficients of 6a and 6S , respectively, in the expression obtained for RH by inserting the first order formulas in Tables 7.9 - 7.11 with eq., (7.19b), in eq. (7.17). In detail (1) n occ: occ # # R..' = - E 6a. I £ C. C. C. C. U t=1 t r s ir tr ts _js (7.20) ni occ unocc [6a. - €°6S+. 3 + £ £ E x- r„ xx C. C. 
, t=l r o €rr- €a, x [C^C*^ + Cia,C*,.], where the €^ are the eigenvalues of H^0^, and it has been; assumed that Hjj^ is diagonal and = 1., (1) In the same way, for two-center perturbations, (&A0 'pq= <Kio\p " 6Spq' and ^A^'pq " \p " 6Spq' for a11 **• (p / q.)» which implies that V'tahi m _ 6ewt°pi0q3 * °pfqi]> p,q-x and (7.21) one obtains /x n occ occ R;V = - E 6S E E C . [C C +C C 1 G (7.22) ij p,q=l 1X1 r s irL pr qs ps qrJ js n occ: uriocc (6P__-6°>6S__ ) + E EE q r • [C- C ,+C ,C ] ~ „ p° c0 u pr qo1 po* qrJ ptlq=l r o r " a* 242 ffcom which the first derivatives of the elements of R with respect to off-diagonal elements of HAQ and SAQ may be obtained# Formulas for higher order terms in the series for R cam he obtained here in the same way. However, they are long and tedious, and not very informative by themselves. Nevertheless, using formulas developed im section 6,4, it is possible to compute these higher order terms numerically for specific applications, in a relatively efficient manner* 243. 7»4 Self-consistent Perturbation: Theory The object of this section is to develop a perturbation formalism for the one-particle density matrix in closed shell Hartree-Fock theory.- The formulas developed here allow a more rigourous calculation of various properties of atoms and molecules than those given in previous sections of this chapterr because electron repulsion terms are included expli citly. The entire effect of the self-consistency terms is buried in the detailed calculation of the perturbation series for f, and therefore, formulas for the density matrix and related quantities in terms of f, which were derived for the simple matrix case, will apply here also. 
7.4.a General Theory

In this case, the perturbation series for the operator f is obtained by expansion of the equation

D(f) = F_{BA} + F_{BB} f - f(F_{AA} + F_{AB} f) = 0,   (7.23)

leading to a hierarchy of equations determining the f^{(n)}. This hierarchy is formally identical to that obtained in the simple matrix case, except that now the matrix F (the closed shell Fock matrix) itself depends on f through its dependence on the density matrix, P_A,

F(P_A) = H + G(P_A).   (7.24)

Here H is the core hamiltonian, and the two-electron part, G(P_A), representing electron repulsion, is given by

G(P_A)_{rs} = \sum_{t,u=1}^{n} (P_A)_{tu} \big( 2[rs||ut] - [rt||us] \big),   (7.25a)

with

[rs||ut] = \int \phi^*_r(1)\phi_s(1)\, r_{12}^{-1}\, \phi^*_u(2)\phi_t(2)\, d\tau_1 d\tau_2,   (7.25b)

the \phi_r being elements of the zero order molecular orbital basis used for the calculation. Direct iterative solution of eq. (7.23) for f, without making a perturbation expansion, is equivalent to solving the Hartree-Fock equations exactly.

We will consider here only those perturbations which can be introduced as perturbations of the core hamiltonian,

H = \sum_{n=0}^{\infty} H^{(n)}.   (7.26)

This perturbation will induce changes in the electron distribution described by P_A, and through that, the two-electron part of the Fock matrix is perturbed. Thus, the nth order term in the perturbation series for the Fock matrix consists not only of the nth order term, H^{(n)}, in (7.26), but also includes an nth order two-electron term. The exact form of this nth order two-electron term depends on the manner in which eq. (7.23) is expanded into a hierarchy of equations determining the terms of the series for f, as is explained below.

It is convenient, but not strictly necessary, to require that F^{(0)}(f) be at least block diagonal, so that f itself is at least a first order quantity, f = \sum_{n=1}^{\infty} f^{(n)}. In some applications, it may be desirable to relax this requirement,
and the modifications which must be made in such a case to the formalism given below are indicated in Appendix 10. However, in this section, and sections 7.4.b and 7.4.c following, it will be assumed that we are working in a basis in which F^{(0)} is at least block diagonal.

Formal substitution of the series for f and F into (7.23) gives

D^{(n)} = F^{(n)}_{BA} + \sum_{j=1}^{n} \big( F^{(n-j)}_{BB} f^{(j)} - f^{(j)} F^{(n-j)}_{AA} \big) - \sum_{i,j \ge 1,\; i+j \le n} f^{(i)} F^{(n-i-j)}_{AB} f^{(j)} = 0,   n = 0, 1, 2, ... .   (7.27)

Here, one has

\tilde A^{(n)}_{BA} = \sum_{j=1}^{n-1} \big( F^{(n-j)}_{BB} f^{(j)} - f^{(j)} F^{(n-j)}_{AA} \big) - \sum_{i,j \ge 1,\; i+j \le n} f^{(i)} F^{(n-i-j)}_{AB} f^{(j)},   (7.28)

which does not depend explicitly or implicitly (through F^{(j)}) on f^{(n)}. The quantity (7.28) is not the same as the analogous quantity in eq. (6.9) in the simple matrix case, because F^{(n)} may now depend on f^{(n)}, and therefore, for the purposes of solving (7.27), it must appear explicitly. The extension of these equations to a non-orthonormal basis will lead to equations similar to (6.54) and (6.55) in place of (7.27) and (7.28).

Despite the similarity of the basic equations, the determination of the perturbation series for f is more complicated here than in the simple matrix case when the term F^{(n)} in (7.27) is considered to depend on f^{(n)}, that is, when the nth order Fock matrix is considered to be

F^{(n)} = H^{(n)} + G(P_A^{(n)}).   (7.29)

When (7.29) is incorporated into (7.27), the so-called "coupled Hartree-Fock" perturbation scheme results, and the equations for the f^{(n)} in this case are derived in the following subsection. Formalisms in which the dependence of F^{(n)} on P_A^{(n)} is partially or completely neglected, leading to schemes referred to as "uncoupled Hartree-Fock" perturbation theory, are discussed in section 7.4.c.

7.4.b Coupled Hartree-Fock Perturbation Theory

In the coupled Hartree-Fock perturbation scheme, F^{(n)} is considered to be dependent on f^{(n)}, as indicated in eq. (7.29). That is, the two-electron integrals are considered to be order neutral. It is convenient to write the nth order density matrix, P_A^{(n)}, in the form

P_A^{(n)} = \begin{pmatrix} 0 & f^{(n)\dagger} \\ f^{(n)} & 0 \end{pmatrix} + \tilde P_A^{(n)},   (7.30)

where \tilde P_A^{(n)} depends on the f^{(j)} for j \le n-1 only. Thus, eqs. (7.27)
^(n) + 0 A A + f(n) f(n)t •S'(n) An) (7.30) where l>Mn^ depends: on f^ for $4 n-1 only. Thus, eqs. (7.27) 247. become D(n)(f) » GBA(f(n)) * PB^f(n) " f(n)F^) + F^(P^n)) BA (n) (7'31  + = 0. n = 0, 1, 2fl ... , where The last two terms of (7*31) are now independent of f^n^. The important! feature of eqs. (7*31)# however, is that the first (n) three terms — those which are dependent on fv ' — are of a particularly simple form, which is the same for all values of n. Prom; eq. (7.25a), unocc occ GBA(fl Ks =22 fa™ 2[r's|ro'] - [f'oMIrs] o r (7.33) unocc occ / \» + Z Z f<n' 2[r»s]|o»r] - O'rllo's] . a r If all quantities are real,, this reduces to / \ unocc occ. /_\ GBA(fl Vs = E z fcv' Mr'sllo'r] - [r»o»||rs] - [r«r||o's] unocc occ / \ - L Z iJJ'A .8ro. (7.34) a r The four index quantity, Ar,sro,, has sometimes been referred to as the Nesbet supermatrix. Thus, if the zero order Fdck matrix is block diagonal, the n* order equation (7.31) cam 248 be written D(n) . <*" "J8* B f(n) + [p(n)(?;(n), , AW-, . rs ~ ~~ Tsro or L BA A 'O BAJrs ' r o (falt ,,,, ng, s=l, ,,,, nA), (7.35a) where Vra " (PBB Wrs " ^AA^rsVo + Vsro* (7.35*) Where the zero order Fock matrix is diagonal, eq, (7.35a) becomes t~\ « /_A unocc: occ. t\ rs T s rs _ „, Tsro or o r (7.363 • \?<j*hil<*>).• - o. (r*lt nBi s»l, •••» nA)t where the €^ are the eigenvalues of the zero order Fock matrix. In either case, the calculation.of f^n^ reduces to the solution of a system of nAng simultaneous linear equations. Even when the zero order Fock matrix is diagonal, it is no longer possible to obtain a closed formula for the elements of fbecause of the self-consistency term,. However, only the terms Fg^CPj^*1^) and'Ag^ need be calculated for each value of n, since the coefficients of the f^} in D^(f) are the same for every value of n. 
The calculation of these two quantities is easily done automatically, and therefore, the formalism above can be used to calculate high order perturbation series for PA without having to derive and use explicit perturbation equations. 249. Because of the potentially large dimension of the matrix B im eq. (7.35a),, mon-iterative methods of solution (such as Gaussian elimination) may not be practical in application! to that linear system, especially if B is a sparse matrix. Per haps the simplest iterative technique is the Gauss-Seidel procedure, with the iteration formula (n) (rr) (m) 1111000 000 .(n)n EFBA (PA )+ABA ^ts" oSr rSs ^sro* .(in) or ^n;B *-"i3A " A * OA -"rs pgr r»<B Tsro uc ^ (7.37) BTSST This procedure has been found to be satisfactory im the small number of calculations we have done, although no data has beem obtained on actual rates of convergence. Many other efficient techniques are available for the iterative solution of large linear systems* We shall not explore this aspect further here, however. 7.4.C Uncoupled Hartree-Fock Perturbation Theory The term "uncoupled Hartree-Fock perturbation theory" has been applied to a number of related approximations, proposed over a period of years, im order to simplify the solution of (7.27) (for example, Langhoff, Karplus, and Hurst, 1965f Musher, 1967)t usually only in first order. The complicated coupling term- im eqs. (7.35a) or (7.36) arises directly from the requirement that self-consistency be maintained in all orders in the perturbation. However, in a situation im which the perturbation is expected to distort the electronic distri-250. bution only .slightly, it may be possible to obtain acceptable results by relaxing this self-consistency requirement somewhat. In first order, this amounts to ignoring the dependence of *BA^ on leading to the same result as in the simple matrix case. f(l)unca_or (? 38) r " €o whemF^0^ is diagonal. 
A degree of ambiguity enters if this formalism is to be extended to higher order, however. It is not clear whether one should ignore just the f^ dependent part of p£nK or all of PA^ irn the nth order equation, (7.27). There may be an accumulation of non»-self-consistency as one proceeds to higher orders, depending on the exact form of the approximations employed,, and this may cast doubt on the validity of these higher order terms. An internally consistent and unambiguous perturbation formalism does result if the two-electron, integrals are con sidered to be first order quantities except where they enteir implicitly in P(0). Them p(n) . H(n) + G(PA(n-1}), (7.39) and no self-consistency term in f^n^ will occur in the n**1 order equation of (7.27). In fact, except for the implicit dependence of the in A^ om lowerr order f^, the resulting hierarchy of equations determining the f^n^ will be identical to that in the simple matrix case. Only by actual calculations can the validity of the assumption (7*39) be assessed, however. 251. CHAPTER 8 DIRECT MINIMIZATION SELF-CONSISTENT FIELD THEORY "•In that case', said the Dodo solemnly, rising to its feet, *I move that the meeting adjourn, for the immediate adoption of more energetic remedies—*" "•Would you tell me, please, which way I ought to go from here?* •That depends a good deal on where you want to get; to*, said the Cat© 'I don't much care where—', said Alice, •Then it doesn't matter which way you go', said the Cat. •—so long as I get somewhere', Alice added as an explanation. •Oh, you're sure to do that*, said the Cat, 'If you only walk long enough*• Alice felt that this could not be denied, s she tried another question, 'What sort of people live about here?1 •In that direction,* said the Cat, waving its right paw around, 'lives a Hatteri and in that direction', waving the other paw* •lives a March Hare, Visit either you lik they're both mad,* (Alice's Adventures in  Wonderland, Lewis Carroll) 252. 
8.1 Introduction

In the Hartree-Fock approximation, the total electronic energy of an atomic or molecular system described by a single determinant wavefunction, \Psi, can be written in a given basis as

E = \sum_{i=1}^{m} v_i \sum_{r,s=1}^{n} R^{(i)}_{sr} h_{rs} + \frac{1}{2} \sum_{i,j=1}^{m} v_i v_j \sum_{r,s=1}^{n} \sum_{t,u=1}^{n} R^{(i)}_{sr} R^{(j)}_{tu} \big( [rs||ut] - a_{ij}[rt||us] \big)

  = \sum_{i=1}^{m} v_i \sum_{r,s=1}^{n} \sum_{\alpha} X^{(i)*}_{s\alpha} h_{rs} X^{(i)}_{r\alpha} + \frac{1}{2} \sum_{i,j=1}^{m} v_i v_j \sum_{r,s=1}^{n} \sum_{t,u=1}^{n} \sum_{\alpha,\beta} X^{(i)*}_{s\alpha} X^{(i)}_{r\alpha} X^{(j)*}_{t\beta} X^{(j)}_{u\beta} \big( [rs||ut] - a_{ij}[rt||us] \big).   (8.1)

Here h is the core hamiltonian for the system, and the [rs||ut] are two-electron integrals defined in eq. (7.25b). The summation indices, i, j, refer to electronic shells. The X^{(i)}_{r\alpha} are expansion (lcao) coefficients, expressing the occupied normalized orbitals as linear combinations of the given basis functions. The v_i are occupation numbers for these orbitals, and the a_{ij} are constants determined according to the values of v_i and v_j. The operator R^{(i)} (the one-particle density matrix for the ith shell) is a projection onto the space of the ith shell occupied orbitals,

R^{(i)} = X^{(i)} X^{(i)\dagger}.   (8.2)
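The stationary values of an energy functional of this kind are usually found by repeated diagonalization of an orbital-dependent operator, as described in the text that follows. The sketch below is a toy model, not the thesis' formalism: a 4-site closed-shell system in which the mean-field operator F(R) = h + U diag(2R) is a crude, invented stand-in for a Fock matrix built from real two-electron integrals. Self-consistency is signalled by [F(R), R] = 0.

```python
import numpy as np

# Toy sketch of a self-consistent iteration (invented test data).
n, n_occ, U = 4, 2, 0.5
h = np.zeros((n, n))
for i in range(n - 1):
    h[i, i + 1] = h[i + 1, i] = -1.0
h[0, 0] = -0.5                          # make the sites inequivalent

def fock(R):
    # crude mean-field repulsion: on-site potential proportional to charge
    return h + U * np.diag(np.diag(2.0 * R))

R = np.zeros((n, n))                    # crude initial guess
for _ in range(1000):
    e, C = np.linalg.eigh(fock(R))
    R_new = C[:, :n_occ] @ C[:, :n_occ].T   # occupy the lowest orbitals
    if np.linalg.norm(R_new - R) < 1e-12:
        R = R_new
        break
    R = 0.5 * (R + R_new)               # damping to stabilize the iteration

commutator = np.linalg.norm(fock(R) @ R - R @ fock(R))
print(commutator)
```

The damping step is one simple answer to the convergence difficulties mentioned below for multi-shell systems; without it, plain iteration of this kind can oscillate.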
It is conceptually very simple, and in applications to the simplest (single shell) systems, rates of convergence relative to the work required in each iteration are quite good. Difficulties in obtaining convergence do arise, however, especially in calculations involving more complicated multi-shell systems. An alternative to the use of the Hartree-Fock equations is to minimize the energy, E„ directly with respect to a chosen set of variables. One problem Ira using the elements of the density matrices, R^, or the lcao coefficients, X^\, for this, is that a relatively large number of constraints must be imposed if the simple functional form, (8.1), of the energy 254. Is to be preserved. When using the lcao coefficients, the presence of redundant variables also causes difficulties for procedures, such as the Newton-Raphson method (Appendix 11), which require that the matrix of second derivatives of E (the Hessian matrix) be non-singular near stationary points. Redundancy among the expansion coefficients is associated with the invariance (to within a complex soalar factor) of the determinantal wavefunction! to non-singular linear transformations of occupied orbitals im the same shell. Under orthonormality constraints, the redundancy associated with unitary transformations still remains. The density matrices contain no redundancy, but must satisfy more complicated constraints. The presence of q redundant variables implies the existence of a q-dimensional constant energy surface through each point in the coordinate space of the unconstrained variables. A serious consequence for some gradient minimization techniques is that the Hessian matrix is then singular at stationary points of the energy (Sutcliffe, 1974, 1975i Coope, unpubl.). 
An efficient technique for eliminating the orthogonality constraints on the lcao coefficients for closed shell systems has been developed by Fletcher (1970), and extended by Kari and Sutcliffe (1970, 1973) to more general multi-shell and multi-determinant cases. However, calculations in which Fletcher's method is used lm conjunction^with the conjugate gradient minimization technique, are frequently poorly 255. convergent near the energy minimum. As a result, such direct energy minimization procedures have been used more to provide improved starting estimates for the solution of the Hartree-Fock equations than as am alternative to the Hartree-Fock equations (see, for example, Claxton and Smith, 1971). Sutcliffe (1974, 1975) has explicitly exhibited the singul arity of the Hessian matrix at the energy minimum in formalisms based on Fletcher's method,; and he has suggested that this singularity may contribute to the slowness of convergence near the energy minimum.. This suggestion is questioned below, both om theoretical grounds, and by examination of rates of converg ence for calculations involving minimization of the energy with respect to a set of unconstrained variables containing no redundancies. It is our contention that the observed poor convergence rates arise rather out of deficiencies in the straightforward! implementation of the conjugate gradient mini mization algorithm. Sutcliffe (1974,, 1975) has proposed several solutions to the redundancy problem, but clearly, the simplest would be to write the total electronic energy, from the beginning, in terms of a set of umcomstraimed variables mot possessing such redund ancies. The eigenvalue independent partitioning formalisms developed ire chapters 2 and 4 provides such sets of variables* namely the matrix elements of the off-diagonal blocks of the matrix T. In the following sections, the application! of the partitioning formulas to the minimization of the energy of a 256. 
system represented by a single determinant wavefunction is described. One of the major advantages here is the fact that the derivatives of E with respect to these variables can be expressed very simply in terms of the columns of the projections, (8.2), onto the occupied orbitals, and their complements. A scaled descent method, based on partitioning with respect to current occupied and unoccupied molecular orbitals, is proposed (section 8.3.c), which appears to be very successful in practice.

8.2 Closed Shell Systems

8.2.a Orthonormal Basis

The square matrix, X, of the eigenvectors of the closed shell Fock matrix is partitioned into the n_A occupied and the n_B unoccupied molecular orbitals. The orthonormal basis functions, in terms of which these orbitals are expressed, are partitioned into two sets of the same dimensions, n_A and n_B, defining spaces S_A and S_B. In this way, the coefficient matrix X can be written in the blocked form (2.2). The projection, R, onto the space of occupied orbitals, is given by eq. (2.10) as

    R = [ g_A^-1       g_A^-1 f†     ]
        [ f g_A^-1     f g_A^-1 f†   ] ,                          (8.4)

where g_A = 1_A + f†f is the metric for the eigenvectors X^(A) truncated to the space S_A. In the closed shell case, the energy functional is particularly simple,

    E = 2 tr Rh + tr RG(R),
    G(R)_rs = 2 Σ_{t,u} R_tu ( [rs||ut] - ½[rt||us] ).            (8.5)

Substitution of (8.4) into (8.5) gives the energy in terms of the matrix elements of f only. Since the degrees of freedom available, n_A n_B, exactly equal the number of matrix elements of f, there can be no redundancy.¹ Also, the matrix f is completely unconstrained, because R, eq. (8.4), automatically satisfies the criteria necessary to be a projection (section 2.1.a). In short, the matrix elements of f represent a set of unconstrained variables possessing no redundancies, with respect to which the energy can be minimized.

¹The argument involving numbers of variables is of central importance here, and is as follows for the closed shell case. (cont'd)
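The fact that (8.4) is automatically a projection for any f can be checked numerically. The following Python sketch (an illustration with a random real f, not part of the original calculations) constructs R and verifies idempotency, hermiticity, and the trace condition:

```python
import numpy as np

# Build the projection R of eq. (8.4) from an arbitrary, unconstrained
# nB x nA real matrix f, and confirm that R is automatically idempotent
# with trace nA, so that no constraints on f are required.
rng = np.random.default_rng(0)
nA, nB = 2, 3
f = rng.standard_normal((nB, nA))            # unconstrained variables f_{sigma r}
gA = np.eye(nA) + f.T @ f                    # metric g_A = 1_A + f'f
gAinv = np.linalg.inv(gA)
R = np.block([[gAinv,     gAinv @ f.T],
              [f @ gAinv, f @ gAinv @ f.T]])  # eq. (8.4), real case
assert np.allclose(R @ R, R)                 # idempotent: R^2 = R
assert np.allclose(R, R.T)                   # hermitian (real symmetric here)
assert np.isclose(np.trace(R), nA)           # projects onto an nA-dimensional space
```

A unit trace deficit or loss of idempotency would signal a coding error rather than a property of f, since these conditions hold identically for every f.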
In principle, the elements of the block R_BA also provide a set of non-redundant and unconstrained variables, but they are not very suitable for specifying the energy, because of the complicated relationship between R_BA and R_AA or R_BB.²

¹(cont'd) If no constraints are imposed on the occupied molecular orbitals, X^(A), specified by n_A n complex parameters, then these orbitals are arbitrary up to an n_A x n_A linear transformation. Therefore, there must be n_A² complex redundant variables among the LCAO coefficients in the single determinant wavefunction, leaving n_A n_B complex variables which are independent. If the molecular orbitals are constrained to be orthonormal, then n_A² real parameters are eliminated by the constraints, and n_A² of the real parameters remaining are redundant (equal to the number of independent parameters in a unitary transformation, which includes the n_A arbitrary phase factors), again leaving 2 n_A n_B real parameters, or n_A n_B complex parameters, which are independent.

²For the density matrix, the requirement of idempotency leads to the n_A² + n_B² (complex) constraints R_AA - R_AA² - R_AB R_BA = 0 and R_BB = R_BA R_AA^-1 R_AB, which give the blocks R_AA and R_BB in terms of R_BA, specified by n_A n_B complex parameters. Given R_AA and R_BA, it is easy to calculate R_BB, but the first equation here is not easily solved for R_AA (see section 2.1.c, and in particular, eq. (2.23)).

The first and second derivatives of E with respect to the elements of f are most easily obtained by the incremental approach used in section 2.1.e, but now retaining terms to second order in the variation. Writing g_A^-1(f + δf) = g_A^-1 + δ(g_A^-1), to second order one has

    δ(g_A^-1) = -g_A^-1 δg_A g_A^-1 + g_A^-1 δg_A g_A^-1 δg_A g_A^-1 + O(δ³),     (8.6)

where

    δg_A = δf†f + f†δf + δf†δf.
                                                                  (8.7)

For the density matrix, the variation R(f + δf) = R(f) + δR is given, exactly, by

    δR_AA = δ(g_A^-1),
    δR_AB = δ(g_A^-1) f† + g_A^-1 δf† + δ(g_A^-1) δf†,
    δR_BA = δf g_A^-1 + f δ(g_A^-1) + δf δ(g_A^-1),               (8.8)
    δR_BB = f δ(g_A^-1) f† + δf g_A^-1 f† + f g_A^-1 δf† + δf δ(g_A^-1) f†
            + f δ(g_A^-1) δf† + δf g_A^-1 δf† + δf δ(g_A^-1) δf†.

The first order term in the expansion of E(f + δf) can be simplified to give

    δ⁽¹⁾E = 2 tr δR F                                             (8.9a)
          = 2 tr δf† D̄ + 2 tr δf D̄†,                              (8.9b)

where

    D̄ = g_B^-1 D(f) g_A^-1.                                       (8.9c)

This is identical to the result obtained for a simple matrix (section 2.1.e), except that here the quantity D(f), given by

    D(f) = F_BA + F_BB f - f F_AA - f F_AB f,                     (8.10)

is defined in terms of the Fock operator,

    F = h + G(R),                                                 (8.11)

which is itself a function of f. As before, g_B = 1_B + f f† is the metric for the eigenvectors X^(B) of F truncated to the space S_B. From (8.9b), the first derivatives of the energy are seen to be

    ∂E/∂f_σr = 2 D̄_σr                                             (8.12a)
             = 2 [(1 - R)FR]_σr                                   (8.12b)
             = 2 F_{e_B^σ e_A^r},                                 (8.12c)

using the notation developed in section 2.1.d. Here and below, Greek letters denote basis elements in S_B, and Roman letters denote basis elements in S_A. The first derivatives of the energy with respect to the variables f_σr are therefore given by elements of the off-diagonal blocks of the current Fock matrix between contragredient non-orthonormal vectors given by the first n_A columns of R and the last n_B columns of (1 - R).

Because the metric matrices g_A and g_B are positive definite (as, therefore, are their inverses), it is seen that the first derivatives of the energy with respect to the elements of f can vanish only if D(f) = 0. In fact, this condition, or the more general one, eq. (2.13), with F̄ defined as in (2.15a), can be regarded as an expression of the Hartree-Fock equations in the present formalism. As indicated in section 2.2.a, the condition D(f) = 0 (and therefore ∇_f E = 0) is also equivalent to X†FX being block diagonal.
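The gradient formula can be verified numerically in the simple matrix case mentioned above, where F is replaced by a fixed hermitian h so that E = 2 tr Rh. The Python sketch below (illustrative, with random real data, not the thesis program) compares the analytic gradient g_B^-1 D(f) g_A^-1, which for real variables carries an extra factor of two relative to (8.12), against central finite differences of the energy:

```python
import numpy as np

# One-electron model E(f) = 2 tr(R h): check that the real-variable gradient
# dE/df_{sigma r} = 4 [g_B^-1 D(f) g_A^-1]_{sigma r}, with
# D(f) = h_BA + h_BB f - f h_AA - f h_AB f (eq. (8.10) with F -> h),
# agrees with central finite differences.  All data are random stand-ins.
rng = np.random.default_rng(1)
nA, nB = 2, 3
n = nA + nB
h = rng.standard_normal((n, n)); h = 0.5 * (h + h.T)   # real hermitian h
hAA, hAB = h[:nA, :nA], h[:nA, nA:]
hBA, hBB = h[nA:, :nA], h[nA:, nA:]

def energy(f):
    gAinv = np.linalg.inv(np.eye(nA) + f.T @ f)
    R = np.block([[gAinv, gAinv @ f.T], [f @ gAinv, f @ gAinv @ f.T]])
    return 2.0 * np.trace(R @ h)

f = rng.standard_normal((nB, nA))
D = hBA + hBB @ f - f @ hAA - f @ hAB @ f              # eq. (8.10), F -> h
grad = 4.0 * (np.linalg.inv(np.eye(nB) + f @ f.T) @ D
              @ np.linalg.inv(np.eye(nA) + f.T @ f))

num = np.zeros_like(f); eps = 1e-6                     # finite-difference check
for s in range(nB):
    for r in range(nA):
        df = np.zeros_like(f); df[s, r] = eps
        num[s, r] = (energy(f + df) - energy(f - df)) / (2 * eps)
assert np.allclose(grad, num, atol=1e-6)
```

The factor of four arises because a real f_σr varies both δf and δf† in (8.9b) and the energy carries the closed-shell factor of two (cf. the real-variable formulas quoted below).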
Isolation of the coefficients of the second order terms in E(f + δf) yields the second derivatives of the energy with respect to the elements of f. After considerable algebraic manipulation, one obtains

    ∂²E/∂f*_σr ∂f_τs = 2[ R_sr F_{e_B^σ e_B^τ} - (1 - R)_στ F_{e_A^s e_A^r} ]
                       + 4[e_B^σ e_A^r || e_A^s e_B^τ] - 2[e_B^σ e_B^τ || e_A^s e_A^r],     (8.13a)

and

    ∂²E/∂f_σr ∂f_τs = 4[e_B^σ e_A^r || e_B^τ e_A^s] - 2[e_B^σ e_A^s || e_B^τ e_A^r],        (8.13b)

with ∂²E/∂f*_σr ∂f*_τs = (∂²E/∂f_σr ∂f_τs)*.

In the particular case that the partitioning chosen is defined by the current projection R, so that f = 0 and R_AA = 1_A, and further, when particular bases adapted to R and (1 - R) are chosen which diagonalize F_AA(R) and F_BB(R), respectively, then the dominant terms in eqs. (8.13) are the derivatives ∂²E/∂f_σr ∂f*_σr, with the value

    ∂²E/∂f_σr ∂f*_σr = 2(ε_σ - ε_r)                               (8.14)

(the ε being the eigenvalues of F), equal to the singlet single excitation energies. At the energy minimum, the remaining second derivatives all reduce to combinations of two-electron integrals. To the extent that these combinations are small, the excitation energies, (8.14), approximate the eigenvalues of the Hessian matrix, which are thus positive, as they should be for an energy minimum.

In the case that all quantities are real, the above derivative formulas become (see Appendix 12)

    ∂E/∂f_σr = 4 F_{e_B^σ e_A^r},                                 (8.15)

and

    ∂²E/∂f_σr ∂f_τs = L_{σr,τs} + 4 F_{e_B^σ e_B^τ} R_rs - 4(1 - R)_στ F_{e_A^r e_A^s},     (8.16a)

with

    L_{σr,τs} = 8[e_B^σ e_A^r || e_A^s e_B^τ] - 4[e_B^σ e_B^τ || e_A^s e_A^r]
                + 8[e_B^σ e_A^r || e_B^τ e_A^s] - 4[e_B^σ e_A^s || e_B^τ e_A^r].            (8.16b)

8.2.b Non-orthonormal Basis

In this case, the density matrix, R, and the electronic energy, E, are still given by eqs. (8.4) and (8.5), respectively. However, now both E and R depend, through the metric g_A, on the overlap matrix. According to eq. (2.103a), the metric g_A is given here by

    g_A = S_AA + S_AB f + f† S_BA + f† S_BB f,                    (8.17)

so that now eq. (8.7) must be replaced by

    δg_A = (S_AB + f† S_BB)δf + δf†(S_BA + S_BB f) + δf† S_BB δf
         = Y_AB δf + δf† Y_BA + δf† S_BB δf,                      (8.18)

in the energy variation.
The quantity Y_BA = S_BA + S_BB f has been defined previously in section 5.3.c, and reduces to f for an orthonormal basis. Isolation of the first order part of E(f + δf) gives the first derivatives of the energy with respect to the elements of f as

    ∂E/∂f_σr = 2 F_{e_B^σ e_A^r},                                 (8.19)

which is identical to eq. (8.12c). The orbitals e_A^r, (r = 1, ..., n_A), are the same as before, but now the e_B^σ, (σ = 1, ..., n_B), are given as the columns of

    [ -g_A^-1 Y_AB        ]
    [ 1_B - f g_A^-1 Y_AB ]  =  [1 - RS]^(B).                     (8.20)

That is, the first derivatives of the energy with respect to the elements of f are given as matrix elements of the current Fock matrix between two sets of contragredient non-orthonormal vectors consisting, respectively, of the first n_A columns of the density matrix, R, and the last n_B columns of the complementary matrix (1 - RS).³

As before, the second derivatives are obtained by isolating the coefficients of second order terms in E(f + δf). The calculation of these coefficients is considerably more lengthy and tedious than for an orthonormal basis, but the final results are given simply by

    ∂²E/∂f*_σr ∂f_τs = 2[ (SR)_sr F_{e_B^σ e_B^τ} - (1 - RS)_στ F_{e_A^s e_A^r} ]
                       + 4[e_B^σ e_A^r || e_A^s e_B^τ] - 2[e_B^σ e_B^τ || e_A^s e_A^r],     (8.21a)

and

    ∂²E/∂f_σr ∂f_τs = 4[e_B^σ e_A^r || e_B^τ e_A^s] - 2[e_B^σ e_A^s || e_B^τ e_A^r].        (8.21b)

These formulas are identical in form to those obtained in an orthonormal basis, eqs. (8.13), except for the factors R and (1 - R) being replaced by SR and (1 - RS), respectively, in certain places.

³This is not the complement of R in the usual sense of the word. Since (RS)² = RS, one has (1 - RS)R = 0; however, the reverse product R(1 - RS) = R - R²S is not zero in general.

An analysis of the metric properties of the e_A^r and e_B^σ, similar to that of section 2.1.d for an orthonormal basis, can be carried out. The algebra is tedious and only the major results are listed here.
Writing these vectors as columns of a matrix,

    e = [ g_A^-1       -g_A^-1 Y_AB        ]
        [ f g_A^-1     1_B - f g_A^-1 Y_AB ] ,                    (8.22)

one can show that

    g = e†Se = [ g_A^-1    0                       ]
               [ 0         S_BB - Y_BA g_A^-1 Y_AB ] ,            (8.23)

verifying the non-orthonormality of the columns of e. A set of vectors dual to the e (that is, such that ẽ†e = 1) are given by

    ẽ = [ ĝ_A     -f†  ]
        [ Y_BA    1_B  ] ,                                        (8.24)

where ĝ_A = S_AA + S_AB f. These vectors are also non-orthonormal, as is seen from

    g̃ = ẽ†ẽ = [ ĝ_A†ĝ_A + Y_AB Y_BA     Y_AB - ĝ_A†f† ]
              [ Y_BA - f ĝ_A            1_B + f f†    ] .         (8.25)

It is seen from eq. (2.33) that the last n_B of the ẽ are formally the same here as in an orthonormal basis. Metric matrices, with respect to which the e and ẽ are orthonormal, can be constructed explicitly. One obtains

    Δ = ẽẽ† = [ ĝ_A ĝ_A† + f†f      ĝ_A Y_AB - f†    ]
              [ Y_BA ĝ_A† - f       1_B + Y_BA Y_AB  ] ,          (8.26)

and

    Δ̃ = ee† = [ g_A^-1(1_A + Y_AB Y_BA)g_A^-1               g_A^-1[(1_A + Y_AB Y_BA)g_A^-1 f† - Y_AB]               ]
              [ [f g_A^-1(1_A + Y_AB Y_BA) - Y_BA]g_A^-1    f g_A^-2 f† + (1_B - f g_A^-1 Y_AB)(1_B - Y_BA g_A^-1 f†) ] ,   (8.27)

for which it is easy to verify that e†Δe = 1 and ẽ†Δ̃ẽ = 1. Not only are these results more complicated than for an orthonormal basis, but now g ≠ Δ^-1 and g̃ ≠ Δ̃^-1, in contrast to the previous case. The matrices e and ẽ are no longer normal.

8.2.c Results of Test Calculations — Closed Shell Case

A set of CNDO/2 calculations was carried out to obtain information on the convergence properties of direct energy minimization procedures based on the formalism presented in sections 8.2.a and 8.2.b. The calculations were carried out on an IBM 370/168 computer using double precision arithmetic.⁴ In all calculations, the convergence criteria imposed were |δE| < 10⁻¹² a.u. and |δR_ij| < 10⁻⁶ per iteration. The numbers of iterations required to satisfy both these criteria are given in Table 8.1 for selected calculations. In practice, a single iteration in addition to those indicated in the table is required in each case to verify that the convergence criteria have been satisfied.

The seven molecules chosen are ones for which the Roothaan iteration method can be used with varying degrees of success. Four of them, CH4, HF, LiF, and H2O, present no problems at all.
For two of them, BeO and BN, Roothaan's method is only slowly convergent, and the last one, PN, leads to oscillations between definite charge distributions after about thirty Roothaan iterations. For each of these last three difficult cases, convergence of Roothaan's method will occur, or can be accelerated, if a suitable inter-iteration density matrix averaging procedure is employed.

The variables f_σr were defined by a partitioning between 'occupied' bond and lone pair orbitals, and 'unoccupied' antibond and atomic orbitals. The bond orbitals were non-polar combinations of hybrid atomic orbitals, the hybrid AOs used being far from optimal in some cases (for example, sp³ hybrids on the F atom). In this bond/antibond/lone pair orbital basis, the starting approximation was f = 0. For calculations done directly in the atomic orbital basis, the starting value of f was calculated using eq. (2.3a), where X defines the starting orbitals in the AO basis.

It is seen that in all but a small number of calculations, substantially fewer iterations were required to satisfy the convergence criteria when using the direct minimization methods than when using Roothaan's method. Even in the cases causing difficulty for the Roothaan method, convergence appears straightforward for the direct methods. When variables f_σr defined by an arbitrary partitioning of the AO basis are used, the number of iterations required increases somewhat. Rates of convergence for Fletcher's method and the partitioning method are generally comparable, indicating that the presence of redundant variables in the former has no observable effect on convergence rates. Generally, it was found that the overall rate of convergence depends very little on the accuracy of the step length, as long as some minimal accuracy is maintained.

⁴The parts of the programs involved in calculating the CNDO/2 integrals and core hamiltonian were adapted from the CNDO/2 program of Pople and Beveridge (1970).
Assuming that the construction of the Fock matrix is by far the most costly single step in an SCF calculation, direct energy minimization procedures based on the conjugate gradient algorithms are at least twice as costly per iteration as the Roothaan method. Therefore, even a rather substantial decrease in the number of iterations required for a direct method may not represent a more efficient overall calculation. However, the direct methods do have an advantage of reliability: they can never diverge, if set up appropriately.

With the partitioning defined in the bond/antibond/lone pair basis, the full Newton-Raphson equations converge very rapidly; in none of the seven examples studied are more than five iterations required to satisfy the stringent convergence criteria. In the case of CH4, this rate of convergence can be duplicated using the conjugate gradient technique if the step lengths during the linear search are calculated sufficiently accurately (correct to four figures), but not for H2O. If an arbitrary partitioning is defined in the atomic orbital basis, initial convergence of the Newton-Raphson method is generally very much poorer. For two of the molecules, the calculation actually diverges, while for a third, it converges to a stationary point above the minimum value of the energy.

Because of the expense involved in using the full Newton-Raphson equations, both a diagonal block and a diagonal approximation were tested, these being analogous, respectively, to algorithm FGN, and to algorithms DGN and SDNR, as described in chapter 5. While these approximations represent a very significant reduction in the computation required, the methods are seen to be generally unreliable. Convergence is not only much poorer, but some calculations actually diverge in cases where Roothaan's method converges. The Newton-Raphson equations can, nevertheless, be usefully exploited in other ways, one of which is described and illustrated in section 8.3.c.
TABLE 8.1
Closed Shell Case — Test Calculations^a

Method^c                            CH4   HF    LiF   H2O   BeO   BN     PN
Roothaan                            10    16    17    20    49    >80^b  osc.
Fletcher                  d1        7     5     14    7     18    19     16
                          d3              2     14    4     18    19     16
                          c         7     5     14    9     18    18     16
Partitioning              d1        7     5     15    7     19    19     18
(bond orbital basis)      d3              2     14    5     19    20     15
                          c         7     5     14    9     19    19     16
Partitioning              d1        8     7     20    10    18    19     44
(atomic orbital basis)    d3        4     5     18    10    18    20     43
                          c         5     5     18    10    18    20     43
Partitioning (steepest descents)    5     >40
Newton-Raphson
  Full (B/A basis)                  2     3     4     2     5     5      5
  Full (AO basis)                   3     13^d  4     div.  5     div.
  Block Diagonal (B/A basis)        3     27    7     div.  div.  div.
  Diagonal (B/A basis)              17    11    div.  24    div.  div.   div.

^a Number of iterations required to satisfy |δE| < 10⁻¹² and |δR_ij| < 10⁻⁶ per iteration.
^b Not convergent.
^c Interpolation schemes: d_i = secant formula, applied i times; c = cubic formula.
^d Converged to an excited state.

8.3 Unrestricted Hartree-Fock Theory

8.3.a Energy Derivatives

The formalism developed in the previous section for the closed shell case can be carried over with minor modifications to unrestricted Hartree-Fock calculations. In fact, it is possible, in some sense, to view the resulting formalism as that of two coupled closed shell systems, one for the α-spin electrons, and one for the β-spin electrons. The energy functional is now

    E = tr[(R^α + R^β)h] + ½ tr[R^α G^α + R^β G^β].               (8.28)

The matrices R^α and R^β are the one-particle density matrices referring to the α-spin and β-spin occupied orbitals, respectively. A set of unconstrained, non-redundant variables completely specifying E can be introduced as follows. In the chosen basis, the n_A^α occupied α-spin orbitals are written as the columns of a matrix X^α(A), and similarly, the n_A^β occupied β-spin orbitals as X^β(A). These orbitals will be eigenvectors of the appropriate Fock operators. Now, two different partitionings of the basis set are carried out. In the first case, the basis functions are partitioned into two sets of dimensions n_A^α and n_B^α, spanning spaces S_A^α and S_B^α.
A second partitioning is defined, in which the basis functions are partitioned into two sets of dimensions n_A^β and n_B^β, spanning spaces S_A^β and S_B^β, respectively. As a result, the occupied α-spin and β-spin orbitals can be written in the block form

    X^α(A) = [ X^α_AA ]          X^β(A) = [ X^β_AA ]
             [ X^α_BA ] ,                 [ X^β_BA ] .            (8.29)

It is now possible to define two f-operators, namely,

    f^i = X^i_BA (X^i_AA)^-1,     (i = α, β),                     (8.30)

in terms of the two sets of occupied orbitals. Then one has

    R^i = X^i(A) X^i(A)† = [ (g_A^i)^-1        (g_A^i)^-1 f^i†     ]
                           [ f^i (g_A^i)^-1    f^i (g_A^i)^-1 f^i† ] ,     (8.31)

with

    g_A^i = 1_A + f^i† f^i,     (i = α, β),                       (8.32)

giving the two density matrices, and thus the electronic energy, solely in terms of the n_A^α n_B^α + n_A^β n_B^β elements of f^α and f^β.

That the elements of f^α and f^β are the minimum number of variables necessary to specify the energy, but are not subject to any constraints nor possess any redundancies, can be established in the same way as for the closed shell case. The requirement that the α-spin occupied orbitals be orthonormal, and the redundancy associated with the invariance of the energy, (8.28), to an n_A^α x n_A^α unitary transformation of these orbitals, together eliminate (n_A^α)² of the n_A^α(n_A^α + n_B^α) LCAO coefficients, leaving a number of independent variables equal to the number of elements in f^α. Similarly, orthonormality constraints and redundancy leave only n_A^β n_B^β independent variables in the β-spin occupied orbitals, which is equal to the number of elements in f^β. Orthogonality between the α-spin orbitals and the β-spin orbitals is automatic, due to the orthogonality of the spin parts. The so-called "pairing conditions" sometimes used in the derivation of this different-orbitals-different-spin (DODS) formalism (Rosenberg and Martino, 1975) merely represent a particular choice of some of the redundant variables in the orbital coefficients of the two sets of spin-orbitals, and thus need not be considered in the above arguments concerning the number of degrees of freedom in the problem.

For a variation δR^i in the R^i (i = α, β), the corresponding change δE in E, eq. (8.28), is given exactly as

    δE = tr[δR^α F^α + δR^β F^β] + ½ tr[δR^α δG^α + δR^β δG^β].
                                                                  (8.33)

Here F^α and F^β are the α-spin and β-spin orbital Fock matrices, respectively,

    F^α = h + G^α = h + J(R^α) - K(R^α) + J(R^β),                 (8.34a)

and

    F^β = h + G^β = h + J(R^β) - K(R^β) + J(R^α).                 (8.34b)

The first order part of (8.33) is the sum of two terms of the same form as (8.9a) for the closed shell case. Therefore, one has immediately that

    ∂E/∂f^i_σr = [(1 - R^i)F^i R^i]_σr                            (8.35a)
               = F^i_{(e_B^i)^σ (e_A^i)^r}.                       (8.35b)

Thus, the first derivatives of the energy with respect to the elements of the f^i are again just matrix elements of the corresponding current Fock operator between two sets of contragredient (non-orthonormal) molecular orbitals, which are, respectively, the first n_A^i columns of the density matrix R^i and the last n_B^i columns of the matrix (1 - R^i S), (i = α, β). It is seen that the first derivatives of the energy with respect to the elements of f^α depend on f^β only implicitly, through the dependence of F^α on R^β, and vice versa.

The second derivatives of the energy are given by

    ∂²E/∂f^i*_σr ∂f^i_τs = R^i_sr F^i_{(e_B^i)^σ (e_B^i)^τ} - (1 - R^i S)_στ F^i_{(e_A^i)^s (e_A^i)^r}
                           + [(e_B^i)^σ (e_A^i)^r || (e_A^i)^s (e_B^i)^τ]
                           - [(e_B^i)^σ (e_B^i)^τ || (e_A^i)^s (e_A^i)^r],     (8.36)

and

    ∂²E/∂f^i*_σr ∂f^j_τs = [(e_B^i)^σ (e_A^i)^r || (e_A^j)^s (e_B^j)^τ],     i ≠ j,
    ∂²E/∂f^i_σr ∂f^j_τs  = [(e_B^i)^σ (e_A^i)^r || (e_B^j)^τ (e_A^j)^s],     i ≠ j.     (8.37)

These formulas are different from those in the closed shell case because the coupling between the α-spin and β-spin orbitals is explicit in the second order variation of E with respect to f^α and f^β.

8.3.b Test Calculations and Computational Refinements

A series of minimal basis set (STO) ab initio calculations was carried out on the molecule CN, in order to obtain information on the practical implementation of the UHF-SCF formalism just described. Claxton and Smith (1971) have reported convergence problems in similar calculations. A Roothaan iteration procedure converges very slowly when the interatomic distance is 2.0 a.u., and exhibits oscillatory behaviour, failing to converge, when this distance is increased to 2.2 a.u. (see Figures 8.1 and 8.4).
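Before turning to the computational refinements, the structure of the coupled Fock matrices of eq. (8.34) can be illustrated with a short Python sketch (random stand-in integrals and densities, not the programs used in this work), which makes explicit that the only dependence of F^α on the β-spin variables is through the Coulomb term J(R^β):

```python
import numpy as np

# Assemble F^a = h + J(R^a) - K(R^a) + J(R^b) from a core hamiltonian h and
# a real two-electron integral tensor eri[p,q,r,s] = [pq||rs] with the usual
# 8-fold permutational symmetry.  All quantities are random stand-ins.
rng = np.random.default_rng(2)
n, nocc_a, nocc_b = 4, 2, 1

eri = rng.standard_normal((n,) * 4)
perms = [(0,1,2,3), (1,0,2,3), (0,1,3,2), (1,0,3,2),
         (2,3,0,1), (3,2,0,1), (2,3,1,0), (3,2,1,0)]
eri = sum(eri.transpose(p) for p in perms) / 8.0      # impose integral symmetry

h = rng.standard_normal((n, n)); h = 0.5 * (h + h.T)

def density(nocc):
    X, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal orbitals
    return X[:, :nocc] @ X[:, :nocc].T                # R = X^(A) X^(A)'

def J(R): return np.einsum('pqrs,sr->pq', eri, R)     # Coulomb matrix
def K(R): return np.einsum('prqs,sr->pq', eri, R)     # exchange matrix

Ra, Rb = density(nocc_a), density(nocc_b)
Fa = h + J(Ra) - K(Ra) + J(Rb)                        # eq. (8.34a)
Fb = h + J(Rb) - K(Rb) + J(Ra)                        # eq. (8.34b)
assert np.allclose(Fa, Fa.T) and np.allclose(Fb, Fb.T)  # hermitian (real)
assert np.isclose(np.trace(Ra), nocc_a)               # R^a projects n_A^a orbitals
```

Exchange acts only within a spin, so the inter-spin coupling in the second derivatives (8.37) is of pure Coulomb type.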
When a direct minimization procedure based on Fletcher's method was used, it was found that convergence was rapid at first, but became very slow as the minimum was approached. They concluded that the most efficient procedure was to use the direct method initially, until a good estimate of the energy minimum was obtained, and then complete the calculation using a Roothaan iteration procedure, which converges well when provided with a good starting approximation.

The calculations here were carried out on an IBM 370/168 computer using double precision arithmetic. The integrals in the Slater orbital basis were obtained from a version of the POLYCAL program. Orbital exponents were taken from Clementi (1963). The linear search step in the conjugate gradient algorithm was required to reduce dE/dλ by a factor ε compared to its value at λ = 0, and ε was usually chosen as 0.1. The starting approximation in all but one case was equivalent to the eigenvectors of the core hamiltonian.

It was found that the convergence of the direct minimization calculations based on the partitioning formalism was very poor if the f^i_σr were defined by an arbitrary partitioning of the atomic orbital basis. Convergence improves greatly if they are defined by the partitioning of a set of molecular orbitals, X_0, which more nearly block diagonalize the Fock operator. In practice, this involves evaluating the energy gradient and f-operator in the new basis, that is,

    ∇_{f^i}E^MO = (X_0 e_B^i)† F_i^AO (X_0 e_A^i),                (8.38)

the calculation requiring less computation if the quantities in the brackets are evaluated first, and then the back-transformation of the density matrix as calculated from the MO basis f-operator using (8.31),

    (R^i)^AO = X_0 (R^i)^MO X_0†,                                 (8.39)

if X_0† S X_0 = 1. No transformation
of the two-electron integrals is necessary.⁵ The Fletcher and Roothaan calculations were done in the original atomic orbital basis.

⁵The transformation to the MO basis has an additional advantage when working in a non-orthonormal AO basis, because if the new basis vectors satisfy X_0† S X_0 = 1, then the energy gradient formulas applying in an orthonormal basis can be used, since S^MO = 1. This partially, if not completely, offsets the additional cost of the transformations in (8.38) and (8.39).

Tables 8.2 and 8.3 summarize the results of sixteen different calculations done here. The relative rates of convergence of some of the methods and refinements are also illustrated in Figures 8.1 - 8.3 for the CN molecule with a bond length of 2.0 a.u., and in Figures 8.4 - 8.6 for a bond length of 2.2 a.u. The energy range in Figures 8.1 and 8.4 is larger by a ratio of 400:15 than that in the other four figures.

On comparing the results of the calculations involving Fletcher's method (2, 4, 5) to those based on the use of the f^i_σr (6, 7), it is seen that not only do both methods converge poorly near the energy minimum, but that Fletcher's method actually slightly outperforms the method based on the partitioning formalism. This is also seen in Figures 8.1 and 8.4.

A number of modifications of the basic method based on the use of the f^i_σr were examined. Slow rates of convergence near the minimum imply significant linear dependence between successive search directions in the conjugate gradient calculation. Simply restarting the calculation with a steepest descent direction more frequently resulted in no improvement (Figures 8.2 and 8.5). However, a major increase in the rate of convergence was obtained when the basis, X_0, in which the partitioning was defined, was replaced by the eigenbasis of the current Fock

TABLE 8.2
Details of Direct Minimization Calculations
CN Molecule (r = 2.0 a.u.)
      Type^a  alg.^b  ε      modifications       Final Energy^f   rank^m
                             c^c   d^d   e^e     (a.u.)
 1    R                                          -112.691216^g    16
 2    Fl     c.g.                                -112.835540^h    10
 3    R^i                                        -112.845722       3
 4    Fl     c.g.    0.1                         -112.841230       8
 5    Fl     c.g.    0.01                        -112.840916       9
 6    P      c.g.    0.1                         -112.818824      14
 7    P      c.g.    0.01                        -112.822130      13
 8    P      c.g.    0.1     3                   -112.805275      15
 9    P      c.g.    0.1     3     x             -112.845708       4
10    P      c.g.    0.1                 x^k     -112.845365       6
11    P      c.g.    0.1     3           x       -112.845418       5
12    P      c.g.    0.1     3     x     x       -112.845722^j     1
13    P      s.d.    0.1                         -112.824592      12
14    P      s.d.    0.1     3^l                 -112.845121       7
15    P      s.d.    0.1                 x^k     -112.832094      11
16    P      s.d.    0.1     3^l         x       -112.845722       2

^a R = Roothaan, Fl = Fletcher, P = Partitioning.
^b c.g. = conjugate gradient, s.d. = steepest descent.
^c Steepest descent restart frequency.
^d Basis update at steepest descent restart.
^e Gradient scaling in effect.
^f After 30 iterations unless otherwise noted; the exact energy is -112.845722 a.u.
^g 28 iterations.
^h 29 iterations.
^i Uses the final result from calculation #2 as the starting approximation.
^j Convergence criteria |δE| < 10⁻¹², |δR_ij| < 10⁻⁶ satisfied in 25 iterations.
^k Using eigenvalues of the core hamiltonian.
^l Indicates the frequency of basis modification.
^m Indicates the order of the final energies, from lowest to highest.

FIGURE 8.1  Total electronic energy as a function of iteration number for the CN molecule (bond length = 2.0 a.u.): (1) Roothaan; (2) partitioning, steepest descent search directions only; (3) partitioning, conjugate gradients; (4) Fletcher, conjugate gradients; (5) partitioning, conjugate gradients with gradient scaling and basis update with steepest descent restart every 3 iterations. In all direct minimization calculations, ε = 0.1 (see Table 8.2).

FIGURE 8.2  Total electronic energy as a function of iteration number for the CN molecule (bond length = 2.0 a.u.). Comparison of the effect of various modifications on the conjugate gradient algorithm, partitioning approach only: (1) steepest descent restart every 3 iterations only; (2) basic conjugate gradient algorithm;
(3) gradient scaling only; (4) gradient scaling and steepest descent restart every 3 iterations; (5) steepest descent restart every 3 iterations with basis update at restart; (6) gradient scaling, steepest descent restart every 3 iterations with basis update at restart (see Table 8.2).

FIGURE 8.3  Total electronic energy as a function of iteration number for the CN molecule (bond length = 2.0 a.u.). Comparison of the effect of various modifications on the steepest descent algorithm, partitioning approach only: (1) steepest descent algorithm only; (2) basic conjugate gradient algorithm; (3) steepest descents with gradient scaling only; (4) steepest descents with basis update every 3 iterations; (5) steepest descents with gradient scaling and basis update every 3 iterations; (6) conjugate gradients with gradient scaling, steepest descent restart every 3 iterations with basis update at restart (see Table 8.2).

operators at the time of the steepest descent restart, as was done for the calculations numbered 9 in the tables. The partitioning operators are set to zero when the new basis is incorporated into the calculation, and therefore, this basis modification is equivalent to a single Roothaan iteration.

8.3.c Use of Scaled Variables

A second modification of the basic algorithm, which results in a major improvement in convergence, is suggested by the Newton-Raphson equations for determining the zeros of the energy gradient (see Appendix 11). Upon neglecting the two-electron integrals in eqs. (8.36) and (8.37), and in a basis diagonalizing the current Fock operators, it is seen that

    ∂²E/∂f^i_σr ∂f^i*_σr ≈ ε^i_σ - ε^i_r.                         (8.40)

Thus, the diagonal approximation of the Newton-Raphson equations can be written

    δf^i_σr = -λ (ε^i_σ - ε^i_r)^-1 ∂E/∂f^i_σr,                   (8.41)

where λ is some constant independent of σ and r. If the minimization problem is rewritten in terms of a new set of variables,

    f̃^i_σr = (ε^i_σ - ε^i_r)^{1/2} f^i_σr,                        (8.42)
f Final Energy rank1 alg» a d e a.u. 1 R osc • ^ 16: 2 Pl e.g. -110.991333h 8 3 R: -111.0129141 2 4 Fl e.g. 0.1 -110.995595 5 5 Fl c.g» 0.01 -110.994936 6 € P c>g. 0.1 -110.976114 13 7 P eg. 0.01 -110.977213 12 ' 8 P c.g. oa 3 -110.975463 14 9 P e.g. 0.1 3 X -111.011872 3 10 P eg* 0.1 -110.981282 11 11 P e.g. 0.1 3 X -110.984485 9 12 P C.gv 0.1 3 X x: -111.012980 1 13 P S .d. 0.1 -110.951620 15 14 P s»d., 0.1 3* -110.992565 7 15 P s.d. 0.1 -110.981113 10 16 P s.d. 0.1 3* X: -111.010863 4 a RsROothaan, Fl»Fletcher, P*Partitioning •Li, c.g. = conjugate gradient, s.d. * steepest descent. °steepest descent restart frequency. d basis update at steepest descent restart.. egradient scaling in effect. after 30 iterations unless otherwise noted, exact energy is -111.012980 a.u. ®2Q iterations ^9 iterations *uses final result from calculation #2 as starting approximation. ^using eigenvalues of core hamiltonian. v indicates frequency of basis modification, ^indicates the order of the final energies from.lowest to highest. ' I ITERATIONS T 1 T- 1 r r * 10 I* 20 2S 3» FIGURE 8.4 Total electronic energy as a function) of iteration number for the CN molecule, (bond length « 2,2 a«u.)» (1) Roothaani (2) partitioning,, steepest descent search directions onlyi (3) partitioning, conjugate gradients 1 (4) Fletcher, conjugate gradients, (5) partitioning, conjugate gradients with gradient scaling and basis update with steepest descent restart every 3 iterations. In all direct minimization calculations,, € » 0,1 (see Table 8,3). 
FIGURE 8.5  Total electronic energy as a function of iteration number for the CN molecule (bond length = 2.2 a.u.). Comparison of the effect of various modifications on the conjugate gradient algorithm, partitioning approach only: (1) steepest descent restart every 3 iterations only; (2) basic conjugate gradient algorithm; (3) gradient scaling only; (4) gradient scaling and steepest descent restart every 3 iterations; (5) steepest descent restart every 3 iterations with basis update at restart; (6) gradient scaling, steepest descent restart every 3 iterations with basis update at restart (see Table 8.3).

FIGURE 8.6  Total electronic energy as a function of iteration number for the CN molecule (bond length = 2.2 a.u.). Comparison of the effect of various modifications on the steepest descent algorithm, partitioning approach only: (1) basic steepest descent algorithm; (2) basic conjugate gradient algorithm; (3) steepest descents with gradient scaling only; (4) steepest descents with basis update every 3 iterations; (5) steepest descents with gradient scaling and basis update every 3 iterations; (6) conjugate gradients with gradient scaling, steepest descent restart every 3 iterations with basis update at restart (see Table 8.3).
6f^r as a simple step (the same for all o and r) along the steepest direction* This modification can easily be incorporated into the ordinary conjugate gradient formalism by scaling the energy gradient, Them the appropriate correction for the unsealed variabiles is where v is the conjugate search direction, computed from the scaled gradients, and X is the step length computed by inter polation in the usual manner* This gradient scaling (or the **i implicit use of the scaled variables f£r) is aimed at correcting the problems in descent methods caused by anisotropy in the curvature of the energy surface*. In practice, the numbers used are the best available estimates of the eigenvalues of the Fock operator at any stage of the calculation* Initially, any suitable estimate may be used (for example, orbital energies from a semi-empirical calculation of some sort, or even the eigenvalues of the core hamiltonian, as was done for the cal culations described in Tables 8*2 and 8*3)* This scaling procedure has no simple counterpart for Fletcher's method* The calculations numbered 10 were done by incorporating only this scaling procedure into the basic conjugate gradient (8.43) of1 * (61 - €i)"*6fi « XCC1 - €l)~*v or v a r or v o r/ < or (8.44) 288. algorithmresulting im a substantial improvement in the rate of convergence. Increasing the steepest descent restart frequency to every three iterations (compared with 45 as recommended by Fletcher and Reeves, (1964))* resulted in a further small Improvement* However, when the molecular orbital basis defining the partitioning is replaced by the eigenbasis of the current Fock matrix at the steepest descent restart, the most rapidly convergent algorithm, resulted* For the 2.0 a.u. interatomic distance* the energy became correct to fifteen figures (effectively the limit of the machine precision), and the diagonal elements of the density matrices to seven figures, in; only 25 iterations. 
Nearly the same results were obtained for the 2.2 a.u. bond length.

The test calculations described above support the assertion that the singularity of the Hessian matrix at the energy minimum has no observable effect on the rate of convergence of the conjugate gradient algorithm.⁶ Rather, they indicate that much of the poor convergence is due to the fact that the energy curvature is highly anisotropic in general, and the usual conjugate gradient algorithm does not take proper account of this. A single average step length along the descent directions generated tends to overestimate the necessary correction for some variables and underestimate it for the others. This is a well known shortcoming of steepest descent procedures also. In fact, the steepest descent algorithm employing a cubic interpolation linear search does not converge much more slowly than the conjugate gradient algorithm. It appears that in application to direct minimization self-consistent field theory, the finite termination property of the conjugate gradient method is of little advantage, since, even for the smallest systems, this finite termination in principle requires considerably more iterations than are acceptable if efficient calculations are to result.

⁶ For a quadratic form, it is easily demonstrated that a singular Hessian matrix has no effect on the convergence properties of the conjugate gradient algorithm, except that the minimum will be located in fewer iterations, since no linear search is required in directions corresponding to those along which the form has zero curvature. In a converging energy minimization calculation, the part of the coordinate space corresponding to the redundant variables should effectively act as a null space as far as the choice of search directions is concerned. This is especially true near a minimum, where the energy is most like a quadratic form.
No attempt was made in this series of calculations to determine an optimal restart frequency. The rate of convergence is usually greatest either in an iteration involving a restart, or in the one immediately following, and therefore, it is unlikely that an interval between restarts of much more than three iterations will result in faster overall convergence. Despite such frequent steepest descent restarts, there still appears to be some advantage to using the conjugate gradient search directions, as can be seen on comparison of calculations 13 - 16, respectively, with calculations 6, 9, 10, and 12. The calculation of these search directions from the steepest directions is a small part of the whole calculation. Even so, the steepest directions by themselves give remarkably good results here (see Figures 8.3 and 8.6).

8.4 Theory for the General Single Determinant Case

8.4.a The Basic Variables of the Calculation

We now consider the case of an N-electron system, represented by a single determinant wavefunction constructed from occupied orbitals for which there is a natural grouping into m sets, called shells, which are relatively weakly coupled by the hamiltonian operator. The n_I occupied orbitals, X_I^(i), associated with the ith shell, are chosen from a set of n orbitals X^(i), which are eigenfunctions of the Fock operator, F^(i), referring to the ith shell. The total energy of the system, eq. (8.1), is then completely determined by the projections (one-particle density matrices in molecular orbital theory),

R^(i) = X_I^(i) X_I^(i)†,    (i = 1, ..., m),    (8.45)

onto the individual n_I-dimensional subspaces of the full n-dimensional basis space, each subspace spanned by one of these sets of occupied orbitals.
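The defining properties of these shell projections, stated below as eqs. (8.46) and (8.47), can be verified numerically. The following sketch uses assumed random data; all variable names are hypothetical, and real-symmetric matrices stand in for hermitian ones.

```python
import numpy as np

# Build two "shells" of S-orthonormal occupied orbitals and check the
# properties of the projections R(i) = X_I X_I^dagger.
rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)            # a positive-definite overlap matrix

C = rng.standard_normal((n, 4))        # four trial occupied orbitals
L = np.linalg.cholesky(C.T @ S @ C)    # S-orthonormalize so that X^T S X = 1
X = C @ np.linalg.inv(L).T
X1, X2 = X[:, :2], X[:, 2:]            # split the set into two shells

R1, R2 = X1 @ X1.T, X2 @ X2.T          # projections, eq. (8.45)
assert np.allclose(R1, R1.T)           # hermiticity, eq. (8.46a)
assert np.allclose(R1 @ S @ R1, R1)    # R(i) S R(i) = R(i), eq. (8.46b)
assert np.allclose(R1 @ S @ R2, 0)     # inter-shell orthogonality
```

The Cholesky factor of C^T S C is one convenient way to impose the orthonormality condition X^T S X = 1 on an arbitrary trial set.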
It will be shown that the columns of these projections and their complements again provide non-orthonormal basis vectors, in terms of which the first and second derivatives of the energy, (8.1), with respect to a set of variables provided by the multi-partitioning formalism of chapter 4, can be written effectively as compactly as in the closed shell case. If the simple form of the energy, (8.1), is to be preserved, these projections must satisfy the constraints

R^(i)† = R^(i),    (8.46a)

and

R^(i) S R^(j) = R^(i) δ_ij,    (8.46b)

or, equivalently, the occupied orbitals must satisfy the orthonormality conditions,

X_I^(i)† S X_J^(j) = δ_ij 1_I.    (8.47)

Here S is the matrix of overlap integrals of the fixed basis functions in terms of which the X^(i) and R^(i) are defined. The number of independent parameters necessary to specify the energy exactly is determined as follows. The total number of parameters in the lcao coefficients, X_I^(i), (i = 1, ..., m), is n Σ_{I=1}^m n_I, where n = Σ_{I=1}^{m+1} n_I is the dimension of the full basis space. Within each set X_I^(i), the orthonormality constraint, and the redundancy due to the invariance of the energy, (8.1), to an arbitrary unitary transformation of orbitals in the same shell, together account for n_I^2 parameters. The orthogonality of orbitals in different shells is expressed by Σ_{I=2}^m n_I Σ_{J=1}^{I-1} n_J unique conditions, making it possible, in principle, to eliminate an equal number of parameters. Thus, the total number of unconstrained and non-redundant parameters required to specify the energy in the form (8.1) is Σ_{I=1}^m n_I(n − Σ_{J=1}^I n_J).

The intra-shell constraints and redundancy can be eliminated by rewriting the energy in terms of a new set of parameters chosen as follows. For each i, (i = 1, ..., m), the n eigenvectors X^(i) of the Fock operator F^(i) are divided into m+1 subsets, X_J^(i), of dimensions n_J, respectively, such that the subset X_I^(i) is the set of occupied ith shell orbitals.
Similarly, the n-dimensional basis space is partitioned into m+1 subspaces S_J, (J = 1, ..., m+1), of the same dimensions, n_J, respectively. The matrices X^(i), (i = 1, ..., m), can now be written in an (m+1) x (m+1) block form, similar to that in eq. (4.2). A set of uncoupling operators, T^(i), is defined, such that (eq. (4.4)),

X^(i) = T^(i) X̂^(i),    (i = 1, ..., m),    (8.48)

where X̂^(i) is the diagonal block part of X^(i), making it possible to write the block columns of interest as

X_JI^(i) = f_JI^(i) X_II^(i),    (J = 1, ..., m+1), (i = 1, ..., m),    (8.49a)

where f_II^(i) = 1_I, and

f_JI^(i) = X_JI^(i) (X_II^(i))^{-1},    (J = 1, ..., m+1).    (8.49b)

That is, we have implicitly set up m (m+1)-fold partitionings, one for each of the X^(i). The parts of each shown in (8.49a) are the only ones which enter the energy expression, (8.1). The f_JI^(i), (J = 1, ..., m+1, J ≠ I), are specified by n_I(n − n_I) complex parameters, which is exactly the number of parameters in the X_I^(i) after intra-shell orthonormality and redundancy have been accounted for. The intershell orthogonality constraints imply Σ_{I=2}^m Σ_{J=1}^{I-1} n_I n_J relations between the elements of the f_JI^(i), (i = 1, ..., m). The explicit incorporation of these relations into the theory will be considered in section 8.4.d.

8.4.b The Energy Variation and First Derivatives

The energy functional will be written here in terms of the R^(i) as

E = Σ_{i=1}^m ν_i tr R^(i) h + ½ Σ_{i=1}^m ν_i tr R^(i) G_i,    (8.50)

where h is the core hamiltonian matrix, representing the electronic kinetic energy, and the interaction between the electrons and the nuclei, and the G_i represent the inter-electronic repulsion terms of the hamiltonian operator. In detail, one has

G_i = Σ_{j=1}^m ν_j G_{i,j}(R^(j)),    (8.51)

where

G_{i,j}(R) = J(R) − a_ij K(R),    (8.52a)

and

a_ij = 1 if ν_i = ν_j = 1,
     = ½ otherwise.    (8.52b)
The matrices J(R) and K(R) are the usual Coulomb and exchange matrices, with elements given by

J(R)_rs = Σ_{t,u} R_ut [rs||ut],    K(R)_rs = Σ_{t,u} R_ut [rt||us],    (8.52c)

where the symbol [rs||ut] is defined in eq. (7.25b). The occupation numbers, ν_i, may have values of 1 or 2 only.

An incremental approach is employed to obtain the derivatives of the energy. A change δR^(i) in R^(i), (i = 1, ..., m), produces a change in the energy given exactly by

δE = Σ_{i=1}^m ν_i tr δR^(i) F^(i) + ½ Σ_{i=1}^m ν_i tr δR^(i) δG_i,    (8.53)

where F^(i) is the Fock matrix associated with the ith shell,

F^(i) = h + G_i.    (8.54)

In the notation established in the previous subsection, the blocks of the projection R^(i) are given by

R_JK^(i) = f_JI^(i) g_I^(i)^{-1} f_KI^(i)†,    (8.55)

where

g_I^(i) = (X_II^(i) X_II^(i)†)^{-1} = S_II + Σ_{L≠I} S_IL f_LI^(i) + Σ_{K≠I} f_KI^(i)† S_KI + Σ_{K,L≠I} f_KI^(i)† S_KL f_LI^(i).    (8.56)

Then, one has

δR_JK^(i) = δf_JI^(i) g_I^(i)^{-1} f_KI^(i)† + f_JI^(i) g_I^(i)^{-1} δf_KI^(i)† + f_JI^(i) δ(g_I^(i)^{-1}) f_KI^(i)† + δf_JI^(i) g_I^(i)^{-1} δf_KI^(i)† + δf_JI^(i) δ(g_I^(i)^{-1}) f_KI^(i)† + f_JI^(i) δ(g_I^(i)^{-1}) δf_KI^(i)† + δf_JI^(i) δ(g_I^(i)^{-1}) δf_KI^(i)†.    (8.57)

To second order, one has

δ(g_I^(i)^{-1}) = −g_I^(i)^{-1} δg_I^(i) g_I^(i)^{-1} + g_I^(i)^{-1} δg_I^(i) g_I^(i)^{-1} δg_I^(i) g_I^(i)^{-1} + O(δ³),    (8.58)

where eq. (8.56) yields

δg_I^(i) = Σ_{L≠I} S_IL δf_LI^(i) + Σ_{K≠I} δf_KI^(i)† S_KI + Σ_{K,L≠I} [δf_KI^(i)† S_KL f_LI^(i) + f_KI^(i)† S_KL δf_LI^(i) + δf_KI^(i)† S_KL δf_LI^(i)].    (8.59)

The calculation of the first derivatives is simplified by noting that terms in δE linear in δf_PI^(i) or δf_PI^(i)†, for a specific value of i, can only enter via a single term of the first summation (over shells) in eq. (8.53). Substituting (8.57) - (8.59) into (8.53), and retaining terms only to first order in the δf_PI^(i) and their adjoints, gives the explicit first order variation of the energy, eq. (8.60a). Thus, on defining a new set of non-orthonormal basis vectors,

(e^(i))^ρ = column ρ of R^(i), ρ ∈ I,    (e^(i))^μ = column μ of (1 − R^(i)S), μ ∈ K, K ≠ I,    (K = 1, ..., m+1), (i = 1, ..., m),    (8.61)

one can write

δE = Σ_{i=1}^m ν_i Σ_{P=1, P≠I}^{m+1} tr[ δf_PI^(i)† (e_P^(i))† F^(i) e_I^(i) ] + hermitian conjugate,    (8.60b)

where e_I^(i) and e_P^(i) denote the blocks of columns of R^(i) and (1 − R^(i)S) defined in (8.61). From this, one obtains

∂E/∂(f_PI^(i))_μρ = ν_i F^(i)_{(e^(i))^μ (e^(i))^ρ},    (μ ∈ P, ρ ∈ I),    (8.62)

as the formal first derivatives of the energy with respect to the elements of the f_PI^(i) and their hermitian conjugates.
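The contractions defining the Coulomb and exchange matrices in eq. (8.52c) are straightforward to express numerically. The sketch below uses random data in place of real two-electron integrals; the four-index array `eri` and the index labelling eri[r, s, u, t] standing for [rs||ut] are assumptions made for illustration only.

```python
import numpy as np

# Made-up stand-ins for the two-electron integrals and a density matrix.
rng = np.random.default_rng(1)
n = 4
eri = rng.standard_normal((n, n, n, n))   # eri[r, s, u, t] <-> [rs||ut] (assumed)
R = rng.standard_normal((n, n))
R = R + R.T                               # a (real) hermitian density matrix

J = np.einsum('ut,rsut->rs', R, eri)      # J(R)_rs = sum_{t,u} R_ut [rs||ut]
K = np.einsum('ut,rtus->rs', R, eri)      # K(R)_rs = sum_{t,u} R_ut [rt||us]

# The same sums written as explicit loops, for comparison.
J2 = np.zeros((n, n))
K2 = np.zeros((n, n))
for r in range(n):
    for s in range(n):
        for t in range(n):
            for u in range(n):
                J2[r, s] += R[u, t] * eri[r, s, u, t]
                K2[r, s] += R[u, t] * eri[r, t, u, s]
assert np.allclose(J, J2) and np.allclose(K, K2)
```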
As in the simpler closed shell case, the first derivatives of the energy are matrix elements of the appropriate current Fock operator in a basis of non-orthonormal molecular orbitals 297;. which are columns of the corresponding projection Rv/ and the complementary matrix (l-R^S). The metric properties of the oasis vectors, (8,6l)„ are examined in Appendix 13. 8.4.c The Second Derivatives The second derivatives of the energy are obtained in a straightforward, but somewhat tedious, manner, by isolating the second order terms in (8,53)• These terms consist of two types. The first arises from the trace over the product of second order variations in the projections R^ and the corresponding Fock (i) operator Fx , while the second arises from the terms of the form tr 6R^^6G^, which contain products of first order varia tions of the density matrices. Consider first the simple term This equation can be viewed as representing linear transforma tions on the basis functions in terms of which the two-electron integrals are evaluated. The summations over r,s can be treated independently of those over t and u, above. To first order, one has (8.64) J,K»1 r€K S€JT,' E (6R «JK 'sr ) <rlls> 298, -2 ,ip(6f« X riKC6--Sf-)^iMf-i>ts^ /I v€I s€J xc[g<i)-14J)t]vr<r||s> P=l u€I ri »*V J,K«1 r€K DA 1 8,1 /I v€P s€J «• [6KP-^wii,4i)"14i,t]vr<HiB> -"E1 £ {•f«J»)B¥<(^l>)"||(41»)'S. 'PiSS " (8.65) *"z £ (.fjl)*) M<(.«1>)»||(.Ji))<,». **I v€P Here, the notation <r||s> is to indicate symbolically only that the basis function #r enters the expression antilinearly, while 0_ enters it linearly. It is not meant to imply that matrix s elements of the type [rs||ut], given by (7»25b), can be written as the product of two simpler matrix elements. 
Combining (8,65) with the corresponding result for the sum over indices t*u, in the original expression: (8,64), then leads toathe result /1\ m+1 m+1 m+1 tr 6Rvi'6G, 3 E E E E E x j=l Pal Q»l u6P a€J /I /J v€I 0€Q tif^>t)nit6*<«*)a,[ct.<»)''C.<1>>»||t^.>>»£41>)«] -aijf<4i)>|1(43))alh^))^41))v]} 299. + (64^t)>(1(ef(|>)ea{[(e^))"(e<i))1,||(e(J))a(e(^)»] -aiJf(e(i»)"(e<^)»||(e<i»)«(e(i>)»]} •<«4i)V»<S4l',t)aB[t(e<i))V(e<1>)t'||(e^))»(e^))n -aijC(41>>v(4J>>0ii<e8i)»s<4i))|1]} •<«^)V,<«^))u[[<41>>v««p1)>i*lk4i>>"<»Q,)>'3 -a1JFci1)»,<4i)l,lk4J))"f4i)>l,3}V (8.66) Each of the four terms here consists of a Coulomb and exchange integral combination, evaluated in the particular non-orthonormal molecular orbital basis given by (8.6l). The contributions of the second term in (8.53) to the second derivatives of E are easily obtained from (8.66). Consider now the first term of eq. (8.53). A considerable amount of algebraic manipulation is required to obtain the second order terms in compact form. The final result is tr 6(2>ieti)r(i) » _tr "J1 •f<J)t(M<i>)PI»«i«V1Jn m /I Vi II PI II QI J 300 - tr. Z 6fp^ R:(I)S)IQ64^F(ILv m . /I (8.67) This equation contains no terms involving products of matrices 6fPI^ W1<th di:ff'eren't values of i. Thus, the first term of (8*53) gives contributions only to second derivatives of the energy with respect to variables referring to the same shell* The complete second derivatives of the energy can now be written down by combining eqs, (8,66) and (8,67)» and iheorpora ting constant factorsand occupation numbers where indicated by (8,53), In all, there are only six different formulas (of which two pairs are complex conjugates of each other)* d2E -v. l d(fQI /. 
In all these formulas, the convention μ ∈ P; ν, ρ ∈ I; α ∈ Q; and γ ∈ J is implied.

8.4.d Incorporation of the Intershell Orthogonality Constraints

The intershell orthogonality constraints on the lcao coefficients are given by eq. (8.47). The equivalent expressions in terms of the f_PI^(i) are analogous to eqs. (4.58). Only half of these equations are unique, the other half being their adjoints. There are two ways to incorporate these constraints into the theory. The constraint equations, (8.69), can be used to explicitly eliminate an appropriate number of elements of the f_PI^(i) occurring in the energy functional. The derivatives of the energy with respect to the remaining unconstrained variables are then obtained from eqs. (8.62) by a simple application of the chain rule. The advantage of using this method to handle the intershell constraints is that the energy and its derivatives are then expressed in terms of a minimum number of unconstrained and non-redundant variables. The resulting formalism is suitable for use with true minimization techniques, such as the conjugate gradient method with the variations discussed previously (section 8.3.b). Such procedures would be reliable, since divergence could not occur, and efficient, as long as the number of shells is small. As the number of shells increases, the intershell constraint equations become considerably more complicated (see Appendix 2), and the additional cost of calculating
the energy derivatives with respect to the independent variables may soon offset the other advantages. Another problem here is that the elimination procedure is not easily automated for use with an arbitrary number of shells. This approach is illustrated in detail in section 8.4.e for a two shell system.

A second approach to incorporating the intershell constraints into the theory is to consider the Σ_{I=2}^m n_I Σ_{J=1}^{I-1} n_J unique constraint equations, (8.69), together with the Σ_{I=1}^m n_I Σ_{J=I+1}^{m+1} n_J independent equations expressing the vanishing of an appropriate set of the same number of first derivatives of the energy, as a system of Σ_{I=1}^m n_I(n − n_I) simultaneous nonlinear equations for the elements of the f_PI^(i), (P = 1, ..., m+1, P ≠ I; i = 1, ..., m). The derivatives of the constraint functions g_JI, eq. (8.69), with respect to the elements of the f_ML^(l) and their adjoints are given by eqs. (8.70a) and (8.70b). With these formulas, and eqs. (8.62), the Jacobian matrix for the complete system can be constructed, and the f_PI^(i) can be determined iteratively, using one of several methods (for example, the Newton-Raphson equations).

The advantage of using this approach is that the energy derivatives, (8.62), and the derivatives, (8.70), of the constraints, can be used without further modification, and can be calculated automatically for systems involving an arbitrary number of shells. The calculation now involves twice as many variables as in the first approach. The large number of variables may preclude the use of the full Newton-Raphson equations, making it necessary to develop linearly convergent approximations to them which are more efficient overall, much as was done in chapter 5 in a different context.⁷ These methods are not descent methods, and therefore, will not necessarily yield an energy minimum at all times. Nevertheless, for systems involving a large number of shells, this would appear to be the approach of choice.
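The combined constraint-plus-stationarity system described above is, generically, a set of simultaneous nonlinear equations attacked with Newton-Raphson iteration. A minimal generic sketch on a toy two-variable system follows; the function `g` and its Jacobian here are arbitrary stand-ins, not the SCF equations themselves.

```python
import numpy as np

def newton_system(g, jac, x0, iters=20, tol=1e-12):
    # Full Newton-Raphson for a system of simultaneous nonlinear
    # equations g(x) = 0: at each step solve  jac(x) dx = -g(x).
    x = x0.astype(float).copy()
    for _ in range(iters):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        x = x - np.linalg.solve(jac(x), gx)
    return x

# Toy system standing in for constraints plus gradient conditions:
#   x0^2 + x1 - 3 = 0,   x0 - x1 - 1 = 0.
g = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] - x[1] - 1.0])
jac = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, -1.0]])
root = newton_system(g, jac, np.array([1.0, 1.0]))
assert np.allclose(g(root), 0)
```

The quadratic convergence of this scheme is what motivates the Newton-Raphson proposal; the text's caveat is that forming the exact Jacobian (here two lines, in the SCF case a transformed two-electron integral list) dominates the cost.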
⁷ The situation is admittedly greatly complicated here by the presence of the large number of two-electron integrals (which must be transformed to a new molecular orbital basis in each iteration) entering the second derivatives of the energy. They would have to be partially or totally neglected, or else approximated in some manner, if a computationally efficient algorithm based on the Newton-Raphson equations is to result. An alternative procedure would be to use a method not requiring these second derivatives (for example, a generalization of the secant method for the solution of a single nonlinear equation). No information on the performance of such methods has yet been obtained, however.

8.4.e Example -- The Two Shell System

As an illustration of the general formalism just described, formulas applicable to a two shell system are given here explicitly. A 3 x 3 partitioning must be used in this case. The variables entering the calculation are

f_21^(1), f_31^(1), f_12^(2), f_32^(2),    (8.71)

where the occupied orbitals for the two shells are written
£(l)v(D (2) J(2) (2) K2 l2 A22 ' (8.72) The projection operators onto these two occupied spaces are given explicitly by XD g(1>  l -1 J21 61 *31 -1 -1 1 ,21 ADM)'1 f(Dt J21 gl *21 ADM)'1 f(Dt ?31 gl 21 M)mlAl)i gl A31 f(l)ff(l)"1f(l)t *21 gl x31 f(l)_(l)"1f(l)t J31 gl 31 (8.73) and ,(2) A2) J2)"1f(2)t ri12 g2 14 gg 12 (2)""1f(2)t r12 A2) J2)"1f(2)t J32 g2 I12 fX2)_(2)-1  x12 g2 J2)"1 g£ f(2) (2)-1  z32 g2 f(2) J2)-Ij(2)t E12 g2 *32 J2)-If(2)f g2 J32 f(2) (2)-1f(2)t x32 g2 x32 (8.74) 306o lm an orthonormal basis, one has, J1) * 1 + f(l)tf(D + f(Dtf(l) gl * Xl + f21 f21 + f3i f31 * (8.75a) and «2 *2 + f12 f12 * f32 f32 # (8.75fc) If the basis is non-orthonormal, explicit formulas for the g£ are considerably lengthier,, for example, gl Sll*S12f21 S13f31 21 s21*r31 31 21 b22r2 u21 +f.(Dtq *(ihf(i)ts f(i)+f(i)ts f(D ,*21 ^3 31 31 32r21 +r31 33 31 * .(2) (8.76) and similarly for gj?*"* For an orthonormal fixed basis, the nora-orthonormal contra gredient molecular orbital basis, (8«6l), in terms of which the energy derivatives can be written very compactly, are given by S(D = R<D e Al .(I) c21 r(D c31 g (1) -1 e2 (1 -R<1J) ,2 1 - f(1)g(1) x2 r21 gl J31 gl 21 -1 ,(l)t r21 (8.77a) 307. and -4irl4i}t  f(i)ji)*lf(i)f x21 gl z31 1 - f(-)f(l)"lf(l)t A3 31 sl 04 "31 -(2) The expressions for the e£ ' are analogous, *<2> - (l.R(2))fl, e2 R(2) ~(2) K,2 '•' e3 * (1-R(2)) ,3' (8.77 c;) For a non-orthonormal fixed basis, explicit expressions for the; %^ im terms of S and the fp^ and g^ , (i = 1»2), are con siderably lengthier. In the course of a calculation, the current projections R^1^ and R^2^ would always be known, so that the e^1^ would be obtained directly from formulas like (8.77b), rather than being evaluated using formulas like (8.77a). The vectors dual to the e ,(l)t 1 (1) 21 (1) 31 -f, 21 lo -f are given by (1)T 31 0 (8.78) The scalar products of these vectors and the metric matrices with respect to which they are orthonormal are given by 2(1) . 
^(Dtjyd); a *<l)fc(l)t g n e 308 and! 4° -1 12 r21 gl J21 r21 gl x31 "f3l gi *2i x3 31 gl 31 ^(1) * £<1>V1} • e(1)e(1)t , (8.79) (1) g(1)  l . + f(l)f(Dt i2 + i21 r21 ,(l)f(l)t r31 r21 ADADt x2i A3l , + f.(Df(l)t 3 31 31 (8.80) with similar results for e/2^ and e^2\ Similar, but lengthier, results are obtained in a non-orthonormal fixed basis, but, in that case, g(i) / A(i\ and g(i) / A(i), (i • lf2), and thus, the number of formulas doubles. The similarities between eqs. (8.77) - (8.80), and the results given in section 2.1.d, appli cable to a single shell system, are easily seen. The formal first derivatives of the energy are SE *<f2r>or a(f &E 31 'ar SE Jlf12 ;ro Vx^e<x>)«(e<x>)r B . « F(2> (8.81) 309 and as w(2) Formal second derivatives can be written down explicitly from eqs. (8.68). Even in this simple case, there are thirty-two different explicit second derivative formulas (E depends OK f21^» f31^» f12^ and f32^' and tneiir ad joints) neglecting those which are complex conjugates. There is only one intershell constraint equation in this case. In an orthonormal basis, it is *122) - f<|> • f<2)t •• t™'t™ - 0. (8.82) This equation is easily used to obtain (2) (I) ll) (2) giving fjg in "terms of f^l r *31 •' and f32 • wnose elements can be used as a set of unconstrained and non-redundant variables, in terms of which the energy may be minimized. Equation (8.83) is unusually simple. For the next simplest case, a three shell system, there are three intershell constraint equations, which, while similar to (8.82), cannot be used to obtain three "dependent" blocks, fpj^r ire terms of the remaining six "independent" blocks without introduction of an inverse matrix (see Appendix 2).. In fact,, for a two shell system, when the fixed basis is non-orthonormal, the intershell constraint becomes -(12) . (jCDtgftt)) g12 = {L1 bl2 )12 310. 
" Sllf12) + S12 * S13f32) + f21)t(S21f12)'fS22+S23f32)) • 4l)t(S31f12)+S32+S33f32)) * 0# (8#84) from which one obtains '« • - £sn «li)ts2i• WSJ'1 <8>85a) x L^12*T21 b22+131 32 ^13 31 23 r31 33' 32 J = -A"1B. (8.85b) Not only is this expression considerably lengthier than (8.83), hut the presence of the inverse matrix complicates the applica tion of the chain rule, and leads to more complicated formulas for the energy derivatives with respect to the remaining indep endent variables. From (8.83), one obtains ii£li!2£j£ « K & d(f12))op _ . ,A2), arf(l)x " "V6«V wf(l)«/ 6ov<f32 Vp» dU21 'uv *ir31 'fiv (2)x (8.86) 12 1 op m . /^(D~x au31 Vv Combining these with eqs. (8.81) then yields. SB B F(l) „ F(2) Ht^\r ^.<lV(.<l>>* 2F(e<2>)°(e<2>)r f '^^(e^ -v2p2)FJ)tEU)]( SE _ „ p(l) 3(fU)*) V'F *lx31 ;ar x~3 ' ' ~| ^" •.£*" *e:~' lar (8.8?) 311. andi , x, w(2) u [Z(i)v{z) "1 which require little additional work once the derivatives in (8.81) are known. For a n©ni-orthonorraal basis, eqs. (8.85) can be used to obtain, a / 2).;: \ 0KI12 ;ro s .-lrc. A-lw q <, -(2)-, ArsL&21A n ~ ^22 • &23x32 -Vo* dU21 Vs d(f(2J) wJl)»r * Ars--S3lA"lB * s32 - S33f32>\o • (8'88)  dU31 'as and d(fi2))ro _ . rA-i/^Dtc * f(Dtq -n 77-^27; s "6orLA u21 b23 *31 33'Jra» dvr32 'af which lead to +v2[(S21A"1B - S22 - S23:f32))Pj2)t (2)A"xl.s* e2 el 31 as 3 1 2 1 312* and (2) (.<2)>a(.<2)f (8.89) Both sets of equations, (8.87) and (8.89)». are suitable for use with gradient minimization;algorithms. Second derivatives of E with respect to elements of fgi^» f3i^» and *32^* and "t^16*1, adjoints are obtained in a similar way.. The greater complexity of the formulas (8.89) compared to those in (8.8?) is not of much concern here, since in an actual calculation, one would expect to carry out the energy minimization in a molecular orbital basis in which S « 1 (see section 8.3.b)>. 
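The elimination of the dependent block in eq. (8.83) can be verified mechanically: with f_12^(2) set to −(f_21^(1)† + f_31^(1)† f_32^(2)), the two occupied-orbital column blocks of eq. (8.72) become orthogonal, which is exactly the constraint (8.82). A sketch with random real matrices (transposes standing in for adjoints; the block dimensions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3 = 2, 2, 3            # dimensions of the three partitioned subspaces
f21 = rng.standard_normal((n2, n1))
f31 = rng.standard_normal((n3, n1))
f32 = rng.standard_normal((n3, n2))

# Eq. (8.83): the dependent block, in an orthonormal fixed basis.
f12 = -(f21.T + f31.T @ f32)

# Occupied-orbital block columns of eq. (8.72), up to the nonsingular
# right factors X_II, which do not affect orthogonality of the spans.
X1 = np.vstack([np.eye(n1), f21, f31])
X2 = np.vstack([f12, np.eye(n2), f32])

# Intershell constraint (8.82): X1^dagger X2 = f12 + f21^dagger + f31^dagger f32 = 0.
assert np.allclose(X1.T @ X2, 0)
```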
In this case,, the stationary points of the energy can also be determined by solving the system of 2rcln2 • n^n^ + general complex) simultaneous nonlinear equations given by P s (8.89) 0 and * 0. 313. BIBLIOGRAPHY: Claxton, T.. A*.and Smith, N, A., Theoret. Chim. Acta. 22, (1971). 399. Clementi, E*., J. Chem. Phys., 38, (1963). 2686. Coope, J, A. R., Molecular Physics. 18, (1970), 571. — Ph. D. Thesis, Oxford University, (1956). Coulson, C, and Longuet-Higgins, H. C, Proc. Roy. Soc. A191, (1947), 39. Daniel, J.. WY, Numerische Mathematik. 10, (1965).. 125. Davidson, E. R., J. Comp. Phys.. 17» (1975)* 87. DeVries, E., Fortschritte der Physik. 18, (1970),, 149. Eriksen, E.., Phys* Rev.. Ill, (1958),, 1011. Feler, M. G.„ J. Comp. Phys.-., 14,, (1974),, 341. Fletcher, R.,, Molecular Physics. 19» (1970), 55* Fletcher, R., and Reeves, C.M.. Computer Journal,, 7, (1964), 149. Foldy,r L. L., and Wouthuysen, S.. A... Phys. Rev.. 78. (1950), 29. Friedrichs,, K» 0., Perturbation; of Spectra in Hilbert Space. (1965), American; Mathematical Society, Prov,, R. I. Fujiimoto, H.. et. al., J. Phys. Chem.. 78, (1974), 1167. Garton, D., and Sutcliffe, B. T..,. Theoretical Chemistry, Vol. 1, Quantum Chemistry, Specialist Periodical Reports. Chemical Society, London, (1974), 314. Imamura, A., Molecular Physics. 15t (1968), 225. Kari, R., and Sutcliffe, B. T.t Chem. Phys. Lett.. 7, (1970), 149. — International Journal of Quantum Chemistry. 7» (1973). 459. Kato, T., Perturbation Theory for Linear Operators. Springer-Verlag, New York, (1966). Klein, D., J. Chem. Phys.. 61, (1974),, 786. Langhoff, P. W., Karplus, M., and Hurst, R. P., J. Chem. Phys.. 44, (1966), 505. Libit, L., and Hoffmann, R., Journal of the American- Chemical Society, 96,. (1974), 1370. Lowdin, P* 0., J. Math. Phys.. 3t (1962), 969. — International Journal of Quantum Chemistry. 2, (1968), 867. — Advances in Quantum Chemistry. 5» (1970), 185. Lowdin, P. 0., and Goscinski, 0., International Journal of Quantum Chemistry. 
5, (1971), 685.
McWeeny, R., Phys. Rev., 126, (1962), 1028.
Morpurgo, G., Nuovo Cimento, 25, (1960), 624.
Musher, J., J. Chem. Phys., 46, (1967), 369.
Nesbet, R. K., J. Chem. Phys., 43, (1965), 311.
Okubo, S., Prog. Theor. Phys., 12, (1954), 102.
Pople, J. A., and Beveridge, D. L., Approximate Molecular Orbital Theory, McGraw-Hill, Toronto, (1970).
Primas, H., Helv. Phys. Acta, 34, (1961), 331.
— Rev. Mod. Phys., 35, (1963), 710.
Rall, L. B., Computational Solution of Nonlinear Operator Equations, Wiley, New York, (1969).
The relationship: between the two sets of effective operators (3«1) and (3*2) is easily established. From the definition •.(2) _ -U HA " gA Gk-,f one obtains, "'I' - SI'C %A •. HABf • tHnBti • HBB;f)] = ft*1' • gj1^1^). A similar procedure establishes the relation (3«3b;) between Hg2* and Hg1*. To: establish eq.. (3.6),, the result (3.3a); is substituted into (3*5)»- and the " pull-through'* relations, (2.32), used.. This yields D<2>(f): = HBA • Hggf - fH<2) = %A + HBBf -^A1^ - fgI1ftD(1)(f) = (iB - fg-ifW^Cf) = 4yi):(f). The condition that T HT be block diagonal is easily determined! in a direct manner. The inverse of T is 318. *-l T: 1 1„ - fg^f' Matrix multiplcation, followed by use of the "pull-through" relations, (2.32), then establishes that the off-diagonal blocks of T.rlHT are given by D*2^(f) = g^D^^f). Before deriving-eqs.. (3.11); - ('3*15)» applying in the case of a nonorthonormal basis,, it is necessary to examine the orthonormality condition,, (2,101b)), im more detail. The blocks of the matrix g are gA * SAA + SABf + f+SBA + ^Bfif' % " SBB: + SBAh + ^SAB + htsAAh» and = hfsAA + htsABf + sM * sBBf . g{B .. Thus,, one has:, gA = (iA - fV)sA • ftgBA , and (A1.2) (A1.3) §A = SAA + SA«f ' % ' (1B " htft)§B. + Here* and throughout this appendix,, the notation 'ABA * ^B: = SBB + SBAhr established in eq.. (2.112),, is used to simplify the equations. From- (A1.2) and (A1.3). one obtains, and; (A1.5) -1 ABTB * 319 From these last relations, two generalizations of the "pull-through" relations in the orthonormal case, can be derived. They are ^VA" = • hVg^1 - fc'g^jV, (A1.6) andl ^Alft " FTGB^B + ^SABFB " FTGBA§Ilft • (A1-7) The last two terms vanish in each if ggA * 0t. leaving the simpler expressions ^SA^A1 * SB^'V , (Al.,8) and g^V - ^ggSj1 .. (A1.9) Two other relations will be useful below in deriving (3*15)• They are, hfgA = -(1B: - htft)(SBA * SBBf) • gfiA , (AU10) and FT% " -<1A " fV>(sAAh + SAB)} + gAi • (Aiai) The first, (Al»10)„ is obtained as follows. 
hfgAX - (1B - htft)h+SA + htftgA - -(1B * htft)(SfiA * sBBf - gBA) • htftgBA = -dB- htft)(sBAi+ sBBf) • gM ., The first line here is obtained by premultiplying (A1.2) by h*". The second line then follows directly from the definition^ (Al.l), of g"BA« The relation!. (Al. 11) is derived analogously, 320. by first premultiplying (A1.3) by f*, and then using (Al.l) for gAg * gg^* A number of other relations similar to these could be derived here also, but these are sufficient for what follows. *M) To establish the relationship between the operators HA ' * (2) and HA ,, we proceed as follows.- First, from (Al.l), • tSBA " <SBA + * and thus, eq* (3.13) yields HBA + HBBf " -ft* - emK'^AK + «ABf> ^("(f). (A1.12) Then, using (3.8),, one obtains £(2) _ -lr A " gA QA * gA^HAA + HABf + f^HBA + HBBf>3 " tf&UL * "AB* - *W - ^A^^AA + KABf) - D(1)(f)]] ' gI1(1A - ^HH^ • HABf) • tftWhtt + % f gBAHA * But, from eq, (A1.2)„ (iA. fV) - (gx - fV^;1* and thus, .'A "A estaniishihg (3.11). Equation (3.12) is obtained im an analogous manner* 321. A number of approaches can be used to obtain eqs* (3«15). one of which is as followsi °U) aHBA + HBBf- <SBA+SBBf> ^kGK ' HBA * %Bf + (1B• * htft>"^ht- gBAgI1)GA = dB - hhfrK(iBhtf^cH^gf) * htGA.gBAg;1GA - dB. h'fW1* - gM(Hi2> - s^n - (iB - hVjf1^ - gBAg^1f+)D(1)(f).. (Ai.13) The transition from- the first line to the second is effected using the relation (Al.10)» and the remainder by use of previous definitions„ including (A1.12) above. From (Al*3), one has (iB - hV)-1 = (sBB + sMh):(gB * hh^r1 > which,, upon substitution into (A1.12)),, gives eq* (3.15a)* Equation (3.15b.)) follows directly from (3«15a); by simply dropping the terms in g^andi ggA* The easiest way to obtain (3.15c) is to begin again from the definition, (3.14), of D^2*(f) D(2)(f) * HM + KBBf - (SfiA + SBBf)n}2) = HBA + HBB* ~ (SBA * W^A1* + gIlftDU)(f)^ " ^B - <SBA + S^fJg^f^D^^f). This derivation is analogous to that establishing^ (3*6) in the case of an orthonormal basis. 
For an orthonormal basis, it wasr found that the condition (2) Dv '(f) = 0 could be obtained by requiring that the product T^HT be block diagonal* For a nonorthonormal basis, the 322 cx>rrespoTKiing condition! is to require the product (ST) HT to> be block diagonal. But writing T ST = gf! one has, and: (ST) HT • g T HT * g AG., Thus, whem g is block, diagonal. (ST) HT has diagonal blocks HA2^ and H^2*,. and off-diagonal blocks,, D^' * [(STr^T]^ = gjV1*,, and (2)* While? this is of the fornr of eq. (3-6), DfiA' can not be * (2) written in terms of H£ ' as in*eq. (3«l4), unlike the analogous result lm an orthonormal basis. 323 APPENDIX 2 The 3x3 and 4x4 Case — Orthonormal Basis To illustrate some of the complications which arise in a multiple partitioning formalism, a number of explicit formulas are given here for quantities arising out of a 3 x 3 and a 4' x< 4 partitioning formalism. In the 3 x 3 partitioning formalism, the three non-self-adjoint effective operators given by (4.1?) are Hl = Hll + H12f21 + H13f31 * H2 = H22 + H21f12 * H23f32 »' (A2.1) and H3 = H33 + H3lf13 + H32f23 each containing only one extra term compared to the 2x2 case. A considerably greater increase in complexity occurs when m=3 in the defining conditions on the fjj* given by (4.18). There are now six matrix block equations with six terms each, in place of the two block equations with four terms each, as im the 2x2 case. They are, D21 " "21* H22f21+ H23f31 - f21(Hn+ H12f21 + Hl3f3l' = 0 D3l ' H^l+ H33f31 - f31(Hn* H12f21 + Hl3f31 s 0 »12 " H12+ %f12+ H13f32 " f12<H22+ H2lf12 + H23f32' s 0 D32 " H32+ H3lfl2+ H33f32 " f32(H22+ H21f12 + H23f32' s 0 D13 " H13+ HUf13+ K12f23 - f13(H33+ H3lf13 + H32f23' = 0 and, 324. D23 = H23* H23f13+ H22f23 " f23(H33+ H3lfl3 + "32^3* = °' (A2.2) The orthogonality condition in (4.1) gives rise to three matrix block equations here, and,. 
g12 " f12 + f21 + f31f32 " °» g13 = f13 + f21f23 + f31 = °' g23 = f12f13 + f23 + f32 = °* (A2.3) These equations can be used to eliminate f12» ^13 and ^23 ^rom the remainder of the formalism, in favour of f21, and f^2» In fact, it is not difficult to show that t t f12 " "f21 " f31f32 • f23 = (11 " f12f21* 1("f32 + f12*31* • -lh + (f2l + f 3lf32 ^zi^t*^ + (f2l + f3lf32)f3i]' and (A2..4) f13 = "f31 " f21f23 • -f3i+ f2iCV<f2i+ 'Ji^^i^^^^'Ji^i^^i3* The problems involved in eliminating fjL2» f23 and f13 from ea*s» (A2.2), and thereby reducing the number of block equations which must be considered from six to three, are clear from these equations. It would be quite difficult to derive efficient procedures to solve such a system, because of the generally complex dependence of the remaining three block equations on the elements of f21, ^31 and f32* 325 For a 4 x 4 partitioning,, the orthogonality conditions, gjj = 0, give the following six unique matrix block equations, g12 " f12 + f21 + f31f32 + ftlf»2 " °» (A2'5) g13 * f13 + f21f23 + f31 + filf43 = °» (A2.6) g23 = fl2fl3 + f23 + f32 + fi2f43 s °» gl4 = fl4 + f2lf24 + f3lf34 + f4l = °' g24 " f12fl4 + f24 + f32f34 + ftz = °» (A2*7) g34 = fl3fl4 + f23f24 + f34 + f43 = 0# These six equations can be used to write the fjj» (J > I), solely in terms of the fIJt (J < I), as follows. Equation! (A2.5) gives f12 directly as f12 - -(f^ + f^f32 + f^f^). (A2.8) Then the two equations, (A2.6)^are solved simultaneously for f13 and fgy yielding, f23 = (f12f21",l)~1Ef32* fl2f43" f12(f31+ ^l^^* and <A2-*a> f13 = -(f^ + fjjf^) - f^f23. 
(A2.9b)

Substitution of the adjoint of (A2.8) into (A2.9a) then gives f_23 in terms only of the f_IJ, (J < I), and substitution of that result into (A2.9b) does the same for f_13. The three equations (A2.7) can be solved simultaneously for f_34, f_24 and f_14, yielding,

f_34 = −[1 − f_13† f_31† − (f_23† − f_13† f_21†)(1 − f_12† f_21†)^(-1)(f_32† − f_12† f_31†)]^(-1)
       × [f_43† − f_13† f_41† − (f_23† − f_13† f_21†)(1 − f_12† f_21†)^(-1)(f_42† − f_12† f_41†)],        (A2.10a)

f_24 = −(1 − f_12† f_21†)^(-1) [f_42† − f_12† f_41† + (f_32† − f_12† f_31†) f_34],        (A2.10b)

and

f_14 = −(f_41† + f_31† f_34 + f_21† f_24).        (A2.10c)

Substitution of eqs. (A2.8) and (A2.9) into (A2.10a) gives f_34 in terms of the f_IJ, (J < I), only. Similarly, (A2.8), (A2.9), and (A2.10a) can then be used to write f_24 in terms of the same set of variables. Equations (A2.10a,b) then can be used to eliminate f_34 and f_24 from (A2.10c). The resulting expressions will clearly be very lengthy.

It should be noted that the elements of the f_IJ, (J > I), can be calculated numerically much more easily from those of the f_IJ, (J < I), than eqs. (A2.3) or (A2.8) - (A2.10) indicate. Such a calculation involves the solution of Σ_{I, J<I} n_I n_J simultaneous linear equations in the same number of scalar variables. The complicated formulas above arise only when analytic formulas are desired relating these different matrix blocks f_IJ.

APPENDIX 3

Proofs of Alternative Formulas — Multiple Partitioning

This appendix outlines some of the manipulations necessary to establish a number of inter-relations which have been quoted in section 4.4. Equations (4.73) for an orthonormal basis are obtained as follows. From eq. (4.37) and eq. (4.72), one has,

H̄_I^(2) = g_I^(-1) G_I = g_I^(-1)[H_II + Σ_{J≠I} H_IJ f_JI + Σ_{J≠I} f_JI†(H_JI + Σ_{K≠I} H_JK f_KI)].        (A3.1)

Elimination of the quantity in the inner brackets in this equation using

D_JI^(1) = H_JI + Σ_{K≠I} H_JK f_KI − f_JI H̄_I^(1)

leads to the desired expression,

H̄_I^(2) = H̄_I^(1) + g_I^(-1) Σ_{J≠I} f_JI† D_JI^(1).

Equation (4.10) has also been used in the last step. The relation (4.76) between D^(2) and D^(1), in an orthonormal basis, is established immediately by substituting eq. (4.73) into the definition, (4.75), of D^(2). The non-orthonormal case presents many more complications here. From eq.
(4*58), one has, 328e Upon substitution of this equation into eq. (4.59)» a series of alternative formulas for the metric gj can be obtained, among them, gl=K/lf Klt gKI\//^(SLI+M/lSLMfMI} ^SH+K^iSlKf KI  S-^I4l4K(SLI+M^ISLMfMI)\^IfKIgKI+SII+K^ISIKfKI at1-K2i4lfIK^(SII+M^ISIM%)*K^I4ltL^KfLK(SLI L/I + E fJTgKT» (J = x" •••» m>» (A3«5) K/I KI KI This last form' is the generalization to the m x m partitioning of eqs. (Al.l) and (A1.2) of Appendix 1. In the case of a 2x2 partitioning,, the second term of (A3.5) does not occur at all because of the restrictions on the range of the inner summation.. Also, the summation symbols in the first and third terms of (A3.5) can be deleted in that case, since the summation is over only one term. Using the notation Sj of eq. (4.65), the generalizations of eqs. (A1.3) and (A1.4) of Appendix 1 are obtained from (A3.5) as L/I (A3.6) 329. The analogues of eqs. (Al*5) and (Al»6) of Appendix 1 are now obtained by right multiplying (A3.6) by f|p, (P/l), and left multiplying the equation for gpSp by the same factor, and combining the two equations to get, txPsIS'l' = SpSj^f Jp - SO fiP + fIP » (A3.7) gI^IlfPI " ^I^'P1 * fPI^+ ^fPI» and where * K/if«C^K<S"^iSL«fMl):l"^ifilgKI " K/P L/l and CA3.8) ^ =K/pfKpfpK* ^f^,^fM(SLP^WlIP>>^fKP«KP 3 K/1 L/P The two equations, (A3*7)» asa well as the two quantities, JB and 30 t can be obtained from each other by interchanging the indices P and 1. The generalization of eqs, (Al.7) and (Al*8) of Appendix 1 is then obtained by dropping those terms in (A3.7) above involving gKI, K/l* Here, however, this amounts to dropping only the last term inside the curly brackets of and dD * Thus the usefulness of the resulting equations as generalized multiple partitioning 'pull-through* relations is severely hampered, because of the complexity and size of the last two terms of (A3«7)» even when the orthogonality condition) is satisfied* Finally, the generalization of eqs. (A1.9) and (ALIO) of 330. 
Appendix 1, which were used to obtain one of the relations between D^2^ and for a 2 x 2 partitioning, can be obtained as follows* From (4*58), one has t * t " GJI*^/KJ(SKI+L^ISKLFLI)"(SJl",'K^ISJKFKIJ» K/l (A3.9) so that, from; eq. (4.59) and (A3.4), fijgi * <l-fIj4.>C«ji-^/^ K/I *FIJ,^FKIFIK^I"fL K/J L/I * FIJKyIFKIGKI N "< 1"FIJFI(SJI^K^J3JKFKl^GJI+FIJK/jfKIGKI K/J K/J K/I L/I (A3.10) " W^KI^/IO/LI} * K/I The last four complicated summation terms in (A3.10) make the result effectively useless, and therefore, no generalization of eqs, (3*15a,,b) is given here. The proof of eqs. (4.82) is as follows. From eq. (A3.9)» one has, K/I 331. Then* eq. (4.83) becomes, DJI'" "Jl^j"JKfKI-<SJI*^3JKfKI>"SlX <HII*j^HIJf JI> Mi x ^ii^iAi*' (A3'U) from which, an expressions for the combination HJJ+ E HjKfKI can be obtained. By definition, one has, "Si1 t^I* j^11! J* JI+jpjf JI(H JI*^11 JKf KI > ^ • which becomes, after using (A3.ll). "i2'"^1 "ii^^rj'ji-jj/ji^ij- su *^tL^a*^IstLta) SJ^<Hii+^1KiJfJi)-Djl> K/J =sjl tl-JJI*«fiJ*(,^i*JJIHiJ*ii) •J^*5iC^I-1SJ4j»Ki»IJIsKL'ii>^l>*JSi'JiBn) K/I L/1 K^l 332. using eq. (A3.5) to obtain this last form. A large amount of cancellation now occurs among the coefficients of "l1)a ^(H^^H^fj!), with the final result being eq. (4.82), (V(2) *(1) -1 r rt D(l) HI = HI + gl j^xfJIDJI ' APPENDIX k Description of Algorithms — 2x2 Case This appendix gives detailed descriptions of the implementation of the algorithms discussed in chapter 5« In various instances below, especially in the updating cycles, the order in which the computations are done is important, Greek indices refer to basis elements ih Sg, Roman indices to basis elements in S^, 1» Simple Diagonal Newton-Raphson (SDNR). initialization! f « 0 *(l)diag s Hdiag them tr » r=l updatei 0*1 or (s=l , •,•,n^), >oo* <»B1) + )oo- 6forH; f f„„ + 6f , or or or Quadratic Diagonal Newton-Raphson (QDNR), initialization! 
f s 0, H(1) « H A AA* -(l)diag _ Hdiag,, B: " BB, ' then Bi A if H = 0, then 6f « Z>IV/&nr. . ar or or ' ^or if A„„ -0, then &f =*<*^>JF/H~ if real or or or ' ro 6for=0t otherwise, if both H_ -0,,A_ =0, then 6f =0, or or or otherwise,, *-»or L^*or or or J % * sgn(Aar), update t <»i1)>.r*<i4l)).r+HBadfar- (8slnA) (H *oo^rt 'oo 0Iornro * f -*>f • 6f • Aor^ or or Diagonal Generalized Nesbet (DGN). initialization! f * 0, HA " AA* them n B W s H__+ £ H_ _f »• D«)"Wor%i*.t(8i2))tr-t = l A 'rr "oo-r*l. • n. updatei (s,t « 1, •«., nA) , -** SA • 6&J for — for * *for • (r'1' • V" A =f* IA6f„+(nf2)). •6fl„6frta(nJ2)) rs T0%»\ ot A ts ro os A +WL**«. » (r.s « 1, ..... n.)t ro os 336. 4* Full Generalized Nesfcet (FGN) • initialization! f = 0, H<2> . H AA gA 58 XA * then! rig W =H + E H f , r*l, •••.,,n. t solve ^oA[Hi2)-HoalA] - D<2). updatei = 1, .•••, nA) » gA * gA * 6gA -n; £(2) £(2) -1. HA HA + gA A * a-1t n B* for for + 6for • (r * lr •••• nA)f 337. 5. Simple Diagonal Newton-Raphson With Overlap (SDNRS), initialization! 0 . h • 0, HA = HAA • SA = SAA • u'tdiag; _ „tdiag £tdiag „ ^tdiag B3 "* » °B ~ RB » *BB B BB-them nA G »H + £H f + £ h (tt.) . or or ^ op pr ^ sox A'sr • "B n. ^or*^^* (SA> sr • A0r-(^t)0O<»A)PP-(ii)„(sJ) 00 6f =[g (S!) -G (S.) 1/A . or L&orv A'rr ''or* A/rrJ'I-*<"* •' or 6»ro<<SB)ao0or-(2Bt>oa«or^ar. • updatei A sr A sr so or (Si. )i, (S. ) _ + S__6f_^.« * A sr ' A sr so or* (%\o-* ("Bf)oo + 6hroHro ' s«l» •••f,nA, 'B'do wB'oo * 6nroSro • or or or ^ h„ + 6h. _ , ro ro ro r*l„ n. 338. 6> Quadratic Diagonal Newton-Raphsom with Overlap (QDNRS). initialization! them G *HT HI or or nB nA £ 1 •f jP + £ P=l nB nA f * 0, HA * HAA » S'tdiag , „tdiag WB BB •' W^A^sr • h - 0, SA * SAA »' Stdiag _ ctdiag BB S,-aS„~+ 2 S_ f + £ hM(SA) A = ^oAo " Sro<HB>oc • B * (SB)ao(HA)rr-(SA)rr(HB)oa-SroGar+Hrogdr, C: • -(S») G + g (M.) , v A'rr or 6orA A'rr* 6f * 0. or * if A=0, B=0, if A/0, B»0, 6far * (-C/A).* if C/A < 0, If A*0,. 
Bf'O, 6for • 0 if C/A > o; 6far B "*CJ/B' if A/0, B^0f 6for • -B -B / 2C \ |BT llB|+JBF-4AC J 6h, ro or B oo or *"R'«.^^^»»-."" (K« )__ ro or A rr updatei (H! )'-»(£!) +H 6f t A sr A'sr so or* (S. ),-•(§.) + S of , A'sr A'sr so or' (s«i ,»•-»• ,nA), (HB^OO^^B^OO^V^O' ^B^oo"*^! )oa*6hroSro• f f • 6f_„ or or or hw. + 6h _ ro ro ro t r*l,...,n4t 0 = 1 f • • • |Tlg;» 339. 7. Non^Orthogonal Diagonal Generalized Nesbet (DGNS), initialization* f » 0, SA2> = SAA HAA AA then i Y«r.aS«^+ E Srr/>f,vr*» (r«l , . . . ,n. ) , or orv/^j op prw 9 9 k 9 vvp{vv nA «or=D(2'/CtH{2))RRS00-HC0], or or update t r»l,•••tnA, tss6<oYos^ioSfos+6ftoSoo6f08 • (s,t = n.), SA • 66A t ^•."^^CT—^S 6f (r=l f • • • ,nA), or or occ or' * * A A AT.a=wl^6f«o+YL £ 6f„+(nf2j). rs ro os ro^_^ ot A ts ro oo os A ss (r,s=lr,..tnA), H<2> -H<2> * S?A . or for + 6for (r«l,...,nA). o«l,n 340* 83* Non-Orthogonal Full Generalized Nesbet (FGUS).. initialization! f « 0, "12) * SAA HAA » gA 3 SAA • then! n B ^or^^opV • <r»l....,nA), ,(2). u(2) or or 4-_« ot A tr * t=i solve. «0A[S00H{2'-H001a] = D£'.. update! r«lr.••,n lA» ,(2) (s,t=l,....nA), gA ffA * 6gA n. r(2) A+ =W* 6f -Y' 2 6f (HA ') , ts to os aic A rs * (s,t=l,.,.,nA), ^(2) *(2) -1 A HA gA for for + 6for» <r-l,..-nA). o3l(•••,n B* 341. APPENDIX 5. Rates of Convergence and Asymptotic Error Constants This appendix analyses the rates of convergence of some of the algorithms, for the determination of f, described im chapter 5»* These considerations are based to some degree on the work of Traub (1964). To avoid confusion1, between subscripts denoting iteration number, and those denoting matrix elements, the fixed point iteration formula, eq. (5«4), will be rewritten here as. 0(f) = f - XD(f), (A5.1) where X « Thus, 0(fexact) - fexact^ beCause D(fexact)»0 The basic iteration formula, eqs. 
(5»5)» can be written in this notation as fm*l s fm ~ <*5.2) If the necessary derivatives of 0(f) exist at f6**0*, then 0(ft) can be expanded im a Taylor series centered on ft , which allows one to write an expression for the error in the current estimate of f, given by 0(f), in terms of the error in the result of the previous iteration. One has .(*) • z *%*f* or rexact i»5*fardVr' ^exact • • ••• . (A5.3) where 4;' - <v„ -cact- us.*) 342 The iteration function 0(f) is then said to be of order p if all derivatives of the elements of 0, with respect to elements of f, of order less than p vanish at f = fexact^ wnile at least one derivative of order p does not vanish. Near the solution, the dominant term in the error will then, be a sum; over terms containing the product of p errors from the previous iteration. The asymptotic error constants for this iteration functions are taken here as the p order coefficients in (A5«3)» For the exact Newton-Raphson equations, (5«6), the iteration function is 0NR(f) = f - J^D. (A5.5) Thus, one has _£I B . £ eixl£ D , (A5.6) df r.s df T or * or which vanishes at f = fexact, because D(fexact) « 0. The second derivatives are £1 a - E e±£* D * E CJ"1), ^s df df . ., r.s df df , . r.s K*»<IBaf df . . or or or or or or (A5.7) where the identity, d(J'^Jj/df^ 85 0, has been used to obtain the last term* At f • fexactt the first summation in (A5*7) vanishes*, but not the second one,in general. The Newton-Raphson equations are thus second order convergent, as is well known, and one can write Ft o.vfp* pt'Ts *fordfa,r, r,s fexact ar Vr' + 0(e3)r (A5.8) 3^3 NR indicating explicitly the second order nature of 0(f). The algorithm SDNR is based on the equation D^*(f)=0. As seen from eq. (5*15)• the operator JC in; the iteration. 
SDNR formula 0 (f) is just the inverse of the diagonal part of the Jacobian matrix, X SDNR _ 1 pt,cr " j(D Pt,,pt so that, ^SDNR /- \ *Pt (f) 6«»o6rt • D<^(f) Pt - TTTJ J and d0 SDNR Pt 3f or _ e c /Qt.,or exact ^ rt " T1^ fiexacx J>trpt ..exact (A5-9) (A5.10) ( which vanishes only for P-o, and t=r, in general. Thus, for SDNR 0 , one can write .<f*>. «* * P°0t +0(e2), (A5.12) CHA ;tt - {lim >PP SDNR which verifies that 0 is indeed linearly convergent, and gives an expression for the dominant error term near the solution. A sufficient condition for convergence to occur is l«pt+1)I < \ept\* Cp-1-.-....nBi t»l,...,nA). (A5.13) Assuming that .(m) or >t e < 1„ (o=l,...,nBJ r«l,...,nA), (A5.14) is true when 0p+J is to be evaluated, it can be seen from (A5.12) 344. that convergence will definitely occur if o=|(5i1)t>.|*rJt|(fii1))rt|<|(5il>)tt-(^,t)| . (A5.15) which is obtained by replacing all the ratios of the type (A5«l4) occurring in (A5.13) by unity. When cycling systematically through the elements of f, all fQr* (<*//0t r/t), will have been updated more recently than f^ at this point, and thus, in an appropriate basis, the condition (-A5.14), is not unreasonable, as long as the calculation is converging and the errors thus decreasing. While the condition (A5.15) is too crude to be of any practical use, it does indicate that the rate of conver gence is related to the relative magnitudes of the differences between diagonal elements in HA and Hfi, and of their off-diagonal elements. Convergence requires only that the errors in the elements of f decrease over a number of iterations, rather than that the errors in each element of f decrease in every iteration. The results of test calculations in Table 5»1 show that good rates of convergence occur when (A5.15) is violated substantially for some elements of f (as for the example with n * 250). 
A more detailed error analysis indicates that a crucial factor for convergence is the calculation of 6f one element at a time, with continual updating of ' and Hg (and implicitly, of D^), If these quantities are updated only after a complete sweep through 6f, convergence occurs only for small nA, and nR, and is very slow, at best. 3^5. For the generalized Nesbet algorithms, based on the (2) equation D '(f)=0, the same sort of result is obtained* From eq. (5«25)» one has ,DGN d0 Pt df or .exact powtr x(2) Pt.or HPP " (\ ;tt f exact (A5.16) .DGN which does not vanish in general, and thus 0 is linearly convergent in general, with e z Hbn4?} - £ (HA2)) .e^ (m+1). o^P pq gt r*t A rt ? r Pt ("i2))tt " Kf>f> + 0(e*). (A5.17) For nA*l,. one has E H> e (m) e (m+1) . o/P <°° ° •T2T (HA *11 " V + 0(e*). (A5.18) Therefore, algorithm DGN is second order convergent when n^-l only if Hgg, is diagonal. For the algorithm FGN, the operator !K is the inverse of (2) the diagonal block part of JT defined in eq. (5*21), each such diagonal block of corresponding to a row of Hr '• The algorithm is linearly convergent, since, d0 FGN Pt fffc n. df with or does not vanish in. general at fexac1'. B3 "A po°rt -^ "fs.or, fi(2)n-l (A5.19a) (A5.19b) 346V The generalized Nesbet algorithms for use with a non-orthonormal basis give similar results. From section 5»3»c, it is seen that ,DGNS f\.wii<~> _ —1 6 6 oo A rr oo (A5.20) and FGNS _ . rH s ft(2)rl "pt.or" 6p°L oa * SooI1A J * (A3.21) Therefore, one has, ,DGNS 30 l°t df or f exact = Vo6rt + T(2) ^Pt.or S^(HA }tt " HPP. .exact (A5.22) which does not vanish in general for any values of pt and or. Thus, for algorithm DGNS, one has, T<2) e (m+1) pt o,r 6pa6rt Pt.or H pp - s/0/o(HA2;)tt (2) .exact 0(e2). or (A5.23) It is seen from eq. (5»52), defining Jv , that unless nA=l, the coefficient of e^' on the right hand side here is not zero, although it is likely very small. 
The expression for the error FGNS in 0 is of the same form as (A5.19a) with (A5«21) substituted for (A5.19b).. The error analysis for algorithms SDNRS and QDNRS requires an extension of the procedures used above. The iteration formula must now be written as the pair of equations ,(11),* «/(12) - -] 0f(f,h) S f 0h(f,h) h ^(21)(f,h) #(22)u,h) GfiA(f.h) g^U.*) (A5.24) 34?. Comparison to equation (5»44) yields the result, <A.t,or ~ powtr' At„or /s. f>o rt *-Vt pt „ t (A5.25) 0>(21) = - (22) . (HB } A t,or -— 6. 6. , A t.or ^ >o0trf A,t p0 tr ^Pt where is defined in eq. (5«45). An expression for the errors in the iteration formula (A5.24) must now be obtained from a Taylor series in the elements of both f and h, which yields the result, d +2-D) s £ P(Vpt (e(m)} + d<*f>pt (e(m)} "1 +4 z d2^f)^ (e(m)) (e(m)) ••.',L ^ °r (A5.26) 2, (*f}Pt (e(m)} (e(m)} ,d (*f>Pt ( (mK ( (• , • 0(e3), for 0f, with all derivatives evaluated at fexact and hexact, A similar expansion can be written for (e^m+^). Substitution of (A5.25) and (A5.24), then gives ,(m+lh . r E("A}tt(§Bikc " ^AWVW (e(m) (A5.2?) + £ E"Vtt<SA>rt - 'SA)tt'"A>rt](e(m)) +0(e2 348 and (A5.28) Neither (•fl^)p.t °r (e£m^)tp occur in the first order term of either of these equations. In all the first order error estimates derived in this Appendix, the denominator of the error estimate is seen to he identical to the denominator in the iteration formula. Thus, if this denominator becomes small, not only does 6f (or 6f and 6h) become large, but so do the errors in f (or f and h). Also, it is seen that these error estimates all involve off-diagonal elements of H (or H and S), and therefore, improved convergence is expected in all algorithms if these matrices are made more diagonal. 349. 
APPENDIX 6 Algorithms for the Determination of T — Multiple Partitioning Case The purpose of this appendix is to outline, in some detail, algorithms for solving eqs, (5*74) - (5*77) for the matrix elements of the off-diagonal blocks of the uncoupling operator T in an nr, x m partitioning. (l) * A6.1 Methods Based on D« (T) = 0. If the fjj, (I,J=1, • **, m, I/J), are approximate solutions to any of the defining conditions (5*74) - (5*77), and the exact solutions are given by fjj = fjj + 6fJi» "then, from the equations DJJ (T) = 0, it is seen that the exact corrections &fjj to the fJJ are given by I HJK(T°)6fKI -SfjjHjCT) * -D^T0), (I, J=l,... ,m, I/J), ^ (A6.1) where «JK^0) = HJK " fJIHIK ' <A6'2) If the exact effective operators HjfT), (I=l,,,.,m), were known, the linear system (A6.1) could be solved directly for the The Newton-Raphson equations corresponding to the nonlinear system = 0, eqs, (5*74), are ^H^K^0)*^ - efjjH^T0) - -D^CT0),, (I,J=l,...,m, I/J), (A6.3) 350. which differ from (A6.1) only in that the approximate effective operators Hj*^(T°) appear ih (A6.3) in place of the exact operators Hj(T) in (A6.1). Equations (A6.3) are obtained by substituting the Jacobian matrix with elements am*1*} aJrI^KtL 1 ,tr6o^>6KJ6LI+(HJK)o^6rt6LI • d(fiaVt (A6.4) into eq. (5.6), and isolating the JI block. The similarity of eqs. (A6.1) - (A6.4) to the corresponding equations for a 2x2 partitioning is seen if it is noted that Hgfi, = H^Mf in that case. If solved exactly, eqs. (A6.3) would lead to a second order convergent algorithm. In fact, if the Hj (T ) are replaced by the Hj2^(T°), the resulting iteration formula is nearly third order convergent, just as in the 2x2 case. However, the linear system (A6.3) is of dimension 2 2. nTnT, KJ 1 J which can be unacceptably large even when a8 S IU is itself I not unusually large. There are at least two levels of diagonal approximations possible here. 
In the diagonal block approximation, only terms involving &fjj itself im eq, (A6.3) are retained, leaving %j6fji * 6fji"i1) = 'BJI* (I»J=1 m» W» <A6«5) This involves the solution of m(m-l) smaller systems of linear equations in each iteration, of dimensions, respectively, njnj, (i,J=l,...,m, I/J), a considerable reduction in computation; per iterative sweep. These equations become 35U especially useful if each of the is small, that is, if the partitioning divides up the full space into a large number of subspaces of small dimension* It might also be necessary to use (A6*5) if the off-diagonal elements of the Hjj and the H£ ' are large* In the 2x2 case, eqs. (A6.5) are still the full Newton-Raphson equations, however* The lowest level of diagonal approximation of (A6.3) gives an iteration formula which reduces to that of algorithm SDNR in the 2x2 case* It consists of retention of only the indi vidual diagonal elements of the Jacobian matrix (A6.4), which leads to the iteration formula, D(D ° TrT j 1 c(4u)rr - (tjj)00 ] Like SDNR, an efficient iterative scheme based on.eq. (A6.6) would consist of cycling through the 6fone element at a time, calculating the D^ £ as required, and storing the Hj 7 J I and diagonal elements of the H\JJ continuously. Because the * (1) ^1 Rj ' and Kjj are; linear in the fJJ, they are easily updated, according to (*^I1>);sr s ^IJ^soFo r • (s=1» •••»ni>» (A6.7a) J I and (""JAO ' <HIJ>ro«Vl • ""l^rr • <k6-™> A* change im fj^ affects only ft^1 * and If the diagonal elements in different diagonal blocks of H are well separated, and the off-diagonal elements are small compared to these 352 separations, then a reasonable starting approximation is T=ln» or fjj « 0, (I,J=l,,..,m, I/J). The block columns of T can be determined individually here, in any order, because the effective "(1) (i) operators H£ , and HjJt as well as the quantities Dj£ , (J=l, ...,m, J/l), depend only on the fjj, (L=l, ,-•. ,m, L/I). Substitution of fJX * fjj •*fjI into eq. 
(A6.1) yields the exact equation for the &fjj, (I,J=1, m, I/J). (A6.8) The diagonal block approximation, D^^T^-^fj^ef^H^^dfjjHjjSfjj, (A6.9) has a form in &fjj like the form of the basic defining condition, eq. (2.16), for a 2 x 2 partitioning. The diagonal elements of this equation give a quadratic iteration formula, («IJ)r<,^JrI+i:(»i1,)rr-^J)a0]6VI-<Dji))or= •• (A6.10) When 6f is large, this formula may give improved convergence. °JrI The relative increase in cost accompanying the use of (A6.10) in place of (A6.6) depends on; the dimension of the problem, but becomes negligible as H becomes large. 353. (2) * A6.2 Methods Based omDjj (T) » 0, The elements of the Jacobian matrix in this case are given by, j(2) »<pg:>or °Jri*PL^K a(fLK)pt ^^J^^rt-^^^tr^^KI + (HJLV6rt6KI nI d(H^2^) 81 a(fLlVt where, from (4,89). one obtains, *r<2) ¥HI isr . v. r/_-lStx.t s . , -Ut d(fLlVt J C ( gj Hjf) g^ 6rt- ( gJ1 f^ ) s<0( Hj ) tr]6LM, (A6.12) For any n^ which are unity, the first derivatives of the corres-* (2) ponding ' are zero. The relative ihsensitivity of the effective operators Hj '(T. ),as approximations to the exact A- 4* j(T), to errors in the off-diagonal blocks of T, *(2) can again be exploited by neglecting the derivatives of Hj ' in (A6.11), This truncation leaves the simplified Jacobian matrix, J- , with the only nonzero elements given by, 7o2iT..TtT't(HJJ)0(!>»rt-(42')tr6<,p]6I|J+(HJL)0()6rt, 7(2) ^rtu \ * _ru(2) rJrI,/»LtI (A6.13) This yields the Newton-Raphson equations HJJ6fJI-4fJIfii2,t1^JHJK6fKl"-DJI)> I/J). ( 354 *(2) In these equations, only the coefficient matrices Hj depend * th * on T, and these only om the I block columnjof T, respectively. Therefore, in principle, eqs. (A6.14) can be used to solve for the fLI, (L=l,...,m, L/l) for each value of I individually, that is, for a single block column of T at a time. 
If the separations between diagonal elements in different diagonal blocks are not small compared to the off-diagonal elements of H, then diagonal approximations of (A6.14) are useful. The diagonal block approximation, is HJJ6fJI-6fJI*i2)="DJ2)» (^=1.J/l), (A6.15) where tne nj(n-nj) dimensional system of linear equations (A6.14) for a given value of I is replaced by (m-1) linear systems of the smaller respective dimensions n^-nj, (.1=1,..,,m, J/I). Equations (A6.15) can still lead to relatively costly overall iterations unless the products njrij are all small. However, they can be approximated further to give generalizations of the algorithms DGN and FGN. "(2) The exact change in Hj ' due to a change in a set of fKI, (K=l,...,m. K/I), is 6ft(2),gil(new,[ ti(D(2) ^^.^^(2), (A6.16) • S WfTT6f TT- E ftT6f TTH<2)], where WJI " HJI + L^ HJLfLI * (A6'17) Iff only a single fjj is changed, eq. (A6.16) reduces to the 355 same form as eq. (5«23)t .««2>.^<—>[.fJ1(»«)+HJJ.fJ1.«JIfii«) (A6.18) + WJI6fjrfJI6fJl"i2)3' For any change in fjj, the entire operator H£ ' is changed, and thus, as for a 2 x 2 partitioning,, it is most efficient to change groups of elements withim the matrix fJJ before updating H^'. The generalization of the algorithm DGN is (D(2)) 6Vis (fit*>>" !H ) • (ral»-»ni)' (A6a9) J 1 (HI 'rr " (HJJ>oo and when all Oj- elements of the c^*1 row of fjj are changed im *(2) this way, the change in H£ ' is 6S(2)=g-l(new)[(f(„ew)t)io(6fji)oifi(2)+(wti)io(6fji) ol +(6fJI)Ic(6fJI)oIH<2>d], (A6.20) The generalization of algorithm FGN is (*fJI)cI-(DJI))alt<HJJ)aa1I-5i2)^1' <A6-21) * (2) and the corresponding change in Hj 'is •fii2>-^l(nW,,C(wJI)I.»faI-(*J1)1(,MelH<2)D» (A6.22) In eqs, (A6.20) - (A6.22), the symbols Io (ol) indicate the o columns (rows) of the I block row (column). 356. A6.3 Methods Based on the Simultaneous Solution of Gjj.(T)a0 and fijj(T)s0. The Newton-Raphson equations arising from eqs. 
(5*76) are the pairs E (6fTT)fWTT+ 2 (WTT)f6fTT* -GTT, (A6.23a) I and (6fJI)%6fIJ+LEi[(6fLJ)tfLI+(fLJ)t6fLI]=-gJI, (A6.23b:) L^J (I < J = 1, .... m), where the quantities W"LI are defined in eq. (A6.17)« All of these equations must be solved simultaneously. That is, they cannot be separated easily into a number of subsets without common variables. The diagonal block approximation to (A6.23) is the pair f6fIJ)twlI+WJJ6fJI-°JI' (A6.24a) and efjj+UfjjJ^-gjj. (A6.24b) Solving the diagonal parts of these equations simultaneously for corresponding elements of and (6fJJ)*, gives the iteration formulas <WII>rr " <WjAa and 357. which are similar to the iteration formulas for algorithm SDNRS, The simplicity of eq,. (A6,24b) makes it possible to eliminate either or (Gfjj)* from eq,. (A6.24a), leaving an equation in terms of only one of these. Substitution of (6fIJ)t = -(gjI • 6fJ3;) (A6.26) into (A6,24a), gives the equation -ofJIWII • wjjfefjj = -GJJ • gjjWjj , (A6.27) for 6fJI# The diagonal part of this equation yields Wjl)or-(QjI><nP " (gjfIl)gr , (A6.28) (WII}rr " (wjAo and, using this in eq, (A6.26) gives ,* , _-<GJI*or^gjIWII)or-(gjI>or^WII*rr-(WL)oo3 r° (WII*rr " ^JJ^oo (A6.29) These formulas amount to addition; of the quantity E ^gJl*ot^WII*tr t/r to the numerator of (A6.25b) and its subtraction from the numera tor of (A6.25a)„ If bfjj is first eliminated from (A6.24a)„ then the same sort of result is obtained,, but now, the quantity 2 ^\i)aa (S^TT)oy. is added to the numerator of (A6,25b) and p/o ™ ox pr subtracted from the numerator of (A6,25a), This ambiguity in iteration formula is undesirable. Only actual numerical studies can determine whether the inclusion of these potentially costly additional terms in the iteration formulas (A6.28) and (A6,29) is justified, in comparison to (A6.25). The implementation of these iteration formulas is similar 358. to the SDNRS algorithm. 
The quantities WJJ, (I,J=l,...,m) are stored, and updated continuously as the elements of fj^ are changed. The elements of GJJ are calculated from GJI * WJI + L^ <fLJ)twLI» (A6-3°> and those of gj^ fronr (5*76b:), as required. A6.4 Methods Based on D^(T) » 0. The Newton-Raphson equations corresponding to eqs. (.5••77) are ^<»Wt*6W^<6Wtfw+<Wt*fiBi>3^a) (A6.31) +^2K(6fLK)TWLI+^(WLK)t6fLI=-GKI,, (K,I=l,...,m, K/l). All of the off-diagonal blocks of T occur in each of these equa tions in a complicated manner. If only those terms Involving 6fKI on the left hand side of (A6.31) are retained, a much simpler approximation! results, WKK6fKI ~ 6fKIDil) = -GKIf (K,I=l,...,m, K/I)., (A6.32) Note that D^j = (g G)IX / Hj. ' unless eqs. (5*77) are satisfied. It has not been determined whether or not an efficient iterative procedure can be based on eqs. (A6.32). The quantities are comparatively costly to calculate, and the iterative scheme would apparently require the maintainence of some estimate of gT1 throughout the calculation. 359. APPENDIX 7 Additional Perturbation Series — Orthonormal Basis In section 6.2, several perturbation formulas for the mapping f» and the effective operators HA, GA, and HA, were given. The purpose there was to give the low order terms of the perturbations series solely in terms of the perturbed operator H. Such perturbation formulas for the effective operators would then have a general significance, in that they are not necessarily obtainable only from the partitioning form alism; presented in chapters 2 and 3*> For example, the formulas for HA can also be obtained using a canonical transformation; formalism. The purpose of this Appendix is to supplement the material in section; 6*2,. Additional information on the efficient calcu lation, especially of high order terms, of the perturbation; ±1 +4 -* series for g^ , gA , and HA is presented. 
The formulas tabulated in section 6,2 become too lengthy with increasing order to be of practical use much beyond third order. Perturbation series for those powers of the metric gA, namely gA t and g^ , can be obtained in; a number of ways. In terms of the perturbation series for f, the series for these quantities is obtained by generalization of the familiar power series expansions, sA* S <*A + ftf>* s " £iKn) n=0 A 36o (A7-U gj* - (1A • ffff)-* = t grli(n) =lA-iftf+|(f^f)2-^(ftf)3^(ftf)2|'-^(ftf)5+..., (A7.2) and S (1A + FTF)"1 = * G:1(N) ni=0 = 1A - fff + (fff)2 - (fff)3 + (f^f)4 - ... . (A7.3) In each of (A7«-l)» (A7.2), and (A7.3)» actual expressions for the gj^n^„ gj^11^. and gj1^ are obtained by substituting eq. (6.8) into each product, and isolating all terms of order n. Tables A7.1, A7.2, and A7»3 contain some low order formulas of this type. Perturbatiom formulas based on eqs. (A7.1) - (A7.3) cam be generated to high order automatically without great; difficulty, but with increasing order, they rapidly become very lengthy, and thus costly to use. For automatic computation of high order terms im each of these series, a more efficient procedure is available. It is possible to obtaint a series for any of gA ,, g^ » and gA P in terms of that of g^ and other powers of gA by expanding identities of the form S^I1 = *A » (A7.4a) (g^)2 • gj1,, (A7.4b361. <gA*>2 * gA (A7.4c) gA*%* - 1A (A7.4dgA*€lgi* * h (A7.4e) gAigA gA* * *A •• U7M) and so on. From (A7.4a), one obtains giving g^1^) in; terms of gA^f (I = 0,..., n), and lower order1 g^1^^* (3 = Of •••» n-1). Here, the second term is assumed: to make no contributiom when (n-1) < 0.. Similarly, eq. (A7.4c) yields the expansion gAl(n) = igin) - i eA*(**V(3)- (A7'6) j^i giving g^^^ in terms of g^11^ and lowerr order terms in the i expansiom of gA • Equation (A7.4b) yields a similar result for gj^^n^ in terms of gj[^n^ and lower order terms in the series for gA • From eq. 
(A7.4d), one obtains

g_A^(1/2)(n) = − Σ_(j=1)^n g_A^(1/2)(n−j) g_A^(-1/2)(j),   n ≥ 2,      (A7.7a)

and

g_A^(-1/2)(n) = − Σ_(j=1)^n g_A^(-1/2)(n−j) g_A^(1/2)(j),   n ≥ 2,      (A7.7b)

which inter-relate the perturbation series for g_A^(1/2) and g_A^(-1/2). Somewhat more complex expressions are obtained from eqs. (A7.4e) and (A7.4f).

The important feature of eqs. (A7.5) - (A7.7), and other similar ones, is that the evaluation of a k-th order quantity generally requires the evaluation of no more than k products of two n_A × n_A matrices. As k increases, this represents a rapidly decreasing fraction of the computation required if formulas like (A7.1) - (A7.3) are used explicitly. This substantial computational advantage is a result of not having to repeatedly re-evaluate certain often-occurring combinations in (A7.1) - (A7.3). The need to store all lower order terms in these series for use in the calculation of higher order terms may be regarded as a disadvantage, but it is of no consequence if n_A << n_B. An appropriate combination of eqs. (A7.5) - (A7.7) with eqs. (6.15), together with eqs. (6.12) or (6.13), is certainly the most practical procedure for the calculation of high order terms in the series for H̄_A.

Equations (A7.5) - (A7.7) can also be used to obtain the terms in the series for g_A^(1/2) and g_A^(-1/2) solely in terms of the g_A^(j). Such expressions are particularly useful when moderately high order calculations are being done by hand and algebraically, rather than numerically by machine. Tables A7.4, A7.5 and A7.6 contain several low order formulas of this type. Note the simplicity of these formulas relative to those in Tables A7.1 - A7.3. Tables A7.7 and A7.8 contain expressions for low order terms in the series for H̄_A in terms of only g_A and H_A or G_A. Finally, Table A7.9 contains low order formulas for G_A in which the equations D^(n)(f) = 0 have been used to eliminate all terms involving H_BB. These formulas are the same as those derived using eqs. (6.14).

TABLE A7.1  g_A^(1/2)(n)

g_A^(1/2)(0) = 1_A
g_A^(1/2)(1) = 0
g_A^(1/2)(2) = (1/2) f^(1)†f^(1)
g_A^(1/2)(3) = (1/2)(f^(1)†f^(2) + f^(2)†f^(1))
g_A^(1/2)(4) = (1/2)(f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1)) − (1/8)(f^(1)†f^(1))²
g_A^(1/2)(5) = (1/2)(f^(1)†f^(4) + f^(2)†f^(3) + f^(3)†f^(2) + f^(4)†f^(1)) − (1/8){f^(1)†f^(2) + f^(2)†f^(1), f^(1)†f^(1)}_+
g_A^(1/2)(6) = (1/2)(f^(1)†f^(5) + f^(2)†f^(4) + f^(3)†f^(3) + f^(4)†f^(2) + f^(5)†f^(1)) − (1/8){f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1), f^(1)†f^(1)}_+ − (1/8)(f^(1)†f^(2) + f^(2)†f^(1))² + (1/16)(f^(1)†f^(1))³

TABLE A7.2  g_A^(-1/2)(n)

g_A^(-1/2)(0) = 1_A
g_A^(-1/2)(1) = 0
g_A^(-1/2)(2) = −(1/2) f^(1)†f^(1)
g_A^(-1/2)(3) = −(1/2)(f^(1)†f^(2) + f^(2)†f^(1))
g_A^(-1/2)(4) = −(1/2)(f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1)) + (3/8)(f^(1)†f^(1))²
g_A^(-1/2)(5) = −(1/2)(f^(1)†f^(4) + f^(2)†f^(3) + f^(3)†f^(2) + f^(4)†f^(1)) + (3/8){f^(1)†f^(2) + f^(2)†f^(1), f^(1)†f^(1)}_+
g_A^(-1/2)(6) = −(1/2)(f^(1)†f^(5) + f^(2)†f^(4) + f^(3)†f^(3) + f^(4)†f^(2) + f^(5)†f^(1)) + (3/8){f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1), f^(1)†f^(1)}_+ + (3/8)(f^(1)†f^(2) + f^(2)†f^(1))² − (5/16)(f^(1)†f^(1))³

TABLE A7.3  g_A^(-1)(n)

g_A^(-1)(0) = 1_A
g_A^(-1)(1) = 0
g_A^(-1)(2) = −f^(1)†f^(1)
g_A^(-1)(3) = −(f^(1)†f^(2) + f^(2)†f^(1))
g_A^(-1)(4) = −(f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1)) + (f^(1)†f^(1))²
g_A^(-1)(5) = −(f^(1)†f^(4) + f^(2)†f^(3) + f^(3)†f^(2) + f^(4)†f^(1)) + {f^(1)†f^(2) + f^(2)†f^(1), f^(1)†f^(1)}_+
g_A^(-1)(6) = −(f^(1)†f^(5) + f^(2)†f^(4) + f^(3)†f^(3) + f^(4)†f^(2) + f^(5)†f^(1)) + {f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1), f^(1)†f^(1)}_+ + (f^(1)†f^(2) + f^(2)†f^(1))² − (f^(1)†f^(1))³

TABLE A7.4  g_A^(1/2)(n) in Terms of the g_A^(n)

g_A^(1/2)(0) = 1_A
g_A^(1/2)(1) = 0
g_A^(1/2)(2) = (1/2) g_A^(2)
g_A^(1/2)(3) = (1/2) g_A^(3)
g_A^(1/2)(4) = (1/2) g_A^(4) − (1/8) g_A^(2)²
g_A^(1/2)(5) = (1/2) g_A^(5) − (1/8){g_A^(2), g_A^(3)}_+
g_A^(1/2)(6) = (1/2) g_A^(6) − (1/8){g_A^(2), g_A^(4)}_+ − (1/8) g_A^(3)² + (1/16) g_A^(2)³

TABLE A7.5  g_A^(-1/2)(n) in Terms of the g_A^(n)

g_A^(-1/2)(0) = 1_A
g_A^(-1/2)(1) = 0
g_A^(-1/2)(2) = −(1/2) g_A^(2)
g_A^(-1/2)(3) = −(1/2) g_A^(3)
g_A^(-1/2)(4) = −(1/2) g_A^(4) + (3/8) g_A^(2)²
g_A^(-1/2)(5) = −(1/2) g_A^(5) + (3/8){g_A^(2), g_A^(3)}_+
g_A^(-1/2)(6) = −(1/2) g_A^(6) + (3/8){g_A^(2), g_A^(4)}_+ + (3/8) g_A^(3)² − (5/16) g_A^(2)³

TABLE A7.6  g_A^(-1)(n) in Terms of the g_A^(n)

g_A^(-1)(0) = 1_A
g_A^(-1)(1) = 0
g_A^(-1)(2) = −g_A^(2)
g_A^(-1)(3) = −g_A^(3)
g_A^(-1)(4) = −g_A^(4) + g_A^(2)²
g_A^(-1)(5) = −g_A^(5) + {g_A^(2), g_A^(3)}_+
g_A^(-1)(6) = −g_A^(6) + {g_A^(2), g_A^(4)}_+ + g_A^(3)² − g_A^(2)³

TABLE A7.7  H̄_A^(n) in Terms of the g_A^(n) and H_A^(n)

H̄_A^(0) = H_A^(0)
H̄_A^(1) = H_A^(1)
H̄_A^(2) = H_A^(2) + (1/2)[g_A^(2), H_A^(0)]_−
H̄_A^(3) = H_A^(3) + (1/2)[g_A^(3), H_A^(0)]_− + (1/2)[g_A^(2), H_A^(1)]_−
H̄_A^(4) = H_A^(4) + (1/2)[g_A^(4), H_A^(0)]_− + (1/2)[g_A^(3), H_A^(1)]_− + (1/2)[g_A^(2), H_A^(2)]_− − (1/8) g_A^(2)² H_A^(0) + (3/8) H_A^(0) g_A^(2)² − (1/4) g_A^(2) H_A^(0) g_A^(2)
H̄_A^(5) = H_A^(5) + (1/2)[g_A^(5), H_A^(0)]_− + (1/2)[g_A^(4), H_A^(1)]_− + (1/2)[g_A^(3), H_A^(2)]_− + (1/2)[g_A^(2), H_A^(3)]_− − (1/8){g_A^(2), g_A^(3)}_+ H_A^(0) + (3/8) H_A^(0) {g_A^(2), g_A^(3)}_+ − (1/4)(g_A^(2) H_A^(0) g_A^(3) + g_A^(3) H_A^(0) g_A^(2)) − (1/8) g_A^(2)² H_A^(1) + (3/8) H_A^(1) g_A^(2)² − (1/4) g_A^(2) H_A^(1) g_A^(2)
H̄_A^(6) = H_A^(6) + (1/2)[g_A^(6), H_A^(0)]_− + (1/2)[g_A^(5), H_A^(1)]_− + (1/2)[g_A^(4), H_A^(2)]_− + (1/2)[g_A^(3), H_A^(3)]_− + (1/2)[g_A^(2), H_A^(4)]_− − (1/8)({g_A^(2), g_A^(4)}_+ + g_A^(3)²) H_A^(0) + (3/8) H_A^(0) ({g_A^(2), g_A^(4)}_+ + g_A^(3)²) + (1/16) g_A^(2)³ H_A^(0) − (5/16) H_A^(0) g_A^(2)³ − (1/8){g_A^(2), g_A^(3)}_+ H_A^(1) + (3/8) H_A^(1) {g_A^(2), g_A^(3)}_+ − (1/8) g_A^(2)² H_A^(2) + (3/8) H_A^(2) g_A^(2)² − (1/4) g_A^(2) H_A^(2) g_A^(2) − (1/4)(g_A^(2) H_A^(1) g_A^(3) + g_A^(3) H_A^(1) g_A^(2)) − (1/4)(g_A^(2) H_A^(0) g_A^(4) + g_A^(4) H_A^(0) g_A^(2)) − (1/4) g_A^(3) H_A^(0) g_A^(3) + (3/16) g_A^(2) H_A^(0) g_A^(2)² + (1/16) g_A^(2)² H_A^(0) g_A^(2)

TABLE A7.8  H̄_A^(n) in Terms of the g_A^(n) and G_A^(n)

H̄_A^(0) = G_A^(0)
H̄_A^(1) = G_A^(1)
H̄_A^(2) = G_A^(2) − (1/2){g_A^(2), G_A^(0)}_+
H̄_A^(3) = G_A^(3) − (1/2){g_A^(3), G_A^(0)}_+ − (1/2){g_A^(2), G_A^(1)}_+
H̄_A^(4) = G_A^(4) − (1/2){g_A^(4), G_A^(0)}_+ − (1/2){g_A^(3), G_A^(1)}_+ − (1/2){g_A^(2), G_A^(2)}_+ + (3/8){g_A^(2)², G_A^(0)}_+ + (1/4) g_A^(2) G_A^(0) g_A^(2)
H̄_A^(5) = G_A^(5) − (1/2){g_A^(5), G_A^(0)}_+ − (1/2){g_A^(4), G_A^(1)}_+ − (1/2){g_A^(3), G_A^(2)}_+ − (1/2){g_A^(2), G_A^(3)}_+ + (3/8){{g_A^(2), g_A^(3)}_+, G_A^(0)}_+ + (3/8){g_A^(2)², G_A^(1)}_+ + (1/4)(g_A^(2) G_A^(0) g_A^(3) + g_A^(3) G_A^(0) g_A^(2)) + (1/4) g_A^(2) G_A^(1) g_A^(2)
H̄_A^(6) = G_A^(6) − (1/2){g_A^(6), G_A^(0)}_+ − (1/2){g_A^(5), G_A^(1)}_+ − (1/2){g_A^(4), G_A^(2)}_+ − (1/2){g_A^(3), G_A^(3)}_+ − (1/2){g_A^(2), G_A^(4)}_+ + (3/8){{g_A^(2), g_A^(4)}_+ + g_A^(3)², G_A^(0)}_+ − (5/16){g_A^(2)³, G_A^(0)}_+ + (3/8){{g_A^(2), g_A^(3)}_+, G_A^(1)}_+ + (3/8){g_A^(2)², G_A^(2)}_+ + (1/4)(g_A^(2) G_A^(0) g_A^(4) + g_A^(4) G_A^(0) g_A^(2)) + (1/4) g_A^(3) G_A^(0) g_A^(3) − (3/16)(g_A^(2) G_A^(0) g_A^(2)² + g_A^(2)² G_A^(0) g_A^(2)) + (1/4)(g_A^(2) G_A^(1) g_A^(3) + g_A^(3) G_A^(1) g_A^(2)) + (1/4) g_A^(2) G_A^(2) g_A^(2)

TABLE A7.9  G_A^(n)

G_A^(0) = H_AA^(0)
G_A^(1) = H_AA^(1)
G_A^(2) = H_AA^(2) + H_AB^(1) f^(1) + f^(1)†f^(1) H_AA^(0)
G_A^(3) = H_AA^(3) + H_AB^(2) f^(1) + H_AB^(1) f^(2) + f^(1)†f^(1) H_AA^(1) + (f^(1)†f^(2) + f^(2)†f^(1)) H_AA^(0)
G_A^(4) = H_AA^(4) + H_AB^(3) f^(1) + H_AB^(2) f^(2) + H_AB^(1) f^(3) + f^(1)†f^(1)(H_AA^(2) + H_AB^(1) f^(1)) + (f^(1)†f^(2) + f^(2)†f^(1)) H_AA^(1) + (f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1)) H_AA^(0)

APPENDIX 8

Non-relativistic Approximations of the Dirac Hamiltonian

The purpose of this appendix is to list expressions for the various effective operators dealt with in section 6.3a, to order (v/c)⁶, for comparison with expressions reported by DeVries (1970). In order to facilitate this, the terms in the Dirac hamiltonian will be written in the symbolic form

H^(0) = ( m    0 ),    H^(1) = ( 0   a† ),    H^(2) = ( φ   0 ).      (A8.1)
        ( 0   −m )              ( a   0  )             ( 0   φ )

Here all the natural constants have been dropped except for the mass m, which is useful in comparing the formulas below to those in section 6.3a. In this notation, the reduced resolvent is

ρ = (m 1_B − H_BB^(0))^(-1) = (1/2m) 1_B.      (A8.2)

Formulas of the type given in Table 6.6 or 6.7 give the following first six terms in the series for f,

f^(1) = (1/2m) a,
f^(2) = 0,
f^(3) = (1/4m²)[φa − aφ] − (1/8m³) aa†a,
f^(4) = 0,                                     (A8.3)
f^(5) = (1/8m³)[φ²a + aφ² − 2φaφ] + (1/16m⁴)[−2φaa†a + 2aa†aφ + aφa†a − aa†φa] + (1/16m⁵) a(a†a)²,
It is seen that the fourth and sixth order terms here are explicitly non-herraitianv Comaprison of eqs. (A8.4) with the formulas in Table A8.1, obtained using the Paul! elimination method, indicates that both sets are Identical. For calculations of GA and HA„ up to sixth order, formulas of the type given in Tables 6.10 and 6.11 are cumbersome. It 372 is preferable to calculate the series for g^,, * ana-and to use eqs. (6.14) and (6.15b) or (6.15c), respectively. The perturbation series for these metric quantities are given in Tables A8.2 through A8.4. Again, it is seen that they contain only even order terms since they are defined in only a single subspace, S^. Equation (6.14) yields, a<°> - », ap) - o. GA2> " *+ & «*«• and1 r(3) _ o GA ~ °* (A8.5) = -^•(4at0a-0a+a-ata0) ~(afa)2, A 8m- 8m^ GJ6) « —i-7(5at02a-3at0a0-30at0a+ata02+02ata-0a+a0) A 16m? + --^(-•4a+0aata-4ataat0a+(ata)20+0(ata)2+2a+a0ata) 32m. + -l,(a+a>3. 64m? Equation (6.15) yields, Ri2) • * 373. = -K (2at<z5a-ata0-0a+a) - -~ (a+a)2, (A8.6) A 8m 8m3 Hp) . 0„ ^l6) = (2aVa-2at0a0-20at0a+aat02+02ata) A 16m3 +—^r(12at0aa+a-12ataat0a+7(ata)20+70(ata)2+lOata0at 128m-(afa)3. 16m3 Both of these operators are manifestly self-adjoint. Comparison of eqs. (A8.6) with the formulas in Table A8«5 obtained using Eriksen's method (Eriksenr 1958) indicates that the Eriksem hamiltonian is identical to HA„ at least to sixth order. The transformation* V, used by DeVries (1970) to transform the Pauli hamiltonian into the Erlksen hamiltonian, «Er " V HPauli <A8-?> Is given in Table A8,6. Oni comparison! of Tables A8,4 and A8.6, the similarity transformation, V"*1, implied by eq. (A8.7) is seen, to be identical, to fourth order# ta gA"^. 374. 
TABLE A8.1  Pauli Hamiltonian (adapted from DeVries (1970))

H_Pauli^(0) = m
H_Pauli^(1) = 0
H_Pauli^(2) = φ + (1/2m) a†a
H_Pauli^(3) = 0
H_Pauli^(4) = (1/4m²)(−a†aφ + a†φa) − (1/8m³)(a†a)²
H_Pauli^(5) = 0
H_Pauli^(6) = (1/8m³)(a†aφ² − 2a†φaφ + a†φ²a) + (1/16m⁴)(2(a†a)²φ − a†aa†φa + a†aφa†a − 2a†φaa†a) + (1/16m⁵)(a†a)³

TABLE A8.2  g_A — Non-relativistic Approximation

g_A^(0) = 1_A
g_A^(1) = 0
g_A^(2) = (1/4m²) a†a
g_A^(3) = 0
g_A^(4) = (1/8m³)(2a†φa − φa†a − a†aφ) − (1/8m⁴)(a†a)²
g_A^(5) = 0
g_A^(6) = (1/16m⁴)(3a†φ²a − 3φa†φa − 3a†φaφ + φ²a†a + a†aφ² + φa†aφ) + (1/32m⁵)(−4a†aa†φa − 4a†φaa†a + 3φ(a†a)² + 3(a†a)²φ + 2a†aφa†a) + (5/64m⁶)(a†a)³

TABLE A8.3  g_A^(1/2) — Non-relativistic Approximation

g_A^(1/2)(0) = 1_A
g_A^(1/2)(2) = (1/8m²) a†a
g_A^(1/2)(4) = (1/16m³)(2a†φa − φa†a − a†aφ) − (9/128m⁴)(a†a)²
g_A^(1/2)(6) = (1/32m⁴)(3a†φ²a − 3φa†φa − 3a†φaφ + φ²a†a + a†aφ² + φa†aφ) + (1/256m⁵)(−18a†aa†φa − 18a†φaa†a + 13φ(a†a)² + 13(a†a)²φ + 10a†aφa†a) + (49/1024m⁶)(a†a)³

(all odd order terms vanish)

TABLE A8.4  g_A^(-1/2) — Non-relativistic Approximation

g_A^(-1/2)(0) = 1_A
g_A^(-1/2)(1) = 0
g_A^(-1/2)(2) = −(1/8m²) a†a
g_A^(-1/2)(3) = 0
g_A^(-1/2)(4) = −(1/16m³)(2a†φa − φa†a − a†aφ) + (11/128m⁴)(a†a)²
g_A^(-1/2)(5) = 0
g_A^(-1/2)(6) = (1/32m⁴)(−3a†φ²a + 3φa†φa + 3a†φaφ − φ²a†a − a†aφ² − φa†aφ) + (1/256m⁵)(22a†aa†φa + 22a†φaa†a − 15φ(a†a)² − 15(a†a)²φ − 14a†aφa†a) − (49/1024m⁶)(a†a)³

TABLE A8.5  Eriksen Hamiltonian (adapted from DeVries (1970))

H_Er^(0) = m
H_Er^(1) = 0
H_Er^(2) = φ + (1/2m) a†a
H_Er^(3) = 0
H_Er^(4) = −(1/8m²)(a†aφ − 2a†φa + φa†a) − (1/8m³)(a†a)²
H_Er^(5) = 0
H_Er^(6) = (1/16m³)(a†aφ² + φ²a†a − 2a†φaφ − 2φa†φa + 2a†φ²a) + (1/128m⁴)(7(a†a)²φ + 7φ(a†a)² − 12a†aa†φa − 12a†φaa†a + 10a†aφa†a) + (1/16m⁵)(a†a)³

TABLE A8.6  Transformation Connecting H_Pauli and H_Er (adapted from DeVries (1970))

V^(0) = 1_A
V^(1) = 0
V^(2) = (1/8m²) a†a
V^(3) = 0
V^(4) = (1/16m³)(2a†φa − a†aφ − φa†a) − (9/128m⁴)(a†a)²

APPENDIX 9

Additional Perturbation Series — Non-orthonormal Basis

In this appendix, some alternative perturbation formulas, applicable in the case of a non-orthonormal basis, are derived and listed. In particular, the series for the metric g_A and its powers, and two sets of alternative formulas for the operators H̄_A, are given here.

Formulas for the perturbation series for g_A in terms of f and S are obtained straightforwardly by expanding eq. (2.103a) to obtain
g_A = Σ_(n=0)^∞ g_A^(n),

where

g_A^(n) = S_AA^(n) + Σ_(j=1)^(n−1) [S_AB^(n−j) f^(j) + f^(j)† S_BA^(n−j)] + Σ_(i,j ≥ 1; i+j ≤ n) f^(i)† S_BB^(n−i−j) f^(j).      (A9.1)

Explicit expressions for several low order terms of (A9.1) are given in Table A9.1. It is seen that g_A now contains a nonzero first order term.

Similar explicit expressions could be obtained for the matrices g_A^(-1), g_A^(1/2), and g_A^(-1/2). They rapidly become even more lengthy than those in Table A9.1, and lose their usefulness. However, eqs. (A7.4) - (A7.7) still hold, and can be used here to express the perturbation series for these powers of g_A in terms of the series for g_A itself. Such formulas, given in Tables A9.2 - A9.4, are seen to be very similar to the corresponding formulas in an orthonormal basis, given in Tables A7.4 - A7.6. They are generally more lengthy because of the presence of the first order term in g_A.

Finally, formulas such as those given in Tables A9.2 - A9.4 can again be used to obtain useful alternative formulas for the H̄_A^(n) in terms of the g_A^(n) and either H_A^(n) or G_A^(n). Low order expressions of this type are given in Tables A9.5 and A9.6.

TABLE A9.1  g_A^(n) — Non-orthonormal Basis

g_A^(0) = 1_A
g_A^(1) = S_AA^(1)
g_A^(2) = f^(1)†f^(1) + (S_AB^(1) f^(1) + f^(1)† S_BA^(1)) + S_AA^(2)
g_A^(3) = f^(1)†f^(2) + f^(2)†f^(1) + (S_AB^(1) f^(2) + f^(2)† S_BA^(1)) + (S_AB^(2) f^(1) + f^(1)† S_BA^(2)) + f^(1)† S_BB^(1) f^(1) + S_AA^(3)
g_A^(4) = f^(1)†f^(3) + f^(2)†f^(2) + f^(3)†f^(1) + (S_AB^(1) f^(3) + f^(3)† S_BA^(1)) + (S_AB^(2) f^(2) + f^(2)† S_BA^(2)) + (S_AB^(3) f^(1) + f^(1)† S_BA^(3)) + f^(2)† S_BB^(1) f^(1) + f^(1)† S_BB^(1) f^(2) + f^(1)† S_BB^(2) f^(1) + S_AA^(4)
g_A^(5) = f^(1)†f^(4) + f^(2)†f^(3) + f^(3)†f^(2) + f^(4)†f^(1) + (S_AB^(1) f^(4) + f^(4)† S_BA^(1)) + (S_AB^(2) f^(3) + f^(3)† S_BA^(2)) + (S_AB^(3) f^(2) + f^(2)† S_BA^(3)) + (S_AB^(4) f^(1) + f^(1)† S_BA^(4)) + f^(3)† S_BB^(1) f^(1) + f^(1)† S_BB^(1) f^(3) + f^(2)† S_BB^(1) f^(2) + f^(2)† S_BB^(2) f^(1) + f^(1)† S_BB^(2) f^(2) + f^(1)† S_BB^(3) f^(1) + S_AA^(5)

TABLE A9.2  g_A^(1/2)(n) — Non-orthonormal Basis

g_A^(1/2)(0) = 1_A
g_A^(1/2)(1) = (1/2) g_A^(1)
g_A^(1/2)(2) = (1/2) g_A^(2) − (1/8) g_A^(1)²
g_A^(1/2)(3) = (1/2) g_A^(3) − (1/8){g_A^(1), g_A^(2)}_+ + (1/16) g_A^(1)³
g_A^(1/2)(4) = (1/2) g_A^(4) − (1/8){g_A^(1), g_A^(3)}_+ − (1/8) g_A^(2)² + (1/16)(g_A^(1)² g_A^(2) + g_A^(1) g_A^(2) g_A^(1) + g_A^(2) g_A^(1)²) − (5/128) g_A^(1)⁴
g_A^(1/2)(5) = (1/2) g_A^(5) − (1/8)({g_A^(1), g_A^(4)}_+ + {g_A^(2), g_A^(3)}_+) + (1/16)(g_A^(1)² g_A^(3) + g_A^(1) g_A^(3) g_A^(1) + g_A^(3) g_A^(1)² + g_A^(1) g_A^(2)² + g_A^(2) g_A^(1) g_A^(2) + g_A^(2)² g_A^(1)) − (5/128)(g_A^(1)³ g_A^(2) + g_A^(1)² g_A^(2) g_A^(1) + g_A^(1) g_A^(2) g_A^(1)² + g_A^(2) g_A^(1)³) + (7/256) g_A^(1)⁵

TABLE A9.3  g_A^(-1/2)(n) — Non-orthonormal Basis

g_A^(-1/2)(0) = 1_A
g_A^(-1/2)(1) = −(1/2) g_A^(1)
g_A^(-1/2)(2) = −(1/2) g_A^(2) + (3/8) g_A^(1)²
g_A^(-1/2)(3) = −(1/2) g_A^(3) + (3/8){g_A^(1), g_A^(2)}_+ − (5/16) g_A^(1)³
g_A^(-1/2)(4) = −(1/2) g_A^(4) + (3/8)({g_A^(1), g_A^(3)}_+ + g_A^(2)²) − (5/16)(g_A^(1)² g_A^(2) + g_A^(1) g_A^(2) g_A^(1) + g_A^(2) g_A^(1)²) + (35/128) g_A^(1)⁴
g_A^(-1/2)(5) = −(1/2) g_A^(5) + (3/8)({g_A^(1), g_A^(4)}_+ + {g_A^(2), g_A^(3)}_+) − (5/16)(g_A^(1)² g_A^(3) + g_A^(1) g_A^(3) g_A^(1) + g_A^(3) g_A^(1)² + g_A^(1) g_A^(2)² + g_A^(2) g_A^(1) g_A^(2) + g_A^(2)² g_A^(1)) + (35/128)(g_A^(1)³ g_A^(2) + g_A^(1)² g_A^(2) g_A^(1) + g_A^(1) g_A^(2) g_A^(1)² + g_A^(2) g_A^(1)³) − (63/256) g_A^(1)⁵

TABLE A9.4  g_A^(-1)(n) — Non-orthonormal Basis

g_A^(-1)(0) = 1_A
g_A^(-1)(1) = −g_A^(1)
g_A^(-1)(2) = −g_A^(2) + g_A^(1)²
g_A^(-1)(3) = −g_A^(3) + {g_A^(1), g_A^(2)}_+ − g_A^(1)³
g_A^(-1)(4) = −g_A^(4) + {g_A^(1), g_A^(3)}_+ + g_A^(2)² − (g_A^(1)² g_A^(2) + g_A^(1) g_A^(2) g_A^(1) + g_A^(2) g_A^(1)²) + g_A^(1)⁴
g_A^(-1)(5) = −g_A^(5) + {g_A^(1), g_A^(4)}_+ + {g_A^(2), g_A^(3)}_+ − (g_A^(1)² g_A^(3) + g_A^(1) g_A^(3) g_A^(1) + g_A^(3) g_A^(1)² + g_A^(1) g_A^(2)² + g_A^(2) g_A^(1) g_A^(2) + g_A^(2)² g_A^(1)) + (g_A^(1)³ g_A^(2) + g_A^(1)² g_A^(2) g_A^(1) + g_A^(1) g_A^(2) g_A^(1)² + g_A^(2) g_A^(1)³) − g_A^(1)⁵

TABLE A9.5  H̄_A^(n) in Terms of the g_A^(n) and H_A^(n) — Non-orthonormal Basis

H̄_A^(0) = H_A^(0)
H̄_A^(1) = H_A^(1) + (1/2)[g_A^(1), H_A^(0)]_−
H̄_A^(2) = H_A^(2) + (1/2)[g_A^(2), H_A^(0)]_− + (1/2)[g_A^(1), H_A^(1)]_− − (1/8) g_A^(1)² H_A^(0) + (3/8) H_A^(0) g_A^(1)² − (1/4) g_A^(1) H_A^(0) g_A^(1)
H̄_A^(3) = H_A^(3) + (1/2)[g_A^(3), H_A^(0)]_− + (1/2)[g_A^(2), H_A^(1)]_− + (1/2)[g_A^(1), H_A^(2)]_− − (1/8) g_A^(1)² H_A^(1) + (3/8) H_A^(1) g_A^(1)² − (1/4) g_A^(1) H_A^(1) g_A^(1) − (1/8){g_A^(1), g_A^(2)}_+ H_A^(0) + (3/8) H_A^(0) {g_A^(1), g_A^(2)}_+ − (1/4)(g_A^(1) H_A^(0) g_A^(2) + g_A^(2) H_A^(0) g_A^(1)) + (1/16) g_A^(1)³ H_A^(0) − (5/16) H_A^(0) g_A^(1)³ + (1/16) g_A^(1)² H_A^(0) g_A^(1) + (3/16) g_A^(1) H_A^(0) g_A^(1)²

TABLE A9.6  H̄_A^(n) in Terms of the g_A^(n) and G_A^(n) — Non-orthonormal Basis

H̄_A^(0) = G_A^(0)
H̄_A^(1) = G_A^(1) − (1/2){g_A^(1), G_A^(0)}_+
H̄_A^(2) = G_A^(2) − (1/2){g_A^(2), G_A^(0)}_+ − (1/2){g_A^(1), G_A^(1)}_+ + (3/8){g_A^(1)², G_A^(0)}_+ + (1/4) g_A^(1) G_A^(0) g_A^(1)
H̄_A^(3) = G_A^(3) − (1/2){g_A^(3), G_A^(0)}_+ − (1/2){g_A^(2), G_A^(1)}_+ − (1/2){g_A^(1), G_A^(2)}_+ + (3/8){{g_A^(1), g_A^(2)}_+, G_A^(0)}_+ + (3/8){g_A^(1)², G_A^(1)}_+ + (1/4) g_A^(1) G_A^(1) g_A^(1) + (1/4)(g_A^(1) G_A^(0) g_A^(2) + g_A^(2) G_A^(0) g_A^(1)) − (5/16){g_A^(1)³, G_A^(0)}_+ − (3/16)(g_A^(1) G_A^(0) g_A^(1)² + g_A^(1)² G_A^(0) g_A^(1))

APPENDIX 10

Self-Consistent Perturbation Theory When F^(0) Is Not Block Diagonal

The requirement that the zero order part of the Fock matrix be at least block diagonal was imposed in section 7.4 for reasons of convenience rather than necessity. The basic changes in the formalism resulting from a relaxation of that requirement will be summarized here.

If F^(0) has nonzero off-diagonal blocks, eq. (7.23) implies the existence of a zero order term in the series for f, given by the equation

D^(0)(f) = F_BA^(0) + F_BB^(0) f^(0) − f^(0) F_AA^(0) − f^(0) F_AB^(0) f^(0) = 0.      (A10.1)

This equation has a non-zero solution f^(0) in general if F_BA^(0) ≠ 0, because it is just the defining equation for the mapping f^(0) corresponding to the non-block diagonal F^(0). In the coupled Hartree-Fock perturbation formalism, the n-th order equation (defining f^(n)) now becomes

D^(n)(f) = F_BA^(n) + Σ_(j=0)^(n−1) (F_BB^(n−j) f^(j) − f^(j) F_AA^(n−j))
− Σ_(i,j=0)^(n−1) f^(i) F_AB^(n−i−j) f^(j)

= G_BA(f^(n)) + G_BB(f^(n)) f^(0) − f^(0) G_AA(f^(n)) − f^(0) G_AB(f^(n)) f^(0)
  + F_BA^(n)(f₁^(n)) + F_BB^(n)(f₁^(n)) f^(0) − f^(0) F_AA^(n)(f₁^(n)) − f^(0) F_AB^(n)(f₁^(n)) f^(0)
  + Σ_(j=1)^(n−1) (F_BB^(n−j) f^(j) − f^(j) F_AA^(n−j)) − Σ_(i,j=1)^(n−1) f^(i) F_AB^(n−i−j) f^(j)

= 0.      (A10.2)

These equations can be written in the simplified form

D_rs^(n)(f) = Σ_(σ,t) B_(rs,σt) f_(σt)^(n) + C_rs^(n) = 0,      (A10.3)

(r = 1, ..., n_B; s = 1, ..., n_A), but now

B_(rs,σt) = A_(rs,σt) + (terms linear and quadratic in f^(0), arising from the G-operator terms of (A10.2)),      (A10.4)

and

C_rs^(n) = [F_BA^(n)(f₁^(n)) + F_BB^(n)(f₁^(n)) f^(0) − f^(0) F_AA^(n)(f₁^(n)) − f^(0) F_AB^(n)(f₁^(n)) f^(0) + Σ_(j=1)^(n−1) (F_BB^(n−j) f^(j) − f^(j) F_AA^(n−j)) − Σ_(i,j=1)^(n−1) f^(i) F_AB^(n−i−j) f^(j)]_rs.      (A10.5)

The operators F_B and F_A are defined formally in eqs. (2.66a) and (2.65a), respectively.

The additional complexity of eqs. (A10.3) - (A10.5) over the corresponding equations given in section 7.4 for f^(0) = 0 is easily seen. Nevertheless, there are situations in which it may be desirable to use this formalism. For example, if the calculation is to be carried out in a particular basis (for instance, localized orbitals of some sort), it is probable that the zero order Fock operator is not block diagonal. It may, however, be more efficient in such a case to carry out the calculation in a second basis in which F^(0) is at least block diagonal, and then transform the results back to the desired basis. It must be remembered that the presence of a nonzero f^(0) invalidates all the perturbation formulas derived in chapter 7, including those for P_A and E.

APPENDIX 11

Minimization Algorithms

Details of two minimization algorithms referred to in Chapter 8 are given here, with particular reference to direct energy minimization calculations for closed shell systems.

A11.1 Method of Conjugate Gradients

The conjugate gradients method is a descent optimization procedure. It can be regarded as a steepest descent algorithm with memory. As is true of any descent method, the value of the object function cannot diverge here if it is bounded from below.
However, convergence is not guaranteed in general. As applied to the closed shell case, when the energy is to be minimized with respect to the elements of the operator f, the algorithm is as follows:

1. Initialization—an initial estimate of the f-operator, leading to an initial estimate of the density matrix, R, is required. An initial estimate of the Fock matrix, F(R), is calculated from this initial density matrix.

2. The energy gradient is calculated, ∇_f E = 4 F_BA (all quantities real).

3. Given ∇_f E and the search direction used in the previous iteration, v_old, the current search direction is calculated as

v = −∇_f E + β v_old,

where

β = |∇_f E|² / |∇_f E_old|².

If this is the first iteration (or an iteration numbered a multiple of n_A n_B), take β = 0, that is, v = −∇_f E, which is the steepest descent direction.

4. Minimize E(f + λv) as a function of the single parameter λ, representing a step length along the current search direction. This is usually done using a cubic interpolation procedure of Davidon (see Garton and Sutcliffe, 1974).

5. Update, f ← f + λ_min v, and re-evaluate R and F(R). If predetermined convergence criteria have not been satisfied, return to step 2. Otherwise, exit the procedure.

The linear search is the most costly step in the calculation. It is therefore important to use interpolation schemes which do not require a large number of energy evaluations, and which make maximal use of the information available. The cubic interpolation formula will give the exact minimum of a quadratic function, and is therefore quite suitable in direct energy minimization calculations, especially near the energy minimum. In the calculations
?_1 L_2 # (All.l) mini » v A2 " Al While this interpolation formula does not make use of all the information available (it uses the energy derivatives, but not the energy itself), it does have the advantage of not requiring that the energy minimum be bracketed by X^ and X2» If E(X) is a quadratic function, ^m^n.given by (All.l) is exact. Since both the cubic interpolation formula and eq. (All.l) locate the minimum along the search direction only approximately, it is necessary to ensure that E(f + X„, v) is indeed less than mm E(f). If this is not so, then a second interpolation on one of the two subintervals of the original interval must be carried out. Finally, it should be noted that components of the search direction v on surfaces where E is constant can only enter via the memory term. Therefore, if the calculation is converging (that is, if *s decreasing),, then g < 1, and these compo nents are attenuated imsucceeding iterations. All.2 The Newton-Raphson Method The application! of the Newton-Raphson: method to the closed shell self-consistent field calculation involves a different \ 388. strategy for determining stationary values of the energy, namely, solving for the roots of the system of simultaneous nonlinear equations F*t« =0, This method is not a descent method, and eBrA does not necessarily converge to an energy minimum. The overall algorithm as applied to the closed shell case can be summarized* as followst 1* Initialization—same as for the conjugate gradient method.. 2» The energy gradient is calculated, , ^f_E a 4 Fgt- , (all quantities real). 1 B'A 3. The Jacobian matrix is calculated (the Hessian matrix of the energy), 2 "or,«rs " f • or Ts 4.. The Newton-Raphson equations, J6f • -VfE. are solved for the elements of the correction 6f to f. 5.. The f-operator is updated, f ^ f + if, and new esti mates of R and F(R) calculated. If the prescribed convergence criteria are satisfied at this point, the calculation is terminated. 
Otherwise, return to step 2.

The Newton-Raphson algorithm is conceptually simple to implement in the sense that there is no ambiguity present like that associated with the linear search step in the conjugate gradient method. It is second order convergent, one Newton-Raphson iteration being roughly equivalent, in principle, to m conjugate gradient iterations, where m is the number of independent variables in the problem (Daniel, 1965). However, the large amount of computation required per iteration as m becomes large tends to offset the rapid rate of convergence, and it is generally considered inapplicable for self-consistent field calculations, as outlined above.

APPENDIX 12

Derivatives With Respect to Real and Imaginary Parts of f

Most of the formulas derived in this chapter have been in terms of the elements of f and their complex conjugates. Under some circumstances, it is more useful to rewrite these formulas in terms of the real and imaginary parts of f, denoted here as f^R and f^I. If a real basis set is used, it is necessary to have derivatives of the energy only with respect to the real part of f. The formulas for obtaining these derivatives from the previously obtained ones are summarized here.

Writing

f_(σr) = f^R_(σr) + i f^I_(σr),   f*_(σr) = f^R_(σr) − i f^I_(σr),

one has

∂/∂f^R_(σr) = ∂/∂f_(σr) + ∂/∂f*_(σr),   ∂/∂f^I_(σr) = i(∂/∂f_(σr) − ∂/∂f*_(σr)),

and

∂²/∂f^R_(σr)∂f^R_(τs) = ∂²/∂f_(σr)∂f_(τs) + ∂²/∂f_(σr)∂f*_(τs) + ∂²/∂f*_(σr)∂f_(τs) + ∂²/∂f*_(σr)∂f*_(τs),

∂²/∂f^I_(σr)∂f^I_(τs) = −∂²/∂f_(σr)∂f_(τs) + ∂²/∂f_(σr)∂f*_(τs) + ∂²/∂f*_(σr)∂f_(τs) − ∂²/∂f*_(σr)∂f*_(τs),

∂²/∂f^R_(σr)∂f^I_(τs) = i(∂²/∂f_(σr)∂f_(τs) − ∂²/∂f_(σr)∂f*_(τs) + ∂²/∂f*_(σr)∂f_(τs) − ∂²/∂f*_(σr)∂f*_(τs)).

It is worth noting that if both E and f are real, then ∂E/∂f^I vanishes.

APPENDIX 13

Covariant and Contravariant Representations — The General Case

An analysis of the metric properties of the non-orthonormal molecular orbitals defined in eq.
(8.61) for a general multi-partitioning can be carried out in a manner analogous to that of section 2.1.d. The major formulas only are summarized here. We have

e_JK^(i) = (1 − R^(i)S)_JK,   (K ≠ I),      (A13.1a)

and

e_JI^(i) = R_JI^(i).      (A13.1b)

Writing

ḡ^(i) = e^(i)†e^(i),      (A13.2a)

one obtains

ḡ_JI^(i) = (R^(i) − SR^(i)²)_JI,   (J ≠ I),   ḡ_II^(i) = (R^(i)²)_II,

and

ḡ_JK^(i) = (1 − R^(i)S − SR^(i) + SR^(i)²S)_JK,   (J,K ≠ I),      (A13.2b)

demonstrating the non-orthonormality of the e^(i) with respect to the identity in general.

A set of vectors, ẽ^(i), dual to the e^(i), is defined by the biorthonormality condition

ẽ^(i)†e^(i) = 1.      (A13.3)

They are also non-orthonormal with respect to the identity; their metric

g̃^(i) = ẽ^(i)†ẽ^(i)      (A13.4)

differs from the unit matrix in general.
• 2fi)t, (A13.8a) where = d - R(i))LM. (L.M/I). * 0 * gii)fr (MD. (A13.8b) andi *d> . Rd) gII KII Similarly, for the dual vectors, one obtains, &LL^ = *L ' (L = 1, m+1), e(i> = f(i) . »e(i)t -§LI XLI &IL » and *1M * °R (M^LT L,M/I)9, Then, one has, where and g(i) = e(i)te(i> * e(i)et1>t * dlK •Si" • 0 • SIL'*. (L /I)» «I GI 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items