Modeling Operational Risk using the Truncation Approach

by

Daniel P. Hadley

B.Sc., Economics, The University of Scranton, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Statistics)

The University of British Columbia
(Vancouver)

July 2018

© Daniel P. Hadley, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Modeling Operational Risk using the Truncation Approach

submitted by Daniel P. Hadley in partial fulfillment of the requirements for the degree of Master of Science in Statistics.

Examining Committee:

Natalia Nolde, Statistics
Supervisor

Harry Joe, Statistics
Additional Examiner

Abstract

Banks that use the advanced measurement approach to model operational risk may struggle to develop an internal process that produces stable regulatory capital over time. Large decreases in regulatory capital are scrutinized by regulators while large increases may force banks to set aside more assets than necessary. A major source of this instability arises from the loss severity selection process, especially when the selected distribution families for severity risk categories change year-to-year. In this report, we examine the process of selecting severity distributions from a candidate distribution list within the guidelines of the advanced measurement approach, propose useful tools to aid in selecting an appropriate severity distribution, and analyze the effect of selection criteria on regulatory capital. The log sinh-arcsinh distribution family is added to a list of common candidate severity distributions used by industry. This 4-parameter family solves issues introduced by the 4-parameter g-and-h distribution without sacrificing flexibility and shows promise in outperforming 2-parameter families, reducing the frequency of severity distribution families changing year-to-year. Distribution parameters are estimated using the maximum likelihood approach from loss data truncated at a known minimum reporting threshold. Our severity distribution selection process combines truncation probability estimates with Akaike Information Criterion (AIC), Bayesian Information Criterion, modified Anderson-Darling, QQ-plots, and predictive measures such as the quantile scoring function and out-of-sample AIC, and we discuss some of the challenges associated with this process. We then simulate operational losses and calculate regulatory capital, comparing the effect on regulatory capital of selecting loss severity distributions using AIC versus quantile score. A combination of these two criteria is recommended when selecting loss severity distributions.

Lay Summary

Regulatory capital is the minimum amount of capital that a bank must set aside to cover future operational losses. The advanced measurement approach allows banks to develop internal models to calculate regulatory capital by estimating loss frequency and severity distributions from their internal loss data. A bank’s internal model is updated annually, so that regulatory capital reflects the bank’s current business environment. Operational loss severity data are often dominated by low probability, high severity events, and the annual selection of loss severity distributions is a major source for year-to-year volatility in the regulatory capital calculation. To mitigate volatility, this thesis analyzes a loss severity distribution selection process by investigating the log-sinh-arcsinh distribution to model loss severities and distribution selection criteria that combine relative measures of overall fit with predictive performance of extreme quantiles.

Preface

This dissertation is original, unpublished work by the author, Daniel Hadley, under the supervision of Professor Natalia Nolde and Professor Harry Joe.
The research topic was suggested by the Global Risk Management department at Scotiabank. All simulations and analyses were designed and carried out by the author.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication
1 Introduction
2 Loss Distribution Approach
    2.1 Loss Severity Distributions
        2.1.1 Estimating Parameters for Single Severity Distribution Candidates
        2.1.2 Estimating Parameters for Piecewise Severity Distribution Candidates
        2.1.3 Analysis of Estimated Severity Distributions
        2.1.4 Challenges with Maximum Likelihood Estimation for Loss Severity Distributions
    2.2 Loss Frequency Distributions
    2.3 Total Annual Loss and Regulatory Capital Estimation
3 Simulation Studies
    3.1 Exploratory Data Analysis
    3.2 Loss Severity Distribution Estimation and Selection
        3.2.1 SRC 1
        3.2.2 SRC 2
        3.2.3 SRC 3
    3.3 Loss Severity Selection Criteria and Regulatory Capital
        3.3.1 Marginal Distributions
        3.3.2 Regulatory Capital
    3.4 Challenges of Truncation Probabilities
4 Conclusion
Bibliography
A Loss Severity Distributions
    A.1 Lognormal Distribution
    A.2 Generalized Pareto Distribution
    A.3 Burr Distribution
    A.4 Weibull Distribution
    A.5 Loglogistic Distribution
    A.6 g-and-h Distribution
    A.7 log-SaS Distribution
    A.8 Lognormal Spliced Lognormal Distribution
    A.9 Lognormal Spliced Generalized Pareto Distribution
B MLE Results
    B.1 MLE Results for SRC 1
    B.2 MLE Results for SRC 2
    B.3 MLE Results for SRC 3
List of Tables

Table 2.1 Candidate distributions: A distribution has a Regularly Varying (RV) right tail if its density decreases to 0 at the rate \(x^{-b}\) with \(b > 1\), Subexponential (SUBEX) if its density decreases to 0 slower than \(e^{-x}\), but faster than RV, or Superexponential (SUPEX) if its density decreases to 0 faster than \(e^{-x}\). For more theoretical definitions, see Foss et al. [2013].

Table 2.2 Proportion of distribution below zero, estimated parameters, and the minimized negative log-likelihood when fitting the g-and-h distribution to a left-truncated sample using MLE and PMLE.

Table 3.1 Table of frequency and severity distributions used to simulate operational losses for three SRC’s for fourteen years spanning 2004 - 2017. The frequency distribution is the same for each SRC, Pois(λ = 100). The loss severity distributions for SRC 1, SRC 2, and SRC 3 are Burr, log-SaS, and a mixture model where component 1 is simulated from lognormal and component 2 is simulated from Burr.

Table 3.2 Summary Statistics for each simulated SRC named SRC 1, SRC 2, and SRC 3, respectively. From left to right, the columns show the name of the SRC, sample size, minimum observable loss, 25th percentile, median, mean, 75th percentile, and maximum loss.

Table 3.3 SRC 1 selection criteria from left to right: truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile. Values from the true model are given in the first row, and subsequent rows are sorted by AIC from best to worst. Ranks from best (1) to worst (9) are presented for other criteria. The rank for the 99.9% quantile estimate is by distance to the true quantile. Lgn/Lgn and Lgn/Gpd refer to the spliced distributions.

Table 3.4 SRC 2 selection criteria from left to right: truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile. Values from the true model are given in the first row, and subsequent rows are sorted by AIC from best to worst. Ranks from best (1) to worst (9) are presented for other criteria. The rank for the 99.9% quantile estimate is by distance to the true quantile. Lgn/Lgn and Lgn/Gpd refer to the spliced distributions.

Table 3.5 SRC 3 selection criteria from left to right: truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile. Values from the true model are given in the first row, and subsequent rows are sorted by AIC from best to worst. Ranks from best (1) to worst (9) are presented for other criteria. The rank for the 99.9% quantile estimate is by distance to the true quantile. Lgn/Lgn and Lgn/Gpd refer to the spliced distributions.

List of Figures

Figure 2.1 Asymmetry of the quantile scoring function at the 0.999 quantile penalizes underestimates more than overestimates: The solid line is the 0.999-quantile score for a sample of 2500 independent Unif(0,1000) random variables with forecasts of integers 899 to 1299. The true quantile is the dotted vertical line at 999.

Figure 2.2 Power law mimicking behavior of a lognormal distribution with sufficiently high truncation as seen from a log-log plot: The left-hand plot shows 4 truncated samples of 100, 400, 10,000, and 100,000 independent lognormal random variables with parameters µ = −8 and σ = 4.5, truncated at the 10%, 75%, 99%, and 99.9% quantiles, respectively. The right-hand plot shows 4 truncated samples of 100, 400, 10,000, and 100,000 independent Pareto random variables with parameter α = 1, truncated at the 10%, 75%, 99%, and 99.9% quantiles, respectively. The lognormal samples resemble the constant slope of the Pareto samples when truncation is high enough.

Figure 2.3 Plot of the truncated sample density and the estimated densities under the MLE and PMLE approaches on the log scale. The PMLE parameters produce a right-tail that is almost identical to the MLE distribution, but places a higher probability of a loss occurring around the mode instead of the probability of negative losses estimated by the MLE approach.

Figure 3.1 Number of observable annual losses from 2004 - 2017 for each SRC with the mean and variance under each plot.

Figure 3.2 Top Row: Sample densities of the log-losses for each SRC; Second Row: Histogram for the smallest 75% of raw losses.

Figure 3.3 The sample density plotted against the true severity model and each estimated candidate distribution for SRC 1 on the log scale. The number of observations in SRC 1 is 1368, and the smoothing bandwidth is 0.1709.

Figure 3.4 QQ-Plots for the true model and each estimated candidate distribution for SRC 1. Estimated quantiles are given on the vertical axis with empirical quantiles along the horizontal. Points below the 45° line indicate underestimates of the empirical quantile.

Figure 3.5 The sample density plotted against the true severity model and each estimated candidate distribution for SRC 2 on the log scale. The number of observations in SRC 2 is 1342, and the smoothing bandwidth is 0.2209.

Figure 3.6 QQ-Plots for the true model and each estimated candidate distribution for SRC 2. Estimated quantiles are given on the vertical axis with empirical quantiles along the horizontal. Points below the 45° line indicate underestimates of the empirical quantile.

Figure 3.7 The sample density plotted against the true severity model and each estimated candidate distribution for SRC 3 on the log scale. The number of observations in SRC 3 is 1365, and the smoothing bandwidth is 0.1479.

Figure 3.8 QQ-Plots for the true model and each estimated candidate distribution for SRC 3. Estimated quantiles are given on the vertical axis with empirical quantiles along the horizontal. Points below the 45° line indicate underestimates of the empirical quantile.

Figure 3.9 The top plot shows the log of the 99.9% quantile from SRC 1’s loss severity distribution as selected by AIC and QS. The shorthand name for loss severity distribution shows each year’s selection. The bottom plot shows the log of the 99.9% quantile for SRC 1’s total annual loss marginal distribution.

Figure 3.10 The top plot shows the log of the 99.9% quantile from SRC 2’s loss severity distribution as selected by AIC and QS. The shorthand name for loss severity distribution shows each year’s selection. The bottom plot shows the log of the 99.9% quantile for SRC 2’s total annual loss marginal distribution.

Figure 3.11 The top plot shows the log of the 99.9% quantile from SRC 3’s loss severity distribution as selected by AIC and quantile score. The shorthand name for loss severity distribution shows each year’s selection. The bottom plot shows the log of the 99.9% quantile for SRC 3’s total annual loss marginal distribution.

Figure 3.12 The top plot shows the log of RC as the 99.9% quantile for total annual operational loss when using a t-copula with 10 degrees of freedom and correlation parameter of 0 to combine losses across SRC’s. The bottom plot shows the log of RC with a correlation parameter of 0.1. In both plots, the true RC is the solid line, the dashed line is RC using estimated distributions selected by AIC, and the dotted line shows RC using estimated distributions selected by QS.

Figure 3.13 Distribution of truncation probability estimates for 1000 truncated samples of size 2500 at the 2.5%, 5%, 10%, 20%, and 40% quantiles. Maximum likelihood estimation is performed using the Burr distribution. Even under these optimal conditions, there is a lot of uncertainty in the truncation probability estimate.

Figure B.1 SRC 1 MLE parameters for each candidate distribution when including all loss data before each row’s designated year. The last row uses all simulated data. Lgn/Lgn and Lgn/Gpd refer to Lognormal Body Spliced with Lognormal Tail (LGNLGN) and Lognormal Body Spliced with Generalized Pareto Tail (LGNGPD), respectively.

Figure B.2 SRC 2 MLE parameters for each candidate distribution when including all loss data before each row’s designated year. The last row uses all simulated data. Lgn/Lgn and Lgn/Gpd refer to LGNLGN and LGNGPD, respectively.

Figure B.3 SRC 3 MLE parameters for each candidate distribution when including all loss data before each row’s designated year. The last row uses all simulated data. Lgn/Lgn and Lgn/Gpd refer to LGNLGN and LGNGPD, respectively.
Glossary

AIC       Akaike’s Information Criterion
AMA       Advanced Measurement Approach
BASEL II  Basel II framework
BCBS      Basel Committee on Banking Supervision
BIC       Bayesian Information Criterion
BUR       Burr Distribution
CDF       Cumulative Distribution Function
EVT       Extreme Value Theory
GNH       g-and-h Distribution
GPD       Generalized Pareto Distribution
LDA       Loss Distribution Approach
LDCE      Loss Data Collection Exercise
LGN       Lognormal Distribution
LGNGPD    Lognormal Body Spliced with Generalized Pareto Tail
LGNLGN    Lognormal Body Spliced with Lognormal Tail
LLOG      Loglogistic Distribution
LSAS      log sinh-arcsinh Distribution
MLE       Maximum Likelihood Estimation
OOS       Out-of-Sample
OR        Operational Risk
ORX       ORX Global Database
PDF       Probability Density Function
PMLE      Penalized Maximum Likelihood Estimation
QS        Quantile Scoring Function
RC        Regulatory Capital
RRC       Regulatory Risk Category
RV        Regularly Varying
SAS       Sinh-arcSinh Distribution
SRC       Severity Risk Category
SUBEX     Subexponential
SUPEX     Superexponential
WBL       Weibull Distribution

Acknowledgments

I am forever grateful to my supervisor, Professor Natalia Nolde, for her patience, insight, and guidance throughout my research. Natalia is an invaluable resource not only for statistical knowledge, but also for advice on navigating life in graduate school. I am truly humbled by the time she has spent with me, the knowledge she has imparted, and the patience with which she has helped me. I hope to one day emulate her qualities as a professor. I also want to thank Professor Harry Joe for being my second reader and for always keeping his door open to my ceaseless questions.

Thank you to the Scotiabank Cybersecurity and Risk Analytics Initiative for funding my research, and to the Global Risk Management team at Scotiabank in Toronto for sharing their expertise.

Dedication

To my wife, Jessica, whose unending support and sacrifice allowed me to pursue this dream. She is my biggest advocate and best friend.
I am fortunate to have such a loving partner to help me through life’s challenges.

To my parents, for setting a good example by being the hardest working people I know.

Chapter 1

Introduction

According to the Basel II framework (BASEL II) last updated in 2006, Operational Risk (OR) as defined by the Basel Committee on Banking Supervision (BCBS) is

    “the risk of a loss resulting from inadequate or failed internal processes, people and systems, or from external events. This definition includes legal risk but excludes strategic and reputational risk.” [BCBS, 2006]

Thus, OR is the risk arising from execution of a company’s business functions [Embrechts and Hofert, 2011].

BASEL II provides recommendations on banking regulations which are issued and updated by BCBS. In this report, we focus on the Advanced Measurement Approach (AMA) to calculate the minimum capital requirements for operational risk. BASEL II intentionally grants banks a high degree of flexibility under the AMA to encourage growth in the discipline [Embrechts and Hofert, 2011]. The minimum amount of capital that a bank must set aside for one year to mitigate potential operational losses is called Regulatory Capital (RC) and is calculated each year as the 99.9%-quantile of the estimated total operational loss distribution. The total operational loss distribution is commonly estimated using the Loss Distribution Approach (LDA), which requires a bank to estimate and select distribution functions for both the loss frequency and severity of each category of operational losses.

A major obstacle facing a bank using the LDA is to develop a procedure that calculates stable RC from year-to-year. Since RC must be set aside as a reserve, it represents assets that cannot be used freely by the bank to generate revenue. Thus, a bank has a vested interest in minimizing the probability of overestimating operational risk.
Since regulators scrutinize large decreases in a bank’s RC from one year to the next, a bank may have difficulty correcting an overestimation. Stability in RC is a common problem faced by industry that is not well covered by operational risk research.

A large source of RC’s instability is the loss severity distribution selection process, which uses historical data to estimate distributions for a list of candidate distribution families and selects the “best” distribution based on some criteria. This process is repeated annually for each risk category. When a risk category’s severity distribution changes from one family to another, the category’s contribution to total annual operational risk is likely to change dramatically. Thus, we are interested in finding a flexible distribution family that can outperform other candidates year-after-year.

This flexible distribution family should be able to capture various tail behaviors and model the upper and lower tails separately. Two-parameter distribution families, while popular in industry, are not flexible enough to accomplish both goals. Spliced and mixture distributions can model the two tails separately, but are usually not able to capture various tail behaviors and are thus susceptible to changing distribution families. The four-parameter g-and-h distribution [Hoaglin, 1985] is a highly flexible distribution commonly used in operational risk, but has many challenges in practice. To maintain the flexibility of a four-parameter family while alleviating problems of the g-and-h distribution, we investigate the Sinh-arcSinh Distribution (SAS) [Jones and Pewsey, 2009] as a candidate model for loss severities.

Besides the list of candidate distribution families, we also analyze criteria for the loss severity distribution selection process. Since RC is the 99.9%-quantile of the total annual operational loss distribution, we are interested in criteria that assess the fit of the extreme right tail.
Common estimators such as Akaike’s Information Criterion (AIC) [Akaike, 1974] and Bayesian Information Criterion (BIC) [Schwarz, 1978] use likelihoods that give equal weight to each observation in a dataset and thus may overweight the central portion of the data. We investigate the Quantile Scoring Function (QS) [Gneiting, 2011], which allows us to assess an estimated distribution’s performance for specified quantiles, truncation probability estimates, and QQ-plots to aid in the selection of a loss severity distribution.

Banks struggling to calculate stable RC could be a contributing factor to the move away from the AMA. In the next manifestation of the Basel accords, Basel III, the AMA is being removed from the regulatory framework.

    “The option to use an internal model-based approach for measuring operational risk - the ‘Advanced Measurement Approaches’ (AMA) - has been removed from the operational risk framework. BCBS believes that modeling of operational risk for regulatory capital purposes is unduly complex and that the AMA has resulted in excessive variability in risk-weighted assets and insufficient levels of capital for some banks.” [BCBS, 2016]

Regardless of regulatory requirements, OR remains a potentially catastrophic threat to a financial institution. In addition to the direct loss resulting from an operational loss event, financial institutions are likely to suffer further damages due to the loss of trust of their customers. These additional losses are considered reputational losses and are specifically excluded from OR. Even as regulation moves away from the AMA, a bank still has plenty of motivation for internal modeling of their OR as evidenced by recent operational losses.

According to the article, “The Final Bill - financial crime” [Economist, 2016], there have been 188 settlements since 2009 for criminal and civil prosecutions against banks costing $219 billion as of August 2016. Eleven firms have paid penalties in excess of 10% of their market capitalization.
For example, Bank of America has paid the most in both dollars ($77 billion) and as a percentage of its market capitalization (50%). As a result, banks that saw opportunities for profit by operating in countries where bribery and suspicious transactions are tolerated are now finding that the cost of operating in these environments exceeds profits. More recently, a March 2018 article in the Wall Street Journal [Strasburg, 2018] reports that Barclays was ordered to pay $2 billion in civil penalties for fraudulently selling mortgage securities that contributed to the 2008 financial crisis. Additionally, two former Barclays executives were considered personally responsible for their role and ordered to pay $2 million. In April 2018, Wells Fargo was fined $1 billion for the “bank’s failures to catch and prevent problems, including improper charges to consumers in its mortgage and auto-lending businesses” [Hayahsi, 2018]. With so much at stake, we believe it is in a bank’s best interest to continue to assess its OR exposure using the highly flexible AMA.

This report presents possible modeling procedures within the AMA guidelines and identifies assumptions and methodologies applied to the RC estimation procedure that may contribute to the variability in risk-weighted assets cited by the BCBS. The rest of this report is outlined as follows: Section 2 reviews the LDA under BASEL II and outlines our AMA for estimating the loss severity and frequency distributions for the various severity risk categories. We also address important challenges one may encounter when following our procedure. Section 3 presents a simple, heuristic approach to model estimation and selection of loss severity distributions using simulated data. We select loss severity models from a candidate list based on two different criteria, AIC and QS, and compare the impact of the selection criteria on RC. All simulations and numerical analyses use the R software available at https://www.r-project.org/.
Then, we examine the issue of truncation probability estimates. Section 4 summarizes our conclusions and suggests areas for future research.

Chapter 2

Loss Distribution Approach

Under the BASEL II AMA guidelines, bank activities are partitioned into eight business lines and operational losses are categorized into seven event types. We call the occurrence of an operational loss a loss event. Loss events are mapped to an intersection of business line and event type, so that each loss event falls into one of 56 business line/event type intersections, called a Regulatory Risk Category (RRC).

The eight business lines are corporate finance; trading and sales; retail banking; commercial banking; payment and settlement; agency services; asset management; and retail brokerage. The seven event types are internal fraud; external fraud; employment practices and workplace security; clients, products, and business practices; damage to physical assets; business disruption and system failures; and execution, delivery, and process management. As an example, we look at the $2 billion penalty assessed to Barclays [Strasburg, 2018] mentioned in Section 1. The loss amount is $2 billion occurring in year 2018 for the retail banking business line/internal fraud event type. A loss event is limited to one event type, but may affect multiple business lines simultaneously.

The AMA guidelines allow a bank to use another mapping, so long as it is transparent to third parties, approved by the board of directors, and independently reviewed. The number of internal event types and business lines may vary for different banks and usually depends upon the size of the bank and the amount of data available. Regardless of a bank’s internal mapping, the bank must be able to map its historical losses to the eight business lines and seven event types outlined in BASEL II.
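The standard business line/event type grid described above can be written down as a small lookup structure. The following Python sketch is illustrative only: the names RRC_GRID and make_loss_event and the record layout are hypothetical and are not taken from BASEL II or from the analyses in this report, which were carried out in R.

```python
from itertools import product

# The eight business lines and seven event types listed in the text.
BUSINESS_LINES = (
    "corporate finance", "trading and sales", "retail banking",
    "commercial banking", "payment and settlement", "agency services",
    "asset management", "retail brokerage",
)
EVENT_TYPES = (
    "internal fraud", "external fraud",
    "employment practices and workplace security",
    "clients, products, and business practices",
    "damage to physical assets",
    "business disruption and system failures",
    "execution, delivery, and process management",
)

# Each (business line, event type) intersection is one RRC: 8 x 7 = 56.
RRC_GRID = frozenset(product(BUSINESS_LINES, EVENT_TYPES))


def make_loss_event(year, amount, business_line, event_type):
    """Build a loss-event record (hypothetical layout) after validating it."""
    rrc = (business_line, event_type)
    if rrc not in RRC_GRID:
        raise ValueError(f"unknown business line/event type: {rrc}")
    if amount <= 0:
        raise ValueError("loss amount must be positive")
    return {"year": year, "amount": amount, "rrc": rrc}


# The Barclays penalty, classified as in the example in the text.
barclays = make_loss_event(2018, 2_000_000_000, "retail banking", "internal fraud")
print(len(RRC_GRID), barclays)
```

A bank with its own internal mapping would, under this sketch, keep a second table from internal categories to these 56 keys, so that every historical loss can still be reported on the standard grid.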
A loss event has a time stamp, a loss amount, and an associated RRC. The time stamp is usually a fiscal quarter or year and the loss amount is a positive value. The number of loss events that occur in a particular RRC over a given time period is called the loss frequency. The loss amount for a loss event is called loss severity.

Throughout this report, we exclusively consider an annual loss frequency which has many benefits. First, RC is an annual forecast so it is natural to work with historical data on the same frequency. Secondly, the use of an annual frequency naturally mitigates some of the reporting biases that were evidenced by the 2004 Loss Data Collection Exercise (LDCE). According to Dutta and Perry [2006], data collected during the 2004 LDCE show both structural reporting bias and temporal clustering of losses. Structural reporting bias is evidenced by a clear trend in loss events over time, most commonly seen as an increase in loss events as a bank’s systems and processes for identifying operational losses improve. However, improved systems and processes may also decrease loss events as risk is identified and mitigated. Structural reporting bias can also affect loss severity, since earlier systems are more likely to catch large losses as opposed to smaller losses. The second type of reporting bias, temporal clustering of losses, commonly manifests as a disproportionate number of losses occurring on the last day of a fiscal quarter or the last fiscal quarter in a year. Thus, an annual frequency alleviates the temporal clustering bias and structural reporting bias for data at higher than annual frequencies, but may not completely address structural reporting bias for annual loss frequencies. We refer to Chavez-Demoulin et al.
[2015] for a promising general solution to this problem that uses covariates to model trends in a loss frequency, but we do not incorporate those methods here since our simulated data assume no trend.

Historical loss data are used to estimate the loss frequency distribution and the loss severity distribution by interpreting the historical loss data as the realizations of random variables. Using notation adapted from Embrechts and Hofert [2011], we denote a loss event by

\[ \{X^{b,l}_{t,n}\}, \quad \text{for } t = 1,2,\dots,T;\; b = 1,2,\dots,B;\; l = 1,2,\dots,L;\; n = 1,2,\dots,N^{b,l}_{t}; \tag{2.1} \]

where \(X^{b,l}_{t,n}\) is a random variable for the loss severity of the \(n\)th loss event occurring in year \(t\) for business line \(b\) and event type \(l\), and \(N^{b,l}_{t}\) is a random variable for the number of losses occurring in year \(t\) for business line \(b\) and event type \(l\). Thus, the total annual operational loss for next fiscal year can be calculated as

\[ S_{T+1} = \sum_{b=1}^{B} \sum_{l=1}^{L} \sum_{n=1}^{N^{b,l}_{T+1}} X^{b,l}_{T+1,n}. \tag{2.2} \]

The goal of the LDA is to estimate the distribution of \(S_{T+1}\) and calculate the 99.9% quantile of the distribution of \(S_{T+1}\). This number is the bank’s RC for the next year.

The random variable \(S_{T+1}\) has two sources of randomness, the loss frequency and the loss severity. The loss frequency, \(N^{b,l}_{T+1}\), is a discrete random variable for the number of losses in year \(T+1\) in business line \(b\) and event type \(l\). The loss severity, \(X^{b,l}_{t,n}\), is a non-negative, continuous random variable as defined in (2.1). When calculating RC, we follow two common assumptions in operational risk modeling:

• The loss frequency, \(N^{b,l}_{T+1}\), is independent of the loss severity, \(X^{b,l}_{t,n}\), for a given \((b, l)\)
• Loss severities are independent and identically distributed within a RRC

The first assumption does not exclude dependence of loss frequencies or severities across business lines/event types.
The second assumption treats loss severities as independent and identically distributed through time, so that the distribution of \(X^{b,l}_{t,n}\) does not depend on \(t\) or \(n\).

Under an AMA, a bank must have well-documented procedures that justify the ongoing relevance of the historical loss data included in the RC calculation. The historical loss data should reflect all current, material activities, risk exposures, and all relevant losses over a minimum gross loss threshold. For a bank’s internal losses, BCBS sets the minimum threshold at $10,000. The bank’s internal loss data must be exclusively used when estimating the loss frequency for each RRC. When estimating loss severity distributions, however, data often contain too few observations to reliably estimate a distribution for each RRC. Therefore, a bank may combine losses across multiple business lines for a given event type to form a Severity Risk Category (SRC), so long as the bank’s activities across business lines for a given loss event type are similar enough to justify the assumption of a single loss severity distribution and the mapping of each loss to the standard business line/event type matrix is disclosed. The option for a bank to create SRC’s is explicitly stated by BCBS Supervisory Guidelines [BCBS, 2011]:

    “A bank should determine the optimum balance between granularity of the classes and volume of historical data for each class.”

While the desired number of loss events for a SRC is not given, research by Grooters and Reinink [2013] suggests optimal sample sizes between 500 and 10,000.

Even after forming SRC’s, internal data may still be too sparse to estimate loss severity distributions for each SRC. To address this concern, BASEL II allows a bank’s internal data to be supplemented by an external database. If the minimum reporting threshold of the data in the database differs from the bank’s, the higher threshold is used and applied to all internal and external losses for that SRC.
Minimum reporting thresholds and losses measured in different currencies are converted to a common currency using current exchange rates. Some external datasets include losses from banks of various sizes located all over the world, so care should be taken to filter the data so that they are appropriate to a bank’s current business activities both in size and scope. For example, the ORX Global Database (ORX) contains more than 500,000 loss events whose loss amount exceeds a threshold of €20,000. Some ORX loss amounts are several times larger than the total assets held by a small to medium sized bank, and such events should be excluded for these banks. Also, international banks may face sources of risk that are not relevant to a local bank. If ORX external data are used to supplement internal SRC data, the higher ORX threshold should be applied to both internal and external data for that SRC. Since some SRC’s may need to be supplemented by external data and others may not, a bank may have different minimum reporting thresholds for different SRC’s.

For the remainder of this section, we discuss a procedure that uses historical loss data to estimate the loss frequency and severity distributions. When estimating loss frequency and severity distributions, we use SRC’s instead of RRC’s. To avoid confusion, a brief discussion of the mapping between RRC’s and SRC’s is presented.

For each loss event type $l$, a bank may combine losses across one or more business lines, $b_{i_1}, b_{i_2}, \ldots, b_{i_k}$, and map the associated loss severities $X^{b_{i_j},l}_{t,n}$, for $t = 1,2,\ldots,T$; $n = 1,2,\ldots,N^{b_{i_j},l}_{t}$; $j = 1,2,\ldots,k \le B$, to one and only one SRC, $r$. Thus, loss severities are pooled across time and business lines to create the historical loss data in SRC $r$. 
These pooled loss severities are treated as independent and identically distributed random variables from some loss severity distribution, $F_r$.

Loss frequency distributions, however, are estimated separately for each business line/event type, $(b,l)$, by pooling the historical loss frequencies across time: $n^{b,l}_{1}, n^{b,l}_{2}, \ldots, n^{b,l}_{T}$. From this historical dataset, we estimate each loss frequency distribution, $F_{b,l}$. Thus, the loss frequency for SRC $r$ is a random variable,

$$N^{r}_{t} = N^{b_{i_1},l}_{t} + N^{b_{i_2},l}_{t} + \cdots + N^{b_{i_k},l}_{t},$$

where $N^{b_{i_j},l}_{t} \sim F_{b_{i_j},l}$. In some instances, such as when the $N^{b_{i_j},l}_{t}$ are independent with $N^{b_{i_j},l}_{t} \sim \mathrm{Pois}(\lambda_j)$, the distribution of $N^{r}_{t}$ is easily derived as $\mathrm{Pois}(\lambda_1 + \lambda_2 + \cdots + \lambda_k)$.

Using SRC’s simplifies our notation so that losses from (2.1) can be rewritten as

$$\{X^{r}_{t,n}\}, \quad \text{for } t = 1,2,\ldots,T;\ r = 1,2,\ldots,R;\ n = 1,2,\ldots,N^{r}_{t}, \tag{2.3}$$

and the total annual operational loss from equation (2.2) can be rewritten as

$$S_{T+1} = \sum_{r=1}^{R}\sum_{n=1}^{N^{r}_{T+1}} X^{r}_{T+1,n}. \tag{2.4}$$

2.1 Loss Severity Distributions

A loss severity distribution, modeled separately for each SRC, has an associated density function that describes the probability of the loss amount given that an operational loss event occurs. For our LDA procedure, we use nine loss severity distribution families to model loss amounts for each SRC. These distributions are listed in Table 2.1. Of these nine candidates, seven consist of a single distribution that describes all loss amounts, and two are piecewise distributions. A piecewise distribution can potentially handle a situation where loss events in the body and tail are driven by two distinct processes. The splicing point of a piecewise distribution is the loss amount that separates the body and tail of the spliced distribution. All losses exceeding the splicing point fall into the tail and all losses below fall into the body. 
The splicing point is treated as an additional parameter that must be estimated.

The single distribution candidate families are selected to span various tail behaviors and include common distributions employed by industry. In a survey published in 2009, BCBS reported that 33% of surveyed banks use the lognormal distribution and 17% use the Weibull distribution when modeling losses by a single severity distribution. This same survey reported that when modeling the body and tail distributions separately, 14% use lognormal to model the tail while 31% use generalized Pareto [BCBS, 2009].

Since operational loss data only include losses whose loss amounts exceed a minimum threshold, the datasets are incomplete. In particular, since we have no information about the frequency and severity of losses occurring below the threshold, the data are said to be truncated from below or left-truncated. Note that left-truncated data differ from left-censored data, because left-censored data include the number of observations below the minimum threshold.

There are three distinct approaches for handling left-truncated data. The first approach, commonly referred to as the naive approach, ignores the threshold and models the dataset as if it were complete. Evidence in the existing literature indicates that this approach may underestimate both the loss frequency and loss severity simultaneously [Baud et al., 2003, Chernobai et al., 2005, Luo et al., 2007]. As a result, the naive approach is likely to underestimate RC. The second approach models the excess loss amount over the minimum threshold. This is referred to as the shifted approach, since loss amounts are shifted downward by the amount of the reporting threshold. Luo et al. [2007] state that the shifted approach underestimates loss frequency, but overestimates loss severity. The aggregate effect on RC is thus uncertain. 
However, Dutta and Perry [2006] use this approach to produce “realistic” RC estimates using the g-and-h distribution as the loss severity distribution, where realistic RC estimates are estimates that constitute less than 3% of a bank’s total assets. Therefore, the shifted approach should not be completely disregarded, especially when data and expert opinion for losses below the threshold are unavailable. The third approach, called the conditional approach or truncation approach, treats operational loss data as left-truncated at the minimum reporting threshold.

| Distribution | Notation | Parameters | Tail Behavior |
|---|---|---|---|
| Lognormal | LGN | location: $\mu \in \mathbb{R}$; scale: $\sigma > 0$ | SUBEX |
| Generalized Pareto | GPD | location: $\mu \in \mathbb{R}$; scale: $\theta > 0$; tail: $\xi \in \mathbb{R}$ | $\xi > 0 \Rightarrow$ RV; $\xi = 0 \Rightarrow$ Exponential; $\xi < 0 \Rightarrow$ Bounded above |
| Burr | BUR | shape 1: $\alpha \in \mathbb{R}$; shape 2: $\gamma > 0$; scale: $\theta > 0$ | RV |
| Weibull | WBL | shape: $a > 0$; scale: $\theta > 0$ | $a < 1 \Rightarrow$ SUBEX; $a > 1 \Rightarrow$ SUPEX |
| Loglogistic | LLOG | shape: $\gamma > 0$; scale: $\theta > 0$ | RV |
| g-and-h | GNH | location: $a \in \mathbb{R}$; scale: $b > 0$; skewness: $g \in \mathbb{R}$; elongation: $h \in \mathbb{R}$ | $h > 0 \Rightarrow$ RV; $h = 0 \Rightarrow$ SUBEX; $h < 0 \Rightarrow$ SUPEX |
| Log-sinh-arcsinh | LSAS | location: $a \in \mathbb{R}$; scale: $b > 0$; skewness: $\varepsilon \in \mathbb{R}$; elongation: $\delta > 0$ | $\delta \le 0.5 \Rightarrow$ RV; $0.5 < \delta < 1 \Rightarrow$ SUBEX; $\delta > 1 \Rightarrow$ SUPEX |
| Lognormal Body, Lognormal Tail | LGNLGN | location: $\mu_b \in \mathbb{R}$; scale: $\sigma_b > 0$; splice point: $x_s \in \mathbb{R}$; location: $\mu_u \in \mathbb{R}$; scale: $\sigma_u > 0$ | SUBEX |
| Lognormal Body, Pareto Tail | LGNGPD | location: $\mu_b \in \mathbb{R}$; scale: $\sigma_b > 0$; splice point: $x_s \in \mathbb{R}$; tail: $\xi \in \mathbb{R}$ | $\xi > 0 \Rightarrow$ RV; $\xi = 0 \Rightarrow$ Exponential; $\xi < 0 \Rightarrow$ Bounded above |

Table 2.1: Candidate distributions: A distribution has a Regularly Varying (RV) right tail if its density decreases to 0 at the rate $x^{-b}$ with $b > 1$, Subexponential (SUBEX) if its density decreases to 0 slower than $e^{-x}$, but faster than RV, or Superexponential (SUPEX) if its density decreases to 0 faster than $e^{-x}$. For more theoretical definitions, see Foss et al. [2013].

The truncation approach assumes the following:

• Losses below and above the minimum threshold belong to the same distribution
• For SRC $r$, loss frequency, $N^{r}_{t}$, and loss severity, $X^{r}_{t,n}$, can be treated as independent random variables
• For SRC $r$, all loss severities, $X^{r}_{t,n}$, are independent and identically distributed random variables from the loss severity distribution, $F_r$.

While the truncation approach is often favored over the naive approach (see Chapter 9 of Chernobai et al. [2007]), a recent study by Yu and Brazauskas [2017] shows that the truncation approach often leads to lower RC estimates than both the naive and shifted approaches. Whether these lower RC estimates are more accurate is still up for debate. We believe that the truncation approach introduces uncertainty regarding the proportion of operational losses that occur below the truncation point, which may have contributed to the results of Yu and Brazauskas. We elaborate on this issue in Section 3.4.

The truncation approach estimates the loss severity distribution using only the observed data above the reporting threshold. For $n$ loss events exceeding the minimum threshold in a given SRC, assume loss amounts $X_1, X_2, \ldots, X_n \overset{iid}{\sim} F_r$, for some loss severity distribution $F_r$ with parameter vector $\Theta$. For the remainder of Section 2.1, we drop the index $r$ for loss severity distributions. If we let $\tau$ represent the minimum reporting threshold of the given SRC, where $\tau$ is always non-random, then the conditional Cumulative Distribution Function (CDF) for a loss given that it exceeds $\tau$ is defined by

$$\tilde{F}(x;\Theta,\tau) = \frac{F(x;\Theta) - F(\tau;\Theta)}{1 - F(\tau;\Theta)}. \tag{2.5}$$

The conditional Probability Density Function (PDF) for a loss given that it exceeds $\tau$ is

$$\tilde{f}(x;\Theta,\tau) = \frac{d}{dx}\tilde{F}(x;\Theta,\tau) = \frac{f(x;\Theta)}{1 - F(\tau;\Theta)}. \tag{2.6}$$

We can find the estimated unconditional CDF, $F(x)$, and PDF, $f(x)$, by assuming a parametric form for $F$ and performing Maximum Likelihood Estimation (MLE) on a sample, $\mathbf{x}$, using the conditional likelihood function,

$$\tilde{L}(\Theta;\mathbf{x},\tau) = \prod_{i=1}^{n}\tilde{f}(x_i;\Theta,\tau) = \left[1 - F(\tau;\Theta)\right]^{-n}\prod_{i=1}^{n} f(x_i;\Theta). \tag{2.7}$$

Maximizing $\tilde{L}(\Theta;\mathbf{x},\tau)$ over $\Theta$ results in distribution parameter estimates $\hat{\Theta}$.

We define truncation probability as the probability that a loss event’s severity is less than or equal to the minimum reporting threshold, given a loss event has occurred. Truncation probability is estimated for each SRC by evaluating the estimated unconditional CDF at the minimum reporting threshold, $F(\tau;\hat{\Theta})$. When using the truncation approach to estimate loss severity distributions, this estimate is used to correct the downward bias of the historical loss event frequency for the given SRC when simulating losses over the entire SRC distribution. See Section 2.2 for details on estimating the loss frequency.

2.1.1 Estimating Parameters for Single Severity Distribution Candidates

Seven of the nine candidate distributions assume that all loss amounts above and below $\tau$ in a given SRC can be estimated by a single distribution. This subsection details the estimation procedure for the unconditional candidate distribution families: lognormal, generalized Pareto, Burr, Weibull, loglogistic, g-and-h, and log-SaS. See Appendix A for details on the loss severity distributions and their parameterizations.

Under the truncation approach, MLE is performed by minimizing the conditional negative log-likelihood function. Given a sample, $\mathbf{x}$, of $n$ loss amounts from a single SRC, the conditional likelihood function is given by equation (2.7). The conditional log-likelihood function is

$$\tilde{\ell}(\Theta;\mathbf{x},\tau) = \log\left(\tilde{L}(\Theta;\mathbf{x},\tau)\right) = -n\log\left(1 - F(\tau;\Theta)\right) + \sum_{i=1}^{n}\log\left(f(x_i;\Theta)\right). \tag{2.8}$$

The estimated parameters, $\hat{\Theta}$, are found by maximizing the conditional log-likelihood function, or equivalently, minimizing the negative conditional log-likelihood function, $n\tilde{\ell}$, defined by

$$n\tilde{\ell}(\Theta;\mathbf{x},\tau) = -\tilde{\ell}(\Theta;\mathbf{x},\tau).$$

Since we perform MLE as the minimization of the negative conditional log-likelihood function, the maximum likelihood estimates are the solution to the following minimization problem,

$$\hat{\Theta} = \arg\min_{\Theta}\left(n\tilde{\ell}(\Theta;\mathbf{x},\tau)\right), \tag{2.9}$$

and thus each candidate distribution, $F$, has an associated estimated distribution, $F(x;\hat{\Theta})$. From now on, we use likelihood function to refer to the conditional likelihood function from equation (2.7), and we use log-likelihood function to refer to the conditional log-likelihood function from equation (2.8).

2.1.2 Estimating Parameters for Piecewise Severity Distribution Candidates

In addition to the single loss severity distribution candidates, we consider two piecewise distributions, Lognormal Body Spliced with Lognormal Tail (LGNLGN) and Lognormal Body Spliced with Generalized Pareto Tail (LGNGPD). The spliced distributions assume that the unobservable losses with amounts below $\tau$ follow the distribution of the body. Let $X$ be a non-negative random variable generated from the piecewise distribution $F$, which is comprised of one distribution for the body, $F_{body}$, and another for the tail, $F_{tail}$. To derive the conditional PDF of a piecewise distribution, we first derive the conditional CDF and PDF separately for the body and the tail. 
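
As an illustration of the single-distribution estimation in Section 2.1.1, the following is a minimal sketch of minimizing the negative conditional log-likelihood from equations (2.8) and (2.9) for a lognormal candidate. It is not the implementation used in this thesis: the function names, the Nelder-Mead optimizer, the starting values, and the simulated data (true $\mu = 10$, $\sigma = 2$, threshold $\tau = 10{,}000$) are all assumptions made for the example.

```python
import numpy as np
from scipy import optimize, stats


def neg_cond_loglik(params, x, tau):
    """Negative conditional log-likelihood: n*log(1 - F(tau)) - sum(log f(x_i))."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so that sigma stays positive
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))
    # logsf(tau) = log(1 - F(tau)), evaluated in a numerically stable way
    return len(x) * dist.logsf(tau) - np.sum(dist.logpdf(x))


def fit_truncated_lognormal(x, tau):
    """Return (mu_hat, sigma_hat, truncation probability estimate, minimized nll)."""
    y = np.log(x)
    start = np.array([y.mean(), np.log(y.std())])  # assumed starting values
    res = optimize.minimize(neg_cond_loglik, start, args=(x, tau),
                            method="Nelder-Mead", options={"maxiter": 2000})
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    trunc_prob = stats.lognorm(s=sigma_hat, scale=np.exp(mu_hat)).cdf(tau)
    return mu_hat, sigma_hat, trunc_prob, res.fun


# Hypothetical data: simulate complete losses, then keep only those above tau.
rng = np.random.default_rng(1)
tau = 10_000.0
complete = rng.lognormal(mean=10.0, sigma=2.0, size=20_000)
observed = complete[complete > tau]

mu_hat, sigma_hat, trunc_prob, nll = fit_truncated_lognormal(observed, tau)
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}, F(tau)={trunc_prob:.3f}")
```

The last returned value is the minimized negative log-likelihood, and the third is the truncation probability estimate $F(\tau;\hat{\Theta})$ defined in Section 2.1. A single starting point is used here only for brevity.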
For the derivation, we treat the splicing point as given, but the splicing point is an additional parameter that must be estimated.

Given a splicing point $x_s$, minimum reporting threshold $\tau$, where $\tau < x_s$, and parameter vector $\Theta_b$, the conditional CDF for the body can be written as

$$\begin{aligned}
\tilde{F}_{body}(x;\Theta_b,\tau,x_s) &= P(X \le x \mid \tau < X \le x_s) \\
&= \frac{P(\tau < X \le \min(x,x_s))}{P(\tau < X \le x_s)} \\
&= \frac{F_{body}(\min(x,x_s);\Theta_b) - F_{body}(\tau;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} \\
&= \begin{cases}
1 & \text{for } x > x_s, \\
\dfrac{F_{body}(x;\Theta_b) - F_{body}(\tau;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
0 & \text{for } x \le \tau,
\end{cases}
\end{aligned}$$

where the last equality uses the condition $x \le x_s$ in order for an observation to be in the body. The conditional PDF for the body follows as usual,

$$\tilde{f}_{body}(x;\Theta_b,\tau,x_s) = \frac{d}{dx}\tilde{F}_{body}(x;\Theta_b,\tau,x_s) = \begin{cases}
\dfrac{f_{body}(x;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
0 & \text{otherwise.}
\end{cases}$$

The conditional CDF and PDF for the tail are derived in the same manner as equations (2.5) and (2.6) for the single severity distributions by treating the splicing point, $x_s$, as the minimum reporting threshold. Treating $x_s$ as given and $\Theta_u$ as the vector of tail distribution parameters, we find the tail CDF and PDF, respectively, to be

$$\tilde{F}_{tail}(x;\Theta_u,x_s) = \begin{cases}
\dfrac{F_{tail}(x;\Theta_u) - F_{tail}(x_s;\Theta_u)}{1 - F_{tail}(x_s;\Theta_u)} & \text{for } x_s < x, \\
0 & \text{otherwise,}
\end{cases}$$

$$\tilde{f}_{tail}(x;\Theta_u,x_s) = \begin{cases}
\dfrac{f_{tail}(x;\Theta_u)}{1 - F_{tail}(x_s;\Theta_u)} & \text{for } x_s < x, \\
0 & \text{otherwise.}
\end{cases}$$

In order to piece the body and tail together, we consider a truncated sample, $\mathbf{x}$, of $n$ observed loss amounts ordered from smallest to largest,

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n_b)} \le x_{(n_b+1)} \le \cdots \le x_{(n)},$$

where $n_b := \max\{j \in \{1,2,\ldots,n\} \mid x_{(j)} \le x_s\}$. All sample observations less than or equal to $x_s$ are in the body of the sample, and we define the proportion of the sample in the body as $p_b = n_b/n$. The remaining $n_u$ observations, where $n_u = n - n_b$, are in the tail.

To derive the conditional piecewise PDF, $\tilde{f}$, we want to make sure $\tilde{f}$ integrates to 1 over the support of the random loss severities. Let the parameter vector for the piecewise distribution be $\Theta = [\Theta_b \ \ x_s \ \ \Theta_u]$. 
Then, for $a, b > 0$,

$$\int_0^{\infty}\tilde{f}(x;\Theta,\tau)\,dx = a\int_0^{\infty}\tilde{f}_{body}(x;\Theta_b,\tau,x_s)\,dx + b\int_0^{\infty}\tilde{f}_{tail}(x;\Theta_u,x_s)\,dx = a + b = 1,$$

with constraints

$$\int_{\tau}^{x_s}\tilde{f}(x;\Theta,\tau)\,dx = p_b, \qquad \int_{x_s}^{\infty}\tilde{f}(x;\Theta,\tau)\,dx = 1 - p_b.$$

Thus, we see that $a = p_b$ and $b = 1 - p_b$. We now have the conditional piecewise PDF

$$\tilde{f}(x;\Theta,\tau) = \begin{cases}
p_b\,\tilde{f}_{body}(x;\Theta_b,\tau,x_s) & \text{for } \tau < x \le x_s \\
(1-p_b)\,\tilde{f}_{tail}(x;\Theta_u,x_s) & \text{for } x_s < x
\end{cases}
= \begin{cases}
\dfrac{p_b\,f_{body}(x;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
\dfrac{(1-p_b)\,f_{tail}(x;\Theta_u)}{1 - F_{tail}(x_s;\Theta_u)} & \text{for } x_s < x.
\end{cases} \tag{2.10}$$

The conditional body and tail densities are used to find their respective likelihood functions. Then, we perform MLE separately on the body and tail of the sample to find the piecewise distribution that maximizes the sum of the body and tail log-likelihood functions. An outline of the MLE approach for each of the two piecewise candidate distributions follows.

The piecewise distribution parameter vector, $\Theta$, includes the parameters for the body distribution, $\Theta_b$, the splicing point, $x_s$, and the parameters for the upper tail distribution, $\Theta_u$. To start the MLE procedure, we restrict the set of possible estimates of $x_s$ to the sample percentiles $x_p$, where $p = 0.3, 0.32, 0.34, \ldots, 0.96$, so that $\hat{x}_s \in \{x_{0.3}, x_{0.32}, \ldots, x_{0.96}\}$. For each $p$, treat $x_p$ as the splicing point. The sample is then split into a body, $\mathbf{x}_{b,p} \in \mathbb{R}^{n_b}$, and a tail, $\mathbf{x}_{u,p} \in \mathbb{R}^{n_u}$, where $\mathbf{x}_{b,p} = \{x \in \mathbf{x} \mid x \le x_p\}$, $\mathbf{x}_{u,p} = \{x \in \mathbf{x} \mid x > x_p\}$, and $n_b + n_u = n$. Since this process is performed for each $p$, we get 34 possible estimates for the parameter vectors of the piecewise distribution,

$$\hat{\Theta}_p = \left[\hat{\Theta}_{b,p} \ \ x_p \ \ \hat{\Theta}_{u,p}\right],$$

where $\hat{\Theta}_{b,p}$ and $\hat{\Theta}_{u,p}$ are the estimated parameters for the body and tail distributions, respectively, given that $x_p$ is the splicing point. Finally, the piecewise distribution is determined by choosing the value of $p$ that minimizes the negative log-likelihood function. Details are below.

For either the LGNLGN or LGNGPD distribution, the procedure to estimate the distribution of the body is the same. 
For each $x_p$, $\mathbf{x}$ is separated into a body $\mathbf{x}_{b,p}$ and a tail $\mathbf{x}_{u,p}$, and the estimated parameters for the distribution of the body, denoted $\hat{\Theta}_{b,p}$, are found by minimizing the negative log-likelihood function for the body, denoted $n\tilde{\ell}_{b,p}$. From the conditional piecewise PDF in equation (2.10),

$$n\tilde{\ell}_{b,p}(\Theta_b;\mathbf{x}_{b,p},\tau,x_p) = n_b\log\left(F_{body}(x_p;\Theta_b) - F_{body}(\tau;\Theta_b)\right) - n_b\log(p_b) - \sum_{i=1}^{n_b}\log\left(f_{body}(x_i;\Theta_b)\right),$$

where $\tau$ is the minimum reporting threshold determined by the operational loss sample’s SRC. For each of the 34 estimates of the splicing point, $\hat{x}_s = x_p$, we estimate the lognormal distribution for the body, $F_{body}(x;\hat{\Theta}_{b,p})$, by solving the minimization problem

$$\hat{\Theta}_{b,p} = \arg\min_{\Theta_b}\left(n\tilde{\ell}_{b,p}(\Theta_b;\mathbf{x}_{b,p},\tau,x_p)\right).$$

The tail distribution estimation procedure differs for each piecewise candidate. We start with LGNLGN, as the process to estimate the tail is the same as for the body. For each $x_p$, the estimated parameters for the distribution of the upper tail, denoted $\hat{\Theta}_{u,p}$, are found by minimizing the negative log-likelihood function for the tail, denoted $n\tilde{\ell}_{u,p}$. From the conditional piecewise PDF in equation (2.10),

$$n\tilde{\ell}_{u,p}(\Theta_u;\mathbf{x}_{u,p},x_p) = n_u\log\left(1 - F_{tail}(x_p;\Theta_u)\right) - n_u\log(1 - p_b) - \sum_{i=1}^{n_u}\log\left(f_{tail}(x_i;\Theta_u)\right).$$

Just as for the body, we end up with 34 estimated lognormal distributions for the tail. For each $x_p$, $F_{tail}(x;\hat{\Theta}_{u,p})$ is found by solving the minimization problem

$$\hat{\Theta}_{u,p} = \arg\min_{\Theta_u}\left(n\tilde{\ell}_{u,p}(\Theta_u;\mathbf{x}_{u,p},x_p)\right).$$

After estimating the body and tail distributions for each of the 34 splicing point estimates, we have 34 LGNLGN distributions identified by their parameters

$$\hat{\Theta}_p = \left[\hat{\Theta}_{b,p} \ \ x_p \ \ \hat{\Theta}_{u,p}\right].$$

The estimated LGNLGN piecewise distribution parameters are $\hat{\Theta} = \hat{\Theta}_{\hat{p}}$, where

$$\hat{p} = \arg\min_{p}\left(n\tilde{\ell}_{b,p}(\hat{\Theta}_{b,p};\mathbf{x}_{b,p},\tau,x_p) + n\tilde{\ell}_{u,p}(\hat{\Theta}_{u,p};\mathbf{x}_{u,p},x_p)\right),$$

so that the $p$ that minimizes the above expression also determines the estimate of the splicing point, $\hat{x}_s = x_{\hat{p}}$.

The LGNLGN distribution allows for discontinuity in the PDF at the splicing point. There are arguments both for and against continuity constraints for spliced distributions. 
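
The LGNLGN estimation over the grid of 34 candidate splicing points can be sketched as follows. This is an illustrative sketch rather than the thesis code: the helper names, the Nelder-Mead optimizer, the single starting point, and the simulated sample are assumptions, and the lognormal is parameterized by the mean and standard deviation of the log-losses.

```python
import numpy as np
from scipy import optimize
from scipy.special import log_ndtr, ndtr


def _z(x, mu, sigma):
    return (np.log(x) - mu) / sigma


def _logpdf(x, mu, sigma):
    # Lognormal log-density.
    return -0.5 * _z(x, mu, sigma) ** 2 - np.log(x * sigma * np.sqrt(2.0 * np.pi))


def body_nll(params, xb, tau, xp, pb):
    """n_b*log(F(x_p) - F(tau)) - n_b*log(p_b) - sum(log f_body(x_i))."""
    mu, sigma = params[0], np.exp(params[1])
    mass = ndtr(_z(xp, mu, sigma)) - ndtr(_z(tau, mu, sigma))
    if mass <= 0.0:
        return np.inf
    val = len(xb) * (np.log(mass) - np.log(pb)) - np.sum(_logpdf(xb, mu, sigma))
    return val if np.isfinite(val) else np.inf


def tail_nll(params, xu, xp, pb):
    """n_u*log(1 - F(x_p)) - n_u*log(1 - p_b) - sum(log f_tail(x_i))."""
    mu, sigma = params[0], np.exp(params[1])
    val = (len(xu) * (log_ndtr(-_z(xp, mu, sigma)) - np.log1p(-pb))
           - np.sum(_logpdf(xu, mu, sigma)))
    return val if np.isfinite(val) else np.inf


def fit_lgnlgn(x, tau, grid=np.linspace(0.30, 0.96, 34)):
    """Fit body and tail for each candidate percentile; keep the smallest total nll."""
    x = np.sort(np.asarray(x, dtype=float))
    logx = np.log(x)
    start = np.array([logx.mean(), np.log(logx.std())])  # assumed starting values
    best = None
    with np.errstate(all="ignore"):
        for p in grid:
            xp = np.quantile(x, p)              # candidate splicing point x_p
            xb, xu = x[x <= xp], x[x > xp]
            pb = len(xb) / len(x)
            rb = optimize.minimize(body_nll, start, args=(xb, tau, xp, pb),
                                   method="Nelder-Mead")
            ru = optimize.minimize(tail_nll, start, args=(xu, xp, pb),
                                   method="Nelder-Mead")
            total = rb.fun + ru.fun
            if best is None or total < best["nll"]:
                best = {"p": float(p), "x_s": float(xp), "nll": float(total),
                        "body": (rb.x[0], np.exp(rb.x[1])),
                        "tail": (ru.x[0], np.exp(ru.x[1]))}
    return best


# Hypothetical left-truncated sample, used only to exercise the procedure.
rng = np.random.default_rng(7)
tau = 10_000.0
sim = rng.lognormal(mean=10.0, sigma=2.0, size=3_000)
sample = sim[sim > tau][:1500]
fit = fit_lgnlgn(sample, tau)
print(fit["p"], fit["x_s"], fit["nll"])
```

Because the likelihood of a lognormal restricted to a bounded interval can be nearly flat, the body and tail parameter estimates from a single start may be unstable even when the minimized total is not, so in practice a grid of starting values would be a sensible addition.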
For example, imposing continuity constraints on a piecewise density may not lead to a better likelihood measure of fit than allowing for a jump discontinuity. On the other hand, allowing discontinuity seems to favor the fit of the tail, as splicing points tend to be chosen in the lower half of possible values. A discussion on the decision to impose continuity constraints on a spliced distribution can be found in Chapter 1 of Peters and Shevchenko [2015].

For the conditional LGNGPD distribution family, we force the body and tail densities to be equal at the splicing point by imposing the constraint

$$\begin{aligned}
p_b\,\tilde{f}_{body}(x_s;\Theta_b,\tau,x_s) &= (1-p_b)\,\tilde{f}_{tail}(x_s;\Theta_u,x_s), \\
\frac{p_b\,f_{body}(x_s;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} &= \frac{1-p_b}{\theta}\left[1 + \xi\,\frac{x_s - x_s}{\theta}\right]^{-1-1/\xi}, \\
\theta &= \frac{1-p_b}{p_b}\cdot\frac{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)}{f_{body}(x_s;\Theta_b)}.
\end{aligned}$$

This constraint forces the scale parameter of the generalized Pareto tail distribution, $\theta$, to be completely determined by the estimated body distribution. As a result, the scale parameter is not treated as a free parameter of the LGNGPD distribution. While this continuity constraint prevents jumps in the density, it does not require differentiability of the density at the splicing point. One could impose such a constraint using the derivatives of the conditional body and tail density functions; see Peters and Shevchenko [2015].

Since the scale parameter for the tail distribution is determined by the lognormal body distribution, we only need to estimate the tail parameter, $\xi$. For the LGNGPD distribution, the losses in the tail of the sample are modeled as the excess losses over the splicing point. This is the “Peaks-Over-Threshold” method in Extreme Value Theory (EVT). 
Given this approach, EVT offers alternative approaches for estimating the splicing point that emphasize the fit of the tail as a generalized Pareto distribution by the use of mean residual life and parameter stability plots. For a practical discussion with examples, see Chapter 4 of Coles [2001].

For each $x_p$ and associated estimate of the body distribution, $F_{body}(x;\hat{\Theta}_{b,p})$, the estimated scale parameter for the tail distribution is calculated, with $x_s = x_p$, as

$$\hat{\theta}_p = \frac{1-p_b}{p_b}\cdot\frac{F_{body}(x_s;\hat{\Theta}_{b,p}) - F_{body}(\tau;\hat{\Theta}_{b,p})}{f_{body}(x_s;\hat{\Theta}_{b,p})}.$$

The estimate of the tail parameter, $\hat{\xi}$, is found by minimizing the negative log-likelihood function for the tail over $\xi$. The tail log-likelihood is

$$\tilde{\ell}_{u,p}(\xi;\mathbf{x}_{u,p},x_p,\hat{\theta}_p) = n_u\log\{1-p_b\} - n_u\log(\hat{\theta}_p) - \left(\frac{\xi+1}{\xi}\right)\sum_{i=1}^{n_u}\log\left(1 + \frac{\xi}{\hat{\theta}_p}(x_i - x_p)\right),$$

and, with $n\tilde{\ell}_{u,p} = -\tilde{\ell}_{u,p}$,

$$\hat{\xi}_p = \arg\min_{\xi}\left(n\tilde{\ell}_{u,p}(\xi;\mathbf{x}_{u,p},x_p,\hat{\theta}_p)\right).$$

For each $x_p$, the estimated LGNGPD distribution has four estimated parameters,

$$\hat{\Theta}_p = \left[\hat{\Theta}_{b,p} \ \ x_p \ \ \hat{\xi}_p\right],$$

and the estimated LGNGPD distribution has parameter vector $\hat{\Theta} = \hat{\Theta}_{\hat{p}}$, where

$$\hat{p} = \arg\min_{p}\left(n\tilde{\ell}_{b,p}(\hat{\Theta}_{b,p};\mathbf{x}_{b,p},\tau,x_p) + n\tilde{\ell}_{u,p}(\hat{\xi}_p;\mathbf{x}_{u,p},x_p,\hat{\theta}_p)\right),$$

so that the $p$ that minimizes the above expression also determines the estimate of the splicing point.

Finally, to derive the unconditional piecewise CDF, PDF, and quantile function, we must normalize the conditional piecewise density, extended below $\tau$, to integrate to 1. Let $f(x;\Theta)$ be the unconditional piecewise density with unconditional body density $f_{body}(x;\Theta_b)$ and unconditional tail density $f_{tail}(x;\Theta_u)$. 
Then, by letting $c$ be the normalizing constant,

$$\begin{aligned}
c\int_{-\infty}^{\infty}\tilde{f}(x;\Theta,\tau)\,dx &= 1, \\
c\,a\int_{-\infty}^{x_s}\tilde{f}_{body}(x;\Theta_b,\tau,x_s)\,dx + c\,b\int_{x_s}^{\infty}\tilde{f}_{tail}(x;\Theta_u,x_s)\,dx &= 1, \\
c\int_{-\infty}^{x_s}\frac{p_b\,f_{body}(x;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)}\,dx + c\int_{x_s}^{\infty}\frac{1-p_b}{1 - F_{tail}(x_s;\Theta_u)}\,f_{tail}(x;\Theta_u)\,dx &= 1, \\
c\,\frac{p_b\,F_{body}(x_s;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} + c\,\frac{1-p_b}{1 - F_{tail}(x_s;\Theta_u)}\left(1 - F_{tail}(x_s;\Theta_u)\right) &= 1, \\
c\,\frac{p_b}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)}\left(F_{body}(x_s;\Theta_b) - 0\right) + c\,(1-p_b) &= 1, \\
c\,\frac{F_{body}(x_s;\Theta_b) - (1-p_b)\,F_{body}(\tau;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} &= 1.
\end{aligned}$$

Finally, we arrive at the unconditional piecewise density function by multiplying the conditional piecewise density by the normalizing constant,

$$f(x;\Theta,\tau) = \begin{cases}
\dfrac{p_b\,f_{body}(x;\Theta_b)}{F_{body}(x_s;\Theta_b) - (1-p_b)\,F_{body}(\tau;\Theta_b)} & \text{for } x \le x_s, \\
\dfrac{c\,(1-p_b)\,f_{tail}(x;\Theta_u)}{1 - F_{tail}(x_s;\Theta_u)} & \text{for } x > x_s.
\end{cases}$$

2.1.3 Analysis of Estimated Severity Distributions

To assess the estimation of loss severity candidate distributions, we employ qualitative metrics outlined by Dutta and Perry [2006]:

1. Good Fit - Statistically, how well does the method fit the data?
2. Realistic - If a method fits well in a statistical sense, does it generate a loss distribution with a realistic capital estimate?
3. Well-Specified - Are the characteristics of the fitted data similar to the loss data and logically consistent?
4. Flexible - How well is the method able to reasonably accommodate a wide variety of empirical loss data tail behavior?
5. Simple - Is the method easy to apply in practice, and is it easy to generate random numbers for the purposes of loss simulation?

One measure of model performance is Akaike’s Information Criterion (AIC) as developed by Akaike [1974]. The AIC is defined as

$$AIC = -2\,\tilde{\ell}(\hat{\Theta};\mathbf{x},\tau) + 2k, \tag{2.11}$$

where $\hat{\Theta}$ is the estimated distribution parameter vector as found via MLE and $k$ is the number of estimated parameters in the distribution. Including the number of parameters in the AIC is an attempt to prevent selecting a model that overfits the data. The AIC is a relative performance measure, so while it can pick which model fits the data better than another, it cannot tell if any model is a good fit. 
Other diagnostics, such as QQ-plots and density plots, should be consulted for adequacy.

Another measure that compares model performance is the Bayesian Information Criterion (BIC) as developed by Schwarz [1978]. Like AIC, the BIC allows for a comparison across models with different numbers of parameters by incorporating “a mathematical formula for the principle of parsimony in model building”. The BIC is defined as

$$BIC = -2\,\tilde{\ell}(\hat{\Theta};\mathbf{x},\tau) + k\log(n), \tag{2.12}$$

where $\hat{\Theta}$ is the MLE parameter vector, $k$ is the dimensionality of the parameter vector, and $n$ is the number of observations in the sample $\mathbf{x}$. Compared to AIC, BIC favors models with fewer parameters when $n \ge 8$, since model dimensionality is multiplied by $\log(n)$ instead of 2.

The final measure of fit we employ is the modified Anderson-Darling test [Sinclair et al., 1990], which uses the differences between empirical quantiles and estimated quantiles from the candidate distributions while assigning higher weights to higher quantiles than the standard Anderson-Darling test. The idea is to measure goodness-of-fit in the tail from each candidate distribution. The modified Anderson-Darling test statistic is calculated as

$$\widehat{AD} = \frac{n}{2} - 2\sum_{i=1}^{n}\tilde{F}(x_{(i)};\hat{\Theta},\tau) - \sum_{i=1}^{n}\left[2 - \frac{2i-1}{n}\right]\log\left[1 - \tilde{F}(x_{(i)};\hat{\Theta},\tau)\right],$$

where $n$ is the number of observations, $x_{(i)}$ is the $i$th order statistic such that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, and $\tilde{F}(x;\hat{\Theta},\tau)$ is the estimated conditional CDF for the candidate distribution. To perform the modified Anderson-Darling test at the 5% significance level, we use $\widehat{AD}$ to calculate the p-value,

$$p_{AD} = \left[1 + \exp\left\{2.31 + 1.73\,\widehat{AD} + \frac{0.275}{\widehat{AD}} - \frac{2}{\sqrt{\widehat{AD}}} - \frac{0.092}{\widehat{AD}^{3/2}}\right\}\right]^{-1}.$$

If $p_{AD} < 0.05$, then the null hypothesis that the data follow the estimated candidate distribution is rejected at the 5% significance level.

We include the results of the modified Anderson-Darling test, but QQ-plots are preferred for their simplicity and interpretability. 
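
The in-sample measures above can be computed directly from a fitted candidate. The following is a small sketch under stated assumptions: the function names are hypothetical, the inputs are the maximized conditional log-likelihood and the fitted conditional CDF values $\tilde{F}(x_i;\hat{\Theta},\tau)$, and the p-value transformation is deliberately left out.

```python
import numpy as np


def aic(loglik, k):
    """Equation (2.11): -2 * log-likelihood + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Equation (2.12): -2 * log-likelihood + k * log(n)."""
    return -2.0 * loglik + k * np.log(n)


def modified_ad_statistic(u):
    """Upper-tail Anderson-Darling statistic from fitted CDF values u_i in (0, 1)."""
    u = np.sort(np.asarray(u, dtype=float))      # u_(1) <= ... <= u_(n)
    n = len(u)
    i = np.arange(1, n + 1)
    weights = 2.0 - (2.0 * i - 1.0) / n
    return n / 2.0 - 2.0 * np.sum(u) - np.sum(weights * np.log1p(-u))


# Hypothetical fitted CDF values: an evenly spread set (close fit) and a set
# piled up near zero (a candidate that places too much mass above the data).
n = 200
u_good = (np.arange(n) + 0.5) / n
u_poor = u_good ** 2
ad_good = modified_ad_statistic(u_good)
ad_poor = modified_ad_statistic(u_poor)
print(ad_good, ad_poor)
```

The statistic is a weighted squared distance between the empirical and fitted CDFs, so it is non-negative and grows as the fitted CDF values drift away from an even spread over $(0,1)$.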
Especially in the case of model misspecification, the modified Anderson-Darling test may reject a true null hypothesis, as we show in Section 3.2.3.

Forecasting ability for the estimated distributions is measured by the Quantile Scoring Function (QS) and Out-of-Sample (OOS) AIC. Both make OOS predictions to gauge forecasting performance. Following Gneiting [2011], let $\alpha = 0.999$ be the quantile level. For a sample $\mathbf{x}$ of $n$ operational losses, let $\mathbf{x}^{(-i)}$, for $i = 1,2,\ldots,n$, be the sample with the $i$th observation removed. We define $q^{(-i)}$ as the estimated $\alpha$-quantile when the $i$th observation is excluded. Then,

$$q^{(-i)} = F^{-1}(\alpha;\hat{\Theta},\mathbf{x}^{(-i)}),$$

where $\hat{\Theta}$ is found by using the MLE approach outlined in Sections 2.1.1 and 2.1.2 for the sample $\mathbf{x}^{(-i)}$. The quantile scoring function is defined as

$$S(q^{(-i)},x_i) = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbb{1}(q^{(-i)} \ge x_i) - \alpha\right)\left(q^{(-i)} - x_i\right), \tag{2.13}$$

which is non-negative with values closer to zero indicating better performance.

When $\alpha = 0.999$, the quantile scoring function is asymmetric, penalizing severe underestimation more than overestimation. This asymmetric feature should be particularly appealing to regulators, who want to avoid underestimation of risk. The QS should also appeal to financial institutions, who face the possibility of extremely overestimating RC when using candidate distributions capable of modeling heavy tail behavior; see Dutta and Perry [2006]. Since the 99.9% quantile from an SRC’s severity distribution is a good proxy for its contribution to RC (see Section 3.3.2), we employ the QS when selecting a candidate distribution and compare how this affects the RC calculation as opposed to selecting severity distributions by AIC. 
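
The quantile score in equation (2.13) is short to compute. The sketch below is illustrative only: `quantile_score` is a hypothetical name, a single constant forecast stands in for the leave-one-out quantiles $q^{(-i)}$, and an evenly spaced grid on $(0, 1000)$ is used as a deterministic stand-in for a uniform sample so that no random seed is involved.

```python
import numpy as np


def quantile_score(q, x, alpha=0.999):
    """Mean of (1(q_i >= x_i) - alpha) * (q_i - x_i); q may be a scalar or array."""
    x = np.asarray(x, dtype=float)
    q = np.broadcast_to(np.asarray(q, dtype=float), x.shape)
    return np.mean(((q >= x).astype(float) - alpha) * (q - x))


# Deterministic stand-in for 2500 Unif(0, 1000) draws (an assumption of this example).
x = (np.arange(2500) + 0.5) * 0.4

forecasts = np.arange(899, 1300)
scores = np.array([quantile_score(f, x) for f in forecasts])
best_forecast = forecasts[np.argmin(scores)]

under = quantile_score(999 - 100, x)   # forecast 100 below the 0.999-quantile
over = quantile_score(999 + 100, x)    # forecast 100 above the 0.999-quantile
print(best_forecast, under, over)
```

Each observation that exceeds the forecast contributes its shortfall weighted by $\alpha$, while each observation below the forecast contributes its distance weighted by $1-\alpha$, which is the source of the asymmetry.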
Since the QS relies heavily on the MLE process, the pitfalls mentioned in Section 2.1.4 become even more important when selecting severity distributions by QS. This is especially true when the MLE algorithm fails to converge due to boundary conditions, which can greatly impact the tail behavior.

To see the asymmetry of the QS at $\alpha = 0.999$, we look at a sample of 2500 independent and identically distributed random variables distributed uniformly between 0 and 1000. The true 0.999-quantile is 999, so the QS is minimized when we forecast 999. Forecasts below 999 are penalized more than forecasts that exceed 999 by the same amount. Figure 2.1 plots the quantile scoring function when the 0.999-quantile is forecasted to be 899 through 1299.

OOS AIC, denoted $AIC_{OOS}$, is calculated by excluding each year’s operational losses, estimating each candidate distribution via MLE on the remaining losses, and calculating the likelihood for the excluded year’s data. This process is repeated so that each year’s losses have been excluded exactly once. The excluded-year likelihoods are then summed and $AIC_{OOS}$ is calculated as in equation (2.11). These simple forecasting metrics allow us to assess whether the estimated candidate distribution is well-specified and flexible and, when combined with QQ-plots, can gauge how realistic the quantile estimates are for each estimated candidate distribution family.

Finally, the truncation probability estimate, $F(\tau;\hat{\Theta})$, may provide a simple metric to gauge whether an estimated candidate distribution violates the assumption that losses above and below the minimum reporting threshold are generated by the same distribution. 
In the academic literature, the truncation probability estimate is seldom mentioned, but we feel it is an easily interpretable signal of the appropriateness of one distribution family over another when using the truncation approach. As discussed in Section 2.1.4, large truncation probability estimates (greater than 50%) or extremely small truncation probability estimates (less than 1%) may provide evidence that a candidate distribution family is inappropriate. For example, it may not be realistic to assume that more than half, or less than 1%, of the operational losses in a given SRC occur below the threshold. Truncation probability estimates that are very large provide support for adopting the shifted approach, while estimates near zero provide support for using the naive approach. Without more data collection or expert opinion on losses below the threshold, setting conservative bounds on the truncation probability estimate proves quite useful in selecting from estimated candidate distributions.

Figure 2.1: Asymmetry of the quantile scoring function at the 0.999 quantile penalizes underestimates more than overestimates: The solid line is the 0.999-quantile score for a sample of 2500 independent Unif(0,1000) random variables with forecasts of integers 899 to 1299. The true quantile is the dotted vertical line at 999.

2.1.4 Challenges with Maximum Likelihood Estimation for Loss Severity Distributions

For particularly flexible distributions such as Burr and log-SaS, the associated negative log-likelihood functions can be numerically unstable. Additionally, the large positive skew that characterizes operational loss severity data often creates a badly-scaled problem, where the values of the parameters differ by orders of magnitude. These problems are hard to solve for two reasons. First, different variable magnitudes make it hard to formulate reasonable stopping criteria. 
Secondly, functions with variables of different magnitudes usually require more iterations to converge. As a result, solving the minimization problem numerically may fail to converge or may converge to different local minima depending upon the parameter starting values.

To increase stability and alleviate the badly-scaled problem, we run our MLE algorithm on the log-losses instead of the raw losses, where appropriate. The log-transform is used for the lognormal, generalized Pareto, log-SaS, lognormal body spliced with lognormal tail, and lognormal body spliced with generalized Pareto tail distributions. Even after using log-losses, the badly-scaled problem still exists for the Burr distribution. This issue can be further reduced through a reparameterization of the log transform of a Burr random variable. See Appendix A.3 for details. To increase our confidence in convergence to a global minimum, a grid of various starting values is used and the parameter values producing the smallest negative log-likelihood are chosen for $\hat{\Theta}$.

For a sample of loss severities, $\mathbf{x}$, equation (2.9) tells us that $\hat{\Theta}$ is the value of $\Theta$ that minimizes $n\tilde{\ell}(\Theta;\mathbf{x},\tau)$. Using log-losses $\mathbf{y} = \log(\mathbf{x})$, the same $\hat{\Theta}$ minimizes $n\tilde{\ell}(\Theta;\mathbf{y},\log(\tau))$. For loss severities and log-loss severities, the minimum values of their negative log-likelihood functions differ by a constant, which is a function of the losses. Thus, the minimum negative log-likelihood of the log-loss data can be easily scaled to enable comparisons between candidate distributions estimated from raw loss severity data and log-loss severity data.

Let $X$ be a random variable for the amount of a loss from a given SRC and let $\tau$ be the minimum reporting threshold for that SRC. Then, the conditional CDF and PDF for the loss data are $\tilde{F}_X(x;\Theta,\tau)$ and $\tilde{f}_X(x;\Theta,\tau)$, respectively. 
Let $Y = \log(X)$. The conditional CDF and PDF of the log-loss data are:

$$\tilde{F}_Y(y;\Theta,\log(\tau)) = \tilde{F}_X(e^{y};\Theta,\tau), \qquad \tilde{f}_Y(y;\Theta,\log(\tau)) = e^{y}\,\tilde{f}_X(e^{y};\Theta,\tau).$$

The negative conditional log-likelihood functions for the loss and log-loss data are respectively

$$n\tilde{\ell}(\Theta;\mathbf{x},\tau) = -\sum_{i=1}^{n}\log\left(\tilde{f}_X(x_i;\Theta,\tau)\right),$$

$$n\tilde{\ell}(\Theta;\mathbf{y},\log(\tau)) = -\sum_{i=1}^{n}\log\left(\tilde{f}_X(x_i;\Theta,\tau)\right) - \sum_{i=1}^{n} y_i.$$

Thus, we must add $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n}\log(x_i)$ back to the negative log-likelihood of the log-loss data to make it comparable to the negative log-likelihood of the loss data.

Despite using a grid of starting values and the log-transform of random variables, the MLE algorithm still fails to converge when Gumbel-type distributions are fit to left-truncated data exhibiting regularly varying tail behavior. In these situations, convergence is artificially stopped as one or more of the distribution’s parameters approach their boundaries. This situation often manifests itself in extremely large truncation probability estimates ($> 0.95$). To illustrate this phenomenon, we use the asymptotic behavior of order statistics for a specific example.

When a conditional lognormal distribution is estimated by MLE from a sample that exhibits tails heavier than subexponential, the lognormal distribution can mimic the heavier-tailed behavior of an inverse power law by sufficiently increasing the truncation point. This phenomenon is exhibited by distributions in the Gumbel domain of attraction, which includes the lognormal, Weibull (for $0 < a < 1$), and LGNLGN distributions (see Appendix A for parameterizations). We refer to distributions in the Gumbel domain of attraction as Gumbel-type.

Using derivations from Section 3.1 of Perline [2005], we first analytically derive an asymptotic approximation to the largest order statistics of Gumbel-type distributions. Secondly, we explicitly show that the conditional lognormal distribution merely mimics an inverse power law and does not obey a true inverse power law. 
Finally, we graphically show that a sample of independent and identically distributed random variables from a lognormal distribution can mimic an inverse power law in the upper tail at sufficiently high truncation points by comparing the log-log plots of truncated lognormal samples to those of truncated Pareto samples. All approximations and conclusions are sourced from Perline [2005] unless otherwise specified.

Let $X_{1,1}, X_{1,2}, \ldots, X_{1,n}$ be $n$ independent observations drawn from a Pareto distribution with CDF
\[
F_X(x; \alpha) =
\begin{cases}
1 - \left(\frac{1}{x}\right)^{\alpha} & \text{for } x > 1, \\
0 & \text{for } x \le 1.
\end{cases}
\]
The order statistics of this sample satisfy
\[
X_{1,(1)} \ge X_{1,(2)} \ge \cdots \ge X_{1,(n)},
\]
and we say a sample satisfies an approximate inverse power law if the order statistics obey
\[
X_{1,(j)} \approx \frac{c_n}{j^{\beta}},
\]
for $j = 1, 2, \ldots, n$ and $\beta, c_n > 0$, where $c_n$ depends on $n$. The plot of the log transform, $\log(X_{1,(j)}) = Y_{1,(j)}$, against $\log(j)$ should be approximately linear with intercept $\log(c_n)$ and slope $-\beta$. This is called a log-log plot, and its simplicity motivates our work with the log-transformed observations, $Y_{1,j} = \log(X_{1,j})$, which have an exponential distribution with rate parameter $\alpha$. From pages 69-72 of Beirlant et al. [2004], we know the exponential distribution is Gumbel-type.

If we have a sample of $n$ independent and identically distributed random variables from a Gumbel-type distribution with CDF $F_Y(y; \Theta)$, then for some fixed $j$ such that $j \ll n$, there exist two sequences of standardizing constants $a_n$ and $b_n$ such that
\[
\lim_{n \to \infty} P\left\{ \frac{Y_{(i)} - a_n}{b_n} \le y \right\} = \exp\left(-e^{-y}\right) \sum_{k=0}^{i-1} \frac{1}{\Gamma(k+1)}\, e^{-ky},
\]
for $1 \le i \le j$. The limiting first moment convergence [Polfeldt, 1970] is
\[
\lim_{n \to \infty} E\left[ \frac{Y_{(i)} - a_n}{b_n} \right] =
\begin{cases}
\gamma_{EC} - \sum_{k=1}^{i-1} \frac{1}{k} & \text{for } i > 1, \\
\gamma_{EC} & \text{for } i = 1,
\end{cases}
\]
for $1 \le i \le j$, where $\gamma_{EC}$ is Euler's constant, $\lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \log n \right)$. Therefore, we can approximate the expected value of the largest order statistics for Gumbel-type distributions, where $n$ is large and $i = 1, 2, \ldots, j$ such that $j \ll n$, as
\[
E\left(Y_{(i)}\right) \approx
\begin{cases}
a_n + b_n \gamma_{EC} - b_n \sum_{k=1}^{i-1} \frac{1}{k} & \text{for } i > 1, \\
a_n + b_n \gamma_{EC} & \text{for } i = 1.
\end{cases}
\]

To find $a_n$ and $b_n$, we use Proposition 1.19 from Resnick [1987], which gives us the equations
\[
F(a_n) = 1 - \frac{1}{n}; \qquad b_n = \frac{1 - F(a_n)}{f(a_n)}.
\]
For the exponential distribution $F$ with rate $\alpha$, we find the standardizing sequence $a_n$ as
\[
1 - e^{-\alpha a_n} = 1 - \frac{1}{n} \quad \Longrightarrow \quad a_n = \frac{1}{\alpha} \log(n).
\]
The standardizing sequence $b_n$ for the exponential distribution is
\[
b_n = \frac{1 - F\left(\frac{1}{\alpha}\log(n)\right)}{f\left(\frac{1}{\alpha}\log(n)\right)} = \frac{\exp\{-\log n\}}{\alpha \exp\{-\log n\}} = \frac{1}{\alpha}.
\]
Thus, for the exponential distribution, we can approximate the first $j$ order statistics with the formula
\[
E\left(Y_{(i)}\right) \approx \left( \frac{1}{\alpha}\log(n) + \frac{1}{\alpha}\gamma_{EC} \right) - \frac{1}{\alpha} \sum_{k=1}^{i-1} \frac{1}{k}.
\]
Since $\frac{1}{\alpha}\left( \gamma_{EC} - \sum_{k=1}^{i-1} \frac{1}{k} \right) \approx -\frac{1}{\alpha}\log(i)$, the log-log plot of the largest order statistics of $X_{1,(i)}$ is approximately linear with slope $-\frac{1}{\alpha}$.

Now let $X_{2,1}, X_{2,2}, \ldots, X_{2,n}$ be $n$ independent observations drawn from a lognormal distribution with the parameterization given in Appendix A.1. Again, we let the order statistics of this sample satisfy
\[
X_{2,(1)} \ge X_{2,(2)} \ge \cdots \ge X_{2,(n)}.
\]
The log-transformed observations, $Y_{2,j} = \log(X_{2,j})$, are normally distributed with location parameter $\mu$ and scale parameter $\sigma$. From Beirlant et al. [2004], we know the normal distribution is Gumbel-type, so all of the above results hold, except that we need to derive the standardizing sequences $a_n$ and $b_n$ for the normal distribution. From Embrechts et al. [1997], we get
\[
a_n = \mu + \sigma\sqrt{2\log n} - \sigma\, \frac{\log\log n + \log 4\pi}{2\sqrt{2\log n}}, \qquad b_n = \frac{\sigma}{\sqrt{2\log n}}.
\]

The most important feature is the difference in the approximate slopes of the log-log plots for the Pareto and lognormal samples. While the Pareto sample has a constant slope of $-\frac{1}{\alpha}$ in the log-log plot, the magnitude of the lognormal slope, $\frac{\sigma}{\sqrt{2\log n}}$, depends on the sample size $n$ and is of order $O(1/\sqrt{\log n})$. Thus, the largest order statistics of the lognormal only mimic a power law.

To show this graphically, we generate 4 different samples from a Pareto distribution with parameter $\alpha = 1$ of sizes $n_1 = 100$, $n_2 = 400$, $n_3 = 10{,}000$, and $n_4 = 100{,}000$. The samples are then truncated at the 10%, 75%, 99%, and 99.9% quantiles, respectively.
Similarly, we perform the same process for 4 different samples from a lognormal distribution with parameters µ = −8 and σ = 4.5. The results are presented in Figure 2.2.

The final issue with the MLE approach is specific to the candidate g-and-h distribution family, since it is the only candidate distribution family with support on the real numbers. As a result, simulating from the estimated unconditional g-and-h distribution can result in negative loss severities. To prevent negative losses, we instead simulate from the g-and-h distribution truncated at zero using the unconditional MLE parameters. If selecting a g-and-h candidate distribution for an SRC, we must be sure that the probability of a negative observation is very low. Otherwise, the g-and-h distribution truncated at zero will overestimate the probability of an extreme observation.

Figure 2.2: Power law mimicking behavior of a lognormal distribution with sufficiently high truncation as seen from a log-log plot. The left-hand plot shows 4 truncated samples of 100, 400, 10,000, and 100,000 independent lognormal random variables with parameters µ = −8 and σ = 4.5, truncated at the 10%, 75%, 99%, and 99.9% quantiles, respectively. The right-hand plot shows 4 truncated samples of 100, 400, 10,000, and 100,000 independent Pareto random variables with parameter α = 1, truncated at the 10%, 75%, 99%, and 99.9% quantiles, respectively. The lognormal samples resemble the constant slope of the Pareto samples when truncation is high enough.

While we did not encounter large probabilities of negative values in our simulations, it is not an unusual situation. In fact, such a scenario is easily created by simulating from a Burr(α = 0.065, γ = 15, θ = 1.226) distribution, truncating the sample at the 2.5% quantile, and fitting a g-and-h distribution to the truncated sample using MLE. The estimated parameter vector, $\hat{\Theta}_{mle}$, creates a g-and-h distribution that is nearly symmetric with extremely fat tails.
The problem is that the probability of a negative observation, $F(0; \hat{\Theta}_{mle})$, is approximately 0.27. Thus, if we use the parameter vector $\hat{\Theta}_{mle}$ to simulate from a g-and-h distribution truncated at zero, the density on the positive real numbers is scaled upwards by the factor $\frac{1}{1 - F(0; \hat{\Theta}_{mle})}$, which overestimates the probability of experiencing an extreme loss.

Since we know that operational loss data are almost always positively skewed, it is reasonable to want a skewness parameter to reflect this. If we simply restrict the skewness parameter space to avoid low positive skew, the MLE algorithm artificially stops when it hits this boundary and thus does not converge. The resulting estimated distribution may not accurately represent the sample due to this premature stopping condition. A much more effective approach is Penalized Maximum Likelihood Estimation (PMLE). For example, the PMLE approach that adds 1 to the negative log-likelihood function for each percent of the distribution that falls below zero is a minimization of the form
\[
\hat{\Theta}_{pmle} = \arg\min_{\Theta} \left( n\tilde{\ell}(\Theta; x, \tau) + 100 \cdot F(0; \Theta) \right),
\]
which is modified from equation (2.9). Minimizing this function results in a distribution that is almost identical to the MLE distribution in the right tail, but has a higher probability of experiencing a loss around the mode. This is a desirable result, since simulating from the zero-truncated g-and-h distribution using the parameter vector $\hat{\Theta}_{pmle}$ has almost the same probability of an extreme loss as the unconditional g-and-h distribution with parameters $\hat{\Theta}_{mle}$. In other words, using PMLE in this situation has a similar effect to simulating from the unconditional g-and-h distribution with parameter vector $\hat{\Theta}_{mle}$, but shifting most of the negative probability to the mode and changing the tail very little.
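To make the penalized objective concrete, the following Python sketch evaluates it for a stand-in model. It is a minimal illustration under stated assumptions, not the procedure used for the g-and-h fits in this thesis: a normal distribution replaces the g-and-h family (both have support on the whole real line), and all function names, sample values, and parameter values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(x, mu, sigma):
    """Log-density of a normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def truncated_nll(theta, losses, tau):
    """Negative conditional log-likelihood for data left-truncated at tau.

    Stand-in model (assumption): normal(mu, sigma) instead of g-and-h.
    """
    mu, sigma = theta
    survival = 1.0 - norm_cdf((tau - mu) / sigma)  # 1 - F(tau; Theta)
    return -sum(norm_logpdf(x, mu, sigma) - math.log(survival) for x in losses)

def penalized_nll(theta, losses, tau, weight=100.0):
    """Penalized objective: truncated NLL plus weight * F(0; Theta).

    With weight = 100, one unit is added per percent of mass below zero.
    """
    mu, sigma = theta
    prob_negative = norm_cdf((0.0 - mu) / sigma)  # F(0; Theta)
    return truncated_nll(theta, losses, tau) + weight * prob_negative

# Hypothetical truncated sample and two candidate parameter vectors.
losses = [1.2, 1.5, 2.0, 3.5, 8.0]
tau = 1.0
theta_a = (0.5, 3.0)  # wide fit with substantial mass below zero
theta_b = (3.0, 1.0)  # fit with very little mass below zero

penalty_a = penalized_nll(theta_a, losses, tau) - truncated_nll(theta_a, losses, tau)
penalty_b = penalized_nll(theta_b, losses, tau) - truncated_nll(theta_b, losses, tau)
```

Setting `weight` to zero recovers the unpenalized objective for this stand-in model. In practice, either objective would be handed to a numerical optimizer over a grid of starting values.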
Results for this g-and-h example are shown in Table 2.2 and Figure 2.3.

This is an important point, since the g-and-h distribution estimated using MLE is essentially useless for our operational risk modeling procedure, as we cannot use it for simulations. Using PMLE leads to a more practical distribution while maintaining the salient properties of the data. While we did not conduct specific research into the best penalty term or criteria to penalize, it is worth noting that methods similar to those presented here can be adopted to easily solve a rather frustrating problem.

Table 2.2: Proportion of distribution below zero, estimated parameters, and the minimized negative log-likelihood when fitting the g-and-h distribution to a left-truncated sample using MLE and PMLE.

Figure 2.3: Plot of the truncated sample density and the estimated densities under the MLE and PMLE approaches on the log scale. The PMLE parameters produce a right tail that is almost identical to that of the MLE distribution, but place a higher probability of a loss occurring around the mode instead of the probability of negative losses estimated by the MLE approach.

2.2 Loss Frequency Distributions

The estimated loss frequency distribution is used to simulate the number of loss events that may occur next year for each business line/event type intersection corresponding to a given SRC. When modeling loss frequency for the LDA, only internal losses are used. Since most databases only include operational loss events whose loss amount exceeds a minimum threshold, loss frequencies based on historical data are biased downwards. Loss events with a loss severity below this threshold occur, but are not reported in the dataset.
Failure to acknowledge these losses would underestimate loss frequency but overestimate loss severity, leading to an uncertain impact on RC [Luo et al., 2007].

Under our AMA procedure, we estimate these smaller losses by treating the historical data as a truncated sample, estimating the non-truncated severity distribution, and estimating the truncation probability by evaluating the estimated severity distribution function at the minimum reporting threshold. Thus, we are able to estimate the proportion of operational loss events that go unreported due to the minimum reporting threshold. This proportion is then used to increase the loss event frequency proportionally.

According to BCBS, the two most popular distributions for loss frequency are Poisson followed by negative binomial [BCBS, 2011]. Trends in the annual number of loss events can be modeled via covariates, as demonstrated by Chavez-Demoulin et al. [2015], and used for forecasting. Strictly monotone trends can also be detected by the simple Mann-Kendall trend test [Gilbert, 1987, p. 208-217], and exponential smoothing can be used to forecast loss events. If assuming the simplest distribution, Poisson, for the loss frequency, the estimated Poisson rate parameter, $\hat{\lambda}$, is the mean number of observable annual losses. Since this mean number of loss events only includes losses that exceed the minimum reporting threshold, we can derive the estimate of the Poisson rate parameter for loss events both above and below the threshold as
\[
\hat{\lambda}^{*} = \frac{\hat{\lambda}}{1 - F(\tau; \hat{\Theta})}. \qquad (2.14)
\]

2.3 Total Annual Loss and Regulatory Capital Estimation

BASEL II defines RC as the 99.9%-quantile of the total annual loss distribution. The total annual loss, $S_{T+1}$, given by equation (2.4), is restated below:
\[
S_{T+1} = \sum_{r=1}^{R} \sum_{n=1}^{N^{r}_{T+1}} X^{r}_{T+1,n},
\]
where $N^{r}_{T+1}$ is the number of loss events in SRC $r$, and $X^{r}_{T+1,n}$ is the $n$th loss severity in SRC $r$. Equation (2.4) requires forecasts of the number of loss events for SRC $r$ in year $T+1$, $N^{r}_{T+1}$, and each loss event's severity, $X^{r}_{T+1,n}$, for $n = 1, 2, \ldots, N^{r}_{T+1}$.

Using the relevant historical loss severity data pooled across years for SRC $r$, loss severity distributions are estimated for each candidate distribution family. One of the nine estimated candidate distributions is chosen as the loss severity distribution for SRC $r$. Let $F_r(x; \hat{\Theta})$ be the estimated loss severity distribution for SRC $r$, and let $\tau_r$ be the minimum reporting threshold for SRC $r$. If we assume the loss frequency distribution for observable losses in SRC $r$ is Poisson with estimated rate parameter $\hat{\lambda}_r$, then, following equation (2.14), the loss frequency distribution for all loss events in SRC $r$ is Poisson with estimated rate parameter
\[
\hat{\lambda}^{*}_{r} = \frac{\hat{\lambda}_r}{1 - F_r(\tau_r; \hat{\Theta})}.
\]

To find the loss distribution of $S_{T+1}$, we use simulation. We can write the total annual operational loss for SRC $r$ as the sum
\[
S_{T+1,r} = \sum_{n=1}^{N^{r}_{T+1}} X^{r}_{T+1,n},
\]
and we rewrite the total annual loss as
\[
S_{T+1} = \sum_{r=1}^{R} S_{T+1,r}.
\]

We create $N^{*}$ simulations of $S_{T+1,r}$ by first simulating a sequence of independent Poisson($\hat{\lambda}^{*}_{r}$) random variables $P^{r}_{1}, P^{r}_{2}, \ldots, P^{r}_{N^{*}}$. For each $P^{r}_{i}$, we generate a sequence of independent and identically distributed loss severities,
\[
X^{r}_{T+1,1}, X^{r}_{T+1,2}, \ldots, X^{r}_{T+1,P^{r}_{i}} \overset{\text{i.i.d.}}{\sim} F_r(x; \hat{\Theta}).
\]
Summing these loss severities creates $N^{*}$ simulations of annual operational losses for SRC $r$, and we denote the $i$th simulation of SRC $r$'s annual loss as $S^{(i)}_{T+1,r}$. If we rank each $S^{(i)}_{T+1,r}$ from lowest to highest as $1, 2, \ldots, N^{*}$, we can derive the empirical marginal distribution of $S_{T+1,r}$ by dividing the rank of each annual loss by $N^{*} + 1$. Repeating this process for each SRC, we can derive the $R$ marginal distributions for the annual operational loss from each SRC. We denote the empirical marginal distribution of $S_{T+1,r}$ as $F_{S_r}$.

To sum across SRCs and arrive at simulations of the total annual operational loss, $S_{T+1}$, we must account for any dependence of losses across SRCs. This is done by t-copulas as presented by McNeil et al. [2015]. We simulate $M^{*}$ random vectors of dimensionality $R$ from a multivariate t-distribution.
By Sklar's Theorem, we can transform the t-distributed random vectors into uniform random vectors by applying the univariate t-distribution function to each component. We derive the inverse of each SRC's marginal distribution, denoted $F^{-1}_{S_r}$, numerically using the R function pchip() from the signal package and apply this inverse to the uniform random vectors. The pchip() function performs piecewise cubic Hermite (monotone) interpolation. Finally, by summing the values of each vector, we arrive at $M^{*}$ simulations of $S_{T+1}$. The 99.9%-quantile estimate of these $M^{*}$ simulations is our estimate of RC.

Chapter 3

Simulation Studies

In this section, operational loss data both above and below a reporting threshold are simulated for three unique SRCs for the fourteen years 2004-2017. Each SRC sample is truncated at the 2.5% quantile of its true distribution, which is treated as known. The frequency and severity distributions are given in Table 3.1.

The loss severity distributions for SRC 1 and SRC 2 are chosen from the best candidate distributions as measured by AIC for actual operational loss data given to the authors. For anonymity, the data were scaled before estimation was performed, so the parameters and selected distributions are for the scaled data, but maintain the salient properties of tail behavior, skewness, and overall shape. Finally, loss severities for SRC 3 are generated from a mixture model to examine the performance of our estimation approach under model misspecification.

For each SRC, we look at the density plots, goodness-of-fit and prediction statistics, the estimated truncation probability, and QQ-plots for each candidate loss severity distribution. Severity distribution parameters are estimated using MLE following the truncation approach. Section 3.2 presents a heuristic procedure using this information to select a loss severity distribution for each SRC.
Assuming a Poisson frequency distribution, we then calculate RC for years 2014-2018 in Section 3.3.2 and compare the impact on RC when choosing loss severity distributions by AIC versus quantile score. Finally, we briefly revisit the issue of estimated truncation probabilities.

SRC  Truncation  Frequency     Frequency   Severity                    Severity
     point       distribution  parameters  distribution                parameters
1    τ = 1.167   Poisson       λ = 100     Burr                        α = 0.065, γ = 15, θ = 1.226
2    τ = 3.147   Poisson       λ = 100     log-SaS                     a = 1.06, b = 0.37, ε = 1.65, δ = 0.97
3    τ = 1.133   Poisson       λ = 100     X ∼ βX1 + (1−β)X2           β = 0.3
                                           X1 ∼ LGN                    µ = 0.4, σ = 0.16
                                           X2 ∼ Burr                   α = 0.065, γ = 15, θ = 1.226

Table 3.1: Table of frequency and severity distributions used to simulate operational losses for three SRCs for fourteen years spanning 2004-2017. The frequency distribution is the same for each SRC, Pois(λ = 100). The loss severity distributions for SRC 1, SRC 2, and SRC 3 are Burr, log-SaS, and a mixture model where component 1 is simulated from lognormal and component 2 is simulated from Burr.

3.1 Exploratory Data Analysis

To create truncation in our data, we truncate each SRC at its known 2.5% quantile and only use the truncated sample for the remainder of this section. Summary statistics for the truncated SRCs are given below in Table 3.2. As is common in operational loss data, the data exhibit extreme right-skewness, as evidenced by the mean > median and the maximum value ≫ the 75% quantile.

Time series plots of the annual number of observable loss events are presented in Figure 3.1. Since all loss frequencies were simulated from a Pois(λ = 100) distribution and then truncated at each loss severity's 2.5% quantile, we know that the true Poisson rate for observable losses is 97.5. We assume the loss frequency distribution is Poisson with no trend, so any perceived trend in the loss events is disregarded.
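The rate of 97.5 follows from the thinning property of the Poisson distribution: if each of a Poisson(λ) number of losses independently falls below the threshold with probability p, the number of observable losses is Poisson with rate λ(1 − p). Equation (2.14) simply inverts this relationship. The short Python sketch below checks both directions; the function names are hypothetical and chosen only for this illustration.

```python
def observable_rate(total_rate, truncation_prob):
    """Poisson rate of losses above the threshold (independent thinning)."""
    return total_rate * (1.0 - truncation_prob)

def grossed_up_rate(observed_rate, truncation_prob):
    """Equation (2.14): rate for all losses, above and below the threshold."""
    if not 0.0 <= truncation_prob < 1.0:
        raise ValueError("truncation probability must lie in [0, 1)")
    return observed_rate / (1.0 - truncation_prob)

# Simulation design of this chapter: lambda = 100, truncation at the 2.5% quantile.
lam_observable = observable_rate(100.0, 0.025)
lam_total = grossed_up_rate(lam_observable, 0.025)

# Sensitivity: the same observed rate under a much larger estimated
# truncation probability implies a far larger total rate.
lam_total_high = grossed_up_rate(lam_observable, 0.5)
```

The last line shows why the estimated truncation probability matters for RC: moving the estimate from 0.025 to 0.5 roughly doubles the implied number of annual loss events.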
Under each plot in Figure 3.1 is the mean and variance of the number of observable losses from 2004-2017.

Table 3.2: Summary statistics for each simulated SRC, named SRC 1, SRC 2, and SRC 3, respectively. From left to right, the columns show the name of the SRC, sample size, minimum observable loss, 25th percentile, median, mean, 75th percentile, and maximum loss.

Figure 3.1: Number of observable annual losses from 2004-2017 for each SRC, with the mean and variance under each plot.

Figure 3.2 presents the sample densities of the log-losses in the top row and histograms of the smallest 75% of the raw losses in the second row. Since the data exhibit extreme skewness, sample densities of the log-losses better highlight the differences between the SRCs. The histograms allow us to see whether there is a clear mode in the raw data or if the mode may fall below the truncation point for an underlying unimodal distribution.

Figure 3.2: Top row: Sample densities of the log-losses for each SRC. Second row: Histogram for the smallest 75% of raw losses.

For each SRC, we create 30 evenly spaced buckets ranging from the minimum loss in each SRC to its 75% empirical quantile. Some subjectivity is needed to interpret the histograms, as modes may arise out of the number of buckets used and not necessarily from the data. Viewing these same histograms with 25 and 20 buckets may appear to yield different stories. One thing that this visualization can tell us is that any turn in the densities for SRCs 1 and 2 must be sharp, since the mode appears in the first few buckets and losses cannot be negative. If we assume the underlying severity distribution is unimodal, we expect to have a very low truncation point between zero and the first bucket.
See Section 3.4 for further analysis of the issues with truncation probability estimation and how one may use these histograms in practice.

3.2 Loss Severity Distribution Estimation and Selection

In this section, we use all of the historical truncated loss data to estimate and select distributions for each SRC. This process is somewhat subjective, since one must use their own judgement to gauge whether a distribution seems reasonable.

3.2.1 SRC 1

To generate losses for SRC 1, we first simulate 14 independent and identically distributed Pois(λ = 100) random variables, one for each of the 14 years encompassing 2004-2017, inclusive. The loss frequencies for SRC 1 can be represented as a time series $\{N^{1}_{t}\}$, where $N^{1}_{t} \overset{\text{iid}}{\sim}$ Pois(λ = 100) for $t = 2004, 2005, \ldots, 2017$. For each simulated loss frequency $n_t$, we generate $n_t$ independent and identically distributed loss severities from a Burr(α = 0.065, γ = 15, θ = 1.226) distribution and assign them to year $t$. Finally, all loss severities are truncated at τ = 1.167, the 2.5% quantile of the true loss severity distribution.

We perform MLE on the truncated sample as outlined in Section 2.1. Figure 3.3 plots the sample density of the log-losses against the densities for the true loss severity distribution and each estimated candidate distribution on the log scale. The densities shown are conditional densities to emphasize fit to the truncated sample. The plot of the true underlying distribution against the sample gives us a good indication of how representative the sample is of its generating process. Density plots are a good sanity check to make sure the MLE algorithm is producing reasonable results.

Figure 3.3: The sample density plotted against the true severity model and each estimated candidate distribution for SRC 1 on the log scale.
The number of observations in SRC 1 is 1368, and the smoothing bandwidth is 0.1709.

The density plots do not rule out any candidate distributions, since all estimated densities seem to fit the data, but the lognormal, generalized Pareto, Weibull, and loglogistic distributions are unable to capture the mode of the sample density. These are all 2-parameter distribution families and are typically not able to accommodate both lower and upper tail behaviors. This is indicative of a high truncation point. Also, the density plots do not tell us much about the fit in the extreme right tail, a portion of the distribution that is of much interest for OR modeling, and cannot be used to compare fit or predictive ability across models. Table 3.3 gives the truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, quantile score, out-of-sample AIC, and estimated 99.9% quantile for each estimated candidate distribution. Values from the underlying true model are given in the first row, and all subsequent rows are sorted from best to worst AIC. The number in parentheses in the other columns shows the rank from best to worst within a given column. The rank for the 99.9% quantile estimate is by distance to the true model's 99.9% quantile. Tables 3.4 and 3.5 are analogous to Table 3.3 for SRCs 2 and 3, respectively.

The Burr distribution has the best fit based on AIC and BIC, passes the modified Anderson-Darling test, has a reasonable truncation probability, and gives the best 99.9% quantile estimate. We consider “reasonable” truncation probabilities to be $0.01 \le F(\tau; \hat{\Theta}) \le 0.5$, which excludes the lognormal, generalized Pareto, Weibull, and loglogistic distributions. The Burr distribution also dominates the predictive measures of the QS and out-of-sample AIC. A logical choice for the loss severity distribution of SRC 1 is Burr.

Finally, QQ-plots give a good visualization of the fit in the right tail and are presented in Figure 3.4.
Like the density plots, we show the QQ-plots for the true model as well as the estimated candidate distributions. The sample quantiles are given on the x-axis while the estimated quantiles are given on the y-axis. Thus, points below the diagonal line indicate quantiles that are underestimated by the candidate distribution. The QQ-plots inform us that all candidate distributions significantly underestimate the two largest observations.

Table 3.3: SRC 1 selection criteria from left to right: truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile. Values from the true model are given in the first row, and subsequent rows are sorted by AIC from best to worst. Ranks from best (1) to worst (9) are presented for other criteria. The rank for the 99.9% quantile estimate is by distance to the true quantile. Lgn/Lgn and Lgn/Gpd refer to the spliced distributions.

3.2.2 SRC 2

To generate losses for SRC 2, we again simulate 14 independent and identically distributed Pois(λ = 100) random variables. For each simulated loss frequency $n_t$, we generate $n_t$ independent and identically distributed loss severities from a log-SaS(a = 1.06, b = 0.37, ε = 1.65, δ = 0.97) distribution and assign them to year $t$. Finally, all loss severities are truncated at τ = 3.147, the 2.5% quantile of the true loss severity distribution.

Figure 3.5 plots the sample density of the log-losses against the densities for the true loss severity distribution and each estimated candidate distribution on the log scale. All distributions fit the data, but we note that the lognormal, generalized Pareto, Weibull, and loglogistic distributions are not able to capture the mode seen in the sample density. Therefore, their estimated truncation probabilities are going to be high.

Figure 3.4: QQ-plots for the true model and each estimated candidate distribution for SRC 1.
Estimated quantiles are given on the vertical axis with empirical quantiles along the horizontal. Points below the 45° line indicate that the candidate distribution underestimates the empirical quantile.

Figure 3.5: The sample density plotted against the true severity model and each estimated candidate distribution for SRC 2 on the log scale. The number of observations in SRC 2 is 1342, and the smoothing bandwidth is 0.2209.

From Table 3.4, both the g-and-h and log-SaS distributions are appropriate for modeling SRC 2. We give a slight edge to log-SaS due to its superior quantile score. An important observation is that the Weibull distribution has the best QS even though it has the worst fit based on AIC. The QQ-plots in Figure 3.6 can help explain this. Remember that the quantile score at the 99.9% quantile is very asymmetric and penalizes more for underestimating the quantile than for overestimating it. From the QQ-plots below, we see that the Weibull distribution can get relatively close to the extreme quantiles, but the distribution is likely to underestimate the extreme right tail. However, all other distributions overestimate the 99.9% quantile by so much that it overcomes the QS's asymmetry. This is an important feature of the quantile score from the bank's perspective: it rewards accurately estimating the right tail without the extreme overestimation seen in the literature. Finally, the Weibull distribution would not be selected for its QS due to its extremely high truncation probability. The MLE parameters presented in Appendix B.2 show that the Weibull distribution's scale parameter is hitting the boundary condition θ > 0.

Table 3.4: SRC 2 selection criteria from left to right: truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile. Values from the true model are given in the first row, and subsequent rows are sorted by AIC from best to worst.
Ranks from best (1) to worst (9) are presented for other criteria. The rank for the 99.9% quantile estimate is by distance to the true quantile. Lgn/Lgn and Lgn/Gpd refer to the spliced distributions.

Figure 3.6: QQ-plots for the true model and each estimated candidate distribution for SRC 2. Estimated quantiles are given on the vertical axis with empirical quantiles along the horizontal. Points below the 45° line indicate that the candidate distribution underestimates the empirical quantile.

3.2.3 SRC 3

To generate losses for SRC 3, we again simulate 14 independent and identically distributed Pois(λ = 100) random variables. For each simulated loss frequency $n_t$, we generate $n_t$ independent and identically distributed loss severities from a mixture distribution, $F(x; \Theta) = \beta F_1(x; \Theta_1) + (1 - \beta) F_2(x; \Theta_2)$, where β = 0.3, $F_1(x; \Theta_1)$ is LN(µ = 0.16, σ = 0.4), and $F_2(x; \Theta_2)$ is Burr(α = 0.065, γ = 15, θ = 1.226). The loss severities are assigned to year $t$. Finally, all loss severities are truncated at τ = 1.133, the 2.5% quantile of the true loss severity distribution.

Figure 3.7 plots the sample density of the log-losses against the densities for the true loss severity distribution and each estimated candidate distribution on the log scale. All distributions fit the data, but we note that the lognormal, generalized Pareto, Weibull, and loglogistic distributions are not able to capture the mode seen in the sample density.

Figure 3.7: The sample density plotted against the true severity model and each estimated candidate distribution for SRC 3 on the log scale. The number of observations in SRC 3 is 1365, and the smoothing bandwidth is 0.1479.

Table 3.5 gives the truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile for each estimated candidate distribution.
Values from the underlying true model are given in the first row, and all subsequent rows are sorted from best to worst AIC.

Table 3.5: SRC 3 selection criteria from left to right: truncation probability estimate, BIC, AIC, modified Anderson-Darling test at the 95% confidence level, QS, out-of-sample AIC, and estimated 99.9% quantile. Values from the true model are given in the first row, and subsequent rows are sorted by AIC from best to worst. Ranks from best (1) to worst (9) are presented for other criteria. The rank for the 99.9% quantile estimate is by distance to the true quantile. Lgn/Lgn and Lgn/Gpd refer to the spliced distributions.

The results for the LGNLGN distribution (denoted Lgn/Lgn in Table 3.5) should be scrutinized. Due to the high truncation probability estimate for the single lognormal distribution, it is likely that the MLE for the tail of the lognormal body spliced with lognormal tail distribution is unable to converge. A quick investigation into the estimated parameters presented in Appendix B.3 confirms this. The estimated splicing point for the Lgn/Lgn distribution is 1.8977. Remember from Section 2.1.3 that the lognormal tail is treated as a truncated distribution with a truncation point equal to the splicing point. The lognormal upper tail parameters are $\mu_u = -7.477$ and $\sigma_u = 3.1516$, and the truncation probability of the lognormal tail is 0.995. This is a boundary condition in the MLE algorithm and indicates that the Lgn/Lgn distribution results cannot be used.

Additionally, one may wonder why the modified Anderson-Darling test is rejecting the Burr distribution when Burr has the best QS and we know the true tail behavior is Burr. Since losses for SRC 3 are generated from a mixture distribution with distinct components, the estimated Burr parameters are affected by the body of the losses generated by the lognormal component, resulting in an underestimated 99.9% quantile.
Thus, the estimated Burr distribution may have a tail that behaves differently from the mixture distribution's Burr component. Another issue is caused by the weights used in calculating the test statistic. Since 30% of the distribution is generated by a lognormal distribution, the weighting function should assign more weight to the tail than it does in our calculation. More importantly, this shows the effectiveness of using the QS and QQ-plots, rather than the modified Anderson-Darling test, for assessing a distribution's ability to model tail behavior.

Finally, we observe the effect of using AIC and BIC, which focus on the central portion of the distribution, when trying to estimate the 99.9% quantile. The Burr distribution, which is the true tail behavior for SRC 3, is unable to capture the tail behavior because the mode of the losses occurs in the body of the distribution, most of which is generated from the lognormal component. This scenario of model misspecification, which is likely to occur when working from a candidate distribution list, demonstrates the importance of combining criteria that focus on overall fit, such as AIC, with performance at extreme quantiles, such as QS.

3.3 Loss Severity Selection Criteria and Regulatory Capital

In this section, we calculate RC for the years 2014, 2015, ..., 2018 under real-world conditions faced by a bank using AMA. To calculate RC for year $T+1$, for $T+1 = 2014, 2015, \ldots, 2018$, we remove all of the operational losses for the years $T+1, T+2, \ldots, 2018$. This recreates the challenge of estimating a forward-looking RC. Since we know the true value of RC, we are able to compare each year's calculation to the true value. All estimated parameters are available in Appendices A.2, A.3, and A.4 for SRC 1, 2, and 3, respectively.

Figure 3.8: QQ-plots for the true model and each estimated candidate distribution for SRC 3. Estimated quantiles are given on the vertical axis with empirical quantiles along the horizontal.
Points below the 45° line indicate that the candidate distribution underestimates the empirical quantile.

All historical losses for the years $2004, 2005, \ldots, T$ are used to estimate parameters for the candidate severity distributions using the MLE approach. Loss severity distribution selection for each SRC is performed using two separate objective criteria. First, we select the distribution with the lowest AIC among those with a truncation probability estimate less than 0.5. The selected loss severity distributions are then used to derive the Poisson rate parameter for all losses. Then, we can empirically derive the marginal distributions for total annual operational loss from each SRC, denoted $F_{S_r}$ (see Section 2.3). Finally, we use t-copulas with 10 degrees of freedom and correlation parameters 0 and 0.1, respectively, to find the distribution for the total annual operational loss. From this distribution, we calculate RC as the 99.9% quantile. The process is then repeated using quantile score instead of AIC.

3.3.1 Marginal Distributions

For each SRC, Figures 3.9, 3.10, and 3.11 show side-by-side plots of the 99.9% quantile from the best loss severity distribution each year based on AIC and quantile score and the 99.9% quantile of the SRC's marginal distribution for total annual loss. An important observation from these plots is that the loss severity quantile can be used as a proxy for the total annual loss quantile.

For SRC 1, the plots are given in Figure 3.9. The Burr distribution wins every year based on AIC and all but one year when using QS. In 2015, the g-and-h distribution had the best QS. We observe that changes in the winning distribution can cause drastic changes in the extreme quantiles. The underestimation of the g-and-h distribution in 2015 is due to the QS selecting a distribution based only on the 99.9% quantile.
Thus, we see that the rest of the distribution may contribute a lot to the total annual risk for SRC 1, and measures of overall fit should also be considered when selecting a distribution.

For SRC 2, the plots are given in Figure 3.10. When selecting the best distribution by AIC, log-SaS wins in 2014 and 2015, but is then beaten by g-and-h for the remaining years. When selecting the loss severity distribution based on QS, log-SaS wins every year.

For SRC 3, the plots are given in Figure 3.11. When selecting the best distribution by AIC, the lognormal body spliced with generalized Pareto tail wins every year. When using QS, the Burr distribution wins. SRC 3 creates the model misspecification scenario and yields the most interesting results.

Using AIC alone may lead to selecting a severity distribution that drastically overestimates risk, which is a deterrent from the bank's perspective. We see this demonstrated in Figure 3.11 for the earlier years in the simulation. The QS selection criterion, however, chooses distributions that consistently underestimate the 99.9% quantile of the distribution of SRC 3. This scenario is also seen when analyzing SRC 1. Selecting severity distributions by the best QS can severely underestimate risk. This signals that the rest of the distribution, below the 99.9% quantile, has considerable impact on the annual losses for SRC 3.

If we believe that model misspecification is likely to occur, then these results promote the use of AIC and BIC together with the QS, truncation probability estimates, and graphical tools like QQ-plots and density plots to get a full picture of the loss severity data.

Figure 3.9: The top plot shows the log of the 99.9% quantile from SRC 1's loss severity distribution as selected by AIC and QS. The shorthand name for the loss severity distribution shows each year's selection. The bottom plot shows the log of the 99.9% quantile for SRC 1's total annual loss marginal distribution.

Figure 3.10: The top plot shows the log of the 99.9% quantile from SRC 2's loss severity distribution as selected by AIC and QS. The shorthand name for the loss severity distribution shows each year's selection. The bottom plot shows the log of the 99.9% quantile for SRC 2's total annual loss marginal distribution.

3.3.2 Regulatory Capital

We use the loss severity distributions from the previous section to compare RC when selecting severity distributions using AIC versus QS. Since there are no data to assess the dependence across SRCs, we use t-copulas with 10 degrees of freedom and correlation parameters 0 and 0.1 when combining losses for total annual operational loss. Using these two separate correlation parameters, we see that RC barely changes, as presented in Figure 3.12, suggesting little sensitivity to changes in the dependence between SRCs. This analysis is limited, however, and using various degrees of freedom with different correlation parameters can provide a better picture of sensitivity.

We see a similar story to the one told by the marginal distribution quantile plots, namely that selection by quantile score yields RC consistently below both the true value and the RC calculated by choosing loss severity distributions by AIC. This seems to suggest that accurate RC calculations should consider more than just the 99.9% quantile.

3.4 Challenges of Truncation Probabilities

Without any data collection or expert opinion of the operational loss severities below an SRC's minimum reporting threshold, it is difficult to set a general interval of reasonable truncation probability estimates. While we can set conservatively wide ranges, such as (0.01, 0.5), it may be possible that the actual proportion of an SRC's losses below the threshold falls outside this interval. This problem of truncation probabilities exists throughout the industry with no consensus on how to handle it [AMA Group, 2013].
Using either the naive approach or the shifted approach avoids truncation probabilities altogether, but they have their own drawbacks, as discussed in Section 2.1.

In addition to the challenges directly associated with estimating truncation probability, a study by Yu and Brazauskas [2017] analyzed the differences in RC estimates between the three approaches and concluded that the truncation approach provides the lowest RC (called VaR by Yu and Brazauskas) estimates when using LDA:

"We demonstrate that for a fixed probability distribution, the choice of the truncation approach yields lowest VaR estimates, which may be viewed as beneficial to the bank, whilst the naive and shifted approaches lead to higher estimates of VaR."

The uncertainty surrounding the truncation probability estimate may be an explanation for the phenomenon of RC estimates being systematically lower when using the truncation approach. If the truncation probability estimate is too high, then too many of the simulated losses may be too small.

Figure 3.11: The top plot shows the log of the 99.9% quantile from SRC 3's loss severity distribution as selected by AIC and quantile score. The shorthand name for the loss severity distribution shows each year's selection. The bottom plot shows the log of the 99.9% quantile for SRC 3's total annual loss marginal distribution.

Figure 3.12: The top plot shows the log of RC as the 99.9% quantile for total annual operational loss when using a t-copula with 10 degrees of freedom and correlation parameter of 0 to combine losses across SRCs. The bottom plot shows the log of RC with a correlation parameter of 0.1. In both plots, the true RC is the solid line, the dashed line is RC using estimated distributions selected by AIC, and the dotted line shows RC using estimated distributions selected by QS.

To investigate this idea further, we simulate losses in a scenario where we have more information than in reality.
We assume a known distribution and draw 1000 independent samples, each of size 2500, from the same Burr distribution used to generate SRC 1. We then truncate each sample at the actual 2.5%, 5%, 10%, 20%, and 40% quantiles and estimate the truncation probability assuming the Burr distribution. Figure 3.13 shows the distribution of truncation probability estimates under the known model for different truncation points.

As expected, when the truncation point is small, the distribution of truncation probability estimates is very positively skewed. As the truncation point is increased, we see larger standard errors. With a truncation point at the 20% quantile, we have a bimodal distribution with many estimates far exceeding the actual value. Without any knowledge of the losses below the truncation point, we acknowledge that truncation probability estimates introduce more uncertainty in calculating RC.

Figure 3.13: Distribution of truncation probability estimates for 1000 truncated samples of size 2500 at the 2.5%, 5%, 10%, 20%, and 40% quantiles. Maximum likelihood estimation is performed using the Burr distribution. Even under these optimal conditions, there is a lot of uncertainty in the truncation probability estimate.

In light of the challenges in estimating truncation probability, we propose a simple alternative. The purpose of the truncation approach is to estimate an entire distribution over which we can simulate losses for each SRC when using the LDA. Using the histograms presented in Figure 3.2, we can make reasonable assumptions about the number of losses below the truncation point. For example, the histograms for SRC 1 and SRC 3 seem to indicate a mode for the losses around the 2nd or 3rd bucket. Under the assumptions of the truncation approach combined with our list of candidate distributions, it would be reasonable to assume that the number of losses for each bucket below the truncation point can be bounded above by the mode of the observable losses.
Under this assumption, we could eliminate candidate distributions that do not have a truncation probability below this "worst case" scenario. Unfortunately, this would not help us with losses exhibited by an SRC that does not show a clear mode in the histogram and whose mode might lie below the truncation point. Any data collection for losses below the threshold, even in aggregate, improves the truncation approach by setting bounds on the truncation probability estimates.

As mentioned in Section 2.1.4, the truncation probability is a useful tool for eliminating inappropriate distributions that seek to mimic tail behavior by increasing the truncation probability. Using the same 1000 Burr samples used to generate Figure 3.13, estimating the truncation probability using the lognormal distribution produces no estimates below 0.9. These high estimates would lead to extremely high Poisson rates for the loss frequencies when simulating losses. Therefore, the speed and efficiency of using the truncation approach to select a loss severity distribution is increased by considering the truncation probability estimate.

Chapter 4

Conclusion

While both regulators and financial institutions share the main objective of measuring OR as accurately as possible, they have diametrically opposing priorities. Regulators want to minimize a bank's exposure to financial ruin from an operational loss and thus want operational risk modeled as accurately as possible while minimizing underestimation. Since a bank must set aside assets equal to RC to cover potential losses, the bank wants to accurately model operational risk while minimizing overestimation. While we do not propose a solution to this problem, the research presented supports the quantile score as a function that considers the priorities of both stakeholders.

Since the 99.9% quantile of a loss severity distribution is a good proxy for that of an SRC's total annual loss, selecting distributions based on the quantile score is intuitive and easily interpretable.
The asymmetry of the quantile score at the 99.9% quantile penalizes underestimation more than overestimation, which aligns with the priorities of regulators. However, the quantile score also penalizes overestimation, and severe overestimation is a concern for banks that is also seen in operational risk research [Dutta and Perry, 2006]. Thus, the quantile score accounts for both the regulator's and the bank's priorities.

A concern when using the quantile score at the 99.9% quantile to select a severity distribution is that it ignores the rest of the distribution, which can make up a significant portion of an SRC's total annual loss. One way to address this issue is to calculate the quantile score at multiple quantiles. Since we are most concerned with the right tail of the loss severity distributions, we suggest using the "letter values" presented in Hoaglin [1985], which comprise the quantiles $1-(1/2)^n$ for $n = 2, 3, \ldots, 8$. If the focus is on achieving the best quantile score in the right tail, however, then the maximum likelihood approach may not provide parameters that minimize the quantile score objective function, equation (2.13). Minimizing equation (2.13) is a computationally taxing process, so due diligence to eliminate inappropriate distributions, such as those with unrealistically high truncation probabilities, would be prudent.

The quantile score should be complemented with estimators such as AIC and BIC, which consider the entire distribution with a focus around the mode, and some combination of these estimators should provide better distribution selection than using only one. We noticed in the SRC 3 simulation that AIC and quantile score picked very different distributions, and this may be a signal that loss severities are generated from separate processes in the body and the tail, so combining these measures is especially important when there is model misspecification.
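As an illustration of the asymmetry and of the letter-value levels discussed above, the following Python sketch evaluates a quantile score at one level and summed over several levels. Equation (2.13) is not reproduced in this chapter, so the sketch assumes the standard pinball-loss form of Gneiting [2011], $S_\alpha(q, x) = (\mathbf{1}\{x \le q\} - \alpha)(q - x)$, averaged over the observed losses. The function names are hypothetical and this is not the code used to produce the results in this report.

```python
def quantile_score(q_hat, losses, alpha):
    """Average pinball (quantile) score of the quantile estimate q_hat at level alpha.

    Assumed form: S(q, x) = (1{x <= q} - alpha) * (q - x). Lower is better.
    """
    total = 0.0
    for x in losses:
        indicator = 1.0 if x <= q_hat else 0.0
        total += (indicator - alpha) * (q_hat - x)
    return total / len(losses)


def letter_value_levels(n_max=8):
    """Letter-value levels 1 - (1/2)^n for n = 2, ..., n_max (3/4, 7/8, ...)."""
    return [1.0 - 0.5 ** n for n in range(2, n_max + 1)]


def letter_value_score(quantile_fn, losses, n_max=8):
    """Sum of quantile scores over the letter-value levels.

    quantile_fn is a hypothetical callable mapping a level alpha to the
    fitted severity distribution's quantile at that level.
    """
    return sum(
        quantile_score(quantile_fn(alpha), losses, alpha)
        for alpha in letter_value_levels(n_max)
    )


# Asymmetry at the 99.9% level: missing a loss of 100 by 10 in either direction.
under = quantile_score(90.0, [100.0], 0.999)   # estimate too low
over = quantile_score(110.0, [100.0], 0.999)   # estimate too high
print(under, over)
```

Under this assumed form, an underestimate of a given size costs 0.999 per unit while an overestimate of the same size costs 0.001 per unit, which is the asymmetry that favors the regulator's priorities while still penalizing the overestimation that concerns banks.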
A single distribution may dominate the quantile score but be unable to capture the true loss generating process if it is a mixture model or some other combination of distributions. The opposite is also true. A distribution that dominates in AIC may fail to accurately capture the extreme right tail of the loss generating process and lead to drastic over- or underestimation.

The log-SaS distribution is a highly flexible distribution that performs well when modeling loss severities generated from different processes when using maximum likelihood estimation. Log-SaS solves the problems of the g-and-h distribution by having a support on the positive real numbers, a monotonic transformation over the entire parameter space, and an analytical inverse. The log-SaS distribution offers these advantages while maintaining the flexibility to capture various tail behaviors and shapes. Another benefit of the log-SaS severity distribution is that the parameters can be estimated using a log-transform of the loss severity data, aiding in the convergence of numerical algorithms and mitigating the badly-scaled problem.

Issues with maximum likelihood estimation and truncation probability estimates should be included in the loss severity distribution selection process, as they can quickly eliminate inappropriate distributions. When using the truncation approach, extremely high truncation probabilities are often the result of a non-converging maximum likelihood algorithm due to a lighter-tailed distribution's effort to mimic heavier-tailed behavior. Even if the algorithm is converging, high truncation probabilities may simply be unrealistic, and more data collection and/or expert opinion is desired for losses below the minimum reporting threshold.
For the g-and-h distribution specifically, care must be taken to analyze the results of maximum likelihood to avoid situations where too much of the distribution lies below zero.

This report has laid the groundwork for future work in loss severity distribution selection that incorporates both regulators' and banks' preferences for operational risk modeling. Inclusion of the quantile score shows promise in capturing the right tail of loss severity data, and future work may identify optimal quantiles to incorporate and assess the effectiveness of estimating distribution parameters using the quantile score in equation (2.13) as the objective function. We also present a new perspective on maximum likelihood estimation for the g-and-h distribution that includes penalties on the portion of the distribution below zero. This penalized likelihood approach can be improved by assessing the optimal penalty term and the weight given to the penalty, and may also be applied to other distributions to eliminate power tail mimicking of Gumbel-type distributions. We also stress that the truncation approach can benefit from additional data collection for losses below the reporting threshold and encourage banks to start this collection process.

Bibliography

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19(6):716–723.

AMA Group (2013). AMA quantification challenges: AMAG range of practice and observations on 'the thorny LDA topics'. Industry position paper, The Risk Management Association.

Baud, N., Frachot, A., and Roncalli, T. (2003). How to avoid over-estimating capital charge for operational risk? Operational Risk - Risk's Newsletter.

BCBS (2006). International convergence of capital measurement and capital standards. Technical report, Bank for International Settlements.

BCBS (2009). Observed range of practice in key elements of advanced measurement approaches (AMA). Survey, Bank for International Settlements.

BCBS (2011).
Operational risk - supervisory guidelines for the advanced measurement approaches. Technical report, Bank for International Settlements.

BCBS (2016). Standardised measurement approach for operational risk. Consultative document, Bank for International Settlements.

Beirlant, J., Goegebeur, Y., Teugels, J., and Segers, J. (2004). Statistics of Extremes: Theory and Applications. John Wiley & Sons, Ltd.

Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2015). An extreme value approach for modeling operational risk losses depending on covariates. The Journal of Risk and Insurance, 83(3):735–776.

Chernobai, A., Menn, C., Truck, S., and Rachev, S. (2005). A note on the estimation of the frequency and severity distribution of operational losses. The Mathematical Scientist, 30(2):1–10.

Chernobai, A., Rachev, S., and Fabozzi, F. (2007). Operational Risk: A Guide to Basel II Capital Requirements, Models, and Analysis. John Wiley & Sons, Inc.

Coles, S. (2001). An Introduction to Statistical Modelling of Extreme Values. Springer Series in Statistics.

Degen, M., Embrechts, P., and Lambrigger, D. (2007). The quantitative modeling of operational risk: Between g-and-h and EVT. ASTIN Bulletin, 37(2):265–291.

Dutta, K. and Perry, J. (2006). A tale of tails: An empirical analysis of loss distribution models for estimating operational risk capital. Working Papers 06-13, Federal Reserve Bank of Boston.

Economist (2016). The final bill - financial crime. The Economist, 11.

Embrechts, P. and Hofert, M. (2011). Practices and issues in operational risk modeling under Basel II. Lithuanian Mathematical Journal, 51(2):180–193.

Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer.

Foss, S., Korshunov, D., and Zachary, S. (2013). An Introduction to Heavy-Tailed and Subexponential Distributions. Springer Science + Business Media New York, 2nd edition.

Gilbert, R. (1987).
Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold Company, Inc.

Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746–762.

Grooters, D. and Reinink, B. (2013). Data set size for operational risk modelling. Technical report, Black Cat's Walk.

Hayashi, Y. (2018). Wells Fargo to pay $1 billion to settle risk management claims. Wall Street Journal, 20.

Hoaglin, D. (1985). Summarizing shape numerically: The g-and-h distributions. In Hoaglin, D., Mosteller, F., and Tukey, J., editors, Exploring Data Tables, Trends, and Shapes, chapter 11, pages 461–513. John Wiley & Sons, Inc.

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer Science + Business Media New York, 6th edition.

Jones, M. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96(4):761–780.

Lee, S. Y. and Kim, J. H. (2018). Exponentiated generalized Pareto distribution: Properties and applications towards extreme value theory. Communications in Statistics - Theory and Methods, DOI: 10.1080/03610926.2018.1441418.

Luo, X., Shevchenko, P., and Donnelly, J. (2007). Addressing the impact of data truncation and parameter uncertainty on operational risk estimates. The Journal of Operational Risk, 2(4):3–26.

McNeil, A. J., Frey, R., and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press, Princeton.

Panjer, H. (2006). Operational Risk: Modeling Analytics. John Wiley & Sons, Inc.

Perline, R. (2005). Strong, weak and false inverse power laws. Statistical Science, 20(1):68–88.

Peters, G. and Shevchenko, P. (2015). Advances in Heavy Tailed Risk Modeling: A Handbook of Operational Risk. John Wiley & Sons, Inc.

Polfeldt, T. (1970). Asymptotic results in non-regular estimation. Scandinavian Actuarial Journal, Supplemental Volumes 1-2.

Resnick, S. (1987).
Extreme Values, Regular Variation, and Point Processes. Springer.

Ross, S. (2010). A First Course in Probability. Pearson Education, Inc., 8th edition.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.

Sinclair, C., Spurr, B., and Ahmad, M. (1990). Modified Anderson-Darling test. Communications in Statistics - Theory and Methods, 19(10):3677–3686.

Strasburg, J. (2018). Barclays to pay $2 billion to resolve mortgage-securities claims. Wall Street Journal, 29.

Yu, D. and Brazauskas, V. (2017). Model uncertainty in operational risk modeling due to data truncation: A single risk case. Risks, 5(49).

Appendix A

Loss Severity Distributions

A.1 Lognormal Distribution

Let $Y$ be a normally distributed random variable with location parameter $\mu$ and scale parameter $\sigma$. If $Y \sim N(\mu,\sigma^2)$, then $X = e^Y$ is lognormally distributed with location parameter $\mu$ and scale parameter $\sigma$, denoted $LN(\mu,\sigma^2)$. To derive the lognormal distribution, let $Z \sim N(0,1)$. Then $Y = \mu+\sigma Z$ and

\[
\begin{aligned}
F_X(x) &= P(X \le x) = P(e^Y \le x) \\
&= P(\mu+\sigma Z \le \log(x)) \\
&= P\!\left(Z \le \frac{\log(x)-\mu}{\sigma}\right) \\
&= \Phi\!\left(\frac{\log(x)-\mu}{\sigma}\right),
\end{aligned}
\]

where $\Phi$ is the standard normal CDF.
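Let $\phi$ be the standard normal density function. Then for $\Theta = [\,\mu \;\; \sigma\,]$, where $\mu \in \mathbb{R}$ and $\sigma > 0$,

\[
F_X(x;\Theta) =
\begin{cases}
\Phi\!\left(\dfrac{\log(x)-\mu}{\sigma}\right) & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
f_X(x;\Theta) =
\begin{cases}
\dfrac{1}{\sigma x}\,\phi\!\left(\dfrac{\log(x)-\mu}{\sigma}\right) & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
F_X^{-1}(p;\Theta) = \exp\left\{\mu+\sigma\,\Phi^{-1}(p)\right\} \quad \text{for } 0 < p < 1.
\]

The conditional CDF and PDF for a lognormally distributed random variable with minimum threshold $\tau$ are derived from equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}(x;\Theta,\tau) = p$, for $0 < p < 1$, and solving for $x$:

\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
\dfrac{\Phi[(\log(x)-\mu)/\sigma]-\Phi[(\log(\tau)-\mu)/\sigma]}{1-\Phi[(\log(\tau)-\mu)/\sigma]} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\dfrac{\phi[(\log(x)-\mu)/\sigma]}{\sigma x\left[1-\Phi[(\log(\tau)-\mu)/\sigma]\right]} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{F}^{-1}(p;\Theta,\tau) = \exp\left\{\mu+\sigma\,\Phi^{-1}\!\left[(1-p)\,\Phi\!\left(\frac{\log(\tau)-\mu}{\sigma}\right)+p\right]\right\} \quad \text{for } 0 < p < 1.
\]

From equation (2.7), we derive the conditional likelihood function as

\[
\tilde{L}(\Theta;x,\tau) = \left[\sigma-\sigma\,\Phi\!\left(\frac{\log(\tau)-\mu}{\sigma}\right)\right]^{-n} \prod_{i=1}^{n} \frac{1}{x_i}\,\phi\!\left(\frac{\log(x_i)-\mu}{\sigma}\right).
\]

From equation (2.8), the conditional log-likelihood function is

\[
\tilde{\ell}(\Theta;x,\tau) = -n\log(\sigma)-n\log\!\left[1-\Phi\!\left(\frac{\log(\tau)-\mu}{\sigma}\right)\right]+\sum_{i=1}^{n}\log\!\left[\phi\!\left(\frac{\log(x_i)-\mu}{\sigma}\right)\right]-\sum_{i=1}^{n}\log(x_i).
\]

A.2 Generalized Pareto Distribution

As presented in Coles [2001], for a sequence of random variables $X_1, X_2, \ldots, X_n \overset{iid}{\sim} H$, let

\[
M_n = \max\{X_1, \ldots, X_n\}.
\]

Suppose that there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

\[
H^n(b_n+a_n z) = P\{(M_n-b_n)/a_n \le z\} \to G(z) \quad \text{as } n \to \infty
\]

for a non-degenerate distribution function $G$. Then $G$ is a member of the generalized extreme value (GEV) family,

\[
G(z) = \exp\left\{-\left[1+\xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},
\]

defined on $\{z : 1+\xi(z-\mu)/\sigma > 0\}$, where $\sigma > 0$ and $\mu, \xi \in \mathbb{R}$. For large $n$,

\[
\begin{aligned}
P\{(M_n-b_n)/a_n \le z\} &\approx G(z), \\
P\{M_n \le z\} &\approx G\!\left((z-b_n)/a_n\right) = G^*(z),
\end{aligned}
\]

where $G^*$ is another distribution from the GEV family.

Finally, for large enough $u > \mu$ and $\theta = \sigma+\xi(u-\mu)$, the distribution function of $(X-u)$ conditioned on $X > u$ is approximately

\[
F(x) =
\begin{cases}
1-\left[1+\dfrac{\xi}{\theta}(x-u)\right]^{-1/\xi} & \text{for } \xi \ne 0, \\
1-\exp\left\{-\dfrac{x-u}{\theta}\right\} & \text{for } \xi = 0,
\end{cases}
\]

defined for $x > u$ when $\xi > 0$ and $u < x < u-\theta/\xi$ when $\xi < 0$, with $\xi = 0$ interpreted as $\xi \to 0$, leading to an exponential distribution for the excess $x-u$ with parameter $1/\theta$. This is called the generalized Pareto distribution (GPD) family.

For our purposes, we work exclusively with the GPD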
where $\xi > 0$, so all functions for the remainder of Appendix A.2 assume tail parameter $\xi > 0$. The CDF, PDF, and quantile function for the GPD family with parameter vector $\Theta = [\,\xi \;\; \theta\,]$, where $\xi > 0$ is the tail parameter, $\theta > 0$ the scale parameter, and $u$ is the threshold, are

\[
F(x;\Theta,u) =
\begin{cases}
1-\left(1+\dfrac{\xi}{\theta}(x-u)\right)^{-1/\xi} & \text{for } x > u, \\
0 & \text{for } x \le u;
\end{cases}
\]

\[
f(x;\Theta,u) =
\begin{cases}
\dfrac{1}{\theta}\left(1+\dfrac{\xi}{\theta}(x-u)\right)^{-\frac{1+\xi}{\xi}} & \text{for } x > u, \\
0 & \text{for } x \le u;
\end{cases}
\]

\[
F^{-1}(p;\Theta,u) = u+\frac{\theta}{\xi}\left((1-p)^{-\xi}-1\right) \quad \text{for } 0 < p < 1.
\]

For single severity candidate distributions, the threshold $u = 0$ is not a parameter and thus is not included in the parameter vector. When estimating the parameters for the GPD family assuming a single severity distribution, operational loss amounts are treated as excesses over 0. Since our data are from a truncated sample with minimum reporting threshold $\tau$, we calculate the conditional CDF and PDF for a generalized Pareto distributed random variable using equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}(x;\Theta,u,\tau) = p$ and solving for $x$. Assuming $\tau > u$, the conditional CDF, PDF, and quantile functions are

\[
\tilde{F}(x;\Theta,u,\tau) =
\begin{cases}
1-\left(\dfrac{\theta+\xi(x-u)}{\theta+\xi(\tau-u)}\right)^{-1/\xi} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{f}(x;\Theta,u,\tau) =
\begin{cases}
\dfrac{1}{\theta+\xi(\tau-u)}\left(\dfrac{\theta+\xi(x-u)}{\theta+\xi(\tau-u)}\right)^{-\frac{1+\xi}{\xi}} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{F}^{-1}(p;\Theta,u,\tau) = u+\frac{(1-p)^{-\xi}\left(\theta+\xi(\tau-u)\right)-\theta}{\xi} \quad \text{for } 0 < p < 1.
\]

For the LGNGPD distribution, the tail distribution is the unconditional GPD with the threshold equal to the splicing point, $u = x_s$.

Using equation (2.7), the conditional likelihood function for the single severity generalized Pareto candidate distribution is

\[
\tilde{L}(\Theta;x,u,\tau) = \left[\theta+\xi(\tau-u)\right]^{n/\xi}\prod_{i=1}^{n}\left[\theta+\xi(x_i-u)\right]^{-\frac{1+\xi}{\xi}},
\]

and from equation (2.8), the conditional log-likelihood function for the single severity generalized Pareto candidate distribution is

\[
\tilde{\ell}(\Theta;x,u,\tau) = \frac{n}{\xi}\log\{\theta+\xi(\tau-u)\}-\left(\frac{1+\xi}{\xi}\right)\sum_{i=1}^{n}\log\{\theta+\xi(x_i-u)\}.
\]

To aid in the convergence of the MLE algorithm as mentioned in Section 2.1.4, we fit the log-transform of a generalized Pareto random variable. Let $X \sim GPD(\xi,\theta)$ with threshold $u$, and let $Y = \log(X)$.
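The CDF and PDF for $Y$ are derived, respectively, as

\[
F_Y(y;\Theta,\log(u)) = F(e^y;\Theta,u) \tag{A.1}
\]

\[
f_Y(y;\Theta,\log(u)) = e^y f(e^y;\Theta,u) \tag{A.2}
\]

Then $Y$ is said to follow the exponentiated generalized Pareto distribution [Lee and Kim, 2018], and the CDF and PDF for $Y$, using equations (A.1) and (A.2), are

\[
F_Y(y;\Theta,\log(u)) =
\begin{cases}
1-\left(1+\dfrac{\xi}{\theta}(e^y-u)\right)^{-1/\xi} & \text{for } y > \log(u), \\
0 & \text{for } y \le \log(u);
\end{cases}
\]

\[
f_Y(y;\Theta,\log(u)) =
\begin{cases}
\dfrac{e^y}{\theta}\left(1+\dfrac{\xi}{\theta}(e^y-u)\right)^{-\frac{1+\xi}{\xi}} & \text{for } y > \log(u), \\
0 & \text{for } y \le \log(u).
\end{cases}
\]

Assuming $\tau > u \ge 0$, the conditional CDF and PDF are

\[
\tilde{F}_Y(y;\Theta,\log(u),\log(\tau)) =
\begin{cases}
1-\left(\dfrac{\theta+\xi(e^y-u)}{\theta+\xi(\tau-u)}\right)^{-1/\xi} & \text{for } y > \log(\tau), \\
0 & \text{for } y \le \log(\tau);
\end{cases}
\]

\[
\tilde{f}_Y(y;\Theta,\log(u),\log(\tau)) =
\begin{cases}
\dfrac{e^y}{\theta+\xi(\tau-u)}\left(\dfrac{\theta+\xi(e^y-u)}{\theta+\xi(\tau-u)}\right)^{-\frac{1+\xi}{\xi}} & \text{for } y > \log(\tau), \\
0 & \text{for } y \le \log(\tau).
\end{cases}
\]

By letting $y = \log(x)$ and using equation (2.7), the conditional likelihood function for the log-losses is

\[
\tilde{L}(\Theta;y,u,\tau) = \left[\theta+\xi(\tau-u)\right]^{n/\xi}\exp\left\{\sum_{i=1}^{n} y_i\right\}\prod_{i=1}^{n}\left[\theta+\xi(e^{y_i}-u)\right]^{-\frac{1+\xi}{\xi}},
\]

and from equation (2.8), the conditional log-likelihood function for the log-losses is

\[
\tilde{\ell}(\Theta;y,u,\tau) = \frac{n}{\xi}\log\{\theta+\xi(\tau-u)\}+\sum_{i=1}^{n} y_i-\left(\frac{1+\xi}{\xi}\right)\sum_{i=1}^{n}\log\{\theta+\xi(e^{y_i}-u)\}.
\]

A.3 Burr Distribution

The parameterization of the candidate Burr distribution is a generalized three-parameter version of the Pareto distribution derived from Chapter 6 of Chernobai et al. [2007]. This parameterization allows for greater flexibility due to an additional shape parameter and is derived as

\[
F(x) = 1-\left(\frac{\beta}{\beta+x^{\gamma}}\right)^{\alpha} = 1-\left(\frac{\beta+x^{\gamma}}{\beta}\right)^{-\alpha} = 1-\left(1+\frac{x^{\gamma}}{\beta}\right)^{-\alpha} = 1-\left(1+\left(\frac{x}{\theta}\right)^{\gamma}\right)^{-\alpha},
\]

where $\theta = \beta^{1/\gamma}$. This parameterization of the Burr distribution has shape parameters $\alpha, \gamma > 0$ and scale parameter $\theta > 0$. The flexibility of this parameterization is in its ability to capture both Pareto and loglogistic distributions. When $\gamma = 1$, the Burr distribution reduces to the Pareto distribution with power tail $x^{-\alpha}$.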
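When $\alpha \le 1$, we have a very heavy-tailed distribution with infinite mean and variance. When $\alpha = 1$, the Burr distribution reduces to the loglogistic distribution.

The CDF, PDF, and quantile function for the Burr distribution with parameter vector $\Theta = [\,\alpha \;\; \gamma \;\; \theta\,]$ are

\[
F(x;\Theta) =
\begin{cases}
1-\left[1+\left(\dfrac{x}{\theta}\right)^{\gamma}\right]^{-\alpha} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
f(x;\Theta) =
\begin{cases}
\dfrac{\alpha\gamma\,x^{-1}\left(\frac{x}{\theta}\right)^{\gamma}}{\left(1+\left(\frac{x}{\theta}\right)^{\gamma}\right)^{1+\alpha}} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
F^{-1}(p;\Theta) = \theta\left((1-p)^{-1/\alpha}-1\right)^{1/\gamma} \quad \text{for } 0 < p < 1.
\]

The conditional CDF and PDF for a Burr distributed random variable with minimum threshold $\tau$ are derived from equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}(x;\Theta,\tau) = p$ and solving for $x$:

\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
1-\left(\dfrac{\theta^{\gamma}+\tau^{\gamma}}{\theta^{\gamma}+x^{\gamma}}\right)^{\alpha} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\alpha\gamma\,x^{\gamma-1}\left[\theta^{\gamma}+\tau^{\gamma}\right]^{\alpha}\left[\theta^{\gamma}+x^{\gamma}\right]^{-\alpha-1} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{F}^{-1}(p;\Theta,\tau) = \left[(1-p)^{-1/\alpha}\left[\theta^{\gamma}+\tau^{\gamma}\right]-\theta^{\gamma}\right]^{1/\gamma} \quad \text{for } 0 < p < 1.
\]

Using equation (2.7), the conditional likelihood function for the Burr candidate distribution is

\[
\tilde{L}(\Theta;x,\tau) = (\alpha\gamma)^{n}\left[\theta^{\gamma}+\tau^{\gamma}\right]^{n\alpha}\prod_{i=1}^{n} x_i^{\gamma-1}\left[\theta^{\gamma}+x_i^{\gamma}\right]^{-\alpha-1}.
\]

From equation (2.8), the conditional log-likelihood function for the Burr candidate distribution is

\[
\tilde{\ell}(\Theta;x,\tau) = n\log(\alpha\gamma)+n\alpha\log\left[\theta^{\gamma}+\tau^{\gamma}\right]+(\gamma-1)\sum_{i=1}^{n}\log(x_i)-(\alpha+1)\sum_{i=1}^{n}\log\left[\theta^{\gamma}+x_i^{\gamma}\right].
\]

MLE for the Burr distribution is performed on the log-loss data. Unfortunately, the log-losses do not eliminate the badly-scaled problem discussed in Section 2.1.4. To address this, we again reparameterize the Burr distribution. Making the following substitutions,

\[
\alpha^{*} = \alpha\gamma, \quad \zeta = \gamma, \quad \eta = \theta,
\]

the Burr CDF becomes

\[
F(x;\Theta) = 1-\left(1+\left(\frac{x}{\eta}\right)^{\zeta}\right)^{-\alpha^{*}/\zeta},
\]

where $*$ is used to differentiate the new $\alpha^{*}$ parameter from the original parameterization. In this parameterization, $\alpha^{*}$ is a tail parameter, and $\zeta$ is a shape parameter that can affect the lower tail. The Burr distribution is unimodal when $\zeta > 1$.

Now, let $X \sim \text{Burr}(\alpha,\gamma,\theta)$ and let $Y = \log(X)$. $Y$ is said to have a generalized logistic distribution, since the parameterization can be used to model both logistic and exponentiated Pareto random variables.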
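As a numerical sketch of the truncated Burr formulas above, in the original $(\alpha, \gamma, \theta)$ parameterization, the following Python functions evaluate the truncation probability $F(\tau;\Theta)$, the conditional log-likelihood, and the conditional quantile function. The function names and the parameter values in the usage lines are hypothetical, and this is not the estimation code used in this report, which fits the reparameterized model to log-losses.

```python
import math


def burr_cdf(x, alpha, gamma, theta):
    """Unconditional Burr CDF: 1 - [1 + (x/theta)^gamma]^(-alpha) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + (x / theta) ** gamma) ** (-alpha)


def burr_truncated_loglik(losses, tau, alpha, gamma, theta):
    """Conditional log-likelihood for losses observed above the threshold tau."""
    n = len(losses)
    tg = theta ** gamma
    return (
        n * math.log(alpha * gamma)
        + n * alpha * math.log(tg + tau ** gamma)
        + (gamma - 1.0) * sum(math.log(x) for x in losses)
        - (alpha + 1.0) * sum(math.log(tg + x ** gamma) for x in losses)
    )


def burr_truncated_quantile(p, tau, alpha, gamma, theta):
    """Conditional quantile function for 0 <= p < 1, given a loss exceeds tau."""
    tg = theta ** gamma
    return ((1.0 - p) ** (-1.0 / alpha) * (tg + tau ** gamma) - tg) ** (1.0 / gamma)


# Hypothetical parameter values, for illustration only.
alpha, gamma, theta, tau = 1.5, 2.0, 3.0, 1.0
print("truncation probability:", burr_cdf(tau, alpha, gamma, theta))
print("conditional 99.9% quantile:", burr_truncated_quantile(0.999, tau, alpha, gamma, theta))
```

In practice the log-likelihood would be passed to a numerical optimizer, and the resulting truncation probability would be checked against whatever upper bound is considered realistic for the SRC.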
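Using equations (A.1) and (A.2), the CDF, PDF, and quantile function for the generalized logistic distribution with parameter vector $\Theta = [\,\alpha^{*} \;\; \zeta \;\; \eta\,]$, where $\alpha^{*}, \zeta, \eta > 0$, are

\[
F_Y(y;\Theta) = 1-\left[1+\left(\frac{e^{y}}{\eta}\right)^{\zeta}\right]^{-\alpha^{*}/\zeta} \quad \text{for } y \in \mathbb{R};
\]

\[
f_Y(y;\Theta) = \frac{\alpha^{*}(e^{y}/\eta)^{\zeta}}{\left[1+(e^{y}/\eta)^{\zeta}\right]^{\alpha^{*}/\zeta+1}} \quad \text{for } y \in \mathbb{R};
\]

\[
F_Y^{-1}(p;\Theta) = \log(\eta)+\frac{1}{\zeta}\log\left\{(1-p)^{-\zeta/\alpha^{*}}-1\right\} \quad \text{for } 0 < p < 1.
\]

The conditional CDF and PDF for a generalized logistic distributed random variable with minimum threshold $\log(\tau)$ are derived from equations (2.5) and (2.6). The quantile function is found by setting $\tilde{F}_Y(y;\Theta,\log(\tau)) = p$ and solving for $y$. The log-scale threshold $\log(\tau)$ enters these expressions through $e^{\zeta\log(\tau)} = \tau^{\zeta}$:

\[
\tilde{F}_Y(y;\Theta,\log(\tau)) =
\begin{cases}
1-\left[\dfrac{\eta^{\zeta}+\tau^{\zeta}}{\eta^{\zeta}+e^{\zeta y}}\right]^{\alpha^{*}/\zeta} & \text{for } y > \log(\tau), \\
0 & \text{for } y \le \log(\tau);
\end{cases}
\]

\[
\tilde{f}_Y(y;\Theta,\log(\tau)) =
\begin{cases}
\dfrac{\alpha^{*}e^{\zeta y}\left(\eta^{\zeta}+\tau^{\zeta}\right)^{\alpha^{*}/\zeta}}{\left(\eta^{\zeta}+e^{\zeta y}\right)^{\alpha^{*}/\zeta+1}} & \text{for } y > \log(\tau), \\
0 & \text{for } y \le \log(\tau);
\end{cases}
\]

\[
\tilde{F}_Y^{-1}(p;\Theta,\log(\tau)) = \log(\eta)+\frac{1}{\zeta}\log\left\{\frac{1+(\tau/\eta)^{\zeta}}{(1-p)^{\zeta/\alpha^{*}}}-1\right\} \quad \text{for } 0 < p < 1.
\]

All estimated parameters are converted back to the original Burr$(\alpha,\gamma,\theta)$ distribution throughout the report.

By letting $y = \log(x)$ and using equation (2.7), the conditional likelihood function for the generalized logistic candidate distribution is

\[
\tilde{L}(\Theta;y,\log(\tau)) = \alpha^{*n}\left(\eta^{\zeta}+\tau^{\zeta}\right)^{n\alpha^{*}/\zeta}\prod_{i=1}^{n}\frac{\exp\{\zeta y_i\}}{\left[\eta^{\zeta}+\exp\{\zeta y_i\}\right]^{\alpha^{*}/\zeta+1}}.
\]

From equation (2.8), the conditional log-likelihood function for the generalized logistic candidate distribution is

\[
\tilde{\ell}(\Theta;y,\log(\tau)) = n\log(\alpha^{*})+\frac{n\alpha^{*}}{\zeta}\log\left(\eta^{\zeta}+\tau^{\zeta}\right)+\zeta\sum_{i=1}^{n} y_i-\left(\frac{\alpha^{*}}{\zeta}+1\right)\sum_{i=1}^{n}\log\left(\eta^{\zeta}+e^{\zeta y_i}\right).
\]

A.4 Weibull Distribution

The Weibull distribution is one of three limiting distributions in Extreme Value Theory (EVT); see Coles [2001]. In EVT, the Weibull distribution is the GEV distribution with negative shape parameter $\xi < 0$. The Weibull distribution is similar in shape to a lognormal distribution, but with a thinner tail. This distribution is widely used as the distribution of the lifetime of some object, particularly when the "weakest link" model is appropriate [Ross, 2010].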
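For parameter vector $\Theta = [\,a \;\; \theta\,]$, with shape parameter $a > 0$ and scale parameter $\theta > 0$, the CDF, PDF, and quantile function for the Weibull distribution are

\[
F(x;\Theta) =
\begin{cases}
1-\exp\left\{-\left[\dfrac{x}{\theta}\right]^{a}\right\} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
f(x;\Theta) =
\begin{cases}
\dfrac{a}{\theta}\left[\dfrac{x}{\theta}\right]^{a-1}\exp\left\{-\left[\dfrac{x}{\theta}\right]^{a}\right\} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
F^{-1}(p;\Theta) = \theta\left[-\log(1-p)\right]^{1/a} \quad \text{for } p \in (0,1).
\]

The conditional CDF and PDF for a Weibull distributed random variable with minimum threshold $\tau > 0$ are derived from equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}(x;\Theta,\tau) = p$ and solving for $x$, which gives $(x/\theta)^{a} = (\tau/\theta)^{a}-\log(1-p)$:

\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
1-\exp\left\{\left(\dfrac{\tau}{\theta}\right)^{a}-\left(\dfrac{x}{\theta}\right)^{a}\right\} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\dfrac{a}{\theta}\left[\dfrac{x}{\theta}\right]^{a-1}\exp\left\{\left(\dfrac{\tau}{\theta}\right)^{a}-\left(\dfrac{x}{\theta}\right)^{a}\right\} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{F}^{-1}(p;\Theta,\tau) = \theta\left[\left(\frac{\tau}{\theta}\right)^{a}-\log(1-p)\right]^{1/a} \quad \text{for } 0 < p < 1.
\]

Using equation (2.7), the conditional likelihood function for the Weibull candidate distribution given a sample $x \in \mathbb{R}^{n}$ is

\[
\tilde{L}(\Theta;x,\tau) = a^{n}\,\theta^{-na}\exp\{n(\tau/\theta)^{a}\}\prod_{i=1}^{n} x_i^{a-1}\exp\{-(x_i/\theta)^{a}\},
\]

and from equation (2.8), the conditional log-likelihood function for the Weibull candidate distribution is

\[
\tilde{\ell}(\Theta;x,\tau) = n\log(a)-na\log(\theta)+n\left(\frac{\tau}{\theta}\right)^{a}+(a-1)\sum_{i=1}^{n}\log(x_i)-\sum_{i=1}^{n}\left(\frac{x_i}{\theta}\right)^{a}.
\]

A.5 Loglogistic Distribution

Like the Weibull distribution, the loglogistic distribution is similar in shape to the lognormal distribution, but with a heavier right tail than both the lognormal and Weibull. Following Panjer [2006, p. 62], the loglogistic distribution has shape parameter $\gamma > 0$ and scale parameter $\theta > 0$.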
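The CDF, PDF, and quantile function for the loglogistic distribution are

\[
F(x;\Theta) =
\begin{cases}
\left[1+(x/\theta)^{-\gamma}\right]^{-1} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
f(x;\Theta) =
\begin{cases}
\dfrac{\gamma(x/\theta)^{\gamma}}{x\left[1+(x/\theta)^{\gamma}\right]^{2}} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]

\[
F^{-1}(p;\Theta) = \theta\left(\frac{1-p}{p}\right)^{-1/\gamma} \quad \text{for } 0 < p < 1.
\]

The conditional CDF and PDF for a loglogistic distributed random variable with minimum threshold $\tau > 0$ are derived from equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}(x;\Theta,\tau) = p$ and solving for $x$, which gives $x^{\gamma}(1-p) = p\,\theta^{\gamma}+\tau^{\gamma}$:

\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
\dfrac{x^{\gamma}-\tau^{\gamma}}{\theta^{\gamma}+x^{\gamma}} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\dfrac{\gamma\,x^{\gamma-1}\left[\theta^{\gamma}+\tau^{\gamma}\right]}{\left[\theta^{\gamma}+x^{\gamma}\right]^{2}} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]

\[
\tilde{F}^{-1}(p;\Theta,\tau) = \left[\frac{p\,\theta^{\gamma}+\tau^{\gamma}}{1-p}\right]^{1/\gamma} \quad \text{for } 0 < p < 1.
\]

Using equation (2.7), the conditional likelihood function for the loglogistic candidate distribution given a sample $x \in \mathbb{R}^{n}$ is

\[
\tilde{L}(\Theta;x,\tau) = \gamma^{n}\left[\theta^{\gamma}+\tau^{\gamma}\right]^{n}\prod_{i=1}^{n}\left\{\frac{x_i^{\gamma-1}}{\left[\theta^{\gamma}+x_i^{\gamma}\right]^{2}}\right\},
\]

and from equation (2.8), the conditional log-likelihood function for the loglogistic candidate distribution is

\[
\tilde{\ell}(\Theta;x,\tau) = n\log(\gamma)+n\log\left[\theta^{\gamma}+\tau^{\gamma}\right]+(\gamma-1)\sum_{i=1}^{n}\log(x_i)-2\sum_{i=1}^{n}\log\left[\theta^{\gamma}+x_i^{\gamma}\right].
\]

A.6 g-and-h Distribution

The g-and-h distribution, as presented by Hoaglin [1985], is a four-parameter generalization of the lognormal distribution resulting from a transformation of a standard normal random variable. The transformational nature of the g-and-h distribution allows one to use the quantiles of sample data relative to the quantiles of the standard normal distribution to estimate the parameters. This method, called the percentile method, allows the skewness and elongation parameters, $g$ and $h$ respectively, to be estimated as functions of the standard normal percentiles, $g(z_p)$ and $h(z_p)$ for percentile $p$, leading to an extremely flexible distribution. This flexibility enables the g-and-h distribution to model fatter or thinner tails and positive or negative skewness when compared to the lognormal distribution. Full details are available in Hoaglin [1985].

Let $Z \sim N(0,1)$.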
Then the random variable $X^*$ such that
\[
X^* = A_{g,h}(Z) = \left(\frac{e^{gZ} - 1}{g}\right) e^{hZ^2/2},
\]
is said to have a standard g-and-h distribution with skewness parameter $g$ and elongation parameter $h$. We refer to $X^*$ as "standard" since the g-and-h location and scale parameters are 0 and 1, respectively. When $g = 0$, there is no skewness. This can be seen by writing the Taylor expansion of $e^{gz}$,
\[
e^{gz} = 1 + gz + \frac{(gz)^2}{2!} + \frac{(gz)^3}{3!} + \cdots
\]
\[
\frac{e^{gz} - 1}{g} = z + \frac{g z^2}{2!} + \frac{g^2 z^3}{3!} + \cdots,
\]
and letting $g = 0$ in the right-hand side of the last equation, which leaves only $z$. To further generalize, let $a \in \mathbb{R}$ be a location parameter and $b > 0$ be a scale parameter. Then the random variable $X$, where
\[
X = a + b \cdot A_{g,h}(Z) = a + b\left(\frac{e^{gZ} - 1}{g}\right) e^{hZ^2/2},
\]
is said to have a g-and-h distribution.

For our purposes, estimating parameters using the percentile method has major drawbacks. First, the percentile method allows the elongation and skewness parameters to change unsystematically, which can cause $A_{g,h}(Z)$ to be non-monotonic in $Z$ with potentially multiple turning points. In addition, $h$ may take on negative values, which also causes non-monotonicity in $A_{g,h}(Z)$. While non-monotonicity is not a problem per se, multiple turning points cause the g-and-h PDF to become unwieldy when calculating likelihoods for MLE, AIC, and BIC. More importantly, however, non-monotonicity in $A_{g,h}(Z)$ may lead to undefined regions, where some observations from the sample do not have a defined inverse. When this occurs, there is no way to calculate a likelihood for these observations. The undefined regions tend to include the largest observations, which we are most concerned with when modeling OR. Thus, the observations with the highest impact on operational losses are the ones that we are least able to model.

The second drawback of the percentile method occurs as the number of statistically significant parameters increases. When $g$ and $h$ are allowed to be functions, the percentile method can easily result in 5 or more significant parameters, leading to the phenomenon known as overfitting the data [James et al., 2013].
As a result, the model's prediction accuracy suffers, even though accurate prediction is the goal of the AMA. Also, while $g(z_p)$ and $h(z_p)$ are assumed to be polynomials, there is no clear consensus on how to estimate the degree of the polynomials and their coefficients.

Finally, the selection of which percentiles to use when estimating parameters $g$ and $h$ by the percentile method is subjective and can greatly affect the estimates. Little research has been done on picking the optimal percentiles for various scenarios. Hoaglin suggests using the "letter values", the $\frac{3}{4}, \frac{7}{8}, \frac{15}{16}, \frac{31}{32}, \frac{63}{64}, \frac{127}{128}, \frac{255}{256}$ percentiles. This allows the upper tail of the distribution to be measured with some precision relative to other areas of the distribution. Even once the percentiles have been selected, the second drawback remains when estimating $g(z_p)$ and $h(z_p)$. Hoaglin assumes a linear form, but selects the model visually. For these reasons, the percentile method is better used as an exploratory data analysis technique to gauge the systematic and unsystematic properties of the sample data than as an estimation procedure.

Given the drawbacks associated with the percentile method, and in order to compare the estimated g-and-h distribution to the other candidate distributions, the maximum likelihood approach is used when estimating the g-and-h distribution for loss severity. The maximum likelihood approach requires that $A_{g,h}(Z)$ be monotonic, so we restrict both $g$ and $h$ to be constant, with $h > 0$. As shown by Degen et al. [2007], this restriction forces the g-and-h distribution to have regularly varying tails with index $-1/h$.

The skewness parameter, $g$, signifies both the direction and magnitude of skewness. Positive $g$ signifies positive skewness and larger $g$ signifies more skewness. When $g = 0$ and $h > 0$, the distribution is symmetric with fatter tails than the normal distribution. If $g = 1$ and $h = 0$, we have a shifted lognormal distribution.
In the context of positively skewed distributions, the lognormal distribution is assumed to have "neutral elongation". As a result, restricting the elongation parameter to positive values is not unreasonable when the tail is fatter than lognormal. The limitation of a constant $g$ may not be optimal, but is the same restriction imposed on the other candidate distributions used to model operational loss severities.

Let $A^{-1}_{g,h}\left(\frac{X-a}{b}\right)$ be the inverse standard g-and-h transformation, which does not have an analytical form, and let $A'_{g,h}(Z)$ be the derivative of the standard g-and-h transformation. Also, let $\Phi(z)$ and $\phi(z)$ be the standard normal CDF and PDF, respectively. Let $\Phi^{-1}(p)$ be the standard normal quantile function for $0 < p < 1$. Then, for parameter vector $\Theta = [\,a \;\; b \;\; g \;\; h\,]$ with $a, b, g, h \in \mathbb{R}$ and $b, h > 0$,
\[
F(x;\Theta) = \Phi\left[A^{-1}_{g,h}\left(\frac{x-a}{b}\right)\right] \quad \text{for } x \in \mathbb{R};
\]
\[
f(x;\Theta) = \frac{\phi\left[A^{-1}_{g,h}\left(\frac{x-a}{b}\right)\right]}{b\,A'_{g,h}\left[A^{-1}_{g,h}\left(\frac{x-a}{b}\right)\right]} \quad \text{for } x \in \mathbb{R};
\]
\[
F^{-1}(p;\Theta) = a + b\,A_{g,h}\left(\Phi^{-1}(p)\right) \quad \text{for } 0 < p < 1.
\]
The g-and-h distribution has support on the real number line, so the g-and-h loss severity distribution can lead to negative losses when performing simulations. To avoid this, simulations are performed from the conditional g-and-h distribution with minimum threshold $\tau = 0$. Using the conditional distribution for simulations is reasonable when the estimated truncation probability is sufficiently small. See Section 2.1.4 for details.

The conditional CDF and PDF for a g-and-h distributed random variable with minimum threshold $\tau$ are derived from equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}(x;\Theta,\tau) = p$ and solving for $x$.
\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
\dfrac{\Phi\left[A^{-1}_{g,h}((x-a)/b)\right] - \Phi\left[A^{-1}_{g,h}((\tau-a)/b)\right]}{1 - \Phi\left[A^{-1}_{g,h}((\tau-a)/b)\right]} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]
\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\dfrac{1}{1 - \Phi\left[A^{-1}_{g,h}((\tau-a)/b)\right]} \cdot \dfrac{\phi\left[A^{-1}_{g,h}((x-a)/b)\right]}{b\,A'_{g,h}\left[A^{-1}_{g,h}((x-a)/b)\right]} & \text{for } x > \tau, \\
0 & \text{for } x \le \tau;
\end{cases}
\]
\[
\tilde{F}^{-1}(p;\Theta,\tau) = a + b\,A_{g,h}\left\{\Phi^{-1}\left[(1-p)\,\Phi\left[A^{-1}_{g,h}\left(\frac{\tau-a}{b}\right)\right] + p\right]\right\} \quad \text{for } 0 < p < 1.
\]
The R function pchip() from the signal package is used to numerically derive the inverse function, $A^{-1}_{g,h}\left(\frac{x-a}{b}\right)$.
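As a rough illustration of what a numerical inverse of the g-and-h transformation involves, the sketch below inverts $A_{g,h}$ by plain bisection. This is a simpler stand-in and not the pchip() interpolation used in the thesis. It relies on $A_{g,h}$ being increasing in $z$ when $h \ge 0$. The function names, the bracket $[-10, 10]$, and the parameter values are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

def g_and_h(z, g, h):
    """Standard g-and-h transform A_{g,h}(z); g = 0 uses the limit z * exp(h z^2 / 2)."""
    tail = math.exp(h * z * z / 2.0)
    if g == 0.0:
        return z * tail
    return math.expm1(g * z) / g * tail

def g_and_h_inverse(x, g, h, lo=-10.0, hi=10.0, tol=1e-12):
    """Invert A_{g,h} by bisection (assumes h >= 0, so the transform is increasing)."""
    if not (g_and_h(lo, g, h) <= x <= g_and_h(hi, g, h)):
        raise ValueError("x lies outside the bracketed range of the transform")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_and_h(mid, g, h) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g_and_h_cdf(x, a, b, g, h):
    """Unconditional CDF: Phi applied to the inverse transform of (x - a) / b."""
    return NormalDist().cdf(g_and_h_inverse((x - a) / b, g, h))

def g_and_h_quantile(p, a, b, g, h):
    """Unconditional quantile function a + b * A_{g,h}(Phi^{-1}(p))."""
    return a + b * g_and_h(NormalDist().inv_cdf(p), g, h)

# Illustrative (hypothetical) parameter values.
a, b, g, h = 0.0, 1.0, 0.5, 0.1
x90 = g_and_h_quantile(0.90, a, b, g, h)
print(x90, g_and_h_cdf(x90, a, b, g, h))
```

Bisection is slower than interpolating over a precomputed grid, but it needs no external packages and makes the monotonicity requirement explicit.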
The pchip() function performs piecewise cubic Hermite (monotone) interpolation.

Using equation (2.7), the conditional likelihood function for the single severity g-and-h candidate distribution given a sample $x \in \mathbb{R}^n$ is
\[
\tilde{L}(\Theta;x,\tau) = b^{-n}\left[1 - \Phi\left(A^{-1}_{g,h}((\tau-a)/b)\right)\right]^{-n} \prod_{i=1}^{n} \frac{\phi\left[A^{-1}_{g,h}((x_i-a)/b)\right]}{A'_{g,h}\left[A^{-1}_{g,h}((x_i-a)/b)\right]},
\]
and from equation (2.8), the conditional log-likelihood function for the single severity g-and-h candidate distribution is
\[
\tilde{\ell}(\Theta;x,\tau) = -n\log(b) - n\log\left\{1 - \Phi\left[A^{-1}_{g,h}((\tau-a)/b)\right]\right\} + \sum_{i=1}^{n}\log\left\{\phi\left[A^{-1}_{g,h}((x_i-a)/b)\right]\right\} - \sum_{i=1}^{n}\log\left\{A'_{g,h}\left[A^{-1}_{g,h}((x_i-a)/b)\right]\right\}.
\]

A.7 log-SaS Distribution

The sinh-arcsinh (SAS) distribution introduced by Jones and Pewsey [2009] is a four-parameter distribution resulting from a transformation of a standard normal random variable, similar to the g-and-h distribution. The SAS distribution uses a monotonic transformation with an analytical inverse, giving it two major advantages over g-and-h when calculating likelihoods.

Let $Z \sim N(0,1)$ and $\varepsilon, \delta \in \mathbb{R}$ with $\delta > 0$. Also, let $\sinh(\cdot)$ be the hyperbolic sine function and $\sinh^{-1}(\cdot)$ be the inverse hyperbolic sine function (arcsinh). Then the random variable $Y^*$, such that
\[
Y^* = A_{\varepsilon,\delta}(Z) = \sinh\left\{\frac{\sinh^{-1}(Z) + \varepsilon}{\delta}\right\},
\]
is said to have a standard SaS distribution with skewness parameter $\varepsilon$ and tailweight parameter $\delta$. We consider $Y^*$ a "standard" SaS random variable since it assumes a location parameter of 0 and scale parameter of 1. To further generalize, let $a, b \in \mathbb{R}$ be the location and scale parameters, respectively, with $b > 0$. Then,
\[
Y = a + b \cdot A_{\varepsilon,\delta}(Z) = a + b\,\sinh\left\{\frac{\sinh^{-1}(Z) + \varepsilon}{\delta}\right\},
\]
where $Y \sim \mathrm{SaS}(a,b,\varepsilon,\delta)$.

We note that the generalized SAS transformation, $a + b\,A_{\varepsilon,\delta}(\cdot)$, has a closed-form inverse given by
\[
A^{-1}_{\varepsilon,\delta}\left(\frac{y-a}{b}\right) = \sinh\left[\delta \cdot \sinh^{-1}\left(\frac{y-a}{b}\right) - \varepsilon\right].
\]
To calculate the density function of a SAS random variable, we need the derivative of the inverse given by
\[
\frac{d}{dy} A^{-1}_{\varepsilon,\delta}\left(\frac{y-a}{b}\right) = \cosh\left[\delta \cdot \sinh^{-1}\left(\frac{y-a}{b}\right) - \varepsilon\right] \frac{\delta}{b\sqrt{\left(\frac{y-a}{b}\right)^2 + 1}},
\]
where $\cosh(\cdot)$ is the hyperbolic cosine function.

As pointed out by Jones and Pewsey [2009], $\varepsilon$ gives both the magnitude and direction of skewness.
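The closed-form inverse and its derivative given above can be checked numerically. The following is a minimal sketch using only the Python standard library. The function names and the parameter values are illustrative assumptions, not part of the thesis, whose computations were done in R.

```python
import math
from statistics import NormalDist

def sas_transform(z, eps, delta):
    """Standard SaS transform A_{eps,delta}(z) = sinh((asinh(z) + eps) / delta)."""
    return math.sinh((math.asinh(z) + eps) / delta)

def sas_inverse(u, eps, delta):
    """Closed-form inverse of the standard transform, evaluated at u = (y - a) / b."""
    return math.sinh(delta * math.asinh(u) - eps)

def sas_inverse_derivative(y, a, b, eps, delta):
    """Derivative with respect to y of the inverse transform at (y - a) / b."""
    u = (y - a) / b
    return math.cosh(delta * math.asinh(u) - eps) * delta / (b * math.sqrt(u * u + 1.0))

def sas_pdf(y, a, b, eps, delta):
    """Unconditional SaS density: phi(inverse) times the derivative of the inverse."""
    z = sas_inverse((y - a) / b, eps, delta)
    return NormalDist().pdf(z) * sas_inverse_derivative(y, a, b, eps, delta)

# Illustrative (hypothetical) parameter values.
a, b, eps, delta = 0.5, 2.0, 0.3, 0.8
y = a + b * sas_transform(0.8, eps, delta)      # map z = 0.8 forward
print(sas_inverse((y - a) / b, eps, delta))     # should recover a value close to 0.8
print(sas_pdf(y, a, b, eps, delta))
```

Because the inverse is available in closed form, each likelihood evaluation needs no root finding or interpolation, in contrast to the g-and-h case.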
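A positively skewed distribution will have parameter value $\varepsilon > 0$, with larger $\varepsilon$ indicating more skewness. The tailweight parameter, $\delta$, is inversely related to the thickness of the tails. As the value of $\delta$ approaches zero from the right, the tails become heavier. For example, distributions with heavier tails than the normal distribution have a tailweight parameter $0 < \delta < 1$. A tailweight parameter greater than 1 indicates thinner tails than the normal distribution.

Let $\Phi(\cdot)$ represent the standard normal CDF and $\Phi^{-1}(\cdot)$ be the standard normal quantile function. Then, for parameter vector $\Theta = [\,a \;\; b \;\; \varepsilon \;\; \delta\,]$ with $a, b, \varepsilon, \delta \in \mathbb{R}$ and $b, \delta > 0$,
\[
F_Y(y;\Theta) = \Phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{y-a}{b}\right)\right] \quad \text{for } y \in \mathbb{R};
\]
\[
f_Y(y;\Theta) = \phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{y-a}{b}\right)\right] \left(\frac{d}{dy} A^{-1}_{\varepsilon,\delta}\left(\frac{y-a}{b}\right)\right) \quad \text{for } y \in \mathbb{R};
\]
\[
F_Y^{-1}(p;\Theta) = a + b\,A_{\varepsilon,\delta}\left(\Phi^{-1}(p)\right) \quad \text{for } 0 < p < 1.
\]
The conditional CDF and PDF for a SaS distributed random variable with minimum threshold $\tau^*$ are derived from equations (2.5) and (2.6), respectively. The quantile function is found by setting $\tilde{F}_Y(y;\Theta,\tau^*) = p$ and solving for $y$.
\[
\tilde{F}_Y(y;\Theta,\tau^*) =
\begin{cases}
\dfrac{\Phi\left[A^{-1}_{\varepsilon,\delta}((y-a)/b)\right] - \Phi\left[A^{-1}_{\varepsilon,\delta}((\tau^*-a)/b)\right]}{1 - \Phi\left[A^{-1}_{\varepsilon,\delta}((\tau^*-a)/b)\right]} & \text{for } y > \tau^*, \\
0 & \text{for } y \le \tau^*;
\end{cases}
\]
\[
\tilde{f}_Y(y;\Theta,\tau^*) =
\begin{cases}
\dfrac{1}{1 - \Phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{\tau^*-a}{b}\right)\right]}\,\phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{y-a}{b}\right)\right] \left(\dfrac{d}{dy} A^{-1}_{\varepsilon,\delta}\left(\dfrac{y-a}{b}\right)\right) & \text{for } y > \tau^*, \\
0 & \text{for } y \le \tau^*;
\end{cases}
\]
\[
\tilde{F}_Y^{-1}(p;\Theta,\tau^*) = a + b\,A_{\varepsilon,\delta}\left\{\Phi^{-1}\left[(1-p)\,\Phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{\tau^*-a}{b}\right)\right] + p\right]\right\} \quad \text{for } 0 < p < 1.
\]
Using equation (2.7), the conditional likelihood function for the SaS candidate distribution given a sample $y \in \mathbb{R}^n$ is
\[
\tilde{L}(\Theta;y,\tau^*) = \frac{1}{\left\{1 - \Phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{\tau^*-a}{b}\right)\right]\right\}^{n}} \prod_{i=1}^{n}\left\{\phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{y_i-a}{b}\right)\right] \frac{d}{dy} A^{-1}_{\varepsilon,\delta}\left(\frac{y_i-a}{b}\right)\right\}.
\]
From equation (2.8), the conditional log-likelihood function for the single severity SaS candidate distribution is
\[
\begin{aligned}
\tilde{\ell}(\Theta;y,\tau^*) = {} & n\log(\delta) - n\log(b) - n\log\left\{1 - \Phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{\tau^*-a}{b}\right)\right]\right\} \\
& + \sum_{i=1}^{n}\log\left\{\cosh\left[\delta \cdot \sinh^{-1}\left(\frac{y_i-a}{b}\right) - \varepsilon\right]\right\} + \sum_{i=1}^{n}\log\left\{\phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{y_i-a}{b}\right)\right]\right\} \\
& - \frac{1}{2}\sum_{i=1}^{n}\log\left\{\left(\frac{y_i-a}{b}\right)^2 + 1\right\}.
\end{aligned}
\]
When attempting to fit the SAS distribution to actual operational loss data, we find that MLE suffers from the badly-scaled problem from Section 2.1.4.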
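To alleviate this issue, we treat losses as log sinh-arcsinh (LSAS) random variables.

If $Y \sim \mathrm{SaS}(a,b,\varepsilon,\delta)$, then $X = e^{Y}$ has an LSAS distribution with parameter vector $\Theta = [\,a \;\; b \;\; \varepsilon \;\; \delta\,]$, denoted by $X \sim \mathrm{lsas}(a,b,\varepsilon,\delta)$. Using the log-SaS distribution solves another problem of the g-and-h distribution, since the support of the log-SaS distribution is only the positive real numbers. Thus, the log-SaS distribution solves the problems associated with the g-and-h distribution without sacrificing the flexibility of a four-parameter distribution and proves to be a very useful distribution for modeling loss severities.

Let $X = e^{Y}$. We can find the CDF and PDF of $X$ using the formulas
\[
F(x;\Theta) = F_Y\left(\log(x);\Theta\right); \qquad f(x;\Theta) = \frac{1}{x}\,f_Y\left(\log(x);\Theta\right).
\]
Then, the CDF, PDF, and quantile function for the LSAS distribution are
\[
F(x;\Theta) =
\begin{cases}
\Phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{\log(x)-a}{b}\right)\right] & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]
\[
f(x;\Theta) =
\begin{cases}
\dfrac{1}{x}\,\phi\left[A^{-1}_{\varepsilon,\delta}\left(\frac{\log(x)-a}{b}\right)\right] \left.\left(\dfrac{d}{dy} A^{-1}_{\varepsilon,\delta}\left(\dfrac{y-a}{b}\right)\right)\right|_{y=\log(x)} & \text{for } x > 0, \\
0 & \text{for } x \le 0;
\end{cases}
\]
\[
F^{-1}(p;\Theta) = \exp\left\{a + b\,A_{\varepsilon,\delta}\left(\Phi^{-1}(p)\right)\right\} \quad \text{for } 0 < p < 1.
\]
To derive the conditional CDF, PDF, and quantile function, let the loss data have minimum reporting threshold $\tau = \exp\{\tau^*\}$. The functions are written in terms of the functions for $Y = \log(X)$ for brevity.
\[
\tilde{F}(x;\Theta,\tau) = \tilde{F}_Y\left(\log(x);\Theta,\tau^*\right) \quad \text{for } x > 0;
\]
\[
\tilde{f}(x;\Theta,\tau) = \frac{1}{x}\,\tilde{f}_Y\left(\log(x);\Theta,\tau^*\right) \quad \text{for } x > 0;
\]
\[
\tilde{F}^{-1}(p;\Theta,\tau) = \exp\left\{\tilde{F}_Y^{-1}(p;\Theta,\tau^*)\right\} \quad \text{for } 0 < p < 1.
\]
Since we use the log-loss data to estimate the distribution parameters, the conditional likelihood and log-likelihood functions for the SAS distribution are applied to the log-transformed loss data.

A.8 Lognormal Spliced Lognormal Distribution

Using the derivation from Section 2.1.2, we can calculate the unconditional CDF, PDF, and quantile function for the piecewise LGNLGN distribution. For ease of notation, let
\[
D_1(\Theta,\tau) = F_{body}(x_s;\Theta_b) - (1-p_b)\,F_{body}(\tau;\Theta_b);
\]
\[
D_2(\Theta,\tau) = \frac{1-p_b}{D_1(\Theta,\tau)} \cdot \frac{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)}{1 - F_{tail}(x_s;\Theta_u)},
\]
where
\[
F_{body}(x;\Theta_b) = \Phi\left[\frac{\log(x)-\mu_b}{\sigma_b}\right]; \qquad F_{tail}(x;\Theta_u) = \Phi\left[\frac{\log(x)-\mu_u}{\sigma_u}\right],
\]
and $\Theta = [\,\Theta_b \;\; x_s \;\; \Theta_u\,]$, $\Theta_b = [\,\mu_b \;\; \sigma_b\,]$ and $\Theta_u = [\,\mu_u \;\; \sigma_u\,]$.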
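Also, let
\[
f_{body}(x;\Theta_b) = \frac{1}{\sigma_b x}\,\phi\left[\frac{\log(x)-\mu_b}{\sigma_b}\right]; \qquad f_{tail}(x;\Theta_u) = \frac{1}{\sigma_u x}\,\phi\left[\frac{\log(x)-\mu_u}{\sigma_u}\right],
\]
where $\Phi(\cdot)$ and $\phi(\cdot)$ are the standard normal CDF and PDF, respectively.

Then, the CDF, PDF, and quantile function for the LGNLGN distribution are
\[
F(x;\Theta) =
\begin{cases}
\dfrac{p_b}{D_1(\Theta,\tau)}\,F_{body}(x;\Theta_b) & \text{for } 0 < x \le x_s, \\
\dfrac{p_b}{D_1(\Theta,\tau)}\,F_{body}(x_s;\Theta_b) + D_2(\Theta,\tau)\left[F_{tail}(x;\Theta_u) - F_{tail}(x_s;\Theta_u)\right] & \text{for } x > x_s, \\
0 & \text{otherwise};
\end{cases}
\]
\[
f(x;\Theta) =
\begin{cases}
\dfrac{p_b}{D_1(\Theta,\tau)}\,f_{body}(x;\Theta_b) & \text{for } 0 < x \le x_s, \\
D_2(\Theta,\tau)\,f_{tail}(x;\Theta_u) & \text{for } x > x_s, \\
0 & \text{otherwise};
\end{cases}
\]
\[
F^{-1}(p;\Theta) =
\begin{cases}
\exp\left\{\mu_b + \sigma_b\,\Phi^{-1}\left[\dfrac{p}{p_b}\,D_1(\Theta,\tau)\right]\right\} & \text{for } 0 < p \le p_s, \\
\exp\left\{\mu_u + \sigma_u\,\Phi^{-1}\left[F_{tail}(x_s;\Theta_u) + \dfrac{p - p_s}{D_2(\Theta,\tau)}\right]\right\} & \text{for } p_s < p < 1,
\end{cases}
\]
where $p_s = \dfrac{p_b}{D_1(\Theta,\tau)}\,F_{body}(x_s;\Theta_b)$, and $\Phi^{-1}(\cdot)$ is the standard normal quantile function.

The conditional CDF, PDF, and quantile function follow directly from the derivation in Section 2.1.2.
\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
p_b\,\dfrac{F_{body}(x;\Theta_b) - F_{body}(\tau;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
p_b + (1-p_b)\,\dfrac{F_{tail}(x;\Theta_u) - F_{tail}(x_s;\Theta_u)}{1 - F_{tail}(x_s;\Theta_u)} & \text{for } x > x_s, \\
0 & \text{for } x \le \tau;
\end{cases}
\]
\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\dfrac{p_b\,f_{body}(x;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
\dfrac{(1-p_b)\,f_{tail}(x;\Theta_u)}{1 - F_{tail}(x_s;\Theta_u)} & \text{for } x > x_s, \\
0 & \text{otherwise};
\end{cases}
\]
\[
\tilde{F}^{-1}(p;\Theta,\tau) =
\begin{cases}
\exp\left\{\mu_b + \sigma_b\,\Phi^{-1}\left[\dfrac{p}{p_b}\,F_{body}(x_s;\Theta_b) + \dfrac{p_b - p}{p_b}\,F_{body}(\tau;\Theta_b)\right]\right\} & \text{for } 0 < p \le p_b, \\
\exp\left\{\mu_u + \sigma_u\,\Phi^{-1}\left[F_{tail}(x_s;\Theta_u) + \dfrac{p - p_b}{1 - p_b} - \dfrac{p - p_b}{1 - p_b}\,F_{tail}(x_s;\Theta_u)\right]\right\} & \text{for } p_b < p < 1.
\end{cases}
\]
For the conditional log-likelihood functions of the body and tail, let $n_b$ be the number of observations in the body of the sample and $n_u$ be the number of observations in the tail of the sample. The conditional log-likelihood for the body is
\[
\begin{aligned}
\tilde{\ell}_b(\Theta_b;x_b,\tau,x_s) = {} & n_b\log(p_b) - n_b\log(\sigma_b) - \sum_{i=1}^{n_b}\log(x_i) + \sum_{i=1}^{n_b}\log\left\{\phi\left(\frac{\log(x_i)-\mu_b}{\sigma_b}\right)\right\} \\
& - n_b\log\left\{\Phi\left(\frac{\log(x_s)-\mu_b}{\sigma_b}\right) - \Phi\left(\frac{\log(\tau)-\mu_b}{\sigma_b}\right)\right\}.
\end{aligned}
\]
Likewise, the conditional log-likelihood function for the upper tail is
\[
\begin{aligned}
\tilde{\ell}_u(\Theta_u;x_u,x_s) = {} & n_u\log\{1-p_b\} - n_u\log(\sigma_u) - \sum_{i=1}^{n_u}\log(x_i) + \sum_{i=1}^{n_u}\log\left\{\phi\left(\frac{\log(x_i)-\mu_u}{\sigma_u}\right)\right\} \\
& - n_u\log\left\{1 - \Phi\left(\frac{\log(x_s)-\mu_u}{\sigma_u}\right)\right\}.
\end{aligned}
\]

A.9 Lognormal Spliced Generalized Pareto Distribution

Using the derivation from Section 2.1.2, we can calculate the unconditional CDF, PDF, and quantile function for the piecewise LGNGPD distribution.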
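For ease of notation, let
\[
D_1(\Theta,\tau) = F_{body}(x_s;\Theta_b) - (1-p_b)\,F_{body}(\tau;\Theta_b);
\]
\[
D_2(\Theta,\tau) = \frac{1-p_b}{D_1(\Theta,\tau)} \cdot \frac{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)}{1 - F_{tail}(x_s;\Theta_u)},
\]
where
\[
F_{body}(x;\Theta_b) = \Phi\left(\frac{\log(x)-\mu_b}{\sigma_b}\right); \qquad F_{tail}(x;\xi,\hat{\theta},x_s) = 1 - \left(1 + \frac{\xi}{\hat{\theta}}(x - x_s)\right)^{-1/\xi},
\]
and $\Theta = [\,\Theta_b \;\; x_s \;\; \xi\,]$, $\Theta_b = [\,\mu_b \;\; \sigma_b\,]$, and
\[
\hat{\theta} = \frac{1-p_b}{p_b} \cdot \frac{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)}{f_{body}(x_s;\Theta_b)}.
\]
Also, let
\[
f_{body}(x;\Theta_b) = \frac{1}{\sigma_b x}\,\phi\left[\frac{\log(x)-\mu_b}{\sigma_b}\right],
\]
where $\Phi(\cdot)$ and $\phi(\cdot)$ are the standard normal CDF and PDF, respectively.

Then, the CDF, PDF, and quantile function for the LGNGPD distribution are
\[
F(x;\Theta) =
\begin{cases}
\dfrac{p_b}{D_1(\Theta,\tau)}\,F_{body}(x;\Theta_b) & \text{for } 0 < x \le x_s, \\
\dfrac{p_b}{D_1(\Theta,\tau)}\,F_{body}(x_s;\Theta_b) + D_2(\Theta,\tau)\left[1 - \left\{1 + \dfrac{\xi}{\hat{\theta}}(x - x_s)\right\}^{-1/\xi}\right] & \text{for } x > x_s, \\
0 & \text{otherwise};
\end{cases}
\]
\[
f(x;\Theta) =
\begin{cases}
\dfrac{p_b}{D_1(\Theta,\tau)}\,f_{body}(x;\Theta_b) & \text{for } 0 < x \le x_s, \\
\dfrac{D_2(\Theta,\tau)}{\hat{\theta}}\left(1 + \dfrac{\xi}{\hat{\theta}}(x - x_s)\right)^{-1-1/\xi} & \text{for } x > x_s, \\
0 & \text{otherwise};
\end{cases}
\]
\[
F^{-1}(p;\Theta) =
\begin{cases}
\exp\left\{\mu_b + \sigma_b\,\Phi^{-1}\left[\dfrac{D_1(\Theta,\tau)}{p_b}\,p\right]\right\} & \text{for } 0 < p \le p_s, \\
x_s + \dfrac{\hat{\theta}}{\xi}\left[\left(1 - \dfrac{p - p_s}{D_2(\Theta,\tau)}\right)^{-\xi} - 1\right] & \text{for } p_s < p < 1,
\end{cases}
\]
where $p_s = \dfrac{p_b}{D_1(\Theta,\tau)}\,\Phi\left[\dfrac{\log(x_s)-\mu_b}{\sigma_b}\right]$.

The conditional CDF, PDF, and quantile function follow directly from the derivation in Section 2.1.2.
\[
\tilde{F}(x;\Theta,\tau) =
\begin{cases}
p_b\,\dfrac{F_{body}(x;\Theta_b) - F_{body}(\tau;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
p_b + (1-p_b)\left[1 - \left(1 + \dfrac{\xi}{\hat{\theta}}(x - x_s)\right)^{-1/\xi}\right] & \text{for } x > x_s, \\
0 & \text{for } x \le \tau;
\end{cases}
\]
\[
\tilde{f}(x;\Theta,\tau) =
\begin{cases}
\dfrac{p_b\,f_{body}(x;\Theta_b)}{F_{body}(x_s;\Theta_b) - F_{body}(\tau;\Theta_b)} & \text{for } \tau < x \le x_s, \\
\dfrac{1-p_b}{\hat{\theta}}\left(1 + \dfrac{\xi}{\hat{\theta}}(x - x_s)\right)^{-1-1/\xi} & \text{for } x > x_s, \\
0 & \text{otherwise};
\end{cases}
\]
\[
\tilde{F}^{-1}(p;\Theta,\tau) =
\begin{cases}
\exp\left\{\mu_b + \sigma_b\,\Phi^{-1}\left[\dfrac{p}{p_b}\,F_{body}(x_s;\Theta_b) + \dfrac{p_b - p}{p_b}\,F_{body}(\tau;\Theta_b)\right]\right\} & \text{for } 0 < p \le p_b, \\
x_s + \dfrac{\hat{\theta}}{\xi}\left[\left(\dfrac{1-p_b}{1-p}\right)^{\xi} - 1\right] & \text{for } p_b < p < 1.
\end{cases}
\]
For the conditional log-likelihood functions of the body and tail, let $n_b$ be the number of observations in the body of the sample and $n_u$ be the number of observations in the tail of the sample. The conditional log-likelihood for the body is
\[
\begin{aligned}
\tilde{\ell}_b(\Theta_b;x_b,\tau,x_s) = {} & n_b\log(p_b) - n_b\log(\sigma_b) - \sum_{i=1}^{n_b}\log(x_i) + \sum_{i=1}^{n_b}\log\left\{\phi\left(\frac{\log(x_i)-\mu_b}{\sigma_b}\right)\right\} \\
& - n_b\log\left\{\Phi\left(\frac{\log(x_s)-\mu_b}{\sigma_b}\right) - \Phi\left(\frac{\log(\tau)-\mu_b}{\sigma_b}\right)\right\}.
\end{aligned}
\]
Likewise, the conditional log-likelihood function for the upper tail is
\[
\tilde{\ell}_u(\xi;x_u,\hat{\theta},x_s) = n_u\log\{1-p_b\} - n_u\log(\hat{\theta}) - \left(\frac{\xi+1}{\xi}\right)\sum_{i=1}^{n_u}\log\left(1 + \frac{\xi}{\hat{\theta}}(x_i - x_s)\right).
\]

Appendix B
MLE Results

B.1 MLE Results for SRC 1

Figure B.1: SRC 1 MLE parameters for each candidate distribution when including all loss data before each row's designated year. The last row uses all simulated data.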
Lgn/Lgn and Lgn/Gpd refer to LGNLGN and LGNGPD, respectively.

B.2 MLE Results for SRC 2

Figure B.2: SRC 2 MLE parameters for each candidate distribution when including all loss data before each row's designated year. The last row uses all simulated data. Lgn/Lgn and Lgn/Gpd refer to LGNLGN and LGNGPD, respectively.

B.3 MLE Results for SRC 3

Figure B.3: SRC 3 MLE parameters for each candidate distribution when including all loss data before each row's designated year. The last row uses all simulated data. Lgn/Lgn and Lgn/Gpd refer to LGNLGN and LGNGPD, respectively.
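The parameter estimates summarized in Appendix B come from maximizing conditional log-likelihoods such as those in Appendix A. As a minimal sketch of the objects involved, the code below implements the conditional (truncated) Weibull functions and the conditional loglogistic CDF and quantile function, with each quantile obtained by setting the conditional CDF equal to $p$ and solving for $x$. The function names, the sample, and the parameter values are hypothetical and serve only to show that the pieces fit together. This is not the author's implementation.

```python
import math

def weibull_cond_cdf(x, a, theta, tau):
    """Conditional Weibull CDF given that the loss exceeds the threshold tau."""
    if x <= tau:
        return 0.0
    return 1.0 - math.exp((tau / theta) ** a - (x / theta) ** a)

def weibull_cond_pdf(x, a, theta, tau):
    """Conditional Weibull PDF given that the loss exceeds the threshold tau."""
    if x <= tau:
        return 0.0
    return (a / theta) * (x / theta) ** (a - 1) * math.exp((tau / theta) ** a - (x / theta) ** a)

def weibull_cond_quantile(p, a, theta, tau):
    """Solve weibull_cond_cdf(x) = p for x, with 0 < p < 1."""
    return theta * ((tau / theta) ** a - math.log(1.0 - p)) ** (1.0 / a)

def weibull_cond_loglik(xs, a, theta, tau):
    """Conditional log-likelihood of a sample whose observations all exceed tau."""
    n = len(xs)
    return (n * math.log(a) - n * a * math.log(theta) + n * (tau / theta) ** a
            + (a - 1) * sum(math.log(x) for x in xs)
            - sum((x / theta) ** a for x in xs))

def loglogistic_cond_cdf(x, gamma, theta, tau):
    """Conditional loglogistic CDF given that the loss exceeds tau."""
    if x <= tau:
        return 0.0
    return (x ** gamma - tau ** gamma) / (theta ** gamma + x ** gamma)

def loglogistic_cond_quantile(p, gamma, theta, tau):
    """Solve loglogistic_cond_cdf(x) = p for x, with 0 < p < 1."""
    return ((p * theta ** gamma + tau ** gamma) / (1.0 - p)) ** (1.0 / gamma)

# Hypothetical losses above a reporting threshold of 1000 (illustrative only).
sample = [1200.0, 1850.0, 4300.0, 9100.0, 52000.0]
print(weibull_cond_loglik(sample, a=0.7, theta=5000.0, tau=1000.0))
print(weibull_cond_quantile(0.999, a=0.7, theta=5000.0, tau=1000.0))
```

In practice the log-likelihood would be passed to a numerical optimizer over the parameters. The sketch stops short of that step to stay dependency-free.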
Title | Modeling operational risk using the truncation approach |
Creator | Hadley, Daniel P. |
Publisher | University of British Columbia |
Date Issued | 2018 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2018-07-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0369252 |
URI | http://hdl.handle.net/2429/66611 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2018-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |