International Construction Specialty Conference of the Canadian Society for Civil Engineering (ICSC) (5th : 2015)

Developing failure age prediction model of hazardous liquid pipelines Parvizsedghy, L.; Zayed, T. Jun 30, 2015


5th International/11th Construction Specialty Conference
5e International/11e Conférence spécialisée sur la construction
Vancouver, British Columbia, June 8 to June 10, 2015 / 8 juin au 10 juin 2015

DEVELOPING FAILURE AGE PREDICTION MODEL OF HAZARDOUS LIQUID PIPELINES

L. Parvizsedghy1,2, T. Zayed1
1 Department of Building, Civil, and Environmental Engineering, Concordia University, Canada
2

Abstract: Pipelines are the most common means of transporting hazardous materials. They are considered the safest way of transporting petroleum products; however, there have been several failures with considerable consequences. The importance of studying pipeline failures is therefore evident. Failure prediction of pipelines has been the subject of some studies from different perspectives, and estimation of failure age has also been studied from specific points of view. Most studies have focused on producing models with data from inspection tools. These tools are considerably accurate but very expensive. This research aims to develop a model based on the basic attributes of pipelines, without data from inspection tools, to predict the probability of failure. The model predicts the age of failure from historical data gathered on pipeline failures. The effect of several variables on the frequency of failures in different age classes is studied in order to identify the variables that affect pipeline failure. Then, a regression model is developed to estimate the age of failure. Pipe manufacture year, maximum operating pressure (MOP), specified minimum yield strength (SMYS) and pipe diameter over pipe wall thickness are the variables retained in the developed model after a significant number of modelling iterations. The statistical parameters of the developed regression model support its soundness, and validation results indicate an accuracy of over 80 percent.
1 INTRODUCTION

While pipelines are considered the most effective and safe way of transporting hazardous liquids, there is a probability of failure with monetary and safety consequences. The Pipeline and Hazardous Materials Safety Administration (PHMSA 2013) of the US Department of Transportation has gathered data on the failures of oil and gas pipelines in three different date ranges. Over 7,300 failures were recorded from 1986 to 2013 in the United States of America, resulting in almost 2.9 billion dollars of property damage and the leakage of around 4.1 million barrels of hazardous liquids into the environment. The records also show 60 fatalities and 2,150 serious injuries, which demands significant attention. Accordingly, failures of pipelines, especially those carrying hazardous liquids (called oil pipelines from here on), are the subject of this study. The literature review shows that studies on the failures of oil pipelines are insufficient; the few existing studies mostly focus on the safety and reliability of such pipelines.

This study develops a model to predict the age of failure using the aforementioned data. The age of failure is then applied to predict the probability of failure. The risk of failure can be calculated by combining the probability of failure with the consequences of failure. The age of failure can also be used to plan the maintenance operations of the pipelines.

In order to develop the model, the existing data has been studied from different perspectives. Several diagrams have been drawn to analyze the effect of various variables in different classifications. Then, the trends have been studied to find the most effective variables. Data on the selected variables has been fed into the regression analysis to produce predictive models of failure age.
Then the efficiency of the produced models has been improved by changing the set of variables according to their importance, judged from the statistical parameters of the model. Error measures have been used for validation on a test dataset consisting of a randomly selected ten percent of the data. The validation phase compares the estimated outputs of the model with the actual data from the test dataset. Finally, the model that proved most effective while preserving simplicity has been selected.

The objectives of this research are to (1) identify and study the factors that most affect the pipelines' age of failure and (2) develop a model that predicts the age of failure to estimate the probability of failure of oil pipelines.

2 BACKGROUND

Several researchers have tried to model failures of oil and gas pipelines. Parvizsedghy & Zayed (2013) developed a model for predicting the failure consequences of oil and gas pipelines. The model uses data from the US Department of Transportation. It identifies a series of primary variables, including the basic attributes of oil pipelines and other variables related to the efficiency and quality of the pipelines' inspection systems. The number of variables is optimized by comparing the efficiency of the produced neural networks. The final model efficiently estimates the monetary consequences of failures in oil and gas pipelines. Senouci et al. (2013) developed a model to predict the failure type of oil and gas pipelines. The model uses data on pipeline accidents published in a European database (Davis et al. 2010). It can predict the potential type of failure among mechanical, operational, corrosion, third party and natural hazards. Validation results have demonstrated the accuracy of the model.
The variables applied in this model include product type, pipe location and land use as categorical variables, and pipe age and diameter as numerical variables. The research uses artificial neural networks and regression analysis to develop the model. Noor et al. (2011) proposed a probabilistic method to forecast the remaining strength of offshore pipelines using data from inline inspection tools. The method is based on the assessment rules of DNV's Recommended Practice for Corroded Pipelines (DNV 2010) and considers the standard deviation of the inspection tools in determining defect sizes. Bersani et al. (2010) proposed a model to predict the probability of failure using artificial neural networks. For each cause of failure, a set of factors is proposed as the independent variables. Preliminary results on the prediction of third-party failure probability are presented; however, the results demonstrate neither the importance of the proposed factors nor the soundness of the model. Caleyo et al. (2009) developed probability distributions of corrosion depth and growth rate using Monte Carlo simulation. Different curves are proposed for underground pipelines according to the properties of various soil types. Teixeira et al. (2008) proposed a probabilistic model to evaluate the reliability of pipelines under corrosion. The model employs experimental data and estimates the burst pressure of corroded pipelines. Bertolini and Bevilacqua (2006) developed a regression classification tree to predict the failure class of cross-country oil pipelines. The model uses data from the CONCAWE database (Davis et al. 2010) in the training phase and aims to identify the riskiest pipelines to help operators decide on the maintenance of their pipeline networks. A risk-based decision support system is developed using the Analytic Hierarchy Process (AHP) technique.
This model applies expert opinions to obtain the weights of the variables identified as important to pipeline failures. The variables are risk factors including external and internal corrosion, construction and material defects, and acts of God. Failure probability is estimated from scores assigned through expert judgement. Dey (2004), Sinha (2002) and Ahammed (1998) developed probabilistic models to account for the uncertainties of pipeline parameters. These studies use data from inline inspection tools, which gather data on the condition of oil and gas pipelines, to predict the failure probability of pipelines under corrosion. The models require data on defect depth and length from the inline inspection tools. Ahammed (1998) applied reliability theory to estimate the remaining strength and integrity of pipelines.

This review shows a lack of research on estimating the failure probability of oil and gas pipelines. Current studies are either subjective and based on expert opinion, or concentrate on physical models that require data from inspection tools. However, inspection tools are very expensive to run regularly, so there is a need for predictive models based on the attributes of oil and gas pipelines.

2.1 Overview of regression modelling

Regression models are suitable for construction management problems and are used in a wide range of studies. Salman and Salem (2012) used regression analysis to develop deterioration models for sewer pipelines, Wang et al. (2009) to forecast the annual break rates of water mains, Chughtai and Zayed (2008) to develop structural condition assessment models of sewer pipelines, and Zayed et al. (2007) to develop productivity estimation models for horizontal directional drilling. These pattern recognition techniques can discover the relationship between the dependent and independent variables in historical data.
A linear regression model takes the form of Equation 1 (Senouci et al. 2013), where $Y_i$ represents the dependent variable, $x_1, x_2, \dots, x_n$ represent the independent variables and $\beta_0, \beta_1, \dots, \beta_n$ are the parameters of the regression model.

[1]  $Y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

This equation represents the best fit to the historical data, under the assumptions that the errors in the training data are (1) independent of the predictor variables and (2) normally distributed around each value (Senouci et al. 2013). After the regression model is developed, it should be assessed through several parameters. The significance of the independent variables is evaluated through the F-test, whose results are presented as P(f) values. The acceptable level of risk for the F-test is set by the user as alpha and is usually equal to 0.05. Accordingly, if the value of P(f) for a variable is less than alpha, that variable is significant; otherwise, the variable should be removed and the regression modelling repeated with the new subset of variables. The other parameter to assess is the R-squared of the regression model: the closer R-squared is to 100%, the better the model fits the data (Wang et al. 2009).

3 RESEARCH METHODOLOGY

This section describes the methodology used to develop the model that estimates the age of failure of oil pipelines. Figure 1 presents the overall flowchart of the model development. It includes two major phases: pre-processing and regression analysis. Data is obtained from the Pipeline and Hazardous Materials Safety Administration (PHMSA 2013). The historical data is analyzed to discover the variables that can affect the age of failure; this analysis looks for the trends that different variables show across various classifications. After that, the data is prepared for the model.
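As a hedged illustration of the linear form in Equation 1 and the R-squared check described in Section 2.1, the following minimal sketch fits a least-squares model in plain Python. It is not the authors' procedure (they report using Minitab); the function names and the noise-free sample data are assumptions made for illustration, and the F-test P values are omitted because they require a statistics library.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, row):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one record."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(beta, rows, y):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical, noise-free data generated from y = 0.9 - 0.8*x1 + 0.1*x2
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
y = [0.9 - 0.8 * a + 0.1 * b for a, b in rows]
beta = fit_ols(rows, y)
fit_quality = r_squared(beta, rows, y)
```

Because the sample data contains no noise, the fit recovers the generating coefficients and R-squared is essentially 1; real failure records would give a lower value.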
Data preparation comprises removing missing and technically irrational data and combining existing categories of the database to produce new effective variables. The data is then normalized so that all variables lie in the same range of zero to one. In the next phase, the training dataset, which consists of 90 percent of the prepared data selected at random, is fed into the regression analysis. The produced models are then assessed on basic diagnostic parameters such as R-squared, P(f) and P(t). Different subsets of the primary variables are tested and the satisfactory models are selected. The selected models go through residual analysis, which plots the residuals against the normal distribution and tests their normality. This test checks the assumption that the errors are normally distributed around each value. After that, the test dataset is applied to validate the satisfactory models. The validation step assesses the accuracy of the produced models against actual data through different methods.

Operating pressure exceeds 950. In this group the number of failures drops, while the previous groups confirm the increasing trend. An increasing trend of incident frequencies also appears in the total failures of pipelines.

Muhlbauer (2004) suggests combining pipe diameter and wall thickness to form a new variable, namely pipe diameter over pipe wall thickness (D/th.). This variable can indicate the crack potential or strength of the pipe.
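The preparation steps described in the methodology (the D/th. ratio, scaling to the range of zero to one, and the random 90/10 split) can be sketched as follows. This is a minimal sketch under stated assumptions: min-max scaling is assumed as the normalization method, and the record layout, function names, sample values and random seed are all hypothetical.

```python
import random


def min_max_normalise(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(records, test_fraction=0.1, seed=0):
    """Randomly hold out test_fraction of the records for validation."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (diameter, wall thickness), in consistent units
pipes = [
    (8.0, 0.25), (12.0, 0.30), (20.0, 0.40), (6.0, 0.20), (10.0, 0.25),
    (16.0, 0.35), (24.0, 0.45), (30.0, 0.50), (4.0, 0.15), (18.0, 0.30),
]

# Combined variable suggested by Muhlbauer (2004): diameter over wall thickness
d_over_th = [d / th for d, th in pipes]
scaled = min_max_normalise(d_over_th)

train, test = split_train_test(scaled, test_fraction=0.1)
```

With ten records, the split leaves nine values for training and one for testing.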
Table 1: Frequency in Different Failure Age Classes versus Various Pipe Wall Thickness, Diameter, Specified Minimum Yield Strength (SMYS) and Maximum Operating Pressure (MOP) Classes

                                               Age Classes
Variable        Classes                 Total    0-5   6-10  11-20  21-30  31-40  41-50    >50
Wall Thickness  th<0.2                    538     49     28     76    152    114     52     67
                0.2<=th<0.3              1279     92     43    110    237    271    215    311
                0.3<=th<0.4               860     73     22     54     81    129    157    344
                th>=0.4                   140     24     16     24     24     19     12     21
Diameter        0-5                       214     33     11     34     38     38     29     31
                5<=d<11                  1595    123     58    125    282    272    235    500
                11<=d<17                  625     48     19     62    105    139    100    152
                17<=d<23                  181     12      7     18     27     39     47     31
                23<=d<29                  107     13      8     10     18     20     21     17
                29<=d<35                   61      8      4      8     21     12      4      4
                >=35                       31      2      0      7      5     11      2      4
SMYS            SMYS<=25,000              817     59     36     68    131    124    112    287
                25,000<SMYS<=35,000       653     54     17     34     61     94    147    246
                35,000<SMYS<=45,000       417     44     20     42     25    124     85     77
                45,000<SMYS<=55,000       616     37     17     76    176    154     74     82
                SMYS>55,000                96     18      8     22     13     14      5     16
MOP             MOP<=250                  198     21     14     20     27     23     27     66
                250<MOP<=600              444     37     13     35     53     53     68    185
                600<MOP<=950              733     31     17     44    113    115    138    275
                MOP>950                  1326    127     53    155    290    321    187    193

Figure 2 illustrates the average age of failure in each diameter over wall thickness class versus the manufacturing year of the pipelines. As the graph shows, there is a regular pattern in each diameter over wall thickness class: the average age of failure decreases in each class as the manufacturing year increases. This reveals the importance of diameter over pipe wall thickness and pipe manufacturing year. The other variables studied either did not reveal any regular pattern or lacked enough data to be studied. Based on these studies, manufacturing year of pipes (Man. Year), diameter (D), wall thickness of pipe (th.), diameter over pipe wall thickness (D/th.), Specified Minimum Yield Strength (SMYS) and maximum operating pressure (MOP) are identified as the primary effective variables in modeling the age of failure.
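A frequency table such as Table 1 can be tabulated by binning each failure record into a variable class and an age class and counting the pairs. The sketch below shows this for wall thickness only; the class boundaries follow the labels in Table 1, while the record layout, the function names and the sample records are hypothetical.

```python
from collections import Counter

# Upper bound (inclusive, in years) and label of each age class in Table 1
AGE_CLASSES = [(5, "0-5"), (10, "6-10"), (20, "11-20"),
               (30, "21-30"), (40, "31-40"), (50, "41-50")]


def age_class(age):
    """Label of the failure age class that contains the given age."""
    for upper, label in AGE_CLASSES:
        if age <= upper:
            return label
    return ">50"


def thickness_class(th):
    """Label of the wall thickness class, using the boundaries in Table 1."""
    if th < 0.2:
        return "th<0.2"
    if th < 0.3:
        return "0.2<=th<0.3"
    if th < 0.4:
        return "0.3<=th<0.4"
    return "th>=0.4"


def tabulate(records):
    """Count failures per (thickness class, age class) pair."""
    return Counter((thickness_class(th), age_class(age)) for th, age in records)


# Hypothetical failure records: (wall thickness, age at failure in years)
records = [(0.25, 3), (0.25, 45), (0.18, 27), (0.45, 60), (0.25, 4)]
table = tabulate(records)
```

Summing the counts over the age classes of one row gives that row's Total column, which is how the totals in Table 1 relate to the age class counts.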
models are analyzed considering statistical values such as the coefficient of determination (R²), P(f) and P(t). Higher R² values indicate more efficient regression models. The F-test is carried out against the null hypothesis (H0) that the regression coefficients are equal to zero; the results are presented as P(f) for each variable. Values less than alpha reject the null hypothesis, which indicates that the variable is significant. If more than one model is produced, the validation results determine the selected model: better validation parameters indicate a more efficient model. The results of the regression analysis are presented next.

The training dataset is entered into the Minitab 16 Statistical Software (Minitab 2013) and the regression analysis is carried out with three different subsets of the primary variables, resulting in three models. Because the statistical values of all three models are satisfactory, the validation test has been run on all of them; the results are presented in Table 3. As can be seen, Models No. 2 and 3 are very close in validation values and more efficient than Model No. 1. However, as Model No. 3 is a non-linear quadratic model, the linear model is selected to preserve the model's simplicity, as recommended by Kutner et al. (2005).

Table 3: Coefficient of determination (R²), variables and validation results of three different models

Models       R²     Variables                             AIP     RMSE   MAE    AVP
Model No. 1  87.7%  Man. Year, MOP, D, Th., SMYS          21.708  0.008  0.064  78.292
Model No. 2  87.6%  Man. Year, MOP, SMYS, D/th.           19.493  0.004  0.056  80.507
Model No. 3  87.6%  Man. Year, MOP, D, SMYS, D², SMYS²    19.306  0.004  0.056  80.694

The results of the regression analysis for Model No. 2, which is selected to predict the age of failure of oil pipelines, are presented in Figure 3. This model includes four variables: pipe manufacturing year, MOP, SMYS and diameter over pipe wall thickness. All of the variables are significant, as the P value for each of them is at most 0.001, which is smaller than alpha (0.05). As a result, this model is taken as the best representation of the relationship between the age of failure and the aforementioned variables. The R² of the model equals 87.6%, which supports the efficiency of the model.

    Regression Analysis: Age versus Man. Year, MOP, SMYS, D/Th.

    The regression equation is
    Age = 0.859 - 0.899 Man. Year - 0.131 MOP + 0.0562 SMYS - 0.0936 D/Th.

    Predictor       Coef   SE Coef        T      P
    Constant    0.859489  0.005501   156.23  0.000
    Man. Year  -0.899431  0.007567  -118.87  0.000
    MOP         -0.13132   0.01163   -11.29  0.000
    SMYS        0.056170  0.007177     7.83  0.000
    D/Th.       -0.09356   0.02809    -3.33  0.001

    S = 0.0685833   R-Sq = 87.6%   R-Sq(adj) = 87.5%

    Analysis of Variance

    Source            DF      SS      MS        F      P
    Regression         4  73.783  18.446  3921.54  0.000
    Residual Error  2230  10.489   0.005
    Total           2234  84.272

Figure 3: Regression Analysis Results

5.1 Residual Analysis

The satisfactory statistical parameters of the model lead to the next phase, the residual analysis. Figure 4 illustrates the histogram of the frequency of the residuals along with the normal distribution curve. The distribution of the residuals is almost normal, with a small skew to the right. The results are satisfactory and acceptable, as the distribution is close to normal. Figure 5 presents the normal probability plot of the model's residuals. The same conclusion is drawn from this graph: the points are almost linear, although they show some discrepancies in the tail of the plot.
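The fitted equation reported in Figure 3 can be evaluated directly. The sketch below uses the full-precision coefficients from the Figure 3 output and assumes that all inputs, and the predicted age, are on the zero-to-one scale described in the methodology. The input values, the function names and the age range used to convert the result back to years are hypothetical; the paper does not state the range used for normalization.

```python
# Coefficients of Model No. 2 as reported in the Figure 3 output
COEF = {
    "const": 0.859489,
    "man_year": -0.899431,
    "mop": -0.13132,
    "smys": 0.056170,
    "d_over_th": -0.09356,
}


def predict_normalised_age(man_year, mop, smys, d_over_th):
    """Normalised failure age from normalised (0 to 1) predictor values."""
    return (COEF["const"]
            + COEF["man_year"] * man_year
            + COEF["mop"] * mop
            + COEF["smys"] * smys
            + COEF["d_over_th"] * d_over_th)


def denormalise(value, lo, hi):
    """Map a 0 to 1 value back to the original range [lo, hi]."""
    return lo + value * (hi - lo)


# Hypothetical pipe with every normalised predictor at mid-range
age_norm = predict_normalised_age(0.5, 0.5, 0.5, 0.5)

# Hypothetical age range of 0 to 100 years, chosen only to illustrate the conversion
age_years = denormalise(age_norm, 0.0, 100.0)
```

The negative coefficient on manufacturing year matches the trend noted for Figure 2: more recently manufactured pipes in the failure records tend to have a lower age at failure.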
The discrepancies could result from outliers; however, the historical data shows that these points are not outliers and that such scenarios are possible.

Figure 4: Frequency Histogram of Model's Residuals against Normal Probability Distribution (histogram, response is Age; axes: Residual, Frequency)

Figure 5: Normal Probability Plots of Regression Model's Residuals (normal probability plot, response is Age; axes: Standardized Residual, Percent)

5.2 Validation of the model

Validation tests the estimates of the developed model against the actual data using four equations. In all of the equations, Ei represents the ith value estimated by the model, Ci represents the ith actual value of failure age, and n is the number of records in the test dataset. The Root Mean Square Error (RMSE) is given in Equation 3; the closer the RMSE is to zero, the more efficient the estimation function. The Mean Absolute Error (MAE) is given in Equation 4; likewise, the closer the MAE is to zero, the more accurate the prediction. The average invalidity percentage (AIP) and average validity percentage (AVP) are given in Equations 5 and 6 respectively. In an accurate prediction the AIP is close to zero, while the AVP is close to one. For the selected model, AIP equals 19.5%, RMSE equals 0.004, MAE equals 0.056 and AVP equals 80.5%, which indicates the accuracy of the model.

[3]
[4]
[5]
[6]  AVP = 1 - AIP

The estimated ages of failure are plotted against the actual ages of failure in the test dataset in Figure 6. The best fit is depicted by the line drawn on the graph.
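The validation measures of Section 5.2 can be sketched as follows. The bodies of Equations 3 to 5 are not legible in this copy of the paper, so the definitions below are assumptions: RMSE and MAE use their conventional forms, and AIP is taken as the mean relative deviation |1 - Ei/Ci| expressed in percent, with AVP as its complement (the AIP and AVP values reported in Table 3 do sum to 100). Note that Table 3 reports RMSE values below the MAE values, which the conventional RMSE cannot produce, so the paper presumably scales RMSE differently. The function names and sample values are hypothetical.

```python
import math


def rmse(actual, estimated):
    """Conventional root mean square error."""
    n = len(actual)
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(actual, estimated)) / n)


def mae(actual, estimated):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(c - e) for c, e in zip(actual, estimated)) / n


def aip(actual, estimated):
    """Average invalidity percentage (assumed form); actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs(1.0 - e / c) for c, e in zip(actual, estimated)) / n


def avp(actual, estimated):
    """Average validity percentage, the complement of AIP on a percent scale."""
    return 100.0 - aip(actual, estimated)


# Hypothetical normalised failure ages: actual (C) and estimated (E)
actual = [0.50, 0.40, 0.80]
estimated = [0.45, 0.44, 0.80]

scores = {
    "RMSE": rmse(actual, estimated),
    "MAE": mae(actual, estimated),
    "AIP": aip(actual, estimated),
    "AVP": avp(actual, estimated),
}
```

For any dataset the conventional RMSE is at least as large as the MAE, which is why the two errors are often read together: a large gap between them points to a few large individual errors.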
The relationship between the estimated and actual values is given by the function on the graph, and the correlation coefficient of the estimated values versus the actual ones equals 87%, which supports the efficiency of the prediction.

Figure 6: Estimated age of failure versus actual age of failure

5.3 Probability of Failure

The accuracy of the model is supported by the statistical parameters and the validation results. The model predicts the age of failure, from which the probability of failure over time can be calculated. Equation 7 is proposed to forecast the failure probability, assuming that it increases linearly with time. Failure age is the value estimated by the model, and "t" represents the number of years from now before which the failure could happen. In this equation, the individual failure probability is assumed to be identical in every year.

[7]

6 CONCLUSION

Hazardous liquid pipelines are considered the safest way of transporting these products. However, recorded data on the failures of these mega infrastructures shows the importance of research on them. The literature review also confirmed the shortage of studies in this field. Operators of these pipelines need inspection tools to predict failures before incidents happen. Because inspection tools are expensive to run regularly, predictive models are required. This study has developed a model to predict the age of failure of these pipelines. The model uses the historical data (1986-2013) gathered by the PHMSA (2013). First, the effect of different factors is studied to identify the variables that most affect the age of failure. The four variables identified through this process are pipe manufacturing year, maximum operating pressure, diameter over pipe wall thickness and the Specified Minimum Yield Strength.
Then, the data is entered into the regression analysis and several functions are produced with different subsets of the pre-defined variables. The functions are assessed through statistical values such as R-squared and the F-test values. The results showed that some of the models are efficient and appropriate. The selected functions are then compared by their validation results. Validation is done with the test dataset, which includes ten percent of the data. The best function is the most accurate one, with an Average Validity Percentage (AVP) equal to 80.5%. Residual analysis of this regression function gives sound outcomes. The estimated values of failure age are then compared with the actual values in a graph and the best fit is determined. The R-squared of the regression model equals 87%, which supports the efficiency of the model. Finally, an equation is proposed to calculate the probability of failure from the failure age estimated by the model.

References

Ahammed, M. 1998. Probabilistic estimation of remaining life of a pipeline in the presence of active corrosion defects. Int. J. Pressure Vessels Piping, 75(4), pp. 321-329.
Bersani, C., Citro, L., Gagliardi, R. V., Sacile, R., and Tomasoni, A. M. 2010. Accident occurrance evaluation in the pipeline transport of dangerous goods. Chemical Engineering Trans., 19, pp. 249-254.
Bertolini, M., and Bevilacqua, M. 2006. Oil pipeline spill cause analysis: A classification tree approach. Journal of Quality in Maintenance Engineering, 12(2), pp. 186-198.
Caleyo, F., Velázquez, J. C., Valor, A., and Hallen, J. M. 2009. Probability distribution of pitting corrosion depth and rate in underground pipelines: A Monte Carlo study. Corros. Sci., 51(9), pp. 1925-1934.
Chughtai, F., and Zayed, T. 2008. Infrastructure Condition Prediction Models for Sustainable Sewer Pipelines. Journal of Performance of Constructed Facilities, 22(5), pp. 333-341.
Davis, P. M., Dubois, J., Gambardella, F., and Uhlig, F. 2010. Performance of European Cross-country Oil Pipelines: Statistical Summary of Reported Spillages in 2008 & Since 1971. CONCAWE, Brussels, June 2010.
Dey, P. K. 2004. Decision support system for inspection and maintenance: a case study of oil pipelines. Engineering Management, IEEE Transactions, 51(1), pp. 47-56.
DNV. 2010. Risk Assessment of Pipeline Protection. Rep. No. DNV-RP-F107, Det Norske Veritas, Norway.
Muhlbauer, W. K. 2004. Pipeline Risk Management Manual: Ideas, Techniques, and Resources. Gulf Professional Publishing, Elsevier.
Noor, N. M. D., Ozman, N. A. N., and Yahaya, N. 2011. Deterministic Prediction of Corroding Pipeline Remaining Strength in Marine Environment Using DNV RP-F101 (Part A). Journal of Sustainability Science and Management, 6(1), pp. 67-78.
Parvizsedghy, L., and Zayed, T. 2013. Predictive risk based model for oil and gas pipelines. 4th Construction Specialty Conference, CSCE, Montreal, QC, Canada.
PHMSA. 2013. Pipeline & Hazardous Materials Safety Administration. (October/01, 2013).
Salman, B., and Salem, O. 2012. Modeling Failure of Wastewater Collection Lines Using Various Section-Level Regression Models. J. Infrastruct. Syst., 18(2), pp. 146-154.
Senouci, A., Elabbasy, M., Elwakil, E., Abdrabou, B., and Zayed, T. 2013. A model for predicting failure of oil pipelines. Structure and Infrastructure Engineering, 10(3), pp. 1-13.
Sinha, S. K., and Pandey, M. D. 2002. Probabilistic Neural Network for Reliability Assessment of Oil and Gas Pipelines. Computer-Aided Civil and Infrastructure Engineering, 17, pp. 320-329.
Teixeira, A. P., Guedes Soares, C., Netto, T. A., and Estefen, S. F. 2008. Reliability of pipelines with corrosion defects. Int. J. Pressure Vessels Piping, 85(4), pp. 228-237.
Wang, Y., Zayed, T., and Moselhi, O. 2009. Prediction Models for Annual Break Rates of Water Mains. Journal of Performance of Constructed Facilities, 23(1), pp. 47-54.
Zayed, T., Amer, M., Dubey, B., and Gupta, M. 2007. Deterministic Productivity Model for Horizontal Directional Drilling. Twelfth International Colloquium on Structural and Geotechnical Engineering, Cairo.

