DUKE ENERGY

REVISED DRAFT

Statistical Methods for Developing Reference Background Concentrations for Groundwater and Soil at Coal Ash Facilities

May 26, 2017

Prepared By:
HDR Engineering, Inc.
440 S. Church St, Suite 1000
Charlotte, NC 28202

and

SynTerra Corporation
148 River Street, Suite 220
Greenville, South Carolina 29601

REVISED DRAFT
Duke Energy Carolinas, LLC
Statistical Methods For Developing Reference Background Concentrations For Groundwater and Soil At Coal Ash Facilities
May 26, 2017

CONTENTS

INTRODUCTION
PART I: DESCRIPTION OF BACKGROUND DATA SETS
  Groundwater
  Soil
PART II: PRELIMINARY DATA ANALYSIS
  1. Descriptive Statistics
  2. Graphical Analysis
  3. Identify Outliers
  4. Identifying Data Distributions
  5. Evaluating Background Groundwater Data
  6. Autocorrelation
  7. Seasonality
  8. Trends
  9. Additional Methods for Identifying Trends in Background Groundwater Data
  10. Determining Baseline Period for Background Wells
PART III: TESTING FOR SUB-GROUPS IN BACKGROUND GROUNDWATER DATA
  Graphical Analysis
  Analytical Tests for Comparing Sub-Groups
  Tests for Identifying Differences Among Sub-Groups
PART IV: DEVELOPMENT OF BTVs FOR CONSTITUENTS IN GROUNDWATER AND SOIL
REFERENCES

FIGURES

Figure 1. Box-and-Whisker Plot
Figure 2. Quantile-Quantile (Q-Q) Plot
Figure 3. Scatter Plot of Time versus Concentration
Figure 4. Sample Autocorrelation Function
Figure 5. Scatter Plots of Time versus Concentration Illustrating Seasonality
Figure 6. Scatter Plots of Time versus Concentration Illustrating Trends
Figure 7. Piece-Wise Polynomial Regression Output Exhibiting Multiple Trends
Figure 8. Empirical Distribution Plot Comparing Constituent Concentrations between Two Seasons

TABLES

Table 1.
Chemical Parameters Analyzed in Groundwater and Soil

INTRODUCTION

In 2015, the North Carolina Coal Ash Management Act (CAMA) required the preparation of a Comprehensive Site Assessment Report (CSA) for each regulated facility. The purpose of the CSA was to identify the source and cause of exceedances of regulatory standards and potential hazards to public health and safety, and to identify receptors and exposure pathways. The CSA was conducted in accordance with a conditionally approved Work Plan to meet the requirements of 15A NCAC 02L .0106(g), which includes an assessment of the horizontal and vertical extent of soil and groundwater contamination for all contaminants confirmed to be present in groundwater in exceedance of groundwater quality standards.

Regulations regarding North Carolina groundwater quality standards are provided in T15A NCAC 02L .0202. Section (b)(3) of the regulation states that:

    Where naturally occurring substances exceed the established standard, the standard shall be the naturally occurring concentration as determined by the Director.

For soil and groundwater assessments under the CAMA, naturally occurring concentrations of constituents need to be determined in order to complete the horizontal and vertical delineations required as a basis for development of Corrective Action Plans. The horizontal and vertical extent of constituent migration cannot be determined until naturally occurring background concentrations are known.

This document serves as a framework for a consistent technical approach which will be utilized for Duke Energy sites in North Carolina to determine proposed provisional background threshold values (PPBTVs¹) for naturally occurring constituents in groundwater and soil.
For the purpose of establishing background threshold values (BTVs¹) at this time, the value which represents the upper threshold value from the data distribution for a given constituent will be considered the value representative of a naturally occurring concentration, or the PPBTV. The process for evaluating background concentrations over time is iterative; therefore, as additional background data are collected, the approach for developing BTVs may be reviewed and potentially modified with consideration of expanded data sets, changes in data set distribution, and input from the North Carolina Department of Environmental Quality (NCDEQ).

For groundwater, non-filtered (total) results will be used to establish BTVs. In general, groundwater data will not be included in the development of BTVs when turbidity of the groundwater sample was reported to be greater than 10 nephelometric turbidity units (NTU) or when pH is greater than 8.5. Professional judgment can be used to retain data that do not meet these criteria. However, the decision to retain such data must be documented, for example by recording concurrence from NCDEQ that naturally occurring pH is greater than 8.5 in the unit being evaluated.

Background locations for groundwater were identified for each site in the CSA Reports and/or Corrective Action Plans (CAPs). Other wells unaffected by plant operations may also be used to augment the background data set with agreement from NCDEQ.

¹ The terms PPBTV and BTV are used interchangeably in this document. The term BTV is used in the EPA ProUCL User guide.

For soil, only samples collected above the water table and at locations not influenced by plant operations will be included in the calculation of BTVs.
Site-specific soil sampling locations and intervals are described in the CSA Work Plans.

The methods for developing PPBTVs described in this document are based on the US Environmental Protection Agency (USEPA) "Unified Guidance" (USEPA 2009), USEPA's Guidance for Comparing Background and Chemical Concentrations in Soil for CERCLA Sites (USEPA 2002), and the ProUCL Technical Guide (USEPA 2015). In addition, the North Carolina Division of Water Quality (NCDWQ) technical assistance document for Evaluating Metals in Groundwater at DWQ Permitted Facilities (NCDWQ 2012) was referenced.

USEPA's ProUCL Version 5.1 Technical Guide (EPA/600/R-07/041, December 2015) states that:

    A defensible background data set represents a "single" environmental population possibly without any outliers. In a background data set, in addition to reporting and/or laboratory errors, statistical outliers may also be present... elevated outliers should not be included in background data sets and estimation of BTVs. The objective here is to compute background statistics based upon a data set which represents the main background population, and does not accommodate the few low probability high outliers (e.g., coming from extreme tails of the data distribution) that may also be present in the sampled data. The occurrence of elevated outliers is common when background samples are collected from various onsite areas (e.g., large Federal Facilities). The proper disposition of outliers, to include or not include them in statistical computations, should be decided by the project team. The project team may want to compute decision statistics with and without the outliers to evaluate the influence of outliers on the decision making statistics.

The methods described in this document are intended to serve as guidelines to develop BTVs.
The use of the upper tolerance limit (UTL) to establish BTVs for constituents analyzed during assessment monitoring is consistent with NCDEQ Guidance as well as the USEPA's Unified Guidance (2009). The UTL will be evaluated as the statistic for development of groundwater and soil BTVs. BTVs will be developed for a select group of constituents derived from the list of parameters investigated as part of CAMA (Table 1). The UTL will be used to represent an upper limit for naturally occurring concentrations such that values exceeding this limit may be indicative of groundwater and soil impacts.

Naturally occurring concentrations determined by the process presented in this document will be submitted to the NCDEQ Division of Water Resources for determination of the PPBTVs. Site-specific reports documenting the procedures, evaluations, and calculations will be prepared and submitted to NCDEQ. Following NCDEQ's approval of the PPBTVs, the PPBTVs will be used as groundwater and soil standards when the values exceed concentrations appearing in T15A NCAC 02L .0202(g) or the Interim Maximum Allowable Concentrations (Appendix #1 to T15A NCAC 02L) for groundwater or Preliminary Soil Remediation Goals as described in Section 4 of the NCDEQ 2015 Inactive Hazardous Sites Program Guidelines for Assessment and Cleanup (NCDEQ 2015) for soil.

This document consists of four parts describing the process for establishing BTVs for constituents in groundwater and soil:

Part I: Description of Background Data Sets
Part I provides discussion of groundwater and soil sample collection, background data set attributes, and preliminary treatment of background data.
Part II: Preliminary Data Analysis
Part II includes analyses used to assess and transform data (where necessary) for use in producing appropriate UTLs. This analysis includes screening data sets for outliers, fitting data sets to distribution models, assessing data for temporal variability, and assessing the appropriateness of the period of record (sampling period).

Part III: Testing for Sub-Groups in Background Groundwater Data
Part III summarizes the approach for testing data sets for distinct sub-groups. If testing indicates the presence of sub-groups, the same steps described in Part I can be applied to the partitioned data to better understand the distribution of the samples within a sub-group for each constituent.

Part IV: Development of BTVs for Constituents in Groundwater and Soil
Part IV documents the steps for producing UTLs for each constituent for groundwater and soil.

PART I: DESCRIPTION OF BACKGROUND DATA SETS

Background data sets vary in size from site to site. Background groundwater data have been collected over a period of time at multiple locations per site. Background soil samples were primarily collected as part of the CSA activities. The following sections describe the groundwater and soil samples. Additional details regarding site-specific data sets have been provided in the CSA, CAP 1, CAP 2, supplemental reports, and electronic data submittals for each site. The data sets continue to be refined as additional data become available over time.

Sample results with a detection or reporting limit greater than the applicable standard will not be included in the background data sets.
Should the detection or reporting limit criteria reduce the data set such that sufficient data are not available for producing BTVs for particular constituents, NCDEQ will be consulted to discuss alternative evaluation options for assessment of background, such as groundwater fate and transport modeling.

Groundwater

Groundwater samples are collected from monitoring wells screened in different flow layers using low-flow sampling techniques in accordance with the USEPA Region 1 Purging and Sampling Procedure for the Collection of Groundwater Samples from Monitoring Wells (revised January 19, 2010) and the Groundwater Monitoring Program, Low Flow Sampling Plan, Duke Energy Facilities, Ash Basin Groundwater Assessment Program, North Carolina, dated June 10, 2015. Groundwater samples have been analyzed for constituents listed in Table 1. Only non-filtered sample results will be utilized for producing BTVs.

Sample data associated with a reported turbidity greater than 10 NTUs, samples without a recorded turbidity, samples with a pH greater than 8.5, or non-detect samples with a method detection limit above the respective 2L Standard or IMAC will be excluded from the background data set. Where site conditions require, professional judgment can be used to retain data that do not meet these criteria (such as where the naturally occurring groundwater pH is greater than 8.5). However, the decision to retain data that do not satisfy these criteria must be documented. BTVs will be calculated for each constituent within a flow layer using data pooled from all background wells screened within that flow layer.

Soil

Discrete soil samples were collected from multiple depth intervals during the CSA or other assessment events. The total number of samples and the depth intervals in which samples were collected vary by site. Soil samples have been analyzed for constituents listed in Table 1.
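Returning briefly to the groundwater exclusion criteria listed in the Groundwater section above, the screening logic can be illustrated with a minimal sketch. The record layout (`GroundwaterResult` and its field names) and the helper functions are hypothetical and are not part of this document; only the thresholds (10 NTU, pH 8.5, method detection limit above the standard) come from the text. Retention by professional judgment is modeled simply as a documented reason attached to the record.

```python
from dataclasses import dataclass
from typing import List, Optional

TURBIDITY_LIMIT_NTU = 10.0  # from the text: exclude turbidity > 10 NTU
PH_LIMIT = 8.5              # from the text: exclude pH > 8.5


@dataclass
class GroundwaterResult:
    """Hypothetical record for one non-filtered (total) sample result."""
    value: float                      # concentration, or the MDL for a non-detect
    detected: bool                    # False for a non-detect
    turbidity_ntu: Optional[float]    # None when turbidity was not recorded
    ph: float
    standard: float                   # applicable 2L Standard or IMAC, same units
    retain_reason: Optional[str] = None  # documented professional-judgment override


def exclusion_reasons(result: GroundwaterResult) -> List[str]:
    """List every screening criterion that the result fails."""
    reasons = []
    if result.turbidity_ntu is None:
        reasons.append("turbidity not recorded")
    elif result.turbidity_ntu > TURBIDITY_LIMIT_NTU:
        reasons.append("turbidity > 10 NTU")
    if result.ph > PH_LIMIT:
        reasons.append("pH > 8.5")
    if not result.detected and result.value > result.standard:
        reasons.append("non-detect with MDL above standard")
    return reasons


def is_background_eligible(result: GroundwaterResult) -> bool:
    """Keep a result if it passes screening or carries a documented override."""
    return not exclusion_reasons(result) or result.retain_reason is not None
```

Keeping the reasons as a list rather than a single flag makes it straightforward to document why each result was excluded or retained.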
Only constituent concentrations from samples collected above the water table will be utilized for producing BTVs. To allow for comparison of results from soil samples collected from different depth intervals and locations across the site to the BTVs, background soil samples will be pooled from multiple depth intervals and non-impacted locations. Non-detect sample results with a method detection limit above the North Carolina Protection of Groundwater Preliminary Soil Remediation Goal (PSRG) will be excluded from the background soil data set.

Soil data are susceptible to spatial variation (by depth and geology), and as such preliminary data analysis methods will be used to evaluate the soil data set. To aid in identifying outliers, visual assessments will be performed using box-and-whisker plots, and quantitative assessments will be used to test for differences in mean or median concentration across depth intervals or geologic formations. Results from the statistical analysis of soil data sets will inform whether pooling of soil data across multiple depth intervals or geologic formations is appropriate.

PART II: PRELIMINARY DATA ANALYSIS

Preliminary data analysis includes eight steps and is summarized in the following sections.

1. Descriptive Statistics

Descriptive statistics are useful for characterizing data, increasing data set understanding, and for assessing information quality. For each site, descriptive statistics will be calculated for groundwater and soil data sets.
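As a rough illustration of the kind of summary this step produces, the sketch below computes a set of descriptive statistics for one constituent. Several choices here are assumptions rather than anything stated in this document: results are passed as hypothetical `(value, detected)` pairs with a non-detect carrying its MDL, the moment statistics are computed on detected values only, and skewness and kurtosis use the simple moment-based forms (statistical packages such as ProUCL may use other estimators and other non-detect handling).

```python
import statistics
from typing import Dict, List, Tuple


def describe(results: List[Tuple[float, bool]]) -> Dict[str, float]:
    """Summarize (value, detected) pairs; a non-detect carries its MDL as value."""
    detects = [value for value, detected in results if detected]
    mdls = [value for value, detected in results if not detected]
    if len(detects) < 2:
        raise ValueError("at least two detected values are needed")
    n = len(results)
    mean = statistics.fmean(detects)
    # Second, third, and fourth central moments of the detected values.
    m2, m3, m4 = (
        sum((v - mean) ** power for v in detects) / len(detects)
        for power in (2, 3, 4)
    )
    return {
        "sample_size": n,
        "detects": len(detects),
        "non_detects": len(mdls),
        "percent_non_detects": 100.0 * len(mdls) / n,
        "distinct_detected_values": len(set(detects)),
        "distinct_mdls": len(set(mdls)),
        "mean": mean,
        "median": statistics.median(detects),
        "minimum": min(detects),
        "maximum": max(detects),
        "standard_deviation": statistics.stdev(detects),  # n - 1 denominator
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else float("nan"),
        "excess_kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else float("nan"),
    }
```

For groundwater, such a function would be called once per constituent and flow layer on the pooled background data; for soil, once per constituent on the pooled background data set.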
For groundwater, descriptive statistics will be calculated for each constituent within each groundwater flow layer using pooled data from that groundwater flow layer. Soil descriptive statistics will be calculated for each constituent using the pooled background data set. The following statistics will be calculated to describe each data set.

• Sample Size
• Number of detects and non-detects
• Percentage of non-detects
• Number of distinct observations
• Number of distinct method detection limits (MDL)
• Mean and median
• Maximum and minimum
• Standard Deviation
• Skewness
• Kurtosis

2. Graphical Analysis

Background groundwater data can be graphically portrayed using scatter plots, box-and-whisker plots, and quantile-quantile (Q-Q) plots (Figures 1 and 2), while background soil data can be illustrated using box-and-whisker and Q-Q plots. The construction of scatter plots of concentration versus time (Figure 3) for each constituent within each background monitoring well, or using the pooled data across all the background wells, can assist in identifying potential trends or seasonality within data. Box-and-whisker and Q-Q plots can be constructed for each constituent within each groundwater flow layer using all data pooled from background wells monitoring that flow layer to identify possible outliers and sub-groups in addition to assessing data set distributions.

Since only one constituent per soil boring is sampled, side-by-side box-and-whisker plots containing the concentrations of all constituents will be generated to capture any spatial variability arising from the different soil boring depths. Q-Q plots for the soil samples can be constructed per constituent to visually identify outliers based on the observations pooled across the soil depths.

Instructions for interpreting box-and-whisker plots can be found on Figure 1. Q-Q plots (Figure 2) evaluate if a theoretical distribution can accurately model a sampled distribution.
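The construction of a normal Q-Q plot can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Figure 2: the plotting positions `(i - 0.5) / n` are one common convention among several, and the function names are hypothetical. The correlation between theoretical and sample quantiles is included only as a numeric companion to the visual check, since a value near 1 corresponds to points lying close to a straight line.

```python
import math
from statistics import NormalDist, fmean
from typing import List, Sequence, Tuple


def normal_qq_points(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (theoretical normal quantile, sample quantile) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("at least three values are needed")
    standard_normal = NormalDist()
    # Assumed plotting positions: (i - 0.5) / n for the i-th smallest value.
    return [
        (standard_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


def qq_correlation(points: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation between theoretical and sample quantiles."""
    xs = [t for t, _ in points]
    ys = [s for _, s in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("quantiles must not all be identical")
    return sxy / math.sqrt(sxx * syy)
```

A single elevated value pulls the upper end of the plot away from the line and lowers the correlation, which is the pattern the visual outlier screening looks for.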
If the sampled population is accurately modeled by the theoretical distribution, then quantiles from the sampled distribution should plot along a straight line when plotted against the quantiles of the theoretical distribution. Sampled values that plot markedly away from the straight line, or jumps or breaks in the plot, may indicate the presence of multiple sample populations, potential outliers, or non-normal sample distributions.

The graphical analysis provides information regarding a steady-state baseline period. Multiple method detection limits over time will also be evaluated to determine if such variability affects the quality of the data. This process will be used to determine if all data can be incorporated into the analysis or if older historical data may need to be removed from the data set due to a change in the data reporting protocols for samples over time.

3. Identify Outliers

Outliers are values that are not representative of the population from which they were sampled and whose presence can significantly alter statistical results. Data sets will initially be screened for potential outliers visually using box-and-whisker and Q-Q plots (Figures 1 and 2). Following the visual assessment of data for potential outliers, data sets will be screened for outliers quantitatively. While there are several tests available to test for possible outliers, Dixon's and Rosner's outlier tests are specifically identified in the Unified Guidance (USEPA 2009) as providing the statistical strength and power necessary to appropriately identify potential outliers. Dixon's Outlier Test is suitable for data sets containing fewer than 25 samples, whereas Rosner's test is applicable for data sets containing more than 25 samples. Both tests assume data are normally distributed.
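To make the small-sample test more concrete, the sketch below computes Dixon's ratio statistic for a suspected high outlier. It is an illustration only: the ratio used in each sample-size band follows the commonly tabulated form of the test and should be checked against the Unified Guidance before use, the function names are hypothetical, and the critical value for the chosen significance level is not reproduced here and must be taken from a published table or from software such as ProUCL.

```python
from typing import Sequence


def dixon_high_statistic(values: Sequence[float]) -> float:
    """Dixon's ratio for testing whether the largest value is an outlier."""
    x = sorted(values)
    n = len(x)
    if not 3 <= n <= 25:
        raise ValueError("this sketch covers sample sizes from 3 to 25")
    if n <= 7:        # gap to the next-largest value over the full range
        numerator, denominator = x[-1] - x[-2], x[-1] - x[0]
    elif n <= 10:     # same gap, range excludes the smallest value
        numerator, denominator = x[-1] - x[-2], x[-1] - x[1]
    elif n <= 13:     # gap to the third-largest, range excludes the smallest
        numerator, denominator = x[-1] - x[-3], x[-1] - x[1]
    else:             # gap to the third-largest, range excludes the two smallest
        numerator, denominator = x[-1] - x[-3], x[-1] - x[2]
    if denominator == 0:
        raise ValueError("values in the comparison range are identical")
    return numerator / denominator


def is_high_outlier(values: Sequence[float], critical_value: float) -> bool:
    """Compare the statistic with a user-supplied tabulated critical value."""
    return dixon_high_statistic(values) > critical_value
```

Because the test assumes normality, it would typically be applied to the data (or to transformed data) only after the outlier-free portion looks reasonably normal on a Q-Q plot.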
Extreme outliers are of interest; therefore, outlier tests will be conducted using a significance level of 0.01.

Groundwater and soil constituent concentrations determined to be outliers will be provided in the statistics report submitted to NCDEQ. If statistical outliers have been detected, the project scientist will review the values to determine if they should be removed from the data set or are representative of background and should be retained for statistical analysis. Reasons as to why a particular statistical outlier should be included in or excluded from either groundwater or soil background data sets will be documented as part of the final reference background concentration value documentation notes.

4. Identifying Data Distributions

Many statistical tests, such as UTLs, make an explicit assumption concerning the distribution of sample data. Therefore, data must be fitted to a known distribution model (e.g., normal distribution). Upon completion of screening data sets for outliers, groundwater and soil data will be fitted to known distribution models using Goodness-of-Fit (GOF) tests. GOF tests assess how closely a data set resembles a given distribution model. The distribution models under consideration for the determination of groundwater and soil BTVs are normal, lognormal, and gamma distributions.

In order to assess if data are normally or lognormally distributed, the Shapiro-Wilk or Lilliefors GOF test will be used. The Shapiro-Wilk GOF test is applicable for data sets comprising 50 or fewer samples, while the Lilliefors GOF test is appropriate for data sets containing more than 50 samples. To evaluate if data are gamma distributed, the Anderson-Darling or Kolmogorov-Smirnov GOF test will be utilized.
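For the normal case, the idea behind the Lilliefors test can be sketched briefly: the statistic is the largest vertical distance between the empirical distribution of the data and a normal distribution whose mean and standard deviation are estimated from the same data. The function below computes only that statistic and is a hedged illustration with a hypothetical name; the decision itself requires Lilliefors critical values, which are not reproduced here and would come from published tables or from software such as ProUCL.

```python
from statistics import NormalDist, fmean, stdev
from typing import Sequence


def lilliefors_statistic(values: Sequence[float]) -> float:
    """Maximum distance between the empirical CDF and a fitted normal CDF."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are needed")
    sd = stdev(x)
    if sd == 0:
        raise ValueError("values must not all be identical")
    fitted = NormalDist(fmean(x), sd)  # parameters estimated from the sample
    distance = 0.0
    for i, value in enumerate(x, start=1):
        cdf = fitted.cdf(value)
        # The empirical CDF jumps from (i - 1) / n to i / n at each ordered value.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance
```

To examine lognormality in the same way, the function can be applied to the natural logarithms of strictly positive concentrations.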
GOF tests will be performed using a significance level of 0.05. The software package developed by the USEPA, ProUCL, has incorporated these methods to automatically test for normal, lognormal, or gamma distribution types. If all GOF tests fail, non-parametric estimation methods will be used. The distribution of data will be evaluated as the data sets are established, with the understanding that distributions may change over time.

5. Evaluating Background Groundwater Data

The following section applies to data sampled over time, such as groundwater data, and is not applicable to soil data.

Constituent concentrations in groundwater sampled over time from multiple background well locations may exhibit patterns which suggest concentrations are increasing or decreasing over time. For background samples to be considered representative of areas unimpacted by human activity and be meaningful in the production of the BTVs, constituent concentrations over time should reflect a steady state, or 'temporal stationarity'. In other words, a constituent's population characteristics (mean and variance) do not fluctuate over time (with consideration of normal seasonal fluctuations). Another assumption regarding samples collected across multiple background wells at a site is that a constituent's mean and variance are constant across background wells, or 'spatial stationarity'. If data collected from the background wells exhibit temporal or spatial non-stationarity, pooling of background well data can result in an inflated population variance and biased estimates of BTVs.

A comparison of multiple box-and-whisker plots (Figure 1) can be used to visually assess whether background well distributions have similar constituent concentration means and variances.
Based on visual inspection of box-and-whisker plots, further analysis (such as testing for differences in means or medians across background well locations) may be warranted to determine if a background well should be considered representative of background.

Statistical tests for trends over time using the pooled data from the background wells should show no statistical significance. However, before testing for trends in the background samples, a further assumption must be considered: constituent concentration values must be independent of one another. When values are related to each other over varying time intervals, values at any point in time can be expressed as a function of previous value(s). This type of relationship is termed autocorrelation. When values express a seasonal relationship, this type of autocorrelation is termed seasonality.

The presence of autocorrelation, seasonality, or trends indicates data are temporally non-stationary. Assessment of background groundwater data should be performed to address temporal stationarity prior to pooling background data for the production of BTVs. Details for assessing data sets for temporal stationarity are summarized in the following sections.

6. Autocorrelation

Autocorrelation occurs when measurements collected at different points in time correlate with one another. Sources of autocorrelation in groundwater data can be seasonality, trends, or samples being collected too close to one another in time. Data that exhibit autocorrelation can affect sample variance and can lead to biased estimates of BTVs. For purposes of the initial raw background data set and development of PPBTVs, a minimum 60-day interval between sample events will be used.
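A minimal sketch of checking the 60-day spacing for one background well is shown below. The function name and the input format (a plain list of sampling dates) are hypothetical; only the 60-day minimum comes from the text.

```python
from datetime import date
from typing import List, Tuple

MIN_INTERVAL_DAYS = 60  # from the text: minimum interval between sample events


def short_intervals(
    sample_dates: List[date], min_days: int = MIN_INTERVAL_DAYS
) -> List[Tuple[date, date, int]]:
    """Return consecutive sampling events that are closer than min_days apart."""
    ordered = sorted(sample_dates)
    flagged = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = (later - earlier).days
        if gap < min_days:
            flagged.append((earlier, later, gap))
    return flagged
```

Any pair returned by this check identifies samples for which an autocorrelation evaluation would be needed before both results are kept in the background data set.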
In the event samples are collected at intervals shorter than 60 days (e.g., for catch-up sampling at problematic locations or because of site conditions), autocorrelation evaluations will be performed and may be provided to the Division of Water Resources as lines of evidence to confirm the samples are not autocorrelated and can be included in the background data sets.

Constituent concentrations in groundwater at a given background well will be checked for autocorrelation using the sample autocorrelation function (USEPA 2009). The sample autocorrelation function graphs correlation values between successive measurements against the time lag between sampling events and assumes data can be fitted to a known distribution model (Figure 4). The magnitude of a correlation value lies between zero and one, where one indicates a perfect correlation (dependence) and zero represents no correlation (independence). The sample autocorrelation function will be calculated using a significance level of 0.05.

Autocorrelated observations can be corrected by 1) reducing sampling frequency and increasing the time between sample collection; 2) altering the statistical test used to analyze the data; or 3) removing temporal patterns using a technique such as deseasonalization.

7. Seasonality

Constituents in groundwater at background well locations may experience predictable recurring increases and decreases in concentrations, termed seasonality (Figure 5). Seasonality within a data set can introduce bias into the calculation of BTVs and result in falsely identifying a seasonal effect as a potential impact. Data should be assessed for seasonality once an adequate number of background groundwater samples have been collected. Useful diagnostic tools for evaluating data sets for seasonality are autocorrelation and scatter plots (Figures 4 and 5). When sufficient observations are available, a side-by-side comparison of multiple box-and-whisker plots constructed by season is informative.
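The sample autocorrelation function used as a diagnostic in Sections 6 and 7 can be sketched as follows. This is an illustration under assumptions that are not stated in this document: sampling events are treated as equally spaced, and lags are flagged using the large-sample bounds of plus or minus 1.96 divided by the square root of n, a common approximation for a 0.05 significance level that may differ from the procedure in the Unified Guidance. Function names are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence


def sample_acf(values: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation at lags 1..max_lag for an equally spaced series."""
    n = len(values)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and n - 1")
    mean = fmean(values)
    deviations = [v - mean for v in values]
    total = sum(d * d for d in deviations)
    if total == 0:
        raise ValueError("series is constant")
    return [
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / total
        for lag in range(1, max_lag + 1)
    ]


def significant_lags(values: Sequence[float], max_lag: int, z: float = 1.96) -> List[int]:
    """Lags whose autocorrelation falls outside the approximate bounds."""
    bound = z / math.sqrt(len(values))
    return [
        lag
        for lag, r in enumerate(sample_acf(values, max_lag), start=1)
        if abs(r) > bound
    ]
```

For quarterly data, a large autocorrelation at lag 4 (one year) alongside smaller values at other lags would be consistent with a seasonal pattern; negative values indicate that high and low results tend to alternate at that lag.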
If constituent concentrations within a given background well appear to experience seasonal fluctuations, the seasonal component within the data can be removed for the purpose of testing for trends. If seasonality is not addressed prior to testing for trends, then the statistical tests for trends may be misleading (i.e., they may fail to detect a trend when one is actually present or may indicate a significant trend when, in fact, no trend exists). Strong evidence for the cause of seasonality within a data set should exist prior to removing seasonal components from background data sets.

8. Trends

Wells installed at background locations monitor natural groundwater quality unaffected by anthropogenic activities. Therefore, a key assumption regarding background is that constituent concentrations in groundwater should demonstrate stationary conditions through time, free of any trends (Figure 6). Background data exhibiting trends (upward or downward) violate the assumption of temporal stationarity. Trending constituent concentrations in background wells may indicate potential anthropogenic impacts (resulting in the well no longer being considered background), seasonality, or changing groundwater conditions. Furthermore, the presence of trends in background data can lead to overestimation of variances, which results in inflated BTVs. Prior to the calculation of BTVs, background well data will be evaluated for the presence of trends.
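As an illustration of a nonparametric trend check, the sketch below implements the Mann-Kendall test, one of the tests named in the list that follows. It is a simplified example rather than the procedure applied at any site: it uses the normal approximation with a continuity correction and a tie correction, which is generally considered reasonable only for roughly ten or more observations, and it assumes the values are already ordered by sampling date with any non-detects replaced by a single MDL so that they tie with one another.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Sequence, Tuple


def mann_kendall(values: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p-value) for a series ordered by sampling date."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are needed")
    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            difference = values[j] - values[i]
            s += (difference > 0) - (difference < 0)
    # Variance of S with a correction for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    variance = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if variance == 0:
        return s, 0.0, 1.0  # every value is identical, so no trend is detectable
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value
```

A positive S with a small p-value suggests an upward trend and a negative S a downward trend; the significance level to apply is a project decision and is not specified in this sketch.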
Depending on the presence of non-detects (NDs) and seasonality, background data sets can be assessed for trends using one of three tests:
• Mann-Kendall trend test
• Seasonal Kendall regression
• Maximum likelihood estimation (MLE) regression

The Mann-Kendall (MK) trend test is a nonparametric method that can be used to identify monotonic (e.g., linear) trends in data sets that do not adhere to a specific distribution model, do not exhibit seasonality, and may contain NDs. The MK trend test can only be used when all NDs in a data set are reported at a single MDL. Seasonal Kendall regression is similar to the MK test (data sets do not have to adhere to a specific distribution model and can contain NDs as long as they are represented by a single MDL), except that it accounts for seasonality. MLE regression is a parametric method for estimating the parameters of a statistical model and fitting that model to data. MLE regression can be performed on data sets that can be fitted to a specific distribution model, do not demonstrate seasonality, and may contain NDs.

In cases where trending background constituent concentrations are identified, further analysis is recommended to determine whether the trend is an artifact of the length of time available for the analysis and/or small sample sizes. For example, if fewer than 10 samples have been collected from a background well over a short duration (less than two years), then an observed trend in the well may not indicate a change in groundwater quality and may instead be representative of natural variation. If sufficient data are available for a constituent (> 20 observations), a statistical method called the piece-wise polynomial model can be used to inform the overall trend results. A description of this approach is as follows.

9. Additional Methods for Identifying Trends in Background Groundwater Data

The piece-wise polynomial model is a useful tool for assessing constituent concentrations that have experienced multiple trends throughout monitoring. Piece-wise polynomial models attempt to find an appropriate mathematical function that expresses the relationship between the constituent concentrations and the sampling dates by using piece-wise regressions. Two types of piece-wise models can be used to evaluate trends: the linear-linear and the linear-linear-linear regression models. The linear-linear regression model assumes and identifies one structural break in a time series, in which the two portions of the data separated by the break point exhibit two different trends modeled by two different linear equations. Similarly, the linear-linear-linear regression model attempts to identify two structural breaks to assess three different linear trends.

Piece-wise polynomial models can be informative, but they have the disadvantage of not being able to account for NDs within data sets. Therefore, it is recommended that piece-wise polynomial models be implemented in conjunction with MLE regression. Piece-wise models can also serve as a visual guide when selecting the baseline sampling periods for statistical analysis. For example, in Figure 7 the MLE regression suggests that constituent concentrations are steadily increasing over time, whereas the piece-wise polynomial regression with two structural breaks indicates that concentrations have experienced both upward and downward trends.

10. Determining Baseline Period for Background Wells

This step provides information to determine whether the entire period of record from which the background samples were collected is representative of natural background conditions and represents a baseline against which downgradient constituent concentrations in groundwater can be tested. If trend analysis indicates that the observations are steadily increasing or decreasing over time, then the data will be reviewed to determine if a sub-segment of the data set better represents the background period. For values to be considered representative of background, they should demonstrate temporal stationarity.

PART III — TESTING FOR SUB-GROUPS IN BACKGROUND GROUNDWATER DATA

The following sections summarize the methodology for identifying sub-groups in groundwater data resulting from spatial or temporal variability; this methodology will not be applicable to the assessment of soil data. Part III summarizes steps to determine whether statistically significant differences in background concentrations exist across potential sub-groups. Sub-groups represent distinct populations with statistically significant differences in mean or median concentrations among potential groups within a data set. An example of possible sub-groups is a difference in constituent concentrations among background wells that monitor bedrock groundwater and were installed in different rock types. Each rock type has its own chemical characteristics that can influence groundwater chemistry and result in differing constituent concentrations across background wells.
In order to test for differences across potential sub-groups, a sample size of at least eight to 10 samples is recommended for each potential sub-group (USEPA 2009, 2015). Testing for potential sub-groups within background data will be completed in three steps:
• Graphical analysis
• Analytical test for comparing sub-group differences
• Tests for distinguishing which sub-groups are different

Statistical tests utilized to test for potential sub-groups will be performed using a significance level of 0.05.

Graphical Analysis

Graphical representation of data is an effective tool for depicting patterns and relationships within data. Background groundwater data can be assessed for sub-groups using box-and-whisker and Q-Q plots (Figures 1 and 2). Multiple box-and-whisker and Q-Q plots can be constructed for comparing constituent concentrations and variability across perceived sub-groups.

Another useful visual test for assessing potential sub-group differences is the Empirical Distribution Function (EDF). EDF analysis can be used to compute summary statistics, generate EDF plots (Figure 8), and perform hypothesis tests appropriate for comparing two or more groups for data containing NDs (provided less than 50 percent of the results are NDs). The EDF plot in Figure 8 shows two sub-groups, representing samples taken during two different seasons, that have similar distributions, indicating no difference in constituent concentrations between the two seasons.
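To make the EDF comparison concrete, the following minimal sketch builds empirical distribution functions for two groups and reports the largest vertical gap between the two curves. The function names and the seasonal values are hypothetical, every result is treated as a detected concentration (NDs are not handled), and the gap statistic is only an illustrative summary chosen for this example, not the hypothesis test applied in this report.

```python
def ecdf(values):
    """Return (value, cumulative proportion) pairs for plotting an EDF."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

def max_ecdf_gap(group_a, group_b):
    """Largest vertical distance between the EDFs of two groups."""
    def proportion_at_or_below(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(group_a) | set(group_b))
    return max(
        abs(proportion_at_or_below(group_a, x) - proportion_at_or_below(group_b, x))
        for x in points
    )

# Hypothetical concentrations for two seasons at one background well.
wet_season = [21.0, 24.5, 19.8, 23.1, 22.4, 25.0, 20.7, 22.9]
dry_season = [22.2, 23.8, 20.1, 24.0, 21.5, 24.7, 21.9, 23.3]

for value, proportion in ecdf(wet_season):
    print(f"{value:6.1f}  {proportion:.3f}")
print("largest gap between seasonal EDFs:", max_ecdf_gap(wet_season, dry_season))
```

Curves that nearly overlap (a small gap) are consistent with the two groups sharing one distribution; a formal test is still needed before groups are pooled or separated.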
Analytical Tests for Comparing Sub-Groups

The following methods can be used to test for differences across sub-groups:
• T-test and One-way Analysis of Variance (ANOVA)
• Wilcoxon rank-sum and Kruskal-Wallis (KW) tests
• Kaplan-Meier (KM) (log-rank) test

All three types of tests can be used to test data sets containing NDs.

The t-test and One-way ANOVA are parametric statistical analyses that test for differences in means among groups. T-tests are used to test for differences in means between two groups, whereas ANOVA is used to test for differences in means across three or more groups. Both tests assume that the data are normally distributed, with normally (or lognormally) distributed residual values, and that the variances among the groups being compared are roughly the same.

The Wilcoxon rank-sum test and KW test are nonparametric equivalents of the parametric t-test and One-way ANOVA. Both the Wilcoxon rank-sum and KW tests analyze the ranks of the data rather than the actual concentrations and test for differences in average ranks between groups. The Wilcoxon rank-sum test compares the average rank between two groups, while the KW test compares the average rank across three or more groups.

The KM (log-rank) test is a nonparametric test that compares the survival distribution between two or more groups. The KM (log-rank) test is useful for data sets that cannot be fitted to a discernible distribution model and contain a large percentage of NDs.

Testing for potential sub-groups within background groundwater data sets will be performed using a significance level of 0.05.

Tests for Identifying Differences Among Sub-Groups

If results from One-way ANOVA, KW, or KM (log-rank) tests indicate a statistically significant difference during comparison of three or more groups, additional tests need to be performed to compare all possible pairs of sub-group means or average ranks to determine which ones are different from one another. These tests are referred to as 'post-hoc' tests because they are performed after the fact. The Tukey-Kramer test and Dunn's test are post-hoc tests that should be used to compare possible pairs of sub-group means or average ranks. The Tukey-Kramer test is parametric and should be used to evaluate One-way ANOVA results, whereas Dunn's test is nonparametric and should be utilized to assess KW and KM (log-rank) test results. Post-hoc analysis will be performed using a significance level of 0.05.

PART IV — DEVELOPMENT OF BTVs FOR CONSTITUENTS IN GROUNDWATER AND SOIL

The USEPA Unified Guidance (2009) recommends using UTLs to estimate BTVs for constituents evaluated during assessment monitoring, as opposed to other statistical intervals such as upper confidence limits (UCLs) and upper prediction limits (UPLs). UTLs represent fixed values that do not rely on future observations, unlike UPLs, and are constructed using background data as opposed to a fixed health-based standard, unlike UCLs (USEPA 2009). UTLs allow for a "suitably high enough level above current background to allow for reversal of the test hypothesis" and are the preferred statistical interval (USEPA 2009). In nearly all cases, the UTL is computed because the concern is generally for exceedances greater than the value. The only parameter that may require both upper and lower tolerance limits is pH.

Site-specific BTVs for select constituents from Table 1 in groundwater and soil will be produced using UTLs. Tolerance intervals test the null hypothesis that concentrations in downgradient wells or at impacted soil sampling locations are similar to those of background, and are constructed using the mean, standard deviation, and tolerance factor.
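A minimal sketch of that construction under a normal distribution assumption follows: the UTL is the sample mean plus a tolerance factor times the sample standard deviation. The tolerance factor here comes from a closed-form normal approximation, which is an assumption made so that the example needs only the Python standard library; tabulated factors or ProUCL output should be preferred for actual BTVs. The function names and the data are hypothetical, and the defaults of 95 percent coverage and 95 percent confidence mirror the values used in this document.

```python
import math
from statistics import NormalDist, mean, stdev

def approx_tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Approximate one-sided normal tolerance factor K (not an exact value)."""
    if n < 3:
        raise ValueError("approximation needs at least 3 samples")
    z_p = NormalDist().inv_cdf(coverage)    # quantile for the coverage
    z_g = NormalDist().inv_cdf(confidence)  # quantile for the confidence
    a = 1.0 - z_g ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_g ** 2 / n
    return (z_p + math.sqrt(z_p ** 2 - a * b)) / a

def normal_utl(data, coverage=0.95, confidence=0.95):
    """Upper tolerance limit: sample mean + K * sample standard deviation."""
    k = approx_tolerance_factor(len(data), coverage, confidence)
    return mean(data) + k * stdev(data)

# Hypothetical background concentrations (all detected, assumed normal).
background = [4.2, 3.9, 4.8, 5.1, 4.4, 4.0, 4.6, 5.0, 4.3, 4.7]
print("approximate K for n = 10:", round(approx_tolerance_factor(10), 3))
print("approximate 95/95 UTL:", round(normal_utl(background), 3))
```

The factor shrinks as the sample size grows, which is why a UTL based on a small background data set sits further above the mean than one based on a large data set with the same spread.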
For the estimation of BTVs for constituents in groundwater and soil, a coverage of 95 percent (p) and a confidence level of 95 percent (1 − α) will be used. This means there is 95 percent confidence that at least 95 percent of background concentrations will fall at or below this limit. The formulation of the UTL may vary slightly with the details of the test to be made and the characteristics of the data involved (see Chapters 3 and 5 of the ProUCL Version 5.1.002 Technical Guide for the full specifications of the UTL formula under differing parametric and nonparametric assumptions), but the basic form of the (1 − α) × 100 percent UTL with coverage coefficient p under normal distribution assumptions is:

\[ \mathrm{UTL} = \bar{x} + K \cdot s \]

where:
$\bar{x}$ = baseline (historical data) sample mean; and
$s$ = baseline (historical data) standard deviation.

K represents a special function called the tolerance factor. It depends on the sample size (n), the confidence coefficient (1 − α), and the coverage proportion (p). For selected values of n, p, and (1 − α), values of the tolerance factor (K) have been tabulated extensively in the statistical literature.

ProUCL will be utilized to produce UTLs for each constituent. The type of UTL produced depends on the distribution type, the desired confidence level, the coverage, and the percentage of NDs. Following completion of the preliminary data analysis described in Part II and applicable steps in Part III, the steps below will be completed for selection of appropriate UTLs.

1. UTLs will be produced for constituents in groundwater and soil using the statistical software program ProUCL. The first step in constructing UTLs using ProUCL is to categorize constituents based on the presence or absence of NDs. ProUCL calculates UTLs differently depending on whether NDs are present within a data set. The algorithms in ProUCL use imputation and modeling techniques to address NDs. ProUCL does not substitute values (e.g., multiplying the MDL by a constant) for NDs, as this method introduces bias into the estimation of UTLs. Some constituent data sets may consist of 50 percent or more NDs. A large percentage of NDs makes it difficult to fit data to distribution models. For data sets containing 50 percent or more NDs, UTLs will be constructed utilizing nonparametric techniques.

2. Produce UTLs using a coverage (p) and confidence level (1 − α) of 95 percent.

3. Record all UTLs under all parametric and nonparametric distribution models. When data sets used for producing UTLs can be fitted to multiple distribution models, a specific hierarchy preference is applied. Calculation of a specific UTL will follow the distribution hierarchy preference below, with the noted exceptions:
   I. normal,
   II. gamma,
   III. lognormal, and
   IV. nonparametric.
   The exception to the hierarchy applies where the data set exhibits moderate or higher skewness (e.g., the standard deviation of the logged data is greater than 1) and the sample size is small (e.g., n < 30). In these situations, the nonparametric UTL is preferred over the lognormal UTL. Data set distributions will continue to be evaluated as additional samples are collected, with the understanding that distributions may change over time.

4. It has been demonstrated that if sample sizes are insufficient, the nonparametric UTL cannot achieve the desired confidence coefficient of 95 percent. Depending on background sample size, a different order statistic is selected to produce UTLs. For constituent data sets containing fewer than 59 samples, UTLs will be produced using a coverage of 85 percent (i.e., the 85th percentile) and a confidence coefficient of 95 percent, as this coverage is more likely to be achieved even with sample sizes as low as 10. For data sets containing 59 or more samples, UTLs will be produced using a coverage of 95 percent and a confidence coefficient of 95 percent.

5. A minimum of ten valid background samples should be obtained prior to producing BTVs for each constituent for soil and each constituent in each flow layer for groundwater. If it is deemed necessary to produce BTVs prior to obtaining ten valid samples, UTLs will not be calculated and the PPBTV for a constituent (soil) or a constituent within a flow layer (groundwater) will be estimated to be either:
   • the highest value, or
   • if the highest value is an order of magnitude greater than the geometric mean of all values, then the highest value will be considered an outlier and the second highest value will be utilized as the PPBTV.

In situations where there are non-detects and fewer than ten valid samples, the geometric mean, which is the nth root of the product of all n values (including the censored values), may not be representative of the central tendency of that sample. The median may be a better reference value from which to determine if the highest value is an acceptable estimate for the PPBTV, and may be utilized if determined appropriate. In addition, the time frame necessary to collect an additional ten samples for further evaluation of background may not be available given the assessment deadlines and autocorrelation restrictions.
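Two of the rules in the steps above lend themselves to a short worked sketch: the 59-sample threshold in step 4 and the small-sample PPBTV rule in step 5. The sketch assumes the nonparametric UTL is taken as the maximum observed value, in which case the achieved confidence is 1 − pⁿ for coverage p and sample size n. It also reads "an order of magnitude greater" as "at least ten times", which is an interpretation made for this example. All function names and data are hypothetical, values are assumed positive, and NDs are not handled.

```python
import math

def achieved_confidence(n, coverage):
    """Confidence that the sample maximum covers the given proportion."""
    return 1.0 - coverage ** n

def min_samples_for_max_utl(coverage, confidence):
    """Smallest n for which the maximum reaches the target confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))

def provisional_btv(values):
    """Highest value, or the second highest if the highest looks like an outlier.

    Assumption: the highest value is treated as an outlier when it is at
    least 10 times the geometric mean of all values.
    """
    ordered = sorted(values)
    geometric_mean = math.exp(sum(math.log(v) for v in ordered) / len(ordered))
    if len(ordered) > 1 and ordered[-1] >= 10.0 * geometric_mean:
        return ordered[-2]
    return ordered[-1]

print("minimum n for a 95/95 maximum-based UTL:", min_samples_for_max_utl(0.95, 0.95))
print("confidence with n = 58:", round(achieved_confidence(58, 0.95), 4))
print("confidence with n = 59:", round(achieved_confidence(59, 0.95), 4))

# Hypothetical data set with fewer than ten valid samples.
few_samples = [1.2, 0.9, 1.5, 1.1, 48.0, 1.3]
print("provisional value:", provisional_btv(few_samples))
```

The first function reproduces the 59-sample threshold: with 58 samples the maximum falls just short of 95 percent confidence for 95 percent coverage, and with 59 it just exceeds it.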
In evaluating the need for inclusion of additional background data to produce revised BTVs, DEQ will determine what data are appropriate for inclusion in a comprehensive background data set based on relevant considerations.

REFERENCES

NCDEQ DWR, 2012. Evaluating Metals in Groundwater at DWQ Permitted Facilities: A Technical Assistance Document for DWQ Staff. http://digital.ncdcr.gov/cdm/ref/collection/p16062coll9/id/251181.
USEPA, 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. Publication 9285.7-081.
USEPA, 2002. Guidance for Comparing Background and Chemical Concentrations in Soil for CERCLA Sites. EPA 540-R-01-003.
USEPA, 2009. Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities — Unified Guidance, March 2009. EPA 530-R-09-007.
USEPA, 2015. ProUCL 5.1.002 Technical Guide: Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations. EPA/600/R07/041.
FIGURES

Figure 1. Box-and-Whisker Plot [image not reproduced; box plot for PARAMETER with legend entries for possible outliers and the 90th, 75th, 50th, 25th, and 10th percentiles]

Figure 2. Quantile-Quantile (Q-Q) Plot [image not reproduced; Q-Q plot for PARAMETER, x-axis: Quantile (Theoretical)]

Figure 3. Scatter Plots Comparing Time versus Concentration between Two Wells [image not reproduced; series MW-1 and MW-2, x-axis: Date, 1/1/2010 to 12/6/2014]

Figure 4. Sample Autocorrelation Function
Autocorrelations of Parameter (0,0,4,1,0)
Lag   Correlation
1      0.557470
2      0.399406
3      0.182616
4      0.023263
5     -0.096115
6     -0.226209
7     -0.206603
8     -0.202751
9     -0.321882
10    -0.369379
11    -0.342618
12    -0.337405
13    -0.171677
14    -0.147838
15     0.043389
16     0.140436
17     0.185468
18     0.160949
19     0.140392
Significant if |Correlation| > 0.426401

Figure 5. Scatter Plot of Time versus Concentration Illustrating Seasonality [image not reproduced; x-axis: Date, 3/3/2010 to 9/27/2016]

Figure 6. Scatter Plots of Time versus Concentration Illustrating Trends [image not reproduced; two panels, Upward Trend and Downward Trend, x-axis: Date, 3/3/2010 to 9/27/2016]

Figure 7. Piece-Wise Polynomial Regression Output Exhibiting Multiple Trends [image not reproduced; Annual Trend Analysis: Deseasonalized Data vs Date, Piece-Wise (Linear-Linear-Linear), with annotated upward and downward trends]

Figure 8. Empirical Distribution Plot Comparing Constituent Concentrations between Two Seasons [image not reproduced]

TABLE

TABLE 1
CHEMICAL PARAMETERS ANALYZED IN GROUNDWATER AND SOIL

FIELD PARAMETERS
pH*†, Specific Conductance*, Temperature*, Dissolved Oxygen*, Oxidation Reduction Potential*, Eh*, Turbidity*

INORGANICS
Aluminum, Antimony, Arsenic, Barium, Beryllium, Boron, Cadmium, Chromium, Cobalt, Copper, Iron, Lead, Manganese, Mercury, Molybdenum, Nickel, Selenium, Strontium, Thallium (low level), Vanadium (low level), Zinc

RADIONUCLIDES
Radium 226*, Radium 228*, Uranium (233, 234, 236, 238)*

ANIONS/CATIONS/OTHER
Alkalinity (as CaCO3)*, Bicarbonate*, Calcium, Carbonate*, Chloride, Magnesium, Nitrate (as N)†, Nitrate + Nitrite*, Potassium, Percent Moisture†, Methane*, Sodium, Sulfate, Sulfide*, Total Dissolved Solids*, Total Organic Carbon, Total Suspended Solids*

NOTES:
* = Indicates parameter analyzed in groundwater only.
† = Indicates parameter analyzed in soil only.
Metals in groundwater were analyzed for total and dissolved concentrations.
Soil pH measured at 25 degrees C.
Analysis of groundwater and soil samples for Chromium (VI) began after initial samples were collected as part of CSA.