DUKE ENERGY
REVISED DRAFT
Statistical Methods for
Developing Reference Background
Concentrations for
Groundwater and Soil
at Coal Ash Facilities
May 26, 2017
Prepared By:
HDR Engineering, Inc.
440 S. Church St, Suite 1000
Charlotte, NC 28202
and
SynTerra Corporation
148 River Street, Suite 220
Greenville, South Carolina 29601
REVISED DRAFT
Duke Energy Carolinas, LLC
Statistical Methods For Developing Reference Background Concentrations For
Groundwater and Soil At Coal Ash Facilities
May 26, 2017
CONTENTS
INTRODUCTION
PART I — DESCRIPTION OF BACKGROUND DATA SETS
Groundwater
Soil
PART II — PRELIMINARY DATA ANALYSIS
1. Descriptive Statistics
2. Graphical Analysis
3. Identify Outliers
4. Identifying Data Distributions
5. Evaluating Background Groundwater Data
6. Autocorrelation
7. Seasonality
8. Trends
9. Additional Methods for Identifying Trends in Background Groundwater Data
10. Determining Baseline Period for Background Wells
PART III — TESTING FOR SUB-GROUPS IN BACKGROUND GROUNDWATER DATA
Graphical Analysis
Analytical Tests for Comparing Sub-Groups
Tests for Identifying Differences Among Sub-Groups
PART IV — DEVELOPMENT OF BTVs FOR CONSTITUENTS IN GROUNDWATER AND SOIL
REFERENCES
FIGURES
Figure 1. Box-and-Whisker Plot
Figure 2. Quantile-Quantile (Q-Q) Plot
Figure 3. Scatter Plot of Time versus Concentration
Figure 4. Sample Autocorrelation Function
Figure 5. Scatter Plots of Time versus Concentration Illustrating Seasonality
Figure 6. Scatter Plots of Time versus Concentration Illustrating Trends
Figure 7. Piece-Wise Polynomial Regression Output Exhibiting Multiple Trends
Figure 8. Empirical Distribution Plot Comparing Constituent Concentrations between Two
Seasons
TABLES
Table 1. Chemical Parameters Analyzed in Groundwater and Soil
INTRODUCTION
In 2015, the North Carolina Coal Ash Management Act (CAMA) required the preparation of a
Comprehensive Site Assessment Report (CSA) for each regulated facility. The purpose of the
CSA was to identify the source and cause of exceedances of regulatory standards, potential
hazards to public health and safety, and identify receptors and exposure pathways. The CSA
was conducted in accordance with a conditionally approved Work Plan to meet the
requirements of 15A NCAC 02L .0106(g), which includes an assessment of the horizontal and
vertical extent of soil and groundwater contamination for all contaminants confirmed to be
present in groundwater in exceedance of groundwater quality standards.
Regulations regarding North Carolina groundwater quality standards are provided in T15A NCAC
02L .0202. Section (b)(3) of the regulation states that:
Where naturally occurring substances exceed the established standard, the standard
shall be the naturally occurring concentration as determined by the Director.
For soil and groundwater assessments under the CAMA, naturally occurring concentrations of
constituents need to be determined in order to complete horizontal and vertical delineations
required as a basis for development of Corrective Action Plans. The horizontal and vertical
extent of constituent migration cannot be determined until naturally occurring background
concentrations are known.
This document serves as a framework for a consistent technical approach which will be utilized
for Duke Energy sites in North Carolina to determine proposed provisional background
threshold values (PPBTVs¹) for naturally occurring constituents in groundwater and soil. For
the purpose of establishing background threshold values (BTVs¹) at this time, the value which
represents the upper threshold value from the data distribution for a given constituent will be
considered the value representative of a naturally occurring concentration, or the PPBTV. The
process for evaluating background concentrations over time is iterative; therefore, as additional
background data is collected, the approach for developing BTVs may be reviewed and
potentially modified with consideration of expanded data sets, changes in data set distribution,
and input from the North Carolina Department of Environmental Quality (NCDEQ).
For groundwater, non-filtered (total) results will be used to establish BTVs. In general,
groundwater data will not be included in the development of BTVs when turbidity of the
groundwater sample was reported to be greater than 10 nephelometric turbidity units (NTU) or
when pH is greater than 8.5. Professional judgment can be used to retain data that do not
meet these criteria. However, the decision to retain such data must be documented, for
example by recording NCDEQ concurrence that naturally occurring pH is greater
than 8.5 in the unit being evaluated. Background locations for groundwater were identified for
each site in the CSA Reports and/or Corrective Action Plans (CAPs). Other wells unaffected by
plant operations may also be used to augment the background data set with agreement from
NCDEQ.
¹ The terms PPBTV and BTV are used interchangeably in this document. The term BTV is used in the EPA
ProUCL User Guide.
For soil, only samples collected above the water table and at locations not influenced by plant
operations will be included in the calculation of BTVs. Site-specific soil sampling locations and
intervals are described in the CSA Work Plans.
The methods for developing PPBTVs described in this document are based on the US
Environmental Protection Agency (USEPA) "Unified Guidance" (USEPA 2009), USEPA's
Guidance for Comparing Background and Chemical Concentrations in Soil for CERCLA Sites
(USEPA 2002), and the ProUCL Technical Guide (USEPA 2015). In addition, the North
Carolina Division of Water Quality (NCDWQ) technical assistance document for Evaluating
Metals in Groundwater at DWQ Permitted Facilities (NCDWQ 2012) was also referenced.
USEPA's ProUCL Version 5.1 Technical Guide (EPA/600/R-07/041 December 2015) states
that:
A defensible background data set represents a "single" environmental population
possibly without any outliers. In a background data set, in addition to reporting and/or
laboratory errors, statistical outliers may also be present... elevated outliers should not
be included in background data sets and estimation of BTVs. The objective here is to
compute background statistics based upon a data set which represents the main
background population, and does not accommodate the few low probability high outliers
(e.g., coming from extreme tails of the data distribution) that may also be present in the
sampled data. The occurrence of elevated outliers is common when background
samples are collected from various onsite areas (e.g., large Federal Facilities). The
proper disposition of outliers, to include or not include them in statistical computations,
should be decided by the project team. The project team may want to compute decision
statistics with and without the outliers to evaluate the influence of outliers on the
decision making statistics.
The methods described in this document are intended to serve as guidelines to develop BTVs.
The use of the upper tolerance limit (UTL) to establish BTVs for constituents analyzed during
assessment monitoring is consistent with NCDEQ Guidance as well as the USEPA's Unified
Guidance (2009). The UTL will be evaluated as the statistic for development of groundwater
and soil BTVs. BTVs will be developed for a select group of constituents derived from the list of
parameters investigated as part of CAMA (Table 1). The UTL will be used to represent an
upper limit for naturally occurring concentrations such that values exceeding this limit may be
indicative of groundwater and soil impacts.
Naturally occurring concentrations determined by the process presented in this document will
be submitted to the NCDEQ Division of Water Resources for determination of the PPBTVs.
Site-specific reports documenting the procedures, evaluations, and calculations will be
prepared and submitted to NCDEQ. Following NCDEQ's approval of the PPBTVs, the PPBTVs
will be used as groundwater and soil standards when the values exceed concentrations
appearing in T15A NCAC 02L .0202(g) or the Interim Maximum Allowable Concentrations
(Appendix #1 to T15A NCAC 02L) for groundwater or Preliminary Soil Remediation Goals as
described in Section 4 of the NCDEQ 2015 Inactive Hazardous Sites Program Guidelines for
Assessment and Cleanup (NCDEQ 2015) for soil.
This document consists of four parts describing the process for establishing BTVs for
constituents in groundwater and soil:
Part I — Description of Background Data Sets
Part I provides discussion of groundwater and soil sample collection, background data set
attributes, and preliminary treatment of background data.
Part II — Preliminary Data Analysis
Part II includes analyses used to assess and transform data (where necessary) for use in
producing appropriate UTLs. This analysis includes screening data sets for outliers, fitting
data sets to distribution models, assessing data for temporal variability, and
appropriateness of the period of record (sampling period).
Part III — Testing for Sub-Groups in Background Groundwater Data
Part III summarizes the approach for testing data sets for distinct sub-groups. If testing
indicates presence of sub-groups, the same steps described in Part I can be applied to the
partitioned data to better understand the distribution of the samples within a sub-group for
each constituent.
Part IV — Development of BTVs for Constituents in Groundwater and Soil
Part IV documents the steps for producing UTLs for each constituent for groundwater and
soil.
PART I — DESCRIPTION OF BACKGROUND DATA SETS
Background data sets vary by size at each of the sites. Background groundwater data has
been collected over a period of time at multiple locations per site. Background soil samples
were primarily collected as part of the CSA activities. The following sections describe the
groundwater and soil samples. Additional details regarding site-specific data sets have been
provided in the CSA, CAP 1 and CAP 2, supplemental reports and electronic data submittals
for each site. The data sets continue to be refined as additional data are available over time.
Sample results with a detection or reporting limit greater than the applicable standard will not
be included in the background data sets. Should the detection or reporting limit criteria impact
the data set such that sufficient data is not available for producing BTVs for particular
constituents, NCDEQ will be consulted to discuss alternative evaluation options for assessment
of background, such as groundwater fate and transport modeling.
Groundwater
Groundwater samples are collected from monitoring wells screened in different flow layers
using low-flow sampling techniques in accordance with the USEPA Region 1 Purging and
Sampling Procedure for the Collection of Groundwater Samples from Monitoring Wells (revised
January 19, 2010) and the Groundwater Monitoring Program, Low Flow Sampling Plan, Duke
Energy Facilities, Ash Basin Groundwater Assessment Program, North Carolina, dated June
10, 2015. Groundwater samples have been analyzed for constituents listed in Table 1. Only
non-filtered sample results will be utilized for producing BTVs. Sample data associated with a
reported turbidity greater than 10 NTUs, samples without a recorded turbidity, samples with a
pH greater than 8.5, or non-detect samples with a method detection limit above the respective
2L Standard or IMAC will be excluded from the background data set. Where site conditions
require, professional judgment can be used to retain data that does not meet these criteria
(such as where the naturally occurring groundwater pH is greater than 8.5). However, the
decision to retain data that does not satisfy these criteria must be documented. BTVs will be
calculated for each constituent within a flow layer using data pooled from all background wells
screened within that flow layer.
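The default exclusion criteria above can be sketched as a simple filter. This is a minimal illustration only: the record layout, field names, well names, and the example standard of 10.0 are hypothetical, and the sketch does not model the documented professional-judgment exceptions.

```python
# Illustrative screening of groundwater records before BTV calculation.
# Field names, well names, and values are hypothetical, not from the source.

TURBIDITY_LIMIT_NTU = 10.0
PH_LIMIT = 8.5

def passes_screening(record, standard):
    """Return True if a sample record meets the default inclusion criteria."""
    turbidity = record.get("turbidity_ntu")
    if turbidity is None or turbidity > TURBIDITY_LIMIT_NTU:
        return False  # missing or elevated turbidity
    if record["ph"] > PH_LIMIT:
        return False  # elevated pH
    if not record["detected"] and record["mdl"] > standard:
        return False  # non-detect with an MDL above the applicable standard
    return True

samples = [
    {"well": "BG-1", "turbidity_ntu": 4.2, "ph": 6.8, "detected": True, "mdl": 1.0},
    {"well": "BG-1", "turbidity_ntu": 15.0, "ph": 6.9, "detected": True, "mdl": 1.0},
    {"well": "BG-2", "turbidity_ntu": None, "ph": 7.1, "detected": True, "mdl": 1.0},
    {"well": "BG-2", "turbidity_ntu": 3.0, "ph": 9.1, "detected": True, "mdl": 1.0},
    {"well": "BG-3", "turbidity_ntu": 2.0, "ph": 7.0, "detected": False, "mdl": 20.0},
    {"well": "BG-3", "turbidity_ntu": 2.5, "ph": 7.2, "detected": False, "mdl": 1.0},
]

# Keep only records that pass every default criterion.
retained = [s for s in samples if passes_screening(s, standard=10.0)]
```

In practice, any record retained by professional judgment despite failing a criterion would need a documented reason alongside it.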
Soil
Discrete soil samples were collected from multiple depth intervals during the CSA or other
assessment events. The total number of samples and depth intervals in which samples were
collected vary by site. Soil samples have been analyzed for constituents listed in Table 1.
Only constituent concentrations from samples collected above the water table will be utilized for
producing BTVs. To allow for comparison of results from soil samples collected from different
depth intervals and locations across the site to the BTVs, background soil samples will be
pooled from multiple depth intervals and non-impacted locations. Non-detect sample results
with a method detection limit above the North Carolina Protection of Groundwater Preliminary
Soil Remediation Goal (PSRG) will be excluded from the background soil dataset.
Soil data are susceptible to spatial variation (by depth and geology), and as such
preliminary data analysis methods will be used to evaluate the soil data set. To aid in
identifying outliers, visual assessments will be performed using box-and-whisker plots, and
quantitative assessments will be used to test for differences in mean or median concentration
across depth intervals or geologic formations. Results from the statistical analysis of soil data
sets will inform decisions on whether pooling of soil data across multiple depth intervals or
geologic formations is appropriate.
PART II — PRELIMINARY DATA ANALYSIS
Preliminary data analysis includes the steps summarized in the following sections.
1. Descriptive Statistics
Descriptive statistics are useful for characterizing data, increasing data set understanding, and
for assessing information quality. For each site, descriptive statistics will be calculated for
groundwater and soil data sets.
For groundwater, descriptive statistics will be calculated for each constituent within each
groundwater flow layer using pooled data from that groundwater flow layer.
Soil descriptive statistics will be calculated for each constituent using the pooled background
data set.
The following statistics will be calculated to describe each data set.
• Sample size
• Number of detects and non-detects
• Percentage of non-detects
• Number of distinct observations
• Number of distinct method detection limits (MDL)
• Mean and median
• Maximum and minimum
• Standard deviation
• Skewness
• Kurtosis
2. Graphical Analysis
Background groundwater data can be graphically portrayed using scatter plots, box-and-whisker
and quantile-quantile (Q-Q) plots (Figures 1 and 2), while background soil data can be
illustrated using box-and-whisker and Q-Q plots. The construction of scatter plots of
concentration versus time (Figure 3) for each constituent within each background monitoring
well or using the pooled data across all the background wells can assist in identifying potential
trends or seasonality within data. Box-and-whisker and Q-Q plots can be constructed for each
constituent within each groundwater flow layer using all data pooled from background wells
monitoring that flow layer to identify possible outliers and sub-groups in addition to assessing
data set distributions. Since only one constituent per soil boring is sampled, side-by-side
box-and-whisker plots containing the concentrations of all constituents will be generated to capture
any spatial variability arising from the different soil boring depths. Q-Q plots for the soil samples
can be constructed per constituent to visually identify outliers based on the observations pooled
across the soil depths.
Instructions for interpreting box-and-whisker plots can be found on Figure 1. Q-Q plots (Figure
2) evaluate if a theoretical distribution can accurately model a sampled distribution. If the
sampled population is accurately modeled by the theoretical distribution, then quantiles from
the sampled distribution should plot along a straight line when plotted against the quantiles of
the theoretical distribution. Sampled values that plot markedly away from the straight line or
jumps or breaks in the plot may indicate the presence of multiple sample populations, potential
outliers, or non-normal sample distributions.
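The construction of a normal Q-Q plot can be sketched as follows. This is an illustrative example, not the ProUCL implementation: the plotting position (i - 0.5)/n is one common convention chosen here as an assumption, and the concentrations are made up.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each sorted observation with its theoretical standard-normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (assumed convention; others exist)
        points.append((std_normal.inv_cdf(p), x))
    return points

# Hypothetical concentrations; the last value plots far from the line
# formed by the others, which is the visual cue for a potential outlier.
concentrations = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 9.7]
for z, x in normal_qq_points(concentrations):
    print(f"{z:6.2f}  {x:5.1f}")
```

Plotting the second column against the first gives the Q-Q plot; for a lognormal check, the same function could be applied to the logarithms of the concentrations.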
The graphical analysis provides information regarding a steady-state baseline period. Multiple
method detection limits over time will also be evaluated to determine if such variability affects
the quality of the data. This process will be used to determine if all data can be incorporated
into the analysis or if older historical data may need to be removed from the data set due to a
change in the data reporting protocols for samples over time.
3. Identify Outliers
Outliers are values that are not representative of the population from which they were sampled
and whose presence can significantly alter statistical results. Data sets will initially be screened
for potential outliers visually using box-and-whisker and Q-Q plots (Figures 1 and 2).
Following the visual assessment of data for potential outliers, data sets will be screened for
outliers quantitatively. While there are several tests available to test for possible outliers,
Dixon's or Rosner's Outlier tests are specifically identified in the Unified Guidance (USEPA
2009) for providing requisite statistical strength and power necessary to appropriately identify
potential outliers. Dixon's Outlier Test is suitable for data sets containing fewer than 25 samples,
whereas Rosner's test is applicable for data sets containing more than 25 samples. Both
tests assume data are normally distributed.
Extreme outliers are of interest; therefore, outlier tests will be conducted using a significance
level of 0.01. Groundwater and soil constituent concentrations determined to be outliers will be
provided in the statistics report submitted to NCDEQ. If statistical outliers have been detected,
the project scientist will review the values to determine if they should be removed from the data
set or are representative of background and should be retained for statistical analysis.
Reasons as to why a particular statistical outlier should be included or excluded from either
groundwater or soil background data sets will be documented as part of the final reference
background concentration value documentation notes.
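For small data sets, the Dixon ratio for a suspected high outlier can be sketched as below. The choice of ratio by sample size follows the commonly tabulated form of Dixon's test and should be treated as an assumption to be checked against the Unified Guidance. The sketch computes only the test statistic; the critical value at the 0.01 significance level must still be looked up in a published table, and the input values are hypothetical.

```python
def dixon_high_statistic(values):
    """Dixon ratio for testing whether the largest value is an outlier.

    Assumes the data (or their logarithms) are approximately normal apart
    from the suspect value. The sample-size bands are the commonly
    tabulated ones and are an assumption of this sketch.
    """
    x = sorted(values)
    n = len(x)
    if not 3 <= n < 25:
        raise ValueError("this sketch covers 3 to 24 samples")
    if n <= 7:
        gap, spread = x[-1] - x[-2], x[-1] - x[0]
    elif n <= 10:
        gap, spread = x[-1] - x[-2], x[-1] - x[1]
    elif n <= 13:
        gap, spread = x[-1] - x[-3], x[-1] - x[1]
    else:
        gap, spread = x[-1] - x[-3], x[-1] - x[2]
    if spread == 0:
        raise ValueError("denominator range is zero")
    return gap / spread

# Hypothetical concentrations with one elevated value.
stat = dixon_high_statistic([2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 9.7])
```

A statistic larger than the tabulated critical value flags the maximum as a statistical outlier, which is then reviewed by the project scientist rather than removed automatically.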
4. Identifying Data Distributions
Many statistical methods, such as the calculation of UTLs, make an explicit assumption concerning
the distribution of sample data. Therefore, data must be fitted to a known distribution model (e.g., normal
distribution). Upon completion of screening data sets for outliers, groundwater and soil data will
be fitted to known distribution models using Goodness-of-Fit (GOF) tests. GOF tests assess
how closely a data set resembles a given distribution model. The distribution models under
consideration for the determination of groundwater and soil BTVs are normal, lognormal, and
gamma distributions.
In order to assess if data are normally or lognormally distributed, the Shapiro-Wilk or Lilliefors
GOF test will be used. The Shapiro-Wilk GOF test is applicable for data sets comprising 50 or
fewer samples, while the Lilliefors GOF test is appropriate for data sets containing more than
50 samples. To evaluate if data are gamma distributed, the Anderson-Darling or Kolmogorov-Smirnov
GOF test will be utilized. GOF tests will be performed using a significance level of
0.05.
The software package developed by the US EPA, ProUCL, has incorporated these methods to
automatically test for either normal, lognormal, or gamma distribution types. If all GOF tests
fail, non-parametric estimation methods will be used.
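The sample-size rule for choosing a normality test, and the statistic behind the Lilliefors test, can be sketched as follows. This is an illustration and not ProUCL's code: the function names are hypothetical, and the sketch returns only the test statistic, which would still need to be compared with a Lilliefors critical value at the 0.05 significance level.

```python
from statistics import NormalDist, mean, stdev

def select_normality_test(n):
    """Pick the GOF test by sample size, using the 50-sample cutoff in the text."""
    return "Shapiro-Wilk" if n <= 50 else "Lilliefors"

def lilliefors_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # parameters estimated from the sample
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = fitted.cdf(xi)
        # Check the gap just after and just before each step of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

To screen for a lognormal fit, the same statistic could be computed on the logarithms of the concentrations.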
The distribution of data will be evaluated as the data sets are established, with the
understanding that distributions may change over time.
5. Evaluating Background Groundwater Data
The following section applies to data sampled over time, such as groundwater data, and is not
applicable to soil data.
Constituent concentrations in groundwater sampled over time from multiple background well
locations may exhibit patterns which suggest concentrations are increasing or decreasing over
time. For background samples to be considered representative of areas unimpacted by human
activity and be meaningful in the production of the BTVs, constituent concentrations over time
should reflect a steady state, or 'temporal stationarity'. In other words, a constituent's
population characteristics (mean and variance) do not fluctuate over time (with consideration of
normal seasonal fluctuations). Another assumption regarding samples collected across multiple
background wells at a site is a constituent's mean and variance are constant across
background wells, or 'spatial stationarity'. If data collected from the background wells exhibit
temporal or spatial non-stationarity, pooling of background well data can result in an inflated
population variance and biased estimates of BTVs.
A comparison of multiple box-and-whisker plots (Figure 1) can be used to visually assess
whether background well distributions have similar constituent concentration means and
variances. Based on visual inspection of box-and-whisker plots, further analysis (such as testing
for differences in means or medians across background well locations) may be warranted to
determine if a background well should be considered representative of background.
Statistical tests for trends over time using the pooled data from the background wells should
show no statistical significance. However, before proceeding to test for trends in the
background samples, another assumption regarding constituent concentrations is the values
must be independent from one another. When values are related to each other over varying
time intervals, then values at any point in time can be expressed as a function of previous
value(s). This type of relationship is termed autocorrelation. When values express a seasonal
relationship, this type of autocorrelation is termed seasonality.
The presence of autocorrelation, seasonality, or trends indicates data are temporally
non-stationary. Assessment of background groundwater data should be performed to address
temporal stationarity prior to pooling background data for the production of BTVs.
Details for assessing data sets for temporal stationarity are summarized in the following sections.
6. Autocorrelation
Autocorrelation occurs when measurements collected at different points in time correlate with
one another. Sources of autocorrelation in groundwater data can be due to seasonality, trends,
or samples being collected too close to one another in time. Data that exhibit autocorrelation
can affect sample variance and can lead to biased estimates of BTVs.
For purposes of the initial raw background dataset and development of PPBTVs, a minimum
60-day interval between sample events will be used. In the event samples are collected at
intervals shorter than 60 days (e.g., for catch-up sampling at problematic locations, site
conditions, etc.), autocorrelation evaluations will be performed and may be provided to the
Division of Water Resources as lines of evidence to confirm the samples are not autocorrelated
and can be included in the background data sets.
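The 60-day minimum interval can be checked with a short script such as the one below. The sampling dates are hypothetical; the sketch only flags consecutive events that fall inside the interval so that an autocorrelation evaluation can be performed for them.

```python
from datetime import date

MIN_INTERVAL_DAYS = 60

def short_intervals(sample_dates):
    """Return consecutive date pairs collected fewer than 60 days apart."""
    ordered = sorted(sample_dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if (b - a).days < MIN_INTERVAL_DAYS]

# Hypothetical sampling events at one background well.
dates = [date(2016, 1, 12), date(2016, 4, 5), date(2016, 5, 10), date(2016, 9, 1)]
flagged = short_intervals(dates)  # pairs needing an autocorrelation evaluation
```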
Constituent concentrations in groundwater at a given background well will be checked for
autocorrelation using the sample autocorrelation function (USEPA 2009). The sample
autocorrelation function graphs correlation values between successive measurements against
the time lag between sampling events and assumes data can be fitted to a known distribution
model (Figure 4). Correlation values range from negative one to one, where a magnitude of one
indicates a perfect correlation (dependence) and zero represents no correlation (independence). The
sample autocorrelation function will be calculated using a significance level of 0.05.
Autocorrelated observations can be corrected by 1) reducing sampling frequency and
increasing the time between sample collection; 2) altering the statistical test used to analyze
the data; or, 3) removing temporal patterns using a technique such as deseasonalization.
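A minimal sketch of the sample autocorrelation function is given below. It assumes evenly spaced sampling events and uses the common large-sample bound of 1.96 divided by the square root of n as an approximate 0.05-level screen; both are assumptions of this illustration rather than requirements stated in the text.

```python
from math import sqrt

def sample_autocorrelation(values, max_lag):
    """Sample autocorrelation r_k for lags 1..max_lag of an evenly spaced series."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def approx_bound(n, z=1.96):
    """Approximate two-sided bound for r_k under independence (large-sample)."""
    return z / sqrt(n)

# An alternating series is strongly negatively correlated at lag 1.
acf = sample_autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], max_lag=2)
```

Lags whose correlation magnitude exceeds the bound suggest dependence between sampling events at that spacing.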
7. Seasonality
Constituents in groundwater at background well locations may experience predictable recurring
increases and decreases in concentrations, termed seasonality (Figure 5). Seasonality within
a data set can introduce bias into the calculation of BTVs and result in falsely identifying a
seasonal effect as potential impacts.
Data should be assessed for seasonality once an adequate number of background
groundwater samples have been collected. Useful diagnostic tools for evaluating data sets for
seasonality are autocorrelation and scatter plots (Figures 4 and 5). When sufficient
observations are available, a side-by-side comparison of multiple box-and-whisker plots
constructed by season is informative. If constituent concentrations within a given background
well appear to experience seasonal fluctuations, the seasonal component within the data can
be removed for the purpose of testing for trends. If seasonality is not addressed prior to testing
for trends, then the statistical tests for trends may be misleading (i.e., fail to detect a trend when
one is actually present or indicate a significant trend when, in fact, no trend exists). Strong
evidence for the cause of seasonality within a data set should exist prior to removing seasonal
components from background data sets.
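One simple way to remove a seasonal component is to subtract each season's mean and add back the overall mean, as sketched below. This particular adjustment is an assumption of the example (the text does not prescribe a method), and the season labels and values are hypothetical.

```python
from collections import defaultdict

def deseasonalize(values, seasons):
    """Subtract each season's mean and add back the overall mean."""
    overall = sum(values) / len(values)
    by_season = defaultdict(list)
    for v, s in zip(values, seasons):
        by_season[s].append(v)
    season_mean = {s: sum(vs) / len(vs) for s, vs in by_season.items()}
    return [v - season_mean[s] + overall for v, s in zip(values, seasons)]

# Hypothetical series with higher concentrations in the dry season.
adjusted = deseasonalize([10, 20, 12, 22], ["wet", "dry", "wet", "dry"])
```

After the adjustment the wet and dry values sit on a common level, so a trend test applied to the adjusted series reflects the change over time rather than the seasonal swing.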
8. Trends
Wells installed at background locations monitor natural groundwater quality unaffected by
anthropogenic activities. Therefore, a key assumption regarding background is constituent
concentrations in groundwater should demonstrate stationary conditions through time, free of
any trends (Figure 6). Background data exhibiting trends (upward or downward) violate the
assumption of temporal stationarity. Trending constituent concentrations in background wells
may identify potential anthropogenic impacts (resulting in the well no longer being considered
background), seasonality, or altering groundwater conditions. Furthermore, presence of trends
in background data can lead to overestimation of variances which result in inflated BTVs.
Prior to the calculation of BTVs, background well data will be evaluated for the presence of
trends. Depending on the presence of non -detects (NDs) and seasonality, background data
sets can be assessed for trends using one of three tests:
• Mann-Kendall trend test
• Seasonal Kendall regression
• Maximum likelihood estimation (MLE) regression
The Mann-Kendall (MK) trend test is a nonparametric test method that can be used to identify
monotonic (including linear) trends within data sets that do not adhere to specific distribution
models, do not exhibit seasonality, and contain NDs. The MK trend test can only be utilized to evaluate data sets
containing only one MDL. Seasonal Kendall regression is similar to the MK test (data sets do
not have to adhere to specific distribution models and can contain NDs as long as they are
represented by a single MDL), except it accounts for seasonality. MLE regression is a
parametric method that fits a statistical model to data by estimating the parameters of that
model. MLE regression can be performed on data sets that can be fitted to a specific
distribution model, do not demonstrate seasonality, and contain NDs.
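The Mann-Kendall calculation can be sketched as follows. This is a minimal illustration that uses the normal approximation with a tie correction, which is generally considered reasonable only for roughly ten or more observations; it assumes non-detects have already been replaced by a single common value (for example the one MDL) so that they count as ties.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

def mann_kendall(values):
    """Return (S, Z, two-sided p) for a series ordered by sampling date."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    ties = Counter(values).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p

# A steadily increasing hypothetical series gives a large positive S.
s_stat, z_stat, p_value = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A small p-value indicates a statistically significant monotonic trend, which would conflict with the stationarity assumption for background data.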
In cases where trending background constituent concentrations are identified, further analysis
is recommended to determine whether the trend is an artifact of the short period of record
and/or the small sample size. For example, if fewer than 10 samples
have been collected from a background well over a short duration (less than two years), then an
observed trend in the well may not necessarily indicate a change in
groundwater quality and may instead be representative of natural variation. If sufficient data are available
for a constituent (> 20 observations), a statistical method called the piece-wise polynomial
model can be used to inform the overall trend results. A description of this approach is as
follows.
9. Additional Methods for Identifying Trends in Background
Groundwater Data
The piece-wise polynomial model is a useful tool for assessing constituent concentrations that
have experienced multiple trends throughout monitoring. Piece-wise polynomial models
attempt to find an appropriate mathematical function that expresses the relationship between
the constituent concentrations and the sampling dates by using piece-wise regressions. Two
types of piece-wise models can be used to evaluate trends: the linear-linear and
linear-linear-linear regression models.
The linear-linear regression model assumes and identifies one structural break in a time series,
in which the two portions of the data separated by the break point exhibit two different trends
modeled by two different linear equations. Similarly, the linear-linear-linear regression model
attempts to identify two structural breaks to assess three different linear trends.
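The idea behind the linear-linear model can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by any particular statistical package: it assumes sampling dates have been converted to numbers (for example, days since the first sample), that dates within each segment are not all identical, and it fits two independent least-squares lines without forcing them to join at the break. The function names and the `min_points` parameter are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares on one segment; returns (slope, intercept, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, sse


def linear_linear(x, y, min_points=3):
    """Try each candidate break and keep the split with the lowest total SSE.

    x must be sorted in time order; each segment keeps at least min_points.
    """
    best = None
    for k in range(min_points, len(x) - min_points + 1):
        left = fit_line(x[:k], y[:k])
        right = fit_line(x[k:], y[k:])
        total_sse = left[2] + right[2]
        if best is None or total_sse < best["sse"]:
            best = {
                "break_x": x[k],          # first x value of the second segment
                "left_slope": left[0],
                "right_slope": right[0],
                "sse": total_sse,
            }
    return best


# Hypothetical series that rises and then falls (illustration only).
days = list(range(12))
conc = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]
print(linear_linear(days, conc))
```

Slopes of opposite sign on either side of the selected break would suggest that a single overall trend line is hiding two different trends. A linear-linear-linear model extends the same search to two break points.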
Piece-wise polynomial models can be informative, but they have the disadvantage of not being
able to account for NDs within data sets. Therefore, it is recommended that piece-wise
polynomial models be implemented in conjunction with MLE regression. Piece-wise models can
also serve as a visual guide when selecting the baseline sampling periods for statistical analysis.
For example, in Figure 7 the MLE regression suggested that constituent concentrations are
steadily increasing over time, whereas the piece-wise polynomial regression with two structural
breaks indicates that concentrations have experienced both upward and downward trends.
10. Determining Baseline Period for Background Wells
This step determines whether the entire period of record from which the background samples
were collected is representative of natural background conditions and represents a baseline
against which downgradient constituent concentrations in groundwater can be tested. If trend
analysis indicates that the observations are steadily increasing or decreasing over time, then
the data will be reviewed to determine if a sub-segment of the data set better represents the
background period. For values to be considered representative of background, they should
demonstrate temporal stationarity.
PART III — TESTING FOR SUB-GROUPS IN BACKGROUND
GROUNDWATER DATA
The following sections summarize the methodology for identifying sub-groups in groundwater
data resulting from spatial or temporal variability. They are not applicable to the assessment
of soil data.
Part III summarizes steps to determine whether statistically significant differences in
background concentrations exist across potential sub-groups. Sub-groups represent distinct
populations with statistically significant differences in mean or median concentrations among
potential groups within a data set. An example of possible sub-groups is a difference in
constituent concentrations among background wells that monitor bedrock groundwater but were
installed in different rock types. Each rock type has its own chemical characteristics that can
influence groundwater chemistry and result in differing concentrations for constituents across
background wells.
In order to test for differences across potential sub-groups, a sample size of at least
eight to 10 samples is recommended for each potential sub-group (USEPA 2009, 2015).
Testing for potential sub-groups within background data will be completed in three steps:
• Graphical analysis
• Analytical test for comparing sub-group differences
• Tests for distinguishing which sub-groups are different
Statistical tests utilized to test for potential sub-groups will be performed using a significance
level of 0.05.
Graphical Analysis
Graphical representation of data is an effective tool for depicting patterns and relationships
within data.
Background groundwater data can be assessed for sub-groups using box-and-whisker and
Q-Q plots (Figures 1 and 2). Multiple box-and-whisker and Q-Q plots can be constructed for
comparing constituent concentrations and variability across perceived sub-groups.
Another useful visual tool for assessing potential sub-group differences is the Empirical
Distribution Function (EDF). EDF analysis computes summary statistics, generates EDF plots
(Figure 8), and computes hypothesis tests appropriate for comparing two or more groups for
data containing NDs (provided less than 50 percent of the results are NDs).
The EDF plot in Figure 8 demonstrates that the two sub-groups, representing samples taken
during two different seasons, show similar distributions; that is, there is no apparent
difference in constituent concentrations between the two seasons.
Analytical Tests for Comparing Sub-Groups
The following methods can be used to test for differences across sub-groups:
• T-test and One-way Analysis of Variance (ANOVA)
• Wilcoxon rank-sum and Kruskal-Wallis (KW) tests
• Kaplan-Meier (KM) (log-rank) test
All three types of tests can be used to test data sets containing NDs.
The t-test and One-way ANOVA are parametric statistical analyses that test for differences in
means among groups. T-tests are used to test for differences in means between two groups,
whereas ANOVA is used to test for differences in means across three or more groups. Both
tests assume that the data are normally distributed with normally (or lognormally) distributed
residual values and that the variances of the groups being compared are roughly the same.
The Wilcoxon rank-sum test and KW test are nonparametric equivalents of the parametric t-test
and One-way ANOVA. Both the Wilcoxon rank-sum and KW tests analyze the ranks of the data
rather than the actual concentrations and test for differences in average ranks between
groups. The Wilcoxon rank-sum test compares the average rank between two groups, while
the KW test compares the average rank across three or more groups.
The KM (log-rank) test is a nonparametric test that compares the survival distribution between
two or more groups. The KM (log-rank) test is useful for data sets that cannot be fitted to a
discernible distribution model and contain a large percentage of ND concentrations.
Testing for potential sub-groups within background groundwater data sets will be performed
using a significance level of 0.05.
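To make the rank-based comparison concrete, the following is a minimal sketch of the Wilcoxon rank-sum test for two sub-groups using only the Python standard library. It assumes a large-sample normal approximation with a tie correction and no continuity correction, so it is not suitable for very small groups, and it does not address how NDs should be encoded before ranking. The function names and the example values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..N, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (W, Z, two-sided p), where W is the rank sum of group_a."""
    n1, n2 = len(group_a), len(group_b)
    total = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])
    expected = n1 * (total + 1) / 2
    # Tie correction reduces the variance when values are repeated.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if variance == 0:  # every pooled value is identical
        return w, 0.0, 1.0
    z = (w - expected) / math.sqrt(variance)
    return w, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results from two potential sub-groups (illustration only).
rock_type_a = [12.0, 15.5, 11.2, 14.8, 13.1, 12.9, 16.0, 14.2]
rock_type_b = [18.3, 21.0, 17.6, 19.9, 22.4, 20.1, 18.8, 19.2]
w_stat, z_stat, p_value = rank_sum_test(rock_type_a, rock_type_b)
print(f"W = {w_stat}, Z = {z_stat:.3f}, p = {p_value:.4f}")
```

A p-value below 0.05 would support treating the two groups as distinct sub-groups. The KW test generalizes the same ranking step to three or more groups.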
Tests for Identifying Differences Among Sub-Groups
If results from One-way ANOVA, KW, or KM (log-rank) tests indicate a statistically significant
difference during comparison of three or more groups, additional tests need to be performed to
compare all possible pairs of sub-group means or average ranks to determine which ones are
different from one another. These tests are referred to as "post-hoc" tests because they are
performed after the fact. The Tukey-Kramer test and Dunn's test are post-hoc tests that should
be used to compare possible pairs of sub-group means or average ranks. The Tukey-Kramer
test is parametric and should be used to evaluate One-way ANOVA results, whereas Dunn's
test is nonparametric and should be utilized to assess KW and KM (log-rank) test results.
Post-hoc analysis will be performed using a significance level of 0.05.
PART IV — DEVELOPMENT OF BTVs FOR CONSTITUENTS IN
GROUNDWATER AND SOIL
The USEPA Unified Guidance (2009) recommends using UTLs to estimate BTVs for
constituents evaluated during assessment monitoring as opposed to the use of other statistical
intervals such as upper confidence limits (UCLs) and upper prediction limits (UPLs). UTLs
represent fixed values that do not rely on future observations, unlike UPLs, and are constructed
using background data as opposed to a fixed health-based standard, unlike UCLs (USEPA 2009).
UTLs allow for a "suitably high enough level above current background to allow for reversal of
the test hypothesis" and are the preferred statistical interval (USEPA 2009). In nearly all cases,
only the UTL is computed because the concern is generally for exceedances greater than the
value. The only parameter that may require both upper and lower tolerance limits is pH.
Site-specific BTVs for select constituents from Table 1 in groundwater and soil will be produced
using UTLs.
Tolerance intervals test the null hypothesis that concentrations in downgradient wells or at
impacted soil sampling locations are similar to those of background and are constructed using
the mean, standard deviation, and tolerance factor. For the estimation of BTVs for constituents
in groundwater and soil, a coverage of 95 percent (p) and a confidence level of 95 percent
(1 − α) will be used. This means there is 95 percent confidence that at least 95 percent of
background concentrations fall at or below this limit. The formulation of the UTL may vary
slightly with the details of the test to be made and the characteristics of the data involved (see
Chapters 3 and 5 of the ProUCL Version 5.1.002 Technical Guide for the full specifications of
the UTL formula under differing parametric and non-parametric assumptions), but the basic
form for the (1 − α) × 100 percent UTL with coverage coefficient p under normal distribution
assumptions is:
$$\mathrm{UTL} = \bar{x} + K \cdot s$$
where
$\bar{x}$ = baseline (historical data) sample mean; and,
$s$ = baseline (historical data) standard deviation.
K represents a special function called the tolerance factor. It depends on the sample size (n),
the confidence coefficient (1 − α), and the coverage proportion (p). For selected values of n, p,
and (1 − α), values of the tolerance factor (K) have been tabulated extensively in the statistical
literature. ProUCL will be utilized to produce UTLs for each constituent. The type of UTL
produced depends on the distribution type, the desired confidence level, the coverage, and the
percentage of NDs.
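For readers who want to see how K behaves, the following is a minimal sketch of the normal-distribution UTL using only the Python standard library. It assumes a widely used closed-form approximation to the one-sided tolerance factor (often attributed to Natrella) rather than the tabulated or exact factors that ProUCL uses, so results will differ slightly from ProUCL output. It also assumes a data set without NDs and a sample size large enough that the intermediate term `a` stays positive. The function names and example data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Approximate one-sided normal tolerance factor K for sample size n."""
    z_p = NormalDist().inv_cdf(coverage)    # quantile for the coverage p
    z_c = NormalDist().inv_cdf(confidence)  # quantile for the confidence 1 - alpha
    a = 1 - z_c ** 2 / (2 * (n - 1))
    b = z_p ** 2 - z_c ** 2 / n
    return (z_p + math.sqrt(z_p ** 2 - a * b)) / a


def upper_tolerance_limit(data, coverage=0.95, confidence=0.95):
    """UTL = sample mean + K * sample standard deviation."""
    k = tolerance_factor(len(data), coverage, confidence)
    return mean(data) + k * stdev(data)


# Hypothetical background concentrations with no NDs (illustration only).
background = [10.0, 12.0, 11.0, 13.0, 9.0, 10.5, 11.5, 12.5, 9.5, 10.0]
print(f"K = {tolerance_factor(len(background)):.3f}")
print(f"UTL = {upper_tolerance_limit(background):.2f}")
```

K decreases toward the standard normal quantile for p as n grows, which is why larger background data sets yield lower UTLs for the same mean and standard deviation.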
Following completion of the preliminary data analysis described in Part II and applicable steps
in Part III, the steps below will be completed for selection of appropriate UTLs.
1. UTLs will be produced for constituents in groundwater and soil using the statistical
software program ProUCL. The first step in constructing UTLs using ProUCL is to
categorize constituents based on the presence or absence of NDs. ProUCL
calculates UTLs differently depending on whether NDs are present within a data set.
The algorithms in ProUCL use imputation and modeling techniques to address NDs.
ProUCL does not substitute values (e.g., multiplying the MDL by a constant) for NDs,
as this method introduces bias into the estimation of UTLs. Some constituent data
sets may consist of 50 percent or more NDs. A large percentage of NDs makes
it difficult to fit data to distribution models. For data sets containing 50 percent or
more NDs, UTLs will be constructed utilizing nonparametric techniques.
2. Produce UTLs using a coverage (p) and confidence level (1 − α) of 95 percent.
3. Record all UTLs under all parametric and non-parametric distribution models. When
data sets used for producing UTLs can be fitted to multiple distribution models, a specific
hierarchy preference is applied. Calculation of a specific UTL will follow the distribution
hierarchy preference below, with the noted exceptions:
I. normal,
II. gamma,
III. lognormal, and;
IV. nonparametric.
The exception to the hierarchy applies where the data set exhibits moderate or higher
skewness (e.g., standard deviation of logged data is greater than 1) and the sample size
is small (e.g., n < 30). In these situations, the nonparametric UTL is preferred over the
lognormal UTL. Data set distributions will continue to be evaluated as additional
samples are collected, with the understanding that distributions may change over time.
4. It has been demonstrated that if sample sizes are insufficient, the non-parametric
UTL cannot achieve the desired confidence coefficient of 95 percent.
Depending on background sample size, a different order statistic is selected to
produce UTLs. For constituent data sets containing fewer than 59 samples, UTLs will
be produced using a coverage of 85 percent (i.e., the 85th percentile) and a
confidence coefficient of 95 percent, as this coverage is more likely to be achieved
even with sample sizes as low as 10. For data sets containing 59 or more samples,
UTLs will be produced using a coverage of 95 percent and a confidence coefficient of 95
percent.
5. A minimum of ten valid background samples should be obtained prior to producing
BTVs for each constituent for soil and each constituent in each flow layer for
groundwater. If it is deemed necessary to produce BTVs prior to obtaining ten valid
samples, UTLs will not be calculated and the PPBTV for a constituent (soil) and a
constituent within a flow layer (groundwater) will be estimated to be either:
• the highest value, or
• if the highest value is an order of magnitude greater than the geometric mean
of all values, then the highest value will be considered an outlier and the
second highest value will be utilized as the PPBTV.
In situations where there are non-detects and fewer than ten valid samples, the
geometric mean, which is the nth root of the product of all n values (including the
censored values), may not be representative of the central tendency of that
sample. The median may be a better reference value from which to determine if the
highest value is an acceptable estimate for the PPBTV, and may be utilized if
determined appropriate.
In addition, the allocated time frame necessary to collect an additional ten samples
for further evaluation of background may not be available given the assessment
deadlines and autocorrelation restrictions. In evaluating the need for inclusion of
additional background data to produce revised BTVs, DEQ will determine what data
are appropriate for inclusion in a comprehensive background data set based on
relevant considerations.
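The arithmetic behind steps 4 and 5 above can be sketched with the Python standard library. The sketch makes two assumptions that should be read as interpretations rather than as the ProUCL procedure: the nonparametric UTL is taken to be the sample maximum, whose confidence for coverage p with n samples is 1 − p^n, and "an order of magnitude greater" is taken to mean more than ten times the geometric mean. All function names and example values are hypothetical, and all values passed to `ppbtv` must be positive.

```python
import math


def max_confidence(n, coverage):
    """Confidence that the sample maximum bounds `coverage` of the population."""
    return 1 - coverage ** n


def min_samples(coverage, confidence=0.95):
    """Smallest n for which the sample maximum reaches the confidence level."""
    return math.ceil(math.log(1 - confidence) / math.log(coverage))


def ppbtv(values):
    """Step 5 rule for fewer than ten valid samples (values must be > 0).

    Returns the highest value, unless it exceeds ten times the geometric
    mean, in which case it is treated as an outlier and the second highest
    value is returned instead.
    """
    ordered = sorted(values, reverse=True)
    geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    if ordered[0] > 10 * geometric_mean:
        return ordered[1]
    return ordered[0]


# Sample size needed for a 95 percent coverage, 95 percent confidence UTL
# when the UTL is the sample maximum.
print(min_samples(0.95))

# Hypothetical small data sets (illustration only).
print(ppbtv([1.0, 2.0, 3.0, 4.0]))
print(ppbtv([1.0, 2.0, 3.0, 400.0]))
```

Under these assumptions the 95 percent coverage case requires 59 samples, which matches the threshold in step 4. Where NDs are present, the median could be substituted for the geometric mean in `ppbtv`, as the text above suggests.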
REFERENCES
NCDEQ DWR, 2012. Evaluating Metals in Groundwater at DWQ Permitted Facilities: A
Technical Assistance Document for DWQ Staff.
http://digital.ncdcr.gov/cdm/ref/collection/p16062coll9/id/251181.
USEPA, 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term.
Publication 9285.7-081.
USEPA, 2002. Guidance for Comparing Background and Chemical Concentrations in Soil for
CERCLA Sites. EPA 540-R-01-003.
USEPA, 2009. Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities — Unified
Guidance, March 2009. EPA 530-R-09-007.
USEPA, 2015. ProUCL 5.1.002 Technical Guide: Statistical Software for Environmental
Applications for Data Sets with and without Nondetect Observations. EPA/600/R-07/041.
FIGURES
[Figure 1: Box Plot for PARAMETER. Legend: possible outliers; 90th, 75th, 50th, 25th, and
10th percentiles.]
[Figure 2: Quantile-Quantile (Q-Q) Plot. Panel title: Q-Q Plot for PARAMETER. X-axis:
Quantile (Theoretical).]
[Figure 3: Scatter Plots Comparing Time Versus Concentration Between Two Wells. Series:
MW-1 and MW-2. X-axis: Date.]
Autocorrelations of Parameter (0,0,4,1,0)
Lag  Correlation    Lag  Correlation    Lag  Correlation    Lag  Correlation
1     0.557470      6    -0.226209      11   -0.342618      16    0.140436
2     0.399406      7    -0.206603      12   -0.337405      17    0.185468
3     0.182616      8    -0.202751      13   -0.171677      18    0.160949
4     0.023263      9    -0.321882      14   -0.147838      19    0.140392
5    -0.096115      10   -0.369379      15    0.043389
Significant if |Correlation| > 0.426401
[Figure 5: Scatter Plot of Time Versus Concentration Illustrating Seasonality. X-axis: Date.]
[Figure 6: Scatter Plots of Time Versus Concentration Illustrating Trends. Panels: Upward
Trend and Downward Trend. X-axis: Date.]
[Figure 7: Piece-Wise Polynomial Regression Output Exhibiting Multiple Trends. Panel title:
Annual Trend Analysis: Deseasonalized Data vs Date, Piece-Wise (Linear-Linear-Linear).
Annotations: Upward Trends and Downward Trend. X-axis: Date.]
[Figure 8: Empirical Distribution Plot Comparing Constituent Concentrations Between Two
Seasons.]
TABLE
TABLE 1
CHEMICAL PARAMETERS ANALYZED IN GROUNDWATER AND SOIL
FIELD PARAMETERS
pH*†
Specific Conductance*
Temperature*
Dissolved Oxygen*
Oxidation Reduction Potential*
Eh*
Turbidity*
INORGANICS
Aluminum
Antimony
Arsenic
Barium
Beryllium
Boron
Cadmium
Chromium
Cobalt
Copper
Iron
Lead
Manganese
Mercury
Molybdenum
Nickel
Selenium
Strontium
Thallium (low level)
Vanadium (low level)
Zinc
RADIONUCLIDES
Radium 226*
Radium 228*
Uranium (233, 234, 236, 238)*
ANIONS/CATIONS/OTHER
Alkalinity (as CaCO3)*
Bicarbonate*
Calcium
Carbonate*
Chloride
Magnesium
Nitrate (as N)†
Nitrate + Nitrite*
Potassium
Percent Moisture†
Methane*
Sodium
Sulfate
Sulfide*
Total Dissolved Solids*
Total Organic Carbon
Total Suspended Solids*
NOTES:
* = Indicates parameter analyzed in groundwater only.
† = Indicates parameters analyzed in soil only.
Metals in groundwater were analyzed for total and dissolved concentrations.
Soil pH measured at 25 degrees C.
Analysis of groundwater and soil samples for Chromium (VI) began after initial samples were collected as part
of CSA.