DWR guidance to Duke for background determination 12-15-2015 Ver2
Division of Water Resources Position on Background Determination for Coal Ash Facilities
The Division of Water Resources (Division) expects a consistent overall technical approach applied to
develop background determinations across all facilities. With this objective in mind, clarification of the
background determination process is provided herein to facilitate a defensible technical approach to
meet project objectives.
The determination of background and the use and applicability of a statistical method depend on data
size, data skewness, and data distribution. Instances where the maximum value was taken to be the
background level did not acknowledge or address this fundamental point. This is of particular concern
when nonparametric, compromised (e.g., by turbidity effects), and/or undersized background datasets are
used to determine a background value that is then compared to on-site data; the test power may be
prohibitively low in these cases, depending on the order statistic (highest, second highest, etc.) used for
background. Adequate test power is paramount. To this end, the Division generally advocates
background determinations computed using prediction limits rather than tolerance limits when site-wide
false positives can be controlled. Because of site heterogeneities, the Division also generally advocates
assessment monitoring comparisons to individual locations rather than to areal averages when site-wide
false positives can be controlled. Close coordination with Division staff (including before product
submittal) will help ensure that the datasets that are used and the comparisons that are made are
appropriate and defensible.
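To illustrate why the background sample size and the choice of order statistic matter, the following Python sketch computes the confidence level achieved when a background order statistic is used as a nonparametric upper prediction limit. It relies on the standard distribution-free result for independent, identically distributed, continuous data and is not taken from the Division's guidance or from ProUCL; the function name and the example sample sizes are hypothetical.

```python
from fractions import Fraction


def nonparametric_upl_confidence(n_background, rank_from_top=1, n_future=1):
    """Probability that all n_future new background-like results fall at or
    below the chosen background order statistic.

    rank_from_top=1 means the maximum, 2 the second highest, and so on.
    Assumes independent, identically distributed, continuous data.
    """
    if n_background < 1 or n_future < 1:
        raise ValueError("sample counts must be positive")
    if not 1 <= rank_from_top <= n_background:
        raise ValueError("rank_from_top must lie between 1 and n_background")

    # Rank of the chosen order statistic counted from the smallest value.
    j = n_background - rank_from_top + 1

    # P(all m future values <= X_(j)) = prod_{i=0}^{m-1} (j + i) / (n + 1 + i)
    confidence = Fraction(1)
    for i in range(n_future):
        confidence *= Fraction(j + i, n_background + 1 + i)
    return confidence


# Hypothetical background sample sizes, for illustration only.
for n in (4, 8, 10, 20):
    max_one = nonparametric_upl_confidence(n)
    second_one = nonparametric_upl_confidence(n, rank_from_top=2)
    max_five = nonparametric_upl_confidence(n, n_future=5)
    print(n, float(max_one), float(second_one), float(max_five))
```

With 8 background samples and a single future comparison, using the maximum gives a confidence of 8/9 (about 89 percent). The confidence falls as the number of future comparisons rises, which is one reason sample size and site-wide false positive control have to be considered together.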
Both the American Society for Testing and Materials (ASTM) guidance and the United States
Environmental Protection Agency (EPA) ProUCL software align with the various methods presented in the
2009 EPA Unified Guidance. Principles outlined in the ProUCL Version 5.0.00 Technical Guide provide
direction for use and applicability of a statistical method for developing background determinations that
best addresses overall Division concerns.
The ProUCL manual addresses specific issues related to statistical analysis that the Division considers
important for determining background conditions, including:
• Computation of minimum sample sizes to address project objectives;
• Identification of outliers and whether to incorporate them into the background determination;
• Performing Goodness-of-Fit (GOF) tests for Normal, Lognormal, and Gamma distributions; and
• Addressing non-detects in the statistical analysis.
ProUCL addresses the use of background datasets with an inadequate number of samples:
The lack of sufficient amount of background data makes it difficult to perform defensible
background versus site comparisons and computing reliable estimates of background threshold
values (BTVs). A small background data set may not adequately represent the background
population; and due to uncertainty and larger variability, the use of a small data set tends to yield
non-representative estimates of BTVs.
An appropriate background data set of a reasonable size (preferably computed using Data Quality
Objectives (DQO) processes) is needed to represent a background area and to compute upper
limits (e.g., estimates of BTVs) based upon background data sets and also to compare site and
background data sets using hypotheses testing approaches. At the minimum, a background data
set should have at least 10 (more observations are preferable) observations to perform
background evaluations. On a related note, ASTM recommends a minimum of 8 background soil
sample locations, all collected from a similar depth interval corresponding to the depth of the
highest concentrations in the source area.
ProUCL also speaks to the issue of testing/identifying outliers and observing the effects of a dataset with
and without the inclusion of outliers.
The inclusion of outliers in the computation of the various decision statistics tends to yield inflated
values of those decision statistics, which can lead to incorrect decisions. Often inflated statistics
computed using a few outliers tend to represent those outliers rather than representing the main
dominant population of interest.
A couple of classical outlier tests (Dixon and Rosner tests) are available in ProUCL. Since both of
these classical tests suffer from masking effects (e.g., some extreme outliers may mask the
occurrence of other intermediate outliers), it is suggested that these classical outlier tests be
supplemented with graphical displays such as a box plot and a Q-Q plot. The use of exploratory
graphical displays helps in determining the number of outliers potentially present in a data set.
The use of graphical displays also helps in identifying extreme high outliers as well as intermediate
and mild outliers. The use of robust and resistant outlier identification procedures (Singh and
Nocerino, 1995, Rousseeuw and Leroy, 1987) is recommended when multiple outliers are present
in a data set.
Even though ASTM directs the testing of outliers, EPA's ProUCL addresses this issue in some detail and
points out that the use of maximum/outlier values from limited background datasets can be problematic
and produce skewed outcomes. ProUCL recommends that those values be highly scrutinized and
generally recommends avoiding the use of very low probability high values in the background dataset for
purposes of computing background and comparing to on-site data. The problem as they describe it is
that these values often come from a portion of the background distribution that is not representative:
Since the presence of outliers in a data set tends to yield distorted (incorrect and misleading)
values of the decision making statistics (e.g., UCLs, UPLs and UTLs), elevated outliers should not be
included in background data sets and estimation of BTVs. The objective here is to compute
background statistics based upon the majority of the data set representing the main dominant
background population, and not to accommodate a few low probability high outliers (e.g., coming
from extreme tails of the data distribution) that may also be present in the background data set.
The occurrence of elevated outliers is common when background samples are collected from
various onsite areas (e.g., large Federal Facilities). The proper disposition of outliers, to include or
not include them in statistical computations, should be decided by the project team. The project
team may want to compute decision statistics with and without the outliers to evaluate the
influence of outliers on the decision making statistics. The use of inflated statistics as BTV
estimates tends to result in a higher number of false negatives.
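The with-and-without comparison described above can be sketched in Python as follows. The box-plot upper fence (third quartile plus 1.5 times the interquartile range) stands in for the graphical screening that ProUCL suggests, and mean plus two standard deviations is only a placeholder decision statistic. Neither is prescribed by the Division or by ProUCL, and the data values and function names are hypothetical.

```python
import statistics


def upper_fence(values, k=1.5):
    """Box-plot upper fence: Q3 + k * IQR (assumed screening rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def with_and_without_outliers(values, statistic, k=1.5):
    """Evaluate a decision statistic on the full data set and on the data
    set with values above the upper fence set aside."""
    fence = upper_fence(values, k)
    kept = [v for v in values if v <= fence]
    flagged = [v for v in values if v > fence]
    return {
        "flagged": flagged,
        "with_outliers": statistic(values),
        "without_outliers": statistic(kept),
    }


def mean_plus_two_sd(values):
    """Placeholder decision statistic, not a ProUCL-recommended BTV."""
    return statistics.mean(values) + 2 * statistics.stdev(values)


# Hypothetical background concentrations with one elevated value.
background = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6, 1.3, 1.7, 1.5, 9.6]
result = with_and_without_outliers(background, mean_plus_two_sd)
print(result)
```

Flagged values are candidates for review rather than automatic exclusions; as the quoted passage notes, their disposition is a project team decision.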
ProUCL also makes a compelling case for testing gamma distributions as well as normal and lognormal
distributions, since lognormal-based statistics can be problematic (for example, overstating a background
level) when a dataset contains too few samples. Note that the ASTM method
recommends only normal and lognormal tests (apparently for simplicity).
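As a rough illustration of comparing candidate distributions, the Python sketch below computes the Q-Q plot correlation of a data set against the normal distribution, on both the raw and the log scale. This is an informal screen under stated assumptions (Blom plotting positions, strictly positive data), not one of ProUCL's formal goodness-of-fit tests, and it does not cover the gamma distribution because the Python standard library has no gamma quantile function. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def qq_correlation(values):
    """Pearson correlation between the ordered data and standard normal
    quantiles at Blom plotting positions (i - 0.375) / (n + 0.25)."""
    xs = sorted(values)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    szz = sum((z - mean_z) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)


def compare_normal_lognormal(values):
    """Q-Q correlation on the raw scale (normal) and log scale (lognormal)."""
    if min(values) <= 0:
        raise ValueError("log scale requires strictly positive values")
    return {
        "normal": qq_correlation(values),
        "lognormal": qq_correlation([math.log(v) for v in values]),
    }


# Hypothetical right-skewed background concentrations.
data = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.2, 6.5, 12.0]
print(compare_normal_lognormal(data))
```

A correlation closer to 1 indicates a straighter Q-Q plot. With small data sets several distributions can appear to fit about equally well, so a screen like this supplements the formal tests and the gamma evaluation rather than replacing them.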
This direction provides a starting point for developing sound, defensible background determinations to
support remedial design and performance standards.