Accuracy, Conservatism and Parsimony of Three Vapour Intrusion Models Used in Sweden

This study evaluates three screening-level models, namely the Dilution Factor (DF) model from 1996, its updated version from 2005, and the Johnson and Ettinger model (JEM) from 1997, which are applied within the framework for contaminated land management (CLM) in Sweden. Besides a deterministic approach (point estimate), the evaluation applies a probabilistic assessment with a sensitivity analysis. The latter approach allows the models to be ranked according to conservatism, accuracy and parsimony by contrasting predicted and observed soil and indoor air concentrations for two contaminants (benzene and ethylbenzene), so as to determine their suitability for application within CLM. The results suggest that the most accurate model for predicting the soil and indoor air concentration is the JEM, followed by the DF 2005 and the DF 1996. Predictions of the soil air concentration are primarily driven by variation in physico-chemical parameters. The variation in indoor air concentration is driven by physico-chemical and/or soil parameters for the DF models, while for the JEM soil parameters dominate. The deterministic analysis showed that default parameter values could be revised to increase the conservatism and to bring predictions closer to the probabilistic 95-percentile predicted indoor air concentration. The DF 1996 model includes a limited number of parameters, and this analysis shows that a model with more parameters is more accurate and less conservative. The DF 2005 seems to be the most parsimonious model, as it is accurate, sufficiently conservative, and has 14 parameters, whereas the DF 1996 with 9 parameters is the most conservative, and the JEM with 27 parameters is the most accurate but has an increased probability of producing false negative predictions. For the JEM, some of the dominant parameters cannot easily be measured on site.


Introduction
In Sweden contaminated soils are regulated by the Swedish Environmental Protection Agency (S-EPA), which issued guideline values in 1996 as part of its assessment framework for contaminated land management (CLM). The framework was expanded in 1998 to include petrol stations (S-EPA, 1996; 1998). It includes tiered risk assessment and triggers for remedial actions, allowing the owner of the site and the S-EPA to either set priorities or determine the need for risk reduction.
The generic exposure model described in S-EPA (1996) and the resulting guideline values are applicable to typical Swedish land-use and are sufficiently conservative based on the precautionary principle. If generic guideline values do not apply (for example if site conditions, such as building type, differ) a site specific assessment should be conducted. For reasons of consistency a model for the prediction of site specific concentrations and exposure (S-EPA, 2005a) was issued, which includes lists of typical parameter values (physical and chemical, bioconcentration factors, (eco)toxicological reference values), as well as background concentrations and exposure. The 2005 model includes more parameters than the 1996 model, which affects its parsimony (predicting with the fewest possible assumptions). The S-EPA published supplementary guidance on risk assessment of contaminated areas (S-EPA, 2005b; 2009) that advises on how to adapt the model to site specific conditions. Additional information on the approach in Sweden can be found in Carlon et al. (2007), and on its progress in managing contaminated land in van Liedekerke et al. (2014).
The parameters needed to run the model from 1996 include measured concentrations in soil and groundwater, from which the phase distribution is calculated. The model does not include sorption and degradation during transport, which in turn are mainly driven by soil properties. The 1996 model and the adapted 2005 model are both used for the derivation of general and site specific guideline values. Model predictions can be made for different land-use types (for example industrial area, considered less sensitive, or residential area, considered sensitive) and exposure routes (for example consumption of locally grown vegetables and groundwater, or inhalation of air contaminated by vapour intrusion). Exposure from the different routes is summed and compared to toxicological reference values, such as the Tolerable Daily Intake (mg/kg bw/day) for oral exposure and the Reference Concentration (mg/m³) for inhalation exposure. For a description of the legal framework, scientific basis, technical approach, receptors taken into account and the derivation method for guideline values, the reader is referred to reports on contaminated land management in Sweden (S-EPA, 1996; 1998; 2005a; 2005b; 2009; Carlon et al., 2007).
Risk assessment of contaminated soils is associated with variation and implies the use of models within an assessment framework. Models are applied to predict a point estimate of, for example, the soil and indoor air concentration based on a set of default parameters, which is called the deterministic approach. The framework should provide sufficient conservatism to protect public health, and should adequately discriminate between sites that do and do not need further action, which raises the issue of accuracy (Hers, 2004; Fitzgerald, 2009). The Swedish guidelines do not reflect a worst case scenario, but aim to protect most individuals (±95%) (Öberg, 2006; Sanders, Bergback & Öberg, 2006).
A major pathway of exposure is human inhalation of indoor air contaminated by volatile chemicals buried nearby in the sub-surface. Soil vapour can enter a building and, if contaminated, negatively affect the indoor air quality (Kaplan et al., 1993; Fugler & Adomait, 1997). Predicting the indoor air concentration in a building and the related human exposure is complex and is influenced by physico-chemical, environmental and building factors, which are subject to variation (McAlary et al., 2011; Provoost et al., 2010). As opposed to a deterministic approach, a probabilistic approach can include variation by assigning probability distribution functions to the parameters, which propagate to a distribution of predicted air concentrations. The probabilistic approach, which includes a sensitivity analysis, gives insight into which parameters drive the predicted outcomes and into the effect of variation thereon.
The Swedish Dilution Factor model from 1996 (S-EPA, 1996) and the Johnson and Ettinger model (Johnson and Ettinger, 1991; 1997), both frequently used within the framework of CLM in Sweden, were assessed for their accuracy and conservatism by Provoost et al. (2008, 2010, 2013) and via a probabilistic approach by Provoost et al. (2014). However, the peer-reviewed literature does not include research on the Swedish Dilution Factor model from 2005 (S-EPA, 2005a). This model has not been investigated for accuracy, conservatism or parsimony for regulatory purposes.
Different models can be selected depending on the context in which they are used. Regulators may want to use a model with a sufficient level of conservatism, while those who caused the contamination may desire predicted indoor air concentrations that are very accurate (McAlary and Johnson, 2009; Provoost et al., 2013). Thus, the purpose of this study is to rank the three vapour intrusion (VI) models for their accuracy, conservatism and parsimony, so as to allow users to make a better informed decision on which model to apply.
Chapter 2 describes the models used for this study and gives further information on the contaminated site that provides the observed air concentrations. Chapter 3 defines the terms accuracy, conservatism and parsimony in relation to the deterministic and probabilistic approach. Chapter 4 provides the results, and chapters 5 and 6 the conclusions and discussion, followed by further research needs.

Swedish Dilution Factor Model from 1996
In 1996 the S-EPA issued a framework for risk management within contaminated land management. The guidance contains models and data to be used for the derivation of generic guideline values which, when exceeded, indicate a possible effect on human health and/or the environment (S-EPA, 1996). The guideline includes multiple routes of exposure to humans, and for volatile organic compounds (VOC) the vapour intrusion (VI) route dominates the exposure. The model for VI, hereafter called DF SE 1996, is a dilution factor (DF) between the source in the soil and the indoor air, as shown in Figure 1. The transport process is based on diffusion, and because of uncertainty the source concentration is assumed to be constant over time, which is conservative. Equilibrium fugacity of the VOC is assumed between the solid phase, pore water, and soil air. Equilibrium between the pore water and soil air is predicted by using the Henry's constant (vapour pressure and solubility). Soil and contaminant properties are used to predict the soil air concentration, from which the indoor air concentration is calculated by applying a DF. The formulas below indicate the default parameter value where applicable. A default parameter value is used for the volume and ventilation rate of the single compartment building.
The concentration in pore water ($C_w$) is calculated from the total concentration of the contaminant in the soil ($C_s$) by using the formula:

$$C_w = \frac{C_s}{K_d + \dfrac{\theta_w + \theta_a H}{\rho_b}} \qquad (1)$$

Where $K_d$ is the distribution coefficient soil-water (l/kg), $\theta_w$ the soil water content (0.3 dm³ water/dm³ soil), $\theta_a$ the soil air content (0.2 dm³ air/dm³ soil), $H$ the Henry's constant (-) and $\rho_b$ the dry soil bulk density (1.5 kg/dm³).
If the distribution coefficient soil-water ($K_d$) is not provided it is calculated from:

$$K_d = K_{oc} \times f_{oc} \qquad (2)$$

Where $K_d$ is the distribution coefficient soil-water (l/kg), $K_{oc}$ the distribution factor between water and organic carbon (l/kg) and $f_{oc}$ the organic carbon content of the soil by weight (0.02 -).
The soil pore air concentration is calculated via the formula:

$$C_a = H \times C_w \qquad (3)$$

Where $C_a$ is the vapour concentration in air (mg/dm³), $H$ the Henry's constant (-) and $C_w$ the concentration in pore water (mg/dm³).
The concentration in the indoor air ($C_{ia}$) is given by:

$$C_{ia} = C_a \times DF_{ia} \qquad (4)$$

Where $C_{ia}$ is the concentration in the indoor air (mg/dm³), $C_a$ the vapour concentration in air (mg/dm³) and $DF_{ia}$ the dilution factor for indoor air to soil air (-).
The $DF_{ia}$ is derived from empirical data from MDEP (1994) and the default was set to 1:20 000. The $DF_{ia}$ can be adapted to the floor type: it was set to 1:5 000 for an open floor basement and to about 1:70 000 for a concrete basement floor.
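The calculation chain of the DF SE 1996 model (pore water, soil air, indoor air) can be sketched in a few lines of Python. This is a minimal illustration rather than the S-EPA implementation: the pore water step uses the standard equilibrium-partitioning expression, which is assumed here to correspond to equation 1, and the function names as well as the example values for the soil concentration, $K_{oc}$ and the Henry's constant are hypothetical.

```python
import math

# Default values quoted in the text for the DF SE 1996 model.
THETA_W = 0.3          # soil water content (dm3 water / dm3 soil)
THETA_A = 0.2          # soil air content (dm3 air / dm3 soil)
RHO_B = 1.5            # dry soil bulk density (kg/dm3)
F_OC = 0.02            # organic carbon content by weight (-)
DF_IA = 1.0 / 20_000   # default dilution factor indoor air to soil air (-)


def distribution_coefficient(k_oc, f_oc=F_OC):
    """Soil-water distribution coefficient K_d (l/kg) from K_oc and f_oc."""
    return k_oc * f_oc


def pore_water_conc(c_s, k_d, h, theta_w=THETA_W, theta_a=THETA_A, rho_b=RHO_B):
    """Pore water concentration C_w (mg/dm3) from total soil concentration C_s (mg/kg).

    Assumed form: standard three-phase equilibrium partitioning.
    """
    return c_s / (k_d + (theta_w + theta_a * h) / rho_b)


def soil_air_conc(c_w, h):
    """Soil air concentration C_a (mg/dm3) from pore water via Henry's constant."""
    return h * c_w


def indoor_air_conc(c_a, df_ia=DF_IA):
    """Indoor air concentration C_ia (mg/dm3) by applying the dilution factor."""
    return c_a * df_ia


# Hypothetical inputs, for illustration only (not site data).
c_s_example = 1.0      # mg/kg soil
k_oc_example = 100.0   # l/kg
h_example = 0.2        # dimensionless Henry's constant

k_d = distribution_coefficient(k_oc_example)
c_w = pore_water_conc(c_s_example, k_d, h_example)
c_a = soil_air_conc(c_w, h_example)
c_ia = indoor_air_conc(c_a)
print(f"K_d={k_d:.3g} l/kg, C_w={c_w:.3g}, C_a={c_a:.3g}, C_ia={c_ia:.3g} mg/dm3")
```

Changing `df_ia` to 1/5 000 or 1/70 000 reproduces the floor-type adaptation described above, since the indoor air concentration scales linearly with the dilution factor.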

Swedish Dilution Factor model from 2005
In 2005 the model for calculating soil screening values (SSV), hereafter called DF SE 2005, underwent a revision. The main difference is that the DF for VI is calculated instead of being assumed to be a fixed factor of 1:20 000. The calculation of the $DF_{ia}$ from equation 4 was adapted and now includes the dimensions of the building and a temporal perspective (S-EPA, 2005a), which allows the model to be used for site specific assessments.
The model assumes that a zone of influence ($C_d$) is situated directly under the floor, as indicated in Figure 2. The concentration in the building ($C_{hus}$) depends on the volume of the building ($V_{hus}$), the floor surface area ($A$), the concentration in the zone of influence ($C_d$) and the soil air intrusion into the building ($L_a$). (5) Where $DF_{ia}$ is the dilution factor for indoor air to soil air (-), $A$ the floor surface area (100 m²), $D$ the soil vapour flux (m²/day), $L_a$ the soil air vapour intrusion into the building (2.4 m³/day), $V_{hus}$ the volume of the building (240 m³), $l$ the indoor air exchange rate (12/day or 0.5/hour), and $Z$ the depth of the contaminant (0.35 m).
The DF as presented above does not take into account the reduction that results from temporal depletion of the contaminant. Contaminants for which VI is an important route of exposure often have a low $K_d$ and a high Henry's constant, resulting in depletion of the contaminant closest to the zone of influence under the building. This might reduce indoor air concentrations over time. To account for this effect the model as applied by the US EPA was incorporated (US EPA, 1996), which depends on the chemical specific physico-chemical parameters.
The model calculates an average soil air concentration over 5 years and the formula is given below.
The apparent diffusion ($D_a$) is calculated from: Where $D_a$ is the apparent diffusion (m²/day), $D$ the diffusion (m²/day), $\rho_b$ the dry soil bulk density (1.5 kg/dm³), $K_d$ the distribution coefficient soil-water (l/kg), $H$ the Henry's constant (-), $\theta_w$ the soil water content (0.3 dm³ water/dm³ soil) and $\theta_a$ the soil air content (0.2 dm³ air/dm³ soil).
The overall DF ($DF_{ia,inne}$) is then calculated by combining the DF from equation 4 with 7: Where $DF_{ia,inne}$ is the overall dilution factor (-), $DF_{ia}$ the DF for indoor air to soil air (-), and $DF_{SSL}$ the time specific DF (-).
The overall DF accounts for diffusion through an uncontaminated soil layer ($DF_{ia}$) and for dilution due to depletion of the soil layer closest to the zone of influence ($DF_{SSL}$). The indoor air concentration is then calculated by applying:

$$C_{ia} = C_a \times DF_{ia,inne} \qquad (9)$$

Where $C_{ia}$ is the concentration in the indoor air (mg/dm³), $C_a$ the vapour concentration in air (mg/dm³) and $DF_{ia,inne}$ the overall dilution factor for indoor air to soil air (-).

Johnson and Ettinger model from 1991
In 1998 the US EPA developed a model, hereafter called JEM, to predict indoor air concentrations resulting from sub-surface contamination. The model is based on publications by Johnson and Ettinger (1991; 1997) and Johnson et al. (1999). The JEM includes both diffusion and convection as transport mechanisms of vapour to the indoor air, as indicated in Figure 3. The model allows the user to include a set of site-specific data as well as parameters related to physico-chemical, soil and building properties. The general model is given below:

$$\alpha = \frac{\left[\dfrac{D_T^{eff} A_B}{Q_{building} L_T}\right] \exp\left(\dfrac{Q_{soil} L_{crack}}{D_{crack} A_{crack}}\right)}{\exp\left(\dfrac{Q_{soil} L_{crack}}{D_{crack} A_{crack}}\right) + \left[\dfrac{D_T^{eff} A_B}{Q_{building} L_T}\right] + \left[\dfrac{D_T^{eff} A_B}{Q_{soil} L_T}\right] \left[\exp\left(\dfrac{Q_{soil} L_{crack}}{D_{crack} A_{crack}}\right) - 1\right]}$$

Where $\alpha$ is the dilution factor between soil air and indoor air (-), $D_T^{eff}$ the total overall effective diffusion coefficient (cm²/s), $A_B$ the area of the enclosed space (cm²), $Q_{building}$ the ventilation rate (cm³/s), $L_T$ the source-building separation (cm), $Q_{soil}$ the volume flow rate of soil gas into the enclosed space (cm³/s), $L_{crack}$ the enclosed space foundation thickness (cm), $D_{crack}$ the effective diffusion coefficient through the cracks (cm²/s), and $A_{crack}$ the area of total cracks (cm²) (US EPA, 2004).
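As an illustration of how the general JEM expression combines diffusive and convective transport, the sketch below evaluates the attenuation factor α from the eight quantities defined above, using the steady-state form published in US EPA (2004). The function name and all input values are hypothetical and do not represent the site or the spreadsheet version used in this study. The expression is divided through by the exponential term, which is algebraically equivalent but avoids numerical overflow when the soil gas flow through the cracks is large.

```python
import math


def jem_attenuation(d_eff, a_b, q_building, l_t, q_soil, l_crack, d_crack, a_crack):
    """Attenuation factor alpha (-) between soil air and indoor air.

    Units follow the text: cm, cm2, cm2/s and cm3/s.
    """
    # Diffusive transport from the source relative to building ventilation.
    a = d_eff * a_b / (q_building * l_t)
    # Diffusive transport from the source relative to soil gas entry.
    b = d_eff * a_b / (q_soil * l_t)
    # Ratio of convective to diffusive transport through the foundation cracks.
    peclet = q_soil * l_crack / (d_crack * a_crack)
    decay = math.exp(-peclet)  # stays finite even for very large peclet
    # Original form: a*e / (e + a + b*(e - 1)), with e = exp(peclet),
    # rewritten by dividing numerator and denominator by e.
    return a / (1.0 + a * decay + b * (1.0 - decay))


# Hypothetical inputs, for illustration only.
alpha = jem_attenuation(
    d_eff=1e-3,          # cm2/s
    a_b=1e6,             # cm2 (100 m2 floor)
    q_building=33_333.0, # cm3/s
    l_t=100.0,           # cm
    q_soil=83.0,         # cm3/s
    l_crack=15.0,        # cm
    d_crack=1e-3,        # cm2/s
    a_crack=400.0,       # cm2
)
print(f"alpha = {alpha:.3e}")
```

The indoor air concentration then follows by multiplying the source vapour concentration by α, analogous to the dilution factor in the DF SE models.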

Description of the Site
Few well-documented sites with sufficient temporal and spatial data to contrast predictions and observations are available in the public domain. From those available, a site situated in the city of Vilvoorde (Belgium) was selected. It was used for over 30 years by the coating industry to produce paints and varnishes, resulting in contamination with volatile aromatic and chlorinated hydrocarbons.
At the source of the contamination both the soil and the groundwater are contaminated. The migration of the plume shows a near-surface and a deep groundwater contamination (Provoost et al., 2014). The conceptual site model (CSM) revealed that the aromatic hydrocarbons benzene (hereafter BE) and ethylbenzene (hereafter EB) were the contaminants of concern, given their concentration ranges in the soil. Therefore the dominant route of human exposure is considered to be migration of soil vapour into the building (indoor air), resulting from sub-surface soil contamination. The soil was on average 1.5 m thick, with sandy-loam near the surface and a loamy soil at groundwater level. The building floor was made of concrete with an average thickness of 0.3 m (with cracks and gaps). The contaminated vadose zone covers around 4 000 m² and has an estimated volume of 50 000 m³, with average concentrations of 408 mg/kg soil for benzene and 491 mg/kg soil for ethylbenzene. The volume of contaminated groundwater is around 63 000 m³, with an average concentration of 1 330 mg/l benzene and 580 g/l ethylbenzene. Samples were taken in the building overlying the vadose zone contamination. For more details on the site, the CSM and the observed concentrations the reader is referred to Provoost et al. (2002, 2014); the full study report (Bronders et al., 2000) can be made available upon request.

Deterministic Approach
The deterministic approach requires the selection of a single value for each of the model parameters to arrive at a conservative predicted air concentration. Consequently, model parameters were adjusted to specific conditions on the site, such as the soil type, the contaminant concentration and its depth below ground level, and the volume of the building. This approach is usually applied by risk assessors within a CLM framework, and results in a point value for exposure.
However, model parameters are subject to variation, so the deterministic approach obscures the propagation of either uncertainty or variability to the predicted air concentrations (Ragas et al., 2009). Consequently, deterministically predicted concentrations do not consider the range or likelihood of the outcomes, and may therefore underestimate the exposure risk (false negative prediction) (Krupnick et al., 2006). To address this shortcoming a probabilistic approach was applied.

Probabilistic Approach
A probabilistic approach is an assessment that produces a distribution for the predicted air concentration, generally by assigning a probability distribution function to represent variability or uncertainty in input parameters (EPA, 2001). Predicting VOC exposure as a result of inhaling indoor air is subject to variation, which has various sources. The first source is variation in model parameters (1), which in turn are either uncertain or variable. A parameter can be uncertain because insufficient data were obtained to derive a value. An example is the fraction of organic carbon in the soil, which is not routinely measured during field investigations, but rather estimated from the soil organic matter. Parameters can also vary because they describe a population with different values, for example the thickness of the concrete floor in the building. Unlike uncertainty, variability cannot be reduced by gathering more data (Cullen & Frey, 1999; Finley & Paustenbach, 1994; McKone & Bogen, 1990). Other sources are uncertainty related to the model itself (2), scenario uncertainty in the application of the model (3) and uncertainty due to simplification relevant for a given decision context (4). The probabilistic approach used in this study addresses parameter variation only (source 1), because the models are applied as made available, with their specific scenarios (sources 2 and 3) and simplifications (source 4). This approach requires the user to consider uncertainty, variability and interdependencies of parameters, includes various probability distribution functions in the prediction (van der Sluijs et al., 2004), and allows a sensitivity analysis that identifies the dominant parameters and the effect of variation therein.
As indicated in Chapter 2, three VI models were selected for this study. They were made available in a spreadsheet version to allow for the calculation of probabilistic distributions of the predicted air concentration. For the probabilistic approach a Monte Carlo simulation (Hammersley & Hanscomb, 1964; McKone & Ryan, 1989; EPA, 2001) was applied by utilizing the Excel® add-in Crystal Ball® (Crystal Ball, 2000), in which each parameter distribution function is sampled 10 000 times. The sensitivity analysis, as part of the probabilistic approach, provides insight into how variation, caused by variability and uncertainty in input parameters, propagates through the model into variation of the predicted air concentration (Hammersley & Hanscomb, 1964; McKone & Ryan, 1989; Provoost et al., 2014). The probability distribution functions for the model parameters were derived from the literature as well as from the site investigation report.
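The Monte Carlo procedure can be illustrated with a short Python sketch that stands in for the spreadsheet and Crystal Ball® set-up. It is a hedged example only: the model function is a simplified equilibrium-partitioning and dilution-factor calculation in the spirit of the DF SE 1996 model (the partitioning expression is an assumption), and every distribution type and range in `sample_parameters` is hypothetical rather than taken from the literature or the site investigation report.

```python
import random
import statistics

N_SAMPLES = 10_000  # number of draws per parameter distribution, as in the text


def indoor_air_conc(c_s, k_oc, f_oc, h, theta_w, theta_a, rho_b, df_ia):
    """Simplified dilution-factor model: soil -> pore water -> soil air -> indoor air."""
    k_d = k_oc * f_oc
    c_w = c_s / (k_d + (theta_w + theta_a * h) / rho_b)  # assumed partitioning form
    c_a = h * c_w
    return c_a * df_ia


def sample_parameters(rng):
    """Draw one parameter set. All distributions below are hypothetical."""
    return {
        "c_s": rng.lognormvariate(0.0, 1.0),       # mg/kg soil
        "k_oc": rng.lognormvariate(4.6, 0.3),      # l/kg
        "f_oc": rng.uniform(0.005, 0.04),          # -
        "h": rng.triangular(0.15, 0.30, 0.22),     # - (low, high, mode)
        "theta_w": rng.uniform(0.20, 0.35),        # dm3/dm3
        "theta_a": rng.uniform(0.10, 0.25),        # dm3/dm3
        "rho_b": rng.uniform(1.3, 1.7),            # kg/dm3
        "df_ia": 1.0 / rng.uniform(5_000, 70_000), # -
    }


rng = random.Random(42)  # fixed seed so the run is repeatable
predictions = [indoor_air_conc(**sample_parameters(rng)) for _ in range(N_SAMPLES)]

# statistics.quantiles with n=100 returns the 1st to 99th percentile cut points.
cuts = statistics.quantiles(predictions, n=100)
median, p95 = cuts[49], cuts[94]
print(f"median = {median:.3e} mg/dm3, 95-percentile = {p95:.3e} mg/dm3")
```

A sensitivity analysis can be built on the same loop, for example by storing the sampled inputs alongside each prediction and ranking the parameters by their rank correlation with the output.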

Visualisation of the Results
The results of the deterministic and probabilistic approach, including the sensitivity analysis, are visualised by box-and-whisker plots, bar charts and tables. A plot is provided for each combination of contaminant (benzene or ethylbenzene), model (DF SE 1996, DF SE 2005 or JEM) and medium (soil air or indoor air).
The plots display the minimum and maximum, the 25, 75 and 95-percentile (x) and the median (□) of the predicted soil or indoor air concentration. Each plot displays the predicted deterministic concentration (○) and, in the case of the indoor air concentration, the tolerable concentration in air (TCA) for comparison. For both contaminants the observed soil and indoor air concentrations are shown for comparison with the predictions. The plots provide insight into the spread (the difference between the whiskers, i.e. the minimum and maximum concentration) and the mid-spread (the middle 50% of the predicted concentrations), also called the inter-quartile range. The location of the median relative to the 25 and 75-percentile values indicates the amount of asymmetry in the data, also called skewness (Provoost et al., 2014).
The results from the sensitivity analysis are displayed via tables indicating for each of the contaminants and model the contribution of dominant parameters.
A graph containing the outcome of the three statistical criteria (ME, RMSE and CRM, as defined below) for each combination of contaminant (BE and EB) and model (DF SE 1996, DF SE 2005 or JEM) provides a measure of the inter-model accuracy for predicted indoor air concentrations.

Accuracy
The accuracy of a model is defined as the difference between the predicted and observed (soil or indoor) air concentrations: the smaller the difference, the more accurate the model (Hers et al., 2002; Warmink et al., 2010; Provoost et al., 2010). Within this context the accuracy of a model is quantified by applying three statistical criteria, namely the maximum relative error (ME), root mean squared error (RMSE) and coefficient of residual mass (CRM), as proposed by Loague and Green (1991). The scores are unitless, and the lower the score for a criterion, the closer the predicted concentrations are to the observed concentrations; the criteria are thus a measure of precision. The three criteria are defined below, where O is the observed concentration, P the predicted concentration, and n the number of pairs (observation and prediction).
Maximum relative Error (ME): The ME provides the maximum difference between the O and P concentrations.

Root Mean Squared Error (RMSE):
The RMSE provides a measure of the average difference between all O and P concentrations.
Coefficient of Residual Mass (CRM): The CRM provides a measure of whether the model over- or under-predicts in comparison to observations. If the CRM has a negative (-) value the model under-predicts in relation to the observed concentration, and vice versa.
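To make the three criteria concrete, the sketch below computes them for paired observations and predictions. The exact expressions used in this study are not reproduced here, so the forms in the code are assumptions: the ME and RMSE are normalised by the mean observed concentration to obtain unitless scores, and the CRM is written so that its sign matches the description above (negative for under-prediction, positive for over-prediction). The function name and the example data are hypothetical.

```python
import math


def accuracy_scores(observed, predicted):
    """Return (ME, RMSE, CRM) for paired observed and predicted concentrations.

    Assumed, unitless forms:
      ME   = max|P_i - O_i| / mean(O)
      RMSE = sqrt(sum((P_i - O_i)^2) / n) / mean(O)
      CRM  = (sum(P) - sum(O)) / sum(O)
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    me = max(abs(r) for r in residuals) / mean_obs
    rmse = math.sqrt(sum(r * r for r in residuals) / n) / mean_obs
    crm = (sum(predicted) - sum(observed)) / sum(observed)
    return me, rmse, crm


# Hypothetical paired concentrations, for illustration only.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 2.0, 5.0]
me, rmse, crm = accuracy_scores(obs, pred)
print(f"ME = {me:.3f}, RMSE = {rmse:.3f}, CRM = {crm:.3f}")
```

With these assumed forms, a perfect model scores zero on all three criteria, and a positive CRM flags a tendency to over-predict.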

Conservatism
Conservatism is defined as the extent to which the predicted air concentration reflects variation resulting from variability or uncertainty in model parameters (Eklund & Burrows, 2009; Johnston & MacDonald, 2010; Labieniec et al., 1996), while maintaining a 95-percentile level of protection. To this end, the 95-percentile predicted concentration of the probabilistic distribution is used as an adequate level of protection (US EPA, 2001; Ferguson, 1999). The conservatism is determined by comparing the deterministically predicted concentration with the 95-percentile value of the probabilistic distribution (Provoost et al., 2014).
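The comparison between the deterministic prediction and the 95-percentile of the probabilistic distribution can be expressed as a difference in orders of magnitude. The sketch below is one possible way to do this; the function name, the choice of the inclusive percentile method and the example numbers are assumptions made for illustration.

```python
import math
import statistics


def conservatism_gap(deterministic, predictions):
    """Return the 95-percentile and its distance from the deterministic value.

    The distance is log10(p95 / deterministic): a positive value means the
    deterministic prediction lies below the 95-percentile (less conservative),
    and a value of 1 corresponds to one order of magnitude.
    """
    p95 = statistics.quantiles(predictions, n=100, method="inclusive")[94]
    return p95, math.log10(p95 / deterministic)


# Hypothetical distribution of predicted concentrations and point estimate.
example_predictions = [float(v) for v in range(1, 102)]
p95, gap_oom = conservatism_gap(9.6, example_predictions)
print(f"95-percentile = {p95}, gap = {gap_oom:.2f} OOM")
```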

Parsimony
A parsimonious model accomplishes a desired level of prediction with as few parameters as possible (Pitt & Myung, 2002), or is as "simple as possible, as complex as necessary" (Bilitewski B. et al, 2013). In relation to the probabilistic approach, a parsimonious model is the simplest model possible that explains as much of the variation in indoor air as possible (Coles, 2001).
For that reason, the number of model parameters was used as a measure of the parsimony of the models under investigation. Preference is given to a model which is parsimonious while still accurate and sufficiently conservative. The 1996 DF model and the new 2005 VI model, both from Sweden, and the US-EPA 1991 JEM include 9, 14 and 27 model parameters, respectively.

Soil Air
Figure 4. Box-and-whisker plots for predicted and observed soil air concentration by model and contaminant. x 95-percentile, ○ deterministic value, □ median, BE benzene, EB ethylbenzene, 10 000 predictions, 35 observations.
The ranges of the predicted concentrations cover the observed ranges, and the mid-spreads of predictions and observations are within 1 order of magnitude (OOM) of each other. The median and deterministic predicted soil air concentrations both lie in the mid-range and are therefore in close proximity. Deterministic predicted concentrations tend to be slightly lower than the median predicted concentration, with the exception of the JEM for ethylbenzene.
The 95-percentile predicted concentrations are approximately 1 OOM higher than the deterministic predicted concentration, which indicates a low conservatism (level of protection). The predicted and observed concentrations are not much skewed.

Indoor Air
Figure 5. Box-and-whisker plots for predicted and observed indoor air concentration by model and contaminant. x 95-percentile, ○ deterministic value, □ median, tolerable concentration in air (TCA), BE benzene, EB ethylbenzene, 10 000 predictions, 52 observations.
The ranges of the predicted concentrations are higher than the observed ranges, with the exception of the JEM for BE. The ranges and mid-spreads of predictions and observations differ by more than 1 OOM. The median and deterministic predicted indoor air concentrations both lie in the mid-spread and are in close proximity to each other. The 95-percentile predicted concentration is 1 to 2 OOM higher than the deterministic predicted concentration, which indicates a low conservatism (level of protection). The predicted and observed concentrations are slightly skewed.

Sensitivity Analysis
Figure 6 provides an overview of the different groups of model parameters contributing to the variation in soil air, and Table 1 provides more details for each of the model parameter groups. The physico-chemical parameters overall contribute most (59-89%) to the variation of the predicted benzene and ethylbenzene soil air concentration, while the soil parameters contribute less (8-38%). Table 1 reveals that within the group of physico-chemical parameters mainly the initial soil concentration drives the predictions, with the exception of the JEM for ethylbenzene, where the Henry constant dominates. Within the group of soil parameters, the organic carbon fraction in soil contributes most to the variation.

Indoor Air
Figure 7 provides an overview of the different groups of model parameters contributing to the variation in indoor air, and Table 2 provides more details for each of the model parameter groups. The group of physico-chemical parameters contributes 40-61% to the variation of the predicted indoor air concentration, with the exception of the JEM, where the contribution reduces to 24-36%. The contribution of the group of soil parameters varies between 23-43% for the DF SE models (1996, 2005), while it increases to 51-57% for the JEM. For all three models the building parameters contribute the least to the variation of the indoor air concentration (14-20%).
Table 2 reveals that within the group of physico-chemical parameters mainly the initial soil concentration drives the predictions, except for EB in the JEM, where the Henry constant dominates. Within the group of soil parameters the organic carbon fraction in the soil drives the variation for both DF SE models (1996, 2005) (22-31%), while for the JEM the soil vapour permeability contributes most to the variation (31-54%). Table 3 provides per model an overview of the parameters that contribute most to the variation of the indoor air concentration. The JEM has different parameters driving the variation in predicted indoor air concentration than both DF SE models. The equations presented in Chapter 2 show that the three models have implemented similar fugacity algorithms for calculating the soil air concentration (equation 3), resulting in similar parameters driving the predictions, for example the initial soil concentration or the organic carbon fraction in the soil.
However, the three models incorporate different mathematical algorithms (equations 4, 9 and 11) for predicting the indoor air concentration from the soil air concentration, so different building parameters dominate the variation (dilution factor from soil to indoor air vs. the intrusion rate of pore air vs. soil-building pressure differential).
Figure 8 shows the scores (unitless) of the three statistical criteria for both contaminants and the three models and reveals a decreasing value, meaning increasing accuracy, from DF SE 1996 to DF SE 2005, and further to the JEM. Thus, the latter can be considered the most accurate model. The statistical criteria systematically produce a higher outcome for BE than for EB, which suggests that indoor air concentrations for EB are more accurately predicted than for BE. The positive CRM values indicate that the models have a tendency to over-predict indoor air concentrations in relation to the observed concentrations. The DF SE 2005 model performs better on the RMSE and CRM for EB than the other two models.

Parsimony
The number of parameters for which a distribution function was derived (see chapter 3.2) is 9, 14 and 27 for the DF SE 1996, DF SE 2005 and JEM VI model, respectively. The variation in the predicted indoor air concentration results in a spread of four OOM, with the mid-spread mostly within one OOM (Figure 5). The predicted indoor air concentrations for the DF SE 1996 and 2005 are mainly driven by physico-chemical parameters and less by soil or building parameters. The exception is the JEM, where half of the variation is caused by soil parameters (Figure 7).
The most accurate level of prediction of the indoor air concentration, with as few parameters as possible, is obtained by the DF SE 2005 model. For the DF SE models, the dominant parameters that drive the variation in predicted indoor air concentration (Table 3), such as the initial soil concentration and the organic carbon fraction in the soil (derived from organic matter), are measured routinely during site investigations. The soil vapour permeability for the JEM will require additional sampling and analysis. The dominant building parameters for the JEM are not routinely measured during standard investigations and will incur additional site investigation costs.

Conclusions and Discussion
A deterministic and probabilistic approach (includes a sensitivity analysis) was applied to provide insight in the accuracy, conservatism and parsimony of three VI models used in Sweden for human health risk assessment within the framework of CLM.
The model that most accurately predicts the soil air concentration is the JEM followed by the DF SE 2005 and 1996 (Figure 4).Predictions are predominantly driven by variation in physico-chemical model parameters (Figure 6).The most accurate model for predicting the indoor air concentration is the JEM, followed by DF SE 2005 and 1996 (Figure 5 and 8).The variation in indoor air is driven by physico-chemical and/or soil parameters, with the exception of JEM, where soil parameters dominate (Figure 7).
The deterministic predicted soil and indoor air concentrations were in the mid-spread of the probabilistic range, often close to the median, but below the 95-percentile concentration (Figure 4 -7).Thus, default parameter values could be revised as to increase the conservatism, and be closer to the 95-percentile predicted indoor air concentration.This revision should take into account the dominant parameters for each model (Table 1 and 2, Figure 6 and 7).
Application of the three models on a site requires that parameters are adapted to the CSM. Table 3 indicates for each model which parameters drive the predictions, so site-specific data can be gathered with the dominant parameters in mind. Some parameters, like the intrusion rate of pore air, cannot easily be measured on site, so a value needs to be selected from the distribution function representing the CSM. Literature, handbooks or manuals can be consulted for the ranges and representative default parameter settings.
The above findings are in agreement with Provoost et al. (2008; 2010a; 2013; 2014), where the JEM was among the most accurate models and driven by the same parameters, while the DF SE 1996 was amongst the most conservative models, being less accurate but producing fewer false negative predictions. A false negative is a predicted air concentration that is lower than the observed concentration, so that no further action (site investigation or remediation) appears necessary, although the observed concentrations might trigger it. The lower whiskers in Figures 4 and 5 extend below the minimum of the observed concentrations, which suggests that the probability of false negative predictions should be reviewed.
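The false negative definition given above can be turned into a simple count. The sketch below is illustrative only; both function names are hypothetical, and it assumes that predictions and observations are expressed in the same units.

```python
def false_negative_fraction(predicted, observed):
    """Fraction of paired cases in which the prediction is below the observation."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equally long, non-empty series")
    return sum(p < o for p, o in zip(predicted, observed)) / len(predicted)


def probability_below(simulated, observed_value):
    """Share of Monte Carlo predictions that fall below one observed concentration."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    return sum(s < observed_value for s in simulated) / len(simulated)
```

The first function suits a deterministic comparison over several sites or samples; the second estimates, for a single observation, how much of a probabilistic output distribution would count as a false negative.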
The sensitivity analysis reveals the contribution of each parameter to the predicted concentrations, resulting in a better prioritisation of additional data gathering on a site or in literature. Unlike the deterministic point estimate, the probabilistic approach revealed which parameters dominate the predicted indoor air concentration, and hence the human health risk estimate (Johnson & MacDonald, 2010). Nevertheless, the use of results from a probabilistic approach is challenging for regulatory decision makers, as there is no rule of thumb to decide on the amount of acceptable variation resulting from variability or uncertainty of input parameters (van der Sluijs et al., 2004; EPA, 1997; Morgan & Henrion, 1990; Saltelli et al., 2004; Burmaster & Anderson, 1994). Guidance on making decisions under variation (uncertainty), and on how to communicate it, is proposed by van der Sluijs et al. (2004) and EPA (2001).
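One common way to express the contribution of each parameter to the output variation is to normalise squared rank correlations between sampled inputs and the model output. The sketch below shows that idea on a hypothetical toy model; it is an assumption that this resembles the sensitivity measure used in the study, and the parameter names `param_a` and `param_b` are invented.

```python
import random


def ranks(values):
    """Rank positions (1 = smallest); ties are not averaged in this sketch."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def contributions(inputs, output):
    """Normalised squared rank correlation per input (shares sum to 1)."""
    output_ranks = ranks(output)
    squared = {
        name: pearson(ranks(values), output_ranks) ** 2
        for name, values in inputs.items()
    }
    total = sum(squared.values())
    return {name: value / total for name, value in squared.items()}


# Hypothetical toy model: the output depends strongly on param_a, weakly on param_b.
rng = random.Random(0)
samples = {
    "param_a": [rng.uniform(0, 1) for _ in range(2000)],
    "param_b": [rng.uniform(0, 1) for _ in range(2000)],
}
result = [3.0 * a + 1.0 * b for a, b in zip(samples["param_a"], samples["param_b"])]
shares = contributions(samples, result)
```

Parameters with the largest share are the natural candidates for additional site measurements or literature review.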
The DF SE 1996 model is a screening-level model, which shows a higher level of conservatism (lower accuracy). It includes a limited number of model parameters, and this analysis shows that a model with more parameters is more accurate and less conservative. A model can be selected depending on the user's desired level of conservatism and/or accuracy. A more conservative, less accurate model like the DF SE 1996 could be selected for deriving general soil screening values, while a more accurate model can be applied for site-specific risk assessment (Provoost et al., 2013). The DF SE 2005 seems to be a parsimonious model: with 14 parameters it includes enough parameters to be accurate while remaining sufficiently conservative, whereas the DF SE 1996 with 9 parameters is conservative and the JEM with 27 parameters is accurate. For the latter, some of the dominant parameters cannot easily be measured on site.
The sensitivity analysis revealed dominant parameters for each of the models, some of which are subject to variability and some to uncertainty. Fisher et al. (2002) suggest using two models to account for variability and to explain differences in predictions.

Further Research Needs
The probabilistic approach applied in this study was limited to variation in model parameters, which have either variability or uncertainty. Both are propagated into a single output distribution, unless a 2D simulation is performed in which both sources of variation are separated and their contributions calculated (Ragas et al., 2009). A 2D analysis shows how much of the variation is caused by parameters that have variability and how much by parameters that have uncertainty.
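The nested structure of such a 2D simulation can be sketched as follows. This is a minimal illustration under stated assumptions: the toy model, the choice of which parameter is treated as uncertain and which as variable, and all distribution settings are hypothetical and are not taken from the study.

```python
import math
import random
import statistics


def two_dimensional_mc(n_outer=200, n_inner=500, seed=42):
    """Return one inner-loop 95-percentile per outer-loop (uncertainty) draw."""
    rng = random.Random(seed)
    p95_per_outer_draw = []
    for _ in range(n_outer):
        # Outer loop: parameter with uncertainty (true value poorly known).
        attenuation = rng.lognormvariate(math.log(1e-4), 0.5)
        # Inner loop: parameter with variability (real spread in space or time).
        indoor = [
            rng.lognormvariate(math.log(100.0), 1.0) * attenuation
            for _ in range(n_inner)
        ]
        p95_per_outer_draw.append(statistics.quantiles(indoor, n=20)[-1])
    return p95_per_outer_draw


p95_values = two_dimensional_mc()
low, high = min(p95_values), max(p95_values)
```

The spread within each inner loop reflects variability, while the spread of the 95-percentile across outer draws (`low` to `high`) reflects uncertainty, so the two sources are no longer merged into a single output distribution.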
A similar study is required for other well-documented sites, to provide a broader insight into the accuracy, conservatism and parsimony of the three VI models. Additional research can confirm whether the same model parameters drive the predicted soil and indoor air concentrations.
Other contaminants, such as chlorinated hydrocarbons, can add to this broader view. Consequently, extrapolating the results of this study to other sites requires caution.

Figure 1. Conceptual model for the 1996 DF model from Sweden

Figure 2. Conceptual model for the 2005 model from Sweden

Figure 3. Conceptual model for the 1991 JEM

Figure 6. Contribution of groups of model parameters to the variation in predicted soil air concentration by model and contaminant. BE: benzene, EB: ethylbenzene.

Figure 7. Contribution of groups of model parameters to the variation in predicted indoor air concentration by model and contaminant. BE: benzene, EB: ethylbenzene.

Figure 8. Statistical criteria by contaminant and model for the predicted indoor air concentration. BE: benzene, EB: ethylbenzene, ME: Maximum relative Error, RMSE: Root Mean Squared Error, CRM: Coefficient of Residual Mass.

Table 1. Contribution of model parameters to the variation in the predicted soil air concentration by model and contaminant

Table 2. Contribution of model parameters to the variation in the predicted indoor air concentration by model and contaminant

Table 3. Model parameters that contribute most to the variation in the predicted indoor air concentration by model

Table A2. Chemical-related parameters for ethylbenzene

Table A3. Soil properties