Estimation of Spatial Variations in Urban Noise Levels with a Land Use Regression Model

Background: Outdoor noise is a source of annoyance and health problems in cities worldwide. Objective: We developed a land use regression (LUR) model, based on a generalized additive model (GAM), to estimate the spatial variation of noise levels in Montreal. Methods: Noise levels were measured over a two-week period during the summer of 2010 at 87 sites and during the winter of 2011 at 62 sites. A land use regression model was produced for each season to estimate noise levels as LAeq24h (resolution of 20 m). A leave-one-out cross-validation (LOOCV) was performed. Results: Measured LAeq24h ranged from 53.4 to 73.7 dBA for the summer and from 54.1 to 77.7 dBA for the winter. The land use regression models explained 64% of spatial variability for the summer and 40% for the winter. The main predictors were the Normalized Difference Vegetation Index; the length of vehicular arteries, highways, and bus lines; and the proximity to an international airport. The root mean square error (RMSE) from the LOOCV was 3.3 and 4.5 dBA for the summer and the winter, respectively. Conclusion: The models explained a large part of the variability in noise levels, but the RMSE remained relatively large on the scale of the measured noise levels.


Introduction
Prolonged exposure to high environmental noise levels has been associated with annoyance, learning difficulties in children, sleep disturbances, hypertension, and other cardiovascular outcomes (Basner et al., 2014; Sørensen et al., 2012; Huss et al., 2010; Basner et al., 2006; Jarup et al., 2005; Stansfeld et al., 2005). Spatial variations of noise levels have been observed in urban areas (Zuo et al., 2014; Seto et al., 2007; Alberola et al., 2005). These spatial differences have been associated with main emission sources and with characteristics of the built environment (Zuo et al., 2014). Main emission sources of noise in urban environments include road, aerial, and railroad traffic; public and construction works; industries; domestic noise; and noise from leisure activities (Berglund et al., 1999). Among these sources, road traffic is the greatest contributor in urban settings (Makarewicz, 1993).
Sound waves are attenuated by their geometrical spread and their atmospheric absorption. Sound wave propagation is also influenced by absorption and reflection off different surface types, by topography, and by meteorological parameters such as wind speed, wind direction, and temperature (Piercy et al., 1977). Dense vegetation decreases noise levels by absorbing sound waves and by disrupting their spread (Reethof, 1973).
A number of approaches can be used to estimate the environmental noise exposure of populations. Personal exposure measurement is the most accurate method to assess exposure, but it cannot be used to assess the exposure of large populations (Zou et al., 2009). Numerical models have frequently been used to estimate noise levels over a large territory, for example in Europe, where maps of daily noise levels are produced for large communities to meet the requirements of the 2002 European Union Directive (Directive 2002/49/CE). Such models estimate the propagation of sound waves from emission sources, their attenuation, and subsequently noise levels over a territory. Various software tools are available to estimate noise levels from emission sources, such as CadnaA®, Mithra-SIG®, FHWA, CORTN (Xie et al., 2011) and SoundPlan® (Guedes et al., 2011); however, their use is somewhat limited due to cost and sometimes unrealistic assumptions about dispersion patterns (Xie et al., 2011; Jerrett et al., 2005).
Land Use Regression (LUR) models, which are statistical models used extensively in the field of air pollution, can overcome some of the limitations of numerical models and can easily be developed to predict noise exposure levels over a large territory. Nonetheless, LUR models have rarely been used to estimate noise levels. In LUR models, geo-referenced predictors of noise levels, available for a large territory, are used to develop an equation to predict measured noise levels. This equation is then applied to areas where noise levels were not measured in order to generate noise maps.
The objective of this study was to develop LUR models to estimate the spatial variation of noise levels on the Island of Montreal.

Materials and Methods
Our study site was the city of Montreal, which is the second largest city in Canada. Its population was over 1.8 million in 2006 (Statistique Canada, 2006), and it covers approximately 500 km². The sections below describe the noise sampling campaigns over Montreal, the variables used to predict noise levels, and the statistical models developed.

Measurements of Noise Levels
Two sampling campaigns were conducted: the first over a two-week period in the summer of 2010 (August 11th to 24th), and the second over two periods in the winter of 2011 (February 26th to March 12th, and March 12th to April 3rd). We recorded two-minute noise levels continuously in A-weighted decibels (dBA) with the Noise Sentry sound level meter and data logger (Paillard, 2010). According to the manufacturer, the operating temperature range of the device is between -20 and 70 °C. Noise meters were covered with small zipper storage bags to protect them from the rain. Field testing showed minimal influence of the bag, even under windy and rainy conditions (data not shown).
Sampling took place at locations used to develop traffic air pollution LUR models. Locations were selected with a population-weighted location-allocation model. This model situated samplers in areas likely to have high spatial variability in traffic intensity and in population density (see Kanaroglou et al., 2005 for model description). One hundred samplers were installed at the selected locations during the summer period and 81 during the winter sampling campaign (see map in appendix). All 81 winter sites were also sampled in the previous summer sampling period. Measurements were taken concurrently for all sites during the summer, while during winter sampling about half of the sites were measured concurrently.
The samplers were installed at a height of 2.5 m above ground and were attached to street light poles, hydro-electric poles, or parking signs, usually near the sidewalk of the closest road. The samplers were installed at least 10 cm from the poles or signs. The geographic coordinates of each sampling location were recorded using Garmin eTrex Legend Cx Global Positioning System devices.

Predictors of Monitored Noise Level
In total, 69 variables of the built environment were considered as candidate predictors of noise levels. Variables were related to vegetation cover; roads; bus and train networks; the airport; industrial, commercial, and residential land use zones; density of residential units and of non-residential buildings; and divisions of the Island of Montreal. For most variables of interest, predictors were calculated as lengths, distances, densities, or proportions of areas within five different buffers (50, 100, 150, 500, and 750 m) (see below). These variables were created in ArcGIS 9.2 and PostGIS 1.5 for PostgreSQL 9.1.

Vegetation (one variable)
The Normalized Difference Vegetation Index (NDVI) was calculated using a Landsat 5 TM (Montréal: 014-028) image taken on the 27th of July 2010 with a resolution of 30 m (Jackson & Huete, 1991). NDVI was the only raster-based variable included, as all other variables were produced as vectors. Sampling locations were assigned the value of the pixel at their coordinates.
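As an illustration of this step, the sketch below computes NDVI from its standard definition, (NIR - Red) / (NIR + Red), which for Landsat 5 TM uses band 4 (near-infrared) and band 3 (red), and then looks up the pixel that contains a sampling site. The function names, the upper-left raster origin, and the projected (metric) coordinates are assumptions made for the example; they do not describe the authors' GIS workflow.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI from near-infrared and red reflectance values."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total


def pixel_index(x, y, x_origin, y_origin, pixel_size=30.0):
    """Row and column of the pixel containing point (x, y).

    Assumes (hypothetically) a north-up raster whose origin is the
    upper-left corner, in the same projected units (meters) as the point.
    """
    col = int((x - x_origin) // pixel_size)
    row = int((y_origin - y) // pixel_size)
    return row, col
```

A site would then receive the NDVI value stored at `raster[row][col]`, which matches the rule of assigning each sampling location the value of the pixel at its coordinates.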

Land use zone densities (35 variables)
Seven land use density variables were calculated for the five buffers using the most recent land use file available for Montreal (Cartographie d'utilisation du sol, version 2012, Communauté Métropolitaine de Montréal). Variables were the total area (in square meters) of each of the following land uses: industrial, office building, commercial, and residential (low, medium-low, medium, and high density).

Density of residential units and of non-residential buildings (10 variables)
The number of residential units and the number of non-residential buildings were calculated for each buffer (n=5) using the Montreal Island property assessment database (Rôle de l'évaluation foncière de la Ville de Montréal, 2011).

Road network (5 variables)
The length (in meters) of arteries and highways (collecting most road traffic) was calculated for the five buffers, using Adresses Québec 2013 data (© Gouvernement du Québec, Ministère des Ressources naturelles).

Bus network (10 variables)
The length (in meters) of bus lines and the number of bus stops were calculated for the five buffers. The Montreal Transport Society (STM) bus network file (2012) was used to calculate these variables.
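The buffer-based counts can be sketched as follows for point features such as bus stops. This is a minimal illustration that assumes site and stop coordinates are in a projected system with units of meters and that buffers are circles centered on the sampling site; the function name and data layout are hypothetical, and the authors computed these variables in ArcGIS and PostGIS rather than in Python.

```python
import math

# The five buffer radii (in meters) used in the study.
BUFFERS_M = (50, 100, 150, 500, 750)


def count_within_buffers(site, points, radii=BUFFERS_M):
    """Count the points that fall inside each circular buffer around a site.

    `site` and each element of `points` are (x, y) tuples in meters.
    Returns a dict mapping buffer radius to the number of points within it.
    """
    counts = {}
    for radius in radii:
        counts[radius] = sum(
            1
            for px, py in points
            if math.hypot(px - site[0], py - site[1]) <= radius
        )
    return counts
```

Line-based variables (lengths of bus lines, roads, or train lines) would additionally require clipping each line to the buffer, which is the kind of operation a GIS handles directly.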

Train network (5 variables)
The length (in meters) of train lines was calculated for the different buffers (Base de données topographiques du Québec 1:20000).

Airport (one variable)
We geo-referenced the Noise Exposure Forecast 25 (NEF25) map of the Montreal airport, available at the following web site (http://www.boeing.com/commercial/noise/montreal2011.pdf; accessed 14/02/28). The NEF25 is a complex indicator of noise levels calculated based on aircraft movements, which is used for land use planning in order to avoid high levels of annoyance in the population (Transport Canada, 2010). Measurement sites were categorized according to whether or not they were located less than 1 km from the NEF25 contour.

Port of Montreal (one variable)
Terminal accesses to the Montreal port (available at http://www.port-montreal.com/) were geo-referenced, and the distance (in meters) of each sampling site to the closest access was calculated. The maximum distance was set to 10 km based on visual inspection of the relation between noise levels and distance to the port, which reached a plateau at 10 km.

Divisions of the Island of Montreal (one variable)
The Island of Montreal was divided into the following four geographic divisions based on the municipal sectors of the Origin-Destination survey 2008 of the Montreal region (AMT, 2008): East, Center, Downtown, and West. This survey is used to model road traffic by Transportation ministries in Canada (Gourvil & Joubert, 2004). Sampling sites were then assigned to the division in which they were located, in order to capture noise sources specific to these areas.

Statistical Analyses (Development of the Land Use Regression Model)
We calculated LAeq24h (equivalent noise level) at all sites, and for each sampling period, using the two-minute measurements (Fahy & Walker, 1998). At each site, all two-minute noise levels that differed from the average for the sampling period by more than three times its standard deviation were discarded as outliers. LUR models were then developed with one LAeq24h per site, separately for the winter and the summer period, due to differences in the selected noise determinants for the two periods.
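The calculation described above can be sketched as follows. The equivalent level of N equal-duration samples L_i is the energy average, Leq = 10 log10((1/N) Σ 10^(L_i/10)), which is the standard definition. The outlier rule in this sketch assumes that the "average" and "standard deviation" are the arithmetic mean and sample standard deviation of the two-minute dBA values; the text does not specify this, and the function names are illustrative.

```python
import math
import statistics


def drop_outliers(levels_dba, k=3.0):
    """Discard levels more than k standard deviations from the mean.

    Assumption: mean and standard deviation are computed on the dBA
    values themselves, not on sound energy.
    """
    mean = statistics.fmean(levels_dba)
    sd = statistics.stdev(levels_dba)
    return [x for x in levels_dba if abs(x - mean) <= k * sd]


def laeq(levels_dba):
    """Energy-average a sequence of equal-duration sound levels (dBA)."""
    mean_energy = sum(10 ** (x / 10) for x in levels_dba) / len(levels_dba)
    return 10 * math.log10(mean_energy)
```

Because the average is taken on the energy scale, loud intervals dominate: `laeq([50, 60])` is about 57.4 dBA, not 55 dBA.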
The set of 69 variables described above was reduced to 17 by keeping the following four variables (NDVI, NEF25, distance to terminal access, and division of the island) and by choosing only one buffer size per buffer-based variable (i.e., land use, residential and building density, and road, bus, and train network layers). The selection of the best buffer per variable was done as follows. First, bivariate Generalized Additive Models (GAMs) (Hastie & Tibshirani, 1990) were developed using LAeq24h and each variable (n=65) independently. Our GAMs used iteratively reweighted least squares (IWLS) and a thin plate spline basis with a smoothing parameter value of three in order to describe the non-linearity of the relations between noise levels and the variables of the built environment. We then selected a single buffer size per variable based on leave-one-out cross-validation (LOOCV): each site was dropped in turn, the model was re-run to predict the noise level at the dropped site, and the buffer with the lowest root mean square error (RMSE) was retained.
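The buffer selection procedure can be illustrated with the sketch below. To keep the example dependency-free, a simple straight-line fit stands in for the bivariate thin plate spline GAM that the authors fitted with the R package mgcv, so this shows the LOOCV logic rather than the authors' model. The function names and the `values_by_buffer` layout (a dict mapping buffer radius to the per-site predictor values) are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def loocv_rmse(x, y):
    """RMSE of predictions made for each site by a model fitted without it."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        squared_errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_buffer(values_by_buffer, y):
    """Return the buffer radius whose predictor gives the lowest LOOCV RMSE."""
    return min(values_by_buffer, key=lambda b: loocv_rmse(values_by_buffer[b], y))
```

Repeating `best_buffer` for each of the 13 buffer-based variables and adding the four non-buffer variables gives the 17 candidates carried forward to the multivariate model.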
Finally, to develop a multivariate LUR model, an approach similar to backward selection was applied, integrating all determinants of the built environment previously selected (n=17). Before doing so, the shape of the relationship between LAeq24h and each variable was visually inspected using LOESS, and variables with non-linear relationships were log-transformed. For variables whose relationship with LAeq24h remained non-linear, spline terms with a smoothing parameter value of three were used instead of the log transformation. GAMs were used to minimize the number of degrees of freedom associated with the use of spline variables. Spline functions (Hastie & Tibshirani, 1990) were removed if their degrees of freedom in the resulting model were very close to one. Variables were then removed one at a time, based on the RMSE obtained from a LOOCV as described above, until the RMSE reached a minimum. All statistical analyses were performed with the R software (version 2.11.1, package mgcv).
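One possible reading of this elimination procedure is sketched below: at each step, the variable whose removal yields the lowest LOOCV RMSE is dropped, and the process stops when no removal lowers the RMSE further. A multiple linear regression solved through the normal equations stands in for the authors' GAM, and the greedy stopping rule is an interpretation of the text rather than a documented detail; all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def loocv_rmse(rows, y, cols):
    """LOOCV RMSE of a linear model restricted to the predictor columns `cols`."""
    total = 0.0
    for i in range(len(y)):
        train = [[r[c] for c in cols] for k, r in enumerate(rows) if k != i]
        y_train = [v for k, v in enumerate(y) if k != i]
        beta = ols(train, y_train)
        pred = beta[0] + sum(b * rows[i][c] for b, c in zip(beta[1:], cols))
        total += (y[i] - pred) ** 2
    return math.sqrt(total / len(y))


def backward_select(rows, y):
    """Drop predictors one at a time while the LOOCV RMSE keeps decreasing."""
    cols = list(range(len(rows[0])))
    best = loocv_rmse(rows, y, cols)
    while len(cols) > 1:
        trials = [(loocv_rmse(rows, y, [c for c in cols if c != d]), d) for d in cols]
        rmse, drop = min(trials)
        if rmse >= best:
            break  # no single removal improves the cross-validated error
        best, cols = rmse, [c for c in cols if c != drop]
    return cols, best
```

Selecting on cross-validated RMSE rather than on p-values is consistent with the stated aim of prediction rather than interpretation of individual coefficients.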

Application of Model and Estimation of Noise Levels in Montreal
Noise levels were estimated for all central points of a 20 m × 20 m grid covering the island of Montreal (~1.2 million points) using the equation of the summer noise model developed (see Results). Noise levels estimated outside our sampling range (52.4 to 73.7 dBA) were subsequently discarded (n=26,315).
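The mapping step can be sketched as below, assuming a rectangular bounding box in projected coordinates (meters); the function names are illustrative, and the real grid was restricted to the island. As a rough consistency check, an area of about 500 km² divided into 20 m × 20 m cells gives 500,000,000 / 400 = 1.25 million cells, in line with the ~1.2 million points reported.

```python
def grid_centers(x_min, y_min, x_max, y_max, cell=20.0):
    """Yield the center coordinates of each cell of a regular square grid."""
    nx = int((x_max - x_min) // cell)
    ny = int((y_max - y_min) // cell)
    for j in range(ny):
        for i in range(nx):
            yield (x_min + (i + 0.5) * cell, y_min + (j + 0.5) * cell)


def keep_in_range(predictions, low, high):
    """Keep only predictions inside the range of measured levels."""
    return [p for p in predictions if low <= p <= high]
```

Discarding predictions outside the measured range avoids extrapolating the model to combinations of predictors that were not represented at the sampling sites.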

Results
Table 1 shows descriptive statistics for LAeq24h for both sampling periods. Valid data were obtained at 87 sites during the summer period and at 62 sites during the winter period. Thirteen and 19 samplers were stolen or malfunctioned during the summer and winter periods, respectively. More samplers malfunctioned during the winter period due to cold temperatures during the night (< -15 °C). At most sites, at least 10 days of continuous measurements were obtained, but at 15 sites during the summer and at 28 sites during the winter, sampling took place over 4 to 9 days. LAeq24h levels ranged between 53.4 and 77.7 dBA across both seasons, with winter mean LAeq24h noise levels being approximately 7 dBA higher than during the summer sampling period. The correlation between summer and winter LAeq24h was 0.74 (n=62).
(Table rows: Sector Center, N = 49; Sector Downtown, N = 2; Sector West, N = 18)
Table 3 presents the results of the summer and winter multivariate GAM models. A higher R² was obtained for the summer model (64%) than for the winter model (40%), and the retained predictors differed between seasons. Regression coefficients for each variable are not interpreted because we did not take the collinearity between variables into account, and because we aimed to predict noise levels and not to assess the impact of each determinant on noise levels.
According to the validation results (LOOCV), the RMSE for the summer period (3.3 dBA) was lower than for the winter (4.5 dBA). LAeq24h values were then estimated with the summer model across the island of Montreal at a 20 m resolution. Overall, noise levels were higher along major roads and in proximity to industrial sectors (in the East-End of Montreal Island and close to the airport in the center-west of the Island) (see Appendix).

Discussion
We developed statistical models to predict summer and winter noise levels (LAeq24h) on the Island of Montreal. Our summer model outperformed our winter model (in terms of R² and RMSE). Common determinants for both winter and summer LAeq24h models were NDVI, land use variables, the length of highways and arteries, the length of bus lines in various buffers around sampling sites, the proximity to the airport noise contours (NEF25), and the sector of the Island.
It is likely that the summer model outperformed the winter model because a higher number of noise level measurements were available. Nonetheless, recent studies in air pollution suggest that a larger number of sites often leads to a lower R² because the larger number of monitoring sites captures more complexity in the combinations of predictors (Basagaña et al., 2012). Other factors may also explain differences between the models for the two seasons. For instance, in Montreal in the winter, there are snowplows and snow cover. In the summer, additional sources of noise include the gathering of people outside given the milder temperatures.
Our results can be compared to similar models developed in other countries. Xie et al. (2011) developed a LUR model to estimate noise levels (Lday for the period from 6h00 to 22h00), using land use and road categories. The R² of their model was not reported, but the authors calculated the percentage of noise measurements predicted with an error of less than 10% (referred to as PAS). Their PAS was 78%; similar figures, calculated with our measured and predicted LAeq24h values, were 99% for summer and 92% for winter.
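The PAS comparison can be made concrete with the short sketch below. It assumes that the error is taken relative to the measured value, which the text does not state explicitly, and the function name is illustrative. Note that for levels between roughly 53 and 78 dBA, a 10% tolerance corresponds to about 5 to 8 dBA, which is wider than the RMSE of 3.3 to 4.5 dBA and helps explain why the PAS values are high.

```python
def pas(measured, predicted, tolerance=0.10):
    """Percentage of sites predicted with a relative error below `tolerance`.

    Assumption: relative error is |predicted - measured| / measured.
    """
    hits = sum(
        1
        for m, p in zip(measured, predicted)
        if abs(p - m) / m < tolerance
    )
    return 100.0 * hits / len(measured)
```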
Our results can also be compared to those of Foraster et al. (2011) in Girona (Spain). Similar to our results, these authors reported that building and road characteristics, daily traffic flows, bus lines and stops, open areas, and the proximity to the river explained a large part (73%) of the variability in noise levels estimated with a numerical model. Finally, Seto et al. (2007) reported that in San Francisco (USA), road categories, hourly traffic flows by vehicle type, and neighborhoods explained 37% of the variability in equivalent hourly noise level. Like Seto et al., we noted that the length of arteries and highways alone explained 26% of the variability in noise levels.
There are a number of limitations inherent to our model. First, summer noise sampling was performed at the end of the holiday season. At that time, traffic flows and associated noise levels might have been lower due to reduced local traffic, which may reduce the external validity of our model and its applicability in other circumstances.
Another limitation of our model is that the samplers were attached close to poles (10 cm). Reflection of sound waves off the surface of the poles to which the samplers were attached is likely (ISO 1996-2). As such, actual LAeq24h levels are likely to be three to six dBA lower than the ones reported in this paper.
Finally, our LAeq24h Land Use Regression models predict noise levels with limited validity, as the RMSEs were large in comparison to the range of measured and estimated noise values. Efforts to improve our models would include increased sampling at a number of sites, attempting to better capture the influence of the airport, railways, and other noise sources poorly represented in our model. Improvements would also be expected with the addition of traffic flows as predictors. Furthermore, adding proximity to noisy commercial zones (e.g., clubs and bars) may also improve on the use of the proportion of industrial, commercial, or residential land use types in various buffers. While noise mapping is not required by any authority in Canada, improvements in methods to map noise levels are nonetheless necessary to orient interventions aiming at reducing annoyance and health risks.

Table 1.
LAeq24h (dBA) for noise sampling periods on the island of Montreal

Table 2 presents the summary statistics of the potential determinants of noise levels at the sampling locations. All associations between LAeq24h and the buffer variables selected for the multivariate model (n=17) were in the expected direction in bivariate models (see Figure in Appendix). The majority of predictors had a linear relationship with noise levels for each season (i.e., positive with the length of highways and arteries, with industrial land use and population density, and with the airport noise curve; negative with the NDVI).

Table 2.
Summary statistics of the potential determinants of noise levels at the sampling locations (n=87)

Table 3.
Results of the GAM model to predict LAeq24h on the island of Montreal