A Look at the Grouping Effect on Population-level Risk Assessment of Radon-Induced Lung Cancer

On the basis of considerable knowledge gained by studying health effects in uranium and other underground miners who worked in radon-rich environments, radon exposure has been identified as a cause of lung cancer. Recent pooled analyses of residential studies have shown that radon poses a similar risk of causing lung cancer in the general public when exposure occurs at the generally lower levels found in homes. With the increasing accessibility of statistical data via the internet, people are performing their own analyses and asking why, in some cases, lung cancer occurrence at the community level does not correlate with radon levels. This study uses statistical data available to the general public from official websites and performs simple analyses. The results clearly show the difficulty in linking observed lung cancer incidence rates at the provincial/territorial level with a possible cause, such as smoking or radon exposure. Even the effect of smoking, a well-documented cause of lung cancer, can be overlooked or misinterpreted if the data being investigated is too general (i.e., summary data at the population level) or is influenced by other factors. These difficulties with simple comparisons are one of the main reasons that epidemiological studies of lung cancer incidence and radon exposure require cohort or case-control designs at the individual level, as opposed to the more easily performed ecological studies at the population level.


Introduction
Epidemiological studies of uranium and other types of miners have shown a strong relationship between lung cancer mortality and radon exposure (National Research Council, 1988, 1999). These studies were typically of the cohort type, meaning the population studied was classified according to past exposure history and followed forward in time to observe the rates of various causes of death. These studies compared the rates of lung cancer in miners who worked in radon-rich environments with those in the general male population. Typically, such studies had good accuracy and included data on radon levels, exposure durations and smoking habits of the individuals in the study, which were factored into the analysis. Historically, the data from the miner studies have been extrapolated to the levels typically found in residential homes and have suggested that a risk exists for residents of some houses as well (National Research Council, 1999; UNSCEAR, 2006). Unfortunately, direct data on residential risk from radon have been quite limited, with most studies comparing lung cancer rates and mean radon exposures in various geographical areas without specific data on individuals. These types of ecological studies suffer from several weaknesses: biases acting within a population group caused by inadequate control of confounding factors, the assignment of group exposure levels to all individuals of the group, the use of crude estimates of population exposure, and biases from the mobility of individuals. All of these make conclusions difficult. Ecological studies are often used because they are easy, quick and relatively inexpensive; however, because their results can be difficult to interpret, they are best used as an indicator of the need for a second, more carefully designed study if strong associations are indicated.
In the field of epidemiology, it is well recognized that ecological or population-level associations are not necessarily consistent with those measured at the individual level. Unlike ecological studies, case-control studies link individual outcome events (i.e., lung cancer incidence in this context) to individual exposure (i.e., radon exposure), major contributing factors (such as smoking) and other covariate histories, with all data detailed at the individual level. Case-control studies have also been conducted in several countries to try to estimate the risk of lung cancer from residential radon exposure; however, in the past none of these studies was large enough to reliably assess the risks. Such studies are often strongly influenced by the use of data from urban areas, where radon concentrations tend to be lower than in rural areas because of the underlying geology and because more people live in apartments, where radon levels tend to be reduced. Greater statistical power is needed to correct for these factors, and it can be achieved by combining the data from several studies. In 2004 and 2005 researchers in Europe and North America conducted independent pooled case-control studies of lung cancer incidence and radon exposure in residential homes (Darby et al., 2005; Krewski et al., 2006). Both pooled studies indicated an increased risk of lung cancer associated with radon exposure at levels found in some homes.
In 2011 Health Canada completed an extensive two-year survey of residential radon levels in houses across the country. This Cross Canada Survey of Radon Concentrations in Homes provided test results for approximately 14,000 houses and identified areas where a higher percentage of homes were expected to be above Canada's National Radon Guideline of 200 Bq/m³. The data from this survey was subsequently used by Health Canada to reassess the number of lung cancer deaths in Canada due to radon exposure. The revised estimate attributes 16% of lung cancer deaths to radon, which corresponds to approximately 3,000 deaths each year, most of them due to the synergistic interaction between radon exposure and smoking (Chen et al., 2012). With such a large number of lung cancers occurring, Health Canada is often asked why regions of the country with high radon levels do not have significantly more lung cancers than other regions. Sceptics of the health risks associated with radon exposure have asked why overall lung cancer occurrence by province does not correlate with radon levels once smoking incidence is taken into consideration. While this might seem a reasonable expectation, the preceding discussion of past epidemiological studies shows that it is a gross over-simplification of the analysis needed to demonstrate a correlation. The effects of radon exposure depend on various factors, such as age at exposure, level and length of exposure, past exposure history, and exposure to smoking, which implies that an analysis cannot simply be made using mean radon exposures and lung cancer rates for a specific area.
To illustrate the risk of over-simplifying the data, in this publication we show how simple comparisons of lung cancer rates with a well-documented cause of lung cancer, such as smoking, as well as with radon concentration, can lead to conflicting conclusions about the risk.

Method
Canada is currently divided into 123 health regions defined by the provincial ministries of health as administrative areas (Statistics Canada, 2011). Health regions were grouped into peer groups in order to effectively compare health regions with similar socio-economic characteristics. Statistics Canada has used a statistical method to determine the peer groups and to assign health regions to them so as to achieve maximum statistical differentiation between health regions. Twenty-four variables were chosen to cover as many of the social and economic determinants of health as possible, using data collected at the health region level, mostly from the Census of Canada. Variables considered include basic demographics (such as population and ethnicity), living conditions (such as housing information and income inequality), and working situation (such as the unemployment rate). There are currently ten peer groups identified by letters A through J, as shown in Figure 1.
In the statistical analysis conducted here, linear regression is used. The R² coefficient is a statistical measure of how well the regression line approximates the real data points. R² = 1 indicates that the regression line perfectly fits the data, while R² = 0 indicates no linear relationship, i.e., we cannot predict one variable from the other with a straight line.
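As a minimal sketch of the R² measure described above (not the software used for this study), the following Python function fits a least-squares line and reports R² as one minus the ratio of the residual to the total sum of squares. The function name linear_fit_r2 and the sample values are illustrative assumptions.

```python
def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept by least squares; return (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / syy


# Made-up points that lie exactly on a line: R^2 is 1 up to rounding.
_, _, r2_perfect = linear_fit_r2([1, 2, 3], [2, 4, 6])

# Made-up points with no linear trend at all: R^2 is 0 up to rounding.
_, _, r2_none = linear_fit_r2([1, 2, 3, 4], [1, -1, -1, 1])

print(r2_perfect, r2_none)
```

For a straight-line fit with a single predictor, this R² equals the square of the Pearson correlation coefficient, so it says nothing about the direction of the relationship; the sign of the slope has to be inspected separately.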

Results and Discussion
Summary statistics of lung cancer incidence, the percentage of current daily smokers in the population and the percentage of homes above 200 Bq/m³ are given in Table 1 for the 10 provinces and 3 territories in Canada. Even though an enormous body of scientific evidence clearly documents that cigarette smoking is the major cause of lung cancer (IARC, 2004), provinces/territories with higher lung cancer incidence rates, such as Quebec, Prince Edward Island, Nova Scotia and New Brunswick, do not necessarily have higher smoking rates, as shown in Table 1. This makes it clear that many factors must contribute to the development of lung cancer. Because this type of simple tabular comparison can be deceiving, a graphical view of lung cancer incidence in relation to the percentage of current daily smokers is presented in Figure 2. At first sight there is a strong relationship between the lung cancer incidence rate and the smoking rate, with R² = 0.86. However, this strong relationship is dominated by the data from Nunavut. If the data from Nunavut is excluded as an outlier, a correlation (R² = 0.17) can still be seen between lung cancer and smoking, albeit a significantly weaker one.
Compared to the effect of smoking, we know exposure to radon is a minor cause of lung cancer (the current estimate is that 16% of all lung cancers are attributable to radon). As shown in Figure 3, a province/territory with a higher lung cancer incidence does not necessarily have more homes above 200 Bq/m³. A negative association (R² = 0.14) is observed between cancer incidence and radon exposure. This negative association is dominated by the data from Nunavut. However, Nunavut is a unique territory in that many homes are built on stilts because of the permafrost. This architectural factor means many homes in Nunavut will not suffer from infiltration of any radon that is able to find its way to the surface of the earth. For this reason, data from Nunavut is not necessarily comparable to that from the rest of Canada. If we treat the data from Nunavut as an outlier, a very weak but positive relationship (R² = 0.04) can be seen between lung cancer and radon exposure.
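The way a single extreme point can dominate R², and even reverse the sign of the fitted slope, can be illustrated with a small Python sketch. The numbers below are invented for illustration only; they are not the Table 1 values, and the variable names are hypothetical.

```python
def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept by least squares; return (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / syy


# Hypothetical regional summaries (NOT the data in Table 1):
# six regions with similar smoking rates and scattered incidence rates ...
smoking_pct = [15, 16, 17, 18, 19, 20]
incidence = [60, 50, 65, 55, 62, 52]
# ... plus one region that is extreme on both axes.
outlier_smoking, outlier_incidence = 50, 200

# Fit without the extreme region.
slope_trimmed, _, r2_trimmed = linear_fit_r2(smoking_pct, incidence)

# Fit with the extreme region included.
slope_all, _, r2_all = linear_fit_r2(
    smoking_pct + [outlier_smoking], incidence + [outlier_incidence]
)

print(f"all regions:      slope = {slope_all:+.2f}, R^2 = {r2_all:.2f}")
print(f"outlier excluded: slope = {slope_trimmed:+.2f}, R^2 = {r2_trimmed:.2f}")
```

With these made-up values, the fit including the extreme region has an R² above 0.9 and a positive slope, while the fit without it has an R² below 0.1 and a slightly negative slope. With only thirteen provinces and territories, any regression at this level is similarly sensitive to one unusual region.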

Conclusions
While the primary interest is in the effect of radon exposure at the individual level, the information most often available for analysis is population-level summary data, which, if used without an understanding of its limitations, can produce inaccurate conclusions. The most significant shortcoming of ecological studies is that regional average exposure levels are assigned to all individuals of a population group, while the average risk determined for the population does not correlate well with the average exposure. Compounding this issue, the magnitude and direction of the ecological bias can vary depending on the population level selected or on how individuals are grouped. These biases often cannot be eliminated by adding more data at the population level; therefore, population-level analyses or ecological studies should be reserved primarily for hypothesis generation. Results from the current study clearly show the difficulty in linking observed lung cancer incidence rates at the provincial/territorial level with a cause, such as smoking or radon exposure. Even the influence of smoking, a well-documented cause of lung cancer, can be overlooked or misinterpreted if the data being investigated is too general or too crude (i.e., summary data at the provincial or regional level) or is influenced by other factors that cannot be effectively controlled with data summarised at the population level. These difficulties with simple population-level comparisons are one of the main reasons that epidemiological studies of lung cancer incidence and radon exposure require cohort or case-control designs at the individual level, as opposed to the more easily performed ecological studies at the population level. Although also subject to some biases, it is individual-level analyses, i.e., cohort or case-control epidemiological studies, that should be used to test a hypothesis.
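The grouping effect can be sketched with a toy Python example. All values are invented and the region labels are hypothetical. Within each region, individual risk rises with exposure, but the regions differ in an unmeasured baseline (for instance smoking habits), so a regression on the regional averages points in the opposite direction.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical individual-level data: (exposures, risk scores) per region.
# Risk increases by one unit per unit of exposure inside every region,
# but the baseline risk is lower in regions with higher exposure.
regions = {
    "A": ([1, 2, 3], [10, 11, 12]),
    "B": ([3, 4, 5], [7, 8, 9]),
    "C": ([5, 6, 7], [4, 5, 6]),
}

# Individual-level view: slope of risk on exposure inside each region.
within_slopes = {name: slope(x, y) for name, (x, y) in regions.items()}

# Ecological view: one (mean exposure, mean risk) point per region.
mean_exposure = [sum(x) / len(x) for x, _ in regions.values()]
mean_risk = [sum(y) / len(y) for _, y in regions.values()]
ecological_slope = slope(mean_exposure, mean_risk)

print("within-region slopes:", within_slopes)
print("slope of regional means:", ecological_slope)
```

In this constructed case every within-region slope is positive while the slope across regional means is negative, which is the kind of reversal that population-level summaries cannot rule out.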
In the case of environmental radon exposure, which involves low doses and is therefore subject to large statistical uncertainty, even a case-control study at the individual level suffers from problems and requires a very large group of individuals in order to observe a statistically significant number of effects.