Nonparallel Regressions with Indicator Variables

A multiple linear regression model that includes 0/1 variables to indicate membership in a group is a convenient way to model parallel regression surfaces. Building upon this, an extended model that also includes predictor variables formed as the product of the indicator and the other predictors coincides, coefficient for coefficient, with separate regressions fitted to each group using only the other predictors. The algebraic basis for this coincidence is demonstrated. Least squares estimation is presumed. Examples with two groups and with four groups are presented.


Introduction
Most courses on linear regression include a discussion of using indicator, or dummy, variables to allow for parallel regression surfaces for different levels of a factor variable. Typical examples may be found in section 4.2.3 of Fox and Weisberg [2011, pp. 157-169]; section 2.3.1 of Harrell [2001, pp. 14-16]; Chapter 11 of Neter et al. [1996]. As a special case, the linear regression model with a single continuous predictor variable combined with a single binary indicator variable portrays a model comprising parallel lines. Weisberg [2005, § 6.2.2] and Christensen et al. [2011, § 7.4] both present a more general treatment within the linear model family.
When a third variable consisting of the product of the indicator with the continuous predictor is included, nonparallel regression surfaces are portrayed. It is not widely appreciated that in this case the least squares coefficient estimates are identical to those obtained by fitting each of the two groups separately. This note explicates that result.

Method
Consider the simple linear regression (SLR) model

Y = X_s β_s + ϵ,   (1)

where both Y and ϵ are n-element vectors; the errors are iid with mean zero; X_s = (X_0, X_1); and β_s = (β_s0, β_s1)′. Here, X_s comprises the unit vector X_0 and an auxiliary variate vector X_1, evidently both of length n.
For this model the least squares estimator of β_s is

β̂_s = (X′_s X_s)⁻¹ X′_s Y.   (2)

Suppose that the n observations are split between two groups, with n_1 from Group I while the remaining n_2 = n − n_1 hail from Group II. Further suppose that we set up an indicator variable to distinguish group membership: I_i = 1 if observation i belongs to Group I, and I_i = 0 otherwise. Further suppose the data are arrayed such that X_2 = (I_1, I_2, . . . , I_n)′ and X_3 = (I_1 x_11, I_2 x_12, . . . , I_n x_1n)′.
www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 2; 2015

Consider the multiple linear regression (MLR) model

Y = X_m β_m + ϵ,   (3)

where X_m = (X_0, X_1, X_2, X_3) and β_m = (β_m0, β_m1, β_m2, β_m3)′. The least squares estimator of β_m is

β̂_m = (X′_m X_m)⁻¹ X′_m Y.   (4)

It is commonly known that a regression with only X_0, X_1 and X_2 will portray parallel lines. The vertical distance between the fitted lines is given by β̂_m2.
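The parallel-lines property is easy to check numerically. The following sketch (Python with numpy and synthetic data, not the R code used later in this note) regresses on X_0, X_1, and the indicator X_2 alone; the model can only produce a single common slope, and the indicator coefficient is the vertical distance between the two fitted lines:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two groups with a common slope but different intercepts.
n1, n2 = 14, 12
x1 = rng.uniform(0, 10, n1 + n2)            # continuous predictor X1
ind = np.r_[np.ones(n1), np.zeros(n2)]      # indicator X2: 1 = Group I
y = 5.0 + 2.0 * x1 - 3.0 * ind + rng.normal(0, 0.1, n1 + n2)

# Design matrix (X0, X1, X2): unit vector, predictor, indicator.
X = np.column_stack([np.ones_like(x1), x1, ind])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[1] is the common slope shared by both groups; b[2] is the vertical
# distance between the parallel lines (Group I line minus Group II line).
```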
For the MLR model given by (3), it is less well known that the least squares estimates of the coefficients are identical to those that obtain when SLR models for each group are fitted separately. The analytical basis for this assertion is demonstrated in the following section.

Nonparallel Regressions
Without loss of generality, assume that the data are arranged so that the first n_1 rows are the Group I observations. Therefore

X_2 = (1′_{n_1}, 0′_{n_2})′,   (5)

where 1_{n_1} comprises the first n_1 elements of X_0, and 0_{n_2} is an n_2-element null vector. Likewise, let

X_3 = (x_11, x_12, . . . , x_{1 n_1}, 0′_{n_2})′.   (6)

Evidently, X_s is a submatrix of X_m. Less evident is the relationship between X′_s X_s and X′_m X_m:

X′_m X_m = ( X′_s X_s   X′_I X_I )
           ( X′_I X_I   X′_I X_I ),   (7)

where X_I = (X_2, X_3), so that X_m = (X_s, X_I). Define X_II = X_s − X_I. Because the nonzero rows of X_I and X_II do not overlap, X′_I X_II = 0 and hence X′_s X_s = X′_I X_I + X′_II X_II. From results on the inverse of a partitioned matrix, e.g. Searle [1966, p. 210], we know that when D and P = A − B D⁻¹ B′ are nonsingular,

( A   B )⁻¹   ( P⁻¹           −P⁻¹ B D⁻¹               )
( B′  D )   = ( −D⁻¹ B′ P⁻¹   D⁻¹ + D⁻¹ B′ P⁻¹ B D⁻¹ ).

In the present context, the partitioned matrix of interest is X′_m X_m as presented in (7), with A = X′_s X_s and B = B′ = D = X′_I X_I. By substitution, we get P = X′_s X_s − X′_I X_I = X′_II X_II, so that

(X′_m X_m)⁻¹ = ( (X′_II X_II)⁻¹    −(X′_II X_II)⁻¹                  )
               ( −(X′_II X_II)⁻¹   (X′_I X_I)⁻¹ + (X′_II X_II)⁻¹ ).

Also, since X_s = X_I + X_II,

X′_m Y = ( X′_I Y + X′_II Y )
         ( X′_I Y           ).

Therefore,

β̂_m = (X′_m X_m)⁻¹ X′_m Y = ( (X′_II X_II)⁻¹ X′_II Y                        ) = ( β̂_II        )
                             ( (X′_I X_I)⁻¹ X′_I Y − (X′_II X_II)⁻¹ X′_II Y )   ( β̂_I − β̂_II ),

where β̂_I and β̂_II are the coefficient estimators for the SLR fitted separately to the data from Group I and Group II, respectively. (Because the Group II rows of X_I are null, (X′_I X_I)⁻¹ X′_I Y is precisely the SLR estimator computed from the Group I data alone; similarly for X_II and Group II.) In other words, β̂_m0 and β̂_m1 in (4) coincide with the intercept and slope estimates of β_s that one would obtain from the Group II data alone, whereas β̂_m2 and β̂_m3 are the difference between the estimates derived from the Group I data alone and those obtained from using the Group II data alone.
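The coincidence just demonstrated can be verified numerically. The following sketch (Python with numpy and synthetic data; the paper's own fits use R) fits the full MLR by least squares and compares the result with the two separate SLR fits:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two groups with different intercepts AND different slopes.
n1, n2 = 14, 12
x1 = rng.uniform(0, 10, n1 + n2)
ind = np.r_[np.ones(n1), np.zeros(n2)]      # 1 = Group I, first n1 rows
y = np.where(ind == 1, 1.0 + 3.0 * x1, 4.0 + 0.5 * x1) + rng.normal(0, 1, n1 + n2)

# Full design X_m = (X0, X1, X2, X3), with X3 the indicator-predictor product.
Xm = np.column_stack([np.ones_like(x1), x1, ind, ind * x1])
bm, *_ = np.linalg.lstsq(Xm, y, rcond=None)

# Separate SLR fits for each group.
XI  = np.column_stack([np.ones(n1), x1[:n1]])
XII = np.column_stack([np.ones(n2), x1[n1:]])
bI,  *_ = np.linalg.lstsq(XI,  y[:n1], rcond=None)
bII, *_ = np.linalg.lstsq(XII, y[n1:], rcond=None)

# (bm0, bm1) reproduce the Group II fit; (bm2, bm3) are the Group I
# coefficients minus the Group II coefficients.
assert np.allclose(bm[:2], bII)
assert np.allclose(bm[2:], bI - bII)
```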
The fitted value corresponding to an observation in Group I, say observation i = 1, is

Ŷ_1 = (β̂_m0 + β̂_m2) + (β̂_m1 + β̂_m3) x_11.

Likewise, the fitted value corresponding to an observation in Group II, say observation i = n, is

Ŷ_n = β̂_m0 + β̂_m1 x_1n.

In the next section we exemplify these results using data presented by Robinson [2000] on k = 2 groups of rats subjected to a diet treatment. Identical results hold when there are more than two groups, as shown by our second example, in which group membership in one of k = 4 species groups is indicated.
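Before turning to the data examples, the fitted-value expressions can also be confirmed numerically (again a Python/numpy sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(2)

n1, n2 = 8, 8
x1 = rng.uniform(1, 5, n1 + n2)
ind = np.r_[np.ones(n1), np.zeros(n2)]      # 1 = Group I
y = 2.0 + 1.5 * x1 - 1.0 * ind + 0.4 * ind * x1 + rng.normal(0, 0.3, n1 + n2)

Xm = np.column_stack([np.ones_like(x1), x1, ind, ind * x1])
b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
yhat = Xm @ b

# Group I (indicator = 1): intercept b0 + b2 and slope b1 + b3.
assert np.allclose(yhat[:n1], (b[0] + b[2]) + (b[1] + b[3]) * x1[:n1])
# Group II (indicator = 0): intercept b0 and slope b1.
assert np.allclose(yhat[n1:], b[0] + b[1] * x1[n1:])
```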

Example 1
We present results from fitting separate, parallel, and nonparallel regressions to weight data on two groups of rats, one of which was subjected to a dietary treatment. The treated group included n 1 = 14 rats, while the control group included n 2 = 12 rats. The response variable of interest, Y, is the final weight of the rats, with implicit interest in whether the diet treatment successfully lowered final weight. Data were initially presented by Robinson [2000].
All models were fitted with packages and functions available in the R computing environment [R Core Team, 2012], particularly the gls function of the nlme package [Pinheiro et al., 2009].
Figure 1a displays a joint fit of these data using a simple linear regression with initial weight as the lone predictor. Even a cursory examination of this display reveals the inadequacy of the fitted model: the scatterplot itself strongly suggests that the treatment lowered final weight on average, yet the joint SLR fails to capture this noticeable effect, as might be expected a priori. Figure 1b displays the separate SLR fits of these data. Clearly the effect of treatment is more adequately depicted, yet this comes at the expense of a reduced sample size, n_1 or n_2 for each group, rather than n for both together.
When the effect of initial weight on final weight is expected to be invariant to treatment, a model for parallel regressions may be preferred. In the present context, such a model is given by

Y = β_0 X_0 + β_1 X_1 + β_2 X_2 + ϵ,

where Y is final weight, X_1 is initial weight, and X_2 is the vector of indicator values, with X_2i = 1 indicating that the ith rat is from the treated group (Group I). The fitted regressions are shown in Figure 1c. One advantage of fitting parallel regressions is that the full sample is available for fitting the model. Another is that it allows a common estimate of residual error for both groups, which may be not only apt but an important consideration for modeling.
Lastly we fitted model (3) to allow for differing intercepts and slopes yet avoiding the need to subset the data into separate groups for purposes of fitting the model. The graphical display of the fitted model is identical to that shown in Figure 1b.
Parameter estimates and the AIC summary measure of the adequacy of fit of these competing models are displayed in Table 1. (The default REML method of estimation in the gls() function of the nlme package in R was used throughout.) As demonstrated in the previous section, β̂_0 and β̂_1 for the jointly fitted, nonparallel model are identical to β̂_0 and β̂_1 for the control group (Group II) fitted separately. Moreover, β̂_2 = −57.2 coincides with β̂_0 = 14.9 for the treated group fitted separately less β̂_0 = 72.1 for the control group fitted separately. Likewise, β̂_3 = 0.173 coincides with β̂_1 = 0.700 for the treated group fitted separately less β̂_1 = 0.527 for the control group fitted separately.
I often refer to β_2 as an "offset" to the intercept of the omitted group's regression, where the "omitted group" (Christensen et al. [2011] use the term reference category) denotes the group that was not assigned an indicator value of unity. Similarly, β_3 can be regarded as an offset to the slope of the omitted group's regression. In the context of the present example, the offsets are a measure of treatment effect.
Despite the appeal of a common estimate of residual error for both groups, the data may suggest that it is not apt. The magnitude of the difference in σ̂ between the control and treated groups belies the presumption of a common error variance. Therefore we refitted both the parallel and nonparallel models to permit the error variance to be estimated separately for each group. As may be surmised from the results displayed in Table 1, the latter model provides results, not merely coefficient estimates, that are identical in all respects to fitting models separately to both groups' data.
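One point worth illustrating (a hypothetical Python/numpy sketch with synthetic data, rather than the gls() fits used above): because the nonparallel model fits each group's line freely, reweighting observations by group-specific variances leaves the coefficient estimates unchanged; it is the variance estimates, standard errors, and likelihood-based summaries such as AIC that differ.

```python
import numpy as np

rng = np.random.default_rng(3)

n1, n2 = 14, 12
x1 = rng.uniform(0, 10, n1 + n2)
ind = np.r_[np.ones(n1), np.zeros(n2)]
# Group I has a much larger error standard deviation than Group II.
sigma = np.where(ind == 1, 3.0, 0.5)
y = 1.0 + 2.0 * x1 - 1.0 * ind + 0.5 * ind * x1 + rng.normal(0, sigma)

Xm = np.column_stack([np.ones_like(x1), x1, ind, ind * x1])

# Ordinary least squares (implicit common error variance).
b_ols, *_ = np.linalg.lstsq(Xm, y, rcond=None)

# Weighted least squares with group-specific variances: weight = 1/sigma^2,
# applied by scaling each row of the design and response by sqrt(weight).
w = 1.0 / sigma**2
b_wls, *_ = np.linalg.lstsq(Xm * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)

# The interaction model's column space decomposes by group, so per-group
# reweighting cannot move the coefficient estimates.
assert np.allclose(b_ols, b_wls)
```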

Example 2
The data for this example comprise 1013 measurements of foliar biomass among three coniferous and one deciduous species of forest trees. Foliar biomass is modeled well by a simple linear regression using the cylindrical volume of the tree's bole (the product of its cross-sectional, or basal, area and its height) as the predictor, X_1. Figure 2 displays these data separately by species. The joint SLR fitted to all 1013 observations appears as the solid line in each panel, whereas the individual fit for each species alone appears as the superimposed dashed line. Summary statistics of these five fitted models appear in the uppermost five rows of Table 2.
We set up X_2, X_3, and X_4 as indicator variables for the three coniferous species (balsam fir, black spruce, and white spruce), respectively, and fitted parallel SLRs jointly, with the results shown in Figure 3 and tabulated in Table 2. Judging by its lower AIC value, the parallel regressions model is superior to the joint SLR model fitted to all 1013 observations. By setting up X_5, X_6, and X_7 as the product of X_1 and each of the three indicator variables, we fitted a nonparallel regressions model, both with common and separate error variances. The coefficient estimates for both were identical. These results appear in the lowest two lines of Table 2. Because white birch constituted the omitted group in the indicator variable coding scheme, β̂_0 and β̂_1 are identical to the coefficient estimates for white birch fitted alone. In contrast, β̂_0 + β̂_2 yields a value of 2.73, which is the intercept estimate for the balsam fir regression fitted separately. Likewise, β̂_1 + β̂_5 evaluates to 30.9, which coincides with the estimated slope of the balsam fir regression. In a similar fashion the coefficient estimates for white and black spruce may be recovered from the estimates for the nonparallel regressions model.
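The same recovery of group-specific coefficients works for any number of groups. The following sketch (Python/numpy with synthetic data and hypothetical coefficients standing in for the k = 4 species) checks that every group's intercept and slope can be recovered from the joint nonparallel fit:

```python
import numpy as np

rng = np.random.default_rng(4)

k, n_per = 4, 30
n = k * n_per
group = np.repeat(np.arange(k), n_per)        # 0 plays the omitted-group role
x1 = rng.uniform(0.1, 2.0, n)
true_b0 = np.array([1.0, 2.7, 0.8, 1.5])      # per-group intercepts (hypothetical)
true_b1 = np.array([20.0, 31.0, 25.0, 18.0])  # per-group slopes (hypothetical)
y = true_b0[group] + true_b1[group] * x1 + rng.normal(0, 0.2, n)

# Indicator columns X2..X4 for groups 1..3, and products X5..X7 with X1.
Z = (group[:, None] == np.arange(1, k)).astype(float)   # n x 3 indicator block
Xm = np.column_stack([np.ones(n), x1, Z, Z * x1[:, None]])
b, *_ = np.linalg.lstsq(Xm, y, rcond=None)

# The omitted group's coefficients are b[0], b[1]; every other group's
# intercept and slope are b[0] + (its indicator coefficient) and
# b[1] + (its product coefficient), matching its separate SLR fit.
for j in range(1, k):
    m = group == j
    Xg = np.column_stack([np.ones(m.sum()), x1[m]])
    bg, *_ = np.linalg.lstsq(Xg, y[m], rcond=None)
    assert np.allclose(b[0] + b[1 + j], bg[0])   # intercept offset
    assert np.allclose(b[1] + b[4 + j], bg[1])   # slope offset
```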
In Figures 2 and 3, there is a strong suggestion of increasing error variance with increasing tree size. We therefore re-fitted all the models summarized in Table 2 with analogous models specifying V[ϵ] = σ²D^{2δ}, where D is the measured diameter of the tree bole. REML estimates of both σ and δ were obtained; these are shown in Table 3, which distinguishes fits with common δ and σ parameters for all species from fits with separate δ and σ parameters for each species. In this specification, δ is the parameter that models the heteroscedasticity, whereas σ has units of measure kg/m³ and acts as a scaling factor for the error variance. As is partly evident from the last line of results displayed in Table 3, not only the coefficient estimates but all other results (e.g. AIC, δ̂, σ̂) from fitting the nonparallel regressions again coincide with those from fitting the models separately.
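That the equivalence persists under this variance specification can be sketched numerically. The following Python/numpy illustration uses synthetic data and holds δ fixed rather than estimating it by REML as gls() does; all variable names are illustrative. It applies generalized least squares with V[ϵ] ∝ D^{2δ} and confirms that the joint nonparallel fit still reproduces the separate fits:

```python
import numpy as np

rng = np.random.default_rng(5)

n1, n2 = 40, 40
n = n1 + n2
x1 = rng.uniform(0.1, 2.0, n)                # volume-like predictor
D = rng.uniform(5, 40, n)                    # bole diameter, drives the variance
ind = np.r_[np.ones(n1), np.zeros(n2)]

delta, scale = 0.8, 0.05                     # power-variance parameters (fixed)
y = 1.0 + 25.0 * x1 - 0.5 * ind + 5.0 * ind * x1 + rng.normal(0, scale * D**delta)

# GLS with V[eps] = sigma^2 * D^(2*delta): scale each row by 1/D^delta.
w = 1.0 / D**delta
Xm = np.column_stack([np.ones(n), x1, ind, ind * x1])
bm, *_ = np.linalg.lstsq(Xm * w[:, None], y * w, rcond=None)

def wfit(rows):
    """Separate weighted SLR fit on the selected group's rows."""
    Xg = np.column_stack([np.ones(rows.sum()), x1[rows]]) * w[rows][:, None]
    bg, *_ = np.linalg.lstsq(Xg, (y * w)[rows], rcond=None)
    return bg

bI, bII = wfit(ind == 1), wfit(ind == 0)

# The partitioned-matrix algebra only used the block structure of the
# design, so it survives any diagonal reweighting of the rows.
assert np.allclose(bm[:2], bII)
assert np.allclose(bm[2:], bI - bII)
```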

Discussion
The equivalence of nonparallel regression models that incorporate indicator variables to the set of separate regression models for each group is evident in results displayed long ago in Gujarati [1970]. Inasmuch as the emphasis of that article was on testing sets of coefficients, the equivalence went unremarked and unnoticed. The present note was prompted by a realization that this equivalence is not generally appreciated by instructors or their students.
The equivalence of the coefficient estimates is retained even when modeling variance heterogeneity with a parametric function, as we did with the biomass data. This issue is quite apart from the matter of whether the error variance is parameterized commonly for all groups, separately for each group, or separately for subsets of groups. Indeed, for the equivalence to hold in all respects, both variance parameters (σ and δ) must be modeled separately by group.