GLM for Some Class of Com-Poisson Distributions with Applications

In this paper, we present regression models (GLM) for the class of Conway-Maxwell-Poisson (Com-Poisson) distributions. This class of models include the Com-Poisson, the Com-Poisson negative binomial, the Generalized Com-Poisson and the Extended Com-Poisson distributions, all of which have been presented in various literatures within the last five years. While these distributions have been applied most especially to frequency count data exhibiting over or under dispersion, not much has been presented in the application of this class of models to data having several covariates (the exception being the Com-Poisson itself). Thus in this paper, we present the generalized linear model formulation for these distributions and compare our results with the baseline Com-Poisson and Poisson models. Two data sets are employed in this application. We further extended our discussion to the zero-inflated versions of these distributions and applying same to a well established data with having 64% zero observations. All the models are fitted using SAS PROC NLMIXED. In all cases, empirical means and variances are generated which leads to our ability to compute the Wald’s goodness-of-fit test statistic for all the models employed in this paper.


Introduction
Most often, discussions are based on fitting distributions for over-dispersed data.Not much is focussed on under-dispersed count data.The Com-Poisson distribution (Shmueli et al.) however can be used to model both under and over dispersed count data.Consequently, we have applied the Com-Poisson model to three data sets having covariates in this study.Thus, our goal here is to fit the generalized linear model versions to the class of Com-Poisson to these data sets.We note here that the Com-Poisson distribution and its generalized linear model (GLM) has been explored and found very flexible in handling count data.See results in Lord et al. (2008), Sellers et al. (2012), and Francis et al. (2012) amongst several others.However, not much has been extended to the various extensions of the Com-Poisson distribution presented in Imoto (2014), Chakraborty & Ong (2014) and Chakraborty & Imoto (2016).We present in the next sections brief discussions of this class of Com-Poisson distributions.
For the data employed in this study however, the first data set is extracted from the 2013 Nigerian Demographic Health Survey (NDHS).The data comprises a subset of 3980 observations where the response variable is the number of children alive and using as an offset, the total number of children given birth to by women respondents in the six south-west states of Nigeria.The data is under-dispersed.The second data set is the 2003 U.S. Medical Expenditure Panel Survey (MEPS) data set relating to the number of doctor visits (Y=docvis) in 2003 for a number of elderly patients as well as several other covariates relating to patients' characteristics.This data set is over-dispersed.
Our third data set is the example which examines how waste quotas (emps) and the strictness of policy implementation (strict) affect the frequency of waste spill accidents of plants (accident) in Australia.Our focus is on employing the zero-inflated versions of these distributions to this data set that has excess zeros.We present here a brief introductory discussions on the Com-Poisson distributions and its extensions.These are models that are subsequently applied to the three data sets described above.

The Com-Poisson Distribution
The Com-Poisson (here referred to as COMP) distribution, first introduced by Conway and Maxwell (1962) and which Shmueli et al. (2005) re-introduced has seen a lot of attention in recent times.The distribution is defined for a random variable Y as: Where (2) is the normalizing term and ν is the dispersion parameter such that if (ν > 1) we have under dispersion, and when (ν < 1), we have over dispersion.The distribution reduces to the Poisson, Geometric and Bernoulli as ν = 1, ν = 0 and ν → ∞ distributions respectively.The mean and variance of Y can be obtained respectively from: (3) However, the mean and variance of the Com-P distribution do not have closed form expressions, and, consequently, Shmueli et al. (2005) provided approximate mean and variance of the distribution as:

Class of COM-Poisson Distributions
Chakraborty & Iyamote (2016) introduced the distributions presented in the following sections, their properties and applications to both frequency distributed data,including zero-truncated case.

Com-Poisson NB-COMNB
The COM-Poisson Negative Binomial distribution Chakraborty & Ong, (2014) has the pdf in (5) with parameters (ν, p, α): where and the distribution is defined in the parameter space Imoto (2014) proposed the generalized Com-Poisson distribution-GCOM with parameters (ν, p, β) and has the pdf: where With the distribution is defined in the parameter space

The Extended COM-Poisson (ECOMP) Distribution
The pmf of a random variable Y having the extended COM-Poisson distribution with parameters (ν, α, , β) is given by: where The distribution is defined in the parameter space 2014) have discussed the properties of these distributions in detail, and we would thus not focus on these here.

Com-Poisson Type Regression Formulation
We present in this section the application of these distributions in regression situations where we have several covariates.This regression approach is often described as generalized linear Models (GLM regressions.In all cases, we would assume a log link between the parameter, λ, µ or p and the linear predictor (x ′ β).For the COMP (basic) model, is modeled in two ways as: The formulation in (8a) is designated here as model COMP and models λ and the linear predictor x ′ β.The formulation in (8b) however is based on Guikema and Coffel (2008) alternative parameterization of the COM-Poisson regression model using µ = λ 1/ν , the approximate mean of the distribution.This approach leads to the expression in (1) now becoming: where, S (µ, ν) ) ν .This model will be designated here as COM µ and has its mean and variance defined as: and Var(Y) = µ/ν.Models COMNB, GCOMP and ECOMP are each modeled with the following GLM formulation:

Estimation
In all cases, MLE of the above models and those in other sections are carried out with PROC NLMIXED in SAS, which minimizes the function −LL(y, Θ) over the parameter space Θ numerically.The integral approximations in PROC NLMIXED is the Adaptive Gaussian Quadrature, Pinheiro & Bates (1995) and the Newton-Raphson optimization algorithm in PROC NLMIXED (NEWRAP) are employed.

Application-Example I
The data for this example comes from the 2013 Nigeria Demographic Health Survey (NDHS).This is a nationwide survey covering the six geographical zones of the country.For our analysis here, we have selected those for zone 6-South West The covariates are: • age in years • age1-age at first child birth • term-any previous still birth or miscarriage (1 for yes, 0 if no) • lch=number of living children (response variable) • tch-total number of children given birth to.

Results
The results of applying these model to the NDHS data are presented in 9002 on 3976 d.f., giving a dispersion parameter of 0.1081 << 1 clearly indicating under-dispersion in the data.We note that for the Poisson, mi = σ2 i .The negative-binomial (NB) model will not be suitable in this case as it often leads to non-convergence for under-dispersed data.Under the circumstance therefore, the Com-Poisson (two parameterization approaches) and its extended forms (COMPNB, GCOMP and ECOMP) models discussed earlier are implemented on this data set.
Based on the -2 log-likelihood (-2LL) and the Akaike Information Criteria (AIC), the most parsimonious model would be the Guikema and Coffel (2008) parameterized model COM µ with the lowest -2LL and consequently lowest AIC of 7841.6.However, this model produces a very high Wald's goodness-of-fit test statistic of 3568.7497 on 3975 d.f.This GOF value is very much on the high side when compared with those of the other Com-Poisson based models in Table 1.The Wald's test statistics computed for the other models are such that mi and σ2 i are computed using similar expressions for the means and variances in (3) for the COMP model.Our approach here, for these computations agree with those employed in SAS PROCs GLIMMIX, GENMOD and HPFMM.To further test the validity of our approach in computing these moments from expressions in (3), we have similarly, for the Com-Poisson distribution, computed the approximate means and variances given by expressions in (4) respectively.Results in Table 2 gives the results of this comparison.Only results for the first five and last five observations are presented.
In actuality, the summation converges in the region 200 ≤ j ≤ 1000.Thus, the mean is obtained as (sumb/suma) and the variance similarly as (sumc/suma)-(mean*mean).Similarly, the approximate means and variances (m1 & v1) are obtained using expressions in (4).The corresponding Wald test statistic under the latter is 2099.3164for the Com-Poisson (λ parameterization based) model.We observe here that the means and variances obtained from either expressions in (3) or (4) are very close, indicating that the procedure employed in the former (which is adopted here across all models) is validated.
Although the COM µ has the lowest -2LL and AIC fit statistics, its Wald's GOF is unusually high.This is due to fact that it generally underestimates the expected variances.For Instance, we present in Table 3, the estimated mean and variances under this model using expressions in (3).We see that the procedure gives estimated means that are very close to the estimated mean μ in column 2, however, the variances are all very small compared with those in Table 2.For all these models, the effects of age, age at first birth (age1) and previous miscarriage or still birth (term) are all significant.Thus, for ten years increase in age (keeping all other variables constant), the expected number of children alive is exp(10 × 0.0538) = 1.71,or an average of 1 additional child surviving.

GLM with Variable Dispersion Parameters
For all the models applied above to the NDHS data, we have assumed that the dispersion parameters (e.g.ν in the COMP model) are constant and do not depend on the explanatory variables that may or may not necessarily belong to the list of covariates in the main model as for example in model (10).Many often times, the dispersion parameters themselves can be function of these regressions which makes the assumption of constant dispersion parameter untenable.Consequently, in this section, we will model the dispersion parameters ν in the following form: where {a 0 , a 1 , a 2 , a 3 } are additional parameters to be estimated from the data.Further, all or some of these additional parameters may or may not be significant at say, α = 0.05.The results of these applications to our models are presented in Table 4.We note that we did not implement a variable dispersion parameter for the ECOMP model because results from Table 1 indicate that the dispersion parameter is not significant for this model, its estimate being ν ≈ 0.00.However, for all the other models, Table 4 gives the results of this application.Clearly, not all covariates are necessarily significant in the dispersion formulation.For instance for the COMP model, only the explanatory variable age is significant while for the COM µ model, all the explanatory variables are significant.All the significant covariates are asterisk-ed in Table 4.
These are clearly contrasting results.Ideally therefore,it would make sense to remove all non-significant covariates in the dispersion GLM formulation.Thus for the COMP model for instance, the dispersion model should be log(ν) = a 0 +a 1 age.Again here the COM µ model gives the lowest -2LL and AIC but much higher Wald's GOF test value.The GCOMP model provides a much better fit with the dispersion parameter modeled as log(ν) = a 0 + a 1 age + a 2 age1 (a 3 omitted).The results for this reduced model when implemented is presented in figure 1.

Example II
Our second example data here is the U.S. Medical Expenditure Panel Survey (MEPS) data set relating to the number of doctor visits (Y=docvis) in 2003 for a number of elderly patients as well as several other covariates relating to patients' characteristics.The covariates are: • private insurance coverage (supplemental to Medicare) (0,1) • medicaid-eligibility for low income Medicaid coverage (0,1) • female-gender of patients (1 if female, 0 if male) • actlim-limitation of activity (0,1) • totchr-number of chronic conditions • phylim-physical limitation (0,1) • educyr-number of years of educational attainment.
We present the first and last five observations for this data set (n = 3677).
Obs docvis female phylim private medicaid educyr actlim totchr --------------------------------------------------------------------------3671 5 We also created the interaction term fem edu of female and educyr.The baseline Poisson regression model has Pearson's X 2 = 22930.3628on 3668 d.f., giving a dispersion parameter of 6.2515, which clearly indicates very strong overdispersion, considering the size of the data.In Table 5 are the estimated parameters, together with their standard errors for the distributions considered here.f.We observe here the considerable reductions in the AIC and Wald's test statistic for the Com-Poisson models and its extensions.For all these models, the effect of medicaid is not significant.actlim-activity limitation is barely significant in the COMP models but not in the extended models.The interaction term between female and education years is also significant.Thus, for a ten year increase in education (keeping all other variables constant), the expected number of doctors' visits is exp(10 × −0.0052) = exp(−0.052)= 0.949,or 5.1% reduction in men visits to doctors.The corresponding value for females would be exp{10 × (0.0494 − 0.0052)} = 1.045.Thus, females expected number of visits will increase by 4.5%.These are based on adopting the COMP model as the most parsimonious.

Variable Dispersion Parameter Models
For the corresponding variable dispersion models, we present below the summary fit statistics for the models.Although the ECOMP model gives the lowest -2LL and AIC and a most parsimonious value of Wald's X 2 , it however most often suffers from convergence and under the circumstance, the GCOMP model may be preferred to the ECOMP model and a typical output under the GCOMP model is presented in figure 2. A test on whether the additional estimated parameters from the variable dispersion model is significant is provided by

COMP
and is based on (3665-3657)=8 d.f which is highly significant.We note from figure 2 however, that not all the dispersion covariates are significant in the model.

The SAS System
The NLMIXED Procedure

Zero-Inflated GLM Models
We present in this section the effect of applying the procedure employed in the last section to data exhibiting excess zeros (like the data in our next example where 64% of the data are zeros.We present here the results of fitting the ZICOMP, ZICOM µ , ZINBCOMP and ZIGCOMP models are given in Table 6.To accomplish these, we recall that a zero-inflated (ZI) model is a two-part process manifested by the structural zeros part and the process that generates random counts and can be written in the form: where ϕ is the extra proportion of zeros and Y is the count random variable with specified parameters.ϕ is modeled here in the logit form.Thus, the probability mass function for the ZICOMP, ZICOM µ , ZINBCOMP and ZIGCOMP models are given respectively in expressions (15) to (19).
For the Com-Poisson, we have the probability density function: where and the mean and variance of Y i are respectively given as: Similarly, for the ZICOM µ , the probability mass function is given by: A frequency distribution of the data reveals that 498 or 64% of the data are zeros.Clearly, there is overabundance of zeros in this data.We would therefore expect that there are more zeros in these data than would normally be expected from a Poisson model.
We present the results of these analyses in Table 6.6 indicate that the ZICOM µ gives parameter estimates whose standard errors are greater than the absolute values of the parameters.This observation is validated by the use of PROC COUNTREG in SAS using the zeromodel option.However, the model gives the same test statistics values as the lambda based ZICOMP model.The ZIECOMP model is sometimes intractable and very unattractive because of its convergence issues.Further, parameter estimates are not often conformed with the theoretical justification of the model such as α > β.Of all these models, the ZINBCOMP (zero inflated negative binomial Com-Poisson) is the most parsimonious model with Wald's GOF value being 974.1406 on 770 d.f.For this model, both covariates are significant in the zero part of the model.The estimate of the dispersion parameter here is ν = 0.3455 which is significant.Of course we could stretch this model further by modeling with a variable dispersion parameter that incorporates the covariates.

Conclusions
We have demonstrated here that we can extend the generalized linear modeling approach to the Com-Poisson class of distributions.While the ECOMP model is sometimes difficult to fit in terms of convergence and obtaining initial parameter estimates, we have however, for the examples provided in this paper able to obtain convergence for this distribution.The Generalized Com-Poisson model (GCOMP) and its various forms works well for most data and readily converges.The Com-Poisson re-parameterization by Guikema & Coffelt (2008) produces very large values of Wald's GOF because it sometimes underestimates the true variances.The SAS programs for implementing these models are available from the author.

Table 1 .
These results demonstrate that any Comp-Poisson model and its other extensions clearly performs better than the underlying Poisson model.The Poisson model gives a Wald's test statistic.X 2 W

Table 1 .
Estimated ML estimates and standard errors in Parentheses for all the models

Table 2 .
Computations of means and variances and Wald test StatisticsIn Table2, the columns labeled mean and var are based on computations from expressions in (3), while those labeled m1 and v1 refer to computations based on approximate results inShmueli et al. (2005)presented in expressions in (4).The means and variances are very similar and this is true for all the 3980 observations in the data.The columns labeled suma, sumb, sumc are computed as follows:

Table 3 .
Mean and variances Computations under the COM µ Model µ model, we see from Table1, that of all other models, the GCOMP gives the most parsimonious model with Wald's X 2 = 2016.5757on 3974 d.f.The corresponding -2LL and AIC are 10,033 and 10,045 respectively.The ECOMP model has serious convergence problem and the estimated parameter ν under this model is very small.

Table 4 .
Estimated ML estimates and standard errors in Parentheses for all the models Under variable dispersion

Table 5 .
Estimated ML estimates and standard errors in Parentheses for all the models Results in Table5demonstrate again that that the Com-Poisson model and its various other extensions clearly perform better than the underlying Poisson model.The most parsimonious models are the COMP and COM µ models.Although the extended Com-Poisson models have slightly lower AIC and BIC than the COMP models, however, COMP is based on fewer parameters and has the lowest GOF of 4541.3103 on 3666 d.