Determinants of the Number of Children Born to Reproductive Women in Ethiopia: Sampling Cluster Based National Spatial Analysis of the 2016 Demographic and Health Survey Data

The study used averages of predictor variables measured at the 643 sampling clusters selected for the 2016 Ethiopian Demographic and Health Survey to assess the strength of their individual and combined impacts on the average number of children ever born per sampling cluster. Data on women aged 15 to 49 from the 2016 Ethiopian Demographic and Health Survey were used. In a multivariate analysis, the average number of children ever born per sampling cluster was regressed on the average values of nine predictor variables. The Statistical Analysis System (SAS) software, version 9.4, and the geographic information system (GIS) software ArcGIS 10.4 were used. All but one of the nine predictor variables (the presence or absence of co-wives) were found to have a statistically significant effect (p < 0.001) on the number of children ever born to Ethiopian women currently in their reproductive years. The model, with an adjusted R-Square of 0.74, is also statistically significant, with the average number of deceased sons per cluster making the greatest contribution. The altitude of a cluster is the only non-socioeconomic variable considered. It too has a small but statistically significant effect (p < 0.001). The nine predictor variables explained three-fourths of the spatial variability in the number of children ever born. The wealth index has the biggest negative contribution and the number of deceased sons has the biggest positive contribution. Measures that can help increase wealth and reduce infant and child mortality in general, and the mortality of boys in particular, can help reduce the number of children ever born, which remains high due to the need to replace deceased children. As this work is based on cluster-level averages, the goodness of fit shown by the R² value of the model appears to be better than that which could have been achieved by using individual scores.


The Problem
Not enough is known about the correlates of high or low fertility in Ethiopia despite significant regional variations in the number of children born to women in their reproductive years. This study is intended to fill the knowledge gap created by the absence of comprehensive studies that shed light on fertility differences, which can be as high as four children per woman regionally. Such an understanding is key to anticipating future societal changes in both socioeconomic and demographic factors, including mortality and migration, that often end up impacting fertility. We hypothesize that, in addition to variables often mentioned as important correlates of fertility such as contraceptive prevalence, the other demographic events, namely mortality and migration, also play a role in determining the number of children an Ethiopian woman would have at the conclusion of her reproductive years.
or insecure households, absence of child sex preference (Wubegzier and Alemayehu, 2011), unmet need for family planning (Demeke et al., 2017), and regional as well as ethnicity factors (Taylor and Tesfayi, 2009). However, these determinants were often studied in isolation or in pairs to the exclusion of all others. The main objective of this study is to provide a holistic picture of the combined effects of nine predictor variables that we believe underpin geographic variations in the number of children ever born in Ethiopia. The 643 sampling clusters selected as part of the 2016 Demographic and Health Survey sample design are used (Figure 1) as units of analysis instead of regions or districts. We believe that this represents a methodological improvement. The inclusion of altitude, a variable rarely included in fertility studies, also marks a new approach. Additionally, the role of population mobility is examined based on the migration status of women, which was found not to be a contributor in one paper (Wubegzier and Alemayehu, 2011) but was found to have a statistically significant impact in another study (Aynalem and Kloos, 2016).

Materials and Methods
We used the Demographic and Health Survey (DHS) data downloaded by permission from ICF International's website (https://dhsprogram.com/data/). The data were collected during Ethiopia's 2016 Demographic and Health Survey. The 2007 Population and Housing Census was used as the sampling frame. A two-stage sampling process led to the selection of 643 sampling clusters and 15,683 women interviewees between the ages of 15 and 49. We use regression analysis with the cluster averages of the number of children ever born as the response variable Y (with predicted value Ŷ) and cluster averages of nine independent variables (X1…X9) as predictor variables. The formula below represents our regression model consisting of nine predictor variables, a response variable, and an error term:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_9 X_9 + e$$
Where: Ŷ, the predicted or expected value of the dependent variable, is the right-hand side without the error term; X1 through X9 are our distinct independent or predictor variables; b0 is the value of Y when all of the independent variables (X1 through X9) are equal to zero; b1 through b9 are the estimated regression coefficients; and e is an error term. Each coefficient represents the change in Y for a one-unit change in the respective independent variable. For example, b1 represents the change in Y for every unit change in X1, holding all other independent variables constant (i.e., when all of the other independent variables are held at the same value, or are fixed). We assume that the predictor variables are normally distributed and linearly correlated with the response variable. Figure 2 supports our assumption of a normal distribution of the predictor variables. Tests are performed to evaluate whether each coefficient is significantly different from zero. The predictor variables are: duration of residence, age at first cohabitation, the number of deceased sons, presence or absence of husbands, the existence of co-wives, reported ideal family size, contraceptive use, the wealth index, and altitude. All missing cases were excluded.
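The two steps described above (averaging individual records within each sampling cluster, then fitting a linear model to the cluster averages) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the SAS procedure used in the study; the function names `cluster_means` and `fit_ols` and all of the data are hypothetical and synthetic.

```python
import numpy as np

def cluster_means(cluster_ids, values):
    """Average each column of `values` within each cluster id."""
    ids, inverse = np.unique(cluster_ids, return_inverse=True)
    sums = np.zeros((ids.size, values.shape[1]))
    np.add.at(sums, inverse, values)          # accumulate rows per cluster
    counts = np.bincount(inverse, minlength=ids.size)
    return ids, sums / counts[:, None]

def fit_ols(X, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + X b + e."""
    design = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

# Synthetic stand-in for the survey: 5,000 women in 50 clusters,
# nine predictors, and made-up coefficients (assumptions, not DHS data).
rng = np.random.default_rng(0)
n_women, n_clusters, n_predictors = 5000, 50, 9
cluster_ids = rng.integers(0, n_clusters, size=n_women)
X_ind = rng.normal(size=(n_women, n_predictors))
true_b = np.arange(1, n_predictors + 1) / 10.0
y_ind = 2.0 + X_ind @ true_b + rng.normal(scale=0.5, size=n_women)

# Step 1: cluster averages of predictors and response.
ids, means = cluster_means(cluster_ids, np.column_stack([X_ind, y_ind]))
X_bar, y_bar = means[:, :-1], means[:, -1]

# Step 2: regress the cluster-average response on the cluster-average predictors.
b = fit_ols(X_bar, y_bar)
print("intercept:", b[0])
print("slopes:", b[1:])
```

Each row of `X_bar` plays the role of one sampling cluster, so the fitted slopes describe differences between clusters rather than between individual women.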

Results
The goal of a regression analysis is to understand the relationship between several inputs (predictor variables) and one output (response variable). Our chosen response variable, the average number of children ever born, captures the lifetime reproductive experiences of women in the 643 sampling clusters used as units of analysis. A woman's age is the main determinant of the number of children she has given birth to. For instance, everything else being equal, women in their 40s would have more children than women in their 20s. We assume that random selection of clusters during the design stage ensured that the age distribution of women was not too dissimilar from cluster to cluster.
Figure 2 supports our assumption of a linear relationship between the predictor variables and the response variable. If the relationship between our response variable and any of the nine predictor variables were in fact nonlinear, the residuals for that variable would show a systematic trend with the predicted values. There is no systematic pattern for any of the nine predictor variables in our model (see Figure 2), as the residuals are randomly scattered across the range of the predicted values. One would be inclined to do additional model fitting by removing one or more of the outlying observations. While this is instinctively tempting to do, the outlier issues are a bit more complicated and often cannot be resolved by such a pick-and-remove action. Therefore, we will not attempt to do so here. We also looked at the histogram of the residuals (not shown here), obtained by placing the data in regularly spaced cells and plotting each cell frequency versus the center of the cell. It showed a symmetric bell shape with values evenly distributed around zero, further supporting our assumptions of normality and of a linear relationship. A normal density function overlaid on the histogram also helped to check whether or not the residuals are normally distributed.
The nine predictor variables together explained three-fourths of the cluster-level variability in the number of children born to women in their reproductive years at the time of the 2016 Ethiopian Demographic and Health Survey; the adjusted R-Square is 0.7419 (Table 1). Eight of the nine predictor variables have a statistically significant effect (p < 0.001). Only one variable, the presence or absence of co-wives, proved not to be statistically significant (Table 2). The parameter estimate for the "Deceased sons" variable has a positive sign and is statistically significant (p < 0.001). This variable has the largest contribution, which suggests that couples' desire to replace a deceased child or children (we only measured the effect of deceased sons) is a major driver of the number of children ever born. We believe that this is due to the often-cited linkage between high fertility and high childhood mortality in developing countries. To our knowledge, this result represents the first confirmation of its outsized impact in Ethiopia.
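For readers who want to relate the adjusted R-Square to the unadjusted value, the standard adjustment can be sketched as below. The calculation assumes that all 643 clusters entered the model with nine predictors; since missing cases were excluded, the actual number of observations in Table 1 may differ, so the result is only indicative. The function names are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjustment of R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def r2_from_adjusted(adj_r2, n_obs, n_predictors):
    """Invert the adjustment to recover the unadjusted R-squared."""
    return 1.0 - (1.0 - adj_r2) * (n_obs - n_predictors - 1) / (n_obs - 1)

# Assumed inputs: 643 clusters, nine predictors, adjusted R-squared of 0.7419.
r2 = r2_from_adjusted(0.7419, n_obs=643, n_predictors=9)
print(round(r2, 4))
```

With this many observations relative to the number of predictors, the adjustment changes the value only in the third decimal place.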
The duration of residence parameter estimate also has a positive sign and is statistically significant (p < 0.001), suggesting a higher average number of children ever born in clusters where women had always been residents or have been residents for a long time. Where the relationship is negative, as is the case for all other variables, an increase in the value of the predictor variable is correlated with a decrease in the number of children ever born.
The "Husbands-in-house" index and the "Wealth index" have the biggest such effect. We excluded the education, marital status, and duration of marriage variables due to multicollinearity concerns, as education and marital status are strongly correlated with wealth. Lastly, altitude's role as a predictor of spatial variations in the number of children ever born has been found to be positive and statistically significant (p < 0.001).

Discussion
The cluster-level data used in our regression analysis are all averages. For example, the age at first cohabitation variable (Table 2) refers to the average of the ages at first cohabitation reported by all women in a sampling cluster.
Only one predictor variable, altitude, was not averaged, as every cluster had a single and unique altitude.

The Dependent Variable: Cluster Average of the Number of Children Ever Born
This is not the first study to examine the correlates of the number of children ever born and its spatial variation. A number of other studies have used children ever born as a dependent variable (Jafari et al., 2016; Kabir, Jahan, Shahidul, and Ali, 2001; Sharma, 2015; Lindstrom, Gebre-Egziabher, and Hogan, 2009). Figure 3 shows the maximum number (not the average) of children reported in a cluster (y-axis) and the cumulative number of clusters reporting that maximum (x-axis). For example, the skinny bar at the very left end of the horizontal axis shows the maximum number of children ever born to a woman to be 14 (vertical scale), and the number of clusters reporting this maximum to be 4 (horizontal scale). The next bar shows that the maximum number of children ever born per woman is 13 (vertical scale) and the number of clusters reporting this maximum is 14 (horizontal scale).
In a recent study, cohabitation before age 18 was associated with a more than twofold likelihood of having high fertility as compared to getting married at age 18 or older (Getachew and Zelalem, 2014). However, cohabitation has not always meant immediate exposure to the risk of childbearing, at least not in some parts of Ethiopia. In their analysis of the role of social factors in decoupling early cohabitation and early childbearing, Eshetu and Dula (2014) cite examples from Amhara Region where cohabitation takes place early for cultural reasons but sexual intercourse is delayed to allow for the physiological maturity of the young bride, who also faced a high risk of divorce and single motherhood. In this study, the "Age at first cohabitation" variable has a statistically significant effect (p < 0.001).

Cluster Average of the Contraceptive Use Index
The link between contraceptive use and fertility is well established. A recent international summit on family planning ended with a pledge to provide 120 million poor women increased access to contraceptives by 2020 (FDRE, 2016).
As one of the participating countries, Ethiopia committed itself to increasing its contraceptive prevalence to 69 percent by 2015, as well as reducing the total fertility rate to 4 children per woman. Additionally, the Federal Ministry of Health has committed itself to attaining a contraceptive prevalence of 73.3 percent by 2020 (FDRE, 2016). The 2016 DHS showed a nationwide contraceptive prevalence of 58 percent among currently married and sexually active unmarried women (Alazbih, Getachew, and Tariku, 2017). Although very impressive by the standards of the 1970s, 80s, 90s, and early 2000s, this is far short of the Ministry's 2020 target. The regional picture is also mixed, with major urban regions (Addis Ababa, Harari, and Dire Dawa) at the top and predominantly rural regions such as Amhara experiencing fast growth in coverage, while others lagged behind. A study of family planning trends in Amhara showed the increase in contraceptive use to be the single most important contributor to a recent fertility decline there (Rutstein and Staveteig, 2014). The contraceptive use index has values ranging from 0 (no method) to 3 (modern method). There were no missing cases. As expected, this variable has a statistically significant effect in our regression model (p < 0.001).

Cluster Average of the Wealth Index
The DHS wealth index has been in existence for more than a decade (Adebowale, Adedini, Ibisomi, and Palamuleni, 2014). The index provides the opportunity to analyze household differences based on indicators of wealth by using easy-to-collect data on ownership of selected assets including televisions and bicycles, housing construction materials, types of water access, and sanitation facilities. These are converted into indices using the principal components statistical method (Adebowale et al., 2014). The wealth index is a point-in-time, survey-specific gauge of the relative economic standing of a given household based on its assets and service amenities. It is calculated separately for each survey. Specific scores and wealth quintiles represent different levels of economic status within specific surveys but are not comparable from survey to survey. The wealth effect has been studied both in terms of its fertility impacts and its relationship to contraceptive use (Kulu, 2003). Fifty-five of the 56 clusters (98 percent) in Addis Ababa are in the highest wealth quintile, Q1, whereas 55 of the 170 sampling clusters (32 percent) in Afar and Somali in the eastern part of the country and Gambella in the west are in the lowest wealth quintile, Q5 (Figure 4).
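The principal components step described above can be illustrated with a small sketch. It is not the DHS Program's actual procedure, which involves additional steps; it only shows the general idea of scoring households on the first principal component of standardized asset indicators and cutting the scores into quintiles. The function names, the six synthetic asset indicators, and the latent "wealth" variable are all assumptions made for illustration. Quintile 1 here denotes the lowest scores, so the labels would need reversing to match the Q1-highest labeling used in Figure 4.

```python
import numpy as np

def first_component_scores(assets):
    """Score each household on the first principal component of the
    standardized asset indicators (rows = households, columns = assets)."""
    z = (assets - assets.mean(axis=0)) / assets.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:        # orient so that owning more assets scores higher
        loadings = -loadings
    return z @ loadings

def quintiles(scores):
    """Label each score 1 (lowest fifth) to 5 (highest fifth)."""
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, scores, side="right") + 1

# Synthetic households: a hidden wealth level drives ownership of six assets.
rng = np.random.default_rng(3)
n_households = 2000
latent_wealth = rng.normal(size=n_households)
thresholds = np.linspace(-1.0, 1.0, 6)
noise = rng.normal(size=(n_households, 6))
assets = (latent_wealth[:, None] + noise > thresholds).astype(float)

scores = first_component_scores(assets)
labels = quintiles(scores)
print(np.corrcoef(scores, latent_wealth)[0, 1])
```

Because the scores are relative to the households in a given sample, the same household could fall in a different quintile in another survey, which is why the index is not comparable from survey to survey.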

Cluster Average of the Number of Years in Residence
Competing theories exist regarding the effects of geographical mobility on childbearing. The range of views includes the theory that internal migrants exhibit fertility levels that prevailed in their childhood environment at places of origin (Kulu H., 2005). Others espoused the belief that migrant fertility resembled more closely the experience of the native population at destinations (Chaudhuri, 2012). Recent research by Adugna A. and Helmut K. (2016) analyzed the impact of resettlement on birth rates using Gambella's DHS 2000 data and found it to have a statistically significant negative effect on young women who migrated when they were just coming into marriageable ages, and a much lesser effect on women who were older at the time of arrival.
The DHS 2016 duration of residence data has numeric values ranging from zero (new arrivals) to 48 years. Those who always lived at the place of interview (9,792 women or 62.4 percent) were given a code of 96. There were no missing cases (46 visitors were excluded). We used cluster averages of years of residence as an index. A high index shows that a large proportion of respondents in a sampling cluster were natives. In 37 sampling clusters, every woman interviewed (a total of 742 respondents) reported having always lived there. The duration of residence variable has a statistically significant effect in our regression model (p < 0.001).

Cluster Average of the Number of Deceased Sons
A study out of India showed that women who had more sons than daughters were less likely than those with more daughters than sons to have another child (Defo, 1998). It also showed that, at any given parity, the last-born child was more likely to be a son than a daughter. A study of couples' reactions to the death of sons, especially first-born sons, found the response to be both swift and repetitive (CSA and ORC Macro, 2001). This means that the reaction to the first child's death was immediate, and that it was then followed by lagged responses in the form of quick successions to third, fourth, and fifth parity births. The "Number of deceased sons" variable has a positive parameter estimate and is statistically significant (p < 0.001). This variable has the largest effect in our regression model.

Cluster Average of the Ideal Number of Children per Woman
Demographers use the term replacement fertility in connection with the fertility level required to replace a couple. In low-mortality countries, this level is around 2.1 births per woman. However, in developing countries including Ethiopia, couples' ideal number of children is much higher, with husbands desiring an even larger number than wives. During Ethiopia's 2000 Demographic and Health Survey, two out of three women favored four children or more (Bhargava, 2006). Only 17 percent favored fewer than four children. The desired number of children among women and men was 5.3 and 6.4 children respectively (Bhargava, 2006). The "Ideal family size" variable has been studied in the past as a predictor of current births (Potter and Kobrin, 1982). In this study, the "Ideal number of children" variable also has a positive parameter estimate and is statistically significant (p < 0.001).

Cluster Average of the "Husband-in-house" Variable
A study in the 1980s used simple probabilities of births averted by two types of temporary spousal separation (single separation and cyclical separation) to study their fertility impacts under conditions of natural fertility where no family planning was being exercised (Dutt, 1980). For either class of spousal separation, the number of births averted increased with longer separations. For both types of separation, births averted were more sensitive to changes in the length of anovulation than to changes in the level of natural fecundability or the risk of spontaneous abortion. It also showed that the impacts decreased with increasing age of couples. We included the "Husband-in-house" variable (Table 1) to study the effects of spousal absence in the Ethiopian context. We found that it has a statistically significant negative impact (p < 0.001). The "Husband-in-house" index was constructed using the replies women respondents gave to the question of whether or not their husbands lived at home at the time of the survey. A value of 1 was entered for a "yes" answer and 2 for a "no" answer. There were no missing cases.

Altitude
Researchers have, for years, pondered the possible links between altitude and fertility, albeit in the context of the impacts of very high altitudes on either women's actual fertility or their reproductive potential. A study out of Bolivia seemed to suggest that high altitudes lowered human fertility (Vitzthum, 2001). Another study from the Andean region of South America used data on women's ovarian functions and concluded that neither progesterone levels nor the length or regularity of the menstrual cycle were significantly different for women living at high altitudes than for those at lower altitudes. It nevertheless suggested that transients from lowlands, whether human or animal, might experience changes in reproductive functioning and therefore a reduction in fertility (Pradhan, 2006). Our parameter estimate for altitude has a positive sign, suggesting a positive link between altitude and children ever born. The result is statistically significant (p < 0.001). However, since residents of the mountainous regions of Amhara and Oromia, who would certainly qualify for an investigation of the impact of altitude on fertility, have not been subjected to such a study, we cannot conclude with certainty that the positive effect is due to altitude's favorable biological impacts on residents of Ethiopia's highlands.

Conclusion
This study showed that the predictor variables we selected (duration of residence, age at first cohabitation, number of deceased sons, the presence of husbands, number of co-wives, ideal family size, wealth index, contraceptive use, and altitude) are best studied holistically with all variables included. It has also provided a methodological improvement by using measurements taken at all of the 643 sampling clusters chosen in the 2016 Ethiopian Demographic and Health Survey as the basic units of analysis rather than aggregating the data to the level of major administrative regions of the country, as was routinely done in the past. The "Number of deceased sons" variable has the largest effect in our regression model, showing that couples' eagerness to replace deceased children (sons in particular) is the main motivator and a driving force behind an increased number of births. This has led to a larger number of children ever born per woman than would be expected in environments where childhood mortality is low. All of the predictor variables except one, the "Co-wife present" variable, have a statistically significant effect (p < 0.001) on the average number of children ever born. The overall finding is that the desire to replace deceased children (sons in particular), the presence of a husband or cohabiting partner, and a household's wealth are the most important determinants of the spatial variation of children ever born to individual women in various parts of Ethiopia.
Rapid educational expansion in the last two decades has significantly increased school enrollment among girls who are now in their mid to late teens and early twenties (Pradhan, 2016). This may have directly impacted their fertility by delaying cohabitation and sexual intercourse, and by introducing them to fertility control measures. However, education, marital status, and duration of marriage, factors that are often added to the list of fertility determinants, were not included in this study in order to avoid multicollinearity with the wealth index.
Three variables (duration of residence, the number of deceased sons, and the ideal number of children) have a positive effect. All other predictor variables have a negative effect, with the wealth index at the top of those with negative contributions. The multivariate analysis also showed the crucial importance of canvassing the entire geographical extent of the country through a spatial analysis that included all of the sampling clusters selected in Ethiopia's 2016 Demographic and Health Survey. In doing so, this study was able to account for three-fourths of the spatial variability in the number of children ever born to Ethiopian women in their reproductive years. In sum, measures that can help reduce infant and child mortality in general, and the mortality rate of boys in particular, can help reduce the number of children ever born, which remains high due to the need to replace deceased children. All of our results are based on unweighted sample data; the sampling weights provided by the survey were not used.
Replacement fertility, a case where women go on to have another child due to the passing of a previous child, is the main predictor of the total number of children likely to be born to an Ethiopian woman. This has significant policy implications. Among others, it underscores the need to continue the steps taken by institutions and the government in the past to expand maternal and child care as well as reduce prenatal, infant, and child mortality. It is very likely that these steps have reduced subsequent births aimed at replacing a deceased child. It is an established fact that every pregnancy, including one aimed at replacing a deceased child, comes with significant health and mortality risks for both the mother and the infant, thereby leading to potentially higher infant, child, and maternal mortality. Improved maternal and child care breaks this cycle of high mortality leading to high fertility, which in turn could potentially give rise to higher mortality of both mothers and children.
A word of caution is in order regarding a multiple regression analysis with data grouped at the sampling cluster level using mean scores rather than individual scores: even though this approach reduces the potential for error and, therefore, improves the reliability of the technique, it also introduces some undesirable effects. According to Mtz-Vara de Rey C. Camacho, G. M., Galindo P. Velarde M. A. and Arias, V.M.A (2001), the undesirable effects include artificial increases in R² which give the impression that a high degree of fit has been achieved for the regression model. Therefore, the goodness of fit shown by the value of R² in this study appears to be better than that which could have been achieved by using individual women's scores.
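The inflation of R² under aggregation can be illustrated with a small simulation. The sketch below uses entirely synthetic data and hypothetical names and parameter values; it is not a reanalysis of the survey. The same underlying relationship is fitted once to individual records and once to cluster means, and the individual-level noise largely averages out in the second fit.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a simple least-squares line fitted to (x, y)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

# Synthetic setup (assumed values): 200 clusters of 25 women each.
rng = np.random.default_rng(42)
n_clusters, women_per_cluster = 200, 25

# The predictor has a cluster-level component plus individual variation.
cluster_level = rng.normal(size=(n_clusters, 1))
x = cluster_level + rng.normal(size=(n_clusters, women_per_cluster))

# The response depends on x with a large amount of individual-level noise.
y = x + rng.normal(scale=3.0, size=x.shape)

r2_individual = r_squared(x.ravel(), y.ravel())
r2_cluster = r_squared(x.mean(axis=1), y.mean(axis=1))
print("individual-level R-squared:", round(r2_individual, 2))
print("cluster-mean R-squared:", round(r2_cluster, 2))
```

The slope is the same in both fits; only the share of unexplained variance changes, which is why a high cluster-level R² should not be read as a statement about how well individual women's outcomes are predicted.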

Figure 1. Location of the sampling clusters: Ethiopian Demographic and Health Survey, 2016

Figure 2. Residuals by Regressors for Children Ever Born

Figure 3. The Maximum Number of Children Ever Born Reported at a Sampling Cluster, and the Cumulative Number of Sampling Clusters Reporting It

Figure 4. Sampling Clusters with the Highest and Lowest Average Wealth Indices: Ethiopian Demographic and Health Survey, 2016

Table 1. Analysis of variance

Table 2. Parameter Estimates of the Linear Regression Model