Adjustment of Lactation Curves of Holstein Cows from Herds of Minas Gerais, Brazil

Random regression models (RRM) differ in terms of the functions used to describe the shape of lactation curves. The aim was to compare random regression models under different functions to describe the lactation curves from Holstein cows in herds of the state of Minas Gerais. A database of 28,118 production records was analyzed using the test-day records of 4,230 first parity cows from five herds. The Wilmink, Ali & Schaeffer and Legendre polynomial (orders 4, 5 and 6) functions were adjusted in RRM to model the mean production trend (fixed) and genetic and permanent environmental (random) effects. The residual variances were assumed to be constant throughout lactation. Analyses were performed using the AIREMLF90 program. Except for the model with the polynomial function of order 5, all models converged. The Wilmink function showed lower values for criteria based on the -2log (L), AIC and BIC. The model with the Legendre polynomial of order 6 showed lower residual variance. Heritability estimates were similar between functions, ranging from 0.07 to 0.18 and were higher from 215 days of lactation. From 155 days of lactation, genetic and permanent environmental correlations between successive controls are of high magnitude. The Wilmink function is the most suitable for the study of milk yields from primiparous Holstein cows. The selection of animals is possible from 155 days of lactation on. Permanent environmental effects have greater influence on the milk production at the end of lactation of primiparous cows and should be considered since they are important and may be cumulative throughout lactation.


Introduction
The Holstein breed has the largest number of individuals in the world and has been, and still is, the target of an intensive genetic selection, which has turned it into an extremely productive animal.
According to the Foreign Agricultural Service of the US Department of Agriculture (USDA) December 2012 report on milk yield, Brazil was ranked third in the world.Minas Gerais is the Brazilian State with the highest annual milk production, with over 27% of the total production (Poll, 2013).
Each country and breed association defines a method for evaluating phenotypic and genetic traits of economic interest that usually are selected simultaneously, but with different weights, within selection indices.
for the RRM predicted values may be higher than those obtained by models of multiple traits (Meyer, 2001).The genetic and residual variances, estimated by multiple trait analyses are obtained for each trait.In RRM covariance functions (CF) are used to estimate them.
Methodologies that allow genetic parameters estimation more accurately can help to increase genetic gains obtained by selection.In this context, RRMs have been recognized as the most appropriate for longitudinal data analysis in animal breeding research.
Considering the Holstein breed in Brazil, heritability estimates for the test-day milk yield, adjusted with the Legendre polynomials, Wilmink, Ali & Schaeffer under RRM range from .11 to .31(Cobuci et al., 2004;Cobuci et al., 2006;Araújo et al., 2006;Costa et al., 2008;Biassus, Cobuci, & Costa, 2011).These results are in agreement with heritability results observed in studies with Holsteins from other countries, ranging from 0.08 to 0.23 (Hahammi et al., 2008;Silvestre, Petim-Batista, & Colaço, 2006;Strabel & Misztal, 1999).A most realistic heritability curve is characterized by higher values inmiddle lactation and lower values at the beginning and at the end of lactation, since it is similar to the results obtained by multitraits models (Hahammi et al., 2008).
The functions used to describe the shape of the lactation curve, may contain in their formula an excess or fewer parameters to be estimated, leaving up to the researcher to decide which model and respective function is more appropriate to describe the test-day yield during the lactation.
For the choice of more parsimonious models, Burnham and Anderson (2004) emphasize the importance of selecting models based on scientific principles.Most studies that compare alternative regression submodels use the maximum likelihood values and the likelihood ratio test for nested models as quality information criteria for the selection of the most suitable models.The complexity is usually measured by counting the number of parameters included in the model.In addition to these criteria, the residual variance and the curve of the estimates of variance obtained for each control throughout lactation and correlations between the different periods of lactation have also been considered (Mazerolle, 2004;López-Romero & Carabaño, 2000).Thus, it is imperative that the focus of the Brazilian research must also be set onto the choice of methodologies that best fit the data of the various milk producing regions, so that greater accuracy of genetic evaluations may be achieved.The final productivity gain obtained with the accuracy of the estimates of the breeding values, warrants further scientific research and the development of new technologies.
The objective was to compare random regression models which differ by the function used to describe the lactation curve of Holstein cows in herds of the state of the Minas Gerais, Brazil.

Animals and Data File
Data were provided by a company specialized in software for agricultural management, from Belo Horizonte, MG, Brazil.The information belongs to Holstein cows from five herds reared in the State of Minas Gerais, Brazil.
Monthly test day milk yield data were considered, from primiparous cows, controlled from 5 to 305 days of lactation considering parturition as day 0. In order to be considered, total daily yield should add to at least 3.0 kg to a maximum of 99.9 kg (ICAR, 2012).
The minimum age at birth for the first lactation order was 18 months.Milk records without dates in which they were performed and without the calving dates were discharded.For lactations with more than one monthly control record, only the first record was considered.
For the structuring of the pedigree file and construction of the relationship matrix, no restrictions were applied to animals without known parents.Initial data on age at calving (AC) and test day milk yield (TDY) were submitted to normality distribution tests.
An outlier detection analysis was performed for AC and TDY.Classes of animals were created to detect possible outliers: for age at calving, animals from the same herd were considered; for the test day milk yield, besides animals from the same herd, the day of the monthly milk record was also considered.The standard deviation method, based on normal distribution assumptions was adopted.Thus, the mean (μ) and standard deviation of the studied traits were obtained.Data within the interval μ±3 standard deviations accounted for 99.73% of the observations.Records outside this range were considered outliers (Lethen, 1996).
Firstly, the elimination of outliers was performed for AC and subsequently for TDY.After elimination of the outliers, only lactations with at least three milk records were considered.A total of 28,118 milk production records from 4,230 animals were considered.

Genetic Analyses
The random regression model used to obtain the fixed and random solutions was: where, y = test-day milk yield; HYM the fixed effect of the 401 classes comprising the effects of herd and date (year and month) of test day; HAGE-fixed effect of the 39 classes comprising the effects of herd and age at calving; b-fixed regression coefficients obtained for each herd; a-genetic random regression coefficients for each animal; p-random regression coefficients for permanent environmental effect for each animal; and e-random residual effect of each observation.The residual variance was assumed to be constant throughout the lactation and independent between milk records (Costa et al., 2008;Biassus, Cobuci, & Costa, 2011).The different functions adopted, characterizing the different sub-models of random regression, have been adjusted in bz, az and pz in order to model the fixed and random animal effects and animal environmental permanent, respectively.
The Wilmink function (1987) is represented by: in which the parameters: Yt is the test day milk yield at t days of lactation; a is associated with milk production in early lactation; b to the increase in production until the lactation peak, the higher the magnitude of this parameter the greater the production before the peak; c is related to the decline in production after the peak, where higher values may be related to greater persistence; e means exponential (e is not a parameter to be estimated); and k is related to the peak moment.The k parameter is related to the peak moment and usually takes a fixed value that can be obtained from a preliminary analysis, which implis that the model has only three estimable parameters.However, Schaeffer et al. (2000) found no significant differences between the actual and the estimated productions, when the value of k is estimated for each situation analyzed.Thus, it was assumed that k = 0.05, associated with a peak occurring near 50 days of lactation, according to Wilmink (1987) and Schaeffer et al. (2000).
The Ali and Schaeffer (1987) function is represented by: Where:  t = (t/305),  t = ln(305/t), Yt is the test day milk yield at t days of lactation; a is the production at the lactation peak; d and e are related to the rate of increase in milk production before the peak, while b and c are related to the decline in production after the lactation peak.When d and e have estimates of high magnitude, milk yield can reach its maximum (peak) in a few weeks after calving.The rate of decrease in production described by the parameters b and c can provide information about ability of the animal to maintain a fairly constant milk production (persistency of lactation) after the peak.
The Legendre Orthogonal Polynomials are a methodology published by Spiegel (1971) and modified by Kirkpatrick, Lofsvold, and Bulmer (1990).They are polynomial functions of n degree and domain n + 1, implying in the existence of an additional parameter other than the polynomial degree being estimated.In these functions, a single observation can be written as: where,  is a standardized time unit ranging from -1 to +1 (Kirkpatrick, Lofsvold, & Bulmer, 1990) and can be obtained by: In this case, it is assumed that t min = 5 days and t max = 305 days, corresponding to the first and last test day after calving, respectively. Following: where, P n (ω) is the non-normalized polynomial of n degree and Ø n () is the standardized polynomial.If the fifth degree is considered, six non-normalized polynomial functions would result: Finally, the normalized polynomials up to the fifth degree, or up to the sixth order, are: The estimation of variance and covariance components between the days of predetermined test days: 5 th , 35 th , 65 th , 95 th , 125 th , 155 th , 185 th , 215 th , 245 th , 275 th and 305 th were obtained by multiplying the components of variances and covariances between the different function parameters of the genetic effects of animal and of animal permanent environment, and the covariates related to the t test day of lactation, as reported by Cobuci et al. (2006).
Once the matrices of variance and covariance components between the test days were defined, these (co) variances were used to obtain the estimates of heritability and correlations (genetic and permanent environmental) (Cobuci et al., 2006).The heritability for a test day in particular was calculated by dividing the additive variance by the sum of the variances: additive, permanent environmental and residual estimated in each function for each day of milk recording.These estimates were obtained with the support of the R Core Team (2013) program.

Convergence Criteria
The convergence criteria adopted when using the AIREMLF90 program was 10 -10 .The number of iterative rounds or cycles adopted for the whole study was of 50,000 cycles and when not enough it was increased to 200,000 cycles.The initial values of the variance and covariance matrices among the parameters of each sub-model for genetic and permanent environmental effects were obtained from the first 10 cycles of iterative analysis (OPTION EM-REML, of the AIREMF90 program).
The estimate of the AIC, for a particular model, is given by: AIC = -2log (L) + 2k where L is the maximum of the logarithm of the model likelihood function and k is the number of parameters of each model.The model with the lowest AIC value is considered the best fit to the production data.
On the other hand, the value of the BIC criteria, for a particular model, is given by: BIC= -2log (L) + klog (n), where n corresponds to the number of observations.The model with the lowest BIC is considered the best fit.
The residual variance (σ e 2 ) obtained with the fit of each model was also considered for selection of the model that best fits the considered milk yield data.

Results and Discussion
A general description of the relationship matrix structure is shown in Table 1.Animals were born between the years 1992 and 2009.The high number (59.97%) of unrelated individuals (Table 1) was due to the failure in identifying in the records the complete genealogy.Sire and dam could not be identified in 4,645 (53.69%) records; Sire only was identified in 194 (2.24%) records; dam only was identified in 247 (2.85%) and in 3,566 (41.22%) sire and dam were known.
Only three individuals were inbred.The highest coefficient of inbreeding (F) observed was 25% and the average inbreeding coefficient of the three inbred animals was 20.83%.Possibly, the low number of detected inbred animals was the result of incomplete genealogy records.
A higher number of lactations were observed in herd 2 (Table 2).Nevertheless, in all herds, milk yield (TDY) was recorded in the total course of the lactation period.This fact allows a better fit of the functions to the test day data, since information from all the different stages of lactation was present (upswing, peak and decreasing production phase).For each animal, on average, 6.64 milk records were evaluated by lactation.Whereas the model with the most parameterized function (LEG5), the number of dairy controls registered for lactation may be considered sufficient for the adjustment of the evaluated functions.
The means and standard deviations of days in milk at the time of milk recording among the herds are similar (Table 3).Also, within herds, there is regularity in the milk record intervals.Such considerations are important because they show fairly similar conditions between the different herds sampled.Thus, more reliable comparisons of milk production curve patterns between herds are possible.In addition, the variations due to genetic and environmental effects that occur at specific times in the course of lactation may be quantified equally among the herds studied.
Apparent differences in productivity between herds and milk record data are observed on Table 4 and Figure 1.These are milk records from 4,230 lactations.Cows were 26.99 month-old on average (18.85 up to 47.78 months) at first calving.
Herd 1 had numerically lower milk production records compared to the remaining herds, and registered the highest production at test day in third month (TD3) (Table 4 and Figure 2), which was close to 70 days of lactation (Table 3). jas.ccsenet.
In herds 2 yield occu (Table 3).sampled in between h Convergence was reached for all models studied, except for the adjusted model with LEG4 function, even after extending the number of iterations up to 200,000 cycles for this function.
Residual variances estimated in this study ranged from 14.11 for the model with function adjustment LEG5 to 17.56 for the model adjusted with the WIL function (Table 5).The lower values obtained for the residual variance models are associated with a greater number of parameters to be estimated, as it is observed when the model is adjusted with the LEG5 function (Table 5).Higher values for the information criteria [-2log (L), AIC and BIC] were observed for the most parameterized models, especially the BIC criterion, which is directly related to the number of parameters to be adjusted in the model (Table 5).The information criterias, based on the likelihood function, helps to select models with better quality of fit to the data.Higher information criteria values may be linked to the difficulty of adjustment of the parameterized models.
For the Wilmink function (Table 9), despite the higher estimated residual variance estimates are found for the smaller -2log(L), AIC and BIC criteria.Thus, the Wilmink function, despite its easier adjustment given the smaller number of parameters, is the one with better quality adjustment to the milk production data.
The values of the estimates of the variance components for the additive genetic (VGA), animal permanent environmental (VEP) and phenotypic effects (VP) are similar between the fitted function and tended to increase during lactation (Table 6 and Figure 2).The highest estimates of VGA and VPE were observed in the final third of the lactation curve (Table 6 and Figure 2).
The variance for the permanent environmental effect was the most relevant cause for variation since the beginning of lactation (Table 6 and Figure 2).This effect, in the case of primiparous animals, may be associated to problems that occurred during growth or during the pre-pubertal period.
In a study by Schaeffer (2011), it was observed that the permanent environmental effects can be cumulative over the course of the productive life of the animal.According to the author, animals can suffer new environmental influences every day in their lives.So, it could have lasting environmental effects that influence milk production in the first milk record of a cow.During that first lactation, the cow would be influenced by new effects that would be added to those already present.This may explain the higher values of the permanent effect of variances observed at the end of lactation (Table 6 and Figure 2).
Notice that, by adjusting the LEG5 model, decreases in genetic and permanent environment variances in the last month of lactation were observed, which was not observed when adjusting the other functions (Table 6 and Figure 2).This may be associated with the number of model parameters (LEG5) that needed to be adjusted (Table 5) and the lower number of test day in the last milk recording observations (Table 3).In a study by Costa et al. (2008) and Dornelles et al. (2016), problems with the fit of random regressions to extremes of lactation data were also observed and related to the highest number of parameters of this function.Regardless of the fitted function the genetic and permanent environmental correlations of milk yield, between test days, were close to the unity in more adjacent milk recording dates (Tables 8, 9, 10 and 11).The further the test days, the smaller the magnitude of the correlations.This may indicate that the additive genetic effects of genes and permanent environment effects affecting two consecutive test days are superior to the effects responsible for the performance between two more distant test days.Note.*DIM = Days in milk.
In adjusting the LEG5 (Table 9) and ALI (Table 10) functions, some negative genetic correlations between yields were observed, which were not observed with the LEG3 (Table 8) and WIL (Table 11) functions.Negative genetic correlations were observed only between the 5th and 305th days of lactation, when adjusting the LEG5 function (Table 9), and between the 5th and the other controls from the 185th day in milk adjusting for ALI function (Table 10).Despite the low magnitude of these negative genetic correlations, this may be associated with the difficulty of their functions to model the variations in milk production between controls taken at the beginning and at the end of the third lactation.This difficulty may be related to the number of parameters to be estimated by the functions, because in models with less parameterized functions such as LEG3 and WIL functions, no genetic correlation negative estimates were observed (Tables 8 and 11).In studies by Brotherstone, White, and Meyer (2000), Pereira et al. (2010) and Bignardi et al. (2011) negative estimates of genetic correlations between yields made at the beginning and end of lactation were also reported.Difficulties of adjustment of genetic and permanent environmental correlations for LEG5 also were verified of Dornelles et al. (2016).Note.*DIM = Days in milk.
It was also noted that, from the 155th day of lactation, genetic and permanent environmental correlations between subsequent test days were of high magnitude, regardless of the fitted function (Tables 8, 9, 10 and 11).
Possibly, genes that act on milk production in controls taken from the 155th day of lactation are the same.Breeding values for milk production in dairy cattle are highly related to genetic values of the milk yields in subsequent lactation controls.This implies that animals can be selected from this stage of lactation and that milk yields from not yet performed controls can be projected for the current lactation.This result is consistent with the observations already discusseded in this study related to the increased heritability observed at the end of the lactation.As a result, there will be a decrease on the generation interval and an increase on the number of daughters evaluated from each sire.Additionally, economic gains for producers can be observed, because lower productivity cows may be culled or replaced before completing their lactations.Correlations of high magnitude between consecutive milk weights taken from the 6th milk recording, close to 183 days of lactation, were also verified by Melo et al. (2005).

Conclusions
The Wilmink function is the most suitable for the study of primiparous cow test day milk yields from Holstein herds in the state of Minas Gerais, Brazil.
From the 155 days of lactation, the selection of animals can be practiced.In the proposed model fitting, selection for milk yield may be done at 155 days in milk.Later milk productions after 155 days in milk may be estimated by projection of the lactation.
Permanent environmental effects have more influence over primiparous cow test day milk yields particularly at the end of lactation.The nature of these effects on the productive performance of the animals should be taken into consideration, since these effects are important and may be cumulative throughout lactation.Permanent environmental effects must be considered when estimating genetic parameters.This will enable more precise the separation of variance components.

Table 1 .
Population description of five Holstein herds from in Minas Gerais, Brazil

Table 2 .
Numbers of monthly milk production records (TD) and numbers of lactations from five Holstein herds in the state of Minas Gerais, Brazil, assessed in the first lactation

Table 5 .
Choice criteria for models fitted with different functions to data of test day milk yield from primiparous Holstein cows from five herds in the state of Minas Gerais, Brazil

Table 8 .
Genetic (upper triangular) and permanent environmental (lower triangular) correlations for the test day milk yield from five Holstein herds in first lactation created in the Minas Gerais State, Brazil, obtained by adjusting the polynomial function of degree Legendre 3

Table 9 .
Genetic (upper triangular) and permanent environmental (lower triangular) correlations for the test day milk yield from five Holstein herds in first lactation created in the Minas Gerais State, Brazil, obtained by adjusting the polynomial function of degree Legendre 5 Note. *DIM = Days in milk.

Table 10 .
Genetic (upper triangular) and permanent environmental (lower triangular) correlations for the test day milk yield from five Holstein herds in first lactation created in the Minas Gerais State, Brazil, obtained by adjusting the polynomial function ofAli & Schaeffer

Table 11 .
Genetic (upper triangular) and permanent environmental (lower triangular) correlations for the test day milk yield from five Holstein herds in first lactation created in the Minas Gerais State, Brazil, obtained by adjusting the polynomial function of Wilmink