Heteroscedasticity and Model Selection via Partitioning in Fisheries Data

Selecting a proper model for a data set is a challenging task. This article attempts to find a suitable model for a given data set. A general linear model (GLM) was introduced along with three methods for estimating its parameters: ordinary least squares (OLS), generalized least squares (GLS), and feasible generalized least squares (FGLS). For GLS, two different weights were selected to reduce the severity of heteroscedasticity, and the more effective weight(s) were deployed. A third weight was obtained through FGLS. Analyses showed that only two of the three weights, one of them the FGLS weight, were effective in reducing the severity of heteroscedasticity. In addition, each data set was divided into Training, Validation, and Testing portions, producing a more reliable set of parameter estimates. Data partitioning is a relatively new approach in statistics, borrowed from the field of machine learning. Stepwise and forward selection methods, together with a number of statistics including the average squared error on the testing data (ASE test), Adj. R-Sq, AIC, AICC, and the average squared error on the validation data (ASE validate), along with suitable hierarchy options, were deployed to select the more appropriate model(s) for each data set. Furthermore, the response variable in both data files was transformed using the Box-Cox method to meet the normality assumption; analysis showed that the logarithmic transformation resolved this issue satisfactorily. Since heteroscedasticity, model selection, and data partitioning have not been addressed in fisheries, the 2015 and 2016 shrimp data in the Gulf of Mexico (GOM) were selected, for introduction and demonstration purposes only, and the above methods were applied to these data sets. In conclusion, some variations of the GLM were identified as possible leading candidates for these data sets.


Introduction
Finding a suitable model for a given data set is a challenging task. The problem becomes more complex as the number of potential covariates increases. Although many research articles have addressed this issue, it is still an open question, and every bit of progress is worth considering. We may never find a "perfect" model, but we can always attempt to find the most reliable one for representing a given data set. Model selection has been a subject of research for many years. Zucchini (2000) presented an introduction to the topic for non-specialists with a basic knowledge of statistical concepts. Cherkassky and Ma (2003) presented an empirical comparison among the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and structural risk minimization (SRM). Hastie et al. (2001) claimed that the SRM method performs poorly and suggested that AIC performs better. Lubke et al. (2017) performed a simulation study of model selection via a bootstrap approach. In addition, Vrieze (2012) addressed the difference between AIC and BIC, focusing on latent variable models.
In addition to the problem of selecting a proper model for a given data set, several other issues play significant roles in the process. Heteroscedasticity is one such issue that the data analyst should investigate; its presence is often ignored by researchers. Heteroscedasticity means that the variability of the response variable is unequal across the range of the predictor(s), and it is quite common in fishery data sets. It is often the result of violating other model assumptions. Ordinary least squares gives the same weight to all observations, disregarding the possibility that some observations have larger error variances and therefore contain less information about the relationship between the response and the predictor(s). In the presence of heteroscedasticity, least squares estimates are no longer BLUE (best linear unbiased estimators), significance tests may reject too often or too rarely, and the usual standard errors and confidence intervals are biased.
This topic has been addressed by many authors. Breusch and Pagan (1979) developed a Lagrange Multiplier (LM) test for the presence of heteroscedasticity in a data set. White (1980) generalized the approach: his test does not assume normally distributed errors and also covers nonlinear forms of heteroscedasticity. In addition, Marzjarani (2018a) applied these tests to the 1984-2001 shrimp data in the Gulf of Mexico (GOM). In this article, heteroscedasticity was checked with the Breusch-Pagan test, hereafter called the B-P test, and White's heteroscedasticity test, hereafter called the White test. Furthermore, since many data sets do not follow the assumption of normality, Box and Cox (1964) developed a method for transforming data toward normality. In this article, this transformation was applied to the response variable in the data files used in this research.
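The LM statistics behind these two tests can be sketched in a few lines. The Python below (NumPy only) is a minimal illustration, not the SAS implementation used in this research: it computes the studentized n·R² form of the B-P statistic (Koenker's variant, an assumption here; the original B-P statistic scales by the error variance instead) and White's statistic from an auxiliary regression of the squared OLS residuals on the regressors plus their squares and cross products. The function names and the simulated data are hypothetical.

```python
import numpy as np


def _ols_residuals(y, X):
    """Residuals from an OLS fit of y on X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def _r_squared(v, Z):
    """Coefficient of determination from regressing v on Z (Z has an intercept)."""
    resid = _ols_residuals(v, Z)
    return 1.0 - (resid @ resid) / np.sum((v - v.mean()) ** 2)


def breusch_pagan_lm(y, X):
    """Studentized B-P statistic n * R^2 from regressing e^2 on the columns of X.

    Approximately chi-square with X.shape[1] - 1 df under homoscedasticity.
    """
    e2 = _ols_residuals(y, X) ** 2
    return len(y) * _r_squared(e2, X)


def white_lm(y, X):
    """White statistic n * R^2; auxiliary regressors are X plus all squares and
    cross products of the non-intercept columns (X[:, 0] is the intercept)."""
    e2 = _ols_residuals(y, X) ** 2
    k = X.shape[1]
    extra = [X[:, i] * X[:, j] for i in range(1, k) for j in range(i, k)]
    Z = np.column_stack([X] + extra)
    return len(y) * _r_squared(e2, Z)


# Hypothetical simulated data: the error s.d. grows with x in y_het only.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y_hom = 1 + 2 * x + rng.normal(0, 1, n)
y_het = 1 + 2 * x + rng.normal(0, 1, n) * x
print(breusch_pagan_lm(y_hom, X), breusch_pagan_lm(y_het, X))
print(white_lm(y_hom, X), white_lm(y_het, X))
```

Large values relative to the chi-square critical value (3.84 for 1 df and 5.99 for 2 df at the 5% level) indicate heteroscedasticity; with dummy-coded categorical covariates, some squared columns duplicate existing ones, which the least squares solver tolerates.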
There are situations where more accurate models can be identified. Large data sets such as the 2015 and 2016 shrimp data in the GOM offer researchers in all disciplines an interesting luxury. Just as each unit in an organization is responsible for a particular activity, a sufficiently large data set can be divided into two or three parts, each responsible for a particular task. This technique is used in some areas of machine learning, where a portion of the data is used to train the system. An example is a decision tree, which is constructed (trained) on one portion of the data and is then ready to accept and classify incoming observations. The technique is relatively new to the field of statistics, and statistical software packages such as SAS (1) have added this feature to their products. One approach is to divide such a data set into two parts, Training and Testing. Alternatively, when a large number of records is available, as in each file used here, the data set can be divided into three parts: Training, Validation, and Testing; this was the approach taken in this article. The Training portion was used to estimate the parameters of the model. The Testing portion was used to estimate the predictive performance of the model and was not used to estimate its parameters. The Validation portion was used for purposes such as terminating the selection process or selecting the final model.
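For illustration, the three-way split can be sketched as a random assignment of record indices. This Python (NumPy) sketch is not the partitioning routine used by SAS; the function name partition_indices is hypothetical, and the default shares of 25, 50, and 25 percent for Training, Validation, and Testing mirror the percentages reported later in this article.

```python
import numpy as np


def partition_indices(n, fractions=(0.25, 0.50, 0.25), seed=None):
    """Randomly assign row indices 0..n-1 to Training, Validation, Testing.

    `fractions` gives the (training, validation, testing) shares and must
    sum to 1; the testing set receives whatever rows remain after rounding.
    """
    if len(fractions) != 3 or not np.isclose(sum(fractions), 1.0):
        raise ValueError("need three fractions that sum to 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_valid = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_valid], perm[n_train + n_valid:]


train, valid, test = partition_indices(1000, seed=42)
print(len(train), len(valid), len(test))  # 250 500 250
```

Fixing the seed makes the partition reproducible, which matters when several selection procedures are compared on the same split.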

Methodology
Heteroscedasticity, data partitioning, and model selection have not been addressed in depth in fisheries. For this reason, and for demonstration purposes only, the 2015 and 2016 shrimp data in the GOM were selected for analysis in this article. The description of each file, along with the process of preparing these files for analysis, is similar to the presentations given in Marzjarani (2016) and Marzjarani (2018b). Following the preparation phase, each data file was checked for heteroscedasticity using the B-P and White tests. Then, to account for the presence of heteroscedasticity, generalized least squares (GLS), weighted least squares (WLS), and feasible generalized least squares (FGLS) were deployed. Note that ordinary least squares (OLS) does not address heteroscedasticity; GLS and FGLS extend it to do so. OLS gives equal weight to all observations, even though observations with larger error variances contain less information about the model. The reader is referred to Fomby et al. (1984), Musau et al. (2015), Wooldridge (2002), Poloni and Sbrana (2014), and others for details on these topics.
As mentioned earlier, each data set was divided into three parts: Training, Validation, and Testing. The model considered in this research was similar to the one used in Marzjarani (2018b); it is listed below with notation borrowed from the same reference.
In this model, length is the vessel length (size); ln(totlbs), hereafter called lnlbs, is the natural logarithm of the aggregated pounds of shrimp harvested; wavgppnd is the weighted average price per pound of shrimp per trip; area is a categorical variable with four levels; and depth and trimester are categorical variables with three levels each. The response variable in (1) is towdays, and in (2) it is lntd, the natural logarithm of towdays.
Write model (2) in matrix form as $y = X\beta + \varepsilon$, with $E(\varepsilon) = 0$ and $\operatorname{Var}(\varepsilon) = \Omega$. Under the finite-sample assumptions and $\Omega = \sigma^2 I$, where $I$ is the identity matrix, the ordinary least squares (OLS) estimate of $\beta$ is

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y. \qquad (3)$$

It can easily be shown that this is an unbiased estimator of $\beta$. Under the assumption $\Omega = \sigma^2 I$, the model defined by (2) is called a "homoscedastic" model. The normality assumption on the error term in (2) is not needed to compute the OLS estimates, but it is needed for exact statistical tests such as t-tests and F-tests on the model parameters. This requirement may be relaxed for large samples by the central limit theorem (CLT).
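Equation (3) can be checked numerically. The Python sketch below (NumPy; the data are hypothetical) solves the normal equations rather than forming $(X'X)^{-1}$ explicitly, which is numerically preferable but gives the same estimate, and then averages the estimates over simulated homoscedastic errors to illustrate unbiasedness.

```python
import numpy as np


def ols(X, y):
    """OLS estimate (X'X)^{-1} X'y of equation (3), via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data lying exactly on y = 1 + 2x, so OLS recovers (1, 2).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(ols(X, y))  # approximately [1. 2.]

# Unbiasedness check: average the estimates over many simulated error draws.
rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])
draws = [ols(X, X @ beta + rng.normal(0, 1, 4)) for _ in range(20000)]
print(np.mean(draws, axis=0))  # close to [1, 2]
```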
Most data sets present some degree of heteroscedasticity; that is, the diagonal elements of the matrix $\Omega$ are not identical. Two possibilities present themselves here. The first is that these elements are known. In that case, a heteroscedastic model can be reduced to a homoscedastic one as follows. Since $\Omega$ is symmetric and positive definite, it can be decomposed as $\Omega = (\omega'\omega)^{-1}$ for a nonsingular matrix $\omega$ (see Benítez and Liu (2013) and Dereniowski and Kubale (2004)). Multiplying both sides of (2) on the left by $\omega$ gives

$$\omega y = \omega X\beta + \omega\varepsilon.$$

Since $E(\omega\varepsilon) = 0$ and $E[(\omega\varepsilon)(\omega\varepsilon)'] = \omega\Omega\omega' = \omega(\omega'\omega)^{-1}\omega' = I$, the transformed model is homoscedastic. Applying OLS to it, $\beta$ can be estimated by

$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y. \qquad (4)$$

This is known as the generalized least squares (GLS) estimate of $\beta$, and it can also be shown to be unbiased. A special case of GLS arises when the off-diagonal elements of $\Omega$ are 0 (the errors are uncorrelated) but the diagonal elements are not identical (heteroscedasticity). The method used to estimate $\beta$ in this situation is known as weighted least squares (WLS). There are several ways to define the weights; for example, the weight for an observation can be inversely proportional to the variance of its response. In this article, to correct heteroscedasticity, or at least to reduce its severity, two weights were defined in (5) in terms of the residuals of the model defined in (2) and other parameters used in the models.
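The equivalence between GLS and OLS on the transformed model, and the WLS special case, can be checked numerically. In this Python sketch (NumPy; the simulated data and function names are hypothetical), $\omega$ is taken from a Cholesky factor of $\Omega^{-1}$, which is one valid choice for the decomposition $\Omega = (\omega'\omega)^{-1}$; the WLS weights used here are the exact inverse variances, not the weights w1 and w2 defined in (5).

```python
import numpy as np


def gls(X, y, Omega):
    """GLS estimate (X' Omega^{-1} X)^{-1} X' Omega^{-1} y, equation (4)."""
    Oi = np.linalg.inv(Omega)
    return np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)


def whitening_matrix(Omega):
    """A matrix omega with omega' omega = Omega^{-1}, from a Cholesky factor.

    numpy returns lower-triangular L with Omega^{-1} = L L', so omega = L'.
    """
    L = np.linalg.cholesky(np.linalg.inv(Omega))
    return L.T


def wls(X, y, w):
    """WLS: the diagonal special case, with weights w_i proportional to 1 / Var(e_i)."""
    sw = np.sqrt(w)
    Xs, ys = X * sw[:, None], y * sw  # scale each row by sqrt(w_i)
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)


# Hypothetical heteroscedastic example with a known diagonal Omega.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
variances = x ** 2                      # Var(e_i) grows with x
Omega = np.diag(variances)
y = 1 + 2 * x + rng.normal(0, np.sqrt(variances))

omega = whitening_matrix(Omega)
Xs, ys = omega @ X, omega @ y           # transformed model: omega y = omega X b + omega e
b_whitened = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)  # OLS on the transformed model
print(gls(X, y, Omega), b_whitened, wls(X, y, 1 / variances))  # all three agree
```

For a diagonal $\Omega$, scaling rows by the square roots of the weights avoids building an n × n matrix, which is how WLS is usually computed for large files.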
The other possibility is that the variance-covariance matrix of the error term is unknown. For simplicity, it was assumed here that $\Omega$ was a diagonal matrix with unknown elements. In that case, the GLS estimator cannot be computed, and the appropriate method is known as feasible generalized least squares (FGLS), in which $\Omega$ is replaced by an estimate. There are different ways to implement FGLS. The third weight considered here was obtained by FGLS, as defined in (6), in terms of the vector of residuals and the vector of the natural logarithms of the squared residuals. For simplicity, the three weight vectors are hereafter labeled w1, w2, and w3, respectively.
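The sketch below uses one common FGLS recipe, which is an assumption and may differ in detail from (6): the log of the squared OLS residuals is regressed on the covariates, and the exponentiated fitted values serve as estimated variances whose inverses are the weights. The Python code (NumPy), the function name fgls, and the simulated data are hypothetical.

```python
import numpy as np


def fgls(X, y):
    """FGLS with an assumed skedastic model Var(e_i) = exp(z_i' d), z_i = row i of X.

    1. Fit OLS and keep the residuals e.
    2. Regress log(e^2) on X; the fitted values estimate log Var(e_i).
    3. Use weights w = 1 / exp(fitted) in a weighted least squares fit.
    """
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # Guard against log(0) if a residual happens to be exactly zero.
    log_e2 = np.log(np.maximum(e ** 2, np.finfo(float).tiny))
    d, *_ = np.linalg.lstsq(X, log_e2, rcond=None)
    w = 1.0 / np.exp(X @ d)
    sw = np.sqrt(w)
    b_fgls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b_fgls, w


# Hypothetical data whose error s.d. is proportional to x.
rng = np.random.default_rng(3)
n = 5000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(0, 1, n) * x
b_fgls, w_hat = fgls(X, y)
print(b_fgls)  # near the true (1, 2)
```

The estimated weights decrease as x grows, so observations with larger error variances receive less weight, which is the intended effect of a weight such as w3.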
After dealing with heteroscedasticity, the next phase was the selection of covariates for the model given in (2) via forward and stepwise selection procedures. In addition, different statistics or options were included in each of the two selection methods, resulting in 8 variations of model (2), hereafter called models 1 through 8. Table 1 displays these models along with the hierarchy option, Single (S) or None (N). Under hierarchy Single, only one effect can enter or leave the model at a time, subject to the model hierarchy requirement. Under None, model hierarchy is not maintained, and any single effect can enter or leave the model at any step. The statistical software package SAS (1) was used throughout this research to implement these methods and options. SAS (1) also provides a third hierarchy option, called Singleclass, which was not included in this article; it is the same as hierarchy Single except that only CLASS effects are subject to the hierarchy requirement.
Table 1. Selections and options/statistics for models 1 through 8.

Since the difference between AICC and AIC, that is, 2p(p+1)/(n-p-1), where p is the number of parameters in the model and n is the sample size, was negligible, AICC was dropped from the set of selection criteria.
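A quick calculation shows why the AICC correction is negligible for files as large as those used here. The Python below simply evaluates the difference 2p(p+1)/(n-p-1) for hypothetical values of p and n; these are not the sizes of the 2015 or 2016 files.

```python
def aicc_minus_aic(p, n):
    """AICC - AIC = 2p(p + 1) / (n - p - 1); requires n > p + 1."""
    if n <= p + 1:
        raise ValueError("need n > p + 1")
    return 2 * p * (p + 1) / (n - p - 1)


# Hypothetical sizes: 20 parameters with a large versus a small sample.
print(round(aicc_minus_aic(20, 10_000), 4))  # 0.0842
print(round(aicc_minus_aic(20, 50), 2))      # 28.97
```

With thousands of records, the correction is a small fraction of one AIC unit, so AIC and AICC rank the candidate models essentially identically.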

For the 2015 and 2016 shrimp data, a model was selected from the 8 model selection procedures for each hierarchy based on the minimum value of the ASE test, the maximum value of Adj. R-Sq, and the minimum values of the remaining criteria, AIC, SBC, and ASE validate, in the order listed here.
The above approach produced a few variations of (2) for the 2015 and 2016 shrimp data sets. In the next step, for each data set, these models were compared, and lists of possible candidate models for the 2015 and 2016 data were generated. The models differed in the number and type of covariates. For each of the 2015 and 2016 data sets (where applicable), the model with the fewest parameters was called the "initial" model, and the remaining ones were called "full" models. In what follows, it was assumed that each full model was obtained by adding more variables to the initial model (that is, the models were nested). Adding variables reduces the error sum of squares of the initial model ($SSE_I$) and increases the model sum of squares by the same amount, say $SS_a$. Suppose this change is caused by $q$ variables added to the model. To set up the hypotheses for testing the impact of the additional covariates, let

$$SS_a = SSE_I - SSE_{Full}, \qquad (7)$$

where $SSE_{Full}$ is the error sum of squares of the model with the $q$ additional variable(s). The ratio of the mean square of the difference ($SS_a/q$) to the mean square error of the full model ($MSE_{Full}$) is a proper test statistic for justifying the addition(s) to the model. The hypotheses were defined as follows:

H0: The addition(s) of covariate(s) is not justified, vs. H1: The addition(s) of covariate(s) is justified. (8)

The test statistic for testing (8) can be expressed as

$$F = \frac{SS_a / q}{MSE_{Full}}, \qquad (9)$$

with $q$ and $df(MSE_{Full})$ degrees of freedom. Larger F values in (9) support rejection of the null hypothesis.
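Equations (7) and (9) translate directly into code. In this Python sketch (NumPy) the sums of squares in the first example and the simulated data in the second are hypothetical; the second example fits an initial model and a full model with q = 2 added columns, one of which has a real effect.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def partial_f(sse_initial, sse_full, q, df_full):
    """Equation (9): F = (SS_a / q) / MSE_Full with SS_a = SSE_I - SSE_Full (7)."""
    ss_a = sse_initial - sse_full
    return (ss_a / q) / (sse_full / df_full)


# Hypothetical sums of squares: SS_a = 20, q = 2, MSE_Full = 100 / 50 = 2.
print(partial_f(120.0, 100.0, q=2, df_full=50))  # 5.0

# Nested fits on simulated data: the full model adds q = 2 columns.
rng = np.random.default_rng(4)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)
X_initial = np.column_stack([np.ones(n), x1])
X_full = np.column_stack([X_initial, x2, x3])
df_full = n - X_full.shape[1]
F = partial_f(sse(X_initial, y), sse(X_full, y), q=2, df_full=df_full)
print(F)
```

The p-value is the upper tail of an F(q, df_full) distribution; if SciPy is available, it can be computed as scipy.stats.f.sf(F, q, df_full).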
Plots of the response variable (towdays) in both the 2015 and 2016 shrimp data showed positively skewed (right-skewed) distributions. Some authors impose the normality assumption on the response variable. After the natural logarithm transformation, the normality assumption on the response variable was satisfied. Although it was not needed here because of the log transformation, the reader is referred to Marzjarani (2016), where the empirical rule was deployed to check the normality assumption of the error term in (2). The Box-Cox transformation was also applied to the original and the transformed data. This transformation, as it appeared in that reference, can be written as (10), where g is the geometric mean of the response variable y, I_a is the indicator function, ** is the exponentiation operator, * represents multiplication, and ⊕ is the exclusive OR operator.
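The geometric-mean form of the transformation can be sketched in Python (NumPy) as follows. This is the standard textbook statement of the normalized Box-Cox transformation, written with an explicit branch for λ = 0 instead of the indicator and exclusive-OR notation of (10); that it matches (10) term by term is an assumption. The grid search for λ and the simulated data are illustrative only.

```python
import numpy as np


def boxcox_normalized(y, lam):
    """Geometric-mean-normalized Box-Cox transform (textbook form, assumed here).

    z = (y**lam - 1) / (lam * g**(lam - 1)) for lam != 0, and z = g * log(y) for lam = 0,
    where g is the geometric mean of y. Requires y > 0.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires positive y; add a shift first")
    g = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        return g * np.log(y)
    return (y ** lam - 1.0) / (lam * g ** (lam - 1.0))


def best_lambda(y, grid=None):
    """Grid search for the lambda minimizing the variance of the normalized transform.

    With no covariates, this is the maximum likelihood choice of lambda, because
    the geometric-mean normalization makes the Jacobian of the transform equal to 1.
    """
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 401)
    variances = [np.var(boxcox_normalized(y, lam)) for lam in grid]
    return float(grid[int(np.argmin(variances))])


# Hypothetical right-skewed (log-normal) data: the chosen lambda should be near 0.
rng = np.random.default_rng(5)
y = np.exp(rng.normal(0.0, 1.0, 5000))
print(best_lambda(y))
```

With covariates, the variance would be replaced by the residual sum of squares of the regression of the transformed response on the covariates.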
Upon the selection of the candidate models, a file was created with the following contents: the pair of year and weight as one field (such as year2015w1), the model (1-8), the hierarchy code SN (S or N), and the corresponding effort figures as the response variable. A two-way ANOVA model was fitted to this data file, and some statistical analyses were performed.

Analysis/Results/Discussion
It was not the intention of this article to propose any method(s) for shrimp effort estimation in the GOM. Since the topics included in this paper have not been addressed in depth in fisheries, the 2015 and 2016 shrimp data files in the GOM were selected, and the methods covered in this paper were applied to these data for demonstration purposes only. The analysis was performed on these data using the three weights given earlier in (5) and (6). Table 2 displays the results of applying the B-P and White tests to the original (raw) data, as well as the results of applying the three weights to the 2015 and 2016 shrimp data files. Of the three weights, w2 did not reduce heteroscedasticity as well as the other two, and it was therefore eliminated from further consideration. For the 2015 data, both w1 and w3 seemed appropriate, and they were selected for additional analyses. For the 2016 data, w3 produced lower B-P and White test statistics, and it was therefore selected for additional analysis.
For later comparisons, the original 2015 and 2016 data files were analyzed under the assumption of homoscedasticity using models 1 through 8. The results are illustrated in Table 3, Table 4, Figure 1, and Figure 2. As stated earlier, the 2016 shrimp data file was analyzed with w3 as the weight. Figure 6 displays the results of applying models 1 through 8 to this data set under hierarchies Single and None.
Figure 6. Percentages of effort estimates for the 2016 shrimp data file using models 1-8 with hierarchy Single/None using w3.
Tables 6 through 8 display the values of the different statistics used in selecting a model. The selection was based on the minimum value of the ASE test and, in the case of ties, the maximum Adj. R-Sq, followed by the minimum of the remaining criteria in the order AIC, SBC, and ASE validate. The final selections for the 2015 and 2016 shrimp data are listed in Table 9, in which subscripts represent levels of the categorical variables.
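The ordering rule can be expressed as a lexicographic sort key. In this Python sketch the FitSummary fields, the optional rounding of the ASE test, and all numbers are hypothetical; how ties were defined is not stated, so treating values that agree at a reported precision as ties is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitSummary:
    """Criteria reported for one candidate model (field names are hypothetical)."""
    model: int
    ase_test: float
    adj_rsq: float
    aic: float
    sbc: float
    ase_validate: float


def select_model(fits, ase_decimals=None):
    """Pick the fit with minimum ASE test; break ties by maximum Adj. R-Sq, then by
    minimum AIC, SBC, and ASE validate, in that order. Optionally round ASE test
    first so that values equal at the reported precision count as ties."""
    def key(f):
        ase = f.ase_test if ase_decimals is None else round(f.ase_test, ase_decimals)
        return (ase, -f.adj_rsq, f.aic, f.sbc, f.ase_validate)
    return min(fits, key=key)


# Hypothetical values, not taken from Tables 6 through 8.
fits = [
    FitSummary(1, 0.5008, 0.60, 100.0, 110.0, 0.52),
    FitSummary(2, 0.5012, 0.62, 105.0, 115.0, 0.53),
    FitSummary(3, 0.5500, 0.70, 90.0, 95.0, 0.50),
]
print(select_model(fits).model)                  # 1 (smallest unrounded ASE test)
print(select_model(fits, ase_decimals=2).model)  # 2 (ASE tie at 0.50; higher Adj. R-Sq)
```

Negating Adj. R-Sq lets a single minimum handle the one criterion that is maximized.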
The selected models were further analyzed with a two-way ANOVA with no replication and no interaction, in which models 1 through 8 formed the first factor, the hierarchy option Single/None (SN) formed the second factor, and effort was the response variable. The hypotheses tested whether effort estimates differed among models 1 through 8 and whether they differed between S and N. In addition, a couple of contrasts comparing the model mean efforts and the hierarchy mean efforts were included in the analysis, as listed below. These were selected arbitrarily and for demonstration purposes only.
Hypothesis: The average effort generated under hierarchies S and N is the same.
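For a design with one effort estimate per model-hierarchy cell, the two-way ANOVA without replication can be computed directly from row and column means, as in this Python sketch (NumPy). The 8 × 2 table of efforts is simulated and hypothetical, and the sketch omits the contrasts and p-values; each F statistic is referred to an F distribution with the listed degrees of freedom.

```python
import numpy as np


def two_way_anova_no_rep(Y):
    """Additive two-way ANOVA with one observation per cell.

    Y[i, j] is the effort estimate for model i (rows) under hierarchy j
    (columns, e.g. S and N). The interaction mean square serves as error.
    """
    Y = np.asarray(Y, dtype=float)
    a, b = Y.shape
    grand = Y.mean()
    ss_models = b * np.sum((Y.mean(axis=1) - grand) ** 2)
    ss_hier = a * np.sum((Y.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((Y - grand) ** 2) - ss_models - ss_hier
    df_models, df_hier, df_error = a - 1, b - 1, (a - 1) * (b - 1)
    mse = ss_error / df_error
    return {
        "F_models": (ss_models / df_models) / mse,
        "F_hierarchy": (ss_hier / df_hier) / mse,
        "df": (df_models, df_hier, df_error),
    }


# Hypothetical 8 x 2 table: clear model differences, small S/N difference.
rng = np.random.default_rng(6)
model_effect = 10.0 * np.arange(8)
Y = 100 + model_effect[:, None] + np.array([0.0, 0.3]) + rng.normal(0, 0.5, (8, 2))
print(two_way_anova_no_rep(Y))
```

Because there is a single observation per cell, the interaction cannot be separated from error, which is why the model is fitted without interaction.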
Table 10 displays the results. For 2015 with w1 as the weight, the differences among the selected models 1-8 were significant. In addition, the contrast comparing models 1, 2, 3, and 4 with models 5, 6, 7, and 8 was also significant. Further analysis placed model 6 in a separate group from the remaining models. Column 6 in this table is equivalent to column 3, since it is the t-test version of the same F-test. Again, this test was performed for demonstration purposes only. In the next step, for the 2015 data, the weights w1 and w3 captured all the effects in the models (including the intercept).
Either of these could be selected as a candidate model for this data set. If a further distinction is needed, models 4 and 1 had the lower ASE test values under hierarchies S and N, respectively. The same approach was applied to the 2016 data set with model 8 as the initial model. The results are listed in Table 11 (note (1) to Table 11: both significant at p-value < 0.0001). As displayed in this table, model 7 showed a minor increase in the Adj. R-Sq, and it was therefore selected as the possible candidate for the 2016 data set. Figures 1, 2, 4, 5, and 6 display the change in effort estimates by model and hierarchy, and Table 12 contains the corresponding CVs for each year. For the 2015 data, the variation among models under hierarchy None was lower than under hierarchy Single; the 2016 data showed the opposite result. However, there was not sufficient evidence that the difference was statistically significant (Table 10).
In this article, the 2015 and 2016 shrimp data files in the GOM were analyzed by considering the possibility of heteroscedasticity and some alternative estimation models. This study extended the treatment of heteroscedasticity in Marzjarani (2018a), whose scope was limited to WLS; GLS (and its special case, WLS) and FGLS were included in this study. As displayed in Table 2, both the 2015 and 2016 data files exhibited heteroscedasticity. The weights w1 and w3 reduced its severity to some degree, but not completely.
Weights that are more appropriate might be considered.
To obtain a more reliable set of parameter estimates for the model considered in this study, each of the 2015 and 2016 shrimp data sets in the GOM was divided into three parts (Training, Validation, and Testing), using 25, 50, and 25 percent of the records, respectively. Researchers may modify these percentages as desired.
Analysis showed that the hierarchy option Single/None did not play a significant role in the 2015 data. The 8 variations of the GLM differed significantly in the 2015 data under the weight w1. The models selected from the 8 variations considered in this article are listed in Table 9. The selection was performed by considering the minimum value of the ASE test, the maximum value of the Adj. R-Sq, and the minimum values of the other criteria used in this study. This selection is subject to the discretion of the researcher and to an understanding of the differences among these criteria.
Vrieze (2012) addressed the difference between the AIC and BIC statistics in detail. As shown in that article, BIC is asymptotically consistent in selecting the true model if that model is among the candidate models; in this respect, AIC is not consistent. If the true model is not among the candidates, AIC is more efficient, as it selects the model that minimizes the mean squared error of prediction or estimation. Kuha (2004) writes, "It is argued that useful information for model selection can be obtained from using AIC and BIC together, particularly from trying as far as possible to find models favored by both criteria." Hansen and Yu (2001) extensively studied another model selection method, the minimum description length (MDL), extending the work of Rissanen (1978), where the idea was to select the model that gives the shortest description of the data. Through simulation, they showed that MDL outperformed AIC, AICC, and BIC. They also showed that the two-stage MDL is equivalent to BIC.
Criteria such as AIC were included in this article when selecting a model. Priority was given to the ASE test, followed by Adj. R-Sq and the remaining criteria listed in Tables 6 through 8. However, to limit the scope of this paper, MDL was not included in the selection process. Rissanen (1978) addressed information criteria in depth. As stated in that article, "Sometimes, the AIC-favored model might be so large as to be difficult to use or understand, so the BIC-favored model is a better choice." The question to consider here is model parsimony. As stated in the above reference, "Model parsimony is not a motivating goal in its own right, but is a means to reduce unnecessary sampling error caused by having to estimate too many parameters relative to n." Aho et al. (2014) state, "While some scientists feel that more complex models are always more desirable (cf. Gelman, 2009), others prefer those that balance uncertainty, caused by excessively complex models, and bias, resulting from overly simplistic models. The latter approach emphasizes parsimony." The first argument calls for including all variables that have significant effects on the model. This, of course, might result in an overly complex, overfitted model, especially if the sample size is small. The second argument emphasizes a balance between model complexity and simplicity. Each approach has advantages and disadvantages. For example, a more complex model generally requires more expertise. In addition, the choice of approach depends heavily on the available resources.
Applying the log transformation to a dependent variable can normalize the residuals (see Lo and Andrews (2015)). The question of the normality assumption on the dependent variable in a GLM has been raised many times in the literature. A distinction between the GLM and generalized linear mixed models (GLMM) is that a GLMM does not make the default assumption that the response distribution is Gaussian and therefore requires the researcher to specify an appropriate distribution (see Feng et al. (2014)). Table 5 still indicated a slight departure from normality. However, the skewness and excess kurtosis were close to 0, indicating that the normality assumption was approximately satisfied.
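The skewness and excess kurtosis used to judge approximate normality can be computed as below. This Python sketch uses the simple moment estimators, which is an assumption: SAS, for example, reports versions with small-sample adjustments by default, so small differences are expected. The simulated log-normal data stand in for a right-skewed response such as towdays and are hypothetical.

```python
import numpy as np


def skewness_excess_kurtosis(x):
    """Moment estimators g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3.

    Both are 0 for a normal distribution.
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical right-skewed values and their natural logarithms.
rng = np.random.default_rng(7)
towdays_like = np.exp(rng.normal(1.0, 0.8, 20000))
print(skewness_excess_kurtosis(towdays_like))          # large positive skewness
print(skewness_excess_kurtosis(np.log(towdays_like)))  # both near 0
```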
The transformation to normality proposed by Box and Cox (1964) did not improve normality beyond what was achieved by the logarithmic transformation (λ = 1). That is, in both the 2015 and 2016 data, skewness and kurtosis remained unchanged, at levels acceptable for the normality assumption, after this transformation was applied. This was expected since, generally speaking, the log transformation shrinks large values much faster than small values, but it does not necessarily make the data normal (Feng et al., 2014). In fact, the Box-Cox transformation reduces to a log transformation when the parameter λ = 0. For additional information on the log transformation, see Koch (1966), Koch (1969), and the related article by McAlister (1879). Formula (10) holds only for positive values of y (here either towdays or lntd). Since lntd is negative or zero for towdays less than or equal to 1, a suitable constant, say λ1, must be added to it as a shift parameter before it is entered into the equation.
Although the proposed methods were applied to fisheries data only, they can be extended to the analysis of other data sets. However, the results and conclusions might vary depending on the data sets used.
In this research, the GLM consisted of both categorical and continuous variables. The decision to retain significant parameters was made by the selection methods and options listed in Table 1; in addition, it was governed by the software's assignment of records to the Training, Validation, and Testing data sets and by the user-defined percentages for these sets.
In order to implement a categorical variable, one must create some coding patterns.Examples of coding patterns include dummy coding, effect coding, and orthogonal coding among others.Statistical software packages have designed their own coding systems.For example, SAS (1) and STATA (1) use dummy coding whereas JMP (1) deploys effect coding.
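The two most common patterns can be illustrated for a three-level factor such as depth. In this Python sketch (NumPy) the level names and the choice of reference level are hypothetical; software defaults differ (for example, in which level serves as the reference), so the matrices produced by a given package should be checked rather than assumed to match these.

```python
import numpy as np


def dummy_code(values, reference):
    """Dummy (reference-cell) coding: one 0/1 column per non-reference level."""
    levels = [c for c in sorted(set(values)) if c != reference]
    return np.array([[1.0 if v == c else 0.0 for c in levels] for v in values]), levels


def effect_code(values, reference):
    """Effect (deviation) coding: like dummy coding, but the reference level is
    coded -1 in every column, so each column sums to zero over the levels."""
    X, levels = dummy_code(values, reference)
    is_ref = np.array([v == reference for v in values])
    X[is_ref, :] = -1.0
    return X, levels


# Hypothetical three-level factor with "deep" as the reference level.
depth = ["shallow", "mid", "deep", "mid"]
print(dummy_code(depth, "deep")[0])
print(effect_code(depth, "deep")[0])
```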
Categorical variables play an important role in data analysis. However, they raise some issues regarding how they are to be implemented. For example, when multiple imputation is used to handle missing values in this class of variables, one must make sure that the method is appropriate for a given categorical data set, because of the rounding involved (Horton et al., 2003; Marzjarani, 2018c).
The user must know the default coding pattern used by the software. In SAS (1), for example, if the user has coded a categorical variable with a pattern different from the software default, the CLASS statement in PROC GLM is not needed; if it is used, the software simply applies its own default pattern. In other words, the CLASS statement in this procedure uses the default coding regardless of the coding pattern selected by the user. In addition, some SAS procedures do not support the CLASS statement at this time, and a categorical variable passed to these routines must be passed with an explicit coding, such as dummy or effect coding.
Theoretically, all non-significant levels of a categorical variable should be retained if the variable is significant as a whole. This preserves the relationships among the intercept, the levels of the categorical variable, and the estimates. For example, when only categorical variables are present and dummy coding is used, each parameter estimate equals the mean of the corresponding level minus the mean of the reference level, and the intercept equals the mean of the reference level. With effect coding, the intercept equals the unweighted grand mean, that is, the average of the level means of the categorical variable. When continuous variables are also present, however, they affect these relationships, in the sense that one must also consider the contributions of these variables to the intercept. With dummy coding, for example, the intercept represents the predicted mean response when all covariates in the model are set to their "bases" (zero for continuous variables and the reference level for categorical ones). Regardless of which coding pattern is used, the predicted values must and will be the same, although some quantities, such as the parameter estimates, will differ.
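The relationships described above can be verified numerically with a small, hypothetical one-factor example. The Python sketch below (NumPy) fits the same saturated model under dummy and effect coding and confirms that the coefficients follow the stated rules while the predicted values coincide.

```python
import numpy as np

# Hypothetical unbalanced one-factor data: level means are a = 3, b = 7, c = 1.
levels = np.array(["a"] * 3 + ["b"] * 5 + ["c"] * 4)
y = np.array([2.0, 4.0, 3.0, 7.0, 8.0, 6.0, 9.0, 5.0, 1.0, 2.0, 0.0, 1.0])
is_a, is_b, is_c = ((levels == k).astype(float) for k in ("a", "b", "c"))
ones = np.ones_like(y)

# Dummy coding with reference level "c".
X_dummy = np.column_stack([ones, is_a, is_b])
b_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0]
# intercept = mean(c) = 1; slopes = 3 - 1 = 2 and 7 - 1 = 6

# Effect coding with "c" coded -1 in both columns.
X_effect = np.column_stack([ones, is_a - is_c, is_b - is_c])
b_effect = np.linalg.lstsq(X_effect, y, rcond=None)[0]
# intercept = (3 + 7 + 1) / 3 = 11/3, the average of the level means, which differs
# from the overall mean of y (48 / 12 = 4) because the data are unbalanced

print(b_dummy, b_effect)
print(np.allclose(X_dummy @ b_dummy, X_effect @ b_effect))  # True: same predictions
```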
If some levels of a categorical variable are non-significant, instead of retaining or removing them, one approach is to combine (collapse) those level(s) with the reference level, since their distinction from the reference level seems to have no significant effect on the response variable. None of retaining, removing, or collapsing the non-significant levels of a categorical variable is fully justified, and none provides a perfect solution to the issue. Box (1979) states, "Essentially, all models are wrong, but some are useful."
(1) References to any software package throughout this article do not imply endorsement of that product.

Disclaimer
"The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect those of National Oceanic and Atmospheric Administration or the Department of Commerce."

Table 1 (flattened in extraction; recoverable entries only). Option settings listed for models 1 through 8 include: select=SBC (default), stop=validate; select=SBC (default), stop=AIC, choose=validate; select=SL (significance level), sle(2)=0.15 (default), sls(3)=0.15 (default), choose=Cp; and an option that specifies the order in which the parameters first entered the model. Each model was run with hierarchy Single (S) or None (N)(4).
Notes: 1: Significance level for the test statistic F for entering or departing a variable. 2: Significance level for entry. 3: Significance level for stay. 4: Hierarchy=None (N) is the default and is equivalent to no hierarchy. Defaults are only relevant to the software package used.

Figure 1. Percentages of effort estimates for the 2015 shrimp data file using models 1-8 with hierarchy Single/None under the assumption of homoscedasticity.

Figure 2. Percentages of effort estimates for the 2016 shrimp data file using models 1-8 with hierarchy Single/None under the assumption of homoscedasticity.

Table 5 and Figures 3a and 3b display the results of testing for normality of the 2015 and 2016 shrimp data in the GOM. In both cases, the original data were sharply right (positively) skewed. However, the logarithmic transformation resolved this issue satisfactorily. The Box-Cox transformation did not move the log-transformed data any closer to normality. The minor deviation from normality indicated by the skewness and kurtosis coefficients should not affect the estimation process significantly.
Note to Table 5: the Shapiro-Wilk test was not performed due to the large sample sizes (> 50).

Figure 3a. Plots of the data points in shrimp data file 2015 (towdays, lntd) and after the application of the Box-Cox transformation to lntd.

Figure 3b. Plots of the data points in shrimp data file 2016 (towdays, lntd) and after the application of the Box-Cox transformation to lntd.

Figures 4 and 5 display the results of applying models 1 through 8 to the 2015 shrimp data file using the weights w1 and w3, respectively, under hierarchies Single and None.

Figure 4. Percentages of effort estimates for the 2015 shrimp data file using models 1-8 with hierarchy Single/None with weight w 1 .

Figure 5. Percentages of effort estimates for the 2015 shrimp data file using models 1-8 with hierarchy Single/None with weight w 3 .
Table 7. Some statistics of interest used to select the most appropriate model for the 2015 data using hierarchy Single/None with w3 as the weight.

Table 2. Selection of weights based on the B-P and White tests. All test statistics in this table were significant at p-value < 0.0001. Output was generated via SAS.

Table 3. The ASE and the number of significant parameters for the 2015 shrimp data file using models 1-8 with hierarchy Single/None under the assumption of homoscedasticity.

Table 4. The ASE and the number of significant parameters for the 2016 shrimp data file using models 1-8 with hierarchy Single/None under the assumption of homoscedasticity.

Table 5. Results of testing for normality of the 2015 and 2016 shrimp data before and after the application of the Box-Cox transformation.

Table 6. Some statistics of interest used to select the most appropriate model for the 2015 data using hierarchy Single/None with w1 as the weight.

Table 8. Some statistics of interest used to select the most appropriate model for the 2016 data using hierarchy Single/None with w3 as the weight.

Table 9. Summary of the selected models and the corresponding information for the years 2015-2016.

Table 10. Results of fitting a GLM with the factors model (1 through 8) and hierarchy option Single/None (SN), with effort as the response variable.

Table 11. Selection of the final model for representing the 2016 shrimp data file.