Ratio Type Estimation Using the Knowledge of the Auxiliary Variable for Ranking and Estimating

In this paper, the behavior of ranked set sampling is analyzed when knowledge of the auxiliary variable is used both for ranking and for estimation. The suggested estimators are compared with their simple random sampling counterparts. A numerical study is carried out using data from a study of the contamination caused by burning compost made from hospital solid waste.


Problem Statement
Estimation of population parameters is a central problem in statistical inference. Good estimators are those with desirable properties, for example unbiasedness, efficiency, and consistency. There are also biased estimators that are good estimators because of their very small mean squared error, such as the ratio estimators considered in this study.
When no auxiliary information is available, simple random sampling is commonly the only option in statistical inference. When we have information about an auxiliary variable that is correlated with the variable of interest, other options are possible. This information can be used to obtain a more efficient estimator of the population mean, using a ratio estimator or a regression estimator.
We denote the finite population of interest by $U$, with $N$ distinct and identified units, that is, $U = \{1, \dots, j, \dots, N\}$. The list that allows us to identify each population unit is called the frame. The population units have many characteristics of interest, some known and others unknown. We denote by $Y$ the population characteristic that we want to study and call it the variable of study. The values that this variable takes over the population are unknown; unit $j$ has the value $y_j$, so that $y = \{y_1, y_2, \dots, y_j, \dots, y_N\}$.
Moreover, we can record the known characteristics of a variable $X$, which we suppose to be $p$-dimensional, providing $p$ characteristics of each population unit. Thus, for each unit $j$ in the population we have a vector of additional information, $x_j = (x_{j1}, \dots, x_{jp})$. The collection $x = \{x_1, \dots, x_N\}$ is called the additional information vector. This auxiliary information, which is assumed to be related to the variable of interest, should be used to help infer the values of the variable of interest.
With this auxiliary information, we can construct ratio type estimators that improve the precision of estimation under SRS, SRSWOR or any other sampling design. Further improvements are obtained with the modified ratio estimators proposed by Singh, Tailor and Tailor (2010) and by Al-Omari, Jaber and Al-Omari (2008).
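As an illustration of how auxiliary information enters the estimation, the following is a minimal sketch of the classical ratio estimator of the population mean under simple random sampling without replacement. The population, the sample size and the function name `ratio_estimate` are assumptions made for the example only; the modified estimators of Singh et al. (2010) and Al-Omari et al. (2008) are not reproduced here.

```python
import random

def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean of Y:
    sample mean of Y, rescaled by (known mean of X) / (sample mean of X)."""
    y_bar = sum(y_sample) / len(y_sample)
    x_bar = sum(x_sample) / len(x_sample)
    return y_bar * x_pop_mean / x_bar

# Hypothetical population in which Y is roughly proportional to X.
rng = random.Random(0)
x_pop = [rng.uniform(10, 100) for _ in range(500)]
y_pop = [2.0 * x + rng.gauss(0, 5) for x in x_pop]
x_pop_mean = sum(x_pop) / len(x_pop)   # known before sampling

# Simple random sample without replacement of 30 unit labels.
idx = rng.sample(range(len(x_pop)), 30)
y_s = [y_pop[j] for j in idx]
x_s = [x_pop[j] for j in idx]

est = ratio_estimate(y_s, x_s, x_pop_mean)
print(f"ratio estimate: {est:.2f}, true mean: {sum(y_pop) / len(y_pop):.2f}")
```

When Y is exactly proportional to X the estimator reproduces the population mean of Y, which is why a strong positive correlation between the two variables is what makes it attractive.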

Importance of the Problem
In applications, obtaining information from an additional variable is often costly, but ranking the observations according to it is relatively easy.
We assume that the ranked set sampling (RSS) design can be used to improve the accuracy of estimation relative to simple random sampling (SRS), while keeping the cost or time of sampling within the same limits. RSS was first applied by McIntyre (1952) in his study of the estimation of mean pasture yields, and was later developed by other statisticians.
In this paper we consider that, owing to the relationship between $X$ and $Y$, RSS can be used both to estimate and to classify. Singh et al. (2010) proposed a class of ratio type estimators, which our proposal extends. We expect the extended estimators to be more accurate because both the ratio type estimator and the RSS design reduce the sampling error.
Therefore, our objective is to propose new ratio estimators of the population mean based on an RSS design, to compare them with the modified ratio estimators proposed by Singh et al. (2010) and Al-Omari et al. (2008), and to establish the gain in precision through the relative efficiency (RE) of the estimators.

Initial Definitions
Let $\bar{X}$ denote the population mean of the auxiliary variable $X$. As we know $X_1, \dots, X_N$ and their distribution, we are able to determine $X_{\max} = \max\{X_1, \dots, X_N\}$, $X_{\min} = \min\{X_1, \dots, X_N\}$ and the needed quartiles. We will consider only the first and third quartiles of the auxiliary variable $X$, denoted by $Q_1$ and $Q_3$. These values are perfectly known before sampling. Once a sample of size $n$ is selected using SRSWR, we are able to rank its units using the known values of $X$ before measuring $Y$. Hence ranked set sampling (RSS) can be used as an alternative sampling design.
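The quantities that are known before sampling can be computed directly from the frame. The sketch below assumes a small hypothetical dictionary of auxiliary values indexed by unit label; the function names are illustrative, and the quartile convention is the default of Python's `statistics.quantiles`, which may differ from the one intended in the paper.

```python
import statistics

def auxiliary_summary(x_values):
    """Summaries of the auxiliary variable that are known before sampling."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(x_values, n=4)
    return {
        "mean": statistics.fmean(x_values),
        "min": min(x_values),
        "max": max(x_values),
        "q1": q1,
        "q3": q3,
    }

def rank_by_auxiliary(unit_ids, x_of):
    """Order sampled unit labels by their known auxiliary value,
    without measuring the variable of interest."""
    return sorted(unit_ids, key=lambda j: x_of[j])

# Hypothetical frame: unit label -> known value of X.
x_known = {1: 40.0, 2: 12.5, 3: 77.0, 4: 55.0, 5: 31.0, 6: 68.0, 7: 20.0}

summary = auxiliary_summary(list(x_known.values()))
ranked = rank_by_auxiliary([3, 1, 7], x_known)  # sampled units in increasing X
print(summary)
print(ranked)
```

Ranking uses only the known values of X, so it adds no measurement cost for Y.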

Methodology
This research is both theoretical and applied: new modified ratio estimators are proposed, and their approximate variance and bias are derived using a Taylor series expansion. Once the approximate variance is obtained, the mean squared error (MSE) of each estimator is found. To compare two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$, their MSEs are compared through the quotient $RE(\hat{\theta}_1, \hat{\theta}_2) = MSE(\hat{\theta}_1)/MSE(\hat{\theta}_2)$.
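The comparison by quotient can be sketched numerically as follows. The helper names and the toy values are assumptions for illustration; in particular, placing the MSE of the first estimator in the numerator is a convention assumed here, so that a value above 1 favours the second estimator.

```python
def mse(estimates, true_value):
    """Empirical mean squared error of repeated estimates of one parameter."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(estimates_1, estimates_2, true_value):
    """Quotient MSE(estimator 1) / MSE(estimator 2).
    Values above 1 indicate that estimator 2 has the smaller MSE."""
    return mse(estimates_1, true_value) / mse(estimates_2, true_value)

# Toy replicated estimates of a parameter whose true value is 2.
replicates_1 = [0.0, 4.0]   # squared errors 4 and 4
replicates_2 = [1.0, 3.0]   # squared errors 1 and 1
re = relative_efficiency(replicates_1, replicates_2, 2.0)
print(re)
```

In a simulation study the two lists would hold the estimates obtained over many replicated samples drawn under each sampling strategy.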
Theorem: To the first degree of approximation
Proof: See Singh et al. (2010) and Al-Omari et al. (2008).
The approximate mean squared errors (MSE) are given by
Theorem: To the first degree of approximation
Proof: See Singh et al. (2010) and Bouza and Al-Omari (2011).
From the transformation it is derived that the bias is given, approximately, by and the MSE by
Another transformation is based on and the ratio type estimator derived using it is , its bias and MSE are approximately,

RSS Estimators
Ranked set sampling (RSS) was first suggested by McIntyre (1952) for estimating the population mean of forage yields. He claimed that it was a more efficient method than the commonly used simple random sampling (SRS).
The ranked set sampling method can be described as follows. The theoretical frame that permits the use of the RSS model is based on the following hypotheses:
i. We wish to enumerate the measurable variable $Y$.
ii. The units can be ordered linearly without ties.
iii. Any sample $s \subset U$ of size $m$ can be enumerated.
iv. Identifying a unit, ordering the units in $s$ and enumerating them is less costly than evaluating $\{y_j, j \in s\}$ or ordering $U$.
In survey sampling settings it is logical to rank the units based on the values of an auxiliary variable correlated with the variable of interest. The basic RSS procedure is the following:
Step 1: Randomly select $m^2$ units from the target population. These units are randomly allocated into $m$ sets, each of size $m$.
Step 2: The $m$ units of each set are ranked visually or by any inexpensive method, for instance using the auxiliary variable $X$, with respect to the variable of interest $Y$.
Step 3: From the first set of $m$ units, the smallest ranked unit is measured; from the second set of $m$ units, the second smallest ranked unit is measured.
Step 4: Continue until the $m$th smallest unit (the largest) is measured from the last set.
Step 5: Repeat the whole process for the required number of cycles.
Step 6: Evaluate the corresponding units.
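The selection steps of the basic RSS procedure can be sketched as below. This is a minimal illustration under stated assumptions: units are hypothetical `(x, y)` pairs, ranking is done on the auxiliary value `x`, units are drawn without replacement within a cycle (so a unit may reappear in a later cycle), and the function name `ranked_set_sample` and its parameters are not taken from the paper.

```python
import random

def ranked_set_sample(units, rank_key, m, cycles, rng):
    """Select m * cycles units by the basic RSS procedure.

    In each cycle, m*m units are drawn and split at random into m sets of
    size m; each set is ranked with rank_key, and the unit of rank i is
    retained from the i-th set.
    """
    selected = []
    for _ in range(cycles):
        # rng.sample returns the units in random order, so consecutive
        # blocks of size m form a random allocation into m sets.
        drawn = rng.sample(units, m * m)
        for i in range(m):
            ranked = sorted(drawn[i * m:(i + 1) * m], key=rank_key)
            selected.append(ranked[i])   # i-th smallest of the i-th set
    return selected

# Hypothetical population of (x, y) pairs with Y correlated with X.
rng = random.Random(42)
population = []
for _ in range(200):
    x = rng.uniform(0, 100)
    population.append((x, 1.5 * x + rng.gauss(0, 10)))

rss = ranked_set_sample(population, rank_key=lambda u: u[0], m=3, cycles=4, rng=rng)
y_bar_rss = sum(y for _, y in rss) / len(rss)   # only these units need Y measured
print(len(rss), round(y_bar_rss, 2))
```

Only the retained units require measurement of Y; the remaining drawn units are used for ranking alone, which is what keeps the measurement cost equal to that of a simple random sample of the same final size.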
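Takahasi and Wakimoto (1968) provided the mathematical theory of RSS and showed that the RSS sample mean is an unbiased estimator of the population mean whose variance does not exceed that of the SRS mean based on the same number of measured units. Without loss of generality, we drop the sample size from the notation in the sequel when it provides no further information.
Note that with a single cycle we observe only an RSS of size $m$.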
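For reference, the standard form of the Takahasi and Wakimoto result can be written as follows. The notation is an assumption made here, since the original display is not available: $r$ denotes the number of cycles, $y_{(i:m)k}$ the measured unit of rank $i$ in cycle $k$, $\mu_Y$ and $\sigma_Y^2$ the population mean and variance of $Y$, and $\mu_{(i:m)}$ the expectation of the $i$-th order statistic in a set of size $m$.

```latex
\bar{y}_{rss} = \frac{1}{mr}\sum_{k=1}^{r}\sum_{i=1}^{m} y_{(i:m)k},
\qquad
E\left(\bar{y}_{rss}\right) = \mu_Y,
\qquad
V\left(\bar{y}_{rss}\right)
  = \frac{\sigma_Y^2}{mr}
  - \frac{1}{m^2 r}\sum_{i=1}^{m}\left(\mu_{(i:m)} - \mu_Y\right)^2
  \le \frac{\sigma_Y^2}{mr}.
```

The subtracted term is non-negative, so the RSS mean is never less precise than the SRS mean based on $mr$ measured units.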
In the Taylor series expansions that follow, we consider the terms of order larger than 2 to be negligible.

These results support the following proposition.
Proposition. Consider the use of an RSS design and the corresponding ratio type estimator. A sufficient condition for the last term of the bias of the RSS-based estimator to be smaller than that of its SRSWR-based counterpart is that the associated quantity is positive. In such cases the RSS-based estimator is more precise than the SRSWR-based one.
Similar remarks apply to the remaining estimators, taking into account that $X_{\max} > X_{\min}$. A recommendation is to use an auxiliary variable with a large range $X_{\max} - X_{\min}$.
Let us look for the RSS version of the next estimator.
Proposition. Consider the use of an RSS design and the proposed estimator. Its bias and MSE are obtained from the Taylor series expansion. Under the hypothesis that ensures that the previous RSS-based estimator is more accurate than its SRSWR-based counterpart, the same result holds for this estimator with respect to its own counterpart.

A Numerical Study
We evaluated the behavior of the analyzed estimators in terms of their derived MSEs. We used data from a study of the leaching of elements from solid waste compost. The samples were prepared from multiple grab samples, using coning and quartering methods. The compost was collected from hospitals. The particles were mechanically separated and passed through a fine sieve. Each batch sent for burning was evaluated in terms of its estimated toxicity.
This estimation is made from a sample analyzed by a laboratory. A score in the range 0-100 was given to each batch, and this score was used for ranking. For the experiment, a sensor was placed in the chimney to measure the content of lead, magnesium and cadmium in the smoke; the rest was classified as "other contaminants". The measurement was made for each batch introduced into the furnaces. The study lasted six months, during which 1678 batches were evaluated.
When the relative efficiency is larger than 1, the sampling strategy whose mean squared error appears in the denominator is preferred. The results are reported in Tables 1-4. In all cases the RSS estimators were more efficient. The estimator providing the largest gain in accuracy varied with the contaminant: one for lead, another for magnesium, and a third for cadmium and for the rest of the contaminants.

Conclusion
According to the results of the numerical study, all the ratio estimators constructed from an RSS design are at least as efficient as those proposed by Singh et al. (2010) and Al-Omari et al. (2008). We therefore recommend the use of the proposed estimators: they do not increase the cost or time of sampling in the estimation phase, and their mean squared errors are lower than those obtained by direct estimation or by the modified ratio type estimators for simple random sampling discussed in the present study. We consider that, owing to the relationship between $X$ and $Y$, the RSS design can be used to estimate as well as to classify. The proposed ratio type estimators extend those of Singh et al. (2010), and the extension proved to be more accurate because of the reduction in sampling error brought by both the ratio type estimators and the RSS design.
Developing a similar analysis, we have the following proposition. Consider the use of an RSS design.

proposed the use of the transformed variables
are larger than the corresponding terms of the MSE of the RSS-based estimator. The last term can be rewritten, as can the bias of the SRSWR-based estimator. The first two terms of the bias of the RSS-based estimator are smaller than the corresponding ones of the SRSWR-based estimator.

Table 1. Relative efficiency of the estimators for lead

Table 2. Relative efficiency of the estimators for magnesium

Table 3. Relative efficiency of the estimators for cadmium

Table 4. Relative efficiency of the estimators for other contaminants