A Classroom Approach to Illustrate Transformation and Bootstrap Confidence Interval Techniques Using the Poisson Distribution

The Poisson distribution is here used to illustrate transformation and bootstrap techniques in order to construct a confidence interval for a mean. A comparison is made between the derived intervals and the Wald and score confidence intervals. The discussion takes place in a classroom, where the teacher and the students have previously discussed and evaluated the Wald and score confidence intervals. While step by step interactively getting acquainted with new techniques, the students will learn about the effects of e.g. bias and asymmetry and ways of dealing with such phenomena. The primary purpose of this teacher-student communication is therefore not to find the best possible interval estimator for this particular case, but rather to provide a study displaying a teacher and her/his students interacting with each other in an efficient and rewarding way. The teacher has a strategy of encouraging the students to take initiatives. This is accomplished by providing the necessary background of the problem and some underlying theory after which the students are confronted with questions and problem solving. From this the learning process starts. The teacher has to be flexible according to how the students react. The students are supposed to have studied mathematical statistics for at least two semesters.


Introduction
In this paper we will illustrate two different techniques for confidence interval construction from a teaching perspective, using the Poisson mean as the target parameter.We start with variance-stabilizing and symmetrizing transformations, where the Delta method is the main tool.Nonparametric bootstrap will be the next topic, where we concentrate on the basic and percentile intervals and some modifications thereof.
Papers involving transformation and bootstrap intervals for a Poisson mean include (Barker, 2002, Swift, 2009, Patil & Rajarshi, 2010).Chapter 4 in (DasGupta, 2008) is recommended, especially on the topic of transformation methods in general.
The primary aim is to show how you can teach these two types of techniques using a very simple setup.The students learn about quite complex reasoning in a straightforward way, due to the simplicity of the Poisson distribution.Furthermore, we will be able to compare performances of various interval estimators.From his own experience the author believes that using two seemingly disparate methods such as transformations and the bootstrap on the same problem have great advantages.Usually theory and methods are first presented and the results are applied on data (real or simulated).Then one moves on to additional methods which are applied to other types of data and so on.Students are often having trouble choosing which method to apply in a given situation, since they have been presented to many combinations of methods and structures of data.Keeping the problem fixed, as in this presentation, encourages students to concentrate on the methods at hand, including comparisons of their performances.
While introducing the theory step by step, students will be given tasks in groups to work on and the results are discussed in the classroom with immediate feedback from the teacher.On the one hand the teacher must at all times be able to slow down or even go back if the students seem to get into trouble, while on the other hand he/she should allow for and encourage unexpected initiatives and thoughts, which may lead to activities not planned beforehand.The latter situation symbolizes the strong connection between teaching and research: from what you have learned and are learning thoughts can evolve, leading to new ideas.
We will follow the teacher and the students during five days of lectures in a Master's level statistical inference course with emphasis on asymptotically motivated methods.The students are supposed to be familiar with statistical inference at the level of e.g.(Casella & Berger 2002, Hogg, McKean & Craig, 2005).It is further assumed that previously the students have investigated properties of confidence intervals for a Poisson mean based on the Wald and score statistics, see (Andersson, 2015).They found that for those values of the mean that we studied, the score interval outperformed the Wald interval.Since then the students have spent yet another semester on studies in probability and statistical inference, so it seems a good idea to explore some other methods in order to construct confidence intervals.
Remark: Whenever it says Student, it means a student representing one of the groups the students are divided into.

The First Day
Teacher addressing the students: We will start by considering the first three moments of the Wald and score statistics.Ideally, for a pivotal statistic Z in this situation: E(Z) = 0, V(Z) = E(Z 2 ) = 1 and E(Z 3 ) = 0.In our previous investigation we saw that for X Poisson distributed with the mean µ, the Wald statistic had a negative bias and a variance larger than one and furthermore possessed a substantial negative skewness.
The score statistic on the other hand, is unbiased with a unit variance and a positive skewness, which in absolute value turned out to be smaller than the skewness for the Wald statistic.
Generally, for a statistic to be really useful for inferential purposes using e.g. the normal distribution, we have at our disposal methods which deal with the following three problems: bias, non-constant variance and skewness.The second problem can be handled by variance-stabilizing transformations and the third by symmetrizing transformations and on top of this we can also adjust for bias, as it turns out.Now, before we look into these particular methods, and others, we will increase the level of information in the previous hypothetical example.Before we had 28 observed impulses (passing vehicles on a road) during one hour.Suppose that we partition the one-hour-period into ten six-minute intervals and further suppose that in each of these intervals we will get to know the number of impulses.(This will actually not be necessary for the subsequent transformation based intervals.)So, again we may assume, at least as an approximation, that we have a homogenous Poisson process in each interval with unknown intensity.
The additional information is the following sequence of observed impulses for the six-minute intervals: 2, 4, 0, 3, 5, 3, 1, 3, 4, 3. Our primary target is the expected number of impulses in one such interval, which we denote by θ.If we further let X 1 , . . ., X 10 denote the number of impulses in the intervals, then, of course, ∑ 10 i=1 X i has the expected value µ = 10 θ.Thus a natural estimator of θ is θ = X and μ = 10 θ.
In general, for an observed sequence x 1 , . . ., x n the 100(1 − α)% Wald interval for θ with observed x is and the corresponding score interval We derived these intervals by inverting the statistics (1) and (2), respectively.
In the search for alternative intervals, let us first look at a variance-stabilizing transformation (VST).As it turns out, we will have good use for the Delta Method, which you encountered in a previous probability course.We start assuming that for an arbitrary sequence, say, T n , we have the following convergence in distribution: and if we have a function g, where the derivative g ′ (θ) 0, then We are now going to use this result in a kind of "backward" way.As you can see, the asymptotic variance of T n is here allowed to depend on θ and if T n is to be used for the inference of θ , we would like to find a function of T n which instead has a (known) constant asymptotic variance.
Until the break you will work in groups trying to find a suitable function of T n .

After the Break
Student: We think you mean that g ′ (θ) 2 σ 2 (θ) = c 2 (a constant) and a primitive function solving this differential equation is c ∫ 1 σ(θ) dθ.T n = X here?Teacher: Yes, and formally we consider the iid sequence X 1 , X 2 , . . .X n , where X i ∼ Poi(θ), i = 1, 2, . . .n and from the Central Limit Theorem: We are not sure though what is to be done with c.
Student: The asymptotic variance of θ ) 2 θ = 1/4 and we have: Teacher: Correct and this particular result can be tracked back to (Bartlett, 1936).
Actually, now you have done the hard part of the job and from here we can construct an approximate confidence interval for √ θ and then it is quite straightforward to derive an interval for θ.
appr ∼ N(0, 1), so a 100(1 − α)% two-sided symmetric confidence interval for √ θ using the observed x is and the corresponding VST interval for θ is Comment from a student: This looks quite different from the Wald and score intervals!Teacher: Well, not if we expand the squares!Then we arrive at a kind of mixture between the Wald and score intervals.Anyway, with α = 0.05, n = 10 and x = 2.8, the interval ( 5) is (1.86, 3.93), so here we get the interval (18.6, 39.3) for µ.This can be compared with the Wald (17.6,38.4) and score (19.4,40.5).
Question from a student: When you earlier mentioned that we can correct for bias, I was not sure of what you meant.Bias of what?
Teacher: I was alluding to √ X and the fact that E( √ X) ≈ √ θ.But before we start looking for an estimator of θ, which is, at least, closer to being unbiased, we should also pay attention to the asymptotic variance 1/4.(Anscombe, 1948) found that for a Poisson distributed X the variance of √ X + 3 8 is closer to 1/4 than the variance of √ X is.We therefore say that √ X + 3 8 is a variance corrected transformation (an approximation of a higher order).
In our case X = ∑ n i=1 X i and µ = nθ and from this we construct the pivotal 2 ) The variance corrected VST confidence interval for θ is then It can further be shown that a better estimator of √ θ with respect to bias is have variance 1/4, so the resulting bias corrected interval for θ using √ X + 1 4n is?Until next break you will work in groups again to come up with a suggested interval.

After the Break
Student: We started with the pivotal 2 √ n ) From this we got the bias corrected VST interval Teacher: Yes, I agree with you.Others actually advocate using the pivotal ) Anyway, these last three intervals should numerically be close to each other and the 95% intervals for µ for our data using (5), ( 6) and ( 7) are (18.6,39.3), (18.5, 39.4) and (18.8, 39.6), respectively.
Theoretically we arrive at an important conclusion here, namely that we cannot find an "optimal" choice of an estimator of √ θ, in terms of reducing bias and at the same time attaining an "accurate" variance.Now let us look at the other transformation method, that of symmetrizing.Here we also have use for the Delta Method and the aim is to find a function g( X) with a distribution which is as symmetric as possible.Without explanation of the technical details involved, the result is that we should use g( X) = X2/3 and this is brought up in (DasGupta, 2008).What is then the resulting confidence interval of θ?
I suggest that you go back to your respective groups again and work on this for a few minutes.

After 10 Minutes
Student: Our group did not arrive at a resulting interval, but I will present what we have agreed upon.
But this is not a confidence interval!
Teacher: I think we will call it a day and meet again tomorrow, but I encourage you to continue to work on this problem until then.

The Second Day
Student: We have been working on the confidence interval based on X2/3 and we found that the values of β = θ 1/3 , for which the following function f (β) is negative, constitute the interval for θ 1/3 (and correspondingly, the values of θ for which the function is negative, yield the interval for θ): Teacher: What has your group been doing?Let me guess, you have worked on (8) in the same way as when you derived the score interval?Actually this was not what I had in mind.But please go on.
Student: We started with the assumption that the pivotal √ n constitute a 100(1 − α)% approximate confidence interval of θ, which we may call "score" symm T? Elaborating on this inequality, we found that it is equivalent to and letting β = θ 1/3 , We got f (β).We thought it was kind of cool, putting f (β) = 0, to end up with a situation where you have to solve a fourth-order equation: a quartic!Teacher: I agree that it is quite stimulating.In your case, are you sure that the resulting confidence set for θ is really an interval?
Student: No, we have actually not checked if we generally get an interval for θ out of (9).We were not sure if it is worthwhile to derive the solutions explicitly of the fourth-order equation generally, but we thought we could look at f (β) for n = 10, X = 2.8 and α = 0.05 and see how it behaves.
Teacher: Absolutely, and I suspect that you have already done this!
Student: But of course!Here is a plot (Figure 1) and we have done it for values of β ranging from 0 to 2.
Teacher: Apparently you get an interval here for β = θ 1/3 and I guess that from the equation f (β) = 0, two of the roots will be complex.
Student: Yes, and the two real-valued roots are 1.2358 and 1.5832 (using MATLAB).So, in terms of θ we obtain the interval (1.89, 3.97) and for µ: (18.9, 39.7).This is somewhat closer to the score interval (19.4,40.7) than the interval obtained from symmetrization.
Teacher: Quite right.Do we have an alternative solution from some other group?
Student: Maybe we have been taking an easier way out of this, arguing that X is a consistent estimator of θ, replacing θ 1/6 by X1/6 , finally arriving at the following Wald "symm T" interval for θ: This interval (10) yields that the corresponding 95% interval for µ is (18.3, 39.0).

Teacher:
The result is somewhat closer to the obtained Wald interval (17.6, 38.4), than the intervals using the variance stabilizing transformation.The "Wald solution" as you call it, can be found in (Brown, Cai & DasGupta, 2014).
Good work from both groups and from what I have seen, the rest of the groups have arrived at (10), which was what I had expected from you.Though the "score solution" was a bonus!Now I think it is about time to sum up our findings and make a table of the computed intervals.(18.8, 39.6) "Wald" symm T (18.3, 39.0) "score" symm T (18.9, 39.7) The bias and variance corrections make the VST intervals come closer to the score interval and the same holds for the score-type symmetrized interval.Now I would like to move on to another approach of estimation: the bootstrap.
We will consider both the parametric and non-parametric bootstrap.In this situation the former could be used as a check to whether the assumption of a Poisson process is valid or not.
Comment from a student: Everything we have done so far seems to depend on that assumption!From what I understand, non-parametric bootstrap does not rely on distributional assumptions, but we have not learnt how to construct confidence intervals using the bootstrap.
Teacher: We will get to that soon, but first we have the issue of checking the validity of the (homogenous) Poisson process assumption.
Let us look at an example from (Casella & Berger, 2002).We will compare x with s 2 and see if they differ much.They should, at least, be close to one another, if the Poisson assumption is true, since they are both estimators of µ.In our case x = 2.8 and s 2 ≃ 2.2.We can then mimic what they have done in that example and check if s 2 = 2.2 is an "unlikely" value when θ = 2.8.In other words, we find the probability distribution of S 2 when θ = 2.8.So, for tomorrow you will (group-wise) generate 1000 Poisson random samples of size 10, where each value is the outcome of a Poisson distributed random variable with θ = 2.8.Then you will compute, for each sample, S 2 and draw a histogram to find the approximate distribution of S 2 .

The Third Day
Student: This is our histogram of S 2 .The Poisson process assumption seems valid here since the value 2.2 is far from being an outlier.
Teacher: Yes, your observed sequence of impulses certainly does not contradict this assumption.As you already know, in the non-parametric bootstrap we take resamples from the original sample X 1 , . . ., X n under simple random sampling with replacement.For each such bootstrap sample X * 1 , . . ., X * n , we compute the value of some statistic.Here θ * = X * and we use this to "obtain" the distribution of θ = X.One way of constructing a confidence interval for θ is simply by finding the corresponding percentiles of the empirical distribution of θ * .This is described in e.g.(Hogg, McKean & Craig, 2005).
If we let the percentiles of this distribution be denoted by a * α/2 and a * 1−α/2 , the interval for θ is simply This is the so called percentile bootstrap confidence interval.Actually, this could also be achieved based on your generated Poisson samples from a histogram of X, as an example of the parametric bootstrap.The other approach, yielding the "basic" bootstrap confidence interval, is to start from the assumption that the distribution of θ − θ is the same as that of θ * − θ, i.e. θ * relates to θ in the same way as θ relates to the true value θ.How should we proceed from this?
Work in groups until the break!

After the Break
Student: It seemed a bit tough to have three components to keep track of: θ, θ and θ * , but what you said seems to mean that P(a Teacher: Yes!
The percentiles a * α/2 and a * 1−α/2 from the histogram of θ * are such that but that got us nowhere!
Teacher: If we combine this with what you have in ( 12) and also put those probabilities to 1 − α, we get and the resulting confidence interval is which looks quite different from (11)!
Teacher: Yes, but there are situations when they produce similar endpoints.If we put a * α/2 = 2 θ − a * 1−α/2 , we arrive at and we get the same result starting with the upper bound.What does this mean?
Student: Symmetry!Teacher: Yes, the empirical distribution of θ * tries to capture the distribution of θ and ( 14) implies that this distribution is symmetric.In this case we know that X is skewed, so these intervals should produce different results.However, as (Navidi, 2006) has put it: "...which is better is a matter of some controversy".
There is a command in MATLAB yielding bootstrap resamples.For tomorrow, use that to compute the two bootstrap intervals.

The Fourth Day
Student: We simulated 10 000 bootstrap resamples.So, the approximate 95% percentile confidence interval for µ was (19.0, 36.0) and the corresponding basic interval (20.0, 37.0).They have the same length and are shorter than all the previous intervals.We became curious enough to carry out a simulation study in order to find out the coverage properties of these intervals.
Teacher: Good, I am curious too!Student: We wanted to be able to compare the results with the earlier simulation study for the Wald and score intervals.Therefore, we chose θ = 4 and n = 10 (as in our hypothetical example), so µ = 40.The results were not good for the bootstrap intervals with coverage rates 91.3% (percentile) and 90.4% (basic).10 000 Poisson samples were drawn and for each sample 1000 bootstrap resamples were generated.For reference, the coverage rate for the Wald interval was 94.1%.
We guessed that n = 10 is not enough for the bootstrap method?
Teacher: No, but you could try θ = 2 and n = 20 and θ = 1 combined with n = 40.This will improve the results for the bootstrap method, while not affecting the Wald interval.But before you do this I would like to throw in yet another bootstrap interval!We could start by considering the pivotal for each bootstrap resample.The empirical distribution of t * then gives us percentiles t * α/2 and t * 1−α/2 (t * not necessarily symmetric around 0). From this, what will the resulting confidence interval be?I will give you before the next break to think about this in your groups.

After the Break
Student: In this situation we assume that the distribution of Teacher: Quite right, but you have actually assumed that the underlying distribution is the Poisson, since you let se( θ) = X/ √ n, so you get a kind of mixture between the non-parametric and parametric bootstrap.You might call this a "plug-in" estimator and the result is called a bootstrap t-interval, This is brought up in e.g.(Efron & Tibshirani, 1993), Efron being the inventor of the bootstrap.For a pure non-parametric approach, you should have used se( θ) = S / √ n.
Student: You are right of course, but we also thought about trying a "score" bootstrap interval.Since we know that the score interval is at least an improvement of the Wald interval, we could instead start with and arrive at Then it is just about finding those values of θ for which Teacher: Good idea.All groups can work on this for a while.
After 20 minutes: Student: We started with the "observed" pivotal which is a decreasing function of θ, so we should get the lower limit from letting and the upper limit from Teacher: Sound reasoning!

Student:
We put β = √ θ in both cases and in the first we got Only the positive root is possible here and From this the upper limit must be x Teacher: Correct!And now having all these intervals we should summarize our findings.I want you to perform simulations to study coverage and non-coverage rates for the seven types of intervals included in Table 1 and the four bootstrap intervals: percentile, basic, t and the last proposal, which we can name "score".Draw 10 000 Poisson samples, all of size n = 10.For the bootstrap intervals 1000 resamples should be drawn.Put the nominal confidence level to 95%.Let further θ = 1, 1.5, 4 and 10.
Good luck!

The Fifth Day
Student: We have put our results in a table.
Teacher: This will be interesting to study.What are your comments on the results in your table?Student: Compared with the transformation and bootstrap based intervals, the "old" score interval is still very competitive.The transformation based intervals also perform well and actually yield identical results except for the smallest value of θ.
As we saw earlier, the percentile and basic bootstrap intervals are not accurate with respect to either coverage or non-coverage.This goes for the other two bootstrap intervals as well, but the "score" interval is, at least, well balanced for all values of θ.Furthermore we have, as you suggested, tested other combinations of n and θ corresponding to µ = 40.The results improve for increasing n, but the coverage rates are still below the nominal level (Table 3).Teacher: Nearly all of these bootstrap intervals do indeed have coverage rates below 95%.Having observed this it should also be pointed out that we will obtain increased coverage rates using the parametric bootstrap, where the resamples are generated from the Poisson distribution, see e.g.(Patil & Rajarshi, 2010).A very short summary of your results is that the "good old" score interval is the most preferable of all the intervals you have studied in terms of maximizing the coverage rate.For some cases it is clearly conservative though.
As you have shown empirically, the differences between the transformation based intervals with respect to coverage and non-coverages are practically nil.If we compare the score interval 4n with the bias corrected VST interval we find that the mid-point of the score interval is shifted away (to the right) from x almost twice as far as for the bias corrected VST interval.From the results this seems to increase the coverage and yield more balanced non-coverages.
Another comment: In our study we have not compared lenghts of intervals, but from the interval expressions we may observe that the score interval is slightly longer than the transformation based intervals, see e.g.(Barker, 2002).
By the way, we have the Bayesian approach left to investigate for this situation.Where are you going?
Student: We do have other classes, you know!

Summary
The Poisson process/distribution was used to illustrate various techniques for construction of confidence intervals in an educational context.The students will learn more about the properties of the Poisson distribution and about some of the many options available to obtain confidence intervals of the mean.Most important though is to convey to the students a scientific and analytic way of thinking.We started with an explicit problem and discovered that the standard method did not lead to satisfactory results.We then tried to find out why the method failed and started looking for alternative approaches, which means that we need more advanced tools, such as the Delta Mehod.From a theoretical point of view the discussion presented is not very far from the research front.Generally, as commented in (DasGupta, 2008) on page 59: "No studies seem to have been made that compare bias-corrected VSTs to bias-corrected symmetrizing transforms."

Figure 2 .
Figure 2. Histogram of S 2 for 1000 random samples from a Poisson distribution with θ = 2.8.

Table 1 .
Wald, Score and Transformation Intervals for µ with x = 2.8.

Table 2 .
Coverage (CR) and Non-Coverage (lower: LNCR, upper: UNCR) rates for the Wald, Score, Transformation and Bootstrap Intervals with Nominal Confidence Level 95%