Assessment of Results Produced by an Algorithm Delivering Earthquake Epi- & Hypocenters Concurrently and in Real-Time

This paper suggests a means by which confidence limits may be placed around the Epicenter and Hypocenter values that are evolved, concurrently and in real-time, on the detection of an increasing number of P-wave first arrivals, by the present algorithm. As described in previous papers, this algorithm is table-driven, in that it uses an "interpolative tabular scanning process" to deliver its results. For this purpose a set of three main tables is provided: a table of travel times for rays originating from a graduated set of depth points to a given set of colatitudes; a table of take-off angles corresponding to those travel times; and a set of calibrating tables to correct numerical error found in the table-generating process itself. These tables are generated by any of a set of point-to-point (P2P) ray tracers, parameterized by any of a set of radial Earth velocity models {PREM, iasp91, ak135}. The production of the confidence limits depends on considering that each value (i.e. each discovered Epicenter and Hypocenter) is stationary, and subject to perturbation by error. In brief, the error is considered ultimately to be normal. Therefore normal theory is used, as in Chauvenet's test, to screen the production of individual localizations for outlying data inputs; subsequently it is used to monitor for outliers in the set of localizations themselves. Use is then made of the t-variate ("Student's t") to dynamically establish confidence limits, of varying levels of significance, about regressions on the set of localizations as this set increases in real-time. The upshot is that the algorithm can produce successive localizations of the Earthquake as data input (P-wave onset times) arises, and at the same time monitor the accuracy and integrity of the solutions.


Aims, Uses and Properties
The aim of this paper is to demonstrate that:
• the algorithm structure described below will produce results in close accord with the received values;
• this overall structure is capable of assessing the error (to given levels of significance) in its output.
The authors feel that the algorithm could be employed as a rapid and lightweight front-end for possible Earthquake Early Warning (EEW) systems. By using incoming data to rapidly evolve solutions for

• Epicenter and Hypocenter coordinates
• Take-off angles

for events in real-time, there can evolve, from an initial small set of onsets, localizations of increasing accuracy. It must be emphasized that only the timings of P-wave first arrivals are initially required from the activated stations, and this, being one of the clearest observations an automatic energy-onset detection system can make, makes the algorithm a prime candidate for inclusion in the overall software architecture of a centralized facility supporting EEW.
The build-up to this algorithm is given in detail in the references from (Daglish G. R. & Sizov Yu. P. 2011) to (Daglish G. R. & Sizov Yu. P. 2014, 2ECEES). In (Daglish G. R. & Sizov Yu. P. 2013) there is a short explanation of how the algorithm branches out from the main progression, being table-driven and dependent on pre-calculated tables for use in an "interpolative tabular scan". As stated, these tables are 5-fold:
1. Travel times for rays from each depth point in a set of depth points to a set of specific colatitudes
2. Take-off angles for each ray so treated
3. Calibration data for later interpolation, to correct error due to the ray tracer's occasional non-convergence
4. Accuracy data monitoring the degree of convergence of each ray traced
5. Path length of each ray
These five tables are also accompanied by a Log of non-convergence, which lists those points at which the P2P ray tracers did not achieve the required accuracy.
The timing for the primary (see below) algorithm is ~0.7 s/station for a set of 4 to 8 stations; ~1.3 s/station for a set of up to 30 stations and ~2.15 s/station for a set of up to 45 stations, running on a 3.2 GHz processor.

Data Structures and Modus Operandi
The theory alluded to in the Abstract, above, is implemented in the routine:

ETTST 16.cpp
This routine will evolve a set of sequential solutions, or localisations, as it accrues data from elements in the network surrounding the event. Each localization is accompanied by a vector of information; this set of vectors, produced sequentially, is destined for the output file:

ABTrackedData.txt
As stated, the main point of the Algorithm is to produce an Epicenter/Hypocenter pair together with an error assessment; in the above vector definition we can see that the output fulfils this. The program ETTST 16.cpp incorporates two scanning algorithms in tandem:
1. A primary algorithm, which produces the primary Epicenter/Hypocenter pair.
2. A subsequent algorithm, which takes the Epicenter from the afore-mentioned pair and, knowing this, performs a scan for a secondary Hypocenter. The two Hypocenters thus found tend to agree closely.
This program performs a restricted emulation of the output of an Earthquake event, in that it takes in data (P-wave onset times) from stations (with corresponding station coordinates) in strict sequence, but does not attempt to emulate the real-time buffering that would be necessary if the emulation adhered to the actual time differences between the incoming data items.
The file which contains the input data for the entire process associated with one Earthquake is: The output from ETTST 16.cpp, apart from a narrative to the monitor, goes in the following directions (these data tables apply to a single, unique earthquake):
• HypoCentreResultSQ.txt (all output relating to all the localisations performed, including Take-off angles)
• StationPolarTransformSQ.txt (a resume of the input subsets and traces of each scan)
• Combinations02Sequence.txt (a set of traces of the localisations made on all subsets for this Earthquake)
• ABTrackedData.txt (the set of data vectors for the localisations realised for each subset, structured as above)
• EpiCentrePolar04.txt (the set of Epicentre localisations for each subset in this Earthquake)
The above output is created from the following input, ancillary to the main

Overview of Software Action
The processing of each localization proceeds in 5 main phases (for symbols, see above), beginning with: 1. The production of the next Epicenter/Hypocenter pair. The Chauvenet test takes place in two situations, at two levels: in phase 1, to screen, and eliminate if necessary, data used for the primary algorithm, within that algorithm; and in phases 4 and 5, to screen, and eliminate if necessary, elements of the series of localizations, and to use the resulting map of eliminations in generating the confidence limits. The means whereby the "best estimates" are produced, and how the Chauvenet procedure is implemented, are described at Appendices B and C.
The system is tested by taking each of 7 known Earthquakes and passing their data through the routine ETTST 16.cpp (see above).
Summaries of these known Earthquakes are given at Table 1.
The data used for these trials was obtained using WILBER III (supported by the Incorporated Research Institutions for Seismology, IRIS). Their parameters are tabulated at Table 1.

Algorithm Fine Detail
This section deals with the location procedures mentioned at the outset, and is largely paraphrased and quoted from Daglish G. R. & Sizov Yu. P. 2ECEES 2014. We begin with the scanning process. Although there appear to be two possible ways of scanning within this context:
1. that which makes use of a prior estimate of the coordinates of the event epicenter, here the secondary process;
2. that which makes no such use of prior estimation, here the primary process;
space dictates that only the second, and more important, type will be dealt with in detail here.
The co-latitudes and longitudes of the set of active stations are formed into a set of Cartesian co-ordinates within the Earth space-frame. These are used later in the localisation calculation which derives the Epicentral position.
The next step is the organisation of the set of P-wave first-arrival times into a set of differences forming a fixed "Timing Template". Having formed these two sets of information, the scan commences by:

• Interpolating an entire co-latitudinal row of timings (P-wave first arrivals, from the tabular structure referred to above) for the next depth-point reached in the scan.

• Laterally scanning the fixed "Timing Template" along this interpolated row to generate co-latitudes corresponding to its elements by reverse interpolation.

• Repeating the above two processes for each of the set of depth points which form the scan; the smallest local minimum of the indicator so found is taken to define the Epicentral co-ordinates, ε, and the Hypocentre depth, H_d.
To repeat this in plainer language: the fixed "Timing Template" for the lateral scanning procedure consists of a set of differences

τ_i = T_i − T_0, i = 0, 1, …, n − 1.

The base in time for the lateral scan is defined as T_0, and the template is shifted across the depth-interpolated time row as

t_j = T_0 + j (T_max − T_0) / N, j = 0, …, N.

Here N is the granularity of the scan, and T_0 and T_max are the limiting values of the depth-interpolated row. These new t_j are used to reverse-interpolate to a set of values for colatitude. At each point, τ_i, in the scan the indicator is formed:

σ = log Σ_{i=1}^{n} (ν_i − c_i)².

σ is the logarithm of a χ² variate and is to be minimised. The ν_i are the colatitudes calculated for the trial epicenter, while the c_i are those formed from the time-template. There are n stations. The calculation for the trial epicenter is as follows: at each point in the scan, the colatitudes c_i are used to form radii (as depicted in Figure 1, below). These radii are translated and then subtended from the known station locations. A least-squares calculation for the trial epicentral co-ordinates ensues. γ is the unknown angle giving the immaterial orientation of the frame of the Table to the Earth space-frame. AC is a chord subtended from the point of match, C, to the "epicenter pole" of the Table at A. This chord is then translated and placed with one end at L, which now subtends it as the spherical radius, r_i, of the i-th sphere centred at the i-th station colatitude, L.
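The template shift, reverse interpolation and indicator described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lateral_scan`, the default granularity `N`, and the stand-in used for the trial-epicentre colatitudes ν_i (which in the actual algorithm come from the least-squares step described next) are all assumptions.

```python
import numpy as np

def lateral_scan(row_times, row_colats, template, N=200):
    """Slide the fixed "Timing Template" across one depth-interpolated
    travel-time row and return the minimum of the indicator sigma.

    row_times  : monotonic travel times for this depth point (s)
    row_colats : colatitudes corresponding to row_times (deg)
    template   : onset-time differences tau_i = T_i - T_0 (s)
    N          : scan granularity (assumed default)
    """
    T0, Tmax = row_times[0], row_times[-1]
    best_sigma, best_base = np.inf, None
    for j in range(N + 1):
        base = T0 + j * (Tmax - T0) / N        # shifted scan base t_j
        t = base + template                    # template placed at t_j
        if t[-1] > Tmax:                       # template runs off the row
            break
        # reverse interpolation: timings -> colatitudes c_i
        c = np.interp(t, row_times, row_colats)
        # Placeholder for the trial-epicentre colatitudes nu_i, which in
        # the paper come from the least-squares calculation; the mean of
        # c is used here only to keep the sketch self-contained.
        nu = np.full_like(c, c.mean())
        sigma = np.log(np.sum((nu - c) ** 2))  # indicator to minimise
        if sigma < best_sigma:
            best_sigma, best_base = sigma, base
    return best_sigma, best_base
```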
Once all these radii have been assembled from the matched positions of timings within the template, then the following system of equations is solved for ε = (x, y, z), which will be a possible location for the epicenter within the Earth space-frame:

(x − a_i)² + (y − b_i)² + (z − c_i)² = r_i², i = 1, …, n,
x² + y² + z² = r_ε².

In the above, the (a_i, b_i, c_i) are the coordinates of the station positions within the Earth frame, r_ε is a value for the Earth radius, and n (≥ 3) is the number of stations with which the scan is undertaken. The ε = (x, y, z) is converted to Latitude and Longitude.
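A sphere-intersection system of this kind becomes linear once the first equation is subtracted from the others, after which a least-squares solution is straightforward. The sketch below is a generic trilateration in that spirit; the function name and the omission of the r_ε constraint (here left to the over-determined system) are assumptions, not the paper's code.

```python
import numpy as np

def trilaterate(stations, radii):
    """Least-squares solution of the sphere system
        (x - a_i)^2 + (y - b_i)^2 + (z - c_i)^2 = r_i^2,  n >= 3.
    Subtracting the first equation from the others removes the
    quadratic terms, leaving a linear system solved by least squares.

    stations : sequence of (a_i, b_i, c_i) in the Earth space-frame
    radii    : the spherical radii r_i assembled from the scan
    """
    S = np.asarray(stations, dtype=float)
    r = np.asarray(radii, dtype=float)
    A = 2.0 * (S[0] - S[1:])                       # rows 2*(p_0 - p_i)
    b = (r[1:] ** 2 - r[0] ** 2
         + np.sum(S[0] ** 2) - np.sum(S[1:] ** 2, axis=1))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x                                       # epicentre (x, y, z)
```

The resulting Cartesian point would then be converted to Latitude and Longitude, as the text describes.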

Results
Although there are 7 Earthquakes in the process, only the results of the 7th will be fully presented, since space does not allow for all 7; however, the results for the entire set can be made available on request.
All subsequent figures, 2 to 7, apply to Earthquake 07, Santa Cruz Islands. The sequences of localisations found by this algorithm are generally characterized by an initial "erratic" or "transient" period, followed by a phase when the output stabilizes around a more or less constant set of values. This is shown in Figure 3, which displays the output from the first localization, using 4 stations, to the 12th localization, using 15 inputs.
Figure 2 shows a comparison between the RMS for residuals arising from the final Epicenter calculations and threshold values formed from a percentage of the mean of the radii also involved in the same calculation.It can be seen that the first points (5 to 15) are more erratic than those subsequent to point 15, which remain on a more regular "trajectory".This characterizes the transient phase for this set of localizations.
The zones 5 to 15 in figures 2 and 3 are complementary with the erratic behaviour in 3 corresponding to the disturbed zone in 2.
Hereafter most of the graphs begin at about the level of 15 inputs (i.e. the 12th localization). In fact the calculations establishing the confidence intervals (see Appendix C) do not produce output until they have accumulated 10 points of data to form mean- and sigma-values. Because of this initial wildness of the output, the confidence intervals tend to be wider at the start.
The effect of the Chauvenet rejections (also see Appendix D) is discussed in the next section, "Discussion of Results".

Discussion of Results
The main features to note are that:

• The Hypocenter was delivered at 251.0 ± 8.0 kilometers.

• The distance between the delivered Epicenter and the received value was 1.78 ± 7.89 × 10⁻¹ km. (This delivered Epicenter was estimated by the method given in Appendix A.)

• The confidence intervals are seen to contract severely when the number of rejected localizations peaked at the end of the run (see Figures 6 & 7).
These values are very close to the received values and their Confidence Intervals are narrow.
By consulting Appendix D it can be seen that the evolution of the Confidence Intervals follows more or less closely the gradually increasing number of rejections of localizations by the Chauvenet test.
The results arising from the processing of Earthquake 07 (Santa Cruz Islands), and given here, would appear to show that this algorithm, which concurrently delivers Epi- & Hypocentres by "interpolative tabular scanning", is capable of monitoring the accuracy of its output reliably and in real-time.

This forms a calculation for a point on a sphere, α, as "centroid" to a set of other points, x_i, on the sphere. Here r_ε is the local Earth radius, and ϑ_i is the angle between α and x_i at Earth centre. Considering the arc-lengths subtended by the ϑ_i, namely S_i = r_ε ϑ_i, we seek to minimise the sum Σ_i S_i². Writing f for the vector of arc-lengths and J for the Jacobian of f, and applying the Gauss-Newton iterative method, the adjustment vector at each step is

Δα = −(Jᵀ J)⁻¹ Jᵀ f.

This is considered to have converged when the modulus of the adjustment vector, |Δα|, is less than a given small value, e.g. one meter.
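A minimal Gauss-Newton iteration for the spherical "centroid" just described might look as follows. This is a sketch only: the mean Earth radius, the (lat, lon) parameterisation, the forward-difference Jacobian, and the function names are all illustrative assumptions.

```python
import numpy as np

R = 6371.0  # mean Earth radius in km (assumed for illustration)

def unit(lat, lon):
    """Unit vector for geocentric latitude/longitude in radians."""
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def spherical_centroid(points, tol_km=1e-3, max_iter=50):
    """Gauss-Newton minimisation of sum_i S_i^2, where S_i = R * theta_i
    is the great-circle arc from the trial centroid to point i.

    points : (n, 2) array of (lat, lon) in radians
    """
    P = np.array([unit(la, lo) for la, lo in points])
    a = points.mean(axis=0)                     # initial guess

    def resid(p):
        d = np.clip(P @ unit(*p), -1.0, 1.0)
        return R * np.arccos(d)                 # arc lengths S_i

    for _ in range(max_iter):
        f = resid(a)
        eps = 1e-7
        # forward-difference Jacobian, one column per parameter
        J = np.column_stack([(resid(a + e) - f) / eps
                             for e in np.eye(2) * eps])
        step, *_ = np.linalg.lstsq(J, -f, rcond=None)
        a = a + step
        if R * np.linalg.norm(step) < tol_km:   # ~ metre-level stop
            break
    return a
```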
However, if the convergence is not satisfactory, then it can be accelerated by including the Hessian matrices of the components of f in the calculation of the adjustment vector:

Δα = −(Jᵀ J + Σ_i f_i H_i)⁻¹ Jᵀ f,

where H_i is the Hessian matrix of the i-th component of f.

The Chauvenet Test for Outliers

This simple test consists in comparing a normalized variate against a value taken from a table of values which establishes the outlying limits of the normal distribution, according to the order of the set from which the candidate normalized variate is generated. If the normalized variate is greater than the table value, then it may be judged an outlier.
The value from the table is indexed by n, the order of the set from among whose elements outliers are required to be identified. This is a single-tailed test, and the normalized variate builds up as

h_i = |x_i − x̄| / s,

where x̄ and s are the mean and standard deviation of the set. A more in-depth way of describing this process of disassociation, which is Chauvenet's test, is to state as follows: if the population from which the sample is drawn is normal, then

q_i = P(z ≥ h_i)

represents that proportion of the population that is equal to or exceeds the value h_i. This proportion can be forced to be small enough that h_i represents a margin of probability at and beyond which only smaller and smaller non-integral numbers of sample elements can exist; we can then decide to call h_i, and all values beyond it, outliers. We can say that the expected or probable relative frequency for the occurrence of the value h_i is at most q_0 · n⁻¹. Further, we can solve for z_0 in the inequality n · P(z ≥ z_0) ≤ q_0 for a set of n elements, parameterized by values of q_0, whose order is n, the index of the last localization.
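As a sketch, the screening loop could be implemented as below. Note the paper describes a single-tailed table; this illustration uses the common two-tailed form of Chauvenet's criterion, with the conventional rejection threshold of 1/2 expected occurrences, both of which are assumptions here.

```python
import math

def chauvenet_reject(values):
    """Flag outliers by Chauvenet's criterion: reject x_k when the
    expected number of samples at least as extreme, n * P(|Z| >= h_k)
    with h_k = |x_k - mean| / sigma, falls below 1/2."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    rejected = []
    for k, v in enumerate(values):
        h = abs(v - mean) / sigma
        p_exceed = math.erfc(h / math.sqrt(2.0))   # two-tailed P(|Z| >= h)
        if n * p_exceed < 0.5:
            rejected.append(k)                     # index k is an outlier
    return rejected
```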
The line of regression will have been found as a least-squares fit over the localization index; this regression will be performed anew for each localization as n increases. A "best estimate" of H_d from the set H, representing a value assumed stationary, would be the sample mean, H̄_d, by which the Confidence Interval is given as

I = H̄_d ± t_ν · s / √n.

Here the t-variate is indexed from tables by the degrees of freedom, ν, and the level of significance (probability level) required. In these tests a level of 5% is chosen. Thus the value h has a 95% chance of finding itself within the interval I.
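The "best estimate" and its t-based Confidence Interval can be computed as below; the small critical-value table and the 5% level follow the text, while the function name and the nearest-ν lookup are assumptions.

```python
import math
import statistics

# Two-sided 5% critical t-values for nu = n - 1 degrees of freedom
# (small illustrative table; the paper indexes a fuller one)
T_05 = {9: 2.262, 10: 2.228, 14: 2.145, 19: 2.093, 29: 2.045}

def best_estimate_ci(samples, t_table=T_05):
    """Best estimate (mean) of an assumed-stationary quantity such as
    the hypocentre depth H_d, with its 95% confidence interval
    mean +/- t * s / sqrt(n)."""
    n = len(samples)
    nu = n - 1
    # exact lookup, falling back to the nearest tabulated nu
    t = t_table.get(nu) or min(t_table.items(),
                               key=lambda kv: abs(kv[0] - nu))[1]
    mean = statistics.fmean(samples)
    half = t * statistics.stdev(samples) / math.sqrt(n)
    return mean, (mean - half, mean + half)
```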
The above refers to the "best estimate" and its Confidence Interval. However, we would also like to put a Confidence Interval about the latest entry to the set H.

The tracked data vector for each localization includes, among other fields: the total processing times for the main and secondary algorithms (vid. inf. 28); an RMS comparison for the calculation of ε (the epicenter coordinates); the actual RMS from the final calculation of ε; the reference Epicentre, ε_I, from IRIS; the tracked δε value; the radial discrepancy, δρ, as a percentage of the derived Earth radius (R_ε being an Earth radius local to the Latitude given at field 20); the tracked δρ value; the Latitude (20), Longitude (21) and calculated Earth radius ρ (22) for the calculated ε, as at 16; and the IRIS Cartesian coordinates in the Earth space-frame (23, 24, 25) for ε_I as reference.
2. The production of the ε and the H_d, generating the δε, and also generating the parameters for the secondary algorithm, i.e. the set of relative colatitudes for each active station based on the position of the primary ε. 3. The production of the secondary and consensus H_d from the four interpolation regimes (see above at 1.1). 4. The application of the external Chauvenet procedure to the series of localizations. 5. Taking into account the results of 4 (i.e. the temporarily eliminated elements of these two latter series), the generation of confidence limits. The Chauvenet test (Neville A.M. & Kennedy J.R. 1964) takes place in two situations, at two levels:

Figure 7. Series of "best estimates" of

h is a general set element to be screened for disassociation from H.
33. θ: an angle between the current Tracked version of ε and ε_I, or the prior ε. 34. The length of the spherical arc subtended by θ. 35. A similar arc distance to the above, at 33 and 34, but between

Best Estimate of Epicentre from the Accumulating Set {ε_i}

"Best estimation" and Confidence Interval
The elements of the sample set are deemed to represent a constant value (each could be the same value), and are therefore only perturbed by error in the data and the self-noise in the algorithm. The overall error distribution affecting the values H_d is considered to be normal.

Rejections by Applying the Chauvenet Test
This Appendix lists those localisations which have been rejected as outliers by applying the Chauvenet principle. The first line, for instance, reads: "at localization 11, the results of localizations 1 and 5 were excluded from the calculations leading to the 'best estimates' of whatever variables and their corresponding Confidence Intervals". Such variables would be: