A Log-Density Estimation Methodology Applicable to Massive Bivariate Data

First, it is shown that a univariate bona fide density approximation can be obtained by assuming that the derivative of the logarithm of the density function under consideration is expressible as a rational function or a polynomial. Then, the density function of a bivariate continuous random vector is approximated by standardizing it and applying a polynomial adjustment to the product of the density approximants of the marginal distributions. As well, it is explained that this approach can easily be extended to the estimation of density functions. For illustrative purposes, the proposed methodology is applied to several datasets. Since this technique is solely based on sample moments, it readily lends itself to the modeling of large datasets.


Introduction
We initially consider the problem of approximating the density function of a continuous random variable.Obtaining an accurate density approximation can prove useful when the exact density function of a statistical quantity such as an estimator or a test statistic may not be tractable or have a simple closed form.The flexible methodology that is proposed relies on the moments of the target distribution and can even be utilized to approximate irregular or multimodal density functions.
There exist several types of density estimates and approximants.However, many of these techniques will fail to provide adequate approximations, especially when the target density is not a smooth unimodal function.Silverman (1986) provides a survey of the various available methodologies while (Reid, 1988) focuses on the saddlepoint approximation.Moment-based techniques are described for instance in (Elderton & Johnson, 1969), (Solomon & Stephens, 1978) and (Provost, 2005).Efromovich (1999) presents a unified account of nonparametric approaches to density estimation.Other types of nonparametric density estimates that are based on the L 1 norm are presented in (Devroye, 1985) while both parametric and nonparametric approaches are discussed in (Eggermont, 2001).The multivariate case is extensively treated in (Scott, 2015).
The bivariate density estimation methodology that is introduced in this paper relies on a univariate density approximation technique that produces differentiated log-density approximants (DLDA's) whereby the derivative of the logarithm of a density function is assumed to be expressible as a rational function.This approach only necessitates the moments of a distribution up to some particular order; accordingly, when used in conjunction with sample moments, it enables one to process large amounts of data that often arrive in streams without having to access previously collected observations.Upon solving a system of linear equations, the coefficients of the rational function can easily be determined, the density approximant being then obtained by solving a differential equation.This density estimation technique is then applied to each of the marginal distributions of a standardized bivariate sample; the product of the resulting density estimates serves as a base density that is adjusted by means of a bivariate polynomial whose coefficients are determined from the joint sample moments of the standardized dataset being modeled as well as those associated with the base density function.The resulting expressions assume relatively simple functional representations that can lend themselves to algebraic manipulations; this is not the case for kernel density estimates, which incidentally may not be as accurate, as is suggested by a numerical example (Example 3.1) involving a sample of simulated values.This paper is organized as follows.First, the technique being utilized for obtaining univariate DLDA's is developed in Section 2, including the special case where the derivative of the logarithm of the target density function is assumed to be a polynomial.The bivariate case is then considered in Section 3 where DLDA's are utilized to obtain approximants or estimates of the marginal density functions whose product is adjusted by a bivariate polynomial.To illustrate the applicability of the new methodology, several numerical examples are presented in Section 4.

Differentiated Log-Density Approximation
This section summarizes the results obtained in (Provost & Ha, 2015) wherein a novel technique for approximating continuous univariate density functions is introduced.This approach will be utilized in the next section to approximate the density functions associated with each of the marginal distributions of a standardized bivariate random vector.
We now explain how differentiated log-density approximants are determined.Let f X (x) be a continuous density function defined on the interval (α, β) ≡ S .It is assumed that the derivative of logarithm of f X (x) can be represented by a rational function, that is, where N ν (x) and D δ (x) being polynomials in x of orders ν and δ .Without any loss of generality, c δ , the coefficient of x δ in the denominator of r(x), is set equal to one.After determining the a i 's and c j 's, by solving a linear system involving a certain number of moments of the target distribution, f X (x) is approximated as where κ is the normalizing constant, which is such that the integral of f ν,δ (x) from α to β numerically integrates to one, and e ∫ x α r(y) dy is the solution of the differential equation specified by (6).
In light of Equations ( 1) to (2), one has from which the polynomial coefficients can be obtained as follows: Multiplying both sides of Equation (3) by x h and integrating over the interval (α, β) yields On interchanging the sum and the integral on each side of this equation and proceeding by parts for integrating the left-hand side, one has Note that the first term on the left-hand side, that is, f ν,δ (x) ∑ δ j=0 c j x j+h | β α , will be zero whenever f ν,δ (α) = f ν,δ (β) = 0 , which is the case for most densities of interest.Thus, omitting this term and letting µ h , h = 0, 1, . . ., ν + δ, denote the h th moment of the approximated density function f ν,δ (x), one obtains ν + δ + 1 linear equations having the following form: with µ(0) ≡ 1.In order to determine the unknown coefficients of r(x) as specified by Equation (2), one needs to solve the linear system resulting from Equation (4).On replacing the unknown µ(h) by µ X (h), for h = 0, 1, . . ., ν + δ, where µ X (h) denotes the h th moment of the distribution being approximated, one obviously obtains the following linear system: Once the solution of this linear system is obtained, one still has to solve the differential equation where r(x) = ∑ ν i=0 a i x i / ∑ δ j=0 c j x j , which can easily be achieved by making use of symbolic computation packages such as Mathematica or Maple.
Remark 1 The degree δ is set to be the number of times the density function (or its components in the case of mixtures) intersects the abscissa plus the number of points at which the density function is not differentiable, the roots of D δ (x) corresponding to the intersection points and the points of non-differentiability as the case may be.For instance, in the case of a triangular distribution, one would let δ = 3, and for the mixture of density functions described in Example 2.1, δ was set equal to 4.
For a given δ, let the integrated squared difference (or error) between the approximate density function f ν,δ (x) and the exact density function f X (x) over the support of the distribution be denoted by Remark 2 In order to quantify the discrepancy between the approximate density f ν,δ (x) and the target density f X (x) and to determine the optimal order of the numerator of r(x), we seek the value ν 0 such that ISD(ν 0 ) reaches a set tolerance level or ISD(ν) only decreases marginally beyond ν 0 .
The following algorithm summarizes the DLDA procedure for approximating a univariate continuous density function f X (x).
Algorithm Differentiated log-density approximation methodology 1.Let ν = 0 be the initial order of N ν (x) as specified in Equation ( 2) and δ, the order of D δ (x), be selected as per Remark 1. (It should be noted that, in most cases of interest, ν is greater than or equal to two.) 2. Evaluate the moments of the random variable X, that is, µ X (i) for i = 0, 1, . . ., r, where (These moments replace those associated with the approximated distribution appearing in Equation (4).) 3. Determine the coefficients of the rational function by solving the linear system (5).
4. Find the solution of the differential equation specified by ( 6) by making use of a symbolic computation package and normalize the resulting function to obtain a bona fide density function f ν,δ (x).
6. Repeat Steps 2-5 with larger values of ν until ISD(ν) is deemed to be sufficiently small as per Remark 2.
Example 2.1 Suppose that f X (x) is the univariate density function of a mixture of two equally weighted beta distributions with parameters (2, 20) and (3, 2).In this example, we set ν = 4 and δ = 4.The plots of the exact and approximate density functions are shown in Figure 1.In this case, after rounding to three decimals, the coefficients are a 0 = 0.031, .621 e −0.147+1.551arctan(2.337−12.731x) /0.036.

Polynomial Log-density Approximants
As a particular case, one may assume that the differentiated log-density function is a polynomial of order n, that is, in which case c 0 = 1 and the other c i 's are equal to zero in Equation ( 2).This gives rise to the following linear system: which, in matrix form, can be expressed as Ma = τ where a is the vector of unknown coefficients.We now show that M is a positive definite matrix.Suppose that z is an arbitrary non-null vector of ℜ n .Then, Thus, the linear system specified by Equation ( 8) has the unique solution M −1 τ.The resulting density approximant, which shall be referred to as a Polynomial Log-density Approximant (PLDA), will then have the following representation: where κ the normalizing constant, is determined by numerical integration.
Remark 3 The DLDA (PLDA) methodology can be applied in the context of density estimation by replacing the exact moments of the target distribution by the sample moments associated with a given dataset.In this case, the degree of N ν (x) is determined in terms of the sum of the squared differences between empirical distribution function and the estimated CDF obtained from One could select the degree ν for which SSD(ν) reaches a minimum value or beyond which SSD(•) does not decrease significantly.A suitable degree for D δ (x) can be determined by following the guidelines provided in Remark 1 on the basis of a preliminary density estimate such as a histogram.

Bivariate Density Estimation
In this section, the DLDA methodology, as described in the previous section, is initially utilized to approximate each of the marginal density functions of a standardized bivariate random vector (X, Y) ′ .A bivariate polynomial adjustment is then applied to the product of the marginal density approximants to produce a bivariate density approximation.As well, it is explained that the proposed bivariate density approximation methodology can be utilized in the context of density estimation by substituting joint sample moments of given orders to the corresponding exact joint moments of a target distribution.
When X and Y are independent random variables, their joint density function can be expressed as the product of the marginal density functions, that is, f X,Y (x, y) = f X (x) f Y (y).However, in general the variables forming a random vector are not independently distributed even after standardizing it, and some adjustment to the product of the approximate or estimated marginal density functions is needed.We are proposing to apply a bivariate polynomial adjustment to the standardized vectors, which yields a density of the form specified in Equation ( 10).The density approximant/estimate corresponding to the original bivariate distribution/data is then obtained by applying the inverse transformation.Now, letting (w i , z i ), i = 1, . . ., n, constitute a dataset with sample mean ( w, z), an estimate of the covariance matrix V is required in order to standardize these n observation vectors.Let this estimate be the m.l.e. of V, that is, V = {v i j }, where . The standardized data is then obtained as V−1/2 denoting the inverse of the symmetric square root of V.The x i 's and the y i 's are then uncorrelated (however, in general, they are not independently distributed), and we let where f ν 1 ,δ 1 (x) and f ν 2 ,δ 2 (y) denote the estimated marginal density functions for the standardized vector (X, Y) ′ and π p (x, y) is a bivariate polynomial adjustment of order p in each variable.Note that whenever δ i = 0, i = 1, 2, the subscript δ i is omitted on both sides of Equation ( 10) and that the subscript p is omitted when there is no polynomial adjustment.
The degrees ν 1 , δ 1 and ν 2 , δ 2 associated with the density estimates of X and Y are obtained in accordance with the guidelines provided in Remark 3. Due to the presence of a polynomial adjustment, smoother estimates (of lesser degrees) of the marginal density functions could be utilized.

Obtaining the Coefficients of the Polynomial Adjustment
The coefficients of the polynomial adjustment π p (x, y) = ∑ p i=0 ∑ p j=0 d i, j x i y j can be determined as follows.For simplicity, we denote the estimated density function f ν1,δ1,ν2,δ2,p (x, y) by f p (x, y), and f ν1,δ1 (x) f ν2,δ2 (y) by ψ(x, y) so that Let the (k, ℓ) th joint moment associated with the exact density function f (x, y) be denoted by µ(k, ℓ) = ∫ R 2

∫
x k y ℓ f (x, y) dx dy and the (k, ℓ) th joint moment associated with the initial density ψ(x, y), by m In order to obtain a computable representation of the approximant f p (x, y), one needs to determine the coefficients d i, j of the polynomial adjustment.To this end, the joint moments of the exact density f (x, y) are equated to those associated with f p (x, y), which yields x k+i y ℓ+ j ψ(x, y) dx dy, for k = 0, . . ., p and ℓ = 0, . . ., p, which yields the following (p + 1) 2 linear equations: Thus, the d i, j 's can be obtained by solving the linear system Md = µ where d and µ are vectors of dimensions (p + 1) 2 whose (i(p + 1) + ( j + 1)) th components, d i, j and µ(i, j), appear in the same order for i = 0, 1, . . ., p and j = 0, 1, . . ., p.Note that increasing p should theoretically result in greater accuracy.The generalization to three or more variables is straightforward.
The selection of the optimal degree p associated with π p (x, y) is then made in terms of the following sum of squared differences: International Journal of Statistics and Probability Vol. 6, No. 5; 2017   The bivariate kernel density estimate, the transformed (by means of the inverse of the standardizing transformation) base density estimate ϕ(w, z) obtained from the product of the estimated marginal densities based on the PLDA methodology and the final joint density estimate g 4,3,7 (w, z) wherein p = 7 is selected as the optimum degree for the polynomial adjustment, as indicated by the SSD(p) values plotted in panel (f) of Figure 4, are included in Figure 5.
Example 4.2 In this example, we consider the dataset 'Concrete' also included in the UC Irvine Machine Learning Repository dataset, which contains 1030 observation vectors related to concrete compressive strength in civil engineering.More information about this dataset is available in (Yeh, 1998).We selected 'Cement': kg in a m3 mixture as the W variable and concrete compressive strength as the Z variable.
In Figure 6 Example 4.3 The dataset being considered, which is called 'Covertype', contains 581, 012 observations.It was extracted from the UC Irvine Machine Learning Repository dataset.This data was analyzed in (Blackard and Denis, 2000) in connection with forest cover studies.We selected horizontal distance in meters to nearest roadway as the W variable and horizontal distance in meters to nearest wildfire ignition points as the Z variable.Figure 8 displays a three-dimensional histogram of the data as well as f 5 (x) and f 6 (y), the estimated marginal density functions, and the corresponding histogram plots.The bivariate kernel density estimate is shown in panel (d).
The transformed base density estimate ϕ(w, z) and the final joint density estimate g 5,6,7 (w, z) wherein p = 7 is the selected degree of the polynomial adjustment are both included in Figure 9.We observe that the proposed density estimate is consistent with the histogram of the observations.
Example 4.4 Finally, the bivariate dataset being considered and referred to as the 'Flood' data was collected in the Madawaska Basin, Quebec, from 1990-1995.It includes 77 observations.The first component of the data is the peak value and the second one is the volume.In this case, the DLDA methodology is applied and it is appropriate to let The bivariate kernel density estimate, the transformed density estimate ϕ(w, z), that is, the transformed product of the estimated marginal densities, and the final joint density estimate g 5,2,5,2,6 (w, z) wherein p = 6 is selected as the optimal degree for the polynomial adjustment based on the SSD(p) values plotted in panel (c) are all included in Figure 11.The SSD associated with the proposed density estimate, that is, 0.0265 is about a third of that corresponding to the kernel density estimate, which is 0.0782.

Concluding Remarks
A technique is developed whereby the derivative of the logarithm of a univariate continuous density function is approximated by a rational function, which enables one to obtain bona fide density approximants for each of the marginals of a standardized continuous bivariate density function.Then, a bivariate density approximant is determined by adjusting the

Figure 1 .
Figure 1.Exact (solid line) and approximated (dashed line) density functions in connection with Example 2.1.
Scatterplot of the original data (b) Histogram of the original data

Figure 6 .Figure 8 .Figure 9 .
Figure 6.The dataset, SSD(ν 1 ), SSD(ν 2 ) and estimates of the marginal density functions (Example 4.2) The scatterplot and a histogram of the data are displayed in panels (a) and (b) of Figure10.The SSD plots of the estimated marginal density functions that are shown in panels (c) and (e) of Figure10indicate that ν 1 = 5 and ν 2 = 5 are suitable degrees.The corresponding density functions are plotted in panels (d) and (f).
Figure 10.Dataset, SSD's and estimates of the marginal density functions (Example 4.4)