Multi-Stage Image Restoration in High Noise and Blur Settings

We describe a simple approach for improving noisy, blurred images. It combines a parallel, block-based low-rank factorization technique for projection-based reduction of matrix dimensions with a customized iteratively reweighted conjugate gradient (CG) scheme, followed by a Fourier-domain Wiener filter. The regularization scheme, posed in a transform basis, offers a variable residual penalty and improved per-iteration performance. The outlined approach is aimed particularly at high blur and noise settings.


Introduction
It is often necessary to enhance blurry images corrupted with noise. While many approaches have been developed for both denoising and deblurring [9,10], most of these methods are not designed for high noise and blur settings. In this article, we outline a modular approach for the latter case. We first describe the problem setup. Mathematically, blur is typically modeled as convolution with a particular function (such as a 2D Gaussian) or as multiplication with a Toeplitz matrix [3], while the noise enters as an additive term. We can model the process in Figure 1 as:

b = h * x + η,    (1.1)

where x is the original image, h the blur kernel (the width of the Gaussian controls the amount of blur), η the noise, and b the observed image. With a Toeplitz matrix construction, one may use one- or two-sided multiplication to obtain the blurred image, with control parameter L > 0:

  BR = toeplitz([ones(L,1); zeros(n-L,1)], [1; zeros(n-1,1)])/L;
  BL = toeplitz([ones(L,1); zeros(m-L,1)], [1; zeros(m-1,1)])/L;
  img_blur = BL*orig_img*BR;

Recovery can then be posed as a regularized inversion in a transform basis, with R the blur matrix and W a sparsifying transform:

min_w ||R W^{-1} w - b||_l^l + λ ||w||_p^p.    (1.2)

An issue with (1.2) is that the dimensions of R, and of M = RW^{-1} (which does not need to be explicitly formed), are MN × MN for an image of size M × N. In this article we outline a tractable approach combining (1.1) and (1.2) which is useful for the general case. The approach works by means of a projector which reduces the size of the involved matrices and couples the iteration with Fourier-based filtering.
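To make the Toeplitz construction concrete, the following is a minimal sketch in Python (rather than the Matlab used above); the helper name blur_matrix is ours. It builds the one-sided moving-average blur matrices and applies the two-sided blur plus additive noise:

```python
import numpy as np

def blur_matrix(n, L):
    # n x n Toeplitz blur matrix: column j has L ones at rows j..j+L-1,
    # scaled by 1/L (a moving-average window), mirroring the Matlab call
    B = np.zeros((n, n))
    for j in range(n):
        B[j:min(j + L, n), j] = 1.0 / L
    return B

rng = np.random.default_rng(0)
m, n, L = 40, 30, 5
orig_img = rng.random((m, n))
BL = blur_matrix(m, L)                 # left (row-direction) blur
BR = blur_matrix(n, L)                 # right (column-direction) blur
img_blur = BL @ orig_img @ BR          # two-sided blur
img_noisy = img_blur + 0.01 * rng.standard_normal((m, n))  # additive noise
```

Interior rows and columns of BL and BR sum to one, so flat regions are preserved while the blurred image varies much more slowly than the original.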

Conjugate-gradient based IRLS Scheme
The solution of (1.2) can be obtained via iterative thresholding techniques [1]. However, as the blur matrix R is large, such an approach is time consuming. Iterative deconvolution can be performed on blocks of an image in parallel; however, to avoid edge artifacts, the use of larger blocks is desirable. Moreover, particularly for the high noise / blur setting we consider, it is desirable to impose custom penalties on the residual and regularization terms, which may be difficult to do with thresholding techniques. We propose the use of randomized projections, with the projection matrix constructed in parallel, and a conjugate gradient based algorithm to speed up the iterative solve. Such a scheme can be applied to general inverse problems, not just to image enhancement applications. The iteratively reweighted least squares based method we outline is able to handle various norms on both portions of equation (1.2), which is useful when the noise introduces outlier pixel regions in the image.
We make use of the work from [11], where a generalized iteratively reweighted least squares (IRLS) method is developed, and discuss here how to apply the scheme to large problem sizes. Consider, for a general m × n system Mw ≈ b, the regularized functional:

F(w) = ||Mw - b||_l^l + λ ||w||_p^p,    (2.1)

where l and p are assumed in the range [1, 2] and control the type of penalty on the residual term and the coefficients. The IRLS approach is based on the approximation:

|y|^p ≈ y^2 |y_k|^{p-2} ≈ y^2 (max(|y_k|, ε))^{p-2},

where in the rightmost term a small ε > 0 is used to ensure the denominator is finite regardless of the value of y_k. The resulting algorithm [11] for the minimization of (2.1) can be written as:

(M^T S_n M + 2 D_n) w^{n+1} = M^T S_n b,    (2.2)

with two diagonal, iteration-dependent matrices D_n and S_n. The diagonal matrix D_n has elements (1/2)λp y_k^n, and S_n has diagonal elements l|r_i^n|^{l-2} (with r_i^n = (Mw^n - b)_i; for i where |r_i^n| < ε, we can set the entry to lε^{l-2}, with the choice of ε user controllable and tuned for a given application). Here, the iteration-dependent weights are given by:

y_k^n = (max(|w_k^n|, ε))^{p-2}.    (2.3)

The diagonal matrices (or simply vectors holding the diagonal elements) are updated at each iteration, and the system in (2.2) can be solved approximately via a few iterations of CG or LSQR based algorithms at each iteration n = 0, . . ., N.
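A compact Python sketch of this scheme (illustrative only; the helper names cg and irls and the ε safeguards are ours, with the diagonal factors following the D_n and S_n definitions above), taking a few CG steps per outer iteration:

```python
import numpy as np

def cg(matvec, rhs, x0, iters=5, tol=1e-12):
    # a few conjugate gradient steps on the SPD system matvec(x) = rhs
    x = x0.astype(float).copy()
    r = rhs - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if rs < tol ** 2:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def irls(M, b, l=2.0, p=1.0, lam=1e-3, eps=1e-6, outer=15, inner=5):
    # generalized IRLS: at each outer step, rebuild the diagonal weights
    # S_n (residual) and D_n (coefficient) and take a few warm-started CG
    # steps on (M^T S_n M + 2 D_n) w = M^T S_n b
    w = np.zeros(M.shape[1])
    for _ in range(outer):
        r = M @ w - b
        s = l * np.maximum(np.abs(r), eps) ** (l - 2)              # diag(S_n)
        d = 0.5 * lam * p * np.maximum(np.abs(w), eps) ** (p - 2)  # diag(D_n)
        matvec = lambda x, s=s, d=d: M.T @ (s * (M @ x)) + 2.0 * d * x
        w = cg(matvec, M.T @ (s * b), w, iters=inner)
    return w
```

For l = p = 2 the weights are constant and the iteration reduces to restarted CG on the ridge-regularized normal equations, which provides a convenient correctness check.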
Another advantage of the IRLS approach is that the powers p and l in (2.1) can be made component dependent. This allows for better inversion of partially sparse signals (if, of course, the location of the sparse part can be efficiently estimated). Alternatively, p and l can be adjusted as the iteration progresses, particularly to incorporate the possibility of nonconvex penalties.
In order to implement the scheme in (2.2), we adapt the classic CG scheme, as described in [6]. The iteration-dependent factors S and D do not need to be formed explicitly and can be applied via element-wise multiplication, as shown in Figure 3. Thresholding to remove small coefficients left over after CG iterations should be used to prevent the build-up of errors from small values. For this task, the function below can be used, which varies the threshold based on the parameter p, with each component of w replaced with sgn(w_i) max(0, |w_i| - λ|w_i|^{p-1}). This formulation varies the shape of the threshold with p:

  function w2 = pThreshold(w, lambda, p)
    w2 = w;
    for i = 1:length(w)
      w2(i) = sign(w(i))*max(0, abs(w(i)) - lambda*abs(w(i))^(p-1));
    end
  end

Continuation Scheme
The system (2.2) is typically solved along the L-curve [4], starting at a large λ (generally a value close to ||M^T b||_∞, the nonzero cutoff for the ℓ1 functional) and decreasing in logarithmic fashion using the logarithmically spaced sequence:

λ_i = λ_0 ρ^i,  i = 0, . . ., N,  with ρ ∈ (0, 1).

At each λ, the IRLS scheme is performed for a few iterations. Typically N is in the range 10-20, with 5 CG-based iterations at each λ being sufficient. The use of the randomized projections and of the CG approach can significantly reduce computational demands over the whole L-curve. Typically, when a sparsity-promoting penalty is imposed in (2.2), we hard threshold the smallest coefficients at the end of each L-curve iteration i. One possibility is to compute some percentile (e.g. 15%) of the absolute values of w obtained from the optimization problem at each λ value and set to zero (hard threshold) all coefficients whose magnitude falls below this value.
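The two ingredients of the continuation loop can be sketched in a few lines of Python (the helper names lambda_schedule and hard_threshold are ours): a log-spaced λ schedule anchored at ||M^T b||_∞, and the percentile-based hard threshold:

```python
import numpy as np

def lambda_schedule(M, b, n_lam=10, decades=3):
    # log-spaced continuation values, starting near ||M^T b||_inf and
    # decreasing by the given number of decades
    lam0 = np.linalg.norm(M.T @ b, ord=np.inf)
    return lam0 * np.logspace(0.0, -decades, n_lam)

def hard_threshold(w, pct=15.0):
    # zero all coefficients whose magnitude falls below the given
    # percentile of |w| (applied at the end of each L-curve iteration)
    cut = np.percentile(np.abs(w), pct)
    out = w.copy()
    out[np.abs(out) < cut] = 0.0
    return out
```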

Use of Fourier Filtering
Fourier filtering is effective at mitigating blur if the source of the blur can be accurately modeled and the noise is contained. After the end of the iteration along the L-curve, or in between successive values of λ along the L-curve, we can perform a Fourier filtering step based on the parametric Wiener filter:

F(x) = ( conj(H) |H|^{2q-2} / (|H|^{2q} + α) ) G,

with G the Fourier transform of the corrupted image, H the optical transfer function, and F(x) the Fourier transform of the recovered image, for some q > 0 and α > 0, the latter related to the inverse signal-to-noise ratio (1/SNR). For α we can take the inverse of max((1/MN) |F(b)|^2), involving the Fourier transform F(b) of the noisy and blurred image b. Notice that by varying the eps1 value (the α parameter in the code), we can control the inclusion of small values of H in the filter. For noisy inputs, a relatively high eps1 value can prevent the blow-up of the noise components.
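A minimal NumPy sketch of the Wiener step for the simplest case q = 1 (the function name wiener_deblur and the centered-PSF convention are our assumptions):

```python
import numpy as np

def wiener_deblur(g, psf, alpha=1e-3):
    # Wiener filter, q = 1 case: F_hat = conj(H) G / (|H|^2 + alpha),
    # with alpha playing the role of the eps1 / inverse-SNR parameter
    G = np.fft.fft2(g)
    H = np.fft.fft2(np.fft.ifftshift(psf))  # psf: centered, same shape as g
    F = np.conj(H) * G / (np.abs(H) ** 2 + alpha)
    return np.real(np.fft.ifft2(F))
```

Larger alpha suppresses frequencies where H is small (preventing noise blow-up) at the cost of less deblurring.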

Randomized Low Rank Projection
We now go on to discuss a fast, scalable approach to obtain a projection of the approximate system Mw ≈ y with M = RW^{-1}. In many cases the singular values of the R matrix exhibit fast decay, with the same then holding true for M. As a result, a simple projection of the form Q^T(Mw ≈ y) ⇒ Q^T Mw ≈ Q^T y reduces the problem dimension and speeds up the iteration, as long as Q has fewer columns than rows of M; the projected quantities can be used directly in the iteration (2.2). To make this approach efficient we make use of randomization based techniques, which are ideal for such a case. The idea behind randomization is simple to understand; a survey is found in [8] and included references. Suppose we have a matrix M ∈ R^{M×N} whose range we wish to sample. We can take some random vectors v_1, . . ., v_l (whose pairwise inner products will, with high probability, be close to 0) and compute Mv_1, . . ., Mv_l. If the numerical rank of M is substantially smaller than min(M, N), then we do not need many such products in order to accurately sample the range of M. More precisely, if we form Ω ∈ R^{N×l} with Gaussian iid (independent, identically distributed) entries, compute Y = MΩ, and then perform Gram-Schmidt on the columns of Y, yielding a matrix Q, then with l large enough relative to the numerical rank of M we will have range(Q) ≈ range(M). In this situation, it can be shown that QQ^T M ≈ M, with the result also holding for complex-valued matrices with the transpose replaced by the Hermitian transpose.
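The basic range finder is only a few lines of Python (a sketch; randomized_range is our name for it):

```python
import numpy as np

def randomized_range(M, l, q=1, seed=0):
    # sample the range of M with l Gaussian test vectors; optional power
    # iterations (M M^T)^q M sharpen the decay of the singular values
    rng = np.random.default_rng(seed)
    Y = M @ rng.standard_normal((M.shape[1], l))
    for _ in range(q):
        Y = M @ (M.T @ Y)
    Q, _ = np.linalg.qr(Y)   # orthonormalize the sampled columns
    return Q
```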
The scheme in Figure 4 constructs Q given M and several parameters, most notably ε, such that ||QQ^T M - M|| < ε, in the same norm as used in the algorithm (the operator norm may be assumed as the default). The idea behind the scheme is to draw samples of the range of M via matrix-vector multiplications with randomly drawn vectors, and then orthogonalize the resulting set of outputs using a Gram-Schmidt procedure. The algorithm is simply a blocked implementation of such a routine, where instead of one vector, a set of b_p vectors at a time samples the range via Q_i = orth(MΩ_i), with q power-iteration passes applied to each sample block. The main idea behind the steps needed to expand the matrix Q can be seen from the following single-vector procedure [8]: given an orthonormal Q ∈ R^{M×r} and a new sample y, compute ȳ = (I - QQ^T)y and append the normalized column ȳ/||ȳ|| to Q, yielding a matrix Q ∈ R^{M×(r+1)}.

Figure 4. A blocked and adaptive QB algorithm proposed in [8].
Note: we must have y ∉ range(Q), as otherwise QQ^T y = y and ȳ = (I - QQ^T)y = 0. (For a randomly drawn y from a large-dimensional space, this is not likely.) The range of Q is thus expanded and orthogonality is preserved. The blocked scheme in Figure 4 is simply an extension of the above procedure.
The parameter q controls the power iteration: (MM^T)^q M = UΣ^{2q+1} V^T, compared to the original SVD M = UΣV^T. Due to the power scaling of Σ, values of q > 1 typically improve performance for matrices M with modest singular value decay. Lines 4-7 of the algorithm implement the power iteration. Line 8 is a re-orthonormalization procedure that ensures accuracy in the presence of floating point operations. Lines 9 and 10 implement the recurrence relations in line with the above range-expanding procedure, and line 11 checks the exit condition: when ||(I - Q_j Q_j^T)M|| < ε, the scheme is exited.

A simple matrix splitting strategy allows this scheme to be applied in parallel over parts of the matrix, which is important for matrices of large dimensions, such as in the case we consider. We proceed by subdividing M into blocks along the rows. We assume here that the number of blocks is a power of two; without loss of generality, we assume the use of four blocks:

M = [M_1; M_2; M_3; M_4],  with  M_i ≈ Q_i B_i.

We then perform QB factorizations on the blocks of the stacked B matrix:

M^(1) = [B_1; B_2] ≈ Q^(1) B^(1),  M^(2) = [B_3; B_4] ≈ Q^(2) B^(2).

Finally, we perform a QB factorization on:

M^(3) = [B^(1); B^(2)] ≈ Q^(3) B^(3).

It follows that:

M ≈ blkdiag(Q_1, Q_2, Q_3, Q_4) blkdiag(Q^(1), Q^(2)) Q^(3) B^(3) = Q B^(3),

with Q having orthonormal columns as a product of matrices with orthonormal columns. The benefit of this formulation is that the QB algorithm can be performed on smaller matrices in parallel. In particular, we handle the decompositions of blocks M_1, . . ., M_4 in parallel, following which we can do in parallel the decompositions of matrices M^(1) and M^(2), and finally that of M^(3). The reason for blocking M along the rows instead of the columns is that we would like to keep the orthonormality of the resulting matrix Q, which is achieved by working with block diagonal matrices. Notice that performing QB factorizations at the later steps is not essential; instead, we can replace QB with a full QR factorization computed on the smaller matrices (which is efficient to do). The ranks chosen at the initial (first level) steps should be made to ensure that Q^T M is of significantly smaller dimensions than M.
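A two-block instance of this hierarchical splitting can be sketched in Python (fixed per-block rank k for simplicity; the helpers qb and hierarchical_qb are ours, and in practice the two first-level factorizations would run in parallel):

```python
import numpy as np

def qb(M, k, seed=0):
    # one-shot randomized QB factorization: Q has k orthonormal columns
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(M @ rng.standard_normal((M.shape[1], k)))
    return Q, Q.T @ M

def hierarchical_qb(M, k):
    # split the rows into two blocks, factor each block, then factor the
    # stacked B's; Q = blkdiag(Q1, Q2) @ Q3 keeps the columns orthonormal
    m = M.shape[0] // 2
    Q1, B1 = qb(M[:m], k, seed=1)
    Q2, B2 = qb(M[m:], k, seed=2)
    Q3, B = qb(np.vstack([B1, B2]), k, seed=3)
    Z1 = np.zeros((m, Q2.shape[1]))
    Z2 = np.zeros((M.shape[0] - m, Q1.shape[1]))
    Q = np.block([[Q1, Z1], [Z2, Q2]]) @ Q3
    return Q, B
```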
Let us consider the hierarchical factorization approach with explicit error accounting for two blocks:

M = [M_1; M_2],  M_i = Q_i B_i + E_i,  [B_1; B_2] = Q_3 B + E^(2),

with the first-level error E^(1) = [E_1; E_2]. Then in particular, collapsing the row blocks, it follows that:

M = blkdiag(Q_1, Q_2)(Q_3 B + E^(2)) + E^(1) = QB + blkdiag(Q_1, Q_2) E^(2) + E^(1).

Since the Q_i's are orthogonal, the overall error bound is additive, i.e. ||M - QB|| ≤ ||E^(1)|| + ||E^(2)||. This is useful to know in a distributed environment, as the errors at each iteration of different blocks can be communicated to, e.g., the root processor, which can make a decision about the rank to use for each block so that a final output tolerance is met.

Combined Approach
The restored images obtained with different bases (e.g. a smoother and a sharper reconstruction) can be combined adaptively, by computing a gradient map of one of the images, subdividing it into blocks, and combining the smooth and sharp results according to the observed smoothness (weighting more heavily towards the smooth version in smooth regions). Of course, more than two bases can be used and merged in a similar way. Alternatively, more advanced merging techniques can be employed. One particular class of methods is based on the use of the 2-D wavelet transform to decompose the image into high-frequency and low-frequency portions, which are then merged according to different rules [5,3]. In Matlab, the commands wavedec2, detcoef2, and wcodemat can be utilized to extract the approximation (low-frequency) and detail (high-frequency) portions of an image under a specified transform. Alternatively, superresolution based methods [2] can also be employed for the merge step.
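A sketch of the adaptive, gradient-map-based combination in Python (the normalized weighting rule here is one simple choice among many, and the function name fuse is ours):

```python
import numpy as np

def fuse(smooth, sharp):
    # blend two restorations using a normalized gradient map of the
    # sharper image: flat regions take the smooth result, edges the sharp
    gy, gx = np.gradient(sharp)
    g = np.hypot(gx, gy)
    w = g / (g.max() + 1e-12)
    return (1.0 - w) * smooth + w * sharp
```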

Figure 3. Main steps in the iteration of the generalized CG based scheme from [11].

Figure 5. Fusion scheme diagram for fusing multiple images (e.g. obtained with iterative regularization using different projection bases) into one [3].

Figure 6. Row 1: successive image degradation with blur (image 2) and noise addition (image 3). Row 2: Fourier Wiener filter deblur with OTF derived from a Gaussian function, applied to images 2 and 3.