3D Stereo Reconstruction of SEM Images

This work proposes a new, fully automated methodology that uses computer vision and dynamic programming to obtain a 3D reconstruction of surfaces from stereo pairs of scanning electron microscope (SEM) images. The horizontal stereo matching step is performed with a robust and efficient algorithm based on semiglobal matching. The cost function used in this study is very simple, since the brightness and contrast changes between corresponding pixels are negligible for the small tilt involved in stereo SEM: it is a sum of absolute differences (SAD) over a window of variable size. Since it relies on dynamic programming, the matching algorithm uses an occlusion parameter that penalizes large depth discontinuities and, in practice, smooths the disparity map and the corresponding reconstructed surface. This step yields a disparity map, i.e. the differences between the horizontal coordinates of matching points in the two stereo images. The horizontal disparity map is finally converted into heights according to the SEM acquisition parameters: tilt angle, image magnification and pixel size. A validation test was performed using a microscopic grid with manufacturer specifications as reference. Finally, some applications of the 3D model in materials science are proposed, such as the estimation of roughness parameters and wear measurements.


Introduction
Stereo matching has been one of the fundamental and most widely studied problems in computer vision (Scharstein & Szeliski. 2002, pp. 7-42). It has been successfully used in a variety of applications, including robot navigation and space missions (Tingbo, et. al. 2012, pp. 908-921). Stereo matching is the process of taking two or more images from different viewpoints simultaneously and estimating a 3D model of the scene by finding matching pixels in the images and converting their 2D positions into 3D depths (Tingbo, et. al. 2012, pp. 908-921) (Szeliski. 2010).
The basic principle involved in the recovery of depth using passive imaging is triangulation. The triangulation needs to be achieved with the help of only the existing ambient illumination. Hence, a correspondence needs to be established between the features from two images that correspond to the same physical feature in space. The major steps involved in the process of stereopsis are preprocessing, establishing correspondence, and recovering depth (Umesh, et. al. 1989, pp. 1489-1510). The solution may be represented as a disparity field specifying the positional differences of corresponding feature points relative to the image coordinate systems. The distance to the scene may then be computed from the disparity field and from knowledge of the transformation relating the two image coordinate systems (Soren. 1990, pp. 309-315). A main problem in stereo analysis is to detect disparities over a large range of values. To reduce the search space, it is commonly assumed that the epipolar line geometry is known. This knowledge reduces the search space to a line segment (Soren. 1990, pp. 309-315).
The Scanning Electron Microscope (SEM) is a tool used in a broad range of scientific and engineering applications (Johnson. 1996). The SEM is useful to observe and characterize different kinds of materials on a nanometer (nm) to micrometer (µm) scale (Joseph, et. al. 2003). The SEM can provide information on the surface topography, crystalline structure, chemical composition and electrical behavior of the top 1 µm or so of a specimen (Shweta & Rama. 2013, pp. 105-114) (Bogner, et. al. 2007, pp. 390-401). The capability of the SEM to obtain three-dimensional-like images of surfaces is much appreciated (Mahovic, et. al. 2008, pp. 3449-3458) (Risovi, et. al. 2008, pp. 3063-3070). This characteristic makes SEM images very popular in a wide variety of media, from scientific journals to popular magazines to the movies (Joseph, et. al. 2003).
All SEMs are composed of an electron column, which creates a beam of electrons; a sample chamber, where the electron beam interacts with the sample; detectors, which monitor the different signals resulting from the beam-sample interaction (secondary and backscattered electrons, and X-rays) (Johnson. 1996) (Vernon. 2000); and a viewing system, which constructs an image from the signals. An electron gun at the top of the column generates the electron beam. In the gun, an electrostatic field directs electrons emitted from a very small region on the surface of an electrode through a small spot called the crossover. Then, the gun accelerates the electrons down the column toward the sample with energies typically ranging from a few hundred to tens of thousands of electron volts. Several types of electron guns, such as tungsten, LaB6 (lanthanum hexaboride) and field emission (Joseph, et. al. 2003) (Bogner, et. al. 2007, pp. 390-401) (Vernon. 2000), use different electrode materials and physical principles, but all share the common purpose of generating a directed electron beam with a stable and sufficient current and the smallest possible size.
The electrons emerge from the gun as a divergent beam. A series of magnetic lenses and apertures in the column reconverge and focus the beam into a demagnified image of the crossover. Near the bottom of the column, a set of scan coils deflects the beam in a scanning pattern over the sample surface (Joseph, et. al. 2003). The final lens focuses the beam into the smallest possible spot on the sample surface. The beam exits from the column into the sample chamber. The chamber incorporates a stage for manipulating the sample, a door or airlock for inserting and removing the sample, and access ports for mounting various signal detectors and other accessories. As the beam electrons penetrate the sample, they give up energy, which is emitted from the sample in a variety of ways. Each emission mode is potentially a signal from which to create an image (Johnson. 1996).

Stereo Pair Acquisition
3D reconstruction from stereoscopic images acquired at varying specimen tilt angles (Kayaalp. 1990, pp. 21-246) is based on the measurement of the disparity, which is the shift (in pixels) of the specimen features from one image to the other (Pouchou. 2002, pp. 135-144). In SEM, images are formed by scanning a focused electron beam rectilinearly over the sample surface and synchronously detecting one of the signals generated by the beam-specimen interaction processes (Marinello. 2008).
Two main detection principles exist for imaging surfaces: secondary electron emission and backscattered electron emission. Secondary electrons, which are emitted from less than ∼10 nm below the surface, are usually preferred since they provide topographies with an optimal signal-to-noise ratio and, potentially, very high resolution (Joseph, et. al. 2003) (Marinello. 2008). Viewing an object from two separated viewpoints that subtend an angle θ at that object is equivalent to taking two images from a single viewpoint while rotating the object through the same angle θ between the images (Figure 1). It is this procedure that is usually employed in the SEM (Joseph, et. al. 2003). To produce a stereoscopic reconstruction, it is necessary to tilt the sample a few degrees to acquire two images (a stereo pair) that capture approximately the same region of interest (Joseph, et. al. 2003) (Marinello. 2008) (Roy, et. al. 2012, pp. 4361-4364).
To facilitate the acquisition of the images, it is highly recommended to use a SEM with a manual or automatic specimen stage (Oliveira, et. al. 1999, pp. 256-263) with movement along five axes (x, y, z, tilt and rotation).
mas.ccsenet.org Modern Applied Science Vol. 12, No. 12;2018
Figure 1. Tilt concept to acquire a stereo pair (Roy, et. al. 2012, pp. 4361-4364).
To simplify the subsequent calculations, the two images are acquired at tilt angles of equal magnitude but opposite sign.

Preprocessing
Before stereo matching, two main characteristics of the stereo pair must be ensured to obtain accurate results: brightness and alignment.
The global histogram contrast must be similar for both images. Histogram matching was chosen to ensure the same brightness and contrast in both images while preserving the relative positions of edges and other textural characteristics (Oliveira, et. al. 1999, pp. 256-263). Specifically, the mean and standard deviation of one image are matched to those of the other.
The first step is to set one image as the reference image and the other as the work image. To equalize the histograms, the mean of the work image is subtracted from each of its pixels, and the result is divided by the standard deviation of the work image. This step gives the work image a mean equal to zero and a standard deviation equal to 1. Finally, the resulting image is multiplied by the standard deviation of the reference image, and the mean value of the reference image is added to obtain the equalized work image. Thus, both images have approximately the same brightness and contrast.
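The normalization described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: images are represented as small lists of rows of grey levels, and the function name `match_mean_std` is hypothetical, not part of the original implementation.

```python
from statistics import fmean, pstdev

def match_mean_std(work, reference):
    """Rescale `work` so its mean and standard deviation match `reference`.

    Both images are lists of rows of grey levels (a stand-in for real image arrays).
    """
    work_pixels = [v for row in work for v in row]
    ref_pixels = [v for row in reference for v in row]
    work_mean, work_std = fmean(work_pixels), pstdev(work_pixels)
    ref_mean, ref_std = fmean(ref_pixels), pstdev(ref_pixels)
    if work_std == 0:
        # A constant image has no contrast to rescale; only its mean can be set.
        return [[ref_mean for _ in row] for row in work]
    # Zero mean and unit deviation first, then the reference statistics.
    return [[(v - work_mean) / work_std * ref_std + ref_mean for v in row]
            for row in work]

# Toy example: the work image is a brighter, higher-contrast copy of the reference.
reference = [[10, 20], [30, 40]]
work = [[110, 130], [150, 170]]
matched = match_mean_std(work, reference)
```

In this toy case the work image is an exact linear rescaling of the reference, so the matched image recovers the reference grey levels up to rounding error. Real stereo pairs differ in content, so only the global statistics coincide.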
Uncertainty in the acquisition process can produce misalignments between the images of a stereo pair. Alignment is needed to ensure that only the parallax displacement components are measured (Oliveira, et. al. 1999, pp. 256-263). To correct such misalignments, a geometric transformation that aligns the stereo pair is estimated.
To obtain a reasonable alignment between two images, it is necessary to find corresponding points in the stereo pair. With this information, it is possible to calculate a geometric transformation and finally use the resulting matrix to align one image with respect to the other.
The correspondence problem consists in finding correct point-to-point correspondences between images (Olof Enqvist. 2009). We have two images of the same 3D scene, each tilted a few degrees. The objective is to find a set of distinctive features in one image that can be identified as the same features in the other image.
The search for image point correspondences can be divided into three main steps. First, interest points are selected at distinctive locations in the image.
Next, the neighborhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and at the same time robust to noise, detection displacements, and geometric and photometric deformations. Finally, the descriptor vectors are matched between different images (Bay, et. al. 2008, pp. 346-359). The Speeded-Up Robust Features (SURF) algorithm is used to find landmarks in the stereo pair. The landmarks detected in the two images are matched, forming a set of corresponding points.
To detect outliers in the set of corresponding points, the Random Sample Consensus (RANSAC) algorithm is used, so that points considered outliers are excluded from the estimation of the geometric transformation. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors (Fischler & Bolles. 1981, pp. 381-395). With this constraint, the transformation can be estimated more accurately.
We estimate an affine geometric transform, which returns a transformation function that can be applied to the positions of the corresponding points in one image to align them with their positions in the other image.
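As an illustration of how RANSAC and the affine estimation fit together, the following Python sketch fits an affine transform to point correspondences that contain gross outliers. It is a simplified stand-in, not the implementation used in this work: the point coordinates are synthetic, the helper names (`solve3`, `affine_from_three`, `apply_affine`, `ransac_affine`) and the threshold and iteration count are assumptions, and the SURF detection and matching step is taken as already done.

```python
import random

def solve3(m, v):
    """Solve the 3x3 linear system m * x = v by Cramer's rule; None if singular."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        return None  # the three sample points are collinear
    solution = []
    for col in range(3):
        replaced = [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(m)]
        solution.append(det(replaced) / d)
    return tuple(solution)

def affine_from_three(src, dst):
    """Exact affine model ((a, b, c), (d, e, f)) mapping three src points to dst."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [q[0] for q in dst])
    row_y = solve3(m, [q[1] for q in dst])
    return None if row_x is None else (row_x, row_y)

def apply_affine(model, point):
    (a, b, c), (d, e, f) = model
    x, y = point
    return a * x + b * y + c, d * x + e * y + f

def ransac_affine(src, dst, threshold=1.0, iterations=200, seed=0):
    """Return the model with the largest consensus set and the inlier indices."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(range(len(src)), 3)  # minimal sample for an affine map
        model = affine_from_three([src[i] for i in sample], [dst[i] for i in sample])
        if model is None:
            continue
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            px, py = apply_affine(model, p)
            if ((px - q[0]) ** 2 + (py - q[1]) ** 2) ** 0.5 <= threshold:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Synthetic correspondences: eight follow a known affine map, two are gross errors.
true_model = ((1.0, 0.02, 3.0), (-0.02, 1.0, -2.0))
src = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 20), (20, 70), (80, 60), (30, 40),
       (60, 90), (90, 30)]
dst = [apply_affine(true_model, p) for p in src]
dst[8] = (dst[8][0] + 50, dst[8][1] - 40)   # simulated mismatch
dst[9] = (dst[9][0] - 35, dst[9][1] + 45)   # simulated mismatch
model, inliers = ransac_affine(src, dst)
```

The sketch keeps the model of the best minimal sample. A common refinement, omitted here for brevity, is a least-squares refit of the affine parameters over all inliers once the consensus set is known.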

Stereo Matching and Dynamic Programming
Stereo matching is the process of taking two or more images from different viewpoints simultaneously and estimating a 3D model of the scene by finding matching pixels in the images and converting their 2D positions into 3D depths (Tingbo, et. al. 2012, pp. 908-921) (Szeliski. 2010) (Kayaalp. 1990, pp. 21-246).
The basic principle involved in the recovery of depth using passive imaging is triangulation. The triangulation needs to be achieved with the help of only the existing ambient illumination. Hence, a correspondence needs to be established between the features from two images that correspond to the same physical feature in space. The major steps involved in the process of stereopsis are preprocessing, establishing correspondence, and recovering depth (Dhond & Aggarwal. 1989, pp. 1489-1510). The solution may be represented as a disparity field specifying the positional differences of corresponding feature points relative to the image coordinate systems. The distance to the scene may then be computed from the disparity field and from knowledge of the transformation relating the two image coordinate systems (Olsen. 1990).
A main problem in stereo analysis is to detect disparities over a large range of values. To reduce the search space, it is commonly assumed that the epipolar line geometry is known. This knowledge reduces the search space to a line segment (Olsen. 1990) (Roy, et. al. 2012, pp. 4361-4364).
Dynamic programming solves problems by combining the solutions of subproblems (Cormen, et. al. 2009). The idea behind dynamic programming is, given a problem, to solve different parts of it (subproblems) and then combine the solutions of the subproblems to reach an overall solution. Dynamic programming is applied to optimization problems. Such problems can have many possible solutions. Each solution has a value, and the objective is to find a solution with the optimal (minimum or maximum) value.
For every pixel in the left image, the goal is to find the corresponding pixel in the right image (a match). Matching single pixels is nearly impossible; therefore, every pixel is represented by a small region containing it, called the correlation window. The window is centered on the pixel and has a constant size (Mühlmann, et. al. 2002, pp. 79-88).
The choice of the window size involves a trade-off. If the window is too small, it captures small details, but the result is noisy. On the other hand, if the window is too big, the result is smooth, but small details are lost (Scharstein & Szeliski. 2002, pp. 7-42). In general, the robustness of matching increases with larger windows. The dimensions of the correlation window have to be odd; otherwise, there would be an offset of half a pixel between the disparity map and the left image (Mühlmann, et. al. 2002, pp. 79-88).
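A window comparison of this kind can be sketched as follows, using a sum of absolute differences (SAD) as the similarity measure and the convention that pixel (x, y) in the left image is compared with (x + d, y) in the right image. The function names, the toy images and the `half` parameter (half the window side, so the side 2·half + 1 is always odd) are assumptions made for illustration.

```python
def sad(left, right, x, y, d, half):
    """Sum of absolute differences between the window centred on (x, y) in `left`
    and the window centred on (x + d, y) in `right`."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + d + dx])
    return total

def best_disparity(left, right, x, y, d_min, d_max, half):
    """Evaluate every candidate disparity and return the cheapest one with all costs."""
    width = len(left[0])
    costs = {}
    for d in range(d_min, d_max + 1):
        if half <= x + d < width - half:  # keep the shifted window inside the image
            costs[d] = sad(left, right, x, y, d, half)
    return min(costs, key=costs.get), costs

# Toy pair: the right image is the left image shifted two pixels to the right.
left = [
    [0, 0, 9, 5, 7, 1, 0, 0, 0],
    [0, 0, 8, 4, 6, 2, 0, 0, 0],
    [0, 0, 7, 3, 5, 3, 0, 0, 0],
]
right = [
    [0, 0, 0, 0, 9, 5, 7, 1, 0],
    [0, 0, 0, 0, 8, 4, 6, 2, 0],
    [0, 0, 0, 0, 7, 3, 5, 3, 0],
]
d_best, costs = best_disparity(left, right, x=3, y=1, d_min=0, d_max=4, half=1)
```

For this toy pair the 3x3 window around (3, 1) matches perfectly at a disparity of 2, where the SAD is zero. On real SEM images the minimum is rarely this clean, which is why a smoothness constraint is added to the pixel-wise cost.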

Figure 2. Disparity search using a correlation window [23]
The matching cost is calculated for a left image pixel from its intensity and that of the suspected correspondence in the right image. A sum of absolute differences (SAD) is used to correlate the windows of both images. Pixel-wise cost calculation is generally ambiguous, and wrong matches can easily have a lower cost than correct ones due to noise, etc. (Hirschmüller. 2008, pp. 328-341). An additional constraint is therefore added that supports smoothness by penalizing changes between neighboring disparities. The resulting cost function has three terms: the first is the sum of all pixel matching costs for the disparities; the second adds a constant penalty for every pixel in the neighborhood of a pixel whose disparity changes slightly; and the third adds a larger constant penalty for all larger disparity changes (Roy, et. al. 2012, pp. 4361-4364) (Hirschmüller. 2008). The problem of stereo matching can now be formulated as finding the disparity image that minimizes the cost function.
Figure 3. Disparity space volume defined by the dimensions of the left image and the disparity search range (Mühlmann, et. al. 2002, pp. 79-88)
The cost function values are stored in memory in a cuboid (Figure 3) whose dimensions are given by the width and the height of the images and the disparity (d) search range from dmin to dmax. Every position (x, y, d) in the volume contains a similarity or correlation measure between the window representing position (x, y) in the left image and the one representing (x + d, y) in the right image (Mühlmann, et. al. 2002, pp. 79-88).
The disparity image that corresponds to the left image is determined by selecting for each pixel the disparity that corresponds to the minimum cost.
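To make the role of dynamic programming and of the smoothness penalties concrete, the following Python sketch optimizes the disparities along a single image row. It is a deliberately simplified, one-scanline illustration rather than the semiglobal procedure used in this work: the function name `scanline_dp`, the penalty names `p1` (small disparity change) and `p2` (larger change) and the toy cost table are assumptions.

```python
def scanline_dp(costs, p1, p2):
    """costs[x][d] is the matching cost of pixel x at disparity index d on one row.

    Returns the disparity indices minimising the sum of matching costs plus
    a penalty p1 for changes of one level and p2 for larger changes."""
    n, m = len(costs), len(costs[0])

    def penalty(a, b):
        step = abs(a - b)
        return 0 if step == 0 else p1 if step == 1 else p2

    total = [list(costs[0])]  # total[x][d]: cheapest path ending at (x, d)
    back = [[0] * m]          # back[x][d]: disparity index used at x - 1 on that path
    for x in range(1, n):
        row, row_back = [], []
        for d in range(m):
            prev = min(range(m), key=lambda k: total[x - 1][k] + penalty(k, d))
            row.append(costs[x][d] + total[x - 1][prev] + penalty(prev, d))
            row_back.append(prev)
        total.append(row)
        back.append(row_back)

    # Backtrack from the cheapest final state to recover the whole path.
    d = min(range(m), key=lambda k: total[-1][k])
    path = [d]
    for x in range(n - 1, 0, -1):
        d = back[x][d]
        path.append(d)
    return path[::-1]

# Toy cost table for five pixels and three disparity levels.
costs = [
    [5, 1, 6],
    [4, 0, 5],
    [3, 2, 1],  # noise makes the wrong level look slightly cheaper here
    [6, 1, 4],
    [5, 0, 7],
]
winner_takes_all = scanline_dp(costs, p1=0, p2=0)  # no smoothness: per-pixel minimum
smoothed = scanline_dp(costs, p1=2, p2=5)
```

With both penalties set to zero the result reduces to the per-pixel minimum, `[1, 1, 2, 1, 1]`, which follows the noisy pixel. With `p1=2` and `p2=5` the isolated jump costs more than it saves and the path becomes `[1, 1, 1, 1, 1]`. The returned values are indices into the search range, so dmin has to be added to obtain actual disparities.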
For subpixel estimation, a quadratic curve is fitted through the costs at the minimum and at the next higher and lower disparities, and the position of the minimum of that curve is calculated (Hirschmüller. 2008, pp. 328-341).
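The quadratic fit has a closed form, sketched below in Python. The function name and the handling of flat cost neighbourhoods are assumptions added for the example.

```python
def subpixel_disparity(d, cost_prev, cost_min, cost_next):
    """Refine the integer disparity d using the costs at d - 1, d and d + 1.

    The vertex of the parabola through the three samples gives the offset."""
    curvature = cost_prev - 2 * cost_min + cost_next
    if curvature <= 0:
        return float(d)  # flat or degenerate neighbourhood: keep the integer value
    return d + (cost_prev - cost_next) / (2 * curvature)

# Costs sampled from (t - 0.25)**2 at t = -1, 0, 1, i.e. a true minimum at d + 0.25.
refined = subpixel_disparity(7, 1.5625, 0.0625, 0.5625)
```

In this example the three costs lie exactly on a parabola whose minimum is a quarter of a pixel to the right of the integer disparity, so `refined` equals 7.25.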
The horizontal disparity map is finally converted into heights according to the acquisition parameters (tilt angle, magnification and pixel size) with simple trigonometric equations. If the tilt angles are different, the trigonometric equations lead to a complicated relation between disparity and height (Dhond & Aggarwal. 1989, pp. 1489-1510). To simplify, we use a symmetrical stereo pair. Height h and disparity d, both in microns, are related by the following equation (Szeliski. 2010) (Dhond & Aggarwal. 1989, pp. 1489-1510) (Roy, et. al. 2012, pp. 4361-4364):
$$d = 2 \cdot h \cdot \sin(\theta / 2)$$
Therefore, the height h (in microns) of a point whose disparity is d (in pixels) is:
$$h = \frac{d \cdot p}{2 \cdot \sin(\theta / 2)}$$
where θ is the total tilt angle and p is the pixel size in sample units (e.g. microns). The latter can be obtained from the scale provided by the SEM system or from a calibration object (e.g. a microscopic grid) used for that purpose. Missing or hidden pixels can be estimated by interpolation.
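Solving d = 2 · h · sin(θ / 2) for h and scaling the pixel disparity by the pixel size p gives a one-line conversion, sketched below in Python. The function name and the numerical values in the example are illustrative assumptions, not acquisition parameters from this study.

```python
import math

def disparity_to_height(d_pixels, total_tilt_deg, pixel_size_um):
    """Height in microns for a disparity in pixels, assuming a symmetrical stereo pair."""
    theta = math.radians(total_tilt_deg)
    return d_pixels * pixel_size_um / (2 * math.sin(theta / 2))

# Hypothetical values: 10 px disparity, 10 degrees total tilt, 0.1 micron pixels.
height_um = disparity_to_height(10, 10.0, 0.1)  # about 5.74 microns
```

Because sin(θ / 2) is small for the few degrees of tilt used in stereo SEM, a disparity error of one pixel translates into a height error several times the pixel size, which is one reason the subpixel refinement matters.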

Results
The 3D height map of each surface was computed from stereo pairs of SEM images with an algorithm developed in Wolfram Mathematica® 10. To evaluate our methodology, a validation test was performed using a microscopic grid as reference. The grid was imaged at ×1000 magnification with a scanning electron microscope (Jeol JSM-6300F, Jeol Ltd.). The reconstructed surface was then compared with the manufacturer specifications (10 µm step width and 2 µm step height).

Conclusions
SEM is an excellent tool to obtain information from the micro to the nano scale. It is robust and has many applications and modules that make it an instrument widely used in research and industry. However, the three-dimensional information that can be obtained from it directly is limited, and additional analysis is needed to generate accurate 3D models.
This work studied a method that uses stereo vision and dynamic programming to analyze SEM stereo pairs and generate 3D models from them.
By implementing the 3D reconstruction algorithm, it was possible to estimate 3D models and surface roughness values from SEM stereo images. Initially, an optical flow algorithm was used to find the disparity map, but the results were not accurate enough for our application. The final algorithm relies on the basics of stereo vision: using dynamic programming, it finds the disparity map between the images of the stereo pair, which allows the 3D values to be calculated through a trigonometric equation derived from the geometry of the acquisition. The 3D values are then used to plot the 3D model.
The 3D model of the calibration grid was accurate, reproducing the values specified by the manufacturer. The results at edges and in homogeneous areas are satisfactory, and the algorithm clearly differentiates between the two height levels of the sample.
With the 3D reconstruction algorithm and these results on the calibration grid, several applications can be considered to extend the research. As future work, we plan to use the 3D models to measure 2D and 3D surface roughness parameters. We also expect to use the reconstruction methodology to propose a new way to obtain wear coefficients of materials. Work on the algorithm is not yet finished: we expect to test our methodology using different calibration samples, alternative materials and new geometries.