Invarianceness for Character Recognition Using Geo-Discretization Features


Recently, the field of pattern recognition has advanced considerably, driven by emerging applications that are both challenging and attractive to many researchers. These applications include data mining, web searching, retrieval of multimedia data, face recognition, and handwriting recognition, all of which require robust and intelligent pattern recognition techniques. Anil, Robert, and Jianchang (2000) describe pattern recognition as playing a critical role in human decision making, even though we as humans do not fully understand how we actually recognize patterns.
Off-line recognition of English handwritten characters is an open research area in pattern recognition and computer vision (Bayan, 2013; Binod, & Goutam, 2012). The shapes and styles of off-line English characters are complex, and some characters closely resemble one another (Binod, & Goutam, 2012; Nisha, Hem, & Singh, 2012). However, each character still has unique features. These unique features can be generalized as the individuality of the handwritten character, despite the complexity and high similarity among off-line English characters. Figure 1 shows an example of off-line English characters and the similarity among them. An improvement step is added to provide a better representation of input samples from the same or different characters. The features obtained in the feature extraction process show that off-line English characters share similar styles or formats, which affects recognition accuracy.

Off-Line English Character Individuality
Off-line English handwritten characters have long been considered individualistic, and character individuality rests on the hypothesis that each individual character is written consistently (Binod, & Goutam, 2012; Azmi, Kabir, & Badi, 2003; Bayan, 2012; Nisha, Hem, & Singh, 2012). Figure 1 shows the same character written by four writers, and Figure 2 shows different characters written by four writers. Characters are shown as taking a specific texture (Binod, & Goutam, 2012), as can be seen in these figures. The character structure differs slightly for identical characters and completely for non-identical characters; this is known as the individuality of English characters. Intra-class measurements are taken between features of the same character, and inter-class measurements between features of different characters. Good individual features must yield the minimum similarity error for intra-class comparisons and the maximum similarity error for inter-class comparisons.

Uniqueness in Off-line English Character Representation
Selecting the most predominant features as input to a classifier is important for achieving better recognition performance. Such features, however, do not capture the individual features of a character, because the same character is represented by different features. The proposed method is based on an invariant discretization algorithm studied in (Muda, Shamsuddin, & Ajith, 2010; Azmi, Kabir, & Badi, 2003; Bayan, 2012). It reduces the dissimilarity between features within a class (intra-class) and increases the dissimilarity between features of different classes (inter-class). The traditional and the proposed frameworks are shown in Figures 3 and 4, respectively.

Discretization Process
Discretization acts as a divider that performs two essential operations. The first is to convert the values of continuous features into discrete values. The second is to divide these values and categorize them into appropriate intervals. The main objective of discretizing continuous features is to represent them in a better way (Fabrice, & Ricco, 2005). Well-known discretization techniques include Equal Information Gain, Maximum Entropy, and Equal Interval Width. Another method, Invariants Discretization, proposed in (Muda, Shamsuddin, & Ajith, 2010; Fabrice, & Ricco, 2005), has been reported to perform better, giving higher accuracy and better identification rates. The method is supervised and starts by choosing suitable intervals to represent the writer's information (Muda, Shamsuddin, & Ajith, 2010; Fabrice, & Ricco, 2005; Bayan, & Shamsuddin, 2012; Bayan, & Siti, 2011). The upper and lower boundaries are then set for each interval. The number of intervals for an image must equal the number of feature vectors.

Feature Extraction Phase
Techniques that transform input sample data into a set of features are called feature extraction methods. Feature extraction reduces the dimension of the given data. The choice of feature extraction method is crucial and affects the performance of any pattern recognition system (Bayan, 2013; Trier, & Jain, 1996). Different extractors have been proposed to recognize handwritten digits and characters, such as FT, IM, GM, and Characteristic Loci (Takahashi, 1991; Azmi, Kabir, & Badi, 2003). In this paper, the geometric moment method is used to recognize handwritten off-line English characters. Geometric moments are used in object recognition and pattern recognition applications. A set of distinctive features computed for an object must be capable of identifying the same object at a different size and orientation (Muralidharan, & Chandrasekar, 2011; Bayan, 2015; Bayan, 2012).
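The computation steps of geometric moments are as follows:
1) Read the input image data from left to right and from top to bottom.
2) Threshold the image data to extract the target process area.
3) Compute the image moments $m_{pq}$ up to third order with the formula:
$m_{pq} = \int\!\!\int x^{p} y^{q} f(x, y)\,dx\,dy, \quad p, q = 0, 1, 2, \ldots$ (1)
4) Compute the intensity moment (centroid) $(\bar{x}, \bar{y})$ of the image with the formula:
$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}$ (2)
5) Compute the central moments $\mu_{pq}$ with the formula:
$\mu_{pq} = \int\!\!\int (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)\,dx\,dy, \quad p, q = 0, 1, 2, \ldots$ (3)
6) Compute the normalized central moments $\eta_{pq}$, used for scale normalization, up to third order with the formula:
$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q}{2} + 1$ (4)
7) Compute the geometric moment invariants $\phi_{1}$ to $\phi_{4}$, which are invariant to translation, scale and rotation, with the formulas below:
$\phi_{1} = \eta_{20} + \eta_{02}$ (5)
$\phi_{2} = (\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2}$ (6)
$\phi_{3} = (\eta_{30} - 3\eta_{12})^{2} + (3\eta_{21} - \eta_{03})^{2}$ (7)
$\phi_{4} = (\eta_{30} + \eta_{12})^{2} + (\eta_{21} + \eta_{03})^{2}$ (8)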
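The moment computation described in the steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the image has already been thresholded (step 2) into rows of non-negative pixel values, replaces the integrals with sums over pixel coordinates, and uses the standard first four Hu moment invariants. The names `raw_moment`, `central_moment` and `geometric_invariants` are hypothetical.

```python
from typing import List, Sequence

# Assumption: an image is a sequence of rows of pixel values, 0 = background.
Image = Sequence[Sequence[float]]


def raw_moment(img: Image, p: int, q: int) -> float:
    """Discrete m_pq: sum over all pixels of x^p * y^q * f(x, y)."""
    return float(sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(img)
        for x, value in enumerate(row)
    ))


def central_moment(img: Image, p: int, q: int, xc: float, yc: float) -> float:
    """Discrete mu_pq, taken about the centroid (xc, yc)."""
    return float(sum(
        ((x - xc) ** p) * ((y - yc) ** q) * value
        for y, row in enumerate(img)
        for x, value in enumerate(row)
    ))


def geometric_invariants(img: Image) -> List[float]:
    """Return the first four Hu moment invariants [phi1, phi2, phi3, phi4]."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    # Centroid of the character.
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def eta(p: int, q: int) -> float:
        # Normalized central moment; mu_00 equals m_00.
        gamma = (p + q) / 2 + 1
        return central_moment(img, p, q, xc, yc) / (m00 ** gamma)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)

    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return [phi1, phi2, phi3, phi4]
```

Calling `geometric_invariants` on a binary character image and on a shifted copy of the same image should give the same four values up to floating-point error, which is the translation invariance that the features rely on. Scale invariance holds only approximately on a pixel grid.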

Discretization Phase
The discretization process determines a set of intervals that represent the extracted features.
To obtain the intervals, the range between the lowest and highest data values of every writer is divided into a number of equal-sized intervals (cuts). The number of intervals is set according to the number of feature vectors from the feature extraction process. A representation value for each interval is estimated based on the character class. If two characters have an identical invariant value, they share the same interval for both classes. The discretization method does not affect or change the properties of the character; it only represents the basic, invariantly extracted feature vector in a standard representation with global features. Figure 5 depicts the discretization process.
Figure 5. Invariant Discretization Line (Muda, Shamsuddin, & Ajith, 2010)
The invariant discretization line uses the minimum ($v_{\min}$) and the maximum ($v_{\max}$) feature vectors to determine the range of the invariant intervals. The width of an interval can be found as:
$\text{width} = \frac{v_{\max} - v_{\min}}{f}$ (9)
where:
$v_{\min}$: represents the lowest value for a character.
$v_{\max}$: represents the highest value for a character.
$f$: represents the number of invariant feature vectors.
The width in equation (9) is used to find the cut points on the discretization line. Figures 6 and 7 illustrate the invariant feature vector and the discretized feature vector obtained from it, respectively. The discretized data produced by the discretization scheme clearly shows the unique features of every character in English handwriting.
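The interval construction described in this section can be sketched as follows. This is an illustrative reading, not the authors' code: the lowest and highest feature values of one character class define the discretization line, the line is cut into f equal intervals, and each feature value is replaced by a representative value of the interval it falls into. Using the interval midpoint as the representative value is an assumption, since the text does not state how the representation value is estimated. All function names are hypothetical.

```python
from typing import List, Sequence


def interval_width(v_min: float, v_max: float, f: int) -> float:
    """Width of one interval: the class range divided by the number of features."""
    return (v_max - v_min) / f


def cut_points(v_min: float, v_max: float, f: int) -> List[float]:
    """The f + 1 boundaries of the f equal intervals on the discretization line."""
    w = interval_width(v_min, v_max, f)
    return [v_min + k * w for k in range(f + 1)]


def discretize(values: Sequence[float], v_min: float, v_max: float, f: int) -> List[float]:
    """Replace each value by the midpoint of its interval (midpoint is an assumption)."""
    w = interval_width(v_min, v_max, f)
    if w == 0:
        # Degenerate class: every value is identical, so keep it unchanged.
        return [v_min for _ in values]
    result = []
    for v in values:
        k = int((v - v_min) / w)
        k = max(0, min(k, f - 1))  # v_max itself belongs to the last interval
        result.append(v_min + (k + 0.5) * w)
    return result


def discretize_class(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Discretize all feature vectors of one character class on a shared line."""
    flat = [v for sample in samples for v in sample]
    v_min, v_max = min(flat), max(flat)
    f = len(samples[0])  # number of intervals equals the number of features
    return [discretize(sample, v_min, v_max, f) for sample in samples]
```

Because every sample of a class is mapped onto the same small set of representative values, feature vectors of the same character tend to collapse toward identical discretized vectors, which is the intra-class effect the method aims for.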

Uniqueness Test Results
The Mean Absolute Error (MAE) function is used to measure the uniqueness of the character. Tables 1 and 2 present the MAE test results for 10 samples of every character. Features 1 to 4 are the extracted features that represent a character. The invarianceness between a character image and the reference image (the first image) is given by the MAE value. A small error means that the image is close to the reference image. The average MAE is taken over all results.
$MAE = \frac{1}{n} \sum_{i=1}^{f} \left| x_{i} - r_{i} \right|$ (10)
where:
$n$: is the number of images.
$x_{i}$: is the current image.
$r_{i}$: is the reference image or location measure.
$f$: is the number of features.
i: is the feature column of the image.
The invarianceness of the geometric moment data and the Geo-discretized data is determined by intra-class and inter-class analysis of the MAE values. The test results demonstrate that the Geo-Discretization scheme separates intra-class (identical character) and inter-class (non-identical character) features better than geometric moments alone: the intra-class MAE using Geo-discretized data is smaller than that using geometric moment data, and the inter-class MAE using Geo-discretized data is higher. A minimum intra-class MAE indicates that the features of an identical character are highly similar to each other, whilst a maximum inter-class MAE indicates that the features of non-identical characters differ widely. These results support the hypothesis that the discretization process can improve recognition by providing a standard representation of individual features for off-line English handwritten characters. The results indicate that the invariant discretization technique significantly improves the accuracy of off-line English handwritten character recognition compared with geometric moment data alone. For future work, similar experiments could be carried out on other characters to further improve the accuracy of the proposed method.
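The MAE-based uniqueness test used in this section can be sketched as below. The normalisation is an assumption based on the symbol definitions in the text: for each image, the absolute differences from the reference image are summed over the f feature columns and divided by the number of images n, and the per-image values are then averaged. The function names are hypothetical, and the first image is taken as the reference, as the text describes.

```python
from typing import List, Sequence


def mae_to_reference(image: Sequence[float], reference: Sequence[float], n: int) -> float:
    """Sum of |x_i - r_i| over the feature columns, divided by n (assumed form)."""
    if len(image) != len(reference):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - r) for x, r in zip(image, reference)) / n


def average_mae(images: Sequence[Sequence[float]]) -> float:
    """Average MAE of a set of images against the first image (the reference)."""
    n = len(images)
    if n == 0:
        raise ValueError("at least one image is required")
    reference = images[0]
    per_image: List[float] = [mae_to_reference(img, reference, n) for img in images]
    return sum(per_image) / n
```

For an intra-class test, `images` would hold feature vectors of the same character; for an inter-class test, feature vectors of different characters. Under the paper's hypothesis, the first call should return a smaller value than the second, and the gap should widen after discretization.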

Figure 1. Same Character by Different Writers

Figure 2. Different Characters by Different Writers

Figure 3. Traditional Framework

Figure 6. Invariant Feature Vector Data for Characters (h) and (n)

Figure 7. Example of Discretized Feature Data for Characters (h) and (n)
Figures 8 and 9 compare the MAE results of the recognition process for Geo-discretized data and geometric moment data.

Table 1. MAE Results using Geometric Moments