The Factors Affecting Eye Patients (Cataract) In Jordan by Using the Logistic Regression Model

This study aims to use the logistic regression model to classify patients as having or not having cataracts. The independent variables represent gender, age, pressure in the right eye, pressure in the left eye, HbA1c, and anemia, which are taken as representative variables for the study of cataract, based on a random sample of 116 patients. The results indicate that the fitted logistic regression model is efficient and representative of the data, as shown by the likelihood ratio test and the Hosmer and Lemeshow test. The study also found a Nagelkerke R Square of 1, which means that 100% of the variation in the response variable is explained by the logistic regression model.


Introduction
The human eye is a jewel, and it is the tool that enables us to see the things around us. It is a priceless grant from God. Environmental factors, lifestyle, behavior, and the daily practices of individuals can damage the eye, but the biggest danger to the eye is inflammation, so caution is a must. Eye diseases (inflammations) include trachoma, smallpox, leprosy, optic neuritis, cataract (white water), glaucoma (blue water), and reproductive system diseases such as gallstones and syphilis. Eye diseases are associated with changes and complaints such as pain, swelling, redness, a sudden change in vision such as double or blurred vision, headache, and intense sensitivity to light. The causes of partial vision loss are infections, hereditary diseases, congenital diseases, diseases of malnutrition, tumors, glaucoma, short-sightedness, and poor vision. Cataract is the opacity of the transparent lens in the eye, while glaucoma is a disease of the optic nerve in which high pressure in the eye damages the optic nerve tissue, leading to the formation of blind spots (partial loss of the vision field). If the disease is not treated, the optic nerve is completely damaged and the eye loses its ability to see. Glaucoma is commonly known as blue water, but this is a common mistake, as there is no blue water inside the eye. The name came from the Greek word "Glaucoma", which means blue waterfalls, because the patient sometimes sees blue halos around a light source, giving the impression that there is blue water inside the eye. Glaucoma is the leading cause of blindness in the elderly, which can be prevented if treatment is initiated early.
Many medical studies point out that there is a close relationship between the diagnostic aspects of disease and the statistical studies and applications used in this field, and this helps speed the early diagnosis of difficult diseases. However, almost all of the statistical studies and related applications in the Arab region in general, and at the local level in particular, that applied the regression model in the medical field focused on examining the factors influencing the incidence of various diseases (the risk factors and causes of the different diseases) and did not address the diagnostic side of the disease. This is the research problem. Therefore, this study examines the relationship between the factors that assist in the diagnosis of cataract (white water) cases and the presence of the disease, and prioritizes these factors using the regression model. This research aims to use the logistic regression model to classify patients as having or not having cataracts.

Literature Review
Some previous studies dealing with the application of the regression model in the medical field are presented below. Hassan and Ahmed (2014) employed the regression model to determine the factors that affect the incidence of anemia in children (2009-2013). Their study aimed to analyze several factors that affect anemia in children. Furthermore, Al-Sharout, Mohammed, and Mohsen, Amira Jabber (2013) used regression analysis to study breast cancer risk, focusing on the use of regression as a useful tool in studying the factors affecting breast cancer.
Moreover, Yusuff, N. Mohamad, U. K. Ngah & A. S. Yahaya (2011) wrote a paper entitled Breast Cancer Analysis Using Logistic Regression. This study used regression as a tool for diagnosing breast cancer from mammogram images of the breast. Qasim and Abd al-Razzaq (2011) studied the effect of a number of variables, such as sedimentation, malnutrition, and other factors, on periodontal disease by using the regression model. They found that regression analysis is the appropriate statistical tool to employ. Mustafa and Khader (2011) studied the factors leading to traffic accidents and the resulting casualties in Al-Jazerah Province. The data were collected from all the victims of these accidents, including the cause and description of the accident, the vehicle type, the road and place of the accident, and other factors related to the accident circumstances, for all accidents that took place in the province between January 2005 and December 2006. The source of the data was the traffic court. SPSS was used to enter and analyze the data, and the logistic regression model was the only method used for this purpose. The study showed that the logistic regression model is capable of predicting the casualty factors. It found that the factors related to the driver are the most influential in the increase of casualties from road accidents. Ali and Mohammed (2010) aimed at building a logistic regression model for breast cancer cases among Sudanese women, with reference to the state of Khartoum, and then used logistic regression to study the status of breast cancer in Sudan.
Abdul Majed and Hasan (2009) used multiple logistic regression models to determine the risk factors for glaucoma. Five variables were used as potential risk factors.
Yassin, Amal Hassan (2008) used the regression model to determine the causes of breast cancer in females, in a case study of the National Center for Nuclear Medicine in Khartoum. This study aimed to identify the factors affecting breast cancer in 200 women from among the patients of the National Center for Nuclear Medicine in Khartoum. Al-Bayati and Ibrahim (2005) applied path analysis in the regression model with a particular application. This study addressed the direct and indirect effects of a group of factors influencing the incidence of anemia in the two population groups most vulnerable to the disease: people under 18 years of age and pregnant women.

Method
The research is based on the following methodology:
1) Explaining the statistical background of the regression model, focusing on its characteristics and on how its parameters are estimated.
2) Applying the multiple regression model and examining the results using the SPSS and Minitab software.

Variables Used In the Study
1. Sex.
2. Age.
3. Intraocular pressure: physiological intraocular pressure is between 12 and 19 mm Hg. Any increase in the internal pressure of the eye above 19 mm Hg is considered the beginning of the disease, which should be treated and re-examined periodically, every month for 3 or 6 months depending on the condition of the disease. If the pressure of the eye drops below the normal range, it may lead to atrophy of the eye, which is very rare.
4. Sugar: this variable is very important and has a significant impact on the incidence of eye disease. The normal range of blood sugar in humans is between 4.5 and 6.5 mmol per liter. When blood sugar drops below its normal level it causes fainting, while when it rises it leads to bleeding inside the eye and consequently causes blindness. Cataract deteriorates much faster in diabetics than in healthy people.
5. Packed cell volume or hemoglobin ratio in the blood: the normal ratio is between 40 and 45%. A patient who is to undergo general anesthesia should have a normal blood ratio.

Logistic Regression
Logistic regression is defined as a statistical method for modeling a data set in which a response variable (dependent variable) is explained through one or more independent variables, where the response variable is of the binary (dichotomous) type [Logistic regression - MedCalc].
It is also known as a statistical modeling method for categorical data, and it is widely used in medical research, finance, business, marketing, economics, and insurance (health, etc.) (El-Habi, Abdalla & El-Jazzar, Majed, 2014).
Logistic regression resembles linear regression in that it relates a dependent variable to a set of independent (explanatory) variables, but it differs in that the dependent variable must be of the binary type, and this difference is reflected in the form of the model and in its assumptions.
Additionally, logistic regression does not require the independent variables to be continuous or normally distributed, and the relationship between the dependent variable and the independent variables need not be linear. This makes the logistic regression model more flexible than other prediction and classification models (Abbas, Khodair, 2012).
Moreover, the categories must be well defined and mutually exclusive, so that each item belongs to only one category. Finally, the sample size used in logistic regression should be larger than the sample size used in linear regression, because the coefficients of the model are estimated by the maximum likelihood method, which requires a relatively large sample.

Estimation of Logistic Regression Parameters
The regression model is based on the fundamental assumption that the response variable (Y) is a binary variable following the Bernoulli distribution, taking the value (1) with probability (p), the occurrence of the response, and the value (0) with probability (q = 1 - p), the non-occurrence of the response (Ghanem & Fareed, 2011).
With only one independent variable, the logistic regression model can be expressed by the equation (Hosmer et al., 2013):

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \qquad (1)$$

such that $\pi(x) = E(Y \mid x)$ represents the conditional mean of the response variable (Y) at a given value of (x), and $\beta_0$, $\beta_1$ are the logistic regression parameters.
If the model contains more than one independent variable, say (k) independent variables, it is expressed by the following formula:

$$\pi(\mathbf{x}) = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}} \qquad (2)$$

such that $\beta_0, \beta_1, \dots, \beta_k$ are the logistic regression parameters (Hosmer & Lemeshow, 2013).
Suppose we have a sample of (n) independent observations, each a pair $(x_i, y_i)$, $i = 1, 2, \dots, n$, such that (Hosmer et al., 2013):
$y_i$: the value of the binary response variable for observation (i).
$x_i$: the value of the independent variable for observation (i).
When $y_i = 1$, the contribution of the observation to the likelihood is $\pi(x_i)$, and when $y_i = 0$, the contribution is $1 - \pi(x_i)$. Both cases can be written together as:

$$\pi(x_i)^{y_i} \, [1 - \pi(x_i)]^{1 - y_i} \qquad (3)$$

The likelihood function can then be expressed by the following formula (Hosmer et al., 2013):

$$l(\beta) = \prod_{i=1}^{n} \pi(x_i)^{y_i} \, [1 - \pi(x_i)]^{1 - y_i} \qquad (4)$$

By taking the natural logarithm of both sides of equation (4) we get (Hosmer et al., 2013):

$$L(\beta) = \ln l(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln \pi(x_i) + (1 - y_i) \ln [1 - \pi(x_i)] \right\} \qquad (5)$$

We now differentiate equation (5) with respect to the parameters and set the derivatives equal to zero. This gives a set of equations that can be solved only through an iterative algorithm (Qasim & Abdel Razaq, 2011), the so-called iteratively weighted least squares algorithm (Marlene, 2004).
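As a minimal sketch of the estimation steps above, the following Python code evaluates the log-likelihood of equation (5) for a single independent variable and maximizes it by Newton-Raphson iterations, which for this model produce the same updates as iteratively weighted least squares. The function names (`pi`, `log_likelihood`, `fit`), the iteration cap, the tolerance, and the sample data are assumptions made for illustration. They are not taken from the study.

```python
import math

def pi(x, b0, b1):
    """Logistic function pi(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))  # same function, stable for large z
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(xs, ys, b0, b1):
    """Sum over observations of y*ln(pi) + (1 - y)*ln(1 - pi)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = pi(x, b0, b1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, max_iter=20, tol=1e-8):
    """Newton-Raphson for one predictor. Returns (b0, b1, converged)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = pi(x, b0, b1)
            w = p * (1.0 - p)          # weight of the observation
            g0 += y - p                # derivative with respect to b0
            g1 += (y - p) * x          # derivative with respect to b1
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:               # information matrix is singular
            return b0, b1, False
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            return b0, b1, True
    return b0, b1, False               # iteration cap reached

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, converged = fit(xs, ys)
```

The `converged` flag separates a solution that satisfies the likelihood equations from a run that stopped at the iteration cap. When the two response classes are perfectly separated by the predictor, the likelihood has no finite maximum and the function returns `converged = False`.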

Wald Test
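The test hypotheses are:

$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0$$

The test statistic is

$$W_j = \left( \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \right)^2$$

which is distributed as chi-square with (1) degree of freedom for each parameter, where:
$\hat{\beta}_j$: the estimated parameter of rank (j).
$SE(\hat{\beta}_j)$: the standard error of the estimated parameter of rank (j).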
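The Wald test can be sketched in a few lines of Python. The function name `wald_test` and the numeric inputs are hypothetical. The p-value uses the fact that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(W/2)).

```python
import math

def wald_test(beta_hat, se):
    """Return (W, p_value) for H0: beta_j = 0 against H1: beta_j != 0."""
    w = (beta_hat / se) ** 2
    # Chi-square upper tail with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical estimate and standard error, for illustration only.
w, p_value = wald_test(1.2, 0.5)
reject_h0 = p_value < 0.05
```

A parameter whose p-value falls below the chosen significance level is taken to differ from zero.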

Hosmer & Lemeshow Test
This test is used to determine whether the model fits the data well, through the difference between the observed values and the expected values. Its statistic follows the chi-square distribution (Qasim & Abdel Razaq, 2011), and it is used to test the following hypotheses (Wuensch, 2014):
H0: There is no significant difference between the observed values and the expected values.
H1: There is a significant difference between the observed values and the expected values.
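A minimal sketch of how the Hosmer and Lemeshow statistic is usually computed: the observations are sorted by fitted probability, split into groups, and the observed and expected numbers of events are compared within each group. The function names, the choice of 10 groups (which gives 8 degrees of freedom), and the rule of skipping groups whose expected variance is zero are assumptions of this sketch. The closed-form p-value is valid only for an even number of degrees of freedom.

```python
import math

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df) from responses y (0/1) and fitted probabilities p."""
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[i] for i in idx)    # observed events in the group
        expected = sum(p[i] for i in idx)    # expected events in the group
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)
        if denom > 0:                        # assumption: skip zero-variance groups
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for an even number of degrees of freedom."""
    if df < 2 or df % 2 != 0:
        raise ValueError("df must be an even number of at least 2")
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

# Hypothetical responses and fitted probabilities, for illustration only.
y = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.35, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9]
stat, df = hosmer_lemeshow(y, p, groups=4)
p_value = chi2_sf_even(stat, df)
```

When every fitted probability equals the observed response exactly, the statistic is 0 and the p-value is 1.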
If the significance (p-value) of the Hosmer & Lemeshow statistic is greater than (0.05), the model is considered a good representation of the data (Wuensch, 2014).

The Application Side
Table (4) shows the quality and efficiency of the model using the likelihood ratio test, which follows the chi-square distribution with degrees of freedom equal to the number of independent variables (6).

Table 4. Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step     143.695      6    .000
         Block    143.695      6    .000
         Model    143.695      6    .000

As Table (4) shows, the p-value equals (0.000), which indicates the significance and the quality of the fitted model. The results of the model summary are given in Table (5).
It is clear from the results in Table (6) that the Hosmer and Lemeshow statistic, which follows the chi-square distribution, has the value (0.0) with a p-value of (1), which is not statistically significant. So there is no difference between the observed values and the expected values. The results in Table (7) confirm this.

Hosmer and Lemeshow test results (Table 6):
Step   Chi-square   df   Sig.
1      .000         8    1.000

For each estimated parameter of the logistic regression model there is a Wald statistic. Column number (6) gives the odds ratio of the disease, Exp(β), for a one-unit change in the value of the associated independent variable, and column number (7) gives the confidence limits for this estimate.
If the value of Exp(β) is larger than 1, the odds of the disease increase as the independent variable increases, while if the value of Exp(β) is less than 1, an increase in the value of the independent variable leads to a decrease in the odds of the disease.
For the variable (x1), which represents gender, the value (0.143) indicates that the chance of having cataract is higher in males than in females by (0.143).
For the variable (x2), which represents age, the value (0.061) means that with every one-year increase in the age of the patient, the chance of having cataract increases by (0.061).
For the variable (x3), which represents the pressure in the right eye, the value (0.4161) indicates that with every one-unit increase in the pressure of the patient's right eye, the chance of having cataract increases by (0.4161).
For the variable (x4), which represents the pressure in the left eye, the value (0.016) indicates that with every one-unit increase in the pressure of the patient's left eye, the chance of having cataract increases by (0.016).
For the variable (x5), which represents HbA1c (cumulative sugar), the value (0.102) indicates that with every one-unit increase in the patient's HbA1c, the chance of having cataract increases by (0.102).
For the variable (x6), which represents anemia, the value (0.970) indicates that with every one-unit increase in the anemia measure, the chance of having cataract increases by (0.970).
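The odds-ratio reading of a coefficient can be sketched as follows. The function name `odds_ratio`, the 1.96 multiplier for approximate 95% confidence limits, and the numeric inputs are assumptions for illustration. The inputs are not the estimates reported in this study.

```python
import math

def odds_ratio(beta_hat, se, z=1.96):
    """Return (Exp(B), lower limit, upper limit) for one coefficient."""
    return (
        math.exp(beta_hat),
        math.exp(beta_hat - z * se),
        math.exp(beta_hat + z * se),
    )

# Hypothetical coefficient and standard error, for illustration only.
exp_b, lower, upper = odds_ratio(0.25, 0.10)
# exp_b > 1: the odds of the disease rise with a one-unit increase in the variable.
# exp_b < 1: the odds of the disease fall with a one-unit increase in the variable.
```

A confidence interval that contains 1 means the data are compatible with no change in the odds.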

Conclusions
1- The study shows that there is a relationship between the independent variables and the response variable, as represented by the Nagelkerke R Square value, which equals (1).
2- The study found that the fitted logistic regression model is efficient and representative of the data, as shown by the likelihood ratio test and the Hosmer and Lemeshow test.
3- The study showed that 100% of the people who had cataract were classified correctly, and 100% of the people who did not have cataract were also classified correctly, while the percentage of people who were wrongly classified was (0.0). This is an excellent indicator of the capability of the model to classify patients correctly.
4- The study found a Nagelkerke R Square of (1). This means that 100% of the variation in the response variable is explained by the logistic regression model.
5- The study showed that the independent variables, namely gender, age, pressure in the right eye, pressure in the left eye, HbA1c, and anemia, are representative variables for the study of cataract, which affects the eyes.

Recommendations
The researcher recommends the following:
1. Adopting the logistic regression method as an efficient statistical method for the classification of patients in the medical field.
2. Conducting extensive studies in ophthalmology and expanding the sample size to cover more aspects of human eye diseases.

Evaluation of the Strength of the Model
In this model, measures based on the maximized likelihood function are used instead of the coefficient of determination (Qasim & Abdel Razaq, 2011). They are calculated from the following two quantities:
L0: the maximum of the likelihood function when the model includes only the constant.
L1: the maximum of the likelihood function when the model includes all the independent variables.
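As a hedged illustration of how such measures are built from L0 and L1, the sketch below uses the standard definitions of the likelihood ratio chi-square, the Cox & Snell R Square, and the Nagelkerke R Square. Whether these match the formula originally printed in this section is an assumption. The inputs are the values reported in this study, read here as -2 log-likelihoods: 143.695 for the constant-only model, 0.0 at the final iteration, and a sample size of 116.

```python
import math

def pseudo_r2(dev0, dev1, n):
    """dev0 = -2 ln L0 (constant only), dev1 = -2 ln L1 (all variables)."""
    g = dev0 - dev1                          # likelihood ratio chi-square
    cox_snell = 1.0 - math.exp(-g / n)       # 1 - (L0 / L1)^(2/n)
    max_cox_snell = 1.0 - math.exp(-dev0 / n)  # 1 - L0^(2/n)
    nagelkerke = cox_snell / max_cox_snell
    return g, cox_snell, nagelkerke

g, cox_snell, nagelkerke = pseudo_r2(143.695, 0.0, 116)
# g is 143.695, cox_snell is about 0.71, nagelkerke is 1.0
```

Under this reading, the three values agree with the chi-square of the omnibus test and with the two R Square values reported in the model summary.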

Table (1) shows the number of cases included in the model, which is 116 cases, while the number of missing cases was zero.

Table (2) shows the coding of the binary response variable: the code zero stands for people with cataract, while the code one stands for people without cataract.
Table (3) shows that the value of the -2 log-likelihood reached (0.0) at iteration (20), whereas its initial value in the model was (143.695). The estimates obtained at iteration (20) are taken as the best estimates in this model.

Table 3. Iteration History

Table 5. Model Summary
Note: Estimation terminated at iteration number 20 because the maximum number of iterations was reached. A final solution could not be found.
Table (5) shows that the Cox & Snell R Square equals (0.71), which means that 71% of the variation in the response variable is explained by the logistic regression model, while the Nagelkerke R Square equals (1), which means that 100% of the variation in the binary response variable is explained by the logistic regression model.

Table 6. Hosmer and Lemeshow Test

Table 7. Contingency Table for Hosmer and Lemeshow Test

Table (8) shows that 100% of the people with cataract were classified correctly and 100% of the people without cataract were also classified correctly, while the percentage of people classified incorrectly was 0.0%. This is an excellent indicator of the ability of the model to classify correctly.
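A classification table of this kind can be sketched as follows. The function name `classification_table`, the cutoff of 0.5, and the sample values are assumptions for illustration. Class labels follow whatever 0/1 coding the response variable uses.

```python
def classification_table(y, p, cutoff=0.5):
    """Return (table, row_percent_correct, overall_percent_correct).

    table[observed][predicted] counts cases, with classes coded 0 and 1.
    """
    table = [[0, 0], [0, 0]]
    for y_i, p_i in zip(y, p):
        predicted = 1 if p_i >= cutoff else 0
        table[y_i][predicted] += 1
    row_percent = []
    for k in (0, 1):
        row_total = sum(table[k])
        row_percent.append(100.0 * table[k][k] / row_total if row_total else float("nan"))
    overall = 100.0 * (table[0][0] + table[1][1]) / len(y)
    return table, row_percent, overall

# Hypothetical responses and fitted probabilities, for illustration only.
y = [0, 0, 1, 1, 1]
p = [0.1, 0.6, 0.7, 0.4, 0.9]
table, row_percent, overall = classification_table(y, p)
```

Each row percentage is the share of one observed class that the model assigns to that class, and the overall percentage is the share of all cases classified correctly.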