Factors That Influence the Entrepreneurial Intention of Nigerian Postgraduates: Preliminary Analysis and Data Screening

The aim of this paper is to conduct a preliminary analysis and data screening in relation to the effect of attitude, subjective norm and perceived behavioural control on the entrepreneurial intentions of Nigerian postgraduates. The population comprised 240 Master and PhD candidates at Universiti Utara Malaysia (UUM); the study used the convenience sampling method, which resulted in 156 respondents. The screening was also conducted to satisfy the assumptions of multivariate analysis. Using the Statistical Package for Social Science (SPSS) software version 20, univariate and multivariate outliers were checked and treated, and checks were performed for missing data, kurtosis and skewness, together with factor analysis and the Cronbach coefficient alpha reliability test. The data was then ready for multivariate analysis, as it fulfilled the necessary assumptions. The findings are therefore important to this study and to other researchers, who will benefit from the literature when conducting data screening and preliminary analysis.


Introduction
Economic development will be absent unless there is growth in venture creation that improves the availability of employment. Venture creation is thus a significant channel for job creation in both developed and developing nations (Owoseni, 2014; Uddin & Bose, 2012). Entrepreneurship therefore has a significant status in the fast-changing global socioeconomic environment (Ali, Topping, & Tariq, 2010). The interest of policy makers in entrepreneurial development is growing (Davey, Plewa, & Struwig, 2011; Karabulut, 2014; Owoseni, 2014). The efforts of governments and other institutions are evidence of this (Karabulut, 2014). Nonetheless, business creation is a difficult decision because it is a voluntary process that requires conscious intention (Linan, Nabi, & Krueger, 2013).
Despite the mounting rate of unemployment and its effect on crime, law and order (Owoseni, 2014), only a few studies have been conducted on entrepreneurial intentions in developing countries (Nabi & Linan, 2011; Sandhu, Sidique, & Riaz, 2011). Specifically, Nigeria lacks empirical research on entrepreneurial intentions (Izedonmi & Okafor, 2010).
According to Agbim, Oriarewo, and Owocho (2013), many contemporary studies have revealed the average entrepreneur to be more educated than the ordinary person. Accordingly, studies on entrepreneurial career intentions among university students are plentiful, but only a few have been conducted on graduate schools (Karabulut, 2014) or postgraduate candidates (Sandhu et al., 2011).

Literature Review
Intention is the best predictor of behaviour; thus, it can predict the process of venture creation (Krueger, Reilly, & Carsrud, 2000). Venture creation is not likely to take place without intention (Owoseni & Akambi, 2010). Attitudes, subjective norms and perceived behavioural control are described as the antecedents of intention (Ajzen, 1991). Thus, they can influence people's entrepreneurial intention and behaviour.

Methodology
The study used the quantitative survey method, and the data was analysed using descriptive and inferential statistics with the aid of SPSS version 20.

Population
The study population covers the 240 Nigerian postgraduate candidates of UUM, drawn from the three graduate colleges of the university: the College of Business (COB), the College of Legal and International Studies (COLGIS) and the College of Arts and Sciences (CAS). Of these 240 candidates, 157 are from COB, 49 from CAS and 34 from COLGIS.

Sampling
The Krejcie and Morgan (1970) sampling table was used to determine the representative sample size. The table states that the representative sample for a population of 240 should be 148; thus, the study's sample of 156 satisfies the minimum requirement for a representative sample. The convenience sampling method was, however, used to collect the data, while ensuring that candidates from all three colleges of the university were included in the survey.
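The tabled value can be reproduced from the formula that underlies the Krejcie and Morgan (1970) table. The sketch below is illustrative only: it assumes the conventional settings for that table (chi-square of 3.841 for one degree of freedom at the 0.05 level, a population proportion of 0.5 and a 5% margin of error), and the function name is hypothetical rather than part of the study.

```python
import math


def krejcie_morgan_sample_size(population, chi_square=3.841,
                               proportion=0.5, margin=0.05):
    """Minimum sample size from the Krejcie and Morgan (1970) formula."""
    numerator = chi_square * population * proportion * (1 - proportion)
    denominator = (margin ** 2 * (population - 1)
                   + chi_square * proportion * (1 - proportion))
    # Round up, since a fraction of a respondent cannot be sampled.
    return math.ceil(numerator / denominator)


required = krejcie_morgan_sample_size(240)
print(required)          # minimum representative sample for a population of 240
print(156 >= required)   # whether the 156 returned questionnaires meet it
```

For a population of 240 the formula gives the same minimum of 148 that the table reports.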

Response Rate
Of the 190 questionnaires distributed among the population of 240 postgraduate candidates, 156 were returned, indicating a response rate of 82%. The 156 respondents represent exactly 65% of the population and fulfil the requirement of a representative sample, which Krejcie and Morgan (1970) put at not less than 148 of the 240 candidates.

Findings and Discussion
The personal background of the respondents, together with the other preliminary analyses, is discussed in this section.
Respondents who have ever owned a business number 111 (71.2%), and those who have not number 45 (28.8%). Those whose family members run a business number 140 (89.7%), and those whose family members do not number 16 (10.3%). Respondents who have a role model in running their own business number 122 (78.2%), while those with no role model number 34 (21.8%).

Test of Non-response Bias
Testing for non-response bias is important for the study because there is a possibility of bias that needs to be scrutinized, however small the amount of non-response (Sheikh, 1981). Non-response bias refers to the error a researcher can expect to make when estimating a sample characteristic because some groups of respondents are under-represented as a result of non-response (Berg, 2002). Singer (2006, p. 641) states that "there is no minimum or maximum response rate below or above which a survey estimate is biased or never biased".
The respondents are classified into early and late respondents. This classification was applied to the four variables of the study (Attitudes, Subjective Norms, Perceived Behavioural Control and Entrepreneurial Intention). The questionnaire was first distributed in the second week of August; thus, the study tests for non-response bias between those who responded in August (early response) and those who responded in September (late response). From Table 2 below, the range, mean and standard deviation for the early and late responses vary distinctly. Table 3 reports the independent-samples t-test comparing the two groups.
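The comparison of early and late respondents can be sketched as a pooled-variance independent-samples t-test. The example below is a minimal illustration, not the SPSS procedure used in the study: the scores are hypothetical, the function name is an assumption, and the p-value relies on a large-sample normal approximation rather than the exact t distribution, so it is rough for small groups.

```python
from statistics import NormalDist, mean, variance


def pooled_t_test(early, late):
    """Independent-samples t statistic assuming equal group variances."""
    n1, n2 = len(early), len(late)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(early) + (n2 - 1) * variance(late)) / df
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t_stat = (mean(early) - mean(late)) / standard_error
    # Assumption: normal approximation to the two-tailed p-value.
    # SPSS reports the exact value from the t distribution instead.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_approx


# Hypothetical construct scores, not the study's data.
early_scores = [4.2, 3.8, 4.5, 4.0, 3.9, 4.1, 4.4, 3.7]
late_scores = [4.0, 3.9, 4.3, 3.6, 4.1, 3.8]
t_stat, df, p_approx = pooled_t_test(early_scores, late_scores)
print(f"t = {t_stat:.3f}, df = {df}, approximate p = {p_approx:.3f}")
```

A p-value above 0.05 for a construct would be read as no significant difference between the early and late groups on that construct.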

Data Prepared for Analysis
Each questionnaire was given a serial number before it was keyed into the SPSS software. This assists in tracing and facilitates a thorough check to ensure that the information was entered correctly. The serialization also makes it easier to distinguish the early from the late respondents.

Coding
Coding was used to ease the identification of items; thus, all items were coded to simplify data entry and analysis. The coding is based on each variable and was recorded according to the respective constructs.

Data Editing
Each questionnaire was checked during collection in order to avoid incompleteness; fortunately, all the returned questionnaires were fully answered. This might be related to the level of knowledge of the respondents, who are all postgraduates. Thus, there is no incomplete questionnaire or missing data.

Missing Data
Preventive measures were taken right from the start of the survey in order to avoid or reduce the rate of missing data, because of its effect on the analysis. Thus, the completed questionnaires were checked from start to end immediately after collection to ensure that each item had been properly answered. A participant who missed one or more questions should be politely requested to respond to them (Maiyaki & Moktar, 2011); this significantly helps to reduce the amount of missing data (Gorondutse & Hilman, 2014).
If the missing data amounts to 25% or more, it is advised that the questionnaire be excluded from further analysis (Cavana, Delahaye, & Sekaran, 2001). According to Hair, Black, Babin, and Anderson (2010), any case with more than 50% missing data should be excluded as long as the sample remains adequate (Maiyaki & Moktar, 2011). However, if there is any significant missing data at random, the data should not be used for further analysis and should therefore be removed (Maiyaki & Moktar, 2011).
After the data was keyed into the SPSS software, descriptive statistics were used to examine whether any data was missing. The result showed no missing values, and therefore the data is suitable for further analysis.

Assessment of Outliers
Another vital step in data screening is the assessment and treatment of outliers. An outlier is an extreme case score that may have a notable negative influence on the results (Maiyaki & Moktar, 2011). Outliers usually have an unusually low or high value on a construct, or a unique combination of values across several constructs, which makes the case stand out from the rest (Hair et al., 2010; Bryn, 2010).
The use of multivariate analysis therefore calls for the detection and treatment of outliers. Both univariate and multivariate outliers were examined in this study. Univariate outliers were examined by detecting cases with high z-score values. Any case with a standardized z-score above 3.29 is regarded as a potential univariate outlier (Tabachnick & Fidell, 2007); on this basis, 10 cases were identified and removed.
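The z-score rule can be illustrated with a short sketch. It assumes that the cutoff is applied to the absolute value of the standardized score, so that unusually low values are flagged as well as unusually high ones; the data and the function name are hypothetical.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.29  # threshold cited in the text from Tabachnick and Fidell (2007)


def univariate_outliers(values, cutoff=Z_CUTOFF):
    """Return (case index, z-score) for cases whose |z| exceeds the cutoff."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation
    flagged = []
    for index, value in enumerate(values):
        z = (value - centre) / spread
        if abs(z) > cutoff:
            flagged.append((index, round(z, 2)))
    return flagged


# Hypothetical scores for one variable, with one extreme case at the end.
scores = [4, 5] * 15 + [20]
print(univariate_outliers(scores))
```

In this illustration only the last case is flagged; the remaining cases lie well inside the cutoff.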
On the other hand, the Mahalanobis Distance (D) was computed to find and treat multivariate outliers (Hair et al., 2010), with reference to the suggestions of Tabachnick and Fidell (2007). According to Tabachnick and Fidell (2007), the number of items used in the study serves as the degrees of freedom when the critical value is looked up in the Chi-square table. In this case 21 items are used, so with 21 degrees of freedom at a significance level of p < 0.05 the critical value is 32.671. Therefore, any case with a Mahalanobis Distance of 32.671 or above is regarded as a multivariate outlier that needs to be deleted. Fortunately, no case exceeds the critical value of 32.671.
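A minimal sketch of the Mahalanobis check follows. It assumes that the value compared with the Chi-square critical value is the squared distance of each case from the centroid, computed with the sample covariance matrix. The two-item data set and all function names are hypothetical, and the matrix inversion is a plain Gauss-Jordan routine written only to keep the example self-contained.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1)."""
    means = mean_vector(rows)
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    size = len(means)
    return [[sum(r[i] * r[j] for r in centred) / (len(rows) - 1)
             for j in range(size)] for i in range(size)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def mahalanobis_squared(rows):
    """Squared Mahalanobis distance of every case from the centroid."""
    means = mean_vector(rows)
    inverse = invert(covariance_matrix(rows))
    size = len(means)
    result = []
    for row in rows:
        d = [x - m for x, m in zip(row, means)]
        result.append(sum(d[i] * inverse[i][j] * d[j]
                          for i in range(size) for j in range(size)))
    return result


def multivariate_outliers(rows, critical_value):
    """Indices of cases whose squared distance reaches the critical value."""
    return [i for i, d2 in enumerate(mahalanobis_squared(rows))
            if d2 >= critical_value]


# Hypothetical scores on two items, not the study's data.
cases = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (10, 1)]
# 5.991 is the Chi-square critical value for 2 degrees of freedom at p < 0.05;
# the study's 21 items correspond to the 32.671 quoted in the text.
print(multivariate_outliers(cases, critical_value=5.991))
```

With the study's 21 items, the same function would be called with `critical_value=32.671`.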

Normality
Screening for normality is a very important step in almost all multivariate analyses, so long as the final objective of a study is to make inferences (Tabachnick & Fidell, 2007; Hair et al., 2010). According to Tabachnick and Fidell (2007), the normality test is concerned with the shape of the distribution of data for a single continuous construct and how it relates to the normal distribution. Hair et al. (2010) and Tabachnick and Fidell (2007) stated that normality is the most important assumption in multivariate analysis.
The normality test includes univariate and multivariate normality, both of which are treated in this study.
The values of skewness are found to be below 2, while the values of kurtosis are below 7. The acceptable range is < 2 for skewness and < 7 for kurtosis (Gorondutse & Hilman, 2014). Thus, the values are within the acceptable range.
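The skewness and kurtosis check can be sketched as follows. The example assumes the bias-adjusted sample formulas that statistical packages commonly report (with kurtosis expressed as excess kurtosis, so that a normal distribution scores 0) and applies the limits of 2 and 7 to absolute values; both choices, the data and the function names are assumptions made for illustration.

```python
from statistics import mean, stdev


def skewness(values):
    """Bias-adjusted sample skewness."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (normal distribution = 0)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def within_normal_range(values, skew_limit=2.0, kurtosis_limit=7.0):
    """Apply the limits quoted in the text to absolute skewness and kurtosis."""
    return (abs(skewness(values)) < skew_limit
            and abs(excess_kurtosis(values)) < kurtosis_limit)


# Hypothetical scores for one variable, not the study's data.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 2]
print(round(skewness(scores), 3), round(excess_kurtosis(scores), 3))
print(within_normal_range(scores))
```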
If the values fall outside the acceptable range, the best remedy is to transform the variable, which will improve the results (Tabachnick & Fidell, 2007).
The homoscedasticity test is also associated with the normality assumption: heteroscedasticity is absent when the data is fairly normal, and thus the relationships between variables are assumed to be homoscedastic (Tabachnick & Fidell, 2007). The assumption of homoscedasticity is satisfied in this study, as both univariate and multivariate normality are verified.

Multicollinearity
According to Maiyaki and Moktar (2011), multicollinearity weakens the analysis, because the interrelationship between two or more variables inflates the size of the error terms, as the interrelated variables contain redundant information. The remedy for multicollinearity is to delete the interrelated variable (Gorondutse & Hilman, 2014). Multicollinearity is therefore checked using correlation analysis and the VIF and tolerance levels.

Correlation Analysis
To ascertain the direction and strength of the relationships between the variables of this study, the Pearson correlation was used. This helps to establish whether there is a threat of multicollinearity.
According to Tabachnick and Fidell (2007), multicollinearity arises when the correlation between independent variables is 0.9 or above. The Pearson correlation analysis is depicted in Table 4. From the table, none of the correlations reaches 0.9; thus, there is no threat of multicollinearity in light of the arguments of Tabachnick and Fidell (2007) and Hair et al. (2010).
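The correlation screen can be illustrated with a small sketch that computes the Pearson coefficient for every pair of variables and flags pairs at or above 0.9. The variable names and scores are hypothetical, and applying the threshold to the absolute value of the coefficient is an assumption.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def collinear_pairs(variables, threshold=0.9):
    """Pairs of variables whose |r| reaches the multicollinearity threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical construct scores, not the study's data.
data = {
    "attitude": [4, 5, 3, 4, 2, 5, 3, 4],
    "subjective_norm": [3, 4, 3, 5, 2, 4, 2, 3],
    "pbc": [5, 4, 2, 3, 3, 4, 4, 2],
}
print(collinear_pairs(data))
```

An empty list means that no pair of variables reaches the threshold.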

Variance Inflation Factor (VIF)
Another method for screening multicollinearity is the Variance Inflation Factor (VIF) and the tolerance level, which can be obtained through regression analysis in SPSS (Gorondutse & Hilman, 2014). According to Hair et al. (2010), the tolerance value must not fall below 0.10, while the VIF value must not exceed 10. When the VIF is less than 10, the result is good enough (Tabachnick & Fidell, 2007). Table 5 shows the VIF and the tolerance value for each of the independent variables. From Table 5, there is no threat of multicollinearity, because the VIF for all the independent variables is less than 10 and the tolerance values are all above 0.10.
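The VIF and tolerance values can be sketched without a regression routine by using the fact that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, with tolerance as its reciprocal. The example below is an illustration under that identity; the scores and function names are hypothetical and it is not the SPSS regression output used in the study.

```python
def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def vif_and_tolerance(predictors):
    """Map each predictor name to its (VIF, tolerance) pair."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: (inverse[i][i], 1 / inverse[i][i])
            for i, name in enumerate(names)}


# Hypothetical scores for the three independent variables, not the study's data.
predictors = {
    "attitude": [4, 5, 3, 4, 2, 5, 3, 4],
    "subjective_norm": [3, 4, 3, 5, 2, 4, 2, 3],
    "pbc": [5, 4, 2, 3, 3, 4, 4, 2],
}
for name, (vif, tolerance) in vif_and_tolerance(predictors).items():
    acceptable = vif < 10 and tolerance > 0.10
    print(f"{name}: VIF = {vif:.3f}, tolerance = {tolerance:.3f}, ok = {acceptable}")
```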

Factor Analysis for the Variables
All items in this study were subjected to Principal Component Analysis (PCA) using the SPSS software (Hair et al., 2010; Bryn, 2010). Although the items of the study are adopted from past studies, factor analysis is still important (Gorondutse & Hilman, 2014).
The results show that all values in the correlation matrix are < 0.9, indicating that the data has no case of multicollinearity (Hair et al., 2010; Nunally & Bernstein, 2004). The correlation matrix further shows a number of coefficients with values > 0.3; therefore, the first requirement for assessing factorability is fulfilled (Gorondutse & Hilman, 2014). Kaiser (1974) recommended that Kaiser-Meyer-Olkin (KMO) values ranging from 0.5 to 0.7 are mediocre, values from 0.7 to 0.8 are good, values from 0.8 to 0.9 are great, and values above 0.9 are excellent. The KMO measure of sampling adequacy was found to be 0.895, which is above the recommended value of 0.6 (Kaiser, 1970, 1974; Maiyaki & Moktar, 2011). Thus, the value of 0.895 is a great value, and the data is therefore regarded as fit for factor analysis.
Additionally, Bartlett's Test of Sphericity was statistically significant (p < 0.001), which supports the factorability of the correlation matrix and indicates some associations between the variables under study.
The cumulative variance was 40.696%. The result also revealed a communality value above 0.5 for all items, with the exception of PBC2, which has 0.494 and was therefore marked for deletion. This is because, according to Kaiser (1974), the communality of all variables should be ≥ 0.50. From Table 6, all factors have a high factor loading, which confirms that the constructs are measured by different variables, as earlier postulated.

Reliability Analysis
Reliability analysis discloses the degree to which a measure is error free, and unveils the consistency, stability and goodness of the measure. The Cronbach alpha is the most widely used technique for reliability analysis (Cavana et al., 2001). The goal of measuring the Cronbach coefficient alpha is to ascertain the internal consistency of a scale (Sandhu et al., 2011). Cronbach alpha also indicates how the study items are correlated with each other (Sekaran, 2003). The closer Cronbach alpha is to 1, the greater the internal consistency (Sandhu et al., 2011). All values of the Cronbach alpha are greater than 0.70 in this study; thus, the instruments are internally consistent. The results of the reliability analysis are depicted in Table 7 below.
Entrepreneurial Intention: 6 items, alpha = .825
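The Cronbach coefficient alpha can be sketched directly from its usual definition, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The responses below are hypothetical and the function name is an assumption; the sketch is meant only to show how the coefficient relates to the item scores.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a scale.

    responses: one list of item scores per respondent, all of equal length.
    """
    k = len(responses[0])
    # Variance of each item across respondents (sample variance).
    item_variances = [variance(item) for item in zip(*responses)]
    # Variance of each respondent's total score across the scale.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of six respondents to a four-item scale.
answers = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 5],
    [3, 2, 3, 3],
]
alpha = cronbach_alpha(answers)
print(f"alpha = {alpha:.3f}, internally consistent = {alpha > 0.70}")
```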

Conclusion
The findings of this study show that the data fulfilled the essential prerequisites for the multivariate analysis stage. The study examined univariate and multivariate outliers as suggested by Tabachnick and Fidell (2007) and Hair et al. (2010) and removed the cases identified; the data therefore approximates a normal distribution. Non-response bias was also not found in the study. The data is clean and fully screened and is thus ready for multivariate analysis (Tabachnick & Fidell, 2007). Multicollinearity was likewise not found to exist, with reference to the suggestions of Tabachnick and Fidell (2007) and Hair et al. (2010). The implication of this study is that the data has been made to satisfy the most important assumptions and requirements of the multivariate analysis stage, and the study will provide literature to other researchers. The findings therefore offer insight for further analysis and an understanding of why and how these screening procedures may be applied in a changing environment.

Table 1. Descriptive results for the respondents' profile

Table 2. Descriptive statistics for early respondents and late respondents

Table 3. Independent samples t-test for equality of means and Levene's test for equality of variances
The two-tailed t-test shows no significant difference between the early and late respondents for Attitude (t = 1.350, p = 0.179), Subjective Norms (t = 1.027, p = 0.306), Perceived Behavioural Control (t = 1.692, p = 0.093) and Entrepreneurial Intention (t = 0.176, p = 0.871).

Table 4. Correlation between the study variables

Table 5. VIF and tolerance values for multicollinearity test

Table 6. Factor loading and communality for exogenous variables
Table 6 shows the values of communality and the factor loadings for each item.

Table 7. Cronbach coefficient alpha values for attitude, subjective norms, perceived behavioural control and entrepreneurial intentions