Explaining Individuals’ Usage of Social Commerce: A Data Mining Approach

With the use of Web 2.0 technology, e-commerce is undergoing a radical change that enriches consumer involvement and enables a better understanding of economic value. This emerging phenomenon is known as social commerce. Social commerce (s-commerce) presents a new alternative for consumers to search for and find information about products they are seeking to buy. In spite of its universality, the adoption of this burgeoning technology is affected by several factors. This research project is an initial attempt to explore individuals’ intention of s-commerce usage through the data mining approach. The data were collected via a web-based questionnaire survey of 360 social network site (SNS) users in Jordan. Data mining techniques were then used to analyze the collected data in order to determine which group of features is best for predicting s-commerce adoption among SNS users. The results showed that data characteristics related to gender, monthly income, civil status, number of connections, and prior online shopping experience are key factors in the classification process. The findings may assist researchers in investigating social commerce issues and aid practitioners in developing new s-commerce strategies.


Introduction
In today's rapidly evolving online world, the growth of social network sites is continually bringing new concepts to light. One of the main concepts that has emerged due to the existence of both Web 2.0 technologies and social media is social commerce (s-commerce). According to Zeng, Huang, and Dou (2009), online transactions (e-commerce) have developed into s-commerce due to the interactions and connections between people, especially on social networking sites (SNSs). S-commerce differs from e-commerce in that it is established on several kinds of social media platforms, including Facebook, Instagram, and Twitter. It uses social media to support commercial activities (Chen & Shen, 2015). S-commerce is often represented as a subarea derived from e-commerce (Hajli, 2013; Sturiale & Scuderi, 2013); Wu, Shen, and Chang (2015) defined it as "word-of-mouth applied to e-commerce." Although social commerce is a new trend, this phenomenon has developed quickly and remarkably (Barnes, 2014). Social platforms such as social networking sites have their own role in the advancement of s-commerce. A report conducted by Barclays (2012) shows that half of the U.K. consumer population will be part of the s-commerce phenomenon by 2021. A report conducted by We Are Social reveals that 2.80 billion users are active on various social network sites, amounting to 37% of the total global population in 2018 (Kemp, 2018). HnyB Insights (2014) published a report stating that the revenue of global s-commerce is growing year by year, predicting the market to reach US$80 billion by 2020. Popular social networking sites such as Facebook, Instagram, and Twitter have their own impact on s-commerce growth. Facebook is considered to have a noticeable share in s-commerce activities. In 2015, the number of purchases increased to 52%, in comparison with the previous year (McCarthy, 2015).
According to Chaykowski (2015), 50 million small companies have Facebook pages to communicate with their customers, and 4 million companies pay for advertisements on Facebook, demonstrating the platform's influence on electronic commerce in general and s-commerce in particular. Facebook's Instagram subsidiary also plays an important role in s-commerce; on that platform, 4.21% of users interact with popular commercial brands, a rate 58 times higher than brand engagement on Facebook (Mathison, 2018).
A growing body of literature recognizes the importance of s-commerce and SNS usage among people of different ages and social statuses. In this context, the greatest significance of s-commerce is its capacity to become the most universally adopted platform for electronic commerce worldwide (Zhang & Benyoucef, 2016). To date, however, there are few studies published in this field. Nevertheless, authors of studies such as the one conducted by Chen and Shen (2015) assert that research on s-commerce is increasing. To take advantage of s-commerce, Hennig-Thurau, Hofacker, and Bloching (2013) and Liébana-Cabanillas and Alonso-Dos-Santos (2017) believe that s-commerce consumer behavior should be studied. In the Hashemite Kingdom of Jordan, 67% of adults have internet access, and 90% of these users are active on SNSs; in comparison, rates in economically advanced countries such as the United States and the United Kingdom are lower: 71% and 66%, respectively (Poushter, 2016). Thus, this issue is considered vital in this setting and area, and it requires an extensive push for further study. In an attempt to achieve this goal, the key objective of this paper is to examine patterns in the existing data to predict the factors that may affect s-commerce adoption depending on personal and SNS usage-related features.
The paper is divided into five sections. In the introduction, we present the logic and main purpose of the study; in the second section, we review previous studies in this field. The third section consists of the research methodology, followed by a comparative analysis and its results in the fourth section. The paper ends with a summary of the study's findings and a discussion of possible future research.

Background
S-commerce, or social commerce, is defined as commerce activities mediated through social network sites. Jascanu, Jascanu, and Nicolau (2007) stated that through different SNS avenues such as reviews, ratings, chat rooms, locator applications (geo-tagging), ranking, and recommendations, customers have been enabled to share their information, experiences, and opinions related to different products and services. Yahoo! was the first to introduce the term "social commerce" in 2005 (Jascanu et al., 2007), and since that time, it has been described as a new and powerful phenomenon. This novel advancement in e-commerce makes consumers active participants in the online business process (Hajli, 2015). Statistics show that s-commerce has developed notably since its inception, allowing the creation of new business models based on online communities (Hajli, 2015). Social commerce has gained wide popularity since 2005. For example, a survey with 2 000 respondents conducted by Immediate Future (2010) indicated that 53% of U.K. consumers review products and services online. This survey also revealed that those reviews are 157% more influential than traditional advertising methods. A report conducted by McKinsey revealed that Chinese people spend 78 minutes per day in s-commerce activities (as cited in Chiu, Ip, & Silv, 2012), and around 50% of Chinese people buy products depending on the online recommendations of relatives and friends (Liu, Chu, Huang, & Chen, 2016).
S-commerce has benefits for both customers and firms because s-commerce platforms simplify both consumer-to-consumer (C2C) and business-to-consumer (B2C) connections. According to Xiang, Zheng, Lee, and Zhao (2016), consumers have the ability to interact with other consumers who can help them when choosing products and services. Moreover, they can take advantage of ratings and reviews of products and sellers left by other online actors (Zheng, Zhu, & Lin, 2013). Based on these reviews and ratings, customers can evaluate a product's quality and decide whether they desire it. Additionally, they may learn more about the seller's credibility and the online shopping experience. Jascanu et al. (2007) state that social commerce gives consumers an opportunity to share information and express their opinions about purchasing products. Kietzmann, Hermkens, McCarthy, and Silvestre (2011) argue that social media influences a firm's reputation and impacts its transactions. As a result, firms should pay attention to social media and the emergence of new concepts such as s-commerce.
Several previous studies indicate that s-commerce has a positive influence on firms. Kim and Park (2013) state that 300 Korean s-commerce firms have gained approximately $300-$500 million in sales. Thus, firms can increase their profits by attracting buyers via positive recommendations (Curty & Zhang, 2011). To take advantage of this form of commerce, firms may use SNSs in an attempt to sell products, in addition to asking their consumers to comment and write about their experiences on social platforms in order to introduce their products and services. Via strategic use of s-commerce, companies can enhance their relationships with consumers and therefore increase both sales and brand loyalty (Hajli, 2014). Moreover, s-commerce provides businesses with a variety of ways to build customer relationships, reduce marketing expenses, and increase sales (Chen, Lu, & Wang, 2017). Dell and H&M are examples of major companies that use s-commerce to present their products (Hajli, 2015).
However, several factors influence the presence of customers in s-commerce. Wigand, Benjamin, and Birkland state that customers' needs to be independent, successful, and connected to others are key factors that motivate them to participate in s-commerce. Sharma and Crossler (2014) use both the trust theory and uses and gratifications theory to clarify consumers' intention to adopt s-commerce. They found that trust factors such as security, privacy, and information quality influence the intention to be engaged in s-commerce. Zhang, Gupta, and Zhao (2014) designed a model based on a stimulus-organism-response model to investigate the factors affecting participation in s-commerce. Technological environmental features such as interactivity, personalization, and sociability have an impact on the customer's experience regarding social support, social presence, and flow, which in turn influence the intention of s-commerce adoption. Thus, interactions between customers (i.e., expressing their opinions and sharing information) will increase trust and therefore increase social support.
Previous studies have established that gender, age, prior online shopping experiences, and income affect consumers' adoption of e-commerce and s-commerce. Rodgers and Harris (2003) studied the role of gender in e-commerce adoption, which was analyzed together with the role of prior experience of online shopping. Other studies such as those by Zhang, Benyoucef, and Zhao (2015); Rapp, Beitelspacher, Grewal, and Hughes (2013); Z. Li and C. Li (2014); and Chang, Yu, and Lu (2015) hold that consumer demographics such as age and income play a role in s-commerce adoption in addition to being factors that may help companies and industries leverage their marketing potential.

Research Methodology
This research applies data mining techniques and methods to generate new knowledge using data collected from a survey about social commerce. One of the aims of this research is to examine the capacity of data mining techniques to predict s-commerce usage among SNS users. The study uses a mixed-methods approach based on three steps: examining the datasets against the main classifiers, selecting features, and weighting the features using a decision tree. The implementation of this study depends on WEKA software, which offers several classification methods (Witten, Frank, Hall, & Pal, 2016).
The study survey contained general socio-demographic questions such as age, gender, income, and study and work field. The remaining sections were related to technology use such as type of smartphone used, internet access, experience in the internet field, and experience with s-commerce and e-commerce. In total, 49 features were extracted. The collected data were analyzed in order to discover any common features between heavy and non-heavy s-commerce users.
Data were classified depending on how many times per week the respondent logs on to social networks to make purchasing decisions about a new product or service. The responses are split into six categories: None, 0-2, 3-5, 6-8, 9-11, and more than 12 times per week. In this project, two data sets were generated from the same collected data. In the first data set (Dataset1), we gathered the 0-2 and None categories, classifying them as the Low class, whereas the rest of the categories (3-5, 6-8, 9-11, more than 12) were classified as High. After running an analysis, the accuracy results obtained from Dataset1 were 0.68. In an attempt to make a clear separation between the two classes (High and Low), and to get better accuracy results, we modified our classification from Dataset1 to generate Dataset2 as follows: we deleted the 3-5 category in order to clearly distinguish between the heavy s-commerce users' group and the low s-commerce users' group. Then we gathered the 0-2 and None categories and classified them as the Low class, whereas the 6-8, 9-11, and more than 12 categories were gathered under the High class (Table 1). The sample evaluated in Dataset1 contained 452 full responses. After modification (i.e., deleting the 3-5 category), 360 full responses remained in Dataset2; that is, 92 cases were deleted.
Our analysis proceeded along the following steps. First, we examined the two datasets against well-known classifiers, which we will explore in the next section. In the second step, the best subset of features was selected by applying an auto feature selection process. Finally, in the third step, we studied the weight of each feature by using the decision tree algorithm. We give more details about the analysis process in the following section.
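The relabeling described above can be sketched in a few lines of Python. This is only an illustration, not the study's WEKA workflow: the function names and the category strings are assumptions chosen for readability.

```python
# Survey answer categories (weekly SNS log-ons for purchase decisions), as listed in the text.
LOW = {"None", "0-2"}
HIGH_DATASET1 = {"3-5", "6-8", "9-11", "more than 12"}
HIGH_DATASET2 = {"6-8", "9-11", "more than 12"}


def label_dataset1(category):
    """Dataset1: every response is kept and mapped to Low or High."""
    if category in LOW:
        return "Low"
    if category in HIGH_DATASET1:
        return "High"
    raise ValueError(f"unknown category: {category!r}")


def label_dataset2(category):
    """Dataset2: the borderline 3-5 category is dropped (None is returned)."""
    if category == "3-5":
        return None
    if category in LOW:
        return "Low"
    if category in HIGH_DATASET2:
        return "High"
    raise ValueError(f"unknown category: {category!r}")


def build_dataset2(categories):
    """Return the Dataset2 labels, skipping the responses that were deleted."""
    labels = (label_dataset2(c) for c in categories)
    return [label for label in labels if label is not None]
```

Applied to the 452 full responses, a filter of this kind would leave the 360 cases of Dataset2, since the 92 responses in the 3-5 category are removed.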

Data Mining Methods
Data mining uncovers new significant connections, patterns, and orientations by analyzing the data through statistics, machine learning, artificial intelligence (AI), and data visualization techniques. This process efficiently points out implicit, previously obscure, and potentially useful information found in the collected data, thus enabling the discovery of predictive patterns, the creation and testing of hypotheses, and the production of insight-provoking visualizations (Han, Pei, & Kamber, 2011).

Classification Algorithms
In the first step of data analysis, we selected several classification algorithms that had the potential to yield good results. The well-known WEKA classifiers used were two decision tree algorithms (J48 and REPTree), the Naive Bayes (NB) Bayesian classifier, Multilayer Perceptron (MLP), and a Nearest Neighbor algorithm (k-NN).

Reduced Error Pruning Tree (REPTree)
A classifier used in dealing with noise in decision tree learning, the REPTree was introduced in the post-pruning process based on the ideas of Quinlan (1987) and Pagallo and Haussler (1990). Brunk and Pazzani (1991) and Widmer (1994) described the algorithmic process as follows: in the first step, the data are divided into two sets. The first is the growing set, which is used by the learning algorithm to generate an initial theory, whereas the second is the pruning set, which is used to prune that theory by deleting literals and clauses until any further deletion would decrease the predictive accuracy measured on the pruning set.

J48 Decision Tree
J48 is a decision tree stemming from the C4.5 decision tree algorithm (Kabakchieva, 2013). The C4.5 decision tree algorithm splits the data using information entropy (Quinlan, 1993). Al-Zoubi, Alqatawna, and Faris (2017) described the process, which is repeated on progressively smaller subsets, as follows: at each node, an attribute is selected from the collected data to split the instances into subsets according to the information gain criterion, so that the decision is based on the attribute with the highest gain.
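To make the splitting criterion concrete, the following minimal Python sketch computes the entropy of a set of class labels and the information gain of a nominal attribute. It is an illustration under simplifying assumptions (nominal attributes only, rows stored as dictionaries, hypothetical function names) and is not WEKA's J48 code; C4.5 additionally normalizes the gain into a gain ratio, which is omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one nominal attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain for the current node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

A tree learner would call `best_attribute` at each node and then recurse on the resulting subsets until the labels in a subset are pure or no attributes remain.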

Naive Bayes (NB)
The Naive Bayes (NB) is a probability theory-based algorithm in which each feature makes an independent contribution to the probability of the output class (Al-Zoubi et al., 2017). In this process, no explicit classifiers are used. Despite its simplicity, Han et al. (2011) defended the algorithm's comparable accuracy in practical applications.
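The independence assumption can be illustrated with a small Python sketch for nominal features. It is a hedged example, not the WEKA implementation used in the study: the function names are hypothetical, and the add-one (Laplace) smoothing is an assumption introduced here to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for each feature."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature) -> counts of observed values
    feature_values = defaultdict(set)    # feature -> set of distinct values seen
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior score for one instance."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for feature, value in row.items():
            seen = value_counts[(label, feature)][value]
            # Each feature contributes independently (add-one smoothing is an assumption).
            score += math.log((seen + 1) / (count + len(feature_values[feature])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working with log probabilities keeps the product of many small per-feature terms numerically stable, which matters when the feature set is as large as the 49 features collected here.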

Multilayer Perceptron (MLP)
Multilayer Perceptron (MLP) is one of the more commonly used artificial neural networks; it uses one or multiple hidden layers to expand the network's ability to model complex functions (Paola & Schowengerdt, 1995). Al-Zoubi et al. (2017) described its function as an information processing system consisting of several layers used to map the input data into fitting sets of outputs. In the MLP, the input layer is passive, whereas both the hidden and output layers process the data actively (Zare, Pourghasemi, Vafakhah, & Pradhan, 2013).

k-Nearest Neighbor (k-NN)
The k-Nearest Neighbor algorithm (k-NN) is a classifier that labels an instance according to the closest training examples in feature space; it is based on instance-based learning, or lazy learning (Aha, Kibler, & Albert, 2013). The closest instances are determined by using distance measurements and functions such as Euclidean, Minkowski, and Minimax (Al-Zoubi et al., 2017). Kabakchieva (2013) described it as one of the most average machine learning algorithms because its classification process is established on a majority vote from the object's neighbors, thus assigning it to the most common class amid the k-nearest neighbors.
We examined Dataset1 and Dataset2 against the classifiers J48, REPTree, NB, MLP, and k-NN. As explained previously, Dataset1 contains all the categories, with (0-2 and None) classified as Low and (3-5, 6-8, 9-11, and more than 12) classified as High. Table 3 shows the best accuracy rate for Dataset1 as 60.84%, achieved by the REPTree classifier, followed by the NB and J48 with 59.96% and 57.96%, respectively. MLP and k-NN demonstrated the worst accuracy rates, with 54.87% for each. The best recall and F-Measure were also obtained by REPTree, whereas the second-best recall and F-Measure were achieved by NB, with 62.2% and 63.1%, respectively. Moreover, the classifier with the highest precision was the NB, with a 64% rate, and the second-highest result was achieved by REPTree.
For Dataset2 (Table 4), removing the 3-5 category from the main class improved the results in all measures. The best classifier for sorting the High and Low users was REPTree, with a 66.66% accuracy rate. The second-best result was obtained using the NB classifier. Similar to Dataset1, REPTree achieved the best results for F-Measure and recall measures, with 78.6% and 88.8%, respectively. As mentioned in Section 4.1, the data with the best results, Dataset2, will go through an auto feature selection process.
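For readers less familiar with the evaluation measures reported for the classifiers, the sketch below shows how accuracy, precision, recall, and F-Measure follow from the counts of correct and incorrect predictions for one class. It is a generic illustration with a hypothetical function name; WEKA may report per-class or weighted-average values, so the figures it prints need not coincide with this single-class calculation.

```python
def classification_metrics(actual, predicted, positive="High"):
    """Accuracy, precision, recall, and F-Measure for one class of a binary task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-Measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

A high recall combined with a lower precision for the High class would mean that most heavy users are found, at the cost of some Low users being labeled High.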

Feature Selection (FS)
Feature selection (FS) can be applied to increase the accuracy rate of the classification (Al-Zoubi et al., 2017). In the machine learning context, the FS method reduces the dimensionality of data without losing any information. This technique is used in the processing phase of the methodology to try to select the best subset of features and remove unneeded or irrelevant features (Faris, Ala'M, & Aljarah, 2017; Faris et al., 2018), thus improving the results.
As shown in Table 5, all measures achieve a better result than the previous experiments did. The best accuracy was obtained by the J48 and NB, with a nearly 2.5% increase over all feature data in Table 3. In addition, the best recall and F-Measure, achieved by REPTree, also show a notable increase of 5.6% and 2.1%, respectively. The classifier with the highest precision was the NB, with 74.5%.

Decision Tree
A decision tree is a tree-form graph of the possible consequences and outcomes of several choices. The tree helps to weigh these possible choices based on their probabilities, cost, and benefits. To be more specific, it is a decision support tool that can be used to map an algorithm in order to mathematically predict the most suitable choice (Safavian & Landgrebe, 1990). In our case, the next step in analysis was to run the selected features on the decision tree classifier in order to analyze and weigh the features. The outcomes are illustrated in Figure 2. A summary of the rules that can be concluded from the J48 Tree Model Dataset2 in Figure 2 will be highlighted in the following section.

Discussion
As several reports have suggested, s-commerce is characterized by two facets: social sharing and social shopping (Chen & Shen, 2015). Arising from the feature selection technique, the number of users' connections on SNSs (F37) is among the best selected subset features that could influence the users' adoption of social commerce services. This finding validates the assertion that social shopping is affected by a person's relationship with the surrounding community and other consumers, especially when consumers' acquaintances take part in shopping decisions and other social practices that take place during the process (Chen & Shen, 2015). This study supports evidence from previous observations (e.g., Chen & Shen, 2015; Hajli, 2013) that demonstrated that both informational support and emotional support from a user's connections on social networks can have an effect on his or her online shopping behavior.
Notably, the results confirm the relationship between gender (F1) and civil status (F3) and their association with intensive use of s-commerce. This finding supports evidence from Awad and Ragowsky's (2008) previous observations that women place greater value on online comments from other consumers, which confirms the role of gender in online shopping. Moreover, many researchers, including Campbell (2000) and Dittmar, Long, and Meek (2004), have noted that female users have a more positive attitude regarding online shopping compared to male users.
This study's findings also indicate that users with good prior experience of buying online from different websites (F49) will be strong candidates to be heavy s-commerce users. In addition, our findings support the idea that consumers who had a long and positive previous experience with shopping online using traditional websites such as Amazon and eBay will have a high capacity to use social networks to seek information for products and make an actual purchase decision. In other words, the users who have these characteristics will be strong social commerce users. Although immediate conclusions are available, these findings need further analysis.
As shown in Figure 2, the starting point (first branch) of all decision tree rules is F49, which stands for prior online shopping experiences. As seen in Table 6, nine rules were extracted from the decision tree, of which three rules (R4, R5, and R8) are labeled as High. Thus, these are a significant group of factors that may predict heavy s-commerce use. In the following cases, each rule's effectiveness is estimated by the number of cases that fulfill its factors. The rules can be illustrated as follows:

Algorithm mapping-Decision Tree Rules
⎯ R4 (H1), the first of the High class rules, develops as follows: if F49 (representing prior experience in online shopping) is higher than 0, that is, 1 (according to the key in Table 1, 1 means yes), and if F1 (representing gender) is higher than 1 (meaning it is 2, which according to the key in Table 1 means male), and F3 is higher than 1 (meaning it is 2, which according to Table 1 means single), and F37 is higher than 7 (thus, according to Table 1, the number of connections on SNSs exceeds 2 000), then the prediction of being a heavy social commerce user is high; three cases fulfill all the above factors. In brief, R4 (H1) predicts that if users have prior online shopping experience and are single males who have more than 2 000 connections on SNSs, they are likely to be heavy social commerce users; three cases reach the end of the branch. The same analysis is applied to all the rules whether the class is High or Low.
⎯ The second High rule R5 (H2) predicts that users with prior online shopping experience who are females and have more than 500 SNS connections are likely to be heavy social commerce users, with an accuracy of 26:8 cases (out of 34 cases, 26 cases fulfill all the above factors and 8 do not).
⎯ The last High-labeled rule R8 (H3) predicts that users with prior online shopping experience who are single females with 500 or fewer SNS connections and an income between 300 and 1 000 JDs monthly are likely to be heavy social commerce users, with an accuracy of five cases fulfilling the rule.
⎯ As shown previously, the Low class rules can be analyzed in the same manner.
The most important finding that we may extract from Figure 2 is that individuals with no prior experience in online shopping are less likely to be heavy social commerce users, with an accuracy of 180:37 cases (Rule 1).
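The rules discussed above can be written as a small decision function. The Python sketch below is an illustration only: the field names and the returned rule identifiers are hypothetical, the income bounds are assumed to be inclusive, and only the four rules spelled out in the text (R1, R4, R5, and R8) are encoded, so the remaining rules of Table 6 fall through to an undetermined result. Reading the 180:37 notation of Rule 1 the same way as the 26:8 notation of R5 is also an assumption.

```python
def predict_usage(user):
    """Return (rule, predicted class) for the rules described in the text."""
    if not user["prior_online_shopping"]:
        return ("R1", "Low")  # no prior online shopping experience
    if user["gender"] == "male":
        if user["single"] and user["sns_connections"] > 2000:
            return ("R4", "High")
    elif user["gender"] == "female":
        if user["sns_connections"] > 500:
            return ("R5", "High")
        # Inclusive income bounds are an assumption; the text says "between".
        if user["single"] and 300 <= user["monthly_income_jd"] <= 1000:
            return ("R8", "High")
    return (None, None)  # covered by rules not spelled out in the text


def rule_accuracy(fulfilled, not_fulfilled):
    """Share of cases reaching a leaf that match the leaf's predicted class."""
    return fulfilled / (fulfilled + not_fulfilled)


if __name__ == "__main__":
    print(round(rule_accuracy(26, 8), 3))    # R5: 26 of 34 cases
    print(round(rule_accuracy(180, 37), 3))  # Rule 1: 180 of 217 cases
```

Under this reading, R5 is correct for roughly 76% of the cases that reach its leaf and Rule 1 for roughly 83%.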

Research Implications
The findings of this study have a number of important implications for future practice. In an immediate application, they can serve as guidelines for s-commerce community managers and firms planning to reap the benefits of investing in s-commerce. As the results show, factors such as gender, civil status, number of connections that users have on social networks, and prior experience in online shopping are among the most important components of s-commerce usage. These features can be rendered as base pillars for investors interested in s-commerce and its implementation in their businesses. As recommended by Chen and Shen (2015), businesses may consider new initiatives such as providing innovative one-click sharing functions on SNS platforms and encouraging the sharing of high-quality content through monetary or virtual rewards. Businesses may also use the features uncovered by this study to target s-commerce campaigns by using popular topics and expert contributors to reach out to potential consumers in a smart and innovative way. It is important to remember, however, that s-commerce facilitation is based on relational factors. Through the presented findings, companies could find new ways to develop and sustain mutually beneficial and long-term relationships with s-commerce users.
This study has examined consumers' use of social commerce by analyzing different user features. A total of 49 features were extracted using data mining, which is a long-established approach for determining the significance of these features. A number of auto-selected features have been found to have a direct influence on s-commerce usage. Academic attention to these features, as well as investigations of other potentially influential features and their role in online shopping, will form a more comprehensive picture of s-commerce and further improve our current understanding of this emerging phenomenon. This research has uncovered many questions in need of further investigation. Typically, data mining aligns well with unstructured data. The data set analyzed in this research project was collected using an online survey, limiting the number of individual features that could be surveyed, collected, and then analyzed. More extensive work on the socio-demographic characteristics of online users and s-commerce participants is therefore required.
The findings of this research can be a milestone in shedding further light upon s-commerce data mining project implementations. Future research could include additional possible changes in the data set, new settings for the classification algorithms' parameters, and so on in an attempt to achieve highly accurate results and extract additional important information from the available data. To the best of our knowledge, this implementation project is one of the pioneer studies using data mining techniques to examine data sets established based on an online survey to analyze human behavior in relation to specific technological use.
The key contribution of this study is paving the way for academics and researchers to use data mining in social science studies. It attempts to analyze the performance of classification algorithms used in data mining based on a data set collected via online survey to evaluate their utility in fulfilling the research project's goals and objectives. Further studies using this methodology should analyze the sufficiency of available data in producing reliable predictions, thus looking at any required changes during the data collection phase or any additional ways to improve the process.
Finally, more research and studies on the use of the data mining approach in social studies would help to establish a greater degree of accuracy in this realm. Combining the auto feature selection technique with a decision tree is strongly recommended as a means to explain feature selection results and, consequently, to evaluate users' behavior and technology adoption process. This type of research could help provide business managers and academics with strategic and academic recommendations.
mas.ccsenet.org Modern Applied Science Vol. 12, No. 8; 2018

Conclusion
The proliferation of social networks has unleashed myriad new opportunities as well as new obstacles for researchers and businesspeople alike (Chen & Shen, 2015). For example, SNS-based market predictions are now more accurate due to information exchange in social networks (Qiu, Rui, & Whinston, 2013). This research project is an initial attempt to explore individuals' social commerce usage intention using a data mining approach. This project's findings were obtained by applying data mining classification algorithms and an auto feature selection method to the collected data because each classifier of the five under consideration proceeds differently. We have noticed that the data related to gender, civil status, monthly income, number of connections, and prior online shopping experience are among the major factors influencing the classification process.
In the applied realm, these findings will assist practitioners in improving their future social commerce strategies. Meanwhile, these findings can also guide researchers' investigations of social commerce issues. We believe this project to be a primary step toward better applying data mining in social science studies because the methodology employed can be further used in laying the first brick for additional social science data mining implementation projects.