Customer Clustering Using a Combination of Fuzzy C-Means and Genetic Algorithms

This study combines fuzzy c-means clustering with genetic algorithms to cluster the customers of the steel industry. The customers were divided into two clusters using the variables of the LRFM (length, recency, frequency, monetary value) model. Customers in the first cluster had a higher length of relationship, recency of trade, and frequency of trade but a lower monetary value than the averages of these criteria across all customers. Customers in the second cluster had a higher recency of trade and monetary value but a lower length of relationship and frequency of trade than these averages. The combined algorithm (i.e., fuzzy c-means clustering with a genetic algorithm) also achieved a lower mean squared error (MSE) than fuzzy c-means clustering alone.


Introduction
Identifying customers and their needs helps service providers and producers gain a competitive advantage in delivering their products and services. Managers should prioritize their customers, focus on key customers, and develop a clearer sense of the costs associated with losing customers on a daily basis. This matters because when customers stop doing business with a firm and move to its competitors, the firm can expect negative consequences such as lost current revenue from the discontinued relationship and damage to its reputation and credibility (Tarokh & Sharifian, 2010). These customers will probably share their negative experiences with prospective customers, and this loss of credibility can erode current and potential customers' trust in the firm's products and services. Experts therefore advise firms to design appropriate policies and competitive strategies, since organizations cannot disregard fundamental goals such as achieving competitive advantage. Identifying different groups of customers and their needs can increase customer satisfaction, which in turn contributes to customer loyalty. In the long term, identifying and retaining key customers is more beneficial than acquiring new customers to replace those who have stopped doing business with the organization, mainly because acquiring a new key customer costs five times as much as retaining a current one (Griffin & Lowenstein, 2001). Companies are also more likely to sell successfully to current customers than to prospective ones: the probability of selling a product or service to a current active customer is roughly 60-70 percent, compared with only 5-20 percent for a prospective customer (Tarokh & Sharifian, 2010).
Classification is one of the most important topics in customer relationship management (CRM). In customer classification, the entire customer population is divided into smaller groups such that customers in the same group have similar characteristics. Ideally, organizations would understand each of their customers well, but this is not feasible in practice. Customer classification and clustering let firms group similar customers together and help managers understand customers' needs, because it is much easier to identify and analyze the characteristics of groups of customers than to study each customer individually. Data mining offers various tools for customer classification. One of the best-known clustering techniques is fuzzy c-means clustering, which has difficulties with high-dimensional data sets and large numbers of prototypes (Winkler et al., 2012). In addition, the performance of the fuzzy c-means algorithm is strongly affected by the choice of the initial cluster centroids (Arnaldo & Bedregal, 2013). To obtain better clusters, this study therefore combines genetic algorithms with fuzzy c-means clustering. The main goal of this study is to use this combined method to divide the customers of the steel industry into two clusters.
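The combination described above can be sketched in Python with NumPy. The paper does not specify how the genetic algorithm and fuzzy c-means are coupled inside the software it used, so this sketch assumes one common scheme: a genetic algorithm searches over candidate centroid sets, scoring each by the fuzzy c-means objective, and the best set then seeds ordinary fuzzy c-means iterations. The function names, the tournament selection, arithmetic crossover, Gaussian mutation, and all parameter values are illustrative assumptions, not the study's settings.

```python
import numpy as np

def memberships(X, V, m):
    # u[i, k] = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)); distances clipped away from zero
    d = np.fmax(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2), 1e-12)
    return 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)

def objective(X, V, m):
    # Fuzzy c-means objective J_m evaluated at the optimal memberships for centroids V
    U = memberships(X, V, m)
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())

def ga_fcm(X, c, m=2.0, pop_size=30, generations=50, crossover_rate=0.8,
           mutation_rate=0.1, fcm_iter=100, seed=0):
    # Hypothetical GA + FCM hybrid: GA picks initial centroids, FCM refines them
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Each chromosome is a (c, n_features) matrix of candidate centroids
    pop = rng.uniform(lo, hi, size=(pop_size, c, X.shape[1]))
    for _ in range(generations):
        cost = np.array([objective(X, V, m) for V in pop])
        elite = pop[np.argmin(cost)].copy()
        # Binary tournament selection: the chromosome with the lower objective wins
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(cost[a] < cost[b], a, b)]
        children = parents.copy()
        # Arithmetic crossover between consecutive parent pairs
        for p in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                w = rng.random()
                children[p] = w * parents[p] + (1 - w) * parents[p + 1]
                children[p + 1] = (1 - w) * parents[p] + w * parents[p + 1]
        # Gaussian mutation, scaled to each feature's range
        mask = rng.random(children.shape) < mutation_rate
        children = children + mask * rng.normal(0.0, 0.1, children.shape) * (hi - lo)
        children[0] = elite  # elitism: keep the best centroid set found so far
        pop = children
    cost = np.array([objective(X, V, m) for V in pop])
    V = pop[np.argmin(cost)]
    # Refine the GA's best centroids with standard fuzzy c-means updates
    for _ in range(fcm_iter):
        W = memberships(X, V, m) ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
    return memberships(X, V, m), V
```

Because the genetic search starts from many random centroid sets rather than one, the result depends less on a single unlucky initialization, which is the weakness of fuzzy c-means noted above; the price is many extra objective evaluations, so a hybrid of this kind runs slower than plain fuzzy c-means.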

Research Question and Problem Statement
Effective relationship marketing is an art in today's business environment, and building relationships with clients is an essential skill for retaining key customers. Classical marketing theories focused mostly on transactions and did not treat customer retention as an important part of a firm's marketing strategy. However, as markets became more competitive and saturated, and as the population composition of different areas changed over time, companies realized that they no longer faced a growing economy with growing markets. To survive in these competitive markets, firms need to gain a competitive advantage (Amiri Aghdaie et al., 2012; Riasi, 2015a, 2015b). One of the most important determinants of competitive advantage is demand conditions (Porter, 1990, 1991), which means that firms should focus on retaining their customers to ensure that their business remains economically viable (Riasi & Amiri Aghdaie, 2013; Riasi & Pourmiri, 2015). Today every single customer has a special value, because firms are competing for a larger share of fixed or shrinking markets; as a result, the cost of acquiring new customers has increased significantly. Companies now realize that losing a customer does not simply mean failing to sell a single item to that customer at a specific time; it means losing all of that customer's future purchases over his or her entire lifetime (Kotler, 1994). Krasnikov et al. (2009) examined the impact of CRM on cost efficiency and profit efficiency and found that although CRM implementation leads to a decline in cost efficiency, it is associated with an increase in profit efficiency. The impact of CRM on firms' long-term profitability is very important, because the main objective of most firms is to maximize the value delivered to shareholders.
To maximize shareholder value, a firm needs to measure the value generated by different customers during a specific period and to identify the customers or groups of customers that contribute most to the firm's value. After identifying these customers, the firm should motivate them to establish long-term relationships. These actions enhance customer loyalty, optimize the customer life cycle, and ultimately make the firm more profitable. In addition, customer satisfaction and relationship commitment have a positive impact on a firm's brand loyalty and brand awareness (Kim et al., 2008). A firm's relationship with its customers therefore affects its prosperity and should be treated as a central issue when designing customer retention strategies. By dividing customers into clusters, firms can better decide how to allocate their limited resources to different customer groups based on their value; customer clustering also helps firms design effective retention strategies and maximize their overall profitability.
Organizations collect large amounts of data about their customers, suppliers, and business partners, but if they cannot discover the knowledge latent in these data, the whole data collection process is wasted. Many business owners want to extract this untapped knowledge from their data sets to increase profitability, and data mining techniques, including clustering methods, enable them to do so. Clustering divides customers into homogeneous clusters in which customers with similar needs and characteristics are grouped together (Ghazanfari et al., 2010). After identifying the needs and values of its customers, a company must provide valuable products and services to increase their satisfaction and keep them loyal to the firm (Moslehi et al., 2012). The main question, however, is: how can an organization identify its key clients and analyze their behavioral attributes?

Fuzzy C-Means Clustering
In classical (hard) clustering, each data point belongs to exactly one cluster and cannot be part of two or more clusters; in other words, the clusters do not overlap. If a data point has characteristics similar to two or more clusters, classical clustering must still decide which cluster suits it best and assign it to that cluster alone. The main difference in fuzzy clustering is that a data point can belong to more than one cluster, with a degree of membership in each. As in the classical c-means algorithm, the number of clusters c in the fuzzy c-means algorithm must be known beforehand. The objective function of this algorithm is:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2}$$

where $m$ is a real number larger than 1, $x_k$ is the $k$th data point, $v_i$ is the centroid of the $i$th cluster, $u_{ik}$ is the degree to which data point $k$ belongs to cluster $i$, and $\lVert x_k - v_i \rVert$ is the Euclidean distance between the $k$th data point and the $i$th cluster center. The values $u_{ik}$ form a matrix $U$ with $c$ rows and $n$ columns, each of whose elements lies between 0 and 1. If every element of $U$ is either 0 or 1, the algorithm reduces to classical c-means. Although the elements of $U$ can take any value between 0 and 1, the elements in each column must sum to 1:

$$\sum_{i=1}^{c} u_{ik} = 1, \qquad k = 1, \dots, n$$

That is, the degrees to which a data point belongs to the $c$ clusters add up to 1. Minimizing the objective function subject to this condition gives:

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}}, \qquad v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}$$

The algorithm alternates between these two updates until the memberships stop changing.
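The membership and centroid updates for fuzzy c-means can be turned into a short alternating-optimization loop. This is a minimal NumPy sketch, not the implementation used in the study; the function names, the random initialization of U, the convergence tolerance, and the default m = 2 are assumptions.

```python
import numpy as np

def fcm_memberships(X, V, m=2.0, eps=1e-12):
    """u[i, k] = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))."""
    # d[i, k]: Euclidean distance between data point x_k and centroid v_i
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    d = np.fmax(d, eps)  # guard against a point lying exactly on a centroid
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))  # shape (c, c, n)
    return 1.0 / ratio.sum(axis=1)

def fcm_centroids(X, U, m=2.0):
    """v_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True)

def fcm_objective(X, U, V, m=2.0):
    """J_m = sum_i sum_k u_ik^m * ||x_k - v_i||^2."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())

def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each column is normalized to sum to 1
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        V = fcm_centroids(X, U, m)        # centroid update
        U_new = fcm_memberships(X, V, m)  # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

Taking `U.argmax(axis=0)` assigns each point to its highest-membership cluster. As m approaches 1, the exponent 2/(m - 1) grows without bound, the memberships approach 0 or 1, and the method behaves like classical c-means.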

Data Analysis and Results
To perform the clustering, data on 120 customers were collected and normalized. The data included four variables: length of the relationship, recency of trade, frequency of trade, and monetary value. The fuzzy clustering was performed with the GA-Fuzzy Clustering software, which combines fuzzy c-means clustering with a genetic algorithm. The parameters of the two algorithms are displayed in Table 1. Running the software divided the customers into two clusters, and the cluster centers for each criterion are shown in Table 2. According to Table 2, customers in the first cluster had a higher length of relationship, recency of trade, and frequency of trade but a lower monetary value than the averages of these criteria for all customers, whereas customers in the second cluster had a higher recency of trade and monetary value but a lower length of relationship and frequency of trade than these averages.
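The normalization step and the comparison of cluster centers against the all-customer averages can be sketched as follows. The paper does not say which normalization was used, so min-max scaling to [0, 1] is an assumption, as are the function names; the customer records and cluster centers in the usage lines are made up and are not the values reported in Table 2.

```python
import numpy as np

LRFM = ("length", "recency", "frequency", "monetary")

def min_max_normalize(X):
    """Scale each column of X to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def compare(value, mean):
    return "higher" if value > mean else ("lower" if value < mean else "equal")

def profile_clusters(centers, X, names=LRFM):
    """Label each center coordinate relative to the mean over all customers in X."""
    mean = np.asarray(X, dtype=float).mean(axis=0)
    return [
        {name: compare(v, mu) for name, v, mu in zip(names, center, mean)}
        for center in np.asarray(centers, dtype=float)
    ]

# Usage with made-up data: rows are customers, columns are L, R, F, M
raw = [[10, 5, 20, 100], [2, 1, 4, 300], [6, 3, 12, 200]]
Xn = min_max_normalize(raw)
centers = [[0.8, 0.7, 0.9, 0.2], [0.1, 0.6, 0.2, 0.9]]  # hypothetical centers
for i, profile in enumerate(profile_clusters(centers, Xn), start=1):
    print(f"cluster {i}: {profile}")
```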
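www.ccsen Vol. 11, No. 7
In order to compare the combined algorithm (i.e., fuzzy c-means clustering and genetic algorithm) with fuzzy c-means clustering, the MSE and run time of both methods were examined. The combined algorithm had a lower MSE than fuzzy c-means clustering; however, it also had a higher run time, so it gained accuracy at the cost of speed.
This study used a combination of fuzzy c-means clustering and genetic algorithm for customer clustering. This combined algorithm was used because it had a higher efficiency and did not have the deficiencies associated with classical clustering algorithms.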
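The comparison relies on MSE and run time, but the paper does not define its MSE. The sketch below assumes one simple definition: the mean squared Euclidean distance from each customer to the centroid of its highest-membership cluster. Both that definition and the helper names are assumptions; a membership-weighted error would be an equally reasonable choice.

```python
import time
import numpy as np

def clustering_mse(X, V, U):
    """Mean squared distance from each point to the centroid of its
    highest-membership cluster (an assumed MSE definition).
    X: (n, d) data, V: (c, d) centroids, U: (c, n) memberships."""
    X, V, U = (np.asarray(a, dtype=float) for a in (X, V, U))
    labels = U.argmax(axis=0)
    return float(((X - V[labels]) ** 2).sum(axis=1).mean())

def timed(fn, *args, **kwargs):
    """Return fn's result together with its wall-clock run time in seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping each clustering routine in `timed` on the same normalized data, and then passing each routine's centroids and memberships to `clustering_mse`, yields the two quantities the comparison uses.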
The customers of the steel industry were divided into two clusters according to the variables of the LRFM model. Customers in the first cluster had a higher length of relationship, recency of trade, and frequency of trade but a lower monetary value than the averages of these criteria for all customers. In contrast, customers in the second cluster had a higher recency of trade and monetary value but a lower length of relationship and frequency of trade than these averages. Following the framework of Chang and Tsay (2004), customers in the first cluster can therefore be considered loyal customers according to the loyalty matrix and uncertain customers according to the value matrix, while customers in the second cluster can be considered newcomers according to the loyalty matrix and uncertain customers according to the value matrix. In addition, the combined clustering algorithm had a lower MSE but a higher run time than the fuzzy c-means algorithm. Because accuracy matters more than speed in customer clustering, the authors suggest that future studies use a combination of fuzzy c-means clustering and genetic algorithms to obtain the most accurate clusters. This study contributes to the literature by showing that combining fuzzy c-means clustering with genetic algorithms is an effective way to perform customer clustering in the steel industry.