Indoor Localization Based on Fingerprint Clustering

With the rapid development of the Internet and artificial intelligence, the demand for location-based services in indoor environments has grown rapidly. Researchers have proposed many indoor localization solutions based on different technologies. Fingerprint localization is a widely used indoor localization technique, but it still suffers from limited accuracy and high computational cost, which motivates continued research on improving it. This paper proposes an indoor localization system based on fingerprint clustering. The system consists of an offline phase and an online phase. In the offline phase, we collect received signal strength (RSS) measurements and preprocess them with a Gaussian model to build a fingerprint database. We then use the K-Means++ algorithm to cluster the fingerprints, grouping fingerprints with similar signal strengths into the same cluster. In the online phase, we first assign the measured RSS to a cluster and then use the weighted K-nearest neighbor (WKNN) algorithm to estimate the position within that cluster. The experimental results show that our method reduces both the localization error and the computational cost of the online phase, thereby improving the efficiency of real-time localization.


Introduction
With the rapid growth of data and multimedia services, the demand for localization and navigation keeps increasing. People are accustomed to real-time localization outdoors and expect the same service when they enter complex indoor environments. However, the Global Positioning System (GPS) (Ahn, Song, Sung, Kim, & Lee, 2010) cannot position reliably indoors, and indoor localization and navigation technology fills this gap. In environments such as airport halls, exhibition halls, warehouses, supermarkets, libraries, underground parking lots, and mines (Liu, Wang, Zhang, & Han, 2019), it is often necessary to determine the indoor location of mobile terminals or their holders, facilities, and objects.
Existing indoor localization methods include proximity methods, geometric methods, and fingerprint methods. The proximity method is a coarse localization method. In its simplest form, it takes the location of the AP with the highest signal strength as the result. This is the location of the currently connected Wi-Fi hotspot, as stored in a hotspot location database. Geometric methods include trilateration, triangulation, and hyperbolic localization. Trilateration first uses the time of arrival (ToA) (X. Zhu, W. Zhu, & Chen, 2018) to measure the distance from the receiving device to each signal source. It then computes the position of the receiving device from the known positions of the signal sources. Triangulation uses the angle of arrival (AoA) (Zheng, Sheng, Liu, & Li, 2018) to measure the angle between the receiving device and each signal source, from which the position of the receiving device is computed. Hyperbolic localization uses the time difference of arrival (TDoA) (Cho, Yeo, Choi, Park, & Lee, 2012; Zhao, Li, Hao, Wan, & Wang, 2019) to measure the differences between the distances from the receiving device to pairs of signal sources. Each distance difference defines a hyperbola, and the intersection of the hyperbolas is the location of the receiving device.
The fingerprint method (Poulose & Han, 2020) measures the signal characteristics at each position in advance and stores them in a fingerprint database. During localization, the currently measured signal characteristics are matched against the database to determine the location. Fingerprinting is currently a popular indoor localization approach. It is usually divided into an offline sampling phase and an online localization phase. Building on this framework, we aim to improve localization accuracy. In the offline phase, we preprocess the collected RSS signals with Gaussian filtering and then cluster the fingerprints with the K-Means++ algorithm.
In the online phase, we first classify the RSS signals measured at the target point and then apply the WKNN algorithm to compute its location. Our main contributions are as follows:
1) We use a Gaussian model to retain high-probability RSS values and remove singular values (outliers), which improves the stability of the data.
2) Clustering the fingerprints improves localization accuracy; with clustering, the mean localization error is about 2.24 m.
3) Our method reduces the computation time of the online localization phase and thus improves localization efficiency.

Method
Our method consists of two phases: an offline phase and an online phase. The offline phase collects and processes the data and clusters the fingerprint database. The online phase first determines the region (cluster) to which a measurement belongs and then localizes the target within that region.

Offline Phase
(1) Data collection
The received signal strength of an access point (AP) differs from one physical location to another, so the signal strengths observed at a point characterize that point. At each sampling point, we measure in advance the signal strengths of the APs deployed in the localization environment. These strengths serve as localization features and are mapped to the physical location of the sampling point, which yields a location fingerprint database. We therefore deploy APs and plan sampling points in the target site, and then collect RSS signals at each sampling point.

(2) Preprocessing
Because the indoor environment is complex and contains many obstacles, wireless signals undergo refraction, reflection, diffraction, and so on, resulting in multipath propagation. As a result, the RSS measured repeatedly at a fixed location fluctuates instead of remaining stable. These fluctuations make the training data of the offline phase inaccurate, which in turn degrades the accuracy of online localization. The data must therefore be filtered to reduce noise, suppress interference, and ensure data stability.
A common preprocessing method is mean filtering. It takes the mean of the RSS values measured at each sampling point over a period of time to reduce random errors caused by unstable factors. However, mean filtering does not eliminate singular values. Instead, it averages them into the RSS value, so the result cannot fully represent the signal characteristics of the location. We therefore use Gaussian filtering instead. Gaussian filtering uses a Gaussian model to discard low-probability signal values and retain the high-probability ones, and then takes the average of the retained values as the final RSS value. Since singular values are low-probability values, this removes them and reduces noise.
Suppose we deploy N APs and plan M sampling points. At each sampling point we sample for a period of time and then use a Gaussian model to retain the high-probability RSS values. The Gaussian function is
$$f(r) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(r-\mu)^2}{2\sigma^2}\right),$$
where r denotes an RSS value of one AP collected at the sampling point, and μ and σ² are the mean and variance of these values, respectively.
We keep the signals with a probability value greater than 0.6 and average them to obtain the fingerprint of the sampling point, which is stored in the fingerprint database. The fingerprint of the m-th sampling point is $\mathrm{RSS}_m = (\mathrm{RSS}_{m1}, \mathrm{RSS}_{m2}, \ldots, \mathrm{RSS}_{mN})$, where $\mathrm{RSS}_{mn}$ is the filtered RSS value of the n-th AP. Its label in the database is the coordinate $(x_m, y_m)$ of the sampling point, $1 \le m \le M$.
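The filtering step can be sketched in Python as below. This is a minimal sketch under one assumption. The text does not say how the "probability value" compared with 0.6 is normalized, and the raw Gaussian density can never exceed 0.6 once σ is larger than about 0.66 dB. Here the density is therefore rescaled to peak at 1, that is, exp(-(r-μ)²/(2σ²)). Under this reading, a sample is kept when it lies within about 1.01σ of the mean. The function name `gaussian_filter_rss` and the fallback behaviour are illustrative choices, not part of the original method.

```python
import math
import statistics


def gaussian_filter_rss(samples, threshold=0.6):
    """Gaussian-filter one AP's RSS samples (dBm) at one point and return their mean.

    Assumption: the "probability value" is the Gaussian density rescaled so its
    peak equals 1, i.e. exp(-(r - mu)^2 / (2 sigma^2)); samples above
    `threshold` are kept, the rest are treated as singular values.
    """
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    if sigma == 0:
        # All samples identical: nothing to reject.
        return mu
    kept = [
        r for r in samples
        if math.exp(-((r - mu) ** 2) / (2 * sigma ** 2)) > threshold
    ]
    # Fall back to the plain mean if the threshold rejects every sample.
    return statistics.fmean(kept) if kept else mu
```

Consider seven readings between -61 and -59 dBm plus one multipath outlier at -90 dBm, i.e. `[-60, -61, -59, -60, -61, -59, -60, -90]`. The plain mean is -63.75 dBm. The filtered value is -60.0 dBm because the outlier is rejected.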

(3) Fingerprint Clustering
Once the fingerprint database is constructed, it stores a one-to-one mapping between sampling points and their RSS fingerprints. A common approach is to locate the target directly with the KNN or WKNN algorithm over the whole database. Because RSS varies gradually in space, the RSS fingerprints of two adjacent points have a small Euclidean distance. KNN and WKNN select the sampling points whose RSS fingerprints are closest in Euclidean distance to the RSS fingerprint of the target, and they estimate the target location from these points. To make the selection of these sampling points more reliable, we cluster the fingerprints of the sampling points. Clustering narrows the range of candidate sampling points. This improves the accuracy of online localization, shortens the time needed for localization, and thus improves localization efficiency.
We use the K-Means++ clustering algorithm. We set the number of clusters to K and then select K initial cluster centers. The basic idea of the initialization is to keep the initial cluster centers as far apart as possible. First, one fingerprint is chosen at random from the M fingerprints as the center of the first cluster. Then, for each sample $\mathrm{RSS}_m$, we compute the shortest distance $D(\mathrm{RSS}_m)$ between it and the existing cluster centers. From this distance we compute the probability that the sample is selected as the next cluster center:
$$P(\mathrm{RSS}_m) = \frac{D(\mathrm{RSS}_m)^2}{\sum_{j=1}^{M} D(\mathrm{RSS}_j)^2}.$$
The next cluster center is then selected by the roulette-wheel method according to these probabilities. The selection is repeated until K cluster centers have been chosen. Next, each sample is assigned to the cluster whose center has the smallest Euclidean distance to it. After all samples have been assigned, the center of each cluster is recomputed. The assignment step is then repeated until the cluster assignments no longer change.
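The K-Means++ procedure above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the authors' code. Fingerprints are tuples of RSS values in dBm, and the roulette-wheel step uses `random.choices` with D(x)² weights. The function names and the `seed`/`max_iter` parameters are hypothetical. Assignments use the squared Euclidean distance, which yields the same nearest center as the Euclidean distance.

```python
import random

# Illustrative sketch; all function names here are hypothetical.


def sq_dist(a, b):
    """Squared Euclidean distance between two RSS fingerprints."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(p, centers):
    """Index of the closest center (squared distance gives the same argmin)."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans_pp_init(points, k, rng):
    """Pick k initial centers: the first at random, the rest by D(x)^2 roulette."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2 for every sample: squared distance to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All samples coincide with existing centers; pick uniformly.
            centers.append(rng.choice(points))
        else:
            # Roulette-wheel selection with probability D(x)^2 / sum of D^2.
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans_pp(points, k, seed=0, max_iter=100):
    """Cluster fingerprints; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(c) for c in kmeans_pp_init(points, k, rng)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # cluster assignments no longer change
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

In the setting of this paper, `points` would be the 702 filtered fingerprints and `k` would range over the tested values of K.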

Online Phase
We measure the RSS values of the N APs at the test point, $S = (s_1, s_2, \ldots, s_N)$. First, we classify the measured RSS data into a cluster $C_k$, where $1 \le k \le K$, which narrows the localization area. We then apply the WKNN algorithm. Assume that cluster $C_k$ contains T fingerprint samples. We compute the Euclidean distance between the RSS of the test point and each fingerprint in the cluster:
$$d_t = \sqrt{\sum_{n=1}^{N} (s_n - \mathrm{RSS}_{tn})^2},$$
where $\mathrm{RSS}_{tn}$ is the value of the n-th AP in the t-th fingerprint of $C_k$ and $1 \le t \le T$. Let $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ be the three sampling points in the cluster whose fingerprints are closest to the test point fingerprint, with distances $d_1$, $d_2$, and $d_3$, respectively. The weights are
$$w_i = \frac{1/d_i}{\sum_{j=1}^{3} 1/d_j},$$
where i = 1, 2, 3. The final localization result $(x^*, y^*)$ is
$$(x^*, y^*) = w_1 (x_1, y_1) + w_2 (x_2, y_2) + w_3 (x_3, y_3).$$
This approach of first clustering the fingerprint data and then localizing also suits complex indoor environments, especially when walls divide an area into several sub-areas. For example, in the environment shown in Figure 1, the walls affect signal propagation. Clustering the fingerprints into 4 categories before localization reduces the error and improves localization accuracy.
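The online phase described above, classification followed by WKNN over the three nearest fingerprints, can be sketched as follows. Two details are assumptions rather than statements from the text. First, a test vector is assigned to the cluster with the nearest center, which is the usual K-Means rule. Second, a small `eps` is added to each distance so that an exact fingerprint match does not cause division by zero. The names `classify`, `wknn`, and `locate` and the `(fingerprints, coords)` cluster layout are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance d_t between two RSS vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(rss, centers):
    """Assumed rule: assign the test RSS vector to the nearest cluster center."""
    return min(range(len(centers)), key=lambda k: euclidean(rss, centers[k]))


def wknn(rss, fingerprints, coords, n_neighbors=3, eps=1e-9):
    """Inverse-distance weighted average of the n nearest sampling points."""
    ranked = sorted(zip((euclidean(rss, f) for f in fingerprints), coords),
                    key=lambda pair: pair[0])
    nearest = ranked[:n_neighbors]
    inv = [1.0 / (d + eps) for d, _ in nearest]  # 1 / d_i (eps is an assumption)
    total = sum(inv)                             # normalizes weights to sum to 1
    x = sum(w * xy[0] for w, (_, xy) in zip(inv, nearest)) / total
    y = sum(w * xy[1] for w, (_, xy) in zip(inv, nearest)) / total
    return x, y


def locate(rss, centers, clusters):
    """clusters[k] holds (fingerprints, coords) of the sampling points in C_k."""
    k = classify(rss, centers)
    fingerprints, coords = clusters[k]
    return wknn(rss, fingerprints, coords)
```

Only the fingerprints of the selected cluster are searched, which is the source of the online speed-up. If a cluster holds fewer than three fingerprints, `ranked[:n_neighbors]` simply uses all of them.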

Experiment and Analysis
Our experimental site was an underground parking lot, in which we selected an area of 650 square meters and deployed 16 APs. We placed sampling points at 1 m intervals, for a total of 702 sampling points, and collected RSS signals for two minutes at each point. We then preprocessed the RSS signals of each sampling point. The Gaussian model retains the signal values with a probability greater than 0.6, and the average of these retained values is used as the fingerprint of the sampling point. Because the space is large, not every AP can be detected at every sampling point. We assign a value of -100 dBm to any AP that was not detected. This yields the fingerprint database.
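Building the database with the -100 dBm convention can be sketched as follows. The input layout (a list of `((x, y), {ap_id: filtered_rss})` pairs), the AP identifiers, and the function names are hypothetical. The only rules taken from the text are that every fingerprint lists the APs in the same order and that undetected APs get -100 dBm.

```python
MISSING_RSS = -100.0  # dBm assigned to an AP that was not detected at a point


def build_fingerprint(detected, ap_ids, missing=MISSING_RSS):
    """Order the filtered RSS values by AP and fill in undetected APs."""
    return [detected.get(ap, missing) for ap in ap_ids]


def build_database(samples, ap_ids):
    """samples: list of ((x, y), {ap_id: filtered RSS in dBm}) pairs."""
    fingerprints = [build_fingerprint(detected, ap_ids) for _, detected in samples]
    labels = [xy for xy, _ in samples]
    return fingerprints, labels
```

With 16 APs, each of the 702 fingerprints is a vector of length 16. Filling gaps with a very weak value keeps every vector the same length, so Euclidean distances between fingerprints remain defined.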
We cluster these fingerprints with the K-Means++ algorithm. To choose the number of clusters K, we tried K = 2, 3, 4, 5, and 6. We randomly selected 200 test points on the map, measured their RSS signals, and applied Gaussian filtering to obtain their averaged RSS values. Each test RSS vector is then classified into a cluster, and its location is estimated with the WKNN algorithm within that cluster. As shown in Figure 2, K = 4 gives the best result on our data. With K = 4, the mean localization error over the 200 test points is about 2.24 m.
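The reported 2.24 m is an average over the 200 test points. The text does not define the per-point error explicitly. Assuming it is the Euclidean distance between the estimated and true coordinates, the metric used to compare the candidate values of K can be computed as follows. The function name is hypothetical.

```python
import math


def mean_localization_error(estimates, truths):
    """Mean Euclidean distance (in meters) between estimated and true positions."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equally long, non-empty lists of positions")
    return sum(math.dist(e, t) for e, t in zip(estimates, truths)) / len(estimates)
```

Under this assumption, the value of K is chosen by computing this metric for each K in {2, 3, 4, 5, 6} and keeping the smallest.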

Figure 2.
In addition, we compare localization with and without clustering. For the same 200 test points, we also apply the WKNN algorithm directly to the whole fingerprint database. Figure 2 shows that the localization error without clustering is significantly higher than that of our method, which demonstrates the effectiveness of our method. Moreover, in the online phase the RSS signals of the test points are first classified, so WKNN only searches the fingerprints of one cluster. This reduces its computational cost and improves the efficiency of real-time localization.
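A rough count of distance computations per query illustrates the saving. This is a back-of-the-envelope estimate, not a measurement from the experiment. It assumes that the 702 fingerprints are split evenly among the K = 4 clusters, which K-Means++ does not guarantee. It also counts one distance per cluster center for the classification step.

$$
\text{without clustering: } M = 702,
\qquad
\text{with clustering: } K + \frac{M}{K} = 4 + \frac{702}{4} = 179.5 \approx 0.26\,M
$$

Under this assumption, each query needs roughly a quarter of the distance computations, about 3.9 times fewer.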

Conclusion
Traditional fingerprint localization collects fingerprints in the offline phase and locates targets in the online phase. Building on this approach, we pursue two goals: reducing the localization error and improving localization efficiency. First, we preprocess the collected RSS signals. A Gaussian model is used to compute the probability of each RSS value, the values with probability greater than 0.6 are retained, and these values are averaged, which eliminates singular values. We then use the K-Means++ algorithm to cluster the fingerprints, grouping similar RSS fingerprints into the same cluster. Compared with standard K-Means, K-Means++ mitigates the problem of randomly chosen initial centers, which improves clustering quality and speeds up convergence. Our experimental results show that our method both improves localization accuracy and reduces localization time, thereby improving localization efficiency.