Modelling Malicious Attack in Social Networks

Online Social Networks (OSNs) are based on actual trust relationships in environments which help people communicate with friends, family and acquaintances. Malicious individuals take advantage of this trust relationship to propagate malware through social networks. We study the dynamics of malware propagation among OSN users. Social networks users are referred to as nodes which is in two compartments: Healthy (H), or Infected (I). A H node could either be susceptible to infection (S) or removed (R). Simulations were carried out in R using the EpiModel network simulation package. Two networks were simulated thrice with different parameters to give better average values. Two categories of nodes, first category comprises of 3000 nodes with fewer connections and the second category comprising of 7000 nodes are the influential nodes with more connections. The larger network tends to have a higher fraction of nodes getting infected per unit time due to the high level of connectivity, as opposed to the small network where the number of connections is few. However, the infection tends to persist in the network as long as the birth rate is not equal to zero.


Background
A social network consists of individuals called nodes, connected by interdependencies such as friendship, kinship, common interest, dislike, etc. The structure of these networks are often represented using graphs (Kosorukoff, 2011). According to Faghani et al. (2012), there are three major types of OSN malware -Cross Site Scripting (XSS) worms, Trojans and ClickJacking worms. An XSS worm self-propagates among visitors of a website with the aim of progressively infecting other visitors. Trojans like Koobface use social engineering and spread themselves in social networks by broadcasting messages with fascinating topics.

Importance of Modeling Malicious Attacks in OSNs
Social network analysis has proved useful in epidemiology to help understand how patterns of human contact or aid inhibit the spread of infectious diseases in a population. A susceptible host is healthy but prone to infection. An infectious host had been infected and can infect other susceptible host. In cases where recovery from the infection confers immunity, hosts that recover are said to be removed, since they would not be susceptible to infection after recovery (Aderinola, Thompson, & Alese, 2017).
The dynamics of social networks can sometimes be modelled by using agent based models, which provide insight into the interplay between communication rules, rumour spreading and social structure. Social network analysis could also be used as a tool for mass surveillance (Kosorukoff, 2011).
Malware propagation in OSNs has become a major security threat to social networks users. More so, OSN malware propagate faster than traditional malware (Faghani et al., 2012).
Apart from being a platform for malware propagation, social networks have been disparaged for breaching the privacy of their users. This is such that Employees have been hired or fired, and University applicants have been screened based on their behaviour on social networks. There are various stories of user privacy breaches, with unfortunate consequences (Mahmood, 2013). Malicious hackers often plan and strategize for cyber-attacks on social networks.
Analysts scrutinising these discussions can warn system administrators about attacker capabilities and intentions. This kind of analysis is often done manually. With recent machine learning technologies, Lincoln Laboratory has demonstrated the possibility of automatically detecting such malicious discussions from various OSNs (Lippmann et al., 2016). Lloyd, Valeika, and Cintr (2005) discussed the use of network models to describe the impact of local spatial structure on infection spread. They provided a detailed discussion on the implications of spatial structure on the dynamics of diseases infections in small-world networks. However, heterogeneity, which is an important aspect of the structure of many populations, was merely mentioned, but was not discussed into detail. Tao et al., (2006) presented a detailed review of classical epidemic models and some interesting research problems in epidemiology. However, most studies on malware propagation in scale-free networks, small-world networks and online networks are often analysed with realistic network topologies, but with simple assumptions on user activities.

Related Work
Yan, et al. (2011) studied the impact of initial infection, user activity, social structures on malware propagation by using trace-driven simulation. However, the work assumed that each user becomes active only after she performs an activity on the social network, and that each user has only one account on the network. Ikhalia and Johnnes (2014) investigated common approaches to malware propagation on online social networks and provided insight to the operations and various types of malware that are proliferated in online social networks.
Xiao, Freeman, and Hwa (2015) described a scalable approach to finding groups of fake accounts registered by the same malicious user. They presented a scalable and time-sensitive machine learning approach to finding groups of fake accounts registered by the same actor.
Wen (2014) presented a Susceptible-Infectious-Immunized (SII). The model presented, however, assumed that online network users check messages at regular periods, which may not be true in the real scenario. This study will model and simulate malware propagation in online social networks using the Susceptible-Infected-Immunized model.
The rest of this paper is organized as follows. In Section 2 the details of the network structure, malware propagation model and simulation are discussed. The results of the simulation and their interpretation are discussed in Section 3.

Network Structure
Social networks are based on relationships between users of the network, referred to as nodes. The links between them are referred to edges. The social network will be modelled as an undirected graph. If two nodes are connected, information can flow both ways. The topology of a social network with N nodes can be represented as an N by N adjacency matrix: where represents the probability of contact between nodes and . When = 0, the nodes are not connected. When = 1, the nodes are connected, that is, they are "friends". Therefore, the node degree is the number of friends a user has. It is assumed that the nodes' degrees of users in the network exhibit the power-law distribution, which is described as where is the probability that a node connects to k nodes.

Malware Propagation
In a social network, the propagation of malware depends on the behaviour of users. The primary human behaviours affecting propagation are the message checking time of each user and the probability of clicking a malicious link.
A user is infectious only at the active state. The infection relies on contact between an infected and a susceptible node, with the susceptible node receiving and checking a malicious message from the infected node. The infection probability will be represented by , . A node can move into the removed state from any of the other states with a recovery probability . It is assumed that once the user recovers, he is aware of such kinds of malicious links and cannot be re-infected. The classical SIR epidemic model is adapted to model the dynamics of the network: where b is the birth rate, μ are the mortality rates specific for each compartment, and ν is the recovery rate.

Network Simulation
The model simulation depends on three parameters. The contact rate between within a connection in each time unit. The infection probability is the risk of transmission given contact with an infected person. In an SIR model, the recovery rate is the reciprocal of the average duration of disease; likewise, the reciprocal of the death rates is the average life expectancy for persons in those compartments. However, in the case of the network being modeled, it can be assumed that there are no deaths in the network, but there are births. In other words, users are not removed, but more users can join the network.
The SIR model then becomes: Simulations were carried out in R using the EpiModel network simulation package (Jenness, Goodreau, & Morris, 2018). Two networks were simulated with different parameters. Each simulation was carried out three times with the same parameters to give better average values.

Results
A total of 5 simulations were run, with T = 500. The output is shown in Figure 1. The plot displays the prevalence of the compartments in the model across the 3 simulations. The individual simulations are represented by the thin lines, the means across simulations at each time step are plotted with thicker lines, and the polygon band shows the inter-quartile range across simulations.
It can be seen from Figure 2 that given similar parameters and fraction of infected nodes, the infection trends are expected to follow the same pattern. Figure 3 shows the flows between the different compartments in the model across the 3 simulations: the births (b.flow), susceptible-infected (si.flow), and infected-recovered (ir.flow). The larger network would have a higher birth rate. In other words, a large social network will have a higher number of users joining than a small network. As shown by the SI flows, the larger network tends to have a higher fraction of nodes getting infected per unit time due to the high level of connectivity, as opposed to the small network where the number of connections is few. However, the infection tends to persist in the network as long as the birth rate is not equal to zero. As long as new users keep joining the social network, the infections will persist, since every new user is susceptible, and will eventually be infected with a probability that depends on the user's awareness of such infection and the user activities. Figure 3 shows the flows between the different compartments in the model across the 3 simulations: the births (b.flow), susceptible-infected (si.flow), and infected-recovered (ir.flow). The larger network would have a higher birth rate. In other words, a large social network will have a higher number of users joining than a small network. As shown by the SI flows, the larger network tends to have a higher fraction of nodes getting infected per unit time due to the high level of connectivity, as opposed to the small network where the number of connections is few. However, the infection tends to persist in the network as long as the birth rate is not equal to zero. As long as new users keep joining the social network, the infections will persist, since every new user is susceptible, and will eventually be infected with a probability that depends on the user's awareness of such infection and the user activities.
(N = 100, N f = 404; T = 300, I 0 = 10; b = 0.005) The degree of nodes also affects the infection prevalence in the two categories as shown in Figure 5. It can be observed that in both simulations, the infection is more prevalent in the second category of nodes (category 1), since they have higher degrees. Hence, social network users that have more friends are likely to receive more messages containing malicious links and possibly forward same to their friends.