Discovery of Similarity and Dissimilarity

The aim of this paper is to introduce some applications of similarity and dissimilarity. Simplified diagrams and tables are used to present the computation of similarities and dissimilarities, which keeps the process easy to follow and to organize, and we compute the topologies induced by the similarity and by the dissimilarity. For information systems whose values are numeric, a method of classification is suggested. The method is based on constructing a neighborhood relation on the universe; the resulting classification is not, in general, a partition of the universe.


Introduction
Similarity is necessary for knowledge discovery. Granulation, classification, and cluster analysis each involve some notion or definition of similarity. The similarity measure is selected on the basis of the domain and distribution of the data, and some similarity metrics may be considered more useful than others even within a specific domain. There is some uncertainty in the quantitative measurement of similarity between records of mixed data. This uncertainty comes from the lack of scale in nominal and ordinal data. Rough set theory is one of the tools developed to handle uncertainty. Rough sets may be used in dissimilarity analysis of qualitatively collected data. It therefore seems that rough sets can be used to measure similarity between records that contain both quantitative and qualitative data, for the purpose of clustering the records (Han and Kamber, 2001; Lin et al., 2002; Pawlak, 1982; Pawlak, 2002; Zhu, 2002).
Knowledge can be expressed mathematically, whether it is quantitative or qualitative.

Data types
1) Quantitative data is information about quantities, that is, information that can be measured and written using numbers.

For example:
-Students' grades in school subjects.
-Body temperature of patients in a hospital.
2) Qualitative data can be described as ordinal or nominal. Nominal data has neither order nor scale. Ordinal data has order without scale.

For example:
-Colors.

Similarity and
Each record is represented by a node, and a labeled edge joins two nodes if deleting a single attribute would place the corresponding records in the same equivalence class; the edge is labeled with that attribute. For example, there is an edge between c2 and c4 with the label a5. The result is shown in the next figure. The dissimilarity between two records is the length of the shortest path between the corresponding nodes in the graph. For example, the dissimilarity between c3 and c4 would be 2.
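The graph construction and the shortest-path dissimilarity described above can be sketched in Python. This is only an illustrative sketch: the records and their values are hypothetical, the function names `build_graph` and `dissimilarity` are introduced here, and the edge rule is read as "two records differ in exactly one attribute".

```python
from collections import deque
from itertools import combinations

# Hypothetical records: each maps an attribute name to a qualitative value.
records = {
    "c1": {"a1": "red", "a2": "small", "a3": "yes"},
    "c2": {"a1": "red", "a2": "small", "a3": "no"},
    "c3": {"a1": "red", "a2": "large", "a3": "no"},
    "c4": {"a1": "blue", "a2": "large", "a3": "no"},
}

def build_graph(records):
    """Join two records when they differ in exactly one attribute.

    Deleting that attribute would make the two records indiscernible,
    so the edge is labeled with it (assumed reading of the edge rule).
    """
    graph = {name: {} for name in records}
    for u, v in combinations(records, 2):
        differing = [a for a in records[u] if records[u][a] != records[v][a]]
        if len(differing) == 1:
            graph[u][v] = differing[0]
            graph[v][u] = differing[0]
    return graph

def dissimilarity(graph, source, target):
    """Shortest path length by breadth-first search; None if unreachable."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return None

graph = build_graph(records)
print(graph["c1"])                       # {'c2': 'a3'}
print(dissimilarity(graph, "c1", "c3"))  # 2
```

Breadth-first search is sufficient here because every edge counts as one step; records in different connected components have no finite dissimilarity, which the sketch reports as `None`.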

Definition 1.1
The similarity between $c_i$ and $c_j$ is computed as $\left| \dfrac{D_{\max} - D_{ij}}{D_{\max}} \right|$, where $D_{\max}$ is the maximum dissimilarity over all pairs and $D_{ij}$ is the dissimilarity between $c_i$ and $c_j$. In the previous example, $D_{\max}$ is 2.
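As a small worked illustration of Definition 1.1, the following Python sketch converts a matrix of dissimilarities into similarities. The input matrix is hypothetical (three records with $D_{\max} = 2$), and the function name `similarity_matrix` is introduced here for illustration.

```python
def similarity_matrix(dissim):
    """Apply |(D_max - D_ij) / D_max| to every entry of a square matrix."""
    d_max = max(max(row) for row in dissim)
    if d_max == 0:
        raise ValueError("all dissimilarities are zero, so D_max is zero")
    return [[abs((d_max - d) / d_max) for d in row] for row in dissim]

# Hypothetical shortest-path dissimilarities for three records (D_max = 2).
dissim = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
sim = similarity_matrix(dissim)
print(sim)  # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
```

A record is fully similar to itself (value 1), and the pair at maximum dissimilarity gets similarity 0.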
The dissimilarity is computed by means of discernibility matrices.

Definition 1.2
An information system $S$ defines a matrix $M_A$, which is called the discernibility matrix. Each entry $M_A(x, y) \subseteq A$ consists of the set of attributes that can be used to discern between the objects $x, y \in U$; $M_A$ is a $|U| \times |U|$ matrix.

Let $U = \{c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8\}$ be a set of mobile devices and let $A = \{a_1, a_2, a_3\}$ be the screen size, weight and camera accuracy, as in Table 1. The similarity matrix of Table 1.6 is computed in Table 1.1.12 as follows, where $x_{11} = x_{13} = x_{17} = |5 - 5| = 0$ and $x_{12} = x_{14} = x_{15} = x_{16} = x_{18} = |8 - 13| = 5$. The similarities for Table 1.14 are computed with $D_{\max} = 3$.

An information system defines an information function $f: U \to V$, where $A$ is the set of attributes and $V$ is the domain of the attributes, whose values are real numbers. We define a relation $R_i$ for each attribute $i$ as follows: $x R_i y$ if $|i(x) - i(y)| < \lambda$, where $\lambda$ is determined by an expert in the field. For example, if the information comes from the medical field, the expert is a person who works in medicine and is familiar with the problem. Thus, for each attribute $i$ we get a classification $O / R_i$, where $O$ is a finite set, whose classes are $x R_i = \{ y : |i(x) - i(y)| < \lambda,\ x \in O \}$.
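The two constructions above, the discernibility matrix and the classes $xR_i$ obtained from a threshold $\lambda$, can be sketched as follows. The table values, the threshold and the function names are assumptions made for illustration; they are not the entries of the paper's tables.

```python
def discernibility_matrix(table, attributes):
    """M[x][y] is the set of attributes on which objects x and y differ."""
    objects = list(table)
    return {
        x: {y: {a for a in attributes if table[x][a] != table[y][a]}
            for y in objects}
        for x in objects
    }

def neighbourhoods(table, attribute, lam):
    """xR_i = {y : |i(x) - i(y)| < lam} for one numeric attribute i."""
    return {
        x: {y for y in table
            if abs(table[x][attribute] - table[y][attribute]) < lam}
        for x in table
    }

# Hypothetical numeric information system (illustrative values only).
table = {
    "c1": {"a1": 50, "a2": 130, "a3": 8},
    "c2": {"a1": 55, "a2": 150, "a3": 13},
    "c3": {"a1": 50, "a2": 130, "a3": 12},
    "c4": {"a1": 60, "a2": 170, "a3": 13},
}
attributes = ["a1", "a2", "a3"]

M = discernibility_matrix(table, attributes)
print(sorted(M["c1"]["c3"]))  # ['a3']

classes = neighbourhoods(table, "a1", lam=6)
print(sorted(classes["c1"]))  # ['c1', 'c2', 'c3']
print(sorted(classes["c4"]))  # ['c2', 'c4']
```

In this example c2 belongs to the class of c1 and to the class of c4, so the classes overlap: the relation yields a cover of the universe rather than a partition.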

Definition 1.4
For each $B \subset A$, a relation $R_B \subset U \times U$ is defined, where $|B|$ is the cardinality of $B$ and $\lambda$ is an arbitrary number.

Yao's method (Yao, 1999)
Yao introduced a method for generalizing the approximation space that depends on the right neighborhood, as follows. If $U$ is a finite universe and $R$ is a binary relation on $U$, then the right neighborhood of $x \in U$ is $xR = \{ y \in U : x R y \}$. For a topological space $(X, t)$ and a subset $A$ of $X$, we define the accuracy of Yao as
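A minimal Python sketch of right neighborhoods follows. It assumes the usual definition $xR = \{y \in U : xRy\}$, and it also assumes the standard neighborhood-based lower and upper approximations, which are not spelled out in the text above. The universe, the relation and the function names are hypothetical, and the accuracy measure is deliberately left out because its formula is not given here.

```python
def right_neighbourhood(relation, x):
    """xR = {y : (x, y) in R}, for a relation given as a set of pairs."""
    return {y for (u, y) in relation if u == x}

def approximations(universe, relation, subset):
    """Lower and upper approximations built from right neighborhoods.

    Assumed definitions (not taken from the text):
      lower = {x : xR is contained in subset}
      upper = {x : xR intersects subset}
    An element with an empty neighborhood lands in the lower set vacuously.
    """
    lower, upper = set(), set()
    for x in universe:
        nbhd = right_neighbourhood(relation, x)
        if nbhd <= subset:
            lower.add(x)
        if nbhd & subset:
            upper.add(x)
    return lower, upper

# Hypothetical universe, relation and target subset.
U = {1, 2, 3, 4}
R = {(1, 1), (1, 2), (2, 2), (3, 3), (3, 4), (4, 4)}
A = {2, 3}

print(right_neighbourhood(R, 1))  # {1, 2}
lower, upper = approximations(U, R, A)
print(lower, upper)               # {2} {1, 2, 3}
```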

Conclusion
This paper discussed two approaches for determining the similarity between records of mixed data. We introduced some concepts and applications, and from these applications we found a relation between general topology and rough sets. It has been said that topology is the father of rough set theory; this paper illustrates that relation, and in a subsequent paper we intend to establish it. Because of the uncertainty and ambiguity of qualitative data, and the difficulty of combining metrics, rough set theory remains an optional tool to be used.
As mentioned in the discussion, an additional approach is required for the discovery of identical sets of records in data sets of mixed data.
From the tables, we can see that the records most likely to belong to the same cluster are those that appear in both approximations. One might use the union of the upper approximations to determine probable clusters.

Let $U = \{x_1, x_2, x_3, x_4\}$ be a set of patients and let $A = \{a_1, a_2, a_3\}$ be the attributes temperature, pressure and diabetes, as in the following table:
Figure 1.3

Table 1.2: Possible modified set
Any attribute that has the same value as another attribute for all records is disregarded.
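This preprocessing rule can be sketched in Python as follows. The patient table is hypothetical (here a3 repeats a1 on every record purely for illustration), and `drop_duplicate_attributes` is a name introduced for this sketch.

```python
def drop_duplicate_attributes(table, attributes):
    """Keep the first of any attributes that agree on every record."""
    kept, seen_columns = [], set()
    for a in attributes:
        # The column of values an attribute takes across all records.
        column = tuple(table[x][a] for x in table)
        if column not in seen_columns:
            seen_columns.add(column)
            kept.append(a)
    return kept

# Hypothetical table: a3 has the same value as a1 for all records.
table = {
    "x1": {"a1": "high", "a2": "normal", "a3": "high"},
    "x2": {"a1": "normal", "a2": "high", "a3": "normal"},
    "x3": {"a1": "high", "a2": "high", "a3": "high"},
    "x4": {"a1": "normal", "a2": "normal", "a3": "normal"},
}
kept = drop_duplicate_attributes(table, ["a1", "a2", "a3"])
print(kept)  # ['a1', 'a2']
```

Which of two identical attributes is kept does not affect discernibility between records, so the sketch simply keeps the one listed first.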