Artificial Intelligence and Data Mining Approaches in Security Frameworks. Группа авторов
Чтение книги онлайн.
Читать онлайн книгу Artificial Intelligence and Data Mining Approaches in Security Frameworks - Группа авторов страница 14
2.3 Clustering
A data mining technique is used for grouping a set of objects in such a way that there is more similarity in the objects of the same class as compared to the objects of the other class. It means cluster of same class, i.e., similarity of intra-cluster is maximum and similarity of inter-cluster is minimum. Unsupervised learning can be performed with the help of clustering. Following are the types of clustering algorithms:
1 a) Distribution Based
2 b) Density Based
3 c) Centroid Based
4 d) Connection Based or Hierarchical Clustering
5 e) Recent Clustering Techniques
a) Distribution-Based Clustering
A model of clustering in which the date is grouped/fitted in the model on the basis of probability, i.e., in what way it may fit into the same distribution. Thus, the groups formed will be on the basis of either normal distribution or gaussian distribution
b) Density-Based Clustering
In this type of clustering, a cluster is formed with the help of area with higher density as compared to the rest of the data.
Following are three most frequently used Density-based Clustering techniques:
1 i) Mean-Shift
2 ii) OPTICS
3 iii) DBSCAN
c) Centroid-Based Clustering
Clusters that are represented by a vector are a part of centroid-based clustering. It is not a mandate requirement that these clusters should be a part of the given dataset. The number of clusters is inadequate to size k in k means clustering algorithm; therefore, it is essential to find centres of k cluster and allocate objects to their nearest centres. By taking different values of k random initializations, this algorithm runs multiple times to select the best of multiple runs (Giannotti et al., 2013). In k medoid clustering, clusters are firmly limited to the members of the dataset, whereas in k medians clustering, median is taken to form a cluster; the foremost drawback of these techniques is that we have to select the number of clusters beforehand.
d) Connection-Based (Hierarchical) clustering
As the name itself suggests, this type of clustering is performed on the basis of closeness or distance of objects. The most important key point to form these types of cluster is the distance between the objects by which they can be connected with each other and form clusters. Instead of single partitioning of dataset, these algorithms provide an in-depth hierarchy of merging clusters at particular distances. To represent clusters, a dendrogram is used. Merging distance of the clusters is shown on the y-axis and an object placement shows the x-axis to ensure that there should not be the mixing of clusters.
On the basis of the different ways with which distance is calculated, there are several types of connection-based clusters:
1 i) Single-Linkage Clustering
2 ii) Complete-Linkage
3 iii) Average-Linkage Clustering
e) Recent Clustering Techniques
For high dimensional data, the above-mentioned standard clustering techniques are not fit, therefore some new techniques are being discovered. These new techniques can be classified into two major categories, namely: Subspace Clustering and Correlation Clustering.
A small list of attributes that should be measured for the formation of a cluster is taken into consideration under subspace clustering. Correlation between the chosen attributes can also be performed with correlation clustering.
2.4 Privacy Preserving Data Mining (PPDM)
To extract the pertinent knowledge from large volumes of data and to protect all sensitive information of that database, we use privacy preserving data mining (PPDM). These techniques are created with the aim to confirm the protection of sensitive data so that privacy can be reserved with the efficient performance of all data mining operations. There are two classes of privacy concerned data mining techniques:
1 Data privacy
2 Information privacyModification of database for the protection of sensitive data of the individuals, we use data privacy technique. If there is a requirement for the modification of sensitive knowledge that can be deduced from the database, information privacy technique is preferred. To provide privacy to input, data privacy is preferable, whereas for providing privacy to output, the technique of information privacy is used. To reserve personal information from exposure is the main focus of a PPDM algorithm. It relies on the analysis of those mining algorithms that are attained during data privacy. Main objective of Privacy Preserving Data Mining is building algorithms that convert the original data in some useful means, so that there is no visibility of private data and knowledge even after a successful mining process. Privacy laws would allow the access in the case that some related satisfactory benefit is found resulting from the access.
2.5 Intrusion Detection Systems (IDS)
Onset detection of the intrusion is the main aim of an Intrusion detection system. There is a requirement of a high level of human knowledge and substantial amount of time to attain security in data mining. However, intrusion detection systems based on data mining need less expertise for better performance. To perceive network attacks in contrast to services that are vulnerable, intrusion detection system is very helpful. Applications-based data-driven attacks always privilege escalation (Thabtah et al., 2005), un-authorized logins and files accessibility is very sensitive in nature (Hong, 2012). Data mining process can be used as a tool for cyber security for the competent detection of malware from the code. Figure 2.3 shows the outline of an intrusion detection system. Several components such as, sensors, a console monitor and a central engine forms the complete intrusion detection system. Security events are generated by sensors whereas the task of console monitor is to monitor and control all events and alerts. The main function of the central engine is recording of events in a database and on the basis of these events, alerts can be created followed by certain set of rules. Following factors are responsible for the classification of an intrusion detection system:
1 i) Location
2 ii) Type of Sensors
3 iii)Technique used by the Central engine for generation of alerts.