Mohamed Seghire Othman Djediden, Hicham Reguieg, Zoulikha Mekkakia Maaza
Article first published online: 2019

Abstract
The explosion of data generated in computer networks has made the main task of Intrusion Detection Systems (IDS) more complicated. Most existing IDS are deployed on a single server and do not support distributed processing. These systems run into problems as soon as the data to be analysed becomes larger and more varied. The main goal of this paper is to create an intrusion detection system that can analyse massive data quickly and with high precision while supporting distributed data processing. This type of data processing makes our system more available and fault-tolerant. In our work, we combine the Apache Spark framework with well-known feature selection methods and machine learning algorithms from Sk-dist, an improved version of the scikit-learn library. The UNSW-NB15 dataset was used to assess the performance of our system. Comparisons with existing work show that our approach performs better in terms of accuracy, feature reduction and, above all, fault tolerance.