Data identification is a key task for any Internet Service Provider (ISP) or network administrator. As port fluctuation and encryption become more common in P2P traffic wishing to avoid identification, new strategies must be developed to detect and classify such flows. This paper introduces a new method of separating P2P and standard web traffic that can be applied as part of a data mining process, based on the activity of the hosts on the network. Unlike other research, our method is aimed at classifying individual flows rather than just identifying P2P hosts or ports. Heuristics are analysed and a classification system proposed. The accuracy of the system is then tested using real network traffic from a core internet router showing over 99% accuracy in some cases. We expand on this proposed strategy to investigate its application to real-time, early classification problems. New proposals are made and the results of real-time experiments compared to those obtained in the data mining research. To the best of our knowledge this is the first research to use host based flow identification to determine a flows application within the early stages of the connection.
Bibliographical noteThe methodology presented can correctly separate web protocols from Peer to Peer protocols by looking at individual traffic flows as well as network hosts. This novel host based approach can achieve up to 99% accuracy when tested with real network traffic and it can be potentially used by network operators to identify user applications.
ASJC Scopus subject areas
- Computer Networks and Communications