Abstract
With network attack technology continuing to develop, traditional anomaly traffic detection methods that rely on feature engineering are increasingly insufficient in efficiency and accuracy. Graph Neural Network (GNN), a promising Deep Learning (DL) approach, has proven to be highly effective in identifying intricate patterns in graph-structured data and has already found wide applications in the field of network security. In this paper, we propose a hybrid Graph Convolutional Network (GCN)-GraphSAGE model for Anomaly Traffic Detection, namely HGS-ATD, which aims to improve the accuracy of anomaly traffic detection by leveraging edge feature learning to better capture the relationships between network entities. We validate the HGS-ATD model on four publicly available datasets, including NF-UNSW-NB15-v2. The experimental results show that the enhanced hybrid model is 5.71% to 10.25% higher than the baseline model in terms of accuracy, and the F1-score is 5.53% to 11.63% higher than the baseline model, proving that the model can effectively distinguish normal traffic from attack traffic and accurately classify various types of attacks.
0 Introduction
As computer network attacks become more frequent and sophisticated, the effectiveness and efficiency of anomaly traffic detection have become significant challenges in ensuring network security. To address these challenges, many enterprises and governments depend on Network Intrusion Detection Systems (NIDS) [1] to safeguard vital systems, sensitive information, and overall network integrity. These systems typically analyze raw network traffic to detect potential intrusion activities. Common formats for traffic records include packet capture and flow-based records, which outline the characteristics of network traffic and categorize them accordingly. Traditional NIDS methods are generally divided into two paradigms: signature-based[2] and behavior-based detection[3]. The former approach uses predefined rules or patterns to identify known threats. In contrast, behavior-based methods employ more sophisticated techniques, often integrating Machine Learning (ML) approaches to identify evolving and novel attack patterns.
Existing NIDS methods frequently overlook the topological distinctions between normal and attack traffic, a factor critical for accurate intrusion detection. Incorporating topological information[4], especially the relationships between nodes and lateral movement paths, can significantly enhance intrusion detection performance. Through improved feature extraction, this deeper analysis helps detect attacks like data theft, DDoS[5], and reconnaissance[6-8].
Lately, Graph Neural Networks (GNNs) [9] have gained prominence as powerful modeling tools, yielding impressive results in areas such as Natural Language Processing (NLP) [10], speech recognition[11], and computer vision [12-13]. GNNs are particularly well-suited for handling graph-structured data with complex relationships. By propagating information through the connections between nodes and edges, GNNs can effectively model traffic behavior patterns within networks. Network traffic can be represented as a graph in which endpoints (such as IP addresses and port numbers) are mapped to nodes and network flows, described by attributes such as packet count, protocol, and byte count, are mapped to edges. This creates an ideal application environment for problems involving the detection of anomaly traffic.
In addition, GNNs can effectively aggregate information from neighboring nodes through their message passing mechanism[14], thus capturing complex structures and relationships in network data. This enables GNNs to analyze network traffic not by relying solely on predefined features, but by autonomously learning the most relevant patterns from the data. As a result, the need for traditional manual feature engineering is minimized, giving the model greater flexibility to adapt to diverse and complex network topologies and traffic behaviors.
The fundamental limitations of current GNN-based techniques are threefold. First, the majority of research (including E-GraphSAGE[15] and Graph Convolutional Network (GCN)-based techniques[16]) ignores edge attributes, which are essential for capturing traffic interactions, in favor of concentrating on node features. Second, when working with large-scale network graphs, typical GNN frameworks are computationally inefficient, which restricts their use in practical situations. Lastly, current models perform worse on imbalanced datasets, where a large share of attack types is prone to high false positive rates.
In this paper, our research goal is to design and implement an anomaly traffic detection model based on a hybrid of GCN and GraphSAGE, which combines GCN's global structure modeling with GraphSAGE's efficient local sampling to improve detection accuracy. At the same time, feature encoding and edge-based batching strategies are introduced to optimize the processing efficiency of large-scale network data and to address the heavy computation and high memory consumption of traditional GCN on large-scale graph data. Specifically, we transform the network traffic data into a graph format, with each edge carrying traffic-related characteristics such as protocol type and flow size. Additionally, we label the edges of the graph to enable supervised learning during the subsequent training process. To efficiently handle large-scale network graph data[17], we have designed an edge-based batching method combined with a node expansion technique. In each batch, a certain number of edges are selected, and the corresponding neighboring nodes are expanded, ensuring that each batch contains a sufficient number of nodes and edges. During training, this strategy helps preserve the model's robustness and the diversity of samples. Our model's architecture is built on a hybrid framework that blends the GraphSAGE and GCN models. By aggregating information from neighboring nodes, this hybrid model learns feature representations for both nodes and edges, allowing it to capture potential anomalous behaviors within the network and accurately identify various types of attack patterns. According to experimental results, our model outperforms existing approaches across several public datasets, demonstrating significant improvements in detection performance.
In summary, this paper makes three main contributions:
· We propose a new anomaly traffic detection model, called HGS-ATD, utilizing both graph convolution and edge features to improve the accuracy of anomaly traffic detection.
· An edge-based batching method is designed to ensure sample diversity and sufficient node coverage by dynamically expanding the set of nodes in each batch. This strategy can be applied to large-scale data and enhances computational efficiency during training.
· Extensive experiments conducted on four datasets validate the effectiveness of the proposed approach in reliably recognizing different types of attacks and distinguishing between attack and benign traffic.
The remaining parts of the document are structured as follows: the associated studies are presented in Section 1, the pertinent background is summarized in Section 2, Section 3 details the proposed model and methodology, the experimental setup and results analysis are presented in Section 4, and the work is summarized in Section 5.
1 Related Work
A network intrusion detection system is crucial for safeguarding computer networks against a variety of attacks. With the continuous evolution of network attack methods, traditional signature-based and rule-based methods have gradually become unable to deal with complex and new attacks, including not only zero-day attacks[18] and advanced persistent threats, but also DDoS[19] attacks, large-scale botnet attacks, and new covert attacks using encrypted communication channels[20]. As a result, in the past few years, ML-based approaches, particularly DL and GNN, have attracted significant interest as promising advancements in the area of NIDS.
Traffic data in a network is generally composed of devices, IP addresses, or other communication nodes, with their interactions represented as edges. GNNs excel at modeling the complex relationships between nodes and edges, particularly by using convolutional operations to capture the intricate structural features of network communication. Busch et al.[21] extracted directed edge attribute graphs from network traffic, where each node corresponds to a network endpoint, and each edge represents a communication feature vector between two endpoints. Zhao et al.[22] introduced a sextuple representation of network traffic, encompassing details such as source IP, destination IP, ports, protocols, requests, and responses. Pujol-Perich et al.[23] developed a host connection graph in which nodes signify hosts or traffic entities, and edges represent the connections between them. Zhou et al.[24] presented network traffic as a communication graph comprising a node set and an adjacency matrix. The node set includes multiple distinct nodes observed in the traffic records, representing devices or entities in the network, while the adjacency matrix captures the connections between these nodes.
Many existing studies tend to overlook edge features, often initializing node features using vectors derived from all edge features. These methods all rely on graphs that are directly built from the underlying network architecture, producing graph data that closely resembles computer network topology. Nevertheless, this method restricts the model's capacity to completely utilize the connections among traffic flows produced within the network. In GNN models, similar nodes are typically linked by edges, but traditional network topologies struggle to generalize this relationship effectively. By creating graph-structured data based on the connections among traffic flows, we overcome this constraint in this work. Since every node in the graph reflects the properties of a flow, the connections made by edges are able to capture the nodes' innate commonalities.
Hamilton et al.[25] introduced the GraphSAGE approach, which improves the computational efficiency of GNN on large-scale graphs by sampling neighboring nodes, making it suitable for real-time traffic detection. While this method allows for generating low-dimensional vector representations of graphs, it has a significant limitation: it overlooks edge features. The E-GraphSAGE method, an extension of GraphSAGE intended for intrusion detection tasks, was created by Lo et al.[15] in order to overcome this limitation. This inductive model integrates edge features into the GraphSAGE framework. The method entails constructing the flow representation by aggregating the nearby flows of the source and target nodes. However, one limitation of this approach is that traffic features themselves have insufficient influence on the embedding representation. The real-time urban traffic analysis method proposed by Chandramohan et al.[26] optimizes routing decisions by modeling vehicle interaction through graph structure, which provides a cross-domain idea for efficient utilization of edge features in dynamic networks.
Chang and Branco[27] proposed E-ResSAGE and E-ResGAT algorithms, which are based on GraphSAGE and Graph Attention Network (GAT) algorithms, respectively. In terms of model construction, the E-ResSAGE algorithm retains the original traffic feature information by adding residual connections in the aggregation operation, and enhances the detection ability of minority categories. On the basis of E-ResSAGE, the E-ResGAT algorithm further introduces the attention mechanism to perform weighted aggregation of the neighborhood information of the traffic data, so that the model can better pay attention to the influence of different neighbor nodes on the current node.
Bao et al.[28] proposed a hybrid model based on E-GraphSAGE and LSTM (Long Short-Term Memory) to improve detection performance by combining spatial structure and time series features. E-GraphSAGE is used to perform hierarchical edge sampling and aggregation of the network traffic graph to retain the topological characteristics of communication between nodes. The residual design is introduced to concatenate the original edge features and the aggregated features to alleviate the feature dilution problem.
Ji and Meng[29] applied GNN-based representation learning methods to the problem of network traffic classification. Instead of directly extracting features from the traffic, they transformed raw traffic data into an image format and used a GCN for image classification, ultimately achieving traffic classification. Initial experimental results indicated strong performance on smaller networks, with a 97.35% classification accuracy. Ying et al.[30] introduced a network intrusion detection system leveraging GCN. This approach represented traffic data as a graph structure, enabling GCN to capture intricate dependencies between network nodes and edges. Experiments demonstrated that this method could better identify anomalous behavior patterns in the network than conventional ML-based approaches.
The use of GCN for Heterogeneous Information Network (HIN) was examined by Yang et al.[16]. However, several significant limitations constrained the effectiveness of GCN on HIN, affecting both their precision and effectiveness. They developed a comprehensible and effective Heterogeneous Graph Convolutional Network (ie-HGCN) to overcome these difficulties. Experiment results showed that their approach successfully addressed the drawbacks of the current methods, allowing for a deeper comprehension and learning of HIN.
As shown in Fig.1, Tran and Park[31] classified malicious network traffic using a GCN model. The model required a significant amount of memory and computational time, even though its architecture was simple, with only two GCN layers and one fully connected layer. Additionally, it only achieved an identification rate of around 90% on the CIC-IDS2017 dataset, indicating moderate detection performance.
Fig.1 GCN model
Deng and Huang[32] proposed a novel intrusion detection model, EMA-IDS (Edge-featured Multi-hop Attention Graph Neural Network for Intrusion Detection System). The core idea is to use GNN combined with a multi-hop attention mechanism and edge features to improve the effect of network intrusion detection. It effectively alleviates the over-smoothing problem often encountered by traditional GNN in deep networks, and also improves the computational efficiency through the multi-hop attention mechanism.
Du et al.[33] designed an intrusion detection model combining GCN and LSTM networks, named UGL (UAV-GCNLSTM). A comprehensive extended dataset is constructed by adding three dimensions to the original dataset: Message Cycle, Hamming Distance, and Entropy. At the same time, the efficiency of GCN in capturing network topology and the ability of LSTM in processing time series data are used to efficiently and accurately intercept network attacks in the real UAV working environment.
In the GNN architecture, GAT[34] dynamically learns the association weights between nodes through the attention mechanism, which can capture complex interaction patterns more flexibly. GAT has been applied to various anomaly detection tasks. For example, Jahin et al.[35] proposed to combine Contrastive Learning Graph Network (CAGN) and GAT, which fuses attention mechanism and contrastive loss through a two-stream architecture to improve feature discrimination. However, GAT's high computational complexity and lack of scalability to large-scale graph data limit its application to real-time traffic detection. Notably, the introduction of GATv2 improves the attention mechanism, making it a potential candidate for further research in this area.
This work offers an enhanced GCN-based model to solve the drawbacks of existing GCN approaches. Although the GCN method is widely acknowledged as one of the best methods for classifying nodes, it performs poorly on huge graph datasets. The model's efficiency is greatly impacted when working with huge graph data since it must store the complete adjacency matrix and its accompanying characteristics in memory. The suggested batch processing technique greatly enhances the model's capacity to effectively handle massive amounts of graph data. In addition to being able to differentiate between regular and abnormal traffic, our proposed model can also accurately detect different sorts of attacks by utilizing edge features and topological information.
2 Background
2.1 Graph Neural Network
GNN[36] is a DL framework tailored for processing graph-structured data. GNN transforms graph data into standardized feature representations by applying certain algorithms to a graph's nodes and edges, which can subsequently be used in various neural network models for training. In applications including node classification, propagating edge information, and graph-based aggregation, GNNs have shown remarkable performance.
GNN was first developed in 2005, when Recurrent Neural Network (RNN) [37] was first used to handle graph data. Subsequently, researchers proposed GCN, which incorporated properties such as translation invariance, local perception, and weight sharing from Convolutional Neural Network (CNN) [38-39] into graph structures. This laid the groundwork for the construction and improvement of subsequent GNN frameworks.
The key feature of GNN is its ability to efficiently learn graph representations through a message passing mechanism, which plays a key role in anomaly traffic detection. It allows the model to effectively aggregate information between nodes and capture complex relationships and patterns in network traffic, as shown in Fig.2. Nodes in the graph are connected to each other by edges, and messages are passed from one node to neighboring nodes along edges. In this process, each node updates its hidden state based on the messages it receives from its neighbors. For example, the hidden state of node A is affected by the information of its neighboring nodes B, C, and D. By continuously iterating this message passing and state updating, the model is able to learn the feature representation of the entire graph, which makes GNN particularly effective in dealing with graphs with complex structures or attributes and thus in detecting abnormal traffic. GNNs are also versatile in a number of other areas, including natural language processing, image analysis, trajectory prediction, physical chemistry, and drug discovery.
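As an illustration of the message passing step described above, the following minimal Python sketch performs one round of aggregation and update on the four-node example (node A with neighbors B, C, and D). The two-dimensional hidden states and the plain averaging update rule are assumptions chosen for readability; they are not the learned update used by an actual GNN layer.

```python
# Toy undirected graph: node A has neighbors B, C and D.
neighbors = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
}

# Hypothetical 2-dimensional initial hidden states.
hidden = {
    "A": [1.0, 0.0],
    "B": [0.0, 2.0],
    "C": [4.0, 0.0],
    "D": [2.0, 4.0],
}


def message_passing_step(hidden, neighbors):
    """One round: each node averages its neighbors' states with its own."""
    updated = {}
    for node, state in hidden.items():
        # Messages are the current hidden states of the neighboring nodes.
        messages = [hidden[n] for n in neighbors[node]]
        dim = len(state)
        # Aggregate: element-wise mean over the incoming messages.
        agg = [sum(m[i] for m in messages) / len(messages) for i in range(dim)]
        # Update: simple average of the node's own state and the aggregate.
        updated[node] = [(state[i] + agg[i]) / 2.0 for i in range(dim)]
    return updated


step1 = message_passing_step(hidden, neighbors)
```

Calling `message_passing_step` repeatedly lets information from nodes that are several hops away reach a given node, which is the iterative behavior the paragraph describes.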
In the domain of anomalous traffic detection, GNNs are used to analyze abnormal patterns in network traffic. By converting network traffic into a graph structure, GNN can capture relationships between data packets and distinguish between attack traffic and normal traffic by aggregating node features. The application of GNNs to anomaly detection is not limited to node-level classification; it also encompasses analysis at the level of graph connections and entire network structures, such as the detection of anomalous nodes, edges, subgraphs, and entire graphs.
Fig.2 Message passing mechanism
2.2 Graph Convolutional Network
In contrast to typical CNN, which are mostly employed for grid data (like photos), GCN[40] can handle data with non-Euclidean structures[41], such as those present in social media networks, molecular networks, and user behavior data in recommendation systems. Graph data consist of nodes and edges: the nodes stand for discrete entities, while the edges specify the connections or interactions between them. In many applications, nodes not only contain attribute information (such as text or image features), but edges often carry additional features as well (such as relationship strength or timestamps).
The fundamental concepts behind GCN involve propagating and combining information across the graph’s adjacency structure, allowing each node to incorporate information from its neighborhood during the learning process. Unlike traditional neural networks with fixed structures, GCN leverages the graph’s topology to aggregate features from neighboring nodes. This allows GCN to capture local as well as global dependencies inside the structure of the graph.
The hidden state of each node undergoes two sequential operations: aggregation and update. This is where the concept of “convolution” becomes relevant. The hidden state of the nodes in each GCN layer is updated as follows:
$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right) \tag{1}$$
where $\sigma$ is the activation function (such as ReLU), $W^{(l)}$ is the weight matrix of the $l$-th layer, $\hat{A}$ is the graph’s normalized adjacency matrix, $H^{(l)}$ is the node feature matrix, and $H^{(l+1)}$ is the new node feature matrix obtained through the graph convolution operation.
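The layer update of Eq. (1) can be sketched in a few lines of dependency-free Python. The symmetric normalization with added self-loops is an assumption (it is the common choice for GCN, but the text only states that the adjacency matrix is normalized), and the toy graph, features, and weights are hypothetical values.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed choice)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_hat, h, w):
    """Compute ReLU(A_hat @ H @ W) for one layer."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, x) for x in row] for row in z]


# Toy graph with 3 nodes and edges 0-1 and 1-2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical node features
w0 = [[1.0, -1.0], [0.5, 1.0]]             # hypothetical weight matrix

a_hat = normalize_adjacency(adj)
h1 = gcn_layer(a_hat, h0, w0)
```

Because the full normalized adjacency matrix is held in memory, the cost of this computation grows with the size of the whole graph.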
As a potent tool for processing graph data, GCN may enhance performance through end-to-end learning and efficiently capture the dependencies within the graph structure using graph convolution operations. By using the topological information of the graph and the properties of its nodes and edges to perform complex pattern recognition, GCN offers strong assistance for efficient detection and classification in the context of network traffic anomaly detection.
2.3 GraphSAGE
GraphSAGE (Graph Sample and Aggregation), introduced by Hamilton et al.[25] in 2017, is a variant of GNN. GraphSAGE proposes an innovative approach that significantly improves the scalability and efficiency of GNN by extracting partial data from each node’s neighbors and combining these data to generate node features.
Each node’s representation in GraphSAGE is updated based on the feature information of the nodes that are adjacent to it. To avoid operating on the entire graph, a fixed number of neighboring nodes, denoted as $N_s(v)$, are sampled from the neighbor set $N(v)$ of node $v$. If the mean aggregation method is used, the sampled neighbor set is $N_s(v) = \{v_1, v_2, \ldots, v_k\}$.
The aggregation step is the core of GraphSAGE: it combines the features of the sampled neighboring nodes into a new feature vector that represents the aggregated information of the node’s neighborhood. Let the feature vectors of the sampled neighbors of node $v$ be $\{h_u : u \in N_s(v)\}$, where $h_u$ represents the feature of node $u$. The aggregation function, AGGREGATE, aggregates these neighboring feature vectors into a new vector:
$$h_{N_s(v)} = \mathrm{AGGREGATE}\left(\{h_u : u \in N_s(v)\}\right) \tag{2}$$
Common aggregation functions include mean aggregation, pooling aggregation, and LSTM aggregation.
A neural network is then used to update the node by integrating the aggregated neighbor features with its own features. Let $h_v^{(k-1)}$ represent the feature vector of node $v$ in layer $(k-1)$; then the feature of node $v$ in the $k$-th layer is:
$$h_v^{(k)} = \sigma\left(W^{(k)} \cdot \left[h_v^{(k-1)} \,\|\, \mathrm{Agg}\left(\{h_u^{(k-1)} : u \in N_s(v)\}\right)\right]\right) \tag{3}$$
Here, $W^{(k)}$ is the learned weight matrix, $\mathrm{Agg}$ represents the aggregation operation (such as mean) on the neighboring nodes, $\|$ denotes concatenation, and $\sigma$ is the activation function. In the final layer of the graph, the representations of nodes are employed for downstream tasks such as regression, classification, etc.
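The sampling, aggregation, and update steps of GraphSAGE can be illustrated with the short sketch below. It assumes mean aggregation and a concatenation of the node's own feature with the aggregated neighbor feature before the linear map; the graph, features, and weight matrix are hypothetical values, and the function names are illustrative only.

```python
import random


def sample_neighbors(neighbors, v, k, rng):
    """Sample at most k neighbors of v without replacement."""
    nbrs = neighbors[v]
    return list(nbrs) if len(nbrs) <= k else rng.sample(nbrs, k)


def mean_aggregate(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def sage_update(h, neighbors, v, weight, k, rng):
    """Return ReLU(W [h_v || mean of sampled neighbor features])."""
    sampled = sample_neighbors(neighbors, v, k, rng)
    agg = mean_aggregate([h[u] for u in sampled]) if sampled else [0.0] * len(h[v])
    concat = h[v] + agg  # list concatenation plays the role of CONCAT
    out = [sum(w * x for w, x in zip(row, concat)) for row in weight]
    return [max(0.0, x) for x in out]


# Hypothetical toy data: node 0 has three neighbors.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
h = {0: [1.0, 2.0], 1: [3.0, 0.0], 2: [1.0, 4.0], 3: [5.0, 5.0]}
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, -1.0]]

rng = random.Random(0)
h0_new = sage_update(h, neighbors, 0, weight, k=3, rng=rng)
two_sampled = sample_neighbors(neighbors, 0, 2, rng)
```

With `k=3` every neighbor of node 0 is used, so the result is deterministic; with a smaller `k` only a random subset contributes, which is what bounds the per-node cost on large graphs.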
2.4 Comparative Analysis of Models
The traditional GCN model effectively captures the complex dependencies in the graph structure through the graph convolution operation, which is suitable for processing data with non-Euclidean structure. However, it leads to a memory explosion when processing datasets with 1000 edges (e.g., NF-CSE-CIC-IDS2018-v2). GraphSAGE significantly improves the scalability and efficiency of GNN on large-scale graphs by sampling neighbor nodes and aggregating their features. GraphSAGE supports a variety of aggregation functions, such as mean aggregation, pooling aggregation, and LSTM aggregation, and can dynamically weight neighbor nodes according to their importance. However, GraphSAGE ignores edge features, which limits its performance in some scenarios. GAT introduces an attention mechanism to improve feature discrimination, but multi-head attention leads to a 3-5 times increase in computational cost. Although the E-GraphSAGE method integrates edge attributes, its fixed window sampling strategy is difficult to adapt to dynamic traffic changes. The specific comparative analysis is shown in Table 1.
Table 1 Comparative analysis of state-of-the-art models
3 Methodology
Traditional GNN, such as GCN, GAT, and GraphSAGE, have been successfully utilized across various applications. These methods mainly emphasize node features for node classification and do not presently account for edge features in the context of edge classification. On the other hand, the hybrid algorithm combining GCN and GraphSAGE that we propose allows us to simultaneously incorporate both node and edge feature information during the embedding process. This approach lays the groundwork for calculating edge embeddings and conducting edge classification, thus facilitating the effective differentiation of normal and attack traffic and ensuring accurate classification of different attack types.
3.1 Data Preprocessing
Data preprocessing plays a crucial role in transforming raw NetFlow data into graph data for training and testing. A detailed description of our data preprocessing procedure is provided below. First, the key information of each flow is extracted from the large amount of NetFlow data in the dataset, including IP address, port number, packet and byte counts, and other useful packet statistics. NetFlow data supports the conversion between flow records and a graph format because nodes can be represented by IP addresses and ports, and edges can be represented by flow information.
For this study, the original network traffic data is first converted into the relationship between graph nodes and edges through feature engineering. The source IP and port and the destination IP and port are each merged into a unique node identifier, redundant port fields are deleted, and the edge list based on the communication relationship is constructed. After infinite values and missing values are handled, the numerical features are standardized to zero mean and unit variance with StandardScaler. This process improves the convergence efficiency of the model. Finally, a stratified sampling strategy is adopted in the data splitting stage: the dataset is divided into a 60% training set and a 40% test set, with the proportion of samples of each category kept the same in both sets.
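The cleaning, standardization, and stratified 60/40 split described above can be sketched as follows. This is a dependency-free stand-in for illustration, not the pipeline used in the experiments: the standardization mirrors what StandardScaler computes (population standard deviation), dropping rows with non-finite values is an assumed way of handling them, and the data and function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def clean(rows):
    """Drop rows that contain NaN or infinite feature values (assumed policy)."""
    return [(x, y) for x, y in rows if all(math.isfinite(v) for v in x)]


def standardize(features):
    """Scale each column to zero mean and unit variance (population std)."""
    n, dim = len(features), len(features[0])
    means = [sum(r[j] for r in features) / n for j in range(dim)]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in features) / n) or 1.0
            for j in range(dim)]
    return [[(r[j] - means[j]) / stds[j] for j in range(dim)] for r in features]


def stratified_split(labels, train_frac=0.6, seed=0):
    """Return (train_idx, test_idx) preserving per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idx_list in by_class.values():
        rng.shuffle(idx_list)
        cut = round(len(idx_list) * train_frac)
        train.extend(idx_list[:cut])
        test.extend(idx_list[cut:])
    return sorted(train), sorted(test)


# Hypothetical flows: 10 benign, 5 attack, plus one corrupted record.
rows = [([float(i), float(i % 3)], "benign") for i in range(10)]
rows += [([float(i), 5.0], "attack") for i in range(5)]
rows.append(([float("inf"), 1.0], "benign"))  # removed by clean()

rows = clean(rows)
X = standardize([x for x, _ in rows])
y = [label for _, label in rows]
train_idx, test_idx = stratified_split(y)
```

Splitting per class keeps the benign-to-attack ratio identical in both subsets, which matters when some attack classes are rare.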
3.2 Graph Construction
Network traffic data commonly include source and destination IP addresses, source and destination ports, and traffic characteristics such as duration, transaction bytes, and transmission packet size. During the graph construction process, the raw data needs to be cleaned and preprocessed. We handle missing values by imputation, correct outliers, standardize numerical features, and encode categorical features such as protocol type and TCP flags.
When building the network traffic graph, each traffic record (i.e., a network connection) is represented as an edge. The source IP address and port, along with the target IP address and port, serve as the graph’s nodes, while other traffic details are stored as features of the edges. This approach effectively frames the anomaly traffic detection task as an edge classification problem.
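The mapping from flow records to a labeled edge list can be sketched as below. The field names (`src_ip`, `in_bytes`, `duration_ms`, and so on) and the three example flows are hypothetical and serve only to show the structure: one node per IP and port pair, one edge per flow, with the flow statistics and label attached to the edge.

```python
def build_flow_graph(flows):
    """Map flow records to (node_index, edges, edge_features, edge_labels)."""
    node_index = {}
    edges, edge_features, edge_labels = [], [], []
    for flow in flows:
        # Node identifier "IP:port" merges address and port into one key.
        src = f"{flow['src_ip']}:{flow['src_port']}"
        dst = f"{flow['dst_ip']}:{flow['dst_port']}"
        for node in (src, dst):
            node_index.setdefault(node, len(node_index))
        # Each flow becomes one directed edge carrying its own features.
        edges.append((node_index[src], node_index[dst]))
        edge_features.append(
            [flow["protocol"], flow["in_bytes"], flow["duration_ms"]])
        edge_labels.append(flow["label"])
    return node_index, edges, edge_features, edge_labels


# Hypothetical flow records.
flows = [
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.2",
     "dst_port": 80, "protocol": 6, "in_bytes": 1200, "duration_ms": 35,
     "label": "benign"},
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.3",
     "dst_port": 22, "protocol": 6, "in_bytes": 90, "duration_ms": 2,
     "label": "attack"},
    {"src_ip": "10.0.0.4", "src_port": 6000, "dst_ip": "10.0.0.2",
     "dst_port": 80, "protocol": 6, "in_bytes": 640, "duration_ms": 12,
     "label": "benign"},
]

node_index, edges, edge_features, edge_labels = build_flow_graph(flows)
```

Because labels are stored per edge, a classifier trained on this structure predicts a class for each flow rather than for each endpoint.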
3.3 The Proposed HGS-ATD Model
With a focus on identifying unusual traffic patterns, we provide an enhanced GCN model in conjunction with the GraphSAGE model for node classification tasks in network data. To improve generalization, the model applies ReLU activation and dropout regularization after each of the two SAGEConv layers and a fully connected output layer. Finally, a Multi-Layer Perceptron (MLP) predictor is used to predict the attack classes. Fig.3 displays the model’s architecture. The model first converts network traffic data into a graph structure through data preprocessing and a graph building module, where nodes and edges represent different network elements and traffic characteristics, respectively. Then, the data passes through the node encoding module and the edge encoding module successively. In the node encoding module, the GCN layer is responsible for propagating node information in the global scope and provides the global semantic background for the GraphSAGE layer. On this basis, the GraphSAGE layer combines the features of local neighbor nodes for dynamic aggregation and optimization of the node feature representation. The edge encoding module calculates the weight of edge features through the attention mechanism to capture the important information of each edge more effectively. Finally, in the edge classification module, the edge embedding is mapped to specific classification results using an MLP to detect whether the traffic is abnormal. This architecture design combines the advantages of GCN and GraphSAGE to better handle large-scale graph data and accurately identify abnormal traffic.
Fig.3 The overall architecture of the HGS-ATD model
The core principle of our approach lies in synergizing the strengths of GraphSAGE and GCN. A popular model in GNN, GCN relies on the graph convolutional layer to learn graph structure representations. Through multiple convolution operations, GCN can effectively capture intricate features and structural information from the graph, thereby improving the model’s accuracy. However, GCN faces certain limitations when handling large-scale graph data. In such cases, the feature representation for each node requires aggregation of information from all its neighboring nodes, which significantly increases both computational demand and memory consumption.
To handle massive amounts of network traffic data, batch processing and node expansion strategies are also introduced. Specifically, an edge-based batching approach is employed to restrict the number of nodes and edges within each batch to a defined range, optimizing both computational efficiency and model performance. Instead of using the full neighborhood as in Ref.[15], for each batch we first select a set of edges and then generate a subgraph by extracting the relevant node and edge features from these edges. This subgraph serves as the training input, effectively mitigating computational bottlenecks associated with processing the entire graph in memory. First, the minimum number of nodes $M$ in each batch is determined to make sure that each batch contains enough nodes and to avoid the insufficient information caused by too few nodes. A batch of edges $E_{batch}$ of size $B$ is then chosen from the graph. For each edge, the source and destination nodes are extracted to obtain a preliminary set of nodes, as follows:
$$V_{batch} = \bigcup_{(u,v) \in E_{batch}} \{u, v\} \tag{4}$$
When generating batches, if a batch contains a small number of nodes (fewer than the preset minimum number of nodes), we expand the batch by selecting more adjacent nodes to ensure the diversity and representativeness of the batch, as shown in Eq. (5):
$$V_{expanded} = V_{batch} \cup \bigcup_{v \in V_{batch}} \mathrm{neighbors}(v) \tag{5}$$
where $\mathrm{neighbors}(v)$ represents the set of neighbor nodes of node $v$, that is, all nodes directly connected to node $v$ in the original network topology graph.
This expansion operation adds neighboring nodes such that the number of nodes in the batch is at least $M$. A subgraph $G_{subgraph}$ is then extracted from the original graph $G$ based on the expanded node set $V_{expanded}$:
$$G_{subgraph} = \left(V_{expanded}, E_{subgraph}\right) \tag{6}$$
where $E_{subgraph}$ is the set of edges of $G$ whose endpoints are both in $V_{expanded}$. The subgraph is then used as the input data in the current batch for GNN training.
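The batching procedure of Eqs. (4) to (6) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the function name `make_batch` is hypothetical, edges are sampled uniformly at random, and expansion stops as soon as the minimum node count is reached instead of always adding every neighbor.

```python
import random


def make_batch(edges, batch_size, min_nodes, rng):
    """Sample an edge batch, expand its node set, return the induced subgraph."""
    # Adjacency sets of the full graph, used for neighbor expansion.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Eq. (4): endpoints of the sampled edges form the preliminary node set.
    batch_edges = rng.sample(edges, min(batch_size, len(edges)))
    nodes = {n for edge in batch_edges for n in edge}

    # Eq. (5): add neighbors until at least min_nodes nodes are present
    # (stopping early at min_nodes is an assumption of this sketch).
    frontier = sorted(nodes)
    while len(nodes) < min_nodes and frontier:
        next_frontier = []
        for v in frontier:
            for nbr in sorted(adjacency[v]):
                if nbr not in nodes and len(nodes) < min_nodes:
                    nodes.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier

    # Eq. (6): keep every edge whose two endpoints are in the expanded set.
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, sub_edges


# Hypothetical path graph 0-1-2-3-4-5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
nodes, sub_edges = make_batch(edges, batch_size=1, min_nodes=4,
                              rng=random.Random(0))
```

Only the induced subgraph is handed to the model for each batch, so memory use depends on the batch size and the minimum node count rather than on the size of the full graph.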
Similarly, SAGEConv, as a variant of the GraphSAGE model, improves on GraphSAGE by introducing more expressive convolution operations. Unlike the traditional GraphSAGE method, SAGEConv captures the information in the graph structure more finely by normalizing the representation of each neighbor node. This improvement enables SAGEConv to extract more detailed neighbor node information, enhancing the model’s capacity to represent graph data. Following batch processing and the node expansion strategy, the data flows through the node encoding module, the edge encoding module, and the edge classification module, ultimately producing the final output.
Node encoding: First, the GCN layer propagates node information across the whole graph through the normalized adjacency matrix, as shown in Eq. (7):
$$H^{(l)} = \sigma\left(\hat{A} H^{(l-1)} W^{(l)}\right) \tag{7}$$
where $H^{(l)}$ represents the node feature matrix of the $l$-th layer, $\hat{A}$ is the normalized adjacency matrix of the graph, $W^{(l)}$ represents the trainable weight matrix of the $l$-th layer, and $\sigma$ is the activation function (such as ReLU).
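A small numerical sketch may help make the GCN propagation step concrete. It assumes the common symmetric normalization with self-loops, D^{-1/2}(A + I)D^{-1/2}; the text only states that the adjacency matrix is normalized, so this particular choice, as well as the helper names `normalize_adjacency`, `matmul`, and `gcn_layer`, are assumptions made for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_norm, h, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, value) for value in row] for row in z]


adj = [[0, 1], [1, 0]]                 # two nodes joined by one edge
h0 = [[1.0, 0.0], [0.0, 1.0]]          # one-hot node features
w = [[1.0, 0.0], [0.0, 1.0]]           # identity weights for readability
print(gcn_layer(normalize_adjacency(adj), h0, w))
# prints [[0.5, 0.5], [0.5, 0.5]]
```

With identity weights, each node's output is simply the normalized average of its own features and those of its neighbor, which is the smoothing effect the GCN layer contributes.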
By combining the features of neighboring nodes over the entire graph and weighting them according to the corresponding links, GCN can capture global structural traits. GraphSAGE takes the output of the GCN layer as the initial node feature and then dynamically aggregates the features of local neighbor nodes. The aggregation functions employed by the GraphSAGE layer, such as mean, pooling, or LSTM aggregation, enable it to weight neighbors according to their importance, as follows:
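$$h_v^{(l)} = \sigma\left(W_{\text{SAGE}} \cdot \text{CONCAT}\left(H_v^{(l)}, \text{AGGREGATE}\left(\{H_u^{(l)}, \forall u \in \text{neighbors}(v)\}\right)\right)\right) \tag{8}$$
where $H_v^{(l)}$ is the feature of node $v$ passed from the GCN layer, and AGGREGATE is the aggregation function of GraphSAGE, such as pooling, mean, or LSTM.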
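The GraphSAGE step can be sketched in the same spirit. The example below uses mean aggregation, one of the options named above, followed by concatenation with the node's own feature, a linear map, and ReLU. The function name `sage_mean_layer` and the dictionary-based data layout are hypothetical, and the weights are hand-picked only to keep the arithmetic easy to follow.

```python
def sage_mean_layer(h, neighbors, w):
    """Apply one GraphSAGE-style update with mean aggregation.

    h: dict node -> feature list (here standing in for the GCN output).
    neighbors: dict node -> list of neighbor nodes.
    w: weight matrix as a list of rows, each of length 2 * feature size.
    """
    out = {}
    for v, h_v in h.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            # Element-wise mean of the neighbor features.
            agg = [sum(h[u][k] for u in nbrs) / len(nbrs)
                   for k in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)
        concat = h_v + agg  # CONCAT(own feature, aggregated neighbors)
        out[v] = [max(0.0, sum(wi * xi for wi, xi in zip(row, concat)))
                  for row in w]
    return out


h = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
# First output keeps the node's own first feature,
# second output keeps the second aggregated neighbor feature.
w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(sage_mean_layer(h, neighbors, w)[0])
# prints [1.0, 5.0]
```

Concatenating the node's own feature with the neighborhood summary, instead of averaging them together, lets the learned weights treat the two sources of information differently.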
GCN provides a global semantic background for GraphSAGE, which makes GraphSAGE more accurate in local feature aggregation. GraphSAGE in turn refines the features generated by GCN through a dynamic weighting mechanism and enhances the adaptability of the model to heterogeneous graphs and abnormal relationships.
Edge encoding: The edge encoding module takes as input the node embeddings produced by the preceding module together with the edge features. Attention-based calculations then allow the influence of different edges on the nodes to be adjusted more precisely. The edge feature attention is calculated as follows:
$$a_{uv} = \text{softmax}\left(\text{ReLU}\left(W_a e_{uv}\right)\right) \tag{9}$$
Through the attention mechanism, a weight $a_{uv}$ is assigned to the edge $(u, v)$, which indicates the importance of the edge to the final result in the aggregation process. The initial feature $e_{uv}$ of the edge undergoes a linear transformation (with weight $W_a$), and the ReLU function is then applied to enhance expressiveness. Softmax normalizes the weights over all edges, so that the edge weights form a probability distribution.
The resulting attention coefficients are then combined with the edge features to generate a weighted edge feature, which allows important information to be gathered from the edges more effectively:
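$$e_{uv}^{(l)} = \text{AGG}\left(h_u^{(l)} \,\|\, h_v^{(l)} \,\|\, a_{uv}\, e_{uv}\right) \tag{10}$$
where $e_{uv}^{(l)}$ denotes the embedding representation of the edge $(u, v)$ at the $l$-th layer, $h_u^{(l)}$ and $h_v^{(l)}$ are the embedding representations of the nodes at both ends of the edge, $a_{uv}$ is the attention weight of the edge, and $\|$ represents the vector concatenation operation.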
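The attention weighting described above (linear transformation, ReLU, softmax over edges, then scaling of the edge features) can be illustrated with a short sketch. It assumes one scalar score per edge and a softmax taken over all edges of the batch; the weight vector `w`, the bias `b`, and the function names `edge_attention` and `weight_edge_features` are hypothetical.

```python
import math


def edge_attention(edge_feats, w, b=0.0):
    """Return dict (u, v) -> attention weight; weights sum to 1 over all edges.

    edge_feats: dict (u, v) -> feature list; w: weight vector (assumed scalar score).
    """
    # Linear transformation followed by ReLU.
    scores = {edge: max(0.0, sum(wi * xi for wi, xi in zip(w, feat)) + b)
              for edge, feat in edge_feats.items()}
    # Softmax over all edges; subtracting the max keeps exp() stable.
    top = max(scores.values())
    exps = {edge: math.exp(score - top) for edge, score in scores.items()}
    total = sum(exps.values())
    return {edge: value / total for edge, value in exps.items()}


def weight_edge_features(edge_feats, attn):
    """Scale every edge feature vector by its attention weight."""
    return {edge: [attn[edge] * x for x in feat]
            for edge, feat in edge_feats.items()}


edge_feats = {(0, 1): [1.0, 0.0], (1, 2): [1.0, 0.0], (2, 3): [-5.0, 0.0]}
attn = edge_attention(edge_feats, w=[1.0, 0.0])
weighted = weight_edge_features(edge_feats, attn)
print(attn[(0, 1)] > attn[(2, 3)])
# prints True
```

In this toy input the third edge receives a negative pre-activation, which ReLU clips to zero, so after the softmax it ends up with a smaller weight than the other two edges.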
Edge classification: The edge embedding generated by the READOUT operation captures the interrelations between the two end nodes of the edge and its properties and is used as the input for edge classification, as shown in Eq. (11):
$$z_{uv} = \text{READOUT}(z_u, z_v) \tag{11}$$
where $z_u$ and $z_v$ stand for the final embedding vectors of the nodes $u$ and $v$, and $z_{uv}$ denotes the embedding representation of the edge $(u, v)$. Finally, an MLP maps the edge embedding to the classification result, which indicates whether the edge corresponds to abnormal traffic:
$$y_{uv} = \text{MLP}(z_{uv}) \tag{12}$$
where $y_{uv}$ is the prediction result of the edge $(u, v)$, and the MLP maps the edge embedding to the classification result using a fully connected layer.
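For illustration, the edge classification step can be sketched as follows. The sketch assumes that READOUT is a simple concatenation of the two endpoint embeddings and that the MLP has a single hidden ReLU layer; neither detail is specified above, and the names `readout`, `linear`, and `mlp_classify` are hypothetical.

```python
def readout(z_u, z_v):
    """Edge embedding as the concatenation of its endpoint embeddings (assumed)."""
    return z_u + z_v


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_classify(z_uv, w1, b1, w2, b2):
    """Hidden ReLU layer followed by an output layer; returns the class index."""
    hidden = [max(0.0, value) for value in linear(z_uv, w1, b1)]
    scores = linear(hidden, w2, b2)
    return scores.index(max(scores))


z_uv = readout([1.0], [2.0])
# Hand-picked weights: class 1 wins whenever the hidden activation is positive.
label = mlp_classify(z_uv, w1=[[1.0, 1.0]], b1=[0.0],
                     w2=[[-1.0], [1.0]], b2=[0.0, 0.0])
print(label)
# prints 1
```

In the binary setting the two output scores would correspond to normal and attack traffic; for multi-class detection the output layer simply has one row per attack category.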
Algorithm 1 outlines the entire process and training flow of our model.
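Algorithm 1: The HGS-ATD algorithm
Input:
• A graph $G(V, \varepsilon)$
• Input node features
• Input edge features
• Parameters: number of layers $L$, aggregation type (GCN/SAGE)
Output: Edge classification scores
1 for each iteration do
    /* batch processing and sampling */
2   batch_size = $B$, min_nodes = $M$
3   $V_{\text{batch}} = \{u, v \mid (u, v) \in \varepsilon_{\text{batch}}\}$
4   if $|V_{\text{batch}}| < M$ then
5     $V_{\text{expanded}} = V_{\text{batch}} \cup \{\text{neighbors}(v) \mid v \in V_{\text{batch}}\}$
6   $G_{\text{subgraph}} = \text{subgraph}(V_{\text{expanded}}, G)$
    /* node encoding */
7   for $l = 1, 2, \ldots, L$ do
8     $H^{(l)} = \sigma(\hat{A} H^{(l-1)} W^{(l)})$
9     $h_v^{(l)} = \sigma(W_{\text{SAGE}} \cdot \text{CONCAT}(H_v^{(l)}, \text{AGGREGATE}(\{H_u^{(l)}, \forall u \in \text{neighbors}(v)\})))$
10  end
    /* edge encoding */
11  $a_{uv} = \text{softmax}(\text{ReLU}(W_a e_{uv}))$
12  $e_{uv}^{(l)} = \text{AGG}(h_u^{(l)} \,\|\, h_v^{(l)} \,\|\, a_{uv} e_{uv})$
    /* edge classification */
13  $z_{uv} = \text{READOUT}(z_u, z_v), \forall (u, v) \in \varepsilon$
14  $y_{uv} = \text{MLP}(z_{uv})$
15 end
16 return all edge classification scores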
Overall, the proposed model combines the strengths of GCN and SAGEConv, replacing the traditional GraphConv layer. This design not only handles large-scale graph data better but also maintains high accuracy. Choosing suitable hyperparameters, including the number of hidden units and the learning rate, is essential to maximizing the model’s accuracy and stability. To compare the proposed model with current methods, we conducted experiments on several benchmark datasets. The results show that our model achieves clear gains in efficiency and accuracy. Detailed experimental results and performance assessments are discussed in the following sections.
4 Experiment and Result
4.1 Datasets
In this study, we utilize four NetFlow datasets [42]: NF-UNSW-NB15-v2, NF-ToN-IoT, NF-BoT-IoT, and NF-CSE-CIC-IDS2018-v2. The NF-UNSW-NB15-v2 dataset consists of network traffic data derived from the UNSW-NB15 [43] dataset, including both normal traffic and nine distinct types of attack traffic. The NF-ToN-IoT dataset focuses on IoT network traffic, incorporating features such as device diversity and intricate traffic patterns. The NF-BoT-IoT dataset is designed to provide insights into botnet attacks targeting IoT devices. The NF-CSE-CIC-IDS2018-v2 dataset [44] is created from the original files of the CSE-CIC-IDS2018 dataset. These datasets are intended for examining network traffic and identifying irregularities, including in IoT-based settings, and were standardized to the NetFlow format [45] by Sarhan et al. [46]. Each dataset is curated to simulate a variety of network conditions, aiding the development and testing of NIDS. Table 2 summarizes the salient features of these datasets.
4.2 Baselines
In this experiment, we evaluate the effectiveness of the HGS-ATD model by comparing it with three baseline models, covering both ML and DL approaches, that are widely recognized as leading techniques in network intrusion detection.
• XGBoost [47] is a tree-based ML algorithm within the gradient boosting framework. It improves prediction accuracy by progressively constructing decision trees, where each new tree attempts to correct the errors (residuals) produced by the prior ones. This iterative process allows XGBoost to learn from various data features, such as network traffic patterns or packet durations, to make more accurate predictions.
• E-GraphSAGE [15] serves as a foundational model for analyzing graph data. It addresses a range of tasks in graph-structured datasets, including node classification, edge prediction, and graph clustering, by learning representations of both nodes and edges within the graph. However, because all data must be processed simultaneously on the GPU, which can exceed memory limits, its computational efficiency may be constrained on very large datasets. To overcome this problem, we retain the original E-GraphSAGE framework but incorporate a mini-batch training strategy to improve scalability.
• Anomal-E [48] identifies anomalous patterns in graph data by combining self-supervised learning techniques with GNN, particularly in cases where the node-edge relationship is aberrant. Because it is trained on a graph-based self-supervised task, the learned node representation captures more of the graph structure without requiring large amounts of labeled data. Large-scale graph data can still be processed effectively thanks to the graph convolution technique, which avoids the enormous expense of directly computing over the entire graph.
Table 2 Salient features of the four datasets
4.3 Experiment Settings
The performance of the HGS-ATD model is assessed using four standard evaluation criteria: Accuracy (ACC), Precision, Recall, and F1-score. These indicators have been widely used in previous studies [49-50] and evaluate the classification performance of models from several perspectives, which is especially useful when the classes are imbalanced.
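For reference, all four criteria follow directly from the confusion-matrix counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below uses the standard definitions and additionally reports the miss rate, FN / (TP + FN), which equals 1 − Recall. The function name `binary_metrics` is hypothetical, and the counts in the usage example are made up for illustration and are not taken from the experiments.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miss_rate": 1.0 - recall,  # share of attacks that go undetected
    }


# Illustrative counts only: 10 attack flows and 90 normal flows.
metrics = binary_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy example the accuracy is 0.96 even though one in five attacks is missed, which is why precision, recall, and F1-score are reported alongside accuracy on imbalanced data.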
For the model parameters, we use 2 SAGEConv layers in order to capture more information about the network structure. Among the models with different numbers of layers that we compared, the 2-layer model provides the best balance between accuracy and computational efficiency. The number of hidden units is 128; experiments show that this value gives better performance on all datasets, whereas a smaller number of hidden units cannot fully capture the complex relationships in the data and results in low accuracy. We set the number of epochs to 5 because, after several trials, we found that 5 epochs allow the model to learn enough about the data while avoiding the overfitting caused by too many epochs. The learning rate is tuned by grid search over the range $[10^{-3}, 10^{-2}]$; the experimental results show that with a learning rate of 0.001 the model converges well and avoids excessive fluctuations during training. We chose 2048 as the batch size to balance memory usage and training speed. Dropout regularization of 0.2 is applied after each SAGEConv layer; this setting performed well on the datasets and helped the model avoid overfitting during training. The optimizer is Adam [51], which combines the advantages of Adagrad and RMSProp and adaptively adjusts the learning rate for each parameter, helping to accelerate convergence and improve training. Table 3 shows the detailed settings of the experimental parameters.
Table 3 Hyperparameter settings
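The settings stated above can be collected in a single configuration object, which is convenient when the same values must be shared between the training and evaluation scripts. The class name `TrainConfig` and its field names are hypothetical; only the values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated in the text; field names are illustrative."""
    num_sage_layers: int = 2       # SAGEConv layers
    hidden_units: int = 128
    epochs: int = 5
    learning_rate: float = 1e-3    # chosen by grid search in [1e-3, 1e-2]
    batch_size: int = 2048
    dropout: float = 0.2           # applied after each SAGEConv layer
    optimizer: str = "adam"


config = TrainConfig()
print(config)
```

Marking the dataclass as frozen prevents the values from being changed accidentally during a run.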
Each dataset is divided into two parts, with the training set accounting for 60% and the test set for 40%. We use Python [52], PyTorch [53], PyTorch Geometric [54], and DGL [55] to implement our proposed model and training procedure. Because of the large scale of the data in this experiment, we perform the evaluations on an NVIDIA Tesla V100 32GB GPU.
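A 60/40 split of this kind can be produced with a few lines of Python. The sketch below assumes a simple seeded random shuffle of record indices; whether the original split was random, stratified by class, or ordered in time is not stated, and the function name `train_test_split_indices` is hypothetical.

```python
import random


def train_test_split_indices(n, train_fraction=0.6, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


train_idx, test_idx = train_test_split_indices(10)
print(len(train_idx), len(test_idx))
# prints 6 4
```

Fixing the seed keeps the split reproducible, so that all compared models are trained and tested on the same records.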
4.4 Results of Binary Classification
We evaluate the HGS-ATD model against XGBoost, E-GraphSAGE, and Anomal-E on the four datasets for the binary classification task. According to the experimental results in Table 4, HGS-ATD outperforms the other models on the majority of datasets, especially on important metrics such as ACC and F1-score.
Table 4 Binary classification results
Note: Values in bold indicate the best performance.
In the binary classification experiments on the four datasets, the HGS-ATD model shows significant advantages. For NF-UNSW-NB15-v2, HGS-ATD leads the other models with an accuracy of 0.9948, and its recall is as high as 0.9994, capturing almost all attack traffic; however, its precision is slightly lower than that of the baseline model, indicating that a small amount of normal traffic is misclassified as attack traffic. On NF-ToN-IoT, which has a small number of samples, and NF-BoT-IoT, which is highly imbalanced (97.69% attack traffic), the indicators of HGS-ATD are higher than those of the three baseline models, showing its applicability and robustness in IoT scenarios as well as its reliability under data skew. For the large-scale dataset NF-CSE-CIC-IDS2018-v2, HGS-ATD performs best with an accuracy of 0.9932 and a precision of 0.9938, but the recall (0.9495) decreases slightly due to the limited dataset size. This indicates that the model still maintains high reliability in large-traffic scenarios, although it needs to be combined with dynamic feature extraction to improve its adaptability to unknown attacks.
The confusion matrices in Fig. 4 show the detailed detection results of the model on the four datasets. On NF-UNSW-NB15-v2 and NF-CSE-CIC-IDS2018-v2, the model achieves an accuracy of more than 99% and a very low miss rate, which highlights its high recall and makes it especially suitable for scenarios with zero tolerance to attacks. On NF-ToN-IoT, the model balances precision (99.24%) and recall (99.42%), which verifies its robustness in different network environments. However, on the extremely imbalanced NF-BoT-IoT dataset, although the model maintains high accuracy (99.66%), the missed detection rate rises to 6.25%, which exposes a generalization bottleneck under class imbalance. In the future, data augmentation (such as oversampling) and dynamic loss function optimization should be used to further improve the sensitivity to rare samples, and fine-grained feature engineering should be combined to capture low-frequency attack patterns, so as to strengthen the practicability of the model in complex scenarios.
Fig. 4 Binary confusion matrix
In conclusion, the proposed hybrid model, which combines GCN and GraphSAGE, demonstrates strong overall performance across the four datasets in the binary classification task, outperforming baseline models such as XGBoost, E-GraphSAGE, and Anomal-E on most of the four evaluation criteria. This demonstrates the method’s reliability and practicability as well as its ability to differentiate between attack and normal traffic in anomaly traffic detection.
4.5 Results of Multi-Class Classification
In the multi-class classification task, we also conducted experiments on the above four datasets to compare the classification accuracies of the different methods. As presented in Table 5, the proposed approach achieved the highest accuracy on all datasets.
For the NF-UNSW-NB15-v2 dataset, HGS-ATD obtains an accuracy of 0.9947, which significantly outperforms the three baseline models. As shown in Fig. 5, for attack types such as exploits and fuzzers, the accuracy is close to or at 100%, indicating that the model effectively captures the unique features of these attack types and distinguishes them from other traffic. This further demonstrates the recognition ability of HGS-ATD in dealing with various attack patterns.
Table 5 Results of multi-class classification experiments
Note: Values in bold indicate the best performance.
Fig. 5 Multi-class classification results of NF-UNSW-NB15-v2
HGS-ATD performs exceptionally well on the NF-ToN-IoT dataset, with an accuracy of 0.9919, significantly outperforming the other models: the accuracy was 0.7218 for XGBoost, 0.4597 for E-GraphSAGE, and 0.6345 for Anomal-E. These results demonstrate how well the model can categorize different kinds of attack traffic in this multi-class classification task. As shown in Fig. 6, the model achieves exceptionally high classification accuracy for attack types such as DDoS and injection, with minimal misclassifications. This can be attributed to the model’s ability to recognize complicated traffic patterns in the dataset, as well as their correlations with attack signatures.
Fig. 6 Multi-class classification results of NF-ToN-IoT
HGS-ATD obtains an accuracy of 0.9283 on the NF-BoT-IoT dataset, outperforming XGBoost (0.8362), E-GraphSAGE (0.8144), and Anomal-E (0.8239). Even though the dataset exhibits notable class imbalances that may impair the model’s capacity to classify data, HGS-ATD still maintains high accuracy, which highlights its strength in handling datasets with unbalanced classes. The model effectively learns the patterns associated with different attack types and classifies reconnaissance and DDoS attacks particularly well. As shown in Fig. 7, the model is less accurate on categories with a small sample size (such as theft), which may be because these categories contain too little data for the model to learn their features adequately.
Fig. 7 Multi-class classification results of NF-BoT-IoT
On the NF-CSE-CIC-IDS2018-v2 dataset, the proposed method again shows a high accuracy of 0.9929, slightly higher than XGBoost (0.9898), E-GraphSAGE (0.9833), and Anomal-E (0.9820). As demonstrated in Fig. 8, the model performs particularly well in identifying various attacks, including DDoS attack-HOIC and DoS attack-Hulk, maintaining a high level of accuracy. This highlights the model’s strong generalization capability, enabling it to adapt effectively to diverse attack patterns.
On all four datasets, HGS-ATD achieves higher accuracy than the baseline models in the multi-class classification task. This indicates that the technique not only distinguishes between normal and attack traffic but also accurately identifies various types of attacks. Its robust generalization capabilities allow it to perform well on different datasets, and it displays a strong aptitude for recognizing and classifying complex attack patterns. It provides solid support for anomaly traffic detection in network security, offering a more precise way to identify and address different types of network attacks.
Fig. 8 Multi-class classification results of NF-CSE-CIC-IDS2018-v2
4.6 Discussion of Results
The experimental results demonstrate the practical significance of the HGS-ATD framework. In practical network security scenarios, the model achieves high accuracy and F1-score and can accurately distinguish normal traffic from attack traffic in large-scale network traffic monitoring. In enterprise networks, for example, this helps to detect potential attacks such as data theft and DDoS in a timely manner, so that protective measures can be taken in time to protect the sensitive information and network systems of enterprises. In smart home systems, where numerous devices such as cameras, sensors, and smart appliances are interconnected, the model can detect abnormal traffic patterns.
The edge-based dynamic batching strategy reduces the memory overhead and can be deployed in real-time monitoring systems for large-scale networks, such as cloud infrastructure or IoT ecosystems. In multi-class classification tasks, the proposed model’s precise identification of diverse attack types offers critical support for network security management. Network security administrators can quickly identify attack types based on the output of the model and formulate targeted defense strategies, which greatly improves the efficiency and effectiveness of network security management. For example, when the model detects a DDoS attack, the administrator can adjust the network traffic policy in time to block malicious traffic and protect network services.
However, the framework exhibits certain limitations, particularly in handling imbalanced datasets. In the NF-BoT-IoT dataset, the sample size for some attack types, such as theft, is small, and the ability of the model to identify these rare attack types is insufficient. Although the overall accuracy of the model is high, misclassification or missed detection of rare attack types may bring potential security risks to the network. In real-world networks, these undetected attacks can lead to data leakage or system failure.
In addition, the model currently focuses on static traffic feature analysis. In the real network environment, network traffic is highly dynamic, its traffic and patterns are constantly changing, and new attack techniques are constantly emerging. Attacks typically involve rapid changes in traffic behavior, such as sudden bursts of traffic or abnormal fluctuations in data transmission rates. However, this model does not fully consider the time series characteristics and dynamic change rules of network traffic. This can lead to missed detections or false positives when dealing with attacks that exhibit dynamic behavior. For example, for stealth attacks that exploit sudden traffic changes to evade detection, the model may not be able to accurately identify them in time.
To address these limitations, several directions for improvement and future research can be proposed. To improve the recognition ability of the model for rare attack types, data augmentation techniques can be applied. Oversampling methods can increase the number of samples for rare attack types, and Generative Adversarial Networks (GAN) can generate synthetic samples that are similar to real-world data, enabling the model to learn more comprehensive features. In addition, adversarial training can be introduced: by adding adversarial samples to the training process, the model can learn to resist adversarial attacks and improve its robustness.
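As a simple illustration of the oversampling idea, the sketch below duplicates randomly chosen samples of each minority class until every class is as large as the largest one. This is plain random oversampling, the simplest variant, and not a method evaluated in this paper; the function name `random_oversample` and the toy labels are hypothetical. In practice, oversampling would be applied to the training split only, so that the test set keeps the original class distribution.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies with replacement to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


samples = list(range(6))
labels = ["benign"] * 5 + ["theft"]
_, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Duplicated samples add no new information, which is why generative approaches such as the GAN-based augmentation mentioned above are a natural next step.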
Regarding the adaptation of the model to dynamic traffic characteristics, future research could explore the combination with time series analysis methods. LSTM networks and their variants can be integrated into this model to capture the time-series characteristics of network traffic. In this way, the model can better adapt to the dynamic changes of the network environment and effectively detect attacks that involve changes in traffic patterns.
5 Conclusions and Future Work
This paper proposes a traffic anomaly detection method based on a hybrid model of GCN and GraphSAGE. By combining the global topology awareness of GCN and the local dynamic aggregation of GraphSAGE with a novel edge-based batch processing strategy, it effectively captures node-edge relationships and complex traffic patterns. The method enables scalable training on large-scale graphs. Compared with baseline models such as XGBoost, E-GraphSAGE, and Anomal-E on multiple datasets, it shows advantages in key indicators such as accuracy and recall. In particular, it shows strong advantages in identifying multiple attack types, accurately classifying attack traffic, and handling unbalanced datasets, which proves its effectiveness in the field of network security.
Despite the satisfactory results, the present study has some limitations that suggest several directions for future research. First, the current method mainly focuses on the static feature analysis of traffic. To enhance the ability of the model to identify attacks in real time, future research should incorporate time series data and dynamic traffic characteristics, for example by introducing the GATv2 dynamic graph attention mechanism. In addition, although HGS-ATD performs well on imbalanced datasets, the model still has a certain miss rate for some attack types (such as categories with a small number of samples). In the future, data augmentation and adversarial training can be used to further improve the recognition ability of the model for rare attack types. Finally, as attack methods evolve, the characteristics of attack traffic also change. Transfer learning techniques can be explored to further improve the adaptability of the HGS-ATD model to different network environments. For example, techniques such as adversarial domain adaptation [56] can be used to align the feature distributions of the source domain and the target domain and reduce the annotation cost in new environments. The knowledge captured by the pre-trained model can then be adapted quickly to new network environments and traffic patterns, improving generalization ability and robustness.
In summary, the approach presented in this paper demonstrates significant potential for network anomaly traffic detection. In the future, its performance and applicability can be further improved by optimizing the model structure, enhancing the datasets, and combining it with emerging technologies, so as to provide more effective technical support for network security protection.