Thread modules are parts of a functionality. One module is, for example, for decoding packets, another is the detection module and another the output module. A packet can be processed by more than one thread; it is then passed on to the next thread through a queue. A packet is processed by only one thread at a time, but the engine can have multiple packets in flight at once (see max-pending-packets). A thread can have one or more thread modules. If it has more than one module, only one can be active at a time. The way threads, modules and queues are arranged together is called the runmode.
You can choose a runmode out of several predefined runmodes. The command line option --list-runmodes shows all available runmodes. All runmodes have a name: single, autofp and workers. The heaviest task is the detection; a packet will be checked against thousands of signatures.
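As a sketch of how this looks on the command line (the interface name eth0 is a placeholder; adjust for your system):

```shell
# List all available runmodes with a short description of each:
suricata --list-runmodes

# Select a runmode explicitly; 'eth0' is a placeholder interface name:
suricata -i eth0 --runmode workers
```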
An example of the default runmode:
In the pfring runmode, every flow follows its own fixed route through the engine.
For the best performance, Suricata should run in workers mode. This effectively means that there are multiple threads, each running a full packet pipeline and each receiving packets from the capture method. So we rely on the capture method to distribute the packets over the various threads. One critical aspect of this is that Suricata needs to get both sides of a flow in the same thread, in the right order.
Both the AF_PACKET and PF_RING capture methods have options to select the "cluster-type". These default to "cluster_flow", which instructs the capture method to hash by flow (5-tuple). This hash is symmetric.
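Why symmetry matters can be shown with a small sketch. This is illustrative code only, not Suricata's actual hash: sorting the two endpoints before hashing makes both directions of a flow map to the same thread.

```python
# Illustrative sketch only -- not Suricata's real implementation. It shows
# why a symmetric flow hash sends both directions of a flow to one thread.

def symmetric_flow_hash(src_ip, src_port, dst_ip, dst_port, proto, threads=4):
    # Sort the two endpoints so (a -> b) and (b -> a) hash identically.
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return hash((lo, hi, proto)) % threads

# Both directions of one TCP flow (proto 6) land on the same thread:
to_server = symmetric_flow_hash("10.0.0.1", 1234, "10.0.0.2", 80, 6)
to_client = symmetric_flow_hash("10.0.0.2", 80, "10.0.0.1", 1234, 6)
assert to_server == to_client
```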
Netmap does not have a cluster_flow mode built in. It can be added separately by using the "lb" tool: /luigirizzo/netmap/tree/master/apps/lb.
On multi-queue NICs, which includes almost any modern NIC, the RSS settings need to be considered.
Receive Side Scaling (RSS) is a technique network cards use to distribute incoming traffic over various queues on the NIC. This is meant to improve performance, but it is important to realize that it was designed for normal traffic, not for the IDS packet-capture scenario. RSS uses a hash algorithm to distribute the incoming traffic over the queues. This hash is normally not symmetric, so when both sides of a flow are received, each side may end up in a different queue. Sadly, when deploying Suricata this is a common scenario when using span ports or taps.
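The asymmetry can be sketched with a toy order-sensitive hash (FNV-1a here; real NICs typically use a Toeplitz hash, and all names below are illustrative):

```python
# Toy demonstration of an order-sensitive (asymmetric) hash, standing in
# for the Toeplitz hash a NIC's RSS typically uses with a non-symmetric key.

def fnv1a_64(data: bytes) -> int:
    # Standard 64-bit FNV-1a over a byte string.
    h = 0xCBF29CE484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def rss_queue(src, sport, dst, dport, queues=4):
    # The hash input depends on direction, so the two sides of one flow
    # generally hash differently and may land in different queues.
    return fnv1a_64(f"{src}:{sport}->{dst}:{dport}".encode()) % queues

client_to_server = fnv1a_64(b"10.0.0.1:1234->10.0.0.2:80")
server_to_client = fnv1a_64(b"10.0.0.2:80->10.0.0.1:1234")
assert client_to_server != server_to_client  # direction changes the hash
```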
The problem here is that with both sides of the traffic in different queues, the order in which packets are processed becomes unpredictable. Timing differences on the NIC, in the driver, in the kernel and in Suricata lead to a high chance of packets arriving in a different order than they did on the wire. This is specifically about a mismatch between the two traffic directions. For example, Suricata tracks the TCP three-way handshake. Due to this timing issue, the SYN/ACK may only be received by Suricata after the client has already started sending data to the server. Suricata would consider such traffic invalid.
None of the capture methods (AF_PACKET, PF_RING or NETMAP) can fix this problem for us. It would require buffering and packet reordering, which is expensive.
To see how many queues are configured:
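For example, ethtool can report the queue (channel) configuration; the device name ens2f1 is a placeholder:

```shell
# Show current and maximum channel (queue) counts for the NIC:
ethtool -l ens2f1
```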
Some NICs allow you to set the hash to a symmetric mode. The Intel X(L)710 cards can do this in theory, but the drivers are not yet able to (work is underway to address this). Another way is to set a special "random secret key" that makes the RSS hashing symmetric. See http://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf.
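A sketch of setting such a key with ethtool, assuming a driver that supports it. The device name is a placeholder, and the required key length is NIC-dependent (40 bytes shown here; some NICs, e.g. the X710, expect a longer key):

```shell
# Set a symmetric RSS hash key (the byte pair 6D:5A repeated, per the
# symmetric-RSS paper referenced above); illustrative, driver-dependent:
ethtool -X ens2f1 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A
```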
In most cases, however, the best solution is to reduce the number of RSS queues to 1:
Example:
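A sketch for an Intel X710 with the i40e driver (the device name is a placeholder):

```shell
# Reduce the NIC to a single combined RX/TX queue:
ethtool -L ens2f1 combined 1
```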
Some drivers do not support setting the number of queues through ethtool. In some cases, there is a module loading time option. For more information, please read the driver documentation.
The NIC, the driver and the kernel itself have various techniques to speed up packet handling. Generally, these will all have to be disabled.
LRO/GRO merge various smaller packets into big "super packets". These features need to be disabled because they break the dsize keyword as well as TCP state tracking.
Checksum offloading can be left enabled on AF_PACKET and PF_RING, but it needs to be disabled on PCAP, NETMAP and others.
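A sketch of disabling these offloads with ethtool; the device name is a placeholder and exact feature names vary by driver:

```shell
# Disable packet-merging offloads that break dsize and TCP state tracking:
ethtool -K ens2f1 gro off lro off tso off gso off
# Where the capture method requires it (e.g. pcap, netmap), also disable
# checksum offloading:
ethtool -K ens2f1 rx off tx off
```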
Read your driver documentation! For example, for i40e the ethtool change of RSS queues may lead to a kernel crash if done wrong.
- Generic: set the number of RSS queues to 1 or make sure the RSS hashing is symmetric. Disable NIC offloading.
- AF_PACKET: 1 RSS queue and stay on kernel <=4.2, or make sure you have >=4.4.16, >=4.6.5 or >=4.7. Exception: if RSS is symmetric, cluster-type "cluster_qm" can be used to bind Suricata to the RSS queues. Disable NIC offloading except the rx/tx csum.
- PF_RING: 1 RSS queue and use cluster-type "cluster_flow". Disable NIC offloading except the rx/tx csum.
- NETMAP: 1 RSS queue. There is no flow-based load balancing built in, but the "lb" tool can be helpful. Another option is to use the "autofp" runmode. Exception: if RSS is symmetric and load balancing is based on the RSS hash, multiple RSS queues can be used. Disable all NIC offloading.