add document for TCP OFO, PAWS and skip ACK counters
add document and examples for below counters:
TcpExtTCPOFOQueue
TcpExtTCPOFODrop
TcpExtTCPOFOMerge
TcpExtPAWSActive
TcpExtPAWSEstab
TcpExtTCPACKSkippedSynRecv
TcpExtTCPACKSkippedPAWS
TcpExtTCPACKSkippedSeq
TcpExtTCPACKSkippedFinWait2
TcpExtTCPACKSkippedTimeWait
TcpExtTCPACKSkippedChallenge

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent
3a0ed3e961
commit
2b96547223
|
@ -571,7 +571,97 @@ duplicate packet is received.
|
|||
|
||||
* TcpExtTCPDSACKOfoRecv
|
||||
The TCP stack receives a DSACK, which indicates an out of order
|
||||
duplciate packet is received.
|
||||
duplicate packet is received.
|
||||
|
||||
TCP out of order
|
||||
================
|
||||
* TcpExtTCPOFOQueue
|
||||
The TCP layer receives an out of order packet and has enough memory
|
||||
to queue it.
|
||||
|
||||
* TcpExtTCPOFODrop
|
||||
The TCP layer receives an out of order packet but doesn't have enough
memory to queue it, so it drops the packet. Such packets are not
counted in TcpExtTCPOFOQueue.
|
||||
|
||||
* TcpExtTCPOFOMerge
|
||||
The received out of order packet overlaps with the previous
packet. The overlapping part is dropped. All packets counted in
TcpExtTCPOFOMerge are also counted in TcpExtTCPOFOQueue.
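
The relationship between the three counters can be sketched with a toy
model. This is only an illustration, not the kernel's out of order
queue: the class name, the ``mem_limit`` byte budget and the range
bookkeeping are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OfoReceiver:
    """Toy receiver that tracks out of order byte ranges and three counters."""
    rcv_nxt: int      # next in-order sequence number expected
    mem_limit: int    # hypothetical byte budget for the out of order queue
    queued: list = field(default_factory=list)  # sorted, disjoint (start, end)
    counters: dict = field(default_factory=lambda: {
        "TcpExtTCPOFOQueue": 0,
        "TcpExtTCPOFODrop": 0,
        "TcpExtTCPOFOMerge": 0,
    })

    def receive(self, seq: int, length: int) -> None:
        end = seq + length
        if seq <= self.rcv_nxt:
            return  # in-order or old data: outside the scope of this toy
        used = sum(e - s for s, e in self.queued)
        if used + length > self.mem_limit:
            # Not enough memory: dropped, and NOT counted as queued.
            self.counters["TcpExtTCPOFODrop"] += 1
            return
        self.counters["TcpExtTCPOFOQueue"] += 1
        if any(seq < e and s < end for s, e in self.queued):
            # Overlaps a queued range: counted as a merge AND as queued.
            self.counters["TcpExtTCPOFOMerge"] += 1
        # Store the union of ranges so overlapping bytes are kept only once.
        merged = []
        for s, e in sorted(self.queued + [(seq, end)]):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.queued = merged


r = OfoReceiver(rcv_nxt=1000, mem_limit=300)
r.receive(1100, 100)   # queued
r.receive(1150, 100)   # queued and merged (overlaps 1100-1200)
r.receive(2000, 200)   # dropped (would exceed the byte budget)
print(r.counters)
```

In this model a merged packet increases both TcpExtTCPOFOMerge and
TcpExtTCPOFOQueue, while a dropped packet only increases
TcpExtTCPOFODrop.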
|
||||
|
||||
TCP PAWS
|
||||
========
|
||||
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
that drops old packets. It depends on TCP timestamps. For detailed
information, please refer to the `timestamp wiki`_ and the
`RFC of PAWS`_.
|
||||
|
||||
.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
|
||||
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps
|
||||
|
||||
* TcpExtPAWSActive
|
||||
Packets are dropped by PAWS in Syn-Sent status.
|
||||
|
||||
* TcpExtPAWSEstab
|
||||
Packets are dropped by PAWS in any status other than Syn-Sent.
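
The core idea of the PAWS check can be sketched as a comparison of
32-bit timestamps that tolerates wraparound. This is a simplified
sketch: the function names are hypothetical, and the real kernel check
has additional conditions (for example for long-idle connections) that
are left out here.

```python
def paws_reject(ts_recent: int, tsval: int) -> bool:
    """Return True if tsval is older than ts_recent in 32-bit serial arithmetic."""
    diff = (ts_recent - tsval) & 0xFFFFFFFF
    if diff >= 0x80000000:      # reinterpret as a signed 32-bit value
        diff -= 0x100000000
    return diff > 0


def paws_counter(status: str) -> str:
    """Pick the counter described above for a packet dropped by PAWS."""
    return "TcpExtPAWSActive" if status == "Syn-Sent" else "TcpExtPAWSEstab"


print(paws_reject(ts_recent=1000, tsval=999))      # older timestamp
print(paws_reject(ts_recent=0xFFFFFFF0, tsval=5))  # newer, after wraparound
print(paws_counter("Established"))
```

The signed comparison is what lets a small timestamp that follows a
wraparound still be treated as newer than a large one.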
|
||||
|
||||
TCP ACK skip
|
||||
============
|
||||
In some scenarios, the kernel avoids sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When the kernel decides to skip an
ACK due to tcp_invalid_ratelimit, it updates one of the counters below
to indicate the scenario in which the ACK was skipped. The ACK is only
skipped if the received packet is either a SYN packet or has no data.
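
The rate limiting described above can be sketched as follows. This is a
minimal model, not the kernel code: the class and method names are
hypothetical, time is passed in explicitly as milliseconds, and the
500 ms default is taken from our reading of the tcp_invalid_ratelimit
entry in the sysctl document.

```python
class InvalidAckRateLimiter:
    """Toy model of skipping duplicate ACKs that would be sent too often."""

    def __init__(self, ratelimit_ms: int = 500):
        self.ratelimit_ms = ratelimit_ms
        self.last_ack_ms = None   # time of the last duplicate ACK we sent
        self.skipped = 0          # stands in for one TcpExtTCPACKSkipped* counter

    def should_send(self, now_ms: int, is_syn: bool, payload_len: int) -> bool:
        # Packets that carry data and are not SYNs are never rate limited.
        if payload_len > 0 and not is_syn:
            return True
        if (self.last_ack_ms is not None
                and now_ms - self.last_ack_ms < self.ratelimit_ms):
            self.skipped += 1     # too soon after the previous ACK: skip it
            return False
        self.last_ack_ms = now_ms
        return True


limiter = InvalidAckRateLimiter()
print(limiter.should_send(now_ms=0, is_syn=True, payload_len=0))
print(limiter.should_send(now_ms=100, is_syn=True, payload_len=0))
print(limiter.skipped)
```

In this model the second call is refused because it arrives within the
rate limit window opened by the first one.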
|
||||
|
||||
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
|
||||
|
||||
* TcpExtTCPACKSkippedSynRecv
|
||||
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack has received a SYN and replied with a SYN+ACK, and is now
waiting for an ACK. Generally, the TCP stack doesn't need to send an
ACK in the Syn-Recv status, but in several scenarios it does: for
example, when it receives the same SYN packet repeatedly, when the
received packet does not pass the PAWS check, or when the sequence
number of the received packet is out of window. If ACKs would be sent
more frequently than tcp_invalid_ratelimit allows, the TCP stack skips
sending the ACK and increases TcpExtTCPACKSkippedSynRecv.
|
||||
|
||||
|
||||
* TcpExtTCPACKSkippedPAWS
|
||||
The ACK is skipped because the PAWS (Protection Against Wrapped
Sequence numbers) check fails. If the PAWS check fails in the Syn-Recv,
Fin-Wait-2 or Time-Wait status, the skipped ACK is counted in
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait respectively. In all other statuses, the
skipped ACK is counted in TcpExtTCPACKSkippedPAWS.
|
||||
|
||||
* TcpExtTCPACKSkippedSeq
|
||||
The ACK is skipped because the sequence number is out of window, the
timestamp passes the PAWS check, and the TCP status is not Syn-Recv,
Fin-Wait-2 or Time-Wait.
|
||||
|
||||
* TcpExtTCPACKSkippedFinWait2
|
||||
The ACK is skipped in Fin-Wait-2 status. The reason is either that the
PAWS check fails or that the received sequence number is out of window.
|
||||
|
||||
* TcpExtTCPACKSkippedTimeWait
|
||||
The ACK is skipped in Time-Wait status. The reason is either that the
PAWS check fails or that the received sequence number is out of window.
|
||||
|
||||
* TcpExtTCPACKSkippedChallenge
|
||||
The ACK is skipped if it is a challenge ACK. RFC 5961 defines three
kinds of challenge ACK; please refer to `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, in some TCP statuses the Linux TCP stack also sends
challenge ACKs if the ACK number is before the first unacknowledged
number (stricter than `RFC 5961 section 5.2`_).
|
||||
|
||||
.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
|
||||
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
|
||||
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
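
The way the status and the failure reason select a counter, as
described in the list above, can be summarized in a small lookup. This
is a sketch of the documented rules only; the function name and the
string labels for the statuses are assumptions, and challenge ACKs
(TcpExtTCPACKSkippedChallenge) are deliberately left out.

```python
def skipped_ack_counter(status: str, paws_failed: bool) -> str:
    """Return the counter that accounts for a skipped duplicate ACK.

    paws_failed=True means the PAWS check failed; False means the PAWS
    check passed but the sequence number was out of window.
    """
    by_status = {
        "Syn-Recv": "TcpExtTCPACKSkippedSynRecv",
        "Fin-Wait-2": "TcpExtTCPACKSkippedFinWait2",
        "Time-Wait": "TcpExtTCPACKSkippedTimeWait",
    }
    # In these three statuses the status alone selects the counter.
    if status in by_status:
        return by_status[status]
    # In all other statuses the failure reason selects the counter.
    return "TcpExtTCPACKSkippedPAWS" if paws_failed else "TcpExtTCPACKSkippedSeq"


print(skipped_ack_counter("Syn-Recv", paws_failed=True))
print(skipped_ack_counter("Established", paws_failed=True))
print(skipped_ack_counter("Established", paws_failed=False))
```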
|
||||
|
||||
|
||||
examples
|
||||
========
|
||||
|
@ -1188,3 +1278,151 @@ Run nstat on server B::
|
|||
We have deleted the default route on server B. Server B couldn't find
|
||||
a route for the 8.8.8.8 IP address, so server B increased
|
||||
IpOutNoRoutes.
|
||||
|
||||
TcpExtTCPACKSkippedSynRecv
|
||||
--------------------------
|
||||
In this test, we send three identical SYN packets from the client to
the server. The first SYN makes the server create a socket, set it to
Syn-Recv status, and reply with a SYN/ACK. The second SYN makes the
server reply with the SYN/ACK again and record the reply time (the
duplicate ACK reply time). The third SYN makes the server check the
previous duplicate ACK reply time and decide to skip the duplicate ACK,
then increase the TcpExtTCPACKSkippedSynRecv counter.
|
||||
|
||||
Run tcpdump to capture a SYN packet::
|
||||
|
||||
nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
|
||||
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
|
||||
|
||||
Open another terminal and run the nc command::
|
||||
|
||||
nstatuser@nstat-a:~$ nc nstat-b 9000
|
||||
|
||||
As nstat-b wasn't listening on port 9000, it replied with a RST, and
the nc command exited immediately. That was enough for the tcpdump
command to capture a SYN packet. A Linux server might use hardware
offload for the TCP checksum, so the checksum in /tmp/syn.pcap
might not be correct. We call tcprewrite to fix it::
|
||||
|
||||
nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum
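
For readers who want to see what a checksum fix has to recompute, here
is a sketch of the 16-bit ones' complement Internet checksum described
in RFC 1071, which the TCP checksum is based on. It is not
tcprewrite's implementation, the function name is hypothetical, and a
real TCP checksum is computed over a pseudo-header plus the TCP
segment, which this example does not build.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example bytes taken from the numerical example in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))
```

A packet captured before the network card fills in the checksum would
not satisfy this computation, which is why the capture is rewritten
before it is replayed.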
|
||||
|
||||
On nstat-b, we run nc to listen on port 9000::
|
||||
|
||||
nstatuser@nstat-b:~$ nc -lkv 9000
|
||||
Listening on [0.0.0.0] (family 0, port 9000)
|
||||
|
||||
On nstat-a, we block packets coming from port 9000; otherwise nstat-a
would send a RST to nstat-b::
|
||||
|
||||
nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP
|
||||
|
||||
Send the SYN to nstat-b three times::
|
||||
|
||||
nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done
|
||||
|
||||
Check the snmp counter on nstat-b::
|
||||
|
||||
nstatuser@nstat-b:~$ nstat | grep -i skip
|
||||
TcpExtTCPACKSkippedSynRecv 1 0.0
|
||||
|
||||
As we expected, TcpExtTCPACKSkippedSynRecv is 1.
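
When such checks are scripted, the nstat output can be turned into a
dictionary. The sketch below assumes the three-column layout shown
above (counter name, value, rate); the function name is hypothetical
and the sample text is copied from the output in this example.

```python
def parse_nstat(text: str) -> dict:
    """Map counter names to integer values from nstat-style output."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip header or comment lines and anything without a value column.
        if len(fields) < 2 or line.startswith("#"):
            continue
        try:
            counters[fields[0]] = int(fields[1])
        except ValueError:
            continue                     # value column is not an integer
    return counters


sample_output = "TcpExtTCPACKSkippedSynRecv 1 0.0\n"
print(parse_nstat(sample_output))
```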
|
||||
|
||||
TcpExtTCPACKSkippedPAWS
|
||||
-----------------------
|
||||
To trigger PAWS, we could send an old SYN.
|
||||
|
||||
On nstat-b, let nc listen on port 9000::
|
||||
|
||||
nstatuser@nstat-b:~$ nc -lkv 9000
|
||||
Listening on [0.0.0.0] (family 0, port 9000)
|
||||
|
||||
On nstat-a, run tcpdump to capture a SYN::
|
||||
|
||||
nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
|
||||
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
|
||||
|
||||
On nstat-a, run nc as a client to connect to nstat-b::
|
||||
|
||||
nstatuser@nstat-a:~$ nc -v nstat-b 9000
|
||||
Connection to nstat-b 9000 port [tcp/*] succeeded!
|
||||
|
||||
Now tcpdump has captured the SYN and exited. We should fix the
checksum::
|
||||
|
||||
nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum
|
||||
|
||||
Send the SYN packet twice::
|
||||
|
||||
nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done
|
||||
|
||||
On nstat-b, check the snmp counter::
|
||||
|
||||
nstatuser@nstat-b:~$ nstat | grep -i skip
|
||||
TcpExtTCPACKSkippedPAWS 1 0.0
|
||||
|
||||
We sent two SYNs via tcpreplay, and both of them failed the PAWS
check. nstat-b replied with an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
|
||||
|
||||
TcpExtTCPACKSkippedSeq
|
||||
----------------------
|
||||
To trigger TcpExtTCPACKSkippedSeq, we send packets which have a valid
timestamp (to pass the PAWS check) but whose sequence number is out of
window. The Linux TCP stack does not skip the ACK if the packet has
data, so we need a pure ACK packet. To generate such a packet, we
create two sockets: one on port 9000, another on port 9001. Then we
capture an ACK on port 9001 and change the source/destination port
numbers to match the port 9000 socket. This packet can then trigger
TcpExtTCPACKSkippedSeq.
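
What "out of window" means can be sketched with 32-bit serial number
comparisons. This is a simplified model under our own assumptions: the
helper names are hypothetical, and the window is taken to start at the
next expected sequence number, which is not exactly how the kernel
tracks its receive window.

```python
MASK = 0xFFFFFFFF


def seq_before(a: int, b: int) -> bool:
    """True if sequence number a comes before b, allowing for wraparound."""
    return ((a - b) & MASK) >= 0x80000000


def seq_after(a: int, b: int) -> bool:
    return seq_before(b, a)


def in_window(seq: int, end_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """A segment is acceptable if it does not end before the window
    and does not start after it."""
    window_end = (rcv_nxt + rcv_wnd) & MASK
    return not seq_before(end_seq, rcv_nxt) and not seq_after(seq, window_end)


# A pure ACK has no data, so seq == end_seq.
print(in_window(seq=1000, end_seq=1000, rcv_nxt=1000, rcv_wnd=500))
print(in_window(seq=5000, end_seq=5000, rcv_nxt=1000, rcv_wnd=500))
```

An ACK captured from the port 9001 connection carries sequence numbers
chosen for that connection, so after the ports are rewritten it is very
likely to fall outside the window of the port 9000 socket.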
|
||||
|
||||
On nstat-b, open two terminals, run two nc commands to listen on both
|
||||
port 9000 and port 9001::
|
||||
|
||||
nstatuser@nstat-b:~$ nc -lkv 9000
|
||||
Listening on [0.0.0.0] (family 0, port 9000)
|
||||
|
||||
nstatuser@nstat-b:~$ nc -lkv 9001
|
||||
Listening on [0.0.0.0] (family 0, port 9001)
|
||||
|
||||
On nstat-a, run two nc clients::
|
||||
|
||||
nstatuser@nstat-a:~$ nc -v nstat-b 9000
|
||||
Connection to nstat-b 9000 port [tcp/*] succeeded!
|
||||
|
||||
nstatuser@nstat-a:~$ nc -v nstat-b 9001
|
||||
Connection to nstat-b 9001 port [tcp/*] succeeded!
|
||||
|
||||
On nstat-a, run tcpdump to capture an ACK::
|
||||
|
||||
nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
|
||||
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
|
||||
|
||||
On nstat-b, send a packet via the port 9001 socket. E.g. we sent a
|
||||
string 'foo' in our example::
|
||||
|
||||
nstatuser@nstat-b:~$ nc -lkv 9001
|
||||
Listening on [0.0.0.0] (family 0, port 9001)
|
||||
Connection from nstat-a 42132 received!
|
||||
foo
|
||||
|
||||
On nstat-a, tcpdump should have captured the ACK. We should check
the source port numbers of the two nc clients::
|
||||
|
||||
nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
|
||||
State Recv-Q Send-Q Local Address:Port Peer Address:Port
|
||||
ESTAB 0 0 192.168.122.250:50208 192.168.122.251:9000
|
||||
ESTAB 0 0 192.168.122.250:42132 192.168.122.251:9001
|
||||
|
||||
Run tcprewrite to change port 9001 to port 9000 and port 42132 to
port 50208::
|
||||
|
||||
nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum
|
||||
|
||||
Now /tmp/seq.pcap contains the packet we need. Send it to nstat-b::
|
||||
|
||||
nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done
|
||||
|
||||
Check TcpExtTCPACKSkippedSeq on nstat-b::
|
||||
|
||||
nstatuser@nstat-b:~$ nstat | grep -i skip
|
||||
TcpExtTCPACKSkippedSeq 1 0.0
|
||||
|