Go to file
Neal Cardwell 04dce9a120 tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe
[ Upstream commit b41b4cbd9655bcebcce941bef3601db8110335be ]

Fix tcp_enter_recovery() so that if there are no retransmits out then
we zero retrans_stamp when entering fast recovery. This is necessary
to fix two buggy behaviors.

Currently a non-zero retrans_stamp value can persist across multiple
back-to-back loss recovery episodes. This is because we generally only
clears retrans_stamp if we are completely done with loss recoveries,
and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This
behavior causes two bugs:

(1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed
immediately by a new CA_Recovery, the retrans_stamp value can persist
and can be a time before this new CA_Recovery episode starts. That
means that timestamp-based undo will be using the wrong retrans_stamp
(a value that is too old) when comparing incoming TS ecr values to
retrans_stamp to see if the current fast recovery episode can be
undone.

(2) If there is a roughly minutes-long sequence of back-to-back fast
recovery episodes, one after another (e.g. in a shallow-buffered or
policed bottleneck), where each fast recovery successfully makes
forward progress and recovers one window of sequence space (but leaves
at least one retransmit in flight at the end of the recovery),
followed by several RTOs, then the ETIMEDOUT check may be using the
wrong retrans_stamp (a value set at the start of the first fast
recovery in the sequence). This can cause a very premature ETIMEDOUT,
killing the connection prematurely.

This commit changes the code to zero retrans_stamp when entering fast
recovery, when this is known to be safe (no retransmits are out in the
network). That ensures that when starting a fast recovery episode, and
it is safe to do so, retrans_stamp is set when we send the fast
retransmit packet. That addresses both bug (1) and bug (2) by ensuring
that (if no retransmits are out when we start a fast recovery) we use
the initial fast retransmit of this fast recovery as the time value
for undo and ETIMEDOUT calculations.

This makes intuitive sense, since the start of a new fast recovery
episode (in a scenario where no lost packets are out in the network)
means that the connection has made forward progress since the last RTO
or fast recovery, and we should thus "restart the clock" used for both
undo and ETIMEDOUT logic.

Note that if when we start fast recovery there *are* retransmits out
in the network, there can still be undesirable (1)/(2) issues. For
example, after this patch we can still have the (1) and (2) problems
in cases like this:

+ round 1: sender sends flight 1

+ round 2: sender receives SACKs and enters fast recovery 1,
  retransmits some packets in flight 1 and then sends some new data as
  flight 2

+ round 3: sender receives some SACKs for flight 2, notes losses, and
  retransmits some packets to fill the holes in flight 2

+ fast recovery has some lost retransmits in flight 1 and continues
  for one or more rounds sending retransmits for flight 1 and flight 2

+ fast recovery 1 completes when snd_una reaches high_seq at end of
  flight 1

+ there are still holes in the SACK scoreboard in flight 2, so we
  enter fast recovery 2, but some retransmits in the flight 2 sequence
  range are still in flight (retrans_out > 0), so we can't execute the
  new retrans_stamp=0 added here to clear retrans_stamp

It's not yet clear how to fix these remaining (1)/(2) issues in an
efficient way without breaking undo behavior, given that retrans_stamp
is currently used for undo and ETIMEDOUT. Perhaps the optimal (but
expensive) strategy would be to set retrans_stamp to the timestamp of
the earliest outstanding retransmit when entering fast recovery. But
at least this commit makes things better.

Note that this does not change the semantics of retrans_stamp; it
simply makes retrans_stamp accurate in some cases where it was not
before:

(1) Some loss recovery, followed by an immediate entry into a fast
recovery, where there are no retransmits out when entering the fast
recovery.

(2) When a TFO server has a SYNACK retransmit that sets retrans_stamp,
and then the ACK that completes the 3-way handshake has SACK blocks
that trigger a fast recovery. In this case when entering fast recovery
we want to zero out the retrans_stamp from the TFO SYNACK retransmit,
and set the retrans_stamp based on the timestamp of the fast recovery.

We introduce a tcp_retrans_stamp_cleanup() helper, because this
two-line sequence already appears in 3 places and is about to appear
in 2 more as a result of this bug fix patch series. Once this bug fix
patches series in the net branch makes it into the net-next branch
we'll update the 3 other call sites to use the new helper.

This is a long-standing issue. The Fixes tag below is chosen to be the
oldest commit at which the patch will apply cleanly, which is from
Linux v3.5 in 2012.

Fixes: 1fbc340514 ("tcp: early retransmit: tcp_enter_recovery()")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241001200517.2756803-3-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17 15:24:23 +02:00
Documentation selftests: Introduce Makefile variable to list shared bash scripts 2024-10-17 15:24:13 +02:00
LICENSES
arch x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h 2024-10-17 15:24:22 +02:00
block blk_iocost: fix more out of bound shifts 2024-10-10 11:57:24 +02:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto crypto: simd - Do not call crypto_alloc_tfm during registration 2024-10-10 11:57:26 +02:00
drivers net: phy: dp83869: fix memory corruption when enabling fiber 2024-10-17 15:24:23 +02:00
fs NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies() 2024-10-17 15:24:23 +02:00
include NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies() 2024-10-17 15:24:23 +02:00
init rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT 2024-08-29 17:33:29 +02:00
io_uring io_uring: check if we need to reschedule during overflow flush 2024-10-17 15:24:18 +02:00
ipc sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) 2024-08-11 12:47:13 +02:00
kernel bpf: Prevent tail call between progs attached to different hooks 2024-10-17 15:24:16 +02:00
lib lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat 2024-10-17 15:24:13 +02:00
mm mm: z3fold: deprecate CONFIG_Z3FOLD 2024-10-10 11:58:02 +02:00
net tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe 2024-10-17 15:24:23 +02:00
rust rust: sync: require `T: Sync` for `LockedBy::access` 2024-10-10 11:57:44 +02:00
samples samples/bpf: Fix compilation errors with cf-protection option 2024-10-04 16:29:19 +02:00
scripts kconfig: qconf: fix buffer overflow in debug links 2024-10-10 11:58:01 +02:00
security tomoyo: fallback to realpath if symlink's pathname does not exist 2024-10-10 11:57:57 +02:00
sound PCI: Add function 0 DMA alias quirk for Glenfly Arise chip 2024-10-17 15:24:16 +02:00
tools tools/iio: Add memory allocation failure check for trigger_name 2024-10-17 15:24:20 +02:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock 2024-10-04 16:29:47 +02:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Remove *.orig pattern from .gitignore 2024-10-04 16:29:44 +02:00
.mailmap 20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues 2023-10-24 09:52:16 -10:00
.rustfmt.toml
COPYING
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild
Kconfig
MAINTAINERS membarrier: riscv: Add full memory barrier in switch_mm() 2024-09-12 11:11:45 +02:00
Makefile Linux 6.6.56 2024-10-10 12:50:06 +02:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.