Go to file
David S. Miller 2a8d8e0fec Merge branch 'blackhole-device-to-invalidate-dst'
Mahesh Bandewar says:

====================
blackhole device to invalidate dst

When we invalidate dst or mark it "dead", we assign 'lo' to
dst->dev. First of all this assignment is racy and more over,
it has MTU implications.

The standard dev MTU is 1500 while the Loopback MTU is 64k. TCP
code when dereferencing the dst don't check if the dst is valid
or not. TCP when dereferencing a dead-dst while negotiating a
new connection, may use dst device which is 'lo' instead of
using the correct device. Consider the following scenario:

A SYN arrives on an interface and tcp-layer while processing
SYNACK finds a dst and associates it with SYNACK skb. Now before
skb gets passed to L3 for processing, if that dst gets "dead"
(because of the virtual device getting disappeared & then reappeared),
the 'lo' gets assigned to that dst (lo MTU = 64k). Let's assume
the SYN has ADV_MSS set as 9k while the output device through
which this SYNACK is going to go out has standard MTU of 1500.
The MTU check during the route check passes since MIN(9K, 64K)
is 9k and TCP successfully negotiates 9k MSS. The subsequent
data packet; bigger in size gets passed to the device and it
won't be marked as GSO since the assumed MTU of the device is
9k.

This either crashes the NIC and we have seen fixes that went
into drivers to handle this scenario. 8914a59511 ('bnx2x:
disable GSO where gso_size is too big for hardware') and
2b16f04872 ('net: create skb_gso_validate_mac_len()') and
with those fixes TCP eventually recovers but not before
few dropped segments.

Well, I'm not a TCP expert and though we have experienced
these corner cases in our environment, I could not reproduce
this case reliably in my test setup to try this fix myself.
However, Michael Chan <michael.chan@broadcom.com> had a setup
where these fixes helped him mitigate the issue and not cause
the crash.

The idea here is to not alter the data-path with additional
locks or smb()/rmb() barriers to avoid racy assignments but
to create a new device that has really low MTU that has
.ndo_start_xmit essentially a kfree_skb(). Make use of this
device instead of 'lo' when marking the dst dead.

First patch implements the blackhole device and second
patch uses it in IPv4 and IPv6 stack while the third patch
is the self test that ensures the sanity of this device.

v1->v2
  fixed the self-test patch to handle the conflict

v2 -> v3
  fixed Kconfig text/string.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
Documentation ipv6: icmp: allow flowlabel reflection in echo replies 2019-07-01 10:54:51 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-27 21:06:39 -07:00
block block: fix page leak when merging to same page 2019-06-17 09:33:04 -06:00
certs treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 2019-05-24 17:27:11 +02:00
crypto treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
drivers loopback: create blackhole net device similar to loopack. 2019-07-01 19:34:46 -07:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-27 21:06:39 -07:00
include loopback: create blackhole net device similar to loopack. 2019-07-01 19:34:46 -07:00
init Char/Misc driver fixes for 5.2-rc4 2019-06-08 12:50:36 -07:00
ipc treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-27 21:06:39 -07:00
lib blackhole_dev: add a selftest 2019-07-01 19:34:46 -07:00
mm treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
net blackhole_netdev: use blackhole_netdev to invalidate dst entries 2019-07-01 19:34:46 -07:00
samples samples: pktgen: allow to specify destination port 2019-07-01 11:02:20 -07:00
scripts treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 2019-06-19 17:11:22 +02:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sound treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
tools blackhole_dev: add a selftest 2019-07-01 19:34:46 -07:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt SPDX update for 5.2-rc6 2019-06-21 09:58:42 -07:00
.clang-format Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-17 11:26:25 -07:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore .gitignore: exclude .get_maintainer.ignore and .gitattributes 2019-05-18 11:49:54 +09:00
.mailmap MAINTAINERS: Update my email address 2019-06-18 14:37:27 +01:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-06-25 01:32:59 +02:00
Kbuild Kbuild updates for v5.1 2019-03-10 17:48:21 -07:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-27 21:06:39 -07:00
Makefile Linux 5.2-rc6 2019-06-22 16:01:36 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.