// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system.  INET is implemented using the  BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		The Internet Protocol (IP) module.
 *
 * Authors:	Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *		Donald Becker, <becker@super.org>
 *		Alan Cox, <alan@lxorguk.ukuu.org.uk>
 *		Richard Underwood
 *		Stefan Becker, <stefanb@yello.ping.de>
 *		Jorge Cwik, <jorge@laser.satlink.net>
 *		Arnt Gulbrandsen, <agulbra@nvg.unit.no>
 *
 * Fixes:
 *		Alan Cox	:	Commented a couple of minor bits of surplus code
 *		Alan Cox	:	Undefining IP_FORWARD doesn't include the code
 *					(just stops a compiler warning).
 *		Alan Cox	:	Frames with >=MAX_ROUTE record routes, strict routes
 *					or loose routes are junked rather than corrupting things.
 *		Alan Cox	:	Frames to bad broadcast subnets are dumped.
 *					We used to process them non broadcast and
 *					boy could that cause havoc.
 *		Alan Cox	:	ip_forward sets the free flag on the
 *					new frame it queues. Still crap because
 *					it copies the frame but at least it
 *					doesn't eat memory too.
 *		Alan Cox	:	Generic queue code and memory fixes.
 *		Fred Van Kempen	:	IP fragment support (borrowed from NET2E)
 *		Gerhard Koerting:	Forward fragmented frames correctly.
 *		Gerhard Koerting:	Fixes to my fix of the above 8-).
 *		Gerhard Koerting:	IP interface addressing fix.
 *		Linus Torvalds	:	More robustness checks
 *		Alan Cox	:	Even more checks: Still not as robust as it ought to be
 *		Alan Cox	:	Save IP header pointer for later
 *		Alan Cox	:	ip option setting
 *		Alan Cox	:	Use ip_tos/ip_ttl settings
 *		Alan Cox	:	Fragmentation bogosity removed
 *					(Thanks to Mark.Bush@prg.ox.ac.uk)
 *		Dmitry Gorodchanin :	Send of a raw packet crash fix.
 *		Alan Cox	:	Silly ip bug when an overlength
 *					fragment turns up. Now frees the
 *					queue.
 *		Linus Torvalds/ :	Memory leakage on fragmentation
 *		Alan Cox	:	handling.
 *		Gerhard Koerting:	Forwarding uses IP priority hints
 *		Teemu Rantanen	:	Fragment problems.
 *		Alan Cox	:	General cleanup, comments and reformat
 *		Alan Cox	:	SNMP statistics
 *		Alan Cox	:	BSD address rule semantics. Also see
 *					UDP as there is a nasty checksum issue
 *					if you do things the wrong way.
 *		Alan Cox	:	Always defrag, moved IP_FORWARD to the config.in file
 *		Alan Cox	:	IP options adjust sk->priority.
 *		Pedro Roque	:	Fix mtu/length error in ip_forward.
 *		Alan Cox	:	Avoid ip_chk_addr when possible.
 *		Richard Underwood :	IP multicasting.
 *		Alan Cox	:	Cleaned up multicast handlers.
 *		Alan Cox	:	RAW sockets demultiplex in the BSD style.
 *		Gunther Mayer	:	Fix the SNMP reporting typo
 *		Alan Cox	:	Always in group 224.0.0.1
 *		Pauline Middelink :	Fast ip_checksum update when forwarding.
 *					Masquerading support.
 *		Alan Cox	:	Multicast loopback error for 224.0.0.1
 *		Alan Cox	:	IP_MULTICAST_LOOP option.
 *		Alan Cox	:	Use notifiers.
 *		Bjorn Ekwall	:	Removed ip_csum (from slhc.c too)
 *		Bjorn Ekwall	:	Moved ip_fast_csum to ip.h (inline!)
 *		Stefan Becker	:	Send out ICMP HOST REDIRECT
 *		Arnt Gulbrandsen :	ip_build_xmit
 *		Alan Cox	:	Per socket routing cache
 *		Alan Cox	:	Fixed routing cache, added header cache.
 *		Alan Cox	:	Loopback didn't work right in original ip_build_xmit - fixed it.
 *		Alan Cox	:	Only send ICMP_REDIRECT if src/dest are the same net.
 *		Alan Cox	:	Incoming IP option handling.
 *		Alan Cox	:	Set saddr on raw output frames as per BSD.
 *		Alan Cox	:	Stopped broadcast source route explosions.
 *		Alan Cox	:	Can disable source routing
 *		Takeshi Sone	:	Masquerading didn't work.
 *		Dave Bonn,Alan Cox :	Faster IP forwarding whenever possible.
 *		Alan Cox	:	Memory leaks, tramples, misc debugging.
 *		Alan Cox	:	Fixed multicast (by popular demand 8))
 *		Alan Cox	:	Fixed forwarding (by even more popular demand 8))
 *		Alan Cox	:	Fixed SNMP statistics [I think]
 *		Gerhard Koerting :	IP fragmentation forwarding fix
 *		Alan Cox	:	Device lock against page fault.
 *		Alan Cox	:	IP_HDRINCL facility.
 *		Werner Almesberger :	Zero fragment bug
 *		Alan Cox	:	RAW IP frame length bug
 *		Alan Cox	:	Outgoing firewall on build_xmit
 *		A.N.Kuznetsov	:	IP_OPTIONS support throughout the kernel
 *		Alan Cox	:	Multicast routing hooks
 *		Jos Vos		:	Do accounting *before* call_in_firewall
 *		Willy Konynenberg :	Transparent proxying support
 *
 * To Fix:
 *		IP fragmentation wants rewriting cleanly. The RFC 815 algorithm is much
 *		cleaner and could be made very efficient with the addition of some virtual
 *		memory hacks to permit the allocation of a buffer that can then be 'grown'
 *		by twiddling page tables.
 *		Output fragmentation wants updating along with the buffer management to use
 *		a single interleaved copy algorithm so that fragmenting has a one copy
 *		overhead. Actual packet output should probably do its own fragmentation at
 *		the UDP/RAW layer. TCP shouldn't cause fragmentation anyway.
 */

#define pr_fmt(fmt) "IPv4: " fmt

#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/slab.h>

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sockios.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/indirect_call_wrapper.h>

#include <net/snmp.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/route.h>
#include <linux/skbuff.h>
#include <net/sock.h>
#include <net/arp.h>
#include <net/icmp.h>
#include <net/raw.h>
#include <net/checksum.h>
#include <net/inet_ecn.h>
#include <linux/netfilter_ipv4.h>
#include <net/xfrm.h>
#include <linux/mroute.h>
#include <linux/netlink.h>
#include <net/dst_metadata.h>

/*
 *	Process Router Attention IP option (RFC 2113)
 */
bool ip_call_ra_chain(struct sk_buff *skb)
{
	struct ip_ra_chain *ra;
	u8 protocol = ip_hdr(skb)->protocol;
	struct sock *last = NULL;
	struct net_device *dev = skb->dev;
	struct net *net = dev_net(dev);

	for (ra = rcu_dereference(net->ipv4.ra_chain); ra;
	     ra = rcu_dereference(ra->next)) {
		struct sock *sk = ra->sk;

		/* If socket is bound to an interface, only report
		 * the packet if it came from that interface.
		 */
		if (sk && inet_sk(sk)->inet_num == protocol &&
		    (!sk->sk_bound_dev_if ||
		     sk->sk_bound_dev_if == dev->ifindex)) {
			if (ip_is_fragment(ip_hdr(skb))) {
				if (ip_defrag(net, skb, IP_DEFRAG_CALL_RA_CHAIN))
					return true;
			}
			if (last) {
				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);

				if (skb2)
					raw_rcv(last, skb2);
			}
			last = sk;
		}
	}

	if (last) {
		raw_rcv(last, skb);
		return true;
	}
	return false;
}
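
/* Illustrative sketch (not part of this file, error handling omitted): a
 * user process joins the Router Alert chain walked above by enabling
 * IP_ROUTER_ALERT on a raw socket, which reaches ip_ra_control() and links
 * the socket into net->ipv4.ra_chain.
 *
 *	int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RSVP);
 *	int on = 1;
 *
 *	// After this, RSVP datagrams carrying the Router Alert option are
 *	// diverted to this socket by ip_call_ra_chain().
 *	setsockopt(fd, IPPROTO_IP, IP_ROUTER_ALERT, &on, sizeof(on));
 */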

INDIRECT_CALLABLE_DECLARE(int udp_rcv(struct sk_buff *));
INDIRECT_CALLABLE_DECLARE(int tcp_v4_rcv(struct sk_buff *));
void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int protocol)
{
	const struct net_protocol *ipprot;
	int raw, ret;

resubmit:
	raw = raw_local_deliver(skb, protocol);

	ipprot = rcu_dereference(inet_protos[protocol]);
	if (ipprot) {
		if (!ipprot->no_policy) {
			if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
				kfree_skb_reason(skb,
						 SKB_DROP_REASON_XFRM_POLICY);
				return;
			}
			nf_reset_ct(skb);
		}
		ret = INDIRECT_CALL_2(ipprot->handler, tcp_v4_rcv, udp_rcv,
				      skb);
		if (ret < 0) {
			protocol = -ret;
			goto resubmit;
		}
		__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
	} else {
		if (!raw) {
			if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
				__IP_INC_STATS(net, IPSTATS_MIB_INUNKNOWNPROTOS);
				icmp_send(skb, ICMP_DEST_UNREACH,
					  ICMP_PROT_UNREACH, 0);
			}
			kfree_skb_reason(skb, SKB_DROP_REASON_IP_NOPROTO);
		} else {
			__IP_INC_STATS(net, IPSTATS_MIB_INDELIVERS);
			consume_skb(skb);
		}
	}
}
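
/* Note on the resubmit convention above: a protocol handler may return a
 * negative value to ask for the packet to be re-dispatched under another
 * protocol number (e.g. after decapsulation). Hedged sketch of a
 * hypothetical handler, not one defined in this file:
 *
 *	static int toy_decap_rcv(struct sk_buff *skb)
 *	{
 *		// ...strip the outer header...
 *		return -IPPROTO_IPIP;	// re-deliver skb as protocol IPIP
 *	}
 */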

static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	skb_clear_delivery_time(skb);
	__skb_pull(skb, skb_network_header_len(skb));

	rcu_read_lock();
	ip_protocol_deliver_rcu(net, skb, ip_hdr(skb)->protocol);
	rcu_read_unlock();

	return 0;
}

/*
 *	Deliver IP Packets to the higher protocol layers.
 */
int ip_local_deliver(struct sk_buff *skb)
{
	/*
	 *	Reassemble IP fragments.
	 */
	struct net *net = dev_net(skb->dev);

	if (ip_is_fragment(ip_hdr(skb))) {
		if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
			return 0;
	}

	return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN,
		       net, NULL, skb, skb->dev, NULL,
		       ip_local_deliver_finish);
}
EXPORT_SYMBOL(ip_local_deliver);
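
/* The NF_HOOK() above gives netfilter a veto at NF_INET_LOCAL_IN before
 * ip_local_deliver_finish() runs. Hedged sketch of how a module could
 * observe packets at this point (names are illustrative, not from this
 * file):
 *
 *	static unsigned int toy_local_in(void *priv, struct sk_buff *skb,
 *					 const struct nf_hook_state *state)
 *	{
 *		return NF_ACCEPT;	// or NF_DROP to veto delivery
 *	}
 *
 *	static const struct nf_hook_ops toy_ops = {
 *		.hook		= toy_local_in,
 *		.pf		= NFPROTO_IPV4,
 *		.hooknum	= NF_INET_LOCAL_IN,
 *		.priority	= NF_IP_PRI_FIRST,
 *	};
 *	// registered with nf_register_net_hook(net, &toy_ops)
 */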

static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
{
	struct ip_options *opt;
	const struct iphdr *iph;

	/* This looks like overkill, because not all
	   IP options require packet mangling.
	   But it is the easiest way for now, especially taking
	   into account that the combination of IP options
	   and a running sniffer is an extremely rare condition.
					      --ANK (980813)
	*/
	if (skb_cow(skb, skb_headroom(skb))) {
		__IP_INC_STATS(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	iph = ip_hdr(skb);
	opt = &(IPCB(skb)->opt);
	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);

	if (ip_options_compile(dev_net(dev), opt, skb)) {
		__IP_INC_STATS(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
		goto drop;
	}

	if (unlikely(opt->srr)) {
		struct in_device *in_dev = __in_dev_get_rcu(dev);

		if (in_dev) {
			if (!IN_DEV_SOURCE_ROUTE(in_dev)) {
				if (IN_DEV_LOG_MARTIANS(in_dev))
					net_info_ratelimited("source route option %pI4 -> %pI4\n",
							     &iph->saddr,
							     &iph->daddr);
				goto drop;
			}
		}

		if (ip_options_rcv_srr(skb, dev))
			goto drop;
	}

	return false;
drop:
	return true;
}
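
/* For reference (assumed from RFC 791, not restated in this file): a loose
 * source route option, as parsed by ip_options_compile() above, is laid
 * out as
 *
 *	+------+--------+---------+-- - - - - - - - - - - --+
 *	| 0x83 | length | pointer | route data (IPv4 addrs) |
 *	+------+--------+---------+-- - - - - - - - - - - --+
 *
 * type 0x83 = LSRR (0x89 = SSRR); the pointer starts at 4 and advances by
 * 4 as each hop is consumed.
 */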

static bool ip_can_use_hint(const struct sk_buff *skb, const struct iphdr *iph,
			    const struct sk_buff *hint)
{
	return hint && !skb_dst(skb) && ip_hdr(hint)->daddr == iph->daddr &&
	       ip_hdr(hint)->tos == iph->tos;
}
int tcp_v4_early_demux(struct sk_buff *skb);
int udp_v4_early_demux(struct sk_buff *skb);
static int ip_rcv_finish_core(struct net *net, struct sock *sk,
			      struct sk_buff *skb, struct net_device *dev,
			      const struct sk_buff *hint)
{
	const struct iphdr *iph = ip_hdr(skb);
	int err, drop_reason;
	struct rtable *rt;

	drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;

	if (ip_can_use_hint(skb, iph, hint)) {
		err = ip_route_use_hint(skb, iph->daddr, iph->saddr, iph->tos,
					dev, hint);
		if (unlikely(err))
			goto drop_error;
	}

	if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) &&
	    !skb_dst(skb) &&
	    !skb->sk &&
	    !ip_is_fragment(iph)) {
		switch (iph->protocol) {
		case IPPROTO_TCP:
			if (READ_ONCE(net->ipv4.sysctl_tcp_early_demux)) {
				tcp_v4_early_demux(skb);

				/* must reload iph, skb->head might have changed */
				iph = ip_hdr(skb);
			}
			break;
		case IPPROTO_UDP:
			if (READ_ONCE(net->ipv4.sysctl_udp_early_demux)) {
				err = udp_v4_early_demux(skb);
				if (unlikely(err))
					goto drop_error;

				/* must reload iph, skb->head might have changed */
				iph = ip_hdr(skb);
			}
			break;
		}
	}

	/*
	 *	Initialise the virtual path cache for the packet. It describes
	 *	how the packet travels inside Linux networking.
	 */
	if (!skb_valid_dst(skb)) {
		err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
					   iph->tos, dev);
		if (unlikely(err))
			goto drop_error;
	} else {
		struct in_device *in_dev = __in_dev_get_rcu(dev);

		if (in_dev && IN_DEV_ORCONF(in_dev, NOPOLICY))
			IPCB(skb)->flags |= IPSKB_NOPOLICY;
	}

#ifdef CONFIG_IP_ROUTE_CLASSID
	if (unlikely(skb_dst(skb)->tclassid)) {
		struct ip_rt_acct *st = this_cpu_ptr(ip_rt_acct);
		u32 idx = skb_dst(skb)->tclassid;

		st[idx&0xFF].o_packets++;
		st[idx&0xFF].o_bytes += skb->len;
		st[(idx>>16)&0xFF].i_packets++;
		st[(idx>>16)&0xFF].i_bytes += skb->len;
	}
#endif

	if (iph->ihl > 5 && ip_rcv_options(skb, dev))
		goto drop;

	rt = skb_rtable(skb);
	if (rt->rt_type == RTN_MULTICAST) {
		__IP_UPD_PO_STATS(net, IPSTATS_MIB_INMCAST, skb->len);
	} else if (rt->rt_type == RTN_BROADCAST) {
		__IP_UPD_PO_STATS(net, IPSTATS_MIB_INBCAST, skb->len);
	} else if (skb->pkt_type == PACKET_BROADCAST ||
		   skb->pkt_type == PACKET_MULTICAST) {
		struct in_device *in_dev = __in_dev_get_rcu(dev);

		/* RFC 1122 3.3.6:
		 *
		 *   When a host sends a datagram to a link-layer broadcast
		 *   address, the IP destination address MUST be a legal IP
		 *   broadcast or IP multicast address.
		 *
		 *   A host SHOULD silently discard a datagram that is received
		 *   via a link-layer broadcast (see Section 2.4) but does not
		 *   specify an IP multicast or broadcast destination address.
		 *
		 * This doesn't explicitly say L2 *broadcast*, but broadcast is
		 * in a way a form of multicast and the most common use case for
		 * this is 802.11 protecting against cross-station spoofing (the
		 * so-called "hole-196" attack) so do it for both.
		 */
		if (in_dev &&
		    IN_DEV_ORCONF(in_dev, DROP_UNICAST_IN_L2_MULTICAST)) {
			drop_reason = SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST;
			goto drop;
		}
	}

	return NET_RX_SUCCESS;

drop:
	kfree_skb_reason(skb, drop_reason);
	return NET_RX_DROP;

drop_error:
	if (err == -EXDEV) {
		drop_reason = SKB_DROP_REASON_IP_RPFILTER;
		__NET_INC_STATS(net, LINUX_MIB_IPRPFILTER);
	}
	goto drop;
}

static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	int ret;

	/* if ingress device is enslaved to an L3 master device pass the
	 * skb to its handler for processing
	 */
	skb = l3mdev_ip_rcv(skb);
	if (!skb)
		return NET_RX_SUCCESS;

	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
	if (ret != NET_RX_DROP)
		ret = dst_input(skb);
	return ret;
}

/*
 *	Main IP Receive routine.
 */
static struct sk_buff *ip_rcv_core(struct sk_buff *skb, struct net *net)
{
	const struct iphdr *iph;
	int drop_reason;
	u32 len;

	/* When the interface is in promisc. mode, drop all the crap
	 * that it receives, do not try to analyse it.
	 */
	if (skb->pkt_type == PACKET_OTHERHOST) {
		dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
		drop_reason = SKB_DROP_REASON_OTHERHOST;
		goto drop;
	}

	__IP_UPD_PO_STATS(net, IPSTATS_MIB_IN, skb->len);
	if (skb->dev) {
		__SNMP_INC_STATS(skb->dev->mib.dev_statistics,
				 DEV_MIB_IPV4_RX_PKTS);
		__SNMP_ADD_STATS(skb->dev->mib.dev_statistics,
				 DEV_MIB_IPV4_RX_BYTES, skb->len);
	}

	skb = skb_share_check(skb, GFP_ATOMIC);
	if (!skb) {
		__IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
		goto out;
	}

	drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
		goto inhdr_error;

	iph = ip_hdr(skb);

	/*
	 *	RFC1122: 3.2.1.2 MUST silently discard any IP frame that fails the checksum.
	 *
	 *	Is the datagram acceptable?
	 *
	 *	1.	Length at least the size of an ip header
	 *	2.	Version of 4
	 *	3.	Checksums correctly. [Speed optimisation for later, skip loopback checksums]
	 *	4.	Doesn't have a bogus length
	 */
	if (iph->ihl < 5 || iph->version != 4)
		goto inhdr_error;

	BUILD_BUG_ON(IPSTATS_MIB_ECT1PKTS != IPSTATS_MIB_NOECTPKTS + INET_ECN_ECT_1);
	BUILD_BUG_ON(IPSTATS_MIB_ECT0PKTS != IPSTATS_MIB_NOECTPKTS + INET_ECN_ECT_0);
	BUILD_BUG_ON(IPSTATS_MIB_CEPKTS != IPSTATS_MIB_NOECTPKTS + INET_ECN_CE);
	__IP_ADD_STATS(net,
		       IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
		       max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));

	if (!pskb_may_pull(skb, iph->ihl*4))
		goto inhdr_error;

	iph = ip_hdr(skb);

	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
		goto csum_error;

	len = iph_totlen(skb, iph);
	if (skb->len < len) {
		drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL;
		__IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS);
		goto drop;
	} else if (len < (iph->ihl*4))
		goto inhdr_error;

	/* Our transport medium may have padded the buffer out. Now we know it
	 * is IP we can trim to the true length of the frame.
	 * Note this now means skb->len holds ntohs(iph->tot_len).
	 */
	if (pskb_trim_rcsum(skb, len)) {
		__IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	iph = ip_hdr(skb);
	skb->transport_header = skb->network_header + iph->ihl*4;

	/* Remove any debris in the socket control block */
	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
	IPCB(skb)->iif = skb->skb_iif;

	/* Must drop socket now because of tproxy. */
	if (!skb_sk_is_prefetched(skb))
		skb_orphan(skb);

	return skb;

csum_error:
	drop_reason = SKB_DROP_REASON_IP_CSUM;
	__IP_INC_STATS(net, IPSTATS_MIB_CSUMERRORS);
inhdr_error:
	if (drop_reason == SKB_DROP_REASON_NOT_SPECIFIED)
		drop_reason = SKB_DROP_REASON_IP_INHDR;
	__IP_INC_STATS(net, IPSTATS_MIB_INHDRERRORS);
drop:
	kfree_skb_reason(skb, drop_reason);
out:
	return NULL;
}
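
/* The ip_fast_csum() check above verifies the RFC 1071 one's complement
 * header checksum: summing the header as 16-bit words (checksum field
 * included) must yield 0xffff, so the folded, inverted sum is 0 for a
 * valid header. A hedged, unoptimised reference of the same arithmetic:
 *
 *	static u16 ip_hdr_csum_ref(const __be16 *hdr, unsigned int ihl)
 *	{
 *		u32 sum = 0;
 *		unsigned int i;
 *
 *		for (i = 0; i < ihl * 2; i++)	// ihl counts 32-bit words
 *			sum += (__force u16)hdr[i];
 *		while (sum >> 16)		// fold carries back in
 *			sum = (sum & 0xffff) + (sum >> 16);
 *		return ~sum;			// 0 when the header is valid
 *	}
 */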

/*
 *	IP receive entry point
 */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
	   struct net_device *orig_dev)
{
	struct net *net = dev_net(dev);

	skb = ip_rcv_core(skb, net);
	if (skb == NULL)
		return NET_RX_DROP;

	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
		       net, NULL, skb, dev, NULL,
		       ip_rcv_finish);
}

static void ip_sublist_rcv_finish(struct list_head *head)
{
	struct sk_buff *skb, *next;

	list_for_each_entry_safe(skb, next, head, list) {
		skb_list_del_init(skb);
		dst_input(skb);
	}
}
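
/* Commentary (an assumption, not stated in the original source): a route
 * hint is withheld below when custom FIB rules exist, for broadcast
 * routes, or for multipath decisions, because in those cases header
 * fields beyond daddr/tos can change the routing result, so one skb's dst
 * must not seed the lookup for the next packet.
 */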
static struct sk_buff *ip_extract_route_hint(const struct net *net,
					     struct sk_buff *skb, int rt_type)
{
	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST ||
	    IPCB(skb)->flags & IPSKB_MULTIPATH)
		return NULL;

	return skb;
}

static void ip_list_rcv_finish(struct net *net, struct sock *sk,
			       struct list_head *head)
{
	struct sk_buff *skb, *next, *hint = NULL;
	struct dst_entry *curr_dst = NULL;
	struct list_head sublist;

	INIT_LIST_HEAD(&sublist);
	list_for_each_entry_safe(skb, next, head, list) {
		struct net_device *dev = skb->dev;
		struct dst_entry *dst;

		skb_list_del_init(skb);
		/* if ingress device is enslaved to an L3 master device pass the
		 * skb to its handler for processing
		 */
		skb = l3mdev_ip_rcv(skb);
		if (!skb)
			continue;
		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
			continue;

		dst = skb_dst(skb);
		if (curr_dst != dst) {
			hint = ip_extract_route_hint(net, skb,
					((struct rtable *)dst)->rt_type);

			/* dispatch old sublist */
			if (!list_empty(&sublist))
				ip_sublist_rcv_finish(&sublist);
			/* start new sublist */
			INIT_LIST_HEAD(&sublist);
			curr_dst = dst;
		}
		list_add_tail(&skb->list, &sublist);
	}
	/* dispatch final sublist */
	ip_sublist_rcv_finish(&sublist);
}

static void ip_sublist_rcv(struct list_head *head, struct net_device *dev,
			   struct net *net)
{
	NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
		     head, dev, NULL, ip_rcv_finish);
	ip_list_rcv_finish(net, NULL, head);
}

/* Receive a list of IP packets */
void ip_list_rcv(struct list_head *head, struct packet_type *pt,
		 struct net_device *orig_dev)
{
	struct net_device *curr_dev = NULL;
	struct net *curr_net = NULL;
	struct sk_buff *skb, *next;
	struct list_head sublist;

	INIT_LIST_HEAD(&sublist);
	list_for_each_entry_safe(skb, next, head, list) {
		struct net_device *dev = skb->dev;
		struct net *net = dev_net(dev);

		skb_list_del_init(skb);
		skb = ip_rcv_core(skb, net);
		if (skb == NULL)
			continue;

		if (curr_dev != dev || curr_net != net) {
			/* dispatch old sublist */
			if (!list_empty(&sublist))
				ip_sublist_rcv(&sublist, curr_dev, curr_net);
			/* start new sublist */
			INIT_LIST_HEAD(&sublist);
			curr_dev = dev;
			curr_net = net;
		}
		list_add_tail(&skb->list, &sublist);
	}
	/* dispatch final sublist */
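	/* Only if non-empty: if ip_rcv_core() dropped every packet, curr_net
	 * is still NULL and NF_HOOK_LIST() inside ip_sublist_rcv() would
	 * dereference it (a crash syzbot once triggered). Illustration:
	 * packets arriving on devices A, A, B, A leave this loop as three
	 * sublists [A, A], [B], [A], each sharing one (dev, net) pair.
	 */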
	if (!list_empty(&sublist))
		ip_sublist_rcv(&sublist, curr_dev, curr_net);
}