Commit Graph

840209 Commits

Author SHA1 Message Date
Olivier Matz 59e3e4b526 ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
As it was done in commit 8f659a03a0 ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d7997 ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl in a local variable, to avoid introducing a race
condition in the next commit.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 10:29:21 -07:00
Hangbin Liu 4970b42d5c Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
This reverts commit e9919a24d3.

Nathan reported the new behaviour breaks Android, as Android just add
new rules and delete old ones.

If we return 0 without adding dup rules, Android will remove the new
added rules and causing system to soft-reboot.

Fixes: e9919a24d3 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Yaro Slav <yaro330@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:54:46 -07:00
Nikita Danilov 930b9a0543 net: aquantia: fix wol configuration not applied sometimes
WoL magic packet configuration sometimes does not work due to
couple of leakages found.

Mainly there was a regression introduced during readx_poll refactoring.

Next, fw request waiting time was too small. Sometimes that
caused sleep proxy config function to return with an error
and to skip WoL configuration.
At last, WoL data were passed to FW from not clean buffer.
That could cause FW to accept garbage as a random configuration data.

Fixes: 6a7f227731 ("net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic")
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:39:43 -07:00
Vivien Didelot 0ee4e76937 ethtool: fix potential userspace buffer overflow
ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
and pass it to the kernel driver via ops->get_regs() for filling.

There is no restriction about what the kernel drivers can or cannot do
with the open ethtool_regs structure. They usually set regs->version
and ignore regs->len or set it to the same size as ops->get_regs_len().

But if userspace allocates a smaller buffer for the registers dump,
we would cause a userspace buffer overflow in the final copy_to_user()
call, which uses the regs.len value potentially reset by the driver.

To fix this, make this case obvious and store regs.len before calling
ops->get_regs(), to only copy as much data as requested by userspace,
up to the value returned by ops->get_regs_len().

While at it, remove the redundant check for non-null regbuf.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:15:27 -07:00
Neil Horman 0a8dd9f67c Fix memory leak in sctp_process_init
syzbot found the following leak in sctp_process_init
BUG: memory leak
unreferenced object 0xffff88810ef68400 (size 1024):
  comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
  hex dump (first 32 bytes):
    1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25  ..(........h...%
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55
[inline]
    [<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline]
    [<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline]
    [<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline]
    [<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
    [<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119
    [<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline]
    [<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20
net/sctp/sm_make_chunk.c:2437
    [<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682
[inline]
    [<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384
[inline]
    [<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194
[inline]
    [<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
    [<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200
net/sctp/associola.c:1074
    [<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
    [<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
    [<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline]
    [<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418
    [<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934
    [<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
    [<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
    [<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline]
    [<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671
    [<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
    [<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330
    [<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline]
    [<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline]
    [<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
    [<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3

The problem was that the peer.cookie value points to an skb allocated
area on the first pass through this function, at which point it is
overwritten with a heap allocated value, but in certain cases, where a
COOKIE_ECHO chunk is included in the packet, a second pass through
sctp_process_init is made, where the cookie value is re-allocated,
leaking the first allocation.

Fix is to always allocate the cookie value, and free it when we are done
using it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:11:47 -07:00
Zhu Yanjun b50e058746 net: rds: fix memory leak when unload rds_rdma
When KASAN is enabled, after several rds connections are
created, then "rmmod rds_rdma" is run. The following will
appear.

"
BUG rds_ib_incoming (Not tainted): Objects remaining
in rds_ib_incoming on __kmem_cache_shutdown()

Call Trace:
 dump_stack+0x71/0xab
 slab_err+0xad/0xd0
 __kmem_cache_shutdown+0x17d/0x370
 shutdown_cache+0x17/0x130
 kmem_cache_destroy+0x1df/0x210
 rds_ib_recv_exit+0x11/0x20 [rds_rdma]
 rds_ib_exit+0x7a/0x90 [rds_rdma]
 __x64_sys_delete_module+0x224/0x2c0
 ? __ia32_sys_delete_module+0x2c0/0x2c0
 do_syscall_64+0x73/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
"
This is rds connection memory leak. The root cause is:
When "rmmod rds_rdma" is run, rds_ib_remove_one will call
rds_ib_dev_shutdown to drop the rds connections.
rds_ib_dev_shutdown will call rds_conn_drop to drop rds
connections as below.
"
rds_conn_path_drop(&conn->c_path[0], false);
"
In the above, destroy is set to false.
void rds_conn_path_drop(struct rds_conn_path *cp, bool destroy)
{
        atomic_set(&cp->cp_state, RDS_CONN_ERROR);

        rcu_read_lock();
        if (!destroy && rds_destroy_pending(cp->cp_conn)) {
                rcu_read_unlock();
                return;
        }
        queue_work(rds_wq, &cp->cp_down_w);
        rcu_read_unlock();
}
In the above function, destroy is set to false. rds_destroy_pending
is called. This does not move rds connections to ib_nodev_conns.
So destroy is set to true to move rds connections to ib_nodev_conns.
In rds_ib_unregister_client, flush_workqueue is called to make rds_wq
finsh shutdown rds connections. The function rds_ib_destroy_nodev_conns
is called to shutdown rds connections finally.
Then rds_ib_recv_exit is called to destroy slab.

void rds_ib_recv_exit(void)
{
        kmem_cache_destroy(rds_ib_incoming_slab);
        kmem_cache_destroy(rds_ib_frag_slab);
}
The above slab memory leak will not occur again.

>From tests,
256 rds connections
[root@ca-dev14 ~]# time rmmod rds_rdma

real    0m16.522s
user    0m0.000s
sys     0m8.152s
512 rds connections
[root@ca-dev14 ~]# time rmmod rds_rdma

real    0m32.054s
user    0m0.000s
sys     0m15.568s

To rmmod rds_rdma with 256 rds connections, about 16 seconds are needed.
And with 512 rds connections, about 32 seconds are needed.
>From ftrace, when one rds connection is destroyed,

"
 19)               |  rds_conn_destroy [rds]() {
 19)   7.782 us    |    rds_conn_path_drop [rds]();
 15)               |  rds_shutdown_worker [rds]() {
 15)               |    rds_conn_shutdown [rds]() {
 15)   1.651 us    |      rds_send_path_reset [rds]();
 15)   7.195 us    |    }
 15) + 11.434 us   |  }
 19)   2.285 us    |    rds_cong_remove_conn [rds]();
 19) * 24062.76 us |  }
"
So if many rds connections will be destroyed, this function
rds_ib_destroy_nodev_conns uses most of time.

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:08:14 -07:00
Xin Long b7999b0772 ipv6: fix the check before getting the cookie in rt6_get_cookie
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.

It's caused by Commit 93531c6743 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().

Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.

c1:
  (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:00:29 -07:00
Xin Long 0a90478b93 ipv4: not do cache for local delivery if bc_forwarding is enabled
With the topo:

    h1 ---| rp1            |
          |     route  rp3 |--- h3 (192.168.200.1)
    h2 ---| rp2            |

If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after
doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on
h2, and the packets can still be forwared.

This issue was caused by the input route cache. It should only do
the cache for either bc forwarding or local delivery. Otherwise,
local delivery can use the route cache for bc forwarding of other
interfaces.

This patch is to fix it by not doing cache for local delivery if
all.bc_forwarding is enabled.

Note that we don't fix it by checking route cache local flag after
rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
common route code shouldn't be touched for bc_forwarding.

Fixes: 5cbf777cfd ("route: add support for directed broadcast forwarding")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 16:59:21 -07:00
David S. Miller e7a9fe7b0d Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2019-06-05

one more shot...  now with patch 2 fixed up so that it uses the
dst entry returned from dst_check().

From the v1 cover letter:

Please apply the following set of qeth fixes to -net.

- The first two patches fix issues in the L3 driver's cast type
  selection for transmitted skbs.
- Alexandra adds a sanity check when retrieving VLAN information from
  neighbour address events.
- The last patch adds some missing error handling for qeth's new
  multiqueue code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Julian Wiedmann bd966839bd s390/qeth: handle error when updating TX queue count
netif_set_real_num_tx_queues() can return an error, deal with it.

Fixes: 73dc2daf11 ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Alexandra Winter 335726195e s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
Enabling sysfs attribute bridge_hostnotify triggers a series of udev events
for the MAC addresses of all currently connected peers. In case no VLAN is
set for a peer, the device reports the corresponding MAC addresses with
VLAN ID 4096. This currently results in attribute VLAN=4096 for all
non-VLAN interfaces in the initial series of events after host-notify is
enabled.

Instead, no VLAN attribute should be reported in the udev event for
non-VLAN interfaces.

Only the initial events face this issue. For dynamic changes that are
reported later, the device uses a validity flag.

This also changes the code so that it now sets the VLAN attribute for
MAC addresses with VID 0. On Linux, no qeth interface will ever be
registered with VID 0: Linux kernel registers VID 0 on all network
interfaces initially, but qeth will drop .ndo_vlan_rx_add_vid for VID 0.
Peers with other OSs could register MACs with VID 0.

Fixes: 9f48b9db9a ("qeth: bridgeport support - address notifications")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Julian Wiedmann 0cd6783d3c s390/qeth: check dst entry before use
While qeth_l3 uses netif_keep_dst() to hold onto the dst, a skb's dst
may still have been obsoleted (via dst_dev_put()) by the time that we
end up using it. The dst then points to the loopback interface, which
means the neighbour lookup in qeth_l3_get_cast_type() determines a bogus
cast type of RTN_BROADCAST.
For IQD interfaces this causes us to place such skbs on the wrong
HW queue, resulting in TX errors.

Fix-up the various call sites to first validate the dst entry with
dst_check(), and fall back accordingly.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Julian Wiedmann 72c87976c5 s390/qeth: handle limited IPv4 broadcast in L3 TX path
When selecting the cast type of a neighbourless IPv4 skb (eg. on a raw
socket), qeth_l3 falls back to the packet's destination IP address.
For this case we should classify traffic sent to 255.255.255.255 as
broadcast.
This fixes DHCP requests, which were misclassified as unicast
(and for IQD interfaces thus ended up on the wrong HW queue).

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Paolo Abeni fdf71426e7 net: fix indirect calls helpers for ptype list hooks.
As Eric noted, the current wrapper for ptype func hook inside
__netif_receive_skb_list_ptype() has no chance of avoiding the indirect
call: we enter such code path only for protocols other than ipv4 and
ipv6.

Instead we can wrap the list_func invocation.

v1 -> v2:
 - use the correct fix tag

Fixes: f5737cbadb ("net: use indirect calls helpers for ptype hook")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 20:16:22 -07:00
Miaohe Lin ceae266bf0 net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
There's some NICs, such as hinic, with NETIF_F_IP_CSUM and NETIF_F_TSO
on but NETIF_F_HW_CSUM off. And ipvlan device features will be
NETIF_F_TSO on with NETIF_F_IP_CSUM and NETIF_F_IP_CSUM both off as
IPVLAN_FEATURES only care about NETIF_F_HW_CSUM. So TSO will be
disabled in netdev_fix_features.
For example:
Features for enp129s0f0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on

Fixes: a188222b6e ("net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 20:01:15 -07:00
Tim Beale 82ba25c6de udp: only choose unbound UDP socket for multicast when not in a VRF
By default, packets received in another VRF should not be passed to an
unbound socket in the default VRF. This patch updates the IPv4 UDP
multicast logic to match the unicast VRF logic (in compute_score()),
as well as the IPv6 mcast logic (in __udp_v6_is_mcast_sock()).

The particular case I noticed was DHCP discover packets going
to the 255.255.255.255 address, which are handled by
__udp4_lib_mcast_deliver(). The previous code meant that running
multiple different DHCP server or relay agent instances across VRFs
did not work correctly - any server/relay agent in the default VRF
received DHCP discover packets for all other VRFs.

Fixes: 6da5b0f027 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 18:34:03 -07:00
David S. Miller 2b66552eb2 Merge branch 'net-tls-redo-the-RX-resync-locking'
Jakub Kicinski says:

====================
net/tls: redo the RX resync locking

Take two of making sure we don't use a NULL netdev pointer
for RX resync.  This time using a bit and an open coded
wait loop.

v2:
 - fix build warning (DaveM).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 13:34:38 -07:00
Jakub Kicinski e52972c11d net/tls: replace the sleeping lock around RX resync with a bit lock
Commit 38030d7cb7 ("net/tls: avoid NULL-deref on resync during device removal")
tried to fix a potential NULL-dereference by taking the
context rwsem.  Unfortunately the RX resync may get called
from soft IRQ, so we can't use the rwsem to protect from
the device disappearing.  Because we are guaranteed there
can be only one resync at a time (it's called from strparser)
use a bit to indicate resync is busy and make device
removal wait for the bit to get cleared.

Note that there is a leftover "flags" field in struct
tls_context already.

Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 13:34:37 -07:00
Jakub Kicinski 27393f8c6e Revert "net/tls: avoid NULL-deref on resync during device removal"
This reverts commit 38030d7cb7.
Unfortunately the RX resync may get called from soft IRQ,
so we can't take the rwsem to protect from the device
disappearing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 13:34:37 -07:00
Vladimir Oltean f4cfcfbdf0 net: dsa: sja1105: Fix link speed not working at 100 Mbps and below
The hardware values for link speed are held in the sja1105_speed_t enum.
However they do not increase in the order that sja1105_get_speed_cfg was
iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
SJA1105_SPEED_1000MBPS - 1 - skipping the other two).

Another bug is that the code in sja1105_adjust_port_config relies on the
fact that an invalid link speed is detected by sja1105_get_speed_cfg and
returned as -EINVAL.  However storing this into an enum that only has
positive members will cast it into an unsigned value, and it will miss
the negative check.

So take the simplest approach and remove the sja1105_get_speed_cfg
function and replace it with a simple switch-case statement.

Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 11:51:57 -07:00
Russell King 7731676332 net: phylink: avoid reducing support mask
Avoid reducing the support mask as a result of the interface type
selected for SFP modules, or when setting the link settings through
ethtool - this should only change when the supported link modes of
the hardware combination change.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 11:43:24 -07:00
Russell King 28e74a7cfd net: sfp: read eeprom in maximum 16 byte increments
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time.  This behaviour is not specified
in the SFP MSAs, which specifies:

 "The serial interface uses the 2-wire serial CMOS E2PROM protocol
  defined for the ATMEL AT24C01A/02/04 family of components."

and

 "As long as the SFP+ receives an acknowledge, it shall serially clock
  out sequential data words. The sequence is terminated when the host
  responds with a NACK and a STOP instead of an acknowledge."

We must avoid breaking a read across a 16-bit quantity in the diagnostic
page, thankfully all 16-bit quantities in that page are naturally
aligned.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:16:37 -07:00
Xin Long 67c0aaa1ea selftests: set sysctl bc_forwarding properly in router_broadcast.sh
sysctl setting bc_forwarding for $rp2 is needed when ping_test_from h2,
otherwise the bc packets from $rp2 won't be forwarded. This patch is to
add this setting for $rp2.

Also, as ping_test_from does grep "$from" only, which could match some
unexpected output, some test case doesn't really work, like:

  # ping_test_from $h2 198.51.200.255 198.51.200.2
    PING 198.51.200.255 from 198.51.100.2 veth3: 56(84) bytes of data.
    64 bytes from 198.51.100.1: icmp_seq=1 ttl=64 time=0.336 ms

When doing grep $form (198.51.200.2), the output could still match.
So change to grep "bytes from $from" instead.

Fixes: 40f98b9af9 ("selftests: add a selftest for directed broadcast forwarding")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:15:01 -07:00
Sean Wang 880c2d4b2f net: ethernet: mediatek: Use NET_IP_ALIGN to judge if HW RX_2BYTE_OFFSET is enabled
Should only enable HW RX_2BYTE_OFFSET function in the case NET_IP_ALIGN
equals to 2.

Signed-off-by: Mark Lee <mark-mc.lee@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:04:07 -07:00
Sean Wang 9e4f56f1a7 net: ethernet: mediatek: Use hw_feature to judge if HWLRO is supported
Should hw_feature as hardware capability flags to check if hardware LRO
got support.

Signed-off-by: Mark Lee <mark-mc.lee@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:04:07 -07:00
Ivan Khoronzhuk 09faf5a7d7 net: ethernet: ti: cpsw_ethtool: fix ethtool ring param set
Fix ability to set RX descriptor number, the reason - initially
"tx_max_pending" was set incorrectly, but the issue appears after
adding sanity check, so fix is for "sanity" patch.

Fixes: 37e2d99b59 ("ethtool: Ensure new ring parameters are within bounds during SRINGPARAM")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 14:33:30 -07:00
Willem de Bruijn afa0925c6f packet: unconditionally free po->rollover
Rollover used to use a complex RCU mechanism for assignment, which had
a race condition. The below patch fixed the bug and greatly simplified
the logic.

The feature depends on fanout, but the state is private to the socket.
Fanout_release returns f only when the last member leaves and the
fanout struct is to be freed.

Destroy rollover unconditionally, regardless of fanout state.

Fixes: 57f015f5ec ("packet: fix crash in fanout_demux_rollover()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 18:10:14 -07:00
Wei Liu 8c26859819 Update my email address
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 13:57:20 -07:00
Yonglong Liu 2e1f164861 net: hns: Fix loopback test failed at copper ports
When doing a loopback test at copper ports, the serdes loopback
and the phy loopback will fail, because of the adjust link had
not finished, and phy not ready.

Adds sleep between adjust link and test process to fix it.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 13:55:54 -07:00
Nikita Yushchenko 62394708f3 net: dsa: mv88e6xxx: avoid error message on remove from VLAN 0
When non-bridged, non-vlan'ed mv88e6xxx port is moving down, error
message is logged:

failed to kill vid 0081/0 for device eth_cu_1000_4

This is caused by call from __vlan_vid_del() with vin set to zero, over
call chain this results into _mv88e6xxx_port_vlan_del() called with
vid=0, and mv88e6xxx_vtu_get() called from there returns -EINVAL.

On symmetric path moving port up, call goes through
mv88e6xxx_port_vlan_prepare() that calls mv88e6xxx_port_check_hw_vlan()
that returns -EOPNOTSUPP for zero vid.

This patch changes mv88e6xxx_vtu_get() to also return -EOPNOTSUPP for
zero vid, then this error code is explicitly cleared in
dsa_slave_vlan_rx_kill_vid() and error message is no longer logged.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 13:53:29 -07:00
Vladimir Oltean e8d67fa569 net: dsa: sja1105: Don't store frame type in skb->cb
Due to a confusion I thought that eth_type_trans() was called by the
network stack whereas it can actually be called by network drivers to
figure out the skb protocol and next packet_type handlers.

In light of the above, it is not safe to store the frame type from the
DSA tagger's .filter callback (first entry point on RX path), since GRO
is yet to be invoked on the received traffic.  Hence it is very likely
that the skb->cb will actually get overwritten between eth_type_trans()
and the actual DSA packet_type handler.

Of course, what this patch fixes is the actual overwriting of the
SJA1105_SKB_CB(skb)->type field from the GRO layer, which made all
frames be seen as SJA1105_FRAME_TYPE_NORMAL (0).

Fixes: 227d07a07e ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31 14:27:27 -07:00
Linus Torvalds 036e343109 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix OOPS during nf_tables rule dump, from Florian Westphal.

 2) Use after free in ip_vs_in, from Yue Haibing.

 3) Fix various kTLS bugs (NULL deref during device removal resync,
    netdev notification ignoring, etc.) From Jakub Kicinski.

 4) Fix ipv6 redirects with VRF, from David Ahern.

 5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet.

 6) Missing memory allocation failure check in ip6_ra_control(), from
    Gen Zhang. And likewise fix ip_ra_control().

 7) TX clean budget logic error in aquantia, from Igor Russkikh.

 8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet.

 9) Double frees in mlx5, from Parav Pandit.

10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit.

11) Fix botched register access in mvpp2, from Antoine Tenart.

12) Use after free in napi_gro_frags(), from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits)
  net: correct zerocopy refcnt with udp MSG_MORE
  ethtool: Check for vlan etype or vlan tci when parsing flow_rule
  net: don't clear sock->sk early to avoid trouble in strparser
  net-gro: fix use-after-free read in napi_gro_frags()
  net: dsa: tag_8021q: Create a stable binary format
  net: dsa: tag_8021q: Change order of rx_vid setup
  net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
  ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
  mlxsw: spectrum: Prevent force of 56G
  mlxsw: spectrum_acl: Avoid warning after identical rules insertion
  net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
  r8169: fix MAC address being lost in PCI D3
  net: core: support XDP generic on stacked devices.
  netvsc: unshare skb in VF rx handler
  udp: Avoid post-GRO UDP checksum recalculation
  net: phy: dp83867: Set up RGMII TX delay
  net: phy: dp83867: do not call config_init twice
  net: phy: dp83867: increase SGMII autoneg timer duration
  net: phy: dp83867: fix speed 10 in sgmii mode
  net: phy: marvell10g: report if the PHY fails to boot firmware
  ...
2019-05-30 21:11:22 -07:00
Linus Torvalds adc3f554fa arm64 fixes for -rc3
- Fix implementation of our set_personality() system call, which wasn't
   being wrapped properly
 
 - Fix system call function types to keep CFI happy
 
 - Fix siginfo layout when delivering SIGKILL after a kernel fault
 
 - Really fix module relocation range checking
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzvv3EACgkQt6xw3ITB
 YzQviwf9Gw3VrBZpS9nwz0MQCf9W7+Vpy8XBsY7HJyUNQ4+8ZNR5HoZ3BcJX2HWk
 WKwSw721MllzLfJaRMqNV2+C7lm+EypcZApKFpPo7Vs9g78WcUdNZ4YM4XfAX45T
 cVPxeSGOj2aswyOn2Xa3UjKZj8deP8nAC/JgJY7t9L6qKObwUldmxBPRnZdclclw
 S8sQSMvLc9Q43jmEysPLixExZ6jzmq1i8xxPcyqFUz8DHYPf1irLxtpS7DYA+nk5
 nwQ/lnz6Tu8TBXcvgvXayKL8aa8SIsl0cOii2FWsZMkFXz3OZ08hdujvMYsPSSHO
 q3rMub7F/0znm00sBGXgTGRjy++v+A==
 =pyp4
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The fixes are still trickling in for arm64, but the only really
  significant one here is actually fixing a regression in the botched
  module relocation range checking merged for -rc2.

  Hopefully we've nailed it this time.

   - Fix implementation of our set_personality() system call, which
     wasn't being wrapped properly

   - Fix system call function types to keep CFI happy

   - Fix siginfo layout when delivering SIGKILL after a kernel fault

   - Really fix module relocation range checking"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: use the correct function type for __arm64_sys_ni_syscall
  arm64: use the correct function type in SYSCALL_DEFINE0
  arm64: fix syscall_fn_t type
  signal/arm64: Use force_sig not force_sig_fault for SIGKILL
  arm64/module: revert to unsigned interpretation of ABS16/32 relocations
  arm64: Fix the arm64_personality() syscall wrapper redirection
2019-05-30 21:05:23 -07:00
Linus Torvalds 318adf8e4b for-5.2-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlzvsOAACgkQxWXV+ddt
 WDuLQg/+OHwlNW/8KT+1/gQvAxVnI2bglRJ3lYOQRenR8jA4y3rIKgXWXyd7A/uK
 acrjeZYMaho5HY5VaKqAqDST7KikR+gPQh1IArYlBcL7tI5c/YsEgqf2G8PXo1U1
 9B13og3kWpdIRNIF9OyKUPcGGfnG5UdBDGNFAEuQZpRXbFKJ+8+ijYU0dXIIFdJb
 scl9vWQWFDoLlZ2szRDbl5gAG0lYwk5q0rTRDt+xyla83gD5UNP5oG8XNp1o/T5+
 yDwM81IhQ636n51/NkX5RgFbs0ljjRqVzXJg5pa3XH1w9vwZuWoKRNcUhuDH6j9W
 wL4Gw33Q8607uk01D5wDdtNI8JTOaXDDYnKsgzNb+7A7ICWlQ/8OR6VZintMioun
 ccpNY7HMuVdGdRZxE7ZW63LxLyXulZW51r5G2IvBwRfT6aGl+oKwU4AwB6slEId3
 S1ftxcCKYHqtCkRAutirjUknuYdzr0LB1sePoiFwQmIN6782fzuLF8O4hxl5Hcd9
 UoEgz/240HiTDqsluUmVkurLVUwBk7CoIdec3tPELrCagI7rqG4H2nkj7XXMJiVD
 XyCJZB0dF3E6G8TzlL5lKQWDniqDrLizYwnxYr6OSYZvp9kzfHgxpTPGdxwbIAjr
 JT+v6332N09ODooODtzci0Pt0YdfcK1tIhcWXP+oLpE4v/PZj8g=
 =lyvo
 -----END PGP SIGNATURE-----

Merge tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes for bugs reported by users, fuzzing tools and
  regressions:

   - fix crashes in relocation:
       + resuming interrupted balance operation does not properly clean
         up orphan trees
       + with enabled qgroups, resuming needs to be more careful about
         block groups due to limited context when updating qgroups

   - fsync and logging fixes found by fuzzing

   - incremental send fixes for no-holes and clone

   - fix spin lock type used in timer function for zstd"

* tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix race updating log root item during fsync
  Btrfs: fix wrong ctime and mtime of a directory after log replay
  Btrfs: fix fsync not persisting changed attributes of a directory
  btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference
  btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON()
  Btrfs: incremental send, fix emission of invalid clone operations
  Btrfs: incremental send, fix file corruption when no-holes feature is enabled
  btrfs: correct zstd workspace manager lock to use spin_lock_bh()
  btrfs: Ensure replaced device doesn't have pending chunk allocation
2019-05-30 20:52:40 -07:00
Linus Torvalds 8cb7104d03 configs fix for 5.2
- fix a use after free in configfs_d_iput (Sahitya Tummala)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlzvmYQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPz/A/+JH4Z1nKUNPFPtRIKkFf/3sBrMjBvw+feV7a2AiDS
 SbwNu7Rv30VjQYe6ABmANz6B0me6Zr2ELEn7v0x4tXNXXqBUVOLc88Sq8WyIq26V
 6GkZJ+8OvLxdamZCSFITZy/XgIzZFBBNeSDg7NIZU1kEUSGwsS5HhQlUzoLD3sFI
 8dguMEDGRypSV4cFu87q2NjA6r7ti35Rn/VZeS7Zkwopv6yc6xo9g4ocfvcuZcuK
 lI7ZW6aKzLfPel13qyqUFklG4sZ/U+uffjs5zJHGbA1Doe7g6ulHWFaM9rW8b6oB
 3XZ/s47uAtcDdIqH59Snk3aHmkk1pwBlxLJcpdrBskxeXFoC5UrQIDWE43WZ8xvK
 VZr4Nk+SmkTYG6t9aETplV15FP3BVZXdjVKVKWqhHDzRV3SkX+VfDfsG31P/Kicw
 Dl/1sokt7tdoCjXMiPILGtajacpspgMlpL2ztQG5DojsTtKRUd/wm6PhDVNYusHH
 GkL1JBqEzVkhZalVapGWRT9DxZGIu6nhF+E7lvhdExFU/yZsXTIlI8suNM8jKoLM
 P4mTb5bTwfjjUj2Z3KxzyIc5Kg4SHjjpzY2E6Fb/mY3BpzURzc2qHm9+f2k5/Xsq
 OlwPNwC0Br4W6NkNrlUmh8VUk3zAL3V5TwNz21uyyg4PgKL3a0yexSkOsR2TSnL8
 c/k=
 =qQ2a
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs

Pull configs fix from Christoph Hellwig:

 - fix a use after free in configfs_d_iput (Sahitya Tummala)

* tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs:
  configfs: Fix use-after-free when accessing sd->s_dentry
2019-05-30 20:35:48 -07:00
Linus Torvalds c5ba171266 sound fixes for 5.2-rc3
No big surprises here, just a few device-specific fixes.
 
 HD-audio received several fixes for Acer, Dell, Huawei and other
 laptops as well as the workaround for the new Intel chipset.
 One significant one-liner fix is the disablement of the node-power
 saving on Realtek codecs, which may potentially cover annoying bugs
 like the background noises or click noises on many devices.
 
 Other than that, a fix for FireWire bit definitions, and another fix
 for LINE6 USB audio bug that was discovered by syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlzvmKUOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE+hJQ/+Ni6QlktS/PasTXYHikyub6FBvHlRbFXjvKbn
 blUTxDhIIHNlbugpCYfaZ4EUSX8ZYV39Prlfsgg6Sq8k2z3r99zW3nt1DAI9EoPW
 OMmaCBE19jEQl49pKQ6rOiBSeMxgtjJRTbNQKiY3uR7TK7/i0wtjtoIDtD9d979d
 vc3b9S95+chiKww0NqGMf/4kJIOyrA3POE3obvYcutwDm0yjBtS5cQYuKLicEGK2
 Q1j811PXmn+LgC8VZdH2cgGrWC9lWeMb3S5X+uJoSr5mLJCLBp1+oGnpxWYQMrzZ
 sOffACbVO/v106rjOoPKWChPVssgO6OuaFX+kUQ+1P5n73nMgplKsQ1CLGoXSiuN
 DfPNiF88z8O4KPOia3FDDid/zk4uURHh4DAKhtGSctRCCXiS/ZdUeRHypj63vTsF
 o85Boo9gss2wDs51vxS3ypoIfl0BnNLEjOcYGQBFA0ci4mrnwXG0PdQCwnYSfJjW
 9zCwS9l0oqhPWAG+9wBfaN9SlNIevtXnGy18s/OUM8QZKNaqbfuIvAd/HhCfHSra
 brQzouplMbT5G2DDCU4wdUhkhHY8i4wOT1PjcENP8QWnQXoBr2FsMmK9Wqj/mG68
 Frs07wyqEQcviGMOB3YUyZ1BnGNQujfgBBy5jaz5Ga4HNcsO6Ro9FHhIYlelat/i
 No0D7t8=
 =Bj5P
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "No big surprises here, just a few device-specific fixes.

  HD-audio received several fixes for Acer, Dell, Huawei and other
  laptops as well as the workaround for the new Intel chipset. One
  significant one-liner fix is the disablement of the node-power saving
  on Realtek codecs, which may potentially cover annoying bugs like the
  background noises or click noises on many devices.

  Other than that, a fix for FireWire bit definitions, and another fix
  for LINE6 USB audio bug that was discovered by syzkaller"

* tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: fireface: Use ULL suffixes for 64-bit constants
  ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops
  ALSA: line6: Assure canceling delayed work at disconnection
  ALSA: hda - Force polling mode on CNL for fixing codec communication
  ALSA: hda/realtek - Enable micmute LED for Huawei laptops
  ALSA: hda/realtek - Set default power save node to 0
  ALSA: hda/realtek - Check headset type by unplug and resume
2019-05-30 19:58:59 -07:00
Linus Torvalds 20f9449656 A few clk driver fixes
- Don't expose the SiFive clk driver on non-RISCV architectures
 
  - Fix some bits describing clks in the imx8mm driver
 
  - Always call clk domain code in the TI driver so non-legacy platforms
    work
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAlzvBroRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSXt1hAAjab8EFP2qRzqoZKEns5Zu4H9htPbLvco
 oWJAcaLgmeF8MV/pK+HmGkqFJnAOBkTf+2NoAkz5qcRghGBacbxfmJAm0zLNoMQ4
 JBGowtpQhio5u52UMNIyVND7TeNdRaNihe1osPpSoFj0zFzwUFIv2aK8Im6gRmWY
 uDNkJdib5DsVURiPa2kqglLLWOpJl5X4tXYpu0i88pToWhiOK9b3ngh1R/VTVYic
 H4vvo6XGClF1CSqpKpwCeUCxyuAWHT+qCbE3SHbEJJcon0QzEFTW635NMLbmCNMa
 oNpqLnPtx+IELvLeTeLyeKKsb9HedgllmqiGp2szC5jj8y+T4T1O45WhCl0mj2TV
 bS0ANcrpxy+edhH0qGEOAp4pK41WO5vEWiQbJBLm6ge9jtLuir/GP91tAcAdBDRT
 DOCGGhmYythMo0OETQ5ZlpP/TvgM9uCPRmg0OTVzZHgosG7u2TZgH5/TzoIlnNRE
 bowpTfKYPTTZ8ZhNqhqmNll8CV/xdmbdTW8QAgnOuj9Kv76IyMGZfxQIbUDbdUox
 cgVFWa+S6HxBUddX5GPsDqm0voE4bhYxBqb6B0NuopjMHYRqphyY5Icp4qQMhbyC
 MzcEEbfGj0GDtDLXEpDFOASWHPpBV/6g6xA1hJyhXPWBl3oEWPM1nL4hIADMx41O
 Wzg6MPVeqWU=
 =gbSJ
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk driver fixes from Stephen Boyd:

 - Don't expose the SiFive clk driver on non-RISCV architectures

 - Fix some bits describing clks in the imx8mm driver

 - Always call clk domain code in the TI driver so non-legacy platforms
   work

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: ti: clkctrl: Fix clkdm_clk handling
  clk: imx: imx8mm: fix int pll clk gate
  clk: sifive: restrict Kconfig scope for the FU540 PRCI driver
2019-05-30 16:33:37 -07:00
Willem de Bruijn 100f6d8e09 net: correct zerocopy refcnt with udp MSG_MORE
TCP zerocopy takes a uarg reference for every skb, plus one for the
tcp_sendmsg_locked datapath temporarily, to avoid reaching refcnt zero
as it builds, sends and frees skbs inside its inner loop.

UDP and RAW zerocopy do not send inside the inner loop so do not need
the extra sock_zerocopy_get + sock_zerocopy_put pair. Commit
52900d22288ed ("udp: elide zerocopy operation in hot path") introduced
extra_uref to pass the initial reference taken in sock_zerocopy_alloc
to the first generated skb.

But, sock_zerocopy_realloc takes this extra reference at the start of
every call. With MSG_MORE, no new skb may be generated to attach the
extra_uref to, so refcnt is incorrectly 2 with only one skb.

Do not take the extra ref if uarg && !tcp, which implies MSG_MORE.
Update extra_uref accordingly.

This conditional assignment triggers a false positive may be used
uninitialized warning, so have to initialize extra_uref at define.

Changes v1->v2: fix typo in Fixes SHA1

Fixes: 52900d2228 ("udp: elide zerocopy operation in hot path")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 15:54:04 -07:00
Maxime Chevallier b73484b2fc ethtool: Check for vlan etype or vlan tci when parsing flow_rule
When parsing an ethtool flow spec to build a flow_rule, the code checks
if both the vlan etype and the vlan tci are specified by the user to add
a FLOW_DISSECTOR_KEY_VLAN match.

However, when the user only specified a vlan etype or a vlan tci, this
check silently ignores these parameters.

For example, the following rule :

ethtool -N eth0 flow-type udp4 vlan 0x0010 action -1 loc 0

will result in no error being issued, but the equivalent rule will be
created and passed to the NIC driver :

ethtool -N eth0 flow-type udp4 action -1 loc 0

In the end, neither the NIC driver using the rule nor the end user have
a way to know that these keys were dropped along the way, or that
incorrect parameters were entered.

This kind of check should be left to either the driver, or the ethtool
flow spec layer.

This commit makes so that ethtool parameters are forwarded as-is to the
NIC driver.

Since none of the users of ethtool_rx_flow_rule_create are using the
VLAN dissector, I don't think this qualifies as a regression.

Fixes: eca4205f9e ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 15:04:55 -07:00
Jakub Kicinski 2b81f8161d net: don't clear sock->sk early to avoid trouble in strparser
af_inet sets sock->sk to NULL which trips strparser over:

BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21
RIP: 0010:tcp_peek_len+0x10/0x60
RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051
RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480
RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000
R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40
R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020
FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
 <IRQ>
 strp_data_ready+0x48/0x90
 tls_data_ready+0x22/0xd0 [tls]
 tcp_rcv_established+0x569/0x620
 tcp_v4_do_rcv+0x127/0x1e0
 tcp_v4_rcv+0xad7/0xbf0
 ip_protocol_deliver_rcu+0x2c/0x1c0
 ip_local_deliver_finish+0x41/0x50
 ip_local_deliver+0x6b/0xe0
 ? ip_protocol_deliver_rcu+0x1c0/0x1c0
 ip_rcv+0x52/0xd0
 ? ip_rcv_finish_core.isra.20+0x380/0x380
 __netif_receive_skb_one_core+0x7e/0x90
 netif_receive_skb_internal+0x42/0xf0
 napi_gro_receive+0xed/0x150
 nfp_net_poll+0x7a2/0xd30 [nfp]
 ? kmem_cache_free_bulk+0x286/0x310
 net_rx_action+0x149/0x3b0
 __do_softirq+0xe3/0x30a
 ? handle_irq_event_percpu+0x6a/0x80
 irq_exit+0xe8/0xf0
 do_IRQ+0x85/0xd0
 common_interrupt+0xf/0xf
 </IRQ>
RIP: 0010:cpuidle_enter_state+0xbc/0x450

To avoid this issue set sock->sk after sk_prot->close.
My grepping and testing did not discover any code which
would depend on the current behaviour.

Fixes: c46234ebb4 ("tls: RX path for ktls")
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:54:17 -07:00
Eric Dumazet a4270d6795 net-gro: fix use-after-free read in napi_gro_frags()
If a network driver provides to napi_gro_frags() an
skb with a page fragment of exactly 14 bytes, the call
to gro_pull_from_frag0() will 'consume' the fragment
by calling skb_frag_unref(skb, 0), and the page might
be freed and reused.

Reading eth->h_proto at the end of napi_frags_skb() might
read mangled data, or crash under specific debugging features.

BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
Read of size 2 at addr ffff88809366840c by task syz-executor599/8957

CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
 napi_frags_skb net/core/dev.c:5833 [inline]
 napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
 tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
 call_write_iter include/linux/fs.h:1872 [inline]
 do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
 do_iter_write fs/read_write.c:970 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:951
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
 do_writev+0x15b/0x330 fs/read_write.c:1058

Fixes: a50e233c50 ("net-gro: restore frag0 optimization")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:52:14 -07:00
David S. Miller c3bc6debb4 Merge branch 'Fixes-for-DSA-tagging-using-802-1Q'
Vladimir Oltean says:

====================
Fixes for DSA tagging using 802.1Q

During the prototyping for the "Decoupling PHYLINK from struct
net_device" patchset, the CPU port of the sja1105 driver was moved to a
different spot.  This uncovered an issue in the tag_8021q DSA code,
which used to work by mistake - the CPU port was the last hardware port
numerically, and this was masking an ordering issue which is very likely
to be seen in other drivers that make use of 802.1Q tags.

A question was also raised whether the VID numbers bear any meaning, and
the conclusion was that they don't, at least not in an absolute sense.
The second patch defines bit fields inside the DSA 802.1Q VID so that
tcpdump can decode it unambiguously (although the meaning is now clear
even by visual inspection).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:47:14 -07:00
Vladimir Oltean 0471dd429c net: dsa: tag_8021q: Create a stable binary format
Tools like tcpdump need to be able to decode the significance of fake
VLAN headers that DSA uses to separate switch ports.

But currently these have no global significance - they are simply an
ordered list of DSA_MAX_SWITCHES x DSA_MAX_PORTS numbers ending at 4095.

The reason why this is submitted as a fix is that the existing mapping
of VIDs should not enter into a stable kernel, so we can pretend that
only the new format exists. This way tcpdump won't need to try to make
something out of the VLAN tags on 5.2 kernels.

Fixes: f9bbe4477c ("net: dsa: Optional VLAN-based port separation for switches without tagging")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:47:14 -07:00
Ioana Ciornei d34d2baa91 net: dsa: tag_8021q: Change order of rx_vid setup
The 802.1Q tagging performs an unbalanced setup in terms of RX VIDs on
the CPU port. For the ingress path of a 802.1Q switch to work, the RX
VID of a port needs to be seen as tagged egress on the CPU port.

While configuring the other front-panel ports to be part of this VID,
for bridge scenarios, the untagged flag is applied even on the CPU port
in dsa_switch_vlan_add.  This happens because DSA applies the same flags
on the CPU port as on the (bridge-controlled) slave ports, and the
effect in this case is that the CPU port tagged settings get deleted.

Instead of fixing DSA by introducing a way to control VLAN flags on the
CPU port (and hence stop inheriting from the slave ports) - a hard,
perhaps intractable problem - avoid this situation by moving the setup
part of the RX VID on the CPU port after all the other front-panel ports
have been added to the VID.

Fixes: f9bbe4477c ("net: dsa: Optional VLAN-based port separation for switches without tagging")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:47:14 -07:00
Antoine Tenart 2180843721 net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
MVPP2_TXQ_SCHED_TOKEN_CNTR_REG() expects the logical queue id but
the current code is passing the global tx queue offset, so it ends
up writing to unknown registers (between 0x8280 and 0x82fc, which
seemed to be unused by the hardware). This fixes the issue by using
the logical queue id instead.

Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 14:31:13 -07:00
Young Xiao 9609dad263 ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
The TCP option parsing routines in tcp_parse_options function could
read one byte out of the buffer of the TCP options.

1         while (length > 0) {
2                 int opcode = *ptr++;
3                 int opsize;
4
5                 switch (opcode) {
6                 case TCPOPT_EOL:
7                         return;
8                 case TCPOPT_NOP:        /* Ref: RFC 793 section 3.1 */
9                         length--;
10                        continue;
11                default:
12                        opsize = *ptr++; //out of bound access

If length = 1, then there is an access in line2.
And another access is occurred in line 12.
This would lead to out-of-bound access.

Therefore, in the patch we check that the available data length is
larger enough to pase both TCP option code and size.

Signed-off-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 12:32:47 -07:00
David S. Miller 62851d71e7 Merge branch 'mlxsw-Two-small-fixes'
Ido Schimmel says:

====================
mlxsw: Two small fixes

Patch #1 from Jiri fixes an issue specific to Spectrum-2 where the
insertion of two identical flower filters with different priorities
would trigger a warning.

Patch #2 from Amit prevents the driver from trying to configure a port
with a speed of 56Gb/s and autoneg off as this is not supported and
results in error messages from firmware.

Please consider patch #1 for stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 12:30:47 -07:00
Amit Cohen 275e928f19 mlxsw: spectrum: Prevent force of 56G
Force of 56G is not supported by hardware in Ethernet devices. This
configuration fails with a bad parameter error from firmware.

Add check of this case. Instead of trying to set 56G with autoneg off,
return a meaningful error.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 12:30:47 -07:00
Jiri Pirko ef74422020 mlxsw: spectrum_acl: Avoid warning after identical rules insertion
When identical rules are inserted, the latter one goes to C-TCAM. For
that, a second eRP with the same mask is created. These 2 eRPs by the
nature cannot be merged and also one cannot be parent of another.
Teach mlxsw_sp_acl_erp_delta_fill() about this possibility and handle it
gracefully.

Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Fixes: c22291f7cf ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 12:30:47 -07:00
Rasmus Villemoes 84b3fd1fc9 net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
Currently, the upper half of a 4-byte STATS_TYPE_PORT statistic ends
up in bits 47:32 of the return value, instead of bits 31:16 as they
should.

Fixes: 6e46e2d821 ("net: dsa: mv88e6xxx: Fix u64 statistics")
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30 12:28:06 -07:00