Commit Graph

681405 Commits

Author SHA1 Message Date
David S. Miller bf72acefeb ipv4: Export rtm_ipv4_policy.
The MPLS code now needs it.

Fixes: 397fc9e5ce ("mpls: route get support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 15:17:30 +01:00
Roopa Prabhu 397fc9e5ce mpls: route get support
This patch adds RTM_GETROUTE doit handler for mpls routes.

Input:
RTA_DST - input label
RTA_NEWDST - labels in packet for multipath selection

By default the getroute handler returns matched
nexthop label, via and oif

With RTM_F_FIB_MATCH flag, full matched route is
returned.

example (with patched iproute2):
$ip -f mpls route show
101
        nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
        nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12
201
        nexthop as to 202/203 via inet6 2001:db8:2::2 dev virt1-2
        nexthop as to 402/403 via inet6 2001:db8:12::2 dev virt1-12

$ip -f mpls route get 103
RTNETLINK answers: Network is unreachable

$ip -f mpls route get 101
101 as to 102/103 via inet 172.16.2.2 dev virt1-2

$ip -f mpls route get as to 302/303 101
101 as to 302/303 via inet 172.16.12.2 dev virt1-12

$ip -f mpls route get fibmatch 103
RTNETLINK answers: Network is unreachable

$ip -f mpls route get fibmatch 101
101
        nexthop as to 102/103 via inet 172.16.2.2 dev virt1-2
        nexthop as to 302/303 via inet 172.16.12.2 dev virt1-12

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:50:38 -07:00
Nikolay Aleksandrov 7597b266c5 bridge: allow ext learned entries to change ports
current code silently ignores change of port in the request
message. This patch makes sure the port is modified and
notification is sent to userspace.

Fixes: cf6b8e1eed ("bridge: add API to notify bridge driver of learned FBD on offloaded device")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:49:44 -07:00
Jamal Hadi Salim e05a90ec9e net: reflect mark on tcp syn ack packets
SYN-ACK responses on a server in response to a SYN from a client
did not get the injected skb mark that was tagged on the SYN packet.

Fixes: 84f39b08d7 ("net: support marking accepting TCP sockets")
Reviewed-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:46:10 -07:00
Sean Wang 8d32e06243 net: ethernet: mediatek: fixed deadlock captured by lockdep
Lockdep found an inconsistent lock state when mtk_get_stats64 is called
in user context while NAPI updates MAC statistics in softirq.

Use spin_trylock_bh/spin_unlock_bh fix following lockdep warning.

[   81.321030] WARNING: inconsistent lock state
[   81.325266] 4.12.0-rc1-00035-gd9dda65 #32 Not tainted
[   81.330273] --------------------------------
[   81.334505] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   81.340464] ksoftirqd/0/7 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   81.345731]  (&syncp->seq#2){+.?...}, at: [<c054ba3c>] mtk_handle_status_irq.part.6+0x70/0x84
[   81.354219] {SOFTIRQ-ON-W} state was registered at:
[   81.359062]   lock_acquire+0xfc/0x2b0
[   81.362696]   mtk_stats_update_mac+0x60/0x2c0
[   81.367017]   mtk_get_stats64+0x17c/0x18c
[   81.370995]   dev_get_stats+0x48/0xbc
[   81.374628]   rtnl_fill_stats+0x48/0x128
[   81.378520]   rtnl_fill_ifinfo+0x4ac/0xd1c
[   81.382584]   rtmsg_ifinfo_build_skb+0x7c/0xe0
[   81.386991]   rtmsg_ifinfo.part.5+0x24/0x54
[   81.391139]   rtmsg_ifinfo+0x24/0x28
[   81.394685]   __dev_notify_flags+0xa4/0xac
[   81.398749]   dev_change_flags+0x50/0x58
[   81.402640]   devinet_ioctl+0x768/0x85c
[   81.406444]   inet_ioctl+0x1a4/0x1d0
[   81.409990]   sock_ioctl+0x16c/0x33c
[   81.413538]   do_vfs_ioctl+0xb4/0xa34
[   81.417169]   SyS_ioctl+0x44/0x6c
[   81.420458]   ret_fast_syscall+0x0/0x1c
[   81.424260] irq event stamp: 3354692
[   81.427806] hardirqs last  enabled at (3354692): [<c0678168>] net_rx_action+0xc0/0x504
[   81.435660] hardirqs last disabled at (3354691): [<c0678134>] net_rx_action+0x8c/0x504
[   81.443515] softirqs last  enabled at (3354106): [<c0101944>] __do_softirq+0x4b4/0x614
[   81.451370] softirqs last disabled at (3354109): [<c012f0c4>] run_ksoftirqd+0x44/0x80
[   81.459134]
[   81.459134] other info that might help us debug this:
[   81.465608]  Possible unsafe locking scenario:
[   81.465608]
[   81.471478]        CPU0
[   81.473900]        ----
[   81.476321]   lock(&syncp->seq#2);
[   81.479701]   <Interrupt>
[   81.482294]     lock(&syncp->seq#2);
[   81.485847]
[   81.485847]  *** DEADLOCK ***
[   81.485847]
[   81.491720] 1 lock held by ksoftirqd/0/7:
[   81.495693]  #0:  (&(&mac->hw_stats->stats_lock)->rlock){+.+...}, at: [<c054ba14>] mtk_handle_status_irq.part.6+0x48/0x84
[   81.506579]
[   81.506579] stack backtrace:
[   81.510904] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.12.0-rc1-00035-gd9dda65 #32
[   81.518668] Hardware name: Mediatek Cortex-A7 (Device Tree)
[   81.524208] [<c0113dc4>] (unwind_backtrace) from [<c010e3f0>] (show_stack+0x20/0x24)
[   81.531899] [<c010e3f0>] (show_stack) from [<c03f9c64>] (dump_stack+0xb4/0xe0)
[   81.539072] [<c03f9c64>] (dump_stack) from [<c017e970>] (print_usage_bug+0x234/0x2e0)
[   81.546846] [<c017e970>] (print_usage_bug) from [<c017f058>] (mark_lock+0x63c/0x7bc)
[   81.554532] [<c017f058>] (mark_lock) from [<c017fe90>] (__lock_acquire+0x654/0x1bfc)
[   81.562217] [<c017fe90>] (__lock_acquire) from [<c0181d04>] (lock_acquire+0xfc/0x2b0)
[   81.569990] [<c0181d04>] (lock_acquire) from [<c054b76c>] (mtk_stats_update_mac+0x60/0x2c0)
[   81.578283] [<c054b76c>] (mtk_stats_update_mac) from [<c054ba3c>] (mtk_handle_status_irq.part.6+0x70/0x84)
[   81.587865] [<c054ba3c>] (mtk_handle_status_irq.part.6) from [<c054c2b8>] (mtk_napi_tx+0x358/0x37c)
[   81.596845] [<c054c2b8>] (mtk_napi_tx) from [<c06782ec>] (net_rx_action+0x244/0x504)
[   81.604533] [<c06782ec>] (net_rx_action) from [<c01015c4>] (__do_softirq+0x134/0x614)
[   81.612306] [<c01015c4>] (__do_softirq) from [<c012f0c4>] (run_ksoftirqd+0x44/0x80)
[   81.619907] [<c012f0c4>] (run_ksoftirqd) from [<c0154680>] (smpboot_thread_fn+0x14c/0x25c)
[   81.628110] [<c0154680>] (smpboot_thread_fn) from [<c014f8cc>] (kthread+0x150/0x180)
[   81.635798] [<c014f8cc>] (kthread) from [<c0109290>] (ret_from_fork+0x14/0x24)

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:43:38 -07:00
David S. Miller 2671e9fc62 Merge branch 'ipv4-ipv6-refcount_t'
Elena Reshetova says:

====================
v2 ipv4/ipv6 refcount conversions

Changes in v2:
 * rebase on top of net-next
 * currently by default refcount_t = atomic_t (*) and uses all
   atomic standard operations unless CONFIG_REFCOUNT_FULL is enabled.
   This is a compromise for the systems that are critical on
   performance (such as net) and cannot accept even slight delay
   on the refcounter operations.

This series, for ipv4/ipv6 network components, replaces atomic_t reference
counters with the new refcount_t type and API (see include/linux/refcount.h).
By doing this we prevent intentional or accidental
underflows or overflows that can led to use-after-free vulnerabilities.

The patches are fully independent and can be cherry-picked separately.
In order to try with refcount functionality enabled in run-time,
CONFIG_REFCOUNT_FULL must be enabled.

NOTE: automatic kernel builder for some reason doesn't like all my
network branches and regularly times out the builds on these branches.
Suggestion for "waiting a day for a good coverage" doesn't work, as
we have seen with generic network conversions. So please wait for the
full report from kernel test rebot before merging further up.
This has been compile-tested in 116 configs, but 71 timed out (including
all s390-related configs again). I am trying to see if they can fix
build coverage for me in meanwhile.

* The respective change is currently merged into -next as
  "locking/refcount: Create unchecked atomic_t implementation".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:05 -07:00
Reshetova, Elena 0029c0deb5 net, ipv4: convert fib_info.fib_clntref from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena f6a6fede28 net, ipv4: convert cipso_v4_doi.refcount from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena 87078f26b6 net, ipv6: convert ip6addrlbl_entry.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena d12f3827e0 net, ipv6: convert xfrm6_tunnel_spi.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena affa78bc6a net, ipv6: convert ifacaddr6.aca_refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena d3981bc615 net, ipv6: convert ifmcaddr6.mca_refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena 271201c09c net, ipv6: convert inet6_ifaddr.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena 1be9246077 net, ipv6: convert inet6_dev.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:04 -07:00
Reshetova, Elena 0aeea21ada net, ipv6: convert ipv6_txoptions.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 01:29:03 -07:00
Michal Kalderon 25f4535a94 qed: initialize ll2_syn_handle at start of function
Fix compilation warning
qed_iwarp.c:1721:5: warning: ll2_syn_handle may be used
uninitialized in this function

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 14:23:53 -07:00
Daniel Axtens 52427fa063 openvswitch: fix mis-ordered comment lines for ovs_skb_cb
I was trying to wrap my head around meaning of mru, and realised
that the second line of the comment defining it had somehow
ended up after the line defining cutlen, leading to much confusion.

Reorder the lines to make sense.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 05:54:00 -07:00
David S. Miller bcdb9e8523 wireless-drivers-next patches for 4.13
Last minute changes to get new hardware and firmware support for
 iwlwifi and few other changes I was able to squeeze in. Also two
 patches for ieee80211.h and nl80211 as Johannes is away.
 
 Major changes:
 
 iwlwifi
 
 * some important fixes for 9000 HW
 
 * support for version 30 of the FW API for 8000 and 9000 series
 
 * a few new PCI IDs for 9000 series
 
 * reorganization of common files
 
 brcmfmac
 
 * support 4-way handshake offloading for WPA/WPA2-PSK and 802.1X
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZWiolAAoJEG4XJFUm622bDq4H/0hezP1JZ7ljQ5+w6PCNNij9
 o1+MSjMc3Xtj+uQsQOOReFCfFyUQDwsa8YxghCPjjQJvNJOGfChOS3yszFKRpn94
 +IVaAbKO+9yPa8lwh8l5RPls3wP2MF6UL/QiArPxlo0OFZy0PYUsl0vZGiOFWTKe
 efoUG4mt/JMAPUcffoOcnitNYrys+scVqyTt1lOzIEUnivafBrsHE+srGzgse00S
 Wc9F10f+tJNqs9xCXYam5OaE9DSVobvUiYfFC1Ql1amYCfsDHJ3ve0gDHqxEE6HL
 OwK3pM9F7dUFLHWf/I2kjy1NNqVu3aoL4RAJnMguQih3scdCTOh6Tk7Or1YAPRM=
 =EOqs
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2017-07-03' of https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.13

Last minute changes to get new hardware and firmware support for
iwlwifi and few other changes I was able to squeeze in. Also two
patches for ieee80211.h and nl80211 as Johannes is away.

Major changes:

iwlwifi

* some important fixes for 9000 HW

* support for version 30 of the FW API for 8000 and 9000 series

* a few new PCI IDs for 9000 series

* reorganization of common files

brcmfmac

* support 4-way handshake offloading for WPA/WPA2-PSK and 802.1X
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 05:51:45 -07:00
David S. Miller 3a3f7d130e Merge https://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some overlapping changes in the mlx5 driver.

A merge conflict resolution posted by Stephen Rothwell was used as a
guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 03:42:10 -07:00
Eric Dumazet 784c372a81 net: make sk_ehashfn() static
sk_ehashfn() is only used from a single file.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 03:29:14 -07:00
Eric Dumazet 5361e209dd net: avoid one splat in fib_nl_delrule()
We need to use refcount_set() on a newly created rule to avoid
following error :

[   64.601749] ------------[ cut here ]------------
[   64.601757] WARNING: CPU: 0 PID: 6476 at lib/refcount.c:184 refcount_sub_and_test+0x75/0xa0
[   64.601758] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
[   64.601769] CPU: 0 PID: 6476 Comm: ip Tainted: G        W       4.12.0-smp-DEV #274
[   64.601771] task: ffff8837bf482040 task.stack: ffff8837bdc08000
[   64.601773] RIP: 0010:refcount_sub_and_test+0x75/0xa0
[   64.601774] RSP: 0018:ffff8837bdc0f5c0 EFLAGS: 00010286
[   64.601776] RAX: 0000000000000026 RBX: 0000000000000001 RCX: 0000000000000000
[   64.601777] RDX: 0000000000000026 RSI: 0000000000000096 RDI: ffffed06f7b81eae
[   64.601778] RBP: ffff8837bdc0f5d0 R08: 0000000000000004 R09: fffffbfff4a54c25
[   64.601779] R10: 00000000cbc500e5 R11: ffffffffa52a6128 R12: ffff881febcf6f24
[   64.601779] R13: ffff881fbf4eaf00 R14: ffff881febcf6f80 R15: ffff8837d7a4ed00
[   64.601781] FS:  00007ff5a2f6b700(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
[   64.601782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.601783] CR2: 00007ffcdc70d000 CR3: 0000001f9c91e000 CR4: 00000000001406f0
[   64.601783] Call Trace:
[   64.601786]  refcount_dec_and_test+0x11/0x20
[   64.601790]  fib_nl_delrule+0xc39/0x1630
[   64.601793]  ? is_bpf_text_address+0xe/0x20
[   64.601795]  ? fib_nl_newrule+0x25e0/0x25e0
[   64.601798]  ? depot_save_stack+0x133/0x470
[   64.601801]  ? ns_capable+0x13/0x20
[   64.601803]  ? __netlink_ns_capable+0xcc/0x100
[   64.601806]  rtnetlink_rcv_msg+0x23a/0x6a0
[   64.601808]  ? rtnl_newlink+0x1630/0x1630
[   64.601811]  ? memset+0x31/0x40
[   64.601813]  netlink_rcv_skb+0x2d7/0x440
[   64.601815]  ? rtnl_newlink+0x1630/0x1630
[   64.601816]  ? netlink_ack+0xaf0/0xaf0
[   64.601818]  ? kasan_unpoison_shadow+0x35/0x50
[   64.601820]  ? __kmalloc_node_track_caller+0x4c/0x70
[   64.601821]  rtnetlink_rcv+0x28/0x30
[   64.601823]  netlink_unicast+0x422/0x610
[   64.601824]  ? netlink_attachskb+0x650/0x650
[   64.601826]  netlink_sendmsg+0x7b7/0xb60
[   64.601828]  ? netlink_unicast+0x610/0x610
[   64.601830]  ? netlink_unicast+0x610/0x610
[   64.601832]  sock_sendmsg+0xba/0xf0
[   64.601834]  ___sys_sendmsg+0x6a9/0x8c0
[   64.601835]  ? copy_msghdr_from_user+0x520/0x520
[   64.601837]  ? __alloc_pages_nodemask+0x160/0x520
[   64.601839]  ? memcg_write_event_control+0xd60/0xd60
[   64.601841]  ? __alloc_pages_slowpath+0x1d50/0x1d50
[   64.601843]  ? kasan_slab_free+0x71/0xc0
[   64.601845]  ? mem_cgroup_commit_charge+0xb2/0x11d0
[   64.601847]  ? lru_cache_add_active_or_unevictable+0x7d/0x1a0
[   64.601849]  ? __handle_mm_fault+0x1af8/0x2810
[   64.601851]  ? may_open_dev+0xc0/0xc0
[   64.601852]  ? __pmd_alloc+0x2c0/0x2c0
[   64.601853]  ? __fdget+0x13/0x20
[   64.601855]  __sys_sendmsg+0xc6/0x150
[   64.601856]  ? __sys_sendmsg+0xc6/0x150
[   64.601857]  ? SyS_shutdown+0x170/0x170
[   64.601859]  ? handle_mm_fault+0x28a/0x650
[   64.601861]  SyS_sendmsg+0x12/0x20
[   64.601863]  entry_SYSCALL_64_fastpath+0x13/0x94

Fixes: 717d1e993a ("net: convert fib_rule.refcnt from atomic_t to refcount_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 03:29:14 -07:00
Zhu Yanjun 3b68067bd2 mlx4_en: make mlx4_log_num_mgm_entry_size static
The variable mlx4_log_num_mgm_entry_size is only called in main.c.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:41:26 -07:00
Alban Browaeys 9af9959e14 net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64
commit 9256645af0 ("net/core: relax BUILD_BUG_ON in
netdev_stats_to_stats64") made an attempt to read beyond
the size of the source a possibility.

Fix to only copy src size to dest. As dest might be bigger than src.

 ==================================================================
 BUG: KASAN: slab-out-of-bounds in netdev_stats_to_stats64+0xe/0x30 at addr ffff8801be248b20
 Read of size 192 by task VBoxNetAdpCtl/6734
 CPU: 1 PID: 6734 Comm: VBoxNetAdpCtl Tainted: G           O    4.11.4prahal+intel+ #118
 Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET52WW (1.32 ) 05/04/2017
 Call Trace:
  dump_stack+0x63/0x86
  kasan_object_err+0x1c/0x70
  kasan_report+0x270/0x520
  ? netdev_stats_to_stats64+0xe/0x30
  ? sched_clock_cpu+0x1b/0x190
  ? __module_address+0x3e/0x3b0
  ? unwind_next_frame+0x1ea/0xb00
  check_memory_region+0x13c/0x1a0
  memcpy+0x23/0x50
  netdev_stats_to_stats64+0xe/0x30
  dev_get_stats+0x1b9/0x230
  rtnl_fill_stats+0x44/0xc00
  ? nla_put+0xc6/0x130
  rtnl_fill_ifinfo+0xe9e/0x3700
  ? rtnl_fill_vfinfo+0xde0/0xde0
  ? sched_clock+0x9/0x10
  ? sched_clock+0x9/0x10
  ? sched_clock_local+0x120/0x130
  ? __module_address+0x3e/0x3b0
  ? unwind_next_frame+0x1ea/0xb00
  ? sched_clock+0x9/0x10
  ? sched_clock+0x9/0x10
  ? sched_clock_cpu+0x1b/0x190
  ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
  ? depot_save_stack+0x1d8/0x4a0
  ? depot_save_stack+0x34f/0x4a0
  ? depot_save_stack+0x34f/0x4a0
  ? save_stack+0xb1/0xd0
  ? save_stack_trace+0x16/0x20
  ? save_stack+0x46/0xd0
  ? kasan_slab_alloc+0x12/0x20
  ? __kmalloc_node_track_caller+0x10d/0x350
  ? __kmalloc_reserve.isra.36+0x2c/0xc0
  ? __alloc_skb+0xd0/0x560
  ? rtmsg_ifinfo_build_skb+0x61/0x120
  ? rtmsg_ifinfo.part.25+0x16/0xb0
  ? rtmsg_ifinfo+0x47/0x70
  ? register_netdev+0x15/0x30
  ? vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp]
  ? vboxNetAdpCreate+0x210/0x400 [vboxnetadp]
  ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
  ? do_vfs_ioctl+0x17f/0xff0
  ? SyS_ioctl+0x74/0x80
  ? do_syscall_64+0x182/0x390
  ? __alloc_skb+0xd0/0x560
  ? __alloc_skb+0xd0/0x560
  ? save_stack_trace+0x16/0x20
  ? init_object+0x64/0xa0
  ? ___slab_alloc+0x1ae/0x5c0
  ? ___slab_alloc+0x1ae/0x5c0
  ? __alloc_skb+0xd0/0x560
  ? sched_clock+0x9/0x10
  ? kasan_unpoison_shadow+0x35/0x50
  ? kasan_kmalloc+0xad/0xe0
  ? __kmalloc_node_track_caller+0x246/0x350
  ? __alloc_skb+0xd0/0x560
  ? kasan_unpoison_shadow+0x35/0x50
  ? memset+0x31/0x40
  ? __alloc_skb+0x31f/0x560
  ? napi_consume_skb+0x320/0x320
  ? br_get_link_af_size_filtered+0xb7/0x120 [bridge]
  ? if_nlmsg_size+0x440/0x630
  rtmsg_ifinfo_build_skb+0x83/0x120
  rtmsg_ifinfo.part.25+0x16/0xb0
  rtmsg_ifinfo+0x47/0x70
  register_netdevice+0xa2b/0xe50
  ? __kmalloc+0x171/0x2d0
  ? netdev_change_features+0x80/0x80
  register_netdev+0x15/0x30
  vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp]
  vboxNetAdpCreate+0x210/0x400 [vboxnetadp]
  ? vboxNetAdpComposeMACAddress+0x1d0/0x1d0 [vboxnetadp]
  ? kasan_check_write+0x14/0x20
  VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
  ? VBoxNetAdpLinuxOpen+0x20/0x20 [vboxnetadp]
  ? lock_acquire+0x11c/0x270
  ? __audit_syscall_entry+0x2fb/0x660
  do_vfs_ioctl+0x17f/0xff0
  ? __audit_syscall_entry+0x2fb/0x660
  ? ioctl_preallocate+0x1d0/0x1d0
  ? __audit_syscall_entry+0x2fb/0x660
  ? kmem_cache_free+0xb2/0x250
  ? syscall_trace_enter+0x537/0xd00
  ? exit_to_usermode_loop+0x100/0x100
  SyS_ioctl+0x74/0x80
  ? do_sys_open+0x350/0x350
  ? do_vfs_ioctl+0xff0/0xff0
  do_syscall_64+0x182/0x390
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: 0033:0x7f7e39a1ae07
 RSP: 002b:00007ffc6f04c6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007ffc6f04c730 RCX: 00007f7e39a1ae07
 RDX: 00007ffc6f04c730 RSI: 00000000c0207601 RDI: 0000000000000007
 RBP: 00007ffc6f04c700 R08: 00007ffc6f04c780 R09: 0000000000000008
 R10: 0000000000000541 R11: 0000000000000206 R12: 0000000000000007
 R13: 00000000c0207601 R14: 00007ffc6f04c730 R15: 0000000000000012
 Object at ffff8801be248008, in cache kmalloc-4096 size: 4096
 Allocated:
 PID = 6734
  save_stack_trace+0x16/0x20
  save_stack+0x46/0xd0
  kasan_kmalloc+0xad/0xe0
  __kmalloc+0x171/0x2d0
  alloc_netdev_mqs+0x8a7/0xbe0
  vboxNetAdpOsCreate+0x65/0x1c0 [vboxnetadp]
  vboxNetAdpCreate+0x210/0x400 [vboxnetadp]
  VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp]
  do_vfs_ioctl+0x17f/0xff0
  SyS_ioctl+0x74/0x80
  do_syscall_64+0x182/0x390
  return_from_SYSCALL_64+0x0/0x6a
 Freed:
 PID = 5600
  save_stack_trace+0x16/0x20
  save_stack+0x46/0xd0
  kasan_slab_free+0x73/0xc0
  kfree+0xe4/0x220
  kvfree+0x25/0x30
  single_release+0x74/0xb0
  __fput+0x265/0x6b0
  ____fput+0x9/0x10
  task_work_run+0xd5/0x150
  exit_to_usermode_loop+0xe2/0x100
  do_syscall_64+0x26c/0x390
  return_from_SYSCALL_64+0x0/0x6a
 Memory state around the buggy address:
  ffff8801be248a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801be248b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff8801be248b80: 00 00 00 00 00 00 00 00 00 00 00 07 fc fc fc fc
                                                     ^
  ffff8801be248c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801be248c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ==================================================================

Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:40:23 -07:00
Christos Gkekas 5614fd84eb netxen_nic: Remove unused pointer hdr in netxen_setup_minidump()
Pointer hdr in netxen_setup_minidump() is set but never used, thus
should be removed.

Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:38:02 -07:00
David S. Miller 4b821cb200 Merge branch 'vxlan-geneve-fix-hlist-corruption'
Jiri Benc says:

====================
vxlan, geneve: fix hlist corruption

Fix memory corruption introduced with the support of both IPv4 and IPv6
sockets in a single device. The same bug is present in VXLAN and Geneve.
====================

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:42 -07:00
Jiri Benc 4b4c21fad6 geneve: fix hlist corruption
It's not a good idea to add the same hlist_node to two different hash lists.
This leads to various hard to debug memory corruptions.

Fixes: 8ed66f0e82 ("geneve: implement support for IPv6-based tunnels")
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:27 -07:00
Jiri Benc 69e766612c vxlan: fix hlist corruption
It's not a good idea to add the same hlist_node to two different hash lists.
This leads to various hard to debug memory corruptions.

Fixes: b1be00a6c3 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:27 -07:00
Or Gerlitz c1c1d86bde net/mlxfw: Properly handle dependancy with non-loadable mlx5
If mlx5 is set to be built-in and mlxfw as a module, we
get a link error:

drivers/built-in.o: In function `mlx5_firmware_flash':
(.text+0x5aed72): undefined reference to `mlxfw_firmware_flash'

Since we don't want to mandate selecting mlxfw for mlx5 users, we
use the IS_REACHABLE macro to make sure that a stub is exposed
to the caller.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:32:25 -07:00
David S. Miller b2c9c5df66 iucv: Convert sk_wmem_alloc accesses to refcount_t.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:31:22 -07:00
David S. Miller bba5850c8b ctcm_fsms: Convert skb->user accesses to refcount_t
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:29:57 -07:00
David S. Miller 63d7c880c5 Merge branch 'bpf-misc-helper-verifier-improvements'
Daniel Borkmann says:

====================
Misc BPF helper/verifier improvements

Miscellanous improvements I still had in my queue, it adds a new
bpf_skb_adjust_room() helper for cls_bpf, exports to fdinfo whether
tail call array owner is JITed, so iproute2 error reporting can be
improved on that regard, a small cleanup and extension to trace
printk, two verifier patches, one to make the code around narrower
ctx access a bit more straight forward and one to allow for imm += x
operations, that we've seen LLVM generating and the verifier currently
rejecting. We've included the patch 6 given it's rather small and
we ran into it from LLVM side, it would be great if it could be
queued for stable as well after the merge window. Last but not least,
test cases are added also related to imm alu improvement.

Thanks a lot!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:53 -07:00
Daniel Borkmann 6d191ed40d bpf: add various test cases for verifier selftest
Add couple of verifier test cases for x|imm += pkt_ptr, including the
imm += x extension.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
John Fastabend 43188702b3 bpf, verifier: add additional patterns to evaluate_reg_imm_alu
Currently the verifier does not track imm across alu operations when
the source register is of unknown type. This adds additional pattern
matching to catch this and track imm. We've seen LLVM generating this
pattern while working on cilium.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
John Fastabend 7bda4b40c5 bpf: extend bpf_trace_printk to support %i
Currently, bpf_trace_printk does not support common formatting
symbol '%i' however vsprintf does and is what eventually gets
called by bpf helper. If users are used to '%i' and currently
make use of it, then bpf_trace_printk will just return with
error without dumping anything to the trace pipe, so just add
support for '%i' to the helper.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann 9780c0ab1a bpf: export whether tail call has jited owner
We do export through fdinfo already whether a prog is JITed or not,
given a program load can fail in case of either prog or tail call map
has JITed property, but neither both are JITed or not JITed, we can
facilitate error reporting in loaders like iproute2 through exporting
owner_jited of tail call map. We already do export owner_prog_type
through this facility, so parser can pick up both for comparison.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann f96da09473 bpf: simplify narrower ctx access
This work tries to make the semantics and code around the
narrower ctx access a bit easier to follow. Right now
everything is done inside the .is_valid_access(). Offset
matching is done differently for read/write types, meaning
writes don't support narrower access and thus matching only
on offsetof(struct foo, bar) is enough whereas for read
case that supports narrower access we must check for
offsetof(struct foo, bar) + offsetof(struct foo, bar) +
sizeof(<bar>) - 1 for each of the cases. For read cases of
individual members that don't support narrower access (like
packet pointers or skb->cb[] case which has its own narrow
access logic), we check as usual only offsetof(struct foo,
bar) like in write case. Then, for the case where narrower
access is allowed, we also need to set the aux info for the
access. Meaning, ctx_field_size and converted_op_size have
to be set. First is the original field size e.g. sizeof(<bar>)
as in above example from the user facing ctx, and latter
one is the target size after actual rewrite happened, thus
for the kernel facing ctx. Also here we need the range match
and we need to keep track changing convert_ctx_access() and
converted_op_size from is_valid_access() as both are not at
the same location.

We can simplify the code a bit: check_ctx_access() becomes
simpler in that we only store ctx_field_size as a meta data
and later in convert_ctx_accesses() we fetch the target_size
right from the location where we do convert. Should the verifier
be misconfigured we do reject for BPF_WRITE cases or target_size
that are not provided. For the subsystems, we always work on
ranges in is_valid_access() and add small helpers for ranges
and narrow access, convert_ctx_accesses() sets target_size
for the relevant instruction.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann 2be7e212d5 bpf: add bpf_skb_adjust_room helper
This work adds a helper that can be used to adjust net room of an
skb. The helper is generic and can be further extended in future.
Main use case is for having a programmatic way to add/remove room to
v4/v6 header options along with cls_bpf on egress and ingress hook
of the data path. It reuses most of the infrastructure that we added
for the bpf_skb_change_type() helper which can be used in nat64
translations. Similarly, the helper only takes care of adjusting the
room so that related data is populated and csum adapted out of the
BPF program using it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:52 -07:00
Daniel Borkmann 0daf434940 bpf, net: add skb_mac_header_len helper
Add a small skb_mac_header_len() helper similarly as the
skb_network_header_len() we have and replace open coded
places in BPF's bpf_skb_change_proto() helper. Will also
be used in upcoming work.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:22:51 -07:00
Tore Anderson a68491f895 net: cdc_mbim: apply "NDP to end" quirk to HP lt4132
The HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) is a rebranded Huawei
ME906s-158 device. It, like the ME906s-158, requires the "NDP to end"
quirk for correct operation.

Signed-off-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:19:36 -07:00
Matteo Croce 75674c4cbb Documentation: fix wrong example command
In the IPVLAN documentation there is an example command line where the
master and slave interface names are inverted.
Fix the command line and also add the optional `name' keyword to better
describe what the command is doing.

v2: added commit message

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:08:34 -07:00
Sabrina Dubroca 889ce937c9 vxlan: correctly set vxlan->net when creating the device in a netns
Commit a985343ba9 ("vxlan: refactor verification and application of
configuration") modified vxlan device creation, and replaced the
assignment of vxlan->net to src_net with dev_net(netdev) in ->setup().

But dev_net(netdev) is not the same as src_net. At the time ->setup()
is called, dev_net hasn't been set yet, so we end up creating the
socket for the vxlan device in init_net.

Fix this by bringing back the assignment of vxlan->net during device
creation.

Fixes: a985343ba9 ("vxlan: refactor verification and application of configuration")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:04:10 -07:00
David S. Miller c3b99db809 Merge branch 'hns-phy-loopback'
Lin Yun Sheng says:

====================
Add loopback support in phy_driver and hns ethtool fix

This Patch Set add set_loopback in phy_driver and use it to setup loopback
when doing ethtool phy self_test.

Patch V8:
	Respin the Patch based on net-next

Patch V7:
	1. Add comment why resume the phy in hns_nic_config_phy_loopback.
	2. Fix a typo error in patch description.

Patch V6:
	Fix Or'ing error code in __lb_setup.

Patch V5:
	Removing non loopback related code change.

Patch V4:
	1. Remove c45 checking
	2. Add -ENOTSUPP when function pointer is null,
	   take mutex in phy_loopback.

Patch V3:
	Calling phy_loopback enable and disable in pair in hns mac driver.

Patch V2:
	1. Add phy_loopback in phy_device.c.
	2. Do error checking and do the read and write once in
	   genphy_loopback.
	3. Remove gen10g_loopback in phy_device.c.

Patch V1:
	Initial Submit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:01:16 -07:00
Lin Yun Sheng 67cd9a997f net: hns: Use phy_driver to setup Phy loopback
Use function set_loopback in phy_driver to setup phy loopback
when doing ethtool self test.

Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:01:15 -07:00
Lin Yun Sheng f0f9b4ed23 net: phy: Add phy loopback support in net phy framework
This patch add set_loopback in phy_driver, which is used by MAC
driver to enable or disable phy loopback. it also add a generic
genphy_loopback function, which use BMCR loopback bit to enable
or disable loopback.

Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:01:15 -07:00
Stephen Rothwell 6992c6c5dd net/mlx5: fix memcpy limit?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:57:27 -07:00
Sabrina Dubroca ec8add2a4c ipv6: dad: don't remove dynamic addresses if link is down
Currently, when the link for $DEV is down, this command succeeds but the
address is removed immediately by DAD (1):

    ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

In the same situation, this will succeed and not remove the address (2):

    ip addr add 1111::12/64 dev $DEV
    ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

The comment in addrconf_dad_begin() when !IF_READY makes it look like
this is the intended behavior, but doesn't explain why:

     * If the device is not ready:
     * - keep it tentative if it is a permanent address.
     * - otherwise, kill it.

We clearly cannot prevent userspace from doing (2), but we can make (1)
work consistently with (2).

addrconf_dad_stop() is only called in two cases: if DAD failed, or to
skip DAD when the link is down. In that second case, the fix is to avoid
deleting the address, like we already do for permanent addresses.

Fixes: 3c21edbd11 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:53:51 -07:00
Jim Baxter e1069bbfcf net: cdc_ncm: Reduce memory use when kernel memory low
The CDC-NCM driver can require large amounts of memory to create
skb's and this can be a problem when the memory becomes fragmented.

This especially affects embedded systems that have constrained
resources but wish to maximise the throughput of CDC-NCM with 16KiB
NTB's.

The issue is after running for a while the kernel memory can become
fragmented and it needs compacting.
If the NTB allocation is needed before the memory has been compacted
the atomic allocation can fail which can cause increased latency,
large re-transmissions or disconnections depending upon the data
being transmitted at the time.
This situation occurs for less than a second until the kernel has
compacted the memory but the failed devices can take a lot longer to
recover from the failed TX packets.

To ease this temporary situation I modified the CDC-NCM TX path to
temporarily switch into a reduced memory mode which allocates an NTB
that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
sized memory block and only transmit NTB's with a single network frame
until the memory situation is resolved.
Each time this issue occurs we wait for an increasing number of
reduced size allocations before requesting a full size one to not
put additional pressure on a low memory system.

Once the memory is compacted the CDC-NCM data can resume transmitting
at the normal tx_max rate once again.

Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:50:49 -07:00
David S. Miller 2da95be940 Merge branch 'qed-Add-iWARP-support-for-QL4xxxx'
Michal Kalderon says:

====================
qed: Add iWARP support for QL4xxxx

This patch series adds iWARP support to our QL4xxxx networking adapters.
The code changes span across qed and qedr drivers, but this series contains
changes to qed only. Once the series is accepted, the qedr series will
be submitted to the rdma tree.
There is one additional qed patch which enables the iWARP, this patch is
delayed until the qedr series will be accepted.

The patches were previously sent as an RFC, and these are the first 12
patches in the RFC series:
https://www.spinics.net/lists/linux-rdma/msg51416.html

This series was tested and built against net-next.

MAINTAINERS file is not updated in this PATCH as there is a pending patch
for qedr driver update https://patchwork.kernel.org/patch/9752761.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:46 -07:00
Kalderon, Michal 93c45984d3 qed: Add iWARP support for physical queue allocation
iWARP has different physical queue requirements than RoCE

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00
Kalderon, Michal 5d7dc9620d qed: Add iWARP protocol support in context allocation
When computing how much memory is required for the different hw clients
iWARP protocol should be taken into account

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 01:43:45 -07:00