Commit Graph

616911 Commits

Author SHA1 Message Date
Jiri Kosina 69012ae425 net: sched: fix handling of singleton qdiscs with qdisc_hash
qdisc_match_from_root() is now iterating over per-netdevice qdisc
hashtable instead of going through a linked-list of qdiscs (independently
on the actual underlying netdev), which was the case before the switch to
hashtable for qdiscs.

For singleton qdiscs, there is no underlying netdev associated though, and
therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will
always be NULL.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
 IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
 PGD 1aceba067 PUD 1aceb7067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP
[ ... ]
 task: ffff8801ec996e00 task.stack: ffff8801ec934000
 RIP: 0010:[<ffffffff8167efac>]  [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
 RSP: 0018:ffff8801ec937ab0  EFLAGS: 00010203
 RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
 RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
 RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
 R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
 R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
 FS:  00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
 Stack:
  ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
  ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
  ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
 Call Trace:
  [<ffffffff81681210>] qdisc_lookup+0x40/0x50
  [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
  [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
  [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
  [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
  [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
  [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
  [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
  [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
  [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
  [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
  [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
  [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
  [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
  [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
  [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
  [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
  [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
  [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
  [<ffffffff810038e7>] do_syscall_64+0x57/0xb0

Fix this by special-casing singleton qdiscs (those that don't have
underlying netdevice) and introduce immediate handling of those rather
than trying to go over an underlying netdevice. We're in the same
situation in tc_dump_qdisc_root() and tc_dump_tclass_root().

Ultimately, this will have to be slightly reworked so that we are actually
able to show singleton qdiscs (noop) in the dump properly; but we're not
currently doing that anyway, so no regression there, and better do this in
a gradual manner.

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:19:08 -07:00
David S. Miller e951f145d1 Merge branch 'tipc-next'
Jon Maloy says:

====================
tipc: bearer and link improvements

The first commit makes it possible to set and check the 'blocked' state
of a bearer from the generic bearer layer. The second commit is a small
improvement to the link congestion mechanism.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:14:37 -07:00
Jon Paul Maloy 5a0950c272 tipc: ensure that link congestion and wakeup use same criteria
When a link is attempted woken up after congestion, it uses a different,
more generous criteria than when it was originally declared congested.
This has the effect that the link, and the sending process, sometimes
will be woken up unnecessarily, just to immediately return to congestion
when it turns out there is not not enough space in its send queue to
host the pending message. This is a waste of CPU cycles.

We now change the function link_prepare_wakeup() to use exactly the same
criteria as tipc_link_xmit(). However, since we are now excluding the
window limit from the wakeup calculation, and the current backlog limit
for the lowest level is too small to house even a single maximum-size
message, we have to expand this limit. We do this by evaluating an
alternative, minimum value during the setting of the importance limits.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:14:37 -07:00
Jon Paul Maloy 0d051bf93c tipc: make bearer packet filtering generic
In commit 5b7066c3dd ("tipc: stricter filtering of packets in bearer
layer") we introduced a method of filtering out messages while a bearer
is being reset, to avoid that links may be re-created and come back in
working state while we are still in the process of shutting them down.

This solution works well, but is limited to only work with L2 media, which
is insufficient with the increasing use of UDP as carrier media.

We now replace this solution with a more generic one, by introducing a
new flag "up" in the generic struct tipc_bearer. This field will be set
and reset at the same locations as with the previous solution, while
the packet filtering is moved to the generic code for the sending side.
On the receiving side, the filtering is still done in media specific
code, but now including the UDP bearer.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:14:36 -07:00
David S. Miller 37bd91d1d9 Merge branch 'qed-next'
Sudarsana Reddy Kalluru says:

====================
qed*: Add support for additional statistics.

The patch series adds qed/qede support for new statistics.
Patch (1) adds couple of statistcs for "ethtool -S" display.
Patch (2) adds support for per-queue statistics to ethtool display.
Patch (3) adds qed support for NCSI statistics.

Please consider applying this to 'net-next' branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:11:46 -07:00
Sudarsana Reddy Kalluru 6c75424612 qed: Add support for NCSI statistics.
The patch adds driver support for sending the NCSI statistics to the
MFW. This is an asynchronous request from MFW. Upon receiving this, driver
populates the required data and send it to MFW.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:11:45 -07:00
Sudarsana Reddy Kalluru 68db9ec2df qede: Add support for per-queue stats.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:11:45 -07:00
Sudarsana Reddy Kalluru 1a5a366f08 qede: Add support for capturing additional stats in ethtool-stats display.
The patch adds driver support for capturing stats ttl0_discard and
packet_too_big_discard in "ethtool -S" display.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:11:45 -07:00
Colin Ian King 0d135e4f26 net: atm: remove redundant null pointer check on dev->name
dev->name is a char array of IFNAMSIZ elements, hence can never be
null, so the null pointer check is redundant. Remove it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:03:48 -07:00
Appana Durga Kedareswara Rao e202d4c635 net: phy: Update copyright info
For implementing this driver most of the inputs is
provided by Andrew Lunn.

Updating the driver with Andrew Copy right.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 21:00:16 -07:00
shubhrajyoti.datta@xilinx.com aead88bd0e net: ethernet: macb: Add support for rx_clk
Some of the platforms like zynqmp ultrascale+ has a
separate clock gate for the rx clock. Add an optional
rx_clk so that the clock can be enabled.

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 20:58:42 -07:00
David S. Miller d52bfbda77 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-08-18

This series contains updates to i40e and i40evf only.

Wei Yongjun updates i40e to use list_move() instead of list_del() &
list_add() operations.

Anjali fixes an issue where the client->open call was not protected with
the client instance mutex, which allowed client->close to be called before
the open all completed.

Catherine makes sure that the VLAN count (and stats) gets reset to 0
after reset.

Jake provides two patches, first adds the needed rtnl lock around
i40evf_set_interrupt_capability() since i40evf_init_task() does not
hold the rtnl_lock.  Second fixes an issue where users could reduce
the number of channels (queues) below the current flow director
filter rules targets.

Dave fixes a problem where a static analysis tool generates a warning
so eliminating the irrelevant check and redundant assignment for the
value of enabled_tc.

Avinash fixes an sync issue where the iWARP device open is called
before the PCI register writes are completed, so ensure the register
writes complete before exiting the setup function.

Alan fixes a bug which causes RSS to continue to work after being
disabled.

Carolyn implements a feature change which allows using ethtool to set
RDD hash options using less than four parameters if desired.

Dan Carpenter cleans up a stray unlock.

Sridhar exposes the "trust" flag to userspace via ndo_get_vf_config().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 20:45:14 -07:00
Sridhar Samudrala d40062f3c4 i40e: Expose 'trust' flag to userspace via ndo_get_vf_config.
This enables
	ip -d l
to indicate if trust is on or off for VFs.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:14 -07:00
Dan Carpenter be0cb0a66a i40e: remove a stray unlock
We shifted the locking around a bit but forgot to delete this unlock so
now it can unlock twice.

Fixes: cd3be169a5 ('i40e: Move the mutex lock in i40e_client_unregister')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:14 -07:00
Bimmy Pujari 93e6fa2c34 i40e/i40evf-Bump version from 1.6.11 to 1.6.12
Signed-off-by: Bimmy Pujari <Bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:13 -07:00
Carolyn Wyborny eb0dd6e4a3 i40e: Allow RSS Hash set with less than four parameters
This patch implements a feature change which allows using ethtool to set
RSS hash opts using less than four parameters if desired.

Change-ID: I0fbb91255d81e997c456697c21ac39cc9754821b
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:13 -07:00
Mitch Williams b7d2cd951f i40e: fix memory leak
When we allocate memory, we must free it. It's simple courtesy.

Change-ID: Id007294096fb53344f1a8b9a0f78eddf9853c5d6
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:13 -07:00
Alan Brady d8ec986464 i40e: fix lookup table when RSS disabled/enabled
This patch fixes the bug which causes RSS to continue to work
after being disabled.  After disabling RSS, traffic would continue
to be assigned to different queues instead of falling back to a
single queue. Without this patch, attempting to disable RSS would
not work as expected. This patch fixes the bug by clearing the
lookup table used by RSS such that all traffic is assigned to a
single queue.  This patch also addresses the issue of reinstating
 the lookup table should RSS then be re-enabled.

Change-ID: Ib20c7c6a7e9f1f772bb787370f8a8c664796b141
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:12 -07:00
Avinash Dayanand 6a23449a23 i40e: Don't notify client of VF reset during VF creation
VF goes through reset path during VF creation which happens to also
have notification of VF reset to client. Adding conditional check to
avoid wrongly notifying VF reset during VF creation.

Also changing the call order of VF enable, calling it after VF creation
rather than before.

Change-ID: I96eabd99deae746a2f0fc465194c886f196178ce
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:12 -07:00
Avinash Dayanand 70df973b5e i40e: Force register writes to mitigate sync issues with iwarp VF driver
This patch is a fix for the bug i.e. unable to create iwarp device
in VF. This is a sync issue and the iwarp device open is called even
before the PCI register writes are done.

Forcing the PCI register writes to happen just before it exits the
function.

Change-ID: I60c6a2c709da89e845f2764cc50ce8b7373c8c44
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:12 -07:00
Jacob Keller 59826d9bec i40e: don't allow reduction of channels below active FD rules
If a driver is unable to maintain all current user supplied settings
from ethtool (or other sources), it is not ok for a user request to
succeed and silently trample over previous configuration.

To that end, if you change the number of channels, it must not be
allowed to reduce the number of channels (queues) below the current
flow director filter rules targets. In this case, return -EINVAL when
a request to reduce the number of channels would do so. In addition
log a warning to the kernel buffer explaining why we failed, and report
the rules which prevent us from lowering the number of channels.

Change-ID: If41464d63d7aab11cedf09e4f3aa1a69e21ffd88
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:11 -07:00
Dave Ertman 52a08caa0c i40e: Fix static analysis tool warning
This patch fixes a problem where a static analysis tool generates
a warning for "INVARIANT_CONDITION: Expression 'enabled_tc' used
in the condition always yields the same result."

Without this patch, the driver will not pass the static analysis
tool checks without generating warnings.

This patch fixes the problem by eliminating the irrelevant check
and redundant assignment for the value of enabled_tc.

Change-ID: Ia7d44cb050f507df7de333e96369d322e08bf408
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:11 -07:00
Jacob Keller 62fe2a865e i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability
The function calls netif_set_real_num_(tx|rx)_queues, both of which
should be done only under rntl lock. Unfortunately the
i40evf_init_task did not hold the rtnl_lock as necessary. This patch
adds the locking needed.

Change-ID: Ib72a21c3ce22b71a226b16f9bbe0f5f8cc3e849b
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:11 -07:00
Catherine Sullivan 42bce04ef3 i40e: reset RX csum error stat with other pf stats
When we are resetting the pf stats we should also reset the RX csum
error stat.

Change-ID: I7af5ee0ec81a10f6deee1a7b8c2082ea068ef620
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:11 -07:00
Catherine Sullivan dc5b4e9fad i40e/i40evf: Reset VLAN filter count when resetting
When we do a reset, all the VLAN filters get added again. Therefore we also
want to reset the VLAN count to 0 or we quickly run out of filters.

Change-ID: I459f26851e22204dc8b8999928ad87cde8170119
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:10 -07:00
Anjali Singhai Jain 3a0f52928a i40e: Fix a bug where a client close can be called before an open is complete
The client->open call in this path was not protected with the
client instance mutex, and hence the client->close can get initiated
before the open completes.

Change-Id: I0ed60c38868dd3f44966b6ed49a063d0e5b7edf5
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:10 -07:00
Wei Yongjun eb27163b2e i40e: Use list_move instead of list_del/list_add
Using list_move() instead of list_del() + list_add().

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-18 11:43:09 -07:00
Maor Gottlieb 87d22483ce net/mlx5: Add sniffer namespaces
Add sniffer TX and RX namespaces to receive ingoing and outgoing
traffic.

Each outgoing/incoming packet is duplicated and steered to the sniffer
TX/RX namespace in addition to the regular flow.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:59 +03:00
Maor Gottlieb cea824d416 net/mlx5: Introduce sniffer steering hardware capabilities
Define needed hardware capabilities for sniffer
RX and TX flow tables.

Add the following capabilities:
1. Sniffer RX flow table capabilities.
2. Sniffer TX flow table capabilities.
3. If same TIR can be used by multiple flow tables of different types.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:59 +03:00
Aviv Heller 917b41aab7 net/mlx5: Configure IB devices according to LAG state
When mlx5_ib is loaded, we would like each card's IB devices
to be added according to its LAG state (one IB device, instead of
two, is to be added if LAG is active).

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:58 +03:00
Aviv Heller 3bc34f3bcb net/mlx5: Vport LAG creation support
Add interfaces for issuing CREATE_VPORT_LAG and
DESTROY_VPORT_LAG commands.

Used for receiving PF1's eth traffic on PF0's
root ft.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller 3e75d4ebaa net/mlx5: Add LAG flow steering namespace
This namespace is used for LAG demux flowtable.

The idea is to position the LAG demux ft between
bypass and kernel flowtables, allowing raw-eth
traffic from both ports to be received by the PF0
IB device.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller aaff1bea16 net/mlx5: LAG demux flow table support
Add interfaces to allow the creation and destruction of a
LAG demux flow table.

It is a special flow table used during LAG for redirecting
non user-mode packets from PF0 to PF1 root ft, if a packet was
received on phys port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller edb31b1686 net/mlx5: LAG and SRIOV cannot be used together
Until support will be added for RoCE LAG SRIOV.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller db60b80273 net/mlx5e: Avoid port remapping of mlx5e netdev TISes
TISes belonging to the mlx5e NIC should not be
subject to port remap.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:55 +03:00
Aviv Heller 6a32047a44 net/mlx5: Get RoCE netdev
Used by IB driver for determining the IB bond
device's netdev, when LAG is active.

Returns PF0's netdev if mode is not active-backup,
or the PF netdev of the active slave when mode is
active-backup.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00
Aviv Heller 7907f23adc net/mlx5: Implement RoCE LAG feature
Available on dual port cards only, this feature keeps
track, using netdev LAG events, of the bonding
and link status of each port's PF netdev.

When both of the card's PF netdevs are enslaved to the
same bond/team master, and only them, LAG state
is active.

During LAG, only one IB device is present for both ports.

In addition to the above, this commit includes FW commands
used for managing the LAG, new facilities for adding and removing
a single device by interface, and port remap functionality according to
bond events.

Please note that this feature is currently used only for mimicking
Ethernet bonding for RoCE - netdevs functionality is not altered,
and their bonding continues to be managed solely by bond/team driver.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00
Aviv Heller 84df61ebc6 net/mlx5: Add HW interfaces used by LAG
Exposed LAG commands enum and layouts:
- CREATE_LAG
  HW enters LAG mode:
  RoCE traffic from port two is received on PF0 core dev.
  Allows to set tx_affinity (tx port) for QPs and TISes.
  Allows to port remap QPs and TISes, overriding their
  tx_affinity behavior.

- MODIFY_LAG
  Remap QPs and TISes to another port.

- QUERY_LAG
  Query whether LAG mode is active.

- DESTROY_LAG
  HW exits LAG mode, returning to non-LAG behavior.

- CREATE_VPORT_LAG
  Merge Ethernet flow steering, such that traffic received on port
  two jumps to PF0 root flow table.

  Available only in LAG mode.

- DESTROY_VPORT_LAG
  Ethernet flow steering returns to non-LAG behavior.

Caps added:
- lag_master
  Driver is in charge of managing the LAG.
  This is currently the only option.

- num_lag_ports
  LAG is supported only if this field's value is 2.

Other fields:
- QP/TIS tx port affinity
  During LAG, this field controls on which port a QP or TIS resides.

- TIS strict tx affinity
  When this field is set, the TIS will not be subject to port remap by
  CREATE_LAG/MODIFY_LAG.

- LAG demux flow table
  Flow table used for redirecting non user-space traffic back to
  PF1 root flow table, if the packet was received on port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:53 +03:00
Noa Osherovich d5beb7f2af net/mlx5: Separate query_port_proto_oper for IB and EN
Replaced mlx5_query_port_proto_oper with separate functions per link
type. The functions should take different arguments so no point in
trying to unite them.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:52 +03:00
Noa Osherovich 8cca30a7f9 net/mlx5: Expose mlx5e_link_mode
The mlx5e_link_mode enumeration will also be used in mlx5_ib for RoCE.
This patch moves the enumeration to the mlx5 driver port header file.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:52 +03:00
Artemy Kovalyov 2e353b3468 net/mlx5: Update struct mlx5_ifc_xrqc_bits
Update struct mlx5_ifc_xrqc_bits according to last specification

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:51 +03:00
Alex Vesker 83b502a12e net/mlx5: Modify RQ bitmask from mlx5 ifc
Use mlx5 ifc MODIFY_BITMASK_VSD in mlx5e_modify_rq_vsd and expose counter
set capability bit in hca caps structure.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:08 +03:00
David S. Miller 60747ef4d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes for both merge conflicts.

Resolution work done by Stephen Rothwell was used
as a reference.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 01:17:32 -04:00
Linus Torvalds 184ca82348 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

 3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0.  From
    Satish Baddipadige and Siva Reddy Kallam.

 4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

 5) Fix fib_trie logic when walking the trie during /proc/net/route
    output than can access a stale node pointer.  From David Forster.

 6) Several sctp_diag fixes from Phil Sutter.

 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

 8) Checksum fixup fixes in bpf from Daniel Borkmann.

 9) Memork leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net_sched: allow flushing tc police actions
  net_sched: unify the init logic for act_police
  net_sched: convert tcf_exts from list to pointer array
  net_sched: move tc offload macros to pkt_cls.h
  net_sched: fix a typo in tc_for_each_action()
  net_sched: remove an unnecessary list_del()
  net_sched: remove the leftover cleanup_a()
  mlxsw: spectrum: Allow packets to be trapped from any PG
  mlxsw: spectrum: Unmap 802.1Q FID before destroying it
  mlxsw: spectrum: Add missing rollbacks in error path
  mlxsw: reg: Fix missing op field fill-up
  mlxsw: spectrum: Trap loop-backed packets
  mlxsw: spectrum: Add missing packet traps
  mlxsw: spectrum: Mark port as active before registering it
  mlxsw: spectrum: Create PVID vPort before registering netdevice
  mlxsw: spectrum: Remove redundant errors from the code
  mlxsw: spectrum: Don't return upon error in removal path
  i40e: check for and deal with non-contiguous TCs
  ixgbe: Re-enable ability to toggle VLAN filtering
  ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
  ...
2016-08-17 17:26:58 -07:00
David S. Miller 484334198f Merge branch 'strparser'
Tom Herbert says:

====================
strp: Stream parser for messages

This patch set introduces a utility for parsing application layer
protocol messages in a TCP stream. This is a generalization of the
mechanism implemented of Kernel Connection Multiplexor.

This patch set adapts KCM to use the strparser. We expect that kTLS
can use this mechanism also. RDS would probably be another candidate
to use a common stream parsing mechanism.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function. The callbacks include
a parse_msg function that is called to perform parsing (e.g.
BPF parsing in case of KCM), and a rcv_msg function that is called
when a full message has been completed.

For strparser we specify the return codes from the parser to allow
the backend to indicate that control of the socket should be
transferred back to userspace to handle some exceptions in the
stream: The return values are:

      >0 : indicates length of successfully parsed message
       0  : indicates more data must be received to parse the message
       -ESTRPIPE : current message should not be processed by the
          kernel, return control of the socket to userspace which
          can proceed to read the messages itself
       other < 0 : Error is parsing, give control back to userspace
          assuming that synchronization is lost and the stream
          is unrecoverable (application expected to close TCP socket)

There is one issue I haven't been able to fully resolve. If parse_msg
returns ESTRPIPE (wants control back to userspace) the parser may
already have consumed some bytes of the message. There is no way to
put bytes back into the TCP receive queue and tcp_read_sock does not
allow an easy way to peek messages. In lieu of a better solution, we
return ENODATA on the socket to indicate that the data stream is
unrecoverable (application needs to close socket). This condition
should only happen if an application layer message header is split
across two skbuffs and parsing just the first skbuff wasn't sufficient
to determine the that transfer to userspace is needed.

This patch set contains:

  - strparser implementation
  - changes to kcm to use strparser
  - strparser.txt documentation

v2:
  - Add copyright notice to C files
  - Remove GPL module license from strparser.c
  - Add report of rxpause

v3:
  - Restore GPL module license
  - Use EXPORT_SYMBOL_GPL

v4:
  - Removed unused function, changed another to be static as suggested
    by davem
  - Rewoked data_ready to be called from upper layer, no longer requires
    taking over socket data_ready callback as suggested by Lance Chao

Tested:
  - Ran a KCM thrash test for 24 hours. No behavioral or performance
    differences observed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:37:04 -04:00
Tom Herbert adcce4d5dd strparser: Documentation
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:36:49 -04:00
Tom Herbert 9b73896a81 kcm: Use stream parser
Adapt KCM to use the stream parser. This mostly involves removing
the RX handling and setting up the strparser using the interface.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:36:23 -04:00
Tom Herbert 43a0c6751a strparser: Stream parser for messages
This patch introduces a utility for parsing application layer protocol
messages in a TCP stream. This is a generalization of the mechanism
implemented of Kernel Connection Multiplexor.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function.

A stream parser instance is defined by a strparse structure that
is bound to a TCP socket. The function to initialize the structure
is:

int strp_init(struct strparser *strp, struct sock *csk,
              struct strp_callbacks *cb);

csk is the TCP socket being bound to and cb are the parser callbacks.

The upper layer calls strp_tcp_data_ready when data is ready on the lower
socket for strparser to process. This should be called from a data_ready
callback that is set on the socket:

void strp_tcp_data_ready(struct strparser *strp);

A parser is bound to a TCP socket by setting data_ready function to
strp_tcp_data_ready so that all receive indications on the socket
go through the parser. This is assumes that sk_user_data is set to
the strparser structure.

There are four callbacks.
 - parse_msg is called to parse the message (returns length or error).
 - rcv_msg is called when a complete message has been received
 - read_sock_done is called when data_ready function exits
 - abort_parser is called to abort the parser

The input to parse_msg is an skbuff which contains next message under
construction. The backend processing of parse_msg will parse the
application layer protocol headers to determine the length of
the message in the stream. The possible return values are:

   >0 : indicates length of successfully parsed message
   0  : indicates more data must be received to parse the message
   -ESTRPIPE : current message should not be processed by the
      kernel, return control of the socket to userspace which
      can proceed to read the messages itself
   other < 0 : Error is parsing, give control back to userspace
      assuming that synchronzation is lost and the stream
      is unrecoverable (application expected to close TCP socket)

In the case of error return (< 0) strparse will stop the parser
and report and error to userspace. The application must deal
with the error. To handle the error the strparser is unbound
from the TCP socket. If the error indicates that the stream
TCP socket is at recoverable point (ESTRPIPE) then the application
can read the TCP socket to process the stream. Once the application
has dealt with the exceptions in the stream, it may again bind the
socket to a strparser to continue data operations.

Note that ENODATA may be returned to the application. In this case
parse_msg returned -ESTRPIPE, however strparser was unable to maintain
synchronization of the stream (i.e. some of the message in question
was already read by the parser).

strp_pause and strp_unpause are used to provide flow control. For
instance, if rcv_msg is called but the upper layer can't immediately
consume the message it can hold the message and pause strparser.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:36:23 -04:00
Thierry Reding d2d371ae5d net: ipconfig: Fix more use after free
While commit 9c706a49d6 ("net: ipconfig: fix use after free") avoids
the use after free, the resulting code still ends up calling both the
ic_setup_if() and ic_setup_routes() after calling ic_close_devs(), and
access to the device is still required.

Move the call to ic_close_devs() to the very end of the function.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:33:40 -04:00
David S. Miller b96c22c071 Merge branch 'tc_action-fixes'
Cong Wang says:

====================
net_sched: tc action fixes and updates

This patchset fixes a few regressions caused by the previous
code refactor and more. Thanks to Jamal for catching them!

Note, patch 3/7 and 4/7 are not strictly necessary for this patchset,
I just want to carry them together.

---
v4: adjust an indention for Jamal
    add two more patches

v3: avoid list for fast path, suggested by Jamal

v2: replace flex_array with regular dynamic array
    keep tcf_action_stats_update() in act_api.h
    fix macro typos found by Amir
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:27:58 -04:00