Commit Graph

504628 Commits

Author SHA1 Message Date
Johan Hedberg 801c1e8da5 Bluetooth: Add mgmt HCI channel registration API
This patch adds an API for registering HCI channels with mgmt-like
semantics. For now the only user will be HCI_CHANNEL_CONTROL, but e.g.
6lowpan is intended to use this as well in the future.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06 20:15:21 +01:00
Marcel Holtmann 93690c227a Bluetooth: Introduce controller setting information for static address
Currently it is not possible to determine if the static address is used
by the controller. It is also not possible to determine if using a
static on a dual-mode controller with disabled BR/EDR is possible or
not.

To address this issue, introduce a new setting called static-address. If
support for this setting is signaled that means that the kernel supports
using static addresses. And if used on dual-mode controllers with BR/EDR
disabled it means that a configured static address can be used.

In addition utilize the same setting for the list of current active
settings that indicates if a static address is configured and if that
address will be actually used.

With this in mind the existing Set Static Address management command
has been extended to return the current settings. That way the caller
of that command can easily determine if the programmed address will
be used or if extra steps are required.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-06 20:43:07 +02:00
Jakub Pawlowski 82f8b651a9 Bluetooth: fix service discovery behaviour for empty uuids filter
This patch fixes service discovery behaviour, when provided uuid filter
is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this
patch, empty uuid filter was unable to trigger scan restart, and that
caused inconsistent behaviour in applications.

Example: two DBus clients call BlueZ, one to find all devices with
service abcd, second to find all devices with rssi smaller than -90.
Sum of those filters, that is passed to mgmt_service_scan is empty
filter, with no rssi or uuids set.
That caused kernel not to restart scan when quirk was set.
That was inconsistent with what happen when there's only one of those
two filters set (scan is restarted and reports devices).

To fix that, new variable hdev->discovery.result_filtering was
introduced. It can indicate that filtered scan is running, no matter
what uuid or rssi filter is set.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Jakub Pawlowski 2976cdeb27 Bluetooth: Refactor service discovery filter logic
This patch refactor code responsible for filtering when service
discovery method is used. Previously this code was mixed with
mgmt_device found logic. Now when it's in one place whole logic can
be greatly simplified. That includes removing no longer necessary
length field and merging checks for eir and scan_rsp.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Jakub Pawlowski 48f86b7f26 Bluetooth: Move Service Discovery logic before refactoring
This patch moves whole packet filering logic of service discovery
into new function is_filter_match. It's done because logic inside
mgmt_device_found is very complicated and needs some
simplification.

Also having whole logic in one place will allow to simplify it in
the future.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05 09:50:50 +02:00
Alexander Aring 263be3326b at86rf230: restore trx len when needed
In the most cases the spi messages has a length of two. Currently we
always set the the len field to two before transmit a spi message. In
cases for read out/write in the frame buffer we need another len. This
patch use trx len two as default. For the frame buffer cases we restore
the trx len to two on success and failure. This will reduce the len
setting of two when it's already two.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:15:25 +01:00
Alexander Aring 31fa74344c at86rf230: remove multiple dereferencing for ctx
This patch cleanups the referencing for the state change context
variable. The state change context should only set once and this is by
initial a state change. This patch will use the initial state change
variable in the complete handler of the state change by using the ctx
context which should be always the same like the initial state change
context.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:15:25 +01:00
Alexander Aring cca990c85d at86rf230: remove multiple dereferencing for irq
By holding the irq variable inside at86rf230_state_change we can squash
some multiple dereferencing for getting irq num.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:15:25 +01:00
Alexander Aring 74de4c804c at86rf230: refactor receive handling
This patch refactor the receive handling into one function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:15:25 +01:00
Alexander Aring ef5428a138 at86rf230: cleanup and squash stack variable
I had this variable because I thought it would be protected by
disable/enable irq but this is not true. It's protected by stop/wake
netdev queue which is called by ieee802154_xmit_complete.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:15:25 +01:00
Alexander Aring ba6d223932 at86rf230: add transmit retry support
This patch introduce a transmit retry handling into at86rf230 transmit
path. Current behaviour is to wait the normal receive time if we want
to go into STATE_TX_ON when the transceiver is in STATE_BUSY_RX_AACK
which indicates that a frame is currently receiving. A non force state
change will not interrupt the the receiving state.

The current behaviour is that after the normal receive time we will
start a force change into STATE_TX_ON. With this patch we do seven
retries to go into STATE_TX_ON without forcing. After we hit the
AT86RF2XX_MAX_TX_RETRIES we will start the force state change.
This is a polling like method to go into STATE_TX_ON in times of maximum
receiving time.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:15:24 +01:00
Kim, Ben Young Tae 3267c884ce Bluetooth: btusb: Add support for QCA ROME chipset family
This patch supports ROME Bluetooth family from Qualcomm Atheros,
e.g. QCA61x4 or QCA6574.

New chipset have similar firmware downloading sequences to previous
chipset from Atheros, however, it doesn't support vid/pid switching
after downloading the patch so that firmware needs to be handled by
btusb module directly.

ROME chipset can be differentiated from previous version by reading
ROM version.

T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 16 Spd=12   MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0cf3 ProdID=e300 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms

T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  8 Spd=12   MxCh= 0
D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0cf3 ProdID=e360 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms

Signed-off-by: Ben Young Tae Kim <ytkim@qca.qualcomm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:07:01 +01:00
Kim, Ben Young Tae ace3198258 Bluetooth: btusb: Add setup callback for chip init on USB
Some of chipset does not allow to send a patch or config files through
HCI VS channel at early stage as well as they don't support to send
USB patch files to other channel except USB bulk path.

New callback added is for initialization of BT controller through USB

Signed-off-by: Ben Young Tae Kim <ytkim@qca.qualcomm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-03 02:07:00 +01:00
Daniel Borkmann 49b31e576a filter: refactor common filter attach code into __sk_attach_prog
Both sk_attach_filter() and sk_attach_bpf() are setting up sk_filter,
charging skmem and attaching it to the socket after we got the eBPF
prog up and ready. Lets refactor that into a common helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:51:38 -05:00
David S. Miller 70c836a4d1 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-03-02

Here's the first bluetooth-next pull request targeting the 4.1 kernel:

 - ieee802154/6lowpan cleanups
 - SCO routing to host interface support for the btmrvl driver
 - AMP code cleanups
 - Fixes to AMP HCI init sequence
 - Refactoring of the HCI callback mechanism
 - Added shutdown routine for Intel controllers in the btusb driver
 - New config option to enable/disable Bluetooth debugfs information
 - Fix for early data reception on L2CAP fixed channels

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 14:47:12 -05:00
David S. Miller b4844353c0 Merge branch 'sendmsg_recvmsg_iocb_removal'
Ying Xue says:

====================
net: Remove iocb argument from sendmsg and recvmsg

Currently there is only one user - TIPC whose sendmsg() instances
using iocb argument. Meanwhile, there is no user using iocb argument
in its recvmsg() instance. Therefore, if we eliminate the werid usage
of iobc argument from TIPC, the iocb argument can be removed from
all sendmsg() and recvmsg() instances of the whole networking stack.

Reference:
https://patchwork.ozlabs.org/patch/433960/

Changes:

v2:
 * Fix compile errors of DCCP module pointed by David
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:38 -05:00
Ying Xue 1b78414047 net: Remove iocb argument from sendmsg and recvmsg
After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:31 -05:00
Ying Xue 39a0295f90 tipc: Don't use iocb argument in socket layer
Currently the iocb argument is used to idenfiy whether or not socket
lock is hold before tipc_sendmsg()/tipc_send_stream() is called. But
this usage prevents iocb argument from being dropped through sendmsg()
at socket common layer. Therefore, in the commit we introduce two new
functions called __tipc_sendmsg() and __tipc_send_stream(). When they
are invoked, it assumes that their callers have taken socket lock,
thereby avoiding the weird usage of iocb argument.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 13:06:31 -05:00
David S. Miller 6556c38524 Merge branch 'dropcount'
Eyal Birger says:

====================
net: move skb->dropcount to skb->cb[]

Commit 977750076d ("af_packet: add interframe drop cmsg (v6)")
unionized skb->mark and skb->dropcount in order to allow recording
of the socket drop count while maintaining struct sk_buff size.

skb->dropcount was introduced since there was no available room
in skb->cb[] in packet sockets. However, its introduction led to
the inability to export skb->mark to userspace.

It was considered to alias skb->priority instead of skb->mark.
However, that would lead to the inabilty to export skb->priority
to userspace if desired. Such change may also lead to hard-to-find
issues as skb->priority is assumed to be alias free, and, as noted
by Shmulik Ladkani, is not 'naturally orthogonal' with other skb
fields.

This patch series follows the suggestions made by Eric Dumazet moving
the dropcount metric to skb->cb[], eliminating this problem
at the expense of 4 bytes less in skb->cb[] for protocol families
using it.

The patch series include compactization of bluetooth and packet
use of skb->cb[] as well as the infrastructure for placing dropcount
in skb->cb[].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:35 -05:00
Eyal Birger 744d5a3e9f net: move skb->dropcount to skb->cb[]
Commit 977750076d ("af_packet: add interframe drop cmsg (v6)")
unionized skb->mark and skb->dropcount in order to allow recording
of the socket drop count while maintaining struct sk_buff size.

skb->dropcount was introduced since there was no available room
in skb->cb[] in packet sockets. However, its introduction led to
the inability to export skb->mark, or any other aliased field to
userspace if so desired.

Moving the dropcount metric to skb->cb[] eliminates this problem
at the expense of 4 bytes less in skb->cb[] for protocol families
using it.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger 3bc3b96f3b net: add common accessor for setting dropcount on packets
As part of an effort to move skb->dropcount to skb->cb[], use
a common function in order to set dropcount in struct sk_buff.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger b4772ef879 net: use common macro for assering skb->cb[] available size in protocol families
As part of an effort to move skb->dropcount to skb->cb[] use a common
macro in protocol families using skb->cb[] for ancillary data to
validate available room in skb->cb[].

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:30 -05:00
Eyal Birger 2472d7613b net: packet: use sockaddr_ll fields as storage for skb original length in recvmsg path
As part of an effort to move skb->dropcount to skb->cb[], 4 bytes
of additional room are needed in skb->cb[] in packet sockets.

Store the skb original length in the first two fields of sockaddr_ll
(sll_family and sll_protocol) as they can be derived from the skb when
needed.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Eyal Birger 2cfdf9fcb8 net: rxrpc: change call to sock_recv_ts_and_drops() on rxrpc recvmsg to sock_recv_timestamp()
Commit 3b885787ea ("net: Generalize socket rx gap / receive queue overflow cmsg")
allowed receiving packet dropcount information as a socket level option.
RXRPC sockets recvmsg function was changed to support this by calling
sock_recv_ts_and_drops() instead of sock_recv_timestamp().

However, protocol families wishing to receive dropcount should call
sock_queue_rcv_skb() or set the dropcount specifically (as done
in packet_rcv()). This was not done for rxrpc and thus this feature
never worked on these sockets.

Formalizing this by not calling sock_recv_ts_and_drops() in rxrpc as
part of an effort to move skb->dropcount into skb->cb[]

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Eyal Birger 6368c23577 net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit fields
Convert boolean fields incoming and req_start to bit fields and move
force_active in order save space in bt_skb_cb in an effort to use
a portion of skb->cb[] for storing skb->dropcount.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Eyal Birger 49a6fe0557 net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrl
struct hci_req_ctrl is never used outside of struct bt_skb_cb;
Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing
the addition of more ancillary data.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:19:29 -05:00
Simon Farnsworth 287f3a943f pppoe: Use workqueue to die properly when a PADT is received
When a PADT frame is received, the socket may not be in a good state to
close down the PPP interface. The current implementation handles this by
simply blocking all further PPP traffic, and hoping that the lack of traffic
will trigger the user to investigate.

Use schedule_work to get to a process context from which we clear down the
PPP interface, in a fashion analogous to hangup on a TTY-based PPP
interface. This causes pppd to disconnect immediately, and allows tools to
take immediate corrective action.

Note that pppd's rp_pppoe.so plugin has code in it to disable the session
when it disconnects; however, as a consequence of this patch, the session is
already disabled before rp_pppoe.so is asked to disable the session. The
result is a harmless error message:

Failed to disconnect PPPoE socket: 114 Operation already in progress

This message is safe to ignore, as long as the error is 114 Operation
already in progress; in that specific case, it means that the PPPoE session
has already been disabled before pppd tried to disable it.

Signed-off-by: Simon Farnsworth <simon@farnz.org.uk>
Tested-by: Dan Williams <dcbw@redhat.com>
Tested-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 00:17:31 -05:00
Ivan Vecera 26caa3469a bnx2: disable toggling of rxvlan if necessary
The bnx2 driver uses .ndo_fix_features to force enable of Rx VLAN tag
stripping when the card cannot disable it. The driver should remove
NETIF_F_HW_VLAN_CTAG_RX flag from hw_features instead so it is fixed
for the ethtool.

Cc: Sony Chacko <sony.chacko@qlogic.com>
Cc: Dept-HSGLinuxNICDev@qlogic.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 23:25:51 -05:00
Arun Chandran ea373041bd net: macb: Properly add DMACFG bit definitions
Add *_SIZE macros for the bits ENDIA_DESC and
ENDIA_PKT

Signed-off-by: Arun Chandran <achandran@mvista.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 23:05:23 -05:00
Arun Chandran 62f6924cb1 net: macb: Add on the fly CPU endianness detection
Program management descriptor's access mode according to the
dynamically detected CPU endianness.

Signed-off-by: Arun Chandran <achandran@mvista.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 23:05:23 -05:00
Shrikrishna Khare 759c9359ae Driver: Vmxnet3: Copy TCP header to mapped frame for IPv6 packets
Allows for packet parsing to be done by the fast path. This performance
optimization already exists for IPv4. Add similar logic for IPv6.

Signed-off-by: Amitabha Banerjee <banerjeea@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 23:03:42 -05:00
David S. Miller 68932f7188 Merge branch 'ebpf_support_for_cls_bpf'
Daniel Borkmann says:

====================
eBPF support for cls_bpf

This is the non-RFC version of my patchset posted before netdev01 [1]
conference. It contains a couple of eBPF cleanups and preparation
patches to get eBPF support into cls_bpf. The last patch adds the
actual support. I'll post the iproute2 parts after the kernel bits
are merged, an initial preview link to the code is mentioned in the
last patch.

Patch 4 and 5 were originally one patch, but I've split them into
two parts upon request as patch 4 only is also needed for Alexei's
tracing patches that go via tip tree.

Tested with tc and all in-kernel available BPF test suites.

I have configured and built LLVM with --enable-experimental-targets=BPF
but as Alexei put it, the plan is to get rid of the experimental
status in future [2].

Thanks a lot!

v1 -> v2:
 - Removed arch patches from this series
  - x86 is already queued in tip tree, under x86/mm
  - arm64 just reposted directly to arm folks
 - Rest is unchanged

  [1] http://thread.gmane.org/gmane.linux.network/350191
  [2] http://article.gmane.org/gmane.linux.kernel/1874969
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:24 -05:00
Daniel Borkmann e2e9b6541d cls_bpf: add initial eBPF support for programmable classifiers
This work extends the "classic" BPF programmable tc classifier by
extending its scope also to native eBPF code!

This allows for user space to implement own custom, 'safe' C like
classifiers (or whatever other frontend language LLVM et al may
provide in future), that can then be compiled with the LLVM eBPF
backend to an eBPF elf file. The result of this can be loaded into
the kernel via iproute2's tc. In the kernel, they can be JITed on
major archs and thus run in native performance.

Simple, minimal toy example to demonstrate the workflow:

  #include <linux/ip.h>
  #include <linux/if_ether.h>
  #include <linux/bpf.h>

  #include "tc_bpf_api.h"

  __section("classify")
  int cls_main(struct sk_buff *skb)
  {
    return (0x800 << 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos));
  }

  char __license[] __section("license") = "GPL";

The classifier can then be compiled into eBPF opcodes and loaded
via tc, for example:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf cls.o [...]

As it has been demonstrated, the scope can even reach up to a fully
fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c).

For tc, maps are allowed to be used, but from kernel context only,
in other words, eBPF code can keep state across filter invocations.
In future, we perhaps may reattach from a different application to
those maps e.g., to read out collected statistics/state.

Similarly as in socket filters, we may extend functionality for eBPF
classifiers over time depending on the use cases. For that purpose,
cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so
we can allow additional functions/accessors (e.g. an ABI compatible
offset translation to skb fields/metadata). For an initial cls_bpf
support, we allow the same set of helper functions as eBPF socket
filters, but we could diverge at some point in time w/o problem.

I was wondering whether cls_bpf and act_bpf could share C programs,
I can imagine that at some point, we introduce i) further common
handlers for both (or even beyond their scope), and/or if truly needed
ii) some restricted function space for each of them. Both can be
abstracted easily through struct bpf_verifier_ops in future.

The context of cls_bpf versus act_bpf is slightly different though:
a cls_bpf program will return a specific classid whereas act_bpf a
drop/non-drop return code, latter may also in future mangle skbs.
That said, we can surely have a "classify" and "action" section in
a single object file, or considered mentioned constraint add a
possibility of a shared section.

The workflow for getting native eBPF running from tc [1] is as
follows: for f_bpf, I've added a slightly modified ELF parser code
from Alexei's kernel sample, which reads out the LLVM compiled
object, sets up maps (and dynamically fixes up map fds) if any, and
loads the eBPF instructions all centrally through the bpf syscall.

The resulting fd from the loaded program itself is being passed down
to cls_bpf, which looks up struct bpf_prog from the fd store, and
holds reference, so that it stays available also after tc program
lifetime. On tc filter destruction, it will then drop its reference.

Moreover, I've also added the optional possibility to annotate an
eBPF filter with a name (e.g. path to object file, or something
else if preferred) so that when tc dumps currently installed filters,
some more context can be given to an admin for a given instance (as
opposed to just the file descriptor number).

Last but not least, bpf_prog_get() and bpf_prog_put() needed to be
exported, so that eBPF can be used from cls_bpf built as a module.
Thanks to 60a3b2253c ("net: bpf: make eBPF interpreter images
read-only") I think this is of no concern since anything wanting to
alter eBPF opcode after verification stage would crash the kernel.

  [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:19 -05:00
Daniel Borkmann 24701ecea7 ebpf: move read-only fields to bpf_prog and shrink bpf_prog_aux
is_gpl_compatible and prog_type should be moved directly into bpf_prog
as they stay immutable during bpf_prog's lifetime, are core attributes
and they can be locked as read-only later on via bpf_prog_select_runtime().

With a bit of rearranging, this also allows us to shrink bpf_prog_aux
to exactly 1 cacheline.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:19 -05:00
Daniel Borkmann 96be4325f4 ebpf: add sched_cls_type and map it to sk_filter's verifier ops
As discussed recently and at netconf/netdev01, we want to prevent making
bpf_verifier_ops registration available for modules, but have them at a
controlled place inside the kernel instead.

The reason for this is, that out-of-tree modules can go crazy and define
and register any verfifier ops they want, doing all sorts of crap, even
bypassing available GPLed eBPF helper functions. We don't want to offer
such a shiny playground, of course, but keep strict control to ourselves
inside the core kernel.

This also encourages us to design eBPF user helpers carefully and
generically, so they can be shared among various subsystems using eBPF.

For the eBPF traffic classifier (cls_bpf), it's a good start to share
the same helper facilities as we currently do in eBPF for socket filters.

That way, we have BPF_PROG_TYPE_SCHED_CLS look like it's own type, thus
one day if there's a good reason to diverge the set of helper functions
from the set available to socket filters, we keep ABI compatibility.

In future, we could place all bpf_prog_type_list at a central place,
perhaps.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:19 -05:00
Daniel Borkmann d4052c4aea ebpf: remove CONFIG_BPF_SYSCALL ifdefs in socket filter code
This gets rid of CONFIG_BPF_SYSCALL ifdefs in the socket filter code,
now that the BPF internal header can deal with it.

While going over it, I also changed eBPF related functions to a sk_filter
prefix to be more consistent with the rest of the file.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:19 -05:00
Daniel Borkmann 0fc174dea5 ebpf: make internal bpf API independent of CONFIG_BPF_SYSCALL ifdefs
Socket filter code and other subsystems with upcoming eBPF support should
not need to deal with the fact that we have CONFIG_BPF_SYSCALL defined or
not.

Having the bpf syscall as a config option is a nice thing and I'd expect
it to stay that way for expert users (I presume one day the default setting
of it might change, though), but code making use of it should not care if
it's actually enabled or not.

Instead, hide this via header files and let the rest deal with it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:19 -05:00
Daniel Borkmann f1a66f85b7 ebpf: export BPF_PSEUDO_MAP_FD to uapi
We need to export BPF_PSEUDO_MAP_FD to user space, as it's used in the
ELF BPF loader where instructions are being loaded that need map fixups.

An initial stage loads all maps into the kernel, and later on replaces
related instructions in the eBPF blob with BPF_PSEUDO_MAP_FD as source
register and the actual fd as immediate value.

The kernel verifier recognizes this keyword and replaces the map fd with
a real pointer internally.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:19 -05:00
Daniel Borkmann a2c83fff58 ebpf: constify various function pointer structs
We can move bpf_map_ops and bpf_verifier_ops and other structs into ro
section, bpf_map_type_list and bpf_prog_type_list into read mostly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:18 -05:00
Daniel Borkmann f91fe17e24 ebpf: remove kernel test stubs
Now that we have BPF_PROG_TYPE_SOCKET_FILTER up and running, we can
remove the test stubs which were added to get the verifier suite up.

We can just let the test cases probe under socket filter type instead.
In the fill/spill test case, we cannot (yet) access fields from the
context (skb), but we may adapt that test case in future.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01 14:05:18 -05:00
David S. Miller b656cc64cf Merge branch 's390-next'
Ursula Braun says:

====================
s390: network patches for net-next

here are some s390 related patches for net-next
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 23:39:05 -05:00
Ursula Braun 8b7ac017aa MAINTAINERS: update S390 NETWORK DRIVERS maintainer
remove Frank Blaschka as S390 NETWORK DRIVERS maintainer

Acked-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 23:38:29 -05:00
Stefan Raspl ca5b20ace8 qeth: Fix command sizes
This patch adjusts two instances where we were using the (too big)
struct qeth_ipacmd_setadpparms size instead of the commands' actual
size. This didn't do any harm, but wasted a few bytes.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 23:38:29 -05:00
Ursula Braun 83650a2edc s390: remove claw driver
claw devices are outdated and no longer supported.
This patch removes the claw driver.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 23:38:29 -05:00
Eric Dumazet 74abc20ced tcp: cleanup static functions
tcp_fastopen_create_child() is static and should not be exported.

tcp4_gso_segment() and tcp6_gso_segment() should be static.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 16:56:51 -05:00
Andrew Schwartzmeyer 59995370db hyperv: Implement netvsc_get_channels() ethool op
This adds support for reporting the actual and maximum combined channels
count of the hv_netvsc driver via 'ethtool --show-channels'.

This required adding 'max_chn' to 'struct netvsc_device', and assigning
it 'rsscap.num_recv_que' in 'rndis_filter_device_add'. Now we can access
the combined maximum channel count via 'struct netvsc_device' in the
ethtool callback.

Signed-off-by: Andrew Schwartzmeyer <andrew@schwartzmeyer.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 16:51:36 -05:00
David S. Miller f9c7ce1853 Merge branch 'tcp-tso'
Eric Dumazet says:

====================
tcp: tso improvements

This patch serie reworks tcp_tso_should_defer() a bit
to get less bursts, and better ECN behavior.

We also removed tso_deferred field in tcp socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 15:10:47 -05:00
Eric Dumazet a0ea700e40 tcp: tso: allow CA_CWR state in tcp_tso_should_defer()
Another TCP issue is triggered by ECN.

Under pressure, receiver gets ECN marks, and send back ACK packets
with ECE TCP flag. Senders enter CA_CWR state.

In this state, tcp_tso_should_defer() is short cut :

if (icsk->icsk_ca_state != TCP_CA_Open)
    goto send_now;

This means that about all ACK packets we receive are triggering
a partial send, and because cwnd is kept small, we can only send
a small amount of data for each incoming ACK,
which in return generate more ACK packets.

Allowing CA_Open and CA_CWR states to enable TSO defer in
tcp_tso_should_defer() brings performance back :
TSO autodefer has more chance to defer under pressure.

This patch increases TSO and LRO/GRO efficiency back to normal levels,
and does not impact overall ECN behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 15:10:39 -05:00
Eric Dumazet 50c8339e92 tcp: tso: restore IW10 after TSO autosizing
With sysctl_tcp_min_tso_segs being 4, it is very possible
that tcp_tso_should_defer() decides not sending last 2 MSS
of initial window of 10 packets. This also applies if
autosizing decides to send X MSS per GSO packet, and cwnd
is not a multiple of X.

This patch implements an heuristic based on age of first
skb in write queue : If it was sent very recently (less than half srtt),
we can predict that no ACK packet will come in less than half rtt,
so deferring might cause an under utilization of our window.

This is visible on initial send (IW10) on web servers,
but more generally on some RPC, as the last part of the message
might need an extra RTT to get delivered.

Tested:

Ran following packetdrill test
// A simple server-side test that sends exactly an initial window (IW10)
// worth of packets.

`sysctl -e -q net.ipv4.tcp_min_tso_segs=4`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+.1   < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.1   < . 1:1(0) ack 1 win 257
+0    accept(3, ..., ...) = 4

+0    write(4, ..., 14600) = 14600
+0    > . 1:5841(5840) ack 1 win 457
+0    > . 5841:11681(5840) ack 1 win 457
// Following packet should be sent right now.
+0    > P. 11681:14601(2920) ack 1 win 457

+.1   < . 1:1(0) ack 14601 win 257

+0    close(4) = 0
+0    > F. 14601:14601(0) ack 1
+.1   < F. 1:1(0) ack 14602 win 257
+0    > . 14602:14602(0) ack 2

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 15:10:39 -05:00
Eric Dumazet 5f852eb536 tcp: tso: remove tp->tso_deferred
TSO relies on ability to defer sending a small amount of packets.
Heuristic is to wait for future ACKS in hope to send more packets at once.
Current algorithm uses a per socket tso_deferred field as a pseudo timer.

This pseudo timer relies on future ACK, but there is no guarantee
we receive them in time.

Fix would be to use a real timer, but cost of such timer is probably too
expensive for typical cases.

This patch changes the logic to test the time of last transmit,
because we should not add bursts of more than 1ms for any given flow.

We've used this patch for about two years at Google, before FQ/pacing
as it would reduce a fair amount of bursts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28 15:10:39 -05:00