This patch changes the name used by Ethtool to something more
conventional in preparation for TSE Ethtool register dump
support to be added in the near future.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses a fault in the error recovery path of the probe
routine where the netdev structure was not being unregistered properly
leading to a panic only when the phy probe failed.
Abbreviated panic stack seen is as follows:
(free_netdev+0xXX) from (altera_tse_probe+0xXX)
(altera_tse_probe+0xXX) from (platform_drv_probe+0xXX)
(platform_drv_probe+0xXX) from (driver_probe_device+0xXX)
(driver_probe_device+0xXX) from (__driver_attach+0xXX)
(__driver_attach+0xXX) from (bus_for_each_dev+0xXX)
(bus_for_each_dev+0xXX) from (driver_attach+0xXX)
(driver_attach+0xXX) from (bus_add_driver+0xXX)
(bus_add_driver+0xXX) from (driver_register+0xXX)
(driver_register+0xXX) from (__platform_driver_register+0xXX)
(__platform_driver_register+0xXX) from (altera_tse_driver_init+0xXX)
(altera_tse_driver_init+0xXX) from (do_one_initcall+0xXX)
(do_one_initcall+0xXX) from (kernel_init_freeable+0xXX)
(kernel_init_freeable+0xXX) from (kernel_init+0xXX)
(kernel_init+0xXX) from (ret_from_fork+0xXX)
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch initializes the pause quanta set for transmitted pause frames
to the IEEE specified default of 0xffff.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch works around a recently discovered unaligned receive dma problem
with the Altera SGMDA. The Altera SGDMA component cannot be configured to
DMA data to unaligned addresses for receive packet operations from the
Triple Speed Ethernet component because of a potential data transfer
corruption that can occur. This patch addresses this issue by
utilizing the shift 16 bits feature of the Altera Triple Speed Ethernet
component and modifying the receive buffer physical addresses accordingly
such that the target receive DMA address is always aligned on a 32-bit
boundary.
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Tested-by: Matthew Gerlach <mgerlach@altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz says:
====================
bnx2x: SRIOV bug fixes
This series contains 3 SRIOV bug fixes, 2 of which are regressions starting
with commit 2dc33bbc "bnx2x: Remove the sriov VFOP mechanism".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting with commit 2dc33bbc "bnx2x: Remove the sriov VFOP mechanism",
the bnx2x started enforcing vlan credits for all vlan configurations.
This exposed 2 issues:
- Vlan credits are not returned once a VF is removed; this causes a leak
of credits, and eventually will lead to VFs with no vlan credits.
- A vlan credit must be set aside for the Hypervisor to use, and should
not be visible to the VF.
Although linux VFs at the moment do not support vlan configuration [from the
VF side] which causes them to be resilient to this sort of issue, Windows VF
over linux hypervisors might fail to load as the vlan credits become depleted.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing a VF interface, the driver fails to release that VF's mailbox
and bulletin board allocated memory.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we're performing reauthentication (in order to elevate the
security level from an unauthenticated key to an authenticated one) we
do not need to issue any encryption command once authentication
completes. Since the trigger for the encryption HCI command is the
ENCRYPT_PEND flag this flag should not be set in this scenario.
Instead, the REAUTH_PEND flag takes care of all necessary steps for
reauthentication.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Commit 1c2e004183 introduced an event handler for the encryption key
refresh complete event with the intent of fixing some LE/SMP cases.
However, this event is shared with BR/EDR and there we actually want to
act only on the auth_complete event (which comes after the key refresh).
If we do not do this we may trigger an L2CAP Connect Request too early
and cause the remote side to return a security block error.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
This reverts commit d2bee8fb6e.
Enabling autosuspend for Intel Bluetooth devices has been shown to not
work reliable. It does work for some people with certain combinations
of USB host controllers, but for others it puts the device to sleep and
it will not wake up for any event.
These events can be important ones like HCI Inquiry Complete or HCI
Connection Request. The events will arrive as soon as you poke the
device with a new command, but that is not something we can do in
these cases.
Initially there were patches to the xHCI USB controller that fixed
this for some people, but not for all. This could be well a problem
somewhere in the USB subsystem or in the USB host controllers or
just plain a hardware issue somewhere. At this moment we just do
not know and the only safe action is to revert this patch.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Tedd Ho-Jeong An <tedd.an@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
brcmfmac has been broken on my cubietruck with a BCM43362:
brcmfmac: brcmf_chip_recognition: found AXI chip: BCM43362, rev=1
brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0:
Apr 22 2013 14:50:00 version 5.90.195.89.6 FWID 01-b30a427d
since commit 53036261033: "brcmfmac: update core reset and disable routines".
The problem is that since this commit brcmf_chip_ai_resetcore no longer sets
BCMA_IOCTL itself before bringing the core out of reset, instead relying on
brcmf_chip_ai_coredisable to do so. But brcmf_chip_ai_coredisable is a nop
of the chip is already in reset. This patch modifies brcmf_chip_ai_coredisable
to always set BCMA_IOCTL even if the core is already in reset.
This fixes brcmfmac hanging in firmware loading on my board.
Cc: stable@vger.kernel.org # v3.14
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The commit "ath9k: move sc_flags to ath_common" moved setting
ATH_OP_INVALID flag below ieee80211_register_hw. This is causing
the flag never being cleared randomly as the drv_start is called
prior to setting flag. Fix this by setting the flag prior to
register_hw.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the ipv6 fib changes during a table dump, the walk is
restarted and the number of nodes dumped are skipped. But the existing
code doesn't advance to the next node after a node is skipped. This can
cause the dump to loop or produce lots of duplicates when the fib
is modified during the dump.
This change advances the walk to the next node if the current node is
skipped after a restart.
Signed-off-by: Kumar Sundararajan <kumar@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
slc_xmit is called within softirq context and locks sl->lock, but
slcan_write_wakeup is not softirq context, so we need to use
spin_[un]lock_bh!
Detected using kernel lock debugging mechanism.
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When trying to set a data bitrate on non CAN FD devices the 'ip' tool
answers with:
RTNETLINK answers: Unknown error 524
Rename '-ENOTSUPP' to '-EOPNOTSUPP' so that 'ip' answers correctly:
RTNETLINK answers: Operation not supported
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When accessing the SJA1000 controller registers in the indirect access mode,
writing the register number and reading/writing the data has to be an atomic
attempt.
As the sja1000_isa driver is an old style driver with a fixed number of
instances the locking variable depends on the same index like all the other
configuration elements given on the module command line.
As a positive side effect dev->dev_id is populated by the instance index,
which was missing in 3e66d0138c ("can: populate netdev::dev_id for udev
discrimination").
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Coverity complains that c_can_pci_probe() calls pci_enable_msi() without
checking the result:
CID 712278 (#1 of 1): Unchecked return value (CHECKED_RETURN) 3. check_return:
Calling pci_enable_msi_block without checking return value (as is done
elsewhere 88 out of 105 times).
88 pci_enable_msi(pdev);
This is CID 712278.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 6439fbce10 (can: c_can: fix error checking of priv->instance in
probe()) found the warning but applied a suboptimal solution. Since, both
pdev->id and of_alias_get_id() return integers, it makes sense to convert the
variable to an integer and avoid the cast.
Signed-off-by: Wolfram Sang <wsa@sang-engineering.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
It's suffcient to kill the TXIE bit in the message control register
even if the documentation of C and D CAN says that it's not allowed to
do that while MSGVAL is set. Reality tells a different story and this
change gives us another 2% of CPU back for not waiting on I/O.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Mark suggested to use one IF for the softirq and the other for the
xmit function to avoid the xmit lock.
That requires to write the frame into the interface first, then handle
the echo skb and store the dlc before committing the TX request to the
message ram.
We use an atomic to handle the active buffers instead of reading the
MSGVAL register as thats way faster especially on PCH/x86.
Suggested-by: Mark <mark5@del-llc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Instead of obfuscating the code by artificial 16 bit splits use the
proper 32 bit assignments and split the result when writing to the
interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Remove the MASK from the TX transfer side.
Make the code readable and get rid of the annoying IFX_WRITE_XXX_16BIT
macros which are just obfuscating the code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sigh!
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alexander reported that the new optimized handling of the RX fifo
causes random packet loss on Intel PCH C_CAN hardware.
After a few fruitless debugging sessions I got hold of a PCH (eg20t)
afflicted system. That machine does not have the CAN interface wired
up, but it was possible to reproduce the issue with the HW loopback
mode.
As Alexander observed correctly, clearing the NewDat flag along with
reading out the message buffer causes that issue on C_CAN, while D_CAN
handles that correctly.
Instead of restoring the original message buffer handling horror the
following workaround solves the issue:
transfer buffer to IF without clearing the NewDat
handle the message
clear NewDat bit
That's similar to the original code but conditional for C_CAN.
I really wonder why all user manuals (C_CAN, Intel PCH and some more)
recommend to clear the NewDat bit right away. The knows it all Oracle
operated by Gurgle does not unearth any useful information either. I
simply cannot believe that we are the first to uncover that HW issue.
Reported-and-tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The RX buffer split causes packet loss in the hardware:
What happens is:
RX Packet 1 --> message buffer 1 (newdat bit is not cleared)
RX Packet 2 --> message buffer 2 (newdat bit is not cleared)
RX Packet 3 --> message buffer 3 (newdat bit is not cleared)
RX Packet 4 --> message buffer 4 (newdat bit is not cleared)
RX Packet 5 --> message buffer 5 (newdat bit is not cleared)
RX Packet 6 --> message buffer 6 (newdat bit is not cleared)
RX Packet 7 --> message buffer 7 (newdat bit is not cleared)
RX Packet 8 --> message buffer 8 (newdat bit is not cleared)
Clear newdat bit in message buffer 1
Clear newdat bit in message buffer 2
Clear newdat bit in message buffer 3
Clear newdat bit in message buffer 4
Clear newdat bit in message buffer 5
Clear newdat bit in message buffer 6
Clear newdat bit in message buffer 7
Clear newdat bit in message buffer 8
Now if during that clearing of newdat bits, a new message comes in,
the HW gets confused and drops it.
It does not matter how many of them you clear. I put a delay between
clear of buffer 1 and buffer 2 which was long enough that the message
should have been queued either in buffer 1 or buffer 9. But it did not
show up anywhere. The next message ended up in buffer 1. So the
hardware lost a packet of course without telling it via one of the
error handlers.
That does not happen on all clear newdat bit events. I see one of 10k
packets dropped in the scenario which allows us to reproduce. But the
trace looks always the same.
Not splitting the RX Buffer avoids the packet loss but can cause
reordering. It's hard to trigger, but it CAN happen.
With that mode we use the HW as it was probably designed for. We read
from the buffer 1 upwards and clear the buffer as we get the
message. That's how all microcontrollers use it. So I assume that the
way we handle the buffers was never really tested. According to the
public documentation it should just work :)
Let the user decide which evil is the lesser one.
[ Oliver Hartkopp: Provided a sane config option and help text and
made me switch to favour potential and unlikely reordering over
packet loss ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The driver handles pointlessly TWO interrupts per packet. The reason
is that it enables the status interrupt which fires for each rx and tx
packet and it enables the per message object interrupts as well.
The status interrupt merily acks or in case of D_CAN ignores the TX/RX
state and then the message object interrupt fires.
The message objects interrupts are only useful if all message objects
have hardware filters activated.
But we don't have that and its not simple to implement in that driver
without rewriting it completely.
So we can ditch the message object interrupts and handle the RX/TX
right away from the status interrupt. Instead of TWO we handle ONE.
Note: We must keep the TXIE/RXIE bits in the message buffers because
the status interrupt alone is not reliable enough in corner cases.
If we ever have the need for HW filtering, then this code needs a
complete overhaul and we can think about it then. For now we prefer a
lower interrupt load.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
On D_CAN the RXOK, TXOK and LEC bits are cleared/set on read of the
status register. No need to update them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Instead of writing to the message object we can simply clear the
NewDat bit with the get method.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If the allocation of the error skb fails, we still want to see the
error statistics.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reading the LEC type with
return (mode & ENABLED) && (status & LEC_MASK);
is not guaranteed to return (status & LEC_MASK) if the enabled bit in
mode is set. It's guaranteed to return 0 or !=0.
Remove the inline function and call unconditionally into the
berr_handling code and return early when the reporting is disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If the allocation of an error skb fails, the state change handling
returns w/o doing any work. That leaves the interface in a wreckaged
state as the internal status is wrong.
Split the interface handling and the skb handling.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
There is no guarantee that the skb is in the same state after calling
net_receive_skb(). It might be freed or reused. Not really harmful as
its a read access, except you turn on the proper debugging options
which catch a use after free.
The whole can subsystem is full of this. Copy and paste ....
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The state change handler is called with device interrupts disabled
already. So no point in disabling them again when we enter bus off
state.
But what's worse is that we reenable the interrupts at the end of NAPI
poll unconditionally. So c_can_start() which is called from the
restart timer can trigger interrupts which confuse the hell out of the
half reinitialized driver/hw.
Remove the pointless device interrupt disable in the BUS_OFF handler
and prevent reenabling the device interrupts at the end of the poll
routine when the current state is BUS_OFF.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
c_can_start() enables interrupts way too early. The first enabling
happens when setting the control mode in c_can_chip_config() and then
again at the end of the function.
But that happens before napi_enable() and that means that an interrupt
which comes in will disable interrupts again and call napi_schedule,
which ignores the request and the later napi_enable() is not making
thinks work either. So the interface is up with all device interrupts
disabled.
Move the device interrupt after napi_enable() and add it to the other
callsites of c_can_start() in c_can_set_mode() and c_can_power_up()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
All type checks in c_can.c are != BOSCH_D_CAN so nobody noticed so far
that the pci code does not update the type information.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
David Gibson says:
====================
Fix problems with with IFLA_VF_PORTS (v2)
I've had a customer encounter a problem with getifaddrs(3) freezing up
on a system with a Cisco enic device.
I've discovered that the problem is caused by an enic device with a
large number of SR-IOV virtual functions overflowing the normal sized
packet buffer for netlink, leading to interfaces not being reported
from an RTM_GETLINK request.
The first patch here just makes the problem easier to locate if it
occurs again in a different way, by adding a WARN_ON() when we run out
of room in a netlink packet in this manner.
The second patch actually fixes the problem, by only reporting
IFLA_VF_PORTS information when the RTEXT_FILTER_VF flag is specified.
v2: Corrected some CodingStyle problems
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 115c9b8192 (rtnetlink: Fix problem with
buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
attribute if they were solicited by a GETLINK message containing an
IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.
That was done because some user programs broke when they received more data
than expected - because IFLA_VFINFO_LIST contains information for each VF
it can become large if there are many VFs.
However, the IFLA_VF_PORTS attribute, supplied for devices which implement
ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
It supplies per-VF information and can therefore become large, but it is
not currently conditional on the IFLA_EXT_MASK value.
Worse, it interacts badly with the existing EXT_MASK handling. When
IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then
rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
netlink_dump() will misinterpret this as having finished the listing and
omit data for this interface and all subsequent ones. That can cause
getifaddrs(3) to enter an infinite loop.
This patch addresses the problem by only supplying IFLA_VF_PORTS when
IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.
If it doesn't, however, things will go badly wrong, When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones. This can cause getifaddrs(3) to
enter an infinite loop.
This patch won't fix the problem, but it will WARN_ON() making it easier to
track down what's going wrong.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’:
net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable]
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman says:
====================
netlink: Preventing abuse when passing file descriptors.
Andy Lutomirski when looking at the networking stack noticed that it is
possible to trick privilged processes into calling write on a netlink
socket and send netlink messages they did not intend.
In particular from time to time there are suid applications that will
write to stdout or stderr without checking exactly what kind of file
descriptors those are and can be tricked into acting as a limited form
of suid cat. In other conversations the magic string CVE-2014-0181 has
been used to talk about this issue.
This patchset cleans things up a bit, adds some clean abstractions that
when used prevent this kind of problem and then finally changes all of
the handlers of netlink messages that I could find that call capable to
use netlink_ns_capable or an appropriate wrapper.
The abstraction netlink_ns_capable verifies that the original creator of
the netlink socket a message is sent from had the necessary capabilities
as well as verifying that the current sender of a netlink packet has the
necessary capabilities.
The idea is to prevent file descriptor passing of any form from
resulting in a file descriptor that can do more than it can for the
creator of the file descriptor.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.
To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink_net_capable - The common case use, for operations that are safe on a network namespace
netlink_capable - For operations that are only known to be safe for the global root
netlink_ns_capable - The general case of capable used to handle special cases
__netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
the skbuff of a netlink message.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>