commit 578bc3ef1e ("ipvs: reorganize dest trash") added
IP_VS_DEST_STATE_REMOVING flag and RCU callback named
ip_vs_dest_wait_readers() to keep dests and services after
removal for at least a RCU grace period. But we have the
following corner cases:
- we can not reuse the same dest if its service is removed
while IP_VS_DEST_STATE_REMOVING is still set because another dest
removal in the first grace period can not extend this period.
It can happen when ipvsadm -C && ipvsadm -R is used.
- dest->svc can be replaced but ip_vs_in_stats() and
ip_vs_out_stats() have no explicit read memory barriers
when accessing dest->svc. It can happen that dest->svc
was just freed (replaced) while we use it to update
the stats.
We solve the problems as follows:
- IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed
idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start
will remember when for first time after deletion we noticed
dest->refcnt=0. Later, the connections can grab a reference
while in RCU grace period but if refcnt becomes 0 we can
safely free the dest and its svc.
- dest->svc becomes RCU pointer. As result, we add explicit
RCU locking in ip_vs_in_stats() and ip_vs_out_stats().
- __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it
now can free the service immediately or after a RCU grace
period. dest->svc is not set to NULL anymore.
As result, unlinked dests and their services are
freed always after IP_VS_DEST_TRASH_PERIOD period, unused
services are freed after a RCU grace period.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Schedulers such as lblc and lblcr require the weight to be as high as the
maximum number of active connections. In commit b552f7e3a9
("ipvs: unify the formula to estimate the overhead of processing
connections"), the consideration of inactconns and activeconns was cleaned
up to always count activeconns as 256 times more important than inactconns.
In cases where 3000 or more connections are expected, a weight of 3000 *
256 * 3000 connections overflows the 32-bit signed result used to determine
if rescheduling is required.
On amd64, this merely changes the multiply and comparison instructions to
64-bit. On x86, a 64-bit result is already present from imull, so only
a few more comparison instructions are emitted.
Signed-off-by: Simon Kirby <sim@hostway.ca>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for you net tree,
mostly targeted to ipset, they are:
* Fix ICMPv6 NAT due to wrong comparison, code instead of type, from
Phil Oester.
* Fix RCU race in conntrack extensions release path, from Michal Kubecek.
* Fix missing inversion in the userspace ipset test command match if
the nomatch option is specified, from Jozsef Kadlecsik.
* Skip layer 4 protocol matching in ipset in case of IPv6 fragments,
also from Jozsef Kadlecsik.
* Fix sequence adjustment in nfnetlink_queue due to using the netlink
skb instead of the network skb, from Gao feng.
* Make sure we cannot swap of sets with different layer 3 family in
ipset, from Jozsef Kadlecsik.
* Fix possible bogus matching in ipset if hash sets with net elements
are used, from Oliver Smith.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit afbd8bae9c
vxlan: add implicit fdb entry for default destination
creates an implicit fdb entry for default destination. This results
in an invalid fdb entry if default destination is not specified.
For ex:
ip link add vxlan1 type vxlan id 100
creates the following fdb entry
00:00:00:00:00:00 dev vxlan1 dst 0.0.0.0 self permanent
This patch fixes this issue by creating an fdb entry only if a
valid default destination is specified.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1b7fdd2ab5 ("tcp: do not use cached RTT for RTT estimation")
did not correctly account for the fact that crtt is the RTT shifted
left 3 bits. Fix the calculation to consistently reflect this fact.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clears following warnings :
WARNING: Use include <linux/io.h> instead of <asm/io.h>
WARNING: Use include <linux/uaccess.h> instead of <asm/uaccess.h>
Signed-off-by: Avinash Kumar <avi.kp.137@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has recently turned up that we have a number of long standing bugs
in the network stack cleanup code with use of the loopback device
after it has been freed that have not turned up because in most cases
the storage allocated to the loopback device is not reused, when those
accesses happen.
Set looback_dev to NULL to trigger oopses instead of silent data corrupt
when we hit this class of bug.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
Some bug fixes and future-proofing for the recently added SFC9120
support:
1. Minimal support for the 40G configuration.
2. Disable the incomplete PTP/hardware timestamping support.
3. Reset MAC stats properly after a firmware upgrade.
4. Re-check the datapath firmware capabilities after the controller is
reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Adapt the same behaviour for SCTP as present in TCP for ICMP redirect
messages. For IPv6, RFC4443, section 2.4. says:
...
(e) An ICMPv6 error message MUST NOT be originated as a result of
receiving the following:
...
(e.2) An ICMPv6 redirect message [IPv6-DISC].
...
Therefore, do not report an error to user space, just invoke dst's redirect
callback and leave, same for IPv4 as done in TCP as well. The implication
w/o having this patch could be that the reception of such packets would
generate a poll notification and in worst case it could even tear down the
whole connection. Therefore, stop updating sk_err on redirects.
Reported-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use USB_DEVICE_AND_INTERFACE_INFO and USB_VENDOR_AND_INTERFACE_INFO
macros to reduce boilerplate.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Cc: <stable@vger.kernel.org> # 3.0+ as far back as it applies cleanly
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.
Introduced by c075b13098 (ip6tnl: advertise tunnel param via rtnl).
Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is important patch for new devices that support unaligned
addressing. That devices suffer from the backward-compatibility bug in
DMA engine. In theory we should be able to use old mechanism, but in
practice DMA address seems to be randomly copied into status register
when hardware reaches end of a ring. This breaks reading slot number
from status register and we can't use DMA anymore.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this patch it is impossible to read et_swtype, because the 1
byte space is needed for the terminating null byte. The max expected
value is 0xF, so now it should be possible to read decimal form ("15")
and hex form ("0xF").
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices (BCM4749, BCM5357, BCM53572) have internal switch that
requires initialization. We already have code for this, but because
of the typo in code it was never working. This resulted in network not
working for some routers and possibility of soft-bricking them.
Use correct bit for switch initialization and fix typo in the define.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Use separate table for alias entries in the ehea module, otherwise the
probe() function will operate on the separate ports instead of the
lhea-"root" entry of the device-tree
Addresses https://bugzilla.novell.com/show_bug.cgi?id=435215
[ Thadeu notes that: "... this issue might happen with the generation of
initrd, when the scripts check for /sys/class/net/eth0/device/modalias,
which links to the port device at
/sys/devices/ibmebus/23c00400.lhea/port0/" ]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Olaf Hering <ohering@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a serious bug affecting all hash types with a net element -
specifically, if a CIDR value is deleted such that none of the same size
exist any more, all larger (less-specific) values will then fail to
match. Adding back any prefix with a CIDR equal to or more specific than
the one deleted will fix it.
Steps to reproduce:
ipset -N test hash:net
ipset -A test 1.1.0.0/16
ipset -A test 2.2.2.0/24
ipset -T test 1.1.1.1 #1.1.1.1 IS in set
ipset -D test 2.2.2.0/24
ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set
This is due to the fact that the nets counter was unconditionally
decremented prior to the iteration that shifts up the entries. Now, we
first check if there is a proceeding entry and if not, decrement it and
return. Otherwise, we proceed to iterate and then zero the last element,
which, in most cases, will already be zero.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:
# ipset n test hash:net
# ipset a test 10.0.0.0/24 nomatch
# ipset t test 10.0.0.1
10.0.0.1 is NOT in set test.
# ipset t test 10.0.0.1 nomatch
10.0.0.1 is in set test.
# ipset a test 192.168.0.0/24
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is NOT in set test.
Before the patch the results were
...
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is in set test.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
When driver registration fails, we need to clean up the resources allocated
before. cxgb4 missed to destroy the workqueue allocated at the very beginning.
This patch destroies the workqueue when registration fails.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
running bonding in ALB mode requires that learning packets be sent periodically,
so that the switch knows where to send responding traffic. However, depending
on switch configuration, there may not be any need to send traffic at the
default rate of 3 packets per second, which represents little more than wasted
data. Allow the ALB learning packet interval to be made configurable via sysfs
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Acked-by: Veaceslav Falico <vfalico@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 8390f814 "atm: nicstar: re-use native mac_pton() helper" did a
usefull thing. However, mac_pton() returns 1 in the case of the successfully
parsed input. This patch fixes a typo.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes sparse warnings when incorrectly handling the port number
and using int instead of unsigned int iterating through &vn->sock_list[].
Keeping the port as __be16 also makes things clearer wrt endianess.
Also, it was pointed out that vxlan_get_rx_port() had unnecessary checks
which got removed.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o At the time of firmware hang "adapter->need_fw_reset" variable gets
set but after re-initialization of firmware OR at the time of VF
re-initialization that variable was not getting cleared which
was leading to failure in VF reset recovery.Fix it by clearing
this variable before re-initializing VF
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NULL deref happens when br_handle_frame is called between these
2 lines of del_nbp:
dev->priv_flags &= ~IFF_BRIDGE_PORT;
/* --> br_handle_frame is called at this time */
netdev_rx_handler_unregister(dev);
In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced
without check but br_port_get_rcu(dev) returns NULL if:
!(dev->priv_flags & IFF_BRIDGE_PORT)
Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary
here since we're in rcu_read_lock and we have synchronize_net() in
netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT
and by the previous patch, make sure br_port_get_rcu is called in
bridging code.
Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
current br_port_get_rcu is problematic in bridging path
(NULL deref). Change these calls in netlink path first.
Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/toshiba/ps3_gelic_net.c
It's a NOOP since 2.6.35 and I will remove it one day ;)
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
code in drivers/net/ethernet/smsc/
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/pasemi/pasemi_mac.c
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
code in drivers/net/ethernet/natsemi/
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/micrel/ks8851_mll.c
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/lantiq_etop.c
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/hp/hp100.c
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/freescale/fec_main.c
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the already existing pm_cap variable in struct pci_dev for
determining the power management offset. This saves the driver from
having to keep track of an extra variable.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Nithin Nayak Sujir <nsujir@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the already existing pm_cap variable in struct pci_dev for
determining the power management offset. This saves the driver from
having to keep track of an extra variable.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pci_enable_device_mem() will set device power state to D0,
so it's no need to do it again in alx_probe().
Also remove redundant PM Cap find code, because pci core
has been saved the pci device pm cap value.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid unneeded local string buffers for constructing debug output. Also
cleans up debug calls that contain a single parameter so that they cannot
be accidentally parsed as format strings.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Libo Chen <libo.chen@huawei.com>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to ixgbe and e1000e.
Jacob provides a ixgbe patch to fix the configure_rx patch to properly
disable RSC hardware logic when a user disables it. Previously we only
disabled RSC in the queue settings, but this does not fully disable
hardware RSC logic which can lead to unexpected performance issues.
Emil provides three fixes for ixgbe. First fixes the ethtool loopback
test when DCB is enabled, where the frames may be modified on Tx
(by adding VLAN tag) which will fail the check on receive. Then a fix
for QSFP+ modules, limit the speed setting to advertise only one speed
at a time since the QSFP+ modules do not support auto negotiation.
Lastly, resolve an issue where the driver will display incorrect info
for QSFP+ modules that were inserted after the driver has been loaded.
David Ertman provides to fixes for e1000e, one removes a comparison to
the boolean value true where evaluating the lvalue will produce the
same result. The other fixes an error in the calculation of the
rar_entry_count, which causes a write of unkown/undefined register
space in the MAC to unknown/undefined register space in the PHY.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When copying the MAC RAR registers to PHY there is an error in the
calculation of the rar_entry_count, which causes a write of unknown/
undefined register space in the MAC to unknown/undefined register space in
the PHY.
This patch fixes the overrun with writing to the PHY RAR and also fixes the
ethtool offline register tests so that the correctly addressed registers
have the appropriate bitmasks for R/W and RO bits for affected parts.
Shawn Rader gets credit for finding and fixing the register overrun.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
CC: Shawn Rader <shawn.t.rader@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Removing a comparison to the boolean value true where simply interrogating
the lvalue will produce the same result.
Signed-off-by: David Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch resolves an issue where the driver will display incorrect info
for Q/SFP+ modules that were inserted after the driver has been loaded.
This patch adds a call to identify_phy() in ixgbe_get_settings() prior to
calling get_link_capabilities() which needs the PHY data in order to
determine the correct settings.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
QSFP+ modules do not support auto negotiation and should advertise only
one speed at a time.
This patch adds logic in ethtool to allow setting and reporting the
advertised speed at either 1Gbps or 10Gbps, but not both. Also limits
the speed set in ixgbe_sfp_link_config_subtask() to highest supported.
Previously the link was set to whatever the supported speeds were.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>