Commit Graph

617774 Commits

Author SHA1 Message Date
Florian Westphal ad66713f5a netfilter: remove __nf_ct_kill_acct helper
After timer removal this just calls nf_ct_delete so remove the __ prefix
version and make nf_ct_kill a shorthand for nf_ct_delete.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:10 +02:00
Florian Westphal c023c0e4a0 netfilter: conntrack: resched gc again if eviction rate is high
If we evicted a large fraction of the scanned conntrack entries re-schedule
the next gc cycle for immediate execution.

This triggers during tests where load is high, then drops to zero and
many connections will be in TW/CLOSE state with < 30 second timeouts.

Without this change it will take several minutes until conntrack count
comes back to normal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:09 +02:00
Florian Westphal b87a2f9199 netfilter: conntrack: add gc worker to remove timed-out entries
Conntrack gc worker to evict stale entries.

GC happens once every 5 seconds, but we only scan at most 1/64th of the
table (and not more than 8k) buckets to avoid hogging cpu.

This means that a complete scan of the table will take several minutes
of wall-clock time.

Considering that the gc run will never have to evict any entries
during normal operation because those will happen from packet path
this should be fine.

We only need gc to make sure userspace (conntrack event listeners)
eventually learn of the timeout, and for resource reclaim in case the
system becomes idle.

We do not disable BH and cond_resched for every bucket so this should
not introduce noticeable latencies either.

A followup patch will add a small change to speed up GC for the extreme
case where most entries are timed out on an otherwise idle system.

v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on
nulls value change in gc worker, suggested by Eric Dumazet.

v3: don't call cancel_delayed_work_sync twice (again, Eric).

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:09 +02:00
Florian Westphal 2344d64ec7 netfilter: evict stale entries on netlink dumps
When dumping we already have to look at the entire table, so we might
as well toss those entries whose timeout value is in the past.

We also look at every entry during resize operations.
However, eviction there is not as simple because we hold the
global resize lock so we can't evict without adding a 'expired' list
to drop from later.  Considering that resizes are very rare it doesn't
seem worth doing it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:09 +02:00
Florian Westphal f330a7fdbe netfilter: conntrack: get rid of conntrack timer
With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.

Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de, 'timers: Switch to
a non-cascading wheel').

Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.

During conntrack lookup, even before doing tuple comparision, check
the timeout value and evict the entry in case it is too old.

The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.

Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.

This is the standard/expected way when conntrack entries are destroyed.

Followup patches will introduce garbage colliction via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:09 +02:00
Florian Westphal 616b14b469 netfilter: don't rely on DYING bit to detect when destroy event was sent
The reliable event delivery mode currently (ab)uses the DYING bit to
detect which entries on the dying list have to be skipped when
re-delivering events from the eache worker in reliable event mode.

Currently when we delete the conntrack from main table we only set this
bit if we could also deliver the netlink destroy event to userspace.

If we fail we move it to the dying list, the ecache worker will
reattempt event delivery for all confirmed conntracks on the dying list
that do not have the DYING bit set.

Once timer is gone, we can no longer use if (del_timer()) to detect
when we 'stole' the reference count owned by the timer/hash entry, so
we need some other way to avoid racing with other cpu.

Pablo suggested to add a marker in the ecache extension that skips
entries that have been unhashed from main table but are still waiting
for the last reference count to be dropped (e.g. because one skb waiting
on nfqueue verdict still holds a reference).

We do this by adding a tristate.
If we fail to deliver the destroy event, make a note of this in the
eache extension.  The worker can then skip all entries that are in
a different state.  Either they never delivered a destroy event,
e.g. because the netlink backend was not loaded, or redelivery took
place already.

Once the conntrack timer is removed we will now be able to replace
del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid
racing with other cpu that tries to evict the same conntrack.

Because DYING will then be set right before we report the destroy event
we can no longer skip event reporting when dying bit is set.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:08 +02:00
Florian Westphal 95a8d19f28 netfilter: restart search if moved to other chain
In case nf_conntrack_tuple_taken did not find a conflicting entry
check that all entries in this hash slot were tested and restart
in case an entry was moved to another chain.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: ea781f197d ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30 11:43:08 +02:00
David S. Miller 3201a39ba8 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2016-08-29

This series contains updates to fm10k only.

Jake provides all the changes in this series starting with fixes an issue
where VF devices may fail during an unbind/bind and we will never zero
the reference counter for the pci_dev structure.  Updated the hot path
to use SW counters instead of checking for hardware Tx pending for
possible transmit hangs, which will improve performance.  Fixed the NAPI
budget accounting so that fm10k_poll will return actual work done,
capped at (budget - 1) instead of returning 0.  Added a check to ensure
that the device is in the normal IO state before continuing to probe,
which allows us to give a more descriptive message of what is wrong
in the case of uncorrectable AER error.  In preparation for adding Geneve
Rx offload support, refactored the current VXLAN offload flow to be a bit
more generic.  Added support for receive offloads on one Geneve tunnel.
Ensure that other bits in the RXQCTL register do not get cleared, to
make sure that bits related to queue ownership are maintained.  Fixed
an issue in queue ownership assignment which casued a race condition
between the PF and the VF such that potentially a VF could cause FUM
fault errors due to normal PF/VF driver behavior.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 01:52:09 -04:00
David S. Miller 6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Linus Torvalds e4e98c460a hwmon patch for v4.8-rc5
Add missing sysfs attribute group terminator to it87 driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXxOcJAAoJEMsfJm/On5mBLuMQAIMl/6nPwcZA3TOqg8ACdE61
 CuqhX22uIwR/n+0ni38D+XuVpuAcFeQaC2N+PCkO1dtwSXEqRUxPgXyxElxe1+2V
 o53ALqMBPZv2/MgB95KU/nzo//2KxrnRzLPxU3CPBT8A4LLvqhe2dXdH17ojhoao
 7bYyrbq3YUOB4lvP/SDUtmiaie0xfaTGJnOibtKE1bHqwlbcz7ntuZSY7KrSZXfk
 PE7Vf3coLObijTIwEpfaGOJYr/WRZEEmdUH7bj4EvGF0jw8pG9Ho6c+RY/cPdTrb
 TxfpvfNhTxKdC2TWuCZp0fz4vhb7tVIVM8C9nNFrn5Z8O+GcgM6j2JIwCUBj2fLz
 ACyoogbnYhalN/aw/O0949dgMYkF/+Dmi44qQFudIODxr+LX6U/71pkOLv5GXeWx
 EGwvYP/GUUb0ZSzK//vUdrItEwxtho4ZduLPHKGxi7y/qbjKvkEAXpzLleeWgzqY
 DMw0lf8ffv99R5eAuur6f/VmLSAy8y+HVqqvHFj5vGTNXKsoEMZBjeMhw5/5Z2gD
 zQwTaoAlcfBcO60hnGMGy5fA7iPwfeY5Vv0wBlmCNv1zyCADpmna1benM1i7ehYw
 TClvaDMjLJFIRVGI0W1IaicVKHOIwraN3pAwp46CzBhQu7/LG4xlF854QPBZNLOW
 j86KN9JajF+hHOyKER5e
 =Lb8i
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Add missing sysfs attribute group terminator to it87 driver"

* tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (it87) Add missing sysfs attribute group terminator
2016-08-29 19:12:35 -07:00
Linus Torvalds b8927721ae Fix bugs that could cause kernel deadlocks or file system corruption
while moving xattrs to expand the extended inode.  Also add some
 sanity checks to the block group descriptors to make sure we don't end
 up overwriting the superblock.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXw7i2AAoJEPL5WVaVDYGj96gH/A8rNgx7BoqPx3kanVEamblT
 tM0X9JcEGmKHN4enRts2b78EWbR0/U0SOP92+fg9SSq2MDJ0/kdaKLWmbUwx8jUi
 B7HMEqCprlCdigK7wwt3xF+6edyZRhtzlWy3bhxJ40f0KT5CuriSQbxogr931uKl
 hUKW2h5JtUqHtINzTt4oWjVm8xwrScxuYHYAcpw0G42ZzfO6xQOzQdowcx4m3cE9
 PrtTbU5MwW8/wgsdLiClScQq30MK/GCbHh5heyRt1BcNo9+MDsZDOgdavh9StfnW
 Bl1N6zwRtRBJNcpKWfTfwU4NTIvStCTyA8BJgKgE95YIHDsstJVl4MO7ot25qbM=
 =pXe+
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix bugs that could cause kernel deadlocks or file system corruption
  while moving xattrs to expand the extended inode.

  Also add some sanity checks to the block group descriptors to make
  sure we don't end up overwriting the superblock"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: avoid deadlock when expanding inode size
  ext4: properly align shifted xattrs when expanding inodes
  ext4: fix xattr shifting when expanding inodes part 2
  ext4: fix xattr shifting when expanding inodes
  ext4: validate that metadata blocks do not overlap superblock
  ext4: reserve xattr index for the Hurd
2016-08-29 12:37:11 -07:00
Linus Torvalds 1f6a563ee0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Segregate namespaces properly in conntrack dumps, from Liping Zhang.

 2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet.

 3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz.

 4) Fix use-after-free in tcp_xmit_retransmit_queue().

 5) Userspace header fixups (use of __u32, missing includes, etc.) from
    Mikko Rapeli.

 6) Further refinements to fragmentation wrt gso and tunnels, from
    Shmulik Ladkani.

 7) Trigger poll correctly for zero length UDP packets, from Eric
    Dumazet.

 8) TCP window scaling fix, also from Eric Dumazet.

 9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets.

10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet.

11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng.

12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet.

13) Add new device ID to alx driver, from Owen Lin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  Add Killer E2500 device ID in alx driver.
  net: smc91x: fix SMC accesses
  Documentation: networking: dsa: Remove platform device TODO
  net/mlx5: Increase number of ethtool steering priorities
  net/mlx5: Add error prints when validate ETS failed
  net/mlx5e: Fix memory leak if refreshing TIRs fails
  net/mlx5e: Add ethtool counter for TX xmit_more
  net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
  net/mlx5e: Don't wait for SQ completions on close
  net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
  net/mlx5e: Don't wait for RQ completions on close
  net/mlx5e: Limit UMR length to the device's limitation
  rhashtable: fix a memory leak in alloc_bucket_locks()
  sfc: fix potential stack corruption from running past stat bitmask
  team: loadbalance: push lacpdus to exact delivery
  net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
  8139cp: Fix one possible deadloop in cp_rx_poll
  i40e: Change some init flow for the client
  Revert "phy: IRQ cannot be shared"
  net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
  ...
2016-08-29 12:29:13 -07:00
Linus Torvalds cf4d3779e5 platform-drivers-x86 for 4.8-4
Remove module related code from two drivers that are only configurable as
 built-in.
 
 intel_pmic_gpio:
  - Make explicitly non-modular
 
 platform/olpc:
  - Make ec explicitly non-modular
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXw8nIAAoJEKbMaAwKp364M/gH/iWSc0YIZppWjAF3brN8Vgv/
 PSKzG2BJHiDV/LYsGBBmgPTcT7Y56Oun88MkVCdTx/oKuTYY0CzMd5FE1rpNJoyz
 8kKy8Vi4PIj4qqAZA2//8GSY7iD/wX+9Zm2e42Hjo2gXxGF2RyrbLQue9C9vHrWr
 3WyCZEeN/gaED6G7tKMOdPQnqAJFdqjvAhsvKU/dedWIWcgB9mee6drQUrJBuWfO
 qnLUQrQXVGXQO3xOaPbJmusVhzrbUm4qlhb6cW86S/lNFp6JmN9XmgxULt/cHewE
 TIdhODBDYgbrBDOzViQ4QVXZprq6P2EXzawXK9EWjNswZODJW5hIAgwPOizTgLo=
 =GL9U
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
 "Remove module related code from two drivers that are only configurable
  as built-in: intel_pmic_gpio and platform/olpc"

* tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  intel_pmic_gpio: Make explicitly non-modular
  platform/olpc: Make ec explicitly non-modular
2016-08-29 12:20:22 -07:00
Linus Torvalds 2a90309e06 powerpc fixes for 4.8 #4
Andrew Donnellan (1):
       cxl: use pcibios_free_controller_deferred() when removing vPHBs
 
 Andrzej Hajda (1):
       powerpc/powernv/pci: fix iterator signedness
 
 Boqun Feng (1):
       powerpc, hotplug: Avoid to touch non-existent cpumasks.
 
 Christophe Leroy (1):
       powerpc: sysdev: cpm: fix gpio save_regs functions
 
 Cyril Bur (1):
       powerpc: signals: Discard transaction state from signal frames
 
 Guenter Roeck (1):
       powerpc: cputhreads: Add missing include file
 
 Markus Elfring (3):
       drivers/macintosh: Delete owner assignment
       powerpc/512x: Delete unnecessary assignment for the field "owner"
       powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"
 
 Mauricio Faria de Oliveira (1):
       powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
 
 Michael Ellerman (1):
       powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
 
 Mukesh Ojha (1):
       powerpc/powernv : Drop reference added by kset_find_obj()
 
 Nicholas Piggin (3):
       powerpc/pseries: PACA save area fix for general exception vs MCE
       powerpc/pseries: PACA save area fix for MCE vs MCE
       powerpc/tm: do not use r13 for tabort_syscall
 
 Paolo Bonzini (1):
       powerpc: move hmi.c to arch/powerpc/kvm/
 
 Paul Gortmaker (1):
       powerpc: migrate exception table users off module.h and onto extable.h
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXw7jmAAoJEHM62YSLdExeu6YP+gMNWkW4CDjvIo8OsVMhYKqX
 RP3c6xa8APcizfdxssKMwX+STP6nfEzRUMCsDMmChSSpb1ZtSNdwM2t2zmlLDe8r
 wKHN7Chqw1EJsnIT6TLa79fvHft03MyGQCTjeRvr54aFcTVJHuAgZnTeuejFBrtN
 hHCsO/pEv+KjgGlDkN+UvEMUs0quI5Ri8kRJpDxO6il1agcPpeYFYB3ksIcgwaVH
 0X/MwLJbzENmiDLhtb3S0jQWGMijmpPbcz5b3z36gIpTxddEPnC9UhYCdc924V/C
 sREuku7rihXDEda6B/qNpk8kOxXxIkE/dh5eWVMyWVDHIXJA6hy4z0zpoF9xe4Ba
 H2/ERyCfsP3LqonOpDOWKs/tkTAmMXOwdfgr+geoc4vDsbR2X8jyjHqb3bdj7J2A
 YtcjmQ/w9HCnzC0weme1+0xR2tfyjUciNOdNoRHP/shLAObzc14xXlpg0IZ3eou2
 oozGqkHnGoA6SrsqezF0UmsAf3884BcVscY1N45p1s8ulMRkyHTgPHM8KnGpHoHq
 1zgUZji2F4VFgjGrMJyMZ82c3usP3/N65d2nkvRv/QzkmdTat4OmXLiB0Qky26Wy
 gW7GsyRpGdEvW/KxhLxuobxPw7xg4mGQ3MdUMWzL3Df17IuRquuFhjljeQIUsQDE
 I7NVfZATWJ7mqJQU2s5L
 =+1I0
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Ben Herrenschmidt:
 "This was meant to be sent early last week, but I has a change pending
  on one of the fixes and other things made me forget all about.  Ugh.

  We have some misc fixes for powerpc 4.8.  Some trivial bits and some
  regressions, and a trivial cleanup or two that I saw no point in
  letting rot in patchwork"

* tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: signals: Discard transaction state from signal frames
  powerpc/powernv : Drop reference added by kset_find_obj()
  powerpc/tm: do not use r13 for tabort_syscall
  powerpc: move hmi.c to arch/powerpc/kvm/
  powerpc: sysdev: cpm: fix gpio save_regs functions
  powerpc/pseries: PACA save area fix for MCE vs MCE
  powerpc/pseries: PACA save area fix for general exception vs MCE
  powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
  powerpc, hotplug: Avoid to touch non-existent cpumasks.
  powerpc: migrate exception table users off module.h and onto extable.h
  powerpc/powernv/pci: fix iterator signedness
  powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
  cxl: use pcibios_free_controller_deferred() when removing vPHBs
  powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"
  powerpc/512x: Delete unnecessary assignment for the field "owner"
  drivers/macintosh: Delete owner assignment
  powerpc: cputhreads: Add missing include file
2016-08-29 12:12:15 -07:00
Jean Delvare 3c3292634f hwmon: (it87) Add missing sysfs attribute group terminator
Attribute array it87_attributes_in lacks its NULL terminator,
causing random behavior when operating on the attribute group.

Fixes: 5292971563 ("hwmon: (it87) Use is_visible for voltage sensors")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-08-29 05:31:31 -07:00
Jacob Keller 325782a173 fm10k: don't re-map queues when a mailbox message suffices
When the PF assigns a new MAC address to a VF it uses the base address
registers to store the MAC address. This allows a VF which loads after
this setup the ability to get the initial address without having to wait
for a mailbox message. Unfortunately to do this, the PF must take queue
ownership away from the VF, which can cause fault errors when there is
already an active VF driver.

This queue ownership assignment causes race condition between the PF and
the VF such that potentially a VF can cause FUM fault errors due to
normal PF/VF driver behavior.

It is not safe to simply allow the PF to write the base address
registers without taking queue ownership back as the PF must also
disable the queues, and this would impact active VF use. The current
code is safe because the queue ownership will prevent the VF from
actually writing but does trigger the FUM fault.

We can do better by simply avoiding the register write process when
a mailbox message suffices. If the message can be sent over the mailbox,
then we will not perform the queue ownership assignment and we won't
update the base address to be the same as the MAC address.

We do still have to write the TXQCTL registers in order to update the
VID of the queue. This is necessary because the TXQCTL register is
read-only from the VF, and thus the VF cannot do this for itself. This
register does not need to wait for the Tx queue to be disabled and is
safe for the PF to write during normal VF operation, so we move this
write to the top of the function above the mailbox message. Without
this, the TXQCTL register would be misconfigured and cause the VF to Tx
hang.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller c689eff124 fm10k: don't clear the RXQCTL register when enabling or disabling queues
Ensure that other bits in the RXQCTL register do not get cleared. This
ensures that bits related to queue ownership are maintained.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 9717c77213 fm10k: remove unnecessary extra parenthesis around ((~value))
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 1ad782928f fm10k: add support for Rx offloads on one Geneve tunnel
Similar to how we handle VXLAN offload, enable support for a single
Geneve tunnel.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller f92e0e4892 fm10k: rework vxlan_port offload before adding geneve support
In preparation for adding Geneve Rx offload support, refactor the
current VXLAN offload flow to be a bit more generic so that it will be
easier to add the new Geneve code. The fm10k hardware supports one VXLAN
and one Geneve tunnel, so we will eventually treat the VXLAN and Geneve
tunnels identically. To this end, factor out the code that handles the
current list so that we can use the generic flow for both tunnels in the
next patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 5f45c83024 fm10k: don't try to stop queues if we've lost hw_addr
In the event of a surprise remove, we expect the driver to go down,
which includes calling .stop_hw(). However, this function will return an
error because the queues won't appear to cleanly disable. Prevent this
and avoid the unnecessary checks by just returning when
FM10K_REMOVED(hw->hw_addr) is true.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 18095937cb fm10k: don't continue probe if PCI device not in normal IO state
In the event of an uncorrectable AER error occurring when the driver has
not loaded, the recovery routines are not done. This is done because
future loads of the driver may not be aware of the IO state and may not
be able to recover at all. In this case, when we next load the driver it
fails due to what appears to be a surprise remove event. Instead, add
a check to ensure that the device is in the normal IO state before
continuing to probe. This allows us to give a more descriptive message
of what is wrong.

Without this change, the driver will attempt to probe up to our first
call of .reset_hw() which will be unable to read registers and act as if
a surprise remove event occurred.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 76ef0fc5a7 fm10k: print error code when pci_enable_device_mem fails during probe
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller e5fbfb7864 fm10k: NAPI polling routine must return actual work done
When fm10k_poll fully cleans rings it returns 0. This is incorrect as it
messes up the budget accounting in the core NAPI code. Fix this by
returning actual work done, capped at budget - 1 since the core doesn't
expect a return of the full budget when the driver modifies the NAPI
status.

Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller ce4dad2ce2 fm10k: prefer READ_ONCE instead of ACCESS_ONCE
While technically not needed, as all our uses of ACCESS_ONCE are scalar
types, we already use READ_ONCE in a few places, and for code
readability we can swap all the uses of the older ACCESS_ONCE into
READ_ONCE.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 88cdcfec9a fm10k: remove fm10k_get_reta_size from namespace
The function is only used in fm10k_ethtool.c, so make it static.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 4aa0bd54d4 fm10k: use variadic form of alloc_workqueue
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller 5b9e4432db fm10k: use software values when checking for Tx hangs in hot path
A previous patch added support to check for hardware Tx pending in the
fm10k_down routine. This support was intended to ensure that we
accurately check what the hardware state is. However, checking for Tx
hangs in this manor during the hotpath results in a large performance
hit. Avoid this by making the hotpath check use the SW counters instead.

Fixes: a0f53cf49cb0 ("fm10k: use actual hardware registers when checking for pending Tx", 2016-06-08)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Jacob Keller e59a393d08 fm10k: fix PCI device enable_cnt leak in .io_slot_reset
A previous patch removed the pci_disable_device() call in
.io_error_detected. This call corresponded to a pci_enable_device_mem()
call within .io_slot_reset handler. Change the call here to
a pci_reenable_device() so that it does not increment and leak the
enable_cnt reference count for the device. Without this change, VF
devices may fail during an unbind/bind, and we'll never zero the
reference counter for the pci_dev structure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29 01:31:03 -07:00
Paul Gortmaker da43bf0c21 intel_pmic_gpio: Make explicitly non-modular
The Kconfig entry controlling compilation of this code is:

drivers/platform/x86/Kconfig:config GPIO_INTEL_PMIC
drivers/platform/x86/Kconfig:   bool "Intel PMIC GPIO support"

...meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.

We delete the MODULE_LICENSE tag etc. since all that information
was (or is now) contained at the top of the file in the comments.

We don't replace module.h with init.h since the file already has that.

Cc: Alek Du <alek.du@intel.com>
Cc: platform-driver-x86@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-08-28 22:31:52 -07:00
Paul Gortmaker f48d1496b8 platform/olpc: Make ec explicitly non-modular
The Kconfig entry controlling compilation of this code is:

arch/x86/Kconfig:config OLPC
arch/x86/Kconfig:       bool "One Laptop Per Child support"

...meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.

We delete the MODULE_LICENSE tag etc. since all that information
was (or is now) contained at the top of the file in the comments.

Cc: platform-driver-x86@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-08-28 22:31:52 -07:00
Arnd Bergmann 0b498a5277 net_sched: fix use of uninitialized ethertype variable in cls_flower
The addition of VLAN support caused a possible use of uninitialized
data if we encounter a zero TCA_FLOWER_KEY_ETH_TYPE key, as pointed
out by "gcc -Wmaybe-uninitialized":

net/sched/cls_flower.c: In function 'fl_change':
net/sched/cls_flower.c:366:22: error: 'ethertype' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the code to only set the ethertype field if it
was nonzero, as before the patch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 9399ae9a6c ("net_sched: flower: Add vlan support")
Cc: Hadar Hen Zion <hadarh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29 00:30:23 -04:00
Arnd Bergmann f9dc70744d net/xgene: fix error handling during reset
The newly added reset logic uses helper functions for the MMIO that
may fail. However, when the read operation fails, we end up writing
back uninitialized data to the register, as gcc warns:

drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c: In function 'xgene_enet_link_state':
drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:213:2: error: 'data' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:209:6: note: 'data' was declared here
  u32 data;

We already print a warning to the console log if that happens,
the best alternative that I can see is skip the rest of the reset
sequence if the register value cannot be read: Most likely the
write would fail as well, and if it succeeded, worse things could
happen.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 3eb7cb9dc9 ("drivers: net: xgene: XFI PCS reset when link is down")
Cc: Fushen Chen <fchen@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29 00:29:46 -04:00
Vidya Sagar Ravipati 5711a98221 net: ethtool: add support for 1000BaseX and missing 10G link modes
This patch enhances ethtool link mode bitmap to include
missing interface modes for 1G/10G speeds

Changes:
1000baseX is the mode introduced to cover all 1G Fiber cases.
All modes under 1000BaseX i.e. 1000BASE-SX, 1000BASE-LX, 1000BASE-LX10
and 1000BASE-BX10 are not explicitly defined at this moment.
10G CR,SR,LR and ER link modes are included for 10G speed..

Issue:
ethtool on  1G/10G SFP port reports Base-T
as this port supports 1000baseX,10G CR, SR and LR modes.

root@tor-02$ ethtool swp1
Settings for swp1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Current message level: 0x00000000 (0)

        Link detected: yes

After fix:
root@tor-02$ ethtool swp1
Settings for swp1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
                                10000baseLR/Full
                                10000baseER/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Current message level: 0x00000000 (0)
        Link detected: yes

Signed-off-by: Vidya Sagar Ravipati <vidya@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29 00:27:34 -04:00
James Morse a039b63859 amd-xgbe: Reset running devices after resume from hibernate
After resume from hibernate on arm64, any amd-xgbe devices that were
running when we hibernated are reported as down, even when it is not.

Re-plugging the cables does not cause the interface to come back, the
link must be marked as down then up via 'ip set link' using the serial
console.

This happens because the device has been power-cycled and possibly
re-initialised by firmware, whereas the driver's memory structures have
been restored from the hibernate image and the two do not agree.

Schedule a restart of the device after powerup in case the world changed
while we were asleep.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29 00:25:47 -04:00
Owen Lin b99b43bb4b Add Killer E2500 device ID in alx driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29 00:23:50 -04:00
Eric Dumazet c9c3321257 tcp: add tcp_add_backlog()
When TCP operates in lossy environments (between 1 and 10 % packet
losses), many SACK blocks can be exchanged, and I noticed we could
drop them on busy senders, if these SACK blocks have to be queued
into the socket backlog.

While the main cause is the poor performance of RACK/SACK processing,
we can try to avoid these drops of valuable information that can lead to
spurious timeouts and retransmits.

Cause of the drops is the skb->truesize overestimation caused by :

- drivers allocating ~2048 (or more) bytes as a fragment to hold an
  Ethernet frame.

- various pskb_may_pull() calls bringing the headers into skb->head
  might have pulled all the frame content, but skb->truesize could
  not be lowered, as the stack has no idea of each fragment truesize.

The backlog drops are also more visible on bidirectional flows, since
their sk_rmem_alloc can be quite big.

Let's add some room for the backlog, as only the socket owner
can selectively take action to lower memory needs, like collapsing
receive queues or partial ofo pruning.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29 00:20:24 -04:00
Russell King 2fb04fdf30 net: smc91x: fix SMC accesses
Commit b70661c708 ("net: smc91x: use run-time configuration on all ARM
machines") broke some ARM platforms through several mistakes.  Firstly,
the access size must correspond to the following rule:

(a) at least one of 16-bit or 8-bit access size must be supported
(b) 32-bit accesses are optional, and may be enabled in addition to
    the above.

Secondly, it provides no emulation of 16-bit accesses, instead blindly
making 16-bit accesses even when the platform specifies that only 8-bit
is supported.

Reorganise smc91x.h so we can make use of the existing 16-bit access
emulation already provided - if 16-bit accesses are supported, use
16-bit accesses directly, otherwise if 8-bit accesses are supported,
use the provided 16-bit access emulation.  If neither, BUG().  This
exactly reflects the driver behaviour prior to the commit being fixed.

Since the conversion incorrectly cut down the available access sizes on
several platforms, we also need to go through every platform and fix up
the overly-restrictive access size: Arnd assumed that if a platform can
perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
size needed to be specified - not so, all available access sizes must
be specified.

This likely fixes some performance regressions in doing this: if a
platform does not support 8-bit accesses, 8-bit accesses have been
emulated by performing a 16-bit read-modify-write access.

Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
accesses, which was broken by the original commit.

Fixes: b70661c708 ("net: smc91x: use run-time configuration on all ARM machines")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:44:55 -04:00
Florian Fainelli 7d13eca09e Documentation: networking: dsa: Remove platform device TODO
Since commit 83c0afaec7 ("net: dsa: Add new binding implementation"),
the shortcomings of the dsa platform device have been addressed, remove
that TODO item.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:43:06 -04:00
Colin Ian King 1a8ff8f52f cxgb4/cxgb4vf: fix spelling mistake "provissioned" -> "provisioned"
Trivial fix to spelling mistake in dev_warn message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:41:46 -04:00
Colin Ian King b9780a810b net: ucc_geth: fix spelling mistake "propperty" -> "property"
Trivial fix to spelling mistake in dev_warn message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:41:46 -04:00
Colin Ian King 24a24d07d6 wan/fsl_ucc_hdlc: fix spelling mistake "prameter" -> "parameter"
Trivial fix to spelling mistake in dev_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:41:46 -04:00
David S. Miller c6f04e93cb Merge branch 'strp-generalization'
Tom Herbert says:

====================
strp: Generalize stream parser to work with other socket types

Add a read_sock protocol operation function that allows something like
tcp_read_sock to be called for other protocol types.

Specific changes in this patch set:
  - Add read_sock function to proto_ops. This has the same signature as
    tcp_read_sock. sk_read_actor_t is also defined in net.h.
  - Set peek_len and read_sock proto_op functions for TCPv4 and TCPv6
    stream ops.
  - Remove references to tcp in strparser.
  - Call peek_len and read_sock operations from strparser instead of
    calling TCP specific functions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:32:58 -04:00
Tom Herbert 96a5908347 kcm: Remove TCP specific references from kcm and strparser
kcm and strparser need to work with any type of stream socket not just
TCP. Eliminate references to TCP and call generic proto_ops functions of
read_sock and peek_len. Also in strp_init check if the socket support
the proto_ops read_sock and peek_len.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:32:41 -04:00
Tom Herbert 3203558589 tcp: Set read_sock and peek_len proto_ops
In inet_stream_ops we set read_sock to tcp_read_sock and peek_len to
tcp_peek_len (which is just a stub function that calls tcp_inq).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:32:41 -04:00
Tom Herbert 0294b625ad net: Add read_sock proto_op
Add new function in proto_ops structure. This includes moving the
typedef got sk_read_actor into net.h and removing the definition from
tcp.h.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:32:41 -04:00
David S. Miller e4d986a878 Merge branch 'mlx5-series'
Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 2016-08-29

This series contains some bug fixes for the mlx5 core and mlx5
ethernet driver.

From Saeed, Fix UMR to consider hardware translation table field
size limitation when calculating the maximum number of MTTs required
by the driver.  Three patches to speed-up netdevice close time by
serializing channel (SQs & RQs) destruction rather than issuing and
waiting for hardware interrupts to free them.

From Eran, Fix ethtool ring parameter reporting for striding RQ layout.
Add error prints on ETS validation failure.

From Kamal, Fix memory leak on error flow.

From Maor, Fix ethtool steering priorities number.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:24 -04:00
Maor Gottlieb e5835f2833 net/mlx5: Increase number of ethtool steering priorities
Ethtool has 11 flow tables, each flow table has its own priority.
Increase the number of priorities to be aligned with the number of flow
tables.

Fixes: 1174fce8d1 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:16 -04:00
Eran Ben Elisha 1722b9694e net/mlx5: Add error prints when validate ETS failed
Upon set ETS failure due to user invalid input, add error prints to
specify the exact error to the user.

Fixes: cdcf11212b ('net/mlx5e: Validate BW weight values of ETS')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:16 -04:00
Kamal Heib bf50082c15 net/mlx5e: Fix memory leak if refreshing TIRs fails
Free 'in' command object also when mlx5_core_modify_tir fails.

Fixes: 724b2aa151 ("net/mlx5e: TIRs management refactoring")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00