The fixed phys delete function simply removed the fixed phy from the
internal linked list and freed the memory. It however did not
unregister the associated phy device. This meant it was still possible
to find the phy device on the mdio bus.
Make fixed_phy_del() an internal function and add a
fixed_phy_unregister() to unregisters the phy device and then uses
fixed_phy_del() to free resources.
Modify DSA to use this new API function, so we don't leak phys.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per RFC4898, they count segments sent/received
containing a positive length data segment (that includes
retransmission segments carrying data). Unlike
tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
carrying no data (e.g. pure ack).
The patch also updates the segs_in in tcp_fastopen_add_skb()
so that segs_in >= data_segs_in property is kept.
Together with retransmission data, tcpi_data_segs_out
gives a better signal on the rxmit rate.
v6: Rebase on the latest net-next
v5: Eric pointed out that checking skb->len is still needed in
tcp_fastopen_add_skb() because skb can carry a FIN without data.
Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
helper is used. Comment is added to the fastopen case to explain why
segs_in has to be reset and tcp_segs_in() has to be called before
__skb_pull().
v4: Add comment to the changes in tcp_fastopen_add_skb()
and also add remark on this case in the commit message.
v3: Add const modifier to the skb parameter in tcp_segs_in()
v2: Rework based on recent fix by Eric:
commit a9d99ce28e ("tcp: fix tcpi_segs_in after connection establishment")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit enables finding appropriate mbus window and obtaining its
target id and attribute for given physical address in two separate
routines, both for IO and DRAM windows. This functionality
is needed for Armada XP/38x Network Controller's Buffer Manager and
PnC configuration.
[gregory.clement@free-electrons.com: Fix size test for
mvebu_mbus_get_dram_win_info]
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
[DRAM window information reference in LKv3.10]
Signed-off-by: Evan Wang <xswang@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a very small one this time, with only 5 patches.
There are a couple of big items that could not be merged/finished
on time.
We have:
- 2 LLCP fixes for a race and a potential OOM.
- 2 cleanups for the pn544 and microread drivers.
- 1 Maintainer addition for the s3fwrn5 driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW4ygDAAoJEIqAPN1PVmxKQ5gQAIN5MaT6gcu2/p1kd7FzPKyC
KRb2EU907uVQObNLyZgU6liLYQ2prAvK7Dr9SuL10sl+zJkdU2TUuPfK+G4I50qQ
nY7dHN94S0d1F6vBZVKFslh9+mrJKZCqhyN7WrCdXT2i7W5bYO8lw419nADImAq9
fF4k7Ww+t9ZHGlzYun9WmNl8lGYz27w/sjjqnEbD8G+SZlkrqRNwccILP7rmOso2
orGvzxouGiWmp4XcXno2X3ydfFVTl5LdVeaX6oyKxEBpcikz0QFJUmTcqqy0GqGo
NeAGUip36MaJETXVgqKqeZERrST2Eminps3nTEwftd/ukOE+Sa8tX3spe2seN7mm
o9ZBk8a0VChlhfGmIqnYZfksRyacEmVaLFV2W5EQdumL+G+jelCmpPgQKo0ZxEEX
WTYULG5JiSIu34RjtTAP9tOPQ9XTV9hcArRYR355aki5EXGh/ntY1eVtV2nCSHjA
vNZ0hpJNwfcx/0UlVULEJqxwDYYtl/QzJFTQO4l0zFYgf5SK9DXbOGDcZfoKgFf8
7jzLn/hDzHNjCPocHDkXOq50H/pPHwdV65U4hPMnemcGVuVrNzHr5CyKBDgj7nXN
NYP3S6fphg8W/DIlQFaTRAuBCqanZE5DIqFUgptSSdAmBLKagZKkaypUjCscb7Ee
GinQKHm7YgZWG+YeAs4O
=W1gM
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.6 pull request
This is a very small one this time, with only 5 patches.
There are a couple of big items that could not be merged/finished
on time.
We have:
- 2 LLCP fixes for a race and a potential OOM.
- 2 cleanups for the pn544 and microread drivers.
- 1 Maintainer addition for the s3fwrn5 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e5a03bfd87 ("phy: Add an mdio_device structure") removed addr,
bus and dev member of the phy_device structure.
This patch remove the documentation about those members.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Major changes:
ath10k
* dt: add bindings for ipq4019 wifi block
* start adding support for qca4019 chip
ath9k
* add device ID for Toshiba WLM-20U2/GN-1080
* allow more than one interface on DFS channels
bcma
* move flash detection code to ChipCommon core driver
brcmfmac
* IPv6 Neighbor discovery offload
* driver settings that can be populated from different sources
* country code setting in firmware
* length checks to validate firmware events
* new way to determine device memory size needed for BCM4366
* various offloads during Wake on Wireless LAN (WoWLAN)
* full Management Frame Protection (MFP) support
iwlwifi
* add support for thermal device / cooling device
* improvements in scheduled scan without profiles
* new firmware support (-21.ucode)
* add MSIX support for 9000 devices
* enable MU-MIMO and take care of firmware restart
* add support for large SKBs in mvm to reach A-MSDU
* add support for filtering frames from a BA session
* start implementing the new Rx path for 9000 devices
* enable the new Radio Resource Management (RRM) nl80211 feature flag
* add a new module paramater to disable VHT
* build infrastructure for Dynamic Queue Allocation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJW4HwrAAoJEG4XJFUm622bvFUH/ArZD53Jh8btu8ukmkgKOPkc
hCnvR639TURCNkC/e1lR0MFjO1QLLZ2m1tdRoZQfLiZm63HUuQzPDmaVnTeVfjrI
4p3LmYriTECvgLoqVJgmBjNWiC61fMbWTJ91YqQiw2ZhvuKbcsu6oz/jU9MyCLyJ
7WSk+HUqAnwtj7z515vAYQYapdUbxU1u7m/NgYdiYKTXfBR2ozUbfDR18Ey2EBWC
KkDpFXyxo7ZByXzVA1B1UogB9NteV7IV1+WHphIX/XGdVQPpwRV3KxmLqKjIWW5E
1Cv3q05vWapev9V5ZYghLpAGUQTu8h0nH2v0bJa9nSiQX23/Vkz7xxA/hi5gBOo=
=aqji
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2016-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers patches for 4.6
Major changes:
ath10k
* dt: add bindings for ipq4019 wifi block
* start adding support for qca4019 chip
ath9k
* add device ID for Toshiba WLM-20U2/GN-1080
* allow more than one interface on DFS channels
bcma
* move flash detection code to ChipCommon core driver
brcmfmac
* IPv6 Neighbor discovery offload
* driver settings that can be populated from different sources
* country code setting in firmware
* length checks to validate firmware events
* new way to determine device memory size needed for BCM4366
* various offloads during Wake on Wireless LAN (WoWLAN)
* full Management Frame Protection (MFP) support
iwlwifi
* add support for thermal device / cooling device
* improvements in scheduled scan without profiles
* new firmware support (-21.ucode)
* add MSIX support for 9000 devices
* enable MU-MIMO and take care of firmware restart
* add support for large SKBs in mvm to reach A-MSDU
* add support for filtering frames from a BA session
* start implementing the new Rx path for 9000 devices
* enable the new Radio Resource Management (RRM) nl80211 feature flag
* add a new module paramater to disable VHT
* build infrastructure for Dynamic Queue Allocation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a regression in the bridge ageing time caused by:
commit c62987bbd8 ("bridge: push bridge setting ageing_time down to switchdev")
There are users of Linux bridge which use the feature that if ageing time
is set to 0 it causes entries to never expire. See:
https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
For a pure software bridge, it is unnecessary for the code to have
arbitrary restrictions on what values are allowable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is based on a patch made by John Fastabend.
It adds support for offloading cls_flower.
when NETIF_F_HW_TC is on:
flags = 0 => Rule will be processed twice - by hardware, and if
still relevant, by software.
flags = SKIP_HW => Rull will be processed by software only
If hardware fail/not capabale to apply the rule, operation will NOT
fail. Filter will be processed by SW only.
Acked-by: Jiri Pirko <jiri@mellanox.com>
Suggested-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netdevice parameter hard_header_len is variously interpreted both as
an upper and lower bound on link layer header length. The field is
used as upper bound when reserving room at allocation, as lower bound
when validating user input in PF_PACKET.
Clarify the definition to be maximum header length. For validation
of untrusted headers, add an optional validate member to header_ops.
Allow bypassing of validation by passing CAP_SYS_RAWIO, for instance
for deliberate testing of corrupt input. In this case, pad trailing
bytes, as some device drivers expect completely initialized headers.
See also http://comments.gmane.org/gmane.linux.network/401064
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally I only wanted to drop the unneeded inclusion of
<linux/i2c.h>, but then noticed that struct
microread_nfc_platform_data isn't actually used, and
MICROREAD_DRIVER_NAME is redefined in the only file where it is used,
so we can get rid of the header file and dead code altogether.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This module implements the Kernel Connection Multiplexor.
Kernel Connection Multiplexor (KCM) is a facility that provides a
message based interface over TCP for generic application protocols.
With KCM an application can efficiently send and receive application
protocol messages over TCP using datagram sockets.
For more information see the included Documentation/networking/kcm.txt
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to
indicate that more messages will follow (i.e. a batch of messages is
being sent). This is similar to MSG_MORE except that the following
messages are not merged into one packet, they are sent individually.
sendmmsg is updated so that each contained message except for the
last one is marked as MSG_BATCH.
MSG_BATCH is a performance optimization in cases where a socket
implementation can benefit by transmitting packets in a batch.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export it for cases where we want to create sockets by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a convenience function that returns the next entry in an RCU
list or NULL if at the end of the list.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was observed that calling bpf_get_stackid() from a kprobe inside
slub or from spin_unlock causes similar deadlock as with hashmap,
therefore convert stackmap to use pre-allocated memory.
The call_rcu is no longer feasible mechanism, since delayed freeing
causes bpf_get_stackid() to fail unpredictably when number of actual
stacks is significantly less than user requested max_entries.
Since elements are no longer freed into slub, we can push elements into
freelist immediately and let them be recycled.
However the very unlikley race between user space map_lookup() and
program-side recycling is possible:
cpu0 cpu1
---- ----
user does lookup(stackidX)
starts copying ips into buffer
delete(stackidX)
calls bpf_get_stackid()
which recyles the element and
overwrites with new stack trace
To avoid user space seeing a partial stack trace consisting of two
merged stack traces, do bucket = xchg(, NULL); copy; xchg(,bucket);
to preserve consistent stack trace delivery to user space.
Now we can move memset(,0) of left-over element value from critical
path of bpf_get_stackid() into slow-path of user space lookup.
Also disallow lookup() from bpf program, since it's useless and
program shouldn't be messing with collected stack trace.
Note that similar race between user space lookup and kernel side updates
is also present in hashmap, but it's not a new race. bpf programs were
always allowed to modify hash and array map elements while user space
is copying them.
Fixes: d5a3b1f691 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If kprobe is placed on spin_unlock then calling kmalloc/kfree from
bpf programs is not safe, since the following dead lock is possible:
kfree->spin_lock(kmem_cache_node->lock)...spin_unlock->kprobe->
bpf_prog->map_update->kmalloc->spin_lock(of the same kmem_cache_node->lock)
and deadlocks.
The following solutions were considered and some implemented, but
eventually discarded
- kmem_cache_create for every map
- add recursion check to slow-path of slub
- use reserved memory in bpf_map_update for in_irq or in preempt_disabled
- kmalloc via irq_work
At the end pre-allocation of all map elements turned out to be the simplest
solution and since the user is charged upfront for all the memory, such
pre-allocation doesn't affect the user space visible behavior.
Since it's impossible to tell whether kprobe is triggered in a safe
location from kmalloc point of view, use pre-allocation by default
and introduce new BPF_F_NO_PREALLOC flag.
While testing of per-cpu hash maps it was discovered
that alloc_percpu(GFP_ATOMIC) has odd corner cases and often
fails to allocate memory even when 90% of it is free.
The pre-allocation of per-cpu hash elements solves this problem as well.
Turned out that bpf_map_update() quickly followed by
bpf_map_lookup()+bpf_map_delete() is very common pattern used
in many of iovisor/bcc/tools, so there is additional benefit of
pre-allocation, since such use cases are must faster.
Since all hash map elements are now pre-allocated we can remove
atomic increment of htab->count and save few more cycles.
Also add bpf_map_precharge_memlock() to check rlimit_memlock early to avoid
large malloc/free done by users who don't have sufficient limits.
Pre-allocation is done with vmalloc and alloc/free is done
via percpu_freelist. Here are performance numbers for different
pre-allocation algorithms that were implemented, but discarded
in favor of percpu_freelist:
1 cpu:
pcpu_ida 2.1M
pcpu_ida nolock 2.3M
bt 2.4M
kmalloc 1.8M
hlist+spinlock 2.3M
pcpu_freelist 2.6M
4 cpu:
pcpu_ida 1.5M
pcpu_ida nolock 1.8M
bt w/smp_align 1.7M
bt no/smp_align 1.1M
kmalloc 0.7M
hlist+spinlock 0.2M
pcpu_freelist 2.0M
8 cpu:
pcpu_ida 0.7M
bt w/smp_align 0.8M
kmalloc 0.4M
pcpu_freelist 1.5M
32 cpu:
kmalloc 0.13M
pcpu_freelist 0.49M
pcpu_ida nolock is a modified percpu_ida algorithm without
percpu_ida_cpu locks and without cross-cpu tag stealing.
It's faster than existing percpu_ida, but not as fast as pcpu_freelist.
bt is a variant of block/blk-mq-tag.c simlified and customized
for bpf use case. bt w/smp_align is using cache line for every 'long'
(similar to blk-mq-tag). bt no/smp_align allocates 'long'
bitmasks continuously to save memory. It's comparable to percpu_ida
and in some cases faster, but slower than percpu_freelist
hlist+spinlock is the simplest free list with single spinlock.
As expeceted it has very bad scaling in SMP.
kmalloc is existing implementation which is still available via
BPF_F_NO_PREALLOC flag. It's significantly slower in single cpu and
in 8 cpu setup it's 3 times slower than pre-allocation with pcpu_freelist,
but saves memory, so in cases where map->max_entries can be large
and number of map update/delete per second is low, it may make
sense to use it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
if kprobe is placed within update or delete hash map helpers
that hold bucket spin lock and triggered bpf program is trying to
grab the spinlock for the same bucket on the same cpu, it will
deadlock.
Fix it by extending existing recursion prevention mechanism.
Note, map_lookup and other tracing helpers don't have this problem,
since they don't hold any locks and don't modify global data.
bpf_trace_printk has its own recursive check and ok as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next tree,
they are:
1) Remove useless debug message when deleting IPVS service, from
Yannick Brosseau.
2) Get rid of compilation warning when CONFIG_PROC_FS is unset in
several spots of the IPVS code, from Arnd Bergmann.
3) Add prandom_u32 support to nft_meta, from Florian Westphal.
4) Remove unused variable in xt_osf, from Sudip Mukherjee.
5) Don't calculate IP checksum twice from netfilter ipv4 defrag hook
since fixing af_packet defragmentation issues, from Joe Stringer.
6) On-demand hook registration for iptables from netns. Instead of
registering the hooks for every available netns whenever we need
one of the support tables, we register this on the specific netns
that needs it, patchset from Florian Westphal.
7) Add missing port range selection to nf_tables masquerading support.
BTW, just for the record, there is a typo in the description of
5f6c253ebe ("netfilter: bridge: register hooks only when bridge
interface is added") that refers to the cluster match as deprecated, but
it is actually the CLUSTERIP target (which registers hooks
inconditionally) the one that is scheduled for removal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix ordering of WEXT netlink messages so we don't see a newlink
after a dellink, from Johannes Berg.
2) Out of bounds access in minstrel_ht_set_best_prob_rage, from
Konstantin Khlebnikov.
3) Paging buffer memory leak in iwlwifi, from Matti Gottlieb.
4) Wrong units used to set initial TCP rto from cached metrics, also
from Konstantin Khlebnikov.
5) Fix stale IP options data in the SKB control block from leaking
through layers of encapsulation, from Bernie Harris.
6) Zero padding len miscalculated in bnxt_en, from Michael Chan.
7) Only CHECKSUM_PARTIAL packets should be passed down through GSO, fix
from Hannes Frederic Sowa.
8) Fix suspend/resume with JME networking devices, from Diego Violat
and Guo-Fu Tseng.
9) Checksums not validated properly in bridge multicast support due to
the placement of the SKB header pointers at the time of the check,
fix from Álvaro Fernández Rojas.
10) Fix hang/tiemout with r8169 if a stats fetch is done while the
device is runtime suspended. From Chun-Hao Lin.
11) The forwarding database netlink dump facilities don't track the
state of the dump properly, resulting in skipped/missed entries.
From Minoura Makoto.
12) Fix regression from a recent 3c59x bug fix, from Neil Horman.
13) Fix list corruption in bna driver, from Ivan Vecera.
14) Big endian machines crash on vlan add in bnx2x, fix from Michal
Schmidt.
15) Ethtool RSS configuration not propagated properly in mlx5 driver,
from Tariq Toukan.
16) Fix regression in PHY probing in stmmac driver, from Gabriel
Fernandez.
17) Fix SKB tailroom calculation in igmp/mld code, from Benjamin
Poirier.
18) A past change to skip empty routing headers in ipv6 extention header
parsing accidently caused fragment headers to not be matched any
longer. Fix from Florian Westphal.
19) eTSEC-106 erratum needs to be applied to more gianfar chips, from
Atsushi Nemoto.
20) Fix netdev reference after free via workqueues in usb networking
drivers, from Oliver Neukum and Bjørn Mork.
21) mdio->irq is now an array rather than a pointer to dynamic memory,
but several drivers were still trying to free it :-/ Fixes from
Colin Ian King.
22) act_ipt iptables action forgets to set the family field, thus LOG
netfilter targets don't work with it. Fix from Phil Sutter.
23) SKB leak in ibmveth when skb_linearize() fails, from Thomas Falcon.
24) pskb_may_pull() cannot be called with interrupts disabled, fix code
that tries to do this in vmxnet3 driver, from Neil Horman.
25) be2net driver leaks iomap'd memory on removal, fix from Douglas
Miller.
26) Forgotton RTNL mutex unlock in ppp_create_interface() error paths,
from Guillaume Nault.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (97 commits)
ppp: release rtnl mutex when interface creation fails
cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
tcp: fix tcpi_segs_in after connection establishment
net: hns: fix the bug about loopback
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
udp6: fix UDP/IPv6 encap resubmit path
be2net: Don't leak iomapped memory on removal.
vmxnet3: avoid calling pskb_may_pull with interrupts disabled
net: ethernet: Add missing MFD_SYSCON dependency on HAS_IOMEM
ibmveth: check return of skb_linearize in ibmveth_start_xmit
cdc_ncm: toggle altsetting to force reset before setup
usbnet: cleanup after bind() in probe()
mlxsw: pci: Correctly determine if descriptor queue is full
mlxsw: spectrum: Always decrement bridge's ref count
tipc: fix nullptr crash during subscription cancel
net: eth: altera: do not free array priv->mdio->irq
net/ethoc: do not free array priv->mdio->irq
net: sched: fix act_ipt for LOG target
asix: do not free array priv->mdio->irq
...
This patch adds mainly structures and APIs prototype changes
in order to give support for qede slowpath/fastpath support
for the same.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This follows the way of handling other flashes and cleans code a bit. As
next task we will want to move flash code to ChipCommon driver as:
1) Flash controllers are accesible using ChipCommon registers
2) This code isn't MIPS specific
This change prepares bcma for that.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Most of info stored in this struct wasn't really used anywhere as we put
all that data in platform data & resource as well.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Platform data is only available for sdio. With this patch a new
platform data structure is being used which allows for platform
data for any device and configurable per device. This patch only
switches to the new structure and adds support for SDIO devices.
Reviewed-by: Arend Van Spriel <arend@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Pull ceph fix from Sage Weil:
"This is a final commit we missed to align the protocol compatibility
with the feature bits.
It decodes a few extra fields in two different messages and reports
EIO when they are used (not yet supported)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support
Pull libata fixes from Tejun Heo:
"Assorted fixes for libata drivers.
- Turns out HDIO_GET_32BIT ioctl was subtly broken all along.
- Recent update to ahci external port handling was incorrectly
marking hotpluggable ports as external making userland handle
devices connected to those ports incorrectly.
- ahci_xgene needs its own irq handler to work around a hardware
erratum. libahci updated to allow irq handler override.
- Misc driver specific updates"
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci: don't mark HotPlugCapable Ports as external/removable
ahci: Workaround for ThunderX Errata#22536
libata: Align ata_device's id on a cacheline
Adding Intel Lewisburg device IDs for SATA
pata-rb532-cf: get rid of the irq_to_gpio() call
libata: fix HDIO_GET_32BIT ioctl
ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.
ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci.
libahci: Implement the capability to override the generic ahci interrupt handler.
Pull block fixes from Jens Axboe:
"Round 2 of this. I cut back to the bare necessities, the patch is
still larger than it usually would be at this time, due to the number
of NVMe fixes in there. This pull request contains:
- The 4 core fixes from Ming, that fix both problems with exceeding
the virtual boundary limit in case of merging, and the gap checking
for cloned bio's.
- NVMe fixes from Keith and Christoph:
- Regression on larger user commands, causing problems with
reading log pages (for instance). This touches both NVMe,
and the block core since that is now generally utilized also
for these types of commands.
- Hot removal fixes.
- User exploitable issue with passthrough IO commands, if !length
is given, causing us to fault on writing to the zero
page.
- Fix for a hang under error conditions
- And finally, the current series regression for umount with cgroup
writeback, where the final flush would happen async and hence open
up window after umount where the device wasn't consistent. fsck
right after umount would show this. From Tejun"
* 'for-linus2' of git://git.kernel.dk/linux-block:
block: support large requests in blk_rq_map_user_iov
block: fix blk_rq_get_max_sectors for driver private requests
nvme: fix max_segments integer truncation
nvme: set queue limits for the admin queue
writeback: flush inode cgroup wb switches instead of pinning super_block
NVMe: Fix 0-length integrity payload
NVMe: Don't allow unsupported flags
NVMe: Move error handling to failed reset handler
NVMe: Simplify device reset failure
NVMe: Fix namespace removal deadlock
NVMe: Use IDA for namespace disk naming
NVMe: Don't unmap controller registers on reset
block: merge: get the 1st and last bvec via helpers
block: get the 1st and last bvec via helpers
block: check virt boundary in bio_will_gap()
block: bio: introduce helpers to get the 1st and last bvec
a tasks "comm" field. But this prevented filtering on a comm field that
is within a trace event (like sched_migrate_task).
When trying to filter on when a program migrated, this change prevented
the filtering of the sched_migrate_task.
To fix this, the event fields are examined first, and then the extra fields
like "comm" and "cpu" are examined. Also, instead of testing to assign
the comm filter function based on the field's name, the generic comm field
is given a new filter type (FILTER_COMM). When this field is used to filter
the type is checked. The same is done for the cpu filter field.
Two new special filter types are added: "COMM" and "CPU". This allows users
to still filter the tasks comm for events that have "comm" as one of their
fields, in cases that users would like to filter sched_migrate_task on the
comm of the task that called the event, and not the comm of the task that
is being migrated.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJW2argAAoJEKKk/i67LK/8b78H/32nYPizDIsK/p2bL1mgbtMl
vrkcfb+maPOC7cjB+CdQmyV4EIVpSn06XFouYghGprdoVocVyBuIflxn0j3Gbymy
zLCg8lR70KTATTqst1wsWMbnh+UvAKNEiXj8jf2qcK2xhgalXMDwsTC4+LDlLugu
YAx89lmsjK1YpP/wIzMww2jQG+07Nhm9gHWXF2MC3egZ+sgYxARnfds0yTcGgS8o
dc/yJGZDCI44JMDNThcCFxNvsmoTa9tpm+JNe2YTht6KCympa+Ht9Jj9MMlD06cq
M5CqMQlok+mrVsW5LbJPCk1u83ynr6d/PcPQuT2nykRx8bGvKjA7AKMPaxw1Jz4=
=ixBz
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"A feature was added in 4.3 that allowed users to filter trace points
on a tasks "comm" field. But this prevented filtering on a comm field
that is within a trace event (like sched_migrate_task).
When trying to filter on when a program migrated, this change
prevented the filtering of the sched_migrate_task.
To fix this, the event fields are examined first, and then the extra
fields like "comm" and "cpu" are examined. Also, instead of testing
to assign the comm filter function based on the field's name, the
generic comm field is given a new filter type (FILTER_COMM). When
this field is used to filter the type is checked. The same is done
for the cpu filter field.
Two new special filter types are added: "COMM" and "CPU". This allows
users to still filter the tasks comm for events that have "comm" as
one of their fields, in cases that users would like to filter
sched_migrate_task on the comm of the task that called the event, and
not the comm of the task that is being migrated"
* tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not have 'comm' filter override event 'comm' field
DIV_ROUND_UP is defined in linux/kernel.h only for the kernel.
When ethtool.h is included by a userland app, we got the following error:
include/linux/ethtool.h:1218:8: error: variably modified 'queue_mask' at file scope
__u32 queue_mask[DIV_ROUND_UP(MAX_NUM_QUEUE, 32)];
^
Let's add a common definition in uapi and use it everywhere.
Fixes: ac2c7ad0e5 ("net/ethtool: introduce a new ioctl for per queue setting")
CC: Kan Liang <kan.liang@intel.com>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the format change of MClientReply/MclientCaps.
Also add code that denies access to inodes with pool_ns layouts.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Commit 9f61668073 "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current tasks struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, sched_migrate_task trace event.
That has a 'comm' field of the task to be migrated.
echo 'comm == "bash"' > events/sched_migrate_task/filter
will now filter all sched_migrate_task events for tasks named "bash" that
migrates other tasks (in interrupt context), instead of seeing when "bash"
itself gets migrated.
This fix requires a couple of changes.
1) Change the look up order for filter predicates to look at the events
fields before looking at the generic filters.
2) Instead of basing the filter function off of the "comm" name, have the
generic "comm" filter have its own filter_type (FILTER_COMM). Test
against the type instead of the name to assign the filter function.
3) Add a new "COMM" filter that works just like "comm" but will filter based
on the current task, even if the trace event contains a "comm" field.
Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
Cc: stable@vger.kernel.org # v4.3+
Fixes: 9f61668073 "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Driver private request types should not get the artifical cap for the
FS requests. This is important to use the full device capabilities
for internal command or NVMe pass through commands.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jeff Lien <Jeff.Lien@hgst.com>
Tested-by: Jeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Updated by me to use an explicit check for the one command type that
does support extended checking, instead of relying on the ordering
of the enum command values - as suggested by Keith.
Signed-off-by: Jens Axboe <axboe@fb.com>
If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching. Before 5ff8eaac16 ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching. 5ff8eaac16 fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronosity - e.g. fsck immediately following umount may
fail because the device is still busy.
This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches. wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.
v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
check in inode_switch_wbs() as Jan an Al suggested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: 5ff8eaac16 ("writeback: keep superblock pinned during cgroup writeback association switches")
Cc: stable@vger.kernel.org #v4.5
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch applies the two introduced helpers to
figure out the 1st and last bvec, and fixes the
original way after bio splitting.
Cc: stable@vger.kernel.org
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In the following patch, the way for figuring out
the last bvec will be changed with a bit cost introduced,
so return immediately if the queue doesn't have virt
boundary limit. Actually most of devices have not
this limit.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The bio passed to bio_will_gap() may be fast cloned from upper
layer(dm, md, bcache, fs, ...), or from bio splitting in block
core.
Unfortunately bio_will_gap() just figures out the last bvec via
'bi_io_vec[prev->bi_vcnt - 1]' directly, and this way is obviously
wrong.
This patch introduces two helpers for getting the first and last
bvec of one bio for fixing the issue.
Cc: stable@vger.kernel.org
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The current reserved_tailroom calculation fails to take hlen and tlen into
account.
skb:
[__hlen__|__data____________|__tlen___|__extra__]
^ ^
head skb_end_offset
In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:
[__hlen__|__data____________|__extra__|__tlen___]
^ ^
head skb_end_offset
The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.
The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
reserved_tailroom = skb_tailroom - mtu
else:
reserved_tailroom = tlen
Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.
Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch manages the case when you have an Ethernet MAC with
a "fixed link", and not connected to a normal MDIO-managed PHY device.
The test of phy_bus_name was not helpful because it was never affected
and replaced by the mdio test node.
Signed-off-by: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should modify TIRs explicitly to apply the new RSS configuration.
The light ndo close/open calls do not "refresh" them.
Fixes: 2d75b2bc8a ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch restructures the DMA bus settings and this is done
by introducing a new platform structure used for programming
the AXI Bus Mode Register inside the DMA module.
This structure can be populated from device-tree as documented in the
binding txt file.
After initializing the DMA, the AXI register can be optionally tuned
for platform drivers based.
This patch also reworks some parameters to make coherent the DMA
configuration now that AXI register is introduced.
For example, the burst_len is managed by using the mentioned axi
support above; so the snps,burst-len parameter has been removed.
It makes sense to provide the AAL parameter from DT to Address-Aligned
Beats inside the Register0 and review the PBL settings when initialize
the engine.
For PCI glue, rebuilding the story of this setting, it
was added to align a configuration so not for fixing some
known problem. No issue raised after this patch.
It is safe to use the default burst length instead of
tuning it to the maximum value
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the previous patches in place, a netns nf_hook_list might be empty,
even if e.g. init_net performs filtering.
Thus change nf_hook_thresh to check the hook_list as well before
initializing hook_state and calling nf_hook_slow().
We still make use of static keys; if no netfilter modules are loaded
list is guaranteed to be empty.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
delay hook registration until the table is being requested inside a
namespace.
Historically, a particular table (iptables mangle, ip6tables filter, etc)
was registered on module load.
When netns support was added to iptables only the ip/ip6tables ruleset was
made namespace aware, not the actual hook points.
This means f.e. that when ipt_filter table/module is loaded on a system,
then each namespace on that system has an (empty) iptables filter ruleset.
In other words, if a namespace sends a packet, such skb is 'caught' by
netfilter machinery and fed to hooking points for that table (i.e. INPUT,
FORWARD, etc).
Thanks to Eric Biederman, hooks are no longer global, but per namespace.
This means that we can avoid allocation of empty ruleset in a namespace and
defer hook registration until we need the functionality.
We register a tables hook entry points ONLY in the initial namespace.
When an iptables get/setockopt is issued inside a given namespace, we check
if the table is found in the per-namespace list.
If not, we attempt to find it in the initial namespace, and, if found,
create an empty default table in the requesting namespace and register the
needed hooks.
Hook points are destroyed only once namespace is deleted, there is no
'usage count' (it makes no sense since there is no 'remove table' operation
in xtables api).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This change prepares for upcoming on-demand xtables hook registration.
We change the protoypes of the register/unregister functions.
A followup patch will then add nf_hook_register/unregister calls
to the iptables one.
Once a hook is registered packets will be picked up, so all assignments
of the form
net->ipv4.iptable_$table = new_table
have to be moved to ip(6)t_register_table, else we can see NULL
net->ipv4.iptable_$table later.
This patch doesn't change functionality; without this the actual change
simply gets too big.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull d_inode/d_flags race fix from Al Viro.
I love this fix. Not only does it fix the race in the dentry type
handling, it entirely gets rid of the nasty and subtle memory ordering
rules for d_type and d_inode, and replaces them with the basic dentry
locking rules (sequence numbers under RCU, d_lock elsewhere).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
use ->d_seq to get coherency between ->d_inode and ->d_flags
After commit 52bd2d62ce ("net: better skb->sender_cpu and skb->napi_id cohabitation")
skb_sender_cpu_clear() becomes empty and can be removed.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid double mapping of io mapped memory, Device page may be
mapped to non-cached(NC) or to write-combining(WC).
The code before this fix tries to map it both to WC and NC
contrary to what stated in Intel's software developer manual.
Here we remove the global WC mapping of all UARS
"dev->priv.bf_mapping", since UAR mapping should be decided
per UAR (e.g we want different mappings for EQs, CQs vs QPs).
Caller will now have to choose whether to map via
write-combining API or not.
mlx5e SQs will choose write-combining in order to perform
BlueFlame writes.
Fixes: 88a85f99e5 ('TX latency optimization to save DMA reads')
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Reviewed-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The command timeout is terribly long, whole two hours. Make it 60s so if
things do go wrong, the user gets feedback in relatively short time, so
they can take corrective actions and/or investigate using tools and such.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>