When starting a new BA session, we must pass the win_size to the FW.
To do this we take max_rx_aggregation_subframes (BA RX win size),
which is stored in the ieee80211_sta structure (i.e. per link and not per HW).
We will use the value stored per link when passing the win_size to
firmware through the ACX_BA_SESSION_RX_SETUP command.
Signed-off-by: Maxim Altshul <maxim.altshul@ti.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The call to krealloc() in wsm_buf_reserve() directly assigns the newly
returned memory to buf->begin. This is fine except when krealloc()
fails: we then lose the ability to free the old memory pointed to by
buf->begin. Assigning the result to a temporary variable first, and only
updating buf->begin on success, fixes the memory leak.
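A minimal userspace sketch of the pattern, using realloc() in place of
krealloc() (both return NULL on failure and leave the original block
intact). struct buf and buf_reserve() are hypothetical stand-ins for the
driver's wsm_buf and wsm_buf_reserve().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's struct wsm_buf. */
struct buf {
	unsigned char *begin;
	size_t size;
};

/* Grow b to at least new_size bytes. Returns 0 on success, -1 on failure.
 * On failure b->begin still points at the old, valid allocation, so the
 * caller can keep using it or free it: nothing is leaked. */
static int buf_reserve(struct buf *b, size_t new_size)
{
	unsigned char *tmp;

	if (new_size <= b->size)
		return 0;

	tmp = realloc(b->begin, new_size);	/* never assign to b->begin directly */
	if (!tmp)
		return -1;			/* old block untouched */

	memset(tmp + b->size, 0, new_size - b->size);	/* zero the new tail */
	b->begin = tmp;
	b->size = new_size;
	return 0;
}
```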
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Add the missing destroy_workqueue() before return from
mwifiex_add_virtual_intf() in the error handling case.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Marvell p2p device disappears from the list of p2p peers on the other
p2p device after disconnection.
It happens due to a bug in the driver. When the interface is changed from
p2p to station, certain variables (bss_type, bss_role etc.) aren't correctly
updated. This patch corrects them to fix the issue.
Signed-off-by: Karthik D A <karthida@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
It is observed that if a single tid 6 packet arrives along with massive
tid 0 traffic, the tid 6 packet may stay in its queue and never be
transmitted. This is because wmm.highest_queued_prio is set to 2
during transmission of tid 0 packets. As a result, the main work thread
keeps looping without serving that packet. In this case, if a command
has been downloaded to firmware, the driver doesn't process its response,
causing a command timeout.
This patch resets highest_queued_prio if packets exist in the data
queue, and tries to find a ra_list for the current priv.
Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
wmm.tx_pkts_queued and the ra_list's total_pkt_count must be updated
together. They were not correctly updated in
mwifiex_send_processed_packet().
Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Support for this debugfs command is available in the driver. This patch
adds usage information to the README file.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
While copying the vendor_ie obtained from cfg80211_find_vendor_ie()
to struct mwifiex_types_wmm_info, the length used was incorrect.
This patch corrects the length copied into struct
mwifiex_types_wmm_info.
Signed-off-by: Karthik D A <karthida@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
At a couple of places in the cleanup path, we just walk the skb queue
and free the skbs without unlinking them. This leads to a crash when
another thread calls skb_dequeue() and uses an already freed node.
The problem is fixed by unlinking each skb before freeing it.
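A single-threaded sketch of "unlink, then free", using a hypothetical
minimal queue in place of struct sk_buff_head. The real skb helpers also
take the queue lock, which is omitted here.

```c
#include <stdlib.h>

/* Hypothetical node and queue standing in for sk_buff / sk_buff_head. */
struct node {
	struct node *next;
	int payload;
};

struct queue {
	struct node *head;
	unsigned int len;
};

static void enqueue(struct queue *q, struct node *n)
{
	struct node **pp = &q->head;

	while (*pp)			/* walk to the tail */
		pp = &(*pp)->next;
	n->next = NULL;
	*pp = n;
	q->len++;
}

/* Unlink and return the first node, or NULL if empty (like skb_dequeue()). */
static struct node *dequeue(struct queue *q)
{
	struct node *n = q->head;

	if (n) {
		q->head = n->next;
		n->next = NULL;
		q->len--;
	}
	return n;
}

/* Correct purge: every node leaves the queue before it is freed, so the
 * queue never references freed memory and later dequeues see it empty. */
static void purge(struct queue *q)
{
	struct node *n;

	while ((n = dequeue(q)) != NULL)
		free(n);
}
```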
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
When host_sleep_config command fails, we should return an error to
PCIe, instead of continuing (and possibly panicking, when we try to keep
processing a timed-out ioctl after we return "successfully" from
suspend).
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The following is the mwifiex driver-firmware host sleep handshake.
It involves three threads: the suspend handler, the interrupt handler, and
interrupt processing in the main work queue.
1) Enter suspend handler
2) Download HS_CFG command
3) Response from firmware for HS_CFG
4) Suspend thread waits until the handshake completes (i.e. hs_activate
becomes true)
5) SLEEP from firmware
6) SLEEP confirm downloaded to firmware
7) SLEEP confirm response from firmware
8) Driver processes the SLEEP confirm response and sets hs_activate to wake
up the suspend thread
9) Exit suspend handler
10) Read the sleep cookie in a loop and wait until it indicates the firmware
is asleep.
11) After processing the SLEEP confirm response, we are at the end of the
interrupt processing routine. Recheck whether interrupts were received
while we were processing them.
During a suspend-resume stress test, it has been observed that we may end
up accessing PCIe hardware (in 10 and 11) when the PCIe bus is closed,
which leads to a kernel crash.
This patch solves the problem with the changes below.
a) Action 10 above can be done before 8.
b) Skip 11 if hs_activated is true. The SLEEP confirm response
is the last interrupt from firmware, so there is no need to recheck for
pending interrupts.
c) Add flush_workqueue() in the suspend handler.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
So far our core code was calling brcmf_fws_process_skb, which wasn't
a proper thing to do. For devices using the msgbuf protocol, fwsignal
shouldn't be used. It was an unnecessary extra layer simply calling
a protocol-specific txdata function.
Please note we already have a txdata callback, but it's used for calls
between bcdc and fwsignal, so it couldn't simply be reused here.
This makes core code more generic (instead of bcdc/fwsignal specific).
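A small sketch of the resulting design, with hypothetical names
(struct proto, tx_queue_data, core_xmit): the core only calls one
protocol callback, and each protocol decides whether fwsignal is
involved.

```c
/* Hypothetical packet type standing in for struct sk_buff. */
struct pkt {
	int len;
};

/* Hypothetical per-protocol ops table with a TX-data callback. */
struct proto {
	int (*tx_queue_data)(struct proto *p, int ifidx, struct pkt *pkt);
	int fws_handled;	/* packets that went through fwsignal */
	int queued;		/* packets handed on for transmission */
};

/* bcdc-like protocol: flow control via fwsignal before queuing. */
static int bcdc_tx_queue_data(struct proto *p, int ifidx, struct pkt *pkt)
{
	(void)ifidx;
	(void)pkt;
	p->fws_handled++;	/* fwsignal bookkeeping would happen here */
	p->queued++;
	return 0;
}

/* msgbuf-like protocol: no fwsignal layer at all. */
static int msgbuf_tx_queue_data(struct proto *p, int ifidx, struct pkt *pkt)
{
	(void)ifidx;
	(void)pkt;
	p->queued++;
	return 0;
}

/* Protocol-agnostic core TX path: it only knows about the callback. */
static int core_xmit(struct proto *p, int ifidx, struct pkt *pkt)
{
	return p->tx_queue_data(p, ifidx, pkt);
}
```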
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
On some devices the EEPROMs of Ralink Wi-Fi chips have a default Ralink
MAC address set (RT3062F: 00:0C:43:30:62:00, RT3060F:
00:0C:43:30:60:00). Using several of these devices in the same network
can cause nasty issues.
Allow overriding the MAC in the EEPROM with a (known good) one set in
the device tree to bypass the issue.
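A userspace sketch of the selection logic. In the kernel the device-tree
address would typically come from a helper such as of_get_mac_address();
here it is just a pointer that is NULL when the property is absent.
select_mac() and valid_ether_addr() are hypothetical names, the latter
mirroring the checks of the kernel's is_valid_ether_addr().

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rules as is_valid_ether_addr(): not multicast, not all zeros. */
static bool valid_ether_addr(const unsigned char *a)
{
	static const unsigned char zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Prefer a valid device-tree MAC; otherwise keep the EEPROM value. */
static void select_mac(unsigned char *out, const unsigned char *eeprom_mac,
		       const unsigned char *dt_mac)
{
	if (dt_mac && valid_ether_addr(dt_mac))
		memcpy(out, dt_mac, ETH_ALEN);
	else
		memcpy(out, eeprom_mac, ETH_ALEN);
}
```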
Signed-off-by: Mathias Kresin <dev@kresin.me>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
ath.git patches for 4.10. Major changes:
ath10k
* allow setting coverage class for first generation cards
* read regulatory domain from ACPI
ath9k
* disable RNG by default
Merge tag 'iwlwifi-next-for-kalle-2016-10-25-2' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
* Finalize and enable dynamic queue allocation;
* Use dev_coredumpmsg() to prevent locking the driver;
* Small fix to pass the AID to the FW;
* Use FW PS decisions with multi-queue;
When port_type_set() is called and the new port type is the same
as the old one, just return success.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
firewire-net, like the older eth1394 driver, reduced the initial MTU to
less than 1500 octets if the local link layer controller's asynchronous
packet reception limit was lower.
This is bogus, since this reception limit does not have anything to do
with the transmission limit. Neither did this reduction affect the TX
path positively, nor could it prevent link fragmentation at the RX path.
Many FireWire CardBus cards have a max_rec of 9, causing an initial MTU
of 1024 - 16 = 1008. RFC 2734 and RFC 3146 allow a minimum max_rec = 8,
which would result in an initial MTU of 512 - 16 = 496. On such cards,
IPv6 could only be employed if the MTU was manually increased to 1280 or
more, i.e. IPv6 would not work without intervention from userland.
We now always initialize the MTU to 1500, which is the default according
to RFC 2734 and RFC 3146.
On a VIA VT6316 based CardBus card which was affected by this, changing
the MTU from 1008 to 1500 also increases TX bandwidth by 6 %.
RX remains unaffected.
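The arithmetic above as a sketch: max_rec encodes a maximum asynchronous
payload of 2^(max_rec + 1) bytes (9 gives 1024, 8 gives 512), and 16 is
the header overhead quoted in the text. The function names are
hypothetical.

```c
/* Old behaviour: MTU derived from the local reception limit, capped at 1500. */
static unsigned int old_initial_mtu(unsigned int max_rec)
{
	unsigned int limit = (1u << (max_rec + 1)) - 16;

	return limit < 1500 ? limit : 1500;
}

/* New behaviour: always start at the RFC 2734 / RFC 3146 default. */
static unsigned int new_initial_mtu(void)
{
	return 1500;
}
```

With max_rec = 8 the old code yields 496, below the 1280-byte minimum
that IPv6 requires.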
CC: netdev@vger.kernel.org
CC: linux1394-devel@lists.sourceforge.net
CC: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b3e3893e12 ("net: use core MTU range checking in misc drivers")
mistakenly introduced an upper limit for firewire-net's MTU based on the
local link layer controller's reception capability. Revert this. Neither
RFC 2734 nor our implementation impose any particular upper limit.
Actually, to be on the safe side and to make the code explicit, set
ETH_MAX_MTU = 65535 as upper limit now.
(I replaced sizeof(struct rfc2734_header) by the equivalent
RFC2374_FRAG_HDR_SIZE in order to avoid distracting long/int conversions.)
Fixes: b3e3893e1253 ("net: use core MTU range checking in misc drivers")
CC: netdev@vger.kernel.org
CC: linux1394-devel@lists.sourceforge.net
CC: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This node pointer is returned by of_get_child_by_name() with its refcount
incremented. Call of_node_put() on it before exiting this
function.
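A userspace sketch of the ownership rule, with a hypothetical struct
dt_node carrying a plain refcount: the lookup hands the caller a
reference, and every exit path after a successful lookup drops it.
get_child_by_name() and node_put() are illustrative analogues of
of_get_child_by_name() and of_node_put().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node. */
struct dt_node {
	const char *name;
	int refcount;
};

/* The match is returned with an extra reference that the caller now owns. */
static struct dt_node *get_child_by_name(struct dt_node *children, size_t n,
					 const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(children[i].name, name) == 0) {
			children[i].refcount++;
			return &children[i];
		}
	}
	return NULL;
}

/* Like of_node_put(), accepts NULL. */
static void node_put(struct dt_node *np)
{
	if (np)
		np->refcount--;
}

static int parse_child(struct dt_node *children, size_t n, const char *name,
		       int *out_len)
{
	struct dt_node *np = get_child_by_name(children, n, name);

	if (!np)
		return -1;
	*out_len = (int)strlen(np->name);	/* "use" the node */
	node_put(np);				/* balance the lookup */
	return 0;
}
```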
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_timer() instead of init_timer(), being the preferred/standard
way to set a timer up.
Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
active timer (if the timer is inactive it will be activated).
Use setup_timer and mod_timer to setup and arm a timer, to make the code
cleaner and easier to read.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -ENODEV from the "DMA is not supported" error
handling case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts disabled, and this code runs under
spin_lock_irqsave(), i.e. always with interrupts disabled. So kfree_skb()
should be replaced with dev_kfree_skb_irq().
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_timer function instead of initializing timer with the function
and data fields.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not necessary to free memory allocated with devm_kzalloc in the
remove path and using kfree leads to a double free.
Fixes: 84640e27f2 ("net: netcp: Add Keystone NetCP core ethernet driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The maximum MTU is defined by the slave devices of a batman-adv
interface. Thus it is not possible to calculate the max_mtu during the
creation of the batman-adv device when no slave devices are attached. Doing
so would, for example, break non-fragmentation setups, which would then
(incorrectly) allow an MTU of 1500 even when the underlying device cannot
transport 1500 bytes + batman-adv headers.
Checking a dynamically calculated max_mtu (the minimum of the slave
devices' MTUs) during .ndo_change_mtu is also what the bridge interface does.
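A sketch of the check in the non-fragmentation case. The header overhead
is a parameter because its actual size is not stated here, 68 is assumed
as the minimum MTU (ETH_MIN_MTU), and the function names are
hypothetical. This sketch simply rejects changes when no slave is
attached; the real driver's policy for that case is not described above.

```c
#include <errno.h>
#include <stddef.h>

/* Smallest MTU among the attached slaves (requires n >= 1). */
static int min_slave_mtu(const int *slave_mtus, size_t n)
{
	int min_mtu = slave_mtus[0];

	for (size_t i = 1; i < n; i++)
		if (slave_mtus[i] < min_mtu)
			min_mtu = slave_mtus[i];
	return min_mtu;
}

/* .ndo_change_mtu-style check: the limit is recomputed from the current
 * slaves on every change instead of being fixed at creation time. */
static int soft_change_mtu(int new_mtu, const int *slave_mtus, size_t n,
			   int hdr_overhead)
{
	if (n == 0)
		return -EINVAL;	/* sketch only: no slave, no known limit */
	if (new_mtu < 68 || new_mtu > min_slave_mtu(slave_mtus, n) - hdr_overhead)
		return -EINVAL;
	return 0;
}
```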
Cc: Jarod Wilson <jarod@redhat.com>
Fixes: b3e3893e12 ("net: use core MTU range checking in misc drivers")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xo Wang says:
====================
Broadcom BCM54612E support
This series is based on tip of torvalds/master.
The first patch adds register definitions from Broadcom docs.
The second patch adds the BCM54612E PHY ID, flags, and device-specific
RGMII internal delay initialization.
I tested on a custom board with an Aspeed AST2500 SOC with its second
MAC connected to this PHY.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This PHY has internal delays enabled after reset. This clears the
internal delay enables unless the interface specifically requests them.
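A sketch of "clear the reset-default delays, then set only what the
interface mode asks for". The enum mirrors the RGMII variants of
phy_interface_t; the bit positions are hypothetical, and on the real PHY
the RX and TX enables live in different shadow registers.

```c
#include <stdint.h>

enum rgmii_mode {
	MODE_RGMII,		/* no internal delays */
	MODE_RGMII_ID,		/* RX and TX delays */
	MODE_RGMII_RXID,	/* RX delay only */
	MODE_RGMII_TXID,	/* TX delay only */
};

/* Hypothetical bit positions, for illustration only. */
#define HYP_RX_DELAY_EN (1u << 8)
#define HYP_TX_DELAY_EN (1u << 9)

static uint16_t apply_rgmii_delays(uint16_t reg, enum rgmii_mode mode)
{
	/* Drop whatever delays were enabled at reset... */
	reg &= (uint16_t)~(HYP_RX_DELAY_EN | HYP_TX_DELAY_EN);
	/* ...and re-enable only the requested ones. */
	if (mode == MODE_RGMII_ID || mode == MODE_RGMII_RXID)
		reg |= HYP_RX_DELAY_EN;
	if (mode == MODE_RGMII_ID || mode == MODE_RGMII_TXID)
		reg |= HYP_TX_DELAY_EN;
	return reg;
}
```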
Signed-off-by: Xo Wang <xow@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the RXD-to-RXC skew (delay) time bit in the Miscellaneous Control
shadow register and a mask for the shadow selector field.
Remove a re-definition of MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL.
Signed-off-by: Xo Wang <xow@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I prepared commit d250a5f90e ("pkt_sched: gen_estimator: Dont
report fake rate estimators"), htb still had an implicit rate estimator
for all its classes.
Then later, I made this rate estimator optional in commit 64153ce0a7
("net_sched: htb: do not setup default rate estimators"), but I forgot
to update htb's use of gnet_stats_copy_rate_est().
After this patch, "tc -s qdisc ..." no longer reports fake rate
estimators for HTB classes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iwlmvm currently uses dev_coredumpm() to collect multiple
buffers, but this has the downside of pinning the module
until the coredump expires, if the data isn't read by any
userspace.
Avoid this by using the new dev_coredumpsg() method. We
still copy the data from the old way of generating it, but
neither hold on to vmalloc'ed data for a long time, nor do
we pin the module now.
Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
New firmwares support dynamic queue allocation (DQA), which enables
on-demand allocation of queues per RA/TID, instead of allocating them
statically per vif. This allows an AP to send, for instance, BE
traffic to STA2 even if it also needs to send traffic to a sleeping
STA1, without being blocked by the sleeping station.
The implementation in the driver is now ready, so we can enable this
feature by default when running firmwares that support it.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
[reworded the commit message]
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
In criu we actively use the diag interface to collect sockets
present in the system when dumping applications. While this works
as expected for unix, tcp, udp[lite], packet and netlink sockets,
raw sockets lack diag support. Thus add it.
v2:
- add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
- implement @destroy for diag requests (by dsa@)
v3:
- add export of raw_abort for IPv6 (by dsa@)
- pass net-admin flag into inet_sk_diag_fill due to
changes in net-next branch (by dsa@)
v4:
- use @pad in struct inet_diag_req_v2 for raw socket
protocol specification: the raw module carries sockets
which may have a custom protocol passed from the socket()
syscall, and @sdiag_protocol alone is not enough to
match the underlying ones
- start reporting the protocol specified in the socket() call
when sockets are raw ones, for the same reason: user
space tools like ss may parse this attribute and use
it for socket matching
v5 (by eric.dumazet@):
- use sock_hold in raw_sock_get instead of atomic_inc,
we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
when looking up so counter won't be zero here.
v6:
- use sdiag_raw_protocol() helper which will access @pad
structure used for raw sockets protocol specification:
we can't simply rename this member without breaking uapi
v7:
- since the sdiag_raw_protocol() helper is not suitable for
uapi, let's rather make an alias structure with proper
names. The __check_inet_diag_req_raw helper will catch
any unintentional change to either structure.
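A sketch of the alias-structure idea with compile-time layout checks. The
two structures are simplified copies (the trailing socket id is omitted),
and SAME_FIELD is a hypothetical macro in the spirit of
__check_inet_diag_req_raw: the build fails if either layout drifts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified generic request: 'pad' carries the raw protocol. */
struct req_v2 {
	uint8_t  sdiag_family;
	uint8_t  sdiag_protocol;
	uint8_t  idiag_ext;
	uint8_t  pad;
	uint32_t idiag_states;
};

/* Alias with the same bytes but a meaningful name for 'pad'. */
struct req_raw {
	uint8_t  sdiag_family;
	uint8_t  sdiag_protocol;
	uint8_t  idiag_ext;
	uint8_t  sdiag_raw_protocol;
	uint32_t idiag_states;
};

#define SAME_FIELD(a, b)						\
	_Static_assert(offsetof(struct req_v2, a) ==			\
		       offsetof(struct req_raw, b) &&			\
		       sizeof(((struct req_v2 *)0)->a) ==		\
		       sizeof(((struct req_raw *)0)->b),		\
		       #a " / " #b " mismatch")

SAME_FIELD(sdiag_family, sdiag_family);
SAME_FIELD(sdiag_protocol, sdiag_protocol);
SAME_FIELD(idiag_ext, idiag_ext);
SAME_FIELD(pad, sdiag_raw_protocol);
SAME_FIELD(idiag_states, idiag_states);
_Static_assert(sizeof(struct req_v2) == sizeof(struct req_raw), "size mismatch");

/* Read the raw protocol through the alias (memcpy avoids aliasing issues). */
static uint8_t raw_protocol(const struct req_v2 *r)
{
	struct req_raw alias;

	memcpy(&alias, r, sizeof(alias));
	return alias.sdiag_raw_protocol;
}
```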
CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field is initialized by ILA and MPLS but never used. Remove it.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_mutex can be locked for a long time. It may be because many
namespaces are being destroyed or many processes decide to create
a network namespace.
Both these operations are heavy, so it is better to be able to
kill a process which is waiting for net_mutex.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.
This is incorrect for zero VID vlan accel packets, making the following
match fail:
tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...
Apparently 'int_vlan_tag' was implemented before VLAN_TAG_PRESENT was
introduced in 05423b2 "vlan: allow null VLAN ID to be used"
(and at the time it was introduced, the 'vlan_tx_tag_get' call in em_meta
was not adapted).
Fix by testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's
value.
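A sketch of why the value test fails for VID 0. It is modelled on the
encoding used at the time, where VLAN_TAG_PRESENT was a flag bit (0x1000)
stored in vlan_tci next to the tag; names are hypothetical, and the
fallback to an in-packet tag that em_meta also performs is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_PRESENT 0x1000u	/* "an accel tag exists" flag bit */

struct pkt {
	uint16_t vlan_tci;
};

static bool tag_present(const struct pkt *p)
{
	return p->vlan_tci & TAG_PRESENT;
}

static uint16_t tag_get(const struct pkt *p)
{
	return p->vlan_tci & ~TAG_PRESENT;
}

/* Buggy: a tag value of 0 is mistaken for "no tag" (-1). */
static int int_vlan_tag_buggy(const struct pkt *p)
{
	uint16_t tag = tag_get(p);

	return tag ? tag : -1;
}

/* Fixed: consult the presence flag, not the value. */
static int int_vlan_tag_fixed(const struct pkt *p)
{
	return tag_present(p) ? tag_get(p) : -1;
}
```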
Fixes: 05423b2413 ("vlan: allow null VLAN ID to be used")
Fixes: 1a31f2042e ("netsched: Allow meta match on vlan tag on receive")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: Driver update
Mostly cosmetics and small resource values management rewrite.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the number of resources is going to get much bigger, ease their
addition by simply defining IDs. Convert the existing structure members
to a pair of arrays, one for validity and one for values. Introduce a set of
getters and setters for easy access.
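A sketch of the ID-indexed layout; the enum entries and helper names are
hypothetical, not the ones used by mlxsw. Adding a resource becomes a
single new enum entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource IDs. */
enum res_id {
	RES_KVD_SIZE,
	RES_MAX_LAG,
	RES_MAX_SPAN,
	RES_COUNT,	/* must stay last */
};

struct resources {
	bool valid[RES_COUNT];
	uint64_t values[RES_COUNT];
};

static bool res_valid(const struct resources *r, enum res_id id)
{
	return r->valid[id];
}

static uint64_t res_get(const struct resources *r, enum res_id id)
{
	return r->values[id];
}

/* Setting a value also marks it valid. */
static void res_set(struct resources *r, enum res_id id, uint64_t v)
{
	r->valid[id] = true;
	r->values[id] = v;
}
```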
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Push cmd resource query related defines to cmd.h where they belong.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the MLXSW_REG_DEFINE macro to store register name in string form.
Use this string later on instead of hard coded string values.
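A sketch of storing the name via stringification. REG_DEFINE, struct
reg_info and the foo/bar registers are hypothetical stand-ins for
MLXSW_REG_DEFINE and real register definitions. Note the macro parameter
is _name, not name, so it does not clash with the .name designator.

```c
#include <stdint.h>

struct reg_info {
	uint16_t id;
	uint16_t len;
	const char *name;
};

/* #_name turns the identifier into a string literal, so the name
 * no longer has to be typed twice. */
#define REG_DEFINE(_name, _id, _len)				\
	static const struct reg_info reg_##_name = {		\
		.id = _id,					\
		.len = _len,					\
		.name = #_name,					\
	}

REG_DEFINE(foo, 0x1000, 0x10);
REG_DEFINE(bar, 0x1001, 0x20);
```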
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Save some code and also prepare to easily carry name in string form.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enforce const for getter buf args.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These should be const, so enforce it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
Add BPF numa id helper
This patch set adds a helper for retrieving current numa node
id and a test case for SO_REUSEPORT.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The use case is mainly for soreuseport to select sockets for the local
numa node, but since the helper is generic, let's also add it for other
networking and tracing program types.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
udp: refactor memory accounting
This patch series refactors the udp memory accounting, replacing the
generic implementation with a custom one, in order to remove the need for
locking the socket on the enqueue and dequeue operations. The socket backlog
usage is dropped as well.
The first patch factors out pieces of some queue and memory management
socket helpers, so that they can later be used by the udp memory accounting
functions.
The second patch adds the memory accounting helpers, without using them.
The third patch replaces the old rx memory accounting path for udp over ipv4
and udp over ipv6. In-kernel UDP users are updated as well.
The memory accounting schema is described in detail in the individual patch
commit message.
The performance gain depends on the specific scenario; with few flows (and
little contention in the original code) the differences are in the noise range,
while with several flows contending for the same socket, the measured speed-up
is significant (e.g. over 100% in case of extreme contention).
Many thanks to Eric Dumazet for the reiterated reviews and suggestions.
v5 -> v6:
- do not orphan the skb on enqueue, skb_steal_sock() already did
the work for us
v4 -> v5:
- use the receive queue spin lock to protect the memory accounting
- several minor clean-up
v3 -> v4:
- simplified the locking schema, always use a plain spinlock
v2 -> v3:
- do not set the now unused backlog_rcv callback
v1 -> v2:
- slightly changed the memory accounting schema; we now perform lazy reclaim
- fixed forward_alloc updating issue
- fixed memory counter integer overflows
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Completely avoid default sock memory accounting and replace it
with udp-specific accounting.
Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.
Be sure to clean up rx queue memory on socket destruction, using
udp's own sk_destruct.
Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.
nr readers   Kpps (vanilla)   Kpps (patched)
         1              170              440
         3             1250             2150
         6             3000             3650
         9             4200             4450
        12             5700             6250
v4 -> v5:
- avoid unneeded test in first_packet_length
v3 -> v4:
- remove useless sk_rcvqueues_full() call
v2 -> v3:
- do not set the now unused backlog_rcv callback
v1 -> v2:
- add memory pressure support
- fixed dropwatch accounting for ipv6
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.
On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.
On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
the work for us
The above needs custom memory reclaiming on shutdown, provided
by the udp_destruct_sock().
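A single-threaded sketch of the bookkeeping described above, under
stated assumptions: a 4096-byte accounting quantum, a hypothetical
"charged" counter for what is billed to the protocol-wide pool, an
admission rule modelled loosely on "always allow at least one skb", and
no memory-pressure handling or locking (in the patch the receive queue
spin lock protects these updates). All names are hypothetical.

```c
#define QUANTUM 4096L	/* assumed accounting granularity (one page) */

struct acct {
	long forward_alloc;	/* charged but not yet used */
	long rmem_alloc;	/* held by queued packets */
	long charged;		/* billed to the protocol-wide pool */
	long rcvbuf;		/* receive buffer limit */
};

/* Enqueue: charge whole quanta only when forward_alloc runs short. */
static int acct_enqueue(struct acct *a, long size)
{
	/* An empty queue always accepts one packet, even above rcvbuf. */
	if (a->rmem_alloc && a->rmem_alloc + size > a->rcvbuf)
		return -1;
	if (a->forward_alloc < size) {
		long need = size - a->forward_alloc;
		long amt = (need + QUANTUM - 1) / QUANTUM * QUANTUM;

		a->charged += amt;
		a->forward_alloc += amt;
	}
	a->forward_alloc -= size;
	a->rmem_alloc += size;
	return 0;
}

/* Dequeue: partial reclaim, returning whole quanta but keeping
 * between 1 and QUANTUM bytes forward-allocated. */
static void acct_dequeue(struct acct *a, long size)
{
	a->rmem_alloc -= size;
	a->forward_alloc += size;
	if (a->forward_alloc > QUANTUM) {
		long amt = (a->forward_alloc - 1) / QUANTUM * QUANTUM;

		a->charged -= amt;
		a->forward_alloc -= amt;
	}
}
```

Throughout, charged stays equal to forward_alloc + rmem_alloc.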
v5 -> v6:
- don't orphan the skb on enqueue
v4 -> v5:
- replace the mem_lock with the receive queue spin lock
- ensure that the bh is always allowed to enqueue at least
a skb, even if sk_rcvbuf is exceeded
v3 -> v4:
- reworked memory accounting, simplifying the schema
- provide a helper for both memory scheduling and enqueuing
v1 -> v2:
- use a udp specific destructor to perform memory reclaiming
- remove a couple of helpers, unneeded after the above cleanup
- do not reclaim memory on dequeue if not under memory
pressure
- reworked the fwd accounting schema to avoid potential
integer overflow
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basic sock operations that udp code can use with its own
memory accounting schema. No functional change is introduced
in the existing APIs.
v4 -> v5:
- avoid whitespace changes
v2 -> v4:
- avoid exporting __sock_enqueue_skb
v1 -> v2:
- avoid exporting sock_rmem_free
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>