commit 1e4e0767ec (Fix locking on fec_mpc52xx driver) assumed IRQ are
enabled when an IRQ handler is called.
It is not the case anymore (IRQF_DISABLED is deprecated), so we can use
regular spin_lock(), no need for spin_lock_irqsave().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Jean-Michel Hautbois <jhautbois@gmail.com>
Cc: Asier Llano <a.llano@ziv.es>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, fields in struct sock are not optimally ordered, because each
path (RX softirq, TX completion, RX user, TX user) has to touch fields
that are contained in many different cache lines.
The really critical thing is to shrink number of cache lines that are
used at RX softirq time : CPU handling softirqs for a device can receive
many frames per second for many sockets. If load is too big, we can drop
frames at NIC level. RPS or multiqueue cards can help, but better reduce
latency if possible.
This patch starts with UDP protocol, then additional patches will try to
reduce latencies of other ones as well.
At RX softirq time, fields of interest for UDP protocol are :
(not counting ones in inet struct for the lookup)
Read/Written:
sk_refcnt (atomic increment/decrement)
sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
sk_receive_queue
sk_backlog (if socket locked by user program)
sk_rxhash
sk_forward_alloc
sk_drops
Read only:
sk_rcvbuf (sk_rcvqueues_full())
sk_filter
sk_wq
sk_policy[0]
sk_flags
Additional notes :
- sk_backlog has one hole on 64bit arches. We can fill it to save 8
bytes.
- sk_backlog is used only if RX sofirq handler finds the socket while
locked by user.
- sk_rxhash is written only once per flow.
- sk_drops is written only if queues are full
Final layout :
[1] One section grouping all read/write fields, but placing rxhash and
sk_backlog at the end of this section.
[2] One section grouping all read fields in RX handler
(sk_filter, sk_rcv_buf, sk_wq)
[3] Section used by other paths
I'll post a patch on its own to put sk_refcnt at the end of struct
sock_common so that it shares same cache line than section [1]
New offsets on 64bit arch :
sizeof(struct sock)=0x268
offsetof(struct sock, sk_refcnt) =0x10
offsetof(struct sock, sk_lock) =0x48
offsetof(struct sock, sk_receive_queue)=0x68
offsetof(struct sock, sk_backlog)=0x80
offsetof(struct sock, sk_rmem_alloc)=0x80
offsetof(struct sock, sk_forward_alloc)=0x98
offsetof(struct sock, sk_rxhash)=0x9c
offsetof(struct sock, sk_rcvbuf)=0xa4
offsetof(struct sock, sk_drops) =0xa0
offsetof(struct sock, sk_filter)=0xa8
offsetof(struct sock, sk_wq)=0xb0
offsetof(struct sock, sk_policy)=0xd0
offsetof(struct sock, sk_flags) =0xe0
Instead of :
sizeof(struct sock)=0x270
offsetof(struct sock, sk_refcnt) =0x10
offsetof(struct sock, sk_lock) =0x50
offsetof(struct sock, sk_receive_queue)=0xc0
offsetof(struct sock, sk_backlog)=0x70
offsetof(struct sock, sk_rmem_alloc)=0xac
offsetof(struct sock, sk_forward_alloc)=0x10c
offsetof(struct sock, sk_rxhash)=0x128
offsetof(struct sock, sk_rcvbuf)=0x4c
offsetof(struct sock, sk_drops) =0x16c
offsetof(struct sock, sk_filter)=0x198
offsetof(struct sock, sk_wq)=0x88
offsetof(struct sock, sk_policy)=0x98
offsetof(struct sock, sk_flags) =0x130
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.
Using atomic_inc_not_zero_hint() permits to reduce latency, because
processor issues less memory transactions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now vlan are lockless, we dont need special ndo_select_queue() logic.
dev_pick_tx() will do the multiqueue stuff on the real device transmit.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.
This patch completely removes locking in TX path.
tx stat counters are added into existing percpu stat structure, renamed
from vlan_rx_stats to vlan_pcpu_stats.
Note : this partially reverts commit 2e59af3dcb (vlan: multiqueue vlan
device)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.
This patch completely removes locking in TX path.
tx stat counters are added into existing percpu stat structure, renamed
from rx_stats to pcpu_stats.
Note : this reverts commit 2c11455321 (macvlan: add multiqueue
capability)
Note : rx_errors converted to a 32bit counter, like tx_dropped, since
they dont need 64bit range.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Version 4 of this patch.
Change notes:
1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)
Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run. The reason this happened was because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be. It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
allocation. Thats difficult under good circumstances, and horrid under memory
pressure.
I thought it would be good to make that a bit more usable. I was going to do a
simple conversion of the ring buffer from contigous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.
So I've done this, I've added a three tiered mechanism to the af_packet set_ring
socket option. It attempts to allocate memory in the following order:
1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap
2) Using vmalloc
3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory
The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.
Tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The changed functions do not modify the NL messages and/or attributes
at all. They should use const (similar to strchr), so that callers
which have a const nlmsg/nlattr around can make use of them without
casting.
While at it, constify a data array.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sending zero byte packets is not neccessarily an error (AF_INET accepts it,
too), so just apply a shortcut. This was discovered because of a non-working
software with WINE. See
http://bugs.winehq.org/show_bug.cgi?id=19397#c86http://thread.gmane.org/gmane.linux.irda.general/1643
for very detailed debugging information and a testcase. Kudos to Wolfgang for
those!
Reported-by: Wolfgang Schwotzer <wolfgang.schwotzer@gmx.net>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Mike Evans <mike.evans@cardolan.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ref count bug introduced by
commit 2de7957072
Author: Lorenzo Colitti <lorenzo@google.com>
Date: Wed Oct 27 18:16:49 2010 +0000
ipv6: addrconf: don't remove address state on ifdown if the address
is being kept
Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined!
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
usb_wait_anchor_empty_timeout's @timeout
wants milliseconds and not jiffies.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
WIPHY_FLAG_IBSS_RSN is BIT(7) as is WIPHY_FLAG_CONTROL_PORT_PROTOCOL. Change
to BIT(8).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When b43legacy is compiled on the arm platform, the following errors are seen:
CC [M] drivers/net/wireless/b43legacy/xmit.o
In file included from include/net/dst.h:11,
from drivers/net/wireless/b43legacy/xmit.c:31:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__'
before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named
'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named
'pcpuc_entries'
make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[3]: *** [drivers/net/wireless/b43legacy] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
The cause is a missing include of <linux/cache.h>, which is present for
i386 and x86_64 architectures, but not for arm.
Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The commit below added a new helper dev_ingress_queue to cleanly obtain the
ingress queue pointer. This necessitated including 'linux/netdevice.h':
commit 24824a09e3
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat Oct 2 06:11:55 2010 +0000
net: dynamic ingress_queue allocation
However this include triggers issues for applications in userspace
which use the rtnetlink interfaces. Commonly this requires they include
'net/if.h' and 'linux/rtnetlink.h' leading to a compiler error as below:
In file included from /usr/include/linux/netdevice.h:28:0,
from /usr/include/linux/rtnetlink.h:9,
from t.c:2:
/usr/include/linux/if.h:135:8: error: redefinition of ‘struct ifmap’
/usr/include/net/if.h:112:8: note: originally defined here
/usr/include/linux/if.h:169:8: error: redefinition of ‘struct ifreq’
/usr/include/net/if.h:127:8: note: originally defined here
/usr/include/linux/if.h:218:8: error: redefinition of ‘struct ifconf’
/usr/include/net/if.h:177:8: note: originally defined here
The new helper is only defined for the kernel and protected by __KERNEL__
therefore we can simply pull the include down into the same protected
section.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix data type of argument passed to pci_alloc_consistent and pci_free_consistent routines.
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where other CPU has cleared port
handler hook, but is bridge port flag might still be set.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add br_should_route_hook_t typedef, this is the only way we can
get a clean RCU implementation for function pointer.
Move route_hook to location where it is used.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add modern __rcu annotatations to bridge multicast table.
Use newer hlist macros to avoid direct access to hlist internals.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch move RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation). Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released-- this
obviates the need for reference counts specific to RX queues.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/8021q/vlanproc.c: In function 'vlandev_seq_show':
net/8021q/vlanproc.c:283:20: warning: unused variable 'fmt'
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the handcrafted
sign extension cruft with a generic
bitop function.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch moves code out from wireless drivers where two different
functions are defined in three code locations for the same purpose and
provides a common function to sign extend a 32-bit value.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make use the newly added method to pass platform data for wl1251 too.
This allows to eliminate some redundant code.
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kvalo@adurom.com>
Acked-by: Luciano Coelho <luciano.coelho@nokia.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add runtime PM support, similar to how it's done for wl1271.
This allows to power down the card when the driver is loaded but
network is not in use.
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Call interface specific power callback before calling board specific
one. Also allow that callback to fail. This is how it's done for
wl1271 and will be used for runtime_pm support.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For 6000 series devices and up, enable automatic update MAC's register
for better power usage in PSP mode
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Disconnected antenna algorithm is used to find out which antennas are
disconnected. It should be disabled for devices that support advanced
bluetooth coexist.
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Disconnected antenna algorithm is seperated into its own function from chain noise
calibration routine for better code management.
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the HT information is changed due to
BSS changes (like legacy stations joining)
we need to recalculate HT RXON parameters.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
During the RXON rewrite, this code got lost.
When we've just associated, we need to enable
all calibrations and see if some were already
finished. Add back the missing code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The RXON rework resulted in a massive loss of
throughput because we weren't programming the
device completely correctly -- the BSSID has
to be programmed into the device before the
AP station is uploaded. To fix this, simply
always send the unassoc RXON, i.e. even when
it was already unassoc so that the BSSID and
some other parameters are updated properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Garen noticed that this was wrong. Fix
the calibration -- default to multiple
chains and fall back to single where
possible.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
"mac80211: Fix WMM driver queue configuration"
inadvertedly broke iwlwifi, because now mac80211
configures the QoS settings before assoc, and
therefore before HT. Thus, iwlwifi no longer told
the device about the HT setting, which it needs
to -- and thus throughput went down a lot. Fix
this by resending the QoS command to the device
not only when QoS/WMM settings change, but also
when HT changes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>