efx_unregister_netdev() should not call efx_release_tx_buffers()
directly, as it is already done when closing the device:
efx_net_stop() -> efx_stop_all() -> efx_stop_datapath() ->
efx_fini_tx_queue() -> efx_release_tx_buffers().
(This was presumably a workaround for a race between efx_stop_all()
and the data path that has since been properly fixed.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
rx_queue::enabled guards refill, so rename it to reflect that. Clear
it at the start of the queue teardown process rather than waiting for
the RX queue to be flushed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We unconditionally acknowledge legacy interrupts just before disabling
them. This workaround is needed on Falcon A1 but probably not on
later chips where the legacy interrupt mechanism is different. It was
also originally done after the IRQ handler was removed, not before.
Restore the original behaviour for Falcon A1 only by doing this
acknowledgement in the efx_nic_type::fini operation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There are many problems with the current efx_stop_interrupts() and
efx_start_interrupts():
1. On Siena, it is unsafe to disable the master IRQ enable bit
(DRV_INT_EN_KER) while any IRQ sources are enabled.
2. On EF10 there is no master IRQ enable bit, so we cannot expect to
defer IRQs without tearing down event queues. (Though I don't think
we will need to keep any event queues around while the device is down,
as we do for VFDI on Siena.)
3. synchronize_irq() only waits for a running IRQ handler to finish,
not for any propagation through IRQ controllers. Therefore an IRQ may
still be received and handled after efx_stop_interrupts() returns.
IRQ handlers can then race with channel reallocation.
To fix this:
a. Introduce a software IRQ enable flag. So long as this is clear,
IRQ handlers will only acknowledge IRQs and not touch the channel
structures.
b. Define a new struct efx_msi_context as the context for MSIs. This
is never reallocated and is sufficient to find the software enable
flag and the channel structure. It also includes the channel/IRQ
name, which was previously separated out as it must also not be
reallocated.
c. Split efx_{start,stop}_interrupts() into
efx_{,soft_}_{enable,disable}_interrupts(). The 'soft' functions
don't touch the hardware master enable flag (if it exists) and don't
reinitialise or tear down channels with the keep_eventq flag set.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_process_channel_now() is unneeded since self-tests can rely on
normal NAPI polling. Remove it and all calls to it.
efx_channel::work_pending and efx_channel_processed() are also
unneeded (the latter being the same as efx_nic_eventq_read_ack()).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 architecture has a very different register layout from
previous controllers, so we'll use separate files for the two sets of
register definitions. Use 'farch' as an abbreviation for
Falcon-architecture.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On EF10, the firmware is in charge of allocating buffer table entries.
Change struct efx_special_buffer to use a struct efx_buffer member,
so that it can be used with efx_nic_{alloc,free}_buffer() in that
case.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Most call sites for efx_nic_alloc_buffer() are part of the probe or
reconfiguration paths and can allocate with GFP_KERNEL. A few others
should use GFP_NOIO (I think). Only one is in atomic context and
must use the current GFP_ATOMIC.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Move the lowest layer (transport) of the current MCDI code to
per-NIC-type operations.
Introduce a new structure and efx_nic member for MCDI-specific data.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This should probably be done during MCDI initialisation for any NIC.
Change efx_mcdi_init() to return an error code.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Collect together MCDI port functions from mcdi.c, mcdi_mac.c,
mcdi_phy.c and siena.c. Rename the 'siena' functions accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We currently require that MCDI request and response lengths are
multiples of 4 bytes, because we will copy dwords in and out of shared
memory and we want to be sure we won't read or write out of bounds.
But all we really need to know is that there is sufficient padding for
that. Also, we should ensure that buffers are dword-aligned, as on
some architectures misaligned access will result in data corruption or
a crash.
Change the buffer type to array-of-efx_dword_t and remove the
requirement that the lengths are multiples of 4.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
A few functions are using heap buffers; change them to use stack
buffers as we really don't need to resort to the heap for a 252
byte buffer in process context.
MC_CMD_MEMCPY is quite weird in that it can use inline data placed in
the request buffer after the array of records. Thus there are two
variable-length arrays and we can't use the normal accessors for
the second. So we have to use _MCDI_PTR() in efx_sriov_memcpy().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We need to access arrays of 16-bit words and 32-bit dwords in MCDI
buffers based on the MCDI protocol definitions.
We should also be able to read and write fields within structures,
without specifying an array index each time. So add MCDI_FIELD()
and make MCDI_ARRAY_FIELD() use it. Also add MCDI_SET_FIELD().
Split MCDI_ARRAY_PTR() into MCDI_ARRAY_STRUCT_PTR() and
_MCDI_ARRAY_PTR(), which are currently identical but will diverge in
later changes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add _MCDI_DWORD() which yields an lvalue for the given dword field
and change MCDI_DWORD(), MCDI_SET_DWORD() and MCDI_QWORD() to use it.
Fold the rather trivial MCDI_PTR2() into MCDI_PTR() and _MCDI_DWORD().
Remove MCDI_SET_DWORD2() and MCDI_QWORD2(). MCDI_DWORD2() should also
go, but it still has one user which we'll get rid of later.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
MCDI_DECLARE_BUF declares a variable as an MCDI buffer of the
requested length, adding any necessary padding.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
commit 385904f819 ('sfc: Don't use
efx_filter_{build,hash,increment}() for default MAC filters') used the
wrong name to find the index of default RX MAC filters at insertion/
update time. This could result in memory corruption and would in any
case silently fail to update the filter.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Pravin B Shelar says:
====================
openvswitch: VXLAN tunneling.
First four vxlan patches extends vxlan so that openvswitch
can share vxlan recv code. Rest of patches refactors vxlan
data plane so that ovs can share that code with vxlan module.
Last patch adds vxlan-vport to openvswitch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch adds vxlan vport type for openvswitch using
vxlan api. So now there is vxlan dependency for openvswitch.
CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch allows transmit side vlan offload for vxlan
devices.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than having static headroom calculation, adjust headroom
according to target device.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch allows more code sharing between vxlan and ovs-vxlan.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch adds data field to vxlan socket and export
vxlan handler api.
vh->data is required to store private data per vxlan handler.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once we have ovs-vxlan functionality, one UDP port can be assigned
to kernel-vxlan or ovs-vxlan port. Therefore following patch adds
vxlan demux functionality, so that vxlan or ovs module can
register for particular port.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use iptunnel_pull_header() for better code sharing.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restructure vxlan-socket management APIs so that it can be
shared between vxlan and ovs modules.
This patch does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
v6-v7:
- get rid of zero refcnt vs from hashtable.
Signed-off-by: David S. Miller <davem@davemloft.net>
Split some parts of code into another function to simplify
tx_bottom(). Use while loop to replace the goto loop.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move some declearation of variables in rx_bottom().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Use r8152_get_tx_agg for getting tx agg list
- Replace submit rx with goto submit
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the type of contex of tx_agg and rx_agg from void * to
staruc r8152 *.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
another Exynos clock patch which fixes a regression that prevents the
video pipeline from functioning on that platform.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSCma3AAoJEDqPOy9afJhJI/QP/R5kIvB3osg7ZGXYXzHtC4nu
EamZXz77Wg/fvpktaLP6C5eDjJgYKgUV2qwLvjurxRjpeJ4GmLlf0ieB8XknI8XQ
8BFjIk4ugvfqjWJ3VxO5Gcc1VdTAgpyZHVUbrmRMmAa/lWDO0BUFDdmyIguY7DoN
ydrWKBESnT/4tQ2e+a0i+/mzBqaLOrc5hzL2g76TUHDNO0XlFQpVIO4buI9+dlpb
maH9Be+Mg7q6JbTg6RdetZuJ7IowaYSOL5fDmGJPBuzbclpZ4fmDjpWCt//vAH4c
M+2mi6E0PjOXNv5F5R+A5Vg4oYAnXFLBay8eg+ss4ffSdV4XA0gRHUVUA4MF11FJ
rGJPhvNSOiuhnA2P3gNJd7EMHXLsZedLLHd00N5ZoOrSB9sTs7Q8qUiBnh1cCms/
/amCZBJQV8qZpyutn4lQC575JmGwysmou6Upg1f97GsqCVSIGQle+JjqcKJJ70Qr
jjTxg5/JKupcDhvJntfWm17z8dRdScsjzRnpWTNa1QeD48JJGyottbbX739pjtzC
afz8RqW3NDI7gwFMHcz6uo5hTsDByXP9NHexIDmgoDjxVu/NGOrQTegjRdFiE3k8
AZ1VNQZoa+nwp8TDxbc7XMxW/fg+noTZw3E+IvExw+jo+x0HSiYnsL3VxGlDBPlA
b4D9DkTR2z+u5EAMztnC
=C3go
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux
Pull clock controller fixes from Michael Turquette:
"Two small fixes for the Zynq clock controller introduced in 3.11-rc1
and another Exynos clock patch which fixes a regression that prevents
the video pipeline from functioning on that platform"
* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
clk: exynos4: Add CLK_GET_RATE_NOCACHE flag for the Exynos4x12 ISP clocks
clk/zynq/clkc: Add CLK_SET_RATE_PARENT flag to ethernet muxes
clk/zynq/clkc: Add dedicated spinlock for the SWDT
- The removal of delayed_work_pending() checks from kernel/power/qos.c
done in 3.9 introduced a deadlock in pm_qos_work_fn(). Fix from
Stephen Boyd.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJSDi1zAAoJEKhOf7ml8uNsv1wQAIfa2cD6k7WaFrDpL8FTRatY
77qDudjeJnv24R9lRA7rk3FViTfLWUoKAmJrCtgaWV7AxxvtJur2L3Q1vq5QvF8j
m44Dtn8WqyNezgnSoMMHW+SWfvYhduoF/U++8EZ4PschzNnm146cLVT4jjEu7twK
btCqB+Qg/F5jfdv+HuUCLfjx1WP9OgXi3km97fKKuRPFPG86ykoxUoT6GSNZ3kT9
60eOhf840ULOgtOLZV6gLJTVlJFY3dNviLQZXF3x+1VXwoOzoV0y496OItX6IVye
K6qN8T3IGdkqg7urBGRtskFc3IVutuUTY2UAxJjQqGOVl2W6Te7KSk8czTJp6hbl
jF5p5S7m/V6Oj0021ndXpAmgb5yDWxM+qCOuXxfBScd+ZWn190+0Ok1PYVQEXOLT
vczhn8b2OMojvC7bWiVFEAfxMwK/5qGI/+yeIIC8pf27TcrgfJYnCBd7YNXyTa3Z
sfr4ITUnu+IJq6NlJtK7brzAd3270TWljZUn/zQESyC8U7b2zWPWE3U5hB7Sil85
rJ2U91deoBu2/gEhZcyFjSTzikc9rhMZQHJ/BvzwMraUko+1uDHM+PPaXv3V8Q6c
SSvizjx4QGlTrr/PiXKMFTQO1ArwBJvy2r8NJLGPKSaUKAelU0wYSrUoUkhI/CtT
v4p4xFwXGObOyv4UFg3H
=mGG3
-----END PGP SIGNATURE-----
Merge tag 'pm-3.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"The removal of delayed_work_pending() checks from kernel/power/qos.c
done in 3.9 introduced a deadlock in pm_qos_work_fn().
Fix from Stephen Boyd"
* tag 'pm-3.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / QoS: Fix workqueue deadlock when using pm_qos_update_request_timeout()
This batch contains a few USB audio fixes, a couple of HD-audio quirks,
various small ASoC driver fixes in addition to an ASoC core fix that
may lead to memory corruption.
Unfortunately slightly more volume than the previous pull request, but
all are reasonable regression fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJSDfGrAAoJEGwxgFQ9KSmkwWQP/RJMMp8lYZXqRM2rPkiPxNup
RoCQU+wI+XQ9GqPDgTRE3FQVNO7fn785EzyXF2UK0Vb9nDlzpUu4fSj146+U/JhP
Cj1ryXYzg8ur3aLyhA6sNbFZIofcxdY6ynhpzqcQQexp1WEYphC92iGyq7DrOiFm
B9tg3M3c9f1/bzAtb5e70KK6t+2P3wxGZ1YTVEOZhr52qlvVlxuQtmwm4DuWDzhP
sPWlxfZfCbkER9PmNn3IqykDdG8mXT57E6AMjT1+Pd2Qk7LqU8SmOEvQY+qkfW0D
3unMbGTJYr5ewRrT+9/n6sF4Xp62l6hXK8a9ffw0/qZ+QRlVYLHvSr7LEMbmTgS7
8hLsVi4vT6nXI71VRC4ccpFA4+7vEybzepOW8tZL+LEvR3ocakRiXkBZ5wFT8CiN
ORdrxQrbdfVbPdCe/MxkSpGyMQpChXNjVero/noSQUgCTvS8egCI4DEs258GUeDh
mEjYzndu5siWTda8CjVc9+kifWuZIOQ0xzzZ/xiiKm5k/nWTNcCASb+OoKjL1jpS
+ux9PmCPsySeeKl3yrCol2smYzkLe+4RmmMKnmIHlyCaaLAg5ExxWWywJPSYYlXN
R+jRm+67+SozKgV7C3VbBWyuBKxWeSFNVNAeHPCZsXUBwEqmgo1HZBq1O/IDh3Ik
YNWPOncmA4eFXuxb2RYx
=EnXH
-----END PGP SIGNATURE-----
Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This batch contains a few USB audio fixes, a couple of HD-audio
quirks, various small ASoC driver fixes in addition to an ASoC core
fix that may lead to memory corruption.
Unfortunately slightly more volume than the previous pull request, but
all are reasonable regression fixes"
* tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add a fixup for Gateway LT27
ASoC: tegra: fix Tegra30 I2S capture parameter setup
ALSA: usb-audio: Fix invalid volume resolution for Logitech HD Webcam C525
ALSA: hda - Fix missing mute controls for CX5051
ALSA: usb-audio: fix automatic Roland/Yamaha MIDI detection
ALSA: 6fire: make buffers DMA-able (midi)
ALSA: 6fire: make buffers DMA-able (pcm)
ALSA: hda - Add pinfix for LG LW25 laptop
ASoC: cs42l52: Add new TLV for Beep Volume
ASoC: cs42l52: Reorder Min/Max and update to SX_TLV for Beep Volume
ASoC: dapm: Fix empty list check in dapm_new_mux()
ASoC: sgtl5000: fix buggy 'Capture Attenuate Switch' control
ASoC: sgtl5000: prevent playback to be muted when terminating concurrent capture
Here are some small USB fixes for 3.11-rc6 that have accumulated.
Nothing huge, a EHCI fix that solves a much-reported audio USB problem,
some usb-serial driver endian fixes and other minor fixes, a wireless
USB oops fix, and two new quirks.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iEYEABECAAYFAlINiREACgkQMUfUDdst+ynMkACgvH7Xfb6U27cAsf+Pf03lhWyg
4+sAnjKq3kD+UXwIaaSIeod5xEKLZLLk
=+24N
-----END PGP SIGNATURE-----
Merge tag 'usb-3.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 3.11-rc6 that have accumulated.
Nothing huge, a EHCI fix that solves a much-reported audio USB
problem, some usb-serial driver endian fixes and other minor fixes, a
wireless USB oops fix, and two new quirks"
* tag 'usb-3.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: keyspan: fix null-deref at disconnect and release
USB: mos7720: fix broken control requests
usb: add two quirky touchscreen
USB: ti_usb_3410_5052: fix big-endian firmware handling
USB: adutux: fix big-endian device-type reporting
USB: usbtmc: fix big-endian probe of Rigol devices
USB: mos7840: fix big-endian probe
USB-Serial: Fix error handling of usb_wwan
wusbcore: fix kernel panic when disconnecting a wireless USB->serial device
USB: EHCI: accept very late isochronous URBs
Pull networking fixes from David Miller:
1) Fix SKB leak in 8139cp, from Dave Jones.
2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazar.
3) RCU conversion of macvtap introduced two races, fixes by Eric
Dumazet
4) Synchronize statistic flows in bnx2x driver to prevent corruption,
from Dmitry Kravkov
5) Undo optimization in IP tunneling, we were using the inner IP header
in some cases to inherit the IP ID, but that isn't correct in some
circumstances. From Pravin B Shelar
6) Use correct struct size when parsing netlink attributes in
rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen
7) Length verifications in tun_get_user() are bogus, from Weiping Pan
and Dan Carpenter
8) Fix bad merge resolution during 3.11 networking development in
openvswitch, albeit a harmless one which added some unreachable
code. From Jesse Gross
9) Wrong size used in flexible array allocation in openvswitch, from
Pravin B Shelar
10) Clear out firmware capability flags the be2net driver isn't ready to
handle yet, from Sarveshwar Bandi
11) Revert DMA mapping error checking addition to cxgb3 driver, it's
buggy. From Alexey Kardashevskiy
12) Fix regression in packet scheduler rate limiting when working with a
link layer of ATM. From Jesper Dangaard Brouer
13) Fix several errors in TCP Cubic congestion control, in particular
overflow errors in timestamp calculations. From Eric Dumazet and
Van Jacobson
14) In ipv6 routing lookups, we need to backtrack if subtree traversal
don't result in a match. From Hannes Frederic Sowa
15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs
16) Get "low latency" out of the new MIB counter names. From Eliezer
Tamir
17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar
Samudrala
18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung
Cheng
19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean
20) Fix dealock in TIPC, from Wang Weidong and Ding Tianhong
21) call_rcu() call to destroy SCTP transport is done too early and
might result in an oops. From Daniel Borkmann
22) Fix races in genetlink family dumps, from Johannes Berg
23) Flags passed into macvlan by the user need to be validated properly,
from Michael S Tsirkin
24) Fix skge build on 32-bit, from Stephen Hemminger
25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira
Ayuso
26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay
Aleksandrov
27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel
Borkmann
28) neigh_parms need to be setup before calling the ->ndo_neigh_setup()
method. From Veaceslav Falico
29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet
30) Don't dereference MLD query message if the length isn't value in the
bridge multicast code, from Linus Lüssing
31) Fix VXLAN IGMP join regression due to an inverted check, from Cong
Wang
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes
tun: signedness bug in tun_get_user()
qlcnic: Fix diagnostic interrupt test for 83xx adapters
qlcnic: Fix beacon state return status handling
qlcnic: Fix set driver version command
net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset
net_sched: restore "linklayer atm" handling
drivers/net/ethernet/via/via-velocity.c: update napi implementation
Revert "cxgb3: Check and handle the dma mapping errors"
be2net: Clear any capability flags that driver is not interested in.
openvswitch: Reset tunnel key between input and output.
openvswitch: Use correct type while allocating flex array.
openvswitch: Fix bad merge resolution.
tun: compare with 0 instead of total_len
rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header
ethernet/arc/arc_emac - fix NAPI "work > weight" warning
ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
bnx2x: prevent crash in shutdown flow with CNIC
bnx2x: fix PTE write access error
bnx2x: fix memory leak in VF
...
Ben Tebulin reported:
"Since v3.7.2 on two independent machines a very specific Git
repository fails in 9/10 cases on git-fsck due to an SHA1/memory
failures. This only occurs on a very specific repository and can be
reproduced stably on two independent laptops. Git mailing list ran
out of ideas and for me this looks like some very exotic kernel issue"
and bisected the failure to the backport of commit 53a59fc67f ("mm:
limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
That commit itself is not actually buggy, but what it does is to make it
much more likely to hit the partial TLB invalidation case, since it
introduces a new case in tlb_next_batch() that previously only ever
happened when running out of memory.
The real bug is that the TLB gather virtual memory range setup is subtly
buggered. It was introduced in commit 597e1c3580 ("mm/mmu_gather:
enable tlb flush range in generic mmu_gather"), and the range handling
was already fixed at least once in commit e6c495a96c ("mm: fix the TLB
range flushed when __tlb_remove_page() runs out of slots"), but that fix
was not complete.
The problem with the TLB gather virtual address range is that it isn't
set up by the initial tlb_gather_mmu() initialization (which didn't get
the TLB range information), but it is set up ad-hoc later by the
functions that actually flush the TLB. And so any such case that forgot
to update the TLB range entries would potentially miss TLB invalidates.
Rather than try to figure out exactly which particular ad-hoc range
setup was missing (I personally suspect it's the hugetlb case in
zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
did), this patch just gets rid of the problem at the source: make the
TLB range information available to tlb_gather_mmu(), and initialize it
when initializing all the other tlb gather fields.
This makes the patch larger, but conceptually much simpler. And the end
result is much more understandable; even if you want to play games with
partial ranges when invalidating the TLB contents in chunks, now the
range information is always there, and anybody who doesn't want to
bother with it won't introduce subtle bugs.
Ben verified that this fixes his problem.
Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com>
Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au>
Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>