The ColdFire M5475 on the m5475evb board supports a PCI bus, lets
enable it for the defconfig to get better build and test coverage.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
The inb/outb/... family of IO methods end up being multiply defined when
building PCI support for the ColdFire. Compiling gives this:
CC init/main.o
In file included from ./arch/m68k/include/asm/io.h:4:0,
from include/linux/bio.h:30,
from include/linux/blkdev.h:18,
from init/main.c:75:
./arch/m68k/include/asm/io_mm.h:420:0: warning: "inb" redefined
./arch/m68k/include/asm/io_mm.h:108:0: note: this is the location of the previous definition
...
The ColdFire/PCI case defines its own IO access methods, so no others
should be defined or used in this case. Conditionally disable other
definitions that clash with it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
It would be nice if we could support multiple ColdFire SoC types in a
single binary - but currently the code simply does not support it.
Change the SoC selection config options to be a choice instead of
individual selectable entries.
This fixes problems with building allnoconfig, and means that a sane
linux kernel is generated for a single ColdFire SoC type.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Create some intelligent default settings for each ColdFire SoC type
in the configuration entry for CONFIG_CLOCK_FREQ.
The ColdFire clock frequency is configurable at build time. There is a
lot of variation in the frequency of operation on specific ColdFire based
boards. But we can choose a default that matches the maximum frequency
of clock operation for a particular ColdFire part. That is typically
the most common clock setting.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
It is possible to disable the clock selection at configuration time,
but for ColdFire targets we always expect a clock frequency to be
selected. This results in the following compile time error:
CC arch/m68k/kernel/asm-offsets.s
In file included from ./arch/m68k/include/asm/timex.h:14:0,
from include/linux/timex.h:65,
from include/linux/sched.h:19,
from arch/m68k/kernel/asm-offsets.c:14:
./arch/m68k/include/asm/coldfire.h:25:2: error: #error "Don't know what your ColdFire CPU clock frequency is??"
Remove CONFIG_CLOCK_SELECT completely and always enable CONFIG_CLOCK_FREQ
for ColdFire.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
This reverts commit dec4f799d0.
Jörg Otte reports a NULL pointder dereference due to this commit, as
'crtc_state' very much can be NULL:
crtc_state = state->base.state ?
intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL;
So the change to test 'crtc_state->base.active' cannot possibly be
correct as-is.
There may be some other minimal fix (like just checking crtc_state for
NULL), but I'm just reverting it now for the rc2 release, and people
like Daniel Vetter who actually know this code will figure out what the
right solution is in the longer term.
Reported-and-bisected-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull VFS fixes from Al Viro:
"Fixes for this cycle regression in overlayfs and a couple of
long-standing (== all the way back to 2.6.12, at least) bugs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
freeing unlinked file indefinitely delayed
fix a braino in ovl_d_select_inode()
9p: don't leave a half-initialized inode sitting around
Pull MIPS fixes from Ralf Baechle:
"A fair number of 4.2 fixes also because Markos opened the flood gates.
- Patch up the math used calculate the location for the page bitmap.
- The FDC (Not what you think, FDC stands for Fast Debug Channel) IRQ
around was causing issues on non-Malta platforms, so move the code
to a Malta specific location.
- A spelling fix replicated through several files.
- Fix to the emulation of an R2 instruction for R6 cores.
- Fix the JR emulation for R6.
- Further patching of mindless 64 bit issues.
- Ensure the kernel won't crash on CPUs with L2 caches with >= 8
ways.
- Use compat_sys_getsockopt for O32 ABI on 64 bit kernels.
- Fix cache flushing for multithreaded cores.
- A build fix"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: O32: Use compat_sys_getsockopt.
MIPS: c-r4k: Extend way_string array
MIPS: Pistachio: Support CDMM & Fast Debug Channel
MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
MIPS: c-r4k: Fix cache flushing for MT cores
Revert "MIPS: Kconfig: Disable SMP/CPS for 64-bit"
MIPS: cps-vec: Use macros for various arithmetics and memory operations
MIPS: kernel: cps-vec: Replace KSEG0 with CKSEG0
MIPS: kernel: cps-vec: Use ta0-ta3 pseudo-registers for 64-bit
MIPS: kernel: cps-vec: Replace mips32r2 ISA level with mips64r2
MIPS: kernel: cps-vec: Replace 'la' macro with PTR_LA
MIPS: kernel: smp-cps: Fix 64-bit compatibility errors due to pointer casting
MIPS: Fix erroneous JR emulation for MIPS R6
MIPS: Fix branch emulation for BLTC and BGEC instructions
MIPS: kernel: traps: Fix broken indentation
MIPS: bootmem: Don't use memory holes for page bitmap
MIPS: O32: Do not handle require 32 bytes from the stack to be readable.
MIPS, CPUFREQ: Fix spelling of Institute.
MIPS: Lemote 2F: Fix build caused by recent mass rename.
Commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Driver core sets "default" pinmux on on probe and CAN driver
sets "sleep" pinmux during register. This causes a small window
where the CAN pins are in "default" state with the DCAN module
being disabled.
Change the "default" state to be like sleep so this glitch is
avoided. Add a new "active" state that is used by the driver
when CAN is actually active.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The previous change 3973c526ae (net: can: c_can: Disable pins when CAN
interface is down) causes a slight glitch on the pinctrl settings when used.
Since commit ab78029 (drivers/pinctrl: grab default handles from device core),
the device core will automatically set the default pins. This causes the pins
to be momentarily set to the default and then to the sleep state in
register_c_can_dev(). By adding an optional "enable" state, boards can set the
default pin state to be disabled and avoid the glitch when the switch from
default to sleep first occurs. If the "enable" state is not available
c_can_pinctrl_select_state() falls back to using the "default" pinctrl state.
[Roger Q] - Forward port to v4.2 and use pinctrl_get_select().
Signed-off-by: J.D. Schroeder <jay.schroeder@garmin.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
All the error messages in the driver but the ones from devm_clk_get() failures
use similar format. Make those two messages consitent with others.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Also print the error code when the request_irq() call fails in rcar_can_open(),
rewording the error message...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Fix typo in the first error message printed by rcar_can_open().
Based on the original patch by Vladimir Barinov.
Fixes: 862e2b6af9 ("can: rcar_can: support all input clocks")
Reported-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Printing IRQ # using "%x" and "%u" unsigned formats isn't quite correct as
'ndev->irq' is of type *int*, so the "%d" format needs to be used instead.
While fixing this, beautify the dev_info() message in rcar_can_probe() a bit.
Fixes: fd1159318e ("can: add Renesas R-Car CAN driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
rcar_can_probe() regards 0 as a wrong IRQ #, despite platform_get_irq() that it
calls returns negative error code in that case. This leads to the following
being printed to the console when attempting to open the device:
error requesting interrupt fffffffa
because rcar_can_open() calls request_irq() with a negative IRQ #, and that
function naturally fails with -EINVAL.
Check for the negative error codes instead and propagate them upstream instead
of just returning -ENODEV.
Fixes: fd1159318e ("can: add Renesas R-Car CAN driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull x86 fixes from Thomas Gleixner:
- the high latency PIT detection fix, which slipped through the cracks
for rc1
- a regression fix for the early printk mechanism
- the x86 part to plug irq/vector related hotplug races
- move the allocation of the espfix pages on cpu hotplug to non atomic
context. The current code triggers a might_sleep() warning.
- a series of KASAN fixes addressing boot crashes and usability
- a trivial typo fix for Kconfig help text
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
x86/irq: Retrieve irq data after locking irq_desc
x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
x86/irq: Plug irq vector hotplug race
x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
x86/espfix: Init espfix on the boot CPU side
x86/espfix: Add 'cpu' parameter to init_espfix_ap()
x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
x86/kasan: Add message about KASAN being initialized
x86/kasan: Fix boot crash on AMD processors
x86/kasan: Flush TLBs after switching CR3
x86/kasan: Fix KASAN shadow region page tables
x86/init: Clear 'init_level4_pgt' earlier
x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
Pull timer fixes from Thomas Gleixner:
"This update from the timer departement contains:
- A series of patches which address a shortcoming in the tick
broadcast code.
If the broadcast device is not available or an hrtimer emulated
broadcast device, some of the original assumptions lead to boot
failures. I rather plugged all of the corner cases instead of only
addressing the issue reported, so the change got a little larger.
Has been extensivly tested on x86 and arm.
- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
- A regression fix for the imx clocksource driver
- An update to the new state callbacks mechanism for clockevents.
This is required to simplify the conversion, which will take place
in 4.3"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
time: Get rid of do_posix_clock_monotonic_gettime
cris: Replace do_posix_clock_monotonic_gettime()
tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
tick/broadcast: Handle spurious interrupts gracefully
tick/broadcast: Check for hrtimer broadcast active early
tick/broadcast: Return busy when IPI is pending
tick/broadcast: Return busy if periodic mode and hrtimer broadcast
tick/broadcast: Move the check for periodic mode inside state handling
tick/broadcast: Prevent deep idle if no broadcast device available
tick/broadcast: Make idle check independent from mode and config
tick/broadcast: Sanity check the shutdown of the local clock_event
tick/broadcast: Prevent hrtimer recursion
clockevents: Allow set-state callbacks to be optional
clocksource/imx: Define clocksource for mx27
Pull irq fix from Thomas Gleixner:
"A single fix for a cpu hotplug race vs. interrupt descriptors:
Prevent irq setup/teardown across the cpu starting/dying parts of cpu
hotplug so that the starting/dying cpu has a stable view of the
descriptor space. This has been an issue for all architectures in the
cpu dying phase, where interrupts are migrated away from the dying
cpu. In the starting phase its mostly a x86 issue vs the vector space
update"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hotplug: Prevent alloc/free of irq descriptors during cpu up/down
Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course). However, there's one case where that
does *not* happen. Namely, if you open it by fhandle with cold dcache,
then unlink() and close().
In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal. In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()). The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in. As the result, we have the final iput() delayed
indefinitely. It's trivial to reproduce -
void flush_dcache(void)
{
system("mount -o remount,rw /");
}
static char buf[20 * 1024 * 1024];
main()
{
int fd;
union {
struct file_handle f;
char buf[MAX_HANDLE_SZ];
} x;
int m;
x.f.handle_bytes = sizeof(x);
chdir("/root");
mkdir("foo", 0700);
fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
close(fd);
name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
flush_dcache();
fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
unlink("foo/bar");
write(fd, buf, sizeof(buf));
system("df ."); /* 20Mb eaten */
close(fd);
system("df ."); /* should've freed those 20Mb */
flush_dcache();
system("df ."); /* should be the same as #2 */
}
will spit out something like
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 283282 21692 93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).
Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
when opening a directory we want the overlayfs inode, not one from
the topmost layer.
Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com>
Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The original wait time was 200ms which was enough for the firmware to
finish loading and boot. After that the firmware will perform
initialization and touch calibration, which will take about 1.1 second. The
touch calibration will change controller frequency to scan at the most
optimal frequency and during calibration/frequency switching process we may
run into i2c data errors. To avoid them we extend the sleep to 2500ms after
issuing the sw reset.
Signed-off-by: HungNien Chen <hn.chen@weidahitech.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
These attributes can be used to identify controllers present in the system.
Signed-off-by: HungNien Chen <hn.chen@weidahitech.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Florian Fainelli says:
====================
net: dsa: OF parsing fixes
This patch series fixes two small parsing issues, the first one was
reported by Dan, the second came after looking more closely at the
code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cd->sw_addr is used as a MDIO bus address, which cannot exceed
PHY_MAX_ADDR (32), our check was off-by-one.
Fixes: 5e95329b70 ("dsa: add device tree bindings to register DSA switches")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
port_index is used an index into an array, and this information comes
from Device Tree, make sure that port_index is not equal to the array
size before using it. Move the check against port_index earlier in the
loop.
Fixes: 5e95329b701c: ("dsa: add device tree bindings to register DSA switches")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to abort attribute setting or object addition, if the
prepare phase returned operation not supported.
Thus, abort these two transactions only if the error is not -EOPNOTSUPP.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bcmgenet driver needs to separate packet drops from packet errors.
When the driver has to drop a *good* packet, due to lack of buffers or
replacement skbs, increment only dev->stats.[rx|tx]_dropped.
When the driver encounters a bad Rx packet or Tx error, increment only
dev->stats.[rx|tx]_errors + relevant detailed error counter.
Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update referenced specs link to reflect actual file version and location.
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libnvdimm fixes from Dan Williams:
"1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
bug fixes (patches 1-6)
2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).
Granted these are slightly large for a -rc update. They have been
out for review in one form or another since the end of May and were
deferred from the merge window while we settled on the "PMEM API"
for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
and wmb_pmem).
Now that those apis are merged we implement them in the BLK driver
to guarantee that mmio aperture moves stay ordered with respect to
incoming read/write requests, and that writes are flushed through
those mmio-windows and platform-buffers to be persistent on media.
These pass the sub-system unit tests with the updates to
tools/testing/nvdimm, and have received a successful build-report from
the kbuild robot (468 configs).
With acks from Rafael for the touches to drivers/acpi/"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
nfit: add support for NVDIMM "latch" flag
nfit: update block I/O path to use PMEM API
tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
tools/testing/nvdimm: fix return code for unimplemented commands
tools/testing/nvdimm: mock ioremap_wt
pmem: add maintainer for include/linux/pmem.h
nfit: fix smatch "use after null check" report
nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
libnvdimm: smatch cleanups in __nd_ioctl
sparse: fix misplaced __pmem definition
Let's switch to using standard touchscreen device properties parsing module
instead of doing it by hand in the driver.
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
input_mt_init_slots() will perform necessary settings for performing
multi-touch to single-touch emulation, we do not need to do that ourselves.
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The controller has a RESET pin which is usually controlled over
a GPIO line. If such a GPIO is provided, perform a RESET
during probe.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We are using threaded interrupt handler and thus are allowed to sleep.
Let's switch over to gpiod_get_value_cansleep() so that we do not get
ugly warnings in case GPIO controller might sleep when accessing GPIO.
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This allows uniform parsing on legacy, DT and ACPI systems.
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Let's move driver's platform data definitions from include/linux/input/
into include/linux/platform_data/ so that it stays with the rest of
platform data definitions.
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Let's switch form OF to device properties so that common parsing code could
work not only on device tree but also on ACPI-based platforms.
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The binding specification says that "touchscreen-size-x" and "-y" specify
horizontal and vertical resolution of the touchscreen and therefore maximum
absolute coordinates should be reduced by 1 since we are starting with 0.
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Do issue warning about axis that is present in device tree but not specified
by the driver even in case of multi-touch axis as callers now tell us if they
expect multi-touch data or not.
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
When we have an extent that got N references removed and N new references
added in the same transaction, we must run the insertion of the references
first because otherwise the last removed reference will remove the extent
item from the extent tree, resulting in a failure for the insertions.
This is a regression introduced in the 4.2-rc1 release and this fix just
brings back the behaviour of selecting reference additions before any
reference removals.
The following test case for fstests reproduces the issue:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
_cleanup_flakey
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_flakey
_require_cloner
_require_metadata_journaling $SCRATCH_DEV
rm -f $seqres.full
_scratch_mkfs >>$seqres.full 2>&1
_init_flakey
_mount_flakey
# Create prealloc extent covering range [160K, 620K[
$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo
# Now write to the last 80K of the prealloc extent plus 40K to the unallocated
# space that immediately follows it. This creates a new extent of 40K that spans
# the range [620K, 660K[.
$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io
# At this point, there are now 2 back references to the prealloc extent in our
# extent tree. Both are for our file offset 160K and one relates to a file
# extent item with a data offset of 0 and a length of 380K, while the other
# relates to a file extent item with a data offset of 380K and a length of 80K.
# Make sure everything done so far is durably persisted (all back references are
# in the extent tree, etc).
sync
# Now clone all extents of our file that cover the offset 160K up to its eof
# (660K at this point) into itself at offset 2M. This leaves a hole in the file
# covering the range [660K, 2M[. The prealloc extent will now be referenced by
# the file twice, once for offset 160K and once for offset 2M. The 40K extent
# that follows the prealloc extent will also be referenced twice by our file,
# once for offset 620K and once for offset 2M + 460K.
$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
$SCRATCH_MNT/foo
# Now create one new extent in our file with a size of 100Kb. It will span the
# range [3M, 3M + 100K[. It also will cause creation of a hole spanning the
# range [2M + 460K, 3M[. Our new file size is 3M + 100K.
$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io
# At this point, there are now (in memory) 4 back references to the prealloc
# extent.
#
# Two of them are for file offset 160K, related to file extent items
# matching the file offsets 160K and 540K respectively, with data offsets of
# 0 and 380K respectively, and with lengths of 380K and 80K respectively.
#
# The other two references are for file offset 2M, related to file extent items
# matching the file offsets 2M and 2M + 380K respectively, with data offsets of
# 0 and 380K respectively, and with lengths of 389K and 80K respectively.
#
# The 40K extent has 2 back references, one for file offset 620K and the other
# for file offset 2M + 460K.
#
# The 100K extent has a single back reference and it relates to file offset 3M.
# Now clone our 100K extent into offset 600K. That offset covers the last 20K
# of the prealloc extent, the whole 40K extent and 40K of the hole starting at
# offset 660K.
$CLONER_PROG -s $((3 * 1024 * 1024)) -d $((600 * 1024)) -l $((100 * 1024)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
# At this point there's only one reference to the 40K extent, at file offset
# 2M + 460K, we have 4 references for the prealloc extent (2 for file offset
# 160K and 2 for file offset 2M) and 2 references for the 100K extent (1 for
# file offset 3M and a new one for file offset 600K).
# Now fsync our file to make all its new data and metadata updates are durably
# persisted and present if a power failure/crash happens after a successful
# fsync and before the next transaction commit.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
echo "File digest before power failure:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
# Silently drop all writes and ummount to simulate a crash/power failure.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
# Allow writes again, mount to trigger log replay and validate file contents.
# During log replay, the btrfs delayed references implementation used to run the
# deletion of back references before the addition of new back references, which
# made the addition fail as it didn't find the key in the extent tree that it
# was looking for. The failure triggered by this test was related to the 40K
# extent, which got 1 reference dropped and 1 reference added during the fsync
# log replay - when running the delayed references at transaction commit time,
# btrfs was applying the deletion before the insertion, resulting in a failure
# of the insertion that ended up turning the fs into read-only mode.
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
echo "File digest after log replay:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
_unmount_flakey
status=0
exit
This issue turned the filesystem into read-only mode (current transaction
aborted) and produced the following traces:
[ 8247.578385] ------------[ cut here ]------------
[ 8247.579947] WARNING: CPU: 0 PID: 11341 at fs/btrfs/extent-tree.c:1547 lookup_inline_extent_backref+0x17d/0x45d [btrfs]()
(...)
[ 8247.601697] Call Trace:
[ 8247.602222] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[ 8247.604320] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[ 8247.605488] [<ffffffffa0506c8d>] ? lookup_inline_extent_backref+0x17d/0x45d [btrfs]
[ 8247.608226] [<ffffffffa0506c8d>] lookup_inline_extent_backref+0x17d/0x45d [btrfs]
[ 8247.617061] [<ffffffffa0507957>] insert_inline_extent_backref+0x41/0xb2 [btrfs]
[ 8247.621856] [<ffffffffa0507c4f>] __btrfs_inc_extent_ref+0x8c/0x20a [btrfs]
[ 8247.624366] [<ffffffffa050ee60>] __btrfs_run_delayed_refs+0xb0c/0xd49 [btrfs]
[ 8247.626176] [<ffffffffa0510dcd>] btrfs_run_delayed_refs+0x6d/0x1d4 [btrfs]
[ 8247.627435] [<ffffffff81155c9b>] ? __cache_free+0x4a7/0x4b6
[ 8247.628531] [<ffffffffa0520482>] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
(...)
[ 8247.648430] ---[ end trace 2461e55f92c2ac2d ]---
[ 8247.727263] WARNING: CPU: 3 PID: 11341 at fs/btrfs/extent-tree.c:2771 btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]()
[ 8247.728954] BTRFS: Transaction aborted (error -5)
(...)
[ 8247.760866] Call Trace:
[ 8247.761534] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[ 8247.764271] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[ 8247.767582] [<ffffffffa0510e04>] ? btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
[ 8247.769373] [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
[ 8247.770836] [<ffffffffa0510e04>] btrfs_run_delayed_refs+0xa4/0x1d4 [btrfs]
[ 8247.772532] [<ffffffff81155c9b>] ? __cache_free+0x4a7/0x4b6
[ 8247.773664] [<ffffffffa0520482>] btrfs_commit_transaction+0x4c/0xa20 [btrfs]
[ 8247.775047] [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
[ 8247.776176] [<ffffffff81155dd5>] ? kmem_cache_free+0x12b/0x189
[ 8247.777427] [<ffffffffa055a920>] btrfs_recover_log_trees+0x2da/0x33d [btrfs]
[ 8247.778575] [<ffffffffa055898e>] ? replay_one_extent+0x4fc/0x4fc [btrfs]
[ 8247.779838] [<ffffffffa051e265>] open_ctree+0x1cc0/0x201a [btrfs]
[ 8247.781020] [<ffffffff81120f48>] ? register_shrinker+0x56/0x81
[ 8247.782285] [<ffffffffa04fb12c>] btrfs_mount+0x5f0/0x734 [btrfs]
(...)
[ 8247.793394] ---[ end trace 2461e55f92c2ac2e ]---
[ 8247.794276] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2771: errno=-5 IO failure
[ 8247.797335] BTRFS: error (device dm-0) in btrfs_replay_log:2375: errno=-5 IO failure (Failed to recover log tree)
Fixes: c6fc245499 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Acked-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
When we call btrfs_commit_transaction(), we splice the list "ordered"
of our transaction handle into the transaction's "pending_ordered"
list, but we don't re-initialize the "ordered" list of our transaction
handle, this means it still points to the same elements it used to
before the splice. Then we check if the current transaction's state is
>= TRANS_STATE_COMMIT_START and if it is we end up calling
btrfs_end_transaction() which simply splices again the "ordered" list
of our handle into the transaction's "pending_ordered" list, leaving
multiple pointers to the same ordered extents which results in list
corruption when we are iterating, removing and freeing ordered extents
at btrfs_wait_pending_ordered(), resulting in access to dangling
pointers / use-after-free issues.
Similarly, btrfs_end_transaction() can end up in some cases calling
btrfs_commit_transaction(), and both did a list splice of the transaction
handle's "ordered" list into the transaction's "pending_ordered" without
re-initializing the handle's "ordered" list, resulting in exactly the
same problem.
This produces the following warning on a kernel with linked list
debugging enabled:
[109749.265416] ------------[ cut here ]------------
[109749.266410] WARNING: CPU: 7 PID: 324 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
[109749.267969] list_del corruption. prev->next should be ffff8800ba087e20, but was fffffff8c1f7c35d
(...)
[109749.287505] Call Trace:
[109749.288135] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[109749.298080] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[109749.331605] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[109749.334849] [<ffffffff81260642>] ? __list_del_entry+0x5a/0x98
[109749.337093] [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
[109749.337847] [<ffffffff81260642>] __list_del_entry+0x5a/0x98
[109749.338678] [<ffffffffa053e8bf>] btrfs_wait_pending_ordered+0x46/0xdb [btrfs]
[109749.340145] [<ffffffffa058a65f>] ? __btrfs_run_delayed_items+0x149/0x163 [btrfs]
[109749.348313] [<ffffffffa054077d>] btrfs_commit_transaction+0x36b/0xa10 [btrfs]
[109749.349745] [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
[109749.350819] [<ffffffffa055370d>] btrfs_sync_file+0x36f/0x3fc [btrfs]
[109749.351976] [<ffffffff8118ec98>] vfs_fsync_range+0x8f/0x9e
[109749.360341] [<ffffffff8118ecc3>] vfs_fsync+0x1c/0x1e
[109749.368828] [<ffffffff8118ee1d>] do_fsync+0x34/0x4e
[109749.369790] [<ffffffff8118f045>] SyS_fsync+0x10/0x14
[109749.370925] [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
[109749.382274] ---[ end trace 48e0d07f7c03d95a ]---
On a non-debug kernel this leads to invalid memory accesses, causing a
crash. Fix this by using list_splice_init() instead of list_splice() in
btrfs_commit_transaction() and btrfs_end_transaction().
Cc: stable@vger.kernel.org
Fixes: 50d9aa99bd ("Btrfs: make sure logged extents complete in the current transaction V3"
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>