OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ivan Khoronzhuk	5e391dc5a8	net: ethernet: ti: cpsw: fix tx vlan priority mapping The CPDMA_TX_PRIORITY_MAP in real is vlan pcp field priority mapping register and basically replaces vlan pcp field for tagged packets. So, set it to be 1:1 mapping. Otherwise, it will cause unexpected change of egress vlan tagged packets, like prio 2 -> prio 5. Fixes: `e05107e6b7` ("net: ethernet: ti: cpsw: add multi queue support") Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:55:43 -04:00
Cong Wang	b905ef9ab9	llc: delete timers synchronously in llc_sk_free() The connection timers of an llc sock could be still flying after we delete them in llc_sk_free(), and even possibly after we free the sock. We could just wait synchronously here in case of troubles. Note, I leave other call paths as they are, since they may not have to wait, at least we can change them to synchronously when needed. Also, move the code to net/llc/llc_conn.c, which is apparently a better place. Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:55:03 -04:00
Guillaume Nault	5411b6187a	l2tp: fix {pppol2tp, l2tp_dfs}_seq_stop() in case of seq_file overflow Commit `0e0c3fee3a` ("l2tp: hold reference on tunnels printed in pppol2tp proc file") assumed that if pppol2tp_seq_stop() was called with non-NULL private data (the 'v' pointer), then pppol2tp_seq_start() would not be called again. It turns out that this isn't guaranteed, and overflowing the seq_file's buffer in pppol2tp_seq_show() is a way to get into this situation. Therefore, pppol2tp_seq_stop() needs to reset pd->tunnel, so that pppol2tp_seq_start() won't drop a reference again if it gets called. We also have to clear pd->session, because the rest of the code expects a non-NULL tunnel when pd->session is set. The l2tp_debugfs module has the same issue. Fix it in the same way. Fixes: `0e0c3fee3a` ("l2tp: hold reference on tunnels printed in pppol2tp proc file") Fixes: `f726214d9b` ("l2tp: hold reference on tunnels printed in l2tp/tunnels debugfs file") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:46:35 -04:00
David S. Miller	353697e6e6	Merge branch 's390-qeth-fixes' Julian Wiedmann says: ==================== s390/qeth: fixes 2018-04-19 Please apply the following qeth fixes for 4.17. The common theme seems to be error handling improvements in various areas of cmd IO. Patches 1-3 should also go back to stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:32 -04:00
Julian Wiedmann	b7493e91c1	s390/qeth: use Read device to query hypervisor for MAC For z/VM NICs, qeth needs to consider which of the three CCW devices in an MPC group it uses for requesting a managed MAC address. On the Base device, the hypervisor returns a default MAC which is pre-assigned when creating the NIC (this MAC is also returned by the READ MAC primitive). Querying any other device results in the allocation of an additional MAC address. For consistency with READ MAC and to avoid using up more addresses than necessary, it is preferable to use the NIC's default MAC. So switch the the diag26c over to using a NIC's Read device, which should always be identical to the Base device. Fixes: `ec61bd2fd2` ("s390/qeth: use diag26c to get MAC address on L2") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:32 -04:00
Julian Wiedmann	db71bbbd11	s390/qeth: fix request-side race during cmd IO timeout Submitting a cmd IO request (usually on the WRITE device, but for IDX also on the READ device) is currently done with ccw_device_start() and a manual timeout in the caller. On timeout, the caller cleans up the related resources (eg. IO buffer). But 1) the IO might still be active and utilize those resources, and 2) when the IO completes, qeth_irq() will attempt to clean up the same resources again. Instead of introducing additional resource locking, switch to ccw_device_start_timeout() to ensure IO termination after timeout, and let the IRQ handler alone deal with cleaning up after a request. This also removes a stray write->irq_pending reset from clear_ipacmd_list(). The routine doesn't terminate any pending IO on the WRITE device, so this should be handled properly via IO timeout in the IRQ handler. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:32 -04:00
Julian Wiedmann	bcacfcbc82	s390/qeth: fix MAC address update sequence When changing the MAC address on a L2 qeth device, current code first unregisters the old address, then registers the new one. If HW rejects the new address (or the IO fails), the device ends up with no operable address at all. Re-order the code flow so that the old address only gets dropped if the new address was registered successfully. While at it, add logic to catch some corner-cases. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:31 -04:00
Julian Wiedmann	a936b1ef37	s390/qeth: handle failure on workqueue creation Creating the global workqueue during driver init may fail, deal with it. Also, destroy the created workqueue on any subsequent error. Fixes: `0f54761d16` ("qeth: Support VEPA mode") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:31 -04:00
Julian Wiedmann	901e3f49fa	s390/qeth: avoid control IO completion stalls For control IO, qeth currently tracks the index of the buffer that it expects to complete the next IO on each qeth_channel. If the channel presents an IRQ while this buffer has not yet completed, no completion processing for _any_ completed buffer takes place. So if the 'next buffer' is skipped for any sort of reason* (eg. when it is released due to error conditions, before the IO is started), the buffer obviously won't switch to PROCESSED until it is eventually allocated for a _different_ IO and completes. Until this happens, all completion processing on that channel stalls and pending requests possibly time out. As a fix, remove the whole 'next buffer' logic and simply process any IO buffer right when it completes. A channel will never have more than one IO pending, so there's no risk of processing out-of-sequence. *Note: currently just one location in the code really handles this problem, by advancing the 'next' index manually. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:31 -04:00
Julian Wiedmann	686c97ee29	s390/qeth: fix error handling in adapter command callbacks Make sure to check both return code fields before(!) processing the command response. Otherwise we risk operating on invalid data. This matches an earlier fix for SETASSPARMS commands, see commit `ad3cbf6133` ("s390/qeth: fix error handling in checksum cmd callback"). Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-22 14:42:31 -04:00
Linus Torvalds	37a535edd7	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of fixes for x86: - Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which causes the possible CPU count to be wrong. - Prevent 32bit truncation in calc_hpet_ref() which causes the TSC calibration to fail - Fix the page table setup for temporary text mappings in the resume code which causes resume failures - Make the page table dump code handle HIGHPTE correctly instead of oopsing - Support for topologies where NUMA nodes share an LLC to prevent a invalid topology warning and further malfunction on such systems. - Remove the now unused pci-nommu code - Remove stale function declarations" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/power/64: Fix page-table setup for temporary text mapping x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y x86,sched: Allow topologies where NUMA nodes share an LLC x86/processor: Remove two unused function declarations x86/acpi: Prevent X2APIC id 0xffffffff from being accounted x86/tsc: Prevent 32bit truncation in calc_hpet_ref() x86: Remove pci-nommu.c	2018-04-22 11:40:52 -07:00
Linus Torvalds	c1e9dae0a9	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of timer fixes: - Evaluate the -ETIME condition correctly in the imx tpm driver - Fix the evaluation order of a condition in posix cpu timers - Use pr_cont() in the clockevents code to prevent ugly message splitting - Remove __current_kernel_time() which is now unused to prevent that new users show up. - Remove a stale forward declaration" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/imx-tpm: Correct -ETIME return condition check posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated timekeeping: Remove __current_kernel_time() timers: Remove stale struct tvec_base forward declaration clockevents: Fix kernel messages split across multiple lines	2018-04-22 10:49:02 -07:00
Linus Torvalds	38f0b33e6d	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A larger set of updates for perf. Kernel: - Handle the SBOX uncore monitoring correctly on Broadwell CPUs which do not have SBOX. - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The percentage of preempting and non-preempting context switches help understanding the nature of workloads (CPU or IO bound) that are running on a machine. This adds the kernel facility and userspace changes needed to show this information in 'perf script' and 'perf report -D' (Alexey Budankov) - Remove a WARN_ON() in the trace/kprobes code which is pointless because the return error code is already telling the caller what's wrong. - Revert a fugly workaround for clang BPF targets. - Fix sample_max_stack maximum check and do not proceed when an error has been detect, return them to avoid misidentifying errors (Jiri Olsa) - Add SPDX idenitifiers and get rid of GPL boilderplate. Tools: - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar) - Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/ copies (Arnaldo Carvalho de Melo) - Add '\n' at the end of parse-options error messages (Ravi Bangoria) - Add s390 support for detailed/verbose PMU event description (Thomas Richter) - perf annotate fixes and improvements: * Allow showing offsets in more than just jump targets, use the new 'O' hotkey in the TUI, config ~/.perfconfig annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho de Melo) * Use the resolved variable names from objdump disassembled lines to make them more compact, just like was already done for some instructions, like "mov", this eventually will be done more generally, but lets now add some more to the existing mechanism (Arnaldo Carvalho de Melo) - perf record fixes: * Change warning for missing topology sysfs entry to debug, as not all architectures have those files, s390 being one of those (Thomas Richter) * Remove old error messages about things that unlikely to be the root cause in modern systems (Andi Kleen) - perf sched fixes: * Fix -g/--call-graph documentation (Takuya Yamamoto) - perf stat: * Enable 1ms interval for printing event counters values in (Alexey Budankov) - perf test fixes: * Run dwarf unwind on arm32 (Kim Phillips) * Remove unused ptrace.h include from LLVM test, sidesteping older clang's lack of support for some asm constructs (Arnaldo Carvalho de Melo) * Fixup BPF test using epoll_pwait syscall function probe, to cope with the syscall routines renames performed in this development cycle (Arnaldo Carvalho de Melo) - perf version fixes: * Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version --build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as libaudit won't be used in that case, print info about syscall_table support instead (Jin Yao) - Build system fixes: * Use HAVE_..._SUPPORT used consistently (Jin Yao) * Restore READ_ONCE() C++ compatibility in tools/include (Mark Rutland) * Give hints about package names needed to build jvmti (Arnaldo Carvalho de Melo)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server" coresight: Move to SPDX identifier perf test BPF: Fixup BPF test using epoll_pwait syscall function probe perf tests mmap: Show which tracepoint is failing perf tools: Add '\n' at the end of parse-options error messages perf record: Remove suggestion to enable APIC perf record: Remove misleading error suggestion perf hists browser: Clarify top/report browser help perf mem: Allow all record/report options perf trace: Support MAP_FIXED_NOREPLACE perf: Remove superfluous allocation error check perf: Fix sample_max_stack maximum check perf: Return proper values for user stack errors perf list: Add s390 support for detailed/verbose PMU event description perf script: Extend misc field decoding with switch out event type perf report: Extend raw dump (-D) out with switch out event type perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE] tools/headers: Synchronize kernel ABI headers, v4.17-rc1 trace_kprobe: Remove warning message "Could not insert probe at..." ...	2018-04-22 10:17:01 -07:00
Linus Torvalds	18de45a925	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Thomas Gleixner: "A single fix for objtool so it uses the host C and LD flags and not the target ones" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Support HOSTCFLAGS and HOSTLDFLAGS	2018-04-22 09:48:13 -07:00
Linus Torvalds	285848b0f4	Fix some bugs in the /dev/random driver which causes getrandom(2) to unblock earlier than designed. Thanks to Jann Horn from Google's Project Zero for pointing this out to me. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlrcCWAACgkQ8vlZVpUN gaOedwf/e1OU7CXMiinCcGfsr5XydZrEivaqS9QmqAKsLzJSNDDu1Jw6N9cbWagp OEIIRZdaFPImZHosEbjOW12Z3nxnlDC8jtOLyLIRGSA2u4RXd03RupHhQW4cE7ys EOljEvK5KFDIlPa947R5/k4CzC4O3PGf1GdWhHmkENOgd23GqI/yOTKQq5Z5ZgAp rZzcXiuCSq1QkLME7ElxoOLQhs+fYiVGoAM/maxLa+2g4M1Y/YlHBDGhG4RB4lLA 3zugbyJ15tNfgNuRvCB4x304WkCp5VDlcsCiKq18LFcrkz1SYGj5LwG/bswDqgkS 0mOtZKu68NhutX8Pcy4vY3iOmMa1/Q== =RhHb -----END PGP SIGNATURE----- Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull /dev/random fixes from Ted Ts'o: "Fix some bugs in the /dev/random driver which causes getrandom(2) to unblock earlier than designed. Thanks to Jann Horn from Google's Project Zero for pointing this out to me" * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: add new ioctl RNDRESEEDCRNG random: crng_reseed() should lock the crng instance that it is modifying random: set up the NUMA crng instances after the CRNG is fully initialized random: use a different mixing algorithm for add_device_randomness() random: fix crng_ready() test	2018-04-21 21:20:48 -07:00
Linus Torvalds	4c50ceae8f	Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A regression fix, new unit test infrastructure and a build fix: - Regression fix addressing support for the new NVDIMM label storage area access commands (_LSI, _LSR, and _LSW). The Intel specific version of these commands communicated the "Device Locked" status on the label-storage-information command. However, these new commands (standardized in ACPI 6.2) communicate the "Device Locked" status on the label-storage-read command, and the driver was missing the indication. Reading from locked persistent memory is similar to reading unmapped PCI memory space, returns all 1's. - Unit test infrastructure is added to regression test the "Device Locked" detection failure. - A build fix is included to allow the "of_pmem" driver to be built as a module and translate an Open Firmware described device to its local numa node" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: MAINTAINERS: Add backup maintainers for libnvdimm and DAX device-dax: allow MAP_SYNC to succeed Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error" libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid() tools/testing/nvdimm: enable labels for nfit_test.1 dimms tools/testing/nvdimm: fix missing newline in nfit_test_dimm 'handle' attribute tools/testing/nvdimm: support nfit_test_dimm attributes under nfit_test.1 tools/testing/nvdimm: allow custom error code injection libnvdimm, dimm: handle EACCES failures from label reads	2018-04-21 21:11:05 -07:00
Linus Torvalds	5e7c780611	sound fixes for 4.17-rc2 A few small fixes: - A fix for the NULL-dereference in rawmidi compat ioctls, triggered by fuzzer - HD-audio Realtek codec quirks, a VIA controller fixup - A long-standing bug fix in LINE6 MIDI -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlrZq6YOHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE+jhRAA0cUbR2PWgeFURaw2jntFSUafNxVefuquR+MD /Y43swY+q5ufKKtvEKG3u2u9R5gp/u/Oo95P5uDGg6FwftNJQphi6xEYNkaZAPH2 4maEZjWCerNDOgpuhczBSzW7CmQa4nQoKVj/kjtoTHuhMBtsNJyIZ3PyGdYi/2Ya pGOSoivGk+S/PFDxYmZobMxkTJ85FnPyH9vE30kVQaqMXjpZ/ZH5KpN2aTPa0UiD K1CcgoDZoJqWm0jp621X3O396y3Pby9qkUoeQM0mzcypufbDq91mik/ZfJXAOUDy LKsol+luKerXmqRbKamP2C0lfSJiMntedd4QowUTMbJ6oWtyLP9gQ7QXsAKqMmZ4 rtPlLSD7qMTlYCcqFHXnMHwicUjiL8SRF969FmH2iULuyWUcvuqCxw+or3x5H8hb aJzftV1pRnqNBFeAKhfFUxWLO5s8q+e8PMRya6UNcJw9ZDQr3wu2Gd8hW8d4XADu KusEuZZNMkxasdirs980OKFvCxR+kLgqmpzbTAyqgDVWyU4QAfBcjyKEXLVW53oY LkEhwe9i/hEpW/lObkJOqrlpG/Xv/yoiD0Bydlr6fW5GV16jtyXLsGVQ5+jhZa+E Db6WHu972c7AICL0hYT7bBx62pLfVjgfD7PrIkREzrHZnKeoheSRDveg6ISHzJd7 ZhGWfoI= =fb42 -----END PGP SIGNATURE----- Merge tag 'sound-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A few small fixes: - a fix for the NULL-dereference in rawmidi compat ioctls, triggered by fuzzer - HD-audio Realtek codec quirks, a VIA controller fixup - a long-standing bug fix in LINE6 MIDI" * tag 'sound-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: rawmidi: Fix missing input substream checks in compat ioctls ALSA: hda/realtek - adjust the location of one mic ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags ALSA: hda - New VIA controller suppor no-snoop path ALSA: line6: Use correct endpoint type for midi output	2018-04-21 10:32:16 -07:00
Linus Torvalds	e46096b6a3	linux-watchdog 4.17-rc2 tag -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEABECAAYFAlrUXbEACgkQ+iyteGJfRsoSSgCglHHxe51pHR+p395z2KdBEJAm 5CEAn0SNfw6PZw6F+P19yh4WBmT2QO6P =bKNy -----END PGP SIGNATURE----- Merge tag 'linux-watchdog-4.17-rc2' of git://www.linux-watchdog.org/linux-watchdog Pull watchdog fixes from Wim Van Sebroeck: - fall-through fixes - MAINTAINER change for hpwdt - renesas-wdt: Add support for WDIOF_CARDRESET - aspeed: set bootstatus during probe * tag 'linux-watchdog-4.17-rc2' of git://www.linux-watchdog.org/linux-watchdog: aspeed: watchdog: Set bootstatus during probe watchdog: renesas-wdt: Add support for WDIOF_CARDRESET watchdog: wafer5823wdt: Mark expected switch fall-through watchdog: w83977f_wdt: Mark expected switch fall-through watchdog: sch311x_wdt: Mark expected switch fall-through watchdog: hpwdt: change maintainer.	2018-04-21 10:28:15 -07:00
Linus Torvalds	6488ec2633	linux-kselftest-4.17-rc2 This Kselftest update for 4.17-rc2 consists of a fix from Michael Ellerman to not run dnotify_test by default to prevent Kselftest running forever. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJa2knEAAoJEAsCRMQNDUMc1iYP/R8TFK16Q5v1bh500oGnPc3n wvjpTn3iJ8DohMbWCSJONbMbKQiY04F07iH4jRZJAHeL0/Eet+/rOgUJsy+9nF5n GwI8E1xiZu23jhfP/oUJAYdeJwsAtytC6CiBk6iHNfk1m7KBpBdjIMkkQw1dJ1UX dJ9fnNDahnjsZFdSQLVsjS+lwFUVUGOuYCEOwY4ys9ezqTx2jEqTBhOSrvqgt8Cv 6rXYEhGFT/TYcai/orJ7f+YE5wyxx/U/bsBFnJC8xGWGi5iB2dk5jiA0vGhPHjTh cQ+ovELjt4Izu+nvAEBF34U/BAdyqqOGjs4vZQbCbgMBsWeDnO/J4vFUvxv2fCkr A6Xsb8rsKS5IYzyaD8jcv7o160l4MXHNX90raH4p8KfAFhTkOmTDEElmh/iI0cEO aFTjUoO8POhE9vMhfZS2zQUByBKHUzfFGGM1D/eoN9g2+ZSFajejiuqcG34J53GT tkktDyJk07NclxTNOoSdjEIuc8tI7vErqywDwsMW2Tb5GRxonGwXao0alQx7QlfY PKfFVy0qLFpCBR5hA2Ig152NstyNElnyFimjbjwZaanKBhCUqEl74z4ixj61wbot rib1VVijhpSvoJGX3LDTnHwH9jyYCsoNJKvu8RqH/MptqnIsH8IKNJRfopmn4aHG vxON0bW0XowtXqUFMas4 =okhu -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fix from Shuah Khan: "A fix from Michael Ellerman to not run dnotify_test by default to prevent Kselftest running forever" * tag 'linux-kselftest-4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/filesystems: Don't run dnotify_test by default	2018-04-21 10:26:00 -07:00
Linus Torvalds	9409227ab2	arm64 fixes: - KASan: avoid pfn_to_nid() before the page array is initialised - Fix typo causing the "upgrade" of known signals to SIGKILL -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlraKeAACgkQa9axLQDI XvFArhAAokKnCLd02Fc2CrFptWhzTUdhP2+F49qK68CXhq1gfNRo50/XPjmYTTY2 j8CEmoNgpXJQAp9kUaP5Cj81ltLP/4pzkqVidqDtFzBq7IPAVTz7rdsmhUPEuslQ 2LGHOTm2vLTGPjDYbD51ruXdclxJUy3iJLUAmrK+u9xu6VLlWtf3ERDWq/AxSi7J Ge9V9RPEJ3UqEiJGDJQYbPhFW0rRdNrLSZpLruqjvG+uXfP3t5gIrZZen+3pXl2b VGINk/yQLO0L8GyHkUrJ8wV5lT7nvKY7xjbgg2peuIMugkMwwL3rQIhjUjWLVQ6E rd4vwioDVe8w8dFf4BvQvexe3AyYyVG8j3URy6wcW+eAtj9egiuNLPrn0c6sIiqo Bk9shCZRG0k41D/1L8TsMQjGJGOFuExqYRePA6hqo7nc4z/Q4ghf2f8X10rligI/ 20C6tFngjQWvzPdLFiC+5GKGg88aR3FMHfrXcphpY3dQAOpN3BCTwOphRkn7+iuY 5KMQBx4Yd+e+7eHUBm6YefvDz8hiKG7VClvAPj/7T9w/AtnCGcnxUeFoKzKGLdwg dJC570frzKydLP1Y/ucYKdwz46BZs5VwRlUCc7glFIJyj0ri6bMPdwU+pVOO2lak 2gRZeVoxN45Gow1uHjfTM/DJa8uFwBWF1RCqRY2+GPmd7Wir4XA= =pLK1 -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - kasan: avoid pfn_to_nid() before the page array is initialised - Fix typo causing the "upgrade" of known signals to SIGKILL * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: signal: don't force known signals to SIGKILL arm64: kasan: avoid pfn_to_nid() before page array is initialized	2018-04-21 10:20:50 -07:00
Linus Torvalds	7a752478ef	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: - "fork: unconditionally clear stack on fork" is a non-bugfix which got lost during the merge window - performance concerns appear to have been adequately addressed. - and a bunch of fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/filemap.c: fix NULL pointer in page_cache_tree_insert() mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create() fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error kexec_file: do not add extra alignment to efi memmap proc: fix /proc/loadavg regression proc: revalidate kernel thread inodes to root:root autofs: mount point create should honour passed in mode MAINTAINERS: add personal addresses for Sascha and Uwe kasan: add no_sanitize attribute for clang builds rapidio: fix rio_dma_transfer error handling mm: enable thp migration for shmem thp writeback: safer lock nesting mm, pagemap: fix swap offset value for PMD migration entry mm: fix do_pages_move status handling fork: unconditionally clear stack on fork	2018-04-21 08:15:16 -07:00
Ingo Molnar	c042f7e9bb	perf/urgent fixes and improvements: - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The percentage of preempting and non-preempting context switches help understanding the nature of workloads (CPU or IO bound) that are running on a machine. This adds the kernel facility and userspace changes needed to show this information in 'perf script' and 'perf report -D' (Alexey Budankov) - Remove old error messages about things that unlikely to be the root cause in modern systems (Andi Kleen) - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar) - Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/ copies (Arnaldo Carvalho de Melo) - Fixup BPF test using epoll_pwait syscall function probe, to cope with the syscall routines renames performed in this development cycle (Arnaldo Carvalho de Melo) - Fix sample_max_stack maximum check and do not proceed when an error has been detect, return them to avoid misidentifying errors (Jiri Olsa) - Add '\n' at the end of parse-options error messages (Ravi Bangoria) - Add s390 support for detailed/verbose PMU event description (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlrZ+JQACgkQ1lAW81NS qkADwQ/9HVJr9inFOL4FVOWAVDweKSnp2wbKQdIUxRkR7LJjHfXfWBpJ8snGROKY sS95yNy0Q4xejVbvPkf9P2Mk6eiZw5oQQJwwhAYMD0Ae0Jz8CQ9rCt9ABHy9yOre tl76fF19U8TU0yk80fE+Mp0Z6Ssw6RrC9rkBAAm4LZgLcEuVCIRxlPT1LXDFO2xz 8Tk9UHeNNTb/sQRCme6bnbqBZsitd3K/qN2iWeAX/v10LMJaMcT9Xy1aL/K/RZzl H/4ehioaqtio1h5RlconcY2PjzhG3tW9469UIBkudXvtM1DDeZ1ddl4HP0C3xMlh /Ypo0vEsaGPRTal7l5fudnMfnRQ9Xbmcy74WylAFvDDpaegcipLbBY4qUziyELRv lRb2Vll9gHhsnuyfvGlIi6Zrwm3jCOe856ZhJYj0HBE+gwvGXrgg9leK+0LnjSrC /hZKFhiQp4WuBxaduqAHu9fhU0jtLN0TaqmpoaqxMqBcUPKFZkyHxXnV9HODVxuX NmF0WrcRxy3NAT0teLtAvpfei6s2wH3WyVC/zWr0N+u6mRR+kJvR9ibmwgShqVI9 sEKz0r6GhKvN1Bwe2fQ+OWUJ3mfjAO4w2Hfdls2a+I8lRRNv+uW7ZQS0aFSz8stH frHCV2HrUDKNLEfNikxu1ECdK5ErQdJgXMNJ8/2+TlOUufNfVqc= =WABo -----END PGP SIGNATURE----- Merge tag 'perf-urgent-for-mingo-4.17-20180420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo: - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The percentage of preempting and non-preempting context switches help understanding the nature of workloads (CPU or IO bound) that are running on a machine. This adds the kernel facility and userspace changes needed to show this information in 'perf script' and 'perf report -D' (Alexey Budankov) - Remove old error messages about things that unlikely to be the root cause in modern systems (Andi Kleen) - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar) - Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/ copies (Arnaldo Carvalho de Melo) - Fixup BPF test using epoll_pwait syscall function probe, to cope with the syscall routines renames performed in this development cycle (Arnaldo Carvalho de Melo) - Fix sample_max_stack maximum check and do not proceed when an error has been detect, return them to avoid misidentifying errors (Jiri Olsa) - Add '\n' at the end of parse-options error messages (Ravi Bangoria) - Add s390 support for detailed/verbose PMU event description (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-21 09:38:33 +02:00
Matthew Wilcox	abc1be13fd	mm/filemap.c: fix NULL pointer in page_cache_tree_insert() f2fs specifies the __GFP_ZERO flag for allocating some of its pages. Unfortunately, the page cache also uses the mapping's GFP flags for allocating radix tree nodes. It always masked off the __GFP_HIGHMEM flag, and masks off __GFP_ZERO in some paths, but not all. That causes radix tree nodes to be allocated with a NULL list_head, which causes backtraces like: __list_del_entry+0x30/0xd0 list_lru_del+0xac/0x1ac page_cache_tree_insert+0xd8/0x110 The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through if they are ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain. Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org Fixes: `449dd6984d` ("mm: keep page cache radix tree nodes in check") Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Chris Fries <cfries@google.com> Debugged-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:36 -07:00
Minchan Kim	c892fd82cc	mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create() If there is heavy memory pressure, page allocation with __GFP_NOWAIT fails easily although it's order-0 request. I got below warning 9 times for normal boot. <snip >: page allocation failure: order:0, mode:0x2200000(GFP_NOWAIT\|__GFP_NOTRACK) .. snip .. Call trace: dump_backtrace+0x0/0x4 dump_stack+0xa4/0xc0 warn_alloc+0xd4/0x15c __alloc_pages_nodemask+0xf88/0x10fc alloc_slab_page+0x40/0x18c new_slab+0x2b8/0x2e0 ___slab_alloc+0x25c/0x464 __kmalloc+0x394/0x498 memcg_kmem_get_cache+0x114/0x2b8 kmem_cache_alloc+0x98/0x3e8 mmap_region+0x3bc/0x8c0 do_mmap+0x40c/0x43c vm_mmap_pgoff+0x15c/0x1e4 sys_mmap+0xb0/0xc8 el0_svc_naked+0x24/0x28 Mem-Info: active_anon:17124 inactive_anon:193 isolated_anon:0 active_file:7898 inactive_file:712955 isolated_file:55 unevictable:0 dirty:27 writeback:18 unstable:0 slab_reclaimable:12250 slab_unreclaimable:23334 mapped:19310 shmem:212 pagetables:816 bounce:0 free:36561 free_pcp:1205 free_cma:35615 Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB lowmem_reserve[]: 0 1842 1842 Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB lowmem_reserve[]: 0 0 0 DMA: 04kB 08kB 116kB (C) 032kB 064kB 0128kB 1256kB (C) 1512kB (C) 01024kB 12048kB (C) 344096kB (C) = 142096kB Normal: 2284kB (UMEH) 1728kB (UMH) 2316kB (UH) 2432kB (H) 564kB (H) 1128kB (H) 0256kB 0512kB 01024kB 02048kB 04096kB = 3872kB 721350 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 945512 pages RAM 0 pages HighMem/MovableOnly 63408 pages reserved 51200 pages cma reserved __memcg_schedule_kmem_cache_create() tries to create a shadow slab cache and the worker allocation failure is not really critical because we will retry on the next kmem charge. We might miss some charges but that shouldn't be critical. The excessive allocation failure report is not very helpful. [mhocko@kernel.org: changelog update] Link: http://lkml.kernel.org/r/20180418022912.248417-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:36 -07:00
Tetsuo Handa	d23a61ee90	fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error Commit `4ed2863951` ("fs, elf: drop MAP_FIXED usage from elf_map") is printing spurious messages under memory pressure due to map_addr == -ENOMEM. 9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already 14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already 16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already Complain only if -EEXIST, and use %px for printing the address. Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp Fixes: `4ed2863951` ("fs, elf: drop MAP_FIXED usage from elf_map") is Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:36 -07:00
Dave Young	a841aa83df	kexec_file: do not add extra alignment to efi memmap Chun-Yi reported a kernel warning message below: WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c() early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120 The problem is x86 kexec_file_load adds extra alignment to the efi memmap: in bzImage64_load(): efi_map_sz = efi_get_runtime_map_size(); efi_map_sz = ALIGN(efi_map_sz, 16); And __efi_memmap_init maps with the size including the alignment bytes but efi_memmap_unmap use nr_maps * desc_size which does not include the extra bytes. The alignment in kexec code is only needed for the kexec buffer internal use Actually kexec should pass exact size of the efi memmap to 2nd kernel. Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com Signed-off-by: Dave Young <dyoung@redhat.com> Reported-by: joeyli <jlee@suse.com> Tested-by: Randy Wright <rwright@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:36 -07:00
Alexey Dobriyan	9a1015b32f	proc: fix /proc/loadavg regression Commit `95846ecf9d` ("pid: replace pid bitmap implementation with IDR API") changed last field of /proc/loadavg (last pid allocated) to be off by one: # unshare -p -f --mount-proc cat /proc/loadavg 0.00 0.00 0.00 1/60 2 <=== It should be 1 after first fork into pid namespace. This is formally a regression but given how useless this field is I don't think anyone is affected. Bug was found by /proc testsuite! Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2 Fixes: `95846ecf9d` ("pid: replace pid bitmap implementation with IDR API") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:36 -07:00
Alexey Dobriyan	2e0ad552f5	proc: revalidate kernel thread inodes to root:root task_dump_owner() has the following code: mm = task->mm; if (mm) { if (get_dumpable(mm) != SUID_DUMP_USER) { uid = ... } } Check for ->mm is buggy -- kernel thread might be borrowing mm and inode will go to some random uid:gid pair. Link: http://lkml.kernel.org/r/20180412220109.GA20978@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Ian Kent	1e6306652b	autofs: mount point create should honour passed in mode The autofs file system mkdir inode operation blindly sets the created directory mode to S_IFDIR \| 0555, ingoring the passed in mode, which can cause selinux dac_override denials. But the function also checks if the caller is the daemon (as no-one else should be able to do anything here) so there's no point in not honouring the passed in mode, allowing the daemon to set appropriate mode when required. Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Uwe Kleine-König	1551cf740c	MAINTAINERS: add personal addresses for Sascha and Uwe The idea behind using kernel@pengutronix.de (i.e. the mail alias for the kernel people at Pengutronix) as email address was to have a backup when a given developer is on vacation or run over by a bus. Make this more explicit by adding the alias as reviewer and use the personal address for Sascha and me. Link: http://lkml.kernel.org/r/20180413083312.11213-1-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Andrey Konovalov	12c8f25a01	kasan: add no_sanitize attribute for clang builds KASAN uses the __no_sanitize_address macro to disable instrumentation of particular functions. Right now it's defined only for GCC build, which causes false positives when clang is used. This patch adds a definition for clang. Note, that clang's revision 329612 or higher is required. [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check] Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Lawrence <paullawrence@google.com> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Ioan Nicu	c5157b7686	rapidio: fix rio_dma_transfer error handling Some of the mport_dma_req structure members were initialized late inside the do_dma_request() function, just before submitting the request to the dma engine. But we have some error branches before that. In case of such an error, the code would return on the error path and trigger the calling of dma_req_free() with a req structure which is not completely initialized. This causes a NULL pointer dereference in dma_req_free(). This patch fixes these error branches by making sure that all necessary mport_dma_req structure members are initialized in rio_dma_transfer() immediately after the request structure gets allocated. Link: http://lkml.kernel.org/r/20180412150605.GA31409@nokia.com Fixes: `bbd876adb8` ("rapidio: use a reference count for struct mport_dma_req") Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Acked-by: Alexandre Bounine <alex.bou9@gmail.com> Cc: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Frank Kunz <frank.kunz@nokia.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Naoya Horiguchi	e71769ae52	mm: enable thp migration for shmem thp My testing for the latest kernel supporting thp migration showed an infinite loop in offlining the memory block that is filled with shmem thps. We can get out of the loop with a signal, but kernel should return with failure in this case. What happens in the loop is that scan_movable_pages() repeats returning the same pfn without any progress. That's because page migration always fails for shmem thps. In memory offline code, memory blocks containing unmovable pages should be prevented from being offline targets by has_unmovable_pages() inside start_isolate_page_range(). So it's possible to change migratability for non-anonymous thps to avoid the issue, but it introduces more complex and thp-specific handling in migration code, so it might not good. So this patch is suggesting to fix the issue by enabling thp migration for shmem thp. Both of anon/shmem thp are migratable so we don't need precheck about the type of thps. Link: http://lkml.kernel.org/r/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp Fixes: commit `72b39cfc4d` ("mm, memory_hotplug: do not fail offlining too early") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zi Yan <zi.yan@sent.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Greg Thelen	2e898e4c0a	writeback: safer lock nesting lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set. unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain. This existing pattern is thus suspicious: lock_page_memcg(page); unlocked_inode_to_wb_begin(inode, &locked); ... unlocked_inode_to_wb_end(inode, locked); unlock_page_memcg(page); If both inode switch and process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg(). truncate __cancel_dirty_page lock_page_memcg unlocked_inode_to_wb_begin unlocked_inode_to_wb_end <interrupts mistakenly enabled> <interrupt> end_page_writeback test_clear_page_writeback lock_page_memcg <deadlock> unlock_page_memcg Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature). If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute: cd /mnt/cgroup/memory mkdir a b echo 1 > a/memory.move_charge_at_immigrate echo 1 > b/memory.move_charge_at_immigrate ( echo $BASHPID > a/cgroup.procs while true; do dd if=/dev/zero of=/mnt/big bs=1M count=256 done ) & while true; do sync done & sleep 1h & SLEEP=$! while true; do echo $SLEEP > a/cgroup.procs echo $SLEEP > b/cgroup.procs done The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should to prevent future surprises. And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable. Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting" https://lkml.org/lkml/2018/4/11/146 Wang Long said "this deadlock occurs three times in our environment" [gthelen@google.com: v4] Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com [akpm@linux-foundation.org: comment tweaks, struct initialization simplification] Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613 Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com Fixes: `682aa8e1a6` ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <gthelen@google.com> Reported-by: Wang Long <wanglong19@meituan.com> Acked-by: Wang Long <wanglong19@meituan.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> [v4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Huang Ying	88c28f2469	mm, pagemap: fix swap offset value for PMD migration entry The swap offset reported by /proc/<pid>/pagemap may be not correct for PMD migration entries. If addr passed into pagemap_pmd_range() isn't aligned with PMD start address, the swap offset reported doesn't reflect this. And in the loop to report information of each sub-page, the swap offset isn't increased accordingly as that for PFN. This may happen after opening /proc/<pid>/pagemap and seeking to a page whose address doesn't align with a PMD start address. I have verified this with a simple test program. BTW: migration swap entries have PFN information, do we need to restrict whether to show them? [akpm@linux-foundation.org: fix typo, per Huang, Ying] Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Jerome Glisse" <jglisse@redhat.com> Cc: Daniel Colascione <dancol@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Michal Hocko	8f175cf5c9	mm: fix do_pages_move status handling Li Wang has reported that LTP move_pages04 test fails with the current tree: LTP move_pages04: TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT The test allocates an array of two pages, one is present while the other is not (resp. backed by zero page) and it expects EFAULT for the second page as the man page suggests. We are reporting EPERM which doesn't make any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter THP migration"). do_pages_move tries to handle as many pages in one batch as possible so we queue all pages with the same node target together and that corresponds to [start, i] range which is then used to update status array. add_page_for_migration will correctly notice the zero (resp. !present) page and returns with EFAULT which gets written to the status. But if this is the last page in the array we do not update start and so the last store_status after the loop will overwrite the range of the last batch with NUMA_NO_NODE (which corresponds to EPERM). Fix this by simply bailing out from the last flush if the pagelist is empty as there is clearly nothing more to do. Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org Fixes: cf5f16b23ec9 ("mm: unclutter THP migration") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Kees Cook	e01e80634e	fork: unconditionally clear stack on fork One of the classes of kernel stack content leaks[1] is exposing the contents of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace. Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit. Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54 and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46 Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default. [1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak I did some more with perf and cycle counts on running 100,000 execs of /bin/true. before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47 after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99 It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas! Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 17:18:35 -07:00
Jann Horn	6ab690aa43	bpf: sockmap remove dead check Remove dead code that bails on `attr->value_size > KMALLOC_MAX_SIZE` - the previous check already bails on `attr->value_size != 4`. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-20 22:09:51 +02:00
Aurelien Aptel	596632de04	CIFS: fix typo in cifs_dbg Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Reported-by: Long Li <longli@microsoft.com>	2018-04-20 13:39:10 -05:00
Steve French	1d0cffa674	cifs: do not allow creating sockets except with SMB1 posix exensions RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com> CC: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org	2018-04-20 13:31:32 -05:00
Jakub Kicinski	0cf22d6b31	PCI: Add "PCIe" to pcie_print_link_status() messages Currently the pcie_print_link_status() will print PCIe bandwidth and link width information but does not mention it is pertaining to the PCIe. Since this and related functions are used exclusively by networking drivers today users may get confused into thinking that it's the NIC bandwidth that is being talked about. Insert a "PCIe" into the messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2018-04-20 12:56:36 -05:00
Linus Torvalds	83beed7b2b	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal Pull thermal fixes from Eduardo Valentin: "A couple of fixes for the thermal subsystem" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: dt-bindings: thermal: Remove "cooling-{min\|max}-level" properties dt-bindings: thermal: remove no longer needed samsung thermal properties	2018-04-20 10:56:32 -07:00
Bjorn Helgaas	dd70e51ee0	Merge remote-tracking branch 'lorenzo/pci/host/fixes' into for-linus - correct GPIO name to fix HiSilicon Kirin probing issue (Loic Poulain) - fix MRRS setting in Marvell Armada Aardvark (Evan Wang) - fix interrupts by using ISR1 on Marvell Armada Aardvark (Victor Gu) - fix config accesses on Marvell Armada Aardvark (Victor Gu) * lorenzo/pci/host/fixes: PCI: kirin: Fix reset gpio name PCI: aardvark: Fix PCIe Max Read Request Size setting PCI: aardvark: Use ISR1 instead of ISR0 interrupt in legacy irq mode PCI: aardvark: Set PIO_ADDR_LS correctly in advk_pcie_rd_conf() PCI: aardvark: Fix logic in advk_pcie_{rd,wr}_conf()	2018-04-20 12:55:32 -05:00
Linus Torvalds	7e3cb169d3	MMC host: - sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode - renesas_sdhi_internal_dmac: Avoid data corruption by limiting DMA RX -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJa2YzyAAoJEP4mhCVzWIwphfoQAMAVReir4qCfLrdNsMCsbBg/ GIXJajgOYbqX1iwMEjhHAwSxIpk9i2YM/OatcCe1Kru3h0+iuEqs4XhzwbD1w/WP BoSHKUdjocvIzUdD2N1zUf6V/1Qf74xAT9yEqWI0C8A1fhr52BbvaUgxJ9FxbjcD OTEv2CBcyoR9lP0t32K6PzHe/uDsTGwMHa2cXit/ugN7OSM62m9Jy8lm8lcijZsA LV9YMExoNr5T+kVijjRjzZ68z0cf3MtaMWCggO9J3yIKmkGTUYyaj6KgJWfOA+Ey hiObIW61Rdf6npk1ne6wXlzVW1SAxMFhjQYn7zIzPs6iiqYgrNfziL1chQ130Rpl 3LE9Lfb3PIT7lHvAJQQodCYQQddh3MNf5wgnemMY5Cnbnaj50KrX+2PqzOGZsdcr hGe4niq1pNTsgyunkkU2dG9poQjeqOS544S9/PH6VXqXhfrRvgKwJY4QQSEEykDq 3cGArh2YxfIhr/SkNDZsBegmMwW8rm00sKudm810d3q3wS7TN341qSx+no71Ryz7 qpfCZ+VFfus2irnzyV5FViPH+cZ45LUAC6NN9fcksmvEGlMwvX2yYKttSKtxi7ia CRyf5R1rn4DJ0qgIfv+j3pc5KpBhu9EU9QFd2Zi9STs0VUKe48Jmnozck/6pvQp+ 2q1d7X9S+97NJJ04CzbU =jO/Y -----END PGP SIGNATURE----- Merge tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "A couple of MMC host fixes: - sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode - renesas_sdhi_internal_dmac: Avoid data corruption by limiting DMA RX" * tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs mmc: sdhci-pci: Only do AMD tuning for HS200	2018-04-20 10:41:31 -07:00
Linus Torvalds	7768ee3f45	Merge tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: "Three small fixes for MD: - md-cluster fix for faulty device from Guoqing - writehint fix for writebehind IO for raid1 from Mariusz - a live lock fix for interrupted recovery from Yufen" * tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid1: copy write hint from master bio to behind bio md/raid1: exit sync request if MD_RECOVERY_INTR is set md-cluster: don't update recovery_offset for faulty device	2018-04-20 10:39:44 -07:00
Long Li	ff30b89e0a	cifs: smbd: Dump SMB packet when configured When sending through SMB Direct, also dump the packet in SMB send path. Also fixed a typo in debug message. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>	2018-04-20 12:18:28 -05:00
Qu Wenruo	c087232374	btrfs: print-tree: debugging output enhancement This patch enhances the following things: - tree block header * add generation and owner output for node and leaf - node pointer generation output - allow btrfs_print_tree() to not follow nodes * just like btrfs-progs Please note that, although function btrfs_print_tree() is not called by anyone right now, it's still a pretty useful function to debug kernel. So that function is still kept for later use. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-04-20 19:18:16 +02:00
Nikolay Borisov	5e388e9581	btrfs: Fix race condition between delayed refs and blockgroup removal When the delayed refs for a head are all run, eventually cleanup_ref_head is called which (in case of deletion) obtains a reference for the relevant btrfs_space_info struct by querying the bg for the range. This is problematic because when the last extent of a bg is deleted a race window emerges between removal of that bg and the subsequent invocation of cleanup_ref_head. This can result in cache being null and either a null pointer dereference or assertion failure. task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000 RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs] RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292 RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8 RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001 R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0 R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0 FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs] btrfs_run_delayed_refs+0x68/0x250 [btrfs] btrfs_should_end_transaction+0x42/0x60 [btrfs] btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs] btrfs_evict_inode+0x4c6/0x5c0 [btrfs] evict+0xc6/0x190 do_unlinkat+0x19c/0x300 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fbf589c57a7 To fix this, introduce a new flag "is_system" to head_ref structs, which is populated at insertion time. This allows to decouple the querying for the spaceinfo from querying the possibly deleted bg. Fixes: `d7eae3403f` ("Btrfs: rework delayed ref total_bytes_pinned accounting") CC: stable@vger.kernel.org # 4.14+ Suggested-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-04-20 19:17:25 +02:00
David Howells	a9e5b73288	vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion In do_mount() when the MS_* flags are being converted to MNT_* flags, MS_RDONLY got accidentally convered to SB_RDONLY. Undo this change. Fixes: `e462ec50cb` ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 09:59:33 -07:00
David Howells	660625922b	afs: Fix server record deletion AFS server records get removed from the net->fs_servers tree when they're deleted, but not from the net->fs_addresses{4,6} lists, which can lead to an oops in afs_find_server() when a server record has been removed, for instance during rmmod. Fix this by deleting the record from the by-address lists before posting it for RCU destruction. The reason this hasn't been noticed before is that the fileserver keeps probing the local cache manager, thereby keeping the service record alive, so the oops would only happen when a fileserver eventually gets bored and stops pinging or if the module gets rmmod'd and a call comes in from the fileserver during the window between the server records being destroyed and the socket being closed. The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c ... Workqueue: kafsd afs_process_async_call [kafs] RIP: 0010:afs_find_server+0x271/0x36f [kafs] ... Call Trace: afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs] afs_deliver_to_call+0x1ee/0x5e8 [kafs] afs_process_async_call+0x5b/0xd0 [kafs] process_one_work+0x2c2/0x504 worker_thread+0x1d4/0x2ac kthread+0x11f/0x127 ret_from_fork+0x24/0x30 Fixes: `d2ddc776a4` ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-20 09:59:33 -07:00

1 2 3 4 5 ...

752469 Commits All Branches Search

752469 Commits

All Branches