Pull perf fixes from Thomas Gleixner:
"A rather largish series of 12 patches addressing a maze of race
conditions in the perf core code from Peter Zijlstra"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Robustify task_function_call()
perf: Fix scaling vs. perf_install_in_context()
perf: Fix scaling vs. perf_event_enable()
perf: Fix scaling vs. perf_event_enable_on_exec()
perf: Fix ctx time tracking by introducing EVENT_TIME
perf: Cure event->pending_disable race
perf: Fix race between event install and jump_labels
perf: Fix cloning
perf: Only update context time when active
perf: Allow perf_release() with !event->ctx
perf: Do not double free
perf: Close install vs. exit race
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dax: move writeback calls into the filesystems
dax: give DAX clearing code correct bdev
ext4: online defrag not supported with DAX
ext2, ext4: only set S_DAX for regular inodes
block: disable block device DAX by default
ocfs2: unlock inode if deleting inode from orphan fails
mm: ASLR: use get_random_long()
drivers: char: random: add get_random_long()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().
dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev. This is correct for normal inodes
on mounted ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.
Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device. This also fixes DAX code to properly flush caches in response
to sync(2).
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases. This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.
Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block. This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.
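The change of unit from filesystem block to sector can be illustrated with a small userspace sketch. It assumes 512-byte sectors and a block size of 2^blkbits bytes; the helper name block_to_sector() and the SECTOR_SHIFT constant are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a sector is 512 bytes, i.e. 2^9. */
#define SECTOR_SHIFT 9

/*
 * Hypothetical helper (not from the patch): convert a filesystem block
 * number into a 512-byte sector number, given log2 of the block size.
 */
static uint64_t block_to_sector(uint64_t block, unsigned int blkbits)
{
	/* In this model a block is never smaller than one sector. */
	assert(blkbits >= SECTOR_SHIFT);
	return block << (blkbits - SECTOR_SHIFT);
}
```

With 4KB blocks (blkbits = 12), block 10 starts at sector 80. A caller that already knows which block device backs the data could do a conversion of this kind before passing a bdev and a sector to the clearing routine.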
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit d07e22597d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.
The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64. Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().
Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow. This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.
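The masking problem can be sketched in userspace C. The function names are hypothetical. The second helper only models the x86 behaviour described above by applying the 5-bit count mask explicitly, because shifting a 32-bit value by 32 or more is undefined in C itself.

```c
#include <assert.h>
#include <stdint.h>

/* Mask built from a 64-bit constant: well defined for bits in 0..63. */
static uint64_t rnd_mask_wide(unsigned int bits)
{
	assert(bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/*
 * Model of a 32-bit shift on x86, where only the low 5 bits of the
 * shift count are used. This is an illustration of the hardware
 * behaviour, not code from the patch.
 */
static uint32_t rnd_mask_x86_32(unsigned int bits)
{
	return (UINT32_C(1) << (bits & 31)) - 1;
}
```

For bits = 32 the wide version yields 0xFFFFFFFF, while the 32-bit model yields 0, so ANDing a random value with it would discard all of the randomness.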
Finally, replace calls to get_random_int() with get_random_long() where
appropriate.
This patch (of 2):
Add get_random_long().
Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libnvdimm fixes from Dan Williams:
- Two fixes for compatibility with the ACPI 6.1 specification.
Without these fixes multi-interface DIMMs will fail to be probed, and
address range scrub commands to find memory errors will give results
that the kernel will mis-interpret. For multi-interface DIMMs Linux
will accept either the original 6.0 implementation or 6.1.
For address range scrub we'll only support 6.1 since ACPI formalized
this DSM differently than the original example [1] implemented in
v4.2. The expectation is that production systems will only ever ship
the ACPI 6.1 address range scrub command definition.
- The wider async address range scrub work targeting 4.6 discovered
that the original synchronous implementation in 4.5 is not sizing its
return buffer correctly.
- Arnd caught that my recent fix to the size of the pfn_t flags missed
updating the flags variable used in the pmem driver.
- Toshi found that we mishandle the memremap() return value in
devm_memremap().
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: use 'u64' for pfn flags
devm_memremap: Fix error value when memremap failed
nfit: update address range scrub commands to the acpi 6.1 format
libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
nfit: fix multi-interface dimm handling, acpi6.1 compatibility
Merge tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"Add a regression fix for changed sysfs path of bq27xxx_battery and
update MAINTAINERS file"
* tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: bq27xxx_battery: Restore device name
MAINTAINERS: update bq27xxx driver
perf_install_in_context() relies upon the context switch hooks to have
scheduled in events when the IPI misses its target -- after all, if
the task has moved from the CPU (or wasn't running at all), it will
have to context switch to run elsewhere.
This however doesn't appear to be happening.
It is possible for the IPI to not happen (task wasn't running) only to
later observe the task running with an inactive context.
The only possible explanation is that the context switch hooks are not
called. Therefore put in a sync_sched() after toggling the jump_label
to guarantee all CPUs will have them enabled before we install an
event.
A simple if (0->1) sync_sched() will not in fact work, because any
further increment can race and complete before the sync_sched().
Therefore we must jump through some hoops.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alexander reported that when the 'original' context gets destroyed, no
new clones happen.
This can happen irrespective of the ctx switch optimization: any task
can die, even the parent, and we want to continue monitoring the task
hierarchy until we either close the event or no tasks are left in the
hierarchy.
perf_event_init_context() will attempt to pin the 'parent' context
during clone(). At that point current is the parent, and since current
cannot have exited while executing clone(), its context cannot have
passed through perf_event_exit_task_context(). Therefore
perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.
However, since inherit_event() does:
	if (parent_event->parent)
		parent_event = parent_event->parent;
it looks at the 'original' event when it does: is_orphaned_event().
This can return true if the context that contains this event has
passed through perf_event_exit_task_context(). And thus we'll fail to
clone the perf context.
Fix this by adding a new state: STATE_DEAD, which is set by
perf_release() to indicate that the filedesc (or kernel reference) is
dead and there are no observers for our data left.
Only for STATE_DEAD will is_orphaned_event() be true and inhibit
cloning.
STATE_EXIT is otherwise preserved such that is_event_hup() remains
functional and will report when the observed task hierarchy becomes
empty.
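The split between the two states can be modelled as follows. This is a simplified sketch: the enum names and numeric values are illustrative, and the hang-up check here ignores child events, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states; names and values are assumptions, not kernel code. */
enum event_state {
	STATE_DEAD     = -4,	/* release ran: no observers are left */
	STATE_EXIT     = -3,	/* the observed task has exited */
	STATE_OFF      = -1,
	STATE_INACTIVE =  0,
	STATE_ACTIVE   =  1,
};

struct event {
	enum event_state state;
};

/* Only a dead event inhibits cloning. */
static bool is_orphaned_event(const struct event *e)
{
	return e->state == STATE_DEAD;
}

/* Hang-up is still reported for an exited (or dead) event. */
static bool is_event_hup(const struct event *e)
{
	return e->state <= STATE_EXIT;
}
```

In this model an event in STATE_EXIT reports hang-up but can still be cloned from, which is the behaviour the text above asks for.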
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Fixes: c6e5b73242 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The original format of these commands from the "NVDIMM DSM Interface
Example" [1] is superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].
[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
"9.20.7 NVDIMM Root Device _DSMs"
Changes include:
1/ New 'restart' fields in ars_status. Unfortunately these are
implemented in the middle of the existing definition so this change
is not backwards compatible. The expectation is that shipping
platforms will only ever support the ACPI 6.1 definition.
2/ New status values for ars_start ('busy') and ars_status ('overflow').
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Stable bugfixes:
- Fix nfs_size_to_loff_t
- NFSv4: Fix a dentry leak on alias use
Other bugfixes:
- Don't schedule a layoutreturn if the layout segment can be freed
immediately.
- Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
- rpcrdma_bc_receive_call() should init rq_private_buf.len
- fix stateid handling for the NFS v4.2 operations
- pnfs/blocklayout: fix a memory leak when using vmalloc_to_page
- fix panic in gss_pipe_downcall() in fips mode
- Fix a race between layoutget and pnfs_destroy_layout
- Fix a race between layoutget and bulk recalls"
* tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
auth_gss: fix panic in gss_pipe_downcall() in fips mode
pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
nfs4: fix stateid handling for the NFS v4.2 operations
NFSv4: Fix a dentry leak on alias use
xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len
pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
pNFS: Fix pnfs_mark_matching_lsegs_return()
nfs: fix nfs_size_to_loff_t
Pull networking fixes from David Miller:
"Looks like a lot, but mostly driver fixes scattered all over as usual.
Of note:
1) Add conditional sched in nf conntrack in cleanup to avoid NMI
watchdogs. From Florian Westphal.
2) Fix deadlock in nfnetlink cttimeout, also from Florian.
3) Fix handling of slaves in bonding ARP monitor validation, from Jay
Vosburgh.
4) Callers of ip_cmsg_send() are responsible for freeing IP options,
some were not doing so. Fix from Eric Dumazet.
5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT.
6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot.
7) bcm7xxx PHY driver bug fixes from Florian Fainelli.
8) Avoid unaligned accesses to protocol headers wrt. GRE, from
Alexander Duyck.
9) SKB leaks and other problems in arc_emac driver, from Alexander
Kochetkov.
10) tcp_v4_inbound_md5_hash() releases listener socket instead of
request socket on error path, oops. Fix from Eric Dumazet.
11) Missing socket release in pppoe_rcv_core() that seems to have
existed basically forever. From Guillaume Nault.
12) Missing slave_dev unregister in dsa_slave_create() error path,
from Florian Fainelli.
13) crypto_alloc_hash() never returns NULL, fix return value check in
__tcp_alloc_md5sig_pool. From Insu Yun.
14) Properly expire exception route entries in ipv4, from Xin Long.
15) Fix races in tcp/dccp listener socket dismantle, from Eric
Dumazet.
16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not
legal. These drivers modify the SKB on transmit. From Jiri Benc.
17) Fix regression in the initialization of netdev->tx_queue_len.
From Phil Sutter.
18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun.
19) SCTP port hash sizing does not properly ensure that table is a
power of two in size. From Neil Horman.
20) Fix initializing of software copy of MAC address in fmvj18x_cs
driver, from Ken Kawasaki"
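Item 13 above concerns functions that report failure through an error value encoded in the returned pointer instead of NULL. The following userspace sketch models that convention. The lower-case helpers imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and alloc_hash_model() is a hypothetical stand-in for an allocator that follows it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes occupy the top MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns an encoded error, never NULL. */
static void *alloc_hash_model(bool fail)
{
	static int dummy;

	return fail ? err_ptr(-ENOMEM) : (void *)&dummy;
}
```

A failed call returns a non-NULL pointer, so a check of the form `if (!p)` never fires. The caller has to test with the is_err() equivalent and read the code back with ptr_err().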
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits)
bnx2x: Fix 84833 phy command handler
bnx2x: Fix led setting for 84858 phy.
bnx2x: Correct 84858 PHY fw version
bnx2x: Fix 84833 RX CRC
bnx2x: Fix link-forcing for KR2
net: ethernet: davicom: fix devicetree irq resource
fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address
Driver: Vmxnet3: Update Rx ring 2 max size
net: netcp: rework the code for get/set sw_data in dma desc
soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data
net: ti: netcp: restore get/set_pad_info() functionality
MAINTAINERS: Drop myself as xen netback maintainer
sctp: Fix port hash table size computation
can: ems_usb: Fix possible tx overflow
Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb
net: bcmgenet: Fix internal PHY link state
af_unix: Don't use continue to re-execute unix_stream_read_generic loop
unix_diag: fix incorrect sign extension in unix_lookup_by_ino
bnxt_en: Failure to update PHY is not fatal condition.
bnxt_en: Remove unnecessary call to update PHY settings.
...
Rename the pad to sw_data as per the description of this field in the
hardware spec (refer to sprugr9 from www.ti.com). The latest version of the
document is at http://www.ti.com/lit/ug/sprugr9h/sprugr9h.pdf and
section 3.1, "Host Packet Descriptor", describes this field.
Define and use a constant for the size of the sw_data field, similar to
other fields in the struct for desc, and document the sw_data field
in the header. As sw_data is not touched by hw, its type can be
changed to u32.
Rename the helpers to match with the updated dma desc field sw_data.
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch <703df6c09795> ("power: bq27xxx_battery: Reorganize I2C
into a module") has removed the device name numbering from
bq27xxx_battery_i2c_probe. Fix that by restoring the code.
Fixes: 703df6c097
Signed-off-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Pull x86 fixes from Ingo Molnar:
"This is unusually large, partly due to the EFI fixes that prevent
accidental deletion of EFI variables through efivarfs that may brick
machines. These fixes are somewhat involved to maintain compatibility
with existing install methods and other usage modes, while trying to
turn off the 'rm -rf' bricking vector.
Other fixes are for large page ioremap()s and for non-temporal
user-memcpy()s"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix vmalloc_fault() to handle large pages properly
hpet: Drop stale URLs
x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
lib/ucs2_string: Correct ucs2 -> utf8 conversion
efi: Add pstore variables to the deletion whitelist
efi: Make efivarfs entries immutable by default
efi: Make our variable validation list include the guid
efi: Do variable name validation tests in utf8
efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
lib/ucs2_string: Add ucs2 -> utf8 helper functions
Use the output length specified in the command to size the receive
buffer rather than the arbitrary 4K limit.
This bug was hiding the fact that the ndctl implementation of
ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size.
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: slab: free kmem_cache_node after destroy sysfs file
ipc/shm: handle removed segments gracefully in shm_mmap()
MAINTAINERS: update Kselftest Framework mailing list
devm_memremap_release(): fix memremap'd addr handling
mm/hugetlb.c: fix incorrect proc nr_hugepages value
mm, x86: fix pte_page() crash in gup_pte_range()
fsnotify: turn fsnotify reaper thread into a workqueue job
Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
mm: fix regression in remap_file_pages() emulation
thp, dax: do not try to withdraw pgtable from non-anon VMA
Pull livepatching fixes from Jiri Kosina:
- regression (from 4.4) fix for ordering issue, introduced by an
earlier ftrace change, that broke live patching of modules.
The fix replaces the ftrace module notifier by direct call in order
to make the ordering guaranteed and well-defined. The patch, from
Jessica Yu, has been acked both by Steven and Rusty
- error message fix from Miroslav Benes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
ftrace/module: remove ftrace module notifier
livepatch: change the error message in asm/livepatch.h header files
This reverts commit c510eff6be ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").
Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.
The above commit changed the code to use call_srcu to clean up the
marks. While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproducer did just that.
There's also another potential problem with using call_srcu here. While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.
It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys. While I don't see any obvious way that that
could occur, it's probably still best to avoid using call_srcu here
after all.
This patch reverts the above patch. A later patch will take a different
approach to eliminate the dedicated thread here.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert 811a4e6fce ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").
This is part of reverting 991de2e590 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.
This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.
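The ordering guarantee can be pictured with a toy model in which each hook appends a letter to a log. All names here are hypothetical. The point is only that a direct call placed ahead of the notifier chain runs first by construction, whatever order the chain itself uses.

```c
#include <assert.h>
#include <string.h>

/* Toy call log: one letter per hook, so the order is visible. */
static char call_log[8];

static void log_call(char c)
{
	size_t n = strlen(call_log);

	assert(n + 1 < sizeof(call_log));
	call_log[n] = c;
	call_log[n + 1] = '\0';
}

/* Stand-in for the ftrace hook that the loader now calls directly. */
static void ftrace_enable_model(void)
{
	log_call('F');
}

/* Stand-in for the livepatch hook still run from the notifier chain. */
static void livepatch_coming_model(void)
{
	log_call('L');
}

/* Hypothetical loader step: ftrace first, then the notifier chain. */
static const char *load_module_model(void)
{
	call_log[0] = '\0';
	ftrace_enable_model();
	livepatch_coming_model();
	return call_log;
}
```

Calling load_module_model() always produces the log "FL", whereas two independently registered notifiers would depend on their priorities and registration order.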
Fixes: 5156dca34a ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull block fixes from Jens Axboe:
"A collection of fixes from the past few weeks that should go into 4.5.
This contains:
- Overflow fix for sysfs discard show function from Alan.
- A stacking limit init fix for max_dev_sectors, so we don't end up
artificially capping some use cases. From Keith.
- Have blk-mq properly end unstarted requests on a dying queue, instead
of pushing that to the driver. From Keith.
- NVMe:
- Update to Kconfig description for NVME_SCSI, since it was
vague and having it on is important for some SUSE distros.
From Christoph.
- Set of fixes from Keith, around surprise removal. Also kills
the no-merge flag, so it supports merging.
- Set of fixes for lightnvm from Matias, Javier, and Wenwei.
- Fix null_blk oops when asked for lightnvm, but not available. From
Matias.
- Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
if interrupted by a signal.
- Two floppy fixes from Jiri, fixing signal handling and blocking
open.
- A use-after-free fix for O_DIRECT, from Mike Krinkin.
- A block module ref count fix from Roman Pen.
- An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.
- Smaller realloc fix for xen-blkfront from Bob Liu.
- Removal of an unused struct member in the deadline IO scheduler,
from Tahsin.
- Also from Tahsin, properly initialize inode struct members
associated with cgroup writeback, if enabled.
- From Tejun, ensure that we keep the superblock pinned during cgroup
writeback"
* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
blk: fix overflow in queue_discard_max_hw_show
writeback: initialize inode members that track writeback history
writeback: keep superblock pinned during cgroup writeback association switches
bio: return EINTR if copying to user space got interrupted
NVMe: Rate limit nvme IO warnings
NVMe: Poll device while still active during remove
NVMe: Requeue requests on suspended queues
NVMe: Allow request merges
NVMe: Fix io incapable return values
blk-mq: End unstarted requests on dying queue
block: Initialize max_dev_sectors to 0
null_blk: oops when initializing without lightnvm
block: fix module reference leak on put_disk() call for cgroups throttle
nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
kernel/fs: fix I/O wait not accounted for RW O_DSYNC
floppy: refactor open() flags handling
lightnvm: allow to force mm initialization
lightnvm: check overflow and correct mlc pairs
lightnvm: fix request intersection locking in rrpc
lightnvm: warn if irqs are disabled in lock laddr
...
Merge tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This includes two fixes.
The first is something that has come up a few times and has been
worked out individually, but it's come up now enough that the problem
should be generic. Tracepoints are protected by RCU sched. There are
several tracepoints within core infrastructure like kfree(). If a
tracepoint is called when the CPU is going down, or when it's coming
up but has yet to be recognized by RCU, an RCU warning is triggered.
This is a true bug as that tracepoint is not protected by RCU.
Usually, this is taken care of by testing for cpu online as a
tracepoint condition. But as this is happening more often, moving it
from an individual tracepoint to a check in the tracepoint
infrastructure is more robust.
Note, there is now a duplicate of a cpu online test, because this
update does not remove the individual checks. But the overhead is
small enough that the removal can be done in another release.
The second change is strange linker breakage due to the branch
tracer's builtin_constant_p() check failing, and treating the
condition as a variable instead of a constant. Arnd Bergmann found
that this can be fixed by testing !!(cond) instead of just (cond)"
* tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix freak link error caused by branch tracer
tracepoints: Do not trace when cpu is offline
problem description:
The current code sets UAR page size equal to system page size.
The ConnectX-3 and ConnectX-3 Pro HWs require a minimum of 128 UAR pages.
The mlx4 kernel drivers are not loaded if there are fewer than 128 UAR pages.
solution:
Always set the UAR page size to 4KB. This allows more UAR pages if the OS
has a PAGE_SIZE larger than 4KB. For example, the PowerPC kernel uses a 64KB
system page size; with a 4MB UAR region, there are 4MB/2/64KB = 32
UARs (half for UAR, half for BlueFlame). This does not meet the minimum
128 UAR page requirement. With a 4KB UAR page, there are 4MB/2/4KB = 512
UARs, which meets the minimum requirement.
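The arithmetic above can be checked with a short sketch. The function name uar_count() and the KIB and MIB macros are hypothetical; the only rule taken from the text is that half of the region is available for UARs.

```c
#include <assert.h>
#include <stdint.h>

#define KIB UINT64_C(1024)
#define MIB (UINT64_C(1024) * KIB)

/*
 * Number of UARs in a region: half of the region is for UARs (the
 * other half is for BlueFlame), divided by the UAR page size.
 */
static uint64_t uar_count(uint64_t region_bytes, uint64_t uar_page_bytes)
{
	assert(uar_page_bytes != 0);
	return region_bytes / 2 / uar_page_bytes;
}
```

With a 4MB region, uar_count() gives 32 for 64KB pages and 512 for 4KB pages, so only the 4KB setting clears the 128 UAR minimum.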
Note that only the code in mlx4_core that deals with firmware knows that the
UAR page size is 4KB. Code that deals with the usr page in cq and qp context
(mlx4_ib, mlx4_en and part of mlx4_core) still assumes
that the UAR page size equals the system page size.
Note that with this implementation, on a kernel with a 64KB system page size,
there are 16 UARs per system page but only one UAR is used. The other 15
UARs are ignored because of the above assumption.
Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
firmware.
Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
the virtual OS must be updated. If hypervisor has old code, and the virtual
OS has this new code, the new code will be backward compatible with the
old code. If the uar size is big enough, this new code in VF continues to
work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
meet 128 uars requirement, this new code not loaded in VF and print the same
error message as the old code in Hypervisor.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5_ifc.h is a header file representing the API and ABI between
the driver and the firmware and hardware. This file is used by
both the mlx5_ib and mlx5_core drivers.
Previously, this file used an incrementing counter to name
reserved fields, for example:
struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 send[0x1];
u8 receive[0x1];
u8 write[0x1];
u8 read[0x1];
u8 reserved_0[0x1];
u8 srq_receive[0x1];
u8 reserved_1[0x1a];
};
If one developer implements feature A through net-next, which uses
reserved_0, they replace it with featureA and rename reserved_1 to
reserved_0. In the same kernel cycle, a 2nd developer could implement
feature B through the rdma tree, which uses reserved_1 and splits it into
featureB and a smaller reserved_1 field. This will cause a conflict
when the two trees are merged.
The source of this conflict is that the 1st developer changed *all*
reserved fields.
As Linus suggested, we change the layout of structs to:
struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 send[0x1];
u8 receive[0x1];
u8 write[0x1];
u8 read[0x1];
u8 reserved_at_4[0x1];
u8 srq_receive[0x1];
u8 reserved_at_6[0x1a];
};
This makes the conflicts much more rare and preserves the locality of
changes.
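As a rough illustration of why the new names stay stable: in these structs every u8 array element stands for one bit, so offsetof() yields a field's bit offset, and reserved_at_N simply records that offset. The sketch below is a userspace copy of the struct from the message plus a hypothetical variant in which an invented feature_a field claims bit 4. The second struct and the SKETCH_BIT_OFF macro are illustrative only and do not exist in mlx5_ifc.h.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* Layout from the message above: one array element per bit. */
struct mlx5_ifc_odp_per_transport_service_cap_bits {
	u8 send[0x1];
	u8 receive[0x1];
	u8 write[0x1];
	u8 read[0x1];
	u8 reserved_at_4[0x1];
	u8 srq_receive[0x1];
	u8 reserved_at_6[0x1a];
};

/* Hypothetical follow-up: "feature_a" takes over the bit at offset 4.
 * Only that one line changes; reserved_at_6 keeps its name. */
struct sketch_odp_caps_with_feature_a_bits {
	u8 send[0x1];
	u8 receive[0x1];
	u8 write[0x1];
	u8 read[0x1];
	u8 feature_a[0x1];	/* was reserved_at_4 */
	u8 srq_receive[0x1];
	u8 reserved_at_6[0x1a];	/* untouched */
};

/* Bit offset of a field, relying on the one-element-per-bit convention. */
#define SKETCH_BIT_OFF(type, field) offsetof(struct type, field)
```

Because the suffix is the offset rather than a running counter, a second developer splitting reserved_at_6 in another tree would touch different lines from the first developer's change.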
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor register size and interrupt acknowledgement fixes which only showed
up in testing on newer hardware, but mostly a fix to the MM refcount
handling to prevent a recursive refcount issue when mmap() is used on
the file descriptor associated with a bound PASID.
Merge tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu
Pull IOMMU SVM fixes from David Woodhouse:
"Minor register size and interrupt acknowledgement fixes which only
showed up in testing on newer hardware, but mostly a fix to the MM
refcount handling to prevent a recursive refcount issue when mmap() is
used on the file descriptor associated with a bound PASID"
* tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu:
iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts
iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
iommu/vt-d: Fix mm refcounting to hold mm_count not mm_users
may brick machines. We use a whitelist of known-safe variables to
allow things like installing distributions to work out of the box, and
instead restrict vendor-specific variable deletion by making
non-whitelist variables immutable - Peter Jones
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Prevent accidental deletion of EFI variables through efivarfs that
may brick machines. We use a whitelist of known-safe variables to
allow things like installing distributions to work out of the box, and
instead restrict vendor-specific variable deletion by making
non-whitelist variables immutable (Peter Jones)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In my randconfig tests, I came across a bug that involves several
components:
* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
replace a library call with a division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing
u32 adc_clock = 450560; /* 45.056 MHz */
if (state->config.adc_clock)
adc_clock = state->config.adc_clock;
do_div(value, adc_clock);
In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.
That in turn throws off the logic in do_div(), which also uses
__builtin_constant_p() to pick the constant-optimized division,
and the code in ilog2(), which uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():
dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.
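A minimal userspace sketch of the idea, assuming GCC or Clang for __builtin_constant_p(). The macro and helper names are hypothetical, and this is a simplification rather than the kernel's actual __trace_if() definition. It asks whether !!(cond) is a compile-time constant and only records branch statistics when it is not.

```c
#include <assert.h>

/* Hypothetical per-branch counters standing in for branch profiling data. */
struct sketch_branch_stats {
	unsigned long hit;
	unsigned long miss;
};

/* Record the outcome of a runtime condition and pass it through. */
static int sketch_record(struct sketch_branch_stats *s, int taken)
{
	if (taken)
		s->hit++;
	else
		s->miss++;
	return taken;
}

/* Test the truth value !!(cond) for constness, not cond itself. A
 * constant truth value is used directly and never profiled. */
#define SKETCH_TRACE_COND(cond, stats)				\
	(__builtin_constant_p(!!(cond))				\
		? !!(cond)					\
		: sketch_record((stats), !!(cond)))
```

A literal condition such as SKETCH_TRACE_COND(1, &s) takes the constant path and leaves the counters untouched, while a condition read from a volatile variable always goes through sketch_record().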
I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.
Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de
Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.
This can produce the following warning:
[ INFO: suspicious RCU usage. ]
4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
-------------------------------
include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34
Call Trace:
[c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
[c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
[c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
[c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
[c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
[c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
[c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
[c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
[c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
[c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
[c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
[c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.
Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.
Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.
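The check described above can be modelled in userspace roughly as follows. Everything prefixed with sketch_ is a hypothetical stand-in, including the simulated cpu_online() and raw_smp_processor_id(). The real change lives in the tracepoint macros and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* arbitrary size for the model */

/* Simulated CPU state; in the kernel this is the cpu online mask. */
static bool sketch_online[SKETCH_NR_CPUS] = { true, true, true, true };
static int sketch_current_cpu;
static int sketch_events_recorded;

static bool sketch_cpu_online(int cpu)
{
	return cpu >= 0 && cpu < SKETCH_NR_CPUS && sketch_online[cpu];
}

static int sketch_raw_smp_processor_id(void)
{
	return sketch_current_cpu;
}

/* Generic tracepoint entry: bail out before touching any RCU-protected
 * state when the current CPU is offline, so individual tracepoints no
 * longer need their own condition. */
static void sketch_trace_event(void)
{
	if (!sketch_cpu_online(sketch_raw_smp_processor_id()))
		return;
	sketch_events_recorded++;	/* stands in for calling the probes */
}
```

Events fired on an online CPU are counted, and events fired after the CPU has been marked offline are silently dropped.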
Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
According to the VT-d specification we need to clear the PPR bit in
the Page Request Status register when handling page requests, or the
hardware won't generate any more interrupts.
This wasn't actually necessary on SKL/KBL (which may well be the
subject of a hardware erratum, although it's harmless enough). But
other implementations do appear to get it right, and we only ever get
one interrupt unless we clear the PPR bit.
Reported-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
Here are a number of small tty and serial driver fixes for 4.5-rc4 that
resolve some reported issues.
One of them got reverted as it wasn't correct based on testing, and all
have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 4.5-rc4
that resolve some reported issues.
One of them got reverted as it wasn't correct based on testing, and
all have been in linux-next for a while"
* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "8250: uniphier: allow modular build with 8250 console"
pty: make sure super_block is still valid in final /dev/tty close
pty: fix possible use after free of tty->driver_data
tty: Add support for PCIe WCH382 2S multi-IO card
serial/omap: mark wait_for_xmitr as __maybe_unused
serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
8250: uniphier: allow modular build with 8250 console
tty: Drop krefs for interrupted tty lock
Merge fixes from Andrew Morton:
"10 fixes"
The lockdep hlist conversion is in the locking tree too, waiting for the
next merge window. Andrew thought it should go in now. I'll take it,
since it fixes a real problem and looks trivially correct (famous last
words).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
mm: fix pfn_t vs highmem
kernel/locking/lockdep.c: convert hash tables to hlists
mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
mm,thp: khugepaged: call pte flush at the time of collapse
mm/backing-dev.c: fix error path in wb_init()
mm, dax: check for pmd_none() after split_huge_pmd()
vsprintf: kptr_restrict is okay in IRQ when 2
mm: fix filemap.c kernel doc warning
ubsan: cosmetic fix to Kconfig text
A set of seven fixes. Two regressions in the new hisi_sas arm driver, a
blacklist entry for the marvell console which was causing a reset cascade
without it, a race fix in the WRITE_SAME/DISCARD routines, a retry fix for the
rdac driver, without which, it would prematurely return EIO and a couple of
fixes for the hyper-v storvsc driver.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of seven fixes:
Two regressions in the new hisi_sas arm driver, a blacklist entry for
the marvell console which was causing a reset cascade without it, a
race fix in the WRITE_SAME/DISCARD routines, a retry fix for the rdac
driver, without which, it would prematurely return EIO and a couple of
fixes for the hyper-v storvsc driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
SCSI: Add Marvell Console to VPD blacklist
scsi_dh_rdac: always retry MODE SELECT on command lock violation
storvsc: Use the specified target ID in device lookup
storvsc: Install the storvsc specific timeout handler for FC devices
hisi_sas: fix v1 hw check for slot error
hisi_sas: add dependency for HAS_IOMEM
The pfn_t type uses an unsigned long to store a pfn + flags value. On a
64-bit platform the upper 12 bits of an unsigned long are never used for
storing the value of a pfn. However, this is not true on highmem
platforms, where all 32 bits of a pfn value are used to address a 44-bit
physical address space. A pfn_t therefore needs to store a 64-bit value.
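To make the bit budget concrete: with 4KB pages (12 offset bits), a 44-bit physical address leaves 44 - 12 = 32 bits of pfn, so a 32-bit container has no spare bits for flags. The sketch below models both container widths. All names are hypothetical, and the placement of flags in the top 12 bits is an assumption made for illustration, not a copy of the kernel's pfn_t helpers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Assumed layout: flags occupy the top SKETCH_PAGE_SHIFT bits. */
#define SKETCH_FLAGS_MASK64 (UINT64_C(0xfff) << (64 - SKETCH_PAGE_SHIFT))
#define SKETCH_FLAGS_MASK32 (UINT32_C(0xfff) << (32 - SKETCH_PAGE_SHIFT))

typedef struct { uint64_t val; } sketch_pfn64_t;
typedef struct { uint32_t val; } sketch_pfn32_t;

/* 64-bit container: a full 32-bit pfn and the flags never overlap. */
static sketch_pfn64_t sketch_pack64(uint32_t pfn, uint64_t flags)
{
	sketch_pfn64_t p = { (uint64_t)pfn | (flags & SKETCH_FLAGS_MASK64) };
	return p;
}

static uint32_t sketch_unpack64(sketch_pfn64_t p)
{
	return (uint32_t)(p.val & ~SKETCH_FLAGS_MASK64);
}

/* 32-bit container: the flag bits collide with the top bits of the pfn. */
static sketch_pfn32_t sketch_pack32(uint32_t pfn, uint32_t flags)
{
	sketch_pfn32_t p = { pfn | (flags & SKETCH_FLAGS_MASK32) };
	return p;
}

static uint32_t sketch_unpack32(sketch_pfn32_t p)
{
	return p.val & (uint32_t)~SKETCH_FLAGS_MASK32;
}
```

Round-tripping a pfn that uses all 32 bits survives the 64-bit container but loses its upper bits in the 32-bit one.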
Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211
Fixes: 01c8f1c44b ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stuart Foster <smf.linux@ntlworld.com>
Reported-by: Julian Margetson <runaway@candw.ms>
Tested-by: Julian Margetson <runaway@candw.ms>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike said:
: CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i. e
: kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
: message.
:
: The problem is that ubsan callbacks use spinlocks and might be called
: before lockdep is initialized. Particularly this line in the
: reserve_ebda_region function causes problem:
:
: lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
:
: If i put lockdep_init() before reserve_ebda_region call in
: x86_64_start_reservations kernel loads well.
Fix this ordering issue permanently: change lockdep so that it uses
hlists for the hash tables. Unlike a list_head, an hlist_head is in its
initialized state when it is all-zeroes, so lockdep is ready for
operation immediately upon boot - lockdep_init() need not have run.
The patch will also save some memory.
lockdep_init() and lockdep_initialized can be done away with now - a 4.6
patch has been prepared to do this.
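The difference between the two list heads can be seen in a small userspace sketch. The struct layouts and helpers below mirror the kernel's list and hlist definitions in simplified form. The two static variables are hypothetical stand-ins for lockdep's hash tables.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* An empty list_head points at itself, so it needs explicit init. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* An empty hlist_head is simply a NULL first pointer. */
static int hlist_empty(const struct hlist_head *h)
{
	return h->first == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Static storage is zero-filled, like a hash table in .bss at boot. */
static struct hlist_head sketch_zeroed_hash[4];
static struct list_head sketch_zeroed_list;
```

The zeroed hlist bucket is a valid empty list and accepts insertions straight away, whereas the zeroed list_head holds NULL pointers and does not even report itself as empty.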
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Fix BPF handling of branch offset adjustments on backjumps, from
Daniel Borkmann.
2) Make sure selinux knows about SOCK_DESTROY netlink messages, from
Lorenzo Colitti.
3) Fix openvswitch tunnel mtu regression, from David Wragg.
4) Fix ICMP handling of TCP sockets in syn_recv state, from Eric
Dumazet.
5) Fix SCTP user hmacid byte ordering bug, from Xin Long.
6) Fix recursive locking in ipv6 addrconf, from Subash Abhinov
Kasiviswanathan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bpf: fix branch offset adjustment on backjumps after patching ctx expansion
vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
geneve: Relax MTU constraints
vxlan: Relax MTU constraints
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
of: of_mdio: Add marvell, 88e1145 to whitelist of PHY compatibilities.
selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
sctp: translate network order to host order when users get a hmacid
enic: increment devcmd2 result ring in case of timeout
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
net:Add sysctl_max_skb_frags
tcp: do not drop syn_recv on all icmp reports
ipv6: fix a lockdep splat
unix: correctly track in-flight fds in sending process user_struct
update be2net maintainers' email addresses
dwc_eth_qos: Reset hardware before PHY start
ipv6: addrconf: Fix recursive spin lock call
Pull libata fixes from Tejun Heo:
- PORTS_IMPL workaround for very early ahci controllers is misbehaving
on new systems. Disabled on recent ahci versions.
- Old-style PIO state machine had a horrible locking problem. Don't
know how we've been getting away with it this far. Fixed.
- Other device specific updates.
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: Intel DNV device IDs SATA
libata: fix sff host state machine locking while polling
libata-sff: use WARN instead of BUG on illegal host state machine state
libata: disable forced PORTS_IMPL for >= AHCI 1.3
libata: blacklist a Viking flash model for MWDMA corruption
drivers: ata: wake port before DMA stop for ALPM
Pull cgroup fixes from Tejun Heo:
- The destruction paths of cgroup objects are asynchronous and
multi-staged, and some of them ended up destroying parents before
children leading to failures in cpu and memory controllers. Ensure
that parents are always destroyed after children.
- cpuset mm node migration was performed synchronously while holding
threadgroup and cgroup mutexes and the recent threadgroup locking
update resulted in a possible deadlock. The migration is best effort
and shouldn't have been performed under those locks to begin with.
Made asynchronous.
- Minor documentation fix.
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Documentation: cgroup: Fix 'cgroup-legacy' -> 'cgroup-v1'
cgroup: make sure a parent css isn't freed before its children
cgroup: make sure a parent css isn't offlined before its children
cpuset: make mm migration asynchronous
Pull workqueue fixes from Tejun Heo:
"Workqueue fixes for v4.5-rc3.
- Remove a spurious triggering of flush dependency warning.
- Officially break local execution guarantee of unbound work items
and add a debug feature to flush out usages which depend on it.
- Work around CPU -> NODE mapping becoming invalid on CPU offline.
The branch is young but pushing out early as stable kernels are being
affected"
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs
Revert "workqueue: make sure delayed work run in local cpu"
workqueue: skip flush dependency checks for legacy workqueues
"rm -rf" is bricking some peoples' laptops because of variables being
used to store non-reinitializable firmware driver data that's required
to POST the hardware.
These are 100% bugs, and they need to be fixed, but in the meantime it
shouldn't be easy to *accidentally* brick machines.
We have to have delete working, and picking which variables do and don't
work for deletion is quite intractable, so instead make everything
immutable by default (except for a whitelist), and make tools that
aren't quite so broad-spectrum unset the immutable flag.
Signed-off-by: Peter Jones <pjones@redhat.com>
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Matthew Garrett <mjg59@coreos.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
All the variables in this list so far are defined to be in the global
namespace in the UEFI spec, so this just further ensures we're
validating the variables we think we are.
Including the guid for entries will become more important in future
patches when we decide whether or not to allow deletion of variables
based on presence in this list.
Signed-off-by: Peter Jones <pjones@redhat.com>
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Matthew Garrett <mjg59@coreos.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
This adds ucs2_utf8size(), which tells us how big our ucs2 string is in
bytes when encoded as utf8, and ucs2_as_utf8(), which translates from ucs2
to utf8.
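A minimal sketch of the size calculation, assuming a NUL-terminated string of 16-bit code units. The sketch_ names are hypothetical and this is not claimed to match the kernel helper line for line. It relies only on the standard UTF-8 rule that code points below 0x80 take one byte, those below 0x800 take two, and the rest of the 16-bit range takes three.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t sketch_ucs2_char_t;

/* Number of bytes needed to hold the UTF-8 encoding of a
 * NUL-terminated UCS-2 string, not counting a terminator. */
static unsigned long sketch_ucs2_utf8size(const sketch_ucs2_char_t *src)
{
	unsigned long bytes = 0;
	unsigned long i;

	for (i = 0; src[i]; i++) {
		uint16_t c = src[i];

		if (c >= 0x800)
			bytes += 3;
		else if (c >= 0x80)
			bytes += 2;
		else
			bytes += 1;
	}
	return bytes;
}
```

For the three code units 'A', U+00E9 and U+20AC the function returns 1 + 2 + 3 = 6.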
Signed-off-by: Peter Jones <pjones@redhat.com>
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Matthew Garrett <mjg59@coreos.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
and a much more interesting kallsyms race which has been around approximately
forever. This fix is more invasive, and will require some care in backporting,
but I hated all the bandaids I could think of, so...
There are some more coming, which are only for breakages introduced this
cycle (livepatch), but wanted these in now.
Thanks,
Rusty.
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fixes from Rusty Russell:
"Fix for async_probe module param added in 4.3 (clearly not widely used
yet), and a much more interesting kallsyms race which has been around
approximately forever. This fix is more invasive, and will require
some care in backporting, but I hated all the bandaids I could think
of, so...
There are some more coming, which are only for breakages introduced
this cycle (livepatch), but wanted these in now"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modules: fix longstanding /proc/kallsyms vs module insertion race.
module: wrapper for symbol name.
modules: fix modparam async_probe request
Devices may have limits on the number of fragments in an skb they support.
The current codebase uses a constant as the maximum number of fragments
one skb can hold and use.
When enabling scatter/gather and running traffic with many small messages,
the codebase uses the maximum number of fragments and may thereby violate
the maximum for certain devices.
The patch introduces a global variable as the maximum number of fragments.
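The shape of the change can be sketched as below. This is only a model: the sketch_ names are hypothetical, the value 17 for the compile-time ceiling is an assumption, and the bounds checking on the tunable is invented for the example rather than taken from the patch.

```c
#include <assert.h>

/* Compile-time ceiling on fragments per skb (assumed value). */
#define SKETCH_MAX_SKB_FRAGS 17

/* Runtime limit that replaces direct use of the constant. */
static int sketch_sysctl_max_skb_frags = SKETCH_MAX_SKB_FRAGS;

/* Lower the limit for devices that support fewer fragments. Values
 * outside 1..SKETCH_MAX_SKB_FRAGS are rejected (assumed policy). */
static int sketch_set_max_frags(int value)
{
	if (value < 1 || value > SKETCH_MAX_SKB_FRAGS)
		return -1;
	sketch_sysctl_max_skb_frags = value;
	return 0;
}

/* Non-zero if one more fragment may be added to an skb that already
 * holds nr_frags fragments. */
static int sketch_can_add_frag(int nr_frags)
{
	return nr_frags < sketch_sysctl_max_skb_frags;
}
```

After lowering the limit to 8, an skb holding 7 fragments may still grow, while one holding 8 may not.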
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We support OFFSET_MAX just fine, so don't round down below it. Also
switch to using min_t to make the helper more readable.
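A userspace sketch of the clamping behaviour, assuming loff_t is a signed 64-bit type so that OFFSET_MAX corresponds to INT64_MAX. The sketch_ names are hypothetical, and the open-coded comparison stands in for the kernel's min_t() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equivalent of OFFSET_MAX for a 64-bit signed loff_t. */
#define SKETCH_OFFSET_MAX INT64_MAX

/* Clamp an unsigned 64-bit size to the largest representable offset.
 * A size of exactly SKETCH_OFFSET_MAX is returned unchanged. */
static int64_t sketch_size_to_loff_t(uint64_t size)
{
	uint64_t max = (uint64_t)SKETCH_OFFSET_MAX;

	return (int64_t)(size < max ? size : max);
}
```

Sizes up to and including the maximum pass through unchanged, and anything larger is clamped to the maximum.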
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 433c92379d ("NFS: Clean up nfs_size_to_loff_t()")
Cc: stable@vger.kernel.org # 2.6.23+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Considering current pty code and multiple devpts instances, it's possible
to umount a devpts file system while a program still has /dev/tty opened
pointing to a previously closed pty pair in that instance. If all
ptmx and pts/N files are closed, the umount can be done. If the program closes
/dev/tty after the umount is done, devpts_kill_index will now use an invalid
super_block, which was already destroyed in the umount operation after
running ->kill_sb. This is another "use after free" type of issue, but now
related to the allocated super_block instance.
To avoid the problem (warning at ida_remove and potential crashes) for
this specific case, I added two functions in devpts which grab additional
references to the super_block, which pty code now uses to make sure
the super block structure is still valid until pty shutdown is done.
I also moved the additional inode references to the same functions, which
also covers the similar case of the inode being freed before the final
/dev/tty close/shutdown.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Cc: stable@vger.kernel.org # 2.6.29+
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge fixes from Andrew Morton:
"22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
radix-tree: fix oops after radix_tree_iter_retry
MAINTAINERS: trim the file triggers for ABI/API
dax: dirty inode only if required
thp: make deferred_split_scan() work again
mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
um: asm/page.h: remove the pte_high member from struct pte_t
mm, hugetlb: don't require CMA for runtime gigantic pages
mm/hugetlb: fix gigantic page initialization/allocation
mm: downgrade VM_BUG in isolate_lru_page() to warning
mempolicy: do not try to queue pages from !vma_migratable()
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
vmstat: make vmstat_update deferrable
mm, vmstat: make quiet_vmstat lighter
mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
memblock: don't mark memblock_phys_mem_size() as __init
dump_stack: avoid potential deadlocks
mm: validate_mm browse_rb SMP race condition
m32r: fix build failure due to SMP and MMU
...