No longer required in a lot of cases, as objects are identified over NVIF
via an alternate mechanism since the rework.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
More amdgpu and radeon stuff for drm-next. Stoney support is the big change.
The rest is just bug fixes and code cleanups. The Stoney stuff is pretty
low impact with respect to existing chips.
* 'drm-next-4.4' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: change VM size default to 64GB
drm/amdgpu: add Stoney pci ids
drm/amdgpu: update the core VI support for Stoney
drm/amdgpu: add VCE support for Stoney (v2)
drm/amdgpu: add UVD support for Stoney
drm/amdgpu: add GFX support for Stoney (v2)
drm/amdgpu: add SDMA support for Stoney (v2)
drm/amdgpu: add DCE support for Stoney
drm/amdgpu: Update SMC/DPM for Stoney
drm/amdgpu: add GMC support for Stoney
drm/amdgpu: add Stoney chip family
drm/amdgpu: fix the broken vm->mutex V2
drm/amdgpu: remove the unnecessary parameter adev for amdgpu_fence_wait_any()
drm/amdgpu: remove the exclusive lock
drm/amdgpu: remove old lockup detection infrastructure
drm: fix trivial typos
drm/amdgpu/dce: simplify suspend/resume
drm/amdgpu/gfx8: set TC_WB_ACTION_EN in RELEASE_MEM packet
drm/radeon: Use rdev->gem.mutex to protect hyperz/cmask owners
More drm-misc for 4.4.
- fb refcount fix in atomic fbdev
- various locking reworks to reduce drm_global_mutex and dev->struct_mutex
- rename docbook to gpu.tmpl and include vga_switcheroo stuff, plus more
vga_switcheroo (Lukas Wunner)
- viewport check fixes for atomic drivers from Ville
- DRM_DEBUG_VBL from Ville
- non-contentious header fixes from Mikko Rapeli
- small things all over
* tag 'topic/drm-misc-2015-10-19' of git://anongit.freedesktop.org/drm-intel: (31 commits)
drm/fb-helper: Fix fb refcounting in pan_display_atomic
drm/fb-helper: Set plane rotation directly
drm: fix mutex leak in drm_dp_get_mst_branch_device
drm: Check plane src coordinates correctly during page flip for atomic drivers
drm: Check crtc viewport correctly with rotated primary plane on atomic drivers
drm: Refactor plane src coordinate checks
drm: Swap w/h when converting the mode to src coordidates for a rotated primary plane
drm: Don't leak fb when plane crtc coodinates are bad
ALSA: hda - Spell vga_switcheroo consistently
drm/gem: Use kref_get_unless_zero for the weak mmap references
drm/vgem: Drop vgem_drm_gem_mmap
drm: Fix return value of drm_framebuffer_init()
drm/gem: Use container_of in drm_gem_object_free
drm/gem: Check locking in drm_gem_object_unreference
drm/gem: Drop struct_mutex requirement from drm_gem_mmap_obj
drm/i810_drm.h: include drm/drm.h
r128_drm.h: include drm/drm.h
savage_drm.h: include <drm/drm.h>
gpu/doc: Convert to markdown harder
gpu/doc: Add vga_switcheroo documentation
...
- dmc fixes from Animesh (not yet all) for deeper sleep states
- piles of prep patches from Ville to make mmio functions type-safe
- more fbc work from Paulo all over
- w/a shuffling from Arun Siluvery
- first part of atomic watermark updates from Matt and Ville (later parts had to
be dropped again unfortunately)
- lots of patches to prepare bxt dsi support ( Shashank Sharma)
- userptr fixes from Chris
- audio rate interface between i915/snd_hda plus kerneldoc (Libin Yang)
- shrinker improvements and fixes (Chris Wilson)
- lots and lots of small patches all over
* tag 'drm-intel-next-2015-10-10' of git://anongit.freedesktop.org/drm-intel: (134 commits)
drm/i915: Update DRIVER_DATE to 20151010
drm/i915: Partial revert of atomic watermark series
drm/i915: Early exit from semaphore_waits_for for execlist mode.
drm/i915: Remove wrong warning from i915_gem_context_clean
drm/i915: Determine the stolen memory base address on gen2
drm/i915: fix FBC buffer size checks
drm/i915: fix CFB size calculation
drm/i915: remove pre-atomic check from SKL update_primary_plane
drm/i915: don't allocate fbcon from stolen memory if it's too big
Revert "drm/i915: Call encoder hotplug for init and resume cases"
Revert "drm/i915: Add hot_plug hook for hdmi encoder"
drm/i915: use error path
drm/i915/irq: Fix misspelled word register in kernel-doc
drm/i915/irq: Fix kernel-doc warnings
drm/i915: Hook up ring workaround writes at context creation time on Gen6-7.
drm/i915: Don't warn if the workaround list is empty.
drm/i915: Resurrect golden context on gen6/7
drm/i915/chv: remove pre-production hardware workarounds
drm/i915/snb: remove pre-production hardware workaround
drm/i915/bxt: Set time interval unit to 0.833us
...
Add 3D support to the virtio-gpu.
* 'virtio-gpu-for-drm-next' of git://git.kraxel.org/linux:
virtio-gpu: add page flip support
virtio-gpu: mark as a render gpu
virtio-gpu: add basic prime support
virtio-gpu: add 3d/virgl support
virtio-gpu: don't free things on ttm_bo_init failure
virtio-gpu: wait for cursor updates finish
virtio-gpu: add & use virtio_gpu_queue_fenced_ctrl_buffer
virtio-gpu: add virtio_gpu_queue_ctrl_buffer_locked
Fixes userspace compilation error:
error: array type has incomplete element type
struct drm_clip_rect boxes[I810_NR_SAREA_CLIPRECTS];
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes compile error:
drm/r128_drm.h:156:23: error: array type has incomplete element type
struct drm_clip_rect boxes[R128_NR_SAREA_CLIPRECTS];
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes compiler error:
drm/savage_drm.h:50:24: error: array type has incomplete element type
struct drm_tex_region texList[SAVAGE_NR_TEX_HEAPS][SAVAGE_NR_TEX_REGIONS +
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes userspace compile error since list_head is not exported to userspace
headers.
Suggested by Emil Velikov <emil.l.velikov@gmail.com> at
https://lkml.org/lkml/2015/6/3/792
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes userspace compile error:
drm/sis_drm.h:68:19: error: field ‘obj_list’ has incomplete type
struct list_head obj_list;
Suggested by Emil Velikov <emil.l.velikov@gmail.com> at
https://lkml.org/lkml/2015/6/3/792
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As include/uapi/linux/userfaultfd.h is a user visible header file, it
should not include kernel-exclusive header files.
So trying to build the userfaultfd test program from the selftests
directory fails, since it contains a reference to linux/compiler.h. As
it turns out, that header is not really needed there, so we can simply
remove it to fix that issue.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are some allocations that must be only referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags
In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32-bits.
Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html
v2: Changed flag logic from neeeds_32b, to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
v6: Apply pin-high for ggtt too (Chris)
v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
Fix check for entries currently using +4GB addresses, use min_t and
other polish in object_bind_to_vm (Chris)
v8: Commit message updated to point to libdrm patch.
v9: vmas are allocated in the correct ozone, so only check flag when the
vma has not been allocated. (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Another attempt at drm-misc for 4.4 ...
- better atomic helpers for runtime pm drivers
- atomic fbdev
- dp aux i2c STATUS_UPDATE handling (for short i2c replies from the sink)
- bunch of constify patches
- inital kerneldoc for vga switcheroo
- some vblank code cleanups from Ville and Thierry
- various polish all over
* tag 'topic/drm-misc-2015-09-25' of git://anongit.freedesktop.org/drm-intel: (57 commits)
drm/irq: Add drm_crtc_vblank_count_and_time()
drm/irq: Rename drm_crtc -> crtc
drm: drm_atomic_crtc_get_property should be static
drm/gma500: Remove DP_LINK_STATUS_SIZE redefinition
vga_switcheroo: Set active attribute to false for audio clients
drm/core: Preserve the fb id on close.
drm/core: Preserve the framebuffer after removing it.
drm: Use vblank timestamps to guesstimate how many vblanks were missed
drm: store_vblank() is never called with NULL timestamp
drm: Clean up drm_calc_vbltimestamp_from_scanoutpos() vbl_status
drm: Limit the number of .get_vblank_counter() retries
drm: Pass flags to drm_update_vblank_count()
drm/i915: Fix vblank count variable types
drm: Kill pixeldur_ns
drm: Stop using linedur_ns and pixeldur_ns for vblank timestamps
drm: Move timestamping constants into drm_vblank_crtc
drm/fbdev: Update legacy plane->fb refcounting for atomic restore
drm: fix kernel-doc warnings in drm_crtc.h
vga_switcheroo: Sort headers alphabetically
drm: Spell vga_switcheroo consistently
...
Pull networking fixes from David Miller:
1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
instead of cloning them. From Daniel Borkmann.
2) When converting classical BPF into eBPF, fix the setting of the
source reg to BPF_REG_X. From Tycho Andersen.
3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
Linus Lussing.
4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.
5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
Roopa Prabhu.
6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.
7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
Florian Westphal.
8) Check DMA mapping errors in bna driver, from Ivan Vecera.
9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
misclearing of interrupt status in TX timeout handler, etc.) from
David Woodhouse.
10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
Hugne.
11) Fix autobind races et al. in netlink code, from Herbert Xu with
help from Tejun Heo and others.
12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.
13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
Eric Dumazet.
14) Fix array overruns in mac80211, from Johannes Berg and Dan
Carpenter.
15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.
16) Fix race between poll_one_napi and napi_disable, from Neil Horman.
17) Fix byte order in geneve tunnel port config, from John W Linville.
18) Fix handling of ARP replies over lightweight tunnels, from Jiri
Benc.
19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
Kok and Roopa Prabhu.
20) Several reference count handling bug fixes in the PHY/MDIO layer
from Russel King.
21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.
22) Fix crash in icmp_route_lookup(), from David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: Fix panic in icmp_route_lookup
net: update docbook comment for __mdiobus_register()
ppp: fix lockdep splat in ppp_dev_uninit()
net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
phy: marvell: add link partner advertised modes
net: fix net_device refcounting
phy: add phy_device_remove()
phy: fixed-phy: properly validate phy in fixed_phy_update_state()
net: fix phy refcounting in a bunch of drivers
of_mdio: fix MDIO phy device refcounting
phy: add proper phy struct device refcounting
phy: fix mdiobus module safety
net: dsa: fix of_mdio_find_bus() device refcount leak
phy: fix of_mdio_find_bus() device refcount leak
ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
ip6_gre: Reduce log level in ip6gre_err() to debug
fib_rules: fix fib rule dumps across multiple skbs
bnx2x: byte swap rss_key to comply to Toeplitz specs
net: revert "net_sched: move tp->root allocation into fw_init()"
lwtunnel: remove source and destination UDP port config option
...
The UDP tunnel config is asymmetric wrt. to the ports used. The source and
destination ports from one direction of the tunnel are not related to the
ports of the other direction. We need to be able to respond to ARP requests
using the correct ports without involving routing.
As the consequence, UDP ports need to be fixed property of the tunnel
interface and cannot be set per route. Remove the ability to set ports per
route. This is still okay to do, as no kernel has been released with these
attributes yet.
Note that the ability to specify source and destination ports is preserved
for other users of the lwtunnel API which don't use routes for tunnel key
specification (like openvswitch).
If in the future we rework ARP handling to allow port specification, the
attributes can be added back.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV/yX5AAoJEHm+PkMAQRiGUc4IAIFtSt2EORex45d2c64Varjm
4wVJM6k1xz0e8c5bI5D03y/WaefIC2LlKHtWw4+TytnwWEryuGQ1IitvDPZLIntk
I2tUN1IzyxZrJcG2GyfozjxSxeIcaL7us5j7555kEaRVWMamqDaQgVgEKFRqD43N
M4y8qRUeU3OiaL3OhQ9beSfpI/XqjaT+ECGO5HKC3NOJtTrD+cFqLAG9ScCPhvtk
YrrXx3K6J3mylvdvJ5W6JlxOrhFMO+YzViy2bRY8OnAR2vD88p61eT8V2+ENbnMj
+AqXS4HOBpJ6I1Qhff99r0YyvVT/ln9dW7qLAXK3WG27z6HOSWr8KWNUyQD2VLE=
=9yBb
-----END PGP SIGNATURE-----
Merge tag 'v4.3-rc2' into topic/drm-misc
Backmerge Linux 4.3-rc2 because of conflicts in the dp helper code
between bugfixes and new code. Just adjacent lines really.
On top of that there's a silent conflict in the new fsl-dcu driver
merged into 4.3 and
commit 844f9111f6
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Wed Sep 2 10:42:40 2015 +0200
drm/atomic: Make prepare_fb/cleanup_fb only take state, v3.
which Thierry Reding spotted and provided a fixup for.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Add the userfaultfd syscalls to uapi asm-generic, it was tested with
postcopy live migration on aarch64 with both 4k and 64k pagesize
kernels.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge fourth patch-bomb from Andrew Morton:
- sys_membarier syscall
- seq_file interface changes
- a few misc fixups
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"
mm/early_ioremap: add explicit #include of asm/early_ioremap.h
fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void
selftests: enhance membarrier syscall test
selftests: add membarrier syscall test
sys_membarrier(): system-wide memory barrier (generic, x86)
MODSIGN: fix a compilation warning in extract-cert
Pull SCSI target updates from Nicholas Bellinger:
"Here are the outstanding target-pending updates for v4.3-rc1.
Mostly bug-fixes and minor changes this round. The fallout from the
big v4.2-rc1 RCU conversion have (thus far) been minimal.
The highlights this round include:
- Move sense handling routines into scsi_common code (Sagi)
- Return ABORTED_COMMAND sense key for PI errors (Sagi)
- Add tpg_enabled_sendtargets attribute for disabled iscsi-target
discovery (David)
- Shrink target struct se_cmd by rearranging fields (Roland)
- Drop iSCSI use of mutex around max_cmd_sn increment (Roland)
- Replace iSCSI __kernel_sockaddr_storage with sockaddr_storage (Andy +
Chris)
- Honor fabric max_data_sg_nents I/O transfer limit (Arun + Himanshu +
nab)
- Fix EXTENDED_COPY >= v4.1 regression OOPsen (Alex + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (37 commits)
target: use stringify.h instead of own definition
target/user: Fix UFLAG_UNKNOWN_OP handling
target: Remove no-op conditional
target/user: Remove unused variable
target: Fix max_cmd_sn increment w/o cmdsn mutex regressions
target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess
target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
target/iscsi: Replace __kernel_sockaddr_storage with sockaddr_storage
target/iscsi: Replace conn->login_ip with login_sockaddr
target/iscsi: Keep local_ip as the actual sockaddr
target/iscsi: Fix np_ip bracket issue by removing np_ip
target: Drop iSCSI use of mutex around max_cmd_sn increment
qla2xxx: Update tcm_qla2xxx module description to 24xx+
iscsi-target: Add tpg_enabled_sendtargets for disabled discovery
drivers: target: Drop unlikely before IS_ERR(_OR_NULL)
target: check DPO/FUA usage for COMPARE AND WRITE
target: Shrink struct se_cmd by rearranging fields
target: Remove cmd->se_ordered_id (unused except debug log lines)
target: add support for START_STOP_UNIT SCSI opcode
target: improve unsupported opcode message
...
Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system. It is
implemented by calling synchronize_sched(). It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier. For synchronization primitives
that distinguish between read-side and write-side (e.g. userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.
The existing applications of which I am aware that would be improved by
this system call are as follows:
* Through Userspace RCU library (http://urcu.so)
- DNS server (Knot DNS) https://www.knot-dns.cz/
- Network sniffer (http://netsniff-ng.org/)
- Distributed object storage (https://sheepdog.github.io/sheepdog/)
- User-space tracing (http://lttng.org)
- Network storage system (https://www.gluster.org/)
- Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
- Financial software (https://lkml.org/lkml/2015/3/23/189)
Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().
* Direct users of sys_membarrier
- core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)
Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux. They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.
To explain the benefit of this scheme, let's introduce two example threads:
Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())
In a scheme where all smp_mb() in thread A are ordering memory accesses
with respect to smp_mb() present in Thread B, we can change each
smp_mb() within Thread A into calls to sys_membarrier() and each
smp_mb() within Thread B into compiler barriers "barrier()".
Before the change, we had, for each smp_mb() pairs:
Thread A Thread B
previous mem accesses previous mem accesses
smp_mb() smp_mb()
following mem accesses following mem accesses
After the change, these pairs become:
Thread A Thread B
prev mem accesses prev mem accesses
sys_membarrier() barrier()
follow mem accesses follow mem accesses
As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).
1) Non-concurrent Thread A vs Thread B accesses:
Thread A Thread B
prev mem accesses
sys_membarrier()
follow mem accesses
prev mem accesses
barrier()
follow mem accesses
In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.
2) Concurrent Thread A vs Thread B accesses
Thread A Thread B
prev mem accesses prev mem accesses
sys_membarrier() barrier()
follow mem accesses follow mem accesses
In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().
* Benchmarks
On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)
1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call.
* User-space user of this system call: Userspace RCU library
Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every process
threads (as we currently do), we diminish the number of unnecessary wake
ups and only issue the memory barriers on active threads. Non-running
threads do not need to execute such barrier anyway, because these are
implied by the scheduler context switches.
Results in liburcu:
Operations in 10s, 6 readers, 2 writers:
memory barriers in reader: 1701557485 reads, 2202847 writes
signal-based scheme: 9830061167 reads, 6700 writes
sys_membarrier: 9952759104 reads, 425 writes
sys_membarrier (dyn. check): 7970328887 reads, 425 writes
The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than signal and memory barrier schemes.
Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes injected into for tracing purposes, for which we
cannot expect that signals will be unused by the application.
An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()->mm without holding each CPU's rq lock.
This patch adds the system call to x86 and to asm-generic.
[1] http://urcu.so
membarrier(2) man page:
MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2)
NAME
membarrier - issue memory barriers on a set of threads
SYNOPSIS
#include <linux/membarrier.h>
int membarrier(int cmd, int flags);
DESCRIPTION
The cmd argument is one of the following:
MEMBARRIER_CMD_QUERY
Query the set of supported commands. It returns a bitmask of
supported commands.
MEMBARRIER_CMD_SHARED
Execute a memory barrier on all threads running on the system.
Upon return from system call, the caller thread is ensured that
all running threads have passed through a state where all memory
accesses to user-space addresses match program order between
entry to and return from the system call (non-running threads
are de facto in such a state). This covers threads from all pro=E2=80=90
cesses running on the system. This command returns 0.
The flags argument needs to be 0. For future extensions.
All memory accesses performed in program order from each targeted
thread is guaranteed to be ordered with respect to sys_membarrier(). If
we use the semantic "barrier()" to represent a compiler barrier forcing
memory accesses to be performed in program order across the barrier,
and smp_mb() to represent explicit memory barriers forcing full memory
ordering across the barrier, we have the following ordering table for
each pair of barrier(), sys_membarrier() and smp_mb():
The pair ordering is detailed as (O: ordered, X: not ordered):
barrier() smp_mb() sys_membarrier()
barrier() X X O
smp_mb() X O O
sys_membarrier() O O O
RETURN VALUE
On success, these system calls return zero. On error, -1 is returned,
and errno is set appropriately. For a given command, with flags
argument set to 0, this system call is guaranteed to always return the
same value until reboot.
ERRORS
ENOSYS System call is not implemented.
EINVAL Invalid arguments.
Linux 2015-04-15 MEMBARRIER(2)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicholas Miell <nmiell@comcast.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"Just a bunch of fixes to squeeze in before -rc1:
- three nouveau regression fixes
- one qxl regression fix
- a bunch of i915 fixes
... and some core displayport/atomic fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/device: enable c800 quirk for tecra w50
drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
drm/nouveau/gr/nv04: fix big endian setting on gr context
drm/qxl: validate monitors config modes
drm/i915: Allow DSI dual link to be configured on any pipe
drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS
drm/i915: Fix CSR MMIO address check
drm/i915: Limit the number of loops for reading a split 64bit register
drm/i915: Fix broken mst get_hw_state.
drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x
uapi/drm/i915_drm.h: fix userspace compilation.
drm/i915: Always mark the object as dirty when used by the GPU
drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed
drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed
drm/dp: Define AUX_RETRY_INTERVAL as 500 us
drm/atomic: Fix bookkeeping with TEST_ONLY, v3.
Merge third patch-bomb from Andrew Morton:
- even more of the rest of MM
- lib/ updates
- checkpatch updates
- small changes to a few scruffy filesystems
- kmod fixes/cleanups
- kexec updates
- a dma-mapping cleanup series from hch
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
dma-mapping: consolidate dma_set_mask
dma-mapping: consolidate dma_supported
dma-mapping: cosolidate dma_mapping_error
dma-mapping: consolidate dma_{alloc,free}_noncoherent
dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
mm: make sure all file VMAs have ->vm_ops set
mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
mm: mark most vm_operations_struct const
namei: fix warning while make xmldocs caused by namei.c
ipc: convert invalid scenarios to use WARN_ON
zlib_deflate/deftree: remove bi_reverse()
lib/decompress_unlzma: Do a NULL check for pointer
lib/decompressors: use real out buf size for gunzip with kernel
fs/affs: make root lookup from blkdev logical size
sysctl: fix int -> unsigned long assignments in INT_MIN case
kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
kexec: align crash_notes allocation to make it be inside one physical page
kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
kexec: split kexec_load syscall from kexec core code
...
- Use the correct GFN/BFN terms more consistently.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV8VRMAAoJEFxbo/MsZsTRiGQH/i/jrAJUJfrFC2PINaA2gDwe
O0dlrkCiSgAYChGmxxxXZQSPM5Po5+EbT/dLjZ/uvSooeorM9RYY/mFo7ut/qLep
4pyQUuwGtebWGBZTrj9sygUVXVhgJnyoZxskNUbhj9zvP7hb9++IiI78mzne6cpj
lCh/7Z2dgpfRcKlNRu+qpzP79Uc7OqIfDK+IZLrQKlXa7IQDJTQYoRjbKpfCtmMV
BEG3kN9ESx5tLzYiAfxvaxVXl9WQFEoktqe9V8IgOQlVRLgJ2DQWS6vmraGrokWM
3HDOCHtRCXlPhu1Vnrp0R9OgqWbz8FJnmVAndXT8r3Nsjjmd0aLwhJx7YAReO/4=
=JDia
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen terminology fixes from David Vrabel:
"Use the correct GFN/BFN terms more consistently"
* tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn
xen/privcmd: Further s/MFN/GFN/ clean-up
hvc/xen: Further s/MFN/GFN clean-up
video/xen-fbfront: Further s/MFN/GFN clean-up
xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn
xen: Use correctly the Xen memory terminologies
arm/xen: implement correctly pfn_to_mfn
xen: Make clear that swiotlb and biomerge are dealing with DMA address
Pull networking fixes from David Miller:
1) Fix out-of-bounds array access in netfilter ipset, from Jozsef
Kadlecsik.
2) Use correct free operation on netfilter conntrack templates, from
Daniel Borkmann.
3) Fix route leak in SCTP, from Marcelo Ricardo Leitner.
4) Fix sizeof(pointer) in mac80211, from Thierry Reding.
5) Fix cache pointer comparison in ip6mr leading to missed unlock of
mrt_lock. From Richard Laing.
6) rds_conn_lookup() needs to consider network namespace in key
comparison, from Sowmini Varadhan.
7) Fix deadlock in TIPC code wrt broadcast link wakeups, from Kolmakov
Dmitriy.
8) Fix fd leaks in bpf syscall, from Daniel Borkmann.
9) Fix error recovery when installing ipv6 multipath routes, we would
delete the old route before we would know if we could fully commit
to the new set of nexthops. Fix from Roopa Prabhu.
10) Fix run-time suspend problems in r8152, from Hayes Wang.
11) In fec, don't program the MAC address into the chip when the clocks
are gated off. From Fugang Duan.
12) Fix poll behavior for netlink sockets when using rx ring mmap, from
Daniel Borkmann.
13) Don't allocate memory with GFP_KERNEL from get_stats64 in r8169
driver, from Corinna Vinschen.
14) In TCP Cubic congestion control, handle idle periods better where we
are application limited, in order to keep cwnd from growing out of
control. From Eric Dumzet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
tcp_cubic: better follow cubic curve after idle period
tcp: generate CA_EVENT_TX_START on data frames
xen-netfront: respect user provided max_queues
xen-netback: respect user provided max_queues
r8169: Fix sleeping function called during get_stats64, v2
ether: add IEEE 1722 ethertype - TSN
netlink, mmap: fix edge-case leakages in nf queue zero-copy
netlink, mmap: don't walk rx ring on poll if receive queue non-empty
cxgb4: changes for new firmware 1.14.4.0
net: fec: add netif status check before set mac address
r8152: fix the runtime suspend issues
r8152: split DRIVER_VERSION
ipv6: fix ifnullfree.cocci warnings
add microchip LAN88xx phy driver
stmmac: fix check for phydev being open
net: qlcnic: delete redundant memsets
net: mv643xx_eth: use kzalloc
net: jme: use kzalloc() instead of kmalloc+memset
net: cavium: liquidio: use kzalloc in setup_glist()
net: ipv6: use common fib_default_rule_pref
...
As noted by Minchan, a benefit of reading idle flag from /proc/kpageflags
is that one can easily filter dirty and/or unevictable pages while
estimating the size of unused memory.
Note that idle flag read from /proc/kpageflags may be stale in case the
page was accessed via a PTE, because it would be too costly to iterate
over all page mappings on each /proc/kpageflags read to provide an
up-to-date value. To make sure the flag is up-to-date one has to read
/sys/kernel/mm/page_idle/bitmap first.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IEEE 1722 describes AVB (later renamed to TSN - Time Sensitive
Networking), a protocol, encapsualtion and synchronization to utilize
standard networks for audio/video (and later other time-sensitive)
streams.
This standard uses ethertype 0x22F0.
http://standards.ieee.org/develop/regauth/ethertype/eth.txt
This is a respin of a previous patch ("ether: add AVB frame type
ETH_P_AVB")
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The linux/audit.h header uses EM_MICROBLAZE in order to define
AUDIT_ARCH_MICROBLAZE, but it's only available in the microblaze
asm headers. Move it to the common elf-em.h header so that the
define can be used on non-microblaze systems. Otherwise we get
build errors that EM_MICROBLAZE isn't defined when we try to use
the AUDIT_ARCH_MICROBLAZE symbol.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
- Create drivers/staging/rdma
- Move amso1100 driver to staging/rdma and schedule for deletion
- Move ipath driver to staging/rdma and schedule for deletion
- Add hfi1 driver to staging/rdma and set TODO for move to regular tree
- Initial support for namespaces to be used on RDMA devices
- Add RoCE GID table handling to the RDMA core caching code
- Infrastructure to support handling of devices with differing
read and write scatter gather capabilities
- Various iSER updates
- Kill off unsafe usage of global mr registrations
- Update SRP driver
- Misc. mlx4 driver updates
- Support for the mr_alloc verb
- Support for a netlink interface between kernel and user space cache
daemon to speed path record queries and route resolution
- Ininitial support for safe hot removal of verbs devices
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj
nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ
yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU
yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135
m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5
zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW
egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE
n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp
HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L
/T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT
WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt
30ZYFuW8evTL+YQcaV65
=EHLg
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull inifiniband/rdma updates from Doug Ledford:
"This is a fairly sizeable set of changes. I've put them through a
decent amount of testing prior to sending the pull request due to
that.
There are still a few fixups that I know are coming, but I wanted to
go ahead and get the big, sizable chunk into your hands sooner rather
than waiting for those last few fixups.
Of note is the fact that this creates what is intended to be a
temporary area in the drivers/staging tree specifically for some
cleanups and additions that are coming for the RDMA stack. We
deprecated two drivers (ipath and amso1100) and are waiting to hear
back if we can deprecate another one (ehca). We also put Intel's new
hfi1 driver into this area because it needs to be refactored and a
transfer library created out of the factored out code, and then it and
the qib driver and the soft-roce driver should all be modified to use
that library.
I expect drivers/staging/rdma to be around for three or four kernel
releases and then to go away as all of the work is completed and final
deletions of deprecated drivers are done.
Summary of changes for 4.3:
- Create drivers/staging/rdma
- Move amso1100 driver to staging/rdma and schedule for deletion
- Move ipath driver to staging/rdma and schedule for deletion
- Add hfi1 driver to staging/rdma and set TODO for move to regular
tree
- Initial support for namespaces to be used on RDMA devices
- Add RoCE GID table handling to the RDMA core caching code
- Infrastructure to support handling of devices with differing read
and write scatter gather capabilities
- Various iSER updates
- Kill off unsafe usage of global mr registrations
- Update SRP driver
- Misc mlx4 driver updates
- Support for the mr_alloc verb
- Support for a netlink interface between kernel and user space cache
daemon to speed path record queries and route resolution
- Ininitial support for safe hot removal of verbs devices"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
IB/ipoib: Suppress warning for send only join failures
IB/ipoib: Clean up send-only multicast joins
IB/srp: Fix possible protection fault
IB/core: Move SM class defines from ib_mad.h to ib_smi.h
IB/core: Remove unnecessary defines from ib_mad.h
IB/hfi1: Add PSM2 user space header to header_install
IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
mlx5: Fix incorrect wc pkey_index assignment for GSI messages
IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
IB/uverbs: reject invalid or unknown opcodes
IB/cxgb4: Fix if statement in pick_local_ip6adddrs
IB/sa: Fix rdma netlink message flags
IB/ucma: HW Device hot-removal support
IB/mlx4_ib: Disassociate support
IB/uverbs: Enable device removal when there are active user space applications
IB/uverbs: Explicitly pass ib_dev to uverbs commands
IB/uverbs: Fix race between ib_uverbs_open and remove_one
IB/uverbs: Fix reference counting usage of event files
IB/core: Make ib_dealloc_pd return void
IB/srp: Create an insecure all physical rkey only if needed
...
Significant work on toshiba_acpi, including new hardware support,
refactoring, and cleanups. Extend device support for asus, ideapad, and
acer systems. New surface pro 3 buttons driver. Misc. minor cleanups for
thinkpad and hp-wireless.
acer-wmi:
- No rfkill on HP Omen 15 wifi
thinkpad_acpi
- Remove side effects from vdbg_printk -> no_printk macro
surface pro 3
- Add support driver for Surface Pro 3 buttons
hp-wireless
- remove unneeded goto/label in hpwl_init
ideapad-laptop
- add alternative representation for Yoga 2 to DMI table
- Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
asus-laptop
- Add key found on Asus F3M
MAINTAINERS
- Remove Toshiba Linux mailing list address
toshiba_acpi:
- Bump driver version to 0.23
- Remove unnecessary checks and returns in HCI/SCI functions
- Refactor *{get, set} functions return value
- Remove "*not supported" feature prints
- Change *available functions return type
- Add set_fan_status function
- Change some variables to avoid warnings from ninja-check
- Reorder toshiba_acpi_alt_keymap entries
- Remove unused wireless defines
- Transflective backlight updates
- Avoid registering input device on WMI event laptops
- Add /dev/toshiba_acpi device
- Adapt /proc/acpi/toshiba/keys to TOS1900 devices
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV6K71AAoJEKbMaAwKp364lzAIAIKeesYU4VevKg2wL6Em/dVP
4ftDgzhvFRQhM0CEe+uR8NWFNB/CVKVSphKD5mFtc0NAs0S0Eo/tmAlERfeq0qkt
HFSpP80deU+CwjURLo4wgMbPBPudHA3bNDZfSY6vq+/JhakdtheLyAODOda7uegz
PHV0zrgDntwVDAuBPTB2h6KigFXSc3soGCWcHTjD0PLNvTGvWOiNYO7otsclNzFc
qC0uerR4ryt6OFtul900/U/x1bfM9OAFNSnqWpfkkhQ4nskmkzDsyXutXm3CtXso
Ol3YS0Saj3gBGUmYua1smSU5u4IU9cQNeq6EEgR4jdHt2QQ8fJfmT0v5UgeiHHw=
=UZPt
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.3-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver updates from Darren Hart:
"Significant work on toshiba_acpi, including new hardware support,
refactoring, and cleanups. Extend device support for asus, ideapad,
and acer systems. New surface pro 3 buttons driver. Misc minor
cleanups for thinkpad and hp-wireless.
acer-wmi:
- No rfkill on HP Omen 15 wifi
thinkpad_acpi:
- Remove side effects from vdbg_printk -> no_printk macro
surface pro 3:
- Add support driver for Surface Pro 3 buttons
hp-wireless:
- remove unneeded goto/label in hpwl_init
ideapad-laptop:
- add alternative representation for Yoga 2 to DMI table
- Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
asus-laptop:
- Add key found on Asus F3M
MAINTAINERS:
- Remove Toshiba Linux mailing list address
toshiba_acpi:
- Bump driver version to 0.23
- Remove unnecessary checks and returns in HCI/SCI functions
- Refactor *{get, set} functions return value
- Remove "*not supported" feature prints
- Change *available functions return type
- Add set_fan_status function
- Change some variables to avoid warnings from ninja-check
- Reorder toshiba_acpi_alt_keymap entries
- Remove unused wireless defines
- Transflective backlight updates
- Avoid registering input device on WMI event laptops
- Add /dev/toshiba_acpi device
- Adapt /proc/acpi/toshiba/keys to TOS1900 devices"
* tag 'platform-drivers-x86-v4.3-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (21 commits)
acer-wmi: No rfkill on HP Omen 15 wifi
thinkpad_acpi: Remove side effects from vdbg_printk -> no_printk macro
surface pro 3: Add support driver for Surface Pro 3 buttons
hp-wireless: remove unneeded goto/label in hpwl_init
ideapad-laptop: add alternative representation for Yoga 2 to DMI table
asus-laptop: Add key found on Asus F3M
MAINTAINERS: Remove Toshiba Linux mailing list address
ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
toshiba_acpi: Bump driver version to 0.23
toshiba_acpi: Remove unnecessary checks and returns in HCI/SCI functions
toshiba_acpi: Refactor *{get, set} functions return value
toshiba_acpi: Remove "*not supported" feature prints
toshiba_acpi: Change *available functions return type
toshiba_acpi: Add set_fan_status function
toshiba_acpi: Change some variables to avoid warnings from ninja-check
toshiba_acpi: Reorder toshiba_acpi_alt_keymap entries
toshiba_acpi: Remove unused wireless defines
toshiba_acpi: Transflective backlight updates
toshiba_acpi: Avoid registering input device on WMI event laptops
toshiba_acpi: Add /dev/toshiba_acpi device
...
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
Pull audit update from Paul Moore:
"This is one of the larger audit patchsets in recent history,
consisting of eight patches and almost 400 lines of changes.
The bulk of the patchset is the new "audit by executable"
functionality which allows admins to set an audit watch based on the
executable on disk. Prior to this, admins could only track an
application by PID, which has some obvious limitations.
Beyond the new functionality we also have some refcnt fixes and a few
minor cleanups"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
fixup: audit: implement audit by executable
audit: implement audit by executable
audit: clean simple fsnotify implementation
audit: use macros for unset inode and device values
audit: make audit_del_rule() more robust
audit: fix uninitialized variable in audit_add_rule()
audit: eliminate unnecessary extra layer of watch parent references
audit: eliminate unnecessary extra layer of watch references
Pull security subsystem updates from James Morris:
"Highlights:
- PKCS#7 support added to support signed kexec, also utilized for
module signing. See comments in 3f1e1bea.
** NOTE: this requires linking against the OpenSSL library, which
must be installed, e.g. the openssl-devel on Fedora **
- Smack
- add IPv6 host labeling; ignore labels on kernel threads
- support smack labeling mounts which use binary mount data
- SELinux:
- add ioctl whitelisting (see
http://kernsec.org/files/lss2015/vanderstoep.pdf)
- fix mprotect PROT_EXEC regression caused by mm change
- Seccomp:
- add ptrace options for suspend/resume"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
Documentation/Changes: Now need OpenSSL devel packages for module signing
scripts: add extract-cert and sign-file to .gitignore
modsign: Handle signing key in source tree
modsign: Use if_changed rule for extracting cert from module signing key
Move certificate handling to its own directory
sign-file: Fix warning about BIO_reset() return value
PKCS#7: Add MODULE_LICENSE() to test module
Smack - Fix build error with bringup unconfigured
sign-file: Document dependency on OpenSSL devel libraries
PKCS#7: Appropriately restrict authenticated attributes and content type
KEYS: Add a name for PKEY_ID_PKCS7
PKCS#7: Improve and export the X.509 ASN.1 time object decoder
modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
extract-cert: Cope with multiple X.509 certificates in a single file
sign-file: Generate CMS message as signature instead of PKCS#7
PKCS#7: Support CMS messages also [RFC5652]
X.509: Change recorded SKID & AKID to not include Subject or Issuer
PKCS#7: Check content type and versions
MAINTAINERS: The keyrings mailing list has moved
...
The privcmd code is mixing the usage of GFN and MFN within the same
functions which make the code difficult to understand when you only work
with auto-translated guests.
The privcmd driver is only dealing with GFN so replace all the mention
of MFN into GFN.
The ioctl structure used to map foreign change has been left unchanged
given that the userspace is using it. Nonetheless, add a comment to
explain the expected value within the "mfn" field.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Since this already confused me once when adding addfb2.1, let's clean up
the header to split params one per line.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Highlights include:
Stable patches:
- Fix atomicity of pNFS commit list updates
- Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
- nfs_set_pgio_error sometimes misses errors
- Fix a thinko in xs_connect()
- Fix borkage in _same_data_server_addrs_locked()
- Fix a NULL pointer dereference of migration recovery ops for v4.2 client
- Don't let the ctime override attribute barriers.
- Revert "NFSv4: Remove incorrect check in can_open_delegated()"
- Ensure flexfiles pNFS driver updates the inode after write finishes
- flexfiles must not pollute the attribute cache with attrbutes from the DS
- Fix a protocol error in layoutreturn
- Fix a protocol issue with NFSv4.1 CLOSE stateids
Bugfixes + cleanups
- pNFS blocks bugfixes from Christoph
- Various cleanups from Anna
- More fixes for delegation corner cases
- Don't fsync twice for O_SYNC/IS_SYNC files
- Fix pNFS and flexfiles layoutstats bugs
- pnfs/flexfiles: avoid duplicate tracking of mirror data
- pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues.
- pnfs/flexfiles: error handling retries a layoutget before fallback to MDS
Features:
- Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong
- More RDMA client transport improvements from Chuck
- Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs
from the SUNRPC, Lustre and core infiniband tree.
- Optimise away the close-to-open getattr if there is no cached data
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV7chgAAoJEGcL54qWCgDyqJQP/3kto9VXnXcatC382jF9Pfj5
F55XeSnviOXH7CyiKA4nSBhnxg/sLuWOTpbkVI/4Y+VyWhLby9h+mtcKURHOlBnj
d5BFoPwaBVDnUiKlHFQDkRjIyxjj2Sb6/uEb2V/u3v+3znR5AZZ4lzFx4cD85oaz
mcru7yGiSxaQCIH6lHExcCEKXaDP5YdvS9YFsyQfv2976JSaQHM9ZG04E0v6MzTo
E5wwC4CLMKmhuX9kmQMj85jzs1ASAKZ3N7b4cApTIo6F8DCDH0vKQphq/nEQC497
ECjEs5/fpxtNJUpSBu0gT7G4LCiW3PzE7pHa+8bhbaAn9OzxIR5+qWujKsfGYQhO
Oomp3K9zO6omshAc5w4MkknPpbImjoZjGAj/q/6DbtrDpnD7DzOTirwYY2yX0CA8
qcL81uJUb8+j4jJj4RTO+lTUBItrM1XTqTSd/3eSMr5DDRVZj+ERZxh17TaxRBZL
YrbrLHxCHrcbdSbPlovyvY+BwjJUUFJRcOxGQXLmNYR9u92fF59rb53bzVyzcRRO
wBozzrNRCFL+fPgfNPLEapIb6VtExdM3rl2HYsJGckHj4DPQdnoB3ytIT9iEFZEN
+/rE14XEZte7kuH3OP4el2UsP/hVsm7A49mtwrkdbd7rxMWD6XfQUp8DAggWUEtI
1H6T7RC1Y6wsu0X1fnVz
=knJA
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix atomicity of pNFS commit list updates
- Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
- nfs_set_pgio_error sometimes misses errors
- Fix a thinko in xs_connect()
- Fix borkage in _same_data_server_addrs_locked()
- Fix a NULL pointer dereference of migration recovery ops for v4.2
client
- Don't let the ctime override attribute barriers.
- Revert "NFSv4: Remove incorrect check in can_open_delegated()"
- Ensure flexfiles pNFS driver updates the inode after write finishes
- flexfiles must not pollute the attribute cache with attrbutes from
the DS
- Fix a protocol error in layoutreturn
- Fix a protocol issue with NFSv4.1 CLOSE stateids
Bugfixes + cleanups
- pNFS blocks bugfixes from Christoph
- Various cleanups from Anna
- More fixes for delegation corner cases
- Don't fsync twice for O_SYNC/IS_SYNC files
- Fix pNFS and flexfiles layoutstats bugs
- pnfs/flexfiles: avoid duplicate tracking of mirror data
- pnfs: Fix layoutget/layoutreturn/return-on-close serialisation
issues
- pnfs/flexfiles: error handling retries a layoutget before fallback
to MDS
Features:
- Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from
Kinglong
- More RDMA client transport improvements from Chuck
- Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr()
verbs from the SUNRPC, Lustre and core infiniband tree.
- Optimise away the close-to-open getattr if there is no cached data"
* tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits)
NFSv4: Respect the server imposed limit on how many changes we may cache
NFSv4: Express delegation limit in units of pages
Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"
NFS: Optimise away the close-to-open getattr if there is no cached data
NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb
NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()
nfs: Remove unneeded checking of the return value from scnprintf
nfs: Fix truncated client owner id without proto type
NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid
NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid
NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()
NFSv4.1/flexfiles: Fix freeing of mirrors
NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
NFSv4.1/pnfs: Handle LAYOUTGET return values correctly
NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
NFSv4.1: Fix a protocol issue with CLOSE stateids
NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors
SUNRPC: Prevent SYN+SYNACK+RST storms
SUNRPC: xs_reset_transport must mark the connection as disconnected
NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6ifEAAoJEAhfPr2O5OEVn5kP/i2jM1tWcmV/ZEBKGAN0jpRk
5Y/Q+rnXvOpIJSQC3dEkweoBymVMclSgSB/wFSWCZtp5MaB8KrH4/2uc3UvolF91
7bqXt+fCUacMbDQyaabMCR83mz9tdOJLd5sf0ABqBgXGfwh5uXmBPaYBzmcYvKcW
4D89MFUpaFDPARTs9rdpVyr0aPRU4GcN0R3snRO9Ly+cQnyV/RxPf9NqCgnI+yPq
+NvA9ScUBcBt62piSIGR4egcAR8boxYC+0r57340S21/JVMvsHQ3ok9b1aT8/rtd
Yl24FkcKrRV0ShN5S1RmW5DLH/HRGabuMjkiEz9xq52FGD2sQQda0At58dWivsa4
XYdxS9UUfb9Z+qyeMdmCl1MUFRrV2G4H6VItP+GKyT3UZLEDcLl6hBg3SkyWxWB4
CSO5WuRThiIB86OVcIaREftzqDy5HdvH3ZKRD7QrW0DItGVjQwV5j6gvwqO9OEXs
99BnSohyKwUBonumE2ZtFGGhIwIomllrMSqg991bPH9+13bg/rPxUqntkPrVap/9
cV3qKO8ZFrz5UInBnR1U83l60ZK7rV4G6AVMSMKpM9XVK9TDKryAUN9Mhj5XWRH8
hbma89TQVdhdrITtt27uzj8F622cvZvxd1BqDBR8DjKVvtv/E2GPzJrAj7GHe3/o
NgzP5fF6X2Si32GNb7J8
=cIed
-----END PGP SIGNATURE-----
Merge tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- new DVB frontend drivers: ascot2e, cxd2841er, horus3a, lnbh25
- new HDMI capture driver: tc358743
- new driver for NetUP DVB new boards (netup_unidvb)
- IR support for DVBSky cards (smipcie-ir)
- Coda driver has gain macroblock tiling support
- Renesas R-Car gains JPEG codec driver
- new DVB platform driver for STi boards: c8sectpfe
- added documentation for the media core kABI to device-drivers DocBook
- lots of driver fixups, cleanups and improvements
* tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (297 commits)
[media] c8sectpfe: Remove select on undefined LIBELF_32
[media] i2c: fix platform_no_drv_owner.cocci warnings
[media] cx231xx: Use wake_up_interruptible() instead of wake_up_interruptible_nr()
[media] tc358743: only queue subdev notifications if devnode is set
[media] tc358743: add missing Kconfig dependency/select
[media] c8sectpfe: Use %pad to print 'dma_addr_t'
[media] DocBook media: Fix typo "the the" in xml files
[media] tc358743: make reset gpio optional
[media] tc358743: set direction of reset gpio using devm_gpiod_get
[media] dvbdev: document most of the functions/data structs
[media] dvb_frontend.h: document the struct dvb_frontend
[media] dvb-frontend.h: document struct dtv_frontend_properties
[media] dvb-frontend.h: document struct dvb_frontend_ops
[media] dvb: Use DVBFE_ALGO_HW where applicable
[media] dvb_frontend.h: document struct analog_demod_ops
[media] dvb_frontend.h: Document struct dvb_tuner_ops
[media] Docbook: Document struct analog_parameters
[media] dvb_frontend.h: get rid of dvbfe_modcod
[media] add documentation for struct dvb_tuner_info
[media] dvb_frontend: document dvb_frontend_tune_settings
...
- Add Jeff Layton as an nfsd co-maintainer: no change to
existing practice, just an acknowledgement of the status quo.
- Two patches ("nfsd: ensure that...") for a race overlooked by
the state locking rewrite, causing a crash noticed by multiple
users.
- Lots of smaller bugfixes all over from Kinglong Mee.
- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6fbLAAoJECebzXlCjuG+qGkP/j2YnZynwqCa4uz1+FU7qfYI
kZWNGFFQ7O7e1i9Wznp7BkSA020rvM5d1HPwZhtstURM3i52XWRtbppwKF2+IuEU
tpNdPKb28BPCZO29Z8mQk9IS2sX5jmBiibXRqBk0VK7e43PXrIwg1LJJ9HOfOpLh
b1MvxdEB7vqK+fAVIYyhlg0UDd5AHAkQ+vS8YuohRXbDcsdhhE4vmusLlUl5UKp8
5Yunz+b+pXfXPYaKidmpar6U2KoRSTPP1uO3bNfN6URO1W1nchPadLs0DnsBKlhb
U8II5RZEmc+YfiIMoeptkJHoNhWT6Zu7CNJR6B0USTKv4L6TmFQVpxptVutzYVwx
sGJ65lvCiXXOPz8JJwvBty//HTmbyOiCm64/vMbhQRlSNLSmcmTXEpw/uT5Huaxx
bX9lnznoVVCd3eRoXPwMdZTbg/uEKqREZsQWVoqA6gexYqeyp79kvGbttLoUJ27Z
IjtNb9W6akxfPKrHMgan6j7dy866o6TdSfWRayHwUoswbNnVOnMYKHjApOtF0oev
k2pdLuy9tjl2a9Ow9sSwHZDbNsXgJO76E0aYnSTBP/YvctlG7KoZ+E0oxa6DWTC+
0dE+g1xhIuUtW5WRL4pfWWk1G7jnf16J91bKkn91VveDn666RncAbLBtePmpIcIu
5Ah6KxztTVCW++i5pmHh
=aecc
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Nothing major, but:
- Add Jeff Layton as an nfsd co-maintainer: no change to existing
practice, just an acknowledgement of the status quo.
- Two patches ("nfsd: ensure that...") for a race overlooked by the
state locking rewrite, causing a crash noticed by multiple users.
- Lots of smaller bugfixes all over from Kinglong Mee.
- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues"
* tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits)
nfsd: deal with DELEGRETURN racing with CB_RECALL
nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
nfsd: ensure that delegation stateid hash references are only put once
nfsd: ensure that the ol stateid hash reference is only put once
net: sunrpc: fix tracepoint Warning: unknown op '->'
nfsd: allow more than one laundry job to run at a time
nfsd: don't WARN/backtrace for invalid container deployment.
fs: fix fs/locks.c kernel-doc warning
nfsd: Add Jeff Layton as co-maintainer
NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
NFSD: Store parent's stat in a separate value
nfsd: Fix two typos in comments
lockd: NLM grace period shouldn't block NFSv4 opens
nfsd: include linux/nfs4.h in export.h
sunrpc: Switch to using hash list instead single list
sunrpc/nfsd: Remove redundant code by exports seq_operations functions
sunrpc: Store cache_detail in seq_file's private directly
...
Merge patch-bomb from Andrew Morton:
- a few misc things
- Andy's "ambient capabilities"
- fs/nofity updates
- the ocfs2 queue
- kernel/watchdog.c updates and feature work.
- some of MM. Includes Andrea's userfaultfd feature.
[ Hadn't noticed that userfaultfd was 'default y' when applying the
patches, so that got fixed in this merge instead. We do _not_ mark
new features that nobody uses yet 'default y' - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm/hugetlb.c: make vma_has_reserves() return bool
mm/madvise.c: make madvise_behaviour_valid() return bool
mm/memory.c: make tlb_next_batch() return bool
mm/dmapool.c: change is_page_busy() return from int to bool
mm: remove struct node_active_region
mremap: simplify the "overlap" check in mremap_to()
mremap: don't do uneccesary checks if new_len == old_len
mremap: don't do mm_populate(new_addr) on failure
mm: move ->mremap() from file_operations to vm_operations_struct
mremap: don't leak new_vma if f_op->mremap() fails
mm/hugetlb.c: make vma_shareable() return bool
mm: make GUP handle pfn mapping unless FOLL_GET is requested
mm: fix status code which move_pages() returns for zero page
mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
genalloc: add support of multiple gen_pools per device
genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
mm/memblock: WARN_ON when nid differs from overlap region
Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
mm: defer flush of writable TLB entries
mm: send one IPI per CPU to TLB flush all entries after unmapping pages
...
This implements the uABI of UFFDIO_COPY and UFFDIO_ZEROPAGE.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I had requests to return the full address (not the page aligned one) to
userland.
It's not entirely clear how the page offset could be relevant because
userfaults aren't like SIGBUS that can sigjump to a different place and it
actually skip resolving the fault depending on a page offset. There's
currently no real way to skip the fault especially because after a
UFFDIO_COPY|ZEROPAGE, the fault is optimized to be retried within the
kernel without having to return to userland first (not even self modifying
code replacing the .text that touched the faulting address would prevent
the fault to be repeated). Userland cannot skip repeating the fault even
more so if the fault was triggered by a KVM secondary page fault or any
get_user_pages or any copy-user inside some syscall which will return to
kernel code. The second time FAULT_FLAG_RETRY_NOWAIT won't be set leading
to a SIGBUS being raised because the userfault can't wait if it cannot
release the mmap_map first (and FAULT_FLAG_RETRY_NOWAIT is required for
that).
Still returning userland a proper structure during the read() on the uffd,
can allow to use the current UFFD_API for the future non-cooperative
extensions too and it looks cleaner as well. Once we get additional
fields there's no point to return the fault address page aligned anymore
to reuse the bits below PAGE_SHIFT.
The only downside is that the read() syscall will read 32bytes instead of
8bytes but that's not going to be measurable overhead.
The total number of new events that can be extended or of new future bits
for already shipped events, is limited to 64 by the features field of the
uffdio_api structure. If more will be needed a bump of UFFD_API will be
required.
[akpm@linux-foundation.org: use __packed]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is (seems to be) the minimal thing that is required to unblock
standard uffd usage from the non-cooperative one. Now more bits can be
added to the features field indicating e.g. UFFD_FEATURE_FORK and others
needed for the latter use-case.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Per Andrew Morgan's request, add a securebit to allow admins to disable
PR_CAP_AMBIENT_RAISE. This securebit will prevent processes from adding
capabilities to their ambient set.
For simplicity, this disables PR_CAP_AMBIENT_RAISE entirely rather than
just disabling setting previously cleared bits.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Aaron Jones <aaronmdjones@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Markku Savela <msa@moth.iki.fi>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Credit where credit is due: this idea comes from Christoph Lameter with
a lot of valuable input from Serge Hallyn. This patch is heavily based
on Christoph's patch.
===== The status quo =====
On Linux, there are a number of capabilities defined by the kernel. To
perform various privileged tasks, processes can wield capabilities that
they hold.
Each task has four capability masks: effective (pE), permitted (pP),
inheritable (pI), and a bounding set (X). When the kernel checks for a
capability, it checks pE. The other capability masks serve to modify
what capabilities can be in pE.
Any task can remove capabilities from pE, pP, or pI at any time. If a
task has a capability in pP, it can add that capability to pE and/or pI.
If a task has CAP_SETPCAP, then it can add any capability to pI, and it
can remove capabilities from X.
Tasks are not the only things that can have capabilities; files can also
have capabilities. A file can have no capabilty information at all [1].
If a file has capability information, then it has a permitted mask (fP)
and an inheritable mask (fI) as well as a single effective bit (fE) [2].
File capabilities modify the capabilities of tasks that execve(2) them.
A task that successfully calls execve has its capabilities modified for
the file ultimately being excecuted (i.e. the binary itself if that
binary is ELF or for the interpreter if the binary is a script.) [3] In
the capability evolution rules, for each mask Z, pZ represents the old
value and pZ' represents the new value. The rules are:
pP' = (X & fP) | (pI & fI)
pI' = pI
pE' = (fE ? pP' : 0)
X is unchanged
For setuid binaries, fP, fI, and fE are modified by a moderately
complicated set of rules that emulate POSIX behavior. Similarly, if
euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
(primary, fP and fI usually end up being the full set). For nonroot
users executing binaries with neither setuid nor file caps, fI and fP
are empty and fE is false.
As an extra complication, if you execute a process as nonroot and fE is
set, then the "secure exec" rules are in effect: AT_SECURE gets set,
LD_PRELOAD doesn't work, etc.
This is rather messy. We've learned that making any changes is
dangerous, though: if a new kernel version allows an unprivileged
program to change its security state in a way that persists cross
execution of a setuid program or a program with file caps, this
persistent state is surprisingly likely to allow setuid or file-capped
programs to be exploited for privilege escalation.
===== The problem =====
Capability inheritance is basically useless.
If you aren't root and you execute an ordinary binary, fI is zero, so
your capabilities have no effect whatsoever on pP'. This means that you
can't usefully execute a helper process or a shell command with elevated
capabilities if you aren't root.
On current kernels, you can sort of work around this by setting fI to
the full set for most or all non-setuid executable files. This causes
pP' = pI for nonroot, and inheritance works. No one does this because
it's a PITA and it isn't even supported on most filesystems.
If you try this, you'll discover that every nonroot program ends up with
secure exec rules, breaking many things.
This is a problem that has bitten many people who have tried to use
capabilities for anything useful.
===== The proposed change =====
This patch adds a fifth capability mask called the ambient mask (pA).
pA does what most people expect pI to do.
pA obeys the invariant that no bit can ever be set in pA if it is not
set in both pP and pI. Dropping a bit from pP or pI drops that bit from
pA. This ensures that existing programs that try to drop capabilities
still do so, with a complication. Because capability inheritance is so
broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
then calling execve effectively drops capabilities. Therefore,
setresuid from root to nonroot conditionally clears pA unless
SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can
re-add bits to pA afterwards.
The capability evolution rules are changed:
pA' = (file caps or setuid or setgid ? 0 : pA)
pP' = (X & fP) | (pI & fI) | pA'
pI' = pI
pE' = (fE ? pP' : pA')
X is unchanged
If you are nonroot but you have a capability, you can add it to pA. If
you do so, your children get that capability in pA, pP, and pE. For
example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
automatically bind low-numbered ports. Hallelujah!
Unprivileged users can create user namespaces, map themselves to a
nonzero uid, and create both privileged (relative to their namespace)
and unprivileged process trees. This is currently more or less
impossible. Hallelujah!
You cannot use pA to try to subvert a setuid, setgid, or file-capped
program: if you execute any such program, pA gets cleared and the
resulting evolution rules are unchanged by this patch.
Users with nonzero pA are unlikely to unintentionally leak that
capability. If they run programs that try to drop privileges, dropping
privileges will still work.
It's worth noting that the degree of paranoia in this patch could
possibly be reduced without causing serious problems. Specifically, if
we allowed pA to persist across executing non-pA-aware setuid binaries
and across setresuid, then, naively, the only capabilities that could
leak as a result would be the capabilities in pA, and any attacker
*already* has those capabilities. This would make me nervous, though --
setuid binaries that tried to privilege-separate might fail to do so,
and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
unexpected side effects. (Whether these unexpected side effects would
be exploitable is an open question.) I've therefore taken the more
paranoid route. We can revisit this later.
An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
ambient capabilities. I think that this would be annoying and would
make granting otherwise unprivileged users minor ambient capabilities
(CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
it is with this patch.
===== Footnotes =====
[1] Files that are missing the "security.capability" xattr or that have
unrecognized values for that xattr end up with has_cap set to false.
The code that does that appears to be complicated for no good reason.
[2] The libcap capability mask parsers and formatters are dangerously
misleading and the documentation is flat-out wrong. fE is *not* a mask;
it's a single bit. This has probably confused every single person who
has tried to use file capabilities.
[3] Linux very confusingly processes both the script and the interpreter
if applicable, for reasons that elude me. The results from thinking
about a script's file capabilities and/or setuid bits are mostly
discarded.
Preliminary userspace code is here, but it needs updating:
https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2
Here is a test program that can be used to verify the functionality
(from Christoph):
/*
* Test program for the ambient capabilities. This program spawns a shell
* that allows running processes with a defined set of capabilities.
*
* (C) 2015 Christoph Lameter <cl@linux.com>
* Released under: GPL v3 or later.
*
*
* Compile using:
*
* gcc -o ambient_test ambient_test.o -lcap-ng
*
* This program must have the following capabilities to run properly:
* Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
*
* A command to equip the binary with the right caps is:
*
* setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
*
*
* To get a shell with additional caps that can be inherited by other processes:
*
* ./ambient_test /bin/bash
*
*
* Verifying that it works:
*
* From the bash spawed by ambient_test run
*
* cat /proc/$$/status
*
* and have a look at the capabilities.
*/
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <cap-ng.h>
#include <sys/prctl.h>
#include <linux/capability.h>
/*
* Definitions from the kernel header files. These are going to be removed
* when the /usr/include files have these defined.
*/
#define PR_CAP_AMBIENT 47
#define PR_CAP_AMBIENT_IS_SET 1
#define PR_CAP_AMBIENT_RAISE 2
#define PR_CAP_AMBIENT_LOWER 3
#define PR_CAP_AMBIENT_CLEAR_ALL 4
static void set_ambient_cap(int cap)
{
int rc;
capng_get_caps_process();
rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
if (rc) {
printf("Cannot add inheritable cap\n");
exit(2);
}
capng_apply(CAPNG_SELECT_CAPS);
/* Note the two 0s at the end. Kernel checks for these */
if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
perror("Cannot set cap");
exit(1);
}
}
int main(int argc, char **argv)
{
int rc;
set_ambient_cap(CAP_NET_RAW);
set_ambient_cap(CAP_NET_ADMIN);
set_ambient_cap(CAP_SYS_NICE);
printf("Ambient_test forking shell\n");
if (execv(argv[1], argv + 1))
perror("Cannot exec");
return 0;
}
Signed-off-by: Christoph Lameter <cl@linux.com> # Original author
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Aaron Jones <aaronmdjones@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Markku Savela <msa@moth.iki.fi>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>