This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We don't have to write protect guest memory for dirty logging if architecture
supports hardware dirty logging, such as PML on VMX, so rename it to be more
generic.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1. Generic
- sparse warning (make function static)
- optimize locking
- bugfixes for interrupt injection
- fix MVPG addressing modes
2. hrtimer/wakeup fun
A recent change can cause KVM hangs if adjtime is used in the host.
The hrtimer might wake up too early or too late. Too early is fatal
as vcpu_block will see that the wakeup condition is not met and
sleep again. This CPU might never wake up again.
This series addresses this problem. adjclock slowing down the host
clock will result in too late wakeups. This will require more work.
In addition to that we also change the hrtimer from REALTIME to
MONOTONIC to avoid similar problems with timedatectl set-time.
3. sigp rework
We will move all "slow" sigps to QEMU (protected with a capability that
can be enabled) to avoid several races between concurrent SIGP orders.
4. Optimize the shadow page table
Provide an interface to announce the maximum guest size. The kernel
will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.
5. Provide an interface to set the guest TOD
We now use two vm attributes instead of two oneregs, as oneregs are
vcpu ioctl and we don't want to call them from other threads.
6. Protected key functions
The real HMC allows to enable/disable protected key CPACF functions.
Lets provide an implementation + an interface for QEMU to activate
this the protected key instructions.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJUwj60AAoJEBF7vIC1phx8iV0QAKq1LZRTmgTLS2fd0oyWKZeN
ShWUIUiB+7IUiuogYXZMfqOm61oogxwc95Ti+3tpSWYwkzUWagpS/RJQze7E1HOc
3pHpXwrR01ueUT6uVV4xc/vmVIlQAIl/ScRDDPahlAT2crCleWcKVC9l0zBs/Kut
IrfzN9pJcrkmXD178CDP8/VwXsn02ptLQEpidGibGHCd03YVFjp3X0wfwNdQxMbU
qOwNYCz3SLfDm5gsybO2DG+aVY3AbM2ZOJt/qLv2j4Phz4XB4t4W9iJnAefSz7JA
W4677wbMQpfZlUQYhI78H/Cl9SfWAuLug1xk83O/+lbEiR5u+8zLxB69dkFTiBaH
442OY957T6TQZ/V9d0jDo2XxFrcaU9OONbVLsfBQ56Vwv5cAg9/7zqG8eqH7Nq9R
gU3fQesgD4N0Kpa77T9k45TT/hBRnUEtsGixAPT6QYKyE6cK4AJATHKSjMSLbdfj
ELbt0p2mVtKhuCcANfEx54U2CxOrg5ElBmPz8hRw0OkXdwpqh1sGKmt0govcHP1I
BGSzE9G4mswwI1bQ7cqcyTk/lwL8g3+KQmRJoOcgCveQlnY12X5zGD5DhuPMPiIT
VENqbcTzjlxdu+4t7Enml+rXl7ySsewT9L231SSrbLsTQVgCudD1B9m72WLu5ZUT
9/Z6znv6tkeKV5rM9DYE
=zLjR
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
KVM: s390: fixes and features for kvm/next (3.20)
1. Generic
- sparse warning (make function static)
- optimize locking
- bugfixes for interrupt injection
- fix MVPG addressing modes
2. hrtimer/wakeup fun
A recent change can cause KVM hangs if adjtime is used in the host.
The hrtimer might wake up too early or too late. Too early is fatal
as vcpu_block will see that the wakeup condition is not met and
sleep again. This CPU might never wake up again.
This series addresses this problem. adjclock slowing down the host
clock will result in too late wakeups. This will require more work.
In addition to that we also change the hrtimer from REALTIME to
MONOTONIC to avoid similar problems with timedatectl set-time.
3. sigp rework
We will move all "slow" sigps to QEMU (protected with a capability that
can be enabled) to avoid several races between concurrent SIGP orders.
4. Optimize the shadow page table
Provide an interface to announce the maximum guest size. The kernel
will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.
5. Provide an interface to set the guest TOD
We now use two vm attributes instead of two oneregs, as oneregs are
vcpu ioctl and we don't want to call them from other threads.
6. Protected key functions
The real HMC allows to enable/disable protected key CPACF functions.
Lets provide an implementation + an interface for QEMU to activate
this the protected key instructions.
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU
We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).
This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.
For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller. This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed. But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
With all of the GICv3 code in place now we allow userland to ask the
kernel for using a virtual GICv3 in the guest.
Also we provide the necessary support for guests setting the memory
addresses for the virtual distributor and redistributors.
This requires some userland code to make use of that feature and
explicitly ask for a virtual GICv3.
Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
considered legacy and using KVM_CREATE_DEVICE is preferred.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
With all the necessary GICv3 emulation code in place, we can now
connect the code to the GICv3 backend in the kernel.
The LR register handling is different depending on the emulated GIC
model, so provide different implementations for each.
Also allow non-v2-compatible GICv3 implementations (which don't
provide MMIO regions for the virtual CPU interface in the DT), but
restrict those hosts to support GICv3 guests only.
If the device tree provides a GICv2 compatible GICV resource entry,
but that one is faulty, just disable the GICv2 emulation and let the
user use at least the GICv3 emulation for guests.
To provide proper support for the legacy KVM_CREATE_IRQCHIP ioctl,
note virtual GICv2 compatibility in struct vgic_params and use it
on creating a VGICv2.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
While the generation of a (virtual) inter-processor interrupt (SGI)
on a GICv2 works by writing to a MMIO register, GICv3 uses the system
register ICC_SGI1R_EL1 to trigger them.
Add a trap handler function that calls the new SGI register handler
in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0,
this will not trap yet, but will only be used later when all the data
structures have been initialized properly.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The gic_send_sgi() function used hardcoded bit shift values to
generate the ICC_SGI1R_EL1 register value.
Replace this with symbolic names to allow reusing them later.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
With everything separated and prepared, we implement a model of a
GICv3 distributor and redistributors by using the existing framework
to provide handler functions for each register group.
Currently we limit the emulation to a model enforcing a single
security state, with SRE==1 (forcing system register access) and
ARE==1 (allowing more than 8 VCPUs).
We share some of the functions provided for GICv2 emulation, but take
the different ways of addressing (v)CPUs into account.
Save and restore is currently not implemented.
Similar to the split-off of the GICv2 specific code, the new emulation
code goes into a new file (vgic-v3-emul.c).
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
GIC CPU interface for EL1 (guests). Currently we force it to 0, but
for proper GICv3 support we have to allow guests to use it (depending
on their selected virtual GIC model).
So add ICC_SRE_EL1 to the list of saved/restored registers on a
world switch, but actually disallow a guest to change it by only
restoring a fixed, once-initialized value.
This value depends on the GIC model userland has chosen for a guest.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently the maximum number of vCPUs supported is a global value
limited by the used GIC model. GICv3 will lift this limit, but we
still need to observe it for guests using GICv2.
So the maximum number of vCPUs is per-VM value, depending on the
GIC model the guest uses.
Store and check the value in struct kvm_arch, but keep it down to
8 for now.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently we unconditionally register the GICv2 emulation device
during the host's KVM initialization. Since with GICv3 support we
may end up with only v2 or only v3 or both supported, we move the
registration into the GIC probing function, where we will later know
which combination is valid.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently we only have one virtual GIC model supported, so all guests
use the same emulation code. With the addition of another model we
end up with different guests using potentially different vGIC models,
so we have to split up some functions to be per VM.
Introduce a vgic_vm_ops struct to hold function pointers for those
functions that are different and provide the necessary code to
initialize them.
Also split up the vgic_init() function to separate out VGIC model
specific functionality into a separate function, which will later be
different for a GICv3 model.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
With the introduction of a second emulated GIC model we need to let
userspace specify the GIC model to use for each VM. Pass the
userspace provided value down into the vGIC code and store it there
to differentiate later.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
kvm_get_dirty_log() provides generic handling of dirty bitmap, currently reused
by several architectures. Building on that we intrdoduce
kvm_get_dirty_log_protect() adding write protection to mark these pages dirty
for future write access, before next KVM_GET_DIRTY_LOG ioctl call from user
space.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
Nothing too exciting as a new year's start here: most of fixes are for
ASoC, a boot crash fix on OMAP for deferred probe, a few driver
specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
fixes in kerneldoc comments for PCM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJUplctAAoJEGwxgFQ9KSmkmjwP/0Ccw2+8sh3uqPbmiwCBOYYL
AUo0nhjvsoeCFqgs1rUdQBEeVqvTw5wckhmnvDAYri4wSlnT078m9o4lJpTo/Bem
1VngAWjcA7CErhaU9G988znGAuIWhf9nr/DVhabIr22vmsPrGUDQrC3B/28596mn
Pcj2H412vZRMOu6+ZDKX+9oxAWBnP9UDD3hjrKjTCUxwVlhfxAfdbsPNbkivIrLb
QAkhqIHuMlb3D2MaBUjbQ+CQf+Q0N1szZoy9t83OCUgbNhmNObuwvso21mD92mwQ
4reW0TdFHXOk3HrY0RBMkvnmrRzZoPBWxBH0bCTLO87rBtZC11IdkMpQ9mZ7gO9Q
p/P8z+S1I8nuVmFamW0IbSjslVl8+zh/hJ3H+t4bZlOJgXevcl+VwIocsNcdPhTo
V8CqcgwKLwwMuPOgHsiHk+2uCDxm23zmlgo5jyOuKuSnKxo5qu1aH3Mw7mU3S3dd
+AOWmM4ighCK2rXkFJSH4UV7/bFIFGS0wfC7adqHW2BaXS/CRStwoYw/7C48c9Mk
bY3vSjk7HwOebdnhwXclqWpLDPnmAWWfQvYW+mYakIWX0+qLpL1BFQUr6fYwmLMP
GpPBdgqZmrchwEKQM5Wg+Gz4jlkOXzs1qUMidq4f6JxM2A1H2A8BfTz3pScJqqDh
+lHj5upG2yVpYQbf4DEf
=ARQX
-----END PGP SIGNATURE-----
Merge tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Nothing too exciting as a new year's start here: most of fixes are for
ASoC, a boot crash fix on OMAP for deferred probe, a few driver
specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
fixes in kerneldoc comments for PCM"
* tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Fix kerneldoc for params_*() functions
ASoC: rockchip: i2s: fix maxburst of dma data to 4
ASoC: rockchip: i2s: fix error defination of transmit data level
ASoC: Intel: correct the fixed free block allocation
ASoC: rt5677: fixed rt5677_dsp_vad_put rt5677_dsp_vad_get panic
ASoC: Intel: Fix BYTCR machine driver MODULE_ALIAS
ASoC: Intel: Fix BYTCR firmware name
ASoC: dwc: Iterate over all channels
ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
ASoC: Intel: Add I2C dependency to two new machines
ASoC: dapm: Remove snd_soc_of_parse_audio_routing() due to deferred probe
There's a single change here, fixing a vhost bug where vhost initialization
fails due to used ring alignment check being too strict.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUoRcnAAoJECgfDbjSjVRpb7IIAIGoNJkB56Q3WWX0rP5i1Lqi
Uxt8lrvySotzoWMVZlm2pGRiwJv2T1dQXKWetvZDn0GuwCBi2/vm2l4eOhu1K+VV
jOtKk8n8kDJVgMOGaJUwqt8lhDbqGwHHDUCKDk78/pV5Q959bAVo143vWOh2DixQ
m3YmvmLwyhF6CYNwOD0HqllaojDPBAP3k22GdmJNy/H7CpcOSYK7BRoiQHX7FsbR
I3Kj/7gf5OlZFzLTLxNx/O01p1BkX6EusADfl3+tl6nggc6vqW9fHNz/TZLGh9I/
ap0Ng3pm8q/t6S6wsPj8wzTHNesK4gVWvecRN8kZV8UQY2rwd/y7R/k+VpJDseg=
=+Pmi
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost cleanup and virtio bugfix
"There's a single change here, fixing a vhost bug where vhost
initialization fails due to used ring alignment check being too
strict"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: relax used address alignment
virtio_ring: document alignment requirements
Pull input layer fixes from Dmitry Torokhov:
"Fixes for v7 protocol for ALPS devices and few other driver fixes.
Also users can request input events to be stamped with boot time
timestamps, in addition to real and monotonic timestamps"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: hil_kbd - fix incorrect use of init_completion
Input: alps - v7: document the v7 touchpad packet protocol
Input: alps - v7: fix finger counting for > 2 fingers on clickpads
Input: alps - v7: sometimes a single touch is reported in mt[1]
Input: alps - v7: ignore new packets
Input: evdev - add CLOCK_BOOTTIME support
Input: psmouse - expose drift duration for IBM trackpoints
Input: stmpe - bias keypad columns properly
Input: stmpe - enforce device tree only mode
mfd: stmpe: add pull up/down register offsets for STMPE
Input: optimize events_per_packet count calculation
Input: edt-ft5x06 - fixed a macro coding style issue
Input: gpio_keys - replace timer and workqueue with delayed workqueue
Input: gpio_keys - allow separating gpio and irq in device tree
Pull networking fixes from David Miller:
1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.
2) Fix receive checksum handling in enic driver, from Govindarajulu
Varadarajan.
3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
Herbert Xu. Also, add code to detect drivers that have this mistake
in the future.
4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.
5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
input path,f rom Nicolas Dichtel.
6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.
7) Fix double SKB free in vxlan driver, also from Pravin.
8) When we scrub a packet, which happens when we are switching the
context of the packet (namespace, etc.), we should reset the
secmark. From Thomas Graf.
9) ->ndo_gso_check() needs to do more than return true/false, it also
has to allow the driver to clear netdev feature bits in order for
the caller to be able to proceed properly. From Jesse Gross.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
netlink/genetlink: pass network namespace to bind/unbind
ne2k-pci: Add pci_disable_device in error handling
bonding: change error message to debug message in __bond_release_one()
genetlink: pass multicast bind/unbind to families
netlink: call unbind when releasing socket
netlink: update listeners directly when removing socket
genetlink: pass only network namespace to genl_has_listeners()
netlink: rename netlink_unbind() to netlink_undo_bind()
net: Generalize ndo_gso_check to ndo_features_check
net: incorrect use of init_completion fixup
neigh: remove next ptr from struct neigh_table
net: xilinx: Remove unnecessary temac_property in the driver
net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
openvswitch: fix odd_ptr_err.cocci warnings
Bluetooth: Fix accepting connections when not using mgmt
Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
brcmfmac: Do not crash if platform data is not populated
ipw2200: select CFG80211_WEXT
...
Fix a copy and paste error in the kernel doc description for the params_*()
functions.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
- Fix for a potential NULL pointer dereference in the cpufreq
core due to an initialization race condition (Ethan Zhao).
- Fixes for abuse of the OPP (Operating Performance Points) API
related to RCU and other minor issues in the OPP library and
the cpufreq-dt driver (Dmitry Torokhov).
- cpuidle governors cleanup making them measure idle duration in
a better way without using the CPUIDLE_FLAG_TIME_INVALID flag
which allows that flag to be dropped from the ACPI cpuidle driver
and from the core too (Len Brown).
- New ACPI backlight blacklist entries for Samsung machines
without a working native backlight interface that need to
use the ACPI backlight instead (Aaron Lu).
- New CPU IDs of future Intel Xeon CPUs for the Intel RAPL power
capping driver (Jacob Pan).
- Generic power domains framework modification to export the
of_genpd_get_from_provider() function to modular drivers that
will allow future driver modifications to be based on the mainline
(Amit Daniel Kachhap).
- Two fixes for the cpupower tool (Michal Privoznik, Prarit Bhargava).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUoccdAAoJEILEb/54YlRxP9IQAJIAz5RSRQpv50nq+oMLER/S
G4FrVRgELYNTCvujlsgTydSa2IP/G4bPfa7zMquIs69j0L4/65z5aDAMNfHMiNQy
6ZC1Vey7mS+A91pdhhkI3MZ9GIu/HIBnXdhhzQaVOTFG+X8Ro+9WZva6fzgqhw3p
hhtjdNkiURoul+uGL0/tk/Ia228gn0Evw+DiPxH+6TyuR2Nzuolc1mKBqE4n948l
e99U3SPqT2/fH4otYKXKgz60NldRu9EjBtjSanLRlSEHw8NXI/fbaQJ/YUPL8qnT
gfvm694nhU7lERG4jhKMIdx2ubeMpjyRLbgDu+8XmAyMUik9nHfFG5wC04V6vSP5
0JgJyFfXT0HjcvHf1EyFZ5zvqy5FrRUV81E0MXz1Yi14E/MiBvUyNz+ThTQ+7VYT
MIdFcqo7hSOjXFjMU6orkxGWqJQpKz2I4BBuQikvCnHQxkevscRS8G0SpxLp6nyD
dFBQBZgztZc0FbqlQINszPOa7FXwL042nk9+YobEyE9i3v7lC8zMwNtAav76LSTd
Z0/OivQziIwyNxIjrNkTXGnqhxXwKGMQ/JOywaswos7MdnCS2B9sRgn9zCZZ0JSs
Z59gAli3A0/qv7YiWy8jxEkL8uqeNu1PCmiwlVuEl15/p+C9w+t4iFJi9DiGFXiF
sJOZmy1GcmQCGr9m+qA0
=CYBE
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI material from Rafael J Wysocki:
"These are fixes (operating performance points library, cpufreq-dt
driver, cpufreq core, ACPI backlight, cpupower tool), cleanups
(cpuidle), new processor IDs for the RAPL (Running Average Power
Limit) power capping driver, and a modification of the generic power
domains framework allowing modular drivers to call one of its helper
functions.
Specifics:
- Fix for a potential NULL pointer dereference in the cpufreq core
due to an initialization race condition (Ethan Zhao).
- Fixes for abuse of the OPP (Operating Performance Points) API
related to RCU and other minor issues in the OPP library and the
cpufreq-dt driver (Dmitry Torokhov).
- cpuidle governors cleanup making them measure idle duration in a
better way without using the CPUIDLE_FLAG_TIME_INVALID flag which
allows that flag to be dropped from the ACPI cpuidle driver and
from the core too (Len Brown).
- New ACPI backlight blacklist entries for Samsung machines without a
working native backlight interface that need to use the ACPI
backlight instead (Aaron Lu).
- New CPU IDs of future Intel Xeon CPUs for the Intel RAPL power
capping driver (Jacob Pan).
- Generic power domains framework modification to export the
of_genpd_get_from_provider() function to modular drivers that will
allow future driver modifications to be based on the mainline (Amit
Daniel Kachhap).
- Two fixes for the cpupower tool (Michal Privoznik, Prarit
Bhargava)"
* tag 'pm+acpi-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: Add some Samsung models to disable_native_backlight list
tools / cpupower: Fix no idle state information return value
tools / cpupower: Correctly detect if running as root
cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
cpufreq-dt: defer probing if OPP table is not ready
PM / OPP: take RCU lock in dev_pm_opp_get_opp_count
PM / OPP: fix warning in of_free_opp_table()
PM / OPP: add some lockdep annotations
powercap / RAPL: add IDs for future Xeon CPUs
PM / Domains: Export of_genpd_get_from_provider function
cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID
cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
Pull thermal management updates from Zhang Rui:
"First of all, the most important change is the thermal cpu cooling
fixes. The major fix here is to have proper sequencing between
cpufreq layer and thermal cpu cooling registration. A take away of
this fix is an improvement in the thermal drivers code. Thermal
drivers that require cpu cooling do not need to check for cpufreq
layer. The requirement now is to propagate the error code, if any,
while registering cpu cooling device. Thanks to Viresh for
implementing the required CPUfreq changes.
Second, a new driver is introduced for int340x processor thermal
device. Given that int340x thermal is disabled by default, and this
processor thermal device is only available on limited platforms, plus
the driver does nothing but exposes some thermal limitation
information for user space to use, thus I think it is safe to include
it in this pull request after missing 3.19-rc2.
Specifics:
- Thermal cpu cooling fixes and cleanups.
- introduce INT340X processor thermal reporting device driver.
- several small fixes and cleanups for int340x thermal drivers"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (43 commits)
Thermal/int340x/int3403: Free acpi notification handler
Thermal/int340x/processor_thermal: Fix memory leak
Thermal/int340x/int3403: Fix memory leak
thermal: int340x: Introduce processor reporting device
thermal: int340x_thermal: drop owner assignment from platform_drivers
thermal: drop owner assignment from platform_drivers
thermal: cpu_cooling: document node in struct cpufreq_cooling_device
thermal/powerclamp: add ids for future xeon cpus
Thermal/int340x: Handle properly the case when _trt or _art acpi entry is missing
thermal: cpu_cooling: return ERR_PTR() for !CPU_THERMAL or !THERMAL_OF
thermal: cpu_cooling: small memory leak on error
thermal: ti-soc-thermal: Do not print error message in the EPROBE_DEFER case
thermal: db8500: Do not print error message in the EPROBE_DEFER case
thermal: imx: Do not print error message in the EPROBE_DEFER case
thermal: Fix cdev registration with THERMAL_NO_LIMIT on 64bit
drivers: thermal: Remove ARCH_HAS_BANDGAP dependency for samsung
thermal:core:fix: Check return code of the ->get_max_state() callback
thermal: cpu_cooling: update copyright tags
thermal: cpu_cooling: Use cpufreq_dev->freq_table for finding level/freq
thermal: cpu_cooling: Store frequencies in descending order
...
Commit 2457aec637 ("mm: non-atomically mark page accessed during page
cache allocation where possible") has added a separate parameter for
specifying gfp mask for radix tree allocations.
Not only this is less than optimal from the API point of view because it
is error prone, it is also buggy currently because
grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
the radix tree allocation wouldn't obey the restriction and might
recurse into filesystem and cause deadlocks. This is the case for most
filesystems unfortunately because only ext4 and gfs2 are using
AOP_FLAG_NOFS.
Let's simply remove radix_gfp_mask parameter because the allocation
context is same for both page cache and for the radix tree. Just make
sure that the radix tree gets only the sane subset of the mask (e.g. do
not pass __GFP_WRITE).
Long term it is more preferable to convert remaining users of
AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
interface even further.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm-domains:
PM / Domains: Export of_genpd_get_from_provider function
* powercap:
powercap / RAPL: add IDs for future Xeon CPUs
* pm-tools:
tools / cpupower: Fix no idle state information return value
tools / cpupower: Correctly detect if running as root
* pm-cpufreq:
cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
cpufreq-dt: defer probing if OPP table is not ready
* pm-cpuidle:
cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID
cpuidle: ladder: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
cpuidle: menu: Better idle duration measurement without using CPUIDLE_FLAG_TIME_INVALID
Host needs to know vring element alignment requirements:
simply doing alignof on structures doesn't work reliably: on some
platforms gcc has alignof(uint32_t) == 2.
Add macros for alignment as specified in virtio 1.0 cs01,
export them to userspace as well.
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.
To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.
Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make the newly fixed multicast bind/unbind
functionality in generic netlink, pass them down to the
appropriate family.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no point to force the caller to know about the internal
genl_sock to use inside struct net, just have them pass the network
namespace. This doesn't really change code generation since it's
an inline, but makes the caller less magic - there's never any
reason to pass another socket.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.
This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.
By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.
CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Tom Herbert <therbert@google.com>
Fixes: 04ffcb255f ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit
d7480fd3b1 ("neigh: remove dynamic neigh table registration support"),
this field is not used anymore.
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm fixes from Dave Airlie:
"Xmas fixes pull:
core:
one atomic fix, revert the WARN_ON dumb buffers patch.
agp:
fixup Dave J.
nouveau:
fix 3.18 regression for old userspace
tegra fixes:
vblank and iommu fixes
amdkfd:
fix bugs shown by testing with userspace, init apertures once
msm:
hdmi fixes and cleanup
i915:
misc fixes
There is also a link ordering fix that I've asked to be cc'ed to you,
putting iommu before gpu, it fixes an issue with amdkfd when things
are all in the kernel, but I didn't like sending it via my tree
without discussion.
I'll probably be a bit on/off for a few weeks with pulls now, due to
holidays and LCA, so don't be surprised if stuff gets a bit backed up,
and things end up a bit large due to lag"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits)
Revert "drm/gem: Warn on illegal use of the dumb buffer interface v2"
agp: Fix up email address & attributions in AGP MODULE_AUTHOR tags
nouveau: bring back legacy mmap handler
drm/msm/hdmi: rework HDMI IRQ handler
drm/msm/hdmi: enable regulators before clocks to avoid warnings
drm/msm/mdp5: update irqs on crtc<->encoder link change
drm/msm: block incoming update on pending updates
drm/atomic: fix potential null ptr on plane enable
drm/msm: Deletion of unnecessary checks before the function call "release_firmware"
drm/msm: Deletion of unnecessary checks before two function calls
drm/tegra: dc: Select root window for event dispatch
drm/tegra: gem: Use the proper size for GEM objects
drm/tegra: gem: Flush buffer objects upon allocation
drm/tegra: dc: Fix a potential race on page-flip completion
drm/tegra: dc: Consistently use the same pipe
drm/irq: Add drm_crtc_vblank_count()
drm/irq: Add drm_crtc_handle_vblank()
drm/irq: Add drm_crtc_send_vblank_event()
drm/i915: Disable PSMI sleep messages on all rings around context switches
drm/i915: Force the CS stall for invalidate flushes
...
This reverts commit 355a701838.
This had some bad side effects under normal operation, and should
have been dropped earlier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Pull audit fixes from Paul Moore:
"Four patches to fix various problems with the audit subsystem, all are
fairly small and straightforward.
One patch fixes a problem where we weren't using the correct gfp
allocation flags (GFP_KERNEL regardless of context, oops), one patch
fixes a problem with old userspace tools (this was broken for a
while), one patch fixes a problem where we weren't recording pathnames
correctly, and one fixes a problem with PID based filters.
In general I don't think there is anything controversial with this
patchset, and it fixes some rather unfortunate bugs; the allocation
flag one can be particularly scary looking for users"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: restore AUDIT_LOGINUID unset ABI
audit: correctly record file names with different path name types
audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb
audit: don't attempt to lookup PIDs when changing PID filtering audit rules
A regression was caused by commit 780a7654cee8:
audit: Make testing for a valid loginuid explicit.
(which in turn attempted to fix a regression caused by e1760bd)
When audit_krule_to_data() fills in the rules to get a listing, there was a
missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.
This broke userspace by not returning the same information that was sent and
expected.
The rule:
auditctl -a exit,never -F auid=-1
gives:
auditctl -l
LIST_RULES: exit,never f24=0 syscall=all
when it should give:
LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all
Tag it so that it is reported the same way it was set. Create a new
private flags audit_krule field (pflags) to store it that won't interact with
the public one from the API.
Cc: stable@vger.kernel.org # v3.10-rc1+
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This is a set of fixes for two regressions and one bug in the IOMMU
mapping code. It turns out that all of these issues turn up primarily
on Tegra30 hardware. The IOMMU mapping bug only manifests on buffers
that aren't multiples of the page size. I happened to be testing HDMI
with 1080p while writing the code and framebuffers for that happen to
fit exactly within 2025 pages of 4 KiB each.
One of the regressions is caused by the IOMMU code allocating pages from
shmem which can have associated cache lines. If the pages aren't flushed
then these cache lines may be flushed later on and cause framebuffer
corruption. I'm not sure why I didn't see this before. Perhaps the board
that I was using had enough RAM so that the pages shmem would hand out
had a better chance of being unused. Or maybe I didn't look too closely.
The fix for this is to fake up an SG table so that it can be passed to
the DMA API. Ideally this would use drm_clflush_*(), but implementing
that for ARM causes DRM to fail to build as a module since some of the
low-level cache maintenance functions aren't exported. Hopefully we can
get a suitable API exported on ARM for the next release.
The second regression is caused by a mismatch between the hardware pipe
number and the CRTC's DRM index. These were used inconsistently, which
could cause one code location to call drm_vblank_get() with a different
pipe than the corresponding drm_vblank_put(), thereby causing the
reference count to become unbalanced. Alexandre also reported a possible
race condition related to this, which this series also fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJUkYzSAAoJEN0jrNd/PrOhFWgP+wYiyXiLot7Wo3+HM779fQ9a
MZkOycfxyNJ+TxJjvIlJh/2y641G4Elw3rod/QhUKg1b0L2uqVrVRKvEsx5sR5Ci
XASwkx9UFLRxN6/Cr/X8SKmE7nFUSwGd3wSoVrT42ldo0DOOlHuVT9NLFoCfDmFa
GN5pxUW/WHS7WgCVpG9GgoFmFZXyrwx9ZRHqL49eJqAvjBngmBbZeyhFeZdu71fl
rm4qMiLkZZsLZEm3uP53pbdkAf7yZGV3WPWKO43LXgykSMfQ56WcN7JJsGygB3I1
uEMP65Tf3TdynW6Wz2dywq81uITJhd8y6Zhr6j2bsNINTHDz67YOxKfS50axZN/P
2PbqDLyJF7MT1ydQ7weeEv4gkRF8Vt6K3aBfL5gm8PM6jm2sdZytsjLM+/RCIkl3
cDtkC+XmPmGxLTEnV3iWMCbCfOrNvqzkp9jvilIEbxIvgX72T6EQPJnfe7Sv95Cr
VBFWoi26XtFhN9wWEGjc7fRTUwNdg4D21/ns8TY3MgOFQcdP01pp2KRTdPDC/6Mt
kknXwDaZRY6EjGQmRKkxKf1c64nmY8V7MJx2CSbPc3HgGdSXaa0AOZE2d60beste
ASpgMQIbERCmAbdmb5JN6fsKcpJrJL15zbrGcDwSnIk96x4HAC8zB9Xln2ubdCwc
IP4cm/Abz6Cfd6I1cQRr
=eVlK
-----END PGP SIGNATURE-----
Merge tag 'drm/tegra/for-3.19-rc1-fixes' of git://people.freedesktop.org/~tagr/linux into drm-fixes
drm/tegra: Fixes for v3.19-rc1
This is a set of fixes for two regressions and one bug in the IOMMU
mapping code. It turns out that all of these issues turn up primarily
on Tegra30 hardware. The IOMMU mapping bug only manifests on buffers
that aren't multiples of the page size. I happened to be testing HDMI
with 1080p while writing the code and framebuffers for that happen to
fit exactly within 2025 pages of 4 KiB each.
One of the regressions is caused by the IOMMU code allocating pages from
shmem which can have associated cache lines. If the pages aren't flushed
then these cache lines may be flushed later on and cause framebuffer
corruption. I'm not sure why I didn't see this before. Perhaps the board
that I was using had enough RAM so that the pages shmem would hand out
had a better chance of being unused. Or maybe I didn't look too closely.
The fix for this is to fake up an SG table so that it can be passed to
the DMA API. Ideally this would use drm_clflush_*(), but implementing
that for ARM causes DRM to fail to build as a module since some of the
low-level cache maintenance functions aren't exported. Hopefully we can
get a suitable API exported on ARM for the next release.
The second regression is caused by a mismatch between the hardware pipe
number and the CRTC's DRM index. These were used inconsistently, which
could cause one code location to call drm_vblank_get() with a different
pipe than the corresponding drm_vblank_put(), thereby causing the
reference count to become unbalanced. Alexandre also reported a possible
race condition related to this, which this series also fixes.
* tag 'drm/tegra/for-3.19-rc1-fixes' of git://people.freedesktop.org/~tagr/linux:
drm/tegra: dc: Select root window for event dispatch
drm/tegra: gem: Use the proper size for GEM objects
drm/tegra: gem: Flush buffer objects upon allocation
drm/tegra: dc: Fix a potential race on page-flip completion
drm/tegra: dc: Consistently use the same pipe
drm/irq: Add drm_crtc_vblank_count()
drm/irq: Add drm_crtc_handle_vblank()
drm/irq: Add drm_crtc_send_vblank_event()
Resolve conflicts between glibc definition of IPV6 socket options
and those defined in Linux headers. Looks like earlier efforts to
solve this did not cover all the definitions.
It resolves warnings during iproute2 build.
Please consider for stable as well.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
ACCESS_ONCE might fail with specific compilers for non-scalar accesses.
Here is a set of patches to tackle that problem.
The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data structure
is larger than the machine word size memcpy is used and a warning is emitted.
The next patches fix up several in-tree users of ACCESS_ONCE on non-scalar
types.
This merge does not yet contain a patch that forces ACCESS_ONCE to work only
on scalar types. This is targetted for the next merge window as Linux next
already contains new offenders regarding ACCESS_ONCE vs. non-scalar types.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJUkrVGAAoJEBF7vIC1phx8stkP/2LmN5y6LOseoEW06xa5MX4m
cbIKsZNtsGHl7EDcTzzuWs6Sq5/Cj7V3yzeBF7QGbUKOqvFWU3jvpUBCCfjMg37C
77/Vf0ZPrxTXXxeJ4Ykdy2CGvuMtuYY9TWkrRNKmLU0xex7lGblEzCt9z6+mZviw
26/DN8ctjkHRvIUAi+7RfQBBc3oSMYAC1mzxYKBAsAFLV+LyFmsGU/4iofZMAsdt
XFyVXlrLn0Bjx/MeceGkOlMDiVx4FnfccfFaD4hhuTLBJXWitkUK/MRa4JBiXWzH
agY8942A8/j9wkI2DFp/pqZYqA/sTXLndyOWlhE//ZSti0n0BSJaOx3S27rTLkAc
5VmZEVyIrS3hyOpyyAi0sSoPkDnjeCHmQg9Rqn34/poKLd7JDrW2UkERNCf/T3eh
GI2rbhAlZz3v5mIShn8RrxzslWYmOObpMr3HYNUdRk8YUfTf6d6aZ3txHp2nP4mD
VBAEzsvP9rcVT2caVhU2dnBzeaZAj3zeDxBtjcb3X2osY9tI7qgLc9Fa/fWKgILk
2evkLcctsae2mlLNGHyaK3Dm/ZmYJv+57MyaQQEZNfZZgeB1y4k0DkxH4w1CFmCi
s8XlH5voEHgnyjSQXXgc/PNVlkPAKr78ZyTiAfiKmh8rpe41/W4hGcgao7L9Lgiu
SI0uSwKibuZt4dHGxQuG
=IQ5o
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux
Pull ACCESS_ONCE cleanup preparation from Christian Borntraeger:
"kernel: Provide READ_ONCE and ASSIGN_ONCE
As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
ACCESS_ONCE might fail with specific compilers for non-scalar
accesses.
Here is a set of patches to tackle that problem.
The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data
structure is larger than the machine word size memcpy is used and a
warning is emitted. The next patches fix up several in-tree users of
ACCESS_ONCE on non-scalar types.
This does not yet contain a patch that forces ACCESS_ONCE to work only
on scalar types. This is targetted for the next merge window as Linux
next already contains new offenders regarding ACCESS_ONCE vs.
non-scalar types"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
s390/kvm: REPLACE barrier fixup with READ_ONCE
arm/spinlock: Replace ACCESS_ONCE with READ_ONCE
arm64/spinlock: Replace ACCESS_ONCE READ_ONCE
mips/gup: Replace ACCESS_ONCE with READ_ONCE
x86/gup: Replace ACCESS_ONCE with READ_ONCE
x86/spinlock: Replace ACCESS_ONCE with READ_ONCE
mm: replace ACCESS_ONCE with READ_ONCE or barriers
kernel: Provide READ_ONCE and ASSIGN_ONCE
much later than usual due to several last minute bugs that had to be
addressed. As usual the majority of changes are new drivers and
modifications to existing drivers. The core recieved many fixes along
with the groundwork for several large changes coming in the future which
will better parition clock providers from clock consumers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUlMRQAAoJEDqPOy9afJhJgdUQAK4myJT0q10LSqe9piwzGVXg
uDcIN5CTtbdYkvdGIfCjeqz3t+DClnAMPx2ZPIjC0Z1mIvqq+ViqwP5U8kKd7z1a
WCKV8e5Et3O1WNbslzsx5Z2JYJNgzqr1xxWAOLTLh5rYxVwE5b946Yv4Whxa694I
ugm4wNlibeN3H8pnyH8YEiWEtahtu7B5v/9WELpyREwNxw7ZA18MttEvWaamAPHG
rAxhQCB3A3HaIvyg8KFdVmwOBZQMc2EWT00kJfdRWL4/iGAipKCnbuh1c8Pr/RQE
XRg5Y+MuMLotoUELYYeZHtEmIlW3A+9gR6tLivswPpOP8/5BVUyA5Hh0yCGUqUHD
s5Iheq7s7xnKEgIu9cD4tf1nCY41gw+4/I4pm47WLkaRgehcEBcAibVC3CupZ5pI
hJiFqEKWPKEk8vAJ/mM+wCGI4w01+eoICBm4EG06Nwj4xkQcAVqE67ZvgVs1LrmL
efqSxkWpNoetf0Q12cfePHmWtesGNdvljLdXQ54T4qH9HxNaI9/9eM6tyFTfrDSe
BG5h7gbPr6/aM/1FfcWn5jQIfjEjPhQtSpCehs8pMf/pG5QZgftBtwe3p+yz7zXJ
Q/v8xNEcZ7Ze6/9rJsAcbLzyzcdk9NzTlEMplzGBoUQFNiEXKoIjCDKAx39UFtMz
EccWXvt9iNZZhmDcu0pU
=jD84
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus-3.19' of git://git.linaro.org/people/mike.turquette/linux
Pull clk framework updates from Mike Turquette:
"This is much later than usual due to several last minute bugs that had
to be addressed. As usual the majority of changes are new drivers and
modifications to existing drivers. The core recieved many fixes along
with the groundwork for several large changes coming in the future
which will better parition clock providers from clock consumers"
* tag 'clk-for-linus-3.19' of git://git.linaro.org/people/mike.turquette/linux: (86 commits)
clk: samsung: Fix Exynos 5420 pinctrl setup and clock disable failure due to domain being gated
ARM: OMAP3: clock: fix boot breakage in legacy mode
ARM: OMAP2+: clock: fix DPLL code to use new determine rate APIs
clk: Really fix deadlock with mmap_sem
clk: mmp: fix sparse non static symbol warning
clk: Change clk_ops->determine_rate to return a clk_hw as the best parent
clk: change clk_debugfs_add_file to take a struct clk_hw
clk: Don't expose __clk_get_accuracy
clk: Don't try to use a struct clk* after it could have been freed
clk: Remove unused function __clk_get_prepare_count
clk: samsung: Fix double add of syscore ops after driver rebind
clk: samsung: exynos4: set parent of sclk_hdmiphy to hdmi
clk: samsung: exynos4415: Fix build with PM_SLEEP disabled
clk: samsung: remove unnecessary inclusion of header files from clk.h
clk: samsung: remove unnecessary CONFIG_OF from clk.c
clk: samsung: Spelling s/bwtween/between/
clk: rockchip: Add support for the mmc clock phases using the framework
clk: rockchip: add bindings for the mmc clocks
clk: rockchip: rk3288 export i2s0_clkout for use in DT
clk: rockchip: use clock ID for DMC (memory controller) on rk3288
...
This is a much shorter set of patches that were on the go but didn't make it
in to the early pull request for the merge window. It's really a set of bug
fixes plus some final cleanup work on the new tag queue API.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJUlaYEAAoJEDeqqVYsXL0MmXAH/2UUcE11p0KBHMR4cAn76xrG
9093ZT9VZ4LH/Z7PbgwIWC4YHDqVpwA1+Trj1mh8PxiZz2SopWe27O2lQMRS5VUc
MN28kbmK3L0jQj+OUez10Da6k0hU/KL8TlWT765MxFDKCaAuPZ4u541tyZEIGmLL
olOQrn/fSlu+18QqqZ+D2rMaK7kGH6ZgbOadnRfYGkLjU4YeAMEC9L7UgnDxHiaN
gZozoARkGeAnDJERVETRTtAiOXGRH8sGCpue0yYlhZXpAQ9cFUkS/hMqDWnaVC2S
0x0w34RvbxSqO0gPT0K5XLoMiFyg04vnZ2xBVFBsANQTSEjQJO8USNAa4r74hf8=
=D3eN
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI update from James Bottomley:
"This is a much shorter set of patches that were on the go but didn't
make it in to the early pull request for the merge window. It's
really a set of bug fixes plus some final cleanup work on the new tag
queue API"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
storvsc: ring buffer failures may result in I/O freeze
ipr: set scsi_level correctly for disk arrays
ipr: add support for async scanning to speed up boot
scsi_debug: fix missing "break;" in SDEBUG_UA_CAPACITY_CHANGED case
scsi_debug: take sdebug_host_list_lock when changing capacity
scsi_debug: improve driver description in Kconfig
scsi_debug: fix compare and write errors
qla2xxx: fix race in handling rport deletion during recovery causes panic
scsi: blacklist RSOC for Microsoft iSCSI target devices
scsi: fix random memory corruption with scsi-mq + T10 PI
Revert "[SCSI] mpt3sas: Remove phys on topology change"
Revert "[SCSI] mpt2sas: Remove phys on topology change."
esas2r: Correct typos of "validate" in a comment
fc: FCP_PTA_SIMPLE is 0
ibmvfc: remove unused tag variable
scsi: remove MSG_*_TAG defines
scsi: remove scsi_set_tag_type
scsi: remove scsi_get_tag_type
scsi: never drop to untagged mode during queue ramp down
scsi: remove ->change_queue_type method
This removes the last few uses of CONFIG_PM_RUNTIME introduced
recently and makes that config option finally go away.
CONFIG_PM will be available directly from the menu now and
also it will be selected automatically if CONFIG_SUSPEND or
CONFIG_HIBERNATION is set.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUlYaPAAoJEILEb/54YlRx/SoP/2wYioGzBhOCYfHw6fZF8zrP
rotQ86sakhvSHre8K9QyFjvsA9wJ0CaTJF46YKZuHFhqU+IJZ7aXvNdEM1hK214J
Mf3L2AcbcdnXioAN+HpeZhQklp2qHe84YkVXBqsFD6kb/qUNV2LSjy6nKEUdY3jW
6KL2f3RgF/LDjTdedujJgcCYwMBwfX4B7U42BG4NQQ8z3wCV+imJgzNDrR5nNlqK
xu8ab8hO1Gi3msOJxS0y4MN6VTUpYOvQKhSyM9ErcB2ibclAdmcivKuFAz6gy5U7
PyDfYo/P3mXjMRBFb9fLqGtRcfstsnxPPSeKwp236tIQFX19Bj76UVUMJoUlXJP5
/f55/P7mCascg74ZZC4GiD/BSCRdqwInCsFMzqAfSq2NciKzeS6W7Mhd9VTLKDpl
5kqE39imUjZyps7/QqkfWskzB7Puhmqk3ZgTq2yAd4uQTpV7xlJYcnvr4oHCmAia
SsLdYOqMQzWr3qyz2f5cOqPAvOo3/Xk/HHfTOCHW/4L+Ov+C921/f3d5GnxX9Ha+
ucRaMp9j5FPYVwFaFkczAMNF2Eanq+Fupa3e6XUNNbYdchFqT9obnHZbVKyvswjR
vdGAYAjP/cLzIH9ETDCCXCRvBRw5pzeelDgvDPjPdmPjndHXG8WViyTIEyLL4+1i
BENtc/SUw3pZ7iNlGO78
=QnSO
-----END PGP SIGNATURE-----
Merge tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki:
"This removes the last few uses of CONFIG_PM_RUNTIME introduced
recently and makes that config option finally go away.
CONFIG_PM will be available directly from the menu now and also it
will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION
is set"
* tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: Eliminate CONFIG_PM_RUNTIME
tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM
sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM
spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
Pull vfs pile #3 from Al Viro:
"Assorted fixes and patches from the last cycle"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[regression] chunk lost from bd9b51
vfs: make mounts and mountstats honor root dir like mountinfo does
vfs: cleanup show_mountinfo
init: fix read-write root mount
unfuck binfmt_misc.c (broken by commit e6084d4)
vm_area_operations: kill ->migrate()
new helper: iter_is_iovec()
move_extent_per_page(): get rid of unused w_flags
lustre: get rid of playing with ->fs
btrfs: filp_open() returns ERR_PTR() on failure, not NULL...
Pull SCSI target fixes from Nicholas Bellinger:
"The highlights this merge window include:
- Allow target fabric drivers to function as built-in. (Roland)
- Fix tcm_loop multi-TPG endpoint nexus bug. (Hannes)
- Move per device config_item_type into se_subsystem_api, allowing
configfs attributes to be defined at module_init time. (Jerome +
nab)
- Convert existing IBLOCK/FILEIO/RAMDISK/PSCSI/TCMU drivers to use
external configfs attributes. (nab)
- A number of iser-target fixes related to active session + network
portal shutdown stability during extended stress testing. (Sagi +
Slava)
- Dynamic allocation of T10-PI contexts for iser-target, fixing a
potentially bogus iscsi_np->tpg_np pointer reference in >= v3.14
code. (Sagi)
- iser-target performance + scalability improvements. (Sagi)
- Fixes for SPC-4 Persistent Reservation AllRegistrants spec
compliance. (Ilias + James + nab)
- Avoid potential short kern_sendmsg() in iscsi-target for now until
Al's conversion to use msghdr iteration is merged post -rc1.
(Viro)
Also, Sagi has requested a number of iser-target patches (9) that
address stability issues he's encountered during extended stress
testing be considered for v3.10.y + v3.14.y code. Given the amount of
LOC involved, it will certainly require extra backporting effort.
Apologies in advance to Greg-KH & Co on this. Sagi and I will be
working post-merge to ensure they each get applied correctly"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (53 commits)
target: Allow AllRegistrants to re-RESERVE existing reservation
uapi/linux/target_core_user.h: fix headers_install.sh badness
iscsi-target: Fail connection on short sendmsg writes
iscsi-target: nullify session in failed login sequence
target: Avoid dropping AllRegistrants reservation during unregister
target: Fix R_HOLDER bit usage for AllRegistrants
iscsi-target: Drop left-over bogus iscsi_np->tpg_np
iser-target: Fix wc->wr_id cast warning
iser-target: Remove code duplication
iser-target: Adjust log levels and prettify some prints
iser-target: Use debug_level parameter to control logging level
iser-target: Fix logout sequence
iser-target: Don't wait for session commands from completion context
iser-target: Reduce CQ lock contention by batch polling
iser-target: Introduce isert_poll_budget
iser-target: Remove an atomic operation from the IO path
iser-target: Remove redundant call to isert_conn_terminate
iser-target: Use single CQ for TX and RX
iser-target: Centralize completion elements to a context
iser-target: Cast wr_id with uintptr_t instead of unsinged long
...
Pull x86 apic updates from Thomas Gleixner:
"After stopping the full x86/apic branch, I took some time to go
through the first block of patches again, which are mostly cleanups
and preparatory work for the irqdomain conversion and ioapic hotplug
support.
Unfortunaly one of the real problematic commits was right at the
beginning, so I rebased this portion of the pending patches without
the offenders.
It would be great to get this into 3.19. That makes reworking the
problematic parts simpler. The usual tip testing did not unearth any
issues and it is fully bisectible now.
I'm pretty confident that this wont affect the calmness of the xmas
season.
Changes:
- Split the convoluted io_apic.c code into domain specific parts
(vector, ioapic, msi, htirq)
- Introduce proper helper functions to retrieve irq specific data
instead of open coded dereferencing of pointers
- Preparatory work for ioapic hotplug and irqdomain conversion
- Removal of the non functional pci-ioapic driver
- Removal of unused irq entry stubs
- Make native_smp_prepare_cpus() preemtible to avoid GFP_ATOMIC
allocations for everything which is called from there.
- Small cleanups and fixes"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
iommu/amd: Use helpers to access irq_cfg data structure associated with IRQ
iommu/vt-d: Use helpers to access irq_cfg data structure associated with IRQ
x86: irq_remapping: Use helpers to access irq_cfg data structure associated with IRQ
x86, irq: Use helpers to access irq_cfg data structure associated with IRQ
x86, irq: Make MSI and HT_IRQ indepenent of X86_IO_APIC
x86, irq: Move IRQ initialization routines from io_apic.c into vector.c
x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h
x86, irq: Move HT IRQ related code from io_apic.c into htirq.c
x86, irq: Move PCI MSI related code from io_apic.c into msi.c
x86, irq: Replace printk(KERN_LVL) with pr_lvl() utilities
x86, irq: Make UP version of irq_complete_move() an inline stub
x86, irq: Move local APIC related code from io_apic.c into vector.c
x86, irq: Introduce helpers to access struct irq_cfg
x86, irq: Protect __clear_irq_vector() with vector_lock
x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx()
x86, irq: Refine hw_irq.h to prepare for irqdomain support
x86, irq: Convert irq_2_pin list to generic list
x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector()
x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI
x86, irq: Introduce helper to check whether an IOAPIC has been registered
...
Having switched over all of the users of CONFIG_PM_RUNTIME to use
CONFIG_PM directly, turn the latter into a user-selectable option
and drop the former entirely from the tree.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Pull irq core fix from Thomas Gleixner:
"A single fix plugging a long standing race between proc/stat and
proc/interrupts access and freeing of interrupt descriptors"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent proc race against freeing of irq descriptors