- On ARM did not have balanced calls to get/put_cpu.
- Fix to make tboot + Xen + Linux correctly.
- Fix events VCPU binding issues.
- Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSFMaJAAoJEFjIrFwIi8fJ+/0H/32rLj60FpKXcPDCvID+9p8T
XDGnFNttsxyhuzEzetOAd0aLKYKGnUaTDZBHfgSNipGCxjMLYgz84phRmHAYEj8u
kai1Ag1WjhZilCmImzFvdHFiUwtvKwkeBIL/cZtKr1BetpnuuFsoVnwbH9FVjMpr
TCg6sUwFq7xRyD1azo/cTLZFeiUqq0aQLw8J72YaapdS3SztHPeDHXlPpmLUdb6+
hiSYveJMYp2V0SW8g8eLKDJxVr2QdPEfl9WpBzpLlLK8GrNw8BEU6hSOSLzxB7z/
hDATAuZ5iHiIEi1uGfVjOyDws2ngUhmBKUH5x5iVIZd2P5c/ffLh2ePDVWGO5RI=
=yMuS
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- On ARM did not have balanced calls to get/put_cpu.
- Fix to make tboot + Xen + Linux correctly.
- Fix events VCPU binding issues.
- Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.
* tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/smp: initialize IPI vectors before marking CPU online
xen/events: mask events when changing their VCPU binding
xen/events: initialize local per-cpu mask for all possible events
x86/xen: do not identity map UNUSABLE regions in the machine E820
xen/arm: missing put_cpu in xen_percpu_init
Pull MIPS fix from Ralf Baechle:
"Just a single patch which fixes a special case in the MIPS FPU
emulator which is always required, even on CPUs with FPU. There is
the rare special case that an FPU (or certain other instructions) in a
branch delay slot is causing an exception and then the branch
instruction will need to be emulated by the kernel before resuming
execution. This is working great except if the branch instruction is
an Octeon BBIT instruction.
The boring disclaimer - all MIPS defconfigs build tested and no
regressions and runtime tested on Octeon, no known issues"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Handle OCTEON BBIT instructions in FPU emulator.
(discovered with Vince's fuzzing tool).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQIcBAABAgAGBQJSE6M4AAoJEGvWsS0AyF7x7ZQP/3A+t3bH2fn6TmxzFyRRsi2i
YbqX+OP+iGUrtzJQ9cR/cznml0yxePqFKXp76MWvtHkS9eMIcNhyWwhJRuDEV9Rx
8PJq2yA9eNxYjDvhA+2vtbEEsHaYdu3O+w7tMoHxaBhveeDZIgk+/5YnSuLmDE2i
HEEW2KYjiH7ILr5YhMJNotMIwyn5xg9r2nC96bDnLN0kjB4Khlaat5bikkn6DOrN
39EQYV8R7fJSa6t3yYXnO4DBuMrqKdsswQR+JdkQfGCFxYz+BruF+v73f+zlVEd6
a7Ie86Mjk/Gto7MzItU6PDqLnrTg3alTxYxCJQFj5SKtYX/+vzNicmmeG90PvPUV
KQc5rVNIYEHu05J8wTHrwZRDFlpr4mllqy6KtmcWDfqgYg5wzj139PEBL4gyw4wb
9qWk/ti/Nezk039Oh2EV/gUuZuHe2+a/4k2lbJqy8uUqsPvQtifDM6tJS1TgExG5
X9AGfRMVtBmQH8K/6oVc+S5sXwlddcRBviTzggpvkY6KgJL6kWOuTBla0KKHvrQc
Ok34e3Bxk5WuUnCa5rxaAW6Ewt9HqolMqL+Bgd0tvD5WKIRvT4pBG0a7mIhzeymM
Uh1n7FOLLfSNIVm7YU/v6yPJ8v8w+QpJtzX17VzZf1ASHj5JUh6RRoX6Rwshaplv
q6CA672f7Q7g5sG3HwOz
=Lajp
-----END PGP SIGNATURE-----
Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 perf fixes from Catalin Marinas:
"Perf backend fixes for arm64 where the user can cause kernel panic
(discovered with Vince's fuzzing tool)"
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: perf: fix event validation for software group leaders
arm64: perf: fix array out of bounds access in armpmu_map_hw_event()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJSEoS0AAoJEBvWZb6bTYbyE/8P/izUr1XtKBqtVuqVBMewEkeB
SBgvJ2w681f4d+1waVAEPqWVGKHDvSOQC54sG6Y3fWHKijKGLiQhLaY3Y1WaLlO1
B+duREwCwHaApjrpYoKhkyGQVpgyIfHBVe8d2TM9Q2bRuYNZEEcOtfdXk+Qfr6WR
2kN+67ivJzAXvjs0uuRyJtXXq9cemcOnngsAfBlJz+j6UbiEdQ3l569D3wQU1jS2
lUxxCEdtBDKDXkJUbTYvtJNYR48caqVXhYBTjpmY04207iSHmacUytOXO3rRA3OL
fFhm/QeKVZND0XrJDUOMFzosWdUVdP5Qd5PtYoV/gEydNJMMpPs+dFKv+RXzrWlm
2S2PWbFlkFT8yM+xwh6uKnLQ1aj614dkK2vKlp9GwDuwWiaod71C8ouTJvanNHGt
pWgktFlfD+npSc3QDeXG5QB78pTSeyJfZBeVvA+U/etX+vjdfFWZ3bHMScrAE4DX
xsdvtfamo0m9v2yZsnKzRWtCQq9No5FRb/c31w7yUzSXNBtyNR0Vft9gmiLo4HYa
FQ0wC2UPyaKbfYtX0orpWnN3u4vaylGw2HuzK+2Mwi2HL+AMI6Piu//nrTbqb/i+
1a6OARWvv9BQdbcuzBqznUdcllbmRl94kA5zXPvAz0dOBPQFU4X/t6dkxxH+8JFj
mA/c2PHEyOtuFDOoqZxE
=RlHG
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Fixes for ARM and aarch64.
This pull request is coming a bit later than I would have preferred,
because I and Gleb happened to have holidays around the same weeks of
August... sorry about that"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: ARM: Squash len warning
arm64: KVM: use 'int' instead of 'u32' for variable 'target' in kvm_host.h.
arm64: KVM: add missing dsb before invalidating Stage-2 TLBs
arm64: KVM: perform save/restore of PAR_EL1
arm64: KVM: fix 2-level page tables unmapping
ARM: KVM: Fix unaligned unmap_range leak
ARM: KVM: Fix 64-bit coprocessor handling
This was a new driver in this merge window, so some
post-merge hardening is happening.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJSCdCbAAoJEEEQszewGV1z5VIP/2BVZyUoh4bCY4ZnuacfhArI
y83blSEfyvMAfjJfoE2a3vQBLMQpz50IhxDZ2jWIORKUVgCpcuz7FLQge+fW7YRH
KmDjbRXaeEG2EkPCUT3xSaQx3sOgFnS5fVxa3rMgZKyfnHTQjRC654XDg0O8Ar4Q
yiYF5BerI+k7jAA+MRUGjz7h23McEcsxf7e/mINbbzFSMdUcYDWYu/VZaM2tU1eL
XzbG51T0jJi2NPeaezgTp9wDUV338DyYqLkJZ5ForvrvZ42g2Sm2n5w3rXV1XlEM
zPFjJ0JxwW0YIut/wvXTMto0l+M1I+PdYqEJ8x/3gMA7OmQt2ustBLc/bTYmB7W9
VR9J7UKmxjYCfN3SQmfYyokyKWF72ELO3C107JBo/KeVaCasjEKF1gxSHGo2d+QI
6a5TjKbna+fh9XOVXASqJtIL7rI/6q+UIoZh/M5ENBK+7D5sk3dYvCrW60zg1gVj
KVode0v1Uo48Xub902d68L2lmx/rt6RxHVYSd7atagGTMIpadwU0TrnDGP1IbgWc
zuhnE+7+uGrVR63xK7MIuKJxA0CxbM6qWiSNB/6OqVaKi9t/NexhB9ujId4bTro2
IyNDIC2Bj+BjdDm8oQxxBUUP/ozNNg2C45Zo9D39/22BIlIlYhNvUNqoXK5N3rhM
gTBeSX7bZSUFYXXOPb+j
=fGdI
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-for-v3.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pinctrl fixes from Linus Walleij:
"Fixes for the sunxi (AllWinner) pin control driver. This was a new
driver in this merge window, so some post-merge hardening is
happening"
[ I had completely missed this pull request for some reason, it was sent
over a week ago but my mailbox is chaotic ]
* tag 'pinctrl-for-v3.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: Add spinlocks
pinctrl: sunxi: Fix gpio_set behaviour
pinctrl: sunxi: Read register before writing to it in irq_set_type
There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:
- A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
underlying SCSI command will transfer data from the SCSI device to
the buffer provided in the ioctl)
- Before the command finishes, a signal is sent to the process waiting
in the ioctl. This will end up waking up the sg_ioctl() code:
result = wait_event_interruptible(sfp->read_wait,
(srp_done(sfp, srp) || sdp->detached));
but neither srp_done() nor sdp->detached is true, so we end up just
setting srp->orphan and returning to userspace:
srp->orphan = 1;
write_unlock_irq(&sfp->rq_list_lock);
return result; /* -ERESTARTSYS because signal hit process */
At this point the original process is done with the ioctl and
blithely goes ahead handling the signal, reissuing the ioctl, etc.
- Eventually, the SCSI command issued by the first ioctl finishes and
ends up in sg_rq_end_io(). At the end of that function, we run through:
write_lock_irqsave(&sfp->rq_list_lock, iflags);
if (unlikely(srp->orphan)) {
if (sfp->keep_orphan)
srp->sg_io_owned = 0;
else
done = 0;
}
srp->done = done;
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
if (likely(done)) {
/* Now wake up any sg_read() that is waiting for this
* packet.
*/
wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
kref_put(&sfp->f_ref, sg_remove_sfp);
} else {
INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
schedule_work(&srp->ew.work);
}
Since srp->orphan *is* set, we set done to 0 (assuming the
userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
to run in a workqueue.
- In workqueue context we go through sg_rq_end_io_usercontext() ->
sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().
The key point here is that we are doing copy_to_user() on a
workqueue -- that is, we're on a kernel thread with current->mm
equal to whatever random previous user process was scheduled before
this kernel thread. So we end up copying whatever data the SCSI
command returned to the virtual address of the buffer passed into
the original ioctl, but it's quite likely we do this copying into a
different address space!
As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.
There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.
Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the
original pointer to this bug in the sg code.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We want ppc64 to be able to select between optimised assembly
checksum routines in big endian and the generic lib/checksum.c
routines in little endian.
The lpfc driver is forcing CONFIG_GENERIC_CSUM on which means
we are unable to make the decision to enable it in the arch
Kconfig. If the option exists it is always forced on.
This got introduced in 3.10 via commit 6a7252fdb0 ([SCSI] lpfc:
fix up Kconfig dependencies). I spoke to Randy about it and
the original issue was with CRC_T10DIF not being defined.
As such, remove the select of CONFIG_GENERIC_CSUM.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
commit 385904f819 ('sfc: Don't use
efx_filter_{build,hash,increment}() for default MAC filters') used the
wrong name to find the index of default RX MAC filters at insertion/
update time. This could result in memory corruption and would in any
case silently fail to update the filter.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
It's not allowed to clear masks of a cpuset if there're tasks in it,
but it's broken:
# mkdir /cgroup/sub
# echo 0 > /cgroup/sub/cpuset.cpus
# echo 0 > /cgroup/sub/cpuset.mems
# echo $$ > /cgroup/sub/tasks
# echo > /cgroup/sub/cpuset.cpus
(should fail)
This bug was introduced by commit 88fa523bff
("cpuset: allow to move tasks to empty cpusets").
tj: Dropped temp bool variables and nestes the conditionals directly.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The VLAN code needs to know the length of the per-port VLAN bitmap to
perform its most basic operations (retrieving VLAN informations, removing
VLANs, forwarding database manipulation, etc). Unfortunately, in the
current implementation we are using a macro that indicates the bitmap
size in longs in places where the size in bits is expected, which in
some cases can cause what appear to be random failures.
Use the correct macro.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 77145f1cbd was introduced
error which cause that reclocking on nv40 not working anymore.
There is missing assigment of return value from pll_calc to ret.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Allocating type=0 marks the memory as free. This allows the ltcg memory
to be allocated twice.
Add a BUG_ON in core/mm.c to prevent this ever happening again.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Some registers were not initialized in init, this causes them to be
uninitialized after suspend.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit dceef5d87 (drm/nouveau/fb: initialise vram controller as pfb
sub-object) moved some code around and introduced these null derefs.
pfb->ram is set to the new ram object outside of this ctor.
Reported-by: Ronald Uitermark <ronald645@gmail.com>
Tested-by: Ronald Uitermark <ronald645@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
John W. Linville says:
====================
Regarding the iwlwifi bits, Johannes says:
"We revert an rfkill bugfix that unfortunately caused more bugs, shuffle
some code to avoid touching the PCIe device before it's enabled and
disconnect if firmware fails to do our bidding. I also have Stanislaw's
fix to not crash in some channel switch scenarios."
As for the mac80211 bits, Johannes says:
"This time, I have one fix from Dan Carpenter for users of
nl80211hdr_put(), and one fix from myself fixing a regression with the
libertas driver."
Along with the above...
Dan Carpenter fixes some incorrectly placed "address of" operators
in hostap that caused copying of junk data.
Jussi Kivilinna corrects zd1201 to use an allocated buffer rather
than the stack for a URB operation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit
ee80fbf301 ("packet: account statistics only in tpacket_stats_u")
cleaned up the getsockopt PACKET_STATISTICS code.
This also changed semantics. Historically, tp_packets included
tp_drops on return. The commit removed the line that adds tp_drops
into tp_packets.
This patch reinstates the old semantics.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to fix a problem in the rtl8211 where the driver
wasn't properly enabled the interrupt on link change status.
it has to enable the ineterrupt on the bit 10 in the register 18
(INER).
Reported-by: Sharma Bhupesh <B45370@freescale.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Check if the skb has been correctly prepared before going on
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iQIcBAABCAAGBQJSEkvHAAoJEADl0hg6qKeOyTQP/ifIXk5t26Tu8GTCH+lQnF36
1HY4nkEhLBkrEaKv0RXXEwLCDe1Gk8INewSXhtgDe7v696287zvxSDiftxXOwSn8
EkrP3jakxqNgyEstVUMxXuHQMxn8YsOnU+u4L4MZvcsWNmh1V8FzNLxPDWF3Z0bi
ycXhFI+BR+waWzFd8rVZ5sJ00ZhgSuM5vJ/uQ28kT8DyDZXz0I0mvve7ZUh5fczc
L7vvnju9VRq84RxV6bQwf9hXDk54fCLz22WSMolrqaHCl0XF4OAu6OVcYBLA0bp7
GUU7fS8IUiqAuC02FS5HEYPy1VErCok8hP/fzvjz8Bxuzz0I5SdYPurFTZQzAx73
U0GCLtNOE7zkwIsRbKhMdUcB6DoFZJVUaLo8YS9E1tl/nn1oRILFGTFb2T9/WzQ8
lbCkmYm+WhLdKeZbLkf8PPs9PDrhRQKr/QRHrCHVKh4rqzP1BUm4FqIBXo71Kiiz
no4GJoern8vW0CzoR59P5++/iFOCVTIx4ZJWvnYjWbqsYRazKjjCFtHffpz6mz+1
pHlrdYAZo+DOvme/2putfe6ViR+bA3lPxPkM7k3gADMifJcCAl3D7OM53QaDSKAk
Gw3yHafxBaFPXFdqxQkkC6ks7T6qoTsPI2lLqUG6srU3XA399bWVcLq7X9JEcR+9
ODwzHPj9fD5CZCe22lIO
=gllO
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included change:
- Check if the skb has been correctly prepared before going on
Do not clear Broadcast/Multicast/Unicast Wake Flag or LanWake in
Config5. This is necessary to preserve WOL state when the driver is
loaded. Although the r8168 vendor driver does not write Config5 (it has
been commented out), Hayes Wang from Realtek said that masking bits like
this is more sensible.
Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If via_ircc_open() fails, data structures of the driver left uninitialized,
but probe (via_init_one()) returns zero. That can lead to null pointer dereference
in via_remove_one(), since it does not check drvdata for NULL.
The patch implements proper error code propagation.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user turns off VNET_HDR support on the
macvtap device, there is no way to provide any
offload information to the user. So, it's safer
to ignore offload setting then depend on the user
setting them correctly.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user turns off IFF_VNET_HDR flag, attempts to change
offload features via TUNSETOFFLOAD do not work. This could cause
GSO packets to be delivered to the user when the user is
not prepared to handle them.
To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is
disabled.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In macvtap, tap_features specific the features of that the user
has specified via ioctl(). If we treat macvtap as a macvlan+tap
then we could all the tap a pseudo-device and give it other features
like SG and GSO. Then we can stop using the features of lower
device (macvlan) when forwarding the traffic the tap.
This solves the issue of possible checksum offload mismatch between
tap feature and macvlan features.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent.
The "when" field must be set for such skb. It's used in tcp_rearm_rto
for example. If the "when" field isn't set, the retransmit timeout can
be calculated incorrectly and a tcp connected can stop for two minutes
(TCP_RTO_MAX).
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The branch emulation needs to handle the OCTEON BBIT instructions,
otherwise we get SIGILL instead of emulation.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5726/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
kernel BUG at drivers/xen/events.c:1328!
RCU has detected that a CPU has not entered a quiescent state within the
grace period. It needs to send the CPU a reschedule IPI if it is not
offline. rcu_implicit_offline_qs() does this check:
/*
* If the CPU is offline, it is in a quiescent state. We can
* trust its state not to change because interrupts are disabled.
*/
if (cpu_is_offline(rdp->cpu)) {
rdp->offline_fqs++;
return 1;
}
Else the CPU is online. Send it a reschedule IPI.
The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()). See start_secondary():
set_cpu_online(smp_processor_id(), true);
...
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:
/*
* Wait until the cpu which brought this one up marked it
* online before enabling interrupts. If we don't do that then
* we can end up waking up the softirq thread before this cpu
* reached the active state, which makes the scheduler unhappy
* and schedule the softirq thread on the wrong cpu. This is
* only observable with forced threaded interrupts, but in
* theory it could also happen w/o them. It's just way harder
* to achieve.
*/
while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
cpu_relax();
/* enable local interrupts */
local_irq_enable();
The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug. In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.
The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up. rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:
kernel BUG at drivers/xen/events.c:1328!
xen_send_IPI_one()
xen_smp_send_reschedule()
rcu_implicit_offline_qs()
rcu_implicit_dynticks_qs()
force_qs_rnp()
force_quiescent_state()
__rcu_process_callbacks()
rcu_process_callbacks()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.
There is at least one other place that has caused the same crash:
xen_smp_send_reschedule()
wake_up_idle_cpu()
add_timer_on()
clocksource_watchdog()
call_timer_fn()
run_timer_softirq()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
xen_hvm_callback_vector()
clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:
/*
* Cycle through CPUs to check if the CPUs stay synchronized
* to each other.
*/
next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(cpu_online_mask);
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);
This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).
But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."
The conclusion was that:
"
1. At the IPI sender side:
It is incorrect to send an IPI to an offline CPU (cpu not present in
the cpu_online_mask). There are numerous places where we check this
and warn/complain.
2. At the IPI receiver side:
It is incorrect to let the world know of our presence (by setting
ourselves in global bitmasks) until our initialization steps are complete
to such an extent that we can handle the consequences (such as
receiving interrupts without crashing the sender etc.)
" (from Srivatsa)
As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.
It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.
This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.
Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When a event is being bound to a VCPU there is a window between the
EVTCHNOP_bind_vpcu call and the adjustment of the local per-cpu masks
where an event may be lost. The hypervisor upcalls the new VCPU but
the kernel thinks that event is still bound to the old VCPU and
ignores it.
There is even a problem when the event is being bound to the same VCPU
as there is a small window beween the clear_bit() and set_bit() calls
in bind_evtchn_to_cpu(). When scanning for pending events, the kernel
may read the bit when it is momentarily clear and ignore the event.
Avoid this by masking the event during the whole bind operation.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: stable@vger.kernel.org
The sizeof() argument in init_evtchn_cpu_bindings() is incorrect
resulting in only the first 64 (or 32 in 32-bit guests) ports having
their bindings being initialized to VCPU 0.
In most cases this does not cause a problem as request_irq() will set
the irq affinity which will set the correct local per-cpu mask.
However, if the request_irq() is called on a VCPU other than 0, there
is a window between the unmasking of the event and the affinity being
set were an event may be lost because it is not locally unmasked on
any VCPU. If request_irq() is called on VCPU 0 then local irqs are
disabled during the window and the race does not occur.
Fix this by initializing all NR_EVENT_CHANNEL bits in the local
per-cpu masks.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org
If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.
There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
treat it as RAM.
We only do this for dom0, as that is where tboot case shows up.
A PV domU could have an UNUSABLE region in its pseudo-physical map
and would need to be handled in another patch.
This fixes a boot failure on hosts with tboot.
tboot marks a region in the e820 map as unusable and the dom0 kernel
would attempt to map this region and Xen does not permit unusable
regions to be mapped by guests.
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 0000000000100000 - 0000000000800000 (usable)
(XEN) 0000000000800000 - 0000000000972000 (unusable)
tboot marked this region as unusable.
(XEN) 0000000000972000 - 00000000cf200000 (usable)
(XEN) 00000000cf200000 - 00000000cf38f000 (reserved)
(XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data)
(XEN) 00000000cf3ce000 - 00000000d0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fe000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000630000000 (usable)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a port of c95eb3184e ("ARM: 7809/1: perf: fix event validation
for software group leaders") to arm64, which fixes a panic in the arm64
perf backend found as a result of Vince's fuzzing tool.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This is a port of d9f966357b ("ARM: 7810/1: perf: Fix array out of
bounds access in armpmu_map_hw_event()") to arm64, which fixes an oops
in the arm64 perf backend found as a result of Vince's fuzzing tool.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are possible HW configurations in which PFs will have SR-IOV capability
but will have Max VFs set to 0 - this happens when there are Multi-Function
devices where the VFs are allocated to only some of the PFs.
DMAE is configured to support VFs only if the configuring PF has supported VFs.
In case the first PF to be loaded will be one without supported VFs, it will
not configure DMAE to the VF-supporting mode. When VFs of other PFs will be
loaded later on, they will not be able to communicate with their PF.
This changes the requirement for configuring DMAE for VF-supporting mode;
If the device has SR-IOV capabilities there must be some PF that has
max supported VFs > 0, thus it will configure the DMAE for supporting VFs.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since SR-IOV can be activated dynamically and iproute2 can be called
asynchronously, the various callbacks need a robust sanity check before
attempting to access the SR-IOV database and members since there are numerous
states in which it can find the driver (e.g., PF is down, sriov was not enabled
yet, VF is down, etc.).
In many of the states the callback result will be null pointer dereference.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During probe, VFs might erroneously try to access the shared memory (which
only PFs are capabale of accessing), causing benign attentions to appear.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When publishing information via getfeatcfg(), bnx2x driver didn't consider
remote errors (e.g., switch that doesn't support DCBX) when setting the
error flags.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After notification that DCBX configuration has ended arrived to the driver,
the driver configured the FW/HW in sleepless context.
As a result, it was possible to reach a race (mostly with CNIC registration)
in which the configuration will return a timeout, failing to set the DCBX
results correctly.
This patch moves the configuration following the DCBX end into the slowpath
RTNL task (i.e., sleepless context protected by the RTNL lock), allowing the
configuration to cope with such races.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3deb816 "bnx2x: Add a periodic task for link PHY events"
link state changes can be detected not only via the attention flow but also
from the periodic task.
If the link state will change in such a manner (i.e., via the periodic task),
dropless flow-control will not be configured.
This patch remedies the issue, adding the missing configuration to all required
flows.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not allowed for an ipv6 packet to contain multiple fragmentation
headers. So discard packets which were already reassembled by
fragmentation logic and send back a parameter problem icmp.
The updates for RFC 6980 will come in later, I have to do a bit more
research here.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the max_addresses check attackers were able to disable privacy
extensions on an interface by creating enough autoconfigured addresses:
<http://seclists.org/oss-sec/2012/q4/292>
But the check is not actually needed: max_addresses protects the
kernel to install too many ipv6 addresses on an interface and guards
addrconf_prefix_rcv to install further addresses as soon as this limit
is reached. We only generate temporary addresses in direct response of
a new address showing up. As soon as we filled up the maximum number of
addresses of an interface, we stop installing more addresses and thus
also stop generating more temp addresses.
Even if the attacker tries to generate a lot of temporary addresses
by announcing a prefix and removing it again (lifetime == 0) we won't
install more temp addresses, because the temporary addresses do count
to the maximum number of addresses, thus we would stop installing new
autoconfigured addresses when the limit is reached.
This patch fixes CVE-2013-0343 (but other layer-2 attacks are still
possible).
Thanks to Ding Tianhong to bring this topic up again.
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: George Kargiotakis <kargig@void.gr>
Cc: P J P <ppandit@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only one fix in this pull request.
A straight forward incorrect read address in the adjd_s311 driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iQIcBAABAgAGBQJSEnVkAAoJEFSFNJnE9BaIpbgP+QF+q+bPtLGl8w1XjD1yCNVe
N7ALHYHdBxgrSlkXrMpvdtWyMw7K+3+sZZq0/SjI1AI1jAjADEJ74hy/Vqw/tp3R
LbeOeSdF8LgplwZR8/qtXuETX84047Xm6O1XIFdAkm3fazScjcfd02PEBnggrsHI
mFxpyks6DE3MzjjWo/S1MDeNoLtiXPAfyJmxXn3UHtjB7ATAPnanfq+kmMx+3Tve
cRjcHTjmdeJGW7LjpZQBzVmOzJe1oht/kZcm/uG2fBbldDl7dHYB8jiCq8PicFfq
QkcwkD6FrRBFZ1S5mbB40gEcTI8kGONKQfMSdMYbbH4HDJpik+1mmlB0AxSneiDY
W9L3qgUkgIVdbsTMjDq6/W9r+yFFsv8LRBRsxQThKb1AcD59WrQcvVR+/MocWrGz
GhMxMUvs+SrvsrAUQjZTbt4hgjJiFWcGO90CigQe0zdyv7VYQmNwsHo1/w9HdqCB
LX47zE4TshKkVLnB/FNzoAGd9kagG6vm9UJBNKfW9WRgxYwcg3aPYBI9uFDPkm0G
kb8LQd4dGWL+AM+CbhMvMkx0LebhyIYjZ7UasYpNsLemmW7Vs703aE1oj/NOk1Xx
i1LX074JQ51e1bwLmB52yZ5ZijWScmwFAeVHtIgpG1H6nCBQxvZ+bfIHvSRRqpkD
XrFqLOx8gZDQkfJ7qgB6
=mWwS
-----END PGP SIGNATURE-----
Merge tag 'iio-fixes-for-3.11c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Third round of IIO fixes for the 3.11 series.
Only one fix in this pull request.
A straight forward incorrect read address in the adjd_s311 driver.
In the previous commit, Richard Genoud fixed proc_root_readdir(), which
had lost the check for whether all of the non-process /proc entries had
been returned or not.
But that in turn exposed _another_ bug, namely that the original readdir
conversion patch had yet another problem: it had lost the return value
of proc_readdir_de(), so now checking whether it had completed
successfully or not didn't actually work right anyway.
This reinstates the non-zero return for the "end of base entries" that
had also gotten lost in commit f0c3b5093a ("[readdir] convert
procfs"). So now you get all the base entries *and* you get all the
process entries, regardless of getdents buffer size.
(Side note: the Linux "getdents" manual page actually has a nice example
application for testing getdents, which can be easily modified to use
different buffers. Who knew? Man-pages can be useful)
Reported-by: Emmanuel Benisty <benisty.e@gmail.com>
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Cc: Richard Genoud <richard.genoud@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f6f91b0d9f ("ARM: allow kuser helpers to be removed from the
vector page") introduced some help text for the CONFIG_KUSER_HELPERS
option which is rather contradictory.
Let's fix that, and improve it a little.
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In case of normal kexec kernel load, all cpu's are offlined
before calling machine_kexec().But in case crash panic cpus
are relaxed in machine_crash_nonpanic_core() SMP function
but not offlined.
When crash kernel is loaded with kexec and on panic trigger
machine_kexec() checks for number of cpus online.
If more than one cpu is online machine_kexec() fails to load
with below error
kexec: error: multiple CPUs still online
In machine_crash_nonpanic_core() SMP function, offline CPU
before cpu_relax
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 2ba85e7af4 (ARM: Fix FIQ code on VIVT CPUs) causes the following build warning:
arch/arm/kernel/fiq.c:92:3: warning: passing argument 1 of 'cpu_cache.coherent_kern_range' makes integer from pointer without a cast [enabled by default]
Cast it as '(unsigned long)base' to avoid the warning.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>