Commit Graph

561778 Commits

Author SHA1 Message Date
Pavel Fedin bc45a516fa arm64: KVM: Correctly handle zero register during MMIO
On ARM64 register index of 31 corresponds to both zero register and SP.
However, all memory access instructions, use ZR as transfer register. SP
is used only as a base register in indirect memory addressing, or by
register-register arithmetics, which cannot be trapped here.

Correct emulation is achieved by introducing new register accessor
functions, which can do special handling for reg_num == 31. These new
accessors intentionally do not rely on old vcpu_reg() on ARM64, because
it is to be removed. Since the affected code is shared by both ARM
flavours, implementations of these accessors are also added to ARM32 code.

This patch fixes setting MMIO register to a random value (actually SP)
instead of zero by something like:

 *((volatile int *)reg) = 0;

compilers tend to generate "str wzr, [xx]" here

[Marc: Fixed 32bit splat]

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-12-04 16:29:37 +00:00
Marc Zyngier 3845d2953a PCI/MSI: Only use the generic MSI layer when domain is hierarchical
Since d8a1cb7575 ("PCI/MSI: Let pci_msi_get_domain use struct
device::msi_domain"), we use the MSI domain associated with the PCI device.

But finding an MSI domain doesn't mean that the domain is implemented using
the generic MSI domain API, and a number of MSI controllers are still using
arch_setup_msi_irq() and arch_teardown_msi_irqs().

Check that the domain we just obtained is hierarchical.  If it is, we can
use the new generic MSI stuff.  Otherwise we have to fall back to the old
arch_setup_msi_irq() and arch_teardown_msi_irqs() interfaces.

This avoids an oops in msi_domain_alloc_irqs() on systems with R-Car,
Tegra, Armada 370, and probably other DesignWare-based host controllers.

Fixes: d8a1cb7575 ("PCI/MSI: Let pci_msi_get_domain use struct device::msi_domain")
Reported-by: Phil Edworthy <phil.edworthy@renesas.com>
Tested-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
CC: stable@vger.kernel.org	# v4.3+
2015-12-04 10:28:14 -06:00
Alexandre Belloni 4a0c4c3609 USB: host: ohci-at91: fix a crash in ohci_hcd_at91_overcurrent_irq
The interrupt handler, ohci_hcd_at91_overcurrent_irq may be called right
after registration. At that time, pdev->dev.platform_data is not yet set,
leading to a NULL pointer dereference.

Fixes: e4df92279f (USB: host: ohci-at91: merge loops in ohci_hcd_at91_drv_probe)
Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: stable@vger.kernel.org # 4.3+
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-04 08:19:55 -08:00
Don Zickus 6406eeb3f5 usb: Quiet down false peer failure messages
My recent Intel box is spewing these messages:

xhci_hcd 0000:00:14.0: xHCI Host Controller
xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: xHCI Host Controller
usb usb2: Manufacturer: Linux 4.3.0+ xhci-hcd
usb usb2: SerialNumber: 0000:00:14.0
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
usb: failed to peer usb2-port2 and usb1-port1 by location (usb2-port2:none) (usb1-port1:usb2-port1)
usb usb2-port2: failed to peer to usb1-port1 (-16)
usb: port power management may be unreliable
usb: failed to peer usb2-port3 and usb1-port1 by location (usb2-port3:none) (usb1-port1:usb2-port1)
usb usb2-port3: failed to peer to usb1-port1 (-16)
usb: failed to peer usb2-port5 and usb1-port1 by location (usb2-port5:none) (usb1-port1:usb2-port1)
usb usb2-port5: failed to peer to usb1-port1 (-16)
usb: failed to peer usb2-port6 and usb1-port1 by location (usb2-port6:none) (usb1-port1:usb2-port1)
usb usb2-port6: failed to peer to usb1-port1 (-16)

Diving into the acpi tables, I noticed the EHCI hub has 12 ports while the XHCI
hub has 8 ports.  Most of those ports are of connect type USB_PORT_NOT_USED
(including port 1 of the EHCI hub).

Further the unused ports have location data initialized to 0x80000000.

Now each unused port on the xhci hub walks the port list and finds a matching
peer with port1 of the EHCI hub because the zero'd out group id bits falsely match.
After port1 of the XHCI hub, each following matching peer will generate the
above warning.

These warnings seem to be harmless for this scenario as I don't think it
matters that unused ports could not create a peer link.

The attached patch utilizes that assumption and just turns the pr_warn into
pr_debug to quiet things down.

Tested on my Intel box.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-04 08:19:55 -08:00
Chunfeng Yun 096b110a3d usb: xhci: fix config fail of FS hub behind a HS hub with MTT
if a full speed hub connects to a high speed hub which
supports MTT, the MTT field of its slot context will be set
to 1 when xHCI driver setups an xHCI virtual device in
xhci_setup_addressable_virt_dev(); once usb core fetch its
hub descriptor, and need to update the xHC's internal data
structures for the device, the HUB field of its slot context
will be set to 1 too, meanwhile MTT is also set before,
this will cause configure endpoint command fail, so in the
case, we should clear MTT to 0 for full speed hub according
to section 6.2.2

Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-04 08:19:55 -08:00
Mika Westerberg 84ed91526f xhci: Fix memory leak in xhci_pme_acpi_rtd3_enable()
There is a memory leak because acpi_evaluate_dsm() actually returns an
object which the caller is supposed to release. Fix this by calling
ACPI_FREE() for the returned object (this expands to kfree() so passing
NULL there is fine as well).

While there correct indentation in !CONFIG_ACPI case.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable <stable@vger.kernel.org> # v4.2
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-04 08:19:55 -08:00
Alex Williamson ae5515d663 Revert: "vfio: Include No-IOMMU mode"
Revert commit 033291eccb ("vfio: Include No-IOMMU mode") due to lack
of a user.  This was originally intended to fill a need for the DPDK
driver, but uptake has been slow so rather than support an unproven
kernel interface revert it and revisit when userspace catches up.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-12-04 08:38:42 -07:00
Ilya Dryomov 70b16db86f rbd: don't put snap_context twice in rbd_queue_workfn()
Commit 4e752f0ab0 ("rbd: access snapshot context and mapping size
safely") moved ceph_get_snap_context() out of rbd_img_request_create()
and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
error path in rbd_queue_workfn().  However, rbd_img_request_create()
consumes a ref on snapc, so calling ceph_put_snap_context() after
a successful rbd_img_request_create() leads to an extra put.  Fix it.

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2015-12-04 14:29:18 +01:00
Rafael J. Wysocki d441fe25e7 Merge branches 'pm-domains' and 'pm-cpufreq'
* pm-domains:
  PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
  PM / Domains: Validate cases of a non-bound driver in genpd governor

* pm-cpufreq:
  cpufreq: use last policy after online for drivers with ->setpolicy
2015-12-04 14:01:42 +01:00
Rafael J. Wysocki 3e5050e60e Merge branches 'acpica', 'acpi-video' and 'device-properties'
* acpica:
  ACPI: Better describe ACPI_DEBUGGER

* acpi-video:
  MAINTAINERS: ACPI / video: update a file name in drivers/acpi/

* device-properties:
  ACPI / property: fix compile error for acpi_node_get_property_reference() when CONFIG_ACPI=n
2015-12-04 14:01:17 +01:00
Rafael J. Wysocki c09c9dd2e9 Merge branches 'acpi-pci' and 'pm-pci'
* acpi-pci:
  x86/PCI/ACPI: Fix regression caused by commit 4d6b4e69a2

* pm-pci:
  PCI / PM: Tune down retryable runtime suspend error messages
2015-12-04 14:01:02 +01:00
Peter Zijlstra ecf7d01c22 sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule()
Oleg noticed that its possible to falsely observe p->on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.

Even though the overlap is very limited; the task is in the middle of
being scheduled out; it could still result in corruption of the
scheduler data structures.

        CPU0                            CPU1

        set_current_state(...)

        <preempt_schedule>
          context_switch(X, Y)
            prepare_lock_switch(Y)
              Y->on_cpu = 1;
            finish_lock_switch(X)
              store_release(X->on_cpu, 0);

                                        try_to_wake_up(X)
                                          LOCK(p->pi_lock);

                                          t = X->on_cpu; // 0

          context_switch(Y, X)
            prepare_lock_switch(X)
              X->on_cpu = 1;
            finish_lock_switch(Y)
              store_release(Y->on_cpu, 0);
        </preempt_schedule>

        schedule();
          deactivate_task(X);
          X->on_rq = 0;

                                          if (X->on_rq) // false

                                          if (t) while (X->on_cpu)
                                            cpu_relax();

          context_switch(X, ..)
            finish_lock_switch(X)
              store_release(X->on_cpu, 0);

Avoid the load of X->on_cpu being hoisted over the X->on_rq load.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:26:43 +01:00
Peter Zijlstra b75a225315 sched/core: Better document the try_to_wake_up() barriers
Explain how the control dependency and smp_rmb() end up providing
ACQUIRE semantics and pair with smp_store_release() in
finish_lock_switch().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:26:42 +01:00
Hiroshi Shimamoto 2541117b0c sched/cputime: Fix invalid gtime in proc
/proc/stats shows invalid gtime when the thread is running in guest.
When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is runtime enabled.

This patch makes task_gtime() just return gtime without computing the
buggy non-existing tickless delta when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check if vtime is accounting on
some cpu, in which case only we need to check the tickless delta. This
way we fix the gtime value regression on machines not running nohz full.

The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.

I ran and stop a busy loop in VM and see the gtime in host.
Dump the 43rd field which shows the gtime in every second:

	 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
	S 4348
	R 7064566
	R 7064766
	R 7064967
	R 7065168
	S 4759
	S 4759

During running busy loop, it returns large value.

After applying this patch, we can see right gtime.

	 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
	S 5338
	R 5365
	R 5465
	R 5566
	R 5666
	S 5726
	S 5726

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:18:49 +01:00
Xunlei Pang 8295c69925 sched/core: Clear the root_domain cpumasks in init_rootdomain()
root_domain::rto_mask allocated through alloc_cpumask_var()
contains garbage data, this may cause problems. For instance,
When doing pull_rt_task(), it may do useless iterations if
rto_mask retains some extra garbage bits. Worse still, this
violates the isolated domain rule for clustered scheduling
using cpuset, because the tasks(with all the cpus allowed)
belongs to one root domain can be pulled away into another
root domain.

The patch cleans the garbage by using zalloc_cpumask_var()
instead of alloc_cpumask_var() for root_domain::rto_mask
allocation, thereby addressing the issues.

Do the same thing for root_domain's other cpumask memembers:
dlo_mask, span, and online.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449057179-29321-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:16:21 +01:00
Sasha Levin 119d6f6a3b sched/core: Remove false-positive warning from wake_up_process()
Because wakeups can (fundamentally) be late, a task might not be in
the expected state. Therefore testing against a task's state is racy,
and can yield false positives.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: oleg@redhat.com
Fixes: 9067ac85d5 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task")
Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:10:16 +01:00
Peter Zijlstra 68985633bc sched/wait: Fix signal handling in bit wait helpers
Vladimir reported getting RCU stall warnings and bisected it back to
commit:

  743162013d ("sched: Remove proliferation of wait_on_bit() action functions")

That commit inadvertently reversed the calls to schedule() and signal_pending(),
thereby not handling the case where the signal receives while we sleep.

Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mark.rutland@arm.com
Cc: neilb@suse.de
Cc: oleg@redhat.com
Fixes: 743162013d ("sched: Remove proliferation of wait_on_bit() action functions")
Fixes: cbbce82209 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
Link: http://lkml.kernel.org/r/20151201130404.GL3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:10:15 +01:00
Peter Zijlstra 642c2d671c perf: Fix PERF_EVENT_IOC_PERIOD deadlock
Dmitry reported a fairly silly recursive lock deadlock for
PERF_EVENT_IOC_PERIOD, fix this by explicitly doing the inactive part of
__perf_event_period() instead of calling that function.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: c7999c6f3f ("perf: Fix PERF_EVENT_IOC_PERIOD migration race")
Link: http://lkml.kernel.org/r/20151130115615.GJ17308@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 10:08:03 +01:00
Kirill A. Shutemov 70f1528747 x86/mm: Fix regression with huge pages on PAE
Recent PAT patchset has caused issue on 32-bit PAE machines:

  page:eea45000 count:0 mapcount:-128 mapping:  (null) index:0x0 flags: 0x40000000()
  page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0)
  ------------[ cut here ]------------
  kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485!
  invalid opcode: 0000 [#1] SMP
  [...]
  Call Trace:
   unmap_single_vma
   ? __wake_up
   unmap_vmas
   unmap_region
   do_munmap
   vm_munmap
   SyS_munmap
   do_fast_syscall_32
   ? __do_page_fault
   sysenter_past_esp
  Code: ...
  EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98

The problem is in pmd_pfn_mask() and pmd_flags_mask(). These
helpers use PMD_PAGE_MASK to calculate resulting mask.
PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as
phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a
result, the upper bits of resulting mask get truncated.

pud_pfn_mask() and pud_flags_mask() aren't problematic since we
don't have PUD page table level on 32-bit systems, but it's
reasonable to keep them consistent with PMD counterpart.

Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in
addition to existing PHYSICAL_PAGE_MASK and reworks helpers to
use them.

Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Fix -Woverflow warnings from the realmode code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: linux-mm <linux-mm@kvack.org>
Fixes: f70abb0fc3 ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit")
Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 09:14:27 +01:00
Daniel Vetter bbc8764f80 drm/nouveau: Fix pre-nv50 pageflip events (v4)
Apparently pre-nv50 pageflip events happen before the actual vblank
period. Therefore that functionality got semi-disabled in

commit af4870e406
Author: Mario Kleiner <mario.kleiner.de@gmail.com>
Date:   Tue May 13 00:42:08 2014 +0200

    drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.

Unfortunately that hack got uprooted in

commit cc1ef118fc
Author: Thierry Reding <treding@nvidia.com>
Date:   Wed Aug 12 17:00:31 2015 +0200

    drm/irq: Make pipe unsigned and name consistent

Triggering a warning when trying to sample the vblank timestamp for a
non-existing pipe. There's a few ways to fix this:

- Open-code the old behaviour, which just enshrines this slight
  breakage of the userspace ABI.

- Revert Mario's commit and again inflict broken timestamps, again not
  pretty.

- Fix this for real by delaying the pageflip TS until the next vblank
  interrupt, thereby making it accurate.

This patch implements the third option. Since having a page flip
interrupt that happens when the pageflip gets armed and not when it
completes in the next vblank seems to be fairly common (older i915 hw
works very similarly) create a new helper to arm vblank events for
such drivers.

v2 (Mario Kleiner):
- Fix function prototypes in drmP.h
- Add missing vblank_put() for pageflip completion without
  pageflip event.
- Initialize sequence number for queued pageflip event to avoid
  trouble in drm_handle_vblank_events().
- Remove dead code and spelling fix.

v3 (Mario Kleiner):
- Add a signed-off-by and cc stable tag per Ilja's advice.

v4 (Thierry Reding):
- Fix kerneldoc typo, discovered by Michel Dänzer
- Rearrange tags and changelog

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # v4.3
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-04 13:49:38 +10:00
Ken Xue 4fd41a8552 SCSI: Fix NULL pointer dereference in runtime PM
The routines in scsi_pm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q->dev).

However, this assumption turns out to be wrong for things like the ses
driver.  Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting.  If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q->dev pointer.

This patch fixes the problem by checking q->dev in block layer before
handle runtime PM. Since ses doesn't define any PM callbacks and call
blk_pm_runtime_init(), the crash won't occur.

This fixes Bugzilla #101371.
https://bugzilla.kernel.org/show_bug.cgi?id=101371

More discussion can be found from below link.
http://marc.info/?l=linux-scsi&m=144163730531875&w=2

Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Terry <Michael.terry@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-03 20:35:02 -07:00
Thomas Hellstrom a0af2e538c drm: Fix an unwanted master inheritance v2
A client calling drmSetMaster() using a file descriptor that was opened
when another client was master would inherit the latter client's master
object and all its authenticated clients.

This is unwanted behaviour, and when this happens, instead allocate a
brand new master object for the client calling drmSetMaster().

Fixes a BUG() throw in vmw_master_set().

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-04 12:28:14 +10:00
Dave Airlie f46e699cb6 imx-drm crtc, plane, parallel panel, and TV encoder fixes
- Use drm_crtc_send_vblank_event to fix per crtc vblank handling
 - Move the crtc device of_node assignment out of the ipuv3-crtc driver into
   ipu-common code, where the devices are created.
 - Fix parallel display support with simple-panels
 - Remove some unused fields and superfluous checks
 - Switch to universal planes and add error handling for primary plane creation
 - Fix module autoload for TV encoder driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWXUoqAAoJEFDCiBxwnmDrNuUQALDHJMfmAxWpHxVn/YnT8/Yp
 6FD4+7FJDNG11NLXdZ4+ZdoZvHaI5yc7AVxsfrrFSHajVHlJQNrU9MdLGfxS3Ctg
 Yr3sH/EPoH9l3Dkjxko9/5D56XeiQSc7elSxDQRLyaKxC4V7Lq8S6cK8Y91oU+NG
 KJK60iTAi8JU6j5DlfDZjHmlcvimuGwm2Kyp0WC21Ks1O6y0WuQdnn6z+ZoJXUe/
 ReIfEBSainrNR9mZF2d9aBIAR2AkmHUOTnpXlbTFs7HFgQBfsu7J80OL6MRYui8S
 KBe3YE7hscKijVpjPZE2t4nW99ft2bXvHab+MIL5Crqz5XeJblhqERPoUm9iGmRe
 oLVEofXXhZ9mljDg2oKIVEX2YKjEnjxo8AzuGmeIK9UaBIB0PYQazf/626IHUT1P
 jADtKL5nO1jMtkrlogs8BSOnlbkodwqfiqYIyjLnO/HdUHNoOofq4szbTH7/sGoZ
 cUnRfwh7SAigmdxT1L0I2tRRv3BIKtZ/OYfyyRqOfgWALYFVdPI/ISDN7/CpwzNR
 qhgPFAJ59bN6riqrVJF3pq4+vmDQ30OL0TFFf9petZOm7OmsV5Bvfo2BgbatyOUo
 77NLMztA2s5ugUT8gjDjOjZPAe+pByMCOg4ZRvyaE1JMyJI7ITXVLlCGk4obakDx
 yoSkRn2LiE2IKAVDalm5
 =iEJU
 -----END PGP SIGNATURE-----

Merge tag 'imx-drm-fixes-2015-12-01' of git://git.pengutronix.de/git/pza/linux into drm-fixes

imx-drm crtc, plane, parallel panel, and TV encoder fixes

- Use drm_crtc_send_vblank_event to fix per crtc vblank handling
- Move the crtc device of_node assignment out of the ipuv3-crtc driver into
  ipu-common code, where the devices are created.
- Fix parallel display support with simple-panels
- Remove some unused fields and superfluous checks
- Switch to universal planes and add error handling for primary plane creation
- Fix module autoload for TV encoder driver

* tag 'imx-drm-fixes-2015-12-01' of git://git.pengutronix.de/git/pza/linux:
  drm: imx: imx-tve: Fix module autoload for OF platform driver
  drm: imx: convert to drm_crtc_send_vblank_event()
  GPU-DRM-IMX: Delete an unnecessary check before drm_fbdev_cma_restore_mode()
  drm/imx: Remove of_node assignment from ipuv3-crtc driver probe
  gpu: ipu-v3: Assign of_node of child platform devices to corresponding ports
  gpu: ipu-v3: Remove reg_offset field
  gpu: ipu-v3: drop unused dmfc field from client platform data
  drm/imx: parallel-display: allow to determine bus format from the connected panel
  drm/imx: ipuv3-crtc: Return error if ipu_plane_init() fails for primary plane
  drm/imx: switch to universal planes
2015-12-04 12:26:29 +10:00
Dave Airlie 00b83070b3 Merge tag 'drm-intel-fixes-2015-12-03' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Another batch of drm/i915 fixes for v4.4, on top of the ones from
earlier this week. One timeout handling regression fix from Chris, and
backport of five patches from our -next to fix a power management
related HDMI hotplug regression.

* tag 'drm-intel-fixes-2015-12-03' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: take a power domain reference while checking the HDMI live status
  drm/i915: add MISSING_CASE to a few port/aux power domain helpers
  drm/i915/ddi: fix intel_display_port_aux_power_domain() after HDMI detect
  drm/i915: Introduce a gmbus power domain
  drm/i915: Clean up AUX power domain handling
  drm/i915: Check the timeout passed to i915_wait_request
2015-12-04 12:23:13 +10:00
Eric Anholt 265e2cf672 PM / Domains: Fix bad of_node_put() in failure paths of genpd_dev_pm_attach()
It looks like these meant to be unreffing the
of_parse_phandle_with_args() node, since the error paths above it
don't do of_node_put.  That function returns a new ref in pd_args.np,
though, not a new ref on dev->of_node.  Also, it would have leaked the
ref in the success case.

Fixes "ERROR: Bad of_node_put()" on bcm2835 in the -EPROBE_DEFER case.

Fixes: aa42240ab2 (PM / Domains: Add generic OF-based PM domain look-up)
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Cc: 3.18+ <stable@vger.kernel.org> # 3.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-04 01:21:17 +01:00
Linus Torvalds 071f5d105a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A lot of Thanksgiving turkey leftovers accumulated, here goes:

   1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.

   2) IDs for some new iwlwifi chips, from Oren Givon.

   3) Fix rtlwifi lockups on boot, from Larry Finger.

   4) Fix memory leak in fm10k, from Stephen Hemminger.

   5) We have a route leak in the ipv6 tunnel infrastructure, fix from
      Paolo Abeni.

   6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.

   7) Wrong lockdep annotations in tcp md5 support, fix from Eric
      Dumazet.

   8) Work around some middle boxes which prevent proper handling of TCP
      Fast Open, from Yuchung Cheng.

   9) TCP repair can do huge kmalloc() requests, build paged SKBs
      instead.  From Eric Dumazet.

  10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
      Borkmann.

  11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
      Nikolay Aleksandrov.

  12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
      Weikusat.

  13) Fix double free in VRF code, from Nikolay Aleksandrov.

  14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.

  15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.

  16) Fix clearing of persistent array maps in bpf, from Daniel
      Borkmann.

  17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
      early enough.  From Eric Dumazet.

  18) Fix out of bounds accesses in bpf array implementation when
      updating elements, from Daniel Borkmann.

  19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
      Dumazet.

  20) When dumping proxy neigh entries, we have to accomodate NULL
      device pointers properly, from Konstantin Khlebnikov.

  21) SCTP doesn't release all ipv6 socket resources properly, fix from
      Eric Dumazet.

  22) Prevent underflows of sch->q.qlen for multiqueue packet
      schedulers, also from Eric Dumazet.

  23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
      Huang and Michael Chan.

  24) Don't actively scan radar channels, from Antonio Quartulli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
  net: phy: reset only targeted phy
  bnxt_en: Setup uc_list mac filters after resetting the chip.
  bnxt_en: enforce proper storing of MAC address
  bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
  net: lpc_eth: remove irq > NR_IRQS check from probe()
  net_sched: fix qdisc_tree_decrease_qlen() races
  openvswitch: fix hangup on vxlan/gre/geneve device deletion
  ipv4: igmp: Allow removing groups from a removed interface
  ipv6: sctp: implement sctp_v6_destroy_sock()
  arm64: bpf: add 'store immediate' instruction
  ipv6: kill sk_dst_lock
  ipv6: sctp: add rcu protection around np->opt
  net/neighbour: fix crash at dumping device-agnostic proxy entries
  sctp: use GFP_USER for user-controlled kmalloc
  sctp: convert sack_needed and sack_generation to bits
  ipv6: add complete rcu protection around np->opt
  bpf: fix allocation warnings in bpf maps and integer overflow
  mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
  net: mvneta: enable setting custom TX IP checksum limit
  net: mvneta: fix error path for building skb
  ...
2015-12-03 16:02:46 -08:00
Linus Torvalds 2873d32ff4 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes from this series.  The most important here is a
  regression fix for an issue that some folks would hit in blk-merge.c,
  and the NVMe queue depth limit for the screwed up Apple "nvme"
  controller.

  In more detail, this pull request contains:

   - a set of fixes for null_blk, including a fix for a few corner cases
     where we could hang the device.  From Arianna and Paolo.

   - lightnvm:
        - A build improvement from Keith.
        - Update the qemu pci id detection from Matias.
        - Error handling fixes for leaks and other little fixes from
          Sudip and Wenwei.

   - fix from Eric where BLKRRPART would not return EBUSY for whole
     device mounts, only when partitions were mounted.

   - fix from Jan Kara, where EOF O_DIRECT reads would return
     negatively.

   - remove check for rq_mergeable() when checking limits for cloned
     requests.  The check doesn't make any sense.  It's assuming that
     since NOMERGE is set on the request that we don't have to
     recalculate limits since the request didn't change, but that's not
     true if the request has been redirected.  From Hannes.

   - correctly get the bio front segment value set for single segment
     bio's, fixing a BUG() in blk-merge.  From Ming"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: temporary fix for Apple controller reset
  null_blk: change type of completion_nsec to unsigned long
  null_blk: guarantee device restart in all irq modes
  null_blk: set a separate timer for each command
  blk-merge: fix computing bio->bi_seg_front_size in case of single segment
  direct-io: Fix negative return from dio read beyond eof
  block: Always check queue limits for cloned requests
  lightnvm: missing nvm_lock acquire
  lightnvm: unconverted ppa returned in get_bb_tbl
  lightnvm: refactor and change vendor id for qemu
  lightnvm: do device max sectors boundary check first
  lightnvm: fix ioctl memory leaks
  lightnvm: free memory when gennvm register fails
  lightnvm: Simplify config when disabled
  Return EBUSY from BLKRRPART for mounted whole-dev fs
2015-12-03 15:45:16 -08:00
Linus Torvalds c041f08738 During the merge window I added a new file that is used to filter trace
events on pids. It filters all events where only tasks with their pid in that
 file exists. It also handles the sched_switch and sched_wakeup trace events
 where the current task does not have its pid in the file, but the task
 either being switched to or awaken does.
 
 Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both of
 these tracepoints use the same class as the sched_wakeup tracepoint, and
 they too should be included in what gets filtered by the set_event_pid file.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWX7XpAAoJEKKk/i67LK/8G6gH/0W9QxCt/iS+J+x3gAPPs+/9
 jtwZgAOWjq+118ZpWORtgRcoH2r/sJUNwlXqhMojAHsPlwZsr6TXkWJkgyNdZZ7B
 QdUtZrr+egGYvd7TE0ONi/XrLTe9VLtBQsh5pN7l9fF9TjxYUmu5V9LplH9z0RxW
 Hw8EzqGzG2iZnXYCnErtu5jRLmr18f2u9aUptPAc4bYPLVUUw9M9MqRV/ZwQxsaX
 1mfIoR5SVC5IWW/R07qjULlbFpvNXkVJ56HwXMVBN44mYz3eUGYBKzjyAJ0Ugymf
 CNDPzh4HgVFsEqDedr0D8T5WZNJSUErdbHVSWze+CCUNfYikvU7gNmxNJ89Q3P4=
 =LsnU
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "During the merge window I added a new file that is used to filter
  trace events on pids.  It filters all events where only tasks with
  their pid in that file exists.  It also handles the sched_switch and
  sched_wakeup trace events where the current task does not have its pid
  in the file, but the task either being switched to or awaken does.

  Unfortunately, I forgot about sched_wakeup_new and sched_waking.  Both
  of these tracepoints use the same class as the sched_wakeup
  tracepoint, and they too should be included in what gets filtered by
  the set_event_pid file"

* tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
2015-12-03 15:23:17 -08:00
David S. Miller e3c9b1ef78 A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
    channels (from Antonio)
  * fix uninitialized variable in remain-on-channel that
    could lead to treating frame TX as remain-on-channel
    and not sending the frame at all
  * remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
    was broken and needs more work, we'll enable it later
  * fix call_rcu() induced use-after-reset/free in mesh
    (that was suddenly causing issues in certain tests)
  * always request block-ack window size 64 as we found
    some APs will otherwise crash (really ...)
  * fix P2P-Device teardown sequence to avoid restarting
    with uninitialized data
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJWX2JpAAoJEGt7eEactAAd+9cQAJmn3zt0orj/sASv7BeF0h5d
 sRfAhkBVOTZur8MgVj1c7fNzT3h1HYNei6c4SA2+rphy6Vbifoli1nLNloC+1Ld4
 2WXllEqVe473GqVofCxZHsYZPr2Inmhj7uMiDqvoKUiSRz7phmkY9m+Vju6WZG/W
 F6FrTLqFS7UDHIYNYH1DNVSScd/89Gu6pHZvEpoHkrsvt5rEEZiPAQ7sDB4MAMSm
 amETtuBqgX83gHR2G4UT2Z9r8TVdzhO+s7vvdVjj0qbP6C6BaS9IUXDjmm3gOvHy
 G7j9MJuUC8w/2fZ5A5/l94OuN5rF/ZFMNkn2e6OIzg0HjEZh74CeLl21CnuxdNpB
 ECmDVbKoI3OVoFbhEl7P5fBokzZsqhAXpZOmbYEeFRyO6lF2Mv9uzttsF6EOmCX0
 BjIoEXOWA2o6IUD/8M6NjW+/B58SDDVi9Mg6D+7Dn7rUFlQ4pddjb0m94bI8GQQU
 wl7gROMvYR3tIhiMs1bLF9jJgA831WGWu9eiq8mT2kHPaEV2bFO7OK+SUxyZu1M7
 UhN4eoLpU84v9QNJ34N8RCiYxEZ1e6HQxBwQn/fDIWOjOHryZoArhicFY9aOEja4
 9xBI9OJhBWOL4N4AFdmTExBdYudSgCTpX+/gQ4tSfedz3lqF79y8+PILwv6E1Q6D
 8pScH/4pVo4v5omGaMpA
 =vui7
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A small set of fixes for 4.4:
 * fix scanning in mac80211 to not actively scan radar
   channels (from Antonio)
 * fix uninitialized variable in remain-on-channel that
   could lead to treating frame TX as remain-on-channel
   and not sending the frame at all
 * remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
   was broken and needs more work, we'll enable it later
 * fix call_rcu() induced use-after-reset/free in mesh
   (that was suddenly causing issues in certain tests)
 * always request block-ack window size 64 as we found
   some APs will otherwise crash (really ...)
 * fix P2P-Device teardown sequence to avoid restarting
   with uninitialized data
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:56:22 -05:00
Jérôme Pouiller cf18b7788f net: phy: reset only targeted phy
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.

Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:26:13 -05:00
David S. Miller c5ba5c8ac8 Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: set mac address and uc_list bug fixes.

Fix ndo_set_mac_address() for PF and VF.
Re-apply uc_list after chip reset.

v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:07:14 -05:00
Michael Chan b664f008b0 bnxt_en: Setup uc_list mac filters after resetting the chip.
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters.  Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.

Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:07:13 -05:00
Jeffrey Huang bdd4347b33 bnxt_en: enforce proper storing of MAC address
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW.  For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.

v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.

Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:07:13 -05:00
Jeffrey Huang 1fc2cfd03b bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.

Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:07:13 -05:00
Vladimir Zapolskiy 39198ec987 net: lpc_eth: remove irq > NR_IRQS check from probe()
If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.

This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.

Fixes a runtime problem:

    lpc-eth 31060000.ethernet: error getting resources.
    lpc_eth: lpc-eth: not found (-6).

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:02:24 -05:00
Eric Dumazet 4eaf3b84f2 net_sched: fix qdisc_tree_decrease_qlen() races
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.

One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[  681.774821] PAX: sch->q.qlen: 0 n: 1
[  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
[  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[  681.774962] Call Trace:
[  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
[  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
[  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70

mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.

A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.

Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.

Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.

A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.

Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 14:59:05 -05:00
Paolo Abeni 1317530302 openvswitch: fix hangup on vxlan/gre/geneve device deletion
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc39 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 14:29:25 -05:00
James Bottomley be9e2f775f Merge branch 'mkp-fixes' into fixes 2015-12-03 09:32:33 -08:00
James Bottomley 3ddda3e4c8 mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility
The non-PCI builds of the O day test project are failing:

On Thu, 2015-12-03 at 05:02 +0800, kbuild test robot wrote:
> warning: (SCSI_MPT2SAS) selects SCSI_MPT3SAS which has unmet direct
> dependencies (SCSI_LOWLEVEL && PCI && SCSI)

The problem is that select and depend don't interact because Kconfig doesn't
have a SAT solver, so depend picks up dependencies and select does onward
selects, but select doesn't pick up dependencies.  To fix this, we need to add
the correct dependencies to the MPT2SAS option like this.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: b840c3627b
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2015-12-03 09:31:23 -08:00
Andrew Lunn 4eba7bb1d7 ipv4: igmp: Allow removing groups from a removed interface
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.

If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a53 ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.

The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a53 "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 12:07:05 -05:00
Eric Dumazet 602dd62dfb ipv6: sctp: implement sctp_v6_destroy_sock()
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.

We need to call inet6_destroy_sock() to properly release
inet6 specific fields.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 12:05:57 -05:00
David S. Miller 79aecc7216 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2015-12-01

Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
the Security Manager L2CAP channel that'll happen for every LE
connection.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 12:04:05 -05:00
Yang Shi df849ba3a8 arm64: bpf: add 'store immediate' instruction
aarch64 doesn't have native store immediate instruction, such operation
has to be implemented by the below instruction sequence:

Load immediate to register
Store register

Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:38:31 -05:00
Eric Dumazet 6bd4f355df ipv6: kill sk_dst_lock
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.

ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.

__ip6_dst_store() and ip6_dst_store() share same implementation.

sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()

Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

This is because ip6_dst_store() can be called from process context,
without any lock held.

As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()

Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:32:06 -05:00
Eric Dumazet c836a8ba93 ipv6: sctp: add rcu protection around np->opt
This patch completes the work I did in commit 45f6fad84c
("ipv6: add complete rcu protection around np->opt"), as I missed
sctp part.

This simply makes sure np->opt is used with proper RCU locking
and accessors.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:30:58 -05:00
Tejun Heo 67cde9c493 cgroup_pids: don't account for the root cgroup
Because accounting resources for the root cgroup sometimes incurs
measureable overhead for workloads which don't care about cgroup and
often ends up calculating a number which is available elsewhere in a
slightly different form, cgroup is not in the business of providing
system-wide statistics.  The pids controller which was introduced
recently was exposing "pids.current" at the root.  This patch disable
accounting for root cgroup and removes the file from the root
directory.

While this is a userland visible behavior change, pids has been
available only in one version and was badly broken there, so I don't
think this will be noticeable.  If it turns out to be a problem, we
can reinstate it for v1 hierarchies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-12-03 10:18:21 -05:00
Tejun Heo 1f7dd3e5a6 cgroup: fix handling of multi-destination migration from subtree_control enabling
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [<ffffffff81551ffc>] dump_stack+0x4e/0x82
  [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
  [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
  [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
  [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
  [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
  [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
  [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
  [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
  [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
  [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
  [<ffffffff81265f88>] __vfs_write+0x28/0xe0
  [<ffffffff812666fc>] vfs_write+0xac/0x1a0
  [<ffffffff81267019>] SyS_write+0x49/0xb0
  [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated.  All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accomodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-12-03 10:18:21 -05:00
Tejun Heo 599c963a0f cgroup_freezer: simplify propagation of CGROUP_FROZEN clearing in freezer_attach()
If one or more tasks get moved into a frozen css, the frozen state is
cleared up from the destination css so that it can be reasserted once
the migrated tasks are frozen.  freezer_attach() implements this in
two separate steps - clearing CGROUP_FROZEN on the target css while
processing each task and propagating the clearing upwards after the
task loop is done if necessary.

This patch merges the two steps.  Propagation now takes place inside
the task loop.  This simplifies the code and prepares it for the fix
of multi-destination migration.

Signed-off-by: Tejun Heo <tj@kernel.org>
2015-12-03 10:18:21 -05:00
Maxime Ripard 59f0ec231f clk: sunxi: pll2: Fix clock running too fast
Contrary to what the datasheet says, the pre divider doesn't seem to be
incremented by one in the PLL2, but just uses the value from the register,
with 0 being a bypass.

This fixes the audio playing too fast.

Since we now have the same pre-divider flags, and the only difference with
the A10 is the post-divider offset, also remove the structure to just pass
the offset as an argument.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Fixes: eb662f8547 ("clk: sunxi: pll2: Add A13 support")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-12-02 23:27:47 -08:00
Konstantin Khlebnikov 6adc5fd6a1 net/neighbour: fix crash at dumping device-agnostic proxy entries
Proxy entries could have null pointer to net-device.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c1420 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 00:07:51 -05:00