Commit Graph

481427 Commits

Author SHA1 Message Date
Kirill Tkhai 753899183c sched/fair: Kill task_struct::numa_entry and numa_group::task_list
Nobody iterates over numa_group::task_list, this just confuses the readers.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415358456.28592.17.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:58:48 +01:00
Ingo Molnar e9ac5f0fa8 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:50:25 +01:00
Stanislaw Gruszka 6e998916df sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
Commit d670ec1317 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc
test case in cost of breaking another one. After that commit, calling
clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result
of Y time being smaller than X time.

Reproducer/tester can be found further below, it can be compiled and ran by:

	gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
	while ./tst-cpuclock2 ; do : ; done

This reproducer, when running on a buggy kernel, will complain
about "clock_gettime difference too small".

Issue happens because on start in thread_group_cputimer() we initialize
sum_exec_runtime of cputimer with threads runtime not yet accounted and
then add the threads runtime to running cputimer again on scheduler
tick, making it's sum_exec_runtime bigger than actual threads runtime.

KOSAKI Motohiro posted a fix for this problem, but that patch was never
applied: https://lkml.org/lkml/2013/5/26/191 .

This patch takes different approach to cure the problem. It calls
update_curr() when cputimer starts, that assure we will have updated
stats of running threads and on the next schedule tick we will account
only the runtime that elapsed from cputimer start. That also assure we
have consistent state between cpu times of individual threads and cpu
time of the process consisted by those threads.

Full reproducer (tst-cpuclock2.c):

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <stdio.h>
	#include <time.h>
	#include <pthread.h>
	#include <stdint.h>
	#include <inttypes.h>

	/* Parameters for the Linux kernel ABI for CPU clocks.  */
	#define CPUCLOCK_SCHED          2
	#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
		((~(clockid_t) (pid) << 3) | (clockid_t) (clock))

	static pthread_barrier_t barrier;

	/* Help advance the clock.  */
	static void *chew_cpu(void *arg)
	{
		pthread_barrier_wait(&barrier);
		while (1) ;

		return NULL;
	}

	/* Don't use the glibc wrapper.  */
	static int do_nanosleep(int flags, const struct timespec *req)
	{
		clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);

		return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
	}

	static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
	{
		int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec;
		int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec;

		return after_i - before_i;
	}

	int main(void)
	{
		int result = 0;
		pthread_t th;

		pthread_barrier_init(&barrier, NULL, 2);

		if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) {
			perror("pthread_create");
			return 1;
		}

		pthread_barrier_wait(&barrier);

		/* The test.  */
		struct timespec before, after, sleeptimeabs;
		int64_t sleepdiff, diffabs;
		const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 };

		/* The relative nanosleep.  Not sure why this is needed, but its presence
		   seems to make it easier to reproduce the problem.  */
		if (do_nanosleep(0, &sleeptime) != 0) {
			perror("clock_nanosleep");
			return 1;
		}

		/* Get the current time.  */
		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) {
			perror("clock_gettime[2]");
			return 1;
		}

		/* Compute the absolute sleep time based on the current time.  */
		uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
		sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
		sleeptimeabs.tv_nsec = nsec % 1000000000;

		/* Sleep for the computed time.  */
		if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) {
			perror("absolute clock_nanosleep");
			return 1;
		}

		/* Get the time after the sleep.  */
		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) {
			perror("clock_gettime[3]");
			return 1;
		}

		/* The time after sleep should always be equal to or after the absolute sleep
		   time passed to clock_nanosleep.  */
		sleepdiff = tsdiff(&sleeptimeabs, &after);
		if (sleepdiff < 0) {
			printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
			result = 1;

			printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec);
			printf("After  %llu.%09llu\n", after.tv_sec, after.tv_nsec);
			printf("Sleep  %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
		}

		/* The difference between the timestamps taken before and after the
		   clock_nanosleep call should be equal to or more than the duration of the
		   sleep.  */
		diffabs = tsdiff(&before, &after);
		if (diffabs < sleeptime.tv_nsec) {
			printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
			result = 1;
		}

		pthread_cancel(th);

		return result;
	}

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:04:20 +01:00
Peter Zijlstra 23cfa361f3 sched/cputime: Fix cpu_timer_sample_group() double accounting
While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:

  cpu_timer_sample_group()
    thread_group_cputimer()
      thread_group_cputime()
        times->sum_exec_runtime += task_sched_runtime();

    *sample = cputime.sum_exec_runtime + task_delta_exec();

Which would make the sample run ahead, making the sleep short.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:04:18 +01:00
Peter Zijlstra 7af683350c sched/numa: Avoid selecting oneself as swap target
Because the whole numa task selection stuff runs with preemption
enabled (its long and expensive) we can end up migrating and selecting
oneself as a swap target. This doesn't really work out well -- we end
up trying to acquire the same lock twice for the swap migrate -- so
avoid this.

Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141110100328.GF29390@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:04:17 +01:00
Andrey Ryabinin c123588b3b sched/numa: Fix out of bounds read in sched_init_numa()
On latest mm + KASan patchset I've got this:

    ==================================================================
    BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c
    =============================================================================
    BUG kmalloc-8 (Not tainted): kasan error
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0
     __slab_alloc+0x4b4/0x4f0
     __kmalloc_track_caller+0x15f/0x1e0
     kstrdup+0x44/0x90
     alloc_vfsmnt+0xb0/0x2c0
     vfs_kern_mount+0x35/0x190
     kern_mount_data+0x25/0x50
     pid_ns_prepare_proc+0x19/0x50
     alloc_pid+0x5e2/0x630
     copy_process.part.41+0xdf5/0x2aa0
     do_fork+0xf5/0x460
     kernel_thread+0x21/0x30
     rest_init+0x1e/0x90
     start_kernel+0x522/0x531
     x86_64_start_reservations+0x2a/0x2c
     x86_64_start_kernel+0x15b/0x16a
    INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080
    INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70

    Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
    Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5                          proc.kk.
    Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc                          ........
    Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
    CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B          3.18.0-rc3-mm1+ #108
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
     ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18
     ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48
     ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20
    Call Trace:
    dump_stack (lib/dump_stack.c:52)
    print_trailer (mm/slub.c:645)
    object_err (mm/slub.c:652)
    ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
    kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178)
    ? kasan_poison_shadow (mm/kasan/kasan.c:48)
    ? kasan_unpoison_shadow (mm/kasan/kasan.c:54)
    ? kasan_poison_shadow (mm/kasan/kasan.c:48)
    ? kasan_kmalloc (mm/kasan/kasan.c:311)
    __asan_load4 (mm/kasan/kasan.c:371)
    ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
    sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
    kernel_init_freeable (init/main.c:869 init/main.c:997)
    ? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248)
    ? rest_init (init/main.c:924)
    kernel_init (init/main.c:929)
    ? rest_init (init/main.c:924)
    ret_from_fork (arch/x86/kernel/entry_64.S:348)
    ? rest_init (init/main.c:924)
    Read of size 4 by task swapper/0:
    Memory state around the buggy address:
     ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc
     ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc
                                                              ^
     ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
     ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

Zero 'level' (e.g. on non-NUMA system) causing out of bounds
access in this line:

     sched_max_numa_distance = sched_domains_numa_distance[level - 1];

Fix this by exiting from sched_init_numa() earlier.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Fixes: 9942f79ba ("sched/numa: Export info needed for NUMA balancing on complex topologies")
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-10 10:33:22 +01:00
Iulia Manda 44dba3d5d6 sched: Refactor task_struct to use numa_faults instead of numa_* pointers
This patch simplifies task_struct by removing the four numa_* pointers
in the same array and replacing them with the array pointer. By doing this,
on x86_64, the size of task_struct is reduced by 3 ulong pointers (24 bytes on
x86_64).

A new parameter is added to the task_faults_idx function so that it can return
an index to the correct offset, corresponding with the old precalculated
pointers.

All of the code in sched/ that depended on task_faults_idx and numa_* was
changed in order to match the new logic.

Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: dave@stgolabs.net
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:57 +01:00
Wanpeng Li cad3bb32e1 sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
There are both UP and SMP version of pull_dl_task(), so don't need
to check CONFIG_SMP in switched_from_dl();

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-6-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:56 +01:00
Wanpeng Li cd66091162 sched/deadline: Reschedule from switched_from_dl() after a successful pull
In switched_from_dl() we have to issue a resched if we successfully
pulled some task from other cpus. This patch also aligns the behavior
with -rt.

Suggested-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-5-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:55 +01:00
Wanpeng Li 6b0a563f3a sched/deadline: Push task away if the deadline is equal to curr during wakeup
This patch pushes task away if the dealine of the task is equal
to current during wake up. The same behavior as rt class.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-4-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:55 +01:00
Wanpeng Li acb32132ec sched/deadline: Add deadline rq status print
This patch add deadline rq status print.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-3-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:54 +01:00
Wanpeng Li 804968809c sched/deadline: Fix artificial overrun introduced by yield_task_dl()
The yield semantic of deadline class is to reduce remaining runtime to
zero, and then update_curr_dl() will stop it. However, comsumed bandwidth
is reduced from the budget of yield task again even if it has already been
set to zero which leads to artificial overrun. This patch fix it by make
sure we don't steal some more time from the task that yielded in update_curr_dl().

Suggested-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-2-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:53 +01:00
Wanpeng Li 308a623a40 sched/rt: Clean up check_preempt_equal_prio()
This patch checks if current can be pushed/pulled somewhere else
in advance to make logic clear, the same behavior as dl class.

- If current can't be migrated, useless to reschedule, let's hope
  task can move out.
- If task is migratable, so let's not schedule it and see if it
  can be pushed or pulled somewhere else.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414708776-124078-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:52 +01:00
Juri Lelli 75e23e49db sched/core: Use dl_bw_of() under rcu_read_lock_sched()
As per commit f10e00f4bf ("sched/dl: Use dl_bw_of() under
rcu_read_lock_sched()"), dl_bw_of() has to be protected by
rcu_read_lock_sched().

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:52 +01:00
Yao Dongdong 9f96742a13 sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
Idle cpu is idler than non-idle cpu, so we needn't search for least_loaded_cpu
after we have found an idle cpu.

Signed-off-by: Yao Dongdong <yaodongdong@huawei.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414469286-6023-1-git-send-email-yaodongdong@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:51 +01:00
Kirill Tkhai 67dfa1b756 sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()
Currently used hrtimer_try_to_cancel() is racy:

raw_spin_lock(&rq->lock)
...                            dl_task_timer                 raw_spin_lock(&rq->lock)
...                               raw_spin_lock(&rq->lock)   ...
   switched_from_dl()             ...                        ...
      hrtimer_try_to_cancel()     ...                        ...
   switched_to_fair()             ...                        ...
...                               ...                        ...
...                               ...                        ...
raw_spin_unlock(&rq->lock)        ...                        (asquired)
...                               ...                        ...
...                               ...                        ...
do_exit()                         ...                        ...
   schedule()                     ...                        ...
      raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
      ...                         ...                        ...
      raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
      ...                         ...                        (asquired)
      put_task_struct()           ...                        ...
          free_task_struct()      ...                        ...
      ...                         ...                        raw_spin_unlock(&rq->lock)
...                               (asquired)                 ...
...                               ...                        ...
...                               (use after free)           ...

So, let's implement 100% guaranteed way to cancel the timer and let's
be sure we are safe even in very unlikely situations.

rq unlocking does not limit the area of switched_from_dl() use, because
this has already been possible in pull_dl_task() below.

Let's consider the safety of of this unlocking. New code in the patch
is working when hrtimer_try_to_cancel() fails. This means the callback
is running. In this case hrtimer_cancel() is just waiting till the
callback is finished. Two

1) Since we are in switched_from_dl(), new class is not dl_sched_class and
new prio is not less MAX_DL_PRIO. So, the callback returns early; it's
right after !dl_task() check. After that hrtimer_cancel() returns back too.

The above is:

raw_spin_lock(rq->lock);                  ...
...                                       dl_task_timer()
...                                          raw_spin_lock(rq->lock);
   switched_from_dl()                        ...
       hrtimer_try_to_cancel()               ...
          raw_spin_unlock(rq->lock);         ...
          hrtimer_cancel()                   ...
          ...                                raw_spin_unlock(rq->lock);
          ...                                return HRTIMER_NORESTART;
          ...                             ...
          raw_spin_lock(rq->lock);        ...

2) But the below is also possible:
                                   dl_task_timer()
                                      raw_spin_lock(rq->lock);
                                      ...
                                      raw_spin_unlock(rq->lock);
raw_spin_lock(rq->lock);              ...
   switched_from_dl()                 ...
       hrtimer_try_to_cancel()        ...
       ...                            return HRTIMER_NORESTART;
       raw_spin_unlock(rq->lock);  ...
       hrtimer_cancel();           ...
       raw_spin_lock(rq->lock);    ...

In this case hrtimer_cancel() returns immediately. Very unlikely case,
just to mention.

Nobody can manipulate the task, because check_class_changed() is
always called with pi_lock locked. Nobody can force the task to
participate in (concurrent) priority inheritance schemes (the same reason).

All concurrent task operations require pi_lock, which is held by us.
No deadlocks with dl_task_timer() are possible, because it returns
right after !dl_task() check (it does nothing).

If we receive a new dl_task during the time of unlocked rq, we just
don't have to do pull_dl_task() in switched_from_dl() further.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
[ Added comments]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:50 +01:00
Peter Zijlstra e7097e8bd0 sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
In some cases this can trigger a true flood of output.

Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:49 +01:00
Peter Zijlstra ff960a7317 netdev, sched/wait: Fix sleeping inside wait event
rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a
wait loop. The wait loop relies on current->state to function, but so
does mutex_lock(), nesting them makes for the inner to destroy the
outer state.

Fix this using the new wait_woken() bits.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
Cc: stephen hemminger <stephen@networkplumber.org>
Cc: Tom Gundersen <teg@jklm.no>
Cc: Tom Herbert <therbert@google.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:48 +01:00
Peter Zijlstra eedf7e47da rfcomm, sched/wait: Fix broken wait construct
rfcomm_run() is a tad broken in that is has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.

Fix this using the new wait_woken() facility.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Alexander Holler <holler@ahsoftware.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Joe Perches <joe@perches.com>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Libor Pechacek <lpechacek@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vignesh Raman <Vignesh_Raman@mentor.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:47 +01:00
Peter Zijlstra 6b55fc63f4 audit, sched/wait: Fixup kauditd_thread() wait loop
The kauditd_thread wait loop is a bit iffy; it has a number of problems:

 - calls try_to_freeze() before schedule(); you typically want the
   thread to re-evaluate the sleep condition when unfreezing, also
   freeze_task() issues a wakeup.

 - it unconditionally does the {add,remove}_wait_queue(), even when the
   sleep condition is false.

Use wait_event_freezable() that does the right thing.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: oleg@redhat.com
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:47 +01:00
Peter Zijlstra (Intel) 5d4d565824 sched/wait: Remove wait_event_freezekillable()
There is no user.. make it go away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:46 +01:00
Peter Zijlstra 36df04bc52 sched/wait: Reimplement wait_event_freezable()
Provide better implementations of wait_event_freezable() APIs.

The problem is with freezer_do_not_count(), it hides the thread from
the freezer, even though this thread might not actually freeze/sleep
at all.

Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-d86fz1jmso9wjxa8jfpinp8o@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:45 +01:00
Peter Zijlstra cb6538e740 sched/wait: Fix a kthread race with wait_woken()
There is a race between kthread_stop() and the new wait_woken() that
can result in a lack of progress.

CPU 0                                    | CPU 1
                                         |
rfcomm_run()                             | kthread_stop()
  ...                                    |
  if (!test_bit(KTHREAD_SHOULD_STOP))    |
                                         |   set_bit(KTHREAD_SHOULD_STOP)
                                         |   wake_up_process()
    wait_woken()                         |   wait_for_completion()
      set_current_state(INTERRUPTIBLE)   |
      if (!WQ_FLAG_WOKEN)                |
        schedule_timeout()               |
                                         |

After which both tasks will wait.. forever.

Fix this by having wait_woken() check for kthread_should_stop() but
only for kthreads (obviously).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:44 +01:00
Kirill Tkhai f7b8a47da1 sched: Remove lockdep check in sched_move_task()
sched_move_task() is the only interface to change sched_task_group:
cpu_cgrp_subsys methods and autogroup_move_group() use it.

Everything is synchronized by task_rq_lock(), so cpu_cgroup_attach()
is ordered with other users of sched_move_task(). This means we do no
need RCU here: if we've dereferenced a tg here, the .attach method
hasn't been called for it yet.

Thus, we should pass "true" to task_css_check() to silence lockdep
warnings.

Fixes: eeb61e53ea ("sched: Fix race between task_group and sched_task_group")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414473874.8574.2.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:07:30 +01:00
Linus Torvalds 980d0d51b1 Pin control fixes for the v3.18 series:
- Two fixes for the Baytrail driver affecting IRQs and
   output state in sysfs
 - Use the linux-gpio mailing list also for pinctrl patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUV5SiAAoJEEEQszewGV1z5xAQAKEGADd/Y6Qaz/HLORf5ffjg
 R0Jkwy4B6t4nF5cjd1dy72jRWk+tx0bYFHSbtpfxYcyXJK31gYNsGxEng3gZrwOw
 dPKzivZP4FtFem0XQsXq+V6s3wzXqjl4NLSpIFANRpobpqmfov7oyaXaNkPCM7Kw
 Qtb49qPnQk2E7h/HlREE+eNRDylF6ijthmwFxw0upxQ0cZY+JXdbF2SolRM6D+cy
 Bn5lqvsZ1kDOUdwNAnmZxQv/lbtn/m1j0hHFbZItvujA6e899TAEYZSmWiGAECvx
 B3PFB5ivO+fqhi1xtuW0CqUYeUW12MIB5yZleVVn2tmBBWMcAlGV+7PXuyZW+Wwm
 ljuiO8k6jjfS5cUrf3hu9rkBz+IgTbECZCKi5ItwlzLFIHbIEVHFzaOS+RoVrUlT
 mm8cXI64IackRPHSnCdlvVVLC3M46I5DyEfmepnjY4NDRtf+icxNhbOJlK1+DCk5
 AKbfJYMnyNcgGxtct9fSCAg48n0O/hHIvqJQ9lfPxUYeYt1zo4m5SoQVDBfW86fw
 +4fZrlVDIZX6FvREJBJ91fIsgB327p5eGIBcmVpedNcCXr2Sn0e9fVKs4VEPtlf/
 3bNJJDsiBIL3SGMIwlND/uCZBy09pcbjDPLjuPRQCr0zAtajYjwn6teWG6ykS6G+
 /sn8jbBctyjf+9fOTEMF
 =dbx2
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v3.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin-control fixes from Linus Walleij:
 "This kernel cycle has been calm for both pin control and GPIO so far
  but here are three pin control patches for you anyway, only really
  dealing with Baytrail:

   - Two fixes for the Baytrail driver affecting IRQs and output state
     in sysfs
   - Use the linux-gpio mailing list also for pinctrl patches"

* tag 'pinctrl-v3.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: baytrail: show output gpio state correctly on Intel Baytrail
  pinctrl: use linux-gpio mailing list
  pinctrl: baytrail: Clear DIRECT_IRQ bit
2014-11-03 21:06:22 -08:00
Linus Torvalds f3ed88a6bc Merge branch 'fixes-for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull CMA and DMA-mapping fixes from Marek Szyprowski:
 "This contains important fixes for recently introduced highmem support
  for default contiguous memory region used for dma-mapping subsystem"

* 'fixes-for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  mm, cma: make parameters order consistent in func declaration and definition
  mm: cma: Use %pa to print physical addresses
  mm: cma: Ensure that reservations never cross the low/high mem boundary
  mm: cma: Always consider a 0 base address reservation as dynamic
  mm: cma: Don't crash on allocation if CMA area can't be activated
2014-11-03 21:01:04 -08:00
Linus Torvalds ce1928da84 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "There is a GFP flag fix from Mike Christie, an error code fix from
  Jan, and fixes for two unnecessary allocations (kmalloc and workqueue)
  from Ilya.  All are well tested.

  Ilya has one other fix on the way but it didn't get tested in time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: eliminate unnecessary allocation in process_one_ticket()
  rbd: Fix error recovery in rbd_obj_read_sync()
  libceph: use memalloc flags for net IO
  rbd: use a single workqueue for all devices
2014-11-03 15:04:26 -08:00
Linus Torvalds f4ca536f71 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k update from Geert Uytterhoeven.

Just wiring up the bpf system call.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up bpf
2014-11-03 14:09:33 -08:00
Linus Torvalds 2084becbe1 ARM: SoC fixes for 3.8-rc3
A surprisingly small batch of fixes for -rc3. Suspiciously small, I'd say.
 
 Anyway, most of this are a few defconfig updates. Some for omap to deal
 with kernel binary size (moving ipv6 to module, etc). A larger one for
 socfpga that refreshes with some churn, but also turns on a few options
 that makes the newly-added board in my bootfarm usable for testing.
 
 OMAP3 will also now warn when booted with legacy (non-DT) boot protocols,
 hopefully encouraging those who still care about some of those platforms
 to submit DT support and report bugs where needed. Nothing stops working
 though, this is just to warn for future deprecation.
 
 Beyond this, very few actual bugfixes. A PXA fix for DEBUG_LL boot hangs,
 a missing terminting entry in a dt_match array on RealView a MTD fix on
 OMAP with NAND.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJUVqZOAAoJEIwa5zzehBx322EP/3qID6KdT6d5AF2ywaHpKse/
 LSV3sQbwViGJfKo0jurcfpYUbIaT1QYXMsbjUy9ZFPa7zujgKBx6ugmoLiPKSanP
 wQODZXCvUXOXBspq17U6+kA5pKvethG9zdfFhsStLgIpw44YhWqFhGr2oK9az/pd
 ZOiI5n/96n35WIETNQdp/P/JHZziatELNUseXd7xsp6vzfaIo3CudYRj0fqX8YCy
 l+iFKUJY9gD9zdMNAzoCcXnFmol00UhDDqWrkE+5QKS+T7GVEgtIzaYmYjACyzKA
 YsTohQQPXrtdKHsnNko/7PPAUhd1xLV2gqD3Subi8m65QdDZw62xbGdVWgfD3Pa+
 TjBeIqunPBQ9rambWzpV/uWUjmgu9e7eX18MjDxChUOiBZuKtYeW1kfwS0mfybXw
 TKxJCd9HYBY94bj8sJMLGs8DlDViVjiVeuv9pBf/MzqYS+CGXFeB/yadyrsLHEjG
 bawKUIL3/bfkPbvxNlMs1tmRlVpMmBu1AVU3SwCtoDZpf/OcnRizFM4yzI9bUMkd
 0FDFYZCUvvMNeTraiptHRE3cG7io9gPX4ocIf/9LaT1jdxMedlU3O0j18HY7ZO33
 Fzr4Mr9OmEpUT1UTjTiwzE1uDTlfbawAROb8DCuzTfcW2KiIKR1ZAX4RsdE+YSvZ
 D+f0FzqEOZ07gkb08lUW
 =JasG
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-for-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A surprisingly small batch of fixes for -rc3.  Suspiciously small, I'd
  say.

  Anyway, most of this are a few defconfig updates.  Some for omap to
  deal with kernel binary size (moving ipv6 to module, etc).  A larger
  one for socfpga that refreshes with some churn, but also turns on a
  few options that makes the newly-added board in my bootfarm usable for
  testing.

  OMAP3 will also now warn when booted with legacy (non-DT) boot
  protocols, hopefully encouraging those who still care about some of
  those platforms to submit DT support and report bugs where needed.
  Nothing stops working though, this is just to warn for future
  deprecation.

  Beyond this, very few actual bugfixes.  A PXA fix for DEBUG_LL boot
  hangs, a missing terminting entry in a dt_match array on RealView a
  MTD fix on OMAP with NAND"

[ Obviously missed rc3, will make rc4 instead ;) ]

* tag 'armsoc-for-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: drop list entry for davinci
  ARM: OMAP2+: Warn about deprecated legacy booting mode
  ARM: omap2plus_defconfig: Fix errors with NAND BCH
  ARM: multi_v7_defconfig: fix support for APQ8084
  soc: versatile: Add terminating entry for realview_soc_of_match
  ARM: ixp4xx: remove compilation warnings in io.h
  MAINTAINERS: Add Soren as reviewer for Zynq
  ARM: omap2plus_defconfig: Fix bloat caused by having ipv6 built-in
  ARM: socfpga_defconfig: Update defconfig for SoCFPGA
  ARM: pxa: fix hang on startup with DEBUG_LL
2014-11-03 14:07:05 -08:00
Linus Torvalds 0df1f2487d Linux 3.18-rc3 2014-11-02 15:01:51 -08:00
Linus Torvalds 81d92dc117 Three main MTD fixes for 3.18:
* A regression from 3.16 which was noticed in 3.17. With the restructuring of
    the m25p80.c driver and the SPI NOR library framework, we omitted proper
    listing of the SPI device IDs. This means m25p80.c wouldn't auto-load
    (modprobe) properly when built as a module. For now, we duplicate the device
    IDs into both modules.
 
 * The OMAP / ELM modules were depending on an implicit link ordering. Use
   deferred probing so that the new link order (in 3.18-rc) can still allow for
   successful probing.
 
 * Fix suspend/resume support for LH28F640BF NOR flash
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUVq20AAoJEFySrpd9RFgt8XQQAI4oIygz4zGQ6n0y4HOqwOBy
 F4ZPtOuzuCYA86x2zORFgj4A9JGVjDQwTfnMQnn1NG+XEEmZMJfG2IwqlUxsZd5A
 KkAS5XUoi/Fvq95Qi95KQYXqm1dniXEGKsRFsHKXIsnnDmbqRK5fBn6Ve5PAwcau
 uru5FwrZ2Ve0EwF9/Z/bxAatRirdAhwgMGlaXdXLmL7S13NQGmXP9QI7CbxSZ38R
 GJ+A6PhiYs6Sml6Ou5bovNXyFGfx4J35pk6nTWoWe5MfZHRQk447OQwBPbsrM119
 Boq8F/6diXyJfuXdxvF6JiDmDzaw/fBY+Xuq1O6p+JzLONN16x93KlpAPhzy4a15
 PwFHCBzg5khY49if/dmrPJ+kLkU+9wIHUib8m6HSImCKBT5Bv/VJXoQ1g4s1IJ8/
 Di3mz/8pWm/cscABIkuEqb9TwwUrSHzVXgGH/p4CY0eUo8DbQQA1zDsig8aRIX36
 FlAReaHH8QivdnghkMX9Px7SIo7XoMZZEi+55k8FrIVqjqEHNGx+w+BKhxgtFggN
 nAg0l7NrLdQHpigK1SZjLFGIYi7MmarvbatUjVPGagiRqoQ0mCSS7eKX1DEs4EAo
 P2g64BSJickGAhUiAV9ZO1EBoaOU6olIPpc33J+uG/8qBU1cNClx3FJ1UPWX27JQ
 +FBsD1mec4FuoZ2SoE7r
 =IxjM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20141102' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
 "Three main MTD fixes for 3.18:

   - A regression from 3.16 which was noticed in 3.17.  With the
     restructuring of the m25p80.c driver and the SPI NOR library
     framework, we omitted proper listing of the SPI device IDs.  This
     means m25p80.c wouldn't auto-load (modprobe) properly when built as
     a module.  For now, we duplicate the device IDs into both modules.

   - The OMAP / ELM modules were depending on an implicit link ordering.
     Use deferred probing so that the new link order (in 3.18-rc) can
     still allow for successful probing.

   - Fix suspend/resume support for LH28F640BF NOR flash"

* tag 'for-linus-20141102' of git://git.infradead.org/linux-mtd:
  mtd: cfi_cmdset_0001.c: fix resume for LH28F640BF chips
  mtd: omap: fix mtd devices not showing up
  mtd: m25p80,spi-nor: Fix module aliases for m25p80
  mtd: spi-nor: make spi_nor_scan() take a chip type name, not spi_device_id
  mtd: m25p80: get rid of spi_get_device_id
2014-11-02 14:45:52 -08:00
Linus Torvalds ad2be3796f SCSI for-linus on 20141102
This is a set of six patches consisting of two MAINTAINER updates, two scsi-mq
 fixs for the old parallel interface (not every request is tagged and we need
 to set the right flags to populate the SPI tag message) and a fix for a memory
 leak in scatterlist traversal caused by a preallocation update in 3.17) and an
 ipv6 fix for cxgbi.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUVqYyAAoJEDeqqVYsXL0MDwQIAI4aCdC3LUelt6IzywToiJ9u
 GTUZU7eULzV1KuaExcBM8g9DyE3wtnhfS0WA5TqvfKf8ImMKntan0loK0juIjj8K
 UBtbSDapwkKRRAIoRHvBzcHU+Jzc//naO95ojsFt2ELHB91jta368+DiPb8m+wQx
 IKeYO6vFaSwZZCKVNaWwyCpxEKmMO+ib80zfuANBzWjau3IFxMiUofSvtPGIWb+E
 Zxi9qvOyqjfFE8OLfL9ckH5NPiIYcwmarAXyuYcJstiw6VX2deGBaJalueSSC0JV
 TAqEtT5k1AraFS2HM0TRxar+HK5iBj3s8f0pFBfgD2+oagUln6L/486nMh2Ktac=
 =l6wS
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of six patches consisting of:
   - two MAINTAINER updates
   - two scsi-mq fixs for the old parallel interface (not every request
     is tagged and we need to set the right flags to populate the SPI
     tag message)
   - a fix for a memory leak in scatterlist traversal caused by a
     preallocation update in 3.17
   - an ipv6 fix for cxgbi"

[ The scatterlist fix also came in separately through the block layer tree ]

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  MAINTAINERS: ufs - remove self
  MAINTAINERS: change hpsa and cciss maintainer
  libcxgbi : support ipv6 address host_param
  scsi: set REQ_QUEUE for the blk-mq case
  Revert "block: all blk-mq requests are tagged"
  lib/scatterlist: fix memory leak with scsi-mq
2014-11-02 14:39:35 -08:00
Linus Torvalds 12267166c5 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Nothing too astounding or major: radeon, i915, vmwgfx, armada and
  exynos.

  Biggest ones:
   - vmwgfx has one big locking regression fix
   - i915 has come displayport fixes
   - radeon has some stability and a memory alloc failure
   - armada and exynos have some vblank fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (24 commits)
  drm/exynos: correct connector->dpms field before resuming
  drm/exynos: enable vblank after DPMS on
  drm/exynos: init kms poll at the end of initialization
  drm/exynos: propagate plane initialization errors
  drm/exynos: vidi: fix build warning
  drm/exynos: remove explicit encoder/connector de-initialization
  drm/exynos: init vblank with real number of crtcs
  drm/vmwgfx: Filter out modes those cannot be supported by the current VRAM size.
  drm/vmwgfx: Fix hash key computation
  drm/vmwgfx: fix lock breakage
  drm/i915/dp: only use training pattern 3 on platforms that support it
  drm/radeon: remove some buggy dead code
  drm/i915: Ignore VBT backlight check on Macbook 2, 1
  drm/radeon: remove invalid pci id
  drm/radeon: dpm fixes for asrock systems
  radeon: clean up coding style differences in radeon_get_bios()
  drm/radeon: Use drm_malloc_ab instead of kmalloc_array
  drm/radeon/dpm: disable ulv support on SI
  drm/i915: Fix GMBUSFREQ on vlv/chv
  drm/i915: Ignore long hpds on eDP ports
  ...
2014-11-02 14:27:30 -08:00
Olof Johansson 4257412db5 Few fixes for omaps to enable NAND BCH so devices won't
produce errors when booted with omap2plus_defconfig, and
 reduce bloat by making IPV6 a loadable module.
 
 Also let's add a warning about legacy boot being deprecated
 for omap3.
 
 We now have things working with device tree, and only omap3 is
 still booting in legacy mode. So hopefully this warning will
 help move the remaining legacy mode users to boot with device
 tree.
 
 As the total reduction of code and static data is somewhere
 around 20000 lines of code once we remove omap3 legacy mode
 booting, we really do want to make omap3 to boot also in
 device tree mode only over the next few merge cycles.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUUmuGAAoJEBvUPslcq6VzcvkQAJkQhYXffP+lN8XHvZKNSnXb
 aue3W1PBI9zPwnMfySryo/262od/cz1ucgLwNn7SHFs2EhfEYNKSjgF++q+ave8q
 frqwbEE0Pe5dGlV9tmA6SdffsiW2hLPuI1XF9a1XBW2qD5LHBidJLconUILnQsH2
 E7ddhf2K/yAizF3BRhOeX7YVzOG5hxodBuhPz1fqLllfTA3EBXgB33wffHSOGVls
 r1fgpaUFzaBoJ5ZZyqNx7chGBRrOxAOviz/zvp8ljFkCelENkBa8IkuSiNRbiWlk
 el4wo3HfCRypqKBpXlDkD6Gfkugvc54xmkXXod/qgNnFchaIR0LENuzNonHZlC2S
 cIVGuqIhRaIHkLMtJKATH13WmA7MqYg3Vcjqm5VnVpaqM6Q4390Ug000UpLj7FEB
 6FXnN3tQJLBgmvc5SDhohXNzWZSjXGTUBo92fS9zc4BzN9CUrLsEfKZtIcHA7HSZ
 rseGiLx0dyrYPdwsyEfp1row/ZWw9fGbtWCumN1LXlnQH/IXwGGf+irOeKbMmU2V
 nRZxpQIQtEfpJTJf0lKMZV7JUm93MXOvkcMPaPq9PtEn0plC7YIohPrO/yEPdl9K
 EIuehWwRbbjWoxtznTLlm9FOwNBg/PtChq6XP+M+wHzeDfiumA3jwPQEtmUkhha3
 wy42f5Bi3MhtIzZ51oDm
 =WvgB
 -----END PGP SIGNATURE-----

Merge tag 'fixes-against-v3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

Merge "omap fixes against v3.18-rc2" from Tony Lindgren:

Few fixes for omaps to enable NAND BCH so devices won't
produce errors when booted with omap2plus_defconfig, and
reduce bloat by making IPV6 a loadable module.

Also let's add a warning about legacy boot being deprecated
for omap3.

We now have things working with device tree, and only omap3 is
still booting in legacy mode. So hopefully this warning will
help move the remaining legacy mode users to boot with device
tree.

As the total reduction of code and static data is somewhere
around 20000 lines of code once we remove omap3 legacy mode
booting, we really do want to make omap3 to boot also in
device tree mode only over the next few merge cycles.

* tag 'fixes-against-v3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (407 commits)
  ARM: OMAP2+: Warn about deprecated legacy booting mode
  ARM: omap2plus_defconfig: Fix errors with NAND BCH
  ARM: omap2plus_defconfig: Fix bloat caused by having ipv6 built-in
  + Linux 3.18-rc2

Signed-off-by: Olof Johansson <olof@lixom.net>
2014-11-02 13:37:07 -08:00
Linus Torvalds 3c43de0ffd Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 - add the new bpf syscall to ARM.
 - drop a redundant return statement in __iommu_alloc_remap()
 - fix a performance issue noticed by Thomas Petazzoni with
   kmap_atomic().
 - fix an issue with the L2 cache OF parsing code which caused it to
   incorrectly print warnings on each boot, and make the warning text
   more consistent with the rest of the code

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8180/1: mm: implement no-highmem fast path in kmap_atomic_pfn()
  ARM: 8183/1: l2c: Improve l2c310_of_parse() error message
  ARM: 8181/1: Drop extra return statement
  ARM: 8182/1: l2c: Make l2x0_cache_size_of_parse() return 'int'
  ARM: enable bpf syscall
2014-11-02 12:56:20 -08:00
Linus Torvalds 7501a53329 A small set of x86 fixes. The most serious is an SRCU lockdep fix.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUVd9KAAoJEL/70l94x66Dc1AH/0jdb8DsewyAuJzLKaJ/qJwK
 9JMqglpDQ+Sm0f2puPyJkR8NQd2AMPK7J5aJjWAl/XxJjsDcn+TQur20okzUDXLJ
 21sIbqo92hCgpSNs+RHLHlj7/iMQVYnMFh7bp6JcvzmhpN8F/D793BT+oOxdjMRg
 PLCQ794ugGhFboesDkV822VWgtQ26yG2aQDWbYgL9r5xPp5OpbzSiq85KopSEfS0
 K+PPntI8yNI+EvOC9ta0FfEOMMfQoLDds+V0FXiEIRx43MV8bwAXpWzsB8ibd1F6
 eY+cVvSPzWgDSCVLn3gfYkrRl3sWGdvyfxTe/cz507ZfXcuT2uHJhtbpH2KCGto=
 =FJ6/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A small set of x86 fixes.  The most serious is an SRCU lockdep fix.

  A bit late - needed some time to test the SRCU fix, which only came in
  on Friday"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx: defer load of APIC access page address during reset
  KVM: nVMX: Disable preemption while reading from shadow VMCS
  KVM: x86: Fix far-jump to non-canonical check
  KVM: emulator: fix execution close to the segment limit
  KVM: emulator: fix error code for __linearize
2014-11-02 12:31:02 -08:00
Dave Airlie 66338feee4 Merge branch 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
This pull-request includes some bug fixes and code cleanups.
Especially, this fixes the bind failure issue occurred when it tries
to re-bind Exynos drm driver after unbound, and the modetest failure
issue incurred by not having a pair to vblank on and off requests.

* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
  drm/exynos: correct connector->dpms field before resuming
  drm/exynos: enable vblank after DPMS on
  drm/exynos: init kms poll at the end of initialization
  drm/exynos: propagate plane initialization errors
  drm/exynos: vidi: fix build warning
  drm/exynos: remove explicit encoder/connector de-initialization
  drm/exynos: init vblank with real number of crtcs
2014-11-03 05:23:17 +10:00
Linus Torvalds 7e05b807b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
 "A bunch of assorted fixes, most of them followups to overlayfs merge"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: initialize ->is_cursor
  Return short read or 0 at end of a raw device, not EIO
  isofs: don't bother with ->d_op for normal case
  isofs_cmp(): we'll never see a dentry for . or ..
  overlayfs: fix lockdep misannotation
  ovl: fix check for cursor
  overlayfs: barriers for opening upper-layer directory
  rcu: Provide counterpart to rcu_dereference() for non-RCU situations
  staging: android: logger: Fix log corruption regression
2014-11-02 10:28:43 -08:00
Linus Torvalds 4cb8c3593b irda: stop calling sk_prot->disconnect() on connection failure
The sk_prot is irda's own set of protocol handlers, so irda should
statically know what that function is anyway, without using an indirect
pointer.  And as it happens, we know *exactly* what that pointer is
statically: it's NULL, because irda doesn't define a disconnect
operation.

So calling that function is doubly wrong, and will just cause an oops.

Reported-by: Martin Lang <mlg.hessigheim@gmail.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-02 10:20:26 -08:00
Andrzej Hajda 74cfe07a83 drm/exynos: correct connector->dpms field before resuming
During system suspend after connector switch off its dpms field
is set to connector previous dpms state. To properly resume dpms field
should be set to its actual state (off) before resuming to previous dpms state.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:28 +09:00
Andrzej Hajda d6948b2fd8 drm/exynos: enable vblank after DPMS on
Before DPMS off driver disables vblank.
It should be balanced by vblank enable after DPMS on.
The patch fixes issue with page_flip ioctl not being able
to acquire vblank counter introduced by patch:
drm: Always reject drm_vblank_get() after drm_vblank_off()

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:28 +09:00
Andrzej Hajda 3cb6830a75 drm/exynos: init kms poll at the end of initialization
HPD events can be generated by components even if drm_dev is not fully
initialized, to skip such events kms poll initialization should
be performed at the end of load callback followed directly by forced
connection detection.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:28 +09:00
Andrzej Hajda 64f7aed83d drm/exynos: propagate plane initialization errors
In case of error during plane initialization load callback
incorrectly return success, this patch fixes it.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:28 +09:00
Inki Dae 9887e2d9da drm/exynos: vidi: fix build warning
encoder object isn't used anymore so remove it.

Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:27 +09:00
Andrzej Hajda d9aaf75762 drm/exynos: remove explicit encoder/connector de-initialization
All KMS objects are destroyed by drm_mode_config_cleanup in proper order
so component drivers should not care about it.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:27 +09:00
Andrzej Hajda c52142e6a8 drm/exynos: init vblank with real number of crtcs
Initialization of vblank with MAX_CRTC caused attempts
to disabling vblanks for non-existing crtcs in case
drm used fewer crtcs. The patch fixes it.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2014-11-03 01:51:27 +09:00
Paolo Bonzini a73896cb5b KVM: vmx: defer load of APIC access page address during reset
Most call paths to vmx_vcpu_reset do not hold the SRCU lock.  Defer loading
the APIC access page to the next vmentry.

This avoids the following lockdep splat:

[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
-------------------------------
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
 #0:  (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
 0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
 ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
 ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
Call Trace:
 [<ffffffff816f514f>] dump_stack+0x4e/0x71
 [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
 [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
 [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
 [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
 [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
 [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
 [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
 [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
 [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
 [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80
 [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
 [<ffffffff8122ee45>] ? __fget+0x5/0x250
 [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
 [<ffffffff81223491>] SyS_ioctl+0x81/0xa0
 [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b

Reported-by: Takashi Iwai <tiwai@suse.de>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Fixes: 38b9917350
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-02 08:37:18 +01:00
Jan Kiszka 282da870f4 KVM: nVMX: Disable preemption while reading from shadow VMCS
In order to access the shadow VMCS, we need to load it. At this point,
vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
we now get preempted by Linux, vmx_vcpu_put and, on return, the
vmx_vcpu_load will work against the wrong vmcs. That can cause
copy_shadow_to_vmcs12 to corrupt the vmcs12 state.

Fix the issue by disabling preemption during the copy operation.
copy_vmcs12_to_shadow is safe from this issue as it is executed by
vmx_vcpu_run when preemption is already disabled before vmentry.

This bug is exposed by running Jailhouse within KVM on CPUs with
shadow VMCS support.  Jailhouse never expects an interrupt pending
vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
is preempted, the active VMCS happens to have the virtual interrupt
pending flag set in the CPU-based execution controls.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-02 07:55:46 +01:00
Nadav Amit 7e46dddd6f KVM: x86: Fix far-jump to non-canonical check
Commit d1442d85cc ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete.  Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP.  As we know, this imposes a security problem.

In addition, the condition for two warnings was incorrect.

Fixes: d1442d85cc
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-02 07:54:55 +01:00
Dave Airlie 10a8fce846 Merge branch 'vmwgfx-fixes-3.18' of git://people.freedesktop.org/~thomash/linux
A critical 3.18 regression fix from Rob, (thanks!)
A fix to avoid advertizing modes we can't support from Sinclair
  (welcome Sinclair!)
and a fix for an incorrect  hash key computation from me that is
  completely harmless, but can wait 'til the next merge window if necessary.
  (I can't really bother stable with this one).

* 'vmwgfx-fixes-3.18' of git://people.freedesktop.org/~thomash/linux:
  drm/vmwgfx: Filter out modes those cannot be supported by the current VRAM size.
  drm/vmwgfx: Fix hash key computation
  drm/vmwgfx: fix lock breakage
2014-11-02 09:23:31 +10:00