Commit Graph

10386 Commits

Author SHA1 Message Date
Linus Torvalds f6e858a00a Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc VM changes from Andrew Morton:
 "The rest of most-of-MM.  The other MM bits await a slab merge.

  This patch includes the addition of a huge zero_page.  Not a
  performance boost but it an save large amounts of physical memory in
  some situations.

  Also a bunch of Fujitsu engineers are working on memory hotplug.
  Which, as it turns out, was badly broken.  About half of their patches
  are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken.  We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text.  Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
  mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
  mm/memory.c: remove unused code from do_wp_page()
  asm-generic, mm: pgtable: consolidate zero page helpers
  mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
  hwpoison, hugetlbfs: fix RSS-counter warning
  hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
  mm: protect against concurrent vma expansion
  memcg: do not check for mm in __mem_cgroup_count_vm_event
  tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
  mm: provide more accurate estimation of pages occupied by memmap
  fs/buffer.c: remove redundant initialization in alloc_page_buffers()
  fs/buffer.c: do not inline exported function
  writeback: fix a typo in comment
  mm: introduce new field "managed_pages" to struct zone
  mm, oom: remove statically defined arch functions of same name
  mm, oom: remove redundant sleep in pagefault oom handler
  mm, oom: cleanup pagefault oom handler
  memory_hotplug: allow online/offline memory to result movable node
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  mm, memcg: avoid unnecessary function call when memcg is disabled
  ...
2012-12-13 13:11:15 -08:00
Linus Torvalds a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Linus Torvalds 6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
David Rientjes c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Linus Torvalds 9977d9b379 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild
2012-12-12 12:22:13 -08:00
Linus Torvalds f57d54bab6 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest change affects group scheduling: we now track the runnable
  average on a per-task entity basis, allowing a smoother, exponential
  decay average based load/weight estimation instead of the previous
  binary on-the-runqueue/off-the-runqueue load weight method.

  This will inevitably disturb workloads that were in some sort of
  borderline balancing state or unstable equilibrium, so an eye has to
  be kept on regressions.

  For that reason the new load average is only limited to group
  scheduling (shares distribution) at the moment (which was also hurting
  the most from the prior, crude weight calculation and whose scheduling
  quality wins most from this change) - but we plan to extend this to
  regular SMP balancing as well in the future, which will simplify and
  speed up things a bit.

  Other changes involve ongoing preparatory work to extend NOHZ to the
  scheduler as well, eventually allowing completely irq-free user-space
  execution."

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled"
  cputime: Comment cputime's adjusting code
  cputime: Consolidate cputime adjustment code
  cputime: Rename thread_group_times to thread_group_cputime_adjusted
  cputime: Move thread_group_cputime() to sched code
  vtime: Warn if irqs aren't disabled on system time accounting APIs
  vtime: No need to disable irqs on vtime_account()
  vtime: Consolidate a bit the ctx switch code
  vtime: Explicitly account pending user time on process tick
  vtime: Remove the underscore prefix invasion
  sched/autogroup: Fix crash on reboot when autogroup is disabled
  cputime: Separate irqtime accounting from generic vtime
  cputime: Specialize irq vtime hooks
  kvm: Directly account vtime to system on guest switch
  vtime: Make vtime_account_system() irqsafe
  vtime: Gather vtime declarations to their own header file
  sched: Describe CFS load-balancer
  sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking
  sched: Make __update_entity_runnable_avg() fast
  sched: Update_cfs_shares at period edge
  ...
2012-12-11 18:21:38 -08:00
Linus Torvalds 090f8ccba3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Lots of activity:

   211 files changed, 8328 insertions(+), 4116 deletions(-)

  most of it on the tooling side.

  Main changes:

   * ftrace enhancements and fixes from Steve Rostedt.

   * uprobes fixes, cleanups and preparation for the ARM port from Oleg
     Nesterov.

   * UAPI fixes, from David Howels - prepares the arch/x86 UAPI
     transition

   * Separate perf tests into multiple objects, one per test, from Jiri
     Olsa.

   * Make hardware event translations available in sysfs, from Jiri
     Olsa.

   * Fixes to /proc/pid/maps parsing, preparatory to supporting data
     maps, from Namhyung Kim

   * Implement ui_progress for GTK, from Namhyung Kim

   * Add framework for automated perf_event_attr tests, where tools with
     different command line options will be run from a 'perf test', via
     python glue, and the perf syscall will be intercepted to verify
     that the perf_event_attr fields set by the tool are those expected,
     from Jiri Olsa

   * Add a 'link' method for hists, so that we can have the leader with
     buckets for all the entries in all the hists.  This new method is
     now used in the default 'diff' output, making the sum of the
     'baseline' column be 100%, eliminating blind spots.

   * libtraceevent fixes for compiler warnings trying to make perf it
     build on some distros, like fedora 14, 32-bit, some of the warnings
     really pointed to real bugs.

   * Add a browser for 'perf script' and make it available from the
     report and annotate browsers.  It does filtering to find the
     scripts that handle events found in the perf.data file used.  From
     Feng Tang

   * perf inject changes to allow showing where a task sleeps, from
     Andrew Vagin.

   * Makefile improvements from Namhyung Kim.

   * Add --pre and --post command hooks in 'stat', from Peter Zijlstra.

   * Don't stop synthesizing threads when one vanishes, this is for the
     existing threads when we start a tool like trace.

   * Use sched:sched_stat_runtime to provide a thread summary, this
     produces the same output as the 'trace summary' subcommand of
     tglx's original "trace" tool.

   * Support interrupted syscalls in 'trace'

   * Add an event duration column and filter in 'trace'.

   * There are references to the man pages in some tools, so try to
     build Documentation when installing, warning the user if that is
     not possible, from Borislav Petkov.

   * Give user better message if precise is not supported, from David
     Ahern.

   * Try to find cross-built objdump path by using the session
     environment information in the perf.data file header, from Irina
     Tirdea, original patch and idea by Namhyung Kim.

   * Diplays more output on features check for make V=1, so that one can
     figure out what is happening by looking at gcc output, etc.  From
     Jiri Olsa.

   * Add on_exit implementation for systems without one, e.g.  Android,
     from Bernhard Rosenkraenzer.

   * Only process events for vcpus of interest, helps handling large
     number of events, from David Ahern.

   * Cross compilation fixes for Android, from Irina Tirdea.

   * Add documentation on compiling for Android, from Irina Tirdea.

   * perf diff improvements from Jiri Olsa.

   * Target (task/user/cpu/syswide) handling improvements, from Namhyung
     Kim.

   * Add support in 'trace' for tracing workload given by command line,
     from Namhyung Kim.

   * ... and much more."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
  uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
  perf evsel: Introduce is_group_member method
  perf powerpc: Use uapi/unistd.h to fix build error
  tools: Pass the target in descend
  tools: Honour the O= flag when tool build called from a higher Makefile
  tools: Define a Makefile function to do subdir processing
  perf ui: Always compile browser setup code
  perf ui: Add ui_progress__finish()
  perf ui gtk: Implement ui_progress functions
  perf ui: Introduce generic ui_progress helper
  perf ui tui: Move progress.c under ui/tui directory
  perf tools: Add basic event modifier sanity check
  perf tools: Omit group members from perf_evlist__disable/enable
  perf tools: Ensure single disable call per event in record comand
  perf tools: Fix 'disabled' attribute config for record command
  perf tools: Fix attributes for '{}' defined event groups
  perf tools: Use sscanf for parsing /proc/pid/maps
  perf tools: Add gtk.<command> config option for launching GTK browser
  perf tools: Fix compile error on NO_NEWT=1 build
  perf hists: Initialize all of he->stat with zeroes
  ...
2012-12-11 18:14:31 -08:00
Linus Torvalds 608ff1a210 Merge branch 'akpm' (Andrew's patchbomb)
Merge misc updates from Andrew Morton:
 "About half of most of MM.  Going very early this time due to
  uncertainty over the coreautounifiednumasched things.  I'll send the
  other half of most of MM tomorrow.  The rest of MM awaits a slab merge
  from Pekka."

* emailed patches from Andrew Morton: (71 commits)
  memory_hotplug: ensure every online node has NORMAL memory
  memory_hotplug: handle empty zone when online_movable/online_kernel
  mm, memory-hotplug: dynamic configure movable memory and portion memory
  drivers/base/node.c: cleanup node_state_attr[]
  bootmem: fix wrong call parameter for free_bootmem()
  avr32, kconfig: remove HAVE_ARCH_BOOTMEM
  mm: cma: remove watermark hacks
  mm: cma: skip watermarks check for already isolated blocks in split_free_page()
  mm, oom: fix race when specifying a thread as the oom origin
  mm, oom: change type of oom_score_adj to short
  mm: cleanup register_node()
  mm, mempolicy: remove duplicate code
  mm/vmscan.c: try_to_freeze() returns boolean
  mm: introduce putback_movable_pages()
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce compaction and migration for ballooned pages
  mm: introduce a common interface for balloon pages mobility
  mm: redefine address_space.assoc_mapping
  mm: adjust address_space_operations.migratepage() return code
  arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/
  ...
2012-12-11 18:05:37 -08:00
Joonsoo Kim 81df9bff26 bootmem: fix wrong call parameter for free_bootmem()
It is strange that alloc_bootmem() returns a virtual address and
free_bootmem() requires a physical address.  Anyway, free_bootmem()'s
first parameter should be physical address.

There are some call sites for free_bootmem() with virtual address.  So fix
them.

[akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_pate() documentation]
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:28 -08:00
Wen Congyang 8732794b16 numa: convert static memory to dynamically allocated memory for per node device
We use a static array to store struct node.  In many cases, we don't have
too many nodes, and some memory will be unused.  Convert it to per-device
dynamically allocated memory.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Linus Torvalds c6bd5bcc49 TTY/Serial merge for 3.8-rc1
Here's the big tty/serial tree set of changes for 3.8-rc1.
 
 Contained in here is a bunch more reworks of the tty port layer from Jiri and
 bugfixes from Alan, along with a number of other tty and serial driver updates
 by the various driver authors.
 
 Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY
 layer, which is much appreciated by me.
 
 All of these have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDHhgwACgkQMUfUDdst+ynI6wCcC+YeBwncnoWHvwLAJOwAZpUL
 bysAn28o780/lOsTzp3P1Qcjvo69nldo
 =hN/g
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY/Serial merge from Greg Kroah-Hartman:
 "Here's the big tty/serial tree set of changes for 3.8-rc1.

  Contained in here is a bunch more reworks of the tty port layer from
  Jiri and bugfixes from Alan, along with a number of other tty and
  serial driver updates by the various driver authors.

  Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the
  TTY layer, which is much appreciated by me.

  All of these have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up some trivial conflicts in the staging tree, due to the fwserial
driver having come in both ways (but fixed up a bit in the serial tree),
and the ioctl handling in the dgrp driver having been done slightly
differently (staging tree got that one right, and removed both
TIOCGSOFTCAR and TIOCSSOFTCAR).

* tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits)
  staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer()
  staging/fwserial: Remove superfluous free
  staging/fwserial: Use WARN_ONCE when port table is corrupted
  staging/fwserial: Destruct embedded tty_port on teardown
  staging/fwserial: Fix build breakage when !CONFIG_BUG
  staging: fwserial: Add TTY-over-Firewire serial driver
  drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage
  staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user()
  staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver
  serial: ifx6x60: Add modem power off function in the platform reboot process
  serial: mxs-auart: unmap the scatter list before we copy the data
  serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled
  serial: max310x: Setup missing "can_sleep" field for GPIO
  tty/serial: fix ifx6x60.c declaration warning
  serial: samsung: add devicetree properties for non-Exynos SoCs
  serial: samsung: fix potential soft lockup during uart write
  tty: vt: Remove redundant null check before kfree.
  tty/8250 Add check for pci_ioremap_bar failure
  tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards
  tty/8250 Add XR17D15x devices to the exar_handle_irq override
  ...
2012-12-11 14:08:47 -08:00
Linus Torvalds cff2f741b8 Driver core updates for 3.8-rc1
Here's the large driver core updates for 3.8-rc1.
 
 The biggest thing here is the various __dev* marking removals.  This is
 going to be a pain for the merge with different subsystem trees, I know,
 but all of the patches included here have been ACKed by their various
 subsystem maintainers, as they wanted them to go through here.
 
 If this is too much of a pain, I can pull all of them out of this tree
 and just send you one with the other fixes/updates and then, after
 3.8-rc1 is out, do the rest of the removals to ensure we catch them all,
 it's up to you.  The merges should all be trivial, and Stephen has been
 doing them all in linux-next for a few weeks now quite easily.
 
 Other than the __dev* marking removals, there's nothing major here, some
 firmware loading updates and other minor things in the driver core.
 
 All of these have (much to Stephen's annoyance), been in linux-next for
 a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDHkPkACgkQMUfUDdst+ykaWgCfW7AM30cv0nzoVO08ax6KjlG1
 KVYAn3z/KYazvp4B6LMvrW9y0G34Wmad
 =yvVr
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg Kroah-Hartman:
 "Here's the large driver core updates for 3.8-rc1.

  The biggest thing here is the various __dev* marking removals.  This
  is going to be a pain for the merge with different subsystem trees, I
  know, but all of the patches included here have been ACKed by their
  various subsystem maintainers, as they wanted them to go through here.

  If this is too much of a pain, I can pull all of them out of this tree
  and just send you one with the other fixes/updates and then, after
  3.8-rc1 is out, do the rest of the removals to ensure we catch them
  all, it's up to you.  The merges should all be trivial, and Stephen
  has been doing them all in linux-next for a few weeks now quite
  easily.

  Other than the __dev* marking removals, there's nothing major here,
  some firmware loading updates and other minor things in the driver
  core.

  All of these have (much to Stephen's annoyance), been in linux-next
  for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up trivial conflicts in drivers/gpio/gpio-{em,stmpe}.c due to gpio
update.

* tag 'driver-core-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (93 commits)
  modpost.c: Stop checking __dev* section mismatches
  init.h: Remove __dev* sections from the kernel
  acpi: remove use of __devinit
  PCI: Remove __dev* markings
  PCI: Always build setup-bus when PCI is enabled
  PCI: Move pci_uevent into pci-driver.c
  PCI: Remove CONFIG_HOTPLUG ifdefs
  unicore32/PCI: Remove CONFIG_HOTPLUG ifdefs
  sh/PCI: Remove CONFIG_HOTPLUG ifdefs
  powerpc/PCI: Remove CONFIG_HOTPLUG ifdefs
  mips/PCI: Remove CONFIG_HOTPLUG ifdefs
  microblaze/PCI: Remove CONFIG_HOTPLUG ifdefs
  dma: remove use of __devinit
  dma: remove use of __devexit_p
  firewire: remove use of __devinitdata
  firewire: remove use of __devinit
  leds: remove use of __devexit
  leds: remove use of __devinit
  leds: remove use of __devexit_p
  mmc: remove use of __devexit
  ...
2012-12-11 13:13:55 -08:00
Linus Torvalds bad73c5aa0 ACPI and power management updates for 3.8-rc1
* Introduction of device PM QoS flags.
 
 * ACPI device power management update allowing subsystems other than
   PCI to use it more easily.
 
 * ACPI device enumeration rework allowing additional kinds of devices
   to be enumerated via ACPI.  From Mika Westerberg, Adrian Hunter,
   Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki.
 
 * ACPICA update to version 20121018 from Bob Moore and Lv Zheng.
 
 * ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu.
 
 * Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU
   hot-remove support from Toshi Kani.
 
 * ACPI EC updates from Feng Tang.
 
 * cpufreq updates from Viresh Kumar, Fabio Baltieri and others.
 
 * cpuidle changes to quickly notice governor prediction failure from
   Youquan Song.
 
 * Support for using multiple cpuidle drivers at the same time and cpuidle
   cleanups from Daniel Lezcano.
 
 * devfreq updates from Nishanth Menon and others.
 
 * cpupower update from Thomas Renninger.
 
 * Fixes and small cleanups all over the place.
 
 --
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxxevAAoJEKhOf7ml8uNsHacQAK2xoQozDddPBAaTCf1OWW/G
 J4E2qMn+gy4AtFmQ0/xeZVvvylOCn9eu/kv3QJ/bFlVoUsqTgfPwQBjT6HjF1Acn
 7yVFdr9H/tn2wi9nW2Gv6Tl2Hr/kFLpo0f5Kd40Z+P8SV5sKX5ktJcVpJ/I/P4Vh
 Qw2nWtj7hQktZDERzgG4ZpMbQToNhbLhbKWB9ad3/XQSSA9JkfgvBFgrTEGHcZD5
 bwsggjNdOAWNGeDdzRsQSDn0Alld5ILVdSJ5xKimO1O70WvKc7fqA1IdYRIeKL90
 yz4bcoYKzl9iktlw8+x5o1U9mrc8TFV5p4+zV+t5Z6pzS/J3kWvnsW4zu9sCrxFv
 Wic3SKyiem7s2dxrYyj4ZXAci3GK4ouRTrCLqk7/00tEGdwAQD1ZNfsUJp6jKayz
 FvtZUgItcOyrlQ6B4nh951OY6dI3AUYJ2NuWWNr5NZkgVAvQGV8zTGOImbeVeL2+
 pMiw14zScO3ylYilVcjTKDDMj2sDZ68mw5PIcbmksvWsCLo26jDBVDtLVmtYWyd4
 ek3WnOrQZr0R3agvOLLssMKXompvpP+N4Klf4rV+GtqGsWtHryYKys2Laju9FwFj
 yYLchxYlxhGTzqq8LjF90HDL0TWpPe6cPi+B5ow9g/SXLexbMKNQGhv3Jovm2yR3
 j54tKBWy7e9AAYEDPirX
 =6OEP
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:

 - Introduction of device PM QoS flags.

 - ACPI device power management update allowing subsystems other than
   PCI to use it more easily.

 - ACPI device enumeration rework allowing additional kinds of devices
   to be enumerated via ACPI.  From Mika Westerberg, Adrian Hunter,
   Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki.

 - ACPICA update to version 20121018 from Bob Moore and Lv Zheng.

 - ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu.

 - Introduction of acpi_handle_<level>() messaging macros and ACPI-based
   CPU hot-remove support from Toshi Kani.

 - ACPI EC updates from Feng Tang.

 - cpufreq updates from Viresh Kumar, Fabio Baltieri and others.

 - cpuidle changes to quickly notice governor prediction failure from
   Youquan Song.

 - Support for using multiple cpuidle drivers at the same time and
   cpuidle cleanups from Daniel Lezcano.

 - devfreq updates from Nishanth Menon and others.

 - cpupower update from Thomas Renninger.

 - Fixes and small cleanups all over the place.

* tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits)
  mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6
  ACPI: add Haswell LPSS devices to acpi_platform_device_ids list
  ACPI: add documentation about ACPI 5 enumeration
  pnpacpi: fix incorrect TEST_ALPHA() test
  ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h
  ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000
  ACPI : do not use Lid and Sleep button for S5 wakeup
  ACPI / PNP: Do not crash due to stale pointer use during system resume
  ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist
  ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set
  spi / ACPI: add ACPI enumeration support
  gpio / ACPI: add ACPI support
  PM / devfreq: remove compiler error with module governors (2)
  cpupower: IvyBridge (0x3a and 0x3e models) support
  cpupower: Provide -c param for cpupower monitor to schedule process on all cores
  cpupower tools: Fix warning and a bug with the cpu package count
  cpupower tools: Fix malloc of cpu_info structure
  cpupower tools: Fix issues with sysfs_topology_read_file
  cpupower tools: Fix minor warnings
  cpupower tools: Update .gitignore for files created in the debug directories
  ...
2012-12-11 12:45:35 -08:00
Linus Torvalds b58ed041a3 Device tree changes for v3.8
Bug fixes, little cleanups, and documentation changes. The most invasive
 thing here touches a bunch of the arch directories to use a common build
 rule for .dtb files. There are no major changes to functionality here
 other than a ew new helper functions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx2/sAAoJEEFnBt12D9kB8aYP/iInsIXDi6c0sGRNgYoFBZUH
 pGZH/PGcZjDD2WPUOos56KTDsWt2kIZm/4ck1mBmXMD5pKM4znIQyMTkhKFiyO0l
 CARwCjyRuaHQJHt88Pcaqhk34HGLgV7ntImMHwpIn0mNbjx7hruk9aU9Jqdcd32j
 2ANbUYU9ikgq0e9s/+xQfB6QZxH1zecQkMalhQWvihtVybpG3xW67IROZv/69uL0
 OtHZZJ7nCwXcEb84rcEHU781tO0CoQhr/pId7DirQEKaD8oNLHsejFEgGEFfFrvZ
 eFmJP/YmZh7Ll+NNTMHnQwyDNa2LFuLazbjaqCqUHQgcw9daFFeP5ga3Uqp7WZbU
 4kIxBQJ3byELnttFVQbsMQag+IAgmgfp8YdJFk3VTJDJKwpMJAO1sQeoEUHnnWAr
 cyyCzROaFt1hUkhRiXso+IYNLvb60o0/2NJRTiObPdljy9OSXW6O3wFEGk9F/6u0
 jgl9tisfLUL0orKnJ5tR3kbHaJKQd+6HgBRJNiuB+9FYO43j1cY/sYggSNCkiCMy
 RvlGYDCJ+lfmZdZ4T+QYLpjsOenFosV0JZGU6MVlZmEGzTtLyhALZQKNHuxLw1Nc
 oTe8Fx5aJrWEYe/HGqm4C43DqEiQUCmZ9NPae0JyfocmjuwARSnepvWe1XZNa/8t
 aNfK6UkTO/IL3zOtOHfd
 =LB3n
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull device tree changes from Grant Likely:
 "Here are the DT changes I've got queued up for v3.8.  As described
  below, there are a lot of bug fixes here and documentation updates but
  nothing major:

  Bug fixes, little cleanups, and documentation changes.  The most
  invasive thing here touches a bunch of the arch directories to use a
  common build rule for .dtb files.  There are no major changes to
  functionality here other than a few new helper functions."

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6: (34 commits)
  arm64: Fix the dtbs target building
  mtd: nand: davinci: fix the binding documentation
  rtc: rtc-mv: Add the device tree binding documentation
  devicetree/bindings: Move gpio-leds binding into leds directory
  of/vendor-prefixes: add Imagination Technologies
  microblaze: use new common dtc rule
  c6x: use new common dtc rule
  openrisc: use new common dtc rule
  arm64: Add dtbs target for building all the enabled dtb files
  arm64: use new common dtc rule
  ARM: dt: change .dtb build rules to build in dts directory
  kbuild: centralize .dts->.dtb rule
  Fix build when CONFIG_W1_MASTER_GPIO=m b exporting "allnodes"
  of/spi: Honour "status=disabled" property of device
  of_mdio: Honour "status=disabled" property of device
  of_i2c: Honour "status=disabled" property of device
  powerpc: Fix fallout from device_node->name constification
  of: add 'const' for of_parse_phandle parameter *np
  Documentation: correct of_platform_populate() argument list
  script: dtc: clean generated files
  ...
2012-12-11 11:30:41 -08:00
Ingo Molnar cc1b39dbf9 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull ftrace updates from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08 15:54:35 +01:00
Ingo Molnar 7e0dd574cd Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08 15:51:10 +01:00
Ingo Molnar 38130ec087 Some more cputime cleanups:
* Get rid of underscores polluting the vtime namespace
 
 * Consolidate context switch and tick handling
 
 * Improve debuggability by detecting irq unsafe callers
 
 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQq52nAAoJEIUkVEdQjox3ZRoP/RuDC59hNGu3rR0ERMM3TqW3
 SIaMSHlHQh3h8P8OASpRqBb9s0BoWD0l3xZ68TEACnRdLS50Rre2P0SSxqpkdbnL
 cj0+I7gmbxKa9c9zpm+mn1TvL2bEhg6hkWCMK9jn2SSBl33cOqKUGUfy8Gx0nryc
 q+cOZrXSgMvYKCixGubCqsTl8MKs9CrpyrLSYtFUiHFVWREPfndS9M9BB5yfKHL1
 t9qmdb5WRq2NpU6apoZBMBdPQcmQr5WswLpbhoTocpvCiEmt5RZGkSDOawPa1DHP
 2SPM7fGZIDrXCMW/g9d2mt43j/HxS9LYu9lToZCbbMqehe2Bf5jYqO1Kwi7FhedR
 NSofoXbW/j589+7I+pN66lo0pctNWxd59YDvLw22SqUFcBEUSmypM6eUwbrbVUg7
 /H0a8T/5bPwx2ukNrCW0+Zsd9X3If4K290j4lNOMLki9ikYG6IXfGw1GMwsiyFSo
 LNSnDs0ekovvWOAg1iRq8DW8j/TWoZuZUSRME2LdCde9SbkMEGgWaNYCwNLMenie
 6jZHar7SfpdRDPP6NCY85jMy5MRbyN3mzSFhMfqMKQgmFNd7ay7oRKppIkwT+qkD
 VozCvdPmCxd+orNMbWINDAhNY5RUlcPj/Em8Mue1U152rpjfNt/WZOfmujmLwNW2
 /RPQtHo+F7w7KhbylFpx
 =cL3t
 -----END PGP SIGNATURE-----

Merge tag 'sched-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core

Pull more cputime cleanups from Frederic Weisbecker:

 * Get rid of underscores polluting the vtime namespace

 * Consolidate context switch and tick handling

 * Improve debuggability by detecting irq unsafe callers

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08 15:44:43 +01:00
Ingo Molnar 222e82bef4 Merge branch 'linus' into sched/core
Pick up the autogroups fix and other fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-07 12:15:33 +01:00
Stefan Roese 667b504a2c powerpc: mpc5200: Add a3m071 board support
This patch adds the MPC5200B based a3m071 board.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2012-12-06 22:59:08 +01:00
Mihai Caraman 352df1deb2 KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to
the list of ONE_REG PPC supported registers.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: remove HV dependency, use get/put_user]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:20 +01:00
Mihai Caraman 38f988240c KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
Add EPCR support in booke mtspr/mfspr emulation. EPCR register is defined only
for 64-bit and HV categories, we will expose it at this point only to 64-bit
virtual processors running on 64-bit HV hosts.
Define a reusable setter function for vcpu's EPCR.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: move HV dependency in the code]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:19 +01:00
Mihai Caraman 95e90b43c9 KVM: PPC: bookehv: Add guest computation mode for irq delivery
When delivering guest IRQs, update MSR computation mode according to guest
interrupt computation mode found in EPCR.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: remove HV dependency in the code]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:18 +01:00
Alexander Graf 62b4db0042 KVM: PPC: Make EPCR a valid field for booke64 and bookehv
In BookE, EPCR is defined and valid when either the HV or the 64bit
category are implemented. Reflect this in the field definition.

Today the only KVM target on 64bit is HV enabled, so there is no
change in actual source code, but this keeps the code closer to the
spec and doesn't build up artificial road blocks for a PR KVM
on 64bit.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:17 +01:00
Mihai Caraman e9666ea1b3 KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
Extend MAS2 EPN mask to retain most significant bits on 64-bit hosts.
Use this mask in tlb effective address accessor.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:15 +01:00
Mihai Caraman 9e2fa64693 KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
Mask high 32 bits of MAS2's effective page number in tlbwe emulation for guests
running in 32-bit mode.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:14 +01:00
Mihai Caraman 8823a8fd0d KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
Mask high 32 bits of effective address in emulation layer for guests running
in 32-bit mode.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: fix indent]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:13 +01:00
Mihai Caraman 7cdd7a95c6 KVM: PPC: e500: Add emulation helper for getting instruction ea
Add emulation helper for getting instruction ea and refactor tlb instruction
emulation to use it.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: keep rt variable around]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:12 +01:00
Mihai Caraman e51f8f32d6 KVM: PPC: bookehv64: Add support for interrupt handling
Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit
implementations using a common stack layout and a common execution flow starting
from kvm_handler_common macro. Update documentation for 64-bit input register
values. This patch only address the bolted TLB miss exception handlers version.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:11 +01:00
Mihai Caraman ff59474684 KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
GET_VCPU define will not be implemented for 64-bit for performance reasons
so get rid of it also on 32-bit.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:10 +01:00
Mihai Caraman b50df19ccc KVM: PPC: booke: Fix get_tb() compile error on 64-bit
Include header file for get_tb() declaration.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:09 +01:00
Mihai Caraman 910040b82d KVM: PPC: e500: Silence bogus GCC warning in tlb code
64-bit GCC 4.5.1 warns about an uninitialized variable which was guarded
by a flag. Initialize the variable to make it happy.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: reword comment]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:08 +01:00
Paul Mackerras b4072df407 KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic.  Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.

To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest.  On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host.  On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca.  The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check.  We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.

If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler.  We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7.  On PPC970, the
two are equivalent because address 0x200 just contains a branch.

Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:07 +01:00
Paul Mackerras 1b400ba0cd KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations
When we change or remove a HPT (hashed page table) entry, we can do
either a global TLB invalidation (tlbie) that works across the whole
machine, or a local invalidation (tlbiel) that only affects this core.
Currently we do local invalidations if the VM has only one vcpu or if
the guest requests it with the H_LOCAL flag, though the guest Linux
kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
possibility that vcpus moving around to different physical cores might
expose stale TLB entries, there is some code in kvmppc_hv_entry to
flush the whole TLB of entries for this VM if either this vcpu is now
running on a different physical core from where it last ran, or if this
physical core last ran a different vcpu.

There are a number of problems on POWER7 with this as it stands:

- The TLB invalidation is done per thread, whereas it only needs to be
  done per core, since the TLB is shared between the threads.
- With the possibility of the host paging out guest pages, the use of
  H_LOCAL by an SMP guest is dangerous since the guest could possibly
  retain and use a stale TLB entry pointing to a page that had been
  removed from the guest.
- The TLB invalidations that we do when a vcpu moves from one physical
  core to another are unnecessary in the case of an SMP guest that isn't
  using H_LOCAL.
- The optimization of using local invalidations rather than global should
  apply to guests with one virtual core, not just one vcpu.

(None of this applies on PPC970, since there we always have to
invalidate the whole TLB when entering and leaving the guest, and we
can't support paging out guest memory.)

To fix these problems and simplify the code, we now maintain a simple
cpumask of which cpus need to flush the TLB on entry to the guest.
(This is indexed by cpu, though we only ever use the bits for thread
0 of each core.)  Whenever we do a local TLB invalidation, we set the
bits for every cpu except the bit for thread 0 of the core that we're
currently running on.  Whenever we enter a guest, we test and clear the
bit for our core, and flush the TLB if it was set.

On initial startup of the VM, and when resetting the HPT, we set all the
bits in the need_tlb_flush cpumask, since any core could potentially have
stale TLB entries from the previous VM to use the same LPID, or the
previous contents of the HPT.

Then, we maintain a count of the number of online virtual cores, and use
that when deciding whether to use a local invalidation rather than the
number of online vcpus.  The code to make that decision is extracted out
into a new function, global_invalidates().  For multi-core guests on
POWER7 (i.e. when we are using mmu notifiers), we now never do local
invalidations regardless of the H_LOCAL flag.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:05 +01:00
Paul Mackerras 3a2e7b0d76 KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3S
The mask of MSR bits that get transferred from the guest MSR to the
shadow MSR included MSR_DE.  In fact that bit only exists on Book 3E
processors, and it is assigned the same bit used for MSR_BE on Book 3S
processors.  Since we already had MSR_BE in the mask, this just removes
MSR_DE.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:03 +01:00
Paul Mackerras 28c483b62f KVM: PPC: Book3S PR: Fix VSX handling
This fixes various issues in how we were handling the VSX registers
that exist on POWER7 machines.  First, we were running off the end
of the current->thread.fpr[] array.  Ultimately this was because the
vcpu->arch.vsr[] array is sized to be able to store both the FP
registers and the extra VSX registers (i.e. 64 entries), but PR KVM
only uses it for the extra VSX registers (i.e. 32 entries).

Secondly, calling load_up_vsx() from C code is a really bad idea,
because it jumps to fast_exception_return at the end, rather than
returning with a blr instruction.  This was causing it to jump off
to a random location with random register contents, since it was using
the largely uninitialized stack frame created by kvmppc_load_up_vsx.

In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx,
since giveup_fpu and load_up_fpu handle the extra VSX registers as well
as the standard FP registers on machines with VSX.  Also, since VSX
instructions can access the VMX registers and the FP registers as well
as the extra VSX registers, we have to load up the FP and VMX registers
before we can turn on the MSR_VSX bit for the guest.  Conversely, if
we save away any of the VSX or FP registers, we have to turn off MSR_VSX
for the guest.

To handle all this, it is more convenient for a single call to
kvmppc_giveup_ext() to handle all the state saving that needs to be done,
so we make it take a set of MSR bits rather than just one, and the switch
statement becomes a series of if statements.  Similarly kvmppc_handle_ext
needs to be able to load up more than one set of registers.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:02 +01:00
Paul Mackerras b0a94d4e23 KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers
This adds basic emulation of the PURR and SPURR registers.  We assume
we are emulating a single-threaded core, so these advance at the same
rate as the timebase.  A Linux kernel running on a POWER7 expects to
be able to access these registers and is not prepared to handle a
program interrupt on accessing them.

This also adds a very minimal emulation of the DSCR (data stream
control register).  Writes are ignored and reads return zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:01 +01:00
Paul Mackerras 1cc8ed0b13 KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages
Currently, if the guest does an H_PROTECT hcall requesting that the
permissions on a HPT entry be changed to allow writing, we make the
requested change even if the page is marked read-only in the host
Linux page tables.  This is a problem since it would for instance
allow a guest to modify a page that KSM has decided can be shared
between multiple guests.

To fix this, if the new permissions for the page allow writing, we need
to look up the memslot for the page, work out the host virtual address,
and look up the Linux page tables to get the PTE for the page.  If that
PTE is read-only, we reduce the HPTE permissions to read-only.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:00 +01:00
Paul Mackerras 05dd85f793 KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPT
This fixes a bug in the code which allows userspace to read out the
contents of the guest's hashed page table (HPT).  On the second and
subsequent passes through the HPT, when we are reporting only those
entries that have changed, we were incorrectly initializing the index
field of the header with the index of the first entry we skipped
rather than the first changed entry.  This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:59 +01:00
Paul Mackerras a64fd70748 KVM: PPC: Book3S HV: Reset reverse-map chains when resetting the HPT
With HV-style KVM, we maintain reverse-mapping lists that enable us to
find all the HPT (hashed page table) entries that reference each guest
physical page, with the heads of the lists in the memslot->arch.rmap
arrays.  When we reset the HPT (i.e. when we reboot the VM), we clear
out all the HPT entries but we were not clearing out the reverse
mapping lists.  The result is that as we create new HPT entries, the
lists get corrupted, which can easily lead to loops, resulting in the
host kernel hanging when it tries to traverse those lists.

This fixes the problem by zeroing out all the reverse mapping lists
when we zero out the HPT.  This incidentally means that we are also
zeroing our record of the referenced and changed bits (not the bits
in the Linux PTEs, used by the Linux MM subsystem, but the bits used
by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and
kvm_test_age_hva()).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:58 +01:00
Paul Mackerras a2932923cc KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT.  There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags.  The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for
live migration.  Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
end of the HPT, it returns from the read.  Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.

The format of the data provides a simple run-length compression of the
invalid entries.  Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries.  The valid entries, 16
bytes each, follow the header.  The invalid entries are not explicitly
represented.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:57 +01:00
Paul Mackerras 6b445ad4f8 KVM: PPC: Book3S HV: Make a HPTE removal function available
This makes a HPTE removal function, kvmppc_do_h_remove(), available
outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:55 +01:00
Paul Mackerras 44e5f6be62 KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs
This uses a bit in our record of the guest view of the HPTE to record
when the HPTE gets modified.  We use a reserved bit for this, and ensure
that this bit is always cleared in HPTE values returned to the guest.

The recording of modified HPTEs is only done if other code indicates
its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
The reason for this is that when later commits add facilities for
userspace to read the HPT, the first pass of reading the HPT will be
quicker if there are no (or very few) HPTEs marked as modified,
rather than having most HPTEs marked as modified.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:54 +01:00
Paul Mackerras 4879f24172 KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state
This fixes a bug where adding a new guest HPT entry via the H_ENTER
hcall would lose the "changed" bit in the reverse map information
for the guest physical page being mapped.  The result was that the
KVM_GET_DIRTY_LOG could return a zero bit for the page even though
the page had been modified by the guest.

This fixes it by only modifying the index and present bits in the
reverse map entry, thus preserving the reference and change bits.
We were also unnecessarily setting the reference bit, and this
fixes that too.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:53 +01:00
Paul Mackerras 7ed661bf85 KVM: PPC: Book3S HV: Restructure HPT entry creation code
This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.

Most of the work of kvmppc_virtmode_h_enter is now done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.

Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory.  Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:52 +01:00
Alexander Graf 0e673fb679 KVM: PPC: Support eventfd
In order to support the generic eventfd infrastructure on PPC, we need
to call into the generic KVM in-kernel device mmio code.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:33:50 +01:00
Timur Tabi 6baf11906e powerpc/512x: don't compile any platform DIU code if the DIU is not enabled
If the DIU framebuffer driver is not enabled, then there's no point in
compiling any platform DIU code, because it will never be used.  Most of
the platform code was protected in the appropriate #ifdef, but not all.
This caused a break in some randconfig builds.

This is only a problem on the 512x platforms.  The P1022DS and MPC8610HPCD
platforms are already correct.

This patch reverts commit 12e36309f8 ("powerpc:
Option FB_FSL_DIU is not really optional for mpc512x") and restores the
ability to configure DIU support.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2012-12-03 22:13:34 +01:00
Srinivas Kandagatla 6c27b20395 powerpc/mpc52xx: use module_platform_driver macro
This patch removes some code duplication by using
module_platform_driver.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
2012-12-03 22:13:33 +01:00
Rafael J. Wysocki 9ee71f513c Merge branch 'pm-cpuidle'
* pm-cpuidle:
  cpuidle: Measure idle state durations with monotonic clock
  cpuidle: fix a suspicious RCU usage in menu governor
  cpuidle: support multiple drivers
  cpuidle: prepare the cpuidle core to handle multiple drivers
  cpuidle: move driver checking within the lock section
  cpuidle: move driver's refcount to cpuidle
  cpuidle: fixup device.h header in cpuidle.h
  cpuidle / sysfs: move structure declaration into the sysfs.c file
  cpuidle: Get typical recent sleep interval
  cpuidle: Set residency to 0 if target Cstate not enter
  cpuidle: Quickly notice prediction failure in general case
  cpuidle: Quickly notice prediction failure for repeat mode
  cpuidle / sysfs: move kobj initialization in the syfs file
  cpuidle / sysfs: change function parameter
2012-11-29 21:46:14 +01:00
David S. Miller 8a2cf062b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 12:51:17 -05:00
Grant Likely 499b42c3e4 powerpc: Fix fallout from device_node->name constification
Commit c22618a1, "drivers/of: Constify device_node->name and
->path_component_name" changes device_node name to a const value, but
the PowerPC scom code still assigns it to a non-void field in
debugfs_blob_wrapper. The /right/ solution might be to change the
debugfs_blob_wrapper->data to also be const, but that is a bit
risky. Instead, cast the value to (void*). It is a bit ugly, but it
is the safest change until it can be investigated where
debugfs_blob_wrapper can be modified.

Reported-by: Michael Neuling <mikey@neuling.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-11-29 17:27:19 +00:00
Al Viro 4f4202fe5a unify default ptrace_signal_deliver
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:23 -05:00
Al Viro afa86fc426 flagday: don't pass regs to copy_thread()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:42 -05:00
Al Viro 0bcfe54049 powerpc: switch to generic fork/clone/vfork
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 22:14:55 -05:00
Al Viro f4091322d7 Merge branches 'no-rebases', 'arch-avr32', 'arch-blackfin', 'arch-cris', 'arch-h8300', 'arch-m32r', 'arch-mn10300', 'arch-score', 'arch-sh' and 'arch-powerpc' into for-next 2012-11-28 21:52:07 -05:00
Bill Pemberton e47034c7a1 powerpc/PCI: Remove CONFIG_HOTPLUG ifdefs
Remove conditional code based on CONFIG_HOTPLUG being false.  It's
always on now in preparation of it going away as an option.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-28 12:50:22 -08:00
Shuah Khan 34daa88efd powerpc: dma_debug: add debug_dma_mapping_error support
Add dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check dma mapping errors on addresses returned by dma_map_single() and
dma_map_page() interfaces.

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2012-11-28 15:28:59 +01:00
Marcelo Tosatti 42897d866b KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization
TSC initialization will soon make use of online_vcpus.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:14 -02:00
Julius Werner a474a51549 cpuidle: Measure idle state durations with monotonic clock
Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.

If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32 bit integer will zero-extend and result in a
forward time jump of roughly four billion milliseconds or 1.3 hours on
the idle state residency counter.

This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.

Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-27 14:17:58 +01:00
Benjamin Herrenschmidt 3991782ea3 Merge remote-tracking branch 'kumar/next' into next
Freescale updates from Kumar
2012-11-26 09:25:25 +11:00
Benjamin Herrenschmidt 2a859ab07b Merge branch 'merge' into next
Merge my own merge branch to get various fixes from there
and upstream, especially the hvc console tty refcouting fixes
which which testing is quite a bit harder...
2012-11-26 09:23:57 +11:00
Gavin Shan e716e01438 powerpc/eeh: Do not invalidate PE properly
While the EEH does recovery on the specific PE that has PCI errors,
the PCI devices belonging to the PE will be removed and the PE will
be marked as invalid since we still need the information stored in
the PE. We only invalidate the PE when it doesn't have associated
EEH devices and valid child PEs. However, the code used to check
that is wrong. The patch fixes that.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-26 09:14:16 +11:00
David S. Miller 24bc518a68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/tx.c

Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25 12:49:17 -05:00
Xuelin Shi 1723d90915 powerpc/dma/raidengine: add raidengine device
The RaidEngine is a new Freescale hardware that used for parity
computation offloading in RAID5/6.

This patch adds the device node in device tree and related binding
documentation.

Signed-off-by: Harninder Rai <harninder.rai@freescale.com>
Signed-off-by: Naveen Burmi <naveenburmi@freescale.com>
Signed-off-by: Xuelin Shi <b29237@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-11-25 07:19:51 -06:00
Varun Sethi 5320b50797 powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct
PAMU bypass enable register added to the ccsr_guts structure.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-11-25 07:19:39 -06:00
York Sun bc15236fbe powerpc/mpc85xx: Change spin table to cached memory
ePAPR v1.1 requires the spin table to be in cached memory. So we need
to change the call argument of ioremap to enable cache and coherence.
We also flush the cache after writing to spin table to keep it compatible
with previous cache-inhibit spin table. Flushing before and after
accessing spin table is recommended by ePAPR.

Signed-off-by: York Sun <yorksun@freescale.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-11-25 07:00:31 -06:00
Jia Hongtao a393d8977a powerpc/fsl-pci: Add PCI controller ATMU PM support
Power supply for PCI controller ATMU registers is off when system go to
deep-sleep state. So ATMU registers should be re-setup during PCI
controllers resume from sleep.

Signed-off-by: Jia Hongtao <B38951@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-11-25 07:00:28 -06:00
Timur Tabi b567d1c74e powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI
Function fsl_pcibios_fixup_bus() is available only if PCI is enabled.  The
MPC8610 HPCD platform file was not protecting the assigned with an #ifdef,
which results in a link failure when PCI is disabled.  Every other platform
already has this #ifdef.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-11-25 07:00:24 -06:00
Tushar Behera e9c36b0b09 powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers
The third argument for of_get_property() is a pointer, hence pass
NULL instead of 0.

Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2012-11-25 07:00:19 -06:00
Alexey Kardashevskiy bb4618823a powerpc/pseries: Fix oops with MSIs when missing EEH PEs
The new EEH code introduced a small regression, if the EEH PEs
are missin (which happens currently in qemu for example), it
will deref a NULL pointer in the MSI code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-23 13:26:05 +11:00
Frederic Weisbecker 1b2852b152 vtime: Warn if irqs aren't disabled on system time accounting APIs
System time accounting APIs such as vtime_account_system() and
vtime_account_idle() need to be irqsafe. Current callers include
irq entry, exit and kvm, all of which have been checked against that
requirement. Now it's better to grow that with an automatic check
in case we have further callers or we missed something.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-11-20 15:42:51 +01:00
Frederic Weisbecker e3942ba040 vtime: Consolidate a bit the ctx switch code
On ia64 and powerpc, vtime context switch only consists
in flushing system and user pending time, plus a few
arch housekeeping.

Consolidate that into a generic implementation. s390 is
a special case because pending user and system time accounting
there is hard to dissociate. So it's keeping its own implementation.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-11-19 16:41:32 +01:00
Frederic Weisbecker bcebdf8465 vtime: Explicitly account pending user time on process tick
All vtime implementations just flush the user time on process
tick. Consolidate that in generic code by calling a user time
accounting helper. This avoids an indirect call in ia64 and
prepare to also consolidate vtime context switch code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-11-19 16:41:21 +01:00
Frederic Weisbecker fd25b4c2f2 vtime: Remove the underscore prefix invasion
Prepending irq-unsafe vtime APIs with underscores was actually
a bad idea as the result is a big mess in the API namespace that
is even waiting to be further extended. Also these helpers
are always called from irq safe callers except kvm. Just
provide a vtime_account_system_irqsafe() for this specific
case so that we can remove the underscore prefix on other
vtime functions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-11-19 16:40:16 +01:00
Eric W. Biederman 17cf22c33e pidns: Use task_active_pid_ns where appropriate
The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a processes life.

Furthermore by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the the pid namespace.

So I have used task_active_pid_ns everywhere I can.

In fork since the pid has not yet been attached to the
process I use ns_of_pid, to achieve the same effect.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:09 -08:00
Adam Buchbinder 48fc7f7e78 Fix misspellings of "whether" in comments.
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:31:35 +01:00
Masanari Iida 02582e9bcc treewide: fix typo of "suport" in various comments and Kconfig
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:16:09 +01:00
Daniel Borkmann 5082dfb716 PPC: net: bpf_jit_comp: add VLAN instructions for BPF JIT
This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:12:47 -05:00
Daniel Borkmann 02871903a1 PPC: net: bpf_jit_comp: add XOR instruction for BPF JIT
This patch is a follow-up for patch "filter: add XOR instruction for use
with X/K" that implements BPF PowerPC JIT parts for the BPF XOR operation.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:12:47 -05:00
Grant Likely c22618a11d drivers/of: Constify device_node->name and ->path_component_name
Neither of these should ever be changed once set. Make them const and
fix up the users that try to modify it in-place. In one case
kmalloc+memcpy is replaced with kstrdup() to avoid modifying the string.

Build tested with defconfigs on ARM, PowerPC, Sparc, MIPS, x86 among
others.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Julian Calaby <julian.calaby@gmail.com>
2012-11-17 12:05:57 +00:00
Ian Munsie cedddd812a powerpc: Disable relocation on exceptions when kexecing
Since we don't know if they new kernel we are kexecing into has been
built to support relocation on exceptions, we disable them before we
kexec.

We do NOT disable them if we are execing a kdump kernel, because we
want to change as little state as possible and it is likely that we are
execing ourselves and will be able to handle them anyway.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:08 +11:00
Ian Munsie fc8effa4e4 powerpc: Enable relocation on during exceptions at boot
We currently do this synchronously at boot from setup_arch. On a large
system this could hypothetically take a little while to complete, so
currently we will give up if we are asked to wait for more than a second
in total.

If we actually start hitting that timeout in practice we can always move
this code into a kernel thread to take care of it in the background.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:08 +11:00
Ian Munsie cca55d9ddf powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
I am going to use this in the next patch, better to have this code in
one place rather than three.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:07 +11:00
Ian Munsie 798042da4e powerpc: Add wrappers to enable/disable relocation on exceptions
These wrappers hide the parameters that have to be passed to H_SET_MODE
to enable/disable relocation on during exceptions.

As noted in the comments, since these have partition wide scope, they
may take some time to complete and must be periodically retried until
H_SUCCESS is returned.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:07 +11:00
Ian Munsie d8f48ecc0e powerpc: Add set_mode hcall
This new hcall in POWER8 is used to set various resource mode registers.
eg. it can set address translation mode on interrupt (note: partition wide
scope)

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:06 +11:00
Michael Neuling b0302722ee powerpc: Setup relocation on exceptions for bare metal systems
This turns on MMU on execptions via AIL field in the LPCR.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:06 +11:00
Michael Neuling f7c32c24f5 powerpc: Move initial mfspr LPCR out of __init_LPCR
We want to change what's initially set in the LPCR, so start by taking the move
from LPCR out of the function and into the caller.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:05 +11:00
Michael Neuling c1fb6816fb powerpc: Add relocation on exception vector handlers
POWER8/v2.07 allows exceptions to be taken with the MMU still on.

A new set of exception vectors is added at 0xc000_0000_0000_4xxx.  When the HW
takes us here, MSR IR/DR will be set already and we no longer need a costly
RFID to turn the MMU back on again.

The original 0x0 based exception vectors remain for when the HW can't leave the
MMU on.  Examples of this are when we can't trust the current MMU mappings,
like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
was off already.  In these cases the HW will take us to the original 0x0 based
exception vectors with the MMU off as before.

This uses the new macros added previously too implement these new execption
vectors at 0xc000_0000_0000_4xxx.  We exit these exception vectors using
mflr/blr (rather than mtspr SSR0/RFID), since we don't need the costly MMU
switch anymore.

This moves the __end_interrupts marker down past these new 0x4000 vectors since
they will need to be copied down to 0x0 when the kernel is not at 0x0.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:05 +11:00
Michael Neuling 4700dfaf1e powerpc: Add new macros needed for relocation on exceptions
POWER8/v2.07 allows exceptions to be taken with the MMU still on.

A new set of exception vectors is added at 0xc000_0000_0000_4xxx.  When the HW
takes us here, MSR IR/DR will be set already and we no longer need a costly
RFID to turn the MMU back on again.

The original 0x0 based exception vectors remain for when the HW can't leave the
MMU on.  Examples of this are when we can't trust the current the MMU mappings,
like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
was off already.  In these cases the HW will take us to the original 0x0 based
exception vectors with the MMU off as before.

The below macros are copies of the macros used at the 0x0 offset but modified
to handle the MMU being on.  In these macros we use the link register to jump
to the secondary handlers rather than using RFID (RFID was also use to turn on
the MMU).

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:04 +11:00
Michael Neuling 742415d6b6 powerpc: Turn syscall handler into macros
This turns the syscall handler into macros as we are going to want to reuse
them again later.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:04 +11:00
Michael Neuling 61e2390ede powerpc: Make load_hander handle upto 64k offset
If we change load_hander() to use an ori instead of addi, we can load handlers
upto 64k away provided we are still 64k aligned.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:03 +11:00
Michael Neuling faab4dd2d2 powerpc: Remove unessessary 0x3000 location enforcement
This removes the large gap between 0x1800 and 0x3000.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:03 +11:00
Michael Neuling 278a6cdc39 powerpc: Whitespace changes in exception64s.S
Remove redundancy spaces and make tab usage consistent.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:02 +11:00
Benjamin Herrenschmidt de1bb03af7 Merge branch 'dt' into next 2012-11-15 15:02:44 +11:00
Anton Blanchard 11ee7e99f3 powerpc: Fix CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n build
If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n,
the kernel fails when we run at a non zero offset. It turns out
we were incorrectly wrapping some of the relocatable kernel code
with CONFIG_CRASH_DUMP.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:02:03 +11:00
Michael Neuling c674e703cb powerpc: Add POWER8 architected mode to cputable
A PVR of 0x0F000004 means we are arch v2.07 complicate ie, POWER8.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:02:00 +11:00
Michael Neuling df77c79920 powerpc/pseries: Update ibm,architecture.vec for PAPR 2.7/POWER8
Update ibm,architecture.vec for POWER8 and allows us to support more
than one parition per core.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:01:39 +11:00
JoonSoo Kim 03737439d8 powerpc: Change free_bootmem() to kfree()
commit ea96025a('Don't use alloc_bootmem() in init_IRQ() path')
changed alloc_bootmem() to kzalloc(),
but missed to change free_bootmem() to kfree().
So correct it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:01:16 +11:00
Aravinda Prasad a53fd61ac2 powerpc/ptrace: Enable hardware breakpoint upon re-registering
On powerpc, ptrace will disable hardware breakpoint request once the
breakpoint is hit. It is the responsibility of the caller to set it
again. However, when the caller sets the hardware breakpoint again
using ptrace(PTRACE_SET_DEBUGREG, child_pid, 0, addr), the hardware
breakpoint is not enabled.

While gdb's approach is to unregister and re-register the hardware
breakpoint every time the breakpoint is hit - which is working fine,
this could affect other programs trying to re-register hardware
breakpoint without unregistering.

This patch enables hardware breakpoint if the caller is re-registering.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:01:13 +11:00
Akinobu Mita 79597be99a powerpc: Use asm-generic/bitops/le.h
The only difference between powerpc and asm-generic le-bitops is
test_bit_le().  Usually all bitops require a long aligned bitmap.
But powerpc test_bit_le() can take an unaligned address.

There is no special callsite of test_bit_le() that needs unaligned
access in powerpc as far as I can see.  So convert to use
asm-generic/bitops/le.h for powerpc.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:01:10 +11:00
Akinobu Mita 2237f4f40a powerpc: Remove BITOP_MASK and BITOP_WORD from asm/bitops.h
Replace BITOP_MASK and BITOP_WORD with BIT_MASK and BIT_WORD defined
in linux/bitops.h and remove BITOP_* which are not used anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:01:07 +11:00
Akinobu Mita c5a0809a24 powerpc/iommu: Use bitmap library
- Caluculate the bitmap size with BITS_TO_LONGS()
 - Use bitmap_empty() to verify that all bits are cleared

This also includes a printk to pr_warn() conversion.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:01:04 +11:00
Yang Li 8a56e1ee92 powerpc: Fix typos in Freescale copyright claims
There are many cases that Semiconductor is misspelled.  The patch
fix these typos.

Signed-off-by: Li Yang <leoli@freescale.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:58 +11:00
Anton Blanchard 5e0f9ea784 powerpc: Remove stale function prototypes from setup.h
I noticed a couple of function prototypes for functions that
no longer exist. Remove them.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:54 +11:00
Anton Blanchard 560285cd2c powerpc: Move most of setup.h out of uapi
Most of setup.h should not be exported to userspace, so move it
back. All we are left with is the asm-generic include to pick
up the COMMAND_LINE_SIZE define.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:51 +11:00
Michael Neuling 51cf2b30a5 powerpc: Fix denorm symbol name
Fix global symbol name to match actual denorm_exception_hv label.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:48 +11:00
Michael Neuling 71e1849724 powerpc: POWER8 cputable entry
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:45 +11:00
Michael Neuling aec937b1ee powerpc: Add POWER8 setup code
Just a copy of POWER7 for now.  Will update with new code later.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:42 +11:00
Michael Neuling cd5daaf713 powerpc: make POWER7 setup code name generic
We are going to reuse this in POWER8 so make the name generic.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:39 +11:00
Michael Ellerman da11195779 powerpc/perf: Add missing L2 constraint handling in Power7 PMU
If we have two cache events that require different settings of the L2SEL
bits in MMCR1 then we can not schedule those events simultaneously. Add
logic to the constraint handling to express that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:36 +11:00
Andreas Schwab bb29b71937 powerpc/powermac/cpufreq_32: Set non-infinite transition time for 7447A driver
The transition time for the 7447A is around 8ms which makes it possible
to use the ondemand governor.  This has been tested on the iBook G4
(PowerBook6,7).

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:33 +11:00
Michael Neuling ec1b33dcd2 powerpc/ptrace: Remove unused addr parameter in ppc_del_hwdebug()
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:29 +11:00
Michael Neuling 84295dfc59 powerpc/ptrace: Fix spelling mistake
s/intruction/instruction/

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:26 +11:00
K.Prasad 6c7a2856ad powerpc/hw-breakpoint: Use generic hw-breakpoint interfaces for new PPC ptrace flags
PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and PPC_PTRACE_DELHWDEBUG are
PowerPC specific ptrace flags that use the watchpoint register. While they are
targeted primarily towards BookE users, user-space applications such as GDB
have started using them for BookS too. This patch enables the use of generic
hardware breakpoint interfaces for these new flags.

Apart from the usual benefits of using generic hw-breakpoint interfaces, these
changes allow debuggers (such as GDB) to use a common set of ptrace flags for
their watchpoint needs and allow more precise breakpoint specification (length
of the variable can be specified).

Mikey added: rebased and added dbginfo.features around #ifdef
             CONFIG_HAVE_HW_BREAKPOINT

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:23 +11:00
Michael Ellerman 16b86bf252 powerpc: Remove no longer used ppc_md.idle_loop()
The last user of ppc_md.idle_loop() was removed when we dropped the
legacy iSeries code, in commit 8ee3e0d.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:20 +11:00
Li Zhong 12660b1702 powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning !
This patch tries to fix the following BUG report:

[    0.012313] BUG: MAX_STACK_TRACE_ENTRIES too low!
[    0.012318] turning off the locking correctness validator.
[    0.012321] Call Trace:
[    0.012330] [c00000017666f6d0] [c000000000012128] .show_stack+0x78/0x184 (unreliable)
[    0.012339] [c00000017666f780] [c0000000000b6348] .save_trace+0x12c/0x14c
[    0.012345] [c00000017666f800] [c0000000000b7448] .mark_lock+0x2bc/0x710
[    0.012351] [c00000017666f8b0] [c0000000000bb198] .__lock_acquire+0x748/0xaec
[    0.012357] [c00000017666f9b0] [c0000000000bb684] .lock_acquire+0x148/0x194
[    0.012365] [c00000017666fa80] [c00000000069371c] .mutex_lock_nested+0x84/0x4ec
[    0.012372] [c00000017666fb90] [c000000000096998] .smpboot_register_percpu_thread+0x3c/0x10c
[    0.012380] [c00000017666fc30] [c0000000009ba910] .spawn_ksoftirqd+0x28/0x48
[    0.012386] [c00000017666fcb0] [c00000000000a98c] .do_one_initcall+0xd8/0x1d0
[    0.012392] [c00000017666fd60] [c00000000000b1f8] .kernel_init+0x120/0x398

[    0.012398] [c00000017666fe30] [c000000000009ad4] .ret_from_kernel_thread+0x5c/0x64
[    0.012404] [c00000017666fa00] [c00000017666fb20] 0xc00000017666fb20
[    0.012410] [c00000017666fa80] [c00000000069371c] .mutex_lock_nested+0x84/0x4ec
[    0.012416] [c00000017666fb90] [c000000000096998] .smpboot_register_percpu_thread+0x3c/0x10c
[    0.012422] [c00000017666fc30] [c0000000009ba910] .spawn_ksoftirqd+0x28/0x48
[    0.012427] [c00000017666fcb0] [c00000000000a98c] .do_one_initcall+0xd8/0x1d0
[    0.012433] [c00000017666fd60] [c00000000000b1f8] .kernel_init+0x120/0x398

[    0.012439] [c00000017666fe30] [c000000000009ad4] .ret_from_kernel_thread+0x5c/0x64
.......

The reason is that the back chain of c00000017666fe30
(ret_from_kernel_thread) contains some invalid value, which might form a
loop.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:17 +11:00
Benjamin Herrenschmidt ab7f961a58 powerpc/powernv: Fix OPAL debug entry
OPAL provides the firmware base/entry in registers at boot time
for debugging purposes. We had a bug in the code trying to stash
these into the appropriate kernel globals (a line of code was
probably dropped by accident back when this was merged)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:14 +11:00
Julia Lawall bc26957c6c powerpc/rtas_flash: Eliminate possible double free
The function initialize_flash_pde_data is only called four times.  All four
calls are in the function rtas_flash_init, and on the failure of any of the
calls, remove_flash_pde is called on the third argument of each of the
calls.  There is thus no need for initialize_flash_pde_data to call
remove_flash_pde on the same argument.  remove_flash_pde kfrees the data
field of its argument, and does not clear that field, so this amounts ot a
possible double free.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier f,free,a;
parameter list[n] ps;
type T;
expression e;
@@

f(ps,T a,...) {
  ... when any
      when != a = e
  if(...) { ... free(a); ... return ...; }
  ... when any
}

@@
identifier r.f,r.free;
expression x,a;
expression list[r.n] xs;
@@

* x = f(xs,a,...);
  if (...) { ... free(a); ... return ...; }
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:11 +11:00
Gavin Shan 490e078d6a powerpc/pnv: Avoid bogus output
There're couples of functions defined to print debugging messages
during initializing P7IOC. However, we got bogus output from those
functions like pe_info(). The problem here is that the message
level (the first parameter to printk()) isn't printable and that
caused the bogus output.

The patch fixes the issue by merging __pe_printk() to the macro
define_pe_printk_level() so that we can pass the message level
directly to printk().

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:08 +11:00
Srinivas Kandagatla 40c935ae3d powerpc/sysdev: Use module_platform_driver macro
This patch removes some code duplication by using
module_platform_driver.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:05 +11:00
Michael Ellerman b2bb65f680 powerpc/xmon: Fallback to printk() in xmon_printf() if udbg is not setup
It is possible to configure a kernel which has xmon enabled, but has no
udbg backend to provide IO. This can make xmon rather confusing, as it
produces no output, blocks for two seconds, and then returns.

As a last resort we can instead try to printk(), which may deadlock or
otherwise crash, but tries quite hard not to.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:02 +11:00
Michael Ellerman 0104cd6839 powerpc/xmon: Fiddle xmon_depth_to_print logic in xmon_show_stack()
Currently xmon_depth_to_print is static and global, but it's only
ever used in xmon_show_stack().

At least with a modern compiler it's inlined, so there's no point
in it being static, we could #define it but it's only used in one
place.

By reworking the logic we can drop count and just decrement the
max value as a loop counter. Also switch to a while loop so we
actually print no more than 64 frames as you'd expect based on the
variable name.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:59 +11:00
Michael Ellerman c4de38093e powerpc/xmon: Use STACK_FRAME_OVERHEAD in xmon_show_stack()
We use STACK_FRAME_OVERHEAD in the exception vectors to establish
the exception frame, so it should be good enough to use here.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:55 +11:00
Michael Ellerman c5c5714d50 powerpc/xmon: Remove unused #defines
Neither REGS_PER_LINE or LAST_VOLATILE are used, nor have they ever
been as far back as I can see.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:52 +11:00
Michael Ellerman b3dc19cddc powerpc/xmon: Remove renaming #defines of scanhex() and skipbl()
We have two #defines that rename scanhex() and skipbl() to
xmon_scanhex() and xmon_skipbl() - but no one ever uses those
names.

So the only effect is to rename the actual symbols in the generated
code, and AFACIS there is no reason to do that, so drop them.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:49 +11:00
Michael Ellerman 33b5cd6866 powerpc/xmon: Merge start.c into nonstdio.c
The routines in start.c are only ever called from nonstdio.c, so if we
move them in there they can become static which is nice.

I suspect the idea behind the separation was that start.c could be
replaced in order to build xmon in userland. If anyone still cares about
doing that we could handle that with an ifdef or two.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:46 +11:00
Michael Ellerman 88c6d62641 powerpc/xmon: Make xmon_getchar() static
xmon_getchar() is only called from within nonstdio.c, so make it static.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:43 +11:00
Michael Ellerman 08702c73a6 powerpc/xmon: Remove empty xmon_map_scc()
This has been empty since 2005, commit 51d3082 "Unify udbg (#2)".

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:40 +11:00
Michael Ellerman eb1c2abb61 powerpc/xmon: Remove unused xmon_expect() & xmon_read_poll()
It looks like xmon_expect() was used for doing xmon over a modem (!?),
that code was dropped in 2005 in commit 51d3082 "Unify udbg (#2)".

Once xmon_expect() is gone xmon_read_poll() is unused, drop it too.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:37 +11:00
Michael Ellerman 6432200aa8 powerpc/udbg: Remove unused udbg_read()
The last user of udbg_read() was removed in 2005, in commit fca5dcd
"Simplify and clean up the xmon terminal I/O".

Given we haven't needed it for 7 years we can probably drop it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:33 +11:00
Tony Breeds 0dc3289c79 powerpc: Add asm/debug.h to get powerpc_debugfs_root
Since the "Disintegrate asm/system.h for PowerPC"
(ae3a197e3d) This has been failing when
DEBUG is #defined.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:27 +11:00
Tony Breeds 1afc149def powerpc/47x: Use the new ppc-opcode infrastructure
Don't use 47x only #defines for TLBIVAX or ICBT, supply and use helpers
in ppc-opcode.h

This fixes a compile breakage.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:24 +11:00
Matthew McClintock 8662d0bcab powerpc: dtc is required to build dtb files
Fixes this following:

$ make distclean; make corenet32_smp_defconfig; make p4080ds.dtb
  CLEAN   arch/powerpc/boot
  CLEAN   scripts/basic
  CLEAN   scripts/dtc
  CLEAN   scripts/genksyms
  CLEAN   scripts/kconfig
  CLEAN   scripts/mod
  CLEAN   scripts
  CLEAN   include/config include/generated
  CLEAN   .config
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/zconf.lex.c
  SHIPPED scripts/kconfig/zconf.hash.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf --silentoldconfig Kconfig
  DTC     arch/powerpc/boot/p4080ds.dtb
/bin/sh: /local/home/mattsm/git/linux/scripts/dtc/dtc: No such file or directory
make[1]: *** [arch/powerpc/boot/p4080ds.dtb] Error 1
make: *** [p4080ds.dtb] Error 2

Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:21 +11:00
Nishanth Aravamudan be812195d8 powerpc/pseries: Double NR_CPUS in defconfig
Anticipating growth in coming years, we should ensure we are getting a
good lead on testing.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:59:17 +11:00
Nathan Fontenot f459d63e16 powerpc+of: Remove the pSeries_reconfig.h file
Remove the pSeries_reconfig.h header file. At this point there is only one
definition in the file, pSeries_coalesce_init(), which can be
moved to rtas.h.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:56:55 +11:00
Nathan Fontenot 79d1c71295 powerpc+of: Rename the drivers/of prom_* functions to of_*
Rename the prom_*_property routines of the generic OF code to of_*_property.
This brings them in line with the naming used by the rest of the OF code.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:56:52 +11:00
Nathan Fontenot 1cf3d8b3d2 powerpc+of: Add of node/property notification chain for adds and removes
This patch moves the notification chain for updates to the device tree
from the powerpc/pseries code to the base OF code. This makes this
functionality available to all architectures.

Additionally the notification chain is updated to allow notifications
for property add/remove/update. To make this work a pointer to a new
struct (of_prop_reconfig) is passed to the routines in the notification chain.
The of_prop_reconfig property contains a pointer to the node containing the
property and a pointer to the property itself. In the case of property
updates, the property pointer refers to the new property.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:56:41 +11:00
Nathan Fontenot f594972083 powerpc+of: Move of_drconf_cell struct definition to asm/prom.h
This patch moves the definition of the of_drconf_cell struct to asm/prom.h
to make it available for all powerpc/pseries code.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 09:43:55 +11:00
Nathan Fontenot e81b3295bc powerpc+of: Add /proc device tree updating to of node add/remove
When adding or removing a device tree node we should also update
the device tree in /proc/device-tree. This action is already done in the
generic OF code for adding/removing properties of a node. This patch adds
this functionality for nodes.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 09:43:54 +11:00
David Sharp 8cbd9cc625 tracing,x86: Add a TSC trace_clock
In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-13 15:48:27 -05:00
David S. Miller d4185bbf62 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net.  Based upon a conflict resolution
patch posted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-10 18:32:51 -05:00
Oleg Nesterov 65b2c8f0e5 uprobes/powerpc: Do not use arch_uprobe_*_step() helpers
No functional changes.

powerpc is the only user of arch_uprobe_enable/disable_step() helpers,
but they should die. They can not be used correctly, every arch needs
its own implementation (like x86 does). And they do not really help
even as initial-and-almost-working code, arch_uprobe_*_xol() hooks can
easily use user_enable/disable_single_step() directly.

Change arch_uprobe_*_step() to do nothing, and convert powerpc to use
ptrace helpers. This is equally wrong, powerpc needs the arch-specific
fixes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-11-03 17:15:12 +01:00
Oleg Nesterov f57d56dd29 uprobes/powerpc: Don't clear TIF_UPROBE in do_notify_resume()
Cleanup. No need to clear TIF_UPROBE, uprobe_notify_resume() does this.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-11-03 17:15:10 +01:00
Pavel Emelyanov a8fc927780 sk-filter: Add ability to get socket filter program (v2)
The SO_ATTACH_FILTER option is set only. I propose to add the get
ability by using SO_ATTACH_FILTER in getsockopt. To be less
irritating to eyes the SO_GET_FILTER alias to it is declared. This
ability is required by checkpoint-restore project to be able to
save full state of a socket.

There are two issues with getting filter back.

First, kernel modifies the sock_filter->code on filter load, thus in
order to return the filter element back to user we have to decode it
into user-visible constants. Fortunately the modification in question
is interconvertible.

Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
speed up the run-time division by doing kernel_k = reciprocal(user_k).
Bad news is that different user_k may result in same kernel_k, so we
can't get the original user_k back. Good news is that we don't have
to do it. What we need to is calculate a user2_k so, that

  reciprocal(user2_k) == reciprocal(user_k) == kernel_k

i.e. if it's re-loaded back the compiled again value will be exactly
the same as it was. That said, the user2_k can be calculated like this

  user2_k = reciprocal(kernel_k)

with an exception, that if kernel_k == 0, then user2_k == 1.

The optlen argument is treated like this -- when zero, kernel returns
the amount of sock_fprog elements in filter, otherwise it should be
large enough for the sock_fprog array.

changes since v1:
* Declared SO_GET_FILTER in all arch headers
* Added decode of vlan-tag codes

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01 11:17:15 -04:00
Alexander Graf 63a1909190 PPC: ePAPR: Convert hcall header to uapi (round 2)
The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-31 13:45:32 +01:00
Alexander Graf 0588000eac Merge commit 'origin/queue' into for-queue
Conflicts:
	arch/powerpc/include/asm/Kbuild
	arch/powerpc/include/uapi/asm/Kbuild
2012-10-31 13:36:18 +01:00
Paul Mackerras 8b5869ad85 KVM: PPC: Book3S HV: Fix thinko in try_lock_hpte()
This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the "bits" variable.  Nevertheless it should still be fixed.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:59 +01:00
Paul Mackerras 9f8c8c7812 KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0
Commit 55b665b026 ("KVM: PPC: Book3S HV: Provide a way for userspace
to get/set per-vCPU areas") includes a check on the length of the
dispatch trace log (DTL) to make sure the buffer is at least one entry
long.  This is appropriate when registering a buffer, but the
interface also allows for any existing buffer to be unregistered by
specifying a zero address.  In this case the length check is not
appropriate.  This makes the check conditional on the address being
non-zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:58 +01:00
Paul Mackerras c7b676709c KVM: PPC: Book3S HV: Fix accounting of stolen time
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.

In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis.  Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.

To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state.  Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock.  This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.

Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped.  Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:57 +01:00
Paul Mackerras 8455d79e21 KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.

This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable.  With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:56 +01:00
Paul Mackerras 2f12f03436 KVM: PPC: Book3S HV: Fixes for late-joining threads
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest.  Currently this only happens
rarely, when a vcpu is first started.  This fixes some bugs and
omissions in the code in this case.

First, we need to check for VPA updates for the latecomer and make
a DTL entry for it.  Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core.  To handle this correctly
we introduce a new vcore state, VCORE_STARTING.  Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap.  A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-30 10:54:55 +01:00