Commit Graph

336619 Commits

Author SHA1 Message Date
Naoya Horiguchi ff604cf6d4 mm: hwpoison: fix action_result() to print out dirty/clean
action_result() fails to print out "dirty" even if an error occurred on
a dirty pagecache, because when we check PageDirty in action_result() it
was cleared after page isolation even if it's dirty before error
handling.  This can break some applications that monitor this message,
so should be fixed.

There are several callers of action_result() except page_action(), but
either of them are not for LRU pages but for free pages or kernel pages,
so we don't have to consider dirty or not for them.

Note that PG_dirty can be set outside page locks as described in commit
6746aff74d ("HWPOISON: shmem: call set_page_dirty() with locked
page"), so this patch does not completely closes the race window, but
just narrows it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Matthieu CASTET 5de55b265a dmapool: make DMAPOOL_DEBUG detect corruption of free marker
This can help to catch the case where hardware is writing after dma free.

[akpm@linux-foundation.org: tidy code, fix comment, use sizeof(page->offset), use pr_err()]
Signed-off-by: Matthieu Castet <matthieu.castet@parrot.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
David Rientjes 9ff4868e30 mm, oom: allow exiting threads to have access to memory reserves
Exiting threads, those with PF_EXITING set, can pagefault and require
memory before they can make forward progress.  This happens, for instance,
when a process must fault task->robust_list, a userspace structure, before
detaching its memory.

These threads also aren't guaranteed to get access to memory reserves
unless oom killed or killed from userspace.  The oom killer won't grant
memory reserves if other threads are also exiting other than current and
stalling at the same point.  This prevents needlessly killing processes
when others are already exiting.

Instead of special casing all the possible situations between PF_EXITING
getting set and a thread detaching its mm where it may allocate memory,
which probably wouldn't get updated when a change is made to the exit
path, the solution is to give all exiting threads access to memory
reserves if they call the oom killer.  This allows them to quickly
allocate, detach its mm, and free the memory it represents.

Summary of Luigi's bug report:

: He had an oom condition where threads were faulting on task->robust_list
: and repeatedly called the oom killer but it would defer killing a thread
: because it saw other PF_EXITING threads.  This can happen anytime we need
: to allocate memory after setting PF_EXITING and before detaching our mm;
: if there are other threads in the same state then the oom killer won't do
: anything unless one of them happens to be killed from userspace.
:
: So instead of only deferring for PF_EXITING and !task->robust_list, it's
: better to just give them access to memory reserves to prevent a potential
: livelock so that any other faults that may be introduced in the future in
: the exit path don't cause the same problem (and hopefully we don't allow
: too many of those!).

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Tested-by: Luigi Semenzato <semenzato@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Jeff Liu 348b465530 Documentation/cgroups/memory.txt: s/mem_cgroup_charge/mem_cgroup_change_common/
mem_cgroup_charge_common() is invoked as the entry point for cgroup limits
charge rather than mem_cgroup_charge(), as the later has been removed for
years.  Update the cgroup/memory.txt to reflect this change.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Will Deacon a1dd450bcb mm: thp: set the accessed flag for old pages on access fault
On x86 memory accesses to pages without the ACCESSED flag set result in
the ACCESSED flag being set automatically.  With the ARM architecture a
page access fault is raised instead (and it will continue to be raised
until the ACCESSED flag is set for the appropriate PTE/PMD).

For normal memory pages, handle_pte_fault will call pte_mkyoung
(effectively setting the ACCESSED flag).  For transparent huge pages,
pmd_mkyoung will only be called for a write fault.

This patch ensures that faults on transparent hugepages which do not
result in a CoW update the access flags for the faulting pmd.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Joonsoo Kim eb2db439a3 mm, highmem: get virtual address of the page using PKMAP_ADDR()
In flush_all_zero_pkmaps(), we have an index of the pkmap associated with
the page.  Using this index, we can simply get virtual address of the
page.  So change it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Joonsoo Kim a354e2c84e mm, highmem: remove page_address_pool list
We can find free page_address_map instance without the page_address_pool.
So remove it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Joonsoo Kim cc33a303f1 mm, highmem: remove useless pool_lock
The pool_lock protects the page_address_pool from concurrent access.  But,
access to the page_address_pool is already protected by kmap_lock.  So
remove it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kin <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Joonsoo Kim 4de22c0584 mm, highmem: use PKMAP_NR() to calculate an index of pkmap
To calculate an index of pkmap, using PKMAP_NR() is more understandable
and maintainable, so change it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Cesar Eduardo Barros 6555bc0357 mm: do not call frontswap_init() during swapoff
The call to frontswap_init() was added within enable_swap_info(), which
was called not only during sys_swapon, but also to reinsert the swap_info
into the swap_list in case of failure of try_to_unuse() within
sys_swapoff.  This means that frontswap_init() might be called more than
once for the same swap area.

While as far as I could see no frontswap implementation has any problem
with it (and in fact, all the ones I found ignore the parameter passed to
frontswap_init), this could change in the future.

To prevent future problems, move the call to frontswap_init() to outside
the code shared between sys_swapon and sys_swapoff.

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Cesar Eduardo Barros cf0cac0a09 mm: refactor reinsert of swap_info in sys_swapoff()
The block within sys_swapoff() which re-inserts the swap_info into the
swap_list in case of failure of try_to_unuse() reads a few values outside
the swap_lock.  While this is safe at that point, it is subtle code.

Simplify the code by moving the reading of these values to a separate
function, refactoring it a bit so they are read from within the swap_lock.
 This is easier to understand, and matches better the way it worked before
I unified the insertion of the swap_info from both sys_swapon and
sys_swapoff.

This change should make no functional difference.  The only real change is
moving the read of two or three structure fields to within the lock
(frontswap_map_get() is nothing more than a read of p->frontswap_map).

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Rik van Riel e986850598 mm,vmscan: only evict file pages when we have plenty
If we have more inactive file pages than active file pages, we skip
scanning the active file pages altogether, with the idea that we do not
want to evict the working set when there is plenty of streaming IO in the
cache.

However, the code forgot to also skip scanning anonymous pages in that
situation.  That leads to the curious situation of keeping the active file
pages protected from being paged out when there are lots of inactive file
pages, while still scanning and evicting anonymous pages.

This patch fixes that situation, by only evicting file pages when we have
plenty of them and most are inactive.

[akpm@linux-foundation.org: adjust comment layout]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Jan Kara e749eb9553 mm: add comment on storage key dirty bit semantics
Add comments that dirty bit in storage key gets set whenever page content
is changed.  Hopefully if someone will use this function, he'll have a
look at one of the two places where we comment on this.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Tang Chen 712cd386fd mm/memory_hotplug.c: update start_pfn in zone and pg_data when spanned_pages == 0.
If we hot-remove memory only and leave the cpus alive, the corresponding
node will not be removed.  But the node_start_pfn and node_spanned_pages
in pg_data will be reset to 0.  In this case, when we hot-add the memory
back next time, the node_start_pfn will always be 0 because no pfn is less
than 0.  After that, if we hot-remove the memory again, it will cause
kernel panic in function find_biggest_section_pfn() when it tries to scan
all the pfns.

The zone will also have the same problem.

This patch sets start_pfn to the start_pfn of the section being added when
spanned_pages of the zone or pg_data is 0.

  ---How to reproduce---

1. hot-add a container with some memory and cpus;
2. hot-remove the container's memory, and leave cpus there;
3. hot-add these memory again;
4. hot-remove them again;

then, the kernel will panic.

  ---Call trace---

  BUG: unable to handle kernel paging request at 00000fff82a8cc38
  IP: [<ffffffff811c0d55>] find_biggest_section_pfn+0xe5/0x180
  ......
  Call Trace:
   [<ffffffff811c1124>] __remove_zone+0x184/0x1b0
   [<ffffffff811c11dc>] __remove_section+0x8c/0xb0
   [<ffffffff811c12e7>] __remove_pages+0xe7/0x120
   [<ffffffff81654f7c>] arch_remove_memory+0x2c/0x80
   [<ffffffff81655bb6>] remove_memory+0x56/0x90
   [<ffffffff813da0c8>] acpi_memory_device_remove_memory+0x48/0x73
   [<ffffffff813da55a>] acpi_memory_device_notify+0x153/0x274
   [<ffffffff813b6786>] acpi_ev_notify_dispatch+0x41/0x5f
   [<ffffffff813a3867>] acpi_os_execute_deferred+0x27/0x34
   [<ffffffff81090589>] process_one_work+0x219/0x680
   [<ffffffff810923be>] worker_thread+0x12e/0x320
   [<ffffffff81098396>] kthread+0xc6/0xd0
   [<ffffffff8167c7c4>] kernel_thread_helper+0x4/0x10
  ......
  ---[ end trace 96d845dbf33fee11 ]---

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Lai Jiangshan b9d5ab2562 slub, hotplug: ignore unrelated node's hot-adding and hot-removing
SLUB only focuses on the nodes which have normal memory and it ignores the
other node's hot-adding and hot-removing.

Aka: if some memory of a node which has no onlined memory is online, but
this new memory onlined is not normal memory (for example, highmem), we
should not allocate kmem_cache_node for SLUB.

And if the last normal memory is offlined, but the node still has memory,
we should remove kmem_cache_node for that node.  (The current code delays
it when all of the memory is offlined)

So we only do something when marg->status_change_nid_normal > 0.
marg->status_change_nid is not suitable here.

The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
for every node even the node don't have normal memory, SLAB tolerates
kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Rob Landley <rob@landley.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Lai Jiangshan d9713679db memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]
Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY], it
forgets to manage node_states[N_NORMAL_MEMORY].  This may cause
node_states[N_NORMAL_MEMORY] to become incorrect.

Example, if a node is empty before online, and we online a memory which is
in ZONE_NORMAL.  And after online, node_states[N_HIGH_MEMORY] is correct,
but node_states[N_NORMAL_MEMORY] is incorrect, the online code doesn't set
the new online node to node_states[N_NORMAL_MEMORY].

The same thing will happen when offlining (the offline code doesn't clear
the node from node_states[N_NORMAL_MEMORY] when needed).  Some memory
managment code depends node_states[N_NORMAL_MEMORY], so we have to fix up
the node_states[N_NORMAL_MEMORY].

We add node_states_check_changes_online() and
node_states_check_changes_offline() to detect whether
node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] are changed
while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus the
memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY] are
changed.  (We can add a @flags and reuse @status_change_nid instead of
introducing @status_change_nid_normal, but it will add much more
complexity in memory hotplug callback in every subsystem.  So introducing
@status_change_nid_normal is better and it doesn't change the sematics of
@status_change_nid)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Rob Landley <rob@landley.net>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Wen Congyang 6dcd73d701 memory-hotplug: allocate zone's pcp before onlining pages
We use __free_page() to put a page to buddy system when onlining pages.
__free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we
should allocate zone's pcp before onlining pages, otherwise we will lose
some free pages.

[mhocko@suse.cz: make zone_pcp_reset independent of MEMORY_HOTREMOVE]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Wen Congyang 3ac19f8efe memory-hotplug, mm/sparse.c: clear the memory to store struct page
If sparse memory vmemmap is enabled, we can't free the memory to store
struct page when a memory device is hotremoved, because we may store
struct page in the memory to manage the memory which doesn't belong to
this memory device.  When we hotadded this memory device again, we will
reuse this memory to store struct page, and struct page may contain some
obsolete information, and we will get bad-page state:

  init_memory_mapping: [mem 0x80000000-0x9fffffff]
  Built 2 zonelists in Node order, mobility grouping on.  Total pages: 547617
  Policy zone: Normal
  BUG: Bad page state in process bash  pfn:9b6dc
  page:ffffea0002200020 count:0 mapcount:0 mapping:          (null) index:0xfdfdfdfdfdfdfdfd
  page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
  Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
  Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
  Call Trace:
   [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
   [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
   [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
   [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
   [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
   [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
   [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
   [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
   [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
   [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
   [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
   [<ffffffff81137977>] ? sys_write+0x47/0x90
   [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
   [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
  Disabling lock debugging due to kernel taint

This patch clears the memory to store struct page to avoid unexpected error.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reported-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Yasuaki Ishimatsu 8c7b5b4ed9 memory-hotplug: suppress "Device nodeX does not have a release() function" warning
When calling unregister_node(), the function shows following message at
device_release().

"Device 'node2' does not have a release() function, it is broken and must
be fixed."

The reason is node's device struct does not have a release() function.

So the patch registers node_device_release() to the device's release()
function for suppressing the warning message.  Additionally, the patch
adds memset() to initialize a node struct into register_node().  Because
the node struct is part of node_devices[] array and it cannot be freed by
node_device_release().  So if system reuses the node struct, it has a
garbage.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Wen Congyang 8732794b16 numa: convert static memory to dynamically allocated memory for per node device
We use a static array to store struct node.  In many cases, we don't have
too many nodes, and some memory will be unused.  Convert it to per-device
dynamically allocated memory.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Wen Congyang 97d0da2204 memory-hotplug: fix NR_FREE_PAGES mismatch
NR_FREE_PAGES will be wrong after offlining pages.  We add/dec
NR_FREE_PAGES like this now:

1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES

2. don't add NR_FREE_PAGES when it is freed and the migratetype is
   MIGRATE_ISOLATE

3. dec NR_FREE_PAGES when offlining isolated pages.

4. add NR_FREE_PAGES when undoing isolate pages.

When we come to step 3, all pages are in MIGRATE_ISOLATE list, and
NR_FREE_PAGES are right.  When we come to step4, all pages are not in
buddy system, so we don't change NR_FREE_PAGES in this step, but we change
NR_FREE_PAGES in step3.  So NR_FREE_PAGES is wrong after offlining pages.
So there is no need to change NR_FREE_PAGES in step3.

This patch also fixs a problem in step2: if the migratetype is
MIGRATE_ISOLATE, we should not add NR_FRR_PAGES when we remove pages from
pcppages.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jianguo Wu <wujianguo106@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Wen Congyang 7c72eb3272 memory-hotplug: auto offline page_cgroup when onlining memory block failed
When a memory block is onlined, we will try allocate memory on that node
to store page_cgroup.  If onlining the memory block failed, we don't
offline the page cgroup, and we have no chance to offline this page cgroup
unless the memory block is onlined successfully again.  It will cause that
we can't hot-remove the memory device on that node, because some memory is
used to store page cgroup.  If onlining the memory block is failed, there
is no need to stort page cgroup for this memory.  So auto offline
page_cgroup when onlining memory block failed.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:23 -08:00
Wen Congyang 95a4774d05 memory-hotplug: update mce_bad_pages when removing the memory
When we hotremove a memory device, we will free the memory to store struct
page.  If the page is hwpoisoned page, we should decrease mce_bad_pages.

[akpm@linux-foundation.org: cleanup ifdefs]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Wen Congyang b023f46813 memory-hotplug: skip HWPoisoned page when offlining pages
hwpoisoned may be set when we offline a page by the sysfs interface
/sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page. If we don't clear
this flag when onlining pages, this page can't be freed, and will
not in free list. So we can't offline these pages again. So we
should skip such page when offlining pages.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Yasuaki Ishimatsu fa7194eb99 memory hotplug: suppress "Device memoryX does not have a release() function" warning
When calling remove_memory_block(), the function shows following message
at device_release().

"Device 'memory528' does not have a release() function, it is broken and
must be fixed."

The reason is memory_block's device struct does not have a release()
function.

So the patch registers memory_block_release() to the device's release()
function for suppressing the warning message.  Additionally, the patch
moves kfree(mem) into the release function since the release function is
prepared as a means to free a memory_block struct.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Bob Liu b3092b3b73 thp: cleanup: introduce mk_huge_pmd()
Introduce mk_huge_pmd() to simplify the code

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Bob Liu fa475e517a thp: introduce hugepage_vma_check()
Multiple places do the same check.

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Bob Liu 6219049ae1 mm: introduce mm_find_pmd()
Several place need to find the pmd by(mm_struct, address), so introduce a
function to simplify it.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Bob Liu 344aa35c27 thp: clean up __collapse_huge_page_isolate
There are duplicated places using release_pte_pages().
And release_all_pte_pages() can be removed.

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Kirill A. Shutemov d84da3f9e4 mm: use IS_ENABLED(CONFIG_COMPACTION) instead of COMPACTION_BUILD
We don't need custom COMPACTION_BUILD anymore, since we have handy
IS_ENABLED().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Kirill A. Shutemov e5adfffc85 mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILD
We don't need custom NUMA_BUILD anymore, since we have handy
IS_ENABLED().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
David Rientjes 19965460e3 mm, memcg: make mem_cgroup_out_of_memory() static
mem_cgroup_out_of_memory() is only referenced from within file scope, so
it can be marked static.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Rabin Vincent 377e4f1676 mm: show migration types in show_mem
This is useful to diagnose the reason for page allocation failure for
cases where there appear to be several free pages.

Example, with this alloc_pages(GFP_ATOMIC) failure:

 swapper/0: page allocation failure: order:0, mode:0x0
 ...
 Mem-info:
 Normal per-cpu:
 CPU    0: hi:   90, btch:  15 usd:  48
 CPU    1: hi:   90, btch:  15 usd:  21
 active_anon:0 inactive_anon:0 isolated_anon:0
  active_file:0 inactive_file:84 isolated_file:0
  unevictable:0 dirty:0 writeback:0 unstable:0
  free:4026 slab_reclaimable:75 slab_unreclaimable:484
  mapped:0 shmem:0 pagetables:0 bounce:0
 Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB
 inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB
 isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB
 dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB
 slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB
 bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
 lowmem_reserve[]: 0 0

Before the patch, it's hard (for me, at least) to say why all these free
chunks weren't considered for allocation:

 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
 1*1024kB 1*2048kB 3*4096kB = 16128kB

After the patch, it's obvious that the reason is that all of these are
in the MIGRATE_CMA (C) freelist:

 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB
 (C) 1*1024kB (C) 1*2048kB (C) 3*4096kB (C) = 16128kB

Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Namjae Jeon d0e1d66b5a writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr()
There is no reason to pass the nr_pages_dirtied argument, because
nr_pages_dirtied value from the caller is unused in
balance_dirty_pages_ratelimited_nr().

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:21 -08:00
Linus Torvalds b58ed041a3 Device tree changes for v3.8
Bug fixes, little cleanups, and documentation changes. The most invasive
 thing here touches a bunch of the arch directories to use a common build
 rule for .dtb files. There are no major changes to functionality here
 other than a ew new helper functions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx2/sAAoJEEFnBt12D9kB8aYP/iInsIXDi6c0sGRNgYoFBZUH
 pGZH/PGcZjDD2WPUOos56KTDsWt2kIZm/4ck1mBmXMD5pKM4znIQyMTkhKFiyO0l
 CARwCjyRuaHQJHt88Pcaqhk34HGLgV7ntImMHwpIn0mNbjx7hruk9aU9Jqdcd32j
 2ANbUYU9ikgq0e9s/+xQfB6QZxH1zecQkMalhQWvihtVybpG3xW67IROZv/69uL0
 OtHZZJ7nCwXcEb84rcEHU781tO0CoQhr/pId7DirQEKaD8oNLHsejFEgGEFfFrvZ
 eFmJP/YmZh7Ll+NNTMHnQwyDNa2LFuLazbjaqCqUHQgcw9daFFeP5ga3Uqp7WZbU
 4kIxBQJ3byELnttFVQbsMQag+IAgmgfp8YdJFk3VTJDJKwpMJAO1sQeoEUHnnWAr
 cyyCzROaFt1hUkhRiXso+IYNLvb60o0/2NJRTiObPdljy9OSXW6O3wFEGk9F/6u0
 jgl9tisfLUL0orKnJ5tR3kbHaJKQd+6HgBRJNiuB+9FYO43j1cY/sYggSNCkiCMy
 RvlGYDCJ+lfmZdZ4T+QYLpjsOenFosV0JZGU6MVlZmEGzTtLyhALZQKNHuxLw1Nc
 oTe8Fx5aJrWEYe/HGqm4C43DqEiQUCmZ9NPae0JyfocmjuwARSnepvWe1XZNa/8t
 aNfK6UkTO/IL3zOtOHfd
 =LB3n
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull device tree changes from Grant Likely:
 "Here are the DT changes I've got queued up for v3.8.  As described
  below, there are a lot of bug fixes here and documentation updates but
  nothing major:

  Bug fixes, little cleanups, and documentation changes.  The most
  invasive thing here touches a bunch of the arch directories to use a
  common build rule for .dtb files.  There are no major changes to
  functionality here other than a few new helper functions."

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6: (34 commits)
  arm64: Fix the dtbs target building
  mtd: nand: davinci: fix the binding documentation
  rtc: rtc-mv: Add the device tree binding documentation
  devicetree/bindings: Move gpio-leds binding into leds directory
  of/vendor-prefixes: add Imagination Technologies
  microblaze: use new common dtc rule
  c6x: use new common dtc rule
  openrisc: use new common dtc rule
  arm64: Add dtbs target for building all the enabled dtb files
  arm64: use new common dtc rule
  ARM: dt: change .dtb build rules to build in dts directory
  kbuild: centralize .dts->.dtb rule
  Fix build when CONFIG_W1_MASTER_GPIO=m b exporting "allnodes"
  of/spi: Honour "status=disabled" property of device
  of_mdio: Honour "status=disabled" property of device
  of_i2c: Honour "status=disabled" property of device
  powerpc: Fix fallout from device_node->name constification
  of: add 'const' for of_parse_phandle parameter *np
  Documentation: correct of_platform_populate() argument list
  script: dtc: clean generated files
  ...
2012-12-11 11:30:41 -08:00
Linus Torvalds 259cdbee20 irqdomain changes for Linux v3.8
Trivial changes to irqdomain. An update to the documentation and make
 one of the error paths not quite so obnoxious.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx25bAAoJEEFnBt12D9kBmBkP/i4TONBT0TVWVKsK7RIdZpVF
 aQCC1W2gFuo5E8JITB6gfhvSw6LMiho9ThzX0NVP52SN6k5QWm0QzlB1sjQyaKph
 +rB7pepO9+MxYuz5zPz1RPNlBA7pvu+S2e7eDD0dtWfmJn9gjeubrschZINSU25E
 g0BYdy/eJTnNwOdsASNaXNN2Bu7/gO7EpV0Z5T8fCfbih9euyKcuGhHasXGdnHQX
 /PpYu87kq0FPYyLuHBo1+9WXIG++sGtlw8SPZ7kjPLFlnis4spiajoaXrwXWMyQN
 KJTdDLHh2DSRlxQEr+Ibw2ptajJ/iNIwQdQksFn1DzDzkcCDBgl8e+0sCCUhv0eb
 sQcf/jQ6clYmwH/7SCHo93kyLt7BpY4jocw8UL6XiITTY156I0u/4YTPJNCm/ai8
 NajWsmt0otekZOmLpKQhDEsnWrAfVMfmRiQOBAhACt2KPS78FtNpt3b8jCWYyEV4
 MB5aemsTdtRGSLCITe3nZ9W0CyFDS8fhgS310DV+y3QVa7KIT7oFYy2S8FuO2AOn
 HVRNjKqZJm1jwHdgSN4BXpve/KqxycTA3Kfz/SnaOs/JNP7oWxPY/X6MUrKHmcId
 eCciJ/hXknslgH6q+1OD/7RCJL7fAt0+9xmqzKStBGcfgswC4uIn3+gHmWUwdwlE
 1f8NCvsRERl+79biglT8
 =24xd
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull irqdomain changes from Grant Likely:
 "Trivial changes to irqdomain.  An update to the documentation and make
  one of the error paths not quite so obnoxious."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  irqdomain: update documentation
  irqdomain: stop screaming about preallocated irqdescs
2012-12-11 11:30:08 -08:00
Linus Torvalds 9ada9fd5df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:

 - EDAC core error path fix, from Denis Kirjanov.

 - Generalization of AMD MCE bank names and some minor error reporting
   improvements.

 - EDAC core cleanups and simplifications, from Wei Yongjun.

 - amd64_edac fixes for sysfs-reported values, from Josh Hunt.

 - some heavy amd64_edac error reporting path shaving, leading to
   removing a bunch of code.

 - amd64_edac error injection method improvements.

 - EDAC core cleanups and fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
  EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
  EDAC: Handle error path in edac_mc_sysfs_init() properly
  MCE, AMD: Dump error status
  MCE, AMD: Report decoded error type first
  MCE, AMD: Dump CPU f/m/s triple with the error
  MCE, AMD: Remove functional unit references
  EDAC: Convert to use simple_open()
  EDAC, Calxeda highbank: Convert to use simple_open()
  EDAC: Fix mc size reported in sysfs
  EDAC: Fix csrow size reported in sysfs
  EDAC: Pass mci parent
  EDAC: Add memory controller flags
  amd64_edac: Fix csrows size and pages computation
  amd64_edac: Use DBAM_DIMM macro
  amd64_edac: Fix K8 chip select reporting
  amd64_edac: Reorganize error reporting path
  amd64_edac: Do not check whether error address is valid
  amd64_edac: Improve error injection
  amd64_edac: Cleanup error injection code
  amd64_edac: Small fixlets and cleanups
  ...
2012-12-11 11:28:43 -08:00
Linus Torvalds c45564e916 Merge branch 'for-v3.8' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull CMA and DMA-mapping update from Marek Szyprowski:
 "Another set of Contiguous Memory Allocator and DMA-mapping framework
  updates for v3.8.

  This pull request consists only of two patches.  The first fixes a
  long standing issue with dmapools (the code predates current GIT
  history), which forced all allocations to use GFP_ATOMIC flag,
  ignoring the flags passed by the caller.  The second patch changes CMA
  code to correctly use phys_addr_t type what enables support for LPAE
  systems."

* 'for-v3.8' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  drivers: cma: represent physical addresses as phys_addr_t
  mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls
2012-12-11 11:27:10 -08:00
Linus Torvalds 93874681aa The common clock framework changes for 3.8 are comprised of lots of
fixes for existing platforms as well as new ports for some ARM
 platforms.  In addition there are new clk drivers for audio devices and
 MFDs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxtdeAAoJEDqPOy9afJhJwzUP/2/oaBAGXakQf+TTOsRo2IMh
 ejwgOxFsBcspR0OrJ73TAPDqbgY3xZ+BeVdvbIiYikcZLqT9dZsoN7oa9udcu6aL
 1OxBT6F/CFnxUR4EVkpUdQ+vVIR8svxsAAv71zvaVGCeie0D7MDL2JgK8TvgRxHF
 DKxFYJ935CJC64JHJBYhW/1b4T/Tt94z/nYMijcQxkjmpEimTm/qLHpbK6OCQFUU
 fmvs3VmSA4p7hBmgXu3zp6NkOF3JJa7NWb+3kJh1UmqM7xh/CijxZP2YHhLkIdU1
 g2qhYVKIIxmAFa7xJjXY05VrjMKvAkXGNJGVwCFQHnP17By4Pni3BDsQ+61u30Nj
 B/bIRrzAC17EOh6c6pAZIbNLTHHaQGe0XQMDuHGsjgmVpn2CTRmIduEVJPiq9wAk
 lNkwqh6Dftq72Xepy1RieqFuDOO8kHSsOPqS2e9A9yDuh5bzLsKlhKWKUahhxrML
 TnRBd7NfwctoEsKy42HtrXA2+iQsQDmHXNlec3ARNgWS3Hhre7qb1d0Q00y28OTA
 RWyPoxOn1O+wQsV2cu3I1LKVo9CmNU55evHG5zSDPIA3GsrMcPZmP/4KM9Vbs3Ye
 5BIMtptUrOeZQ2PRxcTHnCbWvch5bQyvDkDiK/xR7XsiQIheE/0Ak8wGgVZ7TW4d
 0zLm7UmmkmFu4xTwf2Nk
 =GoXf
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.linaro.org/people/mturquette/linux

Pull clock framework changes from Mike Turquette:
 "The common clock framework changes for 3.8 are comprised of lots of
  fixes for existing platforms as well as new ports for some ARM
  platforms.  In addition there are new clk drivers for audio devices
  and MFDs."

Fix up trivial conflict in <linux/clk-provider.h> (removal of 'inline'
clashing with return type fixes)

* tag 'clk-for-linus' of git://git.linaro.org/people/mturquette/linux: (51 commits)
  MAINTAINERS: bad email address for Mike Turquette
  clk: introduce optional disable_unused callback
  clk: ux500: fix bit error
  clk: clock multiplexers may register out of order
  clk: ux500: Initial support for abx500 clock driver
  CLK: SPEAr: Remove unused dummy apb_pclk
  CLK: SPEAr: Correct index scanning done for clock synths
  CLK: SPEAr: Update clock rate table
  CLK: SPEAr: Add missing clocks
  CLK: SPEAr: Set CLK_SET_RATE_PARENT for few clocks
  CLK: SPEAr13xx: fix parent names of multiple clocks
  CLK: SPEAr13xx: Fix mux clock names
  CLK: SPEAr: Fix dev_id & con_id for multiple clocks
  clk: move IM-PD1 clocks to drivers/clk
  clk: make ICST driver handle the VCO registers
  clk: add GPLv2 headers to the Versatile clock files
  clk: mxs: Use a better name for the USB PHY clock
  clk: spear: Add stub functions for spear3[0|1|2]0_clk_init()
  CLK: clk-twl6040: fix return value check in twl6040_clk_probe()
  clk: ux500: Register nomadik keypad clock lookups for u8500
  ...
2012-12-11 11:25:08 -08:00
Linus Torvalds 505cbedab9 This is the pinctrl big pull request for v3.8.
As can be seen from the diffstat the major changes
 are:
 
 - A big conversion of the AT91 pinctrl driver and
   the associated ACKed platform changes under
   arch/arm/max-at91 and its device trees. This
   has been coordinated with the AT91 maintainers
   to go in through the pinctrl tree.
 
 - A larger chunk of changes to the SPEAr drivers
   and the addition of the "plgpio" driver for the
   SPEAr as well.
 
 - The removal of the remnants of the Nomadik driver
   from the arch/arm tree and fusion of that into
   the Nomadik driver and platform data header files.
 
 - Some local movement in the Marvell MVEBU drivers,
   these now have their own subdirectory.
 
 - The addition of a chunk of code to gpiolib under
   drivers/gpio to register gpio-to-pin range mappings
   from the GPIO side of things. This has been
   requested by Grant Likely and is now implemented,
   it is particularly useful for device tree work.
 
 Then we have incremental updates all over the place,
 many of these are cleanups and fixes from Axel Lin
 who has done a great job of removing minor mistakes
 and compilation annoyances.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQupLkAAoJEEEQszewGV1z8ykP/3yLi5hb3QstajrL3jvrHcqN
 7sc4uW1/9pCa6802nBw7qOfIGxgTAriGtAIePdtGhIkij6TLyyWfvlmUz3iJkeZ5
 4nYy69yOdNMeBGvhXBkBD4K4lL3NGZ9eR+S1rgWY0J3Y+a5upibJeaXxmYBayjBH
 bN/OiK77zaKv91zKSZ4YW9WCzrjn2E0w1mDRcWdffcyrNplY8qm/G2iXBT+UoCLa
 UoR1zxG9nqF+nQ8mL+dVtVjlHsUcj0NEp34HQrUQ8ACOaEIiSI/zDe7afOC38Iy7
 EUTV4IwKeKJyTnAN/QSzbTXF41CR/Qbihubo6sUrbAmyJXLnybVotd4Inh4ca7II
 c2TPV89tSnJWwDSizHwbY3sXIVw8ojmjYMr1ib0Z9GBGyoij1va5WqCJ4iIzTzuc
 imvDSz8ctuuxo6iOQs3smUaHXGz1V+3zvQ5v+Ioc1h9mN2LVKNa6NjmFNZmeFHLa
 44zIes51DUXizaRobOffjoTIlUkAdwYQUpRtq0hvQtgYTyUIeXzfzCNzDoT6bhK3
 VhLn4c4apETER6KtYCPu8PtxM/yyopwUj95WvnPK2fu/m+1B26jUVawomWfRtCQF
 kuovLCTTemn04jWWl3r0JovE/tVcgBrpxTYi6Z4RPY7PuD4sQ477DeM2x3DWZPQQ
 MHveLGA87735XKZkqQRR
 =rUOP
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-for-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl changes from Linus Walleij:
 "These are the first and major pinctrl changes for the v3.8 merge
  cycle.  Some of this is used as merge base for other trees so I better
  be early on the trigger.

  As can be seen from the diffstat the major changes are:

  - A big conversion of the AT91 pinctrl driver and the associated ACKed
    platform changes under arch/arm/max-at91 and its device trees.  This
    has been coordinated with the AT91 maintainers to go in through the
    pinctrl tree.

  - A larger chunk of changes to the SPEAr drivers and the addition of
    the "plgpio" driver for the SPEAr as well.

  - The removal of the remnants of the Nomadik driver from the arch/arm
    tree and fusion of that into the Nomadik driver and platform data
    header files.

  - Some local movement in the Marvell MVEBU drivers, these now have
    their own subdirectory.

  - The addition of a chunk of code to gpiolib under drivers/gpio to
    register gpio-to-pin range mappings from the GPIO side of things.
    This has been requested by Grant Likely and is now implemented, it
    is particularly useful for device tree work.

  Then we have incremental updates all over the place, many of these are
  cleanups and fixes from Axel Lin who has done a great job of removing
  minor mistakes and compilation annoyances."

* tag 'pinctrl-for-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (114 commits)
  ARM: mmp: select PINCTRL for ARCH_MMP
  pinctrl: Drop selecting PINCONF for MMP2, PXA168 and PXA910
  pinctrl: pinctrl-single: Fix error check condition
  pinctrl: SPEAr: Update error check for unsigned variables
  gpiolib: Fix use after free in gpiochip_add_pin_range
  gpiolib: rename pin range arguments
  pinctrl: single: support gpio request and free
  pinctrl: generic: add input schmitt disable parameter
  pinctrl/u300/coh901: stop spawning pinctrl from GPIO
  pinctrl/u300/coh901: let the gpio_chip register the range
  pinctrl: add function to retrieve range from pin
  gpiolib: return any error code from range creation
  pinctrl: make range registration defer properly
  gpiolib: rename find_pinctrl_*
  gpiolib: let gpiochip_add_pin_range() specify offset
  ARM: at91: pm9g45: add mmc support
  ARM: at91: Animeo IP: add mmc support
  ARM: at91: dt: add mmc pinctrl for Atmel reference boards
  ARM: at91: dt: at91sam9: add mmc pinctrl support
  ARM: at91/dts: add nodes for atmel hsmci controllers for atmel boards
  ...
2012-12-11 11:21:33 -08:00
Linus Torvalds a8936db7c2 New driver: DA9055
Added/improved support for new chips in existing drivers: Z650/670, N550/570,
 ADS7830, AMD 16h family
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxslVAAoJEMsfJm/On5mBCCIP/2zEvtwW2nEjlqlBIuIaKsW5
 +8nGOZvyd9Td0ApuC/b4UgzG4BYkOuiHFyUJGwJBcEMetNJqr650TRfS9RrLsi1w
 vOrCsIgjioWpzCEwD8EkgHLXFNk9lXYK9jlkpl434WuPvSUKwN4QE46XtHoSfAye
 NUJBj9hwDv87P68lcg7MhEyZxEPzcInPTtXMGFOiPprAKHQHeDJReSnqx8oKbmBb
 5ZD+zy+Byx1UyX1lqc77xjaInvGGW1OfUkljJb7iFdlXTqyNe0g2Chn07PMF5dJJ
 7stNCQ/z1l+Mo4mmgvGTgHcjooeZL+HpD3XQG0rERCV0X8prKHV2ctGPEQUoCt6c
 OpV08OgsaLii+2x7+otL6svyD22FoFFxS5UP1EkYZPyGekf/C5I9Wh9f77xJ+hes
 PNki3TrnNUzVAZKYOH9JgW2MJ9oYjvKkskZoSytnEoPAs1fS4rJ0a5bQf2sryXdT
 xsDw57fdjqK7A9hPzc/lIae6TYK0iMnQ4wO9nWD7ftQlH6AOT5O4vYAbwoCEks/Q
 6Bv3XCaBkt1GbWyCC+aXUWjvSdU0A3N/H7jaocwR9vcgoJmfwinJi10hVcgE1Kfu
 JVKQMvDFe4jV1j8hiQhGPSjxe+9suoaVBrWFB51W4DyDfJZQPOFd6+3om9/vGDb3
 Y3opAqGVV7IsenVtPif8
 =HQlU
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon updates from Guenter Roeck:
 "New driver: DA9055

  Added/improved support for new chips in existing drivers: Z650/670,
  N550/570, ADS7830, AMD 16h family"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (da9055) Fix chan_mux[DA9055_ADC_ADCIN3] setting
  hwmon: DA9055 HWMON driver
  hwmon: (coretemp) List TjMax for Z650/670 and N550/570
  hwmon: (coretemp) Drop N4xx, N5xx, D4xx, D5xx CPUs from tjmax table
  hwmon: (coretemp) Use model table instead of if/else to identify CPU models
  hwmon: da9052: Use da9052_reg_update for rmw operations
  hwmon: (coretemp) Drop dependency on PCI for TjMax detection on Atom CPUs
  hwmon: (ina2xx) use module_i2c_driver to simplify the code
  hwmon: (ads7828) add support for ADS7830
  hwmon: (ads7828) driver cleanup
  x86,AMD: Power driver support for AMD's family 16h processors
2012-12-11 11:20:34 -08:00
Linus Torvalds 11b84c5857 MMC highlights for 3.8:
Core:
  - Expose access to the eMMC RPMB ("Replay Protected Memory Block") area
    by extending the existing mmc_block ioctl.
  - Add SDIO powered-suspend DT properties to the core MMC DT binding.
  - Add no-1-8-v DT flag for boards where the SD controller reports that it
    supports 1.8V but the board itself has no way to switch to 1.8V.
  - More work on switching to 1.8V UHS support using a vqmmc regulator.
  - Fix up a case where the slot-gpio helper may fail to reset the host
    controller properly if a card was removed during a transfer.
  - Fix several cases where a broken device could cause an infinite loop
    while we wait for a register to update.
 
 Drivers:
  - at91-mci: Remove obsolete driver, atmel-mci handles these devices now.
  - sdhci-dove: Allow using GPIOs for card-detect notifications.
  - sdhci-esdhc: Fix for recovering from ADMA errors on broken silicon.
  - sdhci-s3c: Add pinctrl support.
  - wmt-sdmmc: New driver for WonderMedia SD/MMC controllers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxpwoAAoJEHNBYZ7TNxYMir0P/in5xv6vVq+06hMb9JTA2+M0
 +eSgNAoSQt3Zn3dTxGtcaMbrnVvegMD2sRmnHncSKoi6iUgWh7RhuT7o8vp2zgJe
 KbuOF/OF5+RSnLwgCMcOCjauhBtHO7s+NPNbUH4DMN2/xQ7fpiY+QIsxrKWtImyO
 E4v62zRm/6aa1IfGDgYTAlX8Y1QfEQhH55OYrF9UzU1m7TKsSGxYTXwx+LeLQZQ0
 j6HO+FVlmu5jbZ4V69z3qNIdiklRGQBE7D+7cJqW6btv7x4oLmyygEoNA8M8ncxX
 g5BD5oEt9oxW5OW8hYcEJ/flDErgekZekH8n+fZoNVDVq8gMJxHOV73mnN+wxRnd
 lRGLhniVOzl4cU+ObcYTBzdUV+2yR3sRHSK8nwtb+HFATetM70gBlpoWni1+sGr5
 qvFhnpzNrj0xCwoQpLerwrvCwPpyko9uxZQwO51HqhugFwWlJuTqQsnCgj+Q4MDE
 CRUq4R6TCH7oMGeGin0qyD7jyigDnh7g4pPNQmuIYJSLsDwvinisz98MQ0XC8szc
 Z1H9YXKch2woLHLPFWzNyAIlJWHXakjk5vb9uSQmkUIto/y7q+LiGVCc5uvny3rU
 pJmda+U/8R8YmSWS7pUi6OmPRfcnPuAIllEwo3lBypVpdP/fcZ6aribrbGEo6DLb
 XLFIHnl3O9/sjfUcMl+L
 =oSX3
 -----END PGP SIGNATURE-----

Merge tag 'mmc-updates-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC updates from Chris Ball:
 "MMC highlights for 3.8:

  Core:
   - Expose access to the eMMC RPMB ("Replay Protected Memory Block")
     area by extending the existing mmc_block ioctl.
   - Add SDIO powered-suspend DT properties to the core MMC DT binding.
   - Add no-1-8-v DT flag for boards where the SD controller reports
     that it supports 1.8V but the board itself has no way to switch to
     1.8V.
   - More work on switching to 1.8V UHS support using a vqmmc regulator.
   - Fix up a case where the slot-gpio helper may fail to reset the host
     controller properly if a card was removed during a transfer.
   - Fix several cases where a broken device could cause an infinite
     loop while we wait for a register to update.

  Drivers:
   - at91-mci: Remove obsolete driver, atmel-mci handles these devices
     now.
   - sdhci-dove: Allow using GPIOs for card-detect notifications.
   - sdhci-esdhc: Fix for recovering from ADMA errors on broken silicon.
   - sdhci-s3c: Add pinctrl support.
   - wmt-sdmmc: New driver for WonderMedia SD/MMC controllers."

* tag 'mmc-updates-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (65 commits)
  mmc: sdhci: implement the .card_event() method
  mmc: extend the slot-gpio card-detection to use host's .card_event() method
  mmc: add a card-event host operation
  mmc: sdhci-s3c: Fix compilation warning
  mmc: sdhci-pci: Enable SDHCI_CAN_DO_HISPD for Ricoh SDHCI controller
  mmc: sdhci-dove: allow GPIOs to be used for card detection on Dove
  mmc: sdhci-dove: use two-stage initialization for sdhci-pltfm
  mmc: sdhci-dove: use devm_clk_get()
  mmc: eSDHC: Recover from ADMA errors
  mmc: dw_mmc: remove duplicated buswidth code
  mmc: dw_mmc: relocate where dw_mci_setup_bus() is called from
  mmc: Limit MMC speed to 52MHz if not HS200
  mmc: dw_mmc: use devres functions in dw_mmc
  mmc: sh_mmcif: remove unneeded clock connection ID
  mmc: sh_mobile_sdhi: remove unneeded clock connection ID
  mmc: sh_mobile_sdhi: fix clock frequency printing
  mmc: Remove redundant null check before kfree in bus.c
  mmc: Remove redundant null check before kfree in sdio_bus.c
  mmc: sdhci-imx-esdhc: use more devm_* functions
  mmc: dt: add no-1-8-v device tree flag
  ...
2012-12-11 11:19:09 -08:00
Vitaly Andrianov 4009793e15 drivers: cma: represent physical addresses as phys_addr_t
This commit changes the CMA early initialization code to use phys_addr_t
for representing physical addresses instead of unsigned long.

Without this change, among other things, dma_declare_contiguous() simply
discards any memory regions whose address is not representable as unsigned
long.

This is a problem on 32-bit PAE machines where unsigned long is 32-bit
but physical address space is larger.

Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Cyril Chemparathy <cyril@ti.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2012-12-11 09:28:09 +01:00
Marek Szyprowski 387870f2d6 mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls
dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionaly,
on ARM architecture any driver which is using dmapools will sooner or
later  trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.

This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.

Reported-by: Soeren Moch <smoch@web.de>
Reported-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Soeren Moch <smoch@web.de>
Cc: stable@vger.kernel.org
2012-12-11 09:28:08 +01:00
Mike Turquette 8f87189653 MAINTAINERS: bad email address for Mike Turquette
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2012-12-10 22:35:32 -08:00
Mike Turquette 7c045a55c9 clk: introduce optional disable_unused callback
Some gate clocks have special needs which must be handled during the
disable-unused clocks sequence.  These needs might be driven by software
due to the fact that we're disabling a clock outside of the normal
clk_disable path and a clk's enable_count will not be accurate.  On the
other hand a specific hardware programming sequence might need to be
followed for this corner case.

This change is needed for the upcoming OMAP port to the common clock
framework.  Specifically, it is undesirable to treat the disable-unused
path identically to the normal clk_disable path since other software
layers are involved.  In this case OMAP's clockdomain code throws WARNs
and bails early due to the clock's enable_count being set to zero.  A
custom callback mitigates this problem nicely.

Cc: Paul Walmsley <paul@pwsan.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2012-12-10 22:35:02 -08:00
Linus Torvalds 29594404d7 Linux 3.7 2012-12-10 19:30:57 -08:00
Catalin Marinas 58fea354d8 arm64: Fix the dtbs target building
The arch/arm64/Makefile was not passing the right target to the
boot/dts/Makefile.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
2012-12-10 20:24:57 -06:00
Florian Fainelli 55220bb3e5 Input: matrix-keymap - provide proper module license
The matrix-keymap module is currently lacking a proper module license,
add one so we don't have this module tainting the entire kernel.  This
issue has been present since commit 1932811f42 ("Input: matrix-keymap
- uninline and prepare for device tree support")

Signed-off-by: Florian Fainelli <florian@openwrt.org>
CC: stable@vger.kernel.org # v3.5+
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-10 16:10:05 -08:00
Linus Torvalds 2c68bc72dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Netlink socket dumping had several missing verifications and checks.

    In particular, address comparisons in the request byte code
    interpreter could access past the end of the address in the
    inet_request_sock.

    Also, address family and address prefix lengths were not validated
    properly at all.

    This means arbitrary applications can read past the end of certain
    kernel data structures.

    Fixes from Neal Cardwell.

 2) ip_check_defrag() operates in contexts where we're in the process
    of, or about to, input the packet into the real protocols
    (specifically macvlan and AF_PACKET snooping).

    Unfortunately, it does a pskb_may_pull() which can modify the
    backing packet data which is not legal if the SKB is shared.  It
    very much can be shared in this context.

    Deal with the possibility that the SKB is segmented by using
    skb_copy_bits().

    Fix from Johannes Berg based upon a report by Eric Leblond.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  ipv4: ip_check_defrag must not modify skb before unsharing
  inet_diag: validate port comparison byte code to prevent unsafe reads
  inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
  inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
  inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
2012-12-10 16:07:11 -08:00