License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files, created by Philippe Ombredanne. Philippe prepared the
base worksheet and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file-by-file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- Files that already had some variant of a license header in them were
included (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when neither scanner could find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license was applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was annotated with the Linux-syscall-note if
any GPL-family license was found in the file or if it had no licensing
in it (per the prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet by Kate, Philippe and Thomas to determine the SPDX license
identifiers to apply to the source files, with confirmation in some
cases by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files, Thomas did random spot checks
in about 15,000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally, Philippe spent 10 hours doing a detailed manual inspection
and review of the 12,461 patched files from the initial patch version,
with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to each file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types; see the illustration after this log). Finally, Greg ran
the script using the .csv files to generate the patches.
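For illustration only (the comment styles below follow the kernel's
documented license-rules practice and are not spelled out in this log):
the SPDX identifier goes on the first possible line, with the comment
style matched to the file type:

	// SPDX-License-Identifier: GPL-2.0	(C source .c files)
	/* SPDX-License-Identifier: GPL-2.0 */	(C header .h files)
	# SPDX-License-Identifier: GPL-2.0	(Makefiles, Kconfig, scripts)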
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_SCHED_MM_H
#define _LINUX_SCHED_MM_H

#include <linux/kernel.h>
#include <linux/atomic.h>
#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/gfp.h>
#include <linux/sync_core.h>

/*
 * Routines for handling mm_structs
 */
extern struct mm_struct *mm_alloc(void);

/**
 * mmgrab() - Pin a &struct mm_struct.
 * @mm: The &struct mm_struct to pin.
 *
 * Make sure that @mm will not get freed even after the owning task
 * exits. This doesn't guarantee that the associated address space
 * will still exist later on and mmget_not_zero() has to be used before
 * accessing it.
 *
 * This is a preferred way to pin @mm for a longer/unbounded amount
 * of time.
 *
 * Use mmdrop() to release the reference acquired by mmgrab().
 *
 * See also <Documentation/vm/active_mm.rst> for an in-depth explanation
 * of &mm_struct.mm_count vs &mm_struct.mm_users.
 */
static inline void mmgrab(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_count);
}

extern void __mmdrop(struct mm_struct *mm);

static inline void mmdrop(struct mm_struct *mm)
{
	/*
	 * The implicit full barrier implied by atomic_dec_and_test() is
	 * required by the membarrier system call before returning to
	 * user-space, after storing to rq->curr.
	 */
	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
		__mmdrop(mm);
}
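
/*
 * Illustrative sketch, not part of this header: a typical mmgrab()/mmdrop()
 * pairing. The function below is hypothetical; only the pin/unpin pattern
 * is the point.
 */
static inline void example_pin_mm_struct(struct mm_struct *mm)
{
	mmgrab(mm);	/* mm_count reference: the mm_struct stays allocated */
	/*
	 * Fields that survive exit_mmap() may be used here, but the address
	 * space itself may already be gone; page tables must not be touched
	 * without a successful mmget_not_zero().
	 */
	mmdrop(mm);	/* release the reference; may free the mm_struct */
}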

/*
 * This has to be called after a get_task_mm()/mmget_not_zero()
 * followed by taking the mmap_sem for writing before modifying the
 * vmas or anything the coredump pretends not to change from under it.
 *
 * It also has to be called when mmgrab() is used in the context of
 * the process, but then the mm_count refcount is transferred outside
 * the context of the process to run down_write() on that pinned mm.
 *
 * NOTE: find_extend_vma() called from GUP context is the only place
 * that can modify the "mm" (notably the vm_start/end) under mmap_sem
 * for reading and outside the context of the process, so it is also
 * the only case that holds the mmap_sem for reading that must call
 * this function. Generally if the mmap_sem is held for reading
 * there's no need of this check after get_task_mm()/mmget_not_zero().
 *
 * This function can be obsoleted and the check can be removed once
 * the coredump code takes the mmap_sem for writing before invoking
 * the ->core_dump methods.
 */
static inline bool mmget_still_valid(struct mm_struct *mm)
{
	return likely(!mm->core_state);
}
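
/*
 * Illustrative sketch, not part of this header, of the pattern described in
 * the comment above: pin the mm, take mmap_sem for writing, and re-check
 * mmget_still_valid() before modifying vmas so a concurrent coredump is not
 * changed from under it. The function itself is hypothetical.
 */
static inline void example_modify_vmas(struct task_struct *task)
{
	struct mm_struct *mm = get_task_mm(task);

	if (!mm)
		return;
	down_write(&mm->mmap_sem);
	if (mmget_still_valid(mm)) {
		/* safe to modify vmas here */
	}
	up_write(&mm->mmap_sem);
	mmput(mm);
}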

/**
 * mmget() - Pin the address space associated with a &struct mm_struct.
 * @mm: The address space to pin.
 *
 * Make sure that the address space of the given &struct mm_struct doesn't
 * go away. This does not protect against parts of the address space being
 * modified or freed, however.
 *
 * Never use this function to pin this address space for an
 * unbounded/indefinite amount of time.
 *
 * Use mmput() to release the reference acquired by mmget().
 *
 * See also <Documentation/vm/active_mm.rst> for an in-depth explanation
 * of &mm_struct.mm_count vs &mm_struct.mm_users.
 */
static inline void mmget(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_users);
}

static inline bool mmget_not_zero(struct mm_struct *mm)
{
	return atomic_inc_not_zero(&mm->mm_users);
}

/* mmput gets rid of the mappings and all user-space */
extern void mmput(struct mm_struct *);
#ifdef CONFIG_MMU
/* same as above but performs the slow path from the async context. Can
 * be called from the atomic context as well
 */
void mmput_async(struct mm_struct *);
#endif
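
/*
 * Illustrative sketch, not part of this header: pinning the address space
 * with mmget_not_zero() before touching it, as opposed to mmgrab(), which
 * only keeps the mm_struct allocated. The function is hypothetical.
 */
static inline void example_use_address_space(struct mm_struct *mm)
{
	if (!mmget_not_zero(mm))	/* fails once mm_users has hit zero */
		return;
	/* mm_users held: vmas and page tables remain alive here */
	mmput(mm);			/* may tear down the address space */
}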

/* Grab a reference to a task's mm, if it is not already going away */
extern struct mm_struct *get_task_mm(struct task_struct *task);
/*
 * Grab a reference to a task's mm, if it is not already going away
 * and ptrace_may_access with the mode parameter passed to it
 * succeeds.
 */
extern struct mm_struct *mm_access(struct task_struct *task, unsigned int mode);
/* Remove the current task's stale references to the old mm_struct on exit() */
extern void exit_mm_release(struct task_struct *, struct mm_struct *);
/* Remove the current task's stale references to the old mm_struct on exec() */
extern void exec_mm_release(struct task_struct *, struct mm_struct *);

#ifdef CONFIG_MEMCG
extern void mm_update_next_owner(struct mm_struct *mm);
#else
static inline void mm_update_next_owner(struct mm_struct *mm)
{
}
#endif /* CONFIG_MEMCG */

#ifdef CONFIG_MMU
extern void arch_pick_mmap_layout(struct mm_struct *mm,
				  struct rlimit *rlim_stack);
extern unsigned long
arch_get_unmapped_area(struct file *, unsigned long, unsigned long,
		       unsigned long, unsigned long);
extern unsigned long
arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
			       unsigned long len, unsigned long pgoff,
			       unsigned long flags);
#else
static inline void arch_pick_mmap_layout(struct mm_struct *mm,
					 struct rlimit *rlim_stack) {}
#endif

static inline bool in_vfork(struct task_struct *tsk)
{
	bool ret;

	/*
	 * need RCU to access ->real_parent if CLONE_VM was used along with
	 * CLONE_PARENT.
	 *
	 * We check real_parent->mm == tsk->mm because CLONE_VFORK does not
	 * imply CLONE_VM
	 *
	 * CLONE_VFORK can be used with CLONE_PARENT/CLONE_THREAD and thus
	 * ->real_parent is not necessarily the task doing vfork(), so in
	 * theory we can't rely on task_lock() if we want to dereference it.
	 *
	 * And in this case we can't trust the real_parent->mm == tsk->mm
	 * check, it can be a false negative. But we do not care, if init or
	 * another oom-unkillable task does this it should blame itself.
	 */
	rcu_read_lock();
	ret = tsk->vfork_done && tsk->real_parent->mm == tsk->mm;
	rcu_read_unlock();

	return ret;
}
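
/*
 * Illustrative sketch, not part of this header: in_vfork() is the kind of
 * predicate a per-task memory heuristic (such as OOM scoring) can use, since
 * a vfork child still borrows its parent's mm and killing it would not free
 * that address space. The function below is hypothetical.
 */
static inline bool example_oom_skip(struct task_struct *tsk)
{
	/* a task in the middle of vfork() shares its parent's mm */
	return in_vfork(tsk);
}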
mm: introduce memalloc_nofs_{save,restore} API
GFP_NOFS context is used for the following 5 reasons currently:
- to prevent deadlocks when a lock held by the allocation context would
be needed during memory reclaim
- to prevent stack overflows during reclaim because the allocation is
performed from an already deep context
- to prevent lockups when the allocation context depends on other
reclaimers to make forward progress indirectly
- just in case, because this would be safe from the fs POV
- to silence lockdep false positives
Unfortunately overuse of this allocation context brings some problems to
the MM. Memory reclaim is much weaker (especially during heavy FS
metadata workloads), and the OOM killer cannot be invoked because the MM
layer doesn't have enough information about how much memory is freeable
by the FS layer.
In many cases it is far from clear why the weaker context is even used,
and so it might be used unnecessarily. We would like to get rid of
those as much as possible. One way to do that is to use the flag in
scopes rather than in isolated cases. Such a scope is declared when really
necessary, tracked per task, and all the allocation requests from within
the context simply inherit the GFP_NOFS semantic.
Not only is this easier to understand and maintain because there are
far fewer problematic contexts than specific allocation requests, it
also helps code paths where the FS layer interacts with other layers (e.g.
crypto, security modules, MM etc.) and there is no easy way to convey
the allocation context between the layers.
Introduce the memalloc_nofs_{save,restore} API to control the scope of
the GFP_NOFS allocation context. This basically copies the
memalloc_noio_{save,restore} API we have for the other restricted
allocation context, GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists
and is just an alias for PF_FSTRANS, which had been xfs-specific until
recently. There are no PF_FSTRANS users anymore, so let's just drop it.
PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly, the same way PF_MEMALLOC_NOIO drops __GFP_IO.
memalloc_noio_flags is renamed to current_gfp_context because it now
cares about both PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs
code paths preserve their semantic. kmem_flags_convert() doesn't need
to evaluate the flag anymore.
This patch shouldn't introduce any functional changes.
Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.
[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
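As a hedged usage sketch (the surrounding code is hypothetical; only the
save/restore API itself is what this patch introduces), a filesystem
transaction path would mark the scope once instead of converting each
allocation to GFP_NOFS:

	unsigned int nofs_flags = memalloc_nofs_save();

	/*
	 * Any allocation here, e.g. kmalloc(sz, GFP_KERNEL), implicitly
	 * behaves as GFP_NOFS and cannot recurse into the filesystem.
	 */

	memalloc_nofs_restore(nofs_flags);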
/*
 * Applies per-task gfp context to the given allocation flags.
 * PF_MEMALLOC_NOIO implies GFP_NOIO
 * PF_MEMALLOC_NOFS implies GFP_NOFS
 * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
 */
static inline gfp_t current_gfp_context(gfp_t flags)
{
	if (unlikely(current->flags &
		     (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_NOCMA))) {
		/*
		 * NOIO implies both NOIO and NOFS and it is a weaker context
		 * so always make sure it takes precedence
		 */
		if (current->flags & PF_MEMALLOC_NOIO)
			flags &= ~(__GFP_IO | __GFP_FS);
		else if (current->flags & PF_MEMALLOC_NOFS)
			flags &= ~__GFP_FS;
#ifdef CONFIG_CMA
		if (current->flags & PF_MEMALLOC_NOCMA)
			flags &= ~__GFP_MOVABLE;
#endif
	}
	return flags;
}
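
/*
 * Illustrative sketch, not part of this header: an allocator-side check
 * (hypothetical function) applies the per-task context before acting on
 * the caller's flags.
 */
static inline bool example_may_enter_fs(gfp_t gfp_mask)
{
	/* a GFP_KERNEL request made inside a NOFS scope loses __GFP_FS here */
	return (current_gfp_context(gfp_mask) & __GFP_FS) != 0;
}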
lockdep: fix fs_reclaim annotation
While revisiting my Btrfs swapfile series [1], I introduced a situation
in which reclaim would lock i_rwsem, and even though the swapon() path
clearly made GFP_KERNEL allocations while holding i_rwsem, I got no
complaints from lockdep. It turns out that the rework of the fs_reclaim
annotation was broken: if the current task has PF_MEMALLOC set, we don't
acquire the dummy fs_reclaim lock, but when reclaiming we always check
this _after_ we've just set the PF_MEMALLOC flag. In most cases, we can
fix this by moving the fs_reclaim_{acquire,release}() outside of the
memalloc_noreclaim_{save,restore}(), although kswapd is slightly
different. After applying this, I got the expected lockdep splats.
1: https://lwn.net/Articles/625412/
Link: http://lkml.kernel.org/r/9f8aa70652a98e98d7c4de0fc96a4addcee13efe.1523778026.git.osandov@fb.com
Fixes: d92a8cfcb37e ("locking/lockdep: Rework FS_RECLAIM annotation")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
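A minimal sketch of the corrected ordering (assuming a direct-reclaim-style
path; the surrounding code is hypothetical, the ordering is the point of
this fix):

	unsigned int noreclaim_flag;

	fs_reclaim_acquire(gfp_mask);	/* annotate before PF_MEMALLOC is set */
	noreclaim_flag = memalloc_noreclaim_save();
	/* ... do reclaim work ... */
	memalloc_noreclaim_restore(noreclaim_flag);
	fs_reclaim_release(gfp_mask);	/* release after PF_MEMALLOC is cleared */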
#ifdef CONFIG_LOCKDEP
extern void __fs_reclaim_acquire(void);
extern void __fs_reclaim_release(void);
extern void fs_reclaim_acquire(gfp_t gfp_mask);
extern void fs_reclaim_release(gfp_t gfp_mask);
#else
static inline void __fs_reclaim_acquire(void) { }
static inline void __fs_reclaim_release(void) { }
static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
static inline void fs_reclaim_release(gfp_t gfp_mask) { }
#endif

/**
 * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
 *
 * This function marks the beginning of the GFP_NOIO allocation scope.
 * All further allocations will implicitly drop __GFP_IO flag and so
 * they are safe for the IO critical section from the allocation recursion
 * point of view. Use memalloc_noio_restore to end the scope with flags
 * returned by this function.
 *
 * This function is safe to be used from any context.
 */
static inline unsigned int memalloc_noio_save(void)
{
	unsigned int flags = current->flags & PF_MEMALLOC_NOIO;
	current->flags |= PF_MEMALLOC_NOIO;
	return flags;
}

/**
 * memalloc_noio_restore - Ends the implicit GFP_NOIO scope.
 * @flags: Flags to restore.
 *
 * Ends the implicit GFP_NOIO scope started by memalloc_noio_save function.
 * Always make sure that the given flags are the return value from the
 * pairing memalloc_noio_save call.
 */
static inline void memalloc_noio_restore(unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
}
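
/*
 * Illustrative sketch, not part of this header: a section that must not
 * recurse into the IO layer (say, a block device resume path; the function
 * is hypothetical) marks the whole scope instead of converting each
 * allocation to GFP_NOIO.
 */
static inline void example_noio_section(void)
{
	unsigned int noio_flags = memalloc_noio_save();

	/* allocations here implicitly lose __GFP_IO (and __GFP_FS) */

	memalloc_noio_restore(noio_flags);
}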

/**
 * memalloc_nofs_save - Marks implicit GFP_NOFS allocation scope.
 *
 * This function marks the beginning of the GFP_NOFS allocation scope.
 * All further allocations will implicitly drop __GFP_FS flag and so
 * they are safe for the FS critical section from the allocation recursion
 * point of view. Use memalloc_nofs_restore to end the scope with flags
 * returned by this function.
 *
 * This function is safe to be used from any context.
 */
static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
	current->flags |= PF_MEMALLOC_NOFS;
	return flags;
}

/**
 * memalloc_nofs_restore - Ends the implicit GFP_NOFS scope.
 * @flags: Flags to restore.
 *
 * Ends the implicit GFP_NOFS scope started by memalloc_nofs_save function.
 * Always make sure that the given flags are the return value from the
 * pairing memalloc_nofs_save call.
 */
static inline void memalloc_nofs_restore(unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOFS) | flags;
}

static inline unsigned int memalloc_noreclaim_save(void)
{
	unsigned int flags = current->flags & PF_MEMALLOC;
	current->flags |= PF_MEMALLOC;
	return flags;
}

static inline void memalloc_noreclaim_restore(unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
}

#ifdef CONFIG_CMA
static inline unsigned int memalloc_nocma_save(void)
{
	unsigned int flags = current->flags & PF_MEMALLOC_NOCMA;

	current->flags |= PF_MEMALLOC_NOCMA;
	return flags;
}

static inline void memalloc_nocma_restore(unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOCMA) | flags;
}
#else
static inline unsigned int memalloc_nocma_save(void)
{
	return 0;
}

static inline void memalloc_nocma_restore(unsigned int flags)
{
}
#endif
fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8.
The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs. All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.
The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated. However there are cases
where the allocated kernel memory should be charged to a memcg
different from the current process's memcg. This patch series
contains two such concrete use-cases, i.e. fsnotify and buffer_head.
The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no listener or a slow one. The
events are allocated in the context of the event producer. However they should
be charged to the event consumer. Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.
To solve this issue, this patch series introduces mechanism to charge
kernel memory to a given memcg. In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged. For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.
This patch (of 2):
A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no listener or a slow one. This can
cause system-level memory pressure or OOMs. So, it's better to account the
fsnotify kmem caches to the memcg of the listener.
However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer. This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.
There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happens in the context of syscall from the
listener. So, SLAB_ACCOUNT is enough for these caches.
The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account the connector to, since it is shared by all events
attached to the inode.
The allocations from the event caches happen in the context of the event
producer. For such caches we will need to remote charge the allocations
to the listener's memcg. Thus we save the memcg reference in the
fsnotify_group structure of the listener.
This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.
[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-18 06:46:39 +08:00
#ifdef CONFIG_MEMCG
/**
 * memalloc_use_memcg - Starts the remote memcg charging scope.
 * @memcg: memcg to charge.
 *
 * This function marks the beginning of the remote memcg charging scope. All the
 * __GFP_ACCOUNT allocations till the end of the scope will be charged to the
 * given memcg.
 *
 * NOTE: This function is not nesting safe.
 */
static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
{
	WARN_ON_ONCE(current->active_memcg);
	current->active_memcg = memcg;
}

/**
 * memalloc_unuse_memcg - Ends the remote memcg charging scope.
 *
 * This function marks the end of the remote memcg charging scope started by
 * memalloc_use_memcg().
 */
static inline void memalloc_unuse_memcg(void)
{
	current->active_memcg = NULL;
}
#else
static inline void memalloc_use_memcg(struct mem_cgroup *memcg)
{
}

static inline void memalloc_unuse_memcg(void)
{
}
#endif
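
/*
 * Illustrative sketch, not part of this header: remote charging in the
 * fsnotify style described in the log above. Assumes <linux/slab.h>; the
 * function itself is hypothetical.
 */
static inline void *example_alloc_charged(struct mem_cgroup *target_memcg,
					  size_t size)
{
	void *p;

	memalloc_use_memcg(target_memcg);	/* not nesting safe, see above */
	p = kmalloc(size, GFP_KERNEL_ACCOUNT);	/* charged to target_memcg */
	memalloc_unuse_memcg();
	return p;
}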

#ifdef CONFIG_MEMBARRIER
enum {
	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY = (1U << 0),
	MEMBARRIER_STATE_PRIVATE_EXPEDITED = (1U << 1),
	MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY = (1U << 2),
	MEMBARRIER_STATE_GLOBAL_EXPEDITED = (1U << 3),
	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY = (1U << 4),
	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE = (1U << 5),
};

enum {
	MEMBARRIER_FLAG_SYNC_CORE = (1U << 0),
};

#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
#include <asm/membarrier.h>
#endif

static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
	if (current->mm != mm)
		return;
	if (likely(!(atomic_read(&mm->membarrier_state) &
		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
		return;
	sync_core_before_usermode();
}

extern void membarrier_exec_mmap(struct mm_struct *mm);

#else
#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
					     struct mm_struct *next,
					     struct task_struct *tsk)
{
}
#endif
static inline void membarrier_exec_mmap(struct mm_struct *mm)
{
}
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
}
#endif

#endif /* _LINUX_SCHED_MM_H */