2014-12-17 01:58:19 +08:00
/*
 * core.c - Kernel Live Patching Core
 *
 * Copyright (C) 2014 Seth Jennings <sjenning@redhat.com>
 * Copyright (C) 2014 SUSE
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, see <http://www.gnu.org/licenses/>.
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/kallsyms.h>
#include <linux/livepatch.h>
#include <linux/elf.h>
#include <linux/moduleloader.h>
#include <linux/completion.h>
#include <asm/cacheflush.h>
#include "core.h"
#include "patch.h"
livepatch: change to a per-task consistency model

Change livepatch to use a basic per-task consistency model. This is the
foundation which will eventually enable us to patch those ~10% of
security patches which change function or data semantics. This is the
biggest remaining piece needed to make livepatch more generally useful.

This code stems from the design proposal made by Vojtech [1] in November
2014. It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
consistency and syscall barrier switching combined with kpatch's stack
trace switching. There are also a number of fallback options which make
it quite flexible.

Patches are applied on a per-task basis, when the task is deemed safe to
switch over. When a patch is enabled, livepatch enters into a
transition state where tasks are converging to the patched state.
Usually this transition state can complete in a few seconds. The same
sequence occurs when a patch is disabled, except the tasks converge from
the patched state to the unpatched state.

An interrupt handler inherits the patched state of the task it
interrupts. The same is true for forked tasks: the child inherits the
patched state of the parent.

Livepatch uses several complementary approaches to determine when it's
safe to patch tasks:

1. The first and most effective approach is stack checking of sleeping
   tasks. If no affected functions are on the stack of a given task,
   the task is patched. In most cases this will patch most or all of
   the tasks on the first try. Otherwise it'll keep trying
   periodically. This option is only available if the architecture has
   reliable stacks (HAVE_RELIABLE_STACKTRACE).

2. The second approach, if needed, is kernel exit switching. A
   task is switched when it returns to user space from a system call, a
   user space IRQ, or a signal. It's useful in the following cases:

   a) Patching I/O-bound user tasks which are sleeping on an affected
      function. In this case you have to send SIGSTOP and SIGCONT to
      force it to exit the kernel and be patched.

   b) Patching CPU-bound user tasks. If the task is highly CPU-bound
      then it will get patched the next time it gets interrupted by an
      IRQ.

   c) In the future it could be useful for applying patches for
      architectures which don't yet have HAVE_RELIABLE_STACKTRACE. In
      this case you would have to signal most of the tasks on the
      system. However this isn't supported yet because there's
      currently no way to patch kthreads without
      HAVE_RELIABLE_STACKTRACE.

3. For idle "swapper" tasks, since they don't ever exit the kernel, they
   instead have a klp_update_patch_state() call in the idle loop which
   allows them to be patched before the CPU enters the idle state.

   (Note there's not yet such an approach for kthreads.)

All the above approaches may be skipped by setting the 'immediate' flag
in the 'klp_patch' struct, which will disable per-task consistency and
patch all tasks immediately. This can be useful if the patch doesn't
change any function or data semantics. Note that, even with this flag
set, it's possible that some tasks may still be running with an old
version of the function, until that function returns.

There's also an 'immediate' flag in the 'klp_func' struct which allows
you to specify that certain functions in the patch can be applied
without per-task consistency. This might be useful if you want to patch
a common function like schedule(), and the function change doesn't need
consistency but the rest of the patch does.

For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
must set patch->immediate which causes all tasks to be patched
immediately. This option should be used with care, only when the patch
doesn't change any function or data semantics.

In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
may be allowed to use per-task consistency if we can come up with
another way to patch kthreads.

The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
is in transition. Only a single patch (the topmost patch on the stack)
can be in transition at a given time. A patch can remain in transition
indefinitely, if any of the tasks are stuck in the initial patch state.

A transition can be reversed and effectively canceled by writing the
opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
the transition is in progress. Then all the tasks will attempt to
converge back to the original patch state.

[1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Ingo Molnar <mingo@kernel.org> # for the scheduler changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-02-14 09:42:40 +08:00
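
The stack-checking approach described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `model_task`, `model_try_switch()` and `model_demo()` are hypothetical names invented for illustration, not kernel APIs, and the "stack" is just an array of function names rather than a real reliable stack trace. The model switches a task only when the patched function is absent from its stack, and leaves it pending for a later retry otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum model_state { MODEL_UNPATCHED, MODEL_PATCHED };

struct model_task {
	enum model_state state;
	const char *stack[4];	/* names of functions on the task's stack */
	size_t depth;		/* number of valid entries in stack[] */
};

/* Return true if the function being patched is on the task's stack. */
static bool stack_has_func(const struct model_task *t, const char *func)
{
	for (size_t i = 0; i < t->depth; i++)
		if (strcmp(t->stack[i], func) == 0)
			return true;
	return false;
}

/*
 * One pass over all tasks: switch each task whose stack does not contain
 * the patched function, and return how many tasks are still pending.
 */
size_t model_try_switch(struct model_task *tasks, size_t ntasks,
			const char *func, enum model_state target)
{
	size_t pending = 0;

	for (size_t i = 0; i < ntasks; i++) {
		if (tasks[i].state == target)
			continue;
		if (stack_has_func(&tasks[i], func))
			pending++;	/* unsafe now, retry later */
		else
			tasks[i].state = target;
	}
	return pending;
}

/*
 * Two tasks: one with a clear stack, one sleeping inside do_work(),
 * the (made-up) function being patched. Returns the pending count after
 * an optional retry once the sleeper has returned from do_work().
 */
size_t model_demo(bool sleeper_returns)
{
	struct model_task tasks[2] = {
		{ MODEL_UNPATCHED, { "schedule" }, 1 },
		{ MODEL_UNPATCHED, { "schedule", "do_work" }, 2 },
	};
	size_t pending = model_try_switch(tasks, 2, "do_work", MODEL_PATCHED);

	assert(pending == 1);	/* only the sleeper is left behind */
	if (sleeper_returns) {
		tasks[1].depth = 1;	/* do_work() is off the stack now */
		pending = model_try_switch(tasks, 2, "do_work", MODEL_PATCHED);
	}
	return pending;
}
```

Calling `model_demo(false)` leaves one task pending, which corresponds to a transition that has not yet completed; `model_demo(true)` models the periodic retry succeeding after the sleeping task leaves the affected function.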
#include "transition.h"

/*
 * klp_mutex is a coarse lock which serializes access to klp data. All
 * accesses to klp-related variables and structures must have mutex protection,
 * except within the following functions which carefully avoid the need for it:
 *
 * - klp_ftrace_handler()
 * - klp_update_patch_state()
 */
DEFINE_MUTEX(klp_mutex);

static LIST_HEAD(klp_patches);

static struct kobject *klp_root_kobj;

static bool klp_is_module(struct klp_object *obj)
{
	return obj->name;
}

/* sets obj->mod if object is not vmlinux and module is found */
static void klp_find_object_module(struct klp_object *obj)
{
livepatch: Fix subtle race with coming and going modules

There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:

  1. The notifier is called sometime in STATE_MODULE_COMING. The module
     is visible by find_module() in this state all the time. It means that
     new patch can be registered and enabled even before the notifier is
     called. It might create wrong order of stacked patches, see below
     for an example.

  2. New patch could still see the module in the GOING state even after
     the notifier has been called. It will try to initialize the related
     object structures but the module could disappear at any time. There
     will stay mess in the structures. It might even cause an invalid
     memory access.

This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.

Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.

Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.

Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.

Alternative solutions:
======================

+ reject new patches when a patched module is coming or going; this is ugly

+ wait with adding new patch until the module leaves the COMING and GOING
  states; this might be dangerous and complicated; we would need to release
  kgr_lock in the middle of the patch registration to avoid a deadlock
  with the coming and going handlers; also we might need a waitqueue for
  each module which seems to be even bigger overhead than the boolean

+ stop modules from entering COMING and GOING states; wait until modules
  leave these states when they are already there; looks complicated; we would
  need to ignore the module that asked to stop the others to avoid a deadlock;
  also it is unclear what to do when two modules asked to stop others and
  both are in COMING state (situation when two new patches are applied)

+ always register/enable new patches and fix up the potential mess (registered
  patches order) in klp_module_init(); this is nasty and prone to regressions
  in the future development

+ add another MODULE_STATE where the kallsyms are visible but the module is not
  used yet; this looks too complex; the module states are checked on "many"
  locations

Example of patch stacking breakage:
===================================

The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmcore and b() is from a module M. Something like:

        a()     b()
P1      a1()    b1()
P2      a2()    b2()
P3      a3()    b3(3)

If you load the module M after all patches are registered and enabled.
The ftrace ops for function a() and b() has listed the functions in this
order:

  ops_a->func_stack -> list(a3,a2,a1)
  ops_b->func_stack -> list(b3,b2,b1)

, so the pointer to b3() is the first and will be used.

Then you might have the following scenario. Let's start with state when patches
P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
ops for b() does not exist. Then we get into the following race:

CPU0                                    CPU1

load_module(M)

  complete_formation()

  mod->state = MODULE_STATE_COMING;
  mutex_unlock(&module_mutex);

                                        klp_register_patch(P3);
                                        klp_enable_patch(P3);

                                        # STATE 1

  klp_module_notify(M)
    klp_module_notify_coming(P1);
    klp_module_notify_coming(P2);
    klp_module_notify_coming(P3);

                                        # STATE 2

The ftrace ops for a() and b() then looks:

  STATE1:

  ops_a->func_stack -> list(a3,a2,a1);
  ops_b->func_stack -> list(b3);

  STATE2:

  ops_a->func_stack -> list(a3,a2,a1);
  ops_b->func_stack -> list(b2,b1,b3);

therefore, b2() is used for the module but a3() is used for vmcore
because they were the last added.

Example of the race with going modules:
=======================================

CPU0                                    CPU1

delete_module()  #SYSCALL

  try_stop_module()
    mod->state = MODULE_STATE_GOING;

  mutex_unlock(&module_mutex);

                                        klp_register_patch()
                                        klp_enable_patch()

                                        #save place to switch universe

                                        b()     # from module that is going
                                          a()   # from core (patched)

  mod->exit();

Note that the function b() can be called until we call mod->exit().

If we do not apply patch against b() because it is in MODULE_STATE_GOING,
it will call patched a() with modified semantic and things might get wrong.

[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-12 19:55:13 +08:00
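
The patch stacking breakage from the commit message above can be reproduced with a tiny user-space model. This is a sketch only: `func_stack`, `stack_push()` and the two scenario functions are hypothetical names, and the model assumes that each newly applied function is added at the head of the list, which is what the `list(...)` orderings in the commit message suggest. It shows that the function at the head depends purely on the order in which the patches reach the module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 8

/* Hypothetical stand-in for an ops func_stack; index 0 is the head. */
struct func_stack {
	const char *funcs[MAX_FUNCS];
	size_t count;
};

/* Add a replacement function at the head of the stack. */
void stack_push(struct func_stack *s, const char *func)
{
	assert(s->count < MAX_FUNCS);
	/* Shift existing entries down by one slot to free index 0. */
	memmove(&s->funcs[1], &s->funcs[0], s->count * sizeof(s->funcs[0]));
	s->funcs[0] = func;
	s->count++;
}

/* The head entry is the function that would actually be called. */
const char *stack_top(const struct func_stack *s)
{
	return s->count ? s->funcs[0] : NULL;
}

/* Module M loads after P1, P2, P3 are enabled: applied in patch order. */
const char *top_in_order(void)
{
	struct func_stack ops_b = { { NULL }, 0 };

	stack_push(&ops_b, "b1");
	stack_push(&ops_b, "b2");
	stack_push(&ops_b, "b3");
	return stack_top(&ops_b);	/* list(b3,b2,b1) */
}

/*
 * Racy order: P3 is enabled while M is still COMING (STATE 1), then the
 * notifier applies P1 and P2 and skips the already initialized P3 (STATE 2).
 */
const char *top_racy(void)
{
	struct func_stack ops_b = { { NULL }, 0 };

	stack_push(&ops_b, "b3");	/* STATE 1: list(b3) */
	stack_push(&ops_b, "b1");
	stack_push(&ops_b, "b2");	/* STATE 2: list(b2,b1,b3) */
	return stack_top(&ops_b);
}
```

In the in-order scenario the head is `b3`, matching `a3` for the core function; in the racy scenario the head is `b2`, so the module would run code from P2 while the core runs code from P3.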
	struct module *mod;

	if (!klp_is_module(obj))
		return;

	mutex_lock(&module_mutex);
	/*
	 * We do not want to block removal of patched modules and therefore
	 * we do not take a reference here. The patches are removed by
	 * klp_module_going() instead.
livepatch: Fix subtle race with coming and going modules
There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:
1. The notifier is called sometime in STATE_MODULE_COMING. The module
is visible by find_module() in this state all the time. It means that
new patch can be registered and enabled even before the notifier is
called. It might create wrong order of stacked patches, see below
for an example.
2. New patch could still see the module in the GOING state even after
the notifier has been called. It will try to initialize the related
object structures but the module could disappear at any time. There
will stay mess in the structures. It might even cause an invalid
memory access.
This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.
Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.
Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.
Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.
Alternative solutions:
======================
+ reject new patches when a patched module is coming or going; this is ugly
+ wait with adding new patch until the module leaves the COMING and GOING
states; this might be dangerous and complicated; we would need to release
kgr_lock in the middle of the patch registration to avoid a deadlock
with the coming and going handlers; also we might need a waitqueue for
each module which seems to be even bigger overhead than the boolean
+ stop modules from entering COMING and GOING states; wait until modules
leave these states when they are already there; looks complicated; we would
need to ignore the module that asked to stop the others to avoid a deadlock;
also it is unclear what to do when two modules asked to stop others and
both are in COMING state (situation when two new patches are applied)
+ always register/enable new patches and fix up the potential mess (registered
patches order) in klp_module_init(); this is nasty and prone to regressions
in the future development
+ add another MODULE_STATE where the kallsyms are visible but the module is not
used yet; this looks too complex; the module states are checked on "many"
locations
Example of patch stacking breakage:
===================================

The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmlinux and b() is from a module M. Something like:

	a()	b()
P1	a1()	b1()
P2	a2()	b2()
P3	a3()	b3()

If you load the module M after all patches are registered and enabled, the
ftrace ops for functions a() and b() list the functions in this order:

	ops_a->func_stack -> list(a3,a2,a1)
	ops_b->func_stack -> list(b3,b2,b1)

so the pointer to b3() is the first and will be used.

Then you might have the following scenario. Let's start with the state when
patches P1 and P2 are registered and enabled but the module M is not loaded.
Then the ftrace ops for b() do not exist. Then we get into the following race:

CPU0					CPU1

load_module(M)

  complete_formation()

  mod->state = MODULE_STATE_COMING;
  mutex_unlock(&module_mutex);

					klp_register_patch(P3);
					klp_enable_patch(P3);

					# STATE 1

  klp_module_notify(M)
    klp_module_notify_coming(P1);
    klp_module_notify_coming(P2);
    klp_module_notify_coming(P3);

					# STATE 2

The ftrace ops for a() and b() then look like this:

STATE1:

	ops_a->func_stack -> list(a3,a2,a1);
	ops_b->func_stack -> list(b3);

STATE2:

	ops_a->func_stack -> list(a3,a2,a1);
	ops_b->func_stack -> list(b2,b1,b3);

Therefore, b2() is used for the module but a3() is used for vmlinux
because they were the last added.
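
The ordering problem can be replayed with a small userspace model. This is
only a sketch: struct func_stack, stack_push(), stack_top() and
racy_top_of_b() are hypothetical names, and the model assumes nothing more
than "each newly applied function is pushed to the front, and the front entry
is the one that gets called".

```c
#include <assert.h>
#include <string.h>

#define MAX_FUNCS 8

/* Hypothetical model of ops->func_stack: index 0 is the top (active) entry. */
struct func_stack {
	const char *funcs[MAX_FUNCS];
	int len;
};

/* Push a replacement function to the front, as enabling a patch would. */
static void stack_push(struct func_stack *s, const char *func)
{
	assert(s->len < MAX_FUNCS);
	memmove(&s->funcs[1], &s->funcs[0], s->len * sizeof(s->funcs[0]));
	s->funcs[0] = func;
	s->len++;
}

/* The function that would actually run, or NULL if nothing is patched. */
static const char *stack_top(const struct func_stack *s)
{
	return s->len ? s->funcs[0] : NULL;
}

/* Replay the race: P3 patches b() first, then the notifier adds P1 and P2. */
static const char *racy_top_of_b(void)
{
	struct func_stack ops_b = { .len = 0 };

	stack_push(&ops_b, "b3");	/* klp_enable_patch(P3), STATE 1 */
	stack_push(&ops_b, "b1");	/* notifier handles P1 */
	stack_push(&ops_b, "b2");	/* notifier handles P2 */
	/* P3's object is already initialized, so the notifier skips it. */

	return stack_top(&ops_b);	/* stack is now list(b2,b1,b3) */
}
```

In this model the top of the stack ends up being b2 rather than b3, which is
the STATE 2 mismatch described above.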
Example of the race with going modules:
=======================================

CPU0					CPU1

delete_module()  #SYSCALL

   try_stop_module()
     mod->state = MODULE_STATE_GOING;

   mutex_unlock(&module_mutex);

					klp_register_patch()
					klp_enable_patch()

					#save place to switch universe

					b()     # from module that is going
					  a()   # from core (patched)

   mod->exit();

Note that the function b() can be called until we call mod->exit().

If we do not apply the patch against b() because it is in MODULE_STATE_GOING,
it will call the patched a() with modified semantics and things might go wrong.
[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-12 19:55:13 +08:00
	 */
	mod = find_module(obj->name);
	/*
2016-03-17 08:55:39 +08:00
	 * Do not mess work of klp_module_coming() and klp_module_going().
	 * Note that the patch might still be needed before klp_module_going()
2015-03-12 19:55:13 +08:00
	 * is called. Module functions can be called even in the GOING state
	 * until mod->exit() finishes. This is especially important for
	 * patches that modify semantic of the functions.
2014-12-17 01:58:19 +08:00
	 */
2015-03-12 19:55:13 +08:00
	if (mod && mod->klp_alive)
		obj->mod = mod;

2014-12-17 01:58:19 +08:00
	mutex_unlock(&module_mutex);
}
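
The effect of the klp_alive check can be illustrated outside the kernel. The
following is a minimal sketch, not kernel code: struct fake_module,
fake_find_module() and patchable_module() are hypothetical stand-ins, and the
flag is simply assumed to be true between the coming and the going handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum fake_state { FAKE_COMING, FAKE_LIVE, FAKE_GOING };

/* Hypothetical stand-in for struct module; only the fields used here. */
struct fake_module {
	const char *name;
	enum fake_state state;
	bool klp_alive;	/* set by the coming handler, cleared by the going handler */
};

/* Stand-in for find_module(): linear search by name. */
static struct fake_module *fake_find_module(struct fake_module *mods, size_t n,
					    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(mods[i].name, name))
			return &mods[i];
	return NULL;
}

/* Mirrors "if (mod && mod->klp_alive) obj->mod = mod;". */
static struct fake_module *patchable_module(struct fake_module *mods, size_t n,
					    const char *name)
{
	struct fake_module *mod = fake_find_module(mods, n, name);

	return (mod && mod->klp_alive) ? mod : NULL;
}
```

Note that the decision depends only on the flag, not on the module state: a
module in the GOING state is still patched as long as its going handler has
not run yet.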

static bool klp_is_patch_registered(struct klp_patch *patch)
{
	struct klp_patch *mypatch;

	list_for_each_entry(mypatch, &klp_patches, list)
		if (mypatch == patch)
			return true;

	return false;
}

static bool klp_initialized(void)
{
2015-05-11 13:52:29 +08:00
	return !!klp_root_kobj;
2014-12-17 01:58:19 +08:00
}

struct klp_find_arg {
	const char *objname;
	const char *name;
	unsigned long addr;
	unsigned long count;
2015-12-02 10:40:54 +08:00
	unsigned long pos;
2014-12-17 01:58:19 +08:00
};

static int klp_find_callback(void *data, const char *name,
			     struct module *mod, unsigned long addr)
{
	struct klp_find_arg *args = data;

	if ((mod && !args->objname) || (!mod && args->objname))
		return 0;

	if (strcmp(args->name, name))
		return 0;

	if (args->objname && strcmp(args->objname, mod->name))
		return 0;

	args->addr = addr;
	args->count++;

2015-12-02 10:40:54 +08:00
	/*
	 * Finish the search when the symbol is found for the desired position
	 * or the position is not defined for a non-unique symbol.
	 */
	if ((args->pos && (args->count == args->pos)) ||
	    (!args->pos && (args->count > 1)))
		return 1;

2014-12-17 01:58:19 +08:00
	return 0;
}

static int klp_find_object_symbol(const char *objname, const char *name,
2015-12-02 10:40:54 +08:00
				  unsigned long sympos, unsigned long *addr)
2014-12-17 01:58:19 +08:00
{
	struct klp_find_arg args = {
		.objname = objname,
		.name = name,
		.addr = 0,
2015-12-02 10:40:54 +08:00
		.count = 0,
		.pos = sympos,
2014-12-17 01:58:19 +08:00
	};

2015-06-01 23:48:37 +08:00
	mutex_lock(&module_mutex);
2017-03-28 21:10:35 +08:00
	if (objname)
		module_kallsyms_on_each_symbol(klp_find_callback, &args);
	else
		kallsyms_on_each_symbol(klp_find_callback, &args);
2015-06-01 23:48:37 +08:00
	mutex_unlock(&module_mutex);
2014-12-17 01:58:19 +08:00

2015-12-02 10:40:54 +08:00
	/*
	 * Ensure an address was found. If sympos is 0, ensure symbol is unique;
	 * otherwise ensure the symbol position count matches sympos.
	 */
	if (args.addr == 0)
2014-12-17 01:58:19 +08:00
		pr_err("symbol '%s' not found in symbol table\n", name);
2015-12-02 10:40:54 +08:00
	else if (args.count > 1 && sympos == 0) {
2016-03-09 22:20:59 +08:00
		pr_err("unresolvable ambiguity for symbol '%s' in object '%s'\n",
		       name, objname);
2015-12-02 10:40:54 +08:00
	} else if (sympos != args.count && sympos > 0) {
		pr_err("symbol position %lu for symbol '%s' in object '%s' not found\n",
		       sympos, name, objname ? objname : "vmlinux");
	} else {
2014-12-17 01:58:19 +08:00
		*addr = args.addr;
		return 0;
	}

	*addr = 0;
	return -EINVAL;
}
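
The sympos rules implemented by klp_find_callback() and
klp_find_object_symbol() can be tried out in userspace. The sketch below is
an illustration only: struct fake_sym and find_symbol() are hypothetical, a
flat array stands in for kallsyms, and object (module) matching is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat symbol table standing in for kallsyms. */
struct fake_sym {
	const char *name;
	unsigned long addr;
};

/*
 * Model of the lookup rules: sympos == 0 demands a unique symbol,
 * sympos == N picks the N-th occurrence in table order.
 */
static int find_symbol(const struct fake_sym *tab, size_t n, const char *name,
		       unsigned long sympos, unsigned long *addr)
{
	unsigned long found = 0, count = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, name))
			continue;
		found = tab[i].addr;
		count++;
		/* Same early exit condition as klp_find_callback(). */
		if ((sympos && count == sympos) || (!sympos && count > 1))
			break;
	}

	/* Not found, ambiguous without sympos, or position out of range. */
	if (found == 0 || (sympos == 0 && count > 1) ||
	    (sympos > 0 && count != sympos)) {
		*addr = 0;
		return -EINVAL;
	}

	*addr = found;
	return 0;
}
```

With a table that contains "foo" twice, a lookup with sympos 0 fails as
ambiguous, sympos 2 returns the second address, and sympos 3 fails because
only two occurrences exist.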

2016-03-23 08:03:18 +08:00
static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
2014-12-17 01:58:19 +08:00
{
2016-03-23 08:03:18 +08:00
	int i, cnt, vmlinux, ret;
	char objname[MODULE_NAME_LEN];
	char symname[KSYM_NAME_LEN];
	char *strtab = pmod->core_kallsyms.strtab;
	Elf_Rela *relas;
	Elf_Sym *sym;
	unsigned long sympos, addr;
2014-12-17 01:58:19 +08:00

2015-12-02 10:40:54 +08:00
	/*
2016-03-23 08:03:18 +08:00
	 * Since the field widths for objname and symname in the sscanf()
	 * call are hard-coded and correspond to MODULE_NAME_LEN and
	 * KSYM_NAME_LEN respectively, we must make sure that MODULE_NAME_LEN
	 * and KSYM_NAME_LEN have the values we expect them to have.
	 *
	 * Because the value of MODULE_NAME_LEN can differ among architectures,
	 * we use the smallest/strictest upper bound possible (56, based on
	 * the current definition of MODULE_NAME_LEN) to prevent overflows.
2015-12-02 10:40:54 +08:00
	 */
2016-03-23 08:03:18 +08:00
	BUILD_BUG_ON(MODULE_NAME_LEN < 56 || KSYM_NAME_LEN != 128);

	relas = (Elf_Rela *) relasec->sh_addr;
	/* For each rela in this klp relocation section */
	for (i = 0; i < relasec->sh_size / sizeof(Elf_Rela); i++) {
		sym = pmod->core_kallsyms.symtab + ELF_R_SYM(relas[i].r_info);
		if (sym->st_shndx != SHN_LIVEPATCH) {
2017-04-14 06:59:15 +08:00
			pr_err("symbol %s is not marked as a livepatch symbol\n",
2016-03-23 08:03:18 +08:00
			       strtab + sym->st_name);
			return -EINVAL;
		}

		/* Format: .klp.sym.objname.symname,sympos */
		cnt = sscanf(strtab + sym->st_name,
			     ".klp.sym.%55[^.].%127[^,],%lu",
			     objname, symname, &sympos);
		if (cnt != 3) {
2017-04-14 06:59:15 +08:00
			pr_err("symbol %s has an incorrectly formatted name\n",
2016-03-23 08:03:18 +08:00
			       strtab + sym->st_name);
			return -EINVAL;
		}

		/* klp_find_object_symbol() treats a NULL objname as vmlinux */
		vmlinux = !strcmp(objname, "vmlinux");
		ret = klp_find_object_symbol(vmlinux ? NULL : objname,
					     symname, sympos, &addr);
		if (ret)
			return ret;

		sym->st_value = addr;
	}

	return 0;
2014-12-17 01:58:19 +08:00
}
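
The sscanf() format used above can be exercised in userspace with the C
library's sscanf(). A minimal sketch: parse_klp_sym() is a hypothetical
wrapper, and the symbol names in the usage note are made-up examples rather
than names taken from a real patch module.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse ".klp.sym.objname.symname,sympos" with the same format string
 * as klp_resolve_symbols(). The buffers must hold 56 and 128 bytes.
 * Returns 0 on success, -1 if the name is malformed.
 */
static int parse_klp_sym(const char *s, char objname[56], char symname[128],
			 unsigned long *sympos)
{
	int cnt = sscanf(s, ".klp.sym.%55[^.].%127[^,],%lu",
			 objname, symname, sympos);

	return cnt == 3 ? 0 : -1;
}
```

For ".klp.sym.vmlinux.printk,0" this yields objname "vmlinux", symname
"printk" and sympos 0. Because the symbol name is scanned up to the comma, a
name that itself contains dots (for example a compiler-generated ".isra.2"
suffix) is kept whole, while the object name stops at the first dot.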

static int klp_write_object_relocations(struct module *pmod,
					struct klp_object *obj)
{
2016-03-23 08:03:18 +08:00
	int i, cnt, ret = 0;
	const char *objname, *secname;
	char sec_objname[MODULE_NAME_LEN];
	Elf_Shdr *sec;
2014-12-17 01:58:19 +08:00

	if (WARN_ON(!klp_is_object_loaded(obj)))
		return -EINVAL;

2016-03-23 08:03:18 +08:00
	objname = klp_is_module(obj) ? obj->name : "vmlinux";
2014-12-17 01:58:19 +08:00

2016-03-23 08:03:18 +08:00
	/* For each klp relocation section */
	for (i = 1; i < pmod->klp_info->hdr.e_shnum; i++) {
		sec = pmod->klp_info->sechdrs + i;
		secname = pmod->klp_info->secstrings + sec->sh_name;
		if (!(sec->sh_flags & SHF_RELA_LIVEPATCH))
			continue;
2015-12-04 06:33:26 +08:00

2016-03-23 08:03:18 +08:00
		/*
		 * Format: .klp.rela.sec_objname.section_name
		 * See comment in klp_resolve_symbols() for an explanation
		 * of the selected field width value.
		 */
		cnt = sscanf(secname, ".klp.rela.%55[^.]", sec_objname);
		if (cnt != 1) {
2017-04-14 06:59:15 +08:00
			pr_err("section %s has an incorrectly formatted name\n",
2016-03-23 08:03:18 +08:00
			       secname);
			ret = -EINVAL;
			break;
		}
2015-12-04 06:33:26 +08:00

2016-03-23 08:03:18 +08:00
		if (strcmp(objname, sec_objname))
			continue;
2015-12-04 06:33:26 +08:00

2016-03-23 08:03:18 +08:00
		ret = klp_resolve_symbols(sec, pmod);
2015-12-02 10:40:55 +08:00
		if (ret)
2016-03-23 08:03:18 +08:00
			break;
2015-12-02 10:40:55 +08:00

2016-03-23 08:03:18 +08:00
		ret = apply_relocate_add(pmod->klp_info->sechdrs,
					 pmod->core_kallsyms.strtab,
					 pmod->klp_info->symndx, i, pmod);
		if (ret)
			break;
2014-12-17 01:58:19 +08:00
	}

2015-12-04 06:33:26 +08:00
	return ret;
2014-12-17 01:58:19 +08:00
}
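
The per-object filtering of relocation sections can be sketched in userspace
as well. klp_rela_matches() is a hypothetical helper that combines the
sscanf() and strcmp() steps from the loop above; the section names in the
usage note are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the section belongs to objname, 0 if it belongs to another
 * object, -1 if the name is not of the form
 * .klp.rela.sec_objname.section_name.
 */
static int klp_rela_matches(const char *secname, const char *objname)
{
	char sec_objname[56];

	if (sscanf(secname, ".klp.rela.%55[^.]", sec_objname) != 1)
		return -1;

	return !strcmp(objname, sec_objname);
}
```

For example, ".klp.rela.vmlinux.text.cmdline_proc_show" matches the object
"vmlinux", a section named for another object such as "btrfs" is skipped, and
an ordinary ".rela.text" name is reported as malformed.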

/*
 * Sysfs Interface
 *
 * /sys/kernel/livepatch
 * /sys/kernel/livepatch/<patch>
 * /sys/kernel/livepatch/<patch>/enabled
2017-02-14 09:42:40 +08:00
 * /sys/kernel/livepatch/<patch>/transition
livepatch: send a fake signal to all blocking tasks
The live patching consistency model is a combination of LEAVE_PATCHED_SET and
SWITCH_THREAD. This means that all tasks in the system have to be marked
one by one as safe to call a new patched function. A task is safe when it
is not (sleeping) in the set of patched functions, that is, when no
patched function is on the task's stack. Another clearly safe place is
the boundary between kernel and userspace. The patching waits for all
tasks to get outside of the patched set or to cross the boundary. The
transition is completed afterwards.
The problem is that a task can block the transition for quite a long
time, if not forever. It could sleep in a set of patched functions, for
example. Luckily we can force the task to leave the set by sending it a
fake signal, that is a signal with no data in signal pending structures
(no handler, no sign of proper signal delivered). Suspend/freezer use
this to freeze the tasks as well. The task gets TIF_SIGPENDING set and
is woken up (if it has been sleeping in the kernel before) or kicked by
rescheduling IPI (if it was running on other CPU). This causes the task
to go to kernel/userspace boundary where the signal would be handled and
the task would be marked as safe in terms of live patching.
There are tasks which are not affected by this technique though. The
fake signal is not sent to kthreads. They should be handled differently.
They can be woken up so they leave the patched set and their
TIF_PATCH_PENDING can be cleared thanks to stack checking.
For the sake of completeness, if the task is in TASK_RUNNING state but
not currently running on some CPU it doesn't get the IPI, but it would
eventually handle the signal anyway. Second, if the task runs in the
kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not
handled on return from the interrupt. It would be handled on return to
the userspace in the future when the fake signal is sent again. Stack
checking deals with these cases in a better way.
If the task was sleeping in a syscall it would be woken by our fake
signal, it would check if TIF_SIGPENDING is set (by calling
signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with
ERESTART* return values are restarted in case of the fake signal (see
do_signal()). EINTR is propagated back to the userspace program. This
could disturb the program, but...
* each process dealing with signals should react accordingly to EINTR
return values.
* syscalls returning EINTR are quite common in the system even if no
fake signal is sent.
* freezer sends the fake signal and does not deal with EINTR anyhow.
Thus EINTR values are returned when the system is resumed.
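
The first point refers to the usual userspace retry idiom. A minimal sketch,
assuming a blocking read(); read_retry() is a hypothetical helper name and is
not part of this patch:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Retry read() when it is interrupted by a signal before any data arrives. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n == -1 && errno == EINTR);

	return n;
}
```

A program written this way does not notice an EINTR caused by the fake
signal, because the interrupted call is simply issued again.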
The actual safe marking is done in the architectures' "entry" code on
syscall and interrupt/exception exit paths, and in the stack checking
functions of livepatch. TIF_PATCH_PENDING is cleared and the next
recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
call klp_update_patch_state() before do_signal(), so that
recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
immediately and thus prevent a double call of do_signal().
Note that the fake signal is not sent to stopped/traced tasks. Such a task
prevents the patching from finishing until it continues again (is no
longer traced).
Last, sending the fake signal is not automatic. It is done only when
admin requests it by writing 1 to signal sysfs attribute in livepatch
sysfs directory.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-11-15 21:50:13 +08:00
 * /sys/kernel/livepatch/<patch>/signal
2017-11-22 18:29:21 +08:00
 * /sys/kernel/livepatch/<patch>/force
2014-12-17 01:58:19 +08:00
 * /sys/kernel/livepatch/<patch>/<object>
2015-12-02 10:40:56 +08:00
 * /sys/kernel/livepatch/<patch>/<object>/<function,sympos>
2014-12-17 01:58:19 +08:00
 */
2019-01-09 20:43:20 +08:00
static int __klp_disable_patch(struct klp_patch *patch);
static int __klp_enable_patch(struct klp_patch *patch);
2014-12-17 01:58:19 +08:00

static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
			     const char *buf, size_t count)
{
	struct klp_patch *patch;
	int ret;
2017-02-14 09:42:38 +08:00
	bool enabled;
2014-12-17 01:58:19 +08:00

2017-02-14 09:42:38 +08:00
	ret = kstrtobool(buf, &enabled);
2014-12-17 01:58:19 +08:00
	if (ret)
2017-02-14 09:42:38 +08:00
		return ret;
2014-12-17 01:58:19 +08:00

	patch = container_of(kobj, struct klp_patch, kobj);

	mutex_lock(&klp_mutex);

2017-03-07 01:20:29 +08:00
	if (!klp_is_patch_registered(patch)) {
		/*
		 * Module with the patch could either disappear meanwhile or is
		 * not properly initialized yet.
		 */
		ret = -EINVAL;
		goto err;
	}

2017-02-14 09:42:38 +08:00
	if (patch->enabled == enabled) {
2014-12-17 01:58:19 +08:00
		/* already in requested state */
		ret = -EINVAL;
		goto err;
	}

livepatch: change to a per-task consistency model

Change livepatch to use a basic per-task consistency model. This is the
foundation which will eventually enable us to patch those ~10% of
security patches which change function or data semantics. This is the
biggest remaining piece needed to make livepatch more generally useful.

This code stems from the design proposal made by Vojtech [1] in November
2014. It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
consistency and syscall barrier switching combined with kpatch's stack
trace switching. There are also a number of fallback options which make
it quite flexible.

Patches are applied on a per-task basis, when the task is deemed safe to
switch over. When a patch is enabled, livepatch enters into a
transition state where tasks are converging to the patched state.
Usually this transition state can complete in a few seconds. The same
sequence occurs when a patch is disabled, except the tasks converge from
the patched state to the unpatched state.

An interrupt handler inherits the patched state of the task it
interrupts. The same is true for forked tasks: the child inherits the
patched state of the parent.

Livepatch uses several complementary approaches to determine when it's
safe to patch tasks:

1. The first and most effective approach is stack checking of sleeping
   tasks. If no affected functions are on the stack of a given task,
   the task is patched. In most cases this will patch most or all of
   the tasks on the first try. Otherwise it'll keep trying
   periodically. This option is only available if the architecture has
   reliable stacks (HAVE_RELIABLE_STACKTRACE).

2. The second approach, if needed, is kernel exit switching. A
   task is switched when it returns to user space from a system call, a
   user space IRQ, or a signal. It's useful in the following cases:

   a) Patching I/O-bound user tasks which are sleeping on an affected
      function. In this case you have to send SIGSTOP and SIGCONT to
      force it to exit the kernel and be patched.

   b) Patching CPU-bound user tasks. If the task is highly CPU-bound
      then it will get patched the next time it gets interrupted by an
      IRQ.

   c) In the future it could be useful for applying patches for
      architectures which don't yet have HAVE_RELIABLE_STACKTRACE. In
      this case you would have to signal most of the tasks on the
      system. However this isn't supported yet because there's
      currently no way to patch kthreads without
      HAVE_RELIABLE_STACKTRACE.

3. For idle "swapper" tasks, since they don't ever exit the kernel, they
   instead have a klp_update_patch_state() call in the idle loop which
   allows them to be patched before the CPU enters the idle state.

   (Note there's not yet such an approach for kthreads.)

All the above approaches may be skipped by setting the 'immediate' flag
in the 'klp_patch' struct, which will disable per-task consistency and
patch all tasks immediately. This can be useful if the patch doesn't
change any function or data semantics. Note that, even with this flag
set, it's possible that some tasks may still be running with an old
version of the function, until that function returns.

There's also an 'immediate' flag in the 'klp_func' struct which allows
you to specify that certain functions in the patch can be applied
without per-task consistency. This might be useful if you want to patch
a common function like schedule(), and the function change doesn't need
consistency but the rest of the patch does.

For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
must set patch->immediate which causes all tasks to be patched
immediately. This option should be used with care, only when the patch
doesn't change any function or data semantics.

In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
may be allowed to use per-task consistency if we can come up with
another way to patch kthreads.

The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
is in transition. Only a single patch (the topmost patch on the stack)
can be in transition at a given time. A patch can remain in transition
indefinitely, if any of the tasks are stuck in the initial patch state.

A transition can be reversed and effectively canceled by writing the
opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
the transition is in progress. Then all the tasks will attempt to
converge back to the original patch state.

[1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Ingo Molnar <mingo@kernel.org> # for the scheduler changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
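The convergence described in this commit message can be illustrated with a small user-space model. This is only a sketch, not the kernel implementation: `struct model_task`, `model_try_switch()`, and `model_reverse()` are hypothetical names, and the `safe` flag stands in for whatever check (stack checking, kernel exit, idle loop) decides that a task may switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task patch state; not the kernel's representation. */
enum model_state { MODEL_UNPATCHED = 0, MODEL_PATCHED = 1 };

struct model_task {
	enum model_state state;	/* state the task currently runs in */
	bool safe;		/* stand-in for "safe to switch right now" */
};

/*
 * One pass over all tasks: switch every task that is safe and not yet in
 * the target state. Returns the number of tasks still holding up the
 * transition; the transition is complete when this reaches zero.
 */
static size_t model_try_switch(struct model_task *tasks, size_t n,
			       enum model_state target)
{
	size_t pending = 0;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].state == target)
			continue;
		if (tasks[i].safe)
			tasks[i].state = target;
		else
			pending++;
	}
	return pending;
}

/* Reversing a transition flips the target; tasks then converge back. */
static enum model_state model_reverse(enum model_state target)
{
	return target == MODEL_PATCHED ? MODEL_UNPATCHED : MODEL_PATCHED;
}
```

In this model a caller would repeat `model_try_switch()` periodically until it returns 0, which mirrors the "keep trying periodically" behaviour the message describes. A task whose `safe` flag never becomes true keeps the transition pending indefinitely.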
2017-02-14 09:42:40 +08:00
	if (patch == klp_transition_patch) {
		klp_reverse_transition();
	} else if (enabled) {
2014-12-17 01:58:19 +08:00
		ret = __klp_enable_patch(patch);
		if (ret)
			goto err;
	} else {
		ret = __klp_disable_patch(patch);
		if (ret)
			goto err;
	}

	mutex_unlock(&klp_mutex);

	return count;

err:
	mutex_unlock(&klp_mutex);
	return ret;
}
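The order of checks at the end of enabled_store() can be modelled in user space. The sketch below is not kernel code: `struct model_patch`, `enum model_action`, and `model_enabled_store()` are hypothetical names introduced only for illustration. The real function also releases klp_mutex on both exit paths and calls klp_reverse_transition(), __klp_enable_patch(), or __klp_disable_patch() instead of reporting an action.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct klp_patch; only what the checks need. */
struct model_patch {
	bool registered;
	bool enabled;
};

enum model_action {
	MODEL_REJECT,	/* request refused with -EINVAL */
	MODEL_REVERSE,	/* patch is in transition: reverse it */
	MODEL_ENABLE,
	MODEL_DISABLE,
};

/*
 * Same order of checks as the tail of enabled_store(): unregistered
 * patches and no-op requests are rejected first, then a patch that is
 * currently in transition is reversed, otherwise it is enabled or
 * disabled as requested.
 */
static int model_enabled_store(const struct model_patch *patch,
			       const struct model_patch *transition_patch,
			       bool enabled, enum model_action *action)
{
	assert(patch != NULL && action != NULL);

	*action = MODEL_REJECT;
	if (!patch->registered)
		return -EINVAL;
	if (patch->enabled == enabled)
		return -EINVAL;	/* already in requested state */

	if (patch == transition_patch)
		*action = MODEL_REVERSE;
	else if (enabled)
		*action = MODEL_ENABLE;
	else
		*action = MODEL_DISABLE;
	return 0;
}
```

Passing the patch itself as `transition_patch` models a write that arrives while that patch is still in transition. Writing the value the patch already has is rejected before the transition check is reached.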
static ssize_t enabled_show(struct kobject *kobj,
			    struct kobj_attribute *attr, char *buf)
{
	struct klp_patch *patch;

	patch = container_of(kobj, struct klp_patch, kobj);
2017-02-14 09:42:35 +08:00
	return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->enabled);
2014-12-17 01:58:19 +08:00
}
2017-02-14 09:42:40 +08:00
static ssize_t transition_show(struct kobject *kobj,
			       struct kobj_attribute *attr, char *buf)
{
	struct klp_patch *patch;

	patch = container_of(kobj, struct klp_patch, kobj);
	return snprintf(buf, PAGE_SIZE-1, "%d\n",
			patch == klp_transition_patch);
2014-12-17 01:58:19 +08:00
}
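Both enabled_show() and transition_show() emit a single "%d\n" value, so a user-space tool can read either attribute the same way. The following is a minimal sketch under that assumption; `klp_parse_flag()` and `klp_read_flag()` are hypothetical helper names, and the patch name is supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the "0\n" or "1\n" text produced by the show handlers. */
static int klp_parse_flag(const char *text, int *value)
{
	char *end;
	long v;

	assert(text != NULL && value != NULL);

	errno = 0;
	v = strtol(text, &end, 10);
	if (errno || end == text || (*end != '\n' && *end != '\0'))
		return -1;
	if (v != 0 && v != 1)
		return -1;
	*value = (int)v;
	return 0;
}

/* Read /sys/kernel/livepatch/<patch>/<attr>, e.g. attr = "transition". */
static int klp_read_flag(const char *patch, const char *attr, int *value)
{
	char path[256];
	char buf[16];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/kernel/livepatch/%s/%s",
		     patch, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return klp_parse_flag(buf, value);
}
```

A monitoring script could poll `klp_read_flag(name, "transition", &v)` until `v` becomes 0 to detect that a transition has finished.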
livepatch: send a fake signal to all blocking tasks

Live patching uses the LEAVE_PATCHED_SET and SWITCH_THREAD consistency
model. This means that all tasks in the system have to be marked one by
one as safe to call a new patched function. A task is safe when it is
not (sleeping) in the set of patched functions, that is, when no
patched function is on the task's stack. Another clearly safe place is
the boundary between kernel and userspace. Patching waits for all
tasks to get outside of the patched set or to cross the boundary. The
transition is completed afterwards.

The problem is that a task can block the transition for quite a long
time, if not forever. It could sleep in the set of patched functions,
for example. Luckily we can force the task to leave the set by sending
it a fake signal, that is, a signal with no data in the signal pending
structures (no handler, no sign of a proper signal delivered).
Suspend/freezer use this to freeze tasks as well. The task gets
TIF_SIGPENDING set and is woken up (if it has been sleeping in the
kernel) or kicked by a rescheduling IPI (if it was running on another
CPU). This causes the task to go to the kernel/userspace boundary,
where the signal would be handled and the task would be marked as safe
in terms of live patching.

There are tasks which are not affected by this technique though. The
fake signal is not sent to kthreads. They should be handled differently.
They can be woken up so they leave the patched set and their
TIF_PATCH_PENDING can be cleared thanks to stack checking.

For the sake of completeness: first, if the task is in TASK_RUNNING
state but not currently running on some CPU, it doesn't get the IPI,
but it would eventually handle the signal anyway. Second, if the task
runs in the kernel (in TASK_RUNNING state), it gets the IPI, but the
signal is not handled on return from the interrupt. It would be handled
on return to userspace in the future, when the fake signal is sent
again. Stack checking deals with these cases in a better way.

If the task was sleeping in a syscall, it would be woken by our fake
signal, check whether TIF_SIGPENDING is set (by calling the
signal_pending() predicate), and return ERESTART* or EINTR. Syscalls
with ERESTART* return values are restarted in case of the fake signal
(see do_signal()). EINTR is propagated back to the userspace program.
This could disturb the program, but...

* each process dealing with signals should react accordingly to EINTR
  return values.
* syscalls returning EINTR are quite common in the system even if no
  fake signal is sent.
* the freezer sends the fake signal and does not deal with EINTR in any
  way. Thus EINTR values are returned when the system is resumed.

The actual safe marking is done in the architectures' "entry" code on
syscall and interrupt/exception exit paths, and in the stack checking
functions of livepatch. TIF_PATCH_PENDING is cleared and the next
recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
call klp_update_patch_state() before do_signal(), so that
recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
immediately and thus prevent a double call of do_signal().

Note that the fake signal is not sent to stopped/traced tasks. Such a
task prevents the patching from finishing until it continues again (is
no longer traced).

Last, sending the fake signal is not automatic. It is done only when
the admin requests it by writing 1 to the signal sysfs attribute in the
livepatch sysfs directory.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
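The commit message above notes that EINTR may reach user space when the fake signal interrupts a sleeping syscall, and that programs are expected to cope with it. A common user-space idiom for that is to retry the call. The sketch below shows it for read(); `read_retry_eintr()` is a hypothetical helper name, not part of livepatch or libc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Call read() again whenever it fails with EINTR, so that an interrupted
 * sleep (for example one caused by a signal with no handler to run) is
 * invisible to the caller. Any other error is returned unchanged.
 */
static ssize_t read_retry_eintr(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);
	return n;
}
```

Syscalls that the kernel restarts through ERESTART* never show EINTR to the program, so the loop only matters for calls that are not restarted.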
2017-11-15 21:50:13 +08:00
static ssize_t signal_store(struct kobject *kobj, struct kobj_attribute *attr,
			    const char *buf, size_t count)
{
	struct klp_patch *patch;
	int ret;
	bool val;

	ret = kstrtobool(buf, &val);
	if (ret)
		return ret;
2017-12-21 21:40:43 +08:00
	if (!val)
		return count;

	mutex_lock(&klp_mutex);

	patch = container_of(kobj, struct klp_patch, kobj);
	if (patch != klp_transition_patch) {
		mutex_unlock(&klp_mutex);
		return -EINVAL;
	}

	klp_send_signals();

	mutex_unlock(&klp_mutex);
2017-11-15 21:50:13 +08:00
	return count;
}
2017-11-22 18:29:21 +08:00
static ssize_t force_store(struct kobject *kobj, struct kobj_attribute *attr,
			   const char *buf, size_t count)
{
	struct klp_patch *patch;
	int ret;
	bool val;

	ret = kstrtobool(buf, &val);
	if (ret)
		return ret;
2017-12-21 21:40:43 +08:00
	if (!val)
		return count;

	mutex_lock(&klp_mutex);

	patch = container_of(kobj, struct klp_patch, kobj);
	if (patch != klp_transition_patch) {
		mutex_unlock(&klp_mutex);
		return -EINVAL;
	}

	klp_force_transition();

	mutex_unlock(&klp_mutex);
2017-11-22 18:29:21 +08:00
	return count;
}
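signal_store() and force_store() share one shape: parse a boolean, do nothing for a false value, reject the write unless the patch is the one in transition, then trigger the action. The user-space sketch below models that flow. `model_strtobool()` is only a simplified approximation of kstrtobool() (the kernel helper may accept more spellings), and `model_trigger_store()` with its `fired` flag is a hypothetical stand-in for the call to klp_send_signals() or klp_force_transition().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified approximation of kstrtobool(): y/Y/1, n/N/0, on, off. */
static int model_strtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

/*
 * Model of a write-only trigger attribute. Returns count on success, as
 * a sysfs store handler does, or a negative errno. *fired reports
 * whether the action would have run.
 */
static long model_trigger_store(const char *buf, size_t count,
				bool is_transition_patch, bool *fired)
{
	bool val;
	int ret;

	assert(buf != NULL && fired != NULL);

	*fired = false;
	ret = model_strtobool(buf, &val);
	if (ret)
		return ret;
	if (!val)
		return (long)count;	/* a false value is accepted but ignored */
	if (!is_transition_patch)
		return -EINVAL;
	*fired = true;
	return (long)count;
}
```

Note that a false value succeeds even when the patch is not in transition, because the transition check is only reached for a true value.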
2014-12-17 01:58:19 +08:00
static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
2017-02-14 09:42:40 +08:00
static struct kobj_attribute transition_kobj_attr = __ATTR_RO(transition);
2017-11-15 21:50:13 +08:00
static struct kobj_attribute signal_kobj_attr = __ATTR_WO(signal);
2017-11-22 18:29:21 +08:00
static struct kobj_attribute force_kobj_attr = __ATTR_WO(force);
2014-12-17 01:58:19 +08:00
static struct attribute *klp_patch_attrs[] = {
	&enabled_kobj_attr.attr,
2017-02-14 09:42:40 +08:00
|
|
|
&transition_kobj_attr.attr,
|
livepatch: send a fake signal to all blocking tasks
The live patching consistency model is LEAVE_PATCHED_SET combined with
SWITCH_THREAD. This means that all tasks in the system have to be marked
one by one as safe to call a new patched function. A task is safe when it
is not (sleeping) in the set of patched functions, that is, when no
patched function is on the task's stack. Another clearly safe place is
the boundary between kernel and userspace. The patching waits for all
tasks to get outside of the patched set or to cross the boundary. The
transition is completed afterwards.
The problem is that a task can block the transition for quite a long
time, if not forever. It could sleep in a set of patched functions, for
example. Luckily we can force the task to leave the set by sending it a
fake signal, that is, a signal with no data in the signal pending
structures (no handler, no sign of a proper signal being delivered).
Suspend/freezer use this to freeze the tasks as well. The task gets
TIF_SIGPENDING set and is woken up (if it has been sleeping in the kernel
before) or kicked by a rescheduling IPI (if it was running on another
CPU). This causes the task to go to the kernel/userspace boundary where
the signal would be handled and the task would be marked as safe in terms
of live patching.
There are tasks which are not affected by this technique though. The
fake signal is not sent to kthreads. They should be handled differently.
They can be woken up so they leave the patched set and their
TIF_PATCH_PENDING can be cleared thanks to stack checking.
For the sake of completeness, two corner cases deserve a note. First, if
the task is in TASK_RUNNING state but not currently running on some CPU,
it doesn't get the IPI, but it would eventually handle the signal anyway.
Second, if the task runs in the kernel (in TASK_RUNNING state), it gets
the IPI, but the signal is not handled on return from the interrupt. It
would be handled on return to userspace in the future, when the fake
signal is sent again. Stack checking deals with these cases in a better
way.
If the task was sleeping in a syscall, it would be woken by our fake
signal, check whether TIF_SIGPENDING is set (by calling the
signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with
ERESTART* return values are restarted in case of the fake signal (see
do_signal()). EINTR is propagated back to the userspace program. This
could disturb the program, but...
* each process dealing with signals should react appropriately to EINTR
return values.
* syscalls returning EINTR are quite common in the system even if no
fake signal is sent.
* the freezer sends the fake signal and does not deal with EINTR in any
way. Thus EINTR values are returned when the system is resumed.
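As an illustration of reacting to EINTR, here is a minimal userspace
sketch of the usual retry loop around read(). The helper name
read_retry_eintr() is hypothetical; whether a blind retry is appropriate
depends on the program, since some callers want to notice the
interruption instead.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read up to count bytes from fd, restarting the call whenever it is
 * interrupted by a signal before any data was transferred (EINTR).
 * Returns what read() returns for the first non-EINTR outcome.
 */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count)
{
	ssize_t n;

	do {
		n = read(fd, buf, count);
	} while (n < 0 && errno == EINTR);

	return n;
}
```

A program written this way is not disturbed by a fake signal that
interrupts a blocking read(): the call is simply issued again.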
The safe marking itself is done in the architectures' "entry" code on
syscall and interrupt/exception exit paths, and in the stack checking
functions of livepatch. TIF_PATCH_PENDING is cleared and the next
recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
call klp_update_patch_state() before do_signal(), so that
recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
immediately and thus prevent a double call of do_signal().
Note that the fake signal is not sent to stopped/traced tasks. Such a
task prevents the patching from finishing until it continues again (is
no longer traced).
Last, sending the fake signal is not automatic. It is done only when the
admin requests it by writing 1 to the signal sysfs attribute in the
livepatch sysfs directory.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-11-15 21:50:13 +08:00
|
|
|
&signal_kobj_attr.attr,
|
2017-11-22 18:29:21 +08:00
|
|
|
&force_kobj_attr.attr,
|
2014-12-17 01:58:19 +08:00
|
|
|
NULL
|
|
|
|
};
|
|
|
|
|
|
|
|
static void klp_kobj_release_patch(struct kobject *kobj)
|
|
|
|
{
|
2017-03-07 01:20:29 +08:00
|
|
|
struct klp_patch *patch;
|
|
|
|
|
|
|
|
patch = container_of(kobj, struct klp_patch, kobj);
|
|
|
|
complete(&patch->finish);
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct kobj_type klp_ktype_patch = {
|
|
|
|
.release = klp_kobj_release_patch,
|
|
|
|
.sysfs_ops = &kobj_sysfs_ops,
|
|
|
|
.default_attrs = klp_patch_attrs,
|
|
|
|
};
|
|
|
|
|
2015-05-19 18:01:18 +08:00
|
|
|
static void klp_kobj_release_object(struct kobject *kobj)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct kobj_type klp_ktype_object = {
|
|
|
|
.release = klp_kobj_release_object,
|
|
|
|
.sysfs_ops = &kobj_sysfs_ops,
|
|
|
|
};
|
|
|
|
|
2014-12-17 01:58:19 +08:00
|
|
|
static void klp_kobj_release_func(struct kobject *kobj)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct kobj_type klp_ktype_func = {
|
|
|
|
.release = klp_kobj_release_func,
|
|
|
|
.sysfs_ops = &kobj_sysfs_ops,
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Free all functions' kobjects in the array up to some limit. When limit is
|
|
|
|
* NULL, all kobjects are freed.
|
|
|
|
*/
|
|
|
|
static void klp_free_funcs_limited(struct klp_object *obj,
|
|
|
|
struct klp_func *limit)
|
|
|
|
{
|
|
|
|
struct klp_func *func;
|
|
|
|
|
|
|
|
for (func = obj->funcs; func->old_name && func != limit; func++)
|
|
|
|
kobject_put(&func->kobj);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Clean up when a patched object is unloaded */
|
|
|
|
static void klp_free_object_loaded(struct klp_object *obj)
|
|
|
|
{
|
|
|
|
struct klp_func *func;
|
|
|
|
|
|
|
|
obj->mod = NULL;
|
|
|
|
|
2015-05-19 18:01:19 +08:00
|
|
|
klp_for_each_func(obj, func)
|
2019-01-09 20:43:19 +08:00
|
|
|
func->old_func = NULL;
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Free all objects' kobjects in the array up to some limit. When limit is
|
|
|
|
* NULL, all kobjects are freed.
|
|
|
|
*/
|
|
|
|
static void klp_free_objects_limited(struct klp_patch *patch,
|
|
|
|
struct klp_object *limit)
|
|
|
|
{
|
|
|
|
struct klp_object *obj;
|
|
|
|
|
|
|
|
for (obj = patch->objs; obj->funcs && obj != limit; obj++) {
|
|
|
|
klp_free_funcs_limited(obj, NULL);
|
2015-05-19 18:01:18 +08:00
|
|
|
kobject_put(&obj->kobj);
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void klp_free_patch(struct klp_patch *patch)
|
|
|
|
{
|
|
|
|
klp_free_objects_limited(patch, NULL);
|
|
|
|
if (!list_empty(&patch->list))
|
|
|
|
list_del(&patch->list);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int klp_init_func(struct klp_object *obj, struct klp_func *func)
|
|
|
|
{
|
2016-04-28 22:34:08 +08:00
|
|
|
if (!func->old_name || !func->new_func)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2018-07-20 17:46:42 +08:00
|
|
|
if (strlen(func->old_name) >= KSYM_NAME_LEN)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2015-01-20 23:26:19 +08:00
|
|
|
INIT_LIST_HEAD(&func->stack_node);
|
2017-02-14 09:42:35 +08:00
|
|
|
func->patched = false;
|
2017-02-14 09:42:40 +08:00
|
|
|
func->transition = false;
|
2014-12-17 01:58:19 +08:00
|
|
|
|
2015-12-02 10:40:56 +08:00
|
|
|
/* The format for the sysfs directory is <function,sympos> where sympos
|
|
|
|
* is the nth occurrence of this symbol in kallsyms for the patched
|
|
|
|
* object. If the user selects 0 for old_sympos, then 1 will be used
|
|
|
|
* since a unique symbol will be the first occurrence.
|
|
|
|
*/
|
2015-01-20 23:26:19 +08:00
|
|
|
return kobject_init_and_add(&func->kobj, &klp_ktype_func,
|
2015-12-02 10:40:56 +08:00
|
|
|
&obj->kobj, "%s,%lu", func->old_name,
|
|
|
|
func->old_sympos ? func->old_sympos : 1);
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
|
2016-08-18 08:58:28 +08:00
|
|
|
/* Arches may override this to finish any remaining arch-specific tasks */
|
|
|
|
void __weak arch_klp_init_object_loaded(struct klp_patch *patch,
|
|
|
|
struct klp_object *obj)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2014-12-17 01:58:19 +08:00
|
|
|
/* Parts of the initialization that are done only when the object is loaded */
|
|
|
|
static int klp_init_object_loaded(struct klp_patch *patch,
|
|
|
|
struct klp_object *obj)
|
|
|
|
{
|
|
|
|
struct klp_func *func;
|
|
|
|
int ret;
|
|
|
|
|
2016-08-18 08:58:28 +08:00
|
|
|
module_disable_ro(patch->mod);
|
2016-03-23 08:03:18 +08:00
|
|
|
ret = klp_write_object_relocations(patch->mod, obj);
|
2016-08-18 08:58:28 +08:00
|
|
|
if (ret) {
|
|
|
|
module_enable_ro(patch->mod, true);
|
2016-03-23 08:03:18 +08:00
|
|
|
return ret;
|
2016-08-18 08:58:28 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
arch_klp_init_object_loaded(patch, obj);
|
|
|
|
module_enable_ro(patch->mod, true);
|
2014-12-17 01:58:19 +08:00
|
|
|
|
2015-05-19 18:01:19 +08:00
|
|
|
klp_for_each_func(obj, func) {
|
2015-12-02 10:40:54 +08:00
|
|
|
ret = klp_find_object_symbol(obj->name, func->old_name,
|
|
|
|
func->old_sympos,
|
2019-01-09 20:43:19 +08:00
|
|
|
(unsigned long *)&func->old_func);
|
2014-12-17 01:58:19 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2017-02-14 09:42:39 +08:00
|
|
|
|
2019-01-09 20:43:19 +08:00
|
|
|
ret = kallsyms_lookup_size_offset((unsigned long)func->old_func,
|
2017-02-14 09:42:39 +08:00
|
|
|
&func->old_size, NULL);
|
|
|
|
if (!ret) {
|
|
|
|
pr_err("kallsyms size lookup failed for '%s'\n",
|
|
|
|
func->old_name);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = kallsyms_lookup_size_offset((unsigned long)func->new_func,
|
|
|
|
&func->new_size, NULL);
|
|
|
|
if (!ret) {
|
|
|
|
pr_err("kallsyms size lookup failed for '%s' replacement\n",
|
|
|
|
func->old_name);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int klp_init_object(struct klp_patch *patch, struct klp_object *obj)
|
|
|
|
{
|
|
|
|
struct klp_func *func;
|
|
|
|
int ret;
|
|
|
|
const char *name;
|
|
|
|
|
|
|
|
if (!obj->funcs)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2018-07-20 17:46:42 +08:00
|
|
|
if (klp_is_module(obj) && strlen(obj->name) >= MODULE_NAME_LEN)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2017-02-14 09:42:35 +08:00
|
|
|
obj->patched = false;
|
livepatch: Fix subtle race with coming and going modules
There is a notifier that handles live patches for coming and going modules.
It takes the klp_mutex lock to avoid races with coming and going patches but
it does not hold the lock all the time. Therefore the following races are
possible:
1. The notifier is called sometime in MODULE_STATE_COMING. The module
is visible to find_module() in this state all the time. It means that
a new patch can be registered and enabled even before the notifier is
called. It might create a wrong order of stacked patches, see below
for an example.
2. A new patch could still see the module in the GOING state even after
the notifier has been called. It will try to initialize the related
object structures but the module could disappear at any time. The
structures would be left in an inconsistent state. It might even cause
an invalid memory access.
This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.
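The rule for the boolean can be sketched with a small userspace model.
Everything here is hypothetical and only illustrates the decision: the
struct, the enum and the field name klp_alive are assumptions, not taken
from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the module states named in the text. */
enum mod_state { MOD_COMING, MOD_LIVE, MOD_GOING };

/* Hypothetical, reduced model of struct module. */
struct mod_model {
	enum mod_state state;
	bool klp_alive;	/* assumed name for the new boolean */
};

/* Coming handler ran: from now on new patches must be applied. */
static void model_notify_coming(struct mod_model *m)
{
	m->klp_alive = true;
}

/* Going handler ran: from now on new patches must ignore the module. */
static void model_notify_going(struct mod_model *m)
{
	m->klp_alive = false;
}

/* Decision taken when a new patch looks at a module. */
static bool model_new_patch_applies(const struct mod_model *m)
{
	return m->klp_alive;
}
```

The point of the model is that the decision depends only on whether the
handlers have run, not on the module state itself: a module in the GOING
state is still patched until the going handler has been called.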
Note that we need to know the state of all modules on the system. The races
are related to new patches. Therefore we do not know what modules will get
patched.
Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.
Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disables the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.
Alternative solutions:
======================
+ reject new patches when a patched module is coming or going; this is ugly
+ wait with adding new patch until the module leaves the COMING and GOING
states; this might be dangerous and complicated; we would need to release
kgr_lock in the middle of the patch registration to avoid a deadlock
with the coming and going handlers; also we might need a waitqueue for
each module which seems to be even bigger overhead than the boolean
+ stop modules from entering COMING and GOING states; wait until modules
leave these states when they are already there; looks complicated; we would
need to ignore the module that asked to stop the others to avoid a deadlock;
also it is unclear what to do when two modules asked to stop others and
both are in COMING state (situation when two new patches are applied)
+ always register/enable new patches and fix up the potential mess (registered
patches order) in klp_module_init(); this is nasty and prone to regressions
in the future development
+ add another MODULE_STATE where the kallsyms are visible but the module is not
used yet; this looks too complex; the module states are checked on "many"
locations
Example of patch stacking breakage:
===================================
The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmlinux and b() is from a module M. Something like:

        a()     b()
P1      a1()    b1()
P2      a2()    b2()
P3      a3()    b3()

If you load the module M after all patches are registered and enabled,
the ftrace ops for functions a() and b() list the functions in this
order:

  ops_a->func_stack -> list(a3,a2,a1)
  ops_b->func_stack -> list(b3,b2,b1)

so the pointer to b3() is the first and will be used.
Then you might have the following scenario. Let's start with the state when
patches P1 and P2 are registered and enabled but the module M is not loaded.
Then the ftrace ops for b() does not exist. Then we get into the following
race:

CPU0                                    CPU1

load_module(M)
  complete_formation()
  mod->state = MODULE_STATE_COMING;
  mutex_unlock(&module_mutex);
                                        klp_register_patch(P3);
                                        klp_enable_patch(P3);
                                        # STATE 1
  klp_module_notify(M)
    klp_module_notify_coming(P1);
    klp_module_notify_coming(P2);
    klp_module_notify_coming(P3);
                                        # STATE 2

The ftrace ops for a() and b() then look like this:

  STATE1:
  ops_a->func_stack -> list(a3,a2,a1);
  ops_b->func_stack -> list(b3);

  STATE2:
  ops_a->func_stack -> list(a3,a2,a1);
  ops_b->func_stack -> list(b2,b1,b3);

Therefore, b2() is used for the module but a3() is used for vmlinux
because they were the last added.
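The ordering problem can be replayed with a small userspace model of the
function stack. This is only a sketch: the struct and helpers are
hypothetical, and the only property taken from the example is that the
most recently added function sits at the head of the list and is the one
that gets used.

```c
#include <stddef.h>
#include <string.h>

#define STACK_MAX 8

/* Hypothetical model of ops->func_stack: index 0 is the head of the list. */
struct func_stack {
	const char *funcs[STACK_MAX];
	size_t count;
};

/* Add at the head, so the most recently added function is used. */
static int stack_push(struct func_stack *s, const char *name)
{
	if (s->count >= STACK_MAX)
		return -1;
	memmove(&s->funcs[1], &s->funcs[0], s->count * sizeof(s->funcs[0]));
	s->funcs[0] = name;
	s->count++;
	return 0;
}

/* The function that a call would be redirected to, or NULL if unpatched. */
static const char *stack_top(const struct func_stack *s)
{
	return s->count ? s->funcs[0] : NULL;
}

/* Replay the racy order from the example: P3 first, then P1 and P2. */
static const char *replay_race(void)
{
	struct func_stack ops_b = { .count = 0 };

	stack_push(&ops_b, "b3");	/* STATE 1: P3 enabled while M is COMING */
	stack_push(&ops_b, "b1");	/* notifier handles P1 */
	stack_push(&ops_b, "b2");	/* notifier handles P2; P3 is already applied */
	return stack_top(&ops_b);	/* STATE 2: list(b2,b1,b3) */
}
```

In this model replay_race() ends with b2 at the head, matching STATE 2,
whereas pushing b1, b2, b3 in registration order leaves b3 at the head.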
Example of the race with going modules:
=======================================

CPU0                                    CPU1

delete_module()  # SYSCALL
  try_stop_module()
    mod->state = MODULE_STATE_GOING;
  mutex_unlock(&module_mutex);
                                        klp_register_patch()
                                        klp_enable_patch()
                                        # safe place to switch universe
                                        b()     # from module that is going
                                          a()   # from core (patched)
  mod->exit();

Note that the function b() can be called until we call mod->exit().
If we do not apply the patch against b() because it is in MODULE_STATE_GOING,
it will call the patched a() with modified semantics and things might go wrong.
[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-12 19:55:13 +08:00
|
|
|
obj->mod = NULL;
|
2014-12-17 01:58:19 +08:00
|
|
|
|
|
|
|
klp_find_object_module(obj);
|
|
|
|
|
|
|
|
name = klp_is_module(obj) ? obj->name : "vmlinux";
|
2015-05-19 18:01:18 +08:00
|
|
|
ret = kobject_init_and_add(&obj->kobj, &klp_ktype_object,
|
|
|
|
&patch->kobj, "%s", name);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2014-12-17 01:58:19 +08:00
|
|
|
|
2015-05-19 18:01:19 +08:00
|
|
|
klp_for_each_func(obj, func) {
|
2014-12-17 01:58:19 +08:00
|
|
|
ret = klp_init_func(obj, func);
|
|
|
|
if (ret)
|
|
|
|
goto free;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (klp_is_object_loaded(obj)) {
|
|
|
|
ret = klp_init_object_loaded(patch, obj);
|
|
|
|
if (ret)
|
|
|
|
goto free;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
free:
|
|
|
|
klp_free_funcs_limited(obj, func);
|
2015-05-19 18:01:18 +08:00
|
|
|
kobject_put(&obj->kobj);
|
2014-12-17 01:58:19 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int klp_init_patch(struct klp_patch *patch)
|
|
|
|
{
|
|
|
|
struct klp_object *obj;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!patch->objs)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
mutex_lock(&klp_mutex);
|
|
|
|
|
2017-02-14 09:42:35 +08:00
|
|
|
patch->enabled = false;
|
2017-03-07 01:20:29 +08:00
|
|
|
init_completion(&patch->finish);
|
2014-12-17 01:58:19 +08:00
|
|
|
|
|
|
|
ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
|
2015-02-15 17:03:20 +08:00
|
|
|
klp_root_kobj, "%s", patch->mod->name);
|
2017-03-07 01:20:29 +08:00
|
|
|
if (ret) {
|
|
|
|
mutex_unlock(&klp_mutex);
|
|
|
|
return ret;
|
|
|
|
}
|
2014-12-17 01:58:19 +08:00
|
|
|
|
2015-05-19 18:01:19 +08:00
|
|
|
klp_for_each_object(patch, obj) {
|
2014-12-17 01:58:19 +08:00
|
|
|
ret = klp_init_object(patch, obj);
|
|
|
|
if (ret)
|
|
|
|
goto free;
|
|
|
|
}
|
|
|
|
|
2015-01-10 04:03:04 +08:00
|
|
|
list_add_tail(&patch->list, &klp_patches);
|
2014-12-17 01:58:19 +08:00
|
|
|
|
|
|
|
mutex_unlock(&klp_mutex);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
free:
|
|
|
|
klp_free_objects_limited(patch, obj);
|
2017-03-07 01:20:29 +08:00
|
|
|
|
2014-12-17 01:58:19 +08:00
|
|
|
mutex_unlock(&klp_mutex);
|
2017-03-07 01:20:29 +08:00
|
|
|
|
|
|
|
kobject_put(&patch->kobj);
|
|
|
|
wait_for_completion(&patch->finish);
|
|
|
|
|
2014-12-17 01:58:19 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* klp_unregister_patch() - unregisters a patch
|
|
|
|
* @patch: Disabled patch to be unregistered
|
|
|
|
*
|
|
|
|
* Frees the data structures and removes the sysfs interface.
|
|
|
|
*
|
|
|
|
* Return: 0 on success, otherwise error
|
|
|
|
*/
|
|
|
|
int klp_unregister_patch(struct klp_patch *patch)
|
|
|
|
{
|
2017-03-07 01:20:29 +08:00
|
|
|
int ret;
|
2014-12-17 01:58:19 +08:00
|
|
|
|
|
|
|
mutex_lock(&klp_mutex);
|
|
|
|
|
|
|
|
if (!klp_is_patch_registered(patch)) {
|
|
|
|
ret = -EINVAL;
|
2017-03-07 01:20:29 +08:00
|
|
|
goto err;
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
|
2017-02-14 09:42:35 +08:00
|
|
|
if (patch->enabled) {
|
2014-12-17 01:58:19 +08:00
|
|
|
ret = -EBUSY;
|
2017-03-07 01:20:29 +08:00
|
|
|
goto err;
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
klp_free_patch(patch);
|
|
|
|
|
2017-03-07 01:20:29 +08:00
|
|
|
mutex_unlock(&klp_mutex);
|
|
|
|
|
|
|
|
kobject_put(&patch->kobj);
|
|
|
|
wait_for_completion(&patch->finish);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
err:
|
2014-12-17 01:58:19 +08:00
|
|
|
mutex_unlock(&klp_mutex);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(klp_unregister_patch);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* klp_register_patch() - registers a patch
|
|
|
|
* @patch: Patch to be registered
|
|
|
|
*
|
|
|
|
* Initializes the data structure associated with the patch and
|
|
|
|
* creates the sysfs interface.
|
|
|
|
*
|
2017-03-07 01:20:29 +08:00
|
|
|
* There is no need to take the reference on the patch module here. It is done
|
|
|
|
* later when the patch is enabled.
|
|
|
|
*
|
2014-12-17 01:58:19 +08:00
|
|
|
* Return: 0 on success, otherwise error
|
|
|
|
*/
|
|
|
|
int klp_register_patch(struct klp_patch *patch)
|
|
|
|
{
|
|
|
|
if (!patch || !patch->mod)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-03-23 08:03:18 +08:00
|
|
|
if (!is_livepatch_module(patch->mod)) {
|
2017-04-14 06:59:15 +08:00
|
|
|
pr_err("module %s is not marked as a livepatch module\n",
|
2016-03-23 08:03:18 +08:00
|
|
|
patch->mod->name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2014-12-17 01:58:19 +08:00
|
|
|
if (!klp_initialized())
|
|
|
|
return -ENODEV;
|
|
|
|
|
2018-01-10 18:01:28 +08:00
|
|
|
if (!klp_have_reliable_stack()) {
|
2017-02-14 09:42:40 +08:00
|
|
|
pr_err("This architecture doesn't have support for the livepatch consistency model.\n");
|
|
|
|
return -ENOSYS;
|
|
|
|
}
|
2014-12-17 01:58:19 +08:00
|
|
|
|
2017-03-07 01:20:29 +08:00
|
|
|
return klp_init_patch(patch);
|
2014-12-17 01:58:19 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(klp_register_patch);
|
|
|
|
|
2019-01-09 20:43:20 +08:00
|
|
|
static int __klp_disable_patch(struct klp_patch *patch)
|
|
|
|
{
|
|
|
|
struct klp_object *obj;
|
|
|
|
|
|
|
|
if (WARN_ON(!patch->enabled))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (klp_transition_patch)
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
/* enforce stacking: only the last enabled patch can be disabled */
|
|
|
|
if (!list_is_last(&patch->list, &klp_patches) &&
|
|
|
|
list_next_entry(patch, list)->enabled)
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
klp_init_transition(patch, KLP_UNPATCHED);
|
|
|
|
|
|
|
|
klp_for_each_object(patch, obj)
|
|
|
|
if (obj->patched)
|
|
|
|
klp_pre_unpatch_callback(obj);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Enforce the order of the func->transition writes in
|
|
|
|
* klp_init_transition() and the TIF_PATCH_PENDING writes in
|
|
|
|
* klp_start_transition(). In the rare case where klp_ftrace_handler()
|
|
|
|
* is called shortly after klp_update_patch_state() switches the task,
|
|
|
|
* this ensures the handler sees that func->transition is set.
|
|
|
|
*/
|
|
|
|
smp_wmb();
|
|
|
|
|
|
|
|
klp_start_transition();
|
|
|
|
klp_try_complete_transition();
|
|
|
|
patch->enabled = false;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}

/**
 * klp_disable_patch() - disables a registered patch
 * @patch:	The registered, enabled patch to be disabled
 *
 * Unregisters the patched functions from ftrace.
 *
 * Return: 0 on success, otherwise error
 */
int klp_disable_patch(struct klp_patch *patch)
{
	int ret;

	mutex_lock(&klp_mutex);

	if (!klp_is_patch_registered(patch)) {
		ret = -EINVAL;
		goto err;
	}

	if (!patch->enabled) {
		ret = -EINVAL;
		goto err;
	}

	ret = __klp_disable_patch(patch);

err:
	mutex_unlock(&klp_mutex);
	return ret;
}
EXPORT_SYMBOL_GPL(klp_disable_patch);

static int __klp_enable_patch(struct klp_patch *patch)
{
	struct klp_object *obj;
	int ret;

	if (klp_transition_patch)
		return -EBUSY;

	if (WARN_ON(patch->enabled))
		return -EINVAL;

	/* enforce stacking: only the first disabled patch can be enabled */
	if (patch->list.prev != &klp_patches &&
	    !list_prev_entry(patch, list)->enabled)
		return -EBUSY;

	/*
	 * A reference is taken on the patch module to prevent it from being
	 * unloaded.
	 */
	if (!try_module_get(patch->mod))
		return -ENODEV;

	pr_notice("enabling patch '%s'\n", patch->mod->name);

	klp_init_transition(patch, KLP_PATCHED);

	/*
	 * Enforce the order of the func->transition writes in
	 * klp_init_transition() and the ops->func_stack writes in
	 * klp_patch_object(), so that klp_ftrace_handler() will see the
	 * func->transition updates before the handler is registered and the
	 * new funcs become visible to the handler.
	 */
	smp_wmb();

	klp_for_each_object(patch, obj) {
		if (!klp_is_object_loaded(obj))
			continue;

		ret = klp_pre_patch_callback(obj);
		if (ret) {
			pr_warn("pre-patch callback failed for object '%s'\n",
				klp_is_module(obj) ? obj->name : "vmlinux");
			goto err;
		}

		ret = klp_patch_object(obj);
		if (ret) {
			pr_warn("failed to patch object '%s'\n",
				klp_is_module(obj) ? obj->name : "vmlinux");
			goto err;
		}
	}

	klp_start_transition();
	klp_try_complete_transition();
	patch->enabled = true;

	return 0;
err:
	pr_warn("failed to enable patch '%s'\n", patch->mod->name);

	klp_cancel_transition();
	return ret;
}
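
The two "enforce stacking" checks (only the last enabled patch can be disabled, only the first disabled patch can be enabled) amount to a simple stack discipline. The sketch below models it with an array instead of the kernel's linked list; struct patch_stack, stack_enable() and stack_disable() are hypothetical names, and the in-transition check is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: index 0 is the oldest registered patch. */
struct patch_stack {
	bool enabled[8];
	size_t count;
};

/* Only the first disabled patch can be enabled. */
static int stack_enable(struct patch_stack *s, size_t i)
{
	if (i >= s->count || s->enabled[i])
		return -EINVAL;
	if (i > 0 && !s->enabled[i - 1])
		return -EBUSY;
	s->enabled[i] = true;
	return 0;
}

/* Only the last enabled patch can be disabled. */
static int stack_disable(struct patch_stack *s, size_t i)
{
	if (i >= s->count || !s->enabled[i])
		return -EINVAL;
	if (i + 1 < s->count && s->enabled[i + 1])
		return -EBUSY;
	s->enabled[i] = false;
	return 0;
}
```

With three registered patches, enabling the second before the first returns -EBUSY, and so does disabling the first while the second is still enabled.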

/**
 * klp_enable_patch() - enables a registered patch
 * @patch:	The registered, disabled patch to be enabled
 *
 * Performs the needed symbol lookups and code relocations,
 * then registers the patched functions with ftrace.
 *
 * Return: 0 on success, otherwise error
 */
int klp_enable_patch(struct klp_patch *patch)
{
	int ret;

	mutex_lock(&klp_mutex);

	if (!klp_is_patch_registered(patch)) {
		ret = -EINVAL;
		goto err;
	}

	ret = __klp_enable_patch(patch);

err:
	mutex_unlock(&klp_mutex);
	return ret;
}
EXPORT_SYMBOL_GPL(klp_enable_patch);

2017-10-02 23:56:48 +08:00
/*
 * Remove parts of patches that touch a given kernel module. The list of
 * patches processed might be limited. When limit is NULL, all patches
 * will be handled.
 */
static void klp_cleanup_module_patches_limited(struct module *mod,
					       struct klp_patch *limit)
{
	struct klp_patch *patch;
	struct klp_object *obj;

	list_for_each_entry(patch, &klp_patches, list) {
		if (patch == limit)
			break;

		klp_for_each_object(patch, obj) {
			if (!klp_is_module(obj) || strcmp(obj->name, mod->name))
				continue;

			/*
			 * Only unpatch the module if the patch is enabled or
			 * is in transition.
			 */
			if (patch->enabled || patch == klp_transition_patch) {
2017-11-15 17:53:24 +08:00

				if (patch != klp_transition_patch)
					klp_pre_unpatch_callback(obj);

2017-10-02 23:56:48 +08:00
				pr_notice("reverting patch '%s' on unloading module '%s'\n",
					  patch->mod->name, obj->mod->name);
				klp_unpatch_object(obj);
2017-11-15 17:53:24 +08:00

				klp_post_unpatch_callback(obj);
2017-10-02 23:56:48 +08:00
			}

			klp_free_object_loaded(obj);
			break;
		}
	}
}
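
The "limit" parameter above lets a failed apply roll back only the patches that were already processed. A minimal sketch of that pattern follows, assuming an array of flags in place of the patch list; applied[], cleanup_limited() and apply_all() are hypothetical, and the sentinel NPATCHES plays the role that a NULL limit plays in the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPATCHES 4

/* Hypothetical per-patch state: true once patch i has been applied. */
static bool applied[NPATCHES];

/* Undo entries in list order, stopping before 'limit'.
 * Passing NPATCHES undoes every entry. */
static void cleanup_limited(size_t limit)
{
	for (size_t i = 0; i < NPATCHES; i++) {
		if (i == limit)
			break;
		applied[i] = false;
	}
}

/* Apply in order; 'fail_at' simulates a failure at that index.
 * Returns 0 on success and -1 after rolling back on failure. */
static int apply_all(size_t fail_at)
{
	for (size_t i = 0; i < NPATCHES; i++) {
		if (i == fail_at) {
			cleanup_limited(i);
			return -1;
		}
		applied[i] = true;
	}
	return 0;
}
```

Stopping at the failing entry means entries that were never applied are not touched by the rollback.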

2016-03-17 08:55:39 +08:00
int klp_module_coming(struct module *mod)
2014-12-17 01:58:19 +08:00
{
	int ret;
2016-03-17 08:55:39 +08:00
	struct klp_patch *patch;
	struct klp_object *obj;
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
	if (WARN_ON(mod->state != MODULE_STATE_COMING))
		return -EINVAL;
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
	mutex_lock(&klp_mutex);
	/*
	 * Each module has to know that klp_module_coming()
	 * has been called. We never know what module will
	 * get patched by a new patch.
	 */
	mod->klp_alive = true;
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
	list_for_each_entry(patch, &klp_patches, list) {
		klp_for_each_object(patch, obj) {
			if (!klp_is_module(obj) || strcmp(obj->name, mod->name))
				continue;
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
			obj->mod = mod;
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
			ret = klp_init_object_loaded(patch, obj);
			if (ret) {
				pr_warn("failed to initialize patch '%s' for module '%s' (%d)\n",
					patch->mod->name, obj->mod->name, ret);
				goto err;
			}
2014-12-17 01:58:19 +08:00

2017-02-14 09:42:40 +08:00
			/*
			 * Only patch the module if the patch is enabled or is
			 * in transition.
			 */
			if (!patch->enabled && patch != klp_transition_patch)
2016-03-17 08:55:39 +08:00
				break;

			pr_notice("applying patch '%s' to loading module '%s'\n",
				  patch->mod->name, obj->mod->name);

2017-10-14 03:08:41 +08:00
			ret = klp_pre_patch_callback(obj);
			if (ret) {
				pr_warn("pre-patch callback failed for object '%s'\n",
					obj->name);
				goto err;
			}

2017-02-14 09:42:35 +08:00
			ret = klp_patch_object(obj);
2016-03-17 08:55:39 +08:00
			if (ret) {
				pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
					patch->mod->name, obj->mod->name, ret);
2017-10-14 03:08:41 +08:00

2017-10-20 22:56:50 +08:00
				klp_post_unpatch_callback(obj);
2016-03-17 08:55:39 +08:00
				goto err;
			}

2017-10-14 03:08:41 +08:00
			if (patch != klp_transition_patch)
				klp_post_patch_callback(obj);

2016-03-17 08:55:39 +08:00
			break;
		}
	}
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
	mutex_unlock(&klp_mutex);
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
	return 0;
2014-12-17 01:58:19 +08:00

2016-03-17 08:55:39 +08:00
err:
	/*
	 * If a patch is unsuccessfully applied, return
	 * error to the module loader.
	 */
	pr_warn("patch '%s' failed for module '%s', refusing to load module '%s'\n",
		patch->mod->name, obj->mod->name, obj->mod->name);
	mod->klp_alive = false;
2017-10-02 23:56:48 +08:00
	klp_cleanup_module_patches_limited(mod, patch);
2016-03-17 08:55:39 +08:00
	mutex_unlock(&klp_mutex);

	return ret;
2014-12-17 01:58:19 +08:00
}
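
klp_module_coming() walks klp_patches in registration order, so the replacement functions for a loading module are stacked oldest first and the newest ends up on top. The sketch below models why that order matters, assuming a simplified array in place of ops->func_stack; struct func_stack, push_func() and active_func() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical model of a per-function stack: the newest entry is at index 0. */
struct func_stack {
	const char *funcs[MAX_FUNCS];
	size_t depth;
};

/* Push a replacement function on top of the stack. */
static void push_func(struct func_stack *fs, const char *name)
{
	assert(fs->depth < MAX_FUNCS);
	for (size_t i = fs->depth; i > 0; i--)
		fs->funcs[i] = fs->funcs[i - 1];
	fs->funcs[0] = name;
	fs->depth++;
}

/* Calls are redirected to whatever sits on top, or nowhere if empty. */
static const char *active_func(const struct func_stack *fs)
{
	return fs->depth ? fs->funcs[0] : NULL;
}
```

Pushing b1, b2, b3 in that order leaves b3 active. Pushing b3 first and then b1 and b2 leaves b2 active, so a patch applied out of order ends up shadowed by older ones.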

2016-03-17 08:55:39 +08:00
void klp_module_going(struct module *mod)
2014-12-17 01:58:19 +08:00
{
2016-03-17 08:55:39 +08:00
	if (WARN_ON(mod->state != MODULE_STATE_GOING &&
		    mod->state != MODULE_STATE_COMING))
		return;
2014-12-17 01:58:19 +08:00

	mutex_lock(&klp_mutex);
livepatch: Fix subtle race with coming and going modules

There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:

  1. The notifier is called sometime in STATE_MODULE_COMING. The module
     is visible by find_module() in this state all the time. It means that
     new patch can be registered and enabled even before the notifier is
     called. It might create wrong order of stacked patches, see below
     for an example.

  2. New patch could still see the module in the GOING state even after
     the notifier has been called. It will try to initialize the related
     object structures but the module could disappear at any time. There
     will stay mess in the structures. It might even cause an invalid
     memory access.

This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.

Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.

Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.

Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.

Alternative solutions:
======================

  + reject new patches when a patched module is coming or going; this is ugly

  + wait with adding new patch until the module leaves the COMING and GOING
    states; this might be dangerous and complicated; we would need to release
    kgr_lock in the middle of the patch registration to avoid a deadlock
    with the coming and going handlers; also we might need a waitqueue for
    each module which seems to be even bigger overhead than the boolean

  + stop modules from entering COMING and GOING states; wait until modules
    leave these states when they are already there; looks complicated; we would
    need to ignore the module that asked to stop the others to avoid a deadlock;
    also it is unclear what to do when two modules asked to stop others and
    both are in COMING state (situation when two new patches are applied)

  + always register/enable new patches and fix up the potential mess (registered
    patches order) in klp_module_init(); this is nasty and prone to regressions
    in the future development

  + add another MODULE_STATE where the kallsyms are visible but the module is not
    used yet; this looks too complex; the module states are checked on "many"
    locations

Example of patch stacking breakage:
===================================

The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmcore and b() is from a module M. Something like:

        a()       b()
  P1    a1()      b1()
  P2    a2()      b2()
  P3    a3()      b3(3)

If you load the module M after all patches are registered and enabled.
The ftrace ops for function a() and b() has listed the functions in this
order:

  ops_a->func_stack -> list(a3,a2,a1)
  ops_b->func_stack -> list(b3,b2,b1)

, so the pointer to b3() is the first and will be used.

Then you might have the following scenario. Let's start with state when patches
P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
ops for b() does not exist. Then we get into the following race:

  CPU0                                  CPU1

  load_module(M)

    complete_formation()

    mod->state = MODULE_STATE_COMING;
    mutex_unlock(&module_mutex);

                                        klp_register_patch(P3);
                                        klp_enable_patch(P3);

                                        # STATE 1

    klp_module_notify(M)
      klp_module_notify_coming(P1);
      klp_module_notify_coming(P2);
      klp_module_notify_coming(P3);

                                        # STATE 2

The ftrace ops for a() and b() then looks:

  STATE1:

  ops_a->func_stack -> list(a3,a2,a1);
  ops_b->func_stack -> list(b3);

  STATE2:

  ops_a->func_stack -> list(a3,a2,a1);
  ops_b->func_stack -> list(b2,b1,b3);

therefore, b2() is used for the module but a3() is used for vmcore
because they were the last added.

Example of the race with going modules:
=======================================

  CPU0                                  CPU1

  delete_module()  #SYSCALL

    try_stop_module()
      mod->state = MODULE_STATE_GOING;

    mutex_unlock(&module_mutex);

                                        klp_register_patch()
                                        klp_enable_patch()

                                        #save place to switch universe

                                        b()     # from module that is going
                                          a()   # from core (patched)

    mod->exit();

Note that the function b() can be called until we call mod->exit().

If we do not apply patch against b() because it is in MODULE_STATE_GOING,
it will call patched a() with modified semantic and things might get wrong.

[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-12 19:55:13 +08:00
	/*
2016-03-17 08:55:39 +08:00
	 * Each module has to know that klp_module_going()
	 * has been called. We never know what module will
	 * get patched by a new patch.
2015-03-12 19:55:13 +08:00
	 */
2016-03-17 08:55:39 +08:00
	mod->klp_alive = false;
2015-03-12 19:55:13 +08:00

2017-10-02 23:56:48 +08:00
	klp_cleanup_module_patches_limited(mod, NULL);
2014-12-17 01:58:19 +08:00

	mutex_unlock(&klp_mutex);
}
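
The klp_alive flag set in klp_module_coming() and cleared in klp_module_going() tells a newly registered patch whether it may touch a module. The following single-threaded sketch models that rule only; struct module_model and its helpers are hypothetical, and the klp_mutex locking that makes the real flag race-free is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-module state relevant to livepatch. */
struct module_model {
	bool klp_alive;		/* true between the coming and going handlers */
	int patches_applied;	/* how many patches currently touch the module */
};

/* Coming handler: from now on new patches must be applied to the module. */
static void module_coming(struct module_model *m)
{
	m->klp_alive = true;
}

/* Going handler: new patches must ignore the module; existing ones are reverted. */
static void module_going(struct module_model *m)
{
	m->klp_alive = false;
	m->patches_applied = 0;
}

/* A newly registered patch touches the module only while it is alive.
 * Returns true if the patch was applied to the module. */
static bool new_patch_try_apply(struct module_model *m)
{
	if (!m->klp_alive)
		return false;
	m->patches_applied++;
	return true;
}
```

A single boolean is enough here because both handlers and patch registration are assumed to be serialized by one lock, as they are by klp_mutex in the code above.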

2015-05-22 22:26:29 +08:00
static int __init klp_init(void)
2014-12-17 01:58:19 +08:00
{
	int ret;

2015-01-09 17:53:21 +08:00
	ret = klp_check_compiler_support();
	if (ret) {
		pr_info("Your compiler is too old; turning off.\n");
		return -EINVAL;
	}

2014-12-17 01:58:19 +08:00
	klp_root_kobj = kobject_create_and_add("livepatch", kernel_kobj);
2016-03-17 08:55:39 +08:00
	if (!klp_root_kobj)
		return -ENOMEM;
2014-12-17 01:58:19 +08:00

	return 0;
}

module_init(klp_init);