Commit Graph

488 Commits

Author SHA1 Message Date
Steven Rostedt (Red Hat) 049fb9bd41 ftrace/module: Call clean up function when module init fails early
If the module init code fails after calling ftrace_module_init() and before
calling do_init_module(), we can suffer from a memory leak. This is because
ftrace_module_init() allocates pages to store the locations that ftrace
hooks are placed in the module text. If do_init_module() fails, it still
calls the MODULE_GOING notifiers which will tell ftrace to do a clean up of
the pages it allocated for the module. But if load_module() fails before
then, the pages allocated by ftrace_module_init() will never be freed.

Call ftrace_release_mod() on the module if load_module() fails before
getting to do_init_module().

Link: http://lkml.kernel.org/r/567CEA31.1070507@intel.com

Reported-by: "Qiu, PeiyangX" <peiyangx.qiu@intel.com>
Fixes: a949ae560a "ftrace/module: Hardcode ftrace_module_init() call into load_module()"
Cc: stable@vger.kernel.org # v2.6.38+
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-07 12:17:39 -05:00
Miroslav Benes e022441851 module: keep percpu symbols in module's symtab
Currently, percpu symbols from .data..percpu ELF section of a module are
not copied over and stored in final symtab array of struct module.
Consequently such symbol cannot be returned via kallsyms API (for
example kallsyms_lookup_name). This can be especially confusing when the
percpu symbol is exported. Only its __ksymtab et al. are present in its
symtab.

The culprit is in layout_and_allocate() function where SHF_ALLOC flag is
dropped for .data..percpu section. There is in fact no need to copy the
section to final struct module, because kernel module loader allocates
extra percpu section by itself. Unfortunately only symbols from
SHF_ALLOC sections are copied due to a check in is_core_symbol().

The patch changes is_core_symbol() function to copy over also percpu
symbols (their st_shndx points to .data..percpu ELF section). We do it
only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the
function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally
elf_type() returns type 'a' for a percpu symbol because its address is
absolute.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-04 22:46:26 +01:00
Rusty Russell 85c898db63 module: clean up RO/NX handling.
Modules have three sections: text, rodata and writable data.  The code
handled the case where these overlapped, however they never can:
debug_align() ensures they are always page-aligned.

This is why we got away with manually traversing the pages in
set_all_modules_text_rw() without rounding.

We create three helper functions: frob_text(), frob_rodata() and
frob_writable_data().  We then call these explicitly at every point,
so it's clear what we're doing.

We also expose module_enable_ro() and module_disable_ro() for
livepatch to use.

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-04 22:46:26 +01:00
Rusty Russell 7523e4dc50 module: use a structure to encapsulate layout.
Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.

It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-04 22:46:25 +01:00
Josh Poimboeuf 20ef10c1b3 module: Use the same logic for setting and unsetting RO/NX
When setting a module's RO and NX permissions, set_section_ro_nx() is
used, but when clearing them, unset_module_{init,core}_ro_nx() are used.
The unset functions don't have the same checks the set function has for
partial page protections.  It's probably harmless, but it's still
confusingly asymmetrical.

Instead, use the same logic to do both.  Also add some new
set_module_{init,core}_ro_nx() helper functions for more symmetry with
the unset functions.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-04 22:46:24 +01:00
Peter Zijlstra 275d7d44d8 module: Fix locking in symbol_put_addr()
Poma (on the way to another bug) reported an assertion triggering:

  [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
  [<ffffffff81150822>] __module_address+0x32/0x150
  [<ffffffff81150956>] __module_text_address+0x16/0x70
  [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
  [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]

Laura Abbott <labbott@redhat.com> produced a patch which lead us to
inspect symbol_put_addr(). This function has a comment claiming it
doesn't need to disable preemption around the module lookup
because it holds a reference to the module it wants to find, which
therefore cannot go away.

This is wrong (and a false optimization too, preempt_disable() is really
rather cheap, and I doubt any of this is on uber critical paths,
otherwise it would've retained a pointer to the actual module anyway and
avoided the second lookup).

While its true that the module cannot go away while we hold a reference
on it, the data structure we do the lookup in very much _CAN_ change
while we do the lookup. Therefore fix the comment and add the
required preempt_disable().

Reported-by: poma <pomidorabelisima@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: a6e6abd575 ("module: remove module_text_address()")
Cc: stable@kernel.org
2015-08-24 10:37:01 +09:30
Rusty Russell fe0d34d242 module: weaken locking assertion for oops path.
We don't actually hold the module_mutex when calling find_module_all
from module_kallsyms_lookup_name: that's because it's used by the oops
code and we don't want to deadlock.

However, access to the list read-only is safe if preempt is disabled,
so we can weaken the assertion.  Keep a strong version for external
callers though.

Fixes: 0be964be0d ("module: Sanitize RCU usage and locking")
Reported-by: He Kuang <hekuang@huawei.com>
Cc: stable@kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-07-29 06:13:22 +09:30
Peter Zijlstra 758556bdc1 module: Fix load_module() error path
The load_module() error path frees a module but forgot to take it out
of the mod_tree, leaving a dangling entry in the tree, causing havoc.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
Fixes: 93c2e105f6 ("module: Optimize __module_address() using a latched RB-tree")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-07-09 06:57:12 +09:30
Linus Torvalds 02201e3f1b Minor merge needed, due to function move.
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
 speed module address lookup.  He found some abusers of the module lock
 doing that too.
 
 A little bit of parameter work here too; including Dan Streetman's breaking
 up the big param mutex so writing a parameter can load another module (yeah,
 really).  Unfortunately that broke the usual suspects, !CONFIG_MODULES and
 !CONFIG_SYSFS, so those fixes were appended too.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
 rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
 wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
 GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
 PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
 HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
 dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
 tLdh/a9GiCitqS0bT7GE
 =tWPQ
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Main excitement here is Peter Zijlstra's lockless rbtree optimization
  to speed module address lookup.  He found some abusers of the module
  lock doing that too.

  A little bit of parameter work here too; including Dan Streetman's
  breaking up the big param mutex so writing a parameter can load
  another module (yeah, really).  Unfortunately that broke the usual
  suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
  appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
  modules: only use mod->param_lock if CONFIG_MODULES
  param: fix module param locks when !CONFIG_SYSFS.
  rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
  module: add per-module param_lock
  module: make perm const
  params: suppress unused variable error, warn once just in case code changes.
  modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
  kernel/module.c: avoid ifdefs for sig_enforce declaration
  kernel/workqueue.c: remove ifdefs over wq_power_efficient
  kernel/params.c: export param_ops_bool_enable_only
  kernel/params.c: generalize bool_enable_only
  kernel/module.c: use generic module param operaters for sig_enforce
  kernel/params: constify struct kernel_param_ops uses
  sysfs: tightened sysfs permission checks
  module: Rework module_addr_{min,max}
  module: Use __module_address() for module_address_lookup()
  module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
  module: Optimize __module_address() using a latched RB-tree
  rbtree: Implement generic latch_tree
  seqlock: Introduce raw_read_seqcount_latch()
  ...
2015-07-01 10:49:25 -07:00
Rusty Russell cf2fde7b39 param: fix module param locks when !CONFIG_SYSFS.
As Dan Streetman points out, the entire point of locking for is to
stop sysfs accesses, so they're elided entirely in the !SYSFS case.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-06-28 14:46:14 +09:30
Linus Torvalds 8d7804a2f0 Driver core patches for 4.2-rc1
Here is the driver core / firmware changes for 4.2-rc1.
 
 A number of small changes all over the place in the driver core, and in
 the firmware subsystem.  Nothing really major, full details in the
 shortlog.  Some of it is a bit of churn, given that the platform driver
 probing changes was found to not work well, so they were reverted.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
 jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
 =Al2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core / firmware changes for 4.2-rc1.

  A number of small changes all over the place in the driver core, and
  in the firmware subsystem.  Nothing really major, full details in the
  shortlog.  Some of it is a bit of churn, given that the platform
  driver probing changes was found to not work well, so they were
  reverted.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  Revert "base/platform: Only insert MEM and IO resources"
  Revert "base/platform: Continue on insert_resource() error"
  Revert "of/platform: Use platform_device interface"
  Revert "base/platform: Remove code duplication"
  firmware: add missing kfree for work on async call
  fs: sysfs: don't pass count == 0 to bin file readers
  base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
  base/platform: Remove code duplication
  of/platform: Use platform_device interface
  base/platform: Continue on insert_resource() error
  base/platform: Only insert MEM and IO resources
  firmware: use const for remaining firmware names
  firmware: fix possible use after free on name on asynchronous request
  firmware: check for file truncation on direct firmware loading
  firmware: fix __getname() missing failure check
  drivers: of/base: move of_init to driver_init
  drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
  sysfs: disambiguate between "error code" and "failure" in comments
  driver-core: fix build for !CONFIG_MODULES
  driver-core: make __device_attach() static
  ...
2015-06-26 15:07:37 -07:00
Linus Torvalds e382608254 This patch series contains several clean ups and even a new trace clock
"monitonic raw". Also some enhancements to make the ring buffer even
 faster. But the biggest and most noticeable change is the renaming of
 the ftrace* files, structures and variables that have to deal with
 trace events.
 
 Over the years I've had several developers tell me about their confusion
 with what ftrace is compared to events. Technically, "ftrace" is the
 infrastructure to do the function hooks, which include tracing and also
 helps with live kernel patching. But the trace events are a separate
 entity altogether, and the files that affect the trace events should
 not be named "ftrace". These include:
 
   include/trace/ftrace.h	->	include/trace/trace_events.h
   include/linux/ftrace_event.h	->	include/linux/trace_events.h
 
 Also, functions that are specific for trace events have also been renamed:
 
   ftrace_print_*()		->	trace_print_*()
   (un)register_ftrace_event()	->	(un)register_trace_event()
   ftrace_event_name()		->	trace_event_name()
   ftrace_trigger_soft_disabled()->	trace_trigger_soft_disabled()
   ftrace_define_fields_##call() ->	trace_define_fields_##call()
   ftrace_get_offsets_##call()	->	trace_get_offsets_##call()
 
 Structures have been renamed:
 
   ftrace_event_file		->	trace_event_file
   ftrace_event_{call,class}	->	trace_event_{call,class}
   ftrace_event_buffer		->	trace_event_buffer
   ftrace_subsystem_dir		->	trace_subsystem_dir
   ftrace_event_raw_##call	->	trace_event_raw_##call
   ftrace_event_data_offset_##call->	trace_event_data_offset_##call
   ftrace_event_type_funcs_##call ->	trace_event_type_funcs_##call
 
 And a few various variables and flags have also been updated.
 
 This has been sitting in linux-next for some time, and I have not heard
 a single complaint about this rename breaking anything. Mostly because
 these functions, variables and structures are mostly internal to the
 tracing system and are seldom (if ever) used by anything external to that.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJViYhVAAoJEEjnJuOKh9ldcJ0IAI+mytwoMAN/CWDE8pXrTrgs
 aHlcr1zorSzZ0Lq6lKsWP+V0VGVhP8KWO16vl35HaM5ZB9U+cDzWiGobI8JTHi/3
 eeTAPTjQdgrr/L+ZO1ApzS1jYPhN3Xi5L7xublcYMJjKfzU+bcYXg/x8gRt0QbG3
 S9QN/kBt0JIIjT7McN64m5JVk2OiU36LxXxwHgCqJvVCPHUrriAdIX7Z5KRpEv13
 zxgCN4d7Jiec/FsMW8dkO0vRlVAvudZWLL7oDmdsvNhnLy8nE79UOeHos2c1qifQ
 LV4DeQ+2Hlu7w9wxixHuoOgNXDUEiQPJXzPc/CuCahiTL9N/urQSGQDoOVMltR4=
 =hkdz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This patch series contains several clean ups and even a new trace
  clock "monitonic raw".  Also some enhancements to make the ring buffer
  even faster.  But the biggest and most noticeable change is the
  renaming of the ftrace* files, structures and variables that have to
  deal with trace events.

  Over the years I've had several developers tell me about their
  confusion with what ftrace is compared to events.  Technically,
  "ftrace" is the infrastructure to do the function hooks, which include
  tracing and also helps with live kernel patching.  But the trace
  events are a separate entity altogether, and the files that affect the
  trace events should not be named "ftrace".  These include:

    include/trace/ftrace.h         ->    include/trace/trace_events.h
    include/linux/ftrace_event.h   ->    include/linux/trace_events.h

  Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*()               ->    trace_print_*()
    (un)register_ftrace_event()    ->    (un)register_trace_event()
    ftrace_event_name()            ->    trace_event_name()
    ftrace_trigger_soft_disabled() ->    trace_trigger_soft_disabled()
    ftrace_define_fields_##call()  ->    trace_define_fields_##call()
    ftrace_get_offsets_##call()    ->    trace_get_offsets_##call()

  Structures have been renamed:

    ftrace_event_file              ->    trace_event_file
    ftrace_event_{call,class}      ->    trace_event_{call,class}
    ftrace_event_buffer            ->    trace_event_buffer
    ftrace_subsystem_dir           ->    trace_subsystem_dir
    ftrace_event_raw_##call        ->    trace_event_raw_##call
    ftrace_event_data_offset_##call->    trace_event_data_offset_##call
    ftrace_event_type_funcs_##call ->    trace_event_type_funcs_##call

  And a few various variables and flags have also been updated.

  This has been sitting in linux-next for some time, and I have not
  heard a single complaint about this rename breaking anything.  Mostly
  because these functions, variables and structures are mostly internal
  to the tracing system and are seldom (if ever) used by anything
  external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  ring_buffer: Allow to exit the ring buffer benchmark immediately
  ring-buffer-benchmark: Fix the wrong type
  ring-buffer-benchmark: Fix the wrong param in module_param
  ring-buffer: Add enum names for the context levels
  ring-buffer: Remove useless unused tracing_off_permanent()
  ring-buffer: Give NMIs a chance to lock the reader_lock
  ring-buffer: Add trace_recursive checks to ring_buffer_write()
  ring-buffer: Allways do the trace_recursive checks
  ring-buffer: Move recursive check to per_cpu descriptor
  ring-buffer: Add unlikelys to make fast path the default
  tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
  tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
  tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
  tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
  tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
  tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
  tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
  tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
  tracing: Rename ftrace_event_name() to trace_event_name()
  tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
  ...
2015-06-26 14:02:43 -07:00
Dan Streetman b51d23e4e9 module: add per-module param_lock
Add a "param_lock" mutex to each module, and update params.c to use
the correct built-in or module mutex while locking kernel params.
Remove the kparam_block_sysfs_r/w() macros, replace them with direct
calls to kernel_param_[un]lock(module).

The kernel param code currently uses a single mutex to protect
modification of any and all kernel params.  While this generally works,
there is one specific problem with it; a module callback function
cannot safely load another module, i.e. with request_module() or even
with indirect calls such as crypto_has_alg().  If the module to be
loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
config file), then the attempt will result in a deadlock between the
first module param callback waiting for modprobe, and modprobe trying to
lock the single kernel param mutex to set the new module's param.

This fixes that by using per-module mutexes, so that each individual module
is protected against concurrent changes in its own kernel params, but is
not blocked by changes to other module params.  All built-in modules
continue to use the built-in mutex, since they will always be loaded at
runtime and references (e.g. request_module(), crypto_has_alg()) to them
will never cause load-time param changing.

This also simplifies the interface used by modules to block sysfs access
to their params; while there are currently functions to block and unblock
sysfs param access which are split up by read and write and expect a single
kernel param to be passed, their actual operation is identical and applies
to all params, not just the one passed to them; they simply lock and unlock
the global param mutex.  They are replaced with direct calls to
kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock, or
if the module is built-in, it locks the built-in mutex.

Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-06-23 15:27:38 +09:30
Greg Kroah-Hartman 987aec39a7 Merge 4.1-rc7 into driver-core-next
We want the fixes in this branch as well for testing and merge
resolution.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08 10:19:40 -07:00
Luis R. Rodriguez 6727bb9c6a kernel/module.c: avoid ifdefs for sig_enforce declaration
There's no need to require an ifdef over the declaration
of sig_enforce as IS_ENABLED() can be used. While at it,
there's no harm in exposing this kernel parameter outside of
CONFIG_MODULE_SIG as it'd be a no-op on non module sig
kernels.

Now, technically we should in theory be able to remove
the #ifdef'ery over the declaration of the module parameter
as we are also trusting the bool_enable_only code for
CONFIG_MODULE_SIG kernels but for now remain paranoid
and keep it.

With time if no one can put a bullet through bool_enable_only
and if there are no technical requirements over not exposing
CONFIG_MODULE_SIG_FORCE with the measures in place by
bool_enable_only we could remove this last ifdef.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:13 +09:30
Luis R. Rodriguez d19f05d8a8 kernel/params.c: generalize bool_enable_only
This takes out the bool_enable_only implementation from
the module loading code and generalizes it so that others
can make use of it.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:11 +09:30
Luis R. Rodriguez 05f408dddb kernel/module.c: use generic module param operaters for sig_enforce
We're directly checking and modifying sig_enforce when needed instead
of using the generic helpers. This prevents us from generalizing this
helper so that others can use it. Use indirect helpers to allow us
to generalize this code a bit and to make it a bit more clear what
this is doing.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: cocci@systeme.lip6.fr
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:11 +09:30
Peter Zijlstra 4f666546d0 module: Rework module_addr_{min,max}
__module_address() does an initial bound check before doing the
{list/tree} iteration to find the actual module. The bound variables
are nowhere near the mod_tree cacheline, in fact they're nowhere near
one another.

module_addr_min lives in .data while module_addr_max lives in .bss
(smarty pants GCC thinks the explicit 0 assignment is a mistake).

Rectify this by moving the two variables into a structure together
with the latch_tree_root to guarantee they all share the same
cacheline and avoid hitting two extra cachelines for the lookup.

While reworking the bounds code, move the bound update from allocation
to insertion time, this avoids updating the bounds for a few error
paths.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:09 +09:30
Peter Zijlstra b7df4d1b23 module: Use __module_address() for module_address_lookup()
Use the generic __module_address() addr to struct module lookup
instead of open coding it once more.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:08 +09:30
Peter Zijlstra 6c9692e2d6 module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
Andrew worried about the overhead on small systems; only use the fancy
code when either perf or tracing is enabled.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:07 +09:30
Peter Zijlstra 93c2e105f6 module: Optimize __module_address() using a latched RB-tree
Currently __module_address() is using a linear search through all
modules in order to find the module corresponding to the provided
address. With a lot of modules this can take a lot of time.

One of the users of this is kernel_text_address() which is employed
in many stack unwinders; which in turn are used by perf-callchain and
ftrace (possibly from NMI context).

So by optimizing __module_address() we optimize many stack unwinders
which are used by both perf and tracing in performance sensitive code.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:07 +09:30
Peter Zijlstra 0be964be0d module: Sanitize RCU usage and locking
Currently the RCU usage in module is an inconsistent mess of RCU and
RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu()
does not imply synchronize_sched().

Most usage sites use preempt_{dis,en}able() which is RCU-sched, but
(most of) the modification sites use synchronize_rcu(). With the
exception of the module bug list, which actually uses RCU.

Convert everything over to RCU-sched.

Furthermore add lockdep asserts to all sites, because it's not at all
clear to me the required locking is observed, esp. on exported
functions.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:31:52 +09:30
Peter Zijlstra 926a59b1df module: Annotate module version magic
Due to the new lockdep checks in the coming patch, we go:

[    9.759380] ------------[ cut here ]------------
[    9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130()
[    9.759391] Modules linked in:
[    9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65
[    9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    9.759396]  ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001
[    9.759398]  0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8
[    9.759400]  0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000
[    9.759400] Call Trace:
[    9.759407]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
[    9.759410]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
[    9.759412]  [<ffffffff810ec160>] ? section_objs+0x60/0x60
[    9.759414]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
[    9.759415]  [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130
[    9.759417]  [<ffffffff810eda01>] find_symbol+0x31/0x70
[    9.759420]  [<ffffffff810ef5bf>] load_module+0x20f/0x2660
[    9.759422]  [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0
[    9.759426]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759427]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759433]  [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
[    9.759437]  [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    9.759439]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
[    9.759441]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
[    9.759443]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
[    9.759445] ---[ end trace 9294429076a9c644 ]---

As per the comment this site should be fine, but lets wrap it in
preempt_disable() anyhow to placate lockdep.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-27 11:09:50 +09:30
Luis R. Rodriguez f2411da746 driver-core: add driver module asynchronous probe support
Some init systems may wish to express the desire to have device drivers
run their probe() code asynchronously. This implements support for this
and allows userspace to request async probe as a preference through a
generic shared device driver module parameter, async_probe.

Implementation for async probe is supported through a module parameter
given that since synchronous probe has been prevalent for years some
userspace might exist which relies on the fact that the device driver
will probe synchronously and the assumption that devices it provides
will be immediately available after this.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:24 -07:00
Luis R. Rodriguez ecc8617053 module: add extra argument for parse_params() callback
This adds an extra argument onto parse_params() to be used
as a way to make the unused callback a bit more useful and
generic by allowing the caller to pass on a data structure
of its choice. An example use case is to allow us to easily
make module parameters for every module which we will do
next.

@ parse @
identifier name, args, params, num, level_min, level_max;
identifier unknown, param, val, doing;
type s16;
@@
 extern char *parse_args(const char *name,
 			 char *args,
 			 const struct kernel_param *params,
 			 unsigned num,
 			 s16 level_min,
 			 s16 level_max,
+			 void *arg,
 			 int (*unknown)(char *param, char *val,
					const char *doing
+					, void *arg
					));

@ parse_mod @
identifier name, args, params, num, level_min, level_max;
identifier unknown, param, val, doing;
type s16;
@@
 char *parse_args(const char *name,
 			 char *args,
 			 const struct kernel_param *params,
 			 unsigned num,
 			 s16 level_min,
 			 s16 level_max,
+			 void *arg,
 			 int (*unknown)(char *param, char *val,
					const char *doing
+					, void *arg
					))
{
	...
}

@ parse_args_found @
expression R, E1, E2, E3, E4, E5, E6;
identifier func;
@@

(
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   func);
|
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   &func);
|
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   NULL);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   func);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   &func);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   NULL);
)

@ parse_args_unused depends on parse_args_found @
identifier parse_args_found.func;
@@

int func(char *param, char *val, const char *unused
+		 , void *arg
		 )
{
	...
}

@ mod_unused depends on parse_args_found @
identifier parse_args_found.func;
expression A1, A2, A3;
@@

-	func(A1, A2, A3);
+	func(A1, A2, A3, NULL);

Generated-by: Coccinelle SmPL
Cc: cocci@systeme.lip6.fr
Cc: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:24 -07:00
Steven Rostedt (Red Hat) af658dca22 tracing: Rename ftrace_event.h to trace_events.h
The term "ftrace" is really the infrastructure of the function hooks,
and not the trace events. Rename ftrace_event.h to trace_events.h to
represent the trace_event infrastructure and decouple the term ftrace
from it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:05:12 -04:00
Steven Rostedt 37815bf866 module: Call module notifier on failure after complete_formation()
The module notifier call chain for MODULE_STATE_COMING was moved up before
the parsing of args, into the complete_formation() call. But if the module failed
to load after that, the notifier call chain for MODULE_STATE_GOING was
never called and that prevented the users of those call chains from
cleaning up anything that was allocated.

Link: http://lkml.kernel.org/r/554C52B9.9060700@gmail.com

Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Fixes: 4982223e51 "module: set nx before marking module MODULE_STATE_COMING"
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-09 03:29:24 +09:30
Linus Torvalds 15ce2658dd Quentin opened a can of worms by adding extable entry checking to modpost,
but most architectures seem fixed now.  Thanks to all involved.
 
 Last minute rebase because I noticed a "[PATCH]" had snuck into a commit
 message somehow.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVN1X3AAoJENkgDmzRrbjxmBEP/1Kv1wCrUGg07TfqNTJgr5Pe
 R81C+8P2wSEsA94lBhLNbn2DsN7uWsygcSXZMQ+5CrmbKAG45kRlMUzSlg6IdGXo
 usA9rOlH4hR+Dqn3zyx543xMJyp0a0LoSCLWGMmr/U71AXNp5kehvbIv64gJHrwg
 4CALI9qCwHH4FXF7WBYdLknlfpmD7KzCjIVckkfUfwf2A8mZXKdqBFQ+MPqbChEm
 vpFWkgp1oJgK4OcvvBeiNT2PwFZy1vmJtQWc2lieJ6HCjbqvwgM0g3EYTIQmccIm
 O40mEFMNkUuliMtrCZgOJBP2bmWy79JydiUq5Mu5Cb4bL6jGl2r6Z+w+iQ7Sggv9
 DTqTrk0/Cn02krJXDXAHVOXXgTqggf1xev6wr+3dbZn5sytQ8ddImuuVai8KALoC
 aEXAsnAK+3GnkAn9qqKVFhnrnCcI69I6srUAn5DKVWu36+Ye05FsZboIFYcxcK1F
 NZ+31xA17xA+yoo62ly/Auz7DXMa2QZfJcoQLMx2fMgL/+wMqzfGbcrZ5ZaVIj4w
 ePig1SbozAweAAfX7+W+QCQ3wqNN/nJgMO8vPldlcq7eaXH0/Dqu72cC2d82dK+H
 xl/1pEj83W0AkUkfNS8j3Z4zSzg31g8bK5SmPPRID00mx8roCfsP/Cfx5hRiO8U2
 ECtbKDxSyUWwF9vnOLVR
 =EYrW
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Quentin opened a can of worms by adding extable entry checking to
  modpost, but most architectures seem fixed now.  Thanks to all
  involved.

  Last minute rebase because I noticed a "[PATCH]" had snuck into a
  commit message somehow"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modpost: don't emit section mismatch warnings for compiler optimizations
  modpost: expand pattern matching to support substring matches
  modpost: do not try to match the SHT_NUL section.
  modpost: fix extable entry size calculation.
  modpost: fix inverted logic in is_extable_fault_address().
  modpost: handle -ffunction-sections
  modpost: Whitelist .text.fixup and .exception.text
  params: handle quotes properly for values not of form foo="bar".
  modpost: document the use of struct section_check.
  modpost: handle relocations mismatch in __ex_table.
  scripts: add check_extable.sh script.
  modpost: mismatch_handler: retrieve tosym information only when needed.
  modpost: factorize symbol pretty print in get_pretty_name().
  modpost: add handler function pointer to sectioncheck.
  modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list.
  modpost: add strict white-listing when referencing sections.
  module: do not print allocation-fail warning on bogus user buffer size
  kernel/module.c: fix typos in message about unused symbols
2015-04-22 09:49:24 -07:00
Linus Torvalds eeee78cf77 Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
 
 Tracepoints have helper functions for the TP_printk() called
 __print_symbolic() and __print_flags() that lets a numeric number be
 displayed as a a human comprehensible text. What is placed in the
 TP_printk() is also shown in the tracepoint format file such that
 user space tools like perf and trace-cmd can parse the binary data
 and express the values too. Unfortunately, the way the TRACE_EVENT()
 macro works, anything placed in the TP_printk() will be shown pretty
 much exactly as is. The problem arises when enums are used. That's
 because unlike macros, enums will not be changed into their values
 by the C pre-processor. Thus, the enum string is exported to the
 format file, and this makes it useless for user space tools.
 
 The TRACE_DEFINE_ENUM() solves this by converting the enum strings
 in the TP_printk() format into their number, and that is what is
 shown to user space. For example, the tracepoint tlb_flush currently
 has this in its format file:
 
      __print_symbolic(REC->reason,
         { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
         { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
         { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
         { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
 
 After adding:
 
      TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
      TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
      TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
      TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
 
 Its format file will contain this:
 
      __print_symbolic(REC->reason,
         { 0, "flush on task switch" },
         { 1, "remote shootdown" },
         { 2, "local shootdown" },
         { 3, "local mm shootdown" })
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR
 wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr
 y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM
 dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+
 aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu
 JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk=
 =Ixwt
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Some clean ups and small fixes, but the biggest change is the addition
  of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

  Tracepoints have helper functions for the TP_printk() called
  __print_symbolic() and __print_flags() that lets a numeric number be
  displayed as a a human comprehensible text.  What is placed in the
  TP_printk() is also shown in the tracepoint format file such that user
  space tools like perf and trace-cmd can parse the binary data and
  express the values too.  Unfortunately, the way the TRACE_EVENT()
  macro works, anything placed in the TP_printk() will be shown pretty
  much exactly as is.  The problem arises when enums are used.  That's
  because unlike macros, enums will not be changed into their values by
  the C pre-processor.  Thus, the enum string is exported to the format
  file, and this makes it useless for user space tools.

  The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
  the TP_printk() format into their number, and that is what is shown to
  user space.  For example, the tracepoint tlb_flush currently has this
  in its format file:

     __print_symbolic(REC->reason,
        { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
        { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
        { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
        { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

  After adding:

     TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
     TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
     TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
     TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

  Its format file will contain this:

     __print_symbolic(REC->reason,
        { 0, "flush on task switch" },
        { 1, "remote shootdown" },
        { 2, "local shootdown" },
        { 3, "local mm shootdown" })"

* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
  tracing: Add enum_map file to show enums that have been mapped
  writeback: Export enums used by tracepoint to user space
  v4l: Export enums used by tracepoints to user space
  SUNRPC: Export enums in tracepoints to user space
  mm: tracing: Export enums in tracepoints to user space
  irq/tracing: Export enums in tracepoints to user space
  f2fs: Export the enums in the tracepoints to userspace
  net/9p/tracing: Export enums in tracepoints to userspace
  x86/tlb/trace: Export enums in used by tlb_flush tracepoint
  tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
  tracing: Allow for modules to convert their enums to values
  tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
  tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
  tracing: Give system name a pointer
  brcmsmac: Move each system tracepoints to their own header
  iwlwifi: Move each system tracepoints to their own header
  mac80211: Move message tracepoints to their own header
  tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
  tracing: Add TRACE_SYSTEM_VAR to kvm-s390
  tracing: Add TRACE_SYSTEM_VAR to intel-sst
  ...
2015-04-14 10:49:03 -07:00
Linus Torvalds 3afe9f8496 Copy the kernel module data from user space in chunks
Unlike most (all?) other copies from user space, kernel module loading
is almost unlimited in size.  So we do a potentially huge
"copy_from_user()" when we copy the module data from user space to the
kernel buffer, which can be a latency concern when preemption is
disabled (or voluntary).

Also, because 'copy_from_user()' clears the tail of the kernel buffer on
failures, even a *failed* copy can end up wasting a lot of time.

Normally neither of these are concerns in real life, but they do trigger
when doing stress-testing with trinity.  Running in a VM seems to add
its own overheadm causing trinity module load testing to even trigger
the watchdog.

The simple fix is to just chunk up the module loading, so that it never
tries to copy insanely big areas in one go.  That bounds the latency,
and also the amount of (unnecessarily, in this case) cleared memory for
the failure case.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-08 14:35:48 -07:00
Steven Rostedt (Red Hat) 3673b8e4ce tracing: Allow for modules to convert their enums to values
Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM()
will have those enums converted into their values in the tracepoint
print fmt strings.

Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08 09:39:57 -04:00
Kirill A. Shutemov cc9e605dc6 module: do not print allocation-fail warning on bogus user buffer size
init_module(2) passes user-specified buffer length directly to
vmalloc(). It makes warn_alloc_failed() to print out a lot of info into
dmesg if user specified insane size, like -1.

Let's silence the warning. It doesn't add much value to -ENOMEM return
code. Without the patch the syscall is prohibitive noisy for testing
with trinity.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-24 12:32:37 +10:30
Yannick Guerrini 7b63c3ab9b kernel/module.c: fix typos in message about unused symbols
Fix typos in pr_warn message about unused symbols

Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-24 12:32:36 +10:30
Peter Zijlstra 35a9393c95 lockdep: Fix the module unload key range freeing logic
Module unload calls lockdep_free_key_range(), which removes entries
from the data structures. Most of the lockdep code OTOH assumes the
data structures are append only; in specific see the comments in
add_lock_to_list() and look_up_lock_class().

Clearly this has only worked by accident; make it work proper. The
actual scenario to make it go boom would involve the memory freed by
the module unlock being re-allocated and re-used for a lock inside of
a rcu-sched grace period. This is a very unlikely scenario, still
better plug the hole.

Use RCU list iteration in all places and ammend the comments.

Change lockdep_free_key_range() to issue a sync_sched() between
removal from the lists and returning -- which results in the memory
being freed. Further ensure the callers are placed correctly and
comment the requirements.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Tsyvarev <tsyvarev@ispras.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:49:07 +01:00
Andrey Ryabinin a5af5aa8b6 kasan, module, vmalloc: rework shadow allocation for modules
Current approach in handling shadow memory for modules is broken.

Shadow memory could be freed only after memory shadow corresponds it is no
longer used.  vfree() called from interrupt context could use memory its
freeing to store 'struct llist_node' in it:

    void vfree(const void *addr)
    {
    ...
        if (unlikely(in_interrupt())) {
            struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
            if (llist_add((struct llist_node *)addr, &p->list))
                    schedule_work(&p->wq);

Later this list node used in free_work() which actually frees memory.
Currently module_memfree() called in interrupt context will free shadow
before freeing module's memory which could provoke kernel crash.

So shadow memory should be freed after module's memory.  However, such
deallocation order could race with kasan_module_alloc() in module_alloc().

Free shadow right before releasing vm area.  At this point vfree()'d
memory is not used anymore and yet not available for other allocations.
New VM_KASAN flag used to indicate that vm area has dynamically allocated
shadow memory so kasan frees shadow only if it was previously allocated.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-12 18:46:08 -07:00
Laura Abbott 168e47f2a6 kernel/module.c: Update debug alignment after symtable generation
When CONFIG_DEBUG_SET_MODULE_RONX is enabled, the sizes of
module sections are aligned up so appropriate permissions can
be applied. Adjusting for the symbol table may cause them to
become unaligned. Make sure to re-align the sizes afterward.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-06 12:04:22 +00:00
Jan Kiszka be02a18623 kernel/module.c: do not inline do_init_module()
This provides a reliable breakpoint target, required for automatic symbol
loading via the gdb helper command 'lx-symbols'.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:53 -08:00
Andrey Ryabinin bebf56a1b1 kasan: enable instrumentation of global variables
This feature let us to detect accesses out of bounds of global variables.
This will work as for globals in kernel image, so for globals in modules.
Currently this won't work for symbols in user-specified sections (e.g.
__init, __read_mostly, ...)

The idea of this is simple.  Compiler increases each global variable by
redzone size and add constructors invoking __asan_register_globals()
function.  Information about global variable (address, size, size with
redzone ...) passed to __asan_register_globals() so we could poison
variable's redzone.

This patch also forces module_alloc() to return 8*PAGE_SIZE aligned
address making shadow memory handling (
kasan_module_alloc()/kasan_module_free() ) more simple.  Such alignment
guarantees that each shadow page backing modules address space correspond
to only one module_alloc() allocation.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:42 -08:00
Peter Zijlstra 9cc019b8c9 module: Replace over-engineered nested sleep
Since the introduction of the nested sleep warning; we've established
that the occasional sleep inside a wait_event() is fine.

wait_event() loops are invariant wrt. spurious wakeups, and the
occasional sleep has a similar effect on them. As long as its occasional
its harmless.

Therefore replace the 'correct' but verbose wait_woken() thing with
a simple annotation to shut up the warning.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-11 15:02:04 +10:30
Peter Zijlstra d64810f561 module: Annotate nested sleep in resolve_symbol()
Because wait_event() loops are safe vs spurious wakeups we can allow the
occasional sleep -- which ends up being very similar.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-11 15:02:04 +10:30
Marcel Holtmann ab92ebbb8e module: Remove double spaces in module verification taint message
The warning message when loading modules with a wrong signature has
two spaces in it:

"module verification failed: signature and/or  required key missing"

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-06 15:31:41 +10:30
Andrey Tsyvarev de96d79f34 kernel/module.c: Free lock-classes if parse_args failed
parse_args call module parameters' .set handlers, which may use locks defined in the module.
So, these classes should be freed in case parse_args returns error(e.g. due to incorrect parameter passed).

Signed-off-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-06 15:31:40 +10:30
Rusty Russell d5db139ab3 module: make module_refcount() a signed integer.
James Bottomley points out that it will be -1 during unload.  It's
only used for diagnostics, so let's not hide that as it could be a
clue as to what's gone wrong.

Cc: Jason Wessel <jason.wessel@windriver.com>
Acked-and-documention-added-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Masami Hiramatsu <maasami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-22 11:15:54 +10:30
Rusty Russell c749637909 module: fix race in kallsyms resolution during module load success.
The kallsyms routines (module_symbol_name, lookup_module_* etc) disable
preemption to walk the modules rather than taking the module_mutex:
this is because they are used for symbol resolution during oopses.

This works because there are synchronize_sched() and synchronize_rcu()
in the unload and failure paths.  However, there's one case which doesn't
have that: the normal case where module loading succeeds, and we free
the init section.

We don't want a synchronize_rcu() there, because it would slow down
module loading: this bug was introduced in 2009 to speed module
loading in the first place.

Thus, we want to do the free in an RCU callback.  We do this in the
simplest possible way by allocating a new rcu_head: if we put it in
the module structure we'd have to worry about that getting freed.

Reported-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-20 11:38:34 +10:30
Rusty Russell be1f221c04 module: remove mod arg from module_free, rename module_memfree().
Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.

This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org
2015-01-20 11:38:33 +10:30
Rusty Russell d453cded05 module_arch_freeing_init(): new hook for archs before module->module_init freed.
Archs have been abusing module_free() to clean up their arch-specific
allocations.  Since module_free() is also (ab)used by BPF and trace code,
let's keep it to simple allocations, and provide a hook called before
that.

This means that avr32, ia64, parisc and s390 no longer need to implement
their own module_free() at all.  avr32 doesn't need module_finalize()
either.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
2015-01-20 11:38:32 +10:30
Linus Torvalds d790be3863 The exciting thing here is the getting rid of stop_machine on module
removal.  This is possible by using a simple atomic_t for the counter,
 rather than our fancy per-cpu counter: it turns out that no one is doing
 a module increment per net packet, so the slowdown should be in the noise.
 
 Also, script fixed for new git version.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUk3cQAAoJENkgDmzRrbjxr44P/25ZBYmKZZ3XM3flt2o0LCti
 1Px+MRbWuXhueWQOYZSXOO3c2ENNuV3siaU4jQZqnxslpdvT4rVsVFkYuwva2vHT
 hqpoq1Hz++yjFJArjERFOdoZ1gxkBbZQQGYm8esToAqU3b2Z74SrU48dPwp65q/1
 r6hbXdWSiKALEBZeW2coi+QVCL/oxE8hmNqDO1mpe82aEKu0xIVpTdU5vAfBIj8/
 Z95U2bx+CjiP7khhSjBGtltLqxL6QXw1m2eg1gO9nf1gJNI0/dAY6IJmFbGz+7Bt
 CAyc9BRsB40Em8G7d7wr4FsURcLfmYNdjtx79j+Rot5PkVIi+Ztv7C1QYlMQESPa
 ESddUMySOmKlzTm50w3ZLvV1ZTRU8TjmttSkzQYZ3csCLkKUgfeL9SAxU9KGoA2l
 jFxrvDcWEHtuU1D/FeYyOofNaD/BflPfdhj4WAm9XnPPi+THEu7fulWJaIP4glHh
 8TpYNbinXuZqXO4nJ41Ad5utbSbBQa4fFBUuViWRTU0TtWJT2HVqn/XoYJ5mnPEz
 IbYh31rQDKFJKzePfscWrJ6XzoF59yGiAVcWcI3HS7aT8bFZGapAQu9mNCVu+cLF
 uRxWrukHG7d8YeYrAtbVXWfxArR155V9QJN55hQ1nKLq2M03gNvYTtAPw2yEsfuw
 u3Fk/KkV1RfaiFurjoG/
 =rDum
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "The exciting thing here is the getting rid of stop_machine on module
  removal.  This is possible by using a simple atomic_t for the counter,
  rather than our fancy per-cpu counter: it turns out that no one is
  doing a module increment per net packet, so the slowdown should be in
  the noise"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  param: do not set store func without write perm
  params: cleanup sysfs allocation
  kernel:module Fix coding style errors and warnings.
  module: Remove stop_machine from module unloading
  module: Replace module_ref with atomic_t refcnt
  lib/bug: Use RCU list ops for module_bug_list
  module: Unlink module with RCU synchronizing instead of stop_machine
  module: Wait for RCU synchronizing before releasing a module
2014-12-18 20:55:41 -08:00
Ionut Alexa 6da0b56515 kernel:module Fix coding style errors and warnings.
Fixed codin style errors and warnings. Changes printk with
print_debug/warn. Changed seq_printf to seq_puts.

Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed bogus KERN_DEFAULT conversion)
2014-11-11 17:07:47 +10:30
Masami Hiramatsu e513cc1c07 module: Remove stop_machine from module unloading
Remove stop_machine from module unloading by adding new reference
counting algorithm.

This atomic refcounter works like a semaphore, it can get (be
incremented) only when the counter is not 0. When loading a module,
kmodule subsystem sets the counter MODULE_REF_BASE (= 1). And when
unloading the module, it subtracts MODULE_REF_BASE from the counter.
If no one refers the module, the refcounter becomes 0 and we can
remove the module safely. If someone referes it, we try to recover
the counter by adding MODULE_REF_BASE unless the counter becomes 0,
because the referrer can put the module right before recovering.
If the recovering is failed, we can get the 0 refcount and it
never be incremented again, it can be removed safely too.

Note that __module_get() forcibly gets the module refcounter,
users should use try_module_get() instead of that.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:07:46 +10:30
Masami Hiramatsu 2f35c41f58 module: Replace module_ref with atomic_t refcnt
Replace module_ref per-cpu complex reference counter with
an atomic_t simple refcnt. This is for code simplification.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:07:46 +10:30
Masami Hiramatsu 0286b5ea12 lib/bug: Use RCU list ops for module_bug_list
Actually since module_bug_list should be used in BUG context,
we may not need this. But for someone who want to use this
from normal context, this makes module_bug_list an RCU list.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:07:46 +10:30
Masami Hiramatsu 461e34aed0 module: Unlink module with RCU synchronizing instead of stop_machine
Unlink module from module list with RCU synchronizing instead
of using stop_machine(). Since module list is already protected
by rcu, we don't need stop_machine() anymore.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:07:45 +10:30
Masami Hiramatsu 4f48795b61 module: Wait for RCU synchronizing before releasing a module
Wait for RCU synchronizing on failure path of module loading
before releasing struct module, because the memory of mod->list
can still be accessed by list walkers (e.g. kallsyms).

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 17:07:44 +10:30
Peter Zijlstra 3c9b2c3d64 sched, modules: Fix nested sleep in add_unformed_module()
This is a genuine bug in add_unformed_module(), we cannot use blocking
primitives inside a wait loop.

So rewrite the wait_event_interruptible() usage to use the fresh
wait_woken() stuff.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20140924082242.458562904@infradead.org
[ So this is probably complex to backport and the race wasn't reported AFAIK,
  so not marked for -stable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:30 +01:00
Linus Torvalds 50edb5cc22 A single panic fix for a rare race, stable CC'd.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUQFD3AAoJENkgDmzRrbjxJu4P/jBrwRdvrdaGSj+p3su1azYi
 ROXspE3HzUpWh3ANItPzVbTT55CXBcXgd5x/MHNGfgolYKaAiVLRtVByMn60vwry
 NXM2xfCluA/ITkra4f1IheKYp6FS2reAmLxEXcgvTrt2Zhb+bhY5ogg+J/jw8WCT
 fXBkj01O5JEPVi0tXAGDuvoKKeRmjop1Sa5TDZS1VGFYrYI/iLK4G3HcwkIiOTzK
 JvehkpIof7GYz69E6zalANPrHknUDk9YWvUniCkJhqUz5mI8vGgt3ZdpLsun5yUQ
 s48qbFJxGRprDSy0D3IY/wEQwelXiqmWZtwXvcKsJWRP1z8JoQYwQiySQfRWcv3d
 0cxCNbdn/YgvLYe+g+Zjy5etOu76dtWspgPRsnmAuFTdX0RbjS+KNXw+TCLCeQfO
 uKZFhz8SGZwIcnUCjVO4Z/MaTuIeFeKHpb1zl1vNw3QvW0LSnb6j2WA6aEGEmFS0
 TNJ0twG1n0VpJRP4uaS3okQOeb7yx/Gmdk1NuA2ZBB8bZHN1Likc6fTW3WKdlW0o
 yFZL0ly5xvJTL582FfRrZk76DXW+S6nvj6/BWsIs+5/ZPGPOxb8pPQw7h30gO6+e
 ECSTDuflFaaKF02phkaE7VajDG2uuLrvGvEpn/1K0hhtzpITfnBV3qaGw32UqwpF
 e02nzk8mDpdewLdXwwB6
 =pl+4
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module fix from Rusty Russell:
 "A single panic fix for a rare race, stable CC'd"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modules, lock around setting of MODULE_STATE_UNFORMED
2014-10-18 10:24:26 -07:00
Prarit Bhargava d3051b489a modules, lock around setting of MODULE_STATE_UNFORMED
A panic was seen in the following sitation.

There are two threads running on the system. The first thread is a system
monitoring thread that is reading /proc/modules. The second thread is
loading and unloading a module (in this example I'm using my simple
dummy-module.ko).  Note, in the "real world" this occurred with the qlogic
driver module.

When doing this, the following panic occurred:

 ------------[ cut here ]------------
 kernel BUG at kernel/module.c:3739!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
 CPU: 37 PID: 186343 Comm: cat Tainted: GF          O--------------   3.10.0+ #7
 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
 RIP: 0010:[<ffffffff810d64c5>]  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
 RSP: 0018:ffff88080fa7fe18  EFLAGS: 00010246
 RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
 RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
 RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
 R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
 R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
 FS:  00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
  ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
  ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
 Call Trace:
  [<ffffffff810d666c>] m_show+0x19c/0x1e0
  [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0
  [<ffffffff812281ed>] proc_reg_read+0x3d/0x80
  [<ffffffff811c0f2c>] vfs_read+0x9c/0x170
  [<ffffffff811c1a58>] SyS_read+0x58/0xb0
  [<ffffffff81605829>] system_call_fastpath+0x16/0x1b
 Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
 RIP  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
  RSP <ffff88080fa7fe18>

    Consider the two processes running on the system.

    CPU 0 (/proc/modules reader)
    CPU 1 (loading/unloading module)

    CPU 0 opens /proc/modules, and starts displaying data for each module by
    traversing the modules list via fs/seq_file.c:seq_open() and
    fs/seq_file.c:seq_read().  For each module in the modules list, seq_read
    does

            op->start()  <-- this is a pointer to m_start()
            op->show()   <- this is a pointer to m_show()
            op->stop()   <-- this is a pointer to m_stop()

    The m_start(), m_show(), and m_stop() module functions are defined in
    kernel/module.c. The m_start() and m_stop() functions acquire and release
    the module_mutex respectively.

    ie) When reading /proc/modules, the module_mutex is acquired and released
    for each module.

    m_show() is called with the module_mutex held.  It accesses the module
    struct data and attempts to write out module data.  It is in this code
    path that the above BUG_ON() warning is encountered, specifically m_show()
    calls

    static char *module_flags(struct module *mod, char *buf)
    {
            int bx = 0;

            BUG_ON(mod->state == MODULE_STATE_UNFORMED);
    ...

    The other thread, CPU 1, in unloading the module calls the syscall
    delete_module() defined in kernel/module.c.  The module_mutex is acquired
    for a short time, and then released.  free_module() is called without the
    module_mutex.  free_module() then sets mod->state = MODULE_STATE_UNFORMED,
    also without the module_mutex.  Some additional code is called and then the
    module_mutex is reacquired to remove the module from the modules list:

        /* Now we can delete it from the lists */
        mutex_lock(&module_mutex);
        stop_machine(__unlink_module, mod, NULL);
        mutex_unlock(&module_mutex);

This is the sequence of events that leads to the panic.

CPU 1 is removing dummy_module via delete_module().  It acquires the
module_mutex, and then releases it.  CPU 1 has NOT set dummy_module->state to
MODULE_STATE_UNFORMED yet.

CPU 0, which is reading the /proc/modules, acquires the module_mutex and
acquires a pointer to the dummy_module which is still in the modules list.
CPU 0 calls m_show for dummy_module.  The check in m_show() for
MODULE_STATE_UNFORMED passed for dummy_module even though it is being
torn down.

Meanwhile CPU 1, which has been continuing to remove dummy_module without
holding the module_mutex, now calls free_module() and sets
dummy_module->state to MODULE_STATE_UNFORMED.

CPU 0 now calls module_flags() with dummy_module and ...

static char *module_flags(struct module *mod, char *buf)
{
        int bx = 0;

        BUG_ON(mod->state == MODULE_STATE_UNFORMED);

and BOOM.

Acquire and release the module_mutex lock around the setting of
MODULE_STATE_UNFORMED in the teardown path, which should resolve the
problem.

Testing: In the unpatched kernel I can panic the system within 1 minute by
doing

while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done

and

while (true) do cat /proc/modules; done

in separate terminals.

In the patched kernel I was able to run just over one hour without seeing
any issues.  I also verified the output of panic via sysrq-c and the output
of /proc/modules looks correct for all three states for the dummy_module.

        dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
        dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
        dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2014-10-15 10:20:09 +10:30
Linus Torvalds 6325e940e7 arm64 updates for 3.18:
- eBPF JIT compiler for arm64
 - CPU suspend backend for PSCI (firmware interface) with standard idle
   states defined in DT (generic idle driver to be merged via a different
   tree)
 - Support for CONFIG_DEBUG_SET_MODULE_RONX
 - Support for unmapped cpu-release-addr (outside kernel linear mapping)
 - set_arch_dma_coherent_ops() implemented and bus notifiers removed
 - EFI_STUB improvements when base of DRAM is occupied
 - Typos in KGDB macros
 - Clean-up to (partially) allow kernel building with LLVM
 - Other clean-ups (extern keyword, phys_addr_t usage)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUNB6NAAoJEGvWsS0AyF7x22sP/1qPQvFoY71fSqTZmSY+kfgW
 UMXhDFZOd+khD2TPHWptbgBRDElTQjRPHyISv/8ILKwDNoMlUDLlYkp1XPLM/nlB
 ea9ou2GX8iktqgM2JF5r4vk1hjH6JqEGOUHyWKZc7ibphTVm3dhg3nWL1A4peOUG
 0UyX79kl8BLAaggLSUhjtUz1GMpSNlb6Pc1ForUXaPMayBlOcVoOzh1ir7b5wb3e
 IvotUY1gv+opE9uK0QPr1AJSfpCogPEfQ2TSCP8MQZjxkrEz69n0HaFvdy60rwf4
 DaJiqBoQ5MSP3Bw+qvoYgyz+tfiPFAvEF+O3YQ5x3LBTteoooriFYH4mL7DsicAs
 2WLor/342mHykE0bOc44/gNl8B/xaZNzvO2ezLYrjVGsiY2QHTZ7fXB8arPUvQSS
 RUXVfHmcv4qthZjI17rgreBKvsfeFIMighSfvMJnVhGqDSvB8abjiPwZjzqB91Bq
 pu5MDitNgR3k3ctwzRaS6JtH2CluVFv97xIS4VaD/hm3JnS5NPeTXFou3Gb3lvon
 d/wXOIB3vY8FDMIt+BMCQPzWiU0liZ/sN7p1bsOmkgZ1wLOZ0nmsaHF09PDRGbtA
 vifopwaw9qtNlcVrTB/rDBCDaT0Ds/mTYD/a3+ch5CYUeLmQmfW/vBMfq/3gUt65
 JdI/nTVXawbl2CpBWw36
 =SAfQ
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 - eBPF JIT compiler for arm64
 - CPU suspend backend for PSCI (firmware interface) with standard idle
   states defined in DT (generic idle driver to be merged via a
   different tree)
 - Support for CONFIG_DEBUG_SET_MODULE_RONX
 - Support for unmapped cpu-release-addr (outside kernel linear mapping)
 - set_arch_dma_coherent_ops() implemented and bus notifiers removed
 - EFI_STUB improvements when base of DRAM is occupied
 - Typos in KGDB macros
 - Clean-up to (partially) allow kernel building with LLVM
 - Other clean-ups (extern keyword, phys_addr_t usage)

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (51 commits)
  arm64: Remove unneeded extern keyword
  ARM64: make of_device_ids const
  arm64: Use phys_addr_t type for physical address
  aarch64: filter $x from kallsyms
  arm64: Use DMA_ERROR_CODE to denote failed allocation
  arm64: Fix typos in KGDB macros
  arm64: insn: Add return statements after BUG_ON()
  arm64: debug: don't re-enable debug exceptions on return from el1_dbg
  Revert "arm64: dmi: Add SMBIOS/DMI support"
  arm64: Implement set_arch_dma_coherent_ops() to replace bus notifiers
  of: amba: use of_dma_configure for AMBA devices
  arm64: dmi: Add SMBIOS/DMI support
  arm64: Correct ftrace calls to aarch64_insn_gen_branch_imm()
  arm64:mm: initialize max_mapnr using function set_max_mapnr
  setup: Move unmask of async interrupts after possible earlycon setup
  arm64: LLVMLinux: Fix inline arm64 assembly for use with clang
  arm64: pageattr: Correctly adjust unaligned start addresses
  net: bpf: arm64: fix module memory leak when JIT image build fails
  arm64: add PSCI CPU_SUSPEND based cpu_suspend support
  arm64: kernel: introduce cpu_init_idle CPU operation
  ...
2014-10-08 05:34:24 -04:00
Kyle McMartin 6c34f1f542 aarch64: filter $x from kallsyms
Similar to ARM, AArch64 is generating $x and $d syms... which isn't
terribly helpful when looking at %pF output and the like. Filter those
out in kallsyms, modpost and when looking at module symbols.

Seems simplest since none of these check EM_ARM anyway, to just add it
to the strchr used, rather than trying to make things overly
complicated.

initcall_debug improves:
dmesg_before.txt: initcall $x+0x0/0x154 [sg] returned 0 after 26331 usecs
dmesg_after.txt: initcall init_sg+0x0/0x154 [sg] returned 0 after 15461 usecs

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-10-02 17:01:51 +01:00
Jani Nikula 6a4c264313 module: rename KERNEL_PARAM_FL_NOARG to avoid confusion
Make it clear this is about kernel_param_ops, not kernel_param (which
will soon have a flags field of its own). No functional changes.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Jon Mason <jon.mason@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-08-27 21:54:07 +09:30
Andy Lutomirski ff7e0055bb module: Clean up ro/nx after early module load failures
The commit

    4982223e51 module: set nx before marking module MODULE_STATE_COMING.

introduced a regression: if a module fails to parse its arguments or
if mod_sysfs_setup fails, then the module's memory will be freed
while still read-only.  Anything that reuses that memory will crash
as soon as it tries to write to it.

Cc: stable@vger.kernel.org # v3.16
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-08-16 04:47:00 +09:30
Linus Torvalds c8d6637d04 This finally applies the stricter sysfs perms checking we pulled out
before last merge window.  A few stragglers are fixed (thanks linux-next!)
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT6CrEAAoJENkgDmzRrbjx3GoQAI1rt8XbTE8zVGf1PKp0SL10
 gWWL9BnnHtUFriwgIbT4mBa1p0wnavIzJIeUBH0rJb2BNAbf7mBT7CFPrMuS+iV2
 WlRoy/chIFnX5A7m6ddaHnzL8lPhMFvUi8dpvxO6FwpyhhNcUHqmb+uCZeLjTX/m
 Gj5mlOlilvH2NSugKyiTapCgcQMQqaaxcwKxyg1z3FRo12gwKvTBdjzdA3Fg7k4T
 TAEbTG4Fq6Q7DkQYDpJK2KWDkPmJ7hxExHFW/M0m1r7DpxY1oHI95TsugU3Mr2mM
 90S15vA6Sn0l1+bRiv5qHF26VjOpdhC8uQhydjnX+lqzBGBRNoMUE/ubmxd43G4m
 /VlVJ9ZD40HLEmRFdtJI6UZSHYwDh7eruVH7Sjj8KFiqGps/F6nDOhV7fVLOdI+0
 J9pLBbj1mA38pIK/XC3r2k8Z/u9GB/7tJFirzmk5rIVzNb/4GBrn/Cgf+GDX7djz
 r8c2QnLeUIht5fm34qKNnSQ/o+ZBKmG6f2bLuBesntZMsAD2cC5TUEP15NERuF3a
 Wa7Wn1Y9WuonH7O3j+PoUOys/bGLXZeFXfKYS8A8SGroE99xo/QhkRm/sNU0+wEz
 JTN4Sra03imE/YSniFnRyRiAShR3KAVen/yfOx6XPs/r5XrFG14Q7cqCKjp1EjHj
 TX5scRWFM5qntTSloGJt
 =9mjn
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "This finally applies the stricter sysfs perms checking we pulled out
  before last merge window.  A few stragglers are fixed (thanks
  linux-next!)"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  arch/powerpc/platforms/powernv/opal-dump.c: fix world-writable sysfs files
  arch/powerpc/platforms/powernv/opal-elog.c: fix world-writable sysfs files
  drivers/video/fbdev/s3c2410fb.c: don't make debug world-writable.
  ARM: avoid ARM binutils leaking ELF local symbols
  scripts: modpost: Remove numeric suffix pattern matching
  scripts: modpost: fix compilation warning
  sysfs: disallow world-writable files.
  module: return bool from within_module*()
  module: add within_module() function
  modules: Fix build error in moduleloader.h
2014-08-10 21:31:58 -07:00
Russell King 2e3a10a155 ARM: avoid ARM binutils leaking ELF local symbols
Symbols starting with .L are ELF local symbols and should not appear
in ELF symbol tables.  However, unfortunately ARM binutils leaks the
.LANCHOR symbols into the symbol table, which leads kallsyms to report
these symbols rather than the real name.  It is not very useful when
%pf reports symbols against these leaked .LANCHOR symbols.

Arrange for kallsyms to ignore these symbols using the same mechanism
that is used for the ARM mapping symbols.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-07-27 20:52:47 +09:30
Petr Mladek 9b20a352d7 module: add within_module() function
It is just a small optimization that allows to replace few
occurrences of within_module_init() || within_module_core()
with a single call.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-07-27 20:52:43 +09:30
Jarod Wilson 002c77a48b crypto: fips - only panic on bad/missing crypto mod signatures
Per further discussion with NIST, the requirements for FIPS state that
we only need to panic the system on failed kernel module signature checks
for crypto subsystem modules. This moves the fips-mode-only module
signature check out of the generic module loading code, into the crypto
subsystem, at points where we can catch both algorithm module loads and
mode module loads. At the same time, make CONFIG_CRYPTO_FIPS dependent on
CONFIG_MODULE_SIG, as this is entirely necessary for FIPS mode.

v2: remove extraneous blank line, perform checks in static inline
function, drop no longer necessary fips.h include.

CC: "David S. Miller" <davem@davemloft.net>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Stephan Mueller <stephan.mueller@atsec.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-07-03 21:38:32 +08:00
Linus Torvalds 4251c2a670 Most of this is cleaning up various driver sysfs permissions so we can
re-add the perm check (we unified the module param and sysfs checks, but
 the module ones were stronger so we weakened them temporarily).
 
 Param parsing gets documented, and also "--" now forces args to be
 handed to init (and ignored by the kernel).
 
 Module NX/RO protections get tightened: we now set them before calling
 parse_args().
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTl+oJAAoJENkgDmzRrbjxtUEP/jIXml01jE2HquOJ/DfrCJOt
 ry5L5Iy8wVBRotTszrXqlD6+W8fLYsEdhM65Wof1H7X1qjaulqYZmrL7bQn4rIGN
 YPUmO5rOzECeAPNW5+e2JLnR4bmS99gVcWzJFCHUBd7Z8ceKaoIk7/XvUg6Mdjg7
 v0kJ5X+U9da2sVYYcZ71euth4ADLFDRNRexA1mPI6mKzJLOBgfvCBWZnkFVdBcjd
 VmL6ceFo/yP9Ed4pgG/4uXq1dZ4ZttpjPusDmNcjq+snOzsQb4tW+KB2Pr6iTwQy
 TDt7lQm5+xfUXgUG/S5L6PYn10P44Voo7AEJa+QK5YPSOY/eRVA0h4/ayP0vqDaJ
 LpZjqXbW77G4yOgEV9KRFLLXiFXykTh2TyCPYL5G2XVXQp1OmViu2f21JWJLFLgL
 mqOXYWdowOGVOOoTgwxIdxczCFCATJUaU5Ig6ay8C02E2mCwIV+IaGSdpsCiyjz/
 dNNumMxWg0NMo/c0YG4K3Ake6ZaGrwbnuJYijaEj6mgpifhh7k4yhFciXGLpkLnS
 Yuo4ORO0GX34z1+bX0iwrgMGPdy7+BnbXsDdWJsbsnwnKKes/Sp44fNl4lPwdM3n
 siaPsxmfAtl9EGqbkU1Fk+x5+X/Lv2I/7/nX5n53520RLkJJpbeMDfHUqpbrqeUN
 JNUTOZ9o72EqDVKnn175
 =IxSN
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Most of this is cleaning up various driver sysfs permissions so we can
  re-add the perm check (we unified the module param and sysfs checks,
  but the module ones were stronger so we weakened them temporarily).

  Param parsing gets documented, and also "--" now forces args to be
  handed to init (and ignored by the kernel).

  Module NX/RO protections get tightened: we now set them before calling
  parse_args()"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: set nx before marking module MODULE_STATE_COMING.
  samples/kobject/: avoid world-writable sysfs files.
  drivers/hid/hid-picolcd_fb: avoid world-writable sysfs files.
  drivers/staging/speakup/: avoid world-writable sysfs files.
  drivers/regulator/virtual: avoid world-writable sysfs files.
  drivers/scsi/pm8001/pm8001_ctl.c: avoid world-writable sysfs files.
  drivers/hid/hid-lg4ff.c: avoid world-writable sysfs files.
  drivers/video/fbdev/sm501fb.c: avoid world-writable sysfs files.
  drivers/mtd/devices/docg3.c: avoid world-writable sysfs files.
  speakup: fix incorrect perms on speakup_acntsa.c
  cpumask.h: silence warning with -Wsign-compare
  Documentation: Update kernel-parameters.tx
  param: hand arguments after -- straight to init
  modpost: Fix resource leak in read_dump()
2014-06-11 16:09:14 -07:00
Rusty Russell 4982223e51 module: set nx before marking module MODULE_STATE_COMING.
We currently set RO & NX on modules very late: after we move them from
MODULE_STATE_UNFORMED to MODULE_STATE_COMING, and after we call
parse_args() (which can exec code in the module).

Much better is to do it in complete_formation() and then call
the notifier.

This means that the notifiers will be called on a module which
is already RO & NX, so that may cause problems (ftrace already
changed so they're unaffected).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-05-14 10:55:47 +09:30
Linus Torvalds 60b88f3941 Fixed one missing place for the new taint flag, and remove a warning
giving only false positives (now we finally figured out why).
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTYdCQAAoJENkgDmzRrbjxms4QALIGN2l8VugSoh3TSRHSGZtj
 5clH84FXkDR8DFA0w9rYxAsr1EhTadet8U1nCm6LWaz8FAPizH2hyUq6tFMU1+Jk
 zdWRPYLhuUBWW+XVFSeYo2gIclFHEYefawX9SmRcZJxuDy7xHW/bkmX/NT5p/Ll7
 3eKRPckO09agofLQgIOJGL21IQPFXYiCwur5b/OvNfzEkBfRmUALbO2oFhU+oebZ
 2P4M3Wmp7gEGbus2dB23v06BqpEhrdpXlAnvM61PS8exhsQI6ojgL3ZAYEl+6wkr
 whd0SjYs5Sd+3czlQDhlArYlcOlVAhvY4F5CHysEmM/CxjF1YAnk2Q7RLOV958Bk
 TTfDGG2b8qkJwN/2+CymDXyIUIppNPMuPXSOp3XQrRGOz8Uyh1URQD8l24Ssmrtt
 +3fUPDZ6npmtkxZdBu0SkdesCXYOtOeqpqt7MQpJiYbVMxx+ul4LnPB/A1+wf/Xx
 uvXMrpp1fz/hs9ZOK8n+nRMtbsc75LDQ0lYGcbbW8YJRkluf5/GJgyG8ptIvbbFW
 kh90ObVaJ2FN0Uj31POdtsOwM7tf2W5C1lZkE/aWf+wgNylHAylYoUHRIGFOcCqV
 PeWrD0Chz+bzrZk1sT6cHIvTu6u5ShjkOfcEGhWK2JFllxpKO4eZV4O1IaGhWaoV
 Y9JtmJNSOnnS261i1Rmb
 =725P
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module fixes from Rusty Russell:
 "Fixed one missing place for the new taint flag, and remove a warning
  giving only false positives (now we finally figured out why)"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: remove warning about waiting module removal.
  Fix: tracing: use 'E' instead of 'X' for unsigned module taint flag
2014-05-01 10:35:01 -07:00
Steven Rostedt (Red Hat) a949ae560a ftrace/module: Hardcode ftrace_module_init() call into load_module()
A race exists between module loading and enabling of function tracer.

	CPU 1				CPU 2
	-----				-----
  load_module()
   module->state = MODULE_STATE_COMING

				register_ftrace_function()
				 mutex_lock(&ftrace_lock);
				 ftrace_startup()
				  update_ftrace_function();
				   ftrace_arch_code_modify_prepare()
				    set_all_module_text_rw();
				   <enables-ftrace>
				    ftrace_arch_code_modify_post_process()
				     set_all_module_text_ro();

				[ here all module text is set to RO,
				  including the module that is
				  loading!! ]

   blocking_notifier_call_chain(MODULE_STATE_COMING);
    ftrace_init_module()

     [ tries to modify code, but it's RO, and fails!
       ftrace_bug() is called]

When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.

The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.

The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-28 10:37:21 -04:00
Rusty Russell 51e158c12a param: hand arguments after -- straight to init
The kernel passes any args it doesn't need through to init, except it
assumes anything containing '.' belongs to the kernel (for a module).
This change means all users can clearly distinguish which arguments
are for init.

For example, the kernel uses debug ("dee-bug") to mean log everything to
the console, where systemd uses the debug from the Scandinavian "day-boog"
meaning "fail to boot".  If a future versions uses argv[] instead of
reading /proc/cmdline, this confusion will be avoided.

eg: test 'FOO="this is --foo"' -- 'systemd.debug="true true true"'

Gives:
argv[0] = '/debug-init'
argv[1] = 'test'
argv[2] = 'systemd.debug=true true true'
envp[0] = 'HOME=/'
envp[1] = 'TERM=linux'
envp[2] = 'FOO=this is --foo'

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-04-28 11:48:34 +09:30
Rusty Russell 79465d2fd4 module: remove warning about waiting module removal.
We remove the waiting module removal in commit 3f2b9c9cdf (September
2013), but it turns out that modprobe in kmod (< version 16) was
asking for waiting module removal.  No one noticed since modprobe would
check for 0 usage immediately before trying to remove the module, and
the race is unlikely.

However, it means that anyone running old (but not ancient) kmod
versions is hitting the printk designed to see if anyone was running
"rmmod -w".  All reports so far have been false positives, so remove
the warning.

Fixes: 3f2b9c9cdf
Reported-by: Valerio Vanni <valerio.vanni@inwind.it>
Cc: Elliott, Robert (Server Storage) <Elliott@hp.com>
Cc: stable@kernel.org
Acked-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-04-28 11:06:59 +09:30
Christoph Lameter 08f141d3db modules: use raw_cpu_write for initialization of per cpu refcount.
The initialization of a structure is not subject to synchronization.
The use of __this_cpu would trigger a false positive with the additional
preemption checks for __this_cpu ops.

So simply disable the check through the use of raw_cpu ops.

Trace:

  __this_cpu_write operation in preemptible [00000000] code: modprobe/286
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    dump_stack+0x4e/0x82
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    load_module+0xcfd/0x2650
    SyS_init_module+0xa6/0xd0
    tracesys+0xe1/0xe6

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Linus Torvalds 6f4c98e1c2 Nothing major: the stricter permissions checking for sysfs broke
a staging driver; fix included.  Greg KH said he'd take the patch
 but hadn't as the merge window opened, so it's included here
 to avoid breaking build.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQMH9AAoJENkgDmzRrbjxo4UP/jwlenP44v+RFpo/dn8Z8E2n
 SREQscU5ZZKvuyFD6kUdvOz8YC/nTrJvXoVkMUF05GVbuvb8/8UPtT9ECVemd0rW
 xNy4aFfv9rbrqRLBLpLK9LAgTuhwlbTgGxgL78zRn3hWmf1hBZWCY+cEvKM8l/+9
 oEQdORL0sUpZh7iryAeGqbOrXT4gqJEvSLOFwiYTSo6ryzWIilmdXSUAh6s8MIEX
 PR1+oH9J8B6J29lcXKMf8/sDI1EBUeSLdBmMCuN5Y7xpYxsQLroVx94kPbdBY+XK
 ZRoYuUGSUJfGRZY46cFKApIGeF07z1DGoyXghbSWEQrI+23TMUmrKUg47LSukE4Y
 yCUf8HAtqIA3gVc9GKDdSp/2UpkAhTTv5ogKgnIzs1InWtOIBdDRSVUQXDosFEXw
 6ZZe1pQs2zfXyXxO4j0Wq36K4RgI0aqOVw+dcC+w5BidjVylgnYRV0PSDd72tid7
 bIfnjDbUBo+o4LanPNGYK474KyO7AslgTE50w6zwbJzgdwCQ36hCpKqScBZzm60a
 42LrgTVoIHHWAL1tDzWL/LzWflZGdJAezzNje0/f2Q3bGMiNHWoljAvUphkTZ7qt
 E8+jWqmM+riH3e8Y5wKpO1BKt7NGHISEy//bUlnqTwisjIzVILZ6VjfugQ1AI+0x
 llTXPBotFvfvXqxunBg7
 =yzUO
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Nothing major: the stricter permissions checking for sysfs broke a
  staging driver; fix included.  Greg KH said he'd take the patch but
  hadn't as the merge window opened, so it's included here to avoid
  breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  staging: fix up speakup kobject mode
  Use 'E' instead of 'X' for unsigned module taint flag.
  VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
  kallsyms: fix percpu vars on x86-64 with relocation.
  kallsyms: generalize address range checking
  module: LLVMLinux: Remove unused function warning from __param_check macro
  Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
  module: remove MODULE_GENERIC_TABLE
  module: allow multiple calls to MODULE_DEVICE_TABLE() per module
  module: use pr_cont
2014-04-06 09:38:07 -07:00
Linus Torvalds 176ab02d49 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LTO changes from Peter Anvin:
 "More infrastructure work in preparation for link-time optimization
  (LTO).  Most of these changes is to make sure symbols accessed from
  assembly code are properly marked as visible so the linker doesn't
  remove them.

  My understanding is that the changes to support LTO are still not
  upstream in binutils, but are on the way there.  This patchset should
  conclude the x86-specific changes, and remaining patches to actually
  enable LTO will be fed through the Kbuild tree (other than keeping up
  with changes to the x86 code base, of course), although not
  necessarily in this merge window"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  Kbuild, lto: Handle basic LTO in modpost
  Kbuild, lto: Disable LTO for asm-offsets.c
  Kbuild, lto: Add a gcc-ld script to let run gcc as ld
  Kbuild, lto: add ld-version and ld-ifversion macros
  Kbuild, lto: Drop .number postfixes in modpost
  Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
  lto: Disable LTO for sys_ni
  lto: Handle LTO common symbols in module loader
  lto, workaround: Add workaround for initcall reordering
  lto: Make asmlinkage __visible
  x86, lto: Disable LTO for the x86 VDSO
  initconst, x86: Fix initconst mistake in ts5500 code
  initconst: Fix initconst mistake in dcdbas
  asmlinkage: Make trace_hardirqs_on/off_caller visible
  asmlinkage, x86: Fix 32bit memcpy for LTO
  asmlinkage Make __stack_chk_failed and memcmp visible
  asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
  asmlinkage: Make main_extable_sort_needed visible
  asmlinkage, mutex: Mark __visible
  asmlinkage: Make trace_hardirq visible
  ...
2014-03-31 14:13:25 -07:00
Rusty Russell 57673c2b0b Use 'E' instead of 'X' for unsigned module taint flag.
Takashi Iwai <tiwai@suse.de> says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules.  Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)

Note: this code should be cleaned up, so we don't have such maps in
three places!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-31 14:52:43 +10:30
Dave Jones 8c90487cdc Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warrany.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20 16:28:09 -07:00
Mathieu Desnoyers 66cc69e34e Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.

This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.

Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.

With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.

Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).

Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
NAKed-by: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 12:11:51 +10:30
Jiri Slaby 27bba4d6bb module: use pr_cont
When dumping loaded modules, we print them one by one in separate
printks. Let's use pr_cont as they are continuation prints.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 12:10:59 +10:30
Joe Mario 80375980f1 lto: Handle LTO common symbols in module loader
Here is the workaround I made for having the kernel not reject modules
built with -flto.  The clean solution would be to get the compiler to not
emit the symbol.  Or if it has to emit the symbol, then emit it as
initialized data but put it into a comdat/linkonce section.

Minor tweaks by AK over Joe's patch.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391846481-31491-5-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 20:24:50 -08:00
Tetsuo Handa 22e669568d module: Add missing newline in printk call.
Add missing \n and also follow commit bddb12b3 "kernel/module.c: use pr_foo()".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-01-21 09:59:16 +10:30
Linus Torvalds ce6513f758 Mainly boring here, too. rmmod --wait finally removed, though.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSe3ngAAoJENkgDmzRrbjxqkMP/jFwTIVy+tZbPL36xR4C7UI/
 JZ9JU2c2HTyAtqp/T/bljA0QUDQWUASCfwG5WmRgyvMkEwhfuGrQ3dveQLRq5iKD
 Ln/LIN8JXXijhRr+ywhXLAcp1P5ysSJJYYS5lZTCmJ2Cv9jnAvmUl0KqdTEx+ZNH
 YsWBiI9+WmwhODiAdUlqtThDK37w8OsWeMq2agf97bBERlRYnRZvzwy3tSP2mf5j
 4wx8viOdzPC7NVblyX1cj3gonFFQJtMI4s/e787QzkUpNQjvrN3XecPiQX6aBCX3
 seVjuv6panv1tw1HqyU1KXWo7fs2uCc9mVR5Rr3Zok+8qpKWkj0dyCnF3A+ufsrO
 vlkrFLUsv/U1NUkWJM6mJKzMjKRD4iF702QsEEpNA5rlOsAMMGSSlju4eu6GvadI
 ZJ+ZDaNWUDPbWa9Xgjyp+DKWR6vybNgEHZmLmcCdeLt1u8Th1E/ujsKxv4SN6eIO
 2v+lNPjGEivoNXUX52toRZ1324U3FFzburCSA0c55+r1sjPT6SXCfl8kISSKvVtt
 iFemsDxhaSwqVzqbsx3ztU010Z0f9uVbpZHAQgZ514Uk25HtwhkaQSdiIP+cPXE8
 rClzj9m4gD+Jy0T+P0HjPlSxKCGSlgLiEBWEigX36/F4Isv+GL1HjvrGGCWM4VnO
 lIyw5ux/UH8USct9nH4x
 =xg2p
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Mainly boring here, too.  rmmod --wait finally removed, though"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modpost: fix bogus 'exported twice' warnings.
  init: fix in-place parameter modification regression
  asmlinkage, module: Make ksymtab and kcrctab symbols and __this_module __visible
  kernel: add support for init_array constructors
  modpost: Optionally ignore secondary errors seen if a single module build fails
  module: remove rmmod --wait option.
2013-11-15 13:27:50 +09:00
Andrew Morton bddb12b32f kernel/module.c: use pr_foo()
kernel/module.c uses a mix of printk(KERN_foo and pr_foo().  Convert it
all to pr_foo and make the offered cleanups.

Not sure what to do about the printk(KERN_DEFAULT).  We don't have a
pr_default().

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joe Perches <joe@perches.com>
Cc: Frantisek Hrbata <fhrbata@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:34 +09:00
Frantisek Hrbata eb3057df73 kernel: add support for init_array constructors
This adds the .init_array section as yet another section with constructors. This
is needed because gcc could add __gcov_init calls to .init_array or .ctors
section, depending on gcc (and binutils) version .

v2: - reuse mod->ctors for .init_array section for modules, because gcc uses
      .ctors or .init_array, but not both at the same time
v3: - fail to load if that does happen somehow.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 15:05:17 +10:30
Rusty Russell 3f2b9c9cdf module: remove rmmod --wait option.
The option to wait for a module reference count to reach zero was in
the initial module implementation, but it was never supported in
modprobe (you had to use rmmod --wait).  After discussion with Lucas,
It has been deprecated (with a 10 second sleep) in kmod for the last
year.

This finally removes it: the flag will evoke a printk warning and a
normal (non-blocking) remove attempt.

Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-23 15:44:58 +09:30
Linus Torvalds 45d9a2220f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "Unfortunately, this merge window it'll have a be a lot of small piles -
  my fault, actually, for not keeping #for-next in anything that would
  resemble a sane shape ;-/

  This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
  cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
  components) + several long-standing patches from various folks.

  There definitely will be a lot more (starting with Miklos'
  check_submount_and_drop() series)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  direct-io: Handle O_(D)SYNC AIO
  direct-io: Implement generic deferred AIO completions
  add formats for dentry/file pathnames
  kvm eventfd: switch to fdget
  powerpc kvm: use fdget
  switch fchmod() to fdget
  switch epoll_ctl() to fdget
  switch copy_module_from_fd() to fdget
  git simplify nilfs check for busy subtree
  ibmasmfs: don't bother passing superblock when not needed
  don't pass superblock to hypfs_{mkdir,create*}
  don't pass superblock to hypfs_diag_create_files
  don't pass superblock to hypfs_vm_create_files()
  oprofile: get rid of pointless forward declarations of struct super_block
  oprofilefs_create_...() do not need superblock argument
  oprofilefs_mkdir() doesn't need superblock argument
  don't bother with passing superblock to oprofile_create_stats_files()
  oprofile: don't bother with passing superblock to ->create_files()
  don't bother passing sb to oprofile_create_files()
  coh901318: don't open-code simple_read_from_buffer()
  ...
2013-09-05 08:50:26 -07:00
Al Viro a2e0578be3 switch copy_module_from_fd() to fdget
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 23:04:44 -04:00
Li Zhong 942e443127 module: Fix mod->mkobj.kobj potentially freed too early
DEBUG_KOBJECT_RELEASE helps to find the issue attached below.

After some investigation, it seems the reason is:
The mod->mkobj.kobj(ffffffffa01600d0 below) is freed together with mod
itself in free_module(). However, its children still hold references to
it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
child(holders below) tries to decrease the reference count to its parent
in kobject_del(), BUG happens as it tries to access already freed memory.

This patch tries to fix it by waiting for the mod->mkobj.kobj to be
really released in the module removing process (and some error code
paths).

[ 1844.175287] kobject: 'holders' (ffff88007c1f1600): kobject_release, parent ffffffffa01600d0 (delayed)
[ 1844.178991] kobject: 'notes' (ffff8800370b2a00): kobject_release, parent ffffffffa01600d0 (delayed)
[ 1845.180118] kobject: 'holders' (ffff88007c1f1600): kobject_cleanup, parent ffffffffa01600d0
[ 1845.182130] kobject: 'holders' (ffff88007c1f1600): auto cleanup kobject_del
[ 1845.184120] BUG: unable to handle kernel paging request at ffffffffa01601d0
[ 1845.185026] IP: [<ffffffff812cda81>] kobject_put+0x11/0x60
[ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
[ 1845.185026] Oops: 0000 [#1] PREEMPT
[ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
[ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G           O 3.11.0-rc6-next-20130819+ #1
[ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1845.185026] Workqueue: events kobject_delayed_cleanup
[ 1845.185026] task: ffff88007ca51f00 ti: ffff88007ca5c000 task.ti: ffff88007ca5c000
[ 1845.185026] RIP: 0010:[<ffffffff812cda81>]  [<ffffffff812cda81>] kobject_put+0x11/0x60
[ 1845.185026] RSP: 0018:ffff88007ca5dd08  EFLAGS: 00010282
[ 1845.185026] RAX: 0000000000002000 RBX: ffffffffa01600d0 RCX: ffffffff8177d638
[ 1845.185026] RDX: ffff88007ca5dc18 RSI: 0000000000000000 RDI: ffffffffa01600d0
[ 1845.185026] RBP: ffff88007ca5dd18 R08: ffffffff824e9810 R09: ffffffffffffffff
[ 1845.185026] R10: ffff8800ffffffff R11: dead4ead00000001 R12: ffffffff81a95040
[ 1845.185026] R13: ffff88007b27a960 R14: ffff88007c1f1600 R15: 0000000000000000
[ 1845.185026] FS:  0000000000000000(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
[ 1845.185026] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1845.185026] CR2: ffffffffa01601d0 CR3: 0000000037207000 CR4: 00000000000006b0
[ 1845.185026] Stack:
[ 1845.185026]  ffff88007c1f1600 ffff88007c1f1600 ffff88007ca5dd38 ffffffff812cdb7e
[ 1845.185026]  0000000000000000 ffff88007c1f1640 ffff88007ca5dd68 ffffffff812cdbfe
[ 1845.185026]  ffff88007c974800 ffff88007c1f1640 ffff88007ff61a00 0000000000000000
[ 1845.185026] Call Trace:
[ 1845.185026]  [<ffffffff812cdb7e>] kobject_del+0x2e/0x40
[ 1845.185026]  [<ffffffff812cdbfe>] kobject_delayed_cleanup+0x6e/0x1d0
[ 1845.185026]  [<ffffffff81063a45>] process_one_work+0x1e5/0x670
[ 1845.185026]  [<ffffffff810639e3>] ? process_one_work+0x183/0x670
[ 1845.185026]  [<ffffffff810642b3>] worker_thread+0x113/0x370
[ 1845.185026]  [<ffffffff810641a0>] ? rescuer_thread+0x290/0x290
[ 1845.185026]  [<ffffffff8106bfba>] kthread+0xda/0xe0
[ 1845.185026]  [<ffffffff814ff0f0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
[ 1845.185026]  [<ffffffff8150751a>] ret_from_fork+0x7a/0xb0
[ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
[ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d <f6> 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84
[ 1845.185026] RIP  [<ffffffff812cda81>] kobject_put+0x11/0x60
[ 1845.185026]  RSP <ffff88007ca5dd08>
[ 1845.185026] CR2: ffffffffa01601d0
[ 1845.185026] ---[ end trace 49a70afd109f5653 ]---

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-03 16:35:47 +09:30
Chen Gang cc56ded3fd kernel/module.c: use scnprintf() instead of sprintf()
For some strings, they are permitted to be larger than PAGE_SIZE, so
need use scnprintf() instead of sprintf(), or it will cause issue.

One case is:

  if a module version is crazy defined (length more than PAGE_SIZE),
  'modinfo' command is still OK (print full contents),
  but for "cat /sys/modules/'modname'/version", will cause issue in kernel.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-08-20 15:37:46 +09:30
Steven Rostedt 0ce814096f module: Add NOARG flag for ops with param_set_bool_enable_only() set function
The ops that uses param_set_bool_enable_only() as its set function can
easily handle being used without an argument. There's no reason to
fail the loading of the module if it does not have one.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-08-20 15:37:43 +09:30
Linus Torvalds 8133633368 Nothing interesting. Except the most embarrassing bugfix ever. But let's
ignore that.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR3NrgAAoJENkgDmzRrbjxaQ0P/RwaKDlmrrFiFD6yYZRQKM1G
 esv3H5j3jU29Eo5WvY1IEoWbMCK4ObL9DL86CCL/XgYZIhJfTiND+R2W8XBSEXec
 qwJXb3oMWVMDBBnxknl/pf0RWhwaFwxsQ89Yco6zsGTWQ0kB6+PAaUNOh91MBOYJ
 UM3nJD79Xqv9X/WcTtPNUmVmmZ7WJFqQ3UYhpTL2WRfRnEO/ku8GwAvLunSFuLBP
 iI4IjSMM8ssLL10g3Cm7J/pzLTTRctmjEZzIclACUrAgm3V+s1NeARztKZiuCVoz
 3Eu0VIp5t2rvifOajj0UF0pdO8Q3XpsMj+lr21gg2rc2VTHKZ3oobEzGX7Ev0lae
 BH1DdW3ivgyq3cBi56Dh3aZDggF24h+QO6qN92s73l0qYK1t+eAg1qc76lsayclB
 ozJJD9/xuEqLV3GScbMdqb4AxTc6Y0ks0SWD07PvoQkthH/2tPiLpSvrITdKdCrY
 0+RssMuO4StM1e8ALjHqDz+/2k8MS70b0e8JUp0e1WaKzMUKKUYZIqQTGqczHH7u
 4hWJiSMnsZqzLLKUX2A6nVlg0eo6zgyqHuxk6O4fHghXWOJJlLjWcMQW26dBS3ki
 FpWYsJeHw6HcITcbWXkdozygf0GOwiaHaI6ivVnB3L8spc6VFD4gvDpbcpAeixGX
 BfFA7nG/Hy3aOT05juMj
 =OVO/
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Nothing interesting.  Except the most embarrassing bugfix ever.  But
  let's ignore that"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: cleanup call chain.
  module: do percpu allocation after uniqueness check.  No, really!
  modules: don't fail to load on unknown parameters.
  ABI: Clarify when /sys/module/MODULENAME is created
  There is no /sys/parameters
  module: don't modify argument of module_kallsyms_lookup_name()
2013-07-10 14:51:41 -07:00
Rusty Russell 9eb76d7797 module: cleanup call chain.
Fold alloc_module_percpu into percpu_modalloc().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-03 10:15:10 +09:30
Rusty Russell 8d8022e8ab module: do percpu allocation after uniqueness check. No, really!
v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
percpu memory on large machines:

    Now we have a new state MODULE_STATE_UNFORMED, we can insert the
    module into the list (and thus guarantee its uniqueness) before we
    allocate the per-cpu region.

In my defence, it didn't actually say the patch did this.  Just that
we "can".

This patch actually *does* it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Jim Hull <jim.hull@hp.com>
Cc: stable@kernel.org # 3.8
2013-07-03 10:15:09 +09:30
Rusty Russell 54041d8a73 modules: don't fail to load on unknown parameters.
Although parameters are supposed to be part of the kernel API, experimental
parameters are often removed.  In addition, downgrading a kernel might cause
previously-working modules to fail to load.

On balance, it's probably better to warn, and load the module anyway.
This may let through a typo, but at least the logs will show it.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-02 15:38:21 +09:30
Mathias Krause 4f6de4d51f module: don't modify argument of module_kallsyms_lookup_name()
If we pass a pointer to a const string in the form "module:symbol"
module_kallsyms_lookup_name() will try to split the string at the colon,
i.e., will try to modify r/o data. That will, in fact, fail on a kernel
with enabled CONFIG_DEBUG_RODATA.

Avoid modifying the passed string in module_kallsyms_lookup_name(),
modify find_module_all() instead to pass it the module name length.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-02 15:38:18 +09:30
Steven Rostedt 89c837351d kmemleak: No need for scanning specific module sections
As kmemleak now scans all module sections that are allocated, writable
and non executable, there's no need to scan individual sections that
might reference data.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-17 09:53:36 +01:00
Steven Rostedt 06c9494c0e kmemleak: Scan all allocated, writeable and not executable module sections
Instead of just picking data sections by name (names that start
with .data, .bss or .ref.data), use the section flags and scan all
sections that are allocated, writable and not executable. Which should
cover all sections of a module that might reference data.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[catalin.marinas@arm.com: removed unused 'name' variable]
[catalin.marinas@arm.com: collapsed 'if' blocks]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-17 09:53:07 +01:00
Rusty Russell 944a1fa012 module: don't unlink the module until we've removed all exposure.
Otherwise we get a race between unload and reload of the same module:
the new module doesn't see the old one in the list, but then fails because
it can't register over the still-extant entries in sysfs:

 [  103.981925] ------------[ cut here ]------------
 [  103.986902] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xab/0xd0()
 [  103.993606] Hardware name: CrownBay Platform
 [  103.998075] sysfs: cannot create duplicate filename '/module/pch_gbe'
 [  104.004784] Modules linked in: pch_gbe(+) [last unloaded: pch_gbe]
 [  104.011362] Pid: 3021, comm: modprobe Tainted: G        W    3.9.0-rc5+ #5
 [  104.018662] Call Trace:
 [  104.021286]  [<c103599d>] warn_slowpath_common+0x6d/0xa0
 [  104.026933]  [<c1168c8b>] ? sysfs_add_one+0xab/0xd0
 [  104.031986]  [<c1168c8b>] ? sysfs_add_one+0xab/0xd0
 [  104.037000]  [<c1035a4e>] warn_slowpath_fmt+0x2e/0x30
 [  104.042188]  [<c1168c8b>] sysfs_add_one+0xab/0xd0
 [  104.046982]  [<c1168dbe>] create_dir+0x5e/0xa0
 [  104.051633]  [<c1168e78>] sysfs_create_dir+0x78/0xd0
 [  104.056774]  [<c1262bc3>] kobject_add_internal+0x83/0x1f0
 [  104.062351]  [<c126daf6>] ? kvasprintf+0x46/0x60
 [  104.067231]  [<c1262ebd>] kobject_add_varg+0x2d/0x50
 [  104.072450]  [<c1262f07>] kobject_init_and_add+0x27/0x30
 [  104.078075]  [<c1089240>] mod_sysfs_setup+0x80/0x540
 [  104.083207]  [<c1260851>] ? module_bug_finalize+0x51/0xc0
 [  104.088720]  [<c108ab29>] load_module+0x1429/0x18b0

We can teardown sysfs first, then to be sure, put the state in
MODULE_STATE_UNFORMED so it's ignored while we deconstruct it.

Reported-by: Veaceslav Falico <vfalico@redhat.com>
Tested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-04-17 13:23:02 +09:30
James Hogan a4b6a77b77 module: fix symbol versioning with symbol prefixes
Fix symbol versioning on architectures with symbol prefixes. Although
the build was free from warnings the actual modules still wouldn't load
as the ____versions table contained unprefixed symbol names, which were
being compared against the prefixed symbol names when checking the
symbol versions.

This is fixed by modifying modpost to add the symbol prefix to the
____versions table it outputs (Modules.symvers still contains unprefixed
symbol names). The check_modstruct_version() function is also fixed as
it checks the version of the unprefixed "module_layout" symbol which
would no longer work.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Kliegman <kliegs@chromium.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use VMLINUX_SYMBOL_STR)
2013-03-20 11:27:26 +10:30
Rusty Russell b92021b09d CONFIG_SYMBOL_PREFIX: cleanup.
We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
"_".  But Al Viro broke this in "consolidate cond_syscall and
SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
do so.

Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
prefix it so something.  So various places define helpers which are
defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
   CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
   for pasting.

(arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

Let's solve this properly:
1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
2) Make linux/export.h usable from asm.
3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
4) Make everyone use them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com> (metag)
2013-03-15 15:09:43 +10:30
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro 3dadecce20 switch vfs_getattr() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:08 -05:00