Commit Graph

125 Commits

Author SHA1 Message Date
Xiongfeng Wang e6bbde8b2b EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location
Reading those sysfs entries gives:

  [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/max_location
  memory 3 [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/dimm0/dimm_location
  memory 0 [root@localhost /]#

Add newlines after the value it prints for better readability.

  [ bp: Make len a signed int and change the check to catch wraparound.
    Increment the pointer p only when the length check passes. Use
    scnprintf(). ]

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1600051734-8993-1-git-send-email-wangxiongfeng2@huawei.com
2020-09-18 09:14:01 +02:00
Robert Richter 4aa92c8646 EDAC/mc: Remove per layer counters
Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
turns out that only the leaves in the memory hierarchy are consumed
(in sysfs), but not the intermediate layers, e.g.:

  count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];

These unused counters only add complexity, remove them. The error
counter values are directly stored in struct dimm_info now.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com
2020-02-17 13:37:00 +01:00
Robert Richter bea1bfd5b7 EDAC/mc: Change mci device removal to use put_device()
There are dimm and csrow devices linked to the mci device esp. to show
up in sysfs. It must be granted that children devices are removed before
its mci parent. Thus, the release functions must be called in the
correct order and may not miss any child before releasing its parent. In
the current implementation this is only granted by the correct order of
release functions.

A much better approach is to use put_device() that releases the device
only after all users are gone. It is the recommended way to release a
device and free its memory. The function uses the device's refcount and
only frees it if there are no users of it anymore such as children.

So implement a mci_release() function to remove mci devices, use
put_device() to free them and early initialize the mci device right
after its struct has been allocated.

Change the release function so that it can be universally used no
matter if the device is registered or not. Since subsequent dimm
and csrow sysfs links are implemented as children devices, their
refcounts will keep the parent mci device from being removed as long
as sysfs entries exist and until all users have been unregistered in
edac_remove_sysfs_mci_device().

Remove edac_unregister_sysfs() and merge mci sysfs removal into
edac_remove_sysfs_mci_device(). There is only a single instance now that
removes the sysfs entries. The function can now be used in the error
paths for cleanup.

Also, create device release functions for all involved devices
(dev->release), remove device_type release functions (dev_type->
release) and also use dev->init_name instead of dev_set_name().

 [ bp: Massage commit message and comments. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com
2020-02-17 12:32:44 +01:00
Robert Richter 4d59588c09 EDAC/sysfs: Remove csrow objects on errors
All created csrow objects must be removed in the error path of
edac_create_csrow_objects(). The objects have been added as devices.

They need to be removed by doing a device_del() *and* put_device() call
to also free their memory. The missing put_device() leaves a memory
leak. Use device_unregister() instead of device_del() which properly
unregisters the device doing both.

Fixes: 7adc05d2dc ("EDAC/sysfs: Drop device references properly")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com
2020-02-13 13:29:41 +01:00
Robert Richter 216aa145aa EDAC/mc: Fix use-after-free and memleaks during device removal
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:

1) Use-after-free:

On 27.11.19 17:07:33, John Garry wrote:
> [   22.104498] BUG: KASAN: use-after-free in
> edac_remove_sysfs_mci_device+0x148/0x180

The use-after-free is caused by the mci_for_each_dimm() macro called in
edac_remove_sysfs_mci_device(). The iterator was introduced with

  c498afaf7d ("EDAC: Introduce an mci_for_each_dimm() iterator").

The iterator loop calls device_unregister(&dimm->dev), which removes
the sysfs entry of the device, but also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
the dimm struct is accessed again, after having been freed already.

The fix is to free all the mci device's subsequent dimm and csrow
objects at a later point, in _edac_mc_free(), when the mci device itself
is being freed.

This keeps the data structures intact and the mci device can be
fully used until its removal. The change allows the safe usage of
mci_for_each_dimm() to release dimm devices from sysfs.

2) Memory leaks:

Following memory leaks have been detected:

 # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
       1     [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0      # mci->csrows
      16     [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0      # csr->channels
      16     [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0      # csr->channels[chn]
       1     [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0      # mci->dimms
      34     [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()

All leaks are from memory allocated by edac_mc_alloc().

Note: The test above shows that edac_mc_alloc() was called here from
ghes_edac_register(), thus both functions show up in the stack trace
but the module causing the leaks is edac_mc. The comments with the data
structures involved were made manually by analyzing the objdump.

The data structures listed above and created by edac_mc_alloc() are
not properly removed during device removal, which is done in
edac_mc_free().

There are two paths implemented to remove the device depending on device
registration, _edac_mc_free() is called if the device is not registered
and edac_unregister_sysfs() otherwise.

The implemenations differ. For the sysfs case, the mci device removal
lacks the removal of subsequent data structures (csrows, channels,
dimms). This causes the memory leaks (see mci_attr_release()).

 [ bp: Massage commit message. ]

Fixes: c498afaf7d ("EDAC: Introduce an mci_for_each_dimm() iterator")
Fixes: faa2ad09c0 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
Fixes: 7a623c0390 ("edac: rewrite the sysfs code to use struct device")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
2020-02-13 13:28:52 +01:00
Robert Richter c498afaf7d EDAC: Introduce an mci_for_each_dimm() iterator
Introduce an mci_for_each_dimm() iterator. It returns a pointer to
a struct dimm_info. This makes the declaration and use of an index
obsolete and avoids access to internal data of struct mci (direct array
access etc).

 [ bp: push the struct dimm_info *dimm; declaration into the
   CONFIG_EDAC_DEBUG block. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
2019-11-10 12:39:40 +01:00
Robert Richter 977b1ce7c1 EDAC: Remove EDAC_DIMM_OFF() macro
The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
Simplify this by storing the index in struct dimm_info to avoid its
calculation and remove the EDAC_DIMM_OFF() macro. The index can be
directly used then.

Another advantage is that edac_mc_alloc() could be used even if the
exact size of the layers is unknown. Only the number of DIMMs would be
needed.

Rename iterator variable to idx, while at it. The name is more handy,
esp. when searching for it in the code.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
2019-11-09 11:23:49 +01:00
Robert Richter e701f41203 EDAC/mc_sysfs: Make debug messages consistent
Debug messages are inconsistently used in the error handlers. Some lack
an error message, some are called regardless of the return status,
messages for the same error are at different locations in the code
depending on the error code. This happens esp. near put_device() calls.

Make those debug messages more consistent. Additionally, unify the error
messages to have the same terms for the same operations of the device.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-5-rrichter@marvell.com
2019-09-04 11:39:19 +02:00
Robert Richter 644110e17d EDAC/mc_sysfs: Remove pointless gotos
Use direct returns instead of gotos. Error handling code becomes
smaller and better readable.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-4-rrichter@marvell.com
2019-09-03 19:24:10 +02:00
Robert Richter d55c79ac86 EDAC: Prefer 'unsigned int' to bare use of 'unsigned'
Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for
edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl.

While at it, struct member dev_ch_attribute->channel is always used as
unsigned int. Change type to unsigned int to avoid type casts.

 [ bp: Massage. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
2019-09-03 19:21:19 +02:00
Eiichi Tsukata d8655e7630 EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
Commit 9da21b1509 ("EDAC: Poll timeout cannot be zero, p2") assumes
edac_mc_poll_msec to be unsigned long, but the type of the variable still
remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
write.

Reproducer:

  # echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec

KASAN report:

  BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
  Write of size 8 at addr ffffffffb91b2d00 by task bash/1996

  CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  Call Trace:
   dump_stack+0xca/0x13e
   print_address_description.cold+0x5/0x246
   __kasan_report.cold+0x75/0x9a
   ? edac_set_poll_msec+0x140/0x150
   kasan_report+0xe/0x20
   edac_set_poll_msec+0x140/0x150
   ? dimmdev_location_show+0x30/0x30
   ? vfs_lock_file+0xe0/0xe0
   ? _raw_spin_lock+0x87/0xe0
   param_attr_store+0x1b5/0x310
   ? param_array_set+0x4f0/0x4f0
   module_attr_store+0x58/0x80
   ? module_attr_show+0x80/0x80
   sysfs_kf_write+0x13d/0x1a0
   kernfs_fop_write+0x2bc/0x460
   ? sysfs_kf_bin_read+0x270/0x270
   ? kernfs_notify+0x1f0/0x1f0
   __vfs_write+0x81/0x100
   vfs_write+0x1e1/0x560
   ksys_write+0x126/0x250
   ? __ia32_sys_read+0xb0/0xb0
   ? do_syscall_64+0x1f/0x390
   do_syscall_64+0xc1/0x390
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7fa7caa5e970
  Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
  RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
  RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
  RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
  R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
  R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005

  The buggy address belongs to the variable:
   edac_mc_poll_msec+0x0/0x40

  Memory state around the buggy address:
   ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
   ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
  >ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
                     ^
   ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
   ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fix it by changing the type of edac_mc_poll_msec to unsigned int.
The reason why this patch adopts unsigned int rather than unsigned long
is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
integer conversion bugs and unsigned int will be large enough for
edac_mc_poll_msec.

Reviewed-by: James Morse <james.morse@arm.com>
Fixes: 9da21b1509 ("EDAC: Poll timeout cannot be zero, p2")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-06-27 10:24:47 -07:00
Greg KH 7adc05d2dc EDAC/sysfs: Drop device references properly
Do put_device() if device_add() fails.

 [ bp: do device_del() for the successfully created devices in
   edac_create_csrow_objects(), on the unwind path. ]

Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20190427214925.GE16338@kroah.com
2019-06-20 11:44:36 -07:00
Pan Bian 585fb3d93d EDAC/sysfs: Fix memory leak when creating a csrow object
In edac_create_csrow_object(), the reference to the object is not
released when adding the device to the device hierarchy fails
(device_add()). This may result in a memory leak.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com
2019-06-20 11:44:36 -07:00
Borislav Petkov 861e6ed667 EDAC: Drop per-memory controller buses
... and use the single edac_subsys object returned from
subsys_system_register(). The idea is to have a single bus
and multiple devices on it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
2018-11-13 21:55:24 +01:00
Tony Luck 88a10b1517 EDAC: Don't add devices under /sys/bus/edac
Nobody(*) uses them. Dropping this will allow us to make the total
number of memory controllers configurable (as we won't have to worry
about duplicated device names under this directory).

(*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
CC: Aristeu Rozanski Filho <arozansk@redhat.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Justin Ernst <justin.ernst@hpe.com>
CC: Mauro Carvalho Chehab <mchehab@kernel.org>
CC: Russ Anderson <rja@hpe.com>
CC: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk
2018-11-13 20:26:41 +01:00
Johan Hovold 4708aa85d5 EDAC: Fix memleak in module init error path
Make sure to use put_device() to free the initialised struct device so
that resources managed by driver core also gets released in the event of
a registration failure.

Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2d56b109e3 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-06-17 13:09:44 +02:00
Tony Luck d6dd77ebcd EDAC: Drop duplicated array of strings for memory type names
Somehow we ended up with two separate arrays of strings to describe the
"enum mem_type" values.

In edac_mc.c we have an exported list edac_mem_types[] that is used
by a couple of drivers in debug messaged.

In edac_mc_sysfs.c we have a private list that is used to display
values in:
  /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
  /sys/devices/system/edac/mc/mc*/csrow*/mem_type

This list was missing a value for MEM_LRDDR3.

The string values in the two lists were different :-(

Combining the lists, I kept the values so that the sysfs output
will be unchanged as some scripts may depend on that.

Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-14 12:20:16 +01:00
Kees Cook e4dca7b7aa treewide: Fix function prototypes for module_param_call()
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:

@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@

 module_param_call(_name, _set_func, _get_func, _arg, _mode);

@fix_set_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _set_func(
-_val_type _val
+const char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

@fix_get_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _get_func(
-_val_type _val
+char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:

	drivers/platform/x86/thinkpad_acpi.c
	fs/lockd/svc.c

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-10-31 15:30:37 +01:00
Bhumika Goyal b2b3e7362e EDAC: Make device_type const
Make these const as they are only stored in the type field of a device
structure, which is const.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1503130946-2854-2-git-send-email-bhumirks@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-08-20 13:12:32 +02:00
Arvind Yadav 1c18be5a4e EDAC: Constify attribute_group structures
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
CC: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/776cb8265509054abd01b0b551624cc0da3b88e7.1499078335.git.arvind.yadav.cs@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-07-17 10:20:25 +02:00
Aaron Miller 4fb6fde74d EDAC: Expose per-DIMM error counts in sysfs
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.

EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.

Signed-off-by: Aaron Miller <aaronmiller@fb.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-19 10:29:40 +01:00
Ben Dooks 628ea92f09 EDAC: Make dev_attr_sdram_scrub_rate static
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:

  drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
  'dev_attr_sdram_scrub_rate' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-06 15:54:16 +01:00
Mauro Carvalho Chehab 78d88e8a3d edac: rename edac_core.h to edac_mc.h
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Borislav Petkov bba142957e EDAC: Correct channel count limit
c44696fff0 ("EDAC: Remove arbitrary limit on number of channels")
lifted the arbitrary limit on memory controller channels in EDAC.
However, the dynamic channel attributes dynamic_csrow_dimm_attr and
dynamic_csrow_ce_count_attr remained 6.

This wasn't a problem except channels 6 and 7 weren't visible in sysfs
on machines with more than 6 channels after the conversion to static
attr groups with

  2c1946b6d6 ("EDAC: Use static attribute groups for managing sysfs entries")

 [ without that, we're exploding in edac_create_sysfs_mci_device()
   because we're dereferencing out of the bounds of the
   dynamic_csrow_dimm_attr array. ]

Add attributes for channels 6 and 7 along with a guard for the
future, should more channels be required and/or to sanity check for
misconfigured machines.

We still need to check against the number of channels present on the MC
first, as Thor reported.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com>
Tested-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org> # 4.2
2016-06-16 10:06:35 +02:00
Tony Luck ab67b6c22d EDAC: Fix used after kfree() error in edac_unregister_sysfs()
Code flow looks like this:

  device_unregister(&mci->dev);
   -> kobject_put+0x25/0x50
    -> kobject_cleanup+0x77/0x190
      -> device_release+0x32/0xa0
	-> mci_attr_release+0x36/0x70
	  -> kfree(mci);
  bus_unregister(mci->bus);

Fix is to grab a local copy of "mci->bus" and use that when we call
bus_unregister().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23 13:23:54 +02:00
Borislav Petkov d4538000ca EDAC: Remove edac_get_sysfs_subsys() error handling
It cannot fail now. We either load EDAC core after having successfully
initialized edac_subsys or we don't.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:41 +01:00
Borislav Petkov 733476cf20 EDAC: Rip out the edac_subsys reference counting
This was really dumb - reference counting for the main EDAC sysfs
object. While we could've simply registered it as the first thing in the
module init path and then hand it around to what needs it.

Do that and rip out all the code around it, thus simplifying the whole
handling significantly.

Move the edac_subsys node back to edac_module.c.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:39 +01:00
Borislav Petkov 12e26969b3 EDAC, mc_sysfs: Fix freeing bus' name
I get the splat below when modprobing/rmmoding EDAC drivers. It happens
because bus->name is invalid after bus_unregister() has run. The Code: section
below corresponds to:

  .loc 1 1108 0
  movq    672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
  .loc 1 1109 0
  popq    %rbx    #

  .loc 1 1108 0
  movq    (%rax), %rdi    # _7->name,
  jmp     kfree   #

and %rax has some funky stuff 2030203020312030 which looks a lot like
something walked over it.

Fix that by saving the name ptr before doing stuff to string it points to.

  general protection fault: 0000 [#1] SMP
  Modules linked in: ...
  CPU: 4 PID: 10318 Comm: modprobe Tainted: G          I EN  3.12.51-11-default+ #48
  Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
  task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
  RIP: 0010:[<ffffffffa019da92>]  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
  RSP: 0018:ffff88030da3fe28  EFLAGS: 00010292
  RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
  RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
  RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
  R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
  R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
  FS:  00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
  Stack:
  Call Trace:
    i7core_unregister_mci.isra.9
    i7core_remove
    pci_device_remove
    __device_release_driver
    driver_detach
    bus_remove_driver
    pci_unregister_driver
    i7core_exit
    SyS_delete_module
    system_call_fastpath
    0x7fc9bf426536
  Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
  RIP  [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
   RSP <ffff88030da3fe28>

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: <stable@vger.kernel.org> # v3.6..
Fixes: 7a623c0390 ("edac: rewrite the sysfs code to use struct device")
2015-12-11 16:56:38 +01:00
Tan Xiaojun 30f84a891b EDAC: Use edac_debugfs_remove_recursive()
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem, but mci->debugfs might not empty.

This can be triggered by the following sequence:

1) Enable CONFIG_EDAC_DEBUG
2) insmod an EDAC module (like i3000_edac or similar)
3) rmmod this module
4) we can see files remaining under <debugfs_mountpoint>/edac/ like
   "fake_inject", for example.

Removing edac_core then, causes a NULL pointer dereference.

Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-14 18:50:32 +02:00
Toshi Kani d0c9c93019 EDAC: Don't allow empty DIMM labels
Updating dimm_label to an empty string does not make much sense. Change
the sysfs dimm_label store operation to fail a request when an input
string is empty.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: elliott@hpe.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-28 16:39:05 +02:00
Toshi Kani 438470b84c EDAC: Fix sysfs dimm_label store operation
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
in their store operation:

 1) A newline-terminated input string causes redundant newlines:

  # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat  /sys/bus/mc0/devices/dimm0/dimm_label
  test

  #  od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 164 145 163 164 012 012
            t   e   s   t  \n  \n
  0000006

 2) The original label string (31 characters) cannot be stored due to
    an improper size check:

  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat /sys/bus/mc0/devices/dimm0/dimm_label

  # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
   0000000 012 012
            \n  \n
   0000002

 3) An input string longer than the buffer size results a wrong label
    info as it allows a retry with the remaining string:

  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat  /sys/bus/mc0/devices/dimm0/dimm_label
  _TEST

Fix these issues by making the following changes:
 1) Replace a newline character at the end by setting a null. It also
    assures that the string is null-terminated in the label buffer.
 2) Check the label buffer size with 'sizeof(dimm->label)'.
 3) Fail a request if its string exceeds the label buffer size.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Robert Elliott <elliott@hpe.com>
Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 19:45:59 +02:00
Toshi Kani 1ea62c59c8 EDAC: Fix sysfs dimm_label show operation
After

  7d375bffa5 ("sb_edac: Fix support for systems with two home agents per socket")

sysfs "dimm_label" and "chX_dimm_label" show their label string without a
newline "\n" at the end.

  [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

  [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

The label strings now have 31 characters, which are the same as
EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
the newline in the format "%s\n" is ignored.

  [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
            C   P   U   _   S   r   c   I   D   #   0   _   H   a   #   0
  0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
            _   C   h   a   n   #   0   _   D   I   M   M   #   0  \0
  0000037

Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
snprintf()s in channel_dimm_label_show() and dimmdev_label_show().

Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 19:19:21 +02:00
Borislav Petkov 7ac8bf9bc9 EDAC: Carve out debugfs functionality
... into a separate compilation unit and drop a couple of
CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to
edac_create_debugfs_nodes(), while at it.

No functionality change.

Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22 12:29:46 +02:00
Tony Luck c44696fff0 EDAC: Remove arbitrary limit on number of channels
Currently set to "6", but the reset of the code will dynamically
allocate as needed.  We need to go to "8" today, but drop the check
completely to save doing this again when we need even larger numbers.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-06-03 10:10:22 -03:00
Alexey Khoroshilov c6b97bcf8e EDAC: Properly unwind on failure path in edac_init()
edac_init() does not deallocate already allocated resources on failure
path.

Found by Linux Driver Verification project (linuxtesting.org).

 [ Boris: The unwind path functions have __exit annotation but are being
   used in an __init function, leading to section mismatches. Drop the
   section annotation and make them normal functions. ]

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:13:07 +01:00
Takashi Iwai 4e8d230de9 EDAC: Allow to pass driver-specific attribute groups
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups.  This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.

edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:07:41 +01:00
Takashi Iwai 2c1946b6d6 EDAC: Use static attribute groups for managing sysfs entries
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.

This simplifies the code a lot and avoids the possible races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:06:58 +01:00
Borislav Petkov f11135d87d EDAC: edac_mc_sysfs: Make stuff static
Fix sparse warnings.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 14:49:04 +01:00
Junjie Mao 1bf1950c4e EDAC: Fix the leak of mci->bus->name when bus_register fails
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.

Signed-off-by: Junjie Mao <junjie.mao@hotmail.com>
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30 14:38:45 +01:00
Jim Snow 50043e257c sb_edac: Fix off-by-one error in number of channels
This prevented edac sysfs code from properly handling 6 channels
per memory controller.

Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:51 -02:00
Aristeu Rozanski 7b8278358c edac: add DDR4 and RDDR4
Haswell memory controller can make use of DDR4 and Registered DDR4

Cc: tony.luck@intel.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26 15:46:55 -03:00
Mauro Carvalho Chehab c897df0e2d Linux 3.14-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTE+9XAAoJEHm+PkMAQRiGrMQIAKI2V49Kj8WlnwGchFvsbGJB
 SLALwNi33T/IBKdZRhrfryBu02Zj7eVvZ2ML35dJEnmF88O+dJBDMTkKV1xalrip
 mtkBrjUnfAI04fq/daLQ1TsAy4qqlra5tSTuDCw8ILOnGPwT0VydIEHNdtmoUIfw
 xlZLxHzny1MslZ78d7uR/cUnV9ylKRRajWzfw1HT8hL51fCt8nRWY0sCvwvl+kMJ
 LsK+6I7mHDUuzA7QBmBI+dhzQgos5+JkkrnpmqHAqwmIh+AI3ksmjUCQ4dM7owrO
 IvEx+ZNDqxAdLcm1WAxATNfxddFXHc62JTvKuuKqTVWuaxVfK1Aqt8MjDMIPeAQ=
 =yV5u
 -----END PGP SIGNATURE-----

Merge tag 'v3.14-rc5' into patchwork

Linux 3.14-rc5

* tag 'v3.14-rc5': (1117 commits)
  Linux 3.14-rc5
  drm/vmwgfx: avoid null pointer dereference at failure paths
  drm/vmwgfx: Make sure backing mobs are cleared when allocated. Update driver date.
  drm/vmwgfx: Remove some unused surface formats
  MAINTAINERS: add maintainer entry for Armada DRM driver
  arm64: Fix !CONFIG_SMP kernel build
  arm64: mm: Add double logical invert to pte accessors
  dm cache: fix truncation bug when mapping I/O to >2TB fast device
  perf tools: Fix strict alias issue for find_first_bit
  powerpc/powernv: Fix indirect XSCOM unmangling
  powerpc/powernv: Fix opal_xscom_{read,write} prototype
  powerpc/powernv: Refactor PHB diag-data dump
  powerpc/powernv: Dump PHB diag-data immediately
  powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
  powerpc/ftrace: bugfix for test_24bit_addr
  powerpc/crashdump : Fix page frame number check in copy_oldmem_page
  powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
  kvm, vmx: Really fix lazy FPU on nested guest
  perf tools: fix BFD detection on opensuse
  drm/radeon: enable speaker allocation setup on dce3.2
  ...
2014-03-11 06:55:49 -03:00
Borislav Petkov 9da21b1509 EDAC: Poll timeout cannot be zero, p2
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
2014-02-14 10:40:29 +01:00
Prarit Bhargava 79040cad3f drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
If you do

  echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec

the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.

  WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
  list_add corruption. prev->next should be next (ffff8808291dd1b8), but was           (null). (prev=ffff8808286fe3f8).
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    __list_add+0xac/0xc0
    __internal_add_timer+0xab/0x130
    internal_add_timer+0x17/0x40
    mod_timer_pinned+0xca/0x170
    intel_pstate_timer_func+0x28a/0x380
    call_timer_fn+0x36/0x100
    run_timer_softirq+0x1ff/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290

  kernel BUG at kernel/timer.c:1084!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    run_timer_softirq+0x245/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290
  RIP   cascade+0x93/0xa0

  WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Workqueue: edac-poller edac_mc_workq_function [edac_core]
  Call Trace:
    dump_stack+0x45/0x56
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_null+0x1a/0x20
    __queue_delayed_work+0xed/0x1a0
    queue_delayed_work_on+0x27/0x50
    edac_mc_workq_function+0x72/0xa0 [edac_core]
    process_one_work+0x17b/0x460
    worker_thread+0x11b/0x400
    kthread+0xd2/0xf0
    ret_from_fork+0x7c/0xb0

This patch adds a range check in the edac_mc_poll_msec code to check for 0.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Mauro Carvalho Chehab 37e59f876b [media, edac] Change my email address
There are several left overs with my old email address.
Remove their occurrences and add myself at CREDITS, to
allow people to be able to reach me on my new addresses.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-02-07 08:03:07 -02:00
Rashika Kheria 95285933d0 EDAC: Mark edac_create_debug_nodes as static
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.

Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15 17:59:59 +01:00
Ingo Molnar c874b6ba55 An amd64_edac fix for single channel configurations + trivial cleanups
courtesy of Jingoo Han.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSC6mSAAoJEBLB8Bhh3lVKhKYQALt3lklah1BpZIqyTiLqtizN
 YVF/lSubpcHB2BfnZUzaSInE+17uo21DJ44tQ56O0lWDsxLqadJzTDiJ2BG5Ol3u
 JOIWti/Yl8YoCQQcir6QksBAUkR26iCpelO8kO6J/JWGEekVgl3Oik4NOmYrdN55
 rUoHIyVdK5Z5y4j79Pmth7//+c6OFli1cAeUmBlIvxS9T4T2ZCz30jBim76VTS8H
 AjgaX/aBlE3SxAYoMWLZh1VglukxAVCG2qZ9lm7iNLCpkGwP59jT/DE9Gok2IXBg
 id9SMTkrpitijCyM3oKYox14Tl+QP/brElnWCVyVeRIpkVH8s2WUkU+qHeZztBg9
 8i/aU9x4bOCkDjhvBjIM4jbYJAvvaVKlIXPvANO8xl/A0D7nCsmCs7DpritRHZEr
 3y4N8SQsaamVD083+UaVciAo1XCpl5cNq9gH/Y7+U8h2bIThZn8HTbZ1uWgXaCY1
 OwfElbJDKInsSmDVBEklMI8CF42YsjGQc2JC+A/3M3CapTfepKPg6SvptNQueZb+
 SUw0mgGBumpdDQSHo5tPf5JL1y57ERkdOBVryrqYtr6qdjw3ox9o/+B8eTy3gkg1
 b71LEhzG2UbdSVaHiYhRE/IR3W3yKzNX8Oh3BTHfp04jqnNijx4hHu9oX7j3W59M
 P/bd6GHHI5ZY66aLqCue
 =r2kL
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS/EDAC updates from Boris Petkov:

 "An amd64_edac fix for single channel configurations + trivial cleanups
  courtesy of Jingoo Han."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:07:20 +02:00
Jingoo Han c542b53da9 EDAC: Replace strict_strtol() with kstrtol()
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-24 09:30:03 +02:00
Borislav Petkov 88d84ac973 EDAC: Fix lockdep splat
Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
  dump_stack
  warn_slowpath_common
  warn_slowpath_fmt
  lockdep_init_map
  ? trace_hardirqs_on_caller
  ? trace_hardirqs_on
  debug_mutex_init
  __mutex_init
  bus_register
  edac_create_sysfs_mci_device
  edac_mc_add_mc
  sbridge_probe
  pci_device_probe
  driver_probe_device
  __driver_attach
  ? driver_probe_device
  bus_for_each_dev
  driver_attach
  bus_add_driver
  driver_register
  __pci_register_driver
  ? 0xffffffffa0010fff
  sbridge_init
  ? 0xffffffffa0010fff
  do_one_initcall
  load_module
  ? unset_module_init_ro_nx
  SyS_init_module
  tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-23 16:01:28 -07:00
Jingoo Han c7f62fc87b EDAC: Replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-08 10:16:33 +02:00