Haswell memory controller can make use of DDR4 and Registered DDR4
Cc: tony.luck@intel.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
When a MC is handled, the correct sbridge_dev is searched based on the node,
checking again later with the assumption the first memory controller found is
the first socket's memory controller is a bogus assumption. Get rid of it.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
channel_mask will be used in the future to determine which group of memory
modules is causing the errors since when mirroring, lockstep and close page
are enabled you can't. While that doesn't happen, use the channel_mask to
determine the channel instead of relying on the MC event/exception.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This patch fixes the obvious bug while handling the socket/HA bitmask used in
Ivy Bridge memory controllers.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This patch changes the way devices are searched by using product id instead of
device/function numbers. Tested in a Sandy Bridge and a Ivy Bridge machine to
make sure everything works properly.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell has a different way to retrieve RIR limits, make this procedure per
model.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell has a different way to retrieve the node id, make so this procedure
can be reimplemented.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Haswell has different register, offset to determine memory type and supports
DDR4 in some models. This patch makes it easier to have a different method
depending on the memory controller type.
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
There are a bunch of users open coding the for_each_node_by_name() by
calling of_find_node_by_name() directly instead of using the macro. This
is getting in the way of some cleanups, and the possibility of removing
of_find_node_by_name() entirely. Clean it up so that all the users are
consistent.
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Takashi Iwai <tiwai@suse.de>
To avoid confuision and conflict of usage for RAS related trace event,
add an unified RAS trace event stub.
Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Assign PCI resources before pci_bus_add_device(). The resources must be
assigned before a driver can claim the device.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_bus_add_device() always returns 0, so there's no point in returning
anything at all. Make it a void function and remove the tests of the
return value from the callers.
[bhelgaas: changelog, remove unused "err" from i82875p_setup_overfl_dev()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
295d8cda26 ("EDAC, MCE, AMD: Drop local coreid reporting") removed the
code snippet which used that mask but forgot to drop the mask itself. Do
that now.
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull media updates from Mauro Carvalho Chehab:
"The main set of series of patches for media subsystem, including:
- document RC sysfs class
- added an API to setup scancode to allow waking up systems using the
Remote Controller
- add API for SDR devices. Drivers are still on staging
- some API improvements for getting EDID data from media
inputs/outputs
- new DVB frontend driver for drx-j (ATSC)
- one driver (it913x/it9137) got removed, in favor of an improvement
on another driver (af9035)
- added a skeleton V4L2 PCI driver at documentation
- added a dual flash driver (lm3646)
- added a new IR driver (img-ir)
- added an IR scancode decoder for the Sharp protocol
- some improvements at the usbtv driver, to allow its core to be
reused.
- added a new SDR driver (rtl2832u_sdr)
- added a new tuner driver (msi001)
- several improvements at em28xx driver to fix PM support, device
removal and to split the V4L2 specific bits into a separate
sub-driver
- one driver got converted to videobuf2 (s2255drv)
- the e4000 tuner driver now follows an improved binding model
- some fixes at V4L2 compat32 code
- several fixes and enhancements at videobuf2 code
- some cleanups at V4L2 API documentation
- usual driver enhancements, new board additions and misc fixups"
[ NOTE! This merge effective drops commit 4329b93b28 ("of: Reduce
indentation in of_graph_get_next_endpoint").
The of_graph_get_next_endpoint() function was moved and renamed by
commit fd9fdb78a9 ("[media] of: move graph helpers from
drivers/media/v4l2-core to drivers/of"). It was originally called
v4l2_of_get_next_endpoint() and lived in the file
drivers/media/v4l2-core/v4l2-of.c.
In that original location, it was then fixed to support empty port
nodes by commit b9db140c1e ("[media] v4l: of: Support empty port
nodes"), and that commit clashes badly with the dropped "Reduce
intendation" commit. I had to choose one or the other, and decided
that the "Support empty port nodes" commit was more important ]
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (426 commits)
[media] em28xx-dvb: fix PCTV 461e tuner I2C binding
Revert "[media] em28xx-dvb: fix PCTV 461e tuner I2C binding"
[media] em28xx: fix PCTV 290e LNA oops
[media] em28xx-dvb: fix PCTV 461e tuner I2C binding
[media] m88ds3103: fix bug on .set_tone()
[media] saa7134: fix WARN_ON during resume
[media] v4l2-dv-timings: add module name, description, license
[media] videodev2.h: add parenthesis around macro arguments
[media] saa6752hs: depends on CRC32
[media] si4713: fix Kconfig dependencies
[media] Sensoray 2255 uses videobuf2
[media] adv7180: free an interrupt on failure paths in init_device()
[media] e4000: make VIDEO_V4L2 dependency optional
[media] af9033: Don't export functions for the hardware filter
[media] af9035: use af9033 PID filters
[media] af9033: implement PID filter
[media] rtl2832_sdr: do not use dynamic stack allocation
[media] e4000: fix 32-bit build error
[media] em28xx-audio: make sure audio is unmuted on open()
[media] DocBook media: v4l2_format_sdr was renamed to v4l2_sdr_format
...
Pull sb_edac patches from Mauro Carvalho Chehab:
"A couple sb_edac driver improvements, cleaning a little bit the amount
of data sent to dmesg, and fixing one error message"
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
sb_edac: mark MCE messages as KERN_DEBUG
sb_edac: use "event" instead of "exception" when MC wasnt signaled
Pull MIPS updates from Ralf Baechle:
- Support for Imgtec's Aptiv family of MIPS cores.
- Improved detection of BCM47xx configurations.
- Fix hiberation for certain configurations.
- Add support for the Chinese Loongson 3 CPU, a MIPS64 R2 core and
systems.
- Detection and support for the MIPS P5600 core.
- A few more random fixes that didn't make 3.14.
- Support for the EVA Extended Virtual Addressing
- Switch Alchemy to the platform PATA driver
- Complete unification of Alchemy support
- Allow availability of I/O cache coherency to be runtime detected
- Improvments to multiprocessing support for Imgtec platforms
- A few microoptimizations
- Cleanups of FPU support
- Paul Gortmaker's fixes for the init stuff
- Support for seccomp
* 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr: (165 commits)
MIPS: CPC: Use __raw_ memory access functions
MIPS: CM: use __raw_ memory access functions
MIPS: Fix warning when including smp-ops.h with CONFIG_SMP=n
MIPS: Malta: GIC IPIs may be used without MT
MIPS: smp-mt: Use common GIC IPI implementation
MIPS: smp-cmp: Remove incorrect core number probe
MIPS: Fix gigaton of warning building with microMIPS.
MIPS: Fix core number detection for MT cores
MIPS: MT: core_nvpes function to retrieve VPE count
MIPS: Provide empty mips_mt_set_cpuoptions when CONFIG_MIPS_MT=n
MIPS: Lasat: Replace del_timer by del_timer_sync
MIPS: Malta: Setup PM I/O region on boot
MIPS: Loongson: Add a Loongson-3 default config file
MIPS: Loongson 3: Add CPU hotplug support
MIPS: Loongson 3: Add Loongson-3 SMP support
MIPS: Loongson: Add Loongson-3 Kconfig options
MIPS: Loongson: Add swiotlb to support All-Memory DMA
MIPS: Loongson 3: Add serial port support
MIPS: Loongson 3: Add IRQ init and dispatch support
MIPS: Loongson 3: Add HT-linked PCI support
...
* Support for new AMD models, along with more graceful fallback for
unsupported hw.
* Bunch of fixes from SUSE accumulated from bug reports
* Misc other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTMUueAAoJEBLB8Bhh3lVKbGIP/iXBanEUUXbg027hvNch8tbJ
Yzsk3VkhmZ6hWXU4XNkvFFbSfCfHuHYQSEAxro6yc2t+6aVhMBrKIcCBNejNYt37
louswtfhC9A4geX206bzfOFNxOVsWzX2vxhhQFbqybZe6BnmcnCSKn//v+kF0/5E
6CrwQyKQjl2vGewPU4pBlrbnRNduPRs6bg6lWgzNxbKrhD8Ce/7NQRVd5v698pNm
NTqfS37J90o1DP3I6yY30trtprPjjfnFANxb2DPaf09Z/X+aI1ONNe4s/URBd+qg
yYIWAphbKopXPZOS0wDU0ikb7dGWH1lcQdVE69GxnCEvz9hz2ISg3plguwbYLPzK
CmI4RGU0D8n5Irxu8bUID4SLT6fSAqU+cgJCdu6dqUncB7khcwG8dWJxMYNGjlBm
TD9aCRRs+/b07KUrBeL1tufgd9gGPKWDeecTw55VMofIFkYvwIAz8TFdj3ql4Uz/
AsYJkAqvoO4dTz1ai+PJ06EWSijUT+KLkyZbybS5+PiP4uRAWY8pkCKbvTJQb1xN
0SAwm0tqT7mQKc/oIsvpZ/TLjwYxRhAeVMvrkElkNRPSsrt70rDVQiPgzQh9GTqV
d5GmYI65Tk2Jrl7dVN+Pk+NDdS/y52SPk1AMDsfoBP6dndftdF6/gtqnijmbLgMY
/yH+hIAKJXJUUKO4pELe
=vafk
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"A bunch of EDAC updates all over the place:
- Support for new AMD models, along with more graceful fallback for
unsupported hw.
- Bunch of fixes from SUSE accumulated from bug reports
- Misc other fixes and cleanups"
* tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
amd64_edac: Add support for newer F16h models
i7core_edac: Drop unused variable
i82875p_edac: Drop redundant call to pci_get_device()
amd8111_edac: Fix leaks in probe error paths
e752x_edac: Drop pvt->bridge_ck
MCE, AMD: Fix decoding module loading on unsupported hw
i5100_edac: Remove an unneeded condition in i5100_init_csrows()
sb_edac: Degrade log level for device registration
amd64_edac: Fix logic to determine channel for F15 M30h processors
edac/85xx: Remove deprecated IRQF_DISABLED
i3200_edac: Add a missing pci_disable_device() on the exit path
i5400_edac: Disable device when unloading module
e752x_edac: Simplify call to pci_get_device()
Since the driver is decoding the MCE, it's useless to have these
messages printed unless you're debugging a problem in the driver.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Corrected Errors are MC events, not exceptions and reporting as the
later might confuse users.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath <Arindam.Nath@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix the following warning:
drivers/edac/i7core_edac.c: In function "core_mce_output_error":
drivers/edac/i7core_edac.c:1711:8: warning: variable "type" set but not used [-Wunused-but-set-variable]
char *type, *optype, *err;
^
According to Mauro, type can just be dropped, as tp_event now maps if
the error is corrected, uncorrected non-fatal or uncorrected fatal
one.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224171358.692d7e5a@endymion.delvare
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
The first call to pci_get_device() in i82875p_probe1() is useless. The
result is immediately reset at the beginning of
i82875p_setup_overfl_dev(), which then issues the same
pci_get_device() call. Dropping the redundant call avoids a device
reference leak.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224160316.60e55fb6@endymion.delvare
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
pvt->bridge_ck always points to the same device as pvt->dev_d0f1, so
get rid of the former and only use the latter.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140224151544.16ba28a0@endymion.delvare
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.
Tested on an Intel S7000FC4UR system with a 7300 chipset.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.
In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.
This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
We want to still be able to issue some error information on systems for
which there is no decoding support (think older distro kernels here,
for example). Therefore, we allow module registration but skip the
per-family bank-specific decoders and issue the general information
only, i.e.:
[ 46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
[ 46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[ 46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
with the hope that it still contains helpful useful bits.
Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
We checked that "npages" was not zero a couple lines earlier so we can
remove this and pull the code in one indent level.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20140213111738.GB15549@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
On a system with four Intel processors, it generates too many messages
"EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give
many useful information for normal users, so change log level from INFO
to DEBUG.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.com
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
Currently, i3200_edac.c forgets to call pci_disable_device() during a
process of disabling device. This patch adds that call.
The problem was detected by Huqiu Liu.
Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Link: http://lkml.kernel.org/r/1389587178-10167-1-git-send-email-mitake.hitoshi@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This was found by Huqiu Liu using a static analysis.
Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140116162021.GY15716@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
There are several left overs with my old email address.
Remove their occurrences and add myself at CREDITS, to
allow people to be able to reach me on my new addresses.
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
The last parameter, pvt->bridge_ck, is always NULL as far as I can
see, so just use that.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Link: http://lkml.kernel.org/r/20140130091101.22f2b550@endymion.delvare
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 RAS changes from Ingo Molnar:
- SCI reporting for other error types not only correctable ones
- GHES cleanups
- Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which,
if done properly in the BIOS, makes a corresponding EDAC module
redundant
- PCIe AER tracepoint severity levels fix
- Error path correction for the mce device init
- MCE timer fix
- Add more flexibility to the error injection (EINJ) debugfs interface
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mce: Fix mce_start_timer semantics
ACPI, APEI, GHES: Cleanup ghes memory error handling
ACPI, APEI: Cleanup alignment-aware accesses
ACPI, APEI, GHES: Do not report only correctable errors with SCI
ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
ACPI, eMCA: Combine eMCA/EDAC event reporting priority
EDAC, sb_edac: Modify H/W event reporting policy
EDAC: Add an edac_report parameter to EDAC
PCI, AER: Fix severity usage in aer trace event
x86, mce: Call put_device on device_register failure
* misc small enhancements/fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJS3PEWAAoJEBLB8Bhh3lVKziAP/3nNouipUvIcRpL3o8U3jX19
Z38tD0KPjIIX46rgexjoXbI/bKEy6whE1eDc0CG9GPpOaN9p9WWe81hfBIQc9DYK
epFNdH3Wk83Ml+3+01CBKfLfiIV25c7rc9IOM7g+nMOg65Z++iC1Cea7i7Ul7T5a
C+CIDdsmco9VrDBNjHHsrdYVZ1Ufx4NxPb5xhemQAiGCXseBtoD1IDS/gOM1F+aP
UBDSJobjDrFq5JasEkp83U62VBTDndEcxb6yHROVahIPRn4miroc0NMHJ3XEd1/H
K9u24iXmHgjjG3Euzd3mq8Axsscz8dUksKQT+lL2YIXz+AM8PcrCDrXK6TiVF9/A
k5Vj91svnWUDrEU0OF1HysOvn0Z/9Y0+Fwyt6//HMyl/7ehCZBINwxsZ4sc4whtd
nVHV2Ddcc/LjZzQEAvHuyWdf2pfxi3M1VRhNvuGyWtdHBPf5P8tqVUteoztELPM+
bKr7wkpvKzKajX6xldNNrVP9xNps8hrYdETjhP5FpUYpF4HKntAe+VJvIue+nOja
K2AxB25qGHyK5rgh69DkSuatZmImflG4brKWbOE+4La0E8OajrM790YdqP2hRnDQ
3zpuIRr/htIAy7C2vGbmWITRhxXnGx5FhDI7AQRr1VjvgJprWM5zgIacDuH+5C9K
ftouZ1hFTDI1WEEMDxRF
=EJ+3
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- mpc85xx PCIe error interrupt support
- misc small enhancements/fixes all over the place.
* tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Don't try to cancel workqueue when it's never setup
e752x_edac: Fix pci_dev usage count
sb_edac: Mark get_mci_for_node_id as static
EDAC: Mark edac_create_debug_nodes as static
amd64_edac: Remove "amd64" prefix from static functions
amd64_edac: Simplify code around decode_bus_error
amd64_edac: Mark amd64_decode_bus_error as static
EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
amd64_edac: Fix condition to verify max channels allowed for F15 M30h
edac/85xx: Add PCIe error interrupt edac support
We only setup a workqueue for edac devices that use the polling
method. We still try to cancel the workqueue if an edac_device
uses the irq method though. This causes a warning from debug
objects when we remove an edac device:
WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0()
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28
Modules linked in: krait_edac(-)
CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69
(unwind_backtrace+0x0/0x144)
(show_stack+0x20/0x24)
(dump_stack+0x74/0xb4)
(warn_slowpath_common+0x78/0x9c)
(warn_slowpath_fmt+0x40/0x48)
(debug_print_object+0x98/0xc0)
(debug_object_assert_init+0xdc/0xec)
(del_timer+0x24/0x7c)
(try_to_grab_pending+0xc0/0x1b0)
(cancel_delayed_work+0x2c/0xa0)
(edac_device_workq_teardown+0x1c/0x38)
(edac_device_del_device+0xb8/0xe4)
(krait_edac_remove+0x50/0x70 [krait_edac])
(platform_drv_remove+0x24/0x28)
(__device_release_driver+0x68/0xc0)
(driver_detach+0xc4/0xc8)
(bus_remove_driver+0xac/0x114)
(driver_unregister+0x38/0x58)
(platform_driver_unregister+0x1c/0x20)
(krait_edac_driver_exit+0x14/0x1c [krait_edac])
(SyS_delete_module+0x178/0x2b4)
Fix it by skipping the workqueue teardown for such devices.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Borislav Petkov <bp@suse.de>
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSrCysAAoJEBLB8Bhh3lVK1ikP/0hKY1Kk4tjbSta9A9Z8LdQG
9F5JzEny47DpTrLaKij7MqAlbYFO8sSm7Zw0CEztTF7Ou/H37GAuxhMlB8ECMGOm
Dzu53X1rySTna9mB+1gyXXd+pJypp/oe18/o16rw1QKjI9o2Kfgwfj7lKvytR549
kDM1dhxEImQIS5cpJPkOPbcpVlSqYN7BnK9/Qx3h0W70httT/8qrr9xVtVL7wjOT
auTA0R5/TkV06FtxyfHUNULEWTSP+2yNP/iJbusR6f4Jk1j0XmyCFr0BYOkPA1UO
9+wC9+2R+r7rJw8MBfMzNmPrRzDJHdaiHPwYqse05yewRHfRHe5cgZWJYbL8Qv0u
2WOX+fY12EfDYlihcOYtlupRzhGfGKRsaRpSuG1zX87ctDxAfNZencv4hnaJvfqG
Xk6ggIX6tHKEivO2gmaPsmhoKveh0zcozUs+wgh/tvV5QB6ioFCjzHfSEsix5+BH
ryyg1ri7IZnh92g3UuSUpE0OCbAquMfI7XIJo+kFs0u79dZTL/kD3wVu6oYazwdy
yTrvIq7Bq5cMWnnni5w7dIU09ef2uvDgyHyAS6+RiqaQxhYFsW8/yx2zJrIloWRs
7txz6t3CVmWFiejIg2gw6KyjaG6pXRBkDkI1XU6T+bKLb31ojx2+i9UKIIUeRZTB
iisWAOI6ZSdt4eAkgeaI
=r//I
-----END PGP SIGNATURE-----
Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS updates from Borislav Petkov:
* Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch marks the function get_mci_for_node_id() as static because it
is not used outside of sb_edac.c.
Thus, it also eliminates the following warning:
drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.
Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
This patch marks the function amd64_decode_bus_error() as static because
it is not used outside of amd64_edac.c.
It also eliminates the following warning:
drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Newer Intel platforms support more than one method to report H/W event.
On this kind of platform, H/W event report can adopt new method and
traditional EDAC method should be disabled. Moreover, if EDAC event
report method is set to *force*, it means event must be reported via
EDAC interface. IOW, it overrides the default event report policy.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit and error messages ]
Signed-off-by: Borislav Petkov <bp@suse.de>
This new parameter is used to control how to report HW error reporting,
especially for newer Intel platform, like Ivybridge-EX, which contains
an enhanced error decoding functionality in the firmware, i.e. eMCA.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The value returned from 'f15_m30h_determine_channel' will
always be 0x3 max. The condition
(channel > 4 || channel < 0)
works as hardware never returns a value of 4, but
it leads to static checker analysis errors like
http://marc.info/?l=linux-edac&m=138607615131951&w=2.
Fix that.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain
[ Boris: massage commit message a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix this:
In file included from drivers/edac/sb_edac.c:27:0:
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized]
printk(level "EDAC " prefix ": " fmt, ##arg)
^
drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here
u64 ch_addr, offset, limit, prv = 0;
Limit can be initialized to 0. The only way limit wouldn't be
initialized is if there are no DIMMs present (which would be a bug of
course) and it'd fail on the next test.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and
p5020. The mpc85xx uses the legacy interrupt report mechanism - the
error interrupts are reported directly to mpic. While the p3041/
p4080/p5020 attaches the most of error interrupts to interrupt zero. And
report error interrupts to mpic via interrupt 0.
This patch can handle both of them.
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Dave Jiang <dave.jiang@gmail.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull EDAC driver updates from Mauro Carvalho Chehab:
- sb_edac: add support for Ivy Bridge support
- cell_edac: add a missing of_node_put() call
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
cell_edac: fix missing of_node_put
sb_edac: add support for Ivy Bridge
sb_edac: avoid decoding the same error multiple times
sb_edac: rename mci_bind_devs()
sb_edac: enable multiple PCI id tables to be used
sb_edac: rework sad_pkg
sb_edac: allow different interleave lists
sb_edac: allow different dram_rule arrays
sb_edac: isolate TOHM retrieval
sb_edac: rename pci_br
sb_edac: isolate TOLM retrieval
sb_edac: make RANK_CFG_A value part of sbridge_info
Highlights:
* Support for Calxeda ECX-2000 memory controller, from Robert Richter
* Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob Herring
and Robert Richter
* New maintainer for Freescale's MPC85xx EDAC driver: Johannes Thumshirn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSiRmeAAoJEBLB8Bhh3lVKYswQAKPCYSsIQg3L3K3320uvWV3x
NEAxAAdYyN2ds7ksLv//54FBLgH9UeiC56glTMmnK9QvfGCcgHGo1WJCa84LwqIf
M7H6mKSgTEBZXr7HpWgtarMYjpNve4Nh8SwHv7tlWRJNd3ufcBbOfCY6rEreULJd
sBTRMuEPc1Ki7NxZr2m/xsPyzWXS1N1nSd2aewiszyY3Rwp1vIAPv/Chr7UwF/Fm
GGeQonc801hVRIQONwxsXzS2qwj/wgx8OPab05psfMuv6CWLxQQJAzGWbe+gv3V4
mYx64+U4nOkQ/knRAf9s0fLwJX6DWSTtQer7m5YSUey0dYDfgV+DemLFvS5We7XB
os9PBGL+0bGUrJ0cnLE6O+6S1qniWaKZrhSZndcYiVoQeDZmaMuartFlIaeRvY21
WJML2oqqUop2ZyaIKInJEyeD74FIf7BsG3V+RJwsCZx5+38Pm0EnBZqGJ9bnJl7x
OxXlHjwjZRhlVFIdcN5WeaKoKmXpdcnzLcL1XE2wMgs9ZFaleeyTDXuQm+XxkKGd
ExD/1TbAoBqFF7FwIAQwqPf1f2HPDBKlSg38X6l6NV5uLK/u5rdffKTMQPWi/6p2
RDO+Ddbypzm72850hcYc9mY+B8Qe3T3F/iKlbELiK1S5+IQzm/hmfTDcyKStwKZn
cCTvsIo9QHflhshXR1a8
=pU/h
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Following up on last week's discussion, here's my part of the EDAC
pile, highlights in the signed tag.
The last two patches have a date from just now because I've just
applied them to the tree after Johannes sent them to me earlier. I
decided to forward them now because they're trivial.
There's a third one for MPC85xx which adds PCIe error interrupt
support but since it is not so trivial and hasn't seen any linux-next
time, I'm deferring it to 3.14
EDAC update highlights:
- Support for Calxeda ECX-2000 memory controller, from Robert Richter
- Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
Herring and Robert Richter
- New maintainer for Freescale's MPC85xx EDAC driver: Johannes
Thumshirn"
* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
edac/85xx: Remove mpc85xx_pci_err_remove
EDAC: Add edac-mpc85xx driver to MAINTAINERS
edac, highbank: Moving error injection to sysfs for edac
edac, highbank: Add MAINTAINERS entry
edac: Unify reporting of device info for device, mc and pci
edac, highbank: Improve and unify naming
edac, highbank: Add Calxeda ECX-2000 support
ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
edac, highbank: Fix interrupt setup of mem and l2 controller
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the
compiler warning which can be seen when building the driver either
statically or as a module.
Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's
wiser to modify sb_edac to support both instead of creating another
driver.
[m.chehab@samsung.com: Fix CodingStyle]
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Whenever the extended error reporting is active, multiple MCEs will be
generated for the same event, which will lead to multiple repeated
errors to be reported. So check ADDRV and only decode the error if the
MCE address is valid.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy
Bridge.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Ivy Bridge has more than one, so rename pci_br to pci_br0
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation for the Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This is in preparation of Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
Always have the error injection i/f available, even if there is no
debugfs or EDAC_DEBUG enabled. We need this for testing production
kernels and environments.
Thus, the entry moves from:
/sys/kernel/debug/edac/mc0/inject_ctrl
to:
/sys/devices/system/edac/mc/mc0/inject_ctrl
No other changes of the interface.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Log messages slightly differ between edac subsystems. Unifying it.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
Assinging correct names of the 'hb_mc_edac' and 'hb_l2_edac' edac
modules for module, controller and device. Reported values for
Highbank in dmesg are now:
EDAC MC0: Giving out device to module hb_mc_edac controller
calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT)
EDAC DEVICE0: Giving out device to module hb_l2_edac controller
calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT)
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
Implement edac support for Calxeda ECX-2000.
The ECX-2000 memory controller is similar to Highbank but has
different register bases for error and interrupt registers. There is
an own device tree name "calxeda,ecx-2000-ddr-ctrl" for identification
and initialization of the ECX-2000 and its base addresses.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
Register and enable interrupts after the edac registration. Otherwise
incomming ecc error interrupts lead to crashes during device setup.
Fixing this in drivers for mc and l2.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: stable <stable@vger.kernel.org> # 3.6+
Signed-off-by: Robert Richter <rric@kernel.org>
In latest UEFI spec(by now it's 2.4) there are some new
fields for memory error reporting. Add these new fields for
ghes_edac interface.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In latest UEFI spec(by now it is 2.4) memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error to an actual DIMM location.
Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
GENMASK is used to create a contiguous bitmask([hi:lo]). It is
implemented twice in current kernel. One is in EDAC driver, the other
is in SiS/XGI FB driver. Move it to a more generic place for other
usage.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Winischhofer <thomas@winischhofer.net>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Powerpc is a mess of implicit includes by prom.h. Add the necessary
explicit includes to drivers in preparation of prom.h cleanup.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Pull Tile arch updates from Chris Metcalf:
"These changes bring in a bunch of new functionality that has been
maintained internally at Tilera over the last year, plus other stray
bits of work that I've taken into the tile tree from other folks.
The changes include some PCI root complex work, interrupt-driven
console support, support for performing fast-path unaligned data
fixups by kernel-based JIT code generation, CONFIG_PREEMPT support,
vDSO support for gettimeofday(), a serial driver for the tilegx
on-chip UART, KGDB support, more optimized string routines, support
for ftrace and kprobes, improved ASLR, and many bug fixes.
We also remove support for the old TILE64 chip, which is no longer
buildable"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits)
tile: refresh tile defconfig files
tile: rework <asm/cmpxchg.h>
tile PCI RC: make default consistent DMA mask 32-bit
tile: add null check for kzalloc in tile/kernel/setup.c
tile: make __write_once a synonym for __read_mostly
tile: remove support for TILE64
tile: use asm-generic/bitops/builtin-*.h
tile: eliminate no-op "noatomichash" boot argument
tile: use standard tile_bundle_bits type in traps.c
tile: simplify code referencing hypervisor API addresses
tile: change <asm/system.h> to <asm/switch_to.h> in comments
tile: mark pcibios_init() as __init
tile: check for correct compiler earlier in asm-offsets.c
tile: use standard 'generic-y' model for <asm/hw_irq.h>
tile: use asm-generic version of <asm/local64.h>
tile PCI RC: add comment about "PCI hole" problem
tile: remove DEBUG_EXTRA_FLAGS kernel config option
tile: add virt_to_kpte() API and clean up and document behavior
tile: support FRAME_POINTER
tile: support reporting Tilera hypervisor statistics
...
dct_base and dct_limit obtain 32 bit register values when they read
their respective pci config space registers. A left shift beyond 32 bits
will cause them to wrap around. Similar case for chan_addr as can be
seen from the bug report (link below). In the patch, we rectify this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
properly shifted sys_addr in order to compare the correct bits.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
The struct should be terminated by using empty braces in order to
fix the following sparse warning.
drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ drop obvious comment ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
On newer models, support has been included for upto 4 DCT's, however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.
Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.
While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:
* Erratum 505: fixed in model 0x1, stepping 0x1 and later.
* Erratum 637: not fixed.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Make a local function static in order to fix the following sparse
warning:
drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ Boris: Correct commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
This local symbol is used only in this file.
Fix the following sparse warnings:
drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present. Which is not.
So always allocate two channels unconditionally.
This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.
Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Pull MIPS updates from Ralf Baechle:
"MIPS updates:
- All the things that didn't make 3.10.
- Removes the Windriver PPMC platform. Nobody will miss it.
- Remove a workaround from kernel/irq/irqdomain.c which was there
exclusivly for MIPS. Patch by Grant Likely.
- More small improvments for the SEAD 3 platform
- Improvments on the BMIPS / SMP support for the BCM63xx series.
- Various cleanups of dead leftovers.
- Platform support for the Cavium Octeon-based EdgeRouter Lite.
Two large KVM patchsets didn't make it for this pull request because
their respective authors are vacationing"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
MIPS: SEAD3: Disable L2 cache on SEAD-3.
MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
MIPS: BCM63xx: Add SMP support to prom.c
MIPS: define write{b,w,l,q}_relaxed
MIPS: Expose missing pci_io{map,unmap} declarations
MIPS: Malta: Update GCMP detection.
Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
MIPS: APSP: Remove <asm/kspd.h>
SSB: Kconfig: Amend SSB_EMBEDDED dependencies
MIPS: microMIPS: Fix improper definition of ISA exception bit.
MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
MIPS: Declare emulate_load_store_microMIPS as a static function.
MIPS: Fix typos and cleanup comment
MIPS: Cleanup indentation and whitespace
MIPS: BMIPS: support booting from physical CPU other than 0
MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
MIPS: GIC: Fix gic_set_affinity infinite loop
MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
...
Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off. Remove all the remaining references to it.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix yet another issue caught by 8f46baaa7e ("base: core: WARN() about
bogus permissions on device attributes").
Signed-off-by: Borislav Petkov <bp@suse.de>
I get the following warning on boot:
------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name: -[8737R2A]-
Write permission without 'store'
...
</snip>
Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull edac fixes from Mauro Carvalho Chehab:
"Two edac fixes:
- i7300_edac currently reports a wrong number of DIMMs when the
memory controller is in single channel mode
- on some Sandy Bridge machines, the EDAC driver bails out as one of
the PCI IDs used by the driver is hidden by BIOS. As the driver
uses it only to detect the type of memory, make it optional at the
driver"
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
edac: sb_edac.c should not require prescence of IMC_DDRIO device
i7300_edac: Fix memory detection in single mode
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers. While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.
Make the check for this device be optional. If it isn't present
just report the memory type as "MEM_UNKNOWN".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add code to handle DRAM ECC errors decoding for Fam16h.
Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.
There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:
if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
"For:
- Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
- error injection support for i5100, when EDAC debug is enabled;
- fix edac when it is loaded builtin (early init for the subsystem);
- a "Firmware First" EDAC driver, allowing ghes to report errors via
EDAC (ghes-edac).
With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
i7core_edac or sb_edac are running, the error reports are
unpredictable, as both BIOS and OS race to access the registers. With
ghes-edac, the EDAC core will refuse to register any other concurrent
memory error driver.
This patchset moves the ghes struct definitions to a separate header
file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
register/unregister and to report errors via ghes-edac. Those changes
were acked by ghes driver maintainer (Huang)."
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
i5100_edac: convert to use simple_open()
ghes_edac: fix to use list_for_each_entry_safe() when delete list items
ghes_edac: Fix RAS tracing
ghes_edac: Make it compliant with UEFI spec 2.3.1
ghes_edac: Improve driver's printk messages
ghes_edac: Don't credit the same memory dimm twice
ghes_edac: do a better job of filling EDAC DIMM info
ghes_edac: add support for reporting errors via EDAC
ghes_edac: Register at EDAC core the BIOS report
ghes: add the needed hooks for EDAC error report
ghes: move structures/enum to a header file
edac: add support for error type "Info"
edac: add support for raw error reports
edac: reduce stack pressure by using a pre-allocated buffer
edac: lock module owner to avoid error report conflicts
edac: remove proc_name from mci structure
edac: add a new memory layer type
edac: initialize the core earlier
edac: better report error conditions in debug mode
i5100_edac: Remove two checkpatch warnings
...
This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.
As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.
The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.
Now, the error message:
[ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 72.396627] {1}[Hardware Error]: APEI generic hardware error status
[ 72.396628] {1}[Hardware Error]: severity: 2, corrected
[ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[ 72.396632] {1}[Hardware Error]: flags: 0x01
[ 72.396634] {1}[Hardware Error]: primary
[ 72.396635] {1}[Hardware Error]: section_type: memory error
[ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[ 72.396638] {1}[Hardware Error]: node: 3
[ 72.396639] {1}[Hardware Error]: card: 0
[ 72.396640] {1}[Hardware Error]: module: 0
[ 72.396641] {1}[Hardware Error]: device: 0
[ 72.396643] {1}[Hardware Error]: error_type: 18, unknown
[ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
Is properly represented on the trace event:
kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location👎-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)
Tested on a 4 sockets E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Provide a better infrastructure for printk's inside the driver:
- use edac_dbg() for debug messages;
- standardize the usage of pr_info();
- provide warning about the risk of relying on this
driver.
While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Conflicts:
drivers/edac/ghes_edac.c
Instead of just faking a random value for the DIMM data, get
the information that it is available via DMI table.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that the EDAC core is capable of just forward the errors via
the userspace API, add a report mechanism for the GHES errors.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Register GHES at EDAC MC core, in order to avoid other
drivers to also handle errors and mangle with error data.
The edac core will warrant that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.
For now, the EDAC driver does nothing but to register at the
EDAC core, preventing the hardware-driven mechanism to
interfere with GHES.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers all
over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
If you need me to provide a merged tree to handle these resolutions,
please let me know.
Other than those patches, there's not much here, some minor fixes and
updates.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
=yWAQ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
Other than those patches, there's not much here, some minor fixes and
updates"
Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
"There can be only one".
There are two reasons for that:
1) Each driver assumes that it is the only one registering at
the EDAC core, as it is driver's responsibility to number
the memory controllers, and all of them start from 0;
2) If BIOS is handling the memory errors, the OS can't also be
doing it, as one will mangle with the other.
So, we need to add an module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
In order for it to work with it builtin, the EDAC core should
be initialized earlier, otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() being called:
...
[ 4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[ 4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[ 6.519495] EDAC MC: Ver: 3.0.0
[ 6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created
The net result is that no EDAC sysfs nodes will appear.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The last changeset introduced a few checkpatch warnings:
WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+ if (priv->debugfs)
+ debugfs_remove_recursive(priv->debugfs);
WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+ if (i5100_debugfs)
+ debugfs_remove(i5100_debugfs);
Get rid of them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Create a debugfs direcotry i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is preformed.
After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel
trigger the injection by writing anything to inject_enable.
Example of a CE injection:
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable
Example of UE injection:
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable
Sometimes it is needed to enable the injection more then once (echo to
the inject_enable node) for the injection to happen, I am not sure why.
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add fault injection based on information datasheet for i5100, see 1. In
addition to the i5100 datasheet some missing information on injection
functions where found through experimentation and the i7300 datasheet,
see 2.
[1] Intel 5100 Memory Controller Hub Chipset
Doc.Nr: 318378
http://www.intel.com/content/dam/doc/datasheet/5100-
memory-controller-hub-chipset-datasheet.pdf
[2] Intel 7300 Chipset MemoryController Hub (MCH)
Doc.Nr: 318082
http://www.intel.com/assets/pdf/datasheet/318082.pdf
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.
Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.
It seems to be partially caused by the BIOS and partially by the driver.
The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.
However, the BIOS on this particular machine doesn't do it:
[ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled
So, the driver were assuming that 2 channels are enabled (well, they are,
but the second is unused).
Combined with that, I found two issues at the logic that creates the
EDAC data, that were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).
The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, do some calculation with that.
The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.
I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:
[ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb
and the corresponding sysfs nodes are now properly filled.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, it is not possible to know, when debug is enabled,
if the driver is using 2 DIMMS per channel mode or not. It is
not possible to know the values of the drbs registers, used
to identify the memory rank sizes.
Add debug for both, as it helps to track issues on the driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Only one: AMD F16h MCE decoding enablement from Jacob Shin.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRI2t7AAoJEBLB8Bhh3lVKmLAP/RbYNVVvLZZxXqB6heCkX7D3
Avhi4GtuHqjY+79nqBoJEH9HjgzBzL6ffsJGPF3gtXrWybi6nzZkqog5QGRQFfGS
gIoldbfP9MMF/RrMHbUF5x9Ha9EcdudDgYE3eqCq6KUfOMlUrBRECaU5YRrB+FKR
qsuQNmF6J3uMPYwO9oGhTpA/vPwwapFaKgm+KND8q5h5JcpJrRpooKNQc1DheW+x
nfYFpmSBW8eamVwV5DTjqKVhKeG3gB/3i6uSTpLvh4i0ZmV2vQ+drwC3aU0Eh0Q4
wnslEpQENjODsvbZH6b0j2WTDgBGKEpALRkMUDTnnqvYrR96cV02MWUfp6cbKYCv
+d/iPt3hEBCyDTDzfP66N+1b+Wqls4Dx8lhhqLdPhsIpY2Qu8OQ8/mjyIIs12qK5
g9w3lsfy3shWmdsK/Ehd0IhNt1bzNV+63CINA2isC2QAp0Eqecj3jKrXor+MTPu1
uRY3wM+8CrdeFNNhhhsMQYIb4+DFGLgMwY5mV6z8ceFJAjxvyUxjObnSMopjBKog
9+JcyEHCADbpHsRup/zRIoGtSs+44sQ6l8l7pq6d4K4UfkG7Biq9lcxThID7rImn
Rl6MOJVmJuM5n9JVIU4/bmhjfuTcI+gdshexDEH2HnqaXxd7ceIbnrFWFaBZEO5t
t3WasBuIb30g1XCQmNT6
=LbcU
-----END PGP SIGNATURE-----
Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Mostly AMD's side of EDAC. It is basically a new family enablement
stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is
trivial cleanups."
* tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
mpc85xx_edac: Fix typo
EDAC, MCE, AMD: Remove unneeded exports
EDAC, MCE, AMD: Add MCE decoding support for Family 16h
EDAC, MCE, AMD: Make MC2 decoding per-family
amd64_edac: Remove dead code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q
GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5
E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+
Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo
EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL
ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M=
=jjBc
-----END PGP SIGNATURE-----
Merge tag 'v3.8-rc7' into next
Linux 3.8-rc7
* tag 'v3.8-rc7': (12052 commits)
Linux 3.8-rc7
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
ARM: DMA mapping: fix bad atomic test
ARM: realview: ensure that we have sufficient IRQs available
ARM: GIC: fix GIC cpumask initialization
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
virtio_console: Don't access uninitialized data.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
...
Pull x86 platform changes from Ingo Molnar:
- Support for the Technologic Systems TS-5500 platform, by Vivien
Didelot
- Improved NUMA support on AMD systems:
Add support for federated systems where multiple memory controllers
can exist and see each other over multiple PCI domains. This
basically means that AMD node ids can be more than 8 now and the code
handling this is taught to incorporate PCI domain into those IDs.
- Support for the Goldfish virtual Android emulator, by Jun Nakajima,
Intel, Google, et al.
- Misc fixlets.
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add TS-5500 platform support
x86/srat: Simplify memory affinity init error handling
x86/apb/timer: Remove unnecessary "if"
goldfish: platform device for x86
amd64_edac: Fix type usage in NB IDs and memory ranges
amd64_edac: Fix PCI function lookup
x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
x86, AMD, NB: Add multi-domain support
Initially, those strings describing different parts of an MCE message
were shared with amd64_edac and were therefore exported to modules.
However, all except pp_msgs are used only in one place right now so hide
them and make them static.
No functionality change.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Add MCE decoding logic for AMD Family 16h processors.
Boris:
- drop unneeded uu_msgs export
- exit early in cat_mc1_mce and save us an indentation level
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Currently only AMD Family 15h processors have special handling for MC2
errors. Since upcoming Family 16h will also need unique handling, let's
make MC2 handling part of amd_decoder_ops.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
5e2af0c09e ("edac: Don't initialize csrow's first_page & friends when
not needed") removed useless initialization of variables but left in the
functions which did that. They're unused now so drop them.
Signed-off-by: Borislav Petkov <bp@alien8.de>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use appropriate types for northbridge IDs and memory ranges. Mark
immutable data const and keep within compilation unit on related
structures.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com
[Boris: Drop arg change to node_to_amd_nb]
Signed-off-by: Borislav Petkov <bp@alien8.de>
Fix locating sibling memory controller PCI functions by using the
correct PCI domain and use a northbridge descriptor only if found. We
need to at least warn if it wasn't found so that it gets fixed and we
don't go off with wrong results.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com
[Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails]
Signed-off-by: Borislav Petkov <bp@alien8.de>
Fix get_node_id to match northbridge IDs from the array of detected
ones, allowing multi-server support such as with Numascale's
NumaConnect, renaming to 'amd_get_node_id' for consistency.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com
[Boris: shorten lines to fit 80 cols]
Signed-off-by: Borislav Petkov <bp@alien8.de>
which spilled all EDAC suboptions into the 'Device Drivers' menu.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQ7D0jAAoJEBLB8Bhh3lVKtcgP/3uzBjETQ7KzKG9R+zHxkBom
7pl6UYqgquqpoqP3tGsn0+oaVyEvY3D11NMp/vhGxUjqJjDnIE3T/TvBMIwaRaq1
/bPrZ+hE5d2mHWP/TULEVVYMziHhR9UO1SpB+E7vX7FqKcLwzeiS7CMlm4qjq0qC
duXCG/6gyEKIecT3KPuXFzpojV6pSpSIYvLu03hdS3e8w7Wb6jvomWV282TpaSim
aFyL909M/f4kexWD0827VnBxogwBWPl1hIGfFVprYGrabgUTtfb/OFhnZut8U1FV
giDtDgIC7nATp1LzTK+fPE5kx9yZvrE1SV8ZTNF6r0dYH14fnR3YgbQqbVgbLUYF
icQqCHpyZKD+s/ajymq8lg+ltVtROqCLWdT6YFZlaBlObP0sYltpSj1JRcoC5O6c
Lx252LQQmxMQZKGnNlBooI6QN73GXlsngTgHD9z3DpFgvRilhA8EmbK2ZFFesdkk
R4s25yH8fno6ClhxlwyA3+0vwU9B19Ul00cx4/5ZXPdK56Fq7slM/ORTE5SQFrHH
6QG/pYDz09WXhKvOtX4ZRh2bBudMsY7mgsrxFelGYtsqHikLtYVUzkkyFkccjZMj
UWcHUHsdSx+gr4DK/jclW1tDgwzh2G17d3FJobujGSgJJIqM2qOo/izON/g4+ojA
RlKqYTCkAh3fX6DZX1vX
=Qh4Y
-----END PGP SIGNATURE-----
Merge tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:
"Two error path fixes causing a crash and a Kconfig fix for an issue
which spilled all EDAC suboptions into the 'Device Drivers' menu."
* tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Cleanup device deregistering path
EDAC: Fix EDAC Kconfig menu
EDAC: Fix kernel panic on module unloading
Use device_unregister to replace put_device + device_del for
cleanup, and fix the potential use after free.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
After f65aad41772f("MIPS: Cavium: Add EDAC support."), when entering
the "Device Drivers" toplevel menu in menuconfig, the suboptions behind
EDAC appeared merged with the rest of the device drivers types. This was
because the menuconfig option EDAC is querying an EDAC_SUPPORT Kconfig
bool which was defined after the menu definition.
When pushing EDAC_SUPPORT up, before the menu definition, the variable
is defined earlier and the above menuconfig artifact doesn't happen.
Drop a useless menuconfig comment while at it.
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Borislav Petkov <bp@alien8.de>
This patch fixes use-after-free and double-free bugs in
edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
calls mc_attr_release() which calls kfree(). The following
device_del() works with already released memory. An another kfree() in
edac_mc_sysfs_exit() releses the same memory again. Great.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: stable@vger.kernel.org # 3.[67]
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg
Signed-off-by: Borislav Petkov <bp@alien8.de>
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, and __devexit
from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove size from lookup arrays and mark them as const.
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Niklas Söderlund <niso@kth.se>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Pull MIPS updates from Ralf Baechle:
"The MIPS bits for 3.8. This also includes a bunch fixes that were
sitting in the linux-mips.org git tree for a long time. This pull
request contains updates to several OCTEON drivers and the board
support code for BCM47XX, BCM63XX, XLP, XLR, XLS, lantiq, Loongson1B,
updates to the SSB bus support, MIPS kexec code and adds support for
kdump.
When pulling this, there are two expected merge conflicts in
include/linux/bcma/bcma_driver_chipcommon.h which are trivial to
resolve, just remove the conflict markers and keep both alternatives."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (90 commits)
MIPS: PMC-Sierra Yosemite: Remove support.
VIDEO: Newport Fix console crashes
MIPS: wrppmc: Fix build of PCI code.
MIPS: IP22/IP28: Fix build of EISA code.
MIPS: RB532: Fix build of prom code.
MIPS: PowerTV: Fix build.
MIPS: IP27: Correct fucked grammar in ops-bridge.c
MIPS: Highmem: Fix build error if CONFIG_DEBUG_HIGHMEM is disabled
MIPS: Fix potencial corruption
MIPS: Fix for warning from FPU emulation code
MIPS: Handle COP3 Unusable exception as COP1X for FP emulation
MIPS: Fix poweroff failure when HOTPLUG_CPU configured.
MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=y
MIPS: Remove unused smvp.h
MIPS/EDAC: Improve OCTEON EDAC support.
MIPS: OCTEON: Add definitions for OCTEON memory contoller registers.
MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h
ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian.
MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.
MIPS: Remove usage of CEVT_R4K_LIB config option.
...
Some initialization errors are reported with the existing OCTEON EDAC
support patch. Also some parts have more than one memory controller.
Fix the errors and add multiple controllers if present.
Signed-off-by: David Daney <david.daney@cavium.com>
Drivers for EDAC on Cavium. Supported subsystems are:
o CPU primary caches. These are parity protected only, so only error
reporting.
o Second level cache - ECC protected, provides SECDED.
o Memory: ECC / SECDEC if used with suitable DRAM modules. The driver will
will only initialize if ECC is enabled on a system so is safe to run on
non-ECC memory.
o PCI: Parity error reporting
Since it is very hard to test this sort of code the implementation is very
conservative and uses polling where possible for now.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Pull EDAC fixes from Borislav Petkov:
- EDAC core error path fix, from Denis Kirjanov.
- Generalization of AMD MCE bank names and some minor error reporting
improvements.
- EDAC core cleanups and simplifications, from Wei Yongjun.
- amd64_edac fixes for sysfs-reported values, from Josh Hunt.
- some heavy amd64_edac error reporting path shaving, leading to
removing a bunch of code.
- amd64_edac error injection method improvements.
- EDAC core cleanups and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits)
EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code
EDAC: Handle error path in edac_mc_sysfs_init() properly
MCE, AMD: Dump error status
MCE, AMD: Report decoded error type first
MCE, AMD: Dump CPU f/m/s triple with the error
MCE, AMD: Remove functional unit references
EDAC: Convert to use simple_open()
EDAC, Calxeda highbank: Convert to use simple_open()
EDAC: Fix mc size reported in sysfs
EDAC: Fix csrow size reported in sysfs
EDAC: Pass mci parent
EDAC: Add memory controller flags
amd64_edac: Fix csrows size and pages computation
amd64_edac: Use DBAM_DIMM macro
amd64_edac: Fix K8 chip select reporting
amd64_edac: Reorganize error reporting path
amd64_edac: Do not check whether error address is valid
amd64_edac: Improve error injection
amd64_edac: Cleanup error injection code
amd64_edac: Small fixlets and cleanups
...
Use for_each_pci_dev to simplify the code.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
[Boris: cleanup comments and drop loop brackets]
Signed-off-by: Borislav Petkov <bp@alien8.de>
It is very useful to have the family/model/stepping with the reported
error so dump it. This saves us asking the bug reporter about it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Having the functional unit names in each bank decode is only misleading
as this code supports multiple families and there's no guarantee the
mapping between FUs and MCE banks will stay the same.
And also, knowing the functional unit name doesn't help much since you
end up looking at the respective BKDG anyway.
So drop all FU references and use the MC bank numbers instead.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is the complement to previous commit "EDAC: Fix csrow size
reported in sysfs". This fixes the memory controller size reporting on
csrow-based memory controllers. The csrow size is already combined for
both channels. Without this patch memory size is reported doubled.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.
Fix it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Make sure code pays attention to K8 having only one DCT, reformat and
cleanup code, correct debug messages, remove unused code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Instead of open-coding it, use the DBAM_DIMM macro in
amd64_csrow_nr_pages() which we have already.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This basically reverts 603adaf6b3 ("amd64_edac: fix K8 chip select
reporting") because it was a clumsy workaround for DIMM sizes reporting
on K8 which got superceded by a much more correct one with 41d8bfaba7
("amd64_edac: Improve DRAM address mapping") without removing the prior
one. Remove it now finally.
Reported-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Rewrite CE/UE paths so that they use the same code and drop additional
code duplication in handle_ue. Add a struct err_info which collects
required info for the error reporting. This, in turn, helps slimming all
edac_mc_handle_error() calls down to one.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
All families report a valid error address when encountering a DRAM ECC
error so no need to check it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine
does this by injecting the error in the next non-cached access. This
takes relatively long time on a normal system so that in order for us to
expedite it, we disable the caches around the injection.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Invert kstrtoul return value testing and win one indentation level.
Also, shorten up macro names so that the lines can fit into 80 cols. No
functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
amd64_get_dram_hole_info: remove local variable 'base'.
sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize
constants formatting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
A reported error could look like this
[ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6)
with two spaces back-to-back due to the msg argument of
edac_mc_handle_error being passed on empty by the specific drivers.
Handle that.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The tracepoint decodes the error type later anyway so remove a useless
assignment to the temporary p which gets overwritten later anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Only levels [0:4] are allowed so enforce that. Also, while at it,
massage Kconfig text and add valid debug levels range to the module
parameter description.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Currently, we unconditionally enable PCI polling and we don't look at
the edac_op_state module parameter. Make this dependent on the parameter
setting supplied on the command line.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The i7core_edac addrmatch_dev and chancounts_dev have sysfs files
associated with them. The sysfs files, however, are coded so that the
parent device is is the mci device. This is incorrect and the mci struct
should be obtained through the addrmatch_dev and chancounts_dev device's
private data field which is populated in i7core_create_sysfs_devices().
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* Right-shift the values in GET_FBD_FAT_IDX and GET_FBD_NF_IDX, so
that the callers get the result they expect.
* Fix definition of FERR_FAT_FBD_ERR_MASK.
* Call GET_FBD_NF_IDX, not GET_FBD_FAT_IDX, when operating on
register FERR_NF_FBD. We were lucky they have the same definition.
This fixes kernel bug #44131:
https://bugzilla.kernel.org/show_bug.cgi?id=44131
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The driver is currently filling data in a wrong way, on drivers
for csrows-based memory controller, when the first layer is a
csrow.
This is not easily to notice, as, in general, memories are
filed in dual, interleaved, symetric mode, as very few memory
controllers support asymetric modes.
While digging into a bug for i82795_edac driver, the asymetric
mode there is now working, allowing us to fill the machine with
4x1GB ranks at channel 0, and 2x512GB at channel 1:
Channel 0 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)
Channel 1 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)
Instead of properly showing the memories as such, before this patch, it
shows the memory layout as:
+-----------------------------------+
| mc0 |
| csrow0 | csrow1 | csrow2 |
----------+-----------------------------------+
channel1: | 1024 MB | 1024 MB | 512 MB |
channel0: | 1024 MB | 1024 MB | 512 MB |
----------+-----------------------------------+
as if both channels were symetric, grouping the DIMMs on a wrong
layout.
After this patch, the memory is correctly represented.
So, for csrows at layers[0], it shows:
+-----------------------------------------------+
| mc0 |
| csrow0 | csrow1 | csrow2 | csrow3 |
----------+-----------------------------------------------+
channel1: | 512 MB | 512 MB | 0 MB | 0 MB |
channel0: | 1024 MB | 1024 MB | 1024 MB | 1024 MB |
----------+-----------------------------------------------+
For csrows at layers[1], it shows:
+-----------------------+
| mc0 |
| channel0 | channel1 |
--------+-----------------------+
csrow3: | 1024 MB | 0 MB |
csrow2: | 1024 MB | 0 MB |
--------+-----------------------+
csrow1: | 1024 MB | 512 MB |
csrow0: | 1024 MB | 512 MB |
--------+-----------------------+
So, no matter of what comes first, the information between
channel and csrow will be properly represented.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>