Commit Graph

20 Commits

Author SHA1 Message Date
Borislav Petkov 1c38108a08 PCI/AER: Clean up error printing code a bit
Save one indentation level in aer_print_error() for the generic case where
we have info->status of an error, disregard 80 cols rule a bit for the sake
of better readability, fix alignment.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-12-13 14:40:03 -07:00
Borislav Petkov fab4c256a5 PCI/AER: Add a TLP header print helper
... and call it instead of duplicating the large printk format
statement.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-12-13 14:39:56 -07:00
Lance Ortiz 37448adfc7 aerdrv: Move cper_print_aer() call out of interrupt context
The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.

WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()

This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer().  The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.

The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-05-30 10:51:20 -07:00
Lance Ortiz 2cced2d959 aerdrv: Cleanup log output for AER
These changes make cper_print_aer more consistent with aer_print_error
and clean things up by elimiating the use of the prefix variable and
replacing it with dev_printk.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Boris Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-01-03 14:35:41 -08:00
Lance Ortiz 1d5210008b aerdrv: Enhanced AER logging
This patch will provide a more reliable and easy way for user-space
applications to have access to AER logs rather than reading them from the
message buffer. It also provides a way to notify user-space when an AER
event occurs.

The aer driver is updated to generate a trace event of function 'aer_event'
when a PCIe error is reported over the AER interface.  The trace event was
added to both the interrupt based aer path and the firmware first path.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Boris Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-01-03 14:34:06 -08:00
Huang Ying 0918472cee PCI: PCIe AER: add aer_recover_queue
In addition to native PCIe AER, now APEI (ACPI Platform Error
Interface) GHES (Generic Hardware Error Source) can be used to report
PCIe AER errors too.  To add support to APEI GHES PCIe AER recovery,
aer_recover_queue is added to export the recovery function in native
PCIe AER driver.

Recoverable PCIe AER errors are reported via NMI in APEI GHES.  Then
APEI GHES uses irq_work to delay the error processing into an IRQ
handler.  But PCIe AER recovery can be very time-consuming, so
aer_recover_queue, which can be used in IRQ handler, delays the real
recovery action into the process context, that is, work queue.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:25:37 -07:00
Huang Ying c413d76820 ACPI, APEI, Add PCIe AER error information printing support
The AER error information printing support is implemented in
drivers/pci/pcie/aer/aer_print.c.  So some string constants, functions
and macros definitions can be re-used without being exported.

The original PCIe AER error information printing function is not
re-used directly because the overall format is quite different.  And
changing the original printing format may make some original users'
scripts broken.

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-21 22:59:08 -04:00
Huang Ying b64a441465 PCIe, AER, use pre-generated prefix in error information printing
When printing PCIe AER error information, each line is prefixed with
PCIe device and driver information.  In original implementation, the
prefix is generated when each line is printed.  In fact, all lines
share the same prefix.  So this patch pre-generated the prefix, and
use that one when each line is printed.

In addition to common prefix can be pre-generated, the trailing white
spaces in string constants and NULLs in char * array constants can be
removed too.  These can reduce the object file size further.

The size of object file before and after changing is as follow:

           text    data     bss     dec
before:    3038       0       0    3038
after:     2118       0       0    2118

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-21 22:59:08 -04:00
Stefan Assmann 7e8af37a9a PCI: change PCI nomenclature in drivers/pci/ (non-comment changes)
Changing occurrences of variants of PCI-X and PCIe to the PCI-SIG
terms listed in the "Trademark and Logo Usage Guidelines".
http://www.pcisig.com/developers/procedures/logos/Trademark_and_Logo_Usage_Guidelines_updated_112206.pdf

Patch is limited to drivers/pci/ and changes concern non-comment parts or
anything that might be visible to the user.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:54 -08:00
Hidetoshi Seto 79e4b89be8 PCI: pcie, aer: change error print format
Use dev_printk like format.

Sample (real machine + dummy error injected by aer-inject):

- Before:

+------ PCI-Express Device Error ------+
Error Severity          : Corrected
PCIE Bus Error type     : Data Link Layer
Bad TLP                 :
Receiver ID             : 2800
VendorID=8086h, DeviceID=1096h, Bus=28h, Device=00h, Function=00h
+------ PCI-Express Device Error ------+
Error Severity          : Corrected
PCIE Bus Error type     : Data Link Layer
Bad TLP                 :
Bad DLLP                :
Receiver ID             : 2801
VendorID=8086h, DeviceID=1096h, Bus=28h, Device=00h, Function=01h
Error of this Agent(2801) is reported first

- After:

pcieport-driver 0000:00:02.0: AER: Multiple Corrected error received: id=2801
e1000e 0000:28:00.0: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=2800(Receiver ID)
e1000e 0000:28:00.0:   device [8086:1096] error status/mask=00000040/00000000
e1000e 0000:28:00.0:    [ 6] Bad TLP
e1000e 0000:28:00.1: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=2801(Receiver ID)
e1000e 0000:28:00.1:   device [8086:1096] error status/mask=000000c0/00000000
e1000e 0000:28:00.1:    [ 6] Bad TLP
e1000e 0000:28:00.1:    [ 7] Bad DLLP
e1000e 0000:28:00.1:   Error of this Agent(2801) is reported first

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:50:05 -07:00
Hidetoshi Seto 273024ded7 PCI: pcie, aer: flags to bits
Compact struct and codes.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:49:56 -07:00
Hidetoshi Seto e7a0d92b19 PCI: pcie, aer: report multiple/first error on a device
Multiple bits might be set in the Uncorrectable Error Status
register.  But aer_print_error_source() only report a error of
the lowest bit set in the error status register.

So print strings for all bits unmasked and set.

And check First Error Pointer to mark the error occured first.
This FEP is not valid when the corresponing bit of the Uncorrectable
Error Status register is not set, or unimplemented or undefined.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:49:26 -07:00
Hidetoshi Seto 0d90c3ac0b PCI: pcie, aer: refer mask state in mask register properly
ERR_{,UN}CORRECTABLE_ERROR_MASK are set of error bits which linux know,
set of PCI_ERR_COR_* and PCI_ERR_UNC_* defined in linux/pci_regs.h.
This masks make aerdrv not to report errors of unknown bit, while aerdrv
have ability to report such undefined errors as "Unknown Error Bit %2d".

OTOH aerdrv_errprint does not have any check of setting in mask register.
So it could report masked wrong error by finding bit in status without
knowing that the bit is masked in the mask register.

This patch changes aerdrv to use mask state in mask register propely
instead of defined/hardcoded ERR_{,UN}CORRECTABLE_ERROR_MASK.
This change prevents aerdrv from reporting masked error, and also enable
reporting unknown errors.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:49:07 -07:00
Hidetoshi Seto 24dbb7beb2 PCI: pcie, aer: remove spinlock in aerdrv_errprint.c
The static buffer errmsg_buff[] is used only for building error
message in fixed format, and is protected by a spinlock.

This patch removes this buffer and the spinlock.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:48:19 -07:00
Hidetoshi Seto 0d465f2350 PCI: pcie, aer: fix report of multiple errors
The flag AER_MULTI_ERROR_VALID_FLAG in info->flag does mean that the
root port receives multiple error messages.  Error messages can be
posted from different devices, so it does not mean that each reported
device has multiple errors.

If there are multiple error devices and the root port has valid error
source ID, it would be nice to report which device is the error source
reported first.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:47:46 -07:00
Hidetoshi Seto f158575696 PCI: pcie, aer: rework MASK macros in aerdrv_errprint.c
Definitions of MASK macros in aerdrv_errprint.c are tricky and unsafe.

For example, AER_AGENT_TRANSMITTER_MASK(_sev, _stat) does work like:
  static inline func(int _sev, int _stat)
  {
    if (_sev == AER_CORRECTABLE)
      return (_stat & (PCI_ERR_COR_REP_ROLL|PCI_ERR_COR_REP_TIMER));
    else
      return (_stat & PCI_ERR_COR_REP_ROLL);
  }
In case of else path here, for uncorrectable errors, testing bits in
_stat by PCI_ERR_COR_* does not make sense because _stat should have only
PCI_ERR_UNC_* bits originated in uncorrectable error status register.
But at this time this is safe because uncorrectable error using bit
position same to PCI_ERR_COR_REP_ROLL(= bit position 8) is not defined.
Likewise, AER_AGENT_COMPLETER_MASK is always PCI_ERR_UNC_COMP_ABORT but
it works because bit 15 of correctable error status is not defined.

It means that these MASK macros will turn to be wrong once if new error
is defined. (In fact, bit 15 of correctable is now defined in PCIe 2.1)

This patch changes these MASK macros to be more strict, not to return
PCI_ERR_COR_* bits for uncorrectable error status and vise versa.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:47:16 -07:00
Hidetoshi Seto bd8fedd045 PCI: pcie, aer: AER_PR for printing in aerdrv_errprint.c
Add workaround macro to reduce the number of checkpatch warning:
 WARNING: printk() should include KERN_ facility level

Before:
  total: 0 errors, 10 warnings, 247 lines checked
After:
  total: 0 errors, 1 warnings, 243 lines checked

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:46:54 -07:00
Hidetoshi Seto c9a918838c PCI: pcie, aer: checkpatch style cleanup in pcie/aer/*
Before:
 drivers/pci/pcie/aer/aer_inject.c
  total: 4 errors, 4 warnings, 473 lines checked
 drivers/pci/pcie/aer/aerdrv.c
  total: 5 errors, 2 warnings, 333 lines checked
 drivers/pci/pcie/aer/aerdrv.h
  total: 1 errors, 0 warnings, 139 lines checked
 drivers/pci/pcie/aer/aerdrv_core.c
  total: 4 errors, 3 warnings, 872 lines checked
 drivers/pci/pcie/aer/aerdrv_errprint.c
  total: 12 errors, 11 warnings, 248 lines checked

After:
 drivers/pci/pcie/aer/aer_inject.c
  total: 0 errors, 0 warnings, 466 lines checked
 drivers/pci/pcie/aer/aerdrv.c
  total: 0 errors, 0 warnings, 335 lines checked
 drivers/pci/pcie/aer/aerdrv.h
  total: 0 errors, 0 warnings, 139 lines checked
 drivers/pci/pcie/aer/aerdrv_core.c
  total: 0 errors, 0 warnings, 869 lines checked
 drivers/pci/pcie/aer/aerdrv_errprint.c
  total: 0 errors, 10 warnings, 247 lines checked

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:46:18 -07:00
Hidetoshi Seto a367f74cb6 PCI hotplug: aerdrv: fix a typo in error message
"TLP" is an acronym for "Transaction Layer Packet."

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:41 -08:00
Zhang, Yanmin 6c2b374d74 PCI-Express AER implemetation: AER core and aerdriver
Patch 3 implements the core part of PCI-Express AER and aerdrv
port service driver.

When a root port service device is probed, the aerdrv will call
request_irq to register irq handler for AER error interrupt.

When a device sends an PCI-Express error message to the root port,
the root port will trigger an interrupt, by either MSI or IO-APIC,
then kernel would run the irq handler. The handler collects root
error status register and schedules a work. The work will call
the core part to process the error based on its type
(Correctable/non-fatal/fatal).

As for Correctable errors, the patch chooses to just clear the correctable
error status register of the device.

As for the non-fatal error, the patch follows generic PCI error handler
rules to call the error callback functions of the endpoint's driver. If
the device is a bridge, the patch chooses to broadcast the error to
downstream devices.

As for the fatal error, the patch resets the pci-express link and
follows generic PCI error handler rules to call the error callback
functions of the endpoint's driver. If the device is a bridge, the patch
chooses to broadcast the error to downstream devices.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-26 17:43:53 -07:00