This patch fixed the following bug.
https://bugzilla.kernel.org/show_bug.cgi?id=43282
This is caused by a firmware bug checking (checking generic address
register provided by firmware) in runtime. The checking should be
done in address mapping time instead of runtime to avoid too much
error reporting in runtime.
Reported-by: Pawel Sikora <pluto@agmk.net>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
APEI needs memory access in interrupt context. The obvious choice is
acpi_read(), but originally it couldn't be used in interrupt context
because it makes temporary mappings with ioremap(). Therefore, we added
drivers/acpi/atomicio.c, which provides:
acpi_pre_map_gar() -- ioremap in process context
acpi_atomic_read() -- memory access in interrupt context
acpi_post_unmap_gar() -- iounmap
Later we added acpi_os_map_generic_address() (2971852) and enhanced
acpi_read() so it works in interrupt context as long as the address has
been previously mapped (620242a). Now this sequence:
acpi_os_map_generic_address() -- ioremap in process context
acpi_read()/apei_read() -- now OK in interrupt context
acpi_os_unmap_generic_address()
is equivalent to what atomicio.c provides.
This patch introduces apei_read() and apei_write(), which currently are
functional equivalents of acpi_read() and acpi_write(). This is mainly
proactive, to prevent APEI breakages if acpi_read() and acpi_write()
are ever augmented to support the 'bit_offset' field of GAS, as APEI's
__apei_exec_write_register() precludes splitting up functionality
related to 'bit_offset' and APEI's 'mask' (see its
APEI_EXEC_PRESERVE_REGISTER block).
With apei_read() and apei_write() in place, usages of atomicio routines
are converted to apei_read()/apei_write() and existing calls within
osl.c and the CA, based on the re-factoring that was done in an earlier
patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
acpi_pre_map_gar() --> acpi_os_map_generic_address()
acpi_post_unmap_gar() --> acpi_os_unmap_generic_address()
acpi_atomic_read() --> apei_read()
acpi_atomic_write() --> apei_write()
Note that acpi_read() and acpi_write() currently use 'bit_width'
for accessing GARs which seems incorrect. 'bit_width' is the size of
the register, while 'access_width' is the size of the access the
processor must generate on the bus. The 'access_width' may be larger,
for example, if the hardware only supports 32-bit or 64-bit reads. I
wanted to minimize any possible impacts with this patch series so I
did *not* change this behavior.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some APEI firmware implementation will access injected address
specified in param1 to trigger the error when injecting memory error.
This will cause resource conflict with RAM.
On one of our testing machine, if injecting at memory address
0x10000000, the following error will be reported in dmesg:
APEI: Can not request iomem region <0000000010000000-0000000010000008> for GARs.
This patch removes the injecting memory address range from trigger
table resources to avoid conflict.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
APEI firmware first mode must be turned on explicitly on some
machines, otherwise there may be no GHES hardware error record for
hardware error notification. APEI bit in generic _OSC call can be
used to do that, but on some machine, a special WHEA _OSC call must be
used. This patch adds the support to that WHEA _OSC call.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some actions in APEI ERST and EINJ tables are optional, for example,
ACPI_EINJ_BEGIN_OPERATION action is used to do some preparation for
error injection, and firmware may choose to do nothing here. While
some other actions are mandatory, for example, firmware must provide
ACPI_EINJ_GET_ERROR_TYPE implementation.
Original implementation treats all actions as optional (that is, can
have no instructions), that may cause issue if firmware does not
provide some mandatory actions. To fix this, this patch adds
apei_exec_run_optional, which should be used for optional actions.
The original apei_exec_run should be used for mandatory actions.
Cc: Thomas Renninger <trenn@novell.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In APEI, Hardware error information reported by firmware to Linux
kernel is in the data structure of APEI generic error status (struct
acpi_hes_generic_status). While now printk is used by Linux kernel to
report hardware error information to user space.
So, this patch adds printing support for the data structure, so that
the corresponding hardware error information can be reported to user
space via printk.
PCIe AER information printing is not implemented yet. Will refactor the
original PCIe AER information printing code to avoid code duplicating.
The output format is as follow:
<error record> :=
APEI generic hardware error status
severity: <integer>, <severity string>
section: <integer>, severity: <integer>, <severity string>
flags: <integer>
<section flags strings>
fru_id: <uuid string>
fru_text: <string>
section_type: <section type string>
<section data>
<severity string>* := recoverable | fatal | corrected | info
<section flags strings># :=
[primary][, containment warning][, reset][, threshold exceeded]\
[, resource not accessible][, latent error]
<section type string> := generic processor error | memory error | \
PCIe error | unknown, <uuid string>
<section data> :=
<generic processor section data> | <memory section data> | \
<pcie section data> | <null>
<generic processor section data> :=
[processor_type: <integer>, <proc type string>]
[processor_isa: <integer>, <proc isa string>]
[error_type: <integer>
<proc error type strings>]
[operation: <integer>, <proc operation string>]
[flags: <integer>
<proc flags strings>]
[level: <integer>]
[version_info: <integer>]
[processor_id: <integer>]
[target_address: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[IP: <integer>]
<proc type string>* := IA32/X64 | IA64
<proc isa string>* := IA32 | IA64 | X64
<processor error type strings># :=
[cache error][, TLB error][, bus error][, micro-architectural error]
<proc operation string>* := unknown or generic | data read | data write | \
instruction execution
<proc flags strings># :=
[restartable][, precise IP][, overflow][, corrected]
<memory section data> :=
[error_status: <integer>]
[physical_address: <integer>]
[physical_address_mask: <integer>]
[node: <integer>]
[card: <integer>]
[module: <integer>]
[bank: <integer>]
[device: <integer>]
[row: <integer>]
[column: <integer>]
[bit_position: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[target_id: <integer>]
[error_type: <integer>, <mem error type string>]
<mem error type string>* :=
unknown | no error | single-bit ECC | multi-bit ECC | \
single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
target abort | parity error | watchdog timeout | invalid address | \
mirror Broken | memory sparing | scrub corrected error | \
scrub uncorrected error
<pcie section data> :=
[port_type: <integer>, <pcie port type string>]
[version: <integer>.<integer>]
[command: <integer>, status: <integer>]
[device_id: <integer>:<integer>:<integer>.<integer>
slot: <integer>
secondary_bus: <integer>
vendor_id: <integer>, device_id: <integer>
class_code: <integer>]
[serial number: <integer>, <integer>]
[bridge: secondary_status: <integer>, control: <integer>]
<pcie port type string>* := PCIe end point | legacy PCI end point | \
unknown | unknown | root port | upstream switch port | \
downstream switch port | PCIe to PCI/PCI-X bridge | \
PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
root complex event collector
Where, [] designate corresponding content is optional
All <field string> description with * has the following format:
field: <integer>, <field string>
Where value of <integer> should be the position of "string" in <field
string> description. Otherwise, <field string> will be "unknown".
All <field strings> description with # has the following format:
field: <integer>
<field strings>
Where each string in <fields strings> corresponding to one set bit of
<integer>. The bit position is the position of "string" in <field
strings> description.
For more detailed explanation of every field, please refer to UEFI
specification version 2.3 or later, section Appendix N: Common
Platform Error Record.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
CPER stands for Common Platform Error Record, it is the hardware error
record format used to describe platform hardware error by various APEI
tables, such as ERST, BERT and HEST etc.
For more information about CPER, please refer to Appendix N of UEFI
Specification version 2.3.
This patch mainly includes the data structure difinition header file
used by other files.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
APEI stands for ACPI Platform Error Interface, which allows to report
errors (for example from the chipset) to the operating system. This
improves NMI handling especially. In addition it supports error
serialization and error injection.
For more information about APEI, please refer to ACPI Specification
version 4.0, chapter 17.
This patch provides some common functions used by more than one APEI
tables, mainly framework of interpreter for EINJ and ERST.
A machine readable language is defined for EINJ and ERST for OS to
execute, and so to drive the firmware to fulfill the corresponding
functions. The machine language for EINJ and ERST is compatible, so a
common framework is defined for them.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>