The logic there is broken: it basically creates two csrows for
each DIMM and assumes that all DIMM's are dual rank. Only one of
the csrows will contain the entire DIMM size. If single rank
memories are found, they'll be marked with 0 bytes.
The check if the AMB is present were also wrong.
Yet, as the error reports don't use the memory size in order to
credit an error to the right DIMM, that part of the driver seems
to work. That's why probably nobody detected the issue yet.
After this patch, the memory layout is now properly reported,
when debug mode is enabled, and the number of ranks per dimm is
now shown:
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot 3 0 MB | 0 MB | 0 MB | 0 MB |
calculate_dimm_size: slot 2 0 MB | 0 MB | 0 MB | 0 MB |
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot 1 0 MB | 0 MB | 0 MB | 0 MB |
calculate_dimm_size: slot 0 512 MB 1R| 512 MB 1R| 512 MB 1R| 512 MB 1R|
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: channel 0 | channel 1 | channel 2 | channel 3 |
calculate_dimm_size: branch 0 | branch 1 |
(1R above means that all memories on my test machine are single-ranked)
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Improves the debug output message, in order to better represent the
memory controller hierarchy, when outputing the debug messages.
No functional changes when debug is disabled.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Remove some information that it is duplicated at the MCE log,
and don't have much usage for the error. Those data will be
added again, when creating a trace function that outputs both
memory errors and MCE fields.
Cc: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
While userspace doesn't fill the dimm labels, add there the dimm location,
as described by the used memory model. This could eventually match what
is described at the dmidecode, making easier for people to identify the
memory.
For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:
Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: 1
Locator: A1_DIMM0
Bank Locator: A1_Node0_Channel0_Dimm0
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 800 MHz
Manufacturer: A1_Manufacturer0
Serial Number: A1_SerNum0
Asset Tag: A1_AssetTagNum0
Part Number: A1_PartNum0
The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.
After this patch, the memory label will be filled with:
/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0
And (after the new EDAC API patches) as:
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0
So, even if the memory label is not initialized on userspace, an useful
information with the error location is filled there, expecially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.
It should noticed that, as the label filling happens at the
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that all drivers got converted to use the new ABI, we can
drop the old one.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Change the EDAC internal representation to work with non-csrow
based memory controllers.
There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.
The edac core was written with the idea that memory controllers
are able to directly access csrows.
This is not true for FB-DIMM and RAMBUS memory controllers.
Also, some recent advanced memory controllers don't present a per-csrows
view. Instead, they view memories as DIMMs, instead of ranks.
So, change the allocation and error report routines to allow
them to work with all types of architectures.
This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.
Also, several tests were done on different platforms using different
x86 drivers.
TODO: a multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same DIMM.
This bug is present since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.
I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The edac_align_ptr() function is used to prepare data for a single
memory allocation kzalloc() call. It counts how many bytes are needed
by some data structure.
Using it as-is is not that trivial, as the quantity of memory elements
reserved is not there, but, instead, it is on a next call.
In order to avoid mistakes when using it, move the number of allocated
elements into it, making easier to use it.
Reviewed-by: Borislav Petkov <bp@amd64.org>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The number of pages is a dimm property. Move it to the dimm struct.
After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.
A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Almost all edac drivers initialize csrow_info->first_page,
csrow_info->last_page and csrow_info->page_mask. Those vars are
used inside the EDAC core, in order to calculate the csrow affected
by an error, by using the routine edac_mc_find_csrow_by_page().
However, very few drivers actually use it:
e752x_edac.c
e7xxx_edac.c
i3000_edac.c
i82443bxgx_edac.c
i82860_edac.c
i82875p_edac.c
i82975x_edac.c
r82600_edac.c
There also a few other drivers that have their own calculus
formula internally using those vars.
All the others are just wasting time by initializing those
data.
While initializing data without using them won't cause any troubles, as
those information is stored at the wrong place (at csrows structure), it
is better to remove what is unused, in order to simplify the next patch.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.
However, such assumption is not true for all types of memory
controllers.
Controllers for FB-DIMM's don't have such requirements.
Also, modern Intel controllers seem to be capable of handling such
differences.
So, we need to get rid of storing the DIMM information into a per-csrow
data, storing it, instead at the right place.
The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Mike Williams <mike@mikebwilliams.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're ridden behind some chip like the AMB's
on FBDIMM's, for example.
This forced drivers to fake^Wvirtualize a csrow struct, and to create
a mess under csrow/channel original's concept.
Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Latter patches will modify the location to properly represent the
memory architecture.
All other drivers will use a per-csrow type of location.
Some of those drivers will require a latter conversion, as
they also fake the csrows internally.
TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug at the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A latter patch
is need to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.
The edac_mc_alloc() will now contain a per-dimm initialization loop that
will be changed by latter patches in order to match other types of
memory architectures.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Needed for shifting 64-bit values on 32-bit, like MSR values,
for example.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frank Arnold <frank.arnold@amd.com>
Link: http://lkml.kernel.org/r/1337684026-19740-1-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a partial revert of
15ed103a98 ("edac: Fix spelling errors")
6997991ab0 ("mips: Fix printk typos in arc/mips")
which change code that doesn't exist any more in edac/mips trees.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull arch/tile bug fixes from Chris Metcalf:
"This includes Paul Gortmaker's change to fix the <asm/system.h>
disintegration issues on tile, a fix to unbreak the tilepro ethernet
driver, and a backlog of bugfix-only changes from internal Tilera
development over the last few months.
They have all been to LKML and on linux-next for the last few days.
The EDAC change to MAINTAINERS is an oddity but discussion on the
linux-edac list suggested I ask you to pull that change through my
tree since they don't have a tree to pull edac changes from at the
moment."
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits)
drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing
MAINTAINERS: update EDAC information
tilepro ethernet driver: fix a few minor issues
tile-srom.c driver: minor code cleanup
edac: say "TILEGx" not "TILEPro" for the tilegx edac driver
arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally
arch/tile: remove bogus performance optimization
arch/tile: return SIGBUS for addresses that are unaligned AND invalid
arch/tile: fix finv_buffer_remote() for tilegx
arch/tile: use atomic exchange in arch_write_unlock()
arch/tile: stop mentioning the "kvm" subdirectory
arch/tile: export the page_home() function.
arch/tile: fix pointer cast in cacheflush.c
arch/tile: fix single-stepping over swint1 instructions on tilegx
arch/tile: implement panic_smp_self_stop()
arch/tile: add "nop" after "nap" to help GX idle power draw
arch/tile: use proper memparse() for "maxmem" options
arch/tile: fix up locking in pgtable.c slightly
arch/tile: don't leak kernel memory when we unload modules
arch/tile: fix bug in delay_backoff()
...
MCA details seldom change inbetween the models of a family so don't
be too conservative and enable decoding on everything starting from
K8 onwards. Minor adjustments can come in later but most importantly,
we have some decoding infrastructure in place for upcoming models by
default.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is just an aesthetic change but it was silly to say TILEPro
when booting up on the tilegx architecture.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Pull EDAC fixes from Mauro Carvalho Chehab:
"A series of EDAC driver fixes. It also has one core fix at the
documentation, and a rename patch, fixing the name of the struct that
contains the rank information."
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
edac: rename channel_info to rank_info
i5400_edac: Avoid calling pci_put_device() twice
edac: i5100 ack error detection register after each read
edac: i5100 fix erroneous define for M1Err
edac: sb_edac: Fix a wrong value setting for the previous value
edac: sb_edac: Fix a INTERLEAVE_MODE() misuse
edac: sb_edac: Let the driver depend on PCI_MMCONFIG
edac: Improve the comments to better describe the memory concepts
edac/ppc4xx_edac: Fix compilation
Fix sb_edac compilation with 32 bits kernels
"[RFC PATCH 0/2] audit of linux/device.h users in include/*"
https://lkml.org/lkml/2012/3/4/159
--
Nearly every subsystem has some kind of header with a proto like:
void foo(struct device *dev);
and yet there is no reason for most of these guys to care about the
sub fields within the device struct. This allows us to significantly
reduce the scope of headers including headers. For this instance, a
reduction of about 40% is achieved by replacing the include with the
simple fact that the device is some kind of a struct.
Unlike the much larger module.h cleanup, this one is simply two
commits. One to fix the implicit <linux/device.h> users, and then
one to delete the device.h includes from the linux/include/ dir
wherever possible.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPbNxLAAoJEOvOhAQsB9HWR6QQAMRUZ94O2069/nW9h4TO/xTr
Hq/80lo/TBBiRmob3iWBP76lzgeeMPPVEX1I6N7YYlhL3IL7HsaJH1DvpIPPHXQP
GFKcBsZ5ZLV8c4CBDSr+/HFNdhXc0bw0awBjBvR7gAsWuZpNFn4WbhizJi4vWAoE
4ydhPu55G1G8TkBtYLJQ8xavxsmiNBSDhd2i+0vn6EVpgmXynjOMG8qXyaS97Jvg
pZLwnN5Wu21coj6+xH3QUKCl1mJ+KGyamWX5gFBVIfsDB3k5H4neijVm7t1en4b0
cWxmXeR/JE3VLEl/17yN2dodD8qw1QzmTWzz1vmwJl2zK+rRRAByBrL0DP7QCwCZ
ppeJbdhkMBwqjtknwrmMwsuAzUdJd79GXA+6Vm+xSEkr6FEPK1M0kGbvaqV9Usgd
ohMewewbO6ddgR9eF7Kw2FAwo0hwkPNEplXIym9rZzFG1h+T0STGSHvkn7LV765E
ul1FapSV3GCxEVRwWTwD28FLU2+0zlkOZ5sxXwNPTT96cNmW+R7TGuslZKNaMNjX
q7eBZxo8DtVt/jqJTntR8bs8052c8g1Ac1IKmlW8VSmFwT1M6VBGRn1/JWAhuUgv
dBK/FF+I1GJTAJWIhaFcKXLHvmV9uhS6JaIhLMDOetoOkpqSptJ42hDG+89WkFRk
o55GQ5TFdoOpqxVzGbvE
=3j4+
-----END PGP SIGNATURE-----
Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull <linux/device.h> avoidance patches from Paul Gortmaker:
"Nearly every subsystem has some kind of header with a proto like:
void foo(struct device *dev);
and yet there is no reason for most of these guys to care about the
sub fields within the device struct. This allows us to significantly
reduce the scope of headers including headers. For this instance, a
reduction of about 40% is achieved by replacing the include with the
simple fact that the device is some kind of a struct.
Unlike the much larger module.h cleanup, this one is simply two
commits. One to fix the implicit <linux/device.h> users, and then one
to delete the device.h includes from the linux/include/ dir wherever
possible."
* tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
device.h: audit and cleanup users in main include dir
device.h: cleanup users outside of linux/include (C files)
What it is pointed by a csrow/channel vector is a rank information, and
not a channel information.
On a traditional architecture, the memory controller directly access the
memory ranks, via chip select rows. Different ranks at the same DIMM is
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.
On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.
The AMB selection is via the DIMM slot, and not via a csrow.
It is up to the AMB to talk with the csrows of the DRAM chips.
So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.
Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even working with normal DDR3 DIMM's, don't use the usual
channel A/channel B interleaving schema to provide 128 bits data access.
Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemas. Such memory controllers see the DIMMs
directly on their registers, instead of the ranks, which is better for
the driver, as its main usageis to point to a broken DIMM stick (the
Field Repleceable Unit), and not to point to a broken DRAM chip.
The drivers that support such such newer memory architecture models
currently need to fake information and to abuse on EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.
To make things a little worse, those drivers don't currently fake
csrows/channels on a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.
In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.
Latter patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
When i5400_edac driver is removed and re-loaded a few times, it causes
an OOPS, as it is currently decrementing some PCI device usage two
times.
When called inside a loop, pci_get_device() will call
pci_put_device(). That mangles the error count. In this specific
case, it seems easier to just duplicate the call.
Also fixes the error logic when pci_get_device fails.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
If I only ack the detection register after a error have been detected
I'm unable to reliably detect errors. I have verified this behavior
using both an error injection DIMM and software to inject errors.
I can't find any documentation supporting this behavior in Intel 5100
Memory Controller Hub Chipset, see 1. So this is all based on
experimentation.
[1] Intel® 5100 Memory Controller Hub Chipset
http://www.intel.com/content/dam/doc/datasheet/5100-
memory-controller-hub-chipset-datasheet.pdf
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
According to [1] the define for M1Err in the FERR_NF_MEM register is
wrong. It should be at position 1 not 0.
[1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378
http://www.intel.com/content/dam/doc/datasheet/5100-
memory-controller-hub-chipset-datasheet.pdf
Reported-by: Ba Thang Nguyen <thang.b.nguyen@dektech.com.au>
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
>From the driver design, the variable limit wants to compare with its
previous value, we should set the value of limit instead of the value
of tmp_mb to the variable prev.
Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
We can identify dram interleave mode from the Dram Rule register
rather than Dram Interleave list register.
In this context, the reg of INTERLEAVE_MODE(reg) contains the Dram
Interleave list register, we can't get interleave mode from the reg,
while the variable interleave_mode saves the the mode got from the
Dram Rule register, so we use the variable to replace
INTERLEAVE_MDDE(reg) here.
Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This driver needs to access PCIe Extended Configuration Space
Registers (0x100~0xfff), to correctly access those registers, we need
to enable PCI_MMCONFIG option. Since this option is not enabled for
X86_64 by default, we let the driver depend on it to prevent users
forgetting to enable this option.
Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
It seems that nobody is cross-compiling for this arch anymore...
drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe':
drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove'
...
drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function)
drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in
drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value]
Acked-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
... so that checkpatch can chill out.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Correct their formulation, replace per-family functions with a single,
unified lookup table.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
This MC1 error signature is called differently now, fix it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Use "System Read Data Error" as a more general name for MC0 bus errors
on F15h and update some error definitions.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
These const tables are currently marked __devinitdata, but
Documentation/PCI/pci.txt says:
"o The ID table array should be marked __devinitconst; this is done
automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()."
So use DEFINE_PCI_DEVICE_TABLE(x).
Based on PaX and earlier work by Andi Kleen.
Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The original scrub rate API definition states that if scrub rate
accessors are not implemented, a negative value (-1) should be written
to the sysfs file (/sys/devices/system/edac/mc/mc<N>/sdram_scrub_rate,
where N is the memory controller number on the system). This is
counter-intuitive and awkward at the very least because, when setting
the scrub rate, userspace has to write to sysfs and then read it back to
check error status of the operation.
As Tony notes, best it would be to not have the sdram_scrub_rate in
sysfs if scrub rate support is not implemented. It is too late about
that and a bunch of drivers on a bunch of arches would need to be
changed and tested which is not a trivial task ATM.
Instead, settle for the next best thing of returning -ENODEV when
implementation is missing and -EINVAL when there was an error
encountered while setting the scrub rate.
Reported-by: Han Pingtian <phan@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20110916105856.GA13253@hpt.nay.redhat.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
While initializing the array of csrow attribute instances, a few csrows
were uninitialized. This happened because the module only performed a
check for DRAM base ctl register0's and not DRAM base ctl register1's
chip select enable bit. There could be systems with DIMMs populated
on only single memory channel whereas the module also assumed that a
dual channel dimm had double the memory size of a single memory channel
instead of checking the memory on each channel.
This patch fixes these above issues.
Signed-off-by: Ashish Shenoy <ashenoy@riverbed.com>
Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
Link: http://lkml.kernel.org/r/4F459CFA.5090604@riverbed.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
For files that are actively using linux/device.h, make sure
that they call it out. This will allow us to clean up some
of the implicit uses of linux/device.h within include/*
without introducing build regressions.
Yes, this was created by "cheating" -- i.e. the headers were
cleaned up, and then the fallout was found and fixed, and then
the two commits were reordered. This ensures we don't introduce
build regressions into the git history.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This provides unified readq()/writeq() helper functions for 32-bit
drivers.
For some cases, readq/writeq without atomicity is harmful, and order of
io access has to be specified explicitly. So in this patch, new two
header files which contain non-atomic readq/writeq are added.
- <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/
writeq with the order of lower address -> higher address
- <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/
writeq with reversed order
This allows us to remove some readq()s that were added drivers when the
default non-atomic ones were removed in commit dbee8a0aff ("x86:
remove 32-bit versions of readq()/writeq()")
The drivers which need readq/writeq but can do with the non-atomic ones
must add the line:
#include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */
But this will be nop in 64-bit environments, and no other #ifdefs are
required. So I believe that this patch can solve the problem of
1. driver-specific readq/writeq
2. atomicity and order of io access
This patch is tested with building allyesconfig and allmodconfig as
ARCH=x86 and ARCH=i386 on top of tip/master.
Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...
Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
arm: fix up some samsung merge sysdev conversion problems
firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
Drivers:hv: Fix a bug in vmbus_driver_unregister()
driver core: remove __must_check from device_create_file
debugfs: add missing #ifdef HAS_IOMEM
arm: time.h: remove device.h #include
driver-core: remove sysdev.h usage.
clockevents: remove sysdev.h
arm: convert sysdev_class to a regular subsystem
arm: leds: convert sysdev_class to a regular subsystem
kobject: remove kset_find_obj_hinted()
m86k: gpio - convert sysdev_class to a regular subsystem
mips: txx9_sram - convert sysdev_class to a regular subsystem
mips: 7segled - convert sysdev_class to a regular subsystem
sh: dma - convert sysdev_class to a regular subsystem
sh: intc - convert sysdev_class to a regular subsystem
power: suspend - convert sysdev_class to a regular subsystem
power: qe_ic - convert sysdev_class to a regular subsystem
power: cmm - convert sysdev_class to a regular subsystem
s390: time - convert sysdev_class to a regular subsystem
...
Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
- arch/arm/mach-exynos/cpu.c
- arch/arm/mach-exynos/irq-eint.c
- arch/arm/mach-s3c64xx/common.c
- arch/arm/mach-s3c64xx/cpu.c
- arch/arm/mach-s5p64x0/cpu.c
- arch/arm/mach-s5pv210/common.c
- arch/arm/plat-samsung/include/plat/cpu.h
- arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.
The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space. However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files. The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:
text data bss dec hex filename
4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before
4737444 506459 972040 6215943 5ed907 vmlinux.o.after
for a difference of 276 bytes for an example UP config.
If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No functionality change, this is done so that in a follow-on patch all
queued-up MCEs can be decoded after registering on the chain.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
Please let me know if I missed anything, and I will try to get it changed and resent.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
compatible in dts has been changed, so the driver needs to be updated
accordingly.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
powerpc/p3060qds: Add support for P3060QDS board
powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
powerpc/85xx: Make kexec to interate over online cpus
powerpc/fsl_booke: Fix comment in head_fsl_booke.S
powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
powerpc/86xx: Correct Gianfar support for GE boards
powerpc/cpm: Clear muram before it is in use.
drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
powerpc/fsl_msi: add support for "msi-address-64" property
powerpc/85xx: Setup secondary cores PIR with hard SMP id
powerpc/fsl-booke: Fix settlbcam for 64-bit
powerpc/85xx: Adding DCSR node to dtsi device trees
powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
powerpc/85xx: fix PHYS_64BIT selection for P1022DS
powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
powerpc: respect mem= setting for early memory limit setup
powerpc: Update corenet64_smp_defconfig
powerpc: Update mpc85xx/corenet 32-bit defconfigs
...
Fix up trivial conflicts in:
- arch/powerpc/configs/40x/hcu4_defconfig
removed stale file, edited elsewhere
- arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
added opal and gelic drivers vs added ePAPR driver
- drivers/tty/serial/8250.c
moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
The sb_edac driver is marginally useful on a 32-bit kernel, and
currently has 64-bit divide compile errors when building that config.
For now, make this build on only for 64-bit kernels.
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
MAINTAINERS: add an entry for Edac Sandy Bridge driver
edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
EDAC: Fix incorrect edac mode reporting in sb_edac
edac: sb_edac: Add it to the building system
edac: Add an experimental new driver to support Sandy Bridge CPU's
i7300_edac: Fix error cleanup logic
i7core_edac: Initialize memory name with cpu, channel, bank
i7core_edac: Fix compilation on 32 bits arch
i7core_edac: scrubbing fixups
EDAC: Correct Kconfig dependencies
i7core_edac: return -ENODEV if no MC is found
i7core_edac: use edac's own way to print errors
MAINTAINERS: remove dropped edac_mce.* from the file
i7core_edac: Drop the edac_mce facility
x86, MCE: Use notifier chain only for MCE decoding
EDAC i7core: Use mce socketid for better compatibility
i7core_edac: Don't enable memory scrubbing for Xeon 35xx
i7core_edac: Add scrubbing support
edac: Move edac main structs to include/linux/edac.h
i7core_edac: Fix oops when trying to inject errors
...
The edac driver for Sandy Bridge was found to be reporting "FPM"
for edac_mode, which clearly doesn't make sense. It was found that
sb_edac.c:get_dimm_config was reusing a variable for both mem_type
and edac_type, and thus was overwriting the value after setting
it correctly. This patch fixes that issue.
Before the patch:
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:FPM
After:
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:S4ECD4ED
Signed-off-by: Mark A. Grondona <mgrondona@llnl.gov>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Some changes on it were required due to changeset cd90cc84c6bf0, that
changed the glue with the MCE logic.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This driver is known to work on mine and Tony's test environments,
using software error injection, and a partial hardware/software
error injection tool.
There's no broader range test yet to double check if the error decoding
logic will actually point to the right DIMM, so use it with care.
More tests are required to be sure that the driver will work on all
different types of memory configurations.
If you're willing to risk using it, I suggest you to enable EDAC debugs
for your test machines, as the debug logs helps to track what's going
inside the driver.
Please feed me with bug reports, if you notice that the driver
is miss-behaving.
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The error cleanup logic was broken. Due to that, one error is generated for
every error polling.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
on i386:
ERROR: "__udivdi3" [drivers/edac/i7core_edac.ko] undefined!\
In both get_sdram_scrub_rate() and set_sdram_scrub_rate()
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Get a more reliable DCLK value from DMI, name the SCRUBINTERVAL mask
and guard against potential overflow in the scrub rate computations.
Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
Both AMD and Intel i7 EDAC drivers use MCE features and are thus
dependent of this functionality present in the kernel. Express this in
Kconfig so that randconfig builds don't break.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Nehalem-EX uses a different memory controller. However, as the
memory controller is not visible on some Nehalem/Nehalem-EP, we
need to indirectly probe via a X58 PCI device. The same devices
are found on (some) Nehalem-EX. So, on those machines, the
probe routine needs to return -ENODEV, as the actual Memory
Controller registers won't be detected.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This file really needs the full module.h header file present, but
was just getting it implicitly before. Fix it up in advance so we
avoid build failures once the cleanup commit is present.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
mce->socketid and cpu_data(mce->cpu).phys_proc_id are the same,
compare with mce_setup (in mce.c):
m->cpu = m->extcpu = smp_processor_id();
...
m->socketid = cpu_data(m->extcpu).phys_proc_id;
This makes it easier for example for XEN patches to hook into
the MCE subsystem.
Compile tested on x86_64.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: JBeulich@novell.com
CC: linux-edac@vger.kernel.org
CC: Mauro Carvalho Chehab <mchehab@redhat.com>
Add scrubbing support to i7core_edac, tested on intel Xeon L5638.
Signed-off-by: Samuel Gabrielsson <samuel.gabrielsson@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
As we'll need to use those structs for trace functions, they should
be on a more public place. So, move struct mem_ctl_info & friends
to edac.h.
No functional changes on this patch.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Error injection needs the pci device 0:0. So, we need to revert
this changeset: 79daef2099.
Tests need to be made to be sure that refcount won't be wrong
as noticed before.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>