Calgary hits a NULL pointer dereference when booting in a multi-chassis
NUMA system. See Redhat bugzilla number 198498, found by Konrad
Rzeszutek (konradr@redhat.com).
There are many issues that had to be resolved to fix this problem.
Firstly when I originally wrote the code to handle NUMA systems, I
had a large misunderstanding that was not corrected until now. That was
that I thought the "number of nodes online" referred to number of
physical systems connected. So that if NUMA was disabled, there
would only be 1 node and it would only show that node's PCI bus.
In reality if NUMA is disabled, the system displays all of the
connected chassis as one node but is only ignorant of the delays
in accessing main memory. Therefore, references to num_online_nodes()
and MAX_NUMNODES are incorrect and need to be set to the maximum
number of nodes that can be accessed (which are 8). I created a
variable, MAX_NUM_CHASSIS, and set it to 8 to fix this.
Secondly, when walking the PCI in detect_calgary, the code only
checked the first "slot" when looking to see if a device is present.
This will work for most cases, but unfortunately it isn't always the
case. In the NUMA MXE drawers, there are USB devices present on the
3rd slot (with slot 1 being empty). So, to work around this, all
slots (up to 8) are scanned to see if there are any devices present.
Lastly, the bus is being enumerated on large systems in a different
way the we originally thought. This throws the ugly logic we had
out the window. To more elegantly handle this, I reorganized the
kva array to be sparse (which removed the need to have any bus number
to kva slot logic in tce.c) and created a secondary space array to
contain the bus number to phb mapping.
With these changes Calgary boots on an x460 with 4 nodes with and
without NUMA enabled.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixed off-by-one error in detect_calgary and calgary_init which will
cause arrays to overflow. Also, removed impossible to hit BUG_ON.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Intel systems generally the TSC stops in C3 or deeper,
so don't use it there. Follows similar logic on i386.
This should fix problems on Meroms.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Similar patch to earlier x86-64 patch. When the dwarf2 unwinder fails
dump the left over stack with the old unwinder.
Also some clarifications in the headers.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The dwarf2 unwinder currently often gets stuck because a lot
of assembly code doesn't have proper dwarf2 annotiation yet.
This currently often happens with __down. Should fix this by
adding proper dwarf2 annotation to all inline assembly. However
until that's done we need a quick fix for 2.6.18 to avoid
incomplete backtraces.
So when this happens dump the rest of the stack with the old unwinder
instead of silently not dumping it. There was already a optional
"both" mode that dumped both, but that was too ugly.
I also clarified the headers for the different backtraces a bit.
Also add a clear error message for missing dwarf2
annotation that people can work on.
And I removed a dead variable left over from Ingo's changes.
Cc: mingo@elte.hu
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When int 0x80 is called from long mode r8-r11 would leak out of the
kernel (or rather they would be filled with some values from
the kernel stack). I don't think it's a security issue because
the values come from the fixed stack frame which should be near
always user registers from a previous interrupt.
Still better fix it.
Longer term the register save macros need to be cleaned up
to avoid such mistakes in the future.
Original analysis from Richard Brunner, fix by me.
Cc: Richard.Brunner@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes a obscure user space triggerable crash during oprofiling.
Oprofile calls profile_pc from NMIs even when user_mode(regs) is not true and
the program counter is inside the kernel lock section. This opens
a race - when a user program jumps to a kernel lock address and
a NMI happens before the illegal page fault exception is raised
and the program has a unmapped esp or ebp then the kernel could
oops. NMIs have a higher priority than exceptions so that could
happen.
Add user_mode checks to i386/x86-64 profile_pc to prevent that.
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recent changes in i386 __switch_to() have a misplaced closing
parenthesis causing an unlikely() to terminate early.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By using for_each_node_by_type().
Also, correct a spurioud test in check_cpu_node() on sparc64.
It is only called with nodes that have device_type "cpu".
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Explicitly traverse to the root looking for the "sbi".
2) Grab the "board#" property from the sbi's parent and
verify that this parent is an "io-unit" node.
3) Skip IRQ initialization when device lacks "reg" property.
Signed-off-by: David S. Miller <davem@davemloft.net>
On sparc32 the prom_{first,next}prop() interfaces work
a little differently. The buffer argument is ignored on
sparc32 and the firmware just returns a raw pointer to
the property name.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabre and Psycho PCI controllers can have partial interrupt-map
properties, meaning that on-board devices don't match up to any
entries. Instead, they are fully specified from the beginning and
we should pass them directly to the IRQ translator as-is.
Also, fill in the necessary translator slots for the "graphics"
and "expansion UPA" interrupts on Sabre, Psycho, and SYSIO SBUS.
Increase PROMREG_MAX to 24, as seen on SUNW,ffb devices.
Finally, prevent accidentally writing past the end of the of_device
struct resource[] and irqs[] arrays. Spit out a log message when
we ignore some entries because there are too many of them.
Signed-off-by: David S. Miller <davem@davemloft.net>
Take return values of sysfs_create_group & friends into account.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SLES9 binutils don't like .align 4096 statements in head.S. Work around this
by using .org statements.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Some -mm-only material leaked into a patch destined for mainline, and I didn't
notice.
This was the replacement of system_utsname with utsname() that's required by
the uts namespace patch. This patch reverts those changes (which are correct
in -mm) so that mainline UML builds again.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up whitespace and return syntax in os.h.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On top of the previous biarch changes for UML, this makes the preprocessor
changes a bit cleaner. Specify the 64-bit build in CPPFLAGS on the x86_64
SUBARCH, rather than #undef'ing i386. Compile-tested with i386 and x86_64
SUBARCHs.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The UML_SETJMP macro was requiring its users to pass in a argument which it
could supply itself, since it wasn't used outside that invocation of the
macro.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On i386, the user space accessor functions copy_from/to_user() both invoke
might_sleep(), do a quick sanity check, and then pass the work on to their
__copy_from/to_user() counterparts, which again invoke might_sleep().
Given that no actual work happens between these two calls, it is best to
eliminate one of the redundant might_sleep()s.
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the foolish assumption that SMP implied local apics.
That assumption is not-true on the Voyager subarch. This makes that
dependency explicit, and allows the code to build.
What gets disabled is just an optimization to get better crash dumps so the
support should work if there is a kernel that will initialization on the
voyager subarch under those harsh conditions.
Hopefully we can figure out how to initialize apics in init_IRQ and remove
the need to disable io_apics and this dependency.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
handle_BUG() tries to print file and line number even when they're not
available (CONFIG_DEBUG_BUGVERBOSE is not set.) Change this to print a
message stating info is unavailable instead of printing a misleading
message.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pbm->name should be initialized before calling
pbm_register_toplevel_resources. Move the call a few lines down to
avoid a nice Oops.
Signed-off-by: Marc Zyngier <maz@misterjones.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't overwrite it, it's the device node full name
already and that's what we want.
Based upon a report from Marc Zyngier.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (53 commits)
[MIPS] sparsemem: fix crash in show_mem
[MIPS] vr41xx: Update workpad setup function
[MIPS] vr41xx: Update e55 setup function
[MIPS] vr41xx: Removed old v2.4 VRC4173 driver
[MIPS] vr41xx: Move IRQ numbers to asm-mips/vr41xx/irq.h
[MIPS] MIPSsim: Build fix, rename sim_timer_setup -> plat_timer_setup.
[MIPS] Remove unused code.
[MIPS] IP22 Fix brown paper bag in RTC code.
[MIPS] Atlas, Malta, SEAD: Don't disable interrupts in mips_time_init().
[MIPS] Replace board_timer_setup function pointer by plat_timer_setup.
[MIPS] Nuke redeclarations of board_time_init.
[MIPS] Remove redeclarations of setup_irq().
[MIPS] Nuke redeclarations of board_timer_setup.
[MIPS] Print out TLB handler assembly for debugging.
[MIPS] SMTC: Reformat to Linux style.
[MIPS] MIPSsim: Delete redeclaration of ll_local_timer_interrupt.
[MIPS] IP27: Reformatting.
[MIPS] IP27: Invoke setup_irq for timer interrupt so proc stats will be shown.
[MIPS] IP27: irq_chip startup method returns unsigned int.
[MIPS] IP27: struct irq_desc member handler was renamed to chip.
...
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] arch/arm/kernel/bios32.c: no need to set isa_bridge
[ARM] 3729/3: EABI padding rules necessitate the packed attribute of floatx80
[ARM] 3725/1: sharpsl_pm: warn about wrong temperature
[ARM] 3723/1: collie charging
[ARM] 3728/1: Restore missing CPU Hotplug irq helper
[ARM] 3727/1: fix ucb initialization on collie
[ARM] Allow Versatile to be built for AB and PB
[ARM] 3726/1: update {ep93xx,ixp2000,ixp23xx,lpd270,onearm} defconfigs to 2.6.18-rc1
[ARM] 3721/1: Small cleanup for locomo.c
With sparsemem, pfn should be checked by pfn_valid() before pfn_to_page().
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch fixes a typo in arch/mips/sgi-ip22/ip22-time.c, leading to the
incorrect year being set into the RTC chip.
Signed-off-by: Julien BLACHE <jb@jblache.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>