linux/arch/x86/kernel
Ingo Molnar 4aae070252 x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!"
this is the tale of a full day spent debugging an ancient but elusive bug.

after booting up thousands of random .config kernels, i finally happened
to generate a .config that produced the following rare bootup failure
on 32-bit x86:

| ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
| ..MP-BIOS bug: 8254 timer not connected to IO-APIC
| ...trying to set up timer (IRQ0) through the 8259A ...  failed.
| ...trying to set up timer as Virtual Wire IRQ... failed.
| ...trying to set up timer as ExtINT IRQ... failed :(.
| Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug
| and send a report.  Then try booting with the 'noapic' option

this bug has been reported many times over the years, but it was never
reproduced or fixed.

the bug that i hit was extremely sensitive to .config details.

First i did a .config-bisection - suspecting some .config detail.
That led to CONFIG_X86_MCE: enabling X86_MCE magically made the bug disappear
and the system would boot up just fine.

Debugging my way through the MCE code ended up identifying two unlikely
candidates: the thing that made a real difference to the hang was that
X86_MCE did two printks:

 Intel machine check architecture supported.
 Intel machine check reporting enabled on CPU#1.

Adding the same printks to a !CONFIG_X86_MCE kernel made the bug go away!

this left timing as the main suspect: i experimented with adding various
udelay()s to the arch/x86/kernel/io_apic_32.c:check_timer() function, and
the race window turned out to be narrower than 30 microseconds (!).

That made debugging especially fun: debugging without having printk
ability before the bug hits is ... interesting ;-)

eventually i started suspecting IRQ activities - those are pretty much the
only things that happen this early during bootup and have a timescale of
a few dozen microseconds. Also, check_timer() changes the IRQ hardware
in various creative ways, so the main candidate became IRQ0 interaction.

i've added a counter to track timer irqs (on which core they arrived, at
what exact time, etc.) and found that no timer IRQ would arrive after the
bug condition hits - even if we re-enable IRQ0 and re-initialize the i8259A,
but that we'd get a small number of timer irqs right around the time when we
call the check_timer() function.

Eventually i got the following backtrace triggered from debug code in the
timer interrupt:

...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ...
Pid: 1, comm: swapper Not tainted (2.6.24-rc5 #57)
EIP: 0060:[<c044d57e>] EFLAGS: 00000246 CPU: 0
EIP is at _spin_unlock_irqrestore+0x5/0x1c
EAX: c0634178 EBX: 00000000 ECX: c4947d63 EDX: 00000246
ESI: 00000002 EDI: 00010031 EBP: c04e0f2e ESP: f7c41df4
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 CR0: 8005003b CR2: ffe04000 CR3: 00630000 CR4: 000006d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
  [<c05f5784>] setup_IO_APIC+0x9c3/0xc5c

the spin_unlock() was called from init_8259A(). Wait ... we have an IRQ0
entry while we are in the middle of setting up the local APIC, the i8259A
and the PIT??

That is certainly not how it's supposed to work! check_timer() was supposed
to be called with irqs turned off - but this eroded away sometime in the
past. This code would still work most of the time because it runs
very quickly, but if just the right timing conditions are present and IRQ0
hits in this small, ~30 usecs window, timer irqs stop and the system does
not boot up. Also, given how early this is during bootup, the hang is
very deterministic - but it would only occur on certain machines (and
certain configs).

The fix was quite simple: disable/restore interrupts properly in this
function. With that in place the test-system now boots up just fine.
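
As a rough illustration of the disable/restore pattern described above, here is
a small userspace model in C. It is only a sketch, not the actual patch: every
name in it (model_irq_save(), model_irq_restore(), model_timer_tick(),
model_reprogram_timer_hw() and the two check_timer_*() variants) is a
hypothetical stand-in, and the "hardware" is a pair of plain variables. In the
kernel the equivalent role is played by local_irq_save()/local_irq_restore()
around the body of check_timer().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: irqs_enabled stands in for the CPU's
 * interrupt flag, irq0_in_window counts timer IRQs delivered mid-setup. */
static bool irqs_enabled = true;
static int irq0_in_window;

/* Stand-in for local_irq_save(): remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* Stand-in for local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/* A timer tick is only delivered while interrupts are enabled. */
static void model_timer_tick(void)
{
    if (irqs_enabled)
        irq0_in_window++;
}

/* Stand-in for the APIC/i8259A/PIT reprogramming; a tick lands mid-way. */
static void model_reprogram_timer_hw(void)
{
    model_timer_tick();
}

/* Before: the setup runs with whatever IRQ state the caller happened to have. */
static int check_timer_unprotected(void)
{
    irq0_in_window = 0;
    model_reprogram_timer_hw();
    return irq0_in_window;
}

/* After: irqs are saved and disabled on entry, restored on exit. */
static int check_timer_protected(void)
{
    unsigned long flags;

    irq0_in_window = 0;
    flags = model_irq_save();
    model_reprogram_timer_hw();
    model_irq_restore(flags);
    return irq0_in_window;
}
```

In this model check_timer_unprotected() lets the tick through in the middle of
the setup, while check_timer_protected() keeps it out and leaves the caller's
interrupt state as it found it. Saving and restoring (rather than blindly
enabling on exit) is what makes the function safe no matter what state it is
called in.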

(64-bit x86 io_apic_64.c had the same bug.)

Phew! One down, only 1500 other kernel bugs are left ;-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-18 18:05:58 +01:00
..
acpi ACPI: suspend: old debugging hacks sneaked back 2007-12-06 16:03:06 -05:00
cpu x86: free_cache_attributes() section fix 2007-12-04 17:19:07 +01:00
.gitignore .gitignore update for x86 arch 2007-10-17 21:19:04 +02:00
alternative.c x86: convert cpuinfo_x86 array to a per_cpu array 2007-10-19 20:35:04 +02:00
aperture_64.c x86 gart: rename symbols only used for the GART implementation 2007-10-30 00:22:22 +01:00
apic_32.c x86: fix APIC related bootup crash on Athlon XP CPUs 2007-11-26 20:42:20 +01:00
apic_64.c x86: add lapic_shutdown for x86_64 2007-10-23 22:37:22 +02:00
apm_32.c spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
asm-offsets_32.c Boot with virtual == physical to get closer to native Linux. 2007-10-23 15:49:54 +10:00
asm-offsets_64.c x86: Fix boot protocol KEEP_SEGMENTS check. 2007-10-27 20:57:43 +02:00
asm-offsets.c
audit_64.c
bootflag.c
bugs_64.c
cpuid.c x86: convert cpuinfo_x86 array to a per_cpu array 2007-10-19 20:35:04 +02:00
crash_dump_32.c kmap leak fix for x86_32 kdump 2007-10-19 11:53:33 -07:00
crash_dump_64.c
crash.c x86: disable hpet legacy replacement for kdump 2007-12-03 17:17:10 +01:00
doublefault_32.c
e820_32.c kexec: add BSS to resource tree 2007-10-22 08:13:19 -07:00
e820_64.c kexec: add BSS to resource tree 2007-10-22 08:13:19 -07:00
early_printk.c [x86] remove uses of magic macros for boot_params access 2007-10-16 17:38:31 -07:00
early-quirks.c x86 gart: rename symbols only used for the GART implementation 2007-10-30 00:22:22 +01:00
efi_32.c kexec: add BSS to resource tree 2007-10-22 08:13:19 -07:00
efi_stub_32.S
entry_32.S Merge branch 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen 2007-10-17 11:10:11 -07:00
entry_64.S x86: return correct error code from child_rip in x86_64 entry.S 2007-10-17 20:15:29 +02:00
genapic_64.c x86: convert cpu_to_apicid to be a per cpu variable 2007-10-19 20:35:03 +02:00
genapic_flat_64.c x86: convert cpu_to_apicid to be a per cpu variable 2007-10-19 20:35:03 +02:00
geode_32.c x86: Geode Multi-Function General Purpose Timers support 2007-10-12 23:04:06 +02:00
head64.c x86: use descriptor's functions instead of inline assembly 2007-10-19 20:35:03 +02:00
head_32.S x86: fix x86-32 early fixmap initialization. 2007-12-03 17:17:10 +01:00
head_64.S
hpet.c x86: disable hpet on shutdown 2007-12-03 17:17:10 +01:00
i386_ksyms_32.c x86: export the symbol empty_zero_page on the 32-bit x86 architecture 2007-11-26 20:42:19 +01:00
i387_32.c
i387_64.c x86: fix taking DNA during 64bit sigreturn 2007-11-12 11:09:33 -08:00
i8237.c
i8253.c spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
i8259_32.c i386: introduce "used_vectors" bitmap which can be used to reserve vectors. 2007-10-19 20:35:03 +02:00
i8259_64.c x86: more struct irqaction initializer cleanups 2007-10-17 20:16:07 +02:00
init_task.c x86: merge init_task_32/64.c 2007-10-19 20:35:02 +02:00
io_apic_32.c x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!" 2007-12-18 18:05:58 +01:00
io_apic_64.c x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!" 2007-12-18 18:05:58 +01:00
ioport_32.c
ioport_64.c
irq_32.c x86: also show non-zero IRQ counts for vectors that currently don't have a handler 2007-10-17 20:16:54 +02:00
irq_64.c x86: also show non-zero IRQ counts for vectors that currently don't have a handler 2007-10-17 20:16:54 +02:00
k8.c
kprobes_32.c x86: jprobe bugfix 2007-12-18 18:05:58 +01:00
kprobes_64.c x86: kprobes bugfix 2007-12-18 18:05:58 +01:00
ldt_32.c x86: convert mm_context_t semaphore to a mutex 2007-10-17 20:17:05 +02:00
ldt_64.c x86: convert mm_context_t semaphore to a mutex 2007-10-17 20:17:00 +02:00
machine_kexec_32.c Use extended crashkernel command line on i386 2007-10-19 11:53:49 -07:00
machine_kexec_64.c x86: Dump filtering supports x86_64 sparsemem 2007-10-27 20:57:43 +02:00
Makefile x86: delete vsyscall files during make clean 2007-10-17 21:56:01 +02:00
Makefile_32 x86: do not use $(ARCH) when not needed 2007-11-12 21:02:20 +01:00
Makefile_64 x86: do not use $(ARCH) when not needed 2007-11-12 21:02:20 +01:00
mca_32.c
mfgpt_32.c x86: Geode MFGPT clock event device support 2007-10-12 23:04:06 +02:00
microcode.c x86: convert cpuinfo_x86 array to a per_cpu array 2007-10-19 20:35:04 +02:00
module_32.c
module_64.c
mpparse_32.c spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
mpparse_64.c x86: acpi use cpu_physical_id 2007-10-19 20:35:03 +02:00
msr.c x86: convert cpuinfo_x86 array to a per_cpu array 2007-10-19 20:35:04 +02:00
nmi_32.c x86: add the word 'WARNING' in check_nmi_watchdog() output 2007-12-04 17:19:07 +01:00
nmi_64.c x86: add the word 'WARNING' in check_nmi_watchdog() output 2007-12-04 17:19:07 +01:00
numaq_32.c
paravirt_32.c x86/paravirt: revert exports to restore old behaviour 2007-11-29 09:24:55 -08:00
pci-calgary_64.c x86 gart: rename iommu.h to gart.h 2007-10-30 00:22:22 +01:00
pci-dma_32.c i386: Clean up duplicate includes in arch/i386/kernel/ 2007-10-17 20:15:51 +02:00
pci-dma_64.c x86: turn off iommu merge by default 2007-11-26 20:42:19 +01:00
pci-gart_64.c x86 gart: rename symbols only used for the GART implementation 2007-10-30 00:22:22 +01:00
pci-nommu_64.c x86 gart: rename iommu.h to gart.h 2007-10-30 00:22:22 +01:00
pci-swiotlb_64.c x86 gart: rename iommu.h to gart.h 2007-10-30 00:22:22 +01:00
pcspeaker.c
pmtimer_64.c
process_32.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86 2007-10-19 15:06:00 -07:00
process_64.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ptrace_32.c spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
ptrace_64.c x86: convert mm_context_t semaphore to a mutex 2007-10-17 20:17:00 +02:00
quirks.c x86: Add HPET force support for MCP55 (nForce 5) chipsets 2007-10-23 22:37:25 +02:00
reboot_32.c x86: disable hpet on shutdown 2007-12-03 17:17:10 +01:00
reboot_64.c x86: disable hpet on shutdown 2007-12-03 17:17:10 +01:00
reboot_fixups_32.c x86: reboot fixup for wrap2c board 2007-11-17 16:27:02 +01:00
relocate_kernel_32.S
relocate_kernel_64.S
scx200_32.c
setup64.c x86: use descriptor's functions instead of inline assembly 2007-10-19 20:35:03 +02:00
setup_32.c x86: kernel/setup_32.c: unexport machine_id 2007-10-30 00:22:22 +01:00
setup_64.c x86: fixup cpu_info array conversion 2007-11-17 16:27:01 +01:00
sigframe_32.h
signal_32.c spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
signal_64.c spelling fixes: arch/x86_64/ 2007-10-20 01:25:36 +02:00
smp_32.c x86: export smp_ops to allow modular build of KVM 2007-10-27 20:57:43 +02:00
smp_64.c x86: implement missing x86_64 function smp_call_function_mask() 2007-10-19 20:35:03 +02:00
smpboot_32.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2007-10-19 20:36:17 -07:00
smpboot_64.c x86: ARRAY_SIZE cleanup 2007-10-23 22:37:22 +02:00
smpcommon_32.c
srat_32.c
stacktrace.c x86: constify stacktrace_ops 2007-10-17 20:16:11 +02:00
summit_32.c spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
suspend_64.c revert "Hibernation: Use temporary page tables for kernel text mapping on x86_64" 2007-12-17 19:28:15 -08:00
suspend_asm_64.S x86: Save registers in saved_context during suspend and hibernation 2007-10-23 22:37:24 +02:00
sys_i386_32.c remove include/asm-*/ipc.h 2007-10-17 08:42:55 -07:00
sys_x86_64.c
syscall_64.c
syscall_table_32.S
sysenter_32.c
tce_64.c x86: Create clflush() inline, remove hardcoded wbinvd 2007-10-17 20:16:12 +02:00
time_32.c x86: Fix irq0 / local apic timer accounting 2007-10-12 23:04:06 +02:00
time_64.c x86: on x86_64, correct reading of PC RTC when update in progress in time_64.c 2007-11-17 16:27:01 +01:00
topology.c x86: arch_register_cpu() section fix 2007-12-04 17:19:07 +01:00
trampoline_32.S x86: misc. constifications 2007-10-17 20:16:08 +02:00
trampoline_64.S x86: misc. constifications 2007-10-17 20:16:08 +02:00
traps_32.c lockdep: annotate do_debug() trap handler 2007-11-26 20:42:19 +01:00
traps_64.c lockdep: annotate do_debug() trap handler 2007-11-26 20:42:19 +01:00
tsc_32.c x86: fix more TSC clock source calibration errors 2007-10-23 22:37:22 +02:00
tsc_64.c x86: convert cpuinfo_x86 array to a per_cpu array 2007-10-19 20:35:04 +02:00
tsc_sync.c
verify_cpu_64.S
vm86_32.c
vmi_32.c paravirt: clean up lazy mode handling 2007-10-16 11:51:29 -07:00
vmiclock_32.c
vmlinux_32.lds.S
vmlinux_64.lds.S
vmlinux.lds.S
vsmp_64.c
vsyscall_32.lds.S
vsyscall_32.S
vsyscall_64.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2007-10-19 20:36:17 -07:00
vsyscall-int80_32.S
vsyscall-note_32.S
vsyscall-sigreturn_32.S
vsyscall-sysenter_32.S
x8664_ksyms_64.c