linux

Author	SHA1	Message	Date
Ian Campbell	9a626612c2	xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq apic_register_gsi_xen_hvm is a tiny wrapper around xen_hvm_register_pirq. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:38 -05:00
Ian Campbell	4b41df7f6e	xen: events: return irq from xen_allocate_pirq_msi consistent with other similar functions. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:37 -05:00
Ian Campbell	bb5d079aef	xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi All callers pass this flag so it is pointless. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:35 -05:00
Ian Campbell	260a7d4cfd	xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0 Fixes: CC arch/x86/pci/xen.o arch/x86/pci/xen.c:183: warning: 'xen_initdom_setup_msi_irqs' defined but not used Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-10 14:44:33 -05:00
Konrad Rzeszutek Wilk	8448f0119a	Merge branch 'stable/pcifront-fixes' into stable/irq.cleanup * stable/pcifront-fixes: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work()	2011-03-10 14:42:11 -05:00
Konrad Rzeszutek Wilk	8054c3634c	Merge branch 'stable/irq.rework' into stable/irq.cleanup * stable/irq.rework: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME	2011-03-10 14:41:43 -05:00
Ian Campbell	f611f2da99	xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-03 12:00:31 -05:00
Ian Campbell	3f2a230caf	xen: handled remapped IRQs when enabling a pcifront PCI device. This happens to not be an issue currently because we take pains to try to ensure that the GSI-IRQ mapping is 1-1 in a PV guest and that regular event channels do not clash. However a subsequent patch is going to break this 1-1 mapping. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2011-03-03 11:56:57 -05:00
Konrad Rzeszutek Wilk	3d74a539ae	pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. This code path is only run when an MSI/MSI-X PCI device is passed in to PV DomU. In 2.6.37 time-frame we over-wrote the default cleanup handler for MSI/MSI-X irq->desc to be "xen_teardown_msi_irqs". That function calls the the xen-pcifront driver which can tell the backend to cleanup/take back the MSI/MSI-X device. However, we forgot to continue the process of free-ing the MSI/MSI-X device resources (irq->desc) in the PV domU side. Which is what the default cleanup handler: default_teardown_msi_irqs did. Hence we would leak IRQ descriptors. Without this patch, doing "rmmod igbvf;modprobe igbvf" multiple times ends with abandoned IRQ descriptors: 28: 5 xen-pirq-pcifront-msi-x 29: 8 xen-pirq-pcifront-msi-x ... 130: 10 xen-pirq-pcifront-msi-x with the end result of running out of IRQ descriptors. Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-02-18 12:41:53 -05:00
Konrad Rzeszutek Wilk	cc0f89c4a4	pci/xen: Cleanup: convert int** to int[] Cleanup code. Cosmetic change to make the code look easier to read. Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-02-18 12:41:49 -05:00
Konrad Rzeszutek Wilk	55cb8cd45e	pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen_allocate_pirq -> xen_map_pirq_gsi -> PHYSDEVOP_alloc_irq_vector IFF xen_initial_domain() in addition to the kernel side book-keeping side of things (set chip and handler, update irq_info etc) whereas xen_allocate_pirq_msi just does the kernel book keeping. Also xen_allocate_pirq allocates an IRQ in the 1-1 GSI space whereas xen_allocate_pirq_msi allocates a dynamic one in the >GSI IRQ space. All of this is uneccessary as this code path is only executed when we run as a domU PV guest with an MSI/MSI-X PCI card passed in. Hence we can jump straight to allocating an dynamic IRQ (and binding it to the proper PIRQ) and skip the rest. In short: this change is a cosmetic one. Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-02-18 09:26:37 -05:00
Linus Torvalds	1cecd791f2	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix text_poke_smp_batch() deadlock perf tools: Fix thread_map event synthesizing in top and record watchdog, nmi: Lower the severity of error messages ARM: oprofile: Fix backtraces in timer mode oprofile: Fix usage of CONFIG_HW_PERF_EVENTS for oprofile_perf_init and friends	2011-02-15 10:18:48 -08:00
Linus Torvalds	fef86db8fe	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, dmi, debug: Log board name (when present) in dmesg/oops output x86, ioapic: Don't warn about non-existing IOAPICs if we have none x86: Fix mwait_usable section mismatch x86: Readd missing irq_to_desc() in fixup_irq() x86: Fix section mismatch in LAPIC initialization	2011-02-15 10:18:29 -08:00
Naga Chumbalkar	84e383b322	x86, dmi, debug: Log board name (when present) in dmesg/oops output The "Type 2" SMBIOS record that contains Board Name is not strictly required and may be absent in the SMBIOS on some platforms. ( Please note that Type 2 is not listed in Table 3 in Sec 6.2 ("Required Structures and Data") of the SMBIOS v2.7 Specification. ) Use the Manufacturer Name (aka System Vendor) name. Print Board Name only when it is present. Before the fix: (i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011 (ii) oops output: Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6 After the fix: (i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011 (ii) oops output: Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6 Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-15 04:20:57 +01:00
Paul Bolle	678301ecad	x86, ioapic: Don't warn about non-existing IOAPICs if we have none mp_find_ioapic() prints errors like: ERROR: Unable to locate IOAPIC for GSI 13 if it can't find the IOAPIC that manages that specific GSI. I see errors like that at every boot of a laptop that apparently doesn't have any IOAPICs. But if there are no IOAPICs it doesn't seem to be an error that none can be found. A solution that gets rid of this message is to directly return if nr_ioapics (still) is zero. (But keep returning -1 in that case, so nothing breaks from this change.) The call chain that generates this error is: pnpacpi_allocated_resource() case ACPI_RESOURCE_TYPE_IRQ: pnpacpi_parse_allocated_irqresource() acpi_get_override_irq() mp_find_ioapic() Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-15 04:15:04 +01:00
Borislav Petkov	1c9d16e359	x86: Fix mwait_usable section mismatch We use it in non __cpuinit code now too so drop marker. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20110211171754.GA21047@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-14 12:08:28 +01:00
Thomas Gleixner	5117348dea	x86: Readd missing irq_to_desc() in fixup_irq() commit a3c08e5d(x86: Convert irq_chip access to new functions) accidentally zapped desc = irq_to_desc(irq); in the vector loop. So we lock some random irq descriptor. Add it back. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> # .37	2011-02-12 11:56:22 +01:00
Peter Zijlstra	d91309f69b	x86: Fix text_poke_smp_batch() deadlock Fix this deadlock - we are already holding the mutex: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.38-rc4-test+ #1 ------------------------------------------------------- bash/1850 is trying to acquire lock: (text_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f but task is already holding lock: (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (smp_alt){+.+...}: [<ffffffff81082d02>] lock_acquire+0xcd/0xf8 [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339 [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43 [<ffffffff8101050f>] alternatives_smp_switch+0x77/0x1d8 [<ffffffff81926a6f>] do_boot_cpu+0xd7/0x762 [<ffffffff819277dd>] native_cpu_up+0xe6/0x16a [<ffffffff81928e28>] _cpu_up+0x9d/0xee [<ffffffff81928f4c>] cpu_up+0xd3/0xe7 [<ffffffff82268d4b>] kernel_init+0xe8/0x20a [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10 -> #1 (cpu_hotplug.lock){+.+.+.}: [<ffffffff81082d02>] lock_acquire+0xcd/0xf8 [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339 [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43 [<ffffffff810568cc>] get_online_cpus+0x41/0x55 [<ffffffff810a1348>] stop_machine+0x1e/0x3e [<ffffffff819314c1>] text_poke_smp_batch+0x3a/0x3c [<ffffffff81932b6c>] arch_optimize_kprobes+0x10d/0x11c [<ffffffff81933a51>] kprobe_optimizer+0x152/0x222 [<ffffffff8106bb71>] process_one_work+0x1d3/0x335 [<ffffffff8106cfae>] worker_thread+0x104/0x1a4 [<ffffffff810707c4>] kthread+0x9d/0xa5 [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10 -> #0 (text_mutex){+.+.+.}: other info that might help us debug this: 6 locks held by bash/1850: #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f #1: (s_active#75){.+.+.+}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f #2: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f #4: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f #5: (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f stack backtrace: Pid: 1850, comm: bash Not tainted 2.6.38-rc4-test+ #1 Call Trace: [<ffffffff81080eb2>] print_circular_bug+0xa8/0xb7 [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43 [<ffffffff81010302>] alternatives_smp_unlock+0x3d/0x93 [<ffffffff81010630>] alternatives_smp_switch+0x198/0x1d8 [<ffffffff8102568a>] native_cpu_die+0x65/0x95 [<ffffffff818cc4ec>] _cpu_down+0x13e/0x202 [<ffffffff8117a619>] sysfs_write_file+0x108/0x144 [<ffffffff8111f5a2>] vfs_write+0xac/0xff [<ffffffff8111f7a9>] sys_write+0x4a/0x6e Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: mathieu.desnoyers@efficios.com Cc: rusty@rustcorp.com.au Cc: ananth@in.ibm.com Cc: masami.hiramatsu.pt@hitachi.com Cc: fweisbec@gmail.com Cc: jbeulich@novell.com Cc: jbaron@redhat.com Cc: mhiramat@redhat.com LKML-Reference: <1297458466.5226.93.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-12 02:34:34 +01:00
Linus Torvalds	3c6c0d6ca3	Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index	2011-02-11 16:30:09 -08:00
Jan Beulich	2fb270f321	x86: Fix section mismatch in LAPIC initialization Additionally doing things conditionally upon smp_processor_id() being zero is generally a bad idea, as this means CPU 0 cannot be offlined and brought back online later again. While there may be other places where this is done, I think adding more of those should be avoided so that some day SMP can really become "symmetrical". Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <4D525C7E0200007800030EE1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-10 13:26:53 +01:00
Joerg Roedel	893a5ab6ee	KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index The gs_index loading code uses the swapgs instruction to switch to the user gs_base temporarily. This is unsave in an lightweight exit-path in KVM on AMD because the KERNEL_GS_BASE MSR is switches lazily. An NMI happening in the critical path of load_gs_index may use the wrong GS_BASE value then leading to unpredictable behavior, e.g. a triple-fault. This patch fixes the issue by making sure that load_gs_index is called only with a valid KERNEL_GS_BASE value loaded in KVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-02-09 18:31:36 +02:00
H. Peter Anvin	d344e38b2c	x86, nx: Mark the ACPI resume trampoline code as +x We reserve lowmem for the things that need it, like the ACPI wakeup code, way early to guarantee availability. This happens before we set up the proper pagetables, so set_memory_x() has no effect. Until we have a better solution, use an initcall to mark the wakeup code executable. Originally-by: Matthieu Castet <castet.matthieu@free.fr> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthias Hopf <mhopf@suse.de> Cc: rjw@sisk.pl Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4D4F8019.2090104@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-07 09:07:13 +01:00
Linus Torvalds	07675f484b	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32: Make sure the stack is set up before we use it x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms x86, nx: Don't force pages RW when setting NX bits	2011-02-06 12:03:10 -08:00
H. Peter Anvin	11d4c3f9b6	x86-32: Make sure the stack is set up before we use it Since checkin `ebba638ae7` we call verify_cpu even in 32-bit mode. Unfortunately, calling a function means using the stack, and the stack pointer was not initialized in the 32-bit setup code! This code initializes the stack pointer, and simplifies the interface slightly since it is easier to rely on just a pointer value rather than a descriptor; we need to have different values for the segment register anyway. This retains start_stack as a virtual address, even though a physical address would be more convenient for 32 bits; the 64-bit code wants the other way around... Reported-by: Matthieu Castet <castet.matthieu@free.fr> LKML-Reference: <4D41E86D.8060205@free.fr> Tested-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-02-04 22:27:28 -08:00
Suresh Siddha	831d52bc15	x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb IPI's while the cr3 is still pointing to the prev mm. And this window can lead to the possibility of bogus TLB fills resulting in strange failures. One such problematic scenario is mentioned below. T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI etc between the point of clearing the cpu from the mm_cpumask(mm1) and before reloading the cr3 with the new mm2. T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1 as it doesn't see that cpu listed in the mm_cpumask(mm1). T3. After the TLB flush is complete, CPU-2 goes ahead and frees the page-table pages associated with the removed vma mapping. T4. CPU-2 now allocates those freed page-table pages for something else. T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1 can potentially speculate and walk through the page-table caches and can insert new TLB entries. As the page-table pages are already freed and being used on CPU-2, this page walk can potentially insert a bogus global TLB entry depending on the (random) contents of the page that is being used on CPU-2. T6. This bogus TLB entry being global will be active across future CR3 changes and can result in weird memory corruption etc. To avoid this issue, for the prev mm that is handing over the cpu to another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is changed. Marking it for -stable, though we haven't seen any reported failure that can be attributed to this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org [v2.6.32+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-03 13:32:39 -08:00
Linus Torvalds	eb487ab4d5	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix reading in perf_event_read() watchdog: Don't change watchdog state on read of sysctl watchdog: Fix sysctl consistency watchdog: Fix broken nowatchdog logic perf: Fix Pentium4 raw event validation perf: Fix alloc_callchain_buffers()	2011-02-03 08:52:05 -08:00
Suresh Siddha	f7448548a9	x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms Markus Kohn ran into a hard hang regression on an acer aspire 1310, when acpi is enabled. git bisect showed the following commit as the bad one that introduced the boot regression. commit `d0af9eed5a` Author: Suresh Siddha <suresh.b.siddha@intel.com> Date: Wed Aug 19 18:05:36 2009 -0700 x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init Because of the UP configuration of that platform, native_smp_prepare_cpus() bailed out (in smp_sanity_check()) before doing the set_mtrr_aps_delayed_init() Further down the boot path, native_smp_cpus_done() will call the delayed MTRR initialization for the AP's (mtrr_aps_init()) with mtrr_aps_delayed_init not set. This resulted in the boot processor reprogramming its MTRR's to the values seen during the start of the OS boot. While this is not needed ideally, this shouldn't have caused any side-effects. This is because the reprogramming of MTRR's (set_mtrr_state() that gets called via set_mtrr()) will check if the live register contents are different from what is being asked to write and will do the actual write only if they are different. BP's mtrr state is read during the start of the OS boot and typically nothing would have changed when we ask to reprogram it on BP again because of the above scenario on an UP platform. So on a normal UP platform no reprogramming of BP MTRR MSR's happens and all is well. However, on this platform, bios seems to be modifying the fixed mtrr range registers between the start of OS boot and when we double check the live registers for reprogramming BP MTRR registers. And as the live registers are modified, we end up reprogramming the MTRR's to the state seen during the start of the OS boot. During ACPI initialization, something in the bios (probably smi handler?) don't like this fact and results in a hard lockup. We didn't see this boot hang issue on this platform before the commit `d0af9eed5a`, because only the AP's (if any) will program its MTRR's to the value that BP had at the start of the OS boot. Fix this issue by checking mtrr_aps_delayed_init before continuing further in the mtrr_aps_init(). Now, only AP's (if any) will program its MTRR's to the BP values during boot. Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393 [ By the way, this behavior of the bios modifying MTRR's after the start of the OS boot is not common and the kernel is not prepared to handle this situation well. Irrespective of this issue, during suspend/resume, linux kernel will try to reprogram the BP's MTRR values to the values seen during the start of the OS boot. So suspend/resume might be already broken on this platform for all linux kernel versions. ] Reported-and-bisected-by: Markus Kohn <jabber@gmx.org> Tested-by: Markus Kohn <jabber@gmx.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@novell.com> Cc: Rafael Wysocki <rjw@novell.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: stable@kernel.org # [v2.6.32+] LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-03 12:10:38 +01:00
Matthieu CASTET	f12d3d04e8	x86, nx: Don't force pages RW when setting NX bits Xen want page table pages read only. But the initial page table (from head_*.S) live in .data or .bss. That was broken by `64edc8ed5f`. There is absolutely no reason to force these pages RW after they have already been marked RO. Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-02-02 16:02:36 -08:00
Linus Torvalds	1f0324caef	Merge branch 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/setup: Route halt operations to safe_halt pvop. xen/e820: Guard against E820_RAM not having page-aligned size or start. xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn.	2011-01-28 12:24:34 +10:00
Linus Torvalds	f7b548fa3d	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: percpu, x86: Fix percpu_xchg_op() x86: Remove left over system_64.h x86-64: Don't use pointer to out-of-scope variable in dump_trace()	2011-01-28 06:43:41 +10:00
Stephane Eranian	d038b12c6d	perf: Fix Pentium4 raw event validation This patch fixes some issues with raw event validation on Pentium 4 (Netburst) based processors. As I was testing libpfm4 Netburst support, I ran into two problems in the p4_validate_raw_event() function: - the shared field must be checked ONLY when HT is on - the binding to ESCR register was missing The second item was causing raw events to not be encoded correctly compared to generic PMU events. With this patch, I can now pass Netburst events to libpfm4 examples and get meaningful results: $ task -e global_power_events🏃u noploop 1 noploop for 1 seconds 3,206,304,898 global_power_events:running Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: peterz@infradead.org Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@gmail.com Cc: robert.richter@amd.com Cc: acme@redhat.com Cc: gorcunov@gmail.com Cc: ming.m.lin@intel.com LKML-Reference: <4d3efb2f.1252d80a.1a80.ffffc83f@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-27 19:21:53 +01:00
Stefano Stabellini	23febeddbe	xen/setup: Route halt operations to safe_halt pvop. With this patch, the cpuidle driver does not load and does not issue the mwait operations. Instead the hypervisor is doing them (b/c we call the safe_halt pvops call). This fixes quite a lot of bootup issues wherein the user had to force interrupts for the continuation of the bootup. Details are discussed in: http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00535.html [v2: Wrote the commit description] Reported-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Tested-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-01-27 12:00:24 -05:00
Stefano Stabellini	7cb31b752c	xen/e820: Guard against E820_RAM not having page-aligned size or start. Under Dell Inspiron 1525, and Intel SandyBridge SDP's the BIOS e820 RAM is not page-aligned: [ 0.000000] Xen: 0000000000100000 - 00000000df66d800 (usable) We were not handling that and ended up setting up a pagetable that included up to df66e000 with the disastrous effect that when memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t)); tried to clear the page it would crash at the 2K mark. Initially reported by Michael Young @ http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00108.html The fix is to page-align the size and also take into consideration the start of the E820 (in case that is not page-aligned either). This fixes the bootup failure on those affected machines. This patch is a rework of the Micheal A Young initial patch and considers the case if the start is not page-aligned. Reported-by: Michael A Young <m.a.young@durham.ac.uk> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Michael A Young <m.a.young@durham.ac.uk>	2011-01-27 10:49:35 -05:00
Stefan Bader	cf04d120d9	xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn. In case the mfn_list does not have enough entries to fill a p2m page we do not want the entries from max_pfn up to the boundary to be filled with unknown values. Hence set them to INVALID_P2M_ENTRY. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-01-27 10:49:34 -05:00
Eric Dumazet	889a7a6a5d	percpu, x86: Fix percpu_xchg_op() These recent percpu commits: `2485b6464c`: x86,percpu: Move out of place 64 bit ops into X86_64 section `8270137a0d`: cpuops: Use cmpxchg for xchg to avoid lock semantics Caused this 'perf top' crash: Kernel panic - not syncing: Fatal exception in interrupt Pid: 0, comm: swapper Tainted: G D 2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>] ? panic ? kmsg_dump ? kmsg_dump ? oops_end ? no_context ? __bad_area_nosemaphore ? perf_output_begin ? bad_area_nosemaphore ? do_page_fault ? __task_pid_nr_ns ? perf_event_tid ? __perf_event_header__init_id ? validate_chain ? perf_output_sample ? trace_hardirqs_off ? page_fault ? irq_work_run ? update_process_times ? tick_sched_timer ? tick_sched_timer ? __run_hrtimer ? hrtimer_interrupt ? account_system_vtime ? smp_apic_timer_interrupt ? apic_timer_interrupt ... Looking at assembly code, I found: list = this_cpu_xchg(irq_work_list, NULL); gives this wrong code : (gcc-4.1.2 cross compiler) ffffffff810bc45e: mov %gs:0xead0,%rax cmpxchg %rax,%gs:0xead0 jne ffffffff810bc45e <irq_work_run+0x3e> test %rax,%rax je ffffffff810bc4aa <irq_work_run+0x8a> Tell gcc we dirty eax/rax register in percpu_xchg_op() Compiler must use another register to store pxo_new__ We also dont need to reload percpu value after a jump, since a 'failed' cmpxchg already updated eax/rax Wrong generated code was : xor %rax,%rax /* load 0 into %rax / 1: mov %gs:0xead0,%rax cmpxchg %rax,%gs:0xead0 jne 1b test %rax,%rax After patch : xor %rdx,%rdx / load 0 into %rdx */ mov %gs:0xead0,%rax 1: cmpxchg %rdx,%gs:0xead0 jne 1b: test %rax,%rax Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-26 08:10:49 +01:00
Yinghai Lu	9a57c3e487	x86: Remove left over system_64.h Left-over from the x86 merge ... Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D3E23D1.7010405@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-26 08:05:58 +01:00
Andrea Arcangeli	cacf061c5e	thp: fix PARAVIRT x86 32bit noPAE This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n. The #ifdef that this patch removes was erratically introduced to fix a build error for noPAE (where pmd.pmd doesn't exist). So then the kernel built but it failed at runtime because set_pmd_at was a noop. This will correct it by enabling set_pmd_at for noPAE mode too. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: werner <w.landgraf@ru.ru> Reported-by: Minchan Kim <minchan.kim@gmail.com> Tested-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-26 10:49:57 +10:00
Jesper Juhl	2e5aa6824d	x86-64: Don't use pointer to out-of-scope variable in dump_trace() In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code: ... if (!stack) { unsigned long dummy; stack = &dummy; if (task && task != current) stack = (unsigned long )task->thread.sp; } bp = stack_frame(task, regs); / * Print function call entries in all stacks, starting at the * current stack address. If the stacks consist of nested * exceptions / tinfo = task_thread_info(task); for (;;) { char id; unsigned long *estack_end; estack_end = in_exception_stack(cpu, (unsigned long)stack, &used, &id); ... You'll notice that we assign to 'stack' the address of the variable 'dummy' which is only in-scope inside the 'if (!stack)'. So when we later access stack (at the end of the above, and assuming we did not take the 'if (task && task != current)' branch) we'll be using the address of a variable that is no longer in scope. I believe this patch is the proper fix, but I freely admit that I'm not 100% certain. Signed-off-by: Jesper Juhl <jj@chaosbits.net> LKML-Reference: <alpine.LNX.2.00.1101242232590.10252@swampdragon.chaosbits.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-01-24 13:46:15 -08:00
Linus Torvalds	4398f31ca7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix jump label with RO/NX module protection crash x86, hotplug: Fix powersavings with offlined cores on AMD x86, mcheck, therm_throt.c: Export symbol platform_thermal_notify to allow coretemp to handler intr x86: Use asm-generic/cacheflush.h x86: Update CPU cache attributes table descriptors	2011-01-25 05:24:12 +10:00
matthieu castet	8969691343	x86: Fix jump label with RO/NX module protection crash If we use jump table in module init, there are marked as removed in __jump_table section after init is done. But we already applied ro permissions on the module, so we can't modify a read only section (crash in remove_jump_label_module_init). Make the __jump_table section rw. Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Cc: Xiaotian Feng <xtfeng@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Siarhei Liakh <sliakh.lkml@gmail.com> Cc: Xuxian Jiang <jiang@cs.ncsu.edu> Cc: James Morris <jmorris@namei.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4D3C3F20.7030203@free.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-23 16:12:45 +01:00
Borislav Petkov	93789b32db	x86, hotplug: Fix powersavings with offlined cores on AMD `ea53069231` made a CPU use monitor/mwait when offline. This is not the optimal choice for AMD wrt to powersavings and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the same selection whether to use monitor/mwait has to be used as when we select the idle routine for the machine. With this patch, offlining cores 1-5 on a X6 machine allows core0 to boost again. [ hpa: putting this in urgent since it is a (power) regression fix ] Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: stable@kernel.org # 37.x Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.hl> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-01-21 18:14:54 -08:00
Linus Torvalds	ebe0d80507	Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: x86,percpu: Move out of place 64 bit ops into X86_64 section	2011-01-21 13:43:21 -08:00
Linus Torvalds	cfd74486ea	Merge branch 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: p2m: correctly initialize partial p2m leaf xen: fix non-ANSI function warning in irq.c	2011-01-21 13:35:10 -08:00
Stefan Bader	8e1b4cf210	xen: p2m: correctly initialize partial p2m leaf After changing the p2m mapping to a tree by commit `58e05027b5` xen: convert p2m to a 3 level tree and trying to boot a DomU with 615MB of memory, the following crash was observed in the dump: kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c0107397>] xen_set_pte+0x27/0x60 pdpt = 0000000000000000 pde = 0000000000000000 Adding further debug statements showed that when trying to set up pfn=0x26700 the returned mapping was invalid. pfn=0x266ff calling set_pte(0xc1fe77f8, 0x6b3003) pfn=0x26700 calling set_pte(0xc1fe7800, 0x3) Although the last_pfn obtained from the startup info is 0x26700, which should in turn not be hit, the additional 8MB which are added as extra memory normally seem to be ok. This lead to looking into the initial p2m tree construction, which uses the smaller value and assuming that there is other code handling the extra memory. When the p2m tree is set up, the leaves are directly pointed to the array which the domain builder set up. But if the mapping is not on a boundary that fits into one p2m page, this will result in the last leaf being only partially valid. And as the invalid entries are not initialized in that case, things go badly wrong. I am trying to fix that by checking whether the current leaf is a complete map and if not, allocate a completely new page and copy only the valid pointers there. This may not be the most efficient or elegant solution, but at least it seems to allow me booting DomUs with memory assignments all over the range. BugLink: http://bugs.launchpad.net/bugs/686692 [v2: Redid a bit of commit wording and fixed a compile warning] Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-01-21 11:24:14 -05:00
Fenghua Yu	f21bbec9ff	x86, mcheck, therm_throt.c: Export symbol platform_thermal_notify to allow coretemp to handler intr In therm_throt.c, commit `9e76a97efd` patch doesn't export the symbol platform_thermal_notify. Other drivers (e.g. drivers/hwmon/coretemp.c) can not find the symbol platform_thermal_notify when defining threshould interrupt handler. Please apply this patch to allow threshold interrupt handler in coretemp. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: R Durgadoss <durgadoss.r@intel.com> Cc: khali@linux-fr.org <khali@linux-fr.org> Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org> Cc: Guenter Roeck <guenter.roeck@ericsson.com> LKML-Reference: <20110121041239.GB26954@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-21 14:11:12 +01:00
Akinobu Mita	cc67ba6352	x86: Use asm-generic/cacheflush.h The implementation of the cache flushing interfaces on the x86 is identical with the default implementation in asm-generic. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: arnd@arndb.de LKML-Reference: <1295523136-4277-2-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-21 14:11:12 +01:00
Linus Torvalds	2b1caf6ed7	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c lockdep: Move early boot local IRQ enable/disable status to init/main.c	2011-01-20 18:30:37 -08:00
Linus Torvalds	8d99641f6c	Merge branch 'akpm' * akpm: kernel/smp.c: consolidate writes in smp_call_function_interrupt() kernel/smp.c: fix smp_call_function_many() SMP race memcg: correctly order reading PCG_USED and pc->mem_cgroup backlight: fix 88pm860x_bl macro collision drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking MAINTAINERS: update Atmel AT91 entry mm: fix truncate_setsize() comment memcg: fix rmdir, force_empty with THP memcg: fix LRU accounting with THP memcg: fix USED bit handling at uncharge in THP memcg: modify accounting function for supporting THP better fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio mm: compaction: prevent division-by-zero during user-requested compaction mm/vmscan.c: remove duplicate include of compaction.h memblock: fix memblock_is_region_memory() thp: keep highpte mapped until it is no longer needed kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT	2011-01-20 17:02:14 -08:00
David Rientjes	6a108a14fa	kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option is used to configure any non-standard kernel with a much larger scope than only small devices. This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes references to the option throughout the kernel. A new CONFIG_EMBEDDED option is added that automatically selects CONFIG_EXPERT when enabled and can be used in the future to isolate options that should only be considered for embedded systems (RISC architectures, SLOB, etc). Calling the option "EXPERT" more accurately represents its intention: only expert users who understand the impact of the configuration changes they are making should enable it. Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <david.woodhouse@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg KH <gregkh@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 17:02:05 -08:00
Linus Torvalds	e55fdbd741	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: remove virtio-pci root device LGUEST_GUEST: fix unmet direct dependencies (VIRTUALIZATION && VIRTIO) lguest: compile fixes lguest: Use this_cpu_ops lguest: document --rng in example Launcher lguest: example launcher to use guard pages, drop PROT_EXEC, fix limit logic lguest: --username and --chroot options	2011-01-20 16:31:20 -08:00

1 2 3 4 5 ...

12431 Commits