32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.
setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.
While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.
v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.
> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>
Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] define "_sdata" symbol
pstore: Fix Kconfig dependencies for apei->pstore
pstore: fix potential logic issue in pstore read interface
pstore: fix pstore filesystem mount/remount issue
pstore: fix one type of return value in pstore
[IA64] fix build warning in arch/ia64/oprofile/backtrace.c
core_kernel_data() wants to know if an address looks like kernel
data. IA64 has had _edata forever, but never needed _sdata until
now.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: convert mips to generic i8253 clocksource
clocksource: convert x86 to generic i8253 clocksource
clocksource: convert footbridge to generic i8253 clocksource
clocksource: add common i8253 PIT clocksource
blackfin: convert to clocksource_register_hz
mips: convert to clocksource_register_hz/khz
sparc: convert to clocksource_register_hz/khz
alpha: convert to clocksource_register_hz
microblaze: convert to clocksource_register_hz/khz
ia64: convert to clocksource_register_hz/khz
x86: Convert remaining x86 clocksources to clocksource_register_hz/khz
Make clocksource name const
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
sched: Fix and optimise calculation of the weight-inverse
sched: Avoid going ahead if ->cpus_allowed is not changed
sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
sched: Remove unused parameters from sched_fork() and wake_up_new_task()
sched: Shorten the construction of the span cpu mask of sched domain
sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
sched: Remove unused 'this_best_prio arg' from balance_tasks()
sched: Remove noop in alloc_rt_sched_group()
sched: Get rid of lock_depth
sched: Remove obsolete comment from scheduler_tick()
sched: Fix sched_domain iterations vs. RCU
sched: Next buddy hint on sleep and preempt path
sched: Make set_*_buddy() work on non-task entities
sched: Remove need_migrate_task()
sched: Move the second half of ttwu() to the remote cpu
sched: Restructure ttwu() some more
sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
sched: Remove rq argument from ttwu_stat()
sched: Remove rq->lock from the first half of ttwu()
sched: Drop rq->lock from sched_exec()
...
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix rt_rq runtime leakage bug
Conflicts:
arch/ia64/kernel/cyclone.c
arch/mips/kernel/i8253.c
arch/x86/kernel/i8253.c
Reason: Resolve conflicts so further cleanups do not conflict further
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.
How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.
To enabled debugging during runtime, mount debugfs and
$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control
for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append
ddebug_query="module cpufreq +p"
as a boot parameter to the kernel of your choice.
For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
For future rework of try_to_wake_up() we'd like to push part of that
function onto the CPU the task is actually going to run on.
In order to do so we need a generic callback from the existing scheduler IPI.
This patch introduces such a generic callback: scheduler_ipi() and
implements it as a NOP.
BenH notes: PowerPC might use this IPI on offline CPUs under rare conditions!
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.744338123@chello.nl
The current functions are going away.
Also use the accessor for pending setaffinity in irq_data instead of
the open coded irq_desc access.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The core code calls mask_ack() which calls irq_ack() and irq_mask()
for the case where an interrupt is disabled and marked pending. That
seems to be a leftover from the old __do_IRQ() mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
irq_chip.end got obsolete with the removal of __do_IRQ().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <20110203004210.143127544@linutronix.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (42 commits)
ACPI: minor printk format change in acpi_pad
ACPI: make acpi_pad /sys output more readable
ACPICA: Update version to 20110316
ACPICA: Header support for SLIC table
ACPI: Make sure the FADT is at least rev 2 before using the reset register
ACPI: Bug compatibility for Windows on the ACPI reboot vector
ACPICA: Fix access width for reset vector
ACPI battery: fribble sysfs files from a resume notifier
ACPI button: remove unused procfs I/F
ACPI, APEI, Add PCIe AER error information printing support
PCIe, AER, use pre-generated prefix in error information printing
ACPI, APEI, Add ERST record ID cache
ACPI: Use syscore_ops instead of sysdev class and sysdev
ACPI: Remove the unused EC sysdev class
ACPI: use __cpuinit for the acpi_processor_set_pdc() call tree
ACPI: use __init where possible in processor driver
Thermal_Framework-Fix_crash_during_hwmon_unregister
ACPICA: Update version to 20110211.
ACPICA: Add mechanism to defer _REG methods for some installed handlers
ACPICA: Add support for FunctionalFixedHW in acpi_ut_get_region_name
...
The Xen PV drivers in a crashed HVM guest can not connect to the dom0
backend drivers because both frontend and backend drivers are still in
connected state. To run the connection reset function only in case of a
crashdump, the is_kdump_kernel() function needs to be available for the PV
driver modules.
Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel()
usable for modules.
Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup()
to early_param(). It adds an address range check from x86 also on ia64
and powerpc.
[akpm@linux-foundation.org: additional #includes]
[akpm@linux-foundation.org: remove elfcorehdr_addr export]
[akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] tioca: Fix assignment from incompatible pointer warnings
[IA64] mca.c: Fix cast from integer to pointer warning
[IA64] setup.c Typo fix "Architechtuallly"
[IA64] Add CONFIG_MISC_DEVICES=y to configs that need it.
[IA64] disable interrupts at end of ia64_mca_cpe_int_handler()
[IA64] Add DMA_ERROR_CODE define.
pstore: fix build warning for unused return value from sysfs_create_file
pstore: X86 platform interface using ACPI/APEI/ERST
pstore: new filesystem interface to platform persistent storage
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support
percpu: Generic support for this_cpu_cmpxchg_double()
alpha: use L1_CACHE_BYTES for cacheline size in the linker script
percpu: align percpu readmostly subsection to cacheline
Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the
percpu alignment having changed ("x86: Reduce back the alignment of the
per-CPU data section")
Once acpi_map_lsapic() in ia64 follows how x86 treats it wrt section
placement, the whole tree from acpi_processor_set_pdc() can become
__cpuinit.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ia64_mca_cpu_init has a void *data local variable that is assigned
the value from either __get_free_pages() or mca_bootmem(). The problem
is that __get_free_pages returns an unsigned long and mca_bootmem, via
alloc_bootmem(), returns a void *. format_mca_init_stack takes the void *,
and it's also used with __pa(), but that casts it to long anyway.
This results in the following build warning:
arch/ia64/kernel/mca.c:1898: warning: assignment makes pointer from
integer without a cast
Cast the return of __get_free_pages to a void * to avoid
the warning.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
SAL requires that interrupts be enabled when making some calls
to it to pick up error records, so we enable interrupts inside
this handler. We should disable them again at the end.
Found by a new WARN_ONCE that tglx added to handle_irq_event_percpu()
Signed-off-by: Tony Luck <tony.luck@intel.com>
The function do_suspend_lowlevel() is specific to x86 and defined in
assembly code, so it should be called from the x86 low-level suspend
code rather than from acpi_suspend_enter().
Merge do_suspend_lowlevel() into the x86's acpi_save_state_mem() and
change the name of the latter to acpi_suspend_lowlevel(), so that the
function's purpose is better reflected by its name.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This converts the ia64 clocksources to use clocksource_register_hz/khz
CC: Tony Luck <tony.luck@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com> [clocksource_itc path]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
local_cpu_data->itm_next = new_itm; does not need to be protected by
xtime_lock. xtime_update() takes the lock itself.
Signed-off-by: Torben Hohn <torbenh@gmx.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: johnstul@us.ibm.com
Cc: hch@infradead.org
Cc: yong.zhang0@gmail.com
LKML-Reference: <20110127145956.23248.49107.stgit@localhost>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently percpu readmostly subsection may share cachelines with other
percpu subsections which may result in unnecessary cacheline bounce
and performance degradation.
This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR()
linker macros, makes each arch linker scripts specify its cacheline
size and use it to align percpu subsections.
This is based on Shaohua's x86 only patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shaohua.li@intel.com>
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
intel_idle: open broadcast clock event
cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific
cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle
cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions
SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW
cpuidle: delete NOP CPUIDLE_FLAG_POLL
ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs
cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL
ACPI, intel_idle: Cleanup idle= internal variables
cpuidle: Make cpuidle_enable_device() call poll_idle_init()
intel_idle: update Sandy Bridge core C-state residency targets
arch/ia64/kernel/acpi.c:481: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’
Introduced by commit 05f2f274c8
[IA64] Avoid array overflow if there are too many cpus in SRAT table
Signed-off-by: Tony Luck <tony.luck@intel.com>
Having four variables for the same thing:
idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
is rather confusing and unnecessary complex.
if idle= boot param is passed, only set up one variable:
boot_option_idle_overrides
Introduces following functional changes/fixes:
- intel_idle driver does not register if any idle=xy
boot param is passed.
- processor_idle.c will also not register a cpuidle driver
and get active if idle=halt is passed.
Before a cpuidle driver with one (C1, halt) state got registered
Now the default_idle function will be used which finally uses
the same idle call to enter sleep state (safe_halt()), but
without registering a whole cpuidle driver.
That means idle= param will always avoid cpuidle drivers to register
with one exception (same behavior as before):
idle=nomwait
may still register acpi_idle cpuidle driver, but C1 will not use
mwait, but hlt. This can be a workaround for IO based deeper sleep
states where C1 mwait causes problems.
Signed-off-by: Thomas Renninger <trenn@suse.de>
cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Avoid array overflow if there are too many cpus in SRAT table
[IA64] Remove unlikely from cpu_is_offline
[IA64] irq_ia64, use set_irq_chip
[IA64] perfmon: Change vmalloc to vzalloc and drop memset.
[IA64] eliminate race condition in smp_flush_tlb_mm
acpi_numa_init() has to parse the whole SRAT table, even if the
kernel wants to limit the number of cpus it will use (because the
ones it is going to use may be described by entries at the end of
the SRAT table). Avoid overflowing the node_cpuid array.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.
- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only taking the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.
This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>