Based upon a report and initial patch by Friedrich Oslage.
The intention is to provide this facility for
__trigger_all_cpu_backtrace even if MAGIC_SYSRQ is not set.
The only part that should have MAGIC_SYSRQ ifdef protection is the
sparc_globalreg_op sysrq regitration and immediate code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug reported by Alexander Beregalov.
Before we dereference the stack frame or try to peek at the
pt_regs magic value, make sure the entire object is within
the kernel stack bounds.
Signed-off-by: David S. Miller <davem@davemloft.net>
All of the xcall delivery implementation is cpumask agnostic, so
we can pass around pointers to const cpumask_t objects everywhere.
The sad remaining case is the argument to arch_send_call_function_ipi().
Signed-off-by: David S. Miller <davem@davemloft.net>
It can eat up a lot of stack space when NR_CPUS is large.
We retain some of it's functionality by reporting at least one
of the cpu's which are seen in error state.
Signed-off-by: David S. Miller <davem@davemloft.net>
Then modify all of the xcall dispatch implementations get passed and
use this information.
Now all of the xcall dispatch implementations do not need to be mindful
of details such as "is current cpu in the list?" and "is cpu online?"
Signed-off-by: David S. Miller <davem@davemloft.net>
This just facilitates the next changeset where we'll be building
the cpu list and mondo block in this helper function.
Signed-off-by: David S. Miller <davem@davemloft.net>
Ideally this could be simplified further such that we could pass
the pointer down directly into the xcall_deliver() implementation.
But if we do that we need to do the "cpu_online(cpu)" and
"cpu != self" checks down in those functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
For these cases the callers make sure:
1) The cpus indicated are online.
2) The current cpu is not in the list of indicated cpus.
Therefore we can pass a pointer to the mask directly.
One of the motivations in this transformation is to make use of
"&cpumask_of_cpu(cpu)" which evaluates to a pointer to constant
data in the kernel and thus takes up no stack space.
Hopefully someone in the future will change the interface of
arch_send_call_function_ipi() such that it passes a const cpumask_t
pointer so that this will optimize ever further.
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed duplicated #include <linux/tracehook.h> in
arch/sparc64/kernel/signal.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
That's the userland thread register, so we should never try to change
it like this.
Based upon glibc bug nptl/6577 and suggestions by Jakub Jelinek.
Signed-off-by: David S. Miller <davem@davemloft.net>
The story is that what we used to do when we actually used
smp_report_regs() is that if you specifically only wanted to have the
current cpu's registers dumped you would call "__show_regs()"
otherwise you would call show_regs() which also invoked
smp_report_regs().
Now that we killed off smp_report_regs() there is no longer any
reason to have these two routines, just show_regs() is sufficient.
Also kill off a stray declaration of show_regs() in sparc64_ksym.c
Signed-off-by: David S. Miller <davem@davemloft.net>
All the call sites are #if 0'd out and we have a much more
useful global cpu dumping facility these days. smp_report_regs()
is way too verbose to be usable.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just clutters everything up and even though I wrote that hack I
can't remember having used it in the last 5 years or so.
Signed-off-by: David S. Miller <davem@davemloft.net>
Record one more level of stack frame program counter.
Particularly when lockdep and all sorts of spinlock debugging is
enabled, figuring out the caller of spin_lock() is difficult when the
cpu is stuck on the lock.
Signed-off-by: David S. Miller <davem@davemloft.net>
Call the standard hook after setting up signal handlers.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds TIF_NOTIFY_RESUME support for sparc64.
When set, we call tracehook_notify_resume() on the way to user mode.
Signed-off-by: Roland McGrath <roland@redhat.com>
This changes sparc64 syscall tracing to use the new tracehook.h entry
points.
[ Add assembly changes to force an immediate -ENOSYS return from
the system call when syscall_trace() returns non-zero at syscall
entry. -DaveM ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The majority of this patch was created by the following script:
***
ASM=arch/sparc/include/asm
mkdir -p $ASM
git mv include/asm-sparc64/ftrace.h $ASM
git rm include/asm-sparc64/*
git mv include/asm-sparc/* $ASM
sed -ie 's/asm-sparc64/asm/g' $ASM/*
sed -ie 's/asm-sparc/asm/g' $ASM/*
***
The rest was an update of the top-level Makefile to use sparc
for header files when sparc64 is being build.
And a small fixlet to pick up the correct unistd.h from
sparc64 code.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This wires up the recently added Wire up signalfd4, eventfd2,
epoll_create1, dup3, pipe2, and inotify_init1 system calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value. This patch implements
the handling of O_CLOEXEC for the flag. I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation. I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.
The implementation introduces do_pipe_flags. I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code. To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags. Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fd[2];
if (syscall (__NR_pipe2, fd, 0) != 0)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
{
puts ("pipe2(O_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based upon a report by Daniel Smolik.
We do it too early, which triggers a BUG in
cpufreq_register_notifier().
Signed-off-by: David S. Miller <davem@davemloft.net>
We're calling request_irq() with a IRQs disabled.
No straightforward fix exists because we want to
enable these IRQs and setup state atomically before
getting into the IRQ handler the first time.
What happens now is that we mark the VIRQ to not be
automatically enabled by request_irq(). Then we
make explicit enable_irq() calls when we grab the
LDC channel.
This way we don't need to call request_irq() illegally
under the LDC channel lock any more.
Bump LDC version and release date.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
remove CONFIG_KMOD from core kernel code
remove CONFIG_KMOD from lib
remove CONFIG_KMOD from sparc64
rework try_then_request_module to do less in non-modular kernels
remove mention of CONFIG_KMOD from documentation
make CONFIG_KMOD invisible
modules: Take a shortcut for checking if an address is in a module
module: turn longs into ints for module sizes
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module: reorder struct module to save space on 64 bit builds
module: generic each_symbol iterator function
module: don't use stop_machine for waiting rmmod
One place is just a comment, the other a conditional, unused
inclusion of linux/kmod.h.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This converts all instances of bus_id in the sparc core kernel to use
either dev_set_name(), or dev_name() depending on the need.
This is done in anticipation of removing the bus_id field from struct
driver.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kobjects do not have a limit in name size since a while, so stop
pretending that they do.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:
scheduler switch to idle task
enable interrupts
Window starts here
----> interrupt happens (does not set NEED_RESCHED)
irq_exit() stops the tick
----> interrupt happens (does set NEED_RESCHED)
return from schedule()
cpu_idle(): preempt_disable();
Window ends here
The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.
The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.
Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.
cpu_idle()
{
preempt_disable();
while(1) {
tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
are in the idle loop
while (!need_resched())
halt();
tick_nohz_restart_sched_tick(); <- disables NOHZ mode
preempt_enable_no_resched();
schedule();
preempt_disable();
}
}
In hindsight we should have done this forever, but ...
/me grabs a large brown paperbag.
Debugged-by: Jack Ren <jack.ren@marvell.com>,
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
* 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits)
ftrace: build fix for ftraced_suspend
ftrace: separate out the function enabled variable
ftrace: add ftrace_kill_atomic
ftrace: use current CPU for function startup
ftrace: start wakeup tracing after setting function tracer
ftrace: check proper config for preempt type
ftrace: trace schedule
ftrace: define function trace nop
ftrace: move sched_switch enable after markers
ftrace: prevent ftrace modifications while being kprobe'd, v2
fix "ftrace: store mcount address in rec->ip"
mmiotrace broken in linux-next (8-bit writes only)
ftrace: avoid modifying kprobe'd records
ftrace: freeze kprobe'd records
kprobes: enable clean usage of get_kprobe
ftrace: store mcount address in rec->ip
ftrace: build fix with gcc 4.3
namespacecheck: fixes
ftrace: fix "notrace" filtering priority
ftrace: fix printout
...
Today's linux-next build (spac64 allmodconfig) failed like this:
arch/sparc64/kernel/stacktrace.c:50: warning: type defaults to `int' in declaration of `EXPORT_SYMBOL_GPL'
arch/sparc64/kernel/stacktrace.c:50: warning: parameter names (without types) in function declaration
arch/sparc64/kernel/stacktrace.c:50: warning: data definition has no type or storage class
Signed-off-by: Stephen Rothwell <sf@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Also fixes up the sparc code that was assuming this is not a constant.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexander Beregalov reported this build failure:
$ make CROSS_COMPILE=sparc64-unknown-linux-gnu- image modules && sudo
make modules_install
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALL scripts/checksyscalls.sh
CHK include/linux/compile.h
dnsdomainname: Unknown host
CC arch/sparc64/kernel/sparc64_ksyms.o
arch/sparc64/kernel/sparc64_ksyms.c:116: error: '_mcount' undeclared
here (not in a function)
cc1: warnings being treated as errors
arch/sparc64/kernel/sparc64_ksyms.c:116: error: type defaults to 'int'
in declaration of '_mcount'
And bisected it back to:
| commit 395a59d0f8
| Author: Abhishek Sagar <sagar.abhishek@gmail.com>
| Date: Sat Jun 21 23:47:27 2008 +0530
|
| ftrace: store mcount address in rec->ip
the mcount prototype is only available under CONFIG_FTRACE,
extend it to CONFIG_MCOUNT as well.
Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds kernel/smp.c which contains helpers for IPI function calls. In
addition to supporting the existing smp_call_function() in a more efficient
manner, it also adds a more scalable variant called smp_call_function_single()
for calling a given function on a single CPU only.
The core of this is based on the x86-64 patch from Nick Piggin, lots of
changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
contributed lots of fixes and suggestions as well. Also thanks to
Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
and getting rid of the data allocation fallback deadlock.
Acked-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we fully commit to returning back to kernel mode from
a trap, zero out the regs->magic value to prevent false
positives during stack backtraces.
Signed-off-by: David S. Miller <davem@davemloft.net>
The offset to the pt_regs area was wrong, so we weren't
looking at the right location for the magic cookie.
A trap frame is composed of a "struct sparc_stackf" then
a "struct pt_regs", the code was using "struct reg_window"
instead of "struct sparc_stackf".
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the silly way I set up the initial stack for
new kernel threads, there is a loop at the top of the
stack.
To fix this, properly add another stack frame that is copied
from the parent and terminate it in the child by setting
the frame pointer in that frame to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
When a cpu really is stuck in the kernel, it can be often
impossible to figure out which cpu is stuck where. The
worst case is when the stuck cpu has interrupts disabled.
Therefore, implement a global cpu state capture that uses
SMP message interrupts which are not disabled by the
normal IRQ enable/disable APIs of the kernel.
As long as we can get a sysrq 'y' to the kernel, we can
get a dump. Even if the console interrupt cpu is wedged,
we can trigger it from userspace using /proc/sysrq-trigger
The output is made compact so that this facility is more
useful on high cpu count systems, which is where this
facility will likely find itself the most useful :)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like mmap, we need to validate address ranges regardless
of MAP_FIXED.
sparc{,64}_mmap_check()'s flag argument is unused, remove.
Based upon a report and preliminary patch by
Jan Lieskovsky <jlieskov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So, forever, we've had this ptrace_signal_deliver implementation
which tries to handle all of the nasties that can occur when the
debugger looks at a process about to take a signal. It's meant
to address all of these issues inside of the kernel so that the
debugger need not be mindful of such things.
Problem is, this doesn't work.
The idea was that we should do the syscall restart business first, so
that the debugger captures that state. Otherwise, if the debugger for
example saves the child's state, makes the child execute something
else, then restores the saved state, we won't handle the syscall
restart properly because we lose the "we're in a syscall" state.
The code here worked for most cases, but if the debugger actually
passes the signal through to the child unaltered, it's possible that
we would do a syscall restart when we shouldn't have.
In particular this breaks the case of debugging a process under a gdb
which is being debugged by yet another gdb. gdb uses sigsuspend
to wait for SIGCHLD of the inferior, but if gdb itself is being
debugged by a top-level gdb we get a ptrace_stop(). The top-level gdb
does a PTRACE_CONT with SIGCHLD to let the inferior gdb see the
signal. But ptrace_signal_deliver() assumed the debugger would cancel
out the signal and therefore did a syscall restart, because the return
error was ERESTARTNOHAND.
Fix this by simply making ptrace_signal_deliver() a nop, and providing
a way for the debugger to control system call restarting properly:
1) Report a "in syscall" software bit in regs->{tstate,psr}.
It is set early on in trap entry to a system call and is fully
visible to the debugger via ptrace() and regsets.
2) Test this bit right before doing a syscall restart. We have
to do a final recheck right after get_signal_to_deliver() in
case the debugger cleared the bit during ptrace_stop().
3) Clear the bit in trap return so we don't accidently try to set
that bit in the real register.
As a result we also get a ptrace_{is,clear}_syscall() for sparc32 just
like sparc64 has.
M68K has this same exact bug, and is now the only other user of the
ptrace_signal_deliver hook. It needs to be fixed in the same exact
way as sparc.
Signed-off-by: David S. Miller <davem@davemloft.net>
Forever we had a PTRACE_SUNOS_DETACH which was unconditionally
recognized, regardless of the personality of the process.
Unfortunately, this value is what ended up in the GLIBC sys/ptrace.h
header file on sparc as PTRACE_DETACH and PT_DETACH.
So continue to recognize this old value. Luckily, it doesn't conflict
with anything we actually care about.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be more liberal about the alignment of the buffer given to
us by sigaltstack(). The user should not need to be mindful of all of
the alignment constraints we have for the stack frame.
This mirrors how we handle this situation in clone() as well.
Also, we align the stack even in non-SA_ONSTACK cases so that signals
due to bad stack alignment can be delivered properly. This makes such
errors easier to debug and recover from.
Finally, add the sanity check x86 has to make sure we won't overflow
the signal stack.
This fixes glibc testcases nptl/tst-cancel20.c and
nptl/tst-cancelx20.c
Signed-off-by: David S. Miller <davem@davemloft.net>
We clobber %i1 as well as %i0 for these system calls,
because they give two return values.
Therefore, on error, we have to restore %i1 properly
or else the restart explodes since it uses the wrong
arguments.
This fixes glibc's nptl/tst-eintr1.c testcase.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2664ef44cf.
Ingo moved around where the softlockup dependency sits
so this change is no longer necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
The change I put into copy_thread() just papered over the real
problem.
When we are looking to see if we should do a syscall restart, when
deliverying a signal, we should only interpret the syscall return
value as an error if the carry condition code(s) are set.
Otherwise it's a success return.
Also, sigreturn paths should do a pt_regs_clear_trap_type().
It turns out that doing a syscall restart when returning from a fork()
does and should happen, from time to time. Even if copy_thread()
returns success, copy_process() can still unwind and signal
-ERESTARTNOINTR in the parent.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just creates confusion, errors, and bugs.
For one thing, this can cause dup sysfs or procfs nodes to get
created:
[ 1.198015] proc_dir_entry '00.0' already registered
[ 1.198036] Call Trace:
[ 1.198052] [00000000004f2534] create_proc_entry+0x7c/0x98
[ 1.198092] [00000000005719e4] pci_proc_attach_device+0xa4/0xd4
[ 1.198126] [00000000007d991c] pci_proc_init+0x64/0x88
[ 1.198158] [00000000007c62a4] kernel_init+0x190/0x330
[ 1.198183] [0000000000426cf8] kernel_thread+0x38/0x48
[ 1.198210] [00000000006a0d90] rest_init+0x18/0x5c
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove dulicated include file <asm/timer.h> in arch/sparc64/kernel/smp.c.
Signed-off-by: Huang Weiyi <hwy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current limitations:
1) On SMP single stepping has some fundamental issues,
shared with other sw single-step architectures such
as mips and arm.
2) On 32-bit sparc we don't support SMP kgdb yet. That
requires some reworking of the IPI mechanisms and
infrastructure on that platform.
Signed-off-by: David S. Miller <davem@davemloft.net>
entry.S was a hodge-podge of several totally unrelated
sets of assembler routines, ranging from FPU trap handlers
to hypervisor call functions.
Split it up into topic-sized pieces.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a regression added by
238468b2ac ("[SPARC64]: Use trap type
stored in pt_regs to handle syscall restart.")
Because we now encode the "returning from syscall" status in the
pt_regs area, we have to be mindful to zap it out in the child
of a fork.
During a parallel kernel build I saw an accidental -EINTR return
from vfork() in 'make' because of this bug.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we use this from more than one place, it's better to
have helpers instead of twiddling magic constants all
over.
Add pt_regs_trap_type(), pt_regs_clear_trap_type(), and
pt_regs_is_syscall().
Use them in do_signal().
Signed-off-by: David S. Miller <davem@davemloft.net>
Back around the same time we were bootstrapping the first 32-bit sparc
Linux kernel with a SunOS userland, we made the signal frame match
that of SunOS.
By the time we even started putting together a native Linux userland
for 32-bit Sparc we realized this layout wasn't sufficient for Linux's
needs.
Therefore we changed the layout, yet kept support for the old style
signal frame layout in there. The detection mechanism is that we had
sys_sigaction() start passing in a negative signal number to indicate
"new style signal frames please".
Anyways, no binaries exist in the world that use the old stuff. In
fact, I bet Jakub Jelinek and myself are the only two people who ever
had such binaries to be honest.
So let's get rid of this stuff.
I added an assertion using WARN_ON_ONCE() that makes sure 32-bit
applications are passing in that negative signal number still.
Signed-off-by: David S. Miller <davem@davemloft.net>
I must have disabled this due to other bugs which were fixed over
time. And this is needed in order for child devices of "pmu"
to get proper resource values.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's completely superfluous, CONFIG_COMPAT is sufficient.
What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types. But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.
Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel bugzilla 10273
As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.
What this changset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:
if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
return;
which is correct, however it means that special care is needed
in the ->enable() method.
Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.
Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.
The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise all sorts of bad things can happen, including
spurious softlockup reports.
Other platforms have this same bug, in one form or
another, just don't see the issue because they
don't sleep as long as sparc64 can in NOHZ.
Thanks to some brilliant debugging by Peter Zijlstra.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a magic cookie in the pt_regs, we can
properly detect trap frames in stack bactraces.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we indicate the "restart system call" in the
trap type field of pt_regs->magic, we don't need to
set the %l6 boolean in all of the trap return paths.
And we therefore don't need to pass it to do_notify_resume().
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we can check the trap type directly, we don't need the
funny restart_syscall indication from the trap return paths.
Signed-off-by: David S. Miller <davem@davemloft.net>
This sets us up for several simplifications and facilities:
1) The magic cookie lets us identify trap frames more
accurately in stack backtraces.
2) The trap type lets us simplify all of the "are we in
a syscall" state management and checks.
3) We can now see if a task off the cpu is sleeping in
a system call or not. In fact, we can see what
trap it is sleeping in whatever the type. The utrace
guys will use this.
Based upon some discussions with Roland McGrath.
Signed-off-by: David S. Miller <davem@davemloft.net>
The following cleanups are now possible:
- arch/sparc64/kernel/entry.S:ret_sys_call no longer has to be global
- arch/sparc64/kernel/sparc64_ksyms.c:
remove no longer used prototypes
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is only code to parse NUMA attributes on
sun4v/niagara systems, but later on we will add such parsing
for older systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to do it like this before we can move the PROM and MDESC device
tree code over to using lmb_alloc().
Signed-off-by: David S. Miller <davem@davemloft.net>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
1) ptrace should pass 'current' to task_user_regset_view()
2) When fetching general registers using a 64-bit view, and
the target is 32-bit, we have to convert.
3) Skip the whole register window get/set code block if
the user isn't asking to access anything in there.
Otherwise we have problems if the user doesn't have
an address space setup. Fetching ptrace register is
still valid at such a time, and ptrace does not try
to access the register window area of the regset.
Signed-off-by: David S. Miller <davem@davemloft.net>
The calculation of the FPU reg save area pointer
was wrong.
Based upon an OOPS report from Tom Callaway.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some IOMMUs allocate memory areas spanning LLD's segment boundary limit. It
forces low level drivers to have a workaround to adjust scatter lists that the
IOMMU builds. We are in the process of making all the IOMMUs respect the
segment boundary limits to remove such work around in LLDs.
SPARC64 IOMMUs were rewritten to use the IOMMU helper functions and the commit
89c94f2f70 made the IOMMUs not allocate memory
areas spanning the segment boundary limit.
However, SPARC64 IOMMUs allocate memory areas first then try to merge them
(while some IOMMUs walk through all the sg entries to see how they can be
merged first and allocate memory areas). So SPARC64 IOMMUs also need the
boundary limit checking when they try to merge sg entries.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse still doesn't like the funny cast we make from a scalar to a
"union semun" (which is correct by the C language and in particular
works with the sparc64 calling conventions, but sparse doesn't grok
that yet).
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 'UL' markers to DCU_* macros.
Declare C functions called from assembler in entry.h
Declare C functions called from within the sparc64 arch
code in include/asm-sparc64/*.h headers as appropriate.
Remove unused routines in traps.c
Signed-off-by: David S. Miller <davem@davemloft.net>
We create a local header file entry.h, under arch/sparc64/kernel/,
that we can use to declare routines either defined in assembler
or only invoked from assembler. As well as other data objects
which are private to the inner sparc64 kernel arch code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing a 'flushw' every stack trace capture creates so much overhead
that it makes lockdep next to unusable.
We only care about the frame pointer chain and the function caller
program counters, so flush those by hand to the stack frame.
This is significantly more efficient than a 'flushw' because:
1) We only save 16 bytes per active register window to the stack.
2) This doesn't push the entire register window context of the current
call chain out of the cpu, forcing register window fill traps as we
return back down.
Note that we can't use 'restore' and 'save' instructions to move
around the register windows because that wouldn't work on Niagara
processors. They optimize 'save' into a new register window by
simply clearing out the registers instead of pulling them in from
the on-chip register window backing store.
Based upon a report by Tom Callaway.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: exec PT_DTRACE
[SPARC64]: Use shorter list_splice_init() for brevity.
[SPARC64]: Remove most limitations to kernel image size.
The PT_DTRACE flag is meaningless and obsolete.
Don't touch it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently kernel images are limited to 8MB in size, and this causes
problems especially when enabling features that take up a lot of
kernel image space such as lockdep.
The code now will align the kernel image size up to 4MB and map that
many locked TLB entries. So, the only practical limitation is the
number of available locked TLB entries which is 16 on Cheetah and 64
on pre-Cheetah sparc64 cpus. Niagara cpus don't actually have hw
locked TLB entry support. Rather, the hypervisor transparently
provides support for "locked" TLB entries since it runs with physical
addressing and does the initial TLB miss processing.
Fully utilizing this change requires some help from SILO, a patch for
which will be submitted to the maintainer. Essentially, SILO will
only currently map up to 8MB for the kernel image and that needs to be
increased.
Note that neither this patch nor the SILO bits will help with network
booting. The openfirmware code will only map up to a certain amount
of kernel image during a network boot and there isn't much we can to
about that other than to implemented a layered network booting
facility. Solaris has this, and calls it "wanboot" and we may
implement something similar at some point.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warnings:
WARNING: vmlinux.o(.text+0x4b258): Section mismatch in reference from the function dr_cpu_data() to the function .devinit.text:mdesc_fill_in_cpu_data()
WARNING: vmlinux.o(.text+0x4b290): Section mismatch in reference from the function dr_cpu_data() to the function .cpuinit.text:cpu_up()
mdesc_fill_in_cpu_data() is only used during early init and for
cpu hotplug so the __cpuinit annotation is the correct choice.
We have the call chain:
dr_cpu_data() => dr_cpu_configure() => mdesc_fill_in_cpu_data()
dr_cpu_data() is used only during early init and for cpu
hotplug. So annotating them all __cpuinit solves the
section mismatch and should be correct.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/sparc64/kernel/process.c:504:17: warning: symbol 'sparc_do_fork' was not declared. Should it be static?
arch/sparc64/kernel/process.c:655:5: warning: symbol 'dump_fpu' was not declared. Should it be static?
arch/sparc64/kernel/process.c:708:16: warning: symbol 'sparc_execve' was not declared. Should it be static?
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/sparc64/kernel/process.c:219:6: warning: symbol '__show_regs' was not declared. Should it be static?
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed via sparse:
arch/sparc64/kernel/process.c:215:6: warning: symbol 'show_stackframe' was not declared. Should it be static?
arch/sparc64/kernel/process.c:243:6: warning: symbol 'show_stackframe32' was not declared. Should it be static?
It is totally unused.
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/sparc64/kernel/process.c:123:6: warning: symbol 'machine_alt_power_off' was not declared. Should it be static?
Signed-off-by: David S. Miller <davem@davemloft.net>
And also it's helper function pci_is_controller(). Both
are unused.
I can't remove the equivalent from sparc32 yet as some
ancient bus probing code still uses that platform's version.
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea of this thing is we could save/restore the firmware's
palette when breaking in and out of the firmware prompt.
Only one driver implemented this (atyfb) and it's value is
questionable. If you're just debugging you don't really
care that the characters end up being purple or whatever.
And we can provide better debugging and firmware command
facilities with minimal in-kernel console I/O drivers.
Signed-off-by: David S. Miller <davem@davemloft.net>