linux

Author	SHA1	Message	Date
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Peter Zijlstra	a866374aec	[PATCH] mm: pagefault_{disable,enable}() Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Al Viro	d5c6393641	[NET]: SPARC64 checksum annotations and cleanups. * sanitize prototypes, annotate * kill useless shift Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:23 -08:00
Linus Torvalds	72a73a69f6	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (28 commits) PCI: make arch/i386/pci/common.c:pci_bf_sort static PCI: ibmphp_pci.c: fix NULL dereference pciehp: remove unnecessary pci_disable_msi pciehp: remove unnecessary free_irq PCI: rpaphp: change device tree examination PCI: Change memory allocation for acpiphp slots i2c-i801: SMBus patch for Intel ICH9 PCI: irq: irq and pci_ids patch for Intel ICH9 PCI: pci_{enable,disable}_device() nestable ports PCI: switch pci_{enable,disable}_device() to be nestable PCI: arch/i386/kernel/pci-dma.c: ioremap balanced with iounmap pci/i386: style cleanups PCI: Block on access to temporarily unavailable pci device pci: fix __pci_register_driver error handling pci: clear osc support flags if no _OSC method acpiphp: fix missing acpiphp_glue_exit() acpiphp: fix use of list_for_each macro Altix: Initial ACPI support - ROM shadowing. Altix: SN ACPI hotplug support. Altix: Add initial ACPI IO support ...	2006-12-01 16:41:27 -08:00
Benjamin Herrenschmidt	c6dbaef22a	Driver core: add dev_archdata to struct device Add arch specific dev_archdata to struct device Adds an arch specific struct dev_arch to struct device. This enables architecture to add specific fields to every device in the system, like DMA operation pointers, NUMA node ID, firmware specific data, etc... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andi Kleen <ak@suse.de> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-12-01 14:52:01 -08:00
Matthew Wilcox	ebf5a24829	PCI: Use pci_generic_prep_mwi on sparc64 The setting of the CACHE_LINE_SIZE register in sparc64's pci initialisation code isn't quite adequate as the device may have incompatible requirements. The generic code tests for this, so switch sparc64 over to using it. Since sparc64 has different L1 cache line size and PCI cache line size, it would need to override the generic code like i386 and ia64 do. We know what the cache line size is at compile time though, so introduce a new optional constant PCI_CACHE_LINE_BYTES. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: David Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-12-01 14:36:57 -08:00
David S. Miller	59359ff877	[SPARC]: Fix robust futex syscalls and wire up migrate_pages. When I added the entries for the robust futex syscall entries, I forgot to bump NR_SYSCALLS. The current situation is error-prone because NR_SYSCALLS lives in entry.S where the system call limit checks are enforced. Move the definition to asm/unistd.h in order to make this mistake much more difficult to make. And wire up sys_migrate_pages since the powerpc folks implemented the compat wrapper for us. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-05 16:51:03 -08:00
David S. Miller	c7fed9d750	[SPARC64]: Fix futex_atomic_cmpxchg_inatomic implementation. I copied the logic from ll/sc arch implementations, but that was wrong and makes no sense at all. Just do a straight compare-exchange instruction, just like x86. Based upon bug reports from Dennis Gilmore and Fabio Massimo. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-01 16:30:39 -08:00
David S. Miller	a94b1d1fd7	[SPARC64]: 8-byte align return value from compat_alloc_user_space() Otherwise we get a ton of unaligned exceptions, for cases such as compat_sys_msgrcv() which go: p = compat_alloc_user_space(second + sizeof(struct msgbuf)); and here 'second' can for example be an arbitrary odd value. Based upon a bug report from Jurij Smakov. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-22 21:53:30 -07:00
Matthew Wilcox	e50190a834	[PATCH] Consolidate check_signature There's nothing arch-specific about check_signature(), so move it to <linux/io.h>. Use a cross between the Alpha and i386 implementations as the generic one. Signed-off-by: Matthew Wilcox <willy@parisc-linux.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:23 -07:00
Al Viro	6d24c8dc2e	[PATCH] sparc64 pt_regs fixes Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-08 12:32:35 -07:00
Dave Jones	038b0a6d8d	Remove all inclusions of <linux/config.h> kbuild explicitly includes this at build time. Signed-off-by: Dave Jones <davej@redhat.com>	2006-10-04 03:38:54 -04:00
David S. Miller	36d046bbbd	[SPARC64]: Do not include compat.h from asm-sparc64/signal.h any more. It's not needed, now that all of that stuff is now in asm/compat_signal.h, and it breaks the build too :-) Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-02 14:30:45 -07:00
David S. Miller	14cc6abada	[SPARC64]: Move signal compat bits to new header file. Create asm-sparc64/compat_signal.h and stuff things there. This avoids the "linux/compat.h includes asm/signal.h but asm/signal.h needs compat_sigset_t which isn't defined yet" problems introduced recently. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-02 14:24:18 -07:00
Arnd Bergmann	135ab6ec8f	[PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references The last in-kernel user of errno is gone, so we should remove the definition and everything referring to it. This also removes the now-unused lib/execve.c file that was introduced earlier. Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the kernel. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:23 -07:00
Martin Schwidefsky	ef6edc9746	[PATCH] Directed yield: cpu_relax variants for spinlocks and rw-locks On systems running with virtual cpus there is optimization potential in regard to spinlocks and rw-locks. If the virtual cpu that has taken a lock is known to a cpu that wants to acquire the same lock it is beneficial to yield the timeslice of the virtual cpu in favour of the cpu that has the lock (directed yield). With CONFIG_PREEMPT="n" this can be implemented by the architecture without common code changes. Powerpc already does this. With CONFIG_PREEMPT="y" the lock loops are coded with _raw_spin_trylock, _raw_read_trylock and _raw_write_trylock in kernel/spinlock.c. If the lock could not be taken cpu_relax is called. A directed yield is not possible because cpu_relax doesn't know anything about the lock. To be able to yield the lock in favour of the current lock holder variants of cpu_relax for spinlocks and rw-locks are needed. The new _raw_spin_relax, _raw_read_relax and _raw_write_relax primitives differ from cpu_relax insofar that they have an argument: a pointer to the lock structure. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:21 -07:00
Dave McCracken	46a82b2d55	[PATCH] Standardize pxx_page macros One of the changes necessary for shared page tables is to standardize the pxx_page macros. pte_page and pmd_page have always returned the struct page associated with their entry, while pte_page_kernel and pmd_page_kernel have returned the kernel virtual address. pud_page and pgd_page, on the other hand, return the kernel virtual address. Shared page tables needs pud_page and pgd_page to return the actual page structures. There are very few actual users of these functions, so it is simple to standardize their usage. Since this is basic cleanup, I am submitting these changes as a standalone patch. Per Hugh Dickins' comments about it, I am also changing the pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning. Signed-off-by: Dave McCracken <dmccr@us.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:51 -07:00
Jeff Garzik	a6d967a485	[libata] No need for all those arch libata-portmap.h headers They all contain the same thing. Instead, have a single generic one in include/asm-generic, and permit an arch to override as needed. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-09-25 15:33:09 -04:00
David Woodhouse	02b25fcff6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2006-09-24 22:05:59 +01:00
Jeff Garzik	23930fa1ce	Merge branch 'master' into upstream	2006-09-24 01:52:47 -04:00
David Woodhouse	09087a1a87	Fix exported headers for SPARC, SPARC64 Mostly removing files which have no business being used in userspace. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-21 08:48:27 +01:00
David Woodhouse	fadcfa33b6	[HEADERS] One line per header in Kbuild files to reduce conflicts Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-09-19 12:43:58 +01:00
Jeff Garzik	97148ba223	Merge branch 'master' into upstream	2006-09-12 12:03:21 -04:00
Kirill Korotaev	3a45975681	[PATCH] IA64,sparc: local DoS with corrupted ELFs This prevents cross-region mappings on IA64 and SPARC which could lead to system crash. They were correctly trapped for normal mmap() calls, but not for the kernel internal calls generated by executable loading. This code just moves the architecture-specific cross-region checks into an arch-specific "arch_mmap_check()" macro, and defines that for the architectures that needed it (ia64, sparc and sparc64). Architectures that don't have any special requirements can just ignore the new cross-region check, since the mmap() code will just notice on its own when the macro isn't defined. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> [ Cleaned up to not affect architectures that don't need it ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-08 08:40:46 -07:00
Jeff Garzik	b01e86fee6	Merge /spare/repo/linux-2.6 into upstream	2006-08-29 17:55:59 -04:00
David S. Miller	c46f477422	[SPARC64]: Fix pfn_pte() build failure. The "%uhi" needs to be "%%uhi" because we want a real "%" character in the assembler here, instead of an assembler variable expansion. Aparently older GCCs were more liberal and interpreted this %-letter as a literal "%" for whatever reason. Based upon a build failure report from Meelis Roos. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-22 14:35:40 -07:00
Alan Cox	2ec7df0457	[PATCH] libata: rework legacy handling to remove much of the cruft Kill host_set->next Fix simplex support Allow per platform setting of IDE legacy bases Some of this can be tidied further later on, in particular all the legacy port gunge belongs as a PCI quirk/PCI header decode to understand the special legacy IDE rules in the PCI spec. Longer term Jeff also wants to move the request_irq/free_irq out of core which will make this even cleaner. tj: folded in three followup patches - ata_piix-fix, broken-arch-fix and fix-new-legacy-handling, and separated per-dev xfermask into separate patch preceding this one. Folded in fixes are... * ata_piix-fix: fix build failure due to host_set->next removal * broken-arch-fix: add missing include/asm-/libata-portmap.h fix-new-legacy-handling: * In ata_pci_init_legacy_port(), probe_num was incorrectly incremented during initialization of the secondary port and probe_ent->n_ports was incorrectly fixed to 1. * Both legacy ports ended up having the same hard_port_no. * When printing port information, both legacy ports printed the first irq. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Tejun Heo <htejun@gmail.com>	2006-08-10 16:59:10 +09:00
bibo, mao	a9ad965ea9	[PATCH] IA64: kprobe invalidate icache of jump buffer Kprobe inserts breakpoint instruction in probepoint and then jumps to instruction slot when breakpoint is hit, the instruction slot icache must be consistent with dcache. Here is the patch which invalidates instruction slot icache area. Without this patch, in some machines there will be fault when executing instruction slot where icache content is inconsistent with dcache. Signed-off-by: bibo,mao <bibo.mao@intel.com> Acked-by: "Luck, Tony" <tony.luck@intel.com> Acked-by: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-31 13:28:38 -07:00
David S. Miller	b8cfac4c2f	[SPARC64]: Fix typo in pgprot_noncached(). The sun4v code sequence was or'ing in the sun4u pte bits by mistake. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-27 17:57:32 -07:00
David S. Miller	92f282988b	[SPARC64]: Fix quad-float multiply emulation. Something is wrong with the 3-multiply (vs. 4-multiply) optimized version of _FP_MUL_MEAT_2_*(), so just use the slower version which actually computes correct values. Noticed by Rene Rebe Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-27 16:49:21 -07:00
David S. Miller	06ffd7956e	[SPARC]: Kill prom_getname, unused and not implemented properly. The m68k port's sun3 asm/oplib.h had a stray reference too, so I killed that off as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:17:55 -07:00
David S. Miller	46ba6d7d8b	[SPARC64]: Fix more of_device layer IRQ bugs, and correct PROMREG_MAX. Sabre and Psycho PCI controllers can have partial interrupt-map properties, meaning that on-board devices don't match up to any entries. Instead, they are fully specified from the beginning and we should pass them directly to the IRQ translator as-is. Also, fill in the necessary translator slots for the "graphics" and "expansion UPA" interrupts on Sabre, Psycho, and SYSIO SBUS. Increase PROMREG_MAX to 24, as seen on SUNW,ffb devices. Finally, prevent accidentally writing past the end of the of_device struct resource[] and irqs[] arrays. Spit out a log message when we ignore some entries because there are too many of them. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:17:52 -07:00
Steven Rostedt	52393ccc0a	[PATCH] remove set_wmb - arch removal set_wmb should not be used in the kernel because it just confuses the code more and has no benefit. Since it is not currently used in the kernel this patch removes it so that new code does not include it. All archs define set_wmb(var, value) to do { var = value; wmb(); } while(0) except ia64 and sparc which use a mb() instead. But this is still moot since it is not used anyway. Hasn't been tested on any archs but x86 and x86_64 (and only compiled tested) Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 21:56:14 -07:00
David Woodhouse	50f73fe026	[SPARC64]: Fix make headers_install A minor typo in the include/asm-sparc64/Kbuild file prevents the make headers_install from building a useful tree of kernel headers for sparc64. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Tom "spot" Callaway <tcallawa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-13 01:50:04 -07:00
Randy Dunlap	7233589d77	[SPARC64]: Fix sparc64 build errors when CONFIG_PCI=n. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-05 20:18:39 -07:00
Linus Torvalds	6fa0cb1141	Merge git://git.infradead.org/hdrinstall-2.6 * git://git.infradead.org/hdrinstall-2.6: Remove export of include/linux/isdn/tpam.h Remove <linux/i2c-id.h> and <linux/i2c-algo-ite.h> from userspace export Restrict headers exported to userspace for SPARC and SPARC64 Add empty Kbuild files for 'make headers_install' in remaining arches. Add Kbuild file for Alpha 'make headers_install' Add Kbuild file for SPARC 'make headers_install' Add Kbuild file for IA64 'make headers_install' Add Kbuild file for S390 'make headers_install' Add Kbuild file for i386 'make headers_install' Add Kbuild file for x86_64 'make headers_install' Add Kbuild file for PowerPC 'make headers_install' Add generic Kbuild files for 'make headers_install' Basic implementation of 'make headers_check' Basic implementation of 'make headers_install'	2006-07-04 12:55:45 -07:00
Ingo Molnar	a875a69f8b	[PATCH] lockdep: add per_cpu_offset() Add the per_cpu_offset() generic method. (used by the lock validator) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:27:00 -07:00
Thomas Gleixner	d356d7f4f2	[PATCH] irq-flags: SPARC64: Use the new IRQF_ constants Use the new IRQF_ constants and remove the SA_INTERRUPT define Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-02 13:58:48 -07:00
David S. Miller	6e990b50ed	[SPARC64]: Kill sun4v virtual device layer. Replace with a simple IRQ translater in the PROM device tree builder. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-30 14:13:41 -07:00
Linus Torvalds	74e651f0aa	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) [TIPC]: Initial activation message now includes TIPC version number [TIPC]: Improve response to requests for node/link information [TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf [IrDA]: Fix the AU1000 FIR dependencies [IrDA]: Fix RCU lock pairing on error path [XFRM]: unexport xfrm_state_mtu [NET]: make skb_release_data() static [NETFILTE] ipv4: Fix typo (Bugzilla #6753) [IrDA]: MCS7780 usb_driver struct should be static [BNX2]: Turn off link during shutdown [BNX2]: Use dev_kfree_skb() instead of the _irq version [ATM]: basic sysfs support for ATM devices [ATM]: [suni] change suni_init to __devinit [ATM]: [iphase] should be __devinit not __init [ATM]: [idt77105] should be __devinit not __init [BNX2]: Add NETIF_F_TSO_ECN [NET]: Add ECN support for TSO [AF_UNIX]: Datagram getpeersec [NET]: Fix logical error in skb_gso_ok [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000 ...	2006-06-29 17:43:43 -07:00
Catherine Zhang	877ce7c1b3	[AF_UNIX]: Datagram getpeersec This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet. Patch design and implementation: The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates to the kernel such desire by setting the SO_PASSSEC option via getsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for Unix datagram socket should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_SOCKET && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow a server socket to receive security context of the peer. Testing: We have tested the patch by setting up Unix datagram client and server applications. We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:58:06 -07:00
David S. Miller	2b1e597871	[SPARC64]: of_device layer IRQ resolution Do IRQ determination generically by parsing the PROM properties, and using IRQ controller drivers for final resolution. One immediate positive effect is that all of the IRQ frobbing in the EBUS, ISA, and PCI controller layers has been eliminated. We just look up the of_device and use the properly computed value. The PCI controller irq_build() routines are gone and no longer used. Unfortunately sbus_build_irq() has to remain as there is a direct reference to this in the sunzilog driver. That can be killed off once the sparc32 side of this is written and the sunzilog driver is transformed into an "of" bus driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:37:38 -07:00
David S. Miller	946ea09962	[SPARC]: Kill interrupt stuff and linux_phandle from device_node. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:37:25 -07:00
David S. Miller	3ca9fab410	[SPARC]: Add of_io{remap,unmap}(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:37:16 -07:00
David S. Miller	cf44bbc26c	[SPARC]: Beginnings of generic of_device framework. The idea is to fully construct the device register and interrupt values into these of_device objects, and convert all of SBUS, EBUS, ISA drivers to use this new stuff. Much ideas and code taken from Ben H.'s powerpc work. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:37:12 -07:00
David S. Miller	3ae9a3489a	[SPARC]: Add of_n_{addr,size}_cells(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:37:10 -07:00
David S. Miller	286bbe87c1	[SPARC64]: Kill starfire_cookie from SBUS/PCI. Totally unused. We need to traverse the list of global IRQ translaters, so storing it in the per-bus structures was useless. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:37:08 -07:00
Siddha, Suresh B	5c45bf279d	[PATCH] sched: mc/smt power savings sched policy sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in /sys/devices/system/cpu/ control the MC/SMT power savings policy for the scheduler. Based on the values (1-enable, 0-disable) for these controls, sched groups cpu power will be determined for different domains. When power savings policy is enabled and under light load conditions, scheduler will minimize the physical packages/cpu cores carrying the load and thus conserving power(with a perf impact based on the workload characteristics... see OLS 2005 CMP kernel scheduler paper for more details..) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Con Kolivas <kernel@kolivas.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-27 17:32:45 -07:00
Linus Torvalds	8871e73fdb	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC]: Add iomap interfaces. [OPENPROM]: Rewrite driver to use in-kernel device tree. [OPENPROMFS]: Rewrite using in-kernel device tree and seq_file. [SPARC]: Add unique device_node IDs and a ".node" property. [SPARC]: Add of_set_property() interface. [SPARC64]: Export auxio_register to modules. [SPARC64]: Add missing interfaces to dma-mapping.h [SPARC64]: Export _PAGE_IE to modules. [SPARC64]: Allow floppy driver to build modular. [SPARC]: Export x_bus_type to modules. [RIOWATCHDOG]: Fix the build. [CPWATCHDOG]: Fix the build. [PARPORT] sunbpp: Fix typo. [MTD] sun_uflash: Port to new EBUS device layer.	2006-06-26 10:08:32 -07:00
Anil S Keshavamurthy	e6f47f978b	[PATCH] Notify page fault call chain With this patch Kprobes now registers for page fault notifications only when their is an active probe registered. Once all the active probes are unregistered their is no need to be notified of page faults and kprobes unregisters itself from the page fault notifications. Hence we will have ZERO side effects when no probes are active. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:22 -07:00
Anil S Keshavamurthy	d98f8f0518	[PATCH] Notify page fault call chain for sparc64 Overloading of page fault notification with the notify_die() has performance issues(since the only interested components for page fault is kprobes and/or kdb) and hence this patch introduces the new notifier call chain exclusively for page fault notifications their by avoiding notifying unnecessary components in the do_page_fault() code path. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:22 -07:00
David S. Miller	87b385da1f	[SPARC]: Add unique device_node IDs and a ".node" property. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-25 23:18:57 -07:00
David S. Miller	fb7cd9d9ac	[SPARC]: Add of_set_property() interface. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-25 23:18:36 -07:00
David S. Miller	36321426e3	[SPARC64]: Add missing interfaces to dma-mapping.h Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-25 23:15:05 -07:00
David S. Miller	3505599615	[SPARC64]: Allow floppy driver to build modular. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-25 23:15:01 -07:00
Paul Mackerras	bfe5d83419	[PATCH] Define __raw_get_cpu_var and use it There are several instances of per_cpu(foo, raw_smp_processor_id()), which is semantically equivalent to __get_cpu_var(foo) but without the warning that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For those architectures with optimized per-cpu implementations, namely ia64, powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower code than __get_cpu_var(), so it would be preferable to use __get_cpu_var on those platforms. This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x, raw_smp_processor_id()) on architectures that use the generic per-cpu implementation, and turns into __get_cpu_var(x) on the architectures that have an optimized per-cpu implementation. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-25 10:01:01 -07:00
David S. Miller	576c352e89	[SBUS]: Rewrite and plug into of_device framework. I severely apologize, I was still learning how to program in C when I wrote this stuff 10 years ago... Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:50 -07:00
David S. Miller	fd53143116	[SPARC]: Port of_device layer and make ebus use it. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:47 -07:00
David S. Miller	a2bd4fd179	[SPARC64]: Add of_device layer and make ebus/isa use it. Sparcspkr and power drivers are converted, to make sure it works. Eventually the SBUS device layer will use this as a sub-class. I really cannot cut loose on that bit until sparc32 is given the same infrastructure. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:43 -07:00
David S. Miller	8cd24ed4f8	[SPARC64]: Expand of_*() interfaces some more. Import some more stuff from powerpc. Add of_device_is_compatible(), and of_find_compatible_node(). Export some more of the other routines to modules. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:41 -07:00
David S. Miller	cecc4e9222	[SPARC64]: Convert central bus layer to in-kernel PROM device tree. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:32 -07:00
David S. Miller	9c10a58ed6	[SPARC64]: Kill ebus/isa range and interrupt mapping struct members. Unused outside of initial bus probe scan. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:30 -07:00
David S. Miller	690c8fd31f	[SPARC64]: Use in-kernel PROM tree for EBUS and ISA. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:28 -07:00
David S. Miller	de8d28b16f	[SPARC64]: Convert sparc64 PCI layer to in-kernel device tree. One thing this change pointed out was that we really should pull the "get 'local-mac-address' property" logic into a helper function all the network drivers can call. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:26 -07:00
David S. Miller	c2a5a46be4	[SPARC64]: Fix for Niagara memory corruption. On some sun4v systems, after netboot the ethernet controller and it's DMA mappings can be left active. The net result is that the kernel can end up using memory the ethernet controller will continue to DMA into, resulting in corruption. To deal with this, we are more careful about importing IOMMU translations which OBP has left in the IO-TLB. If the mapping maps into an area the firmware claimed was free and available memory for the kernel to use, we demap instead of import that IOMMU entry. This is going to cause the network chip to take a PCI master abort on the next DMA it attempts, if it has been left going like this. All tests show that this is handled properly by the PCI layer and the e1000 drivers. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:21 -07:00
David S. Miller	07f8e5f358	[SPARC64]: Convert cpu_find_by_*() interface to in-kernel PROM device tree. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:17 -07:00
David S. Miller	6d307724cb	[SPARC64]: Add of_getintprop_default(). This encodes a common idiomatic coding pattern used when dealing with integer properties. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:15 -07:00
David S. Miller	6760d28bc6	[SPARC64]: Convert sun4v virtual-device layer to in-kernel PROM device tree. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:13 -07:00
David S. Miller	e87dc35020	[SPARC64]: Use in-kernel OBP device tree for PCI controller probing. It can be pushed even further down, but this is a first step. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:07 -07:00
David S. Miller	aaf7cec276	[SPARC64]: Add of_find_node_by_{name,type}(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:04 -07:00
David S. Miller	372b07bb5a	[SPARC64]: Import OBP device tree into kernel data structures. The basic framework is based on the PowerPC OF code. This code even tries to get the device addressing components correct in the full path names. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:02 -07:00
David S. Miller	8fae097deb	[SBUS]: Start cleaning up generic sbus support layer. In particular, move the IRQ probing out to sparc32/sparc64 arch specific code where it belongs. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 23:15:00 -07:00
Bjorn Helgaas	4f1bcaf094	[PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use VGA_MAP_MEM translates to ioremap() on some architectures. It makes sense to do this to vga_vram_base, because we're going to access memory between vga_vram_base and vga_vram_end. But it doesn't really make sense to map starting at vga_vram_end, because we aren't going to access memory starting there. On ia64, which always has to be different, ioremapping vga_vram_end gives you something completely incompatible with ioremapped vga_vram_start, so vga_vram_size ends up being nonsense. As a bonus, we often know the size up front, so we can use ioremap() correctly, rather than giving it a zero size. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-22 15:05:58 -07:00
Linus Torvalds	be883da759	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Update defconfig. [SPARC64]: Don't double-export synchronize_irq. [SPARC64]: Move over to GENERIC_HARDIRQS. [SPARC64]: Virtualize IRQ numbers. [SPARC64]: Kill ino_bucket->pil [SPARC]: Kill __irq_itoa(). [SPARC64]: bp->pil can never be zero [SPARC64]: Send all device interrupts via one PIL. [SPARC]: Fix iommu_flush_iotlb end address [SPARC]: Mark smp init functions as cpuinit [SPARC]: Add missing rw can_lock macros [SPARC]: Setup cpu_possible_map [SPARC]: Add topology_init()	2006-06-20 17:39:28 -07:00
Linus Torvalds	cee4cca740	Merge git://git.infradead.org/hdrcleanup-2.6 * git://git.infradead.org/hdrcleanup-2.6: (63 commits) [S390] __FD_foo definitions. Switch to __s32 types in joystick.h instead of C99 types for consistency. Add <sys/types.h> to headers included for userspace in <linux/input.h> Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h Remove struct fddi_statistics from user view in <linux/if_fddi.h> Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390 Revert include/media changes: Mauro says those ioctls are only used in-kernel(!) Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h> Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too Remove private struct dx_hash_info from public view in <linux/ext3_fs.h> Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h> Use __uXX types in <linux/divert.h> for struct divert_blk et al. Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible. Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely Use __uXX types in user-visible structures in <linux/nbd.h> Don't use 'u32' in user-visible struct ip_conntrack_old_tuple. Use __uXX types for S390 DASD volume label definitions which are user-visible S390 BIODASDREADCMB ioctl should use __u64 not u64 type. Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h> Fix private integer types used in V4L2 ioctls. ... Manually resolve conflict in include/linux/mtd/physmap.h	2006-06-20 15:10:08 -07:00
David S. Miller	e18e2a00ef	[SPARC64]: Move over to GENERIC_HARDIRQS. This is the long overdue conversion of sparc64 over to the generic IRQ layer. The kernel image is slightly larger, but the BSS is ~60K smaller due to the reduced size of struct ino_bucket. A lot of IRQ implementation details, including ino_bucket, were moved out of asm-sparc64/irq.h and are now private to arch/sparc64/kernel/irq.c, and most of the code in irq.c totally disappeared. One thing that's different at the moment is IRQ distribution, we do it at enable_irq() time. If the cpu mask is ALL then we round-robin using a global rotating cpu counter, else we pick the first cpu in the mask to support single cpu targetting. This is similar to what powerpc's XICS IRQ support code does. This works fine on my UP SB1000, and the SMP build goes fine and runs on that machine, but lots of testing on different setups is needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-20 01:23:32 -07:00
David S. Miller	8047e247c8	[SPARC64]: Virtualize IRQ numbers. Inspired by PowerPC XICS interrupt support code. All IRQs are virtualized in order to keep NR_IRQS from needing to be too large. Interrupts on sparc64 are arbitrary 11-bit values, but we don't need to define NR_IRQS to 2048 if we virtualize the IRQs. As PCI and SBUS controller drivers build device IRQs, we divy out virtual IRQ numbers incrementally starting at 1. Zero is a special virtual IRQ used for the timer interrupt. So device drivers all see virtual IRQs, and all the normal interfaces such as request_irq(), enable_irq(), etc. translate that into a real IRQ number in order to configure the IRQ. At this point knowledge of the struct ino_bucket is almost entirely contained within arch/sparc64/kernel/irq.c There are a few small bits in the PCI controller drivers that need to be swept away before we can remove ino_bucket's definition out of asm-sparc64/irq.h and privately into kernel/irq.c Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-20 01:22:35 -07:00
David S. Miller	37cdcd9e82	[SPARC64]: Kill ino_bucket->pil And reuse that struct member for virt_irq, which will be used in future changesets for the implementation of mapping between real and virtual IRQ numbers. This nicely kills off a ton of SBUS and PCI controller PIL assignment code which is no longer necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-20 01:21:57 -07:00
David S. Miller	c6387a48cf	[SPARC]: Kill __irq_itoa(). This ugly hack was long overdue to die. It was a way to print out Sparc interrupts in a more freindly format, since IRQ numbers were arbitrary opaque 32-bit integers which vectored into PIL levels. These 32-bit integers were not necessarily in the 0-->NR_IRQS range, but the PILs they vectored to were. The idea now is that we will increase NR_IRQS a little bit and use a virtual<-->real IRQ number mapping scheme similar to PowerPC. That makes this IRQ printing hack irrelevant, and furthermore only a handful of drivers actually used __irq_itoa() making it even less useful. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-20 01:21:29 -07:00
David S. Miller	fd0504c321	[SPARC64]: Send all device interrupts via one PIL. This is the first in a series of cleanups that will hopefully allow a seamless attempt at using the generic IRQ handling infrastructure in the Linux kernel. Define PIL_DEVICE_IRQ and vector all device interrupts through there. Get rid of the ugly pil0_dummy_{bucket,desc}, instead vector the timer interrupt directly to a specific handler since the timer interrupt is the only event that will be signaled on PIL 14. The irq_worklist is now in the per-cpu trap_block[]. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-20 01:20:00 -07:00
David S. Miller	4d1a099828	Restrict headers exported to userspace for SPARC and SPARC64 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-06-20 08:34:40 +01:00
David Woodhouse	a29ee9f3bf	Add Kbuild file for SPARC 'make headers_install' Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-06-18 12:22:49 +01:00
David S. Miller	0b0968a3e6	[SPARC64]: Fix D-cache corruption in mremap If we move a mapping from one virtual address to another, and this changes the virtual color of the mapping to those pages, we can see corrupt data due to D-cache aliasing. Check for and deal with this by overriding the move_pte() macro. Set things up so that other platforms can cleanly override the move_pte() macro too. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-01 17:47:25 -07:00
David Woodhouse	66643de455	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: include/asm-powerpc/unistd.h include/asm-sparc/unistd.h include/asm-sparc64/unistd.h Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-05-24 09:22:21 +01:00
David S. Miller	42f142371e	[SPARC64]: Respect gfp_t argument to dma_alloc_coherent(). Using asm-generic/dma-mapping.h does not work because pushing the call down to pci_alloc_coherent() causes the gfp_t argument of dma_alloc_coherent() to be ignored. Fix this by implementing things directly, and adding a gfp_t argument we can use in the internal call down to the PCI DMA implementation of pci_alloc_coherent(). This fixes massive memory corruption when using the sound driver layer, which passes things like __GFP_COMP down into these routines and (correctly) expects that to work. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-23 02:07:22 -07:00
David S. Miller	353b28bafd	[SPARC]: Add robust futex syscall entries. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-21 21:22:53 -07:00
David Woodhouse	5047f09b56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-05-06 19:59:18 +01:00
David S. Miller	8c45112b82	[SPARC]: Hook up vmsplice into syscall tables. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-03 13:55:46 -07:00
David S. Miller	1241140f51	[SPARC64]: Kill __flush_tlb_page() prototype. This function no longer exists. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-30 21:40:13 -07:00
David Woodhouse	5614253686	Remove unneeded _syscallX macros from user view in asm-*/unistd.h These aren't needed by glibc or klibc, and they're broken in some cases anyway. The uClibc folks are apparently switching over to stop using them too (now that we agreed that they should be dropped, at least). Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-29 01:51:47 +01:00
David Woodhouse	62c4f0a2d5	Don't include linux/config.h from anywhere else in include/ Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-26 12:56:16 +01:00
OGAWA Hirofumi	d8fe3f1920	[SPARC]: __NR_sys removal __NR_sys_sync_file_range part was lost somewhere... [glibc is already checking __NR_sync_file_range] Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-24 13:48:51 -07:00
David S. Miller	2784f40e27	[SPARC]: __NR_sys_splice --> __NR_splice Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-19 15:00:01 -07:00
David S. Miller	5fdef39495	[SPARC]: Hook up sys_tee() into syscall tables. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-14 15:29:32 -07:00
KAMEZAWA Hiroyuki	a283a52520	[PATCH] for_each_possible_cpu: sparc64 for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. for sparc64. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:31 -07:00
David S. Miller	289eee6fa7	[SPARC]: Wire up sys_sync_file_range() into syscall tables. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-31 23:49:34 -08:00
David S. Miller	1339713a32	[SPARC]: Wire up sys_splice() into the syscall tables. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-31 23:03:38 -08:00
Linus Torvalds	78cd9e04e0	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Implement futex_atomic_cmpxchg_inatomic().	2006-03-28 09:25:22 -08:00
Alexey Dobriyan	7f927fcc2f	[PATCH] Typo fixes Fix a lot of typos. Eyeballed by jmc@ in OpenBSD. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-28 09:16:08 -08:00
David S. Miller	6e57a3a897	[SPARC64]: Implement futex_atomic_cmpxchg_inatomic(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-28 01:00:08 -08:00
Alan Stern	e041c68341	[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:50 -08:00
Ingo Molnar	e9056f13bf	[PATCH] lightweight robust futexes: arch defaults This patchset provides a new (written from scratch) implementation of robust futexes, called "lightweight robust futexes". We believe this new implementation is faster and simpler than the vma-based robust futex solutions presented before, and we'd like this patchset to be adopted in the upstream kernel. This is version 1 of the patchset. Background ---------- What are robust futexes? To answer that, we first need to understand what futexes are: normal futexes are special types of locks that in the noncontended case can be acquired/released from userspace without having to enter the kernel. A futex is in essence a user-space address, e.g. a 32-bit lock variable field. If userspace notices contention (the lock is already owned and someone else wants to grab it too) then the lock is marked with a value that says "there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to wait for the other guy to release it. The kernel creates a 'futex queue' internally, so that it can later on match up the waiter with the waker - without them having to know about each other. When the owner thread releases the futex, it notices (via the variable value) that there were waiter(s) pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up. Once all waiters have taken and released the lock, the futex is again back to 'uncontended' state, and there's no in-kernel state associated with it. The kernel completely forgets that there ever was a futex at that address. This method makes futexes very lightweight and scalable. "Robustness" is about dealing with crashes while holding a lock: if a process exits prematurely while holding a pthread_mutex_t lock that is also shared with some other process (e.g. yum segfaults while holding a pthread_mutex_t, or yum is kill -9-ed), then waiters for that lock need to be notified that the last owner of the lock exited in some irregular way. To solve such types of problems, "robust mutex" userspace APIs were created: pthread_mutex_lock() returns an error value if the owner exits prematurely - and the new owner can decide whether the data protected by the lock can be recovered safely. There is a big conceptual problem with futex based mutexes though: it is the kernel that destroys the owner task (e.g. due to a SEGFAULT), but the kernel cannot help with the cleanup: if there is no 'futex queue' (and in most cases there is none, futexes being fast lightweight locks) then the kernel has no information to clean up after the held lock! Userspace has no chance to clean up after the lock either - userspace is the one that crashes, so it has no opportunity to clean up. Catch-22. In practice, when e.g. yum is kill -9-ed (or segfaults), a system reboot is needed to release that futex based lock. This is one of the leading bugreports against yum. To solve this problem, 'Robust Futex' patches were created and presented on lkml: the one written by Todd Kneisel and David Singleton is the most advanced at the moment. These patches all tried to extend the futex abstraction by registering futex-based locks in the kernel - and thus give the kernel a chance to clean up. E.g. in David Singleton's robust-futex-6.patch, there are 3 new syscall variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER. The kernel attaches such robust futexes to vmas (via vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are searched to see whether they have a robust_head set. Lots of work went into the vma-based robust-futex patch, and recently it has improved significantly, but unfortunately it still has two fundamental problems left: - they have quite complex locking and race scenarios. The vma-based patches had been pending for years, but they are still not completely reliable. - they have to scan _every_ vma at sys_exit() time, per thread! The second disadvantage is a real killer: pthread_exit() takes around 1 microsecond on Linux, but with thousands (or tens of thousands) of vmas every pthread_exit() takes a millisecond or more, also totally destroying the CPU's L1 and L2 caches! This is very much noticeable even for normal process sys_exit_group() calls: the kernel has to do the vma scanning unconditionally! (this is because the kernel has no knowledge about how many robust futexes there are to be cleaned up, because a robust futex might have been registered in another task, and the futex variable might have been simply mmap()-ed into this process's address space). This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than that: the overhead makes robust futexes impractical for any type of generic Linux distribution. So it became clear to us, something had to be done. Last week, when Thomas Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he found a handful of new races and we were talking about it and were analyzing the situation. At that point a fundamentally different solution occured to me. This patchset (written in the past couple of days) implements that new solution. Be warned though - the patchset does things we normally dont do in Linux, so some might find the approach disturbing. Parental advice recommended ;-) New approach to robust futexes ------------------------------ At the heart of this new approach there is a per-thread private list of robust locks that userspace is holding (maintained by glibc) - which userspace list is registered with the kernel via a new syscall [this registration happens at most once per thread lifetime]. At do_exit() time, the kernel checks this user-space list: are there any robust futex locks to be cleaned up? In the common case, at do_exit() time, there is no list registered, so the cost of robust futexes is just a simple current->robust_list != NULL comparison. If the thread has registered a list, then normally the list is empty. If the thread/process crashed or terminated in some incorrect way then the list might be non-empty: in this case the kernel carefully walks the list [not trusting it], and marks all locks that are owned by this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any). The list is guaranteed to be private and per-thread, so it's lockless. There is one race possible though: since adding to and removing from the list is done after the futex is acquired by glibc, there is a few instructions window for the thread (or process) to die there, leaving the futex hung. To protect against this possibility, userspace (glibc) also maintains a simple per-thread 'list_op_pending' field, to allow the kernel to clean up if the thread dies after acquiring the lock, but just before it could have added itself to the list. Glibc sets this list_op_pending field before it tries to acquire the futex, and clears it after the list-add (or list-remove) has finished. That's all that is needed - all the rest of robust-futex cleanup is done in userspace [just like with the previous patches]. Ulrich Drepper has implemented the necessary glibc support for this new mechanism, which fully enables robust mutexes. (Ulrich plans to commit these changes to glibc-HEAD later today.) Key differences of this userspace-list based approach, compared to the vma based method: - it's much, much faster: at thread exit time, there's no need to loop over every vma (!), which the VM-based method has to do. Only a very simple 'is the list empty' op is done. - no VM changes are needed - 'struct address_space' is left alone. - no registration of individual locks is needed: robust mutexes dont need any extra per-lock syscalls. Robust mutexes thus become a very lightweight primitive - so they dont force the application designer to do a hard choice between performance and robustness - robust mutexes are just as fast. - no per-lock kernel allocation happens. - no resource limits are needed. - no kernel-space recovery call (FUTEX_RECOVER) is needed. - the implementation and the locking is "obvious", and there are no interactions with the VM. Performance ----------- I have benchmarked the time needed for the kernel to process a list of 1 million (!) held locks, using the new method [on a 2GHz CPU]: - with FUTEX_WAIT set [contended mutex]: 130 msecs - without FUTEX_WAIT set [uncontended mutex]: 30 msecs I have also measured an approach where glibc does the lock notification [which it currently does for !pshared robust mutexes], and that took 256 msecs - clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do. (1 million held locks are unheard of - we expect at most a handful of locks to be held at a time. Nevertheless it's nice to know that this approach scales nicely.) Implementation details ---------------------- The patch adds two new syscalls: one to register the userspace list, and one to query the registered list pointer: asmlinkage long sys_set_robust_list(struct robust_list_head __user head, size_t len); asmlinkage long sys_get_robust_list(int pid, struct robust_list_head __user head_ptr, size_t __user len_ptr); List registration is very fast: the pointer is simply stored in current->robust_list. [Note that in the future, if robust futexes become widespread, we could extend sys_clone() to register a robust-list head for new threads, without the need of another syscall.] So there is virtually zero overhead for tasks not using robust futexes, and even for robust futex users, there is only one extra syscall per thread lifetime, and the cleanup operation, if it happens, is fast and straightforward. The kernel doesnt have any internal distinction between robust and normal futexes. If a futex is found to be held at exit time, the kernel sets the highest bit of the futex word: #define FUTEX_OWNER_DIED 0x40000000 and wakes up the next futex waiter (if any). User-space does the rest of the cleanup. Otherwise, robust futexes are acquired by glibc by putting the TID into the futex field atomically. Waiters set the FUTEX_WAITERS bit: #define FUTEX_WAITERS 0x80000000 and the remaining bits are for the TID. Testing, architecture support ----------------------------- I've tested the new syscalls on x86 and x86_64, and have made sure the parsing of the userspace list is robust [ ;-) ] even if the list is deliberately corrupted. i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the new glibc code (on x86_64 and i386), and it works for his robust-mutex testcases. All other architectures should build just fine too - but they wont have the new syscalls yet. Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline function before writing up the syscalls (that function returns -ENOSYS right now). This patch: Add placeholder futex_atomic_cmpxchg_inuser() implementations to every architecture that supports futexes. It returns -ENOSYS. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:49 -08:00
KAMEZAWA Hiroyuki	a117e66ed4	[PATCH] unify pfn_to_page: generic functions There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM. Each arch has its own page_to_pfn(), pfn_to_page() for each models. But most of them can use the same arithmetic. This patch adds asm-generic/memory_model.h, which includes generic page_to_pfn(), pfn_to_page() definitions for each memory model. When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are used instead of macro. This is enabled by some archs and reduces text size. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:44 -08:00
Akinobu Mita	2d78d4beb6	[PATCH] bitops: sparc64: use generic bitops - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - remove ffz() - remove __ffs() - remove generic_fls() - remove generic_fls64() - remove sched_find_first_bit() - remove ffs() - unless defined(ULTRA_HAS_POPULATION_COUNT) - remove generic_hweight{64,32,16,8}() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:14 -08:00
Akinobu Mita	67b0ad574b	[PATCH] bitops: use non atomic operations for minix__bit() and ext2__bit() Bitmap functions for the minix filesystem and the ext2 filesystem except ext2_set_bit_atomic() and ext2_clear_bit_atomic() do not require the atomic guarantees. But these are defined by using atomic bit operations on several architectures. (cris, frv, h8300, ia64, m32r, m68k, m68knommu, mips, s390, sh, sh64, sparc, sparc64, v850, and xtensa) This patch switches to non atomic bit operation. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:10 -08:00
Davide Libenzi	f348d70a32	[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:56 -08:00
Andrew Morton	394e3902c5	[PATCH] more for_each_cpu() conversions When we stop allocating percpu memory for not-possible CPUs we must not touch the percpu data for not-possible CPUs at all. The correct way of doing this is to test cpu_possible() or to use for_each_cpu(). This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very few instances of this bug, if any. But the patch converts lots of open-coded test to use the preferred helper macros. Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Kyle McMartin <kyle@parisc-linux.org> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Cc: Christian Zankel <chris@zankel.net> Cc: Philippe Elie <phil.el@wanadoo.fr> Cc: Nathan Scott <nathans@sgi.com> Cc: Jens Axboe <axboe@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-23 07:38:17 -08:00
Nick Piggin	0b2fcfdb8b	[PATCH] atomic: add_unless cmpxchg optimise Without branch hints, the very unlikely chance of the loop repeating due to cmpxchg failure is unrolled with gcc-4 that I have tested. Improve this for architectures with a native cas/cmpxchg. llsc archs should try to implement this natively. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-23 07:38:17 -08:00
Kyle McMartin	804f1594cc	[PATCH] Move read_mostly definition to asm/cache.h Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in include/linux/cache.h. Move the per architecture section definition to asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to catch architectures which don't implement the section. Verified that symbols still go in .data.read_mostly on parisc, and the compile doesn't break. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-23 07:38:10 -08:00
David S. Miller	dcc1e8dd88	[SPARC64]: Add a secondary TSB for hugepage mappings. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-22 01:15:14 -08:00
David S. Miller	f6b83f070e	[SPARC64]: Fix 2 bugs in huge page support. 1) huge_pte_offset() did not check the page table hierarchy elements as being empty correctly, resulting in an OOPS 2) Need platform specific hugetlb_get_unmapped_area() to handle the top-down vs. bottom-up address space allocation strategies. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:17:17 -08:00
David S. Miller	bb8646d834	[SPARC64]: Optimized TSB table initialization. We only need to write an invalid tag every 16 bytes, so taking advantage of this can save many instructions compared to the simple memset() call we make now. A prefetching implementation is implemented for sun4u and a block-init store version if implemented for Niagara. The next trick is to be able to perform an init and a copy_tsb() in parallel when growing a TSB table. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:16:41 -08:00
David S. Miller	d61e16df94	[SPARC64]: Increase top of 32-bit process stack. Put it one page below the top of the 32-bit address space. This gives us ~16MB more address space to work with. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:16:36 -08:00
David S. Miller	a91690ddd0	[SPARC64]: Top-down address space allocation for 32-bit tasks. Currently allocations are very constrained for 32-bit processes. It grows down-up from 0x70000000 to 0xf0000000 which gives about 2GB of stack + dynamic mmap() space. So support the top-down method, and we need to override the generic helper function in order to deal with D-cache coloring. With these changes I was able to squeeze out a mmap() just over 3.6GB in size in a 32-bit process. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:16:35 -08:00
David S. Miller	7a1ac52641	[SPARC64]: Fix and re-enable dynamic TSB sizing. This is good for up to %50 performance improvement of some test cases. The problem has been the race conditions, and hopefully I've plugged them all up here. 1) There was a serious race in switch_mm() wrt. lazy TLB switching to and from kernel threads. We could erroneously skip a tsb_context_switch() and thus use a stale TSB across a TSB grow event. There is a big comment now in that function describing exactly how it can happen. 2) All code paths that do something with the TSB need to be guarded with the mm->context.lock spinlock. This makes page table flushing paths properly synchronize with both TSB growing and TLB context changes. 3) TSB growing events are moved to the end of successful fault processing. Previously it was in update_mmu_cache() but that is deadlock prone. At the end of do_sparc64_fault() we hold no spinlocks that could deadlock the TSB grow sequence. We also have dropped the address space semaphore. While we're here, add prefetching to the copy_tsb() routine and put it in assembler into the tsb.S file. This piece of code is quite time critical. There are some small negative side effects to this code which can be improved upon. In particular we grab the mm->context.lock even for the tsb insert done by update_mmu_cache() now and that's a bit excessive. We can get rid of that locking, and the same lock taking in flush_tsb_user(), by disabling PSTATE_IE around the whole operation including the capturing of the tsb pointer and tsb_nentries value. That would work because anyone growing the TSB won't free up the old TSB until all cpus respond to the TSB change cross call. I'm not quite so confident in that optimization to put it in right now, but eventually we might be able to and the description is here for reference. This code seems very solid now. It passes several parallel GCC bootstrap builds, and our favorite "nut cruncher" stress test which is a full "make -j8192" build of a "make allmodconfig" kernel. That puts about 256 processes on each cpu's run queue, makes lots of process cpu migrations occur, causes lots of page table and TLB flushing activity, incurs many context version number changes, and it swaps the machine real far out to disk even though there is 16GB of ram on this test system. :-) Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:16:33 -08:00
David S. Miller	90a6646bf6	[SPARC64]: Fix system type in /proc/cpuinfo and remove bogus OBP check. Report 'sun4v' when appropriate in /proc/cpuinfo Remove all the verifications of the OBP version string. Just make sure it's there, and report it raw in the bootup logs and via /proc/cpuinfo. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:25 -08:00
David S. Miller	8935dced54	[SPARC64]: Add SMT scheduling support for Niagara. The mapping is a simple "(cpuid >> 2) == core" for now. Later we'll add more sophisticated code that will walk the sun4v machine description and figure this out from there. We should also add core mappings for jaguar and panther processors. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:24 -08:00
David S. Miller	d1112018b4	[SPARC64]: Move over to sparsemem. This has been pending for a long time, and the fact that we waste a ton of ram on some configurations kind of pushed things over the edge. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:22 -08:00
David S. Miller	ee29074d3b	[SPARC64]: Fix new context version SMP handling. Don't piggy back the SMP receive signal code to do the context version change handling. Instead allocate another fixed PIL number for this asynchronous cross-call. We can't use smp_call_function() because this thing is invoked with interrupts disabled and a few spinlocks held. Also, fix smp_call_function_mask() to count "cpus" correctly. There is no guarentee that the local cpu is in the mask yet that is exactly what this code was assuming. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:21 -08:00
David S. Miller	a77754b4d0	[SPARC64]: Bulletproof MMU context locking. 1) Always spin_lock_init() in init_context(). The caller essentially clears it out, or copies the mm info from the parent. In both cases we need to explicitly initialize the spinlock. 2) Always do explicit IRQ disabling while taking mm->context.lock and ctx_alloc_lock. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:20 -08:00
David S. Miller	8bcd174116	[SPARC64]: Do not allow mapping pages within 4GB of 64-bit VA hole. The UltraSPARC T1 manual recommends this because the chip could instruction prefetch into the VA hole, and this would also make decoding certain kinds of memory access traps more difficult (because the chip sign extends certain pieces of trap state). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:14 -08:00
David S. Miller	e22990451a	[SPARC64]: Kill bogus function externs in asm/pgtable.h These are all implemented inline earlier in the file. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:11 -08:00
David S. Miller	b830ab665a	[SPARC64]: Fix bugs in SUN4V cpu mondo dispatch. There were several bugs in the SUN4V cpu mondo dispatch code. In fact, if we ever got a EWOULDBLOCK or other error from the hypervisor call, we'd potentially send a cpu mondo multiple times to the same cpu and even worse we could loop until the timeout resending the same mondo over and over to such cpus. So let's bulletproof this thing as follows: 1) Implement cpu_mondo_send() and cpu_state() hypervisor calls in arch/sparc64/kernel/entry.S, add prototypes to asm/hypervisor.h 2) Don't build and update the cpulist using inline functions, this was causing the cpu mask to not get updated in the caller. 3) Disable interrupts during the entire mondo send, otherwise our cpu list and/or mondo block could get overwritten if we take an interrupt and do a cpu mondo send on the current cpu. 4) Check for all possible error return types from the cpu_mondo_send() hypervisor call. In particular: HV_EOK) Our work is done, all cpus have received the mondo. HV_CPUERROR) One or more of the cpus in the cpu list we passed to the hypervisor are in error state. Use cpu_state() calls over the entries in the cpu list to see which ones. Record them in "error_mask" and report this after we are done sending the mondo to cpus which are not in error state. HV_EWOULDBLOCK) We need to keep trying. Any other error we consider fatal, we report the event and exit immediately. 5) We only timeout if forward progress is not made. Forward progress is defined as having at least one cpu get the mondo successfully in a given cpu_mondo_send() call. Otherwise we bump a counter and delay a little. If the counter hits a limit, we signal an error and report the event. Also, smp_call_function_mask() error handling reports the number of cpus incorrectly. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:09 -08:00
David S. Miller	97c4b6f95a	[SPARC64]: Use 13-bit context size always. We no longer have the problems that require using the smaller sizes. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:06 -08:00
David S. Miller	3634476239	[SPARC64]: Niagara optimized XOR functions for RAID. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:03 -08:00
David S. Miller	a0663a79ad	[SPARC64]: Fix TLB context allocation with SMT style shared TLBs. The context allocation scheme we use depends upon there being a 1<-->1 mapping from cpu to physical TLB for correctness. Chips like Niagara break this assumption. So what we do is notify all cpus with a cross call when the context version number changes, and if necessary this makes them allocate a valid context for the address space they are running at the time. Stress tested with make -j1024, make -j2048, and make -j4096 kernel builds on a 32-strand, 8 core, T2000 with 16GB of ram. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:14:00 -08:00
David S. Miller	0f05da6d57	[SPARC64]: Fix %tstate ASI handling in start_thread{,32}() Niagara helps us find a ancient bug in the sparc64 port :-) The ASI_* values are plain constant defines, thus signed 32-bit on sparc64. To put shift this into the regs->tstate value we were doing or'ing "(ASI_PNF << 24)" into there. ASI_PNF is 0x82 and shifted left by 24 makes that topmost bit the sign bit in a 32-bit value. This would get sign extended to 64-bits and thus corrupt the top-half of the reg->tstate value. This never caused problems in pre-Niagara cpus because the only thing up there were the condition code values. But Niagara has the global register level field, and this all 1's value is illegal there so Niagara gives an illegal instruction trap due to this bug. I'm pretty sure this bug is about as old as the sparc64 port itself. This also points out that we weren't setting ASI_PNF for 32-bit tasks. We should, so fix that while we're here. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:57 -08:00
David S. Miller	d7744a0950	[SPARC64]: Create a seperate kernel TSB for 4MB/256MB mappings. It can map all of the linear kernel mappings with zero TSB hash conflicts for systems with 16GB or less ram. In such cases, on SUN4V, once we load up this TSB the first time with all the mappings, we never take a linear kernel mapping TLB miss ever again, the hypervisor handles them all. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:56 -08:00
David S. Miller	6f5374c91f	[SPARC64]: Add sun4v_cpu_yield(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:52 -08:00
David S. Miller	1bd0cd74d1	[SPARC64]: Kill cpudata->idle_volume. Set, but never used. We used to use this for dynamic IRQ retargetting, but that code died a long time ago. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:51 -08:00
David S. Miller	0f15952ac8	[SPARC64]: Export a PAGE_SHARED symbol. For drivers/media/*, noticed by Fabbione. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:36 -08:00
Fabio M. Di Nitto	f6c1fe5292	[SPARC64] Fix build if CONFIG_HUGETLB_PAGE is not set Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:35 -08:00
David S. Miller	8b23427441	[SPARC64]: More TLB/TSB handling fixes. The SUN4V convention with non-shared TSBs is that the context bit of the TAG is clear. So we have to choose an "invalid" bit and initialize new TSBs appropriately. Otherwise a zero TAG looks "valid". Make sure, for the window fixup cases, that we use the right global registers and that we don't potentially trample on the live global registers in etrap/rtrap handling (%g2 and %g6) and that we put the missing virtual address properly in %g5. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:34 -08:00
David S. Miller	3763be32d5	[SPARC64]: Define ARCH_HAS_READ_CURRENT_TIMER. This gives more consistent bogomips and delay() semantics, especially on sun4v. It gives weird looking values though... Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:29 -08:00
David S. Miller	c857e3fdbc	[SPARC64]: __bzero_noasi --> __clear_user Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:28 -08:00
David S. Miller	97532f5982	[SPARC64]: Add HWCAP_SPARC_BLKINIT elf capability flag for Niagara. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:26 -08:00
David S. Miller	ebd8c56c5a	[SPARC64]: Fix uniprocessor IRQ targetting on SUN4V. We need to use the real hardware processor ID when targetting interrupts, not the "define to 0" thing the uniprocessor build gives us. Also, fill in the Node-ID and Agent-ID fields properly on sun4u/Safari. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:24 -08:00
David S. Miller	72aff53f1f	[SPARC64]: Get SUN4V SMP working. The sibling cpu bringup is extremely fragile. We can only perform the most basic calls until we take over the trap table from the firmware/hypervisor on the new cpu. This means no accesses to %g4, %g5, %g6 since those can't be TLB translated without our trap handlers. In order to achieve this: 1) Change sun4v_init_mondo_queues() so that it can operate in several modes. It can allocate the queues, or install them in the current processor, or both. The boot cpu does both in it's call early on. Later, the boot cpu allocates the sibling cpu queue, starts the sibling cpu, then the sibling cpu loads them in. 2) init_cur_cpu_trap() is changed to take the current_thread_info() as an argument instead of reading %g6 directly on the current cpu. 3) Create a trampoline stack for the sibling cpus. We do our basic kernel calls using this stack, which is locked into the kernel image, then go to our proper thread stack after taking over the trap table. 4) While we are in this delicate startup state, we put 0xdeadbeef into %g4/%g5/%g6 in order to catch accidental accesses. 5) On the final prom_set_trap_table*() call, we put &init_thread_union into %g6. This is a hack to make prom_world(0) work. All that wants to do is restore the %asi register using get_thread_current_ds(). Longer term we should just do the OBP calls to set the trap table by hand just like we do for everything else. This would avoid that silly prom_world(0) issue, then we can remove the init_thread_union hack. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:22 -08:00
David S. Miller	4ff7ac417d	[SPARC64]: Add GET_GL_GLOBAL() macro for SUN4V. So we can read the %gl register for debugging. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:18 -08:00
David S. Miller	94f8762db9	[SPARC64]: Add sun4v_cpu_qconf() hypervisor call. Call it from register_one_mondo(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:16 -08:00
David S. Miller	bc45e32e0f	[SPARC]: Kill off these __put_user_ret things. They are bogus and haven't been referenced in years. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:15 -08:00
David S. Miller	9d29a3fafd	[SPARC64]: Decode virtual-devices interrupts correctly. Need to translate through the interrupt-map{,-mask] properties. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:05 -08:00
David S. Miller	7890f794e0	[SPARC64]: Add prom_{start,stop}cpu_cpuid(). Use prom_startcpu_cpuid() on SUN4V instead of prom_startcpu(). We should really test for "SUNW,start-cpu-by-cpuid" presence and use it if present even on SUN4U. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:04 -08:00
David S. Miller	7c3514e450	[SPARC64]: Fixup TSTATE layout diagram in asm/pstate.h Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:13:02 -08:00
David S. Miller	50f4f23c3b	[SPARC64]: Fix gcc-3.3.x warnings. It doesn't like const variables being passed into "i" constraing asm operations. It's a bug, but there is nothing we can really do but work around it. Based upon a report from Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:12:53 -08:00
David S. Miller	c4bea28839	[SPARC64]: Make error codes available from sun4v_intr_get*(). And check for errors at call sites. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:12:51 -08:00
David S. Miller	5259d5bfaf	[SPARC64]: Fix comment typo in asm/hypervisor.h Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:12:46 -08:00
David S. Miller	e77227eb4e	[SPARC64]: Probe virtual-devices root node on sun4v. This is where we learn how to get the interrupts for things like the hypervisor console device. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:12:44 -08:00
David S. Miller	e3999574b4	[SPARC64]: Generic sun4v_build_irq(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:12:40 -08:00
David S. Miller	6c0f402f6c	[SPARC64]: Implement rest of generic interrupt hypervisor calls. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 01:12:37 -08:00

1 2 3 4 5 ...

440 Commits