linux

Author	SHA1	Message	Date
Tejun Heo	da3dbb17a0	libata: make ->scr_read/write callbacks return error code Convert ->scr_read/write callbacks to return error code to better indicate failure. This will help handling of SCR_NOTIFICATION. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-20 08:02:11 -04:00
Tejun Heo	5335b72906	libata: implement AC_ERR_NCQ When an NCQ command fails, all commands in flight are aborted and the offending one is reported using log page 10h. Depending on controller characteristics and LLD implementation, all commands may appear as having a device error due to shared TF status making it hard to determine what's actually going on. This patch adds AC_ERR_NCQ, marks the command reported by log page 10h with it and print extra "<F>" after the error report for the command to help distinguishing the offending command. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-20 08:02:11 -04:00
Tejun Heo	b64bbc39f2	libata: improve EH report formatting Requiring LLDs to format multiple error description messages properly doesn't work too well. Help LLDs a bit by making ata_ehi_push_desc() insert ", " on each invocation. __ata_ehi_push_desc() is the raw version without the automatic separator. While at it, make ehi_desc interface proper functions instead of macros. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-20 08:02:11 -04:00
Tejun Heo	9977126c4b	libata: add @is_cmd to ata_tf_to_fis() Add @is_cmd to ata_tf_to_fis(). This controls bit 7 of the second byte which tells the device whether this H2D FIS is for a command or not. This cleans up ahci a bit and will be used by PMP. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-20 08:02:10 -04:00
Magnus Damm	56386f6424	sh: intc - add support for SH7750 and its variants This patch converts the cpu specific 7750 setup code to use the new intc controller. Many new vectors are added and multiple processor variants including 7091, 7750, 7750s, 7750r, 7751 and 7751r should all have the correct vectors hooked up. IRLM interrupts can be enabled using ipr_irq_enable_irlm() which now is marked as __init. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 18:44:49 +09:00
Jaroslav Kysela	53555eb758	[ALSA] version 1.0.14 Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:13:35 +02:00
Takashi Iwai	89f157d9e6	[ALSA] cs46xx - Fix PM resume Fixed PM resume of cs46xx devices. It now restores properly the DSP image and kick-off the DSP. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:11:57 +02:00
Robert P. J. Day	47a2327eac	[ALSA] Remove unreferenced header file include/sound/wavefront_fx.h Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:11:46 +02:00
Pavel Hofman	13d457094b	[ALSA] emu10k1 - EMU 1212 with 16 capture channels * adding 8 more 32-bit capture channels (total of 16) for emu1010 cards * adding some code comments and card details description Signed-off-by: Pavel Hofman <dustin@seznam.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:11:27 +02:00
Takashi Iwai	621887aee9	[ALSA] Add support for Cyrix/NatSemi Geode CS5530 (VSA1) Add support for Cyrix/NatSemi Geode SC5530 (VSA1). The driver is snd-cs5530. Signed-off-by Ash Willis <ashwillis@programmer.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:11:19 +02:00
Pavel Hofman	ea7cfcdfe6	[ALSA] ice1724 - Add PCM Playback Switch to Revo 7.1 This patch adds the support of mute for front channels of M-Audio Revolution 7.1 (the DAC AK4381 features a mute bit). Signed-off-by: Pavel Hofman <dustin@seznam.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:11:18 +02:00
Liam Girdwood	7a05f067c0	[ALSA] ASoC S3C24xx machine drivers - I2C ID for LM4857 This patch adds I2C ID for the LM4857 audio amp and corrects the spacing of the WM8731, WM8750 and WM8753 ID's. Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz>	2007-07-20 11:11:16 +02:00
Uwe Kleine-König	3945a567d0	[ARM] 4487/1: ns9xxx: complete definition of GPIO related registers I changed the naming to be more obvious---unfortunately the HRM doesn't specify these. Moreover the numbering is changed to be zero indexed as this is more natural. Adjust all callers. Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 10:00:42 +01:00
Uwe Kleine-König	f4ae6413f4	[ARM] 4486/1: ns9xxx: fix a typo in the register definitions. Fixed all users. Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 10:00:41 +01:00
Uwe Kleine-König	70ca7d55e1	[ARM] 4484/1: ns9xxx: fix definition of SYS_TCx_TEN_DIS Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 10:00:41 +01:00
Andrew Victor	ed54fcfd78	[ARM] 4479/1: AT91: Define new MMC register bits Add definitions for RDPROOF, WRPROOF and PDCFBYTE bits of the Mode Register in the updated MMC controller found on the AT91SAM9260 and AT91SAM9263 processors. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:43:28 +01:00
Russell King	228adef16d	[ARM] vfp: make fpexc bit names less verbose Use the fpexc abbreviated names instead of long verbose names for fpexc bits. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:39:57 +01:00
Russell King	21d1ca0453	[ARM] avoid floppy warnings by using fd_dma_setup() Avoid the virt_to_bus()/bus_to_virt() warnings in floppy.c caused by the (useless) double conversion to/from bus addresses. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:39:57 +01:00
Russell King	1ad2cdbd0e	[ARM] remove asm/ptrace.h from asm/thread_info.h asm/thread_info.h doesn't need asm/ptrace.h, remove it. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:39:57 +01:00
Russell King	40c3a578a7	[ARM] shut up "warning: "__IGNORE_sync_file_range" redefined" Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:39:56 +01:00
Dan Williams	70c14ff0e9	[ARM] 4495/1: iop: combined watchdog timer driver for iop3xx and iop13xx In order for this driver to be shared across the iop architectures the iop3xx and iop13xx header files are modified to present a common interface for the iop_wdt driver. Details: * iop13xx supports disabling the timer while iop3xx does not. This requires a few 'compatibility' definitions in include/asm-arm/hardware/iop3xx.h to preclude adding #ifdef CONFIG_ARCH_IOP13XX blocks to the driver code. * The heartbeat interval is derived from the internal bus clock rate, so this this patch also exports the tick rate to the iop_wdt driver. Cc: Curt Bruns <curt.e.bruns@intel.com> Cc: Peter Milne <peter.milne@d-tacq.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:35:42 +01:00
Dan Williams	7dea1b2006	[ARM] 4494/1: iop13xx: fix up elf_hwcap compile breakage arch/arm/boot/compressed/misc.o: In function `valid_user_regs': misc.c:(.text+0x74): undefined reference to `elf_hwcap' This triggers after the various elf_hwcap cleanups in: `f884b1cf57` `d1cbbd6b41` include/asm-arm/arch-iop13xx/uncompress.h calls cpu_relax while spinning on a register value. cpu_relax requires processor.h->ptrace.h->hwcap.h 'elf_hwcap' is defined as an extern, but since the uncompressor does not link against arch/arm/kernel/setup.c 'elf_hwcap' remains undefined. Fix is to open code the cpu_relax() call as barrier(). Cc: Lennert Buytenhek <kernel@wantstofly.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:35:35 +01:00
Arnaud Patard	a8135fcfd0	[ARM] 4476/1: EM7210/SS4000E support This patch adds the basic support for the em7210 board. It is similar to the iq31244 board and can be found on Intel "Baxter Creek" ss4000e nas. Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-07-20 09:35:35 +01:00
Vasily Tarasov	c2dea2d1fd	cfq: async queue allocation per priority If we have two processes with different ioprio_class, but the same ioprio_data, their async requests will fall into the same queue. I guess such behavior is not expected, because it's not right to put real-time requests and best-effort requests in the same queue. The attached patch fixes the problem by introducing additional *cfqq fields on cfqd, pointing to per-(class,priority) async queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-20 10:06:38 +02:00
Paul Mundt	da9f0ac2f1	Merge branch 'clkfwk'	2007-07-20 13:38:49 +09:00
Paul Mundt	f6991b0456	sh: Implement clk_round_rate() in the clock framework. This is an optional component of the clock framework. However, as we're going to be using this in the cpufreq drivers, add support for it to the framework. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 13:29:09 +09:00
David S. Miller	91ba3c2128	[SPARC64]: Fix handling of multiple vdc-port nodes. The "id" property in vdc-port nodes are not unique, they are all zero. Therefore assign ID's using the parent's "cfg-handle" property which will be unique. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-19 21:27:18 -07:00
David S. Miller	bc5a2e64a1	[SPARC]: Add sys_fallocate() entries. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-19 21:26:47 -07:00
David S. Miller	a376178011	[SPARC64]: Use orderly_poweroff(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-19 21:26:42 -07:00
Stephen Rothwell	3f23de10f2	Create drivers/of/platform.c and populate it with the common parts from PowerPC and Sparc[64]. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 14:25:51 +10:00
Stephen Rothwell	b41912ca34	Create linux/of_platorm.h Move common stuff from asm-powerpc/of_platform.h to here and move the common bits from asm-sparc/of_device.h here as well. Create asm-sparc/of_platform.h and move appropriate parts of of_device.h to them. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 14:25:22 +10:00
Stephen Rothwell	37b7754aab	[SPARC/64] Rename some functions like PowerPC This is to make the of merge easier. Also rename of_bus_type. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 14:24:53 +10:00
Stephen Rothwell	f898f8dbce	Begin consolidation of of_device.h This just moves the common stuff from the arch of_device.h files to linux/of_device.h. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:41:56 +10:00
Stephen Rothwell	1ef4d4242d	Consolidate of_find_node_by routines This consolidates the routines of_find_node_by_path, of_find_node_by_name, of_find_node_by_type and of_find_compatible_device. Again, the comparison of strings are done differently by Sparc and PowerPC and also these add read_locks around the iterations. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:39:06 +10:00
Paul Mundt	a13c16e847	sh64: Wire up fallocate() syscall. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:37:51 +09:00
Stephen Rothwell	e679c5f445	Consolidate of_get_parent This requires creating dummy of_node_{get,put} routines for sparc and sparc64. It also adds a read_lock around the parent accesses. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:32:58 +10:00
Stephen Rothwell	581b605a83	Consolidate of_find_property The only change here is that a readlock is taken while the property list is being traversed on Sparc where it was not taken previously. Also, Sparc uses strcasecmp to compare property names while PowerPC uses strcmp. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:32:24 +10:00
Stephen Rothwell	0081cbc373	Consolidate of_device_is_compatible The only difference here is that Sparc uses strncmp to match compatibility names while PowerPC uses strncasecmp. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:29:51 +10:00
Stephen Rothwell	97e873e5c8	Start split out of common open firmware code This creates drivers/of/base.c (depending on CONFIG_OF) and puts the first trivially common bits from the prom.c files into it. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:28:41 +10:00
Paul Mundt	0c99adb0a6	sh: Wire up fallocate() syscall. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:27:09 +09:00
Magnus Damm	39c7aa9ea9	sh: intc - add support for 7780 This patch converts the cpu specific 7780 setup code to use the new intc controller. Many new vectors are added and also support for external interrupt sense configuration. So with this patch it is now possible to configure external interrupt pins as edge or level triggered using set_irq_type(). No external interrupts are registered by default. Use plat_irq_setup_pins() to select between IRQ or IRL mode. This patch also fixes the Alarm IRQ for the RTC. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:18:21 +09:00
Magnus Damm	90015c8938	sh: IPR/INTC2 IRQ setup consolidation. This patch unifies the cpu specific interrupt setup functions for interrupt controller blocks such as ipr, intc2 and intc. There is no point in having separate functions for each interrupt controller, so let's clean this up. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:18:20 +09:00
Magnus Damm	493a358e0a	sh: clean up interrupt code for solution engine 7722 board This patch cleans up solution engine 7722 specific interrupt code. The main purpose is to replace the mux function with use of set_irq_chained_handler() and replace hard coded register poking code with set_irq_type(). The board specific interrupts are also moved to start from SE7722_FPGA_IRQ_BASE. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:18:20 +09:00
Magnus Damm	1b06428ee5	sh: intc - add support for 7722 processor This patch converts the cpu specific 7722 setup code to use the new intc controller. Many new vectors are added and also support for external interrupt sense configuration. So with this patch it is now possible to configure external interrupt pins as edge or level triggered using set_irq_type(). Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:18:20 +09:00
Magnus Damm	02ab3f7079	sh: intc - shared IPR and INTC2 controller This is the second version of the shared interrupt controller patch for the sh architecture, fixing up handling of intc_reg_fns[]. The three main advantages with this controller over the existing ones are: - Both priority (ipr) and bitmap (intc2) registers are supported - External pin sense configuration is supported, ie edge vs level triggered - CPU/Board specific code maps 1:1 with datasheet for easy verification This controller can easily coexist with the current IPR and INTC2 controllers, but the idea is that CPUs/Boards should be moved over to this controller over time so we have a single code base to maintain. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 12:18:20 +09:00
Stephen Rothwell	76c1ce7870	Split out common parts of prom.h This creates linux/of.h and includes asm/prom.h from it. We also include linux/of.h from asm/prom.h while we transition. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net>	2007-07-20 13:10:22 +10:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Tony Luck	f4fbfb0dda	Pull vector-domain into release branch	2007-07-19 16:34:40 -07:00
Bartlomiej Zolnierkiewicz	4099d14322	ide: add PIO masks * Add ATA_PIO[0-6] defines to <linux/ata.h>. * Add ->pio_mask field to ide_pci_device_t and ide_hwif_t. * Add PIO masks to host drivers. <linux/ata.h> change ACK-ed by Jeff Garzik <jeff@garzik.org>. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:59 +02:00
Bartlomiej Zolnierkiewicz	6a824c92db	ide: remove ide_find_best_pio_mode() * Add ->host_flags to ide_hwif_t to store ide_pci_device_t.host_flags, assign it in setup-pci.c:ide_pci_setup_ports(). * Add IDE_HFLAG_PIO_NO_{BLACKLIST,DOWNGRADE} to ide_pci_device_t.host_flags and teach ide_get_best_pio_mode() about them. Also remove needless !drive->id check while at it (drive->id is always present). * Convert amd74xx, via82cxxx and ide-timing.h to use ide_get_best_pio_mode() and then remove no longer needed ide_find_best_pio_mode(). There should be no functionality changes caused by this patch. Acked-by: Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:58 +02:00
Bartlomiej Zolnierkiewicz	2134758d2a	ide: drop "PIO data" argument from ide_get_best_pio_mode() * Drop no longer needed "PIO data" argument from ide_get_best_pio_mode() and convert all users accordingly. * Remove no longer needed ide_pio_data_t. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:58 +02:00
Bartlomiej Zolnierkiewicz	7dd00083b1	ide: add ide_pio_cycle_time() helper (take 2) * Add ide_pio_cycle_time() helper. * Use it in ali14xx/ht6560b/qd65xx/cmd64{0,x}/sl82c105 and pmac host drivers (previously cycle time given by the device was only used for "pio" == 255). * Remove no longer needed ide_pio_data_t.cycle_time field. v2: * Fix "ata_" prefix (Noticed by Jeff). Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:56 +02:00
Bartlomiej Zolnierkiewicz	a5d8c5c834	ide: add ide_pci_device_t.host_flags (take 2) * Rename ide_pci_device_t.flags to ide_pci_device_t.host_flags and IDEPCI_FLAG_ISA_PORTS flag to IDE_HFLAG_ISA_PORTS. * Add IDE_HFLAG_SINGLE flag for single channel devices. * Convert core code and all IDE PCI drivers to use IDE_HFLAG_SINGLE and remove no longer needed ide_pci_device_t.channels field. v2: * Fix issues noticed by Sergei: - correct code alignment in scc_pata.c - s/IDE_HFLAG_SINGLE/~IDE_HFLAG_SINGLE/ in serverworks.c Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:55 +02:00
Bartlomiej Zolnierkiewicz	2229833c13	ide: add ide_dev_has_iordy() helper (take 4) * Add ide_dev_has_iordy() helper and use it sl82c105 host driver. * Remove no longer needed ide_pio_data_t.use_iordy field. v2/v3: * Fix issues noticed by Sergei: - correct patch description - fix comment in ide_get_best_pio_mode() v4: * Fix "ata_" prefix (Noticed by Jeff). Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:55 +02:00
Bartlomiej Zolnierkiewicz	342cdb6d47	ide: make ide_get_best_pio_mode() print info if overriding PIO mode * Print info about overriding PIO mode in ide_get_best_pio_mode(). * Remove info about overriding PIO mode from cmd64{0,x} host drivers. * Remove no longer needed ide_pio_data_t.overridden field. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-07-20 01:11:55 +02:00
Linus Torvalds	721e2629fa	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement	2007-07-19 14:42:40 -07:00
Linus Torvalds	fdb64f93b3	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Fix inode size update before data write in xfs_setattr [XFS] Allow punching holes to free space when at ENOSPC [XFS] Implement ->page_mkwrite in XFS. [FS] Implement block_page_mkwrite. Manually fix up conflict with Nick's VM fault handling patches in fs/xfs/linux-2.6/xfs_file.c Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 14:41:33 -07:00
Avi Kivity	2d9ce177e6	i386: Allow KVM on i386 nonpae Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of the cmpxchg64b feature and enables compilation of the set_64bit() family. Since the option is dependent on PAE, and since KVM depends on set_64bit(), this effectively disables KVM on i386 nopae. Simplify by removing the config option altogether: the boot check is made dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed without constraints. It is up to users to check for the feature flag (KVM does not as virtualiation extensions imply its existence). Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 14:37:05 -07:00
Linus Torvalds	3e1f900bff	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: NFSv4: handle lack of clientaddr in option string NFSv4: debug print ntohl(status) in nfs client callback xdr code SUNRPC: Clean up the sillyrename code NFS: Introduce struct nfs_removeargs+nfs_removeres NFS: Use dentry->d_time to store the parent directory verifier. SUNRPC: move bkl locking and xdr proc invocation into a common helper NFSv4: Fix the nfsv4 readlink reply buffer alignment NFSv4: Fix the readdir reply buffer alignment NFSv4: More NFSv4 xdr cleanups NFSv4: Try to recover from getfh failures in nfs4_xdr_dec_open NFSv4: 'constify' lookup arguments. NFSv4: Don't fail nfs4_xdr_dec_open if decode_restorefh() failed NFSv4: Fix open state recovery NFSD/SUNRPC: Fix the automatic selection of RPCSEC_GSS	2007-07-19 14:33:41 -07:00
Linus Torvalds	40b42f1ebf	Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6 * 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (44 commits) i2c: Delete the i2c-isa pseudo bus driver hwmon: refuse to load abituguru driver on non-Abit boards hwmon: fix Abit Uguru3 driver detection on some motherboards hwmon/w83627ehf: Be quiet when no chip is found hwmon/w83627ehf: No need to initialize fan_min hwmon/w83627ehf: Export the thermal sensor types hwmon/w83627ehf: Enable VBAT monitoring hwmon/w83627ehf: Add support for the VID inputs hwmon/w83627ehf: Fix timing issues hwmon/w83627ehf: Add error messages for two error cases hwmon/w83627ehf: Convert to a platform driver hwmon/w83627ehf: Update the Kconfig entry make coretemp_device_remove() static hwmon: Add LM93 support hwmon: Improve the pwmN_enable documentation hwmon/smsc47b397: Don't report missing fans as spinning at 82 RPM hwmon: Add support for newer uGuru's hwmon/f71805f: Add temperature-tracking fan control mode hwmon/w83627ehf: Preserve speed reading when changing fan min hwmon: fix detection of abituguru volt inputs ... Manual fixup of trivial conflict in MAINTAINERS file	2007-07-19 14:24:57 -07:00
Linus Torvalds	ff86303e30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: [PATCH] sched: implement cpu_clock(cpu) high-speed time source [PATCH] sched: fix the all pinned logic in load_balance_newidle() [PATCH] sched: fix newly idle load balance in case of SMT [PATCH] sched: sched_cacheflush is now unused	2007-07-19 14:11:14 -07:00
Serge E. Hallyn	626ac545c1	user namespace: fix copy_user_ns return value When a CONFIG_USER_NS=n and a user tries to unshare some namespace other than the user namespace, the dummy copy_user_ns returns NULL rather than the old_ns. This value then gets assigned to task->nsproxy->user_ns, so that a subsequent setuid, which uses task->nsproxy->user_ns, causes a NULL pointer deref. Fix this by returning old_ns. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 14:05:08 -07:00
David Chinner	3d7559e677	[IA64] fallocate system call sys_fallocate for ia64. This uses an empty slot #1303 erroneously marked as reserved for move_pages (which had already been allocated as syscall #1276) Signed-Off-By: Dave Chinner <dgc@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2007-07-19 13:48:00 -07:00
Ingo Molnar	e436d80085	[PATCH] sched: implement cpu_clock(cpu) high-speed time source Implement the cpu_clock(cpu) interface for kernel-internal use: high-speed (but slightly incorrect) per-cpu clock constructed from sched_clock(). This API, unused at the moment, will be used in the future by blktrace, by the softlockup-watchdog, by printk and by lockstat. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-19 21:28:35 +02:00
Ralf Baechle	c41917df8a	[PATCH] sched: sched_cacheflush is now unused Since Ingo's recent scheduler rewrite which was merged as commit `0437e109e1` sched_cacheflush is unused. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-19 21:28:35 +02:00
Trond Myklebust	e4eff1a622	SUNRPC: Clean up the sillyrename code Fix a couple of bugs: - Don't rely on the parent dentry still being valid when the call completes. Fixes a race with shrink_dcache_for_umount_subtree() - Don't remove the file if the filehandle has been labelled as stale. Fix a couple of inefficiencies - Remove the global list of sillyrenamed files. Instead we can cache the sillyrename information in the dentry->d_fsdata - Move common code from unlink_setup/unlink_done into fs/nfs/unlink.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	4fdc17b2a7	NFS: Introduce struct nfs_removeargs+nfs_removeres We need a common structure for setting up an unlink() rpc call in order to fix the asynchronous unlink code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
J. Bruce Fields	be879c4e24	SUNRPC: move bkl locking and xdr proc invocation into a common helper Since every invocation of xdr encode or decode functions takes the BKL now, there's a lot of redundant lock_kernel/unlock_kernel pairs that we can pull out into a common function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Jean Delvare	e24b8cb4fa	i2c: Delete the i2c-isa pseudo bus driver There are no users of i2c-isa left, so we can finally get rid of it. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-07-19 14:25:20 -04:00
Linus Torvalds	ce8c2293be	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits) [TG3]: Fix msi issue with kexec/kdump. [NET] XFRM: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. [NET] ROSE: Fix whitespace errors. [NET] RFKILL: Fix whitespace errors. [NET] PACKET: Fix whitespace errors. [NET] NETROM: Fix whitespace errors. [NET] NETFILTER: Fix whitespace errors. [NET] IPV4: Fix whitespace errors. [NET] DCCP: Fix whitespace errors. [NET] CORE: Fix whitespace errors. [NET] BLUETOOTH: Fix whitespace errors. [NET] AX25: Fix whitespace errors. [PATCH] mac80211: remove rtnl locking in ieee80211_sta.c [PATCH] mac80211: fix GCC warning on 64bit platforms [GENETLINK]: Dynamic multicast groups. [NETLIKN]: Allow removing multicast groups. ...	2007-07-19 10:23:21 -07:00
Kristian Høgsberg	b02b6bc465	[SCSI] Make scsi_host_template::proc_name const char * instead of char . Signed-off-by: Kristian Høgsberg <krh@redhat.com> collapsed with fw-sbp2 patch "Drop cast to non-const char in host template initialization." from Kristian HÃ¸gsberg Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-19 12:06:26 -05:00
Douglas Thompson	53078ca84b	include/linux/pci_id.h: add amd northbridge defines pci_ids.h needs two of the AMD NB device-ids namely, Addressmap and the Memory Controller devices This patch adds those to the pci_id.h include file Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:55 -07:00
Dave Jiang	66ee2f940a	drivers/edac: mod assert_error check Change error check and clear variable from an atomic to an int Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:54 -07:00
Jason Uhlenkott	535c6a5303	drivers/edac: new inte 30x0 MC driver Here's a driver for the Intel 3000 and 3010 memory controllers, relative to today's Sourceforge code drop. This has only had light testing (I've yet to actually see it handle a memory error) but it detects my hardware correctly. Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:54 -07:00
Dave Jiang	c0d1217202	drivers/edac: add new nmi rescan Provides a way for NMI reported errors on x86 to notify the EDAC subsystem pending ECC errors by writing to a software state variable. Here's the reworked patch. I added an EDAC stub to the kernel so we can have variables that are in the kernel even if EDAC is a module. I also implemented the idea of using the chip driver to select error detection mode via module parameter and eliminate the kernel compile option. Please review/test. Thx! Also, I only made changes to some of the chipset drivers since I am unfamiliar with the other ones. We can add similar changes as we go. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:53 -07:00
Rusty Russell	d7e28ffe6c	lguest: the host code This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Rusty Russell	07ad157f6e	lguest: the guest code lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need VT/SVM hardware. Unlike Xen it's simply "modprobe and go". Unlike both, it's 5000 lines and self-contained. Performance is ok, but not great (-30% on kernel compile). But given its hackability, I expect this to improve, along with the paravirt_ops code which it supplies a complete example for. There's also a 64-bit version being worked on and other craziness. But most of all, lguest is awesome fun! Too much of the kernel is a big ball of hair. lguest is simple enough to dive into and hack, plus has some warts which scream "fork me!". This patch: This is the code and headers required to make an i386 kernel an lguest guest. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	c7d51402d2	knfsd: clean up EX_RDONLY Share a little common code, reverse the arguments for consistency, drop the unnecessary "inline", and lowercase the name. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	e22841c637	knfsd: move EX_RDONLY out of header EX_RDONLY is only called in one place; just put it there. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Andrew Morton	d688abf50b	move page writeback acounting out of macros page-writeback accounting is presently performed in the page-flags macros. This is inconsistent and a bit ugly and makes it awkward to implement per-backing_dev under-writeback page accounting. So move this accounting down to the callsite(s). Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Greg Ungerer	10146801e8	m68knommu: remove is_in_rom() function Remove is_in_rom() function. It doesn't actually serve the purpose it was intended to. If you look at the use of it _access_ok() (which is the only use of it) then it is obvious that most of memory is marked as access_ok. No point having is_in_rom() then, so remove it. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:51 -07:00
Greg Ungerer	2502b667ea	m68knommu: generic irq handling Change the m68knommu irq handling to use the generic irq framework. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:50 -07:00
Johannes Berg	3b5ad0797c	stacktrace: fix header file for !CONFIG_STACKTRACE The print_stack_trace macro in stacktrace.h has a wrong number of arguments, fix it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Peter Zijlstra	96645678cd	lockstat: measure lock bouncing __acquire \| lock _____ \| \ \| __contended \| \| \| wait \| _______/ \|/ \| __acquired \| __release \| unlock We measure acquisition and contention bouncing. This is done by recording a cpu stamp in each lock instance. Contention bouncing requires the cpu stamp to be set on acquisition. Hence we move __acquired into the generic path. __acquired is then used to measure acquisition bouncing by comparing the current cpu with the old stamp before replacing it. __contended is used to measure contention bouncing (only useful for preemptable locks) [akpm@linux-foundation.org: cleanups] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Peter Zijlstra	4b32d0a4e9	lockdep: various fixes - update the copyright notices - use the default hash function - fix a thinko in a BUILD_BUG_ON - add a WARN_ON to spot inconsitent naming - fix a termination issue in /proc/lock_stat [akpm@linux-foundation.org: cleanups] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Peter Zijlstra	f20786ff4d	lockstat: core infrastructure Introduce the core lock statistics code. Lock statistics provides lock wait-time and hold-time (as well as the count of corresponding contention and acquisitions events). Also, the first few call-sites that encounter contention are tracked. Lock wait-time is the time spent waiting on the lock. This provides insight into the locking scheme, that is, a heavily contended lock is indicative of a too coarse locking scheme. Lock hold-time is the duration the lock was held, this provides a reference for the wait-time numbers, so they can be put into perspective. 1) lock 2) ... do stuff .. unlock 3) The time between 1 and 2 is the wait-time. The time between 2 and 3 is the hold-time. The lockdep held-lock tracking code is reused, because it already collects locks into meaningful groups (classes), and because it is an existing infrastructure for lock instrumentation. Currently lockdep tracks lock acquisition with two hooks: lock() lock_acquire() _lock() ... code protected by lock ... unlock() lock_release() _unlock() We need to extend this with two more hooks, in order to measure contention. lock_contended() - used to measure contention events lock_acquired() - completion of the contention These are then placed the following way: lock() lock_acquire() if (!_try_lock()) lock_contended() _lock() lock_acquired() ... do locked stuff ... unlock() lock_release() _unlock() (Note: the try_lock() 'trick' is used to avoid instrumenting all platform dependent lock primitive implementations.) It is also possible to toggle the two lockdep features at runtime using: /proc/sys/kernel/prove_locking /proc/sys/kernel/lock_stat (esp. turning off the O(n^2) prove_locking functionaliy can help) [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: nuke unneeded ifdefs] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Peter Zijlstra	21f8ca3bf6	fix raw_spinlock_t vs lockdep Use the lockdep infrastructure to track lock contention and other lock statistics. It tracks lock contention events, and the first four unique call-sites that encountered contention. It also measures lock wait-time and hold-time in nanoseconds. The minimum and maximum times are tracked, as well as a total (which together with the number of event can give the avg). All statistics are done per lock class, per write (exclusive state) and per read (shared state). The statistics are collected per-cpu, so that the collection overhead is minimized via having no global cachemisses. This new lock statistics feature is independent of the lock dependency checking traditionally done by lockdep; it just shares the lock tracking code. It is also possible to enable both and runtime disabled either component - thereby avoiding the O(n^2) lock chain walks for instance. This patch: raw_spinlock_t should not use lockdep (and doesn't) since lockdep itself relies on it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Jan Harkes	3cf01f28c3	coda: remove statistics counters from /proc/fs/coda Similar information can easily be obtained with strace -c. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	a1b0aa8764	coda: remove struct coda_sb_info The sb_info structure only contains a single pointer to the character device, there is no need for the added indirection. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	d9664c95af	coda: block signals during upcall processing We ignore signals for about 30 seconds to give userspace a chance to see the upcall. As we did not block signals we ended up in a busy loop for the remainder of the period when a signal is received. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Roland McGrath	cbe87121f1	i386: Put allocated ELF notes in read-only data segment This changes the i386 linker script and the asm-generic macro it uses so that ELF note sections with SHF_ALLOC set are linked into the kernel image along with other read-only data. The PT_NOTE also points to their location. This paves the way for putting useful build-time information into ELF notes that can be found easily later in a kernel memory dump. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	3cb4a0bb1e	coredump masking: add an interface for core dump filter This patch adds an interface to set/reset flags which determines each memory segment should be dumped or not when a core file is generated. /proc/<pid>/coredump_filter file is provided to access the flags. You can change the flag status for a particular process by writing to or reading from the file. The flag status is inherited to the child process when it is created. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	6c5d523826	coredump masking: reimplementation of dumpable using two flags This patch changes mm_struct.dumpable to a pair of bit flags. set_dumpable() converts three-value dumpable to two flags and stores it into lower two bits of mm_struct.flags instead of mm_struct.dumpable. get_dumpable() behaves in the opposite way. [akpm@linux-foundation.org: export set_dumpable] Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:46 -07:00
Josef 'Jeff' Sipek	f79c20f525	fs: remove path_walk export Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	c4a7808fc3	fs: mark link_path_walk static Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	16f1820028	fs: introduce vfs_path_lookup Stackable file systems, among others, frequently need to lookup paths or path components starting from an arbitrary point in the namespace (identified by a dentry and a vfsmount). Currently, such file systems use lookup_one_len, which is frowned upon [1] as it does not pass the lookup intent along; not passing a lookup intent, for example, can trigger BUG_ON's when stacking on top of NFSv4. The first patch introduces a new lookup function to allow lookup starting from an arbitrary point in the namespace. This approach has been suggested by Christoph Hellwig [2]. The second patch changes sunrpc to use vfs_path_lookup. The third patch changes nfsctl.c to use vfs_path_lookup. The fourth patch marks link_path_walk static. The fifth, and last patch, unexports path_walk because it is no longer unnecessary to call it directly, and using the new vfs_path_lookup is cleaner. For example, the following snippet of code, looks up "some/path/component" in a directory pointed to by parent_{dentry,vfsmnt}: err = vfs_path_lookup(parent_dentry, parent_vfsmnt, "some/path/component", 0, &nd); if (!err) { /* exits / ... / once done, release the references / path_release(&nd); } else if (err == -ENOENT) { / doesn't exist / } else { / other error */ } VFS functions such as lookup_create can be used on the nameidata structure to pass the create intent to the file system. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Ollie Wild	b6a2fea393	mm: variable length argument support Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from the old mm into the new mm. We create the new mm before the binfmt code runs, and place the new stack at the very top of the address space. Once the binfmt code runs and figures out where the stack should be, we move it downwards. It is a bit peculiar in that we have one task with two mm's, one of which is inactive. [a.p.zijlstra@chello.nl: limit stack size] Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Hugh Dickins <hugh@veritas.com> [bunk@stusta.de: unexport bprm_mm_init] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Peter Zijlstra	bdf4c48af2	audit: rework execve audit The purpose of audit_bprm() is to log the argv array to a userspace daemon at the end of the execve system call. Since user-space hasn't had time to run, this array is still in pristine state on the process' stack; so no need to copy it, we can just grab it from there. In order to minimize the damage to audit_log_*() copy each string into a temporary kernel buffer first. Currently the audit code requires that the full argument vector fits in a single packet. So currently it does clip the argv size to a (sysctl) limit, but only when execve auditing is enabled. If the audit protocol gets extended to allow for multiple packets this check can be removed. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> Cc: <linux-audit@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Peter Zijlstra	b111757c50	arch: personality independent stack top New arch macro STACK_TOP_MAX it gives the larges valid stack address for the architecture in question. It differs from STACK_TOP in that it will not distinguish between personalities but will always return the largest possible address. This is used to create the initial stack on execve, which we will move down to the proper location once the binfmt code has figured out where that is. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Fenghua Yu	5fb7dc37dc	define new percpu interface for shared data per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Michael Ellerman	3d7e33825d	jprobes: make jprobes a little safer for users I realise jprobes are a razor-blades-included type of interface, but that doesn't mean we can't try and make them safer to use. This guy I know once wrote code like this: struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" }; And then his kernel exploded. Oops. This patch adds an arch hook, arch_deref_entry_point() (I don't like it either) which takes the void * in a struct jprobe, and gives back the text address that it represents. We can then use that in register_jprobe() to check that the entry point we're passed is actually in the kernel text, rather than just some random value. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Michael Ellerman	9e367d8592	jprobes: remove JPROBE_ENTRY() AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything useful - so remove it .. I've left a do-nothing version so that out-of-tree jprobes code will still compile without modifications. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Michael Ellerman	81eae375ec	jprobes: make struct jprobe.entry a void * Currently jprobe.entry is a kprobe_opcode_t , but that's a lie. On some platforms it doesn't point to an opcode at all, it points to a function descriptor. It's really a pointer to something that the arch code can turn into a function entry point. And that's what actually happens, none of the generic code ever looks at jprobe.entry, it's only ever dereferenced by arch code. So just make it a void . Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	f9acc8c7b3	readahead: sanify file_ra_state names Rename some file_ra_state variables and remove some accessors. It results in much simpler code. Kudos to Rusty! Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Rusty Russell	cf914a7d65	readahead: split ondemand readahead interface into two functions Split ondemand readahead interface into two functions. I think this makes it a little clearer for non-readahead experts (like Rusty). Internally they both call ondemand_readahead(), but the page argument is changed to an obvious boolean flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	fe3cba17c4	mm: share PG_readahead and PG_reclaim Share the same page flag bit for PG_readahead and PG_reclaim. One is used only on file reads, another is only for emergency writes. One is used mostly for fresh/young pages, another is for old pages. Combinations of possible interactions are: a) clear PG_reclaim => implicit clear of PG_readahead it will delay an asynchronous readahead into a synchronous one it actually does _good_ for readahead: the pages will be reclaimed soon, it's readahead thrashing! in this case, synchronous readahead makes more sense. b) clear PG_readahead => implicit clear of PG_reclaim one(and only one) page will not be reclaimed in time it can be avoided by checking PageWriteback(page) in readahead first c) set PG_reclaim => implicit set of PG_readahead will confuse readahead and make it restart the size rampup process it's a trivial problem, and can mostly be avoided by checking PageWriteback(page) first in readahead d) set PG_readahead => implicit set of PG_reclaim PG_readahead will never be set on already cached pages. PG_reclaim will always be cleared on dirtying a page. so not a problem. In summary, a) we get better behavior b,d) possible interactions can be avoided c) racy condition exists that might affect readahead, but the chance is _really_ low, and the hurt on readahead is trivial. Compound pages also use PG_reclaim, but for now they do not interact with reclaim/readahead code. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	c743d96b6d	readahead: remove the old algorithm Remove the old readahead algorithm. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	122a21d11c	readahead: on-demand readahead logic This is a minimal readahead algorithm that aims to replace the current one. It is more flexible and reliable, while maintaining almost the same behavior and performance. Also it is full integrated with adaptive readahead. It is designed to be called on demand: - on a missing page, to do synchronous readahead - on a lookahead page, to do asynchronous readahead In this way it eliminated the awkward workarounds for cache hit/miss, readahead thrashing, retried read, and unaligned read. It also adopts the data structure introduced by adaptive readahead, parameterizes readahead pipelining with `lookahead_index', and reduces the current/ahead windows to one single window. HEURISTICS The logic deals with four cases: - sequential-next found a consistent readahead window, so push it forward - random standalone small read, so read as is - sequential-first create a new readahead window for a sequential/oversize request - lookahead-clueless hit a lookahead page not associated with the readahead window, so create a new readahead window and ramp it up In each case, three parameters are determined: - readahead index: where the next readahead begins - readahead size: how much to readahead - lookahead size: when to do the next readahead (for pipelining) BEHAVIORS The old behaviors are maximally preserved for trivial sequential/random reads. Notable changes are: - It no longer imposes strict sequential checks. It might help some interleaved cases, and clustered random reads. It does introduce risks of a random lookahead hit triggering an unexpected readahead. But in general it is more likely to do good than to do evil. - Interleaved reads are supported in a minimal way. Their chances of being detected and proper handled are still low. - Readahead thrashings are better handled. The current readahead leads to tiny average I/O sizes, because it never turn back for the thrashed pages. They have to be fault in by do_generic_mapping_read() one by one. Whereas the on-demand readahead will redo readahead for them. OVERHEADS The new code reduced the overheads of - excessively calling the readahead routine on small sized reads (the current readahead code insists on seeing all requests) - doing a lot of pointless page-cache lookups for small cached files (the current readahead only turns itself off after 256 cache hits, unfortunately most files are < 1MB, so never see that chance) That accounts for speedup of - 0.3% on 1-page sequential reads on sparse file - 1.2% on 1-page cache hot sequential reads - 3.2% on 256-page cache hot sequential reads - 1.3% on cache hot `tar /lib` However, it does introduce one extra page-cache lookup per cache miss, which impacts random reads slightly. That's 1% overheads for 1-page random reads on sparse file. PERFORMANCE The basic benchmark setup is - 2.6.20 kernel with on-demand readahead - 1MB max readahead size - 2.9GHz Intel Core 2 CPU - 2GB memory - 160G/8M Hitachi SATA II 7200 RPM disk The benchmarks show that - it maintains the same performance for trivial sequential/random reads - sysbench/OLTP performance on MySQL gains up to 8% - performance on readahead thrashing gains up to 3 times iozone throughput (KB/s): roughly the same ========================================== iozone -c -t1 -s 4096m -r 64k 2.6.20 on-demand gain first run " Initial write " 61437.27 64521.53 +5.0% " Rewrite " 47893.02 48335.20 +0.9% " Read " 62111.84 62141.49 +0.0% " Re-read " 62242.66 62193.17 -0.1% " Reverse Read " 50031.46 49989.79 -0.1% " Stride read " 8657.61 8652.81 -0.1% " Random read " 13914.28 13898.23 -0.1% " Mixed workload " 19069.27 19033.32 -0.2% " Random write " 14849.80 14104.38 -5.0% " Pwrite " 62955.30 65701.57 +4.4% " Pread " 62209.99 62256.26 +0.1% second run " Initial write " 60810.31 66258.69 +9.0% " Rewrite " 49373.89 57833.66 +17.1% " Read " 62059.39 62251.28 +0.3% " Re-read " 62264.32 62256.82 -0.0% " Reverse Read " 49970.96 50565.72 +1.2% " Stride read " 8654.81 8638.45 -0.2% " Random read " 13901.44 13949.91 +0.3% " Mixed workload " 19041.32 19092.04 +0.3% " Random write " 14019.99 14161.72 +1.0% " Pwrite " 64121.67 68224.17 +6.4% " Pread " 62225.08 62274.28 +0.1% In summary, writes are unstable, reads are pretty close on average: access pattern 2.6.20 on-demand gain Read 62085.61 62196.38 +0.2% Re-read 62253.49 62224.99 -0.0% Reverse Read 50001.21 50277.75 +0.6% Stride read 8656.21 8645.63 -0.1% Random read 13907.86 13924.07 +0.1% Mixed workload 19055.29 19062.68 +0.0% Pread 62217.53 62265.27 +0.1% aio-stress: roughly the same ============================ aio-stress -l -s4096 -r128 -t1 -o1 knoppix511-dvd-cn.iso aio-stress -l -s4096 -r128 -t1 -o3 knoppix511-dvd-cn.iso 2.6.20 on-demand delta sequential 92.57s 92.54s -0.0% random 311.87s 312.15s +0.1% sysbench fileio: roughly the same ================================= sysbench --test=fileio --file-io-mode=async --file-test-mode=rndrw \ --file-total-size=4G --file-block-size=64K \ --num-threads=001 --max-requests=10000 --max-time=900 run threads 2.6.20 on-demand delta first run 1 59.1974s 59.2262s +0.0% 2 58.0575s 58.2269s +0.3% 4 48.0545s 47.1164s -2.0% 8 41.0684s 41.2229s +0.4% 16 35.8817s 36.4448s +1.6% 32 32.6614s 32.8240s +0.5% 64 23.7601s 24.1481s +1.6% 128 24.3719s 23.8225s -2.3% 256 23.2366s 22.0488s -5.1% second run 1 59.6720s 59.5671s -0.2% 8 41.5158s 41.9541s +1.1% 64 25.0200s 23.9634s -4.2% 256 22.5491s 20.9486s -7.1% Note that the numbers are not very stable because of the writes. The overall performance is close when we sum all seconds up: sum all up 495.046s 491.514s -0.7% sysbench oltp (trans/sec): up to 8% gain ======================================== sysbench --test=oltp --oltp-table-size=10000000 --oltp-read-only \ --mysql-socket=/var/run/mysqld/mysqld.sock \ --mysql-user=root --mysql-password=readahead \ --num-threads=064 --max-requests=10000 --max-time=900 run 10000-transactions run threads 2.6.20 on-demand gain 1 62.81 64.56 +2.8% 2 67.97 70.93 +4.4% 4 81.81 85.87 +5.0% 8 94.60 97.89 +3.5% 16 99.07 104.68 +5.7% 32 95.93 104.28 +8.7% 64 96.48 103.68 +7.5% 5000-transactions run 1 48.21 48.65 +0.9% 8 68.60 70.19 +2.3% 64 70.57 74.72 +5.9% 2000-transactions run 1 37.57 38.04 +1.3% 2 38.43 38.99 +1.5% 4 45.39 46.45 +2.3% 8 51.64 52.36 +1.4% 16 54.39 55.18 +1.5% 32 52.13 54.49 +4.5% 64 54.13 54.61 +0.9% That's interesting results. Some investigations show that - MySQL is accessing the db file non-uniformly: some parts are more hot than others - It is mostly doing 4-page random reads, and sometimes doing two reads in a row, the latter one triggers a 16-page readahead. - The on-demand readahead leaves many lookahead pages (flagged PG_readahead) there. Many of them will be hit, and trigger more readahead pages. Which might save more seeks. - Naturally, the readahead windows tend to lie in hot areas, and the lookahead pages in hot areas is more likely to be hit. - The more overall read density, the more possible gain. That also explains the adaptive readahead tricks for clustered random reads. readahead thrashing: 3 times better =================================== We boot kernel with "mem=128m single", and start a 100KB/s stream on every second, until reaching 200 streams. max throughput min avg I/O size 2.6.20: 5MB/s 16KB on-demand: 15MB/s 140KB Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	5ce1110b92	readahead: data structure and routines Extend struct file_ra_state to support the on-demand readahead logic. Also define some helpers for it. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	d77c2d7cc5	readahead: introduce PG_readahead Introduce a new page flag: PG_readahead. It acts as a look-ahead mark, which tells the page reader: Hey, it's time to invoke the read-ahead logic. For the sake of I/O pipelining, don't wait until it runs out of cached pages! Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
David Brownell	2ba2d00363	AIO sparse fix (type of ki_flags) Fix type issue reported by latest 'sparse': kiocb.ki_flags should be "unsigned long" (not "long"), to match bitop type signature. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Akinobu Mita	e53252d97e	unregister_chrdev() return void unregister_chrdev() does not return meaningful value. This patch makes it return void like most unregister_* functions. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Pavel Machek	77afcf78a2	PM: Integrate beeping flag with existing acpi_sleep flags Move "debug during resume from s2ram" into the variable we already use for real-mode flags to simplify code. It also closes nasty trap for the user in acpi_sleep_setup; order of parameters actually mattered there, acpi_sleep=s3_bios,s3_mode doing something different from acpi_sleep=s3_mode,s3_bios. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Nigel Cunningham	5a60d6235c	PM: Optional beeping during resume from suspend to RAM Add a feature allowing the user to make the system beep during a resume from suspend to RAM, on x86_64 and i386. This is useful for the users with broken resume from RAM, so that they can verify if the control reaches the kernel after a wake-up event. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Rafael J. Wysocki	bd804eba1c	PM: Introduce pm_power_off_prepare Introduce the pm_power_off_prepare() callback that can be registered by the interested platforms in analogy with pm_idle() and pm_power_off(), used for preparing the system to power off (needed by ACPI). This allows us to drop acpi_sysclass and device_acpi that are only defined in order to register the ACPI power off preparation callback, which is needed by pm_power_off() registered in a much different way. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:42 -07:00
Rafael J. Wysocki	b10d911749	PM: introduce hibernation and suspend notifiers Make it possible to register hibernation and suspend notifiers, so that subsystems can perform hibernation-related or suspend-related operations that should not be carried out by device drivers' .suspend() and .resume() routines. [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:42 -07:00
Rafael J. Wysocki	0c1eecfb34	Freezer: avoid freezing kernel threads prematurely Kernel threads should not have TIF_FREEZE set when user space processes are being frozen, since otherwise some of them might be frozen prematurely. To prevent this from happening we can (1) make exit_mm() unset TIF_FREEZE unconditionally just after clearing tsk->mm and (2) make try_to_freeze_tasks() check if p->mm is different from zero and PF_BORROWED_MM is unset in p->flags when user space processes are to be frozen. Namely, when user space processes are being frozen, we only should set TIF_FREEZE for tasks that have p->mm different from NULL and don't have PF_BORROWED_MM set in p->flags. For this reason task_lock() must be used to prevent try_to_freeze_tasks() from racing with use_mm()/unuse_mm(), in which p->mm and p->flags.PF_BORROWED_MM are changed under task_lock(p). Also, we need to prevent the following scenario from happening: * daemonize() is called by a task spawned from a user space code path * freezer checks if the task has p->mm set and the result is positive * task enters exit_mm() and clears its TIF_FREEZE * freezer sets TIF_FREEZE for the task * task calls try_to_freeze() and goes to the refrigerator, which is wrong at that point This requires us to acquire task_lock(p) before p->flags.PF_BORROWED_MM and p->mm are examined and release it after TIF_FREEZE is set for p (or it turns out that TIF_FREEZE should not be set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:42 -07:00
Rafael J. Wysocki	a634cc1016	swsusp: introduce restore platform operations At least on some machines it is necessary to prepare the ACPI firmware for the restoration of the system memory state from the hibernation image if the "platform" mode of hibernation has been used. Namely, in that cases we need to disable the GPEs before replacing the "boot" kernel with the "frozen" kernel (cf. http://bugzilla.kernel.org/show_bug.cgi?id=7887). After the restore they will be re-enabled by hibernation_ops->finish(), but if the restore fails, they have to be re-enabled by the restore code explicitly. For this purpose we can introduce two additional hibernation operations, called pre_restore() and restore_cleanup() and call them from the restore code path. Still, they should be called if the "platform" mode of hibernation has been used, so we need to pass the information about the hibernation mode from the "frozen" kernel to the "boot" kernel in the image header. Apparently, we can't drop the disabling of GPEs before the restore because of Bug #7887 . We also can't do it unconditionally, because the GPEs wouldn't have been enabled after a successful restore if the suspend had been done in the 'shutdown' or 'reboot' mode. In principle we could (and probably should) unconditionally disable the GPEs before each snapshot creation and before the restore, but then we'd have to unconditionally enable them after the snapshot creation as well as after the restore (or restore failure) Still, for this purpose we'd need to modify acpi_enter_sleep_state_prep() and acpi_leave_sleep_state() and we'd have to introduce some mechanism synchronizing the disablind/enabling of the GPEs with the device drivers' .suspend()/.resume() routines and with disable_/enable_nonboot_cpus(). However, this would have affected the suspend (ie. s2ram) code as well as the hibernation, which I'd like to avoid in this patch series. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:42 -07:00
Mel Gorman	bb2d5ce164	Remove alloc_zeroed_user_highpage() alloc_zeroed_user_highpage() has no in-tree users and it is not exported. As it is not exported, it can simply be removed. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	83c54070ee	mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	d0217ac04c	mm: fault feedback #1 Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	54cb8821de	mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	d00806b183	mm: fix fault vs invalidate race for linear mappings Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Paul Moore	23bcdc1ade	SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement Create a new NetLabel KAPI interface, netlbl_enabled(), which reports on the current runtime status of NetLabel based on the existing configuration. LSMs that make use of NetLabel, i.e. SELinux, can use this new function to determine if they should perform NetLabel access checks. This patch changes the NetLabel/SELinux glue code such that SELinux only enforces NetLabel related access checks when netlbl_enabled() returns true. At present NetLabel is considered to be enabled when there is at least one labeled protocol configuration present. The result is that by default NetLabel is considered to be disabled, however, as soon as an administrator configured a CIPSO DOI definition NetLabel is enabled and SELinux starts enforcing NetLabel related access controls - including unlabeled packet controls. This patch also tries to consolidate the multiple "#ifdef CONFIG_NETLABEL" blocks into a single block to ease future review as recommended by Linus. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2007-07-19 10:21:11 -04:00
David Chinner	5417169026	[FS] Implement block_page_mkwrite. Many filesystems need a ->page-mkwrite callout to correctly set up pages that have been written to by mmap. This is especially important when mmap is writing into holes as it allows filesystems to correctly account for and allocate space before the mmap write is allowed to proceed. Protection against truncate races is provided by locking the page and checking to see whether the page mapping is correct and whether it is beyond EOF so we don't end up allowing allocations beyond the current EOF or changing EOF as a result of a mmap write. SGI-PV: 940392 SGI-Modid: 2.6.x-xfs-melb:linux:29146a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:50:50 +10:00
Linus Torvalds	ce524c8360	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: eHEA: Fix bonding support Blackfin ethernet driver: on chip ethernet MAC controller driver fix wrong argument of tc35815_read_plat_dev_addr() ARM/ETHER3: Handle multicast frames. SAA9730: Handle multicast frames. NI5010: Handle multicast frames. NS83820: Handle multicast frames. Fix RGMII-ID handling in gianfar Fix Vitesse RGMII-ID support Add phy-connection-type to gianfar nodes Fix Vitesse 824x PHY interrupt acking [PATCH] zd1211rw: Add ID for Siemens Gigaset USB Stick 54 [PATCH] zd1211rw: Add ID for Planex GW-US54GXS [PATCH] Update version ipw2200 stamp to 1.2.2 [PATCH] ipw2200: Fix ipw_isr() comments error on shared IRQ [PATCH] Fix ipw2200 set wrong power parameter causing firmware error [PATCH] ipw2100: Fix `iwpriv set_power` error [PATCH] softmac: Channel is listed twice in scan output	2007-07-18 18:33:45 -07:00
Linus Torvalds	29e7ee378e	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: cosmetic clean up on node creation failure paths sysfs: kill an extra put in sysfs_create_link() failure path Driver core: check return code of sysfs_create_link() HOWTO: Add the knwon_regression URI to the documentation dev_vdbg() documentation dev_vdbg(), available with -DVERBOSE_DEBUG sysfs: make sysfs_init_inode() static sysfs: fix sysfs root inode nlink accounting Documentation fix devres.txt: lib/iomap.c -> lib/devres.c sysfs: avoid kmem_cache_free(NULL) PM: remove deprecated dpm_runtime_* routines PM: Remove deprecated sysfs files Driver core: accept all valid action-strings in uevent-trigger debugfs: remove rmdir() non-empty complaint	2007-07-18 18:28:08 -07:00
Linus Torvalds	fc15bc817e	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6: UIO: Hilscher CIF card driver UIO: Documentation UIO: Add the User IO core code	2007-07-18 18:27:50 -07:00
Linus Torvalds	a8dcf12f9e	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: locks: fix vfs_test_lock() comment locks: make posix_test_lock() interface more consistent nfs: disable leases over NFS gfs2: stop giving out non-cluster-coherent leases locks: export setlease to filesystems locks: provide a file lease method enabling cluster-coherent leases locks: rename lease functions to reflect locks.c conventions locks: share more common lease code locks: clean up lease_alloc() locks: convert an -EINVAL return to a BUG leases: minor break_lease() comment clarification	2007-07-18 18:27:00 -07:00
J. Bruce Fields	6d34ac199a	locks: make posix_test_lock() interface more consistent Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflict lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	4698afe8e3	locks: export setlease to filesystems Export setlease so it can used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:06 -04:00
J. Bruce Fields	f9ffed26d6	locks: provide a file lease method enabling cluster-coherent leases Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:14:47 -04:00
J. Bruce Fields	a9933cea7a	locks: rename lease functions to reflect locks.c conventions We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:14:12 -04:00
Hans J. Koch	beafc54c4e	UIO: Add the User IO core code This interface allows the ability to write the majority of a driver in userspace with only a very small shell of a driver in the kernel itself. It uses a char device and sysfs to interact with a userspace process to process interrupts and control memory accesses. See the docbook documentation for more details on how to use this interface. From: Hans J. Koch <hjk@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benedikt Spranger <b.spranger@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:57:15 -07:00
David Brownell	aebdc3b450	dev_vdbg(), available with -DVERBOSE_DEBUG This defines a dev_vdbg() call, which is enabled with -DVERBOSE_DEBUG. When enabled, dev_vdbg() acts just like dev_dbg(). When disabled, it is a NOP ... just like dev_dbg() without -DDEBUG. The specific code was moved out of a USB patch, but lots of drivers have similar support. That is, code can now be written to use an additional level of debug output, selected at compile time. Many driver authors have found this idiom to be very useful. A typical usage model is for "normal" debug messages to focus on fault paths and not be very "chatty", so that those messages can be left on during normal operation without much of a performance or syslog load. On the other hand "verbose" messages would be noisy enough that they wouldn't normally be enabled; they might even affect timings enough to change system or driver behavior. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:50 -07:00
Alan Stern	3f8df781fc	PM: remove deprecated dpm_runtime_* routines This patch (as933) removes the deprecated dpm_runtime_suspend() and dpm_runtime_resume() routines from the PM core. The only user of those routines is the PCMCIA ds driver; local replacements are added. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Kay Sievers	60a96a5956	Driver core: accept all valid action-strings in uevent-trigger This allows the uevent file to handle any type of uevent action to be triggered by userspace instead of just the "add" uevent. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Johannes Berg	2dbba6f773	[GENETLINK]: Dynamic multicast groups. Introduce API to dynamically register and unregister multicast groups. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 15:47:52 -07:00
Johannes Berg	84659eb529	[NETLIKN]: Allow removing multicast groups. Allow kicking listeners out of a multicast group when necessary (for example if that group is going to be removed.) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 15:47:05 -07:00
Johannes Berg	b4ff4f0419	[NETLINK]: allocate group bitmaps dynamically Allow changing the number of groups for a netlink family after it has been created, use RCU to protect the listeners bitmap keeping netlink_has_listeners() lock-free. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 15:46:06 -07:00
Andy Fleming	7132ab7f6e	Fix RGMII-ID handling in gianfar The TSEC/eTSEC can detect the interface to the PHY automatically, but it isn't able to detect whether the RGMII connection needs internal delay. So we need to detect that change in the device tree, propagate it to the platform data, and then check it if we're in RGMII. This fixes a bug on the 8641D HPCN board where the Vitesse PHY doesn't use the delay for RGMII. Signed-off-by: Andy Fleming <afleming@freescale.com>	2007-07-18 18:29:37 -04:00
Linus Torvalds	5bae7ac9fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6: [AVR32] Initialize phy_mask for both macb devices [AVR32] Fix atomic_add_unless() and atomic_sub_unless() [AVR32] Correct misspelled CONFIG_BLK_DEV_INITRD variable. [AVR32] Fix build error in parse_tag_rdimg() [AVR32] Don't wire up macb0 unless SW6 is in default position [AVR32] Wire up SSC platform device 0 as TX on ATSTK1000 board [AVR32] Add Atmel SSC driver platform device to AT32AP architecture [AVR32] Remove optimization of unaligned word loads [AVR32] Make STK1000 mux settings configurable [AVR32] CPU frequency scaling for AT32AP [AVR32] Split SM device into PM, RTC, WDT and EIC [AVR32] faster avr32 unaligned access	2007-07-18 12:57:52 -07:00
Haavard Skinnemoen	3da86ee4f1	[AVR32] Fix atomic_add_unless() and atomic_sub_unless() These functions depend on "result" being initalized to 0, but "result" is not included as an input constraint to the inline assembly block following its initialization, only as an output constraint. Thus gcc thinks it doesn't need to initialize it, so result ends up undefined if the "unless" condition is true. This fixes an oops in sunrpc where the faulty atomics caused rpciod_up() to not start the workqueue as it should. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:47:04 +02:00
Hans-Christian Egtvedt	9cf6cf58d0	[AVR32] Add Atmel SSC driver platform device to AT32AP architecture This patch adds register definitions, clocks and IRQs to the platform devices. Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com> Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:45:52 +02:00
Haavard Skinnemoen	e122eaf694	[AVR32] Remove optimization of unaligned word loads If we let unaligned word loads bypass the generic unaligned handling, gcc may combine it with a swap.b instruction and turn it into a ldwsp instruction, which does not work with unaligned addresses. Revert the optimization to prevent the RNDIS driver from crashing. Hopefully we'll figure something out later (it may be better to do the optimization in gcc.) Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:45:51 +02:00
Haavard Skinnemoen	7a5b805907	[AVR32] Split SM device into PM, RTC, WDT and EIC Split the SM platform device into separate platform devices for PM, RTC, WDT and EIC. This is more correct according to the documentation and allows us to simplify the code a little. Also turn the EIC driver into a real platform driver. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>	2007-07-18 20:45:51 +02:00
David Brownell	c6083cd61b	[AVR32] faster avr32 unaligned access Use a more conventional implementation for unaligned access, and include an AT32AP-specific optimization: the CPU will handle unaligned words. The result is always faster and smaller for 8, 16, and 32 bit values. For 64 bit quantities, it's presumably larger. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:45:50 +02:00
Linus Torvalds	a267c0a887	Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb * 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (126 commits) V4L/DVB (5847): Clean up schedule_timeout calls in cpia2 and ivtv code V4L/DVB (5846): Clean up setting state and scheduling timeouts V4L/DVB (5844): ivtv: add high volume debugging flag V4L/DVB (5843): ivtv: fix missing signal_pending check. V4L/DVB (5842): ivtv: Add locking to ensure stream setup is atomic. V4L/DVB (5841): tveeprom: add support for Philips FQ1216LME MK3 tuner. V4L/DVB (5840): fix dst and cx24123: tune() callback changed signess for delay V4L/DVB (5838): dvb-core: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22) V4L/DVB (5837): stv0299: Fix signedness warning (gcc 4.1.1, kernel 2.6.22) V4L/DVB (5836): dvb-ttpci: re-initialize aspect ratio and pan scan after arm crash V4L/DVB (5835): saa7146/dvb-ttpci: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22) V4L/DVB (5834): dvb-core: fix signedness warnings and const stripping V4L/DVB (5832): ir-common: optimize bit extract function V4L/DVB (5831): stradis: use ARRAY_SIZE V4L/DVB (5829): Firmware extract and loading for opera dvb-usb update V4L/DVB (5828): Kconfig: Added GemTek USB radio and removed experimental dependency. V4L/DVB (5826): Usbvision: video mux cleanup V4L/DVB (5825): Alter the tuner type for the WinTV USB UK PAL model. V4L/DVB (5824): Usbvision: Hauppauge WinTV USB SECAM_L fix V4L/DVB (5821): Saa7134: add remote control support for LifeView FlyDVB-S LR300 ...	2007-07-18 11:25:58 -07:00
Linus Torvalds	d756d10e24	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: extent macros cleanup Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. ext4: remove extra IS_RDONLY() check ext4: Use is_power_of_2() Use zero_user_page() in ext4 where possible ext4: Remove 65000 subdirectory limit ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields ext4: Add nanosecond timestamps jbd2: Move jbd2-debug file to debugfs jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices ext4: Make extents code sanely handle on-disk corruption ext4: copy i_flags to inode flags on write ext4: Enable extents by default Change on-disk format to support 2^15 uninitialized extents write support for preallocated blocks fallocate support in ext4 sys_fallocate() implementation on i386, x86_64 and powerpc	2007-07-18 10:32:00 -07:00
Linus Torvalds	cdf4a6482d	Merge branch 'upstream' of git://git.infradead.org/~dedekind/ubi-2.6 * 'upstream' of git://git.infradead.org/~dedekind/ubi-2.6: (28 commits) UBI: fix compile warning UBI: fix error handling in erase worker UBI: fix comments UBI: remove unneeded error checks UBI: cleanup usage of try_module_get UBI: fix overflow bug UBI: bugfix in max_sqnum calculation UBI: bugfix in sqnum calculation UBI: fix signed-unsigned multiplication UBI: fix bug in atomic_leb_change() UBI: fix message UBI: fix debugging stuff UBI: bugfix in error path UBI: use is_power_of_2() UBI: fix freeing ubi->vtbl while unloading UBI: fix MAINTAINERS UBI: bugfix in ubi_leb_change() UBI: kill homegrown endian macros UBI: cleanup ioctl handling UBI: error path bugfix ...	2007-07-18 10:27:24 -07:00
Oliver Endriss	804b445894	V4L/DVB (5835): saa7146/dvb-ttpci: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22) Fix signedness warnings (gcc 4.1.1, kernel 2.6.22). Signed-off-by: Oliver Endriss <o.endriss@gmx.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:24:44 -03:00
Linus Torvalds	485cf925d8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits) [NETFILTER]: xt_connlimit needs to depend on nf_conntrack [NETFILTER]: ipt_iprange.h must #include <linux/types.h> [IrDA]: Fix IrDA build failure [ATM]: nicstar needs virt_to_bus [NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability [NET]: merge dev_unicast_discard and dev_mc_discard into one [NET]: move dev_mc_discard from dev_mcast.c to dev.c [NETLINK]: negative groups in netlink_setsockopt [PPPOL2TP]: Reset meta-data in xmit function [PPPOL2TP]: Fix use-after-free [PKT_SCHED]: Some typo fixes in net/sched/Kconfig [XFRM]: Fix crash introduced by struct dst_entry reordering [TCP]: remove unused argument to cong_avoid op [ATM]: [idt77252] Rename CONFIG_ATM_IDT77252_SEND_IDLE to not resemble a Kconfig variable [ATM]: [drivers] ioremap balanced with iounmap [ATM]: [lanai] sram_test_word() must be __devinit [ATM]: [nicstar] Replace C code with call to ARRAY_SIZE() macro. [ATM]: Eliminate dead config variable CONFIG_BR2684_FAST_TRANS. [ATM]: Replacing kmalloc/memset combination with kzalloc. [NET]: gen_estimator deadlock fix ...	2007-07-18 10:24:36 -07:00
Michael Krufky	8218b0b2ca	V4L/DVB (5793): Tuner: remove hardware-specific info from public header Move internal structures and debug macros to drivers/media/video/tuner-driver.h Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:24:23 -03:00
Michael Krufky	7a91a80a0d	V4L/DVB (5753): Tuner: create struct tuner_operations Move tuner callback function pointers out of struct tuner, into struct tuner_operations. Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:24:02 -03:00
Michael Krufky	be2b85a135	V4L/DVB (5741): Tuner: add release callback Individual tuner drivers are now allocating memory themselves for their own private data structures. This changeset adds a release callback to the tuner operations, so that newer drivers that may require more complex data structures may release this private data themselves. Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:23:54 -03:00
Michael Krufky	b208319993	V4L/DVB (5719): Tuner: Move device-specific private data out of tuner struct Create private data struct for device specific private data. Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:23:48 -03:00
Linus Torvalds	31bdc5dc76	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Set vio->desc_buf to NULL after freeing. [SPARC]: Mark sparc and sparc64 as not having virt_to_bus [SPARC64]: Fix reset handling in VNET driver. [SPARC64]: Handle reset events in vio_link_state_change(). [SPARC64]: Handle LDC resets properly in domain-services driver. [SPARC64]: Massively simplify VIO device layer and support hot add/remove. [SPARC64]: Simplify VNET probing. [SPARC64]: Simplify VDC device probing. [SPARC64]: Add basic infrastructure for MD add/remove notification.	2007-07-18 10:23:37 -07:00
Mauro Carvalho Chehab	8573a9e6a8	V4L/DVB (5563a): Add experimental support for tea5761 tuner This driver were made based on tea5761 specs. Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:23:11 -03:00
FUJITA Tomonori	ba1fc175cc	[SCSI] libsas: add SAS management protocol handler This patch adds support for SAS Management Protocol (SMP) passthrough support via bsg. aic94xx can use this. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:18:20 -05:00
FUJITA Tomonori	7aa68e80bd	[SCSI] transport_sas: add SAS management protocol support The sas transport class attaches one bsg device to every SAS object (host, device, expander, etc). LLDs can define a function to handle SMP requests via sas_function_template::smp_handler. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:18:07 -05:00
Prakash, Sathya	0c8db6beb8	[SCSI] add PCI_VENDOR_ID macro for Brocade in pci_ids.h Adds PCI_VENDOR_ID_BROCADE macro in include/linux/pci_ids.h file. This macro is used in MPT Fusion FC drivers to support Brocade branded FC controllers signed-off-by: Sathya Prakash <sathya.prakash@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:17:11 -05:00
James Bottomley	0f05df8b3b	[SCSI] libsas, aic94xx: fix dma mapping cockups with ATA This one was noticed by Gilbert Wu of Adaptec: The libata core actually does the DMA mapping for you, so there has to be an exception in the device drivers that don't do dma mapping for ATA commands. However, since we've already done this, libsas must now dma map any ATA commands that it wishes to issue ... and yes, this is a horrible mess. Additionally, the test in aic94xx for ATA protocols isn't quite right. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:16:14 -05:00
Darrick J. Wong	3a2755af37	[SCSI] sas_ata: Implement sas_task_abort for ATA devices ATA devices need special handling for sas_task_abort. If the ATA command came from SCSI, then we merely need to tell SCSI to abort the scsi_cmnd. However, internal commands require a bit more work--we need to fill the qc with the appropriate error status and complete the command, and eventually post_internal will issue the actual ABORT TASK. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:16:03 -05:00
Darrick J. Wong	1c50dc83f9	[SCSI] sas_ata: ata_post_internal should abort the sas_task This patch adds a new field, lldd_task, to ata_queued_cmd so that libata users such as libsas can associate some data with a qc. The particular ambition with this patch is to associate a sas_task with a qc; that way, if libata decides to timeout a command, we can come back (in sas_ata_post_internal) and abort the sas task. One question remains: Is it necessary to reset the phy on error, or will the libata error handler take care of it? (Assuming that one is written, of course.) This patch, as it is today, works well enough to clean things up when an ATA device probe attempt fails halfway through the probe, though I'm not sure this is always the right thing to do. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:15:13 -05:00
Darrick J. Wong	338ec57003	[SCSI] Migrate libsas ATA code into a separate file This is a respin of my earlier patch that migrates the ATA support code into a separate file. For now, the controversial linking bits have been removed per James Bottomley's request for a patch that contains only the migration diffs, which means that libsas continues to require libata. I intend to address that problem in a separate patch. This patch is against the aic94xx-sas-2.6 git tree, and it has been sanity tested on my x206m with Seagate SATA and SAS disks without uncovering any new problems. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:14:40 -05:00
Darrick J. Wong	fa1c1e8f1e	[SCSI] Add SATA support to libsas Hook the scsi_host_template functions in libsas to delegate functionality to libata when appropriate. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Misc code changes and merge fixes and update for libata->drivers/ata move Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-07-18 11:12:53 -05:00
Jeremy Fitzhardinge	60223a326f	xen: Place vcpu_info structure into per-cpu memory An experimental patch for Xen allows guests to place their vcpu_info structs anywhere. We try to use this to place the vcpu_info into the PDA, which allows direct access. If this works, then switch to using direct access operations for irq_enable, disable, save_fl and restore_fl. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Keir Fraser <keir@xensource.com>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	9f27ee5950	xen: add virtual block device driver. The block device frontend driver allows the kernel to access block devices exported exported by a virtual machine containing a physical block device driver. Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Greg KH <greg@kroah.com> Cc: Jens Axboe <axboe@kernel.dk>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	4bac07c993	xen: add the Xenbus sysfs and virtual device hotplug driver This communicates with the machine control software via a registry residing in a controlling virtual machine. This allows dynamic creation, destruction and modification of virtual device configurations (network devices, block devices and CPUS, to name some examples). [ Greg, would you mind giving this a review? Thanks -J ] Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Greg KH <greg@kroah.com>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	ad9a86121f	xen: Add grant table support Add Xen 'grant table' driver which allows granting of access to selected local memory pages by other virtual machines and, symmetrically, the mapping of remote memory pages which other virtual machines have granted access to. This driver is a prerequisite for many of the Xen virtual device drivers, which grant the 'device driver domain' restricted and temporary access to only those memory pages that are currently involved in I/O operations. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge	b536b4b962	xen: use the hvc console infrastructure for Xen console Implement a Xen back-end for hvc console. * * * Add early printk support via hvc console, enable using "earlyprintk=xen" on the kernel command line. From: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Olof Johansson <olof@lixom.net>	2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge	f87e4cac4f	xen: SMP guest support This is a fairly straightforward Xen implementation of smp_ops. Xen has its own IPI mechanisms, and has no dependency on any APIC-based IPI. The smp_ops hooks and the flush_tlb_others pv_op allow a Xen guest to avoid all APIC code in arch/i386 (the only apic operation is a single apic_read for the apic version number). One subtle point which needs to be addressed is unpinning pagetables when another cpu may have a lazy tlb reference to the pagetable. Xen will not allow an in-use pagetable to be unpinned, so we must find any other cpus with a reference to the pagetable and get them to shoot down their references. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andi Kleen <ak@suse.de>	2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge	c85b04c374	xen: add pinned page flag Add a new definition for PG_owner_priv_1 to define PG_pinned on Xen pagetable pages. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:43 -07:00
Jeremy Fitzhardinge	e46cdb66c8	xen: event channels Xen implements interrupts in terms of event channels. Each guest domain gets 1024 event channels which can be used for a variety of purposes, such as Xen timer events, inter-domain events, inter-processor events (IPI) or for real hardware IRQs. Within the kernel, we map the event channels to IRQs, and implement the whole interrupt handling using a Xen irq_chip. Rather than setting NR_IRQ to 1024 under PARAVIRT in order to accomodate Xen, we create a dynamic mapping between event channels and IRQs. Ideally, Linux will eventually move towards dynamically allocating per-irq structures, and we can use a 1:1 mapping between event channels and irqs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric W. Biederman <ebiederm@xmission.com>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	5ead97c84f	xen: Core Xen implementation This patch is a rollup of all the core pieces of the Xen implementation, including: - booting and setup - pagetable setup - privileged instructions - segmentation - interrupt flags - upcalls - multicall batching BOOTING AND SETUP The vmlinux image is decorated with ELF notes which tell the Xen domain builder what the kernel's requirements are; the domain builder then constructs the address space accordingly and starts the kernel. Xen has its own entrypoint for the kernel (contained in an ELF note). The ELF notes are set up by xen-head.S, which is included into head.S. In principle it could be linked separately, but it seems to provoke lots of binutils bugs. Because the domain builder starts the kernel in a fairly sane state (32-bit protected mode, paging enabled, flat segments set up), there's not a lot of setup needed before starting the kernel proper. The main steps are: 1. Install the Xen paravirt_ops, which is simply a matter of a structure assignment. 2. Set init_mm to use the Xen-supplied pagetables (analogous to the head.S generated pagetables in a native boot). 3. Reserve address space for Xen, since it takes a chunk at the top of the address space for its own use. 4. Call start_kernel() PAGETABLE SETUP Once we hit the main kernel boot sequence, it will end up calling back via paravirt_ops to set up various pieces of Xen specific state. One of the critical things which requires a bit of extra care is the construction of the initial init_mm pagetable. Because Xen places tight constraints on pagetables (an active pagetable must always be valid, and must always be mapped read-only to the guest domain), we need to be careful when constructing the new pagetable to keep these constraints in mind. It turns out that the easiest way to do this is use the initial Xen-provided pagetable as a template, and then just insert new mappings for memory where a mapping doesn't already exist. This means that during pagetable setup, it uses a special version of xen_set_pte which ignores any attempt to remap a read-only page as read-write (since Xen will map its own initial pagetable as RO), but lets other changes to the ptes happen, so that things like NX are set properly. PRIVILEGED INSTRUCTIONS AND SEGMENTATION When the kernel runs under Xen, it runs in ring 1 rather than ring 0. This means that it is more privileged than user-mode in ring 3, but it still can't run privileged instructions directly. Non-performance critical instructions are dealt with by taking a privilege exception and trapping into the hypervisor and emulating the instruction, but more performance-critical instructions have their own specific paravirt_ops. In many cases we can avoid having to do any hypercalls for these instructions, or the Xen implementation is quite different from the normal native version. The privileged instructions fall into the broad classes of: Segmentation: setting up the GDT and the GDT entries, LDT, TLS and so on. Xen doesn't allow the GDT to be directly modified; all GDT updates are done via hypercalls where the new entries can be validated. This is important because Xen uses segment limits to prevent the guest kernel from damaging the hypervisor itself. Traps and exceptions: Xen uses a special format for trap entrypoints, so when the kernel wants to set an IDT entry, it needs to be converted to the form Xen expects. Xen sets int 0x80 up specially so that the trap goes straight from userspace into the guest kernel without going via the hypervisor. sysenter isn't supported. Kernel stack: The esp0 entry is extracted from the tss and provided to Xen. TLB operations: the various TLB calls are mapped into corresponding Xen hypercalls. Control registers: all the control registers are privileged. The most important is cr3, which points to the base of the current pagetable, and we handle it specially. Another instruction we treat specially is CPUID, even though its not privileged. We want to control what CPU features are visible to the rest of the kernel, and so CPUID ends up going into a paravirt_op. Xen implements this mainly to disable the ACPI and APIC subsystems. INTERRUPT FLAGS Xen maintains its own separate flag for masking events, which is contained within the per-cpu vcpu_info structure. Because the guest kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely ignored (and must be, because even if a guest domain disables interrupts for itself, it can't disable them overall). (A note on terminology: "events" and interrupts are effectively synonymous. However, rather than using an "enable flag", Xen uses a "mask flag", which blocks event delivery when it is non-zero.) There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which are implemented to manage the Xen event mask state. The only thing worth noting is that when events are unmasked, we need to explicitly see if there's a pending event and call into the hypervisor to make sure it gets delivered. UPCALLS Xen needs a couple of upcall (or callback) functions to be implemented by each guest. One is the event upcalls, which is how events (interrupts, effectively) are delivered to the guests. The other is the failsafe callback, which is used to report errors in either reloading a segment register, or caused by iret. These are implemented in i386/kernel/entry.S so they can jump into the normal iret_exc path when necessary. MULTICALL BATCHING Xen provides a multicall mechanism, which allows multiple hypercalls to be issued at once in order to mitigate the cost of trapping into the hypervisor. This is particularly useful for context switches, since the 4-5 hypercalls they would normally need (reload cr3, update TLS, maybe update LDT) can be reduced to one. This patch implements a generic batching mechanism for hypercalls, which gets used in many places in the Xen code. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Ian Pratt <ian.pratt@xensource.com> Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Cc: Adrian Bunk <bunk@stusta.de>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	a42089dd35	xen: Add Xen interface header files Add Xen interface header files. These are taken fairly directly from the Xen tree, but somewhat rearranged to suit the kernel's conventions. Define macros and inline functions for doing hypercalls into the hypervisor. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	688340ea34	Add a sched_clock paravirt_op The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Cc: john stultz <johnstul@us.ibm.com>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	d572929cdd	paravirt: helper to disable all IO space In a virtual environment, device drivers such as legacy IDE will waste quite a lot of time probing for their devices which will never appear. This helper function allows a paravirt implementation to lay claim to the whole iomem and ioport space, thereby disabling all device drivers trying to claim IO resources. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	5f4352fbff	Allocate and free vmalloc areas Allocate/release a chunk of vmalloc address space: alloc_vm_area reserves a chunk of address space, and makes sure all the pagetables are constructed for that address range - but no pages. free_vm_area releases the address space range. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: "Andi Kleen" <ak@muc.de>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	c70df74376	paravirt: make siblingmap functions visible Paravirt implementations need to set the sibling map on new cpus. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	724faa89cc	paravirt: unstatic smp_store_cpu_info Paravirt implementations need to store cpu info when bringing up cpus. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	5378701324	paravirt: unstatic leave_mm Make globally leave_mm visible, specifically so that Xen can use it to shoot-down lazy uses of cr3. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	03f0c2f950	paravirt: increase IRQ limit When running with CONFIG_PARAVIRT, we may want lots of IRQs even if there's no IO APIC. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	6996d3b63f	paravirt: add a hook for once the allocator is ready Add a hook so that the paravirt backend knows when the allocator is ready. This is useful for the obvious reason that the allocator is available, but the other side-effect of having the bootmem allocator available is that each page now has an associated "struct page". Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	fdb4c338c8	paravirt: add an "mm" argument to alloc_pt It's useful to know which mm is allocating a pagetable. Xen uses this to determine whether the pagetable being added to is pinned or not. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge	810bab448e	use elfnote.h to generate vsyscall notes. Use existing elfnote.h to generate vsyscall notes, rather than doing it locally. Changes elfnote.h a bit to suit, since this is the first asm user, and it wasn't quite right. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.com>	2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge	86313c488a	usermodehelper: Tidy up waiting Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>	2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge	10a0a8d4e3	Add common orderly_poweroff() Various pieces of code around the kernel want to be able to trigger an orderly poweroff. This pulls them together into a single implementation. By default the poweroff command is /sbin/poweroff, but it can be set via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it can include command-line arguments. This patch replaces four other instances of invoking either "poweroff" or "shutdown -h now": two sbus drivers, and acpi thermal management. sparc64 has its own "powerd"; still need to determine whether it should be replaced by orderly_poweroff(). Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <ak@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David S. Miller <davem@davemloft.net>	2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge	0ab4dc9227	usermodehelper: split setup from execution Rather than having hundreds of variations of call_usermodehelper for various pieces of usermode state which could be set up, split the info allocation and initialization from the actual process execution. This means the general pattern becomes: info = call_usermodehelper_setup(path, argv, envp); /* basic state / call_usermodehelper_<SET EXTRA STATE>(info, stuff...); / extra state / call_usermodehelper_exec(info, wait); / run process and free info / This patch introduces wrappers for all the existing calling styles for call_usermodehelper_, but folds their implementations into one. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Howells <dhowells@redhat.com> Cc: Bj?rn Steinbrink <B.Steinbrink@gmx.de> Cc: Randy Dunlap <randy.dunlap@oracle.com>	2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge	d84d1cc764	add argv_split() argv_split() is a helper function which takes a string, splits it at whitespace, and returns a NULL-terminated argv vector. This is deliberately simple - it does no quote processing of any kind. [ Seems to me that this is something which is already being done in the kernel, but I couldn't find any other implementations, either to steal or replace. Keep an eye out. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com>	2007-07-18 08:47:40 -07:00
Jeremy Fitzhardinge	1e66df3ee3	add kstrndup Add a kstrndup function, modelled on strndup. Like strndup this returns a string copied into its own allocated memory, but it copies no more than the specified number of bytes from the source. Remove private strndup() from irda code. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@mandriva.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Panagiotis Issaris <takis@issaris.org> Cc: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>	2007-07-18 08:47:39 -07:00
Maciej W. Rozycki	8b4a40809e	zs: move to the serial subsystem This is a reimplementation of the zs driver for the serial subsystem. Any resemblance to the old driver is purely coincidential. ;-) I do hope I got the handling of modem lines right -- better do not tackle me about the issue unless you feel too good... Any users of the old driver: please note the numbers of the serial lines have now been swapped, i.e. ttyS0 <-> ttyS1 and ttyS2 <-> ttyS3. It has to do with the modem lines mentioned above; basically the port A in a given chip has to be initialised before the port B if you want to use the latter as the serial console (which is usually the case), as operations on modem lines of the serial line associated with the port B access both ports (see the comment at the top of the driver for the details of wiring used). Please update your scripts. This is also the reason each SCC now requests an IRQ once only (as seen in "/proc/interrupts") -- the handler takes care of both ports at once as the line associated with the port B has to take status update interrupts from both ports (and yet the line of the port A takes its own for itself too). The old driver never got it right... Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-18 08:38:22 -07:00
Yinghai Lu	b187f180cc	serial: add early_serial_setup() back to header file early_serial_setup was removed from serial.h, but forgot to put in serial_8250.h Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-18 08:38:22 -07:00
Christoph Hellwig	3261ebd7d4	UBI: kill homegrown endian macros Kill UBI's homegrown endianess handling and replace it with the standard kernel endianess handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2007-07-18 16:53:49 +03:00
Andreas Dilger	f8628a14a2	ext4: Remove 65000 subdirectory limit This patch adds support to ext4 for allowing more than 65000 subdirectories. Currently the maximum number of subdirectories is capped at 32000. If we exceed 65000 subdirectories in an htree directory it sets the inode link count to 1 and no longer counts subdirectories. The directory link count is not actually used when determining if a directory is empty, as that only counts subdirectories and not regular files that might be in there. A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if the subdir count for any directory crosses 65000. A later fsck will clear EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directory with >65000 subdirs. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:38:01 -04:00
Kalpak Shah	6dd4ee7cab	ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields We need to make sure that existing ext3 filesystems can also avail the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question about whether users should be able to set s_*_extra_isize smaller than the known fields or not. This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes and if its fails we try to expand by s_min_extra_isize bytes. This is done by changing the i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs then 1 or more EAs are shifted to the external EA block. In the worst case when even the external EA block does not have enough space we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:19:57 -04:00
Kalpak Shah	ef7f38359e	ext4: Add nanosecond timestamps This patch adds nanosecond timestamps for ext4. This involves adding *time_extra fields to the ext4_inode to extend the timestamps to 64-bits. Creation time is also added by this patch. These extended fields will fit into an inode if the filesystem was formatted with large inodes (-I 256 or larger) and there are currently no EAs consuming all of the available space. For new inodes we always reserve enough space for the kernel's known extended fields, but for inodes created with an old kernel this might not have been the case. So this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag(ro-compat so that older kernels can't create inodes with a smaller extra_isize). which indicates if the fields fitting inside s_min_extra_isize are available or not. If the expansion of inodes if unsuccessful then this feature will be disabled. This feature is only enabled if requested by the sysadmin. None of the extended inode fields is critical for correct filesystem operation. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:15:20 -04:00
Jose R. Santos	0f49d5d019	jbd2: Move jbd2-debug file to debugfs The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it incorrectly used create_proc_entry() instead of the sysctl routines, and no proc entry was ever created. Instead of fixing this we might as well move the jbd2-debug file to debugfs which would be the preferred location for this kind of tunable. The new location is now /sys/kernel/debug/jbd2/jbd2-debug. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:50:18 -04:00
Jose R. Santos	e23291b912	jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG where never changed to CONFIG_JBD2_DEBUG. This patch fixes that. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:57:06 -04:00
Jan Kara	ff9ddf7e84	ext4: copy i_flags to inode flags on write Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ext4-specific i_flags. Quota code changes these flags on quota files (to make it harder for sysadmin to screw himself) and these changes were not correctly propagated into the filesystem. (This is a forward port patch from ext3) Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:24:20 -04:00
Amit Arora	749269faca	Change on-disk format to support 2^15 uninitialized extents This change was suggested by Andreas Dilger. This patch changes the EXT_MAX_LEN value and extent code which marks/checks uninitialized extents. With this change it will be possible to have initialized extents with 2^15 blocks (earlier the max blocks we could have was 2^15 - 1). This way we can have better extent-to-block alignment. Now, maximum number of blocks we can have in an initialized extent is 2^15 and in an uninitialized extent is 2^15 - 1. Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-18 09:02:56 -04:00
Adrian Bunk	ebd61cc042	[NETFILTER]: ipt_iprange.h must #include <linux/types.h> ipt_iprange.h must #include <linux/types.h> since it uses __be32. This patch fixes kernel Bugzilla #7604. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:21:50 -07:00
Denis Cheng	456ad75c89	[NET]: move dev_mc_discard from dev_mcast.c to dev.c Because this function is only called by unregister_netdevice, this moving could make this non-global function static, and also remove its declaration in netdevice.h; Any further, function __dev_addr_discard is also just called by dev_mc_discard and dev_unicast_discard, keeping this two functions both in one c file could make __dev_addr_discard also static and remove its declaration in netdevice.h; Futhermore, the sequential call to dev_unicast_discard and then dev_mc_discard in unregister_netdevice have a similar mechanism that: (netif_tx_lock_bh / __dev_addr_discard / netif_tx_unlock_bh), they should merged into one to eliminate duplicates in acquiring and releasing the dev->_xmit_lock, this would be done in my following patch. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:10:54 -07:00
Patrick McHardy	bd0bf0765e	[XFRM]: Fix crash introduced by struct dst_entry reordering XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which was broken by the dst_entry reordering in commit 1e19e02c~, causing an oops in xfrm_bundle_ok when walking the bundle upwards. Kill xfrm_dst->u.next and change the only user to use dst->next instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:55:52 -07:00
Stephen Hemminger	16751347a0	[TCP]: remove unused argument to cong_avoid op None of the existing TCP congestion controls use the rtt value pased in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt could have been -1 when handling a duplicate ack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:46:58 -07:00
Stephen Rothwell	0785b9dcdc	[SPARC]: Mark sparc and sparc64 as not having virt_to_bus Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:20:22 -07:00
David S. Miller	6160f63518	[SPARC64]: Massively simplify VIO device layer and support hot add/remove. Create and destroy VIO devices in response to MD update events. These run synchronously inside of the MD update mutex so the VIO layer doesn't need to do internal locking of any sort. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:20:04 -07:00
David S. Miller	920c3ed741	[SPARC64]: Add basic infrastructure for MD add/remove notification. And add dummy handlers for the VIO device layer. These will be filled in with real code after the vdc, vnet, and ds drivers are reworked to have simpler dependencies on the VIO device tree. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:19:51 -07:00
Dmitry Torokhov	5517853712	Input: document intended meaning of KEY_SWITCHVIDEOMODE Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-07-18 00:38:45 -04:00
Dmitry Torokhov	85f202d5df	Input: add driver for Fujitsu serial touchscreens These serial touchscreens are found on some Fujitsu lifebook P-series laptops, and the B6210. Using this requires a new version of inputattach and doing: inputattach -fjt /dev/ttyS0 Big thanks to Stephen Hemminger for testing it and making it work on his B6210 laptop. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-07-18 00:37:01 -04:00
Semih Hazar	1d25891f32	Input: ads7846 - re-check pendown status before reporting events Pendown status from the PENIRQ pin is currently read only at the beginning of a sample set. If the pen is lifted just after sampling has began then sampled values become wrong. This patch adds an optional platform penirq_recheck_delay attribute. If non-zero, samples are only reported to the input subsystem if PENIRQ is still active that long after the samples taken. Signed-off-by: Semih Hazar <semih.hazar@indefia.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-07-18 00:36:04 -04:00
Semih Hazar	e4f4886199	Input: ads7846 - introduce sample settling delay The ads7846 driver has support for filtering, but when the chip gets deselected between samples this causes noise. This patch adds support for an optional settling delay time, so that two consecutive samples will be taken with the specified delay time apart. This ensures that the chip won't be deselected, so the noise won't appear. Filtering can still be done, but will have less work to do since each time a new sample is taken the same delay applies. Signed-off-by: Semih Hazar <semih.hazar@indefia.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-07-18 00:35:56 -04:00
Amit Arora	56055d3ae4	write support for preallocated blocks This patch adds write support to the uninitialized extents that get created when a preallocation is done using fallocate(). It takes care of splitting the extents into multiple (upto three) extents and merging the new split extents with neighbouring ones, if possible. Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-17 21:42:38 -04:00
Amit Arora	a2df2a6340	fallocate support in ext4 This patch implements ->fallocate() inode operation in ext4. With this patch users of ext4 file systems will be able to use fallocate() system call for persistent preallocation. Current implementation only supports preallocation for regular files (directories not supported as of date) with extent maps. This patch does not support block-mapped files currently. Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of now. Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-17 21:42:41 -04:00
Amit Arora	97ac73506c	sys_fallocate() implementation on i386, x86_64 and powerpc fallocate() is a new system call being proposed here which will allow applications to preallocate space to any file(s) in a file system. Each file system implementation that wants to use this feature will need to support an inode operation called ->fallocate(). Applications can use this feature to avoid fragmentation to certain level and thus get faster access speed. With preallocation, applications also get a guarantee of space for particular file(s) - even if later the the system becomes full. Currently, glibc provides an interface called posix_fallocate() which can be used for similar cause. Though this has the advantage of working on all file systems, but it is quite slow (since it writes zeroes to each block that has to be preallocated). Without a doubt, file systems can do this more efficiently within the kernel, by implementing the proposed fallocate() system call. It is expected that posix_fallocate() will be modified to call this new system call first and incase the kernel/filesystem does not implement it, it should fall back to the current implementation of writing zeroes to the new blocks. ToDos: 1. Implementation on other architectures (other than i386, x86_64, and ppc). Patches for s390(x) and ia64 are already available from previous posts, but it was decided that they should be added later once fallocate is in the mainline. Hence not including those patches in this take. 2. Changes to glibc, a) to support fallocate() system call b) to make posix_fallocate() and posix_fallocate64() call fallocate() Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-17 21:42:44 -04:00
Paul Mundt	cb32da0416	slob: Kill off duplicate kzalloc() definition. With the slab zeroing allocations cleanups Christoph stubbed in a generic kzalloc(), which was missed on SLOB. Follow the SLAB/SLUB changes and kill off the __kzalloc() wrapper that SLOB was using. Reported-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 17:26:43 -07:00
Linus Torvalds	f3d9071667	Merge branch 'bsg' of git://git.kernel.dk/data/git/linux-2.6-block * 'bsg' of git://git.kernel.dk/data/git/linux-2.6-block: bsg: fix missing space in version print Don't define empty struct bsg_class_device if !CONFIG_BLK_DEV_BSG bsg: Kconfig updates bsg: minor cleanup bsg: device hash table cleanup bsg: fix initialization error handling bugs bsg: mark FUJITA Tomonori as bsg maintainer bsg: convert to dynamic major bsg: address various review comments	2007-07-17 15:26:31 -07:00
Al Viro	8dfd588c31	smp_call_function_single() should be a macro on UP ... or we end up with header include order problems from hell. E.g. on m68k this is 100% fatal - local_irq_enable() there wants preempt_count(), which wants task_struct fields, which we won't have when we are in smp.h pulled from sched.h. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 14:39:19 -07:00
Satyam Sharma	3bd858ab1c	Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check Introduce is_owner_or_cap() macro in fs.h, and convert over relevant users to it. This is done because we want to avoid bugs in the future where we check for only effective fsuid of the current task against a file's owning uid, without simultaneously checking for CAP_FOWNER as well, thus violating its semantics. [ XFS uses special macros and structures, and in general looked ... untouchable, so we leave it alone -- but it has been looked over. ] The (current->fsuid != inode->i_uid) check in generic_permission() and exec_permission_lite() is left alone, because those operations are covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 12:00:03 -07:00
Linus Torvalds	49c13b51a1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits) KVM: Use CPU_DYING for disabling virtualization KVM: Tune hotplug/suspend IPIs KVM: Keep track of which cpus have virtualization enabled SMP: Allow smp_call_function_single() to current cpu i386: Allow smp_call_function_single() to current cpu x86_64: Allow smp_call_function_single() to current cpu HOTPLUG: Adapt thermal throttle to CPU_DYING HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING HOTPLUG: Add CPU_DYING notifier KVM: Clean up #includes KVM: Remove kvmfs in favor of the anonymous inodes source KVM: SVM: Reliably detect if SVM was disabled by BIOS KVM: VMX: Remove unnecessary code in vmx_tlb_flush() KVM: MMU: Fix Wrong tlb flush order KVM: VMX: Reinitialize the real-mode tss when entering real mode KVM: Avoid useless memory write when possible KVM: Fix x86 emulator writeback KVM: Add support for in-kernel pio handlers KVM: VMX: Fix interrupt checking on lightweight exit KVM: Adds support for in-kernel mmio handlers ...	2007-07-17 11:50:26 -07:00
Linus Torvalds	492559af23	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Clean away some code inside some non-existent CONFIG ifdefs [IA64] ar.itc access must really be after xtime_lock.sequence has been read [IA64] correctly count CPU objects in the ia64/sn hwperf interface [IA64] arbitary speed tty ioctl support [IA64] use machvec=dig on hpzx1 platforms	2007-07-17 11:31:57 -07:00
Al Viro	d37c6e1b67	saner typechecking in generic unaligned.h Verify that types would match for assignment (under sizeof, so we are safe from side effects or any code actually getting generated), then explicitly cast everywhere to the fixed-sized types. Kills a bunch of bogus warnings about constants being truncated (gcc, sparse), finds a pile of endianness problems hidden by old noise (sparse). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 11:01:07 -07:00
Al Viro	5072d5d58e	alpha termios.h hadn't been updated ... fortunately, termios and ktermios there are identical, so no run-time breakage happened. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 11:01:07 -07:00
NeilBrown	4ad1366376	md: change bitmap_unplug and others to void functions bitmap_unplug only ever returns 0, so it may as well be void. Two callers try to print a message if it returns non-zero, but that message is already printed by bitmap_file_kick. write_page returns an error which is not consistently checked. It always causes BITMAP_WRITE_ERROR to be set on an error, and that can more conveniently be checked. When the return of write_page is checked, an error causes bitmap_file_kick to be called - so move that call into write_page - and protect against recursive calls into bitmap_file_kick. bitmap_update_sb returns an error that is never checked. So make these 'void' and be consistent about checking the bit. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:15 -07:00
NeilBrown	713f6ab18b	md: improve the is_mddev_idle test fix Don't use 'unsigned' variable to track sync vs non-sync IO, as the only thing we want to do with them is a signed comparison, and fix up the comment which had become quite wrong. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:15 -07:00
Imre Deak	fe0e3a9df6	OMAP: add TI OMAP1610 accelerator entry. Signed-off-by: Trilok Soni <soni.trilok@gmail.com> Cc: Tony Lindgren <tony@atomide.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:13 -07:00
Geert Uytterhoeven	9900abfb5e	fbdev: Add fb_append_extra_logo() Add fb_append_extra_logo(), to append extra lines of logos below the standard Linux logo. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-By: James Simmons <jsimmons@infradead.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:13 -07:00
Antonino A. Daplas	eb3daa83c2	tgafb: actually allocate memory for the pseudo_palette No memory allocation was done for the pseudo_palette. Allocate one for it. Signed-off-by: Antonino Daplas <adaplas@gmail.com> Acked-by: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:12 -07:00
Jesse Barnes	b7269dd2b9	vt: add comment for unbind_con_driver() - add comment for unbind_con_driver(). - bind_con_driver() is made private again Signed-off-by: Jesse Barnes <jesse.barnes@intel.com> Signed-off-by: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:11 -07:00
Jesse Barnes	cfafca8067	fbdev: fbcon: console unregistration from unregister_framebuffer This allows for proper console unregistration via the VT layer, and updates the FB layer to use it. This makes debugging new console drivers much easier, since you can properly clean them up before unloading. [adaplas] unregister_framebuffer() is typically called as part of the driver's module_exit(). Doing so otherwise will freeze the machine as the VT layer is holding reference counts on fbcon, and fbcon on the driver. With this change, it allows unregister_framebuffer() to be called safely anywhere as needed. Additions from the original: If multiple drivers are used by fbcon, and if one of them unregisters, a driver will take over the consoles vacated by the outgoing one (via set_con2fb_map). Once only the outgoing driver remains, then fbcon will unbind from the VT layer (if CONFIG_HW_CONSOLE_UNBINDING is set to y). It is important that these drivers implement fb_open() and fb_release() just to ensure that no other process is using the driver. Likewise, these drivers _must_ check the return value of unregister_framebuffer(). [akpm@linux-foundation.org: make fbcon_unbind() stub inline] Signed-off-by: Jesse Barnes <jesse.barnes@intel.com> Signed-off-by: Antonino Daplas <adaplas@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:11 -07:00
Antonino A. Daplas	623e71b035	fbcon: allow fbcon to use the primary display driver Allow fbcon to select the primary display adapter using the fb_is_primary_device() arch-specific helper. If a a primary adapter is detected, fbcon will unbind the old adapter from the VT layer, then rebind using the new adapter. This requires that bind_/unbind_con_driver() be made public. Because this feature may produce unexpected behavior (from the user's POV), this must be explicitly enabled in Kconfig. [akpm@linux-foundation.org: export unbind_con_driver] Signed-off-by: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:11 -07:00
Antonino A. Daplas	317b3c2167	fbdev: detect primary display device Add function helper, fb_is_primary_device(). Given struct fb_info, it will return a nonzero value if the device is the primary display. Currently, only the i386 is supported where the function checks for the IORESOURCE_ROM_SHADOW flag. Signed-off-by: Antonino Daplas <adaplas@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:11 -07:00
Antonino A. Daplas	10eb2659cc	fbdev: move arch-specific bits to their respective subdirectories Move arch-specific bits of fb_mmap() to their respective subdirectories [bob.picco@hp.com: efi_range_is_wc is referenced but not declared] [bunk@stusta.de: fix include/asm-m68k/fb.h] Signed-off-by: Antonino Daplas <adaplas@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:11 -07:00
Mark Zhan	2e774c7caf	rtc: add support for the ST M48T59 RTC [akpm@linux-foundation.org: x86_64 build fix] [akpm@linux-foundation.org: The acpi guys changed the bin_attribute code] Signed-off-by: Mark Zhan <rongkai.zhan@windriver.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:09 -07:00
J. Bruce Fields	1269bc69b6	knfsd: nfsd: enforce per-flavor id squashing Allow root squashing to vary per-pseudoflavor, so that you can (for example) allow root access only when sufficiently strong security is in use. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	4796f45740	knfsd: nfsd4: secinfo handling without secinfo= option We could return some sort of error in the case where someone asks for secinfo on an export without the secinfo= option set--that'd be no worse than what we've been doing. But it's not really correct. So, hack up an approximate secinfo response in that case--it may not be complete, but it'll tell the client at least one acceptable security flavor. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
Andy Adamson	dcb488a3b7	knfsd: nfsd4: implement secinfo Implement the secinfo operation. (Thanks to Usha Ketineni wrote an earlier version of this support.) Cc: Usha Ketineni <uketinen@us.ibm.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	0ec757df97	knfsd: nfsd4: make readonly access depend on pseudoflavor Allow readonly access to vary depending on the pseudoflavor, using the flag passed with each pseudoflavor in the export downcall. The rest of the flags are ignored for now, though some day we might also allow id squashing to vary based on the flavor. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
Andy Adamson	32c1eb0cd7	knfsd: nfsd4: return nfserr_wrongsec Make the first actual use of the secinfo information by using it to return nfserr_wrongsec when an export is found that doesn't allow the flavor used on this request. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	3ab4d8b121	knfsd: nfsd: set rq_client to ip-address-determined-domain We want it to be possible for users to restrict exports both by IP address and by pseudoflavor. The pseudoflavor information has previously been passed using special auth_domains stored in the rq_client field. After the preceding patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so now we use rq_client for the ip information, as auth_null and auth_unix do. However, we keep around the special auth_domain in the rq_gssclient field for backwards compatibility purposes, so we can still do upcalls using the old "gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an appropriate export. This allows us to continue supporting old mountd. In fact, for this first patch, we always use the "gss/pseudoflavor" auth_domain (and only it) if it is available; thus rq_client is ignored in the auth_gss case, and this patch on its own makes no change in behavior; that will be left to later patches. Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap upcall by a dummy value--no version of idmapd has ever used it, and it's unlikely anyone really wants to perform idmapping differently depending on the where the client is (they may want to perform credential mapping differently, but that's a different matter--the idmapper just handles id's used in getattr and setattr). But I'm updating the idmapd code anyway, just out of general backwards-compatibility paranoia. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	0989a78896	knfsd: nfsd: provide export lookup wrappers which take a svc_rqst Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into those that are processing requests and those that are doing other stuff (like looking up filehandles for mountd). No change in behavior, just a (fairly pointless, on its own) cleanup. (Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client instead of exp->ex_client into exp_find_by_name(). However, the two should have the same value at this point.) Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	df547efb03	knfsd: nfsd4: simplify exp_pseudoroot arguments We're passing three arguments to exp_pseudoroot, two of which are just fields of the svc_rqst. Soon we'll want to pass in a third field as well. So let's just give up and pass in the whole struct svc_rqst. Also sneak in some minor style cleanups while we're at it. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Andy Adamson	e677bfe4d4	knfsd: nfsd4: parse secinfo information in exports downcall We add a list of pseudoflavors to each export downcall, which will be used both as a list of security flavors allowed on that export, and (in the order given) as the list of pseudoflavors to return on secinfo calls. This patch parses the new downcall information and adds it to the export structure, but doesn't use it for anything yet. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Andy Adamson	c4170583f6	knfsd: nfsd4: store pseudoflavor in request Add a new field to the svc_rqst structure to record the pseudoflavor that the request was made with. For now we record the pseudoflavor but don't use it for anything. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Meelap Shah	47f9940c55	knfsd: nfsd4: don't delegate files that have had conflicts One more incremental delegation policy improvement: don't give out a delegation on a file if conflicting access has previously required that a delegation be revoked on that file. (In practice we'll forget about the conflict when the struct nfs4_file is removed on close, so this is of limited use for now, though it should at least solve a temporary problem with self-conflicts on write opens from the same client.) Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Meelap Shah	c2f1a551de	knfsd: nfsd4: vary maximum delegation limit based on RAM size Our original NFSv4 delegation policy was to give out a read delegation on any open when it was possible to. Since the lifetime of a delegation isn't limited to that of an open, a client may quite reasonably hang on to a delegation as long as it has the inode cached. This becomes an obvious problem the first time a client's inode cache approaches the size of the server's total memory. Our first quick solution was to add a hard-coded limit. This patch makes a mild incremental improvement by varying that limit according to the server's total memory size, allowing at most 4 delegations per megabyte of RAM. My quick back-of-the-envelope calculation finds that in the worst case (where every delegation is for a different inode), a delegation could take about 1.5K, which would make the worst case usage about 6% of memory. The new limit works out to be about the same as the old on a 1-gig server. [akpm@linux-foundation.org: Don't needlessly bloat vmlinux] [akpm@linux-foundation.org: Make it right for highmem machines] Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	1e5140279f	knfsd: nfsd: remove unused header interface.h It looks like Al Viro gutted this header file five years ago and it hasn't been touched since. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	33a1060ae7	knfsd: nfsd4: fix NFSv4 filehandle size units confusion NFS4_FHSIZE is measured in bytes, not 4-byte words, so much more space than necessary is being allocated for struct nfs4_cb_recall. I should have wondered why this structure was so much larger than it needed to be! Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Marc Eshel	9a8db97e77	knfsd: lockd: nfsd4: use same grace period for lockd and nfsd4 Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot, during which clients may reclaim locks from the previous server instance, but may not acquire new locks. Currently the lockd and nfsd enforce grace periods of different lengths. This may cause problems when we reboot a server with both v2/v3 and v4 clients. For example, if the lockd grace period is shorter (as is likely the case), then a v3 client might acquire a new lock that conflicts with a lock already held (but not yet reclaimed) by a v4 client. This patch calculates a lease time that lockd and nfsd can both use. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Christoph Hellwig	d37065cd6d	knfsd: exportfs: add procedural interface for NFSD Currently NFSD calls directly into filesystems through the export_operations structure. I plan to change this interface in various ways in later patches, and want to avoid the export of the default operations to NFSD, so this patch adds two simple exportfs_encode_fh/exportfs_decode_fh helpers for NFSD to call instead of poking into exportfs guts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00

... 3 4 5 6 7 ...

15289 Commits