linux

Author	SHA1	Message	Date
Linus Torvalds	8f616cd524	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: remove write-only variables from ext4_ordered_write_end ext4: unexport jbd2_journal_update_superblock ext4: Cleanup whitespace and other miscellaneous style issues ext4: improve ext4_fill_flex_info() a bit ext4: Cleanup the block reservation code path ext4: don't assume extents can't cross block groups when truncating ext4: Fix lack of credits BUG() when deleting a badly fragmented inode ext4: Fix ext4_ext_journal_restart() ext4: fix ext4_da_write_begin error path jbd2: don't abort if flushing file data failed ext4: don't read inode block if the buffer has a write error ext4: Don't allow lg prealloc list to be grow large. ext4: Convert the usage of NR_CPUS to nr_cpu_ids. ext4: Improve error handling in mballoc ext4: lock block groups when initializing ext4: sync up block and inode bitmap reading functions ext4: Allow read/only mounts with corrupted block group checksums ext4: Fix data corruption when writing to prealloc area	2008-08-03 10:50:44 -07:00
Eric Sandeen	7d55992d60	ext4: remove write-only variables from ext4_ordered_write_end The variables 'from' and 'to' are not used anywhere. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-08-02 21:22:18 -04:00
OGAWA Hirofumi	17263849c7	fat: Fix allow_utime option FAT has to handle the newly introduced ATTR_TIMES_SET for allow_utime option. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-02 09:13:55 -07:00
Linus Torvalds	b8a327be3f	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-pull * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-pull: (64 commits) [XFS] Remove vn_revalidate calls in xfs. [XFS] Now that xfs_setattr is only used for attributes set from ->setattr [XFS] xfs_setattr currently doesn't just handle the attributes set through [XFS] fix use after free with external logs or real-time devices [XFS] A bug was found in xfs_bmap_add_extent_unwritten_real(). In a [XFS] fix compilation without CONFIG_PROC_FS [XFS] s/XFS_PURGE_INODE/IRELE/g s/VN_HOLD(XFS_ITOV())/IHOLD()/ [XFS] fix mount option parsing in remount [XFS] Disable queue flag test in barrier check. [XFS] streamline init/exit path [XFS] Fix up problem when CONFIG_XFS_POSIX_ACL is not set and yet we still [XFS] Don't assert if trying to mount with blocksize > pagesize [XFS] Don't update mtime on rename source [XFS] Allow xfs_bmbt_split() to fallback to the lowspace allocator [XFS] Restore the lowspace extent allocator algorithm [XFS] use minleft when allocating in xfs_bmbt_split() [XFS] attrmulti cleanup [XFS] Check for invalid flags in xfs_attrlist_by_handle. [XFS] Fix CI lookup in leaf-form directories [XFS] Use the generic xattr methods. ...	2008-08-01 12:39:09 -07:00
Linus Torvalds	63a16f9016	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: [PATCH] ocfs2: Release mutex in error handling code [PATCH] ocfs2: Fix oops when racing files truncates with writes into an mmap region [PATCH 2/2] ocfs2: Fix race between mount and recovery [PATCH 1/2] ocfs2: Add counter in struct ocfs2_dinode to track journal replays [PATCH] configfs: Convenience macros for attribute definition. [PATCH] configfs: Pin configfs subsystems separately from new config_items. [PATCH] configfs: Fix open directory making rmdir() fail [PATCH] configfs: Lock new directory inodes before removing on cleanup after failure [PATCH] configfs: Prevent userspace from creating new entries under attaching directories [PATCH] configfs: Fix failing symlink() making rmdir() fail [PATCH] configfs: Fix symlink() to a removing item [PATCH] configfs: Include linux/err.h in linux/configfs.h	2008-08-01 11:54:05 -07:00
Linus Torvalds	623fa579e6	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: [MTD] [NAND] drivers/mtd/nand/nandsim.c: fix printk warnings [MTD] [NAND] Blackfin NFC Driver: Cleanup the error exit path of bf5xx_nand_probe function [MTD] [NAND] Blackfin NFC Driver: use standard dev_err() rather than printk() [MTD] [NAND] Blackfin NFC Driver: enable Blackfin nand HWECC support by default [MTD] [NAND] Blackfin NFC Driver: add proper devinit/devexit markings to probe/remove functions [MTD] [NAND] Blackfin NFC Driver: add support for the ECC layout the Blackfin bootrom uses [MTD] [NAND] Blackfin NFC Driver: fix bug - hw ecc calc by making sure we extract 11 bits from each register instead of 10 [MTD] [NAND] Blackfin NFC Driver: fix bug - do not clobber the status from the first 256 bytes if operating on 512 pages [MTD] [NAND] diskonchip.c fix sparse endian warnings [MTD] [NAND] drivers/mtd/nand/nandsim.c needs div64.h [JFFS2] Fix allocation of summary buffer Fix rename of at91_nand -> atmel_nand [MTD] [NOR] drivers/mtd/chips/jedec_probe.c: fix Am29DL800BB device ID [MTD] MTD_DEBUG always does compile-time typechecks [MTD] DataFlash: bugfix, binary page sizes now handled [MTD] [NAND] fsl_elbc_nand.c: fix printk warning [MTD] [NAND] nandsim: support random page read command [MTD] [NAND] fix subpage read for small page NAND	2008-08-01 11:29:54 -07:00
Al Viro	8d66bf5481	[PATCH] pass struct path * to do_add_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:32 -04:00
Al Viro	d5686b444f	[PATCH] switch mtd and dm-table to lookup_bdev() No need to open-code it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:31 -04:00
Miklos Szeredi	a95164d979	[patch 3/4] vfs: remove unused nameidata argument of may_create() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:30 -04:00
Alexey Dobriyan	7ee7c12b71	[PATCH] devpts: switch to IDA Devpts code wants just numbers for tty indexes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:29 -04:00
Alexey Dobriyan	9a18540915	[PATCH 2/2] proc: switch inode number allocation to IDA proc doesn't use "associate pointer with id" feature of IDR, so switch to IDA. NOTE, NOTE, NOTE: Do not apply if release_inode_number() still mantions MAX_ID_MASK! Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:28 -04:00
Alexey Dobriyan	67935df49d	[PATCH 1/2] proc: fix inode number bogorithmetic Id which proc gets from IDR for inode number and id which proc removes from IDR do not match. E.g. 0x11a transforms into 0x8000011a. Which stayed unnoticed for a long time because, surprise, idr_remove() masks out that high bit before doing anything. All of this due to "\| ~MAX_ID_MASK" in release_inode_number(). I still don't understand how it's supposed to work, because "\| ~MASK" is not an inversion for "& MAX" operation. So, use just one nice, working addition. Make start offset unsigned int, while I'm at it. It's longness is not used anywhere. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:27 -04:00
Al Viro	8266602033	[PATCH] fix bdev leak in block_dev.c do_open() Callers expect it to drop reference to bdev on all failure exits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:26 -04:00
Al Viro	77e69dac3c	[PATCH] fix races and leaks in vfs_quota_on() users * new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the pathname resolution. * callers of vfs_quota_on() that do their own pathname resolution and checks based on it are switched to vfs_quota_on_path(); that way we avoid the races. * reiserfs leaked dentry/vfsmount references on several failure exits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:25 -04:00
Al Viro	1b7e190b47	[PATCH] clean dup2() up a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:24 -04:00
Al Viro	1027abe882	[PATCH] merge locate_fd() and get_unused_fd() New primitive: alloc_fd(start, flags). get_unused_fd() and get_unused_fd_flags() become wrappers on top of it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:23 -04:00
Stephen Smalley	f418b00607	Re: BUG at security/selinux/avc.c:883 (was: Re: linux-next: Tree for July 17: early crash on x86-64) SELinux needs MAY_APPEND to be passed down to the security hook. Otherwise, we get permission denials when only append permission is granted by policy even if the opening process specified O_APPEND. Shows up as a regression in the ltp selinux testsuite, fixed by this patch. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:21 -04:00
David Woodhouse	b7600dba6d	[JFFS2] Fix allocation of summary buffer We can't use vmalloc for the buffer we use for writing summaries, because some drivers may want to DMA from it. So limit the size to 64KiB and use kmalloc for it instead. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2008-08-01 10:07:51 +01:00
Julia Lawall	c259ae52e2	[PATCH] ocfs2: Release mutex in error handling code The mutex is released on a successful return, so it would seem that it should be released on an error return as well. The semantic patch finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression l; @@ mutex_lock(l); ... when != mutex_unlock(l) when any when strict ( if (...) { ... when != mutex_unlock(l) + mutex_unlock(l); return ...; } \| mutex_unlock(l); ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:14 -07:00
Sunil Mushran	961cecbee6	[PATCH] ocfs2: Fix oops when racing files truncates with writes into an mmap region This patch fixes an oops that is reproduced when one races writes to a mmap-ed region with another process truncating the file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:14 -07:00
Sunil Mushran	539d826409	[PATCH 2/2] ocfs2: Fix race between mount and recovery As the fs recovery is asynchronous, there is a small chance that another node can mount (and thus recover) the slot before the recovery thread gets to it. If this happens, the recovery thread will block indefinitely on the journal/slot lock as that lock will be held for the duration of the mount (by design) by the node assigned to that slot. The solution implemented is to keep track of the journal replays using a recovery generation in the journal inode, which will be incremented by the thread replaying that journal. The recovery thread, before attempting the blocking lock on the journal/slot lock, will compare the generation on disk with what it has cached and skip recovery if it does not match. This bug appears to have been inadvertently introduced during the mount/umount vote removal by mainline commit `34d024f843`. In the mount voting scheme, the messaging would indirectly indicate that the slot was being recovered. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:14 -07:00
Sunil Mushran	c69991aac7	[PATCH 1/2] ocfs2: Add counter in struct ocfs2_dinode to track journal replays This patch renames the ij_pad to ij_recovery_generation in struct ocfs2_dinode. This will be used to keep count of journal replays after an unclean shutdown. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Joel Becker	70526b6744	[PATCH] configfs: Pin configfs subsystems separately from new config_items. configfs_mkdir() creates a new item by calling its parent's ->make_item/group() functions. Once that object is created, configfs_mkdir() calls try_module_get() on the new item's module. If it succeeds, the module owning the new item cannot be unloaded, and configfs is safe to reference the item. If the item and the subsystem it belongs to are part of the same module, the subsystem is also pinned. This is the common case. However, if the subsystem is made up of multiple modules, this may not pin the subsystem. Thus, it would be possible to unload the toplevel subsystem module while there is still a child item. Thus, we now try_module_get() the subsystem's module. This only really affects children of the toplevel subsystem group. Deeper children already have their parents pinned. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Louis Rilling	99cefda42a	[PATCH] configfs: Fix open directory making rmdir() fail When checking for user-created elements under an item to be removed by rmdir(), configfs_detach_prep() counts fake configfs_dirents created by dir_open() as user-created and fails when finding one. It is however perfectly valid to remove a directory that is open. Simply make configfs_detach_prep() skip fake configfs_dirent, like it already does for attributes, and like detach_groups() does. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Louis Rilling	2e2ce171c3	[PATCH] configfs: Lock new directory inodes before removing on cleanup after failure Once a new configfs directory is created by configfs_attach_item() or configfs_attach_group(), a failure in the remaining initialization steps leads to removing a directory which inode the VFS may have already accessed. This commit adds the necessary inode locking to safely remove configfs directories while cleaning up after a failure. As an advantage, the locking rules of populate_groups() and detach_groups() become the same: the caller must have the group's inode mutex locked. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Louis Rilling	2a109f2a41	[PATCH] configfs: Prevent userspace from creating new entries under attaching directories process 1: process 2: configfs_mkdir("A") attach_group("A") attach_item("A") d_instantiate("A") populate_groups("A") mutex_lock("A") attach_group("A/B") attach_item("A") d_instantiate("A/B") mkdir("A/B/C") do_path_lookup("A/B/C", LOOKUP_PARENT) ok lookup_create("A/B/C") mutex_lock("A/B") ok configfs_mkdir("A/B/C") ok attach_group("A/C") attach_item("A/C") d_instantiate("A/C") populate_groups("A/C") mutex_lock("A/C") attach_group("A/C/D") attach_item("A/C/D") failure mutex_unlock("A/C") detach_groups("A/C") nothing to do mkdir("A/C/E") do_path_lookup("A/C/E", LOOKUP_PARENT) ok lookup_create("A/C/E") mutex_lock("A/C") ok configfs_mkdir("A/C/E") ok detach_item("A/C") d_delete("A/C") mutex_unlock("A") detach_groups("A") mutex_lock("A/B") detach_group("A/B") detach_groups("A/B") nothing since no _default_ group detach_item("A/B") mutex_unlock("A/B") d_delete("A/B") detach_item("A") d_delete("A") Two bugs: 1/ "A/B/C" and "A/C/E" are created, but never removed while their parent are removed in the end. The same could happen with symlink() instead of mkdir(). 2/ "A" and "A/C" inodes are not locked while detach_item() is called on them, which may probably confuse VFS. This commit fixes 1/, tagging new directories with CONFIGFS_USET_CREATING before building the inode and instantiating the dentry, and validating the whole group+default groups hierarchy in a second pass by clearing CONFIGFS_USET_CREATING. mkdir(), symlink(), lookup(), and dir_open() simply return -ENOENT if called in (or linking to) a directory tagged with CONFIGFS_USET_CREATING. This does not prevent userspace from calling stat() successfuly on such directories, but this prevents userspace from adding (children to \| symlinking from/to \| read/write attributes of \| listing the contents of) not validated items. In other words, userspace will not interact with the subsystem on a new item until the new item creation completes correctly. It was first proposed to re-use CONFIGFS_USET_IN_MKDIR instead of a new flag CONFIGFS_USET_CREATING, but this generated conflicts when checking the target of a new symlink: a valid target directory in the middle of attaching a new user-created child item could be wrongly detected as being attached. 2/ is fixed by next commit. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Louis Rilling	9a73d78cda	[PATCH] configfs: Fix failing symlink() making rmdir() fail On a similar pattern as mkdir() vs rmdir(), a failing symlink() may make rmdir() fail for the symlink's parent and the symlink's target as well. failing symlink() making target's rmdir() fail: process 1: process 2: symlink("A/S" -> "B") allow_link() create_link() attach to "B" links list rmdir("B") detach_prep("B") error because of new link configfs_create_link("A", "S") error (eg -ENOMEM) failing symlink() making parent's rmdir() fail: process 1: process 2: symlink("A/D/S" -> "B") allow_link() create_link() attach to "B" links list configfs_create_link("A/D", "S") make_dirent("A/D", "S") rmdir("A") detach_prep("A") detach_prep("A/D") error because of "S" create("S") error (eg -ENOMEM) We cannot use the same solution as for mkdir() vs rmdir(), since rmdir() on the target cannot wait on the i_mutex of the new symlink's parent without risking a deadlock (with other symlink() or sys_rename()). Instead we define a global mutex protecting all configfs symlinks attachment, so that rmdir() can avoid the races above. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Louis Rilling	4768e9b18d	[PATCH] configfs: Fix symlink() to a removing item The rule for configfs symlinks is that symlinks always point to valid config_items, and prevent the target from being removed. However, configfs_symlink() only checks that it can grab a reference on the target item, without ensuring that it remains alive until the symlink is correctly attached. This patch makes configfs_symlink() fail whenever the target is being removed, using the CONFIGFS_USET_DROPPING flag set by configfs_detach_prep() and protected by configfs_dirent_lock. This patch introduces a similar (weird?) behavior as with mkdir failures making rmdir fail: if symlink() races with rmdir() of the parent directory (or its youngest user-created ancestor if parent is a default group) or rmdir() of the target directory, and then fails in configfs_create(), this can make the racing rmdir() fail despite the concerned directory having no user-created entry (resp. no symlink pointing to it or one of its default groups) in the end. This behavior is fixed in later patches. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:12 -07:00
Joel Becker	dacdd0e047	[PATCH] configfs: Include linux/err.h in linux/configfs.h We now use PTR_ERR() in the ->make_item() and ->make_group() operations. Folks including configfs.h need err.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:12 -07:00
Linus Torvalds	0056e65f9e	romfs_readpage: don't report errors for pages beyond i_size We zero-fill them like we are supposed to, and that's all fine. It's only an error if the 'romfs_copyfrom()' routine isn't able to fill the data that is supposed to be there. Most of the patch is really just re-organizing the code a bit, and using separate variables for the error value and for how much of the page we actually filled from the filesystem. Reported-and-tested-by: Chris Fester <cfester@wms.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Matt Waddel <matt.waddel@freescale.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 14:30:34 -07:00
Thomas Petazzoni	dbacefc9c4	fs/buffer.c: uninline __remove_assoc_queue() Uninline the __remove_assoc_queue() function in fs/buffer.c, called at too many places and too long to really be inlined. Size results: text data bss dec hex filename 1134606 118840 212992 1466438 166046 vmlinux.old 1134303 118840 212992 1466135 165f17 vmlinux -303 0 0 -303 -12F +/- This patch is part of the Linux Tiny project and has been originally written by Matt Mackall <mpm@selenic.com>. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:46 -07:00
Harvey Harrison	d406f66ddb	omfs: sparse annotations Missing cpu_to_be64 on some constant assignments. fs/omfs/dir.c:107:16: warning: incorrect type in assignment (different base types) fs/omfs/dir.c:107:16: expected restricted __be64 [usertype] i_sibling fs/omfs/dir.c:107:16: got unsigned long long fs/omfs/file.c:33:13: warning: incorrect type in assignment (different base types) fs/omfs/file.c:33:13: expected restricted __be64 [usertype] e_next fs/omfs/file.c:33:13: got unsigned long long fs/omfs/file.c:36:24: warning: incorrect type in assignment (different base types) fs/omfs/file.c:36:24: expected restricted __be64 [usertype] e_cluster fs/omfs/file.c:36:24: got unsigned long long fs/omfs/file.c:37:23: warning: incorrect type in assignment (different base types) fs/omfs/file.c:37:23: expected restricted __be64 [usertype] e_blocks fs/omfs/file.c:37:23: got unsigned long long fs/omfs/bitmap.c:74:18: warning: incorrect type in argument 2 (different signedness) fs/omfs/bitmap.c:74:18: expected unsigned long volatile addr fs/omfs/bitmap.c:74:18: got long <noident> fs/omfs/bitmap.c:77:20: warning: incorrect type in argument 2 (different signedness) fs/omfs/bitmap.c:77:20: expected unsigned long volatile addr fs/omfs/bitmap.c:77:20: got long <noident> fs/omfs/bitmap.c:112:17: warning: incorrect type in argument 2 (different signedness) fs/omfs/bitmap.c:112:17: expected unsigned long volatile addr fs/omfs/bitmap.c:112:17: got long <noident> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:46 -07:00
Alex Nixon	3971e1a917	VFS: increase pseudo-filesystem block size to PAGE_SIZE This commit: commit `ba52de123d` Author: Theodore Ts'o <tytso@mit.edu> Date: Wed Sep 27 01:50:49 2006 -0700 [PATCH] inode-diet: Eliminate i_blksize from the inode structure caused the block size used by pseudo-filesystems to decrease from PAGE_SIZE to 1024 leading to a doubling of the number of context switches during a kernbench run. Signed-off-by: Alex Nixon <Alex.Nixon@citrix.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:44 -07:00
Eric Sandeen	7fcba05437	eCryptfs: use page_alloc not kmalloc to get a page of memory With SLUB debugging turned on in 2.6.26, I was getting memory corruption when testing eCryptfs. The root cause turned out to be that eCryptfs was doing kmalloc(PAGE_CACHE_SIZE); virt_to_page() and treating that as a nice page-aligned chunk of memory. But at least with SLUB debugging on, this is not always true, and the page we get from virt_to_page does not necessarily match the PAGE_CACHE_SIZE worth of memory we got from kmalloc. My simple testcase was 2 loops doing "rm -f fileX; cp /tmp/fileX ." for 2 different multi-megabyte files. With this change I no longer see the corruption. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Yoichi Yuasa	e3b6e806cf	bio-integrity: remove EXPORT_SYMBOL for bio_integrity_init_slab() I got section mismatch message about bio_integrity_init_slab(). WARNING: fs/built-in.o(__ksymtab+0xb60): Section mismatch in reference from the variable __ksymtab_bio_integrity_init_slab to the function .init.text:bio_integrity_init_slab() The symbol bio_integrity_init_slab is exported and annotated __init Fix this by removing the __init annotation of bio_integrity_init_slab or drop the export. It only call from init_bio(). The EXPORT_SYMBOL() can be removed. Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Hisashi Hifumi	8ab22b9abb	vfs: pagecache usage optimization for pagesize!=blocksize When we read some part of a file through pagecache, if there is a pagecache of corresponding index but this page is not uptodate, read IO is issued and this page will be uptodate. I think this is good for pagesize == blocksize environment but there is room for improvement on pagesize != blocksize environment. Because in this case a page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate. So I suggest that when all buffers which correspond to a part of a file that we want to read are uptodate, use this pagecache and copy data from this pagecache to user buffer even if a page is not uptodate. This can reduce read IO and improve system throughput. I wrote a benchmark program and got result number with this program. This benchmark do: 1: mount and open a test file. 2: create a 512MB file. 3: close a file and umount. 4: mount and again open a test file. 5: pwrite randomly 300000 times on a test file. offset is aligned by IO size(1024bytes). 6: measure time of preading randomly 100000 times on a test file. The result was: 2.6.26 330 sec 2.6.26-patched 226 sec Arch:i386 Filesystem:ext3 Blocksize:1024 bytes Memory: 1GB On ext3/4, a file is written through buffer/block. So random read/write mixed workloads or random read after random write workloads are optimized with this patch under pagesize != blocksize environment. This test result showed this. The benchmark program is as follows: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #define LEN 1024 #define LOOP 1024512 / 512MB / main(void) { unsigned long i, offset, filesize; int fd; char buf[LEN]; time_t t1, t2; if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } memset(buf, 0, LEN); fd = open("/root/test1/testfile", O_CREAT\|O_RDWR\|O_TRUNC); if (fd < 0) { perror("cannot open file\n"); exit(1); } for (i = 0; i < LOOP; i++) write(fd, buf, LEN); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } fd = open("/root/test1/testfile", O_RDWR); if (fd < 0) { perror("cannot open file\n"); exit(1); } filesize = LEN LOOP; for (i = 0; i < 300000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pwrite(fd, buf, LEN, offset); } printf("start test\n"); time(&t1); for (i = 0; i < 100000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pread(fd, buf, LEN, offset); } time(&t2); printf("%ld sec\n", t2-t1); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } } Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Hugh Dickins	ca5b172bd2	exec: include pagemap.h again to fix build Fix compilation errors on avr32 and without CONFIG_SWAP, introduced by `ba92a43dba` ("exec: remove some includes") In file included from include/asm/tlb.h:24, from fs/exec.c:55: include/asm-generic/tlb.h: In function 'tlb_flush_mmu': include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages' include/asm-generic/tlb.h: In function 'tlb_remove_page': include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release' make[1]: *** [fs/exec.o] Error 1 This straightforward part-revert is nobody's favourite patch to address the underlying tlb.h needs swap.h needs pagemap.h (but sparc won't like that) mess; but appropriate to fix the build now before any overhaul. Reported-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Reported-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Tested-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:20 -07:00
Linus Torvalds	3988ba0708	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fix uninitialized variable for search_rsb_list callers dlm: release socket on error dlm: fix basts for granted CW waiting PR/CW dlm: check for null in device_write	2008-07-28 09:46:00 -07:00
Paul Mundt	3bc24a1a54	sh: Initial ELF FDPIC support. This adds initial support for ELF FDPIC on MMU-less SH, as per version 0.2 of the ABI definition at: http://www.codesourcery.com/public/docs/sh-fdpic/sh-fdpic-abi.txt Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-07-28 18:10:28 +09:00
Paul Mundt	9b14ec35f0	binfmt_elf_fdpic: Magical stack pointer index, for NEW_AUX_ENT compat. While implementing binfmt_elf_fdpic on SH it quickly became apparent that SH was the first platform to support both binfmt_elf_fdpic and binfmt_elf, as well as the only of the FDPIC platforms to make use of the auxvt. Currently binfmt_elf_fdpic uses a special version of NEW_AUX_ENT() where the first argument is the entry displacement after csp has been adjusted, being reset after each adjustment. As we have no ability to sort this out through the platform's ARCH_DLINFO, this index needs to be managed entirely in create_elf_fdpic_tables(). Presently none of the platforms that set their own auxvt entries are able to do so through their respective ARCH_DLINFOs when using binfmt_elf_fdpic. In addition to this, binfmt_elf_fdpic has been looking at DLINFO_ARCH_ITEMS for the number of architecture-specific entries in the auxvt. This is legacy cruft, and is not defined by any platforms in-tree, even those that make heavy use of the auxvt. AT_VECTOR_SIZE_ARCH is always available, and contains the number that is of interest here, so we switch to using that unconditionally as well. As this has direct bearing on how much stack is used, platforms that have configurable (or dynamically adjustable) NEW_AUX_ENT calls need to either make AT_VECTOR_SIZE_ARCH more fine-grained, or leave it as a worst-case and live with some lost stack space if those entries aren't pushed (some platforms may also need to purposely sacrifice some space here for alignment considerations, as noted in the code -- although not an issue for any FDPIC-capable platform today). Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: David Howells <dhowells@redhat.com>	2008-07-28 18:10:28 +09:00
Christoph Hellwig	f13fae2d2a	[XFS] Remove vn_revalidate calls in xfs. These days most of the attributes in struct inode are properly kept in sync by XFS. This patch removes the need for vn_revalidate completely by: - keeping inode.i_flags uptodate after any flags are updated in xfs_ioctl_setattr - keeping i_mode, i_uid and i_gid uptodate in xfs_setattr SGI-PV: 984566 SGI-Modid: xfs-linux-melb:xfs-kern:31679a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:39 +10:00
Christoph Hellwig	0f285c8a1c	[XFS] Now that xfs_setattr is only used for attributes set from ->setattr it can be switched to take struct iattr directly and thus simplify the implementation greatly. Also rename the ATTR_ flags to XFS_ATTR_ to not conflict with the ATTR_ flags used by the VFS. SGI-PV: 984565 SGI-Modid: xfs-linux-melb:xfs-kern:31678a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:37 +10:00
Christoph Hellwig	25fe55e814	[XFS] xfs_setattr currently doesn't just handle the attributes set through ->setattr but also addition XFS-specific attributes: project id, inode flags and extent size hint. Having these in a single function makes it more complicated and forces to have us a bhv_vattr intermediate structure eating up stackspace. This patch adds a new xfs_ioctl_setattr helper for the XFS ioctls that set these attributes and remove the code to set them through xfs_setattr. SGI-PV: 984564 SGI-Modid: xfs-linux-melb:xfs-kern:31677a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:36 +10:00
Lachlan McIlroy	c032bfcf46	[XFS] fix use after free with external logs or real-time devices SGI-PV: 983806 SGI-Modid: xfs-linux-melb:xfs-kern:31666a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-07-28 16:59:34 +10:00
Tim Shimmin	6a617dd22b	[XFS] A bug was found in xfs_bmap_add_extent_unwritten_real(). In a particular case, the delta param which is supposed to describe the region where extents have changed was not updated appropriately. SGI-PV: 984030 SGI-Modid: xfs-linux-melb:xfs-kern:31663a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Olaf Weber <olaf@sgi.com>	2008-07-28 16:59:32 +10:00
Christoph Hellwig	766b0925c0	[XFS] fix compilation without CONFIG_PROC_FS SGI-PV: 984019 SGI-Modid: xfs-linux-melb:xfs-kern:31408a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:31 +10:00
Christoph Hellwig	26cc002180	[XFS] s/XFS_PURGE_INODE/IRELE/g s/VN_HOLD(XFS_ITOV())/IHOLD()/ SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31405a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:29 +10:00
Christoph Hellwig	62a877e35d	[XFS] fix mount option parsing in remount Remount currently happily accept any option thrown at it, although the only filesystem specific option it actually handles is barrier/nobarrier. And it actually doesn't handle these correctly either because it only uses the value it parsed when we're doing a ro->rw transition. In addition to that there's also a bad bug in xfs_parseargs which doesn't touch the actual option in the mount point except for a single one, XFS_MOUNT_SMALL_INUMS and thus forced any filesystem that's every remounted in some way to not support 64bit inodes with no way to recover unless unmounted. This patch changes xfs_fs_remount to use it's own linux/parser.h based options parse instead of xfs_parseargs and reject all options except for barrier/nobarrier and to the right thing in general. Eventually I'd like to have a single big option table used for mount aswell but that can wait for a while. SGI-PV: 983964 SGI-Modid: xfs-linux-melb:xfs-kern:31382a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:28 +10:00
Eric Sandeen	deeb5912db	[XFS] Disable queue flag test in barrier check. md raid1 can pass down barriers, but does not set an ordered flag on the queue, so xfs does not even attempt a barrier write, and will never use barriers on these block devices. Remove the flag check and just let the barrier write test determine barrier support. A possible risk here is that if something does not set an ordered flag and also does not properly return an error on a barrier write... but if it's any consolation jbd/ext3/reiserfs never test the flag, and don't even do a test write, they just disable barriers the first time an actual journal barrier write fails. SGI-PV: 983924 SGI-Modid: xfs-linux-melb:xfs-kern:31377a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:26 +10:00
Christoph Hellwig	9f8868ffb3	[XFS] streamline init/exit path Currently the xfs module init/exit code is a mess. It's farmed out over a lot of function with very little error checking. This patch makes sure we propagate all initialization failures properly and clean up after them. Various runtime initializations are replaced with compile-time initializations where possible to make this easier. The exit path is similarly consolidated. There's now split out function to create/destroy the kmem zones and alloc/free the trace buffers. I've also changed the ktrace allocations to KM_MAYFAIL and handled errors resulting from that. And yes, we really should replace the XFS_*_TRACE ifdefs with a single XFS_TRACE.. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:31354a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:59:25 +10:00

1 2 3 4 5 ...

9765 Commits