linux

Author	SHA1	Message	Date
Ian Kent	be3ca7fecb	[PATCH] autofs4: autofs4_follow_link false negative fix The check for an empty directory in the autofs4_follow_link method fails occassionally due to old dentrys. We had the same problem autofs4_revalidate ages ago. I thought we wouldn't need this in autofs4_follow_link, silly me. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:18 -07:00
Jonathan Corbet	cf3e43dbe0	[PATCH] cdev documentation Add some documentation comments for the cdev interface. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Acked-by: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:16 -07:00
Alan Cox	3cfd0885fa	[PATCH] tty: stop the tty vanishing under procfs access Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:16 -07:00
Joel & Rebecca VanderZee	fb50ae7446	[PATCH] I/O Error attempting to read last partial block of a file in an ISO9660 file system There was an I/O error that prevented reading the last partial block of large files in an ISO9660 filesystem. The error was generated when a file comprised more than one section and had a size that was not an exact multiple of the filesystem block size. This patch removes the check (and failure) for reading into the last partial block (and possibly beyond) for multiple-section files. It worked in my testing to prevent reading beyond the end of the section; my first patch just incremented the sect_size block count for a partial block and continued doing the check. But there is a commment in the source code about reading beyond the end of the file to fill a page cache. Failing to access beyond the section would prevent reading beyond the end of the file. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Jan Kara	b525a7e444	[PATCH] dquot: add proper locking when using current->signal->tty Dquot passes the tty to tty_write_message without locking Signed-off-by: Jan Kara <jack@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:14 -07:00
Oleg Nesterov	bce9a234ce	[PATCH] elf_fdpic_core_dump: don't take tasklist_lock do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep in wait_for_completion(&mm->core_done) at this point, so we can use RCU locks. Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:14 -07:00
Oleg Nesterov	486ccb05fd	[PATCH] elf_core_dump: don't take tasklist_lock do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep in wait_for_completion(&mm->core_done) at this point, so we can use RCU locks. Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:14 -07:00
Ernie Petrides	ee731f4f78	[PATCH] fix wrong error code on interrupted close syscalls The problem is that close() syscalls can call a file system's flush handler, which in turn might sleep interruptibly and ultimately pass back an -ERESTARTSYS return value. This happens for files backed by an interruptible NFS mount under nfs_file_flush() when a large file has just been written and nfs_wait_bit_interruptible() detects that there is a signal pending. I have a test case where the "strace" command is used to attach to a process sleeping in such a close(). Since the SIGSTOP is forced onto the victim process (removing it from the thread's "blocked" mask in force_sig_info()), the RPC wait is interrupted and the close() is terminated early. But the file table entry has already been cleared before the flush handler was called. Thus, when the syscall is restarted, the file descriptor appears closed and an EBADF error is returned (which is wrong). What's worse, there is the hypothetical case where another thread of a multi-threaded application might have reused the file descriptor, in which case that file would be mistakenly closed. The bottom line is that close() syscalls are not restartable, and thus -ERESTARTSYS return values should be mapped to -EINTR. This is consistent with the close(2) manual page. The fix is below. Signed-off-by: Ernie Petrides <petrides@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:13 -07:00
Amos Waterland	01d553d0fe	[PATCH] Chardev checking of overlapping ranges The code in __register_chrdev_region checks that if the driver wishing to register has the same major as an existing driver the new minor range is strictly less than the existing minor range. However, it does not also check that the new minor range is strictly greater than the existing minor range. That is, if driver X has registered with major=x and minor=0-3, __register_chrdev_region will allow driver Y to register with major=x and minor=1-4. Signed-off-by: Amos Waterland <apw@us.ibm.com> Cc: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:12 -07:00
Kirill Korotaev	3b9b8ab65d	[PATCH] Fix unserialized task->files changing Fixed race on put_files_struct on exec with proc. Restoring files on current on error path may lead to proc having a pointer to already kfree-d files_struct. ->files changing at exit.c and khtread.c are safe as exit_files() makes all things under lock. Found during OpenVZ stress testing. [akpm@osdl.org: add export] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:12 -07:00
Chris Mason	ae78bf9c4f	[PATCH] add -o flush for fat Fat is commonly used on removable media. Mounting with -o flush tells the FS to write things to disk as quickly as possible. It is like -o sync, but much faster (and not as safe). Signed-off-by: Chris Mason <mason@suse.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:12 -07:00
Alexey Dobriyan	50462062a0	[PATCH] fs.h: ifdef security fields [assuming BSD security levels are deleted] The only user of i_security, f_security, s_security fields is SELinux, however, quite a few security modules are trying to get into kernel. So, wrap them under CONFIG_SECURITY. Adding config option for each security field is likely an overkill. Following Stephen Smalley's suggestion, i_security initialization is moved to security_inode_alloc() to not clutter core code with ifdefs and make alloc_inode() codepath tiny little bit smaller and faster. The user of (highly greppable) struct fown_struct::security field is still to be found. I've checked every "fown_struct" and every "f_owner" occurence. Additionally it's removal doesn't break i386 allmodconfig build. struct inode, struct file, struct super_block, struct fown_struct become smaller. P.S. Combined with two reiserfs inode shrinking patches sent to linux-fsdevel, I can finally suck 12 reiserfs inodes into one page. /proc/slabinfo -ext2_inode_cache 388 10 +ext2_inode_cache 384 10 -inode_cache 280 14 +inode_cache 276 14 -proc_inode_cache 296 13 +proc_inode_cache 292 13 -reiser_inode_cache 336 11 +reiser_inode_cache 332 12 <= -shmem_inode_cache 372 10 +shmem_inode_cache 368 10 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Alexey Dobriyan	cfe14677f2	[PATCH] reiserfs: ifdef ACL stuff from inode Shrink reiserfs inode more (by 8 bytes) for ACL non-users: -reiser_inode_cache 344 11 +reiser_inode_cache 336 11 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Alexey Dobriyan	068fbb315d	[PATCH] reiserfs: ifdef xattr_sem Shrink reiserfs inode by 12 bytes for xattr non-users (me). -reiser_inode_cache 356 11 +reiser_inode_cache 344 11 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Chris Mason	a317202714	[PATCH] Fix reiserfs latencies caused by data=ordered ReiserFS does periodic cleanup of old transactions in order to limit the length of time a journal replay may take after a crash. Sometimes, writing metadata from an old (already committed) transaction may require committing a newer transaction, which also requires writing all data=ordered buffers. This can cause very long stalls on journal_begin. This patch makes sure new transactions will not need to be committed before trying a periodic reclaim of an old transaction. It is low risk because if a bad decision is made, it just means a slightly longer journal replay after a crash. Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Chris Mason	25736b1c69	[PATCH] reiserfs_fsync should only use barriers when they are enabled make sure that reiserfs_fsync only triggers barriers when mounted with -o barrier=flush Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Eric Sandeen	39b3f6d6e9	[PATCH] mount udf UDF_PART_FLAG_READ_ONLY partitions with MS_RDONLY There's a bug where a UDF_PART_FLAG_READ_ONLY udf partition gets mounted read-write, then subsequent problems happen; files seem to be able to be removed, but file creation results in EIO or worse, oops. EIO is coming from udf_new_block(), which returns EIO if the right flags aren't set; only UDF_PART_FLAG_READ_ONLY is set in this case. We probably s hould not have gotten this far... Attached patch seems to fix it - and includes a printk to alert the user that their "rw" mount request has been converted to "ro." Here's the testcase I used: [root@magnesium ~]# mkisofs -R -J -udf -o testiso /tmp/ ... Total translation table size: 0 Total rockridge attributes bytes: 342923 Total directory bytes: 382312 Path table size(bytes): 104 Max brk space used 103000 105059 extents written (205 MB) [root@magnesium ~]# mount -o loop testiso /mnt/test/ [root@magnesium ~]# ls /mnt/test/fsfile /mnt/test/fsfile [root@magnesium ~]# rm /mnt/test/fsfile [root@magnesium ~]# ls /mnt/test/fsfile ls: /mnt/test/fsfile: No such file or directory [root@magnesium ~]# touch /mnt/test/fsfile touch: cannot touch `/mnt/test/fsfile': Input/output error [root@magnesium tmp]# grep udf /proc/mounts /dev/loop1 /mnt/test udf rw 0 0 Force readonly mounts of UDF partitions marked as read-only. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:09 -07:00
Olaf Hering	e1dfa92dca	[PATCH] ignore partition table on disks with AIX label The on-disk data structures from AIX are not known, also the filesystem layout is not known. There is a msdos partition signature at the end of the first block, and the kernel recognizes 3 small (and overlapping) partitions. But they are not usable. Maybe the firmware uses it to find the bootloader for AIX, but AIX boots also if the first block is cleared. This is the content of the partition table: # dd if=/dev/sdb count=$(( 4 * 16 )) bs=1 skip=$(( 0x1be )) \| xxd 0000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000010: 80ff ffff 41ff ffff 1b11 0000 381b 0000 ....A.......8... 0000020: 00ff ffff 41ff ffff 0211 0000 1900 0000 ....A........... 0000030: 80ff ffff 41ff ffff 1b11 0000 381b 0000 ....A.......8... Handle the whole disk as empty disk. This fixes also YaST which compares the output from parted (and formerly fdisk) with /proc/partitions. fdisk recognizes the AIX label since a long time, SuSE has a patch for parted to handle the disk label as unknown. dmesg will look like this: sda: [AIX] unknown partition table Tested on an IBM B50 with AIX V4.3.3. Signed-off-by: Olaf Hering <olh@suse.de> Cc: Albert Cahalan <acahalan@gmail.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:09 -07:00
Miklos Szeredi	650a898342	[PATCH] vfs: define new lookup flag for chdir In the "operation does permission checking" model used by fuse, chdir permission is not checked, since there's no chdir method. For this case set a lookup flag, which will be passed to ->permission(), so fuse can distinguish it from permission checks for other operations. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:08 -07:00
Miklos Szeredi	5b35e8e58a	[PATCH] fuse: use dentry in statfs Some filesystems may want to report different values depending on the path within the filesystem, i.e. one mount is actually several filesystems. This can be the case for a network filesystem exported by an unprivileged server (e.g. sshfs). This is now possible, thanks to David Howells "VFS: Permit filesystem to perform statfs with a known root dentry" patch. This change is backward compatible, so no need to change interface version. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:08 -07:00
Eugene Teo	8454aeef6f	[PATCH] Require mmap handler for a.out executables Files supported by fs/proc/base.c, i.e. /proc/<pid>/, are not capable of meeting the validity checks in ELF load_elf_() handling because they have no mmap handler which is required by ELF. In order to stop a.out executables being used as part of an exploit attack against /proc-related vulnerabilities, we make a.out executables depend on ->mmap() existing. Signed-off-by: Eugene Teo <eteo@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:08 -07:00
Josh Triplett	9c4dbee79d	[PATCH] fs: add lock annotation to grab_super grab_super gets called with sb_lock held, and releases it. Add a lock annotation to this function so that sparse can check callers for lock pairing, and so that sparse will not complain about this function since it intentionally uses the lock in this manner. Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:08 -07:00
Josh Triplett	ddc0a51d2e	[PATCH] hugetlbfs: add lock annotation to hugetlbfs_forget_inode() hugetlbfs_forget_inode releases inode_lock. Add a lock annotation to this function so that sparse can check callers for lock pairing, and so that sparse will not complain about this functions since it intentionally uses the lock in this manner. Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:08 -07:00
Josh Triplett	105f4d7a81	[PATCH] fuse: add lock annotations to request_end and fuse_read_interrupt request_end and fuse_read_interrupt release fc->lock. Add lock annotations to these two functions so that sparse can check callers for lock pairing, and so that sparse will not complain about these functions since they intentionally use locks in this manner. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:08 -07:00
Josh Triplett	99fc705996	[PATCH] afs: add lock annotations to afs_proc_cell_servers_{start,stop} afs_proc_cell_servers_start acquires a lock, and afs_proc_cell_servers_stop releases that lock. Add lock annotations to these two functions so that sparse can check callers for lock pairing, and so that sparse will not complain about these functions since they intentionally use locks in this manner. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:07 -07:00
Josh Triplett	58f555e5f6	[PATCH] mbcache: add lock annotation for __mb_cache_entry_release_unlock() __mb_cache_entry_release_unlock releases mb_cache_spinlock, so annotate it accordingly. Signed-off-by: Josh Triplett <josh@freedesktop.org> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:07 -07:00
Olaf Hering	42012cc4a2	[PATCH] use gcc -O1 in fs/reiserfs only for ancient gcc versions Only compile with -O1 if the (very old) compiler is broken. We use reiserfs alot since SLES9 on ppc64, and it was never seen with gcc33. Assume the broken gcc is gcc-3.4 or older. Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:07 -07:00
Pekka J Enberg	c0d92cbc58	[PATCH] libfs: remove page up-to-date check from simple_readpage Remove the unnecessary PageUptodate check from simple_readpage. The only two callers for ->readpage that don't have explicit PageUptodate check are read_cache_pages and page_cache_read which operate on newly allocated pages which don't have the flag set. [akpm: use the allegedly-faster clear_page(), too] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:06 -07:00
Randy Dunlap	15a67dd8cc	[PATCH] fs/namespace: handle init/registration errors Check and handle init errors. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:05 -07:00
Andrew Morton	4d7dd8fd95	[PATCH] blockdev.c: check driver layer errors Check driver layer errors. Fix from: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com> In blockdevc-check-errors.patch, add_bd_holder() is modified to return error values when some of its operation failed. Among them, it returns -EEXIST when a given bd_holder object already exists in the list. However, in this case, the function completed its work successfully and need no action by its caller other than freeing unused bd_holder object. So I think it's better to return success after freeing by itself. Otherwise, bd_claim-ing with same claim pointer will fail. Typically, lvresize will fails with following message: device-mapper: reload ioctl failed: Invalid argument and you'll see messages like below in kernel log: device-mapper: table: 254:13: linear: dm-linear: Device lookup failed device-mapper: ioctl: error adding target to table Similarly, it should not add bd_holder to the list if either one of symlinking fails. I don't have a test case for this to happen but it should cause dereference of freed pointer. If a matching bd_holder is found in bd_holder_list, add_bd_holder() completes its job by just incrementing the reference count. In this case, it should be considered as success but it used to return 'fail' to let the caller free temporary bd_holder. Fixed it to return success and free given object by itself. Also, if either one of symlinking fails, the bd_holder should not be added to the list so that it can be discarded later. Otherwise, the caller will free bd_holder which is in the list. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:04 -07:00
Zoltan Menyhart	d1807793e1	[PATCH] JBD: memory leak in "journal_init_dev()" We leak a bh ref in "journal_init_dev()" in case of failure. Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:03 -07:00
Dave Kleikamp	f71b2f10f5	[PATCH] JBD: Make journal_brelse_array() static It's always good to make symbols static when we can, and this also eliminates the need to rename the function in jbd2 Suggested by Eric Sandeen. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:03 -07:00
Tim Shimmin	65e8697a12	[XFS] Remove v1 dir trace macro - missed in a past commit. Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-29 15:23:02 +10:00
Vlad Apostolov	6e73b41888	[XFS] 955947: Infinite loop in xfs_bulkstat() on formatter() error SGI-PV: 955947 SGI-Modid: xfs-linux-melb:xfs-kern:26986a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:06:21 +10:00
Vlad Apostolov	6f1f216840	[XFS] pv 956241, author: nathans, rv: vapo - make ino validation checks consistent in bulkstat SGI-PV: 956241 SGI-Modid: xfs-linux-melb:xfs-kern:26984a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:06:15 +10:00
Vlad Apostolov	6216ff1883	[XFS] pv 956240, author: nathans, rv: vapo - Minor fixes in kmem_zalloc_greedy() SGI-PV: 956240 SGI-Modid: xfs-linux-melb:xfs-kern:26983a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:06:10 +10:00
David Chinner	f273ab848b	[XFS] Really fix use after free in xfs_iunpin. The previous attempts to fix the linux inode use-after-free in xfs_iunpin simply made the problem harder to hit. We actually need complete exclusion between xfs_reclaim and xfs_iunpin, as well as ensuring that the i_flags are consistent during both of these functions. Introduce a new spinlock for exclusion and the i_flags, and fix up xfs_iunpin to use igrab before marking the inode dirty. SGI-PV: 952967 SGI-Modid: xfs-linux-melb:xfs-kern:26964a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:06:03 +10:00
Eric Sandeen	01106eae97	[XFS] Collapse sv_init and init_sv into just the one interface. SGI-PV: 907752 SGI-Modid: xfs-linux-melb:xfs-kern:26925a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:05:52 +10:00
Eric Sandeen	7ae67d78e7	[XFS] standardize on one sema init macro One sema to rule them all, one sema to find them... SGI-PV: 907752 SGI-Modid: xfs-linux-melb:xfs-kern:26911a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:05:46 +10:00
Eric Sandeen	91d8723204	[XFS] Reduce endian flipping in alloc_btree, same as was done for ialloc_btree. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26910a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:05:40 +10:00
Nathan Scott	edcd4bce5e	[XFS] Minor cleanup from dio locking fix, remove an extra conditional. SGI-PV: 955696 SGI-Modid: xfs-linux-melb:xfs-kern:26908a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:05:33 +10:00
Nathan Scott	215101c360	[XFS] Fix kmem_zalloc_greedy warnings on 64 bit platforms. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26907a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:04:43 +10:00
Vlad Apostolov	e132f54ce8	[XFS] pv 955157, rv bnaujok - break the loop on EFAULT formatter() error SGI-PV: 955157 SGI-Modid: xfs-linux-melb:xfs-kern:26869a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:04:31 +10:00
Vlad Apostolov	22de606a0b	[XFS] pv 955157, rv bnaujok - break the loop on formatter() error SGI-PV: 955157 SGI-Modid: xfs-linux-melb:xfs-kern:26866a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:04:24 +10:00
Tim Shimmin	955e47ad28	[XFS] Fixes the leak in reservation space because we weren't ungranting space for the unmount record - which becomes a problem in the freeze/thaw scenario. SGI-PV: 942533 SGI-Modid: xfs-linux-melb:xfs-kern:26815a Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:04:16 +10:00
Josh Triplett	22d91f65d5	[XFS] Add lock annotations to xfs_trans_update_ail and xfs_trans_delete_ail xfs_trans_update_ail and xfs_trans_delete_ail get called with the AIL lock held, and release it. Add lock annotations to these two functions so that sparse can check callers for lock pairing, and so that sparse will not complain about these functions since they intentionally use locks in this manner. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26807a Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:04:07 +10:00
Nathan Scott	68c3271515	[XFS] Fix a porting botch on the realtime subvol growfs code path. SGI-PV: 955515 SGI-Modid: xfs-linux-melb:xfs-kern:26806a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:53 +10:00
Nathan Scott	d432c80e68	[XFS] Minor code rearranging and cleanup to prevent some coverity false positives. SGI-PV: 955502 SGI-Modid: xfs-linux-melb:xfs-kern:26805a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:44 +10:00
Nathan Scott	b627259c60	[XFS] Remove a no-longer-correct debug assert from dio completion handling. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26804a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:33 +10:00
Nathan Scott	77e4635ae1	[XFS] Add a greedy allocation interface, allocating within a min/max size range. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26803a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:27 +10:00
Nathan Scott	572d95f49f	[XFS] Improve error handling for the zero-fsblock extent detection code. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26802a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:20 +10:00
Nathan Scott	948ecdb4c1	[XFS] Be more defensive with page flags (error/private) for metadata buffers. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26801a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:13 +10:00
Nathan Scott	efb8ad7e94	[XFS] Add a debug flag for allocations which are known to be larger than one page. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26800a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:05 +10:00
Eric Sandeen	3f89243c5b	[XFS] Remove several macros that are no longer used anywhere SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26749a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:57 +10:00
Eric Sandeen	065d312e15	[XFS] Remove unused iop_abort log item operation SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26747a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:44 +10:00
Eric Sandeen	43129c16e8	[XFS] Remove a couple of unused BUF macros SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26746a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:37 +10:00
Vlad Apostolov	17370097da	[XFS] pass file mode on DMAPI remove events SGI-PV: 953687 SGI-Modid: xfs-linux-melb:xfs-kern:26639a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:30 +10:00
Nathan Scott	745b1f47fc	[XFS] Remove last bulkstat false-positives with debug kernels. SGI-PV: 953819 SGI-Modid: xfs-linux-melb:xfs-kern:26628a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:23 +10:00
Nathan Scott	a3c6685eaa	[XFS] Ensure xlog_state_do_callback does not report spurious warnings on ramdisks. SGI-PV: 954802 SGI-Modid: xfs-linux-melb:xfs-kern:26627a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:14 +10:00
Nathan Scott	bb3c7d2936	[XFS] Increase the size of the buffer holding the local inode cluster list, to increase our potential readahead window and in turn improve bulkstat performance. SGI-PV: 944409 SGI-Modid: xfs-linux-melb:xfs-kern:26607a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:09 +10:00
Nathan Scott	2627509330	[XFS] Drop unneeded endian conversion in bulkstat and start readahead for batches of inode cluster buffers at once, before any blocking reads are issued. SGI-PV: 944409 SGI-Modid: xfs-linux-melb:xfs-kern:26606a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:03 +10:00
Nathan Scott	51bdd70681	[XFS] When issuing metadata readahead, submit bio with READA not READ. SGI-PV: 944409 SGI-Modid: xfs-linux-melb:xfs-kern:26603a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:01:57 +10:00
Nathan Scott	8b56f083c2	[XFS] Rework DMAPI bulkstat calls in such a way that we can directly extract inline attributes out of the bulkstat buffer (for that case), rather than using an (extremely expensive for large icount filesystems) iget for fetching attrs. SGI-PV: 944409 SGI-Modid: xfs-linux-melb:xfs-kern:26602a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:01:46 +10:00
Tim Shimmin	726801ba06	[XFS] Add EA list callbacks for xfs kernel use. Cleanup some namespace code. SGI-PV: 954372 SGI-Modid: xfs-linux-melb:xfs-kern:26583a Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:01:37 +10:00
Nathan Scott	69e23b9a5e	[XFS] Update XFS for i_blksize removal from generic inode structure SGI-PV: 954366 SGI-Modid: xfs-linux-melb:xfs-kern:26565a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:01:22 +10:00
Nathan Scott	29b6d22b01	[XFS] remove accidentally reintroduced vfs unmount flag, unneeded in current kernels SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26564a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:59:06 +10:00
Christoph Hellwig	fe48cae9ed	[XFS] remove bhv_lookup, _range version works aswell and has more useful semantics. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26563a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:58:52 +10:00
Nathan Scott	1121b219bf	[XFS] use NULL for pointer initialisation instead of zero-cast-to-ptr SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26562a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:58:40 +10:00
Christoph Hellwig	8801bb99e4	[XFS] endianess annotations for xfs_bmbt_key Trivial as there are no incore users. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26561a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:58:17 +10:00
Christoph Hellwig	576039cf3c	[XFS] endianess annotate XFS_BMAP_BROOT_PTR_ADDR Make sure it returns a __be64 and let the callers use the proper macros. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26560a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:58:06 +10:00
Christoph Hellwig	397b5208d5	[XFS] endianess annotations for xfs_bmbt_ptr_t/xfs_bmdr_ptr_t SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26559a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:57:52 +10:00
Christoph Hellwig	b113bcb83e	[XFS] add xfs_btree_check_lptr_disk variant which handles endian conversion SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26558a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:57:42 +10:00
Christoph Hellwig	c38e5e84db	[XFS] remove left over INT_ comments in alloc.c We can verify endianess handling with sparse now, no need for comments. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26557a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:57:17 +10:00
Christoph Hellwig	61a2584867	[XFS] endianess annotations for xfs_inobt_rec_t / xfs_inobt_key_t SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26556a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:57:04 +10:00
Christoph Hellwig	e21010053a	[XFS] endianess annotation for xfs_agfl_t. Trivial, xfs_agfl_t is always used for ondisk values. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26553a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:56:51 +10:00
Nathan Scott	ed9d88f7b7	[XFS] Fix sparse warning found when page tracing enabled, due to overloaded gfp_t param. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26552a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:56:43 +10:00
Nathan Scott	673cdf5c72	[XFS] Fix rounding bug in xfs_free_file_space found by sparse checking. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26551a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:56:26 +10:00
Alexey Dobriyan	87395deb0b	[XFS] move XFS_IOC_GETVERSION to main multiplexer Avoids doing an unnecessary inode to vnode conversion and avoids a memory allocation. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26492a Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:56:01 +10:00
Tim Shimmin	128dabc5e9	[XFS] cleanup the field types of some item format structures SGI-PV: 954365 SGI-Modid: xfs-linux-melb:xfs-kern:26406a Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:55:43 +10:00
Nathan Scott	f07c225036	[XFS] Improve xfsbufd delayed write submission patterns, after blktrace analysis. Under a sequential create+allocate workload, blktrace reported backward writes being issued by xfsbufd, and frequent inappropriate queue unplugs. We now insert at the tail when moving from the delwri lists to the temp lists, which maintains correct ordering, and we avoid unplugging queues deep in the submit paths when we'd shortly do it at a higher level anyway. blktrace now reports much healthier write patterns from xfsbufd for this workload (and likely many others). SGI-PV: 954310 SGI-Modid: xfs-linux-melb:xfs-kern:26396a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:52:15 +10:00
Alexey Dobriyan	f37ea14969	[XFS] pass inode to xfs_ioc_space(), simplify some code. There is trivial "inode => vnode => inode" conversion, but only flags and mode of final inode are looked at. Pass original inode instead. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26395a Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:52:04 +10:00
Eric W. Biederman	aafe6c2a2b	[PATCH] de_thread: Use tsk not current Ingo Oeser pointed out that because current expands to an inline function it is more space efficient and somewhat faster to simply keep a cached copy of current in another variable. This patch implements that for the de_thread function. (akpm: saves nearly 100 bytes of text on x86) Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:20 -07:00
Adrian Bunk	66f37509fc	[PATCH] fs/nfs/: make code static Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:20 -07:00
Martin Bligh	b7b52630de	[PATCH] add newline to nfs dprintk Add missing \n to dprintk Signed-off-by: Martin Bligh <mbligh@google.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:19 -07:00
Eric W. Biederman	c18258c6f0	[PATCH] pid: Implement transfer_pid and use it to simplify de_thread In de_thread we move pids from one process to another, a rather ugly case. The function transfer_pid makes it clear what we are doing, and makes the action atomic. This is useful we ever want to atomically traverse the process group and session lists, in a rcu safe manner. Even if the atomic properties this change should be a win as transfer_pid should be less code to execute than executing both attach_pid and detach_pid, and this should make de_thread slightly smaller as only a single function call needs to be emitted. The only downside is that the code might be slower to execute as the odds are against transfer_pid being in cache. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:19 -07:00
Eric W. Biederman	b89a81712f	[PATCH] sysctl: Allow /proc/sys without sys_sysctl Since sys_sysctl is deprecated start allow it to be compiled out. This should catch any remaining user space code that cares, and paves the way for further sysctl cleanups. [akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:19 -07:00
Andrew Morton	8b0e330b77	[PATCH] alloc_fdtable() cleanup free_fdset(NULL, ...) is legal. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:19 -07:00
Adrian Bunk	36b756f2b5	[PATCH] reiserfs: warn about the useless nolargeio option Since the nolargeio option no longer has any effect, print a warning instead of setting a write-only variable. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <mason@suse.com> Cc: Hans Reiser <reiser@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:18 -07:00
Theodore Ts'o	ba52de123d	[PATCH] inode-diet: Eliminate i_blksize from the inode structure This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. [bunk@stusta.de: cleanup] [akpm@osdl.org: generic_fillattr() fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:18 -07:00
Theodore Ts'o	577c4eb09d	[PATCH] inode-diet: Move i_cdev into a union Move the i_cdev pointer in struct inode into a union. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
Theodore Ts'o	eaf796e7ef	[PATCH] inode-diet: Move i_bdev into a union Move the i_bdev pointer in struct inode into a union. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
Theodore Ts'o	8e18e2941c	[PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
OGAWA Hirofumi	6a1d9805ec	[PATCH] fat: cleanup fat_get_block(s) get_blocks() was removed. So, this removes it on fat, and will take advantage of the multi block mapping. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
Ian Kent	bcdc5e019d	[PATCH] autofs4 needs to force fail return revalidate For a long time now I have had a problem with not being able to return a lookup failure on an existsing directory. In autofs this corresponds to a mount failure on a autofs managed mount entry that is browsable (and so the mount point directory exists). While this problem has been present for a long time I've avoided resolving it because it was not very visible. But now that autofs v5 has "mount and expire on demand" of nested multiple mounts, such as is found when mounting an export list from a server, solving the problem cannot be avoided any longer. I've tried very hard to find a way to do this entirely within the autofs4 module but have not been able to find a satisfactory way to achieve it. So, I need to propose a change to the VFS. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
David Howells	f269fdd182	[PATCH] NOMMU: move the fallback arch_vma_name() to a sensible place Move the fallback arch_vma_name() to a sensible place (kernel/signal.c). Currently it's in fs/proc/task_mmu.c, a file that is dependent on both CONFIG_PROC_FS and CONFIG_MMU being enabled, but it's used from kernel/signal.c from where it is called unconditionally. [akpm@osdl.org: build fix] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
David Howells	dbf8685c8e	[PATCH] NOMMU: Implement /proc/pid/maps for NOMMU Implement /proc/pid/maps for NOMMU by reading the vm_area_list attached to current->mm->context.vmlist. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	5da6185bca	[PATCH] NOMMU: Set BDI capabilities for /dev/mem and /dev/kmem Set the backing device info capabilities for /dev/mem and /dev/kmem to permit direct sharing under no-MMU conditions and full mapping capabilities under MMU conditions. Make the BDI used by these available to all directly mappable character devices. Also comment the capabilities for /dev/zero. [akpm@osdl.org: ifdef reductions] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
Alexey Dobriyan	1a1d92c10d	[PATCH] Really ignore kmem_cache_destroy return value * Rougly half of callers already do it by not checking return value * Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache); * Those who check it printk something, however, slab_error already printed the name of failed cache. * XFS BUGs on failed kmem_cache_destroy which is not the decision low-level filesystem driver should make. Converted to ignore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Panagiotis Issaris	f52720ca5f	[PATCH] fs: Removing useless casts * Removing useless casts * Removing useless wrapper * Conversion from kmalloc+memset to kzalloc Signed-off-by: Panagiotis Issaris <takis@issaris.org> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Panagiotis Issaris	f8314dc60c	[PATCH] fs: Conversions from kmalloc+memset to k(z\|c)alloc Conversions from kmalloc+memset to kzalloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Jffs2-bit-acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Eric Sandeen	32c2d2bc4b	[PATCH] more ext3 16T overflow fixes Some of the changes in balloc.c are just cosmetic, as Andreas pointed out - if they overflow they'll then underflow and things are fine. 5th hunk actually fixes an overflow problem. Also check for potential overflows in inode & block counts when resizing. Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Dave Kleikamp	a4e4de36dc	[PATCH] ext3: Fix sparse warnings Fixing up some endian-ness warnings in preparation to clone ext4 from ext3. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Dave Kleikamp	e9ad5620bf	[PATCH] ext3: More whitespace cleanups More white space cleanups in preparation of cloning ext4 from ext3. Removing spaces that precede a tab. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:10 -07:00
Vasily Averin	7543fc7b3a	[PATCH] ext3: wrong error behavior SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error behavior was broken in linux kernels since 2.5.x versions by the following patch: 2002/10/31 02:15:26-05:00 tytso@snap.thunk.org Default mount options from superblock for ext2/3 filesystems http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ In case ext3 file system is mounted with errors=continue (EXT3_ERRORS_CONTINUE) errors should be ignored when possible. However at present in case of any error kernel aborts journal and remounts filesystem to read-only. Such behavior was hit number of times and noted to differ from that of 2.4.x kernels. This patch fixes this: - do nothing in case of EXT3_ERRORS_CONTINUE, - set EXT3_MOUNT_ABORT and call journal_abort() in all other cases - panic() should be called after ext3_commit_super() to save sb marked as EXT3_ERROR_FS Signed-off-by: Vasily Averin <vvs@sw.ru> Acked-by: Kirill Korotaev <dev@sw.ru> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Mingming Cao	36faadc144	[PATCH] ext3: more comments about block allocation/reservation code Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Mingming Cao	321fb9e818	[PATCH] ext3: turn on reservation dump on block allocation errors In the past there were a few kernel panics related to block reservation tree operations failure (insert/remove etc). It would be very useful to get the block allocation reservation map info when such error happens. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Eric Sandeen	37ed322290	[PATCH] JBD: 16T fixes These are a few places I've found in jbd that look like they may not be 16T-safe, or consistent with the use of unsigned longs for block containers. Problems here would be somewhat hard to hit, would require journal blocks past the 8T boundary, which would not be terribly common. Still, should fix. (some of these have come from the ext4 work on jbd as well). I think there's one more possibility that the wrap() function may not be safe IF your last block in the journal butts right up against the 232 block boundary, but that seems like a VERY remote possibility, and I'm not worrying about it at this point. Signed-off-by: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Eric Sandeen	eee194e76c	[PATCH] ext3: inode numbers are unsigned long This is primarily format string fixes, with changes to ialloc.c where large inode counts could overflow, and also pass around journal_inum as an unsigned long, just to be pedantic about it.... Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Eric Sandeen	41f04d852e	[PATCH] ext2: fix mounts at 16T Signed-off-by: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Eric Sandeen	855565e81a	[PATCH] fix ext3 mounts at 16T I need to do some actual IO testing now, but this gets things mounting for a 16T ext3 filesystem. (patched up e2fsprogs is needed too, I'll send that off the kernel list) This patch fixes these issues in the kernel: o sbi->s_groups_count overflows in ext3_fill_super() sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) - le32_to_cpu(es->s_first_data_block) + EXT3_BLOCKS_PER_GROUP(sb) - 1) / EXT3_BLOCKS_PER_GROUP(sb); at 16T, s_blocks_count is already maxed out; adding EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0. Not really what we want, and causes a failed mount. Feel free to check my math (actually, please do!), but changing it this way should work & avoid the overflow: (A + B - 1)/B changed to: ((A - 1)/B) + 1 o ext3_check_descriptors() overflows range checks ext3_check_descriptors() iterates over all block groups making sure that various bits are within the right block ranges... on the last pass through, it is checking the error case [item] >= block + EXT3_BLOCKS_PER_GROUP(sb) where "block" is the first block in the last block group. The last block in this group (and the last one that will fit in 32 bits) is block + EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps back around to 0. so, make things clearer with "first_block" and "last_block" where those are first and last, inclusive, and use <, > rather than <, >=. Finally, the last block group may be smaller than the rest, so account for this on the last pass through: last_block = sb->s_blocks_count - 1; (a similar patch could be done for ext2; does anyone in their right mind use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's warranted) Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Alexey Dobriyan	2aed348469	[PATCH] jbd: use BUILD_BUG_ON in journal init Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Stephen Tweedie <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Mingming Cao	ae6ddcc5f2	[PATCH] ext3 and jbd cleanup: remove whitespace Remove whitespace from ext3 and jbd, before we clone ext4. Signed-off-by: Mingming Cao<cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:09 -07:00
Josh Triplett	e7ab8d6505	[PATCH] jbd: add lock annotation to jbd_sync_bh jbd_sync_bh releases journal->j_list_lock. Add a lock annotation to this function so that sparse can check callers for lock pairing, and so that sparse will not complain about this function since it intentionally uses the lock in this manner. Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:08 -07:00
Linus Torvalds	b278240839	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits) [PATCH] Don't set calgary iommu as default y [PATCH] i386/x86-64: New Intel feature flags [PATCH] x86: Add a cumulative thermal throttle event counter. [PATCH] i386: Make the jiffies compares use the 64bit safe macros. [PATCH] x86: Refactor thermal throttle processing [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64) [PATCH] Fix unwinder warning in traps.c [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1 [PATCH] x86: Move direct PCI scanning functions out of line [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI [PATCH] Don't leak NT bit into next task [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder [PATCH] Fix some broken white space in ia32_signal.c [PATCH] Initialize argument registers for 32bit signal handlers. [PATCH] Remove all traces of signal number conversion [PATCH] Don't synchronize time reading on single core AMD systems [PATCH] Remove outdated comment in x86-64 mmconfig code [PATCH] Use string instructions for Core2 copy/clear [PATCH] x86: - restore i8259A eoi status on resume [PATCH] i386: Split multi-line printk in oops output. ...	2006-09-26 13:07:55 -07:00
Linus Torvalds	dd77a4ee0f	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (47 commits) Driver core: Don't call put methods while holding a spinlock Driver core: Remove unneeded routines from driver core Driver core: Fix potential deadlock in driver core PCI: enable driver multi-threaded probe Driver Core: add ability for drivers to do a threaded probe sysfs: add proper sysfs_init() prototype drivers/base: check errors drivers/base: Platform notify needs to occur before drivers attach to the device v4l-dev2: handle __must_check add CONFIG_ENABLE_MUST_CHECK add __must_check to device management code Driver core: fixed add_bind_files() definition Driver core: fix comments in drivers/base/power/resume.c sysfs_remove_bin_file: no return value, dump_stack on error kobject: must_check fixes Driver core: add ability for devices to create and remove bin files Class: add support for class interfaces for devices Driver core: create devices/virtual/ tree Driver core: add device_rename function Driver core: add ability for classes to handle devices properly ...	2006-09-26 11:49:46 -07:00
Andrew Morton	8d6b5eeea5	[PATCH] binfmt_elf: consistently use loff_t As David Howells <dhowells@redhat.com> points out, binfmt_elf sometimes uses off_t, sometimes uses loff_t. Use loff_t throughout. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:53 -07:00
Christoph Lameter	972d1a7b14	[PATCH] ZVC: Support NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLE Remove the atomic counter for slab_reclaim_pages and replace the counter and NR_SLAB with two ZVC counter that account for unreclaimable and reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE. Change the check in vmscan.c to refer to to NR_SLAB_RECLAIMABLE. The intend seems to be to check for slab pages that could be freed. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:51 -07:00
Christoph Lameter	182e8e2373	[PATCH] reduce MAX_NR_ZONES: make display of highmem counters conditional on CONFIG_HIGHMEM Do not display HIGHMEM memory sizes if CONFIG_HIGHMEM is not set. Make HIGHMEM dependent texts and make display of highmem counters optional Some texts are depending on CONFIG_HIGHMEM. Remove those strings and remove the display of highmem counter values if CONFIG_HIGHMEM is not set. [akpm@osdl.org: remove some ifdefs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:46 -07:00
Peter Zijlstra	d08b3851da	[PATCH] mm: tracking shared dirty pages Tracking of dirty pages in shared writeable mmap()s. The idea is simple: write protect clean shared writeable pages, catch the write-fault, make writeable and set dirty. On page write-back clean all the PTE dirty bits and write protect them once again. The implementation is a tad harder, mainly because the default backing_dev_info capabilities were too loosely maintained. Hence it is not enough to test the backing_dev_info for cap_account_dirty. The current heuristic is as follows, a VMA is eligible when: - its shared writeable (vm_flags & (VM_WRITE\|VM_SHARED)) == (VM_WRITE\|VM_SHARED) - it is not a 'special' mapping (vm_flags & (VM_PFNMAP\|VM_INSERTPAGE)) == 0 - the backing_dev_info is cap_account_dirty mapping_cap_account_dirty(vma->vm_file->f_mapping) - f_op->mmap() didn't change the default page protection Page from remap_pfn_range() are explicitly excluded because their COW semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and because they don't have a backing store anyway. mprotect() is taught about the new behaviour as well. However it overrides the last condition. Cleaning the pages on write-back is done with page_mkclean() a new rmap call. It can be called on any page, but is currently only implemented for mapped pages, if the page is found the be of a VMA that accounts dirty pages it will also wrprotect the PTE. Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from under ->private_lock. This seems to be safe, since ->private_lock is used to serialize access to the buffers, not the page itself. This is needed because clear_page_dirty() will call into page_mkclean() and would thereby violate locking order. [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Jan Kara	3998b9301d	[PATCH] jbd: fix commit of ordered data buffers Original commit code assumes, that when a buffer on BJ_SyncData list is locked, it is being written to disk. But this is not true and hence it can lead to a potential data loss on crash. Also the code didn't count with the fact that journal_dirty_data() can steal buffers from committing transaction and hence could write buffers that no longer belong to the committing transaction. Finally it could possibly happen that we tried writing out one buffer several times. The patch below tries to solve these problems by a complete rewrite of the data commit code. We go through buffers on t_sync_datalist, lock buffers needing write out and store them in an array. Buffers are also immediately refiled to BJ_Locked list or unfiled (if the write out is completed). When the array is full or we have to block on buffer lock, we submit all accumulated buffers for IO. [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Andi Kleen	758333458a	[PATCH] Check return value of copy_to_user in compat_sys_pselect7 Fix linux/fs/compat.c: In function compat_sys_pselect7 linux/fs/compat.c:1869: warning: ignoring return value of copy_to_user, declared with attribute warn_unused_result To make it easier to handle I changed to semantics to not try to write out a timespec if an error occurred. I hope that's ok. Cc: dwmw2@infradead.org Signed-off-by: Andi Kleen <ak@suse.de>	2006-09-26 10:52:39 +02:00
Andi Kleen	c16b63e09d	[PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set Based on patch from Frank van Maarseveen <frankvm@frankvm.com>, but extended. Signed-off-by: Andi Kleen <ak@suse.de>	2006-09-26 10:52:28 +02:00
Andrew Morton	f20a9ead0d	sysfs: add proper sysfs_init() prototype Don't be crufty. Mark it __must_check too. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-09-25 21:08:39 -07:00
Randy.Dunlap	995982ca79	sysfs_remove_bin_file: no return value, dump_stack on error Make sysfs_remove_bin_file() void. If it detects an error, printk the file name and call dump_stack(). sysfs_hash_and_remove() now returns an error code indicating its success or failure so that sysfs_remove_bin_file() can know success/failure. Convert the only driver that checked the return value of sysfs_remove_bin_file(). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-09-25 21:08:39 -07:00
Greg Kroah-Hartman	ceeee1fb28	SYSFS: allow sysfs_create_link to create symlinks in the root of sysfs This is needed to make the compatible link for /sys/block in the future. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-09-25 21:08:36 -07:00
Randy Dunlap	6468b3afa7	Debugfs: kernel-doc fixes for debugfs Fix kernel-doc and typos/spellos in fs/debugfs/. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-09-25 21:08:36 -07:00
Juha Yrj�l�	eea3f8911f	sysfs: Make poll behaviour consistent When no events have been reported by sysfs_notify(), sd->s_events was previously set to zero. The initial value for new readers is also zero, so poll was blocking, regardless of whether the attribute was read by the process or not. Make poll behave consistently by setting the initial value of sd->s_events to non-zero. Signed-off-by: Juha Yrjola <juha.yrjola@solidboot.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-09-25 21:08:36 -07:00
Ian Kent	c0ba7e5147	[PATCH] autofs4: zero timeout prevents shutdown If the timeout of an autofs mount is set to zero then umounts are disabled. This works fine, however the kernel module checks the expire timeout and goes no further if it is zero. This is not the right thing to do at shutdown as the module is passed an option to expire mounts regardless of their timeout setting. This patch allows autofs to honor the force expire option. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-25 17:38:35 -07:00
Mark Fasheh	0d5dc6c2dd	ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback With this, we don't need to pass an additional struct with function pointer. Now that the callbacks are fully used, comment the remaining API. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	b5e500e23e	ocfs2: Remove ->unblock lockres operation Have ocfs2_process_blocked_lock() call ocfs2_generic_unblock_lock(), which gets to be ocfs2_unblock_lock() now that it's the only possible unblock function. Remove the ->unblock() callback from the structure, and all lock type specific unblock functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	cc567d89b3	ocfs2: move downconvert worker to lockres ops This way lock types don't have to manually pass it to ocfs2_generic_unblock_lock(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	08280f11de	ocfs2: Remove unused dlmglue functions The meta data unblocking code no longer needs ocfs2_do_unblock_meta() or ocfs2_can_downconvert_meta_lock(), so remove them. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	810d5aeba1	ocfs2: Have the metadata lock use generic dlmglue functions Fill in the ->check_downconvert and ->set_lvb callbacks with meta data specific operations and switch ocfs2_unblock_meta() to call ocfs2_generic_unblock_lock() Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	5ef0d4ea08	ocfs2: Add ->set_lvb callback in dlmglue This allows a lock type to set the value block before downconvert. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	16d5b9567a	ocfs2: Add ->check_downconvert callback in dlmglue This will allow lock types to force a requeue of a lock downconvert. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	f7fbfdd1fc	ocfs2: Check for refreshing locks in generic unblock function Tidy up the exit path a bit too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	b80fc012e0	ocfs2: don't unconditionally pass LVB flags Allow a lock type to specifiy whether it makes use of the LVB. The only type which does this right now is the meta data lock. This should save us some space on network messages since they won't have to needlessly transmit value blocks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	aa2623ad80	ocfs2: combine inode and generic blocking AST functions There is extremely little difference between the two now. We can remove the callback from ocfs2_lock_res_ops as well. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	54a7e7552e	ocfs2: Add ->get_osb() dlmglue locking operation Will be used to find the ocfs2_super structure from a given lockres. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	2a45f2d13e	ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops This was always defined to the same function in all locks, so clean things up by removing and passing ocfs2_unlock_ast() directly to the DLM. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	e92d57df27	ocfs2: combine inode and generic AST functions There is extremely little difference between the two now. We can remove the callback from ocfs2_lock_res_ops as well. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	f625c9793b	ocfs2: Clean up lock resource refresh flags Use of the refresh mechanism is lock-type wide, so move knowledge of that to the ocfs2_lock_res_ops structure. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	24c19ef404	ocfs2: Remove i_generation from inode lock names OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode on and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	f9e2d82e63	ocfs2: Encode i_generation in the meta data lvb When i_generation is removed from the lockname, this will help us determine whether a meta data lvb has information that is in sync with the local struct inode. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	4d3b83f736	ocfs2: Free up some space in the lvb lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to free up some space. This should be backwards compatible until we use one of the fields, in which case we'd bump the lvb version anyway. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	0027dd5bc2	ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock() We can't use LKM_LOCAL for new dentry locks because an unlink and subsequent re-create of a name/inode pair may result in the lock still being mastered somewhere in the cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	1ba9da2ffa	ocfs2: manually d_move() during ocfs2_rename() Make use of FS_RENAME_DOES_D_MOVE to avoid a race condition that can occur during ->rename() if we d_move() outside of the parent directory cluster locks, and another node discovers the new name (created during the rename) and unlinks it. d_move() will unconditionally rehash a dentry - which will leave stale data in the system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	349457ccf2	[PATCH] Allow file systems to manually d_move() inside of ->rename() Some file systems want to manually d_move() the dentries involved in a rename. We can do this by making use of the FS_ODD_RENAME flag if we just have nfs_rename() unconditionally do the d_move(). While there, we rename the flag to be more descriptive. OCFS2 uses this to protect that part of the rename operation with a cluster lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-09-24 13:50:45 -07:00
Mark Fasheh	1390334b4c	ocfs2: Remove the dentry vote This is unused now. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	379dfe9d0d	ocfs2: Hook rest of the file system into dentry locking API Actually replace the vote calls with the new dentry operations. Make any necessary adjustments to get the scheme to work. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	80c05846f6	ocfs2: Add dentry tracking API Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the higher level API and the dentry manipulation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	d680efe9d8	ocfs2: Add new cluster lock type Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the cluster lock type which OCFS2 can attach to dentries. A small number of fs/ocfs2/dcache.c functions are stubbed out so that this change can compile. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	f0681062b8	ocfs2: Update dlmglue for new dlmlock() API File system lock names are very regular right now, so we really only need to pass an extra parameter to dlmlock(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	ea5b3a187e	ocfs2: Update dlmfs for new dlmlock() API We just need to add a namelen field to the user_lock_res structure, and update a few debug prints. Instead of updating all debug prints, I took the opportunity to remove a few that are likely unnecessary these days. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	3384f3df5e	ocfs2: Allow binary names in the DLM The OCFS2 DLM uses strlen() to determine lock name length, which excludes the possibility of putting binary values in the name string. Fix this by requiring that string length be passed in as a parameter. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	e2c73698af	ocfs2: Silence dlm error print An AST can be delivered via the network after a lock has been removed, so no need to print an error when we see that. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:41 -07:00
Jeff Garzik	e18fa700c9	Move several *_SUPER_MAGIC symbols to include/linux/magic.h. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-09-24 11:13:19 -04:00
Chuck Lever	026ed5c918	NFS: unmark NFS direct I/O as experimental Remove the EXPERIMENTAL flag from the NFS_DIRECTIO option. Test plan: Unset the EXPERIMENTAL kernel build option and check to see that the NFS direct I/O option is still available. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:06 -04:00
Chuck Lever	f551e44ff1	NFS: add comments clarifying the use of nfs_post_op_update() Comments-only change to clarify a detail of the NFS protocol and how it is implemented in Linux. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:05 -04:00
Josef 'Jeff' Sipek	aec5e17528	NFS: Use SEEK_END instead of hardcoded value Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:04 -04:00
Trond Myklebust	51b6ded4d9	NFSv4: When mounting with a port=0 argument, substitute port=2049 RFC3530 states that the registered port 2049 for the NFS protocol should be the default configuration in order to allow clients not to use the RPC binding protocols. If the mount program sends us a port=0, we therefore substitute port=2049. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:04 -04:00
Trond Myklebust	2066fe89b4	NFSv4: Poll more aggressively when handling NFS4ERR_DELAY Change the initial retry delay from 1s to 0.1s (and then back off exponentially). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:04 -04:00
Trond Myklebust	c514983d8d	NFSv4: Handle the condition NFS4ERR_FILE_OPEN Retry a few times before we give up: the error is usually due to ordering issues with asynchronous RPC calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:03 -04:00
Trond Myklebust	6b30954ebb	NFSv4: Retry lease recovery if it failed during a synchronous operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:03 -04:00
Trond Myklebust	97db8f4179	NFS: Don't invalidate the symlink we just stuffed into the cache And slight optimisation of nfs_end_data_update(): directories never have delegations anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:03 -04:00
Trond Myklebust	5f004cf2aa	NFS: Make read() return an ESTALE if the file has been deleted Currently, a read() request will return EIO even if the file has been deleted on the server, simply because that is what the VM will return if the call to readpage() fails to update the page. Ensure that readpage() marks the inode as stale if it receives an ESTALE. Then return that error to userland. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:02 -04:00
J. Bruce Fields	2dec51466a	NFSv4: It's perfectly legal for clp to be NULL here.... Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:02 -04:00
Trond Myklebust	fd6840714d	NFS: nfs_lookup - don't hash dentry when optimising away the lookup If the open intents tell us that a given lookup is going to result in a, exclusive create, we currently optimize away the lookup call itself. The reason is that the lookup would not be atomic with the create RPC call, so why do it in the first place? A problem occurs, however, if the VFS aborts the exclusive create operation after the lookup, but before the call to create the file/directory: in this case we will end up with a hashed negative dentry in the dcache that has never been looked up. Fix this by only actually hashing the dentry once the create operation has been successfully completed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:01 -04:00
andros@citi.umich.edu	297de4f656	Fix a referral error Oops Fix an oops when the referral server is not responding. Check the error return from nfs4_set_client() in nfs4_create_referral_server. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:56 -04:00
Chuck Lever	058ad9cbf1	NFS: NFS_ROOT should use the new rpc_create API Teach NFS_ROOT to use the new rpc_create API instead of the old two-call API for creating an RPC transport. Test plan: Compile the kernel with the NFS client build-in, and set CONFIG_NFS_ROOT. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:55 -04:00
David Howells	6daabf1b04	NFS: Fix up compiler warnings on 64-bit platforms in client.c Fix up warnings from compiling on ppc64. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:55 -04:00
Trond Myklebust	158998b6fe	SUNRPC: Make rpc_mkpipe() take the parent dentry as an argument Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Trond Myklebust	5dd3177ae5	NFSv4: Fix a use-after-free issue with the nfs server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Chuck Lever	94a6d75320	NFS: Use cached page as buffer for NFS symlink requests Now that we have a copy of the symlink path in the page cache, we can pass a struct page down to the XDR routines instead of a string buffer. Test plan: Connectathon, all NFS versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:53 -04:00
Chuck Lever	873101b337	NFS: copy symlinks into page cache before sending NFS SYMLINK request Currently the NFS client does not cache symlinks it creates. They get cached only when the NFS client reads them back from the server. Copy the symlink into the page cache before sending it. Test plan: Connectathon, all NFS versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:53 -04:00
Chuck Lever	4f390c152b	NFS: Fix double d_drop in nfs_instantiate() error path If the LOOKUP or GETATTR in nfs_instantiate fail, nfs_instantiate will do a d_drop before returning. But some callers already do a d_drop in the case of an error return. Make certain we do only one d_drop in all error paths. This issue was introduced because over time, the symlink proc API diverged slightly from the create/mkdir/mknod proc API. To prevent other coding mistakes of this type, change the symlink proc API to be more like create/mkdir/mknod and move the nfs_instantiate call into the symlink proc routines so it is used in exactly the same way for create, mkdir, mknod, and symlink. Test plan: Connectathon, all versions of NFS. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:52 -04:00
Chuck Lever	d3db90e270	NFS: remove a no-longer-needed error check in nfs_symlink() In the early days of NFS, there was no duplicate reply cache on the server. Thus retransmitted non-idempotent requests often found that the request had already completed on the server. To avoid passing an unanticipated return code to unsuspecting applications, NFS clients would often shunt error codes that implied the request had been retried but already completed. Thanks to NFS over TCP, duplicate reply caches on the server, and network performance and reliability improvements, it is safe to remove such checks. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:52 -04:00
Chuck Lever	ae5c79476f	NFSD: Convert NFS server callback logic to use new rpc_create API Replace xprt_create_proto/rpc_create_client call in NFS server callback functions to use new rpc_create() API. Test plan: NFSv4 delegation functionality tests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:50 -04:00
Chuck Lever	41877d207c	NFS: Convert NFS client to use new rpc_create() API Convert NFS client mount logic to use rpc_create() instead of the old xprt_create_proto/rpc_create_client API. Test plan: Mount stress tests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:50 -04:00
Chuck Lever	e1ec78928b	LOCKD: Convert to use new rpc_create() API Replace xprt_create_proto/rpc_create_client with new rpc_create() interface in the Network Lock Manager. Note that the semantics of NLM transports is now "hard" instead of "soft" to provide a better guarantee that lock requests will get to the server. Test plan: Repeated runs of Connectathon locking suite. Check network trace to ensure NLM requests are working correctly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:50 -04:00
Chuck Lever	6ca9482387	SUNRPC: Clean-up after previous patches. Remove some unused macros related to accessing an RPC peer address Test plan: Compile kernel with CONFIG_NFS option enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:49 -04:00
Chuck Lever	39d7bbcb5b	SUNRPC: remove extraneous header inclusions include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h. We can remove xprt.h from source files that already include clnt.h. Likewise include/linux/sunrpc/timer.h. Test plan: Compile kernel with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:47 -04:00
Chuck Lever	44c31be261	LOCKD: Teach lockd to use the new rpc_peeraddr() API Hide the details of how the RPC client stores remote peer addresses from the Network Lock Manager. Test plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:46 -04:00
Trond Myklebust	9c5bf38d85	NFS: Fix nfs_alloc_client() The scheme to indicate which services have been started up appears to be seriously broken. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:38 -04:00
Trond Myklebust	36b15c54cd	NFS: Ensure NFSv2/v3 mounts respect the NFS_MOUNT_SECFLAVOUR flag Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:38 -04:00
David Howells	738a351959	NFS: Secure the roots of the NFS subtrees in a shared superblock Invoke security_d_instantiate() on root dentries after allocating them with dentry_alloc_anon(). Normally dentry_alloc_root() would do that, but we don't call that as we don't want to assign a name to the root dentry at this point (we may discover the real name later). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:38 -04:00
David Howells	27ba851244	NFS: Fix error handling Fix an error handling problem: nfs_put_client() can be given a NULL pointer if nfs_free_server() is asked to destroy a partially initialised record. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:37 -04:00
David Howells	6aaca56650	NFS: Add server and volume lists to /proc Make two new proc files available: /proc/fs/nfsfs/servers /proc/fs/nfsfs/volumes The first lists the servers with which we are currently dealing (struct nfs_client), and the second lists the volumes we have on those servers (struct nfs_server). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:37 -04:00
David Howells	54ceac4515	NFS: Share NFS superblocks per-protocol per-server per-FSID The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:37 -04:00
David Howells	cf6d7b5de8	NFS: Start rpciod in server common management Start rpciod in the server common (nfs_client struct) management code rather than in the superblock management code. This means we only need to "start" it once per server instead of once per superblock. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:36 -04:00
David Howells	5006a76cca	NFS: Eliminate client_sys in favour of cl_rpcclient Eliminate nfs_server::client_sys in favour of nfs_client::cl_rpcclient as we only really need one per server that we're talking to since it doesn't have any security on it. The retransmission management variables are also moved to the common struct as they're required to set up the cl_rpcclient connection. The NFS2/3 client and client_acl connections are thenceforth derived by cloning the cl_rpcclient connection and post-applying the authorisation flavour. The code for setting up the initial common connection has been moved to client.c as nfs_create_rpc_client(). All the NFS program definition tables are also moved there as that's where they're now required rather than super.c. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:36 -04:00
David Howells	8fa5c000d7	NFS: Move rpc_ops from nfs_server to nfs_client Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're common to all server records of a particular NFS protocol version. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:35 -04:00
David Howells	1f163415dc	NFS: Make better use of inode* dereferencing macros Make better use of inode* dereferencing macros to hide dereferencing chains (including NFS_PROTO and NFS_CLIENT). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:35 -04:00
David Howells	27951bd260	NFS: Maintain a common server record for NFS2/3 as well as for NFS4 Maintain a common server record for NFS2/3 as well as for NFS4 so that common stuff can be moved there from struct nfs_server. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:35 -04:00
David Howells	509de81116	NFS: Add extra const qualifiers Add some extra const qualifiers into NFS. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:34 -04:00
David Howells	0c7d90cfed	NFS: Use the dentry superblock directly in nfs_statfs() Use the nominated dentry's superblock directly in the NFS statfs() op to get a file handle, rather than using s_root (which will become a dummy dentry in a future patch). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:34 -04:00
David Howells	24c8dbbb5f	NFS: Generalise the nfs_client structure Generalise the nfs_client structure by: (1) Moving nfs_client to a more general place (nfs_fs_sb.h). (2) Renaming its maintenance routines to be non-NFS4 specific. (3) Move those maintenance routines to a new non-NFS4 specific file (client.c) and move the declarations to internal.h. (4) Make nfs_find/get_client() take a full sockaddr_in to include the port number (will be required for NFS2/3). (5) Make nfs_find/get_client() take the NFS protocol version (again will be required to differentiate NFS2, 3 & 4 client records). Also: (6) Make nfs_client construction proceed akin to inodes, marking them as under construction and providing a function to indicate completion. (7) Make nfs_get_client() wait interruptibly if it finds a client that it can share, but that client is currently being constructed. (8) Make nfs4_create_client() use (6) and (7) instead of locking cl_sem. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:33 -04:00
David Howells	e9326dcab4	NFS: Add a server capabilities NFS RPC op Add a set_capabilities NFS RPC op so that the server capabilities can be set. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:33 -04:00
David Howells	2b3de4411b	NFS: Add a lookupfh NFS RPC op Add a lookup filehandle NFS RPC op so that a file handle can be looked up without requiring dentries and inodes and other VFS stuff when doing an NFS4 pathwalk during mounting. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:32 -04:00

... 2 3 4 5 6 ...

3631 Commits