linux

Author	SHA1	Message	Date
Al Viro	f19d4a8fa6	add caching of ACLs in struct inode No helpers, no conversions yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:27 -04:00
Ankit Jain	3e63cbb1ef	fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls This patch adds ioctls to vfs for compatibility with legacy XFS pre-allocation ioctls (XFS_IOC_RESVP). The implementation effectively invokes sys_fallocate for the new ioctls. Also handles the compat_ioctl case. Note: These legacy ioctls are also implemented by OCFS2. [AV: folded fixes from hch] Signed-off-by: Ankit Jain <me@ankitjain.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:27 -04:00
Christoph Hellwig	01c031945f	cleanup __writeback_single_inode There is no reason to for the split between __writeback_single_inode and __sync_single_inode, the former just does a couple of checks before tail-calling the latter. So merge the two, and while we're at it split out the I_SYNC waiting case for data integrity writers, as it's logically separate function. Finally rename __writeback_single_inode to writeback_single_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:26 -04:00
Al Viro	f21f62208a	... and the same for vfsmount id/mount group id Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:26 -04:00
Al Viro	c63e09eccc	Make allocation of anon devices cheaper Standard trick - add a new variable (start) such that for each n < start n is known to be busy. Allocation can skip checking everything in [0..start) and if it returns n, we can set start to n + 1. Freeing below start sets start to what we'd just freed. Of course, it still sucks if we do something like free 0 allocate allocate in a loop - still O(n^2) time. However, on saner loads it improves the things a lot and the entire thing is not worth the trouble of switching to something with better worst-case behaviour. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:25 -04:00
H. Peter Anvin	f6cc746bbb	devpts: remove module-related code These days, the devpts filesystem is closely integrated with the pty memory management, and cannot be built as a module, even less removed from the kernel. Accordingly, remove all module-related stuff from this filesystem. [ v2: only remove code that's actually dead ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Trond Myklebust	3b22edc573	VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper Eliminates some duplicated code... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
J. R. Okajima	654f562c52	vfs: fix nd->root leak in do_filp_open() commit `2a73787110` "Cache root in nameidata" introduced a new member nd->root, but forgot to put it in do_filp_open(). Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Christoph Hellwig	b5450d9c84	reiserfs: remove stray unlock_super in reiserfs_resize Reiserfs doesn't use lock_super anywhere internally, and ->remount_fs which calls reiserfs_resize does have it currently but also expects it to be held on return, so there's no business for the unlock_super here. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked by Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:15:24 -04:00
Roel Kluin	3391faa4f1	udf: remove redundant tests on unsigned first_block and goal are unsigned. When negative they are wrapped and caught by the other test. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-06-24 13:48:28 +02:00
Linus Torvalds	cf5434e894	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. ocfs2: Add lockdep annotations vfs: Set special lockdep map for dirs only if not set by fs ocfs2: Disable orphan scanning for local and hard-ro mounts ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() ocfs2: Stop orphan scan as early as possible during umount ocfs2: Fix ocfs2_osb_dump() ocfs2: Pin journal head before accessing jh->b_committed_data ocfs2: Update atime in splice read if necessary. ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.	2009-06-23 19:36:02 -07:00
Trond Myklebust	b88f8a546f	NFS: Correct the NFS mount path when following a referral Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	0b75b35c7c	NFS: Fix nfs_path() to always return a '/' at the beginning of the path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	c02d7adf8c	NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace As noted in the previous patch, the NFSv4 client mount code currently has several limitations. If the mount path contains symlinks, or referrals, or even if it just contains a '..', then the client code in nfs4_path_walk() will fail with an error. This patch replaces the nfs4_path_walk()-based lookup with a helper function that sets up a private namespace to represent the namespace on the server, then uses the ordinary VFS and NFS path lookup code to walk down the mount path in that namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	cf8d2c11cb	VFS: Add VFS helper functions for setting up private namespaces The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client. When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers. Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code. This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() function for use by filesystem modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	616511d039	VFS: Uninline the function put_mnt_ns() In order to allow modules to use it without having to export vfsmount_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
David Woodhouse	4839641333	jffs2: fix another potential leak on error path in scan.c Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-06-23 01:34:19 +01:00
Linus Torvalds	ac1b7c378e	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (63 commits) mtd: OneNAND: Allow setting of boundary information when built as module jffs2: leaking jffs2_summary in function jffs2_scan_medium mtd: nand: Fix memory leak on txx9ndfmc probe failure. mtd: orion_nand: use burst reads with double word accesses mtd/nand: s3c6400 support for s3c2410 driver [MTD] [NAND] S3C2410: Use DIV_ROUND_UP [MTD] [NAND] S3C2410: Deal with unaligned lengths in S3C2440 buffer read/write [MTD] [NAND] S3C2410: Allow the machine code to get the BBT table from NAND [MTD] [NAND] S3C2410: Added a kerneldoc for s3c2410_nand_set mtd: physmap_of: Add multiple regions and concatenation support mtd: nand: max_retries off by one in mxc_nand mtd: nand: s3c2410_nand_setrate(): use correct macros for 2412/2440 mtd: onenand: add bbt_wait & unlock_all as replaceable for some platform mtd: Flex-OneNAND support mtd: nand: add OMAP2/OMAP3 NAND driver mtd: maps: Blackfin async: fix memory leaks in probe/remove funcs mtd: uclinux: mark local stuff static mtd: uclinux: do not allow to be built as a module mtd: uclinux: allow systems to override map addr/size mtd: blackfin NFC: fix hang when using NAND on BF527-EZKITs ...	2009-06-22 16:56:22 -07:00
Tao Ma	d246ab307d	ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. Actually ocfs2_sysfile_cluster_lock_key is only used if we enable CONFIG_DEBUG_LOCK_ALLOC. Wrap it so that we can avoid a building warning. fs/ocfs2/sysfile.c:53: warning: ‘ocfs2_sysfile_cluster_lock_key’ defined but not used Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:29 -07:00
Jan Kara	cb25797d45	ocfs2: Add lockdep annotations Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:26 -07:00
Jan Kara	9a7aa12f39	vfs: Set special lockdep map for dirs only if not set by fs Some filesystems need to set lockdep map for i_mutex differently for different directories. For example OCFS2 has system directories (for orphan inode tracking and for gathering all system files like journal or quota files into a single place) which have different locking locking rules than standard directories. For a filesystem setting lockdep map is naturaly done when the inode is read but we have to modify unlock_new_inode() not to overwrite the lockdep map the filesystem has set. Acked-by: peterz@infradead.org CC: mingo@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:22 -07:00
Sunil Mushran	df152c241d	ocfs2: Disable orphan scanning for local and hard-ro mounts Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:55 -07:00
Sunil Mushran	3211949f89	ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:53 -07:00
Sunil Mushran	692684e19e	ocfs2: Stop orphan scan as early as possible during umount Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:51 -07:00
Sunil Mushran	c3d38840ab	ocfs2: Fix ocfs2_osb_dump() Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:49 -07:00
Sunil Mushran	94e41ecfe0	ocfs2: Pin journal head before accessing jh->b_committed_data This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses to jh->b_committed_data. Fixes oss bugzilla#1131 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:47 -07:00
Tao Ma	1962f39abb	ocfs2: Update atime in splice read if necessary. We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so that we can update atime in splice read if necessary. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:45 -07:00
Joel Becker	1c520dfbf3	ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API. The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>	2009-06-22 14:24:30 -07:00
Linus Torvalds	7e0338c0de	Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits) SUNRPC: Fix the TCP server's send buffer accounting nfsd41: Backchannel: minorversion support for the back channel nfsd41: Backchannel: cleanup nfs4.0 callback encode routines nfsd41: Remove ip address collision detection case nfsd: optimise the starting of zero threads when none are running. nfsd: don't take nfsd_mutex twice when setting number of threads. nfsd41: sanity check client drc maxreqs nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct NFS: kill off complicated macro 'PROC' sunrpc: potential memory leak in function rdma_read_xdr nfsd: minor nfsd_vfs_write cleanup nfsd: Pull write-gathering code out of nfsd_vfs_write nfsd: track last inode only in use_wgather case sunrpc: align cache_clean work's timer nfsd: Use write gathering only with NFSv2 NFSv4: kill off complicated macro 'PROC' NFSv4: do exact check about attribute specified knfsd: remove unreported filehandle stats counters knfsd: fix reply cache memory corruption knfsd: reply cache cleanups ...	2009-06-22 12:55:50 -07:00
Linus Torvalds	df36b439c5	Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits) nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh() nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res nfs: remove unnecessary NFS_INO_INVALID_ACL checks NFS: More "sloppy" parsing problems NFS: Invalid mount option values should always fail, even with "sloppy" NFS: Remove unused XDR decoder functions NFS: Update MNT and MNT3 reply decoding functions NFS: add XDR decoder for mountd version 3 auth-flavor lists NFS: add new file handle decoders to in-kernel mountd client NFS: Add separate mountd status code decoders for each mountd version NFS: remove unused function in fs/nfs/mount_clnt.c NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument NFS: Clean up MNT program definitions lockd: Don't bother with RPC ping for NSM upcalls lockd: Update NSM state from SM_MON replies NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available NFS: Return error code from nfs_callback_up() to user space NFS: Do not display the setting of the "intr" mount option NFS: add support for splice writes nfs41: Backchannel: CB_SEQUENCE validation ...	2009-06-22 12:53:06 -07:00
Hitoshi Mitake	e38be994b9	Making fs/minix/minix.h double including safe I happened to find that fs/minix/minix.h doesn't guard double include. Yes, I know this never cause something destructive because this is self-evidence that no source file includes minix.h twice, but I think fixing this is better than disregarding it. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 11:34:42 -07:00
Benny Halevy	578e458568	nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res nfs4_open_recover_helper clears opendata->o_res before calling nfs4_init_opendata_res, thus causing NFSv4.0 OPEN operations to be sent rather than nfsv4.1. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-20 14:55:12 -04:00
Linus Torvalds	12e24f34cb	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...	2009-06-20 11:29:32 -07:00
OGAWA Hirofumi	3e107603ae	fat: Fix the removal of opts->fs_dmask (`ce3b0f8d5c`: New helper - current_umask()) is removing the opts->fs_dmask, probably it's a cut-and-paste miss or something. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>	2009-06-20 21:50:47 +09:00
Linus Torvalds	bee89ab228	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: inotify_destroy_mark_entry could get called twice	2009-06-19 17:46:44 -07:00
Linus Torvalds	31583d6acf	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Fix kernel-doc parameter name typo in blk-settings.c: block: rename CONFIG_LBD to CONFIG_LBDAF block: Fix bounce_pfn setting hd: stop defining MAJOR_NR	2009-06-19 17:43:04 -07:00
Eric Paris	528da3e9e2	inotify: inotify_destroy_mark_entry could get called twice inotify_destroy_mark_entry could get called twice for the same mark since it is called directly in inotify_rm_watch and when the mark is being destroyed for another reason. As an example assume that the file being watched was just deleted so inotify_destroy_mark_entry would get called from the path fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() -> fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry(). If this happened at the same time as userspace tried to remove a watch via inotify_rm_watch we could attempt to remove the mark from the idr twice and could thus double dec the ref cnt and potentially could be in a use after free/double free situation. The fix is to have inotify_rm_watch use the generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the inotify_destroy_mark_entry() function can only be called one. This patch also renames the function to inotify_ingored_remove_idr() so it is clear what is actually going on in the function. Hopefully this fixes: [ 20.342058] idr_remove called for id=20 which is not allocated. [ 20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077 [ 20.353933] Call Trace: [ 20.356410] [<ffffffff811a82b7>] idr_remove+0x115/0x18f [ 20.361737] [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75 [ 20.367061] [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf [ 20.373771] [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf [ 20.380306] [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10 [ 20.386238] [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170 [ 20.393293] [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf [ 20.399829] [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6 [ 20.405850] [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by: Peter Ziljlstra <peterz@infradead.org>	2009-06-19 12:42:48 -04:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Andy Adamson	ab52ae6db0	nfsd41: Backchannel: minorversion support for the back channel Prepare to share backchannel code with NFSv4.1. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [nfsd41: use nfsd4_cb_sequence for callback minorversion] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 18:33:57 -07:00
Andy Adamson	ef52bff840	nfsd41: Backchannel: cleanup nfs4.0 callback encode routines Mimic the client and prepare to share the back channel xdr with NFSv4.1. Bump the number of operations in each encode routine, then backfill the number of operations. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 18:33:57 -07:00
Trond Myklebust	1f84603c09	Merge branch 'devel-for-2.6.31' into for-2.6.31 Conflicts: fs/nfs/client.c fs/nfs/super.c	2009-06-18 18:13:44 -07:00
Mike Sager	6ddbbbfe52	nfsd41: Remove ip address collision detection case Verified that cthon and pynfs exchange id tests pass (except for the two expected fails: EID8 and EID50) Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-06-18 17:43:53 -07:00
Linus Torvalds	0732f87761	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: clean up jbd2_journal_try_to_free_buffers() ext4: Don't update ctime for non-extent-mapped inodes ext4: Fix up whitespace issues in fs/ext4/inode.c ext4: Fix 64-bit block type problem on 32-bit platforms ext4: teach the inode allocator to use a goal inode number ext4: Use a hash of the topdir directory name for the Orlov parent group ext4: document the "abort" mount option ext4: move the abort flag from s_mount_opts to s_mount_flags ext4: update the s_last_mounted field in the superblock ext4: change s_mount_opt to be an unsigned int ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl ext4: avoid unnecessary spinlock in critical POSIX ACL path ext3: avoid unnecessary spinlock in critical POSIX ACL path ext4: convert instrumentation from markers to tracepoints jbd2: convert instrumentation from markers to tracepoints	2009-06-18 14:07:46 -07:00
Peter Oberparleiter	0b923606e7	seq_file: add function to write binary data seq_write() can be used to construct seq_files containing arbitrary data. Required by the gcov-profiling interface to synthesize binary profiling data files. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Oleg Nesterov	3b34fc5880	elf_core_dump: use rcu_read_lock() to access ->real_parent In theory it is not safe to dereference ->parent/real_parent without tasklist or rcu lock, we can race with re-parenting. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Jeff Mahoney	1d965fe0eb	reiserfs: fix warnings with gcc 4.4 Several code paths in reiserfs have a construct like: if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ... which, in addition to being ugly, end up causing compiler warnings with gcc 4.4.0. Previous compilers didn't issue a warning. fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined I believe this is due to the ih being passed to macros which evaluate the argument more than once. This is old code and we haven't seen any problems with it, but this patch eliminates the warnings. It converts the multiple evaluation macros to static inlines and does a preassignment for the cases that were causing the warnings because that code is just ugly. Reported-by: Chris Mason <mason@oracle.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:46 -07:00
Roel Kluin	37044c86ba	ufs: sector_t cannot be negative unsigned i_block,fragment cannot be negative. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:46 -07:00
Jan Kara	5404ac8e44	isofs: cleanup mount option processing Remove unused variables from isofs_sb_info (used to be some mount options), unify variables for option to use 0/1 (some options used 'y'/'n'), use bit fields for option flags in superblock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	5c4a656b7e	isofs: fix setting of uid and gid to 0 isofs allows setting of default uid and gid of files but value 0 was used to indicate that user did not specify any uid/gid mount option. Since this option also overrides uid/gid set in Rock Ridge extension, it makes sense to allow forcing uid/gid 0. Fix option processing to allow this. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	52b680c812	isofs: let mode and dmode mount options override rock ridge mode setting So far, permissions set via 'mode' and/or 'dmode' mount options were effective only if the medium had no rock ridge extensions (or was mounted without them). Add 'overriderockmode' mount option to indicate that these options should override permissions set in rock ridge extensions. Maybe this should be default but the current behavior is there since mount options were created so I think we should not change how they behave. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	ef43618a47	ext3: make sure inode is deleted from orphan list after truncate As Ted pointed out, it can happen that ext3_truncate() returns without removing inode from orphan list. This way we could in some rare cases (like when we get ENOMEM from an allocation in ext3_truncate called because of failed ext3_write_begin) leave the inode on orphan list and that triggers assertion failure on umount. So make ext3_truncate() always remove inode from in-memory orphan list. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Hisashi Hifumi	6f3f1cb21f	jbd: clean up journal_try_to_free_buffers() I delete the following patch "commit `3f31fddfa2` Author: Mingming Cao <cmm@us.ibm.com> Date: Fri Jul 25 01:46:22 2008 -0700 jbd: fix race between free buffer and commit transaction This patch is no longer needed because if race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO by the following patch. commit `6ccfa806a9` Author: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Date: Tue Sep 2 14:35:40 2008 -0700 VFS: fix dio write returning EIO when try_to_release_page fails Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Theodore Tso <tytso@mit.edu> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	e8ef7aaea7	ext3: fix chain verification in ext3_get_blocks() Chain verification in ext3_get_blocks() has been hosed since it called verify_chain(chain, NULL) which always returns success. As a result readers could in theory race with truncate. On the other hand the race probably cannot happen with the current locking scheme, since by the time ext3_truncate() is called all the pages are already removed and hence get_block() shouldn't be called on such pages... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:45 -07:00
Jan Kara	39fe7557b4	ext2: Do not update mtime of a moved directory One of our users is complaining that his backup tool is upset on ext2 (while it's happy on ext3, xfs, ...) because of the mtime change. The problem is: mkdir foo mkdir bar mkdir foo/a Now under ext2: mv foo/a foo/b changes mtime of 'foo/a' (foo/b after the move). That does not really make sense and it does not happen under any other filesystem I've seen. More complicated is: mv foo/a bar/a This changes mtime of foo/a (bar/a after the move) and it makes some sense since we had to update parent directory pointer of foo/a. But again, no other filesystem does this. So after some thoughts I'd vote for consistency and change ext2 to behave the same as other filesystems. Do not update mtime of a moved directory. Specs don't say anything about it (neither that it should, nor that it should not be updated) and other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do it. So let's become more consistent. Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel. Reported-by: <ronny.pretzsch@dfs.de> Cc: <hare@suse.de> Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:44 -07:00
Cyrill Gorcunov	2f6d311080	proc: vmcore - use kzalloc in get_new_element() Instead of kmalloc+memset better use straight kzalloc Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
Michal Simek	bcac2b1b7d	procfs: remove sparse errors in proc_devtree.c CHECK fs/proc/proc_devtree.c fs/proc/proc_devtree.c:197:14: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:203:34: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:210:14: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:223:26: warning: Using plain integer as NULL pointer fs/proc/proc_devtree.c:226:14: warning: Using plain integer as NULL pointer Signed-off-by: Michal Simek <monstr@monstr.eu> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
Davide Libenzi	3fe4a975d6	epoll: fix nested calls support This fixes a regression in 2.6.30. I unfortunately accepted a patch time ago, to drop the "current" usage from possible IRQ context, w/out proper thought over it. The patch switched to using the CPU id by bounding the nested call callback with a get_cpu()/put_cpu(). Unfortunately the ep_call_nested() function can be called with a callback that grabs sleepy locks (from own f_op->poll()), that results in epic fails. The following patch uses the proper "context" depending on the path where it is called, and on the kind of callback. This has been reported by Stefan Richter, that has also verified the patch is his previously failing environment. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
Keika Kobayashi	d3d64df21d	proc: export statistics for softirq to /proc Export statistics for softirq in /proc/softirqs and /proc/stat. 1. /proc/softirqs Implement /proc/softirqs which shows the number of softirq for each CPU like /proc/interrupts. 2. /proc/stat Add the "softirq" line to /proc/stat. This line shows the number of softirq for all cpu. The first column is the total of all softirqs and each subsequent column is the total for particular softirq. [kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop] Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp> Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:41 -07:00
NeilBrown	671e1fcf63	nfsd: optimise the starting of zero threads when none are running. Currently, if we ask to set then number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless. So detect that case and exit promptly. (also remove an assignment to 'error' which was never used. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com>	2009-06-18 09:42:41 -07:00
NeilBrown	82e12fe924	nfsd: don't take nfsd_mutex twice when setting number of threads. Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads. Mostly this isn't a big deal. However if we are write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again. Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, get nfsd_svc return a non-negative number of active threads when not returning a negative error. Signed-off-by: NeilBrown <neilb@suse.de>	2009-06-18 09:40:31 -07:00
Peter Zijlstra	d3a9262e59	fs: Provide empty .set_page_dirty() aop for anon inodes .set_page_dirty() is one of those a_ops that defaults to the buffer implementation when not set. Therefore provide a dummy function to make it do nothing. (Uncovered by perfcounters fd's which can now be writable-mmap-ed.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-18 14:46:10 +02:00
Jan Kara	24a5d59f34	udf: Use device size when drive reported bogus number of written blocks Some drives report 0 as the number of written blocks when there are some blocks recorded. Use device size in such case so that we can automagically mount such media. Signed-off-by: Jan Kara <jack@suse.cz>	2009-06-18 12:33:16 +02:00
James Morris	4bf259e3ae	nfs: remove unnecessary NFS_INO_INVALID_ACL checks Unless I'm mistaken, NFS_INO_INVALID_ACL is being checked twice during getacl calls (i.e. first via nfs_revalidate_inode() and then by each all site). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:14 -07:00
Chuck Lever	a5a16bae70	NFS: More "sloppy" parsing problems Specifying "port=-5" with the kernel's current mount option parser generates "unrecognized mount option". If "sloppy" is set, this causes the mount to succeed and use the default values; the desired behavior is that, since this is a valid option with an invalid value, the mount should fail, even with "sloppy." To properly handle "sloppy" parsing, we need to distinguish between correct options with invalid values, and incorrect options. We will need to parse integer values by hand, therefore, and not rely on match_token(). For instance, these must all fail with "invalid value": port=12345678 port=-5 port=samuel and not with "unrecognized option," as they do currently. Thus, for the sake of match_token() we need to treat the values for these options as strings, and do the conversion to integers using strict_strtol(). This is basically the same solution we used for the earlier "retry=" fix (commit `ecbb3845`), except in this case the kernel actually has to parse the value, rather than ignore it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:14 -07:00
Chuck Lever	d23c45fd84	NFS: Invalid mount option values should always fail, even with "sloppy" Ian Kent reports: "I've noticed a couple of other regressions with the options vers and proto option of mount.nfs(8). The commands: mount -t nfs -o vers=<invalid version> <server>:/<path> /<mountpoint> mount -t nfs -o proto=<invalid proto> <server>:/<path> /<mountpoint> both immediately fail. But if the "-s" option is also used they both succeed with the mount falling back to defaults (by the look of it). In the past these failed even when the sloppy option was given, as I think they should. I believe the sloppy option is meant to allow the mount command to still function for mount options (for example in shared autofs maps) that exist on other Unix implementations but aren't present in the Linux mount.nfs(8). So, an invalid value specified for a known mount option is different to an unknown mount option and should fail appropriately." See RH bugzilla 486266. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:13 -07:00
Chuck Lever	065015e5ef	NFS: Remove unused XDR decoder functions Clean up: Remove xdr_decode_fhstatus() and xdr_decode_fhstatus3(), now that they are unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:13 -07:00
Chuck Lever	8e02f6b9aa	NFS: Update MNT and MNT3 reply decoding functions Solder xdr_stream-based XDR decoding functions into the in-kernel mountd client that are more careful about checking data types and watching for buffer overflows. The new MNT3 decoder includes support for auth-flavor list decoding. The "_sz" macro for MNT3 replies was missing the size of the file handle. I've added this back, and included the size of the auth flavor array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:13 -07:00
Chuck Lever	a14017db28	NFS: add XDR decoder for mountd version 3 auth-flavor lists Introduce an xdr_stream-based XDR decoder that can unpack the auth- flavor list returned in a MNT3 reply. The nfs_mount() function's caller allocates an array, and passes the size and a pointer to it. The decoder decodes all the flavors it can into the array, and returns the number of decoded flavors. If the caller is not interested in the auth flavors, it can pass a value of zero as the size of the pre-allocated array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:12 -07:00
Chuck Lever	4fdcd9966d	NFS: add new file handle decoders to in-kernel mountd client Introduce xdr_stream-based XDR file handle decoders to the in-kernel mountd client. These are more careful than the existing decoder functions about buffer overflows and data type and range checking. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:12 -07:00
Chuck Lever	fb12529577	NFS: Add separate mountd status code decoders for each mountd version Introduce data structures and xdr_stream-based decoding functions for unmarshalling mountd status codes properly. Mountd version 3 uses specific standard error return codes that are not errno values and not NFS3ERR_ values. These have a well-defined standard mapping to local errno values. Introduce data structures and a decoder function that map these status codes to local errno values properly. This is new functionality (but not used yet). Version 1 mountd status values are defined by RFC 1094 as UNIX error values (errno values). Errno values on heterogeneous systems do not necessarily match each other. To avoid exposing possibly incorrect errno values to upper layers, the current XDR decoder converts all non-zero MNT version 1 status codes to -EACCES. The OpenGroup XNFS standard provides a mapping similar to but smaller than the version 3 error codes. Implement a decoder that uses the XNFS error codes, replacing the current decoder. For both mountd protocol versions, map unrecognized errors to -EACCES. Finally we introduce a replacement data structure for mnt_fhstatus at this time, which is used by the new XDR decoders. In addition to documenting that the status value returned by the XDR decoders is always an errno, this new structure will be expanded in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:12 -07:00
Chuck Lever	99835db430	NFS: remove unused function in fs/nfs/mount_clnt.c Clean up: remove xdr_encode_dirpath() now that it has been replaced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	29a1bd6bf8	NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument Check the length of the supplied dirpath, and see that it fits properly in the RPC buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	2ad780978b	NFS: Clean up MNT program definitions Clean up: Relocate MNT program procedure number definitions to the only file that uses them. Relocate the version number definitions, which are shared, to nfs.h. Remove duplicate program number definitions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	0e5c2632e1	lockd: Don't bother with RPC ping for NSM upcalls Cut NSM upcall RPC traffic in half -- don't do a NULL call first. The cases where a ping would be helpful are rare. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:11 -07:00
Chuck Lever	6c9dc42551	lockd: Update NSM state from SM_MON replies When rpc.statd starts up in user space at boot time, it attempts to write the latest NSM local state number into /proc/sys/fs/nfs/nsm_local_state. If lockd.ko isn't loaded yet (as is the case in most configurations), that file doesn't exist, thus the kernel's NSM state remains set to its initial value of zero during lockd operation. This is a problem because rpc.statd and lockd use the NSM state number to prevent repeated lock recovery on rebooted hosts. If lockd sends a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state number is received, there is no way for lockd or rpc.statd to distinguish that stale SM_NOTIFY from an actual reboot. Thus lock recovery could be performed after the rebooted host has already started reclaiming locks, and those locks will be lost. We could change /etc/init.d/nfslock so it always modprobes lockd.ko before starting rpc.statd. However, if lockd.ko is ever unloaded and reloaded, we are back at square one, since the NSM state is not preserved across an unload/reload cycle. This may happen frequently on clients that use automounter. A period of NFS inactivity causes lockd.ko to be unloaded, and the kernel loses its NSM state setting. Instead, let's use the fact that rpc.statd plants the local system's NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a synchronous SM_MON upcall to the local rpc.statd _before_ sending its first NLM request to a new remote. This would permit rpc.statd to provide the current NSM state to lockd, even after lockd.ko had been unloaded and reloaded. Note that NLMPROC_LOCK arguments are constructed before the nsm_monitor() call, so we have to rearrange argument construction very slightly to make this all work out. And, the kernel appears to treat NSM state as a u32 (see struct nlm_args and nsm_res). Make nsm_local_state a u32 as well, to ensure we don't get bogus comparison results. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:10 -07:00
Chuck Lever	18fc316419	NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available Clear "ret" if the error return from svc_create_xprt(AF_INET6) was -EAFNOSUPORT. Otherwise, callback start-up will succeed, but nfs_callback_up() will return -EAFNOSUPPORT anyway, and the first NFSv4 mount attempt after a reboot will fail. Bug introduced by commit `f738f517` in 2.6.30-rc1. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:10 -07:00
Chuck Lever	a21bdd9b96	NFS: Return error code from nfs_callback_up() to user space If the kernel cannot start the NFSv4 callback service during a mount request, it returns -ENOMEM to user space, resulting in this message: mount.nfs4: Cannot allocate memory Adjust nfs_alloc_client() and nfs_get_client() to pass NFSv4 callback start-up errors back to user space so a less mysterious error message can be displayed by the mount command. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:10 -07:00
Chuck Lever	c381ad2cf2	NFS: Do not display the setting of the "intr" mount option The "intr" mount option has been deprecated for a while, but /proc/mounts continues to display "nointr" whether "intr" or "nointr" has been specified for a mount point. Since these options do not have any effect, simply do not display them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:09 -07:00
Suresh Jayaraman	bf40d3435c	NFS: add support for splice writes Adds support for splice writes. It effectively calls generic_file_splice_write() to do the writes. We need not worry about O_APPEND case as the combination of splice() writes and O_APPEND is disallowed. This patch propagates NFS write errors back to the caller. The number of bytes written via splice are being added to NFSIO_NORMALWRITTENBYTES as these are effectively cached writes. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-06-17 18:02:09 -07:00
Trond Myklebust	301933a0ac	Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31	2009-06-17 17:59:58 -07:00
Hisashi Hifumi	536fc240e7	jbd2: clean up jbd2_journal_try_to_free_buffers() This patch reverts `3f31fddf`, which is no longer needed because if a race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO due to the commit `6ccfa806`. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 20:08:51 -04:00
Ricardo Labiaga	68f3f90133	nfs41: Backchannel: CB_SEQUENCE validation Validates the callback's sessionID, the slot number, and the sequence ID. Increments the slot's sequence. Detects replays, but simply prints a debug message (if debugging is enabled since we don't yet implement a duplicate request cache for the backchannel. This should not present a problem, since only idempotent callbacks are currently implemented. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Backchannel: Be more obvious about the return value] [nfs41: Backchannel: dprink in host order] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:43 -07:00
Ricardo Labiaga	963891ac43	nfs41: Backchannel: New find_client_with_session() Finds the 'struct nfs_client' that matches the server's address, major version number, and session ID. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:43 -07:00
Ricardo Labiaga	f8625a6a4b	nfs41: Backchannel: Add a backchannel slot table to the session Defines a new 'struct nfs4_slot_table' in the 'struct nfs4_session' for use by the backchannel. Initializes, resets, and destroys the backchannel slot table in the same manner the forechannel slot table is initialized, reset, and destroyed. The sequenceid for each slot in the backchannel slot table is initialized to 0, whereas the forechannel slotid's sequenceid is set to 1. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:42 -07:00
Ricardo Labiaga	050047ce71	nfs41: Backchannel: Refactor nfs4_init_slot_table() Generalize nfs4_init_slot_table() so it can be used to initialize the backchannel slot table in addition to the forechannel slot table. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:42 -07:00
Ricardo Labiaga	b73dafa7ac	nfs41: Backchannel: Refactor nfs4_reset_slot_table() Generalize nfs4_reset_slot_table() so it can be used to reset the backchannel slot table in addition to the forechannel slot table. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:41 -07:00
Ricardo Labiaga	65fc64e547	nfs41: Backchannel: update cb_sequence args and results Change the type of cs_addr and csr_status to 'struct sockaddr' and '__be32' since the cb_sequence processing function will use existing functionality that expects these types. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:40 -07:00
Benny Halevy	281fe15dc1	nfs41: verify CB_SEQUENCE position in callback compound CB_SEQUENCE must appear first in the callback compound RPC. If it is not the first operation NFS4ERR_SEQUENCE_POS must be returned. If the first operation ni the CB_COMPOUND is not CB_SEQUENCE then NFS4ERR_OP_NOT_IN_SESSION must be returned. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: refactor op preprocessing out of process_op] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:39 -07:00
Benny Halevy	4aece6a19c	nfs41: cb_sequence xdr implementation [nfs41: get rid of READMEM and COPYMEM for callback_xdr.c] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: get rid of READ64 in callback_xdr.c] See http://linux-nfs.org/pipermail/pnfs/2009-June/007846.html Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:38 -07:00
Benny Halevy	d49433e1e3	nfs41: cb_sequence proc implementation Currently, just free up any referring calls information. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: fix csr_{,target}highestslotid] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:38 -07:00
Benny Halevy	2d9b9ec344	nfs41: cb_sequence protocol level data structures Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:37 -07:00
Benny Halevy	34bc47c941	nfs41: consider minorversion in callback_xdr:process_op Note that this patch changes the nfsv4.0 behavior also when CONFIG_NFS_V4_1 is not defined where NFS4ERR_MINOR_VERS_MISMATCH will be returned if the client received a CB_COMPOUND with minorversion != 0. Previously, it would have returned NFS4ERR_OP_ILLEGAL for CB_SEQUENCE. (or if the server is broken and sent OP_CB_GETATTR or OP_CB_RECALL with minorversion!=0, they would have been processed normally. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: refactor op preprocessing out of process_op] See http://linux-nfs.org/pipermail/pnfs/2009-June/007845.html [nfs41: define CB_NOTIFY_DEVICEID as not supported] Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:37 -07:00
Benny Halevy	45377b94ed	nfs41: callback numbers definitions Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:36 -07:00
Benny Halevy	48a9e2d228	nfs41: decode minorversion 1 cb_compound header decode cb_compound header conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Get rid of cb_compound_hdr_arg.callback_ident callback_ident is not used anywhere so we shouldn't waste any memory to store it. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: no need to break read_buf in decode_compound_hdr_arg] See http://linux-nfs.org/pipermail/pnfs/2009-June/007844.html Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:35 -07:00
Benny Halevy	b8f2ef84b0	nfs41: store minorversion in cb_compound_hdr_arg Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:35 -07:00
Andy Adamson	5a0ffe544c	nfs41: Release backchannel resources associated with session Frees the preallocated backchannel resources that are associated with this session when the session is destroyed. A backchannel is currently created once per session. Destroy the backchannel only when the session is destroyed. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:34 -07:00
Andy Adamson	0f91421e8e	nfs41: Client indicates presence of NFSv4.1 callback channel. Set the SESSION4_BACK_CHAN flag to indicate the client supports a backchannel. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:33 -07:00
Andy Adamson	0b5b7ae0a8	nfs41: Setup the backchannel The NFS v4.1 callback service has already been setup, and rpc_xprt->serv points to the svc_serv structure describing it. Invoke the xprt_setup_backchannel() initialization to pre- allocate the necessary backchannel structures. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: change nfs4_put_session(nfs4_session*) to nfs4_destroy_session(nfs_session)] Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [moved xprt_setup_backchannel from nfs4_init_session to nfs4_init_backchannel] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:32 -07:00
Andy Adamson	e82dc22dac	nfs41: Allow NFSv4 and NFSv4.1 callback services to coexist Tracks the nfs_callback_info for both versions, enabling the callback service for v4 and v4.1 to run concurrently and be stopped independently of each other. Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:32 -07:00
Benny Halevy	8f97524235	nfs41: create a svc_xprt for nfs41 callback thread and use for incoming callbacks Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2009-06-17 14:11:31 -07:00

1 2 3 4 5 ...

14628 Commits