linux

Author	SHA1	Message	Date
Mark Fasheh	5a25403175	ocfs2: Fix max offset calculations ocfs2_max_file_offset() was over-estimating the largest file size for several cases. This wasn't really a problem before, but now that we support sparse files, it needs to be more accurate. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-08-09 17:25:49 -07:00
Mark Fasheh	ce76fd30ce	ocfs2: check ia_size limits in setattr We have to manually check the requested truncate size as the check in vmtruncate() comes too late for Ocfs2. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-08-09 17:25:38 -07:00
Mark Fasheh	7c08d70c69	ocfs2: Fix some casting errors related to file writes ocfs2_align_clusters_to_page_index() needs to cast the clusters shift to pgoff_t and ocfs2_file_buffered_write() needs loff_t when calculating destination start for memcpy. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-08-09 17:25:27 -07:00
Mark Fasheh	a00cce356b	ocfs2: use s_maxbytes directly in ocfs2_change_file_space() There's no need to recalculate things via ocfs2_max_file_offset() as we've already done that to fill s_maxbytes, so use that instead. We can also un-export ocfs2_max_file_offset() then. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-08-09 17:25:07 -07:00
Mark Fasheh	c11e9fafb3	ocfs2: Restrict inode changes in ocfs2_update_inode_atime() ocfs2_update_inode_atime() calls ocfs2_mark_inode_dirty() to push changes from the struct inode into the ocfs2 disk inode. The problem is, ocfs2_mark_inode_dirty() might change other fields, depending on what happened to the struct inode. Since we don't always have locking to serialize changes to other fields (like i_size, etc), just fix things up to only touch the atime field. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-08-09 17:23:50 -07:00
Linus Torvalds	8b80fc02b8	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: SUNRPC: Replace flush_workqueue() with cancel_work_sync() and friends NFS: Replace flush_scheduled_work with cancel_work_sync() and friends SUNRPC: Don't call gss_delete_sec_context() from an rcu context NFSv4: Don't call put_rpccred() from an rcu callback NFS: Fix NFSv4 open stateid regressions NFSv4: Fix a locking regression in nfs4_set_mode_locked() NFS: Fix put_nfs_open_context SUNRPC: Fix a race in rpciod_down()	2007-08-09 08:38:14 -07:00
Trond Myklebust	3d39c691ff	NFS: Replace flush_scheduled_work with cancel_work_sync() and friends This will avoid deadlocks of the form: stack backtrace: [<c0104fda>] show_trace_log_lvl+0x1a/0x30 [<c0105c02>] show_trace+0x12/0x20 [<c0105d15>] dump_stack+0x15/0x20 [<c013ee42>] __lock_acquire+0xc22/0x1030 [<c013f2b1>] lock_acquire+0x61/0x80 [<c012edd9>] flush_workqueue+0x49/0x70 [<c012ee0d>] flush_scheduled_work+0xd/0x10 [<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs] [<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs] [<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs] [<c017b38d>] deactivate_super+0x7d/0xa0 [<c018f94b>] mntput_no_expire+0x4b/0x80 [<c018fd94>] expire_mount_list+0xe4/0x140 [<c0191219>] mark_mounts_for_expiry+0x99/0xb0 [<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs] [<c012e61b>] run_workqueue+0x12b/0x1e0 [<c012f05b>] worker_thread+0x9b/0x100 [<c0131c72>] kthread+0x42/0x70 [<c0104c0f>] kernel_thread_helper+0x7/0x18 ======================= Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-08-07 16:12:50 -04:00
Trond Myklebust	905f8d16e3	NFSv4: Don't call put_rpccred() from an rcu callback Doing so would require us to introduce bh-safe locks into put_rpccred(). This patch fixes the lockdep complaint reported by Marc Dietrich: inconsistent {softirq-on-W} -> {in-softirq-W} usage. swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (rpc_credcache_lock){-+..}, at: [<c01dc487>] _atomic_dec_and_lock+0x17/0x60 {softirq-on-W} state was registered at: [<c013e870>] __lock_acquire+0x650/0x1030 [<c013f2b1>] lock_acquire+0x61/0x80 [<c02db9ac>] _spin_lock+0x2c/0x40 [<c01dc487>] _atomic_dec_and_lock+0x17/0x60 [<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc] [<dced56c1>] rpcauth_unbindcred+0x21/0x60 [sunrpc] [<dced3fd4>] a0 [sunrpc] [<dcecefe0>] rpc_call_sync+0x30/0x40 [sunrpc] [<dcedc73b>] rpcb_register+0xdb/0x180 [sunrpc] [<dced65b3>] svc_register+0x93/0x160 [sunrpc] [<dced6ebe>] __svc_create+0x1ee/0x220 [sunrpc] [<dced7053>] svc_create+0x13/0x20 [sunrpc] [<dcf6d722>] nfs_callback_up+0x82/0x120 [nfs] [<dcf48f36>] nfs_get_client+0x176/0x390 [nfs] [<dcf49181>] nfs4_set_client+0x31/0x190 [nfs] [<dcf49983>] nfs4_create_server+0x63/0x3b0 [nfs] [<dcf52426>] nfs4_get_sb+0x346/0x5b0 [nfs] [<c017b444>] vfs_kern_mount+0x94/0x110 [<c0190a62>] do_mount+0x1f2/0x7d0 [<c01910a6>] sys_mount+0x66/0xa0 [<c0104046>] syscall_call+0x7/0xb [<ffffffff>] 0xffffffff irq event stamp: 5277830 hardirqs last enabled at (5277830): [<c017530a>] kmem_cache_free+0x8a/0xc0 hardirqs last disabled at (5277829): [<c01752d2>] kmem_cache_free+0x52/0xc0 softirqs last enabled at (5277798): [<c0124173>] __do_softirq+0xa3/0xc0 softirqs last disabled at (5277817): [<c01241d7>] do_softirq+0x47/0x50 other info that might help us debug this: no locks held by swapper/0. 
stack backtrace: [<c0104fda>] show_trace_log_lvl+0x1a/0x30 [<c0105c02>] show_trace+0x12/0x20 [<c0105d15>] dump_stack+0x15/0x20 [<c013ccc3>] print_usage_bug+0x153/0x160 [<c013d8b9>] mark_lock+0x449/0x620 [<c013e824>] __lock_acquire+0x604/0x1030 [<c013f2b1>] lock_acquire+0x61/0x80 [<c02db9ac>] _spin_lock+0x2c/0x40 [<c01dc487>] _atomic_dec_and_lock+0x17/0x60 [<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc] [<dcf6bf83>] nfs_free_delegation_callback+0x13/0x20 [nfs] [<c012f9ea>] __rcu_process_callbacks+0x6a/0x1c0 [<c012fb52>] rcu_process_callbacks+0x12/0x30 [<c0124218>] tasklet_action+0x38/0x80 [<c0124125>] __do_softirq+0x55/0xc0 [<c01241d7>] do_softirq+0x47/0x50 [<c0124605>] irq_exit+0x35/0x40 [<c0112463>] smp_apic_timer_interrupt+0x43/0x80 [<c0104a77>] apic_timer_interrupt+0x33/0x38 [<c02690df>] cpuidle_idle_call+0x6f/0x90 [<c01023c3>] cpu_idle+0x43/0x70 [<c02d8c27>] rest_init+0x47/0x50 [<c03bcb6a>] start_kernel+0x22a/0x2b0 [<00000000>] 0x0 ======================= Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-08-07 15:15:57 -04:00
Trond Myklebust	45328c354e	NFS: Fix NFSv4 open stateid regressions Do not allow cached open for O_RDONLY or O_WRONLY unless the file has been previously opened in these modes. Also fix the calculation of the mode in nfs4_close_prepare. We should only issue an OPEN_DOWNGRADE if we're sure that we will still be holding the correct open modes. This may not be the case if we've been doing delegated opens. Finally, there is no need to adjust the open mode bit flags in nfs4_close_done(): that has already been done in nfs4_close_prepare(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-08-07 15:13:19 -04:00
Trond Myklebust	ba683031fa	NFSv4: Fix a locking regression in nfs4_set_mode_locked() We don't really need to clear &state->inode_states inside nfs4_set_mode_locked, and doing so without holding the inode->i_lock would in any case be a bug... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-08-07 15:13:18 -04:00
Trond Myklebust	5e11934d13	NFS: Fix put_nfs_open_context We need to grab the inode->i_lock atomically with the last reference put in order to remove the open context that is being freed from the nfsi->open_files list. Fix by converting the kref to a standard atomic counter and then using atomic_dec_and_lock()... Thanks to Arnd Bergmann for pointing out the problem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-08-07 15:13:17 -04:00
Masakazu Mokuno	313b0d3d86	[PATCH] remove duplicated ioctl entries in compat_ioctl.c This patch removes some duplicated wireless ioctl entries in the array 'struct ioctl_trans ioctl_start[]' of fs/compat_ioctl.c These entries are registered twice like: COMPATIBLE_IOCTL(SIOCGIWPRIV) and HANDLE_IOCTL(SIOCGIWPRIV, do_wireless_ioctl) Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-08-06 15:06:03 -04:00
David Woodhouse	b8e3ec30c2	[JFFS2] Print correct node offset when complaining about broken data CRC Debugging the hardware problems in OLPC trac #1905 would be a whole lot easier if the correct node offsets were printed for the offending nodes. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-08-02 21:43:46 +01:00
David Woodhouse	7b687707d7	[JFFS2] Fix suspend failure with JFFS2 GC thread. The try_to_freeze() call was in the wrong place; we need it in the signal-pending loop now that a pending freeze also makes signal_pending() return true. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-08-02 21:43:03 +01:00
David Woodhouse	71c2339775	[JFFS2] Deletion dirents should be REF_NORMAL, not REF_PRISTINE. Otherwise they'll never actually get garbage-collected. Noted by Jonathan Larmour. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-08-02 21:39:50 +01:00
Joakim Tjernlund	5bd5c03c31	[JFFS2] Prevent oops after 'node added in wrong place' debug check jffs2_add_physical_node_ref() should never really return error -- it's an internal debugging check which triggered. We really need to work out why and stop it happening. But in the meantime, let's make the failure mode a little less nasty. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-08-02 21:36:35 +01:00
Cyrill Gorcunov	ca76d2d803	UDF: fix UID and GID mount option ignorance This patch fixes weird behaviour of the UDF mounting procedure. To get the UID changed (for now) we have to type mount -t udf -o uid=some_user,uid=ignore /dev/device /mnt/mount_point and specifying two uid options at once is a bit strange. So with the patch we are able to mount without the additional 'uid=ignore' option. The same is done for the GID option. This patch will not break the current mount scheme (with two options). Btw this does fix (I hope) the following [BUG 6124] mount of UDF fs ignores UID and GID options http://bugzilla.kernel.org/show_bug.cgi?id=6124 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael <auslands-kv@gmx.de> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:43 -07:00
Christoph Hellwig	0af1a45046	rename setlease to generic_setlease Make it a little more clear that this is the default implementation for the setlease operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:43 -07:00
david m. richter	9700382c3c	VFS: fix a race in lease-breaking during truncate It is possible that another process could acquire a new file lease right after break_lease() is called during a truncate, but before lease-granting is disabled by the subsequent get_write_access(). Merely switching the order of the break_lease() and get_write_access() calls prevents this race. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:42 -07:00
Robert P. J. Day	d7ef970baf	NCP: delete test of long-deceased CONFIG_NCPFS_DEBUGDENTRY Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:41 -07:00
Kirill Kuvaldin	817794e0df	isofs: mounting to regular file may succeed It turned out that mounting a corrupted ISO image to a regular file may succeed, e.g. if an image was prepared as follows: $ dd if=correct.iso of=bad.iso bs=4k count=8 We then can mount it to a regular file: # mount -o loop -t iso9660 bad.iso /tmp/file But mounting it to a directory fails with -ENOTDIR, simply because the root directory inode doesn't have S_IFDIR set and the condition in graft_tree() is met: if (S_ISDIR(nd->dentry->d_inode->i_mode) != S_ISDIR(mnt->mnt_root->d_inode->i_mode)) return -ENOTDIR This is because the root directory inode was read from an incorrect block. It's supposed to be read from sbi->s_firstdatazone, which is an absolute value and gets messed up in the case of an incorrect image. In order to somehow circumvent this we have to check that the root directory inode is actually a directory after all. Signed-off-by: Kirill Kuvaldin <kuvkir@epsmu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:41 -07:00
Alexey Dobriyan	5ea473a1df	Fix leaks on /proc/{*/sched,sched_debug,timer_list,timer_stats} On every open/close one struct seq_operations leaks. Kudos to /proc/slab_allocators. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:40 -07:00
David Howells	ff8e210a95	AFS: fix file locking Fix file locking for AFS: (*) Start the lock manager thread under a mutex to avoid a race. (*) Made the locking non-fair: New readlocks will jump pending writelocks if there's a readlock currently granted on a file. This makes the behaviour similar to Linux's VFS locking. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:40 -07:00
J. Bruce Fields	4a4b88317a	knfsd: eliminate unnecessary -ENOENT returns on export downcalls A successful downcall with a negative result (which indicates that the given filesystem is not exported to the given user) should not return an error. Currently mountd is depending on stdio to write these downcalls. With some versions of libc this appears to cause subsequent writes to attempt to write all accumulated data (for which writes previously failed) along with any new data. This can prevent the kernel from seeing responses to later downcalls. Symptoms will be that nfsd fails to respond to certain requests. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:38 -07:00
J. Bruce Fields	0a725fc4d3	nfsd4: idmap upcalls should use unsigned uid and gid We shouldn't be using negative uid's and gid's in the idmap upcalls. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:38 -07:00
Jeff Layton	749997e512	knfsd: set the response bitmask for NFS4_CREATE_EXCLUSIVE RFC 3530 says: If the server uses an attribute to store the exclusive create verifier, it will signify which attribute by setting the appropriate bit in the attribute mask that is returned in the results. Linux uses the atime and mtime to store the verifier, but sends a zeroed out bitmask back to the client. This patch makes sure that we set the correct bits in the bitmask in this situation. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:38 -07:00
Mingming Cao	dd54567a83	"ext4_ext_put_in_cache" uses __u32 to receive physical block number Yan Zheng wrote: > I think I found a bug in ext4/extents.c, "ext4_ext_put_in_cache" uses > "__u32" to receive physical block number. "ext4_ext_put_in_cache" is > used in "ext4_ext_get_blocks", it sets ext4 inode's extent cache > according most recently tree lookup (higher 16 bits of saved physical > block number are always zero). when serving a mapping request, > "ext4_ext_get_blocks" first check whether the logical block is in > inode's extent cache. if the logical block is in the cache and the > cached region isn't a gap, "ext4_ext_get_blocks" gets physical block > number by using cached region's physical block number and offset in > the cached region. as described above, "ext4_ext_get_blocks" may > return wrong result when there are physical block numbers bigger than > 0xffffffff. > You are right. Thanks for reporting this! Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Yan Zheng <yanzheng@21cn.com> Cc: <stable@kernel.org> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:37 -07:00
David Howells	2e92a3baee	NOMMU: Fix SYSV IPC SHM Fix the SYSV IPC SHM to work with the changes applied by the new fault handler patches when CONFIG_MMU=n. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:36 -07:00
David S. Miller	8163904e66	[SPARC]: Mark SBUS framebuffer ioctls as IGNORE in compat_ioctl.c They are handled in a ->compat_ioctl() handler, so it's just noise when compat_ioctl.c warns which occurs when they are used on non-SBUS framebuffer devices. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-30 00:27:36 -07:00
Mark Fortescue	3961bae0ac	[PARTITION]: Sun/Solaris VTOC table corrections Start doing VTOC validation before using its contents. The validation is adjusted so as not to break existing setups that do not set the VTOC version, sanity and partition count entries. VTOC tables with more than 8 partitions will NOT be used. Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-30 00:27:31 -07:00
Mark Fortescue	b84d879639	[PARTITION] MSDOS: Fix Sun num_partitions handling. Correct the Solaris x86 number of partitions (slices) in a way that is backward compatible with the earlier size. This works without a new VTOC structure definition as the timestamp and v_asciilabel fields in the VTOC are not used by the kernel yet. Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-30 00:27:28 -07:00
Alexey Dobriyan	4e950f6f01	Remove fs.h from mm.h Remove fs.h from mm.h. For this, 1) Uninline vma_wants_writenotify(). It's pretty huge anyway. 2) Add back fs.h or less bloated headers (err.h) to files that need it. As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files rebuilt down to 3444 (-12.3%). Cross-compile tested without regressions on my two usual configs and (sigh): alpha arm-mx1ads mips-bigsur powerpc-ebony alpha-allnoconfig arm-neponset mips-capcella powerpc-g5 alpha-defconfig arm-netwinder mips-cobalt powerpc-holly alpha-up arm-netx mips-db1000 powerpc-iseries arm arm-ns9xxx mips-db1100 powerpc-linkstation arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200 arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2 arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads arm-ep93xx i386-up mips-pb1100 powerpc-pasemi arm-footbridge ia64 mips-pb1500 powerpc-pmac32 arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64 arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800 arm-h7201 ia64-defconfig mips-pnx8550-stb810 
powerpc-ps3 arm-h7202 ia64-gensparse mips-qemu powerpc-pseries arm-hackkit ia64-sim mips-rbhma4200 powerpc-up arm-integrator ia64-sn2 mips-rbhma4500 s390 arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig arm-iop33x ia64-zx1 mips-sead s390-up arm-ixp2000 m68k mips-tb0219 sparc arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig arm-jornada720 m68k-atari mips-workpad sparc-up arm-kafa m68k-bvme6000 mips-wrppmc sparc64 arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig arm-ks8695 m68k-mac parisc sparc64-defconfig arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64 arm-lpd7a400 m68k-q40 parisc-up x86_64 arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig arm-lusl7200 mips powerpc-celleb x86_64-up arm-mainstone mips-atlas powerpc-chrp32 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 17:09:29 -07:00
David Miller	778f3dd5a1	Fix procfs compat_ioctl regression It is important to only provide the compat_ioctl method if the downstream de->proc_fops does too, otherwise this utterly confuses the logic in fs/compat_ioctl.c and we end up doing the wrong thing. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-28 19:42:22 -07:00
Linus Torvalds	8e8ef2971b	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: docbook: add pipes, other fixes blktrace: use cpu_clock() instead of sched_clock() bsg: Fix build for CONFIG_BLOCK=n [patch] QUEUE_FLAG_READFULL QUEUE_FLAG_WRITEFULL comment fix	2007-07-28 19:31:13 -07:00
Tony Luck	7a6c813594	[IA64] Fix build failure in fs/quota.c `b716395e2b` added code to handle a compatibility issue with 32bit quota tools, but the new compat routines are only needed when CONFIG_COMPAT=y (and with this set to 'n' there are compilation problems since some new typedefs are not visible). Reported by Doug Chapman. Fix tuned by a cast of thousands (Andi, Andreas, Arthur, HPA, Willy) Signed-off-by: Tony Luck <tony.luck@intel.com>	2007-07-27 15:40:13 -07:00
Randy Dunlap	79685b8dee	docbook: add pipes, other fixes Fix some typos in pipe.c and splice.c. Add pipes API to kernel-api.tmpl. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-27 08:08:51 +02:00
Eric Sandeen	780dcdb211	fix inode_table test in ext234_check_descriptors ext[234]_check_descriptors sanity checks block group descriptor geometry at mount time, testing whether the block bitmap, inode bitmap, and inode table reside wholly within the blockgroup. However, the inode table test is off by one so that if the last block in the inode table resides on the last block of the block group, the test incorrectly fails. This is because it tests the last block as (start + length) rather than (start + length - 1). This can be seen by trying to mount a filesystem made such as: mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024 which yields: EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)! EXT2-fs: group descriptors corrupted! There is a similar bug in e2fsprogs, patch already sent for that. (I wonder if inside(), outside(), and/or in_range() should someday be used in this and other tests throughout the ext filesystems...) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Davide Libenzi	098284020c	make timerfd return a u64 and fix the __put_user Davi fixed a missing cast in the __put_user(), that was making timerfd return a single byte instead of the full value. Talking with Michael about the timerfd man page, we think it'd be better to use a u64 for the returned value, to align it with the eventfd implementation. This is an ABI change. The timerfd code is new in 2.6.22 and if we merge this into 2.6.23 then we should also merge it into 2.6.22.x. That will leave a few early 2.6.22 kernels out in the wild which might misbehave when a future timerfd-enabled glibc is run on them. mtk says: The difference would be that read() will only return 4 bytes, while the application will expect 8. If the application is checking the size of returned value, as it should, then it will be able to detect the problem (it could even be sophisticated enough to know that if this is a 4-byte return, then it is running on an old 2.6.22 kernel). If the application is not checking the return from read(), then its 8-byte buffer will not be filled -- the contents of the last 4 bytes will be undefined, so the u64 value as a whole will be junk. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Davi Arnaut <davi@haxent.com.br> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Ulrich Drepper	f50cadaa8f	tiny signalfd cleanup This is probably a leftover from a time when the return wasn't there yet. Now the extra assignment is just irritating. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:33:06 -07:00
Al Viro	87588dd666	more reiserfs endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:11:58 -07:00
Al Viro	ad690ef9e6	xfs ioctl __user annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:11:57 -07:00
Al Viro	ca5c8cde93	lockd and nfsd endianness annotation fixes Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:11:56 -07:00
Steve French	a403a0a370	[CIFS] Fix hang in find_writable_file Caused by unneeded reopen during reconnect while spinlock held. Fixes kernel bugzilla bug #7903 Thanks to Lin Feng Shen for testing this, and Amit Arora for some nice problem determination to narrow this down. Acked-by: Dave Kleikamp <shaggy@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-26 15:54:16 +00:00
Jens Axboe	3836df6b52	ocfs2: bad kunmap_atomic() kunmap_atomic() takes the virtual address, not the mapped page as argument. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-24 16:02:55 -07:00
Linus Torvalds	b2e961eb2e	Merge branch 'request-queue-t' of git://git.kernel.dk/linux-2.6-block * 'request-queue-t' of git://git.kernel.dk/linux-2.6-block: [BLOCK] Add request_queue_t and mark it deprecated [BLOCK] Get rid of request_queue_t typedef	2007-07-24 12:26:44 -07:00
Ulrich Drepper	0d786d4a27	fallocate syscall interface deficiency The fallocate syscall returns ENOSYS in case the filesystem does not support the operation and expects the userlevel code to fill in. This is good in concept. The problem is that the libc code for old kernels should be able to distinguish the case where the syscall is not at all available vs not functioning for a specific mount point. As is this is not possible and we always have to invoke the syscall even if the kernel doesn't support it. I suggest the following patch. Using EOPNOTSUPP is IMO the right thing to do. Cc: Amit Arora <aarora@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-24 12:24:58 -07:00
Jens Axboe	165125e1e4	[BLOCK] Get rid of request_queue_t typedef Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweep of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-24 09:28:11 +02:00
Al Viro	41089644c1	fix broken handling of port=... in NFS option parsing Obviously broken on little-endian; fortunately, the option is not frequently used... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ Hey, sparse is wonderful, but even better than sparse is having people like Al that actually _run_ it and fix bugs using it. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-22 11:15:18 -07:00
Ravikiran G Thirumalai	c3508f8f34	x86_64: Avoid too many remote cpu references due to /proc/stat Too many remote cpu references due to /proc/stat. On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem. On every call to kstat_irqs, the process brings in per-cpu data from all online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS results in (256+32*63) * 63 remote cpu references on a 64 cpu config. /proc/stat is parsed by common commands like top, who etc, causing lots of cacheline transfers. This statistic seems useless. Other 'big iron' arches disable this. AK: changed to remove for all SMP setups AK: add comment Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 18:37:09 -07:00
Andrew Morton	d4e3cc387e	revert "PIE randomization" There are reports of this causing userspace failures (http://lkml.org/lkml/2007/7/20/421). Revert. Cc: Jan Kratochvil <honza@jikos.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ulrich Kunitz <kune@deine-taler.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Bret Towe" <magnade@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:14 -07:00
J. Bruce Fields	3e63516c82	knfsd: fix typo in export display, print uid and gid as unsigned For display purposes, treat uid's and gid's as unsigned ints for now. Also fix a typo. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:14 -07:00
Jan Harkes	d3fec424b2	coda: remove CODA_STORE/CODA_RELEASE upcalls This is a variation on the patch sent by Christoph Hellwig which kills file_count abuse by the Coda kernel module by moving the coda_flush functionality into coda_release. However, part of the reason we were using the coda_flush callback was to allow Coda to pass errors that occur during writeback from the userspace cache manager back to close(). As Al Viro explained on linux-fsdevel, it is impossible to guarantee that such errors can in fact be returned back to the caller. There are many cases where the last reference to a file is not released by the close system call and it is also impossible to pick some close as a 'last-close' and delay it until all other references have been destroyed. The CODA_STORE/CODA_RELEASE upcall combination is clearly a broken design, and it is better to remove it completely. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:14 -07:00
Cyrill Gorcunov	28de7948a8	UDF: coding style conversion - lindent fixups This patch fixes up sources after conversion by Lindent. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:14 -07:00
Jens Axboe	6a860c979b	splice: fix bad unlock_page() in error case If add_to_page_cache_lru() fails, the page will not be locked. But splice jumps to an error path that does a page release and unlock, causing a BUG() in unlock_page(). Fix this by adding one more label that just releases the page. This bug was actually triggered on EL5 by gurudas pai <gurudas.pai@oracle.com> using fio. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 09:07:01 -07:00
David Howells	bd6dc742a4	AFS: Use patched rxrpc_kernel_send_data() correctly Fix afs_send_simple_reply() to accept a greater-than-zero return value from rxrpc_kernel_send_data() as being a successful return rather than thinking it an error and aborting the call. rxrpc_kernel_send_data() previously returned zero incorrectly when it worked successfully, but has been patched to return the number of bytes it transmitted. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 08:54:14 -07:00
Nick Piggin	1833633803	fix some conversion overflows Fix page index to offset conversion overflows in buffer layer, ecryptfs, and ocfs2. It would be nice to convert the whole tree to page_offset, but for now just fix the bugs. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 08:44:19 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Al Viro	5f47c7eac6	coda breakage a) switch by loff_t == __cmpdi2 use. Replaced with a couple of obvious ifs; update of ->f_pos in the first one makes sure that we do the right thing in all cases. b) block_signals() and unblock_signals() are globals on UML. Renamed coda ones; in principle UML probably ought to do rename as well, but that's another story. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 16:29:55 -07:00
Linus Torvalds	fdb64f93b3	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Fix inode size update before data write in xfs_setattr [XFS] Allow punching holes to free space when at ENOSPC [XFS] Implement ->page_mkwrite in XFS. [FS] Implement block_page_mkwrite. Manually fix up conflict with Nick's VM fault handling patches in fs/xfs/linux-2.6/xfs_file.c Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 14:41:33 -07:00
Linus Torvalds	3e1f900bff	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: NFSv4: handle lack of clientaddr in option string NFSv4: debug print ntohl(status) in nfs client callback xdr code SUNRPC: Clean up the sillyrename code NFS: Introduce struct nfs_removeargs+nfs_removeres NFS: Use dentry->d_time to store the parent directory verifier. SUNRPC: move bkl locking and xdr proc invocation into a common helper NFSv4: Fix the nfsv4 readlink reply buffer alignment NFSv4: Fix the readdir reply buffer alignment NFSv4: More NFSv4 xdr cleanups NFSv4: Try to recover from getfh failures in nfs4_xdr_dec_open NFSv4: 'constify' lookup arguments. NFSv4: Don't fail nfs4_xdr_dec_open if decode_restorefh() failed NFSv4: Fix open state recovery NFSD/SUNRPC: Fix the automatic selection of RPCSEC_GSS	2007-07-19 14:33:41 -07:00
Linus Torvalds	f745bb1c73	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: ->fallocate() support	2007-07-19 14:16:44 -07:00
Jeff Layton	0a87cf128f	NFSv4: handle lack of clientaddr in option string If a NFSv4 mount is attempted with string based options, and the option string doesn't contain a clientaddr= option, the kernel will currently oops. Check for this situation and return a proper error. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:40 -04:00
Benny Halevy	f9d888fcd9	NFSv4: debug print ntohl(status) in nfs client callback xdr code status in nfs client callback xdr code is passed in network order. print it in host order for better readability. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:40 -04:00
Trond Myklebust	e4eff1a622	SUNRPC: Clean up the sillyrename code Fix a couple of bugs: - Don't rely on the parent dentry still being valid when the call completes. Fixes a race with shrink_dcache_for_umount_subtree() - Don't remove the file if the filehandle has been labelled as stale. Fix a couple of inefficiencies - Remove the global list of sillyrenamed files. Instead we can cache the sillyrename information in the dentry->d_fsdata - Move common code from unlink_setup/unlink_done into fs/nfs/unlink.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	4fdc17b2a7	NFS: Introduce struct nfs_removeargs+nfs_removeres We need a common structure for setting up an unlink() rpc call in order to fix the asynchronous unlink code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	3062c532ad	NFS: Use dentry->d_time to store the parent directory verifier. This will free up the d_fsdata field for other use. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:21:39 -04:00
Trond Myklebust	e3a535e173	NFSv4: Fix the nfsv4 readlink reply buffer alignment Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:04 -04:00
Trond Myklebust	d6ac02dfaa	NFSv4: Fix the readdir reply buffer alignment Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:04 -04:00
Trond Myklebust	9104a55dc3	NFSv4: More NFSv4 xdr cleanups Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:04 -04:00
Trond Myklebust	9936781d01	NFSv4: Try to recover from getfh failures in nfs4_xdr_dec_open Try harder to recover the open state if the server failed to return a filehandle. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	56659e9926	NFSv4: 'constify' lookup arguments. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	365c8f589a	NFSv4: Don't fail nfs4_xdr_dec_open if decode_restorefh() failed We can already easily recover from that inside _nfs4_proc_open(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	6f220ed5a8	NFSv4: Fix open state recovery Ensure that opendata->state is always initialised when we do state recovery. Ensure that we set the filehandle in the case where we're doing an "OPEN_CLAIM_PREVIOUS" call due to a server reboot. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:03 -04:00
Trond Myklebust	8cd69e1bc7	NFSD/SUNRPC: Fix the automatic selection of RPCSEC_GSS Bruce's patch broke the ability to compile RPCSEC_GSS as a module. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-19 15:09:02 -04:00
Andrew Morton	275afcac99	afs build fix Bruce and David's patches clashed. fs/afs/flock.c: In function 'afs_do_getlk': fs/afs/flock.c:459: error: void value not ignored as it ought to be Cc: "J. Bruce Fields" <bfields@fieldses.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:57 -07:00
J. Bruce Fields	c7d51402d2	knfsd: clean up EX_RDONLY Share a little common code, reverse the arguments for consistency, drop the unnecessary "inline", and lowercase the name. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	e22841c637	knfsd: move EX_RDONLY out of header EX_RDONLY is only called in one place; just put it there. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	5d3dbbeaf5	nfsd: remove unnecessary NULL checks from nfsd_cross_mnt We can now assume that rqst_exp_get_by_name() does not return NULL; so clean up some unnecessary checks. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	9a25b96c1f	nfsd: return errors, not NULL, from export functions I converted the various export-returning functions to return -ENOENT instead of NULL, but missed a few cases. This particular case could cause actual bugs in the case of a krb5 client that doesn't match any ip-based client and that is trying to access a filesystem not exported to krb5 clients. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
J. Bruce Fields	a280df32db	nfsd: fix possible read-ahead cache and export table corruption The value of nperbucket calculated here is too small--we should be rounding up instead of down--with the result that the index j in the following loop can overflow the raparm_hash array. At least in my case, the next thing in memory turns out to be export_table, so the symptoms I see are crashes caused by the appearance of four zeroed-out export entries in the first bucket of the hash table of exports (which were actually entries in the readahead cache, a pointer to which had been written to the export table in this initialization code). It looks like the bug was probably introduced with commit `fce1456a19` ("knfsd: make the readahead params cache SMP-friendly"). Cc: <stable@kernel.org> Cc: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Yoann Padioleau	dd00cc486a	some kmalloc/memset ->kzalloc (tree wide) Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc). Here is a short excerpt of the semantic patch performing this transformation: @@ type T2; expression x; identifier f,fld; expression E; expression E1,E2; expression e1,e2,e3,y; statement S; @@ x = - kmalloc + kzalloc (E1,E2) ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\) - memset((T2)x,0,E1); @@ expression E1,E2,E3; @@ - kzalloc(E1 * E2,E3) + kcalloc(E1,E2,E3) [akpm@linux-foundation.org: get kcalloc args the right way around] Signed-off-by: Yoann Padioleau <padator@wanadoo.fr> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Bryan Wu <bryan.wu@analog.com> Acked-by: Jiri Slaby <jirislaby@gmail.com> Cc: Dave Airlie <airlied@linux.ie> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Jiri Kosina <jkosina@suse.cz> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Acked-by: Pierre Ossman <drzeus-list@drzeus.cx> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Greg KH <greg@kroah.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:50 -07:00
Jan Harkes	5b7f13bd26	coda: update module information Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:49 -07:00
Jan Harkes	3cf01f28c3	coda: remove statistics counters from /proc/fs/coda Similar information can easily be obtained with strace -c. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	a1b0aa8764	coda: remove struct coda_sb_info The sb_info structure only contains a single pointer to the character device, there is no need for the added indirection. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	5fd31e9a67	coda: cleanup downcall handler Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	ed36f72367	coda: cleanup coda_lookup, use dsplice_alias Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	970648eb03	coda: ignore returned values when upcalls return errors Venus returns an ENOENT error on open, so we shouldn't try to grab the filehandle for the returned fd. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	37461e1957	coda: replace upc_alloc/upc_free with kmalloc/kfree Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	978752534e	coda: avoid lockdep warning in coda_readdir Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	d9664c95af	coda: block signals during upcall processing We ignore signals for about 30 seconds to give userspace a chance to see the upcall. As we did not block signals we ended up in a busy loop for the remainder of the period when a signal is received. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	fe71b5f387	coda: cleanup for upcall handling path Make the code that processes upcall responses more straightforward, uncovered at least one bad assumption. We trusted that vc_inuse would be 0 when upcalls are aborted, however the device may have been reopened. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	8706551963	coda: cleanup /dev/cfs open and close handling - Make sure device index is not a negative number. - Unlink queued requests when the device is closed to avoid passing them to the next opener. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	ed31a7dd63	coda: use ilookup5 Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	fac1f0e340	coda: coda doesn't track atime Set MS_NOATIME flag to avoid unnecessary calls when the coda inode is accessed. Also, set statfs.f_bsize to 4k. 1k is obviously too small for the suggested IO size. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	8c6d215284	coda: allow removal of busy directories A directory without children may still be busy when it is the cwd for some process. We can safely remove such a directory because the VFS prevents further operations. Also we don't need to call d_delete as it is already called in vfs_rmdir. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	d728900cd5	coda: fix nlink updates for directories The Coda client sets the directory link count to 1 when it isn't sure how many subdirectories we have. In this case we shouldn't change the link count in the kernel when a subdirectory is created or removed. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	56ee354794	coda: correctly invalidate cached access rights Change the epoch value to force a refresh instead of clearing the cached rights mask and blocking all further accesses to the object. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Jan Harkes	38c2e4370d	coda: do not grab an uninitialized fd when the open upcall returns an error When open fails the fd in the response is uninitialized and we ended up taking a reference on the file struct and never released it. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:48 -07:00
Mingming Cao	b38bd33a6b	fix ext4/JBD2 build warnings Looking at the current linus-git tree jbd_debug() define in include/linux/jbd2.h extern u8 journal_enable_debug; #define jbd_debug(n, f, a...) \ do { \ if ((n) <= journal_enable_debug) { \ printk (KERN_DEBUG "(%s, %d): %s: ", \ __FILE__, __LINE__, __FUNCTION__); \ printk (f, ## a); \ } \ } while (0) > fs/ext4/inode.c: In function 'ext4_write_inode': > fs/ext4/inode.c:2906: warning: comparison is always true due to limited > range of data type > > fs/jbd2/recovery.c: In function 'jbd2_journal_recover': > fs/jbd2/recovery.c:254: warning: comparison is always true due to > limited range of data type > fs/jbd2/recovery.c:257: warning: comparison is always true due to > limited range of data type > > fs/jbd2/recovery.c: In function 'jbd2_journal_skip_recovery': > fs/jbd2/recovery.c:301: warning: comparison is always true due to > limited range of data type > Noticed that all warnings occur when the debug level is 0. Then found that the "jbd2: Move jbd2-debug file to debugfs" patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b changed jbd2_journal_enable_debug from int type to u8, which makes the jbd_debug comparison always true when the debugging level is 0. Thus the compile warning occurs. Thought about changing the jbd2_journal_enable_debug data type back to int, but can't, because jbd2-debug is moved to debugfs, where calling debugfs_create_u8() to create the debugfs entry needs the value to be u8 type. Even if we changed the data type back to int, the code is still buggy: the kernel should not print jbd2 debug messages if jbd2_journal_enable_debug is set to 0. But this is not the case. The fix is to change the level of debugging to 1. The same should be fixed in ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we probably should fix it all together. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	ee78b0a61f	coredump masking: ELF-FDPIC: enable core dump filtering This patch enables core dump filtering for ELF-FDPIC-formatted core file. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	e2e00906a0	coredump masking: ELF-FDPIC: remove an unused argument This patch removes an unused argument from elf_fdpic_dump_segments(). Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	a1b59e802f	coredump masking: ELF: enable core dump filtering This patch enables core dump filtering for ELF-formatted core file. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	3cb4a0bb1e	coredump masking: add an interface for core dump filter This patch adds an interface to set/reset flags which determine whether each memory segment should be dumped when a core file is generated. The /proc/<pid>/coredump_filter file is provided to access the flags. You can read or change the flag status for a particular process by reading from or writing to the file. The flag status is inherited by the child process when it is created. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Kawai, Hidehiro	6c5d523826	coredump masking: reimplementation of dumpable using two flags This patch changes mm_struct.dumpable to a pair of bit flags. set_dumpable() converts three-value dumpable to two flags and stores it into lower two bits of mm_struct.flags instead of mm_struct.dumpable. get_dumpable() behaves in the opposite way. [akpm@linux-foundation.org: export set_dumpable] Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:46 -07:00
Josef 'Jeff' Sipek	f79c20f525	fs: remove path_walk export Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	c4a7808fc3	fs: mark link_path_walk static Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	16b6287a52	nfsctl: use vfs_path_lookup use vfs_path_lookup instead of open-coding the necessary functionality. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Acked-by: NeilBrown <neilb@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Josef 'Jeff' Sipek	16f1820028	fs: introduce vfs_path_lookup Stackable file systems, among others, frequently need to look up paths or path components starting from an arbitrary point in the namespace (identified by a dentry and a vfsmount). Currently, such file systems use lookup_one_len, which is frowned upon [1] as it does not pass the lookup intent along; not passing a lookup intent, for example, can trigger BUG_ON's when stacking on top of NFSv4. The first patch introduces a new lookup function to allow lookup starting from an arbitrary point in the namespace. This approach has been suggested by Christoph Hellwig [2]. The second patch changes sunrpc to use vfs_path_lookup. The third patch changes nfsctl.c to use vfs_path_lookup. The fourth patch marks link_path_walk static. The fifth, and last patch, unexports path_walk because it is no longer necessary to call it directly, and using the new vfs_path_lookup is cleaner. For example, the following snippet of code looks up "some/path/component" in a directory pointed to by parent_{dentry,vfsmnt}: err = vfs_path_lookup(parent_dentry, parent_vfsmnt, "some/path/component", 0, &nd); if (!err) { /* exists */ ... /* once done, release the references */ path_release(&nd); } else if (err == -ENOENT) { /* doesn't exist */ } else { /* other error */ } VFS functions such as lookup_create can be used on the nameidata structure to pass the create intent to the file system. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Ollie Wild	b6a2fea393	mm: variable length argument support Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from the old mm into the new mm. We create the new mm before the binfmt code runs, and place the new stack at the very top of the address space. Once the binfmt code runs and figures out where the stack should be, we move it downwards. It is a bit peculiar in that we have one task with two mm's, one of which is inactive. [a.p.zijlstra@chello.nl: limit stack size] Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Hugh Dickins <hugh@veritas.com> [bunk@stusta.de: unexport bprm_mm_init] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Peter Zijlstra	bdf4c48af2	audit: rework execve audit The purpose of audit_bprm() is to log the argv array to a userspace daemon at the end of the execve system call. Since user-space hasn't had time to run, this array is still in pristine state on the process' stack; so no need to copy it, we can just grab it from there. In order to minimize the damage to audit_log_*() copy each string into a temporary kernel buffer first. Currently the audit code requires that the full argument vector fits in a single packet. So currently it does clip the argv size to a (sysctl) limit, but only when execve auditing is enabled. If the audit protocol gets extended to allow for multiple packets this check can be removed. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> Cc: <linux-audit@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:45 -07:00
Rusty Russell	cf914a7d65	readahead: split ondemand readahead interface into two functions Split ondemand readahead interface into two functions. I think this makes it a little clearer for non-readahead experts (like Rusty). Internally they both call ondemand_readahead(), but the page argument is changed to an obvious boolean flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	d8983910a4	readahead: pass real splice size Pass real splice size to page_cache_readahead_ondemand(). The splice code works in chunks of 16 pages internally. The readahead code should be told of the overall splice size, instead of the internal chunk size. Otherwise bad things may happen. Imagine some 17-page random splice reads. The code before this patch will result in two readahead calls: readahead(16); readahead(1); That leads to one 16-page I/O and one 32-page I/O: one extra I/O and 31 readahead miss pages. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	431a4820bf	readahead: move synchronous readahead call out of splice loop Move synchronous page_cache_readahead_ondemand() call out of splice loop. This avoids one pointless page allocation/insertion in case of non-zero ra_pages, or many pointless readahead calls in case of zero ra_pages. Note that if a user sets ra_pages to less than PIPE_BUFFERS=16 pages, he will not get expected readahead behavior anyway. The splice code works in batches of 16 pages, which can be taken as another form of synchronous readahead. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	dc7868fcb9	readahead: convert ext3/ext4 invocations Convert ext3/ext4 dir reads to use on-demand readahead. Readahead for dirs operates _not_ on file level, but on blockdev level. This makes a difference when the data blocks are not continuous. And the read routine is somehow opaque: there's no handy info about the status of current page. So a simplified call scheme is employed: to call into readahead whenever the current page falls out of readahead windows. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	a08a166fe7	readahead: convert splice invocations Convert splice reads to use on-demand readahead. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Jens Axboe <axboe@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Michael Halcrow	64ee4808a7	eCryptfs: ecryptfs_setattr() bugfix There is another bug recently introduced into the ecryptfs_setattr() function in 2.6.22. eCryptfs will attempt to treat special files like regular eCryptfs files on chmod, chown, and so forth. This leads to a NULL pointer dereference. This patch validates that the file is a regular file before proceeding with operations related to the inode's crypt_stat. Thanks to Ryusuke Konishi for finding this bug and suggesting the fix. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Ravikiran G Thirumalai	4004c69ad6	Avoid too many remote cpu references due to /proc/stat Optimize show_stat to collect per-irq information just once. On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem. On every call to kstat_irqs, the process brings in per-cpu data from all online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS, results in (256+32*63) * 63 remote cpu references on a 64 cpu config. Considering the fact that we already compute this value per-cpu, we can save on the remote references as below. Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Akinobu Mita	e53252d97e	unregister_chrdev() return void unregister_chrdev() does not return meaningful value. This patch makes it return void like most unregister_* functions. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Cyrill Gorcunov	cb00ea3528	UDF: coding style conversion - lindent This patch converts UDF coding style to kernel coding style using Lindent. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:43 -07:00
Nick Piggin	83c54070ee	mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	d0217ac04c	mm: fault feedback #1 Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	54cb8821de	mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should not need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Nick Piggin	d00806b183	mm: fix fault vs invalidate race for linear mappings Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. 
However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
David Chinner	c32676eea1	[XFS] Fix inode size update before data write in xfs_setattr When changing the file size by a truncate() call, we log the change in the inode size. However, we do not flush any outstanding data that might not have been written to disk, thereby violating the data/inode size update order. This can leave files full of NULLs on crash. Hence if we are truncating the file, flush any unwritten data that may lie between the current on disk inode size and the new inode size that is being logged to ensure that ordering is preserved. SGI-PV: 966308 SGI-Modid: xfs-linux-melb:xfs-kern:29174a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:52:05 +10:00
David Chinner	91ebecc74e	[XFS] Allow punching holes to free space when at ENOSPC Make the free file space transaction able to dip into the reserved blocks to ensure that we can successfully free blocks when the filesystem is at ENOSPC. SGI-PV: 967788 SGI-Modid: xfs-linux-melb:xfs-kern:29167a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:51:46 +10:00
David Chinner	4f57dbc6b5	[XFS] Implement ->page_mkwrite in XFS. Hook XFS up to ->page_mkwrite to ensure that we know about mmap pages being written to. This allows us to do correct delayed allocation and ENOSPC checking as well as remap unwritten extents so that they get converted correctly during writeback. This is done via the generic block_page_mkwrite code. SGI-PV: 940392 SGI-Modid: xfs-linux-melb:xfs-kern:29149a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:51:21 +10:00
David Chinner	5417169026	[FS] Implement block_page_mkwrite. Many filesystems need a ->page_mkwrite callout to correctly set up pages that have been written to by mmap. This is especially important when mmap is writing into holes as it allows filesystems to correctly account for and allocate space before the mmap write is allowed to proceed. Protection against truncate races is provided by locking the page and checking to see whether the page mapping is correct and whether it is beyond EOF so we don't end up allowing allocations beyond the current EOF or changing EOF as a result of a mmap write. SGI-PV: 940392 SGI-Modid: 2.6.x-xfs-melb:linux:29146a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-19 19:50:50 +10:00
Mark Fasheh	385820a38d	ocfs2: ->fallocate() support Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing preallocation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-19 00:23:55 -07:00
Linus Torvalds	789c56b7f7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (24 commits) [CIFS] merge conflict in fs/cifs/export.c [CIFS] Allow disabling CIFS Unix Extensions as mount option [CIFS] More whitespace/formatting fixes (noticed by checkpatch) [CIFS] Typo in previous patch [CIFS] zero_user_page() conversions [CIFS] use simple_prepare_write to zero page data [CIFS] Fix build break - inet.h not included when experimental ifdef off [CIFS] Add support for new POSIX unlink [CIFS] whitespace/formatting fixes [CIFS] Fix oops in cifs_create when nfsd server exports cifs mount [CIFS] whitespace cleanup [CIFS] Fix packet signatures for NTLMv2 case [CIFS] more whitespace fixes [CIFS] more whitespace cleanup [CIFS] whitespace cleanup [CIFS] whitespace cleanup [CIFS] ipv6 support no longer experimental [CIFS] Mount should fail if server signing off but client mount option requires it [CIFS] whitespace fixes [CIFS] Fix sign mount option and sign proc config setting ...	2007-07-18 18:32:28 -07:00
Linus Torvalds	29e7ee378e	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: cosmetic clean up on node creation failure paths sysfs: kill an extra put in sysfs_create_link() failure path Driver core: check return code of sysfs_create_link() HOWTO: Add the knwon_regression URI to the documentation dev_vdbg() documentation dev_vdbg(), available with -DVERBOSE_DEBUG sysfs: make sysfs_init_inode() static sysfs: fix sysfs root inode nlink accounting Documentation fix devres.txt: lib/iomap.c -> lib/devres.c sysfs: avoid kmem_cache_free(NULL) PM: remove deprecated dpm_runtime_* routines PM: Remove deprecated sysfs files Driver core: accept all valid action-strings in uevent-trigger debugfs: remove rmdir() non-empty complaint	2007-07-18 18:28:08 -07:00
Linus Torvalds	a8dcf12f9e	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: locks: fix vfs_test_lock() comment locks: make posix_test_lock() interface more consistent nfs: disable leases over NFS gfs2: stop giving out non-cluster-coherent leases locks: export setlease to filesystems locks: provide a file lease method enabling cluster-coherent leases locks: rename lease functions to reflect locks.c conventions locks: share more common lease code locks: clean up lease_alloc() locks: convert an -EINVAL return to a BUG leases: minor break_lease() comment clarification	2007-07-18 18:27:00 -07:00
Steve French	1ff8392c32	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/cifs/export.c	2007-07-19 00:38:57 +00:00
Steve French	70b315b0dd	[CIFS] merge conflict in fs/cifs/export.c Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-19 00:32:25 +00:00
Steve French	c18c842b1f	[CIFS] Allow disabling CIFS Unix Extensions as mount option Previously the only way to do this was to umount all mounts to that server, turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled). Fixes Samba bugzilla bug number: 4582 (and also 2008) Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-18 23:21:09 +00:00
J. Bruce Fields	6924c55492	locks: fix vfs_test_lock() comment Thanks to Doug Chapman for pointing out that the comment here is inconsistent with the function prototype. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	6d34ac199a	locks: make posix_test_lock() interface more consistent Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflict lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	370f6599e8	nfs: disable leases over NFS As Peter Staubach says elsewhere (http://marc.info/?l=linux-kernel&m=118113649526444&w=2): > The problem is that some file system such as NFSv2 and NFSv3 do > not have sufficient support to be able to support leases correctly. > In particular for these two file systems, there is no over the wire > protocol support. > > Currently, these two file systems fail the fcntl(F_SETLEASE) call > accidentally, due to a reference counting difference. These file > systems should fail more consciously, with a proper error to > indicate that the call is invalid for them. Define an nfs setlease method that just returns -EINVAL. If someone can demonstrate a real need, perhaps we could reenable them in the presence of the "nolock" mount option. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Peter Staubach <staubach@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-18 19:17:19 -04:00
Marc Eshel	60446067ba	gfs2: stop giving out non-cluster-coherent leases Since gfs2 can't prevent conflicting opens or leases on other nodes, we probably shouldn't allow it to give out leases at all. Put the newly defined lease operation into use in gfs2 by turning off leases, unless we're using the "nolock" locking module (in which case all locking is local anyway). Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Steven Whitehouse <swhiteho@redhat.com>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	4698afe8e3	locks: export setlease to filesystems Export setlease so it can be used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:06 -04:00
J. Bruce Fields	f9ffed26d6	locks: provide a file lease method enabling cluster-coherent leases Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:14:47 -04:00
J. Bruce Fields	a9933cea7a	locks: rename lease functions to reflect locks.c conventions We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:14:12 -04:00
J. Bruce Fields	6d5e8b05ca	locks: share more common lease code Share more code between setlease (used by nfsd) and fcntl. Also some minor cleanup. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Christoph Hellwig <hch@infradead.org>	2007-07-18 19:09:27 -04:00
J. Bruce Fields	e32b8ee27b	locks: clean up lease_alloc() Return the newly allocated structure as the return value instead of using a struct ** parameter. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
J. Bruce Fields	d2ab0b0c4c	locks: convert an -EINVAL return to a BUG There's no point trying to return an error in these cases, which all represent bugs in the callers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
david m. richter	87250dd26a	leases: minor break_lease() comment clarification clarify that break_lease() checks for presence of any lock, not just leases. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:09:27 -04:00
Tejun Heo	967e35dcc9	sysfs: cosmetic clean up on node creation failure paths Node addition failure is detected by testing the return value of sysfs_addrm_finish(), which returns the number of added and removed nodes. As the function is called as the last step of addition right on top of the error handling block, the if blocks looked like the following. if (sysfs_addrm_finish(&acxt)) success handling, usually return; /* fall through to error handling */ This is the opposite of the usual convention in sysfs and makes the code difficult to understand. This patch inverts the test and makes those blocks look more like others. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Gabriel C <nix.or.die@googlemail.com> Cc: Miles Lane <miles.lane@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:50 -07:00
Tejun Heo	a1da4dfe35	sysfs: kill an extra put in sysfs_create_link() failure path There is a subtle bug in sysfs_create_link() failure path. When symlink creation fails because there's already a node with the same name, the target sysfs_dirent is put twice - once by failure path of sysfs_create_link() and once more when the symlink is released. Fix it by making only the symlink node responsible for putting target_sd. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Gabriel C <nix.or.die@googlemail.com> Cc: Miles Lane <miles.lane@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:50 -07:00
Tejun Heo	bc37e28303	sysfs: make sysfs_init_inode() static With sysfs_fill_super() converted to use sysfs_get_inode(), there is no user of sysfs_init_inode() outside of fs/sysfs/inode.c. Make it static. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Tejun Heo	e080e436f6	sysfs: fix sysfs root inode nlink accounting While making sysfs inodes hashed, sysfs root inode was left out. Now that nlink accounting depends on the inode being on the hash, sysfs root inode nlink isn't adjusted properly. Put sysfs root inode on the inode hash by allocating it using sysfs_get_inode() like other sysfs inodes. While at it, massage comments a bit. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Akinobu Mita	01da2425f3	sysfs: avoid kmem_cache_free(NULL) kmem_cache_free() with NULL is not allowed. But it may happen if out of memory error is triggered in sysfs_new_dirent(). This patch fixes that error handling. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Jens Axboe	a6bb340da3	debugfs: remove rmdir() non-empty complaint Hi, This patch kills the pointless debugfs rmdir() printk() when called on a non-empty directory. blktrace will sometimes have to call it a few times when forcefully ending a trace, which pollutes the log with pointless warnings. Rationale: - It's more code to work-around this "problem" in the debugfs users, and you would have to add code to check for empty directories to do so (or assume that debugfs is using simple_ helpers, but that would be a layering violation). - Other rmdir() implementations don't complain about something this silly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:48 -07:00
Linus Torvalds	d756d10e24	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: extent macros cleanup Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. ext4: remove extra IS_RDONLY() check ext4: Use is_power_of_2() Use zero_user_page() in ext4 where possible ext4: Remove 65000 subdirectory limit ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields ext4: Add nanosecond timestamps jbd2: Move jbd2-debug file to debugfs jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices ext4: Make extents code sanely handle on-disk corruption ext4: copy i_flags to inode flags on write ext4: Enable extents by default Change on-disk format to support 2^15 uninitialized extents write support for preallocated blocks fallocate support in ext4 sys_fallocate() implementation on i386, x86_64 and powerpc	2007-07-18 10:32:00 -07:00
Jeremy Fitzhardinge	86313c488a	usermodehelper: Tidy up waiting Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>	2007-07-18 08:47:40 -07:00
Dmitry Monakhov	e9f410b1c0	ext4: extent macros cleanup Use the EXT_LAST_INDEX macro; that's what it's there for. Clean up ext4_ext_grow_indepth() so the correct EXT_FIRST_INDEX or EXT_FIRST_MACRO is used as necessary. The two macros are equivalent, so the C compiler will collapse the if statement out, but it makes the code much more readable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:09:15 -04:00
Dmitry Monakhov	26d535ed24	Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:33:37 -04:00
Dave Hansen	d699594dc1	ext4: remove extra IS_RDONLY() check ext4_change_inode_journal_flag() is only called from one location: ext4_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has a IS_RDONLY() call in it so this one is superfluous. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:33:51 -04:00
Vignesh Babu	1330593eb2	ext4: Use is_power_of_2() Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2() Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:11:02 -04:00
Eric Sandeen	fc0e15a667	Use zero_user_page() in ext4 where possible Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:20:44 -04:00
Andreas Dilger	f8628a14a2	ext4: Remove 65000 subdirectory limit This patch adds support to ext4 for allowing more than 65000 subdirectories. Currently the maximum number of subdirectories is capped at 32000. If we exceed 65000 subdirectories in an htree directory it sets the inode link count to 1 and no longer counts subdirectories. The directory link count is not actually used when determining if a directory is empty, as that only counts subdirectories and not regular files that might be in there. An EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if the subdir count for any directory crosses 65000. A later fsck will clear EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directories with >65000 subdirs. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:38:01 -04:00
Kalpak Shah	6dd4ee7cab	ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields We need to make sure that existing ext3 filesystems can also make use of the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by max(s_want_extra_isize, s_min_extra_isize, sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question about whether users should be able to set s_*_extra_isize smaller than the known fields or not. This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes and if that fails we try to expand by s_min_extra_isize bytes. This is done by changing the i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs then 1 or more EAs are shifted to the external EA block. In the worst case when even the external EA block does not have enough space we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:19:57 -04:00
Kalpak Shah	ef7f38359e	ext4: Add nanosecond timestamps This patch adds nanosecond timestamps for ext4. This involves adding *time_extra fields to the ext4_inode to extend the timestamps to 64-bits. Creation time is also added by this patch. These extended fields will fit into an inode if the filesystem was formatted with large inodes (-I 256 or larger) and there are currently no EAs consuming all of the available space. For new inodes we always reserve enough space for the kernel's known extended fields, but for inodes created with an old kernel this might not have been the case. So this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag (ro-compat so that older kernels can't create inodes with a smaller extra_isize), which indicates whether the fields fitting inside s_min_extra_isize are available or not. If the expansion of inodes is unsuccessful then this feature will be disabled. This feature is only enabled if requested by the sysadmin. None of the extended inode fields is critical for correct filesystem operation. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:15:20 -04:00
Jose R. Santos	0f49d5d019	jbd2: Move jbd2-debug file to debugfs The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it incorrectly used create_proc_entry() instead of the sysctl routines, and no proc entry was ever created. Instead of fixing this we might as well move the jbd2-debug file to debugfs which would be the preferred location for this kind of tunable. The new location is now /sys/kernel/debug/jbd2/jbd2-debug. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:50:18 -04:00
Jose R. Santos	e23291b912	jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG were never changed to CONFIG_JBD2_DEBUG. This patch fixes that. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:57:06 -04:00
Jose R. Santos	eb40a09c67	ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices Set the journal's JBD2_FEATURE_INCOMPAT_64BIT on devices with more than 32bit block sizes during mount time. This ensures proper record length when writing to the journal. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:37:25 -04:00
Alex Tomas	c29c0ae7f2	ext4: Make extents code sanely handle on-disk corruption Add more run-time checking of extent header fields and remove BUG_ON checks so we don't panic the kernel just because the on-disk filesystem is corrupted. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:19:09 -04:00
Jan Kara	ff9ddf7e84	ext4: copy i_flags to inode flags on write Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ext4-specific i_flags. Quota code changes these flags on quota files (to make it harder for sysadmin to screw himself) and these changes were not correctly propagated into the filesystem. (This is a forward port patch from ext3) Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:24:20 -04:00
Mingming Cao	1e2462f93e	ext4: Enable extents by default Turn on extents feature by default in ext4 filesystem, to get wider testing of extents feature in ext4dev. This can be disabled using -o noextents. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:00:55 -04:00
Amit Arora	749269faca	Change on-disk format to support 2^15 uninitialized extents This change was suggested by Andreas Dilger. This patch changes the EXT_MAX_LEN value and extent code which marks/checks uninitialized extents. With this change it will be possible to have initialized extents with 2^15 blocks (earlier the max blocks we could have was 2^15 - 1). This way we can have better extent-to-block alignment. Now, maximum number of blocks we can have in an initialized extent is 2^15 and in an uninitialized extent is 2^15 - 1. Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-18 09:02:56 -04:00
Amit Arora	56055d3ae4	write support for preallocated blocks This patch adds write support to the uninitialized extents that get created when a preallocation is done using fallocate(). It takes care of splitting the extents into multiple (up to three) extents and merging the new split extents with neighbouring ones, if possible. Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-17 21:42:38 -04:00
Amit Arora	a2df2a6340	fallocate support in ext4 This patch implements ->fallocate() inode operation in ext4. With this patch users of ext4 file systems will be able to use fallocate() system call for persistent preallocation. Current implementation only supports preallocation for regular files (directories not supported as of date) with extent maps. This patch does not support block-mapped files currently. Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of now. Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-17 21:42:41 -04:00
Amit Arora	97ac73506c	sys_fallocate() implementation on i386, x86_64 and powerpc fallocate() is a new system call being proposed here which will allow applications to preallocate space to any file(s) in a file system. Each file system implementation that wants to use this feature will need to support an inode operation called ->fallocate(). Applications can use this feature to avoid fragmentation to a certain level and thus get faster access speed. With preallocation, applications also get a guarantee of space for particular file(s) - even if later the system becomes full. Currently, glibc provides an interface called posix_fallocate() which can be used for a similar purpose. Though this has the advantage of working on all file systems, it is quite slow (since it writes zeroes to each block that has to be preallocated). Without a doubt, file systems can do this more efficiently within the kernel, by implementing the proposed fallocate() system call. It is expected that posix_fallocate() will be modified to call this new system call first and in case the kernel/filesystem does not implement it, it should fall back to the current implementation of writing zeroes to the new blocks. ToDos: 1. Implementation on other architectures (other than i386, x86_64, and ppc). Patches for s390(x) and ia64 are already available from previous posts, but it was decided that they should be added later once fallocate is in the mainline. Hence not including those patches in this take. 2. Changes to glibc, a) to support fallocate() system call b) to make posix_fallocate() and posix_fallocate64() call fallocate() Signed-off-by: Amit Arora <aarora@in.ibm.com>	2007-07-17 21:42:44 -04:00
Linus Torvalds	6dfce901a4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix debug compilation error	2007-07-17 15:23:50 -07:00
Linus Torvalds	b8c638acac	Merge branch 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 * 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6: arch/i386/* fs/* ipc/: mark variables with uninitialized_var() drivers/: mark variables with uninitialized_var()	2007-07-17 15:19:06 -07:00
Jeff Garzik	8e1c091ccc	arch/i386/* fs/* ipc/*: mark variables with uninitialized_var() Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-17 16:23:19 -04:00
Satyam Sharma	3bd858ab1c	Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check Introduce is_owner_or_cap() macro in fs.h, and convert over relevant users to it. This is done because we want to avoid bugs in the future where we check for only effective fsuid of the current task against a file's owning uid, without simultaneously checking for CAP_FOWNER as well, thus violating its semantics. [ XFS uses special macros and structures, and in general looked ... untouchable, so we leave it alone -- but it has been looked over. ] The (current->fsuid != inode->i_uid) check in generic_permission() and exec_permission_lite() is left alone, because those operations are covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 12:00:03 -07:00
Linus Torvalds	49c13b51a1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits) KVM: Use CPU_DYING for disabling virtualization KVM: Tune hotplug/suspend IPIs KVM: Keep track of which cpus have virtualization enabled SMP: Allow smp_call_function_single() to current cpu i386: Allow smp_call_function_single() to current cpu x86_64: Allow smp_call_function_single() to current cpu HOTPLUG: Adapt thermal throttle to CPU_DYING HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING HOTPLUG: Add CPU_DYING notifier KVM: Clean up #includes KVM: Remove kvmfs in favor of the anonymous inodes source KVM: SVM: Reliably detect if SVM was disabled by BIOS KVM: VMX: Remove unnecessary code in vmx_tlb_flush() KVM: MMU: Fix Wrong tlb flush order KVM: VMX: Reinitialize the real-mode tss when entering real mode KVM: Avoid useless memory write when possible KVM: Fix x86 emulator writeback KVM: Add support for in-kernel pio handlers KVM: VMX: Fix interrupt checking on lightweight exit KVM: Adds support for in-kernel mmio handlers ...	2007-07-17 11:50:26 -07:00
Steve French	63135e088a	[CIFS] More whitespace/formatting fixes (noticed by checkpatch) Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-17 17:34:02 +00:00
Mika Kukkonen	c381bfcf0c	Couple fixes to fs/ecryptfs/inode.c Following was uncovered by compiling the kernel with '-W' flag: CC [M] fs/ecryptfs/inode.o fs/ecryptfs/inode.c: In function ‘ecryptfs_lookup’: fs/ecryptfs/inode.c:304: warning: comparison of unsigned expression < 0 is always false fs/ecryptfs/inode.c: In function ‘ecryptfs_symlink’: fs/ecryptfs/inode.c:486: warning: comparison of unsigned expression < 0 is always false Function ecryptfs_encode_filename() can return -ENOMEM, so change the variables to plain int, as in the first case the only real use actually expects int, and in latter case there is no use beyond the error check. Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	1269bc69b6	knfsd: nfsd: enforce per-flavor id squashing Allow root squashing to vary per-pseudoflavor, so that you can (for example) allow root access only when sufficiently strong security is in use. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	9091224f3c	knfsd: nfsd: allow auth_sys nlm on rpcsec_gss exports Our clients (like other clients, as far as I know) use only auth_sys for nlm, even when using rpcsec_gss for the main nfs operations. Administrators that want to deny non-kerberos-authenticated locking requests will need to turn off NFS protocol versions less than 4.... Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	4796f45740	knfsd: nfsd4: secinfo handling without secinfo= option We could return some sort of error in the case where someone asks for secinfo on an export without the secinfo= option set--that'd be no worse than what we've been doing. But it's not really correct. So, hack up an approximate secinfo response in that case--it may not be complete, but it'll tell the client at least one acceptable security flavor. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
Andy Adamson	dcb488a3b7	knfsd: nfsd4: implement secinfo Implement the secinfo operation. (Thanks to Usha Ketineni, who wrote an earlier version of this support.) Cc: Usha Ketineni <uketinen@us.ibm.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	91fe39d35e	knfsd: nfsd: display export secinfo information Add secinfo information to the display in proc/net/sunrpc/nfsd.export/content. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	ac34cdb03d	knfsd: nfsd: factor out code from show_expflags Factor out some code to be shared by secinfo display code. Remove some unnecessary conditional printing of commas where we know the condition is true. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	0ec757df97	knfsd: nfsd4: make readonly access depend on pseudoflavor Allow readonly access to vary depending on the pseudoflavor, using the flag passed with each pseudoflavor in the export downcall. The rest of the flags are ignored for now, though some day we might also allow id squashing to vary based on the flavor. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
Andy Adamson	32c1eb0cd7	knfsd: nfsd4: return nfserr_wrongsec Make the first actual use of the secinfo information by using it to return nfserr_wrongsec when an export is found that doesn't allow the flavor used on this request. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	6c0a654dce	knfsd: nfsd: factor nfsd_lookup into 2 pieces Factor nfsd_lookup into nfsd_lookup_dentry, which finds the right dentry and export, and a second part which composes the filehandle (and which will later check the security flavor on the new export). No change in behavior. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	2ea2209f07	knfsd: nfsd: use ip-address-based domain in secinfo case With this patch, we fall back on using the gss/pseudoflavor only if we fail to find a matching auth_unix export that has a secinfo list. As long as sec= options aren't used, there's still no change in behavior here (except possibly for some additional auth_unix cache lookups, whose results will be ignored). The sec= option, however, is not actually enforced yet; later patches will add the necessary checks. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:08 -07:00
J. Bruce Fields	3ab4d8b121	knfsd: nfsd: set rq_client to ip-address-determined-domain We want it to be possible for users to restrict exports both by IP address and by pseudoflavor. The pseudoflavor information has previously been passed using special auth_domains stored in the rq_client field. After the preceding patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so now we use rq_client for the ip information, as auth_null and auth_unix do. However, we keep around the special auth_domain in the rq_gssclient field for backwards compatibility purposes, so we can still do upcalls using the old "gss/pseudoflavor" auth_domain if upcalls using the unix domain fail to give us an appropriate export. This allows us to continue supporting old mountd. In fact, for this first patch, we always use the "gss/pseudoflavor" auth_domain (and only it) if it is available; thus rq_client is ignored in the auth_gss case, and this patch on its own makes no change in behavior; that will be left to later patches. Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap upcall by a dummy value--no version of idmapd has ever used it, and it's unlikely anyone really wants to perform idmapping differently depending on where the client is (they may want to perform credential mapping differently, but that's a different matter--the idmapper just handles id's used in getattr and setattr). But I'm updating the idmapd code anyway, just out of general backwards-compatibility paranoia. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	0989a78896	knfsd: nfsd: provide export lookup wrappers which take a svc_rqst Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into those that are processing requests and those that are doing other stuff (like looking up filehandles for mountd). No change in behavior, just a (fairly pointless, on its own) cleanup. (Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client instead of exp->ex_client into exp_find_by_name(). However, the two should have the same value at this point.) Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	87548c37c8	knfsd: nfsd: remove superfluous assignment from nfsd_lookup The "err" variable will only be used in the final return, which always happens after either the preceding err = fh_compose(...); or after the following err = nfserrno(host_err); So the earlier assignment to err is ignored. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	df547efb03	knfsd: nfsd4: simplify exp_pseudoroot arguments We're passing three arguments to exp_pseudoroot, two of which are just fields of the svc_rqst. Soon we'll want to pass in a third field as well. So let's just give up and pass in the whole struct svc_rqst. Also sneak in some minor style cleanups while we're at it. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Andy Adamson	e677bfe4d4	knfsd: nfsd4: parse secinfo information in exports downcall We add a list of pseudoflavors to each export downcall, which will be used both as a list of security flavors allowed on that export, and (in the order given) as the list of pseudoflavors to return on secinfo calls. This patch parses the new downcall information and adds it to the export structure, but doesn't use it for anything yet. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	42ed95c4e7	knfsd: nfsd4: build rpcsec_gss whenever nfsd4 is built Select rpcsec_gss support whenever asked for NFSv4 support. The rfc actually requires gss, and gss is also the main reason to migrate to v4. We already do this on the client side. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	2d3bb25209	knfsd: nfsd: make all exp_finding functions return -errno's on err Currently exp_find(), exp_get_by_name(), and friends, return an export on success, and on failure return: errors -EAGAIN (drop this request pending an upcall) or -ETIMEDOUT (an upcall has timed out), or return NULL, which can mean either that there was a memory allocation failure, or that an export was not found, or that a passed-in export lacks an auth_domain. Many callers seem to assume that NULL means that an export was not found, which may lead to bugs in the case of a memory allocation failure. Modify these functions to distinguish between the two NULL cases by returning either -ENOENT or -ENOMEM. They now never return NULL. We get to simplify some code in the process. We return -ENOENT in the case of a missing auth_domain. This case should probably be removed (or converted to a bug) after confirming that it can never happen. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Meelap Shah	47f9940c55	knfsd: nfsd4: don't delegate files that have had conflicts One more incremental delegation policy improvement: don't give out a delegation on a file if conflicting access has previously required that a delegation be revoked on that file. (In practice we'll forget about the conflict when the struct nfs4_file is removed on close, so this is of limited use for now, though it should at least solve a temporary problem with self-conflicts on write opens from the same client.) Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Meelap Shah	c2f1a551de	knfsd: nfsd4: vary maximum delegation limit based on RAM size Our original NFSv4 delegation policy was to give out a read delegation on any open when it was possible to. Since the lifetime of a delegation isn't limited to that of an open, a client may quite reasonably hang on to a delegation as long as it has the inode cached. This becomes an obvious problem the first time a client's inode cache approaches the size of the server's total memory. Our first quick solution was to add a hard-coded limit. This patch makes a mild incremental improvement by varying that limit according to the server's total memory size, allowing at most 4 delegations per megabyte of RAM. My quick back-of-the-envelope calculation finds that in the worst case (where every delegation is for a different inode), a delegation could take about 1.5K, which would make the worst case usage about 6% of memory. The new limit works out to be about the same as the old on a 1-gig server. [akpm@linux-foundation.org: Don't needlessly bloat vmlinux] [akpm@linux-foundation.org: Make it right for highmem machines] Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	1e5140279f	knfsd: nfsd: remove unused header interface.h It looks like Al Viro gutted this header file five years ago and it hasn't been touched since. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
J. Bruce Fields	4b2ca38ad6	knfsd: nfsd4: fix handling of acl errors nfs4_acl_nfsv4_to_posix() returns an error and returns any posix acls calculated in two caller-provided pointers. It was setting these pointers to -errno in some error cases, resulting in nfsd4_set_nfs4_acl() calling posix_acl_release() with a -errno as an argument. Fix both the caller and the callee, by modifying nfsd4_set_nfs4_acl() to stop relying on the passed-in-pointers being left as NULL in the error case, and by modifying nfs4_acl_nfsv4_to_posix() to stop returning garbage in those pointers. Thanks to Alex Soule for reporting the bug. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Alexander Soule <soule@umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
Benny Halevy	0ac68d1799	knfsd: nfsd4: fix enc_stateid_sz for nfsd callbacks enc_stateid_sz should be given in u32 words units, not bytes, so we were overestimating the buffer space needed here. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:07 -07:00
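
The units mix-up described in the enc_stateid_sz fix can be illustrated with a small sketch. It assumes the 16-byte NFSv4 stateid layout (a 4-byte seqid plus 12 opaque bytes) and 4-byte XDR words; the names `STATEID_BYTES`, `XDR_UNIT`, and `bytes_to_xdr_words` are hypothetical and are not taken from the kernel source.

```python
# Illustrative sketch only: these names are hypothetical, not the kernel's macros.
XDR_UNIT = 4        # XDR encodes data in 4-byte (u32) units
STATEID_BYTES = 16  # assumed NFSv4 stateid size: 4-byte seqid + 12 opaque bytes


def bytes_to_xdr_words(nbytes: int) -> int:
    """Round a byte count up to a whole number of 4-byte XDR words."""
    return (nbytes + XDR_UNIT - 1) // XDR_UNIT


# Size expressed in the unit the buffer estimate expects (u32 words).
stateid_words = bytes_to_xdr_words(STATEID_BYTES)

# Space reserved when the word count is used correctly.
correct_bytes = stateid_words * XDR_UNIT

# Space reserved when the byte count is mistaken for a word count.
overestimated_bytes = STATEID_BYTES * XDR_UNIT

print(stateid_words, correct_bytes, overestimated_bytes)
```

Under these assumptions, treating the byte count as a word count reserves four times the space actually needed, which matches the overestimate the commit message describes.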


6345 Commits