linux

Author	SHA1	Message	Date
Christoph Hellwig	ef35e9255d	xfs: remove xfs_iput_new We never get an i_mode of 0 or a locked VFS inode until we pass in the XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to xfs_iput for the only caller. In addition to that xfs_nfs_get_inode does not even need to lock the inode given that the generation never changes for a life inode, so just pass a 0 lock_flags to xfs_iget and release the inode using IRELE in the error path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:44 -05:00
Christoph Hellwig	d2e078c33c	xfs: some iget tracing cleanups / fixes The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced. Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beggining of the xfs_iget_cache_hit/miss functions. Add a new xfs_iget_reclaim_fail tracepoint for the case where we fail to re-initialize a VFS inode, and add a second instance of the xfs_iget_skip tracepoint for the case of a failed igrab() call. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:43 -05:00
Christoph Hellwig	807cbbdb43	xfs: do not use emums for flags used in tracing The tracing code can't print flags defined as enums. Most flags that we want to print are defines as macros already, but move the few remaining ones over to make the trace output more useful. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:43 -05:00
Christoph Hellwig	64c8614941	xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umount On the final put of a superblock the VFS already calls sync_filesystem for us to write out all data and wait for it. No need to start another asynchronous writeback inside ->put_super. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:42 -05:00
Christoph Hellwig	f2bde9b89b	xfs: small cleanups for xfs_iomap / __xfs_get_blocks Remove the flags argument to __xfs_get_blocks as we can easily derive it from the direct argument, and remove the unused BMAPI_MMAP flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:42 -05:00
Christoph Hellwig	3070451eea	xfs: reduce stack usage in xfs_iomap xfs_iomap passes a xfs_bmbt_irec pointer to xfs_iomap_write_direct and xfs_iomap_write_allocate to give them the results of our read-only xfs_bmapi query. Instead of allocating a new xfs_bmbt_irec on stack for the next call to xfs_bmapi re use the one we got passed as it's not used after this point. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:42 -05:00
Christoph Hellwig	7a36c8a98a	xfs: avoid synchronous transaction in xfs_fs_write_inode We already rely on the fact that the sync code will cause a synchronous log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data -> xfs_sync_data), so no need to do this here. This allows us to avoid a lot of synchronous log forces during sync, which pays of especially with delayed logging enabled. Some compilebench numbers that show this: xfs (delayed logging, 256k logbufs) =================================== intial create 25.94 MB/s 25.75 MB/s 25.64 MB/s create 8.54 MB/s 9.12 MB/s 9.15 MB/s patch 2.47 MB/s 2.47 MB/s 3.17 MB/s compile 29.65 MB/s 30.51 MB/s 27.33 MB/s clean 90.92 MB/s 98.83 MB/s 128.87 MB/s read tree 11.90 MB/s 11.84 MB/s 8.56 MB/s read compiled 28.75 MB/s 29.96 MB/s 24.25 MB/s delete tree 8.39 seconds 8.12 seconds 8.46 seconds delete compiled 8.35 seconds 8.44 seconds 5.11 seconds stat tree 6.03 seconds 5.59 seconds 5.19 seconds stat compiled tree 9.00 seconds 9.52 seconds 8.49 seconds xfs + write_inode log_force removal =================================== intial create 25.87 MB/s 25.76 MB/s 25.87 MB/s create 15.18 MB/s 14.80 MB/s 14.94 MB/s patch 3.13 MB/s 3.14 MB/s 3.11 MB/s compile 36.74 MB/s 37.17 MB/s 36.84 MB/s clean 226.02 MB/s 222.58 MB/s 217.94 MB/s read tree 15.14 MB/s 15.02 MB/s 15.14 MB/s read compiled tree 29.30 MB/s 29.31 MB/s 29.32 MB/s delete tree 6.22 seconds 6.14 seconds 6.15 seconds delete compiled tree 5.75 seconds 5.92 seconds 5.81 seconds stat tree 4.60 seconds 4.51 seconds 4.56 seconds stat compiled tree 4.07 seconds 3.87 seconds 3.96 seconds In addition to that also remove the delwri inode flush that is unessecary now that bulkstat is always coherent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:41 -05:00
Christoph Hellwig	20cb52ebd1	xfs: simplify xfs_vm_writepage The writepage implementation in XFS still tries to deal with dirty but unmapped buffers which used to caused by writes through shared mmaps. Since the introduction of ->page_mkwrite these can't happen anymore, so remove the code dealing with them. Note that the all_bh variable which causes us to start I/O on all buffers on the pages was controlled by the count of unmapped buffers, which also included those not actually dirty. It's now unconditionally initialized to 0 but set to 1 for the case of small file size extensions. It probably can be removed entirely, but that's left for another patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:41 -05:00
Christoph Hellwig	89f3b36396	xfs: simplify xfs_vm_releasepage Currently the xfs releasepage implementation has code to deal with converting delayed allocated and unwritten space. But we never get called for those as we always convert delayed and unwritten space when cleaning a page, or drop the state from the buffers in block_invalidatepage. We still keep a WARN_ON on those cases for now, but remove all the case dealing with it, which allows to fold xfs_page_state_convert into xfs_vm_writepage and remove the !startio case from the whole writeback path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:40 -05:00
Eric Sandeen	3d9b02e3c7	xfs: fix corruption case for block size < page size xfstests 194 first truncats a file back and then extends it again by truncating it to a larger size. This causes discard_buffer to drop the mapped, but not the uptodate bit and thus creates something that xfs_page_state_convert takes for unmapped space created by mmap because it doesn't check for the dirty bit, which also gets cleared by discard_buffer and checked by other ->writepage implementations like block_write_full_page. Handle this kind of buffers early, and unlike Eric's first version of the patch simply ASSERT that the buffers is dirty, given that the mmap write case can't happen anymore since the introduction of ->page_mkwrite. The now dead code dealing with that will be deleted in a follow on patch. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:40 -05:00
Christoph Hellwig	b4e9181e77	xfs: remove unused delta tracking code in xfs_bmapi This code was introduced four years ago in commit `3e57ecf640` without any review and has been unused since. Remove it just as the rest of the code introduced in that commit to reduce that stack usage and complexity in this central piece of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:39 -05:00
Christoph Hellwig	cd8b0bb3c4	xfs: remove unused XFS_BMAPI_ flags Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:39 -05:00
Christoph Hellwig	a59f55703c	xfs: remove the unused XFS_TRANS_NOSLEEP/XFS_TRANS_WAIT flags Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:38 -05:00
Christoph Hellwig	9134c2332e	xfs: remove the unused XFS_LOG_SLEEP and XFS_LOG_NOSLEEP flags Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:38 -05:00
Christoph Hellwig	dbb2f6529f	xfs: kill the unused xlog_debug variable Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:37 -05:00
Christoph Hellwig	4e0d5f926b	xfs: fix the xfs_log_iovec i_addr type By making this member a void pointer we can get rid of a lot of pointless casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:36 -05:00
Christoph Hellwig	898621d5a7	xfs: simplify inode to transaction joining Currently we need to either call IHOLD or xfs_trans_ihold on an inode when joining it to a transaction via xfs_trans_ijoin. This patches instead makes xfs_trans_ijoin usable on it's own by doing an implicity xfs_trans_ihold, which also allows us to drop the third argument. For the case where we want to hold a reference on the inode a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks the inode for needing an xfs_iput. In addition to the cleaner interface to the caller this also simplifies the implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:36 -05:00
Christoph Hellwig	4d16e9246f	xfs: simplify buffer pinning Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode them in their only callers, just like we did for the inode pinning a while ago. Also remove duplicate trace points - the bufitem tracepoints cover all the information that is present in a buffer tracepoint. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:36 -05:00
Christoph Hellwig	ca30b2a7b7	xfs: give li_cb callbacks the correct prototype Stop the function pointer casting madness and give all the li_cb instances correct prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:35 -05:00
Christoph Hellwig	7bfa31d8e0	xfs: give xfs_item_ops methods the correct prototypes Stop the function pointer casting madness and give all the xfs_item_ops the correct prototypes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:35 -05:00
Christoph Hellwig	9412e3181c	xfs: merge iop_unpin_remove into iop_unpin The unpin_remove item operation instances always share most of the implementation with the respective unpin implementation. So instead of keeping two different entry points add a remove flag to the unpin operation and share the code more easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:34 -05:00
Christoph Hellwig	e98c414f9a	xfs: simplify log item descriptor tracking Currently we track log item descriptor belonging to a transaction using a complex opencoded chunk allocator. This code has been there since day one and seems to work around the lack of an efficient slab allocator. This patch replaces it with dynamically allocated log item descriptors from a dedicated slab pool, linked to the transaction by a linked list. This allows to greatly simplify the log item descriptor tracking to the point where it's just a couple hundred lines in xfs_trans.c instead of a separate file. The external API has also been simplified while we're at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/ delete items from a transaction have been simplified to the bare minium, and the xfs_trans_find_item function is replaced with a direct dereference of the li_desc field. All debug code walking the list of log items in a transaction is down to a simple list_for_each_entry. Note that we could easily use a singly linked list here instead of the double linked list from list.h as the fastpath only does deletion from sequential traversal. But given that we don't have one available as a library function yet I use the list.h functions for simplicity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:34 -05:00
Christoph Hellwig	3400777ff0	xfs: remove unneeded #include statements Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>	2010-07-26 13:16:33 -05:00
Christoph Hellwig	288699feca	xfs: drop dmapi hooks Dmapi support was never merged upstream, but we still have a lot of hooks bloating XFS for it, all over the fast pathes of the filesystem. This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM support in mainline at least the namespace events can be done much saner in the VFS instead of the individual filesystem, so it's not like this is much help for future work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:33 -05:00
Ryusuke Konishi	89c0fd014d	nilfs2: reject filesystem with unsupported block size This inserts sanity check that refuses to mount a filesystem with unsupported block size. Previously, kernel code of nilfs was looking only limitation of devices though mkfs.nilfs2 limits the range of block sizes; there was no check that prevents rec_len overflow with larger block sizes. With this change, block sizes larger than 64KB or smaller than 1KB will get rejected explicitly by kernel. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-25 23:29:21 +09:00
Ryusuke Konishi	6cda9fa257	nilfs2: avoid rec_len overflow with 64KB block size With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry length. So this patch stores 0xffff instead and converts value when read from / written to disk. Nilfs derives its directory implementation from ext2 filesystem, and this draws upon the corresponding change on ext2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-25 20:46:43 +09:00
Robert P. J. Day	25848b3ec6	ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES" Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-24 21:36:07 -07:00
Tejun Heo	40f2b6ffe5	fscache: fix build on !CONFIG_SYSCTL Commit `8b8edefa` (fscache: convert object to use workqueue instead of slow-work) made fscache_exit() call unregister_sysctl_table() unconditionally breaking build when sysctl is disabled. Fix it by putting it inside CONFIG_SYSCTL. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Howells <dhowells@redhat.com>	2010-07-24 11:10:09 +02:00
Ryusuke Konishi	c28e69d933	nilfs2: simplify nilfs_get_page function Implementation of nilfs_get_page() is a bit old as below: - A common read_mapping_page inline function is now available instead of its read_cache_page use. - wait_on_page_locked() use in the function is eliminable since read_cache_page function does the same thing through wait_on_page_read(). - PageUptodate() check is eliminable for the same reason. This renews nilfs_get_page() based on these points. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-24 17:36:29 +09:00
Sage Weil	1dadcce358	ceph: fix dentry lease release When we embed a dentry lease release notification in a request, invalidate our lease so we don't think we still have it. Otherwise we can get all sorts of incorrect client behavior when multiple clients are interacting with the same part of the namespace. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-23 13:54:21 -07:00
Sage Weil	8c696737aa	ceph: fix leak of dentry in ceph_init_dentry() error path If we fail to allocate a ceph_dentry_info, don't leak the dn reference. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-23 10:02:07 -07:00
Sage Weil	bc4fdca857	ceph: fix pg_mapping leak on pg_temp updates Free the ceph_pg_mapping structs when they are removed from the pg_temp rbtree. Also fix a leak in the __insert_pg_mapping() error path. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-23 10:02:06 -07:00
Sage Weil	252af52146	ceph: fix d_release dop for snapdir, snapped dentries We need to set the d_release dop for snapdir and snapped dentries so that the ceph_dentry_info struct gets released. We also use the dcache to cache readdir results when possible, which only works if we know when dentries are dropped from the cache. Since we don't use the dcache for readdir in the hidden snapdir, avoid that case in ceph_dentry_release. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-23 10:02:05 -07:00
J. Bruce Fields	af4718f3f9	nfsd: minor nfsd_svc() cleanup More idiomatic to put the error case in the if clause. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:27 -04:00
J. Bruce Fields	59db4a0c10	nfsd: move more into nfsd_startup() This is just cleanup--it's harmless to call nfsd_rachache_init, nfsd_init_socks, and nfsd_reset_versions more than once. But there's no point to it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:26 -04:00
Jeff Layton	ac77efbe2b	nfsd: just keep single lockd reference for nfsd Right now, nfsd keeps a lockd reference for each socket that it has open. This is unnecessary and complicates the error handling on startup and shutdown. Change it to just do a lockd_up when starting the first nfsd thread just do a single lockd_down when taking down the last nfsd thread. Because of the strange way the sv_count is handled this requires an extra flag to tell whether the nfsd_serv holds a reference for lockd or not. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:26 -04:00
Jeff Layton	628b368728	nfsd: clean up nfsd_create_serv error handling There doesn't seem to be any need to reset the nfssvc_boot time if the nfsd startup failed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:25 -04:00
Jeff Layton	0cd14a061e	nfsd: fix error handling in __write_ports_addxprt __write_ports_addxprt calls nfsd_create_serv. That increases the refcount of nfsd_serv (which is tracked in sv_nrthreads). The service only decrements the thread count on error, not on success like __write_ports_addfd does, so using this interface leaves the nfsd thread count high. Fix this by having this function call svc_destroy() on error to release the reference (and possibly to tear down the service) and simply decrement the refcount without tearing down the service on success. This makes the sv_threads handling work basically the same in both __write_ports_addxprt and __write_ports_addfd. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:24 -04:00
Jeff Layton	78a8d7c8ca	nfsd: fix error handling when starting nfsd with rpcbind down The refcounting for nfsd is a little goofy. What happens is that we create the nfsd RPC service, attach sockets to it but don't actually start the threads until someone writes to the "threads" procfile. To do this, __write_ports_addfd will create the nfsd service and then will decrement the refcount when exiting but won't actually destroy the service. This is fine when there aren't errors, but when there are this can cause later attempts to start nfsd to fail. nfsd_serv will be set, and that causes __write_versions to return EBUSY. Fix this by calling svc_destroy on nfsd_serv when this function is going to return error. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:23 -04:00
Jeff Layton	4ad9a344be	nfsd4: fix v4 state shutdown error paths If someone tries to shut down the laundry_wq while it isn't up it'll cause an oops. This can happen because write_ports can create a nfsd_svc before we really start the nfs server, and we may fail before the server is ever started. Also make sure state is shutdown on error paths in nfsd_svc(). Use a common global nfsd_up flag instead of nfs4_init, and create common helper functions for nfsd start/shutdown, as there will be other work that we want done only when we the number of nfsd threads transitions between zero and nonzero. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:51:22 -04:00
J. Bruce Fields	55b13354d7	nfsd: remove unused assignment from nfsd_link Trivial cleanup, since "dest" is never used. Reported-by: Anshul Madan <Anshul.Madan@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-07-23 08:50:39 -04:00
Tejun Heo	6ecd7c2dd9	gfs2: use workqueue instead of slow-work Workqueue can now handle high concurrency. Convert gfs to use workqueue instead of slow-work. * Steven pointed out that recovery path might be run from allocation path and thus requires forward progress guarantee without memory allocation. Create and use gfs_recovery_wq with rescuer. Please note that forward progress wasn't guaranteed with slow-work. * Updated to use non-reentrant workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-23 13:14:25 +02:00
Dave Chinner	aa32a79638	ext3: default to ordered mode data=writeback mode is dangerous as it leads to higher data loss and stale data exposure when systems crash. It should not be the default, especially when all major distros ensure their ext3 filesystems default to ordered mode. Change the default mode to the safer data=ordered mode, because we should be caring far more about avoiding stale data exposure than performance. CC: linux-ext4@vger.kernel.org Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-23 12:50:55 +02:00
Jan Kara	43d2932d88	quota: Use mark_inode_dirty_sync instead of mark_inode_dirty Quota code never touches file data. It just modifies i_blocks + i_bytes of inodes and inode flags of quota files. So use mark_inode_dirty_sync instead of mark_inode_dirty. Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-23 12:50:46 +02:00
Ryusuke Konishi	c5ca48aabe	nilfs2: reject incompatible filesystem This forces nilfs to check compatibility of feature flags so as to reject a filesystem with unknown features when it mounts or remounts the filesystem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:16 +09:00
Ryusuke Konishi	03bdb5ac58	nilfs2: apply read-ahead for nilfs_btree_lookup_contig This applies read-ahead to nilfs_btree_do_lookup and nilfs_btree_lookup_contig functions and extends them to read ahead siblings of level 1 btree nodes that hold data blocks. At present, the read-ahead is not applied to most btree operations; only get_block() callback function, which is used during read of regular files or directories, receives the benefit. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:16 +09:00
Ryusuke Konishi	4e13e66bee	nilfs2: introduce check flag to btree node buffer nilfs_btree_get_block() now may return untested buffer due to read-ahead. This adds a new flag for buffer heads so that the btree code can check whether the buffer is already verified or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	464ece8863	nilfs2: add btree get block function with readahead option This adds __nilfs_btree_get_block() function that can issue a series of read-ahead requests for sibling btree nodes. This read-ahead needs parent node block, so nilfs_btree_readahead_info structure is added to pass the information that __nilfs_btree_get_block() needs. This also replaces the previous nilfs_btree_get_block() implementation with a wrapper function of __nilfs_btree_get_block(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	26dfdd8e29	nilfs2: add read ahead mode to nilfs_btnode_submit_block This adds mode argument to nilfs_btnode_submit_block() function and allows it to issue a read-ahead request. An optional submit_ptr argument is also added to store the actual block address for which bio is sent. submit_ptr is used for a series of read-ahead requests, and helps to decide if each requested block is continous to the previous one on disk. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	f8e6cc013b	nilfs2: fix buffer head leak in nilfs_btnode_submit_block nilfs_btnode_submit_block() refers to buffer head just before returning from the function, but it releases the buffer head earlier than that if nilfs_dat_translate() gets an error. This has potential for oops in the erroneous case. This fixes the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	7c397a81fe	nilfs2: eliminate inline keywords in btree implementation This removes all inline uses from btree.c. Gcc now agressively apply inline expansion even for the functions declared without the keyword; the inline use in btree.c looks excessive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	5ad2686e92	nilfs2: get maximum number of child nodes from bmap object The patch "reduce repetitive calculation of max number of child nodes" gathered up the calculation of maximum number of child nodes into nilfs_btree_nchildren_per_block() function. This makes the function get resultant value from a private variable in bmap object instead of calculating it for each call. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	9b7b265c9a	nilfs2: reduce repetitive calculation of max number of child nodes The current btree implementation repeats the same calculation on the maximum number of child nodes. This is because a few low level routines use the calculation for index addressing in a btree node block. This reduces the calculation by explicitly passing the maximum number of child nodes (ncmax) through their argument. This changes parameter passing of the following functions: - nilfs_btree_node_dptrs - nilfs_btree_node_get_ptr - nilfs_btree_node_set_ptr - nilfs_btree_node_init - nilfs_btree_node_move_left - nilfs_btree_node_move_right - nilfs_btree_node_insert - nilfs_btree_node_delete, and - nilfs_btree_get_node The following functions are removed: - nilfs_btree_node_nchildren_min - nilfs_btree_node_nchildren_max Most middle level btree operations are rewritten to pass a proper ncmax value depending on whether each occurrence of node is "root" or not. A constant NILFS_BTREE_ROOT_NCHILDREN_MAX is used for the root node, whereas nilfs_btree_nchildren_per_block() function is used for non-root nodes. If a node could be either root or a non-root node, an output argument of nilfs_btree_get_node() is used to set up ncmax. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	ea64ab87cd	nilfs2: optimize calculation of min/max number of btree node children nilfs_btree_node_nchildren_max() and nilfs_btree_node_nchildren_min() functions switch return value depending on whether target node is the root or a node block. In most uses of these functions, however, the node type is fixed, and moreover the same calculation is repeatedly performed in loop. This unfold these functions depending on context and move them outside loops wherever possible. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	364ec2d700	nilfs2: remove redundant pointer checks in bmap lookup functions nilfs_bmap_lookup and its variants are supposed to take a valid pointer argument to return a block address, thus pointer checks in nilfs_btree_lookup and nilfs_direct_lookup are needless. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	05d0e94b66	nilfs2: get rid of nilfs_bmap_union This removes nilfs_bmap_union and finally unifies three structures and the union in bmap/btree code into one. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	dc935be2a0	nilfs2: unify bmap set_target_v operations This unifies two similar functions nilfs_btree_set_target_v and nilfs_direct_set_target_v into one, nilfs_bmap_set_target_v. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	e7c274f808	nilfs2: get rid of nilfs_btree uses This replaces all uses of nilfs_btree struct in implementation of btree mapping with nilfs_bmap struct. Name of local variable "btree" is kept not to bloat amount of change. And, a part of local variables "bmap" is renamed to "btree" to uniform naming rule. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	10ff885ba6	nilfs2: get rid of nilfs_direct uses This replaces all uses of nilfs_direct struct in implementation of direct mapping with nilfs_bmap struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	583ada4761	nilfs2: remove constant qualifier from argument of bmap propagate The first argument of bops->bop_propagate operation takes a constant qualifier, and causes compilation error when removed cast to pointer of nilfs_btree structure type. This fixes the issue to prepare for succesive removal of nilfs_btree struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	25b8d7ded0	nilfs2: get rid of private conversion macros on bmap key and pointer Will remove nilfs_bmap_key_to_dkey(), nilfs_bmap_dkey_to_key(), nilfs_bmap_ptr_to_dptr(), and nilfs_bmap_dptr_to_ptr() for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	1d5385b9f3	nilfs2: verify btree node after reading This inserts sanity checks soon after read btree node from disk. This allows early detection of broken btree nodes, and helps to narrow down problems due to file system corruption. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	cfa913a507	nilfs2: add sanity check in nilfs_btree_add_dirty_buffer According to the report titled "problem with nilfs_cleanerd" from Łukasz Wójcicki, nilfs_btree_lookup_dirty_buffers or nilfs_btree_add_dirty_buffer got memory violation during garbage collection. This could happen if a level field of given btree node buffer is incorrect, which is a crucial internal bug. This inserts a sanity check to figure out the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	7c01745781	nilfs2: pass remount flag to parse_options This adds is_remount argument to the parse_options() function that obtains mount options from strings. Previously, parse_options did not distinguish context whether it's called for a new mount or remount, so the caller needed additional verifications outside the function. This allows parse_options to verify options and print messages depending on the context. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	c6b4d57ddf	nilfs2: use seq_puts to print mount options without argument This replaces seq_printf() with seq_puts() in nilfs_show_options for mount options which have no argument. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	802d317754	nilfs2: add nodiscard mount option Nilfs has "discard" mount option which issues discard/TRIM commands to underlying block device, but it lacks a complementary option and has no way to disable the feature through remount. This adds "nodiscard" option to resolve this imbalance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	773bc4f3b6	nilfs2: add barrier mount option Nilfs enables write barriers by default and has "nobarrier" mount option to disable this feature. But it lacks the complementary option and has no way to re-enable the feature on remount. This adds "barrier" option to resolve this imbalance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	325020477a	nilfs2: do not update log cursor for small change Super blocks of nilfs are periodically overwritten in order to record the recent log position. This shortens recovery time after unclean unmount, but the current implementation performs the update even for a few blocks of change. If the filesystem gets small changes slowly and continually, super blocks may be updated excessively. This moderates the issue by skipping update of log cursor if it does not cross a segment boundary. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Ryusuke Konishi	6c12516083	nilfs2: implement fallback for super root search Although nilfs redundantly uses two super blocks and each may point to different position on log, the current version of nilfs does not try fallback to the spare super block when it doesn't find any valid log at the position that the primary super block points to. This has been a cause of mount failures due to write order reversals on barrier less block devices. This inserts fallback code in error path of nilfs_search_super_root routine to resolve the mount failure problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Ryusuke Konishi	2d72b99ecd	nilfs2: add missing error code in comment of nilfs_search_super_root nilfs_search_super_root can return -ENOMEM, but this error code is not described in its kernel-doc comment. This fixes the discrepancy. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Ryusuke Konishi	843d63baa5	nilfs2: separate setup of log cursor from init_nilfs This separates a setup routine of log cursor from init_nilfs(). The routine, nilfs_store_log_cursor, reads the last position of the log containing a super root, and initializes relevant state on the nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Jiro SEKIBA	b2ac86e1a8	nilfs2: sync super blocks in turns This will sync super blocks in turns instead of syncing duplicate super blocks at the time. This will help searching valid super root when super block is written into disk before log is written, which is happen when barrier-less block devices are unmounted uncleanly. In the situation, old super block likely points to valid log. This patch introduces ns_sbwcount member to the nilfs object and adds nilfs_sb_will_flip() function; ns_sbwcount counts how many times super blocks write back to the disk. And, nilfs_sb_will_flip() decides whether flipping required or not based on the count of ns_sbwcount to sync super blocks asymmetrically. The following functions are also changed: - nilfs_prepare_super(): flips super blocks according to the argument. The argument is calculated by nilfs_sb_will_flip() function. - nilfs_cleanup_super(): sets "clean" flag to both super blocks if they point to the same checkpoint. To update both of super block information, caller of nilfs_commit_super must set the information on both super blocks. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Jiro SEKIBA	d26493b6f0	nilfs2: introduce nilfs_prepare_super This function checks validity of super block pointers. If first super block is invalid, it will swap the super blocks. The function should be called before any super block information updates. Caller must obtain nilfs->ns_sem. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	60f46b7efc	nilfs2: separate function that updates log position This moves out section that updates information of the recent log position stored in super blocks from nilfs_commit_super to a new routine named nilfs_set_log_cursor. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	c8a11c8a14	nilfs2: add nilfs_set_error This function marks error state and write it on super blocks. This is a preparation for making super block writeback alternately. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	7ecaa46cfe	nilfs2: add nilfs_cleanup_super This function write out filesystem state to super blocks in order to share the same cleanup work. This is a preparation for making super block writeback alternately. Cc: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	bde4e696e4	nilfs2: do not update mount time on rw->ro remount Mount time field in super block is wrongly updated when nilfs remounts the partition from read-write to read-only. This fixes the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	57a4bfc486	nilfs2: get rid of ns_free_segments_count This counter is unused. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	4762077c7b	nilfs2: get rid of macros for segment summary information This removes macros to test segment summary flags and redefines a few relevant macros with inline functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	85655484f8	nilfs2: do not use nilfs_segsum_info structure in recovery code This will get rid of nilfs_segsum_info use from recovery functions for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	354fa8be28	nilfs2: divide load_segment_summary function load_segment_summary function has two distinct roles: getting summary header of a log, and verifying consistencies of the log. This divide it into two corresponding functions, nilfs_read_log_header and nilfs_validate_log to clarify the meaning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	aee5ce2f57	nilfs2: rename nilfs_recover_logical_segments function The function name of nilfs_recover_logical_segments makes no sense. This changes the name into nilfs_salvage_orphan_logs to clarify the role of the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	8b94025c00	nilfs2: refactor recovery logic routines Most functions in recovery code take an argument of a super block instance or a nilfs_sb_info struct for convenience sake. This replaces them aggressively with a nilfs object by applying __bread and __breadahead against routines using sb_bread and sb_breadahead. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:08 +09:00
Ryusuke Konishi	92c60ccaf3	nilfs2: add blocksize member to nilfs object This stores blocksize in nilfs objects for the successive refactoring of recovery logic. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:08 +09:00
Tejun Heo	9b64697246	cifs: use workqueue instead of slow-work Workqueue can now handle high concurrency. Use system_nrt_wq instead of slow-work. * Updated is_valid_oplock_break() to not call cifs_oplock_break_put() as advised by Steve French. It might cause deadlock. Instead, reference is increased after queueing succeeded and cifs_oplock_break() briefly grabs GlobalSMBSeslock before putting the cfile to make sure it doesn't put before the matching get is finished. * Anton Blanchard reported that cifs conversion was using now gone system_single_wq. Use system_nrt_wq which provides non-reentrance guarantee which is enough and much better. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Steve French <sfrench@samba.org> Cc: Anton Blanchard <anton@samba.org>	2010-07-22 22:59:15 +02:00
Tejun Heo	d098adfb7d	fscache: drop references to slow-work fscache no longer uses slow-work. Drop references to it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David Howells <dhowells@redhat.com>	2010-07-22 22:58:58 +02:00
Tejun Heo	8af7c12436	fscache: convert operation to use workqueue instead of slow-work Make fscache operation to use only workqueue instead of combination of workqueue and slow-work. FSCACHE_OP_SLOW is dropped and FSCACHE_OP_FAST is renamed to FSCACHE_OP_ASYNC and uses newly added fscache_op_wq workqueue to execute op->processor(). fscache_operation_init_slow() is dropped and fscache_operation_init() now takes @processor argument directly. * Unbound workqueue is used. * fscache_retrieval_work() is no longer necessary as OP_ASYNC now does the equivalent thing. * sysctl fscache.operation_max_active added to control concurrency. The default value is nr_cpus clamped between 2 and WQ_UNBOUND_MAX_ACTIVE. * debugfs support is dropped for now. Tracing API based debug facility is planned to be added. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David Howells <dhowells@redhat.com>	2010-07-22 22:58:47 +02:00
Tejun Heo	8b8edefa2f	fscache: convert object to use workqueue instead of slow-work Make fscache object state transition callbacks use workqueue instead of slow-work. New dedicated unbound CPU workqueue fscache_object_wq is created. get/put callbacks are renamed and modified to take @object and called directly from the enqueue wrapper and the work function. While at it, make all open coded instances of get/put to use fscache_get/put_object(). * Unbound workqueue is used. * work_busy() output is printed instead of slow-work flags in object debugging outputs. They mean basically the same thing bit-for-bit. * sysctl fscache.object_max_active added to control concurrency. The default value is nr_cpus clamped between 4 and WQ_UNBOUND_MAX_ACTIVE. * slow_work_sleep_till_thread_needed() is replaced with fscache private implementation fscache_object_sleep_till_congested() which waits on fscache_object_wq congestion. * debugfs support is dropped for now. Tracing API based debug facility is planned to be added. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David Howells <dhowells@redhat.com>	2010-07-22 22:58:34 +02:00
Sage Weil	a0dff78dab	ceph: avoid dcache readdir for snapdir We should always go to the MDS for readdir on the hidden snapdir. The set of snapshots can change at any time; the client can't trust its cache for that. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-22 13:50:45 -07:00
David Howells	4c0c03ca54	CIFS: Fix a malicious redirect problem in the DNS lookup code Fix the security problem in the CIFS filesystem DNS lookup code in which a malicious redirect could be installed by a random user by simply adding a result record into one of their keyrings with add_key() and then invoking a CIFS CFS lookup [CVE-2010-2524]. This is done by creating an internal keyring specifically for the caching of DNS lookups. To enforce the use of this keyring, the module init routine creates a set of override credentials with the keyring installed as the thread keyring and instructs request_key() to only install lookup result keys in that keyring. The override is then applied around the call to request_key(). This has some additional benefits when a kernel service uses this module to request a key: (1) The result keys are owned by root, not the user that caused the lookup. (2) The result keys don't pop up in the user's keyrings. (3) The result keys don't come out of the quota of the user that caused the lookup. The keyring can be viewed as root by doing cat /proc/keys: 2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4 It can then be listed with 'keyctl list' by root. # keyctl list 0x2a0ca6c3 1 key in keyring: 726766307: --alswrv 0 0 dns_resolver: foo.bar.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <smfrench@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-22 09:42:40 -07:00
Ingo Molnar	9dcdbf7a33	Merge branch 'linus' into perf/core Merge reason: Pick up the latest perf fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-21 21:43:06 +02:00
Johan Hedberg	63c7d09cd5	Bluetooth: Add HCIUARTSETFLAGS and HCIUARTGETFLAGS ioctls This patch introduces two new ioctls: HCIUARTSETFLAGS and HCIUARTGETFLAGS. The only flag available for now is HCI_UART_RAW_DEVICE which allows to initialize a UART device into RAW mode from userspace. This is particularly useful for experimenting with Bluetooth controllers that don't yet have proper support in BlueZ. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-07-21 10:39:11 -07:00
Johan Hedberg	2cdf096fff	Bluetooth: Add missing HCIUARTGETDEVICE ioctl to compat_ioctl.c Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-07-21 10:39:11 -07:00
Johan Hedberg	f03585689f	Bluetooth: Add blacklist support for incoming connections In some circumstances it could be desirable to reject incoming connections on the baseband level. This patch adds this feature through two new ioctl's: HCIBLOCKADDR and HCIUNBLOCKADDR. Both take a simple Bluetooth address as a parameter. BDADDR_ANY can be used with HCIUNBLOCKADDR to remove all devices from the blacklist. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-07-21 10:39:05 -07:00
Linus Torvalds	a4ce96ac35	Fix up trivial spelling errors ('taht' -> 'that') Pointed out by Lucas who found the new one in a comment in setup_percpu.c. And then I fixed the others that I grepped for. Reported-by: Lucas <canolucas@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-21 09:25:42 -07:00
Jiaying Zhang	fb5ffb0e16	quota: Change quota error message to print out disk and function name The current quota error message doesn't always print the disk name, so it is hard to identify the "bad" disk when quota error happens. This patch changes the standardized quota error message to print out disk name and function name. It also uses a combination of cpp macro and inline function to provide better type checking and to lower the text size of the message. [Jan Kara: Export __quota_error] Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:05:58 +02:00
Jan Kara	f25f624263	ext3: Avoid filesystem corruption after a crash under heavy delete load It can happen that ext3_free_branches calls ext3_forget() for an indirect block in an earlier transaction than a transaction in which we clear pointer to this indirect block. Thus if we crash before a transaction clearing the block pointer is committed, we will see indirect block pointing to already freed blocks and complain during orphan list cleanup. The fix is simple: Make sure ext3_forget() is called in the transaction doing block pointer clearing. This is a backport of an ext4 fix by Amir G. <amir73il@users.sourceforge.net> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:04:26 +02:00
Christoph Hellwig	4c4d390122	ext3: remove vestiges of nobh support The nobh option was only supported for writeback mode, but given that all write paths (except mmapped writed) actually create buffer heads, it effectively was a no-op already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:01:47 +02:00
Andi Kleen	0411ba7902	ext3: Fix set but unused variables [tytso@mit.edu: Fix compilation with CONFIG_JBD_DEBUG enabled] Acked-by: tytso@mit.edu cc: linux-ext4@vger.kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:01:47 +02:00
Christoph Hellwig	189eef59e7	quota: clean up quota active checks The various quota operations check for any quota beeing active on a superblock, and the inode not having the noquota flag. Merge these two checks into a dquot_active check and move that into dquot.c as that's the only place where it's needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:01:47 +02:00
Christoph Hellwig	ade7ce31c2	quota: Clean up the namespace in dqblk_xfs.h Almost all identifiers use the FS_* namespace, so rename the missing few XFS_* ones to FS_* as well. Without this some people might get upset about having too many XFS names in generic code. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:01:46 +02:00
Dmitry Monakhov	7af9cce8ae	quota: check quota reservation on remove_dquot_ref Reserved space must being claimed before remove_dquot_ref() for a given inode. Filesystem is responsible for performing force blocks allocation in case of dealloc in ->quota_off. Let's add sanity check for that case. Do it similar to add_dquot_ref(). Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-07-21 16:01:46 +02:00
Linus Torvalds	e0959371b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: do not include cap/dentry releases in replayed messages ceph: reuse request message when replaying against recovering mds ceph: fix creation of ipv6 sockets ceph: fix parsing of ipv6 addresses ceph: fix printing of ipv6 addrs ceph: add kfree() to error path ceph: fix leak of mon authorizer ceph: fix message revocation	2010-07-20 16:27:58 -07:00
Stephen Boyd	59b4856831	fs/Kconfig: Fix typo Userpace -> Userspace Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-07-20 17:30:22 +02:00
Joe Perches	33fa1d909c	fs/ocfs2: Remove unnecessary casts of private_data Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-07-20 17:20:08 +02:00
Linus Torvalds	620d0be881	Merge branch 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev * 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev: xfs: track AGs with reclaimable inodes in per-ag radix tree xfs: convert inode shrinker to per-filesystem contexts mm: add context argument to shrinker callback	2010-07-19 20:18:24 -07:00
Linus Torvalds	ee1039307a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE Btrfs: fix CLONE ioctl destination file size expansion to block boundary Btrfs: fix split_leaf double split corner case	2010-07-19 19:33:02 -07:00
Dave Chinner	16fd536737	xfs: track AGs with reclaimable inodes in per-ag radix tree https://bugzilla.kernel.org/show_bug.cgi?id=16348 When the filesystem grows to a large number of allocation groups, the summing of recalimable inodes gets expensive. In many cases, most AGs won't have any reclaimable inodes and so we are wasting CPU time aggregating over these AGs. This is particularly important for the inode shrinker that gets called frequently under memory pressure. To avoid the overhead, track AGs with reclaimable inodes in the per-ag radix tree so that we can find all the AGs with reclaimable inodes via a simple gang tag lookup. This involves setting the tag when the first reclaimable inode is tracked in the AG, and removing the tag when the last reclaimable inode is removed from the tree. Then the summation process becomes a loop walking the radix tree summing AGs with the reclaim tag set. This significantly reduces the overhead of scanning - a 6400 AG filesystea now only uses about 25% of a cpu in kswapd while slab reclaim progresses instead of being permanently stuck at 100% CPU and making little progress. Clean filesystems filesystems will see no overhead and the overhead only increases linearly with the number of dirty AGs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-20 09:43:39 +10:00
Dave Chinner	70e60ce715	xfs: convert inode shrinker to per-filesystem contexts Now the shrinker passes us a context, wire up a shrinker context per filesystem. This allows us to remove the global mount list and the locking problems that introduced. It also means that a shrinker call does not need to traverse clean filesystems before finding a filesystem with reclaimable inodes. This significantly reduces scanning overhead when lots of filesystems are present. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-20 08:07:02 +10:00
Dan Rosenberg	2ebc346478	Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> cc: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:58:20 -04:00
Sage Weil	b5384d48f4	Btrfs: fix CLONE ioctl destination file size expansion to block boundary The CLONE and CLONE_RANGE ioctls round up the range of extents being cloned to the block size when the range to clone extends to the end of file (this is always the case with CLONE). It was then using that offset when extending the destination file's i_size. Fix this by not setting i_size beyond the originally requested ending offset. This bug was introduced by `a22285a6` (2.6.35-rc1). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:15:06 -04:00
Chris Mason	99d8f83c98	Btrfs: fix split_leaf double split corner case split_leaf was not properly balancing leaves when it was forced to split a leaf twice. This commit adds an extra push left and right before forcing the double split in hopes of getting the slot where we want to insert at either the start or end of the leaf. If the extra pushes do work, then we are able to avoid splitting twice and we keep the tree properly balanced. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-07-19 16:14:50 -04:00
Pavel Machek	a2531293db	update email address pavel@suse.cz no longer works, replace it with working address. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-07-19 10:56:54 +02:00
Peter Oberparleiter	cffab6bc55	[S390] dasd: use correct label location for diag fba disks Partition boundary calculation fails for DASD FBA disks under the following conditions: - disk is formatted with CMS FORMAT with a blocksize of more than 512 bytes - all of the disk is reserved to a single CMS file using CMS RESERVE - the disk is accessed using the DIAG mode of the DASD driver Under these circumstances, the partition detection code tries to read the CMS label block containing partition-relevant information from logical block offset 1, while it is in fact located at physical block offset 1. Fix this problem by using the correct CMS label block location depending on the device type as determined by the DASD SENSE ID information. Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-07-19 09:22:50 +02:00
Dave Chinner	7f8275d0d6	mm: add context argument to shrinker callback The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-07-19 14:56:17 +10:00
Linus Torvalds	bea9a6d239	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Silence gcc warning in ocfs2_write_zero_page(). jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node ocfs2: Don't duplicate pages past i_size during CoW. ocfs2: tighten up strlen() checking ocfs2: Make xattr reflink work with new local alloc reservation. ocfs2: make xattr extension work with new local alloc reservation. ocfs2: Remove the redundant cpu_to_le64. ocfs2/dlm: don't access beyond bitmap size ocfs2: No need to zero pages past i_size. ocfs2: Zero the tail cluster when extending past i_size. ocfs2: When zero extending, do it by page. ocfs2: Limit default local alloc size within bitmap range. ocfs2: Move orphan scan work to ocfs2_wq. fs/ocfs2/dlm: Add missing spin_unlock	2010-07-18 10:09:25 -07:00
Joel Becker	5453258d53	ocfs2: Silence gcc warning in ocfs2_write_zero_page(). ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc doesn't know that. Set ret=0 just to make gcc happy. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-16 13:33:39 -07:00
Sage Weil	e979cf5039	ceph: do not include cap/dentry releases in replayed messages Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-16 10:30:18 -07:00
Sage Weil	01a92f174f	ceph: reuse request message when replaying against recovering mds Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.). Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-16 10:30:17 -07:00
Jan Kara	13ceef099e	jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-15 15:17:47 -07:00
Wengang Wang	a39953dd95	ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set before sending DLM_MIG_LOCKRES_MSG message to the target. We are using dlm_migration_can_proceed() for that purpose. However, if the node is down, dlm_migration_can_proceed() will also return "go ahead". In this rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove the BUG_ON() that trips over this condition. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-15 10:56:30 -07:00
Tao Ma	f5e27b6ddf	ocfs2: Don't duplicate pages past i_size during CoW. During CoW, the pages after i_size don't contain valid data, so there's no need to read and duplicate them. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-15 10:54:28 -07:00
Bob Peterson	728a756b8f	GFS2: rename causes kernel Oops This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:07:56 +01:00
Abhijith Das	8b4216018b	GFS2: BUG in gfs2_adjust_quota HighMem pages on i686 do not get mapped to the buffer_heads and this was causing a NULL pointer dereference when we were trying to memset page buffers to zero. We now use zero_user() that kmaps the page and directly manipulates page data. This patch also fixes a boundary condition that was incorrect. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:07:16 +01:00
Bob Peterson	b1becbdee7	GFS2: Fix kernel NULL pointer dereference by dlm_astd This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:06:25 +01:00
Bob Peterson	b7dc2df572	GFS2: recovery stuck on transaction lock This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: Steve Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-07-15 09:05:57 +01:00
Bob Peterson	a8bf2bc212	GFS2: O_TRUNC not working on stuffed files across cluster This patch replaces a statement that got dropped out by accident. Without the patch, truncates on stuffed (very small) files cause those files to have an unpredictable size. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-07-15 09:05:17 +01:00
Artem Bityutskiy	6fb4374f6b	UBIFS: fix GC LEB recovery UBIFS tries to alway have an LEB reserved for GC, and stores it in c->gc_lnum. Besides, there is GC head which points to the current GC head LEB. In case of an unclean power cut, what may happen is that the GC head was switched to the reserved GC LEB (c->gc_lnum), but a new reserved GC LEB was not created yet. So, after an unclean reboot we may have no reserved GC LEB, and we need to find a new LEB for this. To do this, we find a dirty LEB which can fit the current GC head, move the data, unmap this dirty LEB, and it becomes our reserved GC LEB. However, if we cannot find a dirty enough LEB, we return failure, which is wrong, because we still can have free LEBs to use for the reserved GC LEB. This patch fixes the issue. This patch also fixes few typos in comments, which were spotted by aspell. Note, this patch fixes a real issue [ 14.328117] UBIFS: recovery needed [ 53.941378] UBIFS error (pid 462): ubifs_rcvry_gc_commit: could not find a dirty LEB [ 89.606399] UBIFS: recovery completed [ 89.609329] UBIFS assert failed in mount_ubifs at 1358 (pid 462) [ 89.616165] [<c0026144>] (unwind_backtrace+0x0/0xe4) from [<c0125ce4>] (ubifs_fill_super+0x11d0/0x1c4c) [ 89.625930] [<c0125ce4>] (ubifs_fill_super+0x11d0/0x1c4c) from [<c0126910>] (ubifs_get_sb+0x1b0/0x354) [ 89.635696] [<c0126910>] (ubifs_get_sb+0x1b0/0x354) from [<c008a50c>] (vfs_kern_mount+0x50/0xe0) [ 89.644485] [<c008a50c>] (vfs_kern_mount+0x50/0xe0) from [<c008a5e0>] (do_kern_mount+0x34/0xdc) [ 89.653274] [<c008a5e0>] (do_kern_mount+0x34/0xdc) from [<c00a29d8>] (do_mount+0x148/0x7cc) [ 89.662063] [<c00a29d8>] (do_mount+0x148/0x7cc) from [<c00a30f4>] (sys_mount+0x98/0xc8) [ 89.670852] [<c00a30f4>] (sys_mount+0x98/0xc8) from [<c0021f40>] (ret_fast_syscall+0x0/0x28) which was reported here: http://article.gmane.org/gmane.linux.drivers.mtd/29923 by Alexander Pazdnikov <pazdnikov@list.ru> Reported-by: Alexander Pazdnikov <pazdnikov@list.ru> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <adrian.hunter@nokia.com>	2010-07-13 06:51:57 +03:00
Dan Carpenter	e372357ba5	ocfs2: tighten up strlen() checking This function is only called from one place and it's like this: dlm_register_domain(conn->cc_name, dlm_key, &fs_version); The "conn->cc_name" is 64 characters long. If strlen(conn->cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb->uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:57:53 -07:00
Tao Ma	121a39bb00	ocfs2: Make xattr reflink work with new local alloc reservation. The new reservation code in local alloc has add the limitation that the caller should handle the case that the local alloc doesn't give use enough contiguous clusters. It make the old xattr reflink code broken. So this patch udpate the xattr reflink code so that it can handle the case that local alloc give us one cluster at a time. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:57:50 -07:00
Tao Ma	a78f9f4668	ocfs2: make xattr extension work with new local alloc reservation. The old ocfs2_xattr_extent_allocation is too optimistic about the clusters we can get. So actually if the file system is too fragmented, ocfs2_add_clusters_in_btree will return us with EGAIN and we need to allocate clusters once again. So this patch change it to a while loop so that we can allocate clusters until we reach clusters_to_add. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-12 13:57:24 -07:00
Tao Ma	0a463b74e7	ocfs2: Remove the redundant cpu_to_le64. In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno. But actually bg->bg_blkno is already changed to little endian in ocfs2_block_group_fill. So remove the extra cpu_to_le64. Reported-by: Marcos Matsunaga <Marcos.Matsunaga@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:56:18 -07:00
Wengang Wang	f471c9df92	ocfs2/dlm: don't access beyond bitmap size dlm->recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-07-12 13:56:14 -07:00
Joel Becker	693c241a5f	ocfs2: No need to zero pages past i_size. When ocfs2 fills a hole, it does so by allocating clusters. When a cluster is larger than the write, ocfs2 must zero the portions of the cluster outside of the write. If the clustersize is smaller than a pagecache page, this is handled by the normal pagecache mechanisms, but when the clustersize is larger than a page, ocfs2's write code will zero the pages adjacent to the write. This makes sure the entire cluster is zeroed correctly. Currently ocfs2 behaves exactly the same when writing past i_size. However, this means ocfs2 is writing zeroed pages for portions of a new cluster that are beyond i_size. The page writeback code isn't expecting this. It treats all pages past the one containing i_size as left behind due to a previous truncate operation. Thankfully, ocfs2 calculates the number of pages it will be working on up front. The rest of the write code merely honors the original calculation. We can simply trim the number of pages to only cover the actual file data. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-12 13:55:27 -07:00
Miklos Szeredi	2d45ba381a	fuse: add retrieve request Userspace filesystem can request data to be retrieved from the inode's mapping. This request is synchronous and the retrieved data is queued as a new request. If the write to the fuse device returns an error then the retrieve request was not completed and a reply will not be sent. Only present pages are returned in the retrieve reply. Retrieving stops when it finds a non-present page and only data prior to that is returned. This request doesn't change the dirty state of pages. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-07-12 14:41:40 +02:00
Miklos Szeredi	a1d75f2582	fuse: add store request Userspace filesystem can request data to be stored in the inode's mapping. This request is synchronous and has no reply. If the write to the fuse device returns an error then the store request was not fully completed (but may have updated some pages). If the stored data overflows the current file size, then the size is extended, similarly to a write(2) on the filesystem. Pages which have been completely stored are marked uptodate. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-07-12 14:41:40 +02:00
Miklos Szeredi	7909b1c640	fuse: don't use atomic kmap Don't use atomic kmap for mapping userspace buffers in device read/write/splice. This is necessary because the next patch (adding store notify) requires that caller of fuse_copy_page() may sleep between invocations. The simplest way to ensure this is to change the atomic kmaps to non-atomic ones. Thankfully architectures where kmap() is not a no-op are going out of fashion, so we can ignore the (probably negligible) performance impact of this change. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-07-12 14:41:40 +02:00
Arnd Bergmann	5f202bd5ca	do_coredump: Do not take BKL core_pattern is not actually protected and hasn't been ever since we introduced procfs support for sysctl -- a _long_ time. Don't take it here either. Also nothing inside do_coredump appears to require bkl protection. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ remove smp_lock.h headers ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-07-10 01:21:49 +02:00
Sage Weil	f91d3471cc	ceph: fix creation of ipv6 sockets Use the address family from the peer address instead of assuming IPv4. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-09 15:00:20 -07:00
Sage Weil	39139f64e1	ceph: fix parsing of ipv6 addresses Check for brackets around the ipv6 address to avoid ambiguity with the port number. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-09 15:00:18 -07:00
Sage Weil	d06dbaf6c2	ceph: fix printing of ipv6 addrs The buffer was too small. Make it bigger, use snprintf(), put brackets around the ipv6 address to avoid mixing it up with the :port, and use the ever-so-handy %pI[46] formats. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-08 16:49:53 -07:00
Joel Becker	5693486bad	ocfs2: Zero the tail cluster when extending past i_size. ocfs2's allocation unit is the cluster. This can be larger than a block or even a memory page. This means that a file may have many blocks in its last extent that are beyond the block containing i_size. There also may be more unwritten extents after that. When ocfs2 grows a file, it zeros the entire cluster in order to ensure future i_size growth will see cleared blocks. Unfortunately, block_write_full_page() drops the pages past i_size. This means that ocfs2 is actually leaking garbage data into the tail end of that last cluster. This is a bug. We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect when a write or truncate is past i_size. They will use ocfs2_zero_extend() to ensure the data is properly zeroed. Older versions of ocfs2_zero_extend() simply zeroed every block between i_size and the zeroing position. This presumes three things: 1) There is allocation for all of these blocks. 2) The extents are not unwritten. 3) The extents are not refcounted. (1) and (2) hold true for non-sparse filesystems, which used to be the only users of ocfs2_zero_extend(). (3) is another bug. Since we're now using ocfs2_zero_extend() for sparse filesystems as well, we teach ocfs2_zero_extend() to check every extent between i_size and the zeroing position. If the extent is unwritten, it is ignored. If it is refcounted, it is CoWed. Then it is zeroed. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-08 13:25:35 -07:00
Joel Becker	a4bfb4cf11	ocfs2: When zero extending, do it by page. ocfs2_zero_extend() does its zeroing block by block, but it calls a function named ocfs2_write_zero_page(). Let's have ocfs2_write_zero_page() handle the page level. From ocfs2_zero_extend()'s perspective, it is now page-at-a-time. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-07-08 13:24:49 -07:00
Linus Torvalds	c77e9e6826	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: writeback: simplify the write back thread queue writeback: split writeback_inodes_wb writeback: remove writeback_inodes_wbc fs-writeback: fix kernel-doc warnings splice: check f_mode for seekable file splice: direct_splice_actor() should not use pos in sd	2010-07-08 08:06:40 -07:00
Dan Carpenter	b0bbb0be8f	ceph: add kfree() to error path We leak a "pi" on this error path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-08 08:03:24 -07:00
Chuck Lever	43a9aa64a2	NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR Some well-known NFSv3 clients drop their directory entry caches when they receive replies with no WCC data. Without this data, they employ extra READ, LOOKUP, and GETATTR requests to ensure their directory entry caches are up to date, causing performance to suffer needlessly. In order to return WCC data, our server has to have both the pre-op and the post-op attribute data on hand when a reply is XDR encoded. The pre-op data is filled in when the incoming fh is locked, and the post-op data is filled in when the fh is unlocked. Unfortunately, for REMOVE, RMDIR, MKNOD, and MKDIR, the directory fh is not unlocked until well after the reply has been XDR encoded. This means that encode_wcc_data() does not have wcc_data for the parent directory, so none is returned to the client after these operations complete. By unlocking the parent directory fh immediately after the internal operations for each NFS procedure is complete, the post-op data is filled in before XDR encoding starts, so it can be returned to the client properly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-07-07 17:12:32 -04:00
Linus Torvalds	1cc9629402	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix crush device 'out' threshold to 1.0, not 0.1 ceph: fix caps usage accounting for import (non-reserved) case ceph: only release clean, unused caps with mds requests ceph: fix crush CHOOSE_LEAF when type is already a leaf ceph: fix crush recursion ceph: fix caps debugfs entry ceph: delay umount until all mds requests drop inode+dentry refs ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate ceph: fix crush map update decoding ceph: fix message memory leak, uninitialized variable ceph: fix map handler error path ceph: some endianity fixes	2010-07-06 17:15:15 -07:00
J. Bruce Fields	6a85d6c769	nfsd4: comment nitpick Reported-by: "Madan, Anshul" <Anshul.Madan@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-07-06 12:40:22 -04:00
Christoph Hellwig	83ba7b071f	writeback: simplify the write back thread queue First remove items from work_list as soon as we start working on them. This means we don't have to track any pending or visited state and can get rid of all the RCU magic freeing the work items - we can simply free them once the operation has finished. Second use a real completion for tracking synchronous requests - if the caller sets the completion pointer we complete it, otherwise use it as a boolean indicator that we can free the work item directly. Third unify struct wb_writeback_args and struct bdi_work into a single data structure, wb_writeback_work. Previous we set all parameters into a struct wb_writeback_args, copied it into struct bdi_work, copied it again on the stack to use it there. Instead of just allocate one structure dynamically or on the stack and use it all the way through the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-06 08:59:53 +02:00
Christoph Hellwig	edadfb10ba	writeback: split writeback_inodes_wb The case where we have a superblock doesn't require a loop here as we scan over all inodes in writeback_sb_inodes. Split it out into a separate helper to make the code simpler. This also allows to get rid of the sb member in struct writeback_control, which was rather out of place there. Also update the comments in writeback_sb_inodes that explain the handling of inodes from wrong superblocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-06 08:54:08 +02:00
Christoph Hellwig	9c3a8ee8a1	writeback: remove writeback_inodes_wbc This was just an odd wrapper around writeback_inodes_wb. Removing this also allows to get rid of the bdi member of struct writeback_control which was rather out of place there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-06 08:54:03 +02:00
Sage Weil	22b1de06c9	ceph: fix leak of mon authorizer Fix leak of a struct ceph_buffer on umount. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-05 15:36:49 -07:00
Sage Weil	ed98adad3d	ceph: fix message revocation A message can be on a queue (pending or sent), or out_msg (sending), or both. We were assuming that if it's not on a queue it couldn't be out_msg, but that was false in the case of lossy connections like the OSD. Fix ceph_con_revoke() to treat these cases independently. Also, fix the out_kvec_is_message check to only trigger if we are currently sending _this_ message. This fixes a GPF in tcp_sendpage, triggered by OSD restarts. Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-05 12:16:23 -07:00
Sage Weil	153a10939e	ceph: fix crush device 'out' threshold to 1.0, not 0.1 Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively weighted as 1.0 (fully in). Signed-off-by: Sage Weil <sage@newdream.net>	2010-07-05 09:44:17 -07:00
Ingo Molnar	08f8ba0799	Merge commit 'v2.6.35-rc4' into perf/core Merge reason: Pick up the latest perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-05 08:30:58 +02:00
Linus Torvalds	e3668dd83b	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: remove block number from inode lookup code xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED xfs: validate untrusted inode numbers during lookup xfs: always use iget in bulkstat xfs: prevent swapext from operating on write-only files	2010-07-04 20:13:31 -07:00
Ingo Molnar	0a54cec0c2	Merge branch 'linus' into core/rcu Conflicts: fs/fs-writeback.c Merge reason: Resolve the conflict Note, i picked the version from Linus's tree, which effectively reverts the fs-writeback.c bits of: `b97181f`: fs: remove all rcu head initializations, except on_stack initializations As the upstream changes to this file changed this code heavily and the first attempt to resolve the conflict resulted in a non-booting kernel. It's safer to re-try this portion of the commit cleanly. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-01 09:31:25 +02:00
Randy Dunlap	06d738fa91	fs-writeback: fix kernel-doc warnings Fix kernel-doc to match the function's changed args. Warning(fs/fs-writeback.c:190): No description found for parameter 'args' Warning(fs/fs-writeback.c:190): Excess function parameter 'sb' description in 'bdi_queue_work_onstack' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-07-01 08:26:34 +02:00
Changli Gao	19c9a49b43	splice: check f_mode for seekable file check f_mode for seekable file As a seekable file is allowed without a llseek function, so the old way isn't work any more. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> ---- fs/splice.c \| 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-30 08:12:37 +02:00
Changli Gao	2cb4b05e76	splice: direct_splice_actor() should not use pos in sd direct_splice_actor() shouldn't use sd->pos, as sd->pos is for file reading, file->f_pos should be used instead. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> ---- fs/splice.c \| 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-30 08:12:37 +02:00
Andrew Morton	f4985dc714	fs/fcntl.c:kill_fasync_rcu() fa_lock must be IRQ-safe Fix a lockdep-splat-causing regression introduced by commit `989a297920` ("fasync: RCU and fine grained locking"). kill_fasync() can be called from both process and hard-irq context, so fa_lock must be taken with IRQs disabled. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16230 Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:32 -07:00
Lubomir Rintel	46c23d7f52	sysvfs: fix NULL deref. when allocating new inode A call to sysv_write_inode() in sysv_new_inode() to its new interface that replaced wait flag with writeback structure. This was broken by `a9185b41a4` ("pass writeback_control to ->write_inode"). Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.34.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:32 -07:00
Mike Frysinger	2952095c6b	flat: tweak default stack alignment The recent commit `1f0ce8b3dd` ("mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slab_def.h>") which moved the ARCH_SLAB_MINALIGN default into the global header inadvertently broke FLAT for a bunch of systems. Blackfin systems now fail on any FLAT exec with: Unable to read code+data+bss, errno 14 When your /init is a FLAT binary, obviously this can be annoying ;). This stems from the alignment usage in the FLAT loader. The behavior before was that FLAT would default to ARCH_SLAB_MINALIGN only if it was defined, and this was only defined by arches when they wanted a larger alignment value. Otherwise it'd default to pointer alignment. Arguably, this is kind of hokey that the FLAT is semi-abusing defines it shouldn't. So let's merge the two alignment requirements so the floor is never 0. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: David McCullough <davidm@snapgear.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Howells <dhowells@redhat.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:31 -07:00
Mike Frysinger	3c26c9d959	nommu: add '[stack]' label to /proc/pid/maps output Add support to the NOMMU /proc/pid/maps file to show which mapping is the stack of the original thread after execve. This is largely based on the MMU code. Subsidiary thread stacks are not indicated. For FDPIC, we now get: root:/> cat /proc/self/maps 02064000-02067ccc rw-p 0004d000 00:01 22 /bin/busybox 0206e000-0206f35c rw-p 00006000 00:01 295 /lib/ld-uClibc.so.0 025f0000-025f6f0c r-xs 00000000 00:01 295 /lib/ld-uClibc.so.0 02680000-026ba6b0 r-xs 00000000 00:01 297 /lib/libc.so.0 02700000-0274d384 r-xs 00000000 00:01 22 /bin/busybox 02816000-02817000 rw-p 00000000 00:00 0 02848000-0284c0d8 rw-p 00000000 00:00 0 02860000-02880000 rw-p 00000000 00:00 0 [stack] The semi-downside here is that for FLAT, we get: root:/> cat /proc/155/maps 029f0000-029f9000 rwxp 00000000 00:00 0 [stack] The reason being that FLAT combines a whole lot of stuff into one map (including the stack). But this isn't any worse than the current output (which is nothing), so screw it. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 15:29:30 -07:00
Theodore Ts'o	90c7201b97	ext4: Pass line number to ext4_journal_abort_handle() This allows the error messages to include the line number Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 14:53:24 -04:00
Linus Torvalds	984bc9601f	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Don't count_vm_events for discard bio in submit_bio. cfq: fix recursive call in cfq_blkiocg_update_completion_stats() cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n cfq: Don't allow queue merges for queues that have no process references block: fix DISCARD_BARRIER requests cciss: set SCSI max cmd len to 16, as default is wrong cpqarray: fix two more wrong section type cpqarray: fix wrong __init type on pci probe function drbd: Fixed a race between disk-attach and unexpected state changes writeback: fix pin_sb_for_writeback writeback: add missing requeue_io in writeback_inodes_wb writeback: simplify and split bdi_start_writeback writeback: simplify wakeup_flusher_threads writeback: fix writeback_inodes_wb from writeback_inodes_sb writeback: enforce s_umount locking in writeback_inodes_sb writeback: queue work on stack in writeback_inodes_sb writeback: fix writeback completion notifications	2010-06-29 10:42:52 -07:00
npiggin@suse.de	57439f878a	fs: fix superblock iteration race list_for_each_entry_safe is not suitable to protect against concurrent modification of the list. `6754af6` introduced a race in sb walking. list_for_each_entry can use the trick of pinning the current entry in the list before we drop and retake the lock because it subsequently follows cur->next. However list_for_each_entry_safe saves n=cur->next for following before entering the loop body, so when the lock is dropped, n may be deleted. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-29 10:38:22 -07:00
Theodore Ts'o	e29136f80e	ext4: Enhance ext4_grp_locked_error() to take block and function numbers Also use a macro definition so that __func__ and __LINE__ is implicit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 12:54:28 -04:00
Sage Weil	443b3760a0	ceph: fix caps usage accounting for import (non-reserved) case We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-29 09:31:56 -07:00
Sage Weil	ec97f88ba6	ceph: only release clean, unused caps with mds requests We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty, and the MDS needs them back, we it will revoke and we will flush in the normal fashion. This fixes a possibly loss of metadata. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-29 09:31:55 -07:00
Theodore Ts'o	c67d859e39	ext4: clean up ext4_abort() so __func__ is now implicit Use a macro definition for ext4_abort() to clean up the .c files a wee bit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 11:07:07 -04:00
Theodore Ts'o	4a9cdec73f	ext4: Add new superblock fields reserved for the Next3 snapshot feature Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-29 11:00:23 -04:00
Thomas Gleixner	f384c954c9	Merge branch 'linus' into perf/core Reason: Further changes conflict with upstream fixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-06-28 22:33:24 +02:00
Matthew Whitworth	e956b4b7e3	Fix comment spelling errors in ncpfs/inode.c Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-06-28 11:56:32 +02:00
Tejun Heo	327f935a9e	ocfs2: update gfp/slab.h includes Implicit slab.h inclusion via percpu.h is about to go away. Make sure gfp.h or slab.h is included as necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2010-06-28 10:19:19 +10:00
Linus Torvalds	e7865c234f	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: Fix an embarassing typo in encode_attrs() NFSv4: Ensure that /proc/self/mountinfo displays the minor version number NFSv4.1: Ensure that we initialise the session when following a referral SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir() nfs4 use mandatory attribute file type in nfs4_get_root	2010-06-27 09:04:02 -07:00
Linus Torvalds	55982d9400	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: update ctime when changing the file's permission by setfacl ext2: update ctime when changing the file's permission by setfacl	2010-06-27 07:50:47 -07:00
Linus Torvalds	be1d29f59c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: MAINTAINERS: change mailing list address for CIFS cifs: remove bogus first_time check in NTLMv2 session setup code cifs: don't call cifs_new_fileinfo unless cifs_open succeeds cifs: don't ignore cifs_posix_open_inode_helper return value cifs: clean up arguments to cifs_open_inode_helper cifs: pass instantiated filp back after open call cifs: move cifs_new_fileinfo call out of cifs_posix_open cifs: implement drop_inode superblock op cifs: don't attempt busy-file rename unless it's in same directory	2010-06-27 07:34:02 -07:00
Miao Xie	30e2bab2d6	ext3: update ctime when changing the file's permission by setfacl ext3 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, ext3 must update it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz>	2010-06-25 01:20:37 +02:00
Jan Kara	523825bc58	ext2: update ctime when changing the file's permission by setfacl ext2 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, ext2 must update it. Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>. Signed-off-by: Jan Kara <jack@suse.cz>	2010-06-25 01:20:37 +02:00
Sage Weil	a1a31e7342	ceph: fix crush CHOOSE_LEAF when type is already a leaf We may not recurse for CHOOSE_LEAF if we start with a leaf node. When that happens, the out2 vector needs to be filled in with the result. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-24 12:58:14 -07:00
Sage Weil	55bda7aacd	ceph: fix crush recursion There was a longstanding problem with recursion through intervening bucket types on complex hierarchies. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-24 12:55:48 -07:00
Trond Myklebust	1f0e890dba	NFSv4: Clean up struct nfs4_state_owner The 'so_delegations' list appears to be unused. Also eliminate so_client. If we already have so_server, we can get to the nfs_client structure. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-24 15:11:43 -04:00
Yehuda Sadeh	bfaf148eb2	ceph: fix caps debugfs entry The ceph client structure was not set correctly. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-24 09:47:36 -07:00
J. Bruce Fields	cba9ba4b90	nfsd4: fix delegation recall race use-after-free When the rarely-used callback-connection-changing setclientid occurs simultaneously with a delegation recall, we rerun the recall by requeueing it on a workqueue. But we also need to take a reference on the delegation in that case, since the delegation held by the rpc itself will be released by the rpc_release callback. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-24 12:24:55 -04:00
J. Bruce Fields	ac94bf5825	nfsd4: fix deleg leak on callback error Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-24 12:24:53 -04:00
Dave Chinner	7b6259e7a8	xfs: remove block number from inode lookup code The block number comes from bulkstat based inode lookups to shortcut the mapping calculations. We ar enot able to trust anything from bulkstat, so drop the block number as well so that the correct lookups and mappings are always done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:35:17 +10:00
Dave Chinner	1920779e67	xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED Inode numbers may come from somewhere external to the filesystem (e.g. file handles, bulkstat information) and so are inherently untrusted. Rename the flag we use for these lookups to make it obvious we are doing a lookup of an untrusted inode number and need to verify it completely before trying to read it from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:47 +10:00
Dave Chinner	7124fe0a5b	xfs: validate untrusted inode numbers during lookup When we decode a handle or do a bulkstat lookup, we are using an inode number we cannot trust to be valid. If we are deleting inode chunks from disk (default noikeep mode), then we cannot trust the on disk inode buffer for any given inode number to correctly reflect whether the inode has been unlinked as the di_mode nor the generation number may have been updated on disk. This is due to the fact that when we delete an inode chunk, we do not write the clusters back to disk when they are removed - instead we mark them stale to avoid them being written back potentially over the top of something that has been subsequently allocated at that location. The result is that we can have locations of disk that look like they contain valid inodes but in reality do not. Hence we cannot simply convert the inode number to a block number and read the location from disk to determine if the inode is valid or not. As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the inode up in the inode allocation btree to determine if the inode number is valid or not. It should be noted even on ikeep filesystems, there is the possibility that blocks on disk may look like valid inode clusters. e.g. if there are filesystem images hosted on the filesystem. Hence even for ikeep filesystems we really need to validate that the inode number is valid before issuing the inode buffer read. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:33 +10:00
Christoph Hellwig	7dce11dbac	xfs: always use iget in bulkstat The non-coherent bulkstat versionsthat look directly at the inode buffers causes various problems with performance optimizations that make increased use of just logging inodes. This patch makes bulkstat always use iget, which should be fast enough for normal use with the radix-tree based inode cache introduced a while ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-06-23 18:11:11 +10:00
Dan Rosenberg	1817176a86	xfs: prevent swapext from operating on write-only files This patch prevents user "foo" from using the SWAPEXT ioctl to swap a write-only file owned by user "bar" into a file owned by "foo" and subsequently reading it. It does so by checking that the file descriptors passed to the ioctl are also opened for reading. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 12:07:47 +10:00
J. Bruce Fields	ec8acac84a	nfsd4: remove some debugging code This is overkill. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-22 22:29:03 -04:00
Benny Halevy	9303bbd3de	nfsd: nfs4callback encode_stateid helper function To be used also for the pnfs cb_layoutrecall callback Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd4: fix cb_recall encoding] "nfsd: nfs4callback encode_stateid helper function" forgot to reserve more space after return from the new helper. Reported-by: Michael Groshans <groshans@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-22 17:19:51 -04:00
J. Bruce Fields	4731030d58	nfsd4: translate memory errors to delay, not serverfault If the server is out of memory is better for clients to back off and retry than to just error out. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-22 17:19:36 -04:00
J. Bruce Fields	76407f76e0	nfsd4; fix session reference count leak Note the session has to be put() here regardless of what happens to the client. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-22 17:19:28 -04:00
Trond Myklebust	1055d76d91	NFSv4.1: There is no need to init the session more than once... Set up a flag to ensure that is indeed the case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:03 -04:00
Trond Myklebust	fe74ba3a8d	NFSv41: Cleanup for nfs4_alloc_session. There is no reason to change the nfs_client state every time we allocate a new session. Move that line into nfs4_init_client_minor_version. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:03 -04:00
Trond Myklebust	d77d76ffb6	NFSv41: Clean up exclusive create Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:03 -04:00
Trond Myklebust	a443234535	NFSv41: Deprecate nfs_client->cl_minorversion Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	e047a10c12	NFSv41: Fix nfs_async_inode_return_delegation() ugliness Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	c48f4f3541	NFSv41: Convert the various reboot recovery ops etc to minor version ops Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	97dc135947	NFSv41: Clean up the NFSv4.1 minor version specific operations Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	a2118c33aa	NFSv41: Don't store session state in the nfs_client->cl_state Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	df8964554a	NFSv41: Further cleanup for nfs4_sequence_done Instead of testing if the nfs_client has a session, we should be testing if the struct nfs4_sequence_res was set up with one. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	035168ab39	NFSv4.1: Make nfs4_setup_sequence take a nfs_server argument In anticipation of the day when we have per-filesystem sessions, and also in order to allow the session to change in the event of a filesystem migration event. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:02 -04:00
Trond Myklebust	71ac6da994	NFSv4.1: Merge the nfs41_proc_async_sequence() and nfs4_proc_sequence() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:01 -04:00
Trond Myklebust	aa5190d0ed	NFSv4: Kill nfs4_async_handle_error() abuses by NFSv4.1 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:01 -04:00
Trond Myklebust	d185a334c7	NFSv4.1: Simplify nfs41_sequence_done() Nobody uses the rpc_status parameter. It is not obvious why we need the struct nfs_client argument either, when we already have that information in the session. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:01 -04:00
Trond Myklebust	2a6e26cdb8	NFSv4.1: Clean up nfs4_setup_sequence Firstly, there is little point in first zeroing out the entire struct nfs4_sequence_res, and then initialising all fields save one. Just initialise the last field to zero... Secondly, nfs41_setup_sequence() has only 2 possible return values: 0, or -EAGAIN, so there is no 'terminate rpc task' case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:01 -04:00
Trond Myklebust	d5f8d3fe72	NFSv41: Fix a memory leak in nfs41_proc_async_sequence() If the call to rpc_call_async() fails, then the arguments will not be freed, since there will be no call to nfs41_sequence_call_done Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:24:01 -04:00
Trond Myklebust	d3f6baaa34	NFSv4: Fix an embarassing typo in encode_attrs() Apparently, we have never been able to set the atime correctly from the NFSv4 client. Reported-by: 小倉一夫 <ka-ogura@bd6.so-net.ne.jp> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:22:54 -04:00
Trond Myklebust	0be8189f2c	NFSv4: Ensure that /proc/self/mountinfo displays the minor version number Currently, we do not display the minor version mount parameter in the /proc mount info. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:22:53 -04:00
Trond Myklebust	44950b67a6	NFSv4.1: Ensure that we initialise the session when following a referral Put the code that is common to both the referral and ordinary mount cases into a common helper routine. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:22:53 -04:00
Andy Adamson	f799bdb355	nfs4 use mandatory attribute file type in nfs4_get_root S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits, so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:17:43 -04:00
Sage Weil	17c688c3df	ceph: delay umount until all mds requests drop inode+dentry refs This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request structing and its dentry+inode refs, and pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-21 16:11:50 -07:00
Sage Weil	d69ed05a80	ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-21 16:04:10 -07:00
Ingo Molnar	646b1db495	Merge commit 'v2.6.35-rc3' into perf/core Merge reason: Go from -rc1 base to -rc3 base, merge in fixes.	2010-06-18 10:53:19 +02:00
Sage Weil	cebc5be6b6	ceph: fix crush map update decoding If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-17 10:22:48 -07:00
Jeff Layton	8a224d4894	cifs: remove bogus first_time check in NTLMv2 session setup code This bug appears to be the result of a cut-and-paste mistake from the NTLMv1 code. The function to generate the MAC key was commented out, but not the conditional above it. The conditional then ended up causing the session setup key not to be copied to the buffer unless this was the first session on the socket, and that made all but the first NTLMv2 session setup fail. Fix this by removing the conditional and all of the commented clutter that made it difficult to see. Cc: Stable <stable@kernel.org> Reported-by: Gunther Deschner <gdeschne@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2010-06-16 13:40:18 -04:00
Jeff Layton	47c78b7f40	cifs: don't call cifs_new_fileinfo unless cifs_open succeeds It's currently possible for cifs_open to fail after it has already called cifs_new_fileinfo. In that situation, the new fileinfo will be leaked as the caller doesn't call fput. That in turn leads to a busy inodes after umount problem since the fileinfo holds an extra inode reference now. Shuffle cifs_open around a bit so that it only calls cifs_new_fileinfo if it's going to succeed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:17 -04:00
Suresh Jayaraman	d9d5d8df95	cifs: don't ignore cifs_posix_open_inode_helper return value ...and ensure that we propagate the error back to avoid any surprises. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>	2010-06-16 13:40:17 -04:00
Jeff Layton	db460242bf	cifs: clean up arguments to cifs_open_inode_helper ...which takes a ton of unneeded arguments and does a lot more pointer dereferencing than is really needed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:17 -04:00
Jeff Layton	6ca9f3bae8	cifs: pass instantiated filp back after open call The current scheme of sticking open files on a list and assuming that cifs_open will scoop them off of it is broken and leads to "Busy inodes after umount..." errors at unmount time. The problem is that there is no guarantee that cifs_open will always be called after a ->lookup or ->create operation. If there are permissions or other problems, then it's quite likely that it won't be called. Fix this by fully instantiating the filp whenever the file is created and pass that filp back to the VFS. If there is a problem, the VFS can clean up the references. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:16 -04:00
Jeff Layton	2422f676fb	cifs: move cifs_new_fileinfo call out of cifs_posix_open Having cifs_posix_open call cifs_new_fileinfo is problematic and inconsistent with how "regular" opens work. It's also buggy as cifs_reopen_file calls this function on a reconnect, which creates a new struct cifsFileInfo that just gets leaked. Push it out into the callers. This also allows us to get rid of the "mnt" arg to cifs_posix_open. Finally, in the event that a cifsFileInfo isn't or can't be created, we always want to close the filehandle out on the server as the client won't have a record of the filehandle and can't actually use it. Make sure that CIFSSMBClose is called in those cases. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:16 -04:00
Jiri Kosina	f1bbbb6912	Merge branch 'master' into for-next	2010-06-16 18:08:13 +02:00
Uwe Kleine-König	421f91d21a	fix typos concerning "initiali[zs]e" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-06-16 18:05:05 +02:00
Steve French	0933a95dfd	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2010-06-16 13:19:36 +00:00
Tao Ma	1739da4054	ocfs2: Limit default local alloc size within bitmap range. In commit `6b82021b9e`, we increase our local alloc size and calculate how much megabytes we can get according to group size and volume size. But we also need to check the maximum bits a local alloc block bitmap can have. With a bs=512, cs=32K, local volume with 160G, it calculate 96MB while the maximum local alloc size is only 76M. So the bitmap will overflow and corrupt the system truncate log file. See bug http://oss.oracle.com/bugzilla/show_bug.cgi?id=1262 Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-06-15 16:50:43 -07:00
Tao Ma	40f165f416	ocfs2: Move orphan scan work to ocfs2_wq. We used to let orphan scan work in the default work queue, but there is a corner case which will make the system deadlock. The scenario is like this: 1. set heartbeat threadshold to 200. this will allow us to have a great chance to have a orphan scan work before our quorum decision. 2. mount node 1. 3. after 1~2 minutes, mount node 2(in order to make the bug easier to reproduce, better add maxcpus=1 to kernel command line). 4. node 1 do orphan scan work. 5. node 2 do orphan scan work. 6. node 1 do orphan scan work. After this, node 1 hold the orphan scan lock while node 2 know node 1 is the master. 7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection). Now when node 2 begins orphan scan, the system queue is blocked. The root cause is that both orphan scan work and quorum decision work will use the system event work queue. orphan scan has a chance of blocking the event work queue(in dlm_wait_for_node_death) so that there is no chance for quorum decision work to proceed. This patch resolve it by moving orphan scan work to ocfs2_wq. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-06-15 15:43:48 -07:00
Julia Lawall	6469272c35	fs/ocfs2/dlm: Add missing spin_unlock Add a spin_unlock missing on the error path. Unlock as in the other code that leads to the leave label. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-06-15 15:43:46 -07:00
Jan Kara	c6ac12a615	ext4: update ctime when changing the file's permission by setfacl ext4 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, ext4 must update it. Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-15 12:19:59 -04:00
Paul E. McKenney	b97181f242	fs: remove all rcu head initializations, except on_stack initializations Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andries Brouwer <aeb@cwi.nl>	2010-06-14 16:37:26 -07:00
Christoph Hellwig	206f7ab4f4	ext4: remove vestiges of nobh support The nobh option was only supported for writeback mode, but given that all write paths actually create buffer heads it effectively was a no-op already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-14 14:42:49 -04:00
Andi Kleen	5a0790c2c4	ext4: remove initialized but not read variables No real bugs found, just removed some dead code. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-14 13:28:03 -04:00
Theodore Ts'o	07a038245b	ext4: Convert more i_flags references to use accessor functions These changes are not ones which are likely to result in races, but they should be fixed. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-14 09:54:48 -04:00
Jens Axboe	575f552012	Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-linus	2010-06-14 12:54:57 +02:00
Michael Ellerman	9f069af5b6	of: Drop properties with "/" in their name Some bogus firmwares include properties with "/" in their name. This causes problems when creating the /proc/device-tree file system, because the slash is taken to indicate a directory. We don't care about those properties, and we don't want to encourage them, so just throw them away when creating /proc/device-tree. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Tested-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2010-06-13 18:12:24 -06:00
Sage Weil	ae32be3134	ceph: fix message memory leak, uninitialized variable We need to properly initialize skip, as not all alloc_msg op instances set it. Also, BUG if someone says skip but also allocates a message. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-13 10:34:36 -07:00
Sage Weil	4a32f93d29	ceph: fix map handler error path Don't leak message if we receive an unexpected message type. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-13 10:34:36 -07:00
Yehuda Sadeh	0cf5537b15	ceph: some endianity fixes Fix some problems that came up with sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-13 10:34:36 -07:00
Julia Lawall	6da5156fad	UBIFS: use ERR_CAST Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2010-06-12 14:46:15 +03:00
Artem Bityutskiy	276de5d2a1	UBIFS: check return code The error code from 'ubifs_rcvry_gc_commit()' was ignored, so UBIFS failed to recover and continued. Instead, we should refuse mounting the file-system. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2010-06-12 14:45:25 +03:00
Theodore Ts'o	a0375156ca	ext4: Clean up s_dirt handling We don't need to set s_dirt in most of the ext4 code when journaling is enabled. In ext3/4 some of the summary statistics for # of free inodes, blocks, and directories are calculated from the per-block group statistics when the file system is mounted or unmounted. As a result the superblock doesn't have to be updated, either via the journal or by setting s_dirt. There are a few exceptions, most notably when resizing the file system, where the superblock needs to be modified --- and in that case it should be done as a journalled operation if possible, and s_dirt set only in no-journal mode. This patch will optimize out some unneeded disk writes when using ext4 with a journal. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-11 23:14:04 -04:00
Jeff Layton	12420ac341	cifs: implement drop_inode superblock op The standard behavior for drop_inode is to delete the inode when the last reference to it is put and the nlink count goes to 0. This helps keep inodes that are still considered "not deleted" in cache as long as possible even when there aren't dentries attached to them. When server inode numbers are disabled, it's not possible for cifs_iget to ever match an existing inode (since inode numbers are generated via iunique). In this situation, cifs can keep a lot of inodes in cache that will never be used again. Implement a drop_inode routine that deletes the inode if server inode numbers are disabled on the mount. This helps keep the cifs inode caches down to a more manageable size when server inode numbers are disabled. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-06-12 02:06:52 +00:00
Jeff Layton	ed0e3ace57	cifs: don't attempt busy-file rename unless it's in same directory Busy-file renames don't actually work across directories, so we need to limit this code to renames within the same dir. This fixes the bug detailed here: https://bugzilla.redhat.com/show_bug.cgi?id=591938 Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-06-12 01:45:36 +00:00
Linus Torvalds	b25b550bb1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: The file argument for fsync() is never null Btrfs: handle ERR_PTR from posix_acl_from_xattr() Btrfs: avoid BUG when dropping root and reference in same transaction Btrfs: prohibit a operation of changing acl's mask when noacl mount option used Btrfs: should add a permission check for setfacl Btrfs: btrfs_lookup_dir_item() can return ERR_PTR Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs Btrfs: unwind after btrfs_start_transaction() errors Btrfs: btrfs_iget() returns ERR_PTR Btrfs: handle kzalloc() failure in open_ctree() Btrfs: handle error returns from btrfs_lookup_dir_item() Btrfs: Fix BUG_ON for fs converted from extN Btrfs: Fix null dereference in relocation.c Btrfs: fix remap_file_pages error Btrfs: uninitialized data is check_path_shared() Btrfs: fix fallocate regression Btrfs: fix loop device on top of btrfs	2010-06-11 14:18:47 -07:00
Dan Carpenter	6f902af400	Btrfs: The file argument for fsync() is never null The "file" argument for fsync is never null so we can remove this check. What drew my attention here is that `7ea8085910`: "drop unused dentry argument to ->fsync" introduced an unconditional dereference at the start of the function and that generated a smatch warning. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:40 -04:00
Dan Carpenter	834e74759a	Btrfs: handle ERR_PTR from posix_acl_from_xattr() posix_acl_from_xattr() returns both ERR_PTRs and null, but it's OK to pass null values to set_cached_acl() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:39 -04:00
Sage Weil	15e7000095	Btrfs: avoid BUG when dropping root and reference in same transaction If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes with end_transaction(), the cleaner kthread may come in and drop the root in the same transaction. If that's the case, the root's refs still == 1 in the tree when btrfs_del_root() deletes the item, because commit_fs_roots() hasn't updated it yet (that happens during the commit). This wasn't a problem before only because btrfs_ioctl_snap_destroy() would commit the transaction before dropping the dentry reference, so the dead root wouldn't get queued up until after the fs root item was updated in the btree. Since it is not an error to drop the root reference and the root in the same transaction, just drop the BUG_ON() in btrfs_del_root(). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:39 -04:00
Shi Weihua	731e3d1b43	Btrfs: prohibit a operation of changing acl's mask when noacl mount option used when used Posix File System Test Suite(pjd-fstest) to test btrfs, some cases about setfacl failed when noacl mount option used. I simplified used commands in pjd-fstest, and the following steps can reproduce it. ------------------------ # cd btrfs-part/ # mkdir aaa # setfacl -m m::rw aaa <- successed, but not expected by pjd-fstest. ------------------------ I checked ext3, a warning message occured, like as: setfacl: aaa/: Operation not supported Certainly, it's expected by pjd-fstest. So, i compared acl.c of btrfs and ext3. Based on that, a patch created. Fortunately, it works. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:38 -04:00

... 3 4 5 6 7 ...

18874 Commits