linux

Author	SHA1	Message	Date
Jens Axboe	053c525fcf	buffer: switch do_emergency_thaw() away from pdflush_operation() This is (again) a preparatory patch similar to commit `a2a9537ac0`. It open codes a simple async way of executing do_thaw_all() out of context, so we can get rid of pdflush. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 08:28:12 +02:00
Linus Torvalds	e9de427e40	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix "direct_io" private mmap fuse: fix argument type in fuse_get_user_pages()	2009-04-14 10:12:07 -07:00
Linus Torvalds	9fc0178caa	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix possible mismatch of sufile counters on recovery nilfs2: segment usage file cleanups nilfs2: fix wrong accounting and duplicate brelse in nilfs_sufile_set_error nilfs2: simplify handling of active state of segments fix nilfs2: remove module version nilfs2: fix lockdep recursive locking warning on meta data files nilfs2: fix lockdep recursive locking warning on bmap nilfs2: return f_fsid for statfs2	2009-04-14 10:10:53 -07:00
Jan Kara	316cb4ef3e	ext2: fix data corruption for racing writes If two writers allocating blocks to file race with each other (e.g. because writepages races with ordinary write or two writepages race with each other), ext2_getblock() can be called on the same inode in parallel. Before we are going to allocate new blocks, we have to recheck the block chain we have obtained so far without holding truncate_mutex. Otherwise we could overwrite the indirect block pointer set by the other writer leading to data loss. The below test program by Ying is able to reproduce the data loss with ext2 on in BRD in a few minutes if the machine is under memory pressure: long kMemSize = 50 << 20; int kPageSize = 4096; int main(int argc, char *argv) { int status; int count = 0; int i; char fname = "/mnt/test.mmap"; char *mem; unlink(fname); int fd = open(fname, O_CREAT \| O_EXCL \| O_RDWR, 0600); status = ftruncate(fd, kMemSize); mem = mmap(0, kMemSize, PROT_READ \| PROT_WRITE, MAP_SHARED, fd, 0); // Fill the memory with 1s. memset(mem, 1, kMemSize); sleep(2); for (i = 0; i < kMemSize; i++) { int byte_good = mem[i] != 0; if (!byte_good && ((i % kPageSize) == 0)) { //printf("%d ", i / kPageSize); count++; } } munmap(mem, kMemSize); close(fd); unlink(fname); if (count > 0) { printf("Running %d bad page\n", count); return 1; } return 0; } Cc: Ying Han <yinghan@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-13 15:04:33 -07:00
Jan Kara	3243387948	jbd: update locking coments Update information about locking in JBD revoke code. Reported-by: Lin Tan <tammy000@gmail.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-13 15:04:32 -07:00
Dave Anderson	eb2e5f452a	hfs: fix memory leak when unmounting When an HFS filesystem is unmounted, it leaks a 2-page bitmap. Also, under extreme memory pressure, it's possible that hfs_releasepage() may use a tree pointer that has not been initialized, and if so, the release request should just be rejected. [akpm@linux-foundation.org: free_pages(0) is legal, remove obvious comment] Signed-off-by: Dave Anderson <anderson@redhat.com> Tested-by: Eugene Teo <eugeneteo@kernel.sg> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-13 15:04:29 -07:00
Linus Torvalds	3c1795cc4b	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: remove xfs_flush_space xfs: flush delayed allcoation blocks on ENOSPC in create xfs: block callers of xfs_flush_inodes() correctly xfs: make inode flush at ENOSPC synchronous xfs: use xfs_sync_inodes() for device flushing xfs: inform the xfsaild of the push target before sleeping xfs: prevent unwritten extent conversion from blocking I/O completion xfs: fix double free of inode xfs: validate log feature fields correctly	2009-04-13 14:35:13 -07:00
Ryusuke Konishi	c85399c2da	nilfs2: fix possible mismatch of sufile counters on recovery On-disk counters ndirtysegs and ncleansegs of sufile, can go wrong after roll-forward recovery because nilfs_prepare_segment_for_recovery() function marks segments dirty without adjusting value of these counters. This fixes the problem by adding a function to sufile which does the operation adjusting the counters, and by letting the recovery function use it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:52 +09:00
Ryusuke Konishi	a703018f7b	nilfs2: segment usage file cleanups This will simplify sufile.c by sharing common code which repeatedly appears in routines updating a segment usage entry; a wrapper function nilfs_sufile_update() is introduced for the purpose, and counter modifications are integrated to a new function nilfs_sufile_mod_counter(). This is a preparation for the successive bugfix patch ("nilfs2: fix possible mismatch of sufile counters on recovery"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	88072faf9a	nilfs2: fix wrong accounting and duplicate brelse in nilfs_sufile_set_error The nilfs_sufile_set_error() function wrongly adjusts the number of dirty segments instead of the number of clean segments. In addition, the function calls brelse() twice for the same buffer head. This fixes these bugs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	3efb55b496	nilfs2: simplify handling of active state of segments fix This fixes a bug of ("nilfs2: simplify handling of active state of segments") patch. The patch did not take account that a base index is increased in nilfs_sufile_get_suinfo() function if requested entries go across block boundary on sufile. Due to this bug, the active flag sometimes appears on wrong segments and has induced malfunction of garbage collection. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	e7a7402c0d	nilfs2: remove module version A MODULE_VERSION() macro has been used in out-of-tree nilfs modules, but it's needless and not updated in tree. So, this removes it along with the version declaration. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:50 +09:00
Ryusuke Konishi	c2698e50e3	nilfs2: fix lockdep recursive locking warning on meta data files This fixes the following false detection of lockdep against nilfs meta data files: ============================================= [ INFO: possible recursive locking detected ] 2.6.29 #26 --------------------------------------------- mount.nilfs2/4185 is trying to acquire lock: (&mi->mi_sem){----}, at: [<d0c7925b>] nilfs_sufile_get_stat+0x1e/0x105 [nilfs2] but task is already holding lock: (&mi->mi_sem){----}, at: [<d0c72026>] nilfs_count_free_blocks+0x48/0x84 [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:50 +09:00
Ryusuke Konishi	bcb48891b0	nilfs2: fix lockdep recursive locking warning on bmap The bmap semaphore of DAT file can be held while a bmap of other files is locked. This has caused the following false detection of lockdep check: mount.nilfs2/4667 is trying to acquire lock: (&bmap->b_sem){..--}, at: [<d0c6c4b4>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){..--}, at: [<d0c6c4b4>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] This will fix the false detection by distinguishing semaphores of the DAT and other files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:49 +09:00
Ryusuke Konishi	c306af23e1	nilfs2: return f_fsid for statfs2 This follows the change of Coly Li's series ("fs: return f_fsid for statfs(2)"), and make nilfs2 return f_fsid info for statfs(2). Acked-by: Coly Li <coly.li@suse.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:49 +09:00
Linus Torvalds	54f93b74cf	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: check block device size on mount ext4: Fix off-by-one-error in ext4_valid_extent_idx() ext4: Fix big-endian problem in __ext4_check_blockref()	2009-04-09 16:42:05 -07:00
Felix Blyakher	dc2a5536d6	Merge branch 'master' into for-linus	2009-04-09 14:12:07 -05:00
Stoyan Gaydarov	11ff5f6aff	afs: BUG to BUG_ON changes Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-09 10:41:19 -07:00
Miklos Szeredi	3121bfe763	fuse: fix "direct_io" private mmap MAP_PRIVATE mmap could return stale data from the cache for "direct_io" files. Fix this by flushing the cache on mmap. Found with a slightly modified fsx-linux. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-09 17:37:53 +02:00
Miklos Szeredi	ce60a2f157	fuse: fix argument type in fuse_get_user_pages() Fix the following warning: fs/fuse/file.c: In function 'fuse_direct_io': fs/fuse/file.c:1002: warning: passing argument 3 of 'fuse_get_user_pages' from incompatible pointer type This was introduced by commit `f4975c67` "fuse: allow kernel to access "direct_io" files". Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-04-09 17:37:52 +02:00
Linus Torvalds	a7b334de4d	Merge branch 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext3: Try to avoid starting a transaction in writepage for data=writepage block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG	2009-04-08 17:42:32 -07:00
Nobuhiro Iwamatsu	4c967291fc	nommu: fix typo vma->pg_off to vma->vm_pgoff `6260a4b052` ("/proc/pid/maps: don't show pgoff of pure ANON VMAs" had a typo. fs/proc/task_nommu.c:138: error: 'struct vm_area_struct' has no member named 'pg_off' distcc[21484] ERROR: compile fs/proc/task_nommu.c on sprygo/32 failed Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-08 10:21:44 -07:00
Alexander Beregalov	2b3fffefea	befs: fix build on parisc fs/befs/super.c:85: error: 'PAGE_SIZE' undeclared Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-08 10:21:43 -07:00
Jan Kara	430db323fa	ext3: Try to avoid starting a transaction in writepage for data=writepage This does the same as commit `9e80d40773` (avoid starting a transaction when no block allocation is needed) but for data=writeback mode of ext3. We also cleanup the data=ordered case a bit to stick to coding style... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-08 13:15:10 -04:00
Theodore Ts'o	6e34eeddf7	block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG, use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging the block device I/O queue between each page that gets flushed out. Otherwise, when we run sync() or fsync() and we need to write out a large number of pages, the block device queue will get unplugged between for every page that is flushed out, which will be a pretty serious performance regression caused by commit `a64c8610`. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-08 13:15:09 -04:00
Trond Myklebust	2b2ec7554c	NFS: Fix the return value in nfs_page_mkwrite() Commit `c2ec175c39` ("mm: page_mkwrite change prototype to match fault") exposed a bug in the NFS implementation of page_mkwrite. We should be returning 0 on success... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 14:07:03 -07:00
From: Thiemo Nagel	0f2ddca66d	ext4: check block device size on mount Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-07 14:07:47 -04:00
Miklos Szeredi	7bfac9ecf0	splice: fix deadlock in splicing to file There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write(): - task A calls generic_file_splice_write() - this calls inode_double_lock(), which locks i_mutex on both pipe->inode and target inode - ordering depends on inode pointers, can happen that pipe->inode is locked first - __splice_from_pipe() needs more data, calls pipe_wait() - this releases lock on pipe->inode, goes to interruptible sleep - task B calls generic_file_splice_write(), similarly to the first - this locks pipe->inode, then tries to lock inode, but that is already held by task A - task A is interrupted, it tries to lock pipe->inode, but fails, as it is already held by task B - ABBA deadlock Fix this by explicitly ordering locks: the outer lock must be on target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK, pipe inodes and target inodes form two nonoverlapping sets, generic_file_splice_write() and friends are not called with a target which is a pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:34:46 -07:00
Ryusuke Konishi	612392307c	nilfs2: support nanosecond timestamp After a review of user's feedback for finding out other compatibility issues, I found nilfs improperly initializes timestamps in inode; CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs didn't have nanosecond timestamps on disk. A few users gave us the report that the tar program sometimes failed to expand symbolic links on nilfs, and it turned out to be the cause. Instead of applying the above displacement, I've decided to support nanosecond timestamps on this occation. Fortunetaly, a needless 64-bit field was in the nilfs_inode struct, and I found it's available for this purpose without impact for the users. So, this will do the enhancement and resolve the tar problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	e339ad31f5	nilfs2: introduce secondary super block The former versions didn't have extra super blocks. This improves the weak point by introducing another super block at unused region in tail of the partition. This doesn't break disk format compatibility; older versions just ingore the secondary super block, and new versions just recover it if it doesn't exist. The partition created by an old mkfs may not have unused region, but in that case, the secondary super block will not be added. This doesn't make more redundant copies of the super block; it is a future work. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	cece552074	nilfs2: simplify handling of active state of segments will reduce some lines of segment constructor. Previously, the state was complexly controlled through a list of segments in order to keep consistency in meta data of usage state of segments. Instead, this presents ``calculated'' active flags to userland cleaner program and stop maintaining its real flag on disk. Only by this fake flag, the cleaner cannot exactly know if each segment is reclaimable or not. However, the recent extension of nilfs_sustat ioctl struct (nilfs2-extend-nilfs_sustat-ioctl-struct.patch) can prevent the cleaner from reclaiming in-use segment wrongly. So, now I can apply this for simplification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	c96fa464a5	nilfs2: mark minor flag for checkpoint created by internal operation Nilfs creates checkpoints even for garbage collection or metadata updates such as checkpoint mode change. So, user often sees checkpoints created only by such internal operations. This is inconvenient in some situations. For example, application that monitors checkpoints and changes them to snapshots, will fall into an infinite loop because it cannot distinguish internally created checkpoints. This patch solves this sort of problem by adding a flag to checkpoint for identification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	458c5b0822	nilfs2: clean up sketch file The sketch file is a file to mark checkpoints with user data. It was experimentally introduced in the original implementation, and now obsolete. The file was handled differently with regular files; the file size got truncated when a checkpoint was created. This stops the special treatment and will treat it as a regular file. Most users are not affected because mkfs.nilfs2 no longer makes this file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	e626874685	nilfs2: super block operations fix endian bug This adds a missing endian conversion of checksum field in the super block. This fixes compatibility issue on big endian machines which will come to surface after supporting recovery of super block. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	1f5abe7e7d	nilfs2: replace BUG_ON and BUG calls triggerable from ioctl Pekka Enberg advised me: > It would be nice if BUG(), BUG_ON(), and panic() calls would be > converted to proper error handling using WARN_ON() calls. The BUG() > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be > triggerable from user-space via the ioctl() system call. This will follow the comment and keep them to a minimum. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	2c2e52fc4f	nilfs2: extend nilfs_sustat ioctl struct This adds a new argument to the nilfs_sustat structure. The extended field allows to delete volatile active state of segments, which was needed to protect freshly-created segments from garbage collection but has confused code dealing with segments. This extension alleviates the mess and gives room for further simplifications. The volatile active flag is not persistent, so it's eliminable on this occasion without affecting compatibility other than the ioctl change. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	7a9461939a	nilfs2: use unlocked_ioctl Pekka Enberg suggested converting ->ioctl operations to use ->unlocked_ioctl to avoid BKL. The conversion was verified to be safe, so I will take it on this occasion. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	8082d36aed	nilfs2: remove compat ioctl code This removes compat code from the nilfs ioctls and applies the same function for both .ioctl and .compat_ioctl file operations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	dc498d09be	nilfs2: use fixed sized types for ioctl structures Nilfs ioctl had structures not having fixed sized types such as: struct nilfs_argv { void *v_base; size_t v_nmembs; size_t v_size; int v_index; int v_flags; }; Further, some of them are wrongly aligned: e.g. struct nilfs_cpmode { __u64 cm_cno; int cm_mode; }; The size of wrongly aligned structures varies depending on architectures, and it breaks the identity of ioctl commands, which leads to arch dependent errors. Previously, these are compensated by using compat_ioctl. This fixes these problems and allows removal of compat ioctl. Since this will change sizes of those structures, binary compatibility for the past utilities will once break; new utilities have to be used instead. However, it would be helpful to avoid platform dependent problems in the long term. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	1088dcf4c3	nilfs2: remove timedwait ioctl command This removes NILFS_IOCTL_TIMEDWAIT command from ioctl interface along with the related flags and wait queue. The command is terrible because it just sleeps in the ioctl. I prefer to avoid this by devising means of event polling in userland program. By reconsidering the userland GC daemon, I found this is possible without changing behaviour of the daemon and sacrificing efficiency. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	76068c4ff1	nilfs2: fix buggy behavior seen in enumerating checkpoints This will fix the weird behavior of lscp command in listing continuously created checkpoints; the output of lscp is rewinded regularly for the recent nilfs. As a result of debugging, a defect was found in nilfs_cpfile_do_get_cpinfo() function. Though the function can be repeatedly called to enumerate checkpoints and it can skip invalid checkpoint entries, the index value was not carried between successive calls. The bug has long been present, and came to surface after applying a bugfix nilfs2-fix-problems-of-memory-allocation-in-ioctl.patch, which increased frequency of calling the function. The similar bugfix was already applied for ``snapshots'' by nilfs2-fix-gc-failure-on-volumes-keeping-numerous-snapshots.patch. This fixes the problem by making the index argument bidirectional on the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Pekka Enberg	8acfbf0939	nilfs2: clean up indirect function calling conventions This cleans up the strange indirect function calling convention used in nilfs to follow the normal kernel coding style. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	7fa10d2001	nilfs2: fix improper return values of nilfs_get_cpinfo ioctl A few tool developers gave me requests for fixing inconvenient return value of nilfs_get_cpinfo() ioctl; if the requested mode is NILFS_SNAPSHOT and the specified start entry is not a snapshot, the ioctl unnaturally returns one as the number of acquired snapshot item. In addition, the ioctl function returns an ENOENT error for checkpoints within blocks deleted by garbage collection. These behaviors require corrections for programs which enumerate snapshots. This resolves the inconvenience by changing the return values to zero for the above cases. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	b028fcfc4c	nilfs2: fix gc failure on volumes keeping numerous snapshots This resolves the following failure of nilfs2 cleaner daemon: nilfs_cleanerd[20670]: cannot clean segments: No such file or directory nilfs_cleanerd[20670]: shutdown When creating thousands of snapshots, the cleaner daemon had rarely died as above due to an error returned from the kernel code. After applying the recent patch which fixed memory allocation problems in ioctl (Message-Id: <20081215.155840.105124170.ryusuke@osrg.net>), the problem gets more frequent. It turned out to be a bug of nilfs_ioctl_wrap_copy function and one of its callback routines to read out information of snapshots; if the nilfs_ioctl_wrap_copy function divided a large read request into multiple requests, the second and later requests have failed since a restart position on snapshot meta data was not properly set forward. It's a deficiency of the callback interface that cannot pass the restart position among multiple requests. This patch fixes the issue by allowing nilfs_ioctl_wrap_copy and snapshot read functions to exchange a position argument. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	047180f2d7	nilfs2: insert explanations in gcinode file The file gcinode.c gives buffer cache functions for on-disk blocks moved in garbage collection. Joern Engel has suggested inserting its explanations in the source file (Message-ID: <20080917144146.GD8750@logfs.org> and <20080917224953.GB14644@logfs.org>). This follows the comment. Cc: Joern Engel <joern@logfs.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	47420c7998	nilfs2: avoid double error caused by nilfs_transaction_end Pekka Enberg pointed out that double error handlings found after nilfs_transaction_end() can be avoided by separating abort operation: OK, I don't understand this. The only way nilfs_transaction_end() can fail is if we have NILFS_TI_SYNC set and we fail to construct the segment. But why do we want to construct a segment if we don't commit? I guess what I'm asking is why don't we have a separate nilfs_transaction_abort() function that can't fail for the erroneous case to avoid this double error value tracking thing? This does the separation and renames nilfs_transaction_end() to nilfs_transaction_commit() for clarification. Since, some calls of these functions were used just for exclusion control against the segment constructor, they are replaced with semaphore operations. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	a2e7d2df82	nilfs2: cleanup nilfs_clear_inode This will remove the following unnecessary locks and cleanup code in nilfs_clear_inode(): - unnecessary protection using nilfs_transaction_begin() and nilfs_transaction_end(). - cleanup code of i_dirty list field which is never chained when this function is called. - spinlock used when releasing i_bh field. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	3358b4aaa8	nilfs2: fix problems of memory allocation in ioctl This is another patch for fixing the following problems of a memory copy function in nilfs2 ioctl: (1) It tries to allocate 128KB size of memory even for small objects. (2) Though the function repeatedly tries large memory allocations while reducing the size, GFP_NOWAIT flag is not specified. This increases the possibility of system memory shortage. (3) During the retries of (2), verbose warnings are printed because _GFP_NOWARN flag is not used for the kmalloc calls. The first patch was still doing large allocations by kmalloc which are repeatedly tried while reducing the size. Andi Kleen told me that using copy_from_user for large memory is not good from the viewpoint of preempt latency: On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <andi@firstfloor.org> wrote: > > In the current interface, each data item is copied twice: one is to > > the allocated memory from user space (via copy_from_user), and another > > For such large copies it is better to use multiple smaller (e.g. 4K) > copy user, that gives better real time preempt latencies. Each cfu has a > cond_resched(), but only one, not multiple times in the inner loop. He also advised me that: On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <andi@firstfloor.org> wrote: > Better would be if you could go to PAGE_SIZE. order 0 allocations > are typically the fastest / least likely to stall. > > Also in this case it's a good idea to use __get_free_pages() > directly, kmalloc tends to be become less efficient at larger > sizes. For the function in question, the size of buffer memory can be reduced since the buffer is repeatedly used for a number of small objects. On the other hand, it may incur large preempt latencies for larger buffer because a copy_from_user (and a copy_to_user) was applied only once each cycle. With that, this revision uses the order 0 allocations with __get_free_pages() to fix the original problems. Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Ryusuke Konishi	0c4fb87764	nilfs2: update makefile and Kconfig This adds a Makefile for the nilfs2 file system, and updates the makefile and Kconfig file in the file system directory. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Koji Sato	7942b919f7	nilfs2: ioctl operations This adds userland interface implemented with ioctl. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00

1 2 3 4 5 ...

13624 Commits