1
linux/fs
Nick Piggin 7ef0d7377c fs: new inode i_state corruption fix
There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121.  There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem.  I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode.  This points to memory scribble, or
synchronisation problme.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1hour of running.  Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either.  I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-12 16:20:24 -07:00
..
9p fs/Kconfig: move 9p out 2009-01-22 13:16:01 +03:00
adfs fs/Kconfig: move adfs out 2009-01-22 13:15:56 +03:00
affs fs/Kconfig: move affs out 2009-01-22 13:15:56 +03:00
afs fs/Kconfig: move afs out 2009-01-22 13:16:01 +03:00
autofs fs/Kconfig: move autofs, autofs4 out 2009-01-22 13:15:54 +03:00
autofs4 fs/Kconfig: move autofs, autofs4 out 2009-01-22 13:15:54 +03:00
befs fs/Kconfig: move befs out 2009-01-22 13:15:57 +03:00
bfs fs/Kconfig: move bfs out 2009-01-22 13:15:57 +03:00
btrfs Btrfs: fix spinlock assertions on UP systems 2009-03-09 11:45:38 -04:00
cifs [CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts 2009-02-21 03:37:10 +00:00
coda fs/Kconfig: move coda out 2009-01-22 13:16:01 +03:00
configfs Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()" 2009-02-04 09:46:25 -08:00
cramfs fs/Kconfig: move cramfs out 2009-01-22 13:15:58 +03:00
debugfs debugfs: add helpers for exporting a size_t simple value 2009-01-07 10:00:16 -08:00
devpts devpts: remove graffiti 2009-03-10 15:55:11 -07:00
dlm dlm: initialize file_lock struct in GETLK before copying conflicting lock 2009-01-21 15:28:45 -06:00
ecryptfs eCryptfs: Regression in unencrypted filename symlinks 2009-02-06 18:36:40 -08:00
efs fs/Kconfig: move efs out 2009-01-22 13:15:57 +03:00
exportfs Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
ext2 ext2/xip: refuse to change xip flag during remount with busy inodes 2009-02-11 14:25:36 -08:00
ext3 ext3: revert "ext3: wait on all pending commits in ext3_sync_fs" 2009-02-11 14:25:35 -08:00
ext4 ext4: fix ext4_free_inode() vs. ext4_claim_inode() race 2009-03-04 18:38:18 -05:00
fat Fix _fat_bmap() locking 2009-03-11 12:04:18 -07:00
freevxfs fs/Kconfig: move vxfs out 2009-01-22 13:15:58 +03:00
fuse Merge branch 'Kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/misc 2009-01-26 10:08:50 -08:00
gfs2 filesystem freeze: add error handling of write_super_lockfs/unlockfs 2009-01-09 16:54:42 -08:00
hfs fs/Kconfig: move hfs, hfsplus out 2009-01-22 13:15:57 +03:00
hfsplus fs/Kconfig: move hfs, hfsplus out 2009-01-22 13:15:57 +03:00
hostfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
hpfs fs/Kconfig: move hpfs out 2009-01-22 13:15:59 +03:00
hppfs
hugetlbfs Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
isofs fs/Kconfig: move iso9660, udf out 2009-01-22 13:15:55 +03:00
jbd jbd: fix return value of journal_start_commit() 2009-02-11 14:25:35 -08:00
jbd2 jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() 2009-02-10 11:15:34 -05:00
jffs2 [JFFS2] fix mount crash caused by removed nodes 2009-02-21 11:09:29 +01:00
jfs fs/Kconfig: move jfs out 2009-01-22 13:15:54 +03:00
lockd lockd: fix regression in lockd's handling of blocked locks 2009-02-09 13:19:46 -05:00
minix fs/Kconfig: move minix out 2009-01-22 13:15:58 +03:00
ncpfs fs/Kconfig: move the rest of ncpfs out 2009-01-22 13:16:01 +03:00
nfs fs/Kconfig: move nfs out 2009-01-22 13:16:00 +03:00
nfs_common
nfsd nfsd: only set file_lock.fl_lmops in nfsd4_lockt if a stateowner is found 2009-01-27 17:26:59 -05:00
nls
notify inotify: fix GFP_KERNEL related deadlock 2009-02-18 15:37:56 -08:00
ntfs fs/Kconfig: move ntfs out 2009-01-22 13:15:55 +03:00
ocfs2 ocfs2: add IO error check in ocfs2_get_sector() 2009-02-26 11:51:12 -08:00
omfs fs/Kconfig: move omfs out 2009-01-22 13:15:58 +03:00
openpromfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
partitions block: fix bug in ptbl lookup cache 2009-01-09 21:46:13 +01:00
proc proc: fix kflags to uflags copying in /proc/kpageflags 2009-03-11 07:43:33 -07:00
qnx4 fs/Kconfig: move qnx4 out 2009-01-22 13:15:59 +03:00
ramfs NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area() 2009-01-08 12:04:46 +00:00
reiserfs fs/Kconfig: move reiserfs out 2009-01-22 13:15:53 +03:00
romfs fs/Kconfig: move romfs out 2009-01-22 13:15:59 +03:00
smbfs fs/Kconfig: move smbfs out 2009-01-22 13:16:01 +03:00
squashfs Squashfs: frag_size should be signed, as it can hold an error result 2009-03-05 00:55:31 +00:00
sysfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2009-01-26 10:40:28 -08:00
sysv fs/Kconfig: move sysv out 2009-01-22 13:15:59 +03:00
ubifs UBIFS: remove fast unmounting 2009-01-29 16:34:30 +02:00
udf fs/Kconfig: move iso9660, udf out 2009-01-22 13:15:55 +03:00
ufs fs/Kconfig: move ufs out 2009-01-22 13:16:00 +03:00
xfs xfs: only issues a cache flush on unmount if barriers are enabled 2009-03-06 17:35:12 -06:00
aio.c [CVE-2009-0029] System call wrappers part 16 2009-01-14 14:15:25 +01:00
anon_inodes.c anon_inodes: use fops->owner for module refcount 2008-12-31 16:55:44 +02:00
attr.c
bad_inode.c kill ->dir_notify() 2008-12-31 18:07:43 -05:00
binfmt_aout.c sanitize ifdefs in binfmt_aout 2009-01-03 11:45:54 -08:00
binfmt_elf_fdpic.c FDPIC: Don't attempt to expand the userspace stack to fill the space allocated 2009-01-08 12:04:47 +00:00
binfmt_elf.c elf core dump: fix get_user use 2009-02-06 17:34:07 -08:00
binfmt_em86.c
binfmt_flat.c FLAT: Don't attempt to expand the userspace stack to fill the space allocated 2009-01-08 12:04:47 +00:00
binfmt_misc.c fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/status 2009-01-06 15:59:19 -08:00
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Remove obsolete BUG_ON 2009-01-30 12:34:36 +01:00
bio.c block: fix bogus gcc warning for uninitialized var usage 2009-02-26 10:45:48 +01:00
block_dev.c filesystem freeze: implement generic freeze feature 2009-01-09 16:54:42 -08:00
buffer.c Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2009-02-18 18:33:04 -08:00
char_dev.c fs: fix name overwrite in __register_chrdev_region() 2009-01-06 15:59:13 -08:00
compat_binfmt_elf.c
compat_ioctl.c Fix FREEZE/THAW compat_ioctl regression 2009-02-27 16:27:45 -08:00
compat.c CRED: Fix SUID exec regression 2009-02-07 08:46:18 +11:00
dcache.c EXPORT_SYMBOL(d_obtain_alias) rather than EXPORT_SYMBOL_GPL 2009-02-27 16:26:20 -08:00
dcookies.c [CVE-2009-0029] System call wrapper special cases 2009-01-14 14:15:18 +01:00
direct-io.c fs: truncate blocks outside i_size after O_DIRECT write error 2009-01-06 15:59:06 -08:00
dquot.c quota: Improve locking 2009-01-16 18:02:10 +01:00
drop_caches.c
eventfd.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
eventpoll.c epoll: drop max_user_instances and rely only on max_user_watches 2009-01-29 18:04:45 -08:00
exec.c CRED: Fix SUID exec regression 2009-02-07 08:46:18 +11:00
fcntl.c [CVE-2009-0029] System call wrappers part 15 2009-01-14 14:15:24 +01:00
fifo.c
file_table.c filp_cachep can be static in fs/file_table.c 2008-12-31 18:07:42 -05:00
file.c
filesystems.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
fs-writeback.c fs: new inode i_state corruption fix 2009-03-12 16:20:24 -07:00
generic_acl.c
inode.c fs: new inode i_state corruption fix 2009-03-12 16:20:24 -07:00
internal.h CRED: Fix SUID exec regression 2009-02-07 08:46:18 +11:00
ioctl.c [CVE-2009-0029] System call wrappers part 15 2009-01-14 14:15:24 +01:00
ioprio.c [CVE-2009-0029] System call wrappers part 28 2009-01-14 14:15:30 +01:00
Kconfig fs/Kconfig: move 9p out 2009-01-22 13:16:01 +03:00
Kconfig.binfmt CORE_DUMP_DEFAULT_ELF_HEADERS depends on ELF_CORE 2009-01-09 16:54:41 -08:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
locks.c [CVE-2009-0029] System call wrappers part 16 2009-01-14 14:15:25 +01:00
Makefile ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2 2009-02-28 09:50:01 -05:00
mbcache.c
mpage.c do_mpage_readpage(): remove useless clear_buffer_mapped() call 2009-01-06 15:59:01 -08:00
namei.c [CVE-2009-0029] System call wrappers part 29 2009-01-14 14:15:30 +01:00
namespace.c Fix incomplete __mntput locking 2009-02-17 14:02:08 -08:00
nfsctl.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
no-block.c
open.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
pipe.c pipe_rdwr_fasync: fix the error handling to prevent the leak/crash 2009-03-12 16:20:23 -07:00
pnode.c
pnode.h
posix_acl.c
quota_tree.c quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_tree.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_v1.c quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quota_v2.c quota: Convert union in mem_dqinfo to a pointer 2009-01-05 08:40:21 -08:00
quota.c [CVE-2009-0029] System call wrappers part 20 2009-01-14 14:15:26 +01:00
quotaio_v1.h quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quotaio_v2.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
read_write.c [CVE-2009-0029] System call wrappers part 20 2009-01-14 14:15:26 +01:00
read_write.h
readdir.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
select.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
seq_file.c seq_file: properly cope with pread 2009-02-18 15:37:53 -08:00
signalfd.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
splice.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
stack.c
stat.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
super.c vfs: add missing unlock in sget() 2009-03-12 16:20:23 -07:00
sync.c [CVE-2009-0029] System call wrappers part 09 2009-01-14 14:15:21 +01:00
timerfd.c timerfd: add flags check 2009-02-18 15:37:53 -08:00
utimes.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
xattr_acl.c
xattr.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00