1
Commit Graph

419 Commits

Author SHA1 Message Date
Linus Torvalds
d28619f156 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Convert quota statistics to generic percpu_counter
  ext3 uses rb_node = NULL; to zero rb_root.
  quota: Fixup dquot_transfer
  reiserfs: Fix resuming of quotas on remount read-write
  pohmelfs: Remove dead quota code
  ufs: Remove dead quota code
  udf: Remove dead quota code
  quota: rename default quotactl methods to dquot_
  quota: explicitly set ->dq_op and ->s_qcop
  quota: drop remount argument to ->quota_on and ->quota_off
  quota: move unmount handling into the filesystem
  quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
  quota: move remount handling into the filesystem
  ocfs2: Fix use after free on remount read-only

Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c
2010-05-30 09:11:11 -07:00
Christoph Hellwig
7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
jan Blunck
ca572727db fs/: do not fallback to default_llseek() when readdir() uses BKL
Do not use the fallback default_llseek() if the readdir operation of the
filesystem still uses the big kernel lock.

Since llseek() modifies
file->f_pos of the directory directly it may need locking to not confuse
readdir which usually uses file->f_pos directly as well

Since the special characteristics of the BKL (unlocked on schedule) are
not necessary in this case, the inode mutex can be used for locking as
provided by generic_file_llseek().  This is only possible since all
filesystems, except reiserfs, either use a directory as a flat file or
with disk address offsets.  Reiserfs on the other hand uses a 32bit hash
off the filename as the offset so generic_file_llseek() can get used as
well since the hash is always smaller than sb->s_maxbytes (= (512 << 32) -
blocksize).

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Anders Larsen <al@alarsen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:56 -07:00
Jan Kara
f4b113ae6f reiserfs: Fix resuming of quotas on remount read-write
When quota was suspended on remount-ro, finish_unfinished() will try to turn
it on again (which fails) and also turns the quotas off on exit. Fix the
function to check whether quotas are already on at function entry and do
not turn them off in that case.

CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27 17:39:36 +02:00
Christoph Hellwig
287a80958c quota: rename default quotactl methods to dquot_
Follow the dquot_* style used elsewhere in dquot.c.

[Jan Kara: Fixed up missing conversion of ext2]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24 14:10:17 +02:00
Christoph Hellwig
307ae18a56 quota: drop remount argument to ->quota_on and ->quota_off
Remount handling has fully moved into the filesystem, so all this is
superflous now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24 14:09:12 +02:00
Christoph Hellwig
e0ccfd959c quota: move unmount handling into the filesystem
Currently the VFS calls into the quotactl interface for unmounting
filesystems.  This means filesystems with their own quota handling
can't easily distinguish between user-space originating quotaoff
and an unount.  Instead move the responsibily of the unmount handling
into the filesystem to be consistent with all other dquot handling.

Note that we do call dquot_disable a lot later now, e.g. after
a sync_filesystem.  But this is fine as the quota code does all its
writes via blockdev's mapping and that is synced even later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24 14:09:12 +02:00
Christoph Hellwig
0f0dd62fdd quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
Instead of having wrappers in the VFS namespace export the dquot_suspend
and dquot_resume helpers directly.  Also rename vfs_quota_disable to
dquot_disable while we're at it.

[Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24 14:06:40 +02:00
Christoph Hellwig
c79d967de3 quota: move remount handling into the filesystem
Currently do_remount_sb calls into the dquot code to tell it about going
from rw to ro and ro to rw.  Move this code into the filesystem to
not depend on the dquot code in the VFS - note ocfs2 already ignores
these calls and handles remount by itself.  This gets rid of overloading
the quotactl calls and allows to unify the VFS and XFS codepaths in
that area later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-24 14:06:39 +02:00
Linus Torvalds
e8bebe2f71 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
  fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
  get rid of home-grown mutex in cris eeprom.c
  switch ecryptfs_write() to struct inode *, kill on-stack fake files
  switch ecryptfs_get_locked_page() to struct inode *
  simplify access to ecryptfs inodes in ->readpage() and friends
  AFS: Don't put struct file on the stack
  Ban ecryptfs over ecryptfs
  logfs: replace inode uid,gid,mode initialization with helper function
  ufs: replace inode uid,gid,mode initialization with helper function
  udf: replace inode uid,gid,mode init with helper
  ubifs: replace inode uid,gid,mode initialization with helper function
  sysv: replace inode uid,gid,mode initialization with helper function
  reiserfs: replace inode uid,gid,mode initialization with helper function
  ramfs: replace inode uid,gid,mode initialization with helper function
  omfs: replace inode uid,gid,mode initialization with helper function
  bfs: replace inode uid,gid,mode initialization with helper function
  ocfs2: replace inode uid,gid,mode initialization with helper function
  nilfs2: replace inode uid,gid,mode initialization with helper function
  minix: replace inode uid,gid,mode init with helper
  ext4: replace inode uid,gid,mode init with helper
  ...

Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
2010-05-21 19:37:45 -07:00
Dmitry Monakhov
04b7ed0d33 reiserfs: replace inode uid,gid,mode initialization with helper function
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:26 -04:00
Stephen Hemminger
94d09a98cd reiserfs: constify xattr_handler
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:19 -04:00
Jens Axboe
ee9a3607fb Merge branch 'master' into for-2.6.35
Conflicts:
	fs/ext3/fsync.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:27:26 +02:00
Dmitry Monakhov
12755627bd quota: unify quota init condition in setattr
Quota must being initialized if size or uid/git changes requested.
But initialization performed in two different places:
in case of i_size file system is responsible for dquot init
, but in case of uid/gid init will be called internally in
dquot_transfer().
This ambiguity makes code harder to understand.
Let's move this logic to one common helper function.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:45 +02:00
Jens Axboe
7407cf355f Merge branch 'master' into for-2.6.35
Conflicts:
	fs/block_dev.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-29 09:36:24 +02:00
Dmitry Monakhov
fbd9b09a17 blkdev: generalize flags for blkdev_issue_fn functions
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-28 19:47:36 +02:00
Jeff Mahoney
fb2162df74 reiserfs: fix corruption during shrinking of xattrs
Commit 48b32a3553 ("reiserfs: use generic
xattr handlers") introduced a problem that causes corruption when extended
attributes are replaced with a smaller value.

The issue is that the reiserfs_setattr to shrink the xattr file was moved
from before the write to after the write.

The root issue has always been in the reiserfs xattr code, but was papered
over by the fact that in the shrink case, the file would just be expanded
again while the xattr was written.

The end result is that the last 8 bytes of xattr data are lost.

This patch fixes it to use new_size.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Edward Shishkin <edward.shishkin@gmail.com>
Cc: Jethro Beekman <kernel@jbeekman.nl>
Cc: Greg Surbey <gregsurbey@hotmail.com>
Cc: Marco Gatti <marco.gatti@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:24 -07:00
Jeff Mahoney
cac36f7071 reiserfs: fix permissions on .reiserfs_priv
Commit 677c9b2e39 ("reiserfs: remove
privroot hiding in lookup") removed the magic from the lookup code to hide
the .reiserfs_priv directory since it was getting loaded at mount-time
instead.  The intent was that the entry would be hidden from the user via
a poisoned d_compare, but this was faulty.

This introduced a security issue where unprivileged users could access and
modify extended attributes or ACLs belonging to other users, including
root.

This patch resolves the issue by properly hiding .reiserfs_priv.  This was
the intent of the xattr poisoning code, but it appears to have never
worked as expected.  This is fixed by using d_revalidate instead of
d_compare.

This patch makes -oexpose_privroot a no-op.  I'm fine leaving it this way.
The effort involved in working out the corner cases wrt permissions and
caching outweigh the benefit of the feature.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Reported-by: Matt McCutchen <matt@mattmccutchen.net>
Tested-by: Matt McCutchen <matt@mattmccutchen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:24 -07:00
Tejun Heo
336f5899d2 Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
Jeff Mahoney
b7b7fa4310 reiserfs: Fix locking BUG during mount failure
Commit 8ebc423238 (reiserfs: kill-the-BKL)
introduced a bug in the mount failure case.

The error label releases the lock before calling journal_release_error,
but it requires that the lock be held. do_journal_release unlocks and
retakes it. When it releases it without it held, we trigger a BUG().

The error_alloc label skips the unlock since the lock isn't held yet
but none of the other conditions that are clean up exist yet either.

This patch returns immediately after the kzalloc failure and moves
the reiserfs_write_unlock after the journal_release_error call.

This was reported in https://bugzilla.novell.com/show_bug.cgi?id=591807

Reported-by:  Thomas Siedentopf <thomas.siedentopf@novell.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Thomas Siedentopf <thomas.siedentopf@novell.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: 2.6.33.x <stable@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-03-30 22:13:09 +02:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jeff Mahoney
3f8b5ee332 reiserfs: properly honor read-only devices
The reiserfs journal behaves inconsistently when determining whether to
allow a mount of a read-only device.

This is due to the use of the continue_replay variable to short circuit
the journal scanning.  If it's set, it's assumed that there are
transactions to replay, but there may not be.  If it's unset, it's assumed
that there aren't any, and that may not be the case either.

I've observed two failure cases:
1) Where a clean file system on a read-only device refuses to mount
2) Where a clean file system on a read-only device passes the
   optimization and then tries writing the journal header to update
   the latest mount id.

The former is easily observable by using a freshly created file system on
a read-only loopback device.

This patch moves the check into journal_read_transaction, where it can
bail out before it's about to replay a transaction.  That way it can go
through and skip transactions where appropriate, yet still refuse to mount
a file system with outstanding transactions.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24 16:31:21 -07:00
Jeff Mahoney
6cb4aff0a7 reiserfs: fix oops while creating privroot with selinux enabled
Commit 57fe60df ("reiserfs: add atomic addition of selinux attributes
during inode creation") contains a bug that will cause it to oops when
mounting a file system that didn't previously contain extended attributes
on a system using security.* xattrs.

The issue is that while creating the privroot during mount
reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
dereferences the xattr root.  The xattr root doesn't exist, so we get an
oops.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15309

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24 16:31:21 -07:00
Jiri Kosina
318ae2edc3 Merge branch 'for-next' into for-linus
Conflicts:
	Documentation/filesystems/proc.txt
	arch/arm/mach-u300/include/mach/debug-macro.S
	drivers/net/qlge/qlge_ethtool.c
	drivers/net/qlge/qlge_main.c
	drivers/net/typhoon.c
2010-03-08 16:55:37 +01:00
Linus Torvalds
e213e26ab3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
  quota: stop using QUOTA_OK / NO_QUOTA
  dquot: cleanup dquot initialize routine
  dquot: move dquot initialization responsibility into the filesystem
  dquot: cleanup dquot drop routine
  dquot: move dquot drop responsibility into the filesystem
  dquot: cleanup dquot transfer routine
  dquot: move dquot transfer responsibility into the filesystem
  dquot: cleanup inode allocation / freeing routines
  dquot: cleanup space allocation / freeing routines
  ext3: add writepage sanity checks
  ext3: Truncate allocated blocks if direct IO write fails to update i_size
  quota: Properly invalidate caches even for filesystems with blocksize < pagesize
  quota: generalize quota transfer interface
  quota: sb_quota state flags cleanup
  jbd: Delay discarding buffers in journal_unmap_buffer
  ext3: quota_write cross block boundary behaviour
  quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
  quota: split out compat_sys_quotactl support from quota.c
  quota: split out netlink notification support from quota.c
  quota: remove invalid optimization from quota_sync_all
  ...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05 13:20:53 -08:00
Christoph Hellwig
a9185b41a4 pass writeback_control to ->write_inode
This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05 13:25:52 -05:00
Christoph Hellwig
871a293155 dquot: cleanup dquot initialize routine
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig
907f4554e2 dquot: move dquot initialization responsibility into the filesystem
Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.   For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig
9f75475802 dquot: cleanup dquot drop routine
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig
257ba15ced dquot: move dquot drop responsibility into the filesystem
Currently clear_inode calls vfs_dq_drop directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the drop inside the ->clear_inode
superblock operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:29 +01:00
Christoph Hellwig
b43fa8284d dquot: cleanup dquot transfer routine
Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:29 +01:00
Christoph Hellwig
63936ddaa1 dquot: cleanup inode allocation / freeing routines
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.

Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Christoph Hellwig
5dd4056db8 dquot: cleanup space allocation / freeing routines
Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not.  Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Frederic Weisbecker
175359f89d reiserfs: Fix softlockup while waiting on an inode
When we wait for an inode through reiserfs_iget(), we hold
the reiserfs lock. And waiting for an inode may imply waiting
for its writeback. But the inode writeback path may also require
the reiserfs lock, which leads to a deadlock.

We just need to release the reiserfs lock from reiserfs_iget()
to fix this.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Chris Mason <chris.mason@oracle.com>
2010-02-14 19:07:56 +01:00
Daniel Mack
3ad2f3fbb9 tree-wide: Assorted spelling fixes
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-09 11:13:56 +01:00
Frederic Weisbecker
bbec919150 reiserfs: Fix vmalloc call under reiserfs lock
Vmalloc is called to allocate journal->j_cnode_free_list but
we hold the reiserfs lock at this time, which raises a
{RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion.

Just drop the reiserfs lock at this time, as it's not even
needed but kept for paranoid reasons.

This fixes:

[ INFO: inconsistent lock state ]
2.6.33-rc5 #1
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11118c8>]
reiserfs_write_lock_once+0x28/0x50
{RECLAIM_FS-ON-W} state was registered at:
  [<c104ee32>] mark_held_locks+0x62/0x90
  [<c104eefa>] lockdep_trace_alloc+0x9a/0xc0
  [<c108f7b6>] kmem_cache_alloc+0x26/0xf0
  [<c108621c>] __get_vm_area_node+0x6c/0xf0
  [<c108690e>] __vmalloc_node+0x7e/0xa0
  [<c1086aab>] vmalloc+0x2b/0x30
  [<c110e1fb>] journal_init+0x6cb/0xa10
  [<c10f90a2>] reiserfs_fill_super+0x342/0xb80
  [<c1095665>] get_sb_bdev+0x145/0x180
  [<c10f68e1>] get_super_block+0x21/0x30
  [<c1094520>] vfs_kern_mount+0x40/0xd0
  [<c1094609>] do_kern_mount+0x39/0xd0
  [<c10aaa97>] do_mount+0x2c7/0x6d0
  [<c10aaf06>] sys_mount+0x66/0xa0
  [<c16198a7>] mount_block_root+0xc4/0x245
  [<c1619a81>] mount_root+0x59/0x5f
  [<c1619b98>] prepare_namespace+0x111/0x14b
  [<c1619269>] kernel_init+0xcf/0xdb
  [<c100303a>] kernel_thread_helper+0x6/0x1c
irq event stamp: 63236801
hardirqs last  enabled at (63236801): [<c134e7fa>]
__mutex_unlock_slowpath+0x9a/0x120
hardirqs last disabled at (63236800): [<c134e799>]
__mutex_unlock_slowpath+0x39/0x120
softirqs last  enabled at (63218800): [<c102f451>] __do_softirq+0xc1/0x110
softirqs last disabled at (63218789): [<c102f4ed>] do_softirq+0x4d/0x60

other info that might help us debug this:
2 locks held by kswapd0/313:
 #0:  (shrinker_rwsem){++++..}, at: [<c1074bb4>] shrink_slab+0x24/0x170
 #1:  (&type->s_umount_key#19){++++..}, at: [<c10a2edd>]
shrink_dcache_memory+0xfd/0x1a0

stack backtrace:
Pid: 313, comm: kswapd0 Not tainted 2.6.33-rc5 #1
Call Trace:
 [<c134db2c>] ? printk+0x18/0x1c
 [<c104e7ef>] print_usage_bug+0x15f/0x1a0
 [<c104ebcf>] mark_lock+0x39f/0x5a0
 [<c104d66b>] ? trace_hardirqs_off+0xb/0x10
 [<c1052c50>] ? check_usage_forwards+0x0/0xf0
 [<c1050c24>] __lock_acquire+0x214/0xa70
 [<c10438c5>] ? sched_clock_cpu+0x95/0x110
 [<c10514fa>] lock_acquire+0x7a/0xa0
 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
 [<c134f03f>] mutex_lock_nested+0x5f/0x2b0
 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50
 [<c11118c8>] reiserfs_write_lock_once+0x28/0x50
 [<c10f05b0>] reiserfs_delete_inode+0x50/0x140
 [<c10a653f>] ? generic_delete_inode+0x5f/0x150
 [<c10f0560>] ? reiserfs_delete_inode+0x0/0x140
 [<c10a657c>] generic_delete_inode+0x9c/0x150
 [<c10a666d>] generic_drop_inode+0x3d/0x60
 [<c10a5597>] iput+0x47/0x50
 [<c10a2a4f>] dentry_iput+0x6f/0xf0
 [<c10a2af4>] d_kill+0x24/0x50
 [<c10a2d3d>] __shrink_dcache_sb+0x21d/0x2b0
 [<c10a2f0f>] shrink_dcache_memory+0x12f/0x1a0
 [<c1074c9e>] shrink_slab+0x10e/0x170
 [<c1075177>] kswapd+0x477/0x6a0
 [<c1072d10>] ? isolate_pages_global+0x0/0x1b0
 [<c103e160>] ? autoremove_wake_function+0x0/0x40
 [<c1074d00>] ? kswapd+0x0/0x6a0
 [<c103de6c>] kthread+0x6c/0x80
 [<c103de00>] ? kthread+0x0/0x80
 [<c100303a>] kernel_thread_helper+0x6/0x1c

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Chris Mason <chris.mason@oracle.com>
2010-01-28 13:43:50 +01:00
Linus Torvalds
82062e7b50 Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  reiserfs: Relax reiserfs_xattr_set_handle() while acquiring xattr locks
  reiserfs: Fix unreachable statement
  reiserfs: Don't call reiserfs_get_acl() with the reiserfs lock
  reiserfs: Relax lock on xattr removing
  reiserfs: Relax the lock before truncating pages
  reiserfs: Fix recursive lock on lchown
  reiserfs: Fix mistake in down_write() conversion
2010-01-08 14:03:55 -08:00
Frederic Weisbecker
31370f62ba reiserfs: Relax reiserfs_xattr_set_handle() while acquiring xattr locks
Fix remaining xattr locks acquired in reiserfs_xattr_set_handle()
while we are holding the reiserfs lock to avoid lock inversions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-07 16:02:53 +01:00
Jiri Slaby
e0baec1b63 reiserfs: Fix unreachable statement
Stanse found an unreachable statement in reiserfs_ioctl. There is a
if followed by error assignment and `break' with no braces. Add the
braces so that we don't break every time, but only in error case,
so that REISERFS_IOC_SETVERSION actually works when it returns no
error.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Reiserfs <reiserfs-devel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-07 14:03:18 +01:00
Frederic Weisbecker
6c28705418 reiserfs: Don't call reiserfs_get_acl() with the reiserfs lock
reiserfs_get_acl is usually not called under the reiserfs lock,
as it doesn't need it. But it happens when it is called by
reiserfs_acl_chmod(), which creates a dependency inversion against
the private xattr inodes mutexes for the given inode.

We need to call it without the reiserfs lock, especially since
it's unnecessary.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-07 13:46:48 +01:00
Frederic Weisbecker
4f3be1b5a9 reiserfs: Relax lock on xattr removing
When we remove an xattr, we call lookup_and_delete_xattr()
that takes some private xattr inodes mutexes. But we hold
the reiserfs lock at this time, which leads to dependency
inversions.

We can safely call lookup_and_delete_xattr() without the
reiserfs lock, where xattr inodes lookups only need the
xattr inodes mutexes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-05 08:00:50 +01:00
Frederic Weisbecker
108d3943c0 reiserfs: Relax the lock before truncating pages
While truncating a file, reiserfs_setattr() calls inode_setattr()
that will truncate the mapping for the given inode, but for that
it needs the pages locks.

In order to release these, the owners need the reiserfs lock to
complete their jobs. But they can't, as we don't release it before
calling inode_setattr().

We need to do that to fix the following softlockups:

INFO: task flush-8:0:2149 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0     D f51af998     0  2149      2 0x00000000
 f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000
 f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40
 00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246
Call Trace:
 [<c1462604>] ? schedule+0x434/0xb20
 [<c102c55b>] ? resched_task+0x4b/0x70
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c146414d>] ? mutex_lock_nested+0x1fd/0x350
 [<c14640b9>] mutex_lock_nested+0x169/0x350
 [<c1178cde>] ? reiserfs_write_lock+0x2e/0x40
 [<c1178cde>] reiserfs_write_lock+0x2e/0x40
 [<c11719a2>] do_journal_end+0xc2/0xe70
 [<c1172912>] journal_end+0xb2/0x120
 [<c11686b3>] ? pathrelse+0x33/0xb0
 [<c11729e4>] reiserfs_end_persistent_transaction+0x64/0x70
 [<c1153caa>] reiserfs_get_block+0x12ba/0x15f0
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c1154b24>] reiserfs_writepage+0xa74/0xe80
 [<c1465a27>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c11f3d25>] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0
 [<c10b5377>] ? find_get_pages_tag+0x127/0x1a0
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c106fcd4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10bc1e0>] __writepage+0x10/0x40
 [<c10bc9ab>] write_cache_pages+0x16b/0x320
 [<c10bc1d0>] ? __writepage+0x0/0x40
 [<c10bcb88>] generic_writepages+0x28/0x40
 [<c10bcbd5>] do_writepages+0x35/0x40
 [<c11059f7>] writeback_single_inode+0xc7/0x330
 [<c11067b2>] writeback_inodes_wb+0x2c2/0x490
 [<c1106a86>] wb_writeback+0x106/0x1b0
 [<c1106cf6>] wb_do_writeback+0x106/0x1e0
 [<c1106c18>] ? wb_do_writeback+0x28/0x1e0
 [<c1106e0a>] bdi_writeback_task+0x3a/0xb0
 [<c10cbb13>] bdi_start_fn+0x63/0xc0
 [<c10cbab0>] ? bdi_start_fn+0x0/0xc0
 [<c105d1f4>] kthread+0x74/0x80
 [<c105d180>] ? kthread+0x0/0x80
 [<c100327a>] kernel_thread_helper+0x6/0x10
3 locks held by flush-8:0/2149:
 #0:  (&type->s_umount_key#30){+++++.}, at: [<c110676f>] writeback_inodes_wb+0x27f/0x490
 #1:  (&journal->j_mutex){+.+...}, at: [<c117199a>] do_journal_end+0xba/0xe70
 #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178cde>] reiserfs_write_lock+0x2e/0x40
INFO: task fstest:3813 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fstest        D 00000002     0  3813   3812 0x00000000
 f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000
 00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40
 00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c
Call Trace:
 [<c10bb005>] ? free_hot_cold_page+0x1d5/0x280
 [<c1462d64>] io_schedule+0x74/0xc0
 [<c10b5a45>] sync_page+0x35/0x60
 [<c146325a>] __wait_on_bit_lock+0x4a/0x90
 [<c10b5a10>] ? sync_page+0x0/0x60
 [<c10b59e5>] __lock_page+0x85/0x90
 [<c105d660>] ? wake_bit_function+0x0/0x60
 [<c10bf654>] truncate_inode_pages_range+0x1e4/0x2d0
 [<c10bf75f>] truncate_inode_pages+0x1f/0x30
 [<c10bf7cf>] truncate_pagecache+0x5f/0xa0
 [<c10bf86a>] vmtruncate+0x5a/0x70
 [<c10fdb7d>] inode_setattr+0x5d/0x190
 [<c1150117>] reiserfs_setattr+0x1f7/0x2f0
 [<c1464569>] ? down_write+0x49/0x70
 [<c10fde01>] notify_change+0x151/0x330
 [<c10e6f3d>] do_truncate+0x6d/0xa0
 [<c10f4ce2>] do_filp_open+0x9a2/0xcf0
 [<c1465aec>] ? _raw_spin_unlock+0x2c/0x50
 [<c10fec50>] ? alloc_fd+0xe0/0x100
 [<c10e602d>] do_sys_open+0x6d/0x130
 [<c1002cfb>] ? sysenter_exit+0xf/0x16
 [<c10e615e>] sys_open+0x2e/0x40
 [<c1002ccc>] sysenter_do_call+0x12/0x32
3 locks held by fstest/3813:
 #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<c10e6f33>] do_truncate+0x63/0xa0
 #1:  (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [<c10fdf07>] notify_change+0x257/0x330
 #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178c8e>] reiserfs_write_lock_once+0x2e/0x50

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-05 08:00:29 +01:00
Frederic Weisbecker
5fe1533fda reiserfs: Fix recursive lock on lchown
On chown, reiserfs will call reiserfs_setattr() to change the owner
of the given inode, but it may also recursively call
reiserfs_setattr() to propagate the owner change to the private xattr
files for this inode.

Hence, the reiserfs lock may be acquired twice which is not wanted
as reiserfs_setattr() calls journal_begin() that is going to try to
relax the lock in order to safely acquire the journal mutex.

Using reiserfs_write_lock_once() from reiserfs_setattr() solves
the problem.

This fixes the following warning, that precedes a lockdep report.

WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50()
Hardware name: MS-7418
Unwanted recursive reiserfs lock!
Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195
Call Trace:
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c103f7ac>] warn_slowpath_common+0x6c/0xc0
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c103f84b>] warn_slowpath_fmt+0x2b/0x30
 [<c1178bff>] reiserfs_lock_check_recursive+0x3f/0x50
 [<c1172ae3>] do_journal_begin_r+0x83/0x350
 [<c1172f2d>] journal_begin+0x7d/0x140
 [<c106509a>] ? in_group_p+0x2a/0x30
 [<c10fda71>] ? inode_change_ok+0x91/0x140
 [<c115007d>] reiserfs_setattr+0x15d/0x2e0
 [<c10f9bf3>] ? dput+0xe3/0x140
 [<c1465adc>] ? _raw_spin_unlock+0x2c/0x50
 [<c117831d>] chown_one_xattr+0xd/0x10
 [<c11780a3>] reiserfs_for_each_xattr+0x113/0x2c0
 [<c1178310>] ? chown_one_xattr+0x0/0x10
 [<c14641e9>] ? mutex_lock_nested+0x2a9/0x350
 [<c117826f>] reiserfs_chown_xattrs+0x1f/0x60
 [<c106509a>] ? in_group_p+0x2a/0x30
 [<c10fda71>] ? inode_change_ok+0x91/0x140
 [<c1150046>] reiserfs_setattr+0x126/0x2e0
 [<c1177c20>] ? reiserfs_getxattr+0x0/0x90
 [<c11b0d57>] ? cap_inode_need_killpriv+0x37/0x50
 [<c10fde01>] notify_change+0x151/0x330
 [<c10e659f>] chown_common+0x6f/0x90
 [<c10e67bd>] sys_lchown+0x6d/0x80
 [<c1002ccc>] sysenter_do_call+0x12/0x32
---[ end trace 7c2b77224c1442fc ]---

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-05 07:59:38 +01:00
Frederic Weisbecker
f3e22f48f3 reiserfs: Fix mistake in down_write() conversion
Fix a mistake in commit 0719d34347
(reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion)
that has converted a down_write() into a down_read() accidentally.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-03 03:44:53 +01:00
Linus Torvalds
45d28b0972 Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  reiserfs: Safely acquire i_mutex from xattr_rmdir
  reiserfs: Safely acquire i_mutex from reiserfs_for_each_xattr
  reiserfs: Fix journal mutex <-> inode mutex lock inversion
  reiserfs: Fix unwanted recursive reiserfs lock in reiserfs_unlink()
  reiserfs: Relax lock before open xattr dir in reiserfs_xattr_set_handle()
  reiserfs: Relax reiserfs lock while freeing the journal
  reiserfs: Fix reiserfs lock <-> i_mutex dependency inversion on xattr
  reiserfs: Warn on lock relax if taken recursively
  reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion
  reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion
  reiserfs: Fix reiserfs lock <-> inode mutex dependency inversion
  reiserfs: Fix reiserfs lock and journal lock inversion dependency
  reiserfs: Fix possible recursive lock
2010-01-02 11:17:05 -08:00
Frederic Weisbecker
835d5247d9 reiserfs: Safely acquire i_mutex from xattr_rmdir
Relax the reiserfs lock before taking the inode mutex from
xattr_rmdir() to avoid the usual reiserfs lock <-> inode mutex
bad dependency.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:59:48 +01:00
Frederic Weisbecker
8b513f56d4 reiserfs: Safely acquire i_mutex from reiserfs_for_each_xattr
Relax the reiserfs lock before taking the inode mutex from
reiserfs_for_each_xattr() to avoid the usual bad dependencies:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-atom #179
-------------------------------------------------------
rm/3242 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c11428ef>] reiserfs_for_each_xattr+0x23f/0x290

but task is already holding lock:
 (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143389>] reiserfs_write_lock+0x29/0x40

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401aab>] mutex_lock_nested+0x5b/0x340
       [<c1143339>] reiserfs_write_lock_once+0x29/0x50
       [<c1117022>] reiserfs_lookup+0x62/0x140
       [<c10bd85f>] __lookup_hash+0xef/0x110
       [<c10bf21d>] lookup_one_len+0x8d/0xc0
       [<c1141e3a>] open_xa_dir+0xea/0x1b0
       [<c1142720>] reiserfs_for_each_xattr+0x70/0x290
       [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60
       [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150
       [<c10c9c32>] generic_delete_inode+0xa2/0x170
       [<c10c9d4f>] generic_drop_inode+0x4f/0x70
       [<c10c8b07>] iput+0x47/0x50
       [<c10c0965>] do_unlinkat+0xd5/0x160
       [<c10c0b13>] sys_unlinkat+0x23/0x40
       [<c1002ec4>] sysenter_do_call+0x12/0x32

-> #0 (&sb->s_type->i_mutex_key#4/3){+.+.+.}:
       [<c105f176>] __lock_acquire+0x18f6/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401aab>] mutex_lock_nested+0x5b/0x340
       [<c11428ef>] reiserfs_for_each_xattr+0x23f/0x290
       [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60
       [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150
       [<c10c9c32>] generic_delete_inode+0xa2/0x170
       [<c10c9d4f>] generic_drop_inode+0x4f/0x70
       [<c10c8b07>] iput+0x47/0x50
       [<c10c0965>] do_unlinkat+0xd5/0x160
       [<c10c0b13>] sys_unlinkat+0x23/0x40
       [<c1002ec4>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

1 lock held by rm/3242:
 #0:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143389>] reiserfs_write_lock+0x29/0x40

stack backtrace:
Pid: 3242, comm: rm Not tainted 2.6.32-atom #179
Call Trace:
 [<c13ffa13>] ? printk+0x18/0x1a
 [<c105d33a>] print_circular_bug+0xca/0xd0
 [<c105f176>] __lock_acquire+0x18f6/0x19e0
 [<c105c932>] ? mark_held_locks+0x62/0x80
 [<c105cc3b>] ? trace_hardirqs_on+0xb/0x10
 [<c1401098>] ? mutex_unlock+0x8/0x10
 [<c105f2c8>] lock_acquire+0x68/0x90
 [<c11428ef>] ? reiserfs_for_each_xattr+0x23f/0x290
 [<c11428ef>] ? reiserfs_for_each_xattr+0x23f/0x290
 [<c1401aab>] mutex_lock_nested+0x5b/0x340
 [<c11428ef>] ? reiserfs_for_each_xattr+0x23f/0x290
 [<c11428ef>] reiserfs_for_each_xattr+0x23f/0x290
 [<c1143180>] ? delete_one_xattr+0x0/0x100
 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60
 [<c1143339>] ? reiserfs_write_lock_once+0x29/0x50
 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150
 [<c11b0d4f>] ? _atomic_dec_and_lock+0x4f/0x70
 [<c111e990>] ? reiserfs_delete_inode+0x0/0x150
 [<c10c9c32>] generic_delete_inode+0xa2/0x170
 [<c10c9d4f>] generic_drop_inode+0x4f/0x70
 [<c10c8b07>] iput+0x47/0x50
 [<c10c0965>] do_unlinkat+0xd5/0x160
 [<c1401098>] ? mutex_unlock+0x8/0x10
 [<c10c3e0d>] ? vfs_readdir+0x7d/0xb0
 [<c10c3af0>] ? filldir64+0x0/0xf0
 [<c1002ef3>] ? sysenter_exit+0xf/0x16
 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10c0b13>] sys_unlinkat+0x23/0x40
 [<c1002ec4>] sysenter_do_call+0x12/0x32

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:59:14 +01:00
Frederic Weisbecker
4dd859697f reiserfs: Fix journal mutex <-> inode mutex lock inversion
We need to relax the reiserfs lock before locking the inode mutex
from xattr_unlink(), otherwise we'll face the usual bad dependencies:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-atom #178
-------------------------------------------------------
rm/3202 is trying to acquire lock:
 (&journal->j_mutex){+.+...}, at: [<c113c234>] do_journal_begin_r+0x94/0x360

but task is already holding lock:
 (&sb->s_type->i_mutex_key#4/2){+.+...}, at: [<c1142a67>] xattr_unlink+0x57/0xb0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sb->s_type->i_mutex_key#4/2){+.+...}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a7b>] mutex_lock_nested+0x5b/0x340
       [<c1142a67>] xattr_unlink+0x57/0xb0
       [<c1143179>] delete_one_xattr+0x29/0x100
       [<c11427bb>] reiserfs_for_each_xattr+0x10b/0x290
       [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60
       [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150
       [<c10c9c32>] generic_delete_inode+0xa2/0x170
       [<c10c9d4f>] generic_drop_inode+0x4f/0x70
       [<c10c8b07>] iput+0x47/0x50
       [<c10c0965>] do_unlinkat+0xd5/0x160
       [<c10c0b13>] sys_unlinkat+0x23/0x40
       [<c1002ec4>] sysenter_do_call+0x12/0x32

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a7b>] mutex_lock_nested+0x5b/0x340
       [<c1143359>] reiserfs_write_lock+0x29/0x40
       [<c113c23c>] do_journal_begin_r+0x9c/0x360
       [<c113c680>] journal_begin+0x80/0x130
       [<c1127363>] reiserfs_remount+0x223/0x4e0
       [<c10b6dd6>] do_remount_sb+0xa6/0x140
       [<c10ce6a0>] do_mount+0x560/0x750
       [<c10ce914>] sys_mount+0x84/0xb0
       [<c1002ec4>] sysenter_do_call+0x12/0x32

-> #0 (&journal->j_mutex){+.+...}:
       [<c105f176>] __lock_acquire+0x18f6/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a7b>] mutex_lock_nested+0x5b/0x340
       [<c113c234>] do_journal_begin_r+0x94/0x360
       [<c113c680>] journal_begin+0x80/0x130
       [<c1116d63>] reiserfs_unlink+0x83/0x2e0
       [<c1142a74>] xattr_unlink+0x64/0xb0
       [<c1143179>] delete_one_xattr+0x29/0x100
       [<c11427bb>] reiserfs_for_each_xattr+0x10b/0x290
       [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60
       [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150
       [<c10c9c32>] generic_delete_inode+0xa2/0x170
       [<c10c9d4f>] generic_drop_inode+0x4f/0x70
       [<c10c8b07>] iput+0x47/0x50
       [<c10c0965>] do_unlinkat+0xd5/0x160
       [<c10c0b13>] sys_unlinkat+0x23/0x40
       [<c1002ec4>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

2 locks held by rm/3202:
 #0:  (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c114274b>] reiserfs_for_each_xattr+0x9b/0x290
 #1:  (&sb->s_type->i_mutex_key#4/2){+.+...}, at: [<c1142a67>] xattr_unlink+0x57/0xb0

stack backtrace:
Pid: 3202, comm: rm Not tainted 2.6.32-atom #178
Call Trace:
 [<c13ff9e3>] ? printk+0x18/0x1a
 [<c105d33a>] print_circular_bug+0xca/0xd0
 [<c105f176>] __lock_acquire+0x18f6/0x19e0
 [<c1142a67>] ? xattr_unlink+0x57/0xb0
 [<c105f2c8>] lock_acquire+0x68/0x90
 [<c113c234>] ? do_journal_begin_r+0x94/0x360
 [<c113c234>] ? do_journal_begin_r+0x94/0x360
 [<c1401a7b>] mutex_lock_nested+0x5b/0x340
 [<c113c234>] ? do_journal_begin_r+0x94/0x360
 [<c113c234>] do_journal_begin_r+0x94/0x360
 [<c10411b6>] ? run_timer_softirq+0x1a6/0x220
 [<c103cb00>] ? __do_softirq+0x50/0x140
 [<c113c680>] journal_begin+0x80/0x130
 [<c103cba2>] ? __do_softirq+0xf2/0x140
 [<c104f72f>] ? hrtimer_interrupt+0xdf/0x220
 [<c1116d63>] reiserfs_unlink+0x83/0x2e0
 [<c105c932>] ? mark_held_locks+0x62/0x80
 [<c11b8d08>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1002fd8>] ? restore_all_notrace+0x0/0x18
 [<c1142a67>] ? xattr_unlink+0x57/0xb0
 [<c1142a74>] xattr_unlink+0x64/0xb0
 [<c1143179>] delete_one_xattr+0x29/0x100
 [<c11427bb>] reiserfs_for_each_xattr+0x10b/0x290
 [<c1143150>] ? delete_one_xattr+0x0/0x100
 [<c1401cb9>] ? mutex_lock_nested+0x299/0x340
 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60
 [<c1143309>] ? reiserfs_write_lock_once+0x29/0x50
 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150
 [<c11b0d1f>] ? _atomic_dec_and_lock+0x4f/0x70
 [<c111e990>] ? reiserfs_delete_inode+0x0/0x150
 [<c10c9c32>] generic_delete_inode+0xa2/0x170
 [<c10c9d4f>] generic_drop_inode+0x4f/0x70
 [<c10c8b07>] iput+0x47/0x50
 [<c10c0965>] do_unlinkat+0xd5/0x160
 [<c1401068>] ? mutex_unlock+0x8/0x10
 [<c10c3e0d>] ? vfs_readdir+0x7d/0xb0
 [<c10c3af0>] ? filldir64+0x0/0xf0
 [<c1002ef3>] ? sysenter_exit+0xf/0x16
 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10c0b13>] sys_unlinkat+0x23/0x40
 [<c1002ec4>] sysenter_do_call+0x12/0x32

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:58:32 +01:00
Frederic Weisbecker
c674905ca7 reiserfs: Fix unwanted recursive reiserfs lock in reiserfs_unlink()
reiserfs_unlink() may or may not be called under the reiserfs
lock.
But it also takes the reiserfs lock and can then acquire it
recursively which leads to do_journal_begin_r() that fails to
relax the reiserfs lock before grabbing the journal mutex,
creating an unexpected lock inversion.

We need to ensure reiserfs_unlink() won't get the reiserfs lock
recursively using reiserfs_write_lock_once().

This fixes the following warning that precedes a lock inversion
report (reiserfs lock <-> journal mutex).

------------[ cut here ]------------
WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3a/0x50()
Hardware name: MS-7418
Unwanted recursive reiserfs lock!
Pid: 3208, comm: dbench Not tainted 2.6.32-atom #177
Call Trace:
 [<c114327a>] ? reiserfs_lock_check_recursive+0x3a/0x50
 [<c114327a>] ? reiserfs_lock_check_recursive+0x3a/0x50
 [<c10373a7>] warn_slowpath_common+0x67/0xc0
 [<c114327a>] ? reiserfs_lock_check_recursive+0x3a/0x50
 [<c1037446>] warn_slowpath_fmt+0x26/0x30
 [<c114327a>] reiserfs_lock_check_recursive+0x3a/0x50
 [<c113c213>] do_journal_begin_r+0x83/0x360
 [<c105eb16>] ? __lock_acquire+0x1296/0x19e0
 [<c1142a57>] ? xattr_unlink+0x57/0xb0
 [<c113c670>] journal_begin+0x80/0x130
 [<c1116d5d>] reiserfs_unlink+0x7d/0x2d0
 [<c1142a57>] ? xattr_unlink+0x57/0xb0
 [<c1142a57>] ? xattr_unlink+0x57/0xb0
 [<c1142a57>] ? xattr_unlink+0x57/0xb0
 [<c1142a64>] xattr_unlink+0x64/0xb0
 [<c1143169>] delete_one_xattr+0x29/0x100
 [<c11427ab>] reiserfs_for_each_xattr+0x10b/0x290
 [<c1143140>] ? delete_one_xattr+0x0/0x100
 [<c1401ca9>] ? mutex_lock_nested+0x299/0x340
 [<c11429aa>] reiserfs_delete_xattrs+0x1a/0x60
 [<c11432f9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c111ea1f>] reiserfs_delete_inode+0x9f/0x150
 [<c11b0d0f>] ? _atomic_dec_and_lock+0x4f/0x70
 [<c111e980>] ? reiserfs_delete_inode+0x0/0x150
 [<c10c9c32>] generic_delete_inode+0xa2/0x170
 [<c10c9d4f>] generic_drop_inode+0x4f/0x70
 [<c10c8b07>] iput+0x47/0x50
 [<c10c0965>] do_unlinkat+0xd5/0x160
 [<c10505c6>] ? up_read+0x16/0x30
 [<c1022ab7>] ? do_page_fault+0x187/0x330
 [<c1002fd8>] ? restore_all_notrace+0x0/0x18
 [<c1022930>] ? do_page_fault+0x0/0x330
 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10c0a00>] sys_unlink+0x10/0x20
 [<c1002ec4>] sysenter_do_call+0x12/0x32
---[ end trace 2e35d71a6cc69d0c ]---

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:57:32 +01:00
Frederic Weisbecker
3f14fea6bb reiserfs: Relax lock before open xattr dir in reiserfs_xattr_set_handle()
We call xattr_lookup() from reiserfs_xattr_get(). We then hold
the reiserfs lock when we grab the i_mutex. But later, we may
relax the reiserfs lock, creating dependency inversion between
both locks.

The lookups and creation jobs ar already protected by the
inode mutex, so we can safely relax the reiserfs lock, dropping
the unwanted reiserfs lock -> i_mutex dependency, as shown
in the following lockdep report:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-atom #173
-------------------------------------------------------
cp/3204 is trying to acquire lock:
 (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11432b9>] reiserfs_write_lock_once+0x29/0x50

but task is already holding lock:
 (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#4/3){+.+.+.}:
       [<c105ea7f>] __lock_acquire+0x11ff/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a2b>] mutex_lock_nested+0x5b/0x340
       [<c1141d83>] open_xa_dir+0x43/0x1b0
       [<c1142722>] reiserfs_for_each_xattr+0x62/0x260
       [<c114299a>] reiserfs_delete_xattrs+0x1a/0x60
       [<c111ea1f>] reiserfs_delete_inode+0x9f/0x150
       [<c10c9c32>] generic_delete_inode+0xa2/0x170
       [<c10c9d4f>] generic_drop_inode+0x4f/0x70
       [<c10c8b07>] iput+0x47/0x50
       [<c10c0965>] do_unlinkat+0xd5/0x160
       [<c10c0a00>] sys_unlink+0x10/0x20
       [<c1002ec4>] sysenter_do_call+0x12/0x32

-> #0 (&REISERFS_SB(s)->lock){+.+.+.}:
       [<c105f176>] __lock_acquire+0x18f6/0x19e0
       [<c105f2c8>] lock_acquire+0x68/0x90
       [<c1401a2b>] mutex_lock_nested+0x5b/0x340
       [<c11432b9>] reiserfs_write_lock_once+0x29/0x50
       [<c1117012>] reiserfs_lookup+0x62/0x140
       [<c10bd85f>] __lookup_hash+0xef/0x110
       [<c10bf21d>] lookup_one_len+0x8d/0xc0
       [<c1141e2a>] open_xa_dir+0xea/0x1b0
       [<c1141fe5>] xattr_lookup+0x15/0x160
       [<c1142476>] reiserfs_xattr_get+0x56/0x2a0
       [<c1144042>] reiserfs_get_acl+0xa2/0x360
       [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160
       [<c111789c>] reiserfs_mkdir+0x6c/0x2c0
       [<c10bea96>] vfs_mkdir+0xd6/0x180
       [<c10c0c10>] sys_mkdirat+0xc0/0xd0
       [<c10c0c40>] sys_mkdir+0x20/0x30
       [<c1002ec4>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

2 locks held by cp/3204:
 #0:  (&sb->s_type->i_mutex_key#4/1){+.+.+.}, at: [<c10bd8d6>] lookup_create+0x26/0xa0
 #1:  (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0

stack backtrace:
Pid: 3204, comm: cp Not tainted 2.6.32-atom #173
Call Trace:
 [<c13ff993>] ? printk+0x18/0x1a
 [<c105d33a>] print_circular_bug+0xca/0xd0
 [<c105f176>] __lock_acquire+0x18f6/0x19e0
 [<c105d3aa>] ? check_usage+0x6a/0x460
 [<c105f2c8>] lock_acquire+0x68/0x90
 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c1401a2b>] mutex_lock_nested+0x5b/0x340
 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50
 [<c11432b9>] reiserfs_write_lock_once+0x29/0x50
 [<c1117012>] reiserfs_lookup+0x62/0x140
 [<c105ccca>] ? debug_check_no_locks_freed+0x8a/0x140
 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10bd85f>] __lookup_hash+0xef/0x110
 [<c10bf21d>] lookup_one_len+0x8d/0xc0
 [<c1141e2a>] open_xa_dir+0xea/0x1b0
 [<c1141fe5>] xattr_lookup+0x15/0x160
 [<c1142476>] reiserfs_xattr_get+0x56/0x2a0
 [<c1144042>] reiserfs_get_acl+0xa2/0x360
 [<c10ca2e7>] ? new_inode+0x27/0xa0
 [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160
 [<c1402eb7>] ? _spin_unlock+0x27/0x40
 [<c111789c>] reiserfs_mkdir+0x6c/0x2c0
 [<c10c7cb8>] ? __d_lookup+0x108/0x190
 [<c105c932>] ? mark_held_locks+0x62/0x80
 [<c1401c8d>] ? mutex_lock_nested+0x2bd/0x340
 [<c10bd17a>] ? generic_permission+0x1a/0xa0
 [<c11788fe>] ? security_inode_permission+0x1e/0x20
 [<c10bea96>] vfs_mkdir+0xd6/0x180
 [<c10c0c10>] sys_mkdirat+0xc0/0xd0
 [<c10505c6>] ? up_read+0x16/0x30
 [<c1002fd8>] ? restore_all_notrace+0x0/0x18
 [<c10c0c40>] sys_mkdir+0x20/0x30
 [<c1002ec4>] sysenter_do_call+0x12/0x32

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-01-02 01:57:01 +01:00