1
linux/fs
Andi Kleen 69b4573296 Cache xattr security drop check for write v2
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.

Why xattr lookup on every write I hear you ask?

write wants to drop suid and security related xattrs that could set o
capabilities for executables.  To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.

In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.

Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.

This patch adds a simple per inode to avoid this problem.  We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.

I also used the same flag to avoid the suid check, although
that one is pretty cheap.

A file system can also set this flag when it creates the inode,
if it has a cheap way to do so.  This is done for some common file systems
in followon patches.

With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.

v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-28 12:02:09 -04:00
..
9p 9p: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
adfs Fix common misspellings 2011-03-31 11:26:23 -03:00
affs affs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
afs afs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
autofs4 vfs: push dentry_unhash on rmdir into file systems 2011-05-26 07:26:47 -04:00
befs Fix common misspellings 2011-03-31 11:26:23 -03:00
bfs bfs: remove unnecessary dentry_unhash on dir rename 2011-05-28 01:02:50 -04:00
btrfs fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
cachefiles Fix common misspellings 2011-03-31 11:26:23 -03:00
ceph ceph: remove unnecessary dentry_unhash calls 2011-05-26 07:26:53 -04:00
cifs cifs: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:59 -04:00
coda coda: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
configfs configfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
cramfs
debugfs debugfs: move to new strtobool 2011-05-19 16:55:28 +09:30
devpts fs/devpts/inode.c: correctly check d_alloc_name() return code in devpts_pty_new() 2011-03-22 17:44:17 -07:00
dlm Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-05-26 13:19:00 -07:00
ecryptfs ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
efs block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
exofs exofs: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:57 -04:00
exportfs vfs: Add open by file handle support 2011-03-15 02:21:44 -04:00
ext2 ext2: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:56 -04:00
ext3 fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
ext4 fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
fat fat: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
freevxfs treewide: fix a few typos in comments 2011-05-10 10:16:21 +02:00
fscache fscache: remove dead code under CONFIG_WORKQUEUE_DEBUGFS 2011-05-25 08:39:44 -07:00
fuse fuse: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
gfs2 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-05-26 13:19:00 -07:00
hfs hfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hfsplus hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hostfs hostfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hpfs hpfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
hppfs
hugetlbfs mm: don't access vm_flags as 'int' 2011-05-26 09:20:31 -07:00
isofs Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
jbd jbd: Fix comment to match the code in journal_start() 2011-05-24 00:27:53 +02:00
jbd2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2011-05-26 09:53:20 -07:00
jffs2 jffs2: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:50 -04:00
jfs jfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
lockd
logfs logfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
minix minix: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
ncpfs ncpfs: fix rename over directory with dangling references 2011-05-28 01:02:53 -04:00
nfs nfs: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:57 -04:00
nfs_common Fix common misspellings 2011-03-31 11:26:23 -03:00
nfsd treewide: fix a few typos in comments 2011-05-10 10:16:21 +02:00
nilfs2 nilfs2: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
nls
notify Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
ntfs Fix common misspellings 2011-03-31 11:26:23 -03:00
ocfs2 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 2011-05-26 10:55:15 -07:00
omfs omfs: remove unnecessary dentry_unhash on rmdir, dir rneame 2011-05-28 01:02:52 -04:00
openpromfs
partitions fs/partitions/efi.c: corrupted GUID partition tables can cause kernel oops 2011-05-26 17:12:37 -07:00
proc fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages 2011-05-26 17:12:37 -07:00
pstore pstore: fix pstore filesystem mount/remount issue 2011-05-16 11:05:00 -07:00
qnx4 block: remove per-queue plugging 2011-03-10 08:52:07 +01:00
quota vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
ramfs ramfs: fix memleak on no-mmu arch 2011-04-14 16:06:56 -07:00
reiserfs reiserfs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
romfs
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2011-05-26 17:27:35 -07:00
sysfs sysfs: remove "last sysfs file:" line from the oops messages 2011-05-13 16:05:51 -07:00
sysv sysv: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:50 -04:00
ubifs ubifs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
udf udf: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:52 -04:00
ufs ufs: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:51 -04:00
xfs fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
aio.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
anon_inodes.c
attr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c brk: COMPAT_BRK: fix detection of randomized brk 2011-04-14 16:06:55 -07:00
binfmt_em86.c
binfmt_flat.c CRED: Fix load_flat_shared_library() to initialise bprm correctly 2011-05-03 10:10:51 +10:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Require subsystems to explicitly allocate bio_set integrity mempool 2011-03-17 11:11:05 +01:00
bio.c vfs: Improve the bio_add_page() and bio_add_pc_page() descriptions 2011-05-27 09:43:00 -04:00
block_dev.c block: move bd_set_size() above rescan_partitions() in __blkdev_get() 2011-05-23 08:50:48 -07:00
buffer.c fs: block_page_mkwrite should wait for writeback to finish 2011-05-28 01:03:21 -04:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c exec: unify do_execve/compat_do_execve code 2011-04-09 15:53:56 +02:00
dcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
dcookies.c
direct-io.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
drop_caches.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
eventfd.c
eventpoll.c Fix common misspellings 2011-03-31 11:26:23 -03:00
exec.c coredump: add support for exe_file in core name 2011-05-26 17:12:36 -07:00
fcntl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
fhandle.c fs/fhandle.c: add <linux/personality.h> for ia64 2011-04-14 16:06:56 -07:00
fifo.c Filesystem: fifo: Fixed coding style issue. 2011-03-21 00:16:09 -04:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-03-16 13:26:17 -07:00
file.c vfs: avoid large kmalloc()s for the fdtable 2011-04-28 11:28:20 -07:00
filesystems.c fs: synchronize_rcu when unregister_filesystem success not failure 2011-04-17 10:42:01 -07:00
fs_struct.c
fs-writeback.c fs: pass exact type of data dirties to ->dirty_inode 2011-05-27 07:04:40 -04:00
generic_acl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
inode.c fs: cosmetic inode.c cleanups 2011-05-27 09:43:00 -04:00
internal.h fs: move i_wb_list out from under inode_lock 2011-03-24 21:17:51 -04:00
ioctl.c vfs: cleanup do_vfs_ioctl() 2011-03-21 00:16:08 -04:00
ioprio.c
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-26 09:52:14 -07:00
Kconfig.binfmt
libfs.c libfs: drop unneeded dentry_unhash 2011-05-26 07:26:50 -04:00
locks.c Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux 2011-03-24 08:20:39 -07:00
Makefile Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2011-03-16 19:01:29 -07:00
mbcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
mpage.c mm/fs: add hooks to support cleancache 2011-05-26 10:01:43 -06:00
namei.c Lift the check for automount points into do_lookup() 2011-05-27 07:03:15 -04:00
namespace.c fs/namespace.c: bound mount propagation fix 2011-05-26 07:26:44 -04:00
nfsctl.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
no-block.c
open.c fs: Use BUG_ON(!mnt) at dentry_open(). 2011-03-21 01:10:41 -04:00
pipe.c
pnode.c
pnode.h
posix_acl.c
read_write.c
read_write.h
readdir.c
select.c select: remove unused MAX_SELECT_SECONDS 2011-03-21 00:16:08 -04:00
seq_file.c
signalfd.c
splice.c splice: add wakeup_pipe_readers() 2011-05-23 19:58:53 +02:00
stack.c
stat.c readlinkat(), fchownat() and fstatat() with empty relative pathnames 2011-03-15 02:21:45 -04:00
statfs.c clean statfs-like syscalls up 2011-03-14 09:15:28 -04:00
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem 2011-05-26 10:50:56 -07:00
sync.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
timerfd.c timerfd: Manage cancelable timers in timerfd 2011-05-23 13:59:53 +02:00
utimes.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
xattr_acl.c
xattr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00