1
Commit Graph

26294 Commits

Author SHA1 Message Date
Gilad Ben-Yossef
42be35d039 fs: only send IPI to invalidate LRU BH when needed
In several code paths, such as when unmounting a file system (but not
only) we send an IPI to ask each cpu to invalidate its local LRU BHs.

For multi-cores systems that have many cpus that may not have any LRU BH
because they are idle or because they have not performed any file system
accesses since last invalidation (e.g.  CPU crunching on high perfomance
computing nodes that write results to shared memory or only using
filesystems that do not use the bh layer.) This can lead to loss of
performance each time someone switches the KVM (the virtual keyboard and
screen type, not the hypervisor) if it has a USB storage stuck in.

This patch attempts to only send an IPI to cpus that have LRU BH.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
Andrea Arcangeli
45f83cefe3 mm: thp: fix up pmd_trans_unstable() locations
pmd_trans_unstable() should be called before pmd_offset_map() in the
locations where the mmap_sem is held for reading.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
KAMEZAWA Hiroyuki
3748b2f15b procfs: fix /proc/statm
bda7bad62b ("procfs: speed up /proc/pid/stat, statm") broke /proc/statm
- 'text' is printed twice by mistake.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: Ulrich Drepper <drepper@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
Linus Torvalds
529b73fc0a trivial writeback fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPad6tAAoJECvKgwp+S8JaAFEP/jMTQmRUh59x8ABwhJjeWkGf
 zv8JyA98aDYauXTrDbYJFsmeCg3UBvhn9Ut2yZhU74EPsmDVyKu5l/gfrVuJqPVr
 cShzJHGxZUJHBa79mrY8Oe7OAHN1ZuaQUgM8b8sQP6OPUipZM/2QPv9nshlqggQq
 WHbcYaOqqvt1+rcRzG/rIxv9XSaarU5uz+nebECY1Pg10bYR0owrisVGzNUt1e8B
 EFH/g8mMOmO37oRDYdXBTrVPlfY3lMBafgYTrhCON6TA2ucTwsZqnlgL5AJNjJAK
 CQD03sK/kAVCC+VV4n4U9f+PnNoEiYrRx2gC9QScRGwRxyAGjfw/9UAEgY9IID59
 WHSZTgd3KkUeSST69FHu+B/Vx5QDbGH9vRhOrL6aDXcitmitLphJ0Pa32bAhPkPI
 qbNORGT+pDmbWjERLPy6YEPB2rkAt4FQrDjtpfOOMjREBrTMkJs+iWBABEXs1AM/
 ppc1LGUotZuzcabxKCwSdLCJX6Gb4A6cl4dKk1ClndIKLSb/Q9+CIjM66Q6TdT29
 Ftp5GRKv+Ga6OFfjku71yatJmYMxGZ9hW7Nnpzby3dbjWKqOE6Zglxrhg7VVb9zI
 I7VTqwKDXzVrkyx1Tu7lJZ2NMVx6Nt2tpTuJ3cq5d/CWd1MrhC948hwjmFh0C1GY
 uRn8KoBXIvcQUwy92seb
 =hjf/
 -----END PGP SIGNATURE-----

Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull trivial writeback fixes from Wu Fengguang:
 "They've been tested in linux-next for 20 days actually."

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Remove outdated comment
  fs: Remove bogus wait in write_inode_now()
2012-03-28 10:07:27 -07:00
Linus Torvalds
69e1aaddd6 Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes
The changes to export dirty_writeback_interval are from Artem's s_dirt
 cleanup patch series.  The same is true of the change to remove the
 s_dirt helper functions which never got used by anyone in-tree.  I've
 run these changes by Al Viro, and am carrying them so that Artem can
 more easily fix up the rest of the file systems during the next merge
 window.  (Originally we had hopped to remove the use of s_dirt from
 ext4 during this merge window, but his patches had some bugs, so I
 ultimately ended dropping them from the ext4 tree.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPb39rAAoJENNvdpvBGATwVz8P/3V1NqSsk20VJOLbmEE45GxL
 GDzQJ6OsFG0UiQk6ISSrSdwxfav/KTCGySsU9UtAoOdPcBwnnsf8S7wc6OggwwuC
 hBFGwwFzk6YSQaZ58sUxWRGeOJuP/FPem6Id6buC4DQ1KIcznP/hEEgEnh/ir4Ec
 vrsfexY93TR8BE2Mi23v2epDVLU0B6bY/w9nDqbTXif3xN/gh/ypoHHouuM6Bs2n
 TyWHOwD15NwfnvRHd8PfDDqQM/D29x3QI0FMrWj9McpwIz4d4cBfhN4LQ/G+yLDY
 izv5DM10GbinwHPrsOTGVAW3KIdSS9rP3jCJGVuOrJZ9ufGXosvHuIYVhI7J3SBK
 JhBu6QEsN1IsvlVYpz9q8mqVKaDXQLsz2eaTw+i4yfmyOk1kOX7nIEOxYFF78G+V
 Of/W1SpIpJQaXvLHRcDj9fDj0fZTciUZA8v7/HOFS+co2dzIl0iZbcfBFp0/56RY
 sWdQoeRlx1ciVDPR+w2TQO5w3VWQw1gT5aqux0NiPj0XFoiUHScxgNGAYbqENMQw
 v9chvyDMlorqj0rF/Vey5SssgEDi7MTdYuYTi4YyMqr7pcvOJaO85pf+wH9g2eKW
 XhW33PhPGuwCJDP5Pg8Y0Z2Hp/Q3DCqhLqhGfTyAs/NG9+hR4wgp3VWb8CUqhA1t
 C/yzNeOYqScAefCzQx2V
 =+9zk
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates for 3.4 from Ted Ts'o:
 "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes

  The changes to export dirty_writeback_interval are from Artem's s_dirt
  cleanup patch series.  The same is true of the change to remove the
  s_dirt helper functions which never got used by anyone in-tree.  I've
  run these changes by Al Viro, and am carrying them so that Artem can
  more easily fix up the rest of the file systems during the next merge
  window.  (Originally we had hopped to remove the use of s_dirt from
  ext4 during this merge window, but his patches had some bugs, so I
  ultimately ended dropping them from the ext4 tree.)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits)
  vfs: remove unused superblock helpers
  mm: export dirty_writeback_interval
  ext4: remove useless s_dirt assignment
  ext4: write superblock only once on unmount
  ext4: do not mark superblock as dirty unnecessarily
  ext4: correct ext4_punch_hole return codes
  ext4: remove restrictive checks for EOFBLOCKS_FL
  ext4: always set then trimmed blocks count into len
  ext4: fix trimmed block count accunting
  ext4: fix start and len arguments handling in ext4_trim_fs()
  ext4: update s_free_{inodes,blocks}_count during online resize
  ext4: change some printk() calls to use ext4_msg() instead
  ext4: avoid output message interleaving in ext4_error_<foo>()
  ext4: remove trailing newlines from ext4_msg() and ext4_error() messages
  ext4: add no_printk argument validation, fix fallout
  ext4: remove redundant "EXT4-fs: " from uses of ext4_msg
  ext4: give more helpful error message in ext4_ext_rm_leaf()
  ext4: remove unused code from ext4_ext_map_blocks()
  ext4: rewrite punch hole to use ext4_ext_remove_space()
  jbd2: cleanup journal tail after transaction commit
  ...
2012-03-28 10:02:55 -07:00
Linus Torvalds
56b59b429b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates for 3.4-rc1 from Sage Weil:
 "Alex has been busy.  There are a range of rbd and libceph cleanups,
  especially surrounding device setup and teardown, and a few critical
  fixes in that code.  There are more cleanups in the messenger code,
  virtual xattrs, a fix for CRC calculation/checks, and lots of other
  miscellaneous stuff.

  There's a patch from Amon Ott to make inos behave a bit better on
  32-bit boxes, some decode check fixes from Xi Wang, and network
  throttling fix from Jim Schutt, and a couple RBD fixes from Josh
  Durgin.

  No new functionality, just a lot of cleanup and bug fixing."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits)
  rbd: move snap_rwsem to the device, rename to header_rwsem
  ceph: fix three bugs, two in ceph_vxattrcb_file_layout()
  libceph: isolate kmap() call in write_partial_msg_pages()
  libceph: rename "page_shift" variable to something sensible
  libceph: get rid of zero_page_address
  libceph: only call kernel_sendpage() via helper
  libceph: use kernel_sendpage() for sending zeroes
  libceph: fix inverted crc option logic
  libceph: some simple changes
  libceph: small refactor in write_partial_kvec()
  libceph: do crc calculations outside loop
  libceph: separate CRC calculation from byte swapping
  libceph: use "do" in CRC-related Boolean variables
  ceph: ensure Boolean options support both senses
  libceph: a few small changes
  libceph: make ceph_tcp_connect() return int
  libceph: encapsulate some messenger cleanup code
  libceph: make ceph_msgr_wq private
  libceph: encapsulate connection kvec operations
  libceph: move prepare_write_banner()
  ...
2012-03-28 10:01:29 -07:00
Linus Torvalds
9a7259d5c8 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3, UDF, and quota fixes from Jan Kara:
 "A couple of ext3 & UDF fixes and also one improvement in quota
  locking."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext3: fix start and len arguments handling in ext3_trim_fs()
  udf: Fix deadlock in udf_release_file()
  udf: Fix file entry logicalBlocksRecorded
  udf: Fix handling of i_blocks
  quota: Make quota code not call tty layer with dqptr_sem held
  udf: Init/maintain file entry checkpoint field
  ext3: Update ctime in ext3_splice_branch() only when needed
  ext3: Don't call dquot_free_block() if we don't update anything
  udf: Remove unnecessary OOM messages
2012-03-28 10:00:14 -07:00
Linus Torvalds
e9c0f1529c 9p changes for the 3.4 merge window
The following changes since commit 3c761ea05a:
 
   Fix autofs compile without CONFIG_COMPAT (2012-02-26 09:44:55 -0800)
 
 are available in the git repository at:
   git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-linus
 
 Jim Garlick (3):
       net/9p: don't allow Tflush to be interrupted
       net/9p: handle flushed Tclunk/Tremove
       9p: statfs should not override server f_type
 
  fs/9p/vfs_super.c |    2 +-
  net/9p/client.c   |   26 +++++++++++++++++++++++---
  2 files changed, 24 insertions(+), 4 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJPcJNvAAoJEGQ1yQUIRaTOeDYP/2JdsCRt8SGstGgdpi9zWdDL
 GjhNguTSqfQSv+LB7mvdM49JFEFkPdZ1/Giu7nkZmYqRR6RdprYuRsiX6lQnAE49
 CFjIyQkeeMhvEvYLX10KG7WtMOicOQGHdLEIfAlHBdA8ILZDJTz+F7nB0S4EG6ST
 3TRXVoa2ojS5SVUxQSOFQsT0CM5+5K95Bnjlv57diyWNb3lkdhzXDQmmaL/OCFpE
 6Zh0+CylAaiybBxT3JjSKlchIlNh59Ei0X7S0J8wDPEzk7+PLfjhlKeyFvgC/TOp
 9yIpX1nzOm6B0l8vXXatdW9QRubIvOYTZe7mtoxpIPPWVnOTI1fhHsteBDjPA7wj
 NJeMTzDRkBWbo+ntQ53MzFsbBLlF+DuFBWFrIRc3Ov3+endDxUCEhCxSBDCj1Hd5
 /dLNvyeiQPcne+j3oKIlTQLc1HJi6EEMCNVusBhT3tm3k1FbEFfeT9YqcsuRvPlD
 ozn6tVcRkyCf184tF8rBOY1nPwk2TMLV8vrUPYOtqD/4plNLxmG6JTFcLLBLA+wE
 G4dQBknM8LCDX6mTHJAF7zkEXomjhvDK9JANQ0sOczKu47+IfAFAs2Vc55R6d6Dd
 8m2H9QlVv0DGu7yHzo0/0W5gFq2xl5/5lV6u0sDAp0jo5MmOuIPqG+oF7NVnaM3Q
 6aI9BHEZ0Dz63VD2dniY
 =Jfvn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p changes for the 3.4 merge window from Eric Van Hensbergen.

* tag 'for-linus-3.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: statfs should not override server f_type
  net/9p: handle flushed Tclunk/Tremove
  net/9p: don't allow Tflush to be interrupted
2012-03-28 09:58:38 -07:00
Michel Lespinasse
b18dafc86b vfs: fix d_ancestor() case in d_materialize_unique
In d_materialise_unique() there are 3 subcases to the 'aliased dentry'
case; in two subcases the inode i_lock is properly released but this
does not occur in the -ELOOP subcase.

This seems to have been introduced by commit 1836750115 ("fix loop
checks in d_materialise_unique()").

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: stable@vger.kernel.org # v3.0+
[ Added a comment, and moved the unlock to where we generate the -ELOOP,
  which seems to be more natural.

  You probably can't actually trigger this without a buggy network file
  server - d_materialize_unique() is for finding aliases on non-local
  filesystems, and the d_ancestor() case is for a hardlinked directory
  loop.

  But we should be robust in the case of such buggy servers anyway. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 09:54:34 -07:00
Linus Torvalds
fa453a625d Merge branch 'for-linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML changes from Richard Weinberger:
 "Mostly bug fixes and cleanups"

* 'for-linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (35 commits)
  um: Update defconfig
  um: Switch to large mcmodel on x86_64
  MTD: Relax dependencies
  um: Wire CONFIG_GENERIC_IO up
  um: Serve io_remap_pfn_range()
  Introduce CONFIG_GENERIC_IO
  um: allow SUBARCH=x86
  um: most of the SUBARCH uses can be killed
  um: deadlock in line_write_interrupt()
  um: don't bother trying to rebuild CHECKFLAGS for USER_OBJS
  um: use the right ifdef around exports in user_syms.c
  um: a bunch of headers can be killed by using generic-y
  um: ptrace-generic.h doesn't need user.h
  um: kill HOST_TASK_PID
  um: remove pointless include of asm/fixmap.h from asm/pgtable.h
  um: asm-offsets.h might as well come from underlying arch...
  um: merge processor_{32,64}.h a bit...
  um: switch close_chan() to struct line
  um: race fix: initialize delayed_work *before* registering IRQ
  um: line->have_irq is never checked...
  ...
2012-03-27 18:29:53 -07:00
Geert Uytterhoeven
3ee6bd8e8d uml/hostfs: Propagate dirent.d_type to filldir()
Currently the (optional) d_type member in struct dirent is always
DT_UNKNOWN on hostfs, which may confuse buggy software using readdir().
Make sure to propagate its value from the underlying filesystem if it's
available there.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-03-25 00:29:52 +01:00
Linus Torvalds
11bcb32848 The following text was taken from the original review request:
"[PATCH 0/3] RFC - module.h usage cleanups in fs/ and lib/"
 		https://lkml.org/lkml/2012/2/29/589
 --
 
 Fix up files in fs/ and lib/ dirs to only use module.h if they really
 need it.
 
 These are trivial in scope vs. the work done previously.  We now have
 things where any few remaining cleanups can be farmed out to arch or
 subsystem maintainers, and I have done so when possible.  What is
 remaining here represents the bits that don't clearly lie within a
 single arch/subsystem boundary, like the fs dir and the lib dir.
 
 Some duplicate includes arising from overlapping fixes from
 independent subsystem maintainer submissions are also quashed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPbNw3AAoJEOvOhAQsB9HWA7wQALrsQ6V6Z+B3KsvSoD5kFnpZ
 Y+4uggs+GdUdWmtRrZnTBp896gGuUgBxc3syA2XWd7Oqi49+c5c1m0cFxKyVdIHm
 fB+jmxS69soADtHR3cXmxcQshrUzUf2rTn8frcw4O/BmJuplv4xT9uPQzwGaRSZT
 gomQsQ1bGnkwjO2jfS8f/N5Mjr8u/z0WF7TTOTUSq+Cv3BervPaSPF1Ea6J8oo+N
 4+/n8RlU1HWiI4inrgrFPN6UHmE45BAL2xGbB47LgooHJW8P5kAnU+vxGScaoy1Q
 JKX9WKT3VCiwR3VOPa86iLKP3Y8a3VlhyGn+yzzcYkGX/n0tbT7aoRhQm21sGIv0
 DoeXWe7aiiY8cEW69G6GIfRPFl+Zh81m1Whbu7IZT/sV3asx6jWmEXE8CgCfeDt5
 mNQk9D4Irf6+rmCSbeSVC4L0eFfLxNFouNyh2aus/q+gIjKNKYwZQryHrodK4wpv
 UgMKSTZfPrTAWay2gCNWNqo3Zs8e1LDqkftetxeU3jx2kTuaNzBl4Y7mhsX7sLYe
 MsFX3JUJ2pn6XWbgqcY+bdr/mzgsCrjzqdf15MTUzEc5SIfVF+XpNNZN1ITwl6UA
 /ZH9keBu1mEdCoPU5W74kYwx4p35hIeWJGfc0MRp07ruf941F+SBgMD11B0+06f0
 pN0DcITTkD16+sS4x1cB
 =Z4w0
 -----END PGP SIGNATURE-----

Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
 "Fix up files in fs/ and lib/ dirs to only use module.h if they really
  need it.

  These are trivial in scope vs the work done previously.  We now have
  things where any few remaining cleanups can be farmed out to arch or
  subsystem maintainers, and I have done so when possible.  What is
  remaining here represents the bits that don't clearly lie within a
  single arch/subsystem boundary, like the fs dir and the lib dir.

  Some duplicate includes arising from overlapping fixes from
  independent subsystem maintainer submissions are also quashed."

Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).

* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  lib: reduce the use of module.h wherever possible
  fs: reduce the use of module.h wherever possible
  includecheck: delete any duplicate instances of module.h
2012-03-24 10:24:31 -07:00
Linus Torvalds
ed2d265d12 The following text was taken from the original review request:
"[RFC - PATCH 0/7] consolidation of BUG support code."
 		https://lkml.org/lkml/2012/1/26/525
 --
 
 The changes shown here are to unify linux's BUG support under
 the one <linux/bug.h> file.  Due to historical reasons, we have
 some BUG code in bug.h and some in kernel.h -- i.e. the support for
 BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h,
 but old code in kernel.h wasn't moved to bug.h at that time.  As
 a band-aid, kernel.h was including <asm/bug.h> to pseudo link them.
 
 This has caused confusion[1] and general yuck/WTF[2] reactions.
 Here is an example that violates the principle of least surprise:
 
       CC      lib/string.o
       lib/string.c: In function 'strlcat':
       lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
       make[2]: *** [lib/string.o] Error 1
       $
       $ grep linux/bug.h lib/string.c
       #include <linux/bug.h>
       $
 
 We've included <linux/bug.h> for the BUG infrastructure and yet we
 still get a compile fail!  [We've not kernel.h for BUILD_BUG_ON.]
 Ugh - very confusing for someone who is new to kernel development.
 
 With the above in mind, the goals of this changeset are:
 
 1) find and fix any include/*.h files that were relying on the
    implicit presence of BUG code.
 2) find and fix any C files that were consuming kernel.h and
    hence relying on implicitly getting some/all BUG code.
 3) Move the BUG related code living in kernel.h to <linux/bug.h>
 4) remove the asm/bug.h from kernel.h to finally break the chain.
 
 During development, the order was more like 3-4, build-test, 1-2.
 But to ensure that git history for bisect doesn't get needless
 build failures introduced, the commits have been reorderd to fix
 the problem areas in advance.
 
 [1]  https://lkml.org/lkml/2012/1/3/90
 [2]  https://lkml.org/lkml/2012/1/17/414
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPbNwpAAoJEOvOhAQsB9HWrqYP/A0t9VB0nK6e42F0OR2P14MZ
 GJFtf1B++wwioIrx+KSWSRfSur1C5FKhDbxLR3I/pvkAYl4+T4JvRdMG6xJwxyip
 CC1kVQQNDjWVVqzjz2x6rYkOffx6dUlw/ERyIyk+OzP+1HzRIsIrugMqbzGLlX0X
 y0v2Tbd0G6xg1DV8lcRdp95eIzcGuUvdb2iY2LGadWZczEOeSXx64Jz3QCFxg3aL
 LFU4oovsg8Nb7MRJmqDvHK/oQf5vaTm9WSrS0pvVte0msSQRn8LStYdWC0G9BPCS
 GwL86h/eLXlUXQlC5GpgWg1QQt5i2QpjBFcVBIG0IT5SgEPMx+gXyiqZva2KwbHu
 LKicjKtfnzPitQnyEV/N6JyV1fb1U6/MsB7ebU5nCCzt9Gr7MYbjZ44peNeprAtu
 HMvJ/BNnRr4Ha6nPQNu952AdASPKkxmeXFUwBL1zUbLkOX/bK/vy1ujlcdkFxCD7
 fP3t7hghYa737IHk0ehUOhrE4H67hvxTSCKioLUAy/YeN1IcfH/iOQiCBQVLWmoS
 AqYV6ou9cqgdYoyila2UeAqegb+8xyubPIHt+lebcaKxs5aGsTg+r3vq5juMDAPs
 iwSVYUDcIw9dHer1lJfo7QCy3QUTRDTxh+LB9VlHXQICgeCK02sLBOi9hbEr4/H8
 Ko9g8J3BMxcMkXLHT9ud
 =PYQT
 -----END PGP SIGNATURE-----

Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull <linux/bug.h> cleanup from Paul Gortmaker:
 "The changes shown here are to unify linux's BUG support under the one
  <linux/bug.h> file.  Due to historical reasons, we have some BUG code
  in bug.h and some in kernel.h -- i.e.  the support for BUILD_BUG in
  linux/kernel.h predates the addition of linux/bug.h, but old code in
  kernel.h wasn't moved to bug.h at that time.  As a band-aid, kernel.h
  was including <asm/bug.h> to pseudo link them.

  This has caused confusion[1] and general yuck/WTF[2] reactions.  Here
  is an example that violates the principle of least surprise:

      CC      lib/string.o
      lib/string.c: In function 'strlcat':
      lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
      make[2]: *** [lib/string.o] Error 1
      $
      $ grep linux/bug.h lib/string.c
      #include <linux/bug.h>
      $

  We've included <linux/bug.h> for the BUG infrastructure and yet we
  still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
  very confusing for someone who is new to kernel development.

  With the above in mind, the goals of this changeset are:

  1) find and fix any include/*.h files that were relying on the
     implicit presence of BUG code.
  2) find and fix any C files that were consuming kernel.h and hence
     relying on implicitly getting some/all BUG code.
  3) Move the BUG related code living in kernel.h to <linux/bug.h>
  4) remove the asm/bug.h from kernel.h to finally break the chain.

  During development, the order was more like 3-4, build-test, 1-2.  But
  to ensure that git history for bisect doesn't get needless build
  failures introduced, the commits have been reorderd to fix the problem
  areas in advance.

	[1]  https://lkml.org/lkml/2012/1/3/90
	[2]  https://lkml.org/lkml/2012/1/17/414"

Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
and linux-next.

* tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  kernel.h: doesn't explicitly use bug.h, so don't include it.
  bug: consolidate BUILD_BUG_ON with other bug code
  BUG: headers with BUG/BUG_ON etc. need linux/bug.h
  bug.h: add include of it to various implicit C users
  lib: fix implicit users of kernel.h for TAINT_WARN
  spinlock: macroize assert_spin_locked to avoid bug.h dependency
  x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-03-24 10:08:39 -07:00
Linus Torvalds
f1d38e423a Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl
Pull sysctl updates from Eric Biederman:

 - Rewrite of sysctl for speed and clarity.

   Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
   are no longer bottlenecks in the process of adding and removing
   network devices.

   sysctl is now focused on being a filesystem instead of system call
   and the code can all be found in fs/proc/proc_sysctl.c.  Hopefully
   this means the code is now approachable.

   Much thanks is owed to Lucian Grinjincu for keeping at this until
   something was found that was usable.

 - The recent proc_sys_poll oops found by the fuzzer during hibernation
   is fixed.

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
  sysctl: protect poll() in entries that may go away
  sysctl: Don't call sysctl_follow_link unless we are a link.
  sysctl: Comments to make the code clearer.
  sysctl: Correct error return from get_subdir
  sysctl: An easier to read version of find_subdir
  sysctl: fix memset parameters in setup_sysctl_set()
  sysctl: remove an unused variable
  sysctl: Add register_sysctl for normal sysctl users
  sysctl: Index sysctl directories with rbtrees.
  sysctl: Make the header lists per directory.
  sysctl: Move sysctl_check_dups into insert_header
  sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
  sysctl: Replace root_list with links between sysctl_table_sets.
  sysctl: Add sysctl_print_dir and use it in get_subdir
  sysctl: Stop requiring explicit management of sysctl directories
  sysctl: Add a root pointer to ctl_table_set
  sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
  sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
  sysctl: Normalize the root_table data structure.
  sysctl: Factor out insert_header and erase_header
  ...
2012-03-23 18:08:58 -07:00
KAMEZAWA Hiroyuki
e075f59152 seq_file: add seq_set_overflow(), seq_overflow()
It is undocumented but a seq_file's overflow state is indicated by
m->count == m->size.  Add seq_set_overflow() and seq_overflow() to
set/check overflow status explicitly.

Based on an idea from Eric Dumazet.

[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Pravin B Shelar
1b26c9b334 proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace.  Keeping that network namespace from being freed
when the last user goes away.  Leaving things like vlan devices in the
leaked network namespace.

If you use ip netns add for much real work this problem becomes apparent
pretty quickly.  It light testing the problem hides because frequently
you simply don't notice the leak.

Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.

This issue exists back to 3.0.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
KAMEZAWA Hiroyuki
bda7bad62b procfs: speed up /proc/pid/stat, statm
Process accounting applications as top, ps visit some files under
/proc/<pid>.  With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
and /proc/<pid>/statm files.

This patch adds
  - seq_put_decimal_ll() for signed values.
  - allow delimiter == 0.
  - convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.

Test result on a system with 2000+ procs.

Before patch:
  [kamezawa@bluextal test]$ top -b -n 1 | wc -l
  2223
  [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null

  real    0m0.675s
  user    0m0.044s
  sys     0m0.121s

  [kamezawa@bluextal test]$ time ps -elf > /dev/null

  real    0m0.236s
  user    0m0.056s
  sys     0m0.176s

After patch:
  kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null

  real    0m0.657s
  user    0m0.052s
  sys     0m0.100s

  [kamezawa@bluextal ~]$ time ps -elf > /dev/null

  real    0m0.198s
  user    0m0.050s
  sys     0m0.145s

Considering top, ps tend to scan /proc periodically, this will reduce cpu
consumption by top/ps to some extent.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
KAMEZAWA Hiroyuki
1ac101a5d6 procfs: add num_to_str() to speed up /proc/stat
== stat_check.py
num = 0
with open("/proc/stat") as f:
        while num < 1000 :
                data = f.read()
                f.seek(0, 0)
                num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.150s
user    0m0.026s
sys     0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.055s
user    0m0.022s
sys     0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Turner <pjt@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Eric Dumazet
59a32e2ce5 proc: speed up /proc/stat handling
On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :

  # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
  5826
  # grep "cpu " /tmp/STRACE
  read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>

Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)

Fix this by :

 1) Taking into account nr_irqs in the initial buffer sizing.

 2) Using ksize() to allow better filling of initial buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Djalal Harouni
b908243c54 fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
get_sparsemem_vmemmap_info() is only used inside fs/proc/kcore.c

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Jason Baron
accb61fe7b coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
for 'VM_NODUMP' flag.  The idea is is to add a new madvise() flag:
MADV_DONTDUMP, which can be set by applications to specifically request
memory regions which should not dump core.

The specific application I have in mind is qemu: we can add a flag there
that wouldn't dump all of guest memory when qemu dumps core.  This flag
might also be useful for security sensitive apps that want to absolutely
make sure that parts of memory are not dumped.  To clear the flag use:
MADV_DODUMP.

[akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
[akpm@linux-foundation.org: fix up the architectures which broke]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Jason Baron
909af768e8 coredump: remove VM_ALWAYSDUMP flag
The motivation for this patchset was that I was looking at a way for a
qemu-kvm process, to exclude the guest memory from its core dump, which
can be quite large.  There are already a number of filter flags in
/proc/<pid>/coredump_filter, however, these allow one to specify 'types'
of kernel memory, not specific address ranges (which is needed in this
case).

Since there are no more vma flags available, the first patch eliminates
the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
the kernel to mark vdso and vsyscall pages.  However, it is simple
enough to check if a vma covers a vdso or vsyscall page without the need
for this flag.

The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
continue to work the same as before unless 'MADV_DONTDUMP' is set on the
region.

The qemu code which implements this features is at:

  http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch

In my testing the qemu core dump shrunk from 383MB -> 13MB with this
patch.

I also believe that the 'MADV_DONTDUMP' flag might be useful for
security sensitive apps, which might want to select which areas are
dumped.

This patch:

The VM_ALWAYSDUMP flag is currently used by the coredump code to
indicate that a vma is part of a vsyscall or vdso section.  However, we
can determine if a vma is in one these sections by checking it against
the gate_vma and checking for a non-NULL return value from
arch_vma_name().  Thus, freeing a valuable vma bit.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Namjae Jeon
d533df07c2 fat: fix bug in enforcing Long File Name length
Since '*outlen' is initialized to zero, it is currently possible to
create a filename of length (FAT_LFN_LEN + 1) when utf8 is not enabled.
To enforce the FAT_LFN_LEN limit, we must perform one less iteration.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Namjae Jeon
41f0c02eac fat: clean up xlate_to_uni()
xlate_to_uni() is called by vfat_build_slots() with sbi->nls_io as the
final argument.  nls_io can never be null at this point because the
check is already being done in fat_fill_super() wherein the mount fails
if it is null.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Dan Carpenter
da0503aae0 epoll: remove unneeded variable in reverse_path_check()
We never use the length variable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
Steven Rostedt
02edc6fc4d epoll: comment the funky #ifdef
Looking for a bug in -rt, I stumbled across this code here from: commit
2dfa4eeab0 ("epoll keyed wakeups: teach epoll about hints coming with
the wakeup key"), specifically:

  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
                                      unsigned long events, int subclass)
  {
         unsigned long flags;

         spin_lock_irqsave_nested(&wqueue->lock, flags, subclass);
         wake_up_locked_poll(wqueue, events);
         spin_unlock_irqrestore(&wqueue->lock, flags);
  }
  #else
  static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
                                      unsigned long events, int subclass)
  {
         wake_up_poll(wqueue, events);
  }
  #endif

You change the function of ep_wake_up_nested() depending on whether
CONFIG_DEBUG_LOCK_ALLOC is set or not.  This looks awfully suspicious,
and there's no comment to explain why.  I initially thought that this
was trying to fool lockdep, and hiding a real bug.

Investigating it, I found the creation of wake_up_nested() (which no
longer exists) but was created for the sole purpose of epoll and its
strange wake ups, as explained in commit 0ccf831cbe ("lockdep:
annotate epoll")

Although the commit message says "annotate epoll" the change log is much
better at explaining what is happening than what is in the actual code.
Thus a comment is really necessary here.  And to save the time of other
developers from having to go trudging through the git logs trying to
figure out why this code exists.

I took parts of the change log and placed it into a comment above the
affected code.  This will make the description of what is happening more
visible to new developers that have to look at this code for the first
time.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
Hans Verkuil
626cf23660 poll: add poll_requested_events() and poll_does_not_wait() functions
In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for.  An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested.  This is something that can happen
in the video4linux subsystem among others.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably.  The poll_table_struct does have it: it
has a key field with the event mask.  But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table pointer.

Also, the eventpoll implementation always left the key field at ~0 instead
of using the requested events mask.

This was changed in eventpoll.c so the key field now contains the actual
events that should be polled for as set by the caller.

The solution to the NULL poll_table pointer is to set the qproc field to
NULL in poll_table once poll() matches the events, not the poll_table
pointer itself.  That way drivers can obtain the mask through a new
poll_requested_events inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
that case poll_requested_events() returns ~0 (i.e.  all events).

Very rarely drivers might want to know whether poll_wait will actually
wait.  If another earlier file descriptor in the set already matched the
events the caller wanted to wait for, then the kernel will return from the
select() call without waiting.  This might be useful information in order
to avoid doing expensive work.

A new helper function poll_does_not_wait() is added that drivers can use
to detect this situation.  This is now used in sock_poll_wait() in
include/net/sock.h.  This was the only place in the kernel that needed
this information.

Drivers should no longer access any of the poll_table internals, but use
the poll_requested_events() and poll_does_not_wait() access functions
instead.  In order to enforce that the poll_table fields are now prepended
with an underscore and a comment was added warning against using them
directly.

This required a change in unix_dgram_poll() in unix/af_unix.c which used
the key field to get the requested events.  It's been replaced by a call
to poll_requested_events().

For qproc it was especially important to change its name since the
behavior of that field changes with this patch since this function pointer
can now be NULL when that wasn't possible in the past.

Any driver accessing the qproc or key fields directly will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument.  This
pointer may or may not be NULL, drivers can never rely on it being one or
the other as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field:

	events = wait ? wait->key : ~0;

   This will still work although it should be replaced with the new
   poll_requested_events() function (which does exactly the same).
   This will now even work better, since wait is no longer set to NULL
   unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be
   NULL. Renaming qproc should prevent this from happening. There are no
   kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without
   waiting. This is no longer sufficient as the correct test is now
   wait == NULL || wait->_qproc == NULL.

   However, the worst that can happen here is a slight performance hit in
   the case where wait != NULL and wait->_qproc == NULL. In that case the
   driver will assume that poll_wait() will actually add the fd to the set
   of waiting file descriptors. Of course, poll_wait() will not do that
   since it tests for wait->_qproc. This will not break anything, though.

   There is only one place in the whole kernel where this happens
   (sock_poll_wait() in include/net/sock.h) and that code will be replaced
   by a call to poll_does_not_wait() in the next patch.

   Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
   actually waiting. The next file descriptor from the set might match the
   event mask and thus any possible waits will never happen.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
H Hartley Sweeten
9710a78e55 fs/notify/notification.c: make subsys_initcall function static
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Arun Sharma <asharma@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:31 -07:00
Muthu Kumar
b502bd1152 magic.h: move some FS magic numbers into magic.h
- Move open-coded filesystem magic numbers into magic.h

- Rearrange magic.h so that the filesystem-related constants are grouped
  together.

Signed-off-by: Muthukumar R <muthur@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:31 -07:00
Linus Torvalds
e57f146b28 - Improve error messages
- Clean-up i_nlink management
 - Minor clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPbHqpAAoJECmIfjd9wqK0PGYP/i06JD3HM0uxU59TB0NM4q/7
 zhSe8epwlFFXIWkpq8ljjgqAqwFkQicvRwK4TG0IosBa3/q7LxpONpkQKOvvtaZT
 K4VROqHtPXBwtiAkjqso+ukjxqtJEaro8hQiljt06LR/kPPFjXFROqENMWMQPsP2
 bmeU5LfuYaFmVOLwI1jbCQTZ21Qrc1Wxvv3A3qSMIHsvLATSGjoN2vdRLhnefYh5
 CREBi5jr1mU/1vh+j4IjQ3GZWW+dg88G8munPOANA3ZOm+OOrbV/aq4txpasXkw7
 3lL//w070NV/GfbdXMqTj1ASlT3zOozFu4iaXgHdpMmNwmNk/IF0LSFye3iC+VdL
 pz0mlD6PN0mZhw+6qVEQgVs5MfQ7cZrvYZkY7tWEMlzuzBImR3v0TJ5Dg6CfaRbv
 wRHOXE70DefKigWT762TRbpaZBzSDImFTc9YdCYLtb9eEUnmpEP+Ci5/jK5EUjin
 CmPvxzCH60OCWb8CZIfdVCnHyfzY4YugTwp+PjjTWhc9iWwrMkIthlvGBAIfwjab
 6wyQrlFrKFhxLyPHGDMNYtPIazKXDsxPlvHAEXX24uXsXQNWyUBp4LNFUgfIzBsM
 wqTapNnl1Qmd9ZJYYhR7byhKhbU8Nb/3DUFaJEemviMeCncN+v0p0Qk5/lNjAiGz
 d1f5Yfu4EjQBT5piBR4C
 =Sq1l
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.4-rc1' of git://git.infradead.org/linux-ubifs

Pull UBIFS changes from Artem Bityutskiy:
 - Improve error messages
 - Clean-up i_nlink management
 - Minor clean-ups

* tag 'upstream-3.4-rc1' of git://git.infradead.org/linux-ubifs:
  UBIFS: improve error messages
  UBIFS: kill CUR_MAX_KEY_LEN macro
  UBIFS: do not use inc_link when i_nlink is zero
  UBIFS: make the dbg_lock spinlock static
  UBIFS: increase dumps loglevel
  UBIFS: amend recovery debugging message
2012-03-23 09:27:40 -07:00
Linus Torvalds
6e55f8ed81 One pstore patch
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJPa7HFAAoJEKurIx+X31iBtm8P/1wtBftqpdXiNXbuphWMxZAb
 FmzVSrlCRpbPB4HQjY64bs8BCR3zCeeGPf19Mx3mDrrY8o5ENlWEz/8eZNsXeiyS
 WMNIPrUevxRfsnPO61TizZ0JvPiCouQarRovYh0FW32PC8/9I2yYIM0Mj+L1P+b9
 FZgjk5Jeg+kEoDZZM6HepX500/iZjsPBuyysJ91JPVNzQJNHPJUYvHwPx8xYWVva
 PwdrugkLPFEJSyyc1CNLt3kg9wtDNtWIKUvrjmRl6aRFeLmJBUVBhkxtZD2rrLIm
 oOvGY+UdlTCg5vHHAYtc4oAVnZiXUxFaotH0Msa8ots6ORvKAXKBBmXTwS2bce8Q
 DUtqs/CU6X3LykRDyuCRvjuwuH5e4WUM3SMCSWuxk8ZOCiYWHwqMaaWYIBUDAWIm
 AXFbP29M95hdWd/jvZUxwJCXRc2Vr/MEXNkQzPWIw8eAewP+2jbqLLICjXqVK0Vp
 vkHmhVTN07ajbz/dJsJ3K2JObVal+1YEL96DdW+nstdmgNPiDhyLibVS/WH8qOq/
 k9s175e3kLOaLU1LCBdKy4TbG3fGg7f1FUVePelbmTSF9gRSz0Mz3jMyrvoFS7jQ
 //ra4HlrbUHHwwsK8R5axZGa2ejezLIc7/wdgRlEy2ZffqytNMPJ4RwCk0Ft1YJ2
 pbD9OUSUqwhRrtbxJWJd
 =MzvB
 -----END PGP SIGNATURE-----

Merge tag 'pstore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull one pstore patch from Tony Luck

* tag 'pstore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: Introduce get_reason_str() to pstore
2012-03-23 09:24:07 -07:00
Linus Torvalds
49d99a2f9c Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Pull XFS updates from Ben Myers:
 "Scalability improvements for dquots, log grant code cleanups, plus
  bugfixes and cleanups large and small"

Fix up various trivial conflicts that were due to some of the earlier
patches already having been integrated into v3.3 as bugfixes, and then
there were development patches on top of those.  Easily merged by just
taking the newer version from the pulled branch.

* 'for-linus' of git://oss.sgi.com/xfs/xfs: (45 commits)
  xfs: fallback to vmalloc for large buffers in xfs_getbmap
  xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
  xfs: remove remaining scraps of struct xfs_iomap
  xfs: fix inode lookup race
  xfs: clean up minor sparse warnings
  xfs: remove the global xfs_Gqm structure
  xfs: remove the per-filesystem list of dquots
  xfs: use per-filesystem radix trees for dquot lookup
  xfs: per-filesystem dquot LRU lists
  xfs: use common code for quota statistics
  xfs: reimplement fdatasync support
  xfs: split in-core and on-disk inode log item fields
  xfs: make xfs_inode_item_size idempotent
  xfs: log timestamp updates
  xfs: log file size updates at I/O completion time
  xfs: log file size updates as part of unwritten extent conversion
  xfs: do not require an ioend for new EOF calculation
  xfs: use per-filesystem I/O completion workqueues
  quota: make Q_XQUOTASYNC a noop
  xfs: include reservations in quota reporting
  ...
2012-03-23 09:19:22 -07:00
Linus Torvalds
1c3ddfe5ab Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: clean up ordering in exit_cifs
  cifs: clean up call to cifs_dfs_release_automount_timer()
  CIFS: Delete echo_retries module parm
  CIFS: Prepare credits code for a slot reservation
  CIFS: Make wait_for_free_request killable
  CIFS: Introduce credit-based flow control
  CIFS: Simplify inFlight logic
  cifs: fix issue mounting of DFS ROOT when redirecting from one domain controller to the next
  CIFS: Respect negotiated MaxMpxCount
  CIFS: Fix a spurious error in cifs_push_posix_locks
2012-03-23 09:07:15 -07:00
Linus Torvalds
f63d395d47 NFS client updates for Linux 3.4
New features include:
 - Add NFS client support for containers.
   This should enable most of the necessary functionality, including
   lockd support, and support for rpc.statd, NFSv4 idmapper and
   RPCSEC_GSS upcalls into the correct network namespace from
   which the mount system call was issued.
 - NFSv4 idmapper scalability improvements
   Base the idmapper cache on the keyring interface to allow concurrent
   access to idmapper entries. Start the process of migrating users from
   the single-threaded daemon-based approach to the multi-threaded
   request-key based approach.
 - NFSv4.1 implementation id.
   Allows the NFSv4.1 client and server to mutually identify each other
   for logging and debugging purposes.
 - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
   having to use the more counterintuitive 'vers=4,minorversion=1'.
 - SUNRPC tracepoints.
   Start the process of adding tracepoints in order to improve debugging
   of the RPC layer.
 - pNFS object layout support for autologin.
 
 Important bugfixes include:
 - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail
   to wake up all tasks when applied to priority waitqueues.
 - Ensure that we handle read delegations correctly, when we try to
   truncate a file.
 - A number of fixes for NFSv4 state manager loops (mostly to do with
   delegation recovery).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPalZbAAoJEGcL54qWCgDyCi4P+QHcmzQhJO7HWx3Pzjs67bFT
 xMSYaKHGWS4AJKUBVl5OKBxUExfrMHBNbElV3IKUIwBlDx8RVtnwfptKSe146iki
 dn4TrRO5es8nmI4hRDcGMlzJDZq4y0Qg//qiUFmojiNW/Avw0ljfMoVUejJJ09FV
 oeDk4EGtcxkEyH+g48ZjYbyspRnG8qtD3atf70Z3lYE0ELdG/B5Dyzw1RDrA5p73
 xJX3lqy8p/4ROzw/dmNoxdAXOrr3Q4/T58Bvp/lUglPy/EHyPmWzFoH0MU0C/PFu
 5VnAl6QDbNCTcIw9FvJlX/mIyErpNG9eKzUskUc9L9SA+B+J/i4rIap4KATRN3nH
 7QhE5qUacPuJnvxml7MPmlQTuft3fkAQ7NhKIWrbRi1QS9FmJC5NxctIb8loqlFn
 yIXdKeLfMshB+NyuFS9uzStX7SmV3eMgVd+5ZxRjYxm+PKJLw2KXeudArL6M5mHK
 3QeKZpqwaYQ3RfaTNpvAp0doiXHCO5UbWfI0Pe8xQs/QcMCNReffqV2G4IJKFAu6
 WpoN2UDQC9LCBifLw2nS7kku8+ZVXLQU8OC1NVl3TG15xD9cNLXuk3/y5llPGq4O
 odo52uLFpJohbDaHMj5RTKOfchTQCm2iyuVmxZEeAySypMSiAXmW7COSKHs/HxI1
 VBm+EI00Pvmm5+fUjIlp
 =LuHE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates for Linux 3.4 from Trond Myklebust:
 "New features include:
   - Add NFS client support for containers.

     This should enable most of the necessary functionality, including
     lockd support, and support for rpc.statd, NFSv4 idmapper and
     RPCSEC_GSS upcalls into the correct network namespace from which
     the mount system call was issued.

   - NFSv4 idmapper scalability improvements

     Base the idmapper cache on the keyring interface to allow
     concurrent access to idmapper entries.  Start the process of
     migrating users from the single-threaded daemon-based approach to
     the multi-threaded request-key based approach.

   - NFSv4.1 implementation id.

     Allows the NFSv4.1 client and server to mutually identify each
     other for logging and debugging purposes.

   - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
     having to use the more counterintuitive 'vers=4,minorversion=1'.

   - SUNRPC tracepoints.

     Start the process of adding tracepoints in order to improve
     debugging of the RPC layer.

   - pNFS object layout support for autologin.

  Important bugfixes include:

   - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
     fail to wake up all tasks when applied to priority waitqueues.

   - Ensure that we handle read delegations correctly, when we try to
     truncate a file.

   - A number of fixes for NFSv4 state manager loops (mostly to do with
     delegation recovery)."

* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-23 08:53:47 -07:00
Linus Torvalds
aab008db80 Cleanups: rename of flush to invalidate, moving reporting of statistics
into debugfs, and use __read_mostly as neccessary.
 Also add a MAINTAINER file for cleancache API files.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZ1sjAAoJEFjIrFwIi8fJCSkIALqBBCkuPSusvdebT59Oq145
 EW2eEwJzVft0vlvS/IPl+W37TH5VWhBVCW8frAYMpLaY5X/W1ZwzFpx76T4acAiM
 wkXwsyYXs6N13OOFoH5gmf3cwproAioRxbEeALucxeMR4rK5Yw+oZXT+hSu2KaLh
 1LtHyx+u+NLb0QOseAuDcmyjY9r6aLBA1HMcVD2z+4UW1n/9NWexQP3ShYP9uMDs
 GnyDzZ8vWXTcoc4Auj0rpaNsT5d47ltGegKASZmmvS3QyMHbZ4sk2HECnQY4wSee
 alPJwDyb6mOJmQ3e4s940onjfZPgd8/cW/DVEUrveH+dz6Eqjxqz41dVRVXiLz8=
 =tUNC
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm

Pull cleancache changes from Konrad Rzeszutek Wilk:
 "This has some patches for the cleancache API that should have been
  submitted a _long_ time ago.  They are basically cleanups:

   - rename of flush to invalidate

   - moving reporting of statistics into debugfs

   - use __read_mostly as necessary.

  Oh, and also the MAINTAINERS file change.  The files (except the
  MAINTAINERS file) have been in #linux-next for months now.  The late
  addition of MAINTAINERS file is a brain-fart on my side - didn't
  realize I needed that just until I was typing this up - and I based
  that patch on v3.3 - so the tree is on top of v3.3."

* tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
  MAINTAINERS: Adding cleancache API to the list.
  mm: cleancache: Use __read_mostly as appropiate.
  mm: cleancache: report statistics via debugfs instead of sysfs.
  mm: zcache/tmem/cleancache: s/flush/invalidate/
  mm: cleancache: s/flush/invalidate/
2012-03-22 19:52:47 -07:00
Linus Torvalds
f7493e5d9c vfs: tidy up sparse warnings in fs/namei.c
While doing the fs/namei.c cleanups, I ran sparse on it, and it pointed
out other large integers and a couple of cases of us using '0' instead
of the proper 'NULL'.

Sparse still doesn't understand some of the conditional locking going
on, but that's no excuse for not fixing up the trivial stuff.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22 16:10:40 -07:00
Linus Torvalds
989412bbd2 vfs: tidy up fs/namei.c byte-repeat word constants
In commit commit 1de5b41cd3 ("fs/namei.c: fix warnings on 32-bit")
Andrew said that there must be a tidier way of doing this.

This is that tidier way.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22 15:58:27 -07:00
Randy Dunlap
1f1e6e523e fs: fix kernel-doc warnings in dcache.c
Fix kernel-doc warnings in fs/dcache.c:

  Warning(fs/dcache.c:1743): No description found for parameter 'seqp'
  Warning(fs/dcache.c:1743): Excess function parameter 'seq' description in '__d_lookup_rcu'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22 15:49:18 -07:00
Al Viro
f132c5be05 Fix full_name_hash() behaviour when length is a multiple of 8
We want it to match what hash_name() is doing, which means extra
multiply by 9 in this case...

Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22 15:10:43 -07:00
Lucas De Marchi
4e474a00d7 sysctl: protect poll() in entries that may go away
Protect code accessing ctl_table by grabbing the header with grab_header()
and after releasing with sysctl_head_finish().  This is needed if poll()
is called in entries created by modules: currently only hostname and
domainname support poll(), but this bug may be triggered when/if modules
use it and if user called poll() in a file that doesn't support it.

Dave Jones reported the following when using a syscall fuzzer while
hibernating/resuming:

RIP: 0010:[<ffffffff81233e3e>]  [<ffffffff81233e3e>] proc_sys_poll+0x4e/0x90
RAX: 0000000000000145 RBX: ffff88020cab6940 RCX: 0000000000000000
RDX: ffffffff81233df0 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88020cab6940
[ ... ]
Code: 00 48 89 fb 48 89 f1 48 8b 40 30 4c 8b 60 e8 b8 45 01 00 00 49 83
7c 24 28 00 74 2e 49 8b 74 24 30 48 85 f6 74 24 48 85 c9 75 32 <8b> 16
b8 45 01 00 00 48 63 d2 49 39 d5 74 10 8b 06 48 98 48 89

If an entry goes away while we are polling() it, ctl_table may not exist
anymore.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-03-22 14:46:56 -07:00
Linus Torvalds
95211279c5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge first batch of patches from Andrew Morton:
 "A few misc things and all the MM queue"

* emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits)
  memcg: avoid THP split in task migration
  thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
  memcg: clean up existing move charge code
  mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
  mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
  mm/memcontrol.c: s/stealed/stolen/
  memcg: fix performance of mem_cgroup_begin_update_page_stat()
  memcg: remove PCG_FILE_MAPPED
  memcg: use new logic for page stat accounting
  memcg: remove PCG_MOVE_LOCK flag from page_cgroup
  memcg: simplify move_account() check
  memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  memcg: kill dead prev_priority stubs
  memcg: remove PCG_CACHE page_cgroup flag
  memcg: let css_get_next() rely upon rcu_read_lock()
  cgroup: revert ss_id_lock to spinlock
  idr: make idr_get_next() good for rcu_read_lock()
  memcg: remove unnecessary thp check in page stat accounting
  memcg: remove redundant returns
  memcg: enum lru_list lru
  ...
2012-03-22 09:04:48 -07:00
Alex Elder
3489b42a72 ceph: fix three bugs, two in ceph_vxattrcb_file_layout()
In ceph_vxattrcb_file_layout(), there is a check to determine
whether a preferred PG should be formatted into the output buffer.
That check assumes that a preferred PG number of 0 indicates "no
preference," but that is wrong.  No preference is indicated by a
negative (specifically, -1) PG number.

In addition, if that condition yields true, the preferred value
is formatted into a sized buffer, but the size consumed by the
earlier snprintf() call is not accounted for, opening up the
possibilty of a buffer overrun.

Finally, in ceph_vxattrcb_dir_rctime() where the nanoseconds part of
the time displayed did not include leading 0's, which led to
erroneous (sub-second portion of) time values being shown.

This fixes these three issues:
    http://tracker.newdream.net/issues/2155
    http://tracker.newdream.net/issues/2156
    http://tracker.newdream.net/issues/2157

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder
cffaba15cd ceph: ensure Boolean options support both senses
Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
ee57741c52 rbd: make ceph_parse_options() return a pointer
ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:47 -05:00
Alex Elder
18fa8b3fea ceph: make ceph_setxattr() and ceph_removexattr() more alike
This patch just rearranges a few bits of code to make more
portions of ceph_setxattr() and ceph_removexattr() identical.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:46 -05:00
Alex Elder
3ce6cd1233 ceph: avoid repeatedly computing the size of constant vxattr names
All names defined in the directory and file virtual extended
attribute tables are constant, and the size of each is known at
compile time.  So there's no need to compute their length every
time any file's attribute is listed.

Record the length of each string and use it when needed to determine
the space need to represent them.  In addition, compute the
aggregate size of strings in each table just once at initialization
time.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:46 -05:00
Alex Elder
aa4066ed7b ceph: encode type in vxattr callback routines
The names of the callback functions used for virtual extended
attributes are based only on the last component of the attribute
name.  Because of the way these are defined, this precludes allowing
a single (lowest) attribute name for different callbacks, dependent
on the type of file being operated on.  (For example, it might be
nice to support both "ceph.dir.layout" and "ceph.file.layout".)

Just change the callback names to avoid this problem.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:46 -05:00
Alex Elder
881a5fa200 ceph: drop "_cb" from name of struct ceph_vxattr_cb
A struct ceph_vxattr_cb does not represent a callback at all, but
rather a virtual extended attribute itself.  Drop the "_cb" suffix
from its name to reflect that.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:46 -05:00
Alex Elder
eb78808446 ceph: use macros to normalize vxattr table definitions
Entries in the ceph virtual extended attribute tables all follow a
distinct pattern in their definition.  Enforce this pattern through
the use of a macro.

Also, a null name field signals the end of the table, so make that
be the first field in the ceph_vxattr_cb structure.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:46 -05:00
Alex Elder
2289190719 ceph: use a symbolic name for "ceph." extended attribute namespace
Use symbolic constants to define the top-level prefix for "ceph."
extended attribute names.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:46 -05:00