linux

Author	SHA1	Message	Date
Linus Torvalds	062e27ec1b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: fix checkpatch.pl warnings Squashfs: fix filename typo Squashfs: update Kconfig and documentation for LZO Squashfs: fix block size use in LZO decompressor Squashfs: Add LZO compression support squashfs: fix filename in header comment Squashfs: Make XATTR config name consistent with other file systems squashfs: fix compiler inline warning	2010-08-11 09:20:13 -07:00
Linus Torvalds	bf25db3654	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Fix groups code when num_devices is not divisible by group_width exofs: Remove useless optimization exofs: exofs_file_fsync and exofs_file_flush correctness exofs: Remove superfluous dependency on buffer_head and writeback	2010-08-11 09:19:43 -07:00
Linus Torvalds	682c30ed21	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) ceph: generalize mon requests, add pool op support ceph: only queue async writeback on cap revocation if there is dirty data ceph: do not ignore osd_idle_ttl mount option ceph: constify dentry_operations ceph: whitespace cleanup ceph: add flock/fcntl lock support ceph: define on-wire types, constants for file locking support ceph: add CEPH_FEATURE_FLOCK to the supported feature bits ceph: support v2 reconnect encoding ceph: support v2 client_caps encoding ceph: move AES iv definition to shared header ceph: fix decoding of pool snap info ceph: make ->sync_fs not wait if wait==0 ceph: warn on missing snap realm ceph: print useful error message when crush rule not found ceph: use %pU to print uuid (fsid) ceph: sync header defs with server code ceph: clean up header guards ceph: strip misleading/obsolete version, feature info ceph: specify supported features in super.h ...	2010-08-11 09:18:32 -07:00
Lubomir Rintel	ab654bab04	fs/sysv/super.c: add support for non-PDP11 v7 filesystems This adds byte order autodetection (of PDP-11 and LE filesystems). No attempt is made to detect big-endian filesystems -- were there any? Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:23 -07:00
Lubomir Rintel	0bcaa65a56	fs/sysv: v7: adjust sanity checks for some volumes Newly mkfs-ed filesystems from Seventh Edition have last modification time set to zero, but are otherwise perfectly valid. Also, tighten up other sanity checks to filter out most filesystems with [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:22 -07:00
Lubomir Rintel	a36517e930	fs/sysv: add v7 alias So that the module gets autoloaded when a v7 filesystem is mounted. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:22 -07:00
Dan Carpenter	bebf8cfaea	afs: destroy work queue on init failure We can clean up the work queue on this error path. This function is called from afs_init(). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:22 -07:00
Alexey Dobriyan	9c867fbe06	partitions: fix sometimes unreadable partition strings Fix this garbage happening quite often: ==> sda: scsi 3:0:0:0: CD-ROM TOSHIBA ==> sda1 sda2 sda3 sda4 <sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray ^^^ Uniform CD-ROM driver Revision: 3.20 sr 3:0:0:0: Attached scsi CD-ROM sr0 ==> sda5 sda6 sda7 > Make "sda: sda1 ..." lines actually lines. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:20 -07:00
Robert P. J. Day	cfbef3cb16	procfs: simplify conditional processing of fs/proc.o. Since the entire fs/proc directory is conditionally included based on CONFIG_PROC_FS, it's redundant to check that same variable within that directory. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:20 -07:00
Nathan Lynch	a2a20c412c	signalfd: fill in ssi_int for posix timers and message queues If signalfd is used to consume a signal generated by a POSIX interval timer or POSIX message queue, the ssi_int field does not reflect the data (sigevent->sigev_value) supplied to timer_create(2) or mq_notify(3). (The ssi_ptr field, however, is filled in.) This behavior differs from signalfd's treatment of sigqueue-generated signals -- see the default case in signalfd_copyinfo. It also gives results that differ from the case when a signal is handled conventionally via a sigaction-registered handler. So, set signalfd_siginfo->ssi_int in the remaining cases (__SI_TIMER, __SI_MESGQ) where ssi_ptr is set. akpm: a non-back-compatible change. Merge into -stable to minimise the number of kernels which are in the field and which miss this feature. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:20 -07:00
Chris Wright	b7300b78d1	blkdev: cgroup whitelist permission fix The cgroup device whitelist code gets confused when trying to grant permission to a disk partition that is not currently open. Part of blkdev_open() includes __blkdev_get() on the whole disk. Basically, the only ways to reliably allow a cgroup access to a partition on a block device when using the whitelist are to 1) also give it access to the whole block device or 2) make sure the partition is already open in a different context. The patch avoids the cgroup check for the whole disk case when opening a partition. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=589662 Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Serge E. Hallyn <serue@us.ibm.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:18 -07:00
Changli Gao	b3397ad544	reiserfs: remove unused local `wait' Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:12 -07:00
Dan Carpenter	5fc79d85d2	autofs4: remove unneeded null check in try_to_fill_dentry() After `97e7449a7a`: "autofs4: fix indirect mount pending expire race" we no longer assumed that "ino" can be null. The other null checks got removed but this was one was missed. Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:06 -07:00
Changli Gao	a892e2d7dc	vfs: use kmalloc() to allocate fdmem if possible Use kmalloc() to allocate fdmem if possible. vmalloc() is used as a fallback solution for fdmem allocation. A new helper function __free_fdtable() is introduced to reduce the lines of code. A potential bug, vfree() a memory allocated by kmalloc(), is fixed. [akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()] Signed-off-by: Changli Gao <xiaosuo@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jiri Slaby <jslaby@suse.cz> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Avi Kivity <avi@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:02 -07:00
Dmitry Torokhov	06b1e104b7	vfs: clarify that nonseekable_open() will never fail Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Kacur <jkacur@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:02 -07:00
Wu Fengguang	454eedb890	vfs: O_* bit numbers uniqueness check The O_* bit numbers are defined in 20+ arch/*, and can silently overlap. Add a compile time check to ensure the uniqueness as suggested by David Miller. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:02 -07:00
Tony Battersby	58939473ba	vfs: improve comment describing fget_light() Improve the description of fget_light(), which is currently incorrect about needing a prior refcnt (judging by the way it is actually used). Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:02 -07:00
Dave Kleikamp	aca0fa34bd	jfs: don't allow os2 xattr namespace overlap with others It's currently possible to bypass xattr namespace access rules by prefixing valid xattr names with "os2.", since the os2 namespace stores extended attributes in a legacy format with no prefix. This patch adds checking to deny access to any valid namespace prefix following "os2.". Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Reported-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-10 15:33:09 -07:00
Linus Torvalds	2f9e825d3e	Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits) block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n xen-blkfront: fix missing out label blkdev: fix blkdev_issue_zeroout return value block: update request stacking methods to support discards block: fix missing export of blk_types.h writeback: fix bad _bh spinlock nesting drbd: revert "delay probes", feature is being re-implemented differently drbd: Initialize all members of sync_conf to their defaults [Bugz 315] drbd: Disable delay probes for the upcomming release writeback: cleanup bdi_register writeback: add new tracepoints writeback: remove unnecessary init_timer call writeback: optimize periodic bdi thread wakeups writeback: prevent unnecessary bdi threads wakeups writeback: move bdi threads exiting logic to the forker thread writeback: restructure bdi forker loop a little writeback: move last_active to bdi writeback: do not remove bdi from bdi_list writeback: simplify bdi code a little writeback: do not lose wake-ups in bdi threads ... Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and drivers/scsi/scsi_error.c as per Jens.	2010-08-10 15:22:42 -07:00
Linus Torvalds	fc385c3132	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (68 commits) U6715 16550A serial driver support Char: nozomi, set tty->driver_data appropriately Char: nozomi, fix tty->count counting serial: max3107: Fix gpiolib support hsu: call PCI pm hooks in suspend/resume function hsu: some code cleanup hsu: add a periodic timer to check dma rx channel hsu: driver for Medfield High Speed UART device mxser: remove unnesesary NULL check serial: add support for OX16PCI958 card serial: 68328serial.c: remove dead (ALMA_ANS \| DRAGONIXVZ \| M68EZ328ADS) timbuart: use __devinit and __devexit macros for probe and remove serial: MMIO32 support for 8250_early.c serial: mcf: don't take spinlocks in already protected functions serial: general fixes in the serial_rs485 structure serial: fix missing bit coverage of ASYNC_FLAGS serial: "altera_uart: simplify altera_uart_console_putc()" checkpatch fixes serial: crisv10: formatting of pointers in printk() vt: Fix warning: statement with no effect due to vt_kern.h tty_io: remove casts from void* ...	2010-08-10 15:03:42 -07:00
Yehuda Sadeh	e56fa10e92	ceph: generalize mon requests, add pool op support Generalize the current statfs synchronous requests, and support pool_ops. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-08-10 14:41:25 -07:00
Linus Torvalds	7233e39276	Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: staging: Pushdown bkl to easycap ioctl handlers autofs/autofs4: Move compat_ioctl handling into fs v4l: Convert v4l2-dev to unlocked_ioctl ia64/perfmon: Convert to unlocked_ioctl sunrpc: Remove duplicated #include ncpfs: Remove duplicated #include	2010-08-10 13:58:28 -07:00
hyc@symas.com	26df6d1340	tty: Add EXTPROC support for LINEMODE This patch is against the 2.6.34 source. Paraphrased from the 1989 BSD patch by David Borman @ cray.com: These are the changes needed for the kernel to support LINEMODE in the server. There is a new bit in the termios local flag word, EXTPROC. When this bit is set, several aspects of the terminal driver are disabled. Input line editing, character echo, and mapping of signals are all disabled. This allows the telnetd to turn off these functions when in linemode, but still keep track of what state the user wants the terminal to be in. New ioctl: TIOCSIG Generate a signal to processes in the current process group of the pty. There is a new mode for packet driver, the TIOCPKT_IOCTL bit. When packet mode is turned on in the pty, and the EXTPROC bit is set, then whenever the state of the pty is changed, the next read on the master side of the pty will have the TIOCPKT_IOCTL bit set. This allows the process on the server side of the pty to know when the state of the terminal has changed; it can then issue the appropriate ioctl to retrieve the new state. Since the original BSD patches accompanied the source code for telnet I've left that reference here, but obviously the feature is useful for any remote terminal protocol, including ssh. The corresponding feature has existed in the BSD tty driver since 1989. For historical reference, a good copy of the relevant files can be found here: http://anonsvn.mit.edu/viewvc/krb5/trunk/src/appl/telnet/?pathrev=17741 Signed-off-by: Howard Chu <hyc@symas.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-08-10 13:47:39 -07:00
Linus Torvalds	26b55633a8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: ecryptfs: dont call lookup_one_len to avoid NULL nameidata fs/ecryptfs/file.c: introduce missing free ecryptfs: release reference to lower mount if interpose fails eCryptfs: Handle ioctl calls with unlocked and compat functions ecryptfs: Fix warning in ecryptfs_process_response()	2010-08-10 12:14:39 -07:00
Linus Torvalds	e8a89cebdb	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (79 commits) mtd: Remove obsolete <mtd/compatmac.h> include mtd: Update copyright notices jffs2: Update copyright notices mtd-physmap: add support users can assign the probe type in board files mtd: remove redwood map driver mxc_nand: Add v3 (i.MX51) Support mxc_nand: support 8bit ecc mxc_nand: fix correct_data function mxc_nand: add V1_V2 namespace to registers mxc_nand: factor out a check_int function mxc_nand: make some internally used functions overwriteable mxc_nand: rework get_dev_status mxc_nand: remove 0xe00 offset from registers mtd: denali: Add multi connected NAND support mtd: denali: Remove set_ecc_config function mtd: denali: Remove unuseful code in get_xx_nand_para functions mtd: denali: Remove device_info_tag structure mtd: m25p80: add support for the Winbond W25Q32 SPI flash chip mtd: m25p80: add support for the Intel/Numonyx {16,32,64}0S33B SPI flash chips mtd: m25p80: add support for the EON EN25P{32, 64} SPI flash chips ... Fix up trivial conflicts in drivers/mtd/maps/{Kconfig,redwood.c} due to redwood driver removal.	2010-08-10 11:49:21 -07:00
Linus Torvalds	8196867c74	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bcopeland/omfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bcopeland/omfs: omfs: fix uninitialized variable warning omfs: sanity check cluster size omfs: refuse to mount if bitmap pointer is obviously wrong omfs: check bounds on block numbers before passing to sb_bread omfs: fix memory leak	2010-08-10 11:47:36 -07:00
Linus Torvalds	8c8946f509	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits) fanotify: use both marks when possible fsnotify: pass both the vfsmount mark and inode mark fsnotify: walk the inode and vfsmount lists simultaneously fsnotify: rework ignored mark flushing fsnotify: remove global fsnotify groups lists fsnotify: remove group->mask fsnotify: remove the global masks fsnotify: cleanup should_send_event fanotify: use the mark in handler functions audit: use the mark in handler functions dnotify: use the mark in handler functions inotify: use the mark in handler functions fsnotify: send fsnotify_mark to groups in event handling functions fsnotify: Exchange list heads instead of moving elements fsnotify: srcu to protect read side of inode and vfsmount locks fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called fsnotify: use _rcu functions for mark list traversal fsnotify: place marks on object in order of group memory address vfs/fsnotify: fsnotify_close can delay the final work in fput fsnotify: store struct file not struct path ... Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.	2010-08-10 11:39:13 -07:00
Linus Torvalds	5f248c9c25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c	2010-08-10 11:26:52 -07:00
Kevin Winchester	85c9fe8fca	vfs: fix warning: 'dirent' is used uninitialized in this function Using: gcc (GCC) 4.5.0 20100610 (prerelease) The following warnings appear: fs/readdir.c: In function `filldir64': fs/readdir.c:240:15: warning: `dirent' is used uninitialized in this function fs/readdir.c: In function `filldir': fs/readdir.c:155:15: warning: `dirent' is used uninitialized in this function fs/compat.c: In function `compat_filldir64': fs/compat.c:1071:11: warning: `dirent' is used uninitialized in this function fs/compat.c: In function `compat_filldir': fs/compat.c:984:15: warning: `dirent' is used uninitialized in this function The warnings are related to the use of the NAME_OFFSET() macro. Luckily, it appears as though the standard offsetof() macro is what is being implemented by NAME_OFFSET(), thus we can fix the warning and use a more standard code construct at the same time. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:05 -07:00
Jan Kara	7624ee72aa	mm: avoid resetting wb_start after each writeback round WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write out some huge inode for too long while starving writeout of other inodes. To avoid livelocks, we record time we started writeback in wbc->wb_start and do not write out inodes which were dirtied after this time. But currently, writeback_inodes_wb() resets wb_start each time it is called thus effectively invalidating this logic and making any WB_SYNC_NONE writeback prone to livelocks. This patch makes sure wb_start is set only once when we start writeback. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:03 -07:00
David Rientjes	51b1bd2ace	oom: deprecate oom_adj tunable /proc/pid/oom_adj is now deprecated so that that it may eventually be removed. The target date for removal is August 2012. A warning will be printed to the kernel log if a task attempts to use this interface. Future warning will be suppressed until the kernel is rebooted to prevent spamming the kernel log. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:02 -07:00
David Rientjes	a63d83f427	oom: badness heuristic rewrite This a complete rewrite of the oom killer's badness() heuristic which is used to determine which task to kill in oom conditions. The goal is to make it as simple and predictable as possible so the results are better understood and we end up killing the task which will lead to the most memory freeing while still respecting the fine-tuning from userspace. Instead of basing the heuristic on mm->total_vm for each task, the task's rss and swap space is used instead. This is a better indication of the amount of memory that will be freeable if the oom killed task is chosen and subsequently exits. This helps specifically in cases where KDE or GNOME is chosen for oom kill on desktop systems instead of a memory hogging task. The baseline for the heuristic is a proportion of memory that each task is currently using in memory plus swap compared to the amount of "allowable" memory. "Allowable," in this sense, means the system-wide resources for unconstrained oom conditions, the set of mempolicy nodes, the mems attached to current's cpuset, or a memory controller's limit. The proportion is given on a scale of 0 (never kill) to 1000 (always kill), roughly meaning that if a task has a badness() score of 500 that the task consumes approximately 50% of allowable memory resident in RAM or in swap space. The proportion is always relative to the amount of "allowable" memory and not the total amount of RAM systemwide so that mempolicies and cpusets may operate in isolation; they shall not need to know the true size of the machine on which they are running if they are bound to a specific set of nodes or mems, respectively. Root tasks are given 3% extra memory just like __vm_enough_memory() provides in LSMs. In the event of two tasks consuming similar amounts of memory, it is generally better to save root's task. Because of the change in the badness() heuristic's baseline, it is also necessary to introduce a new user interface to tune it. It's not possible to redefine the meaning of /proc/pid/oom_adj with a new scale since the ABI cannot be changed for backward compatability. Instead, a new tunable, /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may be used to polarize the heuristic such that certain tasks are never considered for oom kill while others may always be considered. The value is added directly into the badness() score so a value of -500, for example, means to discount 50% of its memory consumption in comparison to other tasks either on the system, bound to the mempolicy, in the cpuset, or sharing the same memory controller. /proc/pid/oom_adj is changed so that its meaning is rescaled into the units used by /proc/pid/oom_score_adj, and vice versa. Changing one of these per-task tunables will rescale the value of the other to an equivalent meaning. Although /proc/pid/oom_adj was originally defined as a bitshift on the badness score, it now shares the same linear growth as /proc/pid/oom_score_adj but with different granularity. This is required so the ABI is not broken with userspace applications and allows oom_adj to be deprecated for future removal. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:02 -07:00
Andrew Morton	74bcbf4054	oom: move badness() declaration into oom.h Cc: Minchan Kim <minchan.kim@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:02 -07:00
KOSAKI Motohiro	26ebc98491	oom: /proc/<pid>/oom_score treat kernel thread honestly If a kernel thread is using use_mm(), badness() returns a positive value. This is not a big issue because caller take care of it correctly. But there is one exception, /proc/<pid>/oom_score calls badness() directly and doesn't care that the task is a regular process. Another example, /proc/1/oom_score return !0 value. But it's unkillable. This incorrectness makes administration a little confusing. This patch fixes it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:01 -07:00
Al Viro	dca332528b	no need for list_for_each_entry_safe()/resetting with superblock list just delay __put_super() a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:49:02 -04:00
Al Viro	7a4dec5389	Fix sget() race with failing mount If sget() finds a matching superblock being set up, it'll grab an active reference to it and grab s_umount. That's fine - we'll wait for completion of foofs_get_sb() that way. However, if said foofs_get_sb() fails we'll end up holding the halfway-created superblock. deactivate_locked_super() called by foofs_get_sb() will just unlock the sucker since we are holding another active reference to it. What we need is a way to tell if superblock has been successfully set up. Unfortunately, neither ->s_root nor the check for MS_ACTIVE quite fit. Cheap and easy way, suitable for backport: new flag set by the (only) caller of ->get_sb(). If that flag isn't present by the time sget() grabbed s_umount on preexisting superblock it has found, it's seeing a stillborn and should just bury it with deactivate_locked_super() (and repeat the search). Longer term we want to set that flag in ->get_sb() instances (and check for it to distinguish between "sget() found us a live sb" and "sget() has allocated an sb, we need to set it up" in there, instead of checking ->s_root as we do now). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org	2010-08-09 16:49:01 -04:00
Tejun Heo	4f331f01b9	vfs: don't hold s_umount over close_bdev_exclusive() call Fix an obscure AB-BA deadlock in get_sb_bdev(). When a superblock is mounted more than once get_sb_bdev() calls close_bdev_exclusive() to drop the extra bdev reference while holding s_umount. However, sb->s_umount nests inside bd_mutex during __invalidate_device() and close_bdev_exclusive() acquires bd_mutex during blkdev_put(); thus creating an AB-BA deadlock. This condition doesn't trigger frequently. For this condition to be visible to lockdep, the filesystem must occupy the whole device (as __invalidate_device() only grabs bd_mutex for the whole device), the FS must be mounted more than once and partition rescan should be issued while the FS is still mounted. Fix it by dropping s_umount over close_bdev_exclusive(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ciprian Docan <docan@eden.rutgers.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:59 -04:00
Artem Bityutskiy	719f2c879f	sysv: do not mark superblock dirty on remount No need to mark the superblock as dirty in sysv_remount, synchronize it instead (only if mounting R/O). I did not find any docs about this file-system, and I have no possibility to test my changes. Thus, this is untested. I see other issues in sysv, e.g., why sysv_sync_fs writes only in the FSTYPE_SYSV4 case? However, it marks its SB bh's dirty for all types, and does not wait for them ever. With zero docs I'm unable to fix this. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:58 -04:00
Artem Bityutskiy	315671f3b8	sysv: do not mark superblock dirty on mount I did not find any docs about this file-system, and I have no possibility to test my changes. Thus, this is untested. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:56 -04:00
Artem Bityutskiy	696ac96c27	btrfs: remove junk sb_dirt change BTRFS does not define a '->write_super()' method, so it should not mark its superblock as dirty. This looks like some left-over. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:55 -04:00
Artem Bityutskiy	4e29d50a28	BFS: clean up the superblock usage BFS is a very simple FS and its superblocks contains only static information and is never changed. However, the BFS code for some misterious reasons marked its buffer head as dirty from time to time, but nothing in that buffer was ever changed. This patch removes all the BFS superblock manipulation, simply because it is not needed. It removes: 1. The si_sbh filed from 'struct bfs_sb_info' because it is not needed. We only need to read the SB once on mount to get the start of data blocks and the FS size. After this, we can forget about the SB. 2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because it is never changed. 3. The '->sync_fs()' method because there is nothing to sync (inodes are synched by VFS). 4. The '->write_super()' method, again, because the SB is never changed. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:53 -04:00
Artem Bityutskiy	7435d50611	AFFS: wait for sb synchronization when needed AFFS does not ever wait for superblock synchronization in ->put_super(), ->write_super, and ->sync_fs(). However, it should wait for synchronization in ->put_super() because it is about to be unmounted, in ->write_super() because this is periodic SB synchronization performed from a separate kernel thread, and in ->sync_fs() it should respect the 'wait' flag. This patch fixes the situation. Also, in ->put_super(), do not write the SB if it is not dirty. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:51 -04:00
Artem Bityutskiy	669d5f1f60	AFFS: clean up dirty flag usage In 'affs_write_super()': remove ancient and wrong commented code, remove unneeded 'clean' variable, so the function becomes a bit cleaner and simpler. In 'affs_remount(): remove unnecessary SB dirty flag changes. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:50 -04:00
Christoph Hellwig	1b9474635e	cifs: truncate fallout Remove the calls to inode_newsize_ok given that we already did it as part of inode_change_ok in the beginning of cifs_setattr_(no)unix. No need to call ->truncate if cifs doesn't have one, so remove the explicit call in cifs_vmtruncate, and replace the calls to vmtruncate with truncate_setsize which is vmtruncate minus inode_newsize_ok and the call to ->truncate. Rename cifs_vmtruncate to cifs_setsize to match the new calling conventions. Question 1: why does cifs do the pagecache munging and i_size update twice for each setattr call, once opencoded in cifs_vmtruncate, and once using the VFS helpers? Question 2: what is supposed to be protected by i_lock in cifs_vmtruncate? Do we need it around the call to inode_change_ok? [AV: fixed build breakage] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:48 -04:00
Andreas Gruenbacher	e566d48c9b	mbcache: fix shrinker function return value The shrinker function is supposed to return the number of cache entries after shrinking, not before shrinking. Fix that. Based on a patch from Wang Sheng-Hui <crosslonelyover@gmail.com>. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:47 -04:00
Andreas Gruenbacher	2aec7c5232	mbcache: Remove unused features The mbcache code was written to support a variable number of indexes, but all the existing users use exactly one index. Simplify to code to support only that case. There are also no users of the cache entry free operation, and none of the users keep extra data in cache entries. Remove those features as well. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:45 -04:00
Christoph Hellwig	365b181897	add f_flags to struct statfs(64) Add a flags field to help glibc implementing statvfs(3) efficiently. We copy the flag values from glibc, and add a new ST_VALID flag to denote that f_flags is implemented. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:44 -04:00
Christoph Hellwig	ebabe9a900	pass a struct path to vfs_statfs We'll need the path to implement the flags field for statvfs support. We do have it available in all callers except: - ecryptfs_statfs. This one doesn't actually need vfs_statfs but just needs to do a caller to the lower filesystem statfs method. - sys_ustat. Add a non-exported statfs_by_dentry helper for it which doesn't won't be able to fill out the flags field later on. In addition rename the helpers for statfs vs fstatfs to do_*statfs instead of the misleading vfs prefix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:42 -04:00
Al Viro	b70a3e0702	All filesystems that need invalidate_inode_buffers() are doing that explicitly Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:39 -04:00
Al Viro	b57922d97f	convert remaining ->clear_inode() to ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:37 -04:00

1 2 3 4 5 ...

19194 Commits