1
Commit Graph

3968 Commits

Author SHA1 Message Date
NeilBrown
596bbe53eb [PATCH] knfsd: Allow max size of NFSd payload to be configured
The max possible is the maximum RPC payload.  The default depends on amount of
total memory.

The value can be set within reason as long as no nfsd threads are currently
running.  The value can also be ready, allowing the default to be determined
after nfsd has started.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:16 -07:00
Greg Banks
7adae489fe [PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP
The limit over UDP remains at 32K.  Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp.  This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.

Note that we don't actually increase sv_bufsz for nfs yet.  That comes next.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:16 -07:00
NeilBrown
3cc03b164c [PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom
..  by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer
allocate an array of this size on the stack.  So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union).  So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
NeilBrown
4452435948 [PATCH] knfsd: Replace two page lists in struct svc_rqst with one
We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256.  This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays.  However the there are never more
that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply.  Instead of two arrays, we can simply keep an index into where the
first reply page is.

This patch also removes a number of small inline functions that probably
server to obscure what is going on rather than clarify it, and opencode the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0.  i.e.  if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

 check counters are initilised and incr properly
 check for consistant usage of ++ etc
 maybe extra some inlines for common approach
 general review

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
NeilBrown
5680c44632 [PATCH] knfsd: Fixed handling of lockd fail when adding nfsd socket
Arrgg..  We cannot 'lockd_up' before 'svc_addsock' as we don't know the
protocol yet....  So switch it around again and save the name of the created
sockets so that it can be closed if lock_up fails.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
NeilBrown
cda9e0cd8a [PATCH] knfsd: Protect update to sn_nrthreads with lock_kernel
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
NeilBrown
37a034729a [PATCH] knfsd: call lockd_down when closing a socket via a write to nfsd/portlist
The refcount that nfsd holds on lockd is based on the number of open sockets.
So when we close a socket, we should decrement the ref (with lockd_down).

Currently when a socket is closed via writing to the portlist file, that
doesn't happen.

So: make sure we get an error return if the socket that was requested does is
not found, and call lockd_down if it was.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
NeilBrown
7ed94296a6 [PATCH] knfsd: nfsd: lockdep annotation fix
nfsv2 needs the I_MUTEX_PARENT on the directory when creating a file too.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:15 -07:00
Eric Sesterhenn
585b7747d6 [PATCH] Remove unnecessary check in fs/reiserfs/inode.c
Since all callers dereference dir, we dont need this check.  Coverity id
#337.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:14 -07:00
Matthew Wilcox
dc02747da7 [PARISC] Fix fs/binfmt_som.c
Fix compilation (missing include of a.out.h)
Fix security hole (need to call unshare_files)

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-10-04 06:51:26 -06:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Steven Whitehouse
7ecdb70a0e [GFS2] Fix endian bug for de_type
Missing endian conversion for the de_type field.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-03 21:03:35 -04:00
Eric Sesterhenn
9ab5aa911a BUG_ON conversion for fs/xfs/
This patch converts two if () BUG(); construct to BUG_ON();
which occupies less space, uses unlikely and is safer when
BUG() is disabled.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:37:55 +02:00
Eric Sesterhenn
73dff8be9e BUG_ON() conversion in fs/nfsd/
This patch converts an if () BUG(); construct to BUG_ON();
which occupies less space, uses unlikely and is safer when
BUG() is disabled.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:37:14 +02:00
Eric Sesterhenn
14a61442c2 BUG_ON conversion for fs/reiserfs
This patch converts several if () BUG(); construct to BUG_ON();
which occupies less space, uses unlikely and is safer when
BUG() is disabled. S_ISREG() has no side effects, so the
conversion is safe.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:36:38 +02:00
Komal Shah
5a65980ec5 debugfs: spelling fix
Change debufs_create_file() to debugfs_create_file().

Signed-off-by: Komal Shah <komal_shah802003@yahoo.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:28:36 +02:00
Uwe Zeisberger
f30c226954 fix file specification in comments
Many files include the filename at the beginning, serveral used a wrong one.

Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:01:26 +02:00
Matt LaPlante
cab00891c5 Still more typo fixes
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:36:44 +02:00
Matt LaPlante
44c09201a4 more misc typo fixes
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:34:14 +02:00
Matt LaPlante
cc2e2767f1 Typos in fs/Kconfig
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:22:29 +02:00
Ryan O'Hara
fcb47e0bd2 [GFS2] Initialize SELinux extended attributes at inode creation time.
This patch has gfs2_security_init declared as a static function, which
is correct. As a result, the declaration of this function in inode.h is
removed (and thus inode.h is unchanged). Also removed #include eaops.h,
which is not needed.

Signed-Off-By: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-03 11:57:35 -04:00
Steven Whitehouse
ddacfaf76d [GFS2] Move logging code into log.c (mostly)
This moves the logging code from meta_io.c into log.c and glops.c. As a
result the routines can now be static and all the logging code is together
in log.c, leaving meta_io.c with just metadata i/o code in it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-03 11:10:41 -04:00
Zach Brown
5c1fdf4150 [PATCH] pr_debug: sysfs: use size_t length modifier in pr_debug format arguments
sysfs: use size_t length modifier in pr_debug format arguments

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:19 -07:00
Zach Brown
4779efca14 [PATCH] pr_debug: configfs: use size_t length modifier in pr_debug format argument
configfs: use size_t length modifier in pr_debug format argument

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:19 -07:00
Zach Brown
691578cd55 [PATCH] pr_debug: aio: use size_t length modifier in pr_debug format arguments
aio: use size_t length modifier in pr_debug format arguments

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:19 -07:00
Jeff Garzik
c3b6571384 [PATCH] fs/eventpoll: error handling micro-cleanup
While reviewing the 'may be used uninitialized' bogus gcc warnings, I
noticed that an error code assignment was only needed if an error had
actually occured.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:41 -07:00
David Howells
afefdbb28a [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system.  They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example.  The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.

Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.

This patch:

Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.

The stat functions then returns the full 64-bit inode number where
available and where possible.  If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.

Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.

Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.

Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.

It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.

[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:40 -07:00
Steven Whitehouse
f92a0b6ff4 [GFS2] Mark nlink cleared so VFS sees it happen
This does nothing atm, but will be required for later support of
r/o bind mounts.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 16:01:53 -04:00
Steven Whitehouse
409e185d23 [GFS2] Two redundant casts removed
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 14:20:43 -04:00
Steven Whitehouse
48516ced21 [GFS2] Remove uneeded endian conversion
In many places GFS2 was calling the endian conversion routines
for an inode even when only a single field, or a few fields might
have changed. As a result we were copying lots of data needlessly.

This patch replaces those calls with conversion of just the
required fields in each case. This should be faster and easier
to understand. There are still other places which suffer from this
problem, but this is a start in the right direction.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 12:39:19 -04:00
Steven Whitehouse
3cf1e7bed4 [GFS2] Remove duplicate sb reading code
For some reason we had two different sets of code for reading in the
superblock. This removes one of them in favour of the other. Also we
don't need the temporary buffer for the sb since we already have one
in the gfs2 sb itself.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 11:49:41 -04:00
Steven Whitehouse
2e565bb69c [GFS2] Mark metadata reads for blktrace
Mark the metadata reads so that blktrace knows what they are.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 11:38:25 -04:00
Linus Torvalds
44f549217c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: White space cleanup
  [PATCH] JFS: return correct error when i-node allocation failed
  JFS: Remove shadow variable from fs/jfs/jfs_txnmgr.c:xtLog()
2006-10-02 08:27:09 -07:00
Steven Whitehouse
128e5ebaf8 [GFS2] Remove iflags.h, use FS_
Update GFS2 in the light of David Howells' patch:

[PATCH] BLOCK: Move common FS-specific ioctls to linux/fs.h [try #6]
36695673b0

which calls the filesystem independant flags FS_..._FL. As a result
we no longer need the flags.h file and the conversion routine is
moved into the GFS2 source code.

Userland programs which used to include iflags.h should now include
fs.h and use the new flag names.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 11:24:43 -04:00
David Howells
3f2e05e90e [PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h
Revert Andrew Morton's patch to temporarily hack around the lack of a
declaration of sigset_t in linux/compat.h to make the block-disablement
patches build on IA64.  This got accidentally pushed to Linus and should
be fixed in a different manner.

Also make linux/compat.h #include asm/signal.h to gain a definition of
sigset_t so that it can externally declare sigset_from_compat().

This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
ppc64 and run-tested on frv.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 08:03:31 -07:00
Oleg Nesterov
1a657f78dc [PATCH] introduce get_task_pid() to fix unsafe get_pid()
proc_pid_make_inode:

	ei->pid = get_pid(task_pid(task));

I think this is not safe.  get_pid() can be preempted after checking "pid
!= NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Eric W. Biederman
1c0d04c9e4 [PATCH] proc: comment what proc_fill_cache does
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Eric W. Biederman
5e61feafa2 [PATCH] proc: remove the useless SMP-safe comments from /proc
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Eric W. Biederman
7bcd6b0efd [PATCH] proc: remove trailing blank entry from pid_entry arrays
It was pointed out that since I am taking ARRAY_SIZE anyway the trailing empty
entry is silly and just wastes space.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Eric W. Biederman
8e95bd936d [PATCH] proc: properly compute TGID_OFFSET
The value doesn't change but this ensures I will have the proper value when
other files are added to proc_base_stuff.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Oleg Nesterov
b0fa9db6ab [PATCH] proc: drop tasklist lock in task_state()
task_state() needs tasklist_lock to protect ->parent/->real_parent.  However
task->parent points to nowhere only when the actions below happen in order

	1) release_task(task)
	2) release_task(task->parent)
	3) a grace period passed

But 3) implies that the memory ops from 1) should be finished, so pid_alive()
can't be true in such a case.

Otherwise, we don't care if ->parent/->real_parent changes under us.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Oleg Nesterov
a593d6edeb [PATCH] proc: convert do_task_stat() to use lock_task_sighand()
Drop tasklist_lock. ->siglock protects almost all interesting data
(including sub-threads traversal) except:

	->signal->tty
		protected by tty_mutex

	->real_parent
		the task can't be unhashed while we are holding
		->siglock, so ->real_parent can change from under us
		but we can safely dereference it under rcu_read_lock()

	->pgrp/->session
		we can get inconsistent numbers if the task does
		sys_setsid/daemonize at the same time. I hope this
		is acceptable.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Oleg Nesterov
5e6b3f42ed [PATCH] proc: convert task_sig() to use lock_task_sighand()
lock_task_sighand() can take ->siglock without holding tasklist_lock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
7fbaac005c [PATCH] proc: Use pid_task instead of open coding it
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
72d9dcfc7a [PATCH] proc: Merge proc_tid_attr and proc_tgid_attr
The implementation is exactly the same and there is currently nothing to
distinguish proc_tid_attr, and proc_tgid_attr.  So it is pointless to have two
separate implementations.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
61a2878402 [PATCH] proc: Remove the hard coded inode numbers
The hard coded inode numbers in proc currently limit its maintainability,
its flexibility, and what can be done with the rest of system.  /proc limits
pid-max to 32768 on 32 bit systems it limits fd-max to 32768 on all systems,
and placing the pid in the inode number really gets in the way of implementing
subdirectories of per process information.

Ever since people started adding to the middle of the file type enumeration we
haven't been maintaing the historical inode numbers, all we have really
succeeded in doing is keeping the pid in the proc inode number.  The pid is
already available in the directory name so no information is lost removing it
from the inode number.

So if something in user space cares if we remove the inode number from the
/proc inode it is almost certainly broken.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
444ceed8d1 [PATCH] proc: Factor out an instantiate method from every lookup method
To remove the hard coded proc inode numbers it is necessary to be able to
create the proc inodes during readdir.  The instantiate methods are the subset
of lookup that is needed to accomplish that.

This first step just splits the lookup methods into 2 functions.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Eric W. Biederman
801199ce80 [PATCH] proc: Make the generation of the self symlink table driven
This patch generalizes the concept of files in /proc that are related to
processes but live in the root directory of /proc

Ideally this would reuse infrastructure from the rest of the process specific
parts of proc but unfortunately security_task_to_inode must not be called on
files that are not strictly per process.  security_task_to_inode really needs
to be reexamined as the security label can change in important places that we
are not currently catching, but I'm not certain that simplifies this problem.

By at least matching the structure of the rest of proc we get more idiom reuse
and it becomes easier to spot problems in the way things are put together.

Later things like /proc/mounts are likely to be moved into proc_base as well.
If union mounts are ever supported we may be able to make /proc a union mount,
and properly split it into 2 filesystems.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Serge E. Hallyn
96b644bdec [PATCH] namespaces: utsname: use init_utsname when appropriate
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
	right ones in net/ipv4/ipconfig.c.  These are now changed to
	utsname() (the per-process namespace utsname) in the previous
	patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
e9ff3990f0 [PATCH] namespaces: utsname: switch to using uts namespaces
Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
1651e14e28 [PATCH] namespaces: incorporate fs namespace into nsproxy
This moves the mount namespace into the nsproxy.  The mount namespace count
now refers to the number of nsproxies point to it, rather than the number of
tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Peter Zijlstra
12fd352038 [PATCH] nfsd: lockdep annotation
while doing a kernel make modules_install install over an NFS mount.

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9550 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9550:
   #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9580 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9580:
   #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
eed2965af1 [PATCH] knfsd: allow admin to set nthreads per node
Add /proc/fs/nfsd/pool_threads which allows the sysadmin (or a userspace
daemon) to read and change the number of nfsd threads in each pool.  The
format is a list of space-separated integers, one per pool.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
eec09661dc [PATCH] knfsd: use svc_set_num_threads to manage threads in knfsd
Replace the existing list of all nfsd threads with new code using
svc_create_pooled().

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks
9a24ab5749 [PATCH] knfsd: add svc_get
add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
NeilBrown
4a3ae42dc3 [PATCH] knfsd: Correctly handle error condition from lockd_up
If lockd_up fails - what should we expect?  Do we have to later call
lockd_down?

Well the nfs client thinks "no", the nfs server thinks "yes".  lockd thinks
"yes".

The only answer that really makes sense is "no" !!

So:
  Make lockd_up only increment  nlmsvc_users on success.
  Make nfsd handle errors from lockd_up properly.
  Make sure lockd_up(0) never fails when lockd is running
    so that the 'reclaimer' call to lockd_up doesn't need to
    be error checked.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
7dcf91ec66 [PATCH] knfsd: Move makesock failed warning into make_socks.
Thus it is printed for any path that leads to failure (make_socks is called
from two places).

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
3dfb421053 [PATCH] knfsd: Check return value of lockd_up in write_ports
We should be checking the return value of lockd_up when adding a new socket to
nfsd.  So move the lockd_up before the svc_addsock and check the return value.

The move is because lockd_down is easy, but there is no easy way to remove a
recently added socket.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
6fb2b47fa1 [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
Josh Triplett
896440d560 [PATCH] nfsd: add lock annotations to e_start and e_stop
e_start acquires svc_export_cache.hash_lock, and e_stop releases it.  Add
lock annotations to these two functions so that sparse can check callers
for lock pairing, and so that sparse will not complain about these
functions since they intentionally use locks in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
Greg Banks
bc6f02e516 [PATCH] knfsd: Use SEQ_START_TOKEN instead of hardcoded magic (void*)1
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
b41b66d63c [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'
Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist.  This will cause the nfs server to listen on that socket.

To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
80212d59e3 [PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports
This file will list all ports that nfsd has open.
Default when TCP enabled will be
   ipv4 udp 0.0.0.0 2049
   ipv4 tcp 0.0.0.0 2049

Later, the list of ports will be settable.

'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
02a375f0ac [PATCH] knfsd: separate out some parts of nfsd_svc, which start nfs servers
Separate out the code for creating a new service, and for creating initial
sockets.

Some of these new functions will have multiple callers soon.

[akpm@osdl.org: cleanups]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown
6658d3a7bb [PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions
We have an array 'nfsd_version' which lists the available versions of nfsd,
and 'nfsd_versions' (poor choice there :-() which lists the currently active
versions.

Then we have a bitmap - nfsd_versbits which says which versions are wanted.
The bits in this bitset cause content to be copied from nfsd_version to
nfsd_versions when nfsd starts.

This patch removes nfsd_versbits and moves information directly from
nfsd_version to nfsd_versions when requests for version changes arrive.

Note that this doesn't make it possible to change versions while the server is
running.  This is because serv->sv_xdrsize is calculated when a service is
created, and used when threads are created, and xdrsize depends on the active
versions.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
NeilBrown
24e36663c3 [PATCH] knfsd: be more selective in which sockets lockd listens on
Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.

However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.

So:
 - add an option to lockd_up saying which protocol is needed
 - Always open sockets for which an explicit port was given, otherwise
   only open a socket of the type required
 - Change nfsd to do one lockd_up per socket rather than one per thread.

This
 - removes the dependancy on CONFIG_NFSD_TCP
 - means that lockd may open sockets other than at startup
 - means that lockd will *not* listen on UDP if the only
   mounts are TCP mount (and nfsd hasn't started).

The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
NeilBrown
bc591ccff2 [PATCH] knfsd: add a callback for when last rpc thread finishes
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more.  So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Greg Banks
b06c7b4333 [PATCH] knfsd: remove an unused variable from e_show()
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Greg Banks
3e3b480096 [PATCH] knfsd: add some missing newlines in printks
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Eric W. Biederman
43fa1adb93 [PATCH] file: Add locking to f_getown
This has been needed for a long time, but now with the advent of a
reference counted struct pid there are real consequences for getting this
wrong.

Someone I think it was Oleg Nesterov pointed out that this construct was
missing locking, when I introduced struct pid.  After taking time to review
the locking construct already present I figured out which lock needs to be
taken.  The other paths that access f_owner.pid take either the f_owner
read or the write lock.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu
3fbc964864 [PATCH] Define struct pspace
Define a per-container pid space object.  And create one instance of this
object, init_pspace, to define the entire pid space.  Subsequent patches
will provide/use interfaces to create/destroy pid spaces.

Its a subset/rework of Eric Biederman's patch
http://lkml.org/lkml/2006/2/6/285 .

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Andreas Mohr
ed97bd37ef [PATCH] fs/inode.c tweaks
Only touch inode's i_mtime and i_ctime to make them equal to "now" in case
they aren't yet (don't just update timestamp unconditionally).  Uninline
the hash function to save 259 Bytes.

This tiny inode change which may improve cache behaviour also shaves off 8
Bytes from file_update_time() on i386.

Included a tiny codestyle cleanup, too.

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Alexey Dobriyan
07acaf28d2 [PATCH] Remove NULL check in register_nls()
Everybody passes valid pointer there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman
609d7fa956 [PATCH] file: modify struct fown_struct to use a struct pid
File handles can be requested to send sigio and sigurg to processes.  By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman
f6c7a1f34e [PATCH] proc: give the root directory a task
Helper functions in base.c like proc_pident_readdir and proc_pident_lookup
assume the directories have an associated task, and cannot currently be used
on the /proc root directory because it does not have such a task.

This small changes allows for base.c to be simplified and later when multiple
pid spaces are introduced it makes getting the needed context information
trivial.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman
20cdc894c4 [PATCH] proc: modify proc_pident_lookup to be completely table driven
Currently proc_pident_lookup gets the names and types from a table and then
has a huge switch statement to get the inode and file operations it needs.
That is silly and is becoming increasingly hard to maintain so I just put all
of the information in the table.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman
28a6d67179 [PATCH] proc: reorder the functions in base.c
There were enough changes in my last round of cleaning up proc I had to break
up the patch series into smaller chunks, and my last chunk never got resent.

This patchset gives proc dynamic inode numbers (the static inode numbers were
a pain to maintain and prevent all kinds of things), and removes the horrible
switch statements that had to be kept in sync with everything else.  Being
fully table driver takes us 90% of the way of being able to register new
process specific attributes in proc.

This patch:

Group the functions by what they implement instead of by type of operation.
As it existed base.c was quickly approaching the point where it could not be
followed.

No functionality or code changes asside from adding/removing forward
declartions are implemented in this patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman
0804ef4b0d [PATCH] proc: readdir race fix (take 3)
The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls.  For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.

This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.

Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in on system call.

This patch implements the normal directory semantics for proc, that
guarantee that a directory entry that is neither created nor destroyed
while reading the directory entry will be returned.  For directory that are
either created or destroyed during the readdir you may or may not see them.
 Furthermore you may seek to a directory offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires, and
more importantly that user space expects.  Plus it is a simple semantic to
implement reliable service.  It is just a matter of calling readdir a
second time if you are wondering if something new has show up.

These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
are only 40 cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines we have to
look at to scan a linked list, and the worst case of having to scan the
entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less code than
what we are doing now.

Also another very subtle bug in this area has been fixed.  It is possible
to catch a task in the middle of de_thread where a thread is assuming the
thread of it's thread group leader.  This patch carefully handles that case
so if we hit it we don't fail to return the pid, that is undergoing the
de_thread dance.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

[oleg@tv-sign.ru: fix it]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:12 -07:00
Dave Kleikamp
63f83c9fcf JFS: White space cleanup
Removed trailing spaces & tabs, and spaces preceding tabs.
Also a couple very minor comment cleanups.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from f74156539964d7b3d5164fdf8848e6a682f75b97 commit)
2006-10-02 09:55:27 -05:00
Akinobu Mita
087387f90f [PATCH] JFS: return correct error when i-node allocation failed
I have seen confusing behavior on JFS when I injected many intentional
slab allocation errors. The cp command failed with no disk space error
with enough disk space.

This patch makes:

- change the return value in case slab allocation failures happen
  from -ENOSPC to -ENOMEM

- ialloc() return error code so that the caller can know the reason
  of failures

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from 2b46f77976f798f3fe800809a1d0ed38763c71c8 commit)
2006-10-02 09:51:01 -05:00
Tony Breeds
2a6968a978 JFS: Remove shadow variable from fs/jfs/jfs_txnmgr.c:xtLog()
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
(cherry picked from bdc3d9e5af7d9c105be734dd7b5c3f1d9425a15a commit)
2006-10-02 09:50:51 -05:00
Steven Whitehouse
d00223f169 [GFS2] Fix code style/indent in ops_file.c
Fix a couple of minor issues.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 10:28:05 -04:00
Andrew Morton
930cc237d6 [GFS2] streamline-generic_file_-interfaces-and-filemap gfs fix
Fix GFS for streamline-generic_file_-interfaces-and-filemap.patch

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 09:02:54 -04:00
Badari Pulavarty
9c9eb21eee [GFS2] Remove readv/writev methods and use aio_read/aio_write instead (gfs bits)
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 09:02:12 -04:00
Steven Whitehouse
59458f40e2 Merge branch 'master' into gfs2 2006-10-02 08:45:08 -04:00
Andi Kleen
d025c9db7f [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern
Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run.  The core dump will be
written to the standard input of that program instead of to a file.

This is useful for having automatic core dump analysis without filling up
disks.  The program can do some simple analysis and save only a summary of
the core dump.

The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes comes from allowing core dumps without seeks.  They are
fairly straight forward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour.  I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps?  I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way.  I ran out of time before doing that though.

The demo prototype scripts weren't very good.  If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance).  If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.

Alan sayeth:

  I don't believe that piping as such as neccessarily the right model, but
  the ability to intercept and processes core dumps from user space is asked
  for by many enterprise users as well.  They want to know about, capture,
  analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Andi Kleen
d6cbd281d1 [PATCH] Some cleanup in the pipe code
Split the big and hard to read do_pipe function into smaller pieces.

This creates new create_write_pipe/free_write_pipe/create_read_pipe
functions.  These functions are made global so that they can be used by
other parts of the kernel.

The resulting code is more generic and easier to read and has cleaner error
handling and less gotos.

[akpm@osdl.org: cleanup]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Dave Hansen
ce71ec3684 [PATCH] r/o bind mounts: monitor zeroing of i_nlink
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation.  We need to catch these in addition to the
decrement operations.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Mark Fasheh
17ff785691 [PATCH] r/o bind mounts: clean up OCFS2 nlink handling
OCFS2 does some operations on i_nlink, then reverts them if some of its
operations fail to complete.  This does not fit in well with the
drop_nlink() logic where we expect i_nlink to stay at zero once it gets
there.

So, delay all of the nlink operations until we're sure that the operations
have completed.  Also, introduce a small helper to check whether an inode
has proper "unlinkable" i_nlink counts no matter whether it is a directory
or regular inode.

This patch is broken out from the others because it does contain some
logical changes.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
d8c76e6f45 [PATCH] r/o bind mount prepwork: inc_nlink() helper
This is mostly included for parity with dec_nlink(), where we will have some
more hooks.  This one should stay pretty darn straightforward for now.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
9a53c3a783 [PATCH] r/o bind mounts: unlink: monitor i_nlink
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.

We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.

So, add a little helper function to do the decrements.  We'll tie into it in a
bit to note when i_nlink hits zero.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
aab520e2f6 [PATCH] r/o bind mount prepwork: move open_namei()'s vfs_create()
The code around vfs_create() in open_namei() is getting a bit too complex.
Right now, there is at least the reference count on the dentry, and the
i_mutex to worry about.  Soon, we'll also have mnt_writecount.

So, break the vfs_create() call out of open_namei(), and into a helper
function.  This duplicates the call to may_open(), but that isn't such a bad
thing since the arguments (acc_mode and flag) were being heavily massaged
anyway.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
6902d925d5 [PATCH] r/o bind mounts: prepare for write access checks: collapse if()
We're shortly going to be adding a bunch more permission checks in these
functions.  That requires adding either a bunch of new if() conditions, or
some gotos.  This patch collapses existing if()s and uses gotos instead to
prepare for the upcoming changes.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Jay Lan
8f0ab51479 [PATCH] csa: convert CONFIG tag for extended accounting routines
There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.

Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Badari Pulavarty
eed4e51fb6 [PATCH] Add vector AIO support
This work is initially done by Zach Brown to add support for vectored aio.
These are the core changes for AIO to support
IOCB_CMD_PREADV/IOCB_CMD_PWRITEV.

[akpm@osdl.org: huge build fix]
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Badari Pulavarty
543ade1fc9 [PATCH] Streamline generic_file_* interfaces and filemap cleanups
This patch cleans up generic_file_*_read/write() interfaces.  Christoph
Hellwig gave me the idea for this clean ups.

In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
do_sync_read/ do_sync_write() as their .read/.write methods.  This allows us
to cleanup all variants of generic_file_* routines.

Final available interfaces:

generic_file_aio_read() - read handler
generic_file_aio_write() - write handler
generic_file_aio_write_nolock() - no lock write handler

__generic_file_aio_write_nolock() - internal worker routine

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Badari Pulavarty
ee0b3e671b [PATCH] Remove readv/writev methods and use aio_read/aio_write instead
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Badari Pulavarty
027445c372 [PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jeff Mahoney
9ea0f9499d [PATCH] reiserfs: eliminate minimum window size for bitmap searching
When a file system becomes fragmented (using MythTV, for example), the
bigalloc window searching ends up causing huge performance problems.  In a
file system presented by a user experiencing this bug, the file system was
90% free, but no 32-block free windows existed on the entire file system.
This causes the allocator to scan the entire file system for each 128k
write before backing down to searching for individual blocks.

In the end, finding a contiguous window for all the blocks in a write is an
advantageous special case, but one that can be found naturally when such a
window exists anyway.

This patch removes the bigalloc window searching, and has been proven to
fix the test case described above.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jeff Mahoney
5a2618e6a9 [PATCH] reiserfs: use generic_file_open for open() checks
The other common disk-based file systems (I checked ext[23], xfs, jfs)
check to ensure that opens of files > 2 GB fail unless O_LARGEFILE is
specified.  They check via generic_file_open or their own open routine.

ReiserFS doesn't have an f_op->open defined, and as such, it's possible to
open files > 2 GB without O_LARGEFILE.

This patch adds the f_op->open member to conform with the expected
behavior.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00