linux

Author	SHA1	Message	Date
J. Bruce Fields	ebc63e531c	svcrpc: fix list-corrupting race on nfsd shutdown After commit `3262c816a3` "[PATCH] knfsd: split svc_serv into pools", svc_delete_xprt (then svc_delete_socket) no longer removed its xpt_ready (then sk_ready) field from whatever list it was on, noting that there was no point since the whole list was about to be destroyed anyway. That was mostly true, but forgot that a few svc_xprt_enqueue()'s might still be hanging around playing with the about-to-be-destroyed list, and could get themselves into trouble writing to freed memory if we left this xprt on the list after freeing it. (This is actually functionally identical to a patch made first by Ben Greear, but with more comments.) Cc: stable@kernel.org Cc: gnb@fmeh.org Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-15 18:58:46 -04:00
J. Bruce Fields	99de8ea962	rpc: keep backchannel xprt as long as server connection Multiple backchannels can share the same tcp connection; from rfc 5661 section 2.10.3.1: A connection's association with a session is not exclusive. A connection associated with the channel(s) of one session may be simultaneously associated with the channel(s) of other sessions including sessions associated with other client IDs. However, multiple backchannels share a connection, they must all share the same xid stream (hence the same rpc_xprt); the only way we have to match replies with calls at the rpc layer is using the xid. So, keep the rpc_xprt around as long as the connection lasts, in case we're asked to use the connection as a backchannel again. Requests to create new backchannel clients over a given server connection should results in creating new clients that reuse the existing rpc_xprt. But to start, just reject attempts to associate multiple rpc_xprt's with the same underlying bc_xprt. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-11 15:04:10 -05:00
J. Bruce Fields	9e701c6109	svcrpc: simpler request dropping Currently we use -EAGAIN returns to determine when to drop a deferred request. On its own, that is error-prone, as it makes us treat -EAGAIN returns from other functions specially to prevent inadvertent dropping. So, use a flag on the request instead. Returning an error on request deferral is still required, to prevent further processing, but we no longer need worry that an error return on its own could result in a drop. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:49:22 -05:00
NeilBrown	7c96aef759	sunrpc: remove xpt_pool The xpt_pool field is only used for reporting BUGs. And it isn't used correctly. In particular, when it is cleared in svc_xprt_received before XPT_BUSY is cleared, there is no guarantee that either the compiler or the CPU might not re-order to two assignments, just setting xpt_pool to NULL after XPT_BUSY is cleared. If a different cpu were running svc_xprt_enqueue at this moment, it might see XPT_BUSY clear and then xpt_pool non-NULL, and so BUG. This could be fixed by calling smp_mb__before_clear_bit() before the clear_bit. However as xpt_pool isn't really used, it seems safest to simply remove xpt_pool. Another alternate would be to change the clear_bit to clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-17 15:48:18 -05:00
J. Bruce Fields	ec66ee3797	Merge commit 'v2.6.37-rc6' into for-2.6.38	2010-12-17 13:29:07 -05:00
NeilBrown	ed2849d3ec	sunrpc: prevent use-after-free on clearing XPT_BUSY When an xprt is created, it has a refcount of 1, and XPT_BUSY is set. The refcount is not owned by the thread that created the xprt (as is clear from the fact that creators never put the reference). Rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set, (And XPT_BUSY is clear) that initial reference is dropped and the xprt can be freed. So when a creator clears XPT_BUSY it is dropping its only reference and so must not touch the xprt again. However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY reference on a new xprt), calls svc_xprt_recieved. This clears XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference. This is dangerous and has been seen to leave svc_xprt_enqueue working with an xprt containing garbage. So we need to hold an extra counted reference over that call to svc_xprt_received. For safety, any time we clear XPT_BUSY and then use the xprt again, we first get a reference, and the put it again afterwards. Note that svc_close_all does not need this extra protection as there are no threads running, and the final free can only be called asynchronously from such a thread. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-07 20:39:55 -05:00
J. Bruce Fields	9c335c0b8d	svcrpc: fix wspace-checking race We call svc_xprt_enqueue() after something happens which we think may require handling from a server thread. To avoid such events being lost, svc_xprt_enqueue() must guarantee that there will be a svc_serv() call from a server thread following any such event. It does that by either waking up a server thread itself, or checking that XPT_BUSY is set (in which case somebody else is doing it). But the check of XPT_BUSY could occur just as someone finishes processing some other event, and just before they clear XPT_BUSY. Therefore it's important not to clear XPT_BUSY without subsequently doing another svc_export_enqueue() to check whether the xprt should be requeued. The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing an event to be missed in situations like: data arrives call svc_tcp_data_ready(): call svc_xprt_enqueue(): set BUSY find no write space svc_reserve(): free up write space call svc_enqueue(): test BUSY clear BUSY So, instead, check wspace in the same places that the state flags are checked: before taking BUSY, and in svc_receive(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:12 -05:00
J. Bruce Fields	b176331627	svcrpc: svc_close_xprt comment Neil Brown had to explain to me why we do this here; record the answer for posterity. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:12 -05:00
J. Bruce Fields	f8c0d226fe	svcrpc: simplify svc_close_all There's no need to be fooling with XPT_BUSY now that all the threads are gone. The list_del_init() here could execute at the same time as the svc_xprt_enqueue()'s list_add_tail(), with undefined results. We don't really care at this point, but it might result in a spurious list-corruption warning or something. And svc_close() isn't adding any value; just call svc_delete_xprt() directly. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:11 -05:00
J. Bruce Fields	ca7896cd83	nfsd4: centralize more calls to svc_xprt_received Follow up on `b48fa6b991` by moving all the svc_xprt_received() calls for the main xprt to one place. The clearing of XPT_BUSY here is critical to the correctness of the server, so I'd prefer it to be obvious where we do it. The only substantive result is moving svc_xprt_received() after svc_receive_deferred(). Other than a (likely insignificant) delay waking up the next thread, that should be harmless. Also reshuffle the exit code a little to skip a few other steps that we don't care about the in the svc_delete_xprt() case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:11 -05:00
J. Bruce Fields	62bac4af3d	svcrpc: don't set then immediately clear XPT_DEFERRED There's no harm to doing this, since the only caller will immediately call svc_enqueue() afterwards, ensuring we don't miss the remaining deferred requests just because XPT_DEFERRED was briefly cleared. But why not just do this the simple way? Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:11 -05:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
J. Bruce Fields	01dba075d5	svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue If any xprt marked DEAD is also left BUSY for the rest of its life, then the XPT_DEAD check here is superfluous--we'll get the same result from the XPT_BUSY check just after. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:33 -04:00
J. Bruce Fields	ac9303eb74	svcrpc: assume svc_delete_xprt() called only once As long as DEAD exports are left BUSY, and svc_delete_xprt is called only with BUSY held, then svc_delete_xprt() will never be called on an xprt that is already DEAD. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:32 -04:00
J. Bruce Fields	7e4fdd0744	svcrpc: never clear XPT_BUSY on dead xprt Once an xprt has been deleted, there's no reason to allow it to be enqueued--at worst, that might cause the xprt to be re-added to some global list, resulting in later corruption. Also, note this leaves us with no need for the reference-count manipulation here. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:58:40 -04:00
Pavel Emelyanov	8f3a6de313	sunrpc: Turn list_for_each-s into the ..._entry-s Saves some lines of code and some branticks when reading one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:16 -04:00
J. Bruce Fields	edc7a89403	nfsd: provide callbacks on svc_xprt deletion NFSv4.1 needs warning when a client tcp connection goes down, if that connection is being used as a backchannel, so that it can warn the client that it has lost the backchannel connection. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-10-01 19:29:44 -04:00
Pavel Emelyanov	62832c039e	sunrpc: Pull net argument downto svc_create_socket After this the socket creation in it knows the context. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:55 -04:00
Pavel Emelyanov	fc5d00b04a	sunrpc: Add net argument to svc_create_xprt Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:54 -04:00
Pavel Emelyanov	4fb8518bda	sunrpc: Tag svc_xprt with net The transport representation should be per-net of course. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-09-27 10:16:12 -04:00
Pavel Emelyanov	e3bfca01c1	sunrpc: Make xprt auth cache release work with the xprt This is done in order to facilitate getting the ip_map_cache from which to put the ip_map. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-09-27 10:16:11 -04:00
J. Bruce Fields	6610f720e9	svcrpc: minor cache cleanup Pull out some code into helper functions, fix a typo. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-09-07 20:19:12 -04:00
NeilBrown	f16b6e8d83	sunrpc/cache: allow threads to block while waiting for cache update. The current practice of waiting for cache updates by queueing the whole request to be retried has (at least) two problems. 1/ With NFSv4, requests can be quite complex and re-trying a whole request when a later part fails should only be a last-resort, not a normal practice. 2/ Large requests, and in particular any 'write' request, will not be queued by the current code and doing so would be undesirable. In many cases only a very sort wait is needed before the cache gets valid data. So, providing the underlying transport permits it by setting ->thread_wait, arrange to wait briefly for an upcall to be completed (as reflected in the clearing of CACHE_PENDING). If the short wait was not long enough and CACHE_PENDING is still set, fall back on the old approach. The 'thread_wait' value is set to 5 seconds when there are spare threads, and 1 second when there are no spare threads. These values are probably much higher than needed, but will ensure some forward progress. Note that as we only request an update for a non-valid item, and as non-valid items are updated in place it is extremely unlikely that cache_check will return -ETIMEDOUT. Normally cache_defer_req will sleep for a short while and then find that the item is_valid. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-09-07 19:22:07 -04:00
J. Bruce Fields	5306293c9c	Merge commit 'v2.6.34-rc6' Conflicts: fs/nfsd/nfs4callback.c	2010-05-04 11:29:05 -04:00
Neil Brown	b48fa6b991	sunrpc: centralise most calls to svc_xprt_received svc_xprt_received must be called when ->xpo_recvfrom has finished receiving a message, so that the XPT_BUSY flag will be cleared and if necessary, requeued for further work. This call is currently made in each ->xpo_recvfrom function, often from multiple different points. In each case it is the earliest point on a particular path where it is known that the protection provided by XPT_BUSY is no longer needed. However there are (still) some error paths which do not call svc_xprt_received, and requiring each ->xpo_recvfrom to make the call does not encourage robustness. So: move the svc_xprt_received call to be made just after the call to ->xpo_recvfrom(), and move it of the various ->xpo_recvfrom methods. This means that it may not be called at the earliest possible instant, but this is unlikely to be a measurable performance issue. Note that there are still other calls to svc_xprt_received as it is also needed when an xprt is newly created. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-05-03 08:33:00 -04:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
J. Bruce Fields	788e69e548	svcrpc: don't hold sv_lock over svc_xprt_put() svc_xprt_put() can call tcp_close(), which can sleep, so we shouldn't be holding this lock. In fact, only the xpt_list removal and the sv_tmpcnt decrement should need the sv_lock here. Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-03-29 21:02:31 -04:00
J. Bruce Fields	1b644b6e6f	Revert "sunrpc: move the close processing after do recvfrom method" This reverts commit `b0401d7253`, which moved svc_delete_xprt() outside of XPT_BUSY, and allowed it to be called after svc_xpt_recived(), removing its last reference and destroying it after it had already been queued for future processing. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-02-28 16:39:30 -05:00
J. Bruce Fields	f5822754ea	Revert "sunrpc: fix peername failed on closed listener" This reverts commit `b292cf9ce7`. The commit that it attempted to patch up, `b0401d7253`, was fundamentally wrong, and will also be reverted. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-02-28 16:39:15 -05:00
Neil Brown	ab1b18f70a	sunrpc: remove unnecessary svc_xprt_put The 'struct svc_deferred_req's on the xpt_deferred queue do not own a reference to the owning xprt. This is seen in svc_revisit which is where things are added to this queue. dr->xprt is set to NULL and the reference to the xprt it put. So when this list is cleaned up in svc_delete_xprt, we mustn't put the reference. Also, replace the 'for' with a 'while' which is arguably simpler and more likely to compile efficiently. Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-02-26 17:42:46 -05:00
Chuck Lever	6871790815	SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found" write_ports() converts svc_create_xprt()'s ENOENT error return to EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error message that makes sense. It turns out that several of the other kernel APIs rpc.nfsd use can also return ENOENT from svc_create_xprt(), by way of lockd_up(). On the client side, an NFSv2 or NFSv3 mount request can also return the result of lockd_up(). This error may also be returned during an NFSv4 mount request, since the NFSv4 callback service uses svc_create_xprt() to create the callback listener. An ENOENT error return results in a confusing error message from the mount command. Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-01-26 17:59:21 -05:00
Chuck Lever	d6783b2b6c	SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt() Clean up: Bruce observed we have more or less common logic in each of svc_create_xprt()'s callers: the check to create an IPv6 RPC listener socket only if CONFIG_IPV6 is set. I'm about to add another case that does just the same. If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt() call sites can get rid of the "#ifdef" ugliness, and can use the same logic with or without IPv6 support available in the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-01-26 17:56:43 -05:00
Linus Torvalds	93939f4e5d	Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux * 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: sunrpc: fix peername failed on closed listener nfsd: make sure data is on disk before calling ->fsync nfsd: fix "insecure" export option	2010-01-06 18:10:15 -08:00
Xiaotian Feng	b292cf9ce7	sunrpc: fix peername failed on closed listener There're some warnings of "nfsd: peername failed (err 107)!" socket error -107 means Transport endpoint is not connected. This warning message was outputed by svc_tcp_accept() [net/sunrpc/svcsock.c], when kernel_getpeername returns -107. This means socket might be CLOSED. And svc_tcp_accept was called by svc_recv() [net/sunrpc/svc_xprt.c] if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) { <snip> newxpt = xprt->xpt_ops->xpo_accept(xprt); <snip> So this might happen when xprt->xpt_flags has both XPT_LISTENER and XPT_CLOSE. Let's take a look at commit `b0401d72`, this commit has moved the close processing after do recvfrom method, but this commit also introduces this warnings, if the xpt_flags has both XPT_LISTENER and XPT_CLOSED, we should close it, not accpet then close. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-01-06 17:38:04 -05:00
Linus Torvalds	37c24b37fb	Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux * 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: (42 commits) nfsd: remove pointless paths in file headers nfsd: move most of nfsfh.h to fs/nfsd nfsd: remove unused field rq_reffh nfsd: enable V4ROOT exports nfsd: make V4ROOT exports read-only nfsd: restrict filehandles accepted in V4ROOT case nfsd: allow exports of symlinks nfsd: filter readdir results in V4ROOT case nfsd: filter lookup results in V4ROOT case nfsd4: don't continue "under" mounts in V4ROOT case nfsd: introduce export flag for v4 pseudoroot nfsd: let "insecure" flag vary by pseudoflavor nfsd: new interface to advertise export features nfsd: Move private headers to source directory vfs: nfsctl.c un-used nfsd #includes lockd: Remove un-used nfsd headers #includes s390: remove un-used nfsd #includes sparc: remove un-used nfsd #includes parsic: remove un-used nfsd #includes compat.c: Remove dependence on nfsd private headers ...	2009-12-16 10:43:34 -08:00
Joe Perches	f64f9e7192	net: Move && and \|\| to end of previous line Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-29 16:55:45 -08:00
J. Bruce Fields	78c210efde	Revert "knfsd: avoid overloading the CPU scheduler with enormous load averages" This reverts commit `59a252ff8c`. This helps in an entirely cached workload but not necessarily in workloads that require waiting on disk. Conflicts: include/linux/sunrpc/svc.h net/sunrpc/svc_xprt.c Reported-by: Simon Kirby <sim@hostway.ca> Tested-by: Jesper Krogh <jesper@krogh.cc> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-11-23 12:34:05 -05:00
Rahul Iyer	4cfc7e6019	nfsd41: sunrpc: Added rpc server-side backchannel handling When the call direction is a reply, copy the xid and call direction into the req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns rpc_garbage. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [get rid of CONFIG_NFSD_V4_1] [sunrpc: refactoring of svc_tcp_recvfrom] [nfsd41: sunrpc: create common send routine for the fore and the back channels] [nfsd41: sunrpc: Use free_page() to free server backchannel pages] [nfsd41: sunrpc: Document server backchannel locking] [nfsd41: sunrpc: remove bc_connect_worker()] [nfsd41: sunrpc: Define xprt_server_backchannel()[ [nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions] [nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()] [nfsd41: sunrpc: Don't auto close the server backchannel connection] [nfsd41: sunrpc: Remove unused functions] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: change bc_sock to bc_xprt] [nfsd41: sunrpc: move struct rpc_buffer def into a common header file] [nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex] [removed cosmetic changes] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: add new xprt class for nfsv4.1 backchannel] [sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> [reverted more cosmetic leftovers] [got rid of xprt_server_backchannel] [separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Cc: Trond Myklebust <trond.myklebust@netapp.com> [sunrpc: change idle timeout value for the backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Acked-by: Trond Myklebust <trond.myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-11 15:04:16 -04:00
Wei Yongjun	b0401d7253	sunrpc: move the close processing after do recvfrom method sunrpc: "Move close processing to a single place" (`d7979ae4a0`) moved the close processing before the recvfrom method. This may cause the close processing never to execute. So this patch moves it to the right place. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-27 17:18:38 -04:00
Ryusei Yamaguchi	ed2d8aed52	knfsd: Replace lock_kernel with a mutex in nfsd pool stats. lock_kernel() in knfsd was replaced with a mutex. The later commit `03cf6c9f49` ("knfsd: add file to export stats about nfsd pools") did not follow that change. This patch fixes the issue. Also move the get and put of nfsd_serv to the open and close methods (instead of start and stop methods) to allow atomic check and increment of reference count in the open method (where we can still return an error). Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Greg Banks <gnb@fmeh.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-08-25 12:39:37 -04:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Chuck Lever	335c54bdc4	NFSD: Prevent a buffer overflow in svc_xprt_names() The svc_xprt_names() function can overflow its buffer if it's so near the end of the passed in buffer that the "name too long" string still doesn't fit. Of course, it could never tell if it was near the end of the passed in buffer, since its only caller passes in zero as the buffer length. Let's make this API a little safer. Change svc_xprt_names() so it always checks for a buffer overflow, and change its only caller to pass in the correct buffer length. If svc_xprt_names() does overflow its buffer, it now fails with an ENAMETOOLONG errno, instead of trying to write a message at the end of the buffer. I don't like this much, but I can't figure out a clean way that's always safe to return some of the names, and an indication that the buffer was not long enough. The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is "File name too long". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:54:28 -04:00
H Hartley Sweeten	dcf1a3573e	net/sunrpc/svc_xprt.c: fix sparse warnings Fix the following sparse warnings in net/sunrpc/svc_xprt.c. warning: symbol 'svc_recv' was not declared. Should it be static? warning: symbol 'svc_drop' was not declared. Should it be static? warning: symbol 'svc_send' was not declared. Should it be static? warning: symbol 'svc_close_all' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-28 13:00:02 -04:00
Linus Torvalds	a63856252d	Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits) nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4 nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc nfsd41: Documentation/filesystems/nfs41-server.txt nfsd41: CREATE_EXCLUSIVE4_1 nfsd41: SUPPATTR_EXCLCREAT attribute nfsd41: support for 3-word long attribute bitmask nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify nfsd41: pass writable attrs mask to nfsd4_decode_fattr nfsd41: provide support for minor version 1 at rpc level nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap nfsd41: access_valid nfsd41: clientid handling nfsd41: check encode size for sessions maxresponse cached nfsd41: stateid handling nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op nfsd41: destroy_session operation nfsd41: non-page DRC for solo sequence responses nfsd41: Add a create session replay cache nfsd41: create_session operation ...	2009-04-06 13:25:56 -07:00
Andy Adamson	2f425878b6	nfsd: don't use the deferral service, return NFS4ERR_DELAY On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:12 -07:00
Chuck Lever	9652ada3fb	SUNRPC: Change svc_create_xprt() to take a @family argument The sv_family field is going away. Pass a protocol family argument to svc_create_xprt() instead of extracting the family from the passed-in svc_serv struct. Again, as this is a listener socket and not an address, we make this new argument an "int" protocol family, instead of an "sa_family_t." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:54:36 -04:00
Chuck Lever	156e62094a	SUNRPC: Clean up svc_find_xprt() calling sequence Clean up: add documentating comment and use appropriate data types for svc_find_xprt()'s arguments. This also eliminates a mixed sign comparison: @port was an int, while the return value of svc_xprt_local_port() is an unsigned short. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-28 15:53:57 -04:00
Greg Banks	03cf6c9f49	knfsd: add file to export stats about nfsd pools Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void )1) merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_()" by Harshula Jayasuriya. merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:42 -04:00
Greg Banks	59a252ff8c	knfsd: avoid overloading the CPU scheduler with enormous load averages Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374 Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:41 -04:00
Trond Myklebust	24c3767e41	SUNRPC: The sunrpc server code should not be used by out-of-tree modules Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-07 17:18:42 -05:00

1 2

76 Commits