1
Commit Graph

23572 Commits

Author SHA1 Message Date
Fred Isaman
31e6306a40 pnfsblock: note written INVAL areas for layoutcommit
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
650e2d39bd pnfsblock: bl_write_pagelist
Note: When upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as error. It should do
what can be done and  leave the rest to the upper layer. To do so, we
should set rdata/wdata->res.count properly.

When upper layer re-send the read/write request to finish the rest
part of the request, pgbase is the position where we should start at.

[pnfsblock: bl_write_pagelist support functions]
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>

[SQUASHME: pnfsblock: mds_offset is set in the generic layer]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>

[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fixup blksize alignment in bl_setup_layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
9549ec01b0 pnfsblock: bl_read_pagelist
Note: When upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as error. It should do
what can be done and  leave the rest to the upper layer. To do so, we
should set rdata/wdata->res.count properly.

When upper layer re-send the read/write request to finish the rest
part of the request, pgbase is the position where we should start at.

[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: read path error handling]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new read_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
b2be7811dd pnfsblock: cleanup_layoutcommit
In blocklayout driver. There are two things happening
while layoutcommit/cleanup.
1. the modified extents are encoded.
2. On cleanup the extents are put back on the layout rw
   extents list, for reads.

In the new system where actual xdr encoding is done in
encode_layoutcommit() directly into xdr buffer, these are
the new commit stages:

1. On setup_layoutcommit, the range is adjusted as before
   and a structure is allocated for communication with
   bl_encode_layoutcommit && bl_cleanup_layoutcommit
   (Generic layer provides a void-star to hang it on)

2. bl_encode_layoutcommit is called to do the actual
   encoding directly into xdr. The commit-extent-list is not
   freed and is stored on above structure.
   FIXME: The code is not yet converted to the new XDR cleanup

3. On cleanup the commit-extent-list is put back by a call
   to set_to_rw() as before, but with no need for XDR decoding
   of the list as before. And the commit-extent-list is freed.
   Finally allocated structure is freed.

[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfsblock: introduce bl_committing list]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
[pnfsblock: cleanup_layoutcommit wants a status parameter]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
90ace12ac4 pnfsblock: encode_layoutcommit
In blocklayout driver. There are two things happening
while layoutcommit/cleanup.
1. the modified extents are encoded.
2. On cleanup the extents are put back on the layout rw
   extents list, for reads.

In the new system where actual xdr encoding is done in
encode_layoutcommit() directly into xdr buffer, these are
the new commit stages:

1. On setup_layoutcommit, the range is adjusted as before
   and a structure is allocated for communication with
   bl_encode_layoutcommit && bl_cleanup_layoutcommit
   (Generic layer provides a void-star to hang it on)

2. bl_encode_layoutcommit is called to do the actual
   encoding directly into xdr. The commit-extent-list is not
   freed and is stored on above structure.
   FIXME: The code is not yet converted to the new XDR cleanup

3. On cleanup the commit-extent-list is put back by a call
   to set_to_rw() as before, but with no need for XDR decoding
   of the list as before. And the commit-extent-list is freed.
   Finally allocated structure is freed.

[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
[pnfsblock: prevent commit list corruption]
[pnfsblock: fix layoutcommit with an empty opaque]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
9f3770422c pnfsblock: merge rw extents
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
c1c2a4cd35 pnfsblock: add extent manipulation functions
Adds working implementations of various support functions
to handle INVAL extents, needed by writes, such as
bl_mark_sectors_init and bl_is_sector_init.

[pnfsblock: fix 64-bit compiler warnings for extent manipulation]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[Implement release_inval_marks]
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
6d742ba538 pnfsblock: bl_find_get_extent
Implement bl_find_get_extent(), one of the core extent manipulation
routines.

[pnfsblock: Lookup list entry of layouts and tags in reverse order]
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>

pnfsblock: fix print format warnings for sector_t and size_t

gcc spews warnings about these on x86_64, e.g.:
fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’
fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
e9437ccef9 pnfsblock: xdr decode pnfs_block_layout4
XDR decodes the block layout payload sent in LAYOUTGET result, storing
the result in an extent list.

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
2f9fd18260 pnfsblock: call and parse getdevicelist
Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO
for each device returned.

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfsblock: fix pnfs_deviceid references]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix print format warnings for sector_t and size_t]
[pnfs-block: #include <linux/vmalloc.h>]
[pnfsblock: no PNFS_NFS_SERVER]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfsblock: fix bug determining size of striped volume]
[pnfsblock: fix oops when using multiple devices]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: get rid of vmap and deviceid->area structure]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
03341d2cc9 pnfsblock: merge extents
Replace a stub, so that extents underlying the layouts are properly
added, merged, or ignored as necessary.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: delete the new node before put it]
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
a60d2ebd93 pnfsblock: lseg alloc and free
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Jim Rees
025a70ed65 pnfsblock: remove device operations
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[upcall bugfixes]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Jim Rees
fe0a9b7408 pnfsblock: add device operations
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[upcall bugfixes]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
9e69296999 pnfsblock: basic extent code
Adds structures and basic create/delete code for extents.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Benny Halevy
e9643fe80d pnfsblock: use pageio_ops api
[pnfsblock: use pnfs_generic_pg_init_read/write]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Fred Isaman
155e7524f2 pnfsblock: add blocklayout Kconfig option, Makefile, and stubs
Define a configuration variable to enable/disable compilation of the
block driver code.

Add the minimal structure for a pnfs block layout driver, and empty
list-heads that will hold the extent data

[pnfsblock: make NFS_V4_1 select PNFS_BLOCK]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs-block: fix CONFIG_PNFS_BLOCK dependencies]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: layout alloc and free]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: define module alias]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Andy Adamson
db29c08909 pnfs: cleanup_layoutcommit
This gives layout driver a chance to cleanup structures they put in at
encode_layoutcommit.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Fred Isaman
dae100c2b1 pnfs: ask for layout_blksize and save it in nfs_server
Block layout needs it to determine IO size.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Tao Guo <glorioustao@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Benny Halevy
738fd0f360 pnfs: add set-clear layoutdriver interface
To allow layout driver to issue getdevicelist at mount time, and clean up
at umount time.

[fixup non NFS_V4_1 set_pnfs_layoutdriver definition]
[pnfs: pass mntfh down the init_pnfs path]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Andy Adamson
7f11d8d38d pnfs: GETDEVICELIST
The block driver uses GETDEVICELIST

Signed-off-by: Andy Adamson <andros@netapp.com>
[pass struct nfs_server * to getdevicelist]
[get machince creds for getdevicelist]
[fix getdevicelist decode sizing]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao
3557c6c3be pnfs: use lwb as layoutcommit length
Using NFS4_MAX_UINT64 will break current protocol.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao
a9bae5666d pnfs: let layoutcommit handle a list of lseg
There can be multiple lseg per file, so layoutcommit should be
able to handle it.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao
9fa4075878 pnfs: save layoutcommit cred at layout header init
No need to save it for every lseg.
No need to save it at every pnfs_set_layoutcommit.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:14 -04:00
Peng Tao
acff588053 pnfs: save layoutcommit lwb at layout header
No need to save it for every lseg.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:14 -04:00
Bryan Schumaker
374e4e3ec3 Additional readdir cookie loop information
Print out the name of the file that triggers the cookie loop  message to
make it slightly easier to track down the cause.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-30 14:37:14 -04:00
Trond Myklebust
0c0308066c NFS: Fix spurious readdir cookie loop messages
If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.

The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Cc: stable@kernel.org [2.6.39+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-30 14:34:50 -04:00
Trond Myklebust
ed1e6211a0 NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
nfs_mark_return_delegation() is usually called without any locking, and
so it is not safe to dereference delegation->inode. Since the inode is
only used to discover the nfs_client anyway, it makes more sense to
have the callers pass a valid pointer to the nfs_server as a parameter.

Reported-by: Ian Kent <raven@themaw.net>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-25 15:37:29 -04:00
Jeff Layton
73ca1001ed nfs: don't use d_move in nfs_async_rename_done
If the task that initiated the sillyrename ends up being killed by a
fatal signal, then it will eventually return back to userspace and end
up releasing the i_mutex. d_move however needs to be done while holding
the i_mutex.

Instead of using d_move here, just unhash the old and new dentries to
prevent them from being found by lookups. With this change though, the
dentries are now incorrect post-rename and do not reflect the actual
name of the file on the server. I'm proceeding under the assumption
that since they are unhashed that this isn't really a problem.

In order for the sillydelete to still work though, the dname must be
copied earlier when setting up the sillydelete info, and the name must
be recopied if the sillydelete info has to be moved to a new dentry.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-25 15:00:21 -04:00
Stephen Rothwell
5f00bcb38e Merge branch 'master' into devel and apply fixup from Stephen Rothwell:
vfs/nfs: fixup for nfs_open_context change

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-25 14:53:52 -04:00
Aneesh Kumar K.V
48e370ff93 fs/9p: add 9P2000.L unlinkat operation
unlinkat - Remove a directory entry

size[4] Tunlinkat tag[2] dirfid[4] name[s] flag[4]
size[4] Runlinkat tag[2]

older Tremove have the below request format

size[4] Tremove tag[2] fid[4]

The remove message is used to remove a directory entry either file or directory
The remove opreation is actually a directory opertation and should ideally have
dirfid, if not we cannot represent the fid on server with anything other than
name. We will have to derive the directory name from fid in the Tremove request.

NOTE: The operation doesn't clunk the unlink fid.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:52 -05:00
Aneesh Kumar K.V
9e8fb38e7d fs/9p: add 9P2000.L renameat operation
renameat - change name of file or directory

size[4] Trenameat tag[2] olddirfid[4] oldname[s] newdirfid[4] newname[s]
size[4] Rrenameat tag[2]

older Trename have the below request format

size[4] Trename tag[2] fid[4] newdirfid[4] name[s]

The rename message is used to change the name of a file, possibly moving it
to a new directory. The rename opreation is actually a directory opertation
and should ideally have olddirfid, if not we cannot represent the fid on server
with anything other than name. We will have to derive the old directory name
from fid in the Trename request.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:51 -05:00
Aneesh Kumar K.V
ed80fcfac2 fs/9p: Always ask new inode in create
This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:50 -05:00
Prem Karat
a2dd43bb0d fs/9p: Fix invalid mount options/args
Without this fix, if any invalid mount options/args are passed while mouting
the 9p fs, no error (-EINVAL) is returned and default arg value is assigned.

This fix returns -EINVAL when an invalid arguement is found while parsing
mount options.

Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V
fd2421f544 fs/9p: When doing inode lookup compare qid details and inode mode bits.
This make sure we don't use wrong inode from the inode hash. The inode number
of the file deleted is reused by the next file system object created
and if we only use inode number for inode hash lookup we could end up
with wrong struct inode.

Also compare inode generation number. Not all Linux file system provide
st_gen in userspace. So it could be 0;

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V
2053d67c54 fs/9p: remove rename work around in 9p
Now that VFS does the right thing remove the work around.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:47 -05:00
Linus Torvalds
bbd9d6f7fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
  vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
  isofs: Remove global fs lock
  jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
  fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
  mm/truncate.c: fix build for CONFIG_BLOCK not enabled
  fs:update the NOTE of the file_operations structure
  Remove dead code in dget_parent()
  AFS: Fix silly characters in a comment
  switch d_add_ci() to d_splice_alias() in "found negative" case as well
  simplify gfs2_lookup()
  jfs_lookup(): don't bother with . or ..
  get rid of useless dget_parent() in btrfs rename() and link()
  get rid of useless dget_parent() in fs/btrfs/ioctl.c
  fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
  drivers: fix up various ->llseek() implementations
  fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
  Ext4: handle SEEK_HOLE/SEEK_DATA generically
  Btrfs: implement our own ->llseek
  fs: add SEEK_HOLE and SEEK_DATA flags
  reiserfs: make reiserfs default to barrier=flush
  ...

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
2011-07-22 19:02:39 -07:00
Konstantin Khlebnikov
5a9a43646c vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
Replace unclear (struct dentry *) to (struct file *) typecast with ERR_CAST() macro.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:13 -04:00
Jan Kara
d769b3c2ab isofs: Remove global fs lock
sbi->s_mutex isn't needed for isofs at all so we can just remove it. Generally,
since isofs is always mounted read-only, filesystem structure cannot change
under us.  So buffer_head contents stays constant after it's filled in. That
leaves us with possible changes of global data structures. Superblock changes
only during filesystem mount (even remount does not change it), inodes are only
filled in during reading from disk. So there are no changes of these structures
to bother about.

Arguments why sbi->s_mutex can be removed at each place:
isofs_readdir: Accesses sb, inode, filp, local variables => s_mutex not needed
isofs_lookup: Protected by directory's i_mutex. Accesses sb, inode, dentry,
  local variables => s_mutex not needed
rock_ridge_symlink_readpage: Protected by page lock. Accesses sb, inode,
  local variables => s_mutex not needed.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:12 -04:00
Al Viro
22ba747f66 jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
We don't generate IN_DELETE_SELF on victim of overwriting rename() if
it happens to be a directory.  Trivially fixed by doing to ->i_nlink
what we do ->pino_nlink a couple of lines later in jffs2_rename().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:11 -04:00
Al Viro
841590ce16 fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
On ramfs and other simple_rename() users IN_DELETE_SELF is not generated
for victim of overwriting rename() if it's is a directory.  Works on
most of the local filesystems and really trivial to fix...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:11 -04:00
Linus Torvalds
8209f53d79 Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc
* 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
  ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
  ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
  connector: add an event for monitoring process tracers
  ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
  ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
  ptrace_init_task: initialize child->jobctl explicitly
  has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
  ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
  ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
  ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented()
  ptrace: ptrace_reparented() should check same_thread_group()
  redefine thread_group_leader() as exit_signal >= 0
  do not change dead_task->exit_signal
  kill task_detached()
  reparent_leader: check EXIT_DEAD instead of task_detached()
  make do_notify_parent() __must_check, update the callers
  __ptrace_detach: avoid task_detached(), check do_notify_parent()
  kill tracehook_notify_death()
  make do_notify_parent() return bool
  ptrace: s/tracehook_tracer_task()/ptrace_parent()/
  ...
2011-07-22 15:06:50 -07:00
Linus Torvalds
c1f792a5bf Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits)
  xfs: add size update tracepoint to IO completion
  xfs: convert AIL cursors to use struct list_head
  xfs: remove confusing ail cursor wrapper
  xfs: use a cursor for bulk AIL insertion
  xfs: failure mapping nfs fh to inode should return ESTALE
  xfs: Remove the second parameter to xfs_sb_count()
  xfs: remove the dead XFS_DABUF_DEBUG code
  xfs: remove leftovers of the old btree tracing code
  xfs: remove the dead QUOTADEBUG code
  xfs: remove the unused xfs_buf_delwri_sort function
  xfs: remove wrappers around b_iodone
  xfs: remove wrappers around b_fspriv
  xfs: add a proper transaction pointer to struct xfs_buf
  xfs: factor out xfs_da_grow_inode_int
  xfs: factor out xfs_dir2_leaf_find_stale
  xfs: cleanup struct xfs_dir2_free
  xfs: reshuffle dir2 headers
  xfs: start periodic workers later
  Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"
  xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact()
  ...
2011-07-22 13:16:33 -07:00
Linus Torvalds
6aaf4404ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: don't limit active work items
  dlm: use workqueue for callbacks
  dlm: remove deadlock debug print
  dlm: improve rsb searches
  dlm: keep lkbs in idr
  dlm: fix kmalloc args
  dlm: don't do pointless NULL check, use kzalloc and fix order of arguments
  dlm: dump address of unknown node
  dlm: use vmalloc for hash tables
  dlm: show addresses in configfs
2011-07-22 13:16:07 -07:00
Linus Torvalds
ba1f9db908 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
  hfsplus: ensure bio requests are not smaller than the hardware sectors
  hfsplus: Add additional range check to handle on-disk corruptions
  hfsplus: Add error propagation for hfsplus_ext_write_extent_locked
  hfsplus: add error checking for hfs_find_init()
  hfsplus: lift the 2TB size limit
  hfsplus: fix overflow in hfsplus_read_wrapper
  hfsplus: fix overflow in hfsplus_get_block
  hfsplus: assignments inside `if' condition clean-up
2011-07-22 13:12:17 -07:00
Linus Torvalds
49302baa64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: combine duplicated block freeing routines
  GFS2: Add S_NOSEC support
  GFS2: Automatically adjust glock min hold time
  GFS2: Cache dir hash table in a contiguous buffer
2011-07-22 13:10:41 -07:00
Linus Torvalds
59a7ac1211 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (32 commits)
  MAINTAINERS: change e-mail of Adrian Hunter
  UBIFS: fix master node recovery
  UBIFS: improve power cut emulation testing
  UBIFS: rename recovery testing variables
  UBIFS: remove custom list of superblocks
  UBIFS: stop re-defining UBI operations
  UBIFS: switch to I/O helpers
  UBIFS: switch to ubifs_leb_write
  UBIFS: switch to ubifs_leb_read
  UBIFS: introduce more I/O helpers
  UBIFS: always print stacktrace when switching to R/O mode
  UBIFS: remove unused and unneeded debugging function
  UBIFS: add global debugfs knobs
  UBIFS: introduce debugfs helpers
  UBIFS: re-arrange debugging code a bit
  UBIFS: be more informative in failure mode
  UBIFS: switch self-check knobs to debugfs
  UBIFS: lessen amount of debugging check types
  UBIFS: introduce helper functions for debugging checks and tests
  UBIFS: amend debugging inode size check function prototype
  ...
2011-07-22 13:09:35 -07:00
Seth Forshee
6596528e39 hfsplus: ensure bio requests are not smaller than the hardware sectors
Currently all bio requests are 512 bytes, which may fail for media
whose physical sector size is larger than this. Ensure these
requests are not smaller than the block device logical block size.

BugLink: http://bugs.launchpad.net/bugs/734883
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-07-22 16:37:44 +02:00
Naohiro Aota
aac4e4198e hfsplus: Add additional range check to handle on-disk corruptions
'recoff' is read from disk and used for an argument to memcpy, so if
the value read from disk is larger than the page size, it result to
"general protection fault". This patch add additional range check for
the value, so that disk fuzz won't cause such fault.

Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-07-22 16:36:56 +02:00
Oleg Nesterov
eac1b5e57d ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
Test-case:

	void *tfunc(void *arg)
	{
		execvp("true", NULL);
		return NULL;
	}

	int main(void)
	{
		int pid;

		if (fork()) {
			pthread_t t;

			kill(getpid(), SIGSTOP);

			pthread_create(&t, NULL, tfunc, NULL);

			for (;;)
				pause();
		}

		pid = getppid();
		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);

		while (wait(NULL) > 0)
			ptrace(PTRACE_CONT, pid, 0,0);

		return 0;
	}

It is racy, exit_notify() does __wake_up_parent() too. But in the
likely case it triggers the problem: de_thread() does release_task()
and the old leader goes away without the notification, the tracer
sleeps in do_wait() without children/tracees.

Change de_thread() to do __wake_up_parent(traced_leader->parent).
Since it is already EXIT_DEAD we can do this without ptrace_unlink(),
EXIT_DEAD threads do not exist from do_wait's pov.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-07-22 15:10:49 +02:00