1
Commit Graph

875 Commits

Author SHA1 Message Date
Hisashi Hifumi
229615def3 GFS2: Pagecache usage optimization on GFS2
I introduced "is_partially_uptodate" aops for GFS2.

A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.

I did a performance test using the sysbench.

#sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1
--file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0
--file-rw-ratio=1 run

-2.6.29-rc6
Test execution summary:
    total time:                          202.6389s
    total number of events:              200000
    total time taken by event execution: 2580.0480
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0129s
         max:                            49.5852s
         approx.  95 percentile:         0.0462s

-2.6.29-rc6-patched
Test execution summary:
    total time:                          177.8639s
    total number of events:              200000
    total time taken by event execution: 2419.0199
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0121s
         max:                            52.4306s
         approx.  95 percentile:         0.0444s

arch: ia64
pagesize: 16k
blocksize: 4k

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:25 +00:00
Hannes Eder
02ab172159 GFS2: fix sparse warning: Should it be static?
Impact: Make symbol static.

Fix this sparse warning:
  fs/gfs2/rgrp.c:188:5: warning: symbol 'gfs2_bitfit' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:25 +00:00
Hannes Eder
075ac44875 GFS2: fix sparse warnings: constant is so big it is ...
Fix this sparse warnings:
  fs/gfs2/rgrp.c:156:23: warning: constant 0xffffffffffffffff is so big it is unsigned long long
  fs/gfs2/rgrp.c:157:23: warning: constant 0xaaaaaaaaaaaaaaaa is so big it is unsigned long long
  fs/gfs2/rgrp.c:158:23: warning: constant 0x5555555555555555 is so big it is long long
  fs/gfs2/rgrp.c:194:20: warning: constant 0x5555555555555555 is so big it is long long
  fs/gfs2/rgrp.c:204:44: warning: constant 0x5555555555555555 is so big it is long long

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:24 +00:00
Steven Whitehouse
b9a9694570 GFS2: Support quota/noquota mount arguments
This adds support for "quota" and "noquota" mount options in addition to the
existing "quota=on/off/account" so that we are compatible with the names by
which these options are more generally known.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:23 +00:00
Steven Whitehouse
223b2b889f GFS2: Fix alignment issue and tidy gfs2_bitfit
An alignment issue with the existing bitfit algorithm was reported
on IA64. This patch attempts to fix that, and also to tidy up the
code a bit. There is now more documentation about how this works
and it has survived a number of different tests.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:22 +00:00
Steven Whitehouse
64d576ba23 GFS2: Add a "demote a glock" interface to sysfs
This adds a sysfs file called demote_rq to GFS2's
per filesystem directory. Its possible to use this
file to demote arbitrary glocks in exactly the same
way as if a request had come in from a remote node.

This is intended for testing issues relating to caching
of data under glocks. Despite that, the interface is
generic enough to send requests to any type of glock,
but be careful as its not always safe to send an
arbitrary message to an arbitrary glock. For that reason
and to prevent DoS, this interface is restricted to root
only.

The messages look like this:

<type>:<glocknumber> <mode>

Example:

echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq

Which means "please demote inode glock (type 2) number 13324 so that
I can get an EX (exclusive) lock". The lock modes are those which
would normally be sent by a remote node in its callback so if you
want to unlock a glock, you use EX, to demote to shared, use SH or PR
(depending on whether you like GFS2 or DLM lock modes better!).

If the glock doesn't exist, you'll get -ENOENT returned. If the
arguments don't make sense, you'll get -EINVAL returned.

The plan is that this interface will be used in combination with
the blktrace patch which I recently posted for comments although
it is, of course, still useful in its own right.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:22 +00:00
Steven Whitehouse
02e3cc70ec GFS2: Expose UUID via sysfs/uevent
Since we have a UUID, we ought to expose it to the user via sysfs
and uevents. We already have the fs name in both of these places
(a combination of the lock proto and lock table name) so if we add
the UUID as well, we have a full set.

For older filesystems (i.e. those created before mkfs.gfs2 was writing
UUIDs by default) the sysfs file will appear zero length, and no UUID
env var will be added to the uevents.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:21 +00:00
Steven Whitehouse
f15ab5619d GFS2: Support generation of discard requests
This patch allows GFS2 to generate discard requests for blocks which are
no longer useful to the filesystem (i.e. those which have been freed as
the result of an unlink operation). The requests are generated at the
time which those blocks become available for reuse in the filesystem.

In order to use this new feature, you have to specify the "discard"
mount option. The code coalesces adjacent blocks into a single extent
when generating the discard requests, thus generating the minimum
number.

If an error occurs when the request has been sent to the block device,
then it will print a message and turn off the requests for that
filesystem. If the problem is temporary, then you can use remount to
turn the option back on again. There is also a nodiscard mount option
so that you can use remount to turn discard requests off, if required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:20 +00:00
Steven Whitehouse
d8348de06f GFS2: Fix deadlock on journal flush
This patch fixes a deadlock when the journal is flushed and there
are dirty inodes other than the one which caused the journal flush.
Originally the journal flushing code was trying to obtain the
transaction glock while running the flush code for an inode glock.
We no longer require the transaction glock at this point in time
since we know that any attempt to get the transaction glock from
another node will result in a journal flush. So if we are flushing
the journal, we can be sure that the transaction lock is still
cached from when the transaction was started.

By inlining a version of gfs2_trans_begin() (minus the bit which
gets the transaction glock) we can avoid the deadlock problems
caused if there is a demote request queued up on the transaction
glock.

In addition I've also moved the umount rwsem so that it covers
the glock workqueue, since it all demotions are done by this
workqueue now. That fixes a bug on umount which I came across
while fixing the original problem.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:18 +00:00
Steven Whitehouse
e7c8707ea2 GFS2: Fix error path ref counting for root inode
We were keeping hold of an extra ref to the root inode in one
of the error paths, that resulted in a hang.

Reported-by: Nate Straz <nstraz@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Robert Peterson <rpeterso@redhat.com>
2009-03-24 11:21:17 +00:00
Steven Whitehouse
ac2425e7d3 GFS2: Remove unused field from glock
The time stamp field is unused in the glock now that we are
using a shrinker, so that we can remove it and save sizeof(unsigned long)
bytes in each glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:17 +00:00
Steven Whitehouse
f057f6cdf6 GFS2: Merge lock_dlm module into GFS2
This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
 o Reducing overhead by eliminating duplicated fields between structures
 o Simplifcation of the code (reduces the code size by a fair bit)
 o The locking interface is now the DLM interface itself as proposed
   some time ago.
 o Fewer lookups of glocks when processing replies from the DLM
 o Fewer memory allocations/deallocations for each glock
 o Scope to do further optimisations in the future (but this patch is
   more than big enough for now!)

Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem with out requiring the DLM.

This patch needs a lot of testing, hence my keeping it I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before its merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off the the last three months
and its passed a number of different tests so far.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:14 +00:00
Steven Whitehouse
22077f57de GFS2: Remove "double" locking in quota
We only really need a single spin lock for the quota data, so
lets just use the lru lock for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2009-03-24 11:21:13 +00:00
Abhijith Das
0a7ab79c5b GFS2: change gfs2_quota_scan into a shrinker
Deallocation of gfs2_quota_data objects now happens on-demand through a
shrinker instead of routinely deallocating through the quotad daemon.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:12 +00:00
Abhijith Das
2db2aac255 GFS2: Bring back lvb-related stuff to lock_nolock to support quotas
The quota code uses lvbs and this is currently not implemented in
lock_nolock, thereby causing panics when quota is enabled with
lock_nolock. This patch adds the relevant bits.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:11 +00:00
Steven Whitehouse
6f04c1c7fe GFS2: Fix remount argument parsing
The following patch fixes an issue relating to remount and argument
parsing. After this fix is applied, remount becomes atomic in that
it either succeeds changing the mount to the new state, or it fails
and leaves it in the old state. Previously it was possible for the
parsing of options to fail part way though and for the fs to be left
in a state where some of the new arguments had been applied, but some
had not.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 11:21:10 +00:00
Takashi Sato
c4be0c1dc4 filesystem freeze: add error handling of write_super_lockfs/unlockfs
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 16:54:42 -08:00
Benjamin Marzinski
c8f554b947 GFS2: Fix typo in gfs_page_mkwrite()
There is a typo in gfs2_page_mkwrite()

gfs2_write_alloc_required() expects pos to be the offset in bytes. However,
instead of the page index being shifted by by PAGE_CACHE_SHIFT, it was shifted
by (PAGE_CACHE_SIZE - inode->i_blkbits).  This patch simply shifts the page
index by the proper amount.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-07 08:58:28 +00:00
Steven Whitehouse
0027ce681e GFS2: LSF and LBD are now one and the same
As a result of this recent patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b3a6ffe16b5cc48abe7db8d04882dc45280eb693
We only need to depend on LBD.

Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-07 08:57:35 +00:00
Steven Whitehouse
e4fefbac6c GFS2: Set GFP_NOFS when allocating page on write
We need to ensure that we always set GFP_NOFS in this one
particular case when allocating pages for write.

Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-07 08:57:04 +00:00
Julia Lawall
eb8374e71f GFS2: Use DEFINE_SPINLOCK
SPIN_LOCK_UNLOCKED is deprecated.  The following makes the change suggested
in Documentation/spinlocks.txt

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
declarer name DEFINE_SPINLOCK;
identifier xxx_lock;
@@

- spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+ DEFINE_SPINLOCK(xxx_lock);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:45:02 +00:00
Steven Whitehouse
88a19ad066 GFS2: Fix use-after-free bug on umount (try #2)
This should solve the issue with the previous attempt at fixing this.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:19 +00:00
Steven Whitehouse
fefc03bfed Revert "GFS2: Fix use-after-free bug on umount"
This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43.

The original patch is causing problems in relation to order of
operations at umount in relation to jdata files. I need to fix
this a different way.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:18 +00:00
Steven Whitehouse
7ed122e42c GFS2: Streamline alloc calculations for writes
This patch removes some unused code, and make the calculation
of the number of blocks required conditional in order to reduce
the number of times this (potentially expensive) calculation
is done.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:17 +00:00
Steven Whitehouse
9a776db737 GFS2: Send useful information with uevent messages
In order to distinguish between two differing uevent messages
and to avoid using the (racy) method of reading status from
sysfs in future, this adds some status information to our
uevent messages.

Btw, before anybody says "sysfs isn't racy", I'm aware of that,
but the way that GFS2 was using it (send an ambiugous uevent and
then expect the receiver to read sysfs to find out the status
of the reported operation) was.

The additional benefit of using the new interface is that it
should be possible for a node to recover multiple journals
at the same time, since there is no longer any confusion as
to which journal the status belongs to.

At some future stage, when all the userland programs have been
converted, I intend to remove the old interface.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:15 +00:00
Steven Whitehouse
3af165ac4d GFS2: Fix use-after-free bug on umount
There was a use-after-free with the GFS2 super block during
umount. This patch moves almost all of the umount code from
->put_super into ->kill_sb, the only bit that cannot be moved
being the glock hash clearing which has to remain as ->put_super
due to umount ordering requirements. As a result its now obvious
that the kfree is the final operation, whereas before it was
hidden in ->put_super.

Also gfs2_jindex_free is then only referenced from a single file
so thats moved and marked static too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:14 +00:00
Steven Whitehouse
2e204703a1 GFS2: Remove ancient, unused code
Remove code that used to have something to do with initrd
but has been unused for a long time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:13 +00:00
Steven Whitehouse
2bfb6449b7 GFS2: Move four functions from super.c
The functions which are being moved can all be marked
static in their new locations, since they only have
a single caller each. Their new locations are more
logical than before and some of the functions are
small enough that the compiler might well inline them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:12 +00:00
Steven Whitehouse
b52896813c GFS2: Fix bug in gfs2_lock_fs_check_clean()
gfs2_lock_fs_check_clean() should not be calling gfs2_jindex_hold()
since it doesn't work like rindex hold, despite the comment. That
allows gfs2_jindex_hold() to be moved into ops_fstype.c where it
can be made static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:11 +00:00
Steven Whitehouse
fdd1062eba GFS2: Send some sensible sysfs stuff
We ought to inform the user of the locktable and lockproto for each
uevent we generate.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:10 +00:00
Steven Whitehouse
97cc1025b1 GFS2: Kill two daemons with one patch
This patch removes the two daemons, gfs2_scand and gfs2_glockd
and replaces them with a shrinker which is called from the VM.

The net result is that GFS2 responds better when there is memory
pressure, since it shrinks the glock cache at the same rate
as the VFS shrinks the dcache and icache. There are no longer
any time based criteria for shrinking glocks, they are kept
until such time as the VM asks for more memory and then we
demote just as many glocks as required.

There are potential future changes to this code, including the
possibility of sorting the glocks which are to be written back
into inode number order, to get a better I/O ordering. It would
be very useful to have an elevator based workqueue implementation
for this, as that would automatically deal with the read I/O cases
at the same time.

This patch is my answer to Andrew Morton's remark, made during
the initial review of GFS2, asking why GFS2 needs so many kernel
threads, the answer being that it doesn't :-) This patch is a
net loss of about 200 lines of code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:09 +00:00
Steven Whitehouse
9ac1b4d9b6 GFS2: Move gfs2_recoverd into recovery.c
By moving gfs2_recoverd, we can make an additional function static
and it also leaves only (the already scheduled for removal) gfs2_glockd
in daemon.c.

At the same time the declaration of gfs2_quotad is moved to quota.h
to reflect the new location of gfs2_quotad in a previous patch. Also
the recovery.h and quota.h headers are cleaned up.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:07 +00:00
Steven Whitehouse
813e0c46c9 GFS2: Fix "truncate in progress" hang
Following on from the recent clean up of gfs2_quotad, this patch moves
the processing of "truncate in progress" inodes from the glock workqueue
into gfs2_quotad. This fixes a hang due to the "truncate in progress"
processing requiring glocks in order to complete.

It might seem odd to use gfs2_quotad for this particular item, but
we have to use a pre-existing thread since creating a thread implies
a GFP_KERNEL memory allocation which is not allowed from the glock
workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
both scheduled for removal at some (hopefully not too distant) future
point. That leaves only gfs2_quotad whose workload is generally fairly
light and is easily adapted for this extra task.

Also, as a result of this change, it opens the way for a future patch to
make the reading of the inode's information asynchronous with respect to
the glock workqueue, which is another improvement that has been on the list
for some time now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:06 +00:00
Steven Whitehouse
37b2c8377c GFS2: Clean up & move gfs2_quotad
This patch is a clean up of gfs2_quotad prior to giving it an
extra job to do in addition to the current portfolio of updating
the quota and statfs information from time to time.

As a result it has been moved into quota.c allowing one of the
functions it calls to be made static. Also the clean up allows
the two existing functions to have separate timeouts and also
to coexist with its future role of dealing with the "truncate in
progress" inode flag.

The (pointless) setting of gfs2_quotad_secs is removed since we
arrange to only wake up quotad when one of the two timers expires.

In addition the struct gfs2_quota_data is moved into a slab cache,
mainly for easier debugging. It should also be possible to use
a shrinker in the future, rather than the current scheme of scanning
the quota data entries from time to time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:05 +00:00
Steven Whitehouse
fa75cedc3d GFS2: Add more detail to debugfs glock dumps
Although the glock dumps print quite a lot of information about
the glocks themselves, there are more things which can be
usefully added to the dump realting to the objects themselves.

This patch adds a few more fields to the inode and resource
group lines, which should be useful for debugging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:04 +00:00
Steven Whitehouse
73f749483e GFS2: Banish struct gfs2_rgrpd_host
This patch moves the final field so that we can get rid
of struct gfs2_rgrpd_host, as promised some time ago. Also
by rearranging the fields slightly, we are able to reduce
the size of the gfs2_rgrpd structure at the same time.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:03 +00:00
Steven Whitehouse
cfc8b54922 GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd
The second of three fields which need to move, in order
to remove the struct gfs2_rgrpd_host.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:02 +00:00
Steven Whitehouse
d8b71f7381 GFS2: Move rg_igeneration into struct gfs2_rgrpd
This moves one of the fields of struct gfs2_rgrpd_host into
the struct gfs2_rgrpd with the eventual aim of removing
the struct rgrpd_host completely.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:39:01 +00:00
Steven Whitehouse
383f01fbf4 GFS2: Banish struct gfs2_dinode_host
The final field in gfs2_dinode_host was the i_flags field. Thats
renamed to i_diskflags in order to avoid confusion with the existing
inode flags, and moved into the inode proper at a suitable location
to avoid creating a "hole".

At that point struct gfs2_dinode_host is no longer needed and as
promised (quite some time ago!) it can now be removed completely.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:59 +00:00
Steven Whitehouse
c9e9888677 GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize
This patch moved the i_size field from the gfs2_dinode_host and
following the ext3 convention renames it i_disksize.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:58 +00:00
Steven Whitehouse
3767ac21f4 GFS2: Move di_eattr into "proper" inode
This moves the di_eattr field out of gfs2_inode_host and
into the inode proper.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:57 +00:00
Steven Whitehouse
ad6203f2b4 GFS2: Move "entries" into "proper" inode
This moves the directory entry count into the proper inode.
Potentially we could get this to share the space used by
something else in the future, but this is one more step
on the way to removing the gfs2_dinode_host structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:56 +00:00
Steven Whitehouse
bcf0b5b348 GFS2: Move generation number into "proper" part of inode
This moves the generation number from the gfs2_dinode_host
into the gfs2_inode structure. Eventually the plan is to get
rid of the gfs2_dinode_host structure completely.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:55 +00:00
Harvey Harrison
55ba474dae GFS2: sparse annotation of gl->gl_spin
fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
fs/gfs2/glock.c:308:5:    context '*gl+28': wanted >= 1, got 0
fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
fs/gfs2/glock.c:529:2:    context '*gl+28': wanted >= 1, got 0
fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
fs/gfs2/glock.c:925:3:    context '*gl+28': wanted >= 1, got 0

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:50 +00:00
Steven Whitehouse
1bb7322fd0 GFS2: Fix up jdata writepage/delete_inode
There is a bug in writepage and delete_inode which allows jdata files to
invalidate pages from the address space without being in a transaction at
the time. This causes problems in case the pages are in the journal. This
patch fixes that case and prevents the resulting oops.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:49 +00:00
Steven Whitehouse
b276058371 GFS2: Rationalise header files
Move the contents of some headers which contained very
little into more sensible places, and remove the original
header files. This should make it easier to find things.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 07:38:48 +00:00
Steven Whitehouse
e9079cce20 GFS2: Support for FIEMAP ioctl
This patch implements the FIEMAP ioctl for GFS2. We can use the generic
code (aside from a lock order issue, solved as per Ted Tso's suggestion)
for which I've introduced a new variant of the generic function. We also
have one exception to deal with, namely stuffed files, so we do that
"by hand", setting all the required flags.

This has been tested with a modified (I could only find an old version) of
Eric's test program, and appears to work correctly.

This patch does not currently support FIEMAP of xattrs, but the plan is to add
that feature at some future point.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-01-05 07:38:46 +00:00
Nick Piggin
54566b2c15 fs: symlink write_begin allocation context fix
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
David Howells
3de7be3355 CRED: Wrap task credential accesses in the GFS2 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:53 +11:00
Christoph Hellwig
440037287c [PATCH] switch all filesystems over to d_obtain_alias
Switch all users of d_alloc_anon to d_obtain_alias.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:01 -04:00
Al Viro
3516586a42 [PATCH] make O_EXCL in nd->intent.flags visible in nd->flags
New flag: LOOKUP_EXCL.  Set before doing the final step of pathname
resolution on the paths that have LOOKUP_CREATE and O_EXCL.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:12:56 -04:00
Steven Whitehouse
a447c09324 vfs: Use const for kernel parser table
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe its been in -mm
since then.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 10:10:37 -07:00
Linus Torvalds
13dd7f876d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: choose better identifiers
  dlm: remove bkl
  dlm: fix address compare
  dlm: fix locking of lockspace list in dlm_scand
  dlm: detect available userspace daemon
  dlm: allow multiple lockspace creates
2008-10-10 11:13:55 -07:00
Steven Whitehouse
254db57f9b GFS2: Support for I/O barriers
This patch adds barrier support to GFS2. There is not a lot of change
really... we just add the barrier flag when we write journal header
blocks. If the underlying device refuses to support them, we fall back
to the previous way of doing things (wait for the I/O and hope) since
there is nothing else we can do. There is no user configuration,
barriers will always be on unless the device refuses to support them.
This seems a reasonable solution to me since this is a correctness
issue.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-26 10:23:22 +01:00
Steven Whitehouse
719ee34467 GFS2: high time to take some time over atime
Until now, we've used the same scheme as GFS1 for atime. This has failed
since atime is a per vfsmnt flag, not a per fs flag and as such the
"noatime" flag was not getting passed down to the filesystems. This
patch removes all the "special casing" around atime updates and we
simply use the VFS's atime code.

The net result is that GFS2 will now support all the same atime related
mount options of any other filesystem on a per-vfsmnt basis. We do lose
the "lazy atime" updates, but we gain "relatime". We could add lazy
atime to the VFS at a later date, if there is a requirement for that
variant still - I suspect relatime will be enough.

Also we lose about 100 lines of code after this patch has been applied,
and I have a suspicion that it will speed things up a bit, even when
atime is "on". So it seems like a nice clean up as well.

From a user perspective, everything stays the same except the loss of
the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
least, and to be honest I don't think anybody ever used it) and that a
number of options which were ignored before now work correctly.

Please let me know if you've got any comments. I'm pushing this out
early so that you can all see what my plans are.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-18 13:53:59 +01:00
Steven Whitehouse
37ec89e83c GFS2: The war on bloat
The following patch shrinks the gfs2_args structure which is embedded in
every GFS2 superblock. It cuts down the size of the options to a single
unsigned int (the 13 bits of bitfields will be rounded up to that size
by the compiler) from the current 11 unsigned ints. So on x86 thats 44
bytes shrinking to 4 bytes, in each and every GFS2 superblock.

Signed-off-by: Steven Whitehouse <swhitho@redhat.com>
2008-09-18 13:49:32 +01:00
Abhijith Das
acd2c8aa02 GFS2: GFS2 will panic if you misspell any mount options
The gfs2 superblock pointer is NULL after a failed mount. When control
eventually goes to gfs2_kill_sb, we dereference this NULL pointer. This
patch ensures that the gfs2 superblock pointer is not NULL before being
dereferenced in gfs2_kill_sb.

Signed-off-by:   Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-15 16:08:32 +01:00
Bob Peterson
acb57a3652 GFS2: Direct IO write at end of file error
This patch fixes a problem whereby a direct_io write doesn't fall
back to buffered write properly at end of file.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-15 10:31:54 +01:00
Julien Brunel
bd1eb8818c GFS2: Use an IS_ERR test rather than a NULL test
In case of error, the function gfs2_inode_lookup returns an
ERR pointer, but never returns a NULL pointer. So a NULL test that
necessarily comes after an IS_ERR test should be deleted, and a NULL
test that may come after a call to this function should be
strengthened by an IS_ERR test.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@match_bad_null_test@
expression x, E;
statement S1,S2;
@@
x = gfs2_inode_lookup(...)
... when != x = E
* if (x != NULL)
S1 else S2
// </smpl>

Signed-off-by:  Julien Brunel <brunel@diku.dk>
Signed-off-by:  Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-05 14:19:44 +01:00
Steven Whitehouse
dff5257473 GFS2: Fix race relating to glock min-hold time
In the case that a request for a glock arrives right after the
grant reply has arrived, it sometimes means that the gl_tstamp
field hasn't been updated recently enough. The net result is that
the min-hold time for the glock is ignored. If this happens
often enough, it leads to poor performance.

This patch adds an additional test, so that if the reply pending
bit is set on a glock, then it will select the maximum length of
time for the min-hold time, rather than looking at gl_tstamp.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-05 14:18:02 +01:00
David Teigland
0f8e0d9a31 dlm: allow multiple lockspace creates
Add a count for lockspace create and release so that create can
be called multiple times to use the lockspace from different places.
Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
the previous behavior of returning -EEXIST if the lockspace already
exists.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-08-28 11:49:15 -05:00
Steven Whitehouse
0188d6c580 GFS2: Fix & clean up GFS2 rename
This patch fixes a locking issue in the rename code by ensuring that we hold
the per sb rename lock over both directory and "other" renames which involve
different parent directories.

At the same time, this moved the (only called from one place) function
gfs2_ok_to_move into the file that its called from, so we can mark it
static. This should make a code a bit easier to follow.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Peter Staubach <staubach@redhat.com>
2008-08-27 13:33:10 +01:00
Bob Peterson
72dbf4790f GFS2: rm on multiple nodes causes panic
This patch fixes a problem whereby simultaneous unlink, rmdir,
rename and link operations (e.g. rm -fR *) from multiple nodes
on the same GFS2 file system can cause kernel panics, hangs,
and/or memory corruption.  It also gets rid of all the non-rgrp
calls to gfs2_glock_nq_m.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-08-13 10:00:12 +01:00
Steven Whitehouse
9b8df98fc8 GFS2: Fix metafs mounts
This patch is intended to fix the issues reported in bz #457798. Instead
of having the metafs as a separate filesystem, it becomes a second root
of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
is still possible (for backwards compatibility purposes) to mount it as
type gfs2meta. A new mount flag "meta" is introduced so that its possible
to tell the two cases apart in /proc/mounts.

As a result it becomes possible to mount type gfs2 with -o meta and
get the same result as mounting type gfs2meta. So it is possible to
mount just the metafs on its own. Currently if you do this, its then
impossible to mount the "normal" root of the gfs2 filesystem without
first unmounting the metafs root. I'm not sure if thats a feature or
a bug :-)

Either way, this is a great improvement on the previous scheme and I've
verified that it works ok with bind mounts on both the "normal" root
and the metafs root in various combinations.

There were also a bunch of functions in super.c which didn't belong there,
so this moves them into ops_fstype.c where they can be static. Hopefully
the mount/umount sequence is now more obvious as a result.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
2008-08-13 09:59:40 +01:00
Steven Whitehouse
c1e817d03a GFS2: Fix debugfs glock file iterator
Due to an incorrect iterator, some glocks were being missed from the
glock dumps obtained via debugfs. This patch fixes the problem and
ensures that we don't miss any glocks in future.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-08-13 09:59:10 +01:00
Al Viro
a569c711f6 [PATCH] don't pass nameidata to gfs2_lookupi()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:36 -04:00
Al Viro
e6305c43ed [PATCH] sanitize ->permission() prototype
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
  about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
  MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:14 -04:00
Alexey Dobriyan
51cc50685a SL*B: drop kmem cache argument from constructor
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:07 -07:00
Linus Torvalds
38c46578ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  [GFS2] Fix GFS2's use of do_div() in its quota calculations
  [GFS2] Remove unused declaration
  [GFS2] Remove support for unused and pointless flag
  [GFS2] Replace rgrp "recent list" with mru list
  [GFS2] Allow local DF locks when holding a cached EX glock
  [GFS2] Fix delayed demote race
  [GFS2] don't call permission()
  [GFS2] Fix module building
  [GFS2] Glock documentation
  [GFS2] Remove all_list from lock_dlm
  [GFS2] Remove obsolete conversion deadlock avoidance code
  [GFS2] Remove remote lock dropping code
  [GFS2] kernel panic mounting volume
  [GFS2] Revise readpage locking
  [GFS2] Fix ordering of args for list_add
  [GFS2] trivial sparse lock annotations
  [GFS2] No lock_nolock
  [GFS2] Fix ordering bug in lock_dlm
  [GFS2] Clean up the glock core
2008-07-15 10:38:46 -07:00
Jonathan Corbet
2fceef397f Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
David Howells
4abaca17e7 [GFS2] Fix GFS2's use of do_div() in its quota calculations
Fix GFS2's need_sync()'s use of do_div() on an s64 by using div_s64() instead.

This does assume that gt_quota_scale_den can be cast to an s32.

This was introduced by patch b3b94faa5f.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-11 14:35:01 +01:00
Li Xiaodong
a93a6ce242 [GFS2] Remove unused declaration
The implementation of gfs2_inode_attr_in is removed.
So remove its declaration.

Signed-off-by: Li Xiaodong <lixd@cn.fujitsu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-10 16:22:23 +01:00
Steven Whitehouse
c9f6a6bbc2 [GFS2] Remove support for unused and pointless flag
The ability to mark files for direct i/o access when opened
normally is both unused and pointless, so this patch removes
support for that feature.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-10 16:09:29 +01:00
Steven Whitehouse
9cabcdbd46 [GFS2] Replace rgrp "recent list" with mru list
This patch removes the "recent list" which is used during allocation
and replaces it with the (already existing) mru list used during
deletion. The "recent list" was not a true mru list leading to a number
of inefficiencies including a "next" function which made scanning the
list an order N^2 operation wrt to the number of list elements.

This should increase allocation performance with large numbers of rgrps.
Its also a useful preparation and cleanup before some further changes
which are planned in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-10 15:54:12 +01:00
Steven Whitehouse
209806aba9 [GFS2] Allow local DF locks when holding a cached EX glock
We already allow local SH locks while we hold a cached EX glock, so here
we allow DF locks as well. This works only because we rely on the VFS's
invalidation for locally cached data, and because if we hold an EX lock,
then we know that no other node can be caching data relating to this
file.

It dramatically speeds up initial writes to O_DIRECT files since we fall
back to buffered I/O for this and would otherwise bounce between DF and
EX modes on each and every write call. The lessons to be learned from
that are to ensure that (for the time being anyway) O_DIRECT files are
preallocated and that they are written to using reasonably large I/O
sizes. Even so this change fixes that corner case nicely

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-07 10:07:28 +01:00
Steven Whitehouse
265d529cef [GFS2] Fix delayed demote race
There is a race in the delayed demote code where it does the wrong thing
if a demotion to UN has occurred for other reasons before the delay has
expired. This patch adds an assert to catch that condition as well as
fixing the root cause by adding an additional check for the UN state.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2008-07-07 10:02:36 +01:00
Miklos Szeredi
f58ba88910 [GFS2] don't call permission()
GFS2 calls permission() to verify permissions after locks on the files
have been taken.

For this it's sufficient to call gfs2_permission() instead.  This
results in the following changes:

  - IS_RDONLY() check is not performed
  - IS_IMMUTABLE() check is not performed
  - devcgroup_inode_permission() is not called
  - security_inode_permission() is not called

IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
flag should provide protection against read-only remounts during
operations.  do_gfs2_set_flags() has been fixed to perform
mnt_want_write()/mnt_drop_write() to protect against remounting
read-only.

IS_IMMUTABLE has been added to gfs2_permission()

Repeating the security checks seems to be pointless, as they don't
normally change, and if they do, it's independent of the filesystem
state.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-03 10:22:01 +01:00
Andi Kleen
9465efc9e9 Remove BKL from remote_llseek v2
- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around. I changed the file systems who don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitely in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Open problem: 32bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyways because there's other paths who
modify it without BKL.

Do we need a special lock for the pos/f_version = 0 checks?

Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.

v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
    and factor duplicated code (suggested by hch)

Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-02 15:06:27 -06:00
Steven Whitehouse
f17172e001 [GFS2] Fix module building
Two lines missed from the previous patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:40:57 +01:00
Steven Whitehouse
31fcba00fe [GFS2] Remove all_list from lock_dlm
I discovered that we had a list onto which every lock_dlm
lock was being put. Its only function was to discover whether
we'd got any locks left after umount. Since there was already
a counter for that purpose as well, I removed the list. The
saving is sizeof(struct list_head) per glock - well worth
having.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:50 +01:00
Steven Whitehouse
b2cad26cfc [GFS2] Remove obsolete conversion deadlock avoidance code
This is only used by GFS1 so can be removed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:47 +01:00
Steven Whitehouse
1bdad60633 [GFS2] Remove remote lock dropping code
There are several reasons why this is undesirable:

 1. It never happens during normal operation anyway
 2. If it does happen it causes performance to be very, very poor
 3. It isn't likely to solve the original problem (memory shortage
    on remote DLM node) it was supposed to solve
 4. It uses a bunch of arbitrary constants which are unlikely to be
    correct for any particular situation and for which the tuning seems
    to be a black art.
 5. In an N node cluster, only 1/N of the dropped locked will actually
    contribute to solving the problem on average.

So all in all we are better off without it. This also makes merging
the lock_dlm module into GFS2 a bit easier.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:44 +01:00
Bob Peterson
9171f5a991 [GFS2] kernel panic mounting volume
This patch fixes Red Hat bugzilla bug 450156.

This started with a not-too-improbable mount failure because the
locking protocol was never set back to its proper "lock_dlm" after the
system was rebooted in the middle of a gfs2_fsck.  That left a
(purposely) invalid locking protocol in the superblock, which caused an
error when the file system was mounted the next time.

When there's an error mounting, vfs calls DQUOT_OFF, which calls
vfs_quota_off which calls gfs2_sync_fs.  Next, gfs2_sync_fs calls
gfs2_log_flush passing s_fs_info.  But due to the error, s_fs_info
had been previously set to NULL, and so we have the kernel oops.

My solution in this patch is to test for the NULL value before passing
it.  I tested this patch and it fixes the problem.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:41 +01:00
Steven Whitehouse
01b7c7ae88 [GFS2] Revise readpage locking
The previous attempt to fix the locking in readpage failed due
to the use of a "try lock" which resulted in occasional high
cpu usage during testing (due to repeated tries) and also it
did not resolve all the ordering problems wrt the transaction
lock (although it did solve all the inode lock ordering problems).

This patch avoids the problem by unlocking the page and getting the
locks in the correct order. This means that we have to retest the
page to ensure that it hasn't changed when we relock the page.

This now passes the tests which were previously failing.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:37 +01:00
Steven Whitehouse
8027473722 [GFS2] Fix ordering of args for list_add
The patch to remove lock_nolock managed to get the arguments
of this list_add backwards. This fixes it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:34 +01:00
Harvey Harrison
2d81afb879 [GFS2] trivial sparse lock annotations
Annotate the &sdp->sd_log_lock.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:31 +01:00
Steven Whitehouse
048bca2237 [GFS2] No lock_nolock
This patch merges the lock_nolock module into GFS2 itself. As well as removing
some of the overhead of the module, it also means that its now impossible to
build GFS2 without a lock module (which would be a pointless thing to do
anyway).

We also plan to merge lock_dlm into GFS2 in the future, but that is a more
tricky task, and will therefore be a separate patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2008-06-27 09:39:28 +01:00
Steven Whitehouse
f3c9d38a26 [GFS2] Fix ordering bug in lock_dlm
This looks like a lot of change, but in fact its not. Mostly its
things moving from one file to another. The change is just that
instead of queuing lock completions and callbacks from the DLM
we now pass them directly to GFS2.

This gives us a net loss of two list heads per glock (a fair
saving in memory) plus a reduction in the latency of delivering
the messages to GFS2, plus we now have one thread fewer as well.
There was a bug where callbacks and completions could be delivered
in the wrong order due to this unnecessary queuing which is fixed
by this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2008-06-27 09:39:25 +01:00
Steven Whitehouse
6802e3400f [GFS2] Clean up the glock core
This patch implements a number of cleanups to the core of the
GFS2 glock code. As a result a lot of code is removed. It looks
like a really big change, but actually a large part of this patch
is either removing or moving existing code.

There are some new bits too though, such as the new run_queue()
function which is considerably streamlined. Highlights of this
patch include:

 o Fixes a cluster coherency bug during SH -> EX lock conversions
 o Removes the "glmutex" code in favour of a single bit lock
 o Removes the ->go_xmote_bh() for inodes since it was duplicating
   ->go_lock()
 o We now only use the ->lm_lock() function for both locks and
   unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
 o The fast path is considerably shortly, giving performance gains
   especially with lock_nolock
 o The glock_workqueue is now used for all the callbacks from the DLM
   which allows us to simplify the lock_dlm module (see following patch)
 o The way is now open to make further changes such as eliminating the two
   threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
   scheme.

This patch has undergone extensive testing with various test suites
so it should be pretty stable by now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2008-06-27 09:39:22 +01:00
Benjamin Marzinski
5af4e7a0be [GFS2] fix gfs2 block allocation (cleaned up)
This patch fixes bz 450641.

This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, The indirect blocks that point to the new data block must either
diverge from the existing tree either at the inode, or at the first
indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for
509 pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-24 19:02:28 +01:00
Bob Peterson
17c15da00c [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000
This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to
handle kernel paging request at ffff81002690e000.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-24 14:17:45 +01:00
Jean Delvare
00377d8e38 [GFS2] Prefer strlcpy() over snprintf()
strlcpy is faster than snprintf when you don't use the returned value.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-05-12 08:57:11 +01:00
Andrew Price
ad99f77778 [GFS2] Fix cast from unsigned int to s64
This fixes bz 444829 where allocating a new block caused gfs2 file systems to
report 0 bytes used in df. It was caused by a broken cast from an unsigned int
in gfs2_block_alloc() to a negative s64 in gfs2_statfs_change(). This patch
casts the unsigned int to an s64 before the unary minus is applied.

Signed-off-by: Andrew Price <andy@andrewprice.me.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-05-12 08:54:56 +01:00
Bob Peterson
091806edd4 [GFS2] filesystem consistency error from do_strip
This patch fixes a GFS2 filesystem consistency error reported from
function do_strip.  The problem was caused by a timing window
that allowed two vfs inodes to be created in memory that point
to the same file.  The problem is fixed by making the vfs's
iget_test, iget_set mechanism check and set a new bit in the
in-core gfs2_inode structure while the vfs inode spin_lock is held.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-05-12 08:54:53 +01:00
Harvey Harrison
8e24eea728 fs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Nick Piggin
3c18ddd160 mm: remove nopage
Nothing in the tree uses nopage any more.  Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
David Teigland
2402211a83 dlm: move plock code from gfs2
Move the code that handles cluster posix locks from gfs2 into the dlm
so that it can be used by both gfs2 and ocfs2.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-04-21 11:22:28 -05:00
Roel Kluin
62be1f7167 [GFS2] fix assertion in log_refund()
since unsigned, unused >= 0 is always true.

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-04-18 08:36:09 +01:00
Josef Bacik
16c5f06f15 [GFS2] fix GFP_KERNEL misuses
There are several places where GFP_KERNEL allocations happen under a glock,
which will result in hangs if we're under memory pressure and go to re-enter the
fs in order to flush stuff out.  This patch changes the culprits to GFS_NOFS to
keep this problem from happening.  Thank you,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-04-10 09:55:26 +01:00
Julia Lawall
773adff8e9 [GFS2] test for IS_ERR rather than 0
The function gfs2_inode_lookup always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = gfs2_inode_lookup(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
  S else S1
//</smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:46 +01:00
Benjamin Marzinski
58e9fee13e [GFS2] Invalidate cache at correct point
GFS2 wasn't invalidating its cache before it called into the lock manager
with a request that could potentially drop a lock.  This was leaving a
window where the lock could be actually be held by another node, but the
file's page cache would still appear valid, causing coherency problems.
This patch moves the cache invalidation to before the lock manager call
when dropping a lock. It also adds the option to the lock_dlm lock
manager to not use conversion mode deadlock avoidance, which, on a
conversion from shared to exclusive, could internally drop the lock, and
then reacquire in. GFS2 now asks lock_dlm to not do this.  Instead, GFS2
manually drops the lock and reacquires it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:44 +01:00
akpm@linux-foundation.org
f5a8cd0201 [GFS2] fs/gfs2/recovery.c: suppress warnings
fs/gfs2/recovery.c: In function 'get_log_header':
fs/gfs2/recovery.c:152: warning: 'lh.lh_sequence' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_flags' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_tail' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_blkno' may be used uninitialized in this function
fs/gfs2/recovery.c:152: warning: 'lh.lh_hash' may be used uninitialized in this function

Cc: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:41 +01:00
Bob Peterson
1f466a47e8 [GFS2] Faster gfs2_bitfit algorithm
This version of the gfs2_bitfit algorithm includes the latest
suggestions from Steve Whitehouse.  It is typically eight to
ten times faster than the version we're using today.  If there
is a lot of metadata mixed in (lots of small files) the
algorithm is often 15 times faster, and given the right
conditions, I've seen peaks of 20 times faster.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:39 +01:00
Steven Whitehouse
d82661d969 [GFS2] Streamline quota lock/check for no-quota case
This patch streamlines the quota checking in the "no quota" case by
making the check inline in the calling function, thus reducing the
number of function calls. Eventually we might be able to remove the
checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but
currently we can't as there are a very few places in the code which need
to call these functions directly still.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2008-03-31 10:41:36 +01:00
Steven Whitehouse
860b25d4a9 [GFS2] Remove drop of module ref where not needed
In an earlier patch "[GFS2] fix file_system_type leak on gfs2meta mount"
we removed the code to grab a ref to the module which was not needed
(since we know that the module cannot be unloaded at that time) so
this patch removes the code to drop that reference.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:33 +01:00
Abhijith Das
20b95bf2c4 [GFS2] gfs2_adjust_quota has broken unstuffing code
This patch combines the 2 patches in bug 434736 to correct the lock
ordering in the unstuffing of the quota inode in gfs2_adjust_quota and
adjusting the number of revokes in gfs2_write_jdata_pagevec

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:30 +01:00
Cyrill Gorcunov
182fe5abd8 [GFS2] possible null pointer dereference fixup
gfs2_alloc_get may fail so we have to check it to prevent
NULL pointer dereference.

Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:28 +01:00
Steven Whitehouse
105284970b [GFS2] Need to ensure that sector_t is 64bits for GFS2
We need to ensure that sector_t is 64bits for GFS2, so that we need to
depend on LBD as well as LSF.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:25 +01:00
Denis Cheng
43a33c53cc [GFS2] re-support special inode
a previous commit removed call to
init_special_inode from inode lookuping, this cause problems as:

 # mknod /mnt/gfs2/dev/null c 1 3
 # cat /mnt/gfs2/dev/null
 cat: /mnt/gfs2/dev/null: Invalid argument

without special inode, GFS2 cannot support char device file,
block device file, fifo pipe, and socket file, lose many important
features as a common file system.

this one line patch re add special inode support.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:22 +01:00
Denis Cheng
d83225d45d [GFS2] remove gfs2_dev_iops
struct inode_operations gfs2_dev_iops is always the same as gfs2_file_iops,
since Jan 2006, when GFS2 merged into mainstream kernel.

So one of them could be removed.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:20 +01:00
Christoph Hellwig
7dc2cf1c8f [GFS2] fix file_system_type leak on gfs2meta mount
get_gfs2_sb does a get_fs_type without doing a put_filesystem and
thus leaking a file_system_type reference everytime it's called.

Just use gfs2_fs_type directly instead of doing the lookup and thus
fix the problem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:17 +01:00
Steven Whitehouse
9b8c81d1de [GFS2] Allow bmap to allocate extents
We've supported mapping of extents when no block allocation is required
for some time. This patch extends that to mapping of extents when an
allocation has been requested. In that case we try to allocate as many
blocks as are requested, but we might return fewer in case there is
something preventing us from returning the complete amount (e.g. an
already allocated block is in the way).

Currently the only code path which can actually request multiple data
blocks in a single bmap call is the page_mkwrite path and even then it
only happens if there are multiple blocks per page. What this patch does
do however, is merge the allocation requests for metadata (growing the
metadata tree in either height or depth) with the allocation of the data
blocks in the case that both are needed. This results in lower overheads
even in the single block allocation case.

The one thing which we can't handle here at the moment is unstuffing. I
would like to be able to do that, but the problem which arises is that
in order to unstuff one has to get a locked page from the page cache
which results in locking problems in the (usual) case that the caller is
holding the page lock on the page it wishes to map. So that case will
have to be addressed in future patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:14 +01:00
Steven Whitehouse
7afd88d916 [GFS2] Fix a page lock / glock deadlock
We've previously been using a "try lock" in readpage on the basis that
it would prevent deadlocks due to the inverted lock ordering (our normal
lock ordering is glock first and then page lock). Unfortunately tests
have shown that this isn't enough. If the glock has a demote request
queued such that run_queue() in the glock code tries to do a demote when
its called under readpage then it will try and write out all the dirty
pages which requires locking them. This then deadlocks with the page
locked by readpage.

The solution is to always require two calls into readpage. The first
unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the
second does the actual readpage and unlocks the glock & page as
required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:12 +01:00
Adrian Bunk
60b779cfc1 [GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops
This patch adds a proper extern declaration for gdlm_ops in
fs/gfs2/locking/dlm/lock_dlm.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:09 +01:00
Adrian Bunk
8af4c72f7d [GFS2] gfs2/ops_file.c should #include "ops_inode.h"
Every file should include the headers containing the prototypes for
its global functions (in this case for gfs2_set_inode_flags()).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:06 +01:00
Marcin Slusarz
bb16b342b2 [GFS2] be*_add_cpu conversion
replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
					expression_in_cpu_byteorder);
with:
	beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:41:03 +01:00
Steven Whitehouse
840ca0ec70 [GFS2] Fix bug where we called drop_bh incorrectly
As a result of an earlier patch, drop_bh was being called in cases
when it shouldn't have been. Since we never have a gh in the drop
case and we always have a gh in the promote case, we can use that
extra information to tell which case has been seen.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2008-03-31 10:41:01 +01:00
Steven Whitehouse
e23159d2a7 [GFS2] Get inode buffer only once per block map call
In the case that we needed to grow the height of the metadata tree
we were looking up the inode buffer and then brelse()ing it despite
the fact that it is needed later in the block map process.

This patch ensures that we look up the inode's buffer once and only
once during the block map process.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:58 +01:00
Steven Whitehouse
77658aad22 [GFS2] Eliminate (almost) duplicate field from gfs2_inode
The blocks counter is almost a duplicate of the i_blocks
field in the VFS inode. The only difference is that i_blocks
can be only 32bits long for 32bit arch without large single file
support. Since GFS2 doesn't handle the non-large single file
case (for 32 bit anyway) this adds a new config dependency on
64BIT || LSF. This has always been the case, however we've never
explicitly said so before.

Even if we do add support for the non-LSF case, we will still
not require this field to be duplicated since we will not be
able to access oversized files anyway.

So the net result of all this is that we shave 8 bytes from a gfs2_inode
and get our config deps correct.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:55 +01:00
Steven Whitehouse
30cbf189cd [GFS2] Add a function to interate over an extent
This adds a function (currently the only use is during mapping
of already allocated blocks, but watch this space) which iterates
over a number of pointers in a block and returns the extent length.

If the initial pointer is 0 (i.e. unallocated) it will return the
number of unallocated blocks in the extent. If the initial pointer
is allocated, then it returns the number of contiguously allocated
blocks in the extent.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:53 +01:00
Steven Whitehouse
c85a665f06 [GFS2] The case of the missing asterisk
A dereference was forgotten. This adds it back correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:50 +01:00
Steven Whitehouse
b45e41d7d5 [GFS2] Add extent allocation to block allocator
Rather than having to allocate a single block at a time, this patch
allows the block allocator to allocate an extent. Since there is
no difference (so far as the block allocator is concerned) between
data blocks and indirect blocks, it is posible to allocate a single
extent and for the caller to unrevoke just the blocks required
for indirect blocks.

Currently the only bit of GFS2 to make use of this feature is the
build height function. The intention is that gfs2_block_map will
be changed to make use of this feature in future patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:47 +01:00
Steven Whitehouse
1639431a3f [GFS2] Merge gfs2_alloc_meta and gfs2_alloc_data
Thanks to the preceeding patches, the only difference between
these two functions is their name. We can thus merge them
and call the new function gfs2_alloc_block to reflect the
fact that it can allocate either kind of block.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:45 +01:00
Steven Whitehouse
5731be53e3 [GFS2] Update gfs2_trans_add_unrevoke to accept extents
By adding an extra argument to gfs2_trans_add_unrevoke we can now
specify an extent length of blocks to unrevoke. This means that
we only need to make one pass through the list for each extent
rather than each block. Currently the only extent length which
is used is 1, but that will change in the future.

Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
since its the only difference between this and gfs2_alloc_data
which is left. This will allow a future patch to merge these
two functions into one (i.e. one call to allocate both data
and metadata in a single extent in the future).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:42 +01:00
Steven Whitehouse
ac576cc5be [GFS2] Merge the rd_last_alloc_meta and rd_last_alloc_data fields
We don't need to keep track of when we last allocated data
and metadata separately since the only thing thats important
when searching for a free block is whether its free or not,
which is independent from what type of block it is.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:39 +01:00
Steven Whitehouse
ce276b06e8 [GFS2] Reduce inode size by merging fields
There were three fields being used to keep track of the location
of the most recently allocated block for each inode. These have
been merged into a single field in order to better keep the
data and metadata for an inode close on disk, and also to reduce
the space required for storage.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:37 +01:00
Bob Peterson
9feb7c889f [GFS2] Remove unused counters
This is kind of trivial in the greater scheme of things, but
this removes three counters that AFAICT are never used.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:34 +01:00
Steven Whitehouse
9a0045088d [GFS2] Shrink & rename di_depth
This patch forms a pair with the previous patch which shrunk
di_height. Like that patch di_depth is renamed i_depth and moved
into struct gfs2_inode directly. Also the field goes from 16 bits
to 8 bits since it is also limited to a max value which is rather
small (17 in this case). In addition we also now validate the field
against this maximum value when its read in.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:31 +01:00
Bob Peterson
cf45b752c9 [GFS2] Remove rgrp and glock version numbers
This patch further reduces GFS2's memory requirements by
eliminating the 64-bit version number fields in lieu of
a couple bits.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:29 +01:00
Steven Whitehouse
da755fdb41 [GFS2] Remove lm.[ch] and distribute content
The functions in lm.c were just wrappers which were mostly
only used in one other file. By moving the functions to
the files where they are being used, they can be marked
static and also this will usually result in them being inlined
since they are often only used from one point in the code.

A couple of really trivial functions have been inlined by hand
into the function which called them as it makes the code clearer
to do that.

We also gain from one fewer function call in the glock lock and
unlock paths.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:26 +01:00
Bob Peterson
ab0d756681 [GFS2] Eliminate gl_req_bh
This patch further reduces the memory needs of GFS2 by
eliminating the gl_req_bh variable from struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:23 +01:00
Steven Whitehouse
110acf3837 [GFS2] Add consts to various bits of rgrp.c
There are a couple of routines which scan bitmaps where we can
mark the bitmaps const, plus a couple of call sites that can
be updated too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:21 +01:00
Steven Whitehouse
dbac6710a6 [GFS2] Introduce array of buffers to struct metapath
The reason for doing this is to allow all the block mapping code
to share the same array. As a result we can remove two arguments
from lookup_metapath since they are now returned via the array.

We also add a function to drop all refs to buffer heads when we
are done with the metapath. The build_height function shares the
struct metapath, but currently still frees its own buffers, and
this will change in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:18 +01:00
Steven Whitehouse
11707ea05e [GFS2] Move part of gfs2_block_map into a separate function
This is required to enable future changes to the block
mapping code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:15 +01:00
Bob Peterson
29d38cd163 [GFS2] Get rid of gl_waiters2
This patch reduces memory by replacing the int variable
gl_waiters2 by a single bit in the gl_flags.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:13 +01:00
Bob Peterson
42d52e3818 [GFS2] Combine rg_flags and rd_flags
This patch reduces the memory required by GFS2 by combining
the rd_flags and rg_flags (in core only).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:10 +01:00
Bob Peterson
6bdd9be628 [GFS2] Allocate gfs2_rgrpd from slab memory
This patch moves the gfs2_rgrpd structure to its own slab
memory.  This makes it easier to control and monitor, and
yields less memory fragmentation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:07 +01:00
Bob Peterson
3ad62e87cd [GFS2] Plug an unlikely leak
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:05 +01:00
Adrian Bunk
048786f1e6 [GFS2] make gfs2_glock_hold() static
gfs2_glock_hold() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:02 +01:00
Bob Peterson
ef8c441cb7 [GFS2] Only wake the reclaim daemon if we need to
This patch only wakes up the glock reclaim daemon if there is
actually something to be reclaimed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:40:00 +01:00
Bob Peterson
7eabb77e65 [GFS2] Misc fixups
This patch contains two small fixups that didn't fit elsewhere.
They are: (1) get rid of temp variable in find_metapath.
(2) Remove vestigial "ret" variable from gfs2_writepage_common.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:39:57 +01:00
Bob Peterson
d0109bfa84 [GFS2] Only do lo_incore_commit once
This patch is performance related.  When we're doing a log flush,
I noticed we were calling buf_lo_incore_commit twice: once for
data bufs and once for metadata bufs.  Since this is the same
function and does the same thing in both cases, there should be
no reason to call it twice.  Since we only need to call it once,
we can also make it faster by removing it from the generic "lops"
code and making it a stand-along static function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:39:54 +01:00
Bob Peterson
ca390601a8 [GFS2] Fix debug inode printing
I noticed that the latest change to i_height got rid of the
value from the inode dump.  This patch adds it back.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:39:52 +01:00
Bob Peterson
fe6c991c52 [GFS2] Get rid of unneeded parameter in gfs2_rlist_alloc
This patch removed the unnecessary parameter from function
gfs2_rlist_alloc.  The parameter was always passed in as 0.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:39:49 +01:00
Steven Whitehouse
ecc30c7915 [GFS2] Streamline indirect pointer tree height calculation
This patch improves the calculation of the tree height in order to reduce
the number of operations which are carried out on each call to gfs2_block_map.
In the common case, we now make a single comparison, rather than calculating
the required tree height from scratch each time. Also in the case that the
tree does need some extra height, we start from the current height rather from
zero when we work out what the new height ought to be.

In addition the di_height field is moved into the inode proper and reduced
in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:39:46 +01:00
Steven Whitehouse
941e6d7d09 [GFS2] Speed up gfs2_write_alloc_required, deprecate gfs2_extent_map
This patch removes the call to gfs2_extent_map from gfs2_write_alloc_required,
instead we call gfs2_block_map directly. This results in fewer overall calls
to gfs2_block_map in the multi-block case.

Also, gfs2_extent_map is marked as deprecated so that people know that its
going away as soon as all the callers have been converted.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31 10:39:44 +01:00
Jan Blunck
1d957f9bf8 Introduce path_put()
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck
4ac9137858 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
  <dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
  struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
David Howells
69840b0d06 iget: use iget_failed() in GFS2
Use iget_failed() in GFS2 to kill a failed inode.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:27 -08:00
David Howells
e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Pavel Emelyanov
eccba06891 gfs2: make gfs2_glock.gl_owner_pid be a struct pid *
The gl_owner_pid field is used to get the lock owning task by its pid, so make
it in a proper manner, i.e.  by using the struct pid pointer and pid_task()
function.

The pid_task() becomes exported for the gfs2 module.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:06 -08:00
Pavel Emelyanov
b1e058da50 gfs2: make gfs2_holder.gh_owner_pid be a struct pid *
The gl_owner_pid field is used to get the holder task by its pid and check
whether the current is a holder, so make it in a proper manner, i.e.  via the
struct pid * manipulations.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:06 -08:00
Christoph Lameter
eebd2aa355 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
	makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:13 -08:00
Joe Perches
c78bad11fb fs/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:33:42 +02:00
Linus Torvalds
e07dd2ad30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (56 commits)
  [GFS2] Allow journal recovery on read-only mount
  [GFS2] Lockup on error
  [GFS2] Fix page_mkwrite truncation race path
  [GFS2] Fix typo
  [GFS2] Fix write alloc required shortcut calculation
  [GFS2] gfs2_alloc_required performance
  [GFS2] Remove unneeded i_spin
  [GFS2] Reduce inode size by moving i_alloc out of line
  [GFS2] Fix assert in log code
  [GFS2] Fix problems relating to execution of files on GFS2
  [GFS2] Initialize extent_list earlier
  [GFS2] Allow page migration for writeback and ordered pages
  [GFS2] Remove unused variable
  [GFS2] Fix log block mapper
  [GFS2] Minor correction
  [GFS2] Eliminate the no longer needed sd_statfs_mutex
  [GFS2] Incremental patch to fix compiler warning
  [GFS2] Function meta_read optimization
  [GFS2] Only fetch the dinode once in block_map
  [GFS2] Reorganize function gfs2_glmutex_lock
  ...
2008-01-25 08:39:18 -08:00
Abhijith Das
7bc5c414fe [GFS2] Allow journal recovery on read-only mount
This patch allows gfs2 to perform journal recovery even if it is mounted
read-only. Strictly speaking, a read-only mount should not be writing to
the filesystem, but we do this only to perform journal recovery. A
read-only mount will fail if we don't recover the dirty journal. Also,
when gfs2 is used as a root filesystem, it will be mounted read-only
before being mounted read-write during the boot sequence. A failed
read-only mount will panic the machine during bootup.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:21:22 +00:00
Bob Peterson
1b8177ec1e [GFS2] Lockup on error
I spotted this bug while I was digging around.  Looks like it could cause
a lockup in some rare error condition.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:21:04 +00:00
Steven Whitehouse
b7fe2e391e [GFS2] Fix page_mkwrite truncation race path
There was a bug in the truncation/invalidation race path for
->page_mkwrite for gfs2. It ought to return 0 so that the effect is the
same as if the page was truncated at any of the other points at which
the page_lock is dropped. This will result in the restart of the whole
page fault path. If it was due to a real truncation (as opposed to an
invalidate because we let a glock go) then the ->fault path will pick
that up when it gets called again.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:20:15 +00:00
Bob Peterson
3e5cd0877e [GFS2] Fix typo
This patch fixes a minor typo.  Surprisingly, it still compiled.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:19:51 +00:00
Steven Whitehouse
1af535727b [GFS2] Fix write alloc required shortcut calculation
The comparison was being made against the wrong quantity.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:19:28 +00:00
Bob Peterson
0522053519 [GFS2] gfs2_alloc_required performance
This is a small I/O performance enhancement to gfs2.  (Actually, it is a rework of
an earlier version I got wrong).  The idea here is to check if the write extends
past the last block in the file.  If so, the function can save itself a lot of
time and trouble because it knows an allocate will be required.  Benchmarks like
iozone should see better performance.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:19:03 +00:00
Bob Peterson
598278bd48 [GFS2] Remove unneeded i_spin
This patch removes a vestigial variable "i_spin" from the gfs2_inode
structure.  This not only saves us memory (>300000 of these in memory
for the oom test) it also saves us time because we don't have to
spend time initializing it (i.e. slightly better performance).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:44 +00:00
Steven Whitehouse
6dbd822487 [GFS2] Reduce inode size by moving i_alloc out of line
It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever its needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 use the same approach.

In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:25 +00:00
Steven Whitehouse
ac39aadd04 [GFS2] Fix assert in log code
Although the values were all being calculated correctly, there was a
race in the assert due to the way it was using atomic variables. This
changes the value we assert on so that we get the same effect by testing
a different variable. This prevents the assert triggering when it shouldn't.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:18:03 +00:00
Steven Whitehouse
9656b2c14c [GFS2] Fix problems relating to execution of files on GFS2
This patch fixes a couple of problems which affected the execution of files
on GFS2. The first is that there was a corner case where inodes were not
always uptodate at the point at which permissions checks were being carried
out, this was resulting in refusal of execute permission, but only on the
first lookup, subsequent requests worked correctly. The second was a problem
relating to incorrect updating of file sizes which was introduced with the
write_begin/end code for GFS2 a little while back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2008-01-25 08:17:31 +00:00
Bob Peterson
0811a127cb [GFS2] Initialize extent_list earlier
Here is a patch for the latest upstream GFS2 code:
The journal extent map needs to be initialized sooner than it
currently is.  Otherwise failed mount attempts (e.g. not enough
journals, etc.) may panic trying to access the uninitialized list.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:17:04 +00:00
Steven Whitehouse
e5d9dc278c [GFS2] Allow page migration for writeback and ordered pages
To improve performance on NUMA, we use the VM's standard page
migration for writeback and ordered pages. Probably we could
also do the same for journaled data, but that would need a
careful audit of the code, so will be the subject of a later
patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:16:41 +00:00
Steven Whitehouse
65a6290998 [GFS2] Remove unused variable
The go_drop_th function is never called or referenced.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:16:19 +00:00
Steven Whitehouse
ff91cc9bb4 [GFS2] Fix log block mapper
A missing offset in the calculation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:58 +00:00
Bob Peterson
fa3742fa85 [GFS2] Minor correction
This is a small correction to my previously posted patch1.
It just changes a divide to a shift.  It's faster and doesn't
introduce odd dependencies on 32-bit compiles.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:37 +00:00
Bob Peterson
c3f60b6e3a [GFS2] Eliminate the no longer needed sd_statfs_mutex
This patch eliminates the unneeded sd_statfs_mutex mutex but preserves
the ordering as discussed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:15:16 +00:00
Bob Peterson
b3513fca7e [GFS2] Incremental patch to fix compiler warning
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:14:53 +00:00
Bob Peterson
15c7cee799 [GFS2] Function meta_read optimization
This patch optimizes function gfs2_meta_read.  Basically, gfs2_meta_wait
was being called regardless of whether a disk read was requested.
This just pulls that wait into the if that triggers the read.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:14:33 +00:00
Bob Peterson
b0d5fd3074 [GFS2] Only fetch the dinode once in block_map
Function gfs2_block_map was often looking up the disk inode twice.
This optimizes it so that only does it once.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:14:13 +00:00
Bob Peterson
398bbe6832 [GFS2] Reorganize function gfs2_glmutex_lock
This patch optimizes the function gfs2_glmutex_lock.
The basic theory is: Why bother initializing a holder, setting up
wait bits and then waiting on them, if you know the glock can be
yours.  So the holder stuff is placed inside the if checking if the
glock is locked.  This one needs careful scrutiny because changing
anything to do with locking should strike terror into one's heart.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:13:52 +00:00
Bob Peterson
5fdc2eeb5d [GFS2] Run through full bitmaps quicker in gfs2_bitfit
I eliminated the passing of an unused parameter into gfs2_bitfit called rgd.

This also changes the gfs2_bitfit code that searches for free (or used) blocks.
Before, the code was trying to check for bytes that indicated 4 blocks in
the undesired state.  The problem is, it was spending more time trying to
do this than it actually was saving.  This version only optimizes the case
where we're looking for free blocks, and it checks a machine word at a time.
So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit
machines, it will check 64-bits (32 blocks) at a time.  The compiler
optimizes that quite well and we save some time, especially when running
through full bitmaps (like the bitmaps allocated for the journals).

There's probably a more elegant or optimized way to do this, but I haven't
thought of it yet.  I'm open to suggestions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:13:31 +00:00
Bob Peterson
0d0868bde3 [GFS2] Get rid of useless "found" variable in quota.c
This just eliminates an unused variable from the quota code.
Not likely to be a time saver.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:13:01 +00:00
Bob Peterson
da6dd40d59 [GFS2] Journal extent mapping
This patch saves a little time when gfs2 writes to the journals by
keeping a mapping between logical and physical blocks on disk.
That's better than constantly looking up indirect pointers in
buffers, when the journals are several levels of indirection
(which they typically are).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:11:46 +00:00
Bob Peterson
e9e1ef2b6e [GFS2] Remove function gfs2_get_block
This patch is just a cleanup.  Function gfs2_get_block() just calls
function gfs2_block_map reversing the last two parameters.  By
reversing the parameters, gfs2_block_map() may be called directly
and function gfs2_get_block may be eliminated altogether.
Since this function is done for every block operation,
this streamlines the code and makes it a little bit more efficient.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:25 +00:00
David Teigland
2066b58b0a [GFS2] use pid for plock owner for nfs clients
The fl_owner is that of lockd when posix locks arrive from nfs
clients, so it can't be used to distinguish between lock holders.
Use fl_pid as owner instead; it's the pid of the process on the
nfs client.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:23 +00:00
Steven Whitehouse
dbee2199c3 [GFS2] Remove unused variable
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:20 +00:00
Abhijith Das
292c8c14ca [GFS2] patch to check for recursive lock requests in gfs2_rename code path
A certain scenario in the rename code path triggers a kernel BUG()
because it accidentally does recursive locking The first lock is
requested to unlink an already existing inode (replacing a file) and the
second lock is requested when the destination directory needs to alloc
some space. It is rare that these two
events happen during the same rename call, and even more rare that these
two instances try to lock the same rgrp. It is, however, possible.
https://bugzilla.redhat.com/show_bug.cgi?id=404711

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:18 +00:00
Wendy Cheng
c97bfe4351 [GFS2] Remove lock methods for lock_nolock protocol
GFS2 supports two modes of locking - lock_nolock for single node filesystem
and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from
file operation table for lock_nolock protocol. This would allow VFS to handle
posix lock and flock logics just like other in-tree filesystems without
duplication.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:15 +00:00
Fabio M. Di Nitto
bcd405599f [GFS2] Remove unrequired code
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:13 +00:00
Fabio Massimo Di Nitto
6a69a23f7d [GFS2] Fix build warnings
Hi Steven,

Steven Whitehouse wrote:
> Hi,
>
> Now in the -nmw git tree. Thanks,
>
> Steve.
>
> On Wed, 2007-11-21 at 11:54 -0600, Ryan O'Hara wrote:

this patch introduces a bunch of build warnings by leaving around

struct inode *inode = &ip->i_inode;

The patch in attachment cleans them up. Please apply.

Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:11 +00:00
Ryan O'Hara
002ef1dc63 [GFS2] remove unnecessary permission checks
Remove read/write permission() checks from xattr operations.
VFS layer is already handling permission for xattrs via the
xattr_permission() call, so there is no need for gfs2 to
check permissions. Futhermore, using permission() for SELinux
xattrs ops is incorrect.

Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:08 +00:00
Fabio Massimo Di Nitto
1a2781cfa5 [GFS2] Fix runtime issue with UP kernels
The issue is indeed UP vs SMP and it is totally random.

spin_is_locked() is a bad assertion because there is no correct answer on UP.
on UP spin_is_locked() has to return either one value or another, always.

This means that in my setup I am lucky enough to trigger the issue and your you
are lucky enough not to.

the patch in attachment removes the bogus calls to BUG_ON and according to David
(in CC and thanks for the long explanation on the problem) we can rely upon
things like lockdep to find problem that might be trying to catch.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:06 +00:00
David Teigland
00c134756c [GFS2] tidy up error message
Print error with log_error() to be consistent with others.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:04 +00:00
Fabio Massimo Di Nitto
0b7580c786 [GFS2] Check for installation of mount helpers for DLM mounts
The patch is a fix to abort mount if the mount.gfs* and possible
umount.* are missing from /sbin.

While we do what we can to guarantee that they are installed properly in
userland (CVS HEAD), we want to make sure that mount still aborts properly.

The only sign of missing helpers is that lock_dlm will receive no mount options
at all. According to David the problem does not exist for lock_nolock as the
helpers are not required.

The patch has been tested for both gfs and gfs2 and it works as expected. The
lack of mount.gfs* will generate an error that is propagated to mount:

oot@node1:~# mount -t  gfs2 /dev/nbd2 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/nbd2,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

[ 3513.303346] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2"
[ 3513.304546] DLM/GFS2/GFS ERROR: (u)mount helpers are not installed properly!
[ 3513.306290] GFS2: fsid=: can't mount proto=lock_dlm, table=gutsy:gfs2, hostdata=

You might want to notice that it will also avoid mount to hang or fail silently
or with strange errors that will require the cluster to reboot/restart before
you can actually mount the filesystem again.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:08:01 +00:00
Steven Whitehouse
e35b921185 [GFS2] Don't periodically update the jindex
We only care about the content of the jindex in two cases,
one is when we mount the fs and the other is when we need
to recover another journal. In both cases we have to update
the jindex anyway, so there is no point in updating it
periodically between times, so this removes it to simplify
gfs2_logd.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:59 +00:00
Steven Whitehouse
ec69b18883 [GFS2] Move gfs2_logd into log.c
This means that we can mark gfs2_ail1_empty static and prepares
the way for further changes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:56 +00:00
Steven Whitehouse
fd041f0b40 [GFS2] Use atomic_t for journal free blocks counter
This patch changes the counter which keeps track of the free
blocks in the journal to an atomic_t in preparation for the
following patch which will update the log reservation code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:54 +00:00
Steven Whitehouse
2bcd610d2f [GFS2] Don't add glocks to the journal
The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:52 +00:00
David Teigland
8cbc434247 [GFS2] check kthread_should_stop when waiting
Use wait_event_interruptible() in the lock_dlm thread instead
of an open coded equivalent, and include a kthread_should_stop()
check in the wait test so we don't miss a kthread_stop().

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:49 +00:00
Bob Peterson
c7227e4642 [GFS2] Given device ID rather than s_id in "id" sysfs file
This patch changes the /sys/fs/gfs2/<s_id>/id file to give the device
id "major:minor" rather than the s_id.  That enables gfs2_tool to
match devices properly (by id, not name) when locating the tuning files.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:47 +00:00
Steven Whitehouse
e589665eb9 [GFS2] Remove flags no longer required
The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
depending upon which of the two waiters lists they were going to
be queued upon. They were then tested when the holders were taken
off the lists to ensure that the right type of holder was being
dequeued.

Since we are already using separate lists, there doesn't seem a
lot of point having these flags as well, and since setting them
and testing them is in the fast path for locking and unlocking
glock, this patch removes them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:44 +00:00
Steven Whitehouse
3042a2ccd6 [GFS2] Reorder writeback for glock sync
Previously we were doing (write data, wait for data, write metadata, wait
for metadata). After this patch we so (write metadata, write data, wait for
data, wait for metadata) which should be more efficient.

Also I noticed that the drop_bh and xmote_bh functions were almost
identical. In fact the only difference was a single test, and that
test is such that in the drop_bh case, it would always evaluate to
the correct result. As such we can use the xmote_bh functions in
all the places where we were using the drop_bh function and remove
the drop_bh functions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:42 +00:00
Steven Whitehouse
52d4c74b08 [GFS2] Add sync_page to metadata address space operations
This set of address space operations was missing a sync_page
operation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:40 +00:00
Steven Whitehouse
c2932e03db [GFS2] Remove "reclaim limit"
This call to reclaim glocks is not needed, and in particular we don't want it
in the fast path for locking glocks. The limit was entirely arbitrary anyway
and we can't expect users to adjust things like this, the remaining code will
do the right thing on its own.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:37 +00:00
Steven Whitehouse
60b0d08779 [GFS2] Remove unused variables
These haven't been used for some time, remove them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25 08:07:35 +00:00