linux

Author	SHA1	Message	Date
Chris Mason	93c82d5750	Btrfs: zero page past end of inline file items When btrfs_get_extent is reading inline file items for readpage, it needs to copy the inline extent into the page. If the inline extent doesn't cover all of the page, that means there is a hole in the file, or that our file is smaller than one page. readpage does zeroing for the case where the file is smaller than one page, but nobody is currently zeroing for the case where there is a hole after the inline item. This commit changes btrfs_get_extent to zero fill the page past the end of the inline item. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:08 -04:00
Chris Mason	50a9b214bc	Btrfs: fix btrfs page_mkwrite to return locked page This closes a whole where the page may be written before the page_mkwrite caller has a chance to dirty it (thanks to Nick Piggin) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:08 -04:00
Chris Mason	a1ed835e1a	Btrfs: Fix extent replacment race Data COW means that whenever we write to a file, we replace any old extent pointers with new ones. There was a window where a readpage might find the old extent pointers on disk and cache them in the extent_map tree in ram in the middle of a given write replacing them. Even though both the readpage and the write had their respective bytes in the file locked, the extent readpage inserts may cover more bytes than it had locked down. This commit closes the race by keeping the new extent pinned in the extent map tree until after the on-disk btree is properly setup with the new extent pointers. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Chris Mason	8b62b72b26	Btrfs: Use PagePrivate2 to track pages in the data=ordered code. Btrfs writes go through delalloc to the data=ordered code. This makes sure that all of the data is on disk before the metadata that references it. The tracking means that we have to make sure each page in an extent is fully written before we add that extent into the on-disk btree. This was done in the past by setting the EXTENT_ORDERED bit for the range of an extent when it was added to the data=ordered code, and then clearing the EXTENT_ORDERED bit in the extent state tree as each page finished IO. One of the reasons we had to do this was because sometimes pages are magically dirtied without page_mkwrite being called. The EXTENT_ORDERED bit is checked at writepage time, and if it isn't there, our page become dirty without going through the proper path. These bit operations make for a number of rbtree searches for each page, and can cause considerable lock contention. This commit switches from the EXTENT_ORDERED bit to use PagePrivate2. As pages go into the ordered code, PagePrivate2 is set on each one. This is a cheap operation because we already have all the pages locked and ready to go. As IO finishes, the PagePrivate2 bit is cleared and the ordered accoutning is updated for each page. At writepage time, if the PagePrivate2 bit is missing, we go into the writepage fixup code to handle improperly dirtied pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Chris Mason	9655d2982b	Btrfs: use a cached state for extent state operations during delalloc This changes the btrfs code to find delalloc ranges in the extent state tree to use the new state caching code from set/test bit. It reduces one of the biggest causes of rbtree searches in the writeback path. test_range_bit is also modified to take the cached state as a starting point while searching. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Chris Mason	d5550c6315	Btrfs: don't lock bits in the extent tree during writepage At writepage time, we have the page locked and we have the extent_map entry for this extent pinned in the extent_map tree. So, the page can't go away and its mapping can't change. There is no need for the extra extent_state lock bits during writepage. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:06 -04:00
Chris Mason	2c64c53d8d	Btrfs: cache values for locking extents Many of the btrfs extent state tree users follow the same pattern. They lock an extent range in the tree, do some operation and then unlock. This translates to at least 2 rbtree searches, and maybe more if they are doing operations on the extent state tree. A locked extent in the tree isn't going to be merged or changed, and so we can safely return the extent state structure as a cached handle. This changes set_extent_bit to give back a cached handle, and also changes both set_extent_bit and clear_extent_bit to use the cached handle if it is available. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:06 -04:00
Chris Mason	1edbb734b4	Btrfs: reduce CPU usage in the extent_state tree Btrfs is currently mirroring some of the page state bits into its extent state tree. The goal behind this was to use it in supporting blocksizes other than the page size. But, we don't currently support that, and we're using quite a lot of CPU on the rb tree and its spin lock. This commit starts a series of cleanups to reduce the amount of work done in the extent state tree as part of each IO. This commit: * Adds the ability to lock an extent in the state tree and also set other bits. The idea is to do locking and delalloc in one call * Removes the EXTENT_WRITEBACK and EXTENT_DIRTY bits. Btrfs is using a combination of the page bits and the ordered write code for this instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:06 -04:00
Chris Mason	e48c465bb3	Btrfs: Fix new state initialization order As the extent state tree is manipulated, there are call backs that are used to take extra actions when different state bits are set or cleared. One example of this is a counter for the total number of delayed allocation bytes in a single inode and in the whole FS. When new states are inserted, this callback is being done before we properly setup the new state. This hasn't caused problems before because the lock bit was always done first, and the existing call backs don't care about the lock bit. This patch makes sure the state is properly setup before using the callback, which is important for later optimizations that do more work without using the lock bit. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:05 -04:00
Chris Mason	890871be85	Btrfs: switch extent_map to a rw lock There are two main users of the extent_map tree. The first is regular file inodes, where it is evenly spread between readers and writers. The second is the chunk allocation tree, which maps blocks from logical addresses to phyiscal ones, and it is 99.99% reads. The mapping tree is a point of lock contention during heavy IO workloads, so this commit switches things to a rw lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:05 -04:00
Chris Mason	57fd5a5ff8	Btrfs: tweak congestion backoff The btrfs io submission thread tries to back off congested devices in favor of rotating off to another disk. But, it tries to make sure it submits at least some IO before rotating on (the others may be congested too), and so it has a magic number of requests it tries to write before it hops. This makes the magic number smaller. Testing shows that we're spending too much time on congested devices and leaving the other devices idle. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:05 -04:00
Chris Mason	a97adc9fff	Btrfs: use larger nr_to_write for larger extents When btrfs fills a large delayed allocation extent, it is a good idea to try and convince the write_cache_pages caller to go ahead and write a good chunk of that extent. The extra IO is basically free because we know it is contiguous. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:04 -04:00
Chris Mason	4f878e8475	Btrfs: reduce worker thread spin_lock_irq hold times This changes the btrfs worker threads to batch work items into a local list. It allows us to pull work items in large chunks and significantly reduces the number of times we need to take the worker thread spinlock. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:04 -04:00
Chris Mason	4e3f9c5042	Btrfs: keep irqs on more often in the worker threads The btrfs worker thread spinlock was being used both for the queueing of IO and for the processing of ordered events. The ordered events never happen from end_io handlers, and so they don't need to use the _irq version of spinlocks. This adds a dedicated lock to the ordered lists so they don't have to run with irqs off. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:04 -04:00
Chris Mason	40431d6c12	Btrfs: optimize set extent bit The Btrfs set_extent_bit call currently searches the rbtree every time it needs to find more extent_state objects to fill the requested operation. This adds a simple test with rb_next to see if the next object in the tree was adjacent to the one we just found. If so, we skip the search and just use the next object. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:03 -04:00
Chris Mason	9042846bc7	Btrfs: Allow worker threads to exit when idle The Btrfs worker threads don't currently die off after they have been idle for a while, leading to a lot of threads sitting around doing nothing for each mount. Also, they are unable to start atomically (from end_io hanlders). This commit reworks the worker threads so they can be started from end_io handlers (just setting a flag that asks for a thread to be added at a later date) and so they can exit if they have been idle for a long time. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:30:56 -04:00
Yan Zheng	ceab36edd3	Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY invalidate_inode_pages2_range may return -EBUSY occasionally which results Oops. This patch fixes the issue by moving invalidate_inode_pages2_range into a loop and keeping calling it until the return value is not -EBUSY. The EBUSY return is temporary, and can happen when the btrfs release page function is unable to release a page because the EXTENT_LOCK bit is set. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-08-07 13:51:33 -04:00
Julia Lawall	60f2e8f8a0	Btrfs: correct error-handling zlib error handling find_zlib_workspace returns an ERR_PTR value in an error case instead of NULL. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @match exists@ expression x, E; statement S1, S2; @@ x = find_zlib_workspace(...) ... when != x = E ( * if (x == NULL \|\| ...) S1 else S2 \| * if (x == NULL && ...) S1 else S2 ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-08-07 13:51:33 -04:00
Bartlomiej Zolnierkiewicz	4baf8c9201	Btrfs: remove superfluous NULL pointer check in btrfs_rename() This takes care of the following entry from Dan's list: fs/btrfs/inode.c +4788 btrfs_rename(36) warning: variable derefenced before check 'old_inode' Reported-by: Dan Carpenter <error27@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eugene Teo <eteo@redhat.com> Cc: Julia Lawall <julia@diku.dk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-08-07 13:47:08 -04:00
Chris Mason	013f1b12f4	Btrfs: make sure the async caching thread advances the key The async caching thread can end up looping forever if a given search puts it at the last key in a leaf. It will end up calling btrfs_next_leaf and then checking if it needs to politely drop the read semaphore. Most of the time this looping isn't noticed because it is able to make progress the next time around. But, during log replay, we wait on the async caching thread to finish, and the async thread is waiting on the commit, and no progress is really made. The fix used here is to copy the key out of the next leaf, that way our search lands there properly. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-31 14:57:55 -04:00
Josef Bacik	6606bb97e1	Btrfs: fix btrfs_remove_from_free_space corner case Yan Zheng hit a problem where we tried to remove some free space but failed because we couldn't find the free space entry. This is because the free space was held within a bitmap that had a starting offset well before the actual offset of the free space, and there were free space extents that were in the same range as that offset, so tree_search_offset returned with NULL because we couldn't find a free space extent that had that offset. This is fixed by making sure that if we fail to find the entry, we re-search again with bitmap_only set to 1 and do an offset_to_bitmap so we can get the appropriate bitmap. A similar problem happens in btrfs_alloc_from_bitmap for the clustering code, but that is not as bad since we will just go and redo our cluster allocation. Also this adds some debugging checks to make sure that the free space we are trying to remove from the bitmap is in fact there. This can probably go away after a while, but since this code is only used by the tree-logging stuff it would be nice to run with it for a while to make sure there are no problems. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-31 11:03:58 -04:00
Chris Mason	f36f3042ea	Btrfs: be more polite in the async caching threads The semaphore used by the async caching threads can prevent a transaction commit, which can make the FS appear to stall. This releases the semaphore more often when a transaction commit is in progress. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-30 10:14:46 -04:00
Yan Zheng	276e680d19	Btrfs: preserve commit_root for async caching The async block group caching code uses the commit_root pointer to get a stable version of the extent allocation tree for scanning. This copy of the tree root isn't going to change and it significantly reduces the complexity of the scanning code. During a commit, we have a loop where we update the extent allocation tree root. We need to loop because updating the root pointer in the tree of tree roots may allocate blocks which may change the extent allocation tree. Right now the commit_root pointer is changed inside this loop. It is more correct to change the commit_root pointer only after all the looping is done. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-30 09:40:40 -04:00
Yan Zheng	f25784b35f	Btrfs: Fix async caching interaction with unmount - don't stop the caching thread until btrfs_commit_super return. - if caching is interrupted by umount, set last to (u64)-1. otherwise the un-scanned range of block group will be considered as free extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-28 08:41:57 -04:00
Josef Bacik	68b38550dd	Btrfs: change how we unpin extents We are racy with async block caching and unpinning extents. This patch makes things much less complicated by only unpinning the extent if the block group is cached. We check the block_group->cached var under the block_group->lock spin lock. If it is set to BTRFS_CACHE_FINISHED then we update the pinned counters, and unpin the extent and add the free space back. If it is not set to this, we start the caching of the block group so the next time we unpin extents we can unpin the extent. This keeps us from racing with the async caching threads, lets us kill the fs wide async thread counter, and keeps us from having to set DELALLOC bits for every extent we hit if there are caching kthreads going. One thing that needed to be changed was btrfs_free_super_mirror_extents. Now instead of just looking for LOCKED extents, we also look for DIRTY extents, since we could have left some extents pinned in the previous transaction that will never get freed now that we are unmounting, which would cause us to leak memory. So btrfs_free_super_mirror_extents has been changed to btrfs_free_pinned_extents, and it will clear the extents locked for the super mirror, and any remaining pinned extents that may be present. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-27 13:57:01 -04:00
Julia Lawall	631c07c8d1	Btrfs: Correct redundant test in add_inode_ref dir has already been tested. It seems that this test should be on the recently returned value inode. A simplified version of the semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-27 13:57:00 -04:00
Chris Mason	9779b72f05	Btrfs: find smallest available device extent during chunk allocation Allocating new block group is easy when the disk has plenty of space. But things get difficult as the disk fills up, especially if the FS has been run through btrfs-vol -b. The balance operation is likely to make the total bytes available on the device greater than the largest extent we'll actually be able to allocate. But the device extent allocation code incorrectly assumes that a device with 5G free will be able to allocate a 5G extent. It isn't normally a problem because device extents don't get freed unless btrfs-vol -b is run. This fixes the device extent allocator to remember the largest free extent it can find, and then uses that value as a fallback. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 16:41:41 -04:00
Chris Mason	283bb1979f	Btrfs: clear all space_info->full after removing a block group Btrfs allocates individual extents from block groups, and each block group has a specific type. It may hold metadata, data mirrored or striped etc. When we balance space (btrfs-vol -b) or remove a drive (btrfs-vol -r) we free block groups. Once a block group is freed, the space it was using on the device may be available for use by new block groups. btrfs_remove_block_group was clearing the flag that said 'our devices are full, don't even try to allocate new block groups', but it was only clearing that flag for a specific type of block group. This commit clears the full flag for all of the types of block groups, making it much more likely that we'll be able to balance space when the drive is close to full. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 16:30:55 -04:00
Sage Weil	ebecd3d9d2	Btrfs: make flushoncommit mount option correctly wait on ordered_extents The commit_transaction call to wait_ordered_extents when snap_pending passes nocow_only=1 to process only NOCOW or PREALLOC extents. This isn't correct for the 'flushoncommit' mode, as it skips extents we just started IO on in start_delalloc_inodes. So, in the flushoncommit case, wait on all ordered extents. Otherwise, only pass the nocow_only flag to wait_ordered_extents if snap_pending. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 13:17:44 -04:00
Yan Zheng	d717aa1d31	Btrfs: Avoid delayed reference update looping btrfs_split_leaf and btrfs_del_items can end up in a loop where one is constantly spliting a given leaf and the other is constantly merging it back with the adjacent nodes. There is a better fix for this, but in the interest of something small, this patch just changes btrfs_del_items back to balancing less often. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 12:42:46 -04:00
Yan Zheng	0a4eefbb74	Btrfs: Fix ordering of key field checks in btrfs_previous_item Check objectid of item before checking the item type, otherwise we may return zero for a key that is actually too low. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:47 -04:00
Yan Zheng	1fcbac581b	Btrfs: find_free_dev_extent doesn't handle holes at the start of the device find_free_dev_extent does not properly handle the case where the device is not complete free, and there is a free extent at the beginning of the device. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:47 -04:00
Diego Calleja	20736abaa3	Btrfs: Remove code duplication in comp_keys comp_keys is duplicating what is done in btrfs_comp_cpu_keys, so just call it. Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 11:22:46 -04:00
Josef Bacik	817d52f8db	Btrfs: async block group caching This patch moves the caching of the block group off to a kthread in order to allow people to allocate sooner. Instead of blocking up behind the caching mutex, we instead kick of the caching kthread, and then attempt to make an allocation. If we cannot, we wait on the block groups caching waitqueue, which the caching kthread will wake the waiting threads up everytime it finds 2 meg worth of space, and then again when its finished caching. This is how I tested the speedup from this mkfs the disk mount the disk fill the disk up with fs_mark unmount the disk mount the disk time touch /mnt/foo Without my changes this took 11 seconds on my box, with these changes it now takes 1 second. Another change thats been put in place is we lock the super mirror's in the pinned extent map in order to keep us from adding that stuff as free space when caching the block group. This doesn't really change anything else as far as the pinned extent map is concerned, since for actual pinned extents we use EXTENT_DIRTY, but it does mean that when we unmount we have to go in and unlock those extents to keep from leaking memory. I've also added a check where when we are reading block groups from disk, if the amount of space used == the size of the block group, we go ahead and mark the block group as cached. This drastically reduces the amount of time it takes to cache the block groups. Using the same test as above, except doing a dd to a file and then unmounting, it used to take 33 seconds to umount, now it takes 3 seconds. This version uses the commit_root in the caching kthread, and then keeps track of how many async caching threads are running at any given time so if one of the async threads is still running as we cross transactions we can wait until its finished before handling the pinned extents. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 09:23:39 -04:00
Josef Bacik	9630308170	Btrfs: use hybrid extents+bitmap rb tree for free space Currently btrfs has a problem where it can use a ridiculous amount of RAM simply tracking free space. As free space gets fragmented, we end up with thousands of entries on an rb-tree per block group, which usually spans 1 gig of area. Since we currently don't ever flush free space cache back to disk this gets to be a bit unweildly on large fs's with lots of fragmentation. This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free space cache. Initially we calculate a threshold of extent entries we can handle, which is however many extent entries we can cram into 16k of ram. The maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace will be 32k of RAM, which scales much better than we did before. Once we pass the extent threshold, we start adding bitmaps and using those instead for tracking the free space. This patch also makes it so that any free space thats less than 4 * sectorsize we go ahead and put into a bitmap. This is nice since we try and allocate out of the front of a block group, so if the front of a block group is heavily fragmented and then has a huge chunk of free space at the end, we go ahead and add the fragmented areas to bitmaps and use a normal extent entry to track the big chunk at the back of the block group. I've also taken the opportunity to revamp how we search for free space. Previously we indexed free space via an offset indexed rb tree and a bytes indexed rb tree. I've dropped the bytes indexed rb tree and use only the offset indexed rb tree. This cuts the number of tree operations we were doing previously down by half, and gives us a little bit of a better allocation pattern since we will always start from a specific offset and search forward from there, instead of searching for the size we need and try and get it as close as possible to the offset we want. I've given this a healthy amount of testing pre-new format stuff, as well as post-new format stuff. I've booted up my fedora box which is installed on btrfs with this patch and ran with it for a few days without issues. I've not seen any performance regressions in any of my tests. Since the last patch Yan Zheng fixed a problem where we could have overlapping entries, so updating their offset inline would cause problems. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-24 09:23:30 -04:00
David Woodhouse	83121942b2	Btrfs: Fix crash on read failures at mount If the tree roots hit read errors during mount, btrfs is not properly erroring out. We need to check the uptodate bits after reading in the tree root node. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:52:13 -04:00
Daniel Cadete	c271b49241	Btrfs: remove of redundant btrfs_header_level This removes the continues call's of btrfs_header_level. One call of btrfs_header_level(c) its enough. Signed-off-by Daniel Cadete <danielncadete10@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:52:13 -04:00
Julia Lawall	33c17ad571	Btrfs: adjust NULL test Move the call to BUG_ON to before the dereference of the tested value. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:01 -04:00
David Woodhouse	3acada49c2	Btrfs: Remove broken sanity check from btrfs_rmap_block() It was never actually doing anything anyway (see the loop condition), and it would be difficult to make it work for RAID[56]. Even if it was actually working, it's checking for the wrong thing anyway. Instead of checking whether we list a block which _doesn't_ land at the relevant physical location, it should be checking that we _have_ listed all the logical blocks which refer to the required physical location on all devices. This function is only called from remove_sb_from_cache() to ensure that we reserve the logical blocks which would reside at the same physical location as the superblock copies. So listing more blocks than we need is actually OK. With RAID[56] we're going to throw away an entire stripe for each block we have to ignore, so we _are_ going to list blocks other than the ones which actually contain the superblock. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:01 -04:00
Julia Lawall	29c5e8ce01	Btrfs: convert nested spin_lock_irqsave to spin_lock If spin_lock_irqsave is called twice in a row with the same second argument, the interrupt state at the point of the second call overwrites the value saved by the first call. Indeed, the second call does not need to save the interrupt state, so it is changed to a simple spin_lock. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 16:49:00 -04:00
Yan Zheng	4a8c9a62d7	Btrfs: make sure all dirty blocks are written at commit time Write dirty block groups may allocate new block, and so may add new delayed back ref. btrfs_run_delayed_refs may make some block groups dirty. commit_cowonly_roots does not handle the recursion properly, and some dirty blocks can be left unwritten at commit time. This patch moves btrfs_run_delayed_refs into the loop that writes dirty block groups, and makes the code not break out of the loop until there are no dirty block groups or delayed back refs. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 10:07:05 -04:00
Yan Zheng	33c66f430b	Btrfs: fix locking issue in btrfs_find_next_key When walking up the tree, btrfs_find_next_key assumes the upper level tree block is properly locked. This isn't always true even path->keep_locks is 1. This is because btrfs_find_next_key may advance path->slots[] several times instead of only once. When 'path->slots[level] >= btrfs_header_nritems(path->nodes[level])' is found, we can't guarantee the original value of 'path->slots[level]' is 'btrfs_header_nritems(path->nodes[level]) - 1'. If it's not, the tree block at 'level + 1' isn't locked. This patch fixes the issue by explicitly checking the locking state, re-searching the tree if it's not locked. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	e457afec60	Btrfs: fix double increment of path->slots[0] in btrfs_next_leaf if 1 is returned by btrfs_search_slot, the path already points to the first item with 'key > searching key'. So increasing path->slots[0] by one is superfluous in that case. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	bf1fb512a5	Btrfs: properly update space information after shrinking device. Change 'goto done' to 'break' for the case of all device extents have been freed, so that the code updates space information will be execute. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Yan Zheng	1bec1aed1e	Btrfs: fix definition of struct btrfs_extent_inline_ref use __le64 instead of u64 in on-disk structure definition. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-22 09:59:00 -04:00
Hu Tao	68f5a38c3e	Btrfs: fix error message formatting Make an error msg look nicer by inserting a space between number and word. Signed-off-by: Hu Tao <hu.taoo@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:55:45 -04:00
Jiri Slaby	9b627e9bf4	Btrfs: fix use after free in btrfs_start_workers fail path worker memory is already freed on one fail path in btrfs_start_workers, but is still dereferenced. Switch the dereference and kfree. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:50:58 -04:00
Chris Mason	9427216476	Btrfs: honor nodatacow/sum mount options for new files The btrfs attr patches unconditionally inherited the inode flags field without honoring nodatacow and nodatasum. This fix makes sure we properly record the nodatacow/sum mount options in new inodes. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:41:17 -04:00
Yan Zheng	2c47e605a9	Btrfs: update backrefs while dropping snapshot The new backref format has restriction on type of backref item. If a tree block isn't referenced by its owner tree, full backrefs must be used for the pointers in it. When a tree block loses its owner tree's reference, backrefs for the pointers in it should be updated to full backrefs. Current btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for general use. This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a problem in the restricted form btrfs_drop_snapshot is used today, but for general snapshot deletion this update is required. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:41:17 -04:00
Josef Bacik	a970b0a16c	Btrfs: account for space we may use in fallocate Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC panic on btrfs. This is because we do not account for data we may use when doing the fallocate. This patch fixes the problem by properly reserving space, and then just freeing it when we are done. The reservation stuff was made with delalloc in mind, so its a little crude for this case, but it keeps the box from panicing. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-07-02 13:41:16 -04:00

1 2 3 4 5 ...

145645 Commits