linux

Author	SHA1	Message	Date
NeilBrown	17045f52ac	md/raid5: recognise replacements when assembling array. If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:53 +11:00
NeilBrown	dd054fce88	md/raid5: handle activation of replacement device when recovery completes. When recovery completes - as reported by a call to ->spare_active, we clear In_sync on the original and set it on the replacement. Then when the original gets removed we move the replacement from 'replacement' to 'rdev'. This could race with other code that is looking at these pointers, so we use memory barriers and careful ordering to ensure that a reader might see one device twice, but never no devices. Then the readers guard against using both devices, which could only happen when writing. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:53 +11:00
NeilBrown	9a3e1101b8	md/raid5: detect and handle replacements during recovery. During recovery we want to write to the replacement but not the original. So we have two new flags - R5_NeedReplace if this stripe has a replacement that needs to be written at some stage - R5_WantReplace if NeedReplace, and the data is available, and a 'sync' has been requested on this stripe. We also distinguish between 'sync and replace' which need to read all other devices, and 'replace' which only needs to read the devices being replaced. Note that during resync we always write to any replacement device. It might not need to be written to, but as we don't read to compare, we have to write to be sure. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:53 +11:00
NeilBrown	977df36255	md/raid5: writes should get directed to replacement as well as original. When writing, we need to submit two writes, one to the original, and one to the replacement - if there is a replacement. If the write to the replacement results in a write error, we just fail the device. We only try to record write errors to the original. When writing for recovery, we shouldn't write to the original. This will be addressed in a subsequent patch that generally addresses recovery. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:53 +11:00
NeilBrown	657e3e4d88	md/raid5: allow removal for failed replacement devices. Enhance raid5_remove_disk to be able to remove ->replacement as well as ->rdev. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:52 +11:00
NeilBrown	14a75d3e07	md/raid5: preferentially read from replacement device if possible. If a replacement device is present and has been recovered far enough, then use it for reading into the stripe cache. If we get an error we don't try to repair it, we just fail the device. A replacement device that gives errors does not sound sensible. This requires removing the setting of R5_ReadError when we get a read error during a read that bypasses the cache. It was probably a bad idea anyway as we don't know that every block in the read caused an error, and it could cause ReadError to be set for the replacement device, which is bad. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:52 +11:00
NeilBrown	995c4275a7	md/raid5: remove redundant bio initialisations. We current initialise some fields of a bio when preparing a stripe_head, and again just before submitting the request. Remove the duplication by only setting the fields that lower level devices don't touch in raid5_build_block, and only set the changeable fields in ops_run_io. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:52 +11:00
NeilBrown	ede7ee8b4d	md/raid5: raid5.h cleanup Remove some #defines that are no longer used, and replace some others with an enum. And remove an unused field. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:52 +11:00
NeilBrown	671488cc25	md/raid5: allow each slot to have an extra replacement device Just enhance data structures to record a second device per slot to be used as a 'replacement' device, replacing the original. We also have a second bio in each slot in each stripe_head. This will only be used when writing to the array - we need to write to both the original and the replacement at the same time, so will need two bios. For now, only try using the replacement drive for aligned-reads. In this case, we prefer the replacement if it has been recovered far enough, otherwise use the original. This includes a small enhancement. Previously we would only do aligned reads if the target device was fully recovered. Now we also do them if it has recovered far enough. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:52 +11:00
NeilBrown	2d78f8c451	md: create externally visible flags for supporting hot-replace. hot-replace is a feature being added to md which will allow a device to be replaced without removing it from the array first. With hot-replace a spare can be activated and recovery can start while the original device is still in place, thus allowing a transition from an unreliable device to a reliable device without leaving the array degraded during the transition. It can also be use when the original device is still reliable but it not wanted for some reason. This will eventually be supported in RAID4/5/6 and RAID10. This patch adds a super-block flag to distinguish the replacement device. If an old kernel sees this flag it will reject the device. It also adds two per-device flags which are viewable and settable via sysfs. "want_replacement" can be set to request that a device be replaced. "replacement" is set to show that this device is replacing another device. The "rd%d" links in /sys/block/mdXx/md only apply to the original device, not the replacement. We currently don't make links for the replacement - there doesn't seem to be a need. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:51 +11:00
NeilBrown	b8321b68d1	md: change hot_remove_disk to take an rdev rather than a number. Soon an array will be able to have multiple devices with the same raid_disk number (an original and a replacement). So removing a device based on the number won't work. So pass the actual device handle instead. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:51 +11:00
NeilBrown	476a7abb9b	md: remove test for duplicate device when setting slot number. When setting the slot number on a device in an active array we currently check that the number is not already in use. We then call into the personality's hot_add_disk function which performs the same test and returns the same error. Thus the common test is not needed. As we will shortly be changing some personalities to allow duplicates in some cases (to support hot-replace), the common test will become inconvenient. So remove the common test. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:51 +11:00
NeilBrown	915c420ddf	md/bitmap: be more consistent when setting new bits in memory bitmap. For each active region corresponding to a bit in the bitmap with have a 14bit counter (and some flags). This counts number of active writes + bit in the on-disk bitmap + delay-needed. The "delay-needed" is because we always want a delay before clearing a bit. So the number here is normally number of active writes plus 2. If there have been no writes for a while, we drop to 1. If still no writes we clear the bit and drop to 0. So for consistency, when setting bit from the on-disk bitmap or by request from user-space it is best to set the counter to '2' to start with. In particular we might also set the NEEDED_MASK flag at this time, and in all other cases NEEDED_MASK is only set when the counter is 2 or more. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:51 +11:00
NeilBrown	908f4fbd26	md/raid5: be more thorough in calculating 'degraded' value. When an array is being reshaped to change the number of devices, the two halves can be differently degraded. e.g. one could be missing a device and the other not. So we need to be more careful about calculating the 'degraded' attribute. Instead of just inc/dec at appropriate times, perform a full re-calculation examining both possible cases. This doesn't happen often so it not a big cost, and we already have most of the code to do it. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:50 +11:00
NeilBrown	2e61ebbcc4	md/bitmap: daemon_work cleanup. We have a variable 'mddev' in this function, but repeatedly get the same value by dereferencing bitmap->mddev. There is room for simplification here... Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:50 +11:00
NeilBrown	506c9e44a8	md: allow non-privileged uses to GET_*_INFO about raid arrays. The info is already available in /proc/mdstat and /sys/block in an accessible form so there is no point in putting a road-block in the ioctl for information gathering. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 10:17:26 +11:00
NeilBrown	961902c0f8	md/bitmap: It is OK to clear bits during recovery. commit `d0a4bb4927` introduced a regression which is annoying but fairly harmless. When writing to an array that is undergoing recovery (a spare in being integrated into the array), writing to the array will set bits in the bitmap, but they will not be cleared when the write completes. For bits covering areas that have not been recovered yet this is not a problem as the recovery will clear the bits. However bits set in already-recovered region will stay set and never be cleared. This doesn't risk data integrity. The only negatives are: - next time there is a crash, more resyncing than necessary will be done. - the bitmap doesn't look clean, which is confusing. While an array is recovering we don't want to update the 'events_cleared' setting in the bitmap but we do still want to clear bits that have very recently been set - providing they were written to the recovering device. So split those two needs - which previously both depended on 'success' and always clear the bit of the write went to all devices. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 09:57:48 +11:00
NeilBrown	60fc13702a	md: don't give up looking for spares on first failure-to-add Before performing a recovery we try to remove any spares that might not be working, then add any that might have become relevant. Currently we abort on the first spare that cannot be added. This is a false optimisation. It is conceivable that - depending on rules in the personality - a subsequent spare might be accepted. Also the loop does other things like count the available spares and reset the 'recovery_offset' value. If we abort early these might not happen properly. So remove the early abort. In particular if you have an array what is undergoing recovery and which has extra spares, then the recovery may not restart after as reboot as the could of 'spares' might end up as zero. Reported-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 09:57:19 +11:00
NeilBrown	30d7a48368	md/raid5: ensure correct assessment of drives during degraded reshape. While reshaping a degraded array (as when reshaping a RAID0 by first converting it to a degraded RAID4) we currently get confused about which devices are in_sync. In most cases we get it right, but in the region that is being reshaped we need to treat non-failed devices as in-sync when we have the data but haven't actually written it out yet. Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 09:57:00 +11:00
NeilBrown	09cd9270ea	md/linear: fix hot-add of devices to linear arrays. commit `d70ed2e4fa` broke hot-add to a linear array. After that commit, metadata if not written to devices until they have been fully integrated into the array as determined by saved_raid_disk. That patch arranged to clear that field after a recovery completed. However for linear arrays, there is no recovery - the integration is instantaneous. So we need to explicitly clear the saved_raid_disk field. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-23 09:56:55 +11:00
Adam Kwolek	5d8c71f9e5	md: raid5 crash during degradation NULL pointer access causes crash in raid5 module. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-09 14:26:11 +11:00
NeilBrown	9283d8c5af	md/raid5: never wait for bad-block acks on failed device. Once a device is failed we really want to completely ignore it. It should go away soon anyway. In particular the presence of bad blocks on it should not cause us to block as we won't be trying to write there anyway. So as soon as we can check if a device is Faulty, do so and pretend that it is already gone if it is Faulty. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 16:27:57 +11:00
NeilBrown	8bd2f0a05b	md: ensure new badblocks are handled promptly. When we mark blocks as bad we need them to be acknowledged by the metadata handler promptly. For an in-kernel metadata handler that was already being done. But for an external metadata handler we need to alert it of the change by sending a notification through the sysfs file. This adds that notification. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 16:26:08 +11:00
NeilBrown	52c64152a9	md: bad blocks shouldn't cause a Blocked status on a Faulty device. Once a device is marked Faulty the badblocks - whether acknowledged or not - become irrelevant. So they shouldn't cause the device to be marked as Blocked. Without this patch, a process might write "-blocked" to clear the Blocked status, but while that will correctly fail the device, it won't remove the apparent 'blocked' status. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 16:22:48 +11:00
NeilBrown	af8a24347f	md: take a reference to mddev during sysfs access. When we are accessing an mddev via sysfs we know that the mddev cannot disappear because it has an embedded kobj which is refcounted by sysfs. And we also take the mddev_lock. However this is not enough. The final mddev_put could have been called and the mddev_delayed_delete is waiting for sysfs to let go so it can destroy the kobj and mddev. In this state there are a lot of changes that should not be attempted. To to guard against this we: - initialise mddev->all_mddevs in on last put so the state can be easily detected. - in md_attr_show and md_attr_store, check ->all_mddevs under all_mddevs_lock and mddev_get the mddev if it still appears to be active. This means that if we get to sysfs as the mddev is being deleted we will get -EBUSY. rdev_attr_store and rdev_attr_show are similar but already have sufficient protection. They check that rdev->mddev still points to mddev after taking mddev_lock. As this is cleared before delayed removal which can only be requested under the mddev_lock, this ensure the rdev and mddev are still alive. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 15:49:46 +11:00
NeilBrown	1d23f178d5	md: refine interpretation of "hold_active == UNTIL_IOCTL". We like md devices to disappear when they really are not needed. However it is not possible to tell from the current state whether it is needed or not. We can only tell from recent history of changes. In particular immediately after we create an md device it looks very similar to immediately after we have finished with it. So we always preserve a newly created md device until something significant happens. This state is stored in 'hold_active'. The normal case is to keep it until an ioctl happens, as that will normally either activate it, or explicitly de-activate it. If it doesn't then it was probably created by mistake and it is now time to get rid of it. We can also modify an array via sysfs (instead of via ioctl) and we currently treat any change via sysfs like an ioctl as a sign that if it now isn't more active, it should be destroyed. However this is not appropriate as changes made via sysfs are more gradual so we should look for a more definitive change. So this patch only clears 'hold_active' from UNTIL_IOCTL to clear when the array_state is changed via sysfs. Other changes via sysfs are ignored. Signed-off-by: NeilBrown <neilb@suse.de>	2011-12-08 15:49:12 +11:00
NeilBrown	7c8f424798	md/lock: ensure updates to page_attrs are properly locked. Page attributes are set using __set_bit rather than set_bit as it normally called under a spinlock so the extra atomicity is not needed. However there are two places where we might set or clear page attributes without holding the spinlock. So add the spinlock in those cases. This might be the cause of occasional reports that bits a aren't getting clear properly - theory is that BITMAP_PAGE_PENDING gets lost when BITMAP_PAGE_NEEDWRITE is set or cleared. This is an inconvenience, not a threat to data safety. Signed-off-by: NeilBrown <neilb@suse.de>	2011-11-23 10:18:52 +11:00
Dan Williams	257a4b42af	md/raid5: STRIPE_ACTIVE has lock semantics, add barriers All updates that occur under STRIPE_ACTIVE should be globally visible when STRIPE_ACTIVE clears. test_and_set_bit() implies a barrier, but clear_bit() does not. This is suitable for 3.1-stable. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org	2011-11-08 16:22:06 +11:00
NeilBrown	9a3f530f39	md/raid5: abort any pending parity operations when array fails. When the number of failed devices exceeds the allowed number we must abort any active parity operations (checks or updates) as they are no longer meaningful, and can lead to a BUG_ON in handle_parity_checks6. This bug was introduce by commit `6c0069c0ae` in 2.6.29. Reported-by: Manish Katiyar <mkatiyar@gmail.com> Tested-by: Manish Katiyar <mkatiyar@gmail.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org	2011-11-08 16:22:01 +11:00
Stephen Rothwell	a84450604d	device-mapper: using EXPORT_SYBOL in dm-space-map-checker.c needs export.h Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:29:10 -08:00
Stephen Rothwell	6f66263f8e	device-mapper: dm-bufio.c needs to include module.h since it uses the module facilities. Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:29:10 -08:00
Paul Gortmaker	1944ce60fe	drivers/md: change module.h -> export.h in persistent-data/dm-* For the files which are not themselves modular, we can change them to include only the smaller export.h since all they are doing is looking for EXPORT_SYMBOL. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-07 10:29:09 -08:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	b4fdcb02f1	Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c	2011-11-04 17:06:58 -07:00
Linus Torvalds	43672a0784	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm * git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm: dm: raid fix device status indicator when array initializing dm log userspace: add log device dependency dm log userspace: fix comment hyphens dm: add thin provisioning target dm: add persistent data library dm: add bufio dm: export dm get md dm table: add immutable feature dm table: add always writeable feature dm table: add singleton feature dm kcopyd: add dm_kcopyd_zero to zero an area dm: remove superfluous smp_mb dm: use local printk ratelimit dm table: propagate non rotational flag	2011-11-02 17:02:37 -07:00
Paul Gortmaker	daaa5f7cbe	md: Add in export.h for files using EXPORT_SYMBOL These files were getting the defines for EXPORT_SYMBOL because device.h was including module.h. But we are going to put an end to that. So add the proper export.h include now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:19 -04:00
Paul Gortmaker	056075c764	md: Add module.h to all files using it implicitly A pending cleanup will mean that module.h won't be implicitly everywhere anymore. Make sure the modular drivers in md dir are actually calling out for <module.h> explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:18 -04:00
Linus Torvalds	571109f536	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: md/raid10: Fix bug when activating a hot-spare.	2011-10-31 15:21:29 -07:00
Jonathan E Brassow	2e727c3ca1	dm: raid fix device status indicator when array initializing When devices in a RAID array are not in-sync, they are supposed to be reported as such in the status output as an 'a' character, which means "alive, but not in-sync". But when the entire array is rebuilt 'A' is being used, which is incorrect. This patch corrects this to 'a'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:26 +00:00
Jonathan E Brassow	5a25f0eb70	dm log userspace: add log device dependency Allow userspace dm log implementations to register their log device so it is no longer missing from the list of device dependencies. When device mapper targets use a device they normally call dm_get_device which includes it in the device list returned to userspace applications such as LVM through the DM_TABLE_DEPS ioctl. Userspace log devices don't use dm_get_device as userspace opens them so they are missing from the list of dependencies. This patch extends the DM_ULOG_CTR operation to allow userspace to respond with the name of the log device (if appropriate) to be registered via 'dm_get_device'. DM_ULOG_REQUEST_VERSION is incremented. This is backwards compatible. If the kernel and userspace log server have both been updated, the new information will be passed down to the kernel and the device will be registered. If the kernel is new, but the log server is old, the log server will not pass down any device information and the kernel will simply bypass the device registration as before. If the kernel is old but the log server is new, the log server will see the old version number and not pass the device info. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:24 +00:00
Jonathan Brassow	b89544575d	dm log userspace: fix comment hyphens Fix comments: clustered-disk needs a hyphen not an underscore. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:22 +00:00
Joe Thornber	991d9fa02d	dm: add thin provisioning target Initial EXPERIMENTAL implementation of device-mapper thin provisioning with snapshot support. The 'thin' target is used to create instances of the virtual devices that are hosted in the 'thin-pool' target. The thin-pool target provides data sharing among devices. This sharing is made possible using the persistent-data library in the previous patch. The main highlight of this implementation, compared to the previous implementation of snapshots, is that it allows many virtual devices to be stored on the same data volume, simplifying administration and allowing sharing of data between volumes (thus reducing disk usage). Another big feature is support for arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots ...). The previous implementation of snapshots did this by chaining together lookup tables, and so performance was O(depth). This new implementation uses a single data structure so we don't get this degradation with depth. For further information and examples of how to use this, please read Documentation/device-mapper/thin-provisioning.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:18 +00:00
Joe Thornber	3241b1d3e0	dm: add persistent data library The persistent-data library offers a re-usable framework for the storage and management of on-disk metadata in device-mapper targets. It's used by the thin-provisioning target in the next patch and in an upcoming hierarchical storage target. For further information, please read Documentation/device-mapper/persistent-data.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:11 +00:00
Mikulas Patocka	95d402f057	dm: add bufio The dm-bufio interface allows you to do cached I/O on devices, holding recently-read blocks in memory and performing delayed writes. We don't use buffer cache or page cache already present in the kernel, because: * we need to handle block sizes larger than a page * we can't allocate memory to perform reads or we'd have deadlocks Currently, when a cache is required, we limit its size to a fraction of available memory. Usage can be viewed and changed in /sys/module/dm_bufio/parameters/ . The first user is thin provisioning, but more dm users are planned. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:09 +00:00
Alasdair G Kergon	3cf2e4ba74	dm: export dm get md Export dm_get_md() for the new thin provisioning target to use. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:06 +00:00
Alasdair G Kergon	36a0456fbf	dm table: add immutable feature Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed with any other target type, and once loaded into a device, it cannot be replaced with a table containing a different type. The thin provisioning pool device will use this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:04 +00:00
Alasdair G Kergon	cc6cbe141a	dm table: add always writeable feature Add a target feature flag DM_TARGET_ALWAYS_WRITEABLE to indicate that a target does not support read-only mode. The initial implementation of the thin provisioning target uses this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:02 +00:00
Alasdair G Kergon	3791e2fc0e	dm table: add singleton feature Introduce the concept of a singleton table which contains exactly one target. If a target type sets the DM_TARGET_SINGLETON feature bit device-mapper will ensure that any table that includes that target contains no others. The thin provisioning pool target uses this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:19:00 +00:00
Mikulas Patocka	7f06965390	dm kcopyd: add dm_kcopyd_zero to zero an area This patch introduces dm_kcopyd_zero() to make it easy to use kcopyd to write zeros into the requested areas instead instead of copying. It is implemented by passing a NULL copying source to dm_kcopyd_copy(). The forthcoming thin provisioning target uses this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:18:58 +00:00
Namhyung Kim	fbdc86f3bd	dm: remove superfluous smp_mb Since set_current_state() contains a memory barrier in it, an additional barrier isn't needed. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:18:56 +00:00

1 2 3 4 5 ...

2172 Commits