linux

Author	SHA1	Message	Date
Kent Overstreet	ca43f73cd1	bcachefs: bch2_btree_write_buffer_flush_going_ro() The write buffer needs to be specifically flushed when going RO: keys in the journal that haven't yet been moved to the write buffer don't have a journal pin yet. This fixes numerous syzbot bugs, all with symptoms of still doing writes after we've gone RO. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-11-07 23:31:11 -05:00
Kent Overstreet	a0d11feefb	bcachefs: Don't use commit_do() unnecessarily Using commit_do() to call alloc_sectors_start_trans() breaks when we're randomly injecting transaction restarts - the restart in the commit causes us to leak the lock that alloc_sectors_start_trans() takes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-10-18 00:49:48 -04:00
Kent Overstreet	5e3b72324d	bcachefs: Fix sysfs warning in fstests generic/730,731 sysfs warns if we're removing a symlink from a directory that's no longer in sysfs; this is triggered by fstests generic/730, which simulates hot removal of a block device. This patch is however not a correct fix, since checking kobj->state_in_sysfs on a kobj owned by another subsystem is racy. A better fix would be to add the appropriate check to sysfs_remove_link() - and sysfs_create_link() as well. But kobject_add_internal()/kobject_del() do not as of today have locking that would support that. Note that the block/holder.c code appears to be subject to this race as well. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-10-14 05:43:01 -04:00
Kent Overstreet	691f2cba22	bcachefs: btree cache counters should be size_t 32 bits won't overflow any time soon, but size_t is the correct type for counting objects in memory. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	17405279e8	bcachefs: bch2_sb_member_alloc() refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	6b812f1dce	bcachefs: bch2_dev_remove_alloc() -> alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	c7652f253a	bcachefs: promote_whole_extents is now a normal option Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-09 09:41:48 -04:00
Kent Overstreet	112d21fd1a	bcachefs: switch to rhashtable for vfs inodes hash the standard vfs inode hash table suffers from painful lock contention - this is long overdue Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-09 09:41:47 -04:00
Kent Overstreet	e61dd67860	bcachefs: Fix double free of ca->buckets_nouse Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `ffcbec6076` ("bcachefs: Kill opts.buckets_nouse") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-30 20:43:29 -04:00
Kent Overstreet	ec8bf491a9	bcachefs: Improve startup message We're not always mounting when we start the filesystem Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:16 -04:00
Kent Overstreet	36008d5d01	bcachefs: Plumb more logging through stdio redirect Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	5668e5deec	bcachefs: bch2_verify_accounting_clean() Verify that the in-memory accounting matches the on-disk accounting after a clean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	fb23d57a6d	bcachefs: Convert gc to new accounting Rewrite fsck/gc for the new accounting scheme. This adds a second set of in-memory accounting counters for gc to use; like with other parts of gc we run all triggers in TRIGGER_GC mode, then compare what we calculated to existing in-memory accounting at the end. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	4c4a7d48bd	bcachefs: Kill replicas_journal_res More dead code deletion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	8bb8d683a4	bcachefs: Delete journal-buf-sharded old style accounting More deletion of dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	3afb8dbf03	bcachefs: kill bch2_fs_usage_read() With bch2_ioctl_fs_usage(), this is now dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	1d16c605cc	bcachefs: Disk space accounting rewrite Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percpu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Thomas Bertschinger	51fc436c80	bcachefs: allow passing full device path for target options The output of mount options such as "metadata_target" in `/proc/mounts` uses the full path to the device. mount(8) from util-linux uses the output from `/proc/mounts` to pass existing mount options when performing a remount, so bcachefs should accept as input the same form that it prints as output. Without this change: $ mount -t bcachefs -o metadata_target=vdb /dev/vdb /mnt $ strace mount -o remount /mnt ... fsconfig(4, FSCONFIG_SET_STRING, "metadata_target", "/dev/vdb", 0) = -1 EINVAL (Invalid argument) ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	44ec599035	bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs On a new filesystem or device we have to allocate the journal with a bump allocator, because allocation info isn't ready yet - but when hot-adding a device that doesn't have a journal, we don't want to use that path. Reported-by: syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:47:31 -04:00
Kent Overstreet	759b2e800f	bcachefs: Switch online_reserved shutdown assert to WARN() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 11:06:31 -04:00
Kent Overstreet	64ee1431cc	bcachefs: Discard, invalidate workers are now per device There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups. Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 18:47:55 -04:00
Kent Overstreet	36da8e387b	bcachefs: Add missing recalc_capacity() call This fixes filesystem size not changing on device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 10:12:51 -04:00
Kent Overstreet	504794067f	bcachefs: Replace bare EEXIST with private error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	f770a6e9a3	bcachefs: Fix initialization order for srcu barrier btree_iter_init() needs to happen before key_cache_init(), to initialize btree_trans_barrier Reported-by: syzbot+3cca837c2183f8f6fcaf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	161f73c2c7	bcachefs: Split out btree_write_submit_wq Split the workqueues for btree read completions and btree write submissions; we don't want concurrency control on btree read completions, but we do want concurrency control on write submissions, else blocking in submit_bio() will cause a ton of kworkers to be allocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:15 -04:00
Kent Overstreet	d509cadc3a	bcachefs: Fix debug assert Reported-by: syzbot+a8074a75b8d73328751e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 12:40:30 -04:00
Kent Overstreet	d293ece108	bcachefs: Fix shutdown ordering the btree key cache uses the srcu struct created/destroyed by btree_iter.c; btree_iter needs to be exited last. Reported-by: syzbot+3af9daea347788b15213@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 19:54:03 -04:00
Kent Overstreet	dbd0408087	bcachefs: move replica_set from bch_dev to bch_fs This is needed for the next patch - the write submit path has to be able to allocate a replica bio even when we weren't able to get a ref on the device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	552aa54865	bcachefs: Debug asserts for ca->ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f295298b8c	bcachefs: New helpers for device refcounts This will be used in the next patch for adding some new debug mode asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	b895c70326	bcachefs: x-macroize journal flags enums Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	3a718c0647	bcachefs: On device add, prefer unused slots We can't strictly guarantee that no pointers refer to nonexistent devices - we attempt to, but we need to be safe when the filesystem is corrupt. Therefore, change device_add to try to pick a slot that's never been used, or the slot that's been unused the longest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	ffcbec6076	bcachefs: Kill opts.buckets_nouse Now explicitly allocate and free the buckets_nouse bitmap - this is going to be used for online fsck. To go RW when we haven't checked allocations, we'll do a much slimmed down version that just initializes the buckets_nouse bitmaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f04158290d	bcachefs: journal seq blacklist gc no longer has to walk btree Since btree_ptr_v2, we no longer require the journal seq blacklist table for skipping blacklisted bsets (btree node entries); the pointer to a given node indicates how much data is present. Therefore there's no longer any need for journal seq blacklist gc to walk the btree - we can prune entries older than journal last_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	103304021e	bcachefs: Move gc of bucket.oldest_gen to workqueue This is a nice cleanup - and we've also been having problems with kthread creation in the mount path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	feb255537d	bcachefs: assert that online_reserved == 0 on shutdown Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	2f724563fc	bcachefs: member helper cleanups Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	5dd8c60e1e	bcachefs: iter/update/trigger/str_hash flag cleanup Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	c281db0fa5	bcachefs: mark_superblock cleanup Consolidate mark_superblock() and trans_mark_superblock(), like we did with the other trigger paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	497c982f05	bcachefs: New assertion for writing to the journal after shutdown Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	db42549d40	bcachefs: Add a better limit for maximum number of buckets The bucket_gens array is a single array allocation (one byte per bucket), and kernel allocations are still limited to INT_MAX. Check this limit to avoid failing the bucket_gens array allocation. Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-06 10:58:17 -04:00
Kent Overstreet	3a2d025927	bcachefs: Fix bch2_dev_lookup() refcounting bch2_dev_lookup() is supposed to take a ref on the device it returns, but for_each_member_device() takes refs as it iterates, for_each_member_device_rcu() does not. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-06 10:58:17 -04:00
Kent Overstreet	ec438ac59d	bcachefs: Fix missing call to bch2_fs_allocator_background_exit() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-20 00:31:59 -04:00
Kent Overstreet	9802ff48f3	bcachefs: Print shutdown journal sequence number Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-04 16:56:44 -04:00
Kent Overstreet	4409b8081d	bcachefs: Repair pass for scanning for btree nodes If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesystem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	13c1e583f9	bcachefs: Improve -o norecovery; opts.recovery_pass_limit This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	0a34c058fc	bcachefs: Ensure bch_sb_field_ext always exists This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	3ed94062e3	bcachefs: Improve bch2_fatal_error() error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 00:24:24 -04:00
Kent Overstreet	f3589bfa7e	bcachefs: fix for building in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:12 -04:00
Darrick J. Wong	273960b8f3	bcachefs: time_stats: split stats-with-quantiles into a separate structure Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:38:01 -04:00


314 Commits