Open buckets on the partial list should not count as allocated when
we're trying to allocate from the partial list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a filesystem flag to indicate whether we did a clean recovery -
using c->sb.clean after we've gone rw is incorrect, since c->sb is
updated whenever we write the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_SB_ERRS() has a field for the actual enum val so that we can reorder
and reorganize the errors, but the way BCH_SB_ERR_MAX was defined didn't
allow for this.
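For illustration of the pitfall (hypothetical macro names, not the actual
BCH_SB_ERRS() list): when x-macro entries carry explicit values, a MAX
enumerator appended inside the enum ends up as "last listed value + 1",
which goes wrong as soon as the list is reordered; counting the entries
instead doesn't care about order:

  #define EXAMPLE_ERRS()          \
          x(bad_magic,     0)     \
          x(bad_checksum,  2)     \
          x(bad_seq,       1)

  enum example_err {
  #define x(t, n) EXAMPLE_ERR_##t = n,
          EXAMPLE_ERRS()
  #undef x
          EXAMPLE_ERR_MAX_BROKEN  /* == 2: wrong once the list is reordered */
  };

  enum {
  #define x(t, n) + 1
          EXAMPLE_ERR_NR = 0 EXAMPLE_ERRS()  /* == 3, reorder-proof */
  #undef x
  };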
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a counter that's incremented whenever rw devices change; this will
be used for erasure coding so that it can keep ec_stripe_head in sync
and not deadlock on a new stripe when a device it wants goes away.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds mount options for specifying recovery passes to run, or
exclude; the immediate need for this is that backpointers fsck is having
trouble completing, so we need a way to skip it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The standard vfs inode hash table suffers from painful lock contention -
switching away from it is long overdue.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes two problems in the handling of negative times:
• rem is signed, but the rem * c->sb.nsec_per_time_unit operation
produced a bogus unsigned result, because s32 * u32 = u32.
• The timespec was not normalized (it could contain more than a
billion nanoseconds).
For example, { .tv_sec = -14245441, .tv_nsec = 750000000 }, after
being round tripped through timespec_to_bch2_time and then
bch2_time_to_timespec would come back as
{ .tv_sec = -14245440, .tv_nsec = 4044967296 } (more than 4 billion
nanoseconds).
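For reference, a standalone userspace demonstration of the promotion issue
(simplified, not the actual conversion code; the values are chosen to be
consistent with the example above):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          int32_t  rem  = -250000000;   /* signed remainder, in ns */
          uint32_t unit = 1;            /* nsec_per_time_unit      */

          uint64_t bad  = rem * unit;             /* s32 * u32 is done as u32 */
          int64_t  good = (int64_t) rem * unit;   /* multiply widened first   */

          /* prints bad=4044967296 good=-250000000 */
          printf("bad=%llu good=%lld\n",
                 (unsigned long long) bad, (long long) good);
          return 0;
  }

The multiply has to be done in 64 bits, and the result then needs
normalizing so tv_nsec stays in [0, NSEC_PER_SEC);
set_normalized_timespec64() is the stock kernel helper for the latter.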
Cc: stable@vger.kernel.org
Fixes: 595c1e9bab ("bcachefs: Fix time handling")
Closes: https://github.com/koverstreet/bcachefs/issues/743
Co-developed-by: Erin Shepherd <erin.shepherd@e43.eu>
Signed-off-by: Erin Shepherd <erin.shepherd@e43.eu>
Co-developed-by: Ryan Lahfa <ryan@lahfa.xyz>
Signed-off-by: Ryan Lahfa <ryan@lahfa.xyz>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Limit these messages to once every 2 minutes to avoid spamming the logs;
with multiple devices the output adds up quickly.
Also, up the default timeout to 30 seconds from 10 seconds.
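Sketch of the throttling, using the stock ratelimit API for illustration
(bcachefs has its own helpers; names and message here are illustrative):

  static DEFINE_RATELIMIT_STATE(dev_wait_rs, 120 * HZ, 1);

  if (__ratelimit(&dev_wait_rs))
          pr_notice("still waiting for devices, %u seconds until timeout\n",
                    seconds_remaining);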
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
gc_lock is now only for synchronization between check_alloc_info and
interior btree updates - nothing else
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Break up the percpu counter allocations into individual allocations for
each disk accounting counter; this fixes an issue on large systems where
we have too many replica entries for the percpu allocator's maximum
practical allocation size.
Also, use just one eytzinger tree for the normal set of counters and the
gc counters; this simplifies accounting_gc_done() where we need the same
set of counters to be present in both tables.
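Roughly the shape of the per-counter allocation (struct and function names
here are hypothetical):

  #include <linux/percpu.h>

  /* one percpu allocation per accounting entry, instead of one giant
   * percpu array sized by the total number of entries */
  struct mem_counter {
          unsigned        nr_counters;    /* u64s per cpu for this entry */
          u64 __percpu    *v;
  };

  static int mem_counter_init(struct mem_counter *c, unsigned nr)
  {
          c->nr_counters = nr;
          c->v = __alloc_percpu(nr * sizeof(u64), sizeof(u64));
          return c->v ? 0 : -ENOMEM;
  }

  static u64 mem_counter_read(struct mem_counter *c, unsigned i)
  {
          u64 ret = 0;
          int cpu;

          for_each_possible_cpu(cpu)
                  ret += per_cpu_ptr(c->v, cpu)[i];
          return ret;
  }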
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rewrite fsck/gc for the new accounting scheme.
This adds a second set of in-memory accounting counters for gc to use;
like with other parts of gc we run all triggers in TRIGGER_GC mode, then
compare what we calculated to existing in-memory accounting at the end.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reading disk accounting now requires an eytzinger lookup (see:
bch2_accounting_mem_read()), but the per-device counters are used
frequently enough that we'd like to still be able to read them with just
a percpu sum, as in the old code.
This patch special cases the device counters; when we update in-memory
accounting we also update the old-style percpu counters if it's a device
counter update.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Main part of the disk accounting rewrite.
This is a wholesale rewrite of the existing disk space accounting, which
relies on percpu counters that are sharded by journal buffer, and
rolled up and added to each journal write.
With the new scheme, every set of counters is a distinct key in the
accounting btree; this fixes scaling limitations of the old scheme,
where counters took up space in each journal entry and required multiple
percpu counters.
Now, in-memory accounting requires a single set of percpu counters - not
multiple for each in-flight journal buffer - and in the future we'll
probably also have counters that don't use in-memory percpu counters at
all, since they're not strictly required.
An accounting update is now a normal btree update, using the btree write
buffer path. At transaction commit time, we apply accounting updates to
the in memory counters, which are percpu counters indexed in an
eytzinger tree by the accounting key.
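A sketch of the commit-time hot path under the new scheme (names are
hypothetical; the lookup corresponds to the eytzinger search used by
bch2_accounting_mem_read()):

  struct mem_acct_entry {
          struct bpos     pos;            /* accounting key position */
          unsigned        nr_counters;    /* u64s per cpu            */
          u64 __percpu    *v;
  };

  struct mem_acct {
          struct mem_acct_entry   *entries;       /* kept in eytzinger order */
          unsigned                nr;
  };

  /* eytzinger search by position; returns m->nr if not found */
  unsigned mem_acct_lookup(struct mem_acct *m, struct bpos pos);

  /* called at transaction commit for each accounting update */
  static int mem_acct_mod(struct mem_acct *m, struct bpos pos,
                          const s64 *d, unsigned nr)
  {
          unsigned idx = mem_acct_lookup(m, pos);

          if (idx >= m->nr)
                  return -ENOENT;         /* caller adds a new entry first */

          for (unsigned i = 0; i < nr; i++)
                  this_cpu_add(m->entries[idx].v[i], d[i]);
          return 0;
  }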
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Teach the btree write buffer how to accumulate accounting keys - instead
of having the newer key overwrite the older key as we do with other
updates, we need to add them together.
Also, add a flag so that write buffer flush knows when journal replay is
finished flushing accounting, and teach it to hold accounting keys until
that flag is set.
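The accumulate step itself is just elementwise addition (hypothetical
types; counters past ->nr are assumed to be zero):

  struct acct_update {
          struct bpos     pos;    /* same pos => same set of counters */
          unsigned        nr;
          s64             d[8];
  };

  /* instead of dst = src ("newest wins"), buffered accounting updates
   * to the same position are summed */
  static void acct_accumulate(struct acct_update *dst,
                              const struct acct_update *src)
  {
          for (unsigned i = 0; i < src->nr; i++)
                  dst->d[i] += src->d[i];
          dst->nr = max(dst->nr, src->nr);
  }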
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
New on disk format version for bch_alloc->stripe_sectors and
BCH_DATA_unstriped - accounting for unstriped data in stripe buckets.
Upgrade/downgrade requires regenerating alloc info - but only if erasure
coding is in use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There's no reason for discards to be single threaded across all devices;
this will improve performance on multi device setups.
Additionally, making them per-device simplifies the refcounting on
bch_dev->io_ref; we now hold it for the duration that the discard path
is running, which fixes a race between the discard path and device
removal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need
overflow checks and guards.
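One form the guard can take is clamping to the 48-bit field before the
time is packed into an LRU key position (helper name is illustrative):

  #define LRU_TIME_BITS   48
  #define LRU_TIME_MAX    ((1ULL << LRU_TIME_BITS) - 1)

  static inline u64 lru_time_clamp(u64 time)
  {
          return min_t(u64, time, LRU_TIME_MAX);
  }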
Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Split the workqueues for btree read completions and btree write
submissions; we don't want concurrency control on btree read
completions, but we do want concurrency control on write submissions,
else blocking in submit_bio() will cause a ton of kworkers to be
allocated.
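Illustrative allocation of the two queues (names and flags are
approximate; the point is the difference in max_active):

  struct workqueue_struct *read_complete_wq, *write_submit_wq;

  /* read completions: effectively no concurrency limit */
  read_complete_wq = alloc_workqueue("btree_read_complete",
                                     WQ_HIGHPRI | WQ_MEM_RECLAIM, 512);

  /* write submissions may block in submit_bio(), so cap concurrency
   * instead of letting the workqueue spawn piles of kworkers */
  write_submit_wq = alloc_workqueue("btree_write_submit",
                                    WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);

  if (!read_complete_wq || !write_submit_wq)
          return -ENOMEM;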
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Compatibility fix - we no longer have a separate table specifying the
order in which gc walks btrees; instead, gc special cases the stripes
btree directly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Setting this flag on a filesystem results in validity checks being
skipped when writing bkeys. This flag will be used by tooling that
deliberately injects corruption into a filesystem in order to exercise
fsck. It shouldn't be set outside of testing/debugging code.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now have a small bitmap in the member info section of the superblock
for "regions that have btree nodes", so that if we ever have to scan for
btree nodes in repair we don't have to scan the whole device(s).
This tweaks the allocator to prefer allocating from regions that are
already marked in this bitmap.
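Sketch of the allocator-side check (field and helper names hypothetical):
each member carries a shift plus a 64-bit map over equal-sized regions of
the device, and a candidate bucket is preferred when its region's bit is
set:

  static bool bucket_in_btree_region(u64 btree_allocated_bitmap,
                                     unsigned btree_bitmap_shift,
                                     u64 bucket_start_sector)
  {
          u64 region = bucket_start_sector >> btree_bitmap_shift;

          return region < 64 && (btree_allocated_bitmap & BIT_ULL(region));
  }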
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is needed for the next patch - the write submit path has to be able
to allocate a replica bio even when we weren't able to get a ref on the
device.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This unifies the online and offline btree gc passes; we're not yet
running it online.
We now iterate over one level of the btree at a time - the same as
check_extents_to_backpointers(); this ordering preserves order of keys
regardless of btree splits and merges, which will be important when we
re-enable online gc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Looping when we change a bucket gen is not ideal - we risk failing
entirely if the loop never terminates, and it's better to make forward
progress even if fsck doesn't fix everything.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since btree_ptr_v2, we no longer require the journal seq blacklist table
for skipping blacklisted bsets (btree node entries); the pointer to a
given node indicates how much data is present.
Therefore there's no longer any need for journal seq blacklist gc to
walk the btree - we can prune entries older than journal last_seq.
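The pruning itself reduces to a simple filter over the in-memory table
(hypothetical types, sketch only):

  struct blacklist_entry {
          u64     start;
          u64     end;
  };

  struct blacklist_table {
          unsigned                nr;
          struct blacklist_entry  entries[];
  };

  /* drop every entry that ended before the journal's last_seq */
  static void blacklist_prune(struct blacklist_table *t, u64 last_seq)
  {
          unsigned dst = 0;

          for (unsigned i = 0; i < t->nr; i++)
                  if (t->entries[i].end >= last_seq)
                          t->entries[dst++] = t->entries[i];
          t->nr = dst;
  }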
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is a nice cleanup - and we've also been having problems with
kthread creation in the mount path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the snapshots btree is gone, we'll have to delete huge amounts of
data - unless we can reconstruct it by looking at the keys that refer to
it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.
Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesystem UUID, so we
can do so safely.
This implements the scanning - next patch will rework topology repair to
make use of the found nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.
So it's worth splitting this out into its own file; this code is pretty
different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We need to add bounds checking for snapshot table accesses - it turns
out there are cases where we do need to use the snapshots table before
fsck checks have completed (and indeed, fsck may not have been run).
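The accessors end up looking something like this (sketch; snapshot IDs
are allocated downward from U32_MAX, so the table index is U32_MAX - id):

  struct snapshot_t {
          u32     parent;
          u32     depth;
          /* other per-snapshot fields elided */
  };

  struct snapshot_table {
          u32                     nr;
          struct snapshot_t       s[];
  };

  static inline struct snapshot_t *snapshot_t_or_null(struct snapshot_table *t,
                                                      u32 id)
  {
          u32 idx = U32_MAX - id;

          return t && idx < t->nr ? &t->s[idx] : NULL;
  }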
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a deadlock due to using btree_interior_update_worker for non
interior updates - async btree node rewrites were blocking, and then
blocking other interior updates.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Currently, struct time_stats has the optional ability to quantize the
information that it collects. This is /probably/ useful for callers who
want to see quantized information, but it more than doubles the size of
the structure from 224 bytes to 464. For users who don't care about
that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per
counter, split the two into separate pieces.
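The shape of the split (field layout here is illustrative only):

  /* the cheap part every caller gets */
  struct time_stats {
          u64     min_duration;
          u64     max_duration;
          u64     total_duration;
          /* mean/variance accumulators etc. elided -- ~224 bytes total */
  };

  /* callers that want quantiles opt in by embedding the base struct */
  struct time_stats_quantiles {
          struct time_stats       stats;
          struct quantiles {
                  struct quantile_entry {
                          u64     m;
                          u64     step;
                  }               entries[15];    /* the ~240 extra bytes */
          }                       quantiles;
  };

Code handed a bare struct time_stats pointer can get back to the wrapper
with container_of(), provided something records that the quantiles are
actually present.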
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.
Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.
We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache means those writes only cost bus bandwidth.
So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.
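Sketch of the claim step shared by both paths (structure is hypothetical;
the point is that membership on the in-flight list is the lock):

  struct discard_in_flight {
          struct list_head        list;
          u64                     bucket;
  };

  struct dev_discard_state {
          spinlock_t              lock;
          struct list_head        in_flight;
  };

  /* both the fast path and the regular discard worker claim a bucket
   * here before issuing a discard; whoever finds it already on the
   * list backs off */
  static bool discard_claim_bucket(struct dev_discard_state *d,
                                   struct discard_in_flight *i, u64 bucket)
  {
          struct discard_in_flight *j;
          bool claimed = true;

          spin_lock(&d->lock);
          list_for_each_entry(j, &d->in_flight, list) {
                  if (j->bucket == bucket) {
                          claimed = false;
                          break;
                  }
          }
          if (claimed) {
                  i->bucket = bucket;
                  list_add(&i->list, &d->in_flight);
          }
          spin_unlock(&d->lock);

          return claimed;
  }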
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>