linux

Author	SHA1	Message	Date
Kent Overstreet	7f2de6947f	bcachefs: Fix warning in bch2_fs_journal_stop() j->last_empty_seq needs to match j->seq when the journal is empty Reported-by: syzbot+4093905737cf289b6b38@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-08-22 02:07:23 -04:00
Uros Bizjak	68573b936d	bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg() Use try_cmpxchg() family of functions instead of cmpxchg (ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg() implicitly assigns old ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	ef05bdf5d6	bcachefs: Add missing printbuf_tabstops_reset() calls Fixes warnings from bch2_print_allocator_stuck() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:14:18 -04:00
Kent Overstreet	44ec599035	bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs On a new filesystem or device we have to allocate the journal with a bump allocator, because allocation info isn't ready yet - but when hot-adding a device that doesn't have a journal, we don't want to use that path. Reported-by: syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:47:31 -04:00
Kent Overstreet	600b8be5e7	bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:16:41 -04:00
Kent Overstreet	d6b52f6828	bcachefs: Fix null ptr deref in journal_pins_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 12:07:07 -04:00
Kent Overstreet	dbf4d79b7f	bcachefs: Fix early init error path in journal code We shouldn't be running the journal shutdown sequence if we never fully initialized the journal. Reported-by: syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	e98786ea85	bcachefs: bch2_print_allocator_stuck() If we block on the allocator for more than 10 seconds, print out some useful debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	b895c70326	bcachefs: x-macroize journal flags enums Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	e7f63c67fc	bcachefs: plumb data_type into bch2_bucket_alloc_trans() prep work for making the allocator try to keep btree nodes within the existing member info btree allocated bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	b25fd02ab4	bcachefs: fix flag printing in journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	5dd8c60e1e	bcachefs: iter/update/trigger/str_hash flag cleanup Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	c281db0fa5	bcachefs: mark_superblock cleanup Consolidate mark_superblock() and trans_mark_superblock(), like we did with the other trigger paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	497c982f05	bcachefs: New assertion for writing to the journal after shutdown Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	7423330e30	bcachefs: prt_printf() now respects \r\n\t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	6e297a73bc	bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-07 11:02:37 -04:00
Kent Overstreet	72e71bf029	bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf() We're using mutex_lock() inside a wait_event() conditional - prepare_to_wait() has already flipped task state, so potentially blocking ops need annotation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-06 10:14:13 -04:00
Kent Overstreet	2e92d26b25	bcachefs: Fix lost wakeup on journal shutdown We need to check for journal shutdown first in __journal_res_get() - after the journal is shutdown, j->watermark won't be changing anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 23:35:42 -04:00
Kent Overstreet	f1ca1abfb0	bcachefs: pull out time_stats.[ch] prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:30:35 -04:00
Kent Overstreet	5e105fb806	bcachefs: fix bch2_journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	7efa287526	bcachefs: Fix bch2_journal_noflush_seq() Improved journal pipelining broke journal_noflush_seq(); it implicitly assumed only the oldest outstanding journal buf could be in flight, but that's no longer true. Make this more straightforward by just setting buf->must_flush whenever we know a journal buf is going to be a flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	2cce3752ce	bcachefs: split out ignore_blacklisted, ignore_not_dirty prep work for replaying the journal backwards Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	90aa35c4c9	bcachefs: Add journal.blocked to journal_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	06d493fee4	bcachefs: improve bch2_journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	cb6fc943b6	bcachefs: kill kvpmalloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:12 -04:00
Kent Overstreet	916abefd43	bcachefs: better journal pipelining Recently a severe performance regression was discovered, which bisected to `a6548c8b5e` bcachefs: Avoid flushing the journal in the discard path It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays would cause the journal to not be able to write quickly enough and stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non flushes. This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes and an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	38789c2508	bcachefs: closure per journal buf Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	5165400275	bcachefs: bio per journal buf Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	a555bcf4fa	bcachefs: convert journal replay ptrs to darray Eliminates some error paths - no longer have a hardcoded BCH_REPLICAS_MAX limit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	e6fab655e6	bcachefs: Avoid taking journal lock unnecessarily Previously, any time we failed to get a journal reservation we'd retry, with the journal lock held; but this isn't necessary given wait_event()/wake_up() ordering. This avoids performance cliffs when the journal starts to get backed up and lock contention shoots up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	a4e9233911	bcachefs: Avoid setting j->write_work unnecessarily Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	656f05d8bd	bcachefs: Split out journal workqueue We don't want journal write completions to be blocked behind btree transactions - io_complete_wq is used for btree updates after data and metadata writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	612e1110d6	bcachefs: Add gfp flags param to bch2_prt_task_backtrace() Fixes: `e6a2566f7a` ("bcachefs: Better journal tracepoints") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: smatch	2024-01-22 12:37:51 -05:00
Kent Overstreet	e6a2566f7a	bcachefs: Better journal tracepoints Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:09 -05:00
Kent Overstreet	41b84fb489	bcachefs: for_each_member_device_rcu() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	9fea2274f7	bcachefs: for_each_member_device() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	cf904c8d96	bcachefs: bch_err_(fn\|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	09caeabe1a	bcachefs: btree write buffer now slurps keys from journal Previously, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	b05c0e9370	bcachefs: journal->buf_lock Add a new lock for synchronizing between journal IO path and btree write buffer flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	ae0e61175e	bcachefs: Add a tracepoint for journal entry close Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	066a26460b	bcachefs: track_event_change() This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	fa5df9e7d5	bcachefs: Include average write size in sysfs journal_debug Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	a66ff26b0f	bcachefs: Close journal entry if necessary when flushing all pins Since outstanding journal buffers hold a journal pin, when flushing all pins we need to close the current journal entry if necessary so its pin can be released. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-12-10 16:53:46 -05:00
Kent Overstreet	ef0beeb8dd	bcachefs: move journal seq assertion journal_cur_seq() can legitimately be used outside of the journal lock, where this assert can race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-28 22:58:22 -05:00
Kent Overstreet	006ccc3090	bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Kent Overstreet	bbe682c767	bcachefs: Ensure devices are always correctly initialized We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-31 12:18:37 -04:00
Kent Overstreet	4637429e39	bcachefs: bch2_sb_field_get() refactoring Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:16 -04:00
Brian Foster	3e55189b50	bcachefs: fix race between journal entry close and pin set bcachefs freeze testing via fstests generic/390 occasionally reproduces the following BUG from bch2_fs_read_only(): BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty)); This indicates that one or more dirty key cache keys still exist after the attempt to flush and quiesce the fs. The sequence that leads to this problem actually occurs on unfreeze (ro->rw), and looks something like the following: - Task A begins a transaction commit and acquires journal_res for the current seq. This transaction intends to perform key cache insertion. - Task B begins a bch2_journal_flush() via bch2_sync_fs(). This ends up in journal_entry_want_write(), which closes the current journal entry and drops the reference to the pin list created on entry open. The pin put pops the front of the journal via fast reclaim since the reference count has dropped to 0. - Task A attempts to set the journal pin for the associated cached key, but bch2_journal_pin_set() skips the pin insert because the seq of the transaction reservation is behind the front of the pin list fifo. The end result is that the pin associated with the cached key is not added, which prevents a subsequent reclaim from processing the key and thus leaves it dangling at freeze time. The fundamental cause of this problem is that the front of the journal is allowed to pop before a transaction with outstanding reservation on the associated journal seq is able to add a pin. The count for the pin list associated with the seq drops to zero and is prematurely reclaimed as a result. The logical fix for this problem lies in how the journal buffer is managed in similar scenarios where the entry might have been closed before a transaction with outstanding reservations happens to be committed. 
When a journal entry is opened, the current sequence number is bumped, the associated pin list is initialized with a reference count of 1, and the journal buffer reference count is bumped (via journal_state_inc()). When a journal reservation is acquired, the reservation also acquires a reference on the associated buffer. If the journal entry is closed in the meantime, it drops both the pin and buffer references held by the open entry, but the buffer still has references held by outstanding reservation. After the associated transaction commits, the reservation release drops the associated buffer references and the buffer is written out once the reference count has dropped to zero. The fundamental problem here is that the lifecycle of the pin list reference held by an open journal entry is too short to cover the processing of transactions with outstanding reservations. The simplest way to address this is to expand the pin list reference to the lifecycle of the buffer vs. the shorter lifecycle of the open journal entry. This ensures the pin list for a seq with outstanding reservation cannot be popped and reclaimed before all outstanding reservations have been released, even if the associated journal entry has been closed for further reservations. Move the pin put from journal entry close to where final processing of the journal buffer occurs. Create a duplicate helper to cover the case where the caller doesn't already hold the journal lock. This allows generic/390 to pass reliably. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Brian Foster	fc08031bb8	bcachefs: prepare journal buf put to handle pin put bcachefs freeze testing has uncovered some raciness between journal entry open/close and pin list reference count management. The details of the problem are described in a separate patch. In preparation for the associated fix, refactor the journal buffer put path a bit to allow it to eventually handle dropping the pin list reference currently held by an open journal entry. Retain the journal write dispatch helper since the closure code is inlined and we don't want to increase the amount of inline code in the transaction commit path, but rename the function to reflect the purpose of final processing of the journal buffer. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
Brian Foster	92b63f5bf0	bcachefs: refactor pin put helpers We have a couple journal pin put helpers to handle cases where the journal lock is already held or not. Refactor the helpers to lock and reclaim from the highest level and open code the reclaim from the one caller of the internal variant. The latter call will be moved into the journal buf release helper in a later patch. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:14 -04:00
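
Commit 68573b936d above replaces loops of the form `cmpxchg(ptr, old, new) == old` with `try_cmpxchg()`. The following is a minimal userspace sketch of that difference, assuming C11 `<stdatomic.h>` as a stand-in for the kernel primitives. It is not the bcachefs code: `cmpxchg_emul()`, `set_bits_old()` and `set_bits_new()` are hypothetical names made up for illustration. `atomic_compare_exchange_strong()` behaves like `try_cmpxchg()` in the one respect the commit message relies on: when it fails, it writes the value it observed back into the caller's "old" variable.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a value-returning cmpxchg(): returns the value
 * that was in *p before the attempt, whether or not the swap happened. */
static inline unsigned cmpxchg_emul(_Atomic unsigned *p, unsigned old,
                                    unsigned desired)
{
    /* On failure, 'old' is overwritten with the observed value. */
    atomic_compare_exchange_strong(p, &old, desired);
    return old;
}

/* Old pattern: compare the returned value against 'old' by hand. */
static inline unsigned set_bits_old(_Atomic unsigned *p, unsigned bits)
{
    unsigned old, v = atomic_load(p);

    do {
        old = v;
        v = cmpxchg_emul(p, old, old | bits);
    } while (v != old);     /* explicit compare after every attempt */

    return old;             /* value before the bits were set */
}

/* New pattern: the primitive reports success directly and refreshes 'old'
 * on failure, so the loop needs no extra compare and no re-read. */
static inline unsigned set_bits_new(_Atomic unsigned *p, unsigned bits)
{
    unsigned old = atomic_load(p);

    while (!atomic_compare_exchange_strong(p, &old, old | bits))
        ;                   /* 'old' already holds the fresh value */

    return old;             /* value before the bits were set */
}
```

Both helpers return the value seen before the update and leave the same result in `*p`. This matches the commit's "no functional change intended" note, with the second form only saving the redundant compare.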

186 Commits