master
387 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Andrey Skvortsov
|
486fd58af7 |
zram: don't free statically defined names
When CONFIG_ZRAM_MULTI_COMP isn't set ZRAM_SECONDARY_COMP can hold
default_compressor, because it's the same offset as ZRAM_PRIMARY_COMP, so
we need to make sure that we don't attempt to kfree() the statically
defined compressor name.
This is detected by KASAN.
==================================================================
Call trace:
kfree+0x60/0x3a0
zram_destroy_comps+0x98/0x198 [zram]
zram_reset_device+0x22c/0x4a8 [zram]
reset_store+0x1bc/0x2d8 [zram]
dev_attr_store+0x44/0x80
sysfs_kf_write+0xfc/0x188
kernfs_fop_write_iter+0x28c/0x428
vfs_write+0x4dc/0x9b8
ksys_write+0x100/0x1f8
__arm64_sys_write+0x74/0xb8
invoke_syscall+0xd8/0x260
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
==================================================================
Link: https://lkml.kernel.org/r/20240923164843.1117010-1-andrej.skvortzov@gmail.com
Fixes:
|
||
Linus Torvalds
|
617a814f14 |
ALong with the usual shower of singleton patches, notable patch series in
this pull request are: "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. "mm: split underused THPs" from Yu Zhao. Improve THP=always policy - this was overprovisioning THPs in sparsely accessed memory areas. "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo= =s0T+ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ... |
||
Sergey Senozhatsky
|
684826f827 |
zram: free secondary algorithms names
We need to kfree() secondary algorithms names when reset zram device that
had multi-streams, otherwise we leak memory.
[senozhatsky@chromium.org: kfree(NULL) is legal]
Link: https://lkml.kernel.org/r/20240917013021.868769-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20240911025600.3681789-1-senozhatsky@chromium.org
Fixes:
|
||
Sergey Senozhatsky
|
e899007a5e |
zram: support priority parameter in recompression
recompress device attribute supports alg=NAME parameter so that we can specify only one particular algorithm we want to perform recompression with. However, with algo params we now can have several exactly same secondary algorithms but each with its own params tuning (e.g. priority 1 configured to use more aggressive level, and priority 2 configured to use a pre-trained dictionary). Support priority=NUM parameter so that we can correctly determine which secondary algorithm we want to use. Link: https://lkml.kernel.org/r/20240902105656.1383858-25-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
6a559ecd6e |
zram: add dictionary support to zstd backend
This adds support for pre-trained zstd dictionaries [1] Dictionary is setup in params once (per-comp) and loaded to Cctx and Dctx by reference, so we don't allocate extra memory. TEST ==== *** zstd /sys/block/zram0/mm_stat 1750654976 504565092 514203648 0 514203648 1 0 34204 34204 *** zstd dict=/etc/zstd-dict-amd64 /sys/block/zram0/mm_stat 1750638592 465851259 475373568 0 475373568 1 0 34185 34185 *** zstd level=8 dict=/etc/zstd-dict-amd64 /sys/block/zram0/mm_stat 1750642688 430765171 439955456 0 439955456 1 0 34185 34185 [1] https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md#dictionary-builder Link: https://lkml.kernel.org/r/20240902105656.1383858-23-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
1e673c8cf7 |
zram: add dictionary support to lz4hc
Support pre-trained dictionary param. Just like lz4, lz4hc doesn't mandate specific format of the dictionary and zstd --train can be used to train a dictionary for lz4, according to [1]. TEST ==== *** lz4hc /sys/block/zram0/mm_stat 1750638592 608954620 621031424 0 621031424 1 0 34288 34288 *** lz4hc dict=/etc/lz4-dict-amd64 /sys/block/zram0/mm_stat 1750671360 505068582 514994176 0 514994176 1 0 34278 34278 [1] https://github.com/lz4/lz4/issues/557 Link: https://lkml.kernel.org/r/20240902105656.1383858-22-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
fb4f644ee8 |
zram: add dictionary support to lz4
Support pre-trained dictionary param. lz4 doesn't mandate specific format of the dictionary and even zstd --train can be used to train a dictionary for lz4, according to [1]. TEST ==== *** lz4 /sys/block/zram0/mm_stat 1750654976 664188565 676864000 0 676864000 1 0 34288 34288 *** lz4 dict=/etc/lz4-dict-amd64 /sys/block/zram0/mm_stat 1750638592 619891141 632053760 0 632053760 1 0 34278 34278 *** lz4 level=5 dict=/etc/lz4-dict-amd64 /sys/block/zram0/mm_stat 1750638592 727174243 740810752 0 740810752 1 0 34437 34437 [1] https://github.com/lz4/lz4/issues/557 Link: https://lkml.kernel.org/r/20240902105656.1383858-21-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
b8f03cb703 |
zram: move immutable comp params away from per-CPU context
Immutable params never change once comp has been allocated and setup, so we don't need to store multiple copies of them in each per-CPU backend context. Move those to per-comp zcomp_params and pass it to backends callbacks for requests execution. Basically, this means parameters sharing between different contexts. Also introduce two new backends callbacks: setup_params() and release_params(). First, we need to validate params in a driver-specific way; second, driver may want to allocate its specific representation of the params which is needed to execute requests. Link: https://lkml.kernel.org/r/20240902105656.1383858-20-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
6a81bdfeb3 |
zram: introduce zcomp_ctx structure
Keep run-time driver data (scratch buffers, etc.) in zcomp_ctx structure. This structure is allocated per-CPU because drivers (backends) need to modify its content during requests execution. We will split mutable and immutable driver data, this is a preparation path. Link: https://lkml.kernel.org/r/20240902105656.1383858-19-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
52c7b4e2ba |
zram: introduce zcomp_req structure
Encapsulate compression/decompression data in zcomp_req structure. Link: https://lkml.kernel.org/r/20240902105656.1383858-18-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
dea77d7aea |
zram: add support for dict comp config
Handle dict=path algorithm param so that we can read a pre-trained compression algorithm dictionary which we then pass to the backend configuration. Link: https://lkml.kernel.org/r/20240902105656.1383858-17-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
4eac932103 |
zram: introduce algorithm_params device attribute
This attribute is used to setup compression algorithms' parameters, so we can tweak algorithms' characteristics. At this point only 'level' is supported (to be extended in the future). Each call sets up parameters for one particular algorithm, which should be specified either by the algorithm's priority or algo name. This is expected to be called after corresponding algorithm is selected via comp_algorithm or recomp_algorithm. echo "priority=0 level=1" > /sys/block/zram0/algorithm_params or echo "algo=zstd level=1" > /sys/block/zram0/algorithm_params Link: https://lkml.kernel.org/r/20240902105656.1383858-16-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
eb826a0190 |
zram: recalculate zstd compression params once
zstd compression params depends on level, but are constant for a given instance of zstd compression backend. Calculate once (during ctx creation). Link: https://lkml.kernel.org/r/20240902105656.1383858-15-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
f2bac7ad18 |
zram: introduce zcomp_params structure
We will store a per-algorithm parameters there (compression level, dictionary, dictionary size, etc.). Link: https://lkml.kernel.org/r/20240902105656.1383858-14-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
1a78390d87 |
zram: check that backends array has at least one backend
Make sure that backends array has anything apart from the sentinel NULL value. We also select LZO_BACKEND if none backends were selected. Link: https://lkml.kernel.org/r/20240902105656.1383858-13-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
1d3100cf14 |
zram: add 842 compression backend support
Add s/w 842 compression support. Link: https://lkml.kernel.org/r/20240902105656.1383858-12-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
84112e314f |
zram: add zlib compression backend support
Add s/w zlib (inflate/deflate) compression. Link: https://lkml.kernel.org/r/20240902105656.1383858-11-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
dbf2763cec |
zram: pass estimated src size hint to zstd
zram works with PAGE_SIZE buffers, so we always know exact size of the source buffer and hence can pass estimated_src_size to zstd_get_params(). This hint on x86_64, for example, reduces the size of the work memory buffer from 1303520 bytes down to 90080 bytes. Given that compression streams are per-CPU that's quite some memory saving. Link: https://lkml.kernel.org/r/20240902105656.1383858-10-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
73e7d81abb |
zram: add zstd compression backend support
Add s/w zstd compression. Link: https://lkml.kernel.org/r/20240902105656.1383858-9-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
c60a4ef544 |
zram: add lz4hc compression backend support
Add s/w lz4hc compression support. Link: https://lkml.kernel.org/r/20240902105656.1383858-8-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
22d651c3b3 |
zram: add lz4 compression backend support
Add s/w lz4 compression support. Link: https://lkml.kernel.org/r/20240902105656.1383858-7-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
2152247c55 |
zram: add lzo and lzorle compression backends support
Add s/w lzo/lzorle compression support. Link: https://lkml.kernel.org/r/20240902105656.1383858-6-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
917a59e81c |
zram: introduce custom comp backends API
Moving to custom backends implementation gives us ability to have our own minimalistic and extendable API, and algorithms tunings becomes possible. The list of compression backends is empty at this point, we will add backends in the followup patches. Link: https://lkml.kernel.org/r/20240902105656.1383858-5-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sebastian Andrzej Siewior
|
68d20eb60e |
zram: Shrink zram_table_entry::flags.
The zram_table_entry::flags member is of type long and uses 8 bytes on a 64bit architecture. With a PAGE_SIZE of 256KiB we have PAGE_SHIFT of 18 which in turn leads to __NR_ZRAM_PAGEFLAGS = 27. This still fits in an ordinary integer. By reducing the size of `flags' to four bytes, the size of the struct goes back to 16 bytes. The padding between the lock and ac_time (if enabled) is also gone. Make zram_table_entry::flags an unsigned int and update the build test to reflect the change. Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20240906141520.730009-4-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Sebastian Andrzej Siewior
|
6086aeb49e |
zram: Remove ZRAM_LOCK
The ZRAM_LOCK was used for locking and after the addition of spinlock_t the bit set and cleared but there no reader of it. Remove the ZRAM_LOCK bit. Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20240906141520.730009-3-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Mike Galbraith
|
9518e5bfaa |
zram: Replace bit spinlocks with a spinlock_t.
The bit spinlock disables preemption. The spinlock_t lock becomes a sleeping lock on PREEMPT_RT and it can not be acquired in this context. In this locked section, zs_free() acquires a zs_pool::lock, and there is access to zram::wb_limit_lock. Add a spinlock_t for locking. Keep the set/ clear ZRAM_LOCK bit after the lock has been acquired/ dropped. The size of struct zram_table_entry increases by 4 bytes due to lock and additional 4 bytes padding with CONFIG_ZRAM_TRACK_ENTRY_ACTIME enabled. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20240906141520.730009-2-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Matthew Wilcox (Oracle)
|
04cb7502a5 |
zsmalloc: use all available 24 bits of page_type
Now that we have an extra 8 bits, we don't need to limit ourselves to supporting a 64KiB page size. I'm sure both Hexagon users are grateful, but it does reduce complexity a little. We can also remove reset_first_obj_offset() as calling __ClearPageZsmalloc() will now reset all 32 bits of page_type. Link: https://lkml.kernel.org/r/20240821173914.2270383-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Linus Torvalds
|
fbc90c042c |
- 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff). Yu has a fix which I'll send along later via the hotfixes branch. - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Is anyone reading this stuff? If so, email me! - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8= =z0Lf -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ... |
||
David Hildenbrand
|
43d746dc49 |
mm/zsmalloc: use a proper page type
Let's clean it up: use a proper page type and store our data (offset into a page) in the lower 16 bit as documented. We won't be able to support 256 KiB base pages, which is acceptable. Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC. Based on this, we should do a proper "struct zsdesc" conversion, as proposed in [1]. This removes the last _mapcount/page_type offender. [1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/ Link: https://lkml.kernel.org/r/20240529111904.2069608-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org> [zram/zsmalloc workloads] Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Christoph Hellwig
|
aadd5c59c9 |
block: move the synchronous flag to queue_limits
Move the synchronous flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Christoph Hellwig
|
1a02f3a73f |
block: move the stable_writes flag to queue_limits
Move the stable_writes flag into the queue_limits feature field so that it can be set atomically with the queue frozen. The flag is now inherited by blk_stack_limits, which greatly simplifies the code in dm, and fixed md which previously did not pass on the flag set on lower devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Christoph Hellwig
|
bd4a633b6f |
block: move the nonrot flag to queue_limits
Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Linus Torvalds
|
5ad8b6ad9a |
getting rid of bogus set_blocksize() uses, switching it
to struct file * and verifying that caller has device opened exclusively. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwkfQAKCRBZ7Krx/gZQ 62C3AQDW5vuXNx2+KDPma5YStjFpPLC0xtSyAS5D3YANjtyRFgD/TOcCarq7rvBt KubxHVFsfW+eu6ASeaoMRB83w5OIzwk= =Liix -----END PGP SIGNATURE----- Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs blocksize updates from Al Viro: "This gets rid of bogus set_blocksize() uses, switches it over to be based on a 'struct file *' and verifies that the caller has the device opened exclusively" * tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make set_blocksize() fail unless block device is opened exclusive set_blocksize(): switch to passing struct file * btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens swsusp: don't bother with setting block size zram: don't bother with reopening - just use O_EXCL for open swapon(2): open swap with O_EXCL swapon(2)/swapoff(2): don't bother with block size pktcdvd: sort set_blocksize() calls out bcache_register(): don't bother with set_blocksize() |
||
Al Viro
|
ebb0173df2 |
zram: don't bother with reopening - just use O_EXCL for open
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Sergey Senozhatsky
|
34efe1c3b6 |
zram: add max_pages param to recompression
Introduce "max_pages" param to recompress device attribute which sets an upper limit on the number of entries (pages) zram attempts to recompress (in this particular recompression call). S/W recompression can be quite expensive so limiting the number of pages recompress touches can be quite helpful. Link: https://lkml.kernel.org/r/20240329094050.2815699-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Brian Geffon <bgeffon@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Linus Torvalds
|
902861e34c |
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx TMNhHfyiHYDTx/GAV9NXW84tasJSDgA= =TG55 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ... |
||
Linus Torvalds
|
1ddeeb2a05 |
for-6.9/block-20240310
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmXuFO4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpq33D/9hyNyBce2A9iyo026eK8EqLDoed6BPzuvB kLKj5tsGvX4YlfuswvP86M5dgibTASXclnfUK394TijW/JPOfJ3mNhi9gMnHzRoK ZaR1di0Lum56dY1FkpMmWiGmE4fB79PAtXYKtajOkuoIcNzylncEAAACUY4/Ouhg Cm+LMg2prcc+m9g8rKDNQ51pUFg4U21KAUTl35XLMUAaQk1ahW3EDEVYhweC/zwE V/5hJsv8UY72+oQGY2Dc/YgQk/Zj4ZDh7C+oHR9XeB/ro99kr3/Vopagu0gBMLZi Rq6qqz6PVMhVcuz8uN2rsTQKXmXhsBn9/adsl4AKtdxcW5D5moWb5BLq1P0WQylc nzMxa1d6cVcTKZpaUQQv3Rj6ZMrLuDwP277UYHfn5x1oPWYRZCG7FtHuOo1gNcpG DrSNwVG6BSDcbABqI+MIS2oD1JoUMyevjwT7e2hOXukZhc6GLO5F3ODWE5j3KnCR S/aGSAmcdR4fTcgavULqWdQVt7SYl4f1IxT8KrUirJGVhc2LgahaWj69ooklVHoU fPDFRiruwJ5YkH4RWCSDm9mi4kAz6eUf+f4yE06wZOFOb2fT8/1ZK2Snpz2KeXuZ INO0RejtFzT8L0OUlu7dBmF20y6rgAYt87lR8mIt71yuuATIrVhzlX1VdsvhdrAo VLHGV1Ncgw== =WlVL -----END PGP SIGNATURE----- Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - MD pull requests via Song: - Cleanup redundant checks (Yu Kuai) - Remove deprecated headers (Marc Zyngier, Song Liu) - Concurrency fixes (Li Lingfeng) - Memory leak fix (Li Nan) - Refactor raid1 read_balance (Yu Kuai, Paul Luse) - Clean up and fix for md_ioctl (Li Nan) - Other small fixes (Gui-Dong Han, Heming Zhao) - MD atomic limits (Christoph) - NVMe pull request via Keith: - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith) - Improvement and cleanup for cached request handling (Christoph) - Moving towards atomic queue limits. Core changes and driver bits so far (Christoph) - Fix UAF issues in aoeblk (Chun-Yi) - Zoned fix and cleanups (Damien) - s390 dasd cleanups and fixes (Jan, Miroslav) - Block issue timestamp caching (me) - noio scope guarding for zoned IO (Johannes) - block/nvme PI improvements (Kanchan) - Ability to terminate long running discard loop (Keith) - bdev revalidation fix (Li) - Get rid of old nr_queues hack for kdump kernels (Ming) - Support for async deletion of ublk (Ming) - Improve IRQ bio recycling (Pavel) - Factor in CPU capacity for remote vs local completion (Qais) - Add shared_tags configfs entry for null_blk (Shin'ichiro - Fix for a regression in page refcounts introduced by the folio unification (Tony) - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid, Ricardo, Roman, Tang, Uwe) * tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits) block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC block/swim: Convert to platform remove callback returning void cdrom: gdrom: Convert to platform remove callback returning void block: remove disk_stack_limits md: remove mddev->queue md: don't initialize queue limits md/raid10: use the atomic queue limit update APIs md/raid5: use the atomic queue limit update APIs md/raid1: use the atomic queue limit update APIs md/raid0: use the atomic queue limit update APIs md: add queue limit helpers md: add a mddev_is_dm helper md: add a mddev_add_trace_msg helper md: add a mddev_trace_remap helper bcache: move calculation of stripe_size and io_opt into bcache_device_init virtio_blk: Do not use disk_set_max_open/active_zones() aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts block: move capacity validation to blkpg_do_ioctl() block: prevent division by zero in blk_rq_stat_sum() drbd: atomically update queue limits in drbd_reconsider_queue_parameters ... |
||
Kefeng Wang
|
9602e0ce98 |
zram: zcomp: remove zcomp_set_max_streams() declaration
The zcomp_set_max_streams() is removed from commit
|
||
Christian Brauner
|
be914f8fd2
|
zram: port block device access to file
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-12-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Barry Song
|
45866e0e21 |
zram: do not allocate physically contiguous strm buffers
Currently zram allocates 2 physically contiguous pages per-CPU's compression stream (we may have up to 4 streams per-CPU). Since those buffers are per-CPU we allocate them from CPU hotplug path, which may have higher risks of failed allocations on devices with fragmented memory. Switch to virtually contiguous allocations - crypto comp does not seem impose requirements on compression working buffers to be physically contiguous. Link: https://lkml.kernel.org/r/20240213065400.6561-1-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Mark-PK Tsai
|
80ba4caf8b |
zram: use copy_page for full page copy
Some architectures, such as arm, have implemented optimized copy_page for full page copying. Replace the full page memcpy with copy_page to take advantage of the optimization. Link: https://lkml.kernel.org/r/20231007070554.8657-1-mark-pk.tsai@mediatek.com Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: YJ Chiang <yj.chiang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Christoph Hellwig
|
4190b3f291 |
zram: pass queue_limits to blk_mq_alloc_disk
Pass the queue limits directly to blk_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Christoph Hellwig
|
74fa8f9c55 |
block: pass a queue_limits argument to blk_alloc_disk
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Linus Torvalds
|
01d550f0fc |
for-6.8/block-2024-01-08
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmWcIOIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpn6hD/9oO7U75PuxUwYYHZ9Uzxpw6gQ0LEmeyJmE NQYCkfYHVq3IsgOdF7elI9v3qtr6v8V8CdB7cByrnn3DgwsMuiTKZZ0dK7vH37PO DX+/xn349e8oH7RdRo7f3m95g1YbHfpfnj0Rc4mjTDV72Jr/HlLTVgGTQg8DEnCR wBIFmeuBHHgeeLh87gsWLAP7ReReiy9V1uqpDFsko2/4BxRAM/8eedkwcAxD8aEy rd+dT/SBQj2cOdQMUeExT3gWjwzHh6ZHx3f1WCLK5fdck6BogH2hBUeri6F/H98L HoaXjBZYBTH68hB/mnO5I4g1ZlrVM74Vp7JPa3e1SFFtyEi6lsyrk2J3GoNh0E7r pXqH5kAcaJwBsBrbRGuvEyGbn9RLTaN5Gvseud0VE4oMruyodTniQaHXuIGackgz sMavMho4486EUWPaF7gIBdLNK1hO13w+IDZ4+3oBxhudMqdgZbk4iYpOCqQ7QY5G 2vkzAE/sZ+aVNXeaIQOI8dE5clBy8gJ+6+t8dm3DY1r1xdbcnU40iZ8/fri3h69r vHs9bpQnVWZF0gEyEflY1pkcAPpIkvMmWCR7Ehy5YCkIfa+qfSL05o3dicpWovLP N+gCtpkhTK2AvmUWsUMypMLRvoSOImyCIiobrr3qNBaUdgRP8xKfUa72RuRp8cGl Vrj5oAiE3w== =YAfp -----END PGP SIGNATURE----- Merge tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round this time around. This contains: - NVMe updates via Keith: - nvme fabrics spec updates (Guixin, Max) - nvme target udpates (Guixin, Evan) - nvme attribute refactoring (Daniel) - nvme-fc numa fix (Keith) - MD updates via Song: - Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai) - Fix raid5 hang issue (Junxiao Bi) - Add Yu Kuai as Reviewer of the md subsystem - Remove deprecated flavors (Song Liu) - raid1 read error check support (Li Nan) - Better handle events off-by-1 case (Alex Lyakas) - Efficiency improvements for passthrough (Kundan) - Support for mapping integrity data directly (Keith) - Zoned write fix (Damien) - rnbd fixes (Kees, Santosh, Supriti) - Default to a sane discard size granularity (Christoph) - Make the default max transfer size naming less confusing (Christoph) - Remove support for deprecated host aware zoned model (Christoph) - Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel, Bart, Christoph)" * tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits) block: Treat sequential write preferred zone type as invalid block: remove disk_clear_zoned sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics drivers/block/xen-blkback/common.h: Fix spelling typo in comment blk-cgroup: fix rcu lockdep warning in blkg_lookup() blk-cgroup: don't use removal safe list iterators block: floor the discard granularity to the physical block size mtd_blkdevs: use the default discard granularity bcache: use the default discard granularity zram: use the default discard granularity null_blk: use the default discard granularity nbd: use the default discard granularity ubd: use the default discard granularity block: default the discard granularity to sector size bcache: discard_granularity should not be smaller than a sector block: remove two comments in bio_split_discard block: rename and document BLK_DEF_MAX_SECTORS loop: don't abuse BLK_DEF_MAX_SECTORS aoe: don't abuse BLK_DEF_MAX_SECTORS null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS ... |
||
Christoph Hellwig
|
3753039def |
zram: use the default discard granularity
The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Sergey Senozhatsky
|
73829b7134 |
zram: use kmap_local_page()
Use kmap_local_page() instead of kmap_atomic() which has been deprecated. Link: https://lkml.kernel.org/r/20231128083845.848008-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
2e16898d0d |
zram: tweak writeback config help
Writeback is for incompressible and idle zram pages. Link: https://lkml.kernel.org/r/20231115024223.4133148-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sergey Senozhatsky
|
a7a0350583 |
zram: split memory-tracking and ac-time tracking
ZRAM_MEMORY_TRACKING enables two features: - per-entry ac-time tracking - debugfs interface The latter one is the reason why memory-tracking depends on DEBUG_FS, while the former one is used far beyond debugging these days. Namely ac-time is used for fine grained writeback of idle entries (pages). Move ac-time tracking under its own config option so that it can be enabled (along with writeback) on systems without DEBUG_FS. [senozhatsky@chromium.org: ifdef fixup, per Dmytro] Link: https://lkml.kernel.org/r/20231117013543.540280-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20231115024223.4133148-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Jan Kara
|
eed993a091
|
zram: Convert to use bdev_open_by_dev()
Convert zram to use bdev_open_by_dev() and pass the handle around. CC: Minchan Kim <minchan@kernel.org> CC: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-8-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Christoph Hellwig
|
95848dcb9d |
zram: take device and not only bvec offset into account
Commit |