1

ALong with the usual shower of singleton patches, notable patch series in

this pull request are:
 
 "Align kvrealloc() with krealloc()" from Danilo Krummrich.  Adds
 consistency to the APIs and behaviour of these two core allocation
 functions.  This also simplifies/enables Rustification.
 
 "Some cleanups for shmem" from Baolin Wang.  No functional changes - mode
 code reuse, better function naming, logic simplifications.
 
 "mm: some small page fault cleanups" from Josef Bacik.  No functional
 changes - code cleanups only.
 
 "Various memory tiering fixes" from Zi Yan.  A small fix and a little
 cleanup.
 
 "mm/swap: remove boilerplate" from Yu Zhao.  Code cleanups and
 simplifications and .text shrinkage.
 
 "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt.  This
 is a feature, it adds new feilds to /proc/vmstat such as
 
     $ grep kstack /proc/vmstat
     kstack_1k 3
     kstack_2k 188
     kstack_4k 11391
     kstack_8k 243
     kstack_16k 0
 
 which tells us that 11391 processes used 4k of stack while none at all
 used 16k.  Useful for some system tuning things, but partivularly useful
 for "the dynamic kernel stack project".
 
 "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
 Teaches kmemleak to detect leaksage of percpu memory.
 
 "mm: memcg: page counters optimizations" from Roman Gushchin.  "3
 independent small optimizations of page counters".
 
 "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
 Hildenbrand.  Improves PTE/PMD splitlock detection, makes powerpc/8xx work
 correctly by design rather than by accident.
 
 "mm: remove arch_make_page_accessible()" from David Hildenbrand.  Some
 folio conversions which make arch_make_page_accessible() unneeded.
 
 "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel.
 Cleans up and fixes our handling of the resetting of the cgroup/process
 peak-memory-use detector.
 
 "Make core VMA operations internal and testable" from Lorenzo Stoakes.
 Rationalizaion and encapsulation of the VMA manipulation APIs.  With a
 view to better enable testing of the VMA functions, even from a
 userspace-only harness.
 
 "mm: zswap: fixes for global shrinker" from Takero Funaki.  Fix issues in
 the zswap global shrinker, resulting in improved performance.
 
 "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao.  Fill in
 some missing info in /proc/zoneinfo.
 
 "mm: replace follow_page() by folio_walk" from David Hildenbrand.  Code
 cleanups and rationalizations (conversion to folio_walk()) resulting in
 the removal of follow_page().
 
 "improving dynamic zswap shrinker protection scheme" from Nhat Pham.  Some
 tuning to improve zswap's dynamic shrinker.  Significant reductions in
 swapin and improvements in performance are shown.
 
 "mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
 Improvements to the new unaccepted memory feature,
 
 "mm/mprotect: Fix dax puds" from Peter Xu.  Implements mprotect on DAX
 PUDs.  This was missing, although nobody seems to have notied yet.
 
 "Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
 Cleanups and modest performance improvements for the maple tree library
 code.
 
 "memcg: further decouple v1 code from v2" from Shakeel Butt.  Move more
 cgroup v1 remnants away from the v2 memcg code.
 
 "memcg: initiate deprecation of v1 features" from Shakeel Butt.  Adds
 various warnings telling users that memcg v1 features are deprecated.
 
 "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
 Greatly improves the success rate of the mTHP swap allocation.
 
 "mm: introduce numa_memblks" from Mike Rapoport.  Moves various disparate
 per-arch implementations of numa_memblk code into generic code.
 
 "mm: batch free swaps for zap_pte_range()" from Barry Song.  Greatly
 improves the performance of munmap() of swap-filled ptes.
 
 "support large folio swap-out and swap-in for shmem" from Baolin Wang.
 With this series we no longer split shmem large folios into simgle-page
 folios when swapping out shmem.
 
 "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao.  Nice performance
 improvements and code reductions for gigantic folios.
 
 "support shmem mTHP collapse" from Baolin Wang.  Adds support for
 khugepaged's collapsing of shmem mTHP folios.
 
 "mm: Optimize mseal checks" from Pedro Falcato.  Fixes an mprotect()
 performance regression due to the addition of mseal().
 
 "Increase the number of bits available in page_type" from Matthew Wilcox.
 Increases the number of bits available in page_type!
 
 "Simplify the page flags a little" from Matthew Wilcox.  Many legacy page
 flags are now folio flags, so the page-based flags and their
 accessors/mutators can be removed.
 
 "mm: store zero pages to be swapped out in a bitmap" from Usama Arif.  An
 optimization which permits us to avoid writing/reading zero-filled zswap
 pages to backing store.
 
 "Avoid MAP_FIXED gap exposure" from Liam Howlett.  Fixes a race window
 which occurs when a MAP_FIXED operqtion is occurring during an unrelated
 vma tree walk.
 
 "mm: remove vma_merge()" from Lorenzo Stoakes.  Major rotorooting of the
 vma_merge() functionality, making ot cleaner, more testable and better
 tested.
 
 "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.  Minor
 fixups of DAMON selftests and kunit tests.
 
 "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.  Code
 cleanups and folio conversions.
 
 "Shmem mTHP controls and stats improvements" from Ryan Roberts.  Cleanups
 for shmem controls and stats.
 
 "mm: count the number of anonymous THPs per size" from Barry Song.  Expose
 additional anon THP stats to userspace for improved tuning.
 
 "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
 conversions and removal of now-unused page-based APIs.
 
 "replace per-quota region priorities histogram buffer with per-context
 one" from SeongJae Park.  DAMON histogram rationalization.
 
 "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
 Park.  DAMON documentation updates.
 
 "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
 related doc and warn" from Jason Wang: fixes usage of page allocator
 __GFP_NOFAIL and GFP_ATOMIC flags.
 
 "mm: split underused THPs" from Yu Zhao.  Improve THP=always policy - this
 was overprovisioning THPs in sparsely accessed memory areas.
 
 "zram: introduce custom comp backends API" frm Sergey Senozhatsky.  Add
 support for zram run-time compression algorithm tuning.
 
 "mm: Care about shadow stack guard gap when getting an unmapped area" from
 Mark Brown.  Fix up the various arch_get_unmapped_area() implementations
 to better respect guard areas.
 
 "Improve mem_cgroup_iter()" from Kinsey Ho.  Improve the reliability of
 mem_cgroup_iter() and various code cleanups.
 
 "mm: Support huge pfnmaps" from Peter Xu.  Extends the usage of huge
 pfnmap support.
 
 "resource: Fix region_intersects() vs add_memory_driver_managed()" from
 Huang Ying.  Fix a bug in region_intersects() for systems with CXL memory.
 
 "mm: hwpoison: two more poison recovery" from Kefeng Wang.  Teaches a
 couple more code paths to correctly recover from the encountering of
 poisoned memry.
 
 "mm: enable large folios swap-in support" from Barry Song.  Support the
 swapin of mTHP memory into appropriately-sized folios, rather than into
 single-page folios.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
 jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
 =s0T+
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     mode code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new feilds to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     partivularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalizaion and encapsulation of the VMA manipulation
     APIs. With a view to better enable testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature,

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have notied
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     simgle-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operqtion is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making ot cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memry.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
This commit is contained in:
Linus Torvalds 2024-09-21 07:29:05 -07:00
commit 617a814f14
379 changed files with 16279 additions and 8499 deletions

View File

@ -151,3 +151,10 @@ Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description: Description:
The recompress file is write-only and triggers re-compression The recompress file is write-only and triggers re-compression
with secondary compression algorithms. with secondary compression algorithms.
What: /sys/block/zram<id>/algorithm_params
Date: August 2024
Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The algorithm_params file is write-only and is used to setup
compression algorithm parameters.

View File

@ -102,17 +102,41 @@ Examples::
#select lzo compression algorithm #select lzo compression algorithm
echo lzo > /sys/block/zram0/comp_algorithm echo lzo > /sys/block/zram0/comp_algorithm
For the time being, the `comp_algorithm` content does not necessarily For the time being, the `comp_algorithm` content shows only compression
show every compression algorithm supported by the kernel. We keep this algorithms that are supported by zram.
list primarily to simplify device configuration and one can configure
a new device with a compression algorithm that is not listed in
`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
and, if some of the algorithms were built as modules, it's impossible
to list all of them using, for instance, /proc/crypto or any other
method. This, however, has an advantage of permitting the usage of
custom crypto compression modules (implementing S/W or H/W compression).
4) Set Disksize 4) Set compression algorithm parameters: Optional
=================================================
Compression algorithms may support specific parameters which can be
tweaked for particular dataset. ZRAM has an `algorithm_params` device
attribute which provides a per-algorithm params configuration.
For example, several compression algorithms support `level` parameter.
In addition, certain compression algorithms support pre-trained dictionaries,
which significantly change algorithms' characteristics. In order to configure
compression algorithm to use external pre-trained dictionary, pass full
path to the `dict` along with other parameters::
#pass path to pre-trained zstd dictionary
echo "algo=zstd dict=/etc/dictioary" > /sys/block/zram0/algorithm_params
#same, but using algorithm priority
echo "priority=1 dict=/etc/dictioary" > \
/sys/block/zram0/algorithm_params
#pass path to pre-trained zstd dictionary and compression level
echo "algo=zstd level=8 dict=/etc/dictioary" > \
/sys/block/zram0/algorithm_params
Parameters are algorithm specific: not all algorithms support pre-trained
dictionaries, not all algorithms support `level`. Furthermore, for certain
algorithms `level` controls the compression level (the higher the value the
better the compression ratio, it even can take negatives values for some
algorithms), for other algorithms `level` is acceleration level (the higher
the value the lower the compression ratio).
5) Set Disksize
=============== ===============
Set disk size by writing the value to sysfs node 'disksize'. Set disk size by writing the value to sysfs node 'disksize'.
@ -132,7 +156,7 @@ There is little point creating a zram of greater than twice the size of memory
since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful. size of the disk when not in use so a huge zram is wasteful.
5) Set memory limit: Optional 6) Set memory limit: Optional
============================= =============================
Set memory limit by writing the value to sysfs node 'mem_limit'. Set memory limit by writing the value to sysfs node 'mem_limit'.
@ -151,7 +175,7 @@ Examples::
# To disable memory limit # To disable memory limit
echo 0 > /sys/block/zram0/mem_limit echo 0 > /sys/block/zram0/mem_limit
6) Activate 7) Activate
=========== ===========
:: ::
@ -162,7 +186,7 @@ Examples::
mkfs.ext4 /dev/zram1 mkfs.ext4 /dev/zram1
mount /dev/zram1 /tmp mount /dev/zram1 /tmp
7) Add/remove zram devices 8) Add/remove zram devices
========================== ==========================
zram provides a control interface, which enables dynamic (on-demand) device zram provides a control interface, which enables dynamic (on-demand) device
@ -182,7 +206,7 @@ execute::
echo X > /sys/class/zram-control/hot_remove echo X > /sys/class/zram-control/hot_remove
8) Stats 9) Stats
======== ========
Per-device statistics are exported as various nodes under /sys/block/zram<id>/ Per-device statistics are exported as various nodes under /sys/block/zram<id>/
@ -205,6 +229,7 @@ writeback_limit_enable RW show and set writeback_limit feature
max_comp_streams RW the number of possible concurrent compress max_comp_streams RW the number of possible concurrent compress
operations operations
comp_algorithm RW show and change the compression algorithm comp_algorithm RW show and change the compression algorithm
algorithm_params WO setup compression algorithm parameters
compact WO trigger memory compaction compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out backing_dev RW set up backend storage for zram to write out
@ -283,15 +308,15 @@ a single line of text and contains the following stats separated by whitespace:
Unit: 4K bytes Unit: 4K bytes
============== ============================================================= ============== =============================================================
9) Deactivate 10) Deactivate
============= ==============
:: ::
swapoff /dev/zram0 swapoff /dev/zram0
umount /dev/zram1 umount /dev/zram1
10) Reset 11) Reset
========= =========
Write any positive value to 'reset' sysfs node:: Write any positive value to 'reset' sysfs node::
@ -487,11 +512,14 @@ registered compression algorithms, increases our chances of finding the
algorithm that successfully compresses a particular page. Sometimes, however, algorithm that successfully compresses a particular page. Sometimes, however,
it is convenient (and sometimes even necessary) to limit recompression to it is convenient (and sometimes even necessary) to limit recompression to
only one particular algorithm so that it will not try any other algorithms. only one particular algorithm so that it will not try any other algorithms.
This can be achieved by providing a algo=NAME parameter::: This can be achieved by providing a `algo` or `priority` parameter:::
#use zstd algorithm only (if registered) #use zstd algorithm only (if registered)
echo "type=huge algo=zstd" > /sys/block/zramX/recompress echo "type=huge algo=zstd" > /sys/block/zramX/recompress
#use zstd algorithm only (if zstd was registered under priority 1)
echo "type=huge priority=1" > /sys/block/zramX/recompress
memory tracking memory tracking
=============== ===============

View File

@ -78,18 +78,24 @@ Brief summary of control files.
memory.memsw.max_usage_in_bytes show max memory+Swap usage recorded memory.memsw.max_usage_in_bytes show max memory+Swap usage recorded
memory.soft_limit_in_bytes set/show soft limit of memory usage memory.soft_limit_in_bytes set/show soft limit of memory usage
This knob is not available on CONFIG_PREEMPT_RT systems. This knob is not available on CONFIG_PREEMPT_RT systems.
This knob is deprecated and shouldn't be
used.
memory.stat show various statistics memory.stat show various statistics
memory.use_hierarchy set/show hierarchical account enabled memory.use_hierarchy set/show hierarchical account enabled
This knob is deprecated and shouldn't be This knob is deprecated and shouldn't be
used. used.
memory.force_empty trigger forced page reclaim memory.force_empty trigger forced page reclaim
memory.pressure_level set memory pressure notifications memory.pressure_level set memory pressure notifications
This knob is deprecated and shouldn't be
used.
memory.swappiness set/show swappiness parameter of vmscan memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness) (See sysctl's vm.swappiness)
memory.move_charge_at_immigrate set/show controls of moving charges memory.move_charge_at_immigrate set/show controls of moving charges
This knob is deprecated and shouldn't be This knob is deprecated and shouldn't be
used. used.
memory.oom_control set/show oom controls. memory.oom_control set/show oom controls.
This knob is deprecated and shouldn't be
used.
memory.numa_stat show the number of memory usage per numa memory.numa_stat show the number of memory usage per numa
node node
memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
@ -105,10 +111,18 @@ Brief summary of control files.
memory.kmem.max_usage_in_bytes show max kernel memory usage recorded memory.kmem.max_usage_in_bytes show max kernel memory usage recorded
memory.kmem.tcp.limit_in_bytes set/show hard limit for tcp buf memory memory.kmem.tcp.limit_in_bytes set/show hard limit for tcp buf memory
This knob is deprecated and shouldn't be
used.
memory.kmem.tcp.usage_in_bytes show current tcp buf memory allocation memory.kmem.tcp.usage_in_bytes show current tcp buf memory allocation
This knob is deprecated and shouldn't be
used.
memory.kmem.tcp.failcnt show the number of tcp buf memory usage memory.kmem.tcp.failcnt show the number of tcp buf memory usage
hits limits hits limits
This knob is deprecated and shouldn't be
used.
memory.kmem.tcp.max_usage_in_bytes show max tcp buf memory usage recorded memory.kmem.tcp.max_usage_in_bytes show max tcp buf memory usage recorded
This knob is deprecated and shouldn't be
used.
==================================== ========================================== ==================================== ==========================================
1. History 1. History
@ -693,8 +707,10 @@ For compatibility reasons writing 1 to memory.use_hierarchy will always pass::
# echo 1 > memory.use_hierarchy # echo 1 > memory.use_hierarchy
7. Soft limits 7. Soft limits (DEPRECATED)
============== ===========================
THIS IS DEPRECATED!
Soft limits allow for greater sharing of memory. The idea behind soft limits Soft limits allow for greater sharing of memory. The idea behind soft limits
is to allow control groups to use as much of the memory as needed, provided is to allow control groups to use as much of the memory as needed, provided
@ -834,8 +850,10 @@ It's applicable for root and non-root cgroup.
.. _cgroup-v1-memory-oom-control: .. _cgroup-v1-memory-oom-control:
10. OOM Control 10. OOM Control (DEPRECATED)
=============== ============================
THIS IS DEPRECATED!
memory.oom_control file is for OOM notification and other controls. memory.oom_control file is for OOM notification and other controls.
@ -882,8 +900,10 @@ At reading, current status of OOM is shown.
The number of processes belonging to this cgroup killed by any The number of processes belonging to this cgroup killed by any
kind of OOM killer. kind of OOM killer.
11. Memory Pressure 11. Memory Pressure (DEPRECATED)
=================== ================================
THIS IS DEPRECATED!
The pressure level notifications can be used to monitor the memory The pressure level notifications can be used to monitor the memory
allocation cost; based on the pressure, applications can implement allocation cost; based on the pressure, applications can implement

View File

@ -1343,11 +1343,14 @@ The following nested keys are defined.
all the existing limitations and potential future extensions. all the existing limitations and potential future extensions.
memory.peak memory.peak
A read-only single value file which exists on non-root A read-write single value file which exists on non-root cgroups.
cgroups.
The max memory usage recorded for the cgroup and its The max memory usage recorded for the cgroup and its descendants since
descendants since the creation of the cgroup. either the creation of the cgroup or the most recent reset for that FD.
A write of any non-empty string to this file resets it to the
current memory usage for subsequent reads through the same
file descriptor.
memory.oom.group memory.oom.group
A read-write single value file which exists on non-root A read-write single value file which exists on non-root
@ -1624,6 +1627,25 @@ The following nested keys are defined.
Usually because failed to allocate some continuous swap space Usually because failed to allocate some continuous swap space
for the huge page. for the huge page.
numa_pages_migrated (npn)
Number of pages migrated by NUMA balancing.
numa_pte_updates (npn)
Number of pages whose page table entries are modified by
NUMA balancing to produce NUMA hinting faults on access.
numa_hint_faults (npn)
Number of NUMA hinting faults.
pgdemote_kswapd
Number of pages demoted by kswapd.
pgdemote_direct
Number of pages demoted directly.
pgdemote_khugepaged
Number of pages demoted by khugepaged.
memory.numa_stat memory.numa_stat
A read-only nested-keyed file which exists on non-root cgroups. A read-only nested-keyed file which exists on non-root cgroups.
@ -1673,11 +1695,14 @@ The following nested keys are defined.
Healthy workloads are not expected to reach this limit. Healthy workloads are not expected to reach this limit.
memory.swap.peak memory.swap.peak
A read-only single value file which exists on non-root A read-write single value file which exists on non-root cgroups.
cgroups.
The max swap usage recorded for the cgroup and its The max swap usage recorded for the cgroup and its descendants since
descendants since the creation of the cgroup. the creation of the cgroup or the most recent reset for that FD.
A write of any non-empty string to this file resets it to the
current memory usage for subsequent reads through the same
file descriptor.
memory.swap.max memory.swap.max
A read-write single value file which exists on non-root A read-write single value file which exists on non-root
@ -1741,6 +1766,8 @@ The following nested keys are defined.
Note that this is subtly different from setting memory.swap.max to Note that this is subtly different from setting memory.swap.max to
0, as it still allows for pages to be written to the zswap pool. 0, as it still allows for pages to be written to the zswap pool.
This setting has no effect if zswap is disabled, and swapping
is allowed unless memory.swap.max is set to 0.
memory.pressure memory.pressure
A read-only nested-keyed file. A read-only nested-keyed file.

View File

@ -4152,6 +4152,21 @@
Disable NUMA, Only set up a single NUMA node Disable NUMA, Only set up a single NUMA node
spanning all memory. spanning all memory.
numa=fake=<size>[MG]
[KNL, ARM64, RISCV, X86, EARLY]
If given as a memory unit, fills all system RAM with
nodes of size interleaved over physical nodes.
numa=fake=<N>
[KNL, ARM64, RISCV, X86, EARLY]
If given as an integer, fills all system RAM with N
fake nodes interleaved over physical nodes.
numa=fake=<N>U
[KNL, ARM64, RISCV, X86, EARLY]
If given as an integer followed by 'U', it will
divide each physical node into N emulated nodes.
numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic
NUMA balancing. NUMA balancing.
Allowed values are enable and disable Allowed values are enable and disable
@ -6655,6 +6670,15 @@
<deci-seconds>: poll all this frequency <deci-seconds>: poll all this frequency
0: no polling (default) 0: no polling (default)
thp_anon= [KNL]
Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
state is one of "always", "madvise", "never" or "inherit".
Control the default behavior of the system with respect
to anonymous transparent hugepages.
Can be used multiple times for multiple anon THP sizes.
See Documentation/admin-guide/mm/transhuge.rst for more
details.
threadirqs [KNL,EARLY] threadirqs [KNL,EARLY]
Force threading of all interrupt handlers except those Force threading of all interrupt handlers except those
marked explicitly IRQF_NO_THREAD. marked explicitly IRQF_NO_THREAD.

View File

@ -7,7 +7,7 @@ Getting Started
This document briefly describes how you can use DAMON by demonstrating its This document briefly describes how you can use DAMON by demonstrating its
default user space tool. Please note that this document describes only a part default user space tool. Please note that this document describes only a part
of its features for brevity. Please refer to the usage `doc of its features for brevity. Please refer to the usage `doc
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_ of the tool for more <https://github.com/damonitor/damo/blob/next/USAGE.md>`_ of the tool for more
details. details.
@ -26,7 +26,7 @@ User Space Tool
For the demonstration, we will use the default user space tool for DAMON, For the demonstration, we will use the default user space tool for DAMON,
called DAMON Operator (DAMO). It is available at called DAMON Operator (DAMO). It is available at
https://github.com/awslabs/damo. The examples below assume that ``damo`` is on https://github.com/damonitor/damo. The examples below assume that ``damo`` is on
your ``$PATH``. It's not mandatory, though. your ``$PATH``. It's not mandatory, though.
Because DAMO is using the sysfs interface (refer to :doc:`usage` for the Because DAMO is using the sysfs interface (refer to :doc:`usage` for the

View File

@ -7,19 +7,19 @@ Detailed Usages
DAMON provides below interfaces for different users. DAMON provides below interfaces for different users.
- *DAMON user space tool.* - *DAMON user space tool.*
`This <https://github.com/awslabs/damo>`_ is for privileged people such as `This <https://github.com/damonitor/damo>`_ is for privileged people such as
system administrators who want a just-working human-friendly interface. system administrators who want a just-working human-friendly interface.
Using this, users can use the DAMONs major features in a human-friendly way. Using this, users can use the DAMONs major features in a human-friendly way.
It may not be highly tuned for special cases, though. For more detail, It may not be highly tuned for special cases, though. For more detail,
please refer to its `usage document please refer to its `usage document
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_. <https://github.com/damonitor/damo/blob/next/USAGE.md>`_.
- *sysfs interface.* - *sysfs interface.*
:ref:`This <sysfs_interface>` is for privileged user space programmers who :ref:`This <sysfs_interface>` is for privileged user space programmers who
want more optimized use of DAMON. Using this, users can use DAMONs major want more optimized use of DAMON. Using this, users can use DAMONs major
features by reading from and writing to special sysfs files. Therefore, features by reading from and writing to special sysfs files. Therefore,
you can write and use your personalized DAMON sysfs wrapper programs that you can write and use your personalized DAMON sysfs wrapper programs that
reads/writes the sysfs files instead of you. The `DAMON user space tool reads/writes the sysfs files instead of you. The `DAMON user space tool
<https://github.com/awslabs/damo>`_ is one example of such programs. <https://github.com/damonitor/damo>`_ is one example of such programs.
- *Kernel Space Programming Interface.* - *Kernel Space Programming Interface.*
:doc:`This </mm/damon/api>` is for kernel space programmers. Using this, :doc:`This </mm/damon/api>` is for kernel space programmers. Using this,
users can utilize every feature of DAMON most flexibly and efficiently by users can utilize every feature of DAMON most flexibly and efficiently by
@ -543,7 +543,7 @@ memory rate becomes larger than 60%, or lower than 30%". ::
# echo 300 > watermarks/low # echo 300 > watermarks/low
Please note that it's highly recommended to use user space tools like `damo Please note that it's highly recommended to use user space tools like `damo
<https://github.com/awslabs/damo>`_ rather than manually reading and writing <https://github.com/damonitor/damo>`_ rather than manually reading and writing
the files as above. Above is only for an example. the files as above. Above is only for an example.
.. _tracepoint: .. _tracepoint:

View File

@ -202,6 +202,16 @@ PMD-mappable transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
All THPs at fault and collapse time will be added to _deferred_list,
and will therefore be split under memory presure if they are considered
"underused". A THP is underused if the number of zero-filled pages in
the THP is above max_ptes_none (see below). It is possible to disable
this behaviour by writing 0 to shrink_underused, and enable it by writing
1 to it::
echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
khugepaged will be automatically started when PMD-sized THP is enabled khugepaged will be automatically started when PMD-sized THP is enabled
(either of the per-size anon control or the top-level control are set (either of the per-size anon control or the top-level control are set
to "always" or "madvise"), and it'll be automatically shutdown when to "always" or "madvise"), and it'll be automatically shutdown when
@ -284,13 +294,37 @@ that THP is shared. Exceeding the number would block the collapse::
A higher value may increase memory footprint for some workloads. A higher value may increase memory footprint for some workloads.
Boot parameter Boot parameters
============== ===============
You can change the sysfs boot time defaults of Transparent Hugepage You can change the sysfs boot time default for the top-level "enabled"
Support by passing the parameter ``transparent_hugepage=always`` or control by passing the parameter ``transparent_hugepage=always`` or
``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` ``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
to the kernel command line. kernel command line.
Alternatively, each supported anonymous THP size can be controlled by
passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
``never`` or ``inherit``.
For example, the following will set 16K, 32K, 64K THP to ``always``,
set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
to ``never``::
thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never
``thp_anon=`` may be specified multiple times to configure all THP sizes as
required. If ``thp_anon=`` is specified at least once, any anon THP sizes
not explicitly configured on the command line are implicitly set to
``never``.
``transparent_hugepage`` setting only affects the global toggle. If
``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
However, if a valid ``thp_anon`` setting is provided by the user, the
PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
is not defined within a valid ``thp_anon``, its policy will default to
``never``.
Hugepages in tmpfs/shmem Hugepages in tmpfs/shmem
======================== ========================
@ -447,6 +481,12 @@ thp_deferred_split_page
splitting it would free up some memory. Pages on split queue are splitting it would free up some memory. Pages on split queue are
going to be split under memory pressure. going to be split under memory pressure.
thp_underused_split_page
is incremented when a huge page on the split queue was split
because it was underused. A THP is underused if the number of
zero pages in the THP is above a certain threshold
(/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none).
thp_split_pmd thp_split_pmd
is incremented every time a PMD split into table of PTEs. is incremented every time a PMD split into table of PTEs.
This can happen, for instance, when application calls mprotect() or This can happen, for instance, when application calls mprotect() or
@ -527,6 +567,18 @@ split_deferred
it would free up some memory. Pages on split queue are going to it would free up some memory. Pages on split queue are going to
be split under memory pressure, if splitting is possible. be split under memory pressure, if splitting is possible.
nr_anon
the number of anonymous THP we have in the whole system. These THPs
might be currently entirely mapped or have partially unmapped/unused
subpages.
nr_anon_partially_mapped
the number of anonymous THP which are likely partially mapped, possibly
wasting memory, and have been queued for deferred memory reclamation.
Note that in corner some cases (e.g., failed migration), we might detect
an anonymous THP as "partially mapped" and count it here, even though it
is not actually partially mapped anymore.
As the system ages, allocating huge pages may be expensive as the As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help huge page for use. There are some counters in ``/proc/vmstat`` to help

View File

@ -170,18 +170,6 @@ NUMA
Don't parse the HMAT table for NUMA setup, or soft-reserved memory Don't parse the HMAT table for NUMA setup, or soft-reserved memory
partitioning. partitioning.
numa=fake=<size>[MG]
If given as a memory unit, fills all system RAM with nodes of
size interleaved over physical nodes.
numa=fake=<N>
If given as an integer, fills all system RAM with N fake nodes
interleaved over physical nodes.
numa=fake=<N>U
If given as an integer followed by 'U', it will divide each
physical node into N emulated nodes.
ACPI ACPI
==== ====

View File

@ -576,13 +576,12 @@ The field width is passed by value, the bitmap is passed by reference.
Helper macros cpumask_pr_args() and nodemask_pr_args() are available to ease Helper macros cpumask_pr_args() and nodemask_pr_args() are available to ease
printing cpumask and nodemask. printing cpumask and nodemask.
Flags bitfields such as page flags, page_type, gfp_flags Flags bitfields such as page flags and gfp_flags
-------------------------------------------------------- --------------------------------------------------------
:: ::
%pGp 0x17ffffc0002036(referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff) %pGp 0x17ffffc0002036(referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff)
%pGt 0xffffff7f(buddy)
%pGg GFP_USER|GFP_DMA32|GFP_NOWARN %pGg GFP_USER|GFP_DMA32|GFP_NOWARN
%pGv read|exec|mayread|maywrite|mayexec|denywrite %pGv read|exec|mayread|maywrite|mayexec|denywrite
@ -591,7 +590,6 @@ would construct the value. The type of flags is given by the third
character. Currently supported are: character. Currently supported are:
- p - [p]age flags, expects value of type (``unsigned long *``) - p - [p]age flags, expects value of type (``unsigned long *``)
- t - page [t]ype, expects value of type (``unsigned int *``)
- v - [v]ma_flags, expects value of type (``unsigned long *``) - v - [v]ma_flags, expects value of type (``unsigned long *``)
- g - [g]fp_flags, expects value of type (``gfp_t *``) - g - [g]fp_flags, expects value of type (``gfp_t *``)

View File

@ -53,6 +53,13 @@ configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.
The KUnit test suite is very likely to fail when using a deferrable timer The KUnit test suite is very likely to fail when using a deferrable timer
since it currently causes very unpredictable sample intervals. since it currently causes very unpredictable sample intervals.
By default KFENCE will only sample 1 heap allocation within each sample
interval. *Burst mode* allows to sample successive heap allocations, where the
kernel boot parameter ``kfence.burst`` can be set to a non-zero value which
denotes the *additional* successive allocations within a sample interval;
setting ``kfence.burst=N`` means that ``1 + N`` successive allocations are
attempted through KFENCE for each sample interval.
The KFENCE memory pool is of fixed size, and if the pool is exhausted, no The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object 255), the number of available guarded objects can be controlled. Each object

View File

@ -1,30 +0,0 @@
#
# Feature name: PG_uncached
# Kconfig: ARCH_USES_PG_UNCACHED
# description: arch supports the PG_uncached page flag
#
-----------------------
| arch |status|
-----------------------
| alpha: | TODO |
| arc: | TODO |
| arm: | TODO |
| arm64: | TODO |
| csky: | TODO |
| hexagon: | TODO |
| loongarch: | TODO |
| m68k: | TODO |
| microblaze: | TODO |
| mips: | TODO |
| nios2: | TODO |
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | TODO |
| riscv: | TODO |
| s390: | TODO |
| sh: | TODO |
| sparc: | TODO |
| um: | TODO |
| x86: | ok |
| xtensa: | TODO |
-----------------------

View File

@ -913,8 +913,7 @@ cache in your filesystem. The following members are defined:
stop attempting I/O, it can simply return. The caller will stop attempting I/O, it can simply return. The caller will
remove the remaining pages from the address space, unlock them remove the remaining pages from the address space, unlock them
and decrement the page refcount. Set PageUptodate if the I/O and decrement the page refcount. Set PageUptodate if the I/O
completes successfully. Setting PageError on any page will be completes successfully.
ignored; simply unlock the page if an I/O error occurs.
``write_begin`` ``write_begin``
Called by the generic buffered write code to ask the filesystem Called by the generic buffered write code to ask the filesystem

View File

@ -586,7 +586,7 @@ API, and return the results to the user-space.
The ABIs are designed to be used for user space applications development, The ABIs are designed to be used for user space applications development,
rather than human beings' fingers. Human users are recommended to use such rather than human beings' fingers. Human users are recommended to use such
user space tools. One such Python-written user space tool is available at user space tools. One such Python-written user space tool is available at
Github (https://github.com/awslabs/damo), Pypi Github (https://github.com/damonitor/damo), Pypi
(https://pypistats.org/packages/damo), and Fedora (https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/). (https://packages.fedoraproject.org/pkgs/python-damo/damo/).

View File

@ -7,23 +7,27 @@ The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR'
section of 'MAINTAINERS' file. section of 'MAINTAINERS' file.
The mailing lists for the subsystem are damon@lists.linux.dev and The mailing lists for the subsystem are damon@lists.linux.dev and
linux-mm@kvack.org. Patches should be made against the mm-unstable tree [1]_ linux-mm@kvack.org. Patches should be made against the mm-unstable `tree
whenever possible and posted to the mailing lists. <https://git.kernel.org/akpm/mm/h/mm-unstable>` whenever possible and posted to
the mailing lists.
SCM Trees SCM Trees
--------- ---------
There are multiple Linux trees for DAMON development. Patches under There are multiple Linux trees for DAMON development. Patches under
development or testing are queued in damon/next [2]_ by the DAMON maintainer. development or testing are queued in `damon/next
Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory <https://git.kernel.org/sj/h/damon/next>` by the DAMON maintainer.
management subsystem maintainer. After more sufficient tests, the patches will Sufficiently reviewed patches will be queued in `mm-unstable
be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the <https://git.kernel.org/akpm/mm/h/mm-unstable>` by the memory management
memory management subsystem maintainer. subsystem maintainer. After more sufficient tests, the patches will be queued
in `mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>` , and finally
pull-requested to the mainline by the memory management subsystem maintainer.
Note again the patches for mm-unstable tree [1]_ are queued by the memory Note again the patches for mm-unstable `tree
<https://git.kernel.org/akpm/mm/h/mm-unstable>` are queued by the memory
management subsystem maintainer. If the patches requires some patches in management subsystem maintainer. If the patches requires some patches in
damon/next tree [2]_ which not yet merged in mm-unstable, please make sure the damon/next `tree <https://git.kernel.org/sj/h/damon/next>` which not yet merged
requirement is clearly specified. in mm-unstable, please make sure the requirement is clearly specified.
Submit checklist addendum Submit checklist addendum
------------------------- -------------------------
@ -32,18 +36,27 @@ When making DAMON changes, you should do below.
- Build changes related outputs including kernel and documents. - Build changes related outputs including kernel and documents.
- Ensure the builds introduce no new errors or warnings. - Ensure the builds introduce no new errors or warnings.
- Run and ensure no new failures for DAMON selftests [4]_ and kunittests [5]_ . - Run and ensure no new failures for DAMON `selftests
<https://github.com/awslabs/damon-tests/blob/master/corr/run.sh#L49>` and
`kunittests
<https://github.com/awslabs/damon-tests/blob/master/corr/tests/kunit.sh>`.
Further doing below and putting the results will be helpful. Further doing below and putting the results will be helpful.
- Run damon-tests/corr [6]_ for normal changes. - Run `damon-tests/corr
- Run damon-tests/perf [7]_ for performance changes. <https://github.com/awslabs/damon-tests/tree/master/corr>` for normal
changes.
- Run `damon-tests/perf
<https://github.com/awslabs/damon-tests/tree/master/perf>` for performance
changes.
Key cycle dates Key cycle dates
--------------- ---------------
Patches can be sent anytime. Key cycle dates of the mm-unstable [1]_ and Patches can be sent anytime. Key cycle dates of the `mm-unstable
mm-stable [3]_ trees depend on the memory management subsystem maintainer. <https://git.kernel.org/akpm/mm/h/mm-unstable>` and `mm-stable
<https://git.kernel.org/akpm/mm/h/mm-stable>` trees depend on the memory
management subsystem maintainer.
Review cadence Review cadence
-------------- --------------
@ -58,16 +71,17 @@ Mailing tool
Like many other Linux kernel subsystems, DAMON uses the mailing lists Like many other Linux kernel subsystems, DAMON uses the mailing lists
(damon@lists.linux.dev and linux-mm@kvack.org) as the major communication (damon@lists.linux.dev and linux-mm@kvack.org) as the major communication
channel. There is a simple tool called HacKerMaiL (``hkml``) [8]_ , which is channel. There is a simple tool called `HacKerMaiL
for people who are not very familiar with the mailing lists based <https://github.com/damonitor/hackermail>` (``hkml``), which is for people who
communication. The tool could be particularly helpful for DAMON community are not very familiar with the mailing lists based communication. The tool
members since it is developed and maintained by DAMON maintainer. The tool is could be particularly helpful for DAMON community members since it is developed
also officially announced to support DAMON and general Linux kernel development and maintained by DAMON maintainer. The tool is also officially announced to
workflow. support DAMON and general Linux kernel development workflow.
In other words, ``hkml`` [8]_ is a mailing tool for DAMON community, which In other words, `hkml <https://github.com/damonitor/hackermail>` is a mailing
DAMON maintainer is committed to support. Please feel free to try and report tool for DAMON community, which DAMON maintainer is committed to support.
issues or feature requests for the tool to the maintainer. Please feel free to try and report issues or feature requests for the tool to
the maintainer.
Community meetup Community meetup
---------------- ----------------
@ -83,17 +97,9 @@ members including the maintainer. The maintainer shares the available time
slots, and attendees should reserve one of those at least 24 hours before the slots, and attendees should reserve one of those at least 24 hours before the
time slot, by reaching out to the maintainer. time slot, by reaching out to the maintainer.
Schedules and available reservation time slots are available at the Google doc Schedules and available reservation time slots are available at the Google `doc
[9]_ . DAMON maintainer will also provide periodic reminder to the mailing <https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing>`.
list (damon@lists.linux.dev). There is also a public Google `calendar
<https://calendar.google.com/calendar/u/0?cid=ZDIwOTA4YTMxNjc2MDQ3NTIyMmUzYTM5ZmQyM2U4NDA0ZGIwZjBiYmJlZGQxNDM0MmY4ZTRjOTE0NjdhZDRiY0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t>`
that has the events. Anyone can subscribe it. DAMON maintainer will also
.. [1] https://git.kernel.org/akpm/mm/h/mm-unstable provide periodic reminder to the mailing list (damon@lists.linux.dev).
.. [2] https://git.kernel.org/sj/h/damon/next
.. [3] https://git.kernel.org/akpm/mm/h/mm-stable
.. [4] https://github.com/awslabs/damon-tests/blob/master/corr/run.sh#L49
.. [5] https://github.com/awslabs/damon-tests/blob/master/corr/tests/kunit.sh
.. [6] https://github.com/awslabs/damon-tests/tree/master/corr
.. [7] https://github.com/awslabs/damon-tests/tree/master/perf
.. [8] https://github.com/damonitor/hackermail
.. [9] https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing

View File

@ -63,15 +63,15 @@ and then a low level description of how the low level details work.
In kernel use of migrate_pages() In kernel use of migrate_pages()
================================ ================================
1. Remove pages from the LRU. 1. Remove folios from the LRU.
Lists of pages to be migrated are generated by scanning over Lists of folios to be migrated are generated by scanning over
pages and moving them into lists. This is done by folios and moving them into lists. This is done by
calling isolate_lru_page(). calling folio_isolate_lru().
Calling isolate_lru_page() increases the references to the page Calling folio_isolate_lru() increases the references to the folio
so that it cannot vanish while the page migration occurs. so that it cannot vanish while the folio migration occurs.
It also prevents the swapper or other scans from encountering It also prevents the swapper or other scans from encountering
the page. the folio.
2. We need to have a function of type new_folio_t that can be 2. We need to have a function of type new_folio_t that can be
passed to migrate_pages(). This function should figure out passed to migrate_pages(). This function should figure out
@ -84,10 +84,10 @@ In kernel use of migrate_pages()
How migrate_pages() works How migrate_pages() works
========================= =========================
migrate_pages() does several passes over its list of pages. A page is moved migrate_pages() does several passes over its list of folios. A folio is moved
if all references to a page are removable at the time. The page has if all references to a folio are removable at the time. The folio has
already been removed from the LRU via isolate_lru_page() and the refcount already been removed from the LRU via folio_isolate_lru() and the refcount
is increased so that the page cannot be freed while page migration occurs. is increased so that the folio cannot be freed while folio migration occurs.
Steps: Steps:

View File

@ -31,10 +31,10 @@ Design principles
feature that applies to all dynamic high order allocations in the feature that applies to all dynamic high order allocations in the
kernel) kernel)
get_user_pages and follow_page get_user_pages and pin_user_pages
============================== =================================
get_user_pages and follow_page if run on a hugepage, will return the get_user_pages and pin_user_pages if run on a hugepage, will return the
head or tail pages as usual (exactly as they would do on head or tail pages as usual (exactly as they would do on
hugetlbfs). Most GUP users will only care about the actual physical hugetlbfs). Most GUP users will only care about the actual physical
address of the page and its temporary pinning to release after the I/O address of the page and its temporary pinning to release after the I/O

View File

@ -80,7 +80,7 @@ on an additional LRU list for a few reasons:
(2) We want to be able to migrate unevictable folios between nodes for memory (2) We want to be able to migrate unevictable folios between nodes for memory
defragmentation, workload management and memory hotplug. The Linux kernel defragmentation, workload management and memory hotplug. The Linux kernel
can only migrate folios that it can successfully isolate from the LRU can only migrate folios that it can successfully isolate from the LRU
lists (or "Movable" pages: outside of consideration here). If we were to lists (or "Movable" folios: outside of consideration here). If we were to
maintain folios elsewhere than on an LRU-like list, where they can be maintain folios elsewhere than on an LRU-like list, where they can be
detected by folio_isolate_lru(), we would prevent their migration. detected by folio_isolate_lru(), we would prevent their migration.
@ -230,7 +230,7 @@ In Nick's patch, he used one of the struct page LRU list link fields as a count
of VM_LOCKED VMAs that map the page (Rik van Riel had the same idea three years of VM_LOCKED VMAs that map the page (Rik van Riel had the same idea three years
earlier). But this use of the link field for a count prevented the management earlier). But this use of the link field for a count prevented the management
of the pages on an LRU list, and thus mlocked pages were not migratable as of the pages on an LRU list, and thus mlocked pages were not migratable as
isolate_lru_page() could not detect them, and the LRU list link field was not folio_isolate_lru() could not detect them, and the LRU list link field was not
available to the migration subsystem. available to the migration subsystem.
Nick resolved this by putting mlocked pages back on the LRU list before Nick resolved this by putting mlocked pages back on the LRU list before
@ -253,8 +253,8 @@ Basic Management
mlocked pages - pages mapped into a VM_LOCKED VMA - are a class of unevictable mlocked pages - pages mapped into a VM_LOCKED VMA - are a class of unevictable
pages. When such a page has been "noticed" by the memory management subsystem, pages. When such a page has been "noticed" by the memory management subsystem,
the page is marked with the PG_mlocked flag. This can be manipulated using the the folio is marked with the PG_mlocked flag. This can be manipulated using
PageMlocked() functions. folio_set_mlocked() and folio_clear_mlocked() functions.
A PG_mlocked page will be placed on the unevictable list when it is added to A PG_mlocked page will be placed on the unevictable list when it is added to
the LRU. Such pages can be "noticed" by memory management in several places: the LRU. Such pages can be "noticed" by memory management in several places:

View File

@ -15,7 +15,7 @@
本文通过演示DAMON的默认用户空间工具简要地介绍了如何使用DAMON。请注意为了简洁 本文通过演示DAMON的默认用户空间工具简要地介绍了如何使用DAMON。请注意为了简洁
起见,本文档只描述了它的部分功能。更多细节请参考该工具的使用文档。 起见,本文档只描述了它的部分功能。更多细节请参考该工具的使用文档。
`doc <https://github.com/awslabs/damo/blob/next/USAGE.md>`_ . `doc <https://github.com/damonitor/damo/blob/next/USAGE.md>`_ .
前提条件 前提条件
@ -31,7 +31,7 @@
------------ ------------
在演示中我们将使用DAMON的默认用户空间工具称为DAMON OperatorDAMO。它可以在 在演示中我们将使用DAMON的默认用户空间工具称为DAMON OperatorDAMO。它可以在
https://github.com/awslabs/damo找到。下面的例子假设DAMO在你的$PATH上。当然 https://github.com/damonitor/damo找到。下面的例子假设DAMO在你的$PATH上。当然
这并不是强制性的。 这并不是强制性的。
因为DAMO使用了DAMON的sysfs接口详情请参考:doc:`usage`),你应该确保 因为DAMO使用了DAMON的sysfs接口详情请参考:doc:`usage`),你应该确保

View File

@ -16,16 +16,16 @@
DAMON 为不同的用户提供了下面这些接口。 DAMON 为不同的用户提供了下面这些接口。
- *DAMON用户空间工具。* - *DAMON用户空间工具。*
`这 <https://github.com/awslabs/damo>`_ 为有这特权的人, 如系统管理员,希望有一个刚好 `这 <https://github.com/damonitor/damo>`_ 为有这特权的人, 如系统管理员,希望有一个刚好
可以工作的人性化界面。 可以工作的人性化界面。
使用它用户可以以人性化的方式使用DAMON的主要功能。不过它可能不会为特殊情况进行高度调整。 使用它用户可以以人性化的方式使用DAMON的主要功能。不过它可能不会为特殊情况进行高度调整。
它同时支持虚拟和物理地址空间的监测。更多细节,请参考它的 `使用文档 它同时支持虚拟和物理地址空间的监测。更多细节,请参考它的 `使用文档
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_。 <https://github.com/damonitor/damo/blob/next/USAGE.md>`_。
- *sysfs接口。* - *sysfs接口。*
:ref:`这 <sysfs_interface>` 是为那些希望更高级的使用DAMON的特权用户空间程序员准备的。 :ref:`这 <sysfs_interface>` 是为那些希望更高级的使用DAMON的特权用户空间程序员准备的。
使用它用户可以通过读取和写入特殊的sysfs文件来使用DAMON的主要功能。因此你可以编写和使 使用它用户可以通过读取和写入特殊的sysfs文件来使用DAMON的主要功能。因此你可以编写和使
用你个性化的DAMON sysfs包装程序代替你读/写sysfs文件。 `DAMON用户空间工具 用你个性化的DAMON sysfs包装程序代替你读/写sysfs文件。 `DAMON用户空间工具
<https://github.com/awslabs/damo>`_ 就是这种程序的一个例子 它同时支持虚拟和物理地址 <https://github.com/damonitor/damo>`_ 就是这种程序的一个例子 它同时支持虚拟和物理地址
空间的监测。注意,这个界面只提供简单的监测结果 :ref:`统计 <damos_stats>`。对于详细的监测 空间的监测。注意,这个界面只提供简单的监测结果 :ref:`统计 <damos_stats>`。对于详细的监测
结果DAMON提供了一个:ref:`跟踪点 <tracepoint>` 结果DAMON提供了一个:ref:`跟踪点 <tracepoint>`
- *debugfs interface.* - *debugfs interface.*
@ -332,7 +332,7 @@ tried_regions/<N>/
# echo 500 > watermarks/mid # echo 500 > watermarks/mid
# echo 300 > watermarks/low # echo 300 > watermarks/low
请注意,我们强烈建议使用用户空间的工具,如 `damo <https://github.com/awslabs/damo>`_ 请注意,我们强烈建议使用用户空间的工具,如 `damo <https://github.com/damonitor/damo>`_
而不是像上面那样手动读写文件。以上只是一个例子。 而不是像上面那样手动读写文件。以上只是一个例子。
debugfs接口 debugfs接口

View File

@ -50,8 +50,8 @@ mbind()设置一个新的内存策略。一个进程的页面也可以通过sys_
1. 从LRU中移除页面。 1. 从LRU中移除页面。
要迁移的页面列表是通过扫描页面并把它们移到列表中来生成的。这是通过调用 isolate_lru_page() 要迁移的页面列表是通过扫描页面并把它们移到列表中来生成的。这是通过调用 folio_isolate_lru()
来完成的。调用isolate_lru_page()增加了对该页的引用,这样在页面迁移发生时它就不会 来完成的。调用folio_isolate_lru()增加了对该页的引用,这样在页面迁移发生时它就不会
消失。它还可以防止交换器或其他扫描器遇到该页。 消失。它还可以防止交换器或其他扫描器遇到该页。
@ -65,7 +65,7 @@ migrate_pages()如何工作
======================= =======================
migrate_pages()对它的页面列表进行了多次处理。如果当时对一个页面的所有引用都可以被移除, migrate_pages()对它的页面列表进行了多次处理。如果当时对一个页面的所有引用都可以被移除,
那么这个页面就会被移动。该页已经通过isolate_lru_page()从LRU中移除并且refcount被 那么这个页面就会被移动。该页已经通过folio_isolate_lru()从LRU中移除并且refcount被
增加,以便在页面迁移发生时不释放该页。 增加,以便在页面迁移发生时不释放该页。
步骤: 步骤:

View File

@ -15,7 +15,7 @@
本文通過演示DAMON的默認用戶空間工具簡要地介紹瞭如何使用DAMON。請注意爲了簡潔 本文通過演示DAMON的默認用戶空間工具簡要地介紹瞭如何使用DAMON。請注意爲了簡潔
起見,本文檔只描述了它的部分功能。更多細節請參考該工具的使用文檔。 起見,本文檔只描述了它的部分功能。更多細節請參考該工具的使用文檔。
`doc <https://github.com/awslabs/damo/blob/next/USAGE.md>`_ . `doc <https://github.com/damonitor/damo/blob/next/USAGE.md>`_ .
前提條件 前提條件
@ -31,7 +31,7 @@
------------ ------------
在演示中我們將使用DAMON的默認用戶空間工具稱爲DAMON OperatorDAMO。它可以在 在演示中我們將使用DAMON的默認用戶空間工具稱爲DAMON OperatorDAMO。它可以在
https://github.com/awslabs/damo找到。下面的例子假設DAMO在你的$PATH上。當然 https://github.com/damonitor/damo找到。下面的例子假設DAMO在你的$PATH上。當然
這並不是強制性的。 這並不是強制性的。
因爲DAMO使用了DAMON的sysfs接口詳情請參考:doc:`usage`),你應該確保 因爲DAMO使用了DAMON的sysfs接口詳情請參考:doc:`usage`),你應該確保

View File

@ -16,16 +16,16 @@
DAMON 爲不同的用戶提供了下面這些接口。 DAMON 爲不同的用戶提供了下面這些接口。
- *DAMON用戶空間工具。* - *DAMON用戶空間工具。*
`這 <https://github.com/awslabs/damo>`_ 爲有這特權的人, 如系統管理員,希望有一個剛好 `這 <https://github.com/damonitor/damo>`_ 爲有這特權的人, 如系統管理員,希望有一個剛好
可以工作的人性化界面。 可以工作的人性化界面。
使用它用戶可以以人性化的方式使用DAMON的主要功能。不過它可能不會爲特殊情況進行高度調整。 使用它用戶可以以人性化的方式使用DAMON的主要功能。不過它可能不會爲特殊情況進行高度調整。
它同時支持虛擬和物理地址空間的監測。更多細節,請參考它的 `使用文檔 它同時支持虛擬和物理地址空間的監測。更多細節,請參考它的 `使用文檔
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_。 <https://github.com/damonitor/damo/blob/next/USAGE.md>`_。
- *sysfs接口。* - *sysfs接口。*
:ref:`這 <sysfs_interface>` 是爲那些希望更高級的使用DAMON的特權用戶空間程序員準備的。 :ref:`這 <sysfs_interface>` 是爲那些希望更高級的使用DAMON的特權用戶空間程序員準備的。
使用它用戶可以通過讀取和寫入特殊的sysfs文件來使用DAMON的主要功能。因此你可以編寫和使 使用它用戶可以通過讀取和寫入特殊的sysfs文件來使用DAMON的主要功能。因此你可以編寫和使
用你個性化的DAMON sysfs包裝程序代替你讀/寫sysfs文件。 `DAMON用戶空間工具 用你個性化的DAMON sysfs包裝程序代替你讀/寫sysfs文件。 `DAMON用戶空間工具
<https://github.com/awslabs/damo>`_ 就是這種程序的一個例子 它同時支持虛擬和物理地址 <https://github.com/damonitor/damo>`_ 就是這種程序的一個例子 它同時支持虛擬和物理地址
空間的監測。注意,這個界面只提供簡單的監測結果 :ref:`統計 <damos_stats>`。對於詳細的監測 空間的監測。注意,這個界面只提供簡單的監測結果 :ref:`統計 <damos_stats>`。對於詳細的監測
結果DAMON提供了一個:ref:`跟蹤點 <tracepoint>` 結果DAMON提供了一個:ref:`跟蹤點 <tracepoint>`
- *debugfs interface.* - *debugfs interface.*
@ -332,7 +332,7 @@ tried_regions/<N>/
# echo 500 > watermarks/mid # echo 500 > watermarks/mid
# echo 300 > watermarks/low # echo 300 > watermarks/low
請注意,我們強烈建議使用用戶空間的工具,如 `damo <https://github.com/awslabs/damo>`_ 請注意,我們強烈建議使用用戶空間的工具,如 `damo <https://github.com/damonitor/damo>`_
而不是像上面那樣手動讀寫文件。以上只是一個例子。 而不是像上面那樣手動讀寫文件。以上只是一個例子。
debugfs接口 debugfs接口

View File

@ -24603,6 +24603,20 @@ F: include/uapi/linux/vsockmon.h
F: net/vmw_vsock/ F: net/vmw_vsock/
F: tools/testing/vsock/ F: tools/testing/vsock/
VMA
M: Andrew Morton <akpm@linux-foundation.org>
R: Liam R. Howlett <Liam.Howlett@oracle.com>
R: Vlastimil Babka <vbabka@suse.cz>
R: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
L: linux-mm@kvack.org
S: Maintained
W: https://www.linux-mm.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: mm/vma.c
F: mm/vma.h
F: mm/vma_internal.h
F: tools/testing/vma/
VMALLOC VMALLOC
M: Andrew Morton <akpm@linux-foundation.org> M: Andrew Morton <akpm@linux-foundation.org>
R: Uladzislau Rezki <urezki@gmail.com> R: Uladzislau Rezki <urezki@gmail.com>

View File

@ -1229,7 +1229,7 @@ arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
unsigned long unsigned long
arch_get_unmapped_area(struct file *filp, unsigned long addr, arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long len, unsigned long pgoff,
unsigned long flags) unsigned long flags, vm_flags_t vm_flags)
{ {
unsigned long limit; unsigned long limit;

View File

@ -23,7 +23,8 @@
*/ */
unsigned long unsigned long
arch_get_unmapped_area(struct file *filp, unsigned long addr, arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff,
unsigned long flags, vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct *vma; struct vm_area_struct *vma;

View File

@ -61,7 +61,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
return ret; return ret;
} }
#if USE_SPLIT_PTE_PTLOCKS #if defined(CONFIG_SPLIT_PTE_PTLOCKS)
/* /*
* If we are using split PTE locks, then we need to take the page * If we are using split PTE locks, then we need to take the page
* lock here. Otherwise we are using shared mm->page_table_lock * lock here. Otherwise we are using shared mm->page_table_lock
@ -80,10 +80,10 @@ static inline void do_pte_unlock(spinlock_t *ptl)
{ {
spin_unlock(ptl); spin_unlock(ptl);
} }
#else /* !USE_SPLIT_PTE_PTLOCKS */ #else /* !defined(CONFIG_SPLIT_PTE_PTLOCKS) */
static inline void do_pte_lock(spinlock_t *ptl) {} static inline void do_pte_lock(spinlock_t *ptl) {}
static inline void do_pte_unlock(spinlock_t *ptl) {} static inline void do_pte_unlock(spinlock_t *ptl) {}
#endif /* USE_SPLIT_PTE_PTLOCKS */ #endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
static int adjust_pte(struct vm_area_struct *vma, unsigned long address, static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
unsigned long pfn) unsigned long pfn)

View File

@ -28,7 +28,8 @@
*/ */
unsigned long unsigned long
arch_get_unmapped_area(struct file *filp, unsigned long addr, arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff,
unsigned long flags, vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct *vma; struct vm_area_struct *vma;
@ -78,8 +79,8 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long unsigned long
arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
const unsigned long len, const unsigned long pgoff, const unsigned long len, const unsigned long pgoff,
const unsigned long flags) const unsigned long flags, vm_flags_t vm_flags)
{ {
struct vm_area_struct *vma; struct vm_area_struct *vma;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;

View File

@ -101,6 +101,7 @@ config ARM64
select ARCH_SUPPORTS_NUMA_BALANCING select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_SUPPORTS_PER_VMA_LOCK select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT select ARCH_SUPPORTS_RT
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
@ -2104,7 +2105,8 @@ config ARM64_MTE
depends on ARM64_PAN depends on ARM64_PAN
select ARCH_HAS_SUBPAGE_FAULTS select ARCH_HAS_SUBPAGE_FAULTS
select ARCH_USES_HIGH_VMA_FLAGS select ARCH_USES_HIGH_VMA_FLAGS
select ARCH_USES_PG_ARCH_X select ARCH_USES_PG_ARCH_2
select ARCH_USES_PG_ARCH_3
help help
Memory Tagging (part of the ARMv8.5 Extensions) provides Memory Tagging (part of the ARMv8.5 Extensions) provides
architectural support for run-time, always-on detection of architectural support for run-time, always-on detection of

View File

@ -9,6 +9,7 @@ syscall-y += unistd_compat_32.h
generic-y += early_ioremap.h generic-y += early_ioremap.h
generic-y += mcs_spinlock.h generic-y += mcs_spinlock.h
generic-y += mmzone.h
generic-y += qrwlock.h generic-y += qrwlock.h
generic-y += qspinlock.h generic-y += qspinlock.h
generic-y += parport.h generic-y += parport.h

View File

@ -1,13 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_MMZONE_H
#define __ASM_MMZONE_H
#ifdef CONFIG_NUMA
#include <asm/numa.h>
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[(nid)])
#endif /* CONFIG_NUMA */
#endif /* __ASM_MMZONE_H */

View File

@ -407,6 +407,7 @@ static inline void __sync_cache_and_tags(pte_t pte, unsigned int nr_pages)
/* /*
* Select all bits except the pfn * Select all bits except the pfn
*/ */
#define pte_pgprot pte_pgprot
static inline pgprot_t pte_pgprot(pte_t pte) static inline pgprot_t pte_pgprot(pte_t pte)
{ {
unsigned long pfn = pte_pfn(pte); unsigned long pfn = pte_pfn(pte);
@ -600,6 +601,14 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP))); return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP)));
} }
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
#define pmd_special(pte) (!!((pmd_val(pte) & PTE_SPECIAL)))
static inline pmd_t pmd_mkspecial(pmd_t pmd)
{
return set_pmd_bit(pmd, __pgprot(PTE_SPECIAL));
}
#endif
#define __pmd_to_phys(pmd) __pte_to_phys(pmd_pte(pmd)) #define __pmd_to_phys(pmd) __pte_to_phys(pmd_pte(pmd))
#define __phys_to_pmd_val(phys) __phys_to_pte_val(phys) #define __phys_to_pmd_val(phys) __phys_to_pte_val(phys)
#define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT) #define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT)
@ -617,6 +626,27 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
#define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
#define pfn_pud(pfn,prot) __pud(__phys_to_pud_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define pfn_pud(pfn,prot) __pud(__phys_to_pud_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP
#define pud_special(pte) pte_special(pud_pte(pud))
#define pud_mkspecial(pte) pte_pud(pte_mkspecial(pud_pte(pud)))
#endif
#define pmd_pgprot pmd_pgprot
static inline pgprot_t pmd_pgprot(pmd_t pmd)
{
unsigned long pfn = pmd_pfn(pmd);
return __pgprot(pmd_val(pfn_pmd(pfn, __pgprot(0))) ^ pmd_val(pmd));
}
#define pud_pgprot pud_pgprot
static inline pgprot_t pud_pgprot(pud_t pud)
{
unsigned long pfn = pud_pfn(pud);
return __pgprot(pud_val(pfn_pud(pfn, __pgprot(0))) ^ pud_val(pud));
}
static inline void __set_pte_at(struct mm_struct *mm, static inline void __set_pte_at(struct mm_struct *mm,
unsigned long __always_unused addr, unsigned long __always_unused addr,
pte_t *ptep, pte_t pte, unsigned int nr) pte_t *ptep, pte_t pte, unsigned int nr)

View File

@ -5,6 +5,7 @@
#include <linux/cpumask.h> #include <linux/cpumask.h>
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
#include <asm/numa.h>
struct pci_bus; struct pci_bus;
int pcibus_to_node(struct pci_bus *bus); int pcibus_to_node(struct pci_bus *bus);

View File

@ -62,7 +62,6 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
*/ */
num_mmus = atomic_read(&kvm->online_vcpus) * S2_MMU_PER_VCPU; num_mmus = atomic_read(&kvm->online_vcpus) * S2_MMU_PER_VCPU;
tmp = kvrealloc(kvm->arch.nested_mmus, tmp = kvrealloc(kvm->arch.nested_mmus,
size_mul(sizeof(*kvm->arch.nested_mmus), kvm->arch.nested_mmus_size),
size_mul(sizeof(*kvm->arch.nested_mmus), num_mmus), size_mul(sizeof(*kvm->arch.nested_mmus), num_mmus),
GFP_KERNEL_ACCOUNT | __GFP_ZERO); GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!tmp) if (!tmp)

View File

@ -23,7 +23,8 @@
*/ */
unsigned long unsigned long
arch_get_unmapped_area(struct file *filp, unsigned long addr, arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff,
unsigned long flags, vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct *vma; struct vm_area_struct *vma;

View File

@ -45,9 +45,16 @@ arch_initcall(vdso_init);
int arch_setup_additional_pages(struct linux_binprm *bprm, int arch_setup_additional_pages(struct linux_binprm *bprm,
int uses_interp) int uses_interp)
{ {
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
unsigned long vdso_base, vdso_len; unsigned long vdso_base, vdso_len;
int ret; int ret;
static struct vm_special_mapping vdso_mapping = {
.name = "[vdso]",
};
static struct vm_special_mapping vvar_mapping = {
.name = "[vvar]",
};
vdso_len = (vdso_pages + 1) << PAGE_SHIFT; vdso_len = (vdso_pages + 1) << PAGE_SHIFT;
@ -65,22 +72,29 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
*/ */
mm->context.vdso = (void *)vdso_base; mm->context.vdso = (void *)vdso_base;
ret = vdso_mapping.pages = vdso_pagelist;
install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT, vma =
_install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
(VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC), (VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC),
vdso_pagelist); &vdso_mapping);
if (unlikely(ret)) { if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
mm->context.vdso = NULL; mm->context.vdso = NULL;
goto end; goto end;
} }
vdso_base += (vdso_pages << PAGE_SHIFT); vdso_base += (vdso_pages << PAGE_SHIFT);
ret = install_special_mapping(mm, vdso_base, PAGE_SIZE, vvar_mapping.pages = &vdso_pagelist[vdso_pages];
(VM_READ | VM_MAYREAD), &vdso_pagelist[vdso_pages]); vma = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
(VM_READ | VM_MAYREAD), &vvar_mapping);
if (unlikely(ret)) if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
mm->context.vdso = NULL; mm->context.vdso = NULL;
goto end;
}
ret = 0;
end: end:
mmap_write_unlock(mm); mmap_write_unlock(mm);
return ret; return ret;

View File

@ -51,7 +51,11 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{ {
int ret; int ret;
unsigned long vdso_base; unsigned long vdso_base;
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
static struct vm_special_mapping vdso_mapping = {
name = "[vdso]",
};
if (mmap_write_lock_killable(mm)) if (mmap_write_lock_killable(mm))
return -EINTR; return -EINTR;
@ -66,16 +70,18 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
} }
/* MAYWRITE to allow gdb to COW and set breakpoints. */ /* MAYWRITE to allow gdb to COW and set breakpoints. */
ret = install_special_mapping(mm, vdso_base, PAGE_SIZE, vdso_mapping.pages = &vdso_page;
vma = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
VM_READ|VM_EXEC| VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
&vdso_page); &vdso_mapping);
if (ret) ret = PTR_ERR(vma);
if (IS_ERR(vma))
goto up_fail; goto up_fail;
mm->context.vdso = (void *)vdso_base; mm->context.vdso = (void *)vdso_base;
ret = 0;
up_fail: up_fail:
mmap_write_unlock(mm); mmap_write_unlock(mm);
return ret; return ret;

View File

@ -96,7 +96,6 @@ CONFIG_ZPOOL=y
CONFIG_ZSWAP=y CONFIG_ZSWAP=y
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD=y CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD=y
CONFIG_ZBUD=y CONFIG_ZBUD=y
CONFIG_Z3FOLD=y
CONFIG_ZSMALLOC=m CONFIG_ZSMALLOC=m
# CONFIG_COMPAT_BRK is not set # CONFIG_COMPAT_BRK is not set
CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG=y

View File

@ -8,5 +8,6 @@ generic-y += early_ioremap.h
generic-y += qrwlock.h generic-y += qrwlock.h
generic-y += user.h generic-y += user.h
generic-y += ioctl.h generic-y += ioctl.h
generic-y += mmzone.h
generic-y += statfs.h generic-y += statfs.h
generic-y += param.h generic-y += param.h

View File

@ -1,16 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Author: Huacai Chen (chenhuacai@loongson.cn)
* Copyright (C) 2020-2022 Loongson Technology Corporation Limited
*/
#ifndef _ASM_MMZONE_H_
#define _ASM_MMZONE_H_
#include <asm/page.h>
#include <asm/numa.h>
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[(nid)])
#endif /* _ASM_MMZONE_H_ */

View File

@ -8,6 +8,7 @@
#include <linux/smp.h> #include <linux/smp.h>
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
#include <asm/numa.h>
extern cpumask_t cpus_on_node[]; extern cpumask_t cpus_on_node[];

View File

@ -27,10 +27,7 @@
#include <asm/time.h> #include <asm/time.h>
int numa_off; int numa_off;
struct pglist_data *node_data[MAX_NUMNODES];
unsigned char node_distances[MAX_NUMNODES][MAX_NUMNODES]; unsigned char node_distances[MAX_NUMNODES][MAX_NUMNODES];
EXPORT_SYMBOL(node_data);
EXPORT_SYMBOL(node_distances); EXPORT_SYMBOL(node_distances);
static struct numa_meminfo numa_meminfo; static struct numa_meminfo numa_meminfo;
@ -190,24 +187,6 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
return numa_add_memblk_to(nid, start, end, &numa_meminfo); return numa_add_memblk_to(nid, start, end, &numa_meminfo);
} }
static void __init alloc_node_data(int nid)
{
void *nd;
unsigned long nd_pa;
size_t nd_sz = roundup(sizeof(pg_data_t), PAGE_SIZE);
nd_pa = memblock_phys_alloc_try_nid(nd_sz, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
pr_err("Cannot find %zu Byte for node_data (initial node: %d)\n", nd_sz, nid);
return;
}
nd = __va(nd_pa);
node_data[nid] = nd;
memset(nd, 0, sizeof(pg_data_t));
}
static void __init node_mem_init(unsigned int node) static void __init node_mem_init(unsigned int node)
{ {
unsigned long start_pfn, end_pfn; unsigned long start_pfn, end_pfn;

View File

@ -89,7 +89,8 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
} }
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0, unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff, unsigned long flags,
vm_flags_t vm_flags)
{ {
return arch_get_unmapped_area_common(filp, return arch_get_unmapped_area_common(filp,
addr0, len, pgoff, flags, UP); addr0, len, pgoff, flags, UP);
@ -101,7 +102,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0,
*/ */
unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long arch_get_unmapped_area_topdown(struct file *filp,
unsigned long addr0, unsigned long len, unsigned long pgoff, unsigned long addr0, unsigned long len, unsigned long pgoff,
unsigned long flags) unsigned long flags, vm_flags_t vm_flags)
{ {
return arch_get_unmapped_area_common(filp, return arch_get_unmapped_area_common(filp,
addr0, len, pgoff, flags, DOWN); addr0, len, pgoff, flags, DOWN);

View File

@ -502,7 +502,6 @@ config MACH_LOONGSON64
select USE_OF select USE_OF
select BUILTIN_DTB select BUILTIN_DTB
select PCI_HOST_GENERIC select PCI_HOST_GENERIC
select HAVE_ARCH_NODEDATA_EXTENSION if NUMA
help help
This enables the support of Loongson-2/3 family of machines. This enables the support of Loongson-2/3 family of machines.
@ -735,7 +734,6 @@ config SGI_IP27
select WAR_R10000_LLSC select WAR_R10000_LLSC
select MIPS_L1_CACHE_SHIFT_7 select MIPS_L1_CACHE_SHIFT_7
select NUMA select NUMA
select HAVE_ARCH_NODEDATA_EXTENSION
help help
This are the SGI Origin 200, Origin 2000 and Onyx 2 Graphics This are the SGI Origin 200, Origin 2000 and Onyx 2 Graphics
workstations. To compile a Linux kernel that runs on these, say Y workstations. To compile a Linux kernel that runs on these, say Y
@ -2613,9 +2611,6 @@ config NUMA
config SYS_SUPPORTS_NUMA config SYS_SUPPORTS_NUMA
bool bool
config HAVE_ARCH_NODEDATA_EXTENSION
bool
config RELOCATABLE config RELOCATABLE
bool "Relocatable kernel" bool "Relocatable kernel"
depends on SYS_SUPPORTS_RELOCATABLE depends on SYS_SUPPORTS_RELOCATABLE

View File

@ -22,7 +22,6 @@ struct node_data {
extern struct node_data *__node_data[]; extern struct node_data *__node_data[];
#define NODE_DATA(n) (&__node_data[(n)]->pglist)
#define hub_data(n) (&__node_data[(n)]->hub) #define hub_data(n) (&__node_data[(n)]->hub)
#endif /* _ASM_MACH_MMZONE_H */ #endif /* _ASM_MACH_MMZONE_H */

View File

@ -14,10 +14,6 @@
#define pa_to_nid(addr) (((addr) & 0xf00000000000) >> NODE_ADDRSPACE_SHIFT) #define pa_to_nid(addr) (((addr) & 0xf00000000000) >> NODE_ADDRSPACE_SHIFT)
#define nid_to_addrbase(nid) ((unsigned long)(nid) << NODE_ADDRSPACE_SHIFT) #define nid_to_addrbase(nid) ((unsigned long)(nid) << NODE_ADDRSPACE_SHIFT)
extern struct pglist_data *__node_data[];
#define NODE_DATA(n) (__node_data[n])
extern void __init prom_init_numa_memory(void); extern void __init prom_init_numa_memory(void);
#endif /* _ASM_MACH_MMZONE_H */ #endif /* _ASM_MACH_MMZONE_H */

View File

@ -29,8 +29,6 @@
unsigned char __node_distances[MAX_NUMNODES][MAX_NUMNODES]; unsigned char __node_distances[MAX_NUMNODES][MAX_NUMNODES];
EXPORT_SYMBOL(__node_distances); EXPORT_SYMBOL(__node_distances);
struct pglist_data *__node_data[MAX_NUMNODES];
EXPORT_SYMBOL(__node_data);
cpumask_t __node_cpumask[MAX_NUMNODES]; cpumask_t __node_cpumask[MAX_NUMNODES];
EXPORT_SYMBOL(__node_cpumask); EXPORT_SYMBOL(__node_cpumask);
@ -83,12 +81,8 @@ static void __init init_topology_matrix(void)
static void __init node_mem_init(unsigned int node) static void __init node_mem_init(unsigned int node)
{ {
struct pglist_data *nd;
unsigned long node_addrspace_offset; unsigned long node_addrspace_offset;
unsigned long start_pfn, end_pfn; unsigned long start_pfn, end_pfn;
unsigned long nd_pa;
int tnid;
const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
node_addrspace_offset = nid_to_addrbase(node); node_addrspace_offset = nid_to_addrbase(node);
pr_info("Node%d's addrspace_offset is 0x%lx\n", pr_info("Node%d's addrspace_offset is 0x%lx\n",
@ -98,16 +92,8 @@ static void __init node_mem_init(unsigned int node)
pr_info("Node%d: start_pfn=0x%lx, end_pfn=0x%lx\n", pr_info("Node%d: start_pfn=0x%lx, end_pfn=0x%lx\n",
node, start_pfn, end_pfn); node, start_pfn, end_pfn);
nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, node); alloc_node_data(node);
if (!nd_pa)
panic("Cannot allocate %zu bytes for node %d data\n",
nd_size, node);
nd = __va(nd_pa);
memset(nd, 0, sizeof(struct pglist_data));
tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
if (tnid != node)
pr_info("NODE_DATA(%d) on node %d\n", node, tnid);
__node_data[node] = nd;
NODE_DATA(node)->node_start_pfn = start_pfn; NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn; NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
@ -198,13 +184,3 @@ void __init prom_init_numa_memory(void)
pr_info("CP0_PageGrain: CP0 5.1 (0x%x)\n", read_c0_pagegrain()); pr_info("CP0_PageGrain: CP0 5.1 (0x%x)\n", read_c0_pagegrain());
prom_meminit(); prom_meminit();
} }
pg_data_t * __init arch_alloc_nodedata(int nid)
{
return memblock_alloc(sizeof(pg_data_t), SMP_CACHE_BYTES);
}
void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
{
__node_data[nid] = pgdat;
}

View File

@ -98,7 +98,8 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
} }
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0, unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff, unsigned long flags,
vm_flags_t vm_flags)
{ {
return arch_get_unmapped_area_common(filp, return arch_get_unmapped_area_common(filp,
addr0, len, pgoff, flags, UP); addr0, len, pgoff, flags, UP);
@ -110,7 +111,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr0,
*/ */
unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long arch_get_unmapped_area_topdown(struct file *filp,
unsigned long addr0, unsigned long len, unsigned long pgoff, unsigned long addr0, unsigned long len, unsigned long pgoff,
unsigned long flags) unsigned long flags, vm_flags_t vm_flags)
{ {
return arch_get_unmapped_area_common(filp, return arch_get_unmapped_area_common(filp,
addr0, len, pgoff, flags, DOWN); addr0, len, pgoff, flags, DOWN);

View File

@ -35,7 +35,6 @@
#define PFN_NASIDSHFT (NASID_SHFT - PAGE_SHIFT) #define PFN_NASIDSHFT (NASID_SHFT - PAGE_SHIFT)
struct node_data *__node_data[MAX_NUMNODES]; struct node_data *__node_data[MAX_NUMNODES];
EXPORT_SYMBOL(__node_data); EXPORT_SYMBOL(__node_data);
static u64 gen_region_mask(void) static u64 gen_region_mask(void)
@ -361,6 +360,7 @@ static void __init node_mem_init(nasid_t node)
*/ */
__node_data[node] = __va(slot_freepfn << PAGE_SHIFT); __node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
memset(__node_data[node], 0, PAGE_SIZE); memset(__node_data[node], 0, PAGE_SIZE);
node_data[node] = &__node_data[node]->pglist;
NODE_DATA(node)->node_start_pfn = start_pfn; NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn; NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
@ -423,13 +423,3 @@ void __init mem_init(void)
memblock_free_all(); memblock_free_all();
setup_zero_pages(); /* This comes from node 0 */ setup_zero_pages(); /* This comes from node 0 */
} }
pg_data_t * __init arch_alloc_nodedata(int nid)
{
return memblock_alloc(sizeof(pg_data_t), SMP_CACHE_BYTES);
}
void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
{
__node_data[nid] = (struct node_data *)pgdat;
}

View File

@ -70,11 +70,13 @@ void cpu_node_probe(void)
gda_t *gdap = GDA; gda_t *gdap = GDA;
nodes_clear(node_online_map); nodes_clear(node_online_map);
nodes_clear(node_possible_map);
for (i = 0; i < MAX_NUMNODES; i++) { for (i = 0; i < MAX_NUMNODES; i++) {
nasid_t nasid = gdap->g_nasidtable[i]; nasid_t nasid = gdap->g_nasidtable[i];
if (nasid == INVALID_NASID) if (nasid == INVALID_NASID)
break; break;
node_set_online(nasid); node_set_online(nasid);
node_set(nasid, node_possible_map);
highest = node_scan_cpus(nasid, highest); highest = node_scan_cpus(nasid, highest);
} }

View File

@ -82,6 +82,10 @@ void __init mmu_init(void)
pgd_t swapper_pg_dir[PTRS_PER_PGD] __aligned(PAGE_SIZE); pgd_t swapper_pg_dir[PTRS_PER_PGD] __aligned(PAGE_SIZE);
pte_t invalid_pte_table[PTRS_PER_PTE] __aligned(PAGE_SIZE); pte_t invalid_pte_table[PTRS_PER_PTE] __aligned(PAGE_SIZE);
static struct page *kuser_page[1]; static struct page *kuser_page[1];
static struct vm_special_mapping vdso_mapping = {
.name = "[vdso]",
.pages = kuser_page,
};
static int alloc_kuser_page(void) static int alloc_kuser_page(void)
{ {
@ -106,18 +110,18 @@ arch_initcall(alloc_kuser_page);
int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
int ret; struct vm_area_struct *vma;
mmap_write_lock(mm); mmap_write_lock(mm);
/* Map kuser helpers to user space address */ /* Map kuser helpers to user space address */
ret = install_special_mapping(mm, KUSER_BASE, KUSER_SIZE, vma = _install_special_mapping(mm, KUSER_BASE, KUSER_SIZE,
VM_READ | VM_EXEC | VM_MAYREAD | VM_READ | VM_EXEC | VM_MAYREAD |
VM_MAYEXEC, kuser_page); VM_MAYEXEC, &vdso_mapping);
mmap_write_unlock(mm); mmap_write_unlock(mm);
return ret; return IS_ERR(vma) ? PTR_ERR(vma) : 0;
} }
const char *arch_vma_name(struct vm_area_struct *vma) const char *arch_vma_name(struct vm_area_struct *vma)

View File

@ -167,7 +167,8 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
} }
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff, unsigned long flags,
vm_flags_t vm_flags)
{ {
return arch_get_unmapped_area_common(filp, return arch_get_unmapped_area_common(filp,
addr, len, pgoff, flags, UP); addr, len, pgoff, flags, UP);
@ -175,7 +176,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long arch_get_unmapped_area_topdown(struct file *filp,
unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long addr, unsigned long len, unsigned long pgoff,
unsigned long flags) unsigned long flags, vm_flags_t vm_flags)
{ {
return arch_get_unmapped_area_common(filp, return arch_get_unmapped_area_common(filp,
addr, len, pgoff, flags, DOWN); addr, len, pgoff, flags, DOWN);

View File

@ -40,7 +40,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
addr = ALIGN(addr, huge_page_size(h)); addr = ALIGN(addr, huge_page_size(h));
/* we need to make sure the colouring is OK */ /* we need to make sure the colouring is OK */
return arch_get_unmapped_area(file, addr, len, pgoff, flags); return arch_get_unmapped_area(file, addr, len, pgoff, flags, 0);
} }

View File

@ -81,7 +81,6 @@ CONFIG_MODULE_SIG_SHA512=y
CONFIG_PARTITION_ADVANCED=y CONFIG_PARTITION_ADVANCED=y
CONFIG_BINFMT_MISC=m CONFIG_BINFMT_MISC=m
CONFIG_ZSWAP=y CONFIG_ZSWAP=y
CONFIG_Z3FOLD=y
CONFIG_ZSMALLOC=y CONFIG_ZSMALLOC=y
# CONFIG_SLAB_MERGE_DEFAULT is not set # CONFIG_SLAB_MERGE_DEFAULT is not set
CONFIG_SLAB_FREELIST_RANDOM=y CONFIG_SLAB_FREELIST_RANDOM=y

View File

@ -1098,6 +1098,7 @@ extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot); extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot); extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot); extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
extern pud_t pud_modify(pud_t pud, pgprot_t newprot);
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd); pmd_t *pmdp, pmd_t pmd);
extern void set_pud_at(struct mm_struct *mm, unsigned long addr, extern void set_pud_at(struct mm_struct *mm, unsigned long addr,
@ -1358,6 +1359,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
#define __HAVE_ARCH_PMDP_INVALIDATE #define __HAVE_ARCH_PMDP_INVALIDATE
extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp); pmd_t *pmdp);
extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
pud_t *pudp);
#define pmd_move_must_withdraw pmd_move_must_withdraw #define pmd_move_must_withdraw pmd_move_must_withdraw
struct spinlock; struct spinlock;

View File

@ -257,15 +257,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
extern void arch_exit_mmap(struct mm_struct *mm); extern void arch_exit_mmap(struct mm_struct *mm);
static inline void arch_unmap(struct mm_struct *mm,
unsigned long start, unsigned long end)
{
unsigned long vdso_base = (unsigned long)mm->context.vdso;
if (start <= vdso_base && vdso_base < end)
mm->context.vdso = NULL;
}
#ifdef CONFIG_PPC_MEM_KEYS #ifdef CONFIG_PPC_MEM_KEYS
bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write, bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
bool execute, bool foreign); bool execute, bool foreign);

View File

@ -20,12 +20,6 @@
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
extern struct pglist_data *node_data[];
/*
* Return a pointer to the node data for node n.
*/
#define NODE_DATA(nid) (node_data[nid])
/* /*
* Following are specific to this numa platform. * Following are specific to this numa platform.
*/ */

View File

@ -65,6 +65,7 @@ static inline unsigned long pte_pfn(pte_t pte)
/* /*
* Select all bits except the pfn * Select all bits except the pfn
*/ */
#define pte_pgprot pte_pgprot
static inline pgprot_t pte_pgprot(pte_t pte) static inline pgprot_t pte_pgprot(pte_t pte)
{ {
unsigned long pte_flags; unsigned long pte_flags;

View File

@ -81,6 +81,21 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start); return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
} }
static void vdso_close(const struct vm_special_mapping *sm, struct vm_area_struct *vma)
{
struct mm_struct *mm = vma->vm_mm;
/*
* close() is called for munmap() but also for mremap(). In the mremap()
* case the vdso pointer has already been updated by the mremap() hook
* above, so it must not be set to NULL here.
*/
if (vma->vm_start != (unsigned long)mm->context.vdso)
return;
mm->context.vdso = NULL;
}
static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
struct vm_area_struct *vma, struct vm_fault *vmf); struct vm_area_struct *vma, struct vm_fault *vmf);
@ -92,11 +107,13 @@ static struct vm_special_mapping vvar_spec __ro_after_init = {
static struct vm_special_mapping vdso32_spec __ro_after_init = { static struct vm_special_mapping vdso32_spec __ro_after_init = {
.name = "[vdso]", .name = "[vdso]",
.mremap = vdso32_mremap, .mremap = vdso32_mremap,
.close = vdso_close,
}; };
static struct vm_special_mapping vdso64_spec __ro_after_init = { static struct vm_special_mapping vdso64_spec __ro_after_init = {
.name = "[vdso]", .name = "[vdso]",
.mremap = vdso64_mremap, .mremap = vdso64_mremap,
.close = vdso_close,
}; };
#ifdef CONFIG_TIME_NS #ifdef CONFIG_TIME_NS
@ -197,13 +214,6 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
/* Add required alignment. */ /* Add required alignment. */
vdso_base = ALIGN(vdso_base, VDSO_ALIGNMENT); vdso_base = ALIGN(vdso_base, VDSO_ALIGNMENT);
/*
* Put vDSO base into mm struct. We need to do this before calling
* install_special_mapping or the perf counter mmap tracking code
* will fail to recognise it as a vDSO.
*/
mm->context.vdso = (void __user *)vdso_base + vvar_size;
vma = _install_special_mapping(mm, vdso_base, vvar_size, vma = _install_special_mapping(mm, vdso_base, vvar_size,
VM_READ | VM_MAYREAD | VM_IO | VM_READ | VM_MAYREAD | VM_IO |
VM_DONTDUMP | VM_PFNMAP, &vvar_spec); VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
@ -223,10 +233,15 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size, vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size,
VM_READ | VM_EXEC | VM_MAYREAD | VM_READ | VM_EXEC | VM_MAYREAD |
VM_MAYWRITE | VM_MAYEXEC, vdso_spec); VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
if (IS_ERR(vma)) if (IS_ERR(vma)) {
do_munmap(mm, vdso_base, vvar_size, NULL); do_munmap(mm, vdso_base, vvar_size, NULL);
return PTR_ERR(vma);
}
return PTR_ERR_OR_ZERO(vma); // Now that the mappings are in place, set the mm VDSO pointer
mm->context.vdso = (void __user *)vdso_base + vvar_size;
return 0;
} }
int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
@ -240,8 +255,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
return -EINTR; return -EINTR;
rc = __arch_setup_additional_pages(bprm, uses_interp); rc = __arch_setup_additional_pages(bprm, uses_interp);
if (rc)
mm->context.vdso = NULL;
mmap_write_unlock(mm); mmap_write_unlock(mm);
return rc; return rc;

View File

@ -176,6 +176,17 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
return __pmd(old_pmd); return __pmd(old_pmd);
} }
pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
pud_t *pudp)
{
unsigned long old_pud;
VM_WARN_ON_ONCE(!pud_present(*pudp));
old_pud = pud_hugepage_update(vma->vm_mm, address, pudp, _PAGE_PRESENT, _PAGE_INVALID);
flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
return __pud(old_pud);
}
pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp, int full) unsigned long addr, pmd_t *pmdp, int full)
{ {
@ -259,6 +270,15 @@ pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
pmdv &= _HPAGE_CHG_MASK; pmdv &= _HPAGE_CHG_MASK;
return pmd_set_protbits(__pmd(pmdv), newprot); return pmd_set_protbits(__pmd(pmdv), newprot);
} }
pud_t pud_modify(pud_t pud, pgprot_t newprot)
{
unsigned long pudv;
pudv = pud_val(pud);
pudv &= _HPAGE_CHG_MASK;
return pud_set_protbits(__pud(pudv), newprot);
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/* For use by kexec, called with MMU off */ /* For use by kexec, called with MMU off */

View File

@ -637,10 +637,11 @@ unsigned long arch_get_unmapped_area(struct file *filp,
unsigned long addr, unsigned long addr,
unsigned long len, unsigned long len,
unsigned long pgoff, unsigned long pgoff,
unsigned long flags) unsigned long flags,
vm_flags_t vm_flags)
{ {
if (radix_enabled()) if (radix_enabled())
return generic_get_unmapped_area(filp, addr, len, pgoff, flags); return generic_get_unmapped_area(filp, addr, len, pgoff, flags, vm_flags);
return slice_get_unmapped_area(addr, len, flags, return slice_get_unmapped_area(addr, len, flags,
mm_ctx_user_psize(&current->mm->context), 0); mm_ctx_user_psize(&current->mm->context), 0);
@ -650,10 +651,11 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp,
const unsigned long addr0, const unsigned long addr0,
const unsigned long len, const unsigned long len,
const unsigned long pgoff, const unsigned long pgoff,
const unsigned long flags) const unsigned long flags,
vm_flags_t vm_flags)
{ {
if (radix_enabled()) if (radix_enabled())
return generic_get_unmapped_area_topdown(filp, addr0, len, pgoff, flags); return generic_get_unmapped_area_topdown(filp, addr0, len, pgoff, flags, vm_flags);
return slice_get_unmapped_area(addr0, len, flags, return slice_get_unmapped_area(addr0, len, flags,
mm_ctx_user_psize(&current->mm->context), 1); mm_ctx_user_psize(&current->mm->context), 1);

View File

@ -43,11 +43,9 @@ static char *cmdline __initdata;
int numa_cpu_lookup_table[NR_CPUS]; int numa_cpu_lookup_table[NR_CPUS];
cpumask_var_t node_to_cpumask_map[MAX_NUMNODES]; cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
struct pglist_data *node_data[MAX_NUMNODES];
EXPORT_SYMBOL(numa_cpu_lookup_table); EXPORT_SYMBOL(numa_cpu_lookup_table);
EXPORT_SYMBOL(node_to_cpumask_map); EXPORT_SYMBOL(node_to_cpumask_map);
EXPORT_SYMBOL(node_data);
static int primary_domain_index; static int primary_domain_index;
static int n_mem_addr_cells, n_mem_size_cells; static int n_mem_addr_cells, n_mem_size_cells;
@ -1095,27 +1093,9 @@ void __init dump_numa_cpu_topology(void)
static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
{ {
u64 spanned_pages = end_pfn - start_pfn; u64 spanned_pages = end_pfn - start_pfn;
const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
u64 nd_pa;
void *nd;
int tnid;
nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid); alloc_node_data(nid);
if (!nd_pa)
panic("Cannot allocate %zu bytes for node %d data\n",
nd_size, nid);
nd = __va(nd_pa);
/* report and initialize */
pr_info(" NODE_DATA [mem %#010Lx-%#010Lx]\n",
nd_pa, nd_pa + nd_size - 1);
tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
if (tnid != nid)
pr_info(" NODE_DATA(%d) on node %d\n", nid, tnid);
node_data[nid] = nd;
memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
NODE_DATA(nid)->node_id = nid; NODE_DATA(nid)->node_id = nid;
NODE_DATA(nid)->node_start_pfn = start_pfn; NODE_DATA(nid)->node_start_pfn = start_pfn;
NODE_DATA(nid)->node_spanned_pages = spanned_pages; NODE_DATA(nid)->node_spanned_pages = spanned_pages;

View File

@ -136,10 +136,10 @@ void pte_fragment_free(unsigned long *table, int kernel)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE #ifdef CONFIG_TRANSPARENT_HUGEPAGE
void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
{ {
struct page *page; struct folio *folio;
page = virt_to_page(pgtable); folio = virt_to_folio(pgtable);
SetPageActive(page); folio_set_active(folio);
pte_fragment_free((unsigned long *)pgtable, 0); pte_fragment_free((unsigned long *)pgtable, 0);
} }
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

View File

@ -297,6 +297,12 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
} }
#if defined(CONFIG_PPC_8xx) #if defined(CONFIG_PPC_8xx)
#if defined(CONFIG_SPLIT_PTE_PTLOCKS) || defined(CONFIG_SPLIT_PMD_PTLOCKS)
/* We need the same lock to protect the PMD table and the two PTE tables. */
#error "8M hugetlb folios are incompatible with split page table locks"
#endif
static void __set_huge_pte_at(pmd_t *pmd, pte_t *ptep, pte_basic_t val) static void __set_huge_pte_at(pmd_t *pmd, pte_t *ptep, pte_basic_t val)
{ {
pte_basic_t *entry = (pte_basic_t *)ptep; pte_basic_t *entry = (pte_basic_t *)ptep;

View File

@ -156,10 +156,7 @@ static int vpd_blob_extend(struct vpd_blob *blob, const char *data, size_t len)
const char *old_ptr = blob->data; const char *old_ptr = blob->data;
char *new_ptr; char *new_ptr;
new_ptr = old_ptr ? new_ptr = kvrealloc(old_ptr, new_len, GFP_KERNEL_ACCOUNT);
kvrealloc(old_ptr, old_len, new_len, GFP_KERNEL_ACCOUNT) :
kvmalloc(len, GFP_KERNEL_ACCOUNT);
if (!new_ptr) if (!new_ptr)
return -ENOMEM; return -ENOMEM;

View File

@ -5,6 +5,7 @@ syscall-y += syscall_table_64.h
generic-y += early_ioremap.h generic-y += early_ioremap.h
generic-y += flat.h generic-y += flat.h
generic-y += kvm_para.h generic-y += kvm_para.h
generic-y += mmzone.h
generic-y += parport.h generic-y += parport.h
generic-y += spinlock.h generic-y += spinlock.h
generic-y += spinlock_types.h generic-y += spinlock_types.h

View File

@ -1,13 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_MMZONE_H
#define __ASM_MMZONE_H
#ifdef CONFIG_NUMA
#include <asm/numa.h>
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[(nid)])
#endif /* CONFIG_NUMA */
#endif /* __ASM_MMZONE_H */

View File

@ -4,6 +4,10 @@
#include <linux/arch_topology.h> #include <linux/arch_topology.h>
#ifdef CONFIG_NUMA
#include <asm/numa.h>
#endif
/* Replace task scheduler's default frequency-invariant accounting */ /* Replace task scheduler's default frequency-invariant accounting */
#define arch_scale_freq_tick topology_scale_freq_tick #define arch_scale_freq_tick topology_scale_freq_tick
#define arch_set_freq_scale topology_set_freq_scale #define arch_set_freq_scale topology_set_freq_scale

View File

@ -7,3 +7,4 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h generic-y += asm-offsets.h
generic-y += kvm_types.h generic-y += kvm_types.h
generic-y += mcs_spinlock.h generic-y += mcs_spinlock.h
generic-y += mmzone.h

View File

@ -1,17 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* NUMA support for s390
*
* Copyright IBM Corp. 2015
*/
#ifndef _ASM_S390_MMZONE_H
#define _ASM_S390_MMZONE_H
#ifdef CONFIG_NUMA
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[nid])
#endif /* CONFIG_NUMA */
#endif /* _ASM_S390_MMZONE_H */

View File

@ -176,8 +176,6 @@ static inline int devmem_is_allowed(unsigned long pfn)
int arch_make_folio_accessible(struct folio *folio); int arch_make_folio_accessible(struct folio *folio);
#define HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE #define HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
int arch_make_page_accessible(struct page *page);
#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
struct vm_layout { struct vm_layout {
unsigned long kaslr_offset; unsigned long kaslr_offset;

View File

@ -955,6 +955,7 @@ static inline int pte_unused(pte_t pte)
* young/old accounting is not supported, i.e _PAGE_PROTECT and _PAGE_INVALID * young/old accounting is not supported, i.e _PAGE_PROTECT and _PAGE_INVALID
* must not be set. * must not be set.
*/ */
#define pte_pgprot pte_pgprot
static inline pgprot_t pte_pgprot(pte_t pte) static inline pgprot_t pte_pgprot(pte_t pte)
{ {
unsigned long pte_flags = pte_val(pte) & _PAGE_CHG_MASK; unsigned long pte_flags = pte_val(pte) & _PAGE_CHG_MASK;

View File

@ -14,9 +14,6 @@
#include <linux/node.h> #include <linux/node.h>
#include <asm/numa.h> #include <asm/numa.h>
struct pglist_data *node_data[MAX_NUMNODES];
EXPORT_SYMBOL(node_data);
void __init numa_setup(void) void __init numa_setup(void)
{ {
int nid; int nid;

View File

@ -14,6 +14,7 @@
#include <linux/memblock.h> #include <linux/memblock.h>
#include <linux/pagemap.h> #include <linux/pagemap.h>
#include <linux/swap.h> #include <linux/swap.h>
#include <linux/pagewalk.h>
#include <asm/facility.h> #include <asm/facility.h>
#include <asm/sections.h> #include <asm/sections.h>
#include <asm/uv.h> #include <asm/uv.h>
@ -462,9 +463,9 @@ EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr) int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
{ {
struct vm_area_struct *vma; struct vm_area_struct *vma;
struct folio_walk fw;
unsigned long uaddr; unsigned long uaddr;
struct folio *folio; struct folio *folio;
struct page *page;
int rc; int rc;
rc = -EFAULT; rc = -EFAULT;
@ -483,11 +484,15 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
goto out; goto out;
rc = 0; rc = 0;
/* we take an extra reference here */ folio = folio_walk_start(&fw, vma, uaddr, 0);
page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_GET); if (!folio)
if (IS_ERR_OR_NULL(page))
goto out; goto out;
folio = page_folio(page); /*
* See gmap_make_secure(): large folios cannot be secure. Small
* folio implies FW_LEVEL_PTE.
*/
if (folio_test_large(folio) || !pte_write(fw.pte))
goto out_walk_end;
rc = uv_destroy_folio(folio); rc = uv_destroy_folio(folio);
/* /*
* Fault handlers can race; it is possible that two CPUs will fault * Fault handlers can race; it is possible that two CPUs will fault
@ -500,7 +505,8 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
*/ */
if (rc) if (rc)
rc = uv_convert_from_secure_folio(folio); rc = uv_convert_from_secure_folio(folio);
folio_put(folio); out_walk_end:
folio_walk_end(&fw, vma);
out: out:
mmap_read_unlock(gmap->mm); mmap_read_unlock(gmap->mm);
return rc; return rc;
@ -548,11 +554,6 @@ int arch_make_folio_accessible(struct folio *folio)
} }
EXPORT_SYMBOL_GPL(arch_make_folio_accessible); EXPORT_SYMBOL_GPL(arch_make_folio_accessible);
int arch_make_page_accessible(struct page *page)
{
return arch_make_folio_accessible(page_folio(page));
}
EXPORT_SYMBOL_GPL(arch_make_page_accessible);
static ssize_t uv_query_facilities(struct kobject *kobj, static ssize_t uv_query_facilities(struct kobject *kobj,
struct kobj_attribute *attr, char *buf) struct kobj_attribute *attr, char *buf)
{ {

View File

@ -34,6 +34,7 @@
#include <linux/uaccess.h> #include <linux/uaccess.h>
#include <linux/hugetlb.h> #include <linux/hugetlb.h>
#include <linux/kfence.h> #include <linux/kfence.h>
#include <linux/pagewalk.h>
#include <asm/asm-extable.h> #include <asm/asm-extable.h>
#include <asm/asm-offsets.h> #include <asm/asm-offsets.h>
#include <asm/ptrace.h> #include <asm/ptrace.h>
@ -492,9 +493,9 @@ void do_secure_storage_access(struct pt_regs *regs)
union teid teid = { .val = regs->int_parm_long }; union teid teid = { .val = regs->int_parm_long };
unsigned long addr = get_fault_address(regs); unsigned long addr = get_fault_address(regs);
struct vm_area_struct *vma; struct vm_area_struct *vma;
struct folio_walk fw;
struct mm_struct *mm; struct mm_struct *mm;
struct folio *folio; struct folio *folio;
struct page *page;
struct gmap *gmap; struct gmap *gmap;
int rc; int rc;
@ -536,15 +537,18 @@ void do_secure_storage_access(struct pt_regs *regs)
vma = find_vma(mm, addr); vma = find_vma(mm, addr);
if (!vma) if (!vma)
return handle_fault_error(regs, SEGV_MAPERR); return handle_fault_error(regs, SEGV_MAPERR);
page = follow_page(vma, addr, FOLL_WRITE | FOLL_GET); folio = folio_walk_start(&fw, vma, addr, 0);
if (IS_ERR_OR_NULL(page)) { if (!folio) {
mmap_read_unlock(mm); mmap_read_unlock(mm);
break; break;
} }
folio = page_folio(page); /* arch_make_folio_accessible() needs a raised refcount. */
if (arch_make_folio_accessible(folio)) folio_get(folio);
send_sig(SIGSEGV, current, 0); rc = arch_make_folio_accessible(folio);
folio_put(folio); folio_put(folio);
folio_walk_end(&fw, vma);
if (rc)
send_sig(SIGSEGV, current, 0);
mmap_read_unlock(mm); mmap_read_unlock(mm);
break; break;
case KERNEL_FAULT: case KERNEL_FAULT:

View File

@ -82,7 +82,7 @@ static int get_align_mask(struct file *filp, unsigned long flags)
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long len, unsigned long pgoff,
unsigned long flags) unsigned long flags, vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct *vma; struct vm_area_struct *vma;
@ -117,7 +117,7 @@ check_asce_limit:
unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long len, unsigned long pgoff,
unsigned long flags) unsigned long flags, vm_flags_t vm_flags)
{ {
struct vm_area_struct *vma; struct vm_area_struct *vma;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;

View File

@ -118,12 +118,11 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr, SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
const void __user *, user_buffer, size_t, length) const void __user *, user_buffer, size_t, length)
{ {
struct follow_pfnmap_args args = { };
u8 local_buf[64]; u8 local_buf[64];
void __iomem *io_addr; void __iomem *io_addr;
void *buf; void *buf;
struct vm_area_struct *vma; struct vm_area_struct *vma;
pte_t *ptep;
spinlock_t *ptl;
long ret; long ret;
if (!zpci_is_enabled()) if (!zpci_is_enabled())
@ -169,11 +168,13 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
if (!(vma->vm_flags & VM_WRITE)) if (!(vma->vm_flags & VM_WRITE))
goto out_unlock_mmap; goto out_unlock_mmap;
ret = follow_pte(vma, mmio_addr, &ptep, &ptl); args.address = mmio_addr;
args.vma = vma;
ret = follow_pfnmap_start(&args);
if (ret) if (ret)
goto out_unlock_mmap; goto out_unlock_mmap;
io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) | io_addr = (void __iomem *)((args.pfn << PAGE_SHIFT) |
(mmio_addr & ~PAGE_MASK)); (mmio_addr & ~PAGE_MASK));
if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
@ -181,7 +182,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
ret = zpci_memcpy_toio(io_addr, buf, length); ret = zpci_memcpy_toio(io_addr, buf, length);
out_unlock_pt: out_unlock_pt:
pte_unmap_unlock(ptep, ptl); follow_pfnmap_end(&args);
out_unlock_mmap: out_unlock_mmap:
mmap_read_unlock(current->mm); mmap_read_unlock(current->mm);
out_free: out_free:
@ -260,12 +261,11 @@ static inline int __memcpy_fromio_inuser(void __user *dst,
SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr, SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
void __user *, user_buffer, size_t, length) void __user *, user_buffer, size_t, length)
{ {
struct follow_pfnmap_args args = { };
u8 local_buf[64]; u8 local_buf[64];
void __iomem *io_addr; void __iomem *io_addr;
void *buf; void *buf;
struct vm_area_struct *vma; struct vm_area_struct *vma;
pte_t *ptep;
spinlock_t *ptl;
long ret; long ret;
if (!zpci_is_enabled()) if (!zpci_is_enabled())
@ -308,11 +308,13 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
if (!(vma->vm_flags & VM_WRITE)) if (!(vma->vm_flags & VM_WRITE))
goto out_unlock_mmap; goto out_unlock_mmap;
ret = follow_pte(vma, mmio_addr, &ptep, &ptl); args.vma = vma;
args.address = mmio_addr;
ret = follow_pfnmap_start(&args);
if (ret) if (ret)
goto out_unlock_mmap; goto out_unlock_mmap;
io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) | io_addr = (void __iomem *)((args.pfn << PAGE_SHIFT) |
(mmio_addr & ~PAGE_MASK)); (mmio_addr & ~PAGE_MASK));
if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) { if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
@ -322,7 +324,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
ret = zpci_memcpy_fromio(buf, io_addr, length); ret = zpci_memcpy_fromio(buf, io_addr, length);
out_unlock_pt: out_unlock_pt:
pte_unmap_unlock(ptep, ptl); follow_pfnmap_end(&args);
out_unlock_mmap: out_unlock_mmap:
mmap_read_unlock(current->mm); mmap_read_unlock(current->mm);

View File

@ -5,9 +5,6 @@
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
#include <linux/numa.h> #include <linux/numa.h>
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[nid])
static inline int pfn_to_nid(unsigned long pfn) static inline int pfn_to_nid(unsigned long pfn)
{ {
int nid; int nid;

View File

@ -36,6 +36,10 @@ __setup("vdso=", vdso_setup);
*/ */
extern const char vsyscall_trapa_start, vsyscall_trapa_end; extern const char vsyscall_trapa_start, vsyscall_trapa_end;
static struct page *syscall_pages[1]; static struct page *syscall_pages[1];
static struct vm_special_mapping vdso_mapping = {
.name = "[vdso]",
.pages = syscall_pages,
};
int __init vsyscall_init(void) int __init vsyscall_init(void)
{ {
@ -58,6 +62,7 @@ int __init vsyscall_init(void)
int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
unsigned long addr; unsigned long addr;
int ret; int ret;
@ -70,14 +75,17 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
goto up_fail; goto up_fail;
} }
ret = install_special_mapping(mm, addr, PAGE_SIZE, vdso_mapping.pages = syscall_pages;
vma = _install_special_mapping(mm, addr, PAGE_SIZE,
VM_READ | VM_EXEC | VM_READ | VM_EXEC |
VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC, VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
syscall_pages); &vdso_mapping);
if (unlikely(ret)) ret = PTR_ERR(vma);
if (IS_ERR(vma))
goto up_fail; goto up_fail;
current->mm->context.vdso = (void *)addr; current->mm->context.vdso = (void *)addr;
ret = 0;
up_fail: up_fail:
mmap_write_unlock(mm); mmap_write_unlock(mm);

View File

@ -212,12 +212,7 @@ void __init allocate_pgdat(unsigned int nid)
get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
NODE_DATA(nid) = memblock_alloc_try_nid( alloc_node_data(nid);
sizeof(struct pglist_data),
SMP_CACHE_BYTES, MEMBLOCK_LOW_LIMIT,
MEMBLOCK_ALLOC_ACCESSIBLE, nid);
if (!NODE_DATA(nid))
panic("Can't allocate pgdat for node %d\n", nid);
#endif #endif
NODE_DATA(nid)->node_start_pfn = start_pfn; NODE_DATA(nid)->node_start_pfn = start_pfn;

View File

@ -52,7 +52,8 @@ static inline unsigned long COLOUR_ALIGN(unsigned long addr,
} }
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long len, unsigned long pgoff, unsigned long flags,
vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct *vma; struct vm_area_struct *vma;
@ -99,7 +100,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long unsigned long
arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
const unsigned long len, const unsigned long pgoff, const unsigned long len, const unsigned long pgoff,
const unsigned long flags) const unsigned long flags, vm_flags_t vm_flags)
{ {
struct vm_area_struct *vma; struct vm_area_struct *vma;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;

View File

@ -14,9 +14,6 @@
#include <linux/pfn.h> #include <linux/pfn.h>
#include <asm/sections.h> #include <asm/sections.h>
struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
EXPORT_SYMBOL_GPL(node_data);
/* /*
* On SH machines the conventional approach is to stash system RAM * On SH machines the conventional approach is to stash system RAM
* in node 0, and other memory blocks in to node 1 and up, ordered by * in node 0, and other memory blocks in to node 1 and up, ordered by

View File

@ -6,10 +6,6 @@
#include <linux/cpumask.h> #include <linux/cpumask.h>
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[nid])
extern int numa_cpu_lookup_table[]; extern int numa_cpu_lookup_table[];
extern cpumask_t numa_cpumask_lookup_table[]; extern cpumask_t numa_cpumask_lookup_table[];

View File

@ -783,6 +783,7 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
return __pmd(pte_val(pte)); return __pmd(pte_val(pte));
} }
#define pmd_pgprot pmd_pgprot
static inline pgprot_t pmd_pgprot(pmd_t entry) static inline pgprot_t pmd_pgprot(pmd_t entry)
{ {
unsigned long val = pmd_val(entry); unsigned long val = pmd_val(entry);

View File

@ -39,7 +39,7 @@ SYSCALL_DEFINE0(getpagesize)
return PAGE_SIZE; /* Possibly older binaries want 8192 on sun4's? */ return PAGE_SIZE; /* Possibly older binaries want 8192 on sun4's? */
} }
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{ {
struct vm_unmapped_area_info info = {}; struct vm_unmapped_area_info info = {};

View File

@ -87,7 +87,7 @@ static inline unsigned long COLOR_ALIGN(unsigned long addr,
return base + off; return base + off;
} }
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
struct vm_area_struct * vma; struct vm_area_struct * vma;
@ -146,7 +146,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
unsigned long unsigned long
arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
const unsigned long len, const unsigned long pgoff, const unsigned long len, const unsigned long pgoff,
const unsigned long flags) const unsigned long flags, vm_flags_t vm_flags)
{ {
struct vm_area_struct *vma; struct vm_area_struct *vma;
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;

View File

@ -1075,14 +1075,9 @@ static void __init allocate_node_data(int nid)
{ {
struct pglist_data *p; struct pglist_data *p;
unsigned long start_pfn, end_pfn; unsigned long start_pfn, end_pfn;
#ifdef CONFIG_NUMA
NODE_DATA(nid) = memblock_alloc_node(sizeof(struct pglist_data), #ifdef CONFIG_NUMA
SMP_CACHE_BYTES, nid); alloc_node_data(nid);
if (!NODE_DATA(nid)) {
prom_printf("Cannot allocate pglist_data for nid[%d]\n", nid);
prom_halt();
}
NODE_DATA(nid)->node_id = nid; NODE_DATA(nid)->node_id = nid;
#endif #endif
@ -1115,11 +1110,9 @@ static void init_node_masks_nonnuma(void)
} }
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
struct pglist_data *node_data[MAX_NUMNODES];
EXPORT_SYMBOL(numa_cpu_lookup_table); EXPORT_SYMBOL(numa_cpu_lookup_table);
EXPORT_SYMBOL(numa_cpumask_lookup_table); EXPORT_SYMBOL(numa_cpumask_lookup_table);
EXPORT_SYMBOL(node_data);
static int scan_pio_for_cfg_handle(struct mdesc_handle *md, u64 pio, static int scan_pio_for_cfg_handle(struct mdesc_handle *md, u64 pio,
u32 cfg_handle) u32 cfg_handle)

View File

@ -28,6 +28,7 @@ config X86_64
select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_PER_VMA_LOCK select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select HAVE_ARCH_SOFT_DIRTY select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE select NEED_DMA_MAP_STATE
@ -299,6 +300,7 @@ config X86
select NEED_PER_CPU_EMBED_FIRST_CHUNK select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK select NEED_PER_CPU_PAGE_FIRST_CHUNK
select NEED_SG_DMA_LENGTH select NEED_SG_DMA_LENGTH
select NUMA_MEMBLKS if NUMA
select PCI_DOMAINS if PCI select PCI_DOMAINS if PCI
select PCI_LOCKLESS_CONFIG if PCI select PCI_LOCKLESS_CONFIG if PCI
select PERF_EVENTS select PERF_EVENTS
@ -1601,14 +1603,6 @@ config X86_64_ACPI_NUMA
help help
Enable ACPI SRAT based node topology detection. Enable ACPI SRAT based node topology detection.
config NUMA_EMU
bool "NUMA emulation"
depends on NUMA
help
Enable NUMA emulation. A flat machine will be split
into virtual nodes when booted with "numa=fake=N", where N is the
number of nodes. This is only useful for debugging.
config NODES_SHIFT config NODES_SHIFT
int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
range 1 10 range 1 10
@ -1808,6 +1802,7 @@ config X86_PAT
def_bool y def_bool y
prompt "x86 PAT support" if EXPERT prompt "x86 PAT support" if EXPERT
depends on MTRR depends on MTRR
select ARCH_USES_PG_ARCH_2
help help
Use PAT attributes to setup page level cache control. Use PAT attributes to setup page level cache control.
@ -1819,10 +1814,6 @@ config X86_PAT
If unsure, say Y. If unsure, say Y.
config ARCH_USES_PG_UNCACHED
def_bool y
depends on X86_PAT
config X86_UMIP config X86_UMIP
def_bool y def_bool y
prompt "User Mode Instruction Prevention" if EXPERT prompt "User Mode Instruction Prevention" if EXPERT

View File

@ -511,7 +511,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
if (init_unaccepted_memory()) { if (init_unaccepted_memory()) {
debug_putstr("Accepting memory... "); debug_putstr("Accepting memory... ");
accept_memory(__pa(output), __pa(output) + needed_size); accept_memory(__pa(output), needed_size);
} }
entry_offset = decompress_kernel(output, virt_addr, error); entry_offset = decompress_kernel(output, virt_addr, error);

View File

@ -256,6 +256,6 @@ static inline bool init_unaccepted_memory(void) { return false; }
/* Defined in EFI stub */ /* Defined in EFI stub */
extern struct efi_unaccepted_memory *unaccepted_table; extern struct efi_unaccepted_memory *unaccepted_table;
void accept_memory(phys_addr_t start, phys_addr_t end); void accept_memory(phys_addr_t start, unsigned long size);
#endif /* BOOT_COMPRESSED_MISC_H */ #endif /* BOOT_COMPRESSED_MISC_H */

View File

@ -11,3 +11,4 @@ generated-y += xen-hypercalls.h
generic-y += early_ioremap.h generic-y += early_ioremap.h
generic-y += mcs_spinlock.h generic-y += mcs_spinlock.h
generic-y += mmzone.h

View File

@ -238,11 +238,6 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
} }
#endif #endif
static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
unsigned long end)
{
}
/* /*
* We only want to enforce protection keys on the current process * We only want to enforce protection keys on the current process
* because we effectively have no access to PKRU for other * because we effectively have no access to PKRU for other

View File

@ -1,6 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifdef CONFIG_X86_32
# include <asm/mmzone_32.h>
#else
# include <asm/mmzone_64.h>
#endif

View File

@ -1,17 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Written by Pat Gaughen (gone@us.ibm.com) Mar 2002
*
*/
#ifndef _ASM_X86_MMZONE_32_H
#define _ASM_X86_MMZONE_32_H
#include <asm/smp.h>
#ifdef CONFIG_NUMA
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[nid])
#endif /* CONFIG_NUMA */
#endif /* _ASM_X86_MMZONE_32_H */

View File

@ -1,18 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* K8 NUMA support */
/* Copyright 2002,2003 by Andi Kleen, SuSE Labs */
/* 2.5 Version loosely based on the NUMAQ Code by Pat Gaughen. */
#ifndef _ASM_X86_MMZONE_64_H
#define _ASM_X86_MMZONE_64_H
#ifdef CONFIG_NUMA
#include <linux/mmdebug.h>
#include <asm/smp.h>
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[nid])
#endif
#endif /* _ASM_X86_MMZONE_64_H */

View File

@ -10,8 +10,6 @@
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
extern int numa_off; extern int numa_off;
/* /*
@ -25,9 +23,6 @@ extern int numa_off;
extern s16 __apicid_to_node[MAX_LOCAL_APIC]; extern s16 __apicid_to_node[MAX_LOCAL_APIC];
extern nodemask_t numa_nodes_parsed __initdata; extern nodemask_t numa_nodes_parsed __initdata;
extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
extern void __init numa_set_distance(int from, int to, int distance);
static inline void set_apicid_to_node(int apicid, s16 node) static inline void set_apicid_to_node(int apicid, s16 node)
{ {
__apicid_to_node[apicid] = node; __apicid_to_node[apicid] = node;
@ -54,31 +49,20 @@ static inline int numa_cpu_node(int cpu)
extern void numa_set_node(int cpu, int node); extern void numa_set_node(int cpu, int node);
extern void numa_clear_node(int cpu); extern void numa_clear_node(int cpu);
extern void __init init_cpu_to_node(void); extern void __init init_cpu_to_node(void);
extern void numa_add_cpu(int cpu); extern void numa_add_cpu(unsigned int cpu);
extern void numa_remove_cpu(int cpu); extern void numa_remove_cpu(unsigned int cpu);
extern void init_gi_nodes(void); extern void init_gi_nodes(void);
#else /* CONFIG_NUMA */ #else /* CONFIG_NUMA */
static inline void numa_set_node(int cpu, int node) { } static inline void numa_set_node(int cpu, int node) { }
static inline void numa_clear_node(int cpu) { } static inline void numa_clear_node(int cpu) { }
static inline void init_cpu_to_node(void) { } static inline void init_cpu_to_node(void) { }
static inline void numa_add_cpu(int cpu) { } static inline void numa_add_cpu(unsigned int cpu) { }
static inline void numa_remove_cpu(int cpu) { } static inline void numa_remove_cpu(unsigned int cpu) { }
static inline void init_gi_nodes(void) { } static inline void init_gi_nodes(void) { }
#endif /* CONFIG_NUMA */ #endif /* CONFIG_NUMA */
#ifdef CONFIG_DEBUG_PER_CPU_MAPS #ifdef CONFIG_DEBUG_PER_CPU_MAPS
void debug_cpumask_set_cpu(int cpu, int node, bool enable); void debug_cpumask_set_cpu(unsigned int cpu, int node, bool enable);
#endif #endif
#ifdef CONFIG_NUMA_EMU
#define FAKE_NODE_MIN_SIZE ((u64)32 << 20)
#define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1UL))
int numa_emu_cmdline(char *str);
#else /* CONFIG_NUMA_EMU */
static inline int numa_emu_cmdline(char *str)
{
return -EINVAL;
}
#endif /* CONFIG_NUMA_EMU */
#endif /* _ASM_X86_NUMA_H */ #endif /* _ASM_X86_NUMA_H */

View File

@ -120,6 +120,34 @@ extern pmdval_t early_pmd_flags;
#define arch_end_context_switch(prev) do {} while(0) #define arch_end_context_switch(prev) do {} while(0)
#endif /* CONFIG_PARAVIRT_XXL */ #endif /* CONFIG_PARAVIRT_XXL */
static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
{
pmdval_t v = native_pmd_val(pmd);
return native_make_pmd(v | set);
}
static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
{
pmdval_t v = native_pmd_val(pmd);
return native_make_pmd(v & ~clear);
}
static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
{
pudval_t v = native_pud_val(pud);
return native_make_pud(v | set);
}
static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
{
pudval_t v = native_pud_val(pud);
return native_make_pud(v & ~clear);
}
/* /*
* The following only work if pte_present() is true. * The following only work if pte_present() is true.
* Undefined behaviour if not.. * Undefined behaviour if not..
@ -174,6 +202,13 @@ static inline int pud_young(pud_t pud)
return pud_flags(pud) & _PAGE_ACCESSED; return pud_flags(pud) & _PAGE_ACCESSED;
} }
static inline bool pud_shstk(pud_t pud)
{
return cpu_feature_enabled(X86_FEATURE_SHSTK) &&
(pud_flags(pud) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) ==
(_PAGE_DIRTY | _PAGE_PSE);
}
static inline int pte_write(pte_t pte) static inline int pte_write(pte_t pte)
{ {
/* /*
@ -310,6 +345,30 @@ static inline int pud_devmap(pud_t pud)
} }
#endif #endif
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
static inline bool pmd_special(pmd_t pmd)
{
return pmd_flags(pmd) & _PAGE_SPECIAL;
}
static inline pmd_t pmd_mkspecial(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_SPECIAL);
}
#endif /* CONFIG_ARCH_SUPPORTS_PMD_PFNMAP */
#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP
static inline bool pud_special(pud_t pud)
{
return pud_flags(pud) & _PAGE_SPECIAL;
}
static inline pud_t pud_mkspecial(pud_t pud)
{
return pud_set_flags(pud, _PAGE_SPECIAL);
}
#endif /* CONFIG_ARCH_SUPPORTS_PUD_PFNMAP */
static inline int pgd_devmap(pgd_t pgd) static inline int pgd_devmap(pgd_t pgd)
{ {
return 0; return 0;
@ -480,20 +539,6 @@ static inline pte_t pte_mkdevmap(pte_t pte)
return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP); return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP);
} }
static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
{
pmdval_t v = native_pmd_val(pmd);
return native_make_pmd(v | set);
}
static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
{
pmdval_t v = native_pmd_val(pmd);
return native_make_pmd(v & ~clear);
}
/* See comments above mksaveddirty_shift() */ /* See comments above mksaveddirty_shift() */
static inline pmd_t pmd_mksaveddirty(pmd_t pmd) static inline pmd_t pmd_mksaveddirty(pmd_t pmd)
{ {
@ -588,20 +633,6 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
#define pmd_mkwrite pmd_mkwrite #define pmd_mkwrite pmd_mkwrite
static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
{
pudval_t v = native_pud_val(pud);
return native_make_pud(v | set);
}
static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
{
pudval_t v = native_pud_val(pud);
return native_make_pud(v & ~clear);
}
/* See comments above mksaveddirty_shift() */ /* See comments above mksaveddirty_shift() */
static inline pud_t pud_mksaveddirty(pud_t pud) static inline pud_t pud_mksaveddirty(pud_t pud)
{ {
@ -780,6 +811,12 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
__pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
} }
static inline pud_t pud_mkinvalid(pud_t pud)
{
return pfn_pud(pud_pfn(pud),
__pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
}
static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
@ -827,14 +864,8 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
pmd_result = __pmd(val); pmd_result = __pmd(val);
/* /*
* To avoid creating Write=0,Dirty=1 PMDs, pte_modify() needs to avoid: * Avoid creating shadow stack PMD by accident. See comment in
* 1. Marking Write=0 PMDs Dirty=1 * pte_modify().
* 2. Marking Dirty=1 PMDs Write=0
*
* The first case cannot happen because the _PAGE_CHG_MASK will filter
* out any Dirty bit passed in newprot. Handle the second case by
* going through the mksaveddirty exercise. Only do this if the old
* value was Write=1 to avoid doing this on Shadow Stack PTEs.
*/ */
if (oldval & _PAGE_RW) if (oldval & _PAGE_RW)
pmd_result = pmd_mksaveddirty(pmd_result); pmd_result = pmd_mksaveddirty(pmd_result);
@ -844,6 +875,29 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
return pmd_result; return pmd_result;
} }
static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
{
pudval_t val = pud_val(pud), oldval = val;
pud_t pud_result;
val &= _HPAGE_CHG_MASK;
val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
val = flip_protnone_guard(oldval, val, PHYSICAL_PUD_PAGE_MASK);
pud_result = __pud(val);
/*
* Avoid creating shadow stack PUD by accident. See comment in
* pte_modify().
*/
if (oldval & _PAGE_RW)
pud_result = pud_mksaveddirty(pud_result);
else
pud_result = pud_clear_saveddirty(pud_result);
return pud_result;
}
/* /*
* mprotect needs to preserve PAT and encryption bits when updating * mprotect needs to preserve PAT and encryption bits when updating
* vm_page_prot * vm_page_prot
@ -1078,8 +1132,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
#define pud_leaf pud_leaf #define pud_leaf pud_leaf
static inline bool pud_leaf(pud_t pud) static inline bool pud_leaf(pud_t pud)
{ {
return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) == return pud_val(pud) & _PAGE_PSE;
(_PAGE_PSE | _PAGE_PRESENT);
} }
static inline int pud_bad(pud_t pud) static inline int pud_bad(pud_t pud)
@ -1383,10 +1436,28 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
} }
#endif #endif
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static inline pud_t pudp_establish(struct vm_area_struct *vma,
unsigned long address, pud_t *pudp, pud_t pud)
{
page_table_check_pud_set(vma->vm_mm, pudp, pud);
if (IS_ENABLED(CONFIG_SMP)) {
return xchg(pudp, pud);
} else {
pud_t old = *pudp;
WRITE_ONCE(*pudp, pud);
return old;
}
}
#endif
#define __HAVE_ARCH_PMDP_INVALIDATE_AD #define __HAVE_ARCH_PMDP_INVALIDATE_AD
extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp); unsigned long address, pmd_t *pmdp);
pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
pud_t *pudp);
/* /*
* Page table pages are page-aligned. The lower half of the top * Page table pages are page-aligned. The lower half of the top
* level is used for userspace and the top half for the kernel. * level is used for userspace and the top half for the kernel.
@ -1668,6 +1739,9 @@ void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte);
#define arch_check_zapped_pmd arch_check_zapped_pmd #define arch_check_zapped_pmd arch_check_zapped_pmd
void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd); void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd);
#define arch_check_zapped_pud arch_check_zapped_pud
void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud);
#ifdef CONFIG_XEN_PV #ifdef CONFIG_XEN_PV
#define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young
static inline bool arch_has_hw_nonleaf_pmd_young(void) static inline bool arch_has_hw_nonleaf_pmd_young(void)

View File

@ -245,7 +245,6 @@ extern void cleanup_highmap(void);
#define HAVE_ARCH_UNMAPPED_AREA #define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
#define HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
#define PAGE_AGP PAGE_KERNEL_NOCACHE #define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1 #define HAVE_PAGE_AGP 1

View File

@ -31,13 +31,4 @@
#endif /* CONFIG_SPARSEMEM */ #endif /* CONFIG_SPARSEMEM */
#ifndef __ASSEMBLY__
#ifdef CONFIG_NUMA_KEEP_MEMINFO
extern int phys_to_target_node(phys_addr_t start);
#define phys_to_target_node phys_to_target_node
extern int memory_add_physaddr_to_nid(u64 start);
#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
#endif
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_SPARSEMEM_H */ #endif /* _ASM_X86_SPARSEMEM_H */

View File

@ -121,7 +121,7 @@ static inline unsigned long stack_guard_placement(vm_flags_t vm_flags)
} }
unsigned long unsigned long
arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len, arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags) unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{ {
struct mm_struct *mm = current->mm; struct mm_struct *mm = current->mm;
@ -158,7 +158,7 @@ arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned l
} }
unsigned long unsigned long
arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0, arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr0,
unsigned long len, unsigned long pgoff, unsigned long len, unsigned long pgoff,
unsigned long flags, vm_flags_t vm_flags) unsigned long flags, vm_flags_t vm_flags)
{ {
@ -228,20 +228,5 @@ bottomup:
* can happen with large stack limits and large mmap() * can happen with large stack limits and large mmap()
* allocations. * allocations.
*/ */
return arch_get_unmapped_area(filp, addr0, len, pgoff, flags); return arch_get_unmapped_area(filp, addr0, len, pgoff, flags, 0);
}
unsigned long
arch_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
{
return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
}
unsigned long
arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
const unsigned long len, const unsigned long pgoff,
const unsigned long flags)
{
return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff, flags, 0);
} }

Some files were not shown because too many files have changed in this diff Show More