linux/mm
Christoph Lameter e498be7daf [PATCH] Numa-aware slab allocator V5
The NUMA API change that introduced kmalloc_node was accepted for
2.6.12-rc3.  Now it is possible to do slab allocations on a node to
localize memory structures.  This API was used by the pageset localization
patch and the block layer localization patch now in mm.  The existing
kmalloc_node is slow since it simply searches through all pages of the slab
to find a page that is on the node requested.  The two patches do a
one-time allocation of slab structures at initialization, so the speed of
kmalloc_node does not matter for them.

This patch allows kmalloc_node to be as fast as kmalloc by introducing
node-specific page lists for partial, free and full slabs.  Slab allocation
improves on NUMA systems: we are seeing a performance gain in AIM7 of about
5% with this patch alone.
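
As a rough illustration of the node-specific lists described above, here is a
minimal user-space sketch. It is not the code in slab.c: MAX_NODES, the struct
layouts and the function names are assumptions made for this example, loosely
modeled on the kmem_list3 structure mentioned in the changelog. The point is
that an allocation for a node only touches that node's own lists and never
scans the slabs of other nodes.

```c
#include <stddef.h>

#define MAX_NODES 4  /* hypothetical node count for this sketch */

struct slab {
	struct slab *next;
	unsigned int inuse;     /* objects currently handed out from this slab */
	unsigned int capacity;  /* objects the slab can hold */
};

/* One set of lists per node (assumed layout, loosely like kmem_list3). */
struct node_lists {
	struct slab *partial;
	struct slab *full;
	struct slab *free;
};

struct cache {
	struct node_lists nodes[MAX_NODES];
};

/* Detach and return the first slab of a list, or NULL if it is empty. */
static struct slab *pop(struct slab **head)
{
	struct slab *s = *head;

	if (s) {
		*head = s->next;
		s->next = NULL;
	}
	return s;
}

/* Put a slab at the front of a list. */
static void push(struct slab **head, struct slab *s)
{
	s->next = *head;
	*head = s;
}

/*
 * Take one object from a slab on the requested node.  Only that node's
 * lists are consulted.  Returns the slab the object came from, or NULL
 * if the node has no usable slab (a real allocator would grow the cache).
 */
struct slab *cache_alloc_node(struct cache *c, int node)
{
	struct node_lists *l;
	struct slab *s;

	if (node < 0 || node >= MAX_NODES)
		return NULL;

	l = &c->nodes[node];
	s = pop(&l->partial);          /* prefer partially used slabs */
	if (!s)
		s = pop(&l->free);     /* then completely free slabs */
	if (!s)
		return NULL;

	s->inuse++;
	/* Re-file the slab according to its new fill level. */
	push(s->inuse == s->capacity ? &l->full : &l->partial, s);
	return s;
}
```

In this sketch the cost of an allocation does not depend on how many slabs
other nodes hold, which is the property the patch description attributes to
the per-node lists.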

More NUMA localizations are possible if kmalloc_node operates in a fast
way like kmalloc.

Test run on a 32p system with 32G RAM.

w/o patch
Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      485.36  100       485.3640     11.99      1.91   Sat Apr 30 14:01:51 2005
  100    26582.63   88       265.8263     21.89    144.96   Sat Apr 30 14:02:14 2005
  200    29866.83   81       149.3342     38.97    286.08   Sat Apr 30 14:02:53 2005
  300    33127.16   78       110.4239     52.71    426.54   Sat Apr 30 14:03:46 2005
  400    34889.47   80        87.2237     66.72    568.90   Sat Apr 30 14:04:53 2005
  500    35654.34   76        71.3087     81.62    714.55   Sat Apr 30 14:06:15 2005
  600    36460.83   75        60.7681     95.77    853.42   Sat Apr 30 14:07:51 2005
  700    35957.00   75        51.3671    113.30    990.67   Sat Apr 30 14:09:45 2005
  800    33380.65   73        41.7258    139.48   1140.86   Sat Apr 30 14:12:05 2005
  900    35095.01   76        38.9945    149.25   1281.30   Sat Apr 30 14:14:35 2005
 1000    36094.37   74        36.0944    161.24   1419.66   Sat Apr 30 14:17:17 2005

w/patch
Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      484.27  100       484.2736     12.02      1.93   Sat Apr 30 15:59:45 2005
  100    28262.03   90       282.6203     20.59    143.57   Sat Apr 30 16:00:06 2005
  200    32246.45   82       161.2322     36.10    282.89   Sat Apr 30 16:00:42 2005
  300    37945.80   83       126.4860     46.01    418.75   Sat Apr 30 16:01:28 2005
  400    40000.69   81       100.0017     58.20    561.48   Sat Apr 30 16:02:27 2005
  500    40976.10   78        81.9522     71.02    696.95   Sat Apr 30 16:03:38 2005
  600    41121.54   78        68.5359     84.92    834.86   Sat Apr 30 16:05:04 2005
  700    44052.77   78        62.9325     92.48    971.53   Sat Apr 30 16:06:37 2005
  800    41066.89   79        51.3336    113.38   1111.15   Sat Apr 30 16:08:31 2005
  900    38918.77   79        43.2431    134.59   1252.57   Sat Apr 30 16:10:46 2005
 1000    41842.21   76        41.8422    139.09   1392.33   Sat Apr 30 16:13:05 2005

These are measurements taken directly after boot and show a greater
improvement than 5%.  However, the performance improvements become smaller
over time if the AIM7 runs are repeated and settle down at around 5%.
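
To check the "greater than 5%" statement against the tables, the relative
change in jobs/min can be computed per row. The helper below is a small
sketch; percent_gain is a name chosen for this example and is not part of the
patch.

```c
/*
 * Relative change of a throughput figure, in percent.
 * "before" is jobs/min without the patch, "after" with the patch.
 */
double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For the 1000-task row, percent_gain(36094.37, 41842.21) comes out at roughly
16%, while the single-task row is essentially unchanged.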

Links to earlier discussions:
http://marc.theaimsgroup.com/?t=111094594500003&r=1&w=2
http://marc.theaimsgroup.com/?t=111603406600002&r=1&w=2

Changelog V4-V5:
- alloc_arraycache and alloc_aliencache take node parameter instead of cpu
- fix initialization so that nodes without cpus are properly handled.
- simplify code in kmem_cache_init
- patch against Andrew's temp mm3 release
- Add Shai to credits
- fallback to __cache_alloc from __cache_alloc_node if the node's cache
  is not available yet.
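
The fallback in the last item can be pictured with the following user-space
sketch. It only illustrates the control flow and is not the slab.c code:
struct boot_cache, struct node_pool and the sketch_ function names are
hypothetical, and treating an out-of-range node like a missing cache is an
assumption of this example.

```c
#include <stddef.h>

#define MAX_NODES 4  /* hypothetical node count for this sketch */

struct node_pool {
	void *objects[8];
	size_t count;
};

struct boot_cache {
	struct node_pool local;                 /* pool used by the plain allocation path */
	struct node_pool *per_node[MAX_NODES];  /* NULL until that node's cache exists */
};

/* Take one object from a pool, or NULL if it is empty. */
static void *pool_take(struct node_pool *p)
{
	return p->count ? p->objects[--p->count] : NULL;
}

/* Stand-in for the node-agnostic allocation path. */
void *sketch_cache_alloc(struct boot_cache *c)
{
	return pool_take(&c->local);
}

/* Stand-in for the node-specific path with the fallback described above. */
void *sketch_cache_alloc_node(struct boot_cache *c, int node)
{
	/* Assumption: an invalid node is handled like a cache that is not set up. */
	if (node < 0 || node >= MAX_NODES || c->per_node[node] == NULL)
		return sketch_cache_alloc(c);

	return pool_take(c->per_node[node]);
}
```

During early initialization, before a node's structures exist, requests for
that node are served by the ordinary path instead of failing.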

Changelog V3-V4:
- Patch against 2.6.12-rc5-mm1
- Cleanup patch integrated
- More and better use of for_each_node and for_each_cpu
- GCC 2.95 fix (use [0] instead of [])
- Correct determination of INDEX_AC
- Remove hack to cause an error on platforms that have no CONFIG_NUMA but nodes.
- Remove list3_data and list3_data_ptr macros for better readability

Changelog V2-V3:
- Made to patch against 2.6.12-rc4-mm1
- Revised bootstrap mechanism so that larger size kmem_list3 structs can be
  supported. Do a generic solution so that the right slab can be found
  for the internal structs.
- use for_each_online_node

Changelog V1-V2:
- Batching for freeing of wrong-node objects (alien caches)
- Locking changes and NUMA #ifdefs as requested by Manfred
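
The batching idea behind the alien caches can be sketched as follows. This is
a simplified user-space model under stated assumptions, not the kernel
implementation: ALIEN_LIMIT, struct remote_node and the function names are
made up for the example, and the "lock" is only a counter that shows how often
the remote node's lists would have to be locked.

```c
#include <stddef.h>

#define ALIEN_LIMIT 4  /* hypothetical batch size */

/* Objects freed on this node that belong to one particular other node. */
struct alien_cache {
	size_t avail;
	void *entry[ALIEN_LIMIT];
};

/* Bookkeeping for the node that owns the objects. */
struct remote_node {
	size_t freed;              /* objects handed back to the owning node */
	size_t lock_acquisitions;  /* times the owning node's lock was taken */
};

/* Return the whole batch to the owning node under a single lock round trip. */
static void flush_alien(struct alien_cache *ac, struct remote_node *rn)
{
	if (ac->avail == 0)
		return;

	rn->lock_acquisitions++;
	rn->freed += ac->avail;
	ac->avail = 0;
}

/* Free an object that belongs to another node: queue it, flush when full. */
void free_remote(struct alien_cache *ac, struct remote_node *rn, void *obj)
{
	if (ac->avail == ALIEN_LIMIT)
		flush_alien(ac, rn);

	ac->entry[ac->avail++] = obj;
}
```

With a batch size of 4, freeing nine wrong-node objects takes the remote lock
twice in this model rather than once per object.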

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:48 -07:00
bootmem.c [PATCH] Use ALIGN to remove duplicate code 2005-06-25 16:25:02 -07:00
fadvise.c [PATCH] xip: madvice/fadvice: execute in place 2005-06-24 00:06:42 -07:00
filemap_xip.c [PATCH] execute-in-place fixes 2005-07-15 09:54:50 -07:00
filemap.c [PATCH] shmem_populate: avoid an useless check, and some comments 2005-09-05 00:05:45 -07:00
filemap.h [PATCH] xip: reduce code duplication 2005-06-24 00:06:41 -07:00
fremap.c
highmem.c
hugetlb.c [PATCH] hugetlb: move stale pte check into huge_pte_alloc() 2005-09-05 00:05:46 -07:00
internal.h
Kconfig [PATCH] sparsemem extreme implementation 2005-09-05 00:05:38 -07:00
madvise.c [PATCH] mm: fix madvise vma merging 2005-09-05 00:05:44 -07:00
Makefile [PATCH] xip: fs/mm: execute in place 2005-06-24 00:06:41 -07:00
memory.c [PATCH] x86: ptep_clear optimization 2005-09-05 00:05:48 -07:00
mempolicy.c [PATCH] PCI: Run PCI driver initialization on local node 2005-09-08 14:57:23 -07:00
mempool.c [PATCH] propagate __nocast annotations 2005-07-07 18:23:46 -07:00
mincore.c
mlock.c
mmap.c [PATCH] remove misleading comment above sys_brk 2005-09-07 16:57:23 -07:00
mprotect.c
mremap.c [PATCH] mm: remap ZERO_PAGE mappings 2005-09-05 00:05:44 -07:00
msync.c
nommu.c [PATCH] __vm_enough_memory() signedness fix 2005-08-04 21:43:14 -07:00
oom_kill.c [PATCH] cpusets: confine oom_killer to mem_exclusive cpuset 2005-09-07 16:57:40 -07:00
page_alloc.c [PATCH] cpusets: formalize intermediate GFP_KERNEL containment 2005-09-07 16:57:40 -07:00
page_io.c [PATCH] swsusp: kill config_pm_disk 2005-06-25 16:24:32 -07:00
page-writeback.c [PATCH] rename wakeup_bdflush to wakeup_pdflush 2005-06-28 21:20:31 -07:00
pdflush.c [PATCH] Cleanup patch for process freezing 2005-06-25 17:10:13 -07:00
prio_tree.c
readahead.c [PATCH] readahead: reset cache_hit earlier 2005-09-07 16:57:25 -07:00
rmap.c [PATCH] mm: cleanup rmap 2005-09-05 00:05:43 -07:00
shmem.c [PATCH] tmpfs: Enable atomic inode security labeling 2005-09-09 13:57:28 -07:00
slab.c [PATCH] Numa-aware slab allocator V5 2005-09-09 13:57:48 -07:00
sparse.c [PATCH] sparsemem extreme: hotplug preparation 2005-09-05 00:05:38 -07:00
swap_state.c [PATCH] delete from_swap_cache BUG_ONs 2005-09-05 00:05:42 -07:00
swap.c
swapfile.c [PATCH] swap: swap_lock replace list+device 2005-09-05 00:05:42 -07:00
thrash.c
tiny-shmem.c
truncate.c
vmalloc.c [PATCH] arm: allow for arch-specific IOREMAP_MAX_ORDER 2005-09-05 00:05:46 -07:00
vmscan.c [PATCH] cpusets: formalize intermediate GFP_KERNEL containment 2005-09-07 16:57:40 -07:00