1
linux/lib
Denis Vlasenko 4277eedd79 vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2
Optimize integer-to-string conversion in vsprintf.c for base 10.  This is
by far the most used conversion, and in some use cases it impacts
performance.  For example, top reads /proc/$PID/stat for every process, and
with 4000 processes decimal conversion alone takes noticeable time.

Using code from

http://www.cs.uiowa.edu/~jones/bcd/decimal.html
(with permission from the author, Douglas W. Jones)

binary-to-decimal-string conversion is done in groups of five digits at
once, using only additions/subtractions/shifts (with -O2; -Os throws in
some multiply instructions).

On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code.

This patch is run tested. Userspace benchmark/test is also attached.
I tested it on PIII and AMD64 and new code is generally ~2.5 times
faster. On AMD64:

# ./vsprintf_verify-O2
Original decimal conv: .......... 151 ns per iteration
Patched decimal conv:  .......... 62 ns per iteration
Testing correctness
12895992590592 ok...        [Ctrl-C]
# ./vsprintf_verify-O2
Original decimal conv: .......... 151 ns per iteration
Patched decimal conv:  .......... 62 ns per iteration
Testing correctness
26025406464 ok...        [Ctrl-C]

More realistic test: top from busybox project was modified to
report how many us it took to scan /proc (this does not account
any processing done after that, like sorting process list),
and then I test it with 4000 processes:

#!/bin/sh
i=4000
while test $i != 0; do
    sleep 30 &
    let i--
done
busybox top -b -n3 >/dev/null

on unpatched kernel:

top: 4120 processes took 102864 microseconds to scan
top: 4120 processes took 91757 microseconds to scan
top: 4120 processes took 92517 microseconds to scan
top: 4120 processes took 92581 microseconds to scan

on patched kernel:

top: 4120 processes took 75460 microseconds to scan
top: 4120 processes took 66451 microseconds to scan
top: 4120 processes took 67267 microseconds to scan
top: 4120 processes took 67618 microseconds to scan

The speedup comes from much faster generation of /proc/PID/stat
by sprintf() calls inside the kernel.

Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:52 -07:00
..
lzo Add LZO1X algorithm to the kernel 2007-07-10 17:51:13 -07:00
reed_solomon [RSLIB] Support non-canonical GF representations 2007-05-02 11:56:33 +01:00
zlib_deflate [PATCH] zlib_inflate: Upgrade library code to a recent version 2006-06-22 15:05:58 -07:00
zlib_inflate Fix ppp_deflate issues with recent zlib_inflate changes 2007-05-07 12:13:04 -07:00
.gitignore Add some basic .gitignore files 2005-10-18 08:26:15 -07:00
audit.c [PATCH] audit signal recipients 2007-05-11 05:38:25 -04:00
bitmap.c [PATCH] kernel-doc fixes for 2.6.20-git15 (non-drivers) 2007-03-01 14:53:37 -08:00
bitrev.c [PATCH] add MODULE_* attributes to bit reversal library 2006-12-10 10:07:52 -08:00
bug.c generic bug: use show_regs() instead of dump_stack() 2007-07-16 09:05:51 -07:00
bust_spinlocks.c [PATCH] Extract and use wake_up_klogd() 2007-02-11 10:51:34 -08:00
check_signature.c uninline check_signature() 2007-07-16 09:05:50 -07:00
cmdline.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
cpumask.c Safer nr_node_ids and nr_node_ids determination and initial values 2007-05-07 12:12:51 -07:00
crc16.c [PATCH] kernel-doc for lib/crc*.c 2006-06-25 10:01:20 -07:00
crc32.c [PATCH] crc32: replace bitreverse by bitrev32 2006-12-08 08:28:39 -08:00
crc32defs.h Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
crc-ccitt.c [PATCH] kernel-doc for lib/crc*.c 2006-06-25 10:01:20 -07:00
crc-itu-t.c CRC ITU-T V.41 2007-05-10 18:24:13 +02:00
ctype.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
debug_locks.c [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
dec_and_lock.c [PATCH] atomic: dec_and_lock use atomic primitives 2006-01-08 20:13:48 -08:00
devres.c iomap: implement pcim_iounmap_regions() 2007-04-28 14:15:58 -04:00
div64.c [S390]: Fix build on 31-bit. 2007-04-25 22:28:53 -07:00
dump_stack.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
extable.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
fault-inject.c simplify the stacktrace code 2007-05-08 11:14:58 -07:00
find_next_bit.c [PATCH] bitops: generic ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() 2006-03-26 08:57:11 -08:00
gen_crc32table.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
genalloc.c [PATCH] genalloc warning fixes 2007-02-20 17:10:15 -08:00
halfmd4.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
hexdump.c hexdump: more output formatting 2007-06-08 17:23:34 -07:00
hweight.c [PATCH] optimize hweight64 for x86_64 2006-09-26 10:52:38 +02:00
idr.c lib: add idr_remove_all 2007-07-16 09:05:34 -07:00
inflate.c [PATCH] x86-64: deflate inflate_dynamic too 2007-05-02 19:27:15 +02:00
int_sqrt.c [PATCH] lib: Fix bug in int_sqrt() for 64 bit longs 2006-02-03 08:32:08 -08:00
iomap_copy.c [PATCH] add __iowrite64_copy 2006-06-20 20:24:58 -07:00
iomap.c iomap: make the default iomap functions fail softer 2007-05-04 20:44:23 -07:00
ioremap.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
irq_regs.c [PATCH] irq_reqs: export __irq_regs 2006-10-06 08:53:40 -07:00
Kconfig Add LZO1X algorithm to the kernel 2007-07-10 17:51:13 -07:00
Kconfig.debug SLUB: support slub_debug on by default 2007-07-16 09:05:36 -07:00
kernel_lock.c [PATCH] lockdep: prove spinlock rwlock locking correctness 2006-07-03 15:27:04 -07:00
klist.c Driver core: Don't call put methods while holding a spinlock 2006-09-25 21:08:40 -07:00
kobject_uevent.c the overdue removal of the mount/umount uevents 2007-04-27 10:57:31 -07:00
kobject.c sysfs: make kobj point to sysfs_dirent instead of dentry 2007-07-11 16:09:08 -07:00
kref.c kref: fix CPU ordering with respect to krefs 2007-04-27 10:57:29 -07:00
libcrc32c.c [PATCH] constify libcrc32c table 2006-06-25 10:01:09 -07:00
list_debug.c [PATCH] More list debugging context 2006-12-07 08:39:35 -08:00
locking-selftest-hardirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-mutex.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-rlock-hardirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-rlock-softirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-rlock.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-rsem.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-softirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-spin-hardirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-spin-softirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-spin.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-wlock-hardirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-wlock-softirq.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-wlock.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest-wsem.h [PATCH] lockdep: locking API self tests 2006-07-03 15:27:03 -07:00
locking-selftest.c [PATCH] lockdep: show more details about self-test failures 2006-12-07 08:39:43 -08:00
Makefile uninline check_signature() 2007-07-16 09:05:50 -07:00
parser.c [AFS]: Make the match_*() functions take const options. 2007-05-03 03:10:39 -07:00
percpu_counter.c percpu_counters: use for_each_online_cpu() 2007-07-16 09:05:41 -07:00
plist.c [PATCH] pi-futex: add plist implementation 2006-06-27 17:32:46 -07:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
radix-tree.c [LIB]: export radix_tree_preload() 2007-07-14 16:05:04 +10:00
random32.c [PATCH] severing module.h->sched.h 2006-12-04 02:00:22 -05:00
rbtree.c [PATCH] rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev 2006-09-30 20:26:56 +02:00
reciprocal_div.c [PATCH] SLAB: use a multiply instead of a divide in obj_to_index() 2006-12-13 09:05:49 -08:00
rwsem-spinlock.c Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() 2006-10-11 01:45:14 -04:00
rwsem.c Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() 2006-10-11 01:45:14 -04:00
semaphore-sleepers.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
sha1.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
smp_processor_id.c [PATCH] fix missing includes 2005-10-30 17:37:32 -08:00
sort.c [PATCH] Numerous fixes to kernel-doc info in source files. 2007-02-11 10:51:32 -08:00
spinlock_debug.c [PATCH] x86: all cpu backtrace 2006-12-07 02:14:01 +01:00
string.c [STRING]: Move strcasecmp/strncasecmp to lib/string.c 2007-04-26 01:54:39 -07:00
swiotlb.c fix section mismatch warning in lib/swiotlb.c 2007-05-08 11:14:59 -07:00
textsearch.c Various typo fixes. 2007-02-17 19:07:33 +01:00
ts_bm.c [TEXTSEARCH]: Fix Boyer Moore initialization bug 2006-08-22 14:33:58 -07:00
ts_fsm.c [PATCH] lib/ts_fsm.c: constify structs 2006-09-29 09:18:23 -07:00
ts_kmp.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
vsprintf.c vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2 2007-07-16 09:05:52 -07:00