linux

Paul Mackerras 53cfbf5937 perf_counter: record time running and time enabled for each counter

Impact: new functionality

Currently, if there are more counters enabled than can fit on the CPU, the kernel will multiplex the counters on to the hardware using round-robin scheduling. That isn't too bad for sampling counters, but for counting counters it means that the value read from a counter represents some unknown fraction of the true count of events that occurred while the counter was enabled.

This remedies the situation by keeping track of how long each counter is enabled for, and how long it is actually on the cpu and counting events. These times are recorded in nanoseconds using the task clock for per-task counters and the cpu clock for per-cpu counters.

These values can be supplied to userspace on a read from the counter. Userspace requests that they be supplied after the counter value by setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field when creating the counter. (There is no way to change the read format after the counter is created, though it would be possible to add some way to do that.)

Using this information it is possible for userspace to scale the count it reads from the counter to get an estimate of the true count:

    true_count_estimate = count * total_time_enabled / total_time_running

This also lets userspace detect the situation where the counter never got to go on the cpu: total_time_running == 0.

This functionality has been requested by the PAPI developers, and will be generally needed for interpreting the count values from counting counters correctly.

In the implementation, this keeps 5 time values (in nanoseconds) for each counter: total_time_enabled and total_time_running are used when the counter is in state OFF or ERROR and for reporting back to userspace. When the counter is in state INACTIVE or ACTIVE, it is the tstamp_enabled, tstamp_running and tstamp_stopped values that are relevant, and total_time_enabled and total_time_running are determined from them. (tstamp_stopped is only used in INACTIVE state.) The reason for doing it like this is that it means that only counters being enabled or disabled at sched-in and sched-out time need to be updated. There are no new loops that iterate over all counters to update total_time_enabled or total_time_running.

This also keeps separate child_total_time_running and child_total_time_enabled fields that get added in when reporting the totals to userspace. They are separate fields so that they can be atomic. We don't want to use atomics for total_time_running, total_time_enabled etc., because then we would have to use atomic sequences to update them, which are slower than regular arithmetic and memory accesses.

It is possible to measure total_time_running by adding a task_clock counter to each group of counters, and total_time_enabled can be measured approximately with a top-level task_clock counter (though inaccuracies will creep in if you need to disable and enable groups since it is not possible in general to disable/enable the top-level task_clock counter simultaneously with another group). However, that adds extra overhead - I measured around 15% increase in the context switch latency reported by lat_ctx (from lmbench) when a task_clock counter was added to each of 2 groups, and around 25% increase when a task_clock counter was added to each of 4 groups. (In both cases a top-level task-clock counter was also added.)

In contrast, the code added in this commit gives better information with no overhead that I could measure (in fact in some cases I measured lower times with this code, but the differences were all less than one standard deviation).

[ v2: address review comments by Andrew Morton. ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

2009-04-06 09:30:36 +02:00
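
The scaling rule given in the commit message (count * total_time_enabled / total_time_running, with total_time_running == 0 meaning the counter never got on the cpu) could be applied in userspace roughly as follows. This is only a sketch: struct counter_reading and estimate_true_count() are hypothetical names invented for illustration, the struct is not the kernel's read() layout, and using double arithmetic to sidestep 64-bit overflow is an assumption of this example rather than anything the commit prescribes.

```c
#include <stdint.h>

/*
 * Hypothetical container for one counter reading. The field names follow
 * the commit message; the layout is NOT the kernel's read() format.
 */
struct counter_reading {
	uint64_t count;              /* raw value read from the counter */
	uint64_t total_time_enabled; /* nanoseconds the counter was enabled */
	uint64_t total_time_running; /* nanoseconds it was actually counting */
};

/*
 * Estimate the true event count for a multiplexed counter.
 *
 * Returns 0 and stores the estimate in *estimate on success, or -1 if
 * total_time_running == 0, i.e. the counter never got to go on the cpu
 * and no estimate can be formed.
 */
static int estimate_true_count(const struct counter_reading *r,
			       double *estimate)
{
	if (r->total_time_running == 0)
		return -1;

	/* Floating point avoids overflow in count * total_time_enabled. */
	*estimate = (double)r->count * (double)r->total_time_enabled /
		    (double)r->total_time_running;
	return 0;
}
```

A counter that was enabled for 200 ns but running for only 100 ns would have its raw count doubled by this estimate; a caller should treat the -1 return as "no data" rather than as a count of zero.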
..
vdso32	powerpc/mm: Introduce MMU features	2008-12-21 14:21:16 +11:00
vdso64	powerpc/mm: Introduce MMU features	2008-12-21 14:21:16 +11:00
.gitignore	powerpc: Ignore generated vmlinux.lds in git	2008-10-07 14:26:18 +11:00
align.c	powerpc: Fix load/store float double alignment handler	2009-02-26 14:02:53 +11:00
asm-offsets.c	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
audit.c
btext.c
cacheinfo.c	powerpc/cacheinfo: Rename cache_dir per-cpu variable	2009-01-13 14:48:02 +11:00
cacheinfo.h	powerpc: Rewrite sysfs processor cache info code	2009-01-08 16:25:10 +11:00
clock.c
compat_audit.c
cpu_setup_6xx.S	powerpc/mm: e300c2/c3/c4 TLB errata workaround	2009-03-24 13:47:32 +11:00
cpu_setup_44x.S	AMCC PPC 460SX redwood SoC platform initial framework	2009-02-14 14:41:29 -05:00
cpu_setup_fsl_booke.S	powerpc/fsl-booke: Cleanup init/exception setup to be runtime	2009-01-28 18:16:50 -06:00
cpu_setup_pa6t.S
cpu_setup_ppc970.S	powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit	2008-09-15 11:08:35 -07:00
cputable.c	powerpc/mm: e300c2/c3/c4 TLB errata workaround	2009-03-24 13:47:32 +11:00
crash_dump.c	powerpc: Unify opcode definitions and support	2009-02-23 10:48:56 +11:00
crash.c
dbell.c	powerpc: Add support for using doorbells for SMP IPI	2009-02-23 15:53:03 +11:00
dma-iommu.c	powerpc: Change u64/s64 to a long long integer type	2009-01-13 14:47:59 +11:00
dma.c	powerpc: Add sync_*_for_* to dma_ops	2008-12-03 20:46:36 +11:00
entry_32.S	powerpc: Unify opcode definitions and support	2009-02-23 10:48:56 +11:00
entry_64.S	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
firmware.c
fpu.S
ftrace.c	tracing, powerpc: fix powerpc tree and tracing tree interaction	2009-04-02 00:50:24 +02:00
head_8xx.S
head_32.S	powerpc/mm: e300c2/c3/c4 TLB errata workaround	2009-03-24 13:47:32 +11:00
head_40x.S
head_44x.S	powerpc/44x: Support 16K/64K base page sizes on 44x	2008-12-29 09:53:25 +11:00
head_64.S	powerpc/kconfig: Kill PPC_MULTIPLATFORM	2009-03-11 17:11:35 +11:00
head_booke.h	Merge commit 'jwb/next' into next	2009-03-03 13:30:03 +11:00
head_fsl_booke.S	powerpc: Add support for using doorbells for SMP IPI	2009-02-23 15:53:03 +11:00
ibmebus.c	powerpc: struct device - replace bus_id with dev_name(), dev_set_name()	2008-12-16 15:53:38 +11:00
idle_6xx.S
idle_e500.S
idle_power4.S
idle.c	powerpc: ftrace, do not latency trace idle	2008-11-20 10:51:15 -08:00
init_task.c	take init_fs to saner place	2008-12-31 18:07:42 -05:00
io.c
iomap.c
iommu.c	powerpc: Change u64/s64 to a long long integer type	2009-01-13 14:47:59 +11:00
irq.c	perf_counter: abstract wakeup flag setting in core to fix powerpc build	2009-04-06 09:30:14 +02:00
isa-bridge.c
kgdb.c	kgdb, x86, arm, mips, powerpc: ignore user space single stepping	2008-09-26 10:36:41 -05:00
kprobes.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2009-01-07 11:31:52 -08:00
l2cr_6xx.S
legacy_serial.c
lparcfg.c	powerpc: Change u64/s64 to a long long integer type	2009-01-13 14:47:59 +11:00
machine_kexec_32.c
machine_kexec_64.c	powerpc/32: Setup OF properties for kdump	2008-12-23 15:13:29 +11:00
machine_kexec.c	powerpc/kexec: Check crash_base for relocatable kernel	2009-01-13 14:47:59 +11:00
Makefile	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
misc_32.S	powerpc/44x: Support 16K/64K base page sizes on 44x	2008-12-29 09:53:25 +11:00
misc_64.S	powerpc: Kexec exit should not use magic numbers	2008-10-31 16:11:44 +11:00
misc.S	powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit	2008-09-15 11:08:35 -07:00
module_32.c	powerpc/ppc32: ftrace, dynamic ftrace to handle modules	2008-11-20 10:52:53 -08:00
module_64.c	powerpc: Unify opcode definitions and support	2009-02-23 10:48:56 +11:00
module.c	powerpc/mm: Introduce MMU features	2008-12-21 14:21:16 +11:00
msi.c	powerpc/PCI: include pci.h in powerpc MSI implementation	2009-03-25 08:54:29 -07:00
nvram_64.c
of_device.c	powerpc: struct device - replace bus_id with dev_name(), dev_set_name()	2008-12-16 15:53:38 +11:00
of_platform.c
paca.c	powerpc: Update page-in counter for CMM	2008-11-05 22:08:28 +11:00
pci_32.c	powerpc/pci: Fix PCI<->OF matching of old style multifunc devices	2009-02-23 10:48:57 +11:00
pci_64.c	powerpc/pci: Move hose_list and pci_address_to_pio to pci-common	2009-02-11 16:00:07 +11:00
pci_dn.c
pci-common.c	powerpc/pci: Default to dma_direct_ops for pci dma_ops	2009-03-24 13:47:30 +11:00
perf_counter.c	perf_counter: record time running and time enabled for each counter	2009-04-06 09:30:36 +02:00
pmc.c
power4-pmu.c	perfcounters/powerpc: add support for POWER4 processors	2009-03-06 16:30:57 +11:00
power5-pmu.c	perfcounters/powerpc: Add support for POWER5 processors	2009-02-26 15:36:48 +11:00
power5+-pmu.c	perfcounters/powerpc: add support for POWER5+ processors	2009-03-06 16:28:37 +11:00
power6-pmu.c	powerpc/perf_counter: Add support for POWER6	2009-01-10 16:35:01 +11:00
ppc32.h
ppc970-pmu.c	powerpc/perf_counter: Add support for PPC970 family	2009-01-10 16:34:07 +11:00
ppc_ksyms.c	powerpc: Export cacheable_memzero as its now used in a driver	2009-01-08 16:25:17 +11:00
ppc_save_regs.S	powerpc: Prepare xmon_save_regs for use with kdump	2008-12-23 15:13:28 +11:00
proc_ppc64.c
process.c	Simplify copy_thread()	2009-04-02 19:04:51 -07:00
prom_init_check.sh	powerpc: Print linux_banner in prom_init	2009-03-11 17:11:33 +11:00
prom_init.c	powerpc: Fix prom_init on 32-bit OF machines	2009-03-24 13:43:35 +11:00
prom_parse.c	PCI: powerpc: use generic pci_swizzle_interrupt_pin()	2009-01-07 11:12:52 -08:00
prom.c	powerpc: Allow debugging of LMBs with lmb=debug	2009-02-11 13:38:00 +11:00
ptrace32.c
ptrace.c
reloc_64.S	powerpc: Make the 64-bit kernel as a position-independent executable	2008-09-15 11:08:38 -07:00
rtas_flash.c	proc 2/2: remove struct proc_dir_entry::owner	2009-03-31 01:14:44 +04:00
rtas_pci.c	powerpc/pci: Fix various pseries PCI hotplug issues	2008-11-06 09:31:52 +11:00
rtas-proc.c
rtas-rtc.c
rtas.c	powerpc/pseries: Fix partition migration hang under load	2009-02-23 15:53:04 +11:00
setup_32.c	powerpc/32: Wire up the trampoline code for kdump	2008-12-23 15:13:29 +11:00
setup_64.c	powerpc/mm: Introduce early_init_mmu() on 64-bit	2009-03-24 13:47:34 +11:00
setup-common.c	powerpc: setup default archdata for {of_}platform via bus_register_notifier	2009-03-24 13:47:30 +11:00
setup.h
signal_32.c	powerpc: Sanitize stack pointer in signal handling code	2009-03-27 16:58:24 +11:00
signal_64.c	powerpc: Sanitize stack pointer in signal handling code	2009-03-27 16:58:24 +11:00
signal.c	powerpc: Sanitize stack pointer in signal handling code	2009-03-27 16:58:24 +11:00
signal.h	powerpc: Sanitize stack pointer in signal handling code	2009-03-27 16:58:24 +11:00
smp-tbsync.c	powerpc: Silence software timebase sync	2008-11-05 22:08:28 +11:00
smp.c	Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-01-02 11:44:09 -08:00
softemu8xx.c
stacktrace.c
suspend.c
swsusp_32.S	powerpc/mm: Introduce MMU features	2008-12-21 14:21:16 +11:00
swsusp_64.c
swsusp_asm64.S	powerpc: Fix 64-bit hibernation with 64k pages	2008-10-07 14:26:20 +11:00
swsusp.c	powerpc/mm: Split mmu_context handling	2008-12-21 14:21:15 +11:00
sys_ppc32.c	compat: generic compat get/settimeofday	2008-10-16 11:21:33 -07:00
syscalls.c
sysfs.c	powerpc: Fix bugs introduced by sysfs changes	2009-03-27 16:58:24 +11:00
systbl_chk.c
systbl_chk.sh
systbl.S
tau_6xx.c
time.c	powerpc: Hook up rtc-generic, and kill rtc-ppc	2009-04-02 01:05:31 +00:00
traps.c	powerpc: Add support for using doorbells for SMP IPI	2009-02-23 15:53:03 +11:00
udbg_16550.c	powerpc/udbg: Fix lost byte during console handover; change LFCR to CRLF	2009-03-11 17:11:34 +11:00
udbg.c	powerpc/udbg: Fix lost byte during console handover; change LFCR to CRLF	2009-03-11 17:11:34 +11:00
vdso.c	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc	2008-12-28 16:54:33 -08:00
vecemu.c
vector.S
vio.c	workqueue: add to_delayed_work() helper function	2009-04-02 19:04:50 -07:00
vmlinux.lds.S	Merge commit 'origin/master' into next	2009-03-30 14:04:53 +11:00