linux/arch/powerpc/kernel
Paul Mackerras 53cfbf5937 perf_counter: record time running and time enabled for each counter
Impact: new functionality

Currently, if more counters are enabled than can fit on the CPU,
the kernel multiplexes the counters onto the hardware using
round-robin scheduling.  That isn't too bad for sampling counters, but
for counting counters it means that the value read from a counter
represents some unknown fraction of the true count of events that
occurred while the counter was enabled.

This commit remedies the situation by keeping track of how long each
counter is enabled for, and how long it is actually on the CPU and
counting events.  These times are recorded in nanoseconds using the task
clock for per-task counters and the cpu clock for per-cpu counters.

These values can be supplied to userspace on a read from the counter.
Userspace requests that they be supplied after the counter value by
setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or
PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field
when creating the counter.  (There is no way to change the read format
after the counter is created, though it would be possible to add some
way to do that.)
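
As a rough illustration of the read layout described above, the sketch
below unpacks a buffer of 64-bit words into a count plus the optional
times.  It is a userspace toy, not kernel code: the FMT_* bit values,
the struct, the helper name, and the assumption that the enabled time
precedes the running time when both are requested are all illustrative
and not stated in this commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PERF_FORMAT_* bits; these numeric values are assumptions. */
#define FMT_TOTAL_TIME_ENABLED (1u << 0)
#define FMT_TOTAL_TIME_RUNNING (1u << 1)

/* Hypothetical container for one decoded read. */
struct counter_reading {
    uint64_t count;
    uint64_t time_enabled;   /* 0 if not requested */
    uint64_t time_running;   /* 0 if not requested */
};

/*
 * Decode the words returned by a read on a counter.  The count comes
 * first; each requested time follows it (assumed order: enabled, then
 * running).  Returns the number of words consumed, or 0 if the buffer
 * is too short for the requested format.
 */
static size_t parse_reading(const uint64_t *buf, size_t nwords,
                            unsigned int fmt, struct counter_reading *out)
{
    size_t need = 1;
    size_t n = 0;

    if (fmt & FMT_TOTAL_TIME_ENABLED)
        need++;
    if (fmt & FMT_TOTAL_TIME_RUNNING)
        need++;
    if (nwords < need)
        return 0;

    out->count = buf[n++];
    out->time_enabled = (fmt & FMT_TOTAL_TIME_ENABLED) ? buf[n++] : 0;
    out->time_running = (fmt & FMT_TOTAL_TIME_RUNNING) ? buf[n++] : 0;
    return n;
}
```

Because the format is fixed when the counter is created, a caller would
pass the same fmt value it used at creation time.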

Using this information it is possible for userspace to scale the count
it reads from the counter to get an estimate of the true count:

true_count_estimate = count * total_time_enabled / total_time_running

This also lets userspace detect the case where the counter was never
scheduled onto the CPU: total_time_running == 0.
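
The scaling formula and the zero check can be combined in one small
userspace helper.  This is a minimal sketch: scale_count() and its
"valid" out-parameter are hypothetical names, and the use of double
arithmetic (to sidestep overflow in count * total_time_enabled, at the
cost of exactness above 2^53) is a choice made for this example only.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the true event count from a raw count
 * and the two times.  If the counter never ran (time_running == 0) no
 * estimate is possible, so *valid is cleared and 0 is returned.
 */
static uint64_t scale_count(uint64_t count, uint64_t time_enabled,
                            uint64_t time_running, int *valid)
{
    double est;

    if (time_running == 0) {
        *valid = 0;
        return 0;
    }
    *valid = 1;

    /* count * total_time_enabled / total_time_running, rounded to nearest */
    est = (double)count * (double)time_enabled / (double)time_running;
    return (uint64_t)(est + 0.5);
}
```

For example, a counter that read 500 events while running for 1000 ns
out of 2000 ns enabled would be scaled to an estimate of 1000 events.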

This functionality has been requested by the PAPI developers, and will
be generally needed for interpreting the count values from counting
counters correctly.

In the implementation, this keeps 5 time values (in nanoseconds) for
each counter: total_time_enabled and total_time_running are used when
the counter is in state OFF or ERROR and for reporting back to
userspace.  When the counter is in state INACTIVE or ACTIVE, it is the
tstamp_enabled, tstamp_running and tstamp_stopped values that are
relevant, and total_time_enabled and total_time_running are determined
from them.  (tstamp_stopped is only used in INACTIVE state.)  The
reason for doing it like this is that it means that only counters
being enabled or disabled at sched-in and sched-out time need to be
updated.  There are no new loops that iterate over all counters to
update total_time_enabled or total_time_running.
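
One way to picture the timestamp scheme is the toy model below.  Only
the five field names come from the description above; the toy_* names,
the three-state enum, and the exact update rules are assumptions made
for illustration and may differ from the actual patch.  The point it
tries to show is that the totals are derived from "now" minus a
back-dated timestamp, so a counter that is neither starting nor stopping
needs no update at sched-in or sched-out.

```c
#include <stdint.h>

enum toy_state { TOY_OFF, TOY_INACTIVE, TOY_ACTIVE };

struct toy_counter {
    enum toy_state state;
    uint64_t total_time_enabled;   /* valid in OFF state and for reporting */
    uint64_t total_time_running;
    uint64_t tstamp_enabled;       /* back-dated origin of enabled time */
    uint64_t tstamp_running;       /* back-dated origin of running time */
    uint64_t tstamp_stopped;       /* when it last left the CPU (INACTIVE only) */
};

/* Recompute the totals from the timestamps; no-op while OFF. */
static void toy_update_times(struct toy_counter *c, uint64_t now)
{
    uint64_t run_end;

    if (c->state == TOY_OFF)
        return;
    c->total_time_enabled = now - c->tstamp_enabled;
    run_end = (c->state == TOY_INACTIVE) ? c->tstamp_stopped : now;
    c->total_time_running = run_end - c->tstamp_running;
}

/* Enable: back-date both origins so earlier totals carry over. */
static void toy_enable(struct toy_counter *c, uint64_t now)
{
    c->tstamp_enabled = now - c->total_time_enabled;
    c->tstamp_running = now - c->total_time_running;
    c->tstamp_stopped = now;
    c->state = TOY_INACTIVE;
}

/* Sched-in: skip over the interval spent off the CPU. */
static void toy_sched_in(struct toy_counter *c, uint64_t now)
{
    c->tstamp_running += now - c->tstamp_stopped;
    c->state = TOY_ACTIVE;
}

/* Sched-out: remember when running time stopped accumulating. */
static void toy_sched_out(struct toy_counter *c, uint64_t now)
{
    c->tstamp_stopped = now;
    c->state = TOY_INACTIVE;
}

/* Disable: freeze the totals, then stop tracking timestamps. */
static void toy_disable(struct toy_counter *c, uint64_t now)
{
    toy_update_times(c, now);
    c->state = TOY_OFF;
}
```

In this model a counter enabled at t=100, on the CPU from 150 to 250 and
again from 400 to 500, then disabled at 500, ends up with 400 ns enabled
and 200 ns running.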

This also keeps separate child_total_time_running and
child_total_time_enabled fields that get added in when reporting the
totals to userspace.  They are separate fields so that they can be
atomic.  We don't want to use atomics for total_time_running,
total_time_enabled etc., because then we would have to use atomic
sequences to update them, which are slower than regular arithmetic and
memory accesses.
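
The split between plain and atomic fields can be sketched in userspace
with C11 atomics.  This is an analogy only: the kernel has its own
atomic types, and the toy_* names and the "child folds its times into
the parent on exit" scenario are assumptions made for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy model; only the four field names follow the description above. */
struct toy_totals {
    uint64_t total_time_enabled;               /* plain arithmetic */
    uint64_t total_time_running;
    _Atomic uint64_t child_total_time_enabled; /* atomic accumulation */
    _Atomic uint64_t child_total_time_running;
};

static void toy_totals_init(struct toy_totals *t)
{
    t->total_time_enabled = 0;
    t->total_time_running = 0;
    atomic_init(&t->child_total_time_enabled, 0);
    atomic_init(&t->child_total_time_running, 0);
}

/* Hypothetical: a child adds its final times to the parent's child totals. */
static void toy_child_exit(struct toy_totals *parent,
                           uint64_t enabled, uint64_t running)
{
    atomic_fetch_add(&parent->child_total_time_enabled, enabled);
    atomic_fetch_add(&parent->child_total_time_running, running);
}

/* Reporting adds the atomic child totals to the plain fields. */
static uint64_t toy_report_enabled(struct toy_totals *t)
{
    return t->total_time_enabled +
           atomic_load(&t->child_total_time_enabled);
}

static uint64_t toy_report_running(struct toy_totals *t)
{
    return t->total_time_running +
           atomic_load(&t->child_total_time_running);
}
```

Keeping the frequently updated totals as ordinary integers leaves the
common path free of atomic sequences; only the child accumulation and
the final report touch the atomic fields.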

It is possible to measure total_time_running by adding a task_clock
counter to each group of counters, and total_time_enabled can be
measured approximately with a top-level task_clock counter (though
inaccuracies will creep in if you need to disable and enable groups
since it is not possible in general to disable/enable the top-level
task_clock counter simultaneously with another group).  However, that
adds extra overhead - I measured around 15% increase in the context
switch latency reported by lat_ctx (from lmbench) when a task_clock
counter was added to each of 2 groups, and around 25% increase when a
task_clock counter was added to each of 4 groups.  (In both cases a
top-level task-clock counter was also added.)

In contrast, the code added in this commit gives better information
with no overhead that I could measure (in fact in some cases I
measured lower times with this code, but the differences were all less
than one standard deviation).

[ v2: address review comments by Andrew Morton. ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:36 +02:00
vdso32 powerpc/mm: Introduce MMU features 2008-12-21 14:21:16 +11:00
vdso64 powerpc/mm: Introduce MMU features 2008-12-21 14:21:16 +11:00
.gitignore powerpc: Ignore generated vmlinux.lds in git 2008-10-07 14:26:18 +11:00
align.c powerpc: Fix load/store float double alignment handler 2009-02-26 14:02:53 +11:00
asm-offsets.c Merge branch 'linus' into perfcounters/core-v2 2009-04-06 09:02:57 +02:00
audit.c
btext.c
cacheinfo.c powerpc/cacheinfo: Rename cache_dir per-cpu variable 2009-01-13 14:48:02 +11:00
cacheinfo.h powerpc: Rewrite sysfs processor cache info code 2009-01-08 16:25:10 +11:00
clock.c
compat_audit.c
cpu_setup_6xx.S powerpc/mm: e300c2/c3/c4 TLB errata workaround 2009-03-24 13:47:32 +11:00
cpu_setup_44x.S AMCC PPC 460SX redwood SoC platform initial framework 2009-02-14 14:41:29 -05:00
cpu_setup_fsl_booke.S powerpc/fsl-booke: Cleanup init/exception setup to be runtime 2009-01-28 18:16:50 -06:00
cpu_setup_pa6t.S
cpu_setup_ppc970.S powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit 2008-09-15 11:08:35 -07:00
cputable.c powerpc/mm: e300c2/c3/c4 TLB errata workaround 2009-03-24 13:47:32 +11:00
crash_dump.c powerpc: Unify opcode definitions and support 2009-02-23 10:48:56 +11:00
crash.c
dbell.c powerpc: Add support for using doorbells for SMP IPI 2009-02-23 15:53:03 +11:00
dma-iommu.c powerpc: Change u64/s64 to a long long integer type 2009-01-13 14:47:59 +11:00
dma.c powerpc: Add sync_*_for_* to dma_ops 2008-12-03 20:46:36 +11:00
entry_32.S powerpc: Unify opcode definitions and support 2009-02-23 10:48:56 +11:00
entry_64.S Merge branch 'linus' into perfcounters/core-v2 2009-04-06 09:02:57 +02:00
firmware.c
fpu.S
ftrace.c tracing, powerpc: fix powerpc tree and tracing tree interaction 2009-04-02 00:50:24 +02:00
head_8xx.S
head_32.S powerpc/mm: e300c2/c3/c4 TLB errata workaround 2009-03-24 13:47:32 +11:00
head_40x.S
head_44x.S powerpc/44x: Support 16K/64K base page sizes on 44x 2008-12-29 09:53:25 +11:00
head_64.S powerpc/kconfig: Kill PPC_MULTIPLATFORM 2009-03-11 17:11:35 +11:00
head_booke.h Merge commit 'jwb/next' into next 2009-03-03 13:30:03 +11:00
head_fsl_booke.S powerpc: Add support for using doorbells for SMP IPI 2009-02-23 15:53:03 +11:00
ibmebus.c powerpc: struct device - replace bus_id with dev_name(), dev_set_name() 2008-12-16 15:53:38 +11:00
idle_6xx.S
idle_e500.S
idle_power4.S
idle.c powerpc: ftrace, do not latency trace idle 2008-11-20 10:51:15 -08:00
init_task.c take init_fs to saner place 2008-12-31 18:07:42 -05:00
io.c
iomap.c
iommu.c powerpc: Change u64/s64 to a long long integer type 2009-01-13 14:47:59 +11:00
irq.c perf_counter: abstract wakeup flag setting in core to fix powerpc build 2009-04-06 09:30:14 +02:00
isa-bridge.c
kgdb.c kgdb, x86, arm, mips, powerpc: ignore user space single stepping 2008-09-26 10:36:41 -05:00
kprobes.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-01-07 11:31:52 -08:00
l2cr_6xx.S
legacy_serial.c
lparcfg.c powerpc: Change u64/s64 to a long long integer type 2009-01-13 14:47:59 +11:00
machine_kexec_32.c
machine_kexec_64.c powerpc/32: Setup OF properties for kdump 2008-12-23 15:13:29 +11:00
machine_kexec.c powerpc/kexec: Check crash_base for relocatable kernel 2009-01-13 14:47:59 +11:00
Makefile Merge branch 'linus' into perfcounters/core-v2 2009-04-06 09:02:57 +02:00
misc_32.S powerpc/44x: Support 16K/64K base page sizes on 44x 2008-12-29 09:53:25 +11:00
misc_64.S powerpc: Kexec exit should not use magic numbers 2008-10-31 16:11:44 +11:00
misc.S powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit 2008-09-15 11:08:35 -07:00
module_32.c powerpc/ppc32: ftrace, dynamic ftrace to handle modules 2008-11-20 10:52:53 -08:00
module_64.c powerpc: Unify opcode definitions and support 2009-02-23 10:48:56 +11:00
module.c powerpc/mm: Introduce MMU features 2008-12-21 14:21:16 +11:00
msi.c powerpc/PCI: include pci.h in powerpc MSI implementation 2009-03-25 08:54:29 -07:00
nvram_64.c
of_device.c powerpc: struct device - replace bus_id with dev_name(), dev_set_name() 2008-12-16 15:53:38 +11:00
of_platform.c
paca.c powerpc: Update page-in counter for CMM 2008-11-05 22:08:28 +11:00
pci_32.c powerpc/pci: Fix PCI<->OF matching of old style multifunc devices 2009-02-23 10:48:57 +11:00
pci_64.c powerpc/pci: Move hose_list and pci_address_to_pio to pci-common 2009-02-11 16:00:07 +11:00
pci_dn.c
pci-common.c powerpc/pci: Default to dma_direct_ops for pci dma_ops 2009-03-24 13:47:30 +11:00
perf_counter.c perf_counter: record time running and time enabled for each counter 2009-04-06 09:30:36 +02:00
pmc.c
power4-pmu.c perfcounters/powerpc: add support for POWER4 processors 2009-03-06 16:30:57 +11:00
power5-pmu.c perfcounters/powerpc: Add support for POWER5 processors 2009-02-26 15:36:48 +11:00
power5+-pmu.c perfcounters/powerpc: add support for POWER5+ processors 2009-03-06 16:28:37 +11:00
power6-pmu.c powerpc/perf_counter: Add support for POWER6 2009-01-10 16:35:01 +11:00
ppc32.h
ppc970-pmu.c powerpc/perf_counter: Add support for PPC970 family 2009-01-10 16:34:07 +11:00
ppc_ksyms.c powerpc: Export cacheable_memzero as its now used in a driver 2009-01-08 16:25:17 +11:00
ppc_save_regs.S powerpc: Prepare xmon_save_regs for use with kdump 2008-12-23 15:13:28 +11:00
proc_ppc64.c
process.c Simplify copy_thread() 2009-04-02 19:04:51 -07:00
prom_init_check.sh powerpc: Print linux_banner in prom_init 2009-03-11 17:11:33 +11:00
prom_init.c powerpc: Fix prom_init on 32-bit OF machines 2009-03-24 13:43:35 +11:00
prom_parse.c PCI: powerpc: use generic pci_swizzle_interrupt_pin() 2009-01-07 11:12:52 -08:00
prom.c powerpc: Allow debugging of LMBs with lmb=debug 2009-02-11 13:38:00 +11:00
ptrace32.c
ptrace.c
reloc_64.S powerpc: Make the 64-bit kernel as a position-independent executable 2008-09-15 11:08:38 -07:00
rtas_flash.c proc 2/2: remove struct proc_dir_entry::owner 2009-03-31 01:14:44 +04:00
rtas_pci.c powerpc/pci: Fix various pseries PCI hotplug issues 2008-11-06 09:31:52 +11:00
rtas-proc.c
rtas-rtc.c
rtas.c powerpc/pseries: Fix partition migration hang under load 2009-02-23 15:53:04 +11:00
setup_32.c powerpc/32: Wire up the trampoline code for kdump 2008-12-23 15:13:29 +11:00
setup_64.c powerpc/mm: Introduce early_init_mmu() on 64-bit 2009-03-24 13:47:34 +11:00
setup-common.c powerpc: setup default archdata for {of_}platform via bus_register_notifier 2009-03-24 13:47:30 +11:00
setup.h
signal_32.c powerpc: Sanitize stack pointer in signal handling code 2009-03-27 16:58:24 +11:00
signal_64.c powerpc: Sanitize stack pointer in signal handling code 2009-03-27 16:58:24 +11:00
signal.c powerpc: Sanitize stack pointer in signal handling code 2009-03-27 16:58:24 +11:00
signal.h powerpc: Sanitize stack pointer in signal handling code 2009-03-27 16:58:24 +11:00
smp-tbsync.c powerpc: Silence software timebase sync 2008-11-05 22:08:28 +11:00
smp.c Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-02 11:44:09 -08:00
softemu8xx.c
stacktrace.c
suspend.c
swsusp_32.S powerpc/mm: Introduce MMU features 2008-12-21 14:21:16 +11:00
swsusp_64.c
swsusp_asm64.S powerpc: Fix 64-bit hibernation with 64k pages 2008-10-07 14:26:20 +11:00
swsusp.c powerpc/mm: Split mmu_context handling 2008-12-21 14:21:15 +11:00
sys_ppc32.c compat: generic compat get/settimeofday 2008-10-16 11:21:33 -07:00
syscalls.c
sysfs.c powerpc: Fix bugs introduced by sysfs changes 2009-03-27 16:58:24 +11:00
systbl_chk.c
systbl_chk.sh
systbl.S
tau_6xx.c
time.c powerpc: Hook up rtc-generic, and kill rtc-ppc 2009-04-02 01:05:31 +00:00
traps.c powerpc: Add support for using doorbells for SMP IPI 2009-02-23 15:53:03 +11:00
udbg_16550.c powerpc/udbg: Fix lost byte during console handover; change LFCR to CRLF 2009-03-11 17:11:34 +11:00
udbg.c powerpc/udbg: Fix lost byte during console handover; change LFCR to CRLF 2009-03-11 17:11:34 +11:00
vdso.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2008-12-28 16:54:33 -08:00
vecemu.c
vector.S
vio.c workqueue: add to_delayed_work() helper function 2009-04-02 19:04:50 -07:00
vmlinux.lds.S Merge commit 'origin/master' into next 2009-03-30 14:04:53 +11:00