1
Commit Graph

45 Commits

Author SHA1 Message Date
Frederic Weisbecker
5a116dd279 perf tools: Use kernel bitmap library
Use the kernel bitmap library for internal perf tools uses.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1255792354-11304-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-19 09:26:34 +02:00
Julia Lawall
f39cdf25bf perf tools: Move dereference after NULL test
In each case, if the NULL test on thread is needed, then the
dereference should be after the NULL test.

A simplified version of the semantic match that detects this
problem is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@match exists@
expression x, E;
identifier fld;
@@

* x->fld
  ... when != \(x = E\|&x\)
* x == NULL
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
LKML-Reference: <Pine.LNX.4.64.0910170842500.9213@ask.diku.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-17 09:29:10 +02:00
Ingo Molnar
b226f744d4 Merge branch 'linus' into perf/core
Merge reason: pick up tools/perf/ changes from upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 08:44:44 +02:00
Arnaldo Carvalho de Melo
d5b889f2ec perf tools: Move threads & last_match to threads.c
This was just being copy'n'pasted all over.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091013141629.GD21809@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 17:12:18 +02:00
Mike Galbraith
55ffb7a6bd perf sched: Add -C option to measure on a specific CPU
To refresh, trying to sched record only one CPU results in bogus
latencies as below.

I fixed^Wmade it stop doing the bad thing today, by
following task migration events properly.

Before:

  marge:/root/tmp # taskset -c 1 perf sched record -C 0 -- sleep 10
  marge:/root/tmp # perf sched lat
   -----------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms |
   -----------------------------------------------------------------------------------------
    Xorg:4943             |      1.290 ms |        1 | avg: 1670.132 ms | max: 1670.132 ms |
    hald-addon-stor:3569  |      0.091 ms |        3 | avg:  658.609 ms | max: 1975.797 ms |
    hald-addon-stor:3573  |      0.209 ms |        4 | avg:  499.138 ms | max: 1990.565 ms |
    audispd:4270          |      0.012 ms |        1 | avg:    0.015 ms | max:    0.015 ms |
  ....

  marge:/root/tmp # perf sched trace|grep 'Xorg:4943'
           swapper-0     [000]   401.184013288: sched_stat_runtime: task: Xorg:4943 runtime: 1233188 [ns], vruntime: 19105169779 [ns]
   rt2870TimerQHan-4947  [000]   402.854140127: sched_stat_wait: task: Xorg:4943 wait: 580073 [ns]
   rt2870TimerQHan-4947  [000]   402.854141770: sched_migrate_task: task Xorg:4943 [140] from: 1  to: 0
   rt2870TimerQHan-4947  [000]   402.854143854: sched_stat_wait: task: Xorg:4943 wait: 0 [ns]
   rt2870TimerQHan-4947  [000]   402.854145397: sched_switch: task rt2870TimerQHan:4947 [140] (D) ==> Xorg:4943 [140]
              Xorg-4943  [000]   402.854193133: sched_stat_runtime: task: Xorg:4943 runtime: 56546 [ns], vruntime: 11766332500 [ns]
              Xorg-4943  [000]   402.854196842: sched_switch: task Xorg:4943 [140] (S) ==> swapper:0 [140]

After:

  marge:/root/tmp # taskset -c 1 perf sched record -C 0 -- sleep 10
  marge:/root/tmp # perf sched lat
   -----------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms |
   -----------------------------------------------------------------------------------------
    amarokapp:11150       |    271.297 ms |      878 | avg:    0.130 ms | max:    1.057 ms |
    konsole:5965          |      1.370 ms |       12 | avg:    0.092 ms | max:    0.855 ms |
    Xorg:4943             |    179.980 ms |     1109 | avg:    0.087 ms | max:    1.206 ms |
    hald-addon-stor:3574  |      0.212 ms |        9 | avg:    0.040 ms | max:    0.169 ms |
    hald-addon-stor:3570  |      0.223 ms |        9 | avg:    0.037 ms | max:    0.223 ms |
    klauncher:5864        |      0.550 ms |        8 | avg:    0.032 ms | max:    0.048 ms |

The 'Maximum delay ms' results are now sane.

Signed-off-by: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 09:10:55 +02:00
Randy Dunlap
cbef79a82a perf tools: Fix const char type propagation
The following perf build warnings/errors in function
argument types:

  builtin-sched.c:1894: warning: passing argument 1 of 'sort_dimension__add' discards qualifiers from pointer target type
  util/trace-event-parse.c:685: warning: passing argument 2 of 'read_expected' discards qualifiers from pointer target type
  util/trace-event-parse.c:741: warning: passing argument 4 of 'test_type_token' discards qualifiers from pointer target type
  util/trace-event-parse.c:706: warning: passing argument 2 of 'read_expected_item' discards qualifiers from pointer target type

... trigger because older GCC is not able to prove that
sort_dimension__add() does not change the string.

Some goes for test_type_token().

Fix this by improving type consistency.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20091005131729.78444bfb.randy.dunlap@oracle.com>
[ Also remove ugly type cast now unnecessary. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 08:35:00 +02:00
Frederic Weisbecker
97ea1a7fa6 perf tools: Fix thread comm resolution in perf sched
This reverts commit 9a92b479b2 ("perf
tools: Improve thread comm resolution in perf sched") and fixes the
real bug.

The bug was elsewhere:

We are failing to resolve thread names in perf sched because the
table of threads we are building, on top of comm events, has a per
process granularity. But perf sched, unlike the other perf tools,
needs a per thread granularity as we are profiling every tasks
individually.

So fix it by building our threads table using the tid instead of
the pid as the thread identifier.

v2: Revert the previous fix - it is not really needed

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1255028657-11158-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08 21:10:21 +02:00
Frederic Weisbecker
9a92b479b2 perf tools: Improve thread comm resolution in perf sched
When we get sched traces that involve a task that was already
created before opening the event, we won't have the comm event for
it.

So if we can't find the comm event for a given thread, we look at
the traces that may contain these informations.

Before:

 ata/1:371             |      0.000 ms |        1 | avg: 3988.693 ms | max: 3988.693 ms |
 kondemand/1:421       |      0.096 ms |        3 | avg:  345.346 ms | max: 1035.989 ms |
 kondemand/0:420       |      0.025 ms |        3 | avg:  421.332 ms | max:  964.014 ms |
 :5124:5124            |      0.103 ms |        5 | avg:   74.082 ms | max:  277.194 ms |
 :6244:6244            |      0.691 ms |        9 | avg:  125.655 ms | max:  271.306 ms |
 firefox:5080          |      0.924 ms |        5 | avg:   53.833 ms | max:  257.828 ms |
 npviewer.bin:6225     |     21.871 ms |       53 | avg:   22.462 ms | max:  220.835 ms |
 :6245:6245            |      9.631 ms |       21 | avg:   41.864 ms | max:  213.349 ms |

After:

 ata/1:371             |      0.000 ms |        1 | avg: 3988.693 ms | max: 3988.693 ms |
 kondemand/1:421       |      0.096 ms |        3 | avg:  345.346 ms | max: 1035.989 ms |
 kondemand/0:420       |      0.025 ms |        3 | avg:  421.332 ms | max:  964.014 ms |
 firefox:5124          |      0.103 ms |        5 | avg:   74.082 ms | max:  277.194 ms |
 npviewer.bin:6244     |      0.691 ms |        9 | avg:  125.655 ms | max:  271.306 ms |
 firefox:5080          |      0.924 ms |        5 | avg:   53.833 ms | max:  257.828 ms |
 npviewer.bin:6225     |     21.871 ms |       53 | avg:   22.462 ms | max:  220.835 ms |
 npviewer.bin:6245     |      9.631 ms |       21 | avg:   41.864 ms | max:  213.349 ms |

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1255012632-7882-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08 16:56:33 +02:00
Frederic Weisbecker
016e92fbc9 perf tools: Unify perf.data mapping and events handling
This librarizes the perf.data file mapping and handling in various
perf tools, roughly reducing the amount of code and fixing the
places that mmap from beginning of the file whereas we want to mmap
from the beginning of the data, leading to page fault because the
mmap window is too small since the trace info are written in the
file too.

TODO:

 - convert perf timechart too

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20091007104729.GD5043@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08 16:56:32 +02:00
Frederic Weisbecker
03456a158d perf tools: Merge trace.info content into perf.data
This drops the trace.info file and move its contents into the
common perf.data file.

This is done by creating a new trace_info section into this file. A
user of perf headers needs to call perf_header__set_trace_info() to
save the trace meta informations into the perf.data file.

A file created by perf after his patch is unsupported by previous
version because the size of the headers have increased.

That said, it's two new fields that have been added in the end of
the headers, and those could be ignored by previous versions if
they just handled the dynamic header size and then ignore the
unknow part. The offsets guarantee the compatibility. We'll do a
-stable fix for that.

But current previous versions handle the header size using its
static size, not dynamic, then it's not backward compatible with
trace records.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20091006213643.GA5343@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-07 08:36:10 +02:00
Arnaldo Carvalho de Melo
a80deb622d perf sched: Remove dead code
Several variables are not used at all, cut'n'paste leftovers.

Also check if the sample_type is RAW earlier, to avoid needless
searches.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-30 13:57:57 +02:00
Ingo Molnar
cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Mike Galbraith
4b77a72977 perf sched: Add --input=file option to builtin-sched.c
perf sched record passes unparsed args on to perf record, so
specifying an output file via perf sched record -o FILE (cmd) just
works.  Ergo, provide an option to specify input file as well.

Also add the missing 'map' command to help.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253254944.20589.11.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18 08:36:06 +02:00
Ingo Molnar
40749d0ff4 perf sched: Determine the number of CPUs automatically
For 'perf sched map' output, determine max_cpu automatically,
instead of the static default of 15.

[ v2: use sysconf() pointed out by Arjan van de Ven <arjan@infradead.org> ]

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-17 22:09:19 +02:00
Ingo Molnar
0ec04e16d0 perf sched: Add 'perf sched map' scheduling event map printout
This prints a textual context-switching outline of workload
captured via perf sched record.

For example, on a 16 CPU box it outputs:

   N1  O1  .   .   .   S1  .   .   .   B0  .  *I0  C1  .   M1  .    23002.773423 secs
   N1  O1  .  *Q0  .   S1  .   .   .   B0  .   I0  C1  .   M1  .    23002.773423 secs
   N1  O1  .   Q0  .   S1  .   .   .   B0  .  *R1  C1  .   M1  .    23002.773485 secs
   N1  O1  .   Q0  .   S1  .  *S0  .   B0  .   R1  C1  .   M1  .    23002.773478 secs
  *L0  O1  .   Q0  .   S1  .   S0  .   B0  .   R1  C1  .   M1  .    23002.773523 secs
   L0  O1  .  *.   .   S1  .   S0  .   B0  .   R1  C1  .   M1  .    23002.773531 secs
   L0  O1  .   .   .   S1  .   S0  .   B0  .   R1  C1 *T1  M1  .    23002.773547 secs T1 => irqbalance:2089
   L0  O1  .   .   .   S1  .   S0  .  *P0  .   R1  C1  T1  M1  .    23002.773549 secs
  *N1  O1  .   .   .   S1  .   S0  .   P0  .   R1  C1  T1  M1  .    23002.773566 secs
   N1  O1  .   .   .  *J0  .   S0  .   P0  .   R1  C1  T1  M1  .    23002.773571 secs
   N1  O1  .   .   .   J0  .   S0 *B0  P0  .   R1  C1  T1  M1  .    23002.773592 secs
   N1  O1  .   .   .   J0  .  *U0  B0  P0  .   R1  C1  T1  M1  .    23002.773582 secs
   N1  O1  .   .   .  *S1  .   U0  B0  P0  .   R1  C1  T1  M1  .    23002.773604 secs
   N1  O1  .   .   .   S1  .   U0  B0 *.   .   R1  C1  T1  M1  .    23002.773615 secs
   N1  O1  .   .   .   S1  .   U0  B0  .   .  *K0  C1  T1  M1  .    23002.773631 secs
   N1  O1  .  *M0  .   S1  .   U0  B0  .   .   K0  C1  T1  M1  .    23002.773624 secs
   N1  O1  .   M0  .   S1  .   U0 *.   .   .   K0  C1  T1  M1  .    23002.773644 secs
   N1  O1  .   M0  .   S1  .   U0  .   .   .  *R1  C1  T1  M1  .    23002.773662 secs
   N1  O1  .   M0  .   S1  .  *.   .   .   .   R1  C1  T1  M1  .    23002.773648 secs
   N1  O1  .  *.   .   S1  .   .   .   .   .   R1  C1  T1  M1  .    23002.773680 secs
   N1  O1  .   .   .  *L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773717 secs
  *N0  O1  .   .   .   L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773709 secs
  *N1  O1  .   .   .   L0  .   .   .   .   .   R1  C1  T1  M1  .    23002.773747 secs

Columns stand for individual CPUs, from CPU0 to CPU15, and the
two-letter shortcuts stand for tasks that are running on a CPU.

'*' denotes the CPU that had the event.

A dot signals an idle CPU.

New tasks are assigned new two-letter shortcuts - when they occur
first they are printed. In the above example 'T1' stood for irqbalance:

      T1 => irqbalance:2089

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-16 16:41:30 +02:00
Ingo Molnar
80ed0987f3 perf sched: Make idle thread and comm/pid names more consistent
Peter noticed that we have 3 ways of referring to the idle thread:

 [idle]:0
 swapper:0
 swapper-0

Standardize on 'swapper:0'.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-16 12:15:47 +02:00
Ingo Molnar
c8a3775104 perf sched: Sanity check context switch events
Use 'perf sched latency' to track the current task based on
context-switch events, and flag the cases where there's some
impossible transition: such as a PID being switched out that
was not switched in.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-16 12:08:28 +02:00
Ingo Molnar
dc02bf7178 perf sched: Account for lost events, increase default buffering
Output such lost event and state machine weirdness stats:

   TOTAL:                |  14974.910 ms |    46384 |
  ---------------------------------------------------
   INFO: 8.865% lost events (19132 out of 215819, in 8 chunks)
   INFO: 0.198% state machine bugs (49 out of 24708) (due to lost events?)

And increase buffering to -m 1024 (4 MB) by default. Since we
use output multiplexing that kind of space is needed.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-16 11:48:05 +02:00
mingo
39aeb52f99 perf sched: Add support for sched:sched_stat_runtime events
This allows more precise 'perf sched latency' output:

 ---------------------------------------------------------------------------------------
  Task                  |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
 ---------------------------------------------------------------------------------------
  ksoftirqd/0-4         |    0.010 ms |        2 | avg:    2.476 ms | max:    2.977 ms |
  perf-12328            |   15.844 ms |       66 | avg:    1.118 ms | max:    9.979 ms |
  bdi-default-235       |    0.009 ms |        1 | avg:    0.998 ms | max:    0.998 ms |
  events/1-8            |    0.020 ms |        2 | avg:    0.998 ms | max:    0.998 ms |
  events/0-7            |    0.018 ms |        2 | avg:    0.992 ms | max:    0.996 ms |
  sleep-12329           |    0.742 ms |        3 | avg:    0.906 ms | max:    2.289 ms |
  sshd-12122            |    0.163 ms |        2 | avg:    0.283 ms | max:    0.562 ms |
  loop-getpid-lon-12322 | 1023.636 ms |       69 | avg:    0.208 ms | max:    5.996 ms |
  loop-getpid-lon-12321 | 1038.638 ms |        5 | avg:    0.073 ms | max:    0.171 ms |
  migration/1-5         |    0.000 ms |        1 | avg:    0.006 ms | max:    0.006 ms |
 ---------------------------------------------------------------------------------------
  TOTAL:                | 2079.078 ms |      153 |
 -------------------------------------------------

Also, streamline the code a bit more, add asserts for various state
machine failures (they should be debugged if they occur) and fix
a few odd ends.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 20:08:23 +02:00
mingo
08f69e6c2e perf sched: Print PIDs too
Often it's useful to know the PID of the task as well - print it
out too.

( While at it, reformat the output to be a bit more
  paste-into-commit-logs friendly. )

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 18:34:57 +02:00
Ingo Molnar
d11533893b perf sched: Fix 'perf sched latency' output on 32-bit systems
Before:

  -----------------------------------------------------------------------------------
   Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
  -----------------------------------------------------------------------------------
   perf              |4853313.251 ms |       10 | avg:    0.046 ms | max:    0.337 ms |
   flush-8:0         |2426659.202 ms |        5 | avg:    0.015 ms | max:    0.016 ms |
   sleep             |485331.966 ms |        1 | avg:    0.012 ms | max:    0.012 ms |
   ksoftirqd/1       |485331.320 ms |        1 | avg:    0.005 ms | max:    0.005 ms |
  -----------------------------------------------------------------------------------
   TOTAL:            |8250635.739 ms |       17 |
  ---------------------------------------------

After:

  -----------------------------------------------------------------------------------
   Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
  -----------------------------------------------------------------------------------
   perf              |    0.206 ms |       10 | avg:    0.046 ms | max:    0.337 ms |
   flush-8:0         |    2.680 ms |        5 | avg:    0.015 ms | max:    0.016 ms |
   sleep             |    0.662 ms |        1 | avg:    0.012 ms | max:    0.012 ms |
   ksoftirqd/1       |    0.015 ms |        1 | avg:    0.005 ms | max:    0.005 ms |
  -----------------------------------------------------------------------------------
   TOTAL:            |    3.563 ms |       17 |
  ---------------------------------------------

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 18:22:53 +02:00
Ingo Molnar
ea57c4f520 perf tools: Implement counter output multiplexing
Finish the -M/--multiplex option implementation:

 - separate it out from group_fd

 - correctly set it via the ioctl and dont mmap counters that
   are multiplexed

 - modify the perf record event loop to deal with buffer-less
   counters.

 - remove the -g option from perf sched record

 - account for unordered events in perf sched latency

 - (add -f to perf sched record to ease measurements)

 - skip idle threads (pid==0) in latency output

The result is better latency output by 'perf sched latency':

 -----------------------------------------------------------------------------------
  Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
 -----------------------------------------------------------------------------------
  ksoftirqd/8       |    0.071 ms |        2 | avg:    0.458 ms | max:    0.913 ms |
  at-spi-registry   |    0.609 ms |       19 | avg:    0.013 ms | max:    0.023 ms |
  perf              |    3.316 ms |       16 | avg:    0.013 ms | max:    0.054 ms |
  Xorg              |    0.392 ms |       19 | avg:    0.011 ms | max:    0.018 ms |
  sleep             |    0.537 ms |        2 | avg:    0.009 ms | max:    0.009 ms |
 -----------------------------------------------------------------------------------
  TOTAL:            |    4.925 ms |       58 |
 ---------------------------------------------

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 15:45:11 +02:00
Frederic Weisbecker
aa1ab9d26a perf tools: Fix processing of randomly serialized sched traces
Currently it's possible to meet such too high latency results
with 'perf sched latency'.

 -----------------------------------------------------------------------------------
 Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
 -----------------------------------------------------------------------------------
 xfce4-panel       |    0.222 ms |        2 | avg: 4718.345 ms | max: 9436.493 ms |
 scsi_eh_3         |    3.962 ms |       36 | avg:   55.957 ms | max: 1977.829 ms |

The origin is on traces that are sometimes badly serialized across cpus.
For example the raw traces that raised such results for xfce4-panel:

(1)          [init]-0     [000]  1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120]
(2)     xfce4-panel-4569  [000]  1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140]
(3)            Xorg-4276  [001]  1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(4)            Xorg-4276  [001]  1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(5)            perf-5219  [000]  1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120]

The traces are processed in the order they arrive. Then in (2),
xfce4-panel sleeps, it is first waken up in (3) and eventually
scheduled in (5).

The latency reported is then 1504 - 1495 = 9 secs, as reported by perf
sched. But this is wrong, we are confident in the fact the traces are
nicely serialized while we should actually more trust the timestamps.

If we reorder by timestamps we get:

(1)            Xorg-4276  [001]  1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(2)          [init]-0     [000]  1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120]
(3)     xfce4-panel-4569  [000]  1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140]
(4)            Xorg-4276  [001]  1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(5)            perf-5219  [000]  1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120]

Now the trace make more sense, xfce4-panel is sleeping. Then it is
woken up in (1), scheduled in (2)
It goes to sleep in (3), woken up in (4) and scheduled in (5).

Now, latency captured between (1) and (2) is of 39 us.
And between (4) and (5) it is 2.1 ms.

Such pattern of bad serializing is the origin of the high latencies
reported by perf sched.

Basically, we need to check whether wake up time is higher than
schedule out time. If it's not the case, we need to tag the current
work atom as invalid.

Beside that, we may need to work later on a better ordering of the
traces given by the kernel.

After this patch:

xfce4-session     |    0.221 ms |        1 | avg:    0.538 ms | max:    0.538 ms |

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 15:45:10 +02:00
Frederic Weisbecker
d13025222c perf tools: Add an option to multiplex counters in a single channel
Add an option to multiplex counters output in the channel of
the group leader, ie: the first counter opened:

	-M --multiplex

The effect is better serialized samples. This is especially
useful for tracepoint samples that need to be well serialized
for their post-processing.

Also make use of this option in 'perf sched'.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 09:52:23 +02:00
Ingo Molnar
c13f0d3c81 perf sched: Add 'perf sched trace', improve documentation
Alias 'perf sched trace' to 'perf trace', for workflow completeness.

Add a bit of documentation for perf sched.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 17:55:23 +02:00
Ingo Molnar
1fc35b29b4 perf sched: Implement the 'perf sched record' subcommand
Implement the 'perf sched record' subcommand that adds a
default list of events, turns on raw sampling and system-wide
tracing and passes off the rest of the command to perf record.

This is more convenient than having to specify the events all
the time.

Before:

 $ perf record -a -R -e sched:sched_switch:r -e sched:sched_stat_wait:r -e sched:sched_stat_sleep:r -e sched:sched_stat_iowait:r -e sched:sched_process_exit:r -e sched:sched_process_fork:r -e sched:sched_wakeup:r -e sched:sched_migrate_task:r -c 1 sleep 1

After:

 $ perf sched record -f sleep 1

Also fix an assumption in the event string parser that assumed
that strings passed in can be modified. (In this case they wont
be as they come from a readonly constant section.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:51 +02:00
Ingo Molnar
b5fae128e4 perf sched: Clean up PID sorting logic
Use a sort list for thread atoms insertion as well - instead of
hardcoded for PID.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:50 +02:00
Ingo Molnar
b1ffe8f3e0 perf sched: Finish latency => atom rename and misc cleanups
- Rename 'latency' field/variable names to the better 'atom' ones

 - Reduce the number of #include lines and consolidate them

 - Gather file scope variables at the top of the file

 - Remove unused bits

No change in functionality.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:49 +02:00
Ingo Molnar
f2858d8ad9 perf sched: Add 'perf sched latency' and 'perf sched replay'
Separate the option parsing cleanly and add two variants:

 - 'perf sched latency' (can be abbreviated via 'perf sched lat')
 - 'perf sched replay'  (can be abbreviated via 'perf sched rep')

Also add a repeat count option to replay and add a separation
set of options for replay.

Do the sorting setup only in the latency sub-command.

Display separate help screens for 'perf sched' and
'perf sched replay -h' - i.e. further separation of the
sub-commands.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:49 +02:00
Frederic Weisbecker
daa1d7a5ea perf sched: Implement multidimensional sorting
Implement multidimensional sorting on perf sched so that
you can sort either by number of switches, latency average,
latency maximum, runtime.

perf sched -l -s avg,max  (this is the default)

-----------------------------------------------------------------------------------
 Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
 gnome-power-man   |    0.113 ms |        1 | avg: 4998.531 ms | max: 4998.531 ms |
 xfdesktop         |    1.190 ms |        7 | avg:  136.475 ms | max:  940.933 ms |
 xfce-mcs-manage   |    2.194 ms |       22 | avg:   38.534 ms | max:  735.174 ms |
 notification-da   |    2.749 ms |       31 | avg:   27.436 ms | max:  731.791 ms |
 xfce4-session     |    3.343 ms |       28 | avg:   26.796 ms | max:  734.891 ms |
 xfwm4             |    3.159 ms |       22 | avg:   12.406 ms | max:  241.333 ms |
 xchat             |   42.789 ms |      214 | avg:   11.886 ms | max:  100.349 ms |
 xfce4-terminal    |    5.386 ms |       22 | avg:   11.414 ms | max:  241.611 ms |
 firefox           |  151.992 ms |      123 | avg:    9.543 ms | max:  153.717 ms |
 xfce4-panel       |   24.324 ms |       47 | avg:    8.189 ms | max:  242.352 ms |
 :5090             |    6.932 ms |      111 | avg:    8.131 ms | max:  102.665 ms |
 events/0          |    0.758 ms |       12 | avg:    1.964 ms | max:   21.879 ms |
 Xorg              |  280.558 ms |      340 | avg:    1.864 ms | max:   99.526 ms |
 geany             |   63.391 ms |      295 | avg:    1.099 ms | max:    9.334 ms |
 reiserfs/0        |    0.039 ms |        2 | avg:    0.854 ms | max:    1.487 ms |
 kondemand/0       |    8.251 ms |      245 | avg:    0.691 ms | max:   34.372 ms |

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:48 +02:00
Frederic Weisbecker
7362262687 perf sched: Fix nsec to msec conversion
We are dividing a time in ns by 1e9. This is a nsec to sec
conversion. What we want is msecs. Fix it by dividing by 1e6.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:48 +02:00
Frederic Weisbecker
66685678a0 perf sched: Export the total, max latency and total runtime to thread atoms list
Add a field in the thread atom list that keeps track of the
total and max latencies and also the total runtime. This makes
a faster output and also prepares for sorting.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:47 +02:00
Frederic Weisbecker
c6ced61112 perf sched: Add involuntarily sleeping task in work atoms
Currently in perf sched, we are measuring the scheduler wakeup
latencies.

Now we also want measure the time a task wait to be scheduled
after it gets preempted.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:47 +02:00
Frederic Weisbecker
1756220530 perf sched: Rename struct lat_snapshot to struct work atoms
To measures the latencies, we capture the sched atoms data into
a specific structure named struct lat_snapshot.

As this structure can be used for other purposes of scheduler
profiling and mirrors what happens in a thread work atom, lets
rename it to struct work_atom and propagate this renaming in
other functions and structures names to keep it coherent.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:46 +02:00
Ingo Molnar
3e304147cd perf sched: Output runtime and context switch totals
After:

-----------------------------------------------------------------------------------
 Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------
 make              |    0.678 ms |       13 | avg:    0.018 ms | max:    0.050 ms |
 gcc               |    0.014 ms |        2 | avg:    0.320 ms | max:    0.627 ms |
 gcc               |    0.000 ms |        2 | avg:    0.185 ms | max:    0.369 ms |
...
-----------------------------------------------------------------------------------
 TOTAL:            |   21.316 ms |       63 |
---------------------------------------------

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:45 +02:00
Ingo Molnar
ea92ed5a8f perf sched: Add runtime stats
Extend the latency tracking structure with scheduling atom
runtime info - and sum it up during per task display.

(Also clean up a few details.)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:45 +02:00
Ingo Molnar
d9340c1db3 perf sched: Display time in milliseconds, reorganize output
After:

-----------------------------------------------------------------------------------
 Task              |  runtime ms | switches | average delay ms | maximum delay ms |
-----------------------------------------------------------------------------------
 migration/0       |    0.000 ms |        1 | avg:    0.047 ms | max:    0.047 ms |
 ksoftirqd/0       |    0.000 ms |        1 | avg:    0.039 ms | max:    0.039 ms |
 migration/1       |    0.000 ms |        3 | avg:    0.013 ms | max:    0.016 ms |
 migration/3       |    0.000 ms |        2 | avg:    0.003 ms | max:    0.004 ms |
 migration/4       |    0.000 ms |        1 | avg:    0.022 ms | max:    0.022 ms |
 distccd           |    0.000 ms |        1 | avg:    0.004 ms | max:    0.004 ms |
 distccd           |    0.000 ms |        1 | avg:    0.014 ms | max:    0.014 ms |
 distccd           |    0.000 ms |        2 | avg:    0.000 ms | max:    0.000 ms |
 distccd           |    0.000 ms |        2 | avg:    0.012 ms | max:    0.019 ms |
 distccd           |    0.000 ms |        1 | avg:    0.002 ms | max:    0.002 ms |
 as                |    0.000 ms |        2 | avg:    0.019 ms | max:    0.019 ms |
 as                |    0.000 ms |        3 | avg:    0.015 ms | max:    0.017 ms |
 as                |    0.000 ms |        1 | avg:    0.009 ms | max:    0.009 ms |
 perf              |    0.000 ms |        1 | avg:    0.001 ms | max:    0.001 ms |
 gcc               |    0.000 ms |        1 | avg:    0.021 ms | max:    0.021 ms |
 run-mozilla.sh    |    0.000 ms |        2 | avg:    0.010 ms | max:    0.017 ms |
 mozilla-plugin-   |    0.000 ms |        1 | avg:    0.006 ms | max:    0.006 ms |
 gcc               |    0.000 ms |        2 | avg:    0.013 ms | max:    0.013 ms |
-----------------------------------------------------------------------------------

(The runtime ms column is not filled in yet.)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:44 +02:00
Ingo Molnar
46f392c97f perf sched: Clean up latency and replay sub-commands
- Separate the latency and the replay commands more cleanly

 - Use consistent naming

 - Display help page on 'perf sched' outlining comments,
   instead of aborting

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:44 +02:00
Frederic Weisbecker
cdce9d738b perf sched: Add sched latency profiling
Add the -l --latency option that reports statistics about the
scheduler latencies.

For now, the latencies are measured in the following sequence
scope:

- task A is sleeping (D or S state)
- task B wakes up A
         ^
         |
         |

   latency timeframe

         |
         |
         v
- task A is scheduled in

Start by recording every scheduler events:

	perf record -e sched:*

and then fetch the results:

	perf sched -l

 Tasks                     count          total              avg            max

migration/0                  2             39849            19924           28826
ksoftirqd/0                  7            756383           108054          373014
migration/1                  5             45391             9078           10452
ksoftirqd/1                  2            399055           199527          359130
events/0                     8           4780110           597513         4500250
events/1                     9           6353057           705895         2986012
kblockd/0                   42          37805097           900121         5077684

The snapshot are in nanoseconds.

- Count: number of snapshots taken for the given task
- Total: total latencies in nanosec
- Avg  : average of latency between wake up and sched in
- Max  : max snapshot latency

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:43 +02:00
Frederic Weisbecker
419ab0d6a9 perf sched: Make it easier to plug in new sub profilers
Create a sched event structure of handlers in which various
sched events reader can plug their own callbacks.

This makes easier the addition of new perf sched sub commands.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:42 +02:00
Frederic Weisbecker
4653881802 perf sched: Fix bad event alignment
perf sched raises the following error when it meets a sched
switch event:

perf: builtin-sched.c:286: register_pid: Assertion `!(pid >= 65536)' failed.
Abandon

Currently in x86-64, the sched switch events have a hole in the
middle of the structure:

	u16 common_type;
	u8 common_flags;
	u8 common_preempt_count;
	u32 common_pid;
	u32 common_tgid;

	char prev_comm[16];
	u32 prev_pid;
	u32 prev_prio;
			<--- there
	u64 prev_state;
	char next_comm[16];
	u32 next_pid;
	u32 next_prio;

Gcc inserts a 4 bytes hole there for prev_state to be u64
aligned. And the events are exported to userspace with this
hole.

But in userspace, from perf sched, we fetch it using a
structure that has a new field in the beginning: u32 size. This
is because our trace is exported with its size as a field. But
now that we have this new field, the hole in the middle
disappears because it makes prev_state becoming well aligned.

And since we are using a pointer to the raw trace using this
struct, instead of reading prev_state, we are reading the hole.

We could fix it by keeping the size seperate from the struct
but actually there a lot of other potential problems: some
fields may be saved as long in a 64 bits system and later read
as long in a 32 bits system. Also this direct cast doesn't care
about the endianness differences between the host traced
machine and the machine in which we do the post processing.

So instead of using such dangerous direct casts, fetch the
values using the trace parsing API that already takes care of
all these problems.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:41 +02:00
Ingo Molnar
ad236fd23b perf sched: Tighten up the code
Various small cleanups - removal of debug printks and dead
functions, etc.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:39 +02:00
Ingo Molnar
fbf9482911 perf sched: Implement the scheduling workload replay engine
Integrate the schedbench.c bits with the raw trace events
that we get from the perf machinery, and activate the
workload replayer/simulator.

Example of a captured 'make -j' workload:

$ perf sched

  run measurement overhead: 90 nsecs
  sleep measurement overhead: 2724743 nsecs
  the run test took 1000081 nsecs
  the sleep test took 2981111 nsecs
  version = 0.5
  ...
  nr_run_events:        70
  nr_sleep_events:      66
  nr_wakeup_events:     9
  target-less wakeups:  71
  multi-target wakeups: 47
  run events optimized: 139
  task      0 (                perf:      6607), nr_events: 2
  task      1 (                perf:      6608), nr_events: 6
  task      2 (                    :         0), nr_events: 1
  task      3 (                make:      6609), nr_events: 5
  task      4 (                  sh:      6610), nr_events: 4
  task      5 (                make:      6611), nr_events: 6
  task      6 (                  sh:      6612), nr_events: 4
  task      7 (                make:      6613), nr_events: 5
  task      8 (        migration/11:        25), nr_events: 1
  task      9 (        migration/13:        29), nr_events: 1
  task     10 (        migration/15:        33), nr_events: 1
  task     11 (         migration/9:        21), nr_events: 1
  task     12 (                  sh:      6614), nr_events: 4
  task     13 (                make:      6615), nr_events: 5
  task     14 (                  sh:      6616), nr_events: 4
  task     15 (                make:      6617), nr_events: 7
  task     16 (         migration/3:         9), nr_events: 1
  task     17 (         migration/5:        13), nr_events: 1
  task     18 (         migration/7:        17), nr_events: 1
  task     19 (         migration/1:         5), nr_events: 1
  task     20 (                  sh:      6618), nr_events: 4
  task     21 (                make:      6619), nr_events: 5
  task     22 (                  sh:      6620), nr_events: 4
  task     23 (                make:      6621), nr_events: 10
  task     24 (                  sh:      6623), nr_events: 3
  task     25 (                 gcc:      6624), nr_events: 4
  task     26 (                 gcc:      6625), nr_events: 4
  task     27 (                 gcc:      6626), nr_events: 5
  task     28 (            collect2:      6627), nr_events: 5
  task     29 (                  sh:      6622), nr_events: 1
  task     30 (                make:      6628), nr_events: 7
  task     31 (                  sh:      6630), nr_events: 4
  task     32 (                 gcc:      6631), nr_events: 4
  task     33 (                  sh:      6629), nr_events: 1
  task     34 (                 gcc:      6632), nr_events: 4
  task     35 (                 gcc:      6633), nr_events: 4
  task     36 (            collect2:      6634), nr_events: 4
  task     37 (                make:      6635), nr_events: 8
  task     38 (                  sh:      6637), nr_events: 4
  task     39 (                  sh:      6636), nr_events: 1
  task     40 (                 gcc:      6638), nr_events: 4
  task     41 (                 gcc:      6639), nr_events: 4
  task     42 (                 gcc:      6640), nr_events: 4
  task     43 (            collect2:      6641), nr_events: 4
  task     44 (                make:      6642), nr_events: 6
  task     45 (                  sh:      6643), nr_events: 5
  task     46 (                  sh:      6644), nr_events: 3
  task     47 (                  sh:      6645), nr_events: 4
  task     48 (                make:      6646), nr_events: 6
  task     49 (                  sh:      6647), nr_events: 3
  task     50 (                make:      6648), nr_events: 5
  task     51 (                  sh:      6649), nr_events: 5
  task     52 (                  sh:      6650), nr_events: 6
  task     53 (                make:      6651), nr_events: 4
  task     54 (                make:      6652), nr_events: 5
  task     55 (                make:      6653), nr_events: 4
  task     56 (                make:      6654), nr_events: 4
  task     57 (                make:      6655), nr_events: 5
  task     58 (                  sh:      6656), nr_events: 4
  task     59 (                 gcc:      6657), nr_events: 9
  task     60 (         ksoftirqd/3:        10), nr_events: 1
  task     61 (                 gcc:      6658), nr_events: 4
  task     62 (                make:      6659), nr_events: 5
  task     63 (                  sh:      6660), nr_events: 3
  task     64 (                 gcc:      6661), nr_events: 5
  task     65 (            collect2:      6662), nr_events: 4
  ------------------------------------------------------------
  #1  : 256.745, ravg: 256.74, cpu: 0.00 / 0.00
  #2  : 439.372, ravg: 275.01, cpu: 0.00 / 0.00
  #3  : 411.971, ravg: 288.70, cpu: 0.00 / 0.00
  #4  : 385.500, ravg: 298.38, cpu: 0.00 / 0.00
  #5  : 366.526, ravg: 305.20, cpu: 0.00 / 0.00
  #6  : 381.281, ravg: 312.81, cpu: 0.00 / 0.00
  #7  : 410.756, ravg: 322.60, cpu: 0.00 / 0.00
  #8  : 368.009, ravg: 327.14, cpu: 0.00 / 0.00
  #9  : 408.098, ravg: 335.24, cpu: 0.00 / 0.00
  #10 : 368.582, ravg: 338.57, cpu: 0.00 / 0.00

I.e. we successfully analyzed the trace, replayed it
via real threads and measured the replayed workload's
scheduling properties.

This is how it looked like in 'top' output:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  7164 mingo     20   0 1434m 8080  888 R 57.0  0.1   0:02.04 :perf
  7165 mingo     20   0 1434m 8080  888 R 41.8  0.1   0:01.52 :perf
  7228 mingo     20   0 1434m 8080  888 R 39.8  0.1   0:01.44 :gcc
  7225 mingo     20   0 1434m 8080  888 R 33.8  0.1   0:01.26 :gcc
  7202 mingo     20   0 1434m 8080  888 R 31.2  0.1   0:01.16 :sh
  7222 mingo     20   0 1434m 8080  888 R 25.2  0.1   0:00.96 :sh
  7211 mingo     20   0 1434m 8080  888 R 21.9  0.1   0:00.82 :sh
  7213 mingo     20   0 1434m 8080  888 D 19.2  0.1   0:00.74 :sh
  7194 mingo     20   0 1434m 8080  888 D 18.6  0.1   0:00.72 :make

There's still various kinks in it - more patches to come.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:38 +02:00
Ingo Molnar
ec156764d4 perf sched: Import schedbench.c
Import the schedbench.c tool that i wrote some time ago to
simulate scheduler behavior but never finished. It's a good
basis for perf sched nevertheless.

Most of its guts are not hooked up to the perf event loop
yet - that will be done in the patches to come.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:37 +02:00
Ingo Molnar
0a02ad9331 perf: Add 'perf sched' tool
This turn-key tool allows scheduler measurements to be
conducted and the results be displayed numerically.

First baby step towards that goal: clone the new command off of
perf trace.

Fix a few other details along the way:

 - add (minimal) perf trace documentation

 - reorder a few places

 - list perf trace in the mainporcelain list as well
   as it's a very useful utility.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13 10:22:36 +02:00