linux

Author	SHA1	Message	Date
Ingo Molnar	d1dedb52ac	panic, smp: provide smp_send_stop() wrapper on UP too Impact: cleanup, no code changed Remove an ugly #ifdef CONFIG_SMP from panic(), by providing an smp_send_stop() wrapper on UP too. LKML-Reference: <49B91A7E.76E4.0078.0@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 11:24:31 +01:00
Ingo Molnar	ffd71da4e3	panic: decrease oops_in_progress only after having done the panic Impact: eliminate secondary warnings during panic() We can panic() in a number of difficult, atomic contexts, hence we use bust_spinlocks(1) in panic() to increase oops_in_progress, which prevents various debug checks we have in place. But in practice this protection only covers the first few printk's done by panic() - it does not cover the later attempt to stop all other CPUs and kexec(). If a secondary warning triggers in one of those facilities that can make the panic message scroll off. So do bust_spinlocks(0) only much later in panic(). (which code is only reached if panic policy is relaxed that it can return after a warning message) Reported-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <49B91A7E.76E4.0078.0@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 11:06:47 +01:00
Ingo Molnar	cd80a8142e	Merge branch 'x86/core' into core/ipi	2009-03-13 11:05:58 +01:00
Ingo Molnar	641cd4cfcd	generic-ipi: eliminate WARN_ON()s during oops/panic Do not output smp-call related warnings in the oops/panic codepath. Reported-by: Jan Beulich <jbeulich@novell.com> Acked-by: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49B91A7E.76E4.0078.0@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 10:47:34 +01:00
Ingo Molnar	88f502fedb	futex: remove the pointer math from double_unlock_hb, fix Impact: fix double unlock crash Thomas Gleixner noticed that the simplified double_unlock_hb() became ... too unsophisticated: in the hb1 == hb2 case it will do a double unlock. Reported-by: Thomas Gleixner <tglx@linutronix.de> Cc: Darren Hart <dvhltc@us.ibm.com> LKML-Reference: <20090312221118.11146.68610.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 10:32:07 +01:00
Zhaolei	fa9d13cf13	ftrace: don't try to __ftrace_replace_code on !FTRACE_FL_CONVERTED rec Do __ftrace_replace_code for !FTRACE_FL_CONVERTED rec will always fail, we should ignore this rec. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: "Steven Rostedt ;" <rostedt@goodmis.org> LKML-Reference: <49BA2472.4060206@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 10:25:06 +01:00
Zhaolei	b00f0b6dc1	ftrace: avoid double-free of dyn_ftrace If dyn_ftrace is freed before ftrace_release(), ftrace_release() will free it again and make ftrace_free_records wrong. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: "Steven Rostedt ;" <rostedt@goodmis.org> LKML-Reference: <49BA23D9.1050900@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 10:25:06 +01:00
Ingo Molnar	62a394eb77	Merge branches 'tracing/ftrace' and 'tracing/syscalls'; commit 'v2.6.29-rc8' into tracing/core	2009-03-13 10:23:39 +01:00
Frederic Weisbecker	ee08c6eccb	tracing/ftrace: syscall tracing infrastructure, basics Provide basic callbacks to do syscall tracing. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1236401580-5758-2-git-send-email-fweisbec@gmail.com> [ simplified it to a trace_printk() for now. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 06:25:43 +01:00
Steven Rostedt	899039e874	softirq: no need to have SOFTIRQ in softirq name Impact: clean up It is redundant to have 'SOFTIRQ' in the softirq names. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-13 00:43:33 -04:00
Steven Rostedt	7f96f93f02	tracing: move binary buffers into per cpu directory The binary_buffers directory in /debugfs/tracing held the files to read the trace buffers in a binary format. This held one file per CPU buffer. But we also have a per_cpu directory that holds a way to read the pretty-print formats. This patch moves the binary buffers into the per_cpu_directory: # ls /debug/tracing/per_cpu/cpu1/ trace trace_pipe trace_pipe_raw The new name is called "trace_pipe_raw". The binary buffers always acted similar to trace_pipe, except that they produce raw data. Requested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-13 00:37:42 -04:00
Rusty Russell	c69fc56de1	cpumask: use topology_core_cpumask/topology_thread_cpumask instead of cpu_core_map/cpu_sibling_map Impact: cleanup This is presumably what those definitions are for, and while all archs define cpu_core_map/cpu_sibling map, that's changing (eg. x86 wants to change it to a pointer). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:46 +10:30
Steven Rostedt	bdc067582b	tracing: add comment for use of double __builtin_consant_p Impact: documentation The use of the double __builtin_contant_p checks in the event_trace_printk can be confusing to developers and reviewers. This patch adds a comment to explain why it is there. Requested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> LKML-Reference: <20090313122235.43EB.A69D9226@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-13 00:15:46 -04:00
Steven Rostedt	eb1871f343	tracing: left align location header in stack_trace Ingo Molnar suggested, instead of: Depth Size Location (27 entries) ----- ---- -------- 0) 2880 48 lock_timer_base+0x2b/0x4f 1) 2832 80 __mod_timer+0x33/0xe0 2) 2752 16 __ide_set_handler+0x63/0x65 To have it be: Depth Size Location (27 entries) ----- ---- -------- 0) 2880 48 lock_timer_base+0x2b/0x4f 1) 2832 80 __mod_timer+0x33/0xe0 2) 2752 16 __ide_set_handler+0x63/0x65 Requested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-13 00:00:58 -04:00
Ingo Molnar	f6411fe7e0	Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core	2009-03-13 04:50:44 +01:00
Steven Rostedt	5cc9854888	ring-buffer: document reader page design In a private email conversation I explained how the ring buffer page worked by using silly ASCII art. Ingo suggested that I add that to the comments of the code. Here it is. Requested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 22:24:17 -04:00
Steven Rostedt	f28e55765e	tracing: show event name in trace for TRACE_EVENT created events Unlike TRACE_FORMAT() macros, the TRACE_EVENT() macros do not show the event name in the trace file. Knowing the event type in the trace output is very useful. Instead of: task swapper:0 [140] ==> ntpd:3308 [120] We now have: sched_switch: task swapper:0 [140] ==> ntpd:3308 [120] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 22:00:19 -04:00
KOSAKI Motohiro	889a6c3672	tracing: Don't use tracing_record_cmdline() in workqueue tracer fix commit `c3ffc7a40b` "Don't use tracing_record_cmdline() in workqueue tracer" has a race window. find_task_by_vpid() requires task_list_lock(). LKML-Reference: <20090313090042.43CD.A69D9226@jp.fujitsu.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:23:47 -04:00
Jason Baron	39842323ce	tracing: tracepoints for softirq entry/exit - tracepoints Introduce softirq entry/exit tracepoints. These are useful for augmenting existing tracers, and to figure out softirq frequencies and timings. [ s/irq_softirq_/softirq_/ for trace point names and Fixed printf format in TRACE_FORMAT macro - Steven Rostedt ] LKML-Reference: <20090312183603.GC3352@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:20:58 -04:00
Jason Baron	5d592b44b2	tracing: tracepoints for softirq entry/exit - add softirq-to-name array Create a 'softirq_to_name' array, which is indexed by softirq #, so that we can easily convert between the softirq index # and its name, in order to get more meaningful output messages. LKML-Reference: <20090312183336.GB3352@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:02 -04:00
Steven Rostedt	e447e1df2e	tracing: explain why stack tracer is empty If the stack tracing is disabled (by default) the stack_trace file will only contain the header: # cat /debug/tracing/stack_trace Depth Size Location (0 entries) ----- ---- -------- This can be frustrating to a developer that does not realize that the stack tracer is disabled. This patch adds the following text: # cat /debug/tracing/stack_trace Depth Size Location (0 entries) ----- ---- -------- # # Stack tracer disabled # # To enable the stack tracer, either add 'stacktrace' to the # kernel command line # or 'echo 1 > /proc/sys/kernel/stack_tracer_enabled' # Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:01 -04:00
Steven Rostedt	2da03ecee6	tracing: fix stack tracer header The stack tracer use to look like this: # cat /debug/tracing/stack_trace Depth Size Location (57 entries) ----- ---- -------- 0) 5088 16 mempool_alloc_slab+0x16/0x18 1) 5072 144 mempool_alloc+0x4d/0xfe 2) 4928 16 scsi_sg_alloc+0x48/0x4a [scsi_mod] Now it looks like this: # cat /debug/tracing/stack_trace Depth Size Location (57 entries) ----- ---- -------- 0) 5088 16 mempool_alloc_slab+0x16/0x18 1) 5072 144 mempool_alloc+0x4d/0xfe 2) 4928 16 scsi_sg_alloc+0x48/0x4a [scsi_mod] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:01 -04:00
Steven Rostedt	7975a2be16	tracing: export trace formats to user space The binary printk saves a pointer to the format string in the ring buffer. On output, the format is processed. But if the user is reading the ring buffer through a binary interface, the pointer is meaningless. This patch creates a file called printk_formats that maps the pointers to the formats. # cat /debug/tracing/printk_formats 0xffffffff80713d40 : "irq_handler_entry: irq=%d handler=%s\n" 0xffffffff80713d48 : "lock_acquire: %s%s%s\n" 0xffffffff80713d50 : "lock_release: %s\n" Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:01 -04:00
Steven Rostedt	e9fb2b6d58	tracing: have event_trace_printk use static tracer Impact: speed up on event tracing The event_trace_printk is currently a wrapper function that calls trace_vprintk. Because it uses a variable for the fmt it misses out on the optimization of using the binary printk. This patch makes event_trace_printk into a macro wrapper to use the fmt as the same as the trace_printks. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:00 -04:00
Steven Rostedt	828275574e	tracing: make bprint event use the proper event id The bprint record is using TRACE_PRINT when it should be TRACE_BPRINT. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:00 -04:00
Frederic Weisbecker	48ead02030	tracing/core: bring back raw trace_printk for dynamic formats strings Impact: fix callsites with dynamic format strings Since its new binary implementation, trace_printk() internally uses static containers for the format strings on each callsites. But the value is assigned once at build time, which means that it can't take dynamic formats. So this patch unearthes the raw trace_printk implementation for the callers that will need trace_printk to be able to carry these dynamic format strings. The trace_printk() macro will use the appropriate implementation for each callsite. Most of the time however, the binary implementation will still be used. The other impact of this patch is that mmiotrace_printk() will use the old implementation because it calls the low level trace_vprintk and we can't guess here whether the format passed in it is dynamic or not. Some parts of this patch have been written by Steven Rostedt (most notably the part that chooses the appropriate implementation for each callsites). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:15:00 -04:00
Steven Rostedt	db526ca329	tracing: show that buffer size is not expanded Impact: do not confuse user on small trace buffer sizes When the system boots up, the trace buffer is small to conserve memory. It is only two pages per online CPU. When the tracer is used, it expands to the default value. This can confuse the user if they look at the buffer size and see only 7, but then later they see 1408. # cat /debug/tracing/buffer_size_kb 7 # echo sched_switch > /debug/tracing/current_tracer # cat /debug/tracing/buffer_size_kb 1408 This patch tries to help remove this confustion by showing that the buffer has not been expanded. # cat /debug/tracing/buffer_size_kb 7 (expanded: 1408) Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:14:59 -04:00
Steven Rostedt	8aabee573d	ring-buffer: remove unneeded get_online_cpus Impact: speed up and remove possible races The get_online_cpus was added to the ring buffer because the original design would free the ring buffer on a CPU that was being taken off line. The final design kept the ring buffer around even when the CPU was taken off line. This is to allow a user to still read the information on that ring buffer. Most of the get_online_cpus are no longer needed since the ring buffer will not disappear from the use cases. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:14:59 -04:00
Steven Rostedt	59222efe2d	ring-buffer: use CONFIG_HOTPLUG_CPU not CONFIG_HOTPLUG The hotplug code in the ring buffers is for use with CPU hotplug, not generic hotplug. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:14:59 -04:00
Steven Rostedt	1027fcb206	tracing: protect ring_buffer_expanded with trace_types_lock Impact: prevent races with ring_buffer_expanded This patch places the expanding of the tracing buffer under the protection of the trace_types_lock mutex. It is highly unlikely that there would be any contention, but better safe than sorry. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:14:58 -04:00
Steven Rostedt	a123c52b46	tracing: fix comments about trace buffer resizing Impact: cleanup Some of the comments about the trace buffer resizing is gobbledygook. And I wonder why people question if I'm a native English speaker. This patch makes the comments make a bit more sense. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-12 21:14:58 -04:00
Ingo Molnar	25d500067d	Merge branch 'linus' into core/ipi	2009-03-13 02:14:25 +01:00
Steven Rostedt	51b643b404	Merge branch 'tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/ftrace-merge	2009-03-12 21:12:46 -04:00
Ingo Molnar	480c93df5b	Merge branch 'core/locking' into tracing/ftrace	2009-03-13 01:33:21 +01:00
Ingo Molnar	d820ac4c2f	locking: rename trace_softirq_[enter\|exit] => lockdep_softirq_[enter\|exit] Impact: cleanup The naming clashes with upcoming softirq tracepoints, so rename the APIs to lockdep_*(). Requested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 01:32:36 +01:00
Ingo Molnar	3c1f67d60e	Merge branch 'linus' into core/locking	2009-03-13 01:29:17 +01:00
Darren Hart	f061d35150	futex: remove the pointer math from double_unlock_hb Impact: simplify code I mistakenly included the pointer value ordering in the double_unlock_hb() in my previous patch. It's only necessary in the double_lock_hb() function. This patch removes it. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20090312221118.11146.68610.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 01:15:46 +01:00
Magnus Damm	eb53b4e8fe	irq: export remove_irq() and setup_irq() symbols Export the setup_irq() and remove_irq() symbols. I'd like to export these functions since I have timer code that needs to use setup_irq() early on (too early for request_irq()), and the same code can also be compiled as a module. Signed-off-by: Magnus Damm <damm@igel.co.jp> LKML-Reference: <20090312120559.2926.82371.sendpatchset@rx1.opensource.se> [ changed to _GPL as these are special APIs deep inside the irq layer. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 13:16:33 +01:00
Magnus Damm	cbf94f0682	irq: match remove_irq() args with setup_irq() Modify remove_irq() to match setup_irq(). Signed-off-by: Magnus Damm <damm@igel.co.jp> LKML-Reference: <20090312120551.2926.43942.sendpatchset@rx1.opensource.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 13:16:33 +01:00
Magnus Damm	f21cfb258d	irq: add remove_irq() for freeing of setup_irq() irqs Impact: add new API This patch adds a remove_irq() function for releasing interrupts requested with setup_irq(). Without this patch we have no way of releasing such interrupts since free_irq() today tries to kfree() the irqaction passed with setup_irq(). Signed-off-by: Magnus Damm <damm@igel.co.jp> LKML-Reference: <20090312120542.2926.56609.sendpatchset@rx1.opensource.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 13:16:32 +01:00
Ingo Molnar	f8cb22cbb8	Merge branch 'linus' into irq/genirq	2009-03-12 13:16:18 +01:00
Darren Hart	e4dc5b7a36	futex: clean up fault logic Impact: cleanup Older versions of the futex code held the mmap_sem which had to be dropped in order to call get_user(), so a two-pronged fault handling mechanism was employed to handle faults of the atomic operations. The mmap_sem is no longer held, so get_user() should be adequate. This patch greatly simplifies the logic and improves legibility. Build and boot tested on a 4 way Intel x86_64 workstation. Passes basic pthread_mutex and PI tests out of ltp/testcases/realtime. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20090312075612.9856.48612.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 11:20:57 +01:00
Darren Hart	e8f6386c01	futex: unlock before returning -EFAULT Impact: rt-mutex failure case fix futex_lock_pi can potentially return -EFAULT with the rt_mutex held. This seems like the wrong thing to do as userspace should assume -EFAULT means the lock was not taken. Even if it could figure this out, we'd be leaving the pi_state->owner in an inconsistent state. This patch unlocks the rt_mutex prior to returning -EFAULT to userspace. Build and boot tested on a 4 way Intel x86_64 workstation. Passes basic pthread_mutex and PI tests out of ltp/testcases/realtime. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20090312075606.9856.88729.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 11:20:57 +01:00
Darren Hart	16f4993f4e	futex: use current->time_slack_ns for rt tasks too RT tasks should set their timer slack to 0 on their own. This patch removes the 'if (rt_task()) slack = 0;' block in futex_wait. Build and boot tested on a 4 way Intel x86_64 workstation. Passes basic pthread_mutex and PI tests out of ltp/testcases/realtime. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <20090312075559.9856.28822.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 11:20:57 +01:00
Darren Hart	5eb3dc62fc	futex: add double_unlock_hb() Impact: cleanup The futex code uses double_lock_hb() which locks the hb->lock's in pointer value order. There is no parallel unlock routine, and the code unlocks them in name order, ignoring pointer value. This patch adds double_unlock_hb() to refactor the duplicated code segments. Build and boot tested on a 4 way Intel x86_64 workstation. Passes basic pthread_mutex and PI tests out of ltp/testcases/realtime. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20090312075552.9856.48021.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 11:20:56 +01:00
Darren Hart	de87fcc124	futex: additional (get\|put)_futex_key() fixes Impact: fix races futex_requeue and futex_lock_pi still had some bad (get\|put)_futex_key() usage. This patch adds the missing put_futex_keys() and corrects a goto in futex_lock_pi() to avoid a double get. Build and boot tested on a 4 way Intel x86_64 workstation. Passes basic pthread_mutex and PI tests out of ltp/testcases/realtime. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20090312075545.9856.75152.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 11:20:56 +01:00
Darren Hart	b2d0994b13	futex: update futex commentary Impact: cleanup The futex_hash_bucket can be a bit confusing when first looking at the code as it is a shared queue (and futex_q isn't a queue at all, but rather an element on the queue). The mmap_sem is no longer held outside of the futex_handle_fault() routine, yet numerous comments refer to it. The fshared argument is no an integer. I left some of these comments along as they are simply removed in future patches. Some of the commentary refering to futexes by virtual page mappings was not very clear, and completely accurate (as for shared futexes both the page and the offset are used to determine the key). For the purposes of the function description, just referring to "the futex" seems sufficient. With hashed futexes we now access the page after the hash-bucket is locked, and not only after it is enqueued. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20090312075537.9856.29954.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-12 11:20:55 +01:00
Steven Rostedt	554f786e28	ring-buffer: only allocate buffers for online cpus Impact: save on memory Currently, a ring buffer was allocated for each "possible_cpus". On some systems, this is the same as NR_CPUS. Thus, if a system defined NR_CPUS = 64 but it only had 1 CPU, we could have possibly 63 useless ring buffers taking up space. With a default buffer of 3 megs, this could be quite drastic. This patch changes the ring buffer code to only allocate ring buffers for online CPUs. If a CPU goes off line, we do not free the buffer. This is because the user may still have trace data in that buffer that they would like to look at. Perhaps in the future we could add code to delete a ring buffer if the CPU is offline and the ring buffer becomes empty. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-11 22:15:27 -04:00
Steven Rostedt	9aba60fe6e	tracing: fix trace_wait to know to wait on all cpus or just one Impact: fix to task live locking on reading trace_pipe on one CPU The same code is used for both trace_pipe (all CPUS) and the per_cpu trace_pipe file. When there is no data to read, it will check for signals and wait on the trace wait queue. The problem happens with the per_cpu wait. The trace_wait code checks all CPUs. Thus, if there's data in another CPU buffer, then it will exit the wait, without checking for signals or waiting on the wait queue. It would then try to read the empty buffer, and since that will just return nothing, then it will try to wait again. Unfortunately, that will again fail due to there still being data in the other buffers. This ends up with a live lock for the task. This patch fixes the trace_wait to be aware that the iterator may only be waiting on a single buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-11 22:15:25 -04:00
Steven Rostedt	1852fcce18	tracing: expand the ring buffers when an event is activated To save memory, the tracer ring buffers are set to a minimum. The activating of a trace expands the ring buffer size. This patch adds this expanding, when an event is activated. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-11 22:15:24 -04:00
Steven Rostedt	73c5162aa3	tracing: keep ring buffer to minimum size till used Impact: less memory impact on systems not using tracer When the kernel boots up that has tracing configured, it allocates the default size of the ring buffer. This currently happens to be 1.4Megs per possible CPU. This is quite a bit of wasted memory if the system is never using the tracer. The current solution is to keep the ring buffers to a minimum size until the user uses them. Once a tracer is piped into the current_tracer the ring buffer will be expanded to the default size. If the user changes the size of the ring buffer, it will take the size given by the user immediately. If the user adds a "ftrace=" to the kernel command line, then the ring buffers will be set to the default size on initialization. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-11 22:15:22 -04:00
Ingo Molnar	aecfcde920	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-11 20:47:23 +01:00
Mike Galbraith	df1c99d416	sched: add avg_overlap decay Impact: more precise avg_overlap metric - better load-balancing avg_overlap is used to measure the runtime overlap of the waker and wakee. However, when a process changes behaviour, eg a pipe becomes un-congested and we don't need to go to sleep after a wakeup for a while, the avg_overlap value grows stale. When running we use the avg runtime between preemption as a measure for avg_overlap since the amount of runtime can be correlated to cache footprint. The longer we run, the less likely we'll be wanting to be migrated to another CPU. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236709131.25234.576.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-11 11:31:50 +01:00
Ingo Molnar	78b020d035	Merge branches 'x86/cleanups', 'x86/kexec', 'x86/mce2' and 'linus' into x86/core	2009-03-11 10:49:15 +01:00
Benjamin Herrenschmidt	e14eee56c2	Merge commit 'origin/master' into next	2009-03-11 17:10:07 +11:00
Dhaval Giani	be50b8342d	kernel/user.c: fix a memory leak when freeing up non-init usernamespaces users We were returning early in the sysfs directory cleanup function if the user belonged to a non init usernamespace. Due to this a lot of the cleanup was not done and we were left with a leak. Fix the leak. Reported-by: Serge Hallyn <serue@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-10 15:55:11 -07:00
Ingo Molnar	e2b8b28085	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-10 22:55:31 +01:00
Ingo Molnar	4dd163a051	Merge branches 'tracing/ftrace', 'tracing/textedit' and 'linus' into tracing/core	2009-03-10 22:54:23 +01:00
Steven Rostedt	80370cb758	tracing: use raw spinlocks for trace_vprintk Impact: prevent locking up by lockdep tracer The lockdep tracer uses trace_vprintk and thus trace_vprintk can not call back into lockdep without locking up. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 17:16:35 -04:00
Peter Zijlstra	6cc3c6e12b	trace_clock: fix preemption bug Using the function_graph tracer in recent kernels generates a spew of preemption BUGs. Fix this by not requiring trace_clock_local() users to disable preemption themselves. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 20:03:01 +01:00
Steven Rostedt	ef18012b24	tracing: remove funky whitespace in the trace code Impact: clean up There existed a lot of <space><tab>'s in the tracing code. This patch removes them. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 14:13:14 -04:00
Steven Rostedt	0e3d0f0566	tracing: update comments to match event code macros Impact: clean up / comments The comments that described the ftrace macros to manipulate the TRACE_EVENT and TRACE_FORMAT macros no longer match the code. This patch updates them. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 13:12:58 -04:00
Steven Rostedt	30a8fecc2d	tracing: flip the TP_printk and TP_fast_assign in the TRACE_EVENT macro Impact: clean up In trying to stay consistant with the C style format in the TRACE_EVENT macro, it makes more sense to do the printk after the assigning of the variables. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 12:41:38 -04:00
Steven Rostedt	2314c4ae14	tracing: add back the available_events file The event directory files type and available_types were no longer needed with the new TRACE_EVENT_FORMAT macros, they were deleted. But by accident the available_events file was also removed. This patch brings it back. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 12:04:02 -04:00
Peter Zijlstra	57310a98a3	sched: optimize ttwu vs group scheduling Impact: micro-optimization We can avoid the sched domain walk on try_to_wake_up() when we know there are no groups. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236603381.8389.455.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 16:43:36 +01:00
Ingo Molnar	8c54436ae9	Merge branches 'sched/cleanups' and 'linus' into sched/core	2009-03-10 16:34:43 +01:00
Steven Rostedt	40e26815fa	tracing: do not allow modifying the ftrace events via the event files Impact: fix to prevent crash on calling NULL function pointer The ftrace internal records have their format exported via the event system under the ftrace subsystem. These are only for exporting the format to allow binary readers to be able to parse them in a binary output. The ftrace subsystem events can only be enabled via the ftrace tracers and do not have a registering function. The event files expect the event record to have registering function and will call it directly. Passing in a ftrace subsystem event will cause the kernel to crash because it will execute a NULL pointer. This patch prevents the ftrace subsystem from being viewable to the event enabling files. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 11:32:40 -04:00
Steven Rostedt	ce8eb2bf05	tracing: fix printk format specifier Impact: clean up The offsetof and sizeof are of type size_t, and instead of typecasting them to unsigned int for printk formatting, one could just use %zu. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 10:14:35 -04:00
Américo Wang	bbb76b552a	ptrace: remove a useless goto Impact: cleanup Obviously, this goto is useless. Remove it. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Roland McGrath <roland@redhat.com> LKML-Reference: <20090310093447.GC3179@hack> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 13:39:29 +01:00
KOSAKI Motohiro	bbcd306359	tracing: Don't assume possible cpu list have continuous numbers "for (++cpu ; cpu < num_possible_cpus(); cpu++)" statement assumes possible cpus have continuous number - but that's a wrong assumption. Insted, cpumask_next() should be used. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090310104437.A480.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 10:20:30 +01:00
Ingo Molnar	8293dd6f86	Merge branch 'x86/core' into tracing/ftrace Semantic merge: kernel/trace/trace_functions_graph.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 10:17:48 +01:00
Ingo Molnar	9a1043d19c	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-10 09:57:16 +01:00
Ingo Molnar	12e87e36e0	Merge branches 'tracing/doc', 'tracing/ftrace', 'tracing/printk' and 'linus' into tracing/core	2009-03-10 09:56:25 +01:00
Ingo Molnar	467c88fee5	Merge branches 'x86/apic', 'x86/asm', 'x86/fixmap', 'x86/memtest', 'x86/mm', 'x86/urgent', 'linus' and 'core/percpu' into x86/core	2009-03-10 09:26:38 +01:00
Steven Rostedt	157587d7ac	tracing: remove obsolete TRACE_EVENT_FORMAT macro Impact: clean up The TRACE_EVENT_FORMAT macro is no longer used by trace points and only the DECLARE_TRACE, TRACE_FORMAT or TRACE_EVENT macros should be used by them. Although the TRACE_EVENT_FORMAT macro is still used by the internal tracing utility, it should not be used in core kernel code. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 00:35:12 -04:00
Steven Rostedt	da4d03020c	tracing: new format for specialized trace points Impact: clean up and enhancement The TRACE_EVENT_FORMAT macro looks quite ugly and is limited in its ability to save data as well as to print the record out. Working with Ingo Molnar, we came up with a new format that is much more pleasing to the eye of C developers. This new macro is more C style than the old macro, and is more obvious to what it does. Here's the example. The only updated macro in this patch is the sched_switch trace point. The old method looked like this: TRACE_EVENT_FORMAT(sched_switch, TP_PROTO(struct rq rq, struct task_struct prev, struct task_struct next), TP_ARGS(rq, prev, next), TP_FMT("task %s:%d ==> %s:%d", prev->comm, prev->pid, next->comm, next->pid), TRACE_STRUCT( TRACE_FIELD(pid_t, prev_pid, prev->pid) TRACE_FIELD(int, prev_prio, prev->prio) TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN], next_comm, TP_CMD(memcpy(TRACE_ENTRY->next_comm, next->comm, TASK_COMM_LEN))) TRACE_FIELD(pid_t, next_pid, next->pid) TRACE_FIELD(int, next_prio, next->prio) ), TP_RAW_FMT("prev %d:%d ==> next %s:%d:%d") ); The above method is hard to read and requires two format fields. The new method: / * Tracepoint for task switches, performed by the scheduler: * * (NOTE: the 'rq' argument is not used by generic trace events, * but used by the latency tracer plugin. ) / TRACE_EVENT(sched_switch, TP_PROTO(struct rq rq, struct task_struct prev, struct task_struct next), TP_ARGS(rq, prev, next), TP_STRUCT__entry( __array( char, prev_comm, TASK_COMM_LEN ) __field( pid_t, prev_pid ) __field( int, prev_prio ) __array( char, next_comm, TASK_COMM_LEN ) __field( pid_t, next_pid ) __field( int, next_prio ) ), TP_printk("task %s:%d [%d] ==> %s:%d [%d]", __entry->prev_comm, __entry->prev_pid, __entry->prev_prio, __entry->next_comm, __entry->next_pid, __entry->next_prio), TP_fast_assign( memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN); __entry->prev_pid = prev->pid; __entry->prev_prio = prev->prio; memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN); __entry->next_pid = next->pid; __entry->next_prio = next->prio; ) ); This macro is called TRACE_EVENT, it is broken up into 5 parts: TP_PROTO: the proto type of the trace point TP_ARGS: the arguments of the trace point TP_STRUCT_entry: the structure layout of the entry in the ring buffer TP_printk: the printk format TP_fast_assign: the method used to write the entry into the ring buffer The structure is the definition of how the event will be saved in the ring buffer. The printk is used by the internal tracing in case of an oops, and the kernel needs to print out the format of the record to the console. This the TP_printk gives a means to show the records in a human readable format. It is also used to print out the data from the trace file. The TP_fast_assign is executed directly. It is basically like a C function, where the __entry is the handle to the record. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 00:35:07 -04:00
Steven Rostedt	9cc26a261d	tracing: use generic __stringify Impact: clean up This removes the custom made STR(x) macros in the tracer and uses the generic __stringify macro instead. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 00:35:05 -04:00
Steven Rostedt	2939b0469d	tracing: replace TP<var> with TP_<var> Impact: clean up The macros TPPROTO, TPARGS, TPFMT, TPRAWFMT, and TPCMD all look a bit ugly. This patch adds an underscore to their names. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 00:35:04 -04:00
Steven Rostedt	156b5f172a	tracing: typecast sizeof and offsetof to unsigned int Impact: fix compiler warnings On x86_64 sizeof and offsetof are treated as long, where as on x86_32 they are int. This patch typecasts them to unsigned int to avoid one arch giving warnings while the other does not. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-10 00:34:03 -04:00
Oleg Nesterov	2d5516cbb9	copy_process: fix CLONE_PARENT && parent_exec_id interaction CLONE_PARENT can fool the ->self_exec_id/parent_exec_id logic. If we re-use the old parent, we must also re-use ->parent_exec_id to make sure exit_notify() sees the right ->xxx_exec_id's when the CLONE_PARENT'ed task exits. Also, move down the "p->parent_exec_id = p->self_exec_id" thing, to place two different cases together. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-09 13:23:25 -07:00
Heiko Carstens	6d5b5acca9	Fix fixpoint divide exception in acct_update_integrals Frans Pop reported the crash below when running an s390 kernel under Hercules: Kernel BUG at 000738b4 verbose debug info unavailable! fixpoint divide exception: 0009 #1! SMP Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod dasd_eckd_mod dasd_mod CPU: 0 Not tainted 2.6.27.19 #13 Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18) Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 Krnl GPRS: 00000000 000007d0 7fffffff fffff830 00000000 ffffffff 00000002 0f9ed9b8 00000000 00008ca0 00000000 0f9ed9b8 0f9edda4 8007386e 0f4f7ec8 0f4f7e98 Krnl Code: 800738aa: a71807d0 lhi %r1,2000 800738ae: 8c200001 srdl %r2,1 800738b2: 1d21 dr %r2,%r1 >800738b4: 5810d10e l %r1,270(%r13) 800738b8: 1823 lr %r2,%r3 800738ba: 4130f060 la %r3,96(%r15) 800738be: 0de1 basr %r14,%r1 800738c0: 5800f060 l %r0,96(%r15) Call Trace: ( <000000000004fdea>! blocking_notifier_call_chain+0x1e/0x2c) <0000000000038502>! do_exit+0x106/0x7c0 <0000000000038c36>! do_group_exit+0x7a/0xb4 <0000000000038c8e>! SyS_exit_group+0x1e/0x30 <0000000000021c28>! sysc_do_restart+0x12/0x16 <0000000077e7e924>! 0x77e7e924 Reason for this is that cpu time accounting usually only happens from interrupt context, but acct_update_integrals gets also called from process context with interrupts enabled. So in acct_update_integrals we may end up with the following scenario: Between reading tsk->stime/tsk->utime and tsk->acct_timexpd an interrupt happens which updates accouting values. This causes acct_timexpd to be greater than the former stime + utime. The subsequent calculation of dtime = cputime_sub(time, tsk->acct_timexpd); will be negative and the division performed by cputime_to_jiffies(dtime) will generate an exception since the result won't fit into a 32 bit register. In order to fix this just always disable interrupts while accessing any of the accounting values. Reported by: Frans Pop <elendil@planet.nl> Tested by: Frans Pop <elendil@planet.nl> Cc: stable@kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-09 08:13:35 -07:00
KOSAKI Motohiro	c3ffc7a40b	tracing: Don't use tracing_record_cmdline() in workqueue tracer Impact: improve workqueue tracer output Currently, /sys/kernel/debug/tracing/trace_stat/workqueues can display wrong and strange thread names. Why? Currently, ftrace has tracing_record_cmdline()/trace_find_cmdline() convenience function that implements a task->comm string cache. This can avoid unnecessary memcpy overhead and the workqueue tracer uses it. However, in general, any trace statistics feature shouldn't use tracing_record_cmdline() because trace statistics can display very old process. Then comm cache can return wrong string because recent process overrides the cache. Fortunately, workqueue trace guarantees that displayed processes are live. Thus we can search comm string from PID at display time. <before> % cat workqueues # CPU INSERTED EXECUTED NAME # \| \| \| \| 7 431913 431913 kondemand/7 7 0 0 tail 7 21 21 git 7 0 0 ls 7 9 9 cat 7 832632 832632 unix_chkpwd 7 236292 236292 ls Note: tail, git, ls, cat unix_chkpwd are obiously not workqueue thread. <after> % cat workqueues # CPU INSERTED EXECUTED NAME # \| \| \| \| 7 510 510 kondemand/7 7 0 0 kmpathd/7 7 15 15 ata/7 7 0 0 aio/7 7 11 11 kblockd/7 7 1063 1063 work_on_cpu/7 7 167 167 events/7 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-09 10:26:13 +01:00
KOSAKI Motohiro	888b55dc31	ftrace: tracing header should put '#' at the beginning of a line In a recent discussion, Andrew Morton pointed out that tracing header should put '#' at the beginning of a line. Then, we can easily filtered the header by following grep usage: cat trace \| grep -v '^#' Wakeup trace also has the same header problem. Comparison of headers displayed: before this patch: # tracer: wakeup # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip -------------------------------------------------------------------- latency: 19059 us, #21277/21277, CPU#1 \| (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) ----------------- \| task: kondemand/1-1644 (uid:0 nice:-5 policy:0 rt_prio:0) ----------------- # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / irqbalan-1887 1d.s. 0us : 1887:120:R + [001] 1644:115:S kondemand/1 irqbalan-1887 1d.s. 1us : default_wake_function <-autoremove_wake_function irqbalan-1887 1d.s. 2us : check_preempt_wakeup <-try_to_wake_up after this patch: # tracer: wakeup # # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip # -------------------------------------------------------------------- # latency: 529 us, #530/530, CPU#0 \| (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: kondemand/0-1641 (uid:0 nice:-5 policy:0 rt_prio:0) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / sshd-2496 0d.s. 0us : 2496:120:R + [000] 1641:115:S kondemand/0 sshd-2496 0d.s. 1us : default_wake_function <-autoremove_wake_function sshd-2496 0d.s. 1us : check_preempt_wakeup <-try_to_wake_up Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20090308124421.23C3.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-08 16:57:22 +01:00
Ingo Molnar	dba58e39ce	Merge branches 'tracing/doc', 'tracing/ftrace', 'tracing/printk' and 'tracing/textedit' into tracing/core	2009-03-08 16:48:51 +01:00
Ingo Molnar	9de36825b3	tracing: trace_bprintk() cleanups Impact: cleanup Remove a few leftovers and clean up the code a bit. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:12 +01:00
Frederic Weisbecker	769b0441f4	tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:12 +01:00
Lai Jiangshan	1ba28e02a1	tracing: add trace_bprintk() Impact: add a generic printk() for tracing, like trace_printk() trace_bprintk() uses the infrastructure to record events on ring_buffer. [ fweisbec@gmail.com: ported to latest -tip, made it work if !CONFIG_MODULES, never free the format strings from modules because we can't keep track of them and conditionnaly create the ftrace format strings section (reported by Steven Rostedt) ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:11 +01:00
Lai Jiangshan	1427cdf059	tracing: infrastructure for supporting binary record Impact: save on memory for tracing Current tracers are typically using a struct(like struct ftrace_entry, struct ctx_switch_entry, struct special_entr etc...)to record a binary event. These structs can only record a their own kind of events. A new kind of tracer need a new struct and a lot of code too handle it. So we need a generic binary record for events. This infrastructure is for this purpose. [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched tracing as reported by Steven Rostedt] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:11 +01:00
Mathieu Desnoyers	4460fdad85	tracing, Text Edit Lock - kprobes architecture independent support Use the mutual exclusion provided by the text edit lock in the kprobes code. It allows coherent manipulation of the kernel code by other subsystems. Changelog: Move the kernel_text_lock/unlock out of the for loops. Use text_mutex directly instead of a function. Remove whitespace modifications. (note : kprobes_mutex is always taken outside of text_mutex) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <49B14306.2080202@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 16:49:00 +01:00
Ingo Molnar	f0ef039851	Merge branch 'x86/core' into tracing/textedit Conflicts: arch/x86/Kconfig block/blktrace.c kernel/irq/handle.c Semantic conflict: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 16:45:01 +01:00
Lai Jiangshan	5ed0cec0ac	sched: TIF_NEED_RESCHED -> need_reshed() cleanup Impact: cleanup Use test_tsk_need_resched(), set_tsk_need_resched(), need_resched() instead of using TIF_NEED_RESCHED. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <49B10BA4.9070209@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 12:48:55 +01:00
KOSAKI Motohiro	10dd3ebe21	tracing: fix deadlock when setting set_ftrace_pid Impact: fix deadlock while using set_ftrace_pid Reproducer: # cd /sys/kernel/debug/tracing # echo $$ > set_ftrace_pid then, console becomes hung. Details: when writing set_ftracepid, kernel callstack is following ftrace_pid_write() mutex_lock(&ftrace_lock); ftrace_update_pid_func() mutex_lock(&ftrace_lock); mutex_unlock(&ftrace_lock); mutex_unlock(&ftrace_lock); then, system always deadlocks when ftrace_pid_write() is called. In past days, ftrace_pid_write() used ftrace_start_lock, but commit `e6ea44e9b4` consolidated ftrace_start_lock to ftrace_lock. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <20090306151155.0778.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 12:07:38 +01:00
KOSAKI Motohiro	422d3c7a57	tracing: current tip/master can't enable ftrace After commit `40ada30f96`, "make menuconfig" doesn't display "Tracer" item. Following modification restores it.	2009-03-06 11:56:42 +01:00
Ingo Molnar	7fc07d8410	Merge branch 'sched/core' into sched/cleanups	2009-03-06 11:47:52 +01:00
Ingo Molnar	bc722f508a	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-06 11:40:37 +01:00
Ingo Molnar	1609743970	Merge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing/core	2009-03-06 11:39:18 +01:00
Tejun Heo	edcb463997	percpu, module: implement reserved allocation and use it for module percpu variables Impact: add reserved allocation functionality and use it for module percpu variables This patch implements reserved allocation from the first chunk. When setting up the first chunk, arch can ask to set aside certain number of bytes right after the core static area which is available only through a separate reserved allocator. This will be used primarily for module static percpu variables on architectures with limited relocation range to ensure that the module perpcu symbols are inside the relocatable range. If reserved area is requested, the first chunk becomes reserved and isn't available for regular allocation. If the first chunk also includes piggy-back dynamic allocation area, a separate chunk mapping the same region is created to serve dynamic allocation. The first one is called static first chunk and the second dynamic first chunk. Although they share the page map, their different area map initializations guarantee they serve disjoint areas according to their purposes. If arch doesn't setup reserved area, reserved allocation is handled like any other allocation. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-03-06 14:33:59 +09:00
Steven Rostedt	770cb24345	tracing: add format files for ftrace default entries Impact: allow user apps to read binary format of basic ftrace entries Currently, only defined raw events export their formats so a binary reader can parse them. There's no reason that the default ftrace entries can't export their formats. This patch adds a subsystem called "ftrace" in the events directory that includes the ftrace entries for basic ftrace recorded items. These only have three files in the events directory: type : printf available_types : printf format : format for the event entry For example: # cat /debug/tracing/events/ftrace/wakeup/format name: wakeup ID: 3 format: field:unsigned char type; offset:0; size:1; field:unsigned char flags; offset:1; size:1; field:unsigned char preempt_count; offset:2; size:1; field:int pid; offset:4; size:4; field:int tgid; offset:8; size:4; field:unsigned int prev_pid; offset:12; size:4; field:unsigned char prev_prio; offset:16; size:1; field:unsigned char prev_state; offset:17; size:1; field:unsigned int next_pid; offset:20; size:4; field:unsigned char next_prio; offset:24; size:1; field:unsigned char next_state; offset:25; size:1; field:unsigned int next_cpu; offset:28; size:4; print fmt: "%u:%u:%u ==+ %u:%u:%u [%03u]" Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-05 21:46:44 -05:00
Steven Rostedt	33b0c229e3	tracing: move print of event format to separate file Impact: clean up Move the macro that creates the event format file to a separate header. This will allow the default ftrace events to use this same macro to create the formats to read those events. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-05 21:46:42 -05:00
Steven Rostedt	5e2336a0d4	tracing: make all file_operations const Impact: cleanup All file_operations structures should be constant. No one is going to change them. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-05 21:46:40 -05:00
Ingo Molnar	40ada30f96	tracing: clean up menu Clean up menu structure, introduce TRACING_SUPPORT switch that signals whether an architecture supports various instrumentation mechanisms. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 21:53:25 +01:00
Ingo Molnar	a1413c89ae	Merge branch 'x86/urgent' into x86/core Conflicts: arch/x86/include/asm/fixmap_64.h Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 21:48:50 +01:00
Frederic Weisbecker	8a0be9ef82	sched: don't rebalance if attached on NULL domain Impact: fix function graph trace hang / drop pointless softirq on UP While debugging a function graph trace hang on an old PII, I saw that it consumed most of its time on the timer interrupt. And the domain rebalancing softirq was the most concerned. The timer interrupt calls trigger_load_balance() which will decide if it is worth to schedule a rebalancing softirq. In case of builtin UP kernel, no problem arises because there is no domain question. In case of builtin SMP kernel running on an SMP box, still no problem, the softirq will be raised each time we reach the next_balance time. In case of builtin SMP kernel running on a UP box (most distros provide default SMP kernels, whatever the box you have), then the CPU is attached to the NULL sched domain. So a kind of unexpected behaviour happen: trigger_load_balance() -> raises the rebalancing softirq later on softirq: run_rebalance_domains() -> rebalance_domains() where the for_each_domain(cpu, sd) is not taken because of the NULL domain we are attached at. Which means rq->next_balance is never updated. So on the next timer tick, we will enter trigger_load_balance() which will always reschedule() the rebalacing softirq: if (time_after_eq(jiffies, rq->next_balance)) raise_softirq(SCHED_SOFTIRQ); So for each tick, we process this pointless softirq. This patch fixes it by checking if we are attached to the null domain before raising the softirq, another possible fix would be to set the maximal possible JIFFIES value to rq->next_balance if we are attached to the NULL domain. v2: build fix on UP Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:04:44 +01:00
Frederic Weisbecker	0012693ad4	tracing/function-graph-tracer: use the more lightweight local clock Impact: decrease hangs risks with the graph tracer on slow systems Since the function graph tracer can spend too much time on timer interrupts, it's better now to use the more lightweight local clock. Anyway, the function graph traces are more reliable on a per cpu trace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 12:14:41 +01:00
Ingo Molnar	49d2d266ad	Merge commit 'v2.6.29-rc7' into sched/core	2009-03-05 11:59:10 +01:00
David Rientjes	03d78913f0	lockdep: remove duplicate CONFIG_DEBUG_LOCKDEP definitions Impact: cleanup The atomic debug modifiers are already defined in kernel/lockdep_internals.h. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <alpine.DEB.2.00.0903050222160.30401@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 11:46:00 +01:00
Ingo Molnar	a140feab42	Merge commit 'v2.6.29-rc7' into core/locking	2009-03-05 11:45:22 +01:00
David S. Miller	508827ff0a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tokenring/tmspci.c drivers/net/ucc_geth_mii.c	2009-03-05 02:06:47 -08:00
Ingo Molnar	5e1607a00b	tracing: rename ftrace_printk() => trace_printk() Impact: cleanup Use a more generic name - this also allows the prototype to move to kernel.h and be generally available to kernel developers who want to do some quick tracing. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 10:24:48 +01:00
Steven Rostedt	e9d25fe6ea	tracing: have latency tracers set the latency format The latency tracers (irqsoff, preemptoff, preemptirqsoff, and wakeup) are pretty useless with the default output format. This patch makes them automatically enable the latency format when they are selected. They also record the state of the latency option, and if it was not enabled when selected, they disable it on reset. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 22:15:30 -05:00
Steven Rostedt	27d48be844	tracing: consolidate print_lat_fmt and print_trace_fmt Impact: clean up Both print_lat_fmt and print_trace_fmt do pretty much the same thing except for one different function call. This patch consolidates the two functions and adds an if statement to perform the difference. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 21:57:29 -05:00
Steven Rostedt	5fd73f8624	tracing: remove extra latency_trace method from trace structure Impact: clean up The trace and latency_trace function pointers are identical for every tracer but the function tracer. The differences in the function tracer are trivial (latency output puts paranthesis around parent). This patch removes the latency_trace pointer and all prints will now just use the trace output function pointer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 21:42:04 -05:00
Steven Rostedt	c032ef64d6	tracing: add latency output format option With the removal of the latency_trace file, we lost the ability to see some of the finer details in a trace. Like the state of interrupts enabled, the preempt count, need resched, and if we are in an interrupt handler, softirq handler or not. This patch simply creates an option to bring back the old format. This also removes the warning about an unused variable that held the latency_trace file operations. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 20:34:24 -05:00
Steven Rostedt	e74da5235c	tracing: fix seq read from trace files The buffer used by trace_seq was updated incorrectly. Instead of consuming what was actually read, it consumed the rest of the buffer on reads. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 20:31:11 -05:00
Steven Rostedt	2dc5d12b1f	tracing: do not return EFAULT if read copied anything Impact: fix trace read to conform to standards Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention that a userspace read should not return -EFAULT if it succeeded in copying anything. It should only return -EFAULT if it failed to copy at all. This patch modifies the check of copy_from_user and updates the return code appropriately. I also used H. Peter Anvin's short cut rule to just test ret == count. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 19:10:05 -05:00
Steven Rostedt	4f3640f8a3	ring-buffer: fix timestamp in partial ring_buffer_page_read If a partial ring_buffer_page_read happens, then some of the incremental timestamps may be lost. This patch writes the recent timestamp into the page that is passed back to the caller. A partial ring_buffer_page_read is where the full page would not be written back to the user, and instead, just part of the page is copied to the user. A full page would be a page swap with the ring buffer and the timestamps would be correct. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 19:01:45 -05:00
Steven Rostedt	e543ad7691	tracing: add cpu_file intialization for ftrace_dump Impact: fix to ftrace_dump output corruption The commit: `b04cc6b1f6` tracing/core: introduce per cpu tracing files added a new field to the iterator called cpu_file. This was a handle to differentiate between the per cpu trace output files and the all cpu "trace" file. The all cpu "trace" file required setting this to TRACE_PIPE_ALL_CPU. The problem is that the ftrace_dump sets up its own iterator but was not updated to handle this change. The result was only CPU 0 printing out on crash and a lot of "<0>"'s also being printed. Reported-by: Thomas Gleixner <tglx@linuxtronix.de> Tested-by: Darren Hart <dvhtc@us.ibm.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 18:32:28 -05:00
Eric Dumazet	64ca5ab913	rcu: increment quiescent state counter in ksoftirqd() If a machine is flooded by network frames, a cpu can loop 100% of its time inside ksoftirqd() without calling schedule(). This can delay RCU grace period to insane values. Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem. Paul: "This regression was a result of the recent change from "schedule()" to "cond_resched()", which got rid of that quiescent state in the common case where a reschedule is not needed". Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 22:08:45 +01:00
Peter Zijlstra	efed792d67	tracing: add lockdep tracepoints for lock acquire/release Augment the traces with lock names when lockdep is available: 1) \| down_read_trylock() { 1) \| _spin_lock_irqsave() { 1) \| /* lock_acquire: &sem->wait_lock / 1) 4.201 us \| } 1) \| _spin_unlock_irqrestore() { 1) \| / lock_release: &sem->wait_lock / 1) 3.523 us \| } 1) \| / lock_acquire: try read &mm->mmap_sem / 1) + 13.386 us \| } 1) 1.635 us \| find_vma(); 1) \| handle_mm_fault() { 1) \| __do_fault() { 1) \| filemap_fault() { 1) \| find_lock_page() { 1) \| find_get_page() { 1) \| / lock_acquire: read rcu_read_lock / 1) \| / lock_release: rcu_read_lock / 1) 5.697 us \| } 1) 8.158 us \| } 1) + 11.079 us \| } 1) \| _spin_lock() { 1) \| / lock_acquire: __pte_lockptr(page) / 1) 3.949 us \| } 1) 1.460 us \| page_add_file_rmap(); 1) \| _spin_unlock() { 1) \| / lock_release: __pte_lockptr(page) / 1) 3.115 us \| } 1) \| unlock_page() { 1) 1.421 us \| page_waitqueue(); 1) 1.220 us \| __wake_up_bit(); 1) 6.519 us \| } 1) + 34.328 us \| } 1) + 37.452 us \| } 1) \| up_read() { 1) \| / lock_release: &mm->mmap_sem / 1) \| _spin_lock_irqsave() { 1) \| / lock_acquire: &sem->wait_lock / 1) 3.865 us \| } 1) \| _spin_unlock_irqrestore() { 1) \| / lock_release: &sem->wait_lock */ 1) 8.562 us \| } 1) + 17.370 us \| } Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1236166375.5330.7209.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 18:49:58 +01:00
Ingo Molnar	28b1bd1cbc	Merge branch 'core/locking' into tracing/ftrace	2009-03-04 18:49:19 +01:00
Peter Zijlstra	26575e28df	lockdep: remove extra "irq" string Impact: clarify lockdep printk text print_irq_inversion_bug() gets handed state strings of the form "HARDIRQ", "SOFTIRQ", "RECLAIM_FS" and appends "-irq-{un,}safe" to them, which is either redudant for *IRQ or confusing in the RECLAIM_FS case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236175192.5330.7585.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 18:38:34 +01:00
Peter Zijlstra	1c21f14ec4	lockdep: fix incorrect state name In the recent mark_lock_irq() rework a bug snuck in that would report the state of write locks causing irq inversion under a read lock as a read lock. Fix this by masking the read bit of the state when validating write dependencies. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236172646.5330.7450.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 18:38:33 +01:00
Ingo Molnar	2602c3ba45	Merge branch 'rfc/splice/tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-04 11:15:42 +01:00
Ingo Molnar	a1be621dfa	Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core	2009-03-04 11:14:47 +01:00
Steven Rostedt	2cadf9135e	tracing: add binary buffer files for use with splice Impact: new feature This patch creates a directory of files that correspond to the per CPU ring buffers. These are binary files and are made to be used with splice. This is the fastest way to extract data from the ftrace ring buffers. Thanks to Jiaying Zhang for pushing me to get this code fixed, and to Eduard - Gabriel Munteanu for his splice code that helped me debug my code. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 21:01:55 -05:00
Steven Rostedt	474d32b68d	ring-buffer: make ring_buffer_read_page read from start on partial page Impact: dont leave holes in read buffer page The ring_buffer_read_page swaps a given page with the reader page of the ring buffer, if certain conditions are set: 1) requested length is big enough to hold entire page data 2) a writer is not currently on the page 3) the page is not partially consumed. Instead of swapping with the supplied page. It copies the data to the supplied page instead. But currently the data is copied in the same offset as the source page. This causes a hole at the start of the reader page. This complicates the use of this function. Instead, it should copy the data at the beginning of the function and update the index fields accordingly. Other small clean ups are also done in this patch. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 20:52:27 -05:00
Steven Rostedt	e3d6bf0a07	ring-buffer: replace sizeof of event header with offsetof Impact: fix to possible alignment problems on some archs. Some arch compilers include an NULL char array in the sizeof field. Since the ring_buffer_event type includes one of these, it is better to use the "offsetof" instead, to avoid strange bugs on these archs. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 20:52:01 -05:00
Steven Rostedt	ef7a4a1614	ring-buffer: fix ring_buffer_read_page The ring_buffer_read_page was broken if it were to only copy part of the page. This patch fixes that up as well as adds a parameter to allow a length field, in order to only copy part of the buffer page. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 20:51:24 -05:00
Steven Rostedt	41be4da4e8	ring-buffer: reset write field for ring_buffer_read_page Impact: fix ring_buffer_read_page After a page is swapped into the ring buffer, the write field must also be reset. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 20:50:54 -05:00
Ingo Molnar	91d75e209b	Merge branch 'x86/core' into core/percpu	2009-03-04 02:29:19 +01:00
Ingo Molnar	8b0e5860cb	Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core	2009-03-04 02:22:31 +01:00
Linus Torvalds	219f170a85	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: don't allow setuid to succeed if the user does not have rt bandwidth sched_rt: don't start timer when rt bandwidth disabled	2009-03-03 14:33:20 -08:00
Linus Torvalds	b24746c7be	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Teach RCU that idle task is not quiscent state at boot	2009-03-03 14:32:04 -08:00
Ingo Molnar	3612fdf780	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-03 16:02:16 +01:00
Steven Rostedt	633ddaa7f4	tracing: fix return value to registering events The registering of events had the return value check backwards. A zero returned is success, the check had it as a failure. This patch also fixes a missing "\n" in the warning that the check failed. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 09:43:50 -05:00
Roland McGrath	5b1017404a	x86-64: seccomp: fix 32/64 syscall hole On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process make a 32-bit system call with int $0x80. In both these cases under CONFIG_SECCOMP=y, secure_computing() will use the wrong system call number table. The fix is simple: test TS_COMPAT instead of TIF_IA32. Here is an example exploit: /* test case for seccomp circumvention on x86-64 There are two failure modes: compile with -m64 or compile with -m32. The -m64 case is the worst one, because it does "chmod 777 ." (could be any chmod call). The -m32 case demonstrates it was able to do stat(), which can glean information but not harm anything directly. A buggy kernel will let the test do something, print, and exit 1; a fixed kernel will make it exit with SIGKILL before it does anything. / #define _GNU_SOURCE #include <assert.h> #include <inttypes.h> #include <stdio.h> #include <linux/prctl.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> int main (int argc, char *argv) { char buf[100]; static const char dot[] = "."; long ret; unsigned st[24]; if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0) perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?"); #ifdef __x86_64__ assert ((uintptr_t) dot < (1UL << 32)); asm ("int $0x80 # %0 <- %1(%2 %3)" : "=a" (ret) : "0" (15), "b" (dot), "c" (0777)); ret = snprintf (buf, sizeof buf, "result %ld (check mode on .!)\n", ret); #elif defined __i386__ asm (".code32\n" "pushl %%cs\n" "pushl $2f\n" "ljmpl $0x33, $1f\n" ".code64\n" "1: syscall # %0 <- %1(%2 %3)\n" "lretl\n" ".code32\n" "2:" : "=a" (ret) : "0" (4), "D" (dot), "S" (&st)); if (ret == 0) ret = snprintf (buf, sizeof buf, "stat . -> st_uid=%u\n", st[7]); else ret = snprintf (buf, sizeof buf, "result %ld\n", ret); #else # error "not this one" #endif write (1, buf, ret); syscall (__NR_exit, 1); return 2; } Signed-off-by: Roland McGrath <roland@redhat.com> [ I don't know if anybody actually uses seccomp, but it's enabled in at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-02 15:41:30 -08:00
Peter Zijlstra	044d408409	genirq: assert that irq handlers are indeed running in hardirq context Make sure the genirq layer handlers are indeed running handlers in hardirq context. That is the genirq expectation and doing anything else is broken. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1236006812.5330.632.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 00:05:45 +01:00
Ingo Molnar	ed662d9b2a	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-02 22:38:31 +01:00
Ingo Molnar	fdfa66ab45	Merge branches 'tracing/ftrace', 'tracing/mmiotrace' and 'linus' into tracing/core	2009-03-02 22:37:35 +01:00
Ingo Molnar	c02368a9d0	Merge branch 'linus' into irq/genirq	2009-03-02 22:08:56 +01:00
Steven Rostedt	96ccd21cd1	tracing: add print format to event trace format files This patch adds the internal print format used to print the raw events to the event trace point format file. # cat /debug/tracing/events/sched/sched_switch/format name: sched_switch ID: 29 format: field:unsigned char type; offset:0; size:1; field:unsigned char flags; offset:1; size:1; field:unsigned char preempt_count; offset:2; size:1; field:int pid; offset:4; size:4; field:int tgid; offset:8; size:4; field:pid_t prev_pid; offset:12; size:4; field:int prev_prio; offset:16; size:4; field special:char next_comm[TASK_COMM_LEN]; offset:20; size:16; field:pid_t next_pid; offset:36; size:4; field:int next_prio; offset:40; size:4; print fmt: "prev %d:%d ==> next %s:%d:%d" Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 15:22:21 -05:00
Steven Rostedt	c5e4e19271	tracing: add trace name and id to event formats To be able to identify the trace in the binary format output, the id of the trace event (which is dynamically assigned) must also be listed. This patch adds the name of the trace point as well as the id assigned. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 15:10:02 -05:00
Steven Rostedt	91729ef966	tracing: add ftrace headers to event format files This patch includes the ftrace header to the event formats files: # cat /debug/tracing/events/sched/sched_switch/format field:unsigned char type; offset:0; size:1; field:unsigned char flags; offset:1; size:1; field:unsigned char preempt_count; offset:2; size:1; field:int pid; offset:4; size:4; field:int tgid; offset:8; size:4; field:pid_t prev_pid; offset:12; size:4; field:int prev_prio; offset:16; size:4; field special:char next_comm[TASK_COMM_LEN]; offset:20; size:16; field:pid_t next_pid; offset:36; size:4; field:int next_prio; offset:40; size:4; A blank line is used as a deliminator between the ftrace header and the trace point fields. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 15:03:01 -05:00
Steven Rostedt	981d081ec8	tracing: add format file to describe event struct fields This patch adds the "format" file to the trace point event directory. This is based off of work by Tom Zanussi, in which a file is exported to be tread from user land such that a user space app may read the binary record stored in the ring buffer. # cat /debug/tracing/events/sched/sched_switch/format field:pid_t prev_pid; offset:12; size:4; field:int prev_prio; offset:16; size:4; field special:char next_comm[TASK_COMM_LEN]; offset:20; size:16; field:pid_t next_pid; offset:36; size:4; field:int next_prio; offset:40; size:4; Idea-from: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 14:27:27 -05:00
Steven Rostedt	f9520750c4	tracing: make trace_seq_reset global and rename to trace_seq_init Impact: clean up The trace_seq functions may be used separately outside of the ftrace iterator. The trace_seq_reset is needed for these operations. This patch also renames trace_seq_reset to the more appropriate trace_seq_init. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 14:08:51 -05:00
Steven Rostedt	11a241a330	tracing: add protection around modify trace event fields The trace event objects are currently not proctected against reentrancy. This patch adds a mutex around the modifications of the trace event fields. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 11:49:04 -05:00
Steven Rostedt	d20e3b0384	tracing: add TRACE_FIELD_SPECIAL to record complex entries Tom Zanussi pointed out that the simple TRACE_FIELD was not enough to record trace data that required memcpy. This patch addresses this issue by adding a TRACE_FIELD_SPECIAL. The format is similar to TRACE_FIELD but looks like so: TRACE_FIELD_SPECIAL(type_item, item, cmd) What TRACE_FIELD gave was: TRACE_FIELD(type, item, assign) The TRACE_FIELD would be used in declaring a structure: struct { type item; }; And later assign it via: entry->item = assign; What TRACE_FIELD_SPECIAL gives us is: In the declaration of the structure: struct { type_item; }; And the assignment: cmd; This change log will explain the one example used in the patch: TRACE_EVENT_FORMAT(sched_switch, TPPROTO(struct rq rq, struct task_struct prev, struct task_struct *next), TPARGS(rq, prev, next), TPFMT("task %s:%d ==> %s:%d", prev->comm, prev->pid, next->comm, next->pid), TRACE_STRUCT( TRACE_FIELD(pid_t, prev_pid, prev->pid) TRACE_FIELD(int, prev_prio, prev->prio) TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN], next_comm, TPCMD(memcpy(TRACE_ENTRY->next_comm, next->comm, TASK_COMM_LEN))) TRACE_FIELD(pid_t, next_pid, next->pid) TRACE_FIELD(int, next_prio, next->prio) ), TPRAWFMT("prev %d:%d ==> next %s:%d:%d") ); The struct will be create as: struct { pid_t prev_pid; int prev_prio; char next_comm[TASK_COMM_LEN]; pid_t next_pid; int next_prio; }; Note the TRACE_ENTRY in the cmd part of TRACE_SPECIAL. TRACE_ENTRY will be set by the tracer to point to the structure inside the trace buffer. entry->prev_pid = prev->pid; entry->prev_prio = prev->prio; memcpy(entry->next_comm, next->comm, TASK_COMM_LEN); entry->next_pid = next->pid; entry->next_prio = next->prio Reported-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-02 10:53:15 -05:00
Wang Chen	b67802ea80	sched: kill unused parameter of pick_next_task() Impact: micro-optimization Parameter "prev" is not used really. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 12:02:53 +01:00
Ingo Molnar	5512b3ece0	Merge branches 'sched/clock', 'sched/urgent' and 'linus' into sched/core	2009-03-02 12:02:36 +01:00
David S. Miller	aa4abc9bcc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-tx.c net/8021q/vlan_core.c net/core/dev.c	2009-03-01 21:35:16 -08:00
Ingo Molnar	55f2b78995	Merge branch 'x86/urgent' into x86/pat	2009-03-01 12:47:58 +01:00
Ingo Molnar	acdb2c2879	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-28 10:15:58 +01:00
Steven Rostedt	fd99498989	tracing: add raw fast tracing interface for trace events This patch adds the interface to enable the C style trace points. In the directory /debugfs/tracing/events/subsystem/event We now have three files: enable : values 0 or 1 to enable or disable the trace event. available_types: values 'raw' and 'printf' which indicate the tracing types available for the trace point. If a developer does not use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT macro, then only 'printf' will be available. This file is read only. type: values 'raw' or 'printf'. This indicates which type of tracing is active for that trace point. 'printf' is the default and if 'raw' is not available, this file is read only. # echo raw > /debug/tracing/events/sched/sched_wakeup/type # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable Will enable the C style tracing for the sched_wakeup trace point. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-28 04:04:03 -05:00
Steven Rostedt	c32e827b25	tracing: add raw trace point recording infrastructure Impact: lower overhead tracing The current event tracer can automatically pick up trace points that are registered with the TRACE_FORMAT macro. But it required a printf format string and parsing. Although, this adds the ability to get guaranteed information like task names and such, it took a hit in overhead processing. This processing can add about 500-1000 nanoseconds overhead, but in some cases that too is considered too much and we want to shave off as much from this overhead as possible. Tom Zanussi recently posted tracing patches to lkml that are based on a nice idea about capturing the data via C structs using STRUCT_ENTER, STRUCT_EXIT type of macros. I liked that method very much, but did not like the implementation that required a developer to add data/code in several disjoint locations. This patch extends the event_tracer macros to do a similar "raw C" approach that Tom Zanussi did. But instead of having the developers needing to tweak a bunch of code all over the place, they can do it all in one macro - preferably placed near the code that it is tracing. That makes it much more likely that tracepoints will be maintained on an ongoing basis by the code they modify. The new macro TRACE_EVENT_FORMAT is created for this approach. (Note, a developer may still utilize the more low level DECLARE_TRACE macros if they don't care about getting their traces automatically in the event tracer.) They can also use the existing TRACE_FORMAT if they don't need to code the tracepoint in C, but just want to use the convenience of printf. So if the developer wants to "hardwire" a tracepoint in the fastest possible way, and wants to acquire their data via a user space utility in a raw binary format, or wants to see it in the trace output but not sacrifice any performance, then they can implement the faster but more complex TRACE_EVENT_FORMAT macro. Here's what usage looks like: TRACE_EVENT_FORMAT(name, TPPROTO(proto), TPARGS(args), TPFMT(fmt, fmt_args), TRACE_STUCT( TRACE_FIELD(type1, item1, assign1) TRACE_FIELD(type2, item2, assign2) [...] ), TPRAWFMT(raw_fmt) ); Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT uses. name: is the unique identifier of the trace point proto: The proto type that the trace point uses args: the args in the proto type fmt: printf format to use with the event printf tracer fmt_args: the printf argments to match fmt TRACE_STRUCT starts the ability to create a structure. Each item in the structure is defined with a TRACE_FIELD TRACE_FIELD(type, item, assign) type: the C type of item. item: the name of the item in the stucture assign: what to assign the item in the trace point callback raw_fmt is a way to pretty print the struct. It must match the order of the items are added in TRACE_STUCT An example of this would be: TRACE_EVENT_FORMAT(sched_wakeup, TPPROTO(struct rq rq, struct task_struct p, int success), TPARGS(rq, p, success), TPFMT("task %s:%d %s", p->comm, p->pid, success?"succeeded":"failed"), TRACE_STRUCT( TRACE_FIELD(pid_t, pid, p->pid) TRACE_FIELD(int, success, success) ), TPRAWFMT("task %d success=%d") ); This creates us a unique struct of: struct { pid_t pid; int success; }; And the way the call back would assign these values would be: entry->pid = p->pid; entry->success = success; The nice part about this is that the creation of the assignent is done via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is created, the developer will then have a faster method to record into the ring buffer. They do not need to worry about the tracer itself. The developer would only need to touch the files in include/trace/*.h Again, I would like to give special thanks to Tom Zanussi for this nice idea. Idea-from: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-28 03:09:32 -05:00
Steven Rostedt	ef5580d0ff	tracing: add interface to write into current tracer buffer Right now all tracers must manage their own trace buffers. This was to enforce tracers to be independent in case we finally decide to allow each tracer to have their own trace buffer. But now we are adding event tracing that writes to the current tracer's buffer. This adds an interface to allow events to write to the current tracer buffer without having to manage its own. Since event tracing has no "tracer", and is just a way to hook into any other tracer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-28 03:06:44 -05:00
Steven Rostedt	b628b3e629	tracing: make the set_event and available_events subsystem aware This patch makes the event files, set_event and available_events aware of the subsystem. Now you can enable an entire subsystem with: echo 'irq:' > set_event Note: the '' is not needed. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-28 03:05:40 -05:00
Ingo Molnar	e9abf4c59d	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-28 09:03:58 +01:00
Steven Rostedt	6ecc2d1ca3	tracing: add subsystem level to trace events If a trace point header defines TRACE_SYSTEM, then it will add the following trace points into that event system. If include/trace/irq_event_types.h has: #define TRACE_SYSTEM irq at the top and #undef TRACE_SYSTEM at the bottom, then a directory "irq" will be created in the /debug/tracing/events directory. Inside that directory will contain the two trace points that are defined in include/trace/irq_event_types.h. Only adding the above to irq and not to sched, we get: # ls /debug/tracing/events/ irq sched_process_exit sched_signal_send sched_wakeup_new sched_kthread_stop sched_process_fork sched_switch sched_kthread_stop_ret sched_process_free sched_wait_task sched_migrate_task sched_process_wait sched_wakeup # ls /debug/tracing/events/irq irq_handler_entry irq_handler_exit If we add #define TRACE_SYSTEM sched to the trace/sched_event_types.h then the rest of the trace events will be put in a sched directory within the events directory. I've been playing with this idea of the subsystem for a while, but recently Tom Zanussi posted some patches to lkml that included this method. Tom's approach was clean and got me to finally put some effort to clean up the event trace points. Thanks to Tom Zanussi for demonstrating how nice the subsystem method is. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-28 02:59:43 -05:00
Steven Rostedt	eb594e45f6	tracing: move trace point formats to files in include/trace directory Impact: clean up To further facilitate the ease of adding trace points for developers, this patch creates include/trace/trace_events.h and include/trace/trace_event_types.h. The former file will hold the trace/<type>.h files and the latter will hold the trace/<type>_event_types.h files. To create new tracepoints and to have them automatically appear in the event tracer, a developer makes the trace/<type>.h file which includes <linux/tracepoint.h> and the trace/<type>_event_types.h file. The trace/<type>_event_types.h file will hold the TRACE_FORMAT macros. Then add the trace/<type>.h file to trace/trace_events.h, and add the trace/<type>_event_types.h to the trace_event_types.h file. No need to modify files elsewhere. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-28 02:58:50 -05:00
David Howells	5170836679	Fix recursive lock in free_uid()/free_user_ns() free_uid() and free_user_ns() are corecursive when CONFIG_USER_SCHED=n, but free_user_ns() is called from free_uid() by way of uid_hash_remove(), which requires uidhash_lock to be held. free_user_ns() then calls free_uid() to complete the destruction. Fix this by deferring the destruction of the user_namespace. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-27 16:26:21 -08:00
Steven Rostedt	0cfe82451d	tracing: replace kzalloc with kcalloc Impact: clean up kcalloc is a better approach to allocate a NULL array. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-27 10:51:10 -05:00
Dhaval Giani	54e9912428	sched: don't allow setuid to succeed if the user does not have rt bandwidth Impact: fix hung task with certain (non-default) rt-limit settings Corey Hickey reported that on using setuid to change the uid of a rt process, the process would be unkillable and not be running. This is because there was no rt runtime for that user group. Add in a check to see if a user can attach an rt task to its task group. On failure, return EINVAL, which is also returned in CONFIG_CGROUP_SCHED. Reported-by: Corey Hickey <bugfood-ml@fatooh.org> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-27 11:11:53 +01:00
Ingo Molnar	da74ff0f9b	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace Conflicts: kernel/sched_clock.c	2009-02-27 09:06:11 +01:00
Ingo Molnar	1b49061d40	Merge branch 'sched/clock' into tracing/ftrace Conflicts: kernel/sched_clock.c	2009-02-27 08:35:19 +01:00
Steven Rostedt	5c6a3ae1b4	tracing: use newline separator for trace options list Impact: clean up Instead of listing the trace options like: # cat /debug/tracing/trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj We now list them like: # cat /debug/tracing/trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-27 00:22:21 -05:00
Steven Rostedt	85a2f9b46f	tracing: use pointer error returns for __tracing_open Impact: fix compile warning and clean up When I first wrote __tracing_open, instead of passing the error code via the ERR_PTR macros, I lazily used a separate parameter to hold the return for errors. When Frederic Weisbecker updated that function, he used the Linux kernel ERR_PTR for the returns. This caused the parameter return to possibly not be initialized on error. gcc correctly pointed this out with a warning. This patch converts the entire function to use the Linux kernel ERR_PTR macro methods. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-27 00:12:38 -05:00
Steven Rostedt	d8e83d26b5	tracing: add protection around open use of current_tracer Impact: fix to possible race conditions There's some uses of current_tracer that is not protected by the trace_types_lock. There is a small chance that a sysadmin changes the tracer while the current_tracer is being referenced. If the race is hit, it is unlikely to cause any harm since the tracers are constant and are not freed. But some strang side effects may occur. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-26 23:55:58 -05:00
Steven Rostedt	577b785f55	tracing: add tracer dependent options to options directory This patch adds the tracer dependent options dynamically to the options directory when the tracer is activated. These options are removed when the tracer is deactivated. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-26 23:43:05 -05:00
Steven Rostedt	a825907507	tracing: add options directory and core option files This patch creates an options directory in the debugfs, that contains the available tracing options. These files contain 1 or 0, where 1 is the option is enabled and 0 it is disabled. Simply echoing in 1 will enable the option and 0 will disable it. This patch only contains the core options, not the tracer options. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-26 22:22:46 -05:00
Serge E. Hallyn	1d1e97562e	keys: distinguish per-uid keys in different namespaces per-uid keys were looked by uid only. Use the user namespace to distinguish the same uid in different namespaces. This does not address key_permission. So a task can for instance try to join a keyring owned by the same uid in another namespace. That will be handled by a separate patch. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-27 12:35:06 +11:00
Peter Zijlstra	8325d9c09d	sched_clock: cleanups - remove superfluous checks in __update_sched_clock() - skip sched_clock_tick() for sched_clock_stable - reinstate the simple !HAVE_UNSTABLE_SCHED_CLOCK code to please the bloatwatch Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 21:56:07 +01:00
Ingo Molnar	5d0859cef2	Merge branch 'sched/clock' into tracing/ftrace Conflicts: kernel/sched_clock.c	2009-02-26 21:21:59 +01:00
Ingo Molnar	b342501cd3	sched: allow architectures to specify sched_clock_stable Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify that their sched_clock() implementation is reliable. This will be used by x86 to switch on a faster sched_clock_cpu() implementation on certain CPU types. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 21:20:22 +01:00
John Stultz	a2a5ac8650	time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fix The time_status conditional was accidentally placed right after we clear the checked time_status bits, which causes us to take the conditional every time through. This fixes it by moving the conditional to before we clear the time_status bits. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 19:39:47 +01:00
Ingo Molnar	14131f2f98	tracing: implement trace_clock_*() APIs Impact: implement new tracing timestamp APIs Add three trace clock variants, with differing scalability/precision tradeoffs: - local: CPU-local trace clock - medium: scalable global clock with some jitter - global: globally monotonic, serialized clock Make the ring-buffer use the local trace clock internally. Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 18:44:06 +01:00
Ingo Molnar	6409c4da28	sched: sched_clock() improvement: use in_nmi() make sure we dont execute more complex sched_clock() code in NMI context. Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 18:44:05 +01:00
Jason Baron	af39241b90	tracing, genirq: add irq enter and exit trace events Impact: add new tracepoints Add them to the generic IRQ code, that way every architecture gets these new tracepoints, not just x86. Using Steve's new 'TRACE_FORMAT', I can get function graph trace as follows using the original two IRQ tracepoints: 3) \| handle_IRQ_event() { 3) \| /* (irq_handler_entry) irq=28 handler=eth0 / 3) \| e1000_intr_msi() { 3) 2.460 us \| __napi_schedule(); 3) 9.416 us \| } 3) \| / (irq_handler_exit) irq=28 handler=eth0 return=handled */ 3) + 22.935 us \| } Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 18:43:50 +01:00
Hiroshi Shimamoto	cac64d00c2	sched_rt: don't start timer when rt bandwidth disabled Impact: fix incorrect condition check No need to start rt bandwidth timer when rt bandwidth is disabled. If this timer starts, it may stop at sched_rt_period_timer() on the first time. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 14:18:55 +01:00
Frederic Weisbecker	8656e7a2fa	tracing/core: make the per cpu trace files in per cpu directories Impact: restructure the VFS layout of per CPU trace buffers The per cpu trace files are all in a single directory: /debug/tracing/per_cpu. In case of a large number of cpu, the content of this directory becomes messy so we create now one directory per cpu inside /debug/tracing/per_cpu which contain each their own trace_pipe and trace files. Ie: /debug/tracing$ ls -R per_cpu per_cpu: cpu0 cpu1 per_cpu/cpu0: trace trace_pipe per_cpu/cpu1: trace trace_pipe Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 14:04:08 +01:00
Li Zefan	c40c6f85a7	cpuacct: add a branch prediction cpuacct_charge() is in fast-path, and checking of !cpuacct_susys.active always returns false after cpuacct has been initialized at system boot. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 13:22:38 +01:00
Ingo Molnar	4434e51564	Merge branches 'sched/cleanups', 'sched/urgent' and 'linus' into sched/core	2009-02-26 13:22:13 +01:00
Paul E. McKenney	a682604838	rcu: Teach RCU that idle task is not quiscent state at boot This patch fixes a bug located by Vegard Nossum with the aid of kmemcheck, updated based on review comments from Nick Piggin, Ingo Molnar, and Andrew Morton. And cleans up the variable-name and function-name language. ;-) The boot CPU runs in the context of its idle thread during boot-up. During this time, idle_cpu(0) will always return nonzero, which will fool Classic and Hierarchical RCU into deciding that a large chunk of the boot-up sequence is a big long quiescent state. This in turn causes RCU to prematurely end grace periods during this time. This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks() function to ignore the idle task as a quiescent state until the system has started up the scheduler in rest_init(), introducing a new non-API function rcu_idle_now_means_idle() to inform RCU of this transition. RCU maintains an internal rcu_idle_cpu_truthful variable to track this state, which is then used by rcu_check_callback() to determine if it should believe idle_cpu(). Because this patch has the effect of disallowing RCU grace periods during long stretches of the boot-up sequence, this patch also introduces Josh Triplett's UP-only optimization that makes synchronize_rcu() be a no-op if num_online_cpus() returns 1. This allows boot-time code that calls synchronize_rcu() to proceed normally. Note, however, that RCU callbacks registered by call_rcu() will likely queue up until later in the boot sequence. Although rcuclassic and rcutree can also use this same optimization after boot completes, rcupreempt must restrict its use of this optimization to the portion of the boot sequence before the scheduler starts up, given that an rcupreempt RCU read-side critical section may be preeempted. In addition, this patch takes Nick Piggin's suggestion to make the system_state global variable be __read_mostly. Changes since v4: o Changes the name of the introduced function and variable to be less emotional. ;-) Changes since v3: o WARN_ON(nr_context_switches() > 0) to verify that RCU switches out of boot-time mode before the first context switch, as suggested by Nick Piggin. Changes since v2: o Created rcu_blocking_is_gp() internal-to-RCU API that determines whether a call to synchronize_rcu() is itself a grace period. o The definition of rcu_blocking_is_gp() for rcuclassic and rcutree checks to see if but a single CPU is online. o The definition of rcu_blocking_is_gp() for rcupreempt checks to see both if but a single CPU is online and if the system is still in early boot. This allows rcupreempt to again work correctly if running on a single CPU after booting is complete. o Added check to rcupreempt's synchronize_sched() for there being but one online CPU. Tested all three variants both SMP and !SMP, booted fine, passed a short rcutorture test on both x86 and Power. Located-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 04:08:14 +01:00
Ingo Molnar	f4abfb8d0d	Merge branch 'tip/tracing/ftrace' of ssh://master.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-26 03:48:44 +01:00
Ingo Molnar	e36b1e136a	Merge branches 'tracing/ftrace', 'tracing/hw-branch-tracing' and 'linus' into tracing/core	2009-02-26 03:47:27 +01:00
Steven Rostedt	eef62a6826	tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT There's been a bit confusion to whether DEFINE/DECLARE_TRACE_FMT should be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it TRACE_FORMAT. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-25 21:44:22 -05:00
Ingo Molnar	39854fe8c1	time: ntp: clean up second_overflow() Impact: cleanup, no functionality changed The 'time_adj' local variable is named in a very confusing way because it almost shadows the 'time_adjust' global variable - which is used in this same function. Rename it to 'delta' - to make them stand apart more clearly. kernel/time/ntp.o: text data bss dec hex filename 2545 114 144 2803 af3 ntp.o.before 2545 114 144 2803 af3 ntp.o.after md5: 1bf0b3be564512279ba7cee299d1d2be ntp.o.before.asm 1bf0b3be564512279ba7cee299d1d2be ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:17 +01:00
Ingo Molnar	069569e025	time: ntp: simplify ntp_tick_adj calculations Impact: micro-optimization Convert the (internal) ntp_tick_adj value we store from unscaled units to scaled units. This is a constant that we never modify, so scaling it up once during bootup is enough - we dont have to do it for every adjustment step. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:16 +01:00
Ingo Molnar	2b9d1496e7	time: ntp: make 64-bit constants more robust Impact: cleanup, no functionality changed - make PPM_SCALE an explicit s64 constant, to remove (s64) casts from usage sites. kernel/time/ntp.o: text data bss dec hex filename 2536 114 136 2786 ae2 ntp.o.before 2536 114 136 2786 ae2 ntp.o.after md5: 40a7728d1188aa18e83e21a81fa7b150 ntp.o.before.asm 40a7728d1188aa18e83e21a81fa7b150 ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:15 +01:00
Ingo Molnar	e96291653b	time: ntp: refactor do_adjtimex() some more Impact: cleanup, no functionality changed Further simplify do_adjtimex(): - introduce the ntp_start_leap_timer() helper function - eliminate the goto adj_done complication Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:15 +01:00
Ingo Molnar	80f2257116	time: ntp: refactor do_adjtimex() Impact: cleanup, no functionality changed do_adjtimex() is currently a monster function with a maze of branches. Refactor the txc->modes setting aspects of it into two new helper functions: process_adj_status() process_adjtimex_modes() kernel/time/ntp.o: text data bss dec hex filename 2512 114 136 2762 aca ntp.o.before 2512 114 136 2762 aca ntp.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:14 +01:00
Ingo Molnar	10dd31a7a1	time: ntp: fix bug in ntp_update_offset() & do_adjtimex() Impact: change (fix) the way the NTP PLL seconds offset is initialized/tracked Fix a bug and do a micro-optimization: When PLL is enabled we do not reset time_reftime. If the PLL was off for a long time (for example after bootup), this is arguably the wrong thing to do. We already had a hack for the common boot-time case in ntp_update_offset(), in form of: if (unlikely(time_status & STA_FREQHOLD \|\| time_reftime == 0)) secs = 0; But the update delta should be reset later on too - not just when the PLL is enabled for the first time after bootup. So do it on !STA_PLL -> STA_PLL transitions. This changes behavior, as previously if ntpd was disabled for a long time and we restarted it, we'd run from that last update, with a very large delta. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:13 +01:00
Ingo Molnar	c7986acba2	time: ntp: micro-optimize ntp_update_offset() Impact: cleanup, no functionality changed The time_reftime update in ntp_update_offset() to xtime.tv_sec is a convoluted way of saying that we want to freeze the frequency and want the 'secs' delta to be 0. Also make this branch unlikely. This shaves off 8 bytes from the code size: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2496 114 136 2746 aba ntp.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:12 +01:00
Ingo Molnar	478b7aab16	time: ntp: simplify ntp_update_offset_fll() Impact: cleanup, no functionality changed Change ntp_update_offset_fll() to delta logic instead of absolute value logic. This eliminates 'freq_adj' from the function. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:12 +01:00
Ingo Molnar	f939890b66	time: ntp: refactor and clean up ntp_update_offset() Impact: cleanup, no functionality changed - introduce the ntp_update_offset_fll() helper - clean up the flow and variable naming kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 01f7b8e1a5472a3056f9e4ae84d46315 ntp.o.before.asm 01f7b8e1a5472a3056f9e4ae84d46315 ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:11 +01:00
Ingo Molnar	bc26c31d44	time: ntp: refactor up ntp_update_frequency() Impact: cleanup, no functionality changed Change ntp_update_frequency() from a hard to follow code flow that uses global variables as temporaries, to a clean input+output flow. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:10 +01:00
Ingo Molnar	9ce616aaef	time: ntp: clean up ntp_update_frequency() Impact: cleanup, no functionality changed Prepare a refactoring of ntp_update_frequency(). kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 41f3009debc9b397d7394dd77d912f0a ntp.o.before.asm 41f3009debc9b397d7394dd77d912f0a ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:09 +01:00
Ingo Molnar	bbd1267690	time: ntp: simplify the MAX_TICKADJ_SCALED definition Impact: cleanup, no functionality changed There's an ugly u64 typecase in the MAX_TICKADJ_SCALED definition, this can be eliminated by making the MAX_TICKADJ constant's type 64-bit (signed). kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 41f3009debc9b397d7394dd77d912f0a ntp.o.before.asm 41f3009debc9b397d7394dd77d912f0a ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:08 +01:00
Ingo Molnar	3c972c2444	time: ntp: simplify the second_overflow() code flow Impact: cleanup, no functionality changed Instead of a hierarchy of conditions, transform them to clean gradual conditions and return's. This makes the flow easier to read and makes the purpose of the function easier to understand. kernel/time/ntp.o: text data bss dec hex filename 2552 170 168 2890 b4a ntp.o.before 2552 170 168 2890 b4a ntp.o.after md5: eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.before.asm eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:07 +01:00
Ingo Molnar	53bbfa9e94	time: ntp: clean up kernel/time/ntp.c Impact: cleanup, no functionality changed Make this file a bit more readable by applying a consistent coding style. No code changed: kernel/time/ntp.o: text data bss dec hex filename 2552 170 168 2890 b4a ntp.o.before 2552 170 168 2890 b4a ntp.o.after md5: eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.before.asm eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:06 +01:00
Ingo Molnar	0b13fda1e0	generic-ipi: cleanups Andrew pointed out that there's some small amount of style rot in kernel/smp.c. Clean it up. Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 16:52:50 +01:00
Peter Zijlstra	6e2756376c	generic-ipi: remove CSD_FLAG_WAIT Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework the code so that we can use CSD_FLAG_LOCK for both purposes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 14:13:44 +01:00
Peter Zijlstra	8969a5ede0	generic-ipi: remove kmalloc() Remove the use of kmalloc() from the smp_call_function_*() calls. Steven's generic-ipi patch (`d7240b98`: generic-ipi: use per cpu data for single cpu ipi calls) started the discussion on the use of kmalloc() in this code and fixed the smp_call_function_single(.wait=0) fallback case. In this patch we complete this by also providing means for the _many() call, which fully removes the need for kmalloc() in this code. The problem with the _many() call is that other cpus might still be observing our entry when we're done with it. It solved this by dynamically allocating data elements and RCU-freeing it. We solve it by using a single per-cpu entry which provides static storage and solves one half of the problem (avoiding referencing freed data). The other half, ensuring the queue iteration it still possible, is done by placing re-used entries at the head of the list. This means that if someone was still iterating that entry when it got moved, he will now re-visit the entries on the list he had already seen, but avoids skipping over entries like would have happened had we placed the new entry at the end. Furthermore, visiting entries twice is not a problem, since we remove our cpu from the entry's cpumask once its called. Many thanks to Oleg for his suggestions and him poking holes in my earlier attempts. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 14:13:43 +01:00
Frederic Weisbecker	d7350c3f45	tracing/core: make the read callbacks reentrants Now that several per-cpu files can be read or spliced at the same, we want the read/splice callbacks for tracing files to be reentrants. Until now, a single global mutex (trace_types_lock) serialized the access to tracing_read_pipe(), tracing_splice_read_pipe(), and the seq helpers. Ie: it means that if a user tries to read trace_pipe0 and trace_pipe1 at the same time, the access to the function tracing_read_pipe() is contended and one reader must wait for the other to finish its read call. The trace_type_lock mutex is mostly here to serialize the access to the global current tracer (current_trace), which can be changed concurrently. Although the iter struct keeps a private pointer to this tracer, its callbacks can be changed by another function. The method used here is to not keep anymore private reference to the tracer inside the iterator but to make a copy of it inside the iterator. Then it checks on subsequents read calls if the tracer has changed. This is not costly because the current tracer is not expected to be changed often, so we use a branch prediction for that. Moreover, we add a private mutex to the iterator (there is one iterator per file descriptor) to serialize the accesses in case of multiple consumers per file descriptor (which would be a silly idea from the user). Note that this is not to protect the ring buffer, since the ring buffer already serializes the readers accesses. This is to prevent from traces weirdness in case of concurrent consumers. But these mutexes can be dropped anyway, that would not result in any crash. Just tell me what you think about it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 13:40:58 +01:00
Frederic Weisbecker	b04cc6b1f6	tracing/core: introduce per cpu tracing files Impact: split up tracing output per cpu Currently, on the tracing debugfs directory, three files are available to the user to let him extracting the trace output: - trace is an iterator through the ring-buffer. It's a reader but not a consumer It doesn't block when no more traces are available. - trace pretty similar to the former, except that it adds more informations such as prempt count, irq flag, ... - trace_pipe is a reader and a consumer, it will also block waiting for traces if necessary (heh, yes it's a pipe). The traces coming from different cpus are curretly mixed up inside these files. Sometimes it messes up the informations, sometimes it's useful, depending on what does the tracer capture. The tracing_cpumask file is useful to filter the output and select only the traces captured a custom defined set of cpus. But still it is not enough powerful to extract at the same time one trace buffer per cpu. So this patch creates a new directory: /debug/tracing/per_cpu/. Inside this directory, you will now find one trace_pipe file and one trace file per cpu. Which means if you have two cpus, you will have: trace0 trace1 trace_pipe0 trace_pipe1 And of course, reading these files will have the same effect than with the usual tracing files, except that you will only see the traces from the given cpu. The original all-in-one cpu trace file are still available on their original place. Until now, only one consumer was allowed on trace_pipe to avoid racy consuming on the ring-buffer. Now the approach changed a bit, you can have only one consumer per cpu. Which means you are allowed to read concurrently trace_pipe0 and trace_pipe1 But you can't have two readers on trace_pipe0 or trace_pipe1. Following the same logic, if there is one reader on the common trace_pipe, you can not have at the same time another reader on trace_pipe0 or in trace_pipe1. Because in trace_pipe is already a consumer in all cpu buffers in essence. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 13:40:58 +01:00
Ingo Molnar	2b1b858f69	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-25 12:50:07 +01:00
Nick Piggin	15d0d3b337	generic IPI: simplify barriers and locking Simplify the barriers in generic remote function call interrupt code. Firstly, just unconditionally take the lock and check the list in the generic_call_function_single_interrupt IPI handler. As we've just taken an IPI here, the chances are fairly high that there will be work on the list for us, so do the locking unconditionally. This removes the tricky lockless list_empty check and dubious barriers. The change looks bigger than it is because it is just removing an outer loop. Secondly, clarify architecture specific IPI locking rules. Generic code has no tools to impose any sane ordering on IPIs if they go outside normal cache coherency, ergo the arch code must make them appear to obey cache coherency as a "memory operation" to initiate an IPI, and a "memory operation" to receive one. This way at least they can be reasoned about in generic code, and smp_mb used to provide ordering. The combination of these two changes means that explict barriers can be taken out of queue handling for the single case -- shared data is explicitly locked, and ipi ordering must conform to that, so no barriers needed. An extra barrier is needed in the many handler, so as to ensure we load the list element after the IPI is received. Does any architecture actually need these barriers? For the initiator I could see it, but for the handler I would be surprised. So the other thing we could do for simplicity is just to require that, rather than just matching with cache coherency, we just require a full barrier before generating an IPI, and after receiving an IPI. In which case, the smp_mb()s can go away. But just for now, we'll be on the safe side and use the barriers (they're in the slow case anyway). Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-arch@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:27:08 +01:00
Ingo Molnar	886b5b73d7	tracing: remove /debug/tracing/latency_trace Impact: remove old debug/tracing API /debug/tracing/latency_trace is an old legacy format we kept from the old latency tracer. Remove the file for now. If there's any useful bit missing then we'll propagate any useful output bits into the /debug/tracing/trace output. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 11:05:34 +01:00
Ingo Molnar	2d542cf342	tracing/hw-branch-tracing: convert bts-tracer mutex to a spinlock Impact: fix CPU hotplug lockup bts_hotcpu_handler() is called with irqs disabled, so using mutex_lock() is a no-no. All the BTS codepaths here are atomic (they do not schedule), so using a spinlock is the right solution. Cc: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 09:16:01 +01:00
Steven Rostedt	1473e4417c	tracing: make event directory structure This patch adds the directory /debug/tracing/events/ that will contain all the registered trace points. # ls /debug/tracing/events/ sched_kthread_stop sched_process_fork sched_switch sched_kthread_stop_ret sched_process_free sched_wait_task sched_migrate_task sched_process_wait sched_wakeup sched_process_exit sched_signal_send sched_wakeup_new # ls /debug/tracing/events/sched_switch/ enable # cat /debug/tracing/events/sched_switch/enable 1 # cat /debug/tracing/set_event sched_switch Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-24 21:54:08 -05:00
Steven Rostedt	f3fe8e4a38	tracing: add schedule events to event trace This patch changes the trace/sched.h to use the DECLARE_TRACE_FMT such that they are automatically registered with the event tracer. And it also adds the tracing sched headers to kernel/trace/events.c Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-24 21:54:07 -05:00
Steven Rostedt	b77e38aa24	tracing: add event trace infrastructure This patch creates the event tracing infrastructure of ftrace. It will create the files: /debug/tracing/available_events /debug/tracing/set_event The available_events will list the trace points that have been registered with the event tracer. set_events will allow the user to enable or disable an event hook. example: # echo sched_wakeup > /debug/tracing/set_event Will enable the sched_wakeup event (if it is registered). # echo "!sched_wakeup" >> /debug/tracing/set_event Will disable the sched_wakeup event (and only that event). # echo > /debug/tracing/set_event Will disable all events (notice the '>') # cat /debug/tracing/available_events > /debug/tracing/set_event Will enable all registered event hooks. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-24 21:54:05 -05:00
Ingo Molnar	0edcf8d692	Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu Conflicts: arch/x86/include/asm/pgtable.h	2009-02-24 21:52:45 +01:00
Markus Metzger	5e01cb695d	x86, ftrace: fix section mismatch in hw-branch-tracer Fix an invalid memory reference problem when cpu hotplug support is disabled and the hw-branch-tracer is set as current tracer. Initializing the tracer calls bts_trace_init() which has already been freed at this time. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-24 18:23:34 +01:00
Ingo Molnar	a7f4463e03	Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc6' into tracing/core	2009-02-24 18:22:39 +01:00
David S. Miller	e70049b9e7	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	2009-02-24 03:50:29 -08:00
Benjamin Herrenschmidt	35f88e6b06	Merge commit 'ftrace/function-graph' into next	2009-02-23 10:47:23 +11:00
Ingo Molnar	fc6fc7f1b1	Merge branch 'linus' into x86/apic Conflicts: arch/x86/mach-default/setup.c Semantic conflict resolution: arch/x86/kernel/setup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-22 20:05:19 +01:00
Rafael J. Wysocki	770824bdc4	PM: Split up sysdev_[suspend\|resume] from device_power_[down\|up] Move the sysdev_suspend/resume from the callee to the callers, with no real change in semantics, so that we can rework the disabling of interrupts during suspend/hibernation. This is based on an earlier patch from Linus. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-22 10:33:44 -08:00
Ingo Molnar	c478f87869	Merge branch 'tip/x86/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace Conflicts: include/linux/ftrace.h kernel/trace/ftrace.c	2009-02-22 18:12:01 +01:00
Linus Torvalds	adfafefd10	Merge branch 'hibernate' * hibernate: PM: Fix suspend_console and resume_console to use only one semaphore PM: Wait for console in resume PM: Fix pm_notifiers during user mode hibernation swsusp: clean up shrink_all_zones() swsusp: dont fiddle with swappiness PM: fix build for CONFIG_PM unset PM/hibernate: fix "swap breaks after hibernation failures" PM/resume: wait for device probing to finish Consolidate driver_probe_done() loops into one place	2009-02-21 14:17:26 -08:00
Arve Hjønnevåg	403f307576	PM: Fix suspend_console and resume_console to use only one semaphore This fixes a race where a thread acquires the console while the console is suspended, and the console is resumed before this thread releases it. In this case, the secondary console semaphore would be left locked, and the primary semaphore would be released twice. This in turn would cause the console switch on suspend or resume to hang forever. Note that suspend_console does not actually lock the console for clients that use acquire_console_sem, it only locks it for clients that use try_acquire_console_sem. If we change suspend_console to fully lock the console, then the kernel may deadlock on suspend. One client of try_acquire_console_sem is acquire_console_semaphore_for_printk, which uses it to prevent printk from using the console while it is suspended. Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-21 14:17:18 -08:00
Arve Hjønnevåg	b090f9fa53	PM: Wait for console in resume Avoids later waking up to a blinking cursor if the device woke up and returned to sleep before the console switch happened. Signed-off-by: Brian Swetland <swetland@google.com> Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-21 14:17:17 -08:00
Andrey Borzenkov	ebae2604f2	PM: Fix pm_notifiers during user mode hibernation Snapshot device is opened with O_RDONLY during suspend and O_WRONLY durig resume. Make sure we also call notifiers with correct parameter telling them what we are really doing. Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-21 14:17:17 -08:00
Rafael J. Wysocki	09664fda48	PM: fix build for CONFIG_PM unset Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken config dependncies. Fix that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-21 14:17:17 -08:00
Arjan van de Ven	eed3ee0829	PM/resume: wait for device probing to finish the resume code does not currently wait for device probing to finish. Even without async function calls this is dicey and not correct, but with async function calls during the boot sequence this is going to get hit more... This patch adds the synchronization using the newly introduced helper. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Acked-by: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-21 14:17:17 -08:00
Steven Rostedt	4377245aa9	ftrace: break out modify loop immediately on detection of error Impact: added precaution on failure detection Break out of the modifying loop as soon as a failure is detected. This is just an added precaution found by code review and was not found by any bug chasing. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-20 14:30:20 -05:00
Steven Rostedt	000ab69117	ftrace: allow archs to preform pre and post process for code modification This patch creates the weak functions: ftrace_arch_code_modify_prepare and ftrace_arch_code_modify_post_process that are called before and after the stop machine is called to modify the kernel text. If the arch needs to do pre or post processing, it only needs to define these functions. [ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_* over using ftrace_arch_modify_* ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-20 13:16:18 -05:00
Ingo Molnar	3b6f7b9beb	Merge branch 'x86/urgent' into x86/core	2009-02-20 17:40:43 +01:00
Frederic Weisbecker	f9349a8f97	tracing/function-graph-tracer: make set_graph_function file support ftrace regex Impact: trace only functions matching a pattern The set_graph_function file let one to trace only one or several chosen functions and follow all their code flow. Currently, only a constant function name is allowed so this patch allows the ftrace_regex functions: - matches all functions that end with "name": echo name > set_graph_function - matches all functions that begin with "name": echo name > set_graph_function - matches all functions that contains "name": echo name > set_graph_function Example: echo mutex* > set_graph_function 0) \| mutex_lock_nested() { 0) 0.563 us \| __might_sleep(); 0) 2.072 us \| } 0) \| mutex_unlock() { 0) 1.036 us \| __mutex_unlock_slowpath(); 0) 2.433 us \| } 0) \| mutex_unlock() { 0) 0.691 us \| __mutex_unlock_slowpath(); 0) 1.787 us \| } 0) \| mutex_lock_interruptible_nested() { 0) 0.548 us \| __might_sleep(); 0) 1.945 us \| } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-20 11:36:24 +01:00
Tejun Heo	fbf59bc9d7	percpu: implement new dynamic percpu allocator Impact: new scalable dynamic percpu allocator which allows dynamic percpu areas to be accessed the same way as static ones Implement scalable dynamic percpu allocator which can be used for both static and dynamic percpu areas. This will allow static and dynamic areas to share faster direct access methods. This feature is optional and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by arch. Please read comment on top of mm/percpu.c for details. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>	2009-02-20 16:29:08 +09:00
Rusty Russell	b36128c830	alloc_percpu: change percpu_ptr to per_cpu_ptr Impact: cleanup There are two allocated per-cpu accessor macros with almost identical spelling. The original and far more popular is per_cpu_ptr (44 files), so change over the other 4 files. tj: kill percpu_ptr() and update UP too Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: mingo@redhat.com Cc: lenb@kernel.org Cc: cpufreq@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2009-02-20 16:29:08 +09:00
Tejun Heo	6b588c18f8	module: reorder module pcpu related functions Impact: cleanup Move percpu_modinit() upwards. This is to ease further changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-02-20 16:29:07 +09:00
Linus Torvalds	f54b2fe4ae	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: limit the number of loops the ring buffer self test can make tracing: have function trace select kallsyms tracing: disable tracing while testing ring buffer tracing/function-graph-tracer: trace the idle tasks	2009-02-19 09:14:22 -08:00
Ingo Molnar	00a8bf8593	tracing/function-graph-tracer: fix merge Merge artifact: pid got changed to ent->pid meanwhile. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-19 13:01:37 +01:00
Frederic Weisbecker	d1f9cbd788	tracing/function-graph-tracer: fix traces weirdness while absolute time printing Impact: trace output cleanup/reordering When an interrupt occurs and and the abstime option is selected: echo funcgraph-abstime > /debug/tracing/trace_options then we observe broken traces: 30581.025422 \| 0) Xorg-4291 \| 0.503 us \| idle_cpu(); 30581.025424 \| 0) Xorg-4291 \| 2.576 us \| } 30581.025424 \| 0) Xorg-4291 \| + 75.771 us \| } 0) Xorg-4291 \| <========== \| 30581.025425 \| 0) Xorg-4291 \| \| schedule() { 30581.025426 \| 0) Xorg-4291 \| \| __schedule() { 30581.025426 \| 0) Xorg-4291 \| 0.705 us \| _spin_lock_irq(); With this patch, the interrupts output better adapts to absolute time printing: 414.856543 \| 1) Xorg-4279 \| 8.816 us \| } 414.856544 \| 1) Xorg-4279 \| 0.525 us \| rcu_irq_exit(); 414.856545 \| 1) Xorg-4279 \| 0.526 us \| idle_cpu(); 414.856546 \| 1) Xorg-4279 \| + 12.157 us \| } 414.856549 \| 1) Xorg-4279 \| ! 104.114 us \| } 414.856549 \| 1) Xorg-4279 \| <========== \| 414.856549 \| 1) Xorg-4279 \| ! 107.944 us \| } 414.856550 \| 1) Xorg-4279 \| ! 137.010 us \| } 414.856551 \| 1) Xorg-4279 \| 0.624 us \| _read_unlock(); 414.856552 \| 1) Xorg-4279 \| ! 140.930 us \| } 414.856552 \| 1) Xorg-4279 \| ! 166.159 us \| } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-19 12:33:21 +01:00
Ingo Molnar	4cd0332db7	Merge branch 'mainline/function-graph' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/function-graph-tracer	2009-02-19 12:13:33 +01:00
Ingo Molnar	40999096e8	Merge branches 'tracing/blktrace', 'tracing/ftrace' and 'tracing/urgent' into tracing/core	2009-02-19 10:20:17 +01:00
john stultz	fdcedf7b75	time: apply NTP frequency/tick changes immediately Since the GENERIC_TIME changes landed, the adjtimex behavior changed for struct timex.tick and .freq changed. When the tick or freq value is set, we adjust the tick_length_base in ntp_update_frequency(). However, this new value doesn't get applied to tick_length until the next second (via second_overflow). This means some applications that do quick time tweaking do not see the requested change made as quickly as expected. I've run a few tests with this change, and ntpd still functions fine. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-19 10:10:08 +01:00
Ingo Molnar	72c26c9a26	Merge branch 'linus' into tracing/blktrace Conflicts: block/blktrace.c Semantic merge: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-19 09:00:35 +01:00
Steven Rostedt	4b3e3d2284	tracing: limit the number of loops the ring buffer self test can make Impact: prevent deadlock if ring buffer gets corrupted This patch adds a paranoid check to make sure the ring buffer consumer does not go into an infinite loop. Since the ring buffer has been set to read only, the consumer should not loop for more than the ring buffer size. A check is added to make sure the consumer does not loop more than the ring buffer size. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-18 22:50:01 -05:00
Steven Rostedt	4d7a077c0c	tracing: have function trace select kallsyms Impact: fix output of function tracer to be useful The function tracer is pretty useless if KALLSYMS is not configured. Unless you are good at reading hex values, the function tracer should select the KALLSYMS configuration. Also, the dynamic function tracer will fail its self test if KALLSYMS is not selected. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-18 22:06:18 -05:00
Steven Rostedt	0c5119c1e6	tracing: disable tracing while testing ring buffer Impact: fix to prevent hard lockup on self tests If one of the tracers are broken and is constantly filling the ring buffer while the test of the ring buffer is running, it will hang the box. The reason is that the test is a consumer that will not stop till the ring buffer is empty. But if the tracer is broken and is constantly producing input to the buffer, this test will never end. The result is a lockup of the box. This happened when KALLSYMS was not defined and the dynamic ftrace test constantly filled the ring buffer, because the filter failed and all functions were being traced. Something was being called that constantly filled the buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-18 22:04:01 -05:00
Linus Torvalds	ba95fd47d1	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list block: fix booting from partitioned md array block: revert part of `18ce3751cc` cciss: PCI power management reset for kexec paride/pg.c: xs(): &&/\|\| confusion fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free block: fix bad definition of BIO_RW_SYNC bsg: Fix sense buffer bug in SG_IO	2009-02-18 18:33:04 -08:00
Rafael J. Wysocki	42f5e039c3	pm: fix build for CONFIG_PM unset Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken config dependncies. Fix that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:54 -08:00
Li Zefan	67e055d144	cgroups: fix possible use after free In cgroup_kill_sb(), root is freed before sb is detached from the list, so another sget() may find this sb and call cgroup_test_super(), which will access the root that has been freed. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:54 -08:00
Steven Rostedt	712406a6bf	tracing/function-graph-tracer: make arch generic push pop functions There is nothing really arch specific of the push and pop functions used by the function graph tracer. This patch moves them to generic code. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-18 13:43:04 -05:00
Ingo Molnar	74019224ac	timers: add mod_timer_pending() Impact: new timer API Based on an idea from Martin Josefsson with the help of Patrick McHardy and Stephen Hemminger: introduce the mod_timer_pending() API which is a mod_timer() offspring that is an invariant on already removed timers. (regular mod_timer() re-activates non-pending timers.) This is useful for the networking code in that it can allow unserialized mod_timer_pending() timer-forwarding calls, but a single del_timer*() will stop the timer from being reactivated again. Also while at it: - optimize the regular mod_timer() path some more, the timer-stat and a debug check was needlessly duplicated in __mod_timer(). - make the exports come straight after the function, as most other exports in timer.c already did. - eliminate __mod_timer() as an external API, change the users to mod_timer(). The regular mod_timer() code path is not impacted significantly, due to inlining optimizations and due to the simplifications. Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-18 19:26:33 +01:00
Jens Axboe	93dbb39350	block: fix bad definition of BIO_RW_SYNC We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before `213d9417fe`. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-02-18 10:32:00 +01:00
Frederic Weisbecker	fa7c7f6e11	tracing/core: remove unused parameter in tracing_fill_pipe_page() Impact: cleanup The struct page *pages parameter is unused. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-18 01:40:20 +01:00
Frederic Weisbecker	6eaaa5d57e	tracing/core: use appropriate waiting on trace_pipe Impact: api and pipe waiting change Currently, the waiting used in tracing_read_pipe() is done through a 100 msecs schedule_timeout() loop which periodically check if there are traces on the buffer. This can cause small latencies for programs which are reading the incoming events. This patch makes the reader waiting for the trace_wait waitqueue except for few tracers such as the sched and functions tracers which might be already hold the runqueue lock while waking up the reader. This is performed through a new callback wait_pipe() on struct tracer. If none is implemented on a specific tracer, the default waiting for trace_wait queue is attached. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-18 01:40:20 +01:00
Ingo Molnar	ac07bcaa82	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-18 01:09:07 +01:00
Ingo Molnar	37bd824a35	Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core	2009-02-18 01:08:13 +01:00
Linus Torvalds	e78ac4b9de	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: cpu hotplug fix	2009-02-17 14:30:06 -08:00
Linus Torvalds	29a0c5ce5d	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: more consistently use clock vs timer	2009-02-17 14:29:42 -08:00
Linus Torvalds	f8effd1a4a	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: doc: mmiotrace.txt, buffer size control change trace: mmiotrace to the tracer menu in Kconfig mmiotrace: count events lost due to not recording	2009-02-17 14:29:15 -08:00
Linus Torvalds	8ce9a75a30	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: iommu: fix Intel IOMMU write-buffer flushing futex: fix reference leak Trivial conflicts fixed manually in drivers/pci/intel-iommu.c	2009-02-17 14:26:35 -08:00
Ingo Molnar	f17c75453b	irq: name 'p' variables a bit better 'p' stands for pointer - make it clear in setup_irq() and free_irq() what kind of pointer it is. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-17 20:44:47 +01:00
Ingo Molnar	8316e38100	irq: further clean up the free_irq() code flow Linus noticed that the 'pp' variable can be eliminated altogether, and the loop can be cleaned up further. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-17 20:28:29 +01:00
Frederic Weisbecker	5b058bcde9	tracing/function-graph-tracer: trace the idle tasks When the function graph tracer is activated, it iterates over the task_list to allocate a stack to store the return addresses. But the per cpu idle tasks are not iterated by using do_each_thread / while_each_thread. So we have to iterate on them manually. This fixes somes weirdness in the traces and many losses of traces. Examples on two cpus: 0) Xorg-4287 \| 2.906 us \| } 0) Xorg-4287 \| 3.965 us \| } 0) Xorg-4287 \| 5.302 us \| } ------------------------------------------ 0) Xorg-4287 => <idle>-0 ------------------------------------------ 0) <idle>-0 \| 2.861 us \| } 0) <idle>-0 \| 0.526 us \| set_normalized_timespec(); 0) <idle>-0 \| 7.201 us \| } 0) <idle>-0 \| 8.214 us \| } 0) <idle>-0 \| \| clockevents_program_event() { 0) <idle>-0 \| \| lapic_next_event() { 0) <idle>-0 \| 0.510 us \| native_apic_mem_write(); 0) <idle>-0 \| 1.546 us \| } 0) <idle>-0 \| 2.583 us \| } 0) <idle>-0 \| + 12.435 us \| } 0) <idle>-0 \| + 13.470 us \| } 0) <idle>-0 \| 0.608 us \| _spin_unlock_irqrestore(); 0) <idle>-0 \| + 23.270 us \| } 0) <idle>-0 \| + 24.336 us \| } 0) <idle>-0 \| + 25.417 us \| } 0) <idle>-0 \| 0.593 us \| _spin_unlock(); 0) <idle>-0 \| + 41.869 us \| } 0) <idle>-0 \| + 42.906 us \| } 0) <idle>-0 \| + 95.035 us \| } 0) <idle>-0 \| 0.540 us \| menu_reflect(); 0) <idle>-0 \| ! 100.404 us \| } 0) <idle>-0 \| 0.564 us \| mce_idle_callback(); 0) <idle>-0 \| \| enter_idle() { 0) <idle>-0 \| 0.526 us \| mce_idle_callback(); 0) <idle>-0 \| 1.757 us \| } 0) <idle>-0 \| \| cpuidle_idle_call() { 0) <idle>-0 \| \| menu_select() { 0) <idle>-0 \| 0.525 us \| pm_qos_requirement(); 0) <idle>-0 \| 0.518 us \| tick_nohz_get_sleep_length(); 0) <idle>-0 \| 2.621 us \| } [...] 1) <idle>-0 \| 0.518 us \| touch_softlockup_watchdog(); 1) <idle>-0 \| + 14.355 us \| } 1) <idle>-0 \| + 22.840 us \| } 1) <idle>-0 \| + 25.949 us \| } 1) <idle>-0 \| \| handle_irq() { 1) <idle>-0 \| 0.511 us \| irq_to_desc(); 1) <idle>-0 \| \| handle_edge_irq() { 1) <idle>-0 \| 0.638 us \| _spin_lock(); 1) <idle>-0 \| \| ack_apic_edge() { 1) <idle>-0 \| 0.510 us \| irq_to_desc(); 1) <idle>-0 \| \| move_native_irq() { 1) <idle>-0 \| 0.510 us \| irq_to_desc(); 1) <idle>-0 \| 1.532 us \| } 1) <idle>-0 \| 0.511 us \| native_apic_mem_write(); ------------------------------------------ 1) <idle>-0 => cat-5073 ------------------------------------------ 1) cat-5073 \| 3.731 us \| } 1) cat-5073 \| \| run_local_timers() { 1) cat-5073 \| 0.533 us \| hrtimer_run_queues(); 1) cat-5073 \| \| raise_softirq() { 1) cat-5073 \| \| __raise_softirq_irqoff() { 1) cat-5073 \| \| /* nr: 1 */ 1) cat-5073 \| 2.718 us \| } 1) cat-5073 \| 3.814 us \| } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-17 19:20:17 +01:00
Steven Rostedt	35ebf1caa4	ftrace: show unlimited when traceon or traceoff has no counter Impact: clean up The traceon and traceoff function probes are confusing to developers to what happens when a counter is not specified. This should help clear things up. # echo "*:traceoff" > set_ftrace_filter # cat /debug/tracing/set_ftrace_filter #### all functions enabled #### do_fork:traceoff:unlimited Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-17 13:12:12 -05:00
Wenji Huang	73d8b8bc4f	tracing: fix typing mistake in hint message and comments Impact: cleanup Fix incorrect hint message in code and typos in comments. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-17 12:38:24 -05:00
Wenji Huang	d2ef7c2f0f	tracing: fix the return value of trace selftest This patch is to fix the return value of trace_selftest_startup_sysprof and trace_selftest_startup_branch on failure. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-17 12:38:24 -05:00
Wenji Huang	af51309845	tracing: use the more proper parameter Pass tsk to tracing_record_cmdline instead of current. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-17 12:38:23 -05:00
Steven Rostedt	b6887d7916	ftrace: rename _hook to _probe Impact: clean up Ingo Molnar did not like the _hook naming convention used by the select function tracer. Luis Claudio R. Goncalves suggested using the "_probe" extension. This patch implements the change of calling the functions and variables "_hook" and replacing them with "_probe". Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-17 12:32:04 -05:00
Steven Rostedt	6a24a244cd	ftrace: clean up coding style Ingo Molnar pointed out some coding style issues with the recent ftrace updates. This patch cleans them up. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-17 11:20:26 -05:00
Ingo Molnar	494df596f9	Merge branches 'x86/acpi', 'x86/apic', 'x86/cpudetect', 'x86/headers', 'x86/paravirt', 'x86/urgent' and 'x86/xen'; commit 'v2.6.29-rc5' into x86/core	2009-02-17 12:07:00 +01:00
Ingo Molnar	73d3fd96e7	ftrace: fix !CONFIG_DYNAMIC_FTRACE ftrace_swapper_pid definition Impact: build fix Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-17 11:50:42 +01:00
Ingo Molnar	f492d3f838	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-17 11:31:01 +01:00
Ingo Molnar	c4e2b432d5	Merge branches 'tracing/hw-branch-tracing' and 'tracing/power-tracer' into tracing/core	2009-02-17 11:29:53 +01:00
Steven Rostedt	e110e3d1ea	ftrace: add pretty print function for traceon and traceoff hooks This patch adds a pretty print version of traceon and traceoff output for set_ftrace_filter. # echo 'sys_open:traceon:4' > set_ftrace_filter # cat set_ftrace_filter #### all functions enabled #### sys_open:traceon:count=4 Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 23:38:13 -05:00
Steven Rostedt	809dcf29ce	ftrace: add pretty print to selected fuction traces This patch adds a call back for the tracers that have hooks to selected functions. This allows the tracer to show better output in the set_ftrace_filter file. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 23:06:01 -05:00
Steven Rostedt	8fc0c701c5	ftrace: show selected functions in set_ftrace_filter This patch adds output to show what functions have tracer hooks attached to them. # echo 'sys_open:traceon:4' > /debug/tracing/set_ftrace_filter # cat set_ftrace_filter #### all functions enabled #### sys_open:ftrace_traceon:0000000000000004 # echo 'do_fork:traceoff:' > set_ftrace_filter # cat set_ftrace_filter #### all functions enabled #### sys_open:ftrace_traceon:0000000000000002 do_fork:ftrace_traceoff:ffffffffffffffff Note the 4 changed to a 2. This is because The code was executed twice since the traceoff was added. If a cat is done again: #### all functions enabled #### sys_open:ftrace_traceon do_fork:ftrace_traceoff:ffffffffffffffff The number disappears. That is because it will not print a NULL. Callbacks to allow the tracer to pretty print will be implemented soon. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 22:50:06 -05:00
Steven Rostedt	23b4ff3aa4	ftrace: add traceon traceoff commands to enable/disable the buffers This patch adds the new function selection commands traceon and traceoff. traceon sets the function to enable the ring buffers while traceoff disables the ring buffers. You can pass in the number of times you want the command to be executed when the function is hit. It will only execute if the state of the buffers are not already in that state. Example: # echo do_fork:traceon:4 Will enable the ring buffers if they are disabled every time it hits do_fork, up to 4 times. # echo sys_close:traceoff This will disable the ring buffers every time (unlimited) when sys_close is called. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 22:50:04 -05:00
Steven Rostedt	988ae9d6b2	ring-buffer: add tracing_is_on to test if ring buffer is enabled This patch adds the tracing_is_on() interface to tell if the ring buffer is turned on or not. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 22:50:01 -05:00
Steven Rostedt	59df055f19	ftrace: trace different functions with a different tracer Impact: new feature Currently, the function tracer only gives you an ability to hook a tracer to all functions being traced. The dynamic function trace allows you to pick and choose which of those functions will be traced, but all functions being traced will call all tracers that registered with the function tracer. This patch adds a new feature that allows a tracer to hook to specific functions, even when all functions are being traced. It allows for different functions to call different tracer hooks. The way this is accomplished is by a special function that will hook to the function tracer and will set up a hash table knowing which tracer hook to call with which function. This is the most general and easiest method to accomplish this. Later, an arch may choose to supply their own method in changing the mcount call of a function to call a different tracer. But that will be an exercise for the future. To register a function: struct ftrace_hook_ops { void (func)(unsigned long ip, unsigned long parent_ip, void data); int (callback)(unsigned long ip, void *data); void (free)(void *data); }; int register_ftrace_function_hook(char glob, struct ftrace_hook_ops ops, void data); glob is a simple glob to search for the functions to hook. ops is a pointer to the operations (listed below) data is the default data to be passed to the hook functions when traced ops: func is the hook function to call when the functions are traced callback is a callback function that is called when setting up the hash. That is, if the tracer needs to do something special for each function, that is being traced, and wants to give each function its own data. The address of the entry data is passed to this callback, so that the callback may wish to update the entry to whatever it would like. free is a callback for when the entry is freed. In case the tracer allocated any data, it is give the chance to free it. To unregister we have three functions: void unregister_ftrace_function_hook(char glob, struct ftrace_hook_ops ops, void data) This will unregister all hooks that match glob, point to ops, and have its data matching data. (note, if glob is NULL, blank or '', all functions will be tested). void unregister_ftrace_function_hook_func(char glob, struct ftrace_hook_ops ops) This will unregister all functions matching glob that has an entry pointing to ops. void unregister_ftrace_function_hook_all(char *glob) This simply unregisters all funcs. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 22:44:09 -05:00
Steven Rostedt	e6ea44e9b4	ftrace: consolidate mutexes Impact: clean up Now that ftrace_lock is a mutex, there is no reason to have three different mutexes protecting similar data. All the mutex paths are not in hot paths, so having a mutex to cover more data is not a problem. This patch removes the ftrace_sysctl_lock and ftrace_start_lock and uses the ftrace_lock to protect the locations that were protected by these locks. By doing so, this change also removes some of the lock nesting that was taking place. There are still more mutexes in ftrace.c that can probably be consolidated, but they can be dealt with later. We need to be careful about the way the locks are nested, and by consolidating, we can cause a recursive deadlock. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 18:15:31 -05:00
Steven Rostedt	52baf11922	ftrace: convert ftrace_lock from a spinlock to mutex Impact: clean up The older versions of ftrace required doing the ftrace list search under atomic context. Now all the calls are in non-atomic context. There is no reason to keep the ftrace_lock as a spinlock. This patch converts it to a mutex. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 17:33:14 -05:00
Steven Rostedt	f6180773d9	ftrace: add command interface for function selection Allow for other tracers to add their own commands for function selection. This interface gives a trace the ability to name a command for function selection. Right now it is pretty limited in what it offers, but this is a building step for more features. The :mod: command is converted to this interface and also serves as a template for other implementations. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 17:06:02 -05:00
Steven Rostedt	e68746a271	ftrace: enable filtering only when a function is filtered on Impact: fix to prevent empty set_ftrace_filter and no ftrace output The function filter is used to only trace a given set of functions. The filter is enabled when a function name is echoed into the set_ftrace_filter file. But if the name has a typo and the function is not found, the filter is enabled, but no function is listed. This makes a confusing situation where set_ftrace_filter is empty but no functions ever get enabled for tracing. For example: # cat /debug/tracing/set_ftrace_filter #### all functions enabled #### # echo bad_name > set_ftrace_filter # cat /debug/tracing/set_ftrace_filter # echo function > current_tracer # cat trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # \| \| \| \| \| This patch changes that to only enable filtering if a function is set to be filtered on. Now, the filter is not enabled if a bad name is echoed into set_ftrace_filter. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 17:03:49 -05:00
Steven Rostedt	64e7c44061	ftrace: add module command function filter selection This patch adds a "command" syntax to the function filtering files: /debugfs/tracing/set_ftrace_filter /debugfs/tracing/set_ftrace_notrace Of the format: <function>:<command>:<parameter> The command is optional, and dependent on the command, so are the parameters. echo do_fork > set_ftrace_filter Will only trace 'do_fork'. echo 'sched_' > set_ftrace_filter Will only trace functions starting with the letters 'sched_'. echo ':mod:ext3' > set_ftrace_filter Will trace only the ext3 module functions. echo 'write:mod:ext3' > set_ftrace_notrace Will prevent the ext3 functions with the letters 'write' in the name from being traced. echo '!*_allocate:mod:ext3' > set_ftrace_filter Will remove the functions in ext3 that end with the letters '_allocate' from the ftrace filter. Although this patch implements the 'command' format, only the 'mod' command is supported. More commands to follow. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 16:55:50 -05:00
Steven Rostedt	9f4801e30a	ftrace: break up ftrace_match_records into smaller components Impact: clean up ftrace_match_records does a lot of things that other features can use. This patch breaks up ftrace_match_records and pulls out ftrace_setup_glob and ftrace_match_record. ftrace_setup_glob prepares a simple glob expression for use with ftrace_match_record. ftrace_match_record compares a single record with a glob type. Breaking this up will allow for more features to run on individual records. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 16:49:57 -05:00
Steven Rostedt	7f24b31b01	ftrace: rename ftrace_match to ftrace_match_records Impact: clean up ftrace_match is too generic of a name. What it really does is search all records and matches the records with the given string, and either sets or unsets the functions to be traced depending on if the parameter 'enable' is set or not. This allows us to make another function called ftrace_match that can be used to test a single record. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 16:33:15 -05:00
Steven Rostedt	265c831cb0	ftrace: add do_for_each_ftrace_rec and while_for_each_ftrace_rec Impact: clean up To iterate over all the functions that dynamic trace knows about it requires two for loops. One to iterate over the pages and the other to iterate over the records within the page. There are several duplications of these loops in ftrace.c. This patch creates the macros do_for_each_ftrace_rec and while_for_each_ftrace_rec to handle this logic, and removes the duplicate code. While making this change, I also discovered and fixed a small bug that one of the iterations should exit the loop after it found the record it was searching for. This used a break when it should have used a goto, since there were two loops it needed to break out from. No real harm was done by this bug since it would only continue to search the other records, and the code was in a slow path anyway. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 16:25:12 -05:00
Steven Rostedt	0c75a3ed63	ftrace: state that all functions are enabled in set_ftrace_filter Impact: clean up, make set_ftrace_filter less confusing The set_ftrace_filter shows only the functions that will be traced. But when it is empty, it will trace all functions. This can be a bit confusing. This patch makes set_ftrace_filter show: #### all functions enabled #### When all functions will be traced, and we do not filter only a select few. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-16 16:21:35 -05:00
Américo Wang	2b8f836fb1	sched: use TASK_NICE for task_struct #define TASK_NICE(p) PRIO_TO_NICE((p)->static_prio) So it's better to use TASK_NICE here. Signed-off-by: WANG Cong <wangcong@zeuux.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-16 13:45:52 +01:00
Patrick Ohly	a75244c3d5	timecompare: generic infrastructure to map between two time bases Mapping from a struct timecounter to a time returned by functions like ktime_get_real() is implemented. This is sufficient to use this code in a network device driver which wants to support hardware time stamping and transformation of hardware time stamps to system time. The interface could have been made more versatile by not depending on a time counter, but this wasn't done to avoid writing glue code elsewhere. The method implemented here is the one used and analyzed under the name "assisted PTP" in the LCI PTP paper: http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-15 22:43:32 -08:00
Patrick Ohly	a038a353c3	clocksource: allow usage independent of timekeeping.c So far struct clocksource acted as the interface between time/timekeeping.c and hardware. This patch generalizes the concept so that a similar interface can also be used in other contexts. For that it introduces new structures and related functions without touching the existing struct clocksource. The reasons for adding these new structures to clocksource.[ch] are * the APIs are clearly related * struct clocksource could be cleaned up to use the new structs * avoids proliferation of files with similar names (timesource.h? timecounter.h?) As outlined in the discussion with John Stultz, this patch adds * struct cyclecounter: stateless API to hardware which counts clock cycles * struct timecounter: stateful utility code built on a cyclecounter which provides a nanosecond counter * only the function to read the nanosecond counter; deltas are used internally and not exposed to users of timecounter The code does no locking of the shared state. It must be called at least as often as the cycle counter wraps around to detect these wrap arounds. Both is the responsibility of the timecounter user. Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-15 22:43:31 -08:00
Henrik Austad	a0a522ce3d	sched: idle_at_tick is only used when CONFIG_SMP is set Impact: struct rq size optimization The idle_at_tick in struct rq is only used in SMP settings and it does not make sense to have this in the rq in an UP setup. Signed-off-by: Henrik Austad <henrik@austad.us> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 21:16:10 +01:00
Ingo Molnar	5274f8354d	Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core	2009-02-15 21:15:16 +01:00
Ingo Molnar	72b623c736	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/power-tracer	2009-02-15 20:43:03 +01:00
Rakib Mullick	a234aa9ecd	tracing: fix section mismatch in trace_hw_branches.c The function bts_trace_init() references a variable bts_hotcpu_notifier which is marked as __cpuinitdata. Thus causes section mismatch. This patch fixes it. LD kernel/trace/built-in.o WARNING: kernel/trace/built-in.o(.text+0xc90c): Section mismatch in reference from the function bts_trace_init() to the variable .cpuinit.data:bts_hotcpu_notifier The function bts_trace_init() references the variable __cpuinitdata bts_hotcpu_notifier. This is often because bts_trace_init lacks a __cpuinitdata annotation or the annotation of bts_hotcpu_notifier is wrong. WARNING: kernel/trace/built-in.o(.text+0xc92a): Section mismatch in reference from the function bts_trace_reset() to the variable .cpuinit.data:bts_hotcpu_notifier The function bts_trace_reset() references the variable __cpuinitdata bts_hotcpu_notifier. This is often because bts_trace_reset lacks a __cpuinitdata annotation or the annotation of bts_hotcpu_notifier is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: markus.t.metzger@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 20:41:08 +01:00
Ingo Molnar	9abd603048	Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core Conflicts: kernel/trace/trace_mmiotrace.c	2009-02-15 20:08:55 +01:00
Pekka Paalanen	6bc5c366b1	trace: mmiotrace to the tracer menu in Kconfig Impact: cosmetic change in Kconfig menu layout This patch was originally suggested by Peter Zijlstra, but seems it was forgotten. CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable directly under the Kernel hacking / debugging menu in the kernel configuration system. They were present only for x86 and x86_64. Other tracers that use the ftrace tracing framework are in their own sub-menu. This patch moves the mmiotrace configuration options there. Since the Kconfig file, where the tracer menu is, is not architecture specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by x86/x86_64. CONFIG_MMIOTRACE now depends on it. Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 20:03:28 +01:00
Pekka Paalanen	391b170f10	mmiotrace: count events lost due to not recording Impact: enhances lost events counting in mmiotrace The tracing framework, or the ring buffer facility it uses, has a switch to stop recording data. When recording is off, the trace events will be lost. The framework does not count these, so mmiotrace has to count them itself. Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 20:02:42 +01:00
Ingo Molnar	ae88a23b32	irq: refactor and clean up the free_irq() code flow Impact: cleanup - separate out the loop from the actual freeing logic, this wins us two indentation levels allowing a number of followup prettifications - turn the WARN_ON() into a more informative WARN(). - clean up the comments and the code flow some more Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 11:36:49 +01:00
Ingo Molnar	327ec5699c	irq: clean up manage.c - make printk message git-greppable - fix a few style details Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 11:21:37 +01:00
Ingo Molnar	b69bc39674	Merge commit 'v2.6.29-rc5' into x86/apic	2009-02-15 09:00:18 +01:00
David S. Miller	5e30589521	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ Conflicts: drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c	2009-02-14 23:12:00 -08:00
Peter Zijlstra	868a23a804	lockdep: build fix for !PROVE_LOCKING The __GFP_FS annotations fail to build with CONFIG_LOCKDEP=y, CONFIG_PROVE_LOCKING=n, ammend that. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-15 00:30:26 +01:00
Peter Zijlstra	9833f8cb95	lockstat: warn about disabled lock debugging Avoid confusion and clearly state lock debugging got disabled. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:28 +01:00
Peter Zijlstra	b4b136f44b	lockdep: use stringify.h Arnd pointed out we have the stringify macro magic already in-kernel. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:26 +01:00
Peter Zijlstra	4f367d8adc	lockdep: simplify check_prev_add_irq() Remove the manual state iteration thingy. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:24 +01:00
Peter Zijlstra	f510b233cf	lockdep: get_user_chars() redo Generic, states independent, get_user_chars(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:22 +01:00
Peter Zijlstra	3ff176ca47	lockdep: simplify get_user_chars() there's too much repetition of code.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:21 +01:00
Peter Zijlstra	38aa271438	lockdep: add comments to mark_lock_irq() re-add some of the comments that got lost in the refactoring. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:19 +01:00
Peter Zijlstra	cf2ad4d13c	lockdep: remove macro usage from mark_held_locks() Now that we have nice numerical relations for the states, remove the macro magics. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:17 +01:00
Peter Zijlstra	9d3651a23d	lockdep: fully reduce mark_lock_irq() Now what its only two functions, they again look rather similar. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:15 +01:00
Peter Zijlstra	42c50d544e	lockdep: merge the !_READ mark_lock_irq() helpers These two are also remakably similar Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:13 +01:00
Peter Zijlstra	780e820b2d	lockdep: merge the _READ mark_lock_irq() helpers The _READ helpers show remarkable similarity, merge them. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:12 +01:00
Peter Zijlstra	cd95302d25	lockdep: simplify mark_lock_irq() helpers #3 Kill another argument Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:10 +01:00
Peter Zijlstra	f989209e2f	lockdep: further simplify mark_lock_irq() helpers take away another parameter Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:08 +01:00
Peter Zijlstra	604de3b5b6	lockdep: simplify the mark_lock_irq() helpers In order to unify them, take some arguments away Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:06 +01:00
Peter Zijlstra	6a6904d347	lockdep: split up mark_lock_irq() split mark_lock_irq() into 4 simple helper functions Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:04 +01:00
Peter Zijlstra	fabe9c42c6	lockdep: generate usage strings generate the usage strings XXX capital invasion :-( Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:02 +01:00
Peter Zijlstra	d7b1b02134	lockdep: generate the state bit definitions Generate the state bit definitions from the lockdep_states.h file. Also, move LOCK_USED to last, so that the USED_IN USED_IN_READ ENABLED ENABLED_READ states are nicely bit aligned -- we're going to use that property Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:28:01 +01:00
Peter Zijlstra	9851673bc3	lockdep: move state bit definitions around For convenience later. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:59 +01:00
Peter Zijlstra	5346417e17	lockdep: simplify mark_lock() remove the state iteration Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:57 +01:00
Peter Zijlstra	36bfb9bb03	lockdep: simplify mark_held_locks remove the explicit state iteration Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:56 +01:00
Peter Zijlstra	9fe51abf7a	lockdep: lockdep_states.h Introduce a header file to generate all the states from. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:54 +01:00
Peter Zijlstra	a652d7081b	lockdep: sanitize reclaim bit names s/HELD_OVER/ENABLED/g so that its similar to the hard and soft-irq names. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:52 +01:00
Peter Zijlstra	4fc95e867f	lockdep: sanitize bit names s/$LOCKF\?_ENABLED_[^ ]*$S$_READ$\?\>/\1\2/g So that the USED_IN and ENABLED have the same names. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:51 +01:00
Nick Piggin	cf40bd16fd	lockdep: annotate reclaim context (__GFP_NOFS) Here is another version, with the incremental patch rolled up, and added reclaim context annotation to kswapd, and allocation tracing to slab allocators (which may only ever reach the page allocator in rare cases, so it is good to put annotations here too). Haven't tested this version as such, but it should be getting closer to merge worthy ;) -- After noticing some code in mm/filemap.c accidentally perform a __GFP_FS allocation when it should not have been, I thought it might be a good idea to try to catch this kind of thing with lockdep. I coded up a little idea that seems to work. Unfortunately the system has to actually be in __GFP_FS page reclaim, then take the lock, before it will mark it. But at least that might still be some orders of magnitude more common (and more debuggable) than an actual deadlock condition, so we have some improvement I hope (the concept is no less complete than discovery of a lock's interrupt contexts). I guess we could even do the same thing with __GFP_IO (normal reclaim), and even GFP_NOIO locks too... but filesystems will have the most locks and fiddly code paths, so let's start there and see how it goes. It seems to work. I did a quick test. ================================= [ INFO: inconsistent lock state ] 2.6.28-rc6-00007-ged31348-dirty #26 --------------------------------- inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage. modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd] {in-reclaim-W} state was registered at: [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60 [<ffffffff80268f71>] lock_acquire+0x91/0xc0 [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310 [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd] [<ffffffff8020903b>] _stext+0x3b/0x170 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 3929 hardirqs last enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310 hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310 softirqs last enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0 softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0 other info that might help us debug this: 1 lock held by modprobe/8526: #0: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd] stack backtrace: Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26 Call Trace: [<ffffffff80265483>] print_usage_bug+0x193/0x1d0 [<ffffffff80266530>] mark_lock+0xaf0/0xca0 [<ffffffff80266735>] mark_held_locks+0x55/0xc0 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60 [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580 [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd] [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffff8020903b>] _stext+0x3b/0x170 [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10 [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180 [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:49 +01:00
Johannes Berg	6f2b9b9a9d	timer: implement lockdep deadlock detection This modifies the timer code in a way to allow lockdep to detect deadlocks resulting from a lock being taken in the timer function as well as around the del_timer_sync() call. Signed-off-by: Johannes Berg <johannes@sipsolutions.net>	2009-02-14 23:25:52 +01:00
Serge E. Hallyn	fb5ae64fdd	User namespaces: Only put the userns when we unhash the uid uids in namespaces other than init don't get a sysfs entry. For those in the init namespace, while we're waiting to remove the sysfs entry for the uid the uid is still hashed, and alloc_uid() may re-grab that uid without getting a new reference to the user_ns, which we've already put in free_user before scheduling remove_user_sysfs_dir(). Reported-and-tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-13 08:07:40 -08:00
Jason Baron	b5f9fd0f8a	tracing: convert c/p state power tracer to use tracepoints Convert the c/p state "power" tracer to use tracepoints. Avoids a function call when the tracer is disabled. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-13 09:06:18 -05:00
Peter Zijlstra	3997ad317f	timers: more consistently use clock vs timer While reviewing the manpages, I noticed I'd missed some clock vs timer sites. Make sure that all timer functions call cpu_timer_sample_group() and not cpu_clock_sample_group(). This ensures that we enable the process wide timer in time, and therefore pay the O(n) thread group cost from the syscall. Not doing it here, will result in the first jiffy tick after setting the timer doing this, resulting in a very expensive tick (but only once) and a delay in actually starting the timer. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-13 13:04:05 +01:00
Ingo Molnar	8f8573ae9f	Merge branches 'irq/genirq', 'irq/sparseirq' and 'irq/urgent' into irq/core	2009-02-13 11:57:18 +01:00
Johannes Weiner	0e43785c57	irq: use GFP_KERNEL for action allocation in request_irq() request_irq() calls into proc code via __setup_irq() which is not safe in an atomic context, so request_irq() can itself use the more reliable GFP_KERNEL allocation for the action descriptor. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-13 10:52:07 +01:00
Ingo Molnar	d351c8db95	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-13 10:26:45 +01:00
Ingo Molnar	1c511f740f	Merge branches 'tracing/ftrace', 'tracing/ring-buffer', 'tracing/sysprof', 'tracing/urgent' and 'linus' into tracing/core	2009-02-13 10:25:18 +01:00
Ingo Molnar	ab639f3593	Merge branch 'core/percpu' into x86/core	2009-02-13 09:45:09 +01:00
Ingo Molnar	f8a6b2b9ce	Merge branch 'linus' into x86/apic Conflicts: arch/x86/kernel/acpi/boot.c arch/x86/mm/fault.c	2009-02-13 09:44:22 +01:00
Steven Rostedt	45141d4667	ring-buffer: rename label out_unlock to out_reset Impact: clean up While reviewing the ring buffer code, I thougth I saw a bug with if (!__raw_spin_trylock(&cpu_buffer->lock)) goto out_unlock; But I forgot that we use a variable "lock_taken" that is set if the spinlock is taken, and only unlock it if that variable is set. To avoid further confusion from other reviewers, this patch renames the label out_unlock with out_reset, which is the more appropriate name. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-12 13:39:46 -05:00
Ingo Molnar	9f8d979f08	softlockup: move 'one' to the softlockup section in sysctl.c CONFIG_SOFTLOCKUP=y \|\| CONFIG_DETECT_HUNG_TASKS=y is now the only user of the 'one' constant in kernel/sysctl.c. Move it to the softlockup block of constants. This fixes a GCC warning. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-12 17:08:15 +01:00
Ingo Molnar	871cafcc96	Merge branch 'linus' into core/softlockup	2009-02-12 13:08:57 +01:00
Ingo Molnar	a0490fa35d	sched: cpu hotplug fix rq_attach_root() does a kfree() with the runqueue lock held. That's not a very wise move, fix it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-12 11:57:36 +01:00
Li Zefan	cfebe563bd	cgroups: fix lockdep subclasses overflow I enabled all cgroup subsystems when compiling kernel, and then: # mount -t cgroup -o net_cls xxx /mnt # mkdir /mnt/0 This showed up immediately: BUG: MAX_LOCKDEP_SUBCLASSES too low! turning off the locking correctness validator. It's caused by the cgroup hierarchy lock: for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; if (ss->root == root) mutex_lock_nested(&ss->hierarchy_mutex, i); } Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but MAX_LOCKDEP_SUBCLASSES is 8. This patch uses different lockdep keys for different subsystems. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:36 -08:00
Sven Wegener	fc3501d411	mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches We need to pass an unsigned long as the minimum, because it gets casted to an unsigned long in the sysctl handler. If we pass an int, we'll access four more bytes on 64bit arches, resulting in a random minimum value. [rientjes@google.com: fix type of `old_bytes'] Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-11 14:25:35 -08:00
Peter Zijlstra	2fff78c784	futex: fix reference leak Catalin noticed that (`38d47c1b70`: futex: rely on get_user_pages() for shared futexes) caused an mm_struct leak. Some tracing with the function graph tracer quickly pointed out that futex_wait() has exit paths with unbalanced reference counts. This regression was discovered by kmemleak. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 18:24:08 +01:00
Linus Torvalds	6c6f1f0f4d	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: revert recent sync wakeup changes	2009-02-11 08:25:06 -08:00
Linus Torvalds	94dba89533	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: fix TIMER_ABSTIME for process wide cpu timers timers: split process wide cpu clocks/timers, fix x86: clean up hpet timer reinit timers: split process wide cpu clocks/timers, remove spurious warning timers: split process wide cpu clocks/timers signal: re-add dead task accumulation stats. x86: fix hpet timer reinit for x86_64 sched: fix nohz load balancer on cpu offline	2009-02-11 08:24:32 -08:00
Linus Torvalds	9ce04f9238	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ptrace, x86: fix the usage of ptrace_fork() i8327: fix outb() parameter order x86: fix math_emu register frame access x86: math_emu info cleanup x86: include correct %gs in a.out core dump x86, vmi: put a missing paravirt_release_pmd in pgd_dtor x86: find nr_irqs_gsi with mp_ioapic_routing x86: add clflush before monitor for Intel 7400 series x86: disable intel_iommu support by default x86: don't apply __supported_pte_mask to non-present ptes x86: fix grammar in user-visible BIOS warning x86/Kconfig.cpu: make Kconfig help readable in the console x86, 64-bit: print DMI info in the oops trace	2009-02-11 08:23:22 -08:00
Peter Zijlstra	fc631c82e1	sched: revert recent sync wakeup changes Intel reported a 10% regression (mysql+sysbench) on a 16-way machine with these patches: `1596e29`: sched: symmetric sync vs avg_overlap `d942fb6`: sched: fix sync wakeups Revert them. Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Bisected-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 14:43:35 +01:00
Peter Zijlstra	4da94d49b2	timers: fix TIMER_ABSTIME for process wide cpu timers The POSIX timer interface allows for absolute time expiry values through the TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock every time we start it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 14:04:21 +01:00
Peter Zijlstra	3fccfd67df	timers: split process wide cpu clocks/timers, fix To decrease the chance of a missed enable, always enable the timer when we sample it, we'll always disable it when we find that there are no active timers in the jiffy tick. This fixes a flood of warnings reported by Mike Galbraith. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 14:04:19 +01:00
Arnaldo Carvalho de Melo	00f62f614b	ring_buffer: pahole struct ring_buffer While fixing some bugs in pahole (built-in.o files were not being processed due to relocation problems) I found out about these packable structures: $ pahole --packable kernel/trace/ring_buffer.o \| grep ring ring_buffer 72 64 8 ring_buffer_per_cpu 112 104 8 If we take a look at the current layout of struct ring_buffer we can see that we have two 4 bytes holes. $ pahole -C ring_buffer kernel/trace/ring_buffer.o struct ring_buffer { unsigned int pages; /* 0 4 / unsigned int flags; / 4 4 / int cpus; / 8 4 / / XXX 4 bytes hole, try to pack / cpumask_var_t cpumask; / 16 8 / atomic_t record_disabled; / 24 4 / / XXX 4 bytes hole, try to pack / struct mutex mutex; / 32 32 / / --- cacheline 1 boundary (64 bytes) --- / struct ring_buffer_per_cpu * buffers; /* 64 8 / / size: 72, cachelines: 2, members: 7 / / sum members: 64, holes: 2, sum holes: 8 / / last cacheline: 8 bytes / }; So, if I ask pahole to reorganize it: $ pahole -C ring_buffer --reorganize kernel/trace/ring_buffer.o struct ring_buffer { unsigned int pages; / 0 4 / unsigned int flags; / 4 4 / int cpus; / 8 4 / atomic_t record_disabled; / 12 4 / cpumask_var_t cpumask; / 16 8 / struct mutex mutex; / 24 32 / struct ring_buffer_per_cpu * buffers; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / / size: 64, cachelines: 1, members: 7 / }; / saved 8 bytes and 1 cacheline! / We get it using just one 64 bytes cacheline. To see what it did: $ pahole -C ring_buffer --reorganize --show_reorg_steps \ kernel/trace/ring_buffer.o \| grep \/ / Moving 'record_disabled' from after 'cpumask' to after 'cpus' */ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 13:21:40 +01:00
Frederic Weisbecker	b22f485812	tracing/sysprof: add missing tracing_{start,stop}_record_cmdline() Add the missing pair tracing_{start,stop}_record_cmdline() to record well the cmdline associated with pid. Changes in v2: - fix a build error, the sched_switch tracer is needed to record the cmdline. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 12:55:19 +01:00
Frederic Weisbecker	cf2592f59c	softlockup: ensure the task has been switched out once When we check if a task has been switched out since the last scan, we might have a race condition on the following scenario: - the task is freshly created and scheduled - it puts its state to TASK_UNINTERRUPTIBLE and is not yet switched out - check_hung_task() scans this task and will report a false positive because t->nvcsw + t->nivcsw == t->last_switch_count == 0 Add a check for such cases. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mandeep Singh Baines <msb@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 11:04:16 +01:00
Oleg Nesterov	06eb23b1ba	ptrace, x86: fix the usage of ptrace_fork() I noticed by pure accident we have ptrace_fork() and friends. This was added by "x86, bts: add fork and exit handling", commit `bf53de907d`. I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly believe this needs the fix. I think something like this program int main(void) { int pid = fork(); if (!pid) { ptrace(PTRACE_TRACEME, 0, NULL, NULL); kill(getpid(), SIGSTOP); fork(); } else { struct ptrace_bts_config bts = { .flags = PTRACE_BTS_O_ALLOC, .size = 4 * 4096, }; wait(NULL); ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK); ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts)); ptrace(PTRACE_CONT, pid, NULL, NULL); sleep(1); } return 0; } should crash the kernel. If the task is traced by its natural parent ptrace_reparented() returns 0 but we should clear ->btsxxx anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 10:32:46 +01:00
Arjan van de Ven	ad0b0fd554	sched, latencytop: incorporate review feedback from Andrew Morton Andrew had some suggestions for the latencytop file; this patch takes care of most of these: * Add documentation * Turn account_scheduler_latency into an inline function * Don't report negative values to userspace * Make the file operations struct const * Fix a few checkpatch.pl warnings Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 10:18:04 +01:00
Ingo Molnar	f437e8b53e	Merge commit 'v2.6.29-rc4' into sched/core	2009-02-11 10:17:42 +01:00
Hannes Eder	e7669b8e32	tracing: fix sparse warning: attribute function with __acquires/__releases Fix this sparse warning: kernel/trace/trace.c:458:9: warning: context imbalance in 'register_tracer' - unexpected unlock Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 10:15:42 +01:00
Hannes Eder	5e39841c45	tracing: fix sparse warnings: fix (un-)signedness Fix these sparse warnings: kernel/trace/ring_buffer.c:70:37: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:84:39: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:96:43: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2500:40: warning: incorrect type in argument 3 (different signedness) kernel/trace/ring_buffer.c:2505:44: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2507:46: warning: incorrect type in argument 2 (different signedness) kernel/trace/trace.c:2130:40: warning: incorrect type in argument 3 (different signedness) kernel/trace/trace.c:2280:40: warning: incorrect type in argument 3 (different signedness) Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 10:15:42 +01:00
Hannes Eder	4fd2735881	tracing: fix sparse warnings: make symbols static Impact: make global variables and a global function static The function '__trace_userstack' does not seem to have a caller, so it is commented out. Fix this sparse warnings: kernel/trace/trace.c:82:5: warning: symbol 'tracing_disabled' was not declared. Should it be static? kernel/trace/trace.c:600:10: warning: symbol 'trace_record_cmdline_disabled' was not declared. Should it be static? kernel/trace/trace.c:957:6: warning: symbol '__trace_userstack' was not declared. Should it be static? kernel/trace/trace.c:1694:5: warning: symbol 'tracing_release' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-11 10:15:41 +01:00
Ingo Molnar	4040068dce	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-11 10:03:53 +01:00
Wenji Huang	c3706f005c	tracing: fix typos in comments Impact: clean up. Fix typos in the comments. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-10 12:32:35 -05:00
Wenji Huang	810dc73265	tracing: provide correct return value after outputting the event This patch is to make the function return early on failure, and give correct return value on success. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-10 12:32:33 -05:00
Wenji Huang	f54fc98aa6	tracing: remove unneeded variable Impact: clean up. Remove the unnecessary variable ret. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-10 12:32:18 -05:00
Tobias Klauser	4543ae7ce1	tracing: storage class should be before const qualifier The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-10 11:58:45 -05:00
Lai Jiangshan	667d241258	ring_buffer: fix ring_buffer_read_page() Impact: change API and init bpage when copy ring_buffer_read_page()/rb_remove_entries() may be called for a partially consumed page. Add a parameter for rb_remove_entries() and make it update cpu_buffer->entries correctly for partially consumed pages. ring_buffer_read_page() now returns the offset to the next event. Init the bpage's time_stamp when return value is 0. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-10 09:17:37 -05:00
Lai Jiangshan	b85fa01ed9	ring_buffer: fix typing mistake Impact: Fix bug I found several very very curious line. It's so curious that it may be brought by typing mistake. When (cpu_buffer->reader_page == cpu_buffer->commit_page): 1) We haven't copied it for bpage is changed: bpage = cpu_buffer->reader_page->page; memcpy(bpage->data, cpu_buffer->reader_page->page->data + read ... ) 2) We need update cpu_buffer->reader_page->read, but "cpu_buffer->reader_page += read;" is not right. [ This bug was a typo. The commit->reader_page is a page pointer and not an index into the page. The line should have been commit->reader_page->read += read. The other changes by Lai are nice clean ups to the code. - SDR ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-10 09:17:19 -05:00
Ingo Molnar	ae216dd239	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-02-10 13:58:28 +01:00
Ingo Molnar	f9915bfef3	Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core	2009-02-10 13:25:42 +01:00
Hugh Dickins	acd895795d	profiling: fix broken profiling regression Impact: fix broken /proc/profile on UP machines Commit `c309b917ca` "cpumask: convert kernel/profile.c" broke profiling. prof_cpu_mask was previously initialized to CPU_MASK_ALL, but left uninitialized in that commit. We need to copy cpu_possible_mask (cpu_online_mask is not enough). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:50:37 +01:00
Tejun Heo	5d707e9c8e	stackprotector: update make rules Impact: no default -fno-stack-protector if stackp is enabled, cleanup Stackprotector make rules had the following problems. * cc support test and warning are scattered across makefile and kernel/panic.c. * -fno-stack-protector was always added regardless of configuration. Update such that cc support test and warning are contained in makefile and -fno-stack-protector is added iff stackp is turned off. While at it, prepare for 32bit support. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:41:54 +01:00
Tejun Heo	6cd61c0baa	elf: add ELF_CORE_COPY_KERNEL_REGS() ELF core dump is used for both user land core dump and kernel crash dump. Depending on architecture, register might need to be accessed differently for userland and kernel. Allow architectures to define ELF_CORE_COPY_KERNEL_REGS() and use different operation for kernel register dump. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:41:26 +01:00
Steven Rostedt	34cd4998d3	tracing: clean up splice code Ingo Molnar suggested a series of clean ups for the splice code. This patch implements those suggestions. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-09 12:24:58 -05:00
Eduard - Gabriel Munteanu	ff98781bab	tracing: Move pipe waiting code out of tracing_read_pipe(). This moves the pipe waiting code from tracing_read_pipe() into tracing_wait_pipe(), which is useful to implement other fops, like splice_read. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-09 12:24:51 -05:00
Eduard - Gabriel Munteanu	3c56819b14	tracing: splice support for tracing_pipe Added and implemented tracing_pipe_fops->splice_read(). This allows userspace programs to get tracing data more efficiently. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-09 12:24:34 -05:00
Ingo Molnar	249d51b53a	Merge commit 'v2.6.29-rc4' into core/percpu Conflicts: arch/x86/mach-voyager/voyager_smp.c arch/x86/mm/fault.c	2009-02-09 14:58:11 +01:00
Frederic Weisbecker	b91facc367	tracing/function-graph-tracer: handle the leaf functions from trace_pipe When one cats the trace file, the leaf functions are printed without brackets: function(); whereas in the trace_pipe file we'll see the following: function() { } This is because the ring_buffer handling is not the same between those two files. On the trace file, when an entry is printed, the iterator advanced and then we can check the next entry. There is no iterator with trace_pipe, the current entry to print has been peeked and not consumed. So checking the next entry will still return the current one while we don't consume it. This patch introduces a new value for the output callbacks to ask the tracing core to not consume the current entry after printing it. We need it because we will have to consume the current entry ourself to check the next one. Now the trace_pipe is able to handle well the leaf functions. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 12:37:27 +01:00
Ingo Molnar	1dfba05d0f	tracing/blktrace: move the tracing file to kernel/trace, fix Impact: build fix The BLK_DEV_IO_TRACE entry used to be in block/Kconfig - which file itself was dependent on CONFIG_BLOCK. But now the entry is in kernel/trace/Kconfig - which is present even on !CONFIG_BLOCK. So add a 'depends on BLOCK' to BLK_DEV_IO_TRACE. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 12:07:28 +01:00
Mandeep Singh Baines	17406b82d6	softlockup: remove timestamp checking from hung_task Impact: saves sizeof(long) bytes per task_struct By guaranteeing that sysctl_hung_task_timeout_secs have elapsed between tasklist scans we can avoid using timestamps. Signed-off-by: Mandeep Singh Baines <msb@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 11:03:49 +01:00
Arnaldo Carvalho de Melo	b5db03c435	tracing: handle unregistering the current tracer Impact: simplification Instead of requiring that plugins have the sequence: my_tracer_stop(my_trace_array); unregister_tracer(my_tracer); it should be possible just do a: unregister_tracer(my_tracer); Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:56:53 +01:00
Frederic Weisbecker	3861a17bcc	tracing/function-graph-tracer: drop the kernel_text_address check When the function graph tracer picks a return address, it ensures this address is really a kernel text one by calling __kernel_text_address() Actually this path has never been taken.Its role was more likely to debug the tracer on the beginning of its development but this function is wasteful since it is called for every traced function. The fault check is already sufficient. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:51:38 +01:00
Frederic Weisbecker	1292211058	tracing/power: move the power trace headers to a dedicated file Impact: cleanup Move the power tracer headers to trace/power.h to keep ftrace.h and power bits more easy to maintain as separated topics. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:51:38 +01:00
Frederic Weisbecker	7447dce96f	tracing/function-graph-tracer: provide a selftest for the function graph tracer Making it more easy to do a basic regression test for this tracer. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:51:37 +01:00
Frederic Weisbecker	2db270a80b	tracing/blktrace: move the tracing file to kernel/trace Impact: cleanup Move blktrace.c to kernel/trace, also move its config entry. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:51:02 +01:00
Ingo Molnar	44b0635481	Merge branch 'tip/tracing/core/devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace Conflicts: kernel/trace/trace_hw_branches.c	2009-02-09 10:35:12 +01:00
Ingo Molnar	4ad476e11f	Merge commit 'v2.6.29-rc4' into tracing/core	2009-02-09 10:32:48 +01:00
Hannes Eder	548c893380	kernel/irq: fix sparse warning: make symbol static While being at it make every occurrence of 'do_irq_select_affinity' have the same signature in terms of signedness of the first argument. Fix this sparse warning: kernel/irq/manage.c:112:5: warning: symbol 'do_irq_select_affinity' was not declared. Should it be static? Also rename do_irq_select_affinity() to setup_affinity() - shorter name and clearer naming. Signed-off-by: Hannes Eder <hannes@hanneseder.net> Acked-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 10:14:05 +01:00
Yinghai Lu	005bf0e6fa	irq: optimize init_kstat_irqs/init_copy_kstat_irqs Simplify and make init_kstat_irqs etc more type proof, suggested by Andrew. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 09:02:34 +01:00
Yinghai Lu	0f3c2a89c1	irq: clear kstat_irqs Impact: get correct kstat_irqs [/proc/interrupts] for msi/msi-x etc need to call clear_kstat_irqs(), so when we reuse that irq_desc, we get correct kstat in /proc/interrupts. This makes /proc/interrupts not have <NULL> entries. Don't need to worry about arch that doesn't support genirq, because they will not call dynamic_irq_cleanup(). v2: simplify and make clear_kstat_irqs more robust Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-09 08:55:08 +01:00
Ingo Molnar	5ba1ae92b6	Merge branches 'timers/clockevents', 'timers/hpet', 'timers/hrtimers' and 'timers/urgent' into timers/core	2009-02-08 20:14:11 +01:00
Ingo Molnar	140573d33b	Merge branches 'sched/rt' and 'sched/urgent' into sched/core	2009-02-08 20:12:46 +01:00
Stefan Richter	f7de7621f0	async: use list_move_tail list.h provides a dedicated primitive for "list_del followed by list_add_tail"... list_move_tail. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2009-02-08 10:00:26 -08:00
Cornelia Huck	766ccb9ed4	async: Rename _special -> _domain for clarity. Rename the async__special() functions to async__domain(), which describes the purpose of these functions much better. [Broke up long lines to silence checkpatch] Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2009-02-08 09:56:11 -08:00
Cornelia Huck	f30d5b307c	async: Add some documentation. Add some kerneldoc to the async interface. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2009-02-08 09:56:11 -08:00
Cornelia Huck	86532d8b16	async: Handle kthread_run() return codes. If we fail to create the manager thread, fall back to non-fastboot. If we fail to create an async thread, try again after waiting for a bit. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2009-02-08 09:56:10 -08:00
Cornelia Huck	7a89bbc749	async: Fix running list handling. async_schedule() should pass in async_running as the running list, and run_one_entry() should put the entry to be run on the provided running list instead of always on the generic one. Reported-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2009-02-08 09:56:10 -08:00
Wenji Huang	57794a9d48	trace: trivial fixes in comment typos. Impact: clean up Fixed several typos in the comments. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-07 20:03:36 -05:00
Steven Rostedt	a81bd80a0b	ring-buffer: use generic version of in_nmi Impact: clean up Now that a generic in_nmi is available, this patch removes the special code in the ring_buffer and implements the in_nmi generic version instead. With this change, I was also able to rename the "arch_ftrace_nmi_enter" back to "ftrace_nmi_enter" and remove the code from the ring buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-07 20:03:33 -05:00
Steven Rostedt	78d904b46a	ring-buffer: add NMI protection for spinlocks Impact: prevent deadlock in NMI The ring buffers are not yet totally lockless with writing to the buffer. When a writer crosses a page, it grabs a per cpu spinlock to protect against a reader. The spinlocks taken by a writer are not to protect against other writers, since a writer can only write to its own per cpu buffer. The spinlocks protect against readers that can touch any cpu buffer. The writers are made to be reentrant with the spinlocks disabling interrupts. The problem arises when an NMI writes to the buffer, and that write crosses a page boundary. If it grabs a spinlock, it can be racing with another writer (since disabling interrupts does not protect against NMIs) or with a reader on the same CPU. Luckily, most of the users are not reentrant and protects against this issue. But if a user of the ring buffer becomes reentrant (which is what the ring buffers do allow), if the NMI also writes to the ring buffer then we risk the chance of a deadlock. This patch moves the ftrace_nmi_enter called by nmi_enter() to the ring buffer code. It replaces the current ftrace_nmi_enter that is used by arch specific code to arch_ftrace_nmi_enter and updates the Kconfig to handle it. When an NMI is called, it will set a per cpu variable in the ring buffer code and will clear it when the NMI exits. If a write to the ring buffer crosses page boundaries inside an NMI, a trylock is used on the spin lock instead. If the spinlock fails to be acquired, then the entry is discarded. This bug appeared in the ftrace work in the RT tree, where event tracing is reentrant. This workaround solved the deadlocks that appeared there. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-07 20:00:17 -05:00
Steven Rostedt	1830b52d0d	trace: remove deprecated entry->cpu Impact: fix to prevent developers from using entry->cpu With the new ring buffer infrastructure, the cpu for the entry is implicit with which CPU buffer it is on. The original code use to record the current cpu into the generic entry header, which can be retrieved by entry->cpu. When the ring buffer was introduced, the users were convert to use the the cpu number of which cpu ring buffer was in use (this was passed to the tracers by the iterator: iter->cpu). Unfortunately, the cpu item in the entry structure was never removed. This allowed for developers to use it instead of the proper iter->cpu, unknowingly, using an uninitialized variable. This was not the fault of the developers, since it would seem like the logical place to retrieve the cpu identifier. This patch removes the cpu item from the entry structure and fixes all the users that should have been using iter->cpu. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-07 19:38:43 -05:00
Ingo Molnar	673f820591	Merge branch 'linus' into core/locking Conflicts: fs/btrfs/locking.c	2009-02-07 18:31:54 +01:00
Len Brown	2d29c6a075	Merge branches 'release', 'asus', 'bugzilla-12450', 'cpuidle', 'debug', 'ec', 'misc', 'printk' and 'processor' into release	2009-02-07 01:34:56 -05:00
Li Zefan	04ec93fe9b	fork.c: fix NULL pointer dereference when nr_threads == threads-max I happened to forked lots of processes, and hit NULL pointer dereference. It is because in copy_process() after checking max_threads, 0 is returned but not -EAGAIN. The bug is introduced by "CRED: Detach the credentials from task_struct" (commit `f1752eec61`). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-06 08:43:11 -08:00
Arnaldo Carvalho de Melo	b6f11df26f	trace: Call tracing_reset_online_cpus before tracer->init() Impact: cleanup To make it easy for ftrace plugin writers, as this was open coded in the existing plugins Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-06 01:01:41 +01:00
Arnaldo Carvalho de Melo	51a763dd84	tracing: Introduce trace_buffer_{lock_reserve,unlock_commit} Impact: new API These new functions do what previously was being open coded, reducing the number of details ftrace plugin writers have to worry about. It also standardizes the handling of stacktrace, userstacktrace and other trace options we may introduce in the future. With this patch, for instance, the blk tracer (and some others already in the tree) can use the "userstacktrace" /d/tracing/trace_options facility. $ codiff /tmp/vmlinux.before /tmp/vmlinux.after linux-2.6-tip/kernel/trace/trace.c: trace_vprintk \| -5 trace_graph_return \| -22 trace_graph_entry \| -26 trace_function \| -45 __ftrace_trace_stack \| -27 ftrace_trace_userstack \| -29 tracing_sched_switch_trace \| -66 tracing_stop \| +1 trace_seq_to_user \| -1 ftrace_trace_special \| -63 ftrace_special \| +1 tracing_sched_wakeup_trace \| -70 tracing_reset_online_cpus \| -1 13 functions changed, 2 bytes added, 355 bytes removed, diff: -353 linux-2.6-tip/block/blktrace.c: __blk_add_trace \| -58 1 function changed, 58 bytes removed, diff: -58 linux-2.6-tip/kernel/trace/trace.c: trace_buffer_lock_reserve \| +88 trace_buffer_unlock_commit \| +86 2 functions changed, 174 bytes added, diff: +174 /tmp/vmlinux.after: 16 functions changed, 176 bytes added, 413 bytes removed, diff: -237 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-06 01:01:41 +01:00
Arnaldo Carvalho de Melo	0a9877514c	ring_buffer: remove unused flags parameter Impact: API change, cleanup >From ring_buffer_{lock_reserve,unlock_commit}. $ codiff /tmp/vmlinux.before /tmp/vmlinux.after linux-2.6-tip/kernel/trace/trace.c: trace_vprintk \| -14 trace_graph_return \| -14 trace_graph_entry \| -10 trace_function \| -8 __ftrace_trace_stack \| -8 ftrace_trace_userstack \| -8 tracing_sched_switch_trace \| -8 ftrace_trace_special \| -12 tracing_sched_wakeup_trace \| -8 9 functions changed, 90 bytes removed, diff: -90 linux-2.6-tip/block/blktrace.c: __blk_add_trace \| -1 1 function changed, 1 bytes removed, diff: -1 /tmp/vmlinux.after: 10 functions changed, 91 bytes removed, diff: -91 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-06 01:01:40 +01:00

... 6 7 8 9 10 ...

6829 Commits