linux

Author	SHA1	Message	Date
Cyrill Gorcunov	a76bfd0da2	initcalls: Fix m68k build and possible buffer overflow This patch fixes a build bug on m68k - gcc decides to emit a call to the strlen library function, which we don't implement. More importantly - my previous patch "init: don't lose initcall return values" (commit `e662e1cfd4`) had introduced potential buffer overflow by wrong calculation of string accumulator size. Use strlcat() instead, fixing both bugs. Many thanks Andreas Schwab and Geert Uytterhoeven for helping to catch and fix the bug. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-15 18:20:06 -07:00
Linus Torvalds	e0df154f45	Split up 'do_initcalls()' into two simpler functions One function to just loop over the entries, one function to actually do the call and the associated debugging code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-15 18:14:01 -07:00
Linus Torvalds	a442ac512f	Clean up 'print_fn_descriptor_symbol()' types Everybody wants to pass it a function pointer, and in fact, that is what you _must_ pass it for it to make sense (since it knows that ia64 and ppc64 use descriptors for function pointers and fetches the actual address from there). So don't make the argument be a 'unsigned long' and force everybody to add a cast. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-15 17:50:37 -07:00
Cyrill Gorcunov	e662e1cfd4	init: don't lose initcall return values There is an ability to lose an initcall return value if it happened with irq disabled or imbalanced preemption (and if we debug initcall). Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-13 08:02:25 -07:00
Peter Zijlstra	3e51f33fcc	sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK this replaces the rq->clock stuff (and possibly cpu_clock()). - architectures that have an 'imperfect' hardware clock can set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK - the 'jiffie' window might be superfulous when we update tick_gtod before the __update_sched_clock() call in sched_clock_tick() - cpu_clock() might be implemented as: sched_clock_cpu(smp_processor_id()) if the accuracy proves good enough - how far can TSC drift in a single jiffie when considering the filtering and idle hooks? [ mingo@elte.hu: various fixes and cleanups ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-05-05 23:56:18 +02:00
Thomas Gleixner	3ac7fe5a4a	infrastructure to debug (dynamic) objects We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:53 -07:00
Pavel Emelyanov	5cd204550b	Deprecate find_task_by_pid() There are some places that are known to operate on tasks' global pids only: * the rest_init() call (called on boot) * the kgdb's getthread * the create_kthread() (since the kthread is run in init ns) So use the find_task_by_pid_ns(..., &init_pid_ns) there and schedule the find_task_by_pid for removal. [sukadev@us.ibm.com: Fix warning in kernel/pid.c] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:48 -07:00
Oleg Nesterov	fae5fa44f1	signals: fix /sbin/init protection from unwanted signals The global init has a lot of long standing problems with the unhandled fatal signals. - The "is_global_init(current)" check in get_signal_to_deliver() protects only the main thread. Sub-thread can dequee the fatal signal and shutdown the whole thread group except the main thread. If it dequeues SIGSTOP /sbin/init will be stopped, this is not right too. Note that we can't use is_global_init(->group_leader), this breaks exec and this can't solve other problems we have. - Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT on delivery. This breaks exec, has other bad implications, and this is just wrong. Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps to solve some other problems addressed by the subsequent patches. Currently we use this flag for the global init only, but it could also be used by kthreads and (perhaps) by the sub-namespace inits. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:37 -07:00
Akinobu Mita	199f0ca514	idr: create idr_layer_cache at boot time Avoid a possible kmem_cache_create() failure by creating idr_layer_cache unconditionary at boot time rather than creating it on-demand when idr_init() is called the first time. This change also enables us to eliminate the check every time idr_init() is called. [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()] [akpm@linux-foundation.org: fix alpha build] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:25 -07:00
Balbir Singh	cf475ad28a	cgroups: add an owner to the mm_struct Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: David Rientjes <rientjes@google.com>, Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:10 -07:00
Bjorn Helgaas	626adeb667	Simplify initcall_debug output print_fn_descriptor_symbol() prints the address if we don't have a symbol, so no need to print both. Also, combine printing return value with elapsed time. Changes this: Calling initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() initcall 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50() returned 1. initcall 0xc05b7a70 ran for 0 msecs: pci_mmcfg_late_insert_resources+0x0/0x50() initcall at 0xc05b7a70: pci_mmcfg_late_insert_resources+0x0/0x50(): returned with error code 1 to this: calling pci_mmcfg_late_insert_resources+0x0/0x50() initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned 1 after 0 msecs initcall pci_mmcfg_late_insert_resources+0x0/0x50() returned with error code 1 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:02 -07:00
Benjamin Herrenschmidt	839ad62e75	[POWERPC] Use __weak macro for smp_setup_processor_id Use the __weak macro instead of the longer __attribute__ ((weak)) form in one place in init/main.c. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> -- init/main.c \| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:57:33 +10:00
Benjamin Herrenschmidt	8c9843e57a	[POWERPC] Add thread_info_cache_init() weak hook Some architectures need to maintain a kmem cache for thread info structures. The next commit adds that to powerpc to fix an alignment problem. There is no good arch callback to use to initialize that cache that I can find, so this adds a new one in the form of a weak function whose default is empty. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:57:33 +10:00
Mike Travis	e0982e90cd	init: move setup of nr_cpu_ids to as early as possible Move the setting of nr_cpu_ids from sched_init() to start_kernel() so that it's available as early as possible. Note that an arch has the option of setting it even earlier if need be, but it should not result in a different value than the setup_nr_cpu_ids() function. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Mike Travis	321a8e9dcb	cpumask: add CPU_MASK_ALL_PTR macro * Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as a pointer reference to CPU_MASK_ALL. This reduces where possible the instances where CPU_MASK_ALL allocates and fills a large array on the stack. Used only if NR_CPUS > BITS_PER_LONG. * Change init/main.c to use new set_cpus_allowed_ptr(). Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Linus Torvalds	9a9e0d6855	ACPI: Remove ACPI_CUSTOM_DSDT_INITRD option This essentially reverts commit `71fc47a9ad` ("ACPI: basic initramfs DSDT override support"), because the code simply isn't ready. It did ugly things to the init sequence to populate the rootfs image early, but that just ended up showing other problems with the whole approach. The fact is, the VFS layer simply isn't initialized this early, and the relevant ACPI code should either run much later, or this shouldn't be done at all. For 2.6.25, we'll just pick the latter option. We can revisit this concept later if necessary. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Tilman Schmidt <tilman@imap.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Renninger <trenn@suse.de> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Markus Gaugusch <dsdt@gaugusch.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-15 11:58:04 -07:00
Alex Riesen	d9d4fcfe51	Fix "Malformed early option 'loglevel'" Keith Mannthey said: The parameter hotadd_percent is setup right but there is a "Malformed early option 'numa'" message. Rusty Russell said: This happens when the function registered with early_param() returns non-zero. __setup() functions return 1 if OK, module_param() and early_param() return 0 or a -ve error code. For instance: Linux version 2.6.25-rc3-t (raa@steel) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #22 SMP PREEMPT Tue Feb 26 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fff0000 (usable) BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS) BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) Malformed early option 'loglevel' 127MB HIGHMEM available. 896MB LOWMEM available. Command line: BOOT_IMAGE=2.6.25-t ro root=809 ro console=ttyS0,57600n8 console=tty0 loglevel=5 Acked-by: Yinghai Lu <yhlu.kernel@gmai.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Keith Mannthey <kmannth@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:10 -08:00
Thomas Gleixner	a03c2a48e0	x86: DEBUG_PAGEALLOC: enable after mem_init() DEBUG_PAGEALLOC must not be enabled before mem_init(). Before this point there is nothing to allocate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-09 23:24:09 +01:00
Yinghai Lu	f6f21c8146	Convert loglevel-related kernel boot parameters to early_param So we can use them for the early console like console=uart8250 or earlycon=uart8250 or early_printk Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:42 -08:00
Oleg Nesterov	430c623121	start the global /sbin/init with 0,0 special pids As Eric pointed out, there is no problem with init starting with sid == pgid == 0, and this was historical linux behavior changed in 2.6.18. Remove kernel_init()->__set_special_pids(), this is unneeded and complicates the rules for sys_setsid(). This change and the previous change in daemonize() mean that /sbin/init does not need the special "session != 1" hack in sys_setsid() any longer. We can't remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so update the comment only. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Oleg Nesterov	8520d7c7f8	teach set_special_pids() to use struct pid Change set_special_pids() to work with struct pid, not pid_t from global name space. This again speedups and imho cleanups the code, also a preparation for the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:27 -08:00
Markus Gaugusch	71fc47a9ad	ACPI: basic initramfs DSDT override support The basics of DSDT from initramfs. In case this option is selected, populate_rootfs() is called a bit earlier to have the initramfs content available during ACPI initialization. This is a very similar path to the one available at http://gaugusch.at/kernel.shtml but with some update in the documentation, default set to No and the change of populate_rootfs() the "Jeff Mahony way" (which avoids reading the initramfs twice). Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Len Brown <len.brown@intel.com>	2008-02-06 22:07:41 -05:00
Adrian Bunk	a1c9eea9e5	proper prototype for signals_init() Add a proper prototype for signals_init() in include/linux/signal.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Ingo Molnar	12d6f21eac	x86: do not PSE on CONFIG_DEBUG_PAGEALLOC=y get more testing of the c_p_a() code done by not turning off PSE on DEBUG_PAGEALLOC. this simplifies the early pagetable setup code, and tests the largepage-splitup code quite heavily. In the end, all the largepages will be split up pretty quickly, so there's no difference to how DEBUG_PAGEALLOC worked before. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:58 +01:00
Mike Travis	dd5af90a7f	x86/non-x86: percpu, node ids, apic ids x86.git fixup Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:32 +01:00
Andi Kleen	ca74a6f84e	x86: optimize lock prefix switching to run less frequently On VMs implemented using JITs that cache translated code changing the lock prefixes is a quite costly operation that forces the JIT to throw away and retranslate a lot of code. Previously a SMP kernel would rewrite the locks once for each CPU which is quite unnecessary. This patch changes the code to never switch at boot in the normal case (SMP kernel booting with >1 CPU) or only once for SMP kernel on UP. This makes a significant difference in boot up performance on AMD SimNow! Also I expect it to be a little faster on native systems too because a smp switch does a lot of text_poke()s which each synchronize the pipeline. v1->v2: Rename max_cpus v1->v2: Fix off by one in UP check (Thomas Gleixner) Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:17 +01:00
travis@sgi.com	b32ef636a5	percpu: use a kconfig variable to signal arch specific percpu setup The use of the __GENERIC_PERCPU is a bit problematic since arches may want to run their own percpu setup while using the generic percpu definitions. Replace it through a kconfig variable. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:32:51 +01:00
Gautham R Shenoy	d221938c04	cpu-hotplug: refcount based cpu hotplug This patch implements a Refcount + Waitqueue based model for cpu-hotplug. Now, a thread which wants to prevent cpu-hotplug, will bump up a global refcount and the thread which wants to perform a cpu-hotplug operation will block till the global refcount goes to zero. The readers, if any, during an ongoing cpu-hotplug operation are blocked until the cpu-hotplug operation is over. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:01 +01:00
Adrian Bunk	e6fe6649b4	sched: proper prototype for kernel/sched.c:migration_init() This patch adds a proper prototype for migration_init() in include/linux/sched.h Since there's no point in always returning 0 to a caller that doesn't check the return value it also changes the function to return void. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-11-09 22:39:39 +01:00
Simon Arlott	211fee8a82	spelling fixes: init/ Spelling fix in init/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:28:29 +02:00
Robert P. J. Day	d8af7c6ab0	Drop the superfluous test for an old version of gcc. The header file <linux/compiler.h> already enforces a suitably recent version of gcc, so there's no point checking for that again. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:13:32 +02:00
Paul Menage	ddbcc7e8e5	Task Control Groups: basic task cgroup framework Generic Process Control Groups -------------------------- There have recently been various proposals floating around for resource management/accounting and other task grouping subsystems in the kernel, including ResGroups, User BeanCounters, NSProxy cgroups, and others. These all need the basic abstraction of being able to group together multiple processes in an aggregate, in order to track/limit the resources permitted to those processes, or control other behaviour of the processes, and all implement this grouping in different ways. This patchset provides a framework for tracking and grouping processes into arbitrary "cgroups" and assigning arbitrary state to those groupings, in order to control the behaviour of the cgroup as an aggregate. The intention is that the various resource management and virtualization/cgroup efforts can also become task cgroup clients, with the result that: - the userspace APIs are (somewhat) normalised - it's easier to test e.g. the ResGroups CPU controller in conjunction with the BeanCounters memory controller, or use either of them as the resource-control portion of a virtual server system. - the additional kernel footprint of any of the competing resource management systems is substantially reduced, since it doesn't need to provide process grouping/containment, hence improving their chances of getting into the kernel This patch: Add the main task cgroups framework - the cgroup filesystem, and the basic structures for tracking membership and associating subsystem state objects to tasks. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Hugh Dickins	62e6f1e8bb	fix maxcpus=1 oops in show_stat() Alexey Dobriyan reports that maxcpus=1 is still broken in 2.6.23-rc4: if CONFIG_HOTPLUG_CPU is not set, x86_64 bootup oopses in show_stat() - for_each_possible_cpu accesses a per-cpu area which was never set up. Alexey identified commit `61ec7567db` (ACPI: boot correctly with "nosmp" or "maxcpus=0") as the origin; but it's not really to blame, just exposes a bug in 2.6.23-rc1's commit `8b3b295502` (Especially when !CONFIG_HOTPLUG_CPU, avoid needlessy allocating resources for CPUs that can never become available). rc1's test for max_cpus < 2 in start_kernel() wasn't working because max_cpus was still NR_CPUS at that point: until rc4 moved the maxcpus parsing earlier. Now it sets cpu_possible_map to 1 before allocating all possible per-cpu areas; then smp_init() expands cpu_possible_map to cpu_present_map (0xf in my case) later on. rc1's commit has good intentions, but expects cpu_present_map to be limited by maxcpus, which is only the case on i386. cpus_and(possible, possible,present) might be good, but needs an audit of cpu_present_map uses - there may well be assumptions that any cpu present is possible. So stay safe for now and just revert those #ifndef CONFIG_HOTPLUG_CPU optimizations in rc1's commit. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Len Brown <lenb@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-30 21:54:31 -07:00
Hugh Dickins	8134097717	fix maxcpus=N parsing Commit `61ec7567db` ('ACPI: boot correctly with "nosmp" or "maxcpus=0"') broke 'maxcpus=' handling on x86[-64]. maxcpus=N is now having no effect on x86_64, and freezing bootup on i386 (because of inconsistency with the separate maxcpus parsing down in arch/i386, I guess). That's because early_param parsing is a little different from __setup parsing, and needs the "=" omitted: then it seems to work as the original commit intended (no mention of IO-APIC in /proc/interrupts when maxcpus=0). Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Len Brown <len.brown@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-27 10:27:48 -07:00
Len Brown	61ec7567db	ACPI: boot correctly with "nosmp" or "maxcpus=0" In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled. However, in ACPI mode, these parameters didn't completely disable the IO APIC initialization code and boot failed. init/main.c: Disable the IO_APIC if "nosmp" or "maxcpus=0" undefine disable_ioapic_setup() when it doesn't apply. i386: delete ioapic_setup(), it was a duplicate of parse_noapic() delete undefinition of disable_ioapic_setup() x86_64: rename disable_ioapic_setup() to parse_noapic() to match i386 define disable_ioapic_setup() in header to match i386 http://bugzilla.kernel.org/show_bug.cgi?id=1641 Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>	2007-08-21 00:33:35 -04:00
Jan Beulich	8b3b295502	adjust nosmp handling Especially when !CONFIG_HOTPLUG_CPU, avoid needlessy allocating resources for CPUs that can never become available. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Dave Jones	97842216b8	Allow softlockup to be runtime disabled It's useful sometimes to disable the softlockup checker at boottime. Especially if it triggers during a distro install. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:47 -07:00
Yinghai Lu	18a8bd949d	serial: convert early_uart to earlycon for 8250 Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in serial initializing stage. the console_init=>serial8250_console_init=> register_console=>serial8250_console_setup will return -ENDEV, and console ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in drivers/serial/serial_core.c to call register_console to get console ttyS0. that is too late. Make early_uart to use early_param, so uart console can be used earlier. Make it to be bootconsole with CON_BOOT flag, so can use console handover feature. and it will switch to corresponding normal serial console automatically. new command line will be: console=uart8250,io,0x3f8,9600n8 console=uart8250,mmio,0xff5e0000,115200n8 or earlycon=uart8250,io,0x3f8,9600n8 earlycon=uart8250,mmio,0xff5e0000,115200n8 it will print in very early stage: Early serial console at I/O port 0x3f8 (options '9600n8') console [uart0] enabled later for console it will print: console handover: boot [uart0] -> real [ttyS0] Signed-off-by: <yinghai.lu@sun.com> Cc: Andi Kleen <ak@suse.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:35 -07:00
Ingo Molnar	1df21055e3	sched: add init_idle_bootup_task() add the init_idle_bootup_task() callback to the bootup thread, unused at the moment. (CFS will use it to switch the scheduling class of the boot thread to the idle class) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:51:58 +02:00
Sam Ravnborg	92080309df	init/main: use __init_refok to fix section mismatch Kill a special case in modpost by introducing the __init_refok marker. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2007-05-19 09:11:58 +02:00
Sukadev Bhattiprolu	0e29b24aa6	Explicitly set pgid and sid of init process Explicitly set pgid and sid of init process to 1. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: <containers@lists.osdl.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:35 -07:00
Eric W. Biederman	73c279927f	kthread: don't depend on work queues Currently there is a circular reference between work queue initialization and kthread initialization. This prevents the kthread infrastructure from initializing until after work queues have been initialized. We want the properties of tasks created with kthread_create to be as close as possible to the init_task and to not be contaminated by user processes. The later we start our kthreadd that creates these tasks the harder it is to avoid contamination from user processes and the more of a mess we have to clean up because the defaults have changed on us. So this patch modifies the kthread support to not use work queues but to instead use a simple list of structures, and to have kthreadd start from init_task immediately after our kernel thread that execs /sbin/init. By being a true child of init_task we only have to change those process settings that we want to have different from init_task, such as our process name, the cpus that are allowed, blocking all signals and setting SIGCHLD to SIG_IGN so that all of our children are reaped automatically. By being a true child of init_task we also naturally get our ppid set to 0 and do not wind up as a child of PID == 1. Ensuring that tasks generated by kthread_create will not slow down the functioning of the wait family of functions. [akpm@linux-foundation.org: use interruptible sleeps] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Ingo Molnar	8f0c45cdf8	enhance initcall_debug, measure latency enhance the initcall_debug boot option: - measure the time the initcall took to execute and report it in units of milliseconds. - show the return code of initcalls (useful to see failures and to make sure that an initcall hung) [akpm@linux-foundation.org: fix printk warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Adrian Bunk	46595390e9	init/do_mounts.c: proper prepare_namespace() prototype Add a proper protype for prepare_namespace() in include/linux/init.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Christoph Lameter	476f35348e	Safer nr_node_ids and nr_node_ids determination and initial values The nr_cpu_ids value is currently only calculated in smp_init. However, it may be needed before (SLUB needs it on kmem_cache_init!) and other kernel components may also want to allocate dynamically sized per cpu array before smp_init. So move the determination of possible cpus into sched_init() where we already loop over all possible cpus early in boot. Also initialize both nr_node_ids and nr_cpu_ids with the highest value they could take. If we have accidental users before these values are determined then the current valud of 0 may cause too small per cpu and per node arrays to be allocated. If it is set to the maximum possible then we only waste some memory for early boot users. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Linus Torvalds	15700770ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits) kconfig: fix mconf segmentation fault kbuild: enable use of code from a different dir kconfig: error out if recursive dependencies are found kbuild: scripts/basic/fixdep segfault on pathological string-o-death kconfig: correct minor typo in Kconfig warning message. kconfig: fix path to modules.txt in Kconfig help usr/Kconfig: fix typo kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs' kbuild: be more explicit on missing .config file kbuild: clarify the creation of the LOCALVERSION_AUTO string. kbuild: propagate errors from find in scripts/gen_initramfs_list.sh kconfig: refer to qt3 if we cannot find qt libraries kbuild: handle compressed cpio initramfs-es kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text kbuild: remove stale comment in modpost.c kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE kbuild: fix make mrproper for Documentation/DocBook/man kbuild: remove kconfig binaries during make mrproper kconfig/menuconfig: do not hardcode '.config' kbuild: override build timestamp & version ...	2007-05-06 13:21:57 -07:00
Sam Ravnborg	aae5f662a3	kbuild: whitelist section mismatch in init/main.c In init/main.c we have a reference from rest_init() to .init.text which is intentional. Rename the function 'init' to 'kernel_init' to make it a kernel wide unique symbol and whitelist the reference. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2007-05-02 20:58:07 +02:00
Jeremy Fitzhardinge	b6e3590f81	[PATCH] x86: Allow percpu variables to be page-aligned Let's allow page-alignment in general for per-cpu data (wanted by Xen, and Ingo suggested KVM as well). Because larger alignments can use more room, we increase the max per-cpu memory to 64k rather than 32k: it's getting a little tight. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:12 +02:00
Christoph Lameter	53b8a315b7	[PATCH] Convert highest_possible_processor_id to nr_cpu_ids We frequently need the maximum number of possible processors in order to allocate arrays for all processors. So far this was done using highest_possible_processor_id(). However, we do need the number of processors not the highest id. Moreover the number was so far dynamically calculated on each invokation. The number of possible processors does not change when the system is running. We can therefore calculate that number once. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Frederik Deweerdt <frederik.deweerdt@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:13 -08:00
Andrew Morton	6168a702ab	[PATCH] Declare init_irq_proc before we use it. powerpc gets: init/main.c: In function `do_basic_setup': init/main.c:714: warning: implicit declaration of function `init_irq_proc' but we cannot include linux/irq.h in generic code. Fix it by moving the declaration into linux/interrupt.h instead. And make sure all code that defines init_irq_proc() is including linux/interrupt.h. And nuke an ifdef-in-C Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-19 14:21:50 -08:00

1 2 3

114 Commits