linux

Author	SHA1	Message	Date
Alexey Dobriyan	8948e11f45	Allow access to /proc/$PID/fd after setuid() /proc/$PID/fd has r-x------ permissions, so if process does setuid(), it will not be able to access /proc/*/fd/. This breaks fstatat() emulation in glibc. open("foo", O_RDONLY\|O_DIRECTORY) = 4 setuid32(65534) = 0 stat64("/proc/self/fd/4/bar", 0xbfafb298) = -1 EACCES (Permission denied) Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: Kirill Korotaev <dev@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
David Rientjes	b813e931b4	smaps: add clear_refs file to clear reference Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() is called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task. It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with echo 1 > /proc/pid/clear_refs and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference. For example, using an efficient Mozilla: accumulated time referenced memory ---------------- ----------------- 0 s 408 kB 1 s 408 kB 2 s 556 kB 3 s 1028 kB 4 s 872 kB 5 s 1956 kB 6 s 416 kB 7 s 1560 kB 8 s 2336 kB 9 s 1044 kB 10 s 416 kB This is a valuable tool to get an approximate measurement of the memory footprint for a task. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> [akpm@linux-foundation.org: build fixes] [mpm@selenic.com: rename for_each_pmd] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Al Viro	04ff97086b	[PATCH] sanitize security_getprocattr() API have it return the buffer it had allocated Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-14 15:27:48 -07:00
Glauber de Oliveira Costa	63967fa911	[PATCH] Missing __user in pointer referenced within copy_from_user Pointers to user data should be marked with a __user hint. This one is missing. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:15 -08:00
Arjan van de Ven	c5ef1c42c5	[PATCH] mark struct inode_operations const 3 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	00977a59b9	[PATCH] mark struct file_operations const 6 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:45 -08:00
Alexey Dobriyan	4b98d11b40	[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct They are fat: 4x8 bytes in task_struct. They are uncoditionally updated in every fork, read, write and sendfile. They are used only if you have some "extended acct fields feature". And please, please, please, read(2) knows about bytes, not characters, why it is called "rchar"? Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Guillaume Chazarain	7d8952440f	[PATCH] procfs: Fix listing of /proc/NOT_A_TGID/task Listing /proc/PID/task were PID is not a TGID should not result in duplicated entries. [g ~]$ pidof thunderbird-bin 2751 [g ~]$ ls /proc/2751/task 2751 2770 2771 2824 2826 2834 2835 2851 2853 [g ~]$ ls /proc/2770/task 2751 2770 2771 2824 2826 2834 2835 2851 2853 2770 2771 2824 2826 2834 2835 2851 2853 [g ~]$ Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-01 16:22:41 -08:00
Alexey Dobriyan	863c47028e	[PATCH] Fix NULL ->nsproxy dereference in /proc//mounts /proc//mounstats was fixed, all right, but... To reproduce: while true; do find /proc -type f 2>/dev/null \| xargs cat 1>/dev/null 2>/dev/null; done BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000c printing eip: c01754df *pde = 00000000 Oops: 0000 [#28] Modules linked in: af_packet ohci_hcd e1000 ehci_hcd uhci_hcd usbcore xfs CPU: 0 EIP: 0060:[<c01754df>] Not tainted VLI EFLAGS: 00010286 (2.6.20-rc5 #1) EIP is at mounts_open+0x1c/0xac eax: 00000000 ebx: d5898ac0 ecx: d1d27b18 edx: d1d27a50 esi: e6083e10 edi: d3c87f38 ebp: d5898ac0 esp: d3c87ef0 ds: 007b es: 007b ss: 0068 Process cat (pid: 18071, ti=d3c86000 task=f7d5f070 task.ti=d3c86000) Stack: d5898ac0 e6083e10 d3c87f38 c01754c3 c0147c91 c18c52c0 d343f314 d5898ac0 00008000 d3c87f38 ffffff9c c0147e09 d5898ac0 00000000 00000000 c0147e4b 00000000 d3c87f38 d343f314 c18c52c0 c015e53e 00001000 08051000 00000101 Call Trace: [<c01754c3>] mounts_open+0x0/0xac [<c0147c91>] __dentry_open+0xa1/0x18c [<c0147e09>] nameidata_to_filp+0x31/0x3a [<c0147e4b>] do_filp_open+0x39/0x40 [<c015e53e>] seq_read+0x128/0x2aa [<c0147e8c>] do_sys_open+0x3a/0x6d [<c0147efa>] sys_open+0x1c/0x20 [<c0102b76>] sysenter_past_esp+0x5f/0x85 [<c02a0033>] unix_stream_recvmsg+0x3bf/0x4bf ======================= Code: 5d c3 89 d8 e8 06 e0 f9 ff eb bd 0f 0b eb fe 55 57 56 53 89 d5 8b 40 f0 31 d2 e8 02 c1 fa ff 89 c2 85 c0 74 5c 8b 80 48 04 00 00 <8b> 58 0c 85 db 74 02 ff 03 ff 4a 08 0f 94 c0 84 c0 75 74 85 db EIP: [<c01754df>] mounts_open+0x1c/0xac SS:ESP 0068:d3c87ef0 A race with do_exit()'s call to exit_namespaces(). Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-26 13:50:58 -08:00
Andrew Morton	aba76fdb8a	[PATCH] io-accounting: report in procfs Add a simple /proc/pid/io to show the IO accounting fields. Maybe this shouldn't be merged in mainline - the preferred reporting channel is taskstats. But given the poor state of our userspace support for taskstats, this is useful for developer-testing, at least. And it improves the changes that the procps developers will wire it up into top(1). Opinions are sought. The patch also wires up the existing IO-accounting fields. It's a bit racy on 32-bit machines: if process A reads process B's /proc/pid/io while process B is updating one of those 64-bit counters, process A could see an intermediate result. Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00
Akinobu Mita	f4f154fd92	[PATCH] fault injection: process filtering for fault-injection capabilities This patch provides process filtering feature. The process filter allows failing only permitted processes by /proc/<pid>/make-it-fail Please see the example that demostrates how to inject slab allocation failures into module init/cleanup code in Documentation/fault-injection/fault-injection.txt Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:29:02 -08:00
Kirill Korotaev	6b3286ed11	[PATCH] rename struct namespace to struct mnt_namespace Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with other namespaces being developped for the containers : pid, uts, ipc, etc. 'namespace' variables and attributes are also renamed to 'mnt_ns' Signed-off-by: Kirill Korotaev <dev@sw.ru> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:51 -08:00
Josef "Jeff" Sipek	2fddfeefee	[PATCH] proc: change uses of f_{dentry, vfsmnt} to use f_path Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the proc filesystem code. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:41 -08:00
Adrian Bunk	9711ef9945	[PATCH] make fs/proc/base.c:proc_pid_instantiate() static Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:40 -08:00
Guillem Jover	8fb4fc68ca	[PATCH] Allow user processes to raise their oom_adj value Currently a user process cannot rise its own oom_adj value (i.e. unprotecting itself from the OOM killer). As this value is stored in the task structure it gets inherited and the unprivileged childs will be unable to rise it. The EPERM will be handled by the generic proc fs layer, as only processes with the proper caps or the owner of the process will be able to write to the file. So we allow only the processes with CAP_SYS_RESOURCE to lower the value, otherwise it will get an EACCES which seems more appropriate than EPERM. Signed-off-by: Guillem Jover <guillem.jover@nokia.com> Acked-by: Andrea Arcangeli <andrea@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Vasily Tarasov	701e054e0c	[PATCH] mounstats NULL pointer dereference OpenVZ developers team has encountered the following problem in 2.6.19-rc6 kernel. After some seconds of running script while [[ 1 ]] do find /proc -name mountstats \| xargs cat done this Oops appears: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010 printing eip: c01a6b70 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables parport_pc lp parport sunrpc af_packet thermal processor fan button battery asus_acpi ac ohci_hcd ehci_hcd usbcore i2c_nforce2 i2c_core tg3 floppy pata_amd ide_cd cdrom sata_nv libata CPU: 1 EIP: 0060:[<c01a6b70>] Not tainted VLI EFLAGS: 00010246 (2.6.19-rc6 #2) EIP is at mountstats_open+0x70/0xf0 eax: 00000000 ebx: e6247030 ecx: e62470f8 edx: 00000000 esi: 00000000 edi: c01a6b00 ebp: c33b83c0 esp: f4105eb4 ds: 007b es: 007b ss: 0068 Process cat (pid: 6044, ti=f4105000 task=f4104a70 task.ti=f4105000) Stack: c33b83c0 c04ee940 f46a4a80 c33b83c0 e4df31b4 c01a6b00 f4105000 c0169231 e4df31b4 c33b83c0 c33b83c0 f4105f20 00000003 f4105000 c0169445 f2503cf0 f7f8c4c0 00008000 c33b83c0 00000000 00008000 c0169350 f4105f20 00008000 Call Trace: [<c01a6b00>] mountstats_open+0x0/0xf0 [<c0169231>] __dentry_open+0x181/0x250 [<c0169445>] nameidata_to_filp+0x35/0x50 [<c0169350>] do_filp_open+0x50/0x60 [<c01873d6>] seq_read+0xc6/0x300 [<c0169511>] get_unused_fd+0x31/0xc0 [<c01696d3>] do_sys_open+0x63/0x110 [<c01697a7>] sys_open+0x27/0x30 [<c01030bd>] sysenter_past_esp+0x56/0x79 ======================= Code: 45 74 8b 54 24 20 89 44 24 08 8b 42 f0 31 d2 e8 47 cb f8 ff 85 c0 89 c3 74 51 8d 80 a0 04 00 00 e8 46 06 2c 00 8b 83 48 04 00 00 <8b> 78 10 85 ff 74 03 f0 ff 07 b0 01 86 83 a0 04 00 00 f0 ff 4b EIP: [<c01a6b70>] mountstats_open+0x70/0xf0 SS:ESP 0068:f4105eb4 The problem is that task->nsproxy can be equal NULL for some time during task exit. This patch fixes the BUG. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-25 13:28:33 -08:00
Alexey Dobriyan	8ac773b4f7	[PATCH] OOM killer meets userspace headers Despite mm.h is not being exported header, it does contain one thing which is part of userspace ABI -- value disabling OOM killer for given process. So, a) create and export include/linux/oom.h b) move OOM_DISABLE define there. c) turn bounding values of /proc/$PID/oom_adj into defines and export them too. Note: mass __KERNEL__ removal will be done later. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:38 -07:00
Andrew Morton	0187f879ee	[PATCH] PROC_NUMBUF is wrong Actually, the decimal representation of a 32-bit signed number can take 12 bytes, including the \0. And then some code adds a \n as well, so let's give it 13 bytes. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:43 -07:00
Oleg Nesterov	1a657f78dc	[PATCH] introduce get_task_pid() to fix unsafe get_pid() proc_pid_make_inode: ei->pid = get_pid(task_pid(task)); I think this is not safe. get_pid() can be preempted after checking "pid != NULL". Then the task exits, does detach_pid(), and RCU frees the pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Eric W. Biederman	1c0d04c9e4	[PATCH] proc: comment what proc_fill_cache does Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Eric W. Biederman	5e61feafa2	[PATCH] proc: remove the useless SMP-safe comments from /proc Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Eric W. Biederman	7bcd6b0efd	[PATCH] proc: remove trailing blank entry from pid_entry arrays It was pointed out that since I am taking ARRAY_SIZE anyway the trailing empty entry is silly and just wastes space. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Eric W. Biederman	8e95bd936d	[PATCH] proc: properly compute TGID_OFFSET The value doesn't change but this ensures I will have the proper value when other files are added to proc_base_stuff. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:24 -07:00
Eric W. Biederman	7fbaac005c	[PATCH] proc: Use pid_task instead of open coding it Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:24 -07:00
Eric W. Biederman	72d9dcfc7a	[PATCH] proc: Merge proc_tid_attr and proc_tgid_attr The implementation is exactly the same and there is currently nothing to distinguish proc_tid_attr, and proc_tgid_attr. So it is pointless to have two separate implementations. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:24 -07:00
Eric W. Biederman	61a2878402	[PATCH] proc: Remove the hard coded inode numbers The hard coded inode numbers in proc currently limit its maintainability, its flexibility, and what can be done with the rest of system. /proc limits pid-max to 32768 on 32 bit systems it limits fd-max to 32768 on all systems, and placing the pid in the inode number really gets in the way of implementing subdirectories of per process information. Ever since people started adding to the middle of the file type enumeration we haven't been maintaing the historical inode numbers, all we have really succeeded in doing is keeping the pid in the proc inode number. The pid is already available in the directory name so no information is lost removing it from the inode number. So if something in user space cares if we remove the inode number from the /proc inode it is almost certainly broken. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:24 -07:00
Eric W. Biederman	444ceed8d1	[PATCH] proc: Factor out an instantiate method from every lookup method To remove the hard coded proc inode numbers it is necessary to be able to create the proc inodes during readdir. The instantiate methods are the subset of lookup that is needed to accomplish that. This first step just splits the lookup methods into 2 functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:24 -07:00
Eric W. Biederman	801199ce80	[PATCH] proc: Make the generation of the self symlink table driven This patch generalizes the concept of files in /proc that are related to processes but live in the root directory of /proc Ideally this would reuse infrastructure from the rest of the process specific parts of proc but unfortunately security_task_to_inode must not be called on files that are not strictly per process. security_task_to_inode really needs to be reexamined as the security label can change in important places that we are not currently catching, but I'm not certain that simplifies this problem. By at least matching the structure of the rest of proc we get more idiom reuse and it becomes easier to spot problems in the way things are put together. Later things like /proc/mounts are likely to be moved into proc_base as well. If union mounts are ever supported we may be able to make /proc a union mount, and properly split it into 2 filesystems. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:24 -07:00
Serge E. Hallyn	1651e14e28	[PATCH] namespaces: incorporate fs namespace into nsproxy This moves the mount namespace into the nsproxy. The mount namespace count now refers to the number of nsproxies point to it, rather than the number of tasks. As a result, the unshare_namespace() function in kernel/fork.c no longer checks whether it is being shared. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Eric W. Biederman	20cdc894c4	[PATCH] proc: modify proc_pident_lookup to be completely table driven Currently proc_pident_lookup gets the names and types from a table and then has a huge switch statement to get the inode and file operations it needs. That is silly and is becoming increasingly hard to maintain so I just put all of the information in the table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:13 -07:00
Eric W. Biederman	28a6d67179	[PATCH] proc: reorder the functions in base.c There were enough changes in my last round of cleaning up proc I had to break up the patch series into smaller chunks, and my last chunk never got resent. This patchset gives proc dynamic inode numbers (the static inode numbers were a pain to maintain and prevent all kinds of things), and removes the horrible switch statements that had to be kept in sync with everything else. Being fully table driver takes us 90% of the way of being able to register new process specific attributes in proc. This patch: Group the functions by what they implement instead of by type of operation. As it existed base.c was quickly approaching the point where it could not be followed. No functionality or code changes asside from adding/removing forward declartions are implemented in this patch. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:13 -07:00
Eric W. Biederman	0804ef4b0d	[PATCH] proc: readdir race fix (take 3) The problem: An opendir, readdir, closedir sequence can fail to report process ids that are continually in use throughout the sequence of system calls. For this race to trigger the process that proc_pid_readdir stops at must exit before readdir is called again. This can cause ps to fail to report processes, and it is in violation of posix guarantees and normal application expectations with respect to readdir. Currently there is no way to work around this problem in user space short of providing a gargantuan buffer to user space so the directory read all happens in on system call. This patch implements the normal directory semantics for proc, that guarantee that a directory entry that is neither created nor destroyed while reading the directory entry will be returned. For directory that are either created or destroyed during the readdir you may or may not see them. Furthermore you may seek to a directory offset you have previously seen. These are the guarantee that ext[23] provides and that posix requires, and more importantly that user space expects. Plus it is a simple semantic to implement reliable service. It is just a matter of calling readdir a second time if you are wondering if something new has show up. These better semantics are implemented by scanning through the pids in numerical order and by making the file offset a pid plus a fixed offset. The pid scan happens on the pid bitmap, which when you look at it is remarkably efficient for a brute force algorithm. Given that a typical cache line is 64 bytes and thus covers space for 64*8 == 200 pids. There are only 40 cache lines for the entire 32K pid space. A typical system will have 100 pids or more so this is actually fewer cache lines we have to look at to scan a linked list, and the worst case of having to scan the entire pid bitmap is pretty reasonable. If we need something more efficient we can go to a more efficient data structure for indexing the pids, but for now what we have should be sufficient. In addition this takes no additional locks and is actually less code than what we are doing now. Also another very subtle bug in this area has been fixed. It is possible to catch a task in the middle of de_thread where a thread is assuming the thread of it's thread group leader. This patch carefully handles that case so if we hit it we don't fail to return the pid, that is undergoing the de_thread dance. Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for providing the first fix, pointing this out and working on it. [oleg@tv-sign.ru: fix it] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:12 -07:00
Frederik Deweerdt	f7ca54f486	[PATCH] fix mem_write() return value At the beginning of the routine, "copied" is set to 0, but it is no good because in lines 805 and 812 it is set to other values. Finally, the routine returns as if it copied 12 (=ENOMEM) bytes less than it actually did. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:19 -07:00
Linus Torvalds	6d76fa58b0	Don't allow chmod() on the /proc/<pid>/ files This just turns off chmod() on the /proc/<pid>/ files, since there is no good reason to allow it, and had we disallowed it originally, the nasty /proc race exploit wouldn't have been possible. The other patches already fixed the problem chmod() could cause, so this is really just some final mop-up.. This particular version is based off a patch by Eugene and Marcel which had much better naming than my original equivalent one. Signed-off-by: Eugene Teo <eteo@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-15 12:26:45 -07:00
Linus Torvalds	9ee8ab9fbf	Relax /proc fix a bit Clearign all of i_mode was a bit draconian. We only really care about S_ISUID/ISGID, after all. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 21:48:03 -07:00
Linus Torvalds	18b0bbd8ca	Fix nasty /proc vulnerability We have a bad interaction with both the kernel and user space being able to change some of the /proc file status. This fixes the most obvious part of it, but I expect we'll also make it harder for users to modify even their "own" files in /proc. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 16:51:34 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Eric Paris	42c3e03ef6	[PATCH] SELinux: Add sockcreate node to procattr API Below is a patch to add a new /proc/self/attr/sockcreate A process may write a context into this interface and all subsequent sockets created will be labeled with that context. This is the same idea as the fscreate interface where a process can specify the label of a file about to be created. At this time one envisioned user of this will be xinetd. It will be able to better label sockets for the actual services. At this time all sockets take the label of the creating process, so all xinitd sockets would just be labeled the same. I tested this by creating a tcp sender and listener. The sender was able to write to this new proc file and then create sockets with the specified label. I am able to be sure the new label was used since the avc denial messages kicked out by the kernel included both the new security permission setsockcreate and all the socket denials were for the new label, not the label of the running process. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Oleg Nesterov	c1df7fb88a	[PATCH] cleanup next_tid() Try to make next_tid() a bit more readable and deletes unnecessary "pid_alive(pos)" check. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Oleg Nesterov	a872ff0cb2	[PATCH] simplify/fix first_tid() first_tid: /* If nr exceeds the number of threads there is nothing todo */ if (nr) { if (nr >= get_nr_threads(leader)) goto done; } This is not reliable: sub-threads can exit after this check, so the 'for' loop below can overlap and proc_task_readdir() can return an already filldir'ed dirents. for (; pos && pid_alive(pos); pos = next_thread(pos)) { if (--nr > 0) continue; Off-by-one error, will return 'leader' when nr == 1. This patch tries to fix these problems and simplify the code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Eric W. Biederman	cc288738c9	[PATCH] proc: Remove tasklist_lock from proc_task_readdir. This is just like my previous removal of tasklist_lock from first_tgid, and next_tgid. It simply had to wait until it was rcu safe to walk the thread list. This should be the last instance of the tasklist_lock in proc. So user processes should not be able to influence the tasklist lock hold times. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Eric W. Biederman	df26c40e56	[PATCH] proc: Cleanup proc_fd_access_allowed In process of getting proc_fd_access_allowed to work it has developed a few warts. In particular the special case that always allows introspection and the special case to allow inspection of kernel threads. The special case for introspection is needed for /proc/self/mem. The special case for kernel threads really should be overridable by security modules. So consolidate these checks into ptrace.c:may_attach(). The check to always allow introspection is trivial. The check to allow access to kernel threads, and zombies is a little trickier. mem_read and mem_write already verify an mm exists so it isn't needed twice. proc_fd_access_allowed only doesn't want a check to verify task->mm exits, s it prevents all access to kernel threads. So just move the task->mm check into ptrace_attach where it is needed for practical reasons. I did a quick audit and none of the security modules in the kernel seem to care if they are passed a task without an mm into security_ptrace. So the above move should be safe and it allows security modules to come up with more restrictive policy. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Eric W. Biederman	778c114477	[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks Since 2.2 we have been doing a chroot check to see if it is appropriate to return a read or follow one of these magic symlinks. The chroot check was asking a question about the visibility of files to the calling process and it was actually checking the destination process, and not the files themselves. That test was clearly bogus. In my first pass through I simply fixed the test to check the visibility of the files themselves. That naive approach to fixing the permissions was too strict and resulted in cases where a task could not even see all of it's file descriptors. What has disturbed me about relaxing this check is that file descriptors are per-process private things, and they are occasionaly used a user space capability tokens. Looking a little farther into the symlink path on /proc I did find userid checks and a check for capability (CAP_DAC_OVERRIDE) so there were permissions checking this. But I was still concerned about privacy. Besides /proc there is only one other way to find out this kind of information, and that is ptrace. ptrace has been around for a long time and it has a well established security model. So after thinking about it I finally realized that the permission checks that make sense are the permission checks applied to ptrace_attach. The checks are simple per process, and won't cause nasty surprises for people coming from less capable unices. Unfortunately there is one case that the current ptrace_attach test does not cover: Zombies and kernel threads. Single stepping those kinds of processes is impossible. Being able to see which file descriptors are open on these tasks is important to lsof, fuser and friends. So for these special processes I made the rule you can't find out unless you have CAP_SYS_PTRACE. These proc permission checks should now conform to the principle of least surprise. As well as using much less code to implement :) Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Eric W. Biederman	5b0c1dd38b	[PATCH] proc: optimize proc_check_dentry_visible The code doesn't need to sleep to when making this check so I can just do the comparison and not worry about the reference counts. TODO: While looking at this I realized that my original cleanup did not push the permission check far enough down into the stack. The call of proc_check_dentry_visible needs to move out of the generic proc readlink/follow link code and into the individual get_link instances. Otherwise the shared resources checks are not quite correct (shared files_struct does not require a shared fs_struct), and there are races with unshare. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Eric W. Biederman	13b41b0949	[PATCH] proc: Use struct pid not struct task_ref Incrementally update my proc-dont-lock-task_structs-indefinitely patches so that they work with struct pid instead of struct task_ref. Mostly this is a straight 1-1 substitution. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:26 -07:00
Eric W. Biederman	99f8955183	[PATCH] proc: don't lock task_structs indefinitely Every inode in /proc holds a reference to a struct task_struct. If a directory or file is opened and remains open after the the task exits this pinning continues. With 8K stacks on a 32bit machine the amount pinned per file descriptor is about 10K. Normally I would figure a reasonable per user process limit is about 100 processes. With 80 processes, with a 1000 file descriptors each I can trigger the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless data. This patch replaces the struct task_struct pointer with a pointer to a struct task_ref which has a struct task_struct pointer. The so the pinning of dead tasks does not happen. The code now has to contend with the fact that the task may now exit at any time. Which is a little but not muh more complicated. With this change it takes about 1000 processes each opening up 1000 file descriptors before I can trigger the OOM killer. Much better. [mlp@google.com: task_mmu small fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Paul Jackson <pj@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Albert Cahalan <acahalan@gmail.com> Signed-off-by: Prasanna Meda <mlp@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:25 -07:00
Eric W. Biederman	8578cea750	[PATCH] proc: make PROC_NUMBUF the buffer size for holding integers as strings Currently in /proc at several different places we define buffers to hold a process id, or a file descriptor . In most of them we use either a hard coded number or a different define. Modify them all to use PROC_NUMBUF, so the code has a chance of being maintained. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:25 -07:00
Eric W. Biederman	9cc8cbc7f8	[PATCH] simply fix first_tgid Like the bug Oleg spotted in first_tid there was also a small off by one error in first_tgid, when a seek was done on the /proc directory. This fixes that and changes the code structure to make it a little more obvious what is going on. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:25 -07:00
Eric W. Biederman	de7587343b	[PATCH] proc: Remove tasklist_lock from proc_pid_lookup() and proc_task_lookup() Since we no longer need the tasklist_lock for get_task_struct the lookup methods no longer need the tasklist_lock. This just depends on my previous patch that makes get_task_struct() rcu safe. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:25 -07:00
Eric W. Biederman	454cc105ef	[PATCH] proc: Remove tasklist_lock from proc_pid_readdir We don't need the tasklist_lock to safely iterate through processes anymore. This depends on my previous to task patches that make get_task_struct rcu safe, and that make next_task() rcu safe. I haven't gotten first_tid/next_tid yet only because next_thread is missing an rcu_dereference. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:25 -07:00

1 2

90 Commits