1
Commit Graph

234310 Commits

Author SHA1 Message Date
Linus Torvalds
b80cd62b7d Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
  futex: Deobfuscate handle_futex_death()
  plist: Add priority list test
  plist: Shrink struct plist_head
  futex,plist: Remove debug lock assignment from plist_node
  futex,plist: Pass the real head of the priority list to plist_del()
  futex: Sanitize futex ops argument types
  futex: Sanitize cmpxchg_futex_value_locked API
  futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()
  futex: Avoid redudant evaluation of task_pid_vnr()
  futex: Update futex_wait_setup comments about locking
2011-03-15 18:23:52 -07:00
Linus Torvalds
c345f60a5f Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debugobjects: Add hint for better object identification
2011-03-15 18:23:25 -07:00
Linus Torvalds
422e6c4bc4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
  tidy the trailing symlinks traversal up
  Turn resolution of trailing symlinks iterative everywhere
  simplify link_path_walk() tail
  Make trailing symlink resolution in path_lookupat() iterative
  update nd->inode in __do_follow_link() instead of after do_follow_link()
  pull handling of one pathname component into a helper
  fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
  Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
  readlinkat(), fchownat() and fstatat() with empty relative pathnames
  Allow O_PATH for symlinks
  New kind of open files - "location only".
  ext4: Copy fs UUID to superblock
  ext3: Copy fs UUID to superblock.
  vfs: Export file system uuid via /proc/<pid>/mountinfo
  unistd.h: Add new syscalls numbers to asm-generic
  x86: Add new syscalls for x86_64
  x86: Add new syscalls for x86_32
  fs: Remove i_nlink check from file system link callback
  fs: Don't allow to create hardlink for deleted file
  vfs: Add open by file handle support
  ...
2011-03-15 15:48:13 -07:00
Trond Myklebust
c83ce989cb VFS: Fix the nfs sillyrename regression in kernel 2.6.38
The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename
because the latter relies on being able to determine the parent
directory of the dentry in the ->iput() callback in order to send the
appropriate unlink rpc call.

Looking at the code that cares about races with dput(), there doesn't
seem to be anything that specifically uses d_parent as a test for
whether or not there is a race:
  - __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent
  - shrink_dcache_for_umount() is safe since nothing else can rearrange
    the dentries in that super block.
  - have_submount(), select_parent() and d_genocide() can test for a
    deletion if we set the DCACHE_DISCONNECTED flag when the dentry
    is removed from the parent's d_subdirs list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org (2.6.38, needs commit c826cb7dfc "dcache.c:
	create helper function for duplicated functionality" )
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-15 15:46:11 -07:00
Linus Torvalds
c826cb7dfc dcache.c: create helper function for duplicated functionality
This creates a helper function for he "try to ascend into the parent
directory" case, which was written out in triplicate before.  With all
the locking and subtle sequence number stuff, we really don't want to
duplicate that kind of code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-15 15:29:21 -07:00
Al Viro
574197e0de tidy the trailing symlinks traversal up
* pull the handling of current->total_link_count into
__do_follow_link()
* put the common "do ->put_link() if needed and path_put() the link"
  stuff into a helper (put_link(nd, link, cookie))
* rename __do_follow_link() to follow_link(), while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 17:16:25 -04:00
Al Viro
b356379a02 Turn resolution of trailing symlinks iterative everywhere
The last remaining place (resolution of nested symlink) converted
to the loop of the same kind we have in path_lookupat() and
path_openat().

Note that we still *do* have a recursion in pathname resolution;
can't avoid it, really.  However, it's strictly for nested symlinks
now - i.e. ones in the middle of a pathname.

link_path_walk() has lost the tail now - it always walks everything
except the last component.

do_follow_link() renamed to nested_symlink() and moved down.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 17:16:25 -04:00
Al Viro
ce0525449d simplify link_path_walk() tail
Now that link_path_walk() is called without LOOKUP_PARENT
only from do_follow_link(), we can simplify the checks in
last component handling.  First of all, checking if we'd
arrived to a directory is not needed - the caller will check
it anyway.  And LOOKUP_FOLLOW is guaranteed to be there,
since we only get to that place with nd->depth > 0.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 17:16:25 -04:00
Al Viro
bd92d7fed8 Make trailing symlink resolution in path_lookupat() iterative
Now the only caller of link_path_walk() that does *not* pass
LOOKUP_PARENT is do_follow_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 17:16:25 -04:00
Al Viro
b21041d0f7 update nd->inode in __do_follow_link() instead of after do_follow_link()
... and note that we only need to do it for LAST_BIND symlinks

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 17:16:25 -04:00
Al Viro
ce57dfc179 pull handling of one pathname component into a helper
new helper: walk_component().  Handles everything except symlinks;
returns negative on error, 0 on success and 1 on symlinks we decided
to follow.  Drops out of RCU mode on such symlinks.

link_path_walk() and do_last() switched to using that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 17:16:20 -04:00
Aneesh Kumar K.V
11a7b371b6 fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
We don't want to allow creation of private hardlinks by different application
using the fd passed to them via SCM_RIGHTS. So limit the null relative name
usage in linkat syscall to CAP_DAC_READ_SEARCH

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2011-03-15 17:16:05 -04:00
Linus Torvalds
76ca078328 Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: suspend: remove xen_hvm_suspend
  xen: suspend: pull pre/post suspend hooks out into suspend_info
  xen: suspend: move arch specific pre/post suspend hooks into generic hooks
  xen: suspend: refactor non-arch specific pre/post suspend hooks
  xen: suspend: add "arch" to pre/post suspend hooks
  xen: suspend: pass extra hypercall argument via suspend_info struct
  xen: suspend: refactor cancellation flag into a structure
  xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding
  xen: switch to new schedop hypercall by default.
  xen: use new schedop interface for suspend
  xen: do not respond to unknown xenstore control requests
  xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled
  xen: PV on HVM: support PV spinlocks and IPIs
  xen: make the ballon driver work for hvm domains
  xen-blkfront: handle Xen major numbers other than XENVBD
  xen: do not use xen_info on HVM, set pv_info name to "Xen HVM"
  xen: no need to delay xen_setup_shutdown_event for hvm guests anymore
2011-03-15 10:59:09 -07:00
Linus Torvalds
27d2a8b97e Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: ia64 build broken due to "xen: switch to new schedop hypercall by default."

* 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Union the blkif_request request specific fields

* 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: annotate functions which only call into __init at start of day
  xen p2m: annotate variable which appears unused
  xen: events: mark cpu_evtchn_mask_p as __refdata
2011-03-15 10:49:16 -07:00
Linus Torvalds
010b8f4e26 Merge branch 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: events: remove dom0 specific xen_create_msi_irq
  xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq
  xen: events: push set_irq_msi down into xen_create_msi_irq
  xen: events: update pirq_to_irq in xen_create_msi_irq
  xen: events: refactor xen_create_msi_irq slightly
  xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ
  xen: events: assume PHYSDEVOP_get_free_pirq exists
  xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq
  xen: events: return irq from xen_allocate_pirq_msi
  xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi
  xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available.
  xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0
2011-03-15 10:47:56 -07:00
Linus Torvalds
397fae0818 Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well.
  xen: Use IRQF_FORCE_RESUME
  xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.
  xen: Fix compile error introduced by "switch to new irq_chip functions"
  xen: Switch to new irq_chip functions
  xen: Remove stale irq_chip.end
  xen: events: do not free legacy IRQs
  xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges.
  xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq
  xen:events: move find_unbound_irq inside CONFIG_PCI_MSI
  xen: handled remapped IRQs when enabling a pcifront PCI device.
  genirq: Add IRQF_FORCE_RESUME

* 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code.
  pci/xen: Cleanup: convert int** to int[]
  pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq
  xen-pcifront: Sanity check the MSI/MSI-X values
  xen-pcifront: don't use flush_scheduled_work()
2011-03-15 10:47:16 -07:00
Linus Torvalds
c7146dd009 Merge branches 'stable/p2m-identity.v4.9.1' and 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
  xen/m2p: No need to catch exceptions when we know that there is no RAM
  xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set.
  xen/debugfs: Add 'p2m' file for printing out the P2M layout.
  xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
  xen/mmu: WARN_ON when racing to swap middle leaf.
  xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN.
  xen/mmu: Add the notion of identity (1-1) mapping.
  xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.

* 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow.
  xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.
2011-03-15 10:32:15 -07:00
Al Viro
326be7b484 Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Al Viro
65cfc67223 readlinkat(), fchownat() and fstatat() with empty relative pathnames
For readlinkat() we simply allow empty pathname; it will fail unless
we have dfd equal to O_PATH-opened symlink, so we are outside of
POSIX scope here.  For fchownat() and fstatat() we allow AT_EMPTY_PATH;
let the caller explicitly ask for such behaviour.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Al Viro
bcda76524c Allow O_PATH for symlinks
At that point we can't do almost nothing with them.  They can be opened
with O_PATH, we can manipulate such descriptors with dup(), etc. and
we can see them in /proc/*/{fd,fdinfo}/*.

We can't (and won't be able to) follow /proc/*/fd/* symlinks for those;
there's simply not enough information for pathname resolution to go on
from such point - to resolve a symlink we need to know which directory
does it live in.

We will be able to do useful things with them after the next commit, though -
readlinkat() and fchownat() will be possible to use with dfd being an
O_PATH-opened symlink and empty relative pathname.  Combined with
open_by_handle() it'll give us a way to do realink-by-handle and
lchown-by-handle without messing with more redundant syscalls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Al Viro
1abf0c718f New kind of open files - "location only".
New flag for open(2) - O_PATH.  Semantics:
	* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
	* almost all operations on the resulting descriptors shall
fail with -EBADF.  Exceptions are:
	1) operations on descriptors themselves (i.e.
		close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
		fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
		fcntl(fd, F_SETFD, ...))
	2) fcntl(fd, F_GETFL), for a common non-destructive way to
		check if descriptor is open
	3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
		points of pathname resolution
	* closing such descriptor does *NOT* affect dnotify or
posix locks.
	* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself.  Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).

fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them.  That protects
existing code from dealing with those things.

There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
f2fa2ffc20 ext4: Copy fs UUID to superblock
File system UUID is made available to application
via  /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
03cb5f03dc ext3: Copy fs UUID to superblock.
File system UUID is made available to application
via  /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
93f1c20bc8 vfs: Export file system uuid via /proc/<pid>/mountinfo
We add a per superblock uuid field. File systems should
update the uuid in the fill_super callback

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
a51571ccb8 unistd.h: Add new syscalls numbers to asm-generic
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Aneesh Kumar K.V
6aae5f2b20 x86: Add new syscalls for x86_64
This patch add new syscalls to x86_64

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
7dadb755b0 x86: Add new syscalls for x86_32
This patch adds new syscalls to x86_32

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
f17b604207 fs: Remove i_nlink check from file system link callback
Now that VFS check for inode->i_nlink == 0 and returns proper
error, remove similar check from file system

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
aae8a97d3e fs: Don't allow to create hardlink for deleted file
Add inode->i_nlink == 0 check in VFS. Some of the file systems
do this internally. A followup patch will remove those instance.
This is needed to ensure that with link by handle we don't allow
to create hardlink of an unlinked file. The check also prevent a race
between unlink and link

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
becfd1f375 vfs: Add open by file handle support
[AV: duplicate of open() guts removed; file_open_root() used instead]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
990d6c2d7a vfs: Add name to file handle conversion support
The syscall also return mount id which can be used
to lookup file system specific information such as uuid
in /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:37 -04:00
Linus Torvalds
521cb40b0c Linux 2.6.38 2011-03-14 18:20:32 -07:00
Al Viro
f52e0c1130 New AT_... flag: AT_EMPTY_PATH
For name_to_handle_at(2) we'll want both ...at()-style syscall that
would be usable for non-directory descriptors (with empty relative
pathname).  Introduce new flag (AT_EMPTY_PATH) to deal with that and
corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
deal with the latter.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 19:12:20 -04:00
Linus Torvalds
59766edc79 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300:
  MN10300: atomic_read() should ensure it emits a load
  MN10300: The SMP_ICACHE_INV_FLUSH_RANGE IPI command does not exist
  MN10300: Proper use of macros get_user() in the case of incremented pointers
2011-03-14 15:20:39 -07:00
Linus Torvalds
2990821d0e Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (26 commits)
  MIPS: Alchemy: Fix reset for MTX-1 and XXS1500
  MIPS: MTX-1: Make au1000_eth probe all PHY addresses
  MIPS: Jz4740: Add HAVE_CLK
  MIPS: Move idle task creation to work queue
  MIPS, Perf-events: Use unsigned delta for right shift in event update
  MIPS, Perf-events: Work with the new callchain interface
  MIPS, Perf-events: Fix event check in validate_event()
  MIPS, Perf-events: Work with the new PMU interface
  MIPS, Perf-events: Work with irq_work
  MIPS: Fix always CONFIG_LOONGSON_UART_BASE=y
  MIPS: Loongson: Fix potentially wrong string handling
  MIPS: Fix GCC-4.6 'set but not used' warning in arch/mips/mm/init.c
  MIPS: Fix GCC-4.6 'set but not used' warning in ieee754int.h
  MIPS: Remove unused code from arch/mips/kernel/syscall.c
  MIPS: Fix GCC-4.6 'set but not used' warning in signal*.c
  MIPS: MSP: Fix MSP71xx bpci interrupt handler return value
  MIPS: Select R4K timer lib for all MSP platforms
  MIPS: Loongson: Remove ad-hoc cmdline default
  MIPS: Clear the correct flag in sysmips(MIPS_FIXADE, ...).
  MIPS: Add an unreachable return statement to satisfy buggy GCCs.
  ...
2011-03-14 15:20:12 -07:00
Linus Torvalds
869c34f520 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: ce4100: Set pci ops via callback instead of module init
  x86/mm: Fix pgd_lock deadlock
  x86/mm: Handle mm_fault_error() in kernel space
  x86: Don't check for BIOS corruption in first 64K when there's no need to
2011-03-14 15:19:09 -07:00
Linus Torvalds
52d3c03675 Revert "oom: oom_kill_process: fix the child_points logic"
This reverts the parent commit.  I hate doing that, but it's generating
some discussion ("half of it is right"), and since I am planning on
doing the 2.6.38 release later today we can punt it to stable if
required. Let's not rock the boat right now.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-14 15:17:07 -07:00
Oleg Nesterov
dc1b83ab08 oom: oom_kill_process: fix the child_points logic
oom_kill_process() starts with victim_points == 0.  This means that
(most likely) any child has more points and can be killed erroneously.

Also, "children has a different mm" doesn't match the reality, we should
check child->mm != t->mm.  This check is not exactly correct if t->mm ==
NULL but this doesn't really matter, oom_kill_task() will kill them
anyway.

Note: "Kill all processes sharing p->mm" in oom_kill_task() is wrong
too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-14 13:38:35 -07:00
Thomas Gleixner
07d5ecae29 arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
commit 522d7dec(futex: Remove redundant pagefault_disable in
futex_atomic_cmpxchg_inatomic()) added a bogus comment.

/* Note that preemption is disabled by futex_atomic_cmpxchg_inatomic
 * call sites. */

Bogus in two aspects:

1) pagefault_disable != preempt_disable even if the mechanism we use
   is the same

2) we have a call site which deliberately does not disable pagefaults
   as it wants the possible fault to be handled - though that has been
   changed for consistency reasons now.

Sigh. I really should have seen that when committing the above. :(

Catched-by-and-rightfully-ranted-at-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6>
Cc: Michel Lespinasse <walken@google.com>
Cc: Darren Hart <darren@dvhart.com>
2011-03-14 21:10:26 +01:00
Thomas Gleixner
6e0aa9f8a8 futex: Deobfuscate handle_futex_death()
handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without
disabling page faults. That's ok, but totally non obvious.

We don't hold locks so we actually can and want to fault here, because
the get_user() before futex_atomic_cmpxchg_inatomic() does not
guarantee a R/W mapping.

We could just add a big fat comment to explain this, but actually
changing the code so that the functionality is entirely clear is
better.

Use the helper function which disables page faults around the
futex_atomic_cmpxchg_inatomic() and handle a fault with a call to
fault_in_user_writeable() as all other places in the futex code do as
well.

Pointed-out-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <darren@dvhart.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-14 21:08:47 +01:00
Florian Fainelli
9ced975711 MIPS: Alchemy: Fix reset for MTX-1 and XXS1500
Since commit 32fd6901 (MIPS: Alchemy: get rid of common/reset.c)
Alchemy-based boards use their own reset function. For MTX-1 and XXS1500,
the reset function pokes at the BCSR.SYSTEM_RESET register, but this does
not work. According to Bruno Randolf, this was not tested when written.

Previously, the generic au1000_restart() routine called the board specific
reset function, which for MTX-1 and XXS1500 did not work, but finally made
a jump to the reset vector, which really triggers a system restart. Fix
reboot for both targets by jumping to the reset vector.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2093/
Acked-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:28 +01:00
Florian Fainelli
bf3a1eb859 MIPS: MTX-1: Make au1000_eth probe all PHY addresses
When au1000_eth probes the MII bus for PHY address, if we do not set
au1000_eth platform data's phy_search_highest_address, the MII probing
logic will exit early and will assume a valid PHY is found at address 0.
For MTX-1, the PHY is at address 31, and without this patch, the link
detection/speed/duplex would not work correctly.

CC: stable@kernel.org
Signed-off-by: Florian Fainelli <florian@openwrt.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2111/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:27 +01:00
Maurus Cuelenaere
ab5330eb26 MIPS: Jz4740: Add HAVE_CLK
Jz4740 supports the clock framework but doesn't have HAVE_CLK defined,
so define it!

Signed-off-by: Maurus Cuelenaere <mcuelenaere@gmail.com>
To: linux-mips@linux-mips.org
To: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/2112/
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:27 +01:00
Maksim Rayskiy
6667deb69e MIPS: Move idle task creation to work queue
To avoid forking usermode thread when creating an idle task, move fork_idle
to a work queue.

If kernel starts with maxcpus= option which does not bring all available
cpus online at boot time, idle tasks for offline cpus are not created. If
later offline cpus are hotplugged through sysfs, __cpu_up is called in
the context of the user task, and fork_idle copies its non-zero mm
pointer.  This causes BUG() in per_cpu_trap_init.

This also avoids issues with resource limits of the CPU writing to sysfs,
containers, maybe others.

Signed-off-by: Maksim Rayskiy <mrayskiy@broadcom.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2070/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:27 +01:00
Deng-Cheng Zhu
ba9786f324 MIPS, Perf-events: Use unsigned delta for right shift in event update
Leverage the commit for ARM by Will Deacon:

- 446a5a8b1e
    ARM: 6205/1: perf: ensure counter delta is treated as unsigned

    Hardware performance counters on ARM are 32-bits wide but atomic64_t
    variables are used to represent counter data in the hw_perf_event structure.

    The armpmu_event_update function right-shifts a signed 64-bit delta variable
    and adds the result to the event count. This can lead to shifting in sign-bits
    if the MSB of the 32-bit counter value is set. This results in perf output
    such as:

     Performance counter stats for 'sleep 20':

     18446744073460670464  cycles             <-- 0xFFFFFFFFF12A6000
            7783773  instructions             #      0.000 IPC
                465  context-switches
                161  page-faults
            1172393  branches

       20.154242147  seconds time elapsed

    This patch ensures that the delta value is treated as unsigned so that the
    right shift sets the upper bits to zero.

Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: a.p.zijlstra@chello.nl
To: fweisbec@gmail.com
To: will.deacon@arm.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: wuzhangjin@gmail.com
Cc: paulus@samba.org
Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: matt@console-pimps.org
Cc: sshtylyov@mvista.com
Patchwork: http://patchwork.linux-mips.org/patch/2015/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:27 +01:00
Deng-Cheng Zhu
98f92f2f9e MIPS, Perf-events: Work with the new callchain interface
This is the MIPS part of the following commits by Frederic Weisbecker:

- f72c1a931e
    perf: Factorize callchain context handling

    Store the kernel and user contexts from the generic layer instead
    of archs, this gathers some repetitive code.

- 56962b4449
    perf: Generalize some arch callchain code

    - Most archs use one callchain buffer per cpu, except x86 that needs
      to deal with NMIs. Provide a default perf_callchain_buffer()
      implementation that x86 overrides.

    - Centralize all the kernel/user regs handling and invoke new arch
      handlers from there: perf_callchain_user() / perf_callchain_kernel()
      That avoid all the user_mode(), current->mm checks and so...

    - Invert some parameters in perf_callchain_*() helpers: entry to the
      left, regs to the right, following the traditional (dst, src).

- 70791ce9ba
    perf: Generalize callchain_store()

    callchain_store() is the same on every archs, inline it in
    perf_event.h and rename it to perf_callchain_store() to avoid
    any collision.

    This removes repetitive code.

- c1a65932fd
    perf: Drop unappropriate tests on arch callchains

    Drop the TASK_RUNNING test on user tasks for callchains as
    this check doesn't seem to make any sense.

    Also remove the tests for !current that is not supposed to
    happen and current->pid as this should be handled at the
    generic level, with exclude_idle attribute.

Reported-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: a.p.zijlstra@chello.nl
To: will.deacon@arm.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: paulus@samba.org
Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: dengcheng.zhu@gmail.com
Cc: matt@console-pimps.org
Cc: sshtylyov@mvista.com
Patchwork: http://patchwork.linux-mips.org/patch/2014/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:27 +01:00
Deng-Cheng Zhu
c049b6a5f2 MIPS, Perf-events: Fix event check in validate_event()
Ignore events that are in off/error state or belong to a different PMU.

This patch originates from the following commit for ARM by Will Deacon:

- 65b4711ff5
    ARM: 6352/1: perf: fix event validation

    The validate_event function in the ARM perf events backend has the
    following problems:

    1.) Events that are disabled count towards the cost.
    2.) Events associated with other PMUs [for example, software events or
        breakpoints] do not count towards the cost, but do fail validation,
        causing the group to fail.

    This patch changes validate_event so that it ignores events in the
    PERF_EVENT_STATE_OFF state or that are scheduled for other PMUs.

Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: a.p.zijlstra@chello.nl
To: fweisbec@gmail.com
To: will.deacon@arm.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: wuzhangjin@gmail.com
Cc: paulus@samba.org
Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: dengcheng.zhu@gmail.com
Cc: matt@console-pimps.org
Cc: sshtylyov@mvista.com
Cc: ddaney@caviumnetworks.com
Patchwork: http://patchwork.linux-mips.org/patch/2013/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:27 +01:00
Deng-Cheng Zhu
404ff63840 MIPS, Perf-events: Work with the new PMU interface
This is the MIPS part of the following commits by Peter Zijlstra:

- a4eaf7f146
    perf: Rework the PMU methods

    Replace pmu::{enable,disable,start,stop,unthrottle} with
    pmu::{add,del,start,stop}, all of which take a flags argument.

    The new interface extends the capability to stop a counter while
    keeping it scheduled on the PMU. We replace the throttled state with
    the generic stopped state.

    This also allows us to efficiently stop/start counters over certain
    code paths (like IRQ handlers).

    It also allows scheduling a counter without it starting, allowing for
    a generic frozen state (useful for rotating stopped counters).

    The stopped state is implemented in two different ways, depending on
    how the architecture implemented the throttled state:

     1) We disable the counter:
        a) the pmu has per-counter enable bits, we flip that
        b) we program a NOP event, preserving the counter state

     2) We store the counter state and ignore all read/overflow events

For MIPSXX, the stopped state is implemented in the way of 1.b as above.

- 33696fc0d1
    perf: Per PMU disable

    Changes perf_disable() into perf_pmu_disable().

- 24cd7f54a0
    perf: Reduce perf_disable() usage

    Since the current perf_disable() usage is only an optimization,
    remove it for now. This eases the removal of the __weak
    hw_perf_enable() interface.

- b0a873ebbf
    perf: Register PMU implementations

    Simple registration interface for struct pmu, this provides the
    infrastructure for removing all the weak functions.

- 51b0fe3954
    perf: Deconstify struct pmu

    sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

Reported-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: a.p.zijlstra@chello.nl
To: fweisbec@gmail.com
To: will.deacon@arm.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: wuzhangjin@gmail.com
Cc: paulus@samba.org
Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: dengcheng.zhu@gmail.com
Cc: matt@console-pimps.org
Cc: sshtylyov@mvista.com
Cc: ddaney@caviumnetworks.com
Patchwork: http://patchwork.linux-mips.org/patch/2012/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:26 +01:00
Deng-Cheng Zhu
91f017372a MIPS, Perf-events: Work with irq_work
This is the MIPS part of the following commit by Peter Zijlstra:

- e360adbe29
    irq_work: Add generic hardirq context callbacks

    Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system -- like wakeup a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this get to do with a
    callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.

For MIPSXX, we need to call irq_work_run() at the tail of the perf IRQ
handler as described above.

Reported-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: fweisbec@gmail.com
To: will.deacon@arm.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: paulus@samba.org
Cc: mingo@elte.hu
Cc: acme@redhat.com
Cc: matt@console-pimps.org
Cc: sshtylyov@mvista.com,
Patchwork: http://patchwork.linux-mips.org/patch/2011/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:26 +01:00
Yoichi Yuasa
efe8dc556c MIPS: Fix always CONFIG_LOONGSON_UART_BASE=y
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2055/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-03-14 21:07:26 +01:00