This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.
David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revieled that the problem happend because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.
The fault happens when a fork mapped the anon memory which contains
the futex readonly for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent in case it faults and returns -EFAULT to user
space. User space has no way to fixup that state.
When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.
After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:
- the task which acquired the rtmutex
- the pending owner of the pi_state which did not get the rtmutex
Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and tries to fixup the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.
One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.
Reported-by: David Holmes <david.holmes@sun.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Holmes <david.holmes@sun.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 73 insertions(+), 20 deletions(-)
The zonelist patches caused the loop that checks for available
objects in permitted zones to not terminate immediately. One object
per zone per allocation may be allocated and then abandoned.
Break the loop when we have successfully allocated one object.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch changes the function reserve_bootmem_node() from void to int,
returning -ENOMEM if the allocation fails.
This fixes a build problem on x86 with CONFIG_KEXEC=y and
CONFIG_NEED_MULTIPLE_NODES=y
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netns: Don't receive new packets in a dead network namespace.
sctp: Make sure N * sizeof(union sctp_addr) does not overflow.
pppoe: warning fix
ipv6: Drop packets for loopback address from outside of the box.
ipv6: Remove options header when setsockopt's optlen is 0
mac80211: detect driver tx bugs
As noticed by Gabriel Campana, the kmalloc() length arg
passed in by sctp_getsockopt_local_addrs_old() can overflow
if ->addr_num is large enough.
Therefore, enforce an appropriate limit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix warning:
drivers/net/pppoe.c: In function 'pppoe_recvmsg':
drivers/net/pppoe.c:945: warning: comparison of distinct pointer types lacks a cast
because skb->len is unsigned int and total_len is size_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Which was removed in the hope that generic legacy IDE quirk in
drivers/pci/probe.c is sufficient for Cypress IDE.
It isn't, as this controller has non-standard BAR layout:
secondary channel registers are in the BAR0-1 of the second
PCI function - not in the BAR2-3 of the same function, as the
generic quirk routine assumes.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vast majority of these build failures are gcc-4.3 warnings
about static functions and objects being referenced from
non-static (read: "extern inline") functions, in conjunction
with our -Werror.
We cannot just convert "extern inline" to "static inline",
as people keep suggesting all the time, because "extern inline"
logic is crucial for generic kernel build.
So
- just make sure that all callees of critical "extern inline"
functions are also "extern inline";
- use "static inline", wherever it's possible.
traps.c: work around gcc-4.3 being too smart about array
bounds-checking.
TODO: add "gnu_inline" attribute to all our "extern inline"
functions to ensure desired behaviour with future compilers.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With built-in scsi disk driver, the final link fails with a following
error:
`.exit.text' referenced in section `.rodata' of drivers/built-in.o:
defined in discarded section `.exit.text' of drivers/built-in.o
This happens with -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) with all gcc-4
versions, and also with -O2 and gcc-4.3.
The problem is in sd.c:sd_major() being inlined into __exit function
exit_sd(), and the compiler generating a jump table in .rodata section
for the 'switch' statement in sd_major(). So we have references to
discarded section.
Fixed with a big hammer in the form of -fno-jump-tables.
Note that jump tables vs. discarded sections is a generic problem,
other architectures are just lucky not to suffer from it. But with
a slightly more complex switch/case statement it can be reproduced
on x86 as well. So maybe at some point we should consider
-fno-jump-tables as a generic compile option...
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To calculate addresses of locally defined variables, GCC uses 32-bit
displacement from the GP. Which doesn't work for per cpu variables in
modules, as an offset to the kernel per cpu area is way above 4G.
The workaround is to force allocation of a GOT entry for per cpu variable
using ldq instruction with a 'literal' relocation.
I had to use custom asm/percpu.h, as a required argument magic doesn't
work with asm-generic/percpu.h macros.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
BAST: Remove old IDE driver
pcmcia ide kingston compactflash's have a new manufacturer id
pcmcia: add another pata/ide ID
pcmcia: add an pata/ide ID
ide: increase timeout in wait_drive_not_busy()
palm_bk3710: fix resource management
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
ieee1394: Kconfig menu touch-up
firewire: Kconfig menu touch-up
firewire: deadline for PHY config transmission
firewire: fw-ohci: unify printk prefixes
firewire: fill_bus_reset_event needs lock protection
firewire: fw-ohci: write selfIDBufferPtr before LinkControl.rcvSelfID
firewire: fw-ohci: disable PHY packet reception into AR context
firewire: fw-ohci: use of uninitialized data in AR handler
firewire: don't panic on invalid AR request buffer
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: no AC status notification
ACPI Exception (video-1721): UNKNOWN_STATUS_CODE, Cant attach device
* 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (21 commits)
drm: only trust core drm ioctls - driver ioctls are a mess.
drm/i915: add support for Intel series 4 chipsets.
drm/radeon: add hier-z registers for r300 and r500 chipsets
drm/radeon: use DSTCACHE_CTLSTAT rather than RB2D_DSTCACHE_CTLSTAT
drm/radeon: switch IGP gart to use radeon_write_agp_base()
drm/radeon: Restore sw interrupt on resume
drm/r500: add support for AGP based cards.
drm/radeon: fix texture uploads with large 3d textures (bug 13980)
drm/radeon: add initial r500 support.
drm/radeon: init pipe setup in kernel code.
drm/radeon: fixup radeon_do_engine_reset
drm/radeon: fix pixcache and purge/cache flushing registers
drm/radeon: write AGP_BASE_2 on chips that support it.
drm/radeon: merge IGP chip setup and fixup RS400 vs RS480 support
drm/radeon: IGP clean up register and magic numbers.
drm/rs690: set base 2 to 0.
drm/rs690: set all of gart base address.
radeon: add production microcode from AMD
drm: pcigart use proper pci map interfaces.
drm: the sg alloc ioctl should write back the handle to userspace
...
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
[agp]: fixup chipset flush for new Intel G4x.
agp: brown paper bag patch - put back the two lines it took out.
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
rcupreempt: remove export of rcu_batches_completed_bh
cpuset: limit the input of cpuset.sched_relax_domain_level
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched, delay accounting: fix incorrect delay time when constantly waiting on runqueue
sched: CPU hotplug events must not destroy scheduler domains created by the cpusets
sched: rt-group: fix RR buglet
sched: rt-group: heirarchy aware throttle
sched: rt-group: fix hierarchy
sched: NULL pointer dereference while setting sched_rt_period_us
sched: fix defined-but-unused warning
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, geode: add a VSA2 ID for General Software
x86: use BOOTMEM_EXCLUSIVE on 32-bit
x86, 32-bit: fix boot failure on TSC-less processors
x86: fix NULL pointer deref in __switch_to
x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin Serial Driver: Use timer to poll CTS PIN instead of workqueue.
Blackfin arch: fix typo error in bf548 serial header file
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ahci: sis can't do PMP
ata_piix: add TECRA M4 to broken suspend list
LIBATA: Add HAVE_PATA_PLATFORM to select PATA_PLATFORM driver
sata_mv: warn on PIO with multiple DRQs
sata_mv: enable async_notify for 60x1 Rev.C0 and higher
libata: don't check whether to use DMA or not for no data commands
ahci: jmb361 has only one port
The inline assembly in drivers/watchdog/hpwdt.c was incredibly broken,
and included all the function prologue and epilogue stuff, even though
it was itself then inside a C function where the compiler would add its
own prologue and epilogue on top of it all.
This then just _happened_ to work if you had exactly the right compiler
version and exactly the right compiler flags, so that gcc just happened
to not create any prologue at all (the gcc-generated epilogue wouldn't
matter, since it would never be reached).
But the more proper way to fix it is to simply not do this. Move the
inline asm to the top level, with no surrounding function at all (the
better alternative would be to remove the prologue and make it actually
use proper description of the arguments to the inline asm, but that's a
bigger change than the one I'm willing to make right now).
Tested-by: S.Çağlar Onur <caglar@pardus.org.tr>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Security hole in sn2_ptc_proc_write
It is possible to overrun a buffer with a write to this /proc file.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove the old BAST IDE driver, as we are now using the platform-pata
support.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Up to now, Kingston compactflash cards (ab)used the Toshiba Manufacturer's ID,
In their new CF cards, they use a new one. Let's the ide subsystem
recognize CF cards with the new id.
Signed-off-by: Christophe Niclaes <cniclaes@develtech.com>
Acked-by: Philippe De Muyter <phdm@macqel.be>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Addition of Transcend 1GB 45x id so that it is properly detected.
[bart: fix typo in ide-cs's ID spotted by Alan Cox]
Signed-off-by: William Peters <w1ll14@gmail.com>
Signed-off-by: Kristoffer Ericson <Kristoffer_e1@hotmail.com>
CC: Alan Cox <alan@lxorguk.ukuu.org.uk>
CC: linux-ide@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Some ATAPI devices take longer than the current max timeout value to
become ready (i.e. TEAC DV-W28ECW takes 6 ms) so increase the timeout
value to 10 ms.
This fixes kernel.org bugzilla bug #10887:
http://bugzilla.kernel.org/show_bug.cgi?id=10887
Reported-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The driver expected a *virtual* address in the IDE platform device's memory
resource and didn't request the memory region for the register block. Fix this
taking into account the fact that DaVinci SoC devices are fixed-mapped to the
virtual memory early and we can get their virtual addresses using IO_ADDRESS()
macro, not having to call ioremap()...
While at it, also do some cosmetic changes...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit
557ed1fa26 ("remove ZERO_PAGE") removed
the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
generally now populate the VM with real empty pages needlessly.
We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
since fault handling no longer uses ZERO_PAGE for new anonymous pages,
we now need to handle that special case in follow_page() instead.
In particular, the removal of ZERO_PAGE effectively removed the core
file writing optimization where we would skip writing pages that had not
been populated at all, and increased memory pressure a lot by allocating
all those useless newly zeroed pages.
This reinstates the optimization by making the unmapped PTE case the
same as for a non-existent page table, which already did this correctly.
While at it, this also fixes the XIP case for follow_page(), where the
caller could not differentiate between the case of a page that simply
could not be used (because it had no "struct page" associated with it)
and a page that just wasn't mapped.
We do that by simply returning an error pointer for pages that could not
be turned into a "struct page *". The error is arbitrarily picked to be
EFAULT, since that was what get_user_pages() already used for the
equivalent IO-mapped page case.
[ Also removed an impossible test for pte_offset_map_lock() failing:
that's not how that function works ]
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the patch for the group descriptor table corruption during
online resize pointed out by Theodore Tso. The problem was caused by
the fact that the ext4 group descriptor can be either 32 or 64 bytes
long. Only the 64 bytes structure was taken into account.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[ Based upon original report and patch by Karsten Keil. Karsten
has verified that this fixes the TAHI test case "ICMPv6 test
v6LC.5.1.2 Part F". -DaveM ]
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the sticky Hop-by-Hop options header by calling setsockopt()
for IPV6_HOPOPTS with a zero option length, per RFC3542.
Routing header and Destination options header does the same as
Hop-by-Hop options header.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
General Software writes their own VSA2 module for their version
of the Geode BIOS, which returns a different ID then the standard
VSA2. This was causing the framebuffer driver to break for most
GSW boards.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Cc: tglx@linutronix.de
Cc: linux-geode@lists.infradead.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch corrects the incorrect value of per process run-queue wait
time reported by delay statistics. The anomaly was due to the following
reason. When a process leaves the CPU and immediately starts waiting for
CPU on the runqueue (which means it remains in the TASK_RUNNABLE state),
the time of re-entry into the run-queue is never recorded. Due to this,
the waiting time on the runqueue from this point of re-entry upto the
next time it hits the CPU is not accounted for. This is solved by
recording the time of re-entry of a process leaving the CPU in the
sched_info_depart() function IF the process will go back to waiting on
the run-queue. This IF condition is verified by checking whether the
process is still in the TASK_RUNNABLE state.
The patch was tested on 2.6.26-rc6 using two simple CPU hog programs.
The values noted prior to the fix did not account for the time spent on
the runqueue waiting. After the fix, the correct values were reported
back to user space.
Signed-off-by: Bharath Ravi <bharathravi1@gmail.com>
Signed-off-by: Madhava K R <madhavakr@gmail.com>
Cc: dhaval@linux.vnet.ibm.com
Cc: vatsa@in.ibm.com
Cc: balbir@in.ibm.com
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LM75 sensor reading bugfix: never save error status as valid
sensor output. This could be improved, but at least this
prevents certain rude failure modes.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
It has been reported that the abituguru3 driver fails to load after a BIOS
update. This patch fixes this by loosening the detection routine so that it
will work after the BIOS update too. To compensate for the now very loose
detection an additional check is added on the DMI Base Board vendor string to
make sure we only load on Abit motherboards, this is the same as the check in
the abituguru (1 / 2) driver.
Signed-of-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
This patch identifies the Abit AW8D board as such, and adds support for its
aux5 fan connector
Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
* Document the characteristics of libsensors 3.0.0 and 3.0.1.
* The sysfs interface is no longer subject to changes.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Juerg Haefliger <juergh at gmail.com>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>
data->max_duty_at_overheat is not updated in adt7473_update_device,
so it might be used before it is initialized (if the user reads from
sysfs file max_duty_at_crit before writing to it.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com>