Impact: futureproof as we convert more code to new APIs
The old cpumask operators treat all NR_CPUS bits as relevent, the new
ones use nr_cpumask_bits. For large NR_CPUS and small nr_cpu_ids, this
makes a difference.
However, mixing the two can cause problems with undefined bits. An
arch which sets CONFIG_CPUMASK_OFFSTACK should have converted across
to the new operators, so it's safe in that case.
(Thanks to Stephen Rothwell for bisecting the initial unused-bits bug,
and Mike Travis for this solution).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Travis <travis@sgi.com>
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
This defines them in the generic non-NUMA case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Impact: Add config option to enable code in cpumask.h
Currently it can be set if DEBUG_PER_CPU_MAPS, or set specifically by
an arch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: change calling convention of existing clock_event APIs
struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.
Another single-patch change. For safety, we BUG_ON() in
clockevents_register_device() if it's not set.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: change calling convention of existing cpumask APIs
Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.
These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the
cpus4096 tree because the io-apic changes in the sparseirq change
conflict with the cpumask changes in the cpumask tree, and we
want to resolve those.
arch_reinit_sched_domains() used to call arch_update_cpu_topology()
via arch_init_sched_domains(). This call got lost with
e761b77252 ("cpu hotplug, sched: Introduce
cpu_active_map and redo sched domain managment (take 2)".
So we might end up with outdated and missing cpus in the cpu core
maps (architecture used to call arch_reinit_sched_domains if cpu
topology changed).
This adds a call to arch_update_cpu_topology in partition_sched_domains
which gets called whenever scheduling domains get updated. Which is
what is supposed to happen when cpu topology changes.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change arch_update_cpu_topology so it returns 1 if the cpu topology changed
and 0 if it didn't change. This will be useful for the next patch which adds
a call to this function in partition_sched_domains.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix build error
/home/mingo/tip/usr/include/linux/random.h:11: included file
'linux/irqnr.h' is not exported
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
fix:
In file included from /home/mingo/tip/arch/m68k/amiga/amiints.c:39:
/home/mingo/tip/include/linux/interrupt.h:21: error: expected identifier or '('
/home/mingo/tip/arch/m68k/amiga/amiints.c: In function 'amiga_init_IRQ':
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For now include/trace/boot.h doesn't need to include necessary headers
for its functions and structures because the files that include it already
do it.
But boot.h could be needed as well for further uses on other files.
So, this patch adds the necessary headers for future purposes...
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix possible stack overrun
This is a port of a patch included in the mainline (KSYM_SYMBOL_LEN fixes).
The current func len is not large enough to contain the max symbol len, the
right size must be KSYM_SYMBOL_LEN.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit b1ee26bab1, along with
the "fixes" for it that all just caused problems:
- c4c6fa9891 "radeonfb: fix problem with
color expansion & alignment"
- f3179748a1 "radeonfb: Disable new color
expand acceleration unless explicitely enabled"
because even when disabled, it breaks for people. See
http://bugzilla.kernel.org/show_bug.cgi?id=12191
for the latest example.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: James Cloos <cloos@jhcloos.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
test in 2.6.7! Bad bad bug, good good find.
The i_mmap_writable count must be incremented for VM_SHARED (just as
i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
when dup_mmap() copies the vma for fork: it has its own more optimal
version of __vma_link_file(), and I missed this out. So the count
was later going down to 0 (dangerous) when one end unmapped, then
wrapping negative (inefficient) when the other end unmapped.
The only impact on x86 would have been that setting a mandatory lock on
a file which has at some time been opened O_RDWR and mapped MAP_SHARED
(but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN
when it should succeed, or succeed when it should fail.
But those architectures which rely on flush_dcache_page() to flush
userspace modifications back into the page before the kernel reads it,
may in some cases have skipped the flush after such a fork - though any
repetitive test will soon wrap the count negative, in which case it will
flush_dcache_page() unnecessarily.
Fix would be a two-liner, but mapping variable added, and comment moved.
Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last patch to lib/idr.c caused a bug if idr_get_new_above() was
called on an empty idr.
Usually, nodes stay on the same layer. New layers are added to the top
of the tree.
The exception is idr_get_new_above() on an empty tree: In this case, the
new root node is first added on layer 0, then moved upwards. p->layer
was not updated.
As usual: You shall never rely on the source code comments, they will
only mislead you.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Give the correct size when reserving the interrupt vector table. It should be
a page not a single byte.
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix __put_user_asm8() by jumping to the end label (3:) from the exception
handler, rather than jumping back to retry the second store instruction (label
2:).
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the preemption resume_kernel() routine by inverting the test to see
whether interrupts are off (IM7 is all enabled, not all disabled).
Furthermore, interrupts should be disabled on entry to resume_kernel() so that
they're correctly set for jumping to restore_all() and doing the need
reschedule test.
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Discard low-prioriy Tx interrupts when closing an MN10300 on-chip serial port.
The MN10300 on-chip serial port uses three interrupts to manage its serial
ports:
(1) A very high priority interrupt that drives virtual DMA for Rx.
(2) A very high priority interrupt that drives virtual DMA for Tx.
(3) A normal priority virtual interrupt that does the normal UART interrupt
stuff and is shared between Rx and Tx.
mn10300_serial_stop_tx() only disables the high priority Tx interrupt. It
doesn't also disable the normal priority one because it is shared with Rx.
However, the high priority interrupt may interrupt local_irq_disabled()
sections, and so may have queued up a low priority virtual interrupt whilst the
UART driver is asking for the Tx interrupt to be disabled.
The result of this can be an oops when we try to process the interrupt in
mn10300_serial_transmit_interrupt() as port->uart.info and port->uart.info->tty
may have gone away.
To deal with this, if either of those pointers is NULL, we make sure the
high-priority Tx interrupt is disabled and discard the interrupt. The low
priority interrupt is disabled by the mn10300_serial_pic irq_chip table.
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Include the linux/page.h header into the MN10300 kernel linker script thus
allowing us to use PAGE_SIZE macro instead of a numeric constant.
Also use the PERCPU macro instead of an explicit section definition.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCIe: ASPM: Break out of endless loop waiting for PCI config bits to switch
PCI: stop leaking 'slot_name' in pci_create_slot
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] SN: prevent IRQ retargetting in request_irq()
[IA64] Fix section mismatch ioc3uart_init()/ioc3uart_submodule
[IA64] Clear up section mismatch for ioc4_ide_attach_one.
[IA64] Clear up section mismatch with arch_unregister_cpu()
[IA64] Clear up section mismatch for sn_check_wars.
[IA64] Updated the generic_defconfig to work with the 2.6.28-rc7 kernel.
[IA64] Fix GRU compile error w/o CONFIG_HUGETLB_PAGE
[IA64] eliminate NULL test and memset after alloc_bootmem
[IA64] remove BUILD_BUG_ON from paravirt_getreg()
The pktcdvd created class devices only export some sysfs files,
but have no char dev_t registered in the driver.
At class device creation time they copy the dev_t value of the
block device to the char device, wich will register a new char
device in the driver core and userspace, with a conflicting dev_t
value.
In many cases the class devices dev_t just points to a random
USB device. This fixes the sysfs "duplicate entry" errors.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Peter Osterlund <petero2@telia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparc64:
drivers/video/mb862xx/mb862xxfb.c:929: warning: long long unsigned int format, resource_size_t arg (arg 4)
drivers/video/mb862xx/mb862xxfb.c:931: warning: long long unsigned int format, resource_size_t arg (arg 4)
We don't know what type the architecture uses to implement u64, hence they
cannot be printed.
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Dmitry Baryshkov <dbaryshkov@gmail.com>
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: Matteo Fortini <m.fortini@selcomgroup.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my 966c8c12dc sprint_symbol(): use
less stack exposing a bug in slub's list_locations() -
kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
beyond the end of page provided.
The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.
Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.
[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On umount two event will be dispatched to watcher:
1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
2: remove_watch(watch, dev)
->inotify_dev_queue_event(.., IN_IGNORED, ..)
But if watcher has IN_ONESHOT bit set then the watcher will be released
inside first event. Which result in accessing invalid object later. IMHO
it is not pure regression. This bug wasn't triggered while initial
inotify interface testing phase because of another bug in IN_ONESHOT
handling logic :)
commit ac74c00e49
Author: Ulisses Furquim <ulissesf@gmail.com>
Date: Fri Feb 8 04:18:16 2008 -0800
inotify: fix check for one-shot watches before destroying them
As the IN_ONESHOT bit is never set when an event is sent we must check it
in the watch's mask and not in the event's mask.
TESTCASE:
mkdir mnt
mount -ttmpfs none mnt
mkdir mnt/d
./inotify mnt/d&
umount mnt ## << lockup or crash here
TESTSOURCE:
/* gcc -oinotify inotify.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
int main(int argc, char **argv)
{
char buf[1024];
struct inotify_event *ie;
char *p;
int i;
ssize_t l;
p = argv[1];
i = inotify_init();
inotify_add_watch(i, p, ~0);
l = read(i, buf, sizeof(buf));
printf("read %d bytes\n", l);
ie = (struct inotify_event *) buf;
printf("event mask: %d\n", ie->mask);
return 0;
}
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Ulisses Furquim <ulissesf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
atomic_long_xchg() is not correctly defined for 32bit arches.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 2f007e74bb, do_pages_stat()
gets the page address from user-space and puts the corresponding status
back while holding the mmap_sem for read. There is no need to hold
mmap_sem there while some page-faults may occur.
This patch adds a temporary address and status buffer so as to only
hold mmap_sem while working on these kernel buffers. This is
implemented by extracting do_pages_stat_array() out of do_pages_stat().
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The large pages fix from bcf8039ed4 broke 32-bit pagemap by pulling the
pagemap entry code out into a function with the wrong return type.
Pagemap entries are 64 bits on all systems and unsigned long is only 32
bits on 32-bit systems.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Reported-by: Doug Graham <dgraham@nortel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a total bootup freeze on ia64.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix build error when RTC_INTF_DEV=n:
drivers/rtc/rtc-twl4030.c:402: error: 'twl4030_rtc_ioctl' undeclared here (not in a function)
make[3]: *** [drivers/rtc/rtc-twl4030.o] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Samuel Ortiz <sameo@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a call to cancel_work_sync() in fbcon_exit() to cancel any pending
work in the fbcon workqueue.
The current implementation of fbcon_exit() sets the fbcon workqueue
function info->queue.func to NULL, but does not assure that there is no
work pending when it does so. On occasion, depending on system timing,
there will still be pending work in the queue when fbcon_exit() is
called. This results in a null pointer deference when run_workqueue()
tries to call the queue's work function.
Fixes errors on shutdown similar to these:
Console: switching to colour dummy device 80x25
Unable to handle kernel paging request for data at address 0x00000000
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, lru_add_drain_all() has two version.
(1) use schedule_on_each_cpu()
(2) don't use schedule_on_each_cpu()
Gerald Schaefer reported it doesn't work well on SMP (not NUMA) S390
machine.
offline_pages() calls lru_add_drain_all() followed by drain_all_pages().
While drain_all_pages() works on each cpu, lru_add_drain_all() only runs
on the current cpu for architectures w/o CONFIG_NUMA. This let us run
into the BUG_ON(!PageBuddy(page)) in __offline_isolated_pages() during
memory hotplug stress test on s390. The page in question was still on the
pcp list, because of a race with lru_add_drain_all() and drain_all_pages()
on different cpus.
Actually, Almost machine has CONFIG_UNEVICTABLE_LRU=y. Then almost machine use
(1) version lru_add_drain_all although the machine is UP.
Then this ifdef is not valueable.
simple removing is better.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert
commit e8ced39d5e
Author: Mingming Cao <cmm@us.ibm.com>
Date: Fri Jul 11 19:27:31 2008 -0400
percpu_counter: new function percpu_counter_sum_and_set
As described in
revert "percpu counter: clean up percpu_counter_sum_and_set()"
the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs. Revert that change.
This means that ext4 will be slow again. But correct.
Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert
commit 1f7c14c62c
Author: Mingming Cao <cmm@us.ibm.com>
Date: Thu Oct 9 12:50:59 2008 -0400
percpu counter: clean up percpu_counter_sum_and_set()
Before this patch we had the following:
percpu_counter_sum(): return the percpu_counter's value
percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.
After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.
Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong. It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.
This patch reverts 1f7c14c62c. This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.
Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.
Note that this revert patch changes percpu_counter_sum() semantics.
Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.
After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.
If there is any code in the tree which was introduced after
e8ced39d5e was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.
Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should first delete the counter from percpu_counters list
before freeing memory, or a percpu_counter_hotcpu_callback()
could dereference a NULL pointer.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>