1
Commit Graph

1640 Commits

Author SHA1 Message Date
Toshiyuki Okajima
ac15ee691f radix_tree: radix_tree_gang_lookup_tag_slot() may never return
Executed command: fsstress -d /mnt -n 600 -p 850

  crash> bt
  PID: 7947   TASK: ffff880160546a70  CPU: 0   COMMAND: "fsstress"
   #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9
   #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952
   #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8
   #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969
   #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b
   #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514
   #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60
      [exception RIP: __lookup_tag+100]
      RIP: ffffffff812274b4  RSP: ffff88016056b998  RFLAGS: 00000287
      RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000006
      RDX: 000000000000001d  RSI: ffff88016056bb18  RDI: ffff8800c85366e0
      RBP: ffff88016056b9c8   R8: ffff88016056b9e8   R9: 0000000000000000
      R10: 000000000000000e  R11: ffff8800c8536908  R12: 0000000000000010
      R13: 0000000000000040  R14: ffffffffffffffc0  R15: ffff8800c85366e0
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  <NMI exception stack>
   #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4
   #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605
   #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110
  #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85
  #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47
  #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014
  #13 [ffff88016056bbe0] do_writepages at ffffffff81105055
  #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb
  #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a
  #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc
  #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5
  #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085
  #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea
  #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8
  #21 [ffff88016056bf30] sys_write at ffffffff81150691
  #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2

I think this root cause is the following:

 radix_tree_range_tag_if_tagged() always tags the root tag with settag
 if the root tag is set with iftag even if there are no iftag tags
 in the specified range (Of course, there are some iftag tags
 outside the specified range).

===============================================================================
[[[Detailed description]]]

(1) Why cannot radix_tree_gang_lookup_tag_slot() return forever?

__lookup_tag():
 - Return with 0.
 - Return with the index which is not bigger than the old one as the
   input parameter.

Therefore the following "while" repeats forever because the above
conditions cause "ret" not to be updated and the cur_index cannot be
changed into the bigger one.

(So, radix_tree_gang_lookup_tag_slot() cannot return forever.)

radix_tree_gang_lookup_tag_slot():
1178         while (ret < max_items) {
1179                 unsigned int slots_found;
1180                 unsigned long next_index;       /* Index of next search */
1181
1182                 if (cur_index > max_index)
1183                         break;
1184                 slots_found = __lookup_tag(node, results + ret,
1185                                 cur_index, max_items - ret, &next_index,
tag);
1186                 ret += slots_found;
			// cannot update ret because slots_found == 0.
			// so, this while loops forever.
1187                 if (next_index == 0)
1188                         break;
1189                 cur_index = next_index;
1190         }

(2) Why does __lookup_tag() return with 0 and doesn't update the index?

Assuming the following:
  - the one of the slot in radix_tree_node is NULL.
  - the one of the tag which corresponds to the slot sets with
    PAGECACHE_TAG_TOWRITE or other.
  - In a certain height(!=0), the corresponding index is 0.

a) __lookup_tag() notices that the tag is set.

1005 static unsigned int
1006 __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index,
1007         unsigned int max_items, unsigned long *next_index, unsigned int tag)
1008 {
1009         unsigned int nr_found = 0;
1010         unsigned int shift, height;
1011
1012         height = slot->height;
1013         if (height == 0)
1014                 goto out;
1015         shift = (height-1) * RADIX_TREE_MAP_SHIFT;
1016
1017         while (height > 0) {
1018                 unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK ;
1019
1020                 for (;;) {
1021                         if (tag_get(slot, tag, i))
1022                                 break;
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* the index is not updated yet.

b) __lookup_tag() notices that the slot is NULL.

1023                         index &= ~((1UL << shift) - 1);
1024                         index += 1UL << shift;
1025                         if (index == 0)
1026                                 goto out;       /* 32-bit wraparound */
1027                         i++;
1028                         if (i == RADIX_TREE_MAP_SIZE)
1029                                 goto out;
1030                 }
1031                 height--;
1032                 if (height == 0) {      /* Bottom level: grab some items */
...
1055                 }
1056                 shift -= RADIX_TREE_MAP_SHIFT;
1057                 slot = rcu_dereference_raw(slot->slots[i]);
1058                 if (slot == NULL)
1059                         break;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

c) __lookup_tag() doesn't update the index and return with 0.

1060         }
1061 out:
1062         *next_index = index;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1063         return nr_found;
1064 }

(3) Why is the slot NULL even if the tag is set?

Because radix_tree_range_tag_if_tagged() always sets the root tag with
PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY,
even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE
in the specified range (from *first_indexp to last_index). Of course,
some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range.
(radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback())

 640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
*root,
 641                 unsigned long *first_indexp, unsigned long last_index,
 642                 unsigned long nr_to_tag,
 643                 unsigned int iftag, unsigned int settag)
 644 {
 645         unsigned int height = root->height;
 646         struct radix_tree_path path[height];
 647         struct radix_tree_path *pathp = path;
 648         struct radix_tree_node *slot;
 649         unsigned int shift;
 650         unsigned long tagged = 0;
 651         unsigned long index = *first_indexp;
 652
 653         last_index = min(last_index, radix_tree_maxindex(height));
 654         if (index > last_index)
 655                 return 0;
 656         if (!nr_to_tag)
 657                 return 0;
 658         if (!root_tag_get(root, iftag)) {
 659                 *first_indexp = last_index + 1;
 660                 return 0;
 661         }
 662         if (height == 0) {
 663                 *first_indexp = last_index + 1;
 664                 root_tag_set(root, settag);
 665                 return 1;
 666         }
...
 733         root_tag_set(root, settag);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 734         *first_indexp = index;
 735
 736         return tagged;
 737 }

As the result, there is no radix_tree_node which is set with
PAGECACHE_TAG_TOWRITE but the root tag(radix_tree_root) is set with
PAGECACHE_TAG_TOWRITE.

[figure: inside radix_tree]
(Please see the figure with typewriter font)
===========================================
          [roottag = DIRTY]
                 |             tag=0:NOTHING
         tag[0 0 0 1]              1:DIRTY
            [x x x +]              2:WRITEBACK
                   |               3:DIRTY,WRITEBACK
                   p               4:TOWRITE
             <--->                 5:DIRTY,TOWRITE ...
     specified range (index: 0 to 2)

* There is no DIRTY tag within the specified range.
 (But there is a DIRTY tag outside that range.)

            | | | | | | | | |
    after calling tag_pages_for_writeback()
            | | | | | | | | |
            v v v v v v v v v

          [roottag = DIRTY,TOWRITE]
                 |                 p is "page".
         tag[0 0 0 1]              x is NULL.
            [x x x +]              +- is a pointer to "page".
                   |
                   p

* But TOWRITE tag is set on the root tag.
============================================

After that, radix_tree_extend() via radix_tree_insert() is called
when the page is added.
This function sets the new radix_tree_node with PAGECACHE_TAG_TOWRITE
to succeed the status of the root tag.

 246 static int radix_tree_extend(struct radix_tree_root *root, unsigned long
index)
 247 {
 248         struct radix_tree_node *node;
 249         unsigned int height;
 250         int tag;
 251
 252         /* Figure out what the height should be.  */
 253         height = root->height + 1;
 254         while (index > radix_tree_maxindex(height))
 255                 height++;
 256
 257         if (root->rnode == NULL) {
 258                 root->height = height;
 259                 goto out;
 260         }
 261
 262         do {
 263                 unsigned int newheight;
 264                 if (!(node = radix_tree_node_alloc(root)))
 265                         return -ENOMEM;
 266
 267                 /* Increase the height.  */
 268                 node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);
 269
 270                 /* Propagate the aggregated tag info into the new root */
 271                 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
 272                         if (root_tag_get(root, tag))
 273                                 tag_set(node, tag, 0);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 274                 }

===========================================
          [roottag = DIRTY,TOWRITE]
                 |     :
         tag[0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p (new page)

            | | | | | | | | |
    after calling radix_tree_insert
            | | | | | | | | |
            v v v v v v v v v

          [roottag = DIRTY,TOWRITE]
                 |
         tag [5 0 0 0]    *  DIRTY and TOWRITE tags are
             [+ + x x]       succeeded to the new node.
              | |
  tag [0 0 0 1] [0 0 0 0]
      [x x x +] [+ x x x]
             |   |
             p   p
============================================

After that, the index 3 page is released by remove_from_page_cache().
Then we can make the situation that the tag is set with PAGECACHE_TAG_TOWRITE
and that the slot which corresponds to the tag is NULL.
===========================================
          [roottag = DIRTY,TOWRITE]
                 |
         tag [5 0 0 0]
             [+ + x x]
              | |
  tag [0 0 0 1] [0 0 0 0]
      [x x x +] [+ x x x]
             |   |
             p   p
         (remove)

            | | | | | | | | |
    after calling remove_page_cache
            | | | | | | | | |
            v v v v v v v v v

          [roottag = DIRTY,TOWRITE]
                 |
         tag [4 0 0 0]      * Only DIRTY tag is cleared
             [x + x x]        because no TOWRITE tag is existed
                |             in the bottom node.
                [0 0 0 0]
                [+ x x x]
                 |
                 p
============================================

To solve this problem

Change to that radix_tree_tag_if_tagged() doesn't tag the root tag
if it doesn't set any tags within the specified range.

Like this.
============================================
 640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
*root,
 641                 unsigned long *first_indexp, unsigned long last_index,
 642                 unsigned long nr_to_tag,
 643                 unsigned int iftag, unsigned int settag)
 644 {
 650         unsigned long tagged = 0;
...
 733 	     if (tagged)
^^^^^^^^^^^^^^^^^^^^^^^^
 734            root_tag_set(root, settag);
 735         *first_indexp = index;
 736
 737         return tagged;
 738 }

============================================

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-26 10:50:04 +10:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Linus Torvalds
52cfd503ad Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
  ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
  ACPI: fix resource check message
  ACPI / Battery: Update information on info notification and resume
  ACPI: Drop device flag wake_capable
  ACPI: Always check if _PRW is present before trying to evaluate it
  ACPI / PM: Check status of power resources under mutexes
  ACPI / PM: Rename acpi_power_off_device()
  ACPI / PM: Drop acpi_power_nocheck
  ACPI / PM: Drop acpi_bus_get_power()
  Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
  ACPI / Fan: Rework the handling of power resources
  ACPI / PM: Register power resource devices as soon as they are needed
  ACPI / PM: Register acpi_power_driver early
  ACPI / PM: Add function for updating device power state consistently
  ACPI / PM: Add function for device power state initialization
  ACPI / PM: Introduce __acpi_bus_get_power()
  ACPI / PM: Introduce function for refcounting device power resources
  ACPI / PM: Add functions for manipulating lists of power resources
  ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
  ACPICA: Update version to 20101209
  ...
2011-01-13 20:15:35 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Lasse Collin
1da914e064 decompressors: check input size in decompress_inflate.c
Check for end of the input buffer when skipping over the filename field in
the .gz file header.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:25 -08:00
Lasse Collin
3ebe12439b decompressors: add boot-time XZ support
This implements the API defined in <linux/decompress/generic.h> which is
used for kernel, initramfs, and initrd decompression.  This patch together
with the first patch is enough for XZ-compressed initramfs and initrd;
XZ-compressed kernel will need arch-specific changes.

The buffering requirements described in decompress_unxz.c are stricter
than with gzip, so the relevant changes should be done to the
arch-specific code when adding support for XZ-compressed kernel.
Similarly, the heap size in arch-specific pre-boot code may need to be
increased (30 KiB is enough).

The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
memzero() (memset(ptr, 0, size)), which aren't available in all
arch-specific pre-boot environments.  I'm including simple versions in
decompress_unxz.c, but a cleaner solution would naturally be nicer.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:25 -08:00
Lasse Collin
24fa0402a9 decompressors: add XZ decompressor module
In userspace, the .lzma format has become mostly a legacy file format that
got superseded by the .xz format.  Similarly, LZMA Utils was superseded by
XZ Utils.

These patches add support for XZ decompression into the kernel.  Most of
the code is as is from XZ Embedded <http://tukaani.org/xz/embedded.html>.
It was written for the Linux kernel but is usable in other projects too.

Advantages of XZ over the current LZMA code in the kernel:
  - Nice API that can be used by other kernel modules; it's
    not limited to kernel, initramfs, and initrd decompression.
  - Integrity check support (CRC32)
  - BCJ filters improve compression of executable code on
    certain architectures. These together with LZMA2 can
    produce a few percent smaller kernel or Squashfs images
    than plain LZMA without making the decompression slower.

This patch: Add the main decompression code (xz_dec), testing module
(xz_dec_test), wrapper script (xz_wrap.sh) for the xz command line tool,
and documentation.  The xz_dec module is enough to have a usable XZ
decompressor e.g.  for Squashfs.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
fb7fa589fd Decompressors: fix callback-to-callback mode in decompress_unlzo.c
Callback-to-callback decompression mode is used for initrd (not
initramfs).  The LZO wrapper is broken for this use case for two reasons:

  - The argument validation is needlessly too strict by
    requiring that "posp" is non-NULL when "fill" is non-NULL.

  - The buffer handling code didn't work at all for this
    use case.

I tested with LZO-compressed kernel, initramfs, initrd, and corrupt
(truncated) initramfs and initrd images.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
5a3f81a702 Decompressors: check input size in decompress_unlzo.c
The code assumes that the input is valid and not truncated.  Add checks to
avoid reading past the end of the input buffer.  Change the type of "skip"
from u8 to int to fix a possible integer overflow.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
8f9b54a35a Decompressors: check for write errors in decompress_unlzo.c
The return value of flush() is not checked in unlzo().  This means that
the decompressor won't stop even if the caller doesn't want more data.
This can happen e.g.  with a corrupt LZO-compressed initramfs image.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
eb0cf3e19b Decompressors: validate match distance in decompress_unlzma.c
Validate the newly decoded distance (rep0) in process_bit1().  This is to
detect corrupt LZMA data quickly.  The old code can run for long time
producing garbage until it hits the end of the input.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
528941ca05 Decompressors: check for write errors in decompress_unlzma.c
The return value of wr->flush() is not checked in write_byte().  This
means that the decompressor won't stop even if the caller doesn't want
more data.  This can happen e.g.  with corrupt LZMA-compressed initramfs.
Returning the error quickly allows the user to see the error message
quicker.

There is a similar missing check for wr.flush() near the end of unlzma().

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
278208d9d6 Decompressors: check for read errors in decompress_unlzma.c
Return value of rc->fill() is checked in rc_read() and error() is called
when needed, but then the code continues as if nothing had happened.

rc_read() is a void function and it's on the top of performance critical
call stacks, so propagating the error code via return values doesn't sound
like the best fix.  It seems better to check rc->buffer_size (which holds
the return value of rc->fill()) in the main loop.  It does nothing bad
that the code runs a little with unknown data after a failed rc->fill().

This fixes an infinite loop in initramfs decompression if the
LZMA-compressed initramfs image is corrupt.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
8218a43723 Decompressors: fix header validation in decompress_unlzma.c
Validation of header.pos calls error() but doesn't make the function
return to indicate an error to the caller.  Instead the decoding is
attempted with invalid header.pos.  This fixes it.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:24 -08:00
Lasse Collin
22e4420820 Decompressors: remove unused function from lib/decompress_unlzma.c
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:23 -08:00
Lasse Collin
2b6b5caa6d Decompressors: include <linux/slab.h> in <linux/decompress/mm.h>
Currently users of mm.h need to include <linux/slab.h> to use the macros
malloc() and free() provided by mm.h.  This fixes it.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:23 -08:00
Lasse Collin
93685ad247 Decompressors: get rid of set_error_fn() macro
set_error_fn() has become a useless complication after c1e7c3ae59
("bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure") fixed
the use of error() in malloc().  Only decompress_unlzma.c had some use for
it and that was easy to change too.

This also gets rid of the static function pointer "error", which
should have been marked as __initdata.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:23 -08:00
Lasse Collin
6b01ed64c1 Decompressors: add missing INIT (i.e. __init)
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:23 -08:00
David Rientjes
78c377d1b5 flex_array: export symbols to modules
Alex said:

  I want to use flex_array to store a sparse array of ATM cell
  re-assembly buffers for my ATM over Ethernet driver.  Using the per-vcc
  user_back structure causes problems when stacked with things like
  br2684.

Add EXPORT_SYMBOL() for all publically accessible flex array functions
and move to obj-y so that modules may use this library.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Reported-by: Alex Bennee <kernel-hacker@bennee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:11 -08:00
Anton Arapov
b921c69fb2 lib/vsprintf.c: fix vscnprintf() if @size is == 0
vscnprintf() should return 0 if @size is == 0.  Update the comment for it,
as @size is unsigned.

This change based on the code of commit
b903c0b889 ("lib: fix scnprintf() if @size
is == 0") moves the real fix into vscnprinf() from scnprintf() and makes
scnprintf() call vscnprintf(), thus avoid code duplication.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:10 -08:00
Joe Perches
ac83ed6878 include/linux/printk.h lib/hexdump.c: neatening and add CONFIG_PRINTK guard
- Move prototypes and align arguments.

- Add CONFIG_PRINTK guard for print_hex functions

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:10 -08:00
Dan Rosenberg
455cd5ab30 kptr_restrict for hiding kernel pointers from unprivileged users
Add the %pK printk format specifier and the /proc/sys/kernel/kptr_restrict
sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
 If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

[akpm@linux-foundation.org: check for IRQ context when !kptr_restrict, save an indent level, s/WARN/WARN_ONCE/]
[akpm@linux-foundation.org: coding-style fixup]
[randy.dunlap@oracle.com: fix kernel/sysctl.c warning]
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:08 -08:00
Huang Ying
81e88fdc43 ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

This patch adds POLL/IRQ/NMI notification types support.

Because the memory area used to transfer hardware error information
from BIOS to Linux can be determined only in NMI, IRQ or timer
handler, but general ioremap can not be used in atomic context, so a
special version of atomic ioremap is implemented for that.

Known issue:

- Error information can not be printed for recoverable errors notified
  via NMI, because printk is not NMI-safe. Will fix this via delay
  printing to IRQ context via irq_work or make printk NMI-safe.

v2:

- adjust printk format per comments.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 03:06:19 -05:00
Linus Torvalds
42776163e1 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
  perf session: Fix infinite loop in __perf_session__process_events
  perf evsel: Support perf_evsel__open(cpus > 1 && threads > 1)
  perf sched: Use PTHREAD_STACK_MIN to avoid pthread_attr_setstacksize() fail
  perf tools: Emit clearer message for sys_perf_event_open ENOENT return
  perf stat: better error message for unsupported events
  perf sched: Fix allocation result check
  perf, x86: P4 PMU - Fix unflagged overflows handling
  dynamic debug: Fix build issue with older gcc
  tracing: Fix TRACE_EVENT power tracepoint creation
  tracing: Fix preempt count leak
  tracepoint: Add __rcu annotation
  tracing: remove duplicate null-pointer check in skb tracepoint
  tracing/trivial: Add missing comma in TRACE_EVENT comment
  tracing: Include module.h in define_trace.h
  x86: Save rbp in pt_regs on irq entry
  x86, dumpstack: Fix unused variable warning
  x86, NMI: Clean-up default_do_nmi()
  x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU
  x86, NMI: Remove DIE_NMI_IPI
  x86, NMI: Add priorities to handlers
  ...
2011-01-11 11:02:13 -08:00
Linus Torvalds
5b2eef966c Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (390 commits)
  drm/radeon/kms: disable underscan by default
  drm/radeon/kms: only enable hdmi features if the monitor supports audio
  drm: Restore the old_fb upon modeset failure
  drm/nouveau: fix hwmon device binding
  radeon: consolidate asic-specific function decls for pre-r600
  vga_switcheroo: comparing too few characters in strncmp()
  drm/radeon/kms: add NI pci ids
  drm/radeon/kms: don't enable pcie gen2 on NI yet
  drm/radeon/kms: add radeon_asic struct for NI asics
  drm/radeon/kms/ni: load default sclk/mclk/vddc at pm init
  drm/radeon/kms: add ucode loader for NI
  drm/radeon/kms: add support for DCE5 display LUTs
  drm/radeon/kms: add ni_reg.h
  drm/radeon/kms: add bo blit support for NI
  drm/radeon/kms: always use writeback/events for fences on NI
  drm/radeon/kms: adjust default clock/vddc tracking for pm on DCE5
  drm/radeon/kms: add backend map workaround for barts
  drm/radeon/kms: fill gpu init for NI asics
  drm/radeon/kms: add disabled vbios accessor for NI asics
  drm/radeon/kms: handle NI thermal controller
  ...
2011-01-10 17:11:39 -08:00
James Morris
d2e7ad1922 Merge branch 'master' into next
Conflicts:
	security/smack/smack_lsm.c

Verified and added fix by Stephen Rothwell <sfr@canb.auug.org.au>
Ok'd by Casey Schaufler <casey@schaufler-ca.com>

Signed-off-by: James Morris <jmorris@namei.org>
2011-01-10 09:46:24 +11:00
Jason Baron
2d75af2f2a dynamic debug: Fix build issue with older gcc
On older gcc (3.3) dynamic debug fails to compile:

include/net/inet_connection_sock.h: In function `inet_csk_reset_xmit_timer':
include/net/inet_connection_sock.h:236: error: duplicate label declaration `do_printk'
include/net/inet_connection_sock.h:219: error: this is a previous declaration
include/net/inet_connection_sock.h:236: error: duplicate label declaration `out'
include/net/inet_connection_sock.h:219: error: this is a previous declaration
include/net/inet_connection_sock.h:236: error: duplicate label `do_printk'
include/net/inet_connection_sock.h:236: error: duplicate label `out'

Fix, by reverting the usage of JUMP_LABEL() in dynamic debug for now.

Cc: <stable@kernel.org>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-01-07 23:36:59 -05:00
Linus Torvalds
72eb6a7914 Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
  gameport: use this_cpu_read instead of lookup
  x86: udelay: Use this_cpu_read to avoid address calculation
  x86: Use this_cpu_inc_return for nmi counter
  x86: Replace uses of current_cpu_data with this_cpu ops
  x86: Use this_cpu_ops to optimize code
  vmstat: User per cpu atomics to avoid interrupt disable / enable
  irq_work: Use per cpu atomics instead of regular atomics
  cpuops: Use cmpxchg for xchg to avoid lock semantics
  x86: this_cpu_cmpxchg and this_cpu_xchg operations
  percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
  percpu,x86: relocate this_cpu_add_return() and friends
  connector: Use this_cpu operations
  xen: Use this_cpu_inc_return
  taskstats: Use this_cpu_ops
  random: Use this_cpu_inc_return
  fs: Use this_cpu_inc_return in buffer.c
  highmem: Use this_cpu_xx_return() operations
  vmstat: Use this_cpu_inc_return for vm statistics
  x86: Support for this_cpu_add, sub, dec, inc_return
  percpu: Generic support for this_cpu_add, sub, dec, inc_return
  ...

Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
2011-01-07 17:02:58 -08:00
Linus Torvalds
abb359450f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1436 commits)
  cassini: Use local-mac-address prom property for Cassini MAC address
  net: remove the duplicate #ifdef __KERNEL__
  net: bridge: check the length of skb after nf_bridge_maybe_copy_header()
  netconsole: clarify stopping message
  netconsole: don't announce stopping if nothing happened
  cnic: Fix the type field in SPQ messages
  netfilter: fix export secctx error handling
  netfilter: fix the race when initializing nf_ct_expect_hash_rnd
  ipv4: IP defragmentation must be ECN aware
  net: r6040: Return proper error for r6040_init_one
  dcb: use after free in dcb_flushapp()
  dcb: unlock on error in dcbnl_ieee_get()
  net: ixp4xx_eth: Return proper error for eth_init_one
  include/linux/if_ether.h: Add #define ETH_P_LINK_CTL for HPNA and wlan local tunnel
  net: add POLLPRI to sock_def_readable()
  af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.
  net_sched: pfifo_head_drop problem
  mac80211: remove stray extern
  mac80211: implement off-channel TX using hw r-o-c offload
  mac80211: implement hardware offload for remain-on-channel
  ...
2011-01-06 12:30:19 -08:00
Linus Torvalds
dda5f0a372 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  MAINTAINERS: Update timer related entries
  timers: Use this_cpu_read
  timerqueue: Make timerqueue_getnext() static inline
  hrtimer: fix timerqueue conversion flub
  hrtimers: Convert hrtimers to use timerlist infrastructure
  timers: Fixup allmodconfig build issue
  timers: Rename timerlist infrastructure to timerqueue
  timers: Introduce timerlist infrastructure.
  hrtimer: Remove stale comment on curr_timer
  timer: Warn when del_timer_sync() is called in hardirq context
  timer: Del_timer_sync() can be used in softirq context
  timer: Make try_to_del_timer_sync() the same on SMP and UP
  posix-timers: Annotate lock_timer()
  timer: Permit statically-declared work with deferrable timers
  time: Use ARRAY_SIZE macro in timecompare.c
  timer: Initialize the field slack of timer_list
  timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS
  time: Compensate for rounding on odd-frequency clocksources

Fix up trivial conflict in MAINTAINERS
2011-01-06 10:42:43 -08:00
David S. Miller
17f7f4d9fc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/fib_frontend.c
2010-12-26 22:37:05 -08:00
Don Zickus
4a7863cc2e x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR
The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provide by
kernel/watchdog.c.  This shift has caused a whole bunch of
compile problems under different config options.  I attempt to
simplify things with the patch below.

In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR.  Basically they mean the same thing,
the former on a local level and the latter on a global level.

With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more.  x86 will now use the global
implementation.

The changes below do a few things.  First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here).  Those pieces of code were
relying more on local apic functionality the nmi watchdog
functionality, so the change should make sense.

Second, I removed the x86 implementation of
touch_nmi_watchdog().  It isn't need now, instead x86 will rely
on kernel/watchdog.c's implementation.

Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86.  And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
changes removes some of the ugliness in that file.

Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR.  You can
only have one nmi_watchdog.

Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
(various broken configs)

Hopefully, after this patch I won't get any more compile broken
emails. :-)

v3:
  changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
  prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22 22:15:32 +01:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Christoph Lameter
819a72af8d percpucounter: Optimize __percpu_counter_add a bit through the use of this_cpu() options.
The this_cpu_* options can be used to optimize __percpu_counter_add a bit. Avoids
some address arithmetic and saves 12 bytes.

Before:


00000000000001d3 <__percpu_counter_add>:
 1d3:	55                   	push   %rbp
 1d4:	48 89 e5             	mov    %rsp,%rbp
 1d7:	41 55                	push   %r13
 1d9:	41 54                	push   %r12
 1db:	53                   	push   %rbx
 1dc:	48 89 fb             	mov    %rdi,%rbx
 1df:	48 83 ec 08          	sub    $0x8,%rsp
 1e3:	4c 8b 67 30          	mov    0x30(%rdi),%r12
 1e7:	65 4c 03 24 25 00 00 	add    %gs:0x0,%r12
 1ee:	00 00
 1f0:	4d 63 2c 24          	movslq (%r12),%r13
 1f4:	48 63 c2             	movslq %edx,%rax
 1f7:	49 01 f5             	add    %rsi,%r13
 1fa:	49 39 c5             	cmp    %rax,%r13
 1fd:	7d 0a                	jge    209 <__percpu_counter_add+0x36>
 1ff:	f7 da                	neg    %edx
 201:	48 63 d2             	movslq %edx,%rdx
 204:	49 39 d5             	cmp    %rdx,%r13
 207:	7f 1e                	jg     227 <__percpu_counter_add+0x54>
 209:	48 89 df             	mov    %rbx,%rdi
 20c:	e8 00 00 00 00       	callq  211 <__percpu_counter_add+0x3e>
 211:	4c 01 6b 18          	add    %r13,0x18(%rbx)
 215:	48 89 df             	mov    %rbx,%rdi
 218:	41 c7 04 24 00 00 00 	movl   $0x0,(%r12)
 21f:	00
 220:	e8 00 00 00 00       	callq  225 <__percpu_counter_add+0x52>
 225:	eb 04                	jmp    22b <__percpu_counter_add+0x58>
 227:	45 89 2c 24          	mov    %r13d,(%r12)
 22b:	5b                   	pop    %rbx
 22c:	5b                   	pop    %rbx
 22d:	41 5c                	pop    %r12
 22f:	41 5d                	pop    %r13
 231:	c9                   	leaveq
 232:	c3                   	retq


After:

00000000000001d3 <__percpu_counter_add>:
 1d3:	55                   	push   %rbp
 1d4:	48 63 ca             	movslq %edx,%rcx
 1d7:	48 89 e5             	mov    %rsp,%rbp
 1da:	41 54                	push   %r12
 1dc:	53                   	push   %rbx
 1dd:	48 89 fb             	mov    %rdi,%rbx
 1e0:	48 8b 47 30          	mov    0x30(%rdi),%rax
 1e4:	65 44 8b 20          	mov    %gs:(%rax),%r12d
 1e8:	4d 63 e4             	movslq %r12d,%r12
 1eb:	49 01 f4             	add    %rsi,%r12
 1ee:	49 39 cc             	cmp    %rcx,%r12
 1f1:	7d 0a                	jge    1fd <__percpu_counter_add+0x2a>
 1f3:	f7 da                	neg    %edx
 1f5:	48 63 d2             	movslq %edx,%rdx
 1f8:	49 39 d4             	cmp    %rdx,%r12
 1fb:	7f 21                	jg     21e <__percpu_counter_add+0x4b>
 1fd:	48 89 df             	mov    %rbx,%rdi
 200:	e8 00 00 00 00       	callq  205 <__percpu_counter_add+0x32>
 205:	4c 01 63 18          	add    %r12,0x18(%rbx)
 209:	48 8b 43 30          	mov    0x30(%rbx),%rax
 20d:	48 89 df             	mov    %rbx,%rdi
 210:	65 c7 00 00 00 00 00 	movl   $0x0,%gs:(%rax)
 217:	e8 00 00 00 00       	callq  21c <__percpu_counter_add+0x49>
 21c:	eb 04                	jmp    222 <__percpu_counter_add+0x4f>
 21e:	65 44 89 20          	mov    %r12d,%gs:(%rax)
 222:	5b                   	pop    %rbx
 223:	41 5c                	pop    %r12
 225:	c9                   	leaveq
 226:	c3                   	retq

Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-17 15:07:18 +01:00
Chris Wilson
d8c58fabd7 Merge remote branch 'airlied/drm-core-next' into drm-intel-next 2010-12-16 21:02:15 +00:00
John W. Linville
1d212aa96e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-12-13 15:20:45 -05:00
Thomas Gleixner
45f74264e1 timerqueue: Make timerqueue_getnext() static inline
No point in calling a function just to dereference a pointer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
2010-12-11 12:34:34 +01:00
John Stultz
9bb99b1470 timers: Fixup allmodconfig build issue
Adds missed EXPORT_SYMBOL lines that cause the following build
failures with allmodconfig:
ERROR: "timerqueue_add" [drivers/rtc/rtc-core.ko] undefined!
ERROR: "timerqueue_getnext" [drivers/rtc/rtc-core.ko] undefined!
ERROR: "timerqueue_del" [drivers/rtc/rtc-core.ko] undefined!

Reported-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2010-12-10 11:54:02 -08:00
John Stultz
1f5a24794a timers: Rename timerlist infrastructure to timerqueue
Thomas pointed out a namespace collision between the new timerlist
infrastructure I introduced and the existing timer_list.c

So to avoid confusion, I've renamed the timerlist infrastructure
to timerqueue.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2010-12-10 11:52:17 -08:00
Bruno Randolf
af55688435 lib: Improve EWMA efficiency by using bitshifts
Using bitshifts instead of division and multiplication should improve
performance. That requires weight and factor to be powers of two, but i think
this is something we can live with.

Thanks to Peter Zijlstra for the improved formula!

Signed-off-by: Bruno Randolf <br1@einfach.org>

--

v2:	use log2.h functions
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 15:58:43 -05:00
John Stultz
87de5ac782 timers: Introduce timerlist infrastructure.
The timerlist infrastructure is a thin layer over the rbtree
code that implements a simple list of timers sorted by an
expires value, and a getnext function that provides a pointer
to the earliest timer.

This infrastructure allows drivers and other kernel infrastructure
to easily implement timers without duplicating code.

Signed-off-by: John Stultz <john.stultz@linaro.org>
LKML Reference: <1290136329-18291-2-git-send-email-john.stultz@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Alessandro Zummo <a.zummo@towertech.it>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Richard Cochran <richardcochran@gmail.com>
2010-12-02 16:41:39 -08:00
Dave Airlie
bcb38ceb22 Revert "debug_locks: set oops_in_progress if we will log messages."
This reverts commit e0fdace10e.

On-list discussion seems to suggest that the robustness fixes for printk
make this unnecessary and DaveM has also agreed in person at Kernel Summit
and on list.

The main problem with this code is once we hit a lockdep splat we always
keep oops_in_progress set, the console layer uses oops_in_progress with KMS
to decide when it should be showing the oops and not showing X, so it causes
problems around suspend/resume time when a userspace resume can cause a console
switch away from X, only if oops_in_progress is set (which is what we want
if an oops actually is in progress, but not because we had a lockdep splat
2 days prior).

Cc: David S Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-29 15:18:28 -08:00
Mimi Zohar
dc88e46029 lib: hex2bin converts ascii hexadecimal string to binary
Similar to the kgdb_hex2mem() code, hex2bin converts a string
to binary using the hex_to_bin() library call.

Changelog:
- Replace parameter names with src/dst (based on David Howell's comment)
- Add 'const' where needed (based on David Howell's comment)
- Replace int with size_t (based on David Howell's comment)

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2010-11-29 08:55:11 +11:00
John W. Linville
51cce8a590 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-11-24 16:49:20 -05:00
Thomas Hellstrom
ecf7ace9a8 kref: Add a kref_sub function
Makes it possible to optimize batched multiple unrefs.
Initial user will be drivers/gpu/ttm which accumulates unrefs to be
processed outside of atomic code.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-11-22 13:25:13 +10:00
Bruno Randolf
c5485a7e75 lib: Add generic exponentially weighted moving average (EWMA) function
This adds generic functions for calculating Exponentially Weighted Moving
Averages (EWMA). This implementation makes use of a structure which keeps the
EWMA parameters and a scaled up internal representation to reduce rounding
errors.

The original idea for this implementation came from the rt2x00 driver
(rt2x00link.c). I would like to use it in several places in the mac80211 and
ath5k code and I hope it can be useful in many other places in the kernel code.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-18 14:21:52 -05:00
Jan Engelhardt
3654654f7a netlink: let nlmsg and nla functions take pointer-to-const args
The changed functions do not modify the NL messages and/or attributes
at all. They should use const (similar to strchr), so that callers
which have a const nlmsg/nlattr around can make use of them without
casting.

While at it, constify a data array.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 09:52:32 -08:00
Nick Piggin
27d20fddc8 radix-tree: fix RCU bug
Salman Qazi describes the following radix-tree bug:

In the following case, we get can get a deadlock:

0.  The radix tree contains two items, one has the index 0.
1.  The reader (in this case find_get_pages) takes the rcu_read_lock.
2.  The reader acquires slot(s) for item(s) including the index 0 item.
3.  The non-zero index item is deleted, and as a consequence the other item is
    moved to the root of the tree. The place where it used to be is queued for
    deletion after the readers finish.
3b. The zero item is deleted, removing it from the direct slot, it remains in
    the rcu-delayed indirect node.
4.  The reader looks at the index 0 slot, and finds that the page has 0 ref
    count
5.  The reader looks at it again, hoping that the item will either be freed or
    the ref count will increase. This never happens, as the slot it is looking
    at will never be updated. Also, this slot can never be reclaimed because
    the reader is holding rcu_read_lock and is in an infinite loop.

The fix is to re-use the same "indirect" pointer case that requires a slot
lookup retry into a general "retry the lookup" bit.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Reported-by: Salman Qazi <sqazi@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-12 07:55:32 -08:00
Uwe Kleine-König
b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Linus Torvalds
e3e1288e86 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (48 commits)
  DMAENGINE: move COH901318 to arch_initcall
  dma: imx-dma: fix signedness bug
  dma/timberdale: simplify conditional
  ste_dma40: remove channel_type
  ste_dma40: remove enum for endianess
  ste_dma40: remove TIM_FOR_LINK option
  ste_dma40: move mode_opt to separate config
  ste_dma40: move channel mode to a separate field
  ste_dma40: move priority to separate field
  ste_dma40: add variable to indicate valid dma_cfg
  async_tx: make async_tx channel switching opt-in
  move async raid6 test to lib/Kconfig.debug
  dmaengine: Add Freescale i.MX1/21/27 DMA driver
  intel_mid_dma: change the slave interface
  intel_mid_dma: fix the WARN_ONs
  intel_mid_dma: Add sg list support to DMA driver
  intel_mid_dma: Allow DMAC2 to share interrupt
  intel_mid_dma: Allow IRQ sharing
  intel_mid_dma: Add runtime PM support
  DMAENGINE: define a dummy filter function for ste_dma40
  ...
2010-10-27 19:04:36 -07:00