Currently page_address_in_vma() compares vma->anon_vma and
page_anon_vma(page) for parameter check, but in 2.6.34 a vma can have
multiple anon_vmas with anon_vma_chain, so current check does not work.
(For anonymous page shared by multiple processes, some verified (page,vma)
pairs return -EFAULT wrongly.)
We can go to checking all anon_vmas in the "same_vma" chain, but it needs
to meet lock requirement. Instead, we can remove anon_vma check safely
because page_address_in_vma() assumes that page and vma are already
checked to belong to the identical process.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ordinarily, application using hugetlbfs will create mappings with
reserves. For shared mappings, these pages are reserved before mmap()
returns success and for private mappings, the caller process is guaranteed
and a child process that cannot get the pages gets killed with sigbus.
An application that uses MAP_NORESERVE gets no reservations and mmap()
will always succeed at the risk the page will not be available at fault
time. This might be used for example on very large sparse mappings where
the developer is confident the necessary huge pages exist to satisfy all
faults even though the whole mapping cannot be backed by huge pages.
Unfortunately, if an allocation does fail, VM_FAULT_OOM is returned to the
fault handler which proceeds to trigger the OOM-killer. This is
unhelpful.
Even without hugetlbfs mounted, a user using mmap() can trivially trigger
the OOM-killer because VM_FAULT_OOM is returned (will provide example
program if desired - it's a whopping 24 lines long). It could be
considered a DOS available to an unprivileged user.
This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
where huge pages were not available with SIGBUS instead of triggering the
OOM killer.
This change affects hugetlb_cow() as well. I feel there is a failure case
in there, but I didn't create one. It would need a fairly specific target
in terms of the faulting application and the hugepage pool size. The
hugetlb_no_page() path is much easier to hit but both might as well be
closed.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two "echo 0 > /sys/kernel/kexec_crash_size" OOPSes kernel. Also content
of this file is invalid after first shrink to zero: it shows 1 instead of
0.
This scenario is unlikely to happen often (root privs, valid crashkernel=
in cmdline, dump-capture kernel not loaded), I hit it only by chance.
This patch fixes it.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In debugfs, printing of command response reports resp[2] twice: fix it to
resp[3].
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Disable data error interrupts while we are actually recording that there
is not such errors. This will prevent, in some cases, the warning message
printed at new request queuing (in atmci_start_request()).
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: <linux-mmc@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The removing of an SD card in certain circumstances can lead to a kernel
oops if we do not make sure that the "data" field of the host structure is
valid. This patch adds a test in atmci_dma_cleanup() function and also
calls atmci_stop_dma() before throwing away the reference to data.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: <linux-mmc@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two parameters were swapped in the calls to atmci_init_slot().
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reported-by: Anders Grahn <anders.grahn@hd-wireless.se>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: <linux-mmc@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.
Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.
Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.
Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.
Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.
This patch nearly undoes the effect of all these patches.
The reason for reverting these is it provides an unusable value in
field 28. For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address. This unpredictability of the stack_start value makes
it worthless. That includes the intended use of showing how much stack
space a thread has.
Other architectures will get different values. As an example, ia64
gets 0. The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.
I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") . If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured. Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.
I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit") . I left the KSTK_ESP() change in
place as that seemed worthwhile.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SIO chip contains 16 possible gpio lines, not 14. The schematic was
not read carefully.
Signed-off-by: Denis Turischev <denis@compulab.co.il>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device() use
a wrong address with a partial synchronization.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updates the i2400m driver to default to firmware versions v1.5 for the
Intel Wireless WiMAX Connection 5150 and 5350 devices.
Firmware available in linux-firmware.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
This patch moves the module parameters to the file where they
can be avoided to be global and allow them to be static.
The module param : idle_mode_disabled and power_save_disabled
are moved from driver.c to control.c. Also these module parameters
are declared to be static as they are not required to be global anymore.
The module param : rx_reorder_disabled is moved from driver.c file to
rx.c file. Also this parameter is declated as static as it is not
required to be global anymore.
Signed-off-by: Prasanna S Panchamukhi<prasannax.s.panchamukhi@intel.com>
wimax_msg_alloc() returns an ERR_PTR and not null. I changed it to test
for ERR_PTR instead of null. I also added a check in front of the
kfree() because kfree() can handle null but not ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com>
stch_skb is allocated with wimax_gnl_re_state_change_alloc(). That
function returns ERR_PTRs on failure and doesn't return NULL.
Signed-off-by: Dan Carpenter <error27@gmail.com>
This patch specifies the TX queue's buffer room required by the
USB bus driver while allocating header space for a new message.
Please refer the documentation in the code.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
This patch specifies the TX queue's minimum buffer room required to
accommodate one smallest SDIO payload.
Please refer the documentation in the code.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
Increase the possibilities of including at least one payload by reserving
some additional space in the TX queue while allocating TX queue's space
for new message header. Please refer the documentation in the code for details.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
According to Intel Wimax i3200, i5x50 and i6x60 device specification documents,
the host driver must not reset the device if the normalized sequence numbers
are greater than 1023 for type 2 and type 3 RX messages.
This patch removes the code that incorrectly used to reset the device.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
This patch fixes the race condition when one thread tries to destroy
the memory allocated for rx_roq, while another thread still happen
to access rx_roq.
Such a race condition occurs when i2400m-sdio kernel module gets
unloaded, destroying the memory allocated for rx_roq while rx_roq
is accessed by i2400m_rx_edata(), as explained below:
$thread1 $thread2
$ void i2400m_rx_edata() $
$Access rx_roq[] $
$roq = &i2400m->rx_roq[ro_cin] $
$ i2400m_roq_[reset/queue/update_ws] $
$ $ void i2400m_rx_release();
$ $kfree(rx->roq);
$ $rx->roq = NULL;
$Oops! rx_roq is NULL
This patch fixes the race condition using refcount approach.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
This patch increases the tx_queue_len to 20 so as to
minimize the jitter in the throughput.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
This patch fixes an infinite loop caused by i2400m_tx_fifo_push() due
to a corner case where there is no tail space in the TX FIFO.
Please refer the documentation in the code for details.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
This fixes i2400m_tx_fifo_push(); the check for having enough
space in the TX FIFO's tail was obscure and broken in certain
corner cases. The new check works in all cases and is way
clearer. Please refer the documentation in the code for details.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
The older method of computing the maximum PDU size relied
on a method that doesn't work when we prop the maximum
number of payloads up to the physical limit, and thus we kill
the whole computation and just verify that the constants are
congruent.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
According to Intel Wimax i3200, i5x50 and i6x50 specification
documents, the maximum size of each TX message can be upto 16KiB.
This patch modifies the i2400m_tx() routine to check that the
message size does not exceed the 16KiB limit.
Please refer the documentation in the code for details.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
According to Intel Wimax i3200, i5x50 and i6x50 device specification
documents, the maximum number of payloads per message can be up to 60.
Increasing the number of payloads to 60 per message helps to
accommodate smaller payloads in a single transaction. This patch
increases the maximum number of payloads from 12 to 60 per message.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
This patch makes sure whenever tx_setup() is invoked during driver
initialization or device reset where TX FIFO is released and re-allocated,
the indices tx_in, tx_out, tx_msg_size, tx_sequence, tx_msg are properly
initialized.
When a device reset happens and the TX FIFO is released/re-allocated,
a new block of memory may be allocated for the TX FIFO, therefore tx_msg
should be cleared so that no any TX threads (tx_worker, tx) would access
to the out-of-date addresses.
Also, the TX threads use tx_in and tx_out to decide where to put the new
host-to-device messages and from where to copy them to the device HW FIFO,
these indices have to be cleared so after the TX FIFO is re-allocated during
the reset, the indices both refer to the head of the FIFO, ie. a new start.
The same rational applies to tx_msg_size and tx_sequence.
To protect the indices from being accessed by multiple threads simultaneously,
the lock tx_lock has to be obtained before the initializations and released
afterwards.
Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
When bus_setup fails in i2400m_post_reset(), it falls to the error path handler
"error_bus_setup:" which includes unlock the mutext. However, we didn't ever
try to the obtain the lock when running bus_setup.
The patch is to fix the misplaced error path handler "error_bus_setup:".
Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
This patch adds an error recovery mechanism on TX path.
The intention is to bring back the device to some known state
whenever TX sees -110 (-ETIMEOUT) on copying the data to the HW FIFO.
The TX failure could mean a device bus stuck or function stuck, so
the current error recovery implementation is to trigger a bus reset
and expect this can bring back the device.
Since the TX work is done in a thread context, there may be a queue of TX works
already that all hit the -ETIMEOUT error condition because the device has
somewhat stuck already. We don't want any consecutive bus resets simply because
multiple TX works in the queue all hit the same device erratum, the flag
"error_recovery" is introduced to denote if we are ready for taking any
error recovery. See @error_recovery doc in i2400m.h.
Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
The problem is only seen on SDIO interface since on USB, a bus reset would
really re-probe the driver, but on SDIO interface, a bus reset will not
re-enumerate the SDIO bus, so no driver re-probe is happening. Therefore,
on SDIO interface, the reset event should be still detected and handled by
dev_reset_handle().
Problem description:
Whenever a reboot barker is received during operational mode (i2400m->boot_mode == 0),
dev_reset_handle() is invoked to handle that function reset event.
dev_reset_handle() then sets the flag i2400m->boot_mode to 1 indicating the device is
back to bootmode before proceeding to dev_stop() and dev_start().
If dev_start() returns failure, a bus reset is triggered by dev_reset_handle().
The flag i2400m->boot_mode then remains 1 when the second reboot barker arrives.
However the interrupt service routine i2400ms_rx() instead of invoking dev_reset_handle()
to handle that reset event, it filters out that boot event to bootmode because it sees
the flag i2400m->boot_mode equal to 1.
The fix:
Maintain the flag i2400m->boot_mode within dev_reset_handle() and set the flag
i2400m->boot_mode to 1 when entering dev_reset_handle(). It remains 1
until the dev_reset_handle() issues a bus reset. ie: the bus reset is
taking place just like it happens for the first time during operational mode.
To denote the actual device state and the state we expect, a flag i2400m->alive
is introduced in addition to the existing flag i2400m->updown.
It's maintained with the same way for i2400m->updown but instead of reflecting
the actual state like i2400m->updown does, i2400m->alive maintains the state
we expect. i2400m->alive is set 1 just like whenever i2400m->updown is set 1.
Yet i2400m->alive remains 1 since we expect the device to be up all the time
until the driver is removed. See the doc for @alive in i2400m.h.
An enumeration I2400M_BUS_RESET_RETRIES is added to define the maximum number of
bus resets that a device reboot can retry.
A counter i2400m->bus_reset_retries is added to track how many bus resets
have been retried in one device reboot. If I2400M_BUS_RESET_RETRIES bus resets
were retried in this boot, we give up any further retrying so the device would enter
low power state. The counter i2400m->bus_reset_retries is incremented whenever
dev_reset_handle() is issuing a bus reset and is cleared to 0 when dev_start() is
successfully done, ie: a successful reboot.
Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
This fix is to correct order of the handlers in the error path
of dev_start(). When i2400m_firmware_check fails, all the works done
before it should be released or cleared.
Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
The race condition happens when the TX queue is accessed by
the TX work while the same TX queue is being destroyed because
a bus reset is triggered either by debugfs entry or simply
by failing waking up the device from WiMAX IDLE mode.
This fix is to prevent the TX queue from being accessed by
multiple threads
Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>
This patch increases the Tx buffer size so as to accommodate 12 payloads
of 1408 (1400 MTU 16 bytes aligned). Currently Tx buffer is 32 KiB which
is insufficient to accommodate 12 payloads of 1408 size.
This patch
- increases I2400M_TX_BUF_SIZE from 32KiB to 64KiB
- Adds a BUILD_BUG_ON if the calculated buffer size based
on the given MTU exceeds the I2400M_TX_BUF_SIZE.
Below is how we calculate the size of the Tx buffer.
Payload + 4 bytes prefix for each payload (1400 MTU 16 bytes boundary aligned)
= (1408 + sizeof(struct i2400m_pl_data_hdr)) * I2400M_TX_PLD_MAX
Adding 16 byte message header = + sizeof(struct i2400m_msg_hdr)
Aligning to 256 byte boundary
Total Tx buffer = (((((1408 + sizeof(struct i2400m_pl_data_hdr))
* I2400M_TX_PLD_MAX )+ sizeof(struct i2400m_msg_hdr))
/ 256) + 1) * 256 * 2
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
This patch moves I2400M_MAX_MTU enum defined in netdev.c to i2400m.h.
Follow up changes will make use of this value in other location,
thus requiring it to be moved to a global header file i2400m.h.
Signed-off-by: Prasanna S. Panchamukhi <prasannax.s.panchamukhi@intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
i2400m_tx() routine was returning -ESHUTDOWN even when there was no Tx buffer
available. This patch fixes the i2400m_tx() to return -ESHUTDOWN only when
the device is down(i2400m->tx_buf is NULL) and also to return -ENOSPC
when there is no Tx buffer. Error seen in the kernel log.
kernel: i2400m_sdio mmc0:0001:1: can't send message 0x5606: -108
kernel: i2400m_sdio mmc0:0001:1: Failed to issue 'Enter power save'command: -108
Signed-off-by: Prasanna S.Panchamukhi <prasannax.s.panchamukhi@intel.com>
Don't send them for further processing.
Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These changes include:
* For PAPRD, the TXRF3.capdiv5G, TXRF3.rdiv5G and TXRF3.rdiv2G
are set to 0x0, the TXRF6.capdiv2G is set to 0x2 for all
three chains.
* The d2cas5G/d3cas5G/d4cas5G was updated to 4/4/4 in lowest_ob_db
Tx gain table.
* To improve DPPM, three parameters were updated (Released from Madhan):
1. RANGE_OSDAC is set to 0x1 for 2G, 0x0 for 5G
2. offsetC1 is set to 0xc
3. inv_clk320_adc is set to 0x1
* To reduce PHY error(from spur), cycpwr_thr1 and cycpwr_thr1_ext
are increased to 0x8 at 2G.
* The 2G Rx gain tables are updated with mixer gain setting 3,1,0.
The new checksums yield:
initvals -f ar9003
0x00000000c2bfa7d5 ar9300_2p0_radio_postamble
0x00000000ada2b114 ar9300Modes_lowest_ob_db_tx_gain_table_2p0
0x00000000e0bc2c84 ar9300Modes_fast_clock_2p0
0x00000000056eaf74 ar9300_2p0_radio_core
0x0000000000000000 ar9300Common_rx_gain_table_merlin_2p0
0x0000000078658fb5 ar9300_2p0_mac_postamble
0x0000000023235333 ar9300_2p0_soc_postamble
0x0000000054d41904 ar9200_merlin_2p0_radio_core
0x00000000748572cf ar9300_2p0_baseband_postamble
0x000000009aa5a0a4 ar9300_2p0_baseband_core
0x000000003df9a326 ar9300Modes_high_power_tx_gain_table_2p0
0x000000001cfba124 ar9300Modes_high_ob_db_tx_gain_table_2p0
0x0000000011302700 ar9300Common_rx_gain_table_2p0
0x00000000e3eab114 ar9300Modes_low_ob_db_tx_gain_table_2p0
0x00000000c9d66d40 ar9300_2p0_mac_core
0x000000001e1d0800 ar9300Common_wo_xlna_rx_gain_table_2p0
0x00000000a0c54980 ar9300_2p0_soc_preamble
0x00000000292e2544 ar9300PciePhy_pll_on_clkreq_disable_L1_2p0
0x000000002d3e2544 ar9300PciePhy_clkreq_enable_L1_2p0
0x00000000293e2544 ar9300PciePhy_clkreq_disable_L1_2p0
Cc: Don Breslin <don.breslin@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Jumbo frames are not supported, and if they are seen it is likely
a bogus frame so just silently discard them instead of warning on
them all time. Also, instead of dropping them immediately though
move the check *after* we check for all sort of frame errors. This
should enable us to discard these frames if the hardware picks
other bogus items first. Lets see if we still get those jumbo
counters increasing still with this.
Jumbo frames would happen if we tell hardware we can support
a small 802.11 chunks of DMA'd frame, hardware would split RX'd
frames into parts and we'd have to reconstruct them in software.
This is done with USB due to the bulk size but with ath5k we
already provide a good limit to hardware and this should not be
happening.
This is reported quite often and if it fills the logs then this
needs to be addressed and to avoid spurious reports.
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The goto and the break are equivelent. I removed the goto in memory of
Edsger Dijkstra who famously hated gotos and who would have been eighty
years old next Tuesday.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The "(wl == NULL)" test doesn't work here because "wl" is always
non-null. The intent of the code is to return if the interface
was not supported by the driver.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We should start the loop consistently with the "wl_lock" lock held.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Fix FDDI and TR config checks in ipv4 arp and LLC.
IPv4: unresolved multicast route cleanup
mac80211: remove association work when processing deauth request
ar9170: wait for asynchronous firmware loading
ipv4: udp: fix short packet and bad checksum logging
phy: Fix initialization in micrel driver.
sctp: Fix a race between ICMP protocol unreachable and connect()
veth: Dont kfree_skb() after dev_forward_skb()
IPv6: fix IPV6_RECVERR handling of locally-generated errors
net/gianfar: drop recycled skbs on MTU change
iwlwifi: work around passive scan issue
Fix an occasional EIO returned by a call to vfs_unlink():
[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
What happens is this:
(1) A cached NFS file is seen to have become out of date, so NFS retires the
object and immediately acquires a new object with the same key.
(2) Retirement of the old object is done asynchronously - so the lookup/create
to generate the new object may be done first.
This can be a problem as the old object and the new object must exist at
the same point in the backing filesystem (i.e. they must have the same
pathname).
(3) The lookup for the new object sees that a backing file already exists,
checks to see whether it is valid and sees that it isn't. It then deletes
that file and creates a new one on disk.
(4) The retirement phase for the old file is then performed. It tries to
delete the dentry it has, but ext4_unlink() returns -EIO because the inode
attached to that dentry no longer matches the inode number associated with
the filename in the parent directory.
The trace below shows this quite well.
[md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
[md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)
NFS has retired the old cookie and asked for a new one.
[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
[kslowd] <== fscache_object_state_machine() [->OBJECT_DYING]
[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
[kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP]
[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
[kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING]
The old object (OBJ52) is going through the terminal states to get rid of it,
whilst the new object - (OBJ53) - is coming into being.
[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
[kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
[kslowd] lookup '@68'
[kslowd] next -> ffff88002ce41bd0 positive
[kslowd] advance
[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
[kslowd] next -> ffff8800369faac8 positive
The new object has looked up the subdir in which the file would be in (getting
dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
ffff8800369faac8).
[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
[kslowd] unlink stale object
[kslowd] <== cachefiles_bury_object() = 0
It then checks the file's xattrs to see if it's valid. NFS says that the
auxiliary data indicate the file is out of date (obvious to us - that's why NFS
ditched the old version and got a new one). CacheFiles then deletes the old
file (dentry ffff8800369faac8).
[kslowd] redo lookup
[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
[kslowd] next -> ffff88002cd94288 negative
[kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}}
CacheFiles then redoes the lookup and gets a negative result in a new dentry
(ffff88002cd94288) which it then creates a file for.
[kslowd] ==> cachefiles_mark_object_active(,OBJ53)
[kslowd] <== cachefiles_mark_object_active() = 0
[kslowd] === OBTAINED_OBJECT ===
[kslowd] <== cachefiles_walk_to_object() = 0 [148247]
[kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE]
The new object is then marked active and the state machine moves to the
available state - at which point NFS can start filling the object.
[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20})
[kslowd] ==> fscache_release_object()
[kslowd] ==> cachefiles_drop_object({OBJ52,2})
[kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8})
The old object, meanwhile, goes on with being retired. If allocation occurs
first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to
become available before it can continue.
[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
[kslowd] unlink stale object
EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193)
CacheFiles: I/O Error: Unlink failed
FS-Cache: Cache cachefiles stopped due to I/O error
CacheFiles then tries to delete the file for the old object, but the dentry it
has (ffff8800369faac8) no longer points to a valid inode for that directory
entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino.
[kslowd] <== cachefiles_bury_object() = -5
[kslowd] <== cachefiles_delete_object() = -5
[kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD]
[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
[kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE]
(Note that the above trace includes extra information beyond that produced by
the upstream code).
The fix is to note when an object that is being retired has had its object
deleted preemptively by a replacement object that is being created, and to
skip the second removal attempt in such a case.
Reported-by: Greg M <gregm@servu.net.au>
Reported-by: Mark Moseley <moseleymark@gmail.com>
Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Duplicate entries ended up acpisleep_dmi_table[] by accident.
They don't hurt functionality, but they are ugly, so let's get
rid of them.
Cc: stable@kernel.org
Signed-off-by: Alex Chiang <achiang@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prepare the arrays for use with the multiregister function. The
future layer-3 xt matches can then be easily added to it without
needing more (un)register code.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>