In rare cases, open port request might timeout, erp calls
zfcp_port_put, port gets dequeued. Now, the late returning (or
dismissed) fsf-port-open calls the fsf_port_open_handler that tries to
reference the port data structure leading to a kernel oops.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add comments where there is a deliberate fall through in switch/case
statements. This makes some code checkers happy and makes it clear
that there is no missing break statement.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
enum dma_data_direction only has the 4 values DMA_BIDIRECTIONAL,
DMA_TO_DEVICE, DMA_FROM_DEVICE and DMA_NONE. No need to have the
default case. While changing this, setup sbtype in one place to make
sparse happy.
The default value of retval is already -EIO, so remove the
additional assignment for these two cases.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
zfcp did always set the queue_depth for SCSI devices to 32, not
allowing to change this. Introduce a kernel parameter zfcp.queue_depth
and the change_queue_depth callback to allow changing the queue_depth
when it is required.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Update the newly introduced message for the boxed status to conform to
match the style of s390 and zfcp messages.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The zfcp traces used the fsf_req address in place of the req_id.
Change this to save the correct req_id.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The zfcp_port might have been removed, while the FC fast_io_fail timer
is still running and could trigger the terminate_rport_io callback.
Set the pointer to the zfcp_port to NULL and check accordingly
before using it.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Before dropping the reference count with zfcp_adapter_put, increase it
with zfcp_adapter_get when issuing cfdc requests.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If this problem appears zfcp ports cannot be de-queued since it is
checked for a zero refcount. The port reference counting is wrong for
existing zfcp ports when e.g. an adapter gets on-line again. During
port scanning the reference counting for existing ports should not be
changed.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The current sbal counting can be wrong if a fsf request is
waiting for free sbals and at the same time qdio request queue
is shutdown and re-opened. Revering a previous patch fixes this
issue.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When the abort handler cannot find a pending FSF request, the request
completion could just be running. This means we cannot return SUCCESS,
since this would lead to call to scsi_done after exiting the SCSI
error handler which is not allowed.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
A remote port remains in error state even if we receive a RSCN
stating that the connection is re-established. The port recovery
is not started due to a flag which is not reset.
The solution is to clear the flag in question before we trigger a ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Error codes specific to the control file requests are evaluated by the
actcli tool, so don't report -ENXIO for those. Generic problems are
still checked for outside the command specific handler.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
On some hardware it can take some time to add a unit. If
some remove this unit during this process the remove will
fail.
Signed-off-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The remote port remains in error state even if the connection
is re-established. A wrong precondition check was performed on
the port status leading to a cancellation of the port reopen.
Remove the pre-req check because it's not required and better
handled within the ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ERP thread is performing a task before it is executing the
corresponding down on the semaphore. The response handler of the
just started exchange config should wait for the completion by
performing a down on this semaphore. Since this semaphore is still
positive from the ERP enqueue the handler won't wait and therefore
the exchange config will always fail leaving the adapter in error.
The problem can be solved by performing the down on the semaphore
before starting an ERP task. This is the logically correct order.
Only walk the ERP loop if there is a task to perform.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The nameserver port might be in state online when the adapter is
offlined. On adapter reactivation the nameserver port is not
re-opened due to the PORT_ONLINE status. This results in an
unsuccessful recovery. In forcing the nameserver port status
to offline on all adapter offline events this issue is prevented.
Waiting for the reference count to drop to zero in
zfcp_wka_port_offline is not required, so remove it.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When running the scsi_scan from the zfcp workqueue and the target
device does not respond, the zfcp workqueue can block until the
scsi_scan hits a timeout. Move the work to the scsi host workqueue,
since this one is also used for the scan from the SCSI midlayer.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fix problem that zfcp_fsf_exchange_config_data_sync and
zfcp_fsf_exchange_config_data_sync could try to call zfcp_fsf_req_free
with a NULL pointer.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Since we're setting the host port type now to FC_PORTTYPE_NPIV
for adapters running in NPIV mode we should allow this port type
for auto-port scanning as well.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Avoid referencing a fsf request after sending it in fcp_fsf_req_send,
it might have already completed and deallocated.
Signed-off-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
tape_3590_offline and tape_34xx_offline are removed and tape_generic_offline
is called directly instead.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A ccw command that reads or writes several records at once will
usually transfer more data then fits into one page and needs to
address memory areas using a list of indirect data address words
(idaw). All but the first of these areas must start on a 4KB or 2KB
block boundary (depending on the idaw format).
A check for this restriction was missing and has been added with
this patch.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dasd driver can automatically online detected dasds, which
especially important for finding the root device. Currently,
it will wait for each online operation to finish individually,
which may take long if many dasds need to be onlined. When using
the new async framework, these onlining operations can run in
parallel and presence of the root device is ensured by the fact
that prepare_namespace() waits for all async threads to finish.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The QDIO ccw devices are started by ccw_device_start so no timeout
can occur for the interrupt handler. Remove the dead code.
In case of an I/O error set the device state to error and wake up
a possibly running qdio_shutdown waiter.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in. In particular, all
delayed works which want to rearm themselves will have to do that. So it
would seem fair to offer a helper function for this operation.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 64ef895798 ("qeth: remove EDDP")
removed the qeth_core_offl.[hc] files, but ended up doing so by just
patching them to zero size, rather than removing them properly.
Actually remove the files.
Reported-by: Andrew Price <andy@andrewprice.me.uk>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Start a new device recognition if someone writes to sysfs online attribute
of a boxed ccw device. The current test will fail, since cu_type != 0
for devices which were recognized before.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Return -EAGAIN on writes to sysfs online attribute if the corresponding
ccw device is in transient state.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If a ccw device did not respond in time during internal io, we set it
into boxed state. With this patch we have the following behaviour:
* the ccw driver will get a notification if the device was online and
goes into the boxed state
* if the device was disconnected and got boxed nothing special is to be
done (it will be handled in reprobing later)
* if the device got boxed while initial sensing it will be unregistered
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce ccw_device_schedule_sch_unregister as a wrapper for queuing
ccw_device_call_sch_unregister on the slow_path_wq. This wrapper
will be used in the next patch.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Wake up even on failed device recognition, since this may be triggered
from a user trying to force a device online. With this patch a write
to the online sysfs attribute will not block for ever but return with
-EAGAIN in this case.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (81 commits)
[S390] remove duplicated #includes
[S390] cpumask: use mm_cpumask() wrapper
[S390] cpumask: Use accessors code.
[S390] cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.
[S390] cpumask: remove cpu_coregroup_map
[S390] fix clock comparator save area usage
[S390] Add hwcap flag for the etf3 enhancement facility
[S390] Ensure that ipl panic notifier is called late.
[S390] fix dfp elf hwcap/facility bit detection
[S390] smp: perform initial cpu reset before starting a cpu
[S390] smp: fix memory leak on __cpu_up
[S390] ipl: Improve checking logic and remove switch defaults.
[S390] s390dbf: Remove needless check for NULL pointer.
[S390] s390dbf: Remove redundant initilizations.
[S390] use kzfree()
[S390] BUG to BUG_ON changes
[S390] zfcpdump: Prevent zcore from beeing built as a kernel module.
[S390] Use csum_partial in checksum.h
[S390] cleanup lowcore.h
[S390] eliminate ipl_device from lowcore
...
Use kzfree() instead of memset() + kfree().
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cksm function in system.h is duplicate to csum_partial in checksum.h.
Remove cksm and use csum_partial instead.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Trivial cleanup, list_del(); list_add{,_tail}() is equivalent
to list_move{,_tail}(). Semantic patch for coccinelle can be
found at www.cccmz.de/~snakebyte/list_move_tail.spatch
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is a cleanup of all the messages this driver prints. It uses the
dev_message macros now.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The inbound and outbound handlers are nearly identical if the outbound
handler uses first_to_check as end index instead of last_move. Since both
values are identical at that point the handlers can be merged.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Errors from SIGA instructions are stored in the per queue qdio_error
and reported back when the queue handler is called. That opens a race
when multiple error conditions occur simultanously.
Report SIGA errors immediately in the return value of do_QDIO so the
upper layer can react and SIGA errors no longer interfere with other
errors.
Move the SIGA error handling in qeth from the outbound handler to
qeth_flush_buffers.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
If the qdio module is unloaded the tiqdio tasklet must be terminated
by tasklet_kill. Move the tasklet_kill after the unregistration of
the adapter interrupt so the tiqdio tasklet will not be scheduled
anymore before calling tasklet_kill.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The index value that indicated that the input queue moved was also used to
store the index of the first acknowledged buffer. For non-qebsm only the
newest buffer is acknowledged which may be different from the last move index
so two seperate values are needed to track the input queue.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The ACKnowledgement state should be set on the newest SBAL so an
adapter interrupt surpression check needs to scan fewer SBALs.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
qdio_cleanup is a wrapper function that should call qdio_shutdown and
qdio_free. qdio_free was not called if an error occured in qdio_shutdown
resulting in a missing free of allocated resources.
Call qdio_free regardless of the return value of qdio_shutdown.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queue tasklets were stopped with tasklet_disable. Although tasklet_disable
prevents the tasklet from beeing executed it is still possible that a tasklet
is scheduled on a CPU at that point. A following qdio_establish calls
tasklet_init which clears the tasklet count and the tasklet state leading to
the following Oops:
<2>kernel BUG at kernel/softirq.c:392!
<4>illegal operation: 0001 [#1] SMP
<4>Modules linked in: iptable_filter ip_tables x_tables dm_round_robin dm_multipath scsi_dh sg sd_mod crc_t10dif nfs lockd nfs
_acl sunrpc fuse loop dm_mod qeth_l3 ipv6 zfcp qeth scsi_transport_fc qdio scsi_tgt scsi_mod chsc_sch ccwgroup dasd_eckd_mod dasdm
od ext3 mbcache jbd
<4>Supported: Yes
<4>CPU: 0 Not tainted 2.6.27.13-1.1.mz13-default #1
<4>Process blast.LzS_64 (pid: 16445, task: 000000006cc02538, ksp: 000000006cb67998)
<4>Krnl PSW : 0704c00180000000 00000000001399f4 (tasklet_action+0xc8/0x1d4)
<4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
<4>Krnl GPRS: ffffffff00000030 0000000000000002 0000000000000002 fffffffffffffffe
<4> 000000000013aabe 00000000003b6a18 fffffffffffffffd 0000000000000000
<4> 00000000006705a8 000000007d0914a8 000000007d0914b0 000000007fecfd30
<4> 0000000000000000 00000000003b63e8 000000007fecfd90 000000007fecfd30
<4>Krnl Code: 00000000001399e8: b9200021 cgr %r2,%r1
<4> 00000000001399ec: a7740004 brc 7,1399f4
<4> 00000000001399f0: a7f40001 brc 15,1399f2
<4> >00000000001399f4: c0100027e8ee larl %r1,636bd0
<4> 00000000001399fa: bf1f1008 icm %r1,15,8(%r1)
<4> 00000000001399fe: a7840019 brc 8,139a30
<4> 0000000000139a02: c0300027e8ef larl %r3,636be0
<4> 0000000000139a08: e3c030000004 lg %r12,0(%r3)
<4>Call Trace:
<4>([<0000000000139c12>] tasklet_hi_action+0x112/0x1d4)
<4> [<000000000013aabe>] __do_softirq+0xde/0x1c4
<4> [<000000000010fa2e>] do_softirq+0x96/0xb0
<4> [<000000000013a8d8>] irq_exit+0x70/0xcc
<4> [<000000000010d1d8>] do_extint+0xf0/0x110
<4> [<0000000000113b10>] ext_no_vtime+0x16/0x1a
<4> [<000003e0000a3662>] ext3_dirty_inode+0xe6/0xe8 [ext3]
<4>([<00000000001f6cf2>] __mark_inode_dirty+0x52/0x1d4)
<4> [<000003e0000a44f0>] ext3_ordered_write_end+0x138/0x190 [ext3]
<4> [<000000000018d5ec>] generic_perform_write+0x174/0x230
<4> [<0000000000190144>] generic_file_buffered_write+0xb4/0x194
<4> [<0000000000190864>] __generic_file_aio_write_nolock+0x418/0x454
<4> [<0000000000190ee2>] generic_file_aio_write+0x76/0xe4
<4> [<000003e0000a05c2>] ext3_file_write+0x3e/0xc8 [ext3]
<4> [<00000000001cc2fe>] do_sync_write+0xd6/0x120
<4> [<00000000001ccfc8>] vfs_write+0xac/0x184
<4> [<00000000001cd218>] SyS_write+0x68/0xe0
<4> [<0000000000113402>] sysc_noemu+0x10/0x16
<4> [<0000020000043188>] 0x20000043188
<4>Last Breaking-Event-Address:
<4> [<00000000001399f0>] tasklet_action+0xc4/0x1d4
<6>qdio: 0.0.c61b ZFCP on SC f67 using AI:1 QEBSM:0 PCI:1 TDD:1 SIGA: W AOP
<4> <0>Kernel panic - not syncing: Fatal exception in interrupt
Use tasklet_kill instead of tasklet_disbale. Since tasklet_schedule must not be
called after tasklet_kill use the QDIO_IRQ_STATE_STOPPED to inidicate that a
queue is going down and prevent further tasklet schedules in that case.
Remove superflous tasklet_schedule from input queue setup, at that time
the queues are not ready so the schedule results in a NOP.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the call to qdio_shutdown from qdio_activate since the upper-layer
drivers are responsible to call qdio_shutdown when qdio_activate returns
with an error.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>