Previous work to add asynchronous-scsi-scanning support
(d19044c32b) caused peculiar
semantic changes when no cabling was attached to the HBA
whereby unneeded and intrusive 'error-handling' would take
place due to the initial link state being unset.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Similarly to previous LOGO requests on non-24xx hardware,
perform an implicit-LOGO as to avoid the potential 2 *
R_A_TOV delay which can result during an explicit-LOGO
request.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This includes BIOS, EFI, FCODE and firmware versions.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
No restriction should be placed on the IRQ number assigned
to a given ISP. Original code incorrectly assumed a
non-zero IRQ number assignment by the system. In these
circumstances the proper freeing of the IRQ (via free_irq())
would not take place.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch removes a duplicate device id from the IPR driver. Based on
the ipr.h file, I'm not so sure this was intended to be a duplicate, and
if so, the .h file should be modified to use the proper sub-device id
instead.
This was pointed out to me by Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Set allow_restart=1 for all SAS disks so that they are spun up when needed.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Register libsas's default device reset code with the scsi.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch moves the code that handles SAS failures out of the main EH
function and into a separate function. It also detects commands that have
no sas_task (i.e. they completed, but with error data) and sends them into
scsi_error for processing. This allows us to handle SCSI errors (and
enables auto-spinup as a side effect) instead of dropping them on the
floor and falling into an infinite loop. It also requires the
implementation of a device reset function, which the SAS failure code has
been modified to employ for REQ_DEVICE_RESET.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export a couple of functions from scsi_error that are needed to handle
failed SCSI commands from the SAS EH.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
make exports GPL and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Get rid of: "warning: ignoring return value of sysfs_create_link..."
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas_rphy_delete does two things: it removes the sas_rphy from the transport
layer and frees the sas_rphy. This can be broken down into two functions,
sas_rphy_remove and sas_rphy_free; sas_rphy_remove is of interest to
sas_discover_root_expander because it calls functions that require
sas_rphy_add as a prerequisite and can fail (namely sas_discover_expander).
In that case, sas_discover_root_expander needs to be able to undo the effects
of sas_rphy_add yet leave the job of freeing the sas_rphy to the caller of
sas_discover_root_expander.
This patch also removes some unnecessary code from sas_discover_end_dev
to eliminate an unnecessary cycle of sas_notify_lldd_gone/found for SAS
devices, thus eliminating a sas_rphy_remove call (and fixing a race condition
where a SCSI target scan can come in between the gone and found call).
It also moves the sas_rphy_free calls into sas_discover_domain and
sas_ex_discover_end_dev to complement the sas_rphy_allocation via
sas_get_port_device.
This patch does not change the semantics of sas_rphy_delete.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently, sas_form_port checks the given asd_sas_phy's sas_phy to see if
there's already a port attached. If so, the SAS addresses of the port and
the phy are compared to determine if we need to detach from the port
because the addresses don't match or if we can stop; the SAS address stored
in the sas_port reflects whatever device _was_ attached to the port/phy, and
the SAS address stored in the sas_port reflects whatever device we just
discovered. As written, the code detaches from the port if the addresses
_do_ match, and prints an error if they do _not_ match. I believe this to
be incorrect, as it seems more logical to keep the port if the addresses
match (i.e. the phy was reset but the device didn't change), and detach it
they do not (i.e. the device changed).
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Take the expose_physicals flag and allow the user to select default (physicals
available via /dev/sg), exposed (physicals available via /dev/sd for
experimental reasons) and hidden (physicals blocked from all access). This
expands the functionality of the previous expose_physicals insmod parameter
which was added to support some experimental configurations.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Replace all if/else packet formations with platform function calls. This is in
recognition of the proliferation of read and write packet types, and in the
need to migrate to up-and-coming packets for new products.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Add in the NEMER/ARK physical register mapping, represented in up and coming
products currently under test at Adaptec.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Replace all if/else communication transports with a platform function call.
This is in recognition of the need to migrate to up-and-coming transports.
Currently the Linux driver does not support two available communication
transports provided by our products, these will be added in future patches, and
will expand the platform function set.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since the pci_block_user_cfg_access API was modified to track
block/unblocks, it was discovered that the ipr driver had a
path through its code (in PCI error recovery) which would unblock
when not previously blocked.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Don't fail initialization of an adapter if the PCI-X registers
cannot be found since it may be a PCI-E adapter.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since ipr handles dynamic ids, it must handle driver_data
not being set, so remove the current usage of driver_data
so it can be used for other things in future patches.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
fix typos and bump version number
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Alexis Bruemmer <alexisb@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The EH should fall into I_T recovery (and potentially stronger
remedies) if ABORT TASK fails.
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Track sas_ha_struct state so that we ignore events that come in while
we're shutting things down.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Convert the phy port locks to use irq spinlocks.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add the necessary hooks to the aic94xx driver to support the asynchronous SCSI
device scan infrastructure.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Allowing the phy reset controls to be world-triggerable does not seem like
a terribly good idea because SAS devices can be disrupted (and ATA devices
are really disrupted) by a phy reset. By default only root should be able
to do things like that.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Extend the use of the DDB lock to include all DDB accesses, because
DDB updates now occur from multiple threads. This fixes the SMP timeout
problems that we were occasionally seeing with a x260, because the
controller got confused when the DDBs got corrupted.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Ed Chim of Adaptec informs us that the DDB registers need to be zeroed at
initialization time and that some SCB initializations need to happen even if
we don't use the SCB.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The vmalloc() blob holding the sequencer firmware wasn't being released at
module unload time, which resulted in a memory leak.
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Now that task aborts and device port resets are done by the EH, we can
remove all the code that set up workqueues and such and simply call
sas_task_abort and let libsas figure things out.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas_task_abort() should simply abort the upper-level SCSI command and wait
until the error handler to send the actual ABORT TASK command. By
deferring things to the EH we simplify the concurrency coordination and
eliminate some race conditions. Note that sas_task_abort has a few hooks
to handle libsas internal commands properly too.
Also rename do_sas_task_abort to __sas_task_abort just in case we really
want to abort the task *right now* and we don't have a scsi_cmnd attached
to the command. This is a hook for libata internal commands to abort.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When a SAS LLDD needs to request a device port reset, it needs to have all
commands aborted before it can reset the port. Since commands are put on
the EH's list in the order that they were queued, the LLDD can set a "need
reset" flag in the last task to be aborted so that the EH can reset the
port after all commands are aborted.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This flag is no longer necessary because we push tasks to be aborted into
the EH as soon as we possibly can, and let the SCSI EH code take care of
the coordination for which this flag was used.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In this driver, TMF_QUERY_TASK translates to QUERY_SSP_TASK. The
sequencer, it seems, is perfectly happy sending us a SSP response, which
this function promptly "converts" into TMF_RESP_FUNC_FAILED. This leads to
the SAS EH making bad decisions based on bad data, so we should not perform
the conversion in this case.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The aic94xx module has a parameter that looks like it should set
lldd_max_execute_num in the sas_ha, but it never sets this value. Either
we should set it or remove the parameter. This allows us to enable task
collector mode for this driver, though it is still off by default.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If we use task collector mode, we can end up destroying the task collector
thread before we release the ports, which is bad if a port release causes
a disk I/O (such as cache flushing).
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Every so often, a scsi_cmnd will time out, and the libsas timeout handler
will discover that the scsi_cmnd does not have a sas_task attached to it.
This can happen in two cases: (1) the scsi_cmnd actually made it through
libsas to the HBA and is now going through scsi_done, or (2) the
scsi_cmnd has been held up (host lock, slab alloc, etc) and libsas has
not yet attached a sas_task. In both cases, it is safe to ask SCSI for
more time to process the command via EH_RESET_TIMER; we cannot blindly
return EH_HANDLED because if (2) happens, we could end up calling
scsi_done while another CPU is heading towards sas_queuecommand, which
causes slab corruption when sas_task_done updates the freed scsi_cmnd.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch lets a user arbitrarily enable or disable a phy via sysfs.
Potential applications include shutting down a phy to replace one
lane of wide port, and (more importantly) providing a method for the
libata SATL to control the phy.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On a system with many SAS targets, it appears possible that a scsi_cmnd
can time out without ever making it to the SAS LLDD or at the same time
that a completion is occurring. In both of these cases, telling the
LLDD to abort the sas_task makes no sense because the LLDD won't know
about the sas_task; what we really want to do is to increase the timer.
Note that this involves creating another sas_task bit to indicate
whether or not the task has been sent to the LLDD; I could have
implemented this by slightly redefining SAS_TASK_STATE_PENDING, but
this way seems cleaner.
This second version amends the aic94xx portion to set the
TASK_AT_INITIATOR flag for all sas_tasks that were passed to
lldd_execute_task.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas_get_port_device assigns a rphy to a domain device in anticipation
of finding a disk. When a discovery error occurs in
sas_discover_{sata,sas,expander}*, however, we need to clean up that
rphy and the port device list so that we don't GPF. In addition, we
need to check the result of the second sas_notify_lldd_dev_found.
This patch seems ok on a x260, x366 and x206m.
This patch fixes up sas_expander.c separately because jejb has some
cleanup patches of his own that are a prerequisite.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas_get_port_device assigns a rphy to a domain device in anticipation
of finding a disk. When a discovery error occurs in
sas_discover_{sata,sas,expander}*, however, we need to clean up that
rphy and the port device list so that we don't GPF. In addition, we
need to check the result of the second sas_notify_lldd_dev_found.
This patch seems ok on a x260, x366 and x206m.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Removed spin_unlock_irq()/spin_lock_irq() pairs surrounding
starget_for_each_device() calls.
As Matthew W. pointed out, starget_for_each_device() can be called under
a spinlock being held.
The change has been tested and verified on qla2xxx.ko module.
Thanks Matthew W. and Hisashi H. for help.
Signed-off-by: Andrew Vasquez <Andrew.vasquez@qlogic.com>
Signed-off-by: Seokmann Ju <Seokmann.ju@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Hi,
Minor typo ...
In my first iteration of patches (that got merged), the
BLIST_ATTACH_PQ3 actually had the value 0x800000, but that
got changed later to avoid conflicts. This piece must have
been overlooked.
You could obviously do something like %x and then add the
bitflags, but that looks overkill for something that does
not tend to change.
Please merge.
(Patch applied against latest 2.6.20rc version that I tested.)
From: Kurt Garloff <kurt@garloff.de>
Subject: [SCSI SCAN] Fix logging message for PQ3 devices
The blacklist flags BLIST_ATTACH_PQ3 has value 0x1000000,
not 0x800000.
Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
More megaraid kernel-doc fixes.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Sumant Patro <sumantp@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
kernel-doc modifications:
- change "@param var" notation to @var;
- change function/description separator from ':' to '-';
- change var/description separator from '-' to ':';
- fix a few doc. typos;
- don't use kernel-doc /** lead-in when the doc. block is not kernel-doc;
- use Linux common */ ending comment format instead of **/;
- use correct function parameter names;
- place function parameters immediately after the function short description;
- place kernel-doc immediately before its function or macro;
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Sumant Patro <sumantp@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>