1
Commit Graph

797 Commits

Author SHA1 Message Date
Roland Dreier
69e9fbb460 IB/mthca: Clean up mthca array index mask
Define a constant MTHCA_ARRAY_MASK to replace repeated uses of
(PAGE_SIZE / sizeof (void *) - 1) in mthca array code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 09:44:22 -07:00
Michael S. Tsirkin
bf74c7479e IB/mthca: Fix mthca_array_clear() thinko
mthca_array_clear() does not clear the slot if the used count is
positive. This leads to crashes in mthca_qp_event() since that uses
mthca_array_get() to check that the qp is valid.

Discovered by Ali Ayoub.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 09:44:21 -07:00
Tom Tucker
e795d09250 [NET] infiniband: Cleanup ib_addr module to use the netevents
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:22 -07:00
Mike Christie
1c83469d36 [SCSI] iscsi bugfixes: fix oops when iser is flushing io
When we enter recovery and flush the running commands
we cannot freee the connection before flushing the commands.
Some commands may have a reference to the connection
that needs to be released before. iscsi_stop was forcing
the term and suspend too early and was causing a oops
in iser, so this patch removes those callbacks all together
and allows the LLD to handle that detail.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-07-28 11:48:32 -05:00
Roland Dreier
8fdf679fdb IB/mthca: Initialize max_cmds before debug code prints it
Read the max_cmds value from the response to the QUERY_FW command
before printing out the value, so that the real value goes into the
debug output.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:36:50 -07:00
Michael S. Tsirkin
8a7f752125 IB/ipoib: Fix packet loss after hardware address update
The neighbour ha field may get updated without destroying the
neighbour.  In this case, the ha field gets out of sync with the
address handle stored in ipoib_neigh->ah, with the result that
the ah field would point to an incorrect path, resulting in all
packets being lost.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Or Gerlitz
624d01f899 IB/ipoib: Fix oops with ipoib_debug_mcast set
Need to set mcast->ah before debug code dereferences it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Sean Hefty
2527e681fd IB/mad: Validate MADs for spec compliance
Validate MADs sent by userspace clients for spec compliance with
C13-18.1.1 (prevent duplicate requests and responses sent on the
same port).  Without this, RMPP transactions get aborted because
of duplicate packets.

This patch is similar to that provided by Jack Morgenstein.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Ralph Campbell
16c59419a0 IB/ipath: ipath_skip_sge() can break if num_sge > 1
ipath_skip_sge() doesn't exactly duplicate the side effects of
ipath_copy_sge() if num_sge > 1 since it doesn't decrement ss->num_sge.
This could result in the sg_list being accessed out of bounds.
Since ipath_skip_sge() is almost always called with num_sge == 1,
the original "optimization" is almost never used.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Ralph Campbell
c9f79bdc21 IB/ipath: Fix ib_ipath driver to work with SRP
I am still working on a proposal to remove the phys_to_virt() calls
in the ib_ipath driver.  In the mean time, this patch allows SRP
to work by fixing the R_Key check and conversion from IB address
to kernel virtual address.  It also returns the correct page size
for FMRs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Ralph Campbell
3d37b9e209 IB/ipath: Fix a data corruption
This patch fixes a problem where certain error packets are passed
to the InfiniBand layer for processing even though the packet
actually was received with an error.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:05 -07:00
Dotan Barak
1252c517cf IB/mthca: Fix SRQ limit event range check
Mem-free HCAs always keep one spare SRQ WQE, so the SRQ limit cannot
be set beyond srq->max - 1.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 07:20:32 -07:00
Roland Dreier
43db2bc044 IB/uverbs: Fix lockdep warnings
Lockdep warns because uverbs is trying to take uobj->mutex when it
already holds that lock.  This is because there are really multiple
types of uobjs even though all of their locks are initialized in
common code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-23 15:16:04 -07:00
Michael S. Tsirkin
ec924b4726 IB/uverbs: Fix unlocking in error paths
ib_uverbs_create_ah() and ib_uverbs_create_srq() did not release the
PD's read lock in their error paths, which lead to deadlock when
destroying the PD.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-23 15:16:03 -07:00
Michael S. Tsirkin
e322fedf0c [PATCH] IB/core: use correct gfp_mask in sa_query
Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask
supplied by the user to idr_pre_get.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: "Sean Hefty" <mshefty@ichips.intel.com>
Acked-by: "Roland Dreier" <rdreier@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:51 -07:00
Michael S. Tsirkin
adfaa888a2 [PATCH] fmr pool: remove unnecessary pointer dereference
ib_fmr_pool_map_phys gets the virtual address by pointer but never writes
there, and users (e.g.  srp) seem to assume this and ignore the value
returned.  This patch cleans up the API to get the VA by value, and updates
all users.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:51 -07:00
Ira Weiny
74f76fbac7 [PATCH] IB/cm: set private data length for reject messages
Set private data length for reject messages to the correct size.  Fix from
openib svn r8483.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Vu Pham
6583eb3dcc [PATCH] srp: fix fmr error handling
srp_unmap_data assumes req->fmr is NULL if the request is not mapped, so we
must clean it out in case of an error.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Michael S. Tsirkin
f0ee3404cc [PATCH] IB/addr: gid structure alignment fix
The device address contains unsigned character arrays, which contain raw GID
addresses.  The GIDs may not be naturally aligned, so do not cast them to
structures or unions.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Michael S. Tsirkin
04c335430f [PATCH] IB/cm: drop REQ when out of memory
If a user of the IB CM returns -ENOMEM from their connection callback, simply
drop the incoming REQ - do not attempt to send a reject.  This should allow
the sender to retry the request.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Michael S. Tsirkin
0964d91618 [PATCH] IB/mthca: comment fix
After recent changes, mthca_wq_init does not actually initialize the WQ as it
used to - it simply resets all index fields to their initial values.  So,
let's rename it to mthca_wq_reset.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Jack Morgenstein
2290d2c9f5 [PATCH] IB/mthca: fix static rate returned by mthca_ah_query
mthca_ah_query returs the static rate of the address handle in internal mthc
format.  fix it to use rate encoding from enum ib_rate, which is what users
expect.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Zach Brown
a46f9484f8 [PATCH] mthca: initialize send and receive queue locks separately
mthca: initialize send and receive queue locks separately

lockdep identifies a lock by the call site of its initialization.  By
initializing the send and receive queue locks in mthca_wq_init() we confuse
lockdep.  It warns that that the ordered acquiry of both locks in
mthca_modify_qp() is recursive acquiry of one lock:

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  modprobe/1192 is trying to acquire lock:
   (&wq->lock){....}, at: [<f892b4db>] mthca_modify_qp+0x60/0xa7b [ib_mthca]
  but task is already holding lock:
   (&wq->lock){....}, at: [<f892b4ce>] mthca_modify_qp+0x53/0xa7b [ib_mthca]

Initializing the locks separately in mthca_alloc_qp_common() stops the
warning and will let lockdep enforce proper ordering on paths that acquire
both locks.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-04 10:24:57 -07:00
James Bottomley
c4e00fac42 Merge ../scsi-misc-2.6
Conflicts:

	drivers/scsi/nsp32.c
	drivers/scsi/pcmcia/nsp_cs.c

Removal of randomness flag conflicts with SA_ -> IRQF_ global
replacement.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-07-03 09:41:12 -05:00
Thomas Gleixner
dace145374 [PATCH] irq-flags: misc drivers: Use the new IRQF_ constants
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 13:58:50 -07:00
Bryan O'Sullivan
27b678dd04 [PATCH] IB/ipath: namespace cleanup: replace ips with ipath
Remove ips namespace from infinipath drivers.  This renames ips_common.h to
ipath_common.h.  Definitions, data structures, etc.  that were not used by
kernel modules have moved to user-only headers.  All names including ips have
been renamed to ipath.  Some names have had an ipath prefix added.

Signed-off-by: Christian Bell <christian.bell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:02 -07:00
Bryan O'Sullivan
357b552ff3 [PATCH] IB/ipath: ignore receive queue size if SRQ is specified
The receive work queue size should be ignored if the QP is created to use a
shared receive queue according to the IB spec.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:02 -07:00
Bryan O'Sullivan
3e0018bc74 [PATCH] IB/ipath: remove some #if 0 code related to lockable memory
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
35783ec07c [PATCH] IB/ipath: fix a bug that results in addresses near 0 being written via DMA
We can't tell for sure if any packets are in the infinipath receive buffer
when we shut down a chip port.  Normally this is taken care of by orderly
shutdown, but when processes are terminated, or sending process has a bug, we
can continue to receive packets.  So rather than writing zero to the address
registers for the closing port, we point it at a dummy memory.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
6d8e9dd050 [PATCH] IB/ipath: read/write correct sizes through diag interface
We must increment uaddr by size we are reading or writing, since it's passed
as a char *, not a pointer to the appropriate size.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
8307c28eec [PATCH] IB/ipath: support more models of InfiniPath hardware
We do a few more explicit checks for specific models, and now also support the
old PathScale serial number style, or new QLogic style.

This is backwards compatible with previous versions of software and hardware.
That is, older software will see a plausible serial number and correct GUID
when used with a new board, while newer software will correctly handle an
older board.

Signed-off-by: Mike Albaugh <mike.albaugh@qlogic.com>
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
46bbeac922 [PATCH] IB/ipath: drop the "stats" sysfs attribute group
This attribute group made it into the original driver, but should not have.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
1eb68b990a [PATCH] IB/ipath: purge sps_lid and sps_mlid arrays
The two arrays only had space for 4 units.

Also changed from ipath_set_sps_lid() to ipath_set_lid(); the sps was
leftover.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
12eef41f8b [PATCH] IB/ipath: rC receive interrupt performance changes
This patch separates QP state used for sending and receiving RC packets so the
processing in the receive interrupt handler can be done mostly without locks
being held.  ACK packets are now sent without requiring synchronization with
the send tasklet.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
fba75200ad [PATCH] IB/ipath: fixes to performance get counters for IB compliance
This patch fixes some problems uncovered during IB compliance testing to
return the right values for error counters returned by the Performance Get
Counters packet.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
85322947d7 [PATCH] IB/ipath: check for valid LID and multicast LIDs
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:01 -07:00
Bryan O'Sullivan
7cd658cd2b [PATCH] IB/ipath: removed redundant statements
The tail register read became redundant as the result of earlier receive
interrupt bug fixes.

Drop another unneeded register read.

And another line that got duplicated.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
c100f622fd [PATCH] IB/ipath: don't confuse the max message size with the MTU
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
4faba98a1a [PATCH] IB/ipath: disallow send of invalid packet sizes over UD
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
57abad25f8 [PATCH] IB/ipath: fix lost interrupts on HT-400
Do an extra check to see if in-memory tail changed while processing packets,
and if so, going back through the loop again (but only once per call to
ipath_kreceive()).  In practice, this seems to be enough to guarantee that if
we crossed the clearing of an interrupt at start of ipath_intr with a
scheduled tail register update, that we'll process the "extra" packet that
lost the interrupt because we cleared it just as it was about to arrive.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
f5f99929ac [PATCH] IB/ipath: fixed bug 9776 for real
The problem was that I was updating the head register multiple times in the
rcvhdrq processing loop, and setting the counter on each update.  Since that
meant that the tail register was ahead of head for all but the last update, we
would get extra interrupts.  The fix was to not write the counter value except
on the last update.

I also changed to update rcvhdrhead and rcvegrindexhead at most every 16
packets, if there were lots of packets in the queue (and of course, on the
last packet, regardless).

I also made some small cleanups while debugging this.

With these changes, xeon/monty typically sees two openib packets per interrupt
on sdp and ipoib, opteron/monty is about 1.25 pkts/intr.

I'm seeing about 3800 Mbit/s monty/xeon, and 5000-5100 opteron/monty with
netperf sdp.  Netpipe doesn't show as good as that, peaking at about 4400 on
opteron/monty sdp.  Plain ipoib xeon is about 2100+ netperf, opteron 2900+, at
128KB

Signed-off-by: olson@eng-12.pathscale.com
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
13aef4942c [PATCH] IB/ipath: reduce overhead on receive interrupts
Also count the number of interrupts where that works (fastrcvint).  On any
interrupt where the port0 head and tail registers are not equal, just call the
ipath_kreceive code without reading the interrupt status, thus saving the
approximately 0.25usec processor stall waiting for the read to return.  If any
other interrupt bits are set, or head==tail, take the normal path, but that
has been reordered to handle read ahead of pioavail.  Also no longer call
ipath_kreceive() from ipath_qcheck(), because that just seems to make things
worse, and isn't really buying us anything, these days.

Also no longer loop in ipath_kreceive(); better to not hold things off too
long (I saw many cases where we would loop 4-8 times, and handle thousands (up
to 3500) in a single call).

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
f37bda9246 [PATCH] IB/ipath: memory management cleanups
Made in-memory rcvhdrq tail update be in dma_alloc'ed memory, not random user
or special kernel (needed for ppc, also "just the right thing to do").

Some cleanups to make unexpected link transitions less likely to produce
complaints about packet errors, and also to not leave SMA packets stuck and
unable to go out.

A few other random debug and comment cleanups.

Always init rcvhdrq head/tail registers to 0, to avoid race conditions (should
have been that way some time ago).

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
06993ca6bc [PATCH] IB/ipath: use vmalloc to allocate struct ipath_devdata
This is not a DMA target, so no need to use dma_alloc_coherent on it.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
0ed9a4a0b6 [PATCH] IB/ipath: use more appropriate gfp flags
This helps us to survive better when memory is fragmented.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:00 -07:00
Bryan O'Sullivan
a40f55fc33 [PATCH] IB/ipath: enable freeze mode when shutting down device
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
b1d8865a20 [PATCH] IB/ipath: print better debug info when handling 32/64-bit DMA mask problems
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
b35f004dd3 [PATCH] IB/ipath: removed unused field ipath_kregvirt from struct ipath_devdata
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
fe62546a6a [PATCH] IB/ipath: enforce device resource limits
These limits are somewhat artificial in that we don't actually have any
device limits.  However, the verbs layer expects that such limits exist
and are enforced, so we make up arbitrary (but sensible) limits.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
e8a88f09f2 [PATCH] IB/ipath: report correct device identification information in /sys
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
9edbd990bb [PATCH] IB/ipath: return an error for unknown multicast GID
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
60460dfd42 [PATCH] IB/ipath: fix some memory leaks on failure paths
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
4a45b7d4ec [PATCH] IB/ipath: don't allow resources to be created with illegal values
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
6665ddee85 [PATCH] IB/ipath: remove some duplicate code
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
685f97e81b [PATCH] IB/ipath: update some comments and fix typos
Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
a2acb2ff36 [PATCH] IB/ipath: allow diags on any unit
There is no longer a /dev/ipath_diag file; instead, there's
/dev/ipath_diag0, 1, etc.

It's still not possible to have diags run on more than one unit at a time,
but that's easy to fix at some point.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Bryan O'Sullivan
6700efdfc0 [PATCH] IB/ipath: fix shared receive queues for RC
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Bryan O'Sullivan
7bbb15ea85 [PATCH] IB/ipath: fix an indenting problem
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Bryan O'Sullivan
ddd4bb2210 [PATCH] IB/ipath: share more common code between RC and UC protocols
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Bryan O'Sullivan
759d57686d [PATCH] IB/ipath: update copyrights and other strings to reflect new company name
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Bryan O'Sullivan
443a64abbc [PATCH] IB/ipath: name zero counter offsets so it's clear they aren't counters
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Linus Torvalds
93fdf10d4c Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/core: Set alternate port number when initializing QP attributes
  IB/uverbs: Set correct user handle for user SRQs
2006-06-30 15:39:49 -07:00
Linus Torvalds
22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
Sean Hefty
0d8fdfd71b IB/core: Set alternate port number when initializing QP attributes
Set alternate port number when initializing QP attributes.  This bug
is OpenFabrics bugzilla bug #160.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-30 14:10:14 -07:00
Roland Dreier
146d26b2bf IB/uverbs: Set correct user handle for user SRQs
Store away the user handle passed in from userspace when creating an
SRQ, so that the kernel can return the correct handle when an SRQ
asynchronous event occurs.  (A 0 was incorrectly stored as the user
handle as part of the changes in 9ead190b, "IB/uverbs: Don't serialize
with ib_uverbs_idr_mutex")

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-30 13:40:13 -07:00
Andrew Morton
cfa7b0d469 [PATCH] infiniband: devfs fix
Remove devfs leftovers.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:41 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Adrian Bunk
80f7228b59 typo fixes: occuring -> occurring
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 18:27:16 +02:00
Mike Christie
358ff019b8 [SCSI] iscsi: convert iser to new set/get param fns
Convert iser to libiscsi get/set param functions.
Fix bugs in it returning old error return values and
have it expose exp_statsn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-06-29 11:07:41 -04:00
Greg Kroah-Hartman
e29419fffc [PATCH] 64bit resource: fix up printks for resources in misc drivers
This is needed if we wish to change the size of the resource structures.

Based on an original patch from Vivek Goyal <vgoyal@in.ibm.com>

Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-27 09:23:59 -07:00
Eric Sesterhenn
5fd571cbc1 [PATCH] Array overrun in drivers/infiniband/core/cma.c
This was spotted by coverity #id 1300.  Since the array has only four
elements, we should just use those four.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 11:57:28 -07:00
Akinobu Mita
179e09172a [PATCH] drivers: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under drivers/.

Acked-by: Corey Minyard <minyard@mvista.com>
Cc: Ben Collins <bcollins@debian.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Alasdair Kergon <dm-devel@redhat.com>
Cc: Gerd Knorr <kraxel@bytesex.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Pavlic <fpavlic@de.ibm.com>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Linus Torvalds
61b9175808 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: iSER Kconfig and Makefile
  IB/iser: iSER handling of memory for RDMA
  IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
  IB/iser: iSER initiator iSCSI PDU and TX/RX
  IB/iser: iSCSI iSER transport provider high level code
  IB/iser: iSCSI iSER transport provider header file
  IB/uverbs: Remove unnecessary list_del()s
  IB/uverbs: Don't free wr list when it's known to be empty
2006-06-25 16:07:58 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Or Gerlitz
3f1244a2f8 IB/iser: iSER Kconfig and Makefile
Kconfig and Makefile for iSER.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:14 -07:00
Or Gerlitz
6461f64ab5 IB/iser: iSER handling of memory for RDMA
This file contains the processing carried over an SG list associated with
a SCSI command such that it can be registered with the IB verbs. The
registration produces a network virtual address (VA) and a remote access
key (RKEY or STAG in iSER spec notation) which are used by the target for
its RDMA operation.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:12 -07:00
Or Gerlitz
1cfa0a75db IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
This file contains the low level interaction with the RDMA CM
and the IB verbs, where iSER is consumer of both.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:11 -07:00
Or Gerlitz
e85b24b5e7 IB/iser: iSER initiator iSCSI PDU and TX/RX
This file contains the iSER initiator processing of iSCSI PDUs - controls,
commands and data-outs along with processing of TX and RX completions.
It interacts with the lower level iser code doing the memory registration
and and the cma and verbs calls.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:09 -07:00
Or Gerlitz
65e7ae7bfc IB/iser: iSCSI iSER transport provider high level code
This file contains the code that registeres with the iscsi transport manager
and with the SCSI Mid Layer, where much of the provided functions to iSCSI and
SCSI are implemented in libiscsi.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:07 -07:00
Or Gerlitz
49cd5382f6 IB/iser: iSCSI iSER transport provider header file
iSER (iSCSI Extensions for RDMA) transport provider driver for the iSCSI
initiator, whose other parts (under drivers/scsi) are scsi_transport_iscsi
- the transport management module, iscsi_tcp - the TCP transport provider
module and libiscsi - a kernel library (module) implementing functionality
needed by both TCP and iSER transports. iSER is both a provider of the iSCSI
transport api and a SCSI low level driver.

This file contains internal data structures and non static service functions.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:05 -07:00
Roland Dreier
9b8efc0242 IB/uverbs: Remove unnecessary list_del()s
In ib_uverbs_cleanup_ucontext(), when iterating through the lists of
objects, there's no reason to do list_del() to remove the objects,
since both the objects and the lists that contain them are about to be
freed anyway.  Since list_del() is a moderately big inline function,
getting rid of this extra work saves quite a bit of .text:

add/remove: 0/0 grow/shrink: 1/2 up/down: 3/-217 (-214)
function                                     old     new   delta
ib_uverbs_comp_handler                       225     228      +3
ib_uverbs_async_handler                      256     255      -1
ib_uverbs_close                              905     689    -216

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:47:27 -07:00
Krishna Kumar
183208284e IB/uverbs: Don't free wr list when it's known to be empty
In ib_uverbs_post_send(), move the "out:" label after the loop that
frees the list of work requests, since the only place that jumps there
is before any work requests could possibly be added to the list.

This removes a compile warning: "is_ud might be used uninitialized in
this function".

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:47:27 -07:00
Linus Torvalds
4c84a39c8a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (46 commits)
  IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
  IB/mthca: Make all device methods truly reentrant
  IB/mthca: Fix memory leak on modify_qp error paths
  IB/uverbs: Factor out common idr code
  IB/uverbs: Don't decrement usecnt on error paths
  IB/uverbs: Release lock on error path
  IB/cm: Use address handle helpers
  IB/sa: Add ib_init_ah_from_path()
  IB: Add ib_init_ah_from_wc()
  IB/ucm: Get rid of duplicate P_Key parameter
  IB/srp: Factor out common request reset code
  IB/srp: Support SRP rev. 10 targets
  [SCSI] srp.h: Add I/O Class values
  IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
  IB/mthca: Fill in max_map_per_fmr device attribute
  IB/ipath: Add client reregister event generation
  IB/mthca: Add client reregister event generation
  IB: Move struct port_info from ipath to <rdma/ib_smi.h>
  IPoIB: Handle client reregister events
  IB: Add client reregister event type
  ...
2006-06-19 19:01:59 -07:00
Herbert Xu
932ff279a4 [NET]: Add netif_tx_lock
Various drivers use xmit_lock internally to synchronise with their
transmission routines.  They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.

With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set.  This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.

While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire.  So
delaying or dropping the message is to be avoided as much as possible.

So the only alternative is to always set xmit_lock_owner.  The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.

I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly.  I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.

This is pretty much a straight text substitution except for a small
bug fix in winbond.  It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission.  This is
unsafe as an IRQ can potentially wake up the queue.  So it is safer to
use netif_tx_disable.

The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:14 -07:00
Roland Dreier
9ead190bfd IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex.  This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.

Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.

This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention.  However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.

Surprisingly, these changes even shrink the object code:

add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:44:49 -07:00
Roland Dreier
c93b6fbaa9 IB/mthca: Make all device methods truly reentrant
Documentation/infiniband/core_locking.txt says:

  All of the methods in struct ib_device exported by a low-level
  driver must be fully reentrant.  The low-level driver is required to
  perform all synchronization necessary to maintain consistency, even
  if multiple function calls using the same object are run
  simultaneously.

However, mthca's modify_qp, modify_srq and resize_cq methods are
currently not reentrant.  Add a mutex to the QP, SRQ and CQ structures
so that these calls can be properly serialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:41 -07:00
Roland Dreier
c9c5d9feef IB/mthca: Fix memory leak on modify_qp error paths
Some error paths after the mthca_alloc_mailbox() call in mthca_modify_qp()
just do a "return -EINVAL" without freeing the mailbox.  Convert these
returns to "goto out" to avoid leaking the mailbox storage.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:41 -07:00
Roland Dreier
3463175d6e IB/uverbs: Factor out common idr code
Factor out common code for adding a userspace object to an idr into a
function idr_add_uobj().  This shrinks both the source and object code:

add/remove: 1/0 grow/shrink: 0/6 up/down: 57/-220 (-163)
function                                     old     new   delta
idr_add_uobj                                   -      57     +57
ib_uverbs_create_ah                          543     512     -31
ib_uverbs_create_srq                         662     630     -32
ib_uverbs_reg_mr                             737     699     -38
ib_uverbs_create_cq                          639     600     -39
ib_uverbs_alloc_pd                           485     446     -39
ib_uverbs_create_qp                         1020     979     -41

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Roland Dreier
92b1582268 IB/uverbs: Don't decrement usecnt on error paths
In error paths when destroying an object, uverbs should not decrement
associated objects' usecnt, since ib_dereg_mr(), ib_destroy_qp(),
etc. already do that.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Ganapathi CH
77f76013e3 IB/uverbs: Release lock on error path
If ibdev->alloc_ucontext() fails then ib_uverbs_get_context() does not
unlock file->mutex before returning error.

Signed-off by: Ganapathi CH <cganapathi@novell.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Sean Hefty
ca222c6b2c IB/cm: Use address handle helpers
Use new ib_init_ah_from_wc() and ib_init_ah_from_path() helper
functions to clean up the IB CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Sean Hefty
6d969a471b IB/sa: Add ib_init_ah_from_path()
Add a call to initialize address handle attributes given a path record.
This is used by the CM, and would be useful for users of UD QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Sean Hefty
4e00d69454 IB: Add ib_init_ah_from_wc()
Add a function to initialize address handle attributes from a work
completion.  This functionality is duplicated by both verbs and the CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Sean Hefty
75af908851 IB/ucm: Get rid of duplicate P_Key parameter
The P_Key is provided into a SIDR REQ in two places, once as a
parameter, and again in the path record.  Remove the P_Key as a
parameter and always use the one given in the path record.

This change has no practical effect on ABI functionality.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Ishai Rabinovitz
526b4caa0a IB/srp: Factor out common request reset code
Misc cleanups in ib_srp:
1) I think that it is more efficient to move the req entries from req_list
   to free_list in srp_reconnect_target (rather than rebuild the free_list).
   (In any case this code is shorter).
2) This allows us to reuse code in srp_reset_device and srp_reconnect_target
   and call a new function srp_reset_req.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Ramachandra K
0c0450db31 IB/srp: Support SRP rev. 10 targets
There has been a change in the format of port identifiers between
revision 10 of the SRP specification and the current revision 16A.

Revision 10 specifies port identifier format as

  lower 8 bytes :  GUID   upper 8 bytes :  Extension

Whereas revision 16A specifies it as 

 lower 8 bytes :  Extension  upper 8 bytes :  GUID

There are older targets (e.g. SilverStorm Virtual Fibre Channel
Bridge) which conform to revision 10 of the SRP specification.

The I/O class of revision 10 is 0xFF00 and the I/O class of revision
16A is 0x0100.

For supporting older targets, this patch:

1) Adds a new optional target creation parameter "io_class". Default
   value of io_class is 0x0100 (i.e. revision 16A)
2) Uses the correct port identifier format for targets with IO class
   of 0xFF00 (i.e. conforming to revision 10)

Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Or Gerlitz
6c8c1aa25d IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
When creating a FMR pool, query the IB device and use the returned
max_map_map_per_fmr attribute as for the max number of FMR remaps. If
the device does not suport querying this attribute, use the original
IB_FMR_MAX_REMAPS (32) default.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Or Gerlitz
d4cb0784fd IB/mthca: Fill in max_map_per_fmr device attribute
Report the true max_map_per_fmr value from mthca_query_device(),
taking into account the change in FMR remapping introduced by the
Sinai performance optimization.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Roland Dreier
6eddb5cb90 IB/ipath: Add client reregister event generation
Generate a client reregister event instead of a LID change event when
client reregister bit is set.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Leonid Arsh
12bbb2b7be IB/mthca: Add client reregister event generation
Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event.  If the bit is not set, just generate a LID change
event as usual.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh
da2ab62ab5 IB: Move struct port_info from ipath to <rdma/ib_smi.h>
Move ipath's struct port_info into <rdma/ib_smi.h>, so that it can be
used by mthca to implement client reregister support.

Remove the __attribute__((packed)) because all the members of the struct
are naturally aligned anyway.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh
508e434123 IPoIB: Handle client reregister events
Handle client reregister events by treating them just like LID or
SM changes -- flush all cached paths and rejoin multicast groups.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Jack Morgenstein
37c22a7721 IPoIB: Fix kernel unaligned access on ia64
Fix misaligned access faults on ia64: never cast a misaligned
neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
instead.  The memcpy was being optimized to use full word accesses
because the compiler thought that union ib_gid is always aligned.

The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
byte separately.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:35 -07:00
Roland Dreier
31c02e2157 IPoIB: Avoid using stale last_send counter when reaping AHs
The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah()
and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is
not held and hence a stale value of ah->last_send might be used, which
would lead to freeing an AH before the driver was really done with it.
The simple way to fix this is to the optimization of early free from
ipoib_free_ah() and unconditionally queue AHs for reaping, and then
take priv->tx_lock in __ipoib_reap_ah().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Jack Morgenstein
9874e74655 IB/mad: Check GID/LID when matching requests
Check GID/LID for requester side when searching for request which
matches received response.  This is in order to guarantee uniqueness
if the same TID is used when requesting via multiple source LIDs (when
LMC is not zero).  Use ports' cached LMC to perform the check.

Further, do not perform LID check for direct-routed packets, since
the permissive LID makes a proper check impossible.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Jack Morgenstein
6fb9cdbf2c IB: Add caching of ports' LMC
Add an LMC cache to struct ib_device, and add a function
ib_get_cached_lmc() to query the cache.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Michael S. Tsirkin
856c256f88 IB/cm: remove unneeded flush_workqueue
destroy_workqueue() already does flush_workqueue().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-06-17 20:37:33 -07:00
Sean Hefty
4be10c1e6d IB/ucm: convert semaphore to mutex
Convert semaphore in ib_ucm_file to a real mutex.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Roland Dreier
6bfa24fa3e IB/srp: Get rid of "Target has req_lim 0" messages
It's perfectly valid for a connection to an SRP target to have a
request limit of 0, so get rid of the message about it, which can spam
kernel logs even with printk_ratelimit().  Keep a count of such events
in a "zero_req_lim" SCSI host attribute instead, so someone who cares
can look at the statistics.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Ishai Rabinovitz
b7ac4ab497 IB/srp: Handle DREQ events from CM
Handle IB_CM_DREQ_ERROR and IB_CM_DREQ_RECEIVED events from the CM,
instead of just printing "Unhandled CM event".  In the case of
DREQ_ERROR, just ignore the event -- a TIMEWAIT_EXIT will be generated
also.  For DREQ_RECEIVED, send a DREP in response to shut the
connection down cleanly.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham
74b0a15b5e IB/srp: Allow sg_tablesize to be adjusted
Make the sg_tablesize used by SRP adjustable at module load time via a
module parameter.  Calculate the corresponding IU length required to
support this.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham
52fb2b50c4 IB/srp: Allow cmd_per_lun to be set per target port
Allow userspace to throttle traffic on a given connection to a target
port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun value
set for that scsi_host.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Ishai Rabinovitz
0c5b395239 IB/srp: Clean up loop in srp_remove_one()
Interrupts will always be enabled in srp_remove_one(), so
spin_lock_irq() can be used instead of spin_lock_irqsave().
Also, the loop takes target->scsi_host->host_lock, so target->state
can just be set to SRP_TARGET_REMOVED witout testing the old value.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Roland Dreier
403a496fd4 IB: Make needlessly global ib_mad_cache static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Matthew Wilcox
b3589fd490 IB/srp: Change target_mutex to a spinlock
The SRP driver never sleeps while holding target_mutex, and it's just
used to protect some simple list operations, so hold times will be
short.  So just convert it to a spinlock, which is smaller and faster.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox
549c5fc2c8 IB/srp: Get rid of unneeded use of list_for_each_entry_safe()
list_for_each_entry_safe() is used in one place where the list isn't
modified.  So just change it to list_for_each_entry().

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox
1962a4a1e4 IB/srp: Use SCAN_WILD_CARD from SCSI headers
SCAN_WILD_CARD is indeed available from <scsi/scsi.h>, which is
already included.  So get rid of private hack.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier
e9cd59418f IB/mthca: Convert FW commands to use wait_for_completion_timeout()
The kernel has had wait_for_completion_timeout() for a long time now.
mthca should use it to handle FW commands timing out, instead of
implementing the same thing in a much more complicated way by using
wait_for_completion() along with a timer that does complete().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier
f5358a172f IB/srp: Use FMRs to map gather/scatter lists
Create an SRP FMR pool on HCAs that support FMRs, and use FMRs to map
gather/scatter lists that have more than one entry into a single
memory region that appears virtually contiguous to the SRP target
(which is the RDMA initiator).

This patch bails out on FMR mapping for SCSI commands where the
gather/scatter list cannot be mapped into a single FMR because there
are sub-page-sized entries in middle of the list.  An unaligned
start or end of the list is OK.

Based on a patch by Vu Pham <vuhuong@mellanox.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Michael S. Tsirkin
a26026c122 IB/mthca: Remove dead code
Kill some dead code in mthca_eq.c

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty
e51060f08a IB: IP address based RDMA connection manager
Kernel connection management agent over InfiniBand that connects based
on IP addresses.  The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.

The agent also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty
7025fcd36b IB: address translation to map IP toIB addresses (GIDs)
Add an address translation service that maps IP addresses to
InfiniBand GID addresses using IPoIB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
6e61d04f2d IB/cm: Match connection requests based on private data
Extend matching connection requests to listens in the InfiniBand CM to
include private data checks.

This allows applications to listen on the same service identifier,
with private data directing the request to the appropriate application.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
6a9af2e18a IB: common handling for marshalling parameters to/from userspace
Provide common handling for marshalling data between userspace clients
and kernel InfiniBand drivers.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:27 -07:00
Michael S. Tsirkin
4e56ea794e IB/mthca: memfree completion with error FW bug workaround
Memfree firmware is in rare cases reporting WQE index == base - 1 in
receive completion with error, instead of (rq size - 1); base is 0 in
mthca.  Here is a patch to avoid kernel crash and report a correct WR
id in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:20 -07:00
Michael S. Tsirkin
13aa6ecb47 IB/mthca: restore missing PCI registers after reset
mthca does not restore the following PCI-X/PCI Express registers after reset:
  PCI-X device: PCI-X command register
  PCI-X bridge: upstream and downstream split transaction registers
  PCI Express : PCI Express device control and link control registers

This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:00 -07:00
Eli Cohen
959eb39297 IPoIB: Fix AH leak at interface down
When ipoib_stop() is called it first calls netif_stop_queue() to stop
the kernel from passing more packets to the network driver. However,
the completion handler may call netif_wake_queue() re-enabling packet
transfer.

This might result in leaks (we see AH leaks which we think can be
attributed to this bug) as new packets get posted while the interface
is going down.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-05 09:51:36 -07:00
Michael S. Tsirkin
ab28b171ea IB/mthca: Fix posting lists of 256 receive requests to SRQ for Tavor
If we post a list of length exactly a multiple of 256, nreq in
doorbell gets set to 256 which is wrong: it should be encoded by 0.
This is because we only zero it out on the next WR, which may not be
there.  The solution is to ring the doorbell after posting a WQE, not
before posting the next one.

This is the same bug that we just fixed for QPs with non-shared RQ.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-24 13:43:37 -07:00
Bryan O'Sullivan
09b74de9ff IB/ipath: deref correct pointer when using kernel SMA
At this point, the core QP structure hasn't been initialized, so what's
in there isn't valid.  Get the same information elsewhere.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:29:35 -07:00
Bryan O'Sullivan
3977026462 IB/ipath: fix null deref during rdma ops
The problem was that node A's sending thread, which handles sending RDMA
read response data, would write the trigger word, the last packet would
be sent, node B would send a new RDMA read request, node A's interrupt
handler would initialize s_rdma_sge, then node A's sending thread would
update s_rdma_sge.  This didn't happen very often naturally but was more
frequent with 1 byte RDMA reads.  Rather than adding more locking or
increasing the QP structure size and copying sge data, I modified the
copy routine to update the pointers before writing the trigger word to
avoid the update race.

Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:29:35 -07:00
Bryan O'Sullivan
41c75a19bf IB/ipath: register as IB device owner
This fixes an oops.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:29:35 -07:00
Bryan O'Sullivan
9dcc0e58e2 IB/ipath: enable PE800 receive interrupts on user ports
Fixed so it works on the PE-800.  It had not previously been updated to
match PE-800 receive interrupt differences from HT-400.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:29:35 -07:00
Bryan O'Sullivan
f2080fa3c6 IB/ipath: enable GPIO interrupt on HT-460
This is required for even semi-decent performance on OpenIB.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:29:34 -07:00
Bryan O'Sullivan
b0ff7c2005 IB/ipath: fix NULL dereference during cleanup
Fix NULL deref due to pcidev being clobbered before dd->ipath_f_cleanup()
was called.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:27:06 -07:00
Bryan O'Sullivan
94b8d9f98d IB/ipath: replace uses of LIST_POISON
Per Andrew's request.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:27:06 -07:00
Bryan O'Sullivan
eaf6733bc1 IB/ipath: fix reporting of driver version to userspace
Fix the interface version that gets exported to userspace.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:27:06 -07:00
Bryan O'Sullivan
b228b43c49 IB/ipath: don't modify QP if changes fail
Make sure modify_qp won't modify the QP if any of the changes failed.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:27:06 -07:00
Bryan O'Sullivan
ebac3800e5 IB/ipath: fix spinlock recursion bug
The local loopback path for RC can lock the rkey table lock without
blocking interrupts.  The receive interrupt path can then call
ipath_rkey_ok() and deadlock.  Remove the redundant lock.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-23 13:27:06 -07:00
Michael S. Tsirkin
23f3bc0f2c IB/mthca: Fix posting lists of 256 receive requests for Tavor
If we post a list of length 256 exactly, nreq in doorbell gets set to
256 which is wrong: it should be encoded by 0.  This is because we
only zero it out on the next WR, which may not be there.  The solution
is to ring the doorbell after posting a WQE, not before posting the
next one.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-18 11:37:03 -07:00
Roland Dreier
0cb4fe8d26 IB/uverbs: Don't leak ref to mm on error path
In ib_umem_release_on_close(), if the kmalloc() fails, then a
reference to current->mm will be leaked.  Fix this by adding a mmput()
instead of just returning on kmalloc() failure.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 22:20:50 -07:00
Ishai Rabinovitz
093beac189 IB/srp: Complete correct SCSI commands on device reset
When flushing out queued commands after a successful device reset,
make sure that SRP completes the right commands, instead of calling
scsi_done on the command passed into the device reset handler over and
over.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:20:48 -07:00
Roland Dreier
ec2d720849 IB/srp: Get rid of extra scsi_host_put()s if reconnection fails
If a reconnection attempt fails, then SRP does two scsi_host_put()s.
This is a historical relic from an earlier version of the driver that
took a reference on the scsi_host before trying to reconnect, so get
rid of the extra scsi_host_put().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:16:03 -07:00
Roland Dreier
e65810566f IB/srp: Don't wait for disconnection if sending DREQ fails
Sending a DREQ may fail, for example because the remote target has
already broken the connection.  If so, then SRP should not wait for
the disconnection to complete, because it never will.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:13:21 -07:00
Roland Dreier
1db76c14d2 IB/mthca: Make fw_cmd_doorbell default to 0
Setting fw_cmd_doorbell allows FW command to be queued using posted
writes instead of requiring polling on a "go" bit, so it should be a
performance boost.  However, the option causes problems with at least
some device/firmware combinations, so set the default to 0 until we
understand what's going on better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 07:48:07 -07:00
Sean Hefty
1b52fa98ed IB: refcount race fixes
Fix race condition during destruction calls to avoid possibility of
accessing object after it has been freed.  Instead of waking up a wait
queue directly, which is susceptible to a race where the object is
freed between the reference count going to 0 and the wake_up(), use a
completion to wait in the function doing the freeing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-12 14:57:52 -07:00
Roland Dreier
6f4bb3d820 IB/ipath: Properly terminate PCI ID table
The ipath driver's table of PCI IDs needs a { 0, } entry at the end.
This makes all of the device aliases visible to userspace so hotplug
loads the module for all supported devices.  Without the patch,
modinfo ipath_core only shows:

    alias:          pci:v00001FC1d0000000Dsv*sd*bc*sc*i*

instead of the correct:

    alias:          pci:v00001FC1d00000010sv*sd*bc*sc*i*
    alias:          pci:v00001FC1d0000000Dsv*sd*bc*sc*i*

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
2006-05-12 14:57:52 -07:00
Michael S. Tsirkin
ce477ae4f8 IB/mthca: FMR ioremap fix
Addresses for ioremap must be calculated off of pci_resource_start;
we can't directly use the bus address as seen by the HCA.  Fix the
code that remaps device memory for FMR access.

Based on patch by Klaus Smolin.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-10 15:16:57 -07:00
Roland Dreier
5941d079f2 IPoIB: Free child interfaces properly
When deleting a child interface with a non-default P_Key via
/sys/class/net/ibX/delete_child, the interface must be freed with
free_netdev() (rather than kfree() on the private data).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 22:54:59 -07:00
Roland Dreier
a3285aa4ee IB/mthca: Fix race in reference counting
Fix races in in destroying various objects.  If a destroy routine
waits for an object to become free by doing

	wait_event(&obj->wait, !atomic_read(&obj->refcount));
	/* now clean up and destroy the object */

and another place drops a reference to the object by doing

	if (atomic_dec_and_test(&obj->refcount))
		wake_up(&obj->wait);

then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up().  And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.

Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock.  This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.

While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it.  So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:29 -07:00
Roland Dreier
d945e1df28 IB/srp: Fix tracking of pending requests during error handling
If a SCSI abort completes, or the command completes successfully, then
the driver must remove the command from its queue of pending
commands.  Similarly, if a device reset succeeds, then all commands
queued for the given device must be removed from the queue.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:28 -07:00
Ralph Campbell
d8b9f23b23 IB: Fix display of 4-bit port counters in sysfs
The code to display local_link_integrity_errors and
excessive_buffer_overrun_errors in
/sys/class/infiniband/<hca>/ports/<n>/counters/
uses the wrong shift to extract the 4 bit values.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:28 -07:00
Bryan O'Sullivan
f6f0413e10 IB/ipath: tidy up white space in a few files
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:23 -07:00
Bryan O'Sullivan
d562a5ae69 IB/ipath: fix label name in interrupt handler
Names that are the opposite of their intended meanings are not so helpful.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:22 -07:00
Bryan O'Sullivan
ae15f540cc IB/ipath: improve sparse annotation
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:21 -07:00
Bryan O'Sullivan
9b2017f1e1 IB/ipath: simplify IB timer usage
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:21 -07:00
Bryan O'Sullivan
76f0dd141b IB/ipath: simplify RC send posting
Remove some unnecessarily complicated tests.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:20 -07:00
Bryan O'Sullivan
c71c30dcba IB/ipath: prevent hardware from being accessed during reset
The reset code now turns off the PRESENT flag during a reset, so that
other code won't attempt to access a device that's in mid-reset.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:16 -07:00
Bryan O'Sullivan
fccea66364 IB/ipath: fix verbs registration
Remember when the verbs layer unregisters from the lower-level code.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:15 -07:00
Bryan O'Sullivan
52e7fad825 IB/ipath: change handling of PIO buffers
Different ipath hardware types have different numbers of buffers
available, so we decide on the counts ourselves unless we are specifically
overridden with a module parameter.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:14 -07:00
Bryan O'Sullivan
23e86a4584 IB/ipath: iterate over correct number of ports during reset
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:13 -07:00
Bryan O'Sullivan
68dd43a162 IB/ipath: set up 32-bit DMA mask if 64-bit setup fails
Some systems do not set up 64-bit maps on systems with 2GB or less of
memory installed, so we have to fall back to trying a 32-bit setup.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:12 -07:00
Bryan O'Sullivan
755e4ca4a9 IB/ipath: fix race with exposing reset file
We were accidentally exposing the "reset" sysfs file more than once
per device.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:11 -07:00
Roland Dreier
254abfd33a IB/mthca: Fix offset in query_gid method
GuidInfo records have 8 byte GUIDs in them, so an index should be
multiplied by 8 to get an offset.  mthca_query_gid() was incorrectly
multiplying by 16.

Noticed by Leonid Keller <leonid@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 10:40:23 -07:00
Adrian Bunk
415dcd95b2 IB/mthca: make a function static
This patch makes the needlessly global mthca_update_rate() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00
Roland Dreier
5494c22ba2 IB/ipath: Fix whitespace
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00
Roland Dreier
ac2ae4c977 IB/ipath: Make more names static
Make symbols that are only used in a single source file static.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00
Hal Rosenstock
64cb9c6aff IB/mad: Fix RMPP version check during agent registration
Only check that RMPP version is not specified when MAD class does not
support RMPP.  Just because a class is allowed to use RMPP doesn't
mean that rmpp_version needs to be set for the MAD agent to
register. Checking this was a recent change which was too pedantic.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:11 -07:00
Roland Dreier
f80887d0b9 IB/srp: Remove request from list when SCSI abort succeeds
If a SCSI abort succeeds, then the aborted request should to be
removed from the list of pending requests.  This fixes list corruption
after an abort occurs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:10 -07:00
Jack Morgenstein
59fef3b1e9 IB/mthca: Fix max_srq_sge returned by ib_query_device for Tavor devices
The driver allocates SRQ WQEs size with a power of 2 size both for
Tavor and for memfree. For Tavor, however, the hardware only requires
the WQE size to be a multiple of 16, not a power of 2, and the max
number of scatter-gather allowed is reported accordingly by the
firmware (and this is the value currently returned by
ib_query_device() and ibv_query_device()).

If the max number of scatter/gather entries reported by the FW is used
when creating an SRQ, the creation will fail for Tavor, since the
required WQE size will be increased to the next power of 2, which
turns out to be larger than the device permitted max WQE size (which
is not a power of 2).

This patch reduces the reported SRQ max wqe size so that it can be used
successfully in creating an SRQ on Tavor HCAs.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-12 11:42:30 -07:00
Michael S. Tsirkin
ce684df05a IB/cache: Use correct pointer to calculate size
When allocating gid_cache, use kmalloc(sizeof *gid_cache, ...) rather
than kmalloc(sizeof *pkey_cache, ...).  It doesn't really matter which
one is used, since the size ends up the same either way, but it's much
better to say what we mean.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 13:17:43 -07:00
Roland Dreier
f697f74a6b IPoIB: Use spin_lock_irq() instead of spin_lock_irqsave()
We know ipoib_flush_paths() is called from plain process context with
interrupts enabled, since it does wait_for_completion().  So there's
no need to use spin_lock_irqsave() -- spin_lock_irq() is fine.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:59 -07:00
Eli Cohen
a30bb96c6f IPoIB: Close race in ipoib_flush_paths()
ib_sa_cancel_query() must be called with priv->lock held since
a completion might arrive and set path->query to NULL.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:59 -07:00
Michael S. Tsirkin
abf45dbb5b IB/mthca: Disable tuning PCI read burst size
The PCI spec recommends against drivers playing with a device's PCI
read burst size, and says that systems software should configure it.
And we actually have users that report that changing it from the
default set by BIOS hurts performance and/or stability for them.  On
the other hand, the Mellanox Programmer's Reference Manual recommends
turning it up all the way to the maximum value.  Some tests conducted
here in the lab do not show performance improvement from this tuning,
but this might be just me.

As a work-around, make this tuning an option, off by default (safe
value), with an eye towards removing it completely one day if no one
complains.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Shirley Ma
0f4852513f IPoIB: Make send and receive queue sizes tunable
Make IPoIB's send and receive queue sizes tunable via module
parameters ("send_queue_size" and "recv_queue_size").  This allows the
queue sizes to be enlarged to fix disastrously bad performance on some
platforms and workloads, without bloating memory usage when large
queues aren't needed.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Eli Cohen
f2de3b0612 IPoIB: Wait for join to finish before freeing mcast struct
ipoib_mcast_restart_task() might free an mcast object while a join
request is still outstanding, leading to an oops when the query
completes.  Fix this by waiting for query to complete, similar to what
ipoib_stop_thread() is doing.  The wait for mcast completion code is
consolidated in wait_for_mcast_join().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Jack Morgenstein
bf6a9e31cf IB: simplify static rate encoding
Push translation of static rate to HCA format into low-level drivers,
where it belongs.  For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage.  The changes are:

 - Add enum ib_rate to midlayer includes.
 - Get rid of static rate translation in IPoIB; just use static rate
   directly from Path and MulticastGroup records.
 - Update mthca driver to translate absolute static rate into the
   format used by hardware.  This also fixes mthca's static rate
   handling for HCAs that are capable of 4X DDR.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:47 -07:00
Michael S. Tsirkin
d2e0655ede IPoIB: Consolidate private neighbour data handling
Consolidate IPoIB's private neighbour data handling into
ipoib_neigh_alloc() and ipoib_neigh_free().  This will make it easier
to keep track of the neighbour structures that IPoIB is handling, and
is a nice cleanup of the code:

add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78)
function                                     old     new   delta
ipoib_neigh_alloc                              -      61     +61
ipoib_neigh_free                               -      36     +36
ipoib_mcast_join_finish                     1288    1291      +3
path_rec_completion                          575     573      -2
ipoib_mcast_join_task                        664     660      -4
ipoib_neigh_destructor                       101      92      -9
ipoib_neigh_setup_dev                         14       3     -11
ipoib_neigh_setup                             17       -     -17
path_free                                    238     215     -23
ipoib_mcast_free                             329     306     -23
ipoib_mcast_send                             718     684     -34
neigh_add_path                               705     650     -55

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-04 14:46:48 -07:00
Roland Dreier
ce1823f032 IB/srp: Fix memory leak in options parsing
Fix memory leak if parsing destination GID fails.

Coverity bug 1042

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-03 09:31:04 -07:00
Roland Dreier
227c939b00 IB/mthca: Always build debugging code unless CONFIG_EMBEDDED=y
Change the mthca debugging trace output code so that it can enabled
and disabled at runtime with the debug_level module parameter in
sysfs.  Also, don't allow CONFIG_INFINIBAND_MTHCA_DEBUG to be disabled
unless CONFIG_EMBEDDED is selected.  We want users (and especially
distros) to have this turned on unless they really need to save space,
because by the time we want debugging output, it's usually too late to
rebuild a kernel.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:20 -07:00
Roland Dreier
f5545d24b8 IPoIB: Always build debugging code unless CONFIG_EMBEDDED=y
Don't allow CONFIG_INFINIBAND_IPOIB_DEBUG to be disabled unless
CONFIG_EMBEDDED is selected.  We want users (and especially distros)
to have this turned on unless they really need to save space, because
by the time we want debugging output, it's usually too late to rebuild
a kernel.  The debugging output can be controlled at runtime via the
debug_level module parameter in sysfs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:19 -07:00
Michael S. Tsirkin
37289efe3e IB/mad: fix oops in cancel_mads
We have seen the following OOPs in cancel_mads, when restarting opensm
multiple times:

    Call Trace:
      [<c010549b>] show_stack+0x9b/0xb0
      [<c01055ec>] show_registers+0x11c/0x190
      [<c01057cd>] die+0xed/0x160
      [<c031b966>] do_page_fault+0x3f6/0x5d0
      [<c010511f>] error_code+0x4f/0x60
      [<f8ac4e38>] cancel_mads+0x128/0x150 [ib_mad]
      [<f8ac2811>] unregister_mad_agent+0x11/0x130 [ib_mad]
      [<f8ac2a12>] ib_unregister_mad_agent+0x12/0x20 [ib_mad]
      [<f8b10f23>] ib_umad_close+0xf3/0x130 [ib_umad]
      [<c0162937>] __fput+0x187/0x1c0
      [<c01627a9>] fput+0x19/0x20
      [<c0160f7a>] filp_close+0x3a/0x60
      [<c0121ca8>] put_files_struct+0x68/0xa0
      [<c0103cf7>] do_signal+0x47/0x100
      [<c0103ded>] do_notify_resume+0x3d/0x40
      [<c0103f9e>] work_notifysig+0x13/0x25

We traced this back to local_completions unlocking mad_agent_priv->lock
while still keeping a pointer into local_list. A later call to
list_del(&local->completion_list) would then corrupt the list.

To fix this, remove the entry from local_list after looking it up but
before releasing mad_agent_priv->lock, to prevent cancel_mads from
finding and freeing it.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:19 -07:00
Bryan O'Sullivan
77d8798b55 IB/ipath: kbuild infrastructure
Integrate the ipath core and OpenIB drivers into the kernel build
infrastructure.  Add entry to MAINTAINERS.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:21 -08:00
Bryan O'Sullivan
6522108f19 IB/ipath: infiniband verbs support
The ipath_verbs.c file implements the driver-specific components of the
kernel's Infiniband verbs layer.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:21 -08:00
Bryan O'Sullivan
e28c00ad67 IB/ipath: misc infiniband code, part 2
Management datagram support, queue pairs, and reliable and unreliable
connections.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:21 -08:00
Bryan O'Sullivan
cef1cce5c8 IB/ipath: misc infiniband code, part 1
Completion queues, local and remote memory keys, and memory region
support.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan
97f9efbc47 IB/ipath: infiniband RC protocol support
This is an implementation of the Infiniband RC ("reliable connection")
protocol.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan
74ed6b5eb1 IB/ipath: infiniband UC and UD protocol support
These files implement the Infiniband UC ("unreliable connection") and UD
("unreliable datagram") protocols.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan
aa735edf5d IB/ipath: infiniband header files
These header files are used by the layered Infiniband driver.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan
889ab795a3 IB/ipath: layering interfaces used by higher-level driver code
The layering interfaces are used to implement the Infiniband protocols
and the ethernet emulation driver.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan
7f510b46e4 IB/ipath: support for userspace apps using core driver
These files introduce a char device that userspace apps use to gain
direct memory-mapped access to the InfiniPath hardware, and routines for
pinning and unpinning user memory in cases where the hardware needs to
DMA into the user address space.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:19 -08:00
Bryan O'Sullivan
3e9b4a5eb4 IB/ipath: sysfs and ipathfs support for core driver
The ipathfs filesystem contains files that are not appropriate for
sysfs, because they contain binary data.  The hierarchy is simple; the
top-level directory contains driver-wide attribute files, while numbered
subdirectories contain per-device attribute files.

Our userspace code currently expects this filesystem to be mounted on
/ipathfs, but a final location has not yet been chosen.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:19 -08:00
Bryan O'Sullivan
108ecf0d90 IB/ipath: misc driver support code
EEPROM support, interrupt handling, statistics gathering, and write
combining management for x86_64.

A note regarding i2c: The Atmel EEPROM hardware we use looks like an
i2c device electrically, but is not i2c compliant at all from a
functional perspective.  We tried using the kernel's i2c support to
talk to it, but failed.

Normal i2c devices have a single 7-bit or 10-bit i2c address that they
respond to.  Valid 7-bit addresses range from 0x03 to 0x77.  Addresses
0x00 to 0x02 and 0x78 to 0x7F are special reserved addresses
(e.g. 0x00 is the "general call" address.)  The Atmel device, on the
other hand, responds to ALL addresses.  It's designed to be the only
device on a given i2c bus.  A given i2c device address corresponds to
the memory address within the i2c device itself.

At least one reason why the linux core i2c stuff won't work for this
is that it prohibits access to reserved addresses like 0x00, which are
really valid addresses on the Atmel devices.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:19 -08:00
Bryan O'Sullivan
097709fea0 IB/ipath: chip initialisation code, and diag support
ipath_init_chip.c sets up an InfiniPath device for use.

ipath_diag.c permits userspace diagnostic tools to read and write a
chip's registers.  It is different in purpose from the mmap interfaces
to the /sys/bus/pci resource files.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:19 -08:00
Bryan O'Sullivan
dc741bbd4f IB/ipath: support for PCI Express devices
This file contains routines and definitions specific to InfiniPath
devices that have PCI Express interfaces.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:19 -08:00
Bryan O'Sullivan
cc533a5721 IB/ipath: support for HyperTransport devices
The ipath_ht400.c file contains routines and definitions specific to
HyperTransport-based InfiniPath devices.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:18 -08:00
Bryan O'Sullivan
d41d3aeb76 IB/ipath: core driver header files
ipath_common.h and ips_common.h contain definitions shared between
userspace and the kernel.

ipath_kernel.h is the core driver header file.

ipath_debug.h contains mask values used for controlling driver debugging.

ipath_registers.h contains bitmask definitions used in chip registers.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:18 -08:00
Bryan O'Sullivan
7bb206e3b2 IB/ipath: core device driver
The ipath driver is a low-level driver for PathScale InfiniPath host
channel adapters (HCAs) based on the HT-400 and PE-800 chips, including
the InfiniPath HT-460, the small form factor InfiniPath HT-460, the
InfiniPath HT-470 and the Linux Networx LS/X.

The ipath_driver.c file contains much of the low-level device handling
code.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:18 -08:00
Hal Rosenstock
618a3c03fc IB/mad: RMPP support for additional classes
Add RMPP support for additional management classes that support it.
Also, validate RMPP is consistent with management class specified.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-30 07:19:51 -08:00
Jack Morgenstein
fa9656bbd9 IB/mad: include GID/class when matching receives
Received responses are currently matched against sent requests based
on TID only.  According to the spec, responses should match based on
the combination of TID, management class, and requester LID/GID.

Without the additional qualification, an agent that is responding to
two requests, both of which have the same TID, can match RMPP ACKs
with the incorrect transaction.  This problem can occur on the SM node
when responding to SA queries.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-30 07:19:48 -08:00
Roland Dreier
e1f7868c80 IB/mthca: Fix section mismatch problems
Quite a few cleanup functions in mthca were marked as __devexit.
However, they could also be called from error paths during
initialization, so they cannot be marked that way.  Just delete all of
the incorrect annotations.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29 09:36:46 -08:00
Roland Dreier
ef12d45619 IPoIB: Fix oops with raw sockets
ipoib_hard_header() needs to handle the case that daddr is NULL.  This
can happen when packets are injected via a raw socket, and IPoIB
shouldn't oops in this case.

Reported by Anton Blanchard <anton@samba.org>

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29 09:36:46 -08:00
Jack Morgenstein
a07bacca7b IB/mthca: Fix check of size in SRQ creation
The previous patch for Tavor broke MemFree logic.

The driver should perform limit check only for Tavor.  For MemFree,
the check is incorrect, since ds (WQE stride) is always a power-of-2
(although the max_desc_size may not be).

In Tavor, however, WQE stride and desc_size are the same, and are not
necessarily power-of-2.  The check was really for the WQE stride (and
it Tavor, we use max_desc_size for the stride).

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29 09:36:46 -08:00
Roland Dreier
3f89f83449 IB/srp: Fix unmapping of fake scatterlist
The recently merged patch to create a fake scatterlist for non-SG SCSI
commands had a bug: the driver ended up doing dma_unmap_sg() on a
scatterlist scmnd->request_buffer rather than the fake scatter list it
created.  Fix this so that the driver unmaps the same thing it maps.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29 09:36:45 -08:00
Leonid Arsh
7a343d4c46 IPoIB: P_Key change event handling
This patch causes the network interface to respond to P_Key change
events correctly.  As a result, you'll see a child interface in the
"RUNNING" state (netif_carrier_on()) only when the corresponding P_Key
is configured by the SM.  When SM removes a P_Key, the "RUNNING" state
will be disabled for the corresponding network interface.  To
implement this, I added IB_EVENT_PKEY_CHANGE event handling.  To
prevent flushing the device before the device is open by the "delay
open" mechanism, I added an additional device flag called
IPOIB_FLAG_INITIALIZED.

This also prevents the child network interface from trying to join to
multicast groups until the PKEY is configured.  We used to get error
messages like:

    ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff

in this case.  To fix this, I just check IPOIB_FLAG_OPER_UP flag in
ipoib_set_mcast_list().

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:30 -08:00
Roland Dreier
192daa18dd IB/mthca: Fix modify QP error path
If the call to mthca_MODIFY_QP() failed, then mthca_modify_qp() would
still do some things it shouldn't, such as store away attributes for
special QPs.  Fix this, and simplify the code, by simply jumping to
the exit path if mthca_MODIFY_QP() fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:30 -08:00
Leonid Arsh
4e37b95616 IPoIB: Fix network interface "RUNNING" status
With the current IPoIB driver, the status of network interfaces stays
"RUNNING" even if the link goes down (for example because a cable is
unplugged).  Fix this by flushing the IPoIB interface when the link
goes down.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:29 -08:00
Roland Dreier
b0b3a8e193 IB/mthca: Fix indentation
Fix some whitespace damage (indenting with spaces) that snuck in.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:29 -08:00
Jack Morgenstein
b3f64967fa IB/mthca: Fix uninitialized variable in mthca_alloc_qp()
mthca_alloc_sqp() by mthca_set_qp_size() need to set qp->transport
before calling mthca_set_qp_size(), since the value is used there.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:29 -08:00
Jack Morgenstein
d4301e2c66 IB/mthca: Check SRQ limit in modify SRQ operation
When setting the shared receive queue (SRQ) watermark in a modify SRQ
operation, make sure that the supplied value is not larger than the
full size of the SRQ.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:28 -08:00
Jack Morgenstein
ded9ad721d IB/mthca: Check that SRQ WQE size does not exceed device's max value
Guarantee the calculated work queue entry size does not exceed the max
allowable WQE size when creating an SRQ.  This is a problem with Arbel
in Tavor-compatibility mode because the current WQE size computation
method rounds up to next power of 2.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:27 -08:00
Dotan Barak
0ef61db837 IB/mthca: Check that sgid_index and path_mtu are valid in modify_qp
Add a check that the modify QP parameters sgid_index and path_mtu are
valid, since they might come from userspace.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:27 -08:00
Roland Dreier
cf368713a3 IB/srp: Use a fake scatterlist for non-SG SCSI commands
Since the SCSI midlayer is moving towards entirely getting rid of
commands with use_sg == 0, we should treat this case as an exception.
Therefore, change the IB SRP initiator to create a fake scatterlist
for these commands with sg_init_one().  This simplifies the flow of
DMA mapping and unmapping, since SRP can just use dma_map_sg() and
dma_unmap_sg() unconditionally, rather than having to choose between
the dma_{map,unmap}_sg() and dma_{map,unmap}_single() variants.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:26 -08:00
Leonid Arsh
6f633c8d69 IPoIB: Pass correct pointer when flushing child interfaces
ipoib_ib_dev_flush() should get passed cpriv->dev, not &cpriv->dev.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:25 -08:00
Linus Torvalds
3d1f337b3e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (235 commits)
  [NETFILTER]: Add H.323 conntrack/NAT helper
  [TG3]: Don't mark tg3_test_registers() as returning const.
  [IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2
  [IPV6]: Nearly complete kzalloc cleanup for net/ipv6
  [IPV6]: Cleanup of net/ipv6/reassambly.c
  [BRIDGE]: Remove duplicate const from is_link_local() argument type.
  [DECNET]: net/decnet/dn_route.c: fix inconsequent NULL checking
  [TG3]: make drivers/net/tg3.c:tg3_request_irq() static
  [BRIDGE]: use LLC to send STP
  [LLC]: llc_mac_hdr_init const arguments
  [BRIDGE]: allow show/store of group multicast address
  [BRIDGE]: use llc for receiving STP packets
  [BRIDGE]: stp timer to jiffies cleanup
  [BRIDGE]: forwarding remove unneeded preempt and bh diasables
  [BRIDGE]: netfilter inline cleanup
  [BRIDGE]: netfilter VLAN macro cleanup
  [BRIDGE]: netfilter dont use __constant_htons
  [BRIDGE]: netfilter whitespace
  [BRIDGE]: optimize frame pass up
  [BRIDGE]: use kzalloc
  ...
2006-03-21 09:31:48 -08:00
Arnaldo Carvalho de Melo
e35fc38565 [INFINIBAND] ipoib: Remove leftover use of neigh_ops->destructor
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:46:40 -08:00
Michael S. Tsirkin
c5ecd62c25 [NET]: Move destructor from neigh->ops to neigh_params
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed.  We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup.  Two additional
patches in the patch series update ipoib to use this new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:25:41 -08:00
Eli Cohen
fd02e8038e IB/mthca: Query SRQ srq_limit fixes
Fix endianness handling of srq_limit: it is big-endian in the context
structure, so we need to swab it before returning it.

Also add support for srq_limit query for Tavor (non-MemFree) HCAs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:26 -08:00
Roland Dreier
bfef73fa78 IPoIB: Get rid of useless test of queue length
In neigh_add_path(), the queue of delayed packets can never be full,
because the queue is always freshly created and cannot be found by any
other code path.  In fact, the test of the queue length is worse than
useless: if somehow the test ever triggered and path_rec_start() also
failed, then dev_kfree_skb_any() will be called twice on the same skb.
Fix this by deleting the useless test.  Pointed out by Michael
S. Tsirkin <mst@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:26 -08:00
Dotan Barak
e10e271bfd IB/mthca: Correct reported SRQ size in MemFree case.
MemFree devices need to reserve one shared receive queue (SRQ) work
request for internal use, so the capacity returned from the create_srq
and query_srq methods should be srq->max - 1.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:26 -08:00
Michael S. Tsirkin
dc05980dd7 IB/mad: Fix oopsable race on device removal
Fix an oopsable race debugged by Eli Cohen <eli@mellanox.co.il>:
After removing the port from port_list, ib_mad_port_close flushes
port_priv->wq before destroying the special QPs. This means that a
completion event could arrive, and queue a new work in this work queue
after flush.

This patch also removes an unnecessary flush_workqueue():
destroy_workqueue() already includes a flush.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Roland Dreier
bf17c1c7cc IB/srp: Coverity fix to srp_parse_options()
Fix leak found by Coverity: in the SRP_OPT_DGID case,
srp_parse_options() didn't free the result of match_strdup().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Roland Dreier
6b63e3015a IB/mthca: Coverity fix to mthca_init_eq_table()
Fix bug found by coverity: the loop body never executed, because it
was doing for (i = 0; i < MTHCA_EQ_CMD; ++i), but MTHCA_EQ_CMD is 0.
The correct loop bound is MTHCA_NUM_EQ, to loop over all EQs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Roland Dreier
048975ac58 IB: Coverity fixes to sysfs.c
Fix two bugs found by coverity:
 - Memory leak in error path of alloc_group_attrs()
 - Fencepost error in state_show(): the test should be < ARRAY_SIZE(),
   not <= ARRAY_SIZE().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Jack Morgenstein
0b3ea0829c IPoIB: Move ipoib_ib_dev_flush() to ipoib workqueue
Move ipoib_ib_dev_flush() to ipoib's workqueue.  This keeps it ordered
with respect to other work scheduled by the ipoib driver.  This fixes
problems with races, for example:
 - ipoib_ib_dev_flush() has started running because of an IB event
 - user does ifconfig ib0 down
 - ipoib_mcast_stop_thread() gets called twice and waits for the same
   completion twice

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:24 -08:00
Roland Dreier
8b9ab02b69 IPoIB: Fix build now that neighbour destructor is in neigh_params
Fix the IPoIB build (which is broken in net-2.6.17 because of my
screw-up, which left out this chunk in ipoib_multicast.c). 
The neighbour destructor is now in neigh_params, so we don't
need to clear it in the ops structure.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:24 -08:00
Ami Perlmutter
702b2aaccf IB/uverbs: Use correct alt_pkey_index in modify QP
The old code incorrectly used the primary P_Key index as the alternate
index too.

Signed-off-by: Ami Perlmutter <amip@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:24 -08:00
Jack Morgenstein
f36e1793e2 IB/umad: Add support for large RMPP transfers
Add support for sending and receiving large RMPP transfers.  The old
code supports transfers only as large as a single contiguous kernel
memory allocation.  This patch uses linked list of memory buffers when
sending and receiving data to avoid needing contiguous pages for
larger transfers.

  Receive side: copy the arriving MADs in chunks instead of coalescing
  to one large buffer in kernel space.

  Send side: split a multipacket MAD buffer to a list of segments,
  (multipacket_list) and send these using a gather list of size 2.
  Also, save pointer to last sent segment, and retrieve requested
  segments by walking list starting at last sent segment. Finally,
  save pointer to last-acked segment.  When retrying, retrieve
  segments for resending relative to this pointer.  When updating last
  ack, start at this pointer.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:23 -08:00
Roland Dreier
6ecb0c8496 IB/srp: Add SCSI host attributes to show target port
Add SCSI host attributes in sysfs that show the ID extension, IOC
GUID, service ID, P_Key and destination GID for each target port that
the SRP initiator connects to.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:23 -08:00
Sean Hefty
87fd1a11ae IB/cm: Check cm_id state before handling a REP
Move checking the state of a cm_id before modifying it when handling a
REP.  This fixes a bug seen under MPI scale-up testing, where a NULL
timewait_info pointer is dereferenced if a request times out before a
REP is received.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:23 -08:00
Roland Dreier
6226bb5701 IB/mthca: Update firmware versions
Update known firmware versions in driver's table to the latest releases.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:22 -08:00
Eli Cohen
651eaac928 IB/mthca: Optimize large messages on Sinai HCAs
Sinai (one-port PCI Express) HCAs get improved throughput for messages
bigger than 80 KB in DDR mode if memory keys are formatted in a
specific way.  The enhancement only works if the memory key table is
smaller than 2^24 entries.  For larger tables, the enhancement is off
and a warning is printed (to avoid silent performance loss).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:22 -08:00
Dotan Barak
27d5630064 IB/uverbs: Fix query QP return of sq_sig_all
The old code didn't convert from the kernel's enum correctly.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:21 -08:00
Dotan Barak
4546d31d84 IB: Fix modify QP checking of "current QP state" attribute
According to the IB spec version 1.2, section 11.2.4.2, the current
table has a couple of mistakes where it allows the current QP state
(IB_QP_CUR_STATE) attribute.  For the transitions:

  RTS -> RTS: IB_QP_CUR_STATE should be allowed for all transports
  SQD -> SQD: IB_QP_CUR_STATE should never be allowed

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:20 -08:00
Michael S. Tsirkin
9acf6a8570 IPoIB: Fix multicast race between canceling and completing
ipoib_mcast_stop_thread currently tests mcast->query and if it is
NULL, does not perform wait_for_completion on the mcast and frees the
mcast object directly.

However, since both operations are done without locking, it is
possible that ipoib_mcast_join_complete is in progress on this mcast
object and has set mcast->query to NULL already.

Solve this by:
- taking priv->lock before we change mcast->query in ipoib_mcast_join_complete,
  and keeping it until we no longer need the mcast object
- taking priv->lock around mcast->query test in ipoib_mcast_stop_thread

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:20 -08:00
Eli Cohen
54d07e2a1e IPoIB: Clean up if posting receives fails
If posting receives in ipoib_ib_dev_open() fails, call
ipoib_ib_dev_stop() to move the device's QP back to the RESET state so
that we can try again later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:19 -08:00
Ishai Rabinovitz
8d3ef29d6b IB/mthca: Use an enum for HCA page size
Use a named enum for the HCA's internal page size, rather than having
magic values of 4096 and shifts by 12 all over the code.  Also, fix
one minor bug in EQ handling: only one HCA page is mapped to the HCA
during initialization, but a full kernel page is unmapped during
cleanup.  This might cause problems when PAGE_SIZE != 4096.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:19 -08:00
Dotan Barak
67e7377661 IB/mthca: Check alternate P_Key index when setting alternate path
Check that the alternate P_Key index is in range when setting the
alternate path for a QP.  Also make a cosmetic touch up to the debug
message printed when the main P_Key index is out of range.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:18 -08:00
Dotan Barak
7667abd152 IB/mthca: Add support for send work request fence flag
Add support for IB_SEND_FENCE flag in post_send methods.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:18 -08:00
Eli Cohen
7343b231f2 IPoIB: Close race in setting mcast->ah
ipoib_mcast_send() tests mcast->ah twice.  If this value is changed
between these two points, we leak an skb.  However,
ipoib_mcast_join_finish() sets mcast->ah with no locking, so it could
race against ipoib_mcast_send().

As a solution, take priv->lock around assignment to mcast->ah thus
making sure ipoib_mcast_send() (which also takes priv->lock) is not in
flight.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:18 -08:00
Jack Morgenstein
1d89b1ae6c IB/mthca: Implement query_ah method
Implement query_ah (except for AVs which are in HCA memory).  This is
needed to implement RMPP duplicate session detection on sending side
(extraction of DGID/DLID and GRH flag from address handle).

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:17 -08:00
Eli Cohen
14abdffcc0 IB/mthca: Write FW commands through doorbell page
This patch is checks whether the HCA supports posting FW commands
through a doorbell page (user access region 0, or "UAR0").  If this is
supported, the driver maps UAR0 and uses it for FW commands. This can
be controlled by the value of a writable module parameter
fw_cmd_doorbell.  When the parameter is 0, the commands are posted
through HCR using the old method; otherwise if HCA is capable commands
go through UAR0.

This use of UAR0 to post commands eliminates the need for polling the
"go" bit prior to posting a new command. Since reading from a PCI
device is much more expensive then issuing a posted write, it is
expected that issuing FW commands this way will provide better CPU
utilization.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:17 -08:00
Dotan Barak
ea88fd16d6 IB/uverbs: Return actual capacity from create SRQ operation
Pass actual capacity of created SRQ back to userspace, so that
userspace can report accurate capacities.  This requires an ABI bump,
to change struct ib_uverbs_create_srq_resp.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:16 -08:00
Dotan Barak
abb6e9ba17 IB/mthca: Return actual capacity from create_srq
Have mthca's create_srq method return the actual capacity of the SRQ
that gets created.  Also update comments in <rdma/ib_verbs.h> to
clarify that this is what is expected from ib_create_srq().

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:16 -08:00
Michael S. Tsirkin
44af79f952 IPoIB: clarify to_ipoib_neigh()
Cosmetic change: make alignment explicit in to_ipoib_neigh.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:16 -08:00
Roland Dreier
00df1b2c8b IB/mthca: Bump driver version and release date
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:15 -08:00
Eli Cohen
8ebe5077e3 IB/mthca: Support for query QP and SRQ
Implement the query_qp and query_srq methods in mthca.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:15 -08:00
Dotan Barak
8bdb0e8632 IB/uverbs: Support for query SRQ from userspace
Add support to uverbs to handle querying userspace SRQs (shared
receive queues), including adding an ABI for marshalling requests and
responses.  The kernel midlayer already has the underlying
ib_query_srq() function.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:14 -08:00
Dotan Barak
7ccc9a24e0 IB/uverbs: Support for query QP from userspace
Add support to uverbs to handle querying userspace QPs (queue pairs),
including adding an ABI for marshalling requests and responses.  The
kernel midlayer already has the underlying ib_query_qp() function.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:14 -08:00
Roland Dreier
a74cd4af0b IB: Whitespace cleanups
Remove trailing whitespace and fix indentation that with spaces
instead of tabs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:13 -08:00
Roland Dreier
d844183d9c IB/mthca: Convert to use ib_modify_qp_is_ok()
Use ib_modify_qp_is_ok() in mthca, and delete the big table of
attributes for queue pair state transitions.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:13 -08:00