T4 EQ entries are in multiples of 64 bytes. Currently the RDMA SQ and
RQ use fixed sized entries composed of 4 EQ entries for the SQ and 2
EQ entries for the RQ. For optimial latency with small IO, we need to
change this so the HW only needs to DMA the EQ entries actually used
by a given work request.
Implementation:
- add wq_pidx counter to track where we are in the EQ. cidx/pidx are
used for the sw sq/rq tracking and flow control.
- the variable part of work requests is the SGL. Add new functions to
build the SGL and/or immediate data directly in the EQ memory
wrapping when needed.
- adjust the min burst size for the EQ contexts to 64B.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This replace the PCI DMA state API (include/linux/pci-dma.h) with the
DMA equivalents since the PCI DMA state API will be obsolete.
No functional change.
For further information about the background:
http://marc.info/?l=linux-netdev&m=127037540020276&w=2
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- wrap cq->cqidx_inc based on cq size.
- optimize t4_arm_cq logic.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1) save the timestamp flit in the cq when we consume a CQE.
2) always compare the saved flit with the previous entry flit when
reading the next CQE entry. If the flits don't compare, then we
have overflowed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an RDMA/iWARP driver for Chelsio T4 Ethernet adapters.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>