The DCCP base time resolution is 10 microseconds (RFC 4340, 13.1 ... 13.3).
Using a timer with a lower resolution was found to trigger the following
bug warnings/problems on high-speed networks (e.g. local loopback):
* RTT samples are rounded down to 0 if below resolution;
* in some cases, negative RTT samples were observed;
* the CCID-3 feedback timer complains that the feedback interval is 0,
since the feedback interval is in the order of 1 RTT or less and RTT
measurement rounded this down to 0;
On an Intel computer this will for instance happen when using a
boot-time parameter of "clocksource=jiffies".
The following system log messages were observed:
11:24:00 kernel: BUG: delta (0) <= 0 at ccid3_hc_rx_send_feedback()
11:26:12 kernel: BUG: delta (0) <= 0 at ccid3_hc_rx_send_feedback()
11:26:30 kernel: dccp_sample_rtt: unusable RTT sample 0, using min
11:26:30 last message repeated 5 times
This patch defines a global constant for the time resolution, adds this in
timer.c, and checks the available clock resolution at CCID-3 module load time.
When the resolution is worse than 10 microseconds, module loading exits with
a message "socket type not supported".
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID. It further addresses the problems listed below.
The code looks if the write queue is non-empty and grants the TX CCID up to
`timeout' jiffies to drain the queue. It will instead purge that queue if
* the delay suggested by the CCID exceeds the time budget;
* a socket error occurred while waiting for the CCID;
* there is a signal pending (eg. annoyed user pressed Control-C);
* the CCID does not support delays (we don't know how long it will take).
D e t a i l s [can be removed]
-------------------------------
DCCP's sending mechanism functions a bit like non-blocking I/O: dccp_sendmsg()
will enqueue up to net.dccp.default.tx_qlen packets (default=5), without waiting
for them to be released to the network.
Rate-based CCIDs, such as CCID3/4, can impose sending delays of up to maximally
64 seconds (t_mbi in RFC 3448). Hence the write queue may still contain packets
when the application closes. Since the write queue is congestion-controlled by
the CCID, draining the queue is also under control of the CCID.
There are several problems that needed to be addressed:
1) The queue-drain mechanism only works with rate-based CCIDs. If CCID2 for
example has a full TX queue and becomes network-limited just as the
application wants to close, then waiting for CCID2 to become unblocked could
lead to an indefinite delay (i.e., application "hangs").
2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
in its sending policy while the queue is being drained. This can lead to
further delays during which the application will not be able to terminate.
3) The minimum wait time for CCID3/4 can be expected to be the queue length
times the current inter-packet delay. For example if tx_qlen=100 and a delay
of 15 ms is used for each packet, then the application would have to wait
for a minimum of 1.5 seconds before being allowed to exit.
4) There is no way for the user/application to control this behaviour. It would
be good to use the timeout argument of dccp_close() as an upper bound. Then
the maximum time that an application is willing to wait for its CCIDs to can
be set via the SO_LINGER option.
These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.
The wait-for-ccid function is, as before, used when the application
(a) has read all the data in its receive buffer and
(b) if SO_LINGER was set with a non-zero linger time, or
(c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
state (client application closes after receiving CloseReq).
In addition, there is a catch-all case by calling __skb_queue_purge() after
waiting for the CCID. This is necessary since the write queue may still have
data when
(a) the host has been passively-closed,
(b) abnormal termination (unread data, zero linger time),
(c) wait-for-ccid could not finish within the given time limit.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This extends the packet dequeuing interface of dccp_write_xmit() to allow
1. CCIDs to take care of timing when the next packet may be sent;
2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).
The main purpose is to take CCID2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).
The interface can also be used to support other CCIDs.
The mode of operation for (2) is as follows:
* new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
* ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
* it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
* dccp_write_xmit() returns without further action;
* after some time the wait-condition for CCID becomes true,
* that CCID schedules the tasklet,
* tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
* since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
* packet is sent, and possibly more (since dccp_write_xmit() loops).
Code reuse: the taskled function calls dccp_write_xmit(), the timer function
reduces to a wrapper around the same code.
If the tasklet finds that the socket is locked, it re-schedules the tasklet
function (not the tasklet) after one jiffy.
Changed DCCP_BUG to dccp_pr_debug when transmit_skb returns an error (e.g. when a
local qdisc is used, NET_XMIT_DROP=1 can be returned for many packets).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch starts the new implementation of feature negotiation:
1. Although it is theoretically possible to perform feature negotiation at any
time (and RFC 4340 supports this), in practice this is prohibitively complex,
as it requires to put traffic on hold for each new negotiation.
2. As a byproduct of restricting feature negotiation to connection setup, the
feature-negotiation retransmit timer is no longer required. This part is now
mapped onto the protocol-level retransmission.
Details indicating why timers are no longer needed can be found on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html
This patch disables anytime negotiation, subsequent patches work out full
feature negotiation support for connection setup.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch allows the sender to distinguish original and retransmitted packets,
which is in particular needed for the retransmission of DCCP-Requests:
* the first Request uses ISS (generated in net/dccp/ip*.c), and sets GSS = ISS;
* all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted
Request has sequence number ISS + n (mod 48).
To add generic support, the patch reorganises existing code so that:
* icsk_retransmits == 0 for the original packet and
* icsk_retransmits = n > 0 for the n-th retransmitted packet
at the time dccp_transmit_skb() is called, via dccp_retransmit_skb().
Thanks to Wei Yongjun for pointing this problem out.
Further changes:
----------------
* removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head
is used for all retransmissions (the exception is client-Acks in PARTOPEN
state, but these do not use sk_send_head);
* since sk_send_head always contains the original skb (via dccp_entail()),
skb_cloned() never evaluated to true and thus pskb_copy() was never used.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.
The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides a timesource, conveniently used for DCCP timestamps, which
returns the elapsed time in 10s of microseconds since initialisation.
This makes for a wrap-around time of about 11.9 hours, which should be
sufficient for most applications.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TX CCID needs the write_xmit_timer for delaying packet sends. Previously
this timer was only activated on active (connecting) sockets.
This patch initialises the write_xmit_timer in sync with the other timers, i.e.
the timer will be ready on any socket. This is used by applications with a
listening socket which start to stream after receiving an initiation by the
client. The write_xmit_timer is stopped when the application closes, as before.
Was tested to work and to remove the timer bug reported on dccp@vger.
Also moved timer initialisation into timer.c (static).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This removes 3 forward declarations by reordering 2 functions.
No code change at all.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This adds 3 sysctls which govern the retransmission behaviour of DCCP control
packets (3way handshake, feature negotiation).
It removes 4 FIXMEs from the code.
The close resemblance of sysctl variables to their TCP analogues is emphasised
not only by their name, but also by giving them the same initial values.
This is useful since there is not much practical experience with DCCP yet.
Furthermore, with regard to the previous patch, it is now possible to limit
the number of keepalive-Responses by setting net.dccp.default.request_retries
(also a bit like in TCP).
Lastly, added documentation of all existing DCCP sysctls.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This updates program documentation: spell out precise conditions about
which packets are eligible for retransmission (which is actually quite
hard to extract from RFC 4340).
It is based on the following table derived from RFC 4340:
+-----------+---------------------------------+---------------------+
| Type | Retransmit? | Remark |
+-----------+---------------------------------+---------------------+
| Request | in client-REQUEST state | sec. 8.1.1 |
| Response | NEVER | SHOULD NOT, 8.1.3 |
| Data | NEVER | unreliable protocol |
| Ack | possible in client-PARTOPEN | sec. 8.1.5 |
| DataAck | NEVER | unreliable protocol |
| CloseReq | only in server-CLOSEREQ state | MUST, sec. 8.3 |
| Close | in node-CLOSING state | MUST, sec. 8.3 |
+-----------+-------------------------------------------------------+
| Reset | only in response to other packets |
| Sync | only in response to sequence-invalid packets (7.5.4) |
| SyncAck | only in response to Sync packets |
+-----------+-------------------------------------------------------+
Hence the only packets eligible for retransmission are:
* Requests in client-REQUEST state (sec. 8.1.1)
* Acks in client-PARTOPEN state (sec. 8.1.5)
* CloseReq in server-CLOSEREQ state (sec. 8.3)
* Close in node-CLOSING state (sec. 8.3)
I had meant to put in a check for these types too, but have left that
for later.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Renaming it to dccp_send_reset and moving it from the ipv4 specific
code to the core dccp code.
This fixes some bugs in IPV6 where timers would send v4 resets, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is from a first audit, more eyeballs are more than welcome.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also improves reqsk_queue_prune and renames it to
inet_csk_reqsk_queue_prune, as it deals with both inet_connection_sock
and inet_request_sock objects, not just with request_sock ones thus
belonging to inet_request_sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Development to this point was done on a subversion repository at:
http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/
This repository will be kept at this site for the foreseable future,
so that interested parties can see the history of this code,
attributions, etc.
If I ever decide to take this offline I'll provide the full history at
some other suitable place.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>