linux

Author	SHA1	Message	Date
Gerrit Renker	146993cf51	dccp: Refine the wait-for-ccid mechanism This extends the existing wait-for-ccid routine so that it may be used with different types of CCID. It further addresses the problems listed below. The code looks if the write queue is non-empty and grants the TX CCID up to `timeout' jiffies to drain the queue. It will instead purge that queue if * the delay suggested by the CCID exceeds the time budget; * a socket error occurred while waiting for the CCID; * there is a signal pending (eg. annoyed user pressed Control-C); * the CCID does not support delays (we don't know how long it will take). D e t a i l s [can be removed] ------------------------------- DCCP's sending mechanism functions a bit like non-blocking I/O: dccp_sendmsg() will enqueue up to net.dccp.default.tx_qlen packets (default=5), without waiting for them to be released to the network. Rate-based CCIDs, such as CCID3/4, can impose sending delays of up to maximally 64 seconds (t_mbi in RFC 3448). Hence the write queue may still contain packets when the application closes. Since the write queue is congestion-controlled by the CCID, draining the queue is also under control of the CCID. There are several problems that needed to be addressed: 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID2 for example has a full TX queue and becomes network-limited just as the application wants to close, then waiting for CCID2 to become unblocked could lead to an indefinite delay (i.e., application "hangs"). 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes in its sending policy while the queue is being drained. This can lead to further delays during which the application will not be able to terminate. 3) The minimum wait time for CCID3/4 can be expected to be the queue length times the current inter-packet delay. For example if tx_qlen=100 and a delay of 15 ms is used for each packet, then the application would have to wait for a minimum of 1.5 seconds before being allowed to exit. 4) There is no way for the user/application to control this behaviour. It would be good to use the timeout argument of dccp_close() as an upper bound. Then the maximum time that an application is willing to wait for its CCIDs to can be set via the SO_LINGER option. These problems are addressed by giving the CCID a grace period of up to the `timeout' value. The wait-for-ccid function is, as before, used when the application (a) has read all the data in its receive buffer and (b) if SO_LINGER was set with a non-zero linger time, or (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ state (client application closes after receiving CloseReq). In addition, there is a catch-all case by calling __skb_queue_purge() after waiting for the CCID. This is necessary since the write queue may still have data when (a) the host has been passively-closed, (b) abnormal termination (unread data, zero linger time), (c) wait-for-ccid could not finish within the given time limit. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:38 +02:00
Gerrit Renker	e7937772d7	dccp: Extend CCID packet dequeueing interface This extends the packet dequeuing interface of dccp_write_xmit() to allow 1. CCIDs to take care of timing when the next packet may be sent; 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds). The main purpose is to take CCID2 out of its polling mode (when it is network- limited, it tries every millisecond to send, without interruption). The interface can also be used to support other CCIDs. The mode of operation for (2) is as follows: * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(), * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full), * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER', * dccp_write_xmit() returns without further action; * after some time the wait-condition for CCID becomes true, * that CCID schedules the tasklet, * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(), * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now", * packet is sent, and possibly more (since dccp_write_xmit() loops). Code reuse: the taskled function calls dccp_write_xmit(), the timer function reduces to a wrapper around the same code. If the tasklet finds that the socket is locked, it re-schedules the tasklet function (not the tasklet) after one jiffy. Changed DCCP_BUG to dccp_pr_debug when transmit_skb returns an error (e.g. when a local qdisc is used, NET_XMIT_DROP=1 can be returned for many packets). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:38 +02:00
Gerrit Renker	f4a66ca4d2	dccp: Return-value convention of hc_tx_send_packet() This patch reorganises the return value convention of the CCID TX sending function, to permit more flexible schemes, as required by subsequent patches. Currently the convention is * values < 0 mean error, * a value == 0 means "send now", and * a value x > 0 means "send in x milliseconds". The patch provides symbolic constants and a function to interpret return values. In addition, it caps the maximum positive return value to 0xFFFF milliseconds, corresponding to 65.535 seconds. This is possible since in CCID-3 the maximum inter-packet gap is t_mbi = 64 sec. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:38 +02:00
Gerrit Renker	c8bf462bc5	dccp ccid-2: Separate option parsing from CCID processing This patch replaces an almost identical replication of code: large parts of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c. Apart from the duplication, this caused two more problems: 1. CCIDs should not need to be concerned with parsing header options; 2. one can not assume that Ack Vectors appear as a contiguous area within an skb, it is legal to insert other options and/or padding in between. The current code would throw an error and stop reading in such a case. The patch provides a new data structure and associated list housekeeping. Only small changes were necessary to integrate with CCID-2: data structure initialisation, adapt list traversal routine, and add call to the provided cleanup routine. The latter also lead to fixing the following BUG: CCID-2 so far ignored Ack Vectors on all packets other than Ack/DataAck, which is incorrect, since Ack Vectors can be present on any packet that has an Ack field. Details: -------- * received Ack Vectors are parsed by dccp_parse_options() alone, which passes the result on to the CCID-specific routine ccid_hc_tx_parse_options(); * CCIDs interested in using/decoding Ack Vector information will add code to fetch parsed Ack Vectors via this interface; * a data structure, `struct dccp_ackvec_parsed' is provided as interface; * this structure arranges Ack Vectors of the same skb into a FIFO order; * a doubly-linked list is used to keep the required FIFO code small. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:37 +02:00
Gerrit Renker	5a577b488f	dccp ccid-2: Remove old infrastructure This removes * functions for which updates have been provided in the preceding patches and * the @av_vec_len field - it is no longer necessary since the buffer length is now always computed dynamically; * conditional debugging code (CONFIG_IP_DCCP_ACKVEC). The reason for removing the conditional debugging code is that Ack Vectors are an almost inevitable necessity - RFC 4341 says that for CCID-2, Ack Vectors must be used. Furthermore, the code would be only interesting for coding - after some extensive testing with this patch set, having the debug code around is no longer of real help. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:37 +02:00
Gerrit Renker	c2f42077bd	dccp ccid-2: Schedule Sync as out-of-band mechanism The problem with Ack Vectors is that i) their length is variable and can in principle grow quite large, ii) it is hard to predict exactly how large they will be. Due to the second point it seems not a good idea to reduce the MPS; in particular when on average there is enough room for the Ack Vector and an increase in length is momentarily due to some burst loss, after which the Ack Vector returns to its normal/average length. The solution taken by this patch is to subtract a minimum-expected Ack Vector length from the MPS (previous patch), and to defer any larger Ack Vectors onto a separate Sync - but only if indeed there is no space left on the skb. This patch provides the infrastructure to schedule Sync-packets for transporting (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since it does not need to wait for new application data. It can thus serve other parts of the DCCP code as well. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:37 +02:00
Gerrit Renker	283fb4a5f3	dccp ccid-2: Consolidate Ack-Vector processing within main DCCP module This aggregates Ack Vector processing (handling input and clearing old state) into one function, for the following reasons and benefits: * all Ack Vector-specific processing is now in one place; * duplicated code is removed; * ensuring sanity: from an Ack Vector point of view, it is better to clear the old state first before entering new state; * Ack Event handling happens mostly within the CCIDs, not the main DCCP module. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:37 +02:00
Gerrit Renker	e28fe59f9c	dccp ccid-2: Update code for the Ack Vector input/registration routine This patch uupdates the code which registers new packets as received, using the new circular buffer interface. It contributes a new algorithm which * supports both tail/head pointers and buffer wrap-around and * deals with overflow (head/tail move in lock-step). The updated code is also partioned differently, into 1. dealing with the empty buffer, 2. adding new packets into non-empty buffer, 3. reserving space when encountering a `hole' in the sequence space, 4. updating old state and deciding when old state is irrelevant. Protection against large burst losses: With regard to (3), it is too costly to reserve space when there are large bursts of losses. When bursts get too large, the code does no longer reserve space and just fills in cells normally. This measure reduces space consumption by a factor of 63. The code reuses in part the previous implementation by Arnaldo de Melo. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:37 +02:00
Gerrit Renker	68b1de1576	dccp ccid-2: Algorithm to update buffer state This provides a routine to consistently update the buffer state when the peer acknowledges receipt of Ack Vectors; updating state in the list of Ack Vectors as well as in the circular buffer. While based on RFC 4340, several additional (and necessary) precautions were added to protect the consistency of the buffer state. These additions are essential, since analysis and experience showed that the basic algorithm was insufficient for this task (which lead to problems that were hard to debug). The algorithm now * deals with HC-sender acknowledging to HC-receiver and vice versa, * keeps track of the last unacknowledged but received seqno in tail_ackno, * has special cases to reset the overflow condition when appropriate, * is protected against receiving older information (would mess up buffer state). Note: The older code performed an unnecessary step, where the sender cleared Ack Vector state by parsing the Ack Vector received by the HC-receiver. Doing this was entirely redundant, since * the receiver always puts the full acknowledgment window (groups 2,3 in 11.4.2) into the Ack Vectors it sends; hence the HC-receiver is only interested in the highest state that the HC-sender received; * this means that the acknowledgment number on the (Data)Ack from the HC-sender is sufficient; and work done in parsing earlier state is not necessary, since the later state subsumes the earlier one (see also RFC 4340, A.4). This older interface (dccp_ackvec_parse()) is therefore removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:37 +02:00
Gerrit Renker	d7dc7e5f49	dccp ccid-2: Implementation of circular Ack Vector buffer with overflow handling This completes the implementation of a circular buffer for Ack Vectors, by extending the current (linear array-based) implementation. The changes are: (a) An `overflow' flag to deal with the case of overflow. As before, dynamic growth of the buffer will not be supported; but code will be added to deal robustly with overflowing Ack Vector buffers. (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A in RFC 4340, problems arise whenever subsequent Ack Vector records overlap, which can bring the entire run length calculation completely out of synch. (This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\ ack_vectors/tracking_tail_ackno/ .) (c) The buffer lengthi is now computed dynamically (i.e. current fill level), as the span between head to tail. As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured. Note on overflow handling: ------------------------- The Ack Vector code previously simply started to drop packets when the Ack Vector buffer overflowed. This means that the userspace application will not be able to receive, only because of an Ack Vector storage problem. Furthermore, overflow may be transient, so that applications may later recover from the overflow. Recovering from dropped packets is more difficult (e.g. video key frames). Hence the patch uses a different policy: when the buffer overflows, the oldest entries are subsequently overwritten. This has a higher chance of recovery. Details are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/ Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:36 +02:00
Gerrit Renker	4829007c7b	dccp ccid-2: Separate internals of Ack Vectors from option-parsing code This patch * separates Ack Vector housekeeping code from option-insertion code; * shifts option-specific code from ackvec.c into options.c; * introduces a dedicated routine to take care of the Ack Vector records; * simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant, since the list is automatically arranged in descending order of ack_seqno. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:36 +02:00
Gerrit Renker	ff49e27089	dccp ccid-2: Ack Vector interface clean-up This patch brings the Ack Vector interface up to date. Its main purpose is to lay the basis for the subsequent patches of this set, which will use the new data structure fields and routines. There are no real algorithmic changes, rather an adaptation: (1) Replaced the static Ack Vector size (2) with a #define so that it can be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems to be sufficient for the moment) and added a solution so that computing the ECN nonce will continue to work - even with larger Ack Vectors. (2) Replaced the #defines for Ack Vector states with a complete enum. (3) Replaced #defines to compute Ack Vector length and state with general purpose routines (inlines), and updated code to use these. (4) Added a `tail' field (conversion to circular buffer in subsequent patch). (5) Updated the (outdated) documentation for Ack Vector struct. (6) All sequence number containers now trimmed to 48 bits. (7) Removal of unused bits: * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record); * removed Elapsed Time for Ack Vectors (it was nowhere used); * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since the code needs to be able to remember the old run length; * reduced the de-/allocation routines (redundant / duplicate tests). Justification for removing Elapsed Time information [can be removed]: --------------------------------------------------------------------- 1. The Elapsed Time information for Ack Vectors was nowhere used in the code. 2. DCCP does not implement rate-based pacing of acknowledgments. The only recommendation for always including Elapsed Time is in section 11.3 of RFC 4340: "Receivers that rate-pace acknowledgements SHOULD [...] include Elapsed Time options". But such is not the case here. 3. It does not really improve estimation accuracy. The Elapsed Time field only records the time between the arrival of the last acknowledgeable packet and the time the Ack Vector is sent out. Since Linux does not (yet) implement delayed Acks, the time difference will typically be small, since often the arrival of a data packet triggers sending feedback at the HC-receiver. Justification for changes in de-/allocation routines [can be removed]: ---------------------------------------------------------------------- * INIT_LIST_HEAD in dccp_ackvec_record_new was redundant, since the list pointers were later overwritten when the node was added via list_add(); * dccp_ackvec_record_new() was called in a single place only; * calls to list_del_init() before calling dccp_ackvec_record_delete() were redundant, since subsequently the entire element was k-freed; * since all calls to dccp_ackvec_record_delete() were preceded to a call to list_del_init(), the WARN_ON test would never evaluate to true; * since all calls to dccp_ackvec_record_delete() were made from within list_for_each_entry_safe(), the test for avr == NULL was redundant; * list_empty() in ackvec_free was redundant, since the same condition is embedded in the loop condition of the subsequent list_for_each_entry_safe(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:36 +02:00
Gerrit Renker	b8c6bcee1d	dccp: Reduce noise in output and convert to ktime_t This fixes the problem that dccp_probe output can grow quite large without apparent benefit (many identical data points), creating huge files (up to over one Gigabyte for a few minutes' test run) which are very hard to post-process (in one instance it got so bad that gnuplot ate up all memory plus swap). The cause for the problem is that the kprobe is inserted into dccp_sendmsg(), which can be called in a polling-mode (whenever the TX queue is full due to congestion-control issues, EAGAIN is returned). This creates many very similar data points, i.e. the increase of processing time does not increase the quality/information of the probe output. The fix is to attach the probe to a different function -- write_xmit was chosen since it gets called continually (both via userspace and timer); an input-path function would stop sampling as soon as the other end stops sending feedback. For comparison the output file sizes for the same 20 second test run over a lossy link: * before / without patch: 118 Megabytes * after / with patch: 1.2 Megabytes and there was much less noise in the output. To allow backward compatibility with scripts that people use, the now-unused `size' field in the output has been replaced with the CCID identifier. This also serves for future compatibility - support for CCID2 is work in progress (depends on the still unfinished SRTT/RTTVAR updates). While at it, the update to ktime_t was also performed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:36 +02:00
Gerrit Renker	a9c1656ab1	dccp: Merge now-reduced connect_init() function After moving the assignment of GAR/ISS from dccp_connect_init() to dccp_transmit_skb(), the former function becomes very small, so that a merger with dccp_connect() suggests itself. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:35 +02:00
Gerrit Renker	bfbddd085a	dccp: Fix the adjustments to AWL and SWL This fixes a problem and a potential loophole with regard to seqno/ackno validity: the problem is that the initial adjustments to AWL/SWL were only performed at the begin of the connection, during the handshake. Since the Sequence Window feature is always greater than Wmin=32 (7.5.2), it is however necessary to perform these adjustments at least for the first W/W' (variables as per 7.5.1) packets in the lifetime of a connection. This requirement is complicated by the fact that W/W' can change at any time during the lifetime of a connection. Therefore the consequence is to perform this safety check each time SWL/AWL are updated. A second problem solved by this patch is that the remote/local Sequence Window feature values (which set the bounds for AWL/SWL/SWH) are undefined until the feature negotiation has completed. During the initial handshake we have more stringent sequence number protection, the changes added by this patch effect that {A,S}W{L,H} are within the correct bounds at the instant that feature negotiation completes (since the SeqWin feature activation handlers call dccp_update_gsr/gss()). A detailed rationale is below -- can be removed from the commit message. 1. Server sequence number checks during initial handshake --------------------------------------------------------- The server can not use the fields of the listening socket for seqno/ackno checks and thus needs to store all relevant information on a per-connection basis on the dccp_request socket. This is a size-constrained structure and has currently only ISS (dreq_iss) and ISR (dreq_isr) defined. Adding further fields (SW{L,H}, AW{L,H}) would increase the size of the struct and it is questionable whether this will have any practical gain. The currently implemented solution is as follows. * receiving first Request: dccp_v{4,6}_conn_request sets ISR := P.seqno, ISS := dccp_v{4,6}_init_sequence() * sending first Response: dccp_v{4,6}_send_response via dccp_make_response() sets P.seqno := ISS, sets P.ackno := ISR * receiving retransmitted Request: dccp_check_req() overrides ISR := P.seqno * answering retransmitted Request: dccp_make_response() sets ISS += 1, otherwise as per first Response * completing the handshake: succeeds in dccp_check_req() for the first Ack where P.ackno == ISS (P.seqno is not tested) * creating child socket: ISS, ISR are copied from the request_sock This solution will succeed whenever the server can receive the Request and the subsequent Ack in succession, without retransmissions. If there is packet loss, the client needs to retransmit until this condition succeeds; it will otherwise eventually give up. Adding further fields to the request_sock could increase the robustness a bit, in that it would make possible to let a reordered Ack (from a retransmitted Response) pass. The argument against such a solution is that if the packet loss is not persistent and an Ack gets through, why not wait for the one answering the original response: if the loss is persistent, it is probably better to not start the connection in the first place. Long story short: the present design (by Arnaldo) is simple and will likely work just as well as a more complicated solution. As a consequence, {A,S}W{L,H} are not needed until the moment the request_sock is cloned into the accept queue. At that stage feature negotiation has completed, so that the values for the local and remote Sequence Window feature (7.5.2) are known, i.e. we are now in a better position to compute {A,S}W{L,H}. 2. Client sequence number checks during initial handshake --------------------------------------------------------- Until entering PARTOPEN the client does not need the adjustments, since it constrains the Ack window to the packet it sent. * sending first Request: dccp_v{4,6}_connect() choose ISS, dccp_connect() then sets GAR := ISS (as per 8.5), dccp_transmit_skb() (with the previous bug fix) sets GSS := ISS, AWL := ISS, AWH := GSS * n-th retransmitted Request (with previous patch): dccp_retransmit_skb() via timer calls dccp_transmit_skb(), which sets GSS := ISS+n and then AWL := ISS, AWH := ISS+n * receiving any Response: dccp_rcv_request_sent_state_process() -- accepts packet if AWL <= P.ackno <= AWH; -- sets GSR = ISR = P.seqno * sending the Ack completing the handshake: dccp_send_ack() calls dccp_transmit_skb(), which sets GSS += 1 and AWL := ISS, AWH := GSS Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:35 +02:00
Gerrit Renker	2975abd251	dccp: Schedule an Ack when receiving timestamps This schedules an Ack when receiving a timestamp, exploiting the existing inet_csk_schedule_ack() function, saving one case in the `dccp_ack_pending()' function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:35 +02:00
Gerrit Renker	d0995e6a9e	dccp ccid-3: Remove dead states This patch is thanks to an investigation by Leandro Sales de Melo and his colleagues. They worked out two state diagrams which highlight the fact that the xxx_TERM states in CCID-3/4 are in fact not necessary. And this can be confirmed by in turn looking at the code: the xxx_TERM states are only ever set in ccid3_hc_{rx,tx}_exit(). These two functions are part of the following call chain: * ccid_hc_{tx,rx}_exit() are called from ccid_delete() only; * ccid_delete() invokes ccid_hc_{tx,rx}_exit() in the way of a destructor: after calling ccid_hc_{tx,rx}_exit(), the CCID is released from memory; * ccid_delete() is in turn called only by ccid_hc_{tx,rx}_delete(); * ccid_hc_{tx,rx}_delete() is called only if - feature negotiation failed (dccp_feat_activate_values()), - when changing the RX/TX CCID (to eject the current CCID), - when destroying the socket (in dccp_destroy_sock()). In other words, when CCID-3 sets the state to xxx_TERM, it is at a time where no more processing should be going on, hence it is not necessary to introduce a dedicated exit state - this is implicit when unloading the CCID. The patch removes this state, one switch-statement collapses as a result. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:35 +02:00
Gerrit Renker	5fe94963a1	dccp ccid-3: Remove duplicate documentation This removes RX-socket documentation which is either duplicate or non-existent. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:35 +02:00
Gerrit Renker	c506d91d9a	dccp: Unused argument in CCID tx function This removes the argument `more' from ccid_hc_tx_packet_sent, since it was nowhere used in the entire code. (Anecdotally, this argument was not even used in the original KAME code where the function originally came from; compare the variable moreToSend in the freebsd61-dccp-kame-28.08.2006.patch now maintained by Emmanuel Lochin.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:35 +02:00
Gerrit Renker	f10ecaee6d	dccp: Replace magic CCID-specific numbers by symbolic constants The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code, but instead for the CCID-specific options numbers are used. This patch unifies the use of CCID-specific option numbers, by adding symbolic names reflecting the definitions in RFC 4340, 10.3. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:34 +02:00
Gerrit Renker	ce177ae2e6	dccp ccid-3: Remove redundant 'options_received' struct The `options_received' struct is redundant, since it re-duplicates the existing `p' and `x_recv' fields. This patch removes the sub-struct and migrates the format conversion operations (cf. below) to ccid3_hc_tx_parse_options(). Why the fields are redundant ---------------------------- The Loss Event Rate p and the Receive Rate x_recv are initially 0 when first loading CCID-3, as ccid_new() zeroes out the entire ccid3_hc_tx_sock. When Loss Event Rate or Receive Rate options are received, they are stored by ccid3_hc_tx_parse_options() into the fields `ccid3or_loss_event_rate' and `ccid3or_receive_rate' of the sub-struct `options_received' in ccid3_hc_tx_sock. After parsing (considering only the established state - dccp_rcv_established()), the packet is passed on to ccid_hc_tx_packet_recv(). This calls the CCID-3 specific routine ccid3_hc_tx_packet_recv(), which performs the following copy operations between fields of ccid3_hc_tx_sock: * hctx->options_received.ccid3or_receive_rate is copied into hctx->x_recv, after scaling it for fixpoint arithmetic, by 2^64; * hctx->options_received.ccid3or_loss_event_rate is copied into hctx->p, considering the above special cases; in addition, a value of 0 here needs to be mapped into p=0 (when no Loss Event Rate option has been received yet). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:34 +02:00
Gerrit Renker	535c55df13	dccp tfrc/ccid-3: Computing Loss Rate from Loss Event Rate This adds a function to take care of the following cases occurring in the computation of the Loss Rate p: * 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5; * 1/0 is mapped into the maximum of 100%; * we want to avoid that p = 1/x is rounded down to 0 when x is very large, since this means accidentally re-entering slow-start (indicated by p==0). In the last case, the minimum-resolution value of p is returned. Furthermore, a bug in ccid3_hc_rx_getsockopt is fixed (1/0 was mapped into ~0U), which now allows to consistently print the scaled p-values as printf("Loss Event Rate = %u.%04u %%\n", rx_info.tfrcrx_p / 10000, rx_info.tfrcrx_p % 10000); Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:34 +02:00
Gerrit Renker	3306c781ff	dccp: Add packet type information to CCID-specific option parsing This patch ... 1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state which options may (not) appear on what packet type. 2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets") and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets"). 3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This is also no longer necessary, since the CCID-specific option-parsing routines are passed every single parameter of the type-length-value option encoding. Also added documentation and made argument naming scheme consistent. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:34 +02:00
Gerrit Renker	47a61e7b43	dccp ccid-3: Simplify and consolidate tx_parse_options This simplifies and consolidates the TX option-parsing code: 1. The Loss Intervals option is not currently used, so dead code related to this option is removed. I am aware of no plans to support the option, but if someone wants to implement it (e.g. for inter-op tests), it is better to start afresh than having to also update currently unused code. 2. The Loss Event and Receive Rate options have a lot of code in common (both are 32 bit, both have same length etc.), so this is consolidated. 3. The test against GSR is not necessary, because - on first loading CCID3, ccid_new() zeroes out all fields in the socket; - ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to pinv = opt_recv->ccid3or_loss_event_rate; if (pinv == ~0U \|\| pinv == 0) hctx->p = 0; - as a result, the sequence number field is removed from opt_recv. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:34 +02:00
Gerrit Renker	63b3a73bb8	dccp ccid-3: Remove ugly RTT-sampling history lookup This removes the RTT-sampling function tfrc_tx_hist_rtt(), since 1. it suffered from complex passing of return values (the return value both indicated successful lookup while the value doubled as RTT sample); 2. when for some odd reason the sample value equalled 0, this triggered a bug warning about "bogus Ack", due to the ambiguity of the return value; 3. on a passive host which has not sent anything the TX history is empty and thus will lead to unwanted "bogus Ack" warnings such as ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148 ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606. The fix is to replace the implicit encoding by performing the steps manually. Furthermore, the "bogus Ack" warning has been removed, since it can actually be triggered due to several reasons (network reordering, old packet, (3) above), hence it is not very useful. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:34 +02:00
Gerrit Renker	de6f2b59e5	dccp ccid-3: Bug fix for the inter-packet scheduling algorithm This fixes a subtle bug in the calculation of the inter-packet gap and shows that t_delta, as it is currently used, is not needed. And hence replaced. The algorithm from RFC 3448, 4.6 below continually computes a send time t_nom, which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies the scheduling granularity, s the packet size, and X the sending rate: t_distance = t_nom - t_now; // in microseconds t_delta = min(t_ipi, t_gran) / 2; // `delta' parameter in microseconds if (t_distance >= t_delta) { reschedule after (t_distance / 1000) milliseconds; } else { t_ipi = s / X; // inter-packet interval in usec t_nom += t_ipi; // compute the next send time send packet now; } 1) Description of the bug ------------------------- Rescheduling requires a conversion into milliseconds, due to this call chain: * ccid3_hc_tx_send_packet() returns a timeout in milliseconds, * this value is converted by msecs_to_jiffies() in dccp_write_xmit(), * and finally used as jiffy-expires-value for sk_reset_timer(). The highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher granularity does not make much sense here. As a consequence, values of t_distance < 1000 are truncated to 0. This issue has so far been resolved by using instead if (t_distance >= t_delta + 1000) reschedule after (t_distance / 1000) milliseconds; The bug is in artificially inflating t_delta to t_delta' = t_delta + 1000. This is unnecessarily large, a more adequate value is t_delta' = max(t_delta, 1000). 2) Consequences of using the corrected t_delta' ----------------------------------------------- Since t_delta <= t_gran/2 = 10^6/(2HZ), we have t_delta <= 1000 as long as HZ >= 500. This means that t_delta' = max(1000, t_delta) is constant at 1000. On the other hand, when using a coarse HZ value of HZ < 500, we have three sub-cases that can all be reduced to using another constant of t_gran/2. (a) The first case arises when t_ipi > t_gran. Here t_delta' is the constant t_delta' = max(1000, t_gran/2) = t_gran/2. (b) If t_ipi <= 2000 < t_gran = 10^6/HZ usec, then t_delta = t_ipi/2 <= 1000, so that t_delta' = max(1000, t_delta) = 1000 < t_gran/2. (c) If 2000 < t_ipi <= t_gran, we have t_delta' = max(t_delta, 1000) = t_ipi/2. In the second and third cases we have delay values less than t_gran/2, which is in the order of less than or equal to half a jiffy. How these are treated depends on how fractions of a jiffy are handled: they are either always rounded down to 0, or always rounded up to 1 jiffy (assuming non-zero values). In both cases the error is on average in the order of 50%. Thus we are not increasing the error when in the second/third case we replace a value less than t_gran/2 with 0, by setting t_delta' to the constant t_gran/2. 3) Summary ---------- Fixing (1) and considering (2), the patch replaces t_delta with a constant, whose value depends on CONFIG_HZ, changing the above algorithm to: if (t_distance >= t_delta') reschedule after (t_distance / 1000) milliseconds; where t_delta' = 10^6/(2HZ) if HZ < 500, and t_delta' = 1000 otherwise. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:33 +02:00
Gerrit Renker	b2e317f4b5	dccp ccid-3: No more CCID control blocks in LISTEN state The CCIDs are activated as last of the features, at the end of the handshake, were the LISTEN state of the master socket is inherited into the server state of the child socket. Thus, the only states visible to CCIDs now are OPEN/PARTOPEN, and the closing states. This allows to remove tests which were previously necessary to protect against referencing a socket in the listening state (in CCID3), but which now have become redundant. As a further byproduct of enabling the CCIDs only after the connection has been fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock can now be eliminated: * the CCID is loaded, so it is not necessary to test if it is NULL, * if it is possible to load a CCID and leave the private area NULL, then this is a bug, which should crash loudly - and earlier, * the test for state==OPEN \|\| state==PARTOPEN now reduces only to the closing phase (e.g. when the node has received an unexpected Reset). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:33 +02:00
Gerrit Renker	842d1ef14f	dccp ccid-3: Remove ccid3hc{tx,rx}_ prefixes This patch does the same for CCID-3 as the previous patch for CCID-2: s#ccid3hctx_##g; s#ccid3hcrx_##g; plus manual editing to retain consistency. Please note: expanded the fields of the `struct tfrc_tx_info' in the hc_tx_sock, since using short #define identifiers is not a good idea. The only place where this embedded struct was used is ccid3_hc_tx_getsockopt(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:33 +02:00
Gerrit Renker	1fb8750960	dccp ccid-2: Remove ccid2hc{tx,rx}_ prefixes This patch fixes two problems caused by the ubiquitous long "hctx->ccid2htx_" and "hcrx->ccid2hcrx_" prefixes: * code becomes hard to read; * multiple-line statements are almost inevitable even for simple expressions; The prefixes are not really necessary (compare with "struct tcp_sock"). There had been previous discussion of this on dccp@vger, but so far this was not followed up (most people agreed that the prefixes are too long). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Leandro Melo de Sales <leandroal@gmail.com>	2008-09-04 07:45:33 +02:00
Gerrit Renker	88ddac513a	dccp: Special case of the MPS for client-PARTOPEN with DataAcks To increase robustness, it is necessary to resend Confirm feature-negotiation options, even though the RFC does not mandate it. But feature negotiation options can take (much) more room than the options on common DataAck packets. Instead of reducing the MPS always for a case which only applies to the three messages send during initial handshake, this patch devises a special case: if the payload length of the DataAck in PARTOPEN is too large, an Ack is sent to carry the options, and the feature-negotiation list is then flushed. This means that the server gets two Acks for one Response. If both Acks get lost, it is probably better to restart the connection anyway and devising yet another special-case does not seem worth the extra complexity. The patch (over-)estimates the expected overhead to be 32*4 bytes -- commonly seen values were 20-90 bytes for initial feature-negotiation options. It uses sizeof(u32) to mean "aligned units of 4 bytes". For consistency, another use of sizeof is modified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:33 +02:00
Gerrit Renker	55ebe3ab2d	dccp: Leave headroom for options when calculating the MPS The Maximum Packet Size (MPS) is of interest for applications which want to transfer data, so it is only relevant to the data transfer phase of a connection (unless one wants to send data on the DCCP-Request, but that is not considered here). The strategy chosen to deal with this requirement is to leave room for only such options that may appear on data packets. A special consideration applies to Ack Vectors: this is purely guesswork, since these can have any length between 3 and 1020 bytes. The strategy chosen here is to subtract a configurable minimum, the value of 16 bytes (2 bytes for type/length plus 14 Ack Vector cells) has been found by experimentatation. If people experience this as too much or too little, this could later be turned into a Kconfig option. There are currently no CCID-specific header options which may appear on data packets, hence it is not necessary to define a corresponding CCID field. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:33 +02:00
Gerrit Renker	2faae5587f	dccp ccid-2: Use feature-negotiation to report Ack Ratio changes This uses the new feature-negotiation framework to signal Ack Ratio changes, as required by RFC 4341, sec. 6.1.2. This raises some problems for CCID-2 since it can at the moment not cope gracefully with Ack Ratio of e.g. 2. A FIXME has thus been added which reverts to the existing policy of bypassing the Ack Ratio sysctl. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:32 +02:00
Gerrit Renker	4861a35443	dccp: Support for exchanging of NN options in established state This patch provides support for the reception of NN options in (PART)OPEN state. It is a combination of change_recv() and confirm_recv(), specifically geared towards receiving the `fast-path' NN options. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:32 +02:00
Gerrit Renker	624a965a93	dccp: Support for the exchange of NN options in established state In contrast to static feature negotiation at the begin of a connection, which establishes the capabilities of both endpoints, this patch introduces support for dynamic exchange of feature negotiation options. Such a dynamic exchange is necessary in at least two cases: * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection; * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as as the connection progresses". Both are NN (non-negotiable) features. Hence dynamic feature "negotiation" is distinguished from static/pre-connection negotiation by the following: * no new capabilities are negotiated (those that matter for the connection are negotiated prior to setting up the connection, comparable to SIP); * features must be understood by each endpoint: as per RFC 4340, 6.4, Sequence Window is "Req'd" and Ack Ratio must be understood when CCID-2 is used as per the note underneath Table 4. These characteristics are reflected in the implementation: * only NN options can be exchanged after connection setup; * NN options are activated directly after validating them. The rationale is that a peer must accept every valid NN value (RFC 4340, 6.3.2), hence it will either accept the value and send a "Confirm R", or it will send an empty Confirm (which will reset the connection according to FN rules). * An Ack is scheduled directly after activation to accelerate communicating the update to the peer. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:32 +02:00
Gerrit Renker	76f738a795	dccp: Debugging functions for feature negotiation Since all feature-negotiation processing now takes place in feat.c, functions for producing verbose debugging output are concentrated there. New functions to print out values, entry records, and options are provided, and also a macro is defined to not always have the function name in the output line. Thanks a lot to Wei Yongjun and Giuseppe Galeota for help with errors in an earlier revision of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:32 +02:00
Gerrit Renker	0a4822679d	dccp: Initialisation and type-checking of feature sysctls This patch takes care of initialising and type-checking sysctls related to feature negotiation. Type checking is important since some of the sysctls now directly act on the feature-negotiation process. The sysctls are initialised with the known default values for each feature. For the type-checking the value constraints from RFC 4340 are used: * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes), tested and confirmed that it works up to 4294967295 - for Gbps speed; * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer); * CCIDs are between 0 .. 255; * request_retries, retries1, retries2 also between 0..255 for good measure; * tx_qlen is checked to be non-negative; * sync_ratelimit remains as before. Further changes: ---------------- Performed s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:32 +02:00
Gerrit Renker	51c7d4fa26	dccp: Implement both feature-local and feature-remote Sequence Window feature This adds full support for local/remote Sequence Window feature, from which the * sequence-number-validity (W) and * acknowledgment-number-validity (W') windows derive as specified in RFC 4340, 7.5.3. Specifically, the following changes are introduced: * integrated new socket fields into dccp_sk; * updated the update_gsr/gss routines with regard to these fields; * updated handler code: the Sequence Window feature is located at the TX side, so the local feature is meant if the handler-rx flag is false; * the initialisation of `rcv_wnd' in reqsk is removed, since - rcv_wnd is not used by the code anywhere; - sequence number checks are not done in the LISTEN state (cf. 7.5.3); - dccp_check_req checks the Ack number validity more rigorously; * the `struct dccp_minisock' became empty and is now removed. Until the handshake completes with activating negotiated values, the local/remote Sequence-Window values are undefined and thus can not reliably be estimated. This issue is addressed in a separate patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:32 +02:00
Gerrit Renker	09856c1089	dccp: Auto-load (when supported) CCID plugins for negotiation This adds auto-loading of CCIDs (when module loading is enabled) for the purpose of feature negotiation. The problem with loading the CCIDs at the end of feature negotiation is that this would happen in software interrupt context. Besides, if the host advertises CCIDs during negotiation, it should have them ready to use, in case an agreeing peer wants to use it for the connection. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:31 +02:00
Gerrit Renker	5d3dac267a	dccp: Initialisation framework for feature negotiation This initialises feature negotiation from two tables, which are initialised from sysctls. As a novel feature, specifics of the implementation (e.g. currently short seqnos and ECN are not supported) are advertised for robustness. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:31 +02:00
Gerrit Renker	b235dc4abb	dccp ccid-2: Phase out the use of boolean Ack Vector sysctl This removes the use of the sysctl and the minisock variable for the Send Ack Vector feature, which is now handled fully dynamically via feature negotiation; i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per RFC 4341, 4.). Using a sysctl in parallel to this implementation would open the door to crashes, since much of the code relies on tests of the boolean minisock / sysctl variable. Thus, this patch replaces all tests of type if (dccp_msk(sk)->dccpms_send_ack_vector) /* ... / with if (dp->dccps_hc_rx_ackvec != NULL) / ... / The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature negotiation concluded that Ack Vectors are to be used on the half-connection. Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child), so that the test is a valid one. The activation handler for Ack Vectors is called as soon as the feature negotiation has concluded at the server when the Ack marking the transition RESPOND => OPEN arrives; * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN. Adding the sequence number of the Response packet to the Ack Vector has been removed, since (a) connection establishment implies that the Response has been received; (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e. this entry will always be ignored; (c) it can not be used for anything useful - to detect loss for instance, only packets received after the loss can serve as pseudo-dupacks. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:31 +02:00
Gerrit Renker	68e074bfce	dccp: Remove manual influence on NDP Count feature Updating the NDP count feature is handled automatically now: * for CCID-2 it is disabled, since the code does not use NDP counts; * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths. Allowing the user to change NDP values leads to unpredictable and failing behaviour, since it is then possible to disable NDP counts even when they are needed (e.g. in CCID-3). This means that only those user settings are sensible that agree with the values for Send NDP Count implied by the choice of CCID. But those settings are already activated by the feature negotiation (CCID dependency tracking), hence this form of support is redundant. At startup the initialisation of the NDP count feature is with the default value of 0, which is done implicitly by the zeroing-out of the socket when it is allocated. If the choice of CCID or feature negotiation enables NDP count, this will then be updated via the NDP activation handler. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:31 +02:00
Gerrit Renker	78673e24df	dccp: Remove obsolete parts of the old CCID interface The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector case, their value equals initially that of the sysctl, but at the end of feature negotiation may be something different. The old interface removed by this patch thus has been replaced by the newer interface to dynamically query the currently loaded CCIDs earlier in this patch set. Also removed the constructors for the TX CCID and the RX CCID, since the switch rx/non-rx is done by the handler in minisocks.c (and the handler is the only place in the code where CCIDs are loaded). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:31 +02:00
Gerrit Renker	23479cbfd3	dccp: Clean up old feature-negotiation infrastructure The code removed by this patch is no longer referenced or used, the added lines update documentation and copyrights. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	c49b22729f	dccp: Integration of dynamic feature activation - part 3 (client side) This integrates feature-activation in the client, with these details: 1. When dccp_parse_options() fails, the reset code is already set, request_sent _state_process() currently overrides this with `Packet Error', which is not intended - so changed to use the reset code set in dccp_parse_options(); 2. There was a FIXME to change the error code when dccp_ackvec_add() fails. I have looked this up and found that: * the check whether ackno < ISN is already made earlier, * this Response is likely the 1st packet with an Ackno that the client gets, * so when dccp_ackvec_add() fails, the reason is likely not a packet error. 3. When feature negotiation fails, the socket should be marked as not usable, so that the application is notified that an error occurs. This is achieved by a new label, which uses an error code of `Aborted' and which sets the socket state to CLOSED, as well as sk_err. 4. Avoids parsing the Ack twice in Respond state by not doing option processing again in dccp_rcv_respond_partopen_state_process (as option processing has already been done on the request_sock in dccp_check_req). Since this addresses congestion-control initialisation, a corresponding FIXME has been removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	e70cacb90d	dccp: Integration of dynamic feature activation - part 2 (server side) This patch integrates the activation of features at the end of negotiation into the server-side code. Note: In dccp_create_openreq_child the request_sock argument is no longer constant, since dccp_activate_values() uses the feature-negotiation list on dreq to sort out the initialisation values for the different features of the child socket; and purges this queue after use (but the `req' argument to openreq_child can and does still remain constant). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	3a53a9adfa	dccp: Integration of dynamic feature activation - part 1 (socket setup) This first patch out of three replaces the hardcoded default settings with initialisation code for the dynamic feature negotiation. Note on retransmitting Confirm options: --------------------------------------- This patch also defers flushing the client feature-negotiation queue, due to the following considerations. As long as the client is in PARTOPEN, it needs to retransmit the Confirm options for the Change options received on the DCCP-Response from the server. Otherwise, if the packet containing the Confirm options gets dropped in the network, the connection aborts due to undefined feature negotiation state. Thanks to Leandro Melo de Sales who reported a bug in an earlier revision of the patch set, resulting from not retransmitting the Confirm options. The patch now ensures that the client feature-negotiation queue is flushed only when entering the OPEN state. Since confirmed Change options are removed as soon as they are confirmed (in the DCCP-Response), this ensures that Confirm options are retransmitted. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	c926c6aed3	dccp: Feature activation handlers This patch provides the post-processing of feature negotiation state, after the negotiation has completed. To this purpose, handlers are used and added to the dccp_feat_table. Each handler is passed a boolean flag whether the RX or TX side of the feature is meant. Several handlers are provided already, new handlers can easily be added. The initialisation is now fully dynamic, i.e. CCIDs are activated only after the feature negotiation. The integration of this dynamic activation is done in the subsequent patches. Thanks to Wei Yongjun for pointing out the necessity of skipping over empty Confirm options while copying the negotiated feature values. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	d2150b7bff	dccp: Processing Confirm options Analogous to the previous patch, this adds code to interpret incoming Confirm feature-negotiation options. Both functions operate on the feature-negotiation list of either the request_sock (server) or the dccp_sock (client). Thanks to Wei Yongjun for pointing out that it is overly restrictive to check the entire list of confirmed SP values. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	5a146b97d5	dccp: Process incoming Change feature-negotiation options This adds/replaces code for processing incoming ChangeL/R options. The main difference is that: * mandatory FN options are now interpreted inside the function (there are too many individual cases to do this externally); * the function returns an appropriate Reset code or 0, which is then used to fill in the data for the Reset packet. Old code, which is no longer used or referenced, has been removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:29 +02:00
Gerrit Renker	c664d4f4e2	dccp: Preference list reconciliation This provides two functions to * reconcile preference lists (with appropriate return codes) and * reorder the preference list if successful reconciliation changed the preferred value. The patch also removes the old code for processing SP/NN Change options, since new code to process these is mostly there already; related references have been commented out. The code for processing Change options follows in the next patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	f8a644c07e	dccp: Integrate feature-negotiation insertion code The patch implements insertion of feature negotiation at the server (listening and request socket) and the client (connecting socket). In dccp_insert_options(), several statements have been grouped together now to achieve (I hope) better efficiency by reducing the number of tests each packet has to go through: - Ack Vectors are sent if the packet is neither a Data or a Request packet; - a previous issue is corrected - feature negotiation options are allowed on DataAck packets (5.8). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	0ef118a017	dccp: Insert feature-negotiation options into skb This patch replaces the earlier insertion routine from options.c, so that code specific to feature negotiation can remain in feat.c. This is possible by calling a function already existing in options.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	cf9ddf73b9	dccp: Header option insertion routine for feature-negotiation The patch extends existing code: * Confirm options divide into the confirmed value plus an optional preference list for SP values. Previously only the preference list was echoed for SP values, now the confirmed value is added as per RFC 4340, 6.1; * length and sanity checks are added to avoid illegal memory (or NULL) access. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	d0440ee6f6	dccp: Support for Mandatory options Support for Mandatory options is provided by this patch, which will be used by subsequent feature-negotiation patches. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2008-09-04 07:45:28 +02:00
Gerrit Renker	b9aaac1c53	dccp: Increase the scope of variable-length htonl/ntohl functions This extends the scope of two available functions, encode\|decode_value_var, to work up to 6 (8) bytes, to match maximum requirements in the RFC. These functions are going to be used both by general option processing and feature negotiation code, hence declarations have been put into feat.h. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2008-09-04 07:45:28 +02:00
Gerrit Renker	c8041e264b	dccp: API to query the current TX/RX CCID This provides function to query the current TX/RX CCID dynamically, without reliance on the minisock value, using dynamic information available in the currently loaded CCID module. This query function is then used to (a) provide the getsockopt part for getting/setting CCIDs via sockopts; (b) replace the current test for "which CCID is in use" in probe.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:28 +02:00
Gerrit Renker	fade756f18	dccp: Set per-connection CCIDs via socket options With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which overrides the defaults set by the global sysctl variables for TX/RX CCIDs. To make full use of this facility, the remaining patches of this patch set are needed, which track dependencies and activate negotiated feature values. Note on the maximum number of CCIDs that can be registered: ----------------------------------------------------------- The maximum number of CCIDs that can be registered on the socket is constrained by the space in a Confirm/Change feature negotiation option. The space in these in turn depends on the size of header options as defined in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from ackvec.h into linux/dccp.h, clarifying its purpose. Relative to this size, the maximum number of CCID identifiers that can be present in a Confirm option (which always consumes 1 byte more than a Change option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the CCID-feature-type and one for the selected value. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:28 +02:00
Gerrit Renker	73bbe095bb	dccp: Tidy up setsockopt calls This splits the setsockopt calls into two groups, depending on whether an integer argument (val) is required and whether routines being called do their own locking. Some options (such as setting the CCID) use u8 rather than int, so that for these the test with regard to integer-sizeof can not be used. The second switch-case statement now only has those statements which need locking and which make use of `val'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>	2008-09-04 07:45:28 +02:00
Gerrit Renker	17c30b40ed	dccp: Deprecate Ack Ratio sysctl This patch deprecates the Ack Ratio sysctl, since * Ack Ratio is entirely ignored by CCID-3 and CCID-4, * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1); * even if it would work in CCID-2, there is no point for a user to change it: - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2), - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts (since waiting for Acks which will never arrive in this window), - cwnd is not a user-configurable value. The only reasonable place for Ack Ratio is to print it for debugging. It is planned to do this later on, as part of e.g. dccp_probe. With this patch Ack Ratio is now under full control of feature negotiation: * Ack Ratio is resolved as a dependency of the selected CCID; * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to the default of 2, following RFC 4340, 11.3 - "New connections start with Ack Ratio 2 for both endpoints"; * what happens then is part of another patch set, since it concerns the dynamic update of Ack Ratio while the connection is in full flight. Thanks to Tomasz Grobelny for discussion leading up to this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2008-09-04 07:45:28 +02:00
Gerrit Renker	20f41eee82	dccp: Feature negotiation for minimum-checksum-coverage This provides feature negotiation for server minimum checksum coverage which so far has been missing. Since sender/receiver coverage values range only from 0...15, their type has also been reduced in size from u16 to u4. Feature-negotiation options are now generated for both sender and receiver coverage, i.e. when the peer has `forgotten' to enable partial coverage then feature negotiation will automatically enable (negotiate) the partial coverage value for this connection. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	668144f7b4	dccp: Deprecate old setsockopt framework The previous setsockopt interface, which passed socket options via struct dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to ugly code since the old approach did not distinguish between NN and SP values. This patch removes the old setsockopt interface and replaces it with two new functions to register NN/SP values for feature negotiation. These are essentially wrappers around the internal __feat_register functions, with checking added to avoid * wrong usage (type); * changing values while the connection is in progress. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:27 +02:00
Gerrit Renker	d4c8741c43	dccp: Mechanism to resolve CCID dependencies This adds a hook to resolve features whose value depends on the choice of CCID. It is done at the server since it can only be done after the CCID values have been negotiated; i.e. the client will add its CCID preference list on the Change options sent in the Request, which will be reconciled with the local preference list of the server. The concept is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html#ccid_dependencies Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	093e1f46cf	dccp: Resolve dependencies of features on choice of CCID This provides a missing link in the code chain, as several features implicitly depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector feature, but also Ack Ratio and Send Loss Event Rate (also taken care of). For Send Ack Vector, the situation is as follows: * since CCID2 mandates the use of Ack Vectors, there is no point in allowing endpoints which use CCID2 to disable Ack Vector features such a connection; * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4); * for all other CCIDs, the use of (Send) Ack Vector is optional and thus negotiable. However, this implies that the code negotiating the use of Ack Vectors also supports it (i.e. is able to supply and to either parse or ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack Vector support), the use of Ack Vectors is here disabled, with a comment in the source code. An analogous consideration arises for the Send Loss Event Rate feature, since the CCID-3 implementation does not support the loss interval options of RFC 4342. To make such use explicit, corresponding feature-negotiation options are inserted which signal the use of the loss event rate option, as it is used by the CCID3 code. Lastly, the values of the Ack Ratio feature are matched to the choice of CCID. The patch implements this as a function which is called after the user has made all other registrations for changing default values of features. The table is variable-length, the reserved (and hence for feature-negotiation invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0' is used to mark the end of the table. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	71bb49596b	dccp: Query supported CCIDs This provides a data structure to record which CCIDs are locally supported and three accessor functions: - a test function for internal use which is used to validate CCID requests made by the user; - a copy function so that the list can be used for feature-negotiation; - documented getsockopt() support so that the user can query capabilities. The data structure is a table which is filled in at compile-time with the list of available CCIDs (which in turn depends on the Kconfig choices). Using the copy function for cloning the list of supported CCIDs is useful for feature negotiation, since the negotiation is now with the full list of available CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation will not fail if the peer requests to use CCID3 instead of CCID2. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	86349c8d9c	dccp: Registration routines for changing feature values Two registration routines, for SP and NN features, are provided by this patch, replacing a previous routine which was used for both feature types. These are internal-only routines and therefore start with `__feat_register'. It further exports the known limits of Sequence Window and Ack Ratio as symbolic constants. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	5591d28628	dccp: Limit feature negotiation to connection setup phase This patch starts the new implementation of feature negotiation: 1. Although it is theoretically possible to perform feature negotiation at any time (and RFC 4340 supports this), in practice this is prohibitively complex, as it requires to put traffic on hold for each new negotiation. 2. As a byproduct of restricting feature negotiation to connection setup, the feature-negotiation retransmit timer is no longer required. This part is now mapped onto the protocol-level retransmission. Details indicating why timers are no longer needed can be found on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html This patch disables anytime negotiation, subsequent patches work out full feature negotiation support for connection setup. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:27 +02:00
Gerrit Renker	702083839b	dccp: Cleanup routines for feature negotiation This inserts the required de-allocation routines for memory allocated by feature negotiation in the socket destructors, replacing dccp_feat_clean() in one instance. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	828755cee0	dccp: Per-socket initialisation of feature negotiation This provides feature-negotiation initialisation for both DCCP sockets and DCCP request_sockets, to support feature negotiation during connection setup. It also resolves a FIXME regarding the congestion control initialisation. Thanks to Wei Yongjun for help with the IPv6 side of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	3001fc0569	dccp: List management for new feature negotiation This adds list fields and list management functions for the new feature negotiation implementation. The new code is kept in parallel to the old code, until removed at the end of the patch set. Thanks to Arnaldo for suggestions to improve the code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	b4eec20637	dccp: Implement lookup table for feature-negotiation information A lookup table for feature-negotiation information, extracted from RFC 4340/42, is provided by this patch. All currently known features can be found in this table, along with their feature location, their default value, and type. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	5c7c9451f1	dccp: Basic data structure for feature negotiation This patch prepares for the new and extended feature-negotiation routines. The following feature-negotiation data structures are provided: * a container for the various (SP or NN) values, * symbolic state names to track feature states, * an entry struct which holds all current information together, * elementary functions to fill in and process these structures. Entry structs are arranged as FIFO for the following reason: RFC 4340 specifies that if multiple options of the same type are present, they are processed in the order of their appearance in the packet; which means that this order needs to be preserved in the local data structure (the later insertion code also respects this order). The struct list_head has been chosen for the following reasons: the most frequent operations are * add new entry at tail (when receiving Change or setting socket options); * delete entry (when Confirm has been received); * deep copy of entire list (cloning from listening socket onto request socket). The NN value has been set to 64 bit, which is a currently sufficient upper limit (Sequence Window feature has 48 bit). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	959fd992f0	dccp ccid-3: Replace lazy BUG_ON with condition The BUG_ON(w_tot == 0) only holds if there is no more than 1 loss interval in the loss history. If there is only a single loss interval, the calc_i_mean() routine need in fact not be called (RFC 3448, 6.3.1). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	432649916b	dccp: Toggle debug output without module unloading This sets the sysfs permissions so that root can toggle the `debug' parameter available for nearly every DCCP module. This is useful since there are various module inter-dependencies. The debug flag can now be toggled at runtime using echo 1 > /sys/module/dccp/parameters/dccp_debug echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug The last is not very useful yet, since no code at the moment calls the tfrc_debug() macro. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	48816322ad	dccp: Empty the write queue when disconnecting dccp_disconnect() can be called due to several reasons: 1. when the connection setup failed (inet_stream_connect()); 2. when shutting down (inet_shutdown(), inet_csk_listen_stop()); 3. when aborting the connection (dccp_close() with 0 linger time). In case (1) the write queue is empty. This patch empties the write queue, if in case (2) or (3) it was not yet empty. This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues() later on. It also seems natural to do: when breaking an association, to delete all packets that were originally intended for the soon-disconnected end (compare with call to tcp_write_queue_purge in tcp_disconnect()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	eac7726bf5	dccp: Fill in the Data fields for "Option Error" Resets This updates the use of the `out_invalid_option' label, which produces a Reset (code 5, "Option Error"), to fill in the Data1...Data3 fields as specified in RFC 4340, 5.6. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	faf61c3319	dccp: Silently ignore options with nonsensical lengths This updates the option-parsing code with regard to RFC 4340, 5.8: "[..] options with nonsensical lengths (length byte less than two or more than the remaining space in the options portion of the header) MUST be ignored, and any option space following an option with nonsensical length MUST likewise be ignored." Hence in the following cases erratic options will be ignored: 1. The type byte of a multi-byte option is the last byte of the header options (i.e. effective option length of 1). 2. The value of the length byte is less than the minimum 2. This has been changed from previously 3: although no multi-byte option with a length less than 3 yet exists (cf. table 3 in 5.8), a length of 2 is valid. (The switch-statement in dccp_parse has further per-option length checks.) 3. The option length exceeds the length of the remaining option space. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:24 +02:00
Wei Yongjun	ba1a6c7bc0	dccp: Always generate a Reset in response to option errors RFC4340 states that if a packet is received with an option error (such as a Mandatory Option as the last byte of the option list), the endpoint should repond with a Reset. In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset, while in the REQUEST/OPEN states, packets with option errors are just ignored. The packet sequence is as follows: Case 1: Endpoint A Endpoint B (CLOSED) (CLOSED) <---------------- REQUEST RESPONSE -----------------> (1) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (1) currently just ignored, no Reset is sent Case 2: Endpoint A Endpoint B (OPEN) (OPEN) DATA-ACK -----------------> (2) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (2) currently just ignored, no Reset is sent This patch fixes the problem, by generating a Reset instead of silently ignoring option errors. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:24 +02:00
Linus Torvalds	316343e2cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: bnx2x: Accessing un-mapped page ath9k: Fix TX control flag use for no ACK and RTS/CTS ath9k: Fix TX status reporting iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove iwlwifi: call apm stop on exit iwlwifi: fix Tx cmd memory allocation failure handling iwlwifi: fix rx_chain computation iwlwifi: fix station mimo power save values iwlwifi: remove false rxon if rx chain changes iwlwifi: fix hidden ssid discovery in passive channels iwlwifi: W/A for the TSF correction in IBSS netxen: Remove workaround for chipset quirk pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info ixgbe: initialize interrupt throttle rate net/usb/pegasus: avoid hundreds of diagnostics tipc: Don't use structure names which easily globally conflict.	2008-09-03 16:21:02 -07:00
David S. Miller	6c00055a81	tipc: Don't use structure names which easily globally conflict. Andrew Morton reported a build failure on sparc32, because TIPC uses names like "struct node" and there is a like named data structure defined in linux/node.h This just regexp replaces "struct node" to "struct tipc_node" to avoid this and any future similar problems. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 23:38:32 -07:00
Linus Torvalds	d26acd92fa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipsec: Fix deadlock in xfrm_state management. ipv: Re-enable IP when MTU > 68 net/xfrm: Use an IS_ERR test rather than a NULL test ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message. ath9k: Incorrect key used when group and pairwise ciphers are different. rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON mac80211: Fix debugfs union misuse and pointer corruption wireless/libertas/if_cs.c: fix memory leaks orinoco: Multicast to the specified addresses iwlwifi: fix 64bit platform firmware loading iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE) iwlwifi: workaround interrupt handling no some platforms iwlwifi: do not use GFP_DMA in iwl_tx_queue_init net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS net: Unbreak userspace usage of linux/mroute.h pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock() ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0	2008-09-02 21:02:14 -07:00
David S. Miller	37b08e34a9	ipsec: Fix deadlock in xfrm_state management. Ever since commit `4c563f7669` ("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is illegal to call __xfrm_state_destroy (and thus xfrm_state_put()) with xfrm_state_lock held. If we do, we'll deadlock since we have the lock already and __xfrm_state_destroy() tries to take it again. Fix this by pushing the xfrm_state_put() calls after the lock is dropped. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 20:14:15 -07:00
Thomas Graf	2c10b32bf5	netlink: Remove compat API for nested attributes Removes all _nested_compat() functions from the API. The prio qdisc no longer requires them and netem has its own format anyway. Their existance is only confusing. Resend: Also remove the wrapper macro. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:30:27 -07:00
Breno Leitao	06770843c2	ipv: Re-enable IP when MTU > 68 Re-enable IP when the MTU gets back to a valid size. This patch just checks if the in_dev is NULL on a NETDEV_CHANGEMTU event and if MTU is valid (bigger than 68), then re-enable in_dev. Also a function that checks valid MTU size was created. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:28:58 -07:00
Julien Brunel	9d7d74029e	net/xfrm: Use an IS_ERR test rather than a NULL test In case of error, the function xfrm_bundle_create returns an ERR pointer, but never returns a NULL pointer. So a NULL test that comes after an IS_ERR test should be deleted. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @match_bad_null_test@ expression x, E; statement S1,S2; @@ x = xfrm_bundle_create(...) ... when != x = E * if (x != NULL) S1 else S2 // </smpl> Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:24:28 -07:00
Jouni Malinen	2b58b20939	mac80211: Fix debugfs union misuse and pointer corruption debugfs union in struct ieee80211_sub_if_data is misused by including a common default_key dentry as a union member. This ends occupying the same memory area with the first dentry in other union members (structures; usually drop_unencrypted). Consequently, debugfs operations on default_key symlinks and drop_unencrypted entry are using the same dentry pointer even though they are supposed to be separate ones. This can lead to removing entries incorrectly or potentially leaving something behind since one of the dentry pointers gets lost. Fix this by moving the default_key dentry to a new struct (common_debugfs) that contains dentries (more to be added in future) that are shared by all vif types. The debugfs union must only be used for vif type-specific entries to avoid this type of pointer corruption. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-09-02 17:39:50 -04:00
Florian Mickler	d9664741e0	net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS Current setup with hal and NetworkManager will fail to work without newest hal version with this config option disabled. Although this will solve itself by time, at the moment it is dishonest to say that we don't know any software that uses it, if there are many many people relying on old hal versions. Signed-off-by: Florian Mickler <florian@mickler.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-09-02 15:03:19 -04:00
Linus Torvalds	e77295dc9e	Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux * 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: nfsd: fix buffer overrun decoding NFSv4 acl sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports nfsd: fix compound state allocation error handling svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet	2008-09-02 10:58:11 -07:00
Cyrill Gorcunov	27df6f25ff	sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports Vegard Nossum reported ---------------------- > I noticed that something weird is going on with /proc/sys/sunrpc/transports. > This file is generated in net/sunrpc/sysctl.c, function proc_do_xprt(). When > I "cat" this file, I get the expected output: > $ cat /proc/sys/sunrpc/transports > tcp 1048576 > udp 32768 > But I think that it does not check the length of the buffer supplied by > userspace to read(). With my original program, I found that the stack was > being overwritten by the characters above, even when the length given to > read() was just 1. David Wagner added (among other things) that copy_to_user could be probably used here. Ingo Oeser suggested to use simple_read_from_buffer() here. The conclusion is that proc_do_xprt doesn't check for userside buffer size indeed so fix this by using Ingo's suggestion. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> CC: Ingo Oeser <ioe-lkml@rameria.de> Cc: Neil Brown <neilb@suse.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Greg Banks <gnb@sgi.com> Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-09-01 14:24:24 -04:00
David S. Miller	b171e19ed0	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/mlme.c	2008-08-29 23:06:00 -07:00
Jarek Poplawski	102396ae65	pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock() Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where appropriate. The only difference is while dev is deactivated, when currently we can use a sleeping qdisc with the lock of noop_qdisc. This shouldn't be dangerous since after deactivation root lock could be used only by gen_estimator code, but looks wrong anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:52 -07:00
Yang Hongyang	3cc76caa98	ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0 Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:51 -07:00
David S. Miller	143b11c03c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2008-08-29 14:02:13 -07:00
Henrique de Moraes Holschuh	1563574448	rfkill: rename rfkill_mutex to rfkill_global_mutex rfkill_mutex and rfkill->mutex are too easy to confuse with each other. Rename rfkill_mutex to rfkill_global_mutex, so that they are easier to tell apart with just one glance. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:11 -04:00
Henrique de Moraes Holschuh	f745ba03a1	rfkill: add WARN and BUG_ON paranoia (v2) BUG_ON() and WARN() the heck out of buggy drivers calling into the rfkill subsystem. Also switch from WARN_ON(1) to the new descriptive WARN(). Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:10 -04:00
Felipe Balbi	01b510b9c2	rfkill: add missing line break Trivial patch adding a missing line break on rfkill_claim_show(). Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.co> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:10 -04:00
Henrique de Moraes Holschuh	849e0576a7	rfkill: use strict_strtoul (v2) Switch sysfs parsing to something that actually works properly. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:10 -04:00
Jouni Malinen	36aedc903e	mac80211/cfg80211: HT capabilities for NEW_STA Allow userspace (e.g., hostapd) to set HT capabilities for associated STAs. This is based on a patch from Zhu Yi <yi.zhu@intel.com> (only the NL80211_ATTR_HT_CAPABILITY for NEW_STA part is included here). Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:09 -04:00
Daniel Wagner	2f58bbf27f	mac80211: Use only precedence level of DSCP field for frame classification Bit 4-5 of DSCP should not be considered by classify_d1. The 802.11 QoS Priority field is only depending on the precedence level. Signed-off-by: Daniel Wagner <wagi@monom.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:05 -04:00
Jouni Malinen	43ac2ca384	mac80211: Handle scan result IEs in one block Clean up and extend scan result processing by storing all the IEs from Beacon/Probe Response frames in a single block instead of allocating memory for each specific IE separately. This removes lot of unnecessary code and automatically supports reporting of new IEs (e.g., IEEE 802.11r) into user space without need to manually extend mac80211 scanning code whenever a new protocol adds IE(s). Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:05 -04:00
Jouni Malinen	9f1ba9062e	mac80211/cfg80211: Add BSS configuration options for AP mode This change adds a new cfg80211 command, NL80211_CMD_SET_BSS, to allow AP mode BSS parameters to be changed from user space (e.g., hostapd). The drivers using mac80211 are expected to be modified with separate changes to use the new BSS info parameter for short slot time in the bss_info_changed() handler. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:23:55 -04:00
Johannes Berg	7f93ea3e24	mac80211: fill start-sequence-number for BA session start Otherwise, drivers are required to keep track of the sequence numbers themselves, and they really shouldn't be since we already do it for them. I'll fix the race once we figure out how this code should work at all, it's currently disabled. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:23:55 -04:00
Eric Dumazet	a627266570	ip: speedup /proc/net/rt_cache handling When scanning route cache hash table, we can avoid taking locks for empty buckets. Both /proc/net/rt_cache and NETLINK RTM_GETROUTE interface are taken into account. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-28 01:11:25 -07:00
Andi Kleen	6be547a61d	inet_diag: Add empty bucket optimization to inet_diag too Skip quickly over empty buckets in inet_diag. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-28 01:09:54 -07:00
Andi Kleen	6eac560407	tcp: Skip empty hash buckets faster in /proc/net/tcp On most systems most of the TCP established/time-wait hash buckets are empty. When walking the hash table for /proc/net/tcp their read locks would always be aquired just to find out they're empty. This patch changes the code to check first if the buckets have any entries before taking the lock, which is much cheaper than taking a lock. Since the hash tables are large this makes a measurable difference on processing /proc/net/tcp, especially on architectures with slow read_lock (e.g. PPC) On a 2GB Core2 system time cat /proc/net/tcp > /dev/null (with a mostly empty hash table) goes from 0.046s to 0.005s. On systems with slower atomics (like P4 or POWER4) or larger hash tables (more RAM) the difference is much higher. This can be noticeable because there are some daemons around who regularly scan /proc/net/tcp. Original idea for this patch from Marcus Meissner, but redone by me. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-28 01:08:02 -07:00
Vlad Yasevich	d97240552c	sctp: fix random memory dereference with SCTP_HMAC_IDENT option. The number of identifiers needs to be checked against the option length. Also, the identifier index provided needs to be verified to make sure that it doesn't exceed the bounds of the array. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 16:09:49 -07:00
Vlad Yasevich	328fc47ea0	sctp: correct bounds check in sctp_setsockopt_auth_key The bonds check to prevent buffer overlflow was not exactly right. It still allowed overflow of up to 8 bytes which is sizeof(struct sctp_authkey). Since optlen is already checked against the size of that struct, we are guaranteed not to cause interger overflow either. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 16:08:54 -07:00
David S. Miller	4d40555250	Merge branch 'lvs-next-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6	2008-08-27 05:11:26 -07:00
David S. Miller	6c36810a73	Merge branch 'no-iwlwifi' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-27 04:29:50 -07:00
Hugh Dickins	d994af0d50	ipv4: mode 0555 in ipv4_skeleton vpnc on today's kernel says Cannot open "/proc/sys/net/ipv4/route/flush": d--------- 0 root root 0 2008-08-26 11:32 /proc/sys/net/ipv4/route d--------- 0 root root 0 2008-08-26 19:16 /proc/sys/net/ipv4/neigh Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:35:18 -07:00
Philip Love	7982d5e1b3	tcp: fix tcp header size miscalculation when window scale is unused The size of the TCP header is miscalculated when the window scale ends up being 0. Additionally, this can be induced by sending a SYN to a passive open port with a window scale option with value 0. Signed-off-by: Philip Love <love_phil@emc.com> Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:33:50 -07:00
Jarek Poplawski	f6f9b93f16	pkt_sched: Fix gen_estimator locks While passing a qdisc root lock to gen_new_estimator() and gen_replace_estimator() dev could be deactivated or even before grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so using qdisc_root_lock() is not enough. This patch adds qdisc_root_sleeping_lock() for this, plus additional checks, where necessary. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:25:17 -07:00
Jarek Poplawski	f7a54c13c7	pkt_sched: Use rcu_assign_pointer() to change dev_queue->qdisc These pointers are RCU protected, so proper primitives should be used. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:22:07 -07:00
Jarek Poplawski	666d9bbedf	pkt_sched: Fix dev_graft_qdisc() locking During dev_graft_qdisc() dev is deactivated, so qdisc_root_lock() returns wrong lock of noop_qdisc instead of qdisc_sleeping. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:15:20 -07:00
Gerrit Renker	eff253c427	dccp ccid-3: Replace lazy BUG_ON with condition The BUG_ON(w_tot == 0) only holds if there is no more than 1 loss interval in the loss history. If there is only a single loss interval, the calc_i_mean() routine need in fact not be called (RFC 3448, 6.3.1). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	157439fa4a	dccp: Toggle debug output without module unloading This sets the sysfs permissions so that root can toggle the `debug' parameter available for nearly every DCCP module. This is useful since there are various module inter-dependencies. The debug flag can now be toggled at runtime using echo 1 > /sys/module/dccp/parameters/dccp_debug echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug The last is not very useful yet, since no code at the moment calls the tfrc_debug() macro. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	b569d5a134	dccp: Empty the write queue when disconnecting dccp_disconnect() can be called due to several reasons: 1. when the connection setup failed (inet_stream_connect()); 2. when shutting down (inet_shutdown(), inet_csk_listen_stop()); 3. when aborting the connection (dccp_close() with 0 linger time). In case (1) the write queue is empty. This patch empties the write queue, if in case (2) or (3) it was not yet empty. This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues() later on. It also seems natural to do: when breaking an association, to delete all packets that were originally intended for the soon-disconnected end (compare with call to tcp_write_queue_purge in tcp_disconnect()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	5a056417e6	dccp: Fill in the Data fields for "Option Error" Resets This updates the use of the `out_invalid_option' label, which produces a Reset (code 5, "Option Error"), to fill in the Data1...Data3 fields as specified in RFC 4340, 5.6. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	1efa6bbac8	dccp: Silently ignore options with nonsensical lengths This updates the option-parsing code with regard to RFC 4340, 5.8: "[..] options with nonsensical lengths (length byte less than two or more than the remaining space in the options portion of the header) MUST be ignored, and any option space following an option with nonsensical length MUST likewise be ignored." Hence in the following cases erratic options will be ignored: 1. The type byte of a multi-byte option is the last byte of the header options (i.e. effective option length of 1). 2. The value of the length byte is less than the minimum 2. This has been changed from previously 3: although no multi-byte option with a length less than 3 yet exists (cf. table 3 in 5.8), a length of 2 is valid. (The switch-statement in dccp_parse has further per-option length checks.) 3. The option length exceeds the length of the remaining option space. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:21:59 +02:00
Wei Yongjun	33c449675c	dccp: Always generate a Reset in response to option errors RFC4340 states that if a packet is received with an option error (such as a Mandatory Option as the last byte of the option list), the endpoint should repond with a Reset. In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset, while in the REQUEST/OPEN states, packets with option errors are just ignored. The packet sequence is as follows: Case 1: Endpoint A Endpoint B (CLOSED) (CLOSED) <---------------- REQUEST RESPONSE -----------------> (1) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (1) currently just ignored, no Reset is sent Case 2: Endpoint A Endpoint B (OPEN) (OPEN) DATA-ACK -----------------> (2) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (2) currently just ignored, no Reset is sent This patch fixes the problem, by generating a Reset instead of silently ignoring option errors. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:21:59 +02:00
Simon Horman	7fd1067851	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6 into lvs-next-2.6	2008-08-27 15:11:37 +10:00
Julius Volz	e3c2ced8d2	IPVS: Rename ip_vs_proto_ah.c to ip_vs_proto_ah_esp.c After integrating ESP into ip_vs_proto_ah, rename it (and the references to it) to ip_vs_proto_ah_esp.c and delete the old ip_vs_proto_esp.c. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-27 13:50:37 +10:00
Julius Volz	409a19669e	IPVS: Integrate ESP protocol into ip_vs_proto_ah.c Rename all ah_* functions to ah_esp_* (and adjust comments). Move ESP protocol definition into ip_vs_proto_ah.c and remove all usage of ip_vs_proto_esp.c. Make the compilation of ip_vs_proto_ah.c dependent on a new config variable, IP_VS_PROTO_AH_ESP, which is selected either by IP_VS_PROTO_ESP or IP_VS_PROTO_AH. Only compile the selected protocols' structures within this file. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-27 13:50:35 +10:00
John W. Linville	576fdeaef6	mac80211: quiet chatty IBSS merge message It seems obvious that this #ifndef should be the opposite polarity... Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:33:34 -04:00
Jan-Espen Pettersen	8ab65b03b7	mac80211: don't send empty extended rates IE The association request includes a list of supported data rates. 802.11b: 4 supported rates. 802.11g: 12 (8 + 4) supported rates. 802.11a: 8 supported rates. The rates tag of the assoc request has room for only 8 rates. In case of 802.11g an extended rate tag is appended. However in net/wireless/mlme.c an extended (empty) rate tag is also appended if the number of rates is exact 8. This empty (length=0) extended rates tag causes some APs to deny association with code 18 (unsupported rates). These APs include my ZyXEL G-570U, and according to Tomas Winkler som Cisco APs. 'If count == 8' has been used to check for the need for an extended rates tag. But count would also be equal to 8 if the for loop exited because of no more supported rates. Therefore a check for count being less than rates_len would seem more correct. Thanks to: * Dan Williams for newbie guidance * Tomas Winkler for confirming the problem Signed-off-by: Jan-Espen Pettersen <sigsegv@radiotube.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:33 -04:00
Jouni Malinen	93015f0f34	mac80211: Fix debugfs file add/del for netdev Previous version was using incorrect union structures for non-AP interfaces when adding and removing max_ratectrl_rateidx and force_unicast_rateidx entries. Depending on the vif type, this ended up in corrupting debugfs entries since the dentries inside different union structures ended up going being on top of eachother.. As the end result, debugfs files were being left behind with references to freed data (instant kernel oops on access) and directories were not removed properly when unloading mac80211 drivers. This patch fixes those issues by using only a single union structure based on the vif type. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:33 -04:00
Julia Lawall	667d8af9af	net/mac80211/mesh.c: correct the argument to __mesh_table_free In the function mesh_table_grow, it is the new table not the argument table that should be freed if the function fails (cf commit `bd9b448f4c`) The semantic match that detects this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; expression E,f; position p1,p2,p3; identifier l; statement S; @@ x = mesh_table_alloc@p1(...) ... if (x == NULL) S ... when != E = x when != mesh_table_free(x) goto@p2 l; ... when != E = x when != f(...,x,...) when any ( return $0\\|x$; \| return@p3 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; p3 << r.p3; @@ print "%s: call on line %s not freed or saved before return on line %s via line %s" % (p1[0].file,p1[0].line,p3[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:32 -04:00
Jouni Malinen	087d833e5a	mac80211: Use IWEVASSOCREQIE instead of IWEVCUSTOM The previous code was using IWEVCUSTOM to report IEs from AssocReq and AssocResp frames into user space. This can easily hit the 256 byte limit (IW_CUSTOM_MAX) with APs that include number of vendor IEs in AssocResp. This results in the event message not being sent and dmesg showing "wlan0 (WE) : Wireless Event too big (366)" type of errors. Convert mac80211 to use IWEVASSOCREQIE/IWEVASSOCRESPIE to avoid the issue of being unable to send association IEs as wireless events. These newer event types use binary encoding and larger maximum size (IW_GENERIC_IE_MAX = 1024), so the likelyhood of not being able to send the IEs is much smaller than with IWEVCUSTOM. As an extra benefit, the code is also quite a bit simpler since there is no need to allocate an extra buffer for hex encoding. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:32 -04:00
Felipe Balbi	988b02f1bf	net: rfkill: add missing line break Trivial patch adding a missing line break on rfkill_claim_show(). Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.co> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:31 -04:00
Al Viro	ce3113ec57	ipv6: sysctl fixes Braino: net.ipv6 in ipv6 skeleton has no business in rotable class Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-25 15:18:15 -07:00
Al Viro	2f4520d35d	ipv4: sysctl fixes net.ipv4.neigh should be a part of skeleton to avoid ordering problems Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-25 15:17:44 -07:00
Vlad Yasevich	30c2235cbc	sctp: add verification checks to SCTP_AUTH_KEY option The structure used for SCTP_AUTH_KEY option contains a length that needs to be verfied to prevent buffer overflow conditions. Spoted by Eugene Teo <eteo@redhat.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-25 15:16:19 -07:00
Stephen Hemminger	f410a1fba7	ipv6: protocol for address routes This fixes a problem spotted with zebra, but not sure if it is necessary a kernel problem. With IPV6 when an address is added to an interface, Zebra creates a duplicate RIB entry, one as a connected route, and other as a kernel route. When an address is added to an interface the RTN_NEWADDR message causes Zebra to create a connected route. In IPV4 when an address is added to an interface a RTN_NEWROUTE message is set to user space with the protocol RTPROT_KERNEL. Zebra ignores these messages, because it already has the connected route. The problem is that route created in IPV6 has route protocol == RTPROT_BOOT. Was this a design decision or a bug? This fixes it. Same patch applies to both net-2.6 and stable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:16:46 -07:00
Ilpo Järvinen	a4356b2920	tcp: Add tcp_parse_aligned_timestamp Some duplicated code lying around. Located with my suffix tree tool. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:12:29 -07:00
Ilpo Järvinen	2cf46637b5	tcp: Add tcp_collapse_one to eliminate duplicated code Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:11:41 -07:00
Ilpo Järvinen	cbe2d128a0	tcp: Add tcp_validate_incoming & put duplicated code there Large block of code duplication removed. Sadly, the return value thing is a bit tricky here but it seems the most sensible way to return positive from validator on success rather than negative. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:10:12 -07:00
Denis V. Lunev	fdc0bde90a	icmp: icmp_sk() should not use smp_processor_id() in preemptible code Pass namespace into icmp_xmit_lock, obtain socket inside and return it as a result for caller. Thanks Alexey Dobryan for this report: Steps to reproduce: CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y tracepath <something> BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205 caller is icmp_sk+0x15/0x30 Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1 Call Trace: [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0 [<ffffffff80409405>] icmp_sk+0x15/0x30 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60 [<ffffffff803e6650>] ? dst_output+0x0/0x10 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60 [<ffffffff803e92e3>] ip_output+0xa3/0xf0 [<ffffffff803e68d0>] ip_local_out+0x20/0x30 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80 [<ffffffff803ba50a>] sys_sendto+0xea/0x120 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0 [<ffffffff8024e0c6>] ? up_read+0x26/0x30 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b icmp6_sk() is similar. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 04:43:33 -07:00
Ron Rindjunsky	9859b81eae	mac80211: add direct probe before association This patch adds a direct probe request as first step in the association flow if data we have is not up to date. Motivation of this step is to make sure that the bss information we have is correct, since last scan could have been done a while ago, and beacons do not fully answer this need as there are potential differences between them and probe responses (e.g. WMM parameter element) Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:30:00 -04:00
Ron Rindjunsky	6042a3e3ff	mac80211: change number of pre-assoc scans This patch fixes noticed problem in noisy environments of 50+ APs that scan fails to find the requested AP on first try, which leads to connection refusal. second scan has empirically proven to fix this problem in almost all cases. Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Esti Kummer <ester.kummer@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:59 -04:00
Tomas Winkler	48c2fc59aa	mac80211: cleanup mlme state namespace This patch move add STA_MLME to station mlme state defines. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:59 -04:00
Tomas Winkler	8e7cdbb633	mac80211: filter probes in ieee80211_rx_mgmt_probe_resp This patch moves filtering statement from ieee80211_rx_bss_info which is called for both beacon and probe to ieee80211_rx_mgmt_probe_resp and save few cycles in beacon parsing. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:58 -04:00
Jasper Bryant-Greene	f698d856f6	replace net_device arguments with ieee80211_{local,sub_if_data} as appropriate This patch replaces net_device arguments to mac80211 internal functions with ieee80211_{local,sub_if_data} as appropriate. It also does the same for many 802.11s mesh functions, and changes the mesh path table to be indexed on sub_if_data rather than net_device. If the mesh part needs to be a separate patch let me know, but since mesh uses a lot of mac80211 functions which were being converted anyway, the changes go hand-in-hand somewhat. This patch probably does not convert all the functions which could be converted, but it is a large chunk and followup patches will be provided. Signed-off-by: Jasper Bryant-Greene <jasper@amiton.co.nz> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:58 -04:00
Jasper Bryant-Greene	fef1643bf0	move ETH_P_PAE from ieee80211_i.h to if_ether.h ETH_P_PAE belongs in if_ether.h with the other ETH_P_* definitions. This patch moves it there. Signed-off-by: Jasper Bryant-Greene <jasper@amiton.co.nz> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:57 -04:00
Henrique de Moraes Holschuh	96c87607ac	rfkill: introduce RFKILL_STATE_MAX While it is interesting to not add last-enum-markers because it allows gcc to warn us of switch() statements missing a valid state, we really should be handling memory corruption on a rfkill state with default clauses, anyway. So add RFKILL_STATE_MAX and use it where applicable. It makes for safer code in the long run. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:57 -04:00
Henrique de Moraes Holschuh	77fba13ccc	rfkill: add __must_check annotations rfkill is not a small, mere detail in wireless support. Once it starts supporting rfkill and users start counting on that support, a wireless device is at risk of operating in dangerous conditions should rfkill support fail to properly activate. Therefore, add the required __must_check annotations on some key functions of the rfkill API, for which the wireless drivers absolutely MUST handle the failure mode safely in order to avoid a potentially dangerous situation where the wireless transmitter is left enabled when the user don't want it to. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:57 -04:00
Henrique de Moraes Holschuh	9961920199	rfkill: add default global states (v2) Add a second set of global states, "rfkill_default_states", to track the state that will be used when the first rfkill class of a given type is registered, and also to save "undo" information when rfkill_epo is called. Add a new exported function, rfkill_set_default(), which can be used by platform drivers to restore radio state saved by the platform across reboots or shutdown. Also, fix rfkill_epo to properly update rfkill_states, but still preserve a copy of the state so that we can undo the effect of rfkill_epo later if we want to. Add rfkill_restore_states() to restore rfkill_states from the copy. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:56 -04:00
Henrique de Moraes Holschuh	02589f6051	rfkill: detect bogus double-registering (v2) Detect and abort with -EEXIST if rfkill_register is called twice on the same rfkill struct. And WARN_ON(it) for good measure. While at it, flag when we are adding the first switch of a type, we will need that information later. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:56 -04:00
Luis Carlos Cobo	bdbe819540	mac80211: allow no mac address until firmware load Originally by Johannes Berg. This patch adds support for devices that do not report their MAC address until the firmware is loaded. While the address is not known, a multicast on is used. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:55 -04:00
Harvey Harrison	4eb2ae9a42	mac80211: remove WLAN_FC_DATA_PRESENT All users are gone now. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:55 -04:00
Harvey Harrison	a4b7d7bda5	mac80211: remove rx/tx_data->fc member Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:54 -04:00
Harvey Harrison	358c8d9d33	mac80211: use ieee80211 frame control directly Remove the last users of the rx/tx_data->fc data members and use the le16 frame_control from the header directly. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:54 -04:00

1 2 3 4 5 ...

10031 Commits