The ipfragok flag controls whether the packet may be fragmented
either on the local host on beyond. The latter is only valid on
IPv4.
In fact, we never want to do the latter even on IPv4 when PMTU is
enabled. This is because even though we can't fragment packets
within SCTP due to the prtocol's inherent faults, we can still
fragment it at IP layer. By setting the DF bit we will improve
the PMTU process.
RFC 2960 only says that we SHOULD clear the DF bit in this case,
so we're compliant even if we set the DF bit. In fact RFC 4960
no longer has this statement.
Once we make this change, we only need to control the local
fragmentation. There is already a bit in the skb which controls
that, local_df. So this patch sets that instead of using the
ipfragok argument.
The only complication is that there isn't a struct sock object
per transport, so for IPv4 we have to resort to changing the
pmtudisc field for every packet. This should be safe though
as the protocol is single-threaded.
Note that after this patch we can remove ipfragok from the rest
of the stack too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netns: fix ip_rt_frag_needed rt_is_expired
netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
netfilter: fix double-free and use-after free
netfilter: arptables in netns for real
netfilter: ip{,6}tables_security: fix future section mismatch
selinux: use nf_register_hooks()
netfilter: ebtables: use nf_register_hooks()
Revert "pkt_sched: sch_sfq: dump a real number of flows"
qeth: use dev->ml_priv instead of dev->priv
syncookies: Make sure ECN is disabled
net: drop unused BUG_TRAP()
net: convert BUG_TRAP to generic WARN_ON
drivers/net: convert BUG_TRAP to generic WARN_ON
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for flag values which are ORed to the type passwd
to socket and socketpair. The additional code is minimal. The flag
values in this implementation can and must match the O_* flags. This
avoids overhead in the conversion.
The internal functions sock_alloc_fd and sock_map_fd get a new parameters
and all callers are changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#define PORT 57392
/* For Linux these must be the same. */
#define SOCK_CLOEXEC O_CLOEXEC
int
main (void)
{
int fd;
fd = socket (PF_INET, SOCK_STREAM, 0);
if (fd == -1)
{
puts ("socket(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("socket(0) set close-on-exec flag");
return 1;
}
close (fd);
fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0);
if (fd == -1)
{
puts ("socket(SOCK_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (fd);
int fds[2];
if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
{
puts ("socketpair(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
coe = fcntl (fds[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}
if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1)
{
puts ("socketpair(SOCK_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
coe = fcntl (fds[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 20c2c1fd6c
(sctp: add sctp/remaddr table to complete RFC remote address table OID)
added an unused sctp_assoc_proc_exit() function that seems to have been
unintentionally created when copying the assocs code.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_outq_flush() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update sctp global memory limit allocations to be the same as TCP.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multiple socket bind to the same port with SO_REUSEADDR,
only 1 can be listining.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP permits multiple listen call and on subsequent calls
we leak he memory allocated for the crypto transforms.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
valgrind reports uninizialized memory accesses when running
sctp inside the network simulation cradle simulator:
Conditional jump or move depends on uninitialised value(s)
at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324)
by 0x57427DA: sctp_packet_transmit (output.c:403)
by 0x5710EFF: sctp_outq_flush (outqueue.c:824)
by 0x5710B88: sctp_outq_uncork (outqueue.c:701)
by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548)
by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976)
by 0x5744460: sctp_do_sm (sm_sideeffect.c:945)
by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94)
by 0x5725C04: __sctp_connect (socket.c:1094)
by 0x57297DC: sctp_connect (socket.c:3297)
Conditional jump or move depends on uninitialised value(s)
at 0x575D3A5: mod_timer (timer.c:630)
by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555)
by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448)
by 0x5753607: sctp_side_effects (sm_sideeffect.c:976)
by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945)
by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474)
by 0x573347F: sctp_inq_push (inqueue.c:104)
by 0x572EF93: sctp_rcv (input.c:256)
by 0x5689623: ip_local_deliver_finish (ip_input.c:230)
by 0x5689759: ip_local_deliver (ip_input.c:268)
by 0x5689CAC: ip_rcv_finish (dst.h:246)
#1 is due to "if (t->pmtu_pending)".
8a4794914f "[SCTP] Flag a pmtu change request"
suggests it should be initialized to 0.
#2 is the heartbeat timer 'expires' value, which is uninizialised, but
test by mod_timer().
T3_rtx_timer seems to be affected by the same problem, so initialize it, too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This puts CONFIG_PROC_FS defines around the proc init/exit functions
and also avoids compiling proc.c if procfs is not supported.
Also make SCTP_DBG_OBJCNT depend on procfs.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't
have where to get the net from.
I decided to add a sk argument, not the net itself, only to factor
all the required sock_net(sk) calls inside the enter_memory_pressure
callback itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we don't have the buffer space or memory allocations fail,
the data chunk is dropped, but TSN is still reported as received.
This introduced a data loss that can't be recovered. We should
only mark TSNs are received after memory allocations finish.
The one exception is the invalid stream identifier, but that's
due to user error and is reported back to the user.
This was noticed by Michael Tuexen.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Socket options SCTP_GET_PEER_ADDR_OLD, SCTP_GET_PEER_ADDR_NUM_OLD,
SCTP_GET_LOCAL_ADDR_OLD, and SCTP_GET_PEER_LOCAL_ADDR_NUM_OLD
have been replaced by newer versions a since 2005. It's time
to officially deprecate them and schedule them for removal.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Gabriel Campana, the kmalloc() length arg
passed in by sctp_getsockopt_local_addrs_old() can overflow
if ->addr_num is large enough.
Therefore, enforce an appropriate limit.
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.
We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add to validate initiate tag and chunk type if verification
tag is 0 when handling ICMP message.
RFC 4960, Appendix C. ICMP Handling
ICMP6) An implementation MUST validate that the Verification Tag
contained in the ICMP message matches the Verification Tag of the peer.
If the Verification Tag is not 0 and does NOT match, discard the ICMP
message. If it is 0 and the ICMP message contains enough bytes to
verify that the chunk type is an INIT chunk and that the Initiate Tag
matches the tag of the peer, continue with ICMP7. If the ICMP message
is too short or the chunk type or the Initiate Tag does not match,
silently discard the packet.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the sctp_remaddr_proc_init failed, the proper rollback is
not the sctp_remaddr_proc_exit, but the sctp_assocs_proc_exit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, any time we set a primary transport we set
the changeover_active flag. As a result, we invoke SFR-CACC
even when there has been no changeover events.
Only set changeover_active, when there is a true changeover
event, i.e. we had a primary path and we are changing to
another transport.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch remove the proc fs entry which has been created if fail to
set up proc fs entry for the SCTP protocol.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change struct proto destroy function pointer to return void. Noticed
by Al Viro.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default sack frequency should be 2. Also fix copy/paste
error when updating all transports.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e9df2e8fd8 ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6. As a
result, SCTP was not marking ECN capablity because the traffic class
was never set. This patch brings back the markings for IPv6 traffic.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens. Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to execute sctp_v4_dst_saddr() for each
iteration, just move it out of loop.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the current retran_path is the only active one, it should
update it to the the next inactive one.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduced by c4492586 (sctp: Add address type check while process
paramaters of ASCONF chunk):
net/sctp/sm_make_chunk.c: In function 'sctp_process_asconf':
net/sctp/sm_make_chunk.c:2828: warning: 'addr_param' may be used uninitialized in this function
net/sctp/sm_make_chunk.c:2828: note: 'addr_param' was declared here
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If socket is create by AF_INET type, add IPv6 address to asoc will cause
kernel panic while packet is transmitted on that transport.
This patch add address type check before process paramaters of ASCONF
chunk. If peer is not support this address type, return with error
invald parameter.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If socket is create by PF_INET type, it can not used IPv6 address to
send/recv DATA, So we can not used IPv6 address even if peer tell us it
support IPv6 address.
This patch fix to only enabled peer IPv6 address support on PF_INET6 socket.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The specification of sctp_connectx() has been changed to return
an association id. We've added a new socket option that will
return the association id as the return value from the setsockopt()
call. The library that implements sctp_connectx() interface will
implement both socket options.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brings delayed_ack socket option set/get into line with the latest ietf
socket extensions API draft, while maintaining backwards compatibility.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Plan C: we can follow the Al Viro's proposal about %n like in this patch.
The same applies to udp, fib (the /proc/net/route file), rt_cache and
sctp debug. This is minus ~150-200 bytes for each.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RFC4960 7.2.2,
When all of the data transmitted by the sender has
been acknowledged by the recerver, partial_bytes_acked is initialized to 0.
This patch conforms to rfc requirement.
Without this fix, cwnd might be error incremented.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>