1
Commit Graph

5522 Commits

Author SHA1 Message Date
Linus Torvalds
edb1e9671a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [VLAN]: Fix net_device leak.
  [PPP] generic: Fix receive path data clobbering & non-linear handling
  [PPP] generic: Call skb_cow_head before scribbling over skb
  [NET] skbuff: Add skb_cow_head
  [BRIDGE]: Kill clone argument to br_flood_*
  [PPP] pppoe: Fill in header directly in __pppoe_xmit
  [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
  [PPP] pppoe: Fix skb_unshare_check call position
  [SCTP]: Convert bind_addr_list locking to RCU
  [SCTP]: Add RCU synchronization around sctp_localaddr_list
  [PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning
  [PKTGEN]: srcmac fix
  [IPV6]: Fix source address selection.
  [IPV4]: Just increment OutDatagrams once per a datagram.
  [IPV6]: Just increment OutDatagrams once per a datagram.
  [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
  [NET_SCHED] protect action config/dump from irqs
  [NET]: Fix two issues wrt. SO_BINDTODEVICE.
2007-09-16 21:14:54 -07:00
Al Viro
d9f30ec0b0 [VLAN]: Fix net_device leak.
In "[VLAN]: Move device registation to seperate function" (commit
e89fe42cd0), a pile of code got moved
to register_vlan_dev(), including grabbing a reference to underlying
device.  However, original dev_hold() had been left behind, so we
leak a reference to net_device now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:43:04 -07:00
Herbert Xu
d9cc20484e [NET] skbuff: Add skb_cow_head
This patch adds an optimised version of skb_cow that avoids the copy if
the header can be modified even if the rest of the payload is cloned.

This can be used in encapsulating paths where we only need to modify the
header.  As it is, this can be used in PPPOE and bridging.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:21:16 -07:00
Herbert Xu
e081e1e3ef [BRIDGE]: Kill clone argument to br_flood_*
The clone argument is only used by one caller and that caller can clone
the packet itself.  This patch moves the clone call into the caller and
kills the clone argument.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:20:48 -07:00
Vlad Yasevich
559cf710b0 [SCTP]: Convert bind_addr_list locking to RCU
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU.  This includes the
sctp_bind_addrs structure and it's list of bound addresses.

This list is currently protected by an external rw_lock and that
looks like an overkill.  There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other.  These are also relatively rare, so we
should be good with RCU.

The readers are varied and they are easily converted to RCU.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:03:28 -07:00
Vlad Yasevich
2930354799 [SCTP]: Add RCU synchronization around sctp_localaddr_list
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers.  As a result,
the readers can access an entry that has been freed and
crash the sytem.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 16:02:12 -07:00
Satyam Sharma
ddeee3ce7f [PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning
net/sched/sch_cbq.c: In function 'cbq_enqueue':
net/sched/sch_cbq.c:383: warning: 'ret' may be used uninitialized in this function

has been verified to be a bogus case. So let's shut it up.

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 14:54:05 -07:00
Adit Ranadive
ce5d0b47f1 [PKTGEN]: srcmac fix
From: Adit Ranadive <adit.262@gmail.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 14:52:15 -07:00
Jiri Kosina
6ae5f983cf [IPV6]: Fix source address selection.
The commit 95c385 broke proper source address selection for cases in which 
there is a address which is makred 'deprecated'. The commit mistakenly 
changed ifa->flags to ifa_result->flags (probably copy/paste error from a 
few lines above) in the 'Rule 3' address selection code.

The patch restores the previous RFC-compliant behavior.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-16 14:48:21 -07:00
YOSHIFUJI Hideaki
2a0c6c980d [IPV4]: Just increment OutDatagrams once per a datagram.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-14 17:15:19 -07:00
YOSHIFUJI Hideaki
cd562c9859 [IPV6]: Just increment OutDatagrams once per a datagram.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-14 17:15:01 -07:00
YOSHIFUJI Hideaki
3ef9d943d2 [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-14 16:45:40 -07:00
Jamal Hadi Salim
e1e992e52f [NET_SCHED] protect action config/dump from irqs
(with no apologies to C Heston)

On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote:
On Sun, Sep 02, 2007 at 01:11:29PM +0000, Christian Kujau wrote:
> >
> > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep
> > was quite noisy when I tried to shape my external (wireless) interface:
> >
> > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock:
> > [ 6400.534713]  (&dev->ingress_lock){-+..}, at: [<c038d595>]
> > netif_receive_skb+0x2d5/0x3c0
> > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the
> > past:
> > [ 6400.535145]  (police_lock){-.--}
>
> This is a genuine dead-lock.  The police lock can be taken
> for reading with softirqs on.  If a second CPU tries to take
> the police lock for writing, while holding the ingress lock,
> then a softirq on the first CPU can dead-lock when it tries
> to get the ingress lock.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-14 16:43:05 -07:00
David S. Miller
4878809f71 [NET]: Fix two issues wrt. SO_BINDTODEVICE.
1) Comments suggest that setting optlen to zero will unbind
   the socket from whatever device it might be attached to.  This
   hasn't been the case since at least 2.2.x because the first thing
   this function does is return -EINVAL if 'optlen' is less than
   sizeof(int).

   This check also means that passing in a two byte string doesn't
   work so well.  It's almost as if this code was testing with "eth?"
   patterned strings and nothing else :-)

   Fix this by breaking the logic of this facility out into a
   seperate function which validates optlen more appropriately.

   The optlen==0 and small string cases now work properly.

2) We should reset the cached route of the socket after we have made
   the device binding changes, not before.

Reported by Ben Greear.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-14 16:41:03 -07:00
Neil Brown
7a1fa065a0 Correctly close old nfsd/lockd sockets.
Commit aaf68cfbf2 added a bias
to sk_inuse, so this test for an unused socket now fails.  So no
sockets get closed because they are old (they might get closed
if the client closed them).

This bug has existed since 2.6.21-rc1.

Thanks to Wolfgang Walter for finding and reporting the bug.

Cc: Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-14 13:58:11 -07:00
David S. Miller
1da97f83a8 [BLUETOOTH]: Fix non-COMPAT build of hci_sock.c
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-12 14:10:58 +02:00
Patrick McHardy
0a9c730144 [INET_DIAG]: Fix oops in netlink_rcv_skb
netlink_run_queue() doesn't handle multiple processes processing the
queue concurrently. Serialize queue processing in inet_diag to fix
a oops in netlink_rcv_skb caused by netlink_run_queue passing a
NULL for the skb.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000054
[349587.500454]  printing eip:
[349587.500457] c03318ae
[349587.500459] *pde = 00000000
[349587.500464] Oops: 0000 [#1]
[349587.500466] PREEMPT SMP
[349587.500474] Modules linked in: w83627hf hwmon_vid i2c_isa
[349587.500483] CPU:    0
[349587.500485] EIP:    0060:[<c03318ae>]    Not tainted VLI
[349587.500487] EFLAGS: 00010246   (2.6.22.3 #1)
[349587.500499] EIP is at netlink_rcv_skb+0xa/0x7e
[349587.500506] eax: 00000000   ebx: 00000000   ecx: c148d2a0   edx: c0398819
[349587.500510] esi: 00000000   edi: c0398819   ebp: c7a21c8c   esp: c7a21c80
[349587.500517] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[349587.500521] Process oidentd (pid: 17943, ti=c7a20000 task=cee231c0 task.ti=c7a20000)
[349587.500527] Stack: 00000000 c7a21cac f7c8ba78 c7a21ca4 c0331962 c0398819 f7c8ba00 0000004c
[349587.500542]        f736f000 c7a21cb4 c03988e3 00000001 f7c8ba00 c7a21cc4 c03312a5 0000004c
[349587.500558]        f7c8ba00 c7a21cd4 c0330681 f7c8ba00 e4695280 c7a21d00 c03307c6 7fffffff
[349587.500578] Call Trace:
[349587.500581]  [<c010361a>] show_trace_log_lvl+0x1c/0x33
[349587.500591]  [<c01036d4>] show_stack_log_lvl+0x8d/0xaa
[349587.500595]  [<c010390e>] show_registers+0x1cb/0x321
[349587.500604]  [<c0103bff>] die+0x112/0x1e1
[349587.500607]  [<c01132d2>] do_page_fault+0x229/0x565
[349587.500618]  [<c03c8d3a>] error_code+0x72/0x78
[349587.500625]  [<c0331962>] netlink_run_queue+0x40/0x76
[349587.500632]  [<c03988e3>] inet_diag_rcv+0x1f/0x2c
[349587.500639]  [<c03312a5>] netlink_data_ready+0x57/0x59
[349587.500643]  [<c0330681>] netlink_sendskb+0x24/0x45
[349587.500651]  [<c03307c6>] netlink_unicast+0x100/0x116
[349587.500656]  [<c0330f83>] netlink_sendmsg+0x1c2/0x280
[349587.500664]  [<c02fcce9>] sock_sendmsg+0xba/0xd5
[349587.500671]  [<c02fe4d1>] sys_sendmsg+0x17b/0x1e8
[349587.500676]  [<c02fe92d>] sys_socketcall+0x230/0x24d
[349587.500684]  [<c01028d2>] syscall_call+0x7/0xb
[349587.500691]  =======================
[349587.500693] Code: f0 ff 4e 18 0f 94 c0 84 c0 0f 84 66 ff ff ff 89 f0 e8 86 e2 fc ff e9 5a ff ff ff f0 ff 40 10 eb be 55 89 e5 57 89 d7 56 89 c6 53 <8b> 50 54 83 fa 10 72 55 8b 9e 9c 00 00 00 31 c9 8b 03 83 f8 0f

Reported by Athanasius <link@miggy.org>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 11:33:28 +02:00
YOSHIFUJI Hideaki
e1f52208bb [IPv6]: Fix NULL pointer dereference in ip6_flush_pending_frames
Some of skbs in sk->write_queue do not have skb->dst because
we do not fill skb->dst when we allocate new skb in append_data().

BTW, I think we may not need to (or we should not) increment some stats
when using corking; if 100 sendmsg() (with MSG_MORE) result in 2 packets,
how many should we increment?

If 100, we should set skb->dst for every queued skbs.

If 1 (or 2 (*)), we increment the stats for the first queued skb and
we should just skip incrementing OutDiscards for the rest of queued skbs,
adn we should also impelement this semantics in other places;
e.g., we should increment other stats just once, not 100 times.

*: depends on the place we are discarding the datagram.

I guess should just increment by 1 (or 2).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 11:31:43 +02:00
Neil Horman
16fcec35e7 [NETFILTER]: Fix/improve deadlock condition on module removal netfilter
So I've had a deadlock reported to me.  I've found that the sequence of
events goes like this:

1) process A (modprobe) runs to remove ip_tables.ko

2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count

3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt

4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel

4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko

5) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.

6) Process C. forked by request_module process the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.

7) Process C attempts to lock the request module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)

Theres not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem

Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering modules owner to do module reference counting when nf_sockopt
calls a modules set/get routine.  This prevents the deadlock by preventing set 4
from happening.

Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
remove operations (the same way rmmod does), and add an option to explicity
request blocking operation.  So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicity try (and since
root can do any old stupid thing it would like....  :)  ).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 11:28:26 +02:00
Patrick McHardy
0fb9670137 [NETFILTER]: nf_conntrack_ipv4: fix "Frag of proto ..." messages
Since we're now using a generic tuple decoding function in ICMP
connection tracking, ipv4_get_l4proto() might get called with a
fragmented packet from within an ICMP error. Remove the error
message we used to print when this happens.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 11:27:01 +02:00
David S. Miller
66eb50d5c9 Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2007-09-11 11:15:30 +02:00
Denis V. Lunev
9e3be4b343 [IPV6]: Freeing alive inet6 address
From: Denis V. Lunev <den@openvz.org>

addrconf_dad_failure calls addrconf_dad_stop which takes referenced address
and drops the count. So, in6_ifa_put perrformed at out: is extra. This
results in message: "Freeing alive inet6 address" and not released dst entries.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 11:04:49 +02:00
Patrick McHardy
a2221f308d [DECNET]: Fix interface address listing regression.
Not all are listed, same as the IPV4 devinet bug.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 10:45:15 +02:00
Stephen Hemminger
596e415095 [IPV4] devinet: show all addresses assigned to interface
Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876

Not all ips are shown by "ip addr show" command when IPs number assigned to an
interface is more than 60-80 (in fact it depends on broadcast/label etc
presence on each address).

Steps to reproduce:
It's terribly simple to reproduce:

# for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done
# ip addr show

this will _not_ show all IPs.
Looks like the problem is in netlink/ipv4 message processing.

This is fix from bug submitter, it looks correct.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 10:41:04 +02:00
Herbert Xu
ef8aef55ce [NET]: Do not dereference iov if length is zero
When msg_iovlen is zero we shouldn't try to dereference
msg_iov.  Right now the only thing that tries to do so
is skb_copy_and_csum_datagram_iovec.  Since the total
length should also be zero if msg_iovlen is zero, it's
sufficient to check the total length there and simply
return if it's zero.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-11 10:29:07 +02:00
Marcel Holtmann
89f2783ded [Bluetooth] Fix parameter list for event filter command
On device initialization the event filters are cleared. In case of
clearing the filters the extra condition type shall be omitted.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-09-09 08:39:49 +02:00
Marcel Holtmann
7c631a6760 [Bluetooth] Update security filter for Bluetooth 2.1
This patch updates the HCI security filter with support for the
Bluetooth 2.1 commands and events.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-09-09 08:39:43 +02:00
Marcel Holtmann
767c5eb5d3 [Bluetooth] Add compat handling for timestamp structure
The timestamp structure needs special handling in case of compat
programs. Use the same wrapping method the network core uses.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2007-09-09 08:39:34 +02:00
David S. Miller
5c127c58ae [TCP]: 'dst' can be NULL in tcp_rto_min()
Reported by Rick Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-31 14:39:44 -07:00
Pavel Emelyanov
88282c6ecf [PKTGEN]: Remove write-only variable.
The pktgen_thread.pid is set to current->pid and is never used
after this. So remove this at all.

Found during isolating the explicit pid/tgid usage.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-30 22:46:36 -07:00
Jesper Bengtsson
a289d70d74 [NETFILTER]: xt_tcpudp: fix wrong struct in udp_checkentry
It doesn't seem to have any effect on the x86 architecture but it does
have effect on the Axis CRIS architecture.

Signed-off-by: Jesper Bengtsson <jesper.bengtsson@axis.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-30 22:36:43 -07:00
Lucas Nussbaum
dbaaa07a60 [NET_SCHED] sch_prio.c: remove duplicate call of tc_classify()
When CONFIG_NET_CLS_ACT is enabled, tc_classify() is called twice in
prio_classify(). This causes "interesting" behaviour: with the setup
below, packets are duplicated, sent twice to ifb0, and then loop in and
out of ifb0.

The patch uses the previously calculated return value in the switch,
which is probably what Patrick had in mind in commit
bdba91ec70 -- maybe Patrick can
double-check this?

-- example setup --
ifconfig ifb0 up
tc qdisc add dev ifb0 root netem delay 2s
tc qdisc add dev $ETH root handle 1: prio
tc filter add dev $ETH parent 1: protocol ip prio 10 u32 \
 match ip dst 172.24.110.6/32 flowid 1:1 \
 action mirred egress redirect dev ifb0
ping -c1 172.24.110.6

Signed-off-by: Lucas Nussbaum <lucas.nussbaum@imag.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-30 22:35:46 -07:00
Stephen Hemminger
b4a488d182 [BRIDGE]: Fix OOPS when bridging device without ethtool.
Bridge code calls ethtool to get speed. The conversion to using
only ethtool_ops broke the case of devices without ethtool_ops.
This is a new regression in 2.6.23.

Rearranged the switch to a logical order, and use gcc initializer.

Ps: speed should have been part of the network device structure from
    the start rather than burying it in ethtool.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-30 22:16:22 -07:00
Stephen Hemminger
df1c0b8468 [BRIDGE]: Packets leaking out of disabled/blocked ports.
This patch fixes some packet leakage in bridge.  The bridging code was
allowing forward table entries to be generated even if a device was
being blocked. The fix is to not add forwarding database entries
unless the port is active.

The bug arose as part of the conversion to processing STP frames
through normal receive path (in 2.6.17).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-30 22:15:35 -07:00
David S. Miller
b91ddd8437 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev 2007-08-30 22:11:31 -07:00
David S. Miller
05bb1fad1c [TCP]: Allow minimum RTO to be configurable via routing metrics.
Cell phone networks do link layer retransmissions and other
things that cause unnecessary timeout retransmits.  So allow
the minimum RTO to be inflated per-route to deal with this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-30 22:10:28 -07:00
Wei Yongjun
cb243a1a9f SCTP: Fix to handle invalid parameter length correctly
If an INIT with invalid parameter length look like this:
Parameter Type : 1
Parameter Length: 800
and not contain any payload, SCTP will ignore this  parameter and send
back a INIT-ACK.
This patch is fix to handle this invalid parameter length correctly.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 16:44:27 -04:00
Vlad Yasevich
609ee4679b SCTP: Abort on COOKIE-ECHO if backlog is exceeded.
Currently we abort on the INIT chunk we our backlog is currenlty
exceeded.  Delay this about untill COOKIE-ECHO to give the user
time to accept the socket.  Also, make sure that we treat
sk_max_backlog of 0 as no connections allowed.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 14:12:25 -04:00
Vlad Yasevich
498d63071e SCTP: Correctly disable listening when backlog is 0.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 14:03:58 -04:00
Vlad Yasevich
d0ce92910b SCTP: Do not retransmit chunks that are newer then rtt.
When performing a retransmit, do not include the chunk if
it was sent less then 1 rtt ago.  The reason is that we
may receive the SACK very soon and wouldn't retransmit.
Suggested by Randy Stewart.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 13:56:06 -04:00
Vlad Yasevich
cc75689a4c SCTP: Uncomfirmed transports can't become Inactive
Do not set Unconfirmed transports to Inactive state.  This may
result in an inactive association being destroyed since we start
counting errors on "inactive" transports against the association.
This was found at the SCTP interop event.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 13:55:41 -04:00
Vlad Yasevich
2772b495ef SCTP: Pick the correct port when binding to 0.
sctp_bindx() allows the use of unspecified port.  The problem is
that every address we bind to ends up selecting a new port if
the user specified port 0.  This patch allows re-use of the
already selected port when the port from bindx was 0.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 13:55:20 -04:00
Wei Yongjun
d99fa42963 SCTP: Use net_ratelimit to suppress error messages print too fast
When multi bundling SHUTDOWN-ACK message is received in ESTAB state,
this will cause "sctp protocol violation state" message print many times.
If SHUTDOWN-ACK is bundled 300 times in one packet, message will be
print 300 times. The same problem also exists when received unexpected
HEARTBEAT-ACK message which is bundled message times.

This patch used net_ratelimit() to suppress error messages print too fast.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 13:52:56 -04:00
Wei Yongjun
00f1c2df2a SCTP: Fix to encode PROTOCOL VIOLATION error cause correctly
PROTOCOL VIOLATION error cause in ABORT is bad encode when make abort
chunk. When SCTP encode ABORT chunk with PROTOCOL VIOLATION error cause,
it just add the error messages to PROTOCOL VIOLATION error cause, the
rest four bytes(struct sctp_paramhdr) is just add to the chunk, not
change the length of error cause. This cause the ABORT chunk to be a bad
format. The chunk is like this:

ABORT chunk
  Chunk type: ABORT (6)
  Chunk flags: 0x00
  Chunk length: 72 (*1)
  Protocol violation cause
    Cause code: Protocol violation (0x000d)
    Cause length: 62 (*2)
    Cause information: 5468652063756D756C61746976652074736E2061636B2062...
    Cause padding: 0000
[Needless] 00030010
Chunk Length(*1) = 72 but Cause length(*2) only 62, not include the
extend 4 bytes.
((72 - sizeof(chunk_hdr)) = 68) != (62 +3) / 4 * 4

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 13:50:48 -04:00
Wei Yongjun
8d614ade51 SCTP: Fix sctp_addto_chunk() to add pad with correct length
At function sctp_addto_chunk(), it do pad before add payload to chunk if
chunk length is not 4-byte alignment. But it do pad with a bad length.
This patch fixed this probleam.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-30 11:56:17 -04:00
Vlad Yasevich
ab3e5e7b65 SCTP: Assign stream sequence numbers to the entire message
Currently we only assign the sequence number to a packet that
we are about to transmit.  This however breaks the Partial
Reliability extensions, because it's possible for us to
never transmit a packet, i.e. it expires before we get to send
it.  In such cases, if the message contained multiple SCTP
fragments, and we did manage to send the first part of the
message, the Stream sequence numbers would get into invalid
state and cause receiver to stall.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-08-29 13:34:34 -04:00
Vlad Yasevich
ea2dfb3733 SCTP: properly clean up fragment and ordering queues during FWD-TSN.
When we recieve a FWD-TSN (meaning the peer has abandoned the data),
we need to clean up any partially received messages that may be
hanging out on the re-assembly or re-ordering queues.  This is
a MUST requirement that was not properly done before.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com.>
2007-08-29 13:34:33 -04:00
Robert Olsson
378be2c083 [PKTGEN]: Fix multiqueue oops.
Initially pkt_dev can be NULL this causes netif_subqueue_stopped to 
oops. The patch below should cure it. But maybe the pktgen TX logic 
should be reworked to better support the new multiqueue support. 

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-28 15:43:14 -07:00
Evgeniy Polyakov
e7c243c925 [VLAN/BRIDGE]: Fix "skb_pull_rcsum - Fatal exception in interrupt"
I tried to preserve bridging code as it was before, but logic is quite
strange - I think we should free skb on error, since it is already
unshared and thus will just leak.

Herbert Xu states:

> +	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
> +		goto out;

If this happens it'll be a double-free on skb since we'll
return NF_DROP which makes the caller free it too.

We could return NF_STOLEN to prevent that but I'm not sure
whether that's correct netfilter semantics.  Patrick, could
you please make a call on this?

Patrick McHardy states:

NF_STOLEN should work fine here.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-26 18:35:47 -07:00
Benjamin Thery
aaa53c4aba [NET]: Fix crash in dev_mc_sync()/dev_mc_unsync()
This patch fixes a crash that may occur when the routine dev_mc_sync()
deletes an address from the list it is currently going through. It
saves the pointer to the next element before deleting the current one.
The problem may also exist in dev_mc_unsync().

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-26 18:35:43 -07:00