1
Commit Graph

13651 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
f062f41d06 Phonet: routing table Netlink interface
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:17 -07:00
Rémi Denis-Courmont
55748ac046 Phonet: routing table backend
The Phonet "universe" only has 64 addresses, so we keep a trivial flat
routing table.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:16 -07:00
Rémi Denis-Courmont
f14001fcd7 Phonet: deliver broadcast packets to broadcast sockets
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-14 15:04:15 -07:00
David S. Miller
421355de87 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-13 12:55:20 -07:00
Eric Dumazet
89d71a66c4 net: Use netdev_alloc_skb_ip_align()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 11:48:18 -07:00
David S. Miller
417c5233db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-10-13 11:41:34 -07:00
roel kluin
06a96b33ae x25: bit and/or confusion in x25_ioctl()?
Looking at commit ebc3f64b86 it appears that this was intended
and not the original, equivalent to `if (facilities.reverse & ~0x81)'.

In x25_parse_facilities() that patch changed how facilities->reverse
was set. No other bits were set than 0x80 and/or 0x01.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:07 -07:00
Eric Dumazet
f373b53b5f tcp: replace ehash_size by ehash_mask
Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:02 -07:00
Cosmin Ratiu
c3faca053d ipv6: fix devconf after adding force_tllao option
Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:44:02 -07:00
Eric Dumazet
8558467201 udp: Fix udp_poll() and ioctl()
udp_poll() can in some circumstances drop frames with incorrect checksums.

Problem is we now have to lock the socket while dropping frames, or risk
sk_forward corruption.

This bug is present since commit 95766fff6b
([UDP]: Add memory accounting.)

While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 03:16:54 -07:00
Willy Tarreau
6d01a026b7 tcp: fix tcp_defer_accept to consider the timeout
I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.

This behaviour contradicts the man page which says it should
wait only for some time :

       TCP_DEFER_ACCEPT (since Linux 2.4)
          Allows a listener to be awakened only when data arrives
          on the socket.  Takes an integer value  (seconds), this
          can  bound  the  maximum  number  of attempts TCP will
          make to complete the connection. This option should not
          be used in code intended to be portable.

Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed :

        case TCP_DEFER_ACCEPT:
                icsk->icsk_accept_queue.rskq_defer_accept = 0;
                if (val > 0) {
                        /* Translate value in seconds to number of
                         * retransmits */
                        while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                               val > ((TCP_TIMEOUT_INIT / HZ) <<
                                       icsk->icsk_accept_queue.rskq_defer_accept))
                                icsk->icsk_accept_queue.rskq_defer_accept++;
                        icsk->icsk_accept_queue.rskq_defer_accept++;
                }
                break;

==> rskq_defer_accept is used as a counter of retransmits.

But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-13 01:35:28 -07:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Neil Horman
3b885787ea net: Generalize socket rx gap / receive queue overflow cmsg
Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 13:26:31 -07:00
Johannes Berg
d20ef63d32 mac80211: document ieee80211_rx() context requirement
ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.

It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-12 15:55:53 -04:00
Johannes Berg
51f98f1313 mac80211: fix ibss race
When a scan completes, we call ieee80211_sta_find_ibss(),
which is also called from other places. When the scan was
done in software, there's no problem as both run from the
single-threaded mac80211 workqueue and are thus serialised
against each other, but with hardware scan the completion
can be in a different context and race against callers of
this function from the workqueue (e.g. due to beacon RX).
So instead of calling ieee80211_sta_find_ibss() directly,
just arm the timer and have it fire, scheduling the work,
which will invoke ieee80211_sta_find_ibss() (if that is
appropriate in the current state).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-12 15:55:52 -04:00
Felix Fietkau
5e4708bcb5 mac80211: fix logic error ibss merge bssid check
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-12 15:55:52 -04:00
David S. Miller
d5e63bded6 Revert "af_packet: add interframe drop cmsg (v6)"
This reverts commit 977750076d.

Neil is reimplementing this generically, outside of AF_PACKET.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 03:00:31 -07:00
YOSHIFUJI Hideaki / 吉藤英明
91b2a3f9bb ipv6 sit: Set relay to 0.0.0.0 directly if relay_prefixlen == 0.
ipv6 sit: Set relay to 0.0.0.0 directly if relay_prefixlen == 0.

Do not use bit-shift if relay_prefixlen == 0;
relay_prefix << 32 does not result in 0.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11 23:41:10 -07:00
YOSHIFUJI Hideaki / 吉藤英明
e7db38c38f ipv6 sit: Fix 6rd relay address.
ipv6 sit: Fix 6rd relay address.

Relay's address should be extracted from real IPv6 address
instead of configured prefix.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11 23:41:08 -07:00
YOSHIFUJI Hideaki / 吉藤英明
e0c9394815 ipv6 sit: Ensure to initialize 6rd parameters.
ipv6 sit: Ensure to initialize 6rd parameters.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11 23:41:07 -07:00
David S. Miller
7fe13c5733 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-11 23:15:47 -07:00
jamal
53f7e35f8b pkt_sched: pedit use proper struct
This probably deserves to go into -stable.

Pedit will reject a policy that is large because it
uses the wrong structure in the policy validation.
This fixes it.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-11 23:03:47 -07:00
David S. Miller
8aa0f64ac3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-10-09 14:40:09 -07:00
David S. Miller
67972e0c23 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-10-08 15:55:21 -07:00
Johannes Berg
8a8e05e5d8 cfg80211: fix netns error unwinding bug
The error unwinding code in set_netns has a bug
that will make it run into a BUG_ON if passed a
bad wiphy index, fix by not trying to unlock a
wiphy that doesn't exist.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-08 16:27:00 -04:00
Jiri Pirko
ad61df918c netlink: fix typo in initialization
Commit 9ef1d4c7c7 ("[NETLINK]: Missing
initializations in dumped data") introduced a typo in
initialization. This patch fixes this.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-08 01:21:46 -07:00
Sridhar Samudrala
72dad218f8 bridge: Allow enable/disable UFO on bridge device via ethtool
Allow enable/disable UFO on bridge device via ethtool

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 22:00:24 -07:00
Sridhar Samudrala
d9f5950f90 net: Make UFO on master device independent of attached devices
Now that software UFO is supported, UFO can be enabled on master
devices like bridge, bond even though the attached device doesn't
support this feature in hardware.

This allows UFO to be used between KVM host and guest even when a
physical interface attached to the bridge doesn't support UFO.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 22:00:23 -07:00
Eric Dumazet
f86dcc5aa8 udp: dynamically size hash tables at boot time
UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain in average. An
incoming frame has to lookup all sockets to find best match, so long
chains hurt latency.

Instead of a fixed size hash table that cant be perfect for every
needs, let UDP stack choose its table size at boot time like tcp/ip
route, using alloc_large_system_hash() helper

Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines :
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 22:00:22 -07:00
Alexandre Cassen
8a6dfd43d1 IPv6: Fix 6RD typo
Following fix a small typo.

Signed-off-by: Alexandre Cassen <acassen@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 14:50:30 -07:00
Hagen Paul Pfeifer
4b17d50f9e ipv4: Define cipso_v4_delopt static
There is no reason that cipso_v4_delopt() is not
defined as a static function.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 14:45:58 -07:00
Hagen Paul Pfeifer
9e8342971d econet: Fix redeclaration of symbol len
Function argument len was redeclarated within the
function. This patch fix the redeclaration of symbol 'len'.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 14:43:04 -07:00
Brian Haley
b301e82cf8 IPv6: use ipv6_addr_set_v4mapped()
Might as well use the ipv6_addr_set_v4mapped() inline we created last
year.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:58:25 -07:00
Brian Haley
86c36ce45d IPv6: use ipv6_addr_copy() in ip6_route_redirect()
Change ip6_route_redirect() to use ipv6_addr_copy().

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:58:01 -07:00
Atis Elsts
ffce908246 net: Add sk_mark route lookup support for IPv4 listening sockets
Add support for route lookup using sk_mark on IPv4 listening sockets.

Signed-off-by: Atis Elsts <atis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:55:57 -07:00
Gerrit Renker
996ccf4900 dccp ccid-3: Remove CCID naming redundancy 2/2
This continues the previous patch, by applying the same change to CCID-3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:51:24 -07:00
Gerrit Renker
77d2dd9374 dccp ccid-2: Remove CCID naming redundancy 1/2
This removes a redundancy in the CCID half-connection (hc) naming scheme:
 * instead of 'hctx->tx_...', write 'hc->tx_...';
 * instead of 'hcrx->rx_...', write 'hc->rx_...';

which works because the 'type' of the half-connection is encoded in the
'rx_' / 'tx_' prefixes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:51:23 -07:00
Gerrit Renker
388d5e9905 dccp ccid-3: Overhaul CCID naming convention 2/2
This implements the new naming scheme also for CCID-3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:51:22 -07:00
Gerrit Renker
b1c00fe3cf dccp ccid-2: Overhaul CCID naming convention 1/2
This patch starts a less problematic naming convention for CCID structs.

The old naming convention used 'hc{tx,rx}->ccid?hc{tx,rx}->...' as
recurring prefixes, which made the code
 * hard to write (not easy to fit into 80 characters);
 * hard to read  (most of the space is occupied by prefixes).

The new naming scheme:
 * struct entries for the TX socket are prefixed by 'tx_';
 * and those for the RX socket are prefixed by 'rx_'.

The identifiers then remain distinguishable when grep-ing through the tree:
 (a) RX/TX sockets are distinguished by the naming scheme,
 (b) individual CCIDs are distinguished by filename (ccid{2,3,4}.{c,h}).

This first patch implements the scheme for CCID-2.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 13:51:21 -07:00
John W. Linville
a82ac21efc net/wireless/ethtool.h: drop unnecessary include of linux/ethtool.h
Everything including this header includes net/cfg80211.h, which
includes linux/netdevice.h, which includes linux/ethtool.h already.  Why
slow-down the build, even a little bit?

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:49 -04:00
John W. Linville
0adc23f58e mac80211: support ETHTOOL_GPERMADDR
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:48 -04:00
Kalle Valo
dfce95f51f cfg80211: add firmware and hardware version to wiphy
It's useful to provide firmware and hardware version to user space and have a
generic interface to retrieve them. Users can provide the version information
in bug reports etc.

Add fields for firmware and hardware version to struct wiphy.

(Dropped nl80211 bits for now and modified remaining bits in favor of
ethtool. -- JWL)

Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:46 -04:00
John W. Linville
4890e3bedd wireless: implement basic ethtool support for cfg80211 devices
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:45 -04:00
Johannes Berg
3d23e349d8 wext: refactor
Refactor wext to
 * split out iwpriv handling
 * split out iwspy handling
 * split out procfs support
 * allow cfg80211 to have wireless extensions compat code
   w/o CONFIG_WIRELESS_EXT

After this, drivers need to
 - select WIRELESS_EXT	- for wext support
 - select WEXT_PRIV	- for iwpriv support
 - select WEXT_SPY	- for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:43 -04:00
Holger Schurig
7c89606e24 nl80211: report age of scan results
Linux keeps scan results up to 15 seconds. This can be a problem for fast
moving clients: they get back stale data. But if the kernel reports the age
of the BSS items, then user-space can simply weed out old entries by itself.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:42 -04:00
Roel Kluin
0819663d16 mac80211: use kfree_skb() to free struct sk_buff pointers
kfree_skb() should be used to free struct sk_buff pointers.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:33:51 -04:00
Johannes Berg
fbc44bf717 mac80211: fix vlan and optimise RX
When receiving data frames, we can send them only to
the interface they belong to based on transmitting
station (this doesn't work for probe requests). Also,
don't try to handle other frames for AP_VLAN at all
since those interface should only receive data.

Additionally, the transmit side must check that the
station we're sending a frame to is actually on the
interface we're transmitting on, and not transmit
packets to functions that live on other interfaces,
so validate that as well.

Another bug fix is needed in sta_info.c where in the
VLAN case when adding/removing stations we overwrite
the sdata variable we still need.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:33:49 -04:00
Stephen Hemminger
a21090cff2 ipv4: arp_notify address list bug
This fixes a bug with arp_notify.

If arp_notify is enabled, kernel will crash if address is changed
and no IP address is assigned.
  http://bugzilla.kernel.org/show_bug.cgi?id=14330

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 03:18:17 -07:00
Stephen Hemminger
ec1b4cf74c net: mark net_proto_ops as const
All usages of structure net_proto_ops should be declared const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:46 -07:00
Octavian Purdila
f7734fdf61 make TLLAO option for NA packets configurable
On Friday 02 October 2009 20:53:51 you wrote:

> This is good although I would have shortened the name.

Ah, I knew I forgot something :) Here is v4.

tavi

>From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Fri, 2 Oct 2009 00:51:15 +0300
Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs

Neighbor advertisements responding to unicast neighbor solicitations
did not include the target link-layer address option. This patch adds
a new sysctl option (disabled by default) which controls whether this
option should be sent even with unicast NAs.

The need for this arose because certain routers expect the TLLAO in
some situations even as a response to unicast NS packets.

Moreover, RFC 2461 recommends sending this to avoid a race condition
(section 4.4, Target link-layer address)

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:45 -07:00
Brian Haley
51953d5bc4 Use sk_mark for IPv6 routing lookups
Atis Elsts wrote:
> Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.

Ok, I'll just drop that part, I'm not sure what should be done in this case.

> Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?

I just sent that patch out too quickly, here's a better one with the updates.

Add support for IPv6 route lookups using sk_mark.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:45 -07:00
Ben Hutchings
d73d3a8cb4 ethtool: Add reset operation
After updating firmware stored in flash, users may wish to reset the
relevant hardware and start the new firmware immediately.  This should
not be completely automatic as it may be disruptive.

A selective reset may also be useful for debugging or diagnostics.

This adds a separate reset operation which takes flags indicating the
components to be reset.  Drivers are allowed to reset only a subset of
those requested, and must indicate the actual subset.  This allows the
use of generic component masks and some future expansion.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:10:44 -07:00
Eric Dumazet
d250a5f90e pkt_sched: gen_estimator: Dont report fake rate estimators
Jarek Poplawski a écrit :
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...

Okay :) Here is the last round, before the night !

Thanks again

[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)

After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.

This parameter can be NULL if check is not necessary, (htb for
example has a mandatory rate estimator)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:42 -07:00
Eric Dumazet
2d37a186ce Use sk_mark for routing lookup in more places
Here is a followup on this area, thanks.

[RFC] af_packet: fill skb->mark at xmit

skb->mark may be used by classifiers, so fill it in case user
set a SO_MARK option on socket.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:38 -07:00
YOSHIFUJI Hideaki / 吉藤英明
fa857afcf7 ipv6 sit: 6rd (IPv6 Rapid Deployment) Support.
IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.

With this option enabled, the SIT driver offers 6rd functionality by
providing additional ioctl API to configure the IPv6 Prefix for in
stead of static 2002::/16 for 6to4.

Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 01:07:37 -07:00
Ilia K
ee5e81f000 add vif using local interface index instead of IP
When routing daemon wants to enable forwarding of multicast traffic it
performs something like:

       struct vifctl vc = {
               .vifc_vifi  = 1,
               .vifc_flags = 0,
               .vifc_threshold = 1,
               .vifc_rate_limit = 0,
               .vifc_lcl_addr = ip, /* <--- ip address of physical
interface, e.g. eth0 */
               .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
         };
       setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

This leads (in the kernel) to calling  vif_add() function call which
search the (physical) device using assigned IP address:
       dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);

The current API (struct vifctl) does not allow to specify an
interface other way than using it's IP, and if there are more than a
single interface with specified IP only the first one will be found.

The attached patch (against 2.6.30.4) allows to specify an interface
by its index, instead of IP address:

       struct vifctl vc = {
               .vifc_vifi  = 1,
               .vifc_flags = VIFF_USE_IFINDEX,   /* NEW */
               .vifc_threshold = 1,
               .vifc_rate_limit = 0,
               .vifc_lcl_ifindex = if_nametoindex("eth0"),   /* NEW */
               .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
         };
       setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

Signed-off-by: Ilia K. <mail4ilia@gmail.com>

=== modified file 'include/linux/mroute.h'
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07 00:48:41 -07:00
David S. Miller
7ecc59c1b7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-10-06 22:43:16 -07:00
Eric Dumazet
bcdce7195e net: speedup sk_wake_async()
An incoming datagram must bring into cpu cache *lot* of cache lines,
in particular : (other parts omitted (hash chains, ip route cache...))

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
offsetof(struct sock, sk_lock)         =0x34   (rw)

offsetof(struct sock, sk_sleep)        =0x50 (read)
offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
offsetof(struct sock, sk_receive_queue)=0x74   (rw)

offsetof(struct sock, sk_forward_alloc)=0x98   (rw)

offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)       =0xf8    (read)

offsetof(struct sock, sk_socket)       =0x138 (read)

offsetof(struct sock, sk_data_ready)   =0x15c   (read)


We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync())

We can leave (or even move in a future patch) sk->sk_socket in a cold location

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-06 17:28:29 -07:00
Johannes Berg
a160ee69c6 wext: let get_wireless_stats() sleep
A number of drivers (recently including cfg80211-based ones)
assume that all wireless handlers, including statistics, can
sleep and they often also implicitly assume that the rtnl is
held around their invocation. This is almost always true now
except when reading from sysfs:

  BUG: sleeping function called from invalid context at kernel/mutex.c:280
  in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head
  2 locks held by head/10450:
   #0:  (&buffer->mutex){+.+.+.}, at: [<c10ceb99>] sysfs_read_file+0x24/0xf4
   #1:  (dev_base_lock){++.?..}, at: [<c12844ee>] wireless_show+0x1a/0x4c
  Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1
  Call Trace:
   [<c102301c>] __might_sleep+0xf0/0xf7
   [<c1324355>] mutex_lock_nested+0x1a/0x33
   [<f8cea53b>] wdev_lock+0xd/0xf [cfg80211]
   [<f8cea58f>] cfg80211_wireless_stats+0x45/0x12d [cfg80211]
   [<c13118d6>] get_wireless_stats+0x16/0x1c
   [<c12844fe>] wireless_show+0x2a/0x4c

Fix this by using the rtnl instead of dev_base_lock.

Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 02:22:23 -07:00
Andy Gospodarek
d519e17e2d net: export device speed and duplex via sysfs
This patch exports the link-speed (in Mbps) and duplex of an interface
via sysfs.  This eliminates the need to use ethtool just to check the
link-speed.  Not requiring 'ethtool' and not relying on the SIOCETHTOOL
ioctl should be helpful in an embedded environment where space is at a
premium as well.

NOTE: This patch also intentionally allows non-root users to check the link
speed and duplex -- something not possible with ethtool.

Here's some sample output:

# cat /sys/class/net/eth0/speed
100
# cat /sys/class/net/eth0/duplex
half
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  Not reported
        Advertised auto-negotiation: No
        Speed: 100Mb/s
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x000000ff (255)
        Link detected: yes

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:43:35 -07:00
Marcel Holtmann
053a93dd12 cfg80211: assign device type in netdev notifier callback
Instead of having to modify every non-mac80211 for device type assignment,
do this inside the netdev notifier callback of cfg80211. So all drivers
that integrate with cfg80211 will export a proper device type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:43:34 -07:00
Johannes Berg
7ffbe3fdac net: introduce NETDEV_POST_INIT notifier
For various purposes including a wireless extensions
bugfix, we need to hook into the netdev creation before
before netdev_register_kobject(). This will also ease
doing the dev type assignment that Marcel was working
on for cfg80211 drivers w/o touching them all.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:43:34 -07:00
Eric Dumazet
0bfbedb14a tunnels: Optimize tx path
We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
counters that already are present in cpu cache, in the cache
line shared with txq->_xmit_lock

This patch extends IPTUNNEL_XMIT() macro to use txq pointer
provided by the caller.

Also &tunnel->dev->stats can be replaced by &dev->stats

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:57 -07:00
Stephen Hemminger
16c6cf8bb4 ipv4: fib table algorithm performance improvement
The FIB algorithim for IPV4 is set at compile time, but kernel goes through
the overhead of function call indirection at runtime. Save some
cycles by turning the indirect calls to direct calls to either
hash or trie code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:56 -07:00
Neil Horman
977750076d af_packet: add interframe drop cmsg (v6)
Add Ancilliary data to better represent loss information

I've had a few requests recently to provide more detail regarding frame loss
during an AF_PACKET packet capture session.  Specifically the requestors want to
see where in a packet sequence frames were lost, i.e. they want to see that 40
frames were lost between frames 302 and 303 in a packet capture file.  In order
to do this we need:

1) The kernel to export this data to user space
2) The applications to make use of it

This patch addresses item (1).  It does this by doing the following:

A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
also no increment a stats called po->stats.tp_gap.

B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
record this, but since all the space there is used up, we're overloading
skb->mark.  Its safe to do since any enqueued packet is guaranteed to be
unshared at this point, and skb->mark isn't used for anything else in the rx
path to the application.  After we record tp_gap in the skb, we zero
po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
between any two enqueued packets

C) When the application goes to dequeue a frame from the packet socket, we look
at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
msghdr of level SOL_PACKET and type PACKET_GAPDATA.  Its a 32 bit integer that
represents the number of frames lost between this packet and the last previous
frame received.

Note there is a chance that if there is frame loss after a receive, and then the
socket is closed, some gap data might be lost.  This is covered by the use of
the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
math, the final gap can be determined that way.

I've tested this patch myself, and it works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

 include/linux/if_packet.h |    2 ++
 net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:55 -07:00
Eric Dumazet
0835acfe72 pktgen: Avoid dirtying skb->users when txq is full
We can avoid two atomic ops on skb->users if packet is not going to be
sent to the device (because hardware txqueue is full)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:54 -07:00
Eric Dumazet
b3a5b6cc7c icmp: No need to call sk_write_space()
We can make icmp messages tx completion callback a litle bit faster.

Setting SOCK_USE_WRITE_QUEUE sk flag tells sock_wfree() to
not call sk_write_space() on a socket we know no thread is posssibly
waiting for write space. (on per cpu kernel internal icmp sockets only)

This avoids the sock_def_write_space() call and
read_lock(&sk->sk_callback_lock)/read_unlock(&sk->sk_callback_lock) calls
as well.

We avoid three atomic ops.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:21:54 -07:00
Ben Hutchings
a9828ec6bc ethtool: Remove support for obsolete string query operations
The in-tree implementations have all been converted to
get_sset_count().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 00:10:11 -07:00
Eric Dumazet
9240d7154e pktgen: restore nanosec delays
Commit fd29cf72 (pktgen: convert to use ktime_t)
inadvertantly converted "delay" parameter from nanosec to microsec.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-04 21:08:58 -07:00
Eric Dumazet
896a7cf8d8 pktgen: Fix multiqueue handling
It is not currently possible to instruct pktgen to use one selected tx queue.

When Robert added multiqueue support in commit 45b270f8, he added
an interval (queue_map_min, queue_map_max), and his code doesnt take
into account the case of min = max, to select one tx queue exactly.

I suspect a high performance setup on a eight txqueue device wants
to use exactly eight cpus, and assign one tx queue to each sender.

This patchs makes pktgen select the right tx queue, not the first one.

Also updates Documentation to reflect Robert changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-04 21:08:54 -07:00
Alexey Dobriyan
a99bbaf5ee headers: remove sched.h from poll.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-04 15:05:10 -07:00
Linus Torvalds
90d5ffc729 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
  cnic: Fix NETDEV_UP event processing.
  uvesafb/connector: Disallow unpliviged users to send netlink packets
  pohmelfs/connector: Disallow unpliviged users to configure pohmelfs
  dst/connector: Disallow unpliviged users to configure dst
  dm/connector: Only process connector packages from privileged processes
  connector: Removed the destruct_data callback since it is always kfree_skb()
  connector/dm: Fixed a compilation warning
  connector: Provide the sender's credentials to the callback
  connector: Keep the skb in cn_callback_data
  e1000e/igb/ixgbe: Don't report an error if devices don't support AER
  net: Fix wrong sizeof
  net: splice() from tcp to pipe should take into account O_NONBLOCK
  net: Use sk_mark for routing lookup in more places
  sky2: irqname based on pci address
  skge: use unique IRQ name
  IPv4 TCP fails to send window scale option when window scale is zero
  net/ipv4/tcp.c: fix min() type mismatch warning
  Kconfig: STRIP: Remove stale bits of STRIP help text
  NET: mkiss: Fix typo
  tg3: Remove prev_vlan_tag from struct tx_ring_info
  ...
2009-10-02 13:37:18 -07:00
Eric Dumazet
42324c6270 net: splice() from tcp to pipe should take into account O_NONBLOCK
tcp_splice_read() doesnt take into account socket's O_NONBLOCK flag

Before this patch :

splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block (if pipe is full) and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.

User application has no way to instruct splice() that socket should be in blocking mode
but pipe in nonblock more.

Many projects cannot use splice(tcp -> pipe) because of this flaw.

http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html

Linus introduced  SPLICE_F_NONBLOCK in commit 29e350944f
(splice: add SPLICE_F_NONBLOCK flag )

  It doesn't make the splice itself necessarily nonblocking (because the
  actual file descriptors that are spliced from/to may block unless they
  have the O_NONBLOCK flag set), but it makes the splice pipe operations
  nonblocking.

Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only

This patch instruct tcp_splice_read() to use the underlying file O_NONBLOCK
flag, as other socket operations do.

Users will then call :

splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );

to block on data coming from socket (if file is in blocking mode),
and not block on pipe output (to avoid deadlock)

First version of this patch was submitted by Octavian Purdila

Reported-by: Volker Lendecke <vl@samba.org>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-02 09:46:05 -07:00
Atis Elsts
914a9ab386 net: Use sk_mark for routing lookup in more places
This patch against v2.6.31 adds support for route lookup using sk_mark in some 
more places. The benefits from this patch are the following.
First, SO_MARK option now has effect on UDP sockets too.
Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing 
lookup correctly if TCP sockets with SO_MARK were used.

Signed-off-by: Atis Elsts <atis@mikrotik.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-10-01 15:16:49 -07:00
Ori Finkelman
89e95a613c IPv4 TCP fails to send window scale option when window scale is zero
Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
and SYN headers even if our window scale is zero.

This fixes the following observed behavior:

1. Client sends a SYN with TCP window scaling option and non zero window scale
value to a Linux box.
2. Linux box notes large receive window from client.
3. Linux decides on a zero value of window scale for its part.
4. Due to compare against requested window scale size option, Linux does not to
 send windows scale TCP option header on SYN/ACK at all.

With the following result:

Client box thinks TCP window scaling is not supported, since SYN/ACK had no
TCP window scale option, while Linux thinks that TCP window scaling is
supported (and scale might be non zero), since SYN had  TCP window scale
option and we have a mismatched idea between the client and server
regarding window sizes.

Probably it also fixes up the following bug (not observed in practice):

1. Linux box opens TCP connection to some server.
2. Linux decides on zero value of window scale.
3. Due to compare against computed window scale size option, Linux does
not to set windows scale TCP  option header on SYN.

With the expected result that the server OS does not use window scale option
due to not receiving such an option in the SYN headers, leading to suboptimal
performance.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-01 15:14:51 -07:00
Andrew Morton
4fdb78d309 net/ipv4/tcp.c: fix min() type mismatch warning
net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-01 15:02:20 -07:00
David S. Miller
a98917acc7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-10-01 12:43:07 -07:00
Eric Dumazet
417bc4b855 pktgen: Fix delay handling
After last pktgen changes, delay handling is wrong.

pktgen actually sends packets at full line speed.

Fix is to update pkt_dev->next_tx even if spin() returns early,
so that next spin() calls have a chance to see a positive delay.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-01 09:29:45 -07:00
Linus Torvalds
817b33d38f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ax25: Fix possible oops in ax25_make_new
  net: restore tx timestamping for accelerated vlans
  Phonet: fix mutex imbalance
  sit: fix off-by-one in ipip6_tunnel_get_prl
  net: Fix sock_wfree() race
  net: Make setsockopt() optlen be unsigned.
2009-09-30 17:36:45 -07:00
Jarek Poplawski
8c185ab618 ax25: Fix possible oops in ax25_make_new
In ax25_make_new, if kmemdup of digipeat returns an error, there would
be an oops in sk_free while calling sk_destruct, because sk_protinfo
is NULL at the moment; move sk->sk_destruct initialization after this.

BTW of reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:44:12 -07:00
Eric Dumazet
81bbb3d404 net: restore tx timestamping for accelerated vlans
Since commit 9b22ea5609
( net: fix packet socket delivery in rx irq handler )

We lost rx timestamping of packets received on accelerated vlans.

Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps
too late (ie at skb dequeueing time, not at skb queueing time)

14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1
14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1

14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2
14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2

14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3
14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:42:42 -07:00
Rémi Denis-Courmont
013820a360 Phonet: fix mutex imbalance
From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

port_mutex was unlocked twice.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:41:34 -07:00
Sascha Hlusiak
298bf12ddb sit: fix off-by-one in ipip6_tunnel_get_prl
When requesting all prl entries (kprl.addr == INADDR_ANY) and there are
more prl entries than there is space passed from userspace, the existing
code would always copy cmax+1 entries, which is more than can be handled.

This patch makes the kernel copy only exactly cmax entries.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:39:27 -07:00
Eric Dumazet
d99927f4d9 net: Fix sock_wfree() race
Commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.

Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:20:38 -07:00
David S. Miller
b7058842c9 net: Make setsockopt() optlen be unsigned.
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:12:20 -07:00
Linus Torvalds
5a4c8d75f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  sony-laptop: re-read the rfkill state when resuming from suspend
  sony-laptop: check for rfkill hard block at load time
  wext: add back wireless/ dir in sysfs for cfg80211 interfaces
  wext: Add bound checks for copy_from_user
  mac80211: improve/fix mlme messages
  cfg80211: always get BSS
  iwlwifi: fix 3945 ucode info retrieval after failure
  iwlwifi: fix memory leak in command queue handling
  iwlwifi: fix debugfs buffer handling
  cfg80211: don't set privacy w/o key
  cfg80211: wext: don't display BSSID unless associated
  net: Add explicit bound checks in net/socket.c
  bridge: Fix double-free in br_add_if.
  isdn: fix netjet/isdnhdlc build errors
  atm: dereference of he_dev->rbps_virt in he_init_group()
  ax25: Add missing dev_put in ax25_setsockopt
  Revert "sit: stateless autoconf for isatap"
  net: fix double skb free in dcbnl
  net: fix nlmsg len size for skb when error bit is set.
  net: fix vlan_get_size to include vlan_flags size
  ...
2009-09-30 08:07:12 -07:00
Igor Perminov
1f08e84ff6 mac80211: Fix [re]association power saving issue on AP side
Consider the following step-by step:
1. A STA authenticates and associates with the AP and exchanges
traffic.
2. The STA reports to the AP that it is going to PS state.
3. Some time later the STA device goes to the stand-by mode (not only
its wi-fi card, but the device itself) and drops the association state
without sending a disassociation frame.
4. The STA device wakes up and begins authentication with an
Auth frame as it hasn't been authenticated/associated previously.

At the step 4 the AP "remembers" the STA and considers it is still in
the PS state, so the AP buffers frames, which it has to send to the STA.
But the STA isn't actually in the PS state and so it neither checks
TIM bits nor reports to the AP that it isn't power saving.
Because of that authentication/[re]association fails.

To fix authentication/[re]association stage of this issue, Auth, Assoc
Resp and Reassoc Resp frames are transmitted disregarding of STA's power
saving state.

N.B. This patch doesn't fix further data frame exchange after
authentication/[re]association. A patch in hostapd is required to fix
that.

Signed-off-by: Igor Perminov <igor.perminov@inbox.ru>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-29 17:25:15 -04:00
David S. Miller
eb1cf0f8f7 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-09-28 14:50:06 -07:00
Johannes Berg
8f1546cadf wext: add back wireless/ dir in sysfs for cfg80211 interfaces
The move away from having drivers assign wireless handlers,
in favour of making cfg80211 assign them, broke the sysfs
registration (the wireless/ dir went missing) because the
handlers are now assigned only after registration, which is
too late.

Fix this by special-casing cfg80211-based devices, all
of which are required to have an ieee80211_ptr, in the
sysfs code, and also using get_wireless_stats() to have
the same values reported as in procfs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:07 -04:00
Arjan van de Ven
8503bd8c7d wext: Add bound checks for copy_from_user
The wireless extensions have a copy_from_user to a local stack
array "essid", but both me and gcc have failed to find where
the bounds for this copy are located in the code.

This patch adds some basic sanity checks for the copy length
to make sure that we don't overflow the stack buffer.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:06 -04:00
Johannes Berg
0ff716136a mac80211: improve/fix mlme messages
It's useful to know the MAC address when being
disassociated; fix a typo (missing colon) and
move some messages so we get them only when they
are actually taking effect.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:05 -04:00
Johannes Berg
8bb894859e cfg80211: always get BSS
Multiple problems were reported due to interaction
between wpa_supplicant and the wext compat code in
cfg80211, which appear to be due to it not getting
any bss pointer here when wpa_supplicant sets all
parameters -- do that now. We should still get the
bss after doing an extra scan, but that appears to
increase the time we need for connecting enough to
sometimes cause timeouts.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Hin-Tak Leung <hintak.leung@gmail.com>,
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:05 -04:00
Johannes Berg
4be3bd8ccc cfg80211: don't set privacy w/o key
When wpa_supplicant is used to connect to open networks,
it causes the wdev->wext.keys to point to key memory, but
that key memory is all empty. Only use privacy when there
is a default key to be used.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:04 -04:00
Johannes Berg
33de4f9d78 cfg80211: wext: don't display BSSID unless associated
Currently, cfg80211's SIOCGIWAP implementation returns
the BSSID that the user set, even if the connection has
since been dropped due to other changes. It only should
return the current BSSID when actually connected.

Also do a small code cleanup.

Reported-by: Thomas H. Guenther <thomas.h.guenther@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Thomas H. Guenther <thomas.h.guenther@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-28 16:55:04 -04:00
Arjan van de Ven
47379052b5 net: Add explicit bound checks in net/socket.c
The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot deal with this in
terms of proving that the copy_from_user() is then always in bounds.
This is the last (well 9th of this series, but last in the kernel) such
case around.

With this patch, we can turn on code to make having the boundary provably
right for the whole kernel, and detect introduction of new security
accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-28 12:57:44 -07:00
Jeff Hansen
30df94f800 bridge: Fix double-free in br_add_if.
There is a potential double-kfree in net/bridge/br_if.c.  If br_fdb_insert
fails, then the kobject is put back (which calls kfree due to the kobject
release), and then kfree is called again on the net_bridge_port.  This
patch fixes the crash.

Thanks to Stephen Hemminger for the one-line fix.

Signed-off-by: Jeff Hansen <x@jeffhansen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-28 12:54:25 -07:00
Ralf Baechle
2f72291d3d ax25: Add missing dev_put in ax25_setsockopt
ax25_setsockopt SO_BINDTODEVICE is missing a dev_put call in case of
success.  Re-order code to fix this bug.  While at it also reformat two
lines of code to comply with the Linux coding style.

Initial patch by Jarek Poplawski <jarkao2@gmail.com>.

Reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-28 12:26:28 -07:00
Alexey Dobriyan
f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Sascha Hlusiak
d1f8297a96 Revert "sit: stateless autoconf for isatap"
This reverts commit 645069299a.

While the code does not actually break anything, it does not completely follow
RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better
implemented in userspace.

The kernel should not send RS packages for ISATAP at all.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-26 20:28:07 -07:00
John Fastabend
7eaf5077b3 net: fix double skb free in dcbnl
netlink_unicast() calls kfree_skb even in the error case.

dcbnl calls netlink_unicast() which when it fails free's the
skb and returns an error value.  dcbnl is free'ing the skb
again when this error occurs.  This patch removes the double
free.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-26 20:16:15 -07:00