1
Commit Graph

12258 Commits

Author SHA1 Message Date
Linus Torvalds
c8bce3d3bd Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
  svcrdma: dma unmap the correct length for the RPCRDMA header page.
  nfsd: Revert "svcrpc: take advantage of tcp autotuning"
  nfsd: fix hung up of nfs client while sync write data to nfs server
2009-05-29 08:49:09 -07:00
Steve Wise
98779be861 svcrdma: dma unmap the correct length for the RPCRDMA header page.
The svcrdma module was incorrectly unmapping the RPCRDMA header page.
On IBM pserver systems this causes a resource leak that results in
running out of bus address space (10 cthon iterations will reproduce it).
The code was mapping the full page but only unmapping the actual header
length.  The fix is to only map the header length.

I also cleaned up the use of ib_dma_map_page() calls since the unmap
logic always uses ib_dma_unmap_single().  I made these symmetrical.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 18:57:24 -04:00
J. Bruce Fields
7f4218354f nfsd: Revert "svcrpc: take advantage of tcp autotuning"
This reverts commit 47a14ef1af "svcrpc:
take advantage of tcp autotuning", which uncovered some further problems
in the server rpc code, causing significant performance regressions in
common cases.

We will likely reinstate this patch after releasing 2.6.30 and applying
some work on the underlying fixes to the problem (developed by Trond).

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 18:51:06 -04:00
Linus Torvalds
e2a1b9ee23 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFSv4: Fix the case where NFSv4 renewal fails
  nfs: fix build error in nfsroot with initconst
  XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices
2009-05-26 12:15:35 -07:00
Vu Pham
68743082b5 XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices
mlx4/connectX FRMR requires local write enable together with remote
rdma write enable. This fixes NFS/RDMA operation over the ConnectX
Infiniband HCA in the default memreg mode.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-05-26 14:51:00 -04:00
Doug Leith
c80a5cdfc5 tcp: tcp_vegas ssthresh bugfix
This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases

Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:44:59 -07:00
Dan Carpenter
0975ecba3b RxRPC: Error handling for rxrpc_alloc_connection()
rxrpc_alloc_connection() doesn't return an error code on failure, it just
returns NULL.  IS_ERR(NULL) is false.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:22:02 -07:00
Robert Olsson
3ed18d76d9 ipv4: Fix oops with FIB_TRIE
It seems we can fix this by disabling preemption while we re-balance the 
trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high 
loads continuesly taking a full BGP table up/down via iproute -batch.

Note. fib_trie is not updated for CONFIG_PREEMPT_RCU

Reported-by: Andrei Popa
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:20:59 -07:00
Florian Westphal
5b5f792a6a pktgen: do not access flows[] beyond its length
typo -- pkt_dev->nflows is for stats only, the number of concurrent
flows is stored in cflows.

Reported-By: Vladimir Ivashchenko <hazard@francoudi.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:07:12 -07:00
Jean-Mickael Guerin
4f72427998 IPv6: set RTPROT_KERNEL to initial route
The use of unspecified protocol in IPv6 initial route prevents quagga to
install IPv6 default route:
# show ipv6 route
S   ::/0 [1/0] via fe80::1, eth1_0
K>* ::/0 is directly connected, lo, rej
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0

# ip -6 route
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
unreachable default dev lo  proto none  metric -1  error -101 hoplimit 255

The attached patch ensures RTPROT_KERNEL to the default initial route
and fixes the problem for quagga.
This is similar to "ipv6: protocol for address routes"
f410a1fba7.

# show ipv6 route
S>* ::/0 [1/0] via fe80::1, eth1_0
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0

# ip -6 route
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
default via fe80::1 dev eth1_0  proto zebra  metric 1024  mtu 1500
advmss 1440 hoplimit -1
unreachable default dev lo  proto kernel  metric -1  error -101 hoplimit 255

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:38:59 -07:00
David S. Miller
86c2fe1e3a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-05-20 17:31:25 -07:00
Eric Dumazet
1ddbcb005c net: fix rtable leak in net/ipv4/route.c
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709fb
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.

Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:18:02 -07:00
Eric Dumazet
cf8da764fc net: fix length computation in rt_check_expire()
rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:18:01 -07:00
Luis R. Rodriguez
5078b2e32a cfg80211: fix race between core hint and driver's custom apply
Its possible for cfg80211 to have scheduled the work and for
the global workqueue to not have kicked in prior to a cfg80211
driver's regulatory hint or wiphy_apply_custom_regulatory().

Although this is very unlikely its possible and should fix
this race. When this race would happen you are expected to have
hit a null pointer dereference panic.

Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:29:54 -04:00
Johannes Berg
88f16db7a2 wext: verify buffer size for SIOCSIWENCODEEXT
Another design flaw in wireless extensions (is anybody
surprised?) in the way it handles the iw_encode_ext
structure: The structure is part of the 'extra' memory
but contains the key length explicitly, instead of it
just being the length of the extra buffer - size of
the struct and using the explicit key length only for
the get operation (which only writes it).

Therefore, we have this layout:

extra: +-------------------------+
       | struct iw_encode_ext  { |
       |     ...                 |
       |     u16 key_len;        |
       |     u8 key[0];          |
       | };                      |
       +-------------------------+
       | key material            |
       +-------------------------+

Now, all drivers I checked use ext->key_len without
checking that both key_len and the struct fit into the
extra buffer that has been copied from userspace. This
leads to a buffer overrun while reading that buffer,
depending on the driver it may be possible to specify
arbitrary key_len or it may need to be a proper length
for the key algorithm specified.

Thankfully, this is only exploitable by root, but root
can actually cause a segfault or use kernel memory as
a key (which you can even get back with siocgiwencode
or siocgiwencodeext from the key buffer).

Fix this by verifying that key_len fits into the buffer
along with struct iw_encode_ext.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:07:50 -04:00
Frans Pop
bc8a539743 ipv4: make default for INET_LRO consistent with help text
Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.")
changed this config from tristate to bool.  Add default so that it is
consistent with the help text.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 21:48:38 -07:00
Thomas Chenault
995b337952 net: fix skb_seq_read returning wrong offset/length for page frag data
When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.

Signed-off-by: Thomas Chenault <thomas_chenault@dell.com>
Tested-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 21:43:27 -07:00
Eric Dumazet
511e11e396 pkt_sched: gen_estimator: use 64 bit intermediate counters for bps
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit
(2^32 bytes)

Using 64 bit intermediate avbps/brate counters can allow us to reach
this theorical limit.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 19:26:37 -07:00
Eric Dumazet
c0f84d0d4b sch_teql: should not dereference skb after ndo_start_xmit()
It is illegal to dereference a skb after a successful ndo_start_xmit()
call. We must store skb length in a local variable instead.

Bug was introduced in 2.6.27 by commit 0abf77e55a
(net_sched: Add accessor function for packet length for qdiscs)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:12:31 -07:00
Ilpo Järvinen
7752731318 tcp: fix MSG_PEEK race check
Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.

I tried my best to deal with urg hole madness too which happens
here:
	if (!sock_flag(sk, SOCK_URGINLINE)) {
		++*seq;
		...
by using additional offset by one but I certainly have very
little interest in testing that part.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Frans Pop <elendil@planet.nl>
Tested-by: Ian Zimmermann <itz@buug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:05:40 -07:00
Stephen Hemminger
4f0611af47 bridge: fix initial packet flood if !STP
If bridge is configured with no STP and forwarding delay of 0 (which
is typical for virtualization) then when link starts it will flood all
packets for the first 20 seconds.

This bug was introduced by a combination of earlier changes:
  * forwarding database uses hold time of zero to indicate
    user wants to always flood packets
  * optimzation of the case of forwarding delay of 0 avoids the initial
    timer tick

The fix is to just skip all the topology change detection code if
kernel STP is not being used.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 21:12:55 -07:00
Stephen Hemminger
a598f6aebe bridge: relay bridge multicast pkgs if !STP
Currently the bridge catches all STP packets; even if STP is turned
off.  This prevents other systems (which do have STP turned on)
from being able to detect loops in the network.

With this patch, if STP is off, then any packet sent to the STP
multicast group address is forwarded to all ports.

Based on earlier patch by Joakim Tjernlund with changes
to go through forwarding (not local chain), and optimization
that only last octet needs to be checked.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 21:12:54 -07:00
Chris Friesen
2513dfb83f ipconfig: handle case of delayed DHCP server
If a DHCP server is delayed, it's possible for the client to receive the 
DHCPOFFER after it has already sent out a new DHCPDISCOVER message from 
a second interface.  The client then sends out a DHCPREQUEST from the 
second interface, but the server doesn't recognize the device and 
rejects the request.

This patch simply tracks the current device being configured and throws 
away the OFFER if it is not intended for the current device.  A more 
sophisticated approach would be to put the OFFER information into the 
struct ic_device rather than storing it globally.

Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:39:33 -07:00
Pavel Emelyanov
5e392739d6 netpoll: don't dereference NULL dev from np
It looks like the dev in netpoll_poll can be NULL - at lease it's
checked at the function beginning. Thus the dev->netde_ops dereference
looks dangerous.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:37:55 -07:00
Linus Torvalds
7c7327d966 Merge git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6:
  Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing
  Bluetooth: Don't use hci_acl_connect_cancel() for incoming connections
  Bluetooth: Fix wrong module refcount when connection setup fails

Another case of me handling the fallout from Davem's unfortunate
addiction to shuffleboard.

Won't anybody think of the children? Join the anti-shuffleboard league
today!
2009-05-15 14:30:02 -07:00
Linus Torvalds
3346857f6f Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
  iwlwifi: fix device id registration for 6000 series 2x2 devices
  ath5k: update channel in sw state after stopping RX and TX
  rtl8187: use DMA-aware buffers with usb_control_msg
  mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
  airo: airo_get_encode{,ext} potential buffer overflow

Pulled directly by Linus because Davem is off playing shuffle-board at
some Alaskan cruise, and the NULL ptr deref issue hits people and should
get merged sooner rather than later.

David - make us proud on the shuffle-board tournament!
2009-05-15 12:02:06 -07:00
Linus Torvalds
2ea3f86848 Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
  nfsd: silence lockdep warning
  lockd: fix list corruption on lockd restart
  nfsd4: check for negative dentry before use in nfsv4 readdir
  nfsd41: slots are freed with session
  svcrdma: clean up error paths.
  svcrdma: Fix dma map direction for rdma read targets
2009-05-12 17:11:56 -07:00
John W. Linville
621ad7c96a mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
"There is another problem with this piece of code. The sband will be NULL
after second iteration on single band device and cause null pointer
dereference. Everything is working with dual band card. Sorry, but i
don't know how to explain this clearly in English. I have looked on the
second patch for pid algorithm and found similar bug."

Reported-by: Karol Szuster <qflon@o2.pl>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:07:01 -04:00
Linus Torvalds
2ad20802b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
  bonding: fix panic if initialization fails
  IXP4xx: complete Ethernet netdev setup before calling register_netdev().
  IXP4xx: use "ENODEV" instead of "ENOSYS" in module initialization.
  ipvs: Fix IPv4 FWMARK virtual services
  ipv4: Make INET_LRO a bool instead of tristate.
  net: remove stale reference to fastroute from Kconfig help text
  net: update skb_recycle_check() for hardware timestamping changes
  bnx2: Fix panic in bnx2_poll_work().
  net-sched: fix bfifo default limit
  igb: resolve panic on shutdown when SR-IOV is enabled
  wimax: oops: wimax_dev_add() is the only one that can initialize the state
  wimax: fix oops if netlink fails to add attribute
  Bluetooth: Move dev_set_name() to a context that can sleep
  netfilter: ctnetlink: fix wrong message type in user updates
  netfilter: xt_cluster: fix use of cluster match with 32 nodes
  netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE
  netfilter: add missing linux/types.h include to xt_LED.h
  mac80211: pid, fix memory corruption
  mac80211: minstrel, fix memory corruption
  cfg80211: fix comment on regulatory hint processing
  ...
2009-05-10 10:46:45 -07:00
Marcel Holtmann
3d7a9d1c7e Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing
A remote device in security mode 3 that tries to connect will require
the pairing during the connection setup phase. The disconnect timeout
is now triggered within 10 milliseconds and causes the pairing to fail.

If a connection is not fully established and a PIN code request is
received, don't trigger the disconnect timeout. The either successful
or failing connection complete event will make sure that the timeout
is triggered at the right time.

The biggest problem with security mode 3 is that many Bluetooth 2.0
device and before use a temporary security mode 3 for dedicated
bonding.

Based on a report by Johan Hedberg <johan.hedberg@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Johan Hedberg <johan.hedberg@nokia.com>
2009-05-09 18:09:52 -07:00
Marcel Holtmann
1b0336bb36 Bluetooth: Don't use hci_acl_connect_cancel() for incoming connections
The connection setup phase takes around 2 seconds or longer and in
that time it is possible that the need for an ACL connection is no
longer present. If that happens then, the connection attempt will
be canceled.

This only applies to outgoing connections, but currently it can also
be triggered by incoming connection. Don't call hci_acl_connect_cancel()
on incoming connection since these have to be either accepted or rejected
in this state. Once they are successfully connected they need to be
fully disconnected anyway.

Also remove the wrong hci_acl_disconn() call for SCO and eSCO links
since at this stage they can't be disconnected either, because the
connection handle is still unknown.

Based on a report by Johan Hedberg <johan.hedberg@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Johan Hedberg <johan.hedberg@nokia.com>
2009-05-09 18:09:45 -07:00
Marcel Holtmann
384943ec1b Bluetooth: Fix wrong module refcount when connection setup fails
The module refcount is increased by hci_dev_hold() call in hci_conn_add()
and decreased by hci_dev_put() call in del_conn(). In case the connection
setup fails, hci_dev_put() is never called.

Procedure to reproduce the issue:

  # hciconfig hci0 up
  # lsmod | grep btusb                   -> "used by" refcount = 1

  # hcitool cc <non-exisiting bdaddr>    -> will get timeout

  # lsmod | grep btusb                   -> "used by" refcount = 2
  # hciconfig hci0 down
  # lsmod | grep btusb                   -> "used by" refcount = 1
  # rmmod btusb                          -> ERROR: Module btusb is in use

The hci_dev_put() call got moved into del_conn() with the 2.6.25 kernel
to fix an issue with hci_dev going away before hci_conn. However that
change was wrong and introduced this problem.

When calling hci_conn_del() it has to call hci_dev_put() after freeing
the connection details. This handling should be fully symmetric. The
execution of del_conn() is done in a work queue and needs it own calls
to hci_dev_hold() and hci_dev_put() to ensure that the hci_dev stays
until the connection cleanup has been finished.

Based on a report by Bing Zhao <bzhao@marvell.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Bing Zhao <bzhao@marvell.com>
2009-05-09 18:09:38 -07:00
Simon Horman
be8be9eccb ipvs: Fix IPv4 FWMARK virtual services
This fixes the use of fwmarks to denote IPv4 virtual services
which was unfortunately broken as a result of the integration
of IPv6 support into IPVS, which was included in 2.6.28.

The problem arises because fwmarks are stored in the 4th octet
of a union nf_inet_addr .all, however in the case of IPv4 only
the first octet, corresponding to .ip, is assigned and compared.

In other words, using .all = { 0, 0, 0, htonl(svc->fwmark) always
results in a value of 0 (32bits) being stored for IPv4. This means
that one fwmark can be used, as it ends up being mapped to 0, but things
break down when multiple fwmarks are used, as they all end up being mapped
to 0.

As fwmarks are 32bits a reasonable fix seems to be to just store the fwmark
in .ip, and comparing and storing .ip when fwmarks are used.

This patch makes the assumption that in calls to ip_vs_ct_in_get()
and ip_vs_sched_persist() if the proto parameter is IPPROTO_IP then
we are dealing with an fwmark. I believe this is valid as ip_vs_in()
does fairly strict filtering on the protocol and IPPROTO_IP should
not be used in these calls unless explicitly passed when making
these calls for fwmarks in ip_vs_sched_persist().

Tested-by: Fabien Duchêne <fabien.duchene@student.uclouvain.be>
Cc: Joseph Mack NA3T <jmack@wm7d.net>
Cc: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 14:54:47 -07:00
David S. Miller
e81963b180 ipv4: Make INET_LRO a bool instead of tristate.
This code is used as a library by several device drivers,
which select INET_LRO.

If some are modules and some are statically built into the
kernel, we get build failures if INET_LRO is modular.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 12:45:26 -07:00
Ashish Karkare
9b05126baa net: remove stale reference to fastroute from Kconfig help text
Signed-off-by: Ashish Karkare <akarkare@marvell.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-07 16:31:01 -07:00
Lennert Buytenhek
b805007545 net: update skb_recycle_check() for hardware timestamping changes
Commit ac45f602ee ("net: infrastructure
for hardware time stamping") added two skb initialization actions to
__alloc_skb(), which need to be added to skb_recycle_check() as well.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-06 16:49:18 -07:00
Patrick McHardy
6473990c7f net-sched: fix bfifo default limit
When no limit is given, the bfifo uses a default of tx_queue_len * mtu.
Packets handled by qdiscs include the link layer header, so this should
be taken into account, similar to what other qdiscs do.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-06 16:45:07 -07:00
David S. Miller
a860820dce Merge branch 'linux-2.6.30.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2009-05-06 16:42:19 -07:00
Inaky Perez-Gonzalez
94c7f2d495 wimax: oops: wimax_dev_add() is the only one that can initialize the state
When a new wimax_dev is created, it's state has to be __WIMAX_ST_NULL
until wimax_dev_add() is succesfully called. This allows calls into
the stack that happen before said time to be rejected.

Until now, the state was being set (by mistake) to UNINITIALIZED,
which was allowing calls such as wimax_report_rfkill_hw() to go
through even when a call to wimax_dev_add() had failed; that was
causing an oops when touching uninitialized data.

This situation is normal when the device starts reporting state before
the whole initialization has been completed. It just has to be dealt
with.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-05-06 13:48:37 -07:00
Inaky Perez-Gonzalez
d1a2627a29 wimax: fix oops if netlink fails to add attribute
When sending a message to user space using wimax_msg(), if nla_put()
fails, correctly interpret the return code from wimax_msg_alloc() as
an err ptr and return the error code instead of crashing (as it is
assuming than non-NULL means the pointer is ok).

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-05-06 13:48:36 -07:00
Marcel Holtmann
457ca7bb6b Bluetooth: Move dev_set_name() to a context that can sleep
Setting the name of a sysfs device has to be done in a context that can
actually sleep. It allocates its memory with GFP_KERNEL. Previously it
was a static (size limited) string and that got changed to accommodate
longer device names. So move the dev_set_name() just before calling
device_add() which is executed in a work queue.

This fixes the following error:

[  110.012125] BUG: sleeping function called from invalid context at mm/slub.c:1595
[  110.012135] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
[  110.012141] 2 locks held by swapper/0:
[  110.012145]  #0:  (hci_task_lock){++.-.+}, at: [<ffffffffa01f822f>] hci_rx_task+0x2f/0x2d0 [bluetooth]
[  110.012173]  #1:  (&hdev->lock){+.-.+.}, at: [<ffffffffa01fb9e2>] hci_event_packet+0x72/0x25c0 [bluetooth]
[  110.012198] Pid: 0, comm: swapper Tainted: G        W 2.6.30-rc4-g953cdaa #1
[  110.012203] Call Trace:
[  110.012207]  <IRQ>  [<ffffffff8023eabd>] __might_sleep+0x14d/0x170
[  110.012228]  [<ffffffff802cfbe1>] __kmalloc+0x111/0x170
[  110.012239]  [<ffffffff803c2094>] kvasprintf+0x64/0xb0
[  110.012248]  [<ffffffff803b7a5b>] kobject_set_name_vargs+0x3b/0xa0
[  110.012257]  [<ffffffff80465326>] dev_set_name+0x76/0xa0
[  110.012273]  [<ffffffffa01fb9e2>] ? hci_event_packet+0x72/0x25c0 [bluetooth]
[  110.012289]  [<ffffffffa01ffc1d>] hci_conn_add_sysfs+0x3d/0x70 [bluetooth]
[  110.012303]  [<ffffffffa01fba2c>] hci_event_packet+0xbc/0x25c0 [bluetooth]
[  110.012312]  [<ffffffff80516eb0>] ? sock_def_readable+0x80/0xa0
[  110.012328]  [<ffffffffa01fee0c>] ? hci_send_to_sock+0xfc/0x1c0 [bluetooth]
[  110.012343]  [<ffffffff80516eb0>] ? sock_def_readable+0x80/0xa0
[  110.012347]  [<ffffffff805e88c5>] ? _read_unlock+0x75/0x80
[  110.012354]  [<ffffffffa01fee0c>] ? hci_send_to_sock+0xfc/0x1c0 [bluetooth]
[  110.012360]  [<ffffffffa01f8403>] hci_rx_task+0x203/0x2d0 [bluetooth]
[  110.012365]  [<ffffffff80250ab5>] tasklet_action+0xb5/0x160
[  110.012369]  [<ffffffff8025116c>] __do_softirq+0x9c/0x150
[  110.012372]  [<ffffffff805e850f>] ? _spin_unlock+0x3f/0x80
[  110.012376]  [<ffffffff8020cbbc>] call_softirq+0x1c/0x30
[  110.012380]  [<ffffffff8020f01d>] do_softirq+0x8d/0xe0
[  110.012383]  [<ffffffff80250df5>] irq_exit+0xc5/0xe0
[  110.012386]  [<ffffffff8020e71d>] do_IRQ+0x9d/0x120
[  110.012389]  [<ffffffff8020c3d3>] ret_from_intr+0x0/0xf
[  110.012391]  <EOI>  [<ffffffff80431832>] ? acpi_idle_enter_bm+0x264/0x2a6
[  110.012399]  [<ffffffff80431828>] ? acpi_idle_enter_bm+0x25a/0x2a6
[  110.012403]  [<ffffffff804f50d5>] ? cpuidle_idle_call+0xc5/0x130
[  110.012407]  [<ffffffff8020a4b4>] ? cpu_idle+0xc4/0x130
[  110.012411]  [<ffffffff805d2268>] ? rest_init+0x88/0xb0
[  110.012416]  [<ffffffff807e2fbd>] ? start_kernel+0x3b5/0x412
[  110.012420]  [<ffffffff807e2281>] ? x86_64_start_reservations+0x91/0xb5
[  110.012424]  [<ffffffff807e2394>] ? x86_64_start_kernel+0xef/0x11b

Based on a report by Davide Pesavento <davidepesa@gmail.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Hugo Mildenberger <hugo.mildenberger@namir.de>
Tested-by: Bing Zhao <bzhao@marvell.com>
2009-05-05 13:26:08 -07:00
David S. Miller
356d6c2d55 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-05-05 12:00:53 -07:00
David S. Miller
86b698b8cb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-05-05 11:56:07 -07:00
Pablo Neira Ayuso
fecc1133b6 netfilter: ctnetlink: fix wrong message type in user updates
This patch fixes the wrong message type that are triggered by
user updates, the following commands:

(term1)# conntrack -I -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state LISTEN
(term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_SENT
(term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_RECV

only trigger event message of type NEW, when only the first is NEW
while others should be UPDATE.

(term2)# conntrack -E
    [NEW] tcp      6 10 LISTEN src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
    [NEW] tcp      6 10 SYN_SENT src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
    [NEW] tcp      6 10 SYN_RECV src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0

This patch also removes IPCT_REFRESH from the bitmask since it is
not of any use.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-05 17:48:26 +02:00
Pablo Neira Ayuso
280f37afa2 netfilter: xt_cluster: fix use of cluster match with 32 nodes
This patch fixes a problem when you use 32 nodes in the cluster
match:

% iptables -I PREROUTING -t mangle -i eth0 -m cluster \
  --cluster-total-nodes  32  --cluster-local-node  32 \
  --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
iptables: Invalid argument. Run `dmesg' for more information.
% dmesg | tail -1
xt_cluster: this node mask cannot be higher than the total number of nodes

The problem is related to this checking:

if (info->node_mask >= (1 << info->total_nodes)) {
	printk(KERN_ERR "xt_cluster: this node mask cannot be "
			"higher than the total number of nodes\n");
	return false;
}

(1 << 32) is 1. Thus, the checking fails.

BTW, I said this before but I insist: I have only tested the cluster
match with 2 nodes getting ~45% extra performance in an active-active setup.
The maximum limit of 32 nodes is still completely arbitrary. I'd really
appreciate if people that have more nodes in their setups let me know.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-05 17:46:07 +02:00
Linus Torvalds
80445de577 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  e1000: fix virtualization bug
  bonding: fix alb mode locking regression
  Bluetooth: Fix issue with sysfs handling for connections
  usbnet: CDC EEM support (v5)
  tcp: Fix tcp_prequeue() to get correct rto_min value
  ehea: fix invalid pointer access
  ne2k-pci: Do not register device until initialized.
  Subject: [PATCH] br2684: restore net_dev initialization
  net: Only store high 16 bits of kernel generated filter priorities
  virtio_net: Fix function name typo
  virtio_net: Cleanup command queue scatterlist usage
  bonding: correct the cleanup in bond_create()
  virtio: add missing include to virtio_net.h
  smsc95xx: add support for LAN9512 and LAN9514
  smsc95xx: configure LED outputs
  netconsole: take care of NETDEV_UNREGISTER event
  xt_socket: checks for the state of nf_conntrack
  bonding: bond_slave_info_query() fix
  cxgb3: fixing gcc 4.4 compiler warning: suggest parentheses around operand of ‘!’
  netfilter: use likely() in xt_info_rdlock_bh()
  ...
2009-05-05 08:26:10 -07:00
Christoph Paasch
b98b4947cb netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE
As packets ending with NEXTHDR_NONE don't have a last extension header,
the check for the length needs to be after the check for NEXTHDR_NONE.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-05 15:32:16 +02:00
Marcel Holtmann
a67e899cf3 Bluetooth: Fix issue with sysfs handling for connections
Due to a semantic changes in flush_workqueue() the current approach of
synchronizing the sysfs handling for connections doesn't work anymore. The
whole approach is actually fully broken and based on assumptions that are
no longer valid.

With the introduction of Simple Pairing support, the creation of low-level
ACL links got changed. This change invalidates the reason why in the past
two independent work queues have been used for adding/removing sysfs
devices. The adding of the actual sysfs device is now postponed until the
host controller successfully assigns an unique handle to that link. So
the real synchronization happens inside the controller and not the host.

The only left-over problem is that some internals of the sysfs device
handling are not initialized ahead of time. This leaves potential access
to invalid data and can cause various NULL pointer dereferences. To fix
this a new function makes sure that all sysfs details are initialized
when an connection attempt is made. The actual sysfs device is only
registered when the connection has been successfully established. To
avoid a race condition with the registration, the check if a device is
registered has been moved into the removal work.

As an extra protection two flush_work() calls are left in place to
make sure a previous add/del work has been completed first.

Based on a report by Marc Pignat <marc.pignat@hevs.ch>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Roger Quadros <ext-roger.quadros@nokia.com>
Tested-by: Marc Pignat <marc.pignat@hevs.ch>
2009-05-04 14:29:02 -07:00
Jiri Slaby
6909268dc9 mac80211: pid, fix memory corruption
pid doesn't count with some band having more bitrates than the one
associated the first time.
Fix that by counting the maximal available bitrate count and allocate
big enough space.

Secondly, fix touching uninitialized memory which causes panics.
Index sucked from this random memory points to the hell.
The fix is to sort the rates on each band change.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04 16:22:16 -04:00
Jiri Slaby
8e53217527 mac80211: minstrel, fix memory corruption
minstrel doesn't count max rate count in fact, since it doesn't use
a loop variable `i' and hence allocs space only for bitrates found in
the first band.

Fix it by involving the `i' as an index so that it traverses all the
bands now and finds the real max bitrate count.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-04 16:22:15 -04:00