1
Commit Graph

192491 Commits

Author SHA1 Message Date
Scott Feldman
57b610805c net: Add netlink support for virtual port management (was iovnl)
Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
(added to end of IFLA_cmd list).  These are both nested atrtibutes
using this layout:

              [IFLA_NUM_VF]
              [IFLA_VF_PORTS]
                      [IFLA_VF_PORT]
                              [IFLA_PORT_*], ...
                      [IFLA_VF_PORT]
                              [IFLA_PORT_*], ...
                      ...
              [IFLA_PORT_SELF]
                      [IFLA_PORT_*], ...

These attributes are design to be set and get symmetrically.  VF_PORTS
is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
device.  PORT_SELF is for the PF of the SR-IOV device, in case it wants
to also have a port-profile, or for the case where the VF==PF, like in
enic patch 2/2 of this patch set.

A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev.  A port-profile is an identifier known to the switch.  How port-
profiles are installed on the switch or how available port-profiles are
made know to the host is outside the scope of this patch.

There are two types of port-profiles specs in the netlink msg.  The first spec
is for 802.1Qbg (pre-)standard, VDP protocol.  The second spec is for devices
that run a similar protocol as VDP but in firmware, thus hiding the protocol
details.  In either case, the specs have much in common and makes sense to
define the netlink msg as the union of the two specs.  For example, both specs
have a notition of associating/deassociating a port-profile.  And both specs
require some information from the hypervisor manager, such as client port
instance ID.

The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile.  What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:49:55 -07:00
Joe Perches
ee289b6440 drivers/net: remove useless semicolons
switch and while statements don't need semicolons at end of statement

[ Fixup minor conflicts with recent wimax merge... -DaveM ]

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:47:34 -07:00
Ursula Braun
5113fec098 qeth: support the new OSA CHPID types OSX and OSM
The qeth driver is enabled to support the new OSA CHPID types OSX
and OSM.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:42:55 -07:00
Julia Lawall
ae57b20a0a drivers/s390/net: Drop memory allocation cast
Drop cast on the result of kmalloc and similar functions.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
@@

- (T *)
  (\(kmalloc\|kzalloc\|kcalloc\|kmem_cache_alloc\|kmem_cache_zalloc\|
   kmem_cache_alloc_node\|kmalloc_node\|kzalloc_node\)(...))
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:42:54 -07:00
Tadashi Abe
95718c1c25 pegasus: fix USB device ID for ETX-US2
USB device ID definition for I-O Data ETX-US2 is wrong.
Correct ID is 0x093a. Here's snippet from /proc/bus/usb/devices;

T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=04bb ProdID=093a Rev= 1.01
S:  Manufacturer=I-O DATA DEVICE,INC.
S:  Product=I-O DATA ETX2-US2
S:  SerialNumber=A26427
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=224mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=pegasus
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=   8 Ivl=125us

This patch enables pegasus driver to work fine with ETX-US2.

Signed-off-by: Tadashi Abe <tabe@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:41:45 -07:00
Wolfgang Grandegger
56e6943b90 can: sja1000 platform data fixes
The member "clock" of struct "sja1000_platform_data" is documented as
"CAN bus oscillator frequency in Hz" but it's actually used as the CAN
clock frequency, which is half of it. To avoid further confusion, this
patch fixes it by renaming the member to "osc_freq". That way, also
non mainline users will notice the change. The platform code for the
relevant boards is updated accordingly. Furthermore, pre-defined
values are now used for the members "ocr" and "cdr".

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:39:48 -07:00
Eric Dumazet
d19d56ddc8 net: Introduce skb_tunnel_rx() helper
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:36:55 -07:00
Eric Dumazet
de213e5eed tcp: tcp_synack_options() fix
Commit 33ad798c92 (tcp: options clean up) introduced a problem
if MD5+SACK+timestamps were used in initial SYN message.

Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
sessions, but since 40 bytes of tcp options space are not enough to
store all the bits needed, we chose to disable timestamps in this case.

We send a SYN-ACK _without_ timestamp option, but socket has timestamps
enabled and all further outgoing messages contain a TS block, all with
the initial timestamp of the remote peer.

Fix is to really disable timestamps option for the whole session.

Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:35:36 -07:00
Stephen Hemminger
eedf042a63 ipv6: fix the bug of address check
The duplicate address check code got broken in the conversion
to hlist (2.6.35).  The earlier patch did not fix the case where
two addresses match same hash value. Use two exit paths,
rather than depending on state of loop variables (from macro).

Based on earlier fix by Shan Wei.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:27:12 -07:00
David S. Miller
820ae8a80e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-05-17 21:09:11 -07:00
Baruch Siach
380fefb2dd dm9000: fix "BUG: spinlock recursion"
dm9000_set_rx_csum and dm9000_hash_table are called from atomic context (in
dm9000_init_dm9000), and from non-atomic context (via ethtool_ops and
net_device_ops respectively). This causes a spinlock recursion BUG. Fix this by
renaming these functions to *_unlocked for the atomic context, and make the
original functions locking wrappers for use in the non-atomic context.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:45:48 -07:00
Julia Lawall
5476b8b225 drivers/net/usb: Use kmemdup
Use kmemdup when some other buffer is immediately copied into the
allocated region.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

-  to = \(kmalloc\|kzalloc\)(size,flag);
+  to = kmemdup(from,size,flag);
   if (to==NULL || ...) S
-  memcpy(to, from, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:48 -07:00
Julia Lawall
99bf236612 drivers/net/usb: Use kmemdup
Use kmemdup when some other buffer is immediately copied into the
allocated region.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

-  to = \(kmalloc\|kzalloc\)(size,flag);
+  to = kmemdup(from,size,flag);
   if (to==NULL || ...) S
-  memcpy(to, from, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:47 -07:00
Julia Lawall
175c044141 drivers/net/usb: Use kmemdup
Use kmemdup when some other buffer is immediately copied into the
allocated region.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

-  to = \(kmalloc\|kzalloc\)(size,flag);
+  to = kmemdup(from,size,flag);
   if (to==NULL || ...) S
-  memcpy(to, from, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:47 -07:00
Anton Vorontsov
08d18f3b62 fsl_pq_mdio: Fix mdiobus allocation handling
The driver could return success code even if mdiobus_alloc() failed.
This patch fixes the issue.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:46 -07:00
Patrick McHardy
a2f7922713 net_sched: sch_hfsc: fix classification loops
When attaching filters to a class pointing to a class higher up in the
hierarchy, classification may enter an endless loop. Currently this is
prevented for filters that are already resolved, but not for filters
resolved at runtime.

Only allow filters to point downwards in the hierarchy, similar to what
CBQ does.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:46 -07:00
Nathan Williams
e1bc7eedba atm: select FW_LOADER in Kconfig for solos-pci
solos-pci uses request_firmware() for firmware upgrades

Signed-off-by: Nathan Williams <nathan@traverse.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:36 -07:00
Florian Fainelli
ce26b4d1d5 r6040: fix link checking with switches
The current link checking logic only works for one port, which is not correct
for swiches were multiple ports can have different link status. As a result
we would only check for link status on port 1 of the switch. Move the calls
to mii_check_media in r6040_timer which will be polling a single PHY chip
correctly and assume link is up for switches.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:36 -07:00
Anton Vorontsov
b14ed884df gianfar: Remove legacy PM callbacks
These callbacks were needed because dev_pm_ops support for OF
platform devices was in the powerpc tree, and the patch that
added dev_pm_ops for gianfar driver was in the netdev tree. Now
that netdev and powerpc trees have merged into Linus' tree, we
can remove the legacy hooks.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
stephen hemminger
f0cd15081a tbf: stop wanton destruction of children (v2)
Several netem users use TBF for rate control. But every time the parameters
of TBF are changed it destroys the child qdisc, requiring reconfigation.
Better to just keep child qdisc and just notify it of changed limit.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
Joe Perches
ccbd6a5a4f net: Remove unnecessary semicolons after switch statements
Also added an explicit break; to avoid
a fallthrough in net/ipv4/tcp_input.c

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
andrew hendry
935e2a26b8 X25: Remove bkl in sockopts
Removes the BKL in x25 setsock and getsockopts.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:28 -07:00
andrew hendry
37cda78741 X25: Move accept approve flag to bitfield
Moves the x25 accept approve flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry
b7792e34cb X25: Move interrupt flag to bitfield
Moves the x25 interrupt flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry
cb863ffd4a X25: Move qbit flag to bitfield
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:26 -07:00
Stanislaw Gruszka
c89af1a308 bnx2x: avoid TX timeout when stopping device
When stop device call netif_carrier_off() just after disabling TX queue to
avoid possibility of netdev watchdog warning and ->ndo_tx_timeout() invocation.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:35:38 -07:00
Michael Chan
a0ba676008 bnx2: Use netif_carrier_off() to prevent timeout.
Based on original patch from Stanislaw Gruszka <sgruszka@redhat.com>.

Using netif_carrier_off() is better than updating all the ->trans_start
on all the tx queues.

netif_carrier_off() needs to be called after bnx2_disable_int_sync()
to guarantee no race conditions with the serdes timers that can
modify the carrier state.

If the chip or phy is reset, carrier will turn back on when we get the
link interrupt.  If there is no reset, we need to turn carrier back on
in bnx2_netif_start().  Again, the phy_lock prevents race conditions with
the serdes timers.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:34:43 -07:00
Michael Chan
a931d29404 bnx2: Update 5709 MIPS firmware and version to 2.0.15.
New firmware fixes a performance regression on small packets.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:33:31 -07:00
Eddie Wai
b98eba5278 bnx2: Fix register printouts during NETEV_WATCHDOG.
Dump the correct MCP registers and add EMAC_RX_STATUS register during
NETDEV_WATCHDOG for debugging.

Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:32:56 -07:00
Amit Kumar Salecha
a7fc948f4d qlcnic: mark device state fail
Device state need to be mark as FAILED, if fail to start firmware.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:56 -07:00
Amit Kumar Salecha
20c67bd40e qlcnic: remove unused register
Removing register defines which are not used by Qlgoic CNA device.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:55 -07:00
Sucheta Chakraborty
78ad389230 qlcnic: fix internal loopback test
Reset/set DEV_UP bit during allocation and deallocation of resources.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:55 -07:00
Amit Kumar Salecha
4d5bdb3848 qlcnic: module param for firmware load option
By default fw is loaded from flash, user can
change this priority using load_fw_file module param.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:54 -07:00
Sucheta Chakraborty
7e382594a2 qlcnic: fix rx bytes statistics
Added lrobytes to it.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:54 -07:00
Sucheta Chakraborty
02f6e46f35 qlcnic: change adapter name display
Append mac address to adapter name.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:53 -07:00
Anirban Chakraborty
aadd8184ae qlcnic: fix memory leaks
Fixes memory leak in error path when memory allocation
for adapter data structures fails.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:30:53 -07:00
Sonic Zhang
0e995cd3d3 netdev: bfin_mac: check for mii_bus platform data
If the platform data for the mii_bus is missing, gracefully error out
rather than deference NULL pointers.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:21:01 -07:00
Mike Frysinger
2bfa0f0c9a netdev: bfin_mac: handle timeouts with the MDIO registers gracefully
Have the low level MDIO functions pass back up timeout information so we
don't waste time polling them multiple times when there is a problem, and
so we don't let higher layers think the device is available when it isn't.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:21:00 -07:00
Sonic Zhang
c0da776bde netdev: bfin_mac: use promiscuous flag for promiscuous mode
Rather than using the Receive All Frames (RAF) bit to enable promiscuous
mode, use the Promiscuous (PR) bit.  This lowers overhead at runtime as
we let the hardware process the packets that should actually be checked.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:21:00 -07:00
Michael Hennerich
53fd3f2829 netdev: bfin_mac: add support for wake-on-lan magic packets
Note that WOL works only in PM Suspend Standby Mode (Sleep Mode).

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:20:59 -07:00
Sonic Zhang
812a9de715 netdev: bfin_mac: clear RXCKS if hardware generated checksum is not enabled
Otherwise we might be get a setting mismatch from a previous module or
bootloader and what the driver currently expects.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:20:59 -07:00
Sonic Zhang
ad2864d887 netdev: bfin_mac: deduce Ethernet FCS from hardware IP payload checksum
IP checksum is based on 16-bit one's complement algorithm, so to deduce a
value from checksum is equal to add its complement.

Unfortunately, the Blackfin on-chip MAC checksum logic only works when the
IP packet has a header length of 20 bytes.  This is true for most IPv4
packets, but not for IPv6 packets or IPv4 packets which use header options.
So only use the hardware checksum when appropriate.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Jon Kowal <jon.kowal@dspecialists.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:20:58 -07:00
Sonic Zhang
f6e1e4f3e5 netdev: bfin_mac: invalid data cache only once for each new rx skb buffer
The skb buffer isn't actually used until we finish transferring and pass
it up to higher layers, so only invalidate the range once before we start
receiving actual data.  This also avoids the problem with data invalidating
on Blackfin systems -- there is no invalidate-only, just invalidate+flush.
So when running in writeback mode, there is the small (but not uncommon)
possibility of the flush overwriting valid DMA-ed data from the cache.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:20:58 -07:00
Peter Meerwald
ec497b32c3 netdev: bfin_mac: handler RX status errors
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:20:50 -07:00
Barry Song
fe92afedee netdev: bfin_mac: add support for IEEE 1588 PTP
Newer on-chip MAC peripherals support IEEE 1588 PTP in the hardware, so
extend the driver to support this functionality.

Signed-off-by: Barry Song <barry.song@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:19:40 -07:00
Eric Dumazet
ab6e3feba1 net: No dst refcounting in ip_queue_xmit()
TCP outgoing packets can avoid two atomic ops, and dirtying
of previously higly contended cache line using new refdst
infrastructure.

Note 1: loopback device excluded because of !IFF_XMIT_DST_RELEASE
Note 2: UDP packets dsts are built before ip_queue_xmit().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:52 -07:00
Eric Dumazet
4a94445c9a net: Use ip_route_input_noref() in input path
Use ip_route_input_noref() in ip fast path, to avoid two atomic ops per
incoming packet.

Note: loopback is excluded from this optimization in ip_rcv_finish()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet
407eadd996 net: implements ip_route_input_noref()
ip_route_input() is the version returning a refcounted dst, while
ip_route_input_noref() returns a non refcounted one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet
7fee226ad2 net: add a noref bit on skb dst
Use low order bit of skb->_skb_dst to tell dst is not refcounted.

Change _skb_dst to _skb_refdst to make sure all uses are catched.

skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.

New skb_dst_set_noref() helper to set an notrefcounted dst on a skb.
(with lockdep check)

skb_dst_drop() drops a reference only if skb dst was refcounted.

skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.

Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().

Use skb_dst_force() in dev_requeue_skb().

Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00
Eric Dumazet
ebda37c27d rps: avoid one atomic in enqueue_to_backlog
If CONFIG_SMP=y, then we own a queue spinlock, we can avoid the atomic
test_and_set_bit() from napi_schedule_prep().

We now have same number of atomic ops per netif_rx() calls than with
pre-RPS kernel.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00