linux/net/ipv4
Eric Dumazet 271b72c7fa udp: RCU handling for Unicast packets.
Goals are:

1) Optimize handling of incoming unicast UDP frames, so that no memory
 writes happen in the fast path.

 Note: Multicasts and broadcasts still need to take a lock,
 because a fully lockless lookup is difficult in this case.

2) No expensive operations in the socket bind/unhash phases:
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in the socket structure. It would increase memory
  needs and, more importantly, force us to use call_rcu(), which has
  the bad property of making socket structures cache cold
  (the RCU grace period between freeing a socket and its potential
   reuse leaves the socket cold in the CPU cache).
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Christoph Lameter:
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because UDP sockets are allocated from a dedicated kmem_cache,
  SLAB_DESTROY_BY_RCU can help here.

Theory of operation:
--------------------

Because the lookup is lock-free (using rcu_read_lock()/rcu_read_unlock()),
readers and writers must take special care.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, and inserted in a different chain (or, in the worst case, in the
same chain) while readers are doing lookups at the same time.

In order to avoid loops, a reader must check that each socket found in a
chain really belongs to the chain the reader was traversing. If it finds a
mismatch, the lookup must start again at the beginning. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we don't want to send the same message several times to the same socket.
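
The restart rule above can be illustrated with a small userspace sketch.
This is not the kernel code: struct fake_sock, chains[], slot_of(),
victim and simulate_writer() are all hypothetical names made up for the
illustration, and the "concurrent" writer is simulated in a single thread
by recycling one object while the reader is standing on it. It only shows
the check "does this object still belong to the chain I am walking?" and
the restart that follows a mismatch.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct fake_sock {
	unsigned int port;	/* lookup key */
	unsigned int slot;	/* chain this object currently belongs to */
	struct fake_sock *next;
};

static struct fake_sock *chains[NSLOTS];

/* Object to recycle during the next lookup (simulated writer), or NULL. */
static struct fake_sock *victim;
static unsigned int victim_new_port;

static unsigned int slot_of(unsigned int port)
{
	return port % NSLOTS;
}

static void chain_insert(struct fake_sock *sk, unsigned int port)
{
	sk->port = port;
	sk->slot = slot_of(port);
	sk->next = chains[sk->slot];
	chains[sk->slot] = sk;
}

static void chain_remove(struct fake_sock *sk)
{
	struct fake_sock **pp = &chains[sk->slot];

	while (*pp != NULL && *pp != sk)
		pp = &(*pp)->next;
	if (*pp == sk)
		*pp = sk->next;
}

/* Simulates "freed, reused, inserted in a different chain" exactly once. */
static void simulate_writer(struct fake_sock *sk)
{
	if (sk == victim) {
		victim = NULL;
		chain_remove(sk);
		chain_insert(sk, victim_new_port);
	}
}

/* Walk one chain; restart from the head if an object changed chains. */
static struct fake_sock *chain_lookup(unsigned int port, unsigned int *restarts)
{
	unsigned int slot = slot_of(port);
	struct fake_sock *sk;

	assert(restarts != NULL);
begin:
	for (sk = chains[slot]; sk != NULL; sk = sk->next) {
		simulate_writer(sk);	/* object may be recycled under us */
		if (sk->slot != slot) {	/* no longer on our chain */
			(*restarts)++;
			goto begin;
		}
		if (sk->port == port)
			return sk;
	}
	return NULL;
}
```

In this sketch, if the reader skipped the sk->slot check it would follow
the recycled object's next pointer into the other chain and could miss an
entry that is still present in its own chain. The same restart is what
makes the scheme unsuitable for multicast delivery, where objects already
visited before the restart would be visited again.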

We use RCU only for the fast path.
Thus, /proc/net/udp still takes spinlocks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 02:11:14 -07:00
netfilter netfilter: snmp nat leaks memory in case of failure 2008-10-20 03:33:24 -07:00
af_inet.c ipv4: Allow binding to non-local addresses if IP_TRANSPARENT is set 2008-10-01 07:31:24 -07:00
ah4.c
arp.c netfilter: replace old NF_ARP calls with NFPROTO_ARP 2008-10-20 03:34:51 -07:00
cipso_ipv4.c net: don't use INIT_RCU_HEAD 2008-10-28 13:25:09 -07:00
datagram.c mib: add net to IP_INC_STATS_BH 2008-07-16 20:20:11 -07:00
devinet.c net: don't use INIT_RCU_HEAD 2008-10-28 13:25:09 -07:00
esp4.c ipsec: Interfamily IPSec BEET 2008-08-06 02:39:30 -07:00
fib_frontend.c netns: add namespace parameter to rt_cache_flush 2008-07-05 19:00:44 -07:00
fib_hash.c netns: add namespace parameter to rt_cache_flush 2008-07-05 19:00:44 -07:00
fib_lookup.h
fib_rules.c net: add fib_rules_ops to flush_cache method 2008-07-05 19:01:28 -07:00
fib_semantics.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2008-06-13 20:52:39 -07:00
fib_trie.c proc: consolidate per-net single-release callers 2008-07-18 04:07:44 -07:00
icmp.c net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00
igmp.c net: Rationalise email address: Network Specific Parts 2008-10-13 19:01:08 -07:00
inet_connection_sock.c inet: cleanup of local_port_range 2008-10-08 14:18:04 -07:00
inet_diag.c net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) 2008-10-16 15:24:51 -07:00
inet_fragment.c net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
inet_hashtables.c net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
inet_lro.c net/inet_lro: remove setting skb->ip_summed when not LRO-able 2008-06-27 20:09:00 -07:00
inet_timewait_sock.c ipv4: Implement IP_TRANSPARENT socket option 2008-10-01 07:30:02 -07:00
inetpeer.c net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ip_forward.c net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00
ip_fragment.c net: Rationalise email address: Network Specific Parts 2008-10-13 19:01:08 -07:00
ip_gre.c gre: Initialise rtnl_link tunnel parameters properly 2008-10-11 12:20:15 -07:00
ip_input.c net: Rationalise email address: Network Specific Parts 2008-10-13 19:01:08 -07:00
ip_options.c cipso: Add support for native local labeling and fixup mapping names 2008-10-10 10:16:34 -04:00
ip_output.c ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible 2008-10-01 07:44:42 -07:00
ip_sockglue.c ipv4: Implement IP_TRANSPARENT socket option 2008-10-01 07:30:02 -07:00
ipcomp.c ipcomp: Fix warnings after ipcomp consolidation. 2008-07-27 03:59:24 -07:00
ipconfig.c netns: Use net_eq() to compare net-namespaces for optimization. 2008-07-19 22:34:43 -07:00
ipip.c net: Rationalise email address: Network Specific Parts 2008-10-13 19:01:08 -07:00
ipmr.c net: Rationalise email address: Network Specific Parts 2008-10-13 19:01:08 -07:00
Kconfig IPVS: Move IPVS to net/netfilter/ipvs 2008-10-07 08:38:24 +11:00
Makefile IPVS: Move IPVS to net/netfilter/ipvs 2008-10-07 08:38:24 +11:00
netfilter.c netfilter: netns: fix {ip,6}_route_me_harder() in netns 2008-10-08 11:35:03 +02:00
proc.c tcp: MD5: Use MIB counter instead of warning for MD5 mismatch. 2008-07-30 03:27:25 -07:00
protocol.c net: remove CVS keywords 2008-06-11 21:00:38 -07:00
raw.c mib: add net to IP_INC_STATS 2008-07-16 20:19:49 -07:00
route.c net: don't use INIT_RCU_HEAD 2008-10-28 13:25:09 -07:00
syncookies.c tcp: Port redirection support for TCP 2008-10-01 07:46:49 -07:00
sysctl_net_ipv4.c net: implement emergency route cache rebulds when gc_elasticity is exceeded 2008-10-27 17:06:14 -07:00
tcp_bic.c
tcp_cong.c net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) 2008-10-16 15:24:51 -07:00
tcp_cubic.c
tcp_diag.c net: remove CVS keywords 2008-06-11 21:00:38 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd. 2008-10-07 15:58:17 -07:00
tcp_illinois.c
tcp_input.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
tcp_ipv4.c tcpv[46]: fix md5 pseudoheader address field ordering 2008-10-09 14:37:47 -07:00
tcp_lp.c
tcp_minisocks.c tcp: kill pointless urg_mode 2008-10-07 14:43:06 -07:00
tcp_output.c syncookies: fix inclusion of tcp options in syn-ack 2008-10-26 23:10:12 -07:00
tcp_probe.c tcp: correct kcalloc usage 2008-07-10 16:51:32 -07:00
tcp_scalable.c
tcp_timer.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: kill pointless urg_mode 2008-10-07 14:43:06 -07:00
tunnel4.c
udp_impl.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
udp.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
udplite.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
xfrm4_input.c
xfrm4_mode_beet.c ipsec: Interfamily IPSec BEET 2008-08-06 02:39:30 -07:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c xfrm: fix fragmentation for ipv4 xfrm tunnel 2008-06-17 16:38:23 -07:00
xfrm4_output.c
xfrm4_policy.c
xfrm4_state.c
xfrm4_tunnel.c