linux

Author	SHA1	Message	Date
Herbert Xu	d26f398400	[IPSEC]: Make x->lastused an unsigned long Currently x->lastused is u64 which means that it cannot be read/written atomically on all architectures. David Miller observed that the value stored in it is only an unsigned long which is always atomic. So based on his suggestion this patch changes the internal representation from u64 to unsigned long while the user-interface still refers to it as u64. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:52 -08:00
Herbert Xu	0ebea8ef35	[IPSEC]: Move state lock into x->type->input This patch releases the lock on the state before calling x->type->input. It also adds the lock to the spots where they're currently needed. Most of those places (all except mip6) are expected to disappear with async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:52 -08:00
Herbert Xu	668dc8af31	[IPSEC]: Move integrity stat collection into xfrm_input Similar to the moving out of the replay processing on the output, this patch moves the integrity stat collectin from x->type->input into xfrm_input. This would eventually allow transforms such as AH/ESP to be lockless. The error value EBADMSG (currently unused in the crypto layer) is used to indicate a failed integrity check. In future this error can be directly returned by the crypto layer once we switch to aead algorithms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:51 -08:00
Herbert Xu	716062fd4c	[IPSEC]: Merge most of the input path As part of the work on asynchronous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common input code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:50 -08:00
Herbert Xu	862b82c6f9	[IPSEC]: Merge most of the output path As part of the work on asynchrnous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common output code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:48 -08:00
Herbert Xu	ef76bc23ef	[IPV6]: Add ip6_local_out Most callers of the LOCAL_OUT chain will set the IP packet length before doing so. They also share the same output function dst_output. This patch creates a new function called ip6_local_out which does all of that and converts the appropriate users over to it. Apart from removing duplicate code, it will also help in merging the IPsec output path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:47 -08:00
Herbert Xu	227620e295	[IPSEC]: Separate inner/outer mode processing on input With inter-family transforms the inner mode differs from the outer mode. Attempting to handle both sides from the same function means that it needs to handle both IPv4 and IPv6 which creates duplication and confusion. This patch separates the two parts on the input path so that each function deals with one family only. In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut moves the pertinent fields from the IPv4/IPv6 IP headers into a neutral format stored in skb->cb. This is then used by the inner mode input functions to modify the inner IP header. In this way the input function no longer has to know about the outer address family. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:46 -08:00
Herbert Xu	36cf9acf93	[IPSEC]: Separate inner/outer mode processing on output With inter-family transforms the inner mode differs from the outer mode. Attempting to handle both sides from the same function means that it needs to handle both IPv4 and IPv6 which creates duplication and confusion. This patch separates the two parts on the output path so that each function deals with one family only. In particular, the functions xfrm4_extract_output/xfrm6_extract_output moves the pertinent fields from the IPv4/IPv6 IP headers into a neutral format stored in skb->cb. This is then used by the outer mode output functions to write the outer IP header. In this way the output function no longer has to know about the inner address family. Since the extract functions are only called by tunnel modes (the only modes that can support inter-family transforms), I've also moved the xfrm*_tunnel_check_size calls into them. This allows the correct ICMP message to be sent as opposed to now where you might call icmp_send with an IPv6 packet and vice versa. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:45 -08:00
Herbert Xu	29bb43b4ec	[INET]: Give outer DSCP directly to ip*_copy_dscp This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so that they directly take the outer DSCP rather than the outer IP header. This will help us to unify the code for inter-family tunnels. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:45 -08:00
Herbert Xu	a2deb6d26f	[IPSEC]: Move x->outer_mode->output out of locked section RO mode is the only one that requires a locked output function. So it's easier to move the lock into that function rather than requiring everyone else to run under the lock. In particular, this allows us to move the size check into the output function without causing a potential dead-lock should the ICMP error somehow hit the same SA on transmission. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:44 -08:00
Herbert Xu	e40b328615	[IPSEC]: Forbid BEET + ipcomp for now While BEET can theoretically work with IPComp the current code can't do that because it tries to construct a BEET mode tunnel type which doesn't (and cannot) exist. In fact as it is it won't even attach a tunnel object at all for BEET which is bogus. To support this fully we'd also need to change the policy checks on input to recognise a plain tunnel as a legal variant of an optional BEET transform. This patch simply fails such constructions for now. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:43 -08:00
Herbert Xu	25ee3286dc	[IPSEC]: Merge common code into xfrm_bundle_create Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are common. This patch extracts that logic and puts it into xfrm_bundle_create. The rest of it are then accessed through afinfo. As a result this fixes the problem with inter-family transforms where we treat every xfrm dst in the bundle as if it belongs to the top family. This patch also fixes a long-standing error-path bug where we may free the xfrm states twice. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:43 -08:00
Herbert Xu	66cdb3ca27	[IPSEC]: Move flow construction into xfrm_dst_lookup This patch moves the flow construction from the callers of xfrm_dst_lookup into that function. It also changes xfrm_dst_lookup so that it takes an xfrm state as its argument instead of explicit addresses. This removes any address-specific logic from the callers of xfrm_dst_lookup which is needed to correctly support inter-family transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:42 -08:00
Herbert Xu	f04e7e8d7f	[IPSEC]: Replace x->type->{local,remote}_addr with flags The functions local_addr and remote_addr are more than what they're needed for. The same thing can be done easily with flags on the type object. This patch does that and simplifies the wrapper functions in xfrm6_policy accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:41 -08:00
Herbert Xu	fff6938880	[IPSEC]: Make sure idev is consistent with dev in xfrm_dst Previously we took the device from the bottom route and idev from the top route. This is bad because idev may well point to a different device. This patch changes it so that we get the idev from the device directly. It also makes it an error if either dev or idev is NULL. This is consistent with the rest of the routing code which also treats these cases as errors. I've removed the err initialisation in xfrm6_policy.c because it achieves no purpose and hid a bug when an initial version of this patch neglected to set err to -ENODEV (fortunately the IPv4 version warned about it). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:40 -08:00
Herbert Xu	45ff5a3f9a	[IPSEC]: Set dst->input to dst_discard The input function should never be invoked on IPsec dst objects. This is because we don't apply IPsec on input until after we've made the routing decision. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:40 -08:00
Herbert Xu	8ce68ceb55	[IPSEC]: Only set neighbour on top xfrm dst The neighbour field is only used by dst_confirm which only ever happens on the top-most xfrm dst. So it's a waste to duplicate for every other xfrm dst. This patch moves its setting out of the loop so that only the top one gets set. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:39 -08:00
Herbert Xu	352e512c32	[NET]: Eliminate duplicate copies of dst_discard We have a number of copies of dst_discard scattered around the place which all do the same thing, namely free a packet on the input or output paths. This patch deletes all of them except dst_discard and points all the users to it. The only non-trivial bit is decnet where it returns an error. However, conceptually this is identical to the blackhole functions used in IPv4 and IPv6 which do not return errors. So they should either all return errors or all return zero. For now I've stuck with the majority and picked zero as the return value. It doesn't really matter in practice since few if any driver would react differently depending on a zero return value or NET_RX_DROP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:37 -08:00
Herbert Xu	b4ce92775c	[IPV6]: Move nfheader_len into rt6_info The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. It also reorders the fields in rt6_info to minimize holes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:37 -08:00
Herbert Xu	0148894223	[IPV6]: Only set nfheader_len for top xfrm dst We only need to set nfheader_len in the top xfrm dst. This is because we only ever read the nfheader_len from the top xfrm dst. It is also easier to count nfheader_len as part of header_len which then lets us remove the ugly wrapper functions for incrementing and decrementing header lengths in xfrm6_policy.c. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Wang Chen	a92aa318b4	[IPV6]: Add raw6 drops counter. Add raw drops counter for IPv6 in /proc/net/raw6 . Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:34 -08:00
Jens Axboe	a0974dd3da	[TCP] splice: add tcp_splice_read() to IPV6 Thanks to YOSHIFUJI Hideaki for the hint! Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:32 -08:00
Rolf Manderscheid	a9e527e3f9	IPoIB: improve IPv4/IPv6 to IB mcast mapping functions An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:37 -08:00
Herbert Xu	f945fa7ad9	[INET]: Fix truesize setting in ip_append_data As it is ip_append_data only counts page fragments to the skb that allocated it. As such it means that the first skb gets hit with a 4K charge even though it might have only used a fraction of it while all subsequent skb's that use the same page gets away with no charge at all. This bug was exposed by the UDP accounting patch. [ The wmem_alloc bumping needs to be moved with the truesize, noticed by Takahiro Yasui. -DaveM ] Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-23 03:11:43 -08:00
Wang Chen	fa95c28322	[IPV6]: RFC 2011 compatibility broken The snmp6 entry name was changed, and it broke compatibility to RFC 2011. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-21 03:05:43 -08:00
Wang Chen	c964ff4ffb	[IPV6]: ICMP6_MIB_OUTMSGS increment duplicated icmpv6_send() calls ip6_push_pending_frames() indirectly. Both ip6_push_pending_frames() and icmpv6_send() increment counter ICMP6_MIB_OUTMSGS. This patch remove the increment from icmpv6_send. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-21 03:05:20 -08:00
YOSHIFUJI Hideaki	398bcbebb6	[IPV6] ROUTE: Make sending algorithm more friendly with RFC 4861. We omit (or delay) sending NSes for known-to-unreachable routers (in NUD_FAILED state) according to RFC 4191 (Default Router Preferences and More-Specific Routes). But this is not fully compatible with RFC 4861 (Neighbor Discovery Protocol for IPv6), which does not remember unreachability of neighbors. So, let's avoid mixing sending algorithm of RFC 4191 and that of RFC 4861, and make the algorithm more friendly with RFC 4861 if RFC 4191 is disabled. Issue was found by IPv6 Ready Logo Core Self_Test 1.5.0b2 (by TAHI Project), and has been tracked down by Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:40 -08:00
Pavel Emelyanov	b3652b2dc5	[IPV6]: Mischecked tw match in __inet6_check_established. When looking for a conflicting connection the !sk->sk_bound_dev_if check is performed only for live sockets, but not for timewait-ed. This is not the case for ipv4, for __inet6_lookup_established in both ipv4 and ipv6 and for other places that check for tw-s. Was this missed accidentally? If so, then this patch fixes it and besides makes use if the dif variable declared in the function. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:36 -08:00
Yasuyuki Kozakai	8f41f01786	[NETFILTER]: ip6t_eui64: Fixes calculation of Universal/Local bit RFC2464 says that the next to lowerst order bit of the first octet of the Interface Identifier is formed by complementing the Universal/Local bit of the EUI-64. But ip6t_eui64 uses OR not XOR. Thanks Peter Ivancik for reporing this bug and posting a patch for it. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-10 22:40:39 -08:00
Brian Haley	1ac4f00885	[IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect() Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-08 23:52:21 -08:00
Joe Perches	bea8519547	[IPV6]: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:01:35 -08:00
Wei Yongjun	cf6fc4a924	[IPV6]: Fix the return value of ipv6_getsockopt If CONFIG_NETFILTER if not selected when compile the kernel source code, ipv6_getsockopt will returen an EINVAL error if optname is not supported by the kernel. But if CONFIG_NETFILTER is selected, ENOPROTOOPT error will be return. This patch fix to always return ENOPROTOOPT error if optname argument of ipv6_getsockopt is not supported by the kernel. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-16 13:39:57 -08:00
Thomas Graf	95a02cfd4d	[IPv6] ESP: Discard dummy packets introduced in rfc4303 RFC4303 introduces dummy packets with a nexthdr value of 59 to implement traffic confidentiality. Such packets need to be dropped silently and the payload may not be attempted to be parsed as it consists of random chunk. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-11 02:45:27 -08:00
YOSHIFUJI Hideaki	1df2e44560	[IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx. RTCF_xxx flags, defined in include/linux/in_route.h) are available for IPv4 route (rtable) entries only. Use RTF_xxx flags instead, defined in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-11 02:45:24 -08:00
Mitsuru Chinen	ca46f9c834	[IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable network IPv6 stack doesn't increment OutNoRoutes counter when IP datagrams is being discarded because no route could be found to transmit them to their destination. IPv6 stack should increment the counter. Incidentally, IPv4 stack increments that counter in such situation. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-07 01:06:30 -08:00
Evgeniy Polyakov	d31c7b8fa3	[IPV6]: Restore IPv6 when MTU is big enough Avaid provided test application, so bug got fixed. IPv6 addrconf removes ipv6 inner device from netdev each time cmu changes and new value is less than IPV6_MIN_MTU (1280 bytes). When mtu is changed and new value is greater than IPV6_MIN_MTU, it does not add ipv6 addresses and inner device bac. This patch fixes that. Tested with Avaid's application, which works ok now. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-11-30 23:36:08 +11:00
YOSHIFUJI Hideaki	77adefdc98	[IPV6] TCPMD5: Fix deleting key operation. Due to the bug, refcnt for md5sig pool was leaked when an user try to delete a key if we have more than one key. In addition to the leakage, we returned incorrect return result value for userspace. This fix should close Bug #9418, reported by <ming-baini@163.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-20 17:31:23 -08:00
YOSHIFUJI Hideaki	aacbe8c880	[IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-20 17:30:56 -08:00
Joe Perches	3b6d821c4f	[IPV6]: Add missing "space" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-19 23:47:25 -08:00
Pierre Ynard	dbb2ed2485	[IPV6]: Add ifindex field to ND user option messages. Userland neighbor discovery options are typically heavily involved with the interface on which thay are received: add a missing ifindex field to the original struct. Thanks to R�mi Denis-Courmont. Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-12 17:58:35 -08:00
Denis V. Lunev	2994c63863	[INET]: Small possible memory leak in FIB rules This patch fixes a small memory leak. Default fib rules can be deleted by the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by ip rule flush Such a rule will not be freed as the ref-counter has 2 on start and becomes clearly unreachable after removal. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 22:12:03 -08:00
Pavel Emelyanov	03f49f3457	[NET]: Make helper to get dst entry and "use" it There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:28:34 -08:00
Eric Dumazet	230140cffa	[INET]: Remove per bucket rwlock in tcp/dccp ehash table. As done two years ago on IP route cache table (commit `22c047ccbc`) , we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle performance differences. (we hit a different cache line for the rwlock, but then the bucket cache line have a better sharing factor among cpus, since we dirty it less often). For netstat or ss commands that want a full scan of hash table, we perform fewer memory accesses. Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings. This patch provides some locking abstraction that may ease a future work using a different model for TCP/DCCP table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:11 -08:00
Herbert Xu	4999f3621f	[IPSEC]: Fix crypto_alloc_comp error checking The function crypto_alloc_comp returns an errno instead of NULL to indicate error. So it needs to be tested with IS_ERR. This is based on a patch by Vicen� Beltran Querol. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:03 -08:00
Alexey Dobriyan	33120b30cc	[IPV6]: Convert /proc/net/ipv6_route to seq_file interface This removes last proc_net_create() user. Kudos to Benjamin Thery and Stephen Hemminger for comments on previous version. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:18 -08:00
Eric Dumazet	c5a432f1a1	[IPV6]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcpv6,udpv6,udplitev6,rawv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:59 -08:00
Eric Dumazet	286ab3d460	[NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way. "struct proto" currently uses an array stats[NR_CPUS] to track change on 'inuse' sockets per protocol. If NR_CPUS is big, this means we use a big memory area for this. Moreover, all this memory area is located on a single node on NUMA machines, increasing memory pressure on the boot node. In this patch, I tried to : - Keep a fast !CONFIG_SMP implementation - Keep a fast CONFIG_SMP implementation for often used protocols (tcp,udp,raw,...) - Introduce a NUMA efficient implementation Some helper macros are defined in include/net/sock.h These macros take into account CONFIG_SMP If a "struct proto" is declared without using DEFINE_PROTO_INUSE / REF_PROTO_INUSE macros, it will automatically use a default implementation, using a dynamically allocated percpu zone. This default implementation will be NUMA efficient, but might use 32/64 bytes per possible cpu because of current alloc_percpu() implementation. However it still should be better than previous implementation based on stats[NR_CPUS] field. When a "struct proto" is changed to use the new macros, we use a single static "int" percpu variable, lowering the memory and cpu costs, still preserving NUMA efficiency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:57 -08:00
Mitsuru Chinen	7a0ff716c2	[IPv6] SNMP: Restore Udp6InErrors incrementation As the checksum verification is postponed till user calls recv or poll, the inrementation of Udp6InErrors counter should be also postponed. Currently, it is postponed in non-blocking operation case. However it should be postponed in all case like the IPv4 code. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:54 -08:00
Pavel Emelyanov	bf138862b1	[IPV6]: Consolidate the ip cork destruction in ip6_output.c The ip6_push_pending_frames and ip6_flush_pending_frames do the same things to flush the sock's cork. Move this into a separate function and save ~100 bytes from the .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:26 -08:00

1 2 3 4 5 ...

1035 Commits