linux

Author	SHA1	Message	Date
Alexey Dobriyan	df200969b1	[NETFILTER]: netns: put table module on netns stop When number of entries exceeds number of initial entries, foo-tables code will pin table module. But during table unregister on netns stop, that additional pin was forgotten. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:41 -08:00
Alexey Dobriyan	9ea0cb2601	[NETFILTER]: arp_tables: per-netns arp_tables FILTER Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:41 -08:00
Alexey Dobriyan	79df341ab6	[NETFILTER]: arp_tables: netns preparation * Propagate netns from userspace. * arpt_register_table() registers table in supplied netns. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:40 -08:00
Alexey Dobriyan	9335f047fe	[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW Now, iptables shows and configures a different set of rules in each netns. Filtering decisions are still made by consulting only init_net's set. Changes are identical except for naming, so no splitting. P.S.: one needs to remove init_net checks in nf_sockopt.c and inet_create() to see the effect. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:38 -08:00
Alexey Dobriyan	34bd137ba7	[NETFILTER]: ip_tables: propagate netns from userspace .. all the way down to table searching functions. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:37 -08:00
Alexey Dobriyan	44d34e721e	[NETFILTER]: x_tables: return new table from {arp,ip,ip6}t_register_table() Typical table module registers xt_table structure (i.e. packet_filter) and link it to list during it. We can't use one template for it because corresponding list_head will become corrupted. We also can't unregister with template because it wasn't changed at all and thus doesn't know in which list it is. So, we duplicate template at the very first step of table registration. Table modules will save it for use during unregistration time and actual filtering. Do it at once to not screw bisection. P.S.: renaming i.e. packet_filter => __packet_filter is temporary until full netnsization of table modules is done. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:36 -08:00
Alexey Dobriyan	8d87005207	[NETFILTER]: x_tables: per-netns xt_tables In fact all we want is a per-netns set of rules, however doing that will unnecessarily complicate routines such as ipt_hook()/ipt_do_table, so make the full xt_table array per-netns. Every user is stubbed with init_net for a while. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:35 -08:00
Alexey Dobriyan	a98da11d88	[NETFILTER]: x_tables: change xt_table_register() return value convention Switch from 0/-E to ptr/PTR_ERR convention. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:35 -08:00
Jan Engelhardt	abfdf1c489	[NETFILTER]: ebtables: remove casts, use consts Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:33 -08:00
Patrick McHardy	d44caf88e8	[NETFILTER]: nf_nat: remove double bysource hash initialization The hash table is already initialized by nf_ct_alloc_hashtable(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:28 -08:00
Jan Engelhardt	ecb6f85e11	[NETFILTER]: Use const in struct xt_match, xt_target, xt_table Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:28 -08:00
Denis V. Lunev	3046d76746	[RAW]: Wrong content of the /proc/net/raw6. The address of IPv6 raw sockets was shown in the wrong format, from IPv4 ones. The problem has been introduced by the commit `42a73808ed` ("[RAW]: Consolidate proc interface.") Thanks to Adrian Bunk who originally noticed the problem. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:26 -08:00
Denis V. Lunev	8cd850efa4	[RAW]: Cleanup IPv4 raw_seq_show. There is no need to use 128 bytes on the stack at all. Clean the code in the IPv6 style. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:25 -08:00
Denis V. Lunev	377cf82d66	[RAW]: Family check in the /proc/net/raw[6] is extra. Different hashtables are used for IPv6 and IPv4 raw sockets, so no need to check the socket family in the iterator over hashtables. Clean this out. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:24 -08:00
Herbert Xu	b1641064a3	[IPCOMP]: Fix reception of incompressible packets I made a silly typo by entering IPPROTO_IP (== 0) instead of IPPROTO_IPIP (== 4). This broke the reception of incompressible packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:24 -08:00
Eric Dumazet	e242297055	[NET]: should explicitly initialize atomic_t field in struct dst_ops All but one struct dst_ops static initializations miss explicit initialization of entries field. As this field is atomic_t, we should use ATOMIC_INIT(0), and not rely on atomic_t implementation. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:23 -08:00
Ilpo Järvinen	ad1984e844	[TCP]: NewReno must count every skb while marking losses NewReno should add cnt per skb (as with FACK) instead of depending on SACKED_ACKED bits which won't be set with it at all. Effectively, NewReno should always exit after the first iteration anyway (or immediately if there's already head in lost_out). This was fixed earlier in net-2.6.25 but got reverted among other stuff and I didn't notice that this is still necessary (actually wasn't even considering this case while trying to figure out the reports because I lived with different kind of code than it in reality was). This should solve the WARN_ONs in TCP code that as a result of this triggered multiple times in every place we check for this invariant. Special thanks to Dave Young <hidave.darkstar@gmail.com> and Krishna Kumar2 <krkumar2@in.ibm.com> for trying with my debug patches. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Krishna Kumar2 <krkumar2@in.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:22 -08:00
Eric Dumazet	533cb5b0a6	[XFRM]: constify 'struct xfrm_type' Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:20 -08:00
Laszlo Attila Toth	4a19ec5800	[NET]: Introducing socket mark socket option. A userspace program may wish to set the mark for each packet it sends without using the netfilter MARK target. Changing the mark can be used for mark-based routing without netfilter or for packet filtering. It requires the CAP_NET_ADMIN capability. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:19 -08:00
Herbert Xu	2614fa59fa	[IPCOMP]: Fetch nexthdr before ipch is destroyed When I moved the nexthdr setting out of IPComp I accidentally moved the reading of ipch->nexthdr after the decompression. Unfortunately this means that we'd be reading from a stale ipch pointer which doesn't work very well. This patch moves the reading up so that we get the correct nexthdr value. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:11 -08:00
Julian Anastasov	936f6f8e1b	[IPV4] fib_trie: apply fixes from fib_hash Update fib_trie with some fib_hash fixes: - check for duplicate alternative routes for prefix+tos+priority when replacing route - properly insert by matching tos together with priority - fix alias walking to use list_for_each_entry_continue for insertion and deletion when fa_head is not NULL - copy state from fa to new_fa on replace (not a problem for now) - additionally, avoid replacement without error if new route is same, as Joonwoo Park suggests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:10 -08:00
Julian Anastasov	c18865f392	[IPV4] fib: fix route replacement, fib_info is shared fib_info can be shared by many route prefixes but we don't want duplicate alternative routes for a prefix+tos+priority. Last change was not correct to check fib_treeref because it accounts usage from other prefixes. Additionally, avoid replacement without error if new route is same, as Joonwoo Park suggests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:10 -08:00
Arnaldo Carvalho de Melo	8cf8e5a67f	[INET_DIAG]: Fix inet_diag_lock_handler error path. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=9825 The inet_diag_lock_handler function uses ERR_PTR to encode errors but its callers were testing against NULL. This only happens when the only inet_diag modular user, DCCP, is not built into the kernel or available as a module. Also there was a problem with not dropping the mutex lock when a handler was not found, also fixed in this patch. This caused an OOPS and ss would then hang on subsequent calls, as &inet_diag_table_mutex was being left locked. Thanks to spike at ml.yaroslavl.ru for reporting it after trying 'ss -d' on a kernel that doesn't have DCCP available. This bug was introduced in cset `d523a328fb` ("Fix inet_diag dead-lock regression"), after 2.6.24-rc3, so just 2.6.24 seems to be affected. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:08 -08:00
Herbert Xu	29ffe1a5c5	[INET]: Prevent out-of-sync truesize on ip_fragment slow path When ip_fragment has to hit the slow path the value of skb->truesize may go out of sync because we would have updated it without changing the packet length. This violates the constraints on truesize. This patch postpones the update of skb->truesize to prevent this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:07 -08:00
Herbert Xu	1a6509d991	[IPSEC]: Add support for combined mode algorithms This patch adds support for combined mode algorithms with GCM being the first algorithm supported. Combined mode algorithms can be added through the xfrm_user interface using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithm is identified by its name and the ICV length. For the purposes of matching algorithms in xfrm_tmpl structures, combined mode algorithms occupy the same name space as encryption algorithms. This is in line with how they are negotiated using IKE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:03 -08:00
Herbert Xu	38320c70d2	[IPSEC]: Use crypto_aead and authenc in ESP This patch converts ESP to use the crypto_aead interface and in particular the authenc algorithm. This lays the foundations for future support of combined mode algorithms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:02 -08:00
Paul Moore	16efd45435	NetLabel: Add secid token support to the NetLabel secattr struct This patch adds support to the NetLabel LSM secattr struct for a secid token and a type field, paving the way for full LSM/SELinux context support and "static" or "fallback" labels. In addition, this patch adds a fair amount of documentation to the core NetLabel structures used as part of the NetLabel kernel API. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-01-30 08:17:19 +11:00
Stephen Hemminger	ac97f75faa	[IPV4] fib_trie: remove unneeded NULL check Since fib_route_seq_show now uses hlist_for_each_entry(), the leaf info can not be NULL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:26 -08:00
Stephen Hemminger	f638a2f057	[IPV4] fib_trie: More whitespace cleanup. Remove extra blank lines. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:25 -08:00
Denis V. Lunev	dde1bc0e6f	[NETNS]: Add namespace for ICMP replying code. All needed API is done, the namespace is available when required from the device on the DST entry from the incoming packet. So, just replace init_net with proper namespace. Other protocols will follow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:13 -08:00
Denis V. Lunev	b5921910a1	[NETNS]: Routing cache virtualization. Basically, this piece looks relatively easy. Namespace is already available on the dst entry via device and the device is safe to dereference. Compare it with one of a searcher and skip entry if appropriate. The only exception is ip_rt_frag_needed. So, add namespace parameter to it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:13 -08:00
Denis V. Lunev	f206351a50	[NETNS]: Add namespace parameter to ip_route_output_key. Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:07 -08:00
Denis V. Lunev	f1b050bf7a	[NETNS]: Add namespace parameter to ip_route_output_flow. Needed to propagate it down to the __ip_route_output_key. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:06 -08:00
Denis V. Lunev	611c183ebc	[NETNS]: Add namespace parameter to __ip_route_output_key. This is only required to propagate it down to the ip_route_output_slow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:05 -08:00
Denis V. Lunev	b40afd0e5c	[NETNS]: Add namespace parameter to ip_route_output_slow. This function needs a net namespace to lookup devices, fib tables, etc. in, so pass it there. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:05 -08:00
Denis V. Lunev	1ab352768f	[NETNS]: Add namespace parameter to ip_dev_find. ip_dev_find() needs a namespace to pass to fib_get_table(), so add an argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:04 -08:00
Denis V. Lunev	010278ec4c	[NETNS]: Add netns parameter to fib_select_default. Currently fib_select_default calls fib_get_table() with the init_net. Prepare it to provide a correct namespace to lookup default route. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:03 -08:00
Denis V. Lunev	64c2d53829	[IPV4]: Consolidate fib_select_default. The difference in the implementation of the fib_select_default when CONFIG_IP_MULTIPLE_TABLES is (not) defined looks negligible. Consolidate it and place into fib_frontend.c. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:02 -08:00
Stephen Hemminger	d5ce8a0e97	[IPV4] fib_trie: avoid rescan on dump This converts dumping (and flushing) of large route tables from O(N^2) to O(N). If the route dump took multiple pages then the dump routine gets called again. The old code kept track of location by counter, the new code instead uses the last key. This is a really big win (0.3 sec vs 12 sec) for big route tables. One side effect is that if the table changes during the dump, then the last key will not be found, and we will return -EBUSY. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:01 -08:00
Stephen Hemminger	9195bef7fb	[IPV4] fib_trie: avoid extra search on delete Get rid of extra search that made route deletion O(n). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:00 -08:00
Stephen Hemminger	a88ee22925	[IPV4] fib_trie: dump table in sorted order It is easier with TRIE to dump the data by traversal rather than iterating over every possible prefix. This saves some time and makes the dump come out in sorted order. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:00 -08:00
Stephen Hemminger	82cfbb0085	[IPV4] fib_trie: iterator recode Remove the complex loop structure of nextleaf() and replace it with a simpler tree walker. This improves the performance and is much cleaner. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:59 -08:00
Stephen Hemminger	64347f786d	[IPV4] fib_trie: dump message multiple part flag Match fib_hash, and set NLM_F_MULTI to handle multiple part messages. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:58 -08:00
Stephen Hemminger	1328042e26	[IPV4] fib_trie: use hash list The code to dump can use the existing hash chain rather than doing repeated lookup. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:58 -08:00
Stephen Hemminger	936722922f	[IPV4] fib_trie: compute size when needed Compute the number of prefixes when needed, rather than doing bookkeeping. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:57 -08:00
Stephen Hemminger	a07f5f508a	[IPV4] fib_trie: style cleanup Style cleanups: * make check_leaf return -1 or plen, rather than by reference * Get rid of #ifdef that is always set * split out embedded function calls in if statements. * checkpatch warnings Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:56 -08:00
Stephen Hemminger	bc3c8c1e02	[IPV4] fib_trie: put leaf nodes in a slab cache This improves locality for operations that touch all the leaves. Save space since these entries don't need to be hardware cache aligned. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:56 -08:00
Eric Dumazet	69a73829db	[DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64 On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to 0x180 bytes by SLAB allocator. We can reduce this to exactly 0x140 bytes, without alignment overhead, and store 12 struct rtable per PAGE instead of 10. rate_tokens is currently defined as an "unsigned long", while its content should not exceed 6*HZ. It can safely be converted to an unsigned int. Moving tclassid right after rate_tokens to fill the 4 bytes hole permits to save 8 bytes on 'struct dst_entry', which finally permits to save 8 bytes on 'struct rtable' Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:41 -08:00
Pavel Emelyanov	81566e8322	[NETNS][FRAGS]: Make the pernet subsystem for fragments. On namespace start we mainly prepare the ctl variables. When the namespace is stopped we have to kill all the fragments that point to this namespace. The inet_frags_exit_net() handles it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:40 -08:00
Pavel Emelyanov	3140c25c82	[NETNS][FRAGS]: Make the LRU list per namespace. The inet_frags.lru_list is used for evicting only, so we have to make it per-namespace, to evict only those fragments whose namespace exceeded its high threshold, but not the whole hash. Besides, this helps to avoid long loops in evictor. The spinlock is not per-namespace because it protects the hash table as well, which is global. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:39 -08:00
Pavel Emelyanov	3b4bc4a2bf	[NETNS][FRAGS]: Isolate the secret interval from namespaces. Since we have one hashtable to lookup the fragment, having different secret_interval-s for hash rebuild doesn't make sense, so move this one to inet_frags. The inet_frags_ctl becomes empty after this, so remove it. The appropriate ctl table is kept read-only in namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:39 -08:00
Pavel Emelyanov	e31e0bdc7e	[NETNS][FRAGS]: Make thresholds work in namespaces. This is the same as with the timeout variable. Currently, after exceeding the high threshold _all_ the fragments are evicted, but it will be fixed in later patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:38 -08:00
Pavel Emelyanov	b2fd5321dd	[NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces. Move it to the netns_frags, adjust the usage and make the appropriate ctl table writable. Now fragments that live in different namespaces can live for different times. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:37 -08:00
Pavel Emelyanov	e4a2d5c2bc	[NETNS][FRAGS]: Duplicate sysctl tables for new namespaces. Each namespace has to have own tables to tune their different parameters, so duplicate the tables and register them. All the tables in sub-namespaces are temporarily made read-only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:37 -08:00
Pavel Emelyanov	6ddc082223	[NETNS][FRAGS]: Make the mem counter per-namespace. This is also simple, but introduces more changes, since then mem counter is altered in more places. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:36 -08:00
Pavel Emelyanov	e5a2bb842c	[NETNS][FRAGS]: Make the nqueues counter per-namespace. This is simple - just move the variable from struct inet_frags to struct netns_frags and adjust the usage appropriately. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:35 -08:00
Pavel Emelyanov	ac18e7509e	[NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces. Since fragment management code is consolidated, we cannot have the pointer from inet_frag_queue to struct net, since we must know what kind of fragment this is. So, I introduce the netns_frags structure. This one is currently empty, but will be eventually filled with per-namespace attributes. Each inet_frag_queue is tagged with this one. The conntrack_reasm is not "netns-izated", so it has one static netns_frags instance to keep working in init namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:34 -08:00
Pavel Emelyanov	8d8354d2fb	[NETNS][FRAGS]: Move ctl tables around. This is a preparation for sysctl netns-ization. Move the ctl tables to the files, where the tuning variables reside. Plus make the helpers to register the tables. This will simplify the later patches and will keep similar things closer to each other. ipv4, ipv6 and conntrack_reasm are patched differently, but the result is all the tables are in appropriate files. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:34 -08:00
YOSHIFUJI Hideaki	fc80be87dc	[IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void. Fix following sparse warnings: | net/ipv4/udp.c:421:2: warning: returning void-valued expression | net/ipv4/udplite.c:38:2: warning: returning void-valued expression Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-01-28 15:10:24 -08:00
Denis V. Lunev	ecfdc8c542	[NETNS]: Pass correct namespace in ip_rt_get_source. ip_rt_get_source is the infamous place for which dst_ifdown kludges have been implemented. This means that rt->u.dst.dev can be safely dereferenced to obtain nd_net. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:23 -08:00
Denis V. Lunev	84a885f449	[NETNS]: Pass correct namespace in ip_route_input_slow. The packet on the input path always has a reference to an input network device it is passed from. Extract network namespace from it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:22 -08:00
Denis V. Lunev	86167a377f	[NETNS]: Pass correct namespace in context fib_check_nh. Correct network namespace is already used in fib_check_nh. Re-work its usage for better readability and pass into fib_lookup & inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:21 -08:00
Denis V. Lunev	5b707aaae4	[NETNS]: Pass correct namespace in fib_validate_source. Correct network namespace is available inside fib_validate_source. It can be obtained from the device passed in. The device is not NULL as in_device is obtained from it just above. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:21 -08:00
Denis V. Lunev	7fee0ca237	[NETNS]: Add netns parameter to inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:20 -08:00
Denis V. Lunev	da0e28cb68	[NETNS]: Add netns parameter to fib_lookup. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:19 -08:00
Stephen Hemminger	ba93ef7465	[IPV4]: ipmr sparse warnings Get rid of some of the sparse warnings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:18 -08:00
Stephen Hemminger	dd329bfa96	[IPV4]: igmp sparse warnings Partial sparse warning fix. The other conditional locking is too much for sparse to handle. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:18 -08:00
Jan Engelhardt	1e637c74b0	[IPV4]: Enable use of 240/4 address space. This short patch modifies the IPv4 networking to enable use of the 240.0.0.0/4 (aka "class-E") address space as proposed in the internet draft draft-fuller-240space-00.txt. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:44 -08:00
Denis V. Lunev	51314a17ba	[NETNS]: Process FIB rule action in the context of the namespace. Save namespace context on the fib rule at the rule creation time and call routing lookup in the correct namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:14 -08:00
Denis V. Lunev	9e3a548781	[NETNS]: FIB rules API cleanup. Remove struct net from fib_rules_register(unregister)/notify_change paths and diet code size a bit. add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65) function old new delta notify_rule_change 273 280 +7 trie_show_stats 471 475 +4 fn_trie_delete 473 477 +4 fib_rules_unregister 144 148 +4 fib4_rule_compare 119 123 +4 resize 2842 2845 +3 fn_trie_select_default 515 518 +3 inet_sk_rebuild_header 836 838 +2 fib_trie_seq_show 764 766 +2 __devinet_sysctl_register 276 278 +2 fn_trie_lookup 1124 1123 -1 ip_fib_check_default 133 131 -2 devinet_conf_sysctl 223 221 -2 snmp_fold_field 126 123 -3 fn_trie_insert 2091 2086 -5 inet_create 876 870 -6 fib4_rules_init 197 191 -6 fib_sync_down 452 444 -8 inet_gso_send_check 334 325 -9 fib_create_info 3003 2991 -12 fib_nl_delrule 568 553 -15 fib_nl_newrule 883 852 -31 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:13 -08:00
Denis V. Lunev	0359238333	[FIB]: Add netns to fib_rules_ops. The backward link from FIB rules operations to the network namespace will allow to simplify the API a bit. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:13 -08:00
Denis V. Lunev	775516bfa2	[NETNS]: Namespace stop vs 'ip r l' race. During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent the namespace from stopping, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The replacement of the correct netns for init_ns solves the problem only partially as socket to be stopped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the reference for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:08 -08:00
Denis V. Lunev	b7c6ba6eb1	[NETNS]: Consolidate kernel netlink socket destruction. Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:07 -08:00
Denis V. Lunev	4f84d82f7a	[NETNS]: Memory leak on network namespace stop. Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These sockets should be disposed properly, i.e. by sock_release. Plain sock_put is not enough. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:06 -08:00
Daniel Lezcano	569d36452e	[NETNS][DST] dst: pass the dst_ops as parameter to the gc functions The garbage collection functions receive the dst_ops structure as a parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (except for the function signature); they just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:46 -08:00
Eric Dumazet	a6501e080c	[IPV4] FIB_HASH: Reduce memory needs and speedup lookups Currently, sizeof(struct fib_alias) is 24 or 48 bytes on 32/64 bits arches. Because of SLAB_HWCACHE_ALIGN requirement, these are rounded to 32 and 64 bytes respectively. This patch moves rcu to the end of fib_alias, and conditionally defines it only for CONFIG_IP_FIB_TRIE. We also remove SLAB_HWCACHE_ALIGN requirement for fib_alias and fib_node objects because it is not necessary. (BTW SLUB currently denies it for objects smaller than cache_line_size() / 2, but not SLAB) Finally, sizeof(fib_alias) goes back to 16 and 32 bytes. Then, we can embed one fib_alias on each fib_node, to favor locality. Most of the time access to the fib_alias will be free because one cache line contains both the list head (fn_alias) and (one of) the list element. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:46 -08:00
Eric Dumazet	b59cfbf77d	[FIB]: Fix rcu_dereference() abuses in fib_trie.c node_parent() and tnode_get_child() currently use rcu_dereference(). These functions are called from both - readers only paths (where rcu_dereference() is needed), and - writer path (where rcu_dereference() is not needed) To make explicit where rcu_dereference() is really needed, I introduced new node_parent_rcu() and tnode_get_child_rcu() functions which use rcu_dereference(), while node_parent() and tnode_get_child() don't use it. Then I changed calling sites where rcu_dereference() was really needed to call the _rcu() variants. This should have no impact but for alpha architecture, and may help future sparse checks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:45 -08:00
Patrick McHardy	c71e916708	[NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protos Allows to remove five empty implementations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:42 -08:00
Patrick McHardy	c56cc9c07b	[NETFILTER]: nf_conntrack: remove print_conntrack function from l3protos Its unused and unlikely to ever be used. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:41 -08:00
Patrick McHardy	4f536522da	[NETFILTER]: kill nf_sysctl.c Since there now is generic support for shared sysctl paths, the only remains are the net/netfilter and net/ipv4/netfilter paths. Move them to net/netfilter/core.c and net/ipv4/netfilter.c and kill nf_sysctl.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:40 -08:00
Denys Vlasenko	9ba99b0d3f	[NETFILTER]: ipt_REJECT: properly handle IP options The current TCP RST construction reuses the old packet and can't deal with IP options as a consequence of that. Construct the RST from scratch instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:30 -08:00
Denys Vlasenko	022748a935	[NETFILTER]: {ip,ip6}_tables: remove some inlines This patch removes inlines except those which are used by packet matching code and thus are performance-critical. Before: $ size */*/*/ip*tables.o text data bss dec hex filename 6402 500 16 6918 1b06 net/ipv4/netfilter/ip_tables.o 7130 500 16 7646 1dde net/ipv6/netfilter/ip6_tables.o After: $ size */*/*/ip*tables.o text data bss dec hex filename 6307 500 16 6823 1aa7 net/ipv4/netfilter/ip_tables.o 7010 500 16 7526 1d66 net/ipv6/netfilter/ip6_tables.o Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:29 -08:00
Jan Engelhardt	f72e25a897	[NETFILTER]: Rename ipt_iprange to xt_iprange This patch moves ipt_iprange to xt_iprange, in preparation for adding IPv6 support to xt_iprange. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:27 -08:00
Jan Engelhardt	2ae15b64e6	[NETFILTER]: Update modules' descriptions Updates the MODULE_DESCRIPTION() tags for all Netfilter modules, actually describing what the module does and not just "netfilter XYZ target". Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:26 -08:00
Jan Engelhardt	11fa2aa362	[NETFILTER]: remove ipt_TOS.c Commit 88c85d81f74f92371745158aebc5cbf490412002 forgot to remove the old ipt_TOS file (whose code has been merged into xt_DSCP). Remove it now. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:17 -08:00
Patrick McHardy	8ce22fcab4	[NETFILTER]: Remove some EXPERIMENTAL dependencies Most of the netfilter modules are not considered experimental anymore, the only ones I want to keep marked as EXPERIMENTAL are: - TCPOPTSTRIP target, which is brand new. - SANE helper, which is quite new. - CLUSTERIP target, which I believe hasn't had much testing despite being in the kernel for quite a long time. - SCTP match and conntrack protocol, which are a mess and need to be reviewed and cleaned up before I would trust them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:16 -08:00
Stephen Hemminger	7f9b80529b	[IPV4]: fib hash|trie initialization Initialization of the slab cache's should be done when IP is initialized to make sure of available memory, and that code can be marked __init. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:15 -08:00
Stephen Hemminger	d717a9a620	[IPV4] fib_trie: size and statistics Show number of entries in trie, the size field was being set but never used, but it only counted leaves, not all entries. Refactor the two cases in fib_triestat_seq_show into a single routine. Note: the stat structure was being malloc'd but the stack usage isn't so high (288 bytes) that it is worth the additional complexity. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:14 -08:00
Eric Dumazet	28d36e3702	[FIB]: Avoid using static variables without proper locking fib_trie_seq_show() uses two helper functions, rtn_scope() and rtn_type() that can write to static storage without locking. Just pass to them a temporary buffer to avoid potential corruption (probably not triggerable but still...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:13 -08:00
Denis V. Lunev	39a6d06300	[NETNS]: Process inet_confirm_addr in the correct namespace. inet_confirm_addr can be called with NULL in_dev from arp_ignore iff scope is RT_SCOPE_LINK. Lets always pass the device and check for RT_SCOPE_LINK scope inside inet_confirm_addr. This let us take network namespace from in_device a need for an additional argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:13 -08:00
Denis V. Lunev	9bd85e3264	[IPV4]: Remove extra argument from arp_ignore. arp_ignore has two arguments: dev & in_dev. dev is used for inet_confirm_addr calling only. inet_confirm_addr, in turn, either gets in_dev from the device passed or iterates over all network devices if the device passed is NULL. It seems logical to directly pass in_dev into inet_confirm_addr. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:12 -08:00
Denis V. Lunev	2db82b534b	[NETNS]: Make arp code network namespace consistent. Some calls in the arp.c have network namespace as an argument. Getting init_net inside these functions is simply inconsistent. Fix this. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:08 -08:00
Denis V. Lunev	a79878f00d	[ARP]: Move inet_addr_type call after simple error checks in arp_contructor. The neighbour entry will be destroyed in the case of error, so it is pointless to perform costly routing table lookup in this case. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:08 -08:00
Pavel Emelyanov	a308da1627	[NETNS][RAW]: Create the /proc/net/raw(6) in each namespace. To do so, just register the proper subsystem and create files in ->init callbacks. No other special per-namespace handling for raw sockets is required. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:07 -08:00
Pavel Emelyanov	e5ba31f11f	[NETNS][RAW]: Eliminate explicit init_net references. Happily, in all the rest places (->bind callbacks only), that require the struct net, we have a socket, so get the net from it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:06 -08:00
Pavel Emelyanov	f51d599fbe	[NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list. Pull the struct net pointer up to the showing functions to filter the sockets depending on their namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:06 -08:00
Pavel Emelyanov	be185884b3	[NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware. This requires just to pass the appropriate struct net pointer into __raw_v[46]_lookup and skip sockets that do not belong to a needed namespace. The proper net is get from skb->dev in all the cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:05 -08:00
Eric Dumazet	8d96544475	[FIB]: full_children & empty_children should be uint, not ushort If declared as unsigned short, these fields can overflow, and whole trie logic is broken. I could not make the machine crash, but some tnode can never be freed. Note for 64 bit arches : By reordering t_key and parent in [node, leaf, tnode] structures, we can use 32 bits hole after t_key so that sizeof(struct tnode) doesnt change after this patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:04 -08:00
Eric Dumazet	4dde4610c4	[IPV4] fib_trie: removes a memset() call in tnode_new() tnode_alloc() already clears allocated memory, using kcalloc() or alloc_pages(GFP_KERNEL|__GFP_ZERO, ...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:02 -08:00
David S. Miller	88ebc72f68	[IPV4] FIB: Include nexthop device indexes in fib_info hashfn. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:01 -08:00

2209 Commits