Enhance IPVS to load balance SCTP transport protocol packets. This is done
based on SCTP RFC 4960. All possible control chunks have been taken
care of. The state machine used in this code is somewhat lengthy, but I
have tried to make it easy to understand.
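For illustration only (the states and the mapping below are hypothetical,
not the in-tree IPVS table), a chunk-type-driven transition in the spirit
of such a state machine could look like this, with chunk numbers taken
from RFC 4960:

  enum sctp_conn_state { SCTP_S_NONE, SCTP_S_INIT, SCTP_S_ESTABLISHED, SCTP_S_CLOSED };

  static enum sctp_conn_state sctp_next_state(enum sctp_conn_state cur,
                                              unsigned char chunk_type)
  {
          switch (chunk_type) {
          case 1:  /* INIT */               return SCTP_S_INIT;
          case 10: /* COOKIE ECHO */        return SCTP_S_ESTABLISHED;
          case 6:  /* ABORT */              return SCTP_S_CLOSED;
          case 14: /* SHUTDOWN COMPLETE */  return SCTP_S_CLOSED;
          default:                          return cur;
          }
  }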
Signed-off-by: Venkata Mohan Reddy Koppula <mohanreddykv@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit 2eff25c18c
(netfilter: xt_hashlimit: fix race condition and simplify locking)
added a mutex deadlock: htable_create() is called with hashlimit_mutex
already locked.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With 32 bit userland and 64 bit kernels, it is unlikely but possible
that insertion of new rules fails even though there are only about 2000
iptables rules.
This happens because the compat delta is stored in a short int.
Easily reproducible via "iptables -m limit"; after about 2050
rules, inserting new ones fails with -ELOOP.
Note that compat_delta included 2 bytes of padding on x86_64, so
structure size remains the same.
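A sketch of the change, written from memory rather than copied from the
diff: the accumulated delta grows from a short int to an int, and the
former tail padding absorbs the extra two bytes:

  struct compat_delta {
          struct compat_delta *next;
          unsigned int offset;
          int delta;              /* was: short int delta */
  };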
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Normally, each connection needs a unique identity. Conntrack zones allow
a numerical zone to be specified using the CT target; connections in
different zones can use the same identity.
Example:
iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
Signed-off-by: Patrick McHardy <kaber@trash.net>
The error handlers might need the template to get the conntrack zone
introduced in the next patches to perform a conntrack lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
This is one of those things that iptables cannot catch and which can
cause "Invalid argument" to be printed. Without a hint in dmesg, that
error message is not going to be helpful.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
call_rcu() will unconditionally reinitialize RCU head anyway.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add TCP support, which is mandated by RFC3261 for all SIP elements.
SIP over TCP is similar to UDP, except that messages are delimited
by Content-Length: headers and multiple messages may appear in one
packet.
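As a rough illustration (not the helper's actual parser; it assumes dptr
points into a NUL-terminated copy of the TCP payload and end is one past
the last byte of data), finding the start of the next message could look
like:

  #include <string.h>
  #include <stdlib.h>

  static const char *sip_next_msg(const char *dptr, const char *end)
  {
          const char *hdr_end, *cl;
          unsigned long body_len = 0;

          hdr_end = strstr(dptr, "\r\n\r\n");      /* blank line ends the headers */
          if (!hdr_end || hdr_end + 4 > end)
                  return NULL;

          cl = strstr(dptr, "Content-Length:");
          if (cl && cl < hdr_end)
                  body_len = strtoul(cl + strlen("Content-Length:"), NULL, 10);

          cl = hdr_end + 4 + body_len;             /* skip headers and body */
          return cl < end ? cl : NULL;
  }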
Signed-off-by: Patrick McHardy <kaber@trash.net>
When using TCP, multiple SIP messages might be present in a single packet.
A following patch will parse them by setting the dptr to the beginning of
each message. The NAT helper needs to reload the dptr value after mangling
the packet, however, so it needs to know the offset of the message from the
beginning of the packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
When requests are parsed, the "sip:" part of the SIP URI should be skipped.
Usually this doesn't matter because address parsing skips forward until after
the username part, but in the case of REGISTER requests the URI doesn't
contain a username and the address cannot be parsed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make the output a bit more informative by showing the helper an expectation
belongs to and the expectation class.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The static initial tables are pretty large, and after the net
namespace has been instantiated, they just hang around for nothing.
This commit removes them and creates tables on-demand at runtime when
needed.
Size shrinks by 7735 bytes (x86_64).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The respective xt_table structures already have most of the metadata
needed for hook setup. Add a 'priority' field to struct xt_table so
that xt_hook_link() can be called with a reduced number of arguments.
Should we add more tables in the future, they come at no
static cost (only runtime, as before). Space saved:
6807373 -> 6806555 bytes.
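Sketched (and abbreviated) to show the idea, not the full in-tree
definitions:

  struct xt_table {
          struct list_head list;
          unsigned int valid_hooks;       /* hooks this table attaches to */
          struct module *me;
          u_int8_t af;                    /* address family */
          int priority;                   /* new: netfilter hook priority */
          const char name[XT_TABLE_MAXNAMELEN];
  };

  /* hook registration now only needs the table and the hook function */
  struct nf_hook_ops *xt_hook_link(const struct xt_table *table, nf_hookfn *fn);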
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Rewrite COMPAT_XT_ALIGN in terms of a dummy-structure hack;
compat counters logically have nothing to do with it.
Use the ALIGN() macro for the same types while I'm at it.
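The dummy-structure approach, sketched from memory (not necessarily the
exact committed macro):

  struct _compat_xt_align {
          __u8  u8;
          __u16 u16;
          __u32 u32;
          compat_u64 u64;
  };

  #define COMPAT_XT_ALIGN(s) ALIGN((s), __alignof__(struct _compat_xt_align))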
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.
Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instantiating a
new namespace. Additionally restrict hash resizing to init_net for
now, as other namespaces are not handled currently.
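Roughly, the direction of the change (field layout abbreviated and the
allocation helper hypothetical, not the exact diff):

  struct netns_ct {
          unsigned int            htable_size;    /* per-namespace hash size */
          struct hlist_nulls_head *hash;
          /* ... */
  };

  static int nf_conntrack_init_net(struct net *net)
  {
          /* seed the per-namespace size from the global module parameter */
          net->ct.htable_size = nf_conntrack_htable_size;
          net->ct.hash = alloc_ct_hashtable(net->ct.htable_size); /* hypothetical helper */
          return net->ct.hash ? 0 : -ENOMEM;
  }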
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expectation hashtable size was simply glued to a variable with no code
to rehash expectations, so it was a bug to allow writing to it.
Make "expect_hashsize" readonly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
If we use a shared slab cache, one object can instantly move from
one hash table (netns ONE) to another one (netns TWO), and a concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.
We don't have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).
If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.
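A sketch of per-namespace cache creation with a unique slab name, close
to but not guaranteed to match the committed code:

  net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
  if (!net->ct.slabname)
          return -ENOMEM;

  net->ct.nf_conntrack_cachep = kmem_cache_create(net->ct.slabname,
                                                  sizeof(struct nf_conn), 0,
                                                  SLAB_DESTROY_BY_RCU, NULL);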
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
conntrack, which is located in the data section, might be accidentally
freed when a new namespace is instantiated while the untracked conntrack
is attached to a skb, because the reference count is re-initialized.
The best fix would be to use a separate untracked conntrack per
namespace since it includes a namespace pointer. Unfortunately this is
not possible without larger changes since the namespace is not easily
available everywhere we need it. For now move the untracked conntrack
initialization to the init_net setup function to make sure the reference
count is not re-initialized and handle cleanup in the init_net cleanup
function to make sure namespaces can exit properly while the untracked
conntrack is in use in other namespaces.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new target for the raw table, which can be used to specify conntrack
parameters for specific connections, e.g. the conntrack helper.
The target attaches a "template" connection tracking entry to the skb, which
is used by the conntrack core when initializing a new conntrack.
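Illustrative example (port and helper chosen arbitrarily):
iptables -t raw -A PREROUTING -p tcp --dport 2121 -j CT --helper ftp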
Signed-off-by: Patrick McHardy <kaber@trash.net>
Support initializing selected parameters of new conntrack entries from a
"conntrack template", which is a specially marked conntrack entry attached
to the skb.
Currently the helper and the event delivery masks can be initialized this
way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add two masks for conntrack and expectation events to struct nf_conntrack_ecache
and use them to filter events. Their default value is "all events" when the
event sysctl is on and "no events" when it is off. A following patch will add
specific initializations. Expectation events depend on the ecache struct of
their master conntrack.
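Abbreviated sketch of the two masks (not the complete structure):

  struct nf_conntrack_ecache {
          unsigned long cache;    /* bitmask of pending events */
          u16 ctmask;             /* conntrack events this entry may deliver */
          u16 expmask;            /* expectation events of its children */
  };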
Signed-off-by: Patrick McHardy <kaber@trash.net>
Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated
when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is
generated when the IPS_ASSURED bit is set.
In combination with a following patch to support selective event delivery,
this can be used for "sparse" conntrack replication: start replicating the
conntrack entry once it has reached the ASSURED state, which makes the
replication SYN-flood resistant.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make sure not to assign a helper for a different network or transport
layer protocol to a connection.
Additionally change expectation deletion by helper to compare the name
directly - there might be multiple helper registrations using the same
name, currently one of them is chosen in an unpredictable manner and
only those expectations are removed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
As noticed by Shin Hong <hongshin@gmail.com>, there is a race between
htable_find_get() and htable_put():
htable_put():                       htable_find_get():
                                    spin_lock_bh(&hashlimit_lock);
                                    <search entry>
atomic_dec_and_test(&hinfo->use)
                                    atomic_inc(&hinfo->use)
                                    spin_unlock_bh(&hashlimit_lock)
                                    return hinfo;
spin_lock_bh(&hashlimit_lock);
hlist_del(&hinfo->node);
spin_unlock_bh(&hashlimit_lock);
htable_destroy(hinfo);
The entire locking concept is overly complicated, tables are only
created/referenced and released in process context, so a single
mutex works just fine. Remove the hashinfo_spinlock and atomic
reference count and use the mutex to protect table lookups/creation
and reference count changes.
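A sketch of the simplified release path (names follow xt_hashlimit,
details written from memory rather than taken from the diff):

  static DEFINE_MUTEX(hashlimit_mutex);

  static void htable_put(struct xt_hashlimit_htable *hinfo)
  {
          mutex_lock(&hashlimit_mutex);
          if (--hinfo->use == 0) {
                  hlist_del(&hinfo->node);
                  htable_destroy(hinfo);
          }
          mutex_unlock(&hashlimit_mutex);
  }

With lookup and creation running under the same mutex in the checkentry
path, the find-or-create sequence and the use count increment can no
longer race with the teardown above.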
Signed-off-by: Patrick McHardy <kaber@trash.net>
The TCPMSS target is dropping SYN packets where:
1) There is data, or
2) The data offset makes the TCP header larger than the packet.
Both of these result in an error level printk. This printk has been
removed.
This change avoids dropping SYN packets containing data. If there
is also no MSS option (as well as data), one will not be added
because of possible complications due to the increased packet size.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In a string like "v:SIP/2.0..." it was checking for !isalpha('S') when it
meant to be inspecting the ':'.
Patch by Greg Alexander <greqcs@galexander.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch removes the variable part from a debug message so that
syslog can concatenate repeated messages.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make hashtable per-netns.
Make proc files per-netns.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make recent table list per-netns.
Make proc files per-netns.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Simply pass the hashtable to the seqfile iterators; the proc entry itself is not needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
sky2: Fix oops in sky2_xmit_frame() after TX timeout
Documentation/3c509: document ethtool support
af_packet: Don't use skb after dev_queue_xmit()
vxge: use pci_dma_mapping_error to test return value
netfilter: ebtables: enforce CAP_NET_ADMIN
e1000e: fix and commonize code for setting the receive address registers
e1000e: e1000e_enable_tx_pkt_filtering() returns wrong value
e1000e: perform 10/100 adaptive IFS only on parts that support it
e1000e: don't accumulate PHY statistics on PHY read failure
e1000e: call pci_save_state() after pci_restore_state()
netxen: update version to 4.0.72
netxen: fix set mac addr
netxen: fix smatch warning
netxen: fix tx ring memory leak
tcp: update the netstamp_needed counter when cloning sockets
TI DaVinci EMAC: Handle emac module clock correctly.
dmfe/tulip: Let dmfe handle DM910x except for SPARC on-board chips
ixgbe: Fix compiler warning about variable being used uninitialized
netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq()
mv643xx_eth: don't include cache padding in rx desc buffer size
...
Fix trivial conflict in drivers/scsi/cxgb3i/cxgb3i_offload.c
commit 8a27f7c90f
changed the output style of %pi4 to use fixed
width leading zero IP addresses "001.002.003.004".
It's useful when printing multiple lines of
addresses, but it changed the output style for
some existing uses.
Using %pI4 restores the previous output style.
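For illustration, the two forms side by side:

  __be32 addr = cpu_to_be32(0x01020304);

  pr_info("%pi4\n", &addr);       /* prints "001.002.003.004" (fixed width) */
  pr_info("%pI4\n", &addr);       /* prints "1.2.3.4" (previous style) */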
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use the same format string as net/ipv4/netfilter/nf_nat_ftp.c
to encode an ipv4 address and port.
Both uses should be a single common function.
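For reference, the encoding in question is the classic FTP a,b,c,d,p1,p2
form; sketched from memory, not copied from either file:

  sprintf(buf, "%u,%u,%u,%u,%u,%u",
          ((unsigned char *)&addr)[0], ((unsigned char *)&addr)[1],
          ((unsigned char *)&addr)[2], ((unsigned char *)&addr)[3],
          port >> 8, port & 0xff);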
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As noticed by Dan Carpenter <error27@gmail.com>, update_nl_seq()
currently contains an out of bounds read of the seq_aft_nl array
when looking for the oldest sequence number position.
Fix it to only compare valid positions.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
I was very frustrated by the fact that I had to recompile the kernel
to change the hash size, so I created this patch.
If IPVS is built in, you can append ip_vs.conn_tab_bits=?? to the kernel
command line; if you built IPVS as a module, you can add
options ip_vs conn_tab_bits=??.
To keep everything backward compatible, you still can select the size at
compile time, and that will be used as default.
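Wiring up such a parameter typically looks like the sketch below (written
from memory, not the exact committed hunk):

  static int ip_vs_conn_tab_bits = CONFIG_IP_VS_TAB_BITS;
  module_param_named(conn_tab_bits, ip_vs_conn_tab_bits, int, 0444);
  MODULE_PARM_DESC(conn_tab_bits, "Set connection hash table size (2^bits)");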
It has been about a year since this patch was originally posted
and subsequently dropped on the basis of insufficient test data.
Mark Bergsma has provided the following test results which seem
to strongly support the need for larger hash table sizes:
We do however run into the same problem with the default setting (2^12 =
4096 entries), as most of our LVS balancers handle around a million
connections/SLAB entries at any point in time (around 100-150 kpps
load). With only 4096 hash table entries this implies that each entry
consists of a linked list of 256 connections *on average*.
To provide some statistics, I did an oprofile run on a 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (2^18 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.
With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:
  samples   %        image name  app name  symbol name
  1719761   42.3741  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
   302577    7.4554  bnx2        bnx2      /bnx2
   181984    4.4840  vmlinux     vmlinux   __ticket_spin_lock
   128636    3.1695  vmlinux     vmlinux   ip_route_input
    74345    1.8318  ip_vs.ko    ip_vs.ko  ip_vs_conn_out_get
    68482    1.6874  vmlinux     vmlinux   mwait_idle
After loading the recompiled kernel with 2^18 entries, %si CPU usage
dropped in half to around 12-18%, and oprofile looks much healthier,
with only 7% spent in ip_vs_conn_in_get:
  samples   %        image name  app name  symbol name
   265641   14.4616  bnx2        bnx2      /bnx2
   143251    7.7986  vmlinux     vmlinux   __ticket_spin_lock
   140661    7.6576  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
    94364    5.1372  vmlinux     vmlinux   mwait_idle
    86267    4.6964  vmlinux     vmlinux   ip_route_input
[ horms@verge.net.au: trivial up-port and minor style fixes ]
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Cc: Mark Bergsma <mark@wikimedia.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The ipvs code has a nifty system for determining the size of ioctl
command copies; it defines an array of lengths into which it indexes
the cmd to find the right length.
Unfortunately, the ipvs code forgot to check whether the cmd was in the
range that the array covers, allowing an index outside of the
array, which yields a garbage length that then
gets used for copying into a stack buffer.
Fix this by adding sanity checks on the cmd range as well as the copy size.
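The added checks, sketched (array and macro names follow the ipvs control
code, details from memory):

  if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
          return -EINVAL;

  copylen = get_arglen[GET_CMDID(cmd)];
  if (copylen > 128)
          return -EINVAL;

  if (copy_from_user(arg, user, copylen) != 0)
          return -EFAULT;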
[ horms@verge.net.au: adjusted limit to IP_VS_SO_GET_MAX ]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>