linux

Author	SHA1	Message	Date
Eric Dumazet	686a7e32ca	inetpeer: fix race in unused_list manipulations Several crashes in cleanup_once() were reported in recent kernels. Commit `d6cc1d642d` (inetpeer: various changes) added a race in unlink_from_unused(). One way to avoid taking unused_peers.lock before doing the list_empty() test is to catch 0->1 refcnt transitions, using full barrier atomic operations variants (atomic_cmpxchg() and atomic_inc_return()) instead of previous atomic_inc() and atomic_add_unless() variants. We then call unlink_from_unused() only for the owner of the 0->1 transition. Add a new atomic_add_unless_return() static helper With help from Arun Sharma. Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772 Reported-by: Arun Sharma <asharma@fb.com> Reported-by: Maximilian Engelhardt <maxi@daemonizer.de> Reported-by: Yann Dupont <Yann.Dupont@univ-nantes.fr> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-27 13:39:11 -04:00
Veaceslav Falico	24cf3af3fe	igmp: call ip_mc_clear_src() only when we have no users of ip_mc_list In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number of source filters per mulitcast. However, igmp_group_dropped() is also called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source filters. To fix that, we must clear the source filters only when there are no users in the ip_mc_list, i.e. in ip_mc_dec_group() and on device destroy. Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-24 13:26:12 -04:00
Dan Rosenberg	71338aa7d0	net: convert %p usage to %pK The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-24 01:13:12 -04:00
Eric Dumazet	19a76fa959	net: ping: cleanups ping_v4_unhash() net/ipv4/ping.c: In function ‘ping_v4_unhash’: net/ipv4/ping.c:140:28: warning: variable ‘hslot’ set but not used Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-23 16:29:24 -04:00
Linus Torvalds	53ee7569ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) bnx2x: allow device properly initialize after hotplug bnx2x: fix DMAE timeout according to hw specifications bnx2x: properly handle CFC DEL in cnic flow bnx2x: call dev_kfree_skb_any instead of dev_kfree_skb net: filter: move forward declarations to avoid compile warnings pktgen: refactor pg_init() code pktgen: use vzalloc_node() instead of vmalloc_node() + memset() net: skb_trim explicitely check the linearity instead of data_len ipv4: Give backtrace in ip_rt_bug(). net: avoid synchronize_rcu() in dev_deactivate_many net: remove synchronize_net() from netdev_set_master() rtnetlink: ignore NETDEV_RELEASE and NETDEV_JOIN event net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE bridge: call NETDEV_JOIN notifiers when add a slave netpoll: disable netpoll when enslave a device macvlan: Forward unicast frames in bridge mode to lowerdev net: Remove linux/prefetch.h include from linux/skbuff.h ipv4: Include linux/prefetch.h in fib_trie.c netlabel: Remove prefetches from list handlers. drivers/net: add prefetch header for prefetch users ... Fixed up prefetch parts: removed a few duplicate prefetch.h includes, fixed the location of the igb prefetch.h, took my version of the skbuff.h code without the extra parentheses etc.	2011-05-23 08:39:24 -07:00
Paul Gortmaker	70c7160619	Add appropriate <linux/prefetch.h> include for prefetch users After discovering that wide use of prefetch on modern CPUs could be a net loss instead of a win, net drivers which were relying on the implicit inclusion of prefetch.h via the list headers showed up in the resulting cleanup fallout. Give them an explicit include via the following $0.02 script. ========================================= #!/bin/bash MANUAL="" for i in `git grep -l 'prefetch(.*)' .` ; do grep -q '<linux/prefetch.h>' $i if [ $? = 0 ] ; then continue fi ( echo '?^#include <linux/?a' echo '#include <linux/prefetch.h>' echo . echo w echo q ) \| ed -s $i > /dev/null 2>&1 if [ $? != 0 ]; then echo $i needs manual fixup MANUAL="$i $MANUAL" fi done echo ------------------- 8\<---------------------- echo vi $MANUAL ========================================= Signed-off-by: Paul <paul.gortmaker@windriver.com> [ Fixed up some incorrect #include placements, and added some non-network drivers and the fib_trie.c case - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-22 21:41:57 -07:00
Dave Jones	c378a9c019	ipv4: Give backtrace in ip_rt_bug(). Add a stack backtrace to the ip_rt_bug path for debugging Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-22 21:01:20 -04:00
David S. Miller	120a3d5c7c	ipv4: Include linux/prefetch.h in fib_trie.c Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-22 20:53:43 -04:00
Linus Torvalds	06f4e926d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.	2011-05-20 13:43:21 -07:00
Linus Torvalds	eb04f2f04e	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits) Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree() batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu() net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu() net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu() perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu() perf,rcu: convert call_rcu(free_ctx) to kfree_rcu() net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(net_generic_release) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu() security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu() net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu() net,rcu: convert call_rcu(xps_map_release) to kfree_rcu() net,rcu: convert call_rcu(rps_map_release) to kfree_rcu() ...	2011-05-19 18:14:34 -07:00
Micha Nelissen	3fb72f1e6e	ipconfig wait for carrier v3 -> v4: fix return boolean false instead of 0 for ic_is_init_dev Currently the ip auto configuration has a hardcoded delay of 1 second. When (ethernet) link takes longer to come up (e.g. more than 3 seconds), nfs root may not be found. Remove the hardcoded delay, and wait for carrier on at least one network device. Signed-off-by: Micha Nelissen <micha@neli.hopto.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-19 17:13:04 -04:00
Changli Gao	75e308c894	net: ping: fix the coding style The characters in a line should be no more than 80. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-19 16:17:51 -04:00
Changli Gao	bb0cd2fb53	net: ping: make local functions static As these functions are only used in this file. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-19 16:17:51 -04:00
David S. Miller	a48eff1288	ipv4: Pass explicit destination address to rt_bind_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-18 18:42:43 -04:00
David S. Miller	ed2361e66e	ipv4: Pass explicit destination address to rt_get_peer(). This will next trickle down to rt_bind_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-18 18:38:54 -04:00
David S. Miller	6bd023f3dd	ipv4: Make caller provide flowi4 key to inet_csk_route_req(). This way the caller can get at the fully resolved fl4->{daddr,saddr} etc. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-18 18:32:03 -04:00
David S. Miller	6882f933cc	ipv4: Kill RT_CACHE_DEBUG It's way past it's usefulness. And this gets rid of a bunch of stray ->rt_{dst,src} references. Even the comment documenting the macro was inaccurate (stated default was 1 when it's 0). If reintroduced, it should be done properly, with dynamic debug facilities. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-18 18:23:21 -04:00
David S. Miller	1d1652cbdb	ipv4: Don't use enums as bitmasks in ip_fragment.c Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-17 17:28:02 -04:00
Vasiliy Kulikov	f56e03e8dc	net: ping: fix build failure If CONFIG_PROC_SYSCTL=n the building process fails: ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net' Moved inet_get_ping_group_range_net() to ping.c. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-17 14:16:58 -04:00
Eric Dumazet	5173cc0577	ipv4: more compliant RFC 3168 support Commit `6623e3b24a` (ipv4: IP defragmentation must be ECN aware) was an attempt to not lose "Congestion Experienced" (CE) indications when performing datagram defragmentation. Stefanos Harhalakis raised the point that RFC 3168 requirements were not completely met by this commit. In particular, we MUST detect invalid combinations and eventually drop illegal frames. Reported-by: Stefanos Harhalakis <v13@v13.gr> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-16 14:49:14 -04:00
David S. Miller	c5be24ff62	ipv4: Trivial rt->rt_src conversions in net/ipv4/route.c At these points we have a fully filled in value via the IP header the form of ip_hdr(skb)->saddr Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-16 13:49:05 -04:00
Eric Dumazet	1a8218e962	net: ping: dont call udp_ioctl() udp_ioctl() really handles UDP and UDPLite protocols. 1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds a frame with bad checksum. 2) It has a dependency on sizeof(struct udphdr), not applicable to ICMP/PING If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be done differently. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-16 11:49:39 -04:00
Eric Dumazet	1b1cb1f78a	net: ping: small changes ping_table is not __read_mostly, since it contains one rwlock, and is static to ping.c ping_port_rover & ping_v4_lookup are static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-15 01:22:21 -04:00
David S. Miller	7be799a70b	ipv4: Remove rt->rt_dst reference from ip_forward_options(). At this point iph->daddr equals what rt->rt_dst would hold. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 17:31:02 -04:00
David S. Miller	8e36360ae8	ipv4: Remove route key identity dependencies in ip_rt_get_source(). Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 17:29:41 -04:00
David S. Miller	22f728f8f3	ipv4: Always call ip_options_build() after rest of IP header is filled in. This will allow ip_options_build() to reliably look at the values of iph->{daddr,saddr} Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 17:21:27 -04:00
David S. Miller	0374d9ceb0	ipv4: Kill spurious write to iph->daddr in ip_forward_options(). This code block executes when opt->srr_is_hit is set. It will be set only by ip_options_rcv_srr(). ip_options_rcv_srr() walks until it hits a matching nexthop in the SRR option addresses, and when it matches one 1) looks up the route for that nexthop and 2) on route lookup success it writes that nexthop value into iph->daddr. ip_forward_options() runs later, and again walks the SRR option addresses looking for the option matching the destination of the route stored in skb_rtable(). This route will be the same exact one looked up for the nexthop by ip_options_rcv_srr(). Therefore "rt->rt_dst == iph->daddr" must be true. All it really needs to do is record the route's source address in the matching SRR option adddress. It need not write iph->daddr again, since that has already been done by ip_options_rcv_srr() as detailed above. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 17:15:50 -04:00
Vasiliy Kulikov	c319b4d76b	net: ipv4: add IPPROTO_ICMP socket kind This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below). Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/ A new ping socket is created with socket(PF_INET, SOCK_DGRAM, PROT_ICMP) Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored. Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets. ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed. ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order). socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons. The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.). Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/ Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public. All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch. PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code. PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf(). PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore. RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 16:08:13 -04:00
David S. Miller	72a8f97bf2	ipv4: Fix 'iph' use before set. I swear none of my compilers warned about this, yet it is so obvious. > net/ipv4/ip_forward.c: In function 'ip_forward': > net/ipv4/ip_forward.c:87: warning: 'iph' may be used uninitialized in this function Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 23:03:46 -04:00
David S. Miller	def57687e9	ipv4: Elide use of rt->rt_dst in ip_forward() No matter what kind of header mangling occurs due to IP options processing, rt->rt_dst will always equal iph->daddr in the packet. So we can safely use iph->daddr instead of rt->rt_dst here. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 19:34:30 -04:00
David S. Miller	c30883bdff	ipv4: Simplify iph->daddr overwrite in ip_options_rcv_srr(). We already copy the 4-byte nexthop from the options block into local variable "nexthop" for the route lookup. Re-use that variable instead of memcpy()'ing again when assigning to iph->daddr after the route lookup succeeds. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 19:30:58 -04:00
David S. Miller	10949550bd	ipv4: Kill spurious opt->srr check in ip_options_rcv_srr(). All call sites conditionalize the call to ip_options_rcv_srr() with a check of opt->srr, so no need to check it again there. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 19:26:57 -04:00
David S. Miller	3c709f8fb4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6 Conflicts: drivers/net/benet/be_main.c	2011-05-11 14:26:58 -04:00
Steffen Klassert	43a4dea4c9	xfrm: Assign the inner mode output function to the dst entry As it is, we assign the outer modes output function to the dst entry when we create the xfrm bundle. This leads to two problems on interfamily scenarios. We might insert ipv4 packets into ip6_fragment when called from xfrm6_output. The system crashes if we try to fragment an ipv4 packet with ip6_fragment. This issue was introduced with git commit `ad0081e4` (ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed). The second issue is, that we might insert ipv4 packets in netfilter6 and vice versa on interfamily scenarios. With this patch we assign the inner mode output function to the dst entry when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner mode is used and the right fragmentation and netfilter functions are called. We switch then to outer mode with the output_finish functions. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 15:03:34 -07:00
Eric Dumazet	1fc19aff84	net: fix two lockdep splats Commit `e67f88dd12` (net: dont hold rtnl mutex during netlink dump callbacks) switched rtnl protection to RCU, but we forgot to adjust two rcu_dereference() lockdep annotations : inet_get_link_af_size() or inet_fill_link_af() might be called with rcu_read_lock or rtnl held, so use rcu_dereference_rtnl() instead of rtnl_dereference() Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 15:03:01 -07:00
David S. Miller	8f01cb0827	ipv4: xfrm: Eliminate ->rt_src reference in policy code. Rearrange xfrm4_dst_lookup() so that it works by calling a helper function __xfrm_dst_lookup() that takes an explicit flow key storage area as an argument. Use this new helper in xfrm4_get_saddr() so we can fetch the selected source address from the flow instead of from rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:48 -07:00
David S. Miller	79ab053145	ipv4: udp: Eliminate remaining uses of rt->rt_src We already track and pass around the correct flow key, so simply use it in udp_send_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:47 -07:00
David S. Miller	9f6abb5f17	ipv4: icmp: Eliminate remaining uses of rt->rt_src On input packets, rt->rt_src always equals ip_hdr(skb)->saddr Anything that mangles or otherwise changes the IP header must relookup the route found at skb_rtable(). Therefore this invariant must always hold true. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:46 -07:00
David S. Miller	0a5ebb8000	ipv4: Pass explicit daddr arg to ip_send_reply(). This eliminates an access to rt->rt_src. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:46 -07:00
David S. Miller	f5fca60865	ipv4: Pass flow key down into ip_append_*(). This way rt->rt_dst accesses are unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 21:24:07 -07:00
David S. Miller	77968b7824	ipv4: Pass flow keys down into datagram packet building engine. This way ip_output.c no longer needs rt->rt_{src,dst}. We already have these keys sitting, ready and waiting, on the stack or in a socket structure. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 21:24:06 -07:00
David S. Miller	e474995f29	udp: Use flow key information instead of rt->rt_{src,dst} We have two cases. Either the socket is in TCP_ESTABLISHED state and connect() filled in the inet socket cork flow, or we looked up the route here and used an on-stack flow. Track which one it was, and use it to obtain src/dst addrs. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 21:12:48 -07:00
stephen hemminger	b9f47a3aae	tcp_cubic: limit delayed_ack ratio to prevent divide error TCP Cubic keeps a metric that estimates the amount of delayed acknowledgements to use in adjusting the window. If an abnormally large number of packets are acknowledged at once, then the update could wrap and reach zero. This kind of ACK could only happen when there was a large window and huge number of ACK's were lost. This patch limits the value of delayed ack ratio. The choice of 32 is just a conservative value since normally it should be range of 1 to 4 packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:51:57 -07:00
David S. Miller	c5216cc70f	tcp: Use cork flow info instead of rt->rt_dst in tcp_v4_get_peer() Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:28:29 -07:00
David S. Miller	ea4fc0d619	ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit(). Now we can pick it out of the provided flow key. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:28:28 -07:00
David S. Miller	d9d8da805d	inet: Pass flowi to ->queue_xmit(). This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:28:28 -07:00
David S. Miller	0e73441992	ipv4: Use inet_csk_route_child_sock() in DCCP and TCP. Operation order is now transposed, we first create the child socket then we try to hook up the route. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:28:03 -07:00
David S. Miller	77357a9552	ipv4: Create inet_csk_route_child_sock(). This is just like inet_csk_route_req() except that it operates after we've created the new child socket. In this way we can use the new socket's cork flow for proper route key storage. This will be used by DCCP and TCP child socket creation handling. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 14:34:22 -07:00
David S. Miller	b57ae01a8a	ipv4: Use cork flow in ip_queue_xmit() All invokers of ip_queue_xmit() must make certain that the socket is locked. All of SCTP, TCP, DCCP, and L2TP now make sure this is the case. Therefore we can use the cork flow during output route lookup in ip_queue_xmit() when the socket route check fails. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 14:05:14 -07:00
David S. Miller	6e86913810	ipv4: Use cork flow in inet_sk_{reselect_saddr,rebuild_header}() These two functions must be invoked only when the socket is locked (because socket identity modifications are made non-atomically). Therefore we can use the cork flow for output route lookups. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 14:05:13 -07:00

1 2 3 4 5 ...

4278 Commits