linux/net/ipv4
Vasiliy Kulikov c319b4d76b net: ipv4: add IPPROTO_ICMP socket kind
This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges.  In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

    socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)
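
As a minimal userspace sketch of the call above: the helper names
open_ping_socket() and ping_sockets_permitted() are hypothetical and not
part of the patch. The sketch assumes a kernel that carries this patch;
on any other system, or when the group range excludes the caller,
socket() is expected to fail and errno says why.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a ping socket fd, or -1 with errno set. */
static int open_ping_socket(void)
{
    int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP);

    if (fd < 0) {
        int saved = errno;  /* fprintf() may clobber errno */

        fprintf(stderr, "ping socket unavailable: %s\n", strerror(saved));
        errno = saved;
    }
    return fd;
}

/* Hypothetical probe: 1 if this process may create ping sockets, else 0. */
static int ping_sockets_permitted(void)
{
    int fd = open_ping_socket();

    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```

A caller could use ping_sockets_permitted() to decide whether to fall
back to a raw socket, which still needs CAP_NET_RAW.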

Message identifiers (octets 4-5 of the ICMP header) are interpreted as
local ports. Addresses are stored in struct sockaddr_in. No port numbers
are reserved for privileged processes; port 0 is reserved for the API
("let the kernel pick a free number"). There is no notion of remote
ports; remote port numbers provided by the user (e.g. in connect()) are
ignored.
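
The port mapping can be sketched as follows. The helper names are
hypothetical, and the sketch assumes that bind() and getsockname() treat
sin_port as the ICMP identifier, as the description above implies.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: local address whose "port" is the ICMP id. */
static struct sockaddr_in ping_local_addr(unsigned short id)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(id);  /* 0: let the kernel pick a free number */
    return sa;
}

/* Bind the ping socket to a chosen identifier; returns bind()'s result. */
static int bind_ping_id(int fd, unsigned short id)
{
    struct sockaddr_in sa = ping_local_addr(id);

    return bind(fd, (struct sockaddr *)&sa, sizeof(sa));
}

/* Read back the identifier in use, or -1 if getsockname() fails. */
static int ping_socket_id(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}
```

Calling bind_ping_id(fd, 0) and then ping_socket_id(fd) would, under
these assumptions, report the identifier the kernel picked.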

Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport header values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, and
the checksum is always recomputed.
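
A sketch of the buffer a program might hand to send() or sendto(): the
function name build_echo_request() is hypothetical. The id and checksum
fields are left at zero on the assumption, taken from the paragraph
above, that the kernel fills in both.

```c
#include <arpa/inet.h>
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: writes an ICMP echo request (header + payload)
 * into buf. Returns the total length, or 0 if buf is too small.
 */
static size_t build_echo_request(unsigned char *buf, size_t buflen,
                                 uint16_t seq,
                                 const void *payload, size_t payload_len)
{
    struct icmphdr hdr;

    if (buflen < sizeof(hdr) || payload_len > buflen - sizeof(hdr))
        return 0;

    memset(&hdr, 0, sizeof(hdr));
    hdr.type = ICMP_ECHO;              /* the only type accepted */
    hdr.code = 0;                      /* must be zero */
    hdr.un.echo.sequence = htons(seq); /* chosen by the sender */
    /* id and checksum stay 0: the kernel is expected to set them */

    memcpy(buf, &hdr, sizeof(hdr));
    if (payload_len > 0)
        memcpy(buf + sizeof(hdr), payload, payload_len);
    return sizeof(hdr) + payload_len;
}
```

The returned length is what would be passed as the size argument to
sendto(), with the destination given as a struct sockaddr_in.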

ICMP reply packets received from the network are demultiplexed according
to their IDs, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).
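
As an illustration of the ancillary-data path, the sketch below enables
IP_RECVTTL and extracts the TTL from the control messages filled in by
recvmsg(). The helper names are hypothetical; the control message layout
(level IPPROTO_IP, type IP_TTL, an int payload) follows the usual Linux
behaviour for IP_RECVTTL and is assumed to apply to ping sockets too.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to attach the received TTL to each datagram. */
static int enable_recv_ttl(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on));
}

/*
 * Hypothetical helper: walk the control messages of a msghdr that
 * recvmsg() has filled in and return the TTL, or -1 if none is present.
 */
static int ttl_from_cmsgs(struct msghdr *msg)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TTL) {
            int ttl;

            memcpy(&ttl, CMSG_DATA(c), sizeof(ttl));
            return ttl;
        }
    }
    return -1;
}
```

Errors queued via IP_RECVERR would be read in a similar loop, using
recvmsg() with MSG_ERRQUEUE and looking for IP_RECVERR messages instead.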

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets.  Setting it to "100
100" would grant permission to a single group (either to make /sbin/ping
g+s and owned by this group, or to grant permission to the "netadmins"
group); "0 4294967295" would enable it for the world; and "100
4294967295" would enable it for users, but not daemons.
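
The range semantics can be illustrated with a small userspace check.
gid_allowed() is a hypothetical helper, not kernel code: it only tests
one gid against an inclusive "low high" pair as read from the sysctl
file, and it does not model which of a process's groups the kernel
consults.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if gid lies in the inclusive range
 * described by a "low high" string, 0 if not, -1 if the string cannot
 * be parsed. A range with low > high (such as "1 0") matches nothing.
 */
static int gid_allowed(const char *range, unsigned long long gid)
{
    unsigned long long lo, hi;

    if (sscanf(range, "%llu %llu", &lo, &hi) != 2)
        return -1;
    return lo <= gid && gid <= hi;
}
```

With the default "1 0" the check fails for every gid, including 0, which
matches the statement that not even root may create ping sockets.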

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) are tested with
the patch.

PATCH v3:
    - switched to flowi4.
    - minor changes to be consistent with raw sockets code.

PATCH v2:
    - changed ping_debug() to pr_debug().
    - removed CONFIG_IP_PING.
    - removed ping_seq_fops.owner field (unused for procfs).
    - switched to proc_net_fops_create().
    - switched to %pK in seq_printf().

PATCH v1:
    - fixed checksumming bug.
    - CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
    - minor cleanups.
    - introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 16:08:13 -04:00
..
netfilter inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
af_inet.c net: ipv4: add IPPROTO_ICMP socket kind 2011-05-13 16:08:13 -04:00
ah4.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
arp.c net: gre: provide multicast mappings for ipv4 and ipv6 2011-03-30 00:10:47 -07:00
cipso_ipv4.c inet: add RCU protection to inet->opt 2011-04-28 13:16:35 -07:00
datagram.c ipv4: Lock socket and use cork flow in ip4_datagram_connect(). 2011-05-08 13:48:57 -07:00
devinet.c net: fix two lockdep splats 2011-05-10 15:03:01 -07:00
esp4.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
fib_frontend.c Disable rp_filter for IPsec packets 2011-04-10 18:50:59 -07:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c ipv4: Use flowi4 in FIB layer. 2011-03-12 15:08:49 -08:00
fib_semantics.c ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_trie.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-05-05 14:59:02 -07:00
gre.c tunnels: add _rcu annotations 2010-10-25 13:09:45 -07:00
icmp.c net: ipv4: add IPPROTO_ICMP socket kind 2011-05-13 16:08:13 -04:00
igmp.c ipv4: Use flowi4's {saddr,daddr} in igmpv3_newpack() and igmp_send_report() 2011-05-03 20:53:12 -07:00
inet_connection_sock.c ipv4: Create inet_csk_route_child_sock(). 2011-05-08 14:34:22 -07:00
inet_diag.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
inet_fragment.c
inet_hashtables.c inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners 2010-11-28 18:18:44 -08:00
inet_lro.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
inet_timewait_sock.c tcp: fix inet_twsk_deschedule() 2011-02-19 18:59:04 -08:00
inetpeer.c inetpeer: reduce stack usage 2011-04-12 13:58:33 -07:00
ip_forward.c ipv4: Fix 'iph' use before set. 2011-05-12 23:03:46 -04:00
ip_fragment.c net: ip_expire() must revalidate route 2011-05-04 14:04:07 -07:00
ip_gre.c net: call dev_alloc_name from register_netdevice 2011-05-05 10:57:45 -07:00
ip_input.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
ip_options.c ipv4: Simplify iph->daddr overwrite in ip_options_rcv_srr(). 2011-05-12 19:30:58 -04:00
ip_output.c ipv4: Pass explicit daddr arg to ip_send_reply(). 2011-05-10 13:32:46 -07:00
ip_sockglue.c inet: add RCU protection to inet->opt 2011-04-28 13:16:35 -07:00
ipcomp.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
ipconfig.c Fix common misspellings 2011-03-31 11:26:23 -03:00
ipip.c net: call dev_alloc_name from register_netdevice 2011-05-05 10:57:45 -07:00
ipmr.c ipv4: Pass explicit saddr/daddr args to ipmr_get_route(). 2011-05-04 12:18:54 -07:00
Kconfig ipv4: Remove fib_hash. 2011-02-01 15:35:25 -08:00
Makefile net: ipv4: add IPPROTO_ICMP socket kind 2011-05-13 16:08:13 -04:00
netfilter.c netfilter: af_info: add 'strict' parameter to limit lookup to .oif 2011-04-04 17:00:54 +02:00
ping.c net: ipv4: add IPPROTO_ICMP socket kind 2011-05-13 16:08:13 -04:00
proc.c tcp: Replace time wait bucket msg by counter 2010-12-08 12:16:33 -08:00
protocol.c net: add __rcu annotations to protocol 2010-10-27 11:37:31 -07:00
raw.c ipv4: Pass flow key down into ip_append_*(). 2011-05-08 21:24:07 -07:00
route.c ipv4: Pass explicit saddr/daddr args to ipmr_get_route(). 2011-05-04 12:18:54 -07:00
syncookies.c inet: add RCU protection to inet->opt 2011-04-28 13:16:35 -07:00
sysctl_net_ipv4.c net: ipv4: add IPPROTO_ICMP socket kind 2011-05-13 16:08:13 -04:00
tcp_bic.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_cong.c
tcp_cubic.c tcp_cubic: limit delayed_ack ratio to prevent divide error 2011-05-08 15:51:57 -07:00
tcp_diag.c
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_illinois.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_input.c tcp: Make undo_ssthresh arg to tcp_undo_cwr() a bool. 2011-03-22 19:37:11 -07:00
tcp_ipv4.c ipv4: Pass explicit daddr arg to ip_send_reply(). 2011-05-10 13:32:46 -07:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_minisocks.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-12-08 13:47:38 -08:00
tcp_output.c inet: Pass flowi to ->queue_xmit(). 2011-05-08 15:28:28 -07:00
tcp_probe.c net: ipv4: tcp_probe: cleanup snprintf() use 2010-11-17 12:27:46 -08:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c tcp: Remove debug macro of TCP_CHECK_TIMER 2011-02-20 11:10:14 -08:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp.c net: Allow no-cache copy from user on transmit 2011-04-04 22:30:30 -07:00
tunnel4.c tunnels: add __rcu annotations 2010-10-27 11:37:32 -07:00
udp_impl.h
udp.c ipv4: udp: Eliminate remaining uses of rt->rt_src 2011-05-10 13:32:47 -07:00
udplite.c net: fix nulls list corruptions in sk_prot_alloc 2010-12-16 14:26:56 -08:00
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: Don't pre-seed hoplimit metric. 2010-12-12 22:08:17 -08:00
xfrm4_output.c xfrm: Assign the inner mode output function to the dst entry 2011-05-10 15:03:34 -07:00
xfrm4_policy.c ipv4: xfrm: Eliminate ->rt_src reference in policy code. 2011-05-10 13:32:48 -07:00
xfrm4_state.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-05-11 14:26:58 -04:00
xfrm4_tunnel.c