    net: ipv4: add IPPROTO_ICMP socket kind · c319b4d7
    Vasiliy Kulikov authored
    This patch adds the IPPROTO_ICMP socket kind.  It makes it possible to
    send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY
    messages without any special privileges.  In other words, the patch
    makes it possible to implement a setuid-less and CAP_NET_RAW-less
    /bin/ping.  In order not to increase the kernel's attack surface, the
    new functionality is disabled by default; Linux distributions that
    support it can enable it at bootup, optionally restricting it to a
    group or a group range (see below).
    
    Similar functionality is implemented in Mac OS X:
    http://www.manpagez.com/man/4/icmp/
    
    A new ping socket is created with
    
        socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)
    
    Message identifiers (octets 4-5 of the ICMP header) are interpreted as
    local ports. Addresses are stored in struct sockaddr_in. No port numbers
    are reserved for privileged processes; port 0 is reserved for the API
    ("let the kernel pick a free number"). There is no notion of remote
    ports; remote port numbers provided by the user (e.g. in connect()) are
    ignored.
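
    As an illustration of the socket call and of identifiers acting as
    local ports, here is a minimal userspace sketch.  The helper names
    open_ping_socket() and ping_socket_id() are hypothetical and not part
    of the patch; the sketch assumes that bind() with port 0 followed by
    getsockname() reports the id the kernel picked.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open an unprivileged ping socket.
 * Returns the descriptor, or -errno if the kernel refuses
 * (for example when ping_group_range excludes the caller). */
int open_ping_socket(void)
{
    int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP);

    return fd < 0 ? -errno : fd;
}

/* Hypothetical helper: bind with port 0 so the kernel picks a free
 * ICMP id, then read that id back as the socket's local "port".
 * Returns the id in host byte order, or -errno on failure. */
int ping_socket_id(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    assert(fd >= 0);
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = 0;    /* 0 = "let the kernel pick a free number" */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return -errno;
    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -errno;
    return ntohs(sa.sin_port);
}
```

    A caller would check open_ping_socket() for a negative value before
    using the descriptor, since the feature is disabled by default.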
    
    Data sent and received include ICMP headers. This is deliberate, in
    order to:
    1) Avoid the need to transport header values such as sequence numbers
    by other means.
    2) Make it easier to port existing programs using raw sockets.
    
    ICMP headers given to send() are checked and sanitized. The type must be
    ICMP_ECHO and the code must be zero (future extensions might relax this,
    see below). The id is set to the number (local port) of the socket, and
    the checksum is always recomputed.
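
    A sketch of how a program might lay out the buffer it hands to send().
    build_echo_request() and inet_checksum() are hypothetical names; the
    checksum follows the standard Internet checksum (RFC 1071).  Per the
    rules above the kernel overwrites the id and checksum anyway, so
    filling in the checksum here only keeps the buffer valid for code
    shared with a raw-socket ping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECHO_TYPE    8  /* ICMP_ECHO */
#define ECHO_HDR_LEN 8  /* type, code, checksum, id, sequence */

/* Standard Internet checksum (RFC 1071) over len bytes, treating the
 * buffer as big-endian 16-bit words.  Returns the value to store. */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                    /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)               /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Writes an echo request (header plus payload) into out.
 * Returns the total length, or 0 if out is too small. */
size_t build_echo_request(uint8_t *out, size_t cap, uint16_t seq,
                          const void *payload, size_t payload_len)
{
    uint16_t csum;

    assert(out != NULL);
    if (cap < ECHO_HDR_LEN || payload_len > cap - ECHO_HDR_LEN)
        return 0;

    memset(out, 0, ECHO_HDR_LEN);
    out[0] = ECHO_TYPE;             /* type must be ICMP_ECHO */
    out[1] = 0;                     /* code must be zero */
    /* out[4..5] (id) stays 0: the kernel sets it to the socket's id */
    out[6] = (uint8_t)(seq >> 8);
    out[7] = (uint8_t)(seq & 0xff);
    if (payload_len)
        memcpy(out + ECHO_HDR_LEN, payload, payload_len);

    csum = inet_checksum(out, ECHO_HDR_LEN + payload_len);
    out[2] = (uint8_t)(csum >> 8);
    out[3] = (uint8_t)(csum & 0xff);
    return ECHO_HDR_LEN + payload_len;
}
```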
    
    ICMP reply packets received from the network are demultiplexed according
    to their ids, and are returned by recv() without any modifications.
    IP header information and ICMP errors of those packets may be obtained
    via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
    quenches and redirects are reported as fake errors via the error queue
    (IP_RECVERR); the next hop address for redirects is saved to ee_info (in
    network byte order).
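
    Since recv() hands back the reply unmodified, a program still has to
    read the id and sequence number out of the header itself.  A minimal
    sketch, assuming the received buffer starts at the ICMP header;
    struct echo_reply and parse_echo_reply() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICMP_TYPE_ECHOREPLY 0   /* ICMP_ECHOREPLY */
#define ICMP_HDR_LEN        8

/* Fields of interest from one echo reply, in host byte order. */
struct echo_reply {
    uint16_t id;            /* octets 4-5: the socket's local "port" */
    uint16_t seq;           /* octets 6-7: sequence number */
    const uint8_t *data;    /* payload following the header */
    size_t data_len;
};

/* Returns 0 and fills *out if buf holds an echo reply, -1 otherwise. */
int parse_echo_reply(const uint8_t *buf, size_t len, struct echo_reply *out)
{
    assert(buf != NULL && out != NULL);
    if (len < ICMP_HDR_LEN)
        return -1;
    if (buf[0] != ICMP_TYPE_ECHOREPLY || buf[1] != 0)
        return -1;

    out->id = (uint16_t)((buf[4] << 8) | buf[5]);
    out->seq = (uint16_t)((buf[6] << 8) | buf[7]);
    out->data = buf + ICMP_HDR_LEN;
    out->data_len = len - ICMP_HDR_LEN;
    return 0;
}
```

    Matching the parsed sequence number against the one that was sent is
    left to the caller; the id needs no check, since the kernel only
    delivers replies whose id belongs to this socket.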
    
    socket(2) is restricted to the group range specified in
    "/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
    that nobody (not even root) may create ping sockets.  Setting it to
    "100 100" would grant permission to a single group (either to make
    /sbin/ping g+s and owned by this group, or to grant permission to a
    "netadmins" group); "0 4294967295" would enable it for the world;
    "100 4294967295" would enable it for users, but not daemons.
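
    The range semantics can be sketched in userspace as follows.  This
    only illustrates how the two numbers are read; parse_group_range()
    and gid_in_range() are hypothetical helpers, not kernel code, and the
    sketch tests a single gid without modelling which of a process's
    group ids the kernel consults.

```c
#include <assert.h>
#include <stdio.h>

/* Parses "LOW HIGH" as found in ping_group_range.
 * Returns 0 on success, -1 if the text is malformed. */
int parse_group_range(const char *text, unsigned long *lo, unsigned long *hi)
{
    assert(text != NULL && lo != NULL && hi != NULL);
    return sscanf(text, "%lu %lu", lo, hi) == 2 ? 0 : -1;
}

/* Returns 1 if gid lies in the inclusive range, 0 if not,
 * -1 if the range text is malformed.  A range with LOW > HIGH,
 * such as the default "1 0", is empty and matches nobody. */
int gid_in_range(const char *text, unsigned long gid)
{
    unsigned long lo, hi;

    if (parse_group_range(text, &lo, &hi) < 0)
        return -1;
    return lo <= gid && gid <= hi;
}
```

    With the default "1 0", gid_in_range() is false for every gid,
    including 0, which matches the "not even root" behaviour described
    above.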
    
    The existing code might be (in the unlikely case anyone needs it)
    extended rather easily to handle other similar pairs of ICMP messages
    (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
    etc.).
    
    Userspace ping util & patch for it:
    http://openwall.info/wiki/people/segoon/ping
    
    For Openwall GNU/*/Linux it was the last step on the road to the
    setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
    is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
    http://mirrors.kernel.org/openwall/Owl/current/iso/

    Initially this functionality was written by Pavel Kankovsky for
    Linux 2.4.32, but unfortunately it was never made public.
    
    All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) were tested with
    the patch.
    
    PATCH v3:
        - switched to flowi4.
        - minor changes to be consistent with raw sockets code.
    
    PATCH v2:
        - changed ping_debug() to pr_debug().
        - removed CONFIG_IP_PING.
        - removed ping_seq_fops.owner field (unused for procfs).
        - switched to proc_net_fops_create().
        - switched to %pK in seq_printf().
    
    PATCH v1:
        - fixed checksumming bug.
        - CAP_NET_RAW may not create icmp sockets anymore.
    
    RFC v2:
        - minor cleanups.
        - introduced sysctl'able group range to restrict socket(2).
    
    Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>