1. 10 Jun, 2018 1 commit
    • Cong Wang's avatar
      socket: close race condition between sock_close() and sockfs_setattr() · 6d8c50dc
      Cong Wang authored
      fchownat() doesn't even hold refcnt of fd until it figures out
      fd is really needed (otherwise is ignored) and releases it after
      it resolves the path. This means sock_close() could race with
      sockfs_setattr(), which leads to a NULL pointer dereference
      since typically we set sock->sk to NULL in ->release().
      
      As pointed out by Al, this is unique to sockfs. So we can fix this
      in socket layer by acquiring inode_lock in sock_close() and
      checking against NULL in sockfs_setattr().
      
      sock_release() is called in many places, only the sock_close()
      path matters here. And fortunately, this should not affect normal
      sock_close() as it is only called when the last fd refcnt is gone.
      It only affects sock_close() with a parallel sockfs_setattr() in
      progress, which is not common.
      
      Fixes: 86741ec2 ("net: core: Add a UID field to struct sock.")
      Reported-by: default avatarshankarapailoor <shankarapailoor@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d8c50dc
  2. 26 May, 2018 2 commits
  3. 04 May, 2018 1 commit
  4. 02 Apr, 2018 17 commits
  5. 27 Mar, 2018 1 commit
  6. 12 Mar, 2018 1 commit
    • Xin Long's avatar
      sock_diag: request _diag module only when the family or proto has been registered · bf2ae2e4
      Xin Long authored
      Now when using 'ss' in iproute, kernel would try to load all _diag
      modules, which also causes corresponding family and proto modules
      to be loaded as well due to module dependencies.
      
      Like after running 'ss', sctp, dccp, af_packet (if it works as a module)
      would be loaded.
      
      For example:
      
        $ lsmod|grep sctp
        $ ss
        $ lsmod|grep sctp
        sctp_diag              16384  0
        sctp                  323584  5 sctp_diag
        inet_diag              24576  4 raw_diag,tcp_diag,sctp_diag,udp_diag
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
      
      As these family and proto modules are loaded unintentionally, it
      could cause some problems, like:
      
      - Some debug tools use 'ss' to collect the socket info, which loads all
        those diag and family and protocol modules. It's noisy for identifying
        issues.
      
      - Users usually expect to drop sctp init packet silently when they
        have no sense of sctp protocol instead of sending abort back.
      
      - It wastes resources (especially with multiple netns), and SCTP module
        can't be unloaded once it's loaded.
      
      ...
      
      In short, it's really inappropriate to have these family and proto
      modules loaded unexpectedly when just doing debugging with inet_diag.
      
      This patch is to introduce sock_load_diag_module() where it loads
      the _diag module only when it's corresponding family or proto has
      been already registered.
      
      Note that we can't just load _diag module without the family or
      proto loaded, as some symbols used in _diag module are from the
      family or proto module.
      
      v1->v2:
        - move inet proto check to inet_diag to avoid a compiling err.
      v2->v3:
        - define sock_load_diag_module in sock.c and export one symbol
          only.
        - improve the changelog.
      Reported-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarPhil Sutter <phil@nwl.cc>
      Acked-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf2ae2e4
  7. 02 Mar, 2018 1 commit
  8. 26 Feb, 2018 1 commit
  9. 15 Feb, 2018 1 commit
  10. 14 Feb, 2018 1 commit
  11. 12 Feb, 2018 1 commit
    • Denys Vlasenko's avatar
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko authored
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  12. 25 Jan, 2018 7 commits
  13. 20 Jan, 2018 1 commit
  14. 10 Jan, 2018 1 commit
  15. 09 Jan, 2018 1 commit
    • Alexei Starovoitov's avatar
      bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866
      Alexei Starovoitov authored
      The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
      
      A quote from goolge project zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
      option that removes interpreter from the kernel in favor of JIT-only mode.
      So far eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of JITed program is randomized and code page is marked as read-only.
      In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      290af866
  16. 19 Dec, 2017 1 commit
    • Tonghao Zhang's avatar
      sock: Move the socket inuse to namespace. · 648845ab
      Tonghao Zhang authored
      In some case, we want to know how many sockets are in use in
      different _net_ namespaces. It's a key resource metric.
      
      This patch add a member in struct netns_core. This is a counter
      for socket-inuse in the _net_ namespace. The patch will add/sub
      counter in the sk_alloc, sk_clone_lock and __sk_free.
      
      This patch will not counter the socket created in kernel.
      It's not very useful for userspace to know how many kernel
      sockets we created.
      
      The main reasons for doing this are that:
      
      1. When linux calls the 'do_exit' for process to exit, the functions
      'exit_task_namespaces' and 'exit_task_work' will be called sequentially.
      'exit_task_namespaces' may have destroyed the _net_ namespace, but
      'sock_release' called in 'exit_task_work' may use the _net_ namespace
      if we counter the socket-inuse in sock_release.
      
      2. socket and sock are in pair. More important, sock holds the _net_
      namespace. We counter the socket-inuse in sock, for avoiding holding
      _net_ namespace again in socket. It's a easy way to maintain the code.
      Signed-off-by: default avatarMartin Zhang <zhangjunweimartin@didichuxing.com>
      Signed-off-by: default avatarTonghao Zhang <zhangtonghao@didichuxing.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      648845ab
  17. 05 Dec, 2017 1 commit