1. 30 Nov, 2017 5 commits
    • Johan Hovold
      NFC: fix device-allocation error return · eb2499b3
      Johan Hovold authored
      commit c45e3e4c5b134b081e8af362109905427967eb19 upstream.
      
      A recent change fixing NFC device allocation itself introduced an
      error-handling bug by returning an error pointer in case device-id
      allocation failed. This is clearly broken, as the callers still expect
      NULL to be returned on errors; the problem was detected by Dan's
      static checker.
      
      Fix this up by returning NULL in the event that we've run out of memory
      when allocating a new device id.
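
      To see why an error pointer slips past callers that only test for
      NULL, here is a minimal userspace sketch. It re-creates simplified
      versions of the kernel's ERR_PTR()/IS_ERR() helpers (the real ones live
      in include/linux/err.h); struct fake_dev, fake_ida_alloc() and the two
      alloc_dev_*() variants are hypothetical stand-ins, not the NFC code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for include/linux/err.h. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)error; }
static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_dev {
	int idx;
};

/* Hypothetical id allocator: returns a negative errno on failure. */
static int fake_ida_alloc(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Buggy variant: hands an ERR_PTR() to callers that only check for NULL. */
static struct fake_dev *alloc_dev_buggy(bool fail_id)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));
	int idx;

	if (!dev)
		return NULL;
	idx = fake_ida_alloc(fail_id);
	if (idx < 0) {
		free(dev);
		return ERR_PTR(idx);	/* wrong: callers test "if (!dev)" */
	}
	dev->idx = idx;
	return dev;
}

/* Fixed variant: every failure path reports NULL, as callers expect. */
static struct fake_dev *alloc_dev_fixed(bool fail_id)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));
	int idx;

	if (!dev)
		return NULL;
	idx = fake_ida_alloc(fail_id);
	if (idx < 0) {
		free(dev);
		return NULL;
	}
	dev->idx = idx;
	return dev;
}
```

      With a forced id-allocation failure, the buggy variant returns a
      non-NULL pointer, so a caller's "if (!dev)" check passes and the caller
      goes on to use an invalid address.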
      
      Note that the offending commit is marked for stable (3.8), so this fix
      needs to be backported along with it.
      
      Fixes: 20777bc5 ("NFC: fix broken device allocation")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Chuck Lever
      svcrdma: Preserve CB send buffer across retransmits · a7a05def
      Chuck Lever authored
      commit 0bad47cada5defba13e98827d22d06f13258dfb3 upstream.
      
      During each NFSv4 callback Call, an RDMA Send completion frees the
      page that contains the RPC Call message. If the upper layer
      determines that a retransmit is necessary, this is too soon.
      
      One possible symptom: after a GARBAGE_ARGS response to an NFSv4.1
      callback request, the following BUG fires on the NFS server:
      
      kernel: BUG: Bad page state in process kworker/0:2H  pfn:7d3ce2
      kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping:          (null) index:0x0
      kernel: flags: 0x2fffff80000000()
      kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
      kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
      kernel: page dumped because: nonzero _refcount
      kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
      ocfs2_nodemanager ocfs2_stackglue rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
      rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
      kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
      iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
      mei_me mfd_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
      acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
      libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
      syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
      mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
      kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
      kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
      kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      kernel: Call Trace:
      kernel: dump_stack+0x62/0x80
      kernel: bad_page+0xfe/0x11a
      kernel: free_pages_check_bad+0x76/0x78
      kernel: free_pcppages_bulk+0x364/0x441
      kernel: ? ttwu_do_activate.isra.61+0x71/0x78
      kernel: free_hot_cold_page+0x1c5/0x202
      kernel: __put_page+0x2c/0x36
      kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
      kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]
      
      This issue exists all the way back to v4.5, but refactoring and code
      re-organization prevent this simple patch from applying to kernels
      older than v4.12. The fix is the same, however, if someone needs to
      backport it.

      Reported-by: Ben Coddington <bcodding@redhat.com>
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
      Fixes: 5d252f90 ('svcrdma: Add class for RDMA backwards ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tuomas Tynkkynen
      net/9p: Switch to wait_event_killable() · b5c87f23
      Tuomas Tynkkynen authored
      commit 9523feac272ccad2ad8186ba4fcc89103754de52 upstream.
      
      This switch is needed because userspace gets Very Unhappy when calls
      like stat() and execve() return -EINTR on 9p filesystem mounts. For
      instance, when bash is looking in PATH for things to execute and a
      SIGCHLD interrupts stat(), bash can throw a spurious 'command not found'
      since it doesn't retry the stat().
      
      In practice, hitting the problem is rare and needs a really
      slow/bogged down 9p server.
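
      For comparison, the userspace-side defence the text alludes to is
      retrying an interrupted call on EINTR, which bash does not do for
      stat(). A minimal POSIX sketch; stat_retry() is a hypothetical helper.
      With wait_event_killable() in place, a 9p mount should no longer need
      it, since only a fatal signal can then interrupt the wait.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: call stat() again whenever a signal handler
 * (for example a SIGCHLD handler) interrupted it with EINTR.
 * Any other failure (e.g. ENOENT) is returned to the caller as-is.
 */
static int stat_retry(const char *path, struct stat *st)
{
	int ret;

	do {
		ret = stat(path, st);
	} while (ret == -1 && errno == EINTR);
	return ret;
}
```
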
      Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tuomas Tynkkynen
      9p: Fix missing commas in mount options · d8319b3b
      Tuomas Tynkkynen authored
      commit 61b272c3aa170b3e461b8df636407b29f35f98eb upstream.
      
      Since commit c4fac910 ("9p: Implement show_options"), the mount
      options of 9p filesystems are printed out with some missing commas
      between the individual options:
      
      p9-scratch on /mnt/scratch type 9p (rw,dirsync,loose,access=clienttrans=virtio)
      
      Add them back.
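
      Show-options callbacks typically append each option together with its
      own leading comma, so dropping that comma glues two options together.
      A userspace sketch of the convention; struct opts, append_opt() and
      show_opts() are hypothetical and only mirror the output shown above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the 9p options from the example above. */
struct opts {
	const char *cache;	/* e.g. "loose", or NULL if unset */
	const char *access;	/* e.g. "client", or NULL if unset */
	const char *trans;	/* e.g. "virtio", or NULL if unset */
};

/*
 * Append one option to buf, always with its own leading comma.
 * Leaving that comma out is what produced "access=clienttrans=virtio".
 */
static void append_opt(char *buf, size_t len, const char *key,
		       const char *val)
{
	size_t used = strlen(buf);

	if (!val || used + 1 >= len)
		return;		/* option unset, or no room left */
	if (key)
		snprintf(buf + used, len - used, ",%s=%s", key, val);
	else
		snprintf(buf + used, len - used, ",%s", val);
}

/* base holds the flags printed before the 9p-specific options. */
static void show_opts(char *buf, size_t len, const char *base,
		      const struct opts *o)
{
	snprintf(buf, len, "%s", base);
	append_opt(buf, len, NULL, o->cache);
	append_opt(buf, len, "access", o->access);
	append_opt(buf, len, "trans", o->trans);
}
```

      Called with base "rw,dirsync" and the three values from the example,
      this yields "rw,dirsync,loose,access=client,trans=virtio".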
      
      Fixes: c4fac910 ("9p: Implement show_options")
      Signed-off-by: Tuomas Tynkkynen <tuomas@tuxera.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Biggers
      libceph: don't WARN() if user tries to add invalid key · bcae2363
      Eric Biggers authored
      commit b11270853fa3654f08d4a6a03b23ddb220512d8d upstream.
      
      The WARN_ON(!key->len) in set_secret() in net/ceph/crypto.c is hit if a
      user tries to add a key of type "ceph" with an invalid payload as
      follows (assuming CONFIG_CEPH_LIB=y):
      
          echo -e -n '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' \
              | keyctl padd ceph desc @s
      
      This can be hit by fuzzers.  As this is merely bad input and not a
      kernel bug, replace the WARN_ON() with return -EINVAL.
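
      The fix treats an empty key as ordinary invalid input. A userspace
      sketch of that validation pattern; struct fake_crypto_key and
      fake_set_secret() are hypothetical simplifications, not the libceph
      definitions in net/ceph/crypto.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified key object. */
struct fake_crypto_key {
	size_t len;
	unsigned char *key;
};

/*
 * Reject an empty secret with -EINVAL: it is bad user input (easily
 * produced by a fuzzer), not a broken kernel invariant worth a WARN_ON().
 */
static int fake_set_secret(struct fake_crypto_key *k, const void *buf)
{
	if (!k->len)
		return -EINVAL;

	k->key = malloc(k->len);
	if (!k->key)
		return -ENOMEM;
	memcpy(k->key, buf, k->len);
	return 0;
}
```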
      
      Fixes: 7af3ea18 ("libceph: stop allocating a new cipher on every crypto request")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 24 Nov, 2017 2 commits
    • Eric W. Biederman
      net/sctp: Always set scope_id in sctp_inet6_skb_msgname · 8d028694
      Eric W. Biederman authored
      
      [ Upstream commit 7c8a61d9ee1df0fb4747879fa67a99614eb62fec ]
      
      Alexander Potapenko, while testing the kernel with KMSAN and syzkaller,
      discovered that in some configurations sctp would leak 4 bytes of
      kernel stack.

      Working with his reproducer, I discovered that the 4 leaked bytes are
      the scope id of an IPv6 address returned by recvmsg.

      With a little code inspection and a shrewd guess, I discovered that
      sctp_inet6_skb_msgname only initializes the scope_id field for
      link-local IPv6 addresses, setting it to the index of the interface
      the link-local address pertains to, instead of initializing the
      scope_id field for all IPv6 addresses.

      That is almost reasonable, as scope_ids are meaningful only for
      link-local addresses. Set the scope_id in all other cases to 0, which
      is not a valid interface index, to make it clear there is nothing
      useful in the scope_id field.
      
      There should be no danger of breaking userspace as the stack leak
      guaranteed that previously meaningless random data was being returned.
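
      The resulting rule can be sketched with the standard struct
      sockaddr_in6 and the IN6_IS_ADDR_LINKLOCAL() macro from
      <netinet/in.h>. fill_scope_id() is a hypothetical helper rather than
      the body of sctp_inet6_skb_msgname(), and ifindex stands for the
      interface the packet arrived on.

```c
#include <netinet/in.h>
#include <stdint.h>

/*
 * Always write sin6_scope_id: the receiving interface index for
 * link-local addresses, 0 (never a valid ifindex) for everything else,
 * so stale stack bytes can no longer reach userspace.
 */
static void fill_scope_id(struct sockaddr_in6 *sin6, uint32_t ifindex)
{
	if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		sin6->sin6_scope_id = ifindex;
	else
		sin6->sin6_scope_id = 0;
}
```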
      
      Fixes: 372f525b495c ("SCTP:  Resync with LKSCTP tree.")
      History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Reported-by: Alexander Potapenko <glider@google.com>
      Tested-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason A. Donenfeld
      af_netlink: ensure that NLMSG_DONE never fails in dumps · 5856c858
      Jason A. Donenfeld authored
      
      [ Upstream commit 0642840b8bb008528dbdf929cec9f65ac4231ad0 ]
      
      The way people generally use netlink_dump is that they fill in the skb
      as much as possible, breaking when nla_put returns an error. Then, they
      get called again and start filling out the next skb, and again, and so
      forth. The mechanism at work here is the ability for the iterative
      dumping function to detect when the skb is filled up and not fill it
      past the brim, waiting for a fresh skb for the rest of the data.
      
      However, if the attributes are small and nicely packed, it is possible
      that a dump callback function successfully fills in attributes until the
      skb is of size 4080 (libmnl's default page-sized receive buffer size).
      The dump function completes, satisfied, and then, if it happens to be
      that this is actually the last skb, and no further ones are to be sent,
      then netlink_dump will add on the NLMSG_DONE part:
      
        nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
      
      It is very important that netlink_dump does this, of course. However, in
      this example, that call to nlmsg_put_answer will fail, because the
      previous filling by the dump function did not leave it enough room. And
      how could it possibly have done so? All of the nla_put variety of
      functions simply check to see if the skb has enough tailroom,
      independent of the context it is in.
      
      In order to keep the important assumptions of all netlink dump users, it
      is therefore important to give them an skb that has this end part of the
      tail already reserved, so that the call to nlmsg_put_answer does not
      fail. Otherwise, library authors are forced to find some bizarre sized
      receive buffer that has a large modulo relative to the common sizes of
      messages received, which is ugly and buggy.
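
      The amount of room in question is small: NLMSG_DONE is a 16-byte
      struct nlmsghdr plus an int payload, aligned to 4 bytes, i.e. 20 bytes
      (what the kernel computes with nlmsg_total_size()). A userspace sketch
      of the resulting check using the NLMSG_SPACE() macro from
      <linux/netlink.h>; done_fits() is hypothetical and only models the
      decision to defer NLMSG_DONE to an additional message.

```c
#include <linux/netlink.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes NLMSG_DONE needs: aligned header plus its int payload (20). */
static size_t done_msg_space(void)
{
	return NLMSG_SPACE(sizeof(int));
}

/*
 * Hypothetical check: can NLMSG_DONE still be appended to a buffer with
 * this much tailroom? If not, it has to go out in an extra message.
 */
static bool done_fits(size_t tailroom)
{
	return tailroom >= done_msg_space();
}
```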
      
      This patch thus saves the NLMSG_DONE for an additional message in case
      things are dangerously close to the brim. This requires keeping track
      of the errno from ->dump() across calls.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 11 Nov, 2017 3 commits
  4. 10 Nov, 2017 3 commits
    • Yuchung Cheng
      tcp: fix tcp_fastretrans_alert warning · 0eb96bf7
      Yuchung Cheng authored
      This patch fixes the cause of a WARNING indicating that TCP has pending
      retransmissions in Open state in tcp_fastretrans_alert().

      The root cause is a bad interaction between path MTU probing,
      if enabled, and the RACK loss detection. Upon receiving a SACK
      above the sequence of the MTU probing packet, RACK could mark the
      probe packet lost in tcp_fastretrans_alert(), prior to calling
      tcp_simple_retransmit().

      tcp_simple_retransmit() only enters Loss state if it newly marks
      the probe packet lost. If the probe packet is already identified as
      lost by RACK, the sender remains in Open state with some packets
      marked lost and retransmitted. Then the next SACK would trigger
      the warning. The likely scenario is that the probe packet was
      lost due to its size or network congestion. The actual impact of
      this warning is small: the sender may enter fast recovery one ACK
      later.

      The simple fix is to always enter the recovery (Loss) state if some
      packet is marked lost during path MTU probing.
      
      Fixes: a0370b3f ("tcp: enable RACK loss detection to trigger recovery")
      Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Reported-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      tcp: gso: avoid refcount_t warning from tcp_gso_segment() · 7ec318fe
      Eric Dumazet authored
      When a GSO skb of truesize O is segmented into 2 new skbs of truesize N1
      and N2, we want to transfer socket ownership to the new fresh skbs.
      
      In order to avoid expensive atomic operations on a cache line subject to
      cache bouncing, we replace the sequence:
      
      refcount_add(N1, &sk->sk_wmem_alloc);
      refcount_add(N2, &sk->sk_wmem_alloc); // repeated by number of segments
      
      refcount_sub(O, &sk->sk_wmem_alloc);
      
      by a single
      
      refcount_add(sum_of(N) - O, &sk->sk_wmem_alloc);
      
      The problem is:

      In some pathological cases, sum(N) - O might be a negative number, and
      the syzkaller bot was apparently able to trigger this trace [1].

      atomic_t was OK with this construct, but with refcount_t we need to
      take care of the negative delta.
      
      [1]
      refcount_t: saturated; leaking memory.
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 8404 at lib/refcount.c:77 refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 8404 Comm: syz-executor2 Not tainted 4.14.0-rc5-mm1+ #20
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x41c kernel/panic.c:183
       __warn+0x1c4/0x1e0 kernel/panic.c:546
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
       do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
       invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:refcount_add_not_zero+0x198/0x200 lib/refcount.c:77
      RSP: 0018:ffff8801c606e3a0 EFLAGS: 00010282
      RAX: 0000000000000026 RBX: 0000000000001401 RCX: 0000000000000000
      RDX: 0000000000000026 RSI: ffffc900036fc000 RDI: ffffed0038c0dc68
      RBP: ffff8801c606e430 R08: 0000000000000001 R09: 0000000000000000
      R10: ffff8801d97f5eba R11: 0000000000000000 R12: ffff8801d5acf73c
      R13: 1ffff10038c0dc75 R14: 00000000ffffffff R15: 00000000fffff72f
       refcount_add+0x1b/0x60 lib/refcount.c:101
       tcp_gso_segment+0x10d0/0x16b0 net/ipv4/tcp_offload.c:155
       tcp4_gso_segment+0xd4/0x310 net/ipv4/tcp_offload.c:51
       inet_gso_segment+0x60c/0x11c0 net/ipv4/af_inet.c:1271
       skb_mac_gso_segment+0x33f/0x660 net/core/dev.c:2749
       __skb_gso_segment+0x35f/0x7f0 net/core/dev.c:2821
       skb_gso_segment include/linux/netdevice.h:3971 [inline]
       validate_xmit_skb+0x4ba/0xb20 net/core/dev.c:3074
       __dev_queue_xmit+0xe49/0x2070 net/core/dev.c:3497
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
       neigh_hh_output include/net/neighbour.h:471 [inline]
       neigh_output include/net/neighbour.h:479 [inline]
       ip_finish_output2+0xece/0x1460 net/ipv4/ip_output.c:229
       ip_finish_output+0x85e/0xd10 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:238 [inline]
       ip_output+0x1cc/0x860 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:459 [inline]
       ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
       ip_queue_xmit+0x8c6/0x18e0 net/ipv4/ip_output.c:504
       tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1137
       tcp_write_xmit+0x663/0x4de0 net/ipv4/tcp_output.c:2341
       __tcp_push_pending_frames+0xa0/0x250 net/ipv4/tcp_output.c:2513
       tcp_push_pending_frames include/net/tcp.h:1722 [inline]
       tcp_data_snd_check net/ipv4/tcp_input.c:5050 [inline]
       tcp_rcv_established+0x8c7/0x18a0 net/ipv4/tcp_input.c:5497
       tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1460
       sk_backlog_rcv include/net/sock.h:909 [inline]
       __release_sock+0x124/0x360 net/core/sock.c:2264
       release_sock+0xa4/0x2a0 net/core/sock.c:2776
       tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
       sock_sendmsg_nosec net/socket.c:632 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:642
       ___sys_sendmsg+0x31c/0x890 net/socket.c:2048
       __sys_sendmmsg+0x1e6/0x5f0 net/socket.c:2138
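
      To make the negative delta concrete, here is a userspace sketch.
      struct fake_refcount and its helpers are hypothetical stand-ins that
      model only the relevant refcount_t behavior: an add that takes an
      unsigned amount saturates at UINT_MAX on overflow, so a negative delta
      converted to unsigned pins the counter there. The fixed variant
      branches on the sign of sum_of(N) - O instead.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for refcount_t with saturating addition. */
struct fake_refcount {
	unsigned int val;
};

/* Adds an unsigned amount; on overflow, saturates at UINT_MAX (leaks). */
static void fake_refcount_add(unsigned int i, struct fake_refcount *r)
{
	if (r->val > UINT_MAX - i)
		r->val = UINT_MAX;	/* "refcount_t: saturated; leaking memory." */
	else
		r->val += i;
}

/* Subtracts i and reports whether the count hit zero (underflow not modelled). */
static bool fake_refcount_sub_and_test(unsigned int i, struct fake_refcount *r)
{
	r->val -= i;
	return r->val == 0;
}

/* Buggy: a negative delta becomes a huge unsigned value. */
static void charge_buggy(struct fake_refcount *wmem, int delta)
{
	fake_refcount_add((unsigned int)delta, wmem);
}

/* Fixed: add a non-negative delta, subtract the magnitude of a negative one. */
static void charge_fixed(struct fake_refcount *wmem, int delta)
{
	if (delta >= 0)
		fake_refcount_add((unsigned int)delta, wmem);
	else	/* 0u - (unsigned)delta is |delta| without signed overflow */
		(void)fake_refcount_sub_and_test(0u - (unsigned int)delta, wmem);
}
```

      For example, starting from 10000, a delta of -2000 saturates the
      counter in the buggy variant but leaves 8000 in the fixed one.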
      
      Fixes: 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Håkon Bugge
      rds: ib: Fix NULL pointer dereference in debug code · 1cb483a5
      Håkon Bugge authored
      rds_ib_recv_refill() is a function that refills an IB receive
      queue. It can be called from both the CQE handler (tasklet) and a
      worker thread.
      
      Just after the call to ib_post_recv(), a debug message is printed with
      rdsdebug():
      
                  ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
                  rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
                           recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
                           (long) ib_sg_dma_address(
                                  ic->i_cm_id->device,
                                  &recv->r_frag->f_sg),
                          ret);
      
      Now consider an invocation of rds_ib_recv_refill() from the worker
      thread, which is preemptible. Further, assume that the worker thread
      is preempted between the ib_post_recv() and rdsdebug() statements.
      
      Then, if the preemption is due to a receive CQE event, the
      rds_ib_recv_cqe_handler() will be invoked. This function processes
      receive completions, including freeing up data structures, such as the
      recv->r_frag.
      
      In this scenario, rds_ib_recv_cqe_handler() will process the receive
      WR posted above. This implies that recv->r_frag has been freed
      before the above rdsdebug() statement has been executed. When it is
      later executed, we will have a NULL pointer dereference:
      
      [ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
      [ 4088.082686] PGD 0 P4D 0
      [ 4088.085515] Oops: 0000 [#1] SMP
      [ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
      [ 4088.168486]  fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
      [ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G           OE   4.14.0-rc7.master.20171105.ol7.x86_64 #1
      [ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
      [ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
      [ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
      [ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
      [ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
      [ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
      [ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
      [ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
      [ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
      [ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
      [ 4088.280397] FS:  0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
      [ 4088.289434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
      [ 4088.303816] Call Trace:
      [ 4088.306557]  rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
      [ 4088.312982]  ? __dynamic_pr_debug+0x8c/0xb0
      [ 4088.317664]  ? __queue_work+0x142/0x3c0
      [ 4088.321944]  rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
      [ 4088.328370]  cma_ib_handler+0xcd/0x280 [rdma_cm]
      [ 4088.333522]  cm_process_work+0x25/0x120 [ib_cm]
      [ 4088.338580]  cm_work_handler+0xd6b/0x17aa [ib_cm]
      [ 4088.343832]  process_one_work+0x149/0x360
      [ 4088.348307]  worker_thread+0x4d/0x3e0
      [ 4088.352397]  kthread+0x109/0x140
      [ 4088.355996]  ? rescuer_thread+0x380/0x380
      [ 4088.360467]  ? kthread_park+0x60/0x60
      [ 4088.364563]  ret_from_fork+0x25/0x30
      [ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 <4c> 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
      [ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
      [ 4088.397772] CR2: 0000000000000020
      [ 4088.401505] ---[ end trace fe922e6ccf004431 ]---
      
      This bug was provoked by compiling rds out-of-tree with
      EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
      between the ib_post_recv() and rdsdebug() statements:
      
         	       /* XXX when can this fail? */
      	       ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
      +		if (can_wait)
      +			usleep_range(1000, 5000);
      	       rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
      			recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
      			(long) ib_sg_dma_address(
      
      The fix is simply to move the rdsdebug() statement up before the
      ib_post_recv() and remove the printing of ret, which is taken care of
      anyway by the non-debug code.
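
      The general rule behind the fix: once a buffer has been handed to a
      consumer that may free it, the producer must not touch it again, so
      any debug output has to be generated before the hand-off. A
      deliberately sequential userspace sketch; struct fake_recv,
      post_and_complete() and refill_one() are hypothetical, and the
      "completion" runs synchronously to model the worst-case preemption
      described above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical receive descriptor owning a fragment. */
struct fake_recv {
	int *frag;
};

/*
 * Models posting the buffer and the completion handler running at once:
 * the fragment is consumed and freed, and the pointer is cleared as a
 * stand-in for the state the late debug print tripped over.
 */
static int post_and_complete(struct fake_recv *recv)
{
	free(recv->frag);
	recv->frag = NULL;
	return 0;
}

/* Fixed ordering: format everything the debug message needs, then post. */
static int refill_one(struct fake_recv *recv, char *log, size_t loglen)
{
	snprintf(log, loglen, "recv %p frag %d", (void *)recv, *recv->frag);
	return post_and_complete(recv);
}
```
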
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: Knut Omang <knut.omang@oracle.com>
      Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 09 Nov, 2017 13 commits
  6. 08 Nov, 2017 1 commit
  7. 05 Nov, 2017 2 commits
    • Priyaranjan Jha
      tcp: fix DSACK-based undo on non-duplicate ACK · d09b9e60
      Priyaranjan Jha authored
      Fix DSACK-based undo when the sender is in Open state and
      an ACK advances snd_una.

      Example scenario:
      - The sender goes into recovery and makes some spurious retransmissions.
      - It comes out of recovery and enters Open state.
      - It sends some more packets, let's say 4.
      - The receiver sends an ACK for the first two, but this ACK is lost.
      - The sender receives an ACK for the first two, and a DSACK for the
        previous spurious retransmission.

      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yousuk Seung <ysseung@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Guillaume Nault
      l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6 · 8f7dc9ae
      Guillaume Nault authored
      Using l2tp_tunnel_find() in l2tp_ip_recv() is wrong for two reasons:
      
        * It doesn't take a reference on the returned tunnel, which makes the
          call racy wrt. concurrent tunnel deletion.
      
        * The lookup is only based on the tunnel identifier, so it can return
          a tunnel that doesn't match the packet's addresses or protocol.
      
      For example, a packet sent to an L2TPv3 over IPv6 tunnel can be
      delivered to an L2TPv2 over UDPv4 tunnel. This is worse than a simple
      cross-talk: when delivering the packet to an L2TP over UDP tunnel, the
      corresponding socket is UDP, where ->sk_backlog_rcv() is NULL. Calling
      sk_receive_skb() will then crash the kernel by trying to execute this
      callback.
      
      And l2tp_tunnel_find() isn't even needed here. __l2tp_ip_bind_lookup()
      properly checks the socket binding and connection settings. It was used
      as a fallback mechanism for finding tunnels that didn't have their data
      path registered yet. But it's not limited to this case and can be used
      to replace l2tp_tunnel_find() in the general case.
      
      Fix l2tp_ip6 in the same way.
      
      Fixes: 0d76751f ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
      Fixes: a32e0eec ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 04 Nov, 2017 1 commit
  9. 03 Nov, 2017 6 commits
    • Steffen Klassert
      xfrm: Fix stack-out-of-bounds read in xfrm_state_find. · c9f3f813
      Steffen Klassert authored
      When we do tunnel or beet mode, we pass saddr and daddr from the
      template to xfrm_state_find(); this is OK. In transport mode,
      we pass the addresses from the flowi, assuming that the IP
      addresses (and address family) don't change during transformation.
      This assumption is wrong in the IPv4-mapped IPv6 case, where the
      packet is IPv4 and the template is IPv6. Fix this by using the
      addresses from the template unconditionally.

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • Florian Westphal
      xfrm: do unconditional template resolution before pcpu cache check · cf379667
      Florian Westphal authored
      Stephen Smalley says:
       Since 4.14-rc1, the selinux-testsuite has been encountering sporadic
       failures during testing of labeled IPSEC. git bisect pointed to
       commit ec30d ("xfrm: add xdst pcpu cache").
       The xdst pcpu cache is only checking that the policies are the same,
       but does not validate that the policy, state, and flow match with respect
       to security context labeling.
       As a result, the wrong SA could be used and the receiver could end up
       performing permission checking and providing SO_PEERSEC or SCM_SECURITY
       values for the wrong security context.
      
      This fix makes it so that we always do the template resolution, and
      then check that the found states match those in the pcpu bundle.

      This has the disadvantage of doing a bit more work (a lookup in the
      state hash table) if we can reuse the xdst entry (we only avoid xdst
      alloc/free), but we don't add a lot of extra work in case we can't
      reuse it.

      The xfrm_pol_dead() check is removed; the reasoning is that
      xfrm_tmpl_resolve() does all the needed checks.
      
      Cc: Paul Moore <paul@paul-moore.com>
      Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
      Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
      Tested-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • Eric Dumazet
      tcp: do not mangle skb->cb[] in tcp_make_synack() · 3b117750
      Eric Dumazet authored
      Christoph Paasch sent a patch to address the following issue:

      tcp_make_synack() is leaving some TCP private info in skb->cb[],
      then sends the packet by means other than tcp_transmit_skb().

      tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
      the IPv4/IPv6 stacks, but we have no such cleanup for SYNACK.

      tcp_make_synack() should not use tcp_init_nondata_skb():

      tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
      queues (the ones that are only sent via tcp_transmit_skb()).

      This patch fixes the issue and should even save a few CPU cycles ;)
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Florian Westphal
      fib: fib_dump_info can no longer use __in_dev_get_rtnl · 25dd169a
      Florian Westphal authored
      syzbot reported yet another regression added with DOIT_UNLOCKED.
      When nexthop is marked as dead, fib_dump_info uses __in_dev_get_rtnl():
      
      ./include/linux/inetdevice.h:230 suspicious rcu_dereference_protected() usage!
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by syz-executor2/23859:
       #0:  (rcu_read_lock){....}, at: [<ffffffff840283f0>]
      inet_rtm_getroute+0xaa0/0x2d70 net/ipv4/route.c:2738
      [..]
        lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4665
        __in_dev_get_rtnl include/linux/inetdevice.h:230 [inline]
        fib_dump_info+0x1136/0x13d0 net/ipv4/fib_semantics.c:1377
        inet_rtm_getroute+0xf97/0x2d70 net/ipv4/route.c:2785
      ..
      
      This isn't safe anymore: callers hold either the RTNL mutex or the RCU
      read lock, so these spots must use rcu_dereference_rtnl() or plain
      rcu_dereference() (plus an unconditional RCU read lock).
      
      This does the latter.
      
      Fixes: 394f51ab ("ipv4: route: set ipv4 RTM_GETROUTE to not use rtnl")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25dd169a
    • net_sched: hold netns refcnt for each action · ceffcc5e
      Cong Wang authored
      TC actions have been destroyed asynchronously for a long time,
      previously in an RCU callback and now in a workqueue. If we
      don't hold a refcnt on the action's netns, we could use the per-netns
      data structure, struct tcf_idrinfo, after it has been freed by the
      netns workqueue.

      Hold a refcnt to ensure that netns destruction happens only after all
      actions are gone.
      
      Fixes: ddf97ccd ("net_sched: add network namespace support for tc actions")
      Reported-by: Lucas Bates <lucasb@mojatatu.com>
      Tested-by: Lucas Bates <lucasb@mojatatu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ceffcc5e
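      The lifetime rule in this fix can be illustrated with a small user-space
      sketch. It assumes simplified, hypothetical types (struct netns, struct
      action) and helpers (netns_get()/netns_put()). These only mimic the
      refcounting pattern of the kernel's get_net()/put_net() and are not the
      code from this commit. Each action pins its namespace, so dropping the
      namespace's own reference cannot free the per-netns data while a deferred
      action destruction is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct net and the per-netns action table
 * (struct tcf_idrinfo); these are NOT the kernel's types. */
struct netns {
	int refcnt;        /* 1 for the namespace itself + 1 per live action */
	int nr_actions;    /* models the per-netns data that must stay valid */
	bool *freed_flag;  /* lets the caller observe when the netns is freed */
};

struct action {
	struct netns *net; /* the action holds a reference on this netns */
};

static struct netns *netns_create(bool *freed_flag)
{
	struct netns *net = calloc(1, sizeof(*net));

	assert(net != NULL);
	net->refcnt = 1;
	net->freed_flag = freed_flag;
	*freed_flag = false;
	return net;
}

static void netns_get(struct netns *net)
{
	net->refcnt++;
}

static void netns_put(struct netns *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt == 0) {
		*net->freed_flag = true;
		free(net);
	}
}

static struct action *action_create(struct netns *net)
{
	struct action *a = malloc(sizeof(*a));

	assert(a != NULL);
	netns_get(net);     /* pin the netns for the action's whole lifetime */
	a->net = net;
	net->nr_actions++;
	return a;
}

/* Runs "later" (think RCU callback or workqueue), possibly after the
 * namespace itself has started tearing down. */
static void action_destroy_deferred(struct action *a)
{
	a->net->nr_actions--; /* safe: our reference keeps a->net alive */
	netns_put(a->net);    /* frees the netns if we were the last user */
	free(a);
}
```

      Without the netns_get() call in action_create(), the namespace's own
      netns_put() would free it immediately. The deferred destruction would
      then touch freed memory.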
    • net_sched: acquire RTNL in tc_action_net_exit() · a159d3c4
      Cong Wang authored
      I forgot to acquire RTNL in tc_action_net_exit(), which means that
      the action ops->cleanup() is not always called with RTNL held. This
      is usually not a big deal, because this function is called after all
      netns refcnts are gone, but since RTNL protects more than just actions,
      add it for safety and consistency.
      
      Also add an assertion to catch other potential bugs.
      
      Fixes: ddf97ccd ("net_sched: add network namespace support for tc actions")
      Reported-by: Lucas Bates <lucasb@mojatatu.com>
      Tested-by: Lucas Bates <lucasb@mojatatu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a159d3c4
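      A minimal sketch of the "take the lock, then assert it in the callback"
      pattern. It assumes a hypothetical single-threaded stand-in for RTNL (a
      plain bool rather than a mutex) and a hypothetical ASSERT_BIG_LOCK() macro
      in the spirit of the kernel's ASSERT_RTNL(). It illustrates the idea only;
      it is not the change made by this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL; a real lock would be a mutex. Only the
 * "is it held?" bookkeeping matters for this sketch. */
static bool big_lock_held;

static void big_lock(void)   { assert(!big_lock_held); big_lock_held = true; }
static void big_unlock(void) { assert(big_lock_held);  big_lock_held = false; }

/* Analogue of an ASSERT_RTNL()-style check: trips if a caller forgot
 * to take the lock before invoking a callback that requires it. */
#define ASSERT_BIG_LOCK() assert(big_lock_held)

static int cleanups_run;

/* Per-action cleanup callback that documents its locking rule. */
static void action_cleanup(void)
{
	ASSERT_BIG_LOCK();
	cleanups_run++;
}

/* Namespace exit path: hold the lock around every cleanup call. */
static void action_net_exit(int nr_actions)
{
	big_lock();
	for (int i = 0; i < nr_actions; i++)
		action_cleanup();
	big_unlock();
}
```

      Calling action_cleanup() without big_lock() first would abort at the
      assertion. That is how such a check surfaces a missing lock during testing.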
  10. 02 Nov, 2017 3 commits
    • xfrm: defer daddr pointer assignment after spi parsing · cb79a180
      Florian Westphal authored
      syzbot reports:
      BUG: KASAN: use-after-free in __xfrm_state_lookup+0x695/0x6b0
      Read of size 4 at addr ffff8801d434e538 by task syzkaller647520/2991
      [..]
      __xfrm_state_lookup+0x695/0x6b0 net/xfrm/xfrm_state.c:833
      xfrm_state_lookup+0x8a/0x160 net/xfrm/xfrm_state.c:1592
      xfrm_input+0x8e5/0x22f0 net/xfrm/xfrm_input.c:302
      
      The use-after-free is the ipv4 destination address, which points
      to an skb head area that has been reallocated:
        pskb_expand_head+0x36b/0x1210 net/core/skbuff.c:1494
        __pskb_pull_tail+0x14a/0x17c0 net/core/skbuff.c:1877
        pskb_may_pull include/linux/skbuff.h:2102 [inline]
        xfrm_parse_spi+0x3d3/0x4d0 net/xfrm/xfrm_input.c:170
        xfrm_input+0xce2/0x22f0 net/xfrm/xfrm_input.c:291
      
      so the real bug is that xfrm_parse_spi() uses pskb_may_pull(), which can
      reallocate the skb head, but for now do a smaller workaround that makes
      xfrm_input() fetch daddr after SPI parsing.

      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      cb79a180
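      The ordering problem can be sketched in user space. The sketch assumes a
      hypothetical packet buffer whose pull helper always moves the data to a
      fresh allocation, which is what an skb head expansion may do. Taking
      d = p->head + DADDR_OFF before pkt_pull() and reading through it afterwards
      would be a use-after-free. The sketch takes the pointer only after the
      pull, which mirrors the workaround described above. The offsets assume an
      IPv4 header (destination address at byte 16) followed directly by an ESP
      header whose first field is the SPI. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer; head may be replaced by pkt_pull(). */
struct pkt {
	unsigned char *head;
	size_t len;
};

#define DADDR_OFF 16 /* IPv4 destination address offset */
#define SPI_OFF   20 /* ESP SPI right after a 20-byte IPv4 header */

/* Make at least @need bytes available. This always moves the data to a
 * new allocation, so any pointer into the old head dangles afterwards. */
static int pkt_pull(struct pkt *p, size_t need)
{
	unsigned char *n;

	if (need <= p->len)
		return 0;
	n = calloc(1, need);
	if (n == NULL)
		return -1;
	memcpy(n, p->head, p->len);
	free(p->head);
	p->head = n;
	p->len = need;
	return 0;
}

/* Fixed ordering: parse the SPI first (which may pull and reallocate),
 * and only then derive the daddr pointer from the current head. */
static int parse_spi_then_daddr(struct pkt *p, uint32_t *spi, uint32_t *daddr)
{
	const unsigned char *d;

	if (pkt_pull(p, SPI_OFF + sizeof(*spi)) < 0)
		return -1;
	memcpy(spi, p->head + SPI_OFF, sizeof(*spi));
	d = p->head + DADDR_OFF; /* taken after the pull: points into the new head */
	memcpy(daddr, d, sizeof(*daddr));
	return 0;
}
```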
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      
      This patch is based on work done by Thomas Gleixner, Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier should be applied
      to a file was done in a spreadsheet of side-by-side results of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      should be applied to the file. She confirmed any determination that was
      not immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source.
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, the file was
         considered to have no license information in it, and the top-level
         COPYING file license was applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was in a */uapi/* path, it was "GPL-2.0 WITH
         Linux-syscall-note", otherwise it was "GPL-2.0".  Results of that were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL-family license was found in the file or if it had no licensing
         in it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
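      The two-scanner heuristics and the uapi rule above can be summarized as a
      small decision function. This is a hypothetical C sketch of the rules as
      written in this message, not the spreadsheet or the script that was
      actually used. A NULL argument stands for "this scanner found no license
      trace".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Outcome of comparing two scanners' findings for one file. */
enum review {
	NO_LICENSE_FOUND,    /* neither scanner found a license trace */
	SCANNERS_AGREE,      /* same result: it becomes the concluded license */
	NEEDS_MANUAL_REVIEW, /* one found nothing, or they disagree */
};

/* A NULL argument means "this scanner found no license trace". */
static enum review classify(const char *scan_a, const char *scan_b)
{
	if (scan_a == NULL && scan_b == NULL)
		return NO_LICENSE_FOUND;
	if (scan_a != NULL && scan_b != NULL && strcmp(scan_a, scan_b) == 0)
		return SCANNERS_AGREE;
	return NEEDS_MANUAL_REVIEW;
}

/* Identifier for a file with no license information at all. */
static const char *spdx_id_for_unlicensed(const char *path)
{
	bool is_uapi = strstr(path, "/uapi/") != NULL;

	return is_uapi ? "GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";
}
```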
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with the SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types).  Finally Greg ran the script using the .csv files to
      generate the patches.

      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
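      The different comment types mentioned above can be sketched as a
      hypothetical helper; spdx_tag_line() is not from the actual script. The
      sketch assumes the convention described in the kernel's
      Documentation/process/license-rules.rst:
       - a // comment in .c files,
       - a /* */ comment in headers,
       - a # comment in Makefiles and Kconfig files.
      Any other file type is rejected rather than guessed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the SPDX tag line for @filename into @buf
 * using the comment style that file type expects. Returns the snprintf()
 * result, or -1 for a file type this sketch does not handle. */
static int spdx_tag_line(const char *filename, const char *id,
			 char *buf, size_t len)
{
	const char *base, *dot;

	assert(buf != NULL && len > 0);
	base = strrchr(filename, '/');
	base = base ? base + 1 : filename;   /* strip directory components */
	dot = strrchr(base, '.');

	if (dot && strcmp(dot, ".c") == 0)
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (dot && strcmp(dot, ".h") == 0)
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	if (strcmp(base, "Makefile") == 0 || strncmp(base, "Kconfig", 7) == 0)
		return snprintf(buf, len, "# SPDX-License-Identifier: %s", id);
	return -1;
}
```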
    • tcp_nv: fix division by zero in tcpnv_acked() · 4eebff27
      Konstantin Khlebnikov authored
      The average RTT can become zero; this has happened in real life at
      least twice. This patch treats zero as 1us.

      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Lawrence Brakmo <Brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4eebff27
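      A simplified model shows how an average RTT can reach zero and how
      clamping avoids the division by zero. It makes two assumptions:
       - the RTT moving average is kept in 8-bit fixed point (a weight out of
         256) with truncating integer arithmetic,
       - the rate, in units of 100 bits per second, is derived from bytes in
         flight.
      The function names are hypothetical and this is not the tcp_nv source.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted moving average in 8-bit fixed point: @factor/256 weight on the
 * new sample. Truncation by the shift can yield 0 for tiny RTTs. */
static uint32_t rtt_avg(uint32_t sample_us, uint32_t last_us, uint32_t factor)
{
	assert(factor <= 256);
	return (uint32_t)(((uint64_t)sample_us * factor +
			   (uint64_t)last_us * (256 - factor)) >> 8);
}

/* Rate in units of 100 bits per second from bytes in flight over one RTT:
 * bytes * 8 bits * 1e6 us/s / (rtt_us * 100). */
static uint32_t rate_100bps(uint32_t in_flight_bytes, uint32_t avg_rtt_us)
{
	uint64_t scaled = (uint64_t)in_flight_bytes * 8000000;
	uint64_t rtt = avg_rtt_us ? avg_rtt_us : 1; /* the fix: treat 0 as 1us */

	return (uint32_t)(scaled / (rtt * 100));
}
```

      For example, rtt_avg(0, 1, 128) truncates (0 * 128 + 1 * 128) >> 8 to 0.
      Without the clamp, rate_100bps() would then divide by zero.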
  11. 01 Nov, 2017 1 commit