      blk-mq: Fix a race between bt_clear_tag() and bt_get() · c38d185d
      What we need is the following two guarantees:
      * Any thread that observes the effect of the test_and_set_bit() by
        __bt_get_word() also observes the preceding addition of 'current'
        to the appropriate wait list. This is guaranteed by the semantics
        of the spin_unlock() operation performed by prepare_to_wait().
        Hence the conversion of test_and_set_bit_lock() into
      * The wait lists are examined by bt_clear_tag() after the tag bit has
        been cleared. clear_bit_unlock() guarantees that any thread that
        observes that the bit has been cleared also observes the store
        operations preceding clear_bit_unlock(). However,
        clear_bit_unlock() does not prevent the wait lists from being
        examined before the tag bit is cleared. Hence the addition of a
        memory barrier between clear_bit() and the wait list examination.
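      The second guarantee can be illustrated in user space with C11
      atomics. This is only a sketch under stated assumptions, not the
      kernel code: struct tag_word and release_tag() are hypothetical
      names, the seq_cst fence stands in for smp_mb(), and the waiter
      counter stands in for the wait lists.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one tag bitmap word plus its wait-list state. */
struct tag_word {
    atomic_ulong bits;     /* one bit per tag; 1 = tag in use */
    atomic_int   waiters;  /* number of threads queued waiting for a tag */
};

/*
 * Release a tag and report whether a waiter should be woken.
 * Returns true when at least one waiter is observed after the clear.
 */
static bool release_tag(struct tag_word *w, unsigned int tag)
{
    /* Clear the bit without ordering of its own (analogous to clear_bit()). */
    atomic_fetch_and_explicit(&w->bits, ~(1UL << tag), memory_order_relaxed);

    /*
     * Full fence (assumed analogue of smp_mb()): the waiter check below
     * cannot be reordered before the store that clears the bit.
     */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&w->waiters, memory_order_relaxed) > 0;
}
```

      A waiter would perform the mirror-image sequence: register itself,
      execute a full barrier, then re-test the bit, so that at least one of
      the two sides always notices the other.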
      blk-mq: Avoid that __bt_get_word() wraps multiple times · 9e98e9d7
      If __bt_get_word() is called with last_tag != 0, if the first
      find_next_zero_bit() fails, if after wrap-around the
      test_and_set_bit() call fails and find_next_zero_bit() succeeds,
      if the next test_and_set_bit() call fails and subsequently
      find_next_zero_bit() does not find a zero bit, then another
      wrap-around will occur. Avoid this by introducing an additional
      local variable.
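      The idea of bounding the search to a single wrap-around can be
      sketched in user-space C as follows. The names get_tag(),
      next_zero_bit() and the 'wrapped' flag are hypothetical and chosen
      for illustration; the kernel patch uses its own local variable, and
      this sketch assumes nbits does not exceed the width of unsigned long.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Return the first zero bit in [start, nbits), or nbits if there is none. */
static unsigned int next_zero_bit(unsigned long word, unsigned int nbits,
                                  unsigned int start)
{
    for (unsigned int i = start; i < nbits; i++)
        if (!(word & (1UL << i)))
            return i;
    return nbits;
}

/*
 * Claim a free bit, starting at last_tag and wrapping to bit 0 at most once.
 * Returns the claimed bit number, or -1 if no bit could be claimed.
 */
static int get_tag(atomic_ulong *word, unsigned int nbits,
                   unsigned int last_tag)
{
    unsigned int start = last_tag;
    bool wrapped = (last_tag == 0);  /* starting at 0 leaves nothing to wrap to */

    for (;;) {
        unsigned int tag = next_zero_bit(atomic_load(word), nbits, start);

        if (tag >= nbits) {
            if (wrapped)
                return -1;           /* already wrapped once: give up */
            wrapped = true;
            start = 0;
            continue;
        }
        /* Atomic test-and-set: succeed only if the bit was clear before. */
        if (!(atomic_fetch_or(word, 1UL << tag) & (1UL << tag)))
            return (int)tag;
        start = tag + 1;             /* lost the race; keep searching forward */
    }
}
```

      Because 'wrapped' is never reset, the search position only moves
      forward after the wrap, so the loop is bounded even when other
      threads keep winning the test-and-set.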
      blk-mq: Fix a use-after-free · 45a9c9d9
      blk-mq users are allowed to free the memory request_queue.tag_set
      points at after blk_cleanup_queue() has finished but before
      blk_release_queue() has started. This can happen e.g. in the SCSI
      core. The SCSI core namely embeds the tag_set structure in a SCSI
      host structure. The SCSI host structure is freed by
      scsi_host_dev_release(). This function is called after
      blk_cleanup_queue() finished but can be called before
      This means that it is not safe to access request_queue.tag_set from
      inside blk_release_queue(). Hence remove the blk_sync_queue() call
      from blk_release_queue(). This call is not necessary - outstanding
      requests must have finished before blk_release_queue() is
      called. Additionally, move the blk_mq_free_queue() call from
      blk_release_queue() to blk_cleanup_queue() so that
      request_queue.tag_set is not accessed after it has been freed.
      This patch prevents the following kernel oops, which can be triggered
      when deleting a SCSI host for which scsi-mq was enabled:
      Call Trace:
       [<ffffffff8109a7c4>] lock_acquire+0xc4/0x270
       [<ffffffff814ce111>] mutex_lock_nested+0x61/0x380
       [<ffffffff812575f0>] blk_mq_free_queue+0x30/0x180
       [<ffffffff8124d654>] blk_release_queue+0x84/0xd0
       [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
       [<ffffffff8126c140>] kobject_put+0x30/0x70
       [<ffffffff81245895>] blk_put_queue+0x15/0x20
       [<ffffffff8125c409>] disk_release+0x99/0xd0
       [<ffffffff8133d056>] device_release+0x36/0xb0
       [<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
       [<ffffffff8126c140>] kobject_put+0x30/0x70
       [<ffffffff8125a78a>] put_disk+0x1a/0x20
       [<ffffffff811d4cb5>] __blkdev_put+0x135/0x1b0
       [<ffffffff811d56a0>] blkdev_put+0x50/0x160
       [<ffffffff81199eb4>] kill_block_super+0x44/0x70
       [<ffffffff8119a2a4>] deactivate_locked_super+0x44/0x60
       [<ffffffff8119a87e>] deactivate_super+0x4e/0x70
       [<ffffffff811b9833>] cleanup_mnt+0x43/0x90
       [<ffffffff811b98d2>] __cleanup_mnt+0x12/0x20
       [<ffffffff8107252c>] task_work_run+0xac/0xe0
       [<ffffffff81002c01>] do_notify_resume+0x61/0xa0
       [<ffffffff814d2c58>] int_signal+0x12/0x17
      blk-mq: prevent unmapped hw queue from being scheduled · 19c66e59
      A hardware queue that has no mapped software queues should not
      be scheduled. Otherwise a WARNING or OOPS can be triggered.
      The blk_mq_hw_queue_mapped() helper is introduced to fix
      the problem.
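      A minimal sketch of such a predicate, written as stand-alone C.
      struct hw_queue and its fields are hypothetical simplifications, and
      treating a missing tag map as "unmapped" is an assumption of this
      sketch rather than something stated in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a hardware queue context. */
struct hw_queue {
    unsigned int nr_ctx;  /* number of software queues mapped to this queue */
    void *tags;           /* tag map; NULL when none has been set up */
};

/* A hardware queue is schedulable only if something is mapped to it. */
static bool hw_queue_mapped(const struct hw_queue *hctx)
{
    return hctx->nr_ctx != 0 && hctx->tags != NULL;
}

/* Count the queues that a scheduling loop would be allowed to run. */
static size_t count_schedulable(const struct hw_queue *queues, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (hw_queue_mapped(&queues[i]))
            count++;
    return count;
}
```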
      writeback: fix a subtle race condition in I_DIRTY clearing · 9c6ac78e
      After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
      tests inode->i_state locklessly to see whether it already has all the
      necessary I_DIRTY bits set.  The comment above the barrier doesn't
      contain any useful information - memory barriers can't ensure "changes
      are seen by all cpus" by themselves.
      And it sure enough was broken.  Please consider the following
       CPU 0					CPU 1
      					enters __writeback_single_inode()
      					grabs inode->i_lock
      					tests PAGECACHE_TAG_DIRTY which is clear
       enters __set_page_dirty()
       grabs mapping->tree_lock
       releases mapping->tree_lock
       leaves __set_page_dirty()
       enters __mark_inode_dirty()
       sees I_DIRTY_PAGES set
       leaves __mark_inode_dirty()
      					clears I_DIRTY_PAGES
      					releases inode->i_lock
      Now @inode has dirty pages w/ I_DIRTY_PAGES clear.  This doesn't seem
      to lead to an immediately critical problem because requeue_inode()
      later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
      deciding whether the inode needs to be requeued for IO and there are
      enough unintentional memory barriers in between, so while the inode
      ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
      IO list.
      The lack of explicit barrier may also theoretically affect the other
      I_DIRTY bits which deal with metadata dirtiness.  There is no
      guarantee that a strong enough barrier exists between
      I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
      inode.  Filesystem inode writeout path likely has enough stuff which
      can behave as full barrier but it's theoretically possible that the
      writeout may not see all the updates from ->dirty_inode().
      Fix it by adding an explicit smp_mb() after I_DIRTY clearing.  Note
      that I_DIRTY_PAGES needs a special treatment as it always needs to be
      cleared to be interlocked with the lockless test on
      __mark_inode_dirty() side.  It's cleared unconditionally and
      reinstated after smp_mb() if the mapping still has dirty pages.
      Also add comments explaining how and why the barriers are paired.
      Lightly tested.
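      The barrier pairing described above can be sketched with C11 atomics.
      This is a user-space illustration only: struct toy_inode, mark_dirty()
      and begin_writeback() are hypothetical names, the seq_cst fences stand
      in for smp_mb(), and the locking around i_state is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_DIRTY_PAGES 0x1u

/* Hypothetical stand-in for the inode state involved in the race. */
struct toy_inode {
    atomic_uint state;        /* stand-in for inode->i_state */
    atomic_bool pages_dirty;  /* stand-in for PAGECACHE_TAG_DIRTY */
};

/* Dirtying side: publish the dirty page, fence, then test the flag. */
static void mark_dirty(struct toy_inode *in)
{
    atomic_store_explicit(&in->pages_dirty, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* pairs with begin_writeback() */
    if (!(atomic_load_explicit(&in->state, memory_order_relaxed) &
          FLAG_DIRTY_PAGES))
        atomic_fetch_or_explicit(&in->state, FLAG_DIRTY_PAGES,
                                 memory_order_relaxed);
}

/* Writeback side: clear unconditionally, fence, reinstate if still dirty. */
static void begin_writeback(struct toy_inode *in)
{
    atomic_fetch_and_explicit(&in->state, ~FLAG_DIRTY_PAGES,
                              memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* pairs with mark_dirty() */
    if (atomic_load_explicit(&in->pages_dirty, memory_order_relaxed))
        atomic_fetch_or_explicit(&in->state, FLAG_DIRTY_PAGES,
                                 memory_order_relaxed);
}
```

      With a full fence on both sides, either the dirtying side sees the
      flag cleared and sets it again, or the writeback side sees the dirty
      page and reinstates the flag; both missing each other is ruled out.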
      block: Expand a bit documentation about elevator_allow_merge_fn · b8ab956c
      Explain that two requests can be merged without
      elevator_allow_merge_fn() being called.
      blk-mq: add BLK_MQ_F_DEFER_ISSUE support flag · e167dfb5
      Drivers can now tell blk-mq if they take advantage of the deferred
      issue through 'last' or not. If they do, don't do queue-direct
      for sync IO. This is a preparation patch for the nvme conversion.
      blk-mq: add a 'list' parameter to ->queue_rq() · 74c45052
      Since we have the notion of a 'last' request in a chain, we can use
      this to have the hardware optimize the issuing of requests. Add
      a list_head parameter to queue_rq that the driver can use to
      temporarily store hw commands for issue when 'last' is true. If we
      are doing a chain of requests, pass in a NULL list for the first
      request to force issue of that immediately, then batch the remainder
      for deferred issue until the last request has been sent.
      Instead of adding yet another argument to the hot ->queue_rq path,
      encapsulate the passed arguments in a blk_mq_queue_data structure.
      This is passed as a constant, and has been tested as faster than
      passing 4 (or even 3) args through ->queue_rq. Update drivers for
      the new ->queue_rq() prototype. There are no functional changes
      in this patch for drivers - if they don't use the passed in list,
      then they will just queue requests individually like before.
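      The calling convention described above can be sketched with toy types.
      Everything prefixed toy_ is hypothetical and only mirrors the shape of
      the idea (a request, an optional list, and a 'last' flag bundled into
      one constant structure); it is not the kernel's definition, and the
      list is reduced to a counter of parked commands.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_request { int id; };

/* Hypothetical per-chain scratch list; here just a count of parked commands. */
struct toy_cmd_list { unsigned int pending; };

/* Arguments bundled into one structure instead of separate parameters. */
struct toy_queue_data {
    struct toy_request  *rq;
    struct toy_cmd_list *list;  /* NULL: issue this request immediately */
    bool                 last;  /* true: final request of the chain */
};

/* Hypothetical hardware model that counts commands and doorbell writes. */
struct toy_hw { unsigned int issued; unsigned int doorbells; };

static void toy_queue_rq(struct toy_hw *hw, const struct toy_queue_data *bd)
{
    if (!bd->list) {             /* no list passed: issue right away */
        hw->issued += 1;
        hw->doorbells += 1;
        return;
    }
    bd->list->pending += 1;      /* park the command for deferred issue */
    if (bd->last) {              /* end of chain: flush with one doorbell */
        hw->issued += bd->list->pending;
        hw->doorbells += 1;
        bd->list->pending = 0;
    }
}

/* Issue a chain: first request immediately, the remainder batched. */
static void toy_issue_chain(struct toy_hw *hw, struct toy_request *rqs,
                            size_t n)
{
    struct toy_cmd_list list = { 0 };

    for (size_t i = 0; i < n; i++) {
        const struct toy_queue_data bd = {
            .rq   = &rqs[i],
            .list = (i == 0) ? NULL : &list,
            .last = (i == n - 1),
        };
        toy_queue_rq(hw, &bd);
    }
}
```

      In this model a chain of four requests reaches the hardware with two
      doorbell writes instead of four. A driver that ignores the list simply
      behaves like the NULL-list branch for every request.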
      block: remove artifical max_hw_sectors cap · 34b48db6
      Set max_sectors to the value the driver provides as hardware limit by
      default.  Linux has had proper I/O throttling for a long time and
      doesn't rely on an artificially small maximum I/O size anymore.  By not
      limiting the I/O size by default we remove an annoying tuning step
      required for most Linux installations.
      Note that both the user, and if absolutely required the driver can still
      impose a limit for FS requests below max_hw_sectors_kb.
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · c2661b80
      Pull ext4 updates from Ted Ts'o:
       "A large number of cleanups and bug fixes, with some (minor) journal
      Linux 3.18-rc1 · f114040e
      Merge tag 'arm-soc-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 4d3639ac
      Pull ARM SoC fixes from Olof Johansson:
       "A batch of fixes that have come in during the merge window.
        Some of them are defconfig updates for things that have now landed,
        some errata additions and a few general scattered fixes.
        There's also a qcom DT update that adds support for SATA on AP148, and
        basic support for Sony Xperia Z1 and CM-QS600 platforms that seemed
        isolated enough that we could merge it even if it's late"
      Merge git://git.infradead.org/users/eparis/audit · ab074ade
      Pull audit updates from Eric Paris:
       "So this change across a whole bunch of arches really solves one basic
        problem.  We want to audit when seccomp is killing a process.  seccomp
        hooks in before the audit syscall entry code.  audit_syscall_entry
        took as an argument the arch of the given syscall.  Since the arch is
        part of what makes a syscall number meaningful it's an important part
        of the record, but it isn't available when seccomp shoots the
        For most arches we have a better way to get the arch (syscall_get_arch)
        So the solution was two fold: Implement syscall_get_arch() everywhere
        there is audit which didn't have it.  Use syscall_get_arch() in the
        seccomp audit code.  Having syscall_get_arch() everywhere meant it was
        a useless flag on the stack and we could get rid of it for the typical
        syscall entry.
        The other changes inside the audit system aren't grand, fixed some
        records that had invalid spaces.  Better locking around the task comm
        field.  Removing some dead functions and structs.  Make some things
        static.  Really minor stuff"
      Merge tag 'qcom-dt-for-3.18-3' of... · 57764512
      Merge tag 'qcom-dt-for-3.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/galak/linux-qcom into fixes
      Merge "qcom DT changes for v3.18-3" from Kumar Gala:
      Qualcomm ARM Based Device Tree Updates for v3.18-3
      * Added Board support for CM-QS600 and Sony Xperia Z1 phone
      * Added SATA support on IPQ8064/AP148
      Merge tag 'samsung-fixes-2' of... · e29c6486
      Merge tag 'samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes
      Pull more fixes from Kukjin Kim:
      2nd Samsung fixes for v3.18
      - Explicitly set dr_mode on exynos5800-peach-pi, exynos5420-peach-pit
        and exynos5420-arndale-octa boards, because the USB dwc3 controller
        will not work properly on these boards without dr_mode set to host
        if both USB host and gadget are enabled in the kernel configuration.
      MAINTAINERS: corrected bcm2835 search · 9209bec4
      Corrected bcm2835 maintainer info by using N: so that any files with
      bcm2835 in their name are directed to the proper maintainer.
      Also corrected minor misspelling of ARCHITECTURE in 2 comment locations.
      Merge tag 'ntb-3.18' of git://github.com/jonmason/ntb · 61ed53de
      Pull ntb (non-transparent bridge) updates from Jon Mason:
       "Add support for Haswell NTB split BARs, a debugfs entry for basic
        debugging info, and some code clean-ups"
      Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 278f1d07
      Pull i2c updates from Wolfram Sang:
       "Highlights from the I2C subsystem for 3.18:
         - new drivers for Axxia AM55xx, and Hisilicon hix5hd2 SoC.
         - designware driver gained AMD support, exynos gained exynos7 support
        The rest is usual driver stuff.  Hopefully no lowlights this time"
      Merge tag 'sound-fix-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · d590c6cd
      Pull sound fixes from Takashi Iwai:
       "Here is a collection of small fixes after the 3.18 merge.
        The urgent one is the fix for kernel panics with linked PCM substream
        triggered by the recent nonatomic PCM ops support.  Other two fixes
        (emu10k1 and bebob) are stable fixes, and one easy PCI ID addition for
        a new Intel HD-audio controller"
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · fb378df5
      Pull second round of input updates from Dmitry Torokhov:
       "Mostly simple bug fixes, although we do have one brand new driver for
        Microchip AR1021 i2c touchscreen.
        Also there is the change to stop trying to use i8042 active
        multiplexing by default (it is still possible to activate it via
        i8042.nomux=0 on boxes that implement it)"
      Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 2eb7f910
      Pull infiniband/RDMA updates from Roland Dreier:
       - large set of iSER initiator improvements
       - hardware driver fixes for cxgb4, mlx5 and ocrdma
       - small fixes to core midlayer
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1f6075f9
      Pull more perf updates from Ingo Molnar:
       "A second (and last) round of late coming fixes and changes, almost all
        of them in perf tooling:
        User visible tooling changes:
         - Add period data column and make it default in 'perf script' (Jiri
         - Add a visual cue for toggle zeroing of samples in 'perf top'
           (Taeung Song)
         - Improve callchains when using libunwind (Namhyung Kim)
        Tooling fixes and infrastructure changes:
         - Fix for double free in 'perf stat' when using some specific invalid
           command line combo (Yasser Shalabi)
         - Fix off-by-one bugs in map->end handling (Stephane Eranian)
         - Fix off-by-one bug in maps__find(), also related to map->end
           handling (Namhyung Kim)
         - Make struct symbol->end be the first addr after the symbol range,
           to make it match the convention used for struct map->end.  (Arnaldo
           Carvalho de Melo)
         - Fix perf_evlist__add_pollfd() error handling in 'perf kvm stat
           live' (Jiri Olsa)
         - Fix python test build by moving callchain_param to an object linked
           into the python binding (Jiri Olsa)
         - Document sysfs events/ interfaces (Cody P Schafer)
         - Fix typos in perf/Documentation (Masanari Iida)
         - Add missing 'struct option' forward declaration (Arnaldo Carvalho
           de Melo)
         - Add option to copy events when queuing for sorting across cpu
           buffers and enable it for 'perf kvm stat live', to avoid having
           events left in the queue pointing to the ring buffer be rewritten
           in high volume sessions.  (Alexander Yarygin, improving work done
           by David Ahern):
         - Do not include a struct hists per perf_evsel, untangling the
           histogram code from perf_evsel, to pave the way for exporting a
           minimalistic tools/lib/api/perf/ library usable by tools/perf and
           initially by the rasd daemon being developed by Borislav Petkov,
           Robert Richter and Jean Pihet.  (Arnaldo Carvalho de Melo)
         - Make perf_evlist__open(evlist, NULL, NULL), i.e. without cpu and
           thread maps mean syswide monitoring, reducing the boilerplate for
           tools that only want system wide mode.  (Arnaldo Carvalho de Melo)
         - Move exit stuff from perf_evsel__delete to perf_evsel__exit, delete
           should be just a front end for exit + free (Arnaldo Carvalho de
         - Add support to new style format of kernel PMU event.  (Kan Liang)
        and other misc fixes"
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 5e2ee7cd
      Pull sparc fixes from David Miller:
       "Here we have two bug fixes:
        1) The current thread's fault_code is not setup properly upon entry to
           do_sparc64_fault() in some paths, leading to spurious SIGBUS.
        2) Don't use a zero length array at the end of thread_info on sparc64,
           otherwise end_of_stack() isn't right"
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e25b4927
      Pull networking fixes from David Miller:
       "A quick batch of bug fixes:
        1) Fix build with IPV6 disabled, from Eric Dumazet.
        2) Several more cases of caching SKB data pointers across calls to
           pskb_may_pull(), thus referencing potentially free'd memory.  From
           Li RongQing.
        3) DSA phy code tests operation presence improperly, instead of going:
              if (x->ops->foo)
                      r = x->ops->foo(args);
           it was going:
              if (x->ops->foo(args))
                      r = x->ops->foo(args);
           Fix from Andrew Lunn"
      Net: DSA: Fix checking for get_phy_flags function · 228b16cb
      The check for the presence or not of the optional switch function
      get_phy_flags() called the function, rather than checked to see if it
      is a NULL pointer. This causes a dereference of a NULL pointer on all
      switch chips except the sf2, the only switch to implement this call.
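      The corrected pattern, testing the function pointer before calling it,
      can be sketched as stand-alone C. The toy_ types and sf2_like_flags()
      are hypothetical and only imitate an ops table with an optional
      callback.

```c
#include <stddef.h>

struct toy_switch;

/* Hypothetical ops table in which get_phy_flags is optional (may be NULL). */
struct toy_switch_ops {
    unsigned int (*get_phy_flags)(struct toy_switch *ds, int port);
};

struct toy_switch {
    const struct toy_switch_ops *ops;
};

/* Test the pointer itself; call through it only if it is implemented. */
static unsigned int toy_phy_flags(struct toy_switch *ds, int port)
{
    unsigned int flags = 0;

    if (ds->ops->get_phy_flags)
        flags = ds->ops->get_phy_flags(ds, port);
    return flags;
}

/* Example implementation supplied by one hypothetical driver. */
static unsigned int sf2_like_flags(struct toy_switch *ds, int port)
{
    (void)ds;
    return 0x10u | (unsigned int)port;
}
```

      Drivers that leave the callback NULL get the default of 0 instead of
      a NULL pointer dereference.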
      sparc64: Do not define thread fpregs save area as zero-length array. · e2653143
      This breaks the stack end corruption detection facility.
      What that facility does is write a magic value to "end_of_stack()"
      and check whether it gets overwritten.
      "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
      the beginning of the FPU register save area.
      So once the user uses the FPU, the magic value is overwritten and the
      debug checks trigger.
      Fix this by making the size explicit.
      Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
      are limited to 7 levels of FPU state saves.  Since each FPU register
      set is 256 bytes, allocate 256 * 7 for the fpregs area.
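      The sizing argument can be checked with a small stand-alone sketch.
      struct toy_thread_info is a hypothetical, heavily reduced layout; only
      the arithmetic (7 levels of 256 bytes) and the "pointer + 1" definition
      of the end of the structure are taken from the text above.

```c
#include <stddef.h>

#define FPU_SAVE_LEVELS   7    /* limit stated in the commit message */
#define FPU_REGSET_BYTES  256  /* size of one FPU register set */

/* Hypothetical, reduced layout: the save area has an explicit size. */
struct toy_thread_info {
    unsigned long flags;
    unsigned char fpsaved[FPU_SAVE_LEVELS];
    unsigned long gsr[FPU_SAVE_LEVELS];
    unsigned long xfsr[FPU_SAVE_LEVELS];
    unsigned long fpregs[(FPU_SAVE_LEVELS * FPU_REGSET_BYTES) /
                         sizeof(unsigned long)];
};

/* Same shape as "task_thread_info(p) + 1": first byte past the struct. */
static const void *toy_end_of_stack(const struct toy_thread_info *ti)
{
    return ti + 1;
}
```

      With the explicit size, sizeof() covers the whole save area, so the
      address computed by "+ 1" lies beyond fpregs instead of at its start.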
      sparc64: Fix corrupted thread fault code. · 84bd6d8b
      Every path that ends up at do_sparc64_fault() must install a valid
      FAULT_CODE_* bitmask in the per-thread fault code byte.
      Two paths leading to the label winfix_trampoline (which expects the
      FAULT_CODE_* mask in register %g4) were not doing so:
      1) For pre-hypervisor TLB protection violation traps, if we took
         the 'winfix_trampoline' path we wouldn't have %g4 initialized
         with the FAULT_CODE_* value yet.  Resulting in using the
         TLB_TAG_ACCESS register address value instead.
      2) In the TSB miss path, when we notice that we are going to use a
         hugepage mapping, but we haven't allocated the hugepage TSB yet, we
         still have to take the window fixup case into consideration and
         in that particular path we leave %g4 not setup properly.
      Errors of this sort were largely invisible previously, but after
      commit 4ccb9272 ("sparc64: sun4v TLB
      error power off events") we now have a fault_code mask bit
      (FAULT_CODE_BAD_RA) that triggers due to this bug.
      FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
      (see #1 above) and thus we get seemingly random bus errors triggered
      for user processes.
      Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events")
      Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma · 52d589a0
      Pull slave-dmaengine updates from Vinod Koul:
       "For dmaengine contributions we have:
         - designware cleanup by Andy
         - my series moving device_control users to dmaengine_xxx APIs for
           later removal of device_control API
         - minor fixes spread over drivers mainly mv_xor, pl330, mmp, imx-sdma
