1. 09 Oct, 2014 3 commits
  2. 30 Sep, 2014 1 commit
  3. 25 Sep, 2014 7 commits
  4. 09 Sep, 2014 6 commits
    • s390/spinlock: optimize spin_unlock code · 44230282
      Heiko Carstens authored
      Use a memory barrier + store sequence instead of a load + compare and swap
      sequence to unlock a spinlock and an rw lock.
      For the spinlock case this saves us two memory reads and an unneeded CPU
      serialization after the compare and swap instruction has stored the new value.
      
      The kernel size (performance_defconfig) gets reduced by ~14k.
      
      Average execution time of a tight inlined spin_unlock loop drops from
      5.8ns to 0.7ns on a zEC12 machine.
      
      An artificial stress test case where several counters are protected with
      a single spinlock and which are only incremented while holding the spinlock
      shows ~30% improvement on a 4 cpu machine.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      44230282
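The unlock change described in this commit can be illustrated in portable C11. This is a user-space sketch under stated assumptions, not the s390 kernel code: the type `demo_spinlock_t` and the functions `demo_trylock`, `demo_unlock_cas`, and `demo_unlock_store` are hypothetical names, and C11 release ordering stands in for the architecture-specific memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 means free, any other value identifies the owner. */
typedef struct {
    atomic_uint lock;
} demo_spinlock_t;

/* Acquire the lock if it is free. Returns nonzero on success. */
static int demo_trylock(demo_spinlock_t *lp, unsigned int owner)
{
    unsigned int expected = 0;

    return atomic_compare_exchange_strong_explicit(
        &lp->lock, &expected, owner,
        memory_order_acquire, memory_order_relaxed);
}

/* Old style: load the current value, then compare and swap it to 0. */
static void demo_unlock_cas(demo_spinlock_t *lp)
{
    unsigned int old = atomic_load_explicit(&lp->lock, memory_order_relaxed);

    atomic_compare_exchange_strong_explicit(
        &lp->lock, &old, 0,
        memory_order_release, memory_order_relaxed);
}

/* New style: order earlier accesses before a plain store of 0. */
static void demo_unlock_store(demo_spinlock_t *lp)
{
    atomic_store_explicit(&lp->lock, 0, memory_order_release);
}
```

Only the lock holder ever unlocks, so the store variant does not need to read the lock word first; that is the property the commit message relies on.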
    • s390/ftrace: optimize mcount code · 3d1e220d
      Heiko Carstens authored
      Reduce the number of executed instructions within the mcount block if
      function tracing is enabled. We achieve that by using a non-standard
      C function call ABI. Since the called function is also written in
      assembler this is not a problem.
      This also allows replacing the unconditional store at the beginning
      of the mcount block with a larl instruction, which doesn't touch
      memory.
      
      In theory we could also patch the first instruction of the mcount block
      to enable and disable function tracing. However this would break kprobes.
      This could be fixed by implementing the "kprobes_on_ftrace" feature;
      however keeping the odd jprobes working seems not to be possible without
      a lot of code churn. Therefore keep the code easy and simply accept one
      wasted 1-cycle "larl" instruction per function prologue.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      3d1e220d
    • s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_REGS support · 10dec7db
      Heiko Carstens authored
      This code is based on a patch from Vojtech Pavlik.
      http://marc.info/?l=linux-s390&m=140438885114413&w=2
      
      The actual implementation now differs significantly:
      Instead of adding a second function "ftrace_regs_caller" which would be nearly
      identical to the existing ftrace_caller function, the current ftrace_caller
      function is now an alias to ftrace_regs_caller and always passes the needed
      pt_regs structure and function_trace_op parameters unconditionally.
      
      Besides that also use asm offsets to correctly allocate and access the new
      struct pt_regs on the stack.
      
      While at it we can make use of new instructions to get rid of some indirect
      loads if compiled for new machines.
      
      The passed struct pt_regs can be changed by the called function and its new
      contents will replace the current contents.
      
      Note: to change the return address the embedded psw member of the pt_regs
      structure must be changed. The psw member is right now incomplete, since
      the mask part is missing. For all current use cases this should be sufficient.
      Providing and restoring a sane mask would mean we need to add an epsw/lpswe
      pair to the mcount code. These two instructions alone would cost us ~120 cycles,
      which currently does not seem necessary.
      
      Cc: Vojtech Pavlik <vojtech@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      10dec7db
    • s390/ftrace: optimize function graph caller code · 2481a87b
      Heiko Carstens authored
      When the function graph tracer is disabled we can skip three additional
      instructions. So let's just do this.
      
      So if function tracing is enabled but function graph tracing is
      runtime disabled, we get away with a single unconditional branch.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      2481a87b
    • s390/vdso: add vdso support for coarse clocks · b7eacb59
      Martin Schwidefsky authored
      Add CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE optimization to
      the 64-bit and 31-bit vdso.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      b7eacb59
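From user space, the coarse clocks named in this commit are read through the ordinary `clock_gettime()` call; whether a given call is served by the vdso depends on the kernel and C library in use. The sketch below assumes a Linux system with glibc, and the helper names `demo_read_clock` and `demo_ts_to_ns` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Read one clock; returns 0 on success, -1 on error (as clock_gettime does). */
static int demo_read_clock(clockid_t id, struct timespec *ts)
{
    return clock_gettime(id, ts);
}

/* Flatten a timespec into nanoseconds for easy comparison. */
static long long demo_ts_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}
```

A caller that only needs tick-granularity timestamps can pass `CLOCK_REALTIME_COARSE` or `CLOCK_MONOTONIC_COARSE` to `demo_read_clock` and trade precision for a cheaper read.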
  5. 02 Sep, 2014 2 commits
  6. 01 Sep, 2014 1 commit
  7. 08 Aug, 2014 2 commits
    • arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area · a6c19dfe
      Andy Lutomirski authored
      The core mm code will provide a default gate area based on
      FIXADDR_USER_START and FIXADDR_USER_END if
      !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).
      
      This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
      UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
      and x86_64 have gate areas, but they have their own implementations.
      
      This gets rid of the default and moves the code into ia64.
      
      This should save some code on architectures without a gate area: it's now
      possible to inline the gate_area functions in the default case.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
      Acked-by: Richard Weinberger <richard@nod.at> [for um]
      Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6c19dfe
    • lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig · 308c09f1
      Laura Abbott authored
      Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
      architecture specific scatterlist.h, make it a proper Kconfig option and
      use that instead.  At the same time, remove the header files that are now
      mostly useless and just include asm-generic/scatterlist.h.
      
      [sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
      Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>			[x86]
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	[powerpc]
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      308c09f1
  8. 01 Aug, 2014 1 commit
    • s390/mm: implement dirty bits for large segment table entries · 152125b7
      Martin Schwidefsky authored
      The large segment table entry format has a block of bits for the
      ACC/F values for the large page. These bits are valid only if
      another bit (AV bit 0x10000) of the segment table entry is set.
      The ACC/F bits do not have a meaning if the AV bit is off.
      This allows putting the THP splitting bit, the segment young bit
      and the new segment dirty bit into the ACC/F bits as long as
      the AV bit stays off. The dirty and young information is only
      available if the pmd is large.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      152125b7
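The bit-reuse idea in this commit can be sketched with plain integer masks. Only the AV bit value (0x10000) comes from the commit message; the positions chosen below for the dirty, young, and splitting bits, and all `DEMO_`/`demo_` names, are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SEG_AV     UINT64_C(0x10000) /* ACC/F valid bit (from the commit text) */
#define DEMO_SEG_DIRTY  UINT64_C(0x02000) /* assumed position inside the ACC/F block */
#define DEMO_SEG_YOUNG  UINT64_C(0x01000) /* assumed position inside the ACC/F block */
#define DEMO_SEG_SPLIT  UINT64_C(0x00800) /* assumed position inside the ACC/F block */

/* Software bits may live in the ACC/F block only while AV is off. */
static bool demo_seg_sw_bits_usable(uint64_t entry)
{
    return (entry & DEMO_SEG_AV) == 0;
}

/* Mark a large segment entry dirty; the caller must keep AV off. */
static uint64_t demo_seg_mkdirty(uint64_t entry)
{
    assert(demo_seg_sw_bits_usable(entry));
    return entry | DEMO_SEG_DIRTY;
}

/* With AV set the same bits carry ACC/F values, so report "not dirty". */
static bool demo_seg_dirty(uint64_t entry)
{
    return demo_seg_sw_bits_usable(entry) && (entry & DEMO_SEG_DIRTY) != 0;
}
```

The guard on the AV bit is the whole point of the scheme: the same bit positions mean different things depending on whether the hardware is told to interpret them.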
  9. 28 Jul, 2014 1 commit
  10. 22 Jul, 2014 1 commit
  11. 21 Jul, 2014 2 commits
  12. 17 Jul, 2014 1 commit
    • arch, locking: Ciao arch_mutex_cpu_relax() · 3a6bfbc9
      Davidlohr Bueso authored
      The arch_mutex_cpu_relax() function, introduced by 34b133f8, is
      hacky and ugly. It was added a few years ago to address the fact
      that common cpu_relax() calls include yielding on s390, and thus
      impact the optimistic spinning functionality of mutexes. Nowadays
      we use this function well beyond mutexes: rwsem, qrwlock, mcs and
      lockref. Since the macro that defines the call is in the mutex header,
      any users must include mutex.h and the naming is misleading as well.
      
      This patch (i) renames the call to cpu_relax_lowlatency  ("relax, but
      only if you can do it with very low latency") and (ii) defines it in
      each arch's asm/processor.h local header, just like for regular cpu_relax
      functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
      and thus we can take it out of mutex.h. While this can seem redundant,
      I believe it is a good choice as it allows us to move out arch specific
      logic from generic locking primitives and enables future(?) archs to
      transparently define it, similarly to System Z.
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharat Bhushan <r65777@freescale.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Joseph Myers <joseph@codesourcery.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Stratos Karafotis <stratosk@semaphore.gr>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: adi-buildroot-devel@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-am33-list@redhat.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux@lists.openrisc.net
      Cc: linux-m32r-ja@ml.linux-m32r.org
      Cc: linux-m32r@ml.linux-m32r.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-metag@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3a6bfbc9
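The pattern this commit describes, a per-architecture low-latency relax hook that falls back to the regular relax, can be sketched in user-space C. All `demo_` names and the `DEMO_ARCH_RELAX_YIELDS` switch are hypothetical; the real definitions live in each architecture's asm/processor.h and are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for a regular cpu_relax(): here just a compiler-level fence. */
static inline void demo_cpu_relax(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

#ifdef DEMO_ARCH_RELAX_YIELDS
/* An arch whose cpu_relax() may yield supplies a cheaper variant. */
static inline void demo_cpu_relax_lowlatency(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}
#else
/* Everywhere else the low-latency variant is simply the regular one. */
#define demo_cpu_relax_lowlatency() demo_cpu_relax()
#endif

/*
 * Optimistic spin: poll a flag up to max_spins times.
 * Returns the number of relax calls made before the flag was seen,
 * or -1 if the flag was still clear after max_spins polls.
 */
static int demo_spin_wait(atomic_int *flag, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return i;
        demo_cpu_relax_lowlatency();
    }
    return -1;
}
```

Generic spinning code calls only `demo_cpu_relax_lowlatency()`, so the architecture-specific choice stays out of the locking primitive itself.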
  13. 16 Jul, 2014 1 commit
  14. 10 Jul, 2014 1 commit
  15. 10 Jun, 2014 1 commit
    • s390/uaccess: always load the kernel ASCE after task switch · f8b13505
      Martin Schwidefsky authored
      This patch fixes a problem introduced with git commit beef560b
      "s390/uaccess: simplify control register updates".
      
      The switch_mm function is not called if the next process is a kernel
      thread without an attached mm or is a nop if the mm does not change.
      But CR1 still needs to be loaded with the kernel ASCE in case the
      code returns to a uaccess function that uses the secondary space mode.
      
      In addition move the set_fs call from finish_arch_switch to
      finish_arch_post_lock_switch and then remove finish_arch_switch.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      f8b13505
  16. 30 May, 2014 1 commit
  17. 28 May, 2014 2 commits
  18. 22 May, 2014 1 commit
  19. 20 May, 2014 5 commits