1. 13 Nov, 2018 1 commit
  2. 09 Dec, 2017 1 commit
  3. 12 Sep, 2016 1 commit
  4. 08 Aug, 2016 1 commit
    • EDAC, sb_edac: Fix channel reporting on Knights Landing · c5b48fa7
      Lukasz Odzioba authored
      On the Intel Xeon Phi Knights Landing processor family, the channels of
      the memory controller have an atypical arrangement: MC0 is mapped to
      CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to
      report the channel name incorrectly.
      
      We missed this change earlier, so the code already contains a similar
      comment, but the translation function is incorrect.
      
      Without this patch:
        errors in DIMM_A and DIMM_D were reported in DIMM_D
        errors in DIMM_B and DIMM_E were reported in DIMM_E
        errors in DIMM_C and DIMM_F were reported in DIMM_F
      
      Correct this.
      
      Hubert Chrzaniuk:
       - rebased to 4.8
       - comments and code cleanup
      
      Fixes: d0cdf900 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: lukasz.anaczkowski@intel.com
      Cc: lukasz.odzioba@intel.com
      Cc: mchehab@kernel.org
      Cc: <stable@vger.kernel.org> # v4.5..
      Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com
      Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
      [ Boris: Simplify a bit by removing char mc. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      c5b48fa7
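The channel translation described in this commit can be sketched in plain C. This is only an illustration, not the driver's code: the function names `knl_logical_channel()` and `knl_dimm_letter()` are hypothetical, and the assumption that CH0..CH5 are labelled DIMM_A..DIMM_F is inferred from the "Without this patch" list in the message.

```c
#include <assert.h>

/*
 * Hypothetical helper (not the driver's function): convert a
 * (memory controller, per-controller channel) pair into the channel
 * number used for reporting, following the commit message:
 *   MC0 -> CH3,4,5    MC1 -> CH0,1,2
 * Returns -1 for out-of-range input.
 */
static int knl_logical_channel(int mc, int chan)
{
	if (mc < 0 || mc > 1 || chan < 0 || chan > 2)
		return -1;
	return mc == 0 ? chan + 3 : chan;
}

/* Assumption for illustration: CH0..CH5 are labelled DIMM_A..DIMM_F. */
static char knl_dimm_letter(int mc, int chan)
{
	int ch = knl_logical_channel(mc, chan);

	return ch < 0 ? '?' : (char)('A' + ch);
}
```

Under these assumptions, an error on MC1 channel 0 maps to DIMM_A and an error on MC0 channel 0 maps to DIMM_D, so the two no longer collapse onto the same label.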
  5. 15 Jul, 2016 1 commit
  6. 03 Jun, 2016 2 commits
  7. 02 May, 2016 1 commit
    • EDAC, sb_edac: Use cpu family/model in driver detection · 2c1ea4c7
      Tony Luck authored
      Instead of picking a random PCI ID from the dozen or so we need to
      access, just use x86_match_cpu() to pick based on CPU model number. The
      choosing of PCI devices has been problematic in the past, see
      
        11249e73 ("sb_edac: Fix detection on SNB machines")
      
      which fixed problems introduced by
      
        d0585cd8 ("sb_edac: Claim a different PCI device").
      
      This is especially ugly if future hardware might not even have
      EDAC-relevant registers in PCI config space and we would still be
      required to choose some "random" PCI devices to scan for just so our
      driver loads.
      
      Is this cleaner/clearer? It deletes much more code than it adds. Only
      tested on Broadwell. The driver loads/unloads and loads again. Still
      decodes errors too.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      2c1ea4c7
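The idea of selecting the driver path from the CPU family/model instead of a PCI ID can be sketched as a small table lookup. This is a user-space illustration only: it does not use the kernel's x86_match_cpu() or its table type, the names `struct cpu_match`, `sb_cpu_table` and `match_cpu()` are hypothetical, and the family 6 model numbers listed are the commonly documented ones for these server parts rather than values taken from the commit.

```c
#include <stddef.h>

enum sb_type { SANDY_BRIDGE, IVY_BRIDGE, HASWELL, BROADWELL, KNIGHTS_LANDING };

/* Hypothetical stand-in for a CPU match table entry. */
struct cpu_match {
	unsigned family;
	unsigned model;
	enum sb_type type;
};

/* Assumed model numbers (family 6); verify against the driver source. */
static const struct cpu_match sb_cpu_table[] = {
	{ 6, 0x2d, SANDY_BRIDGE },    /* Sandy Bridge-EP */
	{ 6, 0x3e, IVY_BRIDGE },      /* Ivy Bridge-EP/EX */
	{ 6, 0x3f, HASWELL },         /* Haswell-EP/EX */
	{ 6, 0x56, BROADWELL },       /* Broadwell-DE */
	{ 6, 0x4f, BROADWELL },       /* Broadwell-EP/EX */
	{ 6, 0x57, KNIGHTS_LANDING }, /* Xeon Phi Knights Landing */
};

/* Return the matching table entry, or NULL if the CPU is not supported. */
static const struct cpu_match *match_cpu(unsigned family, unsigned model)
{
	size_t i;

	for (i = 0; i < sizeof(sb_cpu_table) / sizeof(sb_cpu_table[0]); i++)
		if (sb_cpu_table[i].family == family &&
		    sb_cpu_table[i].model == model)
			return &sb_cpu_table[i];
	return NULL;
}
```

A NULL result corresponds to the driver declining to load, with no PCI device having to be picked just to trigger a probe.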
  8. 29 Apr, 2016 1 commit
  9. 23 Apr, 2016 1 commit
  10. 22 Apr, 2016 2 commits
  11. 10 Mar, 2016 1 commit
    • EDAC/sb_edac: Fix computation of channel address · eb1af3b7
      Luck, Tony authored
      Large memory Haswell-EX systems with multiple DIMMs per channel were
      sometimes reporting the wrong DIMM.
      
      Found three problems:
      
       1) Debug printouts for socket and channel interleave were not interpreting
          the register fields correctly. The socket interleave field is a 2^X
          value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
          2=3, 3=4).
      
       2) Actual use of the socket interleave value didn't interpret it as 2^X.
      
       3) Conversion of address to channel address was complicated, and wrong.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      eb1af3b7
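The two register encodings named in point 1 of this commit can be written out as a minimal sketch. The function names are hypothetical, and the 2-bit field width is an assumption based on the value ranges (0..3) given in the message.

```c
/*
 * Socket interleave field: encodes 2^X ways (0=1, 1=2, 2=4, 3=8).
 * Assumption: the field is 2 bits wide.
 */
static unsigned sock_interleave_ways(unsigned field)
{
	return 1u << (field & 0x3);
}

/*
 * Channel interleave field: encodes X+1 ways (0=1, 1=2, 2=3, 3=4).
 * Assumption: the field is 2 bits wide.
 */
static unsigned chan_interleave_ways(unsigned field)
{
	return (field & 0x3) + 1;
}
```

Treating the socket field as X+1 (or the channel field as 2^X) gives the right answer for field values 0 and 1 only, which may be why the bug showed up mainly on larger configurations.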
  12. 07 Mar, 2016 1 commit
  13. 11 Dec, 2015 1 commit
  14. 05 Dec, 2015 3 commits
  15. 24 Sep, 2015 1 commit
    • EDAC, sb_edac: Fix TAD presence check for sbridge_mci_bind_devs() · 2900ea60
      Seth Jennings authored
      In commit
      
        7d375bff ("sb_edac: Fix support for systems with two home agents per socket")
      
      NUM_CHANNELS was changed to 8 and the channel space was renumbered to
      handle EN, EP, and EX configurations.
      
      The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() -
      got a new device presence check in the form of saw_chan_mask. However,
      sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop.
      
      With the increase in NUM_CHANNELS, this loop fails at index 4 since
      SB only has 4 TADs.  This results in the following error on SB machines:
      
        EDAC sbridge: Some needed devices are missing
        EDAC sbridge: Couldn't find mci handler
        EDAC sbridge: Couldn't find mci handle
      
      This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as
      well.
      
      After this patch:
      
        EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
        EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)
      Signed-off-by: Seth Jennings <sjenning@redhat.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # v4.2
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
      2900ea60
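The difference between the two presence checks can be illustrated with a small sketch. It is not the driver's code: `mark_chan_seen()`, `sb_all_tads_present()` and `old_style_check()` are hypothetical names, and only the constants (NUM_CHANNELS of 8, four TADs on Sandy Bridge) come from the commit message.

```c
#include <stdbool.h>

#define NUM_CHANNELS 8  /* value after 7d375bff, per the commit message */
#define SB_NUM_TADS  4  /* Sandy Bridge only has 4 TADs */

/* Record that the TAD device for channel idx was found during binding. */
static unsigned mark_chan_seen(unsigned saw_chan_mask, unsigned idx)
{
	return saw_chan_mask | (1u << idx);
}

/* Mask-based check: exactly the TADs that can exist on SB must be seen. */
static bool sb_all_tads_present(unsigned saw_chan_mask)
{
	unsigned want = (1u << SB_NUM_TADS) - 1; /* 0x0f */

	return (saw_chan_mask & want) == want;
}

/* Old-style check: loops to NUM_CHANNELS, so it fails at index 4 on SB. */
static bool old_style_check(const bool present[NUM_CHANNELS])
{
	int i;

	for (i = 0; i < NUM_CHANNELS; i++)
		if (!present[i])
			return false;
	return true;
}
```

With TADs 0..3 present, the mask-based check succeeds while the loop over NUM_CHANNELS reports missing devices, which matches the failure described above.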
  16. 08 Sep, 2015 2 commits
  17. 13 Aug, 2015 2 commits
  18. 03 Jun, 2015 3 commits
    • sb_edac: support for Broadwell -EP and -EX · fa2ce64f
      Tony Luck authored
      Basic support for the single socket Broadwell-DE processor
      was added back in commit 1f39581a
         sb_edac: Add support for Broadwell-DE processor
      This patch extends Broadwell support to cover the two
      socket "-EP" and four socket "-EX" versions of Broadwell.
      Only tested on the 2 socket - but this code is largely
      cloned from the Haswell path.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      fa2ce64f
    • sb_edac: Fix support for systems with two home agents per socket · 7d375bff
      Tony Luck authored
      First noticed a problem on a 4 socket machine where EDAC only reported
      half the DIMMs.  Tracked this down to the code that assumes that systems
      with two home agents only have two memory channels on each agent. This
      is true on 2 socket ("-EP") machines. But four socket ("-EX") machines
      have four memory channels on each home agent.
      
      The old code would have had problems on two socket systems as it did
      a shuffling trick to make the internals of the code think that the
      channels from the first agent were '0' and '1', with the second agent
      providing '2' and '3'. But the code didn't uniformly convert from
      {ha,channel} tuples to this internal representation.
      
      New code always considers up to eight channels.
      On a machine with a single home agent these map easily to edac channels
      0, 1, 2, 3. On machines with two home agents we map using:
        edac_channel = 4*ha# + channel
      So on a -EP machine where each home agent supports only two channels
      we'll fill in channels 0, 1, 4, 5, and on a -EX machine we use all of 0,
      1, 2, 3, 4, 5, 6, 7.
      
      [mchehab@osg.samsung.com: fold a fixup patch as per Tony's request and fixed
       a few CodingStyle issues]
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      7d375bff
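The mapping formula from this commit, edac_channel = 4*ha# + channel, is small enough to show directly. The function name and the range checks are illustrative additions, not taken from the driver.

```c
/*
 * Map a {home agent, channel} tuple to the flat EDAC channel number
 * using the formula from the commit message: 4 * ha + channel.
 * Returns -1 for out-of-range input (an illustrative addition).
 */
static int edac_channel(int ha, int channel)
{
	if (ha < 0 || ha > 1 || channel < 0 || channel > 3)
		return -1;
	return 4 * ha + channel;
}
```

On a -EP machine (two channels per home agent) the populated results are 0, 1, 4 and 5; on a -EX machine all of 0..7 are used.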
    • sb_edac: Fix a typo and a thinko in address handling for Haswell · bb89e714
      Tony Luck authored
      typo: "a7mode" chooses whether to use bits {8, 7, 9} or {8, 7, 6}
      in the algorithm to spread access between memory resources. But
      the non-a7mode path was incorrectly using GET_BITFIELD(addr, 7, 9)
      and so picking bits {9, 8, 7}
      
      thinko: BIT(1) of the dram_rule registers chooses whether to just
      use the {8, 7, 6} (or {8, 7, 9}) bits mentioned above as they are,
      or to XOR them with bits {18, 17, 16} but the code inverted the
      test. We need the additional XOR when dram_rule{1} == 0.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      bb89e714
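Both fixes in this commit concern a few lines of bit selection, sketched here in user-space C. The `GET_BITFIELD(v, lo, hi)` macro below is a local re-definition written for this example, the function name `interleave_index()` is hypothetical, and the bit ordering (first bit listed in braces is the most significant) is inferred from the message's remark that GET_BITFIELD(addr, 7, 9) picks {9, 8, 7}.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local helper for this sketch: extract bits lo..hi (inclusive) of v. */
#define GET_BITFIELD(v, lo, hi) \
	(((v) >> (lo)) & ((1ULL << ((hi) - (lo) + 1)) - 1))

static unsigned interleave_index(uint64_t addr, bool a7mode,
				 bool dram_rule_bit1)
{
	unsigned bits;

	if (a7mode)
		/* bits {8, 7, 9}: bit 9 takes the place of bit 6 */
		bits = (unsigned)((GET_BITFIELD(addr, 7, 8) << 1) |
				  GET_BITFIELD(addr, 9, 9));
	else
		/* bits {8, 7, 6}; the typo used (addr, 7, 9) here */
		bits = (unsigned)GET_BITFIELD(addr, 6, 8);

	/* The extra XOR with bits {18, 17, 16} applies when bit 1 is 0. */
	if (!dram_rule_bit1)
		bits ^= (unsigned)GET_BITFIELD(addr, 16, 18);

	return bits;
}
```

For example, an address with only bit 6 set yields index 1 in non-a7mode and index 0 in a7mode, since a7mode ignores bit 6 and looks at bit 9 instead.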
  19. 09 Feb, 2015 1 commit
    • sb_edac: Fix detection on SNB machines · 11249e73
      Borislav Petkov authored
      d0585cd8 ("sb_edac: Claim a different PCI device") changed the
      probing of sb_edac to look for PCI device 0x3ca0:
      
      3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
      00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
      ...
      
      but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
      in sbridge_probe() therefore the probing fails.
      
      Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
      i.e., the 14.0 device, fixes the issue and the driver loads successfully
      again:
      
      [ 2449.013120] EDAC DEBUG: sbridge_init:
      [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
      [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
      [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      ...
      
      Add a debug printk while at it to be able to catch the failure in the
      future and dump driver version on successful load.
      
      Fixes: d0585cd8 ("sb_edac: Claim a different PCI device")
      Cc: stable@vger.kernel.org # 3.18
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      11249e73
  20. 02 Dec, 2014 4 commits
  21. 08 Oct, 2014 3 commits
    • sb_edac: Claim a different PCI device · d0585cd8
      Andy Lutomirski authored
      sb_edac controls a large number of different PCI functions.  Rather
      than registering as a normal PCI driver for all of them, it
      registers for just one so that it gets probed and, at probe time, it
      looks for all the others.
      
      Coincidentally, the device it registers for also contains the SMBUS
      registers, so the PCI core will refuse to probe both sb_edac and a
      future iMC SMBUS driver.  The drivers don't actually conflict, so
      just change sb_edac's device table to probe a different device.
      
      An alternative fix would be to merge the two drivers, but sb_edac
      will also refuse to load on non-ECC systems, whereas i2c_imc would
      still be useful without ECC.
      
      The only user-visible change should be that sb_edac appears to bind
      a different device.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Rui Wang <ruiv.wang@gmail.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      d0585cd8
    • Move Intel SNB device ids from sb_edac to pci_ids.h · 68939df1
      Andy Lutomirski authored
      The i2c_imc driver will use two of them, and moving only part of
      the list seems messier.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      68939df1
    • sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel · 351fc4a9
      Seth Jennings authored
      Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but
      EDAC doesn't know about this and returns an INTERNAL ERROR when the
      channel is greater than NUM_CHANNELS:
      
      kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f
      kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0
      kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4)
      kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 -  area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0)
      
      This commit changes sb_edac to forward a channel of -1 to EDAC if the
      channel is not specified.  edac_mc_handle_error() sets the channel to -1
      internally after the error message anyway, so this commit should have no
      effect other than avoiding the INTERNAL ERROR message when the channel
      is not specified.
      Signed-off-by: Seth Jennings <sjenning@redhat.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      351fc4a9
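The handling of the 'not specified' channel can be sketched as follows. This is an illustration, not the driver's code: the function name `mce_channel()` and the constant name are hypothetical, and it assumes the channel is carried in the low four bits of the machine check status value, which is consistent with the logged status 940000000000009f producing channel 15.

```c
#include <stdint.h>

#define MCE_CHANNEL_NOT_SPECIFIED 0xf /* 'not specified' per the SDM table */

/*
 * Extract the channel from a machine check status value and translate
 * the 'not specified' encoding to -1 so that it can be forwarded to
 * EDAC as "unknown channel" instead of an out-of-range index.
 * Assumption: the channel is held in bits 3:0 of the status.
 */
static int mce_channel(uint64_t status)
{
	unsigned channel = (unsigned)(status & 0xf);

	return channel == MCE_CHANNEL_NOT_SPECIFIED ? -1 : (int)channel;
}
```

Applied to the status from the log above, this yields -1 rather than 15, so no range check against NUM_CHANNELS is tripped.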
  22. 26 Jun, 2014 6 commits