Problem with latest amdkfd-next

Felix Kuehling felix.kuehling at amd.com
Wed Sep 27 02:40:01 UTC 2017


Hi Oded,

I rebased my next set of changes on your latest amdkfd-next (4.14-rc1
based) and ran into a problem when running SHOC on Kaveri. The system
randomly reboots just when the test starts to run. The first run is
usually fine, but the second or third fails. If the test starts
successfully, it will run through to the end.

I backed out my local changes, and the same problem persists. I went
back to v4.14-rc1, which is about as far back as I can go without losing
KFD support for the ROCm user mode stack. Still the problem persists. As
a guess, I also tried reverting just the iommu driver back to 4.13, but
that didn't help.

I tried to find a way to reproduce it without ROCm, but without success.
I tried running SHOC using the amdgpu-pro version of OpenCL, and running
Unigine Valley as a stressful graphics benchmark. Those tests didn't
trigger the problem.

Alex's amd-staging-drm-next, which is still 4.13-rc5 based, is working
fine (with the same user mode and firmware). At this point I have to
conclude that the problem is ROCm-specific and specific to current
4.14-rc1. It will be interesting to see what happens when Alex's branch
moves to 4.14-rcX. If Alex's branch continues to work, it likely has a
fix in amdgpu. If it starts failing, the root cause is outside the
amdgpu driver. I diffed the amdkfd driver between amd-staging-drm-next,
amdkfd-next and v4.14-rc1, and I don't think the problem is likely to be
in amdkfd itself. I also ruled out the iommu driver, as described above.

For now I can't spend more time on this issue. I've confirmed that my
new set of patches is not to blame, so I'll send them out for your review.

Regards,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com


