[Advice Request] Trying to debug amdgpu fatal error

Christian König christian.koenig at amd.com
Mon Apr 9 06:52:33 UTC 2018


Please provide the full dmesg of the system as well as the output of 
"lspci -s 0000:16:00.0 -vvvv" as attachment.

Thanks,
Christian.
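
For reference, a minimal sketch of collecting both requested outputs in one go. It assumes Python 3.7+ on the affected machine and that dmesg and lspci are on the PATH. Reading the kernel log may need root. The function names and output file names are placeholders, not anything defined in this thread.

```python
import subprocess

# PCI address taken from the log lines quoted in this thread.
DEVICE = "0000:16:00.0"


def diagnostic_commands(device=DEVICE):
    """Return (output file name, argv) pairs for the two requested dumps.

    The file names are arbitrary placeholders.
    """
    return [
        ("dmesg.txt", ["dmesg"]),
        ("lspci.txt", ["lspci", "-s", device, "-vvvv"]),
    ]


def collect(device=DEVICE):
    """Run each command and write its stdout to the matching file."""
    for filename, argv in diagnostic_commands(device):
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, check=False
            )
        except OSError as exc:
            # The tool is missing or could not be started.
            print(f"could not run {argv[0]}: {exc}")
            continue
        with open(filename, "w") as out:
            out.write(result.stdout)
        if result.returncode != 0:
            # dmesg can fail for non-root users if kernel.dmesg_restrict is set.
            print(f"{argv[0]} exited with {result.returncode}: "
                  f"{result.stderr.strip()}")
```

Calling collect() as root should leave dmesg.txt and lspci.txt in the current directory, ready to attach. Running the two commands by hand and redirecting their output is equivalent.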

On 09.04.2018 at 06:00, Andrey Grodzovsky wrote:
>
> From a quick look, it seems to fail in amdgpu_device_init->ioremap
> with ENOMEM (the -12 in your log). That would explain why you don't
> see any more prints: this failure happens very early in the device
> init process.
>
> I have no idea why ioremap would fail in this case, and I'm not even
> sure which implementation of ioremap to look into for your case.
>
> Adding Christian for this.
>
> Andrey
>
>
> On 04/07/2018 03:16 AM, Daniel Moran wrote:
>> Also, to clarify: if I move the card into a regular slot and turn off
>> the eGPU, it works as expected.
>> I tested with the Intel iGPU both enabled and disabled, and made sure
>> i915 loaded without error and that I can connect a display to it.
>>
>>
>>
>> Again, thank you in advance for any time/support offered.
>>
>> Respectfully,
>> Daniel S. Moran (garwynn)
>> PC Hardware Editor - XDA-Developers
>> Phone: 1-559-316-0760/+81-90-5484-4155
>> Article Links: http://www.xda-developers.com/author/garwynn
>> E-mail: xdagarwynn at gmail.com | Twitter: @xdagarwynn
>>
>> On Sat, Apr 7, 2018 at 3:58 PM, Daniel Moran <xdagarwynn at gmail.com> wrote:
>>
>>     Hello all,
>>
>>     I've got a Powercolor Red Devil Vega 56 here that I'm trying to
>>     get working in eGPU mode.
>>     I think on the BIOS/hardware side it's now all fleshed out.
>>     Now I'm at a point where amdgpu tries to init and reaches a fatal
>>     error.
>>
>>     Setting loglevel=8 doesn't produce any additional messages.
>>     Here's what it does report (full dmesg attached):
>>
>>     [  429.005909] [drm] amdgpu kernel modesetting enabled.
>>     [  429.006080] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x148C:0x2388 0xC3).
>>     [  429.006082] amdgpu 0000:16:00.0: Fatal error during GPU init
>>     [  429.006155] amdgpu: probe of 0000:16:00.0 failed with error -12
>>
>>     I'm using the following commands to unload and reload the module
>>     for testing. Since the card runs as an eGPU, I'm using the
>>     i7-7700K iGPU (i915 module) as the primary, so these commands
>>     work in a terminal without requiring a reboot.
>>
>>     sudo rmmod amdgpu
>>     sudo modprobe -v amdgpu
>>
>>     I pulled UMR and tried to build it, but it fails at the CMake
>>     step. I'll attach the log as a text file.
>>     I will also attach a full dmesg and lspci dump. uname -a below:
>>     Linux testbox 4.15.15-041515-generic #201803311331 SMP Sat Mar 31 17:34:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>>     Any other ideas on how I can debug this further? I feel I'm so
>>     close and don't want to let this go.
>>     Thank you in advance for your time.
>>
>>     Respectfully,
>>     Daniel S. Moran (garwynn)
>>     PC Hardware Editor - XDA-Developers
>>     Phone: 1-559-316-0760/+81-90-5484-4155
>>     Article Links: http://www.xda-developers.com/author/garwynn
>>     E-mail: xdagarwynn at gmail.com | Twitter: @xdagarwynn
>>
>
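
On the error code in the log quoted above: the kernel reports driver failures as negative errno values, so "failed with error -12" corresponds to ENOMEM, which matches the ioremap explanation. A small sketch of decoding such a line, assuming Python 3 on Linux. The function name and the regular expression are illustrative only and not part of any tool mentioned in this thread.

```python
import errno
import re

# Matches the driver-core message format seen in the quoted dmesg output.
PROBE_FAILURE = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def decode_probe_error(line):
    """Return (device, errno name) for a probe failure line, else None."""
    match = PROBE_FAILURE.search(line)
    if match is None:
        return None
    device = match.group(1)
    # The kernel logs the negated errno; take the absolute value to look it up.
    code = abs(int(match.group(2)))
    return device, errno.errorcode.get(code, "UNKNOWN")


line = "[  429.006155] amdgpu: probe of 0000:16:00.0 failed with error -12"
print(decode_probe_error(line))  # ('0000:16:00.0', 'ENOMEM')
```

The same lookup works for any other negative code that shows up in a probe failure message.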

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2018-04-07 16-08-59.png
Type: image/png
Size: 60529 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180409/64034484/attachment-0001.png>

