number of rings broken

Tom St Denis tom.stdenis at amd.com
Mon Jun 5 11:34:21 UTC 2017


Hi all,

I'm just back after a week off, and the first thing I see on my vega10
system is that a bisect points at this patch:

83866f0fc72017d55f40cbd4160cd1e42a2cc3a8 is the first bad commit
commit 83866f0fc72017d55f40cbd4160cd1e42a2cc3a8
Author: Andres Rodriguez <andresx7 at gmail.com>
Date:   Thu Feb 2 00:38:22 2017 -0500

     drm/amdgpu: allow split of queues with kfd at queue granularity v4

     Previously the queue/pipe split with kfd operated with pipe
     granularity. This patch allows amdgpu to take ownership of an arbitrary
     set of queues.

     It also consolidates the last few magic numbers in the compute
     initialization process into mec_init.

     v2: support for gfx9
     v3: renamed AMDGPU_MAX_QUEUES to AMDGPU_MAX_COMPUTE_QUEUES
     v4: fix off-by-one in num_mec checks in *_compute_queue_acquire

     Reviewed-by: Edward O'Callaghan <funfunctor at folklore1984.net>
     Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
     Acked-by: Christian König <christian.koenig at amd.com>
     Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
     Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

:040000 040000 cafdad84e9fd950112e9ff56956526fa47dcaa59 
48647a91855c0b2d4b3dfedffb89bd93da25c0eb M      drivers

It causes the driver to fail to initialize properly, with these messages:

[   21.983487] [drm] amdgpu: irq initialized.
[   22.012073] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[   22.015161] amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 
0x000000f5ff000008, cpu addr 0xffff8802143fc008
[   22.015208] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 
0x000000f5ff000010, cpu addr 0xffff8802143fc010
[   22.015243] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 
0x000000f5ff000018, cpu addr 0xffff8802143fc018
[   22.015278] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 
0x000000f5ff000028, cpu addr 0xffff8802143fc028
[   22.015310] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 
0x000000f5ff000030, cpu addr 0xffff8802143fc030
[   22.015342] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 
0x000000f5ff000038, cpu addr 0xffff8802143fc038
[   22.015374] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 
0x000000f5ff000048, cpu addr 0xffff8802143fc048
[   22.015412] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 
0x000000f5ff000050, cpu addr 0xffff8802143fc050
[   22.015445] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 
0x000000f5ff000058, cpu addr 0xffff8802143fc058
[   22.015457] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 
0x000000f5ff000068, cpu addr 0xffff8802143fc068
[   22.015565] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015573] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 
0x000000f5ff000070, cpu addr 0xffff8802143fc070
[   22.015616] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015620] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 
0x000000f5ff000078, cpu addr 0xffff8802143fc078
[   22.015660] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015663] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 
0x000000f5ff000088, cpu addr 0xffff8802143fc088
[   22.015702] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015705] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 
0x000000f5ff000090, cpu addr 0xffff8802143fc090
[   22.015745] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015748] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 
0x000000f5ff000098, cpu addr 0xffff8802143fc098
[   22.015787] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015792] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 
0x000000f5ff0000a8, cpu addr 0xffff8802143fc0a8
[   22.015836] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015842] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 
0x000000f5ff0000b0, cpu addr 0xffff8802143fc0b0
[   22.015881] [drm:amdgpu_ring_init [amdgpu]] *ERROR* Failed to 
register debugfs file for rings !
[   22.015920] [drm:gfx_v9_0_sw_init [amdgpu]] *ERROR* Too many (8) 
compute rings!

I haven't diagnosed the root cause yet, but it seems that various
constants have effectively changed.  On the same system the module
initializes just fine for carrizo, so this looks like a GFX9 issue.
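
The log above announces a fence driver for ring indices 1 through 8 twice,
and each repeat is followed by a debugfs registration failure. As a rough
aid for checking other logs for the same pattern, here is a minimal sketch.
It assumes dmesg-style text containing the "fence driver on ring N" fragment
shown above; the function name duplicate_ring_ids and the shortened sample
lines are made up for illustration and are not part of the driver or of any
existing tool.

```python
import re
from collections import Counter

# Matches the "fence driver on ring N" fragment seen in the log above.
RING_RE = re.compile(r"fence driver on ring (\d+)")


def duplicate_ring_ids(log_text):
    """Return the sorted ring indices that are announced more than once."""
    counts = Counter(int(m.group(1)) for m in RING_RE.finditer(log_text))
    return sorted(ring for ring, n in counts.items() if n > 1)


# Shortened sample lines (hypothetical excerpt, addresses truncated).
sample = """\
amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 0x000000f5ff000008
amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x000000f5ff000010
amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x000000f5ff000068
"""

print(duplicate_ring_ids(sample))
```

Saving the dmesg output to a file and passing its contents to
duplicate_ring_ids() would list every ring index that shows up more than
once. This only flags the symptom; it says nothing about why the indices
repeat.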

Tom

