[Mesa-dev] [PATCH 2/2] r600/atomic: add cayman version of atomic save/restore from GDS
Dave Airlie
airlied at gmail.com
Sun Dec 3 23:57:11 UTC 2017
On 1 December 2017 at 20:49, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 01.12.2017 06:06, Dave Airlie wrote:
>>
>> From: Dave Airlie <airlied at redhat.com>
>>
>> On Cayman we don't use the append/consume counters (fglrx doesn't)
>> and they don't seem to work well with compute shaders.
>>
>> This just uses GDS instead to do the atomic operations.
>
>
> Interesting. This is kind of what I'd have expected to be used from the
> beginning at least for GCN.
>
> Don't you still need to use an EOS event for proper synchronization? I mean,
> I guess you looked at fglrx traces, but still... CP_DMA definitely isn't
> waiting for shaders on newer hardware, and I don't know why it would do that
> on older hardware.
>
> FWIW, I don't have the packet specification for pre-GCN hardware here, but
> on GCN it should be:
>
> radeon_emit(cs, PKT3(PKT3_EVENT_WRITE_EOS, 3, 0) | pkt_flags);
> radeon_emit(cs, EVENT_TYPE(event) | EVENT_INDEX(6));
> radeon_emit(cs, (dst_offset) & 0xffffffff);
> radeon_emit(cs, (1 << 29) | ((dst_offset >> 32) & 0xffff));
> radeon_emit(cs, (gds_index & 0xffff) | (num_dwords << 16));
>
> to copy GDS data to memory at EOS.
My guess is that WRITE_EOS is broken on Cayman for compute shaders, which
is why they don't use it. I'll dump some non-compute atomics to make sure
they don't use it there either.

It at least appears that the GDS append/consume counters work for
non-compute shaders (which is why I didn't notice this earlier), but they
fail when it comes to compute.

I see no sign of fglrx using EVENT_WRITE_EOS in the Cayman traces, which
leads me to suspect I just have to flush hard before CP_DMA.
0xc0031503 // PKT3 0x15 4dw: COMPUTE
0x00000001
0x00000001
0x00000001
0x00000001
0xc0004600 // PKT3 0x46 1dw:
0x00000006
0xc0004600 // PKT3 0x46 1dw:
0x00000410
0xc0004600 // PKT3 0x46 1dw:
0x00000407
0xc0016800 // PKT3 0x68 2dw:
0x00000363 // CFG_OFFSET 0x00008d8c
0x00000100 // 0x00008d8c SQ_DYN_GPR_CNTL_PS_FLUSH_REQ
0xc0034300 // PKT3 0x43 4dw:
0x80107ffc
0xffffffff
0x00000000
0x00000004
0xc0016900 // PKT3 0x69 2dw:
0x00000290 // CTX_OFFSET 0x00028a40
0x00000000 // 0x00028a40 VGT_GS_MODE
0xc0016900 // PKT3 0x69 2dw:
0x000002d5 // CTX_OFFSET 0x00028b54
0x00000000 // 0x00028b54 VGT_SHADER_STAGES_EN
0xc0016900 // PKT3 0x69 2dw:
0x000001ba // CTX_OFFSET 0x000286e8
0x00000000 // 0x000286e8 SPI_COMPUTE_INPUT_CNTL
0xc0056900 // PKT3 0x69 6dw:
0x000001be // CTX_OFFSET 0x000286f8
0x00000000 // 0x000286f8 ??
0x0000ffff // 0x000286fc ??
0x00000000 // 0x00028700 ??
0x00000000 // 0x00028704 ??
0x00000000 // 0x00028708 ??
0xc0004600 // PKT3 0x46 1dw:
0x00000407
0xc0004600 // PKT3 0x46 1dw:
0x00000407
0xc0044102 // PKT3 0x41 5dw: COMPUTE
0x00000004
0xa0000000
0x7f1d0000
0x000000f8
0x04000004
0xc0044102 // PKT3 0x41 5dw: COMPUTE
0x00000000
0xa0000000
0x7f1d0004
0x000000f8
0x04000004
That is the command stream from fglrx. It executes the dispatch, does a
CACHE_FLUSH, PS_PARTIAL_FLUSH and CS_PARTIAL_FLUSH, resets
DYN_GPR_PS_FLUSH_REQ, does a SURFACE_SYNC, resets a bunch of registers,
then does another couple of CS_PARTIAL_FLUSHes.
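
For anyone reading the dump by hand, the annotations in it can be reproduced with a few bit extractions. This is only a sketch: the field positions are inferred from the annotations in the dump itself (count and opcode in the header, the COMPUTE tag on headers with bit 1 set, the event type/index split, and the CFG_OFFSET/CTX_OFFSET byte addresses). The helper names are made up for illustration and are not taken from Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of a PKT3 header dword (hypothetical helper struct). */
struct pkt3_header {
    unsigned type;    /* bits 31:30, 3 for a type-3 packet */
    unsigned num_dw;  /* body dwords: count field (bits 29:16) + 1 */
    unsigned opcode;  /* bits 15:8, e.g. 0x46 for the event writes */
    unsigned compute; /* bit 1, assumed from the COMPUTE tags in the dump */
};

static inline struct pkt3_header decode_pkt3(uint32_t h)
{
    struct pkt3_header p;
    p.type    = h >> 30;
    p.num_dw  = ((h >> 16) & 0x3fff) + 1;
    p.opcode  = (h >> 8) & 0xff;
    p.compute = (h >> 1) & 0x1;
    return p;
}

/* Payload of PKT3 0x46: event type in bits 5:0, event index in bits 11:8
 * (assumed widths; they are enough for the values seen in the dump). */
static inline unsigned event_type(uint32_t dw)  { return dw & 0x3f; }
static inline unsigned event_index(uint32_t dw) { return (dw >> 8) & 0xf; }

/* PKT3 0x68 / 0x69 carry a dword offset; the byte address is
 * base + offset * 4, with bases 0x8000 (config) and 0x28000 (context). */
static inline uint32_t config_reg_addr(uint32_t off)  { return 0x8000u  + (off << 2); }
static inline uint32_t context_reg_addr(uint32_t off) { return 0x28000u + (off << 2); }
```

With these, decode_pkt3(0xc0031503) gives opcode 0x15 with four body dwords and the compute bit set, event_type(0x00000410) gives 0x10 with index 4, and config_reg_addr(0x363) gives 0x8d8c, matching the comments in the dump.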
However, fglrx definitely does some things differently: we have this
DEALLOC_STATE workaround and they never do it, which means they must
construct something else differently in the kernel or a lot earlier in
userspace. If I do all those flushes directly after dispatch, I hang
unless I emit the DEALLOC_STATE packet.
Dave.