[PATCH 2/4] drm/amdgpu:new method to sync ce&de

Christian König deathsimple at vodafone.de
Mon Aug 29 08:54:38 UTC 2016


Well, a perfect example of why we shouldn't look at what userspace does,
but rather at what the hardware can do, when we design the IOCTLs.

I've been trying to raise awareness of this for quite a while now, but a
lot of people seem to think that when the UMD doesn't do something, the
kernel doesn't need to handle that situation.

Thanks for that info,
Christian.

On 29.08.2016 10:28, Liu, Monk wrote:
> But speaking from a practical standpoint: at least the closed-source OCL UMD doesn't use the CE at all, and I guess Mesa doesn't either ...
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:deathsimple at vodafone.de]
> Sent: Monday, August 29, 2016 4:25 PM
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 2/4] drm/amdgpu:new method to sync ce&de
>
> Hui what?
>
> It feels like I've asked a hundred times whether the compute engine has a CE as well, but so far the answer was always no. That would explain a whole bunch of problems we had with the compute rings as well.
>
> In this case the patch is good as it is, please just rebase it.
>
> Christian.
>
> On 29.08.2016 10:14, Liu, Monk wrote:
>> No, the compute ring can also leverage the constant engine; that depends
>> on the OCL UMD's behavior. I just make sure the KMD does nothing wrong.
>>
>> BR Monk
>>
>> -----Original Message-----
>> From: Christian König [mailto:deathsimple at vodafone.de]
>> Sent: Monday, August 29, 2016 4:06 PM
>> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH 2/4] drm/amdgpu:new method to sync ce&de
>>
>> On 29.08.2016 04:55, Monk Liu wrote:
>>> CE & DE can have at most 128 dw as the gap between them, so to sync
>>> CE and DE we don't need double SWITCH_BUFFERs any more, which is
>>> ugly and harms performance; we only need to insert 128 NOPs after the
>>> VM flush to prevent a CE VM fault.
>>>
>>> Change-Id: Ibec954ce4c817ad7d3bce89c2bcb95b6c6bb5411
>>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
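
For reference, the amdgpu_ring_insert_nop() helper the patch relies on is the
generic ring padding routine; at the time it was roughly the loop below. This
is a sketch from memory of the amdgpu code of that era, not the exact upstream
source, and ring->nop is assumed to hold the engine-specific NOP dword.

    void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count)
    {
            int i;

            /* write 'count' copies of the ring's NOP dword; the vm_flush
             * change in this patch passes count = 128 so the CE, which may
             * run up to 128 dw ahead of the DE, cannot overtake the flush */
            for (i = 0; i < count; i++)
                    amdgpu_ring_write(ring, ring->nop);
    }
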
>> Looks good to me, but only the GFX engines have a CE. So syncing on the compute engines is pretty much pointless.
>>
>> So I suggest that you move this into the "usepfp" if branch as well.
>>
>> With that fixed the patch is Reviewed-by: Christian König <christian.koenig at amd.com>.
>>
>> Regards,
>> Christian.
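
Concretely, the suggestion amounts to guarding the new padding with the
existing usepfp check so the compute rings skip it. A minimal sketch of what
that would look like in gfx_v8_0_ring_emit_vm_flush() (the vm_id/pd_addr
parameters and the rest of the flush sequence are assumed from the surrounding
code; see the hunk below):

    static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
                                            unsigned vm_id, uint64_t pd_addr)
    {
            int usepfp = (ring->type == AMDGPU_RING_TYPE_GFX);

            /* only the GFX ring has a CE, so only it needs the 128 dw gap */
            if (usepfp)
                    amdgpu_ring_insert_nop(ring, 128);

            /* ... the rest of the VM flush sequence stays as in the patch ... */
    }
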
>>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 18 ++++++------------
>>>     1 file changed, 6 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> index 26fced0..ce1e616 100755
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> @@ -6005,14 +6005,6 @@ static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
>>>     	amdgpu_ring_write(ring, seq);
>>>     	amdgpu_ring_write(ring, 0xffffffff);
>>>     	amdgpu_ring_write(ring, 4); /* poll interval */
>>> -
>>> -	if (usepfp) {
>>> -		/* synce CE with ME to prevent CE fetch CEIB before context switch done */
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>> -	}
>>>     }
>>>     
>>>     static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>> @@ -6020,6 +6012,9 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>>     {
>>>     	int usepfp = (ring->type == AMDGPU_RING_TYPE_GFX);
>>>     
>>> +	/* GFX8 emits 128 dw of NOPs to prevent the DE from doing the vm_flush before the CE finishes the CEIB */
>>> +	amdgpu_ring_insert_nop(ring, 128);
>>> +
>>>     	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
>>>     	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(usepfp) |
>>>     				 WRITE_DATA_DST_SEL(0)) |
>>> @@ -6059,11 +6054,10 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>>     		/* sync PFP to ME, otherwise we might get invalid PFP reads */
>>>     		amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
>>>     		amdgpu_ring_write(ring, 0x0);
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>>     	}
>>> +
>>> +	/* GFX8 emits 128 dw of NOPs to prevent the CE from accessing the VM before the vm_flush finishes */
>>> +	amdgpu_ring_insert_nop(ring, 128);
>>>     }
>>>     
>>>     static u32 gfx_v8_0_ring_get_rptr_compute(struct amdgpu_ring *ring)