Re: [PATCH 12/13] drm/amdgpu:changes in gfx DMAframe scheme
Christian König
deathsimple at vodafone.de
Tue Mar 28 08:33:56 UTC 2017
Not sure which version of the code you are working with, but at least on
amd-staging-4.9 the 128 NOPs are still present for GFX8.
GFX7 uses the double SWITCH_BUFFER packet as well, and that can be removed
if you add this to the vm_flush code.
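Something like this at the end of the common flush path should do it (a
minimal sketch taken from the amdgpu_vm_flush() hunk below; assuming the
ring implements emit_switch_buffer, untested):

	/* must stay outside of the COND_EXEC, otherwise MCBP could
	 * make the CP skip the buffer switch as well
	 */
	if (ring->funcs->emit_switch_buffer) {
		amdgpu_ring_emit_switch_buffer(ring);
		amdgpu_ring_emit_switch_buffer(ring);
	}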
In addition to that, you haven't answered why we need the conditional
execution for the VM flush.
Regards,
Christian.
On 28.03.2017 05:23, Liu, Monk wrote:
> Christian
>
> The 128 NOPs after the VM flush were already removed for gfx8 & gfx7.
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:deathsimple at vodafone.de]
> Sent: Tuesday, March 28, 2017 12:17 AM
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 12/13] drm/amdgpu:changes in gfx DMAframe scheme
>
> On 24.03.2017 11:38, Monk Liu wrote:
>> 1) Adapt to Vulkan:
>> Use a double SWITCH_BUFFER to replace the 128 NOPs w/a: since Vulkan
>> was introduced, the UMD can insert 7 ~ 16 IBs per submit, so a 256 DW
>> frame can no longer hold the whole DMAframe if we still insert those
>> 128 NOPs. The CP team suggests using double SWITCH_BUFFERs instead of
>> the tricky 128 NOPs w/a.
>>
>> 2) Fix the CE VM fault issue introduced with MCBP:
>> We need one more COND_EXEC wrapping the IB part (the original one is
>> for the VM switch part).
>>
>> Without this change, a VM fault can be triggered by the following
>> scenario:
>>
>>> CE passes the original COND_EXEC (no MCBP issued at this moment)
>> and proceeds as normal.
>>
>>> DE catches up to this COND_EXEC, but this time MCBP has been issued,
>> thus DE treats all following packets as NOPs. The following
>> VM switch packets now look like NOPs to DE, so DE
>> doesn't do the VM flush at all.
>>
>>> Now CE proceeds to the first IBc and triggers a VM fault,
>> because DE didn't do the VM flush for this DMAframe.
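>
> As far as I can read the patch, the resulting DMAframe layout is
> roughly the following (my own sketch of the packet order, abbreviated):
>
> 	COND_EXEC #1        <- covers pipeline sync / VM flush / GDS switch
> 	PIPELINE_SYNC, VM_FLUSH, fence, GDS switch
> 	patch COND_EXEC #1  <- skip target just before the SWITCH_BUFFERs
> 	SWITCH_BUFFER x2    <- can never be skipped by COND_EXEC
> 	COND_EXEC #2        <- covers HDP flush up to the IBs
> 	HDP flush/invalidate, CE/DE meta, IBs, fences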
>>
>> 3) Change the estimated alloc size for gfx9:
>> With the new DMAframe scheme we need to modify emit_frame_size for
>> gfx9.
>>
>> With the above changes, the 128 NOPs w/a after the VM flush is no
>> longer needed.
> So we are back to square one? Because that is exactly what we used to have, which you then replaced with the 128 NOPs.
>
>> Change-Id: Ib3f92d9d5a81bfff0369a00f23e1e5891797089a
>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> Sorry for the delay, I've been busy tracking down a hardware issue.
>
> First of all, please fix your editor to correctly indent your code. This file once more doesn't have correct indentation.
>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 8 ++--
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 77 +++++++++++++++++++++-------------
>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 29 ++++++++-----
>> 3 files changed, 69 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index d103270..b300929 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> @@ -167,9 +167,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>> return r;
>> }
>>
>> - if (ring->funcs->init_cond_exec)
>> - patch_offset = amdgpu_ring_init_cond_exec(ring);
>> -
>> if (vm) {
>> amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE from running ahead of DE */
>>
>> @@ -180,7 +177,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>> }
>> }
>>
>> - if (ring->funcs->emit_hdp_flush
>> + if (ring->funcs->init_cond_exec)
>> + patch_offset = amdgpu_ring_init_cond_exec(ring);
>> +
>> + if (ring->funcs->emit_hdp_flush
> Ok, so the conditional execution shouldn't cover the VM flush any more.
> That makes perfect sense.
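>
> I.e. the emit order in amdgpu_ib_schedule() then becomes, roughly (a
> sketch; I'm assuming the elided part of the hunk above still calls
> amdgpu_vm_flush()):
>
> 	if (vm) {
> 		r = amdgpu_vm_flush(ring, job);
> 		/* now emits its own COND_EXEC plus the double SWITCH_BUFFER */
> 		...
> 	}
> 	if (ring->funcs->init_cond_exec)
> 		patch_offset = amdgpu_ring_init_cond_exec(ring);
> 	/* the second COND_EXEC covers the HDP flush and the IBs */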
>
> Maybe split that up into a separate patch? That would make reviewing and editing this one easier.
>
>> #ifdef CONFIG_X86_64
>> && !(adev->flags & AMD_IS_APU)
>> #endif
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 9ff445c..74be4fa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -483,42 +483,59 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
>> id->oa_size != job->oa_size);
>> int r;
>>
>> - if (ring->funcs->emit_pipeline_sync && (
>> - job->vm_needs_flush || gds_switch_needed ||
>> - amdgpu_vm_ring_has_compute_vm_bug(ring)))
>> - amdgpu_ring_emit_pipeline_sync(ring);
>> + if (job->vm_needs_flush || gds_switch_needed ||
>> + amdgpu_vm_is_gpu_reset(adev, id) ||
>> + amdgpu_vm_ring_has_compute_vm_bug(ring)) {
>> + unsigned patch_offset = 0;
>>
>> - if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||
>> - amdgpu_vm_is_gpu_reset(adev, id))) {
>> - struct fence *fence;
>> - u64 pd_addr = amdgpu_vm_adjust_mc_addr(adev, job->vm_pd_addr);
>> + if (ring->funcs->init_cond_exec)
>> + patch_offset = amdgpu_ring_init_cond_exec(ring);
> Ok, why are you inserting a cond_exec for the VM flush here?
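>
> Just so we are talking about the same mechanism: if I remember the
> packet correctly, init_cond_exec emits a PACKET3_COND_EXEC which reads
> a per-ring flag from memory, and when the preemption path clears that
> flag the CP skips a number of following DWs. The count isn't known at
> emit time, so init_cond_exec returns a patch offset and
> patch_cond_exec writes the real count in later:
>
> 	patch_offset = amdgpu_ring_init_cond_exec(ring);
> 	/* ... packets that MCBP may cause the CP to skip ... */
> 	amdgpu_ring_patch_cond_exec(ring, patch_offset);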
>
>>
>> - trace_amdgpu_vm_flush(pd_addr, ring->idx, job->vm_id);
>> - amdgpu_ring_emit_vm_flush(ring, job->vm_id, pd_addr);
>> + if (ring->funcs->emit_pipeline_sync &&
>> + (job->vm_needs_flush || gds_switch_needed ||
>> + amdgpu_vm_ring_has_compute_vm_bug(ring)))
>> + amdgpu_ring_emit_pipeline_sync(ring);
>>
>> - r = amdgpu_fence_emit(ring, &fence);
>> - if (r)
>> - return r;
>> + if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||
>> + amdgpu_vm_is_gpu_reset(adev, id))) {
>> + struct fence *fence;
>> + u64 pd_addr = amdgpu_vm_adjust_mc_addr(adev, job->vm_pd_addr);
>>
>> - mutex_lock(&adev->vm_manager.lock);
>> - fence_put(id->last_flush);
>> - id->last_flush = fence;
>> - mutex_unlock(&adev->vm_manager.lock);
>> - }
>> + trace_amdgpu_vm_flush(pd_addr, ring->idx, job->vm_id);
>> + amdgpu_ring_emit_vm_flush(ring, job->vm_id, pd_addr);
>>
>> - if (gds_switch_needed) {
>> - id->gds_base = job->gds_base;
>> - id->gds_size = job->gds_size;
>> - id->gws_base = job->gws_base;
>> - id->gws_size = job->gws_size;
>> - id->oa_base = job->oa_base;
>> - id->oa_size = job->oa_size;
>> - amdgpu_ring_emit_gds_switch(ring, job->vm_id,
>> - job->gds_base, job->gds_size,
>> - job->gws_base, job->gws_size,
>> - job->oa_base, job->oa_size);
>> - }
>> + r = amdgpu_fence_emit(ring, &fence);
>> + if (r)
>> + return r;
>>
>> + mutex_lock(&adev->vm_manager.lock);
>> + fence_put(id->last_flush);
>> + id->last_flush = fence;
>> + mutex_unlock(&adev->vm_manager.lock);
>> + }
>> +
>> + if (gds_switch_needed) {
>> + id->gds_base = job->gds_base;
>> + id->gds_size = job->gds_size;
>> + id->gws_base = job->gws_base;
>> + id->gws_size = job->gws_size;
>> + id->oa_base = job->oa_base;
>> + id->oa_size = job->oa_size;
>> + amdgpu_ring_emit_gds_switch(ring, job->vm_id,
>> + job->gds_base, job->gds_size,
>> + job->gws_base, job->gws_size,
>> + job->oa_base, job->oa_size);
>> + }
>> +
>> + if (ring->funcs->patch_cond_exec)
>> + amdgpu_ring_patch_cond_exec(ring, patch_offset);
>> +
>> + /* the double SWITCH_BUFFER here *cannot* be skipped by COND_EXEC */
>> + if (ring->funcs->emit_switch_buffer) {
>> + amdgpu_ring_emit_switch_buffer(ring);
>> + amdgpu_ring_emit_switch_buffer(ring);
>> + }
>> + }
>> return 0;
>> }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index ce6fa03..eb8551b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3134,8 +3134,6 @@ static void gfx_v9_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>> /* sync PFP to ME, otherwise we might get invalid PFP reads */
>> amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
>> amdgpu_ring_write(ring, 0x0);
>> - /* Emits 128 dw nop to prevent CE access VM before vm_flush finish */
>> - amdgpu_ring_insert_nop(ring, 128);
> If this isn't needed any more you should remove that from gfx8 and gfx7 as well.
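>
> For gfx8 that should just be the matching two lines in
> gfx_v8_0_ring_emit_vm_flush() (quoting from memory, the exact comment
> wording may differ, so please double check):
>
> 	-	/* Emits 128 dw nop to prevent CE access VM before vm_flush finish */
> 	-	amdgpu_ring_insert_nop(ring, 128);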
>
> Regards,
> Christian.
>
>> }
>> }
>>
>> @@ -3629,15 +3627,24 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
>> .get_rptr = gfx_v9_0_ring_get_rptr_gfx,
>> .get_wptr = gfx_v9_0_ring_get_wptr_gfx,
>> .set_wptr = gfx_v9_0_ring_set_wptr_gfx,
>> - .emit_frame_size =
>> - 20 + /* gfx_v9_0_ring_emit_gds_switch */
>> - 7 + /* gfx_v9_0_ring_emit_hdp_flush */
>> - 5 + /* gfx_v9_0_ring_emit_hdp_invalidate */
>> - 8 + 8 + 8 +/* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
>> - 7 + /* gfx_v9_0_ring_emit_pipeline_sync */
>> - 128 + 66 + /* gfx_v9_0_ring_emit_vm_flush */
>> - 2 + /* gfx_v9_ring_emit_sb */
>> - 3, /* gfx_v9_ring_emit_cntxcntl */
>> + .emit_frame_size = /* totally 242 maximum if 16 IBs */
>> + 5 + /* COND_EXEC */
>> + 7 + /* PIPELINE_SYNC */
>> + 46 + /* VM_FLUSH */
>> + 8 + /* FENCE for VM_FLUSH */
>> + 20 + /* GDS switch */
>> + 4 + /* double SWITCH_BUFFER,
>> + the first COND_EXEC jump to the place just
>> + prior to this double SWITCH_BUFFER */
>> + 5 + /* COND_EXEC */
>> + 7 + /* HDP_flush */
>> + 4 + /* VGT_flush */
>> + 14 + /* CE_META */
>> + 31 + /* DE_META */
>> + 3 + /* CNTX_CTRL */
>> + 5 + /* HDP_INVL */
>> + 8 + 8 + /* FENCE x2 */
>> + 2, /* SWITCH_BUFFER */
>> .emit_ib_size = 4, /* gfx_v9_0_ring_emit_ib_gfx */
>> .emit_ib = gfx_v9_0_ring_emit_ib_gfx,
>> .emit_fence = gfx_v9_0_ring_emit_fence,
>