<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<style type="text/css" style="">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Arial,Helvetica,sans-serif">
<p>1. Windows separates the vm flush and the gfx submit, so with a vm flush Windows actually submits 256 DWs in total (128 for the vm flush and 128 for the gfx submit).</p>
<p>2. For the gfx submit frame: Windows always emits a pipeline sync (named "wait_3d_idle" on Windows).<br>
</p>
<p>3. For the vm flush frame: Windows always emits a pipeline sync (named "drain_pfp_to_EOP" on Windows) prior to the PD update and the following vm invalidate.<br>
</p>
<p><br>
</p>
<p>><font size="2"><span style="font-size:10pt">But why doesn't this happen on Windows then?</span></font></p>
<p><font size="2"><span style="font-size:10pt"><br>
</span></font></p>
<p><font size="2"><span style="font-size:10pt">[ML] good question, I guess the answer is windows always insert pipeline sync.</span></font></p>
<p><font size="2"><span style="font-size:10pt"><br>
</span></font></p>
<p><font size="2"><span style="font-size:10pt">for the case without vm flush: a pipeline sync in gfx submit can prevent ME ruin SH/3d register before previous pipeline finished (DE ib will programing lots of 3d/sh register; and CE ib will also change many resource
descriptor if there is no context switch this frame)</span></font></p>
<p><font size="2"><span style="font-size:10pt"><br>
</span></font></p>
<p><font size="2"><span style="font-size:10pt">for the case with a vm flush: a pipeline sync before vm invalidate can also prevent ME invalidate VM before previous pipeline finished.</span></font></p>
<p><font size="2"><span style="font-size:10pt">I think a vm invalidate will trigger VM fault if 3d pipeline is working on the invalidated VMID *or* invalidated PD address.</span></font></p>
<p><font size="2"><span style="font-size:10pt"><br>
</span></font></p>
<p><font size="2"><span style="font-size:10pt">so I think we should move pipeline sync out of VM flush next step, and always put it prior to IB as well as vm invalidate
<br>
</span></font></p>
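<p><font size="2"><span style="font-size:10pt">To make the proposed ordering concrete, here is a minimal sketch (my own illustration in plain C, not the actual amdgpu code; the emit_op names and the logging are hypothetical) that records emissions so the order can be checked:</span></font></p>

```c
/* Hypothetical sketch, not the actual amdgpu code: the proposed submit
 * order with the pipeline sync pulled out of the VM flush and emitted
 * once, before both the vm invalidate and the IB. Emissions are recorded
 * in a log so the ordering can be verified. */
enum emit_op { PIPELINE_SYNC, PD_UPDATE, VM_INVALIDATE, EMIT_IB };

static enum emit_op emit_log[8];
static int emit_count;

static void emit(enum emit_op op)
{
    emit_log[emit_count++] = op;
}

static void submit_frame(int vm_needs_flush)
{
    emit_count = 0;
    emit(PIPELINE_SYNC);        /* always, like Windows' wait_3d_idle /
                                 * drain_pfp_to_EOP */
    if (vm_needs_flush) {
        emit(PD_UPDATE);        /* update the page directory first */
        emit(VM_INVALIDATE);    /* safe: the previous pipeline has drained */
    }
    emit(EMIT_IB);              /* safe for CE/DE register programming */
}
```

<p><font size="2"><span style="font-size:10pt">With a vm flush the emitted order is sync, PD update, invalidate, IB; without one it is just sync then IB, so the sync always precedes both the invalidate and the IB.</span></font></p>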
<p><font size="2"><span style="font-size:10pt"><br>
</span></font></p>
<p><font size="2"><span style="font-size:10pt">BR Monk<br>
</span></font></p>
<p><font size="2"><span style="font-size:10pt"><br>
</span></font></p>
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>发件人:</b> Christian König <deathsimple@vodafone.de><br>
<b>发送时间:</b> 2017年1月20日 16:48:17<br>
<b>收件人:</b> Liu, Monk; amd-gfx@lists.freedesktop.org<br>
<b>主题:</b> Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">Yeah, thought about that possibility as well when we change the padding
<br>
to 256.<br>
<br>
This happens every time our command submission uses more than 128 DWs; in<br>
that case you obviously have less than 128 DWs of padding.<br>
<br>
But why doesn't this happen on Windows then?<br>
<br>
Regards,<br>
Christian.<br>
<br>
Am 19.01.2017 um 15:49 schrieb Liu, Monk:<br>
&gt; And thinking about it more: even if the very first UMD cmd starts with wptr at a 256/512 position (because the IB test ends up at a 255-aligned position), your approach still cannot guarantee 128 DWs between SWITCH_BUFFER and the preamble IB (if no vm_flush) or 128 NOPs before the VM_flush (if needed) for the next frame.<br>
><br>
&gt; E.g. we insert 180 DWs this frame, starting from 0 and ending at 179; ring_commit() pads NOPs and makes wptr end up at 255, so only 75 NOPs sit between SWITCH_BUFFER and the next frame (assuming we don't have a vm flush next frame).<br>
><br>
> BR Monk<br>
><br>
> -----Original Message-----<br>
> From: Liu, Monk<br>
> Sent: Thursday, January 19, 2017 10:39 PM<br>
&gt; To: 'Christian König' &lt;deathsimple@vodafone.de&gt;; amd-gfx@lists.freedesktop.org<br>
> Subject: RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)<br>
><br>
> Christian<br>
><br>
&gt; I just found one issue in your previous patch (the one where you removed inserting 128 NOPs before vm_flush and meanwhile set the align_mask of the GFX ring to 0xff).<br>
><br>
&gt; Because I checked amdgpu_ring_commit() again: this function does not guarantee that the current submit size is aligned to 256 DWs; it only guarantees that ring->wptr will be 256-DW aligned after amdgpu_ring_commit().<br>
><br>
&gt; So that means your approach cannot make sure there will be 128 NOPs before vm_flush at all!<br>
><br>
&gt; E.g. this frame starts at position 200 of the RB and submits 50 DWs, so it ends at position 249; you call amdgpu_ring_commit() now and it only pads 6 more NOPs into the ring buffer, so wptr finally ends at 255. That is clearly not what we originally wanted: we want to insert 128 NOPs after SWITCH_BUFFER (or say, before vm_flush), but you only add 6 NOPs.<br>
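[Editorial note] To illustrate the alignment behavior described above, here is a small sketch (my own helper, not the kernel's actual code) of padding that aligns wptr rather than the submit size:<br>

```c
#include <stdint.h>

/* Hypothetical helper, not the actual kernel code: the number of NOPs a
 * ring_commit-style padding adds so that wptr lands on the next
 * (align_mask + 1)-DW boundary. Note it aligns wptr itself, not the size
 * of the current submission. */
static uint32_t commit_padding(uint32_t align_mask, uint32_t wptr)
{
    return (align_mask + 1 - (wptr & align_mask)) & align_mask;
}
```

With align_mask 0xff a frame ending at wptr 250 only gets 6 NOPs of padding, matching the example above, while a frame ending exactly on a 256-DW boundary gets none.<br>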
><br>
&gt; Your approach only works if wptr == 0 for the first UMD submit.<br>
><br>
> BR Monk<br>
><br>
> -----Original Message-----<br>
&gt; From: amd-gfx [<a href="mailto:amd-gfx-bounces@lists.freedesktop.org">mailto:amd-gfx-bounces@lists.freedesktop.org</a>] On Behalf Of Christian König<br>
> Sent: Thursday, January 19, 2017 5:11 PM<br>
&gt; To: Liu, Monk &lt;Monk.Liu@amd.com&gt;; amd-gfx@lists.freedesktop.org<br>
> Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)<br>
><br>
> Am 18.01.2017 um 12:42 schrieb Monk Liu:<br>
&gt;&gt; previously we always inserted 128 NOPs behind vm_flush, which may lead to<br>
&gt;&gt; a DMA frame size above 256 dw, automatically aligned to 512 dw.<br>
>><br>
&gt;&gt; now we calculate how many DWs are already inserted after vm_flush and make<br>
&gt;&gt; up the rest to pad up to 128 dws before emit_ib.<br>
&gt;&gt; that way we only take 256 dw per submit.<br>
>><br>
>> v2:<br>
>> drop insert_nops in vm_flush<br>
>> re-calculate the estimate frame size for gfx8 ring<br>
>> v3:<br>
>> calculate the gap between vm_flush and IB in cntx_cntl on an member<br>
>> field of ring structure<br>
>> v4:<br>
>> set last_vm_flush_pos even for case of no vm flush.<br>
>><br>
>> Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd<br>
&gt;&gt; Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;<br>
>> ---<br>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +<br>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++<br>
>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++----<br>
>> 3 files changed, 20 insertions(+), 4 deletions(-)<br>
>><br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h<br>
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h<br>
>> index c813cbe..332969f 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h<br>
>> @@ -173,6 +173,7 @@ struct amdgpu_ring {<br>
>> #if defined(CONFIG_DEBUG_FS)<br>
>> struct dentry *ent;<br>
>> #endif<br>
>> + u32 last_vm_flush_pos;<br>
>> };<br>
>> <br>
>> int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff<br>
>> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c<br>
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c<br>
>> index d05546e..53fc5e0 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c<br>
>> @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)<br>
>> amdgpu_vm_ring_has_compute_vm_bug(ring)))<br>
>> amdgpu_ring_emit_pipeline_sync(ring);<br>
>> <br>
&gt;&gt; + /* when there is no vm flush this frame, we still need to mark down<br>
&gt;&gt; + * the position of the tail of hdp_flush, because we still<br>
&gt;&gt; + * need to make sure there are 128 DWs between the last SWITCH_BUFFER and<br>
&gt;&gt; + * the emit_ib this frame. otherwise a VM fault may still occur on the<br>
&gt;&gt; + * constant engine.<br>
&gt;&gt; + */<br>
>> + ring->last_vm_flush_pos = ring->wptr;<br>
>> if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||<br>
>> amdgpu_vm_is_gpu_reset(adev, id))) {<br>
>> struct fence *fence;<br>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c<br>
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c<br>
>> index 5f37313..dde4177 100644<br>
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c<br>
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c<br>
>> @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr,<br>
>> DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0));<br>
>> amdgpu_ring_write(ring, lower_32_bits(seq));<br>
>> amdgpu_ring_write(ring, upper_32_bits(seq));<br>
>> -<br>
>> }<br>
>> <br>
>> static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring<br>
>> *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,<br>
>> /* sync PFP to ME, otherwise we might get invalid PFP reads */<br>
>> amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));<br>
>> amdgpu_ring_write(ring, 0x0);<br>
>> - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */<br>
>> - amdgpu_ring_insert_nop(ring, 128);<br>
>> }<br>
>> + ring->last_vm_flush_pos = ring->wptr;<br>
>> }<br>
>> <br>
>> static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring)<br>
>> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)<br>
>> if (amdgpu_sriov_vf(ring->adev))<br>
>> gfx_v8_0_ring_emit_de_meta_init(ring,<br>
>> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR :<br>
>> ring->adev->virt.csa_vmid0_addr);<br>
>> +<br>
&gt;&gt; + /* We need to pad some NOPs before emit_ib to prevent the CE running ahead of<br>
&gt;&gt; + * vm_flush, which may trigger a VM fault. */<br>
>> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */<br>
>> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr -<br>
>> +ring->last_vm_flush_pos));<br>
&gt; This could easily result in a negative number, couldn't it?<br>
><br>
>> + else<br>
>> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128)<br>
>> + amdgpu_ring_insert_nop(ring,<br>
>> + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos +<br>
>> +ring->wptr));<br>
&gt; I think it would be cleaner if you calculated the number of NOPs needed first for both cases and then checked that the number isn't negative.<br>
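[Editorial note] A minimal sketch of the clamped computation suggested here (my own illustration in plain C, not part of the patch; the function name is hypothetical), computing the wrap-aware gap first and never returning a negative count:<br>

```c
#include <stdint.h>

/* Hypothetical sketch of the cleaner calculation suggested in the review,
 * not the actual patch: first compute how many DWs lie between
 * last_vm_flush_pos and wptr, handling wrap-around past the ring end,
 * then clamp so the NOP count can never go negative. */
static uint32_t vm_flush_pad_nops(uint32_t ptr_mask,
                                  uint32_t last_vm_flush_pos,
                                  uint32_t wptr)
{
    uint32_t gap;

    if (wptr >= last_vm_flush_pos)      /* no wrap to the ring head */
        gap = wptr - last_vm_flush_pos;
    else                                /* wptr wrapped past the ring end */
        gap = ptr_mask + 1 - last_vm_flush_pos + wptr;

    return gap < 128 ? 128 - gap : 0;
}
```

Both branches feed one clamp, so neither case can ask for a negative number of NOPs.<br>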
><br>
>> }<br>
>> <br>
>> static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring,<br>
>> uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = {<br>
>> 7 + /* gfx_v8_0_ring_emit_pipeline_sync */<br>
>> 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */<br>
>> 2 + /* gfx_v8_ring_emit_sb */<br>
>> - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */<br>
>> + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt<br>
>> +flush/meta-data */<br>
> Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here.<br>
><br>
> Apart from that the patch looks good to me, Christian.<br>
><br>
>> + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush<br>
>> +fence/cntx_cntl/vgt_flush/meta-data anymore */<br>
>> .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */<br>
>> .emit_ib = gfx_v8_0_ring_emit_ib_gfx,<br>
>> .emit_fence = gfx_v8_0_ring_emit_fence_gfx,<br>
><br>
> _______________________________________________<br>
> amd-gfx mailing list<br>
> amd-gfx@lists.freedesktop.org<br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
<br>
<br>
</div>
</span></font>
</body>
</html>