<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<p style="font-family:Arial;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - General]<br>
</p>
<br>
<div>
<div><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">Christian, firmware has nothing to do with it and doesn't control it. That was the wrong group of people to ping.
It's only implemented in the SPI and tested by the SPI team and the PAL team.</span></div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>
<div class="elementToProof"><span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">Marek</span></div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Koenig, Christian &lt;Christian.Koenig@amd.com&gt;<br>
<b>Sent:</b> December 8, 2023 09:38<br>
<b>To:</b> Friedrich Vock &lt;friedrich.vock@gmx.de&gt;; amd-gfx@lists.freedesktop.org &lt;amd-gfx@lists.freedesktop.org&gt;<br>
<b>Cc:</b> Deucher, Alexander &lt;Alexander.Deucher@amd.com&gt;; Olsak, Marek &lt;Marek.Olsak@amd.com&gt;<br>
<b>Subject:</b> Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">On 08.12.23 at 12:43, Friedrich Vock wrote:<br>
> On 08.12.23 10:51, Christian König wrote:<br>
>> Well, long story short, Alex and I have been digging up the<br>
>> documentation for this and as far as we can tell this isn't correct.<br>
> Huh. I initially talked to Marek about this, adding him in Cc.<br>
<br>
Yeah, from the userspace side all you need to do is to set the bit as <br>
far as I can tell.<br>
<br>
>><br>
>> You need to do quite a bit more before you can turn on this feature.<br>
>> What userspace side do you refer to?<br>
> I was referring to the Mesa merge request I made<br>
> (<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462</a>).<br>
> If/When you have more details about what else needs to be done, feel<br>
> free to let me know.<br>
<br>
For example, the hardware specification explicitly states that the <br>
kernel driver should make sure that only one app/queue is using this at <br>
the same time. That might work for now since we should only have a <br>
single compute priority queue, but we are not 100% sure yet.<br>
<br>
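To illustrate the kind of bookkeeping that constraint implies, here is a<br>
minimal sketch in plain C. It is not amdgpu code: the struct and function<br>
names are made up for illustration, and it leaves out the locking a real<br>
driver would need.<br>
<pre>
```c
/*
 * Hypothetical sketch, not amdgpu code: let at most one queue own
 * tunneling at any time. All names here are invented for illustration.
 */
struct tunnel_owner {
	int owner_queue_id;	/* -1 while no queue uses tunneling */
};

/* Returns 1 if queue_id may use tunneling, 0 if another queue owns it. */
static int tunnel_try_acquire(struct tunnel_owner *t, int queue_id)
{
	if (t->owner_queue_id == -1) {
		t->owner_queue_id = queue_id;
		return 1;
	}
	return t->owner_queue_id == queue_id;
}

/* Gives tunneling up again; only the current owner can release it. */
static void tunnel_release(struct tunnel_owner *t, int queue_id)
{
	if (t->owner_queue_id == queue_id)
		t->owner_queue_id = -1;
}
```
</pre>
With a single high-priority compute queue the acquire always succeeds,<br>
which is why the current setup might already satisfy the requirement.<br>
<br>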
Apart from that, the hardware documentation only says that it's a <br>
nice-to-have feature, and when we pinged firmware engineers to get more <br>
information they didn't know the feature immediately either.<br>
<br>
That is usually a strong indicator that stuff was implemented in the <br>
hardware, but not fully completed and tested by the firmware team and <br>
validation team.<br>
<br>
Alex and I need to confirm that this feature actually works the way it <br>
should and that it's validated/stable/ready for production use.<br>
<br>
Regards,<br>
Christian.<br>
<br>
> I'm happy to expand this to add the rest of what's needed as well.<br>
><br>
> Thanks,<br>
> Friedrich<br>
><br>
>><br>
>> Regards,<br>
>> Christian.<br>
>><br>
>> On 08.12.23 at 09:19, Friedrich Vock wrote:<br>
>>> Friendly ping on this one.<br>
>>> Userspace side got merged, so would be great to land this patch too :)<br>
>>><br>
>>> On 02.12.23 01:17, Friedrich Vock wrote:<br>
>>>> This improves latency if the GPU is already busy with other work.<br>
>>>> This is useful for VR compositors that submit highly latency-sensitive<br>
>>>> compositing work on high-priority compute queues while the GPU is busy<br>
>>>> rendering the next frame.<br>
>>>><br>
>>>> Userspace merge request:<br>
>>>> <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462</a><br>
>>>><br>
>>>> Signed-off-by: Friedrich Vock &lt;friedrich.vock@gmx.de&gt;<br>
>>>> ---<br>
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +<br>
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 10 ++++++----<br>
>>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ++-<br>
>>>> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-<br>
>>>> 4 files changed, 11 insertions(+), 6 deletions(-)<br>
>>>><br>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
>>>> index 9505dc8f9d69..4b923a156c4e 100644<br>
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
>>>> @@ -790,6 +790,7 @@ struct amdgpu_mqd_prop {<br>
>>>> uint64_t eop_gpu_addr;<br>
>>>> uint32_t hqd_pipe_priority;<br>
>>>> uint32_t hqd_queue_priority;<br>
>>>> + bool allow_tunneling;<br>
>>>> bool hqd_active;<br>
>>>> };<br>
>>>><br>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c<br>
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c<br>
>>>> index 231d49132a56..4d98e8879be8 100644<br>
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c<br>
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c<br>
>>>> @@ -620,6 +620,10 @@ static void amdgpu_ring_to_mqd_prop(struct<br>
>>>> amdgpu_ring *ring,<br>
>>>> struct amdgpu_mqd_prop *prop)<br>
>>>> {<br>
>>>> struct amdgpu_device *adev = ring->adev;<br>
>>>> + bool is_high_prio_compute = ring->funcs->type ==<br>
>>>> AMDGPU_RING_TYPE_COMPUTE &&<br>
>>>> + amdgpu_gfx_is_high_priority_compute_queue(adev, ring);<br>
>>>> + bool is_high_prio_gfx = ring->funcs->type ==<br>
>>>> AMDGPU_RING_TYPE_GFX &&<br>
>>>> + amdgpu_gfx_is_high_priority_graphics_queue(adev, ring);<br>
>>>><br>
>>>> memset(prop, 0, sizeof(*prop));<br>
>>>><br>
>>>> @@ -637,10 +641,8 @@ static void amdgpu_ring_to_mqd_prop(struct<br>
>>>> amdgpu_ring *ring,<br>
>>>> */<br>
>>>> prop->hqd_active = ring->funcs->type == AMDGPU_RING_TYPE_KIQ;<br>
>>>><br>
>>>> - if ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE &&<br>
>>>> - amdgpu_gfx_is_high_priority_compute_queue(adev, ring)) ||<br>
>>>> - (ring->funcs->type == AMDGPU_RING_TYPE_GFX &&<br>
>>>> - amdgpu_gfx_is_high_priority_graphics_queue(adev, ring))) {<br>
>>>> + prop->allow_tunneling = is_high_prio_compute;<br>
>>>> + if (is_high_prio_compute || is_high_prio_gfx) {<br>
>>>> prop->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_HIGH;<br>
>>>> prop->hqd_queue_priority = <br>
>>>> AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM;<br>
>>>> }<br>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
>>>> index c8a3bf01743f..73f6d7e72c73 100644<br>
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
>>>> @@ -6593,7 +6593,8 @@ static int gfx_v10_0_compute_mqd_init(struct<br>
>>>> amdgpu_device *adev, void *m,<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, ENDIAN_SWAP, 1);<br>
>>>> #endif<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);<br>
>>>> - tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH, 0);<br>
>>>> + tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH,<br>
>>>> + prop->allow_tunneling);<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1);<br>
>>>> mqd->cp_hqd_pq_control = tmp;<br>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c<br>
>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c<br>
>>>> index c659ef0f47ce..bdcf96df69e6 100644<br>
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c<br>
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c<br>
>>>> @@ -3847,7 +3847,8 @@ static int gfx_v11_0_compute_mqd_init(struct<br>
>>>> amdgpu_device *adev, void *m,<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, RPTR_BLOCK_SIZE,<br>
>>>> (order_base_2(AMDGPU_GPU_PAGE_SIZE / 4) - 1));<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);<br>
>>>> - tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH, 0);<br>
>>>> + tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH,<br>
>>>> + prop->allow_tunneling);<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);<br>
>>>> tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1);<br>
>>>> mqd->cp_hqd_pq_control = tmp;<br>
>>>> -- <br>
>>>> 2.43.0<br>
>>>><br>
>><br>
<br>
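For reference, the logic of the quoted patch can be sketched in isolation<br>
as follows. This is a standalone illustration, not the driver code: the<br>
EX_/ex_ names are hypothetical, and the bit position used for the field is<br>
made up rather than taken from the CP_HQD_PQ_CONTROL register definition.<br>
<pre>
```c
/*
 * Illustrative only: the field position below is an assumption and does
 * not come from the real CP_HQD_PQ_CONTROL layout.
 */
#define EX_TUNNEL_DISPATCH_SHIFT 4u
#define EX_TUNNEL_DISPATCH_MASK (1u << EX_TUNNEL_DISPATCH_SHIFT)

enum ex_ring_type { EX_RING_GFX, EX_RING_COMPUTE };

/* Mirrors the patch: tunneling only for high-priority compute rings. */
static int ex_allow_tunneling(enum ex_ring_type type, int is_high_prio)
{
	return type == EX_RING_COMPUTE && is_high_prio;
}

/* Clear the one-bit field, then set it again if tunneling is allowed. */
static unsigned int ex_set_tunnel_dispatch(unsigned int reg, int allow)
{
	reg &= ~EX_TUNNEL_DISPATCH_MASK;
	if (allow)
		reg |= EX_TUNNEL_DISPATCH_MASK;
	return reg;
}
```
</pre>
High-priority graphics rings still get the raised pipe and queue<br>
priority in the patch, but the tunneling bit stays cleared for them.<br>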
</div>
</span></font></div>
</div>
</body>
</html>