[PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

Christian König ckoenig.leichtzumerken at gmail.com
Wed Sep 28 14:46:55 UTC 2022


On 28.09.22 at 15:52, Michel Dänzer wrote:
> On 2022-09-28 03:01, Zhu, Jiadong wrote:
>> Please make sure the UMD is calling the libdrm function to create contexts with different priorities,
>> amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, &context_handle).
> Yes, I double-checked that, and that it returns success.
>
>
>> Here is the behavior we could see:
>> 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named gfx_sw in the previous patch) are visible in UMR. We can check the wptr/rptr to see whether they are being used.
>> 2. MCBP happens when IBs of two different priorities are submitted at the same time. We can check the fence info as below:
>> "Last signaled trailing fence" increments when MCBP is triggered by the KMD. "Last preempted" may not increase, as the MCBP is not triggered from the CP.
>>
>> --- ring 0 (gfx) ---
>> Last signaled fence          0x00000001
>> Last emitted                 0x00000001
>> Last signaled trailing fence 0x0022eb84
>> Last emitted                 0x0022eb84
>> Last preempted               0x00000000
>> Last reset                   0x00000000
> I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled trailing fence" line is changing here (seems to always match the "Last emitted" line).
>
> However, mutter's frame rate still cannot exceed that of GPU-limited clients. BTW, you can test with a GNOME Wayland session, even without my MR referenced below. Preemption will just be less effective without that MR. mutter has used a high priority context when possible for a long time.
>
> I'm also seeing intermittent freezes, where not even the mouse cursor moves for up to around one second, e.g. when interacting with the GNOME top bar. I'm not sure yet if these are related to this patch series, but I never noticed it before. I wonder if the freezes might occur when GPU preemption is attempted.

Keep in mind that this doesn't have the same fine granularity as the 
separate hw ring buffer found on gfx10.

With MCBP we can only preempt at draw command boundaries, while the 
separate hw ring solution can preempt as soon as a CU is available.

>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>
>>> This work is solely for gfx9 (e.g. Vega) and older.
>>>
>>> Navi has a completely separate high priority gfx queue we can use for this.
> Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was for Sienna Cichlid only, and turned out to be unstable, so it had to be reverted.
>
> It would be nice to make the SW ring solution take effect by default whenever there is no separate high priority HW gfx queue available (and any other requirements are met).

I don't think that this will be a good idea. The hw ring buffer or even 
hw scheduler have much nicer properties and we should focus on getting 
that working when available.

Regards,
Christian.

>
>
>> On 27.09.22 at 19:49, Michel Dänzer wrote:
>>> On 2022-09-27 08:06, Christian König wrote:
>>>> Hey Michel,
>>>>
>>>> Jiadong is working on exposing high/low priority gfx queues for gfx9 and older hw generations by using mid command buffer preemption.
>>> Yeah, I've been keeping an eye on these patches. I'm looking forward to this working.
>>>
>>>
>>>> I know that you have been working on GNOME Mutter to make use of this from userspace. Do you have time to run some tests with that?
>>> I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then with it, since I wasn't sure if it's needed) with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.
>>>
>>> Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high priority context via the EGL_IMG_context_priority extension, it's unable to reach a higher frame rate than a GPU-limited client[0]. The "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x00000000.
>>>
>>> Did I miss a step?
>>>
>>>
>>> [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, mutter can achieve a higher frame rate than plot3d, though not than pixmark piano (presumably due to limited GPU preemption granularity).
>
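
For anyone repeating the test, the fence counters quoted above can be compared between two reads of /sys/kernel/debug/dri/0/amdgpu_fence_info with a small script. What follows is only a minimal sketch: it assumes the dump keeps exactly the layout quoted in this thread (a "--- ring N (name) ---" header followed by "Last ..." lines with hex values), and the helper names parse_fence_info and changed_counters are made up for illustration, they are not part of any amdgpu or libdrm tooling. Because the label "Last emitted" appears twice per ring, counters are tracked by position as well as by label.

```python
import re

# Sample taken from the dump quoted in this thread.
SAMPLE = """\
--- ring 0 (gfx) ---
Last signaled fence          0x00000001
Last emitted                 0x00000001
Last signaled trailing fence 0x0022eb84
Last emitted                 0x0022eb84
Last preempted               0x00000000
Last reset                   0x00000000
"""

# Assumed line layouts; adjust if the kernel prints something different.
RING_RE = re.compile(r"^--- ring (\d+) \((.+?)\) ---$")
FIELD_RE = re.compile(r"^(Last [a-z ]+?)\s+0x([0-9a-fA-F]+)$")


def parse_fence_info(text):
    """Return {ring_name: [(label, value), ...]} in file order."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = RING_RE.match(line)
        if m:
            current = rings.setdefault(m.group(2), [])
            continue
        m = FIELD_RE.match(line)
        if m and current is not None:
            current.append((m.group(1), int(m.group(2), 16)))
    return rings


def changed_counters(before, after):
    """Return (ring, position, label) for every counter that differs."""
    changed = []
    for ring, fields in after.items():
        old = before.get(ring, [])
        for pos, ((label, new), (_, prev)) in enumerate(zip(fields, old)):
            if new != prev:
                changed.append((ring, pos, label))
    return changed


if __name__ == "__main__":
    first = parse_fence_info(SAMPLE)
    # Pretend the trailing fence moved between two reads of the file.
    second = parse_fence_info(SAMPLE.replace("0x0022eb84", "0x0022eb90"))
    for ring, pos, label in changed_counters(first, second):
        print(f"{ring}: field {pos} ({label}) changed")
```

On a real system the two snapshots would come from reading the debugfs file twice (as root) while a high and a low priority client are both submitting work. Going by the description above, a changing "Last signaled trailing fence" would indicate MCBP triggered by the KMD, while "Last preempted" may legitimately stay at zero.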


