[RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment
John Olender
john.olender at gmail.com
Thu May 29 23:15:27 UTC 2025
Ping.
On 5/7/25 7:31 AM, John Olender wrote:
> On 5/5/25 12:06 PM, John Olender wrote:
>> On 5/5/25 5:02 AM, Christian König wrote:
>>>> Simply changing the uvd vcpu bo (and therefore the firmware) to always
>>>> be allocated in vram does *not* solve #3851.
>>>>
>>>> Let me go into a bit of depth about how I arrived at this patch.
>>>>
>>>> First, what sort of system configuration changes result in the uvd init
>>>> failure? It looks like having a display connected and changing the BAR
>>>> size have an impact. Next, which kernel change reliably triggers the
>>>> issue? The change is the switch to the buddy allocator.
>>>
>>> Well, that is not resizable BAR; rather, the "VRAM" is just stolen system memory and we bypass the BAR completely to access it.
>>>
>>> But the effect is the same: e.g., you have more CPU-accessible memory than otherwise.
>>>
>>>>
>>>> Now that the issue can be reliably triggered, where does the error code,
>>>> -110 / -ETIMEDOUT, come from? It turns out it's in
>>>> amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
>>>> ring's fence.
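
To save others the lookup while I'm pinging: the -110 is
dma_fence_wait_timeout() returning zero and being converted to
-ETIMEDOUT in amdgpu_uvd_ring_test_ib(). The relevant excerpt, roughly:

	r = dma_fence_wait_timeout(fence, false, timeout);
	if (r == 0)
		r = -ETIMEDOUT;	/* the -110 from the logs */
	else if (r > 0)
		r = 0;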
>>>>
>>>> With that out of the way, what allocator-related change happens when a
>>>> display is connected at startup? The 'stolen_vga_memory' and related
>>>> bos are created. Adding a one-page dummy bo at the same place in the
>>>> driver allows a headless configuration to pass the uvd ring ib test.
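
For anyone reproducing this, the experiment was roughly the following;
dummy_bo and dummy_gpu_addr are made-up names, but
amdgpu_bo_create_kernel() is the real helper:

	struct amdgpu_bo *dummy_bo;	/* hypothetical */
	u64 dummy_gpu_addr;
	int r;

	/* One page in VRAM, created at the same point in init where
	 * stolen_vga_memory would normally be reserved, just to perturb
	 * the buddy allocator's placement decisions. */
	r = amdgpu_bo_create_kernel(adev, PAGE_SIZE, PAGE_SIZE,
				    AMDGPU_GEM_DOMAIN_VRAM,
				    &dummy_bo, &dummy_gpu_addr, NULL);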
>>>>
>>>> Why does having these extra objects allocated result in a change in
>>>> behavior? Well, the switch to the buddy allocator drastically changes
>>>> *where* in vram various objects end up being placed. What about the BAR
>>>> size change? That ends up influencing where the objects are placed too.
>>>>
>>>> Which objects related to uvd end up being moved around? The uvd code
>>>> has a function to force its objects into a specific segment after all.
>>>> Well, it turns out the vcpu bo doesn't go through this function and is
>>>> therefore being moved around.
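
The function I mean is amdgpu_uvd_force_into_uvd_segment(); from memory
it just clamps every allowed placement to the first 256MB window, along
the lines of:

	static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
	{
		int i;

		/* Keep all UVD buffers inside the same 256MB segment. */
		for (i = 0; i < abo->placement.num_placement; ++i) {
			abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
			abo->placements[i].lpfn =
				(256 * 1024 * 1024) >> PAGE_SHIFT;
		}
	}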
>>>
>>> That function is there because independent buffers (the message and the feedback buffer, for example) need to be in the same 256MB segment.
>>>
>>>> When the system configuration results in a ring ib timeout, the uvd vcpu
>>>> bo is pinned *outside* the uvd segment. When uvd init succeeds, the uvd
>>>> vcpu bo is pinned *inside* the uvd segment.
>>>>
>>>> So, it appears there's a relationship between *where* the vcpu bo ends
>>>> up and the fence timeout. But why does the issue manifest as a ring
>>>> fence timeout while testing the ib? Unfortunately, I'm unable to find
>>>> something like a datasheet or developer's guide containing the finer
>>>> details of uvd.
>>>
>>> Mhm, there must be something wrong with programming bits 28-31 of the VCPU BO base address.
>>>
>>> Forcing the VCPU into the first 256MB segment just makes those bits zero and so makes it work on your system.
>>>
>>> The problem is that this is basically just a coincidence. On other systems the base address can be completely different.
>>>
>>> See the function uvd_v4_2_mc_resume(), where the mmUVD_LMI_ADDR_EXT and mmUVD_LMI_EXT40_ADDR registers are programmed. Try hacking those two register writes and see if they really end up in the HW.
>
> Okay, I did a read and compare after each write.
>
> Both writes seem to go through on both the Kaveri and s9150:
>
> Kaveri (512MB UMA Buffer):
> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF41FA00000, addr=0x00000001, wrote 0x00001001, read 0x00001001 [same]
> amdgpu 0000:00:01.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF41FA00000, addr=0x000000F4, wrote 0x800900F4, read 0x800900F4 [same]
>
> s9150:
> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_ADDR_EXT: gpu_addr=0xF7FFA00000, addr=0x0000000F, wrote 0x0000F00F, read 0x0000F00F [same]
> amdgpu 0000:41:00.0: amdgpu: [drm] uvd_v4_2_mc_resume: mmUVD_LMI_EXT40_ADDR: gpu_addr=0xF7FFA00000, addr=0x000000F7, wrote 0x800900F7, read 0x800900F7 [same]
>
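For completeness, the read-back instrumentation inside
uvd_v4_2_mc_resume() was along these lines (variable names approximate;
the mmUVD_LMI_EXT40_ADDR write was checked the same way):

	u64 ext = (adev->uvd.inst->gpu_addr >> 28) & 0xF;
	u32 val = (ext << 12) | ext;	/* bits 28-31, twice */
	u32 rb;

	WREG32(mmUVD_LMI_ADDR_EXT, val);
	rb = RREG32(mmUVD_LMI_ADDR_EXT);
	dev_info(adev->dev,
		 "[drm] %s: mmUVD_LMI_ADDR_EXT: gpu_addr=0x%llX, addr=0x%08llX, wrote 0x%08X, read 0x%08X [%s]\n",
		 __func__, adev->uvd.inst->gpu_addr, ext, val, rb,
		 val == rb ? "same" : "different");
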
I've also confirmed the patch works fine when segments other than
[0, 256M) are used. E.g., both init and VA-API playback work with a
UVD segment of [1792M, 2048M) on Kaveri with a 2G UMA buffer.
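
Concretely, that test just meant changing the placement window in the
clamping loop shown earlier; a sketch, with the constants hardcoded for
illustration:

	/* hypothetical test hack: clamp placements to [1792M, 2048M)
	 * instead of the usual [0, 256M) window */
	abo->placements[i].fpfn = (1792ULL << 20) >> PAGE_SHIFT;
	abo->placements[i].lpfn = (2048ULL << 20) >> PAGE_SHIFT;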
> Thanks,
> John
>
>>>
>>> I will try to find a Kaveri system which is still working to reproduce the issue.
>>>
>>> Thanks,
>>> Christian.
>>>
>>
>> I first saw this issue with a s9150. I had serious reservations about
>> reporting the issue because, in its default configuration, the s9150 has
>> no display output. I needed to establish that yes, this is a real
>> issue, and that I hadn't just shot myself in the foot by enabling
>> broken display hardware.
>>
>> The issue affects all s9150s in a system, occurs in different slots and
>> NUMA nodes, still occurs when other hardware is added or removed, and
>> follows the s9150 from an X399 system to a significantly newer B650
>> system.
>>
>> The Kaveri iGPU, while also impacted, mainly serves to show that yes,
>> this issue is happening on more than just some dodgy s9150 setup.
>>
>> Anyway, hopefully these extra configuration details help narrow down the
>> problem.
>>
>> Thanks,
>> John
>>
>>>>
>>>> Well, what seems related in the code? Where is the ring fence located?
>>>> It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().
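
Quoting roughly from memory, the UVD special case in
amdgpu_fence_driver_start_ring() puts the fence directly behind the
firmware inside the vcpu bo:

	if (ring->funcs->type != AMDGPU_RING_TYPE_UVD) {
		ring->fence_drv.cpu_addr = ring->fence_cpu_addr;
		ring->fence_drv.gpu_addr = ring->fence_gpu_addr;
	} else {
		/* put fence directly behind firmware */
		index = ALIGN(adev->uvd.fw->size, 8);
		ring->fence_drv.cpu_addr = adev->uvd.inst[ring->me].cpu_addr + index;
		ring->fence_drv.gpu_addr = adev->uvd.inst[ring->me].gpu_addr + index;
	}

So wherever the vcpu bo lands, the fence memory the ib test waits on
lands there too.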
>>>>
>>>> So, does this patch provide the correct solution to the problem? Maybe
>>>> not. But the solution seems plausible enough to at least send in the
>>>> patch for review.
>>>>
>>>> Thanks,
>>>> John
>>
>