[RFC PATCH 2/2] drm/amdgpu/uvd: Ensure vcpu bos are within the uvd segment

Mon May 5 09:02:12 UTC 2025

> Simply changing the uvd vcpu bo (and therefore the firmware) to always
> be allocated in vram does *not* solve #3851.
> 
> Let me go into a bit of depth about how I arrived at this patch.
> 
> First, what sort of system configuration changes result in the uvd init
> failure?  It looks like having a display connected and changing the BAR
> size have an impact.  Next, which kernel change reliably triggers the
> issue?  The change is the switch to the buddy allocator.

Well, that is not a resizable BAR; rather, the "VRAM" is just stolen system memory and we completely bypass the BAR to access it.

But the effect is the same, i.e. more memory is CPU accessible than otherwise.

> 
> Now that the issue can be reliably triggered, where does the error code,
> -110 / -ETIMEDOUT, come from?  It turns out it's in
> amdgpu_uvd_ring_test_ib(), specifically a timeout while waiting on the
> ring's fence.
> 
> With that out of the way, what allocator-related change happens when a
> display is connected at startup?  The 'stolen_vga_memory' and related
> bos are created.  Adding a one page dummy bo to the same place in the
> driver can allow a headless configuration to now pass the uvd ring ib test.
> 
> Why does having these extra objects allocated result in a change in
> behavior?  Well, the switch to the buddy allocator drastically changes
> *where* in vram various objects end up being placed.  What about the BAR
> size change?  That ends up influencing where the objects are placed too.
> 
> Which objects related to uvd end up being moved around?  The uvd code
> has a function to force its objects into a specific segment after all.
> Well, it turns out the vcpu bo doesn't go through this function and is
> therefore being moved around.

That function is there because independent buffers (the message and the feedback buffers, for example) need to be in the same 256MB segment.
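
As a rough sketch of that constraint (the helper name is hypothetical and not taken from the driver): two addresses sit in the same 256MB segment exactly when everything from bit 28 upwards matches.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 << 28 == 256MB, so bits 28 and up select the segment. */
#define SEGMENT_SHIFT 28

/*
 * Hypothetical helper, for illustration only: returns true when both
 * addresses fall into the same 256MB aligned segment.
 */
static bool same_256mb_segment(uint64_t addr_a, uint64_t addr_b)
{
	return (addr_a >> SEGMENT_SHIFT) == (addr_b >> SEGMENT_SHIFT);
}
```

So 0x0FFFFFFF and 0x10000000 are adjacent bytes, but they already belong to different segments.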

> When the system configuration results in a ring ib timeout, the uvd vcpu
> bo is pinned *outside* the uvd segment.  When uvd init succeeds, the uvd
> vcpu bo is pinned *inside* the uvd segment.
> 
> So, it appears there's a relationship between *where* the vcpu bo ends
> up and the fence timeout.  But why does the issue manifest as a ring
> fence timeout while testing the ib?  Unfortunately, I'm unable to find
> something like a datasheet or developer's guide containing the finer
> details of uvd.

Mhm, there must be something wrong with programming bits 28-31 of the VCPU BO base address.

Forcing the VCPU BO into the first 256MB segment just makes those bits zero, which is why it works on your system.
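
To illustrate what I mean by those bits, here is a minimal sketch (the helper name is made up for this mail, it is not the driver code): bits 28-31 are just the address shifted down by 28 and masked to four bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: extract bits 28-31 of a
 * GPU address, i.e. the 256MB segment index within the low 4GB.
 */
static uint32_t addr_bits_28_31(uint64_t gpu_addr)
{
	return (uint32_t)((gpu_addr >> 28) & 0xF);
}
```

Any address below 0x10000000 gives 0 here, so a bug in how a non-zero value is programmed would never show up as long as the BO stays in the first segment.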

The problem is that this is basically just a coincidence. On other systems the base address can be completely different.

See function uvd_v4_2_mc_resume(), where the mmUVD_LMI_ADDR_EXT and mmUVD_LMI_EXT40_ADDR registers are programmed. Try hacking those two register writes to see if the values really end up in the HW.

I will try to find a Kaveri system which is still working to reproduce the issue.

Thanks,
Christian.

> 
> Well, what seems related in the code?  Where is the ring fence located?
> It's placed inside the vcpu bo by amdgpu_fence_driver_start_ring().
> 
> So, does this patch provide the correct solution to the problem?  Maybe
> not.  But the solution seems plausible enough to at least send in the
> patch for review.
> 
> Thanks,
> John