"Fixes" for page flipping under PRIME on AMD & nouveau

Michel Dänzer michel at daenzer.net
Mon Aug 29 03:16:36 UTC 2016


On 27/08/16 04:57 AM, Mario Kleiner wrote:
> On 08/18/2016 04:23 AM, Michel Dänzer wrote:
>> On 18/08/16 01:12 AM, Mario Kleiner wrote:
> 
> One thing that confuses me so far is that visual results and measurements
> suggest it works nicely, properly serializing the rendering/detiling
> blit and the pageflip. But when I ftrace the Intel driver's
> reservation_object_wait_timeout_rcu() call, where it normally waits for
> the dmabuf fence to complete, I never see it blocking for more than
> a few dozen microseconds, and I couldn't find any other place where it
> blocks on detiling blit completion yet. IOW, it seems to work correctly
> in practice, but I don't know where it actually blocks.

It actually doesn't work correctly in all cases yet:
https://bugs.freedesktop.org/show_bug.cgi?id=95472


>>> It turns out that PRIME + page flipping currently doesn't work
>>> on nouveau and AMD. The first offload rendered images from
>>> the imported dmabufs show up properly, but then the display
>>> is stuck alternating between the first two or three rendered
>>> frames.
>>>
>>> The problem is that during the pageflip ioctl we pin the
>>> dmabuf into VRAM in preparation for scanout, then unpin it
>>> when we are done with it at next flip, but the buffer stays
>>> in the VRAM memory domain.
>>
>> Sounds like you found a bug here: BOs which are being shared between
>> different GPUs should always be pinned to GTT; moving them to VRAM (and
>> consequently the page flip) should fail.
> 
> Seems so, although I hoped I was fixing a bug, not exploiting a
> loophole. In practice I haven't observed trouble with the hack so far. I
> haven't looked deeply enough into how the DMA API below dmabuf
> operates, so this is just guesswork, but I suspect the reason this
> doesn't blow up in an obvious way is that when the render offload GPU
> exports the dmabuf, its pages get pinned/locked into system RAM, so
> they can't move around or get paged out to swap as long as the
> dmabuf stays exported. When the AMD or nouveau display GPU importing
> the dmabuf then moves the BO from GTT to VRAM (or pseudo-moves it back
> with my hack), all that changes is a pin refcount on the RAM pages, but
> that refcount always stays non-zero, so system RAM is never freed or
> moved around during the session. Couldn't this bug somehow be turned
> into a proper feature?

I'm afraid not; BOs which are being shared between devices are supposed
to be pinned to GTT, and pinned BOs aren't supposed to move.
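The two rules stated above (shared BOs are pinned to GTT; pinned BOs don't move) can be illustrated with a toy model of a buffer object's pin logic. This is a minimal sketch under stated assumptions: `struct toy_bo`, `toy_bo_pin()`, `toy_bo_move()` and the error codes chosen are all hypothetical and are not the actual radeon, amdgpu, nouveau or TTM code. It only shows why a page flip that needs a shared BO in VRAM would be expected to fail rather than migrate the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical memory domains; names are illustrative only. */
enum toy_domain { TOY_DOMAIN_GTT, TOY_DOMAIN_VRAM };

struct toy_bo {
    enum toy_domain domain; /* where the backing storage currently lives */
    unsigned pin_count;     /* > 0 means the BO must not move */
    bool shared;            /* exported/imported via dma-buf */
};

/* Move a BO to another domain; refused while the BO is pinned. */
static int toy_bo_move(struct toy_bo *bo, enum toy_domain dst)
{
    if (bo->domain == dst)
        return 0;
    if (bo->pin_count > 0)
        return -EBUSY; /* pinned BOs aren't supposed to move */
    bo->domain = dst;
    return 0;
}

/* Pin a BO in a domain, enforcing that shared BOs stay in GTT. */
static int toy_bo_pin(struct toy_bo *bo, enum toy_domain dst)
{
    int ret;

    if (bo->shared && dst != TOY_DOMAIN_GTT)
        return -EINVAL; /* e.g. pinning for scanout in VRAM fails here */
    ret = toy_bo_move(bo, dst);
    if (ret)
        return ret;
    bo->pin_count++;
    return 0;
}

static void toy_bo_unpin(struct toy_bo *bo)
{
    assert(bo->pin_count > 0); /* unbalanced unpin is a bug */
    bo->pin_count--;
}
```

In this model a flip on a shared, GTT-pinned BO would report an error instead of silently leaving a stale copy in VRAM, while a non-shared BO can still be pinned to VRAM and is only allowed to move again once it has been unpinned.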

However, something similar to your patches could be done in the DDX
drivers, using the dedicated scanout pixmap mechanism.
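One plausible shape for such a DDX-side approach, sketched under assumptions: the display driver keeps two scanout buffers of its own in VRAM, copies the latest contents of the shared (GTT) buffer into the back one, and flips between its own buffers, so the shared BO is never moved. The names below (`struct toy_scanout`, `toy_present()`, `TOY_FB_BYTES`) are hypothetical and do not correspond to the real xf86-video-ati/amdgpu code; `memcpy` stands in for what would be a GPU blit that must wait on the render GPU's fence.

```c
#include <assert.h>
#include <string.h>

#define TOY_FB_BYTES 16 /* tiny stand-in for a framebuffer */

/* Hypothetical: the dma-buf shared with the render GPU, kept in GTT. */
struct toy_shared_fb {
    unsigned char pixels[TOY_FB_BYTES];
};

/* Hypothetical: two scanout buffers private to the display GPU (VRAM). */
struct toy_scanout {
    unsigned char buf[2][TOY_FB_BYTES];
    int front; /* index of the buffer currently being scanned out */
};

/* Copy the latest shared frame into the back buffer, then flip to it.
 * The shared BO itself is never pinned to or moved into VRAM. */
static void toy_present(struct toy_scanout *s,
                        const struct toy_shared_fb *shared)
{
    int back;

    assert(s->front == 0 || s->front == 1);
    back = 1 - s->front;
    /* A real driver would do a GPU copy after the render fence signals. */
    memcpy(s->buf[back], shared->pixels, TOY_FB_BYTES);
    s->front = back; /* the "page flip" */
}

static const unsigned char *toy_displayed(const struct toy_scanout *s)
{
    return s->buf[s->front];
}
```

Because every frame is copied before the flip, the displayed buffer always reflects the most recent render, which is exactly what the pin-to-VRAM approach failed to guarantee. The cost is one extra copy per frame.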


>> The latest versions of DCE support scanning out from GTT, so that might
>> be a good solution at least for Carrizo and newer APUs, not sure it
>> makes sense for dGPUs though.
> 
> That would be good to have. But does that mean DCE-11 or later only? What
> is the constraint on older parts? Does it need contiguous memory?

Presumably. Anyway, from Christian's description it sounds like it'll be
tricky to get this working even with current APUs. :(


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

