"Fixes" for page flipping under PRIME on AMD & nouveau
Mario Kleiner
mario.kleiner.de at gmail.com
Fri Aug 26 20:07:17 UTC 2016
On 08/18/2016 04:32 AM, Michel Dänzer wrote:
> On 18/08/16 08:51 AM, Mario Kleiner wrote:
>>
>> That's what the ati-ddx/amdgpu-ddx does at the moment, as it detects the
>> mismatch in tiling flags and uses the DRI3/Present copy path instead of
>> the pageflip path. The problem is that the server's Present
>> implementation doesn't request a vsync'ed start of the copy operation [...]
>
> It waits for vblank before starting the copy.
>
Yes, a vblank event triggers present_execute in the server. But the
latency from vblank event dispatch to the copy command packet hitting
the GPU is still far too high to avoid tearing. I tried again and
couldn't find a single Intel/AMD/NVIDIA GPU here that doesn't tear, more
or less badly depending on load, with DRI3/Present copy swaps. Even
TearFree wouldn't be good enough for my kind of applications, as crucial
timing/timestamps could still frequently be off by at least 1 frame.
>
>> There is this other approach from NVIDIA's Alex Goins for their
>> proprietary driver, whose patches landed in the X-Server 1.19 master
>> branch a couple of weeks ago. I haven't read his patches in detail yet,
>> and so far I couldn't successfully test them with the reference
>> implementation in modesetting ddx 1.19. AFAIK there the display GPU
>> exports a pair of scanout-friendly, page-flipping-compatible dmabufs (I
>> assume linear, contiguous, accessible by the display engines),
>
> FWIW, that wouldn't be possible with our "older" GPUs which can't scan
> out from GTT: A BO can be either shared with another GPU or scanout
> friendly, not both at the same time.
>
Ok, good to know.
>
>> and the offload GPU imports those and renders into them. That saves
>> one extra copy, so it should be somewhat more efficient.
>
> Using two shared buffers actually isn't as efficient as possible wrt
> inter-GPU bandwidth.
>
Out of interest, why? You'd have only one detiling copy VRAM -> RAM? Or
is it that switching some kind of GTT mappings between two buffers is
inefficient?
>
>> Setting it up seems to be more involved and less flexible though. So far
>> I couldn't make it work here for testing. Maybe bugs, maybe mistakes on
>> my side, maybe I just have the wrong hardware for it.
>
> Yeah, my impression has been it's a rather complicated solution geared
> towards the Intel iGPU + proprietary nVidia use case.
>
>
Setting up output source/output sink is not fun, as I have now learned:
it is rather clumsy and complex compared to render offload. I hope the
real thing will come with some fool-proof one-click setup GUI, otherwise
I don't have great hopes, given the technical skill level of my users. I
still haven't managed to get it working, not even with the new NVIDIA
proprietary beta drivers on a real Optimus laptop.
-mario
More information about the dri-devel mailing list