"Fixes" for page flipping under PRIME on AMD & nouveau

Christian König deathsimple at vodafone.de
Wed Aug 17 16:27:10 UTC 2016


> AMD uses copy swaps because radeon/amdgpu kms can't switch the
> scanout mode from tiled to linear on the fly during flips.
Well, I'm not an expert on this, but as far as I know the bigger problem
is that the dedicated AMD hardware generations you are targeting usually
can't reliably scan out from system memory without a rather complicated
setup.

So that is a complete NAK to the radeon changes.

Regards,
Christian.

On 17.08.2016 at 18:12, Mario Kleiner wrote:
> Hi,
>
> i spent some time playing with DRI3/Present + PRIME for testing
> how well it works for Optimus/Enduro style setups wrt. page flipping
> on the current kernel/mesa/xorg. I want page flipping, because
> neuroscience/medical applications need the reliable timing/timestamping
> and tear-free presentation we can currently only get via page
> flipping, not via the copyswap path.
>
> Intel as display gpu + nouveau for render offload worked nicely
> on intel-ddx with page flipping, proper timing, dmabuf fence sync
> and all.
>
> AMD uses copy swaps because radeon/amdgpu kms can't switch the
> scanout mode from tiled to linear on the fly during flips. That's
> a todo in itself. For the moment i used the ati-ddx with Option
> "ColorTiling/ColorTiling2D" "off" to force my pair of old Radeon
> HD-5770's into linear mode so page flipping can be used for
> prime. The current modesetting-ddx will use page flipping in
> any case as it doesn't detect the tiling format mismatch.
>
> nouveau uses page flips.
>
> Turns out that prime + page flipping currently doesn't work
> on nouveau and amd. The first offload rendered images from
> the imported dmabufs show up properly, but then the display
> is stuck alternating between the first two or three rendered
> frames.
>
> The problem is that during the pageflip ioctl we pin the
> dmabuf into VRAM in preparation for scanout, then unpin it
> when we are done with it at next flip, but the buffer stays
> in the VRAM memory domain. Next time we flip to the buffer
> again, the driver skips the DMA copy from GTT to VRAM during
> pinning, because the buffer's content apparently already resides
> in VRAM. Therefore it doesn't update the VRAM copy with the updated
> dmabuf content in system RAM, so freshly rendered frames from the
> prime export/render offload gpu never reach the display gpu and one
> only sees stale images.
>
> The attached patches for nouveau and radeon kms seem to work
> pretty ok: page flipping works, the display updates tear-free,
> dmabuf fence sync works, and onset timing/timestamping is correct.
> They simply pin the buffer back into GTT, then unpin, to force
> a move of the buffer into the GTT domain, and thereby force the
> following pin to do a new copy from GTT -> VRAM. The code tries
> to avoid a useless copy from VRAM -> GTT during the pin op.
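
The stale-frame behaviour and the GTT round-trip workaround described in
the quoted text can be illustrated with a toy model. This is only a sketch
under simplifying assumptions, not the actual nouveau/radeon/ttm code: the
names ToyBuffer, flip and run are hypothetical, a single buffer is flipped
repeatedly instead of several, and a "frame" is just an integer.

```python
from dataclasses import dataclass

GTT, VRAM = "GTT", "VRAM"


@dataclass
class ToyBuffer:
    """Hypothetical stand-in for an imported dmabuf-backed buffer object."""
    system_ram: int = 0   # frame last rendered by the render-offload GPU
    vram_copy: int = -1   # frame currently held in the VRAM copy
    domain: str = GTT     # memory domain the driver believes is current

    def render(self, frame: int) -> None:
        # The exporting GPU writes a new frame into system RAM.
        self.system_ram = frame

    def pin(self, target: str) -> None:
        # A move from GTT to VRAM copies the data. If the buffer is already
        # considered to be in VRAM, the copy is skipped (the problematic case).
        if target == VRAM and self.domain != VRAM:
            self.vram_copy = self.system_ram
        self.domain = target

    def unpin(self) -> None:
        # Unpinning leaves the buffer in the domain it was pinned to.
        pass


def flip(buf: ToyBuffer, workaround: bool) -> int:
    """Pin for scanout and return the frame the display would show."""
    if workaround:
        # Pin into GTT and unpin again, so the next VRAM pin must copy.
        buf.pin(GTT)
        buf.unpin()
    buf.pin(VRAM)
    shown = buf.vram_copy
    buf.unpin()
    return shown


def run(workaround: bool, frames: int = 4) -> list:
    buf = ToyBuffer()
    shown = []
    for frame in range(frames):
        buf.render(frame)
        shown.append(flip(buf, workaround))
    return shown
```

In this model, run(workaround=False) returns [0, 0, 0, 0], i.e. the display
keeps showing the first frame, while run(workaround=True) returns
[0, 1, 2, 3]. The model says nothing about the cost of the extra pin or
about whether scanout from such buffers is reliable on real hardware.
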
>
> However, the approach feels very much like a hack, so i assume
> this is not the proper way of doing it? I looked what ttm has
> to offer, but couldn't find anything elegant and obvious. Maybe
> there is a way to evict a bo without actually copying data back
> to RAM? Or to invalidate the VRAM copy as stale? Maybe i just
> missed something, as i'm not very familiar with ttm.
>
> Thoughts or suggestions?
>
> Another insight from my hacks so far is that nouveau seems to
> be fast as prime exporter/renderoffload, but rather slow as
> display gpu/prime importer, as tested on a 2008 or 2009
> MacBookPro dual-Nvidia laptop.
>
> AMD, as tested with dual Radeon HD-5770, seems to be fast as prime
> importer/display gpu, but very slow as prime exporter/render offload,
> e.g., taking 16 msecs to get a 1920x1080 framebuffer into RAM. Seems
> that Mesa's blitImage function is the slow bit here. On r600 it seems
> to draw a textured triangle strip to detile the gpu renderbuffer and
> copy it into GTT. As drawing a textured fullscreen quad is normally
> much faster, something special seems to be going on there wrt. DMA?
> However, i don't have a realistic Enduro test setup with AMD
> iGPU + dGPU, only this cobbled-together pair of HD-5770's in a MacPro,
> so this could be wrong.
>
> thanks,
> -mario