[Mesa-dev] [PATCH] vl/dri3: handle the case of different GPUs

Michel Dänzer michel at daenzer.net
Fri Sep 9 01:38:38 UTC 2016


On 08/09/16 05:59 PM, Christian König wrote:
> Am 08.09.2016 um 10:42 schrieb Michel Dänzer:
>> On 08/09/16 05:05 PM, Christian König wrote:
>>> Am 08.09.2016 um 08:23 schrieb Michel Dänzer:
>>>> On 08/09/16 01:13 PM, Nayan Deshmukh wrote:
>>>>> On Thu, Sep 8, 2016 at 9:03 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>> On 08/09/16 02:48 AM, Nayan Deshmukh wrote:
>>>>>> use a linear buffer in case of back buffer
>>>>>>
>>>>>> Signed-off-by: Nayan Deshmukh <nayan26deshmukh at gmail.com>
>>>>>
>>>>> However, as we discussed before, for various reasons it would
>>>>> probably be better to create separate linear buffers instead of making
>>>>> all buffers linear.
>>>>>
>>>>> So should I maintain a single linear buffer and copy the back
>>>>> buffer to
>>>>> it before sending it via the present extension?
>>>> It's better to create one linear buffer corresponding to each
>>>> non-linear
>>>> buffer with contents to be presented. Otherwise the rendering GPU may
>>>> overwrite the linear buffer contents while the presentation GPU is
>>>> still
>>>> reading from it, resulting in tearing-like artifacts.
>>> That approach isn't necessary. VDPAU has functions to query if an output
>>> surface is still displayed or not.
>>>
>>> If the application starts to render into a buffer while it is still
>>> being displayed, tearing-like artifacts are the expected result.
>> You're talking about the buffers exposed to applications via VDPAU. I
>> was talking about using a single separate linear buffer which would be
>> used for presentation of all VDPAU buffers. There's no way for the
>> application to know when that's idle.
> 
> Ok, yes that makes more sense.
> 
>>
>>> In addition to that, I made the VDPAU output surfaces linear a while
>>> ago anyway, because it turned out that tiling actually wasn't beneficial
>>> in this use case (a single quad rendered over the whole texture).
>> That's fine as long as the buffers are in VRAM, but when they're pinned
>> to GTT for sharing between GPUs, rendering to them with the 3D engine
>> results in bad PCIe bandwidth utilization, as Marek explained recently.
>> So even if the original buffers are already linear, it's better to keep
>> those in VRAM and use separate buffers for sharing between GPUs.
>>
> Mhm, at least for VDPAU most composition should happen on temporary
> buffers anyway when any filters are enabled.

In that case, do the contents get into the final buffer via a blit or
some kind of triangle / quad draw operation?


> Anyway, I would clearly suggest handling that in the VDPAU state tracker
> and not in the DRI3 code, because the handling needed seems to be
> different for VA-API, and I would really like to avoid any additional
> copy for 4K playback.

The thing is, with a discrete GPU, it might be faster to use separate
buffers for sharing between GPUs and to transfer the final contents to be
presented to those buffers with a blit, rather than having any of the
previous steps render to the shared buffer in GTT directly. Only the
DRI3-specific code knows about this.
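To illustrate what I mean (only a rough sketch against Gallium's resource
and blit interfaces, not the actual patch; the helper names
alloc_linear_present_buffer and copy_to_present_buffer are made up for
illustration): the DRI3 code could keep the tiled back buffer in VRAM and
only blit the finished frame into a separate linear, shareable buffer
right before presenting, roughly like this:

#include <string.h>

#include "pipe/p_context.h"
#include "pipe/p_screen.h"
#include "pipe/p_state.h"
#include "util/u_box.h"

/* Allocate a linear buffer that can be exported to the presentation GPU.
 * The actual rendering targets stay tiled and in VRAM; this buffer is
 * only the destination of the final copy that gets presented. */
static struct pipe_resource *
alloc_linear_present_buffer(struct pipe_screen *screen,
                            enum pipe_format format,
                            unsigned width, unsigned height)
{
   struct pipe_resource templ;

   memset(&templ, 0, sizeof(templ));
   templ.target = PIPE_TEXTURE_2D;
   templ.format = format;
   templ.width0 = width;
   templ.height0 = height;
   templ.depth0 = 1;
   templ.array_size = 1;
   templ.usage = PIPE_USAGE_DEFAULT;
   /* Linear and shareable so the other GPU can import and scan it out. */
   templ.bind = PIPE_BIND_LINEAR | PIPE_BIND_SHARED |
                PIPE_BIND_SCANOUT | PIPE_BIND_RENDER_TARGET;

   return screen->resource_create(screen, &templ);
}

/* Blit the final contents from the tiled back buffer in VRAM into the
 * linear shared buffer, instead of having earlier rendering steps write
 * to the buffer in GTT directly. */
static void
copy_to_present_buffer(struct pipe_context *pipe,
                       struct pipe_resource *src,
                       struct pipe_resource *dst)
{
   struct pipe_blit_info blit;

   memset(&blit, 0, sizeof(blit));
   blit.src.resource = src;
   blit.src.format = src->format;
   u_box_2d(0, 0, src->width0, src->height0, &blit.src.box);
   blit.dst.resource = dst;
   blit.dst.format = dst->format;
   u_box_2d(0, 0, dst->width0, dst->height0, &blit.dst.box);
   blit.mask = PIPE_MASK_RGBA;
   blit.filter = PIPE_TEX_FILTER_NEAREST;

   pipe->blit(pipe, &blit);
   /* Make sure the copy is submitted before the PresentPixmap request. */
   pipe->flush(pipe, NULL, 0);
}

Whether that extra copy actually pays off will depend on the GPU
combination and buffer placement, which is exactly why only the DRI3 code
is in a position to decide.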


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

