[Mesa-dev] [PATCH] vl/dri3: handle the case of different GPU

Christian König deathsimple at vodafone.de
Fri Sep 9 08:06:28 UTC 2016


Am 09.09.2016 um 03:38 schrieb Michel Dänzer:
> On 08/09/16 05:59 PM, Christian König wrote:
>> Am 08.09.2016 um 10:42 schrieb Michel Dänzer:
>>> On 08/09/16 05:05 PM, Christian König wrote:
>>>> Am 08.09.2016 um 08:23 schrieb Michel Dänzer:
>>>>> On 08/09/16 01:13 PM, Nayan Deshmukh wrote:
>>>>>> On Thu, Sep 8, 2016 at 9:03 AM, Michel Dänzer
>>>>>> <michel at daenzer.net> wrote:
>>>>>> On 08/09/16 02:48 AM, Nayan Deshmukh wrote:
>>>>>>> use a linear buffer in case of back buffer
>>>>>>>
>>>>>>> Signed-off-by: Nayan Deshmukh <nayan26deshmukh at gmail.com>
>>>>>> However, as we discussed before, for various reasons it would
>>>>>> probably be better to create separate linear buffers instead of making
>>>>>> all buffers linear.
>>>>>>
>>>>>> So should I maintain a single linear buffer and copy the back
>>>>>> buffer to
>>>>>> it before sending it via the present extension?
>>>>> It's better to create one linear buffer corresponding to each
>>>>> non-linear
>>>>> buffer with contents to be presented. Otherwise the rendering GPU may
>>>>> overwrite the linear buffer contents while the presentation GPU is
>>>>> still
>>>>> reading from it, resulting in tearing-like artifacts.
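
To make that concrete, a rough Gallium-style sketch of such a
per-back-buffer linear copy could look like the following. This is an
illustration only, not the actual vl/dri3 code; "back", "linear_copy"
and "screen" are placeholder names:

    /* needs pipe/p_defines.h, pipe/p_state.h and string.h */
    struct pipe_resource templ, *linear_copy;

    /* allocate a linear counterpart for one back buffer, so the
     * presenting GPU never reads a buffer the rendering GPU is
     * still writing */
    memset(&templ, 0, sizeof(templ));
    templ.target = PIPE_TEXTURE_2D;
    templ.format = back->format;
    templ.width0 = back->width0;
    templ.height0 = back->height0;
    templ.depth0 = 1;
    templ.array_size = 1;
    templ.bind = PIPE_BIND_LINEAR | PIPE_BIND_SHARED | PIPE_BIND_SCANOUT;

    linear_copy = screen->resource_create(screen, &templ);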
>>>> That approach isn't necessary. VDPAU has functions to query if an output
>>>> surface is still displayed or not.
>>>>
>>>> If the application starts to render into a buffer while it is still
>>>> being displayed, tearing-like artifacts are the expected result.
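
For reference, the query meant here is VDPAU's
VdpPresentationQueueQuerySurfaceStatus. A minimal usage sketch, where
query_surface_status stands for the function pointer an application
obtains via VdpGetProcAddress (error handling omitted):

    /* needs vdpau/vdpau.h */
    VdpPresentationQueueStatus status;
    VdpTime first_time;

    if (query_surface_status(queue, surface, &status, &first_time) ==
        VDP_STATUS_OK &&
        status == VDP_PRESENTATION_QUEUE_STATUS_IDLE) {
        /* the surface is no longer queued or visible, so the
         * application can render into it again without tearing */
    }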
>>> You're talking about the buffers exposed to applications via VDPAU. I
>>> was talking about using a single separate linear buffer which would be
>>> used for presentation of all VDPAU buffers. There's no way for the
>>> application to know when that's idle.
>> Ok, yes that makes more sense.
>>
>>>> In addition to that, I made the VDPAU output surfaces linear a while
>>>> ago anyway, because it turned out that tiling actually wasn't
>>>> beneficial in this use case (a single quad rendered over the whole
>>>> texture).
>>> That's fine as long as the buffers are in VRAM, but when they're pinned
>>> to GTT for sharing between GPUs, rendering to them with the 3D engine
>>> results in bad PCIe bandwidth utilization, as Marek explained recently.
>>> So even if the original buffers are already linear, it's better to keep
>>> those in VRAM and use separate buffers for sharing between GPUs.
>>>
>> Mhm, at least for VDPAU most composition should happen on temporary
>> buffers anyway when any filters are enabled.
> In that case, do the contents get into the final buffer via a blit or
> some kind of triangle / quad draw operation?

It's a quad draw operation.

And yeah, thinking about it, using a blit (e.g. via the DMA engine)
probably gives a memory access pattern that is much friendlier to bus
transactions.
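
Roughly, such a blit through Gallium would be a resource_copy_region
into the shared linear buffer. Again placeholder names, and whether
this actually ends up on the DMA engine is the driver's decision:

    /* needs util/u_box.h for u_box_2d() */
    struct pipe_box box;

    /* copy the composed frame into the buffer shared with the
     * presenting GPU */
    u_box_2d(0, 0, src->width0, src->height0, &box);
    pipe->resource_copy_region(pipe, linear_copy, 0, 0, 0, 0,
                               src, 0, &box);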

>> Anyway, I would clearly suggest handling that in the VDPAU state
>> tracker and not in the DRI3 code, because the handling needed seems to
>> be different for VA-API and I would really like to avoid any additional
>> copy for 4K playback.
> The thing is, with a discrete GPU, having separate buffers for sharing
> between GPUs and transferring the final contents to be presented to
> those buffers using a blit might be faster than having any of the
> previous steps render to the shared buffer in GTT directly. Only the
> DRI3 specific code knows about this.
>
Indeed, but I was wondering if we couldn't export that information to 
the state tracker somehow.

4K playback nearly maxes out our memory bandwidth; every copy we avoid
helps a lot there.
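
Purely as a hypothetical sketch of what exporting that information
could look like: the DRI3 winsys could set a flag on the vl winsys
screen, and the state tracker would only insert the extra copy when it
is actually needed. The field name below is made up for illustration
and does not exist in Mesa:

    if (vscreen->is_different_gpu) {
        /* render to a tiled buffer in VRAM, then blit into the
         * shared linear buffer right before presenting */
    } else {
        /* present the output surface directly, no extra copy */
    }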


Regards,
Christian.

