[Mesa-dev] [PATCH] Revert "st/vdpau: use linear layout for output surfaces"

Fri Sep 16 08:03:36 UTC 2016

On 16.09.2016 at 09:50, Michel Dänzer wrote:
> On 16/09/16 04:33 PM, Christian König wrote:
>> On 15.09.2016 at 21:43, Dave Airlie wrote:
>>> On 15 September 2016 at 17:43, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> On 15.09.2016 at 06:00, Ilia Mirkin wrote:
>>>>> On Wed, Sep 14, 2016 at 11:58 PM, Dave Airlie <airlied at gmail.com>
>>>>> wrote:
>>>>>> From: Dave Airlie <airlied at redhat.com>
>>>>>>
>>>>>> This reverts commit d180de35320eafa3df3d76f0e82b332656530126.
>>>>>>
>>>>>> This is a radeon specific hack that causes problems on nouveau
>>>>>> when combined with the SHARED flag later. If radeonsi needs a fix
>>>>>> for this, please fix it in the driver.
>>>> Actually it isn't radeon specific. Using linear surfaces for this makes
>>>> sense because tiling isn't beneficial and the surfaces can potentially
>>>> be shared with other GPUs using the VDPAU OpenGL interop.
>>> Who says tiling isn't beneficial though? Maybe on other GPUs tiling
>>> might be, it
>>> still seems like a radeon centric view to me.
>> Tiling helps with the memory throughput because it makes pixels which
>> are rendered together appear near to each other in the memory layout as
>> well.
>>
>> Since multimedia as well as compute applications usually always render
>> to the whole texture/array/matrix it usually makes no sense at all to
>> enable it for those tasks.
> Are you sure about that? Tiling also affects the order of memory 
> accesses, which could affect performance even when all pixels of a 
> surface are written.

I can't 100% rule that out, but the hardware I've encountered so far 
orders execution according to the memory layout of the output buffer 
being written, in order to maximize throughput.

On the other hand, I never double-checked in the documentation how the 
MC (memory controller) on AMD hardware really works. I just took some 
measurements, and tiling didn't seem to be beneficial at all.

Tiling/shuffling can actually hurt performance quite a bit when the 
whole buffer is written and the execution order doesn't follow the 
memory pattern, so I think we would have noticed that.
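
To make that concrete, here is a minimal sketch (Python, purely 
illustrative). The 8x8 tile size and the names linear_offset, 
tiled_offset and count_jumps are assumptions for the example and are 
not taken from Mesa or from any hardware documentation; real tiling 
modes are more involved. It counts how often a row-major write pass 
over the whole surface has to jump to a non-adjacent offset:

```python
# Illustrative only: TILE and all function names are hypothetical,
# not taken from Mesa or any hardware documentation.
TILE = 8  # assumed square tile edge, in pixels


def linear_offset(x, y, width):
    """Offset (in pixels) of (x, y) in a row-major linear surface."""
    return y * width + x


def tiled_offset(x, y, width):
    """Offset of (x, y) when the surface is stored as TILE x TILE tiles.

    Tiles are laid out row-major, and pixels inside a tile are
    row-major too. Assumes width is a multiple of TILE.
    """
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)


def count_jumps(offset_fn, width, height):
    """Count non-consecutive offset steps for a row-major write pass."""
    jumps = 0
    prev = None
    for y in range(height):
        for x in range(width):
            off = offset_fn(x, y, width)
            if prev is not None and off != prev + 1:
                jumps += 1
            prev = off
    return jumps
```

With these assumptions, a row-major pass over a linear surface never 
jumps, while the same pass over the tiled surface jumps at every tile 
boundary. That is the mismatch between execution order and memory 
pattern described above.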

Where tiling could help quite a bit is with the video surfaces, because 
the deinterlacing shaders need to read them quite extensively, but 
unfortunately our decoding hardware can't fill them in the way that is 
needed.
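
As a rough illustration of why reads with 2D locality can favour 
tiling, the sketch below (Python, again with an assumed 8x8 tile, an 
assumed 64-pixel fetch granularity and hypothetical function names, 
none of them from a real driver) counts how many fetch units a small 
rectangular read window touches in each layout:

```python
# Illustrative only: TILE, LINE and all function names are assumptions.
TILE = 8   # assumed square tile edge, in pixels
LINE = 64  # assumed fetch granularity (cache line / burst), in pixels


def linear_offset(x, y, width):
    """Offset (in pixels) of (x, y) in a row-major linear surface."""
    return y * width + x


def tiled_offset(x, y, width):
    """Offset of (x, y) in a surface stored as row-major TILE x TILE tiles.

    Assumes width is a multiple of TILE.
    """
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)


def lines_touched(offset_fn, x0, y0, w, h, width):
    """Number of distinct LINE-sized units covered by a w x h read window."""
    return len({
        offset_fn(x, y, width) // LINE
        for y in range(y0, y0 + h)
        for x in range(x0, x0 + w)
    })
```

Under these assumptions, an 8x8 window that spans several rows touches 
one fetch unit per row in the linear layout, but only as many units as 
it overlaps tiles in the tiled layout.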

Regards,
Christian.