[Mesa-dev] [PATCH] Revert "st/vdpau: use linear layout for output surfaces"

Fri Sep 16 13:41:06 UTC 2016

On Fri, Sep 16, 2016 at 10:03 AM, Christian König
<deathsimple at vodafone.de> wrote:
> Am 16.09.2016 um 09:50 schrieb Michel Dänzer:
>>
>> On 16/09/16 04:33 PM, Christian König wrote:
>>>
>>> Am 15.09.2016 um 21:43 schrieb Dave Airlie:
>>>>
>>>> On 15 September 2016 at 17:43, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>>
>>>>> Am 15.09.2016 um 06:00 schrieb Ilia Mirkin:
>>>>>>
>>>>>> On Wed, Sep 14, 2016 at 11:58 PM, Dave Airlie <airlied at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> From: Dave Airlie <airlied at redhat.com>
>>>>>>>
>>>>>>> This reverts commit d180de35320eafa3df3d76f0e82b332656530126.
>>>>>>>
>>>>>>> This is a radeon specific hack that causes problems on nouveau
>>>>>>> when combined with the SHARED flag later. If radeonsi needs a fix
>>>>>>> for this, please fix it in the driver.
>>>>>
>>>>> Actually it isn't radeon specific. Using linear surfaces for this makes
>>>>> sense because tiling isn't beneficial and the surfaces can potentially be
>>>>> shared with other GPUs using the VDPAU OpenGL interop.
>>>>
>>>> Who says tiling isn't beneficial though? Maybe on other GPUs tiling
>>>> might be. It still seems like a radeon-centric view to me.
>>>
>>> Tiling helps with the memory throughput because it makes pixels which
>>> are rendered together appear near to each other in the memory layout as
>>> well.
>>>
>>> Since multimedia as well as compute applications usually render to the
>>> whole texture/array/matrix, it usually makes no sense at all to enable
>>> tiling for those tasks.
>>
>> Are you sure about that? Tiling also affects the order of memory accesses,
>> which could affect performance even when all pixels of a surface are
>> written.
>
>
> I can't 100% rule that out, but the hardware I've encountered so far orders
> the execution by the memory layout of the output buffer which is written to
> maximize throughput.
>
> On the other hand, I never double-checked in the documentation how the MC
> on AMD hardware really works. I just took some measurements, and tiling
> didn't seem to be beneficial at all.
>
> Tiling/shuffling can actually hurt performance quite a bit when the whole
> buffer is written and the execution order doesn't follow the memory pattern,
> so I think we would have noticed that.
>
> Where tiling could help quite a bit is with the video surfaces, because the
> deinterlacing shaders need to read them quite extensively, but unfortunately
> our decoding hardware can't fill them in the way that is needed.

Tiling is mainly used for drawing triangles, whose average bounding box
is a square. Since you want to touch as few cache lines as possible,
the tiles should also be squares.

Tiling doesn't make any sense for video decoding and also isn't
desirable for many compute applications.
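
As a rough illustration of the cache-line argument, here is a minimal
sketch that counts how many distinct cache lines a rectangle of pixels
touches under a linear layout and under a simple tiled layout. Everything
in it is an assumption made for illustration: 4 bytes per pixel, 64-byte
cache lines, 4x4 tiles stored contiguously in row-major order, and the
hypothetical helper name cache_lines_touched. Real hardware tiling modes
are more involved than this.

```python
def cache_lines_touched(x0, y0, w, h, pitch_px, bpp=4, line_bytes=64, tile=None):
    """Count distinct cache lines covered by a w x h pixel box.

    tile=None     -> linear (row-major) layout
    tile=(tw, th) -> illustrative tiled layout: tiles are stored one after
                     another, pixels are row-major inside each tile.
                     pitch_px must be a multiple of tw.
    """
    lines = set()
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            if tile is None:
                offset = (y * pitch_px + x) * bpp
            else:
                tw, th = tile
                tiles_per_row = pitch_px // tw
                tile_index = (y // th) * tiles_per_row + (x // tw)
                within = (y % th) * tw + (x % tw)
                offset = (tile_index * tw * th + within) * bpp
            lines.add(offset // line_bytes)
    return len(lines)


# Small square box (e.g. a triangle's bounding box) on a 1024-pixel-wide surface.
print(cache_lines_touched(0, 0, 8, 8, pitch_px=1024))               # 8
print(cache_lines_touched(0, 0, 8, 8, pitch_px=1024, tile=(4, 4)))  # 4

# Writing a whole 64x64 surface: both layouts touch the same number of lines.
print(cache_lines_touched(0, 0, 64, 64, pitch_px=64))               # 256
print(cache_lines_touched(0, 0, 64, 64, pitch_px=64, tile=(4, 4)))  # 256
```

Under these assumptions the small square box touches fewer cache lines
when tiled, while a full-surface write touches the same number either way.
The sketch says nothing about access order or memory controller behaviour.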

Marek