[Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

Thu Sep 15 02:01:16 UTC 2016

On Wed, Sep 14, 2016 at 9:42 PM, Michel Dänzer <michel at daenzer.net> wrote:
> On 15/09/16 08:20 AM, Ilia Mirkin wrote:
>> On Wed, Sep 7, 2016 at 12:06 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>> On Wed, Sep 7, 2016 at 5:36 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>> On Wed, Sep 7, 2016 at 4:08 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>> On 07/09/16 04:19 AM, Christian König wrote:
>>>>>> Am 06.09.2016 um 21:05 schrieb Ilia Mirkin:
>>>>>>> On Tue, Sep 6, 2016 at 2:22 PM, Christian König
>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>> Am 06.09.2016 um 16:23 schrieb Ilia Mirkin:
>>>>>>>>> On Mon, Sep 5, 2016 at 2:48 AM, Michel Dänzer <michel at daenzer.net>
>>>>>>>>> wrote:
>>>>>>>>>> On 05/09/16 04:37 AM, Ilia Mirkin wrote:
>>>>>>>>>>> On Tue, Mar 8, 2016 at 7:21 AM, Christian König
>>>>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>>>> @@ -80,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
>>>>>>>>>>>>       res_tmpl.depth0 = 1;
>>>>>>>>>>>>       res_tmpl.array_size = 1;
>>>>>>>>>>>>       res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET |
>>>>>>>>>>>> -                   PIPE_BIND_LINEAR;
>>>>>>>>>>>> +                   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>
>>>>>>>>>>> This change appears to have semi-broken vdpau on nouveau. Whenever I
>>>>>>>>>>> flip on the OSD in mplayer, the rendering becomes *extremely* slow.
>>>>>>>>>>> However regular up-scaling without the OSD is plenty fast. This
>>>>>>>>>>> effectively is forcing the output surfaces to live in GART instead of
>>>>>>>>>>> VRAM.
>>>>>>>>>> Strictly speaking, they'd only need to be forced to GART while they're
>>>>>>>>>> actually being shared between different GPUs. That's how it works with
>>>>>>>>>> the amdgpu and radeon kernel drivers.
>>>>>>>>> Any suggestions on how to handle this? Perhaps reallocate + copy the
>>>>>>>>> surface in st/vdpau when actual dmabuf sharing is requested?
>>>>>>>>>
>>>>>>>>> To be clear - with this change, vdpau with nouveau is unusable in the
>>>>>>>>> presence of an OSD in mplayer. The OSD comes up whenever you seek
>>>>>>>>> around in the video, so in effect, it's unusable. Used to work great.
>>>>>>>>
>>>>>>>> Well I think you should clearly figure out why adding PIPE_BIND_SHARED
>>>>>>>> has such a dramatic effect.
>>>>>>> Because the buffer goes into GART. And then you try to blend on it,
>>>>>>> which involves readback from GART (that's how the functions OSD is
>>>>>>> based on work, I believe). We normally don't allocate renderable
>>>>>>> surfaces or textures in GART.
>>>>>>>
>>>>>>>> We not only need this for DMA-buf based interop, but also for the DRI3
>>>>>>>> based sharing of buffers with X.
>>>>>>>>
>>>>>>>> So that clearly sounds like a bug in nouveau to me.
>>>>>>> OK, so SHARED != GART? With nouveau, buffers are placed statically in
>>>>>>> either VRAM or GART, so I think that if it's shared it has to end up
>>>>>>> in GART, no?
>>>>>>
>>>>>> As far as I understand it no. Shared just means that we can share it
>>>>>> between applications, doesn't it? Or does it mean the buffer should be
>>>>>> shareable between GPUs?
>>>>>>
>>>>>> Could be that my understanding was wrong, and so if it's the latter feel
>>>>>> free to provide a patch to just remove the flag.
>>>>>>
>>>>>>> I'm pretty weak on all these concepts, as well as how the DRI3 stuff
>>>>>>> works, unfortunately.
>>>>>>
>>>>>> I have to confess I'm not so deeply into this stuff either. Marek,
>>>>>> Michel what exactly is the meaning of the flag?
>>>>>
>>>>> According to src/gallium/docs/source/screen.rst:
>>>>>
>>>>> * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
>>>>>   process.
>>>>>
>>>>> It's also used e.g. for buffers shared via DRI3. So I'm afraid this is
>>>>> something nouveau has to deal with better.
>>>>
>>>> Any suggestions that don't involve rewriting nouveau bo handling at
>>>> every level (kernel, ddx, mesa)?
>>>>
>>>> Otherwise I'll send a revert for this change.
>>>
>>> PIPE_BIND_SHARED means texture_get_handle is expected to be used on
>>> the resource, meaning that inter-API, inter-process, or inter-device
>>> sharing is possible. All window back buffers should have the flag. If
>>> they don't, it's a bug. If the flag causes nouveau to put the buffer
>>> in GART, it's a bug too. There is no reason to use GART for inter-API
>>> and inter-process sharing like VDPAU and DRI3 are.
>>>
>>> To be honest, the flag is practically useless with respect to EGL and
>>> VDPAU, which allow sharing almost any texture.
>>>
>>> I suggest you fix nouveau. The first step would be to become less
>>> dependent on BIND flags whose existence is already questionable.
>>
>> As I suspected, merely flipping away from using PIPE_BIND_SHARED
>> doesn't work. By flipping the logic like this:
>>
>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>> index f2e304f..5532794 100644
>> --- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>> +++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>> @@ -377,7 +377,8 @@ nv50_miptree_create(struct pipe_screen *pscreen,
>>     }
>>     bo_config.nv50.tile_mode = mt->level[0].tile_mode;
>>
>> -   if (!bo_config.nv50.memtype && (pt->bind & PIPE_BIND_SHARED))
>> +   if (!bo_config.nv50.memtype && (pt->usage == PIPE_USAGE_STAGING ||
>> +                                   pt->usage == PIPE_USAGE_STREAM))
>>        mt->base.domain = NOUVEAU_BO_GART;
>>     else
>>        mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen));
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>> index 27674f7..0d009bd 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>> @@ -299,7 +299,8 @@ nvc0_miptree_create(struct pipe_screen *pscreen,
>>     }
>>     bo_config.nvc0.tile_mode = mt->level[0].tile_mode;
>>
>> -   if (!bo_config.nvc0.memtype && (pt->usage == PIPE_USAGE_STAGING || pt->bind & PIPE_BIND_SHARED))
>> +   if (!bo_config.nvc0.memtype && (pt->usage == PIPE_USAGE_STAGING ||
>> +                                   pt->usage == PIPE_USAGE_STREAM))
>>        mt->base.domain = NOUVEAU_BO_GART;
>>     else
>>        mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen));
>>
>> I end up with
>>
>> nouveau 0000:04:00.0: DRM: Moving pinned object ffff88009503fc00!
>>
>> when trying to run glxgears with prime. It all works "locally" of
>> course. So really, what I need is a PIPE_USAGE_MIGHT_BE_PRIME. Nouveau
>> has been using PIPE_BIND_SHARED for that. Is there a quick fix? Should
>> I just disable VDPAU on nouveau and stop worrying about it? Make a
>> copy of the vdpau state tracker and change it so that it works well
>> with nouveau?
>
> The facts are that dma-buf is used for both PRIME and DRI3, and the way
> buffer placement is currently handled in nouveau doesn't work well for
> both cases, so it needs to be fixed. If you want to work around it
> instead, how you do that is up to you (as long as it's done inside
> nouveau-specific code, which kind of rules out the last option above).

No, the current impl is pretty radeon-specific (note - it doesn't work
on nouveau, and no other drivers support the interfaces, so ... it's
radeon-specific). I could, instead, make something that works well
with nouveau (and presumably poorly with radeon, but who knows). This
shouldn't be too hard - just make the compositor output to a temporary
surface before copying things out to the prime-shareable one. However
the reality is that I haven't got the time or the desire to do that. I've
sent a patch to remove the nouveau target for now, until an interested
party comes along and implements one of these options.
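
For illustration only, the temporary-surface idea above can be sketched as
a toy model in plain C. This is not gallium or nouveau code: every name in
it (toy_surface, toy_blend, toy_present, the TOY_* domains) is hypothetical,
surfaces are plain byte arrays, and the "cost" of reading back from GART is
just a counter. It only shows the shape of the approach: blend in a
VRAM-placed temporary, then do a write-only copy into the shareable surface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for buffer placement; not Mesa identifiers. */
enum toy_domain { TOY_VRAM, TOY_GART };

struct toy_surface {
   enum toy_domain domain;
   unsigned width, height;
   uint8_t *pixels;      /* one byte per pixel, to keep the model small */
   unsigned slow_reads;  /* number of pixel reads served from GART */
};

static int toy_surface_init(struct toy_surface *s, enum toy_domain domain,
                            unsigned width, unsigned height)
{
   s->domain = domain;
   s->width = width;
   s->height = height;
   s->slow_reads = 0;
   s->pixels = calloc((size_t)width * height, 1);
   return s->pixels ? 0 : -1;
}

static void toy_surface_fini(struct toy_surface *s)
{
   free(s->pixels);
   s->pixels = NULL;
}

/* Blending reads the destination, so a GART destination pays for readback. */
static void toy_blend(struct toy_surface *dst, const uint8_t *src)
{
   size_t n = (size_t)dst->width * dst->height;
   for (size_t i = 0; i < n; i++) {
      if (dst->domain == TOY_GART)
         dst->slow_reads++;
      dst->pixels[i] = (uint8_t)((dst->pixels[i] + src[i]) / 2);
   }
}

/* A plain copy only writes the destination; no readback is counted. */
static void toy_copy(struct toy_surface *dst, const struct toy_surface *src)
{
   assert(dst->width == src->width && dst->height == src->height);
   memcpy(dst->pixels, src->pixels, (size_t)src->width * src->height);
}

/* Compose video + OSD in a VRAM temporary, then copy to the shared surface. */
static int toy_present(struct toy_surface *shared,
                       const uint8_t *video, const uint8_t *osd)
{
   struct toy_surface tmp;
   if (toy_surface_init(&tmp, TOY_VRAM, shared->width, shared->height))
      return -1;
   memcpy(tmp.pixels, video, (size_t)tmp.width * tmp.height);
   toy_blend(&tmp, osd);   /* all destination reads hit the temporary */
   toy_copy(shared, &tmp); /* write-only access to the shared surface */
   toy_surface_fini(&tmp);
   return 0;
}
```

In this model, calling toy_present() on a TOY_GART surface leaves its
slow_reads counter at zero, while calling toy_blend() directly on the same
surface bumps it once per pixel. The price is one extra full-surface copy
per frame, which is the trade-off this approach would make.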

  -ilia