[Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

Thu Sep 15 02:15:42 UTC 2016

On 15/09/16 11:01 AM, Ilia Mirkin wrote:
> On Wed, Sep 14, 2016 at 9:42 PM, Michel Dänzer <michel at daenzer.net> wrote:
>> On 15/09/16 08:20 AM, Ilia Mirkin wrote:
>>> On Wed, Sep 7, 2016 at 12:06 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>> On Wed, Sep 7, 2016 at 5:36 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>> On Wed, Sep 7, 2016 at 4:08 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>>> On 07/09/16 04:19 AM, Christian König wrote:
>>>>>>> Am 06.09.2016 um 21:05 schrieb Ilia Mirkin:
>>>>>>>> On Tue, Sep 6, 2016 at 2:22 PM, Christian König
>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>> Am 06.09.2016 um 16:23 schrieb Ilia Mirkin:
>>>>>>>>>> On Mon, Sep 5, 2016 at 2:48 AM, Michel Dänzer <michel at daenzer.net>
>>>>>>>>>> wrote:
>>>>>>>>>>> On 05/09/16 04:37 AM, Ilia Mirkin wrote:
>>>>>>>>>>>> On Tue, Mar 8, 2016 at 7:21 AM, Christian König
>>>>>>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>>>>>>> @@ -80,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
>>>>>>>>>>>>>       res_tmpl.depth0 = 1;
>>>>>>>>>>>>>       res_tmpl.array_size = 1;
>>>>>>>>>>>>>       res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW |
>>>>>>>>>>>>> PIPE_BIND_RENDER_TARGET |
>>>>>>>>>>>>> -                   PIPE_BIND_LINEAR;
>>>>>>>>>>>>> +                   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>
>>>>>>>>>>>> This change appears to have semi-broken vdpau on nouveau. Whenever I
>>>>>>>>>>>> flip on the OSD in mplayer, the rendering becomes *extremely* slow.
>>>>>>>>>>>> However regular up-scaling without the OSD is plenty fast. This
>>>>>>>>>>>> effectively is forcing the output surfaces to live in GART instead of
>>>>>>>>>>>> VRAM.
>>>>>>>>>>> Strictly speaking, they'd only need to be forced to GART while they're
>>>>>>>>>>> actually being shared between different GPUs. That's how it works with
>>>>>>>>>>> the amdgpu and radeon kernel drivers.
>>>>>>>>>> Any suggestions on how to handle this? Perhaps reallocate + copy the
>>>>>>>>>> surface in st/vdpau when actual dmabuf sharing is requested?
>>>>>>>>>>
>>>>>>>>>> To be clear - with this change, vdpau with nouveau is unusable in the
>>>>>>>>>> presence of an OSD in mplayer. The OSD comes up whenever you seek
>>>>>>>>>> around in the video, so in effect, it's unusable. Used to work great.
>>>>>>>>>
>>>>>>>>> Well I think you should clearly figure out why adding
>>>>>>>>> PIPE_BIND_SHARED has
>>>>>>>>> such dramatic effect.
>>>>>>>> Because the buffer goes into GART. And then you try to blend on it,
>>>>>>>> which involves readback from GART (that's how the functions OSD is
>>>>>>>> based on work, I believe). We normally don't allocate renderable
>>>>>>>> surfaces or textures in GART.
>>>>>>>>
>>>>>>>>> We not only need this for DMA-buf based interop, but also for the
>>>>>>>>> DRI3 based
>>>>>>>>> sharing of buffers with X.
>>>>>>>>>
>>>>>>>>> So that clearly sounds like a bug in nouveau to me.
>>>>>>>> OK, so SHARED != GART? With nouveau, buffers are placed statically in
>>>>>>>> either VRAM or GART, so I think that if it's shared it has to end up
>>>>>>>> in GART, no?
>>>>>>>
>>>>>>> As far as I understand it no. Shared just means that we can share it
>>>>>>> between applications, doesn't it? Or does it mean the buffer should be
>>>>>>> shareable between GPUs?
>>>>>>>
>>>>>>> Could be that my understanding was wrong and so if it's the later feel
>>>>>>> free to provide a patch to just remove the flag.
>>>>>>>
>>>>>>>> I'm pretty weak on all these concepts, as well as how the DRI3 stuff
>>>>>>>> works, unfortunately.
>>>>>>>
>>>>>>> I have to confess I'm not so deeply into this stuff either. Marek,
>>>>>>> Michel what exactly is the meaning of the flag?
>>>>>>
>>>>>> According to src/gallium/docs/source/screen.rst:
>>>>>>
>>>>>> * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
>>>>>>   process.
>>>>>>
>>>>>> It's also used e.g. for buffers shared via DRI3. So I'm afraid this is
>>>>>> something nouveau has to deal with better.
>>>>>
>>>>> Any suggestions that don't involve rewriting nouveau bo handling at
>>>>> every level (kernel, ddx, mesa)?
>>>>>
>>>>> Otherwise I'll send a revert for this change.
>>>>
>>>> PIPE_BIND_SHARED means texture_get_handle is expected to be used on
>>>> the resource, meaning that inter-API, inter-process, or inter-device
>>>> sharing is possible. All window back buffers should have the flag. If
>>>> they don't, it's a bug. If the flag causes nouveau to put the buffer
>>>> in GART, it's a bug too. There is no reason to use GART for inter-API
>>>> and inter-process sharing like VDPAU and DRI3 are.
>>>>
>>>> To be honest, the flag is pratically useless with respect to EGL and
>>>> VDPAU, which allow sharing almost any texture.
>>>>
>>>> I suggest you fix nouveau. The first step would be to become less
>>>> dependent on BIND flags whose existence is already questionable.
>>>
>>> As I suspected, merely flipping away from using PIPE_BIND_SHARED
>>> doesn't work. By flipping the logic like this:
>>>
>>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>>> b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>>> index f2e304f..5532794 100644
>>> --- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>>> +++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>>> @@ -377,7 +377,8 @@ nv50_miptree_create(struct pipe_screen *pscreen,
>>>     }
>>>     bo_config.nv50.tile_mode = mt->level[0].tile_mode;
>>>
>>> -   if (!bo_config.nv50.memtype && (pt->bind & PIPE_BIND_SHARED))
>>> +   if (!bo_config.nv50.memtype && (pt->usage == PIPE_USAGE_STAGING ||
>>> +                                   pt->usage == PIPE_USAGE_STREAM))
>>>        mt->base.domain = NOUVEAU_BO_GART;
>>>     else
>>>        mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen));
>>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>>> b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>>> index 27674f7..0d009bd 100644
>>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c
>>> @@ -299,7 +299,8 @@ nvc0_miptree_create(struct pipe_screen *pscreen,
>>>     }
>>>     bo_config.nvc0.tile_mode = mt->level[0].tile_mode;
>>>
>>> -   if (!bo_config.nvc0.memtype && (pt->usage == PIPE_USAGE_STAGING ||
>>> pt->bind & PIPE_BIND_SHARED))
>>> +   if (!bo_config.nvc0.memtype && (pt->usage == PIPE_USAGE_STAGING ||
>>> +                                   pt->usage == PIPE_USAGE_STREAM))
>>>        mt->base.domain = NOUVEAU_BO_GART;
>>>     else
>>>        mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen));
>>>
>>> I end up with
>>>
>>> nouveau 0000:04:00.0: DRM: Moving pinned object ffff88009503fc00!
>>>
>>> when trying to run glxgears with prime. It all works "locally" of
>>> course. So really, what I need is a PIPE_USAGE_MIGHT_BE_PRIME. Nouveau
>>> has been using PIPE_BIND_SHARED for that. Is there a quick fix? Should
>>> I just disable VDPAU on nouveau and stop worrying about it? Make a
>>> copy of the vdpau state tracker and change it so that it works well
>>> with nouveau?
>>
>> The facts are that dma-buf is used for both PRIME and DRI3, and the way
>> buffer placement is currently handled in nouveau doesn't work well for
>> both cases, so it needs to be fixed. If you want to work around it
>> instead, how you do that is up to you (as long as it's done inside
>> nouveau specific code, which kind of rules out the last option above).
> 
> No, the current impl is pretty radeon-specific (note - it doesn't work
> on nouveau, and no other drivers support the interfaces, so ... it's
> radeon-specific).

We're getting into semantics here, but since the reason it doesn't work
well with nouveau is a fundamental issue in nouveau (which should also
affect at least DRI3 in general), while you may call it "de facto radeon
specific" if you're so inclined, that doesn't make the implementation
actually radeon specific.

> I could, instead, make something that works well with nouveau (and
> presumably poorly with radeon, but who knows).

Have fun with that.

> This shouldn't be too hard - just make the compositor output to a temporary
> surface before copying things out to the prime-shareable one.

You mean something like https://patchwork.freedesktop.org/patch/110375/ ?

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer