[Mesa-dev] [PATCH 07/10] egl/android: Make drm_gralloc headers optional

Tue Jul 19 15:40:20 UTC 2016

On 19 July 2016 at 14:36, Rob Clark <robdclark at gmail.com> wrote:
> On Tue, Jul 19, 2016 at 6:54 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>> On 19 July 2016 at 04:21, Tomasz Figa <tfiga at chromium.org> wrote:
>>> On Tue, Jul 19, 2016 at 2:35 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>> On 18 July 2016 at 16:38, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>> On Mon, Jul 18, 2016 at 11:58 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>>> On 18 July 2016 at 13:02, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>>>> On Mon, Jul 18, 2016 at 7:28 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>>>>> Hi Tomasz,
>>>>>>>>
>>>>>>>> On 15 July 2016 at 08:53, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>>>>>
>>>>>>>>> +#define DRM_RENDER_DEV_NAME  "%s/renderD%d"
>>>>>>>>> +
>>>>>>>>> +static int
>>>>>>>>> +droid_open_device(_EGLDisplay *dpy)
>>>>>>>>> +{
>>>>>>>>> +   struct dri2_egl_display *dri2_dpy = dpy->DriverData;
>>>>>>>>> +   const int limit = 64;
>>>>>>>>> +   const int base = 128;
>>>>>>>>> +   int fd;
>>>>>>>>> +   int i;
>>>>>>>>> +
>>>>>>>>> +   for (i = 0; i < limit; ++i) {
>>>>>>>>> +      char *card_path;
>>>>>>>>> +      if (asprintf(&card_path, DRM_RENDER_DEV_NAME, DRM_DIR_NAME, base + i) < 0)
>>>>>>>> Why do we need any of this ? What gralloc implementation are you guys using ?
>>>>>>>
>>>>>>> We are using our heavily rewritten fork of some old drm_gralloc
>>>>>>> release. It supports only render nodes and PRIME FDs and doesn't
>>>>>>> export the DRI device FD outside of its internals (which isn't
>>>>>>> actually even fully correct, at least for PRIME and render nodes, see
>>>>>>> my reply to Rob's comments).
>>>>>>>
>>>>>> That explain it, since https://chromium.googlesource.com/ does not
>>>>>> have gralloc, and
>>>>>> https://android.googlesource.com/platform/external/drm_gralloc/ has
>>>>>> both the DRM_FD define and the gem/flink function(s)?
>>>>>>
>>>>>> Can I suggest porting the fd drm_gralloc/gbm_gralloc patches to your
>>>>>> private copy/repo. This way we'll have some consistency throughout
>>>>>> gralloc implementations
>>>>>
>>>>> I'd prefer if any code using flink names was not added back. On top of
>>>>> that, our drm_gralloc doesn't really have much in common with that
>>>>> from android-x86 anymore (as I said, it was heavily rewritten) and
>>>>> there is not even a chance that with its current design flink names
>>>>> could even work.
>>>>>
>>>>> Also I'm wondering why we want to consider current brokenness of
>>>>> drm_gralloc as something to be consistent with. It's supposed to be a
>>>>> HAL library providing an uniform abstraction, but it exports private
>>>>> APIs on the side instead. Moreover, as I mentioned before, flink names
>>>>> are considered insecure and it would be really much better if we could
>>>>> just forget about them.
>>>>>
>>>>>> and you can use gbm_gralloc directly in the
>>>>>> (hopefully) not too distant future.
>>>>>
>>>>> I agree with this part, though. gbm_gralloc is definitely something
>>>>> that we might want to migrate to in the future. Although it's a bit
>>>>> lacking at the moment, so it might need a bit more time to develop the
>>>>> missing bits. [I'm CCing Gurchetan, who was investigating GBM-backed
>>>>> gralloc usable for our purposes.]
>>>>>
>>>>> In any case, the missing flink API is quite easy to handle and can be
>>>>> just stubbed out in a local header as you suggested. I don't think it
>>>>> would hurt anyone and would definitely help us and anyone not willing
>>>>> to export any private APIs from their gralloc and rely only on the
>>>>> public HAL API.
>>>>>
>>>> Looks like I wasn't clear enough here, realyl sorry about that. No
>>>> objection on nuking _any_ of the gem/flink paths, but hoping to have
>>>> the behaviour consistent with the one described in
>>>> get_native_buffer_fd.
>>>
>>> Did you mean having the PRIME FD in native_handle_t::data[0]?
>>>
>>> If so, it's more or less guaranteed by the API, because all file
>>> descriptors in handle have to be stored in first N (equals to
>>> native_handle_t::numFds) ints of native_handle_t::data[] for
>>> respective general code to properly transfer the FDs through binder
>>> when sharing between processes.
>>>
>>> Our gralloc currently supports only one PRIME FD per buffer (no
>>> separate memory planes for planar YUV) and stores it exactly in
>>> native_handle_t::data[0].
>>>
>> Wasn't sure if the PRIME FD is at idx 0. Glad to hear it's there, thanks.
>>
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Afaict the latter must provide reasonable result for
>>>>>>>> hw_get_module(GRALLOC_HARDWARE_MODULE_ID...) and as it's missing the
>>>>>>>> perform hook existing code should work just fine. Right ?
>>>>>>>
>>>>>>> Existing code would fail with -1 as file descriptor, wouldn't it? Or
>>>>>>> I'm failing to see something?
>>>>>>>
>>>>>> Nope you're spot on - I had a dull moment. May I suggest revering the
>>>>>> patch which removed the GRALLOC_MODULE_PERFORM_GET_DRM_FD handling in
>>>>>> your gralloc ? Reason being is that the proposed code is very 'flaky'
>>>>>> and can open the wrong render node on systems which have more than
>>>>>> one.
>>>>>
>>>>> I think the answer is a bit of yes and no at the same time.
>>>>>
>>>>> Starting with no, it's incorrect for gralloc to share the DRI device
>>>>> FD with Mesa for multiple reasons:
>>>>>  - there are cases when the allocator used is different that the render node,
>>>> Can you please provide an example how the current open-source
>>>> gralloc/EGL stack might hit this ? Only a mix of closed and
>>>> open-source components comes to mind :-\
>>>
>>> Well, yes. I don't think I can hide the fact that we have to use
>>> closed source components on some platforms. To put it simply, that's
>>> the reality of non-Intel world. On Intel-based platforms we indeed
>>> have the same allocating and rendering device.
>>>
>>> I can also imagine some users having some other custom gralloc, for
>>> example based on Ion (which I personally don't like, but some people
>>> prefer...). I just want the existing Mesa code to be as less tied to
>>> any private interfaces and as much compatible with other solutions as
>>> possible.
>>>
>> Fully, agree having something targeting a single (set) of usecases isn't ideal.
>>
>>> I might be able to give one more use case here as well, but need to
>>> consult with some people internally first.
>>>
>>>>
>>>>>  - gralloc and Mesa might be stepping over each other - they would
>>>>> share the GEM handle namespace, which is not refcounted,
>>>> When working with gbm_gralloc that is a bugfix, imho. Other
>>>> implementations such as drm_gralloc (and/or minigbm ?) might be more
>>>> susceptible to such issues.
>>>>
>>>>>  - it is generally undesirable to let an external library mess with
>>>>> graphics context, because it is bug-prone (see above).
>>>>>
>>>> Hmm true, exposing this to users is not a good idea. We could keep it
>>>> as a interface private between {drm,gbm,X}gralloc and mesa.
>>>
>>> What I mean is that {drm,gbm,X}gralloc might be doing some operations
>>> on the DRI FD it shares with Mesa, which is bug prone, as in the
>>> example above.
>>>
>>>>
>>>>> Then we have a small yes: We don't have to share the same FD, we can
>>>>> just make the implementation of that perform call create a new FD
>>>>> every time it's called.
>>>>>
>>>>> However, after a bit of reflection, there is another no, because:
>>>>> - gralloc has no knowledge what drivers are available in Mesa, so in
>>>>> case there are multiple DRI nodes, it has no way to know which one to
>>>>> open for Mesa to be happy.
>>>> I'm afraid I cannot parse that, can you please elaborate ?
>>>
>>> It basically boils down to the problem of using different allocator
>>> and different rendering device.
>>>
>>> Let's take one of our example systems. There are 3 DRI nodes, vgem,
>>> allocator (GEM-enabled display subsystem) and GPU (no GEM support;
>>> it's a bit strange case maybe...
>> It does sound strange relative to the open-source drivers. Although it
>> will be less so once one knows the hw/stack.
>>
>>> Now Mesa can use kms_swrast for
>>> vgem node and allocator node or driver X for GPU node. Obviously we
>>> prefer driver X, but for gralloc to open the right node it would mean
>>> hardcoding the node ID somewhere, which would be disqualifying such
>>> image from being usable on other devices with a different set of
>>> nodes.
>>>
>>> On the other hand, if we just let Mesa open the node itself, it can
>>> check if there is a hardware driver available and favor such nodes,
>>> only falling back to kms_swrast if no hardware-accelerated node could
>>> be found.
>>>
>> Sounds like one needs a way to map render device/node to allocator and
>> vice-versa ? We did a few similar ones (render <> card/primary node)
>> in libdrm, not sure what's the better location for that in your
>> case/stack.
>>
>>>>
>>>>> This is actually the solution that is flaky
>>>>> and can open wrong render node, because Mesa at least can open a node
>>>>> that it can use for rendering.
>>>>>  - it's just yet another private API to rely on and it's not even
>>>>> strictly necessary, because Mesa can open the device itself. My
>>>>> intention is to remove the need for any private APIs from gralloc
>>>>> implementation, including making the handle opaque, which is what
>>>>> something pretending to be called HAL should do.
>>>>>
>>>> Yes, the HAL should provide generic/opaque handles to users. At the
>>>> same time having a private interface between (vendor) specific driver
>>>> components (gralloc and EGL here) is not that crazy of an idea is it ?
>>>> I'd imagine that at least a few vendors have them in their stack.
>>>
>>> This is true for a typical Android stack, where all components come
>>> from one vendor and typically are hand crafted just for one particular
>>> device (or set of similar devices based on the same platform/SoC).
>>>
>>> We are exploring a slightly new territory, though.
>>>
>> Agreed, we are. I think your diverse experience with various stacks
>> allows you to see (and foresee) a few more use-cases that the
>> open-source opens. Thus some confusion along the say ;-)
>>
>>>>
>>>> One of the reasons behind gbm_gralloc was so that one can reuse the
>>>> existing DRI screen, thus granting access to the buffer metadata/etc.
>>>> making sharing easier/possible. Using it one can safely drop the final
>>>> EGL image restriction [1].
>>>>
>>>> The alternative is to have an explicit interface between gralloc and
>>>> EGL. Doing that may be the better option in the long term, but it
>>>> could take some time to materialise.
>>>>
>>>
>>> I think I'm missing the point of reusing the existing DRI screen for
>>> sharing things. I thought DMA-buf/PRIME was supposed to be used for
>>> this purpose. May I ask you to elaborate a bit on this topic,
>>> preferably with some example use cases?
>>>
>>>>> On top of that, there are already EGL DRI2 backends which open the
>>>>> device on their own, i.e. surfaceless and drm.
>>>>>
>>>> If you check the archives you'll see that there were concerns about
>>>> the direct probing in surfaceless. Furthermore I believe that it was
>>>> only ever tested with the [1] workaround and never with a pure
>>>> upstream mesa. The drm platform reuses the gbm provided DRI primitives
>>>> - exactly like gbm_gralloc :-)
>>>
>>> I can see following code in platform_drm:
>>>
>>>    gbm = disp->PlatformDisplay;
>>>    if (gbm == NULL) {
>>>       char buf[64];
>>>       int n = snprintf(buf, sizeof(buf), DRM_DEV_NAME, DRM_DIR_NAME, 0);
>>>       if (n != -1 && n < sizeof(buf))
>>>          fd = loader_open_device(buf);
>>>       if (fd < 0)
>>>          fd = loader_open_device("/dev/dri/card0");
>>>       dri2_dpy->own_device = 1;
>>>       gbm = gbm_create_device(fd);
>>>       if (gbm == NULL)
>>>          goto cleanup;
>>>
>>> I'm not sure how likely it is to hit this case, though.
>>>
>>> Anyway, I would be happy to figure out a way to avoid such direct
>>> probing, but at the same time addressing the problems I pointed in my
>>> previous message.
>>>
>> Hmm I'm a genious. Thanks for bearing with me :-)
>>
>> So from a closer look: The (another?) fd is used solely for wayland
>> "comms". Not entirely sure how it all works and/or if it's the corrent
>> thing to do. The DRI screen/other primitives are the GBM ones,
>>
>>>>
>>>> TL:DR Abstracting things may be the better route in the long term, but
>>>> reusing the DRI primitives (which starts with reusing the fd) and thus
>>>> getting buffer sharing to work is the more robust option that works
>>>> with open-source drivers.
>>>>
>>>> All of the above is my take on things, based on previous discussions
>>>> with Rob H/Rob C/Eric A. Although I could have misunderstood things ?
>>>
>>> I think I'm failing to understand the buffer sharing problem you
>>> mentioned several times. Could we maybe discuss this on IRC some time
>>> this week? I'm sitting on #dri-devel all the time, but should be at
>>> the keyboard between 11-24 JST (3-16 GMT).
>>>
>> I don't recall the details, but the gist was:
>>  - In the gbm_gralloc case, everything is done in Mesa effectively.
>> gbm_gralloc is a simple wrapper (1kloc incl. white space, comments,
>> ...) that translates the Android specifics (USAGE hints, HAL formats,
>> etc) and has some basic locking.
>> - Since we have essentially the same driver (mesa) managing both
>> gralloc and EGL we can ensure that lifetime (refcounting) is correct
>> and we can share any metadata that we might need (alongside the fd) is
>> available.
>>
>> Hope Rob H (robher) or Rob C (robclark) will be able to share some
>> greater insights.
>
> So the root issue is that a single open() of the drm device gives you
> only a single gem handle namespace.  In the upstream drivers, we have
> tricks to ensure we end up with only a single pipe_screen for a given
> drm device (so that existing logic in the form of hashtables ensures
> we don't end up w/ two handles for the same underlying bo).
>
> Note that we have the same issue with, for example, gl<->vdpau interop
> (where same device fd is used).
>
> Part of this problem is, I think, avoided by not sharing the same
> open() (ie. via dup()/etc).  *But* I think that would cause problems
> for the way we ensure a single pipe_screen per process for a given drm
> device node.
>
Thanks Rob !

Just double-checking - wasn't there a subtlety wrt the prime fds ? Or
was the whole dance solely for sharing the gem handles and avoid
duplicating the same bo in both prime and gem lists/hash tables ?

> What I'm wondering about, for the cases where you have separate
> render-only and kms-only devices, is whether gbm_gralloc could just
> use two different gbm instances in that case.  One instance from the
> gl driver, and another from minigbm for display.
>
In the case of wrapped drivers (props to Thierry for the idea) which
target individual SoC designs, internally the DRI module will have
access to both. While we could expose the fds selectively on the GBM
level, flipping between the two is a matter of calling the respective
libdrm helper.

If we consider the case where the allocator is separate, all we need
is to add a gralloc util function. The latter of which maps between
the kms/render and allocator, providing "the correct" device (node) fd
to mesa, via GRALLOC_MODULE_PERFORM_GET_DRM_FD (or similar).

This way, as we get some/more open-source users for a similar
interface in gbm, we could just use it internally in mesa (libEGL)
instead of GET_DRM_FD.

> That does bring up the question, I think, of how to have multiple gbm
> instances from different vendors.  (gbmvnd ftw!)
>
> I think what we do is split out the gbm interface (front-end) from
> mesa into it's own git tree.  We already internally have a separation
> of gbm interface from back-end which populates all the gbm_device fxn
> ptrs.  I think just extract that out into "gbmvnd", and then use
> mini-gbm for the kms-only devices, and the existing mesa gbm backend
> for mesa gl drivers (on blob gl drivers are on their own to sort out
> there mess, but at least we gave them a way so that gbm_gralloc can
> work).
>
> Then it is just a question of how the loader works.  But I guess maybe
> we can copy how glvnd does it?  I haven't looked too closely, but iirc
> a config file of some sort?
>
Yes, the frontend-backend separation is there, but I don't think we
need (yet we could) move libgbm to separate repo. If that will coerce
vendors to stop making local changes to the frontend - sure thing,
otherwise there's little benefit imho. Even in master we can easily
revert (bring back) the backends loading machinery, if we have actual
users for it ;-)

Still, the "the kms/render <> allocator helper" idea makes sense even
without any of those GBM specifics ?

Regards,
Emil