[Mesa-dev] [PATCH 07/10] egl/android: Make drm_gralloc headers optional

Tue Jul 19 15:58:18 UTC 2016

On Tue, Jul 19, 2016 at 11:40 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> On 19 July 2016 at 14:36, Rob Clark <robdclark at gmail.com> wrote:
>> On Tue, Jul 19, 2016 at 6:54 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>> On 19 July 2016 at 04:21, Tomasz Figa <tfiga at chromium.org> wrote:
>>>> On Tue, Jul 19, 2016 at 2:35 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>> On 18 July 2016 at 16:38, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>>> On Mon, Jul 18, 2016 at 11:58 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>>>> On 18 July 2016 at 13:02, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>>>>> On Mon, Jul 18, 2016 at 7:28 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>>>>>> Hi Tomasz,
>>>>>>>>>
>>>>>>>>> On 15 July 2016 at 08:53, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>>>>>>
>>>>>>>>>> +#define DRM_RENDER_DEV_NAME  "%s/renderD%d"
>>>>>>>>>> +
>>>>>>>>>> +static int
>>>>>>>>>> +droid_open_device(_EGLDisplay *dpy)
>>>>>>>>>> +{
>>>>>>>>>> +   struct dri2_egl_display *dri2_dpy = dpy->DriverData;
>>>>>>>>>> +   const int limit = 64;
>>>>>>>>>> +   const int base = 128;
>>>>>>>>>> +   int fd;
>>>>>>>>>> +   int i;
>>>>>>>>>> +
>>>>>>>>>> +   for (i = 0; i < limit; ++i) {
>>>>>>>>>> +      char *card_path;
>>>>>>>>>> +      if (asprintf(&card_path, DRM_RENDER_DEV_NAME, DRM_DIR_NAME, base + i) < 0)
>>>>>>>>> Why do we need any of this ? What gralloc implementation are you guys using ?
>>>>>>>>
>>>>>>>> We are using our heavily rewritten fork of some old drm_gralloc
>>>>>>>> release. It supports only render nodes and PRIME FDs and doesn't
>>>>>>>> export the DRI device FD outside of its internals (which isn't
>>>>>>>> actually even fully correct, at least for PRIME and render nodes, see
>>>>>>>> my reply to Rob's comments).
>>>>>>>>
>>>>>>> That explain it, since https://chromium.googlesource.com/ does not
>>>>>>> have gralloc, and
>>>>>>> https://android.googlesource.com/platform/external/drm_gralloc/ has
>>>>>>> both the DRM_FD define and the gem/flink function(s)?
>>>>>>>
>>>>>>> Can I suggest porting the fd drm_gralloc/gbm_gralloc patches to your
>>>>>>> private copy/repo. This way we'll have some consistency throughout
>>>>>>> gralloc implementations
>>>>>>
>>>>>> I'd prefer if any code using flink names was not added back. On top of
>>>>>> that, our drm_gralloc doesn't really have much in common with that
>>>>>> from android-x86 anymore (as I said, it was heavily rewritten) and
>>>>>> there is not even a chance that with its current design flink names
>>>>>> could even work.
>>>>>>
>>>>>> Also I'm wondering why we want to consider current brokenness of
>>>>>> drm_gralloc as something to be consistent with. It's supposed to be a
>>>>>> HAL library providing an uniform abstraction, but it exports private
>>>>>> APIs on the side instead. Moreover, as I mentioned before, flink names
>>>>>> are considered insecure and it would be really much better if we could
>>>>>> just forget about them.
>>>>>>
>>>>>>> and you can use gbm_gralloc directly in the
>>>>>>> (hopefully) not too distant future.
>>>>>>
>>>>>> I agree with this part, though. gbm_gralloc is definitely something
>>>>>> that we might want to migrate to in the future. Although it's a bit
>>>>>> lacking at the moment, so it might need a bit more time to develop the
>>>>>> missing bits. [I'm CCing Gurchetan, who was investigating GBM-backed
>>>>>> gralloc usable for our purposes.]
>>>>>>
>>>>>> In any case, the missing flink API is quite easy to handle and can be
>>>>>> just stubbed out in a local header as you suggested. I don't think it
>>>>>> would hurt anyone and would definitely help us and anyone not willing
>>>>>> to export any private APIs from their gralloc and rely only on the
>>>>>> public HAL API.
>>>>>>
>>>>> Looks like I wasn't clear enough here, realyl sorry about that. No
>>>>> objection on nuking _any_ of the gem/flink paths, but hoping to have
>>>>> the behaviour consistent with the one described in
>>>>> get_native_buffer_fd.
>>>>
>>>> Did you mean having the PRIME FD in native_handle_t::data[0]?
>>>>
>>>> If so, it's more or less guaranteed by the API, because all file
>>>> descriptors in handle have to be stored in first N (equals to
>>>> native_handle_t::numFds) ints of native_handle_t::data[] for
>>>> respective general code to properly transfer the FDs through binder
>>>> when sharing between processes.
>>>>
>>>> Our gralloc currently supports only one PRIME FD per buffer (no
>>>> separate memory planes for planar YUV) and stores it exactly in
>>>> native_handle_t::data[0].
>>>>
>>> Wasn't sure if the PRIME FD is at idx 0. Glad to hear it's there, thanks.
>>>
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Afaict the latter must provide reasonable result for
>>>>>>>>> hw_get_module(GRALLOC_HARDWARE_MODULE_ID...) and as it's missing the
>>>>>>>>> perform hook existing code should work just fine. Right ?
>>>>>>>>
>>>>>>>> Existing code would fail with -1 as file descriptor, wouldn't it? Or
>>>>>>>> I'm failing to see something?
>>>>>>>>
>>>>>>> Nope you're spot on - I had a dull moment. May I suggest revering the
>>>>>>> patch which removed the GRALLOC_MODULE_PERFORM_GET_DRM_FD handling in
>>>>>>> your gralloc ? Reason being is that the proposed code is very 'flaky'
>>>>>>> and can open the wrong render node on systems which have more than
>>>>>>> one.
>>>>>>
>>>>>> I think the answer is a bit of yes and no at the same time.
>>>>>>
>>>>>> Starting with no, it's incorrect for gralloc to share the DRI device
>>>>>> FD with Mesa for multiple reasons:
>>>>>>  - there are cases when the allocator used is different that the render node,
>>>>> Can you please provide an example how the current open-source
>>>>> gralloc/EGL stack might hit this ? Only a mix of closed and
>>>>> open-source components comes to mind :-\
>>>>
>>>> Well, yes. I don't think I can hide the fact that we have to use
>>>> closed source components on some platforms. To put it simply, that's
>>>> the reality of non-Intel world. On Intel-based platforms we indeed
>>>> have the same allocating and rendering device.
>>>>
>>>> I can also imagine some users having some other custom gralloc, for
>>>> example based on Ion (which I personally don't like, but some people
>>>> prefer...). I just want the existing Mesa code to be as less tied to
>>>> any private interfaces and as much compatible with other solutions as
>>>> possible.
>>>>
>>> Fully, agree having something targeting a single (set) of usecases isn't ideal.
>>>
>>>> I might be able to give one more use case here as well, but need to
>>>> consult with some people internally first.
>>>>
>>>>>
>>>>>>  - gralloc and Mesa might be stepping over each other - they would
>>>>>> share the GEM handle namespace, which is not refcounted,
>>>>> When working with gbm_gralloc that is a bugfix, imho. Other
>>>>> implementations such as drm_gralloc (and/or minigbm ?) might be more
>>>>> susceptible to such issues.
>>>>>
>>>>>>  - it is generally undesirable to let an external library mess with
>>>>>> graphics context, because it is bug-prone (see above).
>>>>>>
>>>>> Hmm true, exposing this to users is not a good idea. We could keep it
>>>>> as a interface private between {drm,gbm,X}gralloc and mesa.
>>>>
>>>> What I mean is that {drm,gbm,X}gralloc might be doing some operations
>>>> on the DRI FD it shares with Mesa, which is bug prone, as in the
>>>> example above.
>>>>
>>>>>
>>>>>> Then we have a small yes: We don't have to share the same FD, we can
>>>>>> just make the implementation of that perform call create a new FD
>>>>>> every time it's called.
>>>>>>
>>>>>> However, after a bit of reflection, there is another no, because:
>>>>>> - gralloc has no knowledge what drivers are available in Mesa, so in
>>>>>> case there are multiple DRI nodes, it has no way to know which one to
>>>>>> open for Mesa to be happy.
>>>>> I'm afraid I cannot parse that, can you please elaborate ?
>>>>
>>>> It basically boils down to the problem of using different allocator
>>>> and different rendering device.
>>>>
>>>> Let's take one of our example systems. There are 3 DRI nodes, vgem,
>>>> allocator (GEM-enabled display subsystem) and GPU (no GEM support;
>>>> it's a bit strange case maybe...
>>> It does sound strange relative to the open-source drivers. Although it
>>> will be less so once one knows the hw/stack.
>>>
>>>> Now Mesa can use kms_swrast for
>>>> vgem node and allocator node or driver X for GPU node. Obviously we
>>>> prefer driver X, but for gralloc to open the right node it would mean
>>>> hardcoding the node ID somewhere, which would be disqualifying such
>>>> image from being usable on other devices with a different set of
>>>> nodes.
>>>>
>>>> On the other hand, if we just let Mesa open the node itself, it can
>>>> check if there is a hardware driver available and favor such nodes,
>>>> only falling back to kms_swrast if no hardware-accelerated node could
>>>> be found.
>>>>
>>> Sounds like one needs a way to map render device/node to allocator and
>>> vice-versa ? We did a few similar ones (render <> card/primary node)
>>> in libdrm, not sure what's the better location for that in your
>>> case/stack.
>>>
>>>>>
>>>>>> This is actually the solution that is flaky
>>>>>> and can open wrong render node, because Mesa at least can open a node
>>>>>> that it can use for rendering.
>>>>>>  - it's just yet another private API to rely on and it's not even
>>>>>> strictly necessary, because Mesa can open the device itself. My
>>>>>> intention is to remove the need for any private APIs from gralloc
>>>>>> implementation, including making the handle opaque, which is what
>>>>>> something pretending to be called HAL should do.
>>>>>>
>>>>> Yes, the HAL should provide generic/opaque handles to users. At the
>>>>> same time having a private interface between (vendor) specific driver
>>>>> components (gralloc and EGL here) is not that crazy of an idea is it ?
>>>>> I'd imagine that at least a few vendors have them in their stack.
>>>>
>>>> This is true for a typical Android stack, where all components come
>>>> from one vendor and typically are hand crafted just for one particular
>>>> device (or set of similar devices based on the same platform/SoC).
>>>>
>>>> We are exploring a slightly new territory, though.
>>>>
>>> Agreed, we are. I think your diverse experience with various stacks
>>> allows you to see (and foresee) a few more use-cases that the
>>> open-source opens. Thus some confusion along the say ;-)
>>>
>>>>>
>>>>> One of the reasons behind gbm_gralloc was so that one can reuse the
>>>>> existing DRI screen, thus granting access to the buffer metadata/etc.
>>>>> making sharing easier/possible. Using it one can safely drop the final
>>>>> EGL image restriction [1].
>>>>>
>>>>> The alternative is to have an explicit interface between gralloc and
>>>>> EGL. Doing that may be the better option in the long term, but it
>>>>> could take some time to materialise.
>>>>>
>>>>
>>>> I think I'm missing the point of reusing the existing DRI screen for
>>>> sharing things. I thought DMA-buf/PRIME was supposed to be used for
>>>> this purpose. May I ask you to elaborate a bit on this topic,
>>>> preferably with some example use cases?
>>>>
>>>>>> On top of that, there are already EGL DRI2 backends which open the
>>>>>> device on their own, i.e. surfaceless and drm.
>>>>>>
>>>>> If you check the archives you'll see that there were concerns about
>>>>> the direct probing in surfaceless. Furthermore I believe that it was
>>>>> only ever tested with the [1] workaround and never with a pure
>>>>> upstream mesa. The drm platform reuses the gbm provided DRI primitives
>>>>> - exactly like gbm_gralloc :-)
>>>>
>>>> I can see following code in platform_drm:
>>>>
>>>>    gbm = disp->PlatformDisplay;
>>>>    if (gbm == NULL) {
>>>>       char buf[64];
>>>>       int n = snprintf(buf, sizeof(buf), DRM_DEV_NAME, DRM_DIR_NAME, 0);
>>>>       if (n != -1 && n < sizeof(buf))
>>>>          fd = loader_open_device(buf);
>>>>       if (fd < 0)
>>>>          fd = loader_open_device("/dev/dri/card0");
>>>>       dri2_dpy->own_device = 1;
>>>>       gbm = gbm_create_device(fd);
>>>>       if (gbm == NULL)
>>>>          goto cleanup;
>>>>
>>>> I'm not sure how likely it is to hit this case, though.
>>>>
>>>> Anyway, I would be happy to figure out a way to avoid such direct
>>>> probing, but at the same time addressing the problems I pointed in my
>>>> previous message.
>>>>
>>> Hmm I'm a genious. Thanks for bearing with me :-)
>>>
>>> So from a closer look: The (another?) fd is used solely for wayland
>>> "comms". Not entirely sure how it all works and/or if it's the corrent
>>> thing to do. The DRI screen/other primitives are the GBM ones,
>>>
>>>>>
>>>>> TL:DR Abstracting things may be the better route in the long term, but
>>>>> reusing the DRI primitives (which starts with reusing the fd) and thus
>>>>> getting buffer sharing to work is the more robust option that works
>>>>> with open-source drivers.
>>>>>
>>>>> All of the above is my take on things, based on previous discussions
>>>>> with Rob H/Rob C/Eric A. Although I could have misunderstood things ?
>>>>
>>>> I think I'm failing to understand the buffer sharing problem you
>>>> mentioned several times. Could we maybe discuss this on IRC some time
>>>> this week? I'm sitting on #dri-devel all the time, but should be at
>>>> the keyboard between 11-24 JST (3-16 GMT).
>>>>
>>> I don't recall the details, but the gist was:
>>>  - In the gbm_gralloc case, everything is done in Mesa effectively.
>>> gbm_gralloc is a simple wrapper (1kloc incl. white space, comments,
>>> ...) that translates the Android specifics (USAGE hints, HAL formats,
>>> etc) and has some basic locking.
>>> - Since we have essentially the same driver (mesa) managing both
>>> gralloc and EGL we can ensure that lifetime (refcounting) is correct
>>> and we can share any metadata that we might need (alongside the fd) is
>>> available.
>>>
>>> Hope Rob H (robher) or Rob C (robclark) will be able to share some
>>> greater insights.
>>
>> So the root issue is that a single open() of the drm device gives you
>> only a single gem handle namespace.  In the upstream drivers, we have
>> tricks to ensure we end up with only a single pipe_screen for a given
>> drm device (so that existing logic in the form of hashtables ensures
>> we don't end up w/ two handles for the same underlying bo).
>>
>> Note that we have the same issue with, for example, gl<->vdpau interop
>> (where same device fd is used).
>>
>> Part of this problem is, I think, avoided by not sharing the same
>> open() (ie. via dup()/etc).  *But* I think that would cause problems
>> for the way we ensure a single pipe_screen per process for a given drm
>> device node.
>>
> Thanks Rob !
>
> Just double-checking - wasn't there a subtlety wrt the prime fds ? Or
> was the whole dance solely for sharing the gem handles and avoid
> duplicating the same bo in both prime and gem lists/hash tables ?

yes, but it is kinda the same issue.. importing the same prime fd
twice gives you different gem handles

>> What I'm wondering about, for the cases where you have separate
>> render-only and kms-only devices, is whether gbm_gralloc could just
>> use two different gbm instances in that case.  One instance from the
>> gl driver, and another from minigbm for display.
>>
> In the case of wrapped drivers (props to Thierry for the idea) which
> target individual SoC designs, internally the DRI module will have
> access to both. While we could expose the fds selectively on the GBM
> level, flipping between the two is a matter of calling the respective
> libdrm helper.

I think the wrapped-driver approach helps when the gl driver is mesa..
but not otherwise..

not a case that *I* care too much about, although I suspect the CrOS
folks don't have that luxury of not caring ;-)

> If we consider the case where the allocator is separate, all we need
> is to add a gralloc util function. The latter of which maps between
> the kms/render and allocator, providing "the correct" device (node) fd
> to mesa, via GRALLOC_MODULE_PERFORM_GET_DRM_FD (or similar).

I'm mostly assuming "allocator is separate" is another way of saying
"we need to allocate scanout buffers from kms device"?

Not entirely sure what a gralloc util fxn would help us with, but
could be missing something.  I'm assuming there is just
one-true-gralloc (gbm-gralloc), so any logic of deciding which gbm
instance to allocate from would be internal to gbm-gralloc.  Although
possibly some per-platform config/hints about which gbm device to
prefer for allocating scanout buffers?

> This way, as we get some/more open-source users for a similar
> interface in gbm, we could just use it internally in mesa (libEGL)
> instead of GET_DRM_FD.
>
>> That does bring up the question, I think, of how to have multiple gbm
>> instances from different vendors.  (gbmvnd ftw!)
>>
>> I think what we do is split out the gbm interface (front-end) from
>> mesa into it's own git tree.  We already internally have a separation
>> of gbm interface from back-end which populates all the gbm_device fxn
>> ptrs.  I think just extract that out into "gbmvnd", and then use
>> mini-gbm for the kms-only devices, and the existing mesa gbm backend
>> for mesa gl drivers (on blob gl drivers are on their own to sort out
>> there mess, but at least we gave them a way so that gbm_gralloc can
>> work).
>>
>> Then it is just a question of how the loader works.  But I guess maybe
>> we can copy how glvnd does it?  I haven't looked too closely, but iirc
>> a config file of some sort?
>>
> Yes, the frontend-backend separation is there, but I don't think we
> need (yet we could) move libgbm to separate repo. If that will coerce
> vendors to stop making local changes to the frontend - sure thing,
> otherwise there's little benefit imho. Even in master we can easily
> revert (bring back) the backends loading machinery, if we have actual
> users for it ;-)

I think the split is probably mostly needed since a given process
could only have one .so providing the gbm interface syms.. so for
hypothetical example, mixing intel/mesa and nv/blob on a laptop w/
igpu + dgpu..

BR,
-R

> Still, the "the kms/render <> allocator helper" idea makes sense even
> without any of those GBM specifics ?
>
> Regards,
> Emil