[Mesa-dev] [PATCH 07/10] egl/android: Make drm_gralloc headers optional

Tue Jul 19 03:21:34 UTC 2016

On Tue, Jul 19, 2016 at 2:35 AM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
> On 18 July 2016 at 16:38, Tomasz Figa <tfiga at chromium.org> wrote:
>> On Mon, Jul 18, 2016 at 11:58 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>> On 18 July 2016 at 13:02, Tomasz Figa <tfiga at chromium.org> wrote:
>>>> On Mon, Jul 18, 2016 at 7:28 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote:
>>>>> Hi Tomasz,
>>>>>
>>>>> On 15 July 2016 at 08:53, Tomasz Figa <tfiga at chromium.org> wrote:
>>>>>
>>>>>> +#define DRM_RENDER_DEV_NAME  "%s/renderD%d"
>>>>>> +
>>>>>> +static int
>>>>>> +droid_open_device(_EGLDisplay *dpy)
>>>>>> +{
>>>>>> +   struct dri2_egl_display *dri2_dpy = dpy->DriverData;
>>>>>> +   const int limit = 64;
>>>>>> +   const int base = 128;
>>>>>> +   int fd;
>>>>>> +   int i;
>>>>>> +
>>>>>> +   for (i = 0; i < limit; ++i) {
>>>>>> +      char *card_path;
>>>>>> +      if (asprintf(&card_path, DRM_RENDER_DEV_NAME, DRM_DIR_NAME, base + i) < 0)
>>>>> Why do we need any of this ? What gralloc implementation are you guys using ?
>>>>
>>>> We are using our heavily rewritten fork of some old drm_gralloc
>>>> release. It supports only render nodes and PRIME FDs and doesn't
>>>> export the DRI device FD outside of its internals (which isn't
>>>> actually even fully correct, at least for PRIME and render nodes, see
>>>> my reply to Rob's comments).
>>>>
>>> That explain it, since https://chromium.googlesource.com/ does not
>>> have gralloc, and
>>> https://android.googlesource.com/platform/external/drm_gralloc/ has
>>> both the DRM_FD define and the gem/flink function(s)?
>>>
>>> Can I suggest porting the fd drm_gralloc/gbm_gralloc patches to your
>>> private copy/repo. This way we'll have some consistency throughout
>>> gralloc implementations
>>
>> I'd prefer if any code using flink names was not added back. On top of
>> that, our drm_gralloc doesn't really have much in common with that
>> from android-x86 anymore (as I said, it was heavily rewritten) and
>> there is not even a chance that with its current design flink names
>> could even work.
>>
>> Also I'm wondering why we want to consider current brokenness of
>> drm_gralloc as something to be consistent with. It's supposed to be a
>> HAL library providing an uniform abstraction, but it exports private
>> APIs on the side instead. Moreover, as I mentioned before, flink names
>> are considered insecure and it would be really much better if we could
>> just forget about them.
>>
>>> and you can use gbm_gralloc directly in the
>>> (hopefully) not too distant future.
>>
>> I agree with this part, though. gbm_gralloc is definitely something
>> that we might want to migrate to in the future. Although it's a bit
>> lacking at the moment, so it might need a bit more time to develop the
>> missing bits. [I'm CCing Gurchetan, who was investigating GBM-backed
>> gralloc usable for our purposes.]
>>
>> In any case, the missing flink API is quite easy to handle and can be
>> just stubbed out in a local header as you suggested. I don't think it
>> would hurt anyone and would definitely help us and anyone not willing
>> to export any private APIs from their gralloc and rely only on the
>> public HAL API.
>>
> Looks like I wasn't clear enough here, realyl sorry about that. No
> objection on nuking _any_ of the gem/flink paths, but hoping to have
> the behaviour consistent with the one described in
> get_native_buffer_fd.

Did you mean having the PRIME FD in native_handle_t::data[0]?

If so, it's more or less guaranteed by the API, because all file
descriptors in handle have to be stored in first N (equals to
native_handle_t::numFds) ints of native_handle_t::data[] for
respective general code to properly transfer the FDs through binder
when sharing between processes.

Our gralloc currently supports only one PRIME FD per buffer (no
separate memory planes for planar YUV) and stores it exactly in
native_handle_t::data[0].

>
>>>
>>>>>
>>>>> Afaict the latter must provide reasonable result for
>>>>> hw_get_module(GRALLOC_HARDWARE_MODULE_ID...) and as it's missing the
>>>>> perform hook existing code should work just fine. Right ?
>>>>
>>>> Existing code would fail with -1 as file descriptor, wouldn't it? Or
>>>> I'm failing to see something?
>>>>
>>> Nope you're spot on - I had a dull moment. May I suggest revering the
>>> patch which removed the GRALLOC_MODULE_PERFORM_GET_DRM_FD handling in
>>> your gralloc ? Reason being is that the proposed code is very 'flaky'
>>> and can open the wrong render node on systems which have more than
>>> one.
>>
>> I think the answer is a bit of yes and no at the same time.
>>
>> Starting with no, it's incorrect for gralloc to share the DRI device
>> FD with Mesa for multiple reasons:
>>  - there are cases when the allocator used is different that the render node,
> Can you please provide an example how the current open-source
> gralloc/EGL stack might hit this ? Only a mix of closed and
> open-source components comes to mind :-\

Well, yes. I don't think I can hide the fact that we have to use
closed source components on some platforms. To put it simply, that's
the reality of non-Intel world. On Intel-based platforms we indeed
have the same allocating and rendering device.

I can also imagine some users having some other custom gralloc, for
example based on Ion (which I personally don't like, but some people
prefer...). I just want the existing Mesa code to be as less tied to
any private interfaces and as much compatible with other solutions as
possible.

I might be able to give one more use case here as well, but need to
consult with some people internally first.

>
>>  - gralloc and Mesa might be stepping over each other - they would
>> share the GEM handle namespace, which is not refcounted,
> When working with gbm_gralloc that is a bugfix, imho. Other
> implementations such as drm_gralloc (and/or minigbm ?) might be more
> susceptible to such issues.
>
>>  - it is generally undesirable to let an external library mess with
>> graphics context, because it is bug-prone (see above).
>>
> Hmm true, exposing this to users is not a good idea. We could keep it
> as a interface private between {drm,gbm,X}gralloc and mesa.

What I mean is that {drm,gbm,X}gralloc might be doing some operations
on the DRI FD it shares with Mesa, which is bug prone, as in the
example above.

>
>> Then we have a small yes: We don't have to share the same FD, we can
>> just make the implementation of that perform call create a new FD
>> every time it's called.
>>
>> However, after a bit of reflection, there is another no, because:
>> - gralloc has no knowledge what drivers are available in Mesa, so in
>> case there are multiple DRI nodes, it has no way to know which one to
>> open for Mesa to be happy.
> I'm afraid I cannot parse that, can you please elaborate ?

It basically boils down to the problem of using different allocator
and different rendering device.

Let's take one of our example systems. There are 3 DRI nodes, vgem,
allocator (GEM-enabled display subsystem) and GPU (no GEM support;
it's a bit strange case maybe...). Now Mesa can use kms_swrast for
vgem node and allocator node or driver X for GPU node. Obviously we
prefer driver X, but for gralloc to open the right node it would mean
hardcoding the node ID somewhere, which would be disqualifying such
image from being usable on other devices with a different set of
nodes.

On the other hand, if we just let Mesa open the node itself, it can
check if there is a hardware driver available and favor such nodes,
only falling back to kms_swrast if no hardware-accelerated node could
be found.

>
>> This is actually the solution that is flaky
>> and can open wrong render node, because Mesa at least can open a node
>> that it can use for rendering.
>>  - it's just yet another private API to rely on and it's not even
>> strictly necessary, because Mesa can open the device itself. My
>> intention is to remove the need for any private APIs from gralloc
>> implementation, including making the handle opaque, which is what
>> something pretending to be called HAL should do.
>>
> Yes, the HAL should provide generic/opaque handles to users. At the
> same time having a private interface between (vendor) specific driver
> components (gralloc and EGL here) is not that crazy of an idea is it ?
> I'd imagine that at least a few vendors have them in their stack.

This is true for a typical Android stack, where all components come
from one vendor and typically are hand crafted just for one particular
device (or set of similar devices based on the same platform/SoC).

We are exploring a slightly new territory, though.

>
> One of the reasons behind gbm_gralloc was so that one can reuse the
> existing DRI screen, thus granting access to the buffer metadata/etc.
> making sharing easier/possible. Using it one can safely drop the final
> EGL image restriction [1].
>
> The alternative is to have an explicit interface between gralloc and
> EGL. Doing that may be the better option in the long term, but it
> could take some time to materialise.
>

I think I'm missing the point of reusing the existing DRI screen for
sharing things. I thought DMA-buf/PRIME was supposed to be used for
this purpose. May I ask you to elaborate a bit on this topic,
preferably with some example use cases?

>> On top of that, there are already EGL DRI2 backends which open the
>> device on their own, i.e. surfaceless and drm.
>>
> If you check the archives you'll see that there were concerns about
> the direct probing in surfaceless. Furthermore I believe that it was
> only ever tested with the [1] workaround and never with a pure
> upstream mesa. The drm platform reuses the gbm provided DRI primitives
> - exactly like gbm_gralloc :-)

I can see following code in platform_drm:

   gbm = disp->PlatformDisplay;
   if (gbm == NULL) {
      char buf[64];
      int n = snprintf(buf, sizeof(buf), DRM_DEV_NAME, DRM_DIR_NAME, 0);
      if (n != -1 && n < sizeof(buf))
         fd = loader_open_device(buf);
      if (fd < 0)
         fd = loader_open_device("/dev/dri/card0");
      dri2_dpy->own_device = 1;
      gbm = gbm_create_device(fd);
      if (gbm == NULL)
         goto cleanup;

I'm not sure how likely it is to hit this case, though.

Anyway, I would be happy to figure out a way to avoid such direct
probing, but at the same time addressing the problems I pointed in my
previous message.

>
> TL:DR Abstracting things may be the better route in the long term, but
> reusing the DRI primitives (which starts with reusing the fd) and thus
> getting buffer sharing to work is the more robust option that works
> with open-source drivers.
>
> All of the above is my take on things, based on previous discussions
> with Rob H/Rob C/Eric A. Although I could have misunderstood things ?

I think I'm failing to understand the buffer sharing problem you
mentioned several times. Could we maybe discuss this on IRC some time
this week? I'm sitting on #dri-devel all the time, but should be at
the keyboard between 11-24 JST (3-16 GMT).

Best regards,
Tomasz