[Mesa-dev] virgl and vc4 problem on Android

Thu Jun 16 19:20:34 UTC 2016

On Thu, Jun 16, 2016 at 2:57 PM, Rob Herring <robh at kernel.org> wrote:
> On Thu, Jun 16, 2016 at 12:09 PM, Rob Clark <robdclark at gmail.com> wrote:
>> On Thu, Jun 16, 2016 at 12:56 PM, Rob Herring <robh at kernel.org> wrote:
>>> On Thu, Jun 16, 2016 at 11:44 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>> On Wed, Jun 15, 2016 at 8:34 PM, Rob Herring <robh at kernel.org> wrote:
>>>>> In the process of adding RGBX (XB24) format to mesa for Android, I
>>>>> started seeing a new problem that makes the UI stop updating. It
>>>>> happens about when the splash screen is stopped and the lock screen is
>>>>> displayed. The display flickers on mouse movement, and it looks like
>>>>> the screen is flipping to old buffers (like the splash screen after
>>>>> its process exited). It is working fine for freedreno AFAICT, but I am
>>>>> running into a problem with virgl. With virgl, I get the following
>>>>> error:
>>>>>
>>>>> vrend_create_surface: context error reported 1 "surfaceflinger" Illegal resource 1435
>>>>> vrend_report_buffer_error: context error reported 1 "surfaceflinger" Illegal command buffer 329729
>>>>>
>>>>> The addition of the pixel format changes the eglconfig used for the
>>>>> splash screen. If I force the splash screen eglconfig to have an alpha
>>>>> channel, draw one frame of the splash screen and exit early, or disable
>>>>> the splash screen, everything seems fine, though I have occasionally hit
>>>>> the problem while navigating around. I suspect this has nothing to do
>>>>> with the pixel format, other than that different buffer sizes cause
>>>>> buffers to get reused differently.
>>>>>
>>>>> Now I've started working on getting RPi3 and vc4 working, and it
>>>>> appears to have a similar problem. I'm getting these errors, though
>>>>> things go haywire before any error message appears:
>>>>>
>>>>> [   43.846569] [drm:vc4_submit_cl_ioctl] *ERROR* Failed to look up GEM BO 0: 4
>>>>
>>>> at least in the vc4 case, I suspect you need a similar bit of winsys
>>>> magic to ensure the same pipe_screen is returned for any given drm
>>>> device fd.  (Or did someone already add that?)
>>>
>>> That problem should be gone with GBM gralloc, right?
>>
>> *maaaybe*..
>>
>> It, like the gralloc-drm-pipe approach, means we have a pipe_screen
>> (vs. the other drm-gralloc backends which were using libdrm_xyz
>> directly), so it was going through the logic to avoid duplicate
>> pipe_screen's (for the drivers which had that).
>>
>> Maybe w/ gbm, everything ends up sharing the same pipe_screen?  I'm
>> not really sure, since I guess both GL and gralloc are creating a gbm
>> device?
>>
>> I guess easy enough to put some debug print in vc4_screen_create() to
>> confirm.  But the sort of errors you are seeing make me suspicious.
>
> Uhh, well looks like that is a problem for vc4:
>
> 01-01 00:00:07.295   127   127 W VC4     : vc4_screen_create
> 01-01 00:00:07.334   127   127 W VC4     : vc4_screen_create
> 01-01 00:00:08.349   205   223 W VC4     : vc4_screen_create
> 01-01 00:00:08.352   205   223 W VC4     : vc4_screen_create
> 01-01 00:00:35.467   437   488 W VC4     : vc4_screen_create
> 01-01 00:00:35.477   437   488 W VC4     : vc4_screen_create
> 01-01 00:00:39.041   511   511 W VC4     : vc4_screen_create
> 01-01 00:00:43.385   511   798 W VC4     : vc4_screen_create
> 01-01 00:00:44.135   718   718 W VC4     : vc4_screen_create
> 01-01 00:00:44.202   718   923 W VC4     : vc4_screen_create
>
>> Possibly the "libdrm equivalent" part of vc4 needs to do more to avoid
>> re-importing the same handle multiple times?
>
> Maybe time for the common implementation.

yeah, probably

> This doesn't explain the virgl case though as I already fixed this
> problem. The log below is from virgl.

I haven't looked closely at virgl yet, but if it has some sort of bo
cache, perhaps it is letting shared buffers into the cache? Not sure,
but I'd be on the lookout for things like that.
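
To make the cache concern concrete, here is a small sketch of the rule I
have in mind: a bo that was ever exported or imported gets freed on
release, never recycled. The struct, the `shared` flag and the demo_*
functions are hypothetical and not taken from virgl or vc4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer object; field names are illustrative only. */
struct demo_bo {
    unsigned size;
    bool shared;            /* set once the bo is exported or imported */
    struct demo_bo *next;   /* link in the free-list cache */
};

static struct demo_bo *cache_head;

/* Allocate a bo, reusing a cached one of the same size if available. */
struct demo_bo *demo_bo_alloc(unsigned size)
{
    struct demo_bo **p, *bo;

    for (p = &cache_head; *p; p = &(*p)->next) {
        if ((*p)->size == size) {
            bo = *p;
            *p = bo->next;
            bo->next = NULL;
            return bo;
        }
    }
    bo = calloc(1, sizeof(*bo));
    if (bo)
        bo->size = size;
    return bo;
}

/* Called when the last reference is dropped. Returns true if cached. */
bool demo_bo_release(struct demo_bo *bo)
{
    assert(bo && bo->next == NULL);
    if (bo->shared) {
        /* Another process may still use the storage: never recycle it. */
        free(bo);
        return false;
    }
    bo->next = cache_head;
    cache_head = bo;
    return true;
}
```

If a shared bo did land in the cache, the next allocation of that size
could hand out storage that another process is still scanning out, which
would look a lot like flipping to old buffers.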

Presumably it already has a hashtable to deal w/ multiple-imports of
the same flink name?
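
The same idea applies to prime imports. As far as I understand it,
importing the same buffer twice through one fd gives back the same GEM
handle, so userspace has to refcount the handle and close it only once.
A minimal sketch of that bookkeeping, with a fixed-size array standing
in for the hashtable and a counter standing in for the real
DRM_IOCTL_GEM_CLOSE (all demo_* names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define DEMO_MAX_BOS 64

/* One entry per live GEM handle; refcnt == 0 marks a free slot. */
struct demo_import {
    uint32_t handle;   /* GEM handle returned by the kernel */
    int refcnt;        /* number of userspace bos wrapping it */
};

static struct demo_import table[DEMO_MAX_BOS];
static int demo_close_calls;   /* counts simulated GEM_CLOSE calls */

/* Look up a handle; bump its refcount if known, else record it. */
struct demo_import *demo_import_handle(uint32_t handle)
{
    struct demo_import *free_slot = NULL;

    for (int i = 0; i < DEMO_MAX_BOS; i++) {
        if (table[i].refcnt > 0 && table[i].handle == handle) {
            table[i].refcnt++;
            return &table[i];
        }
        if (table[i].refcnt == 0 && !free_slot)
            free_slot = &table[i];
    }
    if (free_slot) {
        free_slot->handle = handle;
        free_slot->refcnt = 1;
    }
    return free_slot;   /* NULL if the table is full */
}

/* Drop one reference; only the last one closes the handle. */
void demo_import_unref(struct demo_import *imp)
{
    assert(imp && imp->refcnt > 0);
    if (--imp->refcnt == 0)
        demo_close_calls++;   /* real code would issue GEM_CLOSE here */
}
```

Without the refcount, two wrappers around the same handle each issue
their own close, and the second one fails with -EINVAL, which is the
ret = -22 pattern in the GEM_CLOSE trace quoted below.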

BR,
-R

>>>> In both virgl and vc4 case, you need to make sure that shared
>>>> (exported/imported) buffers don't end up in the bo cache.
>>>
>>> I've disabled the cache (in the gallium drv, right?) and still see problems.
>>>
>>> I am seeing a double GEM_CLOSE. I'm not sure how that is happening.
>>> One of them must be hwc releasing an imported buffer, but it's all in
>>> the same thread.
>>>
>>> [    7.024495] [drm] pid=1310, dev=0xe280, auth=0, handle=17, ret = 0, DRM_IOCTL_GEM_CLOSE
>>> [    7.025379] [drm] pid=1310, dev=0xe280, auth=0, handle=23, ret = 0, DRM_IOCTL_PRIME_FD_TO_HANDLE
>>> [    7.026663] [drm] pid=1310, dev=0xe280, auth=0, handle=10, ret = 0, DRM_IOCTL_GEM_CLOSE
>>> [    7.027343] [drm] pid=1310, dev=0xe200, auth=1, handle=23, ret = 0, DRM_IOCTL_PRIME_FD_TO_HANDLE
>>> [    7.035098] [drm] pid=1333, dev=0xe200, auth=1, handle=1, ret = 0, DRM_IOCTL_GEM_CLOSE
>>> [    7.036093] [drm] pid=1310, dev=0xe280, auth=0, handle=17, ret = -22, DRM_IOCTL_GEM_CLOSE
>>
>> sure would be nice if there was a dump_stack() that showed you the
>> userspace stack too ;-)
>>
>> (but maybe dumb question, is pid unique per process or thread?)
>
> Ignoring namespaces, pids are globally unique.
>
> Rob