[Mesa-dev] virgl and vc4 problem on Android

Thu Jun 16 17:09:26 UTC 2016

On Thu, Jun 16, 2016 at 12:56 PM, Rob Herring <robh at kernel.org> wrote:
> On Thu, Jun 16, 2016 at 11:44 AM, Rob Clark <robdclark at gmail.com> wrote:
>> On Wed, Jun 15, 2016 at 8:34 PM, Rob Herring <robh at kernel.org> wrote:
>>> In the process of adding RGBX (XB24) format to mesa for Android, I
>>> started seeing a new problem that makes the UI stop updating. It
>>> happens about when the splash screen is stopped and the lock screen is
>>> displayed. The display flickers on mouse movement, and it looks like
>>> the screen is flipping to old buffers (like the splash screen after
>>> its process exited). It is working fine for freedreno AFAICT, but I am
>>> running into a problem with virgl. With virgl, I get the following
>>> error:
>>>
>>> vrend_create_surface: context error reported 1 "surfaceflinger"
>>> Illegal resource 1435
>>> vrend_report_buffer_error: context error reported 1 "surfaceflinger"
>>> Illegal command buffer 329729
>>>
>>> The addition of the pixel format changes the eglconfig used for the
>>> splash screen. If I force the splash screen eglconfig to have an alpha
>>> or draw one frame of the splash screen and exit early or disable the
>>> splash screen, everything seems fine though I have hit the problem
>>> rarely navigating around. I suspect this has nothing to do with the
>>> pixel format other than different buffer sizes cause buffers to get
>>> reused differently.
>>>
>>> Now I've started working on getting RPi3 and vc4 working, and it
>>> appears to have a similar problem. I'm getting these errors though
>>> things go haywire before getting any error message:
>>>
>>> [   43.846569] [drm:vc4_submit_cl_ioctl] *ERROR* Failed to look up GEM BO 0: 4
>>
>> at least in the vc4 case, I suspect you need a similar bit of winsys
>> magic to ensure the same pipe_screen is returned for any given drm
>> device fd.  (Or did someone already add that?)
>
> That problem should be gone with GBM gralloc, right?

*maaaybe*..

It, like the gralloc-drm-pipe approach, means we have a pipe_screen
(vs. the other drm-gralloc backends, which were using libdrm_xyz
directly), so it was going through the logic to avoid duplicate
pipe_screens (for the drivers which had that).
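
For reference, a minimal sketch of what I mean by "avoid duplicate
pipe_screens". This is not the actual winsys code: demo_screen,
demo_screen_get_for_dev() and friends are made-up names, the table is a
fixed-size array, and there is no locking, which a real winsys would
need. The idea is just to key the screen on the device node (st_rdev
from fstat()) rather than on the fd number, and refcount it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for a driver's pipe_screen. */
struct demo_screen {
    dev_t dev;     /* device node this screen was created for */
    int refcnt;    /* number of users sharing this screen */
    bool in_use;   /* slot is occupied */
};

#define MAX_SCREENS 8
static struct demo_screen screens[MAX_SCREENS];

/* Return the screen already created for this device, or create one. */
struct demo_screen *demo_screen_get_for_dev(dev_t dev)
{
    struct demo_screen *free_slot = NULL;

    for (size_t i = 0; i < MAX_SCREENS; i++) {
        if (screens[i].in_use && screens[i].dev == dev) {
            screens[i].refcnt++;          /* share the existing screen */
            return &screens[i];
        }
        if (!screens[i].in_use && !free_slot)
            free_slot = &screens[i];
    }
    if (!free_slot)
        return NULL;                      /* table full */

    free_slot->dev = dev;
    free_slot->refcnt = 1;
    free_slot->in_use = true;
    return free_slot;
}

/* Same lookup, starting from an open fd: two fds opened on the same
 * device node report the same st_rdev and so map to one screen. */
struct demo_screen *demo_screen_get(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return NULL;
    return demo_screen_get_for_dev(st.st_rdev);
}

/* Drop one reference; the slot is freed when the last user goes away. */
void demo_screen_put(struct demo_screen *s)
{
    if (!s)
        return;
    assert(s->refcnt > 0);
    if (--s->refcnt == 0)
        s->in_use = false;
}
```

One caveat with keying on the device node: GEM handles are per open
file, not per device, so a screen shared this way also has to stick to
a single fd for its ioctls, otherwise the handles won't match up.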

Maybe w/ gbm, everything ends up sharing the same pipe_screen?  I'm
not really sure, since I guess both GL and gralloc are creating a gbm
device?

I guess it's easy enough to put a debug print in vc4_screen_create() to
confirm.  But the sort of errors you are seeing make me suspicious.

Possibly the "libdrm equivalent" part of vc4 needs to do more to avoid
re-importing the same handle multiple times?
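
Roughly what I have in mind, as a sketch only (demo_bo and both
functions are hypothetical names, not vc4 code, and the table is a
fixed-size array without locking). Importing the same buffer twice on
one fd gives you back the same GEM handle, so if each import gets its
own bo wrapper, each wrapper will GEM_CLOSE that handle and the second
close fails. Looking the handle up first and refcounting avoids that:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace wrapper around one GEM handle. */
struct demo_bo {
    uint32_t handle;   /* GEM handle from the kernel; 0 marks a free slot */
    int refcnt;        /* number of imports still holding this bo */
};

#define MAX_BOS 64
static struct demo_bo bos[MAX_BOS];

/* Call with the handle the import ioctl returned. If we already wrap
 * that handle, take another reference instead of making a second bo. */
struct demo_bo *demo_bo_from_handle(uint32_t handle)
{
    struct demo_bo *free_slot = NULL;

    if (handle == 0)
        return NULL;                      /* 0 is never a valid handle */

    for (size_t i = 0; i < MAX_BOS; i++) {
        if (bos[i].handle == handle) {
            bos[i].refcnt++;              /* re-import of a known buffer */
            return &bos[i];
        }
        if (bos[i].handle == 0 && !free_slot)
            free_slot = &bos[i];
    }
    if (!free_slot)
        return NULL;                      /* table full */

    free_slot->handle = handle;
    free_slot->refcnt = 1;
    return free_slot;
}

/* Drop one reference. Returns true only when the last reference is
 * gone, which is the one place the caller should issue GEM_CLOSE. */
bool demo_bo_unref(struct demo_bo *bo)
{
    assert(bo && bo->refcnt > 0);
    if (--bo->refcnt > 0)
        return false;
    bo->handle = 0;                       /* free the slot */
    return true;
}
```

With that, two imports that both resolve to handle 17 share one
demo_bo, and only the second demo_bo_unref() returns true, so the
handle gets closed once.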

>> In both virgl and vc4 case, you need to make sure that shared
>> (exported/imported) buffers don't end up in the bo cache.
>
> I've disabled the cache (in the gallium drv, right?) and still see problems.
>
> I am seeing a double GEM_CLOSE. I'm not sure how that is happening.
> One of them must be hwc releasing an imported buffer, but it's all in
> the same thread.
>
> [    7.024495] [drm] pid=1310, dev=0xe280, auth=0, handle=17, ret = 0, DRM_IOCTL_GEM_CLOSE
> [    7.025379] [drm] pid=1310, dev=0xe280, auth=0, handle=23, ret = 0, DRM_IOCTL_PRIME_FD_TO_HANDLE
> [    7.026663] [drm] pid=1310, dev=0xe280, auth=0, handle=10, ret = 0, DRM_IOCTL_GEM_CLOSE
> [    7.027343] [drm] pid=1310, dev=0xe200, auth=1, handle=23, ret = 0, DRM_IOCTL_PRIME_FD_TO_HANDLE
> [    7.035098] [drm] pid=1333, dev=0xe200, auth=1, handle=1, ret = 0, DRM_IOCTL_GEM_CLOSE
> [    7.036093] [drm] pid=1310, dev=0xe280, auth=0, handle=17, ret = -22, DRM_IOCTL_GEM_CLOSE

sure would be nice if there was a dump_stack() that showed you the
userspace stack too ;-)

(but maybe dumb question, is pid unique per process or thread?)

BR,
-R

> Rob