[Nouveau] Request for info on a big problems with nouveau driver
Stéphane Marchesin
marcheu at chromium.org
Mon Mar 18 23:31:46 UTC 2019
On Sat, Mar 16, 2019 at 12:33 PM Mauro Rossi <issor.oruam at gmail.com> wrote:
> Hi Stéphane,
>
> the good news is that Karol Herbst's patches are effectively mitigating
> the GPU lockup.
> It would really be a pity to lose and abandon the nouveau driver in
> android-x86,
> while intel, radeon and amdgpu are working perfectly.
>
> The Android GUI always reboots the same way when bringing back the main screen,
> with the home button or the square menu button.
>
> I've collected a log with drm.debug level 63
> to see what is happening prior to EGL-MAIN: DRI2: failed to create
> screen/ EGL_NOT_INITIALIZED
>
> The full log and tombstone are in the attachment;
> could someone on the nouveau team decipher the errors?
>
That's a question more appropriate for the nouveau list, I am CCing the
list.
Stéphane
> The logs also contain the DRM ioctl commands issued before the
> DRI screen error
>
> Mauro
>
> 03-16 18:57:03.615 0 0 E : 00a0 2 base507c_ntfy_set
> 03-16 18:57:03.615 0 0 E : 00000060
> 03-16 18:57:03.615 0 0 E : f0000000
> 03-16 18:57:03.615 0 0 E : 0084 1 base907c_image_set
> 03-16 18:57:03.615 0 0 E : 00000010
> 03-16 18:57:03.615 0 0 E : 00c0 1 base907c_image_set
> 03-16 18:57:03.615 0 0 E : fb0000fe
> 03-16 18:57:03.615 0 0 E : 0400 5 base907c_image_set
> 03-16 18:57:03.615 0 0 E : 00010000
> 03-16 18:57:03.615 0 0 E : 00000000
> 03-16 18:57:03.615 0 0 E : 04000500
> 03-16 18:57:03.615 0 0 E : 00005004
> 03-16 18:57:03.615 0 0 E : 0000cf00
> 03-16 18:57:03.615 0 0 E : 0080 1 base507c_update
> 03-16 18:57:03.615 0 0 E : 00000000
> 03-16 18:57:03.616 2729 4165 W EGL-MAIN: DRI2: failed to create dri screen
> 03-16 18:57:03.616 2729 4165 W EGL-MAIN: DRI2: failed to create screen
> 03-16 18:57:03.617 2729 4165 W libEGL : eglInitialize(0xad3ab800) failed (EGL_NOT_INITIALIZED)
> 03-16 18:57:03.617 2729 4165 I system_server: android::hardware::configstore::V1_0::ISurfaceFlingerConfigs::hasWideColorDisplay retrieved: 0
> 03-16 18:57:03.617 2729 4165 I OpenGLRenderer: Initialized EGL, version 1.4
> 03-16 18:57:03.617 2729 4165 D OpenGLRenderer: Swap behavior 2
> 03-16 18:57:03.617 2729 4165 F OpenGLRenderer: Failed to choose config, error = EGL_NOT_INITIALIZED
> --------- beginning of crash
> 03-16 18:57:03.617 2729 4165 F libc : Fatal signal 6 (SIGABRT), code -6 in tid 4165 (RenderThread), pid 2729 (system_server)
>
> On Tue, Mar 5, 2019 at 8:55 AM Mauro Rossi <issor.oruam at gmail.com> wrote:
> >
> > Hi,
> > one of the problems (the Play Store crash) was resolved with the following
> > commit:
> >
> http://git.osdn.net/view?p=android-x86/frameworks-base.git;a=commit;h=d488a6c2bbedc06fc22942555d0157e7bf09f135
> >
> > Now the remaining one, affecting the dEQP-EGL multithreading tests and
> > RenderThread in general,
> > has been traced in the attached logs.
> >
> > It seems to be a problem similar to "a second libEGL call failing": when
> > RenderThread tries to create the dri screen, it is killed because Android's
> > attempt to load an EGL config fails and is treated as fatal.
> > We just need to find the root cause of the failure.
> >
> > In the logcat there is a clue of what is happening:
> >
> > --------- beginning of crash
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: FATAL EXCEPTION: main
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: Process: com.android.systemui, PID: 1440
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: java.lang.NullPointerException: Attempt to invoke virtual method 'android.graphics.GraphicBuffer android.graphics.Bitmap.createGraphicBufferHandle()' on a null object reference
> > 03-04 20:50:56.762 1440 1440 E AndroidRuntime: at com.android.systemui.recents.views.RecentsTransitionHelper.drawViewIntoGraphicBuffer(RecentsTransitionHelper.java:436)
> >
> > Mauro
> >
> > On Tue, Mar 5, 2019 at 1:29 AM Stéphane Marchesin <marcheu at chromium.org>
> wrote:
> > >
> > >
> > >
> > > On Sat, Mar 2, 2019 at 12:08 AM Mauro Rossi <issor.oruam at gmail.com>
> wrote:
> > >>
> > >> Hi Stéphane,
> > >>
> > >> On Fri, Mar 1, 2019 at 11:24 PM Stéphane Marchesin <
> marcheu at chromium.org> wrote:
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Mar 1, 2019 at 4:30 AM Mauro Rossi <issor.oruam at gmail.com>
> wrote:
> > >> >>
> > >> >> Hi Stéphane,
> > >> >>
> > >> >> thanks for responding
> > >> >>
> > >> >> On Thu, Feb 28, 2019 at 9:56 PM Stéphane Marchesin <
> marcheu at chromium.org> wrote:
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Tue, Feb 19, 2019 at 6:54 PM Tomasz Figa <tfiga at chromium.org>
> wrote:
> > >> >> >>
> > >> >> >> Hi Mauro,
> > >> >> >>
> > >> >> >> Thanks for your query. I'm not very active in the graphics area
> > >> >> >> anymore, but let me add +Stéphane Marchesin, who should know
> > >> >> >> best.
> > >> >> >>
> > >> >> >> Best regards,
> > >> >> >> Tomasz
> > >> >> >>
> > >> >> >> On Wed, Feb 20, 2019 at 3:00 AM Mauro Rossi <
> issor.oruam at gmail.com> wrote:
> > >> >> >> >
> > >> >> >> > Hi Tomasz,
> > >> >> >> >
> > >> >> >> > I wanted to ask for some help, even just some information about how
> > >> >> >> > nouveau is working with the chromeos minigbm stack, because we have big
> > >> >> >> > issues with drm_gralloc and gbm_gralloc.
> > >> >> >> >
> > >> >> >> > The nouveau gallium driver does not support multithreading, and oreo-x86
> > >> >> >> > has introduced additional RenderThread scenarios which cause
> > >> >> >> > instability.
> > >> >> >> >
> > >> >> >> > dEQP-EGL multithreading tests are causing GUI restarts, even with
> > >> >> >> > the latest Karol Herbst patches with per gl context mutex locking
> > >> >> >> > and per fence mutex locking.
> > >> >> >> > He said there is an additional race condition that may require
> > >> >> >> > another major rewrite,
> > >> >> >> > but he did not mention which additional race condition.
> > >> >> >> >
> > >> >> >> > I wanted to ask you just for some info, in case you have it, or
> > >> >> >> > suggestions on how to avoid the problem.
> > >> >> >> >
> > >> >> >> > 1) Are you aware of problems on chromeos with nouveau MT, and how
> > >> >> >> > they were avoided?
> > >> >> >> > At the moment I can boot with minigbm, but the navigation bar and menu
> > >> >> >> > bar are transparent and invisible, so I was not able to check if
> > >> >> >> > minigbm has the same problems we have.
> > >> >> >> >
> > >> >> >> > 2) We are so stuck with nouveau support that I was thinking of exploring
> > >> >> >> > another angle:
> > >> >> >> > is it possible to disable additional threads in the android-x86 code base for Oreo?
> > >> >> >> > Do you have some colleagues who may provide an indication of how to do it?
> > >> >> >> >
> > >> >> >
> > >> >> >
> > >> >> > Hi Mauro,
> > >> >> >
> > >> >> > We don't officially support nouveau on Chrome OS (there are no
> > >> >> > devices which use it). The nouveau minigbm driver was written to be able to
> > >> >> > develop Chrome for Chrome OS on top of a Linux workstation with an nvidia
> > >> >> > GPU. In particular, we have never started Android with that configuration.
> > >> >> >
> > >> >> > Can you give more details on issue 1, i.e. what is invisible?
> > >> >> > Last I looked Chrome was working. Are you certain this is related to
> > >> >> > threading?
> > >> >> >
> > >> >> > Stéphane
> > >> >>
> > >> >> [minigbm issue]
> > >> >>
> > >> >> The problem with minigbm was mentioned after trying to use minigbm
> > >> >> as it is in the Chrome OS stack (which supports running Android
> > >> >> applications AFAIK).
> > >> >>
> > >> >> The stock minigbm was not ready to boot in android-x86, lambdadroid
> > >> >> added dma fb support and I added some required formats (RGBA, RGBX,
> > >> >> RGB565)
> > >> >> to be able to boot:
> > >> >> https://github.com/maurossi/minigbm/commits/minigbm_fb
> > >> >>
> > >> >> Using that version of minigbm with android-x86 (oreo-x86) I see
> > >> >> that the Android GUI top bar, bottom menu bar, icons and cursor are
> > >> >> invisible/not rendered,
> > >> >> even if blind interaction is possible.
> > >> >> Maybe I've done something wrong because the drm format selection in
> > >> >> minigbm is not as easy to understand as the drm_gralloc and
> > >> >> gbm_gralloc ones.
> > >> >
> > >> >
> > >> > Yeah as I said, we never ran any Android with the nouveau minigbm
> > >> > driver, not ARC++, even less Android, so I don't know.
> > >> >
> > >> >>
> > >> >>
> > >> >> The GUI transparency (or missing rendering) with minigbm does not seem
> > >> >> related to multiple threads using the same GL context;
> > >> >> however, the GPU lockups and failure of dEQP-EGL multithreading tests
> > >> >> happening also with drm_gralloc and gbm_gralloc are certainly related.
> > >> >>
> > >> >> [MT issues]
> > >> >>
> > >> >> Since it is already established that nouveau lacks the MT support
> > >> >> that other mesa drivers (i965, radeon, amdgpu) have,
> > >> >> and Karol Herbst submitted patches to mesa-dev to bring "per gl
> > >> >> context mutex" and "per fence mutex locking" to nouveau,
> > >> >> I tried to run CTS dEQP-EGL with mesa GLES/EGL built with those patches.
> > >> >> The result was that dEQP-EGL multithreading tests failed, causing GUI
> > >> >> reboots or PC restarts.
> > >> >>
> > >> >
> > >> > I am surprised by that; we have no problem with android on radeon
> > >> > which uses gallium which would have the same issues.
> > >>
> > >> We have no problem with radeon either,
> > >> but for nouveau there is a history of GPU lockups with android-x86
> > >> as we speak.
> > >> Ilia Mirkin confirmed in several different bugzilla tickets that
> > >> nouveau does not react well to multiple worker threads on the same gl
> > >> context.
> > >
> > >
> > >
> > > Hmm if you get GPU lockups, yes that's a different problem.
> > >
> > >
> > >>
> > >>
> > >> In fact, with some prototype mutex locking patches we had a mitigation
> > >> for android-x86 releases from lollipop-x86 to nougat-x86.
> > >>
> > >> Karol Herbst submitted patches to mesa-dev last December for that
> > >> exact same problem;
> > >> the patches are not yet upstreamed, so technically the problem is
> > >> still there.
> > >>
> > >> The current Use Case is android-x86, but the first next GUI using
> > >> multiple threads will have problems too.
> > >>
> > >> >
> > >> >
> > >> >> When I contacted Karol Herbst, he told me that there may be one
> > >> >> additional race condition, but he did not clarify which one.
> > >> >>
> > >> >> What about launching dEQP-EGL on a platform other than Android,
> > >> >> e.g. EGL wayland? Is that possible, to see if the tests also fail on
> > >> >> the Linux platform?
> > >> >
> > >> >
> > >> > We use the surfaceless/null backend for deqp. We have upstreamed
> > >> > it, you should be able to use that also. Otherwise I have used the glx
> > >> > backend successfully as well on my desktop.
> > >>
> > >> Could it be that in your scenario there is only one thread per gl
> > >> context at a time?
> > >>
> > >
> > > In general, most of deqp is one GL context at a time, unless you run
> > > the parallel deqp stuff. So yes it would probably help. Similarly Chrome OS
> > > is running pretty much in a single GPU process, so we wouldn't see that
> > > problem either when running nouveau.
> > >
> > >
> > >>
> > >> >
> > >> >
> > >> >>
> > >> >> Are there similar tests in piglit?
> > >> >
> > >> >
> > >> > I'm not aware of any, but I stopped using piglit years ago.
> > >> >
> > >> >>
> > >> >>
> > >> >> [Other issue appeared with Android 8 Oreo hardware bitmaps]
> > >> >>
> > >> >> System UI and Play Store crashes, are happening after successful
> > >> >> android-x86 boot with drm_gralloc and gbm_gralloc,
> > >> >> these crashes seem to be very much related to this path:
> > >> >> CreateHardwareBitmap -> CreateBitmap -> Null Pointer Exception.
> > >> >> CreateHardwareBitmap (introduced in Android Oreo),
> > >> >
> > >> >
> > >> > Seems like you are missing dri extensions?
> > >>
> > >> Checking in the logcat the boot with nouveau has all extensions as per
> > >> other drivers,
> > >> but it has DRI_IMAGE twice, is that bad?
> > >>
> > >> 02-02 10:35:37.176 2489 2489 D vndksupport: Loading /vendor/lib/egl/libGLES_mesa.so from current namespace instead of sphal namespace.
> > >> 02-02 10:35:37.188 2489 2489 D libEGL : loaded /vendor/lib/egl/libGLES_mesa.so
> > >> 02-02 10:35:37.251 2489 2489 D vndksupport: Loading /vendor/lib/hw/gralloc.gbm.so from current namespace instead of sphal namespace.
> > >> 02-02 10:35:37.253 2489 2489 I EGL-MAIN: found extension DRI_Core version 2
> > >> 02-02 10:35:37.253 2489 2489 I EGL-MAIN: found extension DRI_IMAGE_DRIVER version 1
> > >> 02-02 10:35:37.253 2489 2489 I EGL-MAIN: found extension DRI_ConfigOptions version 2
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_TexBuffer version 2
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI2_Flush version 4
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_IMAGE version 17
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_IMAGE version 17
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_RENDERER_QUERY version 1
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_CONFIG_QUERY version 1
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI2_Fence version 2
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI2_Interop version 1
> > >> 02-02 10:35:37.257 2489 2489 I EGL-MAIN: found extension DRI_NoError version 1
> > >>
> > >
> > > Can you put it in gdb and see where the NULL crash is? One can only
> > > intuit about what's going on otherwise.
> > >
> > > Stéphane
> > >
> > >
> > >>
> > >> >
> > >> > Stéphane
> > >> >
> > >> >
> > >> >>
> > >> >> uses only one copy
> > >> >> of bitmap instead of two, are there some restrictions in nouveau
> with
> > >> >> RGBA/RGBX, BGRA hardware bitmaps?
> > >> >>
> > >> >> Thanks in advance for any info, suggestions
> > >> >> I am available and ready to support testing/verifications to see
> the
> > >> >> MT and HardwareBitmap issues solved.
> > >> >>
> > >> >> Mauro
> > >> >>
> > >> >>
> > >> >>
> > >> >> >
> > >> >> >
> > >> >> >>
> > >> >> >> > Mauro
>