[Nouveau] GT 730 freeze: how to diagnose / debug?

Vincent Vanackere vincent.vanackere at gmail.com
Thu May 11 09:15:13 UTC 2017


2017-05-09 9:50 GMT+02:00 Vincent Vanackere <vincent.vanackere at gmail.com>:

> Some additional data:
> - putting LIBGL_ALWAYS_SOFTWARE=1 in /etc/environment does indeed make the
> system work (for my current usage, the slowness is acceptable in exchange
> for stability)
>

Unfortunately I just got a freeze (using wayland with
LIBGL_ALWAYS_SOFTWARE=1):
[179221.647861] nouveau 0000:01:00.0: Xwayland[27856]: nv50cal_space: -16
[179245.768920] traps: gnome-shell[3175] trap int3 ip:7f14cd988de1
sp:7ffe10e66110 error:0 in libglib-2.0.so.0.5200.0[7f14cd939000+111000]
[179256.854109] [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR*
[CRTC:41:head-1] hw_done timed out
[179267.094392] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:41:head-1] hw_done timed out
[179277.334749] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:41:head-1] flip_done timed out
[179279.385856] nouveau 0000:01:00.0: DRM: base-1: timeout
[179289.623162] [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR*
[CRTC:41:head-1] hw_done timed out
[179299.863479] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:41:head-1] hw_done timed out
[179310.103838] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:41:head-1] flip_done timed out
[179319.064210] INFO: task kworker/u8:1:30061 blocked for more than 120
seconds.
[179319.064211]       Not tainted 4.11.0-999-generic #201705062201
[179319.064211] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[179319.064212] kworker/u8:1    D    0 30061      2 0x00000000
[179319.064238] Workqueue: events_unbound nv50_disp_atomic_commit_work
[nouveau]
[179319.064239] Call Trace:
[179319.064242]  __schedule+0x3c3/0x840
[179319.064261]  ? nouveau_display_scanoutpos+0xe9/0x180 [nouveau]
[179319.064262]  schedule+0x36/0x80
[179319.064264]  schedule_timeout+0x23e/0x310
[179319.064265]  ? __slab_free+0xa9/0x300
[179319.064283]  ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
[179319.064300]  ? nv84_fence_read+0x2e/0x30 [nouveau]
[179319.064301]  dma_fence_default_wait+0x1af/0x250
[179319.064302]  ? dma_fence_default_wait+0x1af/0x250
[179319.064304]  ? dma_fence_free+0x20/0x20
[179319.064305]  dma_fence_wait_timeout+0x39/0xe0
[179319.064310]  drm_atomic_helper_wait_for_fences+0x4c/0xf0
[drm_kms_helper]
[179319.064326]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
[179319.064342]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
[179319.064343]  process_one_work+0x1e9/0x410
[179319.064344]  worker_thread+0x4b/0x410
[179319.064345]  kthread+0x109/0x140
[179319.064346]  ? process_one_work+0x410/0x410
[179319.064347]  ? kthread_create_on_node+0x70/0x70
[179319.064348]  ret_from_fork+0x2c/0x40
[179320.344194] [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR*
[CRTC:41:head-1] hw_done timed out
[179330.584461] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:41:head-1] hw_done timed out
[179340.824777] [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:41:head-1] flip_done timed out
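
For anyone triaging a log like the one above, a small script can show how often each timeout repeats per CRTC. This is only a sketch: the function name summarize_drm_timeouts and the regular expression are assumptions made for illustration, and it expects each kernel message on a single line (the lines above were wrapped by the mail client).

```python
import re
from collections import Counter

# Matches unwrapped lines such as:
# [179256.854109] [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:41:head-1] hw_done timed out
# Assumption: the pattern is derived only from the messages in this thread.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)[^\]]*\]\]\s+\*ERROR\*\s+"
    r"\[(?P<crtc>CRTC:[^\]]+)\]\s+(?P<what>\w+) timed out"
)

# Sample lines copied from the log above, re-joined onto one line each.
SAMPLE = [
    "[179256.854109] [drm:drm_atomic_helper_swap_state [drm_kms_helper]] "
    "*ERROR* [CRTC:41:head-1] hw_done timed out",
    "[179267.094392] [drm:drm_atomic_helper_wait_for_dependencies "
    "[drm_kms_helper]] *ERROR* [CRTC:41:head-1] hw_done timed out",
    "[179277.334749] [drm:drm_atomic_helper_wait_for_dependencies "
    "[drm_kms_helper]] *ERROR* [CRTC:41:head-1] flip_done timed out",
    "[179279.385856] nouveau 0000:01:00.0: DRM: base-1: timeout",
]


def summarize_drm_timeouts(lines):
    """Count DRM 'timed out' errors, keyed by (crtc, event)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("crtc"), match.group("what"))] += 1
    return counts


if __name__ == "__main__":
    for (crtc, what), n in sorted(summarize_drm_timeouts(SAMPLE).items()):
        print(f"{crtc} {what}: {n}")
```

To use it on a live system, the same function could be fed the lines of a saved kernel log instead of SAMPLE.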

My current kernel version is 4.11.0-999.201705062201 from
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2017-05-07/

To Ben Skeggs: is there anything I can do to help fix this? If there is
no hope of stability improvements I will have to switch to another graphics
card, so please let me know!

Best regards,

Vincent


> - I still get lock-ups using mesa from git (17.2~git1705081930.25d2 from
> this repository: https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers)
>
> I have another question (perhaps Ben Skeggs could also give some advice?):
> I see there are many more mesa variables that can be set (
> https://www.mesa3d.org/envvars.html). Are there other variables that
> I could set in order to either partially enable hardware acceleration or
> (better) to get a diagnostic of what the driver is doing that is causing
> the graphics card to hang?
>
> Thanks for your help !
>
> Vincent
>
> 2017-05-08 13:50 GMT+02:00 Vincent Vanackere <vincent.vanackere at gmail.com>:
>
>> On 07/05/2017 23:50, Ilia Mirkin wrote:
>> > You have two issues:
>> >
>> > (a) nouveau's GL driver messed something up, causing a read fault error
>> > (b) nouveau's kernel driver tried to recover. It failed.
>> >
>> > Solution to #1: None, really. You can try updating mesa, and hope it
>> > helps. Not sure what version you're on.
>>
>> Here are my package versions:
>>
>> ii  libegl1-mesa:amd64              17.0.3-1ubuntu1                amd64
>>       free implementation of the EGL API -- runtime
>> ii  libegl1-mesa-dev:amd64          17.0.3-1ubuntu1                amd64
>>       free implementation of the EGL API -- development files
>> ii  libgl1-mesa-dev:amd64           17.0.3-1ubuntu1                amd64
>>       free implementation of the OpenGL API -- GLX development files
>> ii  libgl1-mesa-dri:amd64           17.0.3-1ubuntu1                amd64
>>       free implementation of the OpenGL API -- DRI modules
>> ii  libgl1-mesa-glx:amd64           17.0.3-1ubuntu1                amd64
>>       free implementation of the OpenGL API -- GLX runtime
>> ii  libglapi-mesa:amd64             17.0.3-1ubuntu1                amd64
>>       free implementation of the GL API -- shared library
>> ii  libgles2-mesa:amd64             17.0.3-1ubuntu1                amd64
>>       free implementation of the OpenGL|ES 2.x API -- runtime
>> ii  libglu1-mesa:amd64              9.0.0-2.1build1                amd64
>>       Mesa OpenGL utility library (GLU)
>> ii  libglu1-mesa-dev:amd64          9.0.0-2.1build1                amd64
>>       Mesa OpenGL utility library -- development files
>> ii  libwayland-egl1-mesa:amd64      17.0.3-1ubuntu1                amd64
>>       implementation of the Wayland EGL platform -- runtime
>> ii  mesa-common-dev:amd64           17.0.3-1ubuntu1                amd64
>>       Developer documentation for Mesa
>> ii  mesa-utils                      8.3.0-4                        amd64
>>       Miscellaneous Mesa GL utilities
>> ii  mesa-vdpau-drivers:amd64        17.0.3-1ubuntu1                amd64
>>       Mesa VDPAU video acceleration drivers
>>
>>
>> I'll try compiling a newer version from git to see if it helps...
>>
>> > Solution to #2: Ben Skeggs will hopefully have something clever to
>> > say. The recovery logic was recently beefed up considerably, so the
>> > fact that you even got that far is already a good start.
>> >
>> > If you're looking for a stable experience with Xorg, I recommend using
>> > xf86-video-nouveau -- it's been extensively battle-tested, and is
>> > quite simple logic; I also recommend against anything that uses GL on
>> > an ongoing basis (which, sadly, everyone thinks is the coolest thing
>> > to do these days). If you're looking for a stable experience with a
>> > GL-based Wayland compositor, you'll have to wait until either the
>> > nouveau GL driver is perfect or nouveau kernel module can properly
>> > recover from any screwups the GL driver makes.
>>
>> I'm not expecting the GL driver to be perfect ;-)
>> However, it would be nice if the kernel module could recover at least a
>> bit better from bad commands from the GL driver (indeed, I've also had some
>> hard lockups where I could not even connect over ssh).
>>
>> > You can also remove nouveau_dri.so entirely, which is a big hammer
>> > against these types of issues (removes all GL-based acceleration), or
>> > you can run certain key pieces of software with
>> > LIBGL_ALWAYS_SOFTWARE=1, which will force a CPU-based GL
>> > implementation.
>>
>> Thanks for the hint, I'll try this workaround too !
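
The per-program variant of this workaround (as opposed to setting the variable system-wide in /etc/environment) could be sketched as below. The helper names software_gl_env and run_with_software_gl are hypothetical, and the demo command is just a child Python process that echoes the variable; substitute whichever GL program is suspected of triggering the hang.

```python
import os
import subprocess
import sys


def software_gl_env(base_env=None):
    """Return a copy of the environment with LIBGL_ALWAYS_SOFTWARE=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    return env


def run_with_software_gl(argv):
    """Run argv with software GL forced for that one process only."""
    return subprocess.run(argv, env=software_gl_env(), check=False)


if __name__ == "__main__":
    # Demo only: the child prints the variable so the effect is visible.
    # Replace this argv with the real program to run on the CPU renderer.
    demo = [
        sys.executable,
        "-c",
        "import os; print(os.environ['LIBGL_ALWAYS_SOFTWARE'])",
    ]
    result = run_with_software_gl(demo)
    print("exit status:", result.returncode)
```

Because the variable is set only in the child's environment, the rest of the session keeps using the hardware driver.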
>>
>> Please let me know if I can do anything to help improve the driver's
>> stability (like dumping the card's registers or enabling some traces?).
>> Alternatively, if you know of a fanless graphics card model that would be
>> able to drive 2 monitors at 2560x1440 with proper Linux support, I'm
>> interested ;-)
>>
>> Regards
>>
>> > Cheers,
>> >
>> >   -ilia
>> >
>> >
>> > 2017-05-07 16:03 GMT-04:00 Vincent Vanackere <vincent.vanackere at gmail.com>:
>> >> Hi,
>> >>
>> >>  I own an Asus GT730-SL-2GD3-BRK and am trying to drive two monitors at
>> >> 2560x1440 resolution. Using gnome-shell with either Xorg or wayland I get
>> >> screen freezes very frequently. Those freezes usually require a reboot to
>> >> get working graphics again (a sample trace that I got yesterday is below).
>> >>  I am running Ubuntu 17.04 with the latest kernels available. I also
>> >> tested various more recent kernels, including the latest drm tree at
>> >> https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next but the
>> >> problem always occurs.
>> >>  When a freeze occurs, the computer is still reachable through ssh, but
>> >> the only way I have found so far to get graphics back is to restart the
>> >> computer.
>> >>  I am willing to run diagnostic programs or test any patch if it would
>> >> help. I'm also not excluding the possibility that I may have some faulty
>> >> hardware, so any hardware health test advice would be welcome...
>> >>
>> >> Regards,
>> >>
>> >> Vincent Vanackère
>> >>
>> >> [    1.199135] nouveau 0000:01:00.0: NVIDIA GK208B (b06070b1)
>> >> [    1.319930] nouveau 0000:01:00.0: bios: version 80.28.92.00.10
>> >> [    1.322095] nouveau 0000:01:00.0: fb: 2048 MiB DDR3
>> >> [    2.620362] nouveau 0000:01:00.0: DRM: VRAM: 2048 MiB
>> >> [    2.620362] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
>> >> [    2.620364] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
>> >> [    2.620378] nouveau 0000:01:00.0: DRM: DCB version 4.0
>> >> [    2.620379] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
>> >> [    2.620380] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
>> >> [    2.620380] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022f10 00000000
>> >> [    2.620381] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
>> >> [    2.620381] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
>> >> [    2.620382] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
>> >> [    2.666199] nouveau 0000:01:00.0: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
>> >> [    2.717519] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
>> >> [    2.992994] nouveau 0000:01:00.0: DRM: allocated 2560x1440 fb: 0x60000, bo ffff8cd1499f8000
>> >> [    3.025200] fbcon: nouveaufb (fb0) is primary device
>> >> [    3.253561] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
>> >> [    3.268163] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
>> >> [ 2150.225651] nouveau 0000:01:00.0: fifo: read fault at 0006710000 engine 00 [GR] client 02 [GPC0/PE_0] reason 02 [PTE] on channel 31 [007e8cb000 Xwayland[3019]]
>> >> [ 2150.225662] nouveau 0000:01:00.0: fifo: channel 31: killed
>> >> [ 2150.225663] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
>> >> [ 2150.225666] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
>> >> [ 2150.225669] nouveau 0000:01:00.0: Xwayland[3019]: channel 31 killed!
>> >> [ 2296.863975] Workqueue: events_unbound nv50_disp_atomic_commit_work
>> >> [nouveau]
>> >> [ 2296.863990]  ? nvkm_ioctl_ntfy_get+0x69/0xb0 [nouveau]
>> >> [ 2296.864032]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
>> >> [ 2296.864047]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
>> >> [ 2296.864118] Workqueue: events_unbound nv50_disp_atomic_commit_work
>> >> [nouveau]
>> >> [ 2296.864138]  ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
>> >> [ 2296.864153]  ? nv84_fence_read+0x2e/0x30 [nouveau]
>> >> [ 2296.864175]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
>> >> [ 2296.864189]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
>> >> [ 2417.699641] Workqueue: events_unbound nv50_disp_atomic_commit_work
>> >> [nouveau]
>> >> [ 2417.699656]  ? nvkm_ioctl_ntfy_get+0x69/0xb0 [nouveau]
>> >> [ 2417.699688]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
>> >> [ 2417.699705]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
>> >> [ 2417.699785] Workqueue: events_unbound nv50_disp_atomic_commit_work
>> >> [nouveau]
>> >> [ 2417.699808]  ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
>> >> [ 2417.699825]  ? nv84_fence_read+0x2e/0x30 [nouveau]
>> >> [ 2417.699851]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
>> >> [ 2417.699867]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
>> >> [ 2538.535424] Workqueue: events_unbound nv50_disp_atomic_commit_work
>> >> [nouveau]
>> >> [ 2538.535439]  ? nvkm_ioctl_ntfy_get+0x69/0xb0 [nouveau]
>> >> [ 2538.535469]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
>> >> [ 2538.535485]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
>> >> [ 2538.535555] Workqueue: events_unbound nv50_disp_atomic_commit_work
>> >> [nouveau]
>> >> [ 2538.535576]  ? nouveau_bo_rd32+0x2a/0x30 [nouveau]
>> >> [ 2538.535591]  ? nv84_fence_read+0x2e/0x30 [nouveau]
>> >> [ 2538.535614]  nv50_disp_atomic_commit_tail+0x55/0x3a00 [nouveau]
>> >> [ 2538.535628]  nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
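
When collecting several of these reports for comparison, the fifo fault message in the trace above can be split into named fields. The sketch below assumes the message layout seen in this one log line; the function name parse_fifo_fault and the field names are made up for illustration and are not part of nouveau.

```python
import re

# Pattern derived from the single fault line in this thread (an assumption,
# not a specification of nouveau's log format). Expects the message to be
# on one line, i.e. not wrapped by a mail client.
FAULT_RE = re.compile(
    r"fifo: (?P<access>\w+) fault at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]*)\] "
    r"on channel (?P<channel>-?\d+) \[(?P<inst>[0-9a-f]+) (?P<process>.+)\]$"
)

SAMPLE_FAULT = (
    "[ 2150.225651] nouveau 0000:01:00.0: fifo: read fault at 0006710000 "
    "engine 00 [GR] client 02 [GPC0/PE_0] reason 02 [PTE] "
    "on channel 31 [007e8cb000 Xwayland[3019]]"
)


def parse_fifo_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = FAULT_RE.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["addr"] = int(fields["addr"], 16)   # faulting address, printed in hex
    fields["channel"] = int(fields["channel"])
    return fields


if __name__ == "__main__":
    fault = parse_fifo_fault(SAMPLE_FAULT)
    print(fault["access"], fault["engine"], fault["client"],
          fault["reason"], fault["channel"], fault["process"])
```

Grouping parsed faults by engine, client, and reason makes it easier to see whether repeated freezes share the same signature.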
>> >>
>> >>
>> >> _______________________________________________
>> >> Nouveau mailing list
>> >> Nouveau at lists.freedesktop.org
>> >> https://lists.freedesktop.org/mailman/listinfo/nouveau
>> >>
>>
>>
>