[BUG 4.17] etnaviv-gpu f1840000.gpu: recover hung GPU!

Russell King - ARM Linux linux at armlinux.org.uk
Wed Jul 11 20:13:44 UTC 2018


On Tue, Jun 19, 2018 at 02:28:46PM +0200, Lucas Stach wrote:
> See attached patch (which I apparently forgot to send out). The DRM GPU
> scheduler has some tracepoints, which might be helpful. The attached
> patch adds a drm_sched_job_run tracepoint when a job is queued in the
> hardware ring. Together with the existing drm_sched_process_job, this
> should get you an idea how long a job takes to process. Note that at
> any time up to 4 jobs are allowed in the hardware queue, so you need to
> match up the end times.

Here's the trace:

            Xorg  1009 [000]   276.593018:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=144, fence=0xeb849580, ring=ef16bd00, job count:0, hw job count:0
  gnome-software  1384 [000]   276.673634: gpu_scheduler:drm_sched_process_job: fence=0xeb849580 signaled
            Xorg  1009 [000]   278.756538:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=145, fence=0xecaaef40, ring=ef16bd00, job count:0, hw job count:0
            Xorg  1009 [000]   279.130196: gpu_scheduler:drm_sched_process_job: fence=0xecaaef40 signaled
            Xorg  1009 [000]   279.856129:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=146, fence=0xecbdd7c0, ring=ef16bd00, job count:0, hw job count:0
            Xorg  1009 [000]   279.856542:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=147, fence=0xeb849700, ring=ef16bd00, job count:0, hw job count:1
         kswapd0    26 [000]   281.532808: gpu_scheduler:drm_sched_process_job: fence=0xecbdd7c0 signaled
  gnome-software  1384 [000]   281.560666: gpu_scheduler:drm_sched_process_job: fence=0xeb849700 signaled
            Xorg  1009 [000]   282.554920:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=148, fence=0xecad3a00, ring=ef16bd00, job count:0, hw job count:0
            Xorg  1009 [000]   283.956216:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=149, fence=0xeb849700, ring=ef16bd00, job count:0, hw job count:1
            Xorg  1009 [000]   284.149557: gpu_scheduler:drm_sched_process_job: fence=0xecad3a00 signaled
            Xorg  1009 [000]   285.465863: gpu_scheduler:drm_sched_process_job: fence=0xeb849700 signaled
            Xorg  1009 [000]   286.796692:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=150, fence=0xecbdd340, ring=ef16bd00, job count:0, hw job count:0
            Xorg  1009 [000]   288.549507:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=151, fence=0xecbdd280, ring=ef16bd00, job count:0, hw job count:1
            Xorg  1009 [000]   288.856277:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=152, fence=0xecaaee80, ring=ef16bd00, job count:0, hw job count:2
            Xorg  1009 [000]   288.856713:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=153, fence=0xecad3040, ring=ef16bd00, job count:0, hw job count:3
  gnome-software  1384 [000]   288.958764: gpu_scheduler:drm_sched_process_job: fence=0xecbdd340 signaled
            Xorg  1009 [000]   290.052842:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=154, fence=0xecbdde80, ring=ef16bd00, job count:0, hw job count:3
            Xorg  1009 [000]   291.682222:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=155, fence=0xecaaeb80, ring=ef16bd00, job count:0, hw job count:4
  gnome-software  1384 [000]   291.760295: gpu_scheduler:drm_sched_process_job: fence=0xecbdd280 signaled
  gnome-software  1384 [000]   292.018437: gpu_scheduler:drm_sched_process_job: fence=0xecaaee80 signaled
            Xorg  1009 [000]   292.046721: gpu_scheduler:drm_sched_process_job: fence=0xecad3040 signaled
            Xorg  1009 [000]   292.819475: gpu_scheduler:drm_sched_process_job: fence=0xecbdde80 signaled
            Xorg  1009 [000]   293.149829:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=156, fence=0xecad3f40, ring=ef16bd00, job count:0, hw job count:1
  gnome-software  1384 [000]   293.190607: gpu_scheduler:drm_sched_process_job: fence=0xecaaeb80 signaled
  gnome-software  1384 [000]   293.190720: gpu_scheduler:drm_sched_process_job: fence=0xecad3f40 signaled
            Xorg  1009 [000]   293.463414:         gpu_scheduler:drm_sched_job: entity=0xee9c3e00, id=157, fence=0xecbddd00, ring=ef16bd00, job count:0, hw job count:0
            Xorg  1009 [000]   293.463629: gpu_scheduler:drm_sched_process_job: fence=0xecbddd00 signaled

So:
Job	Start		End		Time on queue:
144	276.593018	276.673634	80ms
145	278.756538	279.130196	374ms
146	279.856129	281.532808	1.7s
147	279.856542	281.560666	1.7s (overlaps 146)
148	282.554920	284.149557	1.6s
149	283.956216	285.465863	1.5s (overlaps 148)
150	286.796692	288.958764	2.2s
151	288.549507	291.760295	3.2s (overlaps 150)
152	288.856277	292.018437	3.2s (overlaps 150-151)
153	288.856713	292.046721	3.2s (overlaps 150-152)
154	290.052842	292.819475	2.8s (overlaps 151-153)
155	291.682222	293.190607	1.5s (overlaps 151-154)
156	293.149829	293.190720	40ms (overlaps 155)
157	293.463414	293.463629	215us

The time for jobs 147, 149, 151-155 are misleading because they are
delayed by the preceeding job.  Correcting for that,

					Time executing:
147					28ms
149					1.3s
151					2.8s
152					250ms
153					28ms
154					780ms
155					370ms

So, sorted by time apparently executing:
151					2.8s
150	286.796692	288.958764	2.2s
146	279.856129	281.532808	1.7s
148	282.554920	284.149557	1.6s
149					1.3s
154					780ms
145	278.756538	279.130196	374ms
155					370ms
152					250ms
144	276.593018	276.673634	80ms
156	293.149829	293.190720	40ms
153					28ms
147					28ms
157	293.463414	293.463629	215us

Of course, from the perf dump, we don't know what the GPU was being
asked to do.

I've applied your forward-progress patch, and with the 5s timeout
reverted, it seems to "behave" without causing a timeout.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


More information about the dri-devel mailing list