[RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode

Dmitry Osipenko digetx at gmail.com
Sat Sep 12 21:51:01 UTC 2020


12.09.2020 16:31, Mikko Perttunen пишет:
...
>> I'm now taking a closer look at this patch and it raises some more
>> questions, like for example by looking at the "On job timeout, we stop
>> the channel, NOP all future jobs on the channel using the same syncpoint
>> ..." through the prism of grate-kernel experience, I'm not sure how it
>> could co-exist with the drm-scheduler and why it's needed at all. But I
>> think we need a feature-complete version (at least a rough version), so
>> that we could start the testing, and then it should be easier to review
>> and discuss such things.
>>
> 
> The reason this is needed is that if a job times out and we don't do its
> syncpoint increments on the CPU, then a successive job incrementing that
> same syncpoint would cause fences to signal incorrectly. The job that
> was supposed to signal those fences didn't actually run; and any data
> those fences were protecting would still be garbage.

I'll need to re-read the previous discussion because IIRC, I was
suggesting that once job is hung, all jobs should be removed from
queue/PB and re-submitted, then the re-submitted jobs will use the
new/updated sync point base.

And we probably should need another drm_tegra_submit_cmd type that waits
for a relative sync point increment. The
drm_tegra_submit_cmd_wait_syncpt uses absolute sync point value and it
shouldn't be used for sync point increments that are internal to a job
because it complicates the recovery.

All waits that are internal to a job should only wait for relative sync
point increments.

In the grate-kernel every job uses unique-and-clean sync point (which is
also internal to the kernel driver) and a relative wait [1] is used for
the job's internal sync point increments [2][3][4], and thus, kernel
driver simply jumps over a hung job by updating DMAGET to point at the
start of a next job.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L367

[2]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/gr3d.c#L486
[3]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/exa/copy_2d.c#L389
[4]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/tegra_stream_v2.c#L536


More information about the dri-devel mailing list