[Intel-gfx] [RFC] drm/i915: Don't reset on preemptible workloads

Thu Aug 2 12:32:31 UTC 2018

On Thu, 2018-08-02 at 10:14 +0100, Chris Wilson wrote:
> Quoting Bartminski, Jakub (2018-08-02 09:56:10)
> > On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> > > Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > > > [...]
> > > Though the argument that as long as its preemptible (and
> > > isolated) it
> > > is not acting as DoS so we don't need to enforce a timeout. That
> > > also
> > > suggests that we need not reset until the request is blocking
> > > others
> > > or userspace.
> > 
> > So if I'm understanding correctly you are suggesting that if we
> > have
> > preemption enabled we could drop the hangcheck entirely, and just
> > reset
> > if preemption fails?
> 
> Yes, but we still need the reset if we are stuck at the head of a
> chain.

I see a potential problem with that approach. If we have a request that
is actually hung and introduce a higher-priority request, when trying
to preempt the hung request we experience some considerable latency
since we need to do the hangcheck at that time, whereas if we were
hangchecking regularly it would have been declared hung before that.

> [...]
> > As I've mentioned before, one other problem with this patch is that
> > we
> > can't have more than one "background" request working before the
> > first
> > one finishes. While this could be solved at hangcheck by putting
> > the
> > background requests in the back of the background-priority queue
> > instead of at the front during preemption, this breaks if the
> > background requests are dependent on each other. These dependencies
> > could be resolved in the tasklet (only in the case hangcheck was
> > called
> > so it wouldn't slow us down too much, they could also be resolved
> > in
> > the hangcheck itself but the timeline could change in between the
> > hangcheck and the tasklet, which would make doing it correctly more
> > complicated I think).
> 
> I think the very simple rule that you reset if there are any
> dependencies and demote otherwise covers that.
> -Chris

So from what I understand, what you meant by that is that you would
scratch the opt-in and just demote the priority by some constant on the
hangcheck, handle the change in the tasklet, possibly preempting, and
if so then wait until the next hangcheck to see if the preempt worked
(and if not then reset the engine)? 
While this is an elegant solution, if we want to allow (do we want to?)
multiple "background" requests working interweaved with each other this
can still lead to starvation. 
Imagine the following scenario: we leave a "background" request working
for a long enough time that its priority is decreased substantially. If
we now introduce a second "background" request with the same starting
priority, it will starve the first one for as long as it takes for it
to be demoted to the first's current priority.
We could avoid demoting if the request is the only one of that
priority, but then the same starvation happens if we have two starting
requests instead of one, and introduce the third one after some time.
I think one way to fix this could be to demote not by a constant, but
to a bottom ("background") queue, and then maybe rotate the bottom
queue on hangcheck (if that's possible, we should have no dependencies
there with what you proposed) to let multiple background requests work
interweaved. What is your opinion on that approach?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3278 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20180802/b22a1614/attachment.bin>