[Mesa-dev] [PATCH] RFC: Extend IMG_context_priority with NV_context_priority_realtime

Kenneth Graunke kenneth at whitecape.org
Sat Mar 31 19:29:28 UTC 2018


On Saturday, March 31, 2018 5:56:57 AM PDT Chris Wilson wrote:
> Quoting Chris Wilson (2018-03-31 12:00:16)
> > Quoting Kenneth Graunke (2018-03-30 19:20:57)
> > > On Friday, March 30, 2018 7:40:13 AM PDT Chris Wilson wrote:
> > > > For i915, we are proposing to use a quality-of-service parameter in
> > > > addition to that of just a priority that usurps everyone. Due to our HW,
> > > > preemption may not be immediate and will be forced to wait until an
> > > > uncooperative process hits an arbitration point. To prevent that unduly
> > > > impacting the privileged RealTime context, we back up the preemption
> > > > request with a timeout to reset the GPU and forcibly evict the GPU hog
> > > > in order to execute the new context.
> > > 
> > > I am strongly against exposing this in general.  Performing a GPU reset
> > > in the middle of a batch can completely screw up whatever application
> > > was running.  If the application is using robustness extensions, we may
> > > be forced to return GL_CONTEXT_LOST, causing the application to have to
> > > recreate their entire GL context and start over.  If not, we may try to
> > > let them limp on(*) - and hope they didn't get too badly damaged by some
> > > of their commands not executing, or executing twice (if the kernel tries
> > > to resubmit it).  But it may very well cause the app to misrender, or
> > > even crash.
> > 
> > Yes, I think the revulsion has been universal. However, as a
> > quality-of-service guarantee, I can understand the appeal. The
> > difference is that instead of allowing a DoS for 6s or so as we
> > currently allow, we allow that to be specified by the context. As it
> > does allow one context to impact another, I want it locked down to
> > privileged processes. I have been using CAP_SYS_ADMIN as the potential
> > to do harm is even greater than exploiting the weak scheduler by
> > changing priority.

Right...I was thinking perhaps a tunable to reduce the 6s would do the
trick, and be much less complicated...but perhaps you want to let it go
longer when there isn't super-critical work to do.
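
To make the two options concrete, here is a rough sketch in C of how a
global tunable and a per-context timeout could combine.  Every name in
it is made up for illustration and none of it is actual i915 code; the
only number taken from this thread is the "6s or so" default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: these names do not exist in i915 or Mesa. */
#define DEFAULT_HANG_TIMEOUT_MS 6000u /* the "6s or so" mentioned above */

struct ctx {
    bool realtime;               /* privileged RealTime context? */
    uint32_t preempt_timeout_ms; /* 0 = no per-context limit requested */
};

/*
 * How long may the running context keep the GPU while `waiter` is
 * pending?  `global_tunable_ms` models a system-wide knob that can only
 * shorten the default (0 = leave the default alone).  `waiter` may be
 * NULL when nothing is waiting to preempt.
 */
static uint32_t reset_deadline_ms(const struct ctx *waiter,
                                  uint32_t global_tunable_ms)
{
    uint32_t limit = DEFAULT_HANG_TIMEOUT_MS;

    assert(global_tunable_ms <= DEFAULT_HANG_TIMEOUT_MS);
    if (global_tunable_ms)
        limit = global_tunable_ms;

    /* A RealTime waiter may tighten the limit further, never loosen it. */
    if (waiter && waiter->realtime && waiter->preempt_timeout_ms &&
        waiter->preempt_timeout_ms < limit)
        limit = waiter->preempt_timeout_ms;

    return limit;
}

/* True once the running context has overstayed and a reset would fire. */
static bool should_reset(const struct ctx *waiter,
                         uint32_t global_tunable_ms, uint32_t waited_ms)
{
    return waited_ms >= reset_deadline_ms(waiter, global_tunable_ms);
}
```

With only the tunable, everyone shares one shorter deadline.  With the
per-context value, ordinary work keeps the long deadline and only a
waiting RealTime context shortens it.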

> Also to add further insult to injury, we might want to force GPU clocks
> to max for the RT context (so that the context starts executing at max
> rather than wait for the system to upclock on load). Something like,

That makes some sense - but I wonder if it wouldn't cause more battery
burn than is necessary.  The super-critical workload may also be
relatively simple (redrawing a clock), and so up-clocking and
down-clocking again might hurt us...it's hard to say. :(

I also don't know what I think of this plan to let userspace control
(restrict) the frequency.  That's been restricted to root (via sysfs)
in the past.  But I think you're allowing it more generally now, without
CAP_SYS_ADMIN?  It seems like there's a lot of potential for abuse.
(Hello, benchmark mode!  Zoooom!)  I know it solves a problem, but it
seems like there's got to be a better way...

--Ken