[Intel-gfx] [PATCH] drm/i915: Force CPU synchronisation even if userspace requests ASYNC

Wed Jul 12 07:49:41 UTC 2017

On Tue, Jul 11, 2017 at 12:06:50PM +0100, Chris Wilson wrote:
> Quoting Jason Ekstrand (2017-07-11 01:01:20)
> > Given that domain tracking is global, we can also run into interesting issues
> > if process A does a CPU map, writes to it, and then hands it to process B which
> > uses it from the GPU.  Without being very aggressive about set_domain, we have
> > no knowledge that this is a problem.
> 
> Yes, set-domain has always been a bit nebulous in that it is a one-sided
> barrier for a critical section. If we wanted to start from scratch, it
> would instead move the desired region into a cache domain and return an
> exclusive fence that the user must then signal. (Exclusive for writing,
> shared for concurrent reads.) And if the kernel were to even stop
> tracking the cache domain and leave it to userspace that would probably be
> even better. (That would have to mean that by convention, the bo is
> always coherent when the fence is released, but that could be relaxed
> if the only two parties are private.)
> 
> Hmm. Actually, this can be inserted into current operations without
> breaking backwards compatibility. If we assume old userspace is correctly
> calling set-domain prior to accessing the bo through a mmapping (and if
> it is not, then it is deliberately using unsynchronised access) then it
> will wait for the fences before beginning. The only problem is allowing
> userspace to create an unsignaled fence for indefinite periods of time.
> The alternative would be something like breaking it after X seconds and
> sending a signal to the process (depending on how easy it is to add a
> new signal or even if it is desirable, that may just be SIGKILL!).
> 
> SVM is an interesting thorn in the side. At present, we simply have no
> means of implicit tracking (or KMS/prime integration) nor do we have any
> means of tracking which pages each operation uses. Hmm, we should have
> userspace at least provide a list of bo that are affected by the svm
> operation so that the implicit tracking just works. And of course, it
> assumes that userspace has made everything coherent. (Or we need to be
> even heavier handed and force pagefaults between coherency domains,
> that's certainly plausible, see HMM.)

I don't think expecting userspace to signal when it's done, and the kernel
waiting on that, is a good idea. A simpler model, I think, would be ioctls
for begin/end flushing to maintain cache coherency, and that's all they do.
This means avoiding double clflush isn't a thing we'd do anymore, but for
async maps that's probably a very reasonable tradeoff (since it's all
about exchanging results between gpu and cpu). And we don't even need an
ioctl, since we can clflush in userspace.
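For illustration only, a minimal sketch of what "clflush in userspace"
could look like on x86. The helper name flush_range and the fixed 64-byte
cache line are assumptions for this example (real code would query the
line size via CPUID); _mm_clflush and _mm_mfence are the standard SSE2
intrinsics. This is not how any existing driver does it, just the shape
of the idea.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (x86 SSE2) */

#define CACHELINE 64 /* assumption: real code should query CPUID */

/*
 * Hypothetical helper: flush every cache line overlapping
 * [addr, addr + len). Returns the number of lines flushed.
 */
static size_t flush_range(const void *addr, size_t len)
{
    uintptr_t p, end;
    size_t lines = 0;

    if (len == 0)
        return 0;

    /* Round the start down to a cache line boundary. */
    p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    end = (uintptr_t)addr + len;

    _mm_mfence(); /* order earlier CPU stores before the flushes */
    for (; p < end; p += CACHELINE) {
        _mm_clflush((const void *)p);
        lines++;
    }
    _mm_mfence(); /* flushes are done before anything that follows */

    return lines;
}
```

The same helper would serve as both the "begin" and the "end" flush; the
cost is that a range may get flushed twice, which is the tradeoff
mentioned above.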

One-shot uploading is probably still better done with synchronous
set_domain.

In both cases it's up to userspace to not trip over itself and start the
next transaction before the previous one completed. Imo that's not the job
of the buffer manager or kernel backend.

And I think this model can completely ignore what's going on on the gpu
side, as long as it properly waits for cpu stuff to complete (including
clflush) before calling execbuf, and ofc waits for the fences to complete
before reading stuff back, and simply clflushes defensively. And in case the
double-clflush ever hurts us we could add a get_domain ioctl and avoid
them if the buffer happens to be in the cpu cache right at that moment.
Ofc someone else could immediately change the domain, but that's simply
undefined, don't do that, nothing new.
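
To make the ordering above concrete, here is a rough sketch of a
userspace-side tracker that refuses out-of-order steps: cpu writes plus
clflush, then execbuf, then fence wait, then defensive clflush and
readback. Every name in it (xfer_state, struct xfer, xfer_step) is made up
for this example and is not i915 or libdrm API; it only models the
bookkeeping that userspace would have to get right itself.

```c
#include <stdbool.h>

/* Hypothetical per-buffer transaction states; not i915 API. */
enum xfer_state {
    XFER_IDLE,        /* no transaction in flight */
    XFER_CPU_FLUSHED, /* cpu writes done and clflushed */
    XFER_GPU_BUSY,    /* execbuf submitted, fence not yet signaled */
    XFER_GPU_DONE,    /* fence waited on, results not yet read back */
};

struct xfer {
    enum xfer_state state;
};

/*
 * Advance to 'next' only if it follows the legal order
 * IDLE -> CPU_FLUSHED -> GPU_BUSY -> GPU_DONE -> IDLE.
 * Returns false (and changes nothing) on misuse.
 */
static bool xfer_step(struct xfer *x, enum xfer_state next)
{
    /* For each target state, the only state it may be entered from. */
    static const enum xfer_state legal_prev[] = {
        [XFER_IDLE]        = XFER_GPU_DONE, /* clflush, then read back */
        [XFER_CPU_FLUSHED] = XFER_IDLE,
        [XFER_GPU_BUSY]    = XFER_CPU_FLUSHED,
        [XFER_GPU_DONE]    = XFER_GPU_BUSY,
    };

    if (x->state != legal_prev[next])
        return false;

    x->state = next;
    return true;
}
```

Starting a second transaction before the first has returned to XFER_IDLE
is rejected, which is exactly the "don't trip over yourself" rule; nothing
here involves the kernel or the buffer manager.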
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch