[Mesa-dev] EXTERNAL: Re: Clover clEnqueue* function don't implement blocking?

Tue Apr 15 06:52:12 PDT 2014

> -----Original Message-----
> From: Francisco Jerez [mailto:currojerez at riseup.net]
> 
> "Dorrington, Albert" <albert.dorrington at lmco.com> writes:
> 
> >> -----Original Message-----
> >> From: Francisco Jerez [mailto:currojerez at riseup.net]
> >> "Dorrington, Albert" <albert.dorrington at lmco.com> writes:
> >> >
> >> > From reading the OpenCL spec (and perhaps I'm misinterpreting
> >> > something again), section 5.10 Flush and Finish says:
> >> >
> >> > 	Any blocking commands queued in a command-queue such as
> >> > 	clEnqueueRead{Image|Buffer} with blocking_read set to CL_TRUE,
> >> > 	clEnqueueWrite{Image|Buffer} with blocking_write set to CL_TRUE,
> >> > 	clEnqueueMap{Buffer|Image} with blocking_map set to CL_TRUE or
> >> > 	clWaitForEvents perform an implicit flush of the command-queue.
> >> >
> >> > From this statement, I would expect that the command-queue would
> >> > be flushed when the blocking flag is set.
> >>
> >> clEnqueueRead*, clEnqueueMap* and clWaitForEvents already flush the
> >> command queue (the first two are flushing indirectly as we try to map
> >> a buffer referenced by the GPU).  clEnqueueWrite* doesn't flush, but
> >> it's not clear to me that not doing it can be considered a violation
> >> of the spec.  The guarantees given by clFlush() are rather vague (to
> >> some extent an empty function could be a valid implementation) and it
> >> seems to me that a compliant implementation might, for instance,
> >> choose to batch up commands across flushes if that's the most
> >> efficient thing to do, as long as the user has no way to tell the difference.
> >>
> >> I'd like to see some real-world example where clover's behavior
> >> represents a problem before we change it to flush more frequently,
> >> because I'm worried that changing this will actually worsen
> >> performance rather than improving it.
> >
> > I have been working with a modified version of the Mesa code which
> > accepts kernels compiled with AMD's compiler. (Our project's goal is
> > to host Mesa in an environment which does not currently support
> > LLVM/Clang or C++11.)
> >
> > While testing 2D image read capabilities, I have been encountering an
> > issue where the command queue's 'queued_events' continues to be
> > populated, with none of the events being removed until the clFinish
> > call. At that point, I have 23,328 events in the queue and encounter a
> > segmentation fault during the command_queue flush.
> >
> It would be interesting to find out what's causing the segfault exactly;
> the more frequent flushes might just be hiding a problem of a different nature.
> Also, is it the expected behavior of your test to queue so many events
> before trying to read back the results or doing some other sort of blocking
> operation?  Is its source code public?

The test is the Khronos conformance test for 2D image reads, so unfortunately the code is not public.

In the case where I had the 23,328 events in the queue, at least two dozen kernels had been compiled,
and each kernel had been executed 6 times with different input parameters.
(I would have to back the changes out to get exact numbers.)

From all appearances, the queued_events queue was never being flushed until the clFinish() call.

Previously you mentioned that the clEnqueueRead*/Map* calls are implicitly flushing the command queue;
I've looked through the code and just don't see where the queued_events queue is being flushed.
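To make the expectation from section 5.10 concrete, here is a toy model in C. It is a hypothetical sketch and NOT clover's code; the names toy_queue, toy_enqueue, toy_flush and toy_finish are made up for illustration. Non-blocking commands accumulate in a pending list until something flushes it, while a blocking command triggers the implicit flush the spec describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a command queue; NOT clover's implementation.
 * It only illustrates the section 5.10 rule that blocking commands
 * perform an implicit flush of the command-queue. */
#define TOY_MAX_PENDING 64

typedef struct {
    int pending[TOY_MAX_PENDING]; /* ids of commands queued but not yet submitted */
    size_t npending;              /* number of valid entries in pending[] */
    size_t submitted;             /* commands handed to the "device" so far */
} toy_queue;

/* Submit everything pending, loosely analogous to clFlush(). */
static void toy_flush(toy_queue *q) {
    q->submitted += q->npending;
    q->npending = 0;
}

/* Queue one command and return its id, or -1 if the toy buffer is full.
 * A blocking command (e.g. a read with blocking_read = CL_TRUE) flushes
 * the queue so that the command itself actually reaches the device. */
static int toy_enqueue(toy_queue *q, bool blocking) {
    if (q->npending == TOY_MAX_PENDING)
        return -1;
    int id = (int)(q->submitted + q->npending);
    q->pending[q->npending++] = id;
    if (blocking)
        toy_flush(q); /* the implicit flush required by section 5.10 */
    return id;
}

/* Like clFinish(), this flushes; the toy has no notion of completion. */
static void toy_finish(toy_queue *q) {
    toy_flush(q);
}
```

For example, five calls to toy_enqueue(&q, false) leave five entries pending and nothing submitted; a sixth call with blocking set to true submits all six at once. The behavior described above, where queued_events only shrinks at clFinish(), would look in this model as if every command took the non-blocking path.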