[Mesa-dev] EXTERNAL: Re: Clover clEnqueue* function don't implement blocking?
Dorrington, Albert
albert.dorrington at lmco.com
Tue Apr 15 04:34:56 PDT 2014
> -----Original Message-----
> From: Francisco Jerez [mailto:currojerez at riseup.net]
> "Dorrington, Albert" <albert.dorrington at lmco.com> writes:
> >
> > From reading the OpenCL spec (and perhaps I'm misinterpreting something
> > again), section 5.10 Flush and Finish says:
> >
> > Any blocking commands queued in a command-queue such as
> > clEnqueueRead{Image|Buffer} with blocking_read set to CL_TRUE,
> > clEnqueueWrite{Image|Buffer} with blocking_write set to CL_TRUE,
> > clEnqueueMap{Buffer|Image} with blocking_map set to CL_TRUE or
> > clWaitForEvents perform an implicit flush of the command-queue.
> >
> > From this statement, I would expect that the command-queue would be
> > flushed when the blocking flag is set.
>
> clEnqueueRead*, clEnqueueMap* and clWaitForEvents already flush the
> command queue (the first two are flushing indirectly as we try to map a
> buffer referenced by the GPU). clEnqueueWrite* doesn't flush, but it's not
> clear to me that not doing it can be considered a violation of the spec. The
> guarantees given by clFlush() are rather vague (to some extent an empty
> function could be a valid implementation) and it seems to me that a
> compliant implementation might, for instance, choose to batch up
> commands across flushes if that's the most efficient thing to do, as long as
> the user has no way to tell the difference.
>
> I'd like to see some real-world example where clover's behavior represents a
> problem before we change it to flush more frequently, because I'm worried
> that changing this will actually worsen performance rather than improving it.
I have been working with a modified version of the Mesa code that accepts kernels compiled with AMD's compiler.
(Our project's goal is to host Mesa in an environment which does not currently support LLVM/Clang or C++11.)
While testing 2D image read capabilities, I have been encountering an issue where the command queue's 'queued_events' continues to be populated, with none of the events being removed until the clFinish call. At that point, I have 23,328 events in the queue and encounter a segmentation fault during the command_queue flush.
After seeing the statement in the OpenCL spec about the implicit flush performed by blocking clEnqueue* calls, I added the previously mentioned conditional hev().wait() calls to initiate a flush.
This seems to have resolved the segmentation fault during the clFinish call, although I'll admit it likely isn't the most efficient method.
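
To illustrate the behaviour I am describing, here is a rough, self-contained sketch. It is only a toy model and not clover's actual code: ToyEvent, ToyQueue and peak_pending are hypothetical names made up for this example, and "flushing" is reduced to clearing a list. It compares how many events pile up when nothing is retired until the final finish with how many pile up when each blocking enqueue waits right away.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for an event object; not a clover type.
struct ToyEvent {};

// Hypothetical toy queue; it only models the bookkeeping of pending events.
class ToyQueue {
public:
    explicit ToyQueue(bool flush_on_blocking)
        : flush_on_blocking_(flush_on_blocking) {}

    // Record one command. With flush_on_blocking_ set, a blocking
    // command drains the pending list immediately (the "wait" case).
    void enqueue(bool blocking) {
        pending_.push_back(ToyEvent{});
        peak_ = std::max(peak_, pending_.size());
        if (blocking && flush_on_blocking_)
            flush();
    }

    // Retire every pending event.
    void flush() { pending_.clear(); }

    // Models clFinish: everything still pending is retired here.
    void finish() { flush(); }

    std::size_t peak() const { return peak_; }

private:
    bool flush_on_blocking_;
    std::deque<ToyEvent> pending_;
    std::size_t peak_ = 0;
};

// Largest number of simultaneously pending events seen while issuing
// n_commands blocking commands followed by a finish.
std::size_t peak_pending(bool flush_on_blocking, std::size_t n_commands) {
    ToyQueue q(flush_on_blocking);
    for (std::size_t i = 0; i < n_commands; ++i)
        q.enqueue(/*blocking=*/true);
    q.finish();
    return q.peak();
}
```

In this model, peak_pending(false, 23328) reaches the full 23,328 pending events before the finish, while peak_pending(true, 23328) never holds more than one. Whether flushing this eagerly is the right trade-off for real hardware is a separate question.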
While I have not benchmarked precisely, the run time did not seem to be significantly impacted. The test previously ran for ~20 minutes before crashing, and now runs for ~20 minutes before completing successfully.