[Beignet] clEnqueueNDRangeKernel and kernel completion

Sun Jun 9 02:10:42 PDT 2013

Hi Nanhai,

I had a quick look at this patch, and I have one question. The original bug seems to be that the code
forgets to set the queue's last batch, so clFinish() always does nothing when it is called.
So IMHO, the easiest fix may be to add code that sets the queue's last_batch.
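A minimal sketch of that suggested fix, under a simplified model. Every name here (fake_gpu, batch, cmd_queue, queue_submit, queue_finish, set_last_batch) is hypothetical and is not Beignet's actual code, and the "GPU" is simulated by a counter that retires one batch per step. It shows why a queue that never records its last batch turns finish into a no-op, and how recording it lets finish wait for that batch's sequence number.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the GPU: retires one pending batch per step. */
typedef struct {
    uint32_t submitted;  /* seqno of the most recently submitted batch */
    uint32_t completed;  /* seqno the "GPU" last reported as finished */
} fake_gpu;

typedef struct {
    uint32_t seqno;      /* sequence number assigned at submission */
} batch;

/* Hypothetical command queue; real Beignet structures differ. */
typedef struct {
    fake_gpu *gpu;
    batch     batches[8];
    size_t    count;
    batch    *last_batch;     /* the field the original bug never set */
    int       set_last_batch; /* 1 = patched behaviour, 0 = buggy behaviour */
} cmd_queue;

/* Simulated hardware progress: complete the next outstanding batch, if any. */
static void gpu_step(fake_gpu *gpu) {
    if (gpu->completed < gpu->submitted)
        gpu->completed++;
}

/* Submit one batch (e.g. for an NDRange kernel); caller keeps count < 8. */
static void queue_submit(cmd_queue *q) {
    batch *b = &q->batches[q->count++];
    b->seqno = ++q->gpu->submitted;
    if (q->set_last_batch)
        q->last_batch = b;    /* the one-line fix: remember what to wait on */
}

/* Block until the last submitted batch has completed. */
static void queue_finish(cmd_queue *q) {
    if (q->last_batch == NULL)
        return;               /* buggy path: nothing recorded, returns at once */
    while (q->gpu->completed < q->last_batch->seqno)
        gpu_step(q->gpu);     /* a real driver would sleep until the GPU writes the seqno */
}
```

With set_last_batch = 1, submitting two batches and calling queue_finish() leaves completed at 2; with set_last_batch = 0, queue_finish() returns immediately and completed is still 0, which matches the early return described in the bug report.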

Your fix seems not only to fix that bug but also to move the whole thing from the hardware-independent
layer to the Intel driver layer, right? Is there any specific reason to do so?

On Thu, Jun 06, 2013 at 06:45:22AM +0000, Zou, Nanhai wrote:
> Hi, could you check if the attached patch works?
> 
> Thanks
> Zou Nanhai
> 
> From: beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.org [mailto:beignet-bounces+nanhai.zou=intel.com at lists.freedesktop.org] On Behalf Of Edward Ching
> Sent: Thursday, June 06, 2013 8:03 AM
> To: beignet at lists.freedesktop.org
> Subject: [Beignet] clEnqueueNDRangeKernel and kernel completion
> 
> I hope this is the right forum to post comments/questions on Beignet OpenCL API behaviour. If not, please ignore and excuse the disruption.
> I'm running Beignet on an Ivy Bridge machine and noticed that clFinish returns before the GPU has completed processing of previously submitted commands.
> 
> For example:
> I submitted an OpenCL kernel via clEnqueueNDRangeKernel, followed by clFinish, and expected the GPU to have finished all processing when clFinish returns.
> 
> But clFinish returned right away, and when I then called clEnqueueMapBuffer to access the data, that call blocked. So I traced the logic all the way into the Ivy Bridge GPU device driver (~/drivers/gpu/drm/i915/*). It looks like every Ivy Bridge GPU batchbuffer used to submit an OpenCL kernel has an associated sequence number, which the GPU writes to a special location; that location has to be monitored to tell whether the GPU has finished processing the submitted kernel (details in the Intel HD Graphics PRM and the i915/GEM design notes). Neither clFinish nor clEnqueueNDRangeKernel does this monitoring; clEnqueueMapBuffer happens to do it when it moves the buffer objects from the GPU domain to the CPU domain.
> Does this make sense? It seems to me that clFinish should instead monitor this and block while the GPU is still busy executing an OpenCL kernel.
> /Ed

> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet
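The sequence-number check Ed describes can be sketched as follows. This is an illustration only: seqno_passed() mirrors the signed-difference comparison used by the i915 driver's i915_seqno_passed() helper, so the test stays correct across 32-bit wraparound, while wait_for_seqno, fake_status_page and read_fake_seqno are hypothetical stand-ins for reading the location the GPU writes to. A real driver sleeps on an interrupt rather than spinning.

```c
#include <stdint.h>
#include <stdbool.h>

/* True once the GPU-written seqno `hw` has reached `target`.
   The signed difference handles 32-bit wraparound (i915 style). */
static bool seqno_passed(uint32_t hw, uint32_t target) {
    return (int32_t)(hw - target) >= 0;
}

/* Hypothetical simulated status page written by the "GPU". */
typedef struct {
    uint32_t value;
} fake_status_page;

/* Each read returns the current value, then advances the simulated GPU by one. */
static uint32_t read_fake_seqno(void *ctx) {
    fake_status_page *page = ctx;
    return page->value++;
}

/* Hypothetical blocking wait: poll the seqno location until `target` has passed.
   Busy-waiting is for illustration only. */
static void wait_for_seqno(uint32_t (*read_hw_seqno)(void *), void *ctx,
                           uint32_t target) {
    while (!seqno_passed(read_hw_seqno(ctx), target))
        ;
}
```

For example, starting from a status value of 0 and waiting for seqno 3 takes four reads, after which the fake page holds 4. A clFinish built on this idea would wait on the seqno of the queue's last submitted batch.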