[Mesa-dev] [PATCH v2 1/1] clover: Wait for requested operation if blocking flag is set

Jan Vesely jan.vesely at rutgers.edu
Fri Aug 4 04:30:48 UTC 2017


Hi,

thanks for detailed explanation. I indeed missed the writeBuffer part
in specs.

On Wed, 2017-08-02 at 15:05 -0700, Francisco Jerez wrote:
> These changes are somewhat redundant and potentially
> performance-impacting, the reason is that in the OpenCL API,
> clEnqueueWrite* commands are specified to block until the memory
> provided by the application as origin can be reused safely (i.e. until
> soft_copy_op()() runs), not necessarily until the transfer to graphics
> memory has completed (which is what hard_event::wait() will wait for).
> OTOH reads and maps as implemented by soft_copy_op and friends are
> essentially blocking so the wait() call is redundant in most cases.

That explains a noticeable slowdown running piglit with these changes.
I'm not sure about the read part though. I expected it to be basically
a noop, but it changes behaviour.
Adding clGetEventInfo(CL_EVENT_COMMAND_EXECUTION_STATUS) after a
blocking read in one of the piglit tests surprisingly returns
CL_QUEUED.

The specs don't mention use of events with blocking read, but it does
say that "When the read command has completed, the contents of the
buffer that ptr points to can be used by the application." in the non-
blocking section. I'd say that the expectation is for the event to be
CL_COMPLETE after blocking read (at least beignet/pocl/intel-cpu-sdk
follow this).

> 
> The only reason why it might be useful to behave differently on blocking
> transfers is that the application may have specified a user event in the
> event dependency list, which may cause the soft_copy_op() call to be
> delayed until the application signals the user event.  In order to fix
> that it should really only be necessary to wait for the event action to
> be executed, not necessarily its associated GPU work.

I think that another use is that non-blocking writes do not create an
extra copy of the buffer. Thus
clEnqueueWriteBuffer(...,cl_false, ev, ...)
clWaitForEvents(ev)
is more memory efficient.

> 
> Last time this issue came up (yeah it's not the first time) I proposed
> the patches below to add a mechanism to wait for the event action only,
> feel free to include it as PATCH 0.1 and 0.2 of this series (it's been a
> while so they may no longer apply cleanly).

I think we can just add comments explaining why the blocking argument
is ignored, until someone chooses to fix this problem and/or to
implement proper non-blocking variants (would std::async work for
trivial cases like ReadBuffer?)

thanks,
Jan

> 
> Thank you.
> 
> Jan Vesely <jan.vesely at rutgers.edu> writes:
> 
> > v2: wait in map_buffer and map_image as well
> > 
> > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > ---
> > Hi Aaron,
> > 
> > yes, I think you're right, we should wait in Map* as well.
> > If nothing else it's consistent, even if passing the flag to add_map might make it unnecessary (haven't actually checked).
> > 
> > thanks,
> > Jan
> > 
> >  src/gallium/state_trackers/clover/api/transfer.cpp | 30 ++++++++++++++++++++--
> >  1 file changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp b/src/gallium/state_trackers/clover/api/transfer.cpp
> > index f7046253be..729a34590e 100644
> > --- a/src/gallium/state_trackers/clover/api/transfer.cpp
> > +++ b/src/gallium/state_trackers/clover/api/transfer.cpp
> > @@ -295,6 +295,9 @@ clEnqueueReadBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >                     &mem, obj_origin, obj_pitch,
> >                     region));
> >  
> > +   if (blocking)
> > +       hev().wait();
> > +
> >     ret_object(rd_ev, hev);
> >     return CL_SUCCESS;
> >  
> > @@ -325,6 +328,9 @@ clEnqueueWriteBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >                     ptr, {}, obj_pitch,
> >                     region));
> >  
> > +   if (blocking)
> > +       hev().wait();
> > +
> >     ret_object(rd_ev, hev);
> >     return CL_SUCCESS;
> >  
> > @@ -362,6 +368,9 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >                     &mem, obj_origin, obj_pitch,
> >                     region));
> >  
> > +   if (blocking)
> > +       hev().wait();
> > +
> >     ret_object(rd_ev, hev);
> >     return CL_SUCCESS;
> >  
> > @@ -399,6 +408,9 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >                     ptr, host_origin, host_pitch,
> >                     region));
> >  
> > +   if (blocking)
> > +       hev().wait();
> > +
> >     ret_object(rd_ev, hev);
> >     return CL_SUCCESS;
> >  
> > @@ -504,6 +516,9 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >                     &img, src_origin, src_pitch,
> >                     region));
> >  
> > +   if (blocking)
> > +       hev().wait();
> > +
> >     ret_object(rd_ev, hev);
> >     return CL_SUCCESS;
> >  
> > @@ -538,6 +553,9 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >                     ptr, {}, src_pitch,
> >                     region));
> >  
> > +   if (blocking)
> > +       hev().wait();
> > +
> >     ret_object(rd_ev, hev);
> >     return CL_SUCCESS;
> >  
> > @@ -667,7 +685,11 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >  
> >     void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
> >  
> > -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
> > +   auto hev = create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps);
> > +   if (blocking)
> > +       hev().wait();
> > +
> > +   ret_object(rd_ev, hev);
> >     ret_error(r_errcode, CL_SUCCESS);
> >     return map;
> >  
> > @@ -695,7 +717,11 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, cl_bool blocking,
> >  
> >     void *map = img.resource(q).add_map(q, flags, blocking, origin, region);
> >  
> > -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps));
> > +   auto hev = create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps);
> > +   if (blocking)
> > +       hev().wait();
> > +
> > +   ret_object(rd_ev, hev);
> >     ret_error(r_errcode, CL_SUCCESS);
> >     return map;
> >  
> > -- 
> > 2.13.3
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

-- 
Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170804/e9c27a73/attachment.sig>


More information about the mesa-dev mailing list