[Beignet] [PATCH 2/2] Fix DRM Memory leak BUG

Zhigang Gong zhigang.gong at linux.intel.com
Mon Sep 21 22:21:00 PDT 2015


On Tue, Sep 22, 2015 at 04:51:41AM +0000, Pan, Xiuli wrote:
> I agree about the complex event handlings, and maybe we should do that update somewhere else, but the leaked event is newed from clEnqueueNDRangeKernel and passed to user and it is a very rare usage. As it is not a user event, and the only chance for us to update the event status  in the last_event is here. If the last_event is completed it will be deleted from the event update function, otherwise it will be lost and cause leak, so we need to force it updating here. Also if the event is completed before that, the last_event should be NULL. I think if we did it like gpgpu in a linked list, maybe we could not do blocking update, but now we may do a block update to make other things easier in these cases. We should have more tests about the events, but now the memory leak caused by rare usage of event is now be fixed.

One misunderstanding in the above analysis is that event update function
itself never deletes any event. It just update the event status and check
for all events in wait lists, if any event status become compelte, it will
try to check wait list recursively and if any completed event has user
call back function, it will call those call back function.

The reason why we wil leak a event if we don't force update here is that application
may usually put the clReleaseEvent() into the event's call back function. Otherwise,
we will not leak any event. Because user will call clReleaseEvent() explicitly. If user
don't do that, then it's a application level bug.

You could continue to track down the specific application to find out when you put such
a force update there, how does it help on releasing the missing event?

Is the event released within beignet internal? If so, what's the code path?
Is the event released in user registered call back function? If so, how does that call back function get missed?

cl_command_queue_flush() has been called from almost all the enqueue functions.
Add a almost unconditional(just check the last event) blocking event wait here
is really not good idea.

Thanks,
Zhigang Gong.

> 
> The rare usage of event from the PSieve-CUDA case:
>   checkCUDAErr(clEnqueueReadBuffer(commandQueue,
>         d_factor_found,
>         CL_TRUE,
>         0,
>         cthread_count*sizeof(cl_uint),
>         factor_found,
>         0,
>         NULL,
>         &dev_read_event), "Retrieving results");
> //It get the ReadBuffer event here, as well as NDRangeKernel event.
> 
>   checkCUDAErr(clWaitForEvents(1, &dev_read_event), "Waiting for results read. (clWaitForEvents)");
>   checkCUDAErr(clReleaseEvent(dev_read_event), "Release event object 3. (clReleaseEvent)");
> 
> //Then wait and release the event, it is very different from our usage.
> 
> I will have a deep look about this usage path. Thank you for your advice.
> 
> 
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com] 
> Sent: Tuesday, September 22, 2015 10:30 AM
> To: Pan, Xiuli <xiuli.pan at intel.com>
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 2/2] Fix DRM Memory leak BUG
> 
> Nice catch! But may not be a correct fix.
> We don't need to do the blocking event updating all the time.
> We only need to do that when there is potential possibility to leak a event. If a event has a user call back function registered is such a case, and my best guessing here is:
> one event in the wait list of the last event has user call back function registered and has been missed.
> 
> We may need to check all the wait list of the last event before we do a locking event updating here.
> 
> Thanks,
> Zhigang Gong.
> 
> On Mon, Sep 21, 2015 at 04:41:52PM +0800, Pan Xiuli wrote:
> > This bug is cased by event flush, we should not only run usr event but 
> > also event made by enqueue functions.
> > If the event haven't been completed before it is been overwite in the 
> > last_event, the related gpgpu buffer will not be unreference. And will 
> > cause all related drm buffers unreference and thenw leak.
> > 
> > Signed-off-by: Pan Xiuli <xiuli.pan at intel.com>
> > ---
> >  src/cl_command_queue.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/cl_command_queue.c b/src/cl_command_queue.c index 
> > 4b92311..fd1d613 100644
> > --- a/src/cl_command_queue.c
> > +++ b/src/cl_command_queue.c
> > @@ -261,7 +261,7 @@ cl_command_queue_flush(cl_command_queue queue)
> >    // the event any more. If we don't do this here, we will leak that event
> >    // and all the corresponding buffers which is really bad.
> >    cl_event last_event = get_last_event(queue);
> > -  if (last_event && last_event->user_cb)
> > +  if (last_event)
> >      cl_event_update_status(last_event, 1);
> >    cl_event current_event = get_current_event(queue);
> >    if (current_event && err == CL_SUCCESS) {
> > --
> > 2.1.4
> > 
> > _______________________________________________
> > Beignet mailing list
> > Beignet at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/beignet


More information about the Beignet mailing list