[Beignet] [PATCH 2/2] Fix DRM Memory leak BUG

Zhigang Gong zhigang.gong at linux.intel.com
Tue Sep 22 00:06:24 PDT 2015


On Tue, Sep 22, 2015 at 07:47:45AM +0000, Yang, Rong R wrote:
> The scenario of this memory leak don't not deal with event's call back.
> The scenario is:
> 
> clEnqueuNDRange(....., event1); 
> clReleaseEvent(event1);
> clEnqueuNDRange(....., event2); 
> clReleaseEvent(event2);
> clEnqueuNDRange(....., event3); 
> clReleaseEvent(event3);
> 
> 
> Application create events but don't use them.
> After first clEnqueuNDRange, the event1 ref count is 2, and last event is event1.

> In first clReleaseEvent, because the event have't complete, the event's ref count is 1, will not delete.
> After  the 2nd clEnqueuNDRange, the last event point to event2.
> So neither driver nor application will track of event1, so event1 leak.

Sign, another bug caused by the weird event handling mechanism.

Now, I know the root cause for this specific issue. Beignet doesn't have a deadicated 
thread to maintain event. And because we want to avoid busy wait for each event release,
we increase reference counter to the event when we create the event, and when the user
want to release it, it will just decrease ref counter it to 1 and not zero. Thus it will
wait (not busy wait) for the real completion to release that event. But it make a implicitly
requirement for a event status update after the event complete.

If we add any busy wait to solve this issue, then all we do to avoid busy wait at the clRelease()
become meaningless. If we want o avoid busy wait any way, we need a new mechanism to track all events,
and make sure all events will get a chance to be updated. So we need a event list to track these obsolete
events which have no users. And need to determine a proper time point to update their status.

Becareful that these event lists are thread specific, each thread should have different.

I still suggest some one in beignet team to rewrite the so complicate and error-prone event handling mechanism.

Thanks,
Zhigang Gong.

> 
> I think one solution is, before update last event, check the current last event whether is waited by other events or not. If not, should add the last event to a list, and update / delete the events in the list when flush or queue delete. This solution does not need the busy wait.

> 
> > -----Original Message-----
> > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of
> > Zhigang Gong
> > Sent: Tuesday, September 22, 2015 13:21
> > To: Pan, Xiuli
> > Cc: beignet at lists.freedesktop.org
> > Subject: Re: [Beignet] [PATCH 2/2] Fix DRM Memory leak BUG
> > 
> > On Tue, Sep 22, 2015 at 04:51:41AM +0000, Pan, Xiuli wrote:
> > > I agree about the complex event handlings, and maybe we should do that
> > update somewhere else, but the leaked event is newed from
> > clEnqueueNDRangeKernel and passed to user and it is a very rare usage. As it
> > is not a user event, and the only chance for us to update the event status  in
> > the last_event is here. If the last_event is completed it will be deleted from
> > the event update function, otherwise it will be lost and cause leak, so we
> > need to force it updating here. Also if the event is completed before that,
> > the last_event should be NULL. I think if we did it like gpgpu in a linked list,
> > maybe we could not do blocking update, but now we may do a block update
> > to make other things easier in these cases. We should have more tests about
> > the events, but now the memory leak caused by rare usage of event is now
> > be fixed.
> > 
> > One misunderstanding in the above analysis is that event update function
> > itself never deletes any event. It just update the event status and check for
> > all events in wait lists, if any event status become compelte, it will try to
> > check wait list recursively and if any completed event has user call back
> > function, it will call those call back function.
> > 
> > The reason why we wil leak a event if we don't force update here is that
> > application may usually put the clReleaseEvent() into the event's call back
> > function. Otherwise, we will not leak any event. Because user will call
> > clReleaseEvent() explicitly. If user don't do that, then it's a application level
> > bug.
> > 
> > You could continue to track down the specific application to find out when
> > you put such a force update there, how does it help on releasing the missing
> > event?
> > 
> > Is the event released within beignet internal? If so, what's the code path?
> > Is the event released in user registered call back function? If so, how does
> > that call back function get missed?
> > 
> > cl_command_queue_flush() has been called from almost all the enqueue
> > functions.
> > Add a almost unconditional(just check the last event) blocking event wait
> > here is really not good idea.
> > 
> > Thanks,
> > Zhigang Gong.
> > 
> > >
> > > The rare usage of event from the PSieve-CUDA case:
> > >   checkCUDAErr(clEnqueueReadBuffer(commandQueue,
> > >         d_factor_found,
> > >         CL_TRUE,
> > >         0,
> > >         cthread_count*sizeof(cl_uint),
> > >         factor_found,
> > >         0,
> > >         NULL,
> > >         &dev_read_event), "Retrieving results"); //It get the
> > > ReadBuffer event here, as well as NDRangeKernel event.
> > >
> > >   checkCUDAErr(clWaitForEvents(1, &dev_read_event), "Waiting for results
> > read. (clWaitForEvents)");
> > >   checkCUDAErr(clReleaseEvent(dev_read_event), "Release event object
> > > 3. (clReleaseEvent)");
> > >
> > > //Then wait and release the event, it is very different from our usage.
> > >
> > > I will have a deep look about this usage path. Thank you for your advice.
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > > Sent: Tuesday, September 22, 2015 10:30 AM
> > > To: Pan, Xiuli <xiuli.pan at intel.com>
> > > Cc: beignet at lists.freedesktop.org
> > > Subject: Re: [Beignet] [PATCH 2/2] Fix DRM Memory leak BUG
> > >
> > > Nice catch! But may not be a correct fix.
> > > We don't need to do the blocking event updating all the time.
> > > We only need to do that when there is potential possibility to leak a event.
> > If a event has a user call back function registered is such a case, and my best
> > guessing here is:
> > > one event in the wait list of the last event has user call back function
> > registered and has been missed.
> > >
> > > We may need to check all the wait list of the last event before we do a
> > locking event updating here.
> > >
> > > Thanks,
> > > Zhigang Gong.
> > >
> > > On Mon, Sep 21, 2015 at 04:41:52PM +0800, Pan Xiuli wrote:
> > > > This bug is cased by event flush, we should not only run usr event
> > > > but also event made by enqueue functions.
> > > > If the event haven't been completed before it is been overwite in
> > > > the last_event, the related gpgpu buffer will not be unreference.
> > > > And will cause all related drm buffers unreference and thenw leak.
> > > >
> > > > Signed-off-by: Pan Xiuli <xiuli.pan at intel.com>
> > > > ---
> > > >  src/cl_command_queue.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/src/cl_command_queue.c b/src/cl_command_queue.c index
> > > > 4b92311..fd1d613 100644
> > > > --- a/src/cl_command_queue.c
> > > > +++ b/src/cl_command_queue.c
> > > > @@ -261,7 +261,7 @@ cl_command_queue_flush(cl_command_queue
> > queue)
> > > >    // the event any more. If we don't do this here, we will leak that event
> > > >    // and all the corresponding buffers which is really bad.
> > > >    cl_event last_event = get_last_event(queue);
> > > > -  if (last_event && last_event->user_cb)
> > > > +  if (last_event)
> > > >      cl_event_update_status(last_event, 1);
> > > >    cl_event current_event = get_current_event(queue);
> > > >    if (current_event && err == CL_SUCCESS) {
> > > > --
> > > > 2.1.4
> > > >
> > > > _______________________________________________
> > > > Beignet mailing list
> > > > Beignet at lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/beignet
> > _______________________________________________
> > Beignet mailing list
> > Beignet at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/beignet
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet


More information about the Beignet mailing list