[PATCH -fixes 5/5] drm/vmwgfx: Fix a buffer object eviction regression

Matthew Wilcox willy at infradead.org
Thu Sep 13 15:28:11 UTC 2018


On Thu, Sep 13, 2018 at 04:56:53PM +0200, Thomas Hellstrom wrote:
> Hi,
> 
> On 09/13/2018 04:10 PM, Matthew Wilcox wrote:
> > On Thu, Sep 13, 2018 at 01:58:37PM +0200, Thomas Hellstrom wrote:
> > > Commit 4eb085e42fde ("drm/vmwgfx: Convert to new IDA API") indroduced
> > > an incorrect return value from the function vmw_gmrid_man_get_node(),
> > > when we run out if integer ids. Instead of returning 0 (meaning
> > > non-fatal error) we forward the ida_simple_get error code -ENOSPC.
> > > This causes TTM not to retry allocation after buffer eviction and
> > > instead return -ENOSPC to user-space.
> > > 
> > > Fix this by returning 0 when ida_simple_get() returns -ENOSPC.
> > Thanks.  I got confused by the convoluted code that was there before ;-(
> > 
> > I think this could be better though ... if ida_alloc() ever starts
> > returning a different errno in the future, you'll hit the same problem,
> > right?  So how about this ...
> > 
> >   	id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL);
> > +	if (id == -ENOMEM)
> > +		return -ENOMEM;
> > +	if (id < 0)
> > +		return 0;
> >   	spin_lock(&gman->lock);
> > 
> > But I wonder ... why is -ENOMEM seen as a fatal error?  If you free up
> > some memory, you'll free up an ID, so the next time around you should
> > be able to allocate an ID.  So shouldn't this function just have
> > been doing this all along?
> > 
> >   	id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL);
> > +	if (id < 0)
> > +		return 0;
> > 
> Non-fatal errors are errors that can be remedied by GPU buffer eviction, and
> buffer eviction will free up IDA space, so basically we need to target only
> the error code that indicates we've run out of IDA space.

Yes, but the following situation can happen:

 - Allocate 1024 IDs
 - Run very low on memory
 - Allocating ID 1025 will fail (very very unlikely)
 - ida_alloc_max() returns -ENOMEM

In this situation, we want ttm_mem_evict_first() to be called which will
free up one of the 1024 existing IDs and then we can allocate that ID for
our new node.

I'm assuming we're analysing the behaviour of ttm_bo_mem_force_space()
here.

> If we're worried that ida_alloc_max() will change return value, I guess we
> will have to increase the IDA space and detect the error ourselves:  error
> if (id >= gman->max_gmr_ids)

My point was that your solution (detect the one error which should be
deemed as non-fatal) was not as robust as its inverse (detect the one
error which the previous code deemed as fatal).  But I now believe no
error from the IDA should be seen as fatal.


More information about the dri-devel mailing list