[PATCH -fixes 5/5] drm/vmwgfx: Fix a buffer object eviction regression
Thomas Hellstrom
thellstrom at vmware.com
Thu Sep 13 16:52:43 UTC 2018
On 09/13/2018 05:28 PM, Matthew Wilcox wrote:
> On Thu, Sep 13, 2018 at 04:56:53PM +0200, Thomas Hellstrom wrote:
>> Hi,
>>
>> On 09/13/2018 04:10 PM, Matthew Wilcox wrote:
>>> On Thu, Sep 13, 2018 at 01:58:37PM +0200, Thomas Hellstrom wrote:
>>>> Commit 4eb085e42fde ("drm/vmwgfx: Convert to new IDA API") indroduced
>>>> an incorrect return value from the function vmw_gmrid_man_get_node(),
>>>> when we run out if integer ids. Instead of returning 0 (meaning
>>>> non-fatal error) we forward the ida_simple_get error code -ENOSPC.
>>>> This causes TTM not to retry allocation after buffer eviction and
>>>> instead return -ENOSPC to user-space.
>>>>
>>>> Fix this by returning 0 when ida_simple_get() returns -ENOSPC.
>>> Thanks. I got confused by the convoluted code that was there before ;-(
>>>
>>> I think this could be better though ... if ida_alloc() ever starts
>>> returning a different errno in the future, you'll hit the same problem,
>>> right? So how about this ...
>>>
>>> id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL);
>>> + if (id == -ENOMEM)
>>> + return -ENOMEM;
>>> + if (id < 0)
>>> + return 0;
>>> spin_lock(&gman->lock);
>>>
>>> But I wonder ... why is -ENOMEM seen as a fatal error? If you free up
>>> some memory, you'll free up an ID, so the next time around you should
>>> be able to allocate an ID. So shouldn't this function just have
>>> been doing this all along?
>>>
>>> id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL);
>>> + if (id < 0)
>>> + return 0;
>>>
>> Non-fatal errors are errors that can be remedied by GPU buffer eviction, and
>> buffer eviction will free up IDA space, so basically we need to target only
>> the error code that indicates we've run out of IDA space.
> Yes, but the following situation can happen:
>
> - Allocate 1024 IDs
> - Run very low on memory
> - Allocating ID 1025 will fail (very very unlikely)
> - ida_alloc_max() returns -ENOMEM
>
> In this situation, we want ttm_mem_evict_first() to be called which will
> free up one of the 1024 existing IDs and then we can allocate that ID for
> our new node.
>
> I'm assuming we're analysing the behaviour of ttm_bo_mem_force_space()
> here.
Well, that's true, but that situation depends I guess very much on the
radix tree implementation of IDA? Also I would expect the eviction paths
to try to allocate more memory here and there, so to me the preferred
option when -ENOMEM happens, is really to back off as soon as possible
to avoid interfering with shrinker work going on etc.
>> If we're worried that ida_alloc_max() will change return value, I guess we
>> will have to increase the IDA space and detect the error ourselves: error
>> if (id >= gman->max_gmr_ids)
> My point was that your solution (detect the one error which should be
> deemed as non-fatal) was not as robust as its inverse (detect the one
> error which the previous code deemed as fatal). But I now believe no
> error from the IDA should be seen as fatal.
If you insist, I can test on -ENOMEM instead of -ENOSPC to mimic the
pre-change behaviour. We should really focus on the IDA api changes
here, and defer changing -ENOMEM to non-fatal to a follow-up patch if
needed.
Thanks,
Thomas
More information about the dri-devel
mailing list