[Intel-gfx] [PATCH] drm/i915: Kick rcu harder to free objects

Das, Nirmoy nirmoy.das at linux.intel.com
Thu Sep 8 19:22:49 UTC 2022


On 9/8/2022 4:55 PM, Tvrtko Ursulin wrote:
>
> On 08/09/2022 15:32, Das, Nirmoy wrote:
>> Hi Ville,
>>
>>
>> I fixed a similar issue in DII but I couldn't reproduce it in drm
>>
>> http://intel-gfx-pw.fi.intel.com/patch/228850/?series=15910&rev=2.
>>
>> I wonder if that fixes the problem you are facing then I can send 
>> that to drm.
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 7809be3a6840..5438e9277924 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1213,7 +1213,7 @@  void i915_gem_init_early(struct 
>> drm_i915_private *dev_priv)
>>
>>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>   {
>> -    i915_gem_drain_freed_objects(dev_priv);
>> +    i915_gem_drain_workqueue(dev_priv);
>>       GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>       GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>       drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>
> Yes why not, more black magic (count to three) but if it works... :) I 
> also spy the general area has been a bit neglected. Like:


Not sure what should be the correct solution here.  I wonder if we might 
have to change this because of

https://lwn.net/Articles/906975/ ?


>
> i915_gem_driver_remove:
> ...
>   i915_gem_drain_workqueue
>   i915_gem_drain_freed_objects
>
> While i915_gem_drain_workqueue:
> ...
>   i915_gem_drain_freed_objects
>
> So i915_gem_drain_freed_objects in i915_gem_driver_remove is redundant 
> already.
>
> Should i915_gem_drain_freed_objects be unexported and all callers made 
> just call i915_gem_drain_workqueue after your patch? Or if "drain free 
> objects" is considered more self descriptive it could be made as an 
> alias to i915_gem_drain_workqueue.


We are using i915_gem_drain_freed_objects() in many places and replacing 
that with i915_gem_drain_workqueue() might have performance implication.


Nirmoy


>
> Regards,
>
> Tvrtko
>
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>> On 9/6/2022 7:46 PM, Ville Syrjala wrote:
>>> From: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>>
>>> On gen3 the selftests are pretty much always tripping this:
>>> <4> [383.822424] pci 0000:00:02.0: 
>>> drm_WARN_ON(dev_priv->mm.shrink_count)
>>> <4> [383.822546] WARNING: CPU: 2 PID: 3560 at 
>>> drivers/gpu/drm/i915/i915_gem.c:1223 
>>> i915_gem_cleanup_early+0x96/0xb0 [i915]
>>>
>>> Looks to be due to the status page object lingering on the
>>> purge_list. Call synchronize_rcu() ahead of it to make more
>>> sure all objects have been freed.
>>>
>>> Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index 0f49ec9d494a..5b61f7ad6473 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1098,6 +1098,7 @@ void i915_gem_drain_freed_objects(struct 
>>> drm_i915_private *i915)
>>>           flush_delayed_work(&i915->bdev.wq);
>>>           rcu_barrier();
>>>       }
>>> +    synchronize_rcu();
>>>   }
>>>   /*


More information about the Intel-gfx mailing list