[PATCH] mm: Remove pointless might_sleep() in remove_vm_area().

Thomas Hellstrom thellstrom at vmware.com
Mon Mar 27 14:10:45 UTC 2017


On 03/27/2017 03:26 PM, Andrey Ryabinin wrote:
> [+CC drm folks, see the following threads:
> 	https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_201703232349.BGB95898.QHLVFFOMtFOOJS-40I-2Dlove.SAKURA.ne.jp&d=DwIC-g&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=iraTSVk5qLTsuZU7WSBk97YAYoGmC7W5zjR2wwDRVVk&s=bX-RtB9qE168yR2AjMRvRvln1Pn6r8fmNUDQydGWIdk&e= 
> 	https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_1490352808-2D7187-2D1-2Dgit-2Dsend-2Demail-2Dpenguin-2Dkernel-40I-2Dlove.SAKURA.ne.jp&d=DwIC-g&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=iraTSVk5qLTsuZU7WSBk97YAYoGmC7W5zjR2wwDRVVk&s=jw45iN1ypIQrzl08wkan3QZkuU6Gu0riU4_PvZD8KXQ&e= 
> ]
>
> On 03/24/2017 07:17 PM, Matthew Wilcox wrote:
>> On Fri, Mar 24, 2017 at 06:05:45PM +0300, Andrey Ryabinin wrote:
>>> Just fix the drm code. There is zero point in releasing memory under spinlock.
>> I disagree.  The spinlock has to be held while deleting from the hash
>> table. 
> And what makes you think so?
>
> There are too places where spinlock held during drm_ht_remove();
>
> 1) The first one is an obvious crap in ttm_object_device_release():
>
> void ttm_object_device_release(struct ttm_object_device **p_tdev)
> {
> 	struct ttm_object_device *tdev = *p_tdev;
>
> 	*p_tdev = NULL;
>
> 	spin_lock(&tdev->object_lock);
> 	drm_ht_remove(&tdev->object_hash);
> 	spin_unlock(&tdev->object_lock);
>
> 	kfree(tdev);
> }
>
> Obviously this spin_lock has no use here and it can be removed. There should
> be no concurrent access to tdev at this point, because that would mean immediate
> use-afte-free.
>
> 2) The second case is in ttm_object_file_release() calls drm_ht_remove() under tfile->lock
> And drm_ht_remove() does:
> void drm_ht_remove(struct drm_open_hash *ht)
> {
> 	if (ht->table) {
> 		kvfree(ht->table);
> 		ht->table = NULL;
> 	}
> }
>
> Let's assume that we have some other code accessing ht->table and racing
> against ttm_object_file_release()->drm_ht_remove().
> This would mean that such code must do the following:
>   a) take spin_lock(&tfile->lock)
>   b) check ht->table for being non-NULL and only after that it can dereference ht->table.
>
> But I don't see any code checking ht->table for NULL. So if race against drm_ht_remove()
> is possible, this code is already broken and this spin_lock doesn't save us from NULL-ptr
> deref.
>
> So, either we already protected from such scenarios (e.g. we are the only owners of tdev/tfile in
> ttm_object_device_release()/ttm_object_file_release()) or this code is already terribly
> broken. Anyways we can just move drm_ht_remove() out of spin_lock()/spin_unlock() section.
>
> Did I miss anything? 
>
>
>> Sure, we could change the API to return the object removed, and
>> then force the caller to free the object that was removed from the hash
>> table outside the lock it's holding, but that's a really inelegant API.
>>
> This won't be required if I'm right.
>
Actually, I've already sent out a patch for internal review to remove
the spinlocks around drm_ht_free().
They are both in destructors so it should be harmless in this particular
case. The reason the locks are there is to avoid upsetting static code
analyzers that think the hash table pointer should be protected because
it is elsewhere in the code.

However, while it is common that acquiring a resource (in this case
vmalloc space) might sleep,  Sleeping while releasing it ishould, IMO in
general be avoided if at all possible. It's quite common to take a lock
around kref_put() and if the destructor needs to sleep that requires
unlocking in it, leading to bad looking code that upsets static
analyzers and requires extra locking cycles.

In addition, if the vfree sleeping is triggered by waiting for a thread
currently blocked by, for example a memory allocation, which is blocked
by the kernel running shrinkers, which call vfree() then we're in a
deadlock situation.

So to summarize. Yes, the drm callers can be fixed up, but IMO requiring
vfree() to be non-atomic is IMO not a good idea if avoidable.

Thanks,

Thomas





More information about the dri-devel mailing list