[PATCH] drm: Replace kref with a simple atomic reference count

Sun Nov 28 06:19:50 PST 2010

On 11/28/2010 02:35 PM, Daniel Vetter wrote:
> On Sun, Nov 28, 2010 at 01:32:27PM +0100, Thomas Hellstrom wrote:
>    
>> This is racy, in that the kref_get() can hit a zero refcount. I
>> think an ideal thing here would be to add a kref_get_unless_zero()
>> for this situation, that returns an error if the refcount was indeed
>> zero. I don't think that would violate the kref idea, since we still
>> never increase the refcount of a zeroed object, and the user needs
>> to provide the necessary synchronization to make sure the object
>> isn't gone while still the refcount is zero.
>>      
> I've got curious rechecked the code and it all looks sane. The trick is
> that all the userspace-visible names hold a ref to the underlying gem
> object (including the flink name). If all userspace handles are destroyed
> (save the flink name) the flink name gets destroyed, too (see
> obj->handle_count). All this is independently protected with spinlocks
> and it happens _before_ dropping the ref to the underlying bo (and taking
> any necessary locks to do so). I don't see any races nor locking problems,
> there.
> -Daniel
>    

Hi, Daniel!

I'm not saying that the current gem code is incorrect. In particular, 
the gem name system allows upping the handle_count even if it's zero and 
the name is just about to be destroyed. I see that more as a curiosity, 
and gem would work just as well if it used kref_get_unless_zero on the 
handle_count.

However this is a generic problem.
It appears in the general user-visible object case. For example, 
currently gem needs to take the struct_mutex *before* unreferencing to 
avoid hitting this problem with a racing mmap lookup (At least I think 
that's why the struct_mutex is taken).  If mmap were using 
kref_get_unless_zero, and returning error on failure, the struct_mutex 
would be needed only inside the kref release function, and that is what 
Dave was referring to in his mail.

An identical problem appears in TTM where we need to take the 
bdev::vm_lock around unreffing a ttm bo, and in the ttm_object 
functionality were we also have a general lookup mechanism for stuff 
like contexts, surfaces, user-visible fences or whatever.

In particular this also applies to the rcu case, were we cannot block 
readers. If we were to move over all object lookup to use RCU locks, 
we'd want to be completely 100% sure that once a refcount reaches zero, 
it won't be upped again, and we can safely go ahead and use call_rcu for 
destruction. Let's say this wasn't true, call_rcu could in principle be 
called multiple times for the same object, and that isn't allowed until 
the callback has been run...

And the simple solution to this problem is kref_get_unless_zero().

/Thomas