[Intel-gfx] [PATCH] drm/i915: drop the coditional mutex

Mon Apr 8 20:58:57 CEST 2013

* Daniel Vetter | 2013-04-02 15:47:02 [+0200]:

>On Tue, Apr 02, 2013 at 03:30:58PM +0200, Sebastian Andrzej Siewior wrote:
>> mutex_is_locked_by() checks the owner of the lock against current. This
>> is done by accessing a private member of struct mutex which works on
>> mainline but does not on RT.
>> I did not figure out, why this "lock-owner-check" and the "lock stealing
>> flag" is required. If the lock can not be acquire the lock (because it
>> would deadlock) then it can return -1.
>> The lock stealing makes actually no sense to me. If
>> shrinker_no_lock_stealing is true then the same functions
>> (i915_gem_purge(), i915_gem_shrink_all()) are called from
>> i915_gem_object_create_mmap_offset() as from i915_gem_inactive_shrink().
>> I haven't found a path in which i915_gem_inactive_shrink() is invoked
>> from i915_gem_object_create_mmap_offset() that means there is no way
>> shrinker_no_lock_stealing is true _and_ the lock is owned by the current
>> process.
>
>Every invocation of a memory allocation function can potentially recourse
>into the shrinker callbacks. Since we currently protect almost all our gpu
>buffer object state with one single lock, and we can also easily hold onto
>tons of memory, we can easily OOM without that lock stealing trick.

But this only occures when you allocate memory from within the driver or
is this mutex so contended that it is almost always taken?

>The patch you're reverting here was added to work around spurious OOM when
>trying to allocate sg tables, so this is real.

Couldn't you simply call the two shrink functions before allocating the
sg table?

>I fully realize that this is an awful hack and I'll burn on a pretty big
>pyre for merging it. I also know that it doesn't really work if someone
>else is trying to allocate while another thread is hogging our single
>mutex. But without a replacement fix, this is nacked.

I'm sorry to point it out but this "fix" has staging quality.

>The no_lock_stealing hack is probably even worse, we use it to protect
>critical section which are touching the gpu lru.

Maybe another reason to have a few spots where you call shirnk function
manually…

>
>Long term we're working towards that goal (also for a bunch of other
>reasons) and aim for more fine-grained locking, so that we can free memory
>in smaller chunks. But that's gonna be a lot of work.
Could you please define "long term" in time units? I would really like
to see the owner check gone asap.

>Cheers, Daniel

Sebastian