[Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
Daniel Vetter
daniel at ffwll.ch
Wed Jan 6 00:06:58 PST 2016
On Tue, Jan 05, 2016 at 08:35:37AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 05, 2016 at 04:06:48PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 05, 2016 at 04:02:13PM +0100, Peter Zijlstra wrote:
> > > > Shouldn't the slab subsystem do this for us if we request it delays the
> > > > actual kfree? Seems like a core bug to me ... Adding more folks.
> > >
> > > note that sync_rcu() can take a terribly long time.. but yes, I seem to
> > > remember Paul talking about adding this to reclaim paths for just this
> > > reason. Not sure that ever happened though.
>
> There is an RCU OOM notifier, but it just ensures that existing callbacks
> get processed in a timely fashion. It does not block, as that would
> prevent other OOM notifiers from getting their memory freed quickly.
>
> > Also, you might be wanting rcu_barrier() instead, that not only waits
> > for a GP to complete, but also for all pending callbacks to be
> > processed.
>
> And in fact what the RCU OOM notifier does can be thought of as an
> asynchronous open-coded rcu_barrier(). If you are interested, please
> see rcu_register_oom_notifier() and friends.
>
> > Without the latter there might still not be anything to free after it.
>
> Another approach is synchronize_rcu() after some largish number of
> requests. The advantage of this approach is that it throttles the
> production of callbacks at the source. The corresponding disadvantage
> is that it slows things up.
>
> Another approach is to use call_rcu(), but if the previous call_rcu()
> is still in flight, block waiting for it. Yet another approach is
> the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
> idea is to do something like this:
>
> cond_synchronize_rcu(cookie);
> cookie = get_state_synchronize_rcu();
>
> You would of course do an initial get_state_synchronize_rcu() to
> get things going. This would not block unless there was less than
> one grace period's worth of time between invocations. But this
> assumes a busy system, where there is almost always a grace period
> in flight. But you can make that happen as follows:
>
> cond_synchronize_rcu(cookie);
> cookie = get_state_synchronize_rcu();
> call_rcu(&my_rcu_head, noop_function);
>
> Note that you need additional code to make sure that the old callback
> has completed before doing a new one. Setting and clearing a flag
> with appropriate memory ordering control suffices (e.g., smp_load_acquire()
> and smp_store_release()).
This pretty much went over my head ;-) What I naively hoped for is that
kfree() on an rcu-freeing slab could be taught to magically stall a bit
(or at least expedite the delayed freeing) if we're piling up too many
freed objects. Doing that only in OOM is probably too late since OOM
handling is a bit unreliable/unpredictable. And I thought we're not the
first ones running into this problem.
Do all the other users of rcu-freed slabs just open-code their own custom
approach? If that's the recommendation we can certainly follow that, too.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
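
For readers following the quoted suggestion, the cond_synchronize_rcu() /
get_state_synchronize_rcu() / call_rcu() sequence can be modelled in plain
userspace C. This is only a toy sketch, not kernel code and not what i915
does: every model_* and throttle_* name is a hypothetical stand-in, grace
periods are a simple counter, and the no-op callback is assumed to run
exactly when a grace period ends. It only illustrates when the caller would
block: when less than one grace period has passed since the previous call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model. All names below are hypothetical stand-ins. */
static unsigned long gp_completed;  /* grace periods that have fully elapsed */
static unsigned long times_blocked; /* how often the caller had to wait */
static bool cb_in_flight;           /* the "old callback still pending" flag */
static unsigned long cookie;        /* snapshot from the previous invocation */

/* Stand-in for get_state_synchronize_rcu(): the grace period to wait for. */
static unsigned long model_get_state(void)
{
    return gp_completed + 1;
}

/* One grace period ends; in this model the pending callback runs right away. */
static void model_grace_period_elapses(void)
{
    gp_completed++;
    cb_in_flight = false;
}

/* Stand-in for cond_synchronize_rcu(): wait only if the cookie is not met. */
static void model_cond_synchronize(unsigned long snap)
{
    if (gp_completed < snap) {
        times_blocked++;
        while (gp_completed < snap)
            model_grace_period_elapses();
    }
}

/* Stand-in for call_rcu(&my_rcu_head, noop_function). */
static void model_call_rcu_noop(void)
{
    /* Real code must check this flag with acquire/release ordering. */
    assert(!cb_in_flight);
    cb_in_flight = true;
}

/* The initial snapshot "to get things going". */
static void throttle_init(void)
{
    cookie = model_get_state();
}

/* Call once per freed object (or per batch of them). */
static void throttle_free_one(void)
{
    model_cond_synchronize(cookie); /* blocks if < 1 grace period since last */
    cookie = model_get_state();
    model_call_rcu_noop();          /* keeps a grace period in flight */
}
```

In the model, two back-to-back throttle_free_one() calls make the second one
wait, while a call that follows a completed grace period returns immediately.
In the kernel the callback can lag behind the end of the grace period, which
is why the quoted text asks for the extra flag.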