[Intel-gfx] [PATCH] drm/i915: Replace some more busy waits with normal ones
tvrtko.ursulin at linux.intel.com
Thu Mar 24 11:37:07 UTC 2016
On 23/03/16 16:40, Chris Wilson wrote:
> On Wed, Mar 23, 2016 at 04:24:48PM +0000, Tvrtko Ursulin wrote:
>> Biggest thing to make sure is that you don't add a lot of cycles to
>> the forcewake loops since for example fw_domains_get can be the
>> hottest i915 function on some benchmarks.
>> (This area slightly annoys me anyway with redundant looping over
>> forcewake domains and we could also potentially optimize the ack
>> waiting by first requesting all we want, and then doing the waits.
>> That would be one additional loop, but if we removed the other one,
>> the code would stay at the same number of domain loops.)
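To make the request-first-then-wait idea concrete, here is a rough userspace sketch. It is not i915 code: the struct, the helper names and the "one poll equals one tick" timing model are all made up for illustration, and it assumes the hardware can process wake requests for different domains in parallel.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DOMAINS 3	/* hypothetical: e.g. render, blitter, media */

/* Hypothetical model: a domain acks ack_delay ticks after its request. */
struct sim_fw {
	unsigned int now;			/* simulated time, in polls */
	unsigned int request_time[NUM_DOMAINS];
	unsigned int ack_delay[NUM_DOMAINS];
};

static void sim_request(struct sim_fw *fw, int id)
{
	fw->request_time[id] = fw->now;
}

/* Busy-poll until the simulated ack arrives; each poll costs one tick. */
static void sim_wait_ack(struct sim_fw *fw, int id)
{
	while (fw->now - fw->request_time[id] < fw->ack_delay[id])
		fw->now++;
}

/* Request and wait one domain at a time: ack latencies add up. */
static unsigned int get_serial(struct sim_fw *fw, uint32_t mask)
{
	assert(mask < (1u << NUM_DOMAINS));
	fw->now = 0;
	for (int id = 0; id < NUM_DOMAINS; id++) {
		if (!(mask & (1u << id)))
			continue;
		sim_request(fw, id);
		sim_wait_ack(fw, id);
	}
	return fw->now;
}

/* Request everything first, then wait: ack latencies overlap. */
static unsigned int get_batched(struct sim_fw *fw, uint32_t mask)
{
	assert(mask < (1u << NUM_DOMAINS));
	fw->now = 0;
	for (int id = 0; id < NUM_DOMAINS; id++)
		if (mask & (1u << id))
			sim_request(fw, id);
	for (int id = 0; id < NUM_DOMAINS; id++)
		if (mask & (1u << id))
			sim_wait_ack(fw, id);
	return fw->now;
}
```

In this model the serial variant costs the sum of the ack delays and the batched variant only the largest one. Whether real hardware overlaps the acks like that would need measuring.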
> I hear you. I just end up weeping in the corner when I see fw_domain_get
> on the profile.
> We already do have a mitigation scheme to hold onto the forcewake for an
> extra jiffie every time. I don't like it, but without it fw_domains_get
> becomes a real hog.
I am pretty sure I've seen some tests which somehow defeat the jiffy
delay, so that we end up re-acquiring every ms/jiffy. This is something I
wanted to get to the bottom of but have not got round to yet. It was
totally unexpected because the test is hammering on everything.
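For reference, the hold-for-an-extra-jiffy scheme can be modelled roughly as below. This is a simplified userspace sketch, not the driver code: the struct and function names are hypothetical, time is passed in by hand instead of coming from a timer, and hw_acquires merely counts how often the expensive wake handshake would be needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one forcewake domain with a deferred release. */
struct sim_fw_hold {
	unsigned int wake_count;	/* current users of the domain */
	bool awake;			/* hardware is held awake */
	bool release_pending;		/* a deferred release is armed */
	unsigned long release_at;	/* jiffy at which it fires */
	unsigned int hw_acquires;	/* expensive wake handshakes so far */
};

/* Stand-in for the timer callback: drop the wake if nobody is using it. */
static void fw_tick(struct sim_fw_hold *fw, unsigned long now)
{
	if (fw->release_pending && now >= fw->release_at) {
		fw->release_pending = false;
		if (fw->wake_count == 0)
			fw->awake = false;
	}
}

static void fw_get(struct sim_fw_hold *fw, unsigned long now)
{
	fw_tick(fw, now);
	if (fw->wake_count++ == 0 && !fw->awake) {
		fw->awake = true;
		fw->hw_acquires++;	/* only paid when really asleep */
	}
}

static void fw_put(struct sim_fw_hold *fw, unsigned long now)
{
	fw_tick(fw, now);
	assert(fw->wake_count > 0);
	if (--fw->wake_count == 0) {
		/* Do not release now; keep the wake for one more jiffy. */
		fw->release_pending = true;
		fw->release_at = now + 1;
	}
}
```

In this model any number of get/put pairs inside one jiffy costs a single handshake, while an access that arrives after the deferred release has fired pays for a new one.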
> Note that one thing we can actually do is restrict the domains we wakeup
> for the engines (engine->fw_domain) in execlists_submit, that should
> help chv/skl+ a small amount.
I even have a patch to do that somewhere. :)
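Not that patch, but the idea in a nutshell, as a standalone sketch. The domain bits, the struct and the engine-to-domain mapping are placeholders chosen for illustration; only the fw_domain notion comes from the mail above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical domain bits; the real set depends on the platform. */
#define FW_RENDER	(1u << 0)
#define FW_BLITTER	(1u << 1)
#define FW_MEDIA	(1u << 2)
#define FW_ALL		(FW_RENDER | FW_BLITTER | FW_MEDIA)

struct sim_engine {
	const char *name;
	uint32_t fw_domains;	/* domains its submission registers need */
};

/* Each set bit stands for one request/ack handshake. */
static unsigned int count_domains(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Submission that wakes every domain regardless of the engine. */
static unsigned int submit_wake_all(const struct sim_engine *e)
{
	(void)e;
	return count_domains(FW_ALL);
}

/* Submission that wakes only what this engine needs. */
static unsigned int submit_wake_engine(const struct sim_engine *e)
{
	assert(e);
	return count_domains(e->fw_domains & FW_ALL);
}
```

On a platform with a single domain both variants cost the same, which fits the expectation that only parts with several domains would benefit.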
> I don't have a good idea for how to keep rc6 residency high but avoid
> forcewake when those darn elsp require forcewake. As does gen6+ legacy
> RING_TAIL writes. And even then that spinlock causes quite a bit of
> traffic when it shouldn't be contended. I've been thinking of whether we
> can have multiple locks (hashed by register) but we would then still
> need some cross-communication for the common forcewake.
Maybe it is not worth it at this point. This is pretty well optimised
now and we could switch to the next target. Like maybe move to active and
retire__read, or retired_req_list, or something.