[Intel-gfx] [PATCH] drm/i915: Replace some more busy waits with normal ones

Wed Mar 23 16:24:48 UTC 2016

On 23/03/16 15:43, Mika Kuoppala wrote:
> Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com> writes:
>
>> [ text/plain ]
>>
>> Should have sent this as RFC..
>>
>> On 23/03/16 14:32, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>
>>> When I added an assert to catch non-atomic users of
>>> wait_for_atomic_us in 0351b93992aa463cc3e7f358ddec2709f9390756
>>> ("drm/i915: Do not lie about atomic timeout granularity"),
>>> I have missed some callers which use it from obviously
>>> non-atomic context.
>>>
>>> Replace them with sleeping waits which support micro-second
>>> timeout granularity since 3f177625ee896f5d3c62fa6a49554a9c0243bceb
>>> ("drm/i915: Add wait_for_us").
>>>
>>> Note however than a fix for wait_for is needed to a clock with
>>> more granularity than jiffies. In the above referenced patch
>>> I have switched the arguments to micro-seconds, but failed to
>>> upgrade the clock as well, as Mika has later discovered.
>>>
>>> Open question here is whether we should allow sleeping waits
>>> of less than 10us which usleep_range recommends against. And
>>> this patch actually touches one call site which asks for 1us
>>> timeout.
>>>
>>> These might be better served with wait_for_atomic_us, in which
>>> case the inatomic warning there should be made dependant on
>>> the requested timeout.
>>
>> For discussion - does the above sound like a better plan than this
>> patch? To sum up my proposal:
>>
>
> What I have aimed for was that we only have wait_for and wait_for_atomic.
>
> The sleeping one operates on 1ms granularity and the nonsleeping
> one on usecs.

Okay, if you think 1ms is enough for all callers.

>> 1. Allow wait for_atomic_us for < 10us waits and keep using it for such
>> waiters.
>
> I have modified the wait_for to do few busy cycles on the
> start of the wait and then adaptive backoff if condition is not
> yet met. In hopes that we could convert few atomic_waits for this.

Sounds good.

>> 2. Upgrade the clock in wait_for to something more precise than jiffies
>> so timeouts from 10us and up can be handled properly. Note that
>> currently this is  only and issue in the failure/timeout mode. In the
>> expected case the current implementation is fine.
>>
>
> I would not go this route. If you really really want <1ms response
> this should be explicit in the callsite. Disclaimer: i don't
> know all the callsites and requirements.

It is explicit, just that it is currently broken on the timeout front. 
But never mind, if the precision is really not required then it is good 
to get rid of it.

>> Equally as under 1), put a BUILD_BUG_ON in wait_for for <10us waits.
>>
>
> This is what I had in mind (wip/rfc):
> https://cgit.freedesktop.org/~miku/drm-intel/log/?h=wait_until
>
> Spiced with your patch and few build_bug_on, I think the
> wait_for_atomic(_us) might become rare thing.

I had a brief look at your tree - looks like a comprehensive approach 
and in overall a good one. Have spotted some small details, but I can 
comment on when you post it.

Biggest thing to make sure is that you don't add a lot of cycles to the 
forcewake loops since for example fw_domains_get can be the hottest i915 
function on some benchmarks.

(This area slightly annoys me anyway with redundant looping over 
forcewake domains and we could also potentially optimize the ack waiting 
by first requesting all we want, and then doing the waits. That would be 
one additional loop, but if removed the other one, code would stay at 
the same number of domain loops.)

Regards,

Tvrtko