[Intel-gfx] [PATCH 4/5] drm/i915: Do not lie about atomic wait granularity

Thu Mar 3 15:43:49 UTC 2016

On 03/03/16 15:11, Chris Wilson wrote:
> On Thu, Mar 03, 2016 at 02:36:44PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> Currently the wait_for_atomic_us only allows for a millisecond
>> granularity which is not nice towards callers requesting small
>> micro-second waits.
>
> Hmm, by granularity I think of the interval between COND reads. That
> should be the limiting factor in how fast we respond to the change in
> event (and so for the atomic variants should be virtually the same).

Yeah not that granularity, should I say "timeout granularity" then in 
the commit? Before if someone wanted to wait for, say 10us, it would 
have busy looped for one jiffie instead. In the timeout scenario that 
is. That is the headline improvement.

> Your suggestion that it is icache or similar is probably along the right
> path. A quick code size measurement of a test function would be a good
> supporting argument.

I am not sure. It shaves ~3.3% (12 of original 353 bytes) of the whole 
fw_domains_get which even if it is quite hot as you know, and even 3% 
from the main loop in wait for atomic (3% here is a glorious one byte). 
So I am not sure, but gem_latency results were quite convincing. 
Unexplained for me.

>> Re-implement it so micro-second granularity is really supported
>> and not just in the name of the macro.
>>
>> This has another beneficial side effect that it improves
>> "gem_latency -n 100" results by approximately 2.5% (throughput
>> and latencies) and 3% (CPU usage). (Note this improvement is
>> relative to not yet merged execlist lock uncontention patch
>> which moves the CSB MMIO outside this lock.)
>>
>> v2:
>>    * Warn when used from non-atomic context (if possible).
>>    * Warn on too long atomic waits.
>>
>> v3:
>>    * Added comment explaining CONFIG_PREEMPT_COUNT.
>>    * Fixed pre-processor indentation.
>>    (Chris Wilson)
>>
>> v4:
>>   * Commit msg update (gem_latency) and rebase.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/intel_drv.h | 31 +++++++++++++++++++++++++++++--
>>   1 file changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>> index c2a62e9554b3..b6fcb5c800e7 100644
>> --- a/drivers/gpu/drm/i915/intel_drv.h
>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>> @@ -44,6 +44,10 @@
>>    * contexts. Note that it's important that we check the condition again after
>>    * having timed out, since the timeout could be due to preemption or similar and
>>    * we've never had a chance to check the condition before the timeout.
>> + *
>> + * TODO: When modesetting has fully transitioned to atomic, the below
>> + * drm_can_sleep() can be removed and in_atomic()/!in_atomic() asserts
>> + * added.
>>    */
>>   #define _wait_for(COND, US, W) ({ \
>>   	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;	\
>> @@ -66,8 +70,31 @@
>>   #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
>>   #define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
>>
>> -#define wait_for_atomic(COND, MS) 	_wait_for((COND), (MS) * 1000, 0)
>> -#define wait_for_atomic_us(COND, US)	_wait_for((COND), (US), 0)
>> +/* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
>> +#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
>> +# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
>> +#else
>> +# define _WAIT_FOR_ATOMIC_CHECK do { } while(0)
>> +#endif
>> +
>> +#define _wait_for_atomic(COND, US) ({ \
>> +	unsigned long end__; \
>> +	int ret__ = 0; \
>> +	_WAIT_FOR_ATOMIC_CHECK; \
>> +	BUILD_BUG_ON((US) > 50000); \
>> +	end__ = (local_clock() >> 10) + (US) + 1; \
>> +	while (!(COND)) { \
>> +		if (time_after((unsigned long)(local_clock() >> 10), end__)) { \
>
> A comment here about why we can forgo the COND recheck would be
> worthwhile.
>
> /* Unlike the regular wait_for(), this atomic variant cannot be
>   * preempted (and we'll just ignore the issue of irq interruptions) and
>   * so we know that no time has passed since the last check of COND and
>   * can immediately report the timeout.
>   */

Will add.

Regards,

Tvrtko