[Intel-gfx] [PATCH] drm/i915: use udelay for very short delays

Thu Dec 15 11:39:14 UTC 2016

On Thu, Dec 15, 2016 at 11:51 AM, Nicholas Mc Guire <der.herr at hofr.at> wrote:
> On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
>> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat at osadl.org> wrote:
>> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > > > change this to a udelay(2).
>> > >
>> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> > > so this boils down to which is the better use of CPU. We could probably
>> > > relax the max delay more if that was helpful. But I'm not immediately
>> > > sold on "is most likely more efficient" which sounds like a gut feeling.
>> > >
>> > > I'm sorry it's not clear in my other reply that I do appreciate
>> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> > > convinced udelay() is the answer.
>> >
>> > So one reason why we unconditionally use *sleep variants is the
>> > might_sleep check. Because in the past people have used udelay and mdelay,
>> > those delays had to be increased a lot because hw, and at the same time
>> > someone added users of these functions to our irq helper, resulting in irq
>> > handling times measures in multiple ms. That's not good.
>> >
>> > So until someone can demonstrate that there's a real benefit (which let's
>> > be honest, for modeset code, will never be the case) I very highly prefer
>> > to use *sleep* functions. They prevent some silly things from happening by
>> > accident. Micro-optimizing modeset code and hampering maitainability in
>> > the process is not good.
>>
>> Also, the entire premise seems backwards: usleep_range is inefficient for
>> certain parameter ranges and better replaced with udelay. That makes
>> sense.
>>
>> But why exactly do we not fix udelay_range then, but instead do a cocci
>> job crawling through all the thousands of callers? Just fix usleep(_range)
>> to use udelay for very small values (and still keep the might_sleep check
>> ofc) if that's more efficient!
>
> its actually not thousands more like a few dozen:

There's 1k + usleep* calls you need to audit and teach people how to
exactly use it.

> usleep_range(min,max) in linux-stable 4.9.0
>
> 1648 calls total
> 1488 pass numeric values only (90.29%)
>   27 min below 10us (1.81%)
>   40 min above 10ms (2.68%)
>      min out of spec 4.50%
>   76 preprocessor constants (4.61%)
>    1 min below 10us (1.31%)
>    8 min above 10ms (10.52%)
>      min out of spec 11.84%
>   85 expressions (5.15%)
> 1(0) min below 10us (1.50%)*
> 6(2) min above 10ms (7.50%)*
>      min out of spec 9.0%
> Errors:
>   23 where min==max  (1.39%)
>    0 where max < min (0.00%)
>
> Total:
>   Bugs: 6.48%-10.70%*
>   Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
>   Detectable by coccinelle:
>   Bugs: 74/103 (71.8%)
>   Crit: 50/52 (96.1%)
>
> *based on estimats as runtime values not known.
>
>
> there is no usleep() as noted in Documentation/timers/timers-howto.txt
>                 - Why not usleep?
>                         On slower systems, (embedded, OR perhaps a speed-
>                         stepped PC!) the overhead of setting up the hrtimers
>                         for usleep *may* not be worth it. Such an evaluation
>                         will obviously depend on your specific situation, but
>                         it is something to be aware of.
>
> and it suggests to use different API for different ranges - sounds sane
> to me and seems to cover the needs of drivers.

git grep usleep seems to disagree, at least I see a bunch of usleep()
calls. And per Rusty's api usability scale the ultimate fucntion is
dwim(). It just feels to me like pushing such trade-off decisions to
drivers is bad design because a) the tradeoffs depend upon the cpu
you're running on b) driver writers are pretty good at getting such
tradeoffs wrong in general. Aiming for a more dwim()-like api for this
here makes sense, instead of an eternal fight to correct drivers that
get it wrong all the time.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch