[PATCH 2/5] drm: Add initial dnyamic power off feature

Mon Sep 10 02:00:58 PDT 2012

On Mon, Sep 10, 2012 at 10:23 AM, Dave Airlie <airlied at gmail.com> wrote:
> (oops' forgot reply to all)
>
>>
>> My midlayer-smell-o-meter just cranked up to 11 when reading this comment
>> ;-)
>>
>> I'd have expected:
>> - Drivers to check the power state and enable the gpu if it's off in their
>>   cs ioctl (instead of the brute-force every ioctl there is approach in
>>   the drm core)
>
> The problem is it won't just be the CS ioctl, so I just went with the
> larger hammer, I started annotating all the nouveau ioctls adding a
> wake up check at the top, and then realised that I need to start
> annotating all the non-gpu paths as well, like drm open, sysfs,
> basically anywhere that might cause us to access any GPU resources.
> Once I started annotating everywhere I realised just bashing it into
> the top most ioctl then sticking the whole lot into the drm core made
> sense. You have to think the device is gone completely, not the device
> is asleep. its not asleep, its effectively dead.

Yeah, I've noticed that there are tons of places that need it. Otoh stuff
added to core should be scrunitized much more imo - I don't care about
races only in the drivers ;-) And I think you need to annotate all driver
paths anyway to make this race-free:
1. userspace does ioctl/syfs open 
2. core enables gpu
3. massive delay out of far left field
4. core decides to disable the gpu again
..
5. ioctl/sysf file access continues: boom

Afaics the only way to avoid 5 goin boom is to properly instrument all
driver paths that touch the hw, hold some sort of reference while doing so
and tell the core that it can't turn off the gpu while the refcoun > 0
(but also tell it to pls try again shortly if the refcount hold is only
temporary and not indefinite like an enabled kms output ...). So to get
this really right I think you can't avoid adding all these checks to all
the hw paths, but then with the added fun of fighting a midlayer that gets
in the way (since you'd have to re-issue any delayed power off switch
request when the refcount drops to zero).

And imo for the "meh, I don't care about these unlikely races" proof of
concept code it shouldn't be hard to wrap drm_ioctl and the other few
top-level entry points in the driver.

Imo this is very similar to the nouveau gpu reset discussion a while back
where they wanted to add a check in core to similarly guarantee that no
one can touch the hw while they change it. I've shot that down for similar
reasons. Imo that kind of device access bracketing really belongs into the
driver. And i915.ko has it. Yep, it's a pain to get right and you need
error injection + decent testsuite to have a decent guarantee it doesn't
blow up again.

>> - Launch a delayed work item to cut the power again once they notice
>> that the gpu is idle (dunno how radeon/nouveau work exactly, but i915
>> has this nice retire_requests work item which does a few similar things
>> for power management, like lvds downclocking)
>
>> - Same thing for any other kind of usage, like e.g. kms: Drivers can
>> wrap the crtc helper set_mode to ensure the gpu is on and also check
>> for any enabled outputs before launching the delayed power off switch.
>> Same applies to any sysfs/debugfs files (although in the case of
>> i915.ko, many of these don't need the hw to be on).
>
> Yeah this is probably the way I'd like it to end up alright, it'll just
> be a lot more code and a shit load of missed corner cases until I get it
> right. So I preferred to start with something I knew would work..

See above, I think I can poke holes into the current approach, too ;-)

>> I guess you could add a small vga_switcheroo_dynamic_power_state struct
>> or something with a few helpers to do that to extract some duplicated
>> code from drivers. But tbh managing a piece of state lazily with a
>> timer/delayed work item is a common code pattern, so I don't think even
>> that little bit of code sharing is worth it.
>>
>> Cheers, Daniel
>>
>> PS: I've read a bit around in the switcheroo code and I think the
>> switcheroo ->can_switch callback is actually worse ... since the drm
>> drivers just check the open_count (which is hilariously racy in itself,
>> too) and there's no locking to ensure that stays the same between the
>> ->can_switch check and the actual ->set_state calls ...
>
> Yeah it just works by the fact that things are slow, but yeah in theory
> would need to lock-out userspace from accessing drm_open during the
> timeframe. The problem is doing that without userspace suddenly getting
> an error it didn't get before and things not working at all.

thread A				thread B
drm_open dGPU
drm_open_helper
dev->switch_power_state != DRM_SWITCH_POWER_ON check suceeds

					vga_switcheroo
					->can_switch check succeeds
					disables the dGPU

dev->open_count++ in drm_open

BOOM, you think the dGPU is on and working, but it isn't ...

I think we could fix this by killing ->can_switch into the ->set_power
callback an allowing that to return -EBUSY if switching isn't possible
right now. The switcheroo client could then do appropriate locking. But
since the entire drm->open_count story is a neat disaster anyway (never
try to unload while having a sysfs/debugfs file open ...) I'm in no hurry
to fix this ;-)

Cheers, Daniel
-- 
Daniel Vetter
daniel.vetter at ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch