Expecting to revert commit 55285e21f045 "fbdev/efifb: Release PCI device ..."

Christian König christian.koenig at amd.com
Tue Dec 21 07:51:58 UTC 2021


Good morning guys,

First of all, get well soon, Linus.

Unfortunately I'm not the best expert on runtime power management
(that's Alex) or on display (that's Harry), but since neither of them has
responded I guess they are already on vacation. So take everything I
explain here with a grain of salt.

For background: we have two separate power management features here,
and neither of them seems to work as it should.

The first buggy one is runtime power management, which is what commit
55285e21f045 surfaces. My educated guess is that the now-corrected
reference counting turns off the GPU before userspace has a chance to
send a signal to the monitor to turn off its backlight. Double-checking
the code, I can see the correct calls to pm_runtime_get_*() and
pm_runtime_put_*() in amdgpu_dm_atomic_commit_tail(), but to be honest
that function seems to be quite a mess.
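As a rough illustration of why releasing that reference matters, here is a minimal userspace sketch of a runtime PM usage counter. It is a hypothetical model, not kernel code: `struct model_dev`, `model_get()`, `model_put()` and `model_suspends_after_commit()` are made-up names that only loosely mirror pm_runtime_get_*() and pm_runtime_put_*(). It assumes, based on the title of commit 55285e21f045, that efifb took a reference at probe time that was previously never dropped. Autosuspend delays, idle callbacks and child counts are ignored, so the model only shows that the count can now reach zero, not when that happens relative to userspace turning off the monitor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a runtime PM usage counter.
 * Real kernel runtime PM also tracks child counts, autosuspend
 * delays and idle callbacks; none of that is modelled here. */
struct model_dev {
    int usage_count;   /* number of outstanding "get" references */
    bool suspended;    /* true once the count has dropped to zero */
};

/* Loosely mirrors pm_runtime_get_sync(): take a reference and
 * make sure the device is powered up. */
static void model_get(struct model_dev *d)
{
    d->usage_count++;
    d->suspended = false;
}

/* Loosely mirrors pm_runtime_put(): drop a reference; when the last
 * one goes away the device may suspend (here: immediately, with no
 * autosuspend delay). */
static void model_put(struct model_dev *d)
{
    assert(d->usage_count > 0); /* an extra put would be a refcount bug */
    d->usage_count--;
    if (d->usage_count == 0)
        d->suspended = true;
}

/* Assumed scenario: efifb takes a reference at probe, then a display
 * commit takes and drops its own reference. If the efifb reference is
 * never released (the assumed situation before the fix), the count
 * never reaches zero and the GPU never suspends. */
static bool model_suspends_after_commit(bool fb_releases_ref)
{
    struct model_dev gpu = { .usage_count = 0, .suspended = true };

    model_get(&gpu);          /* efifb probe takes a reference */
    if (fb_releases_ref)
        model_put(&gpu);      /* FB destroy drops it (the fix) */

    model_get(&gpu);          /* display commit in progress */
    model_put(&gpu);          /* commit finished */

    return gpu.suspended;
}
```

With `fb_releases_ref` false, the leaked reference keeps the count at one and the model never suspends; with it true, the count drops to zero as soon as the commit finishes, which is the point at which the ordering problem guessed at above could appear.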

A trace of what exactly happens during PM autosuspend might help here.
Daniel, do you know of any tracepoint for that?

Then we have DPMS, which is basically the way of telling the monitor to
shut off its backlight. When this doesn't work as expected (e.g. you
need *two* cycles), it could just as well be that userspace is not
sending the right command.

If you use X, you could double-check with "xset dpms force off" and
"xset dpms force suspend". At least with my monitor, it turns off the
backlight in both cases, but maybe your hardware behaves differently.

Regards,
Christian.

On 20.12.21 at 23:21, Linus Torvalds wrote:
> [ Adding back in more amd people and the amd list, the people Daniel
> added seem to have gotten lost again, but I think people at least saw
> my original report thanks to Daniel ]
>
> With "amdgpu.runpm=0", things are better, but not perfect. With that I
> can lock the screen, and it has to go through *two* cycles of "No
> signal, turning off", but on the second cycle it does finally work.
>
> This was exposed by commit 55285e21f045 ("fbdev/efifb: Release PCI
> device's runtime PM ref during FB destroy"), probably because that
> made runtime PM actually potentially work, but it is then broken on
> amdgpu.
>
> Absolutely nothing odd in my setup. Two monitors, one GPU. PCI ID
> 1002:67df rev e7, subsystem ID 1da2:e353.
>
> I'd expect pretty much any amdgpu person to see this.
>
> On Mon, Dec 20, 2021 at 2:04 PM Linus Torvalds
> <torvalds at linux-foundation.org> wrote:
>> Note: on my machine, I get that
>>
>>     amdgpu 0000:49:00.0: amdgpu: Using BACO for runtime pm
>>
>> so maybe the other possible runtime pm models (ARPX and BOCO) are ok,
>> and it's only that BACO case that is broken.
> Hmm. The *documentation* says:
>
>      PX runtime pm
>          2 = force enable with BAMACO,
>          1 = force enable with BACO,
>          0 = disable,
>          -1 = PX only default
>
> but the code actually makes anything != 0 enable it, except on VEGA20
> and ARCTURUS, where it needs to be positive.
>
> My card is apparently "POLARIS10", whatever that means, which means
> that any non-zero value of amdgpu_runtime_pm will enable runtime PM as
> long as "amdgpu_device_supports_baco()" is true. Which it is.
>
> Whatever. Now I'm just kvetching about the documentation not matching
> what I see the code doing, which is never a great sign when things
> don't work.
>
>                Linus
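To make the documentation mismatch concrete, the decision Linus describes can be written down as a small model. This is a sketch of his summary, not the amdgpu source: `enum model_asic`, its values and `model_baco_runpm_enabled()` are hypothetical names, and only the BACO path for the three ASICs he names is covered. Under this model the documented default of -1 ("PX only default") still enables BACO runtime PM on a Polaris10 card that supports BACO, which matches the "Using BACO for runtime pm" message he quotes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ASIC identifiers for this sketch only; the driver
 * uses its own enum with many more entries. */
enum model_asic {
    MODEL_POLARIS10,
    MODEL_VEGA20,
    MODEL_ARCTURUS,
    MODEL_ASIC_COUNT
};

/* Model of the BACO runtime PM decision as summarized in the thread:
 * any non-zero amdgpu.runpm value enables it when BACO is supported,
 * except on VEGA20 and ARCTURUS, where the value must be positive. */
static bool model_baco_runpm_enabled(int runpm_param, enum model_asic asic,
                                     bool supports_baco)
{
    assert(asic < MODEL_ASIC_COUNT); /* reject out-of-range identifiers */

    if (!supports_baco || runpm_param == 0)
        return false;

    switch (asic) {
    case MODEL_VEGA20:
    case MODEL_ARCTURUS:
        return runpm_param > 0;  /* only an explicit positive value */
    default:
        return true;             /* -1, 1, 2, ... all enable it */
    }
}
```

For example, `model_baco_runpm_enabled(-1, MODEL_POLARIS10, true)` returns true, while the same call with `MODEL_VEGA20` returns false; passing 0 (the "amdgpu.runpm=0" workaround) disables it in every case.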
