[Nouveau] Addressing the problem of noisy GPUs under Nouveau

Martin Peres martin.peres at free.fr
Mon Nov 13 02:29:25 UTC 2017


Hello,

Some users have been complaining for years about their GPU sounding like
a jet engine at take-off. Last year, I finally got my hands on one of
these GPUs and have been trying to fix this issue on and off since then.

After failing to find anything in the HW, I figured out that the duty
cycle set by nvidia's proprietary driver was way under the expected
value. By randomly changing values in the unknown tables of the vbios, I
found out that there is a fan calibration table at offset 0x18 in the
BIT P table (version 2).

In this table, I identified 2 major 16-bit parameters, at offsets 0xa
and 0xc [2]. I named the first one pwm_max and the second one
pwm_offset. As expected, these parameters look like the coefficients of
a mapping function of the form aX + b. However, after gathering more
samples, I found out that the output was not continuous when linearly
increasing pwm_offset [1]. Even more funnily, the period of this square
function is linear with the frequency used for the fan's PWM.
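To make the naive reading concrete, here is a minimal Python sketch of pulling the two parameters out of a raw copy of the table and applying a plain aX + b map. The function names, the little-endian unsigned layout, and the way percent is scaled are all assumptions made for illustration; as described above, the real output is not continuous, so this is only the starting hypothesis and not what the proprietary driver computes.

```python
import struct

def read_fan_calib(table: bytes):
    """Read the two 16-bit parameters from a raw fan calibration table.

    Assumption: both fields are little-endian and unsigned.
    """
    (pwm_max,) = struct.unpack_from("<H", table, 0x0A)
    (pwm_offset,) = struct.unpack_from("<H", table, 0x0C)
    return pwm_max, pwm_offset

def naive_duty(percent: int, pwm_max: int, pwm_offset: int) -> int:
    """Hypothetical linear map: a = pwm_max scaled by percent, b = pwm_offset."""
    return pwm_max * percent // 100 + pwm_offset
```

Comparing the output of such a map against the values dumped from the driver is one way to see where the linear hypothesis stops holding.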

I tried reverse engineering the formula describing this function, but
failed to find a version that would work perfectly for all PWM
frequencies. This is the closest I have got [3], and I basically stopped
there about a year ago because I could not figure it out and got
frustrated :s.

I started again on this project 2 weeks ago, with the intent of finding
a good-enough solution for nouveau, and modelling the rest of the
equation that would allow me to compute what duty I should set for
every wanted fan speed (%). I again mostly succeeded... but it would
seem that the interpretation of the table depends on the generation of
chipset (Tesla behaves one way, Fermi+ behaves another way). Also, the
proprietary driver is not consistent about what to do when the computed
duty value would be lower than 0 (sometimes it is clamped to 0,
sometimes it is set to the same value as the divider, and sometimes it
is set to a slightly lower value than the divider).
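The three observed ways of handling a negative duty can be written down as explicit policies, which makes it easier to test each one against the dumped values. This is only a sketch: the enum, the function name, and the `margin` used for "slightly lower than the divider" are hypothetical, and which policy applies in which situation is precisely what remains unknown.

```python
from enum import Enum

class NegativeDutyPolicy(Enum):
    CLAMP_TO_ZERO = "clamp_to_zero"
    USE_DIVIDER = "use_divider"
    BELOW_DIVIDER = "below_divider"

def resolve_duty(duty: int, divider: int, policy: NegativeDutyPolicy,
                 margin: int = 1) -> int:
    """Return the duty to program, applying `policy` only when duty < 0.

    `margin` is a made-up placeholder for "slightly lower than the divider".
    """
    if duty >= 0:
        return duty
    if policy is NegativeDutyPolicy.CLAMP_TO_ZERO:
        return 0
    if policy is NegativeDutyPolicy.USE_DIVIDER:
        return divider
    if policy is NegativeDutyPolicy.BELOW_DIVIDER:
        return max(divider - margin, 0)
    raise ValueError(f"unknown policy: {policy!r}")
```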

I have been trying to cover all edge cases by generating a randomized
set of values for the PWM frequency, pwm_max, and pwm_offset, flashing
the vbios, and iterating from 0% to 100% fan speed while dumping the
values set by your driver. Using half a million sample points (which
took a week to acquire), my model computes 97% of the values correctly
(ignoring off-by-one errors), while the remaining 3% are worryingly off
(by up to 100%)... It is clear that the code is not trivial and is full
of branching, which makes clean-room reverse engineering a chore.
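The scoring described above, where a prediction counts as correct if it is within one unit of the dumped value, can be sketched as follows. The sample layout (a tuple of model inputs paired with the observed duty) and the function name are assumptions for illustration, not the format of the actual data set.

```python
def pass_rate(samples, model, tolerance=1):
    """Fraction of samples where the model is within `tolerance` of the dump.

    `samples` is an iterable of (inputs, observed_duty) pairs, where
    `inputs` is a tuple passed to `model` as positional arguments.
    """
    total = 0
    passed = 0
    for inputs, observed in samples:
        total += 1
        if abs(model(*inputs) - observed) <= tolerance:
            passed += 1
    return passed / total if total else 0.0
```

Keeping the failing samples around, rather than only the rate, helps to see whether the misses cluster around particular PWM frequencies or parameter ranges.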

As a final attempt to make a somewhat complete solution, I tried this
weekend to make a "safe" model that would still make the GPUs quiet. I
managed to improve the pass rate from 97% to 99.6%, but the remaining
failures conflict with my previous findings, which are also way more
prevalent. In the end, the only completely safe way of driving the fan
is the current behaviour of nouveau...

At this point, I am ready to throw in the towel and hardcode parameters
in nouveau to address the problem of the loudest GPUs, but this is of
course suboptimal. This is why I am asking for your help. Would you have
some documentation about this fan calibration table that could help me
here? Code would be even more appreciated.

Thanks a lot in advance,
Martin

PS: here is most of the code you may want to see:
http://fs.mupuf.org/nvidia/fan_calib/

[1] http://fs.mupuf.org/nvidia/fan_calib/pwm_offset.png
[2] https://github.com/envytools/envytools/blob/master/nvbios/power.c#L333
[3] https://github.com/envytools/envytools/blob/master/nvbios/power.c#L298


