[Nouveau] Addressing the problem of noisy GPUs under Nouveau

Martin Peres martin.peres at free.fr
Mon Nov 13 09:25:37 UTC 2017


Hello,

On 13/11/17 09:15, John Hubbard wrote:
> On 11/12/2017 06:29 PM, Martin Peres wrote:
>> Hello,
>>
>> Some users have been complaining for years about their GPU sounding like
>> a jet engine at takeoff. Last year, I finally got my hands on one of
>> these GPUs, and I have been trying to fix this issue on and off since then.
> 
> Some early feedback: can you tell us the exact SKUs you have? And are these
> production boards with production VBIOSes?  
> 
> Normally, it's just our bringup boards that we'd expect to be noisy like 
> this, so we're looking for a few more details.

Thanks for the quick feedback.

We only have access to production hardware with production vbioses, as
far as I know. In any case, I performed all my experiments on the following
GPU (with a stock vbios, modified only for the purpose of the experiments):

NVIDIA Corporation GF108 [GeForce GT 620] (rev a1) (prog-if 00 [VGA
controller])
        Subsystem: eVga.com. Corp. Device 2625

I pushed my vbios to http://fs.mupuf.org/nvidia/fan_calib/ in case it is
of interest to you (I doubt it, but if it saves us a round trip, let's
do it :)).

Thanks,
Martin

> 
> thanks,
> John Hubbard
> NVIDIA
> 
>>
>> After failing to find anything in the HW, I noticed that the duty
>> cycle set by NVIDIA's proprietary driver was far below the expected
>> value. By randomly changing values in the unknown tables of the vbios, I
>> found a fan calibration table at offset 0x18 in the BIT P table
>> (version 2).
>>
>> In this table, I identified two major 16-bit parameters at offsets 0xa
>> and 0xc [2]. I named the first one pwm_max and the latter pwm_offset.
>> As expected, these parameters look like a mapping function of the form
>> aX + b. However, after gathering more samples, I found that the output
>> was not continuous when linearly increasing pwm_offset [1]. Even more
>> curiously, the period of this square wave varies linearly with the
>> frequency used for the fan's PWM.
>>
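A minimal userspace sketch of reading the two parameters out of a VBIOS image. It assumes that the 16-bit word at offset 0x18 of the BIT P (version 2) table is a little-endian pointer to the fan calibration table, and that pwm_max and pwm_offset are little-endian 16-bit fields at offsets 0xa and 0xc of that table. `rd16`, `struct fan_calib` and `parse_fan_calib` are hypothetical names, not envytools or nouveau API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value; VBIOS data is assumed little-endian. */
static uint16_t rd16(const uint8_t *buf, size_t off)
{
	return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* The two 16-bit parameters identified in the fan calibration table. */
struct fan_calib {
	uint16_t pwm_max;    /* field at offset 0xa */
	uint16_t pwm_offset; /* field at offset 0xc */
};

/*
 * Hypothetical helper. 'rom' is the VBIOS image and 'p_table' is the offset
 * of the BIT 'P' (version 2) table data. ASSUMPTION: the 16-bit word at
 * offset 0x18 of that table points to the fan calibration table.
 * Returns 0 on success, -1 if the pointer is null or a read would fall
 * outside the image.
 */
static int parse_fan_calib(const uint8_t *rom, size_t rom_len,
			   size_t p_table, struct fan_calib *out)
{
	size_t tbl;

	assert(rom != NULL && out != NULL);
	if (p_table + 0x18 + 2 > rom_len)
		return -1;
	tbl = rd16(rom, p_table + 0x18);
	if (tbl == 0 || tbl + 0xc + 2 > rom_len)
		return -1;
	out->pwm_max = rd16(rom, tbl + 0xa);
	out->pwm_offset = rd16(rom, tbl + 0xc);
	return 0;
}
```

For example, with the P table at offset 0 of a zeroed 64-byte buffer, a pointer value of 0x20 at offset 0x18 and the bytes 0x34 0x12 at offset 0x2a, parse_fan_calib() returns 0 and reports pwm_max = 0x1234.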
>> I tried to reverse engineer a formula describing this function, but
>> failed to find a version that would work perfectly for all PWM
>> frequencies. The closest I got is [3], and I basically stopped there
>> about a year ago because I could not figure it out and got
>> frustrated :s.
>>
>> I started again on this project 2 weeks ago, with the intent of finding
>> a good-enough solution for nouveau and modelling the rest of the
>> equation, which would allow me to compute what duty I should set for
>> every wanted fan speed (%). I again mostly succeeded... but it would
>> seem that the interpretation of the table depends on the chipset
>> generation (Tesla behaves one way, Fermi+ behaves another). Also, the
>> proprietary driver is inconsistent in how it handles a computed duty
>> value below 0: sometimes it clamps it to 0, sometimes it sets it to the
>> divider value, and sometimes to a value slightly below the divider.
>>
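The three behaviours observed for a computed duty below 0 can be written down as explicit policies. This is only a sketch: which policy the proprietary driver picks in a given case, and how much lower "slightly below the divider" is, were not determined, so both are supplied by the caller. The upper clamp to the divider is an added assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Observed handling of a negative computed duty (selection rule unknown). */
enum neg_duty_policy {
	NEG_DUTY_CLAMP_ZERO,    /* clamp to 0 */
	NEG_DUTY_DIVIDER,       /* use the divider value */
	NEG_DUTY_BELOW_DIVIDER, /* use a value slightly below the divider */
};

/*
 * Hypothetical helper: turn a possibly negative computed duty into a value
 * in [0, divider]. 'margin' is a caller-chosen guess for "slightly below".
 * ASSUMPTION: positive values above the divider are clamped to it.
 */
static uint32_t fixup_duty(int64_t duty, uint32_t divider,
			   enum neg_duty_policy policy, uint32_t margin)
{
	assert(divider > 0);
	if (duty >= 0)
		return duty > divider ? divider : (uint32_t)duty;

	switch (policy) {
	case NEG_DUTY_DIVIDER:
		return divider;
	case NEG_DUTY_BELOW_DIVIDER:
		return margin < divider ? divider - margin : 0;
	case NEG_DUTY_CLAMP_ZERO:
	default:
		return 0;
	}
}
```

Keeping the policy as a parameter makes it easy to test each hypothesis against the dumped samples without hardcoding one rule.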
>> I have been trying to cover all edge cases by generating randomized
>> values for the PWM frequency, pwm_max and pwm_offset, flashing the
>> vbios, and iterating from 0% to 100% fan speed while dumping the values
>> set by your driver. Using half a million sample points (which took a
>> week to acquire), my model computes 97% of the values correctly
>> (ignoring off-by-one errors), while the remaining 3% are worryingly off
>> (by up to 100%)... The code is clearly non-trivial and full of
>> branches, which makes clean-room reverse engineering a chore.
>>
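A sketch of how a model could be scored against the dumped samples, counting a prediction as correct when it is exact or off by one. Expressing the worst error as a percentage of the PWM divider is an assumption about what "off by up to 100%" measures; `struct sample`, `struct score` and `score_model` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One sample: duty dumped from the proprietary driver vs. the model. */
struct sample {
	uint32_t divider;   /* PWM divider (period) for this sample */
	uint32_t expected;  /* duty set by the proprietary driver */
	uint32_t predicted; /* duty computed by the model */
};

struct score {
	size_t pass;          /* exact or off by one */
	size_t total;
	double worst_err_pct; /* largest error, as % of the divider (assumed) */
};

/* Score a model against dumped samples, ignoring off-by-one differences. */
static struct score score_model(const struct sample *s, size_t n)
{
	struct score r = { 0, n, 0.0 };

	for (size_t i = 0; i < n; i++) {
		long long diff = llabs((long long)s[i].expected -
				       (long long)s[i].predicted);
		double pct;

		assert(s[i].divider > 0);
		if (diff <= 1)
			r.pass++;
		pct = 100.0 * (double)diff / (double)s[i].divider;
		if (pct > r.worst_err_pct)
			r.worst_err_pct = pct;
	}
	return r;
}
```

The pass rate is then `pass / total`; tracking the worst error separately shows whether the failures are near misses or, as reported here, far off.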
>> As a final attempt at a somewhat complete solution, I tried this
>> weekend to build a "safe" model that would still make the GPUs quiet. I
>> managed to improve the pass rate from 97% to 99.6%, but the remaining
>> failures contradict my previous findings, which hold for far more
>> samples. In the end, the only completely safe way of driving the fan
>> is the current behaviour of nouveau...
>>
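For comparison, a plain linear percent-to-duty mapping that ignores the calibration table, assumed here to approximate what nouveau currently does. It is not copied from nouveau, whose exact rounding may differ; `linear_duty` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a fan speed in percent linearly onto [0, divider], rounding to the
 * nearest integer. ASSUMPTION: a simplified stand-in for nouveau's current
 * behaviour, not its actual code.
 */
static uint32_t linear_duty(uint32_t divider, uint32_t percent)
{
	assert(percent <= 100);
	/* 64-bit intermediate so large dividers cannot overflow. */
	return (uint32_t)(((uint64_t)divider * percent + 50) / 100);
}
```

With a divider of 1000, a 37% fan speed gives a duty of 370; 0% and 100% map exactly to 0 and the divider.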
>> At this point, I am ready to throw in the towel and hardcode parameters
>> in nouveau to address the problem of the loudest GPUs, but this is of
>> course suboptimal. This is why I am asking for your help. Do you have
>> any documentation about this fan calibration table that could help me
>> here? Code would be even more appreciated.
>>
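One way the hardcoding fallback could look: a per-board override table keyed on the PCI subsystem IDs. Everything in it is a placeholder: 0x3842 is assumed to be EVGA's PCI vendor ID, 0x2625 is the subsystem device from the lspci output above, and the pwm_max/pwm_offset values are made up rather than measured. `struct fan_quirk` and `find_fan_quirk` are hypothetical names, not nouveau code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-board override of the fan calibration parameters. */
struct fan_quirk {
	uint16_t subsys_vendor;
	uint16_t subsys_device;
	uint16_t pwm_max;
	uint16_t pwm_offset;
};

/*
 * Placeholder entry: 0x3842 is ASSUMED to be EVGA's vendor ID, 0x2625 is the
 * subsystem device reported by lspci; the pwm_* values are illustrative only.
 */
static const struct fan_quirk fan_quirks[] = {
	{ 0x3842, 0x2625, 0x0100, 0x0000 },
};

/* Return the matching override, or NULL to keep the values from the VBIOS. */
static const struct fan_quirk *find_fan_quirk(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(fan_quirks) / sizeof(fan_quirks[0]); i++) {
		if (fan_quirks[i].subsys_vendor == vendor &&
		    fan_quirks[i].subsys_device == device)
			return &fan_quirks[i];
	}
	return NULL;
}
```

Returning NULL for unknown boards keeps the current behaviour everywhere except on the boards explicitly listed.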
>> Thanks a lot in advance,
>> Martin
>>
>> PS: here is most of the code you may want to see:
>> http://fs.mupuf.org/nvidia/fan_calib/
>>
>> [1] http://fs.mupuf.org/nvidia/fan_calib/pwm_offset.png
>> [2] https://github.com/envytools/envytools/blob/master/nvbios/power.c#L333
>> [3] https://github.com/envytools/envytools/blob/master/nvbios/power.c#L298
>>


