[Nouveau] Addressing the problem of noisy GPUs under Nouveau

Martin Peres martin.peres at free.fr
Mon Jan 29 00:05:03 UTC 2018


On 29/01/18 01:24, Martin Peres wrote:
> On 28/11/17 07:32, John Hubbard wrote:
>> On 11/23/2017 02:48 PM, Martin Peres wrote:
>>> On 23/11/17 10:06, John Hubbard wrote:
>>>> On 11/22/2017 05:07 PM, Martin Peres wrote:
>>>>> Hey,
>>>>>
>>>>> Thanks for your answer, Andy!
>>>>>
>>>>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>>>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger <aritger at nvidia.com> wrote:
>>>>>> Martin's question was very long, but it boils down to this:
>>>>>>
>>>>>> How do we compute the correct values to write into the e114/e118 pwm
>>>>>> registers based on the VBIOS contents and current state of the board
>>>>>> (like temperature).
>>>>>
>>>>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>>>>> GF119+, or 0x200cd/d0 on Kepler+.
>>>>>
>>>>> At least, it looks like we know which PWM controller we need to drive, so
>>>>> I did not want to muddy the waters even more by giving register
>>>>> addresses, rather concentrating on the problem at hand: How to compute
>>>>> the duty value for the PWM controller.
>>>>>
>>>>>>
>>>>>> We generally do this right, but appear to get it extra-wrong for certain GPUs.
>>>>>
>>>>> Yes... So far, we are always safe, but users tend to mind when their
>>>>> computers sound like a jumbo jet at take off... Who would have thought? :D
>>>>>
>>>>> Anyway, looking forward to your answer!
>>>>>
>>>>> Cheers,
>>>>> Martin
>>>>
>>>>
>>>> Hi Martin,
>>>>
>>>> One of our firmware engineers thinks that this looks a lot like PWM inversion.
>>>> For some SKUs, the interpretation of the PWM duty cycle is inverted. That 
>>>> would probably make it *very* difficult to find a sensible algorithm that 
>>>> covered all the SKUs, given that some are inverted and others are not.
>>>>
>>>> For the noisy GPUs, a very useful experiment would be to try inverting it, 
>>>> like this:
>>>>
>>>> 	pwmDutyCycle = pwmPeriod - pwmDutyCycle;
>>>>
>>>> ...and then see if fan control starts behaving closer to how you've actually 
>>>> programmed it.
>>>>
>>>> Would that be easy enough to try out? It should help narrow down the
>>>> problem at least.
>>>>
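A minimal sketch of the inversion experiment quoted above, written as a standalone C helper. The function name `fan_pwm_duty` and the `inverted` flag are hypothetical names chosen for illustration; they are not taken from Nouveau or from NVIDIA's firmware. The clamp is an added safety assumption so the subtraction cannot underflow.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an existing Nouveau function).
 * Returns the duty value to program into the PWM controller.
 * When `inverted` is non-zero, the duty cycle is flipped as in
 * the quoted snippet: duty = period - duty.
 */
static uint32_t fan_pwm_duty(uint32_t period, uint32_t duty, int inverted)
{
	/* Assumption: a duty larger than the period is treated as 100%. */
	if (duty > period)
		duty = period;

	return inverted ? period - duty : duty;
}
```

With a period of 1000 and a requested duty of 250, the helper yields 750 on an inverted fan and 250 otherwise.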
>>>
>>> Hey John,
>>>
>>> Unfortunately, we know about PWM inversion, and one can know which mode
>>> to use based on the GPIO entry associated with the fan (inverted). We have
>>> had support for this in Nouveau for a long time. At the very least, this
>>> is not the problem on my GF108.
>>>
>>> I am certain that the problem I am seeing is related to this vbios table
>>> I wrote about (BIT P, offset 0x18). It is used to compute what PWM duty
>>> I should use for both 0 and 100% of the fan speed.
>>>
>>> Computing the value for 0% fan speed is difficult because of the
>>> non-continuous nature of some of the functions[1], but I can always
>>> over-approximate. However, I failed to accurately compute the duty I
>>> need to write to get the 100% fan speed (I have cases where I greatly
>>> over-estimate it...).
>>>
>>> Could you please check out the vbios table I am pointing at? I am quite
>>> sure that your documentation will be clearer than my babbling :D
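To illustrate why the two endpoint duties matter, here is a sketch of how a fan speed percentage could be mapped onto a duty value once the duties for 0% and 100% are known. It assumes a plain linear mapping between the two endpoints, which this thread does not confirm; the name `fan_percent_to_duty` and its parameters are hypothetical. How to derive the endpoints from the vbios table (BIT P, offset 0x18) is exactly the open question and is not modelled here.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an existing Nouveau function).
 * duty_min: duty corresponding to 0% fan speed
 * duty_max: duty corresponding to 100% fan speed
 * percent:  requested fan speed, clamped to 0..100
 *
 * Assumption: the mapping between the endpoints is linear.
 */
static uint32_t fan_percent_to_duty(uint32_t duty_min, uint32_t duty_max,
				    uint32_t percent)
{
	if (percent > 100)
		percent = 100;

	/* Degenerate range: fall back to the 0% duty. */
	if (duty_max < duty_min)
		return duty_min;

	/* 64-bit intermediate avoids overflow; +50 rounds to nearest. */
	return duty_min +
	       (uint32_t)(((uint64_t)(duty_max - duty_min) * percent + 50) / 100);
}
```

Under this assumption, an over-estimated 100% duty scales every intermediate speed up with it, so the fan runs louder than requested at every setting.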
>>
>> Yes. We will check on this. There has been some productive discussion 
>> internally, but it will take some more investigation.
>>
>> thanks,
>> John Hubbard
> 
> Have the productive discussions panned out?

Oh, I see you pushed new vbios documentation:
 1) http://download.nvidia.com/open-gpu-doc/BIOS-Information-Table/1/BIOS-Information-Table.html
 2) http://download.nvidia.com/open-gpu-doc/MemoryClockTable/1/MemoryClockTable.html
 3) http://download.nvidia.com/open-gpu-doc/MemoryTweakTable/1/MemoryTweakTable.html

Is there any chance of getting the documentation for the "Thermal Coolers
Table" and the "Thermal Device Table" (the latter does not seem super
important though)?

Anyway, thanks for the new documentation. Reverse engineering of power
management will be greatly simplified now that we have a better idea of
which bits control what. Too bad it won't help with the current issue
though...

Thanks,
Martin

