[Nouveau] Addressing the problem of noisy GPUs under Nouveau
John Hubbard
jhubbard at nvidia.com
Wed Feb 7 03:31:07 UTC 2018
On 01/28/2018 04:05 PM, Martin Peres wrote:
> On 29/01/18 01:24, Martin Peres wrote:
>> On 28/11/17 07:32, John Hubbard wrote:
>>> On 11/23/2017 02:48 PM, Martin Peres wrote:
>>>> On 23/11/17 10:06, John Hubbard wrote:
>>>>> On 11/22/2017 05:07 PM, Martin Peres wrote:
>>>>>> Hey,
>>>>>>
>>>>>> Thanks for your answer, Andy!
>>>>>>
>>>>>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>>>>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger <aritger at nvidia.com> wrote:
>>>>>>> Martin's question was very long, but it boils down to this:
>>>>>>>
>>>>>>> How do we compute the correct values to write into the e114/e118 pwm
>>>>>>> registers based on the VBIOS contents and current state of the board
>>>>>>> (like temperature).
>>>>>>
>>>>>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>>>>>> GF119+, or 0x200cd/d0 on Kepler+.
>>>>>>
>>>>>> At least, it looks like we know which PWM controller we need to drive, so
>>>>>> I did not want to muddy the water even more by giving register
>>>>>> addresses, rather concentrating on the problem at hand: How to compute
>>>>>> the duty value for the PWM controller.
>>>>>>
>>>>>>>
>>>>>>> We generally do this right, but appear to get it extra-wrong for certain GPUs.
>>>>>>
>>>>>> Yes... So far, we are always safe, but users tend to mind when their
>>>>>> computers sound like a jumbo jet at takeoff... Who would have thought? :D
>>>>>>
>>>>>> Anyway, looking forward to your answer!
>>>>>>
>>>>>> Cheers,
>>>>>> Martin
>>>>>
[...]
Hi Martin,
I strongly suspect you are seeing a special case: on some GF108 boards, we
use only a very limited PWM range, 0.4% to 2.5%, because of the particular
type of DC power conversion circuit on those boards. However, it could also
just be difficulty interpreting the fixed-point values in the tables. In
either case, the answer is to explain those formats, so I'll do that now.
I am attaching the fan cooler table, in HTML format. We have also
published the BIT (BIOS Information Table) format separately:
http://download.nvidia.com/open-gpu-doc/BIOS-Information-Table/1/BIOS-Information-Table.html
I don't think it has any surprises for you in this regard, but you can
check it to make sure you're looking at the right subtable, just in case.
The interesting parts of that table are:
PWM Scale Slope (16 bits):
Slope to scale effective PWM to actual PWM (1/4096, F4.12, signed).
For backwards compatibility, a value of 0.0 (0x0000) is interpreted as 1.0 (0x1000).
This value is used to scale the effective PWM duty cycle, a conceptual fraction
of full speed (0% to 100%), to the actual electrical PWM duty cycle.
PWM(actual) = Slope × PWM(effective) + Offset
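For decoding the raw 16-bit fields by hand, here is a minimal sketch of the F4.12 format as described above: the 16 bits are a two's-complement integer scaled by 1/4096 (so the representable range is -8.0 to just under 8.0), and a slope of 0x0000 is read as 1.0. The helper names are hypothetical, not part of the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: raw signed F4.12 -> double.
 * 12 fractional bits, so value = raw / 4096.
 */
static double f4_12_to_double(uint16_t raw)
{
    /* Reinterpret the bits as two's complement. Converting values >= 0x8000
     * to int16_t is implementation-defined before C23, but wraps on all
     * common compilers. */
    return (double)(int16_t)raw / 4096.0;
}

/* PWM Scale Slope only: 0x0000 is interpreted as 1.0 (0x1000) for
 * backwards compatibility, per the table description above. */
static double pwm_scale_slope_to_double(uint16_t raw)
{
    return f4_12_to_double(raw == 0x0000 ? 0x1000 : raw);
}
```

For example, 0x0800 decodes to 0.5 and 0xF000 to -1.0.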
PWM Scale Offset (16 bits):
Offset to scale effective PWM to actual PWM (1/4096, F4.12, signed).
This value is used to scale the effective PWM duty cycle, a conceptual fraction
of full speed (0% to 100%), to the actual electrical PWM duty cycle.
PWM(actual) = Slope × PWM(effective) + Offset
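As a worked example, suppose (hypothetically; these are not values read from a real VBIOS) a board like the GF108 case above should map 0-100% effective to roughly 0.4-2.5% actual. Then Offset is about 0.004 and Slope is about 0.025 - 0.004 = 0.021. In F4.12 that is roughly 0.004 × 4096 ≈ 16 (0x0010) and 0.021 × 4096 ≈ 86 (0x0056), which gives an actual range of 16/4096 ≈ 0.39% up to 102/4096 ≈ 2.49%. The formula itself, in floating point for clarity, with hypothetical function names:

```c
/*
 * PWM(actual) = Slope * PWM(effective) + Offset, with both PWM values as
 * fractions of full scale in [0, 1]. slope and offset are the decoded
 * F4.12 values (raw / 4096).
 */
static double clamp_unit(double x)
{
    return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);
}

/* Effective (conceptual fan speed) -> actual electrical duty cycle. */
static double pwm_effective_to_actual(double slope, double offset,
                                      double effective)
{
    return clamp_unit(slope * effective + offset);
}

/* Actual duty cycle -> effective fan speed; caller must ensure slope != 0. */
static double pwm_actual_to_effective(double slope, double offset,
                                      double actual)
{
    return clamp_unit((actual - offset) / slope);
}
```

With those illustrative numbers, a 50% effective request becomes only 59/4096 (about 1.44%) actual duty cycle. If a board really expects this scaling, writing 50% straight into the duty register would drive it far outside its intended range.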
However, the calculations are hard to get right, and the table stores
values in fixed-point format, so here are a few simplified code excerpts
that use these fields. The various fixed-point macro definitions are part of
our normal driver package, in nvmisc.h and nvtypes.h. Any other definitions
that you need are included right here (I ran a quick compiler check to be sure).
#define VBIOS_THERM_COOLER_TABLE_10_ENTRY_SIZE_10 0x00000010
// Fan Cooler Table Version
#define NV_FAN_COOLER_TABLE_V2_VERSION_10 0x10
// We limit the size of V2 tables
#define NV_FAN_COOLER_TABLE_V2_MAX_ENTRIES 10
// Entry skip.
#define NV_FAN_COOLER_TABLE_V2_SKIP_ENTRY 0x000000FF
#define NV_FAN_COOLER_TABLE_TYPE 3:0 // field1- dword
#define NV_FAN_COOLER_TABLE_TYPE_PASSIVE_HEAT_SINK 0x00000000
#define NV_FAN_COOLER_TABLE_TYPE_ACTIVE_FAN_SINK 0x00000001
#define NV_FAN_COOLER_TABLE_TYPE_ACTIVE_SKIP 0x0000000F
#define NV_FAN_COOLER_TABLE_TYPE_DEFAULT NV_FAN_COOLER_TABLE_TYPE_PASSIVE_HEAT_SINK
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY 6:4
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY_GPU 0x00000000
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY_ALL 0x00000001
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY_DEFAULT NV_FAN_COOLER_TABLE_TARGET_AFFINITY_GPU
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE 10:8
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_NONE 0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_GPU 0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_EXTERNAL 0x00000002
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_DEFAULT NV_FAN_COOLER_TABLE_CONTROL_DEVICE_NONE
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE 14:12
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_NONE 0x00000000
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_GPU 0x00000001
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_EXTERNAL_INSTANCE0 0x00000002
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_DEFAULT NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_NONE
#define NV_FAN_COOLER_TABLE_CONTROL_SPEED_MAXIMUM 25:16
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL 29:26
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_NONE 0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_UNKNOWN 0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_FAN_SPECIFIC_INSTANCE0 0x00000002
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_GPIO_FAN_FUNC_INSTANCE0 0x00000003
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_DEFAULT NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_NONE
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY 31:30
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_GPIO 0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_LOW 0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_HIGH 0x00000002
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_DEFAULT NV_FAN_COOLER_TABLE_CONTROL_POLARITY_GPIO
#define NV_FAN_COOLER_TABLE_CONTROL_SPEED_MINIMUM 9:0 // field2- dword
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL 13:10
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_NONE 0x00000000
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_UNKNOWN 0x00000001
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_TACH_SPECIFIC_INSTANCE0 0x00000002
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_GPIO_TACH_FUNC_INSTANCE0 0x00000003
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_DEFAULT NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_NONE
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE 15:14
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_1 0x00000000
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_2 0x00000001
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_3 0x00000002
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_4 0x00000003
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_DEFAULT NV_FAN_COOLER_TABLE_TACHOMETER_RATE_1
#define NV_FAN_COOLER_TABLE_PWM_MINIMUM 22:16
#define NV_FAN_COOLER_TABLE_CONTROL_STOP 23:23
#define NV_FAN_COOLER_TABLE_CONTROL_STOP_PWM 0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_STOP_POWER 0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_STOP_DEFAULT NV_FAN_COOLER_TABLE_CONTROL_STOP_PWM
#define NV_FAN_COOLER_TABLE_PWM_START 30:24
#define NV_FAN_COOLER_TABLE_PWM_FREQUENCY 11:0 // field3 - dword
#define NV_FAN_COOLER_TABLE_PWM_FREQUENCY_UNDEFINED 0x00000000
#define NV_FAN_COOLER_TABLE_FIELD3_RSVD 15:12
#define NV_FAN_COOLER_TABLE_FIELD3_PWM_SCALE_SLOPE 31:16
#define NV_FAN_COOLER_TABLE_FIELD4_PWM_SCALE_OFFSET 15:0 // field4 - dword
#define NV_FAN_COOLER_TABLE_FIELD4_LOW_ENDPOINT_EXPECTED_ERROR 23:16
#define NV_FAN_COOLER_TABLE_FIELD4_INTERPOLATION_EXPECTED_ERROR 31:24
#define NV_FAN_COOLER_TABLE_FIELD5_HIGH_ENDPOINT_EXPECTED_ERROR 7:0
#define NV_FAN_COOLER_TABLE_FIELD5_RSVD 31:8
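The hi:lo values in these definitions are bit ranges that the DRF_* macros from nvmisc.h consume. A common way to do that, and the one assumed in this sketch, is to let the ?: operator pick out the low bit (0 ? hi:lo) and the high bit (1 ? hi:lo). FIELD_LO, FIELD_HI, FIELD_MASK, FIELD_GET and the SKETCH_* ranges are hypothetical names, not the real nvmisc.h macros.

```c
#include <stdint.h>

/* For a range written as hi:lo, (0 ? hi:lo) evaluates to lo and
 * (1 ? hi:lo) evaluates to hi. */
#define FIELD_LO(f)     (0 ? f)
#define FIELD_HI(f)     (1 ? f)
#define FIELD_MASK(f)   (0xFFFFFFFFu >> (31 - FIELD_HI(f) + FIELD_LO(f)))
#define FIELD_GET(f, v) (((uint32_t)(v) >> FIELD_LO(f)) & FIELD_MASK(f))

/* Copies of three ranges from the list above, renamed for this sketch. */
#define SKETCH_TYPE             3:0   /* field1 */
#define SKETCH_CONTROL_DEVICE   10:8  /* field1 */
#define SKETCH_PWM_SCALE_SLOPE  31:16 /* field3 */
```

For example, a field1 value of 0x00000101 decodes as TYPE = 1 (ACTIVE_FAN_SINK) and CONTROL_DEVICE = 1 (GPU), the combination fanCoolerTableGetPwmScale() looks for.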
// Fan Cooler Table entry
typedef struct
{
    NvU32 field1;
    NvU32 field2;
    NvU32 field3;
    NvU32 field4;
    NvU32 field5;
} FAN_COOLER_TABLEENTRY;

// Fan Cooler Table
typedef struct
{
    NvU32                 version;
    NvU32                 entrySize;
    NvU32                 entryCount;
    FAN_COOLER_TABLEENTRY entries[NV_FAN_COOLER_TABLE_V2_MAX_ENTRIES];
} FAN_COOLER_TABLE, *PFAN_COOLER_TABLE;

// Default minimum fan level - *cannot* be overridden by VBIOS:
#define FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT 30
// Default maximum fan level - can be overridden by VBIOS:
#define FAN_SPEEDCONTROL_MAXIMUM_LEVEL_DEFAULT 100
/*!
 * Scales a PWM ratio (raw duty cycle / period) to a functional percentage,
 * which more adequately represents the percent of full physical fan speed on
 * the fan.
 *
 * @param[in] fanPwmScaleOffset Fan PWM scale offset, in F4.12
 * @param[in] fanPwmScaleSlope  Fan PWM scale slope, in F4.12
 * @param[in] pwmRatio          PWM ratio in F16.16. However, expected values
 *                              are in the range [0,1], so it's actually F1.16.
 *
 * @return Scaled percent in F16.16. Note: this is an actual percentage stored
 *         in the fractional part. The value is not scaled up by 100, as
 *         elsewhere in the RM.
 */
static NvUFXP16_16
fanPwmScalePwmRatioToPct
(
    NvSFXP4_12  fanPwmScaleOffset,
    NvSFXP4_12  fanPwmScaleSlope,
    NvUFXP16_16 pwmRatio
)
{
    NvUFXP16_16 pwmPct;

    if (fanPwmScaleSlope == 0)
    {
        // Various logging/tracing actions here...
        return 0;
    }

    //
    // (F1.16 << 12) - (F4.12 << 16) => F4.28
    //                       / F4.12 => F4.12
    // ----------------------------------------
    //                                  F4.16
    //
    // The numerator is computed as signed (note the NvS32 cast on pwmRatio).
    // With an unsigned numerator, a ratio below the offset would wrap around
    // to a huge positive value, and a negative slope would be converted to
    // unsigned, instead of producing a negative result that is clamped to 0
    // below.
    //
    NvSFXP16_16 signedPwmPct = ((((NvS32) pwmRatio) << 12) -
                                (((NvS32) fanPwmScaleOffset) << 16) +
                                (fanPwmScaleSlope / 2)) /
                               fanPwmScaleSlope;

    if (signedPwmPct > NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 1))
    {
        pwmPct = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 1);
    }
    else if (signedPwmPct < NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 0))
    {
        pwmPct = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 0);
    }
    else
    {
        pwmPct = (NvUFXP16_16) signedPwmPct;
    }

    return pwmPct;
}
/*!
 * Scales a functional percentage (representing the percent of full physical
 * fan speed) to a PWM ratio (raw duty cycle / period).
 *
 * @param[in] fanPwmScaleOffset Fan PWM scale offset, in F4.12
 * @param[in] fanPwmScaleSlope  Fan PWM scale slope, in F4.12
 * @param[in] pwmPct            Percent to scale in F16.16. Note: this is an
 *                              actual percentage stored in the fractional
 *                              part. The value is not scaled up by 100, as
 *                              elsewhere in the RM. So expected values are in
 *                              the range [0,1], and the value is really F1.16.
 *
 * @return Scaled PWM ratio in F16.16
 */
static NvUFXP16_16
fanPwmScalePctToPwmRatio
(
    NvSFXP4_12  fanPwmScaleOffset,
    NvSFXP4_12  fanPwmScaleSlope,
    NvUFXP16_16 pwmPct
)
{
    NvUFXP16_16 pwmRatio;

    //
    // (F1.16 * F4.12) >> 12 => F4.28 => F4.16
    //        + F4.12 << 4   =>          F4.16
    // ------------------------------------------
    //                                   F4.16
    //
    NvSFXP16_16 signedPwmRatio = pwmPct * fanPwmScaleSlope;
    signedPwmRatio = (signedPwmRatio >> 12) +
                     (DRF_VAL(_TYPES, _FXP, _FRACTIONAL_MSB(4, 12),
                              signedPwmRatio)) +
                     (((NvSFXP16_16) fanPwmScaleOffset) << 4);

    if (signedPwmRatio > NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 1))
    {
        pwmRatio = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 1);
    }
    else if (signedPwmRatio < NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 0))
    {
        pwmRatio = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 0);
    }
    else
    {
        pwmRatio = (NvUFXP16_16) signedPwmRatio;
    }

    return pwmRatio;
}
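For experimenting outside the RM headers, here is a minimal standalone sketch of the two scaling helpers above, using <stdint.h> types in place of the NvSFXP/NvUFXP typedefs. It assumes slope and offset are raw F4.12 values in an int16_t and that ratios and percentages are 16.16 fractions (0x10000 == 1.0). Intermediates are signed 64-bit, so "ratio - offset" can go negative and be clamped to 0 rather than wrapping. Rounding here is half away from zero, which may differ slightly from the RM's DRF-based rounding; all function names are hypothetical.

```c
#include <stdint.h>

/* Clamp a 16.16 value to [0, 1.0]. */
static uint32_t clamp_unit_fxp16(int64_t v)
{
    if (v < 0)
        return 0;
    if (v > 0x10000)
        return 0x10000;
    return (uint32_t)v;
}

/* Integer division rounded half away from zero; d must be nonzero. */
static int64_t div_rounded(int64_t n, int64_t d)
{
    if ((n < 0) != (d < 0))
        return (n - d / 2) / d;
    return (n + d / 2) / d;
}

/* pct = (ratio - offset) / slope.
 * F.16 * 2^12 - F4.12 * 2^16 = F.28; F.28 / F4.12 = F.16. */
static uint32_t pwm_ratio_to_pct(int16_t offset, int16_t slope, uint32_t ratio)
{
    if (slope == 0)
        return 0; /* same choice as fanPwmScalePwmRatioToPct() */
    return clamp_unit_fxp16(div_rounded(
        (int64_t)ratio * 4096 - (int64_t)offset * 65536, slope));
}

/* ratio = slope * pct + offset.
 * F.16 * F4.12 = F.28, / 2^12 = F.16; F4.12 * 2^4 = F.16. */
static uint32_t pct_to_pwm_ratio(int16_t offset, int16_t slope, uint32_t pct)
{
    return clamp_unit_fxp16(div_rounded((int64_t)pct * slope, 4096) +
                            (int64_t)offset * 16);
}
```

With slope 86 (0x0056) and offset 16 (0x0010), which are illustrative values, pct_to_pwm_ratio() maps 1.0 (0x10000) to 1632 (102/4096 of full scale), and pwm_ratio_to_pct() maps 1632 back to 0x10000. A ratio of 0, below the offset, maps to 0 instead of wrapping.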
/*!
 * Parses the PWM Scale slope and offset from the Fan Coolers Table.
 *
 * @param[in]  pTable          FAN_COOLER_TABLE pointer from which to pull the
 *                             information.
 * @param[out] pPwmScaleSlope  Pointer in which to return the slope
 * @param[out] pPwmScaleOffset Pointer in which to return the offset
 *
 * @return NV_OK if values successfully returned in the pointers
 * @return NV_ERR_INVALID_ARGUMENT if NULL pointers are provided.
 * @return NV_ERR_NOT_SUPPORTED if this Fan Coolers Table does not support
 *         these fields.
 */
NV_STATUS
fanCoolerTableGetPwmScale
(
    PFAN_COOLER_TABLE  pTable,
    NvSFXP4_12        *pPwmScaleSlope,
    NvSFXP4_12        *pPwmScaleOffset
)
{
    NV_STATUS status = NV_ERR_NOT_SUPPORTED;
    NvU32     i;

    if ((pTable == NULL) ||
        (pPwmScaleSlope == NULL) ||
        (pPwmScaleOffset == NULL))
    {
        status = NV_ERR_INVALID_ARGUMENT;
        goto fanCoolerTableGetPwmScale_exit;
    }

    //
    // Make sure that we know the table supports this field.
    //
    // Ideally, this should have been abstracted out in the devinit
    // function to parse this table, so that our storage isn't
    // version-dependent.
    //
    if ((pTable->version >= NV_FAN_COOLER_TABLE_V2_VERSION_10) &&
        (pTable->entrySize >= VBIOS_THERM_COOLER_TABLE_10_ENTRY_SIZE_10))
    {
        for (i = 0; i < pTable->entryCount; i++)
        {
            // Use the first active fan whose control device is the GPU.
            if ((FLD_TEST_DRF(_FAN, _COOLER_TABLE, _TYPE, _ACTIVE_FAN_SINK,
                              pTable->entries[i].field1)) &&
                (FLD_TEST_DRF(_FAN, _COOLER_TABLE, _CONTROL_DEVICE, _GPU,
                              pTable->entries[i].field1)))
            {
                //
                // A slope of zero is an invalid/unexpected case. This is a
                // major failure.
                //
                if (FLD_TEST_DRF_NUM(_FAN_COOLER_TABLE, _FIELD3,
                        _PWM_SCALE_SLOPE, NV_TYPES_S32_TO_SFXP_X_Y(4, 12, 0),
                        pTable->entries[i].field3))
                {
                    // Warning/assertion here: Found a PWM Scaling Slope of
                    // zero! This is invalid and would effectively lock the
                    // fan speed.
                }
                else
                {
                    // Return the scale values of the matching entry.
                    *pPwmScaleSlope = (NvSFXP4_12) DRF_VAL(_FAN_COOLER_TABLE,
                        _FIELD3, _PWM_SCALE_SLOPE, pTable->entries[i].field3);
                    *pPwmScaleOffset = (NvSFXP4_12) DRF_VAL(_FAN_COOLER_TABLE,
                        _FIELD4, _PWM_SCALE_OFFSET, pTable->entries[i].field4);
                    status = NV_OK;
                }
                break;
            }
        }
    }

fanCoolerTableGetPwmScale_exit:
    return status;
}
/**
 * Converts a PWM duty cycle to a fan level (fan speed percentage) based on the
 * fan period and duty cycle.
 *
 * @param[in] fanPeriod         The fan period to convert
 * @param[in] dutyCycle         The duty cycle to convert
 * @param[in] fanPwmScaleOffset PWM scale offset, from the VBIOS table
 * @param[in] fanPwmScaleSlope  PWM scale slope, from the VBIOS table
 *
 * @return The converted level
 */
NvU32
fanConvertDutyCycleToLevel
(
    NvU32      fanPeriod,
    NvU32      dutyCycle,
    NvSFXP4_12 fanPwmScaleOffset,
    NvSFXP4_12 fanPwmScaleSlope
)
{
    NvU32       level;
    NvUFXP16_16 pwmRatio;
    NvU64       pwmRatio64;

    // If the fan period is 0, we don't have a cooler, so default to OFF.
    if (fanPeriod == 0)
    {
        // OFF
        level = 0;
    }
    // If the fanPeriod is 1, we have an ON/OFF cooler
    else if (fanPeriod == 1)
    {
        // If the dutyCycle matches the fanPeriod, the cooler is ON
        if (dutyCycle == fanPeriod)
        {
            // ON.
            level = 100;
        }
        else
        {
            // OFF.
            level = 0;
        }
    }
    else
    {
        //
        // Variable speed, so calculate level from dutyCycle and fanPeriod.
        //
        // On our legacy boards, we can have periods > 16 bits, meaning the
        // dutyCycle can also be > 16 bits. This means that if we shift left
        // by 16 we can overflow 32 bits.
        //
        // However, once we divide by the period, the result will always be in
        // the range [0,1] because dutyCycle <= period. Thus, if we shift and
        // do the division/scaling in 64 bits, the result can be truncated
        // back to 32 bits.
        //
        //   32.0 << 16 => 32.16
        //   / 32.0
        //   -----------------------
        //                 1.16
        //
        pwmRatio64 = NV_UNSIGNED_ROUNDED_DIV(((NvU64)dutyCycle) <<
            DRF_SHIFT(NV_TYPES_FXP_INTEGER(32, 16)), fanPeriod);
        pwmRatio = (NvUFXP16_16) pwmRatio64;

        // F16.16 >> 16 => F16.0
        level = NV_TYPES_UFXP_X_Y_TO_U32_ROUNDED(16, 16,
                    fanPwmScalePwmRatioToPct(fanPwmScaleOffset,
                                             fanPwmScaleSlope,
                                             pwmRatio) * 100);

        // Make sure we're within the max and min bounds; the result can fall
        // outside them because of rounding/truncation in the integer math.
        if (level > FAN_SPEEDCONTROL_MAXIMUM_LEVEL_DEFAULT)
        {
            level = FAN_SPEEDCONTROL_MAXIMUM_LEVEL_DEFAULT;
        }
        else if (level < FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT)
        {
            level = FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT;
        }
    }

    return level;
}
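The excerpts above go from a programmed duty cycle to a fan level; programming the fan needs the opposite direction. Below is a minimal sketch of that inverse under the same PWM(actual) = Slope × PWM(effective) + Offset relationship, done in exact integer arithmetic. Assumptions: the PWM controller is programmed with a period and a duty-cycle count in the same units (which register holds which is not covered here), the slope has already had the 0x0000-means-1.0 rule applied, and min/max level policy (such as FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT) is enforced by the caller. fan_level_to_duty_cycle() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: target fan level in percent (0..100) -> duty-cycle
 * count for a PWM controller with the given period.
 *
 *   actual = slope/4096 * level/100 + offset/4096
 *          = (slope * level + offset * 100) / 409600
 *   duty   = round(period * actual), with actual clamped to [0, 1]
 *
 * slope and offset are raw F4.12 values; the caller is assumed to have
 * already mapped a slope of 0x0000 to 0x1000.
 */
static uint32_t fan_level_to_duty_cycle(uint32_t period, uint32_t level,
                                        int16_t offset, int16_t slope)
{
    const int64_t den = 4096 * 100;
    int64_t num;

    if (level > 100)
        level = 100;

    num = (int64_t)slope * level + (int64_t)offset * 100;

    /* Clamp the actual duty ratio (num / den) to [0, 1]. */
    if (num < 0)
        num = 0;
    if (num > den)
        num = den;

    /* Round to the nearest count; period * num fits easily in 64 bits. */
    return (uint32_t)(((int64_t)period * num + den / 2) / den);
}
```

With slope 0x0056 and offset 0x0010 (illustrative values that map to roughly 0.4% to 2.5%) and a period of 1000, level 100 yields a duty count of 25 and level 0 yields 4. With an identity scale (slope 0x1000, offset 0) and a period of 200, level 50 yields 100.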
thanks,
--
John Hubbard
NVIDIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20180206/353ee97c/attachment-0001.html>