[Nouveau] Addressing the problem of noisy GPUs under Nouveau

John Hubbard jhubbard at nvidia.com
Wed Feb 7 03:31:07 UTC 2018


On 01/28/2018 04:05 PM, Martin Peres wrote:
> On 29/01/18 01:24, Martin Peres wrote:
>> On 28/11/17 07:32, John Hubbard wrote:
>>> On 11/23/2017 02:48 PM, Martin Peres wrote:
>>>> On 23/11/17 10:06, John Hubbard wrote:
>>>>> On 11/22/2017 05:07 PM, Martin Peres wrote:
>>>>>> Hey,
>>>>>>
>>>>>> Thanks for your answer, Andy!
>>>>>>
>>>>>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>>>>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger <aritger at nvidia.com> wrote:
>>>>>>> Martin's question was very long, but it boils down to this:
>>>>>>>
>>>>>>> How do we compute the correct values to write into the e114/e118 pwm
>>>>>>> registers based on the VBIOS contents and current state of the board
>>>>>>> (like temperature).
>>>>>>
>>>>>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>>>>>> GF119+, or 0x200cd/d0 on Kepler+.
>>>>>>
>>>>>> At least, it looks like we know which PWM controller we need to drive, so
>>>>>> I did not want to muddy the water even more by giving register
>>>>>> addresses, rather concentrating on the problem at hand: How to compute
>>>>>> the duty value for the PWM controller.
>>>>>>
>>>>>>>
>>>>>>> We generally do this right, but appear to get it extra-wrong for certain GPUs.
>>>>>>
>>>>>> Yes... So far, we are always safe, but users tend to mind when their
>>>>>> computers sound like a jumbo jet at takeoff... Who would have thought? :D
>>>>>>
>>>>>> Anyway, looking forward to your answer!
>>>>>>
>>>>>> Cheers,
>>>>>> Martin
>>>>>
[...]

Hi Martin,

I strongly suspect you are seeing a special behavior, which is: on
some GF108 boards we use only a very limited range of PWM,
0.4 to 2.5%, due to the particular type of DC power conversion
circuit on those boards. However, it could also just be difficulties
in interpreting the fixed-point variables in the tables. In either
case, the answer is to explain those formats, so I'll do that now.

I am attaching the fan cooler table, in HTML format. We have also
published the BIT (BIOS Information Table) format separately:

    http://download.nvidia.com/open-gpu-doc/BIOS-Information-Table/1/BIOS-Information-Table.html

I don't think it has any surprises for you in this regard, but you
can check it to be sure you're looking at the right subtable.

The interesting parts of that table are:

PWM Scale Slope (16 bits):

  Slope to scale effective PWM to actual PWM (1/4096, F4.12, signed).
  For backwards compatibility, a value of 0.0 (0x0000) is interpreted as 1.0 (0x1000).
  This value is used to scale the effective PWM duty cycle, a conceptual fraction
  of full speed (0% to 100%), to the actual electrical PWM duty cycle.
  PWM(actual) = Slope × PWM(effective) + Offset

PWM Scale Offset (16 bits):

  Offset to scale effective PWM to actual PWM (1/4096, F4.12, signed).
  This value is used to scale the effective PWM duty cycle, a conceptual fraction
  of full speed (0% to 100%), to the actual electrical PWM duty cycle.
  PWM(actual) = Slope × PWM(effective) + Offset
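
As a rough illustration of the two fields above, the following sketch decodes
a raw 16-bit F4.12 value and applies PWM(actual) = Slope × PWM(effective) +
Offset in floating point. The function names are made up for this example,
and the clamp to [0, 1] is an assumption that mirrors the driver excerpts
further down. The sample values in the note after the code (slope 86,
offset 16) were picked by hand to land near a 0.4% to 2.5% range; they are not
taken from any VBIOS.

```c
#include <stdint.h>

/* Convert a raw signed F4.12 table value to a double (1 LSB = 1/4096). */
static double f4_12_to_double(uint16_t raw)
{
    /* Reinterpret the 16 raw bits as a two's-complement signed value. */
    return (double)(int16_t)raw / 4096.0;
}

/*
 * Apply PWM(actual) = Slope * PWM(effective) + Offset.
 * 'effective' is a fraction of full speed in [0, 1].
 * The result is clamped to [0, 1] (an assumption of this sketch).
 */
static double effective_to_actual(uint16_t raw_slope, uint16_t raw_offset,
                                  double effective)
{
    double actual;

    /* Backwards-compatibility rule from the table text: 0x0000 means 1.0. */
    if (raw_slope == 0x0000)
        raw_slope = 0x1000;

    actual = f4_12_to_double(raw_slope) * effective
           + f4_12_to_double(raw_offset);

    if (actual < 0.0)
        actual = 0.0;
    if (actual > 1.0)
        actual = 1.0;
    return actual;
}
```

With those hand-picked values, effective_to_actual(86, 16, 0.0) comes out at
about 0.0039 and effective_to_actual(86, 16, 1.0) at about 0.0249, so the
whole 0% to 100% effective range maps onto a very narrow slice of electrical
duty cycle.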


However, the calculations are hard to get right, and the table stores
values in fixed-point format, so I'm showing a few simplified code excerpts
that use these. The various fixed point macro definitions are found as part of
our normal driver package, in nvmisc.h and nvtypes.h. Any other definitions
that you need are included right here (I ran a quick compiler check to be sure.)

#define VBIOS_THERM_COOLER_TABLE_10_ENTRY_SIZE_10                    0x00000010

// Fan Cooler Table Version
#define NV_FAN_COOLER_TABLE_V2_VERSION_10                             0x10

// We limit the size of V2 tables
#define NV_FAN_COOLER_TABLE_V2_MAX_ENTRIES                            10

// Entry skip.
#define NV_FAN_COOLER_TABLE_V2_SKIP_ENTRY                             0x000000FF

#define NV_FAN_COOLER_TABLE_TYPE                                      3:0       // field1- dword
#define NV_FAN_COOLER_TABLE_TYPE_PASSIVE_HEAT_SINK                    0x00000000
#define NV_FAN_COOLER_TABLE_TYPE_ACTIVE_FAN_SINK                      0x00000001
#define NV_FAN_COOLER_TABLE_TYPE_ACTIVE_SKIP                          0x0000000F
#define NV_FAN_COOLER_TABLE_TYPE_DEFAULT                              NV_FAN_COOLER_TABLE_TYPE_PASSIVE_HEAT_SINK

#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY                           6:4
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY_GPU                       0x00000000
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY_ALL                       0x00000001
#define NV_FAN_COOLER_TABLE_TARGET_AFFINITY_DEFAULT                   NV_FAN_COOLER_TABLE_TARGET_AFFINITY_GPU

#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE                            10:8
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_NONE                       0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_GPU                        0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_EXTERNAL                   0x00000002
#define NV_FAN_COOLER_TABLE_CONTROL_DEVICE_DEFAULT                    NV_FAN_COOLER_TABLE_CONTROL_DEVICE_NONE

#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE                         14:12
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_NONE                    0x00000000
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_GPU                     0x00000001
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_EXTERNAL_INSTANCE0      0x00000002
#define NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_DEFAULT                 NV_FAN_COOLER_TABLE_TACHOMETER_DEVICE_NONE

#define NV_FAN_COOLER_TABLE_CONTROL_SPEED_MAXIMUM                     25:16

#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL                            29:26
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_NONE                       0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_UNKNOWN                    0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_FAN_SPECIFIC_INSTANCE0     0x00000002
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_GPIO_FAN_FUNC_INSTANCE0    0x00000003
#define NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_DEFAULT                    NV_FAN_COOLER_TABLE_CONTROL_SIGNAL_NONE

#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY                          31:30
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_GPIO                     0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_LOW                      0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_HIGH                     0x00000002
#define NV_FAN_COOLER_TABLE_CONTROL_POLARITY_DEFAULT                  NV_FAN_COOLER_TABLE_CONTROL_POLARITY_GPIO

#define NV_FAN_COOLER_TABLE_CONTROL_SPEED_MINIMUM                     9:0      // field2- dword

#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL                         13:10
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_NONE                    0x00000000
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_UNKNOWN                 0x00000001
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_TACH_SPECIFIC_INSTANCE0 0x00000002
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_GPIO_TACH_FUNC_INSTANCE0 0x00000003
#define NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_DEFAULT                 NV_FAN_COOLER_TABLE_TACHOMETER_SIGNAL_NONE

#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE                           15:14
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_1                         0x00000000
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_2                         0x00000001
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_3                         0x00000002
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_4                         0x00000003
#define NV_FAN_COOLER_TABLE_TACHOMETER_RATE_DEFAULT                   NV_FAN_COOLER_TABLE_TACHOMETER_RATE_1

#define NV_FAN_COOLER_TABLE_PWM_MINIMUM                               22:16

#define NV_FAN_COOLER_TABLE_CONTROL_STOP                              23:23
#define NV_FAN_COOLER_TABLE_CONTROL_STOP_PWM                          0x00000000
#define NV_FAN_COOLER_TABLE_CONTROL_STOP_POWER                        0x00000001
#define NV_FAN_COOLER_TABLE_CONTROL_STOP_DEFAULT                      NV_FAN_COOLER_TABLE_CONTROL_STOP_PWM

#define NV_FAN_COOLER_TABLE_PWM_START                                 30:24

#define NV_FAN_COOLER_TABLE_PWM_FREQUENCY                             11:0     // field3 - dword
#define NV_FAN_COOLER_TABLE_PWM_FREQUENCY_UNDEFINED                   0x00000000

#define NV_FAN_COOLER_TABLE_FIELD3_RSVD                               15:12

#define NV_FAN_COOLER_TABLE_FIELD3_PWM_SCALE_SLOPE                    31:16

#define NV_FAN_COOLER_TABLE_FIELD4_PWM_SCALE_OFFSET                   15:0     // field4 - dword

#define NV_FAN_COOLER_TABLE_FIELD4_LOW_ENDPOINT_EXPECTED_ERROR        23:16

#define NV_FAN_COOLER_TABLE_FIELD4_INTERPOLATION_EXPECTED_ERROR       31:24

#define NV_FAN_COOLER_TABLE_FIELD5_HIGH_ENDPOINT_EXPECTED_ERROR       7:0

#define NV_FAN_COOLER_TABLE_FIELD5_RSVD                               31:8

// Fan Cooler Table entry

typedef struct
{
    NvU32 field1;
    NvU32 field2;
    NvU32 field3;
    NvU32 field4;
    NvU32 field5;
} FAN_COOLER_TABLEENTRY;

// Fan Cooler Table
typedef struct
{
    NvU32                   version;
    NvU32                   entrySize;
    NvU32                   entryCount;
    FAN_COOLER_TABLEENTRY   entries[NV_FAN_COOLER_TABLE_V2_MAX_ENTRIES];
} FAN_COOLER_TABLE, *PFAN_COOLER_TABLE;
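
The "high:low" ranges in the definitions above are meant to be consumed by
the DRF-style macros from nvmisc.h. For readers without that header, here is
a minimal stand-alone sketch of the same idea. The macro and function names
below are hypothetical stand-ins, not the nvmisc.h ones; only the bit ranges
and the two enum values are copied from the definitions above.

```c
#include <stdint.h>

/*
 * A "hi:lo" range can be pasted into a conditional expression:
 * (1 ? 3:0) evaluates to 3 (the high bit), (0 ? 3:0) to 0 (the low bit).
 */
#define FIELD_HI(range)     (1 ? range)
#define FIELD_LO(range)     (0 ? range)
#define FIELD_MASK(range)   (0xFFFFFFFFu >> (31 - FIELD_HI(range) + FIELD_LO(range)))
#define FIELD_GET(range, v) (((uint32_t)(v) >> FIELD_LO(range)) & FIELD_MASK(range))

/* Bit ranges copied from the table definitions. */
#define COOLER_TYPE              3:0     /* field1 */
#define COOLER_CONTROL_DEVICE    10:8    /* field1 */
#define COOLER_PWM_SCALE_SLOPE   31:16   /* field3 */
#define COOLER_PWM_SCALE_OFFSET  15:0    /* field4 */

#define COOLER_TYPE_ACTIVE_FAN_SINK  0x1
#define COOLER_CONTROL_DEVICE_GPU    0x1

/*
 * Pull the raw F4.12 slope and offset out of one table entry.
 * Returns 1 on success, 0 if the entry is not a GPU-controlled active fan.
 */
static int cooler_entry_pwm_scale(uint32_t field1, uint32_t field3,
                                  uint32_t field4,
                                  int16_t *slope, int16_t *offset)
{
    if (FIELD_GET(COOLER_TYPE, field1) != COOLER_TYPE_ACTIVE_FAN_SINK)
        return 0;
    if (FIELD_GET(COOLER_CONTROL_DEVICE, field1) != COOLER_CONTROL_DEVICE_GPU)
        return 0;

    /* Both fields are 16 bits wide; reinterpret them as signed F4.12. */
    *slope  = (int16_t)FIELD_GET(COOLER_PWM_SCALE_SLOPE, field3);
    *offset = (int16_t)FIELD_GET(COOLER_PWM_SCALE_OFFSET, field4);
    return 1;
}
```

Keeping the range as a single "hi:lo" token means each field is described in
one place, and both the shift and the mask are derived from it.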

// Default minimum fan level - *cannot* be overridden by VBIOS:
#define FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT          30

// Default maximum fan level - can be overridden by VBIOS:
#define FAN_SPEEDCONTROL_MAXIMUM_LEVEL_DEFAULT          100

/*!
 * Scales a PWM ratio (raw duty cycle / period) to a functional percentage,
 * which more adequately represents the percent of full physical fan speed on
 * the fan.
 *
 * @param[in] fanPwmScaleOffset   Fan PWM scale offset, in F4.12
 * @param[in] fanPwmScaleSlope    Fan PWM scale slope, in F4.12
 * @param[in] pwmRatio    PWM ratio in F16.16.  However, expected values are in
 *            the range [0,1] so it's actually F1.16.
 *
 * @return Scaled percent in F16.16 - Note, this is an actual percentage stored
 *         in the fractional part.  The value is not scaled up by 100, as
 *         elsewhere in the RM.
 */
static NvUFXP16_16
fanPwmScalePwmRatioToPct
(
    NvSFXP4_12 fanPwmScaleOffset,
    NvSFXP4_12 fanPwmScaleSlope,
    NvUFXP16_16 pwmRatio
)
{
    NvUFXP16_16 pwmPct;

    if (fanPwmScaleSlope == 0)
    {
        // Various logging/tracing actions here...
        return 0;
    }

    //
    //   (F1.16 << 12) - (F4.12 << 16) => F4.28
    // / F4.12                         => F4.12
    // ----------------------------------------
    //                                    F4.16
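    //
    // pwmRatio is cast to signed so that the subtraction and the division
    // are done in signed arithmetic; otherwise a ratio below the offset
    // would wrap around instead of going negative.
    NvSFXP16_16 signedPwmPct = ((((NvS32) pwmRatio) << 12) -
                                (((NvS32) fanPwmScaleOffset) << 16) +
                                 (fanPwmScaleSlope / 2)) /
                               fanPwmScaleSlope;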

    if (signedPwmPct > NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 1))
    {
        pwmPct = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 1);
    }
    else if (signedPwmPct < NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 0))
    {
        pwmPct = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 0);
    }
    else
    {
        pwmPct = (NvUFXP16_16) signedPwmPct;
    }

    return pwmPct;
}

/*!
 * Scales a functional percentage (representing the percent of full physical fan
 * speed) to a PWM ratio (raw duty cycle / period).
 *
 * @param[in] fanPwmScaleOffset   Fan PWM scale offset, in F4.12
 * @param[in] fanPwmScaleSlope    Fan PWM scale slope, in F4.12
 * @param[in] pwmPct      Percent to scale in F16.16.  Note, this is an actual
 *            percentage stored in the fractional part.  The value is not scaled
 *            up by 100, as elsewhere in the RM.  So expected values are in the
 *            range [0,1] and actual value is really F.16.
 *
 * @return Scaled pwm ratio in F16.16
 */
static NvUFXP16_16
fanPwmScalePctToPwmRatio
(
    NvSFXP4_12 fanPwmScaleOffset,
    NvSFXP4_12 fanPwmScaleSlope,
    NvUFXP16_16 pwmPct
)
{
    NvUFXP16_16 pwmRatio;

    //
    //   (F1.16 * F4.12) >> 12  => F4.28 => F4.16
    // + F4.12 << 4                      => F4.16
    // ------------------------------------------
    //                                      F4.16
    //
    NvSFXP16_16 signedPwmRatio = pwmPct * fanPwmScaleSlope;
    signedPwmRatio = (signedPwmRatio >> 12) +
                     (DRF_VAL(_TYPES, _FXP, _FRACTIONAL_MSB(4, 12),
                            signedPwmRatio)) +
                     (((NvSFXP16_16) fanPwmScaleOffset) << 4);

    if (signedPwmRatio > NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 1))
    {
        pwmRatio = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 1);
    }
    else if (signedPwmRatio < NV_TYPES_S32_TO_SFXP_X_Y(16, 16, 0))
    {
        pwmRatio = NV_TYPES_U32_TO_UFXP_X_Y(16, 16, 0);
    }
    else
    {
        pwmRatio = (NvUFXP16_16) signedPwmRatio;
    }

    return pwmRatio;
}

/*!
 * Parses the PWM Scale slope and offset from the Fan Coolers Table.
 *
 * @param[in]   pTable          FAN_COOLER_TABLE pointer from which to pull the
 *                              information.
 * @param[out]  pPwmScaleSlope  Pointer in which to return the slope
 * @param[out]  pPwmScaleOffset Pointer in which to return the offset
 *
 * @return NV_OK if values successfully returned in the pointers
 * @return NV_ERR_INVALID_ARGUMENT if NULL pointers are provided.
 * @return NV_ERR_NOT_SUPPORTED if this Fan Coolers Table does not support
 *         these fields.
 */

NV_STATUS
fanCoolerTableGetPwmScale
(
    PFAN_COOLER_TABLE   pTable,
    NvSFXP4_12         *pPwmScaleSlope,
    NvSFXP4_12         *pPwmScaleOffset
)
{
    NV_STATUS status = NV_ERR_NOT_SUPPORTED;
    NvU32     i;

    if ((pPwmScaleSlope == NULL) ||
        (pPwmScaleOffset == NULL))
    {
       status = NV_ERR_INVALID_ARGUMENT;
       goto fanCoolerTableGetPwmScale_exit;
    }

    //
    // Make sure that we know the table supports this field.
    //
    // Ideally, this should have been abstracted out in the devinit
    // function to parse this table, so that our storage isn't
    // version-dependent.
    //
    if ((pTable->version >= NV_FAN_COOLER_TABLE_V2_VERSION_10) &&
        (pTable->entrySize >= VBIOS_THERM_COOLER_TABLE_10_ENTRY_SIZE_10))
    {
        for (i = 0; i < pTable->entryCount; i++)
        {
            // Only consider the first GPU-controlled active fan entry.
            if ((FLD_TEST_DRF(_FAN, _COOLER_TABLE, _TYPE, _ACTIVE_FAN_SINK,
                              pTable->entries[i].field1)) &&
                (FLD_TEST_DRF(_FAN, _COOLER_TABLE, _CONTROL_DEVICE, _GPU,
                              pTable->entries[i].field1)))
            {
                //
                // Scale of zero is an invalid/unexpected case.  This is a major
                // failure.
                //
                if (FLD_TEST_DRF_NUM(_FAN_COOLER_TABLE, _FIELD3,
                      _PWM_SCALE_SLOPE, NV_TYPES_S32_TO_SFXP_X_Y(4, 12, 0),
                      pTable->entries[i].field3))
                {
                    // Warning/assertion here: Found a PWM Scaling Slope of
                    // zero!  This is invalid and would effectively lock the
                    // fan speed.
                }
                else
                {
                    // Read from the entry that was just validated (entry i).
                    *pPwmScaleSlope = (NvSFXP4_12) DRF_VAL(_FAN_COOLER_TABLE,
                        _FIELD3, _PWM_SCALE_SLOPE, pTable->entries[i].field3);

                    *pPwmScaleOffset = (NvSFXP4_12) DRF_VAL(_FAN_COOLER_TABLE,
                        _FIELD4, _PWM_SCALE_OFFSET, pTable->entries[i].field4);

                    status = NV_OK;
                }
                break;
            }
        }
    }

fanCoolerTableGetPwmScale_exit:
    return status;
}

/**
 * Converts a PWM duty cycle to a fan level (fan speed percentage) based on the
 * fan period and duty cycle.
 *
 * @param[in] fanPeriod          The fan period to convert
 * @param[in] dutyCycle          The duty cycle to convert
 * @param[in] fanPwmScaleOffset  PWM scale offset, from the VBIOS table
 * @param[in] fanPwmScaleSlope   PWM scale slope, from the VBIOS table
 *
 * @return The converted level
 */
NvU32
fanConvertDutyCycleToLevel
(
    NvU32 fanPeriod,
    NvU32 dutyCycle,
    NvSFXP4_12 fanPwmScaleOffset,
    NvSFXP4_12 fanPwmScaleSlope
)
{
    NvU32       level;
    NvUFXP16_16 pwmRatio;
    NvU64       pwmRatio64;

    // If the fan period is 0, we don't have a cooler, so default to OFF.
    if (fanPeriod == 0)
    {
        // OFF
        level = 0;
    }
    // If the fanPeriod is 1, we have an ON/OFF cooler
    else if (fanPeriod == 1)
    {
        // If the dutyCycle matches the fanPeriod, the cooler is ON
        if (dutyCycle == fanPeriod)
        {
            // ON.
            level = 100;
        }
        else
        {
            // OFF.
            level = 0;
        }
    }
    else
    {
        //
        // Variable speed, so calc level from dutyCycle and fanPeriod
        //

        //
        // On our legacy boards, we can have periods > 16-bits, meaning the
        // dutyCycle can also be > 16-bits.  This means that if we shift left
        // by 16 we can overflow 32-bits.
        //
        // However, once we divide by the period, the result will always be in
        // the range [0,1] because dutyCycle <= period.  Thus, if we shift and
        // do the division/scaling in 64-bits the result can be truncated back
        // to 32-bits.
        //
        //   32.0 << 16   => 32.16
        // / 32.0
        // -----------------------
        //                  1.16
        //
        pwmRatio64 = NV_UNSIGNED_ROUNDED_DIV(((NvU64)dutyCycle) <<
                      DRF_SHIFT(NV_TYPES_FXP_INTEGER(32, 16)), fanPeriod);
        pwmRatio = (NvUFXP16_16) pwmRatio64;

        // F16.16 >> 16 => F16.0
        level = NV_TYPES_UFXP_X_Y_TO_U32_ROUNDED(16, 16,
                    fanPwmScalePwmRatioToPct(fanPwmScaleOffset,
                                             fanPwmScaleSlope,
                                             pwmRatio) * 100);

        // Make sure we're within the max and min bounds - this can be due to
        // rounding/truncation issues due to integer math.
        if (level > FAN_SPEEDCONTROL_MAXIMUM_LEVEL_DEFAULT)
        {
            level = FAN_SPEEDCONTROL_MAXIMUM_LEVEL_DEFAULT;
        }
        else if (level < FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT)
        {
            level = FAN_SPEEDCONTROL_MINIMUM_LEVEL_DEFAULT;
        }
    }
    return level;
}
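
The excerpts above convert a duty cycle into a fan level. The opposite
direction, which is what is needed to program a PWM register, can be sketched
as follows. This is not taken from the driver: the function names are
hypothetical, plain stdint types stand in for the Nv* typedefs, and 64-bit
signed intermediates are used to sidestep overflow and sign questions. Any
minimum or maximum level policy is assumed to be applied by the caller.

```c
#include <stdint.h>

/* Clamp a signed 16.16 value to [0, 1.0] and return it as unsigned 16.16. */
static uint32_t clamp_unit_16_16(int64_t v)
{
    if (v < 0)
        return 0;
    if (v > 0x10000)
        return 0x10000;
    return (uint32_t)v;
}

/*
 * ratio = slope * pct + offset.
 * pct and the result are 16.16; slope and offset are signed F4.12.
 */
static uint32_t pct_to_pwm_ratio(int16_t slope, int16_t offset, uint32_t pct)
{
    /*
     * 16.16 * 4.12 gives 28 fractional bits. Add half an LSB (1 << 11)
     * and shift back down to 16 fractional bits. This assumes that >> on
     * a negative value is an arithmetic shift, as on common compilers.
     */
    int64_t scaled = ((int64_t)pct * slope + (1 << 11)) >> 12;

    /* F4.12 to x.16: multiply by 16 rather than shifting a signed value. */
    return clamp_unit_16_16(scaled + (int64_t)offset * 16);
}

/* Fan level (0..100) to a duty-cycle count for the given PWM period. */
static uint32_t level_to_duty_cycle(uint32_t period, uint32_t level,
                                    int16_t slope, int16_t offset)
{
    uint32_t pct;
    uint32_t ratio;

    if (level > 100)
        level = 100;

    /* level / 100 as a 16.16 fraction, rounded to nearest. */
    pct = (level * 0x10000u + 50) / 100;
    ratio = pct_to_pwm_ratio(slope, offset, pct);

    /* duty = ratio * period, rounded; 64-bit since period may exceed 16 bits. */
    return (uint32_t)(((uint64_t)ratio * period + 0x8000) >> 16);
}
```

With an identity scale (slope 0x1000, offset 0) and a period of 1000, level 50
gives a duty cycle of 500. With the hand-picked narrow-range values slope 86
and offset 16 (illustrative only, not from a VBIOS) and a period of 10000,
level 100 gives 249, or roughly 2.5% of the period.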

thanks,
-- 
John Hubbard
NVIDIA

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20180206/353ee97c/attachment-0001.html>

