[Intel-gfx] [PATCH] x86: Downgrade clock throttling thermal event critical error

Chris Wilson chris at chris-wilson.co.uk
Wed Oct 10 12:10:04 UTC 2018


Quoting Tvrtko Ursulin (2018-10-10 12:59:59)
> 
> On 09/10/2018 12:37, Chris Wilson wrote:
> > Under CI testing, it is common for the CPUs to overheat under
> > continuous workloads and end up being throttled. As the CPUs still
> > function, this is less a critical error meriting urgent action than an
> > expected yet significant condition (pr_notice).
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Petri Latvala <petri.latvala at intel.com>
> > ---
> >   arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > index 2da67b70ba98..bc57b5988589 100644
> > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > @@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
> >       /* if we just entered the thermal event */
> >       if (new_event) {
> >               if (event == THERMAL_THROTTLING_EVENT)
> > -                     pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > -                             this_cpu,
> > -                             level == CORE_LEVEL ? "Core" : "Package",
> > -                             state->count);
> > +                     pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > +                               this_cpu,
> > +                               level == CORE_LEVEL ? "Core" : "Package",
> > +                               state->count);
> >               return;
> >       }
> >       if (old_event) {
> > 
> 
> It even sounds like it wouldn't be far-fetched to argue that, these days,
> notice is the correct log level for thermal throttling, unless there are
> more sources of throttling messages. TBC when I get back to my Skull Canyon.
> That one certainly logs something like this shortly after invoking make -j8.

I was thinking of tarting up the language to say most processors
nowadays can easily exceed their Thermal Design Point and are built with
that in mind. The caveat is making sure that the shutdown limit is still
reported as a critical event; iirc that comes as an MCE.
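
To make the intended split concrete, here is a rough userspace sketch, not
the kernel code. The enum and function names are made up for illustration,
and it uses the syslog.h severities (LOG_NOTICE = 5, LOG_CRIT = 2), which
carry the same numeric values as the kernel's KERN_NOTICE and KERN_CRIT.
How the shutdown limit is actually reported (MCE or otherwise) is not
modelled here and still needs checking.

```c
#include <assert.h>
#include <stdio.h>
#include <syslog.h>	/* LOG_CRIT (2) and LOG_NOTICE (5) */

/* Hypothetical event kinds for illustration; not the kernel's own enum. */
enum thermal_kind {
	THERMAL_THROTTLED,	/* clock throttled, CPU still functions */
	THERMAL_SHUTDOWN_LIMIT	/* shutdown limit reached */
};

/*
 * Pick a severity: throttling is expected under sustained load, so it is
 * only a notice; hitting the shutdown limit stays critical.
 */
int thermal_loglevel(enum thermal_kind kind)
{
	switch (kind) {
	case THERMAL_THROTTLED:
		return LOG_NOTICE;
	case THERMAL_SHUTDOWN_LIMIT:
		return LOG_CRIT;
	}
	assert(!"unknown thermal_kind");
	return LOG_CRIT;	/* be loud about anything unexpected */
}

/* Format a syslog-style "<level>" message into buf; returns snprintf()'s result. */
int thermal_format(char *buf, size_t len, int cpu, enum thermal_kind kind)
{
	return snprintf(buf, len, "<%d>CPU%d: %s",
			thermal_loglevel(kind), cpu,
			kind == THERMAL_THROTTLED ?
				"clock throttled" : "shutdown limit reached");
}
```

With this, thermal_loglevel(THERMAL_THROTTLED) yields LOG_NOTICE while
THERMAL_SHUTDOWN_LIMIT still yields LOG_CRIT, so anything filtering on
crit-or-worse keeps seeing the shutdown case.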
-Chris

