nouveau shuts the machine down with v3.9-rc1 (temperature (72 C) hit the 'shutdown' threshold).

Martin Peres martin.peres at free.fr
Mon Mar 4 11:21:48 PST 2013


Hi Konrad,

On 04/03/2013 19:40, Konrad Rzeszutek Wilk wrote:
 > After git merge ab7826595e9ec51a51f622c5fc91e2f59440481a
 > (Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6)
 > the nouveau driver ends up shutting of the machine when booting.
 >
 >
 > I hadn't done a git bisection yet and was wondering if there are some
 > juice commits I ought to look at?

Sure, no need to bisect, it is a new (apparently-broken-for-you) feature.

The code is in /drivers/gpu/drm/nouveau/core/subdev/therm/


 >
 > Here is the serial console:


 > [    6.940628] nouveau  [  PTHERM][0000:00:0d.0] Thermal management: disabled
 > [    6.957474] nouveau  [  PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
 > [    6.966594] nouveau     6.975100] nouveau  [  PTHERM][0000:00:0d.0] Thermal management: automatic
 > [    6.982059] nouveau  [  PTHERM][0000:00:0d.0] temperature (88 C) hit the 'downclock' threshold
 > [    6.990680] nouveau  [  PTHERM][0000:00:0d.0] temperature (88 C) hit the 'critical' threshold
 > [    6.999194] nouveau  [  PTHERM][0000:00:0d.0] temperature (90 C) hit the 'shutdown' threshold

See, this is strange. If I believe the "programmed thresholds" line, the 
fanboost threshold is at 90°C, downclock is at 95°C, critical 
temperature is at 145°C and shutdown is at 135°C.
So, from the BIOS side, things seem to be in fairly good shape (critical 
should be lower than shutdown, but that's OK).

My theory is that the readings from your temperature sensor vary a lot, 
and that a single high reading set off the shutdown alarm. So, either 
the sensor needs more settling time or its output is genuinely very 
variable.

In the first case, we could fix that by increasing the settling time (at 
the expense of a longer boot period). We could also force a 10s wait at 
boot time before reading the temperature.
In the second case, the only solution is to average the temperature 
over several samples. I would need statistics on the variability in 
order to calculate a proper low-pass filter that wouldn't be too slow 
or too RAM/wakeup-intensive.
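
To give an idea of what the averaging option could look like, here is a 
minimal sketch in C of a cheap low-pass filter (an exponential moving 
average in integer arithmetic, one int32 of state per sensor). This is 
not nouveau code: struct temp_filter, temp_filter_init() and 
temp_filter_update() are hypothetical names, and the smoothing factor 
would have to be chosen from the variability statistics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filter state; none of these names exist in nouveau. */
struct temp_filter {
	int32_t acc;        /* filtered temperature, scaled by 2^shift */
	unsigned int shift; /* smoothing strength: alpha = 1 / 2^shift */
	int primed;         /* 0 until the first sample has been seen */
};

static void temp_filter_init(struct temp_filter *f, unsigned int shift)
{
	assert(shift < 16); /* keep acc well inside the int32_t range */
	f->acc = 0;
	f->shift = shift;
	f->primed = 0;
}

/*
 * Feed one raw reading (degrees C) and return the filtered value.
 * Assumes non-negative temperatures, so the right shifts below are
 * well defined.
 */
static int temp_filter_update(struct temp_filter *f, int raw_c)
{
	if (!f->primed) {
		/* Start from the first sample instead of ramping up from 0. */
		f->acc = (int32_t)raw_c * (1 << f->shift);
		f->primed = 1;
	} else {
		/* filtered += alpha * (raw - filtered), in fixed point */
		f->acc += raw_c - (f->acc >> f->shift);
	}
	return f->acc >> f->shift;
}
```

With shift = 2, a single 90 C spike on top of a steady 70 C only moves 
the filtered value to 75 C, so one bad sample would not cross a 
threshold on its own. The price is a slower reaction to a real 
temperature rise, which is why the variability statistics matter.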

I really hope the problem is the settling time!


Here is what you can do to test the theory:

Change the mdelay at line 41 of 
/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c 
(http://cgit.freedesktop.org/nouveau/linux-2.6/tree/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c#n41) 
from 10 to 1000.
Please also add an mdelay of 1000 between lines 44 and 45.

If it works with this patch, then try decreasing the delay to 20ms.

In any case, I'll send some thermal patches tonight to make the driver 
more robust against long settling times.

Thanks for reporting!

Martin (mupuf)



