[Nouveau] [PATCH v2 2/4] gpio: fail if gpu external power is missing
Mark Menzynski
mmenzyns at redhat.com
Tue Jul 16 13:24:04 UTC 2019
This is what Nvidia did after using nvidia-smi, which is not very far
from what happens now with the patch.
kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=none
kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 430.34 Wed
Jun 26 12:19:48 CDT 2019
kernel: NVRM: GPU 0000:01:00.0: GPU does not have the necessary power
cables connected.
kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x1c:1133)
kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Also, when booting, POST refuses to boot if power cables are not
connected, but there are scenarios where you don't boot with the
Nvidia GPU.
I am not sure about limiting it to warning, it makes sense but I also
think it should fail.
But I wanted to ask about the error messages. At this moment, this is
current output:
[17383.042727] nouveau 0000:12:00.0: gpio: GPU is missing power, check
its power cables. Boot with nouveau.config=NvPowerChecks=0 to disable.
[17383.042728] nouveau 0000:12:00.0: gpio: init failed, -22
[17383.042986] nouveau 0000:12:00.0: init failed with -22
[17383.042987] nouveau: DRM-master:00000000:00000080: init failed with -22
[17383.042990] nouveau 0000:12:00.0: DRM-master: Device allocation failed: -22
[17383.043458] nouveau: probe of 0000:12:00.0 failed with error -22
Isn't it a wrong place to implement the checks? Maybe I should put it
somewhere else?
On Tue, Jul 16, 2019 at 5:47 AM Ben Skeggs <skeggsb at gmail.com> wrote:
>
> On Mon, 15 Jul 2019 at 22:26, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> >
> > Please add a config override to skip this, since we'll invariably get
> > it wrong for some setup, and should be able to provide users with
> > workarounds while the issue is being worked out.
> Yeah, this makes me nervous as well. In the very least, I'd like a
> config option, but I'm still wondering if perhaps we shouldn't limit
> this to a warning (which people tend to report) for a while first too.
>
> Also, what's NV's behaviour here? Do they refuse to load, or do they
> do something like force the GPU into its lowest pstate?
>
> Ben.
>
> >
> > On Mon, Jul 15, 2019 at 5:43 AM Mark Menzynski <mmenzyns at redhat.com> wrote:
> > >
> > > Currently, nouveau doesn't check if GPU is missing power. This
> > > patch makes nouveau fail when this happens on latest GPUs.
> > >
> > > It checks GPIO function 121 (External Power Emergency), which
> > > should detect power problems on GPU initialization.
> > >
> > > Tested on TU104, GP106 and GF100.
> > >
> > > Signed-off-by: Mark Menzynski <mmenzyns at redhat.com>
> > > ---
> > > drm/nouveau/include/nvkm/subdev/bios/gpio.h | 1 +
> > > drm/nouveau/nvkm/subdev/gpio/base.c | 23 +++++++++++++++++++++
> > > 2 files changed, 24 insertions(+)
> > >
> > > diff --git a/drm/nouveau/include/nvkm/subdev/bios/gpio.h b/drm/nouveau/include/nvkm/subdev/bios/gpio.h
> > > index 2f40935f..a70ec9e8 100644
> > > --- a/drm/nouveau/include/nvkm/subdev/bios/gpio.h
> > > +++ b/drm/nouveau/include/nvkm/subdev/bios/gpio.h
> > > @@ -7,6 +7,7 @@ enum dcb_gpio_func_name {
> > > DCB_GPIO_TVDAC0 = 0x0c,
> > > DCB_GPIO_TVDAC1 = 0x2d,
> > > DCB_GPIO_FAN_SENSE = 0x3d,
> > > + DCB_GPIO_EXT_POWER_LOW = 0x79,
> > > DCB_GPIO_LOGO_LED_PWM = 0x84,
> > > DCB_GPIO_UNUSED = 0xff,
> > > DCB_GPIO_VID0 = 0x04,
> > > diff --git a/drm/nouveau/nvkm/subdev/gpio/base.c b/drm/nouveau/nvkm/subdev/gpio/base.c
> > > index 1399d923..c4685807 100644
> > > --- a/drm/nouveau/nvkm/subdev/gpio/base.c
> > > +++ b/drm/nouveau/nvkm/subdev/gpio/base.c
> > > @@ -182,12 +182,35 @@ static const struct dmi_system_id gpio_reset_ids[] = {
> > > { }
> > > };
> > >
> > > +static enum dcb_gpio_func_name power_checks[] = {
> > > + DCB_GPIO_EXT_POWER_LOW,
> > > +};
> > > +
> > > static int
> > > nvkm_gpio_init(struct nvkm_subdev *subdev)
> > > {
> > > struct nvkm_gpio *gpio = nvkm_gpio(subdev);
> > > + struct dcb_gpio_func func;
> > > + int ret;
> > > + int i;
> > > +
> > > if (dmi_check_system(gpio_reset_ids))
> > > nvkm_gpio_reset(gpio, DCB_GPIO_UNUSED);
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(power_checks); ++i) {
> > > + ret = nvkm_gpio_find(gpio, 0, power_checks[i], DCB_GPIO_UNUSED,
> > > + &func);
> > > + if (ret)
> > > + continue;
> > > +
> > > + ret = nvkm_gpio_get(gpio, 0, func.func, func.line);
> > > + if (ret) {
> > > + nvkm_error(&gpio->subdev,
> > > + "not enough power, check GPU power cable\n");
> > > + return -EINVAL;
> > > + }
> > > + }
> > > +
> > > return 0;
> > > }
> > >
> > > --
> > > 2.21.0
> > >
> > > _______________________________________________
> > > Nouveau mailing list
> > > Nouveau at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/nouveau
> > _______________________________________________
> > Nouveau mailing list
> > Nouveau at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/nouveau
More information about the Nouveau
mailing list