[Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ

Sat Nov 14 10:50:19 PST 2015

On Sat, Nov 14, 2015 at 1:44 PM, Karol Herbst <nouveau at karolherbst.de> wrote:
> I encountered while stresstesting the reclocking code, that rarely (1 out of
> 20.000+ requests) we don't get any IRQ in nvkm_pmu_intr.
>
> This means we have a queued message on the pmu, but nouveau doesn't read it and
> waits infinitely in nvkm_pmu_send:
>         if (reply) {
>                 wait_event(pmu->recv.wait, (pmu->recv.process == 0));
>
> therefore let us use wait_event_timeout with a 1s timeout frame and just check
> whether there is a message queued and handle it if there is one.
>
> Return -ETIMEDOUT whenever we timed out and there is no message queued or when
> we hit another timeout while trying to read the message without getting any IRQ
>
> The benefit of not using wait_event is, that we don't have a kworker waiting
> on an event, which makes it easier to reload the module at runtime, which helps
> me developing on nouveau on my laptop a lot, because I don't need to reboot
> anymore
>
> Nethertheless, we shouldn't use wait_event here, because we can't guarantee any
> answere at all, can we?
>
> v2: moved it into a new function
>
> Signed-off-by: Karol Herbst <nouveau at karolherbst.de>
> ---
>  drm/nouveau/nvkm/subdev/pmu/base.c | 43 ++++++++++++++++++++++++++++++++------
>  1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drm/nouveau/nvkm/subdev/pmu/base.c b/drm/nouveau/nvkm/subdev/pmu/base.c
> index 6b2007f..fafbe2a 100644
> --- a/drm/nouveau/nvkm/subdev/pmu/base.c
> +++ b/drm/nouveau/nvkm/subdev/pmu/base.c
> @@ -43,6 +43,41 @@ nvkm_pmu_handle_reclk_request(struct work_struct *work)
>         nvkm_clk_pmu_reclk_request(clk, pmu->intr.data[0]);
>  }
>
> +static int
> +wait_for_pmu_reply(struct nvkm_pmu *pmu, u32 reply[2])
> +{
> +       struct nvkm_subdev *subdev = &pmu->subdev;
> +       struct nvkm_device *device = subdev->device;
> +       unsigned long jiffies = msecs_to_jiffies(1000);
> +
> +       if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0), jiffies)) {
> +               u32 addr = nvkm_rd32(device, 0x10a4cc);
> +               nvkm_error(subdev, "wait on reply timed out\n");
> +
> +               if (addr != nvkm_rd32(device, 0x10a4c8)) {
> +                       nvkm_error(subdev, "found queued message without getting an interrupt\n");
> +                       schedule_work(&pmu->recv.work);
> +
> +                       if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0), jiffies)) {
> +                               nvkm_error(subdev, "failed to repair PMU state\n");
> +                               goto reply_error;
> +                       }
> +               } else

Not sure whether kernel style dictates this, but I really hate these
"hanging" else's... both sides should have brackets if either one
does.

> +                       goto reply_error;
> +       }
> +
> +       reply[0] = pmu->recv.data[0];
> +       reply[1] = pmu->recv.data[1];
> +       mutex_unlock(&subdev->mutex);
> +       return 0;
> +
> +reply_error:
> +       reply[0] = 0;
> +       reply[1] = 0;
> +       mutex_unlock(&subdev->mutex);
> +       return -ETIMEDOUT;
> +}
> +
>  int
>  nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
>               u32 process, u32 message, u32 data0, u32 data1)
> @@ -88,12 +123,8 @@ nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
>         nvkm_wr32(device, 0x10a580, 0x00000000);
>
>         /* wait for reply, if requested */
> -       if (reply) {
> -               wait_event(pmu->recv.wait, (pmu->recv.process == 0));
> -               reply[0] = pmu->recv.data[0];
> -               reply[1] = pmu->recv.data[1];
> -               mutex_unlock(&subdev->mutex);
> -       }
> +       if (reply)
> +               return wait_for_pmu_reply(pmu, reply);

Having one function lock and another unlock is a disaster waiting to
happen. Perhaps make wiat_for_pmu_reply not handle the unlock and
instead do

int ret = 0;
if (reply)
  ret = wait_for_pmu_reply()

return ret;

Additionally leaving the reply[] filling in this function would allow
you to avoid annoying error handling and goto's in the other function.

>
>         return 0;
>  }
> --
> 2.6.3
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau