[Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ
Ilia Mirkin
imirkin at alum.mit.edu
Sat Nov 14 10:50:19 PST 2015
On Sat, Nov 14, 2015 at 1:44 PM, Karol Herbst <nouveau at karolherbst.de> wrote:
> I encountered while stresstesting the reclocking code, that rarely (1 out of
> 20.000+ requests) we don't get any IRQ in nvkm_pmu_intr.
>
> This means we have a queued message on the pmu, but nouveau doesn't read it and
> waits infinitely in nvkm_pmu_send:
> if (reply) {
> wait_event(pmu->recv.wait, (pmu->recv.process == 0));
>
> therefore let us use wait_event_timeout with a 1s timeout frame and just check
> whether there is a message queued and handle it if there is one.
>
> Return -ETIMEDOUT whenever we timed out and there is no message queued or when
> we hit another timeout while trying to read the message without getting any IRQ
>
> The benefit of not using wait_event is, that we don't have a kworker waiting
> on an event, which makes it easier to reload the module at runtime, which helps
> me developing on nouveau on my laptop a lot, because I don't need to reboot
> anymore
>
> Nethertheless, we shouldn't use wait_event here, because we can't guarantee any
> answere at all, can we?
>
> v2: moved it into a new function
>
> Signed-off-by: Karol Herbst <nouveau at karolherbst.de>
> ---
> drm/nouveau/nvkm/subdev/pmu/base.c | 43 ++++++++++++++++++++++++++++++++------
> 1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drm/nouveau/nvkm/subdev/pmu/base.c b/drm/nouveau/nvkm/subdev/pmu/base.c
> index 6b2007f..fafbe2a 100644
> --- a/drm/nouveau/nvkm/subdev/pmu/base.c
> +++ b/drm/nouveau/nvkm/subdev/pmu/base.c
> @@ -43,6 +43,41 @@ nvkm_pmu_handle_reclk_request(struct work_struct *work)
> nvkm_clk_pmu_reclk_request(clk, pmu->intr.data[0]);
> }
>
> +static int
> +wait_for_pmu_reply(struct nvkm_pmu *pmu, u32 reply[2])
> +{
> + struct nvkm_subdev *subdev = &pmu->subdev;
> + struct nvkm_device *device = subdev->device;
> + unsigned long jiffies = msecs_to_jiffies(1000);
> +
> + if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0), jiffies)) {
> + u32 addr = nvkm_rd32(device, 0x10a4cc);
> + nvkm_error(subdev, "wait on reply timed out\n");
> +
> + if (addr != nvkm_rd32(device, 0x10a4c8)) {
> + nvkm_error(subdev, "found queued message without getting an interrupt\n");
> + schedule_work(&pmu->recv.work);
> +
> + if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0), jiffies)) {
> + nvkm_error(subdev, "failed to repair PMU state\n");
> + goto reply_error;
> + }
> + } else
Not sure whether kernel style dictates this, but I really hate these
"hanging" else's... both sides should have brackets if either one
does.
> + goto reply_error;
> + }
> +
> + reply[0] = pmu->recv.data[0];
> + reply[1] = pmu->recv.data[1];
> + mutex_unlock(&subdev->mutex);
> + return 0;
> +
> +reply_error:
> + reply[0] = 0;
> + reply[1] = 0;
> + mutex_unlock(&subdev->mutex);
> + return -ETIMEDOUT;
> +}
> +
> int
> nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
> u32 process, u32 message, u32 data0, u32 data1)
> @@ -88,12 +123,8 @@ nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
> nvkm_wr32(device, 0x10a580, 0x00000000);
>
> /* wait for reply, if requested */
> - if (reply) {
> - wait_event(pmu->recv.wait, (pmu->recv.process == 0));
> - reply[0] = pmu->recv.data[0];
> - reply[1] = pmu->recv.data[1];
> - mutex_unlock(&subdev->mutex);
> - }
> + if (reply)
> + return wait_for_pmu_reply(pmu, reply);
Having one function lock and another unlock is a disaster waiting to
happen. Perhaps make wiat_for_pmu_reply not handle the unlock and
instead do
int ret = 0;
if (reply)
ret = wait_for_pmu_reply()
return ret;
Additionally leaving the reply[] filling in this function would allow
you to avoid annoying error handling and goto's in the other function.
>
> return 0;
> }
> --
> 2.6.3
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau
More information about the Nouveau
mailing list