[PATCH v2 17/22] drm/nouveau/timer: Fall back to kernel timer if GPU timer read failed

Wed Aug 12 04:40:47 PDT 2015

Unloading the nouveau module while the GPU is asleep (e.g. on dual-GPU
laptops) leads to an infinite loop in nvkm_timer_wait_eq(): every timer
read returns 0xffffffffffffffff, so the loop condition becomes
-1 - (-1) < nsec, i.e. 0 < nsec, which stays true until the GPU is
woken up.

Use the kernel timer as a fallback in this unlikely event. Synchronize
the kernel timer and the GPU timer in nv04_timer_init() and
gk20a_timer_init(), which should be called once on driver
initialization and again on every resume.

Even with this fix applied, unloading the module takes a whopping
167 seconds. This could be reduced by changing the NV_WAIT_DEFAULT
timeout from the current (maybe excessive?) 2 seconds to 200 ms.

A WARN_ON is spewed out at nouveau_bo.c:398 after 81 seconds, and
a NULL pointer dereference occurs in nouveau_cli_destroy(),
so there is more to fix here.

This patch might also be needed to properly handle a GPU connected
via Thunderbolt which is suddenly unplugged.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88861
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61115
Tested-by: Paul Hordiienko <pvt.gord at gmail.com>
    [MBP  6,2 2010  intel ILK + nvidia GT216  pre-retina]
Tested-by: William Brown <william at blackhats.net.au>
    [MBP  8,2 2011  intel SNB + amd turks     pre-retina]
Tested-by: Lukas Wunner <lukas at wunner.de>
    [MBP  9,1 2012  intel IVB + nvidia GK107  pre-retina]
Tested-by: Bruno Bierbaumer <bruno at bierbaumer.net>
    [MBP 11,3 2013  intel HSW + nvidia GK107  retina -- work in progress]

Signed-off-by: Lukas Wunner <lukas at wunner.de>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/timer/gk20a.c | 4 ++++
 drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.c  | 9 +++++++++
 drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.h  | 1 +
 3 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/gk20a.c
index 80e3806..28d27ff 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/gk20a.c
@@ -41,6 +41,10 @@ gk20a_timer_init(struct nvkm_object *object)
 	/* restore the time before suspend */
 	nv_wr32(priv, NV04_PTIMER_TIME_1, hi);
 	nv_wr32(priv, NV04_PTIMER_TIME_0, lo);
+
+	/* save kernel time as fallback */
+	priv->suspend_ktime = ktime_to_ns(ktime_get()) - priv->suspend_time;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.c
index 6b7facb..228749d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.c
@@ -36,6 +36,11 @@ nv04_timer_read(struct nvkm_timer *ptimer)
 		lo = nv_rd32(priv, NV04_PTIMER_TIME_0);
 	} while (hi != nv_rd32(priv, NV04_PTIMER_TIME_1));
 
+	if (unlikely(hi == -1 && lo == -1)) {
+		nv_spam(priv, "read failed, falling back to kernel timer\n");
+		return ktime_to_ns(ktime_get()) - priv->suspend_ktime;
+	}
+
 	return ((u64)hi << 32 | lo);
 }
 
@@ -216,6 +221,10 @@ nv04_timer_init(struct nvkm_object *object)
 	nv_wr32(priv, NV04_PTIMER_INTR_EN_0, 0x00000000);
 	nv_wr32(priv, NV04_PTIMER_TIME_1, hi);
 	nv_wr32(priv, NV04_PTIMER_TIME_0, lo);
+
+	/* save kernel time as fallback */
+	priv->suspend_ktime = ktime_to_ns(ktime_get()) - priv->suspend_time;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.h b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.h
index 89996a9..1b83a0f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/timer/nv04.h
@@ -15,6 +15,7 @@ struct nv04_timer_priv {
 	struct list_head alarms;
 	spinlock_t lock;
 	u64 suspend_time;
+	u64 suspend_ktime;
 };
 
 int  nv04_timer_ctor(struct nvkm_object *, struct nvkm_object *,
-- 
1.8.5.2 (Apple Git-48)