[PATCH 1/6] drm/msm: Fix race condition in the submit path

Jordan Crouse jcrouse at codeaurora.org
Tue Oct 3 15:27:01 UTC 2017


From: Sharat Masetty <smasetty at codeaurora.org>

There is a race condition issue between the IRQ context trying to
trigger preemption and the user context trying to submit commands to
the GPU. The check in a5xx_flush() API only updates the wptr if the GPU is
not in preemption. In the cases where we move from PREEMPT_START to
PREEMPT_NONE there is a small window where the preempt state is still
in START but the CPU context switches to the user thread which is in
the a5xx_flush() call to update the wptr, but fails to update the wptr to
the GPU since the preempt state is not PREEMPT_NONE. This leads to a
GPU stall.

Introduce a new intermediate state PREEMPT_ABORT and
change preempt_trigger() to use gpu's current ring instead of the
ring retrieved from get_next_ring() while in this state.

Signed-off-by: Sharat Masetty <smasetty at codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse at codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  8 +++++++-
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 17 ++++++++++++++---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index f062a90..6fb8c2f 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -56,6 +56,8 @@ struct a5xx_gpu {
  * PREEMPT_NONE - no preemption in progress.  Next state START.
  * PREEMPT_START - The trigger is evaulating if preemption is possible. Next
  * states: TRIGGERED, NONE
+ * PREEMPT_ABORT - An intermediate state before moving back to NONE. Next
+ * state: NONE.
  * PREEMPT_TRIGGERED: A preemption has been executed on the hardware. Next
  * states: FAULTED, PENDING
  * PREEMPT_FAULTED: A preemption timed out (never completed). This will trigger
@@ -67,6 +69,7 @@ struct a5xx_gpu {
 enum preempt_state {
 	PREEMPT_NONE = 0,
 	PREEMPT_START,
+	PREEMPT_ABORT,
 	PREEMPT_TRIGGERED,
 	PREEMPT_FAULTED,
 	PREEMPT_PENDING,
@@ -154,7 +157,10 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 /* Return true if we are in a preempt state */
 static inline bool a5xx_in_preempt(struct a5xx_gpu *a5xx_gpu)
 {
-	return !(atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_NONE);
+	int preempt_state = atomic_read(&a5xx_gpu->preempt_state);
+
+	return !(preempt_state == PREEMPT_NONE ||
+			preempt_state == PREEMPT_ABORT);
 }
 
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
index 6a3767d..40f4840 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -122,9 +122,20 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 	 * one do nothing except to update the wptr to the latest and greatest
 	 */
 	if (!ring || (a5xx_gpu->cur_ring == ring)) {
-		update_wptr(gpu, ring);
-
-		/* Set the state back to NONE */
+		/*
+		 * Its possible that while a preemption request is in progress
+		 * from an irq context, a user context trying to submit might
+		 * fail to update the write pointer, because it determines
+		 * that the preempt state is not PREEMPT_NONE.
+		 *
+		 * Close the race by introducing an intermediate
+		 * state PREEMPT_ABORT to let the submit path
+		 * know that the ringbuffer is not going to change
+		 * and can safely update the write pointer.
+		 */
+
+		set_preempt_state(a5xx_gpu, PREEMPT_ABORT);
+		update_wptr(gpu, a5xx_gpu->cur_ring);
 		set_preempt_state(a5xx_gpu, PREEMPT_NONE);
 		return;
 	}
-- 
1.9.1



More information about the dri-devel mailing list