[PATCH 2/2] drm/v3d: Appease lockdep while updating GPU stats
Maíra Canal
mcanal at igalia.com
Mon Aug 12 15:27:00 UTC 2024
Hi Tvrtko,
On 8/12/24 06:12, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>
> Lockdep thinks our seqcount_t usage is unsafe because the update path can
> be both from irq and worker context:
>
> [ ] ================================
> [ ] WARNING: inconsistent lock state
> [ ] 6.10.3-v8-16k-numa #159 Tainted: G WC
> [ ] --------------------------------
> [ ] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ ] swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ ] ffff80003d7c08d0 (&v3d_priv->stats[i].lock){?.+.}-{0:0}, at: v3d_irq+0xc8/0x660 [v3d]
> [ ] {HARDIRQ-ON-W} state was registered at:
> [ ] lock_acquire+0x1f8/0x328
> [ ] v3d_job_start_stats.isra.0+0xd8/0x218 [v3d]
> [ ] v3d_bin_job_run+0x23c/0x388 [v3d]
> [ ] drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
> [ ] process_one_work+0x62c/0xb48
> [ ] worker_thread+0x468/0x5b0
> [ ] kthread+0x1c4/0x1e0
> [ ] ret_from_fork+0x10/0x20
> [ ] irq event stamp: 337094
> [ ] hardirqs last enabled at (337093): [<ffffc0008144ce7c>] default_idle_call+0x11c/0x140
> [ ] hardirqs last disabled at (337094): [<ffffc0008144a354>] el1_interrupt+0x24/0x58
> [ ] softirqs last enabled at (337082): [<ffffc00080061d90>] handle_softirqs+0x4e0/0x538
> [ ] softirqs last disabled at (337073): [<ffffc00080010364>] __do_softirq+0x1c/0x28
> [ ]
> [ ] other info that might help us debug this:
> [ ] Possible unsafe locking scenario:
>
> [ ] CPU0
> [ ] ----
> [ ] lock(&v3d_priv->stats[i].lock);
> [ ] <Interrupt>
> [ ] lock(&v3d_priv->stats[i].lock);
> [ ]
> *** DEADLOCK ***
>
> [ ] no locks held by swapper/0/0.
> [ ]
> [ ] stack backtrace:
> [ ] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G WC 6.10.3-v8-16k-numa #159
> [ ] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
> [ ] Call trace:
> [ ] dump_backtrace+0x170/0x1b8
> [ ] show_stack+0x20/0x38
> [ ] dump_stack_lvl+0xb4/0xd0
> [ ] dump_stack+0x18/0x28
> [ ] print_usage_bug+0x3cc/0x3f0
> [ ] mark_lock+0x4d0/0x968
> [ ] __lock_acquire+0x784/0x18c8
> [ ] lock_acquire+0x1f8/0x328
> [ ] v3d_job_update_stats+0xec/0x2e0 [v3d]
> [ ] v3d_irq+0xc8/0x660 [v3d]
> [ ] __handle_irq_event_percpu+0x1f8/0x488
> [ ] handle_irq_event+0x88/0x128
> [ ] handle_fasteoi_irq+0x298/0x408
> [ ] generic_handle_domain_irq+0x50/0x78
>
> But it is a false positive because every queue-stats pair has its own
> lock and jobs are also strictly serialised, one at a time.
>
> Nevertheless we can appease lockdep by doing two things:
>
> 1. Split the locks into two classes:
>
> Because only GPU jobs have the irq context update section, no further
> changes are required for the CPU based queues.
>
> 2. Disable local interrupts in the GPU stats update portions:
>
> This convinces lockdep that all GPU job update sides consistently run
> with interrupts disabled. Again, it does not fix a real locking issue,
> but it avoids an alarming false positive lockdep splat.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Fixes: 6abe93b621ab ("drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler")
> Cc: Maíra Canal <mcanal at igalia.com>
> ---
> Splitting into two separate lock classes is perhaps too complicated and
> instead we could use the new v3d_job_start_stats_irqsave() helper from
> the CPU jobs too. I *think* that would fix the false positive as well.
>
> The naming of v3d_job_start_stats_irqsave() is also a bit dodgy (given
> that the irqsave part depends on lockdep), but I have no better ideas at
> the moment.
>
> Having said this... perhaps simply the #ifdef dance, with a comment, in
> the existing v3d_job_start_stats() would be better? It would be a much
> shorter and very localised patch with perhaps no new downsides.
TBH I don't really like the idea of creating two lock classes only to
fix a false positive in lockdep. The code would become too complex for a
feature that exists only to track stats. Moreover, it is a false
positive we would be working around, not a real locking issue.
If possible, I'd try either of the two other options you suggested here.
They feel more digestible for a false-positive fix IMHO.
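
FWIW, here is a rough sketch of what I have in mind for the last
option, folding the conditional irq-off section into the existing
v3d_job_start_stats() with IS_ENABLED() instead of the #ifdef dance.
The body below is just my reading of the current function (the start_ns
update under both seqcounts), so please treat it as an untested sketch
rather than a finished patch:

static void
v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
{
	struct v3d_dev *v3d = job->v3d;
	struct v3d_file_priv *file = job->file->driver_priv;
	struct v3d_stats *global_stats = &v3d->queue[queue].stats;
	struct v3d_stats *local_stats = &file->stats[queue];
	u64 now = local_clock();
	unsigned long flags;

	/*
	 * Disabling local interrupts (rather than just preemption) is
	 * only needed to appease lockdep, which otherwise reports the
	 * in-irq vs no-irq-off write sides as an inconsistent lock
	 * state. It is a false positive, so only pay the cost when
	 * lockdep is enabled. IS_ENABLED() keeps both branches visible
	 * to the compiler and avoids the #ifdef dance.
	 */
	if (IS_ENABLED(CONFIG_LOCKDEP))
		local_irq_save(flags);
	else
		preempt_disable();

	write_seqcount_begin(&local_stats->lock);
	local_stats->start_ns = now;
	write_seqcount_end(&local_stats->lock);

	write_seqcount_begin(&global_stats->lock);
	global_stats->start_ns = now;
	write_seqcount_end(&global_stats->lock);

	if (IS_ENABLED(CONFIG_LOCKDEP))
		local_irq_restore(flags);
	else
		preempt_enable();
}

This keeps the workaround in one place, needs no second lock class, and
the comment documents why the irq-off section exists at all.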
Best Regards,
- Maíra
> ---
> drivers/gpu/drm/v3d/v3d_drv.c | 16 +++++++++++++++-
> drivers/gpu/drm/v3d/v3d_gem.c | 15 ++++++++++++++-
> drivers/gpu/drm/v3d/v3d_sched.c | 29 +++++++++++++++++++++++++----
> 3 files changed, 54 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
> index d7ff1f5fa481..4a5d3ab1281b 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -106,6 +106,8 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
> static int
> v3d_open(struct drm_device *dev, struct drm_file *file)
> {
> + static struct lock_class_key v3d_stats_gpu_lock_class;
> + static struct lock_class_key v3d_stats_cpu_lock_class;
> struct v3d_dev *v3d = to_v3d_dev(dev);
> struct v3d_file_priv *v3d_priv;
> struct drm_gpu_scheduler *sched;
> @@ -118,13 +120,25 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
> v3d_priv->v3d = v3d;
>
> for (i = 0; i < V3D_MAX_QUEUES; i++) {
> + struct lock_class_key *key;
> + char *name;
> +
> sched = &v3d->queue[i].sched;
> drm_sched_entity_init(&v3d_priv->sched_entity[i],
> DRM_SCHED_PRIORITY_NORMAL, &sched,
> 1, NULL);
>
> memset(&v3d_priv->stats[i], 0, sizeof(v3d_priv->stats[i]));
> - seqcount_init(&v3d_priv->stats[i].lock);
> +
> + if (i == V3D_CACHE_CLEAN || i == V3D_CPU) {
> + key = &v3d_stats_cpu_lock_class;
> + name = "v3d_client_stats_cpu_lock";
> + } else {
> + key = &v3d_stats_gpu_lock_class;
> + name = "v3d_client_stats_gpu_lock";
> + }
> +
> + __seqcount_init(&v3d_priv->stats[i].lock, name, key);
> }
>
> v3d_perfmon_open_file(v3d_priv);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index da8faf3b9011..3567a80e603d 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -242,16 +242,29 @@ v3d_invalidate_caches(struct v3d_dev *v3d)
> int
> v3d_gem_init(struct drm_device *dev)
> {
> + static struct lock_class_key v3d_stats_gpu_lock_class;
> + static struct lock_class_key v3d_stats_cpu_lock_class;
> struct v3d_dev *v3d = to_v3d_dev(dev);
> u32 pt_size = 4096 * 1024;
> int ret, i;
>
> for (i = 0; i < V3D_MAX_QUEUES; i++) {
> struct v3d_queue_state *queue = &v3d->queue[i];
> + struct lock_class_key *key;
> + char *name;
>
> queue->fence_context = dma_fence_context_alloc(1);
> memset(&queue->stats, 0, sizeof(queue->stats));
> - seqcount_init(&queue->stats.lock);
> +
> + if (i == V3D_CACHE_CLEAN || i == V3D_CPU) {
> + key = &v3d_stats_cpu_lock_class;
> + name = "v3d_stats_cpu_lock";
> + } else {
> + key = &v3d_stats_gpu_lock_class;
> + name = "v3d_stats_gpu_lock";
> + }
> +
> + __seqcount_init(&queue->stats.lock, name, key);
> }
>
> spin_lock_init(&v3d->mm_lock);
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index cc2e5a89467b..b2540e20d30c 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -149,6 +149,27 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue)
> preempt_enable();
> }
>
> +/*
> + * We only need to disable local interrupts to appease lockdep, which would
> + * otherwise think v3d_job_start_stats() vs v3d_stats_update() has an unsafe
> + * in-irq vs no-irq-off usage problem. This is a false positive because all
> + * the locks are per queue and stats type, and all jobs are completely
> + * serialised, one at a time.
> + */
> +static void
> +v3d_job_start_stats_irqsave(struct v3d_job *job, enum v3d_queue queue)
> +{
> +#ifdef CONFIG_LOCKDEP
> + unsigned long flags;
> +
> + local_irq_save(flags);
> +#endif
> + v3d_job_start_stats(job, queue);
> +#ifdef CONFIG_LOCKDEP
> + local_irq_restore(flags);
> +#endif
> +}
> +
> static void
> v3d_stats_update(struct v3d_stats *stats, u64 now)
> {
> @@ -194,6 +215,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> * reuse the overflow attached to a previous job.
> */
> V3D_CORE_WRITE(0, V3D_PTB_BPOS, 0);
> + v3d_job_start_stats(&job->base, V3D_BIN); /* Piggy-back on existing irq-off section. */
> spin_unlock_irqrestore(&v3d->job_lock, irqflags);
>
> v3d_invalidate_caches(v3d);
> @@ -209,7 +231,6 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> trace_v3d_submit_cl(dev, false, to_v3d_fence(fence)->seqno,
> job->start, job->end);
>
> - v3d_job_start_stats(&job->base, V3D_BIN);
> v3d_switch_perfmon(v3d, &job->base);
>
> /* Set the current and end address of the control list.
> @@ -261,7 +282,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
> trace_v3d_submit_cl(dev, true, to_v3d_fence(fence)->seqno,
> job->start, job->end);
>
> - v3d_job_start_stats(&job->base, V3D_RENDER);
> + v3d_job_start_stats_irqsave(&job->base, V3D_RENDER);
> v3d_switch_perfmon(v3d, &job->base);
>
> /* XXX: Set the QCFG */
> @@ -294,7 +315,7 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
>
> trace_v3d_submit_tfu(dev, to_v3d_fence(fence)->seqno);
>
> - v3d_job_start_stats(&job->base, V3D_TFU);
> + v3d_job_start_stats_irqsave(&job->base, V3D_TFU);
>
> V3D_WRITE(V3D_TFU_IIA(v3d->ver), job->args.iia);
> V3D_WRITE(V3D_TFU_IIS(v3d->ver), job->args.iis);
> @@ -339,7 +360,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>
> trace_v3d_submit_csd(dev, to_v3d_fence(fence)->seqno);
>
> - v3d_job_start_stats(&job->base, V3D_CSD);
> + v3d_job_start_stats_irqsave(&job->base, V3D_CSD);
> v3d_switch_perfmon(v3d, &job->base);
>
> csd_cfg0_reg = V3D_CSD_QUEUED_CFG0(v3d->ver);