[PATCH v2 2/2] drm/xe/oa: Fix locking for stream->pollin

Rodrigo Vivi rodrigo.vivi at intel.com
Thu Jan 23 10:25:45 UTC 2025


On Wed, Jan 22, 2025 at 07:04:21PM -0800, Dixit, Ashutosh wrote:
> On Wed, 22 Jan 2025 02:23:23 -0800, Rodrigo Vivi wrote:
> >
> 
> Hi Rodrigo,
> 
> Thanks for your inputs, it helped me look at the code more closely and
> clarify some things.
> 
> I have responded to your comments below, but in case you don't want to go
> through that, the summary is that I would like to drop this patch from
> consideration at this time, basically because it needs more changes.
> 
> > On Tue, Jan 21, 2025 at 08:02:04PM -0800, Ashutosh Dixit wrote:
> > > Previously locking was skipped for stream->pollin. This "mostly" worked
> > > because pollin is u32/bool, even when stream->pollin is accessed
> > > concurrently in multiple threads. However, with stream->pollin moving under
> > > stream->oa_buffer.ptr_lock in this series, implement the correct way to
> > > access stream->pollin, which is to access it under
> > > stream->oa_buffer.ptr_lock.
> > >
> > > v2: Update commit message to explain the "why" of this change (Rodrigo)
> > >     Document the change in scope for stream->oa_buffer.ptr_lock (Rodrigo)
> >
> > First of all thanks for the rework.
> > But I believe I didn't have enough coffee today yet, because
> > I'm still failing to understand why...
> >
> > Breaking your explanation to see if I can understand:
> > 'mostly' - Why mostly? Did we face bugs?
> >
> > 'worked because pollin is u32/bool' - this sounds like 'works by luck'
> 
> No, there are no known bugs in the current code in the kernel. That is what
> 'mostly' is supposed to mean: basically the requirements on this code are
> pretty lax, and even if writes into the boolean stream->pollin from
> different threads cross each other, or they stomp on each other, things
> work out. So not 'works by luck' or 'works most of the time' but the code
> works even in absence of this locking.
> 
> >
> > 'with stream->pollin moving under stream->oa_buffer.ptr_lock' - Why?
> >
> > I believe this is the main why that I had yesterday and that continues
> > today. Why are we using the oa_buffer pointer lock to also protect
> > a stream variable?
> >
> > Why don't you use the stream_lock? Or why don't you create a dedicated
> > polling_lock?
> >
> 
> So, stream->oa_buffer.ptr_lock _is_ the correct lock to use for
> 'stream->pollin', because 'stream->pollin' is intimately connected to the
> same pointers 'ptr_lock' is protecting. You can see this in Patch 1.
> 
> Also, there is basically one 'struct xe_oa_buffer' per stream (and stream_lock
> is a mutex which can't be used from interrupt context).
> 
> So rather than using a stream level lock, we can just move pollin into
> stream->oa_buffer. So it will become stream->oa_buffer.pollin.
> 
> > I'm sorry for not being clear yesterday, wasting your time and a cycle...
> 
> No, thanks for your comments, it made me think about all this.
> 
> > > @@ -562,8 +563,10 @@ static ssize_t xe_oa_read(struct file *file, char __user *buf,
> > >	 * Also in case of -EIO, we have already waited for data before returning
> > >	 * -EIO, so need to wait again
> > >	 */
> > > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > >	if (ret != -ENOSPC && ret != -EIO)
> > >		stream->pollin = false;
> > > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> 
> Also this is the part I don't like, this should be moved into
> xe_oa_append_reports() where stream->oa_buffer.ptr_lock is already
> held. Otherwise we are dropping and re-grabbing the lock etc.
> 
> So imo the basic direction of this patch is correct but I want to make all
> these changes and resubmit this as part of a separate series.

Everything makes sense now, thanks for the clarifications and for the work.
The lock might even deserve a more generic name in the series,
like s/ptr_lock/lock.

> So please
> ignore this patch for now. I will go ahead and merge Patch 1 since that is
> ok and was already reviewed as a separate series:
> 
> https://patchwork.freedesktop.org/series/143575/

on merging that other patch: Acked-by: Rodrigo Vivi <rodrigo.vivi at intel.com>

> 
> Thanks.
> --
> Ashutosh
> 
> 
> 
> > >
> > >	/* Possible values for ret are 0, -EFAULT, -ENOSPC, -EIO, -EINVAL, ... */
> > >	return offset ?: (ret ?: -EAGAIN);
> > > @@ -573,6 +576,7 @@ static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> > >				  struct file *file, poll_table *wait)
> > >  {
> > >	__poll_t events = 0;
> > > +	unsigned long flags;
> > >
> > >	poll_wait(file, &stream->poll_wq, wait);
> > >
> > > @@ -582,8 +586,10 @@ static __poll_t xe_oa_poll_locked(struct xe_oa_stream *stream,
> > >	 * in use. We rely on hrtimer xe_oa_poll_check_timer_cb to notify us when there
> > >	 * are samples to read
> > >	 */
> > > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > >	if (stream->pollin)
> > >		events |= EPOLLIN;
> > > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > >
> > >	return events;
> > >  }
> > > diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
> > > index 52e33c37d5ee8..5c4ea13f646fc 100644
> > > --- a/drivers/gpu/drm/xe/xe_oa_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_oa_types.h
> > > @@ -159,7 +159,7 @@ struct xe_oa_buffer {
> > >	/** @vaddr: mapped vaddr of the OA buffer */
> > >	u8 *vaddr;
> > >
> > > -	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers */
> > > +	/** @ptr_lock: Lock protecting reads/writes to head/tail pointers and stream->pollin */
> > >	spinlock_t ptr_lock;
> > >
> > >	/** @head: Cached head to read from */
> > > --
> > > 2.47.1
> > >

