[PATCH 01/10] drm/vkms: Fix crc worker races

Tue Jun 18 08:49:59 UTC 2019

On Mon, Jun 17, 2019 at 11:39:31PM -0300, Rodrigo Siqueira wrote:
> On 06/12, Daniel Vetter wrote:
> > On Wed, Jun 12, 2019 at 10:33:11AM -0300, Rodrigo Siqueira wrote:
> > > On Thu, Jun 6, 2019 at 7:28 PM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > >
> > > > The issue we have is that the crc worker might fall behind. We've
> > > > tried to handle this by tracking both the earliest frame for which it
> > > > still needs to compute a crc, and the last one. Plus when the
> > > > crtc_state changes, we have a new work item, which are all run in
> > > > order due to the ordered workqueue we allocate for each vkms crtc.
> > > >
> > > > Trouble is there's been a few small issues in the current code:
> > > > - we need to capture frame_end in the vblank hrtimer, not in the
> > > >   worker. The worker might run much later, and then we generate a lot
> > > >   of crc for which there's already a different worker queued up.
> > > > - frame number might be 0, so create a new crc_pending boolean to
> > > >   track this without confusion.
> > > > - we need to atomically grab frame_start/end and clear it, so do that
> > > >   all in one go. This is not going to create a new race, because if we
> > > >   race with the hrtimer then our work will be re-run.
> > > > - only race that can happen is the following:
> > > >   1. worker starts
> > > >   2. hrtimer runs and updates frame_end
> > > >   3. worker grabs frame_start/end, already reading the new frame_end,
> > > >   and clears crc_pending
> > > >   4. hrtimer calls queue_work()
> > > >   5. worker completes
> > > >   6. worker gets  re-run, crc_pending is false
> > > >   Explain this case a bit better by rewording the comment.
> > > >
> > > > v2: Demote warning level output to debug when we fail to requeue, this
> > > > is expected under high load when the crc worker can't quite keep up.
> > > >
> > > > Cc: Shayenne Moura <shayenneluzmoura at gmail.com>
> > > > Cc: Rodrigo Siqueira <rodrigosiqueiramelo at gmail.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > > > Cc: Haneen Mohammed <hamohammed.sa at gmail.com>
> > > > Cc: Daniel Vetter <daniel at ffwll.ch>
> > > > ---
> > > >  drivers/gpu/drm/vkms/vkms_crc.c  | 27 +++++++++++++--------------
> > > >  drivers/gpu/drm/vkms/vkms_crtc.c |  9 +++++++--
> > > >  drivers/gpu/drm/vkms/vkms_drv.h  |  2 ++
> > > >  3 files changed, 22 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_crc.c b/drivers/gpu/drm/vkms/vkms_crc.c
> > > > index d7b409a3c0f8..66603da634fe 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_crc.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_crc.c
> > > > @@ -166,16 +166,24 @@ void vkms_crc_work_handle(struct work_struct *work)
> > > >         struct drm_plane *plane;
> > > >         u32 crc32 = 0;
> > > >         u64 frame_start, frame_end;
> > > > +       bool crc_pending;
> > > >         unsigned long flags;
> > > >
> > > >         spin_lock_irqsave(&out->state_lock, flags);
> > > >         frame_start = crtc_state->frame_start;
> > > >         frame_end = crtc_state->frame_end;
> > > > +       crc_pending = crtc_state->crc_pending;
> > > > +       crtc_state->frame_start = 0;
> > > > +       crtc_state->frame_end = 0;
> > > > +       crtc_state->crc_pending = false;
> > > >         spin_unlock_irqrestore(&out->state_lock, flags);
> > > >
> > > > -       /* _vblank_handle() hasn't updated frame_start yet */
> > > > -       if (!frame_start || frame_start == frame_end)
> > > > -               goto out;
> > > > +       /*
> > > > +        * We raced with the vblank hrtimer and previous work already computed
> > > > +        * the crc, nothing to do.
> > > > +        */
> > > > +       if (!crc_pending)
> > > > +               return;
> > > 
> > > I think this condition is not reachable because crc_pending will be
> > > filled with true in `vkms_vblank_simulate()` which in turn schedule
> > > the function `vkms_crc_work_handle()`. Just for checking, I tried to
> > > reach this condition by running kms_flip, kms_pipe_crc_basic, and
> > > kms_cursor_crc with two different VM setups[1], but I couldn't reach
> > > it. What do you think?
> > 
> > thread A			thread B
> > 1. run vblank hrtimer
> > 
> > 				2. starts running crc work (from previous
> > 				vblank)
> > 
> > 3. spin_lock()			-> gets stalled on the spin_lock() because
> > 				   thread A has it already
> > 
> > 4. update frame_end (only in
> > later patches, atm this is
> > impossible). crc_pending is set
> > already.
> > 
> > 5. schedule_work: since the work
> > is running already, this means it
> > is scheduled to run once more.
> > 
> > 6. spin_unlock
> > 
> > 				7. compute crc, clear crc_pending
> > 				8. work finishes
> > 				9. work gets run again
> > 				8. crc_pending=false
> > 
> > Since the spin_lock critical section is _very_ short (less than 1 usec I
> > bet), this race is very hard to hit.
> 
> First of all, thank you very much for all of your detailed explanation
> and sorry for my delay to reply, I was 'processing' all of your
> comments. I believe that I understood the issues related with this
> patchset, I just want to check with you if the diagram and the cases
> below make sense:
> 
> timer   |------|------|------|------|------|...
> 
> Case 1:        +----x +---x  +-----x
> 
> Case 2:     A  +----------x
>             B         +----x
> 
> At the top of this diagram, I illustrated the vblank period along the
> time. In the bottom lines, I highlighted two cases; the '+' represents
> when the worker is queued (queue_work()), and the 'x' denotes when the
> CRC work finishes its data processing. Before describing each case from
> the diagram, I want to highlight that I'm focused on these two snippets
> of code:
> 
> static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) {
>     [..]
> 		spin_lock(&output->crc_lock);
>       [..] data [..]
> 		spin_unlock(&output->crc_lock);
>     [..]
> }
> 
> void vkms_crc_work_handle(struct work_struct *work){
>   [..]
>   spin_lock_irq(&out->crc_lock);
>   crtc_state->crc_pending = false;
>     [..] data [..]
> 	spin_unlock_irq(&output->crc_lock);
>   [..]
> }
> 
> Cases:
> 
> 1) This is the best scenario; each CRC worker finishes before the next
> vblank.
> 
> 2) In this scenario, one of the CRC workers extends along multiple
> vblanks.  If worker A already collected the sensitive data inside
> vkms_crc_work_handle(), worker A and B will finish without problems
> (thanks to your changes). However, if for any reason, the worker A did
> not start before the worker B, the new work will take care of its own
> CRC and the CRC from worker A. Finally, since worker B will set
> crc_pending equal false when the worker A starts, it'll just return
> because of the following code:
> 
> if (!crc_pending)
>   return;
> 
> Make sense?

Almost correct, but I think you mixed up the workers here. Since we have a
single-threaded workqueue B will always run only once A has started. So if
A has consumed the 2nd vblank already (because it got delayed for some
reason), then it's worker B which will observe crc_pending == false
(because A has already taken care of that).

So in a way there's an additional event per worker that you need to
incorporate into your scenarios (and this means there's going to be a few
more variants). You alsready have two:

- spin_lock critical section in vkms_vblank_simulate, which also
  (re)queues the work. In your diagram that's represented with a '+'

- the worker finishes and has uploaded the crc, in your diagramm that's a
  '*'

What's missing is when exactly the worker starts, and more precisely, the
spinlock protected critical section where the work reads frame_start/end.
That's the thing which can race in interesting ways with the vblank
handler.

But looking at your diagrams and explanations I think you're getting the
hang of thinking through race conditions.

btw another interesting exercise for thinking these through:
- write up the critical steps of the functions you want to analyze
- cut them up into pieces (number them so there's no chaos)
- put them on the table as columns (each column represents a cpu or thread
  doing stuff)
- then try to make gaps and interleave execution in interesting ways, and
  try to figure out whether everything is still computed correctly.

Like in your diagram you might need a few copies of the functions to
really make things fun. For me being able to quickly move around how code
is executed and even play out extreme scenarios (like "what happens if the
entire worker is executed between these two instructions over there" -
this is possible with preemption and the linux scheduler running something
else) helps a lot for understanding this.

It's still really hard to convince yourself of correctness, but at least
easier to get an intuitive understanding.

Cheers, Daniel

> > Exercise: Figure out why schedule_work _must_ schedule the work item to
> > re-run if it's running already. If it doesn't do that there's another
> > race.
> > 
> > > 
> > > [1] Qemu parameters
> > > VM1: -m 1G -smp cores=2,cpus=2
> > > VM2: -enable-kvm -m 2G -smp cores=4,cpus=4
> > > 
> > > >         drm_for_each_plane(plane, &vdev->drm) {
> > > >                 struct vkms_plane_state *vplane_state;
> > > > @@ -196,20 +204,11 @@ void vkms_crc_work_handle(struct work_struct *work)
> > > >         if (primary_crc)
> > > >                 crc32 = _vkms_get_crc(primary_crc, cursor_crc);
> > > >
> > > > -       frame_end = drm_crtc_accurate_vblank_count(crtc);
> > > > -
> > > > -       /* queue_work can fail to schedule crc_work; add crc for
> > > > -        * missing frames
> > > > +       /*
> > > > +        * The worker can fall behind the vblank hrtimer, make sure we catch up.
> > > >          */
> > > >         while (frame_start <= frame_end)
> > > >                 drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
> > > 
> > > I want to take this opportunity to ask about this while; It's not
> > > really specific to this patch.
> > > 
> > > I have to admit that I never fully got the idea behind this 'while';
> > > it looks like that we just fill out the missed frames with a repeated
> > > value. FWIU, `drm_crtc_add_crc_entry()` will add an entry with the CRC
> > > information for a frame, but in this case, we are adding the same CRC
> > > for a different set of frames. I agree that near frame has a similar
> > > CRC value, but could we rely on this all the time? What could happen
> > > if we have a great difference from the frame_start and frame_end?
> > 
> > It's a cheap trick for slow cpu: If the crc work gets behind the vblank
> > hrtimer, we need to somehow catch up. With real hw this is not possible,
> > but with vkms we simulate the hw. The only quick way to catch up is to
> > fill out the same crc for everything. It's a lie, it will make some
> > kms_atomic tests fail, but it's the only thing we can really do. Aside
> > from trying to make the crc computation code as fast as possible.
> > -Daniel
> > 
> > > 
> > > > -
> > > > -out:
> > > > -       /* to avoid using the same value for frame number again */
> > > > -       spin_lock_irqsave(&out->state_lock, flags);
> > > > -       crtc_state->frame_end = frame_end;
> > > > -       crtc_state->frame_start = 0;
> > > > -       spin_unlock_irqrestore(&out->state_lock, flags);
> > > >  }
> > > >
> > > >  static int vkms_crc_parse_source(const char *src_name, bool *enabled)
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> > > > index 1bbe099b7db8..c727d8486e97 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> > > > @@ -30,13 +30,18 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
> > > >                  * has read the data
> > > >                  */
> > > >                 spin_lock(&output->state_lock);
> > > > -               if (!state->frame_start)
> > > > +               if (!state->crc_pending)
> > > >                         state->frame_start = frame;
> > > > +               else
> > > > +                       DRM_DEBUG_DRIVER("crc worker falling behind, frame_start: %llu, frame_end: %llu\n",
> > > > +                                        state->frame_start, frame);
> > > > +               state->frame_end = frame;
> > > > +               state->crc_pending = true;
> > > >                 spin_unlock(&output->state_lock);
> > > >
> > > >                 ret = queue_work(output->crc_workq, &state->crc_work);
> > > >                 if (!ret)
> > > > -                       DRM_WARN("failed to queue vkms_crc_work_handle");
> > > > +                       DRM_DEBUG_DRIVER("vkms_crc_work_handle already queued\n");
> > > >         }
> > > >
> > > >         spin_unlock(&output->lock);
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > > index 81f1cfbeb936..3c7e06b19efd 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > > @@ -56,6 +56,8 @@ struct vkms_plane_state {
> > > >  struct vkms_crtc_state {
> > > >         struct drm_crtc_state base;
> > > >         struct work_struct crc_work;
> > > > +
> > > > +       bool crc_pending;
> > > >         u64 frame_start;
> > > >         u64 frame_end;
> > > >  };
> > > > --
> > > > 2.20.1
> > > >
> > > 
> > > 
> > > -- 
> > > 
> > > Rodrigo Siqueira
> > > https://siqueira.tech
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> -- 
> Rodrigo Siqueira
> https://siqueira.tech

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch