[Spice-devel] [PATCH v1 09/10] DCC: change how fill_bits() marshalls data by reference

Frediano Ziglio fziglio at redhat.com
Fri Dec 16 15:26:45 UTC 2016


> 
> > 
> > On Wed, 2016-12-14 at 16:03 -0500, Frediano Ziglio wrote:
> > > > 
> > > > 
> > > > The fill_bits() function marshalls some data by reference. This
> > > > data is
> > > > owned by the RedDrawable that is owned by the Drawable that is
> > > > owned by
> > > > the RedDrawablePipeItem.  Instead of keeping the RedPipeItem alive
> > > > by
> > > > passing it to red_channel_client_init_send_data(), simply reference
> > > > the
> > > > Drawable and marshall it with _add_by_ref_full(). This means that
> > > > we
> > > > can't use  the _add_chunks_by_ref() convenience function since that
> > > > function doesn't allow us to pass a free function to clean up the
> > > > data
> > > > after it is sent.
> > > > 
> > > > This change is not perfect since the fill_bits() function makes an
> > > > assumption that 'simage' is owned by the 'drawable'. On the other
> > > > hand,
> > > > the previous code made a much bigger assumption: that the caller
> > > > would
> > > > ensure that the data would be kept alive
> > > > ---
> > > >  server/dcc-send.c | 55
> > > >  ++++++++++++++++++++++++++++++++++++++-----------------
> > > >  1 file changed, 38 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/server/dcc-send.c b/server/dcc-send.c
> > > > index db42ab8..09a2bb4 100644
> > > > --- a/server/dcc-send.c
> > > > +++ b/server/dcc-send.c
> > > > @@ -343,8 +343,17 @@ static void
> > > > marshaller_add_compressed(SpiceMarshaller
> > > > *m,
> > > >      } while (max);
> > > >  }
> > > >  
> > > > +static void marshaller_unref_drawable(uint8_t *data, void *opaque)
> > > > +{
> > > > +    Drawable *drawable = opaque;
> > > > +    drawable_unref(drawable);
> > > > +}
> > > > +
> > > 
> > > This function is copied multiple times. Should be defined and reused.
> > 
> > I don't think this function is really copied multiple times. In other
> > files, it's generally a RedPipeItem that is being referenced and
> > unreferenced. In this file, it's a Drawable.
> > 
> > I thought about making the helper function for unreffing a RedPipeItem
> > accessible from different files, but I couldn't decide on a good place
> > to put it. The signature of the function is pretty marshaller-specific,
> > so I'm not sure that I'd want to put that function in red-pipe-item.h.
> > But maybe...
> > 
> > > 
> > > > 
> > > >  /* if the number of times fill_bits can be called per one
> > > > qxl_drawable
> > > >  increases -
> > > >     MAX_LZ_DRAWABLE_INSTANCES must be increased as well */
> > > > +/* NOTE: 'simage' should be owned by the drawable. The drawable
> > > > will be kept
> > > > + * alive until the marshalled message has been sent. See comments
> > > > below for
> > > > + * more information */
> > > >  static FillBitsType fill_bits(DisplayChannelClient *dcc,
> > > > SpiceMarshaller *m,
> > > >                                SpiceImage *simage, Drawable
> > > > *drawable, int
> > > >                                can_lossy)
> > > >  {
> > > > @@ -449,7 +458,14 @@ static FillBitsType
> > > > fill_bits(DisplayChannelClient *dcc,
> > > > SpiceMarshaller *m,
> > > >                  spice_marshall_Palette(bitmap_palette_out,
> > > > palette);
> > > >              }
> > > >  
> > > > -            spice_marshaller_add_chunks_by_ref(m, bitmap->data);
> > > > +            /* 'drawable' owns this bitmap data, so it must be
> > > > kept
> > > > +             * alive until the message is sent. */
> > > > +            for (unsigned int i = 0; i < bitmap->data->num_chunks;
> > > > i++) {
> > > > +                drawable->refs++;
> > > > +                spice_marshaller_add_by_ref_full(m,
> > > > bitmap->data->chunk[i].data,
> > > > +                                                 bitmap->data-
> > > > >chunk[i].len,
> > > > +                                                 marshaller_unref_
> > > > drawable,
> > > > drawable);
> > > > +            }
> > > 
> > > could this became a function? Or perhaps adding a
> > > spice_marshaller_add_chunks_by_ref_full?
> > > I think just adding a reference for the last chunk is enough (but
> > > this is an optimization).
> > 
> > Yeah, I was initially going to add a _add_chunks_by_ref_full()
> > function, but didn't because I was thinking I'd need to add a reference
> > for each chunk and that wasn't really feasible within the
> > _add_chunks_by_ref_full() function (since that function doesn't know
> > what object to ref). I suppose that it would be possible if you only
> > add a reference for the last chunk, but that feels a bit hacky to me.
> > 
> > You'd end up with something like:
> > 
> > void spice_marshaller_add_chunks_by_ref_full(SpiceMarshaller *m,
> > SpiceChunks *chunks, free_func, opaque) {
> >     unsigned int i;
> > 
> >     for (i = 0; i < chunks->num_chunks; i++) {
> >     	if (last_chunk)
> > 		spice_marshaller_add_by_ref_full(...)
> > 	else
> > 		spice_marshaller_add_by_ref(...)
> >     }
> > }
> > 
> > 
> > 
> > > 
> > > > 
> > > >              pthread_mutex_unlock(&dcc->priv->pixmap_cache->lock);
> > > >              return FILL_BITS_TYPE_BITMAP;
> > > >          } else {
> > > > @@ -481,7 +497,14 @@ static FillBitsType
> > > > fill_bits(DisplayChannelClient *dcc,
> > > > SpiceMarshaller *m,
> > > >                               &bitmap_palette_out,
> > > > &lzplt_palette_out);
> > > >          spice_assert(bitmap_palette_out == NULL);
> > > >          spice_assert(lzplt_palette_out == NULL);
> > > > -        spice_marshaller_add_chunks_by_ref(m, image.u.quic.data);
> > > > +        /* 'drawable' owns this image data, so it must be kept
> > > > +         * alive until the message is sent. */
> > > > +        for (unsigned int i = 0; i < image.u.quic.data-
> > > > >num_chunks; i++) {
> > > > +            drawable->refs++;
> > > > +            spice_marshaller_add_by_ref_full(m,
> > > > image.u.quic.data->chunk[i].data,
> > > > +
> > > > image.u.quic.data->chunk[i].len,
> > > > +                                             marshaller_unref_draw
> > > > able,
> > > > drawable);
> > > > +        }
> > > >          pthread_mutex_unlock(&dcc->priv->pixmap_cache->lock);
> > > >          return FILL_BITS_TYPE_COMPRESS_LOSSLESS;
> > > >      default:
> > > > @@ -531,7 +554,7 @@ static void
> > > > marshall_qxl_draw_fill(RedChannelClient *rcc,
> > > >      SpiceMarshaller *mask_bitmap_out;
> > > >      SpiceFill fill;
> > > >  
> > > > -    red_channel_client_init_send_data(rcc,
> > > > SPICE_MSG_DISPLAY_DRAW_FILL,
> > > > &dpi->dpi_pipe_item);
> > > > +    red_channel_client_init_send_data(rcc,
> > > > SPICE_MSG_DISPLAY_DRAW_FILL,
> > > > NULL);
> > > >      fill_base(base_marshaller, item);
> > > >      fill = drawable->u.fill;
> > > >      spice_marshall_Fill(base_marshaller,
> > > 
> > > Could this cause some problem destroying the primary surface?
> > > Deleting a surface causes dpi to be deleted and a check on Drawables
> > > (if I remember).
> > 
> > Hmm, I didn't notice any potential problems here. Can you be a little
> > more specific about what you think the problem might be?
> > 
> > Jonathon
> > 
> 
> This could be related :(
> 
> (process:9392): Spice-ERROR **:
> display-channel.c:1835:display_channel_destroy_surfaces: assertion
> `!display->priv->surfaces[i].context.canvas' failed
> 
> Didn't narrow down. I was just starting a Windows 7 machine with 4 monitors
> (I don't know if the monitor information is related).
> 

It's specifically this patch that cause the assert to trigger.

The surface destruction is currently complicated and wrong.
If you looks there are intermittent crashes now and then.
The big mistake is that reference counting is used but the code rely
on many assumption on when the surface is released, basically:
1- destroy request is received;
2- code try to remove any reference to the surface;
3- code check there are no more references.
Point 3 sometimes fails and cause a crash (spice_assert). In
other words code is assuming that 2 always remove all references.
So why is so complicated to remove all references? References
came from commands that use the surface so:
- commands with source the surface;
- commands with destination the surface.
Commands are usually queued using RedDrawablePipeItem to the
queue, then items are removed from the queue and sent to client,
when the items are freed commands (Drawables so RedDrawables) are
freed also releasing surfaces. It seems easy but is easily prone
to errors. On point 2 all drawables in the pipe has to be deleted.
if commands has the surface as destination could be removed from
queue, if they have as source is harder the command should be
executed before being removed. Removing a source could then trigger
some chain to commands to be executed which is not easy.
The specific problem caused by this patch is caused by the fact
(probably, not sure) that now an item that is being send is freed
but code 2 then don't know how to release the command (Drawable)
pending (registered in the marshaller).
I actually though that this problem (destroying surface with a
pending Drawable not in the queue) was already present even before
(it's handled by a timeout and I know this is wrong) but or it
seems that it's not true or that my supposition on previous
paragraph is not correct.
So: is this patch wrong? No, I don't think so, it's the code that
destroy the surface that is too sensible to errors and basically
the assumption that can release all reference is wrong. Basically
it's hard to predict if some Drawable are pending, as it's
also indirectly referenced by the client so could depend on:
- operating system queue;
- some network problems like:
  - congestion;
  - weather condition;
  - people walking on wire;
  - router faults;
  - birds passing on your radio connection :-)... well... yes,
    improbable and maybe funny but this is true and good code
    should not rely on stuff like this not happening.

I currently have no solution for this patch crash, wasn't that
expected. I sent some time ago a RFC series to fix the root
cause of this issue.

Frediano


More information about the Spice-devel mailing list