Forcing initial Frame Transmission for dmabuf encoding on SPICE display channel connection

Wed May 14 12:11:03 UTC 2025

Hello,

OK, I have implemented this.

Due to the existing code structure, it was difficult to issue a draw that 
does not also trigger the async_complete callback. I would have liked to 
attach metadata to the draw so that it could be identified at the end, but 
that would have required major code changes. I have ended up with the 
following:

https://gitlab.freedesktop.org/spice/spice/-/merge_requests/238

Description:

Send initial draw on client connect for DMA-BUF encoder

Ensure an initial draw is sent immediately upon client connection
if the DMA-BUF video encoder is in use. Without this, a frame would
only be pushed when the screen content changes and the GPU renders
a new frame—which could take a while, causing delays for the client.
GStreamer now receives a duplicated file descriptor (fd) and takes
responsibility for closing it. This guarantees the buffer remains
available long enough. The previous fd mechanism is still retained
to ensure a valid scanout buffer is always available for a new
client.
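
As a minimal sketch of the fd hand-off described above (this is not the
code from the merge request; fd_is_open, encoder_consume and
send_initial_frame are hypothetical names, and any open fd, such as one
end of a pipe, can stand in for the dma-buf fd when trying it out):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd is an open file descriptor, 0 otherwise. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical stand-in for the encoder: it owns fd and must close it. */
static void encoder_consume(int fd)
{
    assert(fd_is_open(fd));
    /* ... a real encoder would import and encode the buffer here ... */
    close(fd);
}

/* Hypothetical helper: give the encoder its own duplicate of scanout_fd. */
static int send_initial_frame(int scanout_fd)
{
    int enc_fd = dup(scanout_fd);
    if (enc_fd < 0) {
        return -1; /* dup() failed; the original fd is left untouched */
    }
    encoder_consume(enc_fd);
    return 0;
}
```

After send_initial_frame() returns, scanout_fd is still open. Because the
duplicate refers to the same underlying buffer, closing or replacing the
original later does not invalidate a duplicate that is still in use.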

Greetings,
Michael

On 05.05.25 14:33, Michael Scherle wrote:
> Hello,
> 
> Sorry for the late reply — I was on vacation. Thank you very much for 
> the detailed explanation; it gave me a clear direction on how to 
> approach the problem.
> 
> The race condition I was facing involved qemu calling 
> spice_qxl_gl_scanout during the initial frame transmission. This caused 
> the dma-buf to be closed while GStreamer was still using it. I resolved 
> this by giving GStreamer a duplicate of the dma-buf file descriptor and 
> letting it close it once it's done. I'm still keeping the original 
> dma-buf fd in qxl_state so that one is always available for the initial 
> frame transmission. The question now is whether this is an acceptable 
> solution.
> 
> With that, I have a working prototype. However, there are still a few 
> things I need to improve before I can open an MR:
> 
> 1. Callback (async_complete) handling: I obviously don’t want to call 
> this for the initial frame transmission. Implementing a special case for 
> this in the current code structure is a bit tricky. I could either pass a 
> variable through or perhaps store a flag in the qxl_state. For the 
> latter, however, I first need to better understand the thread, worker 
> and pipe system and see whether that is possible.
> 
> 2. I'm not sure whether it's necessary to ensure that, when multiple 
> clients are connected, only the newly connected one receives the new 
> frame. It's also an interesting design choice to encode the frame 
> separately for each connection.
> 
> Best regards,
> Michael
> 
> 
> On 16.04.25 00:00, Frediano Ziglio wrote:
>> On Thu, Apr 10, 2025 at 3:18 PM Michael Scherle <
>> michael.scherle at rz.uni-freiburg.de> wrote:
>>
>>> Hello,
>>>
>>> I’ve encountered an issue with the new DMA-BUF -> video encoding feature
>>> in SPICE. When connecting, the first frame is only sent once the GPU
>>> renders a new frame. However, this can take quite some time if the VM is
>>> idle (e.g., sitting on the desktop), since the GPU only renders a new
>>> frame when something on the screen changes. To address this, I wanted to
>>> force a frame to be sent when the display channel is connected.
>>>
>>>
>> Which makes sense.
>>
>>
>>> My initial, naive attempt was to grab the latest DMA-BUF on the display
>>> channel's connection in the SPICE server, encode it, and send it.
>>> However, this led to race conditions and crashes—particularly when QEMU
>>> happened to perform a scanout at the same time, closing the DMA-BUF in
>>> the process.
>>>
>> By "closing" do you mean calling close() function? No, we should have
>> ownership.
>> What exact race did you encounter?
>>
>>
>>> As a second approach, I modified the QXLInterface to pass the display
>>> channel on_connect event back to QEMU. I couldn’t find any existing
>>> mechanism in QEMU to detect the connection of a display channel. Within
>>> QEMU, I then used qemu_spice_gl_monitor_config, and spice_gl_refresh to
>>> trigger a spice_gl_draw. This solution works, but the downside is that
>>> it requires changes to SPICE, QEMU, and especially the
>>> QXLInterface—which is obviously not ideal.
>>>
>> "Not ideal" is a compliment. I would say complicated, hard to maintain,
>> adding too much coupling.
>>
>>> So now I’m wondering: does anyone have a better idea for how to tackle
>>> this problem?
>>>
>> I would define "the problem" first; so far you have mentioned a race
>> condition without describing the details of the race.
>>
>>
>>> Best regards,
>>> Michael
>>>
>>
>> I suspect the race lies more in the current implementation of the
>> interface. Indeed, that interface does not fit entirely into the Spice
>> server model.
>>
>> Externally there are 2 functions, spice_qxl_gl_scanout and
>> spice_qxl_gl_draw_async; the callback async_complete is used to tell Qemu
>> when we have finished with the scanout. So, spice_qxl_gl_scanout should set
>> the scanout (or frame if you prefer), while spice_qxl_gl_draw_async tells
>> Spice to use the scanout until async_complete is called (which should be
>> done in a timely fashion, I think the Qemu timeout is 1 second). In theory
>> the scanout can be reused for multiple draws (which was never the case, but
>> that's another story). In theory a partial draw of the scanout can be
>> requested. In theory the scanout should not be used after async_complete is
>> called, as Qemu could reuse the scanout for the next drawings. That last
>> point is a bit of a problem here and, to be honest, something I think is an
>> issue of the external interface definition. In hardware you set the
>> framebuffer and the video card will continue to use it, no matter what; the
>> computer can freeze or panic and the video card will continue to use the
>> same frame over and over. Also, considering that the worst that can happen
>> is to get a partial draw that will be fixed, I think it's correct to use the
>> last scanout to solve your initial problem.
>>
>> Internally the Spice server stores the scanout in the RedQxl thread (the
>> Qemu I/O one) but uses it in the RedWorker thread. This is pretty uncommon;
>> usually data is passed from one thread to the other, ownership included.
>> This probably leads to the race you are facing. If that's the issue, I
>> really think the best option is to fix that race.
>>
>> Regards,
>>    Frediano
>>
>
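
One way to picture the ownership rule discussed in the thread (data handed
from one thread to another, ownership included) is a lock-protected slot
from which the consumer takes a private duplicate. This is only an
illustrative sketch with hypothetical names (ScanoutSlot, scanout_slot_set,
scanout_slot_dup); it does not reflect the Spice server's actual data
structures.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical slot holding the most recent scanout fd (-1 if none). */
typedef struct {
    pthread_mutex_t lock;
    int fd;
} ScanoutSlot;

#define SCANOUT_SLOT_INIT { PTHREAD_MUTEX_INITIALIZER, -1 }

/* Producer side: take ownership of new_fd and close the fd it replaces. */
static void scanout_slot_set(ScanoutSlot *slot, int new_fd)
{
    assert(slot != NULL);
    pthread_mutex_lock(&slot->lock);
    int old_fd = slot->fd;
    slot->fd = new_fd;
    pthread_mutex_unlock(&slot->lock);
    if (old_fd >= 0) {
        close(old_fd); /* no longer reachable through the slot */
    }
}

/* Consumer side: return a private duplicate that the caller must close. */
static int scanout_slot_dup(ScanoutSlot *slot)
{
    assert(slot != NULL);
    pthread_mutex_lock(&slot->lock);
    int fd = slot->fd >= 0 ? dup(slot->fd) : -1;
    pthread_mutex_unlock(&slot->lock);
    return fd;
}
```

With this shape the consumer never touches the shared fd outside the lock,
so a new scanout arriving on the producer side cannot close a descriptor
the consumer is still using.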