Support for 2D engines/blitters in V4L2 and DRM

Thu Apr 25 15:17:56 UTC 2019

On 2019-04-24 7:43 p.m., Nicolas Dufresne wrote:
> Le mercredi 24 avril 2019 à 18:54 +0200, Michel Dänzer a écrit :
>> On 2019-04-24 5:44 p.m., Nicolas Dufresne wrote:
>>> Le mercredi 24 avril 2019 à 17:06 +0200, Daniel Vetter a écrit :
>>>> On Wed, Apr 24, 2019 at 4:41 PM Paul Kocialkowski
>>>> <paul.kocialkowski at bootlin.com> wrote:
>>>>> On Wed, 2019-04-24 at 16:39 +0200, Michel Dänzer wrote:
>>>>>> On 2019-04-24 2:01 p.m., Nicolas Dufresne wrote:
>>>>>>> Rendering a video stream is more complex than what you describe here.
>>>>>>> Whenever there is an unexpected delay (late delivery of a frame, for
>>>>>>> example), you may end up in a situation where one frame is ready after
>>>>>>> the targeted vblank. If there is another frame that targets the
>>>>>>> following vblank and gets ready on time, the previous frame should be
>>>>>>> replaced by the most recent one.
>>>>>>>
>>>>>>> With fences, what happens is that even if you received the next frame
>>>>>>> on time, naively replacing it is not possible, because we don't know
>>>>>>> when the fence for the next frame will be signalled. If you simply
>>>>>>> always replace the current frame, you may end up skipping a lot more
>>>>>>> vblanks than you expect, and that results in jumpy playback.
>>>>>>
>>>>>> So you want to be able to replace a queued flip with another one then.
>>>>>> That doesn't necessarily require allowing more than one flip to be
>>>>>> queued ahead of time.
>>>>>
>>>>> There might be other ways to do it, but this one has plenty of
>>>>> advantages.
>>>>
>>>> The point of kms (well one of the reasons) was to separate the
>>>> implementation of modesetting for specific hw from policy decisions
>>>> like which frames to drop and how to schedule them. Kernel gives
>>>> tools, userspace implements the actual protocols.
>>>>
>>>> There's definitely a bit of a gap around scheduling flips for a specific
>>>> frame or allowing to cancel/overwrite an already scheduled flip, but
>>>> no one yet has come up with a clear proposal for new uapi + example
>>>> implementation + userspace implementation + big enough support from
>>>> other compositors that this is what they want too.
>>
>> Actually, the ATOMIC_AMEND patches propose a way to replace a scheduled
>> flip?
>>
>>
>>>>>> Note that this can also be done in userspace with explicit fencing (by
>>>>>> only selecting a frame and submitting it to the kernel after all
>>>>>> corresponding fences have signalled), at least to some degree, but the
>>>>>> kernel should be able to do it up to a later point in time and more
>>>>>> reliably, with less risk of missing a flip for a frame which becomes
>>>>>> ready just in time.
>>>>>
>>>>> Indeed, but it would be great if we could do that with implicit fencing
>>>>> as well.
>>>>
>>>> 1. extract implicit fences from dma-buf. This part is just an idea,
>>>> but easy to implement once we have someone who actually wants this.
>>>> All we need is a new ioctl on the dma-buf to export the fences from
>>>> the reservation_object as a sync_file (either the exclusive or the
>>>> shared ones, selected with a flag).
>>>> 2. do the exact same frame scheduling as with explicit fencing
>>>> 3. supply explicit fences in your atomic ioctl calls - these should
>>>> overrule any implicit fences (assuming correct kernel drivers, but we
>>>> have helpers so you can assume they all work correctly).
>>>>
>>>> By design this is possible, it's just that no one yet bothered enough
>>>> to make it happen.
>>>> -Daniel
>>>
>>> I'm not sure I understand the workflow of this one. I'm all in favour
>>> of leaving the hard work to userspace. Note that I have assumed explicit
>>> fences from the start; I don't think implicit fences will ever exist in
>>> v4l2, but I might be wrong. What I understood is that there was a
>>> previous attempt, but it raised more issues than it actually solved.
>>> That being said, how exactly do you handle the following use cases:
>>>
>>>  - A frame was lost by the capture driver, but it was scheduled as being
>>> the next buffer to render (normally the previous frame should remain).
>>
>> Userspace just doesn't call into the kernel to flip to the lost frame,
>> so the previous one remains.
> 
> We are stuck in a loop, you and me. Considering v4l2 to drm, where fences
> don't exist on v4l2, it makes very little sense to bring up fences if
> we are to wait on the fence in userspace.

It makes sense insofar as no V4L specific code would be needed to make
sure that the contents of a buffer produced via V4L aren't consumed
before they're ready to be.
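
As a rough illustration of that generic consumer side (a sketch, not code
from this thread): a fence exported as a sync_file is a plain file
descriptor that becomes readable (POLLIN) once the fence has signalled, so
userspace can check or wait on it with poll() without knowing which driver
produced the buffer. The helper names fence_is_signalled() and
fence_consume() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fence_fd to become readable.
 * Returns 1 if signalled, 0 if still pending, -1 on error.
 */
int fence_wait(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    assert(fence_fd >= 0);

    /* Retry if a signal handler interrupted the wait. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Non-blocking check: has the fence signalled yet? */
int fence_is_signalled(int fence_fd)
{
    return fence_wait(fence_fd, 0);
}

/*
 * Wait for the fence, and close the descriptor once it has signalled,
 * since the consumer owns the fd. The fd stays open on timeout or error.
 */
int fence_consume(int fence_fd, int timeout_ms)
{
    int ret = fence_wait(fence_fd, timeout_ms);

    if (ret == 1)
        close(fence_fd);
    return ret;
}
```

Nothing in these helpers depends on the producer: the same code would
serve for a fence attached to a V4L2 buffer or to a GPU job.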

>>>  - The scheduled frame is late for the next vblank (didn't signal on
>>> time), a new one may be better for the next vblank, but we will only
>>> know when its fence is signalled.
>>
>> Userspace only selects a frame and submits it to the kernel after all
>> its fences have signalled.
>>
>>> Better in this context means that the presentation time of this frame
>>> is closer to the next vblank time. Keep in mind that the idea is to
>>> schedule the frames before they are signalled, in order to make the
>>> fence useful for lowering the latency.
>>
>> Fences are about signalling completion, not about low latency.
> 
> It can be used to remove a roundtrip with userspace at a very
> time-sensitive moment. If you pass a dmabuf with its unsignalled fence to a
> kernel driver, the driver can start the job on this dmabuf as soon as
> the fence is signalled. If you always wait on a fence in userspace, you
> have to wait for the userspace process to be scheduled,

I doubt this magically works without something like that (e.g. a
workqueue, which runs in normal process context) in the kernel either. :)

> then userspace will set up the drm atomic request or a similar action,
> which may take some time and may require another process in the kernel to
> be scheduled. This effectively adds some variable delay, a gap where
> nothing is happening between two operations. This time is lost and
> contributes to the overall operation latency.

It only increases latency if it causes a flip to miss its target vblank,
and it's not possible to know this happens at an unacceptable rate
without trying. The prudent approach is to at least prototype a solution
with as much complexity as possible in userspace first. If that turns
out to perform too badly, then we can think about how to improve it by
adding complexity in the kernel.
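
A userspace prototype along those lines could start from something like
the sketch below: among the queued frames, consider only those whose
fences have all signalled, and pick the one whose presentation time is
closest to the next vblank. The struct frame layout, the pick_frame() name
and the nanosecond timestamps are assumptions made for illustration, not
an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame bookkeeping kept by the player/compositor. */
struct frame {
    int64_t pts_ns;   /* target presentation time, in nanoseconds */
    int signalled;    /* nonzero once all fences of the frame signalled */
};

/*
 * Return the index of the ready frame whose presentation time is closest
 * to next_vblank_ns, or -1 if no frame is ready. In the -1 case the caller
 * submits nothing, so the previously flipped frame stays on screen.
 */
int pick_frame(const struct frame *frames, size_t n, int64_t next_vblank_ns)
{
    int best = -1;
    int64_t best_dist = 0;

    assert(frames != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int64_t dist;

        /* Frames with pending fences are never submitted to the kernel. */
        if (!frames[i].signalled)
            continue;

        dist = frames[i].pts_ns - next_vblank_ns;
        if (dist < 0)
            dist = -dist;

        if (best < 0 || dist < best_dist) {
            best = (int)i;
            best_dist = dist;
        }
    }
    return best;
}
```

Running this shortly before each vblank deadline and flipping only to the
returned frame covers both cases raised above: a lost frame is simply
never selected, and a late frame is superseded by a newer one as soon as
that one is ready.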

> The benefit of fences we are looking for is being able to set up the
> operations on various compatible drivers before the fence is signalled.
> This way, at the time-critical moment a driver can be fed more jobs
> with no userspace roundtrip involved.

That is possible with other operations, just not with page flipping yet.

-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer
