[PATCH v2 2/2] media: imx: vdic: Introduce mem2mem VDI deinterlacer driver

Thu Oct 3 15:11:44 UTC 2024

On 9/26/24 1:14 PM, Philipp Zabel wrote:
> Hi,

Hi,

> On Mi, 2024-09-25 at 22:14 +0200, Marek Vasut wrote:
>> The userspace could distribute the frames between the two devices in an
>> alternating manner, can it not ?
> 
> This doesn't help with latency, or when converting a single large
> frame.
> 
> For the deinterlacer, this can't be done with the motion-aware
> temporally filtering modes. Those need a field from the previous frame.

It is up to the userspace to pass the correct frames to the deinterlacer.

>> Would the 1280x360 field be split into two tiles vertically and each
>> tile (newly 1280/2 x 360) be enqueued on each VDIC ? I don't think that
>> works, because you wouldn't be able to stitch those tiles back together
>> nicely after the deinterlacing, would you? I would expect to see some
>> sort of an artifact exactly where the two tiles got stitched back
>> together, because the VDICs are unaware of each other and how each
>> deinterlaced the tile.
> 
> I was thinking horizontally, two 640x720 tiles side by side. 1280 is
> larger than the 968 pixel maximum horizontal resolution of the VDIC.
> 
> As you say, splitting vertically (which would be required for 1080i)
> should cause artifacts at the seam due to the 4-tap vertical filter.

Can the userspace set some sort of offset/stride in each buffer and 
distribute the task between the two VDIs then ?

> [...]
>>>
>>> With the rigid V4L2 model though, where memory handling, parameter
>>> calculation, and job scheduling of tiles in a single frame all have to
>>> be hidden behind the V4L2 API, I don't think requiring userspace to
>>> combine multiple mem2mem video devices to work together on a single
>>> frame is feasible.
>>
>> If your concern is throughput (from what I gathered from the text
>> above), userspace could schedule frames on either VDIC in alternating
>> manner.
> 
> Both throughput and latency.
> 
> Yes, alternating to different devices would help with throughput where
> possible, but it's worse for frame pacing, a hassle to implement
> generically in userspace, and it's straight up impossible with temporal
> filtering.

See above, userspace should be able to pass the correct frames to m2m 
device.

>> I think this is much better and actually generic approach than trying to
>> combine two independent devices on kernel level and introduce some sort
>> of scheduler into kernel driver to distribute jobs between the two
>> devices. Generic, because this approach works even if either of the two
>> devices is not VDIC. Independent devices, because yes, the MX6Q IPUs are
>> two independent blocks, it is only the current design of the IPUv3
>> driver that makes them look kind-of like they are one single big device,
>> I am not happy about that design, but rewriting the IPUv3 driver is way
>> out of scope here. (*)
> 
> The IPUs are glued together at the capture and output paths, so yes,
> they are independent blocks, but also work together as a big device.
> 
>>> Is limiting different users to the different deinterlacer hardware
>>> units a real usecase? I saw the two ICs, when used as mem2mem devices,
>>> as interchangeable resources.
>>
>> I do not have that use case, but I can imagine it could come up.
>> In my case, I schedule different cameras to different VDICs from
>> userspace as needed.
> 
> Is this just because a single VDIC does not have enough throughput to
> serve all cameras, or is there some reason for a fixed assignment
> between cameras and VDICs?

I want to be able to distribute the bandwidth utilization between the 
two IPUs .

>>>>> To be fair, we never implemented that for the CSC/scaler mem2mem device
>>>>> either.
>>>>
>>>> I don't think that is actually a good idea. Instead, it would be better
>>>> to have two scaler nodes in userspace.
>>>
>>> See above, that would make it impossible (or rather unreasonably
>>> complicated) to distribute work on a single frame to both IPUs.
>>
>> Is your concern latency instead of throughput ? See my comment in
>> paragraph (*) .
> 
> Either, depending on the use case.
> 
> [...]
>>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207
>>>
>>> That code is unused. The direct hardware path doesn't use
>>> IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but is has a similar effect, half
>>> of the incoming fields are dropped. The setup is vdic_setup_direct().
>>
>> All right, let's drop that unused code then, I'll prepare a patch.
> 
> Thanks!
> 
>> But it seems the bottom line is, the VDI direct mode does not act as a
>> frame-rate doubler ?
> 
> Yes, it can't. In direct mode, VDIC only receives half of the fields.
> 
> [...]
>>>>
>> Why would adding the (configurable) frame-rate doubling mode break
>> userspace if this is not the default ?
> 
> I'm not sure it would. Maybe there should be a deinterlacer control to
> choose between full and half field rate output (aka frame doubling and
> 1:1 input to output frame rate).
> 
> Also, my initial assumption was that currently there is 1:1 input
> frames to output frames. But with temporal filtering enabled there's
> already one input frame (the first one) that doesn't produce any
> output.

Hum, ok.

>>>>> If we don't start with that supported, I fear userspace will make
>>>>> assumptions and be surprised when a full rate mode is added later.
>>>>
>>>> I'm afraid that since the current VDI already does retain input frame
>>>> rate instead of doubling it, the userspace already makes an assumption,
>>>> so that ship has sailed.
>>>
>>> No, this is about the deinterlacer mem2mem device, which doesn't exist
>>> before this series.
>>
>> I am not convinced it is OK if the direct VDI path and mem2mem VDI
>> behave differently, that would be surprising to me as a user ?
> 
> Is this still about the frame rate doubling? Surely supporting it in
> the mem2mem device and not in the capture path is ok. I'm not arguing
> that frame doubling should be enabled by default.

My understanding was that your concern was -- frame doubling should be 
the default because it not being the default would break userspace . 
Maybe that's not the case ?

>>> The CSI capture path already has configurable framedrops (in the CSI).
>>
>> What am I looking for ? git grep doesn't give me any hits ? (**)
> 
> That's configured by the set_frame_interval pad op of the CSI subdevice
> - on the IDMAC output pad. See csi_find_best_skip().
> 
>>>> But I think we can make the frame doubling configurable ?
>>>
>>> That would be good. Specifically, there must be no guarantee that one
>>> input frame with two fields only produces one deinterlaced output
>>> frame, and userspace should somehow be able to understand this.
>>
>> See my question (**) , where is this configurable framedrops thing ?
> 
> This would have to be done differently, though. Here we don't have
> subdev set_frame_interval configuration, and while VIDIOC_S_PARM /
> v4l2_captureparm were used to configure frame dropping on capture
> devices, that's not really applicable to mem2mem deinterlacers.

V4L2_CID_DEINTERLACING_MODE should probably work.

>>> I'd rather not default to the setting that throws away half of the
>>> input data. Not using frame doubling by default is sensible, but now
>>> that using all three input fields to calculate the output frame is
>>> possible, why not make that the default.
>>
>> To save memory bandwidth on the MX6, that's my main concern.
> 
> What userspace are you using to exercise this driver? Maybe we can back
> this concern with a few numbers (or mine with pictures).
Custom one, but with gstreamer 1.22 and 1.24 driving the media pipeline.