Request API: stateless VPU: the buffer mechanism and DPB management

Tue Jan 17 12:46:01 UTC 2017

Hi all,

If we move the parser, or part of the DPB management mechanism, into the kernel, we will face the following issue:
One customer requires DPB management to do a flush when a stream error occurs, in order to keep the output frames clean.
Another requires frames with errors to be output anyway, in order to keep the output smooth.
And when only one field has an error, one customer wants a simple field copy to recover.

These operations are a matter of strategy (policy) rather than mechanism.
I do not think it is a good idea to bring this kind of flexible processing into the kernel driver.
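
To make the policy/mechanism split concrete, here is a minimal sketch of how the three customer requirements above could be expressed as a policy chosen outside the kernel. Every name in it (err_policy, err_action, frame_status, choose_action) is hypothetical and is not part of any existing V4L2 interface. The behaviour when both fields are damaged under the field-copy policy is also an assumption, since it is not specified above.

```c
#include <stdbool.h>

/* Hypothetical policy selected by the application, not by the driver. */
enum err_policy {
    ERR_POLICY_FLUSH,      /* flush the DPB on error to keep output clean */
    ERR_POLICY_OUTPUT,     /* output damaged frames to keep output smooth */
    ERR_POLICY_FIELD_COPY  /* recover a single bad field by field copy    */
};

/* Hypothetical actions the decode pipeline can carry out. */
enum err_action {
    ACT_OUTPUT_AS_IS,
    ACT_FLUSH_DPB,
    ACT_COPY_GOOD_FIELD
};

/* Per-frame error report, assumed to come from the decoder. */
struct frame_status {
    bool top_field_error;
    bool bottom_field_error;
};

enum err_action choose_action(enum err_policy policy, struct frame_status st)
{
    bool any  = st.top_field_error || st.bottom_field_error;
    bool both = st.top_field_error && st.bottom_field_error;

    if (!any)
        return ACT_OUTPUT_AS_IS;

    switch (policy) {
    case ERR_POLICY_FLUSH:
        return ACT_FLUSH_DPB;
    case ERR_POLICY_FIELD_COPY:
        /* Assumption: with both fields damaged there is nothing to copy. */
        return both ? ACT_OUTPUT_AS_IS : ACT_COPY_GOOD_FIELD;
    case ERR_POLICY_OUTPUT:
    default:
        return ACT_OUTPUT_AS_IS;
    }
}
```

The point of the sketch is only that the decision table varies per customer, while the actions it selects (flush, output, field copy) are the mechanisms.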

So here is the ultimate challenge: how do we reasonably move the parser and the flexible processing
that is currently encapsulated in firmware into a userspace/kernel stateless driver model?

陈恒明/Herman Chen
Algorithm Engineer
+86-591-83991906-8900
Fuzhou Rockchip Electronics Co., Ltd.

From: Randy Li
Date: 2017-01-17 11:04
To: linux-media at vger.kernel.org
CC: dri-devel at lists.freedesktop.org; Hans Verkuil; pawel; ayaka at soulik.info; nicolas.dufresne at collabora.co.uk; florent.revest; hugues.fruchet; herman.chen at rock-chips.com
Subject: Request API: stateless VPU: the buffer mechanism and DPB management
Hello all,
   I have recently finished studying the H.264 codec and am ready to
write the driver. Although I have not gone deep into the H.264 syntax, I
think I just need to reuse and extend the VA-API H.264 parser from
GStreamer. The whole plan in userspace is just to inject a parsing
step and set the V4L2 controls in the kernel before enqueuing a buffer
into OUTPUT, which would keep the most compatibility with the stateful
video IPs (those with firmware).
   But in order to do that, I can't avoid managing the DPB. I have
decided to move the DPB management job from userspace into the kernel.
Also, the video IP (On2 on the rk3288, and the transitional video IP on
SoCs newer than the rk3288; rkv doesn't have this problem) has a special
way of managing the DPB: it requires that the same reference frame stay
stored at the same reference index at runtime (actually, it is the motion
vector data appended to the decoded YUV data that must not be moved). I
would suggest keeping those jobs in the kernel, so that userspace just
needs to update list0 and list1 of the DPB. The DPB is self-managed in
the kernel, and userspace doesn't even need to dequeue a buffer from
CAPTURE until the reordering is done.
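
As an illustration of the "same reference frame stays at the same reference index" constraint, here is a minimal sketch of a slot table in which an entry is never moved once it has been placed. The names (dpb, dpb_slot, frame_id, dpb_insert and so on) are hypothetical and do not come from any existing driver. The 16-slot size reflects the H.264 upper bound on DPB frames.

```c
#include <stdbool.h>

#define DPB_SLOTS 16 /* H.264 allows at most 16 frames in the DPB */

struct dpb_slot {
    bool in_use;
    int  frame_id; /* hypothetical key identifying a reference frame */
};

struct dpb {
    struct dpb_slot slot[DPB_SLOTS];
};

/* Return the slot index currently holding frame_id, or -1 if absent. */
int dpb_find(const struct dpb *d, int frame_id)
{
    int i;

    for (i = 0; i < DPB_SLOTS; i++)
        if (d->slot[i].in_use && d->slot[i].frame_id == frame_id)
            return i;
    return -1;
}

/*
 * Place a frame in the first free slot. Entries already present keep
 * their index, so data tied to a slot never has to be relocated.
 * Returns the slot index, or -1 if the DPB is full.
 */
int dpb_insert(struct dpb *d, int frame_id)
{
    int i = dpb_find(d, frame_id);

    if (i >= 0)
        return i;
    for (i = 0; i < DPB_SLOTS; i++) {
        if (!d->slot[i].in_use) {
            d->slot[i].in_use = true;
            d->slot[i].frame_id = frame_id;
            return i;
        }
    }
    return -1;
}

/* Release the slot of a frame that is no longer used for reference. */
void dpb_remove(struct dpb *d, int frame_id)
{
    int i = dpb_find(d, frame_id);

    if (i >= 0)
        d->slot[i].in_use = false;
}
```

With this layout, removing one frame leaves the others at their indices, and a later insert simply reuses the freed slot.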
   The kernel driver would also reorder the CAPTURE buffers into display
order, blocking operations on CAPTURE until a buffer is ready to be
placed at its position in display order. If I don't do that, I have to
fetch each buffer once it is decoded and mark its result with the POC,
and I could begin processing the next frame only after those things are
done, which would hurt performance badly. That is what the Chromebook
does (I heard that from other staff; I am not involved in the Chromium
project yet). So I would suggest doing the reordering in the kernel and
informing userspace that the buffers are ready when a new I frame (key
frame) is pushed into the video IP.
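
The reordering described above can be sketched roughly as follows: decoded frames are held in a small queue kept sorted by POC, and the queue is drained in display order when a key frame arrives. This is a simplified illustration under stated assumptions, not the H.264 bumping process and not an existing driver interface. The names reorder_queue, reorder_push and reorder_drain, and the MAX_PENDING limit, are all hypothetical.

```c
#include <stddef.h>

#define MAX_PENDING 16 /* hypothetical limit on held frames */

struct reorder_queue {
    int    poc[MAX_PENDING]; /* kept sorted in ascending POC order */
    size_t count;
};

/* Insert a decoded frame by POC. Returns 0 on success, -1 if full. */
int reorder_push(struct reorder_queue *q, int poc)
{
    size_t i;

    if (q->count == MAX_PENDING)
        return -1;

    /* Shift larger POCs up by one to open a gap for the new entry. */
    i = q->count;
    while (i > 0 && q->poc[i - 1] > poc) {
        q->poc[i] = q->poc[i - 1];
        i--;
    }
    q->poc[i] = poc;
    q->count++;
    return 0;
}

/*
 * Move pending frames to out[] in display order, for example when a
 * new key frame is queued. Returns the number of frames written.
 */
size_t reorder_drain(struct reorder_queue *q, int *out, size_t out_len)
{
    size_t n = q->count < out_len ? q->count : out_len;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = q->poc[i];

    /* Keep whatever did not fit, still in sorted order. */
    for (i = n; i < q->count; i++)
        q->poc[i - n] = q->poc[i];
    q->count -= n;
    return n;
}
```

For instance, pushing frames with POC 0, 8, 4, 2, 6 in decode order and then draining yields them as 0, 2, 4, 6, 8.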
   Although moving those jobs into the kernel would increase its load,
I think it is worth doing. But I don't know whether all these ideas are
correct and give high performance (which is more important than API
compatibility, especially for 4K video). I would like to hear more ideas
about this topic.
   I will begin writing the new driver after the coming Lunar New Year
vacation (I will be going to Europe). I hope we can decide on the final
mechanism before I begin this job.
-- 
Randy Li
The third product department
