Small vc4 kms fixes and some questions.
Mario Kleiner
mario.kleiner.de at gmail.com
Fri May 20 01:40:21 UTC 2016
On 05/13/2016 08:51 PM, Eric Anholt wrote:
> Mario Kleiner <mario.kleiner.de at gmail.com> writes:
>
>> On 05/09/2016 09:38 PM, Eric Anholt wrote:
>>> Mario Kleiner <mario.kleiner.de at gmail.com> writes:
>>>
>>>> Hi Eric and all,
>>>>
>>>> two small fixes against vc4 kms, built and tested against the
>>>> Raspberry Pi Foundation's 4.4.8 kernel tree on RPi2B.
>>>>
>>>> I'm tinkering with a Rpi 2B a bit to see if your vc4 work can
>>>> already make the Pi useful as a device for some serious but low
>>>> cost neuro-science applications.
>>>>
>>>> Eric:
>>>>
>>>> Is there any public documentation about the HVS hardware video scaler
>>>> or the pixel valves? I could find other docs about Videocore's 3d
>>>> part, but nothing about hvs or the pixel valves? Or are the register
>>>> definitions inside the vc4 already all that exists in the hw?
>>>
>>> Nope, docs for display never got released. I've got docs internally,
>>> and I'm happy to try to answer questions.
>>>
>>
>> Ah good. My questions are around making the pageflip completion events
>> and vblank timestamps as precise and reliable as possible.
>>
>> Atm. i'm working on a patch to the flip completion handling to make it
>> more robust. The current code in my stress tests with 10000 flips sends
>> out flip completions too early in about 1-2% of the trials.
>>
>> My current patch reduces these to 0 failures in my test. I'll send the
>> patch out later after some more tweaking and cleanup.
>>
>> E.g., for crtc/pixelvalve 1 the patch checks if the SCALER_DISPLIST1 and
>> SCALER_DISPLACT1 registers in the HVS match, and only then sends out the
>> pageflip completion event, otherwise waits for the next vblank. I assume
>> SCALER_DISPLACT1 is the true current value for start of active display
>> list whereas SCALER_DISPLIST1 is the value that got latched and then
>> gets committed at the next vblank after writing to the reg. This seems
>> to work well according to my special measurement equipment which can
>> timestamp the true start of display of a new framebuffer.
>
> Oh, interesting. The docs had said that the display lists were
> re-parsed on every line, so I assumed that also meant that the reparsing
> started from the head pointer every line. I've confirmed in the
> hardware what you've found: the head pointer is only latched at VSTART
> or INIT signal to the display fifo (that start signal comes from the
> pixel valve).
>
Yes. You can observe it also as a symptom with async pageflips, where
the return from flip is async, but the display still doesn't tear.

>> However i don't know if this is already perfect, or just a strongly reduced
>> error rate, so i need to know when the value gets committed (start of
>> vblank, end of vblank, vsync)? And when does the vblank irq fire for the
>> different possible settings of:
>>
>> # define PV_INT_VID_IDLE BIT(9)
>> # define PV_INT_VFP_END BIT(8)
>> # define PV_INT_VFP_START BIT(7)
>> # define PV_INT_VACT_START BIT(6)
>> # define PV_INT_VBP_START BIT(5)
>> # define PV_INT_VSYNC_START BIT(4)
>>
>> Currently the driver uses PV_INT_VFP_START.
>
> OK, looks like the PV timing state is:
>
> IDLE
> START
> (vfp lines)
> VSYNC
> (vsw lines)
> VBP
> (vbp lines)
> VACT
> (vact lines)
> VFP
> (jump to waiting for vfp lines)
>
> The normal timing loop after START does its transitions at HFP, and
> those transitions are when you get the VFP/VBP/VSYNC START/END
> interrupts.
>
> You'll jump back to IDLE almost right away if VID_EN is dropped.
>
> The VSTART signal to the HVS is when PV does IDLE -> START or on the
> last pixel of the active scanout. Note, this is *not* the PV's START
> signal, which looks like it's basically unused. Also, I think it's an
> interesting note that we don't have VFP_START on our first frame, as far
> as I've found.
>
> The PV requests that the HVS generate the next line at the last pixel of
> each HACTIVE when we're either at end of VBP or VACTIVE-but-not-its-end.
>
Thanks for the detailed explanation Eric. The way this is designed is
almost as if the Broadcom hw engineers had read our DRM vblank handling
and DRI swap scheduling code and decided to build the hw to exactly
match the expectations of our code :)

This means that my patch for pageflip completion handling should be
robust and not subject to races with the hardware :) - My
testing over many runs of 10000's of flips confirms that. I've sent the
patch out for review, cc'ed to you, and it should be fine as is for
inclusion if you are happy with it.
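
A minimal sketch of the completion check described above, for illustration only. This is not the code from the patch: struct flip_state and handle_vblank() are hypothetical names, and the two register values are assumed to have been read back from SCALER_DISPLISTx and SCALER_DISPLACTx by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one crtc; not the driver's real struct. */
struct flip_state {
	bool pending;        /* flip queued, completion event not yet sent */
	unsigned int events; /* completion events sent so far */
};

/*
 * Call from the vblank irq handler.
 * displist: value read back from SCALER_DISPLISTx (the list we programmed)
 * displact: value read back from SCALER_DISPLACTx (the list the HVS uses)
 */
static void handle_vblank(struct flip_state *fs,
			  uint32_t displist, uint32_t displact)
{
	if (!fs->pending)
		return;

	/* Head pointer not latched yet: keep the flip pending until next vblank. */
	if (displist != displact)
		return;

	fs->pending = false;
	fs->events++;	/* stands in for sending the pageflip completion event */
}
```

The point of the comparison is that a vblank irq alone does not prove the new display list is in use; the event only goes out once both registers agree.
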
>> The second question is if the HVS or pixelvalves have some kind of
>> scanline register that reports the currently scanned out scanline? I'd
>> like to implement scanout position queries, so we can get instant high
>> precision vblank timestamps if possible like we have for intel, amd and
>> nouveau, so we'd have precise timestamps, a vblank counter and also
>> additional power savings. Or lacking that are there other regs that
>> could be used to timestamp vblanks or updates of display lists in the
>> hardware?
>
> HVS has bits 0:11 of DISPSTATx for the Y line being generated. That
> will be in a different clock domain from the PV, but it's probably good
> enough, right?
>
Mostly. It's not quite as good as having the true scanout position from the
PVs. Attached is my current WIP patch for scanoutpos based
timestamping. It already works quite well in my timing tests, but as
you can see from the code and the longish explanation, we can't get
close to perfect accuracy if we query the timestamp while the PV is in
vblank, and we need some trickery to get ok'ish results there.

The problem, as far as I understand the hw from the results,
is that the HVS has a line buffer fifo which can hold a number of
composited scanlines, e.g., 13 - 24 for typical video resolutions, for
later consumption by the PV. The HVS refills much faster than the PV
consumes. During active scanout that means the PV and HVS work in
lockstep: the HVS fifo is almost always completely full and the HVS is
throttled to the rate at which the PV consumes. This is good, because
the scanline position of the PV is then the compositing position of the
HVS from DISPSTATx minus the capacity of the fifo.
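
As a rough sketch of that relation, valid only during active scanout while the fifo is full: HVS_LINE_MASK and estimate_pv_line() are hypothetical names, the mask follows Eric's statement that bits 0:11 of DISPSTATx hold the Y line, and fifo_lines is whatever fifo capacity estimate is in use.

```c
#include <stdint.h>

/* Bits 0:11 of DISPSTATx: Y line currently generated by the HVS. */
#define HVS_LINE_MASK 0xfffu	/* hypothetical name */

/*
 * Estimate the scanline the PV is currently scanning out.
 * dispstat:   raw value read from DISPSTATx
 * fifo_lines: assumed capacity of the HVS line fifo, in lines
 */
static int estimate_pv_line(uint32_t dispstat, int fifo_lines)
{
	int hvs_line = (int)(dispstat & HVS_LINE_MASK);

	/* The PV lags the HVS by the number of lines buffered in the fifo. */
	return hvs_line - fifo_lines;
}
```

A result below zero means the HVS line is still within the first fifo_lines lines, which is the case that needs the vblank special handling.
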
At the last few lines of active scanout the HVS stops compositing while
the PV drains the fifo, so our position estimate gets inaccurate - not
a big deal in practice. In vblank, however, we don't get any meaningful
reading, because the HVS apparently quickly refills the fifo to full
capacity after the VSTART signal from the PV and then idles until the
start of active scanout, when the PV starts to consume from the fifo again.
You can see multiple special cases for "in vblank" in the patch that deal
with this ok'ish.

Anyway, this still gives us almost all advantages of scanoutpos based
timestamping, except for the blind spot in vblank when we can't use HVS
readings. Even so the timestamps will always be accurate to an error of
less than 1 vblank duration or ~1 msec, and typically accurate to about
0.1 msecs, according to my measurements.
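
To put numbers on the "less than 1 vblank duration" bound, here is a small sketch that converts a count of scanlines into time. lines_to_ns() is a hypothetical helper; the mode values in the note below are the standard 1920x1080 at 60 Hz timing (htotal 2200, vtotal 1125, 148500 kHz pixel clock), used purely as an example.

```c
#include <stdint.h>

/*
 * Duration of a number of scanlines in nanoseconds.
 * htotal:    total pixels per line, including blanking
 * clock_khz: pixel clock in kHz, the unit used by mode->crtc_clock
 */
static int64_t lines_to_ns(int lines, int htotal, int clock_khz)
{
	/* pixels / kHz gives milliseconds, so scale by 1e6 to get ns. */
	return (int64_t)lines * htotal * 1000000 / clock_khz;
}
```

For the example mode, one line is about 14.8 usecs and the 45 lines of vblank come to about 0.67 msecs, so an error bounded by the vblank duration stays below 1 msec there.
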
Two questions:

1. Can you tell me something about the size of that fifo - capacity in
lines, depending on horizontal resolution? My heuristic formula in the
patch...

fifo_lines = (2048 * 7 / mode->crtc_hdisplay) - 1;

... seems to work well for the three or four video modes I could
actually test. But having the real numbers would be better.
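
A standalone version of the heuristic, to make it easy to evaluate for a given mode. fifo_lines_for() is a hypothetical wrapper around the formula above; the constants 2048 and 7 are the guessed values from the patch, not documented hardware numbers.

```c
/*
 * Heuristic fifo capacity in lines for a given active width.
 * hdisplay corresponds to mode->crtc_hdisplay and must be > 0.
 * Integer division truncates, exactly as in the formula above.
 */
static int fifo_lines_for(int hdisplay)
{
	return (2048 * 7 / hdisplay) - 1;
}
```

For example, an active width of 1024 pixels evaluates to 13 lines and 640 pixels to 21 lines.
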
2. In the special case hack for vpos < fifo_lines, I try to estimate the
refill speed of the HVS as the ratio between HVS clock and mode clock...

*vpos = -vblank_lines + (*vpos * mode->crtc_clock / 250000);

... under the assumption that the HVS clock == system clock and that this
clock is a constant 250 MHz, based on numbers from some of the
public docs. However, I'm not sure if the 250 MHz is right, or if it
is even constant across SoCs or wrt. power management. That specific
code path so far doesn't really improve precision. I'm not sure if I
should drop it, or refine it, or how. But maybe my assumptions about HVS
composition rate vs. PV scanout rate are wrong there, or the clock value
is wrong?
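
The same estimate as a self-contained sketch, so the effect of the clock assumption is easy to see. vblank_vpos_estimate() and ASSUMED_HVS_CLOCK_KHZ are hypothetical names, and the 250 MHz value is exactly the unverified assumption questioned above.

```c
#include <stdint.h>

/* ASSUMPTION: HVS clock == system clock == constant 250 MHz (unverified). */
#define ASSUMED_HVS_CLOCK_KHZ 250000

/*
 * Estimate the PV position while the HVS is refilling its fifo in vblank.
 * vpos:           HVS line from DISPSTATx, with vpos < fifo_lines
 * vblank_lines:   number of lines in the vertical blanking interval
 * crtc_clock_khz: pixel clock in kHz, the unit used by mode->crtc_clock
 *
 * Returns a negative line number, counting up towards 0 at the start of
 * active scanout.
 */
static int vblank_vpos_estimate(int vpos, int vblank_lines, int crtc_clock_khz)
{
	/* Lines the PV advances while the HVS composites vpos lines. */
	int64_t pv_lines = (int64_t)vpos * crtc_clock_khz / ASSUMED_HVS_CLOCK_KHZ;

	return -vblank_lines + (int)pv_lines;
}
```

With a 148500 kHz pixel clock the ratio is about 0.59, so the correction is at most a handful of lines before the fifo is full, which may be why this path has so little effect on precision.
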
thanks,
-mario
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-vc4-Implement-precise-vblank-timestamping.patch
Type: text/x-patch
Size: 13566 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20160520/45813d52/attachment-0001.bin>