Funky new vblank counter regressions in Linux 4.4-rc1

Fri Nov 27 06:36:20 PST 2015

On 11/25/2015 08:38 PM, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
> <mario.kleiner.de at gmail.com> wrote:
>> On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
>>>
>>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>>>>
>>>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>>>>
>>>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>>>>
>>>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> ...
>>>>>>>> Ok, but why would that be a bad thing? I think we want it to think it
>>>>>>>> is
>>>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq
>>>>>>>> is
>>>>>>>> because we know the vblank irq handler we are executing atm. was
>>>>>>>> meant
>>>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>>>> frame. But if something called outside irq context it should get a
>>>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>>>
>>>>>>>
>>>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>>>> backwards.
>>>>>>>
>>>>>>
>>>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>>>> which we try to use on AMD.
>>>>>
>>>>>
>>>>> Well, if you do it that way then you have the problem of the hw counter
>>>>> seeming to jump forward by one after crossing the start of vblank (when
>>>>> compared to the value you sampled when you processed the early vblank
>>>>> interrupt).
>>>>>
>>>>
>>>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>>>> current locking with the new vblank counting in the core. The vblank
>>>> enable path is safe due to locking and discounting of redundant
>>>> timestamps etc. But the disable path could go wrong:
>>>>
>>>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>>>> updates timestamps and counts "as if" in vblank -> incremented vblank
>>>> count and timestamp now set in the future.
>>>>
>>>> 2. After vblank irq finishes, but just before leading edge of vblank,
>>>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>>>> count because before vblank and not in vblank irq. Now
>>>> drm_update_vblank_count() would process a
>>>> "new" timestamp and count from the past and we'd have time and counts
>>>> going backwards, and bad things would happen.
>>>>
>>>> I haven't observed such a thing happening during testing so far,
>>>> probably because the time window in which it could happen is tiny, but
>>>> given how awfully bad it would be, it needs to be prevented.
>>>>
>>>> I had a look at the description of the Vblank irq in the "M76 Register
>>>> Reference Guide" for older asics and the description suggests that the
>>>> vblank irq fires when the crtc's line buffer is finished reading pixel
>>>> data from the scanout buffer in memory for a frame, ie., when the line
>>>> buffer read "enters" vblank.
>>>
>>>
>>> Hmm. Does that mean there's always at least one fullscreen plane enabled
>>> in the hw? As in you can't turn off the primary plane or make it smaller
>>> than the active video area? Othwewise it sounds like you'd could either
>>> not get it at all, or get it somewhere in the middle of the screen.
>>>
>>
>> It says "Interrupt that can be programmed to be generated by the
>> primary display controller's line buffer logic either when the
>> source image line counter is not requesting any active
>> display data (i.e. in the vertical blank) or the output CRTC
>> timing generator is within the vertical blanking region."
>>
>> So my statements were my interpretation of this quote, so i can make some
>> sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
>> M76 reference refers to some older asics, i just assume it is the same for
>> the current ones, given that observed behaviour would be consistent with the
>> line buffer causing this lead of a couple of scanlines. I see about 2
>> scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
>> line buffer is, how quickly it refills etc., but it sounds reasonable.
>
> The size of the line buffer varies by generation, but the LB logic is
> still responsible for generating the vblank interrupt even on newer
> hw.
>
> Alex
>

Thanks for the pointer. I digged through the 
evergreen_line_buffer_adjust() function and its siblings from classic to 
DCE11. Seems the line buffer capacity goes up to a max 16384 pixels on 
single display, e.g., evergreen, and more like 8192 pixels on DCE8+? 
It's expressed as 8192 * 2 and talking about different line buffer 
partitions in some places. At least for DCE4+ i can see the sizes from 
the code, but not for earlier gens.

To find the proper value of how much earlier i need to fudge/place the 
start of vblank wrt. the true start of vblank in the code, i think i'll 
have to adjust for line buffer size.

The worst case would be that the line buffer can be ahead of our scanout 
positions by the full height of the line buffer a la: fudgelines = 
lb_size / mode->crtc_hdisplay

I think i could calculate this in the xxx_line_buffer_adjust() functions 
and store it in radeon_crtc for use in the fudging code. Atm. i'm just 
using a hard-coded "10" which so far worked for typical display modes. 
Assuming something like a 640x480 mode i'd need to set a constant of 26 
if i'd want to account for full line buffer size as worst case. So i 
wonder what a good value would be to be safe but at the same time to 
keep this extended vblank small? E.g., maybe a smaller value than full 
line buffer height would do because due to the way watermarks for lb 
refill are set it can't get ahead by more than a few scanlines << total 
size?

I also realized that we need to extend the virtual vblank by the same 
fudged number of scanlines when programming the page flip. Our hw 
settings make it flip at leading edge of true vblank if the new bo 
address is latched in time. To keep all the logic in userspace working 
properly we need to guarantee that now it will flip at the start of the 
fudged vblank at latest, otherwise delays the flip one frame. So i'd add 
a check for being in the fudged "vblank before the true vblank" to the 
radeon_flip_work_func() before programming the flip and udelay() if we'd 
end up in this forbidden zone. This problem already exists in the old 
code, but probably doesn't happen frequently (or ever) in practice, at 
least i couldn't easily provoke it, because usually we are too slow in 
scheduling the flip to fit into the time window for this race. But 
future optimizations or faster machines might break it.

Thoughts?
-mario