Funky new vblank counter regressions in Linux 4.4-rc1

Mario Kleiner mario.kleiner.de at gmail.com
Mon Nov 23 13:20:40 PST 2015


On 11/23/2015 09:04 PM, Harry Wentland wrote:
> Hi Mario,
>
> when we've had issues with this on amdgpu Christian fixed it by enabling
> page flip irq all the time, rather than turning it on when usermode
> request a flip and turning it back off after we handled it. I believe
> that fix exists on radeon already. Michel should have more info on that.
>
> See other comments inline.
>
> Thanks,
> Harry
>
>
> On 2015-11-23 11:02 AM, Mario Kleiner wrote:
>> On 11/20/2015 04:42 AM, Alex Deucher wrote:
>>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>>> <mario.kleiner.de at gmail.com> wrote:
>>>> Hi Alex and Michel and Ville,
>>>>
>>>> it's "fix vblank stuff" time again ;-)
>>>
>>> Adding Harry from our display team.  He might be able to fill in the
>>> blanks of on some of this better than I can.  It might also be worth
>>> checking to see how our DAL (our new display code which is being
>>> developed directly by our display team) code handles this.  It could
>>> be that we are just missing register settings:
>>
>> Thanks Alex! And hello Harry :)
>>
>>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
>>
>> I'll have a look at this.
>>
>>> Additionally we've published full registers headers for the display
>>> block:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
>>>
>>> The DCE8 stuff should generally apply back to DCE4.  If you have
>>> questions about registers older asics not covered in the hw docs, let
>>> me know.  Note the new headers are dword aligned rather than byte
>>> aligned.
>>>
>>
>> I've tested now with two different progressive modes on DCE3 and one
>> progressive mode on DCE4, the only cards i have atm. So far it seems
>> that the framecounter indeed increments when the vpos from the scanout
>> position query jumps back to zero. Attached for reference is my
>> current patch to radeon-kms. This one seems to work reliably so far,
>> also if i enable the immediate vblank irq disable which we so far only
>> used on intel-kms.
>>
>> But according to this patch the framecounter increment happens
>> somewhere in the middle of vblank.
>>
>> Essentially from the vpos extracted from
>> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start"
>> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1
>> the framecounter stays on the count of the old previous scanout cycle.
>> Then when vpos wraps to zero the framecounter increments by 1. And
>> then we have another couple of dozen lines inside vblank until
>> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and
>> entering active scanout for the new frame.
>>
>> So the position of observed framecounter increment seems to be not
>> close to the end of vblank ("start of frame" indicator?), but a couple
>> of scanlines after start of vblank.
>>
>> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478,
>> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the
>> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and
>> the framecounter increments by 1, then 38 scanlines later the vblank
>> ends.
>>
>> So i seem to have something that seems to work in practice and this
>> "increment framecounter if vpos wraps back to zero" behavior makes
>> some sense. It just doesn't conform to what those descriptions for
>> start_line and "start of frame" indicator describe?
>>
> This is correct. Our HW doesn't really have a vblank counter but a frame
> counter. The framecounter increments at the start of vsync, which is
> when we wrap to zero which doesn't coincide with the start of vblank.
>
> What we're trying to do with get_vblank_counter isn't the same as what
> framecount gives us, but we could probably do something like:
>
> if (get_scanout_pos > vblank_start)
>    return frame_count + 1;
> else
>    return frame_count;
>

Great! That's what my current patch does and it seems to work well with 
different video modes on both DCE3 and DCE4. So theory agrees with 
practice again :) - thanks for clarifying this.

So the other problem we have since forever is the vblank irq firing 
before the start of vblank. We are typically 1-2 scanlines before 
vblank_start when we sample the scanout position and framecounter. This 
needs a slightly ugly workaround. Is there a way, maybe some config 
register, to fire the irq at leading edge of vblank instead of a bit too 
early?

thanks,
-mario

> Harry
>
>> I'll test with a few more video modes.
>>
>> thanks,
>> -mario
>>
>>
>>>>
>>>> Ville's changes to the DRM's drm_handle_vblank() /
>>>> drm_update_vblank_count()
>>>> code in Linux 4.4 not only made that code more elegant, but also
>>>> removed the
>>>> robustness against the vblank irq quirks in AMD hw and similar
>>>> hardware. So
>>>> now i get tons of off-by-one errors and
>>>>
>>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>>>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>>>> messages
>>>> from that kernel.
>>>>
>>>> One of the reasons for trouble is that AMD hw quirk where the hw
>>>> fires an
>>>> extra vblank irq shortly after vblank irq's get enabled, not
>>>> synchronized to
>>>> vblank, but typically in the middle of active scanout, so we get a
>>>> redundant
>>>> call to drm_handle_vblank in the middle of scanout.
>>>>
>>>> To fix that i have a minor patch to make drm_update_vblank_count()
>>>> again
>>>> robust against such redundant calls, which i will send out later to the
>>>> mailing list. Diff attached for reference.
>>>>
>>>> The second quirk of AMD hw is that the vblank interrupt fires a few
>>>> scanlines before start of vblank, so drm_handle_vblank ->
>>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>>>> called
>>>> before the start of the vblank for which the new vblank count should be
>>>> queried.
>>>>
>>>> The third problem is that the DRM vblank handling always had the
>>>> assumption
>>>> that hardware vblank counters would essentially increment at leading
>>>> edge of
>>>> vblank - basically in sync with the firing of the vblank irq, so
>>>> that a hw
>>>> counter readout from within the vblank irq handler would always
>>>> deliver the
>>>> new incremented value. If this assumption is violated then the
>>>> counting by
>>>> use of the hw counter gets unreliable, because depending on random
>>>> small
>>>> delays in irq handling the code may end up sampling the hw counter
>>>> pre- or
>>>> post-increment, leading to inconsistent updating and funky bugs. It
>>>> just so
>>>> happens that AMD hardware doesn't increment the hw counter at
>>>> leading edge
>>>> of vblank, so stuff falls apart.
>>>>
>>>> So to fix those two problems i'm tinkering with cooking the hw vblank
>>>> counter value returned by radeon_get_vblank_counter_kms() to make it
>>>> appear
>>>> as if the counter incremented at leading edge of vblank in sync with
>>>> vblank
>>>> irq.
>>>>
>>>> It almost sort of works on the rs600 code path, but i need a bit of
>>>> info
>>>> from you:
>>>>
>>>> 1. There's this register from the old specs for m76.pdf, which is
>>>> not part
>>>> of the current register defines for radeon-kms:
>>>>
>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>
>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>> vertical
>>>> scanout position. It seems to give the same readings as the 24 bit
>>>> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>> would
>>>> come handy.
>>>>
>>>> Does Evergreen and later have a same/similar register and where is it?
>>>
>>> Yes.  CRTC_STATUS_VF_COUNT
>>>
>>> CRTC:CRTC_STATUS_VF_COUNT  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32  ·
>>>   [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2]
>>> GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4]
>>> GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c
>>> DESCRIPTION: Current composite vertical and frame count for CRTC
>>> Field Name   Bits    Default   Description
>>> CRTC_VF_COUNT
>>> (Access: R)    28:0    0x0    Reports current vertical and frame count
>>>
>>>>
>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>> position
>>>> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i
>>>> tested so
>>>> far. Is this so on all asics? And is the hw counter increment happening
>>>> exactly at the moment that vertical scanout position jumps back to
>>>> zero, ie.
>>>> both events are driven by the same signal? Or is the framecounter
>>>> increment
>>>> just happening somewhere inside either scanline VTOTAL-1 or scanline 0?
>>>
>>> Unless Harry knows, we can ask the hw team, but I doubt they would
>>> have changed it.  I think it's tied to the start line which is
>>> controlled by this register:
>>> CRTC:CRTC_START_LINE_CONTROL  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32
>>>   ·  [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2]
>>> GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4]
>>> GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc
>>> DESCRIPTION: move start_line signal earlier by 1 line in CRTC
>>> Field Name   Bits          Default         Description
>>> CRTC_PROGRESSIVE_START_LINE_EARLY         0    0x0    move start_line
>>> signal by 1 line eariler in progressive mode
>>>
>>> CRTC_INTERLACE_START_LINE_EARLY         8     0x1     move start_line
>>> signal by 1 line earlier in interlaced timing mode
>>>
>>> CRTC_ADVANCED_START_LINE_POSITION         19:16     0x4 Advanced
>>> Start Line position respect to not VBLANK. Default of 4 means the
>>> Advanced Start Line is 4 lines before the first non VBLANK line
>>>
>>> The following info I dug up internally may be useful:
>>>
>>> The position of the CRTC output timing generator when the “start of
>>> frame” indicator occurs depends on several conditions.  These
>>> conditions are whether the timing generator is outputting timing for a
>>> progressive or interlaced display mode, whether the scaler is enabled,
>>> and if so, whether the register to select an earlier than normal
>>> “start of frame” indicator is set.  The “start of frame” indicator
>>> typically occurs 2 lines before the vertical blank end position
>>> (DxCRTC_V_BLANK_END) is reached
>>>
>>> When interlaced output display modes are enabled
>>> (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled
>>> (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts
>>> by 2 for both the even fields and odd fields of each frame.  For both
>>> the even fields and the odd fields of interlaced output modes, the
>>> “start of frame” indicator occurs 2 lines before the end of the
>>> vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END –
>>> 3 depending on the field since the vertical counter counts by 2 in
>>> interlaced), since the vertical counter counts by 2 in this mode).
>>> There is one exception to the previous statement.  If scaling is
>>> enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled
>>> (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then
>>> the “start of frame” indicator happens 3 lines before the end of the
>>> vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5
>>> depending on the field since the vertical counter counts by 2 in
>>> interlaced).
>>>
>>> When progressive output display modes are enabled
>>> (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled
>>> (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines
>>> before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2).
>>> Similar to interlaced output mode, there is one exception to the
>>> previous statement.  If scaling is enabled (DxSCL_SCALE_EN = 1) and
>>> the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY =
>>> 1) in progressive output mode then the “start of frame” indicator
>>> happens 3 lines before the end of the vertical blank
>>> (DxCRTC_V_BLANK_END – 3)
>>>
>>> I hope this helps,
>>>
>>> Alex
>>>
>>>>
>>>>
>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>>> regression and with a bit of luck at the same time improve by being
>>>> able to
>>>> set dev->vblank_disable_immediate = true then and allow vblank irqs
>>>> to get
>>>> turned off more aggressively for a bit of extra power saving.
>>>>
>>>> thanks,
>>>> -mario
>


More information about the dri-devel mailing list