Funky new vblank counter regressions in Linux 4.4-rc1

Mario Kleiner mario.kleiner.de at gmail.com
Thu Nov 19 09:46:28 PST 2015


Hi Alex and Michel and Ville,

it's "fix vblank stuff" time again ;-)

Ville's changes to the DRM's drm_handle_vblank() / 
drm_update_vblank_count() code in Linux 4.4 not only made that code more 
elegant, but also removed the robustness against the vblank irq quirks 
in AMD hw and similar hardware. So now i get tons of off-by-one errors and

"[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
completion event has impossible msc 24803 < target_msc 24804" XOrg 
messages from that kernel.

One of the reasons for trouble is that AMD hw quirk where the hw fires 
an extra vblank irq shortly after vblank irqs get enabled, not 
synchronized to vblank, but typically in the middle of active scanout, 
so we get a redundant call to drm_handle_vblank() in the middle of scanout.

To fix that i have a minor patch to make drm_update_vblank_count() again 
robust against such redundant calls, which i will send out later to the 
mailing list. Diff attached for reference.
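
To illustrate the idea (a minimal sketch only, not the attached patch): one 
way to tolerate a redundant irq is to derive the increment from elapsed 
time instead of assuming one irq equals one vblank. The helper name and 
parameters below are hypothetical. The sketch assumes both timestamps are 
vblank timestamps that already refer to the same point of the refresh 
cycle, so two calls within one frame yield nearly identical values.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a DRM core function: how many vblanks to add
 * to the software counter when the vblank irq handler runs.
 *
 * vbl_ts_now_ns:  vblank timestamp computed for this call (assumption).
 * vbl_ts_last_ns: vblank timestamp stored at the last counted vblank.
 * frame_ns:       nominal duration of one refresh cycle.
 */
static uint32_t vblanks_elapsed(uint64_t vbl_ts_now_ns,
                                uint64_t vbl_ts_last_ns,
                                uint64_t frame_ns)
{
    uint64_t delta_ns = vbl_ts_now_ns - vbl_ts_last_ns;

    if (frame_ns == 0)
        return 1; /* no timing info: fall back to counting every irq */

    /* Round to the nearest whole number of frames; a redundant irq
     * inside the same frame rounds to 0 and is thereby ignored. */
    return (uint32_t)((delta_ns + frame_ns / 2) / frame_ns);
}
```

With a 60 Hz mode (frame_ns of about 16666667), a second call whose 
timestamp differs by only a few microseconds returns 0, while a regular 
irq one frame later returns 1.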

The second quirk of AMD hw is that the vblank interrupt fires a few 
scanlines before start of vblank, so drm_handle_vblank -> 
drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
called before the start of the vblank for which the new vblank count 
should be queried.

The third problem is that the DRM vblank handling always had the 
assumption that hardware vblank counters would essentially increment at 
leading edge of vblank - basically in sync with the firing of the vblank 
irq, so that a hw counter readout from within the vblank irq handler 
would always deliver the new incremented value. If this assumption is 
violated then the counting by use of the hw counter gets unreliable, 
because depending on random small delays in irq handling the code may 
end up sampling the hw counter pre- or post-increment, leading to 
inconsistent updating and funky bugs. It just so happens that AMD 
hardware doesn't increment the hw counter at leading edge of vblank, so 
stuff falls apart.

So to fix the second and third problems i'm tinkering with cooking the hw 
vblank counter value returned by radeon_get_vblank_counter_kms() to make it 
appear as if the counter incremented at leading edge of vblank in sync 
with vblank irq.
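
Roughly, the cooking could look like the sketch below. This is an 
illustration under assumptions, not the actual radeon code: the helper 
name, the irq_lead margin and the idea of passing in a consistent 
(counter, scanline) snapshot are all made up for the example. It also 
assumes the raw counter increments when the scanout position wraps to 0.

```c
#include <stdint.h>

#define HW_COUNT_MASK 0xffffffu /* hw frame counter is 24 bits wide */

/*
 * Hypothetical helper: make the hw frame counter look as if it had
 * incremented at the leading edge of vblank.
 *
 * hw_count:  raw counter, assumed to increment when vpos wraps to 0.
 * vpos:      current scanline, sampled consistently with hw_count.
 * vbl_start: first scanline of vblank.
 * irq_lead:  scanlines by which the vblank irq may fire early (assumed).
 */
static uint32_t cooked_vblank_count(uint32_t hw_count, uint32_t vpos,
                                    uint32_t vbl_start, uint32_t irq_lead)
{
    uint32_t threshold = vbl_start > irq_lead ? vbl_start - irq_lead : 0;

    /* At or past the threshold the raw counter still shows the old
     * frame, so report the value it will have after the wrap. */
    if (vpos >= threshold)
        hw_count++;

    return hw_count & HW_COUNT_MASK;
}
```

Once the scanout position has wrapped back to 0 the raw counter has 
caught up, so the cooked value stays the same across the wrap instead of 
jumping twice.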

It almost sort of works on the rs600 code path, but i need a bit of info 
from you:

1. There's this register from the old m76.pdf specs, which is not 
part of the current register defines for radeon-kms:

"D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"

It contains the lower 16 bits of the framecounter and the 13 bits of the 
vertical scanout position. It seems to give the same readings as the 24 
bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
would come in handy.

Do Evergreen and later asics have the same or a similar register, and 
where is it?

2. The hw framecounter seems to increment when the vertical scanout 
position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
gpu i tested so far. Is this so on all asics? And is the hw counter 
increment happening exactly at the moment that vertical scanout position 
jumps back to zero, ie. both events are driven by the same signal? Or is 
the framecounter increment just happening somewhere inside either 
scanline VTOTAL-1 or scanline 0?
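
The reason the combined register from question 1 would help: a single 
read yields frame count and scanline as one snapshot, with no chance of 
the counter wrapping between two separate reads. A minimal sketch of the 
decode follows. The struct and function names are hypothetical, and the 
bit positions are deliberately left as parameters because the layout is 
not stated here and would have to come from the hw spec.

```c
#include <stdint.h>

/* Bit layout of a combined frame/scanline register. Every field is a
 * placeholder to be filled in from the hw spec, not a known value. */
struct vf_layout {
    unsigned frame_shift, frame_bits; /* frame counter field */
    unsigned vpos_shift, vpos_bits;   /* vertical position field */
};

struct vf_sample {
    uint32_t frame; /* low bits of the hw frame counter */
    uint32_t vpos;  /* vertical scanout position */
};

/* Extract a field of 'bits' width (must be < 32) starting at 'shift'. */
static uint32_t vf_field(uint32_t reg, unsigned shift, unsigned bits)
{
    return (reg >> shift) & ((1u << bits) - 1u);
}

/* Split one register read into a consistent (frame, vpos) pair. */
static struct vf_sample vf_decode(uint32_t reg, const struct vf_layout *l)
{
    struct vf_sample s = {
        .frame = vf_field(reg, l->frame_shift, l->frame_bits),
        .vpos  = vf_field(reg, l->vpos_shift, l->vpos_bits),
    };
    return s;
}
```

The pair returned by vf_decode() is what a counter-cooking routine would 
want as input, since both values then describe the same instant.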


If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
regression and, with a bit of luck, improve things at the same time: we 
could then set dev->vblank_disable_immediate = true and allow vblank irqs 
to get turned off more aggressively for a bit of extra power saving.

thanks,
-mario
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fixupForDRM.patch
Type: text/x-patch
Size: 3373 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20151119/ce6d9daa/attachment.bin>

