[Intel-gfx] [PATCH 3/3] drm/i915: Improve the accuracy of get_scanout_pos on CTG+

Thu Sep 26 19:04:05 CEST 2013

On 25.09.13 10:11, Ville Syrjälä wrote:
> On Wed, Sep 25, 2013 at 05:12:54AM +0200, Mario Kleiner wrote:
>>
>>
>> On 23.09.13 12:02, ville.syrjala at linux.intel.com wrote:
>>> From: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>>
>>> The DSL register increments at the start of horizontal sync, so it
>>> manages to miss the entire active portion of the current line.
>>>
>>> Improve the get_scanoutpos accuracy a bit when the scanout position is
>>> close to the start or end of vblank. We can do that by double checking
>>> the DSL value against the vblank status bit from ISR.
>>>
>>> Cc: Mario Kleiner <mario.kleiner at tuebingen.mpg.de>
>>> Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_irq.c | 53 +++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 53 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>> index 4f74f0c..14b42d9 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -567,6 +567,47 @@ static u32 gm45_get_vblank_counter(struct drm_device *dev, int pipe)
>>>    	return I915_READ(reg);
>>>    }
>>>
>>> +static bool g4x_pipe_in_vblank(struct drm_device *dev, enum pipe pipe)
>>> +{
>>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>>> +	uint32_t status;
>>> +
>>> +	if (IS_VALLEYVIEW(dev)) {
>>> +		status = pipe == PIPE_A ?
>>> +			I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT :
>>> +			I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;
>>> +
>>> +		return I915_READ(VLV_ISR) & status;
>>> +	} else if (IS_G4X(dev)) {
>>> +		status = pipe == PIPE_A ?
>>> +			I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT :
>>> +			I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;
>>> +
>>> +		return I915_READ(ISR) & status;
>>> +	} else if (INTEL_INFO(dev)->gen < 7) {
>>> +		status = pipe == PIPE_A ?
>>> +			DE_PIPEA_VBLANK :
>>> +			DE_PIPEB_VBLANK;
>>> +
>>> +		return I915_READ(DEISR) & status;
>>> +	} else {
>>> +		switch (pipe) {
>>> +		default:
>>> +		case PIPE_A:
>>> +			status = DE_PIPEA_VBLANK_IVB;
>>> +			break;
>>> +		case PIPE_B:
>>> +			status = DE_PIPEB_VBLANK_IVB;
>>> +			break;
>>> +		case PIPE_C:
>>> +			status = DE_PIPEC_VBLANK_IVB;
>>> +			break;
>>> +		}
>>> +
>>> +		return I915_READ(DEISR) & status;
>>> +	}
>>> +}
>>> +
>>>    static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe,
>>>    			     int *vpos, int *hpos)
>>>    {
>>> @@ -616,6 +657,18 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe,
>>>    		 * scanout position from Display scan line register.
>>>    		 */
>>>    		position = I915_READ(PIPEDSL(pipe)) & 0x1fff;
>>> +
>>> +		/*
>>> +		 * The scanline counter increments at the leading edge
>>> +		 * of hsync, ie. it completely misses the active portion
>>> +		 * of the line. Fix up the counter at both edges of vblank
>>> +		 * to get a more accurate picture whether we're in vblank
>>> +		 * or not.
>>> +		 */
>>> +		in_vbl = g4x_pipe_in_vblank(dev, pipe);
>>> +		if ((in_vbl && position == vbl_start - 1) ||
>>> +		    (!in_vbl && position == vbl_end - 1))
>>> +			position = (position + 1) % vtotal;
>>>    	} else {
>>>    		/* Have access to pixelcount since start of frame.
>>>    		 * We can split this into vertical and horizontal
>>>
>>
>> This one i don't know. I think i can't follow the logic, but i don't
>> know enough about the way the intel hw counts.
>>
>> Do you mean the counter increments when the scanline is over, instead of
>> when it begins?
>
> Let me draw a picture of the scanline (not to scale):
>
>   |XXXXXXXXXXXXX|-----|___________|---|
>    horiz. active       horiz. sync
>   ^                   ^
>   |                   |
>   first pixel         this is where the
>   of the line         scanline counter increments
>
>> With this correction by +1 at the edges of vblank, the scanlines at
>> vbl_start and vbl_end would be reported twice, for two successive
>> scanline durations, that seems a bit weird and asymmetric to the rest of
>> the scanline positions. Wouldn't it make more sense to simply always add
>> 1 for a smaller overall error, given that hblank is shorter than the
>> active scanout part of a scanline?
>
> Since the counter increments too late, drm_handle_vblank()
> may get the wrong idea ie. something like this may happen:
>
> 1. vblank irq triggered
> 2. drm_handle_vblank() gets called
> 3. i915_get_crtc_scanoutpos() returns vbl_start-1 as the scanline
> 4. delta_ns calculation gets confused and tries to correct for it
>
> Now, the correction you do for delta_ns should handle this, but
> I don't like having such kludges in common code, and we can handle
> it in the driver as I've demonstrated. But yeah, I suppose it can
> make the error slightly less stable.
>

The kludges are also needed for other drivers, especially some radeon 
gpu's which can fire their vblank interrupts multiple scanlines before 
the vblank, and iirc nouveau with the prototype patches we had. I like 
that catch-all for robustness, especially on gpu's whose behaviour is 
not that well documented as in case of intel, but of course it's better 
if a driver does the right thing from the start.

> For some other uses (atomic page flip stuff) of the scanline position,
> I definitely want this correction since I need accurate information
> whether the position has passed vblank start.
>

Ok, i can live with that.

>> Also it adds back one lock protected, therefore potentially slow,
>> register read into the time critical code.
>
> I don't think a single register read should be _that_ slow even
> with all the extra junk we do. And of course we can fix that problem.
>

Ja. If you make g4x_pipe_in_vblank() or maybe a helper to just return 
the register and status mask, we can do the actual register read also as 
a __raw read together with the raw read of scanout position regs, inside 
that ktime_get() enclosed section. Then it's cheap and rt compatible and 
done with that one uncore.lock lock/unlock and everybody's happy.

-mario