[PATCH 1/2] drm/i915/display: remove small micro-optimizations in irq handling
Tvrtko Ursulin
tursulin at ursulin.net
Thu Apr 18 10:57:27 UTC 2024
On 18/04/2024 10:49, Jani Nikula wrote:
> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
>>> The raw register reads/writes are there as micro-optimizations to avoid
>>> multiple pointer indirections on uncore->regs. Presumably this is useful
>>> when there are plenty of register reads/writes in the same
>>> function. However, the display irq handling only has a few raw
>>> reads/writes. Remove them for simplification.
>>
>> I think that comment didn't age well. Not to say there's something wrong
>> with this commit, but just to make sure we are aware of the additional
>> stuff going on and we if we are ok with that.
>>
>> using intel_de_read() in place of raw_reg_read() will do (for newer
>> platforms):
>>
>> 1) Read FPGA_DBG to detect unclaimed access before the actual read
>> 2) Find the relevant forcewake for that register, acquire and wait for ack
>> 3) readl(reg)
>> 4) Read FPGA_DBG to detect unclaimed access after the actual read
>> 5) Trace reg rw
>>
>> That's much more than a pointer indirection. Are we ok with that in the
>> irq? Also, I don't know why but we have variants to skip tracing (step
>> 5 above), but on my books a disabled tracepoint is order of magnitudes
>> less overhead than 1, 2 and 4.
>
> Honestly, I don't really know.
>
> The thing is, we have these ad hoc optimizations all over the place. Why
> do we have the raw access in two places, but not everywhere in irq
> handling? The pointer indirection thing really only makes sense if you
> have a lot of access in a function, but that's not the case. You do have
> a point about everything else.
The "why only two" places is I think simply an artefact of refactoring
and code evolution. Initially all IRQ handling was in one function, then
later gen11 and display parts got split out as more platforms were
added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display
interrupt handling").
As for the original rationale, it was described in commits like:
2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")
c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")
Obviosuly, once a portion of a handler was/is extracted, pointer caching
to avoid uncore->regs reloads may not make full sense any more due
function calls potentially overshadowing that cost.
As for unclaimed debug, I would say it is probably okay to not burden
the irq handlers with it, but if the display folks think a little bit of
extra cost in this sub-handlers is fine that would sound plausible to me
given the frequency of display related interrupts is low. So for me
patch is fine if it makes the display decoupling easier.
> What would the interface be like if display were its own module? We
> couldn't just wrap it all in a bunch of macros and static inlines. Is
> the end result that display irq handling needs to call functions via
> pointers in another module? Or do we need to move the register level irq
> handling to xe and i915 cores, and handle the display parts at a higher
> abstraction level?
AFAIR no trace variants were not for performance but to avoid log spam
when debugging stuff. From things like busy/polling loops.
Regards,
Tvrtko
>>
>> btw, if we drop the raw accesses, then we can probably drop (1) above.
>>
>> Lucas De Marchi
>>
>>>
>>> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
>>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>> index f846c5b108b5..d4ae9139be39 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
>>>
>>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
>>> {
>>> - void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>> u32 iir;
>>>
>>> if (!(master_ctl & GEN11_GU_MISC_IRQ))
>>> return 0;
>>>
>>> - iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
>>> + iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
>>> if (likely(iir))
>>> - raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
>>> + intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
>>>
>>> return iir;
>>> }
>>> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
>>>
>>> void gen11_display_irq_handler(struct drm_i915_private *i915)
>>> {
>>> - void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>> - const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
>>> + u32 disp_ctl;
>>>
>>> disable_rpm_wakeref_asserts(&i915->runtime_pm);
>>> /*
>>> * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
>>> * for the display related bits.
>>> */
>>> - raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
>>> + disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
>>> +
>>> + intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
>>> gen8_de_irq_handler(i915, disp_ctl);
>>> - raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
>>> - GEN11_DISPLAY_IRQ_ENABLE);
>>> + intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
>>>
>>> enable_rpm_wakeref_asserts(&i915->runtime_pm);
>>> }
>>> --
>>> 2.39.2
>>>
>
More information about the Intel-gfx
mailing list