[PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

Robert Bragg robert at sixbynine.org
Thu Apr 21 15:01:43 UTC 2016


On Thu, Apr 21, 2016 at 12:16 AM, Chris Wilson <chris at chris-wilson.co.uk>
wrote:

> On Wed, Apr 20, 2016 at 03:23:10PM +0100, Robert Bragg wrote:
> > +static int hsw_enable_metric_set(struct drm_i915_private *dev_priv)
> > +{
> > +     int ret = i915_oa_select_metric_set_hsw(dev_priv);
> > +
> > +     if (ret)
> > +             return ret;
> > +
> > +     I915_WRITE(GDT_CHICKEN_BITS, GT_NOA_ENABLE);
> > +
> > +     /* PRM:
> > +      *
> > +      * OA unit is using “crclk” for its functionality. When trunk
> > +      * level clock gating takes place, OA clock would be gated,
> > +      * unable to count the events from non-render clock domain.
> > +      * Render clock gating must be disabled when OA is enabled to
> > +      * count the events from non-render domain. Unit level clock
> > +      * gating for RCS should also be disabled.
> > +      */
> > +     I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
> > +                                 ~GEN7_DOP_CLOCK_GATE_ENABLE));
> > +     I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
> > +                               GEN6_CSUNIT_CLOCK_GATE_DISABLE));
> > +
> > +     config_oa_regs(dev_priv, dev_priv->perf.oa.mux_regs,
> > +                    dev_priv->perf.oa.mux_regs_len);
> > +
> > +     /* It takes a fairly long time for a new MUX configuration to
> > +      * be applied after these register writes. This delay
> > +      * duration was derived empirically based on the render_basic
> > +      * config but hopefully it covers the maximum configuration
> > +      * latency...
> > +      */
> > +     mdelay(100);
>
> You really want to busy spin for 100ms? msleep() perhaps!
>

Ah, oops, I forgot to change this, thanks!


>
> Did you look for some register you can observe the change in when the
> mux is reconfigured? Is even reading one of the OA registers enough?
>

I can't really explain why the delay apparently needs to be quite so long.
Based on my limited understanding of the NOA microarchitecture involved
here, though, it makes some sense to me that there would be a delay, and
that it would vary somewhat with the particular MUX config. Unfortunately I
don't know of a trick for getting explicit feedback of completion.
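
If some observable status bit did turn up, the fixed delay could become a
bounded poll. The following is only a userspace sketch of that idea, not
driver code: the status bit, the fake_dev structure and all function names
are hypothetical, and the "device" is simulated so the loop can be
exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessors; a real driver would read an MMIO register and
 * sleep with the kernel's own primitives instead. */
typedef uint32_t (*read_reg_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned int ms);

/* Poll until all bits in mask are set, sleeping step_ms between reads.
 * Gives up once at least timeout_ms has been slept. Returns true if the
 * bits were observed, false on timeout. */
static bool poll_until_set(read_reg_fn read_reg, sleep_ms_fn sleep_ms,
                           void *ctx, uint32_t mask,
                           unsigned int step_ms, unsigned int timeout_ms)
{
    unsigned int waited = 0;

    if (step_ms == 0)
        step_ms = 1; /* avoid spinning forever without progress */

    for (;;) {
        if ((read_reg(ctx) & mask) == mask)
            return true;
        if (waited >= timeout_ms)
            return false;
        sleep_ms(ctx, step_ms);
        waited += step_ms;
    }
}

/* Simulated device: reports "ready" (bit 0) once enough time has passed. */
struct fake_dev {
    unsigned int elapsed_ms;
    unsigned int ready_after_ms;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;
    return d->elapsed_ms >= d->ready_after_ms ? 0x1u : 0x0u;
}

static void fake_sleep(void *ctx, unsigned int ms)
{
    struct fake_dev *d = ctx;
    d->elapsed_ms += ms; /* advance simulated time instead of sleeping */
}
```

With a simulated device that becomes ready after 30ms, polling every 10ms
with a 100ms limit returns as soon as the bit is seen, so the common case
would not pay for the worst-case latency.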

I did bring this up briefly in a recent discussion with others more
familiar with the HW side of things, but haven't had much feedback so far.
AFAIK other OS drivers don't currently account for the need for a delay
here.

For reference, I picked 100ms because I was stepping the delay up by orders
of magnitude and found that 10ms wasn't enough. I could experiment further
with delays between 10 and 100ms, but I suppose it won't make a big
difference.
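
Narrowing the range between a known-bad and a known-good delay could be
done with a simple bisection. This is a sketch only: delay_ok_fn stands in
for "program the MUX, wait this long, check the counters look sane", which
is not something the patch provides, and the search assumes that any delay
longer than a working one also works. Real hardware may be noisier than
that, so each probe would likely need repeating.

```c
#include <stdbool.h>

/* Hypothetical test hook: returns true if delay_ms was long enough. */
typedef bool (*delay_ok_fn)(unsigned int delay_ms, void *ctx);

/* Given a delay known to fail (lo_fail) and one known to work (hi_ok),
 * return the smallest working delay in between, assuming monotonicity. */
static unsigned int min_working_delay(unsigned int lo_fail,
                                      unsigned int hi_ok,
                                      delay_ok_fn ok, void *ctx)
{
    while (hi_ok - lo_fail > 1) {
        unsigned int mid = lo_fail + (hi_ok - lo_fail) / 2;

        if (ok(mid, ctx))
            hi_ok = mid;   /* mid works: the answer is mid or lower */
        else
            lo_fail = mid; /* mid fails: the answer is above mid */
    }
    return hi_ok;
}

/* Simulated hook: succeeds once the delay reaches a threshold in *ctx. */
static bool fake_delay_ok(unsigned int delay_ms, void *ctx)
{
    return delay_ms >= *(unsigned int *)ctx;
}
```

Starting from 10ms (fails) and 100ms (works), this needs at most seven
probes to find the threshold to the nearest millisecond.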



>
> > +     config_oa_regs(dev_priv, dev_priv->perf.oa.b_counter_regs,
> > +                    dev_priv->perf.oa.b_counter_regs_len);
> > +
> > +     return 0;
> > +}
> > +
> > +static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
> > +{
> > +     I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) &
> > +                               ~GEN6_CSUNIT_CLOCK_GATE_DISABLE));
> > +     I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) |
> > +                                 GEN7_DOP_CLOCK_GATE_ENABLE));
> > +
> > +     I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) &
> > +                                   ~GT_NOA_ENABLE));
>
> You didn't preserve any other chicken bits during enable_metric_set.
>

Hmm, good point. I think I'll aim to preserve other bits when setting if
that works, just in case something else needs to fiddle with the same
register later.
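
The read-modify-write pattern I have in mind looks roughly like the
following. This is a userspace illustration with a simulated register: the
demo_* names and the DEMO_NOA_ENABLE bit position are placeholders, not the
real GDT_CHICKEN_BITS layout, and in the driver the accessors would be
I915_READ()/I915_WRITE().

```c
#include <stdint.h>

/* Placeholder bit, not the real GT_NOA_ENABLE value. */
#define DEMO_NOA_ENABLE (1u << 0)

/* Simulated register backing store. */
static uint32_t demo_reg;

static uint32_t demo_read(void)
{
    return demo_reg;
}

static void demo_write(uint32_t val)
{
    demo_reg = val;
}

/* Set only the NOA enable bit, leaving every other bit as it was. */
static void demo_enable_noa(void)
{
    demo_write(demo_read() | DEMO_NOA_ENABLE);
}

/* Clear only the NOA enable bit, mirroring the disable path. */
static void demo_disable_noa(void)
{
    demo_write(demo_read() & ~DEMO_NOA_ENABLE);
}
```

With this shape the enable and disable paths are symmetric, so bits set by
other code survive a full enable/disable cycle.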


> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
>