[Intel-gfx] [PATCH] drm/i915/oa: Reconfigure contexts on the fly

Lionel Landwerlin lionel.g.landwerlin at intel.com
Fri Jul 5 13:06:25 UTC 2019


On 05/07/2019 15:54, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-05 13:42:51)
>> Wow nice. I didn't have the courage to actually write it, knowing how
>> easy it could be to screw up an offset and write at random in GGTT...
>>
>> I only have one concern below.
>>
>> Thanks a lot,
>>
>> -Lionel
>>
>> On 05/07/2019 15:30, Chris Wilson wrote:
>>>        CTX_REG(reg_state,
>>> @@ -1692,6 +1693,150 @@ gen8_update_reg_state_unlocked(struct intel_context *ce,
>>>                intel_sseu_make_rpcs(i915, &ce->sseu));
>>>    }
>>>    
>>> +struct flex {
>>> +     i915_reg_t reg;
>>> +     u32 offset;
>>> +     u32 value;
>>> +};
>>> +
>>> +static int
>>> +gen8_store_flex(struct i915_request *rq,
>>> +             struct intel_context *ce,
>>> +             const struct flex *flex, unsigned int count)
>>> +{
>>> +     u32 offset;
>>> +     u32 *cs;
>>> +
>>> +     cs = intel_ring_begin(rq, 4 * count);
>>> +     if (IS_ERR(cs))
>>> +             return PTR_ERR(cs);
>>
>> Is the ring of the kernel context large enough to hold the MI_SDIs for
>> all the contexts?
> At the moment we are using 9 registers, add in say 128 bytes tops for
> the request overhead, that's <300 bytes. The kernel context uses 4k
> rings, so enough for a few updates before we have to flush. We may have
> to wait for external rings to idle and be interrupted -- but that's the
> same as before, including the chance that we may be interrupted in the
> middle of conversion.
>
> The worst case is that we overrun the ring and we should get a juicy
> warning (GEM_BUG_ON or -ENOSPC) in that case. We can increase the
> kernel_context ring if that's an issue or just fall back to suballocating
> a batchbuffer for the updates.


Ah, thanks. I didn't notice it was one request per context to reconfigure.
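
For the record, running the numbers from the snippet above (4 dwords per
register write, as implied by intel_ring_begin(rq, 4 * count)):

    9 registers x 4 dwords x 4 bytes = 144 bytes of MI_SDIs
    144 bytes + ~128 bytes of request overhead = ~272 bytes

which is indeed comfortably under a 4k ring.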

Still, I wouldn't want this to fail somewhat randomly because something 
else stayed on the HW just a bit too long.


Maybe check the available space in the ring in 
gen8_configure_all_contexts() and wait on last_request if there isn't 
enough?
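
Something like this is what I have in mind (rough sketch only, the helper 
names are illustrative and I haven't checked the exact entry points):

static int ensure_kernel_ring_space(struct intel_context *ce,
				    u32 bytes_needed)
{
	struct intel_ring *ring = ce->ring;

	/* Refresh ring->space from the last known HEAD. */
	intel_ring_update_space(ring);
	if (ring->space >= bytes_needed)
		return 0;

	/*
	 * Not enough room for the whole reconfiguration: wait for the
	 * previous request on this timeline to retire first.
	 * "last_request" is hand-wavy here, whatever tracks the ring tail.
	 */
	return wait_for_last_request(ce); /* hypothetical helper */
}

and then call that from gen8_configure_all_contexts() before emitting the 
updates.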


-Lionel


>
>>> +static int
>>> +gen8_emit_oa_config(struct i915_request *rq,
>>> +                 struct intel_context *ce,
>>> +                 const struct i915_oa_config *oa_config,
>>> +                 bool self)
>>> +{
>>> +     struct drm_i915_private *i915 = rq->i915;
>>> +     /* The MMIO offsets for Flex EU registers aren't contiguous */
>>> +     const u32 ctx_flexeu0 = i915->perf.oa.ctx_flexeu0_offset;
>>> +#define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N))
>>> +     struct flex regs[] = {
>>> +             {
>>> +                     GEN8_R_PWR_CLK_STATE,
>>> +                     CTX_R_PWR_CLK_STATE,
>>> +                     intel_sseu_make_rpcs(i915, &ce->sseu)
>>> +             },
>>> +             {
>>> +                     GEN8_OACTXCONTROL,
>>> +                     i915->perf.oa.ctx_oactxctrl_offset,
>>> +                     ((i915->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
>>> +                      (i915->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
>>> +                      GEN8_OA_COUNTER_RESUME)
>>> +             },
>>> +             { EU_PERF_CNTL0, ctx_flexeuN(0) },
>>> +             { EU_PERF_CNTL1, ctx_flexeuN(1) },
>>> +             { EU_PERF_CNTL2, ctx_flexeuN(2) },
>>> +             { EU_PERF_CNTL3, ctx_flexeuN(3) },
>>> +             { EU_PERF_CNTL4, ctx_flexeuN(4) },
>>> +             { EU_PERF_CNTL5, ctx_flexeuN(5) },
>>> +             { EU_PERF_CNTL6, ctx_flexeuN(6) },
>>> +     };
>>> +#undef ctx_flexeuN
>>> +     int i;
>>> +
>>> +     for (i = 2; i < ARRAY_SIZE(regs); i++)
>>> +             regs[i].value = oa_config_flex_reg(oa_config, regs[i].reg);
>>
>> I guess this mapping could be done only once in
>> gen8_configure_all_contexts().
> Ah, it's just ce->sseu giving the appearance of it being context
> specific :)
> -Chris
>
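
Right - so in principle gen8_configure_all_contexts() could still build 
the regs[] table once and only refresh the RPCS entry per context, e.g. 
(sketch only):

	/* regs[] built once, outside the per-context loop */
	regs[0].value = intel_sseu_make_rpcs(i915, &ce->sseu);

but that's a detail.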


