[Intel-gfx] [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Sep 7 09:39:46 UTC 2018
On 07/09/2018 10:23, Lionel Landwerlin wrote:
> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>
>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>>>>>>>
>>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>
>>>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>> workaround batchbuffer but some concerns where raised about the
>>>>>>>> cost
>>>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>>>> without consequences from the userspace point of view.
>>>>>>>> Reprogramming
>>>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>>>> changed (which happens after context switch). This means for a
>>>>>>>> window
>>>>>>>> of time during the recording, counters recorded by the OA unit
>>>>>>>> might
>>>>>>>> be invalid. This requires userspace dealing with OA reports to
>>>>>>>> discard
>>>>>>>> the invalid values.
>>>>>>>>
>>>>>>>> Minimizing the reprogramming could be implemented by tracking of
>>>>>>>> the
>>>>>>>> last programmed configuration somewhere in GGTT and use
>>>>>>>> MI_PREDICATE
>>>>>>>> to discard some of the programming commands, but the command
>>>>>>>> streamer
>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>> workaround batchbuffer.
>>>>>>>>
>>>>>>>> Another solution, which this change implements, is to simply
>>>>>>>> disregard
>>>>>>>> the user requested configuration for the period of time when
>>>>>>>> i915/perf
>>>>>>>> is active. There is no known issue with this apart from a
>>>>>>>> performance
>>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>>>> powergating for the same reason.
>>>>>>>>
>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>
>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>> More to_intel_context() (Tvrtko)
>>>>>>>> s/dev_priv/i915/ (Tvrtko)
>>>>>>>>
>>>>>>>> Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> v4:
>>>>>>>> * Rebase for make_rpcs changes.
>>>>>>>>
>>>>>>>> v5:
>>>>>>>> * Apply OA restriction from make_rpcs directly.
>>>>>>>>
>>>>>>>> v6:
>>>>>>>> * Rebase for context image setup changes.
>>>>>>>>
>>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/i915/i915_perf.c | 5 +++++
>>>>>>>> drivers/gpu/drm/i915/intel_lrc.c | 30
>>>>>>>> ++++++++++++++++++++----------
>>>>>>>> drivers/gpu/drm/i915/intel_lrc.h | 3 +++
>>>>>>>> 3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> @@ -1677,6 +1677,11 @@ static void
>>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>> CTX_REG(reg_state, state_offset, flex_regs[i],
>>>>>>>> value);
>>>>>>>> }
>>>>>>>> +
>>>>>>>> + CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>>>> + gen8_make_rpcs(dev_priv,
>>>>>>>> + &to_intel_context(ctx,
>>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>>> I think there is one issue I missed on the previous iterations of
>>>>>>> this
>>>>>>> patch.
>>>>>>>
>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is
>>>>>>> parked
>>>>>>> on the kernel context.
>>>>>>>
>>>>>>> It's supposed to update all contexts, but I think we might not be
>>>>>>> able
>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>> parking or stalling userspace) so I don't care.
>>>>>
>>>>> The patch exposing the RPCS configuration to userspace will make
>>>>> use of
>>>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>>>> locked value that will break the power configuration stability on
>>>>> Gen11
>>>>> (because the locked configuration will be different from the kernel
>>>>> context configuration).
>>>> Sure, but as you point out that's only on changing configuration.
>>>>
>>>> What's missing in the patch is that we only bail early if the new sseu
>>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>>>> checking.)
>>>> -Chris
>>>>
>>>
>>> Yep, because the gen8_make_rpcs() post processes the values store at
>>> the gem context level, we risk rerunning the kernel context to write
>>> the exiting value.
>>> Sorry this is all so messy :(
>>
>> Lets see if I managed to follow here.
>>
>> The current code indeed bails out at the set ctx param level if the
>> requested state matches the ce->state. My thinking was that ce->state
>> is the master state and whatever happens in "post processing" via
>> gen8_make_rpcs should be hidden from it since the design is that the
>> i915_perf.c will re-configure all contexts when the OA active status
>> changes (to either direction).
>>
>> So I don't see a problem in those two interactions.
>
>
> Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) for ICL.
>
> You then enable OA which locks the configuration at (0x1,0xf).
>
> The kernel context has retained its (0x1/0xff) configuration.
>
>
> And after you change the config of contextA to (0x1,0x7).
>
>
> This would lead to the kernel context scheduled with (0x1,0xff) while OA
> is active.
Okay that's a problem discussed in the paragraph below - that the kernel
context is not updated at all. But is it a problem for OA? Will it mess
up some counters even if kernel context isn't executing anything
interacting with them? Or is it?
>
>>
>> Apart from one, get_param_sseu will lie a bit - we can discuss about
>> this one more. At one point I suggested we have two sets of masks in
>> the uAPI, requested and active in a way. So userspace could query what
>> it set and what is actually active.
>>
>> Now second issue is if i915_perf.c is able to reprogram the kernel
>> config.
>>
>> Here its true, it will write to the context image and that will get
>> overwritten by context save.
>>
>> If that is a problem for OA, I was initially if a throw-away second
>> "kernel" context could be use to re-program the real one, but perhaps
>> even simpler - what about a mmio write to program the RPCS while
>> kernel context is active?
>
>
> Documentation says : "This register must not be programmed directly
> through CPU MMIO cycle."
>
>
> Sorry :(
Ugh.. okay, help me understand if kernel context absolutely needs to
follow the "lock" for OA to work and then we'll see what to do.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list