[Intel-gfx] [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Fri Sep 7 09:23:36 UTC 2018
On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>
> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>> On 06/09/2018 11:22, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>>>>>>
>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>> configurations written in the NOA muxes.
>>>>>>>
>>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>> workaround batchbuffer but some concerns where raised about the
>>>>>>> cost
>>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>>> without consequences from the userspace point of view.
>>>>>>> Reprogramming
>>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>>> changed (which happens after context switch). This means for a
>>>>>>> window
>>>>>>> of time during the recording, counters recorded by the OA unit
>>>>>>> might
>>>>>>> be invalid. This requires userspace dealing with OA reports to
>>>>>>> discard
>>>>>>> the invalid values.
>>>>>>>
>>>>>>> Minimizing the reprogramming could be implemented by tracking of
>>>>>>> the
>>>>>>> last programmed configuration somewhere in GGTT and use
>>>>>>> MI_PREDICATE
>>>>>>> to discard some of the programming commands, but the command
>>>>>>> streamer
>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>> workaround batchbuffer.
>>>>>>>
>>>>>>> Another solution, which this change implements, is to simply
>>>>>>> disregard
>>>>>>> the user requested configuration for the period of time when
>>>>>>> i915/perf
>>>>>>> is active. There is no known issue with this apart from a
>>>>>>> performance
>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>>> powergating for the same reason.
>>>>>>>
>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>
>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>> More to_intel_context() (Tvrtko)
>>>>>>> s/dev_priv/i915/ (Tvrtko)
>>>>>>>
>>>>>>> Tvrtko Ursulin:
>>>>>>>
>>>>>>> v4:
>>>>>>> * Rebase for make_rpcs changes.
>>>>>>>
>>>>>>> v5:
>>>>>>> * Apply OA restriction from make_rpcs directly.
>>>>>>>
>>>>>>> v6:
>>>>>>> * Rebase for context image setup changes.
>>>>>>>
>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/i915/i915_perf.c | 5 +++++
>>>>>>> drivers/gpu/drm/i915/intel_lrc.c | 30
>>>>>>> ++++++++++++++++++++----------
>>>>>>> drivers/gpu/drm/i915/intel_lrc.h | 3 +++
>>>>>>> 3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>> @@ -1677,6 +1677,11 @@ static void
>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>> CTX_REG(reg_state, state_offset, flex_regs[i],
>>>>>>> value);
>>>>>>> }
>>>>>>> +
>>>>>>> + CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>>> + gen8_make_rpcs(dev_priv,
>>>>>>> + &to_intel_context(ctx,
>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>> I think there is one issue I missed on the previous iterations of
>>>>>> this
>>>>>> patch.
>>>>>>
>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is
>>>>>> parked
>>>>>> on the kernel context.
>>>>>>
>>>>>> It's supposed to update all contexts, but I think we might not be
>>>>>> able
>>>>>> to update the kernel context image while the GPU is using it.
>>>>> The kernel context is only ever taken in extremis (you are either
>>>>> parking or stalling userspace) so I don't care.
>>>>
>>>> The patch exposing the RPCS configuration to userspace will make
>>>> use of
>>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>>> locked value that will break the power configuration stability on
>>>> Gen11
>>>> (because the locked configuration will be different from the kernel
>>>> context configuration).
>>> Sure, but as you point out that's only on changing configuration.
>>>
>>> What's missing in the patch is that we only bail early if the new sseu
>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>>> checking.)
>>> -Chris
>>>
>>
>> Yep, because the gen8_make_rpcs() post processes the values store at
>> the gem context level, we risk rerunning the kernel context to write
>> the exiting value.
>> Sorry this is all so messy :(
>
> Lets see if I managed to follow here.
>
> The current code indeed bails out at the set ctx param level if the
> requested state matches the ce->state. My thinking was that ce->state
> is the master state and whatever happens in "post processing" via
> gen8_make_rpcs should be hidden from it since the design is that the
> i915_perf.c will re-configure all contexts when the OA active status
> changes (to either direction).
>
> So I don't see a problem in those two interactions.
Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) for ICL.
You then enable OA which locks the configuration at (0x1,0xf).
The kernel context has retained its (0x1/0xff) configuration.
And after you change the config of contextA to (0x1,0x7).
This would lead to the kernel context scheduled with (0x1,0xff) while OA
is active.
>
> Apart from one, get_param_sseu will lie a bit - we can discuss about
> this one more. At one point I suggested we have two sets of masks in
> the uAPI, requested and active in a way. So userspace could query what
> it set and what is actually active.
>
> Now second issue is if i915_perf.c is able to reprogram the kernel
> config.
>
> Here its true, it will write to the context image and that will get
> overwritten by context save.
>
> If that is a problem for OA, I was initially if a throw-away second
> "kernel" context could be use to re-program the real one, but perhaps
> even simpler - what about a mmio write to program the RPCS while
> kernel context is active?
Documentation says : "This register must not be programmed directly
through CPU MMIO cycle."
Sorry :(
-
Lionel
>
> Regards,
>
> Tvrtko
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20180907/e5bf28ef/attachment.html>
More information about the Intel-gfx
mailing list