[Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Mar 1 10:50:57 UTC 2022
On 28/02/2022 18:32, John Harrison wrote:
> On 2/28/2022 08:11, Tvrtko Ursulin wrote:
>> On 25/02/2022 17:39, John Harrison wrote:
>>> On 2/25/2022 09:06, Tvrtko Ursulin wrote:
>>>>
>>>> On 24/02/2022 19:19, John Harrison wrote:
>>>>
>>>> [snip]
>>>>
>>>>>>>>>> ./gt/uc/intel_guc_fwif.h: u32 execution_quantum;
>>>>>>>>>>
>>>>>>>>>> ./gt/uc/intel_guc_submission.c: desc->execution_quantum =
>>>>>>>>>> engine->props.timeslice_duration_ms * 1000;
>>>>>>>>>>
>>>>>>>>>> ./gt/intel_engine_types.h: unsigned long
>>>>>>>>>> timeslice_duration_ms;
>>>>>>>>>>
>>>>>>>>>> timeslice_store/preempt_timeout_store:
>>>>>>>>>> err = kstrtoull(buf, 0, &duration);
>>>>>>>>>>
>>>>>>>>>> So both kconfig and sysfs can already overflow GuC, not only
>>>>>>>>>> because of tick conversion internally but because at backend
>>>>>>>>>> level nothing was done for assigning 64-bit into 32-bit. Or I
>>>>>>>>>> failed to find where it is handled.
>>>>>>>>> That's why I'm adding this range check to make sure we don't
>>>>>>>>> allow overflows.
>>>>>>>>
>>>>>>>> Yes and no, this fixes it, but the first bug was not only due
>>>>>>>> GuC internal tick conversion. It was present ever since the u64
>>>>>>>> from i915 was shoved into u32 sent to GuC. So even if GuC used
>>>>>>>> the value without additional multiplication, bug was be there.
>>>>>>>> My point being when GuC backend was added timeout_ms values
>>>>>>>> should have been limited/clamped to U32_MAX. The tick discovery
>>>>>>>> is additional limit on top.
>>>>>>> I'm not disagreeing. I'm just saying that the truncation wasn't
>>>>>>> noticed until I actually tried using very long timeouts to debug
>>>>>>> a particular problem. Now that it is noticed, we need some method
>>>>>>> of range checking and this simple clamp solves all the truncation
>>>>>>> problems.
>>>>>>
>>>>>> Agreed in principle, just please mention in the commit message all
>>>>>> aspects of the problem.
>>>>>>
>>>>>> I think we can get away without a Fixes: tag since it requires
>>>>>> user fiddling to break things in unexpected ways.
>>>>>>
>>>>>> I would though put in a code a clamping which expresses both,
>>>>>> something like min(u32, ..GUC LIMIT..). So the full story is
>>>>>> documented forever. Or "if > u32 || > ..GUC LIMIT..) return
>>>>>> -EINVAL". Just in case GuC limit one day changes but u32 stays.
>>>>>> Perhaps internal ticks go away or anything and we are left with
>>>>>> plain 1:1 millisecond relationship.
>>>>> Can certainly add a comment along the lines of "GuC API only takes
>>>>> a 32bit field but that is further reduced to GUC_LIMIT due to
>>>>> internal calculations which would otherwise overflow".
>>>>>
>>>>> But if the GuC limit is > u32 then, by definition, that means the
>>>>> GuC API has changed to take a u64 instead of a u32. So there will
>>>>> no u32 truncation any more. So I'm not seeing a need to explicitly
>>>>> test the integer size when the value check covers that.
>>>>
>>>> Hmm I was thinking if the internal conversion in the GuC fw changes
>>>> so that GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS goes above u32, then to be
>>>> extra safe by documenting in code there is the additional limit of
>>>> the data structure field. Say the field was changed to take some
>>>> unit larger than a millisecond. Then the check against the GuC MAX
>>>> limit define would not be enough, unless that would account both for
>>>> internal implementation and u32 in the protocol. Maybe that is
>>>> overdefensive but I don't see that it harms. 50-50, but it's do it
>>>> once and forget so I'd do it.
>>> Huh?
>>>
>>> How can the limit be greater than a u32 if the interface only takes a
>>> u32? By definition the limit would be clamped to u32 size.
>>>
>>> If you mean that the GuC policy is in different units and those units
>>> might not overflow but ms units do, then actually that is already the
>>> case. The GuC works in us not ms. That's part of why the wrap around
>>> is so low, we have to multiply by 1000 before sending to GuC.
>>> However, that is actually irrelevant because the comparison is being
>>> done on the i915 side in i915's units. We have to scale the GuC limit
>>> to match what i915 is using. And the i915 side is u64 so if the
>>> scaling to i915 numbers overflows a u32 then who cares because that
>>> comparison can be done at 64 bits wide.
>>>
>>> If the units change then that is a backwards breaking API change that
>>> will require a manual driver code update. You can't just recompile
>>> with a new header and magically get an ms to us or ms to s conversion
>>> in your a = b assignment. The code will need to be changed to do the
>>> new unit conversion (note we already convert from ms to us, the GuC
>>> API is all expressed in us). And that code change will mean having to
>>> revisit any and all scaling, type conversions, etc. I.e. any
>>> pre-existing checks will not necessarily be valid and will need to be
>>> re-visted anyway. But as above, any scaling to GuC units has to be
>>> incorporated into the limit already because otherwise the limit would
>>> not fit in the GuC's own API.
>>
>> Yes I get that, I was just worried that u32 field in the protocol and
>> GUC_POLICY_MAX_EXEC_QUANTUM_MS defines are separate in the source code
>> and then how to protect against forgetting to update both in sync.
>>
>> Like if the protocol was changed to take nanoseconds, and firmware
>> implementation changed to support the full range, but define
>> left/forgotten at 100s. That would then overflow u32.
> Huh? If the API was updated to 'support the full range' then how can you
> get overflow by forgetting to update the limit? You could get
> unnecessary clamping, which hopefully would be noticed by whoever is
> testing the new API and/or whoever requested the change. But you can't
> get u32 overflow errors if all the code has been updated to u64.
1)
Change the protocol so that "u32 desc->execution_quantum" now takes nano seconds.
This now makes the maximum time 4.29.. seconds.
2)
Forget to update GUC_POLICY_MAX_EXEC_QUANTUM_MS from 100s, since for instance that part at that point still not part of the interface contract.
3)
User passes in 5 seconds.
Clamping check says all is good.
"engine->props.timeslice_duration_ms > GUC_POLICY_MAX_EXEC_QUANTUM_MS"
4)
Assignment was updated:
gt/uc/intel_guc_submission.c:
desc->execution_quantum = engine->props.timeslice_duration_ms * 1e6;
But someone did not realize field is u32.
desc->execution_quantum = engine->props.timeslice_duration_ms * 1e6;
Defensive solution:
if (overflows_type(engine->props.timeslice_duration_ms * 1e6, desc->execution_quantum))
drm_WARN_ON...
desc->execution_quantum = engine->props.timeslice_duration_ms * 1e6;
Regards,
Tvrtko
> John.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> John.
>>>
>>>>
>>>>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15
>>>>>>>>>>> +++++++++++++++
>>>>>>>>>>> drivers/gpu/drm/i915/gt/sysfs_engines.c | 14
>>>>>>>>>>> ++++++++++++++
>>>>>>>>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +++++++++
>>>>>>>>>>> 3 files changed, 38 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>>>>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>>>>>>>>> index e53008b4dd05..2a1e9f36e6f5 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>>>>>>>>>> @@ -389,6 +389,21 @@ static int intel_engine_setup(struct
>>>>>>>>>>> intel_gt *gt, enum intel_engine_id id,
>>>>>>>>>>> if (GRAPHICS_VER(i915) == 12 && engine->class ==
>>>>>>>>>>> RENDER_CLASS)
>>>>>>>>>>> engine->props.preempt_timeout_ms = 0;
>>>>>>>>>>> + /* Cap timeouts to prevent overflow inside GuC */
>>>>>>>>>>> + if (intel_guc_submission_is_wanted(>->uc.guc)) {
>>>>>>>>>>> + if (engine->props.timeslice_duration_ms >
>>>>>>>>>>> GUC_POLICY_MAX_EXEC_QUANTUM_MS) {
>>>>>>>>>>
>>>>>>>>>> Hm "wanted".. There's been too much back and forth on the GuC
>>>>>>>>>> load options over the years to keep track..
>>>>>>>>>> intel_engine_uses_guc work sounds like would work and read nicer.
>>>>>>>>> I'm not adding a new feature check here. I'm just using the
>>>>>>>>> existing one. If we want to rename it yet again then that would
>>>>>>>>> be a different patch set.
>>>>>>>>
>>>>>>>> $ grep intel_engine_uses_guc . -rl
>>>>>>>> ./i915_perf.c
>>>>>>>> ./i915_request.c
>>>>>>>> ./selftests/intel_scheduler_helpers.c
>>>>>>>> ./gem/i915_gem_context.c
>>>>>>>> ./gt/intel_context.c
>>>>>>>> ./gt/intel_engine.h
>>>>>>>> ./gt/intel_engine_cs.c
>>>>>>>> ./gt/intel_engine_heartbeat.c
>>>>>>>> ./gt/intel_engine_pm.c
>>>>>>>> ./gt/intel_reset.c
>>>>>>>> ./gt/intel_lrc.c
>>>>>>>> ./gt/selftest_context.c
>>>>>>>> ./gt/selftest_engine_pm.c
>>>>>>>> ./gt/selftest_hangcheck.c
>>>>>>>> ./gt/selftest_mocs.c
>>>>>>>> ./gt/selftest_workarounds.c
>>>>>>>>
>>>>>>>> Sounds better to me than intel_guc_submission_is_wanted. What
>>>>>>>> does the reader know whether "is wanted" translates to "is
>>>>>>>> actually used". Shrug on "is wanted".
>>>>>>> Yes, but isn't '_uses' the one that hits a BUG_ON if you call it
>>>>>>> too early in the boot up sequence? I never understood why that
>>>>>>> was necessary or why we need so many different ways to ask the
>>>>>>> same question. But this version already exists and definitely
>>>>>>> works without hitting any explosions.
>>>>>>
>>>>>> No idea if it causes a bug on, doesn't in the helper itself so
>>>>>> maybe you are saying it is called too early? Might be.. I think
>>>>>> over time the nice idea we had that "setup" and "init" phases of
>>>>>> engine setup clearly separated got destroyed a bit. There would
>>>>>> always be an option to move this clamping in a later phase, once
>>>>>> the submission method is known. One could argue that if the
>>>>>> submission method is not yet known at this point, it is even wrong
>>>>>> to clamp based on something which will only be decided later.
>>>>>> Because:
>>>>>>
>>>>>> int intel_engines_init(struct intel_gt *gt)
>>>>>> {
>>>>>> int (*setup)(struct intel_engine_cs *engine);
>>>>>> struct intel_engine_cs *engine;
>>>>>> enum intel_engine_id id;
>>>>>> int err;
>>>>>>
>>>>>> if (intel_uc_uses_guc_submission(>->uc)) {
>>>>>> gt->submission_method = INTEL_SUBMISSION_GUC;
>>>>>>
>>>>>> So this uses "uses", not "wanted". Presumably the point for having
>>>>>> "wanted" and "uses" is that they can disagree, in which case if
>>>>>> you clamp early based on "wanted" that suggests it could be wrong.
>>>>>
>>>>> Okay, looks like I was getting confused with intel_guc_is_used().
>>>>> That one blows up if called too early.
>>>>>
>>>>> I'll change it to _uses_ and repost, then.
>>>>
>>>> Check that it isn't called too early, before gt->submission_setup is
>>>> set.
>>> Obviously it is because it blew up. But I am not re-writing the
>>> driver start up sequence just to use the word 'use' instead of 'want'.
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>>>> And limit to class instead of applying to all engines looks
>>>>>>>>>> like a miss.
>>>>>>>>> As per follow up email, the class limit is not applied here.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> + drm_info(&engine->i915->drm, "Warning, clamping timeslice
>>>>>>>>>>> duration to %d to prevent possibly overflow\n",
>>>>>>>>>>> + GUC_POLICY_MAX_EXEC_QUANTUM_MS);
>>>>>>>>>>> + engine->props.timeslice_duration_ms =
>>>>>>>>>>> GUC_POLICY_MAX_EXEC_QUANTUM_MS;
>>>>>>>>>>
>>>>>>>>>> I am not sure logging such message during driver load is
>>>>>>>>>> useful. Sounds more like a confused driver which starts with
>>>>>>>>>> one value and then overrides itself. I'd just silently set the
>>>>>>>>>> value appropriate for the active backend. Preemption timeout
>>>>>>>>>> kconfig text already documents the fact timeouts can get
>>>>>>>>>> overriden at runtime depending on platform+engine. So maybe
>>>>>>>>>> just add same text to timeslice kconfig.
>>>>>>>>> The point is to make people aware if they compile with
>>>>>>>>> unsupported config options. As far as I know, there is no way
>>>>>>>>> to apply range checking or other limits to config defines.
>>>>>>>>> Which means that a user would silently get unwanted behaviour.
>>>>>>>>> That seems like a bad thing to me. If the driver is confused
>>>>>>>>> because the user built it in a confused manner then we should
>>>>>>>>> let them know.
>>>>>>>>
>>>>>>>> Okay, but I think make it notice low level.
>>>>>>>>
>>>>>>>> Also consider in patch 3/3 when you triple it, and then clamp
>>>>>>>> back down here. That's even more confused state since tripling
>>>>>>>> gets nerfed. I think that's also an argument to always account
>>>>>>>> preempt timeout in heartbeat interval calculation. Haven't got
>>>>>>>> to your reply on 2/3 yet though..
>>>>>>> That sounds like even more reason to make sure the warning gets
>>>>>>> seen. The more complex the system and the more chances there are
>>>>>>> to get it wrong, the more important it is to have a nice easy to
>>>>>>> see and understand notification that it did go wrong.
>>>>>>
>>>>>> I did not disagree, just said make it notice, one level higher
>>>>>> than info! :)
>>>>> But then it won't appear unless you have explicitly said an
>>>>> elevated debug level. Whereas info appears in dmesg by default (but
>>>>> is still not classed as an error by CI and such).
>>>>
>>>> Notice is higher than info! :) If info appears by default so does
>>>> notice, warning, err, etc...
>>> Doh! I could have sworn those were the other way around.
>>>
>>> Okay. Will update to use notice :).
>>>
>>>>
>>>> #define KERN_EMERG KERN_SOH "0" /* system is unusable */
>>>> #define KERN_ALERT KERN_SOH "1" /* action must be taken
>>>> immediately */
>>>> #define KERN_CRIT KERN_SOH "2" /* critical conditions */
>>>> #define KERN_ERR KERN_SOH "3" /* error conditions */
>>>> #define KERN_WARNING KERN_SOH "4" /* warning conditions */
>>>> #define KERN_NOTICE KERN_SOH "5" /* normal but significant
>>>> condition */
>>>> #define KERN_INFO KERN_SOH "6" /* informational */
>>>> #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
>>>>
>>>>>> But also think how, if we agree to go with tripling, that you'd
>>>>>> have to consider that in the sysfs store when hearbeat timeout is
>>>>>> written, to consider whether or not to triple and error out if
>>>>>> preemption timeout is over limit.
>>>>> I see this as just setting the default values. If an end user is
>>>>> explicitly overriding the defaults then we should obey what they
>>>>> have requested. If they are changing the heartbeat interval then
>>>>> they can also change the pre-emption timeout appropriately.
>>>>
>>>> Question is can they unknowingly and without any feedback configure
>>>> a much worse state than they expect? Like when they set heartbeats
>>>> up to some value, everything is configured as you intended - but if
>>>> you go over a certain hidden limit the overall scheme degrades in
>>>> some way. What is the failure mode here if you silently let them do
>>>> that?
>>> You can always configure things to be worse than expected. If you
>>> don't understand what you are doing then any control can make things
>>> worse instead of better. The assumption is that if a user is savvy
>>> enough to be writing to sysfs overrides of kernel parameters then
>>> they know what those parameters are and what their implications are.
>>> If they want to set a very short heartbeat with a very long
>>> pre-emption timeout then its their problem if they hit frequent TDRs.
>>> Conversely, if they want to set a very long heartbeat with a very
>>> short pre-emption timeout then its still their problem if they hit
>>> frequent TDRs.
>>>
>>> But if the user explicitly requests a heartbeat period of 3s and a
>>> pre-emption timeout of 2s and the i915 arbitrarily splats their 2s
>>> and makes it 9s then that is wrong.
>>>
>>> We should give the driver defaults that work for the majority of
>>> users and then let the minority specify exactly what they need.
>>>
>>> And there is no silent or hidden limit. If the user specifies a value
>>> too large then they will get -EINVAL. Nothing hidden or silent about
>>> that. Any other values are legal and the behaviour will be whatever
>>> has been requested.
>>>
>>> John.
>>>
>>>
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>
>
More information about the Intel-gfx
mailing list