[RFC PATCH 00/97] Basic GuC submission support in the i915

Tue May 11 08:01:33 UTC 2021

On 10/05/2021 19:25, Jason Ekstrand wrote:
> On May 10, 2021 08:55:55 Martin Peres <martin.peres at free.fr> wrote:
> 
>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres at free.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>> upstreaming plan covered in the following RFC [1].
>>>>>
>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>> communicates with the GuC and the GuC communicates with the GPU.
>>>>
>>>> May I ask what will GuC command submission do that execlist won't/can't
>>>> do? And what would be the impact on users? Even forgetting the troubled
>>>> history of GuC (instability, performance regression, poor level of user
>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>> and doubling the amount of validation needed should come with a
>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>> mind providing the rationale behind this work?
>>>>
>>>>>
>>>>> GuC submission will be disabled by default on all current upstream
>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>> enable submission and HuC loading via the GuC. GuC submission should
>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>
>>>> What is the plan here when it comes to keeping support for execlist? I
>>>> am afraid that landing GuC support in Linux is the first step towards
>>>> killing the execlist, which would force users to use proprietary
>>>> firmwares that even most Intel engineers have little influence over.
>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC scheduling"
>>>> which states "Disable semaphores when using GuC scheduling as semaphores
>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>> that even Intel developers seem to prefer working around the GuC
>>>> firmware, rather than fixing it.
>>>
>>> Yes, landing GuC support may be the first step in removing execlist
>>> support. The inevitable reality is that GPU scheduling is coming and
>>> likely to be there only path in the not-too-distant future. (See also
>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>> judgement on whether or not this is a good thing.  I'm just reading the
>>> winds and, in my view, this is where things are headed for good or ill.
>>>
>>> In answer to the question above, the answer to "what do we gain from
>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet and,
>>> again, I'm not necessarily advocating for it, but that is likely where
>>> things are headed.
>>
>> This will be a sad day, especially since it seems fundamentally opposed
>> with any long-term support, on top of taking away user freedom to
>> fix/tweak their system when Intel won't.
>>
>>> A firmware-based submission model isn't a bad design IMO and, aside from
>>> the firmware freedom issues, I think there are actual advantages to the
>>> model. Immediately, it'll unlock a few features like parallel submission
>>> (more on that in a bit) and long-running compute because they're
>>> implemented in GuC and the work to implement them properly in the
>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>> guarantees) unlock some performance by getting the kernel out of the way.
>>
>> Oh, I definitely agree with firmware-based submission model not being a
>> bad design. I was even cheering for it in 2015. Experience with it made
>> me regret that deeply since :s
>>
>> But with the DRM scheduler being responsible for most things, I fail to
>> see what we could offload in the GuC except context switching (like
>> every other manufacturer). The problem is, the GuC does way more than
>> just switching registers in bulk, and if the number of revisions of the
>> GuC is anything to go by, it is way too complex for me to feel
>> comfortable with it.
> 
> It's more than just bulk register writes. When it comes to 
> load-balancing multiple GPU users, firmware can theoretically preempt 
> and switch faster leading to more efficient time-slicing. All we really 
> need the DRM scheduler for is handling implicit dma_fence dependencies 
> between different applications.

Right, this makes sense. However, if the GuC's interface was so simple, 
I doubt it would be at major version 60 already :s

I don't disagree with FW-based command submission, as it has a lot of 
benefits. I just don't like the route of going with a firmware no-one 
else than Intel can work on, *and* one that doesn't seem to concern 
itself with stable interfaces, and how i915 will have to deal with every 
generation using different interfaces (assuming the firmware was bug-free).

> 
> 
>>>> In the same vein, I have another concern related to the impact of GuC on
>>>> Linux's stable releases. Let's say that in 3 years, a new application
>>>> triggers a bug in command submission inside the firmware. Given that the
>>>> Linux community cannot patch the GuC firmware, how likely is it that
>>>> Intel would release a new GuC version? That would not be necessarily
>>>> such a big problem if newer versions of the GuC could easily be
>>>> backported to this potentially-decade-old Linux version, but given that
>>>> the GuC seems to have ABI-breaking changes on a monthly cadence (we are
>>>> at major version 60 *already*? :o), I would say that it is
>>>> highly-unlikely that it would not require potentially-extensive changes
>>>> to i915 to make it work, making the fix almost impossible to land in the
>>>> stable tree... Do you have a plan to mitigate this problem?
>>>>
>>>> Patches like "drm/i915/guc: Disable bonding extension with GuC
>>>> submission" also make me twitch, as this means the two command
>>>> submission paths will not be functionally equivalent, and enabling GuC
>>>> could thus introduce a user-visible regression (one app used to work,
>>>> then stopped working). Could you add in the commit's message a proof
>>>> that this would not end up being a user regression (in which case, why
>>>> have this codepath to begin with?).
>>>
>>> I'd like to address this one specifically as it's become something of a
>>> speciality of mine the past few weeks. The current bonded submission
>>> model is bad. It provides a plethora of ways for a client to back itself
>>> into a corner and doesn't actually provide the guarantees the media
>>> driver needs for its real-time high-resolution decode. It's bad enough
>>> we're seriously considering ripping it out, backwards compatibility or
>>> not. The good news is that very little that your average desktop user
>>> does depends on it: basically just real-time >4K video decode.
>>>
>>> The new parallel submit API is much better and should be the path
>>> forward. (We should have landed parallel submit the first time around.)
>>> It isn't full of corners and does let us provides actual parallel
>>> execution guarantees. It also gives the scheduler the information it
>>> needs to reliably provide those guarantees. >
>>> If we need to support the parallel submit API with the execlist
>>> back-end, that's totally possible. The choice to only implement the
>>> parallel submit API with GuC is a pragmatic one. We're trying to get
>>> upstream back on it's feet and get all the various up-and-coming bits of
>>> hardware enabled. Enabling the new API in the execlist back-end makes
>>> that pipeline longer.
>>
>> I feel your pain, and wish you all the best to get GEM less complex
>> and more manageable.
>>
>> So, if I understood correctly, the plan is just to regress 4K+ video
>> decoding for people who do not enable GuC scheduling, or did not also
>> update to a recent-enough media driver that would support this new
>> interface? If it is indeed only for over 4K videos, then whatever. If it
>> is 4K, it starts being a little bad, assuming graceful fallback to
>> CPU-based decoding. What's the test plan for this patch then? The patch
>> in its current form is definitely not making me confident.
> 
> My understanding is that it's only >4k that's affected; we've got enough 
> bandwidth on a single VCS for 4K. I'm not sure where the exact cut-off 
> is (it may be a little higher than 4k) but real-time 4k should be fine 
> and real-time 8k requires parallel submit. So we're really not cutting 
> off many use-cases. Also, as I said above, the new API can be 
> implemented with the execlist scheduler if needed. We've just 
> pragmatically deprioritized it.

Sounds like a niche-enough use case to me that I feel no user would 
complain about it.

Martin

> 
> --Jason
> 
> 
>>>> Finally, could you explain why IGT tests need to be modified to work the
>>>> GuC [1], and how much of the code in this series is covered by
>>>> existing/upcoming tests? I would expect a very solid set of tests to
>>>> minimize the maintenance burden, and enable users to reproduce potential
>>>> issues found in this new codepath (too many users run with enable_guc=3,
>>>> as can be seen on Google[2]).
>>>
>>> The IGT changes, as I understand them, are entirely around switching to
>>> the new parallel submit API. There shouldn't be a major effect to most
>>> users.
>>
>> Right, this part I followed, but failed to connect it to the GuC...
>> because I couldn't see why it would be needed (execlist requiring a lot
>> more work).
>>
>> I sincerely wish for the GuC to stay away from upstream because of the
>> above concerns (which are yet to be addressed), but if Intel were to
>> push forward with the plan to drop execlist, I can foresee a world of
>> trouble for users... That is of course unless the GuC were to be open
>> sourced, with people outside of Intel able to sign their own builds or
>> run unsigned. Failing that, let's hope the last 6 years were just a bad
>> start, and the rapid climb in major version of the GuC will magically
>> stop! I hope execlists will remain at feature parity with the GuC when
>> possible... but deplore the increase in validation needs which will only
>> hurt users in the end.
>>
>> Thanks for your honest answer,
>> Martin
>>
>>>
>>> --Jason
>