[RFC PATCH 00/97] Basic GuC submission support in the i915

Martin Peres martin.peres at free.fr
Wed May 12 06:26:14 UTC 2021


On 11/05/2021 19:39, Matthew Brost wrote:
> On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
>>> -----Original Message-----
>>> From: Martin Peres <martin.peres at free.fr>
>>> Sent: Tuesday, May 11, 2021 1:06 AM
>>> To: Daniel Vetter <daniel at ffwll.ch>
>>> Cc: Jason Ekstrand <jason at jlekstrand.net>; Brost, Matthew
>>> <matthew.brost at intel.com>; intel-gfx <intel-gfx at lists.freedesktop.org>;
>>> dri-devel <dri-devel at lists.freedesktop.org>; Ursulin, Tvrtko
>>> <tvrtko.ursulin at intel.com>; Ekstrand, Jason <jason.ekstrand at intel.com>;
>>> Ceraolo Spurio, Daniele <daniele.ceraolospurio at intel.com>; Bloomfield, Jon
>>> <jon.bloomfield at intel.com>; Vetter, Daniel <daniel.vetter at intel.com>;
>>> Harrison, John C <john.c.harrison at intel.com>
>>> Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
>>>
>>> On 10/05/2021 19:33, Daniel Vetter wrote:
>>>> On Mon, May 10, 2021 at 3:55 PM Martin Peres <martin.peres at free.fr>
>>> wrote:
>>>>>
>>>>> On 10/05/2021 02:11, Jason Ekstrand wrote:
>>>>>> On May 9, 2021 12:12:36 Martin Peres <martin.peres at free.fr> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 06/05/2021 22:13, Matthew Brost wrote:
>>>>>>>> Basic GuC submission support. This is the first bullet point in the
>>>>>>>> upstreaming plan covered in the following RFC [1].
>>>>>>>>
>>>>>>>> At a very high level the GuC is a piece of firmware which sits between
>>>>>>>> the i915 and the GPU. It offloads some of the scheduling of contexts
>>>>>>>> from the i915 and programs the GPU to submit contexts. The i915
>>>>>>>> communicates with the GuC and the GuC communicates with the
>>> GPU.
>>>>>>>
>>>>>>> May I ask what will GuC command submission do that execlist
>>> won't/can't
>>>>>>> do? And what would be the impact on users? Even forgetting the
>>> troubled
>>>>>>> history of GuC (instability, performance regression, poor level of user
>>>>>>> support, 6+ years of trying to upstream it...), adding this much code
>>>>>>> and doubling the amount of validation needed should come with a
>>>>>>> rationale making it feel worth it... and I am not seeing here. Would you
>>>>>>> mind providing the rationale behind this work?
>>>>>>>
>>>>>>>>
>>>>>>>> GuC submission will be disabled by default on all current upstream
>>>>>>>> platforms behind a module parameter - enable_guc. A value of 3 will
>>>>>>>> enable submission and HuC loading via the GuC. GuC submission
>>> should
>>>>>>>> work on all gen11+ platforms assuming the GuC firmware is present.
>>>>>>>
>>>>>>> What is the plan here when it comes to keeping support for execlist? I
>>>>>>> am afraid that landing GuC support in Linux is the first step towards
>>>>>>> killing the execlist, which would force users to use proprietary
>>>>>>> firmwares that even most Intel engineers have little influence over.
>>>>>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC
>>> scheduling"
>>>>>>> which states "Disable semaphores when using GuC scheduling as
>>> semaphores
>>>>>>> are broken in the current GuC firmware." is anything to go by, it means
>>>>>>> that even Intel developers seem to prefer working around the GuC
>>>>>>> firmware, rather than fixing it.
>>>>>>
>>>>>> Yes, landing GuC support may be the first step in removing execlist
>>>>>> support. The inevitable reality is that GPU scheduling is coming and
>>>>>> likely to be there only path in the not-too-distant future. (See also
>>>>>> the ongoing thread with AMD about fences.) I'm not going to pass
>>>>>> judgement on whether or not this is a good thing.  I'm just reading the
>>>>>> winds and, in my view, this is where things are headed for good or ill.
>>>>>>
>>>>>> In answer to the question above, the answer to "what do we gain from
>>>>>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
>>> and,
>>>>>> again, I'm not necessarily advocating for it, but that is likely where
>>>>>> things are headed.
>>>>>
>>>>> This will be a sad day, especially since it seems fundamentally opposed
>>>>> with any long-term support, on top of taking away user freedom to
>>>>> fix/tweak their system when Intel won't.
>>>>>
>>>>>> A firmware-based submission model isn't a bad design IMO and, aside
>>> from
>>>>>> the firmware freedom issues, I think there are actual advantages to the
>>>>>> model. Immediately, it'll unlock a few features like parallel submission
>>>>>> (more on that in a bit) and long-running compute because they're
>>>>>> implemented in GuC and the work to implement them properly in the
>>>>>> execlist scheduler is highly non-trivial. Longer term, it may (no
>>>>>> guarantees) unlock some performance by getting the kernel out of the
>>> way.
>>>>>
>>>>> Oh, I definitely agree with firmware-based submission model not being a
>>>>> bad design. I was even cheering for it in 2015. Experience with it made
>>>>> me regret that deeply since :s
>>>>>
>>>>> But with the DRM scheduler being responsible for most things, I fail to
>>>>> see what we could offload in the GuC except context switching (like
>>>>> every other manufacturer). The problem is, the GuC does way more than
>>>>> just switching registers in bulk, and if the number of revisions of the
>>>>> GuC is anything to go by, it is way too complex for me to feel
>>>>> comfortable with it.
>>>>
>>>> We need to flesh out that part of the plan more, but we're not going
>>>> to use drm scheduler for everything. It's only to handle the dma-fence
>>>> legacy side of things, which means:
>>>> - timeout handling for batches that take too long
>>>> - dma_fence dependency sorting/handling
>>>> - boosting of context from display flips (currently missing, needs to
>>>> be ported from drm/i915)
>>>>
>>>> The actual round-robin/preempt/priority handling is still left to the
>>>> backend, in this case here the fw. So there's large chunks of
>>>> code/functionality where drm/scheduler wont be involved in, and like
>>>> Jason says: The hw direction winds definitely blow in the direction
>>>> that this is all handled in hw.
>>>
>>> The plan makes sense for a SRIOV-enable GPU, yes.
>>>
>>> However, if the GuC is actually helping i915, then why not open source
>>> it and drop all the issues related to its stability? Wouldn't it be the
>>> perfect solution, as it would allow dropping execlist support for newer
>>> HW, and it would eliminate the concerns about maintenance of stable
>>> releases of Linux?
>>
>> That the major version of the FW is high is not due to bugs - Bugs don't trigger major version bumps anyway. 

Of course, where did I say they would?

> Only interface changes increment the major version, and we do add features, to keep it relevant to the evolving hardware and OS landscape. When only Windows used GuC there was no reason not to minimize interface creep - GuC and KMD are released as an atomic bundle on Windows. With Linux, this is no longer the case, and has not been for some time.

AFAIK, Intel has been shipping GuC to customers since gen9, and upstream 
has been supporting command submission (albeit in a broken form) for 
years... until Michal finally disabled it after I asked for it to a bit 
over 2 years ago[1], when GuC was at major version 32.

So... not sure I would trust your word so blindly here.

[1] 
https://patchwork.freedesktop.org/patch/297997/?series=58760&rev=2#comment_559594
>>
> 
> Jon hit the nail on head here - there hasn't been any reason not to bump the GuC
> version / change the interface until there is code upstream using the GuC. Once
> we push something that totally changes. Once SRIOV lands we literally can't the
> interface without breaking the world. Our goal is to this right before
> somethings lands, hence the high version number.

Good to hear! But Intel will continue to change the interface as new 
generations are made, so what is the support model for older GPUs / 
kernels which will be stuck on older major revisions?

> 
> Matt
> 
>> We have been using GuC as the sole mechanism for submission on Windows since Gen8, and it has proven very reliable. This is in large part because it is simple, and designed from day 1 as a cohesive solution alongside the hardware.

Exactly, the GuC was designed with Windows' GPU model... which is not 
directly applicable to Linux. Also, Windows does not care as much about 
submission latency, whereas most Linux users still depend on glamor for 
2D acceleration which is pretty much the biggest stress test for command 
submission latency. Also, features not used by the Windows driver or 
used in a different way are/will get broken (see the semaphore patch 
that works around it).

>>
>> Will there be bugs in the future? Of course. It's a new i915 backend. There are bugs in the execlist backend too, and the runlist backend, and the majority of real-world software ever written. But the i915 GuC backend is way simpler than execlist, much easier to understand, and therefore much easier to maintain. It's a net win for i915 and Linux.

I am more than willing to accept the fact that the interface would be 
easier to work with, and I welcome anything that will simplify the 
driver... but not at the expense of regressing the user experience. One 
has to prove more than *just* code maintainability.

Feel free to iterate/land the code, but enabling guc-based command 
submission is waaaaayyyy too early, no matter how much you want it. This 
patch will remain a NACK from me until I see more of the plan to support 
*users* who are willing to use a proprietary firmware, performance 
analysis, what's the plan for users who will not want to use it, and 
what are the capabilities of GuC which could be used for privilege 
escalation and what is done to mitigate that.

Thanks,
Martin

>>
>> Jon


More information about the dri-devel mailing list