[RFC] drm/xe/guc: Don't support GuC older GuC 70.x releases

Lucas De Marchi lucas.demarchi at intel.com
Wed Feb 7 20:40:21 UTC 2024


On Wed, Feb 07, 2024 at 10:34:07AM -0800, Daniele Ceraolo Spurio wrote:
>
>
>On 2/7/2024 8:42 AM, Lucas De Marchi wrote:
>>+Gustavo who is dealing with DMC firmware lately
>>
>>On Wed, Feb 07, 2024 at 03:30:59AM +0000, Matthew Brost wrote:
>>>On Tue, Feb 06, 2024 at 05:18:50PM -0800, John Harrison wrote:
>>>>On 2/6/2024 15:41, Daniele Ceraolo Spurio wrote:
>>>>> Supporting older GuC versions comes with baggage, both on the coding
>>>>> side (due to interfaces only being available from a certain version
>>>>> onwards) and on the testing side (due to having to make sure 
>>>>the driver
>>>>> works as expected with older GuCs).
>>>>> Since all of our Xe platform are still under force probe, we haven't
>>>>> committed to support any specific GuC version and we therefore don't
>>>>> need to support the older once, which means that we can force 
>>>>a bottom
>>>>> limit to what GuC we accept. This allows us to remove any conditional
>>>>> statements based on older GuC versions and also to approach newer
>>>>> additions knowing that we'll never attempt to load something older
>>>>> than our minimum requirement.
>>>>>
>>>>> RFC: this patch sets the minimum to the current GuC version (70.19),
>>>>> but that can be moved one way or the other. The main aim here is
>>>>Ideally, this would be bumped every time we update Xe to a newer 
>>>>firmware
>>>>version right up to the point when force probe is lifted. At 
>>>>that point it
>>>>becomes fixed and we have to add the version check support back in for
>>>>future w/a's and features.
>>>>
>>>>Get's my vote :).
>>
>>Yeah, but see my other reply... I think we will have to wait the
>>firmware being available in linux-firmware for that.
>>
>>Also, let's kickstart a discussion on our process with some
>>possible changes so we can get it documented. I think we have a good
>>opportunity here to start adopting the
>>https://gitlab.freedesktop.org/drm/firmware repo.
>>
>>Rough idea:
>>
>>1) use intel-staging branch with tags for pull requests to
>>   linux-firmware, just like documented in their readme.
>>   IMO the naming is rather unfortunate since it would be
>>   good to use it for (2) below.... but since it's already used
>>   we can use something else.
>>
>>   this would mainly replace the use we have today for
>>   https://cgit.freedesktop.org/drm/drm-firmware/ ,    which could 
>>be retired. From  upstream linux-firmware pov the only
>>   change would be the remote location and that we start using tags
>>   for the pull requests, coming from a single branch regardless of
>>   the firmware (guc, huc, dmc, gsc): intel-staging. Once accepted in
>>   linux-firmware, the branch is fast-forwarded.
>
>I think this needs a bit more fleshing out, because before we do a 
>pull request, we do want to run CI on the blobs. Also, in several 
>occasions we went through a couple of versions before we closed on 
>what to push to linux-firmware (e.g. in the latest push we started 
>with 70.19.1 but then pushed 70.19.2), so we can't go to intel-staging 
>until we're actually ready to push. I think the process you have below 
>for mmp blobs should work for this early testing flow as well, but we 
>might end up with a lot of noise in the staging-intel-for-CI branch.

that would be a throw away branch where we push stuff to be able to test
on CI. I don't think the commit history matters much there. The fact
that the firmware is available to match what is in the kernel and that
there's a documented process for using it in my view trumps the
this downside.

what I expect would be, considering the LNL case as example:

1) Start testing with the mmp version:

	a) Add firmware to  drm/firmware intel-staging-for-CI
	b) Add commit in topic/xe-for-CI on the kernel side to make
	   use of that firmware

2) Ooops, that has bugs

	a) add a second mmp firmware to drm/firmware intel-staging-for-CI
	b) replace commit in topic/xe-for-CI on the kernel side

3) we think we are good, let's try for real

	a) Add lnl_guc_70.bin to drm/firmware intel-staging-for-CI
	b) replace commit in topic/xe-for-CI on the kernel side

4) yay, it worked

	a) Add that lnl_gu_70.bin firmware to intel-staging branch and
	   prepare pull request to linux-firmware
	b) move patch from topic/xe-for-CI to drm-xe-next: i.e., rebase
	   topic/xe-for-CI on top of drm-xe-next leaving that commit as
	   first one. git push topic/xe-for-CI, dim push drm-xe-next (or
	   implement the logic in dim to push 2 branches)

	We may need some time between (a) and (b) depending on where we
	are on the kernel release cycle: we don't want to submit a
	kernel pull request before the firmware is available @
	linux-firmware repo.

Note that the fact we are using mmp makes it more complex, although
explicit.  Going direct with lnl_gu_70.bin would also work and avoid
updating the commits on the kernel side.

>
>We also need some rules to handle the case where there is already a PR 
>in flight and we need to push some more blobs. This might be as easy 
>as the committer seeing that there are commits on top of master, 
>replying to the previous PR to deprecate it, and then generating a new 
>PR with all the blobs.

the pull requests to linux-firmware would come from tags, not a branch.
So you have (tip of the branch is on top):

	o <intel-staging> intel: Add lnl_guc_70.bin
	o <refs/tags/intel-2024-01-30> intel: Update dg2_guc_70.bin  <-- last in flight pull request
	o intel: Add lnl_dmc.bin
	o <origin/main> ....  <--  where linux-firmware is at

Looking at amd-staging, it seems to match what they are doing:
https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging?ref_type=heads

see the amd-$DATE tags

>
>
>>
>>2) mmp firmware versions are only ever pushed to a separate 
>>staging-intel-for-CI
>>   branch. There is no pull request in the mailing for this. We can 
>>either
>>   push directly to the branch or create MRs in gitlab. CI would start
>>   using this branch for the extra firmware for platforms instead of
>>   whatever it's using today to process the pull requests from the
>>   mailing list.  Or whatever it's using, because I don't know and don't
>>   see it documented anywhere.
>
>As long as the CI team is ok with this, I'm all for it.
>
>>
>>   The patch on the kernel side to use the mmp firmware is only ever
>>   pushed to the topic/xe-for-CI branch since a) the firmware is 
>>coming from
>>   a non-official location and b) end users and distro packaging
>>   shouldn't see a warning when building the kernel due to a possibly
>>   missing firmware
>>3) Raising firmware version requirement for past platforms used as
>>   SDV can be done **unless** it raises the major version. That's because
>>   end users would start seeing the warning that we avoided in (2).
>
>Who are the end users here? If we're talking about older 
>non-officially supported platforms, the only users should be 
>developers and they should be able to handle having to update the 
>firmwares to a newer major versions.

distros and any developer outside Intel. The kernel build system is
unaware of xe.force_probe. So if you have, after the several macros:

MODULE_FIRMWARE("xe/tgl_guc_71.bin")

It will show up in `modinfo -f firmware xe`. And it will show as a
warning when installing/packaging a kernel.

It doesn't matter for minor/patch updates because the file name is
major-only and **running** with that module is protected by the
force_probe. The major may be updated when it's available in
linux-firmware, which means i915 started using it (for i915 that would
be "as an option, with fallback to the previous major release" of
course).

Lucas De Marchi

>
>Daniele
>
>>
>>thoughts?
>>
>>Lucas De Marchi
>>
>>>>
>>>
>>>Mine too.
>>>
>>>With that:
>>>Acked-by: Matthew Brost <matthew.brost at intel.com>
>>>
>>>>John.
>>>>
>>>>> agreeing to stop supporting very old GuC releases on the newer 
>>>>driver.
>>>>>
>>>>> Signed-off-by: Daniele Ceraolo Spurio 
>>>><daniele.ceraolospurio at intel.com>
>>>>> Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/xe/xe_guc.c   | 14 ++------------
>>>>>   drivers/gpu/drm/xe/xe_uc_fw.c | 36 
>>>>++++++++++++++---------------------
>>>>>   2 files changed, 16 insertions(+), 34 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc.c 
>>>>b/drivers/gpu/drm/xe/xe_guc.c
>>>>> index 868208a39829..5e6b27aac495 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>> @@ -132,15 +132,10 @@ static u32 guc_ctl_ads_flags(struct 
>>>>xe_guc *guc)
>>>>>       return flags;
>>>>>   }
>>>>> -#define GUC_VER(maj, min, pat)    (((maj) << 16) | ((min) << 
>>>>8) | (pat))
>>>>> -
>>>>>   static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>   {
>>>>>       struct xe_device *xe = guc_to_xe(guc);
>>>>>       struct xe_gt *gt = guc_to_gt(guc);
>>>>> -    struct xe_uc_fw *uc_fw = &guc->fw;
>>>>> -    struct xe_uc_fw_version *version = 
>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>> -
>>>>>       u32 flags = 0;
>>>>>       if (XE_WA(gt, 22012773006))
>>>>> @@ -170,13 +165,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>       if (XE_WA(gt, 1509372804))
>>>>>           flags |= GUC_WA_RENDER_RST_RC6_EXIT;
>>>>> -    if (XE_WA(gt, 14018913170)) {
>>>>> -        if (GUC_VER(version->major, version->minor, 
>>>>version->patch) >= GUC_VER(70, 7, 0))
>>>>> -            flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>> -        else
>>>>> -            drm_dbg(&xe->drm, "Skip WA 14018913170: GUC 
>>>>version expected >= 70.7.0, found %u.%u.%u\n",
>>>>> -                version->major, version->minor, version->patch);
>>>>> -    }
>>>>> +    if (XE_WA(gt, 14018913170))
>>>>> +        flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>       return flags;
>>>>>   }
>>>>> diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c 
>>>>b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>> index 4714f2c8d2ba..e5bf59616f3d 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>> @@ -296,36 +296,28 @@ static void uc_fw_fini(struct drm_device 
>>>>*drm, void *arg)
>>>>>       xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_SELECTED);
>>>>>   }
>>>>> -static void guc_read_css_info(struct xe_uc_fw *uc_fw, struct 
>>>>uc_css_header *css)
>>>>> +static int guc_read_css_info(struct xe_uc_fw *uc_fw, struct 
>>>>uc_css_header *css)
>>>>>   {
>>>>>       struct xe_gt *gt = uc_fw_to_gt(uc_fw);
>>>>>       struct xe_uc_fw_version *release = 
>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>       struct xe_uc_fw_version *compatibility = 
>>>>&uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];
>>>>>       xe_gt_assert(gt, uc_fw->type == XE_UC_FW_TYPE_GUC);
>>>>> -    xe_gt_assert(gt, release->major >= 70);
>>>>> -
>>>>> -    if (release->major > 70 || release->minor >= 6) {
>>>>> -        /* v70.6.0 adds CSS header support */
>>>>> -        compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>> -                         css->submission_version);
>>>>> -        compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>> -                         css->submission_version);
>>>>> -        compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>> -                         css->submission_version);
>>>>> -    } else if (release->minor >= 3) {
>>>>> -        /* v70.3.0 introduced v1.1.0 */
>>>>> -        compatibility->major = 1;
>>>>> -        compatibility->minor = 1;
>>>>> -        compatibility->patch = 0;
>>>>> -    } else {
>>>>> -        /* v70.0.0 introduced v1.0.0 */
>>>>> -        compatibility->major = 1;
>>>>> -        compatibility->minor = 0;
>>>>> -        compatibility->patch = 0;
>>>>> +
>>>>> +    /* We don't support GuC releases older than 70.19 */
>>>>> +    if (release->major < 70 || (release->major == 70 && 
>>>>release->minor < 19)) {
>>>>> +        xe_gt_err(gt, "Unsupported GuC v%u.%u! v70.19 or 
>>>>newer is required\n",
>>>>> +              release->major, release->minor);
>>>>> +        return -EINVAL;
>>>>>       }
>>>>> +    compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, 
>>>>css->submission_version);
>>>>> +    compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, 
>>>>css->submission_version);
>>>>> +    compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, 
>>>>css->submission_version);
>>>>> +
>>>>>       uc_fw->private_data_size = css->private_data_size;
>>>>> +
>>>>> +    return 0;
>>>>>   }
>>>>>   int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw)
>>>>> @@ -424,7 +416,7 @@ static int parse_css_header(struct 
>>>>xe_uc_fw *uc_fw, const void *fw_data, size_t
>>>>>       release->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, 
>>>>css->sw_version);
>>>>>       if (uc_fw->type == XE_UC_FW_TYPE_GUC)
>>>>> -        guc_read_css_info(uc_fw, css);
>>>>> +        return guc_read_css_info(uc_fw, css);
>>>>>       return 0;
>>>>>   }
>>>>
>


More information about the Intel-xe mailing list