[RFC] drm/xe/guc: Don't support GuC older GuC 70.x releases

Gustavo Sousa gustavo.sousa at intel.com
Wed Feb 14 19:22:03 UTC 2024


Quoting Lucas De Marchi (2024-02-09 03:01:26-03:00)
>On Thu, Feb 08, 2024 at 04:29:55PM -0800, Daniele Ceraolo Spurio wrote:
>>
>>
>>On 2/7/2024 12:40 PM, Lucas De Marchi wrote:
>>>On Wed, Feb 07, 2024 at 10:34:07AM -0800, Daniele Ceraolo Spurio wrote:
>>>>
>>>>
>>>>On 2/7/2024 8:42 AM, Lucas De Marchi wrote:
>>>>>+Gustavo who is dealing with DMC firmware lately

Hey, guys. Sorry for being so late for the party...

>>>>>
>>>>>On Wed, Feb 07, 2024 at 03:30:59AM +0000, Matthew Brost wrote:
>>>>>>On Tue, Feb 06, 2024 at 05:18:50PM -0800, John Harrison wrote:
>>>>>>>On 2/6/2024 15:41, Daniele Ceraolo Spurio wrote:
>>>>>>>>Supporting older GuC versions comes with baggage, both on the coding
>>>>>>>>side (due to interfaces only being available from a certain version
>>>>>>>>onwards) and on the testing side (due to having to make 
>>>>>>>>sure
>>>>>>>the driver
>>>>>>>>works as expected with older GuCs).
>>>>>>>>Since all of our Xe platform are still under force probe, we haven't
>>>>>>>>committed to support any specific GuC version and we therefore don't
>>>>>>>>need to support the older once, which means that we can 
>>>>>>>>force
>>>>>>>a bottom
>>>>>>>>limit to what GuC we accept. This allows us to remove any 
>>>>>>>>conditional
>>>>>>>>statements based on older GuC versions and also to approach newer
>>>>>>>>additions knowing that we'll never attempt to load something older
>>>>>>>>than our minimum requirement.
>>>>>>>>
>>>>>>>>RFC: this patch sets the minimum to the current GuC version (70.19),
>>>>>>>>but that can be moved one way or the other. The main aim here is
>>>>>>>Ideally, this would be bumped every time we update Xe to a 
>>>>>>>newer firmware
>>>>>>>version right up to the point when force probe is lifted. At 
>>>>>>>that point it
>>>>>>>becomes fixed and we have to add the version check support 
>>>>>>>back in for
>>>>>>>future w/a's and features.
>>>>>>>
>>>>>>>Get's my vote :).
>>>>>
>>>>>Yeah, but see my other reply... I think we will have to wait the
>>>>>firmware being available in linux-firmware for that.
>>>>>
>>>>>Also, let's kickstart a discussion on our process with some
>>>>>possible changes so we can get it documented. I think we have a good
>>>>>opportunity here to start adopting the
>>>>>https://gitlab.freedesktop.org/drm/firmware repo.
>>>>>
>>>>>Rough idea:
>>>>>
>>>>>1) use intel-staging branch with tags for pull requests to
>>>>>   linux-firmware, just like documented in their readme.
>>>>>   IMO the naming is rather unfortunate since it would be
>>>>>   good to use it for (2) below.... but since it's already used

Agreed. The <vendor>-staging branches, as proposed in the README, look
more like some sort of <vendor>-next type of branch.

I think it would be good if vendors could have their own `<vendor>/*`
(or `<vendor>-*`) namespace for branch names so that, while having some
common conventions, they could also adapt parts of it to their needs.
For example, we could have branches like intel/next for (1) and intel/ci
for (2). Not sure how easy it is to do this now, though.

>>>>>   we can use something else.
>>>>>
>>>>>   this would mainly replace the use we have today for
>>>>>   https://cgit.freedesktop.org/drm/drm-firmware/ ,    which 
>>>>>could be retired. From  upstream linux-firmware pov the only
>>>>>   change would be the remote location and that we start using tags
>>>>>   for the pull requests, coming from a single branch regardless of
>>>>>   the firmware (guc, huc, dmc, gsc): intel-staging. Once accepted in
>>>>>   linux-firmware, the branch is fast-forwarded.
>>>>
>>>>I think this needs a bit more fleshing out, because before we do a 
>>>>pull request, we do want to run CI on the blobs. Also, in several 
>>>>occasions we went through a couple of versions before we closed on 
>>>>what to push to linux-firmware (e.g. in the latest push we started 
>>>>with 70.19.1 but then pushed 70.19.2), so we can't go to 
>>>>intel-staging until we're actually ready to push. I think the 
>>>>process you have below for mmp blobs should work for this early 
>>>>testing flow as well, but we might end up with a lot of noise in 
>>>>the staging-intel-for-CI branch.
>>>
>>>that would be a throw away branch where we push stuff to be able to test
>>>on CI. I don't think the commit history matters much there. The fact
>>
>>That depends on how CI does things. With the current handling of 
>>throwaway branches we have on drm-firmware, a CI request can 
>>accidentally roll back another one. e.g., if we push a throwaway 
>>branch with a GuC update and then another with a DMC update, the 
>>second push will roll-back the GuC to what's on the new branch (likely 
>>the linux-firmware version). That's why there was a suggestion ti use 
>>a unified branch for CI as well.
>
>not sure we are talking about the same thing. It is a unified branch for
>CI: staging-intel-for-CI is where the mmp +
>about-to-be-upstreamed-for-the-first-time firmware blobs are added,
>regardless if it's guc, dmc, huc, etc. IMO it's much simpler since CI
>basically has to take the additional firmware from this 1 branch. No
>risk of rolling back another firmware because of the new one.

Maybe we are having some confusion here because of the term "throwaway"
for intel-staging-for-ci?

We use throwaway branches for the current process, but I guess
intel-staging-for-ci would not really be a "throwaway" branch per se and
a unified one (as already mentioned above).

I think using intel-staging-for-ci will be okay if teams take the care
of only adding/updating/removing blobs they are responsible for.

>
>>
>>
>>>that the firmware is available to match what is in the kernel and that
>>>there's a documented process for using it in my view trumps the
>>>this downside.
>>>
>>>what I expect would be, considering the LNL case as example:
>>>
>>>1) Start testing with the mmp version:
>>>
>>>    a) Add firmware to  drm/firmware intel-staging-for-CI
>>>    b) Add commit in topic/xe-for-CI on the kernel side to make
>>>       use of that firmware

How would we check for CI after (b)?

For DMC, I have been doing something similar. Differences are:

  * for (a), I am using an intel-ci branch on drm/drm-firmware and send
    a pull request to intel-gfx so that CI makes the mmp blobs
    available;

  * for (b), I send a "[CI]"-tagged patch to intel-gfx making the kernel
    explicitly use that fully versioned blob path. One advantage here is
    that I keep a broken DMC release from causing CI noise on existing
    unrelated patch series.

>>>
>>>2) Ooops, that has bugs
>>>
>>>    a) add a second mmp firmware to drm/firmware intel-staging-for-CI
>>>    b) replace commit in topic/xe-for-CI on the kernel side
>>>
>>>3) we think we are good, let's try for real
>>>
>>>    a) Add lnl_guc_70.bin to drm/firmware intel-staging-for-CI
>>>    b) replace commit in topic/xe-for-CI on the kernel side
>>>
>>>4) yay, it worked
>>>
>>>    a) Add that lnl_gu_70.bin firmware to intel-staging branch and
>>>       prepare pull request to linux-firmware
>>>    b) move patch from topic/xe-for-CI to drm-xe-next: i.e., rebase
>>>       topic/xe-for-CI on top of drm-xe-next leaving that commit as
>>>       first one. git push topic/xe-for-CI, dim push drm-xe-next (or
>>>       implement the logic in dim to push 2 branches)
>>>
>>>    We may need some time between (a) and (b) depending on where we
>>>    are on the kernel release cycle: we don't want to submit a
>>>    kernel pull request before the firmware is available @
>>>    linux-firmware repo.
>>>
>>>Note that the fact we are using mmp makes it more complex, although
>>>explicit.  Going direct with lnl_gu_70.bin would also work and avoid
>>>updating the commits on the kernel side.
>>
>>This works for a completely new release. For updating an existing 
>>release, we'll have to push, potentially multiple times, all the 
>>*_guc_70.bin binaries to intel-staging-for-CI. Just to be clear, I 
>>have nothing against this, just noting that it would generate a lot of 
>>noise in that branch and potentially use a lot of space on disk.
>>
>>>
>>>>
>>>>We also need some rules to handle the case where there is already 
>>>>a PR in flight and we need to push some more blobs. This might be 
>>>>as easy as the committer seeing that there are commits on top of 
>>>>master, replying to the previous PR to deprecate it, and then 
>>>>generating a new PR with all the blobs.
>>>
>>>the pull requests to linux-firmware would come from tags, not a branch.
>>>So you have (tip of the branch is on top):
>>>
>>>    o <intel-staging> intel: Add lnl_guc_70.bin
>>>    o <refs/tags/intel-2024-01-30> intel: Update dg2_guc_70.bin  <-- 
>>>last in flight pull request
>>>    o intel: Add lnl_dmc.bin
>>>    o <origin/main> ....  <--  where linux-firmware is at
>>>
>>>Looking at amd-staging, it seems to match what they are doing:
>>>https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging?ref_type=heads
>>>
>>>
>>>see the amd-$DATE tags
>>
>>Sorry I wasn't very clear in my comment, what I wanted to point out 
>>was that if we are on a unified branch and we have the PR against a 
>>specific tag (intel-2024-01-30 in your example) already in flight, how 
>>do we generate a new PR for the newer commit that comes after the tag 
>>(and which will have its own new tag)? Does git do some tag magic and 
>>handle it for us, or do we need to generate a new PR that supersedes 
>>the one in flight?
>
>humn... there is no magic, the old tag is an ascendent path of the new
>one. But as I said, just coordinating with the few people updating
>firmware who/when will do the pull request should be sufficient for
>avoiding a pull request when there's already another one in flight.

While I see the benefit of having a unified intel-staging-for-ci, I
think I'm failing to see much benefit of having a unified intel-staging
here.

Wouldn't it be better if pull request were independent of each other? If
we had an intel/* (or intel-*) branching namespace, we could keep using
throwaway branches for the pull requests and have the discipline of
removing them when not needed anymore.

--
Gustavo Sousa

>
>
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>2) mmp firmware versions are only ever pushed to a separate 
>>>>>staging-intel-for-CI
>>>>>   branch. There is no pull request in the mailing for this. We 
>>>>>can either
>>>>>   push directly to the branch or create MRs in gitlab. CI would start
>>>>>   using this branch for the extra firmware for platforms instead of
>>>>>   whatever it's using today to process the pull requests from the
>>>>>   mailing list.  Or whatever it's using, because I don't know 
>>>>>and don't
>>>>>   see it documented anywhere.
>>>>
>>>>As long as the CI team is ok with this, I'm all for it.
>>>>
>>>>>
>>>>>   The patch on the kernel side to use the mmp firmware is only ever
>>>>>   pushed to the topic/xe-for-CI branch since a) the firmware is 
>>>>>coming from
>>>>>   a non-official location and b) end users and distro packaging
>>>>>   shouldn't see a warning when building the kernel due to a possibly
>>>>>   missing firmware
>>>>>3) Raising firmware version requirement for past platforms used as
>>>>>   SDV can be done **unless** it raises the major version. 
>>>>>That's because
>>>>>   end users would start seeing the warning that we avoided in (2).
>>>>
>>>>Who are the end users here? If we're talking about older 
>>>>non-officially supported platforms, the only users should be 
>>>>developers and they should be able to handle having to update the 
>>>>firmwares to a newer major versions.
>>>
>>>distros and any developer outside Intel. The kernel build system is
>>>unaware of xe.force_probe. So if you have, after the several macros:
>>>
>>>MODULE_FIRMWARE("xe/tgl_guc_71.bin")
>>>
>>>It will show up in `modinfo -f firmware xe`. And it will show as a
>>>warning when installing/packaging a kernel.
>>>
>>>It doesn't matter for minor/patch updates because the file name is
>>>major-only and **running** with that module is protected by the
>>>force_probe. The major may be updated when it's available in
>>>linux-firmware, which means i915 started using it (for i915 that would
>>>be "as an option, with fallback to the previous major release" of
>>>course).
>>
>>Ok I get the concern. My assumption here was that we'd only update the 
>>minimum required version if that version was in linux-firmware even 
>>for minor updates, hence why I didn't see why a major update would be 
>>different. I guess we could go with a more relaxed approach where we 
>>allow the required minor to be updated for force-probe platforms as 
>>long as the firmware is available on a public/CI branch even if it is 
>>not in linux-firmware.
>
>yep, I don't see it causing issues to end users.
>
>>
>>Getting back on track with the original purpose of this patch, are you 
>>ok with setting the minimum to 70.19 if I first push the matching PVC 
>>70.19 binary (via the old method for now), while we continue sorting 
>>out how to manage the new repo?
>
>yes.
>
>Lucas De Marchi
>
>>
>>Daniele
>>
>>>
>>>Lucas De Marchi
>>>
>>>>
>>>>Daniele
>>>>
>>>>>
>>>>>thoughts?
>>>>>
>>>>>Lucas De Marchi
>>>>>
>>>>>>>
>>>>>>
>>>>>>Mine too.
>>>>>>
>>>>>>With that:
>>>>>>Acked-by: Matthew Brost <matthew.brost at intel.com>
>>>>>>
>>>>>>>John.
>>>>>>>
>>>>>>>>agreeing to stop supporting very old GuC releases on the 
>>>>>>>>newer
>>>>>>>driver.
>>>>>>>>
>>>>>>>>Signed-off-by: Daniele Ceraolo Spurio
>>>>>>><daniele.ceraolospurio at intel.com>
>>>>>>>>Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>>>>>Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>>>>>Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>>>>Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>>>>---
>>>>>>>>   drivers/gpu/drm/xe/xe_guc.c   | 14 ++------------
>>>>>>>>   drivers/gpu/drm/xe/xe_uc_fw.c | 36
>>>>>>>++++++++++++++---------------------
>>>>>>>>   2 files changed, 16 insertions(+), 34 deletions(-)
>>>>>>>>
>>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>>index 868208a39829..5e6b27aac495 100644
>>>>>>>>--- a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>>+++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>>@@ -132,15 +132,10 @@ static u32 guc_ctl_ads_flags(struct
>>>>>>>xe_guc *guc)
>>>>>>>>       return flags;
>>>>>>>>   }
>>>>>>>>-#define GUC_VER(maj, min, pat)    (((maj) << 16) | ((min) 
>>>>>>>><<
>>>>>>>8) | (pat))
>>>>>>>>-
>>>>>>>>   static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>>>   {
>>>>>>>>       struct xe_device *xe = guc_to_xe(guc);
>>>>>>>>       struct xe_gt *gt = guc_to_gt(guc);
>>>>>>>>-    struct xe_uc_fw *uc_fw = &guc->fw;
>>>>>>>>-    struct xe_uc_fw_version *version =
>>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>>-
>>>>>>>>       u32 flags = 0;
>>>>>>>>       if (XE_WA(gt, 22012773006))
>>>>>>>>@@ -170,13 +165,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>>>       if (XE_WA(gt, 1509372804))
>>>>>>>>           flags |= GUC_WA_RENDER_RST_RC6_EXIT;
>>>>>>>>-    if (XE_WA(gt, 14018913170)) {
>>>>>>>>-        if (GUC_VER(version->major, version->minor,
>>>>>>>version->patch) >= GUC_VER(70, 7, 0))
>>>>>>>>-            flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>>-        else
>>>>>>>>-            drm_dbg(&xe->drm, "Skip WA 14018913170: GUC
>>>>>>>version expected >= 70.7.0, found %u.%u.%u\n",
>>>>>>>>- version->major, version->minor, version->patch);
>>>>>>>>-    }
>>>>>>>>+    if (XE_WA(gt, 14018913170))
>>>>>>>>+        flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>>       return flags;
>>>>>>>>   }
>>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>>index 4714f2c8d2ba..e5bf59616f3d 100644
>>>>>>>>--- a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>>+++ b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>>@@ -296,36 +296,28 @@ static void uc_fw_fini(struct 
>>>>>>>>drm_device
>>>>>>>*drm, void *arg)
>>>>>>>>xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_SELECTED);
>>>>>>>>   }
>>>>>>>>-static void guc_read_css_info(struct xe_uc_fw *uc_fw, 
>>>>>>>>struct
>>>>>>>uc_css_header *css)
>>>>>>>>+static int guc_read_css_info(struct xe_uc_fw *uc_fw, 
>>>>>>>>struct
>>>>>>>uc_css_header *css)
>>>>>>>>   {
>>>>>>>>       struct xe_gt *gt = uc_fw_to_gt(uc_fw);
>>>>>>>>       struct xe_uc_fw_version *release =
>>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>>       struct xe_uc_fw_version *compatibility =
>>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];
>>>>>>>>       xe_gt_assert(gt, uc_fw->type == XE_UC_FW_TYPE_GUC);
>>>>>>>>-    xe_gt_assert(gt, release->major >= 70);
>>>>>>>>-
>>>>>>>>-    if (release->major > 70 || release->minor >= 6) {
>>>>>>>>-        /* v70.6.0 adds CSS header support */
>>>>>>>>-        compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>>>-                         css->submission_version);
>>>>>>>>-        compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>>>-                         css->submission_version);
>>>>>>>>-        compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>>-                         css->submission_version);
>>>>>>>>-    } else if (release->minor >= 3) {
>>>>>>>>-        /* v70.3.0 introduced v1.1.0 */
>>>>>>>>-        compatibility->major = 1;
>>>>>>>>-        compatibility->minor = 1;
>>>>>>>>-        compatibility->patch = 0;
>>>>>>>>-    } else {
>>>>>>>>-        /* v70.0.0 introduced v1.0.0 */
>>>>>>>>-        compatibility->major = 1;
>>>>>>>>-        compatibility->minor = 0;
>>>>>>>>-        compatibility->patch = 0;
>>>>>>>>+
>>>>>>>>+    /* We don't support GuC releases older than 70.19 */
>>>>>>>>+    if (release->major < 70 || (release->major == 70 &&
>>>>>>>release->minor < 19)) {
>>>>>>>>+        xe_gt_err(gt, "Unsupported GuC v%u.%u! v70.19 or
>>>>>>>newer is required\n",
>>>>>>>>+              release->major, release->minor);
>>>>>>>>+        return -EINVAL;
>>>>>>>>       }
>>>>>>>>+    compatibility->major = 
>>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>>css->submission_version);
>>>>>>>>+    compatibility->minor = 
>>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>>css->submission_version);
>>>>>>>>+    compatibility->patch = 
>>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>css->submission_version);
>>>>>>>>+
>>>>>>>>       uc_fw->private_data_size = css->private_data_size;
>>>>>>>>+
>>>>>>>>+    return 0;
>>>>>>>>   }
>>>>>>>>   int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw)
>>>>>>>>@@ -424,7 +416,7 @@ static int parse_css_header(struct
>>>>>>>xe_uc_fw *uc_fw, const void *fw_data, size_t
>>>>>>>>       release->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>css->sw_version);
>>>>>>>>       if (uc_fw->type == XE_UC_FW_TYPE_GUC)
>>>>>>>>-        guc_read_css_info(uc_fw, css);
>>>>>>>>+        return guc_read_css_info(uc_fw, css);
>>>>>>>>       return 0;
>>>>>>>>   }
>>>>>>>
>>>>
>>


More information about the Intel-xe mailing list