[RFC] drm/xe/guc: Don't support older GuC 70.x releases

Fri Feb 9 06:01:26 UTC 2024

On Thu, Feb 08, 2024 at 04:29:55PM -0800, Daniele Ceraolo Spurio wrote:
>
>
>On 2/7/2024 12:40 PM, Lucas De Marchi wrote:
>>On Wed, Feb 07, 2024 at 10:34:07AM -0800, Daniele Ceraolo Spurio wrote:
>>>
>>>
>>>On 2/7/2024 8:42 AM, Lucas De Marchi wrote:
>>>>+Gustavo who is dealing with DMC firmware lately
>>>>
>>>>On Wed, Feb 07, 2024 at 03:30:59AM +0000, Matthew Brost wrote:
>>>>>On Tue, Feb 06, 2024 at 05:18:50PM -0800, John Harrison wrote:
>>>>>>On 2/6/2024 15:41, Daniele Ceraolo Spurio wrote:
>>>>>>>Supporting older GuC versions comes with baggage, both on the coding
>>>>>>>side (due to interfaces only being available from a certain version
>>>>>>>onwards) and on the testing side (due to having to make sure the driver
>>>>>>>works as expected with older GuCs).
>>>>>>>Since all of our Xe platforms are still under force probe, we haven't
>>>>>>>committed to supporting any specific GuC version and we therefore don't
>>>>>>>need to support the older ones, which means that we can force a bottom
>>>>>>>limit to what GuC we accept. This allows us to remove any conditional
>>>>>>>statements based on older GuC versions and also to approach newer
>>>>>>>additions knowing that we'll never attempt to load something older
>>>>>>>than our minimum requirement.
>>>>>>>
>>>>>>>RFC: this patch sets the minimum to the current GuC version (70.19),
>>>>>>>but that can be moved one way or the other. The main aim here is
>>>>>>Ideally, this would be bumped every time we update Xe to a newer
>>>>>>firmware version, right up to the point when force probe is lifted.
>>>>>>At that point it becomes fixed and we have to add the version check
>>>>>>support back in for future w/a's and features.
>>>>>>
>>>>>>Gets my vote :).
>>>>
>>>>Yeah, but see my other reply... I think we will have to wait for the
>>>>firmware to be available in linux-firmware for that.
>>>>
>>>>Also, let's kickstart a discussion on our process with some
>>>>possible changes so we can get it documented. I think we have a good
>>>>opportunity here to start adopting the
>>>>https://gitlab.freedesktop.org/drm/firmware repo.
>>>>
>>>>Rough idea:
>>>>
>>>>1) use intel-staging branch with tags for pull requests to
>>>>   linux-firmware, just like documented in their readme.
>>>>   IMO the naming is rather unfortunate since it would be
>>>>   good to use it for (2) below... but since it's already used
>>>>   we can use something else.
>>>>
>>>>   this would mainly replace the use we have today for
>>>>   https://cgit.freedesktop.org/drm/drm-firmware/, which could be
>>>>   retired. From the upstream linux-firmware pov the only
>>>>   change would be the remote location and that we start using tags
>>>>   for the pull requests, coming from a single branch regardless of
>>>>   the firmware (guc, huc, dmc, gsc): intel-staging. Once accepted in
>>>>   linux-firmware, the branch is fast-forwarded.
>>>
>>>I think this needs a bit more fleshing out, because before we do a 
>>>pull request, we do want to run CI on the blobs. Also, on several 
>>>occasions we went through a couple of versions before we closed on 
>>>what to push to linux-firmware (e.g. in the latest push we started 
>>>with 70.19.1 but then pushed 70.19.2), so we can't go to 
>>>intel-staging until we're actually ready to push. I think the 
>>>process you have below for mmp blobs should work for this early 
>>>testing flow as well, but we might end up with a lot of noise in 
>>>the staging-intel-for-CI branch.
>>
>>that would be a throwaway branch where we push stuff to be able to test
>>on CI. I don't think the commit history matters much there. The fact
>
>That depends on how CI does things. With the current handling of 
>throwaway branches we have on drm-firmware, a CI request can 
>accidentally roll back another one. e.g., if we push a throwaway 
>branch with a GuC update and then another with a DMC update, the 
>second push will roll-back the GuC to what's on the new branch (likely 
>the linux-firmware version). That's why there was a suggestion to use 
>a unified branch for CI as well.

not sure we are talking about the same thing. It is a unified branch for
CI: staging-intel-for-CI is where the mmp +
about-to-be-upstreamed-for-the-first-time firmware blobs are added,
regardless of whether it's guc, dmc, huc, etc. IMO it's much simpler since CI
basically has to take the additional firmware from this 1 branch. No
risk of rolling back another firmware because of the new one.
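
To make the rollback concern concrete, here is a toy C model contrasting
the two setups. Everything in it (the enum, the struct, both functions,
the simplified version numbers) is hypothetical and is not how the CI
infrastructure is actually implemented: a per-request throwaway branch is
modelled as "linux-firmware plus only this request's update", while the
unified CI branch accumulates every update.

```c
#include <assert.h>

/* Hypothetical firmware kinds tracked by this toy model. */
enum fw_kind { FW_GUC, FW_DMC, FW_KIND_COUNT };

/* One CI request: bump a single firmware to a new (simplified) version. */
struct fw_update {
	enum fw_kind kind;
	unsigned int version;
};

/*
 * Throwaway-branch model: each request's branch is linux-firmware plus
 * only its own update, and CI deploys whichever branch was pushed last.
 */
static void deploy_throwaway(const unsigned int base[FW_KIND_COUNT],
			     struct fw_update req,
			     unsigned int deployed[FW_KIND_COUNT])
{
	for (int i = 0; i < FW_KIND_COUNT; i++)
		deployed[i] = base[i];	/* earlier CI updates are lost here */
	deployed[req.kind] = req.version;
}

/*
 * Unified-branch model: every request lands on the same branch, so
 * earlier updates stay in place.
 */
static void deploy_unified(unsigned int branch[FW_KIND_COUNT],
			   struct fw_update req)
{
	branch[req.kind] = req.version;
}
```

Pushing a GuC update and then a DMC update leaves the GuC at the
linux-firmware version under the first model and at the updated version
under the second, which is the difference being discussed.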

>
>
>>that the firmware is available to match what is in the kernel and that
>>there's a documented process for using it, in my view, trumps
>>this downside.
>>
>>what I expect would be, considering the LNL case as example:
>>
>>1) Start testing with the mmp version:
>>
>>    a) Add firmware to drm/firmware intel-staging-for-CI
>>    b) Add commit in topic/xe-for-CI on the kernel side to make
>>       use of that firmware
>>
>>2) Ooops, that has bugs
>>
>>    a) add a second mmp firmware to drm/firmware intel-staging-for-CI
>>    b) replace commit in topic/xe-for-CI on the kernel side
>>
>>3) we think we are good, let's try for real
>>
>>    a) Add lnl_guc_70.bin to drm/firmware intel-staging-for-CI
>>    b) replace commit in topic/xe-for-CI on the kernel side
>>
>>4) yay, it worked
>>
>>    a) Add that lnl_guc_70.bin firmware to the intel-staging branch and
>>       prepare pull request to linux-firmware
>>    b) move patch from topic/xe-for-CI to drm-xe-next: i.e., rebase
>>       topic/xe-for-CI on top of drm-xe-next leaving that commit as
>>       first one. git push topic/xe-for-CI, dim push drm-xe-next (or
>>       implement the logic in dim to push 2 branches)
>>
>>    We may need some time between (a) and (b) depending on where we
>>    are on the kernel release cycle: we don't want to submit a
>>    kernel pull request before the firmware is available @
>>    linux-firmware repo.
>>
>>Note that the fact we are using mmp makes it more complex, although
>>explicit. Going directly with lnl_guc_70.bin would also work and avoid
>>updating the commits on the kernel side.
>
>This works for a completely new release. For updating an existing 
>release, we'll have to push, potentially multiple times, all the 
>*_guc_70.bin binaries to intel-staging-for-CI. Just to be clear, I 
>have nothing against this, just noting that it would generate a lot of 
>noise in that branch and potentially use a lot of space on disk.
>
>>
>>>
>>>We also need some rules to handle the case where there is already 
>>>a PR in flight and we need to push some more blobs. This might be 
>>>as easy as the committer seeing that there are commits on top of 
>>>master, replying to the previous PR to deprecate it, and then 
>>>generating a new PR with all the blobs.
>>
>>the pull requests to linux-firmware would come from tags, not a branch.
>>So you have (tip of the branch is on top):
>>
>>    o <intel-staging> intel: Add lnl_guc_70.bin
>>    o <refs/tags/intel-2024-01-30> intel: Update dg2_guc_70.bin  <-- last in-flight pull request
>>    o intel: Add lnl_dmc.bin
>>    o <origin/main> ....  <--  where linux-firmware is at
>>
>>Looking at amd-staging, it seems to match what they are doing:
>>https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging?ref_type=heads
>>
>>
>>see the amd-$DATE tags
>
>Sorry I wasn't very clear in my comment, what I wanted to point out 
>was that if we are on a unified branch and we have the PR against a 
>specific tag (intel-2024-01-30 in your example) already in flight, how 
>do we generate a new PR for the newer commit that comes after the tag 
>(and which will have its own new tag)? Does git do some tag magic and 
>handle it for us, or do we need to generate a new PR that supersedes 
>the one in flight?

hmm... there is no magic: the old tag is an ancestor of the new
one. But as I said, just coordinating among the few people updating
firmware on who will do the pull request, and when, should be
sufficient to avoid sending a pull request while another one is in flight.
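
A small C sketch of what "the old tag is an ancestor of the new one"
means for pull requests, using the example history quoted earlier in the
thread. The struct commit type and both helpers are hypothetical and
assume linear history; git itself defines the contents of a pull request
as the commits reachable from the tag but not from the base (origin/main).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one commit on a linear intel-staging branch. */
struct commit {
	const char *subject;
	const struct commit *parent;	/* NULL for the oldest commit */
};

/* Nonzero if `ancestor` is `c` itself or somewhere on its parent chain. */
static int is_ancestor(const struct commit *ancestor, const struct commit *c)
{
	for (; c; c = c->parent)
		if (c == ancestor)
			return 1;
	return 0;
}

/*
 * Number of commits a pull request from `tip` against `base` would carry:
 * walk back from `tip` until `base` is reached. Assumes linear history and
 * that `base` is an ancestor of `tip`.
 */
static size_t pr_commit_count(const struct commit *tip,
			      const struct commit *base)
{
	size_t n = 0;

	for (const struct commit *c = tip; c && c != base; c = c->parent)
		n++;
	return n;
}
```

With the quoted history, a pull request from intel-2024-01-30 carries two
commits, while one from the tip of intel-staging carries three, including
both of those, so the newer request contains the in-flight one.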

>
>
>>
>>>
>>>
>>>>
>>>>2) mmp firmware versions are only ever pushed to a separate
>>>>   staging-intel-for-CI branch. There is no pull request on the mailing
>>>>   list for this. We can either push directly to the branch or create
>>>>   MRs in gitlab. CI would start using this branch for the extra
>>>>   firmware for platforms instead of whatever it's using today to
>>>>   process the pull requests from the mailing list. Or whatever it's
>>>>   using, because I don't know and don't see it documented anywhere.
>>>
>>>As long as the CI team is ok with this, I'm all for it.
>>>
>>>>
>>>>   The patch on the kernel side to use the mmp firmware is only ever
>>>>   pushed to the topic/xe-for-CI branch since a) the firmware is coming
>>>>   from a non-official location and b) end users and distro packaging
>>>>   shouldn't see a warning when building the kernel due to a possibly
>>>>   missing firmware.
>>>>3) Raising the firmware version requirement for past platforms used as
>>>>   SDV can be done **unless** it raises the major version. That's
>>>>   because end users would start seeing the warning that we avoided in (2).
>>>
>>>Who are the end users here? If we're talking about older 
>>>non-officially supported platforms, the only users should be 
>>>developers and they should be able to handle having to update the 
>>>firmwares to newer major versions.
>>
>>distros and any developer outside Intel. The kernel build system is
>>unaware of xe.force_probe. So if you have, after the several macros:
>>
>>MODULE_FIRMWARE("xe/tgl_guc_71.bin")
>>
>>It will show up in `modinfo -f firmware xe`. And it will show as a
>>warning when installing/packaging a kernel.
>>
>>It doesn't matter for minor/patch updates because the file name is
>>major-only and **running** with that module is protected by the
>>force_probe. The major may be updated when it's available in
>>linux-firmware, which means i915 started using it (for i915 that would
>>be "as an option, with fallback to the previous major release" of
>>course).
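
A minimal C sketch of the point about major-only file names. The helper
names and the exact path format are assumptions modelled on the
xe/tgl_guc_71.bin example above, not the actual xe macros. Only the major
version ends up in the name, so a minor or patch bump does not change what
MODULE_FIRMWARE() advertises to modinfo and distro packaging.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical version triple, as read from the firmware header. */
struct fw_version {
	unsigned int major, minor, patch;
};

/*
 * Build a path of the assumed form "xe/<platform>_guc_<major>.bin".
 * Only v.major is encoded. Returns 0 on success, -1 if buf is too small.
 */
static int guc_fw_path(char *buf, size_t len, const char *platform,
		       struct fw_version v)
{
	int n = snprintf(buf, len, "xe/%s_guc_%u.bin", platform, v.major);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Returns 1 if moving the requirement from `from` to `to` changes the
 * advertised file name, 0 if it does not, -1 on error.
 */
static int bump_changes_fw_name(const char *platform, struct fw_version from,
				struct fw_version to)
{
	char old_path[64], new_path[64];

	if (guc_fw_path(old_path, sizeof(old_path), platform, from) ||
	    guc_fw_path(new_path, sizeof(new_path), platform, to))
		return -1;
	return strcmp(old_path, new_path) != 0;
}
```

Under this model, going from 70.19 to 70.20 keeps xe/tgl_guc_70.bin,
while going to 71.0 switches the name and is what packaging would notice.
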
>
>Ok I get the concern. My assumption here was that we'd only update the 
>minimum required version if that version was in linux-firmware even 
>for minor updates, hence why I didn't see why a major update would be 
>different. I guess we could go with a more relaxed approach where we 
>allow the required minor to be updated for force-probe platforms as 
>long as the firmware is available on a public/CI branch even if it is 
>not in linux-firmware.

yep, I don't see it causing issues to end users.

>
>Getting back on track with the original purpose of this patch, are you 
>ok with setting the minimum to 70.19 if I first push the matching PVC 
>70.19 binary (via the old method for now), while we continue sorting 
>out how to manage the new repo?

yes.

Lucas De Marchi
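
For reference, the check the quoted patch adds to guc_read_css_info()
(reject anything below 70.19) can also be written with the same
(maj << 16 | min << 8 | pat) packing as the GUC_VER() macro the patch
removes. This is a standalone sketch: GUC_MIN_MAJOR, GUC_MIN_MINOR and
guc_release_supported() are hypothetical names, and it assumes minor and
patch each fit in 8 bits, as the packing requires.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the GUC_VER() macro removed from xe_guc.c by the patch. */
#define GUC_VER(maj, min, pat)	(((maj) << 16) | ((min) << 8) | (pat))

/* Hypothetical names for the minimum agreed in this thread: GuC 70.19. */
#define GUC_MIN_MAJOR	70u
#define GUC_MIN_MINOR	19u

/*
 * Same result as negating the patch's
 *   release->major < 70 || (release->major == 70 && release->minor < 19)
 * as long as minor and patch each fit in 8 bits. Any patch level of 70.19
 * passes because the minimum's own patch field is 0.
 */
static int guc_release_supported(uint32_t major, uint32_t minor,
				 uint32_t patch)
{
	return GUC_VER(major, minor, patch) >=
	       GUC_VER(GUC_MIN_MAJOR, GUC_MIN_MINOR, 0u);
}
```

In the patch itself the failure is reported with xe_gt_err() and returned
as -EINVAL through parse_css_header(), so an older blob becomes an error
at header-parsing time instead of being handled per workaround, as was
previously done for WA 14018913170.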

>
>Daniele
>
>>
>>Lucas De Marchi
>>
>>>
>>>Daniele
>>>
>>>>
>>>>thoughts?
>>>>
>>>>Lucas De Marchi
>>>>
>>>>>>
>>>>>
>>>>>Mine too.
>>>>>
>>>>>With that:
>>>>>Acked-by: Matthew Brost <matthew.brost at intel.com>
>>>>>
>>>>>>John.
>>>>>>
>>>>>>>agreeing to stop supporting very old GuC releases on the 
>>>>>>>newer
>>>>>>driver.
>>>>>>>
>>>>>>>Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>>>>>Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>>>>Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>>>>Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>>>Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>>>---
>>>>>>>   drivers/gpu/drm/xe/xe_guc.c   | 14 ++------------
>>>>>>>   drivers/gpu/drm/xe/xe_uc_fw.c | 36 ++++++++++++++---------------------
>>>>>>>   2 files changed, 16 insertions(+), 34 deletions(-)
>>>>>>>
>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>index 868208a39829..5e6b27aac495 100644
>>>>>>>--- a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>+++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>@@ -132,15 +132,10 @@ static u32 guc_ctl_ads_flags(struct xe_guc *guc)
>>>>>>>       return flags;
>>>>>>>   }
>>>>>>>-#define GUC_VER(maj, min, pat)    (((maj) << 16) | ((min) << 8) | (pat))
>>>>>>>-
>>>>>>>   static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>>   {
>>>>>>>       struct xe_device *xe = guc_to_xe(guc);
>>>>>>>       struct xe_gt *gt = guc_to_gt(guc);
>>>>>>>-    struct xe_uc_fw *uc_fw = &guc->fw;
>>>>>>>-    struct xe_uc_fw_version *version = &uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>-
>>>>>>>       u32 flags = 0;
>>>>>>>       if (XE_WA(gt, 22012773006))
>>>>>>>@@ -170,13 +165,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>>       if (XE_WA(gt, 1509372804))
>>>>>>>           flags |= GUC_WA_RENDER_RST_RC6_EXIT;
>>>>>>>-    if (XE_WA(gt, 14018913170)) {
>>>>>>>-        if (GUC_VER(version->major, version->minor, version->patch) >= GUC_VER(70, 7, 0))
>>>>>>>-            flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>-        else
>>>>>>>-            drm_dbg(&xe->drm, "Skip WA 14018913170: GUC version expected >= 70.7.0, found %u.%u.%u\n",
>>>>>>>-                version->major, version->minor, version->patch);
>>>>>>>-    }
>>>>>>>+    if (XE_WA(gt, 14018913170))
>>>>>>>+        flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>       return flags;
>>>>>>>   }
>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>index 4714f2c8d2ba..e5bf59616f3d 100644
>>>>>>>--- a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>+++ b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>@@ -296,36 +296,28 @@ static void uc_fw_fini(struct drm_device *drm, void *arg)
>>>>>>>       xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_SELECTED);
>>>>>>>   }
>>>>>>>-static void guc_read_css_info(struct xe_uc_fw *uc_fw, struct uc_css_header *css)
>>>>>>>+static int guc_read_css_info(struct xe_uc_fw *uc_fw, struct uc_css_header *css)
>>>>>>>   {
>>>>>>>       struct xe_gt *gt = uc_fw_to_gt(uc_fw);
>>>>>>>       struct xe_uc_fw_version *release = &uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>       struct xe_uc_fw_version *compatibility = &uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];
>>>>>>>       xe_gt_assert(gt, uc_fw->type == XE_UC_FW_TYPE_GUC);
>>>>>>>-    xe_gt_assert(gt, release->major >= 70);
>>>>>>>-
>>>>>>>-    if (release->major > 70 || release->minor >= 6) {
>>>>>>>-        /* v70.6.0 adds CSS header support */
>>>>>>>-        compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>>-                         css->submission_version);
>>>>>>>-        compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>>-                         css->submission_version);
>>>>>>>-        compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>-                         css->submission_version);
>>>>>>>-    } else if (release->minor >= 3) {
>>>>>>>-        /* v70.3.0 introduced v1.1.0 */
>>>>>>>-        compatibility->major = 1;
>>>>>>>-        compatibility->minor = 1;
>>>>>>>-        compatibility->patch = 0;
>>>>>>>-    } else {
>>>>>>>-        /* v70.0.0 introduced v1.0.0 */
>>>>>>>-        compatibility->major = 1;
>>>>>>>-        compatibility->minor = 0;
>>>>>>>-        compatibility->patch = 0;
>>>>>>>+
>>>>>>>+    /* We don't support GuC releases older than 70.19 */
>>>>>>>+    if (release->major < 70 || (release->major == 70 && release->minor < 19)) {
>>>>>>>+        xe_gt_err(gt, "Unsupported GuC v%u.%u! v70.19 or newer is required\n",
>>>>>>>+              release->major, release->minor);
>>>>>>>+        return -EINVAL;
>>>>>>>       }
>>>>>>>+    compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, css->submission_version);
>>>>>>>+    compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, css->submission_version);
>>>>>>>+    compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css->submission_version);
>>>>>>>+
>>>>>>>       uc_fw->private_data_size = css->private_data_size;
>>>>>>>+
>>>>>>>+    return 0;
>>>>>>>   }
>>>>>>>   int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw)
>>>>>>>@@ -424,7 +416,7 @@ static int parse_css_header(struct xe_uc_fw *uc_fw, const void *fw_data, size_t
>>>>>>>       release->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css->sw_version);
>>>>>>>       if (uc_fw->type == XE_UC_FW_TYPE_GUC)
>>>>>>>-        guc_read_css_info(uc_fw, css);
>>>>>>>+        return guc_read_css_info(uc_fw, css);
>>>>>>>       return 0;
>>>>>>>   }
>>>>>>
>>>
>
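
With 70.19 as the floor, the patch always reads the compatibility version
from css->submission_version, a field the removed code notes was added in
v70.6.0. Below is a userspace sketch of that decoding. The bit layout
(major in bits 23:16, minor in 15:8, patch in 7:0) is an assumption that
mirrors i915's CSS_SW_VERSION_UC_* definitions and should be checked
against xe's own headers; field_get() and decode_css_version() are
hypothetical stand-ins, the former for the kernel's FIELD_GET().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the CSS version words (verify against xe's headers). */
#define CSS_SW_VERSION_UC_MAJOR	(0xFFu << 16)
#define CSS_SW_VERSION_UC_MINOR	(0xFFu << 8)
#define CSS_SW_VERSION_UC_PATCH	(0xFFu << 0)

struct uc_version {
	uint32_t major, minor, patch;
};

/*
 * Userspace stand-in for FIELD_GET(): keep the masked bits and shift them
 * down by dividing by the mask's lowest set bit (mask & -mask). `mask`
 * must be nonzero and contiguous.
 */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) / (mask & -mask);
}

/* Decode a packed version word such as css->submission_version. */
static struct uc_version decode_css_version(uint32_t word)
{
	struct uc_version v = {
		.major = field_get(CSS_SW_VERSION_UC_MAJOR, word),
		.minor = field_get(CSS_SW_VERSION_UC_MINOR, word),
		.patch = field_get(CSS_SW_VERSION_UC_PATCH, word),
	};

	return v;
}
```

Under the assumed layout, a release word of 0x00461302 decodes to 70.19.2.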