[RFC] drm/xe/guc: Don't support GuC older GuC 70.x releases
Gustavo Sousa
gustavo.sousa at intel.com
Wed Feb 14 19:22:03 UTC 2024
Quoting Lucas De Marchi (2024-02-09 03:01:26-03:00)
>On Thu, Feb 08, 2024 at 04:29:55PM -0800, Daniele Ceraolo Spurio wrote:
>>
>>
>>On 2/7/2024 12:40 PM, Lucas De Marchi wrote:
>>>On Wed, Feb 07, 2024 at 10:34:07AM -0800, Daniele Ceraolo Spurio wrote:
>>>>
>>>>
>>>>On 2/7/2024 8:42 AM, Lucas De Marchi wrote:
>>>>>+Gustavo who is dealing with DMC firmware lately
Hey, guys. Sorry for being so late for the party...
>>>>>
>>>>>On Wed, Feb 07, 2024 at 03:30:59AM +0000, Matthew Brost wrote:
>>>>>>On Tue, Feb 06, 2024 at 05:18:50PM -0800, John Harrison wrote:
>>>>>>>On 2/6/2024 15:41, Daniele Ceraolo Spurio wrote:
>>>>>>>>Supporting older GuC versions comes with baggage, both on the coding
>>>>>>>>side (due to interfaces only being available from a certain version
>>>>>>>>onwards) and on the testing side (due to having to make
>>>>>>>>sure
>>>>>>>the driver
>>>>>>>>works as expected with older GuCs).
>>>>>>>>Since all of our Xe platform are still under force probe, we haven't
>>>>>>>>committed to support any specific GuC version and we therefore don't
>>>>>>>>need to support the older once, which means that we can
>>>>>>>>force
>>>>>>>a bottom
>>>>>>>>limit to what GuC we accept. This allows us to remove any
>>>>>>>>conditional
>>>>>>>>statements based on older GuC versions and also to approach newer
>>>>>>>>additions knowing that we'll never attempt to load something older
>>>>>>>>than our minimum requirement.
>>>>>>>>
>>>>>>>>RFC: this patch sets the minimum to the current GuC version (70.19),
>>>>>>>>but that can be moved one way or the other. The main aim here is
>>>>>>>Ideally, this would be bumped every time we update Xe to a
>>>>>>>newer firmware
>>>>>>>version right up to the point when force probe is lifted. At
>>>>>>>that point it
>>>>>>>becomes fixed and we have to add the version check support
>>>>>>>back in for
>>>>>>>future w/a's and features.
>>>>>>>
>>>>>>>Get's my vote :).
>>>>>
>>>>>Yeah, but see my other reply... I think we will have to wait the
>>>>>firmware being available in linux-firmware for that.
>>>>>
>>>>>Also, let's kickstart a discussion on our process with some
>>>>>possible changes so we can get it documented. I think we have a good
>>>>>opportunity here to start adopting the
>>>>>https://gitlab.freedesktop.org/drm/firmware repo.
>>>>>
>>>>>Rough idea:
>>>>>
>>>>>1) use intel-staging branch with tags for pull requests to
>>>>> linux-firmware, just like documented in their readme.
>>>>> IMO the naming is rather unfortunate since it would be
>>>>> good to use it for (2) below.... but since it's already used
Agreed. The <vendor>-staging branches, as proposed in the README, look
more like some sort of <vendor>-next type of branch.
I think it would be good if vendors could have their own `<vendor>/*`
(or `<vendor>-*`) namespace for branch names so that, while having some
common conventions, they could also adapt parts of it to their needs.
For example, we could have branches like intel/next for (1) and intel/ci
for (2). Not sure how easy it is to do this now, though.
>>>>> we can use something else.
>>>>>
>>>>> this would mainly replace the use we have today for
>>>>> https://cgit.freedesktop.org/drm/drm-firmware/ , which
>>>>>could be retired. From upstream linux-firmware pov the only
>>>>> change would be the remote location and that we start using tags
>>>>> for the pull requests, coming from a single branch regardless of
>>>>> the firmware (guc, huc, dmc, gsc): intel-staging. Once accepted in
>>>>> linux-firmware, the branch is fast-forwarded.
>>>>
>>>>I think this needs a bit more fleshing out, because before we do a
>>>>pull request, we do want to run CI on the blobs. Also, in several
>>>>occasions we went through a couple of versions before we closed on
>>>>what to push to linux-firmware (e.g. in the latest push we started
>>>>with 70.19.1 but then pushed 70.19.2), so we can't go to
>>>>intel-staging until we're actually ready to push. I think the
>>>>process you have below for mmp blobs should work for this early
>>>>testing flow as well, but we might end up with a lot of noise in
>>>>the staging-intel-for-CI branch.
>>>
>>>that would be a throw away branch where we push stuff to be able to test
>>>on CI. I don't think the commit history matters much there. The fact
>>
>>That depends on how CI does things. With the current handling of
>>throwaway branches we have on drm-firmware, a CI request can
>>accidentally roll back another one. e.g., if we push a throwaway
>>branch with a GuC update and then another with a DMC update, the
>>second push will roll-back the GuC to what's on the new branch (likely
>>the linux-firmware version). That's why there was a suggestion ti use
>>a unified branch for CI as well.
>
>not sure we are talking about the same thing. It is a unified branch for
>CI: staging-intel-for-CI is where the mmp +
>about-to-be-upstreamed-for-the-first-time firmware blobs are added,
>regardless if it's guc, dmc, huc, etc. IMO it's much simpler since CI
>basically has to take the additional firmware from this 1 branch. No
>risk of rolling back another firmware because of the new one.
Maybe we are having some confusion here because of the term "throwaway"
for intel-staging-for-ci?
We use throwaway branches for the current process, but I guess
intel-staging-for-ci would not really be a "throwaway" branch per se and
a unified one (as already mentioned above).
I think using intel-staging-for-ci will be okay if teams take the care
of only adding/updating/removing blobs they are responsible for.
>
>>
>>
>>>that the firmware is available to match what is in the kernel and that
>>>there's a documented process for using it in my view trumps the
>>>this downside.
>>>
>>>what I expect would be, considering the LNL case as example:
>>>
>>>1) Start testing with the mmp version:
>>>
>>> a) Add firmware to drm/firmware intel-staging-for-CI
>>> b) Add commit in topic/xe-for-CI on the kernel side to make
>>> use of that firmware
How would we check for CI after (b)?
For DMC, I have been doing something similar. Differences are:
* for (a), I am using an intel-ci branch on drm/drm-firmware and send
a pull request to intel-gfx so that CI makes the mmp blobs
available;
* for (b), I send a "[CI]"-tagged patch to intel-gfx making the kernel
explicitly use that fully versioned blob path. One advantage here is
that I keep a broken DMC release from causing CI noise on existing
unrelated patch series.
>>>
>>>2) Ooops, that has bugs
>>>
>>> a) add a second mmp firmware to drm/firmware intel-staging-for-CI
>>> b) replace commit in topic/xe-for-CI on the kernel side
>>>
>>>3) we think we are good, let's try for real
>>>
>>> a) Add lnl_guc_70.bin to drm/firmware intel-staging-for-CI
>>> b) replace commit in topic/xe-for-CI on the kernel side
>>>
>>>4) yay, it worked
>>>
>>> a) Add that lnl_gu_70.bin firmware to intel-staging branch and
>>> prepare pull request to linux-firmware
>>> b) move patch from topic/xe-for-CI to drm-xe-next: i.e., rebase
>>> topic/xe-for-CI on top of drm-xe-next leaving that commit as
>>> first one. git push topic/xe-for-CI, dim push drm-xe-next (or
>>> implement the logic in dim to push 2 branches)
>>>
>>> We may need some time between (a) and (b) depending on where we
>>> are on the kernel release cycle: we don't want to submit a
>>> kernel pull request before the firmware is available @
>>> linux-firmware repo.
>>>
>>>Note that the fact we are using mmp makes it more complex, although
>>>explicit. Going direct with lnl_gu_70.bin would also work and avoid
>>>updating the commits on the kernel side.
>>
>>This works for a completely new release. For updating an existing
>>release, we'll have to push, potentially multiple times, all the
>>*_guc_70.bin binaries to intel-staging-for-CI. Just to be clear, I
>>have nothing against this, just noting that it would generate a lot of
>>noise in that branch and potentially use a lot of space on disk.
>>
>>>
>>>>
>>>>We also need some rules to handle the case where there is already
>>>>a PR in flight and we need to push some more blobs. This might be
>>>>as easy as the committer seeing that there are commits on top of
>>>>master, replying to the previous PR to deprecate it, and then
>>>>generating a new PR with all the blobs.
>>>
>>>the pull requests to linux-firmware would come from tags, not a branch.
>>>So you have (tip of the branch is on top):
>>>
>>> o <intel-staging> intel: Add lnl_guc_70.bin
>>> o <refs/tags/intel-2024-01-30> intel: Update dg2_guc_70.bin <--
>>>last in flight pull request
>>> o intel: Add lnl_dmc.bin
>>> o <origin/main> .... <-- where linux-firmware is at
>>>
>>>Looking at amd-staging, it seems to match what they are doing:
>>>https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging?ref_type=heads
>>>
>>>
>>>see the amd-$DATE tags
>>
>>Sorry I wasn't very clear in my comment, what I wanted to point out
>>was that if we are on a unified branch and we have the PR against a
>>specific tag (intel-2024-01-30 in your example) already in flight, how
>>do we generate a new PR for the newer commit that comes after the tag
>>(and which will have its own new tag)? Does git do some tag magic and
>>handle it for us, or do we need to generate a new PR that supersedes
>>the one in flight?
>
>humn... there is no magic, the old tag is an ascendent path of the new
>one. But as I said, just coordinating with the few people updating
>firmware who/when will do the pull request should be sufficient for
>avoiding a pull request when there's already another one in flight.
While I see the benefit of having a unified intel-staging-for-ci, I
think I'm failing to see much benefit of having a unified intel-staging
here.
Wouldn't it be better if pull request were independent of each other? If
we had an intel/* (or intel-*) branching namespace, we could keep using
throwaway branches for the pull requests and have the discipline of
removing them when not needed anymore.
--
Gustavo Sousa
>
>
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>2) mmp firmware versions are only ever pushed to a separate
>>>>>staging-intel-for-CI
>>>>> branch. There is no pull request in the mailing for this. We
>>>>>can either
>>>>> push directly to the branch or create MRs in gitlab. CI would start
>>>>> using this branch for the extra firmware for platforms instead of
>>>>> whatever it's using today to process the pull requests from the
>>>>> mailing list. Or whatever it's using, because I don't know
>>>>>and don't
>>>>> see it documented anywhere.
>>>>
>>>>As long as the CI team is ok with this, I'm all for it.
>>>>
>>>>>
>>>>> The patch on the kernel side to use the mmp firmware is only ever
>>>>> pushed to the topic/xe-for-CI branch since a) the firmware is
>>>>>coming from
>>>>> a non-official location and b) end users and distro packaging
>>>>> shouldn't see a warning when building the kernel due to a possibly
>>>>> missing firmware
>>>>>3) Raising firmware version requirement for past platforms used as
>>>>> SDV can be done **unless** it raises the major version.
>>>>>That's because
>>>>> end users would start seeing the warning that we avoided in (2).
>>>>
>>>>Who are the end users here? If we're talking about older
>>>>non-officially supported platforms, the only users should be
>>>>developers and they should be able to handle having to update the
>>>>firmwares to a newer major versions.
>>>
>>>distros and any developer outside Intel. The kernel build system is
>>>unaware of xe.force_probe. So if you have, after the several macros:
>>>
>>>MODULE_FIRMWARE("xe/tgl_guc_71.bin")
>>>
>>>It will show up in `modinfo -f firmware xe`. And it will show as a
>>>warning when installing/packaging a kernel.
>>>
>>>It doesn't matter for minor/patch updates because the file name is
>>>major-only and **running** with that module is protected by the
>>>force_probe. The major may be updated when it's available in
>>>linux-firmware, which means i915 started using it (for i915 that would
>>>be "as an option, with fallback to the previous major release" of
>>>course).
>>
>>Ok I get the concern. My assumption here was that we'd only update the
>>minimum required version if that version was in linux-firmware even
>>for minor updates, hence why I didn't see why a major update would be
>>different. I guess we could go with a more relaxed approach where we
>>allow the required minor to be updated for force-probe platforms as
>>long as the firmware is available on a public/CI branch even if it is
>>not in linux-firmware.
>
>yep, I don't see it causing issues to end users.
>
>>
>>Getting back on track with the original purpose of this patch, are you
>>ok with setting the minimum to 70.19 if I first push the matching PVC
>>70.19 binary (via the old method for now), while we continue sorting
>>out how to manage the new repo?
>
>yes.
>
>Lucas De Marchi
>
>>
>>Daniele
>>
>>>
>>>Lucas De Marchi
>>>
>>>>
>>>>Daniele
>>>>
>>>>>
>>>>>thoughts?
>>>>>
>>>>>Lucas De Marchi
>>>>>
>>>>>>>
>>>>>>
>>>>>>Mine too.
>>>>>>
>>>>>>With that:
>>>>>>Acked-by: Matthew Brost <matthew.brost at intel.com>
>>>>>>
>>>>>>>John.
>>>>>>>
>>>>>>>>agreeing to stop supporting very old GuC releases on the
>>>>>>>>newer
>>>>>>>driver.
>>>>>>>>
>>>>>>>>Signed-off-by: Daniele Ceraolo Spurio
>>>>>>><daniele.ceraolospurio at intel.com>
>>>>>>>>Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>>>>>Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>>>>>Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>>>>Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>>>>---
>>>>>>>> drivers/gpu/drm/xe/xe_guc.c | 14 ++------------
>>>>>>>> drivers/gpu/drm/xe/xe_uc_fw.c | 36
>>>>>>>++++++++++++++---------------------
>>>>>>>> 2 files changed, 16 insertions(+), 34 deletions(-)
>>>>>>>>
>>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>>index 868208a39829..5e6b27aac495 100644
>>>>>>>>--- a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>>+++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>>@@ -132,15 +132,10 @@ static u32 guc_ctl_ads_flags(struct
>>>>>>>xe_guc *guc)
>>>>>>>> return flags;
>>>>>>>> }
>>>>>>>>-#define GUC_VER(maj, min, pat) (((maj) << 16) | ((min)
>>>>>>>><<
>>>>>>>8) | (pat))
>>>>>>>>-
>>>>>>>> static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>>> {
>>>>>>>> struct xe_device *xe = guc_to_xe(guc);
>>>>>>>> struct xe_gt *gt = guc_to_gt(guc);
>>>>>>>>- struct xe_uc_fw *uc_fw = &guc->fw;
>>>>>>>>- struct xe_uc_fw_version *version =
>>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>>-
>>>>>>>> u32 flags = 0;
>>>>>>>> if (XE_WA(gt, 22012773006))
>>>>>>>>@@ -170,13 +165,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>>> if (XE_WA(gt, 1509372804))
>>>>>>>> flags |= GUC_WA_RENDER_RST_RC6_EXIT;
>>>>>>>>- if (XE_WA(gt, 14018913170)) {
>>>>>>>>- if (GUC_VER(version->major, version->minor,
>>>>>>>version->patch) >= GUC_VER(70, 7, 0))
>>>>>>>>- flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>>- else
>>>>>>>>- drm_dbg(&xe->drm, "Skip WA 14018913170: GUC
>>>>>>>version expected >= 70.7.0, found %u.%u.%u\n",
>>>>>>>>- version->major, version->minor, version->patch);
>>>>>>>>- }
>>>>>>>>+ if (XE_WA(gt, 14018913170))
>>>>>>>>+ flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>> return flags;
>>>>>>>> }
>>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>>index 4714f2c8d2ba..e5bf59616f3d 100644
>>>>>>>>--- a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>>+++ b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>>@@ -296,36 +296,28 @@ static void uc_fw_fini(struct
>>>>>>>>drm_device
>>>>>>>*drm, void *arg)
>>>>>>>>xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_SELECTED);
>>>>>>>> }
>>>>>>>>-static void guc_read_css_info(struct xe_uc_fw *uc_fw,
>>>>>>>>struct
>>>>>>>uc_css_header *css)
>>>>>>>>+static int guc_read_css_info(struct xe_uc_fw *uc_fw,
>>>>>>>>struct
>>>>>>>uc_css_header *css)
>>>>>>>> {
>>>>>>>> struct xe_gt *gt = uc_fw_to_gt(uc_fw);
>>>>>>>> struct xe_uc_fw_version *release =
>>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>> struct xe_uc_fw_version *compatibility =
>>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];
>>>>>>>> xe_gt_assert(gt, uc_fw->type == XE_UC_FW_TYPE_GUC);
>>>>>>>>- xe_gt_assert(gt, release->major >= 70);
>>>>>>>>-
>>>>>>>>- if (release->major > 70 || release->minor >= 6) {
>>>>>>>>- /* v70.6.0 adds CSS header support */
>>>>>>>>- compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>>>- css->submission_version);
>>>>>>>>- compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>>>- css->submission_version);
>>>>>>>>- compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>>- css->submission_version);
>>>>>>>>- } else if (release->minor >= 3) {
>>>>>>>>- /* v70.3.0 introduced v1.1.0 */
>>>>>>>>- compatibility->major = 1;
>>>>>>>>- compatibility->minor = 1;
>>>>>>>>- compatibility->patch = 0;
>>>>>>>>- } else {
>>>>>>>>- /* v70.0.0 introduced v1.0.0 */
>>>>>>>>- compatibility->major = 1;
>>>>>>>>- compatibility->minor = 0;
>>>>>>>>- compatibility->patch = 0;
>>>>>>>>+
>>>>>>>>+ /* We don't support GuC releases older than 70.19 */
>>>>>>>>+ if (release->major < 70 || (release->major == 70 &&
>>>>>>>release->minor < 19)) {
>>>>>>>>+ xe_gt_err(gt, "Unsupported GuC v%u.%u! v70.19 or
>>>>>>>newer is required\n",
>>>>>>>>+ release->major, release->minor);
>>>>>>>>+ return -EINVAL;
>>>>>>>> }
>>>>>>>>+ compatibility->major =
>>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>>css->submission_version);
>>>>>>>>+ compatibility->minor =
>>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>>css->submission_version);
>>>>>>>>+ compatibility->patch =
>>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>css->submission_version);
>>>>>>>>+
>>>>>>>> uc_fw->private_data_size = css->private_data_size;
>>>>>>>>+
>>>>>>>>+ return 0;
>>>>>>>> }
>>>>>>>> int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw)
>>>>>>>>@@ -424,7 +416,7 @@ static int parse_css_header(struct
>>>>>>>xe_uc_fw *uc_fw, const void *fw_data, size_t
>>>>>>>> release->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>css->sw_version);
>>>>>>>> if (uc_fw->type == XE_UC_FW_TYPE_GUC)
>>>>>>>>- guc_read_css_info(uc_fw, css);
>>>>>>>>+ return guc_read_css_info(uc_fw, css);
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>
>>>>
>>
More information about the Intel-xe
mailing list