[RFC] drm/xe/guc: Don't support GuC older GuC 70.x releases
Lucas De Marchi
lucas.demarchi at intel.com
Fri Feb 9 06:01:26 UTC 2024
On Thu, Feb 08, 2024 at 04:29:55PM -0800, Daniele Ceraolo Spurio wrote:
>
>
>On 2/7/2024 12:40 PM, Lucas De Marchi wrote:
>>On Wed, Feb 07, 2024 at 10:34:07AM -0800, Daniele Ceraolo Spurio wrote:
>>>
>>>
>>>On 2/7/2024 8:42 AM, Lucas De Marchi wrote:
>>>>+Gustavo who is dealing with DMC firmware lately
>>>>
>>>>On Wed, Feb 07, 2024 at 03:30:59AM +0000, Matthew Brost wrote:
>>>>>On Tue, Feb 06, 2024 at 05:18:50PM -0800, John Harrison wrote:
>>>>>>On 2/6/2024 15:41, Daniele Ceraolo Spurio wrote:
>>>>>>>Supporting older GuC versions comes with baggage, both on the coding
>>>>>>>side (due to interfaces only being available from a certain version
>>>>>>>onwards) and on the testing side (due to having to make
>>>>>>>sure
>>>>>>the driver
>>>>>>>works as expected with older GuCs).
>>>>>>>Since all of our Xe platform are still under force probe, we haven't
>>>>>>>committed to support any specific GuC version and we therefore don't
>>>>>>>need to support the older once, which means that we can
>>>>>>>force
>>>>>>a bottom
>>>>>>>limit to what GuC we accept. This allows us to remove any
>>>>>>>conditional
>>>>>>>statements based on older GuC versions and also to approach newer
>>>>>>>additions knowing that we'll never attempt to load something older
>>>>>>>than our minimum requirement.
>>>>>>>
>>>>>>>RFC: this patch sets the minimum to the current GuC version (70.19),
>>>>>>>but that can be moved one way or the other. The main aim here is
>>>>>>Ideally, this would be bumped every time we update Xe to a
>>>>>>newer firmware
>>>>>>version right up to the point when force probe is lifted. At
>>>>>>that point it
>>>>>>becomes fixed and we have to add the version check support
>>>>>>back in for
>>>>>>future w/a's and features.
>>>>>>
>>>>>>Get's my vote :).
>>>>
>>>>Yeah, but see my other reply... I think we will have to wait the
>>>>firmware being available in linux-firmware for that.
>>>>
>>>>Also, let's kickstart a discussion on our process with some
>>>>possible changes so we can get it documented. I think we have a good
>>>>opportunity here to start adopting the
>>>>https://gitlab.freedesktop.org/drm/firmware repo.
>>>>
>>>>Rough idea:
>>>>
>>>>1) use intel-staging branch with tags for pull requests to
>>>> linux-firmware, just like documented in their readme.
>>>> IMO the naming is rather unfortunate since it would be
>>>> good to use it for (2) below.... but since it's already used
>>>> we can use something else.
>>>>
>>>> this would mainly replace the use we have today for
>>>> https://cgit.freedesktop.org/drm/drm-firmware/ , which
>>>>could be retired. From upstream linux-firmware pov the only
>>>> change would be the remote location and that we start using tags
>>>> for the pull requests, coming from a single branch regardless of
>>>> the firmware (guc, huc, dmc, gsc): intel-staging. Once accepted in
>>>> linux-firmware, the branch is fast-forwarded.
>>>
>>>I think this needs a bit more fleshing out, because before we do a
>>>pull request, we do want to run CI on the blobs. Also, in several
>>>occasions we went through a couple of versions before we closed on
>>>what to push to linux-firmware (e.g. in the latest push we started
>>>with 70.19.1 but then pushed 70.19.2), so we can't go to
>>>intel-staging until we're actually ready to push. I think the
>>>process you have below for mmp blobs should work for this early
>>>testing flow as well, but we might end up with a lot of noise in
>>>the staging-intel-for-CI branch.
>>
>>that would be a throw away branch where we push stuff to be able to test
>>on CI. I don't think the commit history matters much there. The fact
>
>That depends on how CI does things. With the current handling of
>throwaway branches we have on drm-firmware, a CI request can
>accidentally roll back another one. e.g., if we push a throwaway
>branch with a GuC update and then another with a DMC update, the
>second push will roll-back the GuC to what's on the new branch (likely
>the linux-firmware version). That's why there was a suggestion ti use
>a unified branch for CI as well.
not sure we are talking about the same thing. It is a unified branch for
CI: staging-intel-for-CI is where the mmp +
about-to-be-upstreamed-for-the-first-time firmware blobs are added,
regardless if it's guc, dmc, huc, etc. IMO it's much simpler since CI
basically has to take the additional firmware from this 1 branch. No
risk of rolling back another firmware because of the new one.
>
>
>>that the firmware is available to match what is in the kernel and that
>>there's a documented process for using it in my view trumps the
>>this downside.
>>
>>what I expect would be, considering the LNL case as example:
>>
>>1) Start testing with the mmp version:
>>
>> a) Add firmware to drm/firmware intel-staging-for-CI
>> b) Add commit in topic/xe-for-CI on the kernel side to make
>> use of that firmware
>>
>>2) Ooops, that has bugs
>>
>> a) add a second mmp firmware to drm/firmware intel-staging-for-CI
>> b) replace commit in topic/xe-for-CI on the kernel side
>>
>>3) we think we are good, let's try for real
>>
>> a) Add lnl_guc_70.bin to drm/firmware intel-staging-for-CI
>> b) replace commit in topic/xe-for-CI on the kernel side
>>
>>4) yay, it worked
>>
>> a) Add that lnl_gu_70.bin firmware to intel-staging branch and
>> prepare pull request to linux-firmware
>> b) move patch from topic/xe-for-CI to drm-xe-next: i.e., rebase
>> topic/xe-for-CI on top of drm-xe-next leaving that commit as
>> first one. git push topic/xe-for-CI, dim push drm-xe-next (or
>> implement the logic in dim to push 2 branches)
>>
>> We may need some time between (a) and (b) depending on where we
>> are on the kernel release cycle: we don't want to submit a
>> kernel pull request before the firmware is available @
>> linux-firmware repo.
>>
>>Note that the fact we are using mmp makes it more complex, although
>>explicit. Going direct with lnl_gu_70.bin would also work and avoid
>>updating the commits on the kernel side.
>
>This works for a completely new release. For updating an existing
>release, we'll have to push, potentially multiple times, all the
>*_guc_70.bin binaries to intel-staging-for-CI. Just to be clear, I
>have nothing against this, just noting that it would generate a lot of
>noise in that branch and potentially use a lot of space on disk.
>
>>
>>>
>>>We also need some rules to handle the case where there is already
>>>a PR in flight and we need to push some more blobs. This might be
>>>as easy as the committer seeing that there are commits on top of
>>>master, replying to the previous PR to deprecate it, and then
>>>generating a new PR with all the blobs.
>>
>>the pull requests to linux-firmware would come from tags, not a branch.
>>So you have (tip of the branch is on top):
>>
>> o <intel-staging> intel: Add lnl_guc_70.bin
>> o <refs/tags/intel-2024-01-30> intel: Update dg2_guc_70.bin <--
>>last in flight pull request
>> o intel: Add lnl_dmc.bin
>> o <origin/main> .... <-- where linux-firmware is at
>>
>>Looking at amd-staging, it seems to match what they are doing:
>>https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging?ref_type=heads
>>
>>
>>see the amd-$DATE tags
>
>Sorry I wasn't very clear in my comment, what I wanted to point out
>was that if we are on a unified branch and we have the PR against a
>specific tag (intel-2024-01-30 in your example) already in flight, how
>do we generate a new PR for the newer commit that comes after the tag
>(and which will have its own new tag)? Does git do some tag magic and
>handle it for us, or do we need to generate a new PR that supersedes
>the one in flight?
humn... there is no magic, the old tag is an ascendent path of the new
one. But as I said, just coordinating with the few people updating
firmware who/when will do the pull request should be sufficient for
avoiding a pull request when there's already another one in flight.
>
>
>>
>>>
>>>
>>>>
>>>>2) mmp firmware versions are only ever pushed to a separate
>>>>staging-intel-for-CI
>>>> branch. There is no pull request in the mailing for this. We
>>>>can either
>>>> push directly to the branch or create MRs in gitlab. CI would start
>>>> using this branch for the extra firmware for platforms instead of
>>>> whatever it's using today to process the pull requests from the
>>>> mailing list. Or whatever it's using, because I don't know
>>>>and don't
>>>> see it documented anywhere.
>>>
>>>As long as the CI team is ok with this, I'm all for it.
>>>
>>>>
>>>> The patch on the kernel side to use the mmp firmware is only ever
>>>> pushed to the topic/xe-for-CI branch since a) the firmware is
>>>>coming from
>>>> a non-official location and b) end users and distro packaging
>>>> shouldn't see a warning when building the kernel due to a possibly
>>>> missing firmware
>>>>3) Raising firmware version requirement for past platforms used as
>>>> SDV can be done **unless** it raises the major version.
>>>>That's because
>>>> end users would start seeing the warning that we avoided in (2).
>>>
>>>Who are the end users here? If we're talking about older
>>>non-officially supported platforms, the only users should be
>>>developers and they should be able to handle having to update the
>>>firmwares to a newer major versions.
>>
>>distros and any developer outside Intel. The kernel build system is
>>unaware of xe.force_probe. So if you have, after the several macros:
>>
>>MODULE_FIRMWARE("xe/tgl_guc_71.bin")
>>
>>It will show up in `modinfo -f firmware xe`. And it will show as a
>>warning when installing/packaging a kernel.
>>
>>It doesn't matter for minor/patch updates because the file name is
>>major-only and **running** with that module is protected by the
>>force_probe. The major may be updated when it's available in
>>linux-firmware, which means i915 started using it (for i915 that would
>>be "as an option, with fallback to the previous major release" of
>>course).
>
>Ok I get the concern. My assumption here was that we'd only update the
>minimum required version if that version was in linux-firmware even
>for minor updates, hence why I didn't see why a major update would be
>different. I guess we could go with a more relaxed approach where we
>allow the required minor to be updated for force-probe platforms as
>long as the firmware is available on a public/CI branch even if it is
>not in linux-firmware.
yep, I don't see it causing issues to end users.
>
>Getting back on track with the original purpose of this patch, are you
>ok with setting the minimum to 70.19 if I first push the matching PVC
>70.19 binary (via the old method for now), while we continue sorting
>out how to manage the new repo?
yes.
Lucas De Marchi
>
>Daniele
>
>>
>>Lucas De Marchi
>>
>>>
>>>Daniele
>>>
>>>>
>>>>thoughts?
>>>>
>>>>Lucas De Marchi
>>>>
>>>>>>
>>>>>
>>>>>Mine too.
>>>>>
>>>>>With that:
>>>>>Acked-by: Matthew Brost <matthew.brost at intel.com>
>>>>>
>>>>>>John.
>>>>>>
>>>>>>>agreeing to stop supporting very old GuC releases on the
>>>>>>>newer
>>>>>>driver.
>>>>>>>
>>>>>>>Signed-off-by: Daniele Ceraolo Spurio
>>>>>><daniele.ceraolospurio at intel.com>
>>>>>>>Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>>>>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>>>>Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>>>>Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>>>Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>>>---
>>>>>>> drivers/gpu/drm/xe/xe_guc.c | 14 ++------------
>>>>>>> drivers/gpu/drm/xe/xe_uc_fw.c | 36
>>>>>>++++++++++++++---------------------
>>>>>>> 2 files changed, 16 insertions(+), 34 deletions(-)
>>>>>>>
>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>index 868208a39829..5e6b27aac495 100644
>>>>>>>--- a/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>+++ b/drivers/gpu/drm/xe/xe_guc.c
>>>>>>>@@ -132,15 +132,10 @@ static u32 guc_ctl_ads_flags(struct
>>>>>>xe_guc *guc)
>>>>>>> return flags;
>>>>>>> }
>>>>>>>-#define GUC_VER(maj, min, pat) (((maj) << 16) | ((min)
>>>>>>><<
>>>>>>8) | (pat))
>>>>>>>-
>>>>>>> static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>> {
>>>>>>> struct xe_device *xe = guc_to_xe(guc);
>>>>>>> struct xe_gt *gt = guc_to_gt(guc);
>>>>>>>- struct xe_uc_fw *uc_fw = &guc->fw;
>>>>>>>- struct xe_uc_fw_version *version =
>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>>-
>>>>>>> u32 flags = 0;
>>>>>>> if (XE_WA(gt, 22012773006))
>>>>>>>@@ -170,13 +165,8 @@ static u32 guc_ctl_wa_flags(struct xe_guc *guc)
>>>>>>> if (XE_WA(gt, 1509372804))
>>>>>>> flags |= GUC_WA_RENDER_RST_RC6_EXIT;
>>>>>>>- if (XE_WA(gt, 14018913170)) {
>>>>>>>- if (GUC_VER(version->major, version->minor,
>>>>>>version->patch) >= GUC_VER(70, 7, 0))
>>>>>>>- flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>>- else
>>>>>>>- drm_dbg(&xe->drm, "Skip WA 14018913170: GUC
>>>>>>version expected >= 70.7.0, found %u.%u.%u\n",
>>>>>>>- version->major, version->minor, version->patch);
>>>>>>>- }
>>>>>>>+ if (XE_WA(gt, 14018913170))
>>>>>>>+ flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6;
>>>>>>> return flags;
>>>>>>> }
>>>>>>>diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>index 4714f2c8d2ba..e5bf59616f3d 100644
>>>>>>>--- a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>+++ b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>>>>@@ -296,36 +296,28 @@ static void uc_fw_fini(struct
>>>>>>>drm_device
>>>>>>*drm, void *arg)
>>>>>>>xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_SELECTED);
>>>>>>> }
>>>>>>>-static void guc_read_css_info(struct xe_uc_fw *uc_fw,
>>>>>>>struct
>>>>>>uc_css_header *css)
>>>>>>>+static int guc_read_css_info(struct xe_uc_fw *uc_fw,
>>>>>>>struct
>>>>>>uc_css_header *css)
>>>>>>> {
>>>>>>> struct xe_gt *gt = uc_fw_to_gt(uc_fw);
>>>>>>> struct xe_uc_fw_version *release =
>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_RELEASE];
>>>>>>> struct xe_uc_fw_version *compatibility =
>>>>>>&uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];
>>>>>>> xe_gt_assert(gt, uc_fw->type == XE_UC_FW_TYPE_GUC);
>>>>>>>- xe_gt_assert(gt, release->major >= 70);
>>>>>>>-
>>>>>>>- if (release->major > 70 || release->minor >= 6) {
>>>>>>>- /* v70.6.0 adds CSS header support */
>>>>>>>- compatibility->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>>- css->submission_version);
>>>>>>>- compatibility->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>>- css->submission_version);
>>>>>>>- compatibility->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>>- css->submission_version);
>>>>>>>- } else if (release->minor >= 3) {
>>>>>>>- /* v70.3.0 introduced v1.1.0 */
>>>>>>>- compatibility->major = 1;
>>>>>>>- compatibility->minor = 1;
>>>>>>>- compatibility->patch = 0;
>>>>>>>- } else {
>>>>>>>- /* v70.0.0 introduced v1.0.0 */
>>>>>>>- compatibility->major = 1;
>>>>>>>- compatibility->minor = 0;
>>>>>>>- compatibility->patch = 0;
>>>>>>>+
>>>>>>>+ /* We don't support GuC releases older than 70.19 */
>>>>>>>+ if (release->major < 70 || (release->major == 70 &&
>>>>>>release->minor < 19)) {
>>>>>>>+ xe_gt_err(gt, "Unsupported GuC v%u.%u! v70.19 or
>>>>>>newer is required\n",
>>>>>>>+ release->major, release->minor);
>>>>>>>+ return -EINVAL;
>>>>>>> }
>>>>>>>+ compatibility->major =
>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_MAJOR,
>>>>>>css->submission_version);
>>>>>>>+ compatibility->minor =
>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_MINOR,
>>>>>>css->submission_version);
>>>>>>>+ compatibility->patch =
>>>>>>>FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>css->submission_version);
>>>>>>>+
>>>>>>> uc_fw->private_data_size = css->private_data_size;
>>>>>>>+
>>>>>>>+ return 0;
>>>>>>> }
>>>>>>> int xe_uc_fw_check_version_requirements(struct xe_uc_fw *uc_fw)
>>>>>>>@@ -424,7 +416,7 @@ static int parse_css_header(struct
>>>>>>xe_uc_fw *uc_fw, const void *fw_data, size_t
>>>>>>> release->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH,
>>>>>>css->sw_version);
>>>>>>> if (uc_fw->type == XE_UC_FW_TYPE_GUC)
>>>>>>>- guc_read_css_info(uc_fw, css);
>>>>>>>+ return guc_read_css_info(uc_fw, css);
>>>>>>> return 0;
>>>>>>> }
>>>>>>
>>>
>
More information about the Intel-xe
mailing list