[RFC] drm/i915: Add GuC submission interface version query

Souza, Jose jose.souza at intel.com
Wed Feb 7 19:43:46 UTC 2024


On Wed, 2024-02-07 at 11:34 -0800, John Harrison wrote:
> On 2/7/2024 10:49, Tvrtko Ursulin wrote:
> > On 07/02/2024 18:12, John Harrison wrote:
> > > On 2/7/2024 03:56, Tvrtko Ursulin wrote:
> > > > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > 
> > > > Add a new query for the GuC submission interface version.
> > > > 
> > > > Mesa intends to use this information to check for old firmware versions
> > > > with a known bug where using the render and compute command streamers
> > > > simultaneously can cause GPU hangs due to issues in firmware scheduling.
> > > > 
> > > > Based on patches from Vivaik and Joonas.
> > > > 
> > > > There is a little bit of an open around the width required for 
> > > > versions.
> > > > While the GuC FW iface says they are u8, i915 GuC code uses u32:
> > > > 
> > > >   #define CSS_SW_VERSION_UC_MAJOR               (0xFF << 16)
> > > >   #define CSS_SW_VERSION_UC_MINOR               (0xFF << 8)
> > > >   #define CSS_SW_VERSION_UC_PATCH               (0xFF << 0)
> > > > ...
> > > >   struct intel_uc_fw_ver {
> > > >           u32 major;
> > > >           u32 minor;
> > > >           u32 patch;
> > > >           u32 build;
> > > >   };
> > > This is copied from generic code which supports firmwares other than 
> > > GuC. Only GuC promises to use 8-bit version components. Other 
> > > firmwares very definitely do not. There is no open.
> > 
> > Ack.
> > 
> > > > 
> > > > So we could make the query u8, and refactor the struct intel_uc_fw_ver
> > > > to use u8, or not. To avoid any doubts on why we are assigning u32 to
> > > > u8 I simply opted to use u64. Which avoids the need to add any padding
> > > > too.
> > > I don't follow how potential 8 vs 32 confusion means jump to 64?!
> > 
> > Suggestion was to use u8 in the uapi in order to align with GuC FW ABI 
> > (or however it's called), in which case there would be:
> > 
> >    ver.major = guc->submission_version.major;
> > 
> > which would be:
> > 
> >    (u8) = (u32)
> > 
> > And I was anticipating someone not liking that either. Using too wide 
> > u64 simply avoids the need to add a padding element to the uapi struct.
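To make the width question concrete, here is a minimal userspace sketch in C. The struct names (ver_u8, ver_u32x4, ver_u64x3) and both helpers are hypothetical stand-ins, not the proposed uAPI. It assumes the usual practice of keeping uAPI structs a multiple of 8 bytes, and only shows that a (u8) = (u32) assignment drops the upper bits and which of the layouts would need a padding member.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, for illustration only (not the real uAPI). */
struct ver_u8    { uint8_t  major, minor, patch; };
struct ver_u32x4 { uint32_t branch, major, minor, patch; };
struct ver_u64x3 { uint64_t major, minor, patch; };

/* Mimics "(u8) = (u32)": values above 255 lose their upper bits. */
static uint8_t narrow_to_u8(uint32_t v)
{
	return (uint8_t)v;
}

/* Assumed convention: a struct whose size is a multiple of 8 needs no pad. */
static int is_multiple_of_8(size_t size)
{
	return size % 8 == 0;
}
```

With these definitions the 3x u8 layout is the only one that is not a multiple of 8 bytes, so it is the only one that would need explicit padding under that assumption.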
> > 
> > If you are positive we need to include a branch number, even though it 
> > does not seem to be implemented in the code (*), then I can make 
> > uapi 4x u32 and achieve the same.
> It's not implemented in the code because we've never had to, and it is 
> yet another train wreck waiting to happen. There are a bunch of issues 
> at different levels that need to be resolved. But that is all in the 
> kernel and/or firmware and so can be added by a later kernel update when 
> necessary. However, if the UMDs are not already taking it into account 
> or it's not even in the UAPI, then we can't back fill in the kernel 
> later, we are just broken.

This sounds to me like a firmware version for internal testing or for
pre-production HW. Would any branched firmware be released to customers?

> 
> > 
> > (*)
> > static void uc_unpack_css_version(struct intel_uc_fw_ver *ver, u32 
> > css_value)
> > {
> >     /* Get version numbers from the CSS header */
> >     ver->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, css_value);
> >     ver->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, css_value);
> >     ver->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css_value);
> > }
> > 
> > No branch field in the CSS header?
> I think there is, it's just not officially implemented yet.
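For reference, the unpacking quoted above can be reproduced outside the kernel. The following is a rough userspace sketch in C: the three masks are copied from the quoted defines, while field_get(), struct fw_ver and unpack_css_version() are hypothetical stand-ins for FIELD_GET(), struct intel_uc_fw_ver and uc_unpack_css_version(). It does not decode a branch field, since none is implemented in the quoted code.

```c
#include <stdint.h>

/* Masks copied from the quoted i915 defines (unsigned here). */
#define CSS_SW_VERSION_UC_MAJOR (0xFFu << 16)
#define CSS_SW_VERSION_UC_MINOR (0xFFu << 8)
#define CSS_SW_VERSION_UC_PATCH (0xFFu << 0)

/* Hypothetical stand-in for struct intel_uc_fw_ver. */
struct fw_ver {
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/*
 * Userspace stand-in for FIELD_GET(): mask the value, then divide by the
 * lowest set bit of the mask, which is the same as shifting the field down.
 * The mask must be non-zero.
 */
static uint32_t field_get(uint32_t mask, uint32_t value)
{
	uint32_t lowest_bit = mask & (~mask + 1u);

	return (value & mask) / lowest_bit;
}

/* Extract the three 8-bit version components; bits 24-31 are ignored. */
static struct fw_ver unpack_css_version(uint32_t css_value)
{
	struct fw_ver ver;

	ver.major = field_get(CSS_SW_VERSION_UC_MAJOR, css_value);
	ver.minor = field_get(CSS_SW_VERSION_UC_MINOR, css_value);
	ver.patch = field_get(CSS_SW_VERSION_UC_PATCH, css_value);
	return ver;
}
```

Each component comes out in the 0..255 range, which is consistent with the point that only GuC promises 8-bit version components even though the struct stores u32.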
> 
> > 
> > And why is UMD supposed to reject a non-zero branch? Like how would 
> > 1.1.3.0 be fine and 1.1.3.1 be bad? I don't get it. But anyway, I can 
> > respin if you definitely confirm.
> Because that is backwards. The branch number goes at the front.
> 
> So, for example (using made up numbers, I don't recall offhand what 
> versions we have where) say we currently have 0.1.3.0 in tip and 0.1.1.0 
> in the last LTS. We then need to ship a critical security fix and back 
> port it to the LTS. Tip becomes 0.1.3.1 but the LTS can't become 0.1.1.1 
> because that version already exists in the history of tip and does not 
> contain the fix. So the LTS gets branched to 1.1.0.0. We then have both 
> branches potentially moving forwards with completely independent versioning.
> 
> Exactly the same as 5.8.x, 5.9.y, 6.0.z, etc in the Linux kernel 
> versioning. You cannot make any assumptions about what might be in 
> 1.4.5.6 compared to 0.1.2.3. 1.4.5.6 could actually be 0.1.0.3 with a stack 
> of security fixes but none of the features, workarounds or bug fixes 
> that are in 0.1.2.3.
> 
> Hence, if the branch number changes then all bets are off. You have to 
> start over and reject anything you do not explicitly know about.
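To check my understanding of what the UMD side would have to do, here is a minimal sketch in C. Everything in it is an assumption for illustration: struct guc_ver with a leading branch member is not the struct in this patch, guc_has_fix() is a hypothetical helper, and the version numbers used with it are made up like the ones in the example above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-component version, branch first (not the patch's struct). */
struct guc_ver {
	uint32_t branch;
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/* Order two versions by major, then minor, then patch; branch is ignored. */
static int ver_cmp(const struct guc_ver *a, const struct guc_ver *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	if (a->patch != b->patch)
		return a->patch < b->patch ? -1 : 1;
	return 0;
}

/*
 * Returns true only when the firmware is on branch 0 and is at least as new
 * as the first version known to carry the fix. Any non-zero branch is
 * rejected, because nothing can be assumed about its contents.
 */
static bool guc_has_fix(const struct guc_ver *fw,
			const struct guc_ver *first_fixed)
{
	if (fw->branch != 0)
		return false;

	return ver_cmp(fw, first_fixed) >= 0;
}
```

With a made-up first fixed version of 0.1.1.3, firmware 0.1.2.0 would pass and 1.4.5.6 would be rejected purely because of the branch, regardless of how large the remaining numbers are.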
> 
> This is why we were saying that exposing version numbers to UMDs breaks 
> down horribly as soon as we have to start branching. There is no clean 
> or simple way to do this.
> 
> John.
> 
> 
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > > 
> > > > Compile tested only.
> > > > 
> > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > > > Cc: Jose Souza <jose.souza at intel.com>
> > > > Cc: Sagar Ghuge <sagar.ghuge at intel.com>
> > > > Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > > Cc: John Harrison <John.C.Harrison at Intel.com>
> > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > Cc: Vivaik Balasubrawmanian <vivaik.balasubrawmanian at intel.com>
> > > > ---
> > > >   drivers/gpu/drm/i915/i915_query.c | 32 
> > > > +++++++++++++++++++++++++++++++
> > > >   include/uapi/drm/i915_drm.h       | 11 +++++++++++
> > > >   2 files changed, 43 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_query.c 
> > > > b/drivers/gpu/drm/i915/i915_query.c
> > > > index 00871ef99792..999687f6a3d4 100644
> > > > --- a/drivers/gpu/drm/i915/i915_query.c
> > > > +++ b/drivers/gpu/drm/i915/i915_query.c
> > > > @@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct 
> > > > drm_i915_private *i915,
> > > >       return hwconfig->size;
> > > >   }
> > > > +static int
> > > > +query_guc_submission_version(struct drm_i915_private *i915,
> > > > +                 struct drm_i915_query_item *query)
> > > > +{
> > > > +    struct drm_i915_query_guc_submission_version __user *query_ptr =
> > > > +                        u64_to_user_ptr(query->data_ptr);
> > > > +    struct drm_i915_query_guc_submission_version ver;
> > > > +    struct intel_guc *guc = &to_gt(i915)->uc.guc;
> > > > +    const size_t size = sizeof(ver);
> > > > +    int ret;
> > > > +
> > > > +    if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
> > > > +        return -ENODEV;
> > > > +
> > > > +    ret = copy_query_item(&ver, size, size, query);
> > > > +    if (ret != 0)
> > > > +        return ret;
> > > > +
> > > > +    if (ver.major || ver.minor || ver.patch)
> > > > +        return -EINVAL;
> > > > +
> > > > +    ver.major = guc->submission_version.major;
> > > > +    ver.minor = guc->submission_version.minor;
> > > > +    ver.patch = guc->submission_version.patch;
> > > This needs to include the branch version (currently set to zero) in 
> > > the definition. And the UMD needs to barf if branch comes back as 
> > > non-zero. I.e. there is no guarantee that a branched version will 
> > > have the w/a + fix that they are wanting.
> > > 
> > > John.
> > > 
> > > 
> > > > +
> > > > +    if (copy_to_user(query_ptr, &ver, size))
> > > > +        return -EFAULT;
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > >   static int (* const i915_query_funcs[])(struct drm_i915_private 
> > > > *dev_priv,
> > > >                       struct drm_i915_query_item *query_item) = {
> > > >       query_topology_info,
> > > > @@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct 
> > > > drm_i915_private *dev_priv,
> > > >       query_memregion_info,
> > > >       query_hwconfig_blob,
> > > >       query_geometry_subslices,
> > > > +    query_guc_submission_version,
> > > >   };
> > > >   int i915_query_ioctl(struct drm_device *dev, void *data, struct 
> > > > drm_file *file)
> > > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > > > index 550c496ce76d..d80d9b5e1eda 100644
> > > > --- a/include/uapi/drm/i915_drm.h
> > > > +++ b/include/uapi/drm/i915_drm.h
> > > > @@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
> > > >        *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct 
> > > > drm_i915_query_memory_regions)
> > > >        *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob 
> > > > uAPI`)
> > > >        *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct 
> > > > drm_i915_query_topology_info)
> > > > +     *  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct 
> > > > drm_i915_query_guc_submission_version)
> > > >        */
> > > >       __u64 query_id;
> > > >   #define DRM_I915_QUERY_TOPOLOGY_INFO        1
> > > > @@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
> > > >   #define DRM_I915_QUERY_MEMORY_REGIONS        4
> > > >   #define DRM_I915_QUERY_HWCONFIG_BLOB        5
> > > >   #define DRM_I915_QUERY_GEOMETRY_SUBSLICES    6
> > > > +#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION    7
> > > >   /* Must be kept compact -- no holes and well documented */
> > > >       /**
> > > > @@ -3591,6 +3593,15 @@ struct drm_i915_query_memory_regions {
> > > >       struct drm_i915_memory_region_info regions[];
> > > >   };
> > > > +/**
> > > > +* struct drm_i915_query_guc_submission_version - query GuC 
> > > > submission interface version
> > > > +*/
> > > > +struct drm_i915_query_guc_submission_version {
> > > > +    __u64 major;
> > > > +    __u64 minor;
> > > > +    __u64 patch;
> > > > +};
> > > > +
> > > >   /**
> > > >    * DOC: GuC HWCONFIG blob uAPI
> > > >    *
> > > 
> 


