[RFC] drm/i915: Add GuC submission interface version query

John Harrison john.c.harrison at intel.com
Wed Feb 7 19:52:19 UTC 2024


On 2/7/2024 11:43, Souza, Jose wrote:
> On Wed, 2024-02-07 at 11:34 -0800, John Harrison wrote:
>> On 2/7/2024 10:49, Tvrtko Ursulin wrote:
>>> On 07/02/2024 18:12, John Harrison wrote:
>>>> On 2/7/2024 03:56, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>
>>>>> Add a new query to the GuC submission interface version.
>>>>>
>>>>> Mesa intends to use this information to check for old firmware versions
>>>>> with a known bug where using the render and compute command streamers
>>>>> simultaneously can cause GPU hangs due issues in firmware scheduling.
>>>>>
>>>>> Based on patches from Vivaik and Joonas.
>>>>>
>>>>> There is a little bit of an open around the width required for
>>>>> versions.
>>>>> While the GuC FW iface tells they are u8, i915 GuC code uses u32:
>>>>>
>>>>>    #define CSS_SW_VERSION_UC_MAJOR               (0xFF << 16)
>>>>>    #define CSS_SW_VERSION_UC_MINOR               (0xFF << 8)
>>>>>    #define CSS_SW_VERSION_UC_PATCH               (0xFF << 0)
>>>>> ...
>>>>>    struct intel_uc_fw_ver {
>>>>>            u32 major;
>>>>>            u32 minor;
>>>>>            u32 patch;
>>>>>            u32 build;
>>>>>    };
>>>> This is copied from generic code which supports firmwares other than
>>>> GuC. Only GuC promises to use 8-bit version components. Other
>>>> firmwares very definitely do not. There is no open.
>>> Ack.
>>>
>>>>> So we could make the query u8, and refactor the struct intel_uc_fw_ver
>>>>> to use u8, or not. To avoid any doubts on why are we assigning u32 to
>>>>> u8 I simply opted to use u64. Which avoids the need to add any padding
>>>>> too.
>>>> I don't follow how potential 8 vs 32 confusion means jump to 64?!
>>> Suggestion was to use u8 in the uapi in order to align with GuC FW ABI
>>> (or however it's called), in which case there would be:
>>>
>>>     ver.major = guc->submission_version.major;
>>>
>>> which would be:
>>>
>>>     (u8) = (u32)
>>>
>>> And I was anticipating someone not liking that either. Using too wide
>>> u64 simply avoids the need to add a padding element to the uapi struct.
>>>
>>> If you are positive we need to include a branch number, even though it
>>> does not seem to be implemented in the code even(*) then I can make
>>> uapi 4x u32 and achieve the same.
>> It's not implemented in the code because we've never had to, and it is
>> yet another train wreck waiting to happen. There are a bunch of issues
>> at different levels that need to be resolved. But that is all in the
>> kernel and/or firmware and so can be added by a later kernel update when
>> necessary. However, if the UMDs are not already taking it into account
>> or its not even in the UAPI, then we can't back fill in the kernel
>> later, we are just broken.
> This sounds to me like a firmware version for internal testing or for pre-production HW, would any branched firmware be released to customers?
See comments below. Branching is about back porting critical fixes to 
older releases. I.e. supporting LTS releases. There is absolutely 
nothing internal only or testing related about branching.

Just because we haven't had to do so yet doesn't mean we won't need to 
do so tomorrow.

John.

>
>>> (*)
>>> static void uc_unpack_css_version(struct intel_uc_fw_ver *ver, u32
>>> css_value)
>>> {
>>>      /* Get version numbers from the CSS header */
>>>      ver->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, css_value);
>>>      ver->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, css_value);
>>>      ver->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css_value);
>>> }
>>>
>>> No branch field in the CSS header?
>> I think there is, it's just not officially implemented yet.
>>
>>> And Why is UMD supposed to reject a non-zero branch? Like how would
>>> 1.1.3.0 be fine and 1.1.3.1 be bad? I don't get it. But anyway, I can
>>> respin if you definitely confirm.
>> Because that is backwards. The branch number goes at the front.
>>
>> So, for example (using made up numbers, I don't recall offhand what
>> versions we have where) say we currently have 0.1.3.0 in tip and 0.1.1.0
>> in the last LTS. We then need to ship a critical security fix and back
>> port it to the LTS. Tip becomes 0.1.3.1 but the LTS can't become 0.1.1.1
>> because that version already exists in the history of tip and does not
>> contain the fix. So the LTS gets branched to 1.1.0.0. We then have both
>> branches potentially moving forwards with completely independent versioning.
>>
>> Exactly the same as 5.8.x, 5.9,y, 6.0.z, etc in the Linux kernel
>> versioning. You cannot make any assumptions about what might be in
>> 1.4.5.6 compared to 0.1.2.3. 1.4.5.6 could actually 0.1.0.3 with a stack
>> of security fixes but none of the features, workarounds or bug fixes
>> that are in 0.1.2.3.
>>
>> Hence, if the branch number changes then all bets are off. You have to
>> start over and reject anything you do not explicitly know about.
>>
>> This is why we were saying that exposing version numbers to UMDs breaks
>> down horribly as soon as we have to start branching. There is no clean
>> or simple way to do this.
>>
>> John.
>>
>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>> Compile tested only.
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>> Cc: Kenneth Graunke <kenneth at whitecape.org>
>>>>> Cc: Jose Souza <jose.souza at intel.com>
>>>>> Cc: Sagar Ghuge <sagar.ghuge at intel.com>
>>>>> Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
>>>>> Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>> Cc: Jani Nikula <jani.nikula at intel.com>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>> Cc: Vivaik Balasubrawmanian <vivaik.balasubrawmanian at intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_query.c | 32
>>>>> +++++++++++++++++++++++++++++++
>>>>>    include/uapi/drm/i915_drm.h       | 11 +++++++++++
>>>>>    2 files changed, 43 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_query.c
>>>>> b/drivers/gpu/drm/i915/i915_query.c
>>>>> index 00871ef99792..999687f6a3d4 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_query.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_query.c
>>>>> @@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct
>>>>> drm_i915_private *i915,
>>>>>        return hwconfig->size;
>>>>>    }
>>>>> +static int
>>>>> +query_guc_submission_version(struct drm_i915_private *i915,
>>>>> +                 struct drm_i915_query_item *query)
>>>>> +{
>>>>> +    struct drm_i915_query_guc_submission_version __user *query_ptr =
>>>>> +                        u64_to_user_ptr(query->data_ptr);
>>>>> +    struct drm_i915_query_guc_submission_version ver;
>>>>> +    struct intel_guc *guc = &to_gt(i915)->uc.guc;
>>>>> +    const size_t size = sizeof(ver);
>>>>> +    int ret;
>>>>> +
>>>>> +    if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
>>>>> +        return -ENODEV;
>>>>> +
>>>>> +    ret = copy_query_item(&ver, size, size, query);
>>>>> +    if (ret != 0)
>>>>> +        return ret;
>>>>> +
>>>>> +    if (ver.major || ver.minor || ver.patch)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    ver.major = guc->submission_version.major;
>>>>> +    ver.minor = guc->submission_version.minor;
>>>>> +    ver.patch = guc->submission_version.patch;
>>>> This needs to include the branch version (currently set to zero) in
>>>> the definition. And the UMD needs to barf if branch comes back as
>>>> non-zero. I.e. there is no guarantee that a branched version will
>>>> have the w/a + fix that they are wanting.
>>>>
>>>> John.
>>>>
>>>>
>>>>> +
>>>>> +    if (copy_to_user(query_ptr, &ver, size))
>>>>> +        return -EFAULT;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>>    static int (* const i915_query_funcs[])(struct drm_i915_private
>>>>> *dev_priv,
>>>>>                        struct drm_i915_query_item *query_item) = {
>>>>>        query_topology_info,
>>>>> @@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct
>>>>> drm_i915_private *dev_priv,
>>>>>        query_memregion_info,
>>>>>        query_hwconfig_blob,
>>>>>        query_geometry_subslices,
>>>>> +    query_guc_submission_version,
>>>>>    };
>>>>>    int i915_query_ioctl(struct drm_device *dev, void *data, struct
>>>>> drm_file *file)
>>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>>> index 550c496ce76d..d80d9b5e1eda 100644
>>>>> --- a/include/uapi/drm/i915_drm.h
>>>>> +++ b/include/uapi/drm/i915_drm.h
>>>>> @@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
>>>>>         *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct
>>>>> drm_i915_query_memory_regions)
>>>>>         *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob
>>>>> uAPI`)
>>>>>         *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct
>>>>> drm_i915_query_topology_info)
>>>>> +     *  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct
>>>>> drm_i915_query_guc_submission_version)
>>>>>         */
>>>>>        __u64 query_id;
>>>>>    #define DRM_I915_QUERY_TOPOLOGY_INFO        1
>>>>> @@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
>>>>>    #define DRM_I915_QUERY_MEMORY_REGIONS        4
>>>>>    #define DRM_I915_QUERY_HWCONFIG_BLOB        5
>>>>>    #define DRM_I915_QUERY_GEOMETRY_SUBSLICES    6
>>>>> +#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION    7
>>>>>    /* Must be kept compact -- no holes and well documented */
>>>>>        /**
>>>>> @@ -3591,6 +3593,15 @@ struct drm_i915_query_memory_regions {
>>>>>        struct drm_i915_memory_region_info regions[];
>>>>>    };
>>>>> +/**
>>>>> +* struct drm_i915_query_guc_submission_version - query GuC
>>>>> submission interface version
>>>>> +*/
>>>>> +struct drm_i915_query_guc_submission_version {
>>>>> +    __u64 major;
>>>>> +    __u64 minor;
>>>>> +    __u64 patch;
>>>>> +};
>>>>> +
>>>>>    /**
>>>>>     * DOC: GuC HWCONFIG blob uAPI
>>>>>     *



More information about the dri-devel mailing list