[PATCH v7 4/4] drm/xe/vf: Fixup CTB send buffer messages after migration

Lis, Tomasz tomasz.lis at intel.com
Fri Apr 11 14:34:01 UTC 2025


On 10.04.2025 20:24, Michal Wajdeczko wrote:
> On 09.04.2025 23:09, Lis, Tomasz wrote:
>> On 08.04.2025 16:23, Michal Wajdeczko wrote:
>>> On 03.04.2025 20:40, Tomasz Lis wrote:
> ...
>
>>>> +    u32 msg[1];
>>>      u32 msg;
>>> or
>>>      u32 msg[GUC_HXG_MSG_MIN_LEN];
>> I want an array, so will go with 2nd. Though this looks like unjustified
>> obfuscation to me.
> using magic "1" is an obfuscation to me

It's not magic, the array has one entry because only one entry is used 
later.

Naming it something which one could theorize is any value, obfuscates 
the real value of "1".

It makes no sense to think of this value as some "min length".

So maybe I used a wrong term - this is not just obfuscation, this is 
misinformation.

> ....
>
>>>> +                     struct guc_ctb *h2g,
>>> nit: this is redundant
>> oh I see. by "this" you mean a spaced out separate line.
> no, I just repeat that all your fixup is about H2G and since we pass ct
> each helper function can easily find pointer to h2g
>
> passing struct guc_ctb * as separate parameter may suggest that
> something other than h2g could be handled (but we don't need this)

 From the other comments, it would seem you think it is important for 
the code to correspond to the idea, or mindset, behind it.

The idea behind this function is to fix a message in specific `guc_ctb`. 
Why do we want to hide this idea by optimizing out a parameter in this 
specific case, but we're adding incorrect parameters (meaning ones that 
are not used directly, need to be converted to another object) 
everywhere else? What is the difference?

> ...
>
>>>> +    /* Read header */
>>> you should at least assert that avail > 0 before reading even single u32
>>>
>>>      xe_gt_assert(gt, avail >= GUC_CTB_MSG_MIN_LEN);
>> This is a static function, and the caller is ensuring that. But sure, I
>> can add the assert.
>>> but since it is called in the loop, more appropriate would be:
>>>
>>>      if (avail < GUC_CTB_MSG_MIN_LEN)
>>>          goto broken;
>> I don't know what you mean by that. There are not that many
>> possibilities for an integer value greater than 0 but smaller than 1.
> caller just checks for avail > 0
> but we shall check for avail >= GUC_CTB_MSG_MIN_LEN
>
> and we should not make any assumptions about GUC_CTB_MSG_MIN_LEN value

In this round I finally agreed to the changes which removed any size 
read (xe_map_memcpy_from) and replaced that with one dword read 
(xe_map_rd_array_u32).

We are now making assumptions about GUC_CTB_MSG_MIN_LEN.  The only 
supported value is 1.

And even if it wasn't the case, the value is and always will be 1. We 
are creating false narrative around the simple value of 1. It is just 1 
and can't be anything else. Change to that would require a rewrite, but 
it's not even worth considering as that will never happen.

"avail > 0" and "avail >= 1" are equivalent.

>> And by the changes I agreed to above and below, we've already ensured
>> that GUC_CTB_MSG_MIN_LEN can only be equal to 1.
> we are adding checks for 'avail'
> we are not making new assumptions about ABI values
>
> ...
>
>>> maybe it's time to introduce
>>>
>>>      u32 move_head(u32 head, u32 step, u32 size)
>> We are overdefining a lot of things already. I don't think we should
>> press even harder in that direction.
> it's about avoiding duplicated code and wrap it into named helper

This is why we introduce helpers, true. But now we're starting to do 
this for trivial stuff.

Every programmer understands these:

head = (head + GUC_CTB_MSG_MIN_LEN) % size;

*mhead = (head + msg_len_to_hxg_len(len)) % size;

This is trivial. The lines are self explanatory. Making a sub-function 
to hide that isn't providing any benefit, It's harmful, as it requires 
the reader to check the sub-function instead of seeing plainly and 
directly what's being done. It also leads to more lines of code.

> ...
>
>>>> +
>>>> +/**
>>>> + * xe_guc_ct_fixup_messages_with_ggtt - Fixup any pending H2G CTB
>>>> messages
>>>> + * @ct: pointer to CT struct of the target GuC
>>>> + * @ggtt_shift: shift to be added to all GGTT addresses within the CTB
>>>> + *
>>>> + * Messages in guc-to-host CTB are owned by GuC and any fixups in them
>>>> + * are made by GuC. But content of the host-to-guc CTB is owned by the
>>>> + * KMD, so fixups to GGTT references in any pending messages need to be
>>>> + * applied here.
>>> s/guc-to-host/H2G
>>>
>>> like you have earlier and below
>> Earlier is a short description. Below I am using shortcut after defining
>> it in this line. This is how the long comments should look like,
>> abbreviations should be defined.
> but I guess you should be aware right now that your code is not the
> first that introduces the CTB feature, including G2H and H2G concept and
> naming,

ok, interesting argument. I didn't know we have the concept of "first" 
introduction which defines abbreviations.

Where is it? I see two candidates:

1. ./drivers/gpu/drm/xe/xe_guc_ct_types.h:        /** @ctbs.recv: GuC to 
Host (G2H, receive) channel */

This one is a part of struct declaration, so it clearly isn't the first, 
right?

2. /drivers/gpu/drm/xe/abi/guc_communication_mmio_abi.h: * Each MMIO 
based message, both Host to GuC (H2G) and GuC to Host (G2H)

This one is in MMIO specific file. And we're discussing a comment to a 
CTB-related function. So.. not this one?

Where is that "first" introduction which defined it? What rule do we 
have here as a general concept, one expansion per driver is enough? Or 
one expansion per file? One per feature?

Or.. is it all just arbitrary. We have no rules. In which case this 
place is in no way worse than any other. Maybe every full description 
should expand it.

>   so really above kernel-doc is not a right place to suddenly use
> "guc-to-host" term (which is a bad notation anyway)

Right, in the two other places the notation is different.

To be honest I'm not really against using a shortcut here, I just do not 
understand the argumentation in this review comment.

With the justification provided, I'd consider this as below the 
threshold of things worth commenting in a review.

-Tomasz

>


More information about the Intel-xe mailing list