[Intel-gfx] [PATCH] [i915] avoid infinite retries in GuC/HuC loading

John Harrison john.c.harrison at intel.com
Fri Mar 31 19:14:47 UTC 2023


On 3/26/2023 02:46, Alexandre Oliva wrote:
> Hello, John,
>
> On Mar 24, 2023, John Harrison <john.c.harrison at intel.com> wrote:
>
>> On 3/12/2023 12:56, Alexandre Oliva wrote:
>>> If two or more suitable entries with the same filename are found in
>>> __uc_fw_auto_select's fw_blobs, and that filename fails to load in the
>>> first attempt and in the retry, when __uc_fw_auto_select is called for
>>> the third time, the coincidence of strings will cause it to clear
>>> file_selected.path at the first hit, so it will return the second hit
>>> over and over again, indefinitely.
>>>
>>> Of course this doesn't occur with the pristine blob lists, but a
>>> modified version could run into this, e.g., patching in a duplicate
>>> entry, or (as in our case) disarming blob loading by remapping their
>>> names to "/*(DEBLOBBED)*/", given a toolchain that unifies identical
>>> string literals.
>> Not sure what you mean by disarming?
> Our users find loading nonfree firmware harmful.
>
>> I think what you are saying is that you made a change similar to this?
>>      #define __MAKE_UC_FW_PATH_MMP(prefix_, name_, major_, minor_, patch_) \
>>              "i915/invalid_file_name.bin"
> Yeah, that's the gist of it.  The name we use is "/*(DEBLOBBED)*/", so
> that it can't possibly be satisfied.
>
>> So all entries in the table have the exact same filename.
> *nod*
>
>> And with the toolchain unification comment, that means not just a
>> matching string but the same string pointer. Thus, the search code is
>> getting confused.
> Exactly
>
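
To make the failure mode concrete, here is a minimal sketch of a search that
identifies the previous selection by pointer equality, as described above. It
is a simplified model, not the i915 code: `blob_entry`, `fw_state` and
`select_next_by_pointer` are hypothetical names, and platform/revision
filtering is left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the i915 structures. */
struct blob_entry {
	const char *path;
};

struct fw_state {
	const char *selected_path;	/* NULL until an entry has been picked */
};

/*
 * Pick the entry that follows the previously selected one. The previous
 * selection is recognised by comparing path pointers, so two entries that
 * share one pointer cannot be told apart.
 * Returns the chosen index, or -1 once the table is exhausted.
 */
static int select_next_by_pointer(struct fw_state *st,
				  const struct blob_entry *tbl, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (st->selected_path) {
			/* Skip entries up to and including the previous pick. */
			if (st->selected_path == tbl[i].path)
				st->selected_path = NULL;
			continue;
		}
		st->selected_path = tbl[i].path;
		return (int)i;
	}
	return -1;
}
```

With two entries whose `path` members hold the same pointer, successive calls
return index 0, then 1, then 1 again on every later call: the first entry
always matches and clears the marker, so the second entry is always "next".
With two distinct pointers the calls return 0, 1, and then -1.
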
>> I'm not sure that is really a valid use case that the driver code
>> should be expected to support.
> It's most certainly not.  As I wrote, I'd be happy to keep on carrying
> the patch that adjusts the code to cope with our changes.  I just
> thought the same issue could come up by, say, mistakenly applying a
> patch twice to add support for a new card, a circumstance in which one
> might not have the card readily available to try it out.
Not following this argument. You can't add support for a card that you 
don't have access to. GuC firmware is produced internally by Intel so it 
isn't going to be added by some third party person. And internally, we 
have CI systems up and running for each platform before the patches to 
support that platform land in the upstream tree. So any such error most 
certainly should be caught by pre-merge CI.

>> Even without the infinite loop, the driver is not
>> going to load because you have removed the firmware files?
> Oh, no, the driver loads just fine even without those blobs, and that's
> much nicer of you than other drivers for hardware that doesn't really
> require blobs, but that insist on bailing out if the firmware can't be
> loaded.  i915 hasn't been hostile like that.
That situation won't last...

> When you override the firmware filenames, and it fails to load, the
> driver makes it a (reasonable IMHO) hard fail, but when it just fails to
> find the regular firmware files, it's nice that it proceeds and does
> the best it can.
>
>> However, I think you are saying that the problem would also exist if
>> there was some kind of genuine duplication in the table?
> Yes.  Not the kind you mention, for different platforms, but an actual
> duplicate entry, such as what you might get if you applied a patch that
> added an entry for a new card, and then applied it again, resolving the
> conflicts in a way that retained the duplicate entries.
I would consider that a bug that should never make it past either 
pre-merge CI or code review.

Also, that is what we have the table verification code for - ensuring 
that bugs don't creep into the table. So if you have spotted a hole in 
that verification then I do think it needs plugging.

Unfortunately for you, I think the best way forward for i915/Intel is 
to enhance the verification step so that such bugs are caught before 
the search ever runs. However, I think there are 
easier ways for you to modify the driver to prevent firmware loading. 
E.g. rather than modifying the table, just force an early exit from the 
loading code itself. And if you really do need to remove the firmware 
files from the compiled binary completely, then replacing them with 
unique names would also work - '/*(DEBLOBBED_1)*/', '/*(DEBLOBBED_2)*/', 
etc.
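
As a rough illustration of what an enhanced verification step could check,
the sketch below flags any two entries for the same platform whose filenames
compare equal as strings. It is not the existing i915 verification code:
`table_entry`, its `platform` field and `table_has_duplicate_paths` are
hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table row; the real table carries more fields. */
struct table_entry {
	int platform;		/* assumed platform identifier */
	const char *path;	/* firmware filename */
};

/*
 * Return true if any two entries for the same platform name the same
 * file. Comparing with strcmp() rather than by pointer means the result
 * does not depend on whether the toolchain merged identical literals,
 * and the pairwise scan also catches non-adjacent repeats.
 */
static bool table_has_duplicate_paths(const struct table_entry *tbl, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = i + 1; j < n; j++) {
			if (tbl[i].platform == tbl[j].platform &&
			    strcmp(tbl[i].path, tbl[j].path) == 0)
				return true;
		}
	}
	return false;
}
```

The pairwise scan is quadratic in the table size, which should be acceptable
for a one-off check over a small static table. Whether a positive result is
reported as a warning or treated as a hard error is a policy choice the
sketch leaves open.
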


>
>> So there can only be a problem if a single platform specifies the same
>> filename multiple times? Which would be a bug in the table because
>> why? It would be redundant entries that have no purpose.
> Agreed.
>
>> Note that I'm not saying we don't want to take your change. But I
>> would like to understand if there is a genuine issue that maybe needs
>> a better fix. E.g. should the table verification code be enhanced to
>> just reject the table entirely if there are such errors present.
> Table verification might wish to detect and report duplicate filenames
> for the same platform, to catch even alternating duplicates (e.g. "a",
> then "b", then "a" again), but it would be kind if you didn't make that
> a hard error, otherwise we'd have to tweak it to cope with our own
> "/*(DEBLOBBED)*/" duplicates.
>
> Another approach, that would probably be more efficient as the table
> grows, is to store in uc_fw a pointer to or index of the current or next
> entry to be searched, so that the code doesn't have to iterate over the
> table at every try (O(n^2)), and instead takes it from exactly where it
> left off, running overall a single time over the whole table (O(n)), at
> the cost of a pointer or index in uc_fw.  Then, duplicates in the table
> wouldn't matter at all.
>
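
The index-based alternative described above might look roughly like this. It
is a sketch only: `fw_candidate`, `fw_cursor` and `select_next_by_index` are
hypothetical names, and the real selection would still need to filter by
platform and revision.

```c
#include <stddef.h>

/* Hypothetical table row. */
struct fw_candidate {
	const char *path;
};

/* Hypothetical per-firmware state: where the next attempt resumes. */
struct fw_cursor {
	size_t next;
};

/*
 * Return the next candidate path and advance the cursor, or NULL once
 * every entry has been tried. No path comparison is involved, so
 * duplicate names or shared string pointers cannot confuse the search,
 * and the table is walked once in total across all retries.
 */
static const char *select_next_by_index(struct fw_cursor *cur,
					const struct fw_candidate *tbl,
					size_t n)
{
	if (cur->next >= n)
		return NULL;
	return tbl[cur->next++].path;
}
```

Even if every entry holds the same pointer, a table of n entries yields
exactly n candidates and then NULL, so the retry loop terminates.
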
>> Also, is this string unification thing a part of the current gcc
>> toolchain?
> Yeah, compilers and linkers have been unifying (read-only) string
> literals for a very long time.
That's what I would have assumed. Which is why I was confused that you 
were saying 'if you use a toolchain that does this'. It seemed that you 
were implying that most don't and this was a special situation.

John.

>
> Thanks,
>


