[Intel-gfx] [PATCH 2/5] drm/i915: Notify user about outdated dmc firmware

Dave Gordon david.s.gordon at intel.com
Tue Oct 13 05:30:31 PDT 2015


On 08/10/15 15:45, Animesh Manna wrote:
>
> On 10/8/2015 5:53 PM, Mika Kuoppala wrote:
>> Animesh Manna <animesh.manna at intel.com> writes:
>>
>>> On 9/21/2015 2:00 PM, Mika Kuoppala wrote:
>>>> Jani Nikula <jani.nikula at linux.intel.com> writes:
>>>>
>>>>> On Fri, 18 Sep 2015, Mika Kuoppala <mika.kuoppala at linux.intel.com> wrote:
>>>>>> If the csr/dmc firmware is known to be outdated, notify the
>>>>>> user.
>>>>> What would break if we requested a firmware version that works? Or
>>>>> have we made it so that we only request the major version, because
>>>>> there aren't supposed to be changes like this between minor versions?
>>>>>
>>>> I guess the question is more one of what we should do
>>>> if only outdated (known bad) firmware is available.
>>>>
>>>> Refuse to load it and limp onwards, or return an error code
>>>> on driver init.
>>>>
>>>> The latter would make the firmware and its version mandatory, and
>>>> would tightly couple the firmware version to the kernel driver version.
>>> A symlink is used to point at the recommended DMC firmware, and the
>>> same information is published on 01.org for firmware users.
>>> IMO we should not have this kind of hack in the code, as it will change
>>> over time; it is the responsibility of the repo owner to link the correct
>>> recommended firmware for each new kernel update.
>>>
>> On machines that had 1.19 symlinked in the filesystem, execlist submission
>> sometimes broke due to an interrupt delivery problem. Concluding that the
>> CSR firmware was the cause, before 1.21 was out, took quite a lot of work.
>>
>> I bet there are still machines with 1.19 only, and we get to
>> wade through error states trying to connect the dots.
>>
>> The dmc/csr firmware is part of our driver functionality. Apparently
>> it is very tightly coupled to the rest of the driver, as it can
>> break things outside its own domain.
>>
>> And currently it is a loosely coupled black box, tied to our driver
>> only through a symlink, accepting any version that happens to be in the
>> customer's filesystem.
>>
>> So we recommend the latest version on the website and end up in a
>> situation where the user gets whatever happens to be in the filesystem.
>> Even a known broken version? And we will keep debugging problems caused
>> by a broken version? I don't want any more dimensions in our triaging
>> space, such as a distribution/firmware version dimension.
>>
>> The symlink also means that bisecting is close to worthless for these
>> kinds of bugs, both on our machines and on customers'. We have a
>> loosely coupled, black-box entity affecting our driver depending
>> on the customer's filesystem. The symlink threw that valuable tool out,
>> and what did we gain?
>>
>> So we are left with triaging, which is true detective work, as there is
>> no trace of firmware versions or of load success/failure in the
>> logs or error states.
>>
>> From where I look at it, the version blacklist is not a hack. It is a cure.
>
> I completely understand your concern; we discussed this at length when
> settling the firmware naming convention and finally decided to use a
> symlink.
>
> If we really want to tightly couple firmware and driver, then IMO putting
> the exact firmware name in the driver would be the better option.
>
> I also saw your subsequent patch, which does not load the firmware
> if it is older than 1.21:
> http://lists.freedesktop.org/archives/intel-gfx/2015-September/076422.html
> I am curious whether the GPU hang issue is present for every version
> older than 1.21.
>
> -Animesh

The GuC loader has always had this sort of functionality, so the driver can
be built to know that anything older than a specific minor version is bogus.

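The proposed unified loader therefore tested an (== major, >= minor)
criterion for each of the various chunks of uC device firmware being loaded:
the major version must match exactly, and the minor version must be at
least the required one.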
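As an illustration only, the (== major, >= minor) check could look roughly like the C sketch below. The struct and function names are hypothetical (not the actual i915 or unified-loader code), printf() stands in for kernel logging, and the example threshold of DMC v1.21 is simply the cutoff discussed earlier in this thread. It also always logs the version it found, which addresses the point that firmware versions currently leave no trace in the logs.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical descriptor for one chunk of uC firmware (e.g. DMC or GuC).
 * Names are illustrative, not taken from the real driver.
 */
struct uc_fw_req {
	const char *name;        /* label used in log messages */
	unsigned int major;      /* required major version: must match exactly */
	unsigned int min_minor;  /* oldest minor version known to work */
};

/* (== major, >= minor): same interface generation, not older than known-good. */
static bool uc_fw_version_ok(const struct uc_fw_req *req,
			     unsigned int found_major, unsigned int found_minor)
{
	return found_major == req->major && found_minor >= req->min_minor;
}

/*
 * Check the version found in the blob and always log it, so the firmware
 * version actually in use is visible when triaging bug reports.
 * Returns true if the blob should be loaded.
 */
static bool uc_fw_check(const struct uc_fw_req *req,
			unsigned int found_major, unsigned int found_minor)
{
	if (uc_fw_version_ok(req, found_major, found_minor)) {
		printf("%s: loading firmware v%u.%u\n",
		       req->name, found_major, found_minor);
		return true;
	}

	printf("%s: firmware v%u.%u not supported (need v%u.x with x >= %u), "
	       "please update; not loading\n",
	       req->name, found_major, found_minor,
	       req->major, req->min_minor);
	return false;
}
```

With a requirement of { "dmc", 1, 21 }, v1.19 would be rejected with a message telling the user to update, v1.21 and v1.22 would be accepted, and v2.0 would be rejected because a major-version bump is treated as a possibly incompatible interface.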

.Dave.
