[Intel-gfx] [CI 1/2] drm/i915/guc: Fix null pointer dereference when GuC FW is not available

Fri Mar 23 19:04:49 UTC 2018

On Fri, 23 Mar 2018 19:40:10 +0100, Yaodong Li <yaodong.li at intel.com>  
wrote:

> On 03/23/2018 11:26 AM, Michal Wajdeczko wrote:
>> On Fri, 23 Mar 2018 19:03:47 +0100, Yaodong Li <yaodong.li at intel.com>  
>> wrote:
>>
>>> On 03/23/2018 05:27 AM, Michal Wajdeczko wrote:
>>>> On Fri, 23 Mar 2018 13:07:15 +0100, Sagar Arun Kamble  
>>>> <sagar.a.kamble at intel.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 3/23/2018 4:53 PM, Piotr Piórkowski wrote:
>>>>>> If GuC firmware is not available on the system and we load i915  
>>>>>> with enable
>>>>>> GuC, then we hit this null pointer dereference issue:
>>>>> Patch looks good but I have query on usage of enable_guc modparam.
>>>>> enable_guc modparam will enable GuC/HuC only if firmware is  
>>>>> available.
>>>>
>>>> During modparam sanitization phase, code will only check for firmware
>>>> name, code will attempt to check if firmware file exists.
>>> Is it better if we won't even call into upload FW once we found the FW  
>>> fetching was failed?
>>>>
>>>>> If user sets to forcefully enable GuC/HuC on systems that don't have  
>>>>> firmware then it is user's fault.
>>>>
>>>> Sure its user's fault, but event in such case we should just  
>>>> gracefully
>>>> abort driver load, without any NULL pointer BUG ;)
>>>>
>>>> And note, that we will hit this bug not only when user force GuC with:
>>>>     enable_guc=1 guc_firmware_path=/does/not/exist
>>>>
>>>> but also when user just specify auto mode:
>>>>     enable_guc=-1
>>>>
>>>> when predefined firmwares are not present (not installed or removed)
>>>>
>>> it feels like it's still user's fault even with the auto mode. why the  
>>> user even set
>>> the enable_guc to -1 after known the fact that the FWs are not present?
>>> I mean should users be expected to know the fact that the auto mode
>>> enable_guc = -1 will enable the GuC and HuC, if the guc_firmware_path  
>>> was left
>>> to be none?
>>
>> GuC fw path will not be none, as we have some hardcoded paths in the  
>> code.
>> And we don't tell the user that after turning on 'auto' mode, he has to
>> do something special. Only later in case of fw fetch errors, we are just
>> printing error message with extra help:
>>
>> "%s: Failed to fetch firmware %s (error %d)\n"
>> "%s: Firmware can be downloaded from %s\n"
>>
> In this case, back to my question: maybe it's a better idea to make  
> things stops here
> instead of continuing to call following functions such as uc_init,  
> uc_init_hw -> fw_upload?

We can stop at many places, but we should be still prepared for broken/
unsigned firmware, and signature will be verified during fw_upload

>>>
>>> I agree we should make sure correct pointer access. it's a  
>>> non-excusable fault:)
>>>
>>> The thing actually frustrated me was that we failed to screen such an  
>>> error
>>> with the CI, but suddenly it broke things and tests with incorrect  
>>> configurations
>>> (enable_guc set to 1 or -1 + No FW available). so maybe we should add  
>>> a case to
>>> cover such a case in CI as well if we did have such test  
>>> configurations.
>>>
>>
>> I guess you can write such test on your own and submit patch to igt-dev  
>> ;)
> :-)
>
> Regards,
> -Jackie