[Intel-gfx] [PATCH 3/7] drm/i915/uc: use different ggtt pin offsets for uc loads

Mon Oct 10 19:18:02 UTC 2022

On 9/30/2022 16:42, Ceraolo Spurio, Daniele wrote:
> On 9/30/2022 4:24 PM, John Harrison wrote:
>> On 9/22/2022 15:11, Daniele Ceraolo Spurio wrote:
>>> Our current FW loading process is the same for all FWs:
>>>
>>> - Pin FW to GGTT at the start of the ggtt->uc_fw node
>>> - Load the FW
>>> - Unpin
>>>
>>> This worked because we didn't have a case where 2 FWs would be 
>>> loaded on
>>> the same GGTT at the same time. On MTL, however, this can happend if 
>>> both
>> The point being that the mapping is done using a single 'dummy' vma 
>> that can't be mapped to two different places at the same time? But 
>> isn't there a separate dummy object per uc instance. So there would 
>> be one for the GuC on the render GT and another for the GuC on the 
>> media GT. So each would be mapped separately to it's own unique 
>> address and there is no conflict? Or what am I missing?
>
> The issue is that without this patch all the dummy vmas are inserted 
> at the same address (start of the reserved node), which only works if 
> they can't be mapped at the same time. Note that we don't use the 
> generic vm functions to insert the dummy vma and we instead specify 
> the exact offset we want it mapped at. This patch makes it so that 
> each dummy vma has its own private offset.
>
Got it. I think it would be worth adding some documentation about the 
forced mapping address and why it is necessary.

>>> GTs are reset at the same time, so we can't pin everything in the same
>>> spot and we need to use separate offset. For simplicity, instead of
>>> calculating the exact required size, we reserve a 2MB slot for each fw.
>>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>> Cc: John Harrison <john.c.harrison at intel.com>
>>> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 22 +++++++++++++++++++---
>>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
>>> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> index b91ad4aede1f..d6ca772e9f4b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> @@ -666,16 +666,33 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
>>>       return err;
>>>   }
>>>   +/*
>>> + * The reserved GGTT space is ~18 MBs. All our blobs are well below 
>>> 1MB, so for
>>> + * safety we reserve 2MB each.
>>> + */
>>> +#define INTEL_UC_RSVD_GGTT_PER_FW SZ_2M
>>>   static u32 uc_fw_ggtt_offset(struct intel_uc_fw *uc_fw)
>>>   {
>>> -    struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt;
>>> +    struct intel_gt *gt = __uc_fw_to_gt(uc_fw);
>>> +    struct i915_ggtt *ggtt = gt->ggtt;
>>>       struct drm_mm_node *node = &ggtt->uc_fw;
>>> +    u32 offset = uc_fw->type * INTEL_UC_RSVD_GGTT_PER_FW;
>>> +
>>> +    /*
>>> +     * To keep the math simple, we use 8MB for the root tile and 
>>> 8MB for
>>> +     * the media one.
>>> +     */
>>> +    BUILD_BUG_ON(INTEL_UC_FW_NUM_TYPES * INTEL_UC_RSVD_GGTT_PER_FW 
>>> > SZ_8M);
>> Doesn't this need to be >= ?
>
> No, this is a size check and 8MB is fine for a size.
>
>>
>>> +    if (gt->type == GT_MEDIA)
>>> +        offset += SZ_8M;
>>>         GEM_BUG_ON(!drm_mm_node_allocated(node));
>>>       GEM_BUG_ON(upper_32_bits(node->start));
>>>       GEM_BUG_ON(upper_32_bits(node->start + node->size - 1));
>>> +    GEM_BUG_ON(offset + uc_fw->obj->base.size > node->size);
>>> +    GEM_BUG_ON(uc_fw->obj->base.size > INTEL_UC_RSVD_GGTT_PER_FW);
>> Given that the firmware blob is loaded from the disk and therefore 
>> under user control, is a BUG_ON appropriate? Although there doesn't 
>> look to be any obvious way to abort at this point. Could the size 
>> check be moved to where we actually load the firmware rather than 
>> where it is being mapped?
>
> My idea was that we wouldn't release a blob that violates this without 
> updating the kernel first, so the only way a user can violate this is 
> if they try to load a bogus file. But I can still move this check to 
> fetch time and fail the fetch if the size is too big, so we can 
> fall-back to something sensible.
Can't trust those pesky users. They can download a HTML page of an ASCII 
dump of a firmware blob and try to load that. For example. I've seen 
that before. So yeah, I think there definitely needs to be a 'drm_err; 
goto fail' rather than a BUG_ON somewhere.

Otherwise, it all seems good.

John.

>
>>
>>>   -    return lower_32_bits(node->start);
>>> +    return lower_32_bits(node->start + offset);
>>>   }
>>>     static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
>>> @@ -690,7 +707,6 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw 
>>> *uc_fw)
>>>       dummy->bi.pages = obj->mm.pages;
>>>         GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
>>> -    GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size);
>> Why remove this?
>
> The new GEM_BUG_ONs in the function above perform a more strict test, 
> so this is redundant.
>
> Daniele
>
>>
>> John.
>>
>>>         /* uc_fw->obj cache domains were not controlled across 
>>> suspend */
>>>       if (i915_gem_object_has_struct_page(obj))
>>
>