[Intel-gfx] [PATCH 3/7] drm/i915/uc: use different ggtt pin offsets for uc loads
John Harrison
john.c.harrison at intel.com
Mon Oct 10 19:18:02 UTC 2022
On 9/30/2022 16:42, Ceraolo Spurio, Daniele wrote:
> On 9/30/2022 4:24 PM, John Harrison wrote:
>> On 9/22/2022 15:11, Daniele Ceraolo Spurio wrote:
>>> Our current FW loading process is the same for all FWs:
>>>
>>> - Pin FW to GGTT at the start of the ggtt->uc_fw node
>>> - Load the FW
>>> - Unpin
>>>
>>> This worked because we didn't have a case where 2 FWs would be
>>> loaded on
>>> the same GGTT at the same time. On MTL, however, this can happend if
>>> both
>> The point being that the mapping is done using a single 'dummy' vma
>> that can't be mapped to two different places at the same time? But
>> isn't there a separate dummy object per uc instance. So there would
>> be one for the GuC on the render GT and another for the GuC on the
>> media GT. So each would be mapped separately to it's own unique
>> address and there is no conflict? Or what am I missing?
>
> The issue is that without this patch all the dummy vmas are inserted
> at the same address (start of the reserved node), which only works if
> they can't be mapped at the same time. Note that we don't use the
> generic vm functions to insert the dummy vma and we instead specify
> the exact offset we want it mapped at. This patch makes it so that
> each dummy vma has its own private offset.
>
Got it. I think it would be worth adding some documentation about the
forced mapping address and why it is necessary.
>>> GTs are reset at the same time, so we can't pin everything in the same
>>> spot and we need to use separate offset. For simplicity, instead of
>>> calculating the exact required size, we reserve a 2MB slot for each fw.
>>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>> Cc: John Harrison <john.c.harrison at intel.com>
>>> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 22 +++++++++++++++++++---
>>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> index b91ad4aede1f..d6ca772e9f4b 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> @@ -666,16 +666,33 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
>>> return err;
>>> }
>>> +/*
>>> + * The reserved GGTT space is ~18 MBs. All our blobs are well below
>>> 1MB, so for
>>> + * safety we reserve 2MB each.
>>> + */
>>> +#define INTEL_UC_RSVD_GGTT_PER_FW SZ_2M
>>> static u32 uc_fw_ggtt_offset(struct intel_uc_fw *uc_fw)
>>> {
>>> - struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt;
>>> + struct intel_gt *gt = __uc_fw_to_gt(uc_fw);
>>> + struct i915_ggtt *ggtt = gt->ggtt;
>>> struct drm_mm_node *node = &ggtt->uc_fw;
>>> + u32 offset = uc_fw->type * INTEL_UC_RSVD_GGTT_PER_FW;
>>> +
>>> + /*
>>> + * To keep the math simple, we use 8MB for the root tile and
>>> 8MB for
>>> + * the media one.
>>> + */
>>> + BUILD_BUG_ON(INTEL_UC_FW_NUM_TYPES * INTEL_UC_RSVD_GGTT_PER_FW
>>> > SZ_8M);
>> Doesn't this need to be >= ?
>
> No, this is a size check and 8MB is fine for a size.
>
>>
>>> + if (gt->type == GT_MEDIA)
>>> + offset += SZ_8M;
>>> GEM_BUG_ON(!drm_mm_node_allocated(node));
>>> GEM_BUG_ON(upper_32_bits(node->start));
>>> GEM_BUG_ON(upper_32_bits(node->start + node->size - 1));
>>> + GEM_BUG_ON(offset + uc_fw->obj->base.size > node->size);
>>> + GEM_BUG_ON(uc_fw->obj->base.size > INTEL_UC_RSVD_GGTT_PER_FW);
>> Given that the firmware blob is loaded from the disk and therefore
>> under user control, is a BUG_ON appropriate? Although there doesn't
>> look to be any obvious way to abort at this point. Could the size
>> check be moved to where we actually load the firmware rather than
>> where it is being mapped?
>
> My idea was that we wouldn't release a blob that violates this without
> updating the kernel first, so the only way a user can violate this is
> if they try to load a bogus file. But I can still move this check to
> fetch time and fail the fetch if the size is too big, so we can
> fall-back to something sensible.
Can't trust those pesky users. They can download a HTML page of an ASCII
dump of a firmware blob and try to load that. For example. I've seen
that before. So yeah, I think there definitely needs to be a 'drm_err;
goto fail' rather than a BUG_ON somewhere.
Otherwise, it all seems good.
John.
>
>>
>>> - return lower_32_bits(node->start);
>>> + return lower_32_bits(node->start + offset);
>>> }
>>> static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
>>> @@ -690,7 +707,6 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw
>>> *uc_fw)
>>> dummy->bi.pages = obj->mm.pages;
>>> GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
>>> - GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size);
>> Why remove this?
>
> The new GEM_BUG_ONs in the function above perform a more strict test,
> so this is redundant.
>
> Daniele
>
>>
>> John.
>>
>>> /* uc_fw->obj cache domains were not controlled across
>>> suspend */
>>> if (i915_gem_object_has_struct_page(obj))
>>
>
More information about the Intel-gfx
mailing list