[PATCH v2 04/15] drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access
Nirmoy Das
nirmoy.das at linux.intel.com
Fri Jan 12 16:31:10 UTC 2024
On 1/12/2024 4:12 PM, Ville Syrjälä wrote:
> On Wed, Jan 10, 2024 at 11:49:47AM +0100, Nirmoy Das wrote:
>> Hi Ville,
>>
>> Apologies, but I lost track of this series after I returned from sick leave.
>>
>>
>> On 12/15/2023 11:59 AM, Ville Syrjala wrote:
>>> From: Ville Syrjälä<ville.syrjala at linux.intel.com>
>>>
>>> On MTL accessing stolen memory via the BARs is somehow borked,
>>> and it can hang the machine. As a workaround let's bypass the
>>> BARs and just go straight to DSMBASE/GSMBASE instead.
>>>
>>> Note that on every other platform this itself would hang the
>>> machine, but on MTL the system firmware is expected to relax
>>> the access permission guarding stolen memory to enable this
>>> workaround, and thus direct CPU accesses should be fine.
>>>
>>> TODO: add w/a numbers and whatnot
>>>
>>> Cc: Paz Zcharya<pazz at chromium.org>
>>> Cc: Nirmoy Das<nirmoy.das at intel.com>
>>> Cc: Radhakrishna Sripada<radhakrishna.sripada at intel.com>
>>> Cc: Joonas Lahtinen<joonas.lahtinen at linux.intel.com>
>>> Signed-off-by: Ville Syrjälä<ville.syrjala at linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 11 ++++++++++-
>>> drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 ++++++++++++-
>>> 2 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> index ee237043c302..252fe5cd6ede 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>>> @@ -941,7 +941,16 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
>>> dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M);
>>> }
>>>
>>> - if (pci_resource_len(pdev, GEN12_LMEM_BAR) < lmem_size) {
>>> + if (IS_METEORLAKE(i915)) {
>>> + /*
>>> + * Workaround: access via BAR can hang MTL, go directly to DSM.
>>> + *
>>> + * Normally this would not work but on MTL the system firmware
>>> + * should have relaxed the access permissions sufficiently.
>>> + */
>>> + io_start = intel_uncore_read64(uncore, GEN12_DSMBASE) & GEN12_BDSM_MASK;
>>> + io_size = dsm_size;
>> This will work well on host driver but I am afraid this will not work on
>> VM when someone tries to do direct device assignment of the igfx.
>>
>> GSMBASE/DSMBASE is reserved region so won't show up in VM, last I checked.
> Hmm. So BARs get passed over but other regions won't be? I wonder if
> there's a way to pass them explicitly...
Yes, when a user ask qemu to pass though a pci device then qemu will
ensure to map those
BARs.
>
>> This is an obscure usages but are we suppose to support that? If so then
>> we need to detect that and fall back to binder approach.
> I suppose some people may attempt it. But I'm not sure how well that
> will work in practice even on other platforms. I don't think we've
> ever really considered that use case any kind of priority so bug
> reports tend to go unanswered.
>
> My main worry with the MI_UPDATE_GTT stuff is:
> - only used on this one platform so very limited testing coverage
> - async so more opprtunities to screw things up
> - what happens if the engine hangs while we're waiting for MI_UPDATE_GTT
> to finish?
> - requires working command submission, so even getting a working
> display now depends on a lot more extra components working correctly
>
> hence the patch to disable it. During testing my MTL was very unstable
> so I wanted to eliminate all potential sources of new bugs.
Valid concerns but unfortunately MI_UPDATE_GTT is the only generic
solution came up in the discussions
which supports host, vm, also SRIOV case.
>
> Hmm. But we can't even use MI_UPDATE_GTT until command submission is
> up and running, so we still need the direct CPU path for early ggtt
> setup no?
It is very unlikely for the bug to appear when there is only single user
of the GPU. So the HW team is fine with
having a small window where we do modify GTT using stolen.
How about a modparam which defaults to your approach and have a doc
saying to use binder on VM ?
It would be nice if i915 could detect if it is running in virtualized
environment but I don't have any ideas for that.
Regards,
Nirmoy
> So if we can't pass the stolen directly to the VM the only
> option would be to use the BARs for that and risk hanging the machine.
Question how would i915 detect if it is running in VM environment
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20240112/4ab13caf/attachment.htm>
More information about the Intel-gfx
mailing list