[Intel-gfx] [PATCH v2] drm/i915: Restore inhibiting the load of the default context
Francisco Jerez
currojerez at riseup.net
Thu Dec 10 05:24:52 PST 2015
Mika Kuoppala <mika.kuoppala at linux.intel.com> writes:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
>> Following a GPU reset, we may leave the context in a poorly defined
>> state, and reloading from that context will leave the GPU flummoxed. For
>> secondary contexts, this will lead to that context being banned - but
>> currently it is also causing the default context to become banned,
>> leading to turmoil in the shared state.
>>
>> This is a regression from
>>
>> commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
>> Author: Ben Widawsky <benjamin.widawsky at intel.com>
>> Date: Mon Mar 16 16:00:58 2015 +0000
>>
>> drm/i915: Initialize all contexts
>>
>> which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
>> default context.
>>
AFAICT the removal of MI_RESTORE_INHIBIT in that commit seemed
justified. Ben explained that it was needed to fix a pagefault in the
default context under certain conditions. I don't know the details of
the original change (Ben CC'ed), but it seems like this would be trading
one bug for another?
Other than that, this opens the door again to subtle state leaks between
processes, as I learned recently while implementing L3 partitioning in
Mesa. Mesa now changes the L3 configuration in ways that will break
assumptions made by processes that use the default context (the DDX).
The DDX assumes, for instance, that the URB size is set according to the
hardware defaults, but it doesn't actually program the URB size itself,
so in a way the DDX relies on MI_RESTORE_INHIBIT *not* being set for the
default context for correct operation. This commit breaks that
assumption and the kernel ABI.
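
To make the leak concrete, here is a toy model of a context switch with
and without restore inhibition. It is only a sketch: it is not driver
code, every name in it (toy_switch, toy_context, TOY_HW_DEFAULT_URB and
so on) is hypothetical, and the URB values are arbitrary placeholders.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in i915 or Mesa. */
struct hw_state {
    unsigned urb_size;          /* stand-in for any context-saved register */
};

struct toy_context {
    struct hw_state saved;      /* image saved at the last switch-out */
};

#define TOY_HW_DEFAULT_URB 32u  /* arbitrary placeholder value */

/* Switch the toy GPU from 'from' to 'to'. */
void toy_switch(struct hw_state *hw, struct toy_context *from,
                struct toy_context *to, bool restore_inhibit)
{
    assert(hw && to);
    if (from)
        from->saved = *hw;      /* save the outgoing context's state */
    if (!restore_inhibit)
        *hw = to->saved;        /* normal case: reload the saved image */
    /* With restore_inhibit set, hw keeps whatever 'from' left behind. */
}

/* URB size the default context observes after another context ran. */
unsigned toy_urb_seen_by_default(bool restore_inhibit)
{
    struct hw_state hw = { TOY_HW_DEFAULT_URB };
    struct toy_context def = { { TOY_HW_DEFAULT_URB } };
    struct toy_context other = { { TOY_HW_DEFAULT_URB } };

    toy_switch(&hw, &def, &other, false);
    hw.urb_size = 64u;          /* the other context reprograms the URB */
    toy_switch(&hw, &other, &def, restore_inhibit);
    return hw.urb_size;
}
```

In this model toy_urb_seen_by_default(false) gives back the default
value, while toy_urb_seen_by_default(true) gives back the value the
other context programmed, which is the kind of leak described above.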
Mesa has a workaround in place to reduce the likelihood of such leaks,
but the solution is far from ideal because it relies on userspace
cooperation and has a measurable impact on performance: userspace has
to assume the worst-case scenario, that the following batch is going to
be from a different context with MI_RESTORE_INHIBIT set, so we have to
restore the hardware default L3 configuration at the end of every batch
even if that's actually not the case. For that reason we would like to
drop the userspace workaround in the future, at least on v4.1 kernels
and newer.
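
As a rough illustration of where the overhead comes from, the sketch
below counts L3 restores over a sequence of batches. It is a toy model
with hypothetical names (toy_restores_always, toy_restores_needed), not
Mesa code, and it says nothing about the actual cost of one restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * next_is_inhibited[i] is true when batch i happens to be followed by a
 * batch from a context that runs with its state restore inhibited.
 * Userspace cannot know this when it builds batch i.
 */

/* Workaround: restore the default L3 configuration after every batch. */
size_t toy_restores_always(const bool *next_is_inhibited, size_t n)
{
    assert(next_is_inhibited || n == 0);
    return n;
}

/* Restores that would have been needed with perfect knowledge. */
size_t toy_restores_needed(const bool *next_is_inhibited, size_t n)
{
    size_t count = 0;

    assert(next_is_inhibited || n == 0);
    for (size_t i = 0; i < n; i++)
        if (next_is_inhibited[i])
            count++;
    return count;
}
```

The difference between the two counts is the work the workaround does
for nothing in this model.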
One more question below.
>> v2: Mark the global default context as uninitialized on GPU reset so
>> that the context-local workarounds are reloaded upon re-enabling.
>>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
> Reviewed-by: Mika Kuoppala <mika.kuoppala at intel.com>
>
>> Cc: Michel Thierry <michel.thierry at intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>> ---
>> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 43761c5bcaca..f024d5d2c746 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -340,6 +340,10 @@ void i915_gem_context_reset(struct drm_device *dev)
>> i915_gem_context_unreference(lctx);
>> ring->last_context = NULL;
>> }
>> +
>> + /* Force the GPU state to be reinitialised on enabling */
>> + if (ring->default_context)
>> + ring->default_context->legacy_hw_ctx.initialized = false;
>> }
>> }
>>
>> @@ -708,7 +712,7 @@ static int do_switch(struct drm_i915_gem_request *req)
>> if (ret)
>> goto unpin_out;
>>
>> - if (!to->legacy_hw_ctx.initialized) {
>> + if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
This hunk causes MI_RESTORE_INHIBIT to be set again for the default
context regardless of whether a reset happened or not, so it seems
unrelated to the rest of your change. Maybe I'm misunderstanding this,
but doesn't the !initialized check, together with the hunk above,
already guarantee that MI_RESTORE_INHIBIT will be set after a GPU reset,
which is what you wanted to achieve?
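
To spell out what I mean, here is a small model of the flag decision
with only the first hunk applied and with both hunks applied. It is a
sketch under my reading of the patch, not the driver code: the toy_*
names are hypothetical and TOY_RESTORE_INHIBIT is a placeholder bit, not
the real MI_RESTORE_INHIBIT value.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the flag decision in do_switch(); not the driver code. */
#define TOY_RESTORE_INHIBIT 1u  /* placeholder bit, hypothetical value */

struct toy_ctx {
    bool initialized;
    bool is_default;
};

/* Condition as it stands without the second hunk. */
unsigned toy_flags_first_hunk_only(const struct toy_ctx *to)
{
    assert(to);
    return !to->initialized ? TOY_RESTORE_INHIBIT : 0u;
}

/* Condition with the second hunk applied. */
unsigned toy_flags_both_hunks(const struct toy_ctx *to)
{
    assert(to);
    return (!to->initialized || to->is_default) ? TOY_RESTORE_INHIBIT : 0u;
}

/* First hunk: a reset marks the default context as uninitialised. */
void toy_reset(struct toy_ctx *default_ctx)
{
    assert(default_ctx);
    default_ctx->initialized = false;
}
```

In this model, after toy_reset() the first condition alone already sets
the inhibit flag for the default context, whereas the second condition
sets it for an initialised default context even when no reset occurred.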
>> hw_flags |= MI_RESTORE_INHIBIT;
>> /* NB: If we inhibit the restore, the context is not allowed to
>> * die because future work may end up depending on valid address
>> --
>> 2.6.2
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx