[Intel-gfx] [PATCH 21/39] drm/i915/guc: Disable global reset

Mon Jan 7 21:28:48 UTC 2019

On 01/07/2019 10:50 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-01-07 18:31:52)
>>
>>
>> On 01/02/2019 01:41 AM, Chris Wilson wrote:
>>> The guc (and huc) currently inexcruitably depend on struct_mutex for
>>> device reinitialisation from inside the reset, and indeed taking any
>>> mutex here is verboten (as we must be able to reset from underneath any
>>> of our mutexes). That makes recovering the guc unviable without, for
>>> example, reserving contiguous vma space and pages for it to use.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_reset.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
>>> index f5da67f1bc04..77fc2f74e427 100644
>>> --- a/drivers/gpu/drm/i915/i915_reset.c
>>> +++ b/drivers/gpu/drm/i915/i915_reset.c
>>> @@ -590,6 +590,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>>>    
>>>    bool intel_has_gpu_reset(struct drm_i915_private *i915)
>>>    {
>>> +     if (USES_GUC(i915))
>>> +             return false;
>>> +
>>
>> Do we need to tweak the getparam so we can report that we have engine
>> reset but not full reset?
> 
> Oh, we cut a corner. But is userspace ready for having one but not the
> other... What userspace other than igt, I wonder. I suspect even igt
> treats it as boolean for the large part, with just a few correctly
> checking when they want per-engine resets.
> 
>> Also, this is a regression of capabilities when GuC is enabled, so we
>> need to make lockless reset work with GuC soon.
> 
> Hah. It frequently fails to recover after reset :-p
> 
>> Were you already
>> planning/working on something along the lines of the possible solution
>> mentioned in the commit message? Just trying to understand what the
>> status is before jumping in to avoid duplication of work ;)
> 
> The compromises required (reserving space for the firmware) seem like it
> will just end up with a lot of screaming to make it mutexless today.
> 

Aren't we already keeping some GuC stuff perma-pinned (e.g. the 
stage_desc_pool)? Current Gen11+ firmware needs more space and uses 
objects up to tens of MBs, the blobs aren't that big in comparison.

> I was planning on coming back to it after breaking up the struct_mutex
> to see which locks were required. If we get to the point where we can
> guarantee that those locks are never held when waiting (inc.
> allocations) (the plan there is to annotate them with lockdep to catch
> waiters), then we can safely use those inside the reset.
> 

Sounds good.

> Or maybe we can get away with using just a single page in the global gtt
> and just loading the firmware piecemeal, or something.

 From my understanding the HW consider the firmware loaded and starts 
processing it as soon as the first DMA transfer completes, so loading it 
bit by bit wouldn't work.

Thanks,
Daniele

> -Chris
>