[Intel-gfx] [PATCH] drm/i915/guc: Fix request re-submission after reset

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Mar 9 17:02:16 UTC 2017


On 09/03/2017 08:55, Oscar Mateo wrote:
> On 03/09/2017 08:50 AM, Tvrtko Ursulin wrote:
>>
>> On 09/03/2017 08:42, Oscar Mateo wrote:
>>> On 03/09/2017 02:05 AM, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> In order to ensure no missed interrupts we must first re-direct
>>>> the interrupts to GuC, and only then re-submit the requests to
>>>> be replayed after a GPU reset. Otherwise context switch can fire
>>>> before GuC has been set up to receive it triggering more hangs.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>>> Cc: Oscar Mateo <oscar.mateo at intel.com>
>>>> Cc: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>> Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 58
>>>> +++++++++++++++++++++++++++---
>>>>   drivers/gpu/drm/i915/intel_guc_loader.c    | 47
>>>> ------------------------
>>>>   2 files changed, 54 insertions(+), 51 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> index beb38e30d0e9..5b8ec0fab563 100644
>>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> @@ -936,6 +936,52 @@ static void guc_reset_wq(struct i915_guc_client
>>>> *client)
>>>>       client->wq_tail = 0;
>>>>   }
>>>>
>>> <SNIP>
>>>>   int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
>>>>   {
>>>>       struct intel_guc *guc = &dev_priv->guc;
>>>> @@ -953,13 +999,17 @@ int i915_guc_submission_enable(struct
>>>> drm_i915_private *dev_priv)
>>>>         /* Take over from manual control of ELSP (execlists) */
>>>>       for_each_engine(engine, dev_priv, id) {
>>>> -        const int wqi_size = sizeof(struct guc_wq_item);
>>>> -        struct drm_i915_gem_request *rq;
>>>> -
>>>>           engine->submit_request = i915_guc_submit;
>>>>           engine->schedule = NULL;
>>>> +    }
>>>> +
>>>> +    guc_interrupts_capture(dev_priv);
>>>> +
>>>> +    /* Replay the current set of previously submitted requests */
>>>> +    for_each_engine(engine, dev_priv, id) {
>>>> +        const int wqi_size = sizeof(struct guc_wq_item);
>>>> +        struct drm_i915_gem_request *rq;
>>>>
>>> Don't you want to move the guc_interrupts_release to
>>> i915_guc_submission_disable as well?
>>
>> I can't spot anything broken in that path. We never go in that
>> direction with the live submission so why do you think it is needed?
>>
>> Regards,
>>
>> Tvrtko
> Just code symmetry: if we are leaving i915_guc_submission_enable to
> capture the interrupts, why doesn't the disable also releases them?
> Maybe it's not very important now, but it makes a lot more sense with my
> series to do proper unwinding of the whole path (I can tackle it there
> if you prefer).

I think so. There is a multitude of people trying to refactor the GuC 
code at the moment so I would prefer just to fix the reset fail quickly 
and not interfere with that wider refactoring too much. Because I think 
it is not just a quick job of moving the interrupt release to get to 
full symmetry. Ack to merge then?

Regards,

Tvrtko


More information about the Intel-gfx mailing list