[PATCH] drm/xe/gsc: Handle GSCCS ER interrupt

Wed Feb 21 18:21:38 UTC 2024

just realized I hadn't replied to this. Sorry for the late reply!
Answers below.

On 2/9/2024 10:36 AM, Teres Alexis, Alan Previn wrote:
> On Thu, 2024-01-25 at 13:55 -0800, Daniele Ceraolo Spurio wrote:
>> Starting on Xe2, the GSCCS engine reset is a 2-step process. When the
>> driver or the GuC hit the GDRST register, the CS is immediately reset
> alan: minor nit %s/hit/hits
>
>> and a success is reported, but the GSC shim keeps resetting in the
> alan: minor nit %s/keeps resetting/continues the reset
>> background. While the shim reset is ongoing, the CS is able to accept
>> new context submission, but any commands that require the shim will
>> be stalled until the reset is completed. This means that we can keep
>> submitting to the GSCCS as long as we make sure that the preemption
>> timeout is big enough to cover any delay introduced by the reset
>> (which it already is).
> alan: as per offline conversation, we believe that reserved engines
> like GSC isnt effected by sysfs knobs to change engine preemption
> tmeouts. However, in the event a system integrator decides to play
> around with the default preemption timeout CONFIG, then perhaps we
> should add some kind of build or runtime warning like this in
> hw_engine_init? :
>
> if (hwe->class == XE_ENGINE_CLASS_OTHER &&
>      CONFIG_DRM_XE_PREEMPT_TIMEOUT> [that-shim-timeout])
> 	drm_warn(blah..)

ok

> alan:snip
>
>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> index cd27480f6486..4acc8f3d646c 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>   b/drivers/gpu/drm/xe/xe_gsc.c
>> index 0b90fd9ef63a..42dd61a197cb 100644
>> --- a/drivers/gpu/drm/xe/xe_gsc.c
>> +++ b/drivers/gpu/drm/xe/xe_gsc.c
>> @@ -25,6 +25,7 @@
>>   #include "xe_wa.h"
>>   #include "instructions/xe_gsc_commands.h"
>>   #include "regs/xe_gsc_regs.h"
>> +#include "regs/xe_gt_regs.h"
>>   
>>   static struct xe_gt *
>>   gsc_to_gt(struct xe_gsc *gsc)
>> @@ -271,6 +272,44 @@ static int gsc_upload_and_init(struct xe_gsc
>> *gsc)
>>          return 0;
>>   }
>>   
>> +static int gsc_er_complete(struct xe_gt *gt)
>> +{
> alan:snip
>> +       if (er_status == GSCI_TIMER_STATUS_TIMER_EXPIRED) {
>> +               /*
>> +                * XXX: we should trigger an FLR here, but we don't
>> have support
>> +                * for that yet.
>> +                */
>> +               xe_gt_err(gt, "GSC ER timed out!\n");
> alan: in a case like this, GSC is basically dead right? (i.e. until we
> support driver-flr). Wonder if should mark fw status as
> XE_UC_FIRMWARE_LOAD_FAIL with a comment "eventually we can replace this
> with a trigger to perform driver flr when that becomes available"

Changing the FW status wouldn't do anything, because after the load is 
complete the driver works off the values reported in the registers. As 
far as I understand those gets cleared when the reset fails, so PXP and 
HDCP should automatically be stopped that way.

>
>> +               return -EIO;
>> +       }
>> +
> alan: do you think we should add a drm_dbg here if er_status is some
> other unexpected value like "GSCI_TIMER_STATUS_RESET_IN_PROGRESS"?

No, GSCI_TIMER_STATUS_RESET_IN_PROGRESS is a valid case if you get 2 
resets back to back and the other are either the failure or the ok values.

Daniele

>> +       return 0;
>> +}
>> +
>>   static void gsc_work(struct work_struct *work)
>>
> alan:snip
>
>>   int xe_gsc_init(struct xe_gsc *gsc)
>>   {
>>          struct xe_gt *gt = gsc_to_gt(gsc);
>> diff --git a/drivers/gpu/drm/xe/xe_gsc.h
>> b/drivers/gpu/drm/xe/xe_gsc.h
>> index c6fb32e3fd79..dd16e9b8b894 100644
>> --- a/drivers/gpu/drm/xe/xe_gsc.h
>> +++ b/drivers/gpu/drm/xe/xe_gsc.h
>> @@ -9,12 +9,14 @@
>>   #include "xe_gsc_types.h"
>>   
>>   struct xe_gt;
>> +struct xe_hw_engine;
>>   
>>   int xe_gsc_init(struct xe_gsc *gsc);
>>   int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc);
>>   void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc);
>>   void xe_gsc_load_start(struct xe_gsc *gsc);
>>   void xe_gsc_remove(struct xe_gsc *gsc);
>> +void xe_gsc_hwe_irq_handler(struct xe_hw_engine *hwe, u16 intr_vec);
>>   
>>   void xe_gsc_wa_14015076503(struct xe_gt *gt, bool prep);
>>   
> alan:snip