[igt-dev] [PATCH] tests: read engine name again before restore timeout value

Lee, Shawn C shawn.c.lee at intel.com
Thu Oct 12 12:19:08 UTC 2023


2023-10-06T04:09:49.998264Z INFO kernel: [    0.255038] idma64 idma64.2: Found Intel integrated DMA 64-bit
2023-10-06T04:09:49.998285Z INFO kernel: [    0.322711] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/mtl_dmc.bin (v2.16)
2023-10-06T04:09:49.998286Z  ERR kernel: [    0.342093] i915 0000:00:02.0: [drm] *ERROR* Unlocked WOPCM regs with media GT
2023-10-06T04:09:49.998368Z INFO kernel: [    0.345039] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.8.0
2023-10-06T04:09:49.998387Z INFO kernel: [    0.357664] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
2023-10-06T04:09:49.998387Z INFO kernel: [    0.357666] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
2023-10-06T04:09:49.998389Z INFO kernel: [    0.357970] i915 0000:00:02.0: [drm] GuC RC: enabled
2023-10-06T04:09:49.998423Z INFO kernel: [    0.362337] i915 0000:00:02.0: [drm] GT1: GuC firmware i915/mtl_guc_70.bin version 70.8.0
2023-10-06T04:09:49.998424Z INFO kernel: [    0.362338] i915 0000:00:02.0: [drm] GT1: HuC firmware i915/mtl_huc_gsc.bin version 8.5.4
2023-10-06T04:09:49.998425Z  ERR kernel: [    0.362339] i915 0000:00:02.0: [drm] *ERROR* GT1: Unsuccessful WOPCM partitioning
2023-10-06T04:09:49.998435Z  ERR kernel: [    0.362399] i915 0000:00:02.0: [drm] *ERROR* GT1: GuC initialization failed -E2BIG
2023-10-06T04:09:49.998436Z  ERR kernel: [    0.362400] i915 0000:00:02.0: [drm] *ERROR* GT1: Enabling uc failed (-5)
2023-10-06T04:09:49.998440Z  ERR kernel: [    0.362401] i915 0000:00:02.0: [drm] *ERROR* GT1: Failed to initialize GPU, declaring it wedged!
2023-10-06T04:09:49.998451Z NOTICE kernel: [    0.362736] i915 0000:00:02.0: [drm:add_taint_for_CI] CI tainted:0x9 by intel_gt_init+0x1c6/0x303



On Thursday, October 12, 2023 7:33 PM, Deak, Imre wrote:
>On Thu, Oct 12, 2023 at 09:53:44AM +0100, Tvrtko Ursulin wrote:
>> 
>> On 11/10/2023 09:42, Lee Shawn C wrote:
>> > We encounter an unexpected error on a Chromebook device while running
>> > this test. The tool restores the GPU engines' timeout values but opens
>> > an incorrect file name (XR24 below). This is a workaround patch to
>> > avoid the problem until we find the root cause.
>> > 
>> > openat(AT_FDCWD, "/sys/dev/char/226:0", O_RDONLY) = 12
>> > openat(12, "dev", O_RDONLY)             = 13
>> > read(13, "226:0\n", 1023)               = 6
>> > close(13)                               = 0
>> > openat(12, "engine", O_RDONLY)          = 13
>> > close(12)                               = 0
>> > openat(13, "XR24", O_RDONLY)            = -1 ENOENT (No such file or directory)
>> > 
>> > Signed-off-by: Lee Shawn C <shawn.c.lee at intel.com>
>> > Issue: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/147
>> > ---
>> >   tests/intel/kms_busy.c | 10 ++++++++--
>> >   1 file changed, 8 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/tests/intel/kms_busy.c b/tests/intel/kms_busy.c
>> > index 5b620658fb18..119e6f1652ce 100644
>> > --- a/tests/intel/kms_busy.c
>> > +++ b/tests/intel/kms_busy.c
>> > @@ -414,9 +414,15 @@ static void gpu_engines_init_timeouts(int fd, int max_engines,
>> >   	}
>> >   }
>> > -static void gpu_engines_restore_timeouts(int fd, int num_engines, const struct gem_engine_properties *props)
>> > +static void gpu_engines_restore_timeouts(int fd, int num_engines, struct gem_engine_properties *props)
>> >   {
>> > -	int i;
>> > +	const struct intel_execution_engine2 *e;
>> > +	int i = 0;
>> > +
>> > +	for_each_physical_engine(fd, e) {
>> > +		props[i].engine = e;
>> > +		i++;
>> > +	}
>> >   	for (i = 0; i < num_engines; i++)
>> >   		gem_engine_properties_restore(fd, &props[i]);
>> 
>> By the look of it, the bug is in gpu_engines_init_timeouts(). This pointer
>> assignment:
>> 
>> 	for_each_physical_engine(fd, e) {
>> 		igt_assert(*num_engines < max_engines);
>> 
>> 		props[*num_engines].engine = e;
>> 
>> ^^^ e is on the stack, in the scope of for_each_physical_engine, so by the
>> time gpu_engines_restore_timeouts() runs it can legitimately point to
>> garbage, like XR24 in your example.
>> 

Hi Tvrtko,

Thanks for your suggestion! I tried allocating memory, copying e's data into
it, and assigning that pointer to props[].engine. That fixes the issue as well.

Best regards,
Shawn
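
For reference, the allocate-and-copy approach could look roughly like the
sketch below. This is not the actual patch: struct engine_info, struct
engine_props, engine_dup(), save_engines() and free_engines() are hypothetical
stand-ins for the IGT types and helpers, and the plain loop only plays the role
of for_each_physical_engine(), whose per-iteration engine data lives in
loop-scoped storage.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct intel_execution_engine2. */
struct engine_info {
	char name[16];
	int engine_class;
	int instance;
};

/* Hypothetical stand-in for struct gem_engine_properties. */
struct engine_props {
	const struct engine_info *engine;
	int preempt_timeout;
	int heartbeat_interval;
};

/* Duplicate *e on the heap so the pointer outlives the loop that produced e. */
static struct engine_info *engine_dup(const struct engine_info *e)
{
	struct engine_info *copy = malloc(sizeof(*copy));

	if (copy)
		memcpy(copy, e, sizeof(*copy));
	return copy;
}

/* Save one heap copy per engine; returns the number of engines saved. */
static int save_engines(struct engine_props *props, int max_engines)
{
	static const char *names[] = { "rcs0", "bcs0", "vcs0" };
	int n = 0;

	for (int i = 0; i < 3; i++) {
		/* Loop-scoped storage: &cur must not be kept after this iteration. */
		struct engine_info cur = { .engine_class = i, .instance = 0 };

		snprintf(cur.name, sizeof(cur.name), "%s", names[i]);
		assert(n < max_engines);
		props[n].engine = engine_dup(&cur); /* store the copy, not &cur */
		assert(props[n].engine != NULL);
		n++;
	}
	return n;
}

/* Release the copies once the timeouts have been restored. */
static void free_engines(struct engine_props *props, int n)
{
	for (int i = 0; i < n; i++) {
		free((void *)props[i].engine);
		props[i].engine = NULL;
	}
}
```

The cost of this variant is that the restore path also has to free the copies,
which is what free_engines() stands for here.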

>> Your workaround works, although strictly speaking I don't think the order of
>> engines is guaranteed. That is also moot, since the same preempt_timeout
>> and heartbeat_interval are used for all engines.
>>
>> Nevertheless, the proper fix would be to allocate and make a copy of each
>> engine and store a pointer to that. It might be overkill, but it is up
>> for discussion I guess.
>> 
>> Fixes: 9e635a1c5029 ("tests/kms_busy: Ensure GPU reset when waiting 
>> for a new FB during modeset")
>> 
>> So I'll be cheeky and add Imre and Juha-Pekka too.
>
>ugh, thanks for catching this.
>
>Would it work to save the engine class/instance instead in
>gpu_engines_init_timeouts(), and look up the engines using these in
>gpu_engines_restore_timeouts()?
>
>> 
>> Regards,
>> 
>> Tvrtko
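
The class/instance idea above could be sketched as follows. Again this is only
an illustration under assumptions: struct fake_engine, struct engine_id,
fake_engines[], save_engine_ids() and lookup_engine() are hypothetical names,
the class numbers in the table are arbitrary, and the table stands in for
whatever engine list the driver reports at restore time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical engine description; not the IGT definition. */
struct fake_engine {
	const char *name;
	int engine_class;
	int instance;
};

/* What init would save: plain values, no pointer into iterator storage. */
struct engine_id {
	int engine_class;
	int instance;
};

/* Stand-in for the engine list; class numbers here are arbitrary. */
static const struct fake_engine fake_engines[] = {
	{ "rcs0", 0, 0 },
	{ "bcs0", 1, 0 },
	{ "vcs0", 2, 0 },
	{ "vcs1", 2, 1 },
};
#define NUM_FAKE_ENGINES (sizeof(fake_engines) / sizeof(fake_engines[0]))

/* Init side: record (class, instance) by value for every engine. */
static int save_engine_ids(struct engine_id *ids, int max_ids)
{
	int n = 0;

	for (size_t i = 0; i < NUM_FAKE_ENGINES; i++) {
		assert(n < max_ids);
		ids[n].engine_class = fake_engines[i].engine_class;
		ids[n].instance = fake_engines[i].instance;
		n++;
	}
	return n;
}

/* Restore side: find the engine again from the saved pair, or NULL. */
static const struct fake_engine *lookup_engine(struct engine_id id)
{
	for (size_t i = 0; i < NUM_FAKE_ENGINES; i++) {
		if (fake_engines[i].engine_class == id.engine_class &&
		    fake_engines[i].instance == id.instance)
			return &fake_engines[i];
	}
	return NULL;
}
```

Because only integers are stored, nothing needs to be freed afterwards and the
result does not depend on the engine iteration order being stable between init
and restore.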

