[igt-dev] [PATCH i-g-t 2/2] tests/i915/perf: Exercise barrier race

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Wed Feb 1 19:16:11 UTC 2023


Hi Kamil,

On Wednesday, 1 February 2023 19:21:57 CET Kamil Konieczny wrote:
> Hi Janusz,
> 
> please send patches to igt ML and add other addresses to cc:
> I have one nit, see below.
> 
> On 2023-01-31 at 10:17:31 +0100, Janusz Krzysztofik wrote:
> > Add a new subtest focused on exercising interaction between perf open /
> > close operations, which replace active barriers with perf requests, and
> > concurrent barrier preallocate / acquire operations performed during
> > context first pin / last unpin.
> > 
> > References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> > ---
> >  tests/i915/perf.c | 41 +++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 39 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tests/i915/perf.c b/tests/i915/perf.c
> > index e33cacc443..11a3ec21ab 100644
> > --- a/tests/i915/perf.c
> > +++ b/tests/i915/perf.c
> > @@ -39,6 +39,7 @@
> >  #include <math.h>
> >  
> >  #include "i915/gem.h"
> > +#include "i915/gem_create.h"
> >  #include "i915/perf.h"
> >  #include "igt.h"
> >  #include "igt_perf.h"
> > @@ -4885,7 +4886,27 @@ test_whitelisted_registers_userspace_config(void)
> >  	i915_perf_remove_config(drm_fd, config_id);
> >  }
> >  
> > -static void test_open_race(const struct intel_execution_engine2 *e, int timeout)
> > +static void gem_exec_nop(int i915, const struct intel_execution_engine2 *e)
> > +{
> > +	const uint32_t bbe = MI_BATCH_BUFFER_END;
> > +	struct drm_i915_gem_exec_object2 obj = { };
> > +	struct drm_i915_gem_execbuffer2 execbuf = {
> > +		.buffers_ptr = to_user_pointer(&obj),
> > +		.buffer_count = 1,
> > +	};
> > +
> > +	obj.handle = gem_create(i915, 4096);
> > +	gem_write(i915, obj.handle, 0, &bbe, sizeof(bbe));
> > +
> > +	execbuf.flags = e->flags;
> > +	gem_execbuf(i915, &execbuf);
> > +
> > +	gem_sync(i915, obj.handle);
> > +	gem_close(i915, obj.handle);
> > +}
> > +
> > +static void test_open_race(const struct intel_execution_engine2 *e, int timeout,
> > +			   bool use_spin)
> -------------------------- ^
> This is not the best way to develop new code, it may be good
> for fast development but imho it is better to refactor existing
> code and avoiding to add bool logic into function.

I'm not sure what you mean.  By refactoring, do you mean moving the code 
common to both processing paths out to a separate helper, then call it from 
two distinct functions, each implementing only one scenario?  What's wrong 
with passing flags that select one of alternative processing paths within one 
function?  Or are you just not happy with bool type?

> It can be done later as this is revealing the bug.
> 
> >  {
> >  	int *done;
> >  
> > @@ -4926,6 +4947,12 @@ static void test_open_race(const struct intel_execution_engine2 *e, int timeout)
> >  				ctx = gem_context_create_for_engine(i915, e->class, e->instance);
> >  				gem_context_set_persistence(i915, ctx, persistence);
> >  
> > +				if (!use_spin) {
> > +					gem_exec_nop(i915, e);
> > +					gem_context_destroy(i915, ctx);
> > +					continue;
> > +				}
> > +
> >  				spin = __igt_spin_new(i915, ctx, .ahnd = ahnd);
> >  				for (int i = random() % 7; i--; ) {
> >  					if (random() & 1) {
> > @@ -5330,7 +5357,17 @@ igt_main
> >  		for_each_physical_engine(drm_fd, e)
> >  			if (e->class == I915_ENGINE_CLASS_RENDER)
> >  				igt_dynamic_f("%s", e->name)
> > -					test_open_race(e, 5);
> > +					test_open_race(e, 5, true);
> > +	}
> > +
> 
> Please add here some TODO comments from discussions and a note
> which will help bug filling team to classify that.

TODO about moving the subtest to a different test -- OK.  But instructions for 
Bug Filling team?  Hmm, what if this subtest triggers a completely different 
bug in the future?  Are we going to update the comments then?  Do Bug Filling 
team members really read the sources while classifying bugs?

Please have another look at an example of expected dmesg from the currently 
triggered bug:

<4> [192.634015] list_del corruption. prev->next should be ffff8881479d34e0, but was ffff888108af4ce0. (prev=ffff888127335160)
<4> [192.634032] WARNING: CPU: 1 PID: 1286 at lib/list_debug.c:59 __list_del_entry_valid+0xb7/0xe0
<4> [192.634041] Modules linked in: vgem drm_shmem_helper fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio x86_pkg_temp_thermal coretemp i915 prime_numbers i2c_algo_bit kvm_intel ttm snd_hda_intel snd_intel_dspcfg kvm drm_buddy mei_pxp snd_hda_codec smsc75xx irqbypass mei_hdcp e1000e drm_display_helper usbnet snd_hwdep crct10dif_pclmul wmi_bmof mii crc32_pclmul ghash_clmulni_intel igc snd_hda_core drm_kms_helper ptp mei_me i2c_i801 syscopyarea snd_pcm pps_core sysfillrect i2c_smbus mei sysimgblt video intel_lpss_pci wmi
<4> [192.634156] CPU: 1 PID: 1286 Comm: perf Not tainted 6.2.0-rc6-CI_DRM_12672-gdf5c31e78609+ #1
<4> [192.634161] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S SODIMM RVP, BIOS RKLSFWI1.R00.1435.A00.2010232019 10/23/2020
<4> [192.634165] RIP: 0010:__list_del_entry_valid+0xb7/0xe0
<4> [192.634170] Code: cc cc cc 48 89 ca 48 c7 c7 78 00 46 82 e8 0d 7e 61 00 0f 0b 31 c0 c3 cc cc cc cc 4c 89 c2 48 c7 c7 b0 00 46 82 e8 f5 7d 61 00 <0f> 0b 31 c0 c3 cc cc cc cc 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 f8
<4> [192.634174] RSP: 0018:ffffc90001f2fcb0 EFLAGS: 00010082
<4> [192.634180] RAX: 0000000000000000 RBX: ffff88812f8dec40 RCX: 0000000000000000
<4> [192.634184] RDX: 0000000000000003 RSI: ffffffff823d7c88 RDI: 00000000ffffffff
<4> [192.634187] RBP: ffff8881479d34d8 R08: 0000000000000000 R09: ffffc90001f2fb60
<4> [192.634191] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88812f8df440
<4> [192.634194] R13: ffff88812f8df440 R14: ffff8881479d34e0 R15: 0000000000000246
<4> [192.634197] FS:  00007fa38f9514c0(0000) GS:ffff888450080000(0000) knlGS:0000000000000000
<4> [192.634201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [192.634205] CR2: 00007fa392c76000 CR3: 000000010809e003 CR4: 0000000000770ee0
<4> [192.634208] PKRU: 55555554
<4> [192.634211] Call Trace:
<4> [192.634214]  <TASK>
<4> [192.634218]  __i915_active_fence_set+0x71/0xf0 [i915]
<4> [192.634412]  __i915_request_commit+0x2e1/0x5b0 [i915]
<4> [192.634597]  i915_request_add+0xa6/0x330 [i915]
<4> [192.634783]  gen8_modify_context+0xc2/0x220 [i915]
<4> [192.634958]  oa_configure_all_contexts.isra.0+0x183/0x400 [i915]
<4> [192.635137]  gen12_disable_metric_set+0x98/0x160 [i915]
<4> [192.635305]  i915_oa_stream_destroy+0x3c/0x330 [i915]
<4> [192.635480]  i915_perf_destroy_locked+0x22/0x80 [i915]
<4> [192.635646]  i915_perf_release+0x35/0x50 [i915]
<4> [192.635808]  __fput+0x95/0x260

Isn't such information from actual bug occurrence, contained in CI reports the 
Bug Filling team is working with, more meaningful than any comment added to 
the test source code?  How can a test know in advance what bugs it will ever 
trigger, and who exactly will be responsible for fixing them?

Thanks,
Janusz

> 
> Regards,
> Kamil
> 
> > +	igt_describe("Exercise perf open/close against intensive barrier preallocate/acquire");
> > +	igt_subtest_with_dynamic("barrier-race") {
> > +		const struct intel_execution_engine2 *e;
> > +
> > +		for_each_physical_engine(drm_fd, e)
> > +			if (e->class == I915_ENGINE_CLASS_RENDER)
> > +				igt_dynamic_f("%s", e->name)
> > +					test_open_race(e, 5, false);
> >  	}
> >  
> >  	igt_fixture {
> 






More information about the igt-dev mailing list