[igt-dev] [PATCH i-g-t 2/2] tests/i915/perf: Exercise barrier race
Kamil Konieczny
kamil.konieczny at linux.intel.com
Thu Feb 2 10:19:56 UTC 2023
Hi Janusz,
On 2023-02-01 at 20:16:11 +0100, Janusz Krzysztofik wrote:
> Hi Kamil,
>
> On Wednesday, 1 February 2023 19:21:57 CET Kamil Konieczny wrote:
> > Hi Janusz,
> >
> > please send patches to igt ML and add other addresses to cc:
> > I have one nit, see below.
> >
> > On 2023-01-31 at 10:17:31 +0100, Janusz Krzysztofik wrote:
> > > Add a new subtest focused on exercising interaction between perf open /
> > > close operations, which replace active barriers with perf requests, and
> > > concurrent barrier preallocate / acquire operations performed during
> > > context first pin / last unpin.
> > >
> > > References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
> > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > > Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> > > ---
> > > tests/i915/perf.c | 41 +++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 39 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tests/i915/perf.c b/tests/i915/perf.c
> > > index e33cacc443..11a3ec21ab 100644
> > > --- a/tests/i915/perf.c
> > > +++ b/tests/i915/perf.c
> > > @@ -39,6 +39,7 @@
> > > #include <math.h>
> > >
> > > #include "i915/gem.h"
> > > +#include "i915/gem_create.h"
> > > #include "i915/perf.h"
> > > #include "igt.h"
> > > #include "igt_perf.h"
> > > @@ -4885,7 +4886,27 @@ test_whitelisted_registers_userspace_config(void)
> > > i915_perf_remove_config(drm_fd, config_id);
> > > }
> > >
> > > -static void test_open_race(const struct intel_execution_engine2 *e, int timeout)
> > > +static void gem_exec_nop(int i915, const struct intel_execution_engine2 *e)
> > > +{
> > > + const uint32_t bbe = MI_BATCH_BUFFER_END;
> > > + struct drm_i915_gem_exec_object2 obj = { };
> > > + struct drm_i915_gem_execbuffer2 execbuf = {
> > > + .buffers_ptr = to_user_pointer(&obj),
> > > + .buffer_count = 1,
> > > + };
> > > +
> > > + obj.handle = gem_create(i915, 4096);
> > > + gem_write(i915, obj.handle, 0, &bbe, sizeof(bbe));
> > > +
> > > + execbuf.flags = e->flags;
> > > + gem_execbuf(i915, &execbuf);
> > > +
> > > + gem_sync(i915, obj.handle);
> > > + gem_close(i915, obj.handle);
> > > +}
> > > +
> > > +static void test_open_race(const struct intel_execution_engine2 *e, int timeout,
> > > + bool use_spin)
> > -------------------------- ^
> > This is not the best way to develop new code, it may be good
> > for fast development but imho it is better to refactor existing
> > code and avoiding to add bool logic into function.
>
> I'm not sure what you mean. By refactoring, do you mean moving the code
> common to both processing paths out to a separate helper, then call it from
> two distinct functions, each implementing only one scenario? What's wrong
> with passing flags that select one of alternative processing paths within one
> function? Or are you just not happy with bool type?
It is not wrong unless there is only one such var but even then
readability of code is reduced.
>
> > It can be done later as this is revealing the bug.
> >
> > > {
> > > int *done;
> > >
> > > @@ -4926,6 +4947,12 @@ static void test_open_race(const struct intel_execution_engine2 *e, int timeout)
> > > ctx = gem_context_create_for_engine(i915, e->class, e->instance);
> > > gem_context_set_persistence(i915, ctx, persistence);
> > >
> > > + if (!use_spin) {
> > > + gem_exec_nop(i915, e);
> > > + gem_context_destroy(i915, ctx);
> > > + continue;
> > > + }
> > > +
> > > spin = __igt_spin_new(i915, ctx, .ahnd = ahnd);
> > > for (int i = random() % 7; i--; ) {
> > > if (random() & 1) {
> > > @@ -5330,7 +5357,17 @@ igt_main
> > > for_each_physical_engine(drm_fd, e)
> > > if (e->class == I915_ENGINE_CLASS_RENDER)
> > > igt_dynamic_f("%s", e->name)
> > > - test_open_race(e, 5);
> > > + test_open_race(e, 5, true);
> > > + }
> > > +
> >
> > Please add here some TODO comments from discussions and a note
> > which will help bug filling team to classify that.
>
> TODO about moving the subtest to a different test -- OK. But instructions for
> Bug Filling team? Hmm, what if this subtest triggers a completely different
> bug in the future? Are we going to update the comments then? Do Bug Filling
> team members really read the sources while classifying bugs?
>
> Please have another look at an example of expected dmesg from the currently
> triggered bug:
>
> <4> [192.634015] list_del corruption. prev->next should be ffff8881479d34e0, but was ffff888108af4ce0. (prev=ffff888127335160)
> <4> [192.634032] WARNING: CPU: 1 PID: 1286 at lib/list_debug.c:59 __list_del_entry_valid+0xb7/0xe0
Ok, I see that it is kernel bug here. You are right,
please add only TODO here.
Regards,
Kamil
> <4> [192.634041] Modules linked in: vgem drm_shmem_helper fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio x86_pkg_temp_thermal coretemp i915 prime_numbers i2c_algo_bit kvm_intel ttm snd_hda_intel snd_intel_dspcfg kvm drm_buddy mei_pxp snd_hda_codec smsc75xx irqbypass mei_hdcp e1000e drm_display_helper usbnet snd_hwdep crct10dif_pclmul wmi_bmof mii crc32_pclmul ghash_clmulni_intel igc snd_hda_core drm_kms_helper ptp mei_me i2c_i801 syscopyarea snd_pcm pps_core sysfillrect i2c_smbus mei sysimgblt video intel_lpss_pci wmi
> <4> [192.634156] CPU: 1 PID: 1286 Comm: perf Not tainted 6.2.0-rc6-CI_DRM_12672-gdf5c31e78609+ #1
> <4> [192.634161] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S SODIMM RVP, BIOS RKLSFWI1.R00.1435.A00.2010232019 10/23/2020
> <4> [192.634165] RIP: 0010:__list_del_entry_valid+0xb7/0xe0
> <4> [192.634170] Code: cc cc cc 48 89 ca 48 c7 c7 78 00 46 82 e8 0d 7e 61 00 0f 0b 31 c0 c3 cc cc cc cc 4c 89 c2 48 c7 c7 b0 00 46 82 e8 f5 7d 61 00 <0f> 0b 31 c0 c3 cc cc cc cc 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 f8
> <4> [192.634174] RSP: 0018:ffffc90001f2fcb0 EFLAGS: 00010082
> <4> [192.634180] RAX: 0000000000000000 RBX: ffff88812f8dec40 RCX: 0000000000000000
> <4> [192.634184] RDX: 0000000000000003 RSI: ffffffff823d7c88 RDI: 00000000ffffffff
> <4> [192.634187] RBP: ffff8881479d34d8 R08: 0000000000000000 R09: ffffc90001f2fb60
> <4> [192.634191] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88812f8df440
> <4> [192.634194] R13: ffff88812f8df440 R14: ffff8881479d34e0 R15: 0000000000000246
> <4> [192.634197] FS: 00007fa38f9514c0(0000) GS:ffff888450080000(0000) knlGS:0000000000000000
> <4> [192.634201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [192.634205] CR2: 00007fa392c76000 CR3: 000000010809e003 CR4: 0000000000770ee0
> <4> [192.634208] PKRU: 55555554
> <4> [192.634211] Call Trace:
> <4> [192.634214] <TASK>
> <4> [192.634218] __i915_active_fence_set+0x71/0xf0 [i915]
> <4> [192.634412] __i915_request_commit+0x2e1/0x5b0 [i915]
> <4> [192.634597] i915_request_add+0xa6/0x330 [i915]
> <4> [192.634783] gen8_modify_context+0xc2/0x220 [i915]
> <4> [192.634958] oa_configure_all_contexts.isra.0+0x183/0x400 [i915]
> <4> [192.635137] gen12_disable_metric_set+0x98/0x160 [i915]
> <4> [192.635305] i915_oa_stream_destroy+0x3c/0x330 [i915]
> <4> [192.635480] i915_perf_destroy_locked+0x22/0x80 [i915]
> <4> [192.635646] i915_perf_release+0x35/0x50 [i915]
> <4> [192.635808] __fput+0x95/0x260
>
> Isn't such information from actual bug occurrence, contained in CI reports the
> Bug Filling team is working with, more meaningful than any comment added to
> the test source code? How can a test know in advance what bugs it will ever
> trigger, and who exactly will be responsible for fixing them?
>
> Thanks,
> Janusz
>
> >
> > Regards,
> > Kamil
> >
> > > + igt_describe("Exercise perf open/close against intensive barrier preallocate/acquire");
> > > + igt_subtest_with_dynamic("barrier-race") {
> > > + const struct intel_execution_engine2 *e;
> > > +
> > > + for_each_physical_engine(drm_fd, e)
> > > + if (e->class == I915_ENGINE_CLASS_RENDER)
> > > + igt_dynamic_f("%s", e->name)
> > > + test_open_race(e, 5, false);
> > > }
> > >
> > > igt_fixture {
> >
>
>
>
>
More information about the igt-dev
mailing list