Can you help me debug an issue?

Peter Senna Tschudin peter.senna at gmail.com
Sat Mar 23 13:05:39 UTC 2024


Dear List,

I found a commit that introduced a regression that broke the tests
gem_exec_capture@many-4k-incremental and
gem_exec_capture@many-4k-zero. Reverting 93c5ec210 fixes the issue,
but that does not explain what the underlying problem is.

The problem is that `e` gets corrupted when `many()` calls the macro
`find_first_available_engine`, and the corruption happens at the line
`saved = configure_hangs(fd, e, ctx->id);`. By corrupted, I mean that
after this call `e->name` is empty and `e->class` holds a large number.
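
To make that check explicit instead of eyeballing printf output, a
by-value snapshot can be compared against `e` after the call. This is
only a minimal sketch: `struct engine_desc` and `engine_changed()` are
hypothetical stand-ins I made up for illustration, not the real IGT
types or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the engine description; NOT the IGT struct. */
struct engine_desc {
	char name[16];
	int class;
};

/*
 * Compare a snapshot taken before the suspicious call with the engine
 * as seen through the pointer afterwards. Returns true if either the
 * name or the class no longer matches the snapshot.
 */
static bool engine_changed(const struct engine_desc *snapshot,
			   const struct engine_desc *e)
{
	assert(snapshot && e);

	return strncmp(snapshot->name, e->name, sizeof(snapshot->name)) != 0 ||
	       snapshot->class != e->class;
}
```

The idea would be to copy `*e` into a local before the call to
configure_hangs() and test `engine_changed(&copy, e)` right after it.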

After `e` gets corrupted, the call to __captureN() fails because
it expects `e` to be valid. A simple workaround is to add `e =
&saved_engine.engine;` before the call to __captureN().
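
Here is a minimal, self-contained sketch of that workaround. All types
and helpers in it (`struct engine_desc`, `struct saved_props`,
`save_engine()`, `capture_with()`, `run_workaround()`) are hypothetical
stand-ins, not IGT code. It also assumes that the saved properties keep
a copy of the engine, which is what the expression
`&saved_engine.engine` suggests, and it simulates the corruption with a
memset:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real IGT definitions. */
struct engine_desc {
	char name[16];
	int class;
};

struct saved_props {
	struct engine_desc engine; /* assumed: a copy, not a pointer */
};

/* Stand-in for configure_hangs(): keeps a copy of *e. */
static struct saved_props save_engine(const struct engine_desc *e)
{
	struct saved_props saved;

	memset(&saved, 0, sizeof(saved));
	saved.engine = *e;
	return saved;
}

/* Stand-in for __captureN(): returns 0 only if the engine looks valid. */
static int capture_with(const struct engine_desc *e)
{
	if (e->name[0] == '\0')
		return -1;
	printf("capturing on %s (class %d)\n", e->name, e->class);
	return 0;
}

/* Saves the engine, simulates the corruption, then applies the workaround. */
static int run_workaround(struct engine_desc *scratch)
{
	const struct engine_desc *e = scratch;
	struct saved_props saved_engine;

	assert(scratch);
	saved_engine = save_engine(e);

	/* Simulated corruption of the memory e points to. */
	memset(scratch, 0, sizeof(*scratch));
	scratch->class = 0x7fffffff;

	/* Workaround: re-point e at the saved copy before using it. */
	e = &saved_engine.engine;
	return capture_with(e);
}
```

This only hides the symptom: the memory `e` originally pointed to is
still overwritten, the code just stops reading from it.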

I have been trying to understand why `e` gets corrupted for a few
hours, and I ran out of ideas. To make the code more gdb-friendly, I
have unfolded the macro find_first_available_engine, but that did not
help me find the reason for the `e` corruption. Here is how I
unfolded the macro:

-       find_first_available_engine(fd, ctx, e, saved_engine);
+       ctx = intel_ctx_create_all_physical(fd);
+       igt_assert(ctx);
+       for (struct intel_engine_data i =
+                    intel_engine_list_for_ctx_cfg(fd, &(ctx)->cfg);
+            (e = intel_get_current_engine(&i));
+            intel_next_engine(&i)) {
+               if ((gem_class_can_store_dword(fd, e->class)))
+                       break;
+       }
+       igt_assert(e);
+       printf("e->name: %s\n", e->name);
+       saved_engine = configure_hangs(fd, e, ctx->id);
+       printf("e->name: %s\n", e->name);

Reverting 93c5ec210 stops the corruption from happening, and I am
trying to understand why. Can you help me debug this further?

Thank you,

Peter





