[PATCH v9] tests/intel/xe_exec_capture: Add xe_exec_capture test
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Thu Dec 12 19:54:58 UTC 2024
Zhanjun, per offline chats with Kamil, it looks like we need to expand the
igt_fixture sections before and after the igt_subtest section: save the
per-engine timeouts in the initial fixture and restore them in the later
fixture, because the fixture section is not bypassed when an assert
fires. That is my understanding.
That said, we will need another rev of this.
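
To make the save/restore point concrete, here is a minimal standalone sketch in
plain C, not IGT code. It models the two fixtures as ordinary functions and a
failed assert as a longjmp out of the subtest body. The timeout array,
NUM_ENGINES, the initial values, and every function name are hypothetical
stand-ins for the sysfs entries and the igt_fixture / igt_subtest blocks; only
the 2000 ms value comes from the patch.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_ENGINES 4            /* hypothetical engine count */
#define CAPTURE_JOB_TIMEOUT 2000 /* ms, value taken from the patch */

/* Stand-in for the per-engine job_timeout_ms sysfs entries (made-up values). */
static uint64_t engine_timeout_ms[NUM_ENGINES] = { 5000, 5000, 10000, 5000 };
static uint64_t saved_timeout_ms[NUM_ENGINES];

/* Models the jump taken when an assert fails inside a subtest. */
static jmp_buf subtest_exit;

/* Models the initial fixture: save every engine's timeout, then shorten it. */
static void fixture_setup(void)
{
	for (int i = 0; i < NUM_ENGINES; i++) {
		saved_timeout_ms[i] = engine_timeout_ms[i];
		engine_timeout_ms[i] = CAPTURE_JOB_TIMEOUT;
	}
}

/* Models the later fixture: restore every saved timeout. */
static void fixture_teardown(void)
{
	for (int i = 0; i < NUM_ENGINES; i++)
		engine_timeout_ms[i] = saved_timeout_ms[i];
}

/* Models a subtest body; a failure jumps out past any inline cleanup. */
static void subtest_body(bool fail)
{
	if (fail)
		longjmp(subtest_exit, 1);
}

/* Runs setup, subtest, teardown; returns true if the subtest did not fail. */
static bool run_test(bool fail)
{
	volatile bool passed = false;

	fixture_setup();
	if (setjmp(subtest_exit) == 0) {
		subtest_body(fail);
		passed = true;
	}
	fixture_teardown(); /* reached on both the pass and the fail path */
	return passed;
}
```

In this model, a restore placed inside subtest_body() after the failing check
would never run, while the one in fixture_teardown() always does.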
On Wed, 2024-12-11 at 14:08 -0800, Teres Alexis, Alan Previn wrote:
> Just re-RB-ing after the recent change that sets the engine execution time manually before running the tests
> on each engine, in order to limit the execution time of this test:
>
> Reviewed-by: Alan Previn <alan.previn.teres.alexis at intel.com>
>
>
> On Fri, 2024-12-06 at 14:59 -0800, Dong, Zhanjun wrote:
> > Submit cmds to the GPU that result in a GuC engine reset and check that
> > devcoredump register dump is generated, by the GuC, and includes the
> > full register range.
> >
> > Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> > Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> > Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> > ---
> > Changes from prior revs:
> > v9:- Reduced job timeout to 2 seconds to speedup test
> > Add info print to show test is running on single/multiple GPU
> > v8:- Move change list below ---
>
>
> alan: I just reviewed the difference between the last two revs (diff of diffs
> farther below).
> We hope this change addresses Kamil's concern by reducing the execution
> time dramatically. IIRC Zhanjun couldn't designate any single subtest to declare
> pass or fail without running multiple engines back to back, since the
> test needs to ensure that XE-KMD catches the correct guc-error-dump for the
> exact batch on the exact engine we expect, amidst multiple back-to-back
> runs of different-batches-same-engine vs different-engines. (The test uses the ring
> buffer batch buffer address to differentiate and determine this precisely.)
>
>
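
As a rough illustration of the address scheme mentioned above, the sketch below
packs a context id into bits 40 to 46 of the batch address and recovers it
again. The three constant values are assumptions made for this example (the
excerpt does not show the real definitions of BASE_ADDRESS, ADDRESS_SHIFT, or
CID_ADDRESS_MASK), and batch_addr() / addr_to_cid() are hypothetical helper
names, not functions from the patch.

```c
#include <stdint.h>

/*
 * All three values are assumptions for illustration only; the quoted
 * patch excerpt does not show the real definitions.
 */
#define BASE_ADDRESS     0x1a0000ULL /* hypothetical low address bits */
#define ADDRESS_SHIFT    40          /* context id field starts at bit 40 */
#define CID_ADDRESS_MASK 0x7FULL     /* 7 bits wide: bits 40 to 46 */

/* Build a per-engine batch address using the same expression as the quoted loop. */
static uint64_t batch_addr(uint64_t engine_cid)
{
	return BASE_ADDRESS | ((engine_cid % CID_ADDRESS_MASK) << ADDRESS_SHIFT);
}

/* Recover the context id field from an address read back from a dump. */
static uint64_t addr_to_cid(uint64_t addr)
{
	return (addr >> ADDRESS_SHIFT) & CID_ADDRESS_MASK;
}
```

With these assumed values, the modulo keeps the id in the range 0 to 126, so
two engines only collide if their counters differ by a multiple of 127.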
> 28a29
> > +#include "igt_sysfs.h"
> 37a39,40
> > +#define CAPTURE_JOB_TIMEOUT 2000
> > +#define JOB_TIMOUT_ENTRY "job_timeout_ms"
> 83a87,109
> > +static u64
> > +xe_sysfs_get_job_timeout_ms(int fd, struct drm_xe_engine_class_instance *eci)
> > +{
> > + int engine_fd = -1;
> > + u64 ret;
> > +
> > + engine_fd = xe_sysfs_engine_open(fd, eci->gt_id, eci->engine_class);
> > + ret = igt_sysfs_get_u64(engine_fd, JOB_TIMOUT_ENTRY);
> > + close(engine_fd);
> > +
> > + return ret;
> > +}
> > +
> > +static void xe_sysfs_set_job_timeout_ms(int fd, struct drm_xe_engine_class_instance *eci,
> > + u64 timeout)
> > +{
> > + int engine_fd = -1;
> > +
> > + engine_fd = xe_sysfs_engine_open(fd, eci->gt_id, eci->engine_class);
> > + igt_sysfs_set_u64(engine_fd, JOB_TIMOUT_ENTRY, timeout);
> > + close(engine_fd);
> > +}
> > +
>
> ...
>
> > xe_for_each_engine(fd, hwe) {
> > /*
> > * To test devcoredump register data, the test batch address is
> > * compared with the dump. Address bits 40 to 46 act as a
> > * context id, which starts at a random number and increases by 1
> > * per engine. This way, the address is unique for each
> > * engine and starts at a random number on each run.
> > */
> > const u64 addr = BASE_ADDRESS | ((u64)(engine_cid++ % CID_ADDRESS_MASK) <<
> > ADDRESS_SHIFT);
> 413a440
> > + u64 job_timeout = xe_sysfs_get_job_timeout_ms(fd, hwe);
> 417a445,447
> > + /* Reduce timeout value to speedup test */
> > + xe_sysfs_set_job_timeout_ms(fd, hwe, CAPTURE_JOB_TIMEOUT);
> > +
> 419a450,452
> > + /* Restore timeout value */
> > + xe_sysfs_set_job_timeout_ms(fd, hwe, job_timeout);
> > +
> 460a494,495
> > + igt_info("Running test on multiple GPU\n");
> > +
> 473a509
> > + igt_info("Running test on single GPU\n");
>