[igt-dev] [PATCH i-g-t] tests/xe: Add a test for corner cases of eviction.

Maarten Lankhorst maarten.lankhorst at linux.intel.com
Wed May 31 18:18:40 UTC 2023


On 2023-05-31 11:53, Thomas Hellström wrote:
> Hi,
>
>
> On 5/30/23 16:55, Maarten Lankhorst wrote:
>> Cc
>>
>> On 2023-05-29 14:48, Maarten Lankhorst wrote:
>>> Gitlab issues 194 and 239 show some corner cases. In particular,
>>> signal handling, pt eviction and losing track of external objects
>>> on the list.
>>>
>>> The patch series to fix those was sent as
>>> https://patchwork.freedesktop.org/series/118428/
>>>
>>> But because it's tricky to reproduce those issues, I had to add some
>>> custom igt's to handle those.
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>
> Looks good overall, but It would be great with some more documentation so that if a test fails, it's easier to understand what is failing, see below.
>
>>> ---
>>>   tests/meson.build     |   1 +
>>>   tests/xe/xe_evicted.c | 313 ++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 314 insertions(+)
>>>   create mode 100644 tests/xe/xe_evicted.c
>>>
>>> diff --git a/tests/meson.build b/tests/meson.build
>>> index 6551194fe..166937b6c 100644
>>> --- a/tests/meson.build
>>> +++ b/tests/meson.build
>>> @@ -248,6 +248,7 @@ xe_progs = [
>>>       'xe_dma_buf_sync',
>>>       'xe_debugfs',
>>>       'xe_evict',
>>> +    'xe_evicted',
>>>       'xe_exec_balancer',
>>>       'xe_exec_basic',
>>>       'xe_exec_compute_mode',
>>> diff --git a/tests/xe/xe_evicted.c b/tests/xe/xe_evicted.c
>>> new file mode 100644
>>> index 000000000..3f5e6a82c
>>> --- /dev/null
>>> +++ b/tests/xe/xe_evicted.c
>>> @@ -0,0 +1,313 @@
>>> +// SPDX-License-Identifier: MIT
>>> +/*
>>> + * Copyright © 2023 Intel Corporation
>>> + */
>>> +
>>> +/**
>>> + * TEST: Test various eviction and ENOMEM corner cases.
>>> + * Category: Software building block
>>> + * Sub-category: VM_BIND / eviction
>>> + * Test category: functionality test
>>> + */
>>> +
>>> +#include "igt.h"
>>> +#include "lib/igt_syncobj.h"
>>> +#include "lib/intel_reg.h"
>>> +#include "xe_drm.h"
>>> +
>>> +#include "xe/xe_ioctl.h"
>>> +#include "xe/xe_query.h"
>>> +#include "xe/xe_spin.h"
>>> +#include <string.h>
>>> +
>>> +static void evict_mem(struct xe_device *xe)
>>> +{
>>> +    if (xe->has_vram)
>>> +        igt_debugfs_write(xe->fd, "vram0_evict", "1");
>>> +    else
>>> +        igt_debugfs_write(xe->fd, "gtt_evict", "1");
>>> +}
>>> +
>>> +static void execstress(struct xe_device *xe, uint32_t engine, uint32_t addr)
>>> +{
>>> +    struct timespec start, end;
>>> +
>>> +    igt_fork(child, 2) {
>>> +        struct drm_xe_sync sync = {
>>> +            .flags = DRM_XE_SYNC_SYNCOBJ | DRM_XE_SYNC_SIGNAL,
>>> +            .handle = syncobj_create(xe->fd, 0),
>>> +        };
>>> +
>>> +        clock_gettime(CLOCK_MONOTONIC, &start);
>>> +
>>> +        do {
>>> +            if (!child)
>>> +                evict_mem(xe);
>>> +            else
>>> +                xe_exec_sync(xe->fd, engine, addr, &sync, 1);
>>> +
>>> +            clock_gettime(CLOCK_MONOTONIC, &end);
>>> +        } while (end.tv_sec - start.tv_sec <= 3);
>>> +
>>> +        if (child)
>>> +            igt_assert(syncobj_wait(xe->fd, &sync.handle, 1, INT64_MAX, 0, NULL));
>>> +        syncobj_destroy(xe->fd, sync.handle);
>>> +    }
>>> +
>>> +    igt_waitchildren();
>>> +
>>> +    /* Leave function in a known state */
>>> +    evict_mem(xe);
>>> +}
>>> +
>>> +/* See issue 194, unbinding first VMA may cause BO not to be locked correctly */
>>> +static void test_multimap(struct xe_device *xe, bool external)
>>> +{
>>> +    uint64_t size = xe->default_alignment;
>>> +    uint32_t vm = xe_vm_create(xe->fd, DRM_XE_VM_CREATE_SCRATCH_PAGE, 0);
>>> +    uint32_t bo = xe_bo_create(xe->fd, 0, external ? 0 : vm, size);
>>> +    uint32_t engine = xe_engine_create_class(xe->fd, vm, DRM_XE_ENGINE_CLASS_COPY);
>>> +    uint32_t *map = xe_bo_map(xe->fd, bo, size);
>>> +
>>> +    *map = MI_BATCH_BUFFER_END;
>>> +    munmap(map, size);
>>> +
>>> +    /* Create 2 mappings */
>>> +    xe_vm_bind_sync(xe->fd, vm, bo, 0, 0, size);
>>> +    xe_vm_bind_sync(xe->fd, vm, bo, 0, size, size);
>>> +
>>> +    execstress(xe, engine, size);
>>> +
>>> +    /* Unbind the first one, to possibly confuse stuff */
>>> +    xe_vm_unbind_sync(xe->fd, vm, 0, 0, size);
>>> +
>>> +    execstress(xe, engine, size);
>>> +
>>> +    /* Teardown.. */
>>> +    xe_engine_destroy(xe->fd, engine);
>>> +    xe_vm_destroy(xe->fd, vm);
>>> +    gem_close(xe->fd, bo);
>>> +}
>>> +
>>> +static uint64_t avail_vram(struct xe_device **xe)
>>> +{
>>> +    int fd = (*xe)->fd;
>>> +    int region_idx;
>>> +
>>> +    /* Refresh to get up-to-date values */
>>> +    xe_device_put(fd);
>>> +    *xe = xe_device_get(fd);
>>> +
>>> +    region_idx = ffs((*xe)->gts->gts[0].native_mem_regions) - 1;
>>> +
>>> +    return (*xe)->mem_usage->regions[region_idx].total_size -
>>> +           (*xe)->mem_usage->regions[region_idx].used;
>>> +}
>>> +
>>> +static void test_pt_eviction(struct xe_device *xe, bool external)
>
> Could we have an overall description on what this test does and how it does it?
I put a small description in the test comments, basically try to force a pt eviction.
>
>>> +{
>>> +    uint32_t bo, i;
>>> +    uint64_t size, full_size, full_addr, addr, avail;
>>> +    uint32_t vm;
>>> +    uint32_t engine;
>>> +    uint32_t *map;
>>> +    struct drm_xe_exec exec = {};
>>> +    igt_require(xe->has_vram);
>>> +
>>> +    vm = xe_vm_create(xe->fd, 0, 0);
>>> +    engine = xe_engine_create_class(xe->fd, vm, DRM_XE_ENGINE_CLASS_COPY);
>>> +
>>> +    /* Refresh xe_device size after creating vm + engine */
>>> +    evict_mem(xe);
>>> +    full_size = (avail_vram(&xe) - (2 << 20ULL)) & ~((2 << 20ULL) - 1);
> And what we are doing here and what the magical shifts come from?
 Should probably add a define for 2 megabyte instead. 1 << 20 is 1 mb. :)
>>> +
>>> +    bo = xe_bo_create(xe->fd, 0, external ? 0 : vm, full_size);
>>> +
>>> +    /* Only map what we need.. */
>>> +    map = xe_bo_map(xe->fd, bo, xe->default_alignment);
>>> +    *map = MI_BATCH_BUFFER_END;
>>> +    munmap(map, xe->default_alignment);
>>> +
>>> +    /* Stuff bo at the end, so we can use start for partial bo's */
>>> +    full_addr = (1ULL << (uint64_t)xe_va_bits(xe->fd)) - full_size;
>>> +    xe_vm_bind_sync(xe->fd, vm, bo, 0, full_addr, full_size);
>>> +
>>> +    exec.engine_id = engine;
>>> +    exec.num_batch_buffer = 1;
>>> +    exec.address = full_addr;
>>> +
>>> +    /* Now map partially at 0, and see how many pagetables we can still fill.. */
>>> +    addr = 0;
>>> +    size = xe->default_alignment;
>>> +    xe_vm_bind_sync(xe->fd, vm, bo, 0, addr, size);
>>> +
>>> +    xe_exec_sync(xe->fd, engine, full_addr, NULL, 0);
>>> +
>>> +    avail = full_size >> 12;
>>> +    while (1) {
>>> +        uint64_t new_avail = avail_vram(&xe) >> 12;
>>> +
>>> +        if (new_avail >= avail)
>>> +            igt_info("TTM delayed destroy may have freed some memory\n");
>>> +        avail = new_avail;
>>> +
>>> +        igt_info("%"PRIu64" available 4K pages left\n", avail);
>>> +        if (!avail)
>>> +            break;
>>> +
>>> +        if (avail == 1) {
>>> +            xe_vm_bind_sync(xe->fd, vm, bo, 0, 2 << 20ULL, size);
>>> +            break;
>>> +        }
>>> +
>>> +        addr += 1 << 30ULL;
>>> +
>>> +        /* Remove 1 for second level PT */
>>> +        avail--;
>>> +
>>> +        for (i = 0; i < min((uint64_t)512, avail); i++) {
>>> +            uint64_t this_addr = addr + (i << 21);
>>> +
>>> +            xe_vm_bind_sync(xe->fd, vm, bo, 0, this_addr, size);
>>> +        }
>>> +    }
>>> +
>>> +    /* Verify that with 0 memory available, exec still completes */
>>> +    xe_exec_sync(xe->fd, engine, full_addr, NULL, 0);
>>> +
>>> +    /* Bind 1 more to go over the edge, this should never succeed.. */
>>> +    if (__xe_vm_bind(xe->fd, vm, 0, bo, 0, 4 << 20ULL, size,
>>> +             XE_VM_BIND_OP_MAP, NULL, 0, 0, 0) == 0) {
>>> +
>>> +        /* Boom! */
>>> +        igt_assert(igt_ioctl(xe->fd, DRM_IOCTL_XE_EXEC, &exec) < 0 &&
>>> +               errno == ENOMEM);
>>> +
>>> +        xe_vm_unbind_sync(xe->fd, vm, 0, 4 << 20ULL, size);
>>> +    } else {
>>> +        igt_assert(errno == ENOMEM);
>>> +    }
>>> +
>>> +    /*
>>> +     * After evicting, we may end up creating new page tables,
>>> +     * so this should fail..
>>> +     */
>>> +    evict_mem(xe);
>>> +    igt_assert(igt_ioctl(xe->fd, DRM_IOCTL_XE_EXEC, &exec) < 0 &&
>>> +           errno == ENOMEM);
>>> +
>>> +    /* Cleanup */
>>> +    xe_engine_destroy(xe->fd, engine);
>>> +    xe_vm_destroy(xe->fd, vm);
>>> +    gem_close(xe->fd, bo);
>>> +}
>>> +
>>> +/* Test OOM with signal handler enabled, to mimic X.org (issue #239) */
>
> Same thing here, what's done and how is it done? Are we actually getting a signal here, or just wait interruptible and may or may not get a signal?

We attempt to cause a signal through igt_while_interruptible, it changes the igt_ioctl pointer from drmIoctl to sig_ioctl,

which attempts to cause as many EINTR's as possible. That's why igt uses igt_ioctl internally and not drmIoctl.


> Thanks,
>
> Thomas
~Maarten


More information about the igt-dev mailing list