[PATCH i-g-t v2 4/4] tests/intel/xe_eudebug_online: Add read/write pagefault online tests

Gwan-gyeong Mun gwan-gyeong.mun at intel.com
Fri Nov 22 14:33:42 UTC 2024



On 11/22/24 11:55 AM, Manszewski, Christoph wrote:
> Hi Gwan-gyeong,
> 
> On 22.11.2024 09:21, Gwan-gyeong Mun wrote:
>>
>>
>> On 11/21/24 7:12 PM, Manszewski, Christoph wrote:
>>> Hi Gwan-gyeong,
>>>
>>> On 21.11.2024 13:22, Gwan-gyeong Mun wrote:
>>>> Add read and write pagefault tests to xe_eudebug_online that check
>>>> whether a pagefault event is submitted by the KMD debugger when a
>>>> pagefault occurs.
>>>>
>>>> The tests make read (load instruction) and write (store instruction)
>>>> accesses to unallocated memory, causing a pagefault. They then
>>>> examine the faulting address and the number of eu threads that
>>>> caused the pagefault.
>>>>
>>>> v2: Refactor output of attention bits on pagefault event handling
>>>> (Andrzej)
>>>>      remove / update redundant code (Andrzej, Christoph)
>>>>      use igt_container_of() macro (Christoph)
>>>>
>>>> Co-developed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
>>>> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
>>>> ---
>>>>   tests/intel/xe_eudebug_online.c | 178 +++++++++++++++++++++++++++++++-
>>>>   1 file changed, 173 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/tests/intel/xe_eudebug_online.c b/tests/intel/xe_eudebug_online.c
>>>> index 0ef0d8093..a70d18ee4 100644
>>>> --- a/tests/intel/xe_eudebug_online.c
>>>> +++ b/tests/intel/xe_eudebug_online.c
>>>> @@ -36,6 +36,8 @@
>>>>   #define BB_IN_VRAM            (1 << 11)
>>>>   #define TARGET_IN_SRAM            (1 << 12)
>>>>   #define TARGET_IN_VRAM            (1 << 13)
>>>> +#define SHADER_PAGEFAULT_READ        (1 << 14)
>>>> +#define SHADER_PAGEFAULT_WRITE        (1 << 15)
>>>>   #define TRIGGER_UFENCE_SET_BREAKPOINT    (1 << 24)
>>>>   #define TRIGGER_RESUME_SINGLE_WALK    (1 << 25)
>>>>   #define TRIGGER_RESUME_PARALLEL_WALK    (1 << 26)
>>>> @@ -45,6 +47,7 @@
>>>>   #define TRIGGER_RESUME_DSS        (1 << 30)
>>>>   #define TRIGGER_RESUME_ONE        (1 << 31)
>>>> +#define SHADER_PAGEFAULT    (SHADER_PAGEFAULT_READ | SHADER_PAGEFAULT_WRITE)
>>>>   #define BB_REGION_BITMASK    (BB_IN_SRAM | BB_IN_VRAM)
>>>>   #define TARGET_REGION_BITMASK    (TARGET_IN_SRAM | TARGET_IN_VRAM)
>>>> @@ -61,6 +64,8 @@
>>>>   #define CACHING_VALUE(n)    (CACHING_INIT_VALUE + (n))
>>>>   #define SHADER_CANARY 0x01010101
>>>> +#define BAD_CANARY 0xf1f1f1f
>>>> +#define BAD_OFFSET (0x12345678ull << 12)
>>>>   #define WALKER_X_DIM        4
>>>>   #define WALKER_ALIGNMENT    16
>>>> @@ -120,7 +125,7 @@ static struct intel_buf *create_uc_buf(int fd, 
>>>> int width, int height, uint64_t r
>>>>   static int get_number_of_threads(uint64_t flags)
>>>>   {
>>>> -    if (flags & SHADER_MIN_THREADS)
>>>> +    if (flags & (SHADER_MIN_THREADS | SHADER_PAGEFAULT))
>>>>           return 16;
>>>>       if (flags & (TRIGGER_RESUME_ONE | TRIGGER_RESUME_SINGLE_WALK |
>>>> @@ -179,6 +184,16 @@ static struct gpgpu_shader *get_shader(int fd, 
>>>> const unsigned int flags)
>>>>               gpgpu_shader__common_target_write_u32(shader, s_dim.y 
>>>> + i, CACHING_VALUE(i));
>>>>           gpgpu_shader__nop(shader);
>>>>           gpgpu_shader__breakpoint(shader);
>>>> +    } else if (flags & SHADER_PAGEFAULT) {
>>>> +        if (flags & SHADER_PAGEFAULT_READ)
>>>> +            gpgpu_shader__read_a64_dword(shader, BAD_OFFSET);
>>>> +        else
>>>> +            gpgpu_shader__write_a64_dword(shader, BAD_OFFSET, 
>>>> BAD_CANARY);
>>>> +
>>>> +        gpgpu_shader__label(shader, 0);
>>>> +        gpgpu_shader__write_dword(shader, SHADER_CANARY, 0);
>>>> +        gpgpu_shader__jump_neq(shader, 0, w_dim.y, STEERING_END_LOOP);
>>>> +        gpgpu_shader__write_dword(shader, SHADER_CANARY, 0);
>>>
>>> Now that I think about it - do we need this to be a loop? Can't we
>>> just do the read/write instructions? This would simplify the code and
>>> I don't yet see why we need to loop within the shader. The
>>> SHADER_LOOP is used for interrupt-all because we want to interrupt
>>> the workload from the user/main igt thread. But here, similar to the
>>> basic-breakpoint test, we just submit a workload that will halt
>>> because of the hardware/kmd intervention.
>>>
>> The pagefault tests also need this concept.
>>
>> When a pagefault happens, the KMD sets "Force Exception / Force
>> External Halt" in TD_CTL to make the eu threads enter SIP mode.
>> In the eudebug pagefault handling process, the KMD installs a null
>> page at the faulting address and resumes (unhalts) the halted eu
>> threads.
>>
>> Ideally, all unhalted eu threads would immediately enter SIP mode due
>> to the FE/FEH settings, but that may not happen right away.
>> The purpose of the loop is therefore to ensure that the kernel shader
>> does not terminate before a pagefault event and an attention event
>> occur, by adding further instructions after the instruction that
>> causes the pagefault. In other words, the loop guarantees that at
>> least one eu thread enters SIP mode.
> 
> Yeah, if the count of instructions processed before the exception is
> not defined, then indeed the loop has its place here. But we may still
> reduce a little bit of code, see below.
> 
>> The attention callback sets the value that exits this loop, so this
>> code allows the eu threads to terminate once the SIP shader has been
>> processed.
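>>
>> To map that onto the shader above, here is the intended control flow
>> annotated with why each step is there (just a sketch of the code
>> already in the patch, not new code):
>>
>> ```c
>> /* Faulting access: a load or store to BAD_OFFSET; on the pagefault the
>>  * KMD installs a null page and unhalts the threads. */
>> if (flags & SHADER_PAGEFAULT_READ)
>> 	gpgpu_shader__read_a64_dword(shader, BAD_OFFSET);
>> else
>> 	gpgpu_shader__write_a64_dword(shader, BAD_OFFSET, BAD_CANARY);
>>
>> /* Keep the threads busy so the shader cannot finish before the
>>  * pagefault and attention events are raised; the attention trigger
>>  * breaks the loop by writing STEERING_END_LOOP into the steering
>>  * dword. */
>> gpgpu_shader__label(shader, 0);
>> gpgpu_shader__write_dword(shader, SHADER_CANARY, 0);
>> gpgpu_shader__jump_neq(shader, 0, w_dim.y, STEERING_END_LOOP);
>>
>> /* Only reached after resume, i.e. after at least one eu thread has
>>  * gone through SIP. */
>> gpgpu_shader__write_dword(shader, SHADER_CANARY, 0);
>> gpgpu_shader__eot(shader);
>> ```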
>>
>> Br,
>> G.G.
>>>>       }
>>>>       gpgpu_shader__eot(shader);
>>>> @@ -217,6 +232,16 @@ static int count_set_bits(void *ptr, size_t size)
>>>>       return count;
>>>>   }
>>>> +static int eu_attentions_xor_count(const uint32_t *a, const 
>>>> uint32_t *b, uint32_t size)
>>>> +{
>>>> +    int count = 0;
>>>> +
>>>> +    for (int i = 0; i < size / 4 ; i++)
>>>> +        count += igt_hweight(a[i] ^ b[i]);
>>>> +
>>>> +    return count;
>>>> +}
>>>> +
>>>>   static int count_canaries_eq(uint32_t *ptr, struct dim_t w_dim, 
>>>> uint32_t value)
>>>>   {
>>>>       int count = 0;
>>>> @@ -636,7 +661,7 @@ static void eu_attention_resume_trigger(struct 
>>>> xe_eudebug_debugger *d,
>>>>           }
>>>>       }
>>>> -    if (d->flags & SHADER_LOOP) {
>>>> +    if (d->flags & (SHADER_LOOP | SHADER_PAGEFAULT)) {
>>>
>>> If we drop the loop we can drop also this.
>>>
>>>>           uint32_t threads = get_number_of_threads(d->flags);
>>>>           uint32_t val = STEERING_END_LOOP;
>>>> @@ -746,6 +771,44 @@ static void 
>>>> eu_attention_resume_single_step_trigger(struct xe_eudebug_debugger *
>>>>               data->single_step_bitmask[i] &= ~att->bitmask[i];
>>>>   }
>>>> +static void eu_attention_resume_pagefault_trigger(struct 
>>>> xe_eudebug_debugger *d,
>>>> +                          struct drm_xe_eudebug_event *e)
>>>> +{
>>>> +    struct drm_xe_eudebug_event_eu_attention *att = 
>>>> igt_container_of(e, att, base);
>>>> +    struct online_debug_data *data = d->ptr;
>>>> +    uint32_t bitmask_size = att->bitmask_size;
>>>> +    uint8_t *bitmask;
>>>> +
>>>> +    if (data->last_eu_control_seqno > att->base.seqno)
>>>> +        return;
>>>> +
>>>> +    bitmask = calloc(1, att->bitmask_size);
>>>> +    igt_assert(bitmask);
>>>> +
>>>> +    eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
>>>> +               att->lrc_handle, bitmask, &bitmask_size);
>>>> +    igt_assert(bitmask_size == att->bitmask_size);
>>>> +
>>>> +    pthread_mutex_lock(&data->mutex);
>>>> +
>>>> +    if (d->flags & SHADER_PAGEFAULT) {
>>>> +        uint32_t threads = get_number_of_threads(d->flags);
>>>> +        uint32_t val = STEERING_END_LOOP;
>>>> +
>>>> +        igt_assert_eq(pwrite(data->vm_fd, &val, sizeof(uint32_t),
>>>> +                     data->target_offset + steering_offset(threads)),
>>>> +                  sizeof(uint32_t));
>>>> +        fsync(data->vm_fd);
>>>> +    }
>>>
>>> We can also drop this when we remove the loop. Btw. why can't we just 
>>> use 'eu_attention_resume_trigger' instead of this whole function?
> 
> We could remove 'eu_attention_resume_pagefault_trigger' like so:
> 
> ```
> diff --git a/tests/intel/xe_eudebug_online.c b/tests/intel/xe_eudebug_online.c
> index a70d18ee4..c077795ee 100644
> --- a/tests/intel/xe_eudebug_online.c
> +++ b/tests/intel/xe_eudebug_online.c
> @@ -622,7 +622,10 @@ static void eu_attention_resume_trigger(struct 
> xe_eudebug_debugger *d,
>       eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
>                  att->lrc_handle, bitmask, &bitmask_size);
>       igt_assert(bitmask_size == att->bitmask_size);
> -    igt_assert(memcmp(bitmask, att->bitmask, att->bitmask_size) == 0);
> +
> +    /* No guarantee that all pagefaulting eu threads will raise 
> attention */
> +    if (!(d->flags & SHADER_PAGEFAULT))
> +        igt_assert(memcmp(bitmask, att->bitmask, att->bitmask_size) == 0);
> 
>       pthread_mutex_lock(&data->mutex);
>       if (igt_nsec_elapsed(&data->exception_arrived) < 
> (MAX_PREEMPT_TIMEOUT + 1) * NSEC_PER_SEC &&
> @@ -771,44 +774,6 @@ static void 
> eu_attention_resume_single_step_trigger(struct xe_eudebug_debugger *
>               data->single_step_bitmask[i] &= ~att->bitmask[i];
>   }
> 
> -static void eu_attention_resume_pagefault_trigger(struct 
> xe_eudebug_debugger *d,
> -                          struct drm_xe_eudebug_event *e)
> -{
> -    struct drm_xe_eudebug_event_eu_attention *att = igt_container_of(e, 
> att, base);
> -    struct online_debug_data *data = d->ptr;
> -    uint32_t bitmask_size = att->bitmask_size;
> -    uint8_t *bitmask;
> -
> -    if (data->last_eu_control_seqno > att->base.seqno)
> -        return;
> -
> -    bitmask = calloc(1, att->bitmask_size);
> -    igt_assert(bitmask);
> -
> -    eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
> -               att->lrc_handle, bitmask, &bitmask_size);
> -    igt_assert(bitmask_size == att->bitmask_size);
> -
> -    pthread_mutex_lock(&data->mutex);
> -
> -    if (d->flags & SHADER_PAGEFAULT) {
> -        uint32_t threads = get_number_of_threads(d->flags);
> -        uint32_t val = STEERING_END_LOOP;
> -
> -        igt_assert_eq(pwrite(data->vm_fd, &val, sizeof(uint32_t),
> -                     data->target_offset + steering_offset(threads)),
> -                  sizeof(uint32_t));
> -        fsync(data->vm_fd);
> -    }
> -    pthread_mutex_unlock(&data->mutex);
> -
> -    data->last_eu_control_seqno = eu_ctl_resume(d->master_fd, d->fd, 
> att->client_handle,
> -                            att->exec_queue_handle, att->lrc_handle,
> -                            bitmask, att->bitmask_size);
> -
> -    free(bitmask);
> -}
> -
>   static void open_trigger(struct xe_eudebug_debugger *d,
>                struct drm_xe_eudebug_event *e)
>   {
> @@ -1530,7 +1495,7 @@ static void test_pagefault_online(int fd, struct 
> drm_xe_engine_class_instance *h
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_EU_ATTENTION,
>                       eu_attention_debug_trigger);
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_EU_ATTENTION,
> -                    eu_attention_resume_pagefault_trigger);
> +                    eu_attention_resume_trigger);
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_VM, vm_open_trigger);
>       xe_eudebug_debugger_add_trigger(s->debugger, 
> DRM_XE_EUDEBUG_EVENT_METADATA,
>                       create_metadata_trigger);
> ```
> 
> Does this look reasonable? I know it adds yet another path to 
> 'eu_attention_resume_trigger' but you partially account for the 
> pagefault shader in your current code anyway.
> 
Yes, right. I added a separate callback (for readability) because I
didn't need the non-pagefault checking routine of the attention
callback.
To keep the code additions in this patch as small as possible, I'll
integrate the pagefault scenario handling into the
eu_attention_resume_trigger() callback, as you suggested.
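
For reference, after that change the pagefault handling in
eu_attention_resume_trigger() would look roughly like the sketch below
(your diff combined with the existing SHADER_LOOP path; not the final
code):

```c
	eu_ctl_stopped(d->fd, att->client_handle, att->exec_queue_handle,
		       att->lrc_handle, bitmask, &bitmask_size);
	igt_assert(bitmask_size == att->bitmask_size);

	/* No guarantee that all pagefaulting eu threads raise attention */
	if (!(d->flags & SHADER_PAGEFAULT))
		igt_assert(memcmp(bitmask, att->bitmask, att->bitmask_size) == 0);

	pthread_mutex_lock(&data->mutex);

	/* Both the loop shader and the pagefault shader spin until the
	 * debugger writes STEERING_END_LOOP into the steering dword, so
	 * handle them through the same path. */
	if (d->flags & (SHADER_LOOP | SHADER_PAGEFAULT)) {
		uint32_t threads = get_number_of_threads(d->flags);
		uint32_t val = STEERING_END_LOOP;

		igt_assert_eq(pwrite(data->vm_fd, &val, sizeof(uint32_t),
				     data->target_offset + steering_offset(threads)),
			      sizeof(uint32_t));
		fsync(data->vm_fd);
	}
	pthread_mutex_unlock(&data->mutex);
```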

Thanks,
G.G.

> Thanks,
> Christoph
> 

