[Intel-gfx] [PATCH v8 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines
Nirmoy Das
nirmoy.das at intel.com
Mon Oct 9 13:24:03 UTC 2023
Hi Tvrtko,
On 10/9/2023 2:54 PM, Tvrtko Ursulin wrote:
<snip>
>
>>> +static long must_wait_woken(struct wait_queue_entry *wq_entry, long timeout)
>>> +{
>>> + /*
>>> + * This is equivalent to wait_woken() with the exception that
>>> + * we do not wake up early if the kthread task has been completed.
>>> + * As we are called from page reclaim in any task context,
>>> + * we may be invoked from stopped kthreads, but we *must*
>>> + * complete the wait from the HW.
>>> + *
>>> + * A second problem is that since we are called under reclaim
>>> + * and wait_woken() inspects the thread state, it makes an invalid
>>> + * assumption that all PF_KTHREAD tasks have set_kthread_struct()
>>> + * called upon them, and will trigger a GPF in
>>> + * is_kthread_should_stop().
>>
>> As discussed internally, the GPF issue is resolved with
>> https://lore.kernel.org/all/20230602212350.535358-1-jstultz@google.com/
>>
>
> If it means not open-coding a slightly different wait_woken() that
> would be a lot better indeed.
>
> Although the question of why a single wait queue is not good enough
> still remains. As I wrote before - give each invalidation a seqno,
> upon receiving the done G2H store the latest seqno in the driver and
> wake up all sleepers. They check the seqno and if theirs has passed
> they exit, otherwise they go back to sleep. No xarray needed.
> Put_pages() invalidations are already serialized so no ill-effect,
> and GGTT invalidations are, or are not, performance sensitive for
> some reason?
I think your proposed solution should work, as far as I understand,
because we are doing TLB invalidation of all engines and it is
serialized with gt->tlb.invalidate_lock.
We might need the xarray again when we want to make this more
fine-grained and do VMA-based ranged invalidation for better
performance. But for now I think we should try with a single wait
queue, along the lines of the sketch below.
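
To make sure I read the proposal correctly, here is a minimal sketch of
the single-waitqueue scheme (all names are hypothetical, not from the
patch): one global waitqueue, a per-invalidation seqno from an atomic
counter, and a "last completed" seqno advanced by the done handler:

#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/errno.h>
#include <linux/wait.h>

/* init_waitqueue_head(&st->wq) is assumed to be done at init time. */
struct tlb_inval_state {
	wait_queue_head_t wq;	/* every invalidation waiter sleeps here */
	atomic_t next_seqno;	/* seqno handed out per invalidation */
	u32 done_seqno;		/* last seqno the GuC reported complete */
};

static int send_invalidate_tlb(struct tlb_inval_state *st, long timeout)
{
	u32 seqno = atomic_inc_return(&st->next_seqno);

	/* ... send the H2G action carrying 'seqno' here ... */

	/* Sleep until the done handler has advanced to/past our seqno. */
	if (!wait_event_timeout(st->wq,
				(s32)(READ_ONCE(st->done_seqno) - seqno) >= 0,
				timeout))
		return -ETIME;

	return 0;
}

/* Called from the G2H "TLB invalidation done" handler. */
static void process_invalidate_tlb_done(struct tlb_inval_state *st, u32 seqno)
{
	WRITE_ONCE(st->done_seqno, seqno);
	wake_up_all(&st->wq);	/* every sleeper re-checks its own seqno */
}

The (s32)(a - b) comparison is the usual wraparound-safe seqno check,
so a sleeper that races with the wake-up still sees its completion.
This does lean on invalidations completing in order, which the
serialization via gt->tlb.invalidate_lock should give us.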
I don't have a concrete answer to the GGTT performance question yet,
maybe someone else does?
Regards,
Nirmoy
>
> Regards,
>
> Tvrtko
>
>>
>>
>>> + */
>>> + do {
>>> + set_current_state(TASK_UNINTERRUPTIBLE);
>>> + if (wq_entry->flags & WQ_FLAG_WOKEN)
>>> + break;
>>> +
>>> + timeout = schedule_timeout(timeout);
>>> + } while (timeout);
>>> + __set_current_state(TASK_RUNNING);
>>> +
>>> + /* See wait_woken() and woken_wake_function() */
>>> + smp_store_mb(wq_entry->flags, wq_entry->flags & ~WQ_FLAG_WOKEN);
>>> +
>>> + return timeout;
>>> +}
>>> +
>>> +static int guc_send_invalidate_tlb(struct intel_guc *guc, enum intel_guc_tlb_inval_mode type)
>>
>>
>> The second parameter should be enum intel_guc_tlb_invalidation_type,
>> not enum intel_guc_tlb_inval_mode.
>>
>> Not sure why CI didn't complain.
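
On the CI question, if it is still open: my guess is that in C, values
of different enum types convert implicitly, and GCC only diagnoses this
with -Wenum-conversion (enabled by -Wextra in C), which the kernel
build does not pass by default. A made-up illustration, not the real
i915 enums:

enum hypothetical_type { HYP_TLB_INVAL_ENGINES };
enum hypothetical_mode { HYP_TLB_INVAL_MODE_HEAVY };

static int takes_mode(enum hypothetical_mode mode)
{
	return (int)mode;
}

int main(void)
{
	/* Wrong enum type: accepted silently without -Wenum-conversion. */
	return takes_mode(HYP_TLB_INVAL_ENGINES);
}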
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>>> +{
>>> + struct intel_guc_tlb_wait _wq, *wq = &_wq;
>>> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
>>> + int err;
>>> + u32 seqno;
>>> + u32 action[] = {
>>> + INTEL_GUC_ACTION_TLB_INVALIDATION,
>>> + 0,
>>> + REG_FIELD_PREP(INTEL_GUC_TLB_INVAL_TYPE_MASK, type) |
>>> + REG_FIELD_PREP(INTEL_GUC_TLB_INVAL_MODE_MASK,
>>> + INTEL_GUC_TLB_INVAL_MODE_HEAVY) |
>>> + INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
>>> + };
>>> + u32 size = ARRAY_SIZE(action);
>>> +
>>> + init_waitqueue_head(&_wq.wq);
>>> +
>>> + if (xa_alloc_cyclic_irq(&guc->tlb_lookup, &seqno, wq,
>>> + xa_limit_32b, &guc->next_seqno,
>>> + GFP_ATOMIC | __GFP_NOWARN) < 0) {
>>> + /* Under severe memory pressure? Serialise TLB allocations */
>>> + xa_lock_irq(&guc->tlb_lookup);
>>> + wq = xa_load(&guc->tlb_lookup, guc->serial_slot);
>>> + wait_event_lock_irq(wq->wq,
>>> + !READ_ONCE(wq->busy),
>>> + guc->tlb_lookup.xa_lock);
>>> + /*
>>> + * Update wq->busy under lock to ensure only one waiter can
>>> + * issue the TLB invalidation command using the serial slot at a
>>> + * time. The condition is set to true before releasing the lock
>>> + * so that other callers continue to wait until woken up again.
>>> + */
>>> + wq->busy = true;
>>> + xa_unlock_irq(&guc->tlb_lookup);
>>> +
>>> + seqno = guc->serial_slot;
>>> + }
>>> +
>>> + action[1] = seqno;
>>> +
>>> + add_wait_queue(&wq->wq, &wait);
>>> +
>>> + /*
>>> + * This is a critical reclaim path and thus we must loop here:
>>> + * We cannot block for anything that is on the GPU.
>>> + */
>>> + err = intel_guc_send_busy_loop(guc, action, size,
>>> G2H_LEN_DW_INVALIDATE_TLB, true);
>>> + if (err)
>>> + goto out;
>>> +
>>> + if (!must_wait_woken(&wait, intel_guc_ct_expected_delay(&guc->ct))) {
>>> + guc_err(guc,
>>> + "TLB invalidation response timed out for seqno %u\n", seqno);
>>> + err = -ETIME;
>>> + }
>>> +out:
>>> + remove_wait_queue(&wq->wq, &wait);
>>> + if (seqno != guc->serial_slot)
>>> + xa_erase_irq(&guc->tlb_lookup, seqno);
>>> +
>>> + return err;
>>> +}
>>> +
>>> +/* Full TLB invalidation */
>>> +int intel_guc_invalidate_tlb_engines(struct intel_guc *guc)
>>> +{
>>> + return guc_send_invalidate_tlb(guc, INTEL_GUC_TLB_INVAL_ENGINES);
>>> +}
>>> +
>>> +/* GuC TLB Invalidation: Invalidate the TLBs of GuC itself. */
>>> +int intel_guc_invalidate_tlb_guc(struct intel_guc *guc)
>>> +{
>>> + return guc_send_invalidate_tlb(guc, INTEL_GUC_TLB_INVAL_GUC);
>>> +}
>>> +
>>> int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>> const u32 *msg,
>>> u32 len)