<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Tvrtko,<br>
</p>
<div class="moz-cite-prefix">On 10/9/2023 2:54 PM, Tvrtko Ursulin
wrote:</div>
<div class="moz-cite-prefix"><snip><br>
</div>
<blockquote type="cite" cite="mid:e149a000-f746-9ae3-4dfc-cfe565a836b0@linux.intel.com">
<br>
<blockquote type="cite">
<blockquote type="cite">+static long must_wait_woken(struct
wait_queue_entry *wq_entry, long timeout)
<br>
+{
<br>
+ /*
<br>
+ * This is equivalent to wait_woken() with the exception
that
<br>
+ * we do not wake up early if the kthread task has been
completed.
<br>
+ * As we are called from page reclaim in any task
context,
<br>
+ * we may be invoked from stopped kthreads, but we *must*
<br>
+ * complete the wait from the HW .
<br>
+ *
<br>
+ * A second problem is that since we are called under
reclaim
<br>
+ * and wait_woken() inspected the thread state, it makes
an invalid
<br>
+ * assumption that all PF_KTHREAD tasks have
set_kthread_struct()
<br>
+ * called upon them, and will trigger a GPF
<br>
</blockquote>
<br>
As discussed internally, the GPF issue is resolved with
<a class="moz-txt-link-freetext" href="https://lore.kernel.org/all/20230602212350.535358-1-jstultz@google.com/">https://lore.kernel.org/all/20230602212350.535358-1-jstultz@google.com/</a>
<a class="moz-txt-link-rfc2396E" href="https://lore.kernel.org/all/20230602212350.535358-1-jstultz@google.com/"><https://lore.kernel.org/all/20230602212350.535358-1-jstultz@google.com/></a>
<br>
</blockquote>
<br>
If it means not open coding a slightly different wait_woken() that
would be a lot better indeed.
<br>
<br>
Although the question of why a single wait queue is not good
enough still remains. As I wrote before - give each invalidation a
seqno; upon receiving the done g2h, store the latest seqno in the
driver and wake up all sleepers. They check the seqno and if theirs
has passed exit, otherwise go back to sleep. No xarray needed.
Put_pages() invalidations are already serialized so no ill-effect,
and GGTT invalidations are, or are not, performance sensitive for
some reason?
<br>
</blockquote>
<p><br>
</p>
<p>I think your proposed solution should work as per my
understanding, because we are doing TLB invalidation of all engines
and it is serialized with gt->tlb.invalidate_lock.</p>
<p>We might need the xarray later when we want to make this more
fine-grained and do VMA-based ranged invalidation for better
performance. So I think we should try the single wait queue for
now.</p>
<p><br>
</p>
<p>I don't have a concrete answer yet, maybe someone else does?</p>
<p><br>
</p>
<p>Regards,</p>
<p>Nirmoy<br>
</p>
<blockquote type="cite" cite="mid:e149a000-f746-9ae3-4dfc-cfe565a836b0@linux.intel.com">
<br>
Regards,
<br>
<br>
Tvrtko
<br>
<br>
<blockquote type="cite">
<br>
<br>
<blockquote type="cite"> in is_kthread_should_stop().
<br>
+	 */
<br>
+	do {
<br>
+		set_current_state(TASK_UNINTERRUPTIBLE);
<br>
+		if (wq_entry->flags & WQ_FLAG_WOKEN)
<br>
+			break;
<br>
+
<br>
+		timeout = schedule_timeout(timeout);
<br>
+	} while (timeout);
<br>
+	__set_current_state(TASK_RUNNING);
<br>
+
<br>
+	/* See wait_woken() and woken_wake_function() */
<br>
+	smp_store_mb(wq_entry->flags, wq_entry->flags & ~WQ_FLAG_WOKEN);
<br>
+
<br>
+	return timeout;
<br>
+}
<br>
+
<br>
+static int guc_send_invalidate_tlb(struct intel_guc *guc, enum intel_guc_tlb_inval_mode type)
<br>
</blockquote>
<br>
<br>
2nd param should be intel_guc_tlb_invalidation_type, not
intel_guc_tlb_inval_mode.
<br>
<br>
Not sure why CI didn't complain.
<br>
<br>
<br>
Regards,
<br>
<br>
Nirmoy
<br>
<br>
<blockquote type="cite">+{
<br>
+ struct intel_guc_tlb_wait _wq, *wq = &_wq;
<br>
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
<br>
+ int err;
<br>
+ u32 seqno;
<br>
+ u32 action[] = {
<br>
+ INTEL_GUC_ACTION_TLB_INVALIDATION,
<br>
+ 0,
<br>
+ REG_FIELD_PREP(INTEL_GUC_TLB_INVAL_TYPE_MASK, type) |
<br>
+ REG_FIELD_PREP(INTEL_GUC_TLB_INVAL_MODE_MASK,
<br>
+ INTEL_GUC_TLB_INVAL_MODE_HEAVY) |
<br>
+ INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
<br>
+ };
<br>
+ u32 size = ARRAY_SIZE(action);
<br>
+
<br>
+ init_waitqueue_head(&_wq.wq);
<br>
+
<br>
+ if (xa_alloc_cyclic_irq(&guc->tlb_lookup,
&seqno, wq,
<br>
+ xa_limit_32b, &guc->next_seqno,
<br>
+ GFP_ATOMIC | __GFP_NOWARN) < 0) {
<br>
+ /* Under severe memory pressure? Serialise TLB
allocations */
<br>
+ xa_lock_irq(&guc->tlb_lookup);
<br>
+ wq = xa_load(&guc->tlb_lookup,
guc->serial_slot);
<br>
+ wait_event_lock_irq(wq->wq,
<br>
+ !READ_ONCE(wq->busy),
<br>
+ guc->tlb_lookup.xa_lock);
<br>
+ /*
<br>
+ * Update wq->busy under lock to ensure only one
waiter can
<br>
+ * issue the TLB invalidation command using the
serial slot at a
<br>
+ * time. The condition is set to true before
releasing the lock
<br>
+ * so that other caller continue to wait until woken
up again.
<br>
+ */
<br>
+ wq->busy = true;
<br>
+ xa_unlock_irq(&guc->tlb_lookup);
<br>
+
<br>
+ seqno = guc->serial_slot;
<br>
+ }
<br>
+
<br>
+ action[1] = seqno;
<br>
+
<br>
+ add_wait_queue(&wq->wq, &wait);
<br>
+
<br>
+ /*
<br>
+ * This is a critical reclaim path and thus we must loop
here:
<br>
+ * We cannot block for anything that is on the GPU.
<br>
+ */
<br>
+ err = intel_guc_send_busy_loop(guc, action, size,
G2H_LEN_DW_INVALIDATE_TLB, true);
<br>
+ if (err)
<br>
+ goto out;
<br>
+
<br>
+ if (!must_wait_woken(&wait,
intel_guc_ct_expected_delay(&guc->ct))) {
<br>
+ guc_err(guc,
<br>
+ "TLB invalidation response timed out for seqno
%u\n", seqno);
<br>
+ err = -ETIME;
<br>
+ }
<br>
+out:
<br>
+ remove_wait_queue(&wq->wq, &wait);
<br>
+ if (seqno != guc->serial_slot)
<br>
+ xa_erase_irq(&guc->tlb_lookup, seqno);
<br>
+
<br>
+ return err;
<br>
+}
<br>
+
<br>
+/* Full TLB invalidation */
<br>
+int intel_guc_invalidate_tlb_engines(struct intel_guc *guc)
<br>
+{
<br>
+ return guc_send_invalidate_tlb(guc,
INTEL_GUC_TLB_INVAL_ENGINES);
<br>
+}
<br>
+
<br>
+/* GuC TLB Invalidation: Invalidate the TLB's of GuC itself.
*/
<br>
+int intel_guc_invalidate_tlb_guc(struct intel_guc *guc)
<br>
+{
<br>
+ return guc_send_invalidate_tlb(guc,
INTEL_GUC_TLB_INVAL_GUC);
<br>
+}
<br>
+
<br>
int intel_guc_deregister_done_process_msg(struct intel_guc
*guc,
<br>
const u32 *msg,
<br>
u32 len)
<br>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>