<div dir="ltr"><div>The kernel will know who should touch the implicit-sync semaphore next, and at the same time, a copy of all write requests to the implicit-sync semaphore will be forwarded to the kernel for monitoring and bo_wait.</div><div><br></div><div>Syncobjs could either use the same monitored access as implicit sync or be completely unmonitored. We haven't decided yet.</div><div><br></div><div>Syncfiles could either use one of the above or wait for a syncobj to go idle before converting to a syncfile.</div><div><br></div><div>Marek<br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter <<a href="mailto:daniel@ffwll.ch">daniel@ffwll.ch</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:<br>
> As long as we can figure out who touched a certain sync object last, that<br>
> would indeed work, yes.<br>
<br>
Don't you need to know who will touch it next, i.e. who is holding up your<br>
fence? Or maybe I'm just totally confused again.<br>
-Daniel<br>
<br>
> <br>
> Christian.<br>
> <br>
> Am 14.06.21 um 19:10 schrieb Marek Olšák:<br>
> > The call to the hw scheduler has a limitation on the size of all<br>
> > parameters combined. I think we can only pass a 32-bit sequence number<br>
> > and a ~16-bit global (per-GPU) syncobj handle in one call and not much<br>
> > else.<br>
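A minimal sketch of that parameter budget, assuming a hypothetical layout with bits 0-31 holding the sequence number and bits 32-47 holding the syncobj handle. The real hw scheduler call format is not described in this thread, and pack_sched_args and unpack_sched_args are illustrative names only:<br>

```python
# Hypothetical argument layout for one hw scheduler call (not a real
# firmware ABI): bits 0..31 hold the sequence number, bits 32..47 hold
# the per-GPU syncobj handle (a table index).
SEQ_BITS = 32
HANDLE_BITS = 16


def pack_sched_args(handle: int, seq: int) -> int:
    """Pack a syncobj handle and a sequence number into one integer word."""
    if seq not in range(2**SEQ_BITS):
        raise ValueError("sequence number does not fit in 32 bits")
    if handle not in range(2**HANDLE_BITS):
        raise ValueError("syncobj handle does not fit in 16 bits")
    return handle * 2**SEQ_BITS + seq


def unpack_sched_args(word: int) -> tuple:
    """Split a packed word back into (handle, seq)."""
    return word // 2**SEQ_BITS, word % 2**SEQ_BITS


word = pack_sched_args(handle=0x1234, seq=0xDEADBEEF)
print(hex(word))                # 0x1234deadbeef
print(unpack_sched_args(word))  # (4660, 3735928559)
```

Both fields together use 48 bits, which leaves little room for anything else in the same call.<br>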
> > <br>
> > The syncobj handle can be an element index in a global (per-GPU) syncobj<br>
> > table, and it's read-only for all processes with the exception of the<br>
> > signal command. Syncobjs can either have per-VMID write access flags for<br>
> > the signal command (slow), or any process can write to any syncobj and we<br>
> > rely only on the kernel checking the write log (fast).<br>
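A toy model of the two write-access options, assuming a plain list as the global syncobj table. SyncobjTable, signal_checked and signal_logged are invented names. The first option checks per-VMID write permission on every signal; the second lets the write through and appends it to a log for the kernel to audit later:<br>

```python
from dataclasses import dataclass, field


@dataclass
class SyncobjTable:
    """Toy global (per-GPU) syncobj table; all names are hypothetical."""
    values: list                                   # current value per handle
    writers: dict = field(default_factory=dict)    # handle -> set of VMIDs allowed to signal
    write_log: list = field(default_factory=list)  # (vmid, handle, value) records

    def read(self, handle: int) -> int:
        # Reads are allowed for every process.
        return self.values[handle]

    def signal_checked(self, vmid: int, handle: int, value: int) -> bool:
        # "Slow" option: consult per-VMID write flags before every signal.
        if vmid not in self.writers.get(handle, set()):
            return False
        self.values[handle] = value
        return True

    def signal_logged(self, vmid: int, handle: int, value: int) -> None:
        # "Fast" option: any process may write; the write is only logged
        # so the kernel can audit it afterwards.
        self.values[handle] = value
        self.write_log.append((vmid, handle, value))


table = SyncobjTable(values=[0] * 4, writers={1: {7}})
print(table.signal_checked(vmid=8, handle=1, value=5))  # False
table.signal_logged(vmid=8, handle=1, value=5)
print(table.read(1), table.write_log)                     # 5 [(8, 1, 5)]
```

The logged variant keeps permission checks off the signal path entirely, which matches the slow/fast trade-off described in the mail.<br>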
> > <br>
> > In any case, we can execute the memory write in the queue engine and<br>
> > only use the hw scheduler for logging, which would be perfect.<br>
> > <br>
> > Marek<br>
> > <br>
> > On Thu, Jun 10, 2021 at 12:33 PM Christian König<br>
> > <<a href="mailto:ckoenig.leichtzumerken@gmail.com" target="_blank">ckoenig.leichtzumerken@gmail.com</a>> wrote:<br>
> > <br>
> > Hi guys,<br>
> > <br>
> > maybe soften that a bit. Reading from the shared memory of the<br>
> > user fence is ok for everybody. What we need to take more care of<br>
> > is the writing side.<br>
> > <br>
> > So my current thinking is that we allow read-only access, but<br>
> > writing a new sequence value needs to go through the scheduler/kernel.<br>
> > <br>
> > So when the CPU wants to signal a timeline fence it needs to call<br>
> > an IOCTL. When the GPU wants to signal the timeline fence it needs<br>
> > to hand that off to the hardware scheduler.<br>
> > <br>
> > If we lock up, the kernel can check with the hardware who did the<br>
> > last write and what value was written.<br>
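A minimal sketch of that lookup, assuming the hardware hands the kernel an ordered write log of (pid, handle, value) records. WriteRecord and last_write are hypothetical names, not an existing amdgpu or DRM interface:<br>

```python
from typing import NamedTuple, Optional


class WriteRecord(NamedTuple):
    """One entry of a hypothetical hardware write log."""
    pid: int     # process that issued the write
    handle: int  # sync object that was written
    value: int   # sequence value that was written


def last_write(log: list, handle: int) -> Optional[WriteRecord]:
    """Return the most recent write to handle, or None if it was never written."""
    for record in reversed(log):
        if record.handle == handle:
            return record
    return None


log = [WriteRecord(100, 3, 1), WriteRecord(200, 3, 2), WriteRecord(100, 5, 9)]
print(last_write(log, 3))  # WriteRecord(pid=200, handle=3, value=2)
print(last_write(log, 4))  # None
```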
> > <br>
> > That, together with an IOCTL to give out sequence numbers for<br>
> > implicit sync to applications, should be sufficient for the kernel<br>
> > to track who is responsible if something bad happens.<br>
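A sketch of the bookkeeping such an IOCTL might do, assuming the kernel simply hands out increasing numbers and remembers which process received each one. ImplicitSyncSeqAllocator is an invented name, and wraparound of a 32-bit value is ignored here:<br>

```python
import itertools


class ImplicitSyncSeqAllocator:
    """Hypothetical model of an IOCTL that hands out implicit-sync sequence numbers.

    The kernel records which process received each number, so it can later
    tell who was responsible for writing it.
    """

    def __init__(self, start: int = 1):
        self._next = itertools.count(start)
        self.owner = {}  # sequence number -> pid that must write it

    def alloc(self, pid: int) -> int:
        seq = next(self._next)
        self.owner[seq] = pid
        return seq


alloc = ImplicitSyncSeqAllocator()
print(alloc.alloc(pid=100), alloc.alloc(pid=200))  # 1 2
print(alloc.owner)                                 # {1: 100, 2: 200}
```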
> > <br>
> > In other words, when the hardware says that the shader wrote stuff<br>
> > like 0xdeadbeef, 0x0 or 0xffffffff into memory, we kill the process<br>
> > that did that.<br>
> > <br>
> > If the hardware says that seq - 1 was written fine, but seq is<br>
> > missing, then the kernel blames whoever was supposed to write seq.<br>
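The two blame rules can be written down as a small decision function. This is a sketch, assuming the kernel keeps a map from each sequence number to the process it was handed to (as with the IOCTL mentioned above) and that the hardware reports the last value written and its writer. blame and GARBAGE_VALUES are hypothetical names, and the rule is generalized from "seq - 1 written, seq missing" to "blame the owner of the first missing value":<br>

```python
from typing import Optional

# Values treated as obvious garbage, taken from the examples in the mail.
# Note: 0x0 only counts as garbage if the fence never legitimately holds 0.
GARBAGE_VALUES = {0xDEADBEEF, 0x0, 0xFFFFFFFF}


def blame(last_value: int, last_writer: int, expected_seq: int,
          owner: dict) -> Optional[int]:
    """Return the pid to blame for a fence that never reached expected_seq.

    last_value, last_writer: the last write the hardware reports, and its author.
    owner: sequence number -> pid the kernel assigned to write it.
    Returns None if the fence already reached expected_seq.
    """
    if last_value in GARBAGE_VALUES:
        return last_writer  # wrote garbage into the fence
    if expected_seq > last_value:
        # last_value landed fine but last_value + 1 never did; with
        # last_value == expected_seq - 1 this is exactly the case in the mail.
        return owner.get(last_value + 1)
    return None


owner = {1: 100, 2: 200, 3: 300}
print(blame(2, 200, 3, owner))           # 300
print(blame(0xDEADBEEF, 666, 3, owner))  # 666
print(blame(3, 300, 3, owner))           # None
```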
> > <br>
> > Just piping the write through a privileged instance should be<br>
> > fine to make sure that we don't run into issues.<br>
> > <br>
> > Christian.<br>
> > <br>
> > Am 10.06.21 um 17:59 schrieb Marek Olšák:<br>
> > > Hi Daniel,<br>
> > > <br>
> > > We just talked about this whole topic internally, and we came to<br>
> > > the conclusion that the hardware needs to understand sync<br>
> > > object handles and have high-level wait and signal operations in<br>
> > > the command stream. Sync objects will be backed by memory, but<br>
> > > they won't be readable or writable by processes directly. The<br>
> > > hardware will log all accesses to sync objects and will send the<br>
> > > log to the kernel periodically. The kernel will identify<br>
> > > malicious behavior.<br>
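One way the kernel could scan each periodic log for misbehavior, assuming it remembers which process was assigned each (handle, value) pair to signal. audit_log and the record layout are hypothetical:<br>

```python
def audit_log(log, owner):
    """Return the pids whose logged writes do not match the kernel's assignments.

    log:   iterable of (pid, handle, value) records sent by the hardware.
    owner: (handle, value) -> pid the kernel assigned to signal that value.
    """
    offenders = set()
    for pid, handle, value in log:
        # A write is suspicious if nobody, or somebody else, was assigned it.
        if owner.get((handle, value)) != pid:
            offenders.add(pid)
    return offenders


owner = {(0, 1): 100, (0, 2): 200}
log = [(100, 0, 1), (300, 0, 2), (200, 0, 0xFFFFFFFF)]
print(sorted(audit_log(log, owner)))  # [200, 300]
```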
> > > <br>
> > > Example of a hardware command stream:<br>
> > > ...<br>
> > > // the sequence number is assigned by the kernel<br>
> > > ImplicitSyncWait(syncObjHandle, sequenceNumber);<br>
> > > Draw();<br>
> > > ImplicitSyncSignalWhenDone(syncObjHandle);<br>
> > > ...<br>
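To illustrate the ordering such a command stream is meant to give, here is a toy interpreter under stated assumptions: a wait stalls until the syncobj value reaches the kernel-assigned sequence number, a signal stores a kernel-assigned value, and draws complete instantly. The tuple encoding and run_queues are hypothetical and say nothing about the real packet format:<br>

```python
def run_queues(queues, syncobjs):
    """Toy round-robin interpreter for per-queue command streams.

    queues:   list of command lists; each command is ("wait", handle, seq),
              ("draw", label) or ("signal", handle, value).
    syncobjs: list of current syncobj values, indexed by handle.
    Returns the order in which draws executed.
    """
    pcs = [0] * len(queues)  # next command index for each queue
    trace = []
    progress = True
    while progress:
        progress = False
        for qi, cmds in enumerate(queues):
            if pcs[qi] == len(cmds):
                continue  # this queue has finished
            op, *args = cmds[pcs[qi]]
            if op == "wait":
                handle, seq = args
                if seq > syncobjs[handle]:
                    continue  # not signaled yet: this queue stalls
            elif op == "draw":
                trace.append((qi, args[0]))
            elif op == "signal":
                handle, value = args
                syncobjs[handle] = value
            pcs[qi] += 1
            progress = True
    return trace  # stops early if every remaining queue is stalled


# Queue 0 is listed first but must wait for queue 1's signal.
syncobjs = [0]
queues = [
    [("wait", 0, 1), ("draw", "B"), ("signal", 0, 2)],
    [("wait", 0, 0), ("draw", "A"), ("signal", 0, 1)],
]
print(run_queues(queues, syncobjs), syncobjs)  # [(1, 'A'), (0, 'B')] [2]
```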
> > > <br>
> > > I'm afraid we have no other choice because of the TLB<br>
> > > invalidation overhead.<br>
> > > <br>
> > > Marek<br>
> > > <br>
> > > <br>
> > > On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter <<a href="mailto:daniel@ffwll.ch" target="_blank">daniel@ffwll.ch</a>> wrote:<br>
> > > <br>
> > > On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:<br>
> > > > Am 09.06.21 um 15:19 schrieb Daniel Vetter:<br>
> > > > > [SNIP]<br>
> > > > > > Yeah, we call this the lightweight and the heavyweight tlb flush.<br>
> > > > > ><br>
> > > > > > The lightweight can be used when you are sure that you don't have any of the<br>
> > > > > > PTEs currently in flight in the 3D/DMA engine and you just need to<br>
> > > > > > invalidate the TLB.<br>
> > > > > ><br>
> > > > > > The heavyweight must be used when you need to invalidate the TLB *AND* make<br>
> > > > > > sure that no concurrent operation moves new stuff into the TLB.<br>
> > > > > ><br>
> > > > > > The problem is for this use case we have to use the heavyweight one.<br>
> > > > > Just for my own curiosity: So the lightweight flush is only for in-between<br>
> > > > > CS when you know access is idle? Or does that also not work if userspace<br>
> > > > > has a CS on a dma engine going at the same time because the TLBs aren't<br>
> > > > > isolated enough between engines?<br>
> > > ><br>
> > > > More or less correct, yes.<br>
> > > ><br>
> > > > The problem is a lightweight flush only invalidates the TLB, but doesn't<br>
> > > > take care of entries which have been handed out to the different engines.<br>
> > > ><br>
> > > > In other words what can happen is the following:<br>
> > > ><br>
> > > > 1. Shader asks TLB to resolve address X.<br>
> > > > 2. TLB looks into its cache and can't find address X, so it asks the walker to resolve.<br>
> > > > 3. Walker comes back with result for address X and TLB puts that into its cache and gives it to Shader.<br>
> > > > 4. Shader starts doing some operation using result for address X.<br>
> > > > 5. You send lightweight TLB invalidate and TLB throws away cached values for address X.<br>
> > > > 6. Shader happily still uses whatever the TLB gave to it in step 3 to access address X.<br>
> > > ><br>
> > > > Think of it as the shader having its own 1-entry L0 TLB cache which is<br>
> > > > not affected by the lightweight flush.<br>
> > > ><br>
> > > > The heavyweight flush on the other hand sends out a broadcast signal to<br>
> > > > everybody and only comes back when we are sure that an address is not in use<br>
> > > > any more.<br>
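The six steps and the 1-entry L0 analogy can be modeled in a few lines. This is a toy, assuming a single shader with one in-flight translation and a dict as the page table; ToyMMU and its methods are invented for illustration, and a real heavyweight flush waits for in-flight users to drain rather than simply discarding their entries:<br>

```python
class ToyMMU:
    """Toy model of the lightweight-flush race; not how any real GPU is built."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual address -> physical address
        self.tlb = {}                 # shared TLB cache
        self.shader_l0 = {}           # the shader's 1-entry "L0" (in-flight translation)

    def shader_translate(self, va):
        if va not in self.tlb:                  # steps 1-3: miss, walk, fill TLB
            self.tlb[va] = self.page_table[va]
        self.shader_l0 = {va: self.tlb[va]}     # step 4: shader holds on to it
        return self.shader_l0[va]

    def shader_access(self, va):
        # Step 6: the shader reuses its in-flight translation if it has one.
        if va in self.shader_l0:
            return self.shader_l0[va]
        return self.shader_translate(va)

    def lightweight_flush(self):
        self.tlb.clear()                        # step 5: only the TLB is invalidated

    def heavyweight_flush(self):
        # Broadcast: also drop translations the engines still hold (a real
        # flush would wait for them to retire instead of just clearing).
        self.tlb.clear()
        self.shader_l0.clear()


mmu = ToyMMU({0x1000: 0xA000})
mmu.shader_translate(0x1000)
mmu.page_table[0x1000] = 0xB000         # the page gets moved
mmu.lightweight_flush()
print(hex(mmu.shader_access(0x1000)))   # 0xa000 (stale)
mmu.heavyweight_flush()
print(hex(mmu.shader_access(0x1000)))   # 0xb000
```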
> > > <br>
> > > Ah, makes sense. On Intel the shaders only operate in VA; everything goes<br>
> > > around as explicit async messages to IO blocks. So we don't have this. The<br>
> > > only difference in TLB flushes is between a TLB flush in the IB and an MMIO<br>
> > > one, which is independent of anything currently being executed on an<br>
> > > engine.<br>
> > > -Daniel<br>
> > > -- Daniel Vetter<br>
> > > Software Engineer, Intel Corporation<br>
> > > <a href="http://blog.ffwll.ch" rel="noreferrer" target="_blank">http://blog.ffwll.ch</a><br>
> > > <br>
> > <br>
> <br>
<br>
-- <br>
Daniel Vetter<br>
Software Engineer, Intel Corporation<br>
<a href="http://blog.ffwll.ch" rel="noreferrer" target="_blank">http://blog.ffwll.ch</a><br>
</blockquote></div>