[Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Fri Jun 10 08:18:14 UTC 2022
On 10/06/2022 10:54, Niranjana Vishwanathapura wrote:
> On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:
>> On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
>>> On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:
>>>> On 09/06/2022 00:55, Jason Ekstrand wrote:
>>>>
>>>> On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura
>>>> <niranjana.vishwanathapura at intel.com> wrote:
>>>>
>>>> On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>>>> >
>>>> >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>>> >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>>>> >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>> >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
>>>> >>>> <niranjana.vishwanathapura at intel.com> wrote:
>>>> >>>>
>>>> >>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>> >>>> > On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>> >>>> >
>>>> >>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
>>>> >>>> > <niranjana.vishwanathapura at intel.com> wrote:
>>>> >>>> >
>>>> >>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>> >>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>>>> >>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>> >>>> > >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
>>>> >>>> > >> > +async worker. The binding and unbinding will work like a special GPU engine.
>>>> >>>> > >> > +The binding and unbinding operations are serialized and will wait on specified
>>>> >>>> > >> > +input fences before the operation and will signal the output fences upon the
>>>> >>>> > >> > +completion of the operation. Due to serialization, completion of an operation
>>>> >>>> > >> > +will also indicate that all previous operations are also complete.
>>>> >>>> > >>
>>>> >>>> > >> I guess we should avoid saying "will immediately start binding/unbinding" if
>>>> >>>> > >> there are fences involved.
>>>> >>>> > >>
>>>> >>>> > >> And the fact that it's happening in an async worker seems to imply it's not
>>>> >>>> > >> immediate.
>>>> >>>> > >>
>>>> >>>> >
>>>> >>>> > Ok, will fix.
>>>> >>>> > This was added because in the earlier design binding was deferred until the
>>>> >>>> > next execbuff. But now it is non-deferred (immediate in that sense). But yeah,
>>>> >>>> > this is confusing and I will fix it.
>>>> >>>> >
>>>> >>>> > >>
>>>> >>>> > >> I have a question on the behavior of the bind operation when no input fence
>>>> >>>> > >> is provided. Let's say I do:
>>>> >>>> > >>
>>>> >>>> > >> VM_BIND (out_fence=fence1)
>>>> >>>> > >> VM_BIND (out_fence=fence2)
>>>> >>>> > >> VM_BIND (out_fence=fence3)
>>>> >>>> > >>
>>>> >>>> > >> In what order are the fences going to be signaled?
>>>> >>>> > >> In the order of VM_BIND ioctls? Or out of order?
>>>> >>>> > >> Because you wrote "serialized" I assume it's in order.
>>>> >>>> > >>
>>>> >>>> >
>>>> >>>> > Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind
>>>> >>>> > will use the same queue and hence are ordered.
>>>> >>>> >
>>>> >>>> > >>
>>>> >>>> > >> One thing I didn't realize is that because we only get one "VM_BIND" engine,
>>>> >>>> > >> there is a disconnect from the Vulkan specification.
>>>> >>>> > >>
>>>> >>>> > >> In Vulkan, VM_BIND operations are serialized but per engine.
>>>> >>>> > >>
>>>> >>>> > >> So you could have something like this:
>>>> >>>> > >>
>>>> >>>> > >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
>>>> >>>> > >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
>>>> >>>> > >>
>>>> >>>> > >> fence1 is not signaled
>>>> >>>> > >> fence3 is signaled
>>>> >>>> > >>
>>>> >>>> > >> So the second VM_BIND will proceed before the first VM_BIND.
>>>> >>>> > >>
>>>> >>>> > >> I guess we can deal with that scenario in userspace by doing the wait
>>>> >>>> > >> ourselves in one thread per engine.
>>>> >>>> > >>
>>>> >>>> > >> But then it makes the VM_BIND input fences useless.
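As a rough illustration of the userspace workaround mentioned above (one wait-thread per engine that blocks on the in-fence itself and then issues an unfenced bind), here is a minimal sketch. The bind_queue_pop() helper and the vm_bind_ioctl() wrapper are hypothetical placeholders, not proposed uapi; only the poll() on a sync_file fd is standard:

#include <poll.h>

struct bind_op {                        /* hypothetical bookkeeping, not uapi */
	int in_fence_fd;                /* sync_file fd to wait on, or -1 */
	/* ... vm_id, GPU VA, BO handle, length, ... */
};

/* Hypothetical per-engine FIFO and ioctl wrapper. */
extern struct bind_op *bind_queue_pop(void *queue);
extern void vm_bind_ioctl(struct bind_op *op);

/* Run with pthread_create(), one thread per engine. */
static void *engine_bind_thread(void *queue)
{
	for (;;) {
		struct bind_op *op = bind_queue_pop(queue);

		if (op->in_fence_fd >= 0) {
			struct pollfd pfd = {
				.fd = op->in_fence_fd,
				.events = POLLIN,
			};
			/* Block until the dma-fence behind the sync_file signals. */
			poll(&pfd, 1, -1);
		}
		/* The dependency is met; issue the bind without an in-fence. */
		vm_bind_ioctl(op);
	}
	return NULL;
}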
>>>> >>>> > >>
>>>> >>>> > >> Daniel: what do you think? Should we rework this or just deal with wait
>>>> >>>> > >> fences in userspace?
>>>> >>>> > >>
>>>> >>>> > >
>>>> >>>> > >My opinion is to rework this but make the ordering via an engine param
>>>> >>>> > >optional.
>>>> >>>> > >
>>>> >>>> > >e.g. A VM can be configured so all binds are ordered within the VM.
>>>> >>>> > >
>>>> >>>> > >e.g. A VM can be configured so all binds accept an engine argument (in
>>>> >>>> > >the case of the i915 likely this is a gem context handle) and binds are
>>>> >>>> > >ordered with respect to that engine.
>>>> >>>> > >
>>>> >>>> > >This gives UMDs options, as the latter likely consumes more KMD resources,
>>>> >>>> > >so if a different UMD can live with binds being ordered within the VM it
>>>> >>>> > >can use the mode consuming fewer resources.
>>>> >>>> > >
>>>> >>>> >
>>>> >>>> > I think we need to be careful here if we are looking for some out of
>>>> >>>> > (submission) order completion of vm_bind/unbind.
>>>> >>>> > In-order completion means that, in a batch of binds and unbinds to be
>>>> >>>> > completed in-order, the user only needs to specify the in-fence for the
>>>> >>>> > first bind/unbind call and the out-fence for the last bind/unbind call.
>>>> >>>> > Also, the VA released by an unbind call can be re-used by any subsequent
>>>> >>>> > bind call in that in-order batch.
>>>> >>>> >
>>>> >>>> > These things will break if binding/unbinding were to be allowed to go out
>>>> >>>> > of order (of submission), and the user needs to be extra careful not to
>>>> >>>> > run into premature triggering of the out-fence, binds failing because the
>>>> >>>> > VA is still in use, etc.
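To make the in-order batch semantics above concrete, here is a small sketch: only the first operation of the batch carries an input fence, only the last one needs an output fence, and a VA released by the unbind can be reused by a later bind in the same batch. The struct and wrapper names are placeholders, not the actual uapi:

#include <linux/types.h>

struct vm_bind_op {                     /* placeholder, not the real struct */
	__u64 va, length;
	__u32 bo_handle;
	__s32 in_fence_fd;              /* -1 == no input fence */
	__s32 out_fence_fd;             /* filled in by the kernel if requested */
};

/* Placeholder wrappers around the VM_BIND/UNBIND ioctls. */
extern void vm_bind(int fd, __u32 vm_id, struct vm_bind_op *op);
extern void vm_unbind(int fd, __u32 vm_id, struct vm_bind_op *op);

void bind_batch_in_order(int fd, __u32 vm_id, int batch_in_fence)
{
	struct vm_bind_op unbind = { .in_fence_fd = batch_in_fence };
	struct vm_bind_op bind_a = { .in_fence_fd = -1 };  /* may reuse the VA above */
	struct vm_bind_op bind_b = { .in_fence_fd = -1 };

	vm_unbind(fd, vm_id, &unbind);  /* only the first op waits on a fence */
	vm_bind(fd, vm_id, &bind_a);
	vm_bind(fd, vm_id, &bind_b);    /* only the last op needs an out-fence;
					 * it signals once all three are done */
}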
>>>> >>>> >
>>>> >>>> > Also, VM_BIND binds the provided mapping on the specified address space
>>>> >>>> > (VM). So, the uapi is not engine/context specific.
>>>> >>>> >
>>>> >>>> > We can however add a 'queue' to the uapi, which can be one from the
>>>> >>>> > pre-defined queues,
>>>> >>>> > I915_VM_BIND_QUEUE_0
>>>> >>>> > I915_VM_BIND_QUEUE_1
>>>> >>>> > ...
>>>> >>>> > I915_VM_BIND_QUEUE_(N-1)
>>>> >>>> >
>>>> >>>> > KMD will spawn an async work queue for each queue, which will only bind
>>>> >>>> > the mappings on that queue in the order of submission.
>>>> >>>> > The user can assign a queue to an engine or anything like that.
>>>> >>>> >
>>>> >>>> > But again here, the user needs to be careful not to deadlock these
>>>> >>>> > queues with a circular dependency of fences.
>>>> >>>> >
>>>> >>>> > I prefer adding this later as an extension, based on whether it really
>>>> >>>> > helps with the implementation.
>>>> >>>> >
>>>> >>>> > I can tell you right now that having everything on a single in-order
>>>> >>>> > queue will not get us the perf we want. What Vulkan really wants is one
>>>> >>>> > of two things:
>>>> >>>> > 1. No implicit ordering of VM_BIND ops. They just happen in whatever
>>>> >>>> > order their dependencies are resolved, and we ensure ordering ourselves
>>>> >>>> > by having a syncobj in the VkQueue.
>>>> >>>> > 2. The ability to create multiple VM_BIND queues. We need at least 2,
>>>> >>>> > but I don't see why there needs to be a limit besides the limits the
>>>> >>>> > i915 API already has on the number of engines. Vulkan could expose
>>>> >>>> > multiple sparse binding queues to the client if it's not arbitrarily
>>>> >>>> > limited.
>>>> >>>>
>>>> >>>> Thanks Jason, Lionel.
>>>> >>>>
>>>> >>>> Jason, what are you referring to when you say "limits the i915 API already
>>>> >>>> has on the number of engines"? I am not sure if there is such an uapi today.
>>>> >>>>
>>>> >>>> There's a limit of something like 64 total engines today based on the
>>>> >>>> number of bits we can cram into the exec flags in execbuffer2. I think
>>>> >>>> someone had an extended version that allowed more but I ripped it out
>>>> >>>> because no one was using it. Of course, execbuffer3 might not have that
>>>> >>>> problem at all.
>>>> >>>>
>>>> >>>
>>>> >>>Thanks Jason.
>>>> >>>Ok, I am not sure which exec flag that is, but yeah, execbuffer3 probably
>>>> >>>will not have this limitation. So, we need to define a VM_BIND_MAX_QUEUE
>>>> >>>and somehow export it to the user (I am thinking of embedding it in
>>>> >>>I915_PARAM_HAS_VM_BIND: bits[0]->HAS_VM_BIND, bits[1-3]->'n', meaning 2^n
>>>> >>>queues).
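If the param ends up carrying the queue count like this, userspace could decode it roughly as below. This is only a sketch of the encoding floated in this paragraph (bit 0 = has VM_BIND, bits 1-3 = 'n' with 2^n queues); I915_PARAM_HAS_VM_BIND is the param proposed in this RFC, so its value would come from the RFC headers:

#include <xf86drm.h>
#include <i915_drm.h>

static unsigned int query_vm_bind_queues(int fd)
{
	int value = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_VM_BIND,   /* proposed param, from the RFC headers */
		.value = &value,
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) || !(value & 0x1))
		return 0;                          /* VM_BIND not supported */

	/* bits[1:3] encode 'n', meaning 2^n queues, per the proposal above. */
	return 1u << ((value >> 1) & 0x7);
}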
>>>> >>
>>>> >>Ah, I think you are talking about I915_EXEC_RING_MASK (0x3f), which
>>>> >>execbuf3 will also have.
>>>>
>>>> Yup! That's exactly the limit I was talking about.
>>>>
>>>> >>So, we can simply define in the vm_bind/unbind structures,
>>>> >>
>>>> >>#define I915_VM_BIND_MAX_QUEUE 64
>>>> >>        __u32 queue;
>>>> >>
>>>> >>I think that will keep things simple.
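For illustration, a bind struct carrying such a bounded queue index might look roughly like this; apart from the __u32 queue field and the limit quoted above, every field and the struct name are assumptions, and the real layout is whatever the RFC patches define:

#include <linux/types.h>

#define I915_VM_BIND_MAX_QUEUE 64

struct sketch_i915_gem_vm_bind {      /* placeholder name */
	__u32 vm_id;                  /* address space to bind into */
	__u32 queue;                  /* 0 .. I915_VM_BIND_MAX_QUEUE-1 */
	__u32 handle;                 /* GEM BO to map */
	__u32 pad;
	__u64 start;                  /* GPU virtual address */
	__u64 offset;                 /* offset into the BO */
	__u64 length;                 /* size of the mapping */
	__u64 flags;
	__u64 extensions;             /* in/out fence extensions, etc. */
};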
>>>> >
>>>> >Hmmm? What does the execbuf2 limit have to do with how many engines the
>>>> >hardware can have? I suggest not doing that.
>>>> >
>>>> >The change which added this:
>>>> >
>>>> >	if (set.num_engines > I915_EXEC_RING_MASK + 1)
>>>> >		return -EINVAL;
>>>> >
>>>> >to context creation needs to be undone, so let users create engine maps
>>>> >with all hardware engines, and let execbuf3 access them all.
>>>> >
>>>>
>>>> The earlier plan was to carry I915_EXEC_RING_MAP (0x3f) to execbuff3 also.
>>>> Hence, I was using the same limit for VM_BIND queues (64, or 65 if we make
>>>> it N+1). But, as discussed in another thread of this RFC series, we are
>>>> planning to drop I915_EXEC_RING_MAP in execbuff3. So, there won't be any
>>>> uapi that limits the number of engines (and hence the number of vm_bind
>>>> queues that need to be supported).
>>>>
>>>> If we leave the number of vm_bind queues arbitrarily large (__u32 queue_idx),
>>>> then we need a hashmap for queue (a wq, work_item and a linked list) lookup
>>>> from the user specified queue index.
>>>> The other option is to just put some hard limit (say 64 or 65) and use an
>>>> array of queues in the VM (each created upon first use). I prefer this.
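A minimal kernel-side sketch of the preferred option (hard limit plus a per-VM array of queues created on first use), assuming each queue is backed by an ordered workqueue so that binds submitted to it complete in submission order; all names here are illustrative, not the actual implementation:

#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/err.h>

#define VM_BIND_MAX_QUEUE 64

struct sketch_vm {
	struct workqueue_struct *bind_wq[VM_BIND_MAX_QUEUE];
	struct mutex bind_wq_lock;
	/* ... the rest of the address space state ... */
};

static struct workqueue_struct *
vm_get_bind_queue(struct sketch_vm *vm, u32 queue_idx)
{
	struct workqueue_struct *wq;

	if (queue_idx >= VM_BIND_MAX_QUEUE)
		return ERR_PTR(-EINVAL);

	mutex_lock(&vm->bind_wq_lock);
	wq = vm->bind_wq[queue_idx];
	if (!wq) {
		/* An ordered workqueue executes one work item at a time,
		 * in the order the items were queued. */
		wq = alloc_ordered_workqueue("vm_bind_q%u", 0, queue_idx);
		vm->bind_wq[queue_idx] = wq;
	}
	mutex_unlock(&vm->bind_wq_lock);

	return wq ?: ERR_PTR(-ENOMEM);
}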
>>>>
>>>> I don't get why a VM_BIND queue is any different from any other queue or
>>>> userspace-visible kernel object. But I'll leave those details up to danvet
>>>> or whoever else might be reviewing the implementation.
>>>> --Jason
>>>>
>>>> I kind of agree here. Wouldn't it be simpler to have the bind queue created
>>>> like the others when we build the engine map?
>>>>
>>>> For userspace it's then just a matter of selecting the right queue ID when
>>>> submitting.
>>>>
>>>> If there is ever a possibility to have this work on the GPU, it would all
>>>> be ready.
>>>>
>>>
>>> I did sync offline with Matt Brost on this.
>>> We can add a VM_BIND engine class and let the user create VM_BIND engines
>>> (queues). The problem is that in i915 the engine creation interface is bound
>>> to gem_context. So, in the vm_bind ioctl, we would need both a context_id and
>>> a queue_idx for proper lookup of the user created engine. This is a bit
>>> awkward, as vm_bind is an interface to the VM (address space) and has nothing
>>> to do with gem_context.
>>
>>
>> A gem_context has a single vm object, right?
>>
>> Set through I915_CONTEXT_PARAM_VM at creation, or given a default one if not.
>>
>> So it's just like picking up the vm like it's done at execbuffer time right
>> now: eb->context->vm
>>
>
> Are you suggesting replacing 'vm_id' with 'context_id' in the VM_BIND/UNBIND
> ioctl and probably calling it CONTEXT_BIND/UNBIND, because the VM can be
> obtained from the context?

Yes, because if we go for engines, they're associated with a context and so
also associated with the VM bound to the context.

> I think the interface is clean as an interface to the VM. It is only that we
> don't have a clean way to create a raw VM_BIND engine (not associated with
> any context) with the i915 uapi.
> Maybe we can add such an interface, but I don't think that is worth it (we
> might as well just use a queue_idx in the VM_BIND/UNBIND ioctl as I mentioned
> above).
> Does anyone have any thoughts?
>
>>
>>> Another problem is that, if two VMs are binding with the same defined
>>> engine, binding on VM1 can get unnecessarily blocked by binding on VM2
>>> (which may be waiting on its in_fence).
>>
>>
>> Maybe I'm missing something, but how can you have 2 vm objects with a
>> single gem_context right now?
>>
>
> No, we don't have 2 VMs for a gem_context.
> Say ctx1 is with vm1 and ctx2 is with vm2.
> The first vm_bind call was for vm1 with q_idx 1 in the ctx1 engine map.
> The second vm_bind call was for vm2 with q_idx 2 in the ctx2 engine map. If
> those two queue indices point to the same underlying vm_bind engine, then the
> second vm_bind call gets blocked until the first vm_bind call's 'in' fence is
> triggered and the bind completes.
>
> With per VM queues, this is not a problem as two VMs will not end up sharing
> the same queue.
>
> BTW, I just posted an updated PATCH series.
> https://www.spinics.net/lists/dri-devel/msg350483.html
>
> Niranjana
>
>>
>>>
>>> So, my preference here is to just add a 'u32 queue' index in the
>>> vm_bind/unbind ioctl, and the queues are per VM.
>>>
>>> Niranjana
>>>
>>>> Thanks,
>>>>
>>>> -Lionel
>>>>
>>>>
>>>> Niranjana
>>>>
>>>> >Regards,
>>>> >
>>>> >Tvrtko
>>>> >
>>>> >>
>>>> >>Niranjana
>>>> >>
>>>> >>>
>>>> >>>> I am trying to see how many queues we need and don't want it to be
>>>> >>>> arbitrarily large and unduly blow up memory usage and complexity in the
>>>> >>>> i915 driver.
>>>> >>>>
>>>> >>>> I expect a Vulkan driver to use at most 2 in the vast majority of cases. I
>>>> >>>> could imagine a client wanting to create more than 1 sparse queue in which
>>>> >>>> case, it'll be N+1, but that's unlikely. As far as complexity goes, once
>>>> >>>> you allow two, I don't think the complexity is going up by allowing N. As
>>>> >>>> for memory usage, creating more queues means more memory. That's a
>>>> >>>> trade-off that userspace can make. Again, the expected number here is 1
>>>> >>>> or 2 in the vast majority of cases, so I don't think you need to worry.
>>>> >>>
>>>> >>>Ok, will start with n=3, meaning 8 queues.
>>>> >>>That would require us to create 8 workqueues.
>>>> >>>We can change 'n' later if required.
>>>> >>>
>>>> >>>Niranjana
>>>> >>>
>>>> >>>>
>>>> >>>> > Why? Because Vulkan has two basic kinds of bind operations and we
>>>> >>>> > don't want any dependencies between them:
>>>> >>>> > 1. Immediate. These happen right after BO creation or maybe as part of
>>>> >>>> > vkBindImageMemory() or VkBindBufferMemory(). These don't happen on a
>>>> >>>> > queue and we don't want them serialized with anything. To synchronize
>>>> >>>> > with submit, we'll have a syncobj in the VkDevice which is signaled by
>>>> >>>> > all immediate bind operations and make submits wait on it.
>>>> >>>> > 2. Queued (sparse): These happen on a VkQueue which may be the same as
>>>> >>>> > a render/compute queue or may be its own queue. It's up to us what we
>>>> >>>> > want to advertise. From the Vulkan API PoV, this is like any other
>>>> >>>> > queue. Operations on it wait on and signal semaphores. If we have a
>>>> >>>> > VM_BIND engine, we'd provide syncobjs to wait and signal just like we
>>>> >>>> > do in execbuf().
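A rough sketch of the immediate-bind flow in (1): every immediate bind signals the next point on a per-device timeline syncobj, and submissions wait on the latest point. The vm_bind_immediate() wrapper is hypothetical, and the wait is shown as a CPU-side drmSyncobjTimelineWait() only for brevity; in practice the syncobj would be handed to the submission as an in-fence instead:

#include <stdint.h>
#include <xf86drm.h>

struct sketch_device {
	int drm_fd;
	uint32_t bind_syncobj;   /* signaled by every immediate bind */
	uint64_t bind_timeline;  /* last timeline point binds were asked to signal */
};

/* Hypothetical: immediate bind that signals <syncobj, point> on completion. */
extern void vm_bind_immediate(struct sketch_device *dev,
			      uint32_t syncobj, uint64_t point);

static void device_init(struct sketch_device *dev)
{
	drmSyncobjCreate(dev->drm_fd, 0, &dev->bind_syncobj);
}

static void bind_immediate(struct sketch_device *dev /*, bo, va, ... */)
{
	uint64_t point = ++dev->bind_timeline;

	vm_bind_immediate(dev, dev->bind_syncobj, point);
}

static void queue_submit(struct sketch_device *dev /*, cmd buffers, ... */)
{
	/* Wait for all immediate binds issued so far before executing. */
	if (dev->bind_timeline)
		drmSyncobjTimelineWait(dev->drm_fd, &dev->bind_syncobj,
				       &dev->bind_timeline, 1, INT64_MAX,
				       DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
	/* ... then do the actual execbuf/submit ... */
}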
>>>> >>>> > The important thing is that we don't want one type of operation to
>>>> >>>> > block on the other. If immediate binds are blocking on sparse binds,
>>>> >>>> > it's going to cause over-synchronization issues.
>>>> >>>> > In terms of the internal implementation, I know that there's going to
>>>> >>>> > be a lock on the VM and that we can't actually do these things in
>>>> >>>> > parallel. That's fine. Once the dma_fences have signaled and we're
>>>> >>>
>>>> >>>> That's correct. It is like a single VM_BIND engine with multiple queues
>>>> >>>> feeding to it.
>>>> >>>>
>>>> >>>> Right. As long as the queues themselves are independent and can block on
>>>> >>>> dma_fences without holding up other queues, I think we're fine.
>>>> >>>>
>>>> >>>> > unblocked to do the bind operation, I don't care if there's a bit of
>>>> >>>> > synchronization due to locking. That's expected. What we can't afford
>>>> >>>> > to have is an immediate bind operation suddenly blocking on a sparse
>>>> >>>> > operation which is blocked on a compute job that's going to run for
>>>> >>>> > another 5ms.
>>>> >>>>
>>>> >>>> As the VM_BIND queue is per VM, VM_BIND on one VM doesn't block the
>>>> >>>> VM_BIND on other VMs. I am not sure about the use cases here, but just
>>>> >>>> wanted to clarify.
>>>> >>>>
>>>> >>>> Yes, that's what I would expect.
>>>> >>>> --Jason
>>>> >>>>
>>>> >>>> Niranjana
>>>> >>>>
>>>> >>>> > For reference, Windows solves this by allowing arbitrarily many paging
>>>> >>>> > queues (what they call a VM_BIND engine/queue). That design works
>>>> >>>> > pretty well and solves the problems in question. Again, we could just
>>>> >>>> > make everything out-of-order and require using syncobjs to order things
>>>> >>>> > as userspace wants. That'd be fine too.
>>>> >>>> > One more note while I'm here: danvet said something on IRC about
>>>> >>>> > VM_BIND queues waiting for syncobjs to materialize. We don't really
>>>> >>>> > want/need this. We already have all the machinery in userspace to
>>>> >>>> > handle wait-before-signal and waiting for syncobj fences to materialize,
>>>> >>>> > and that machinery is on by default. It would actually take MORE work
>>>> >>>> > in Mesa to turn it off and take advantage of the kernel being able to
>>>> >>>> > wait for syncobjs to materialize. Also, getting that right is
>>>> >>>> > ridiculously hard and I really don't want to get it wrong in kernel
>>>> >>>> > space. When we do memory fences, wait-before-signal will be a thing. We
>>>> >>>> > don't need to try and make it a thing for syncobj.
>>>> >>>> > --Jason
>>>> >>>> >
>>>> >>>> > Thanks Jason,
>>>> >>>> >
>>>> >>>> > I missed the bit in the Vulkan spec that we're allowed to have a sparse
>>>> >>>> > queue that does not implement either graphics or compute operations:
>>>> >>>> >
>>>> >>>> > "While some implementations may include VK_QUEUE_SPARSE_BINDING_BIT
>>>> >>>> > support in queue families that also include graphics and compute
>>>> >>>> > support, other implementations may only expose a
>>>> >>>> > VK_QUEUE_SPARSE_BINDING_BIT-only queue family."
>>>> >>>> >
>>>> >>>> > So it can all be a vm_bind engine that just does bind/unbind
>>>> >>>> > operations.
>>>> >>>> >
>>>> >>>> > But yes, we need another engine for the immediate/non-sparse
>>>> >>>> > operations.
>>>> >>>> >
>>>> >>>> > -Lionel
>>>> >>>> >
>>>> >>>> > >
>>>> >>>> > Daniel, any thoughts?
>>>> >>>> >
>>>> >>>> > Niranjana
>>>> >>>> >
>>>> >>>> > >Matt
>>>> >>>> > >
>>>> >>>> > >>
>>>> >>>> > >> Sorry I noticed this late.
>>>> >>>> > >>
>>>> >>>> > >>
>>>> >>>> > >> -Lionel
>>>> >>>> > >>
>>>> >>>> > >>
>>
>>