[Intel-gfx] [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules
Christian König
christian.koenig at amd.com
Sat May 22 08:30:19 UTC 2021
On 21.05.21 at 20:31, Daniel Vetter wrote:
> [SNIP]
>> We could provide an IOCTL for the BO to change the flag.
> That's not the semantics we need.
>
>> But could we first figure out the semantics we want to use here?
>>
>> Cause I'm pretty sure we don't actually need those changes at all and as
>> said before I'm certainly NAKing things which break existing use cases.
> Please read how other drivers do this and at least _try_ to understand
> it. I'm really losing my patience here with you NAKing patches you're
> not even understanding (or did you actually read and fully understand
> the entire story I typed up here, and your NAK is on the entire
> thing?). There's not much useful conversation to be had with that
> approach. And with drivers I mean kernel + userspace here.
Well to be honest I did fully read that, but I was just too emotionally
attached to answer more appropriately in that moment.
And I'm sorry that I reacted emotionally to that, but it is really
frustrating that I'm not able to convince you that we have a major
problem which affects all drivers and not just amdgpu.
Regarding the reason why I'm NAKing this particular patch: you are
breaking existing uAPI for RADV with that. And as a maintainer of the
driver I simply have no other choice than to say halt, stop, we can't do
it like this.
I'm perfectly aware that I have some holes in my understanding of how ANV
or other Vulkan/OpenGL stacks work. But you should probably also admit
that you have some holes in how amdgpu works, or otherwise I can't imagine
why you would suggest a patch which simply breaks RADV.
I mean we have been working together for years now and I think you know me
pretty well. Do you really think I would scream bloody hell that we can't
do this without a good reason?
So let's stop throwing half-baked solutions at each other and discuss
what we can do to solve the different problems we are both seeing here.
> That's the other frustration part: You're trying to fix this purely in
> the kernel. This is exactly one of these issues why we require open
> source userspace, so that we can fix the issues correctly across the
> entire stack. And meanwhile you're steadfastly refusing to even look
> at the userspace side of the picture.
Well I do fully understand the userspace side of the picture for the AMD
stack. I just don't think we should give userspace that much control
over the fences in the dma_resv object without untangling them from
resource management.
And RADV is exercising exclusive sync for amdgpu already. You can do
submissions to the GFX, Compute and SDMA queues in Vulkan and those
currently won't over-synchronize.
When you then send a texture generated by multiple engines to the
compositor, the kernel correctly inserts waits for all submissions
of the other process.
So this already works for RADV and completely without the IOCTL Jason
proposed. IIRC we also have unit tests which exercised that feature for
the video decoding use case long before RADV even existed.
And yes I have to admit that I hadn't thought about interaction with
other drivers when I came up with this, because the rules of that
interaction weren't clear to me at that time.
> Also I thought through your tlb issue, why are you even putting these
> tlb flush fences into the shared dma_resv slots? If you store them
> somewhere else in the amdgpu private part, the oversync issue goes
> away
> - in your ttm bo move callback, you can just make your bo copy job
> depend on them too (you have to anyway)
> - even for p2p there's not an issue here, because you have the
> ->move_notify callback, and can then lift the tlb flush fences from
> your private place to the shared slots so the exporter can see them.
Because adding a shared fence requires that this shared fence signals
after the exclusive fence. And this is a perfect example to explain why
this is so problematic and also why we currently stumble over that
only in amdgpu.
In TTM we have a feature which allows evictions to be pipelined, so that
we don't wait for the evicting DMA operation. Without that, drivers will
stall waiting for their allocations to finish when we need to allocate
memory.
For certain use cases this gives you a ~20% fps increase under memory
pressure, so it is a really important feature.
This works by adding the fence of the last eviction DMA operation to BOs
when their backing store is newly allocated. That's what the
ttm_bo_add_move_fence() function you stumbled over is good for:
https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692
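To make the pipelining idea concrete, here is a minimal user-space sketch. It is a toy model and not the TTM code: all pipe_* names are hypothetical, and a fence is reduced to a simple "signaled" flag. It only shows the principle described above, that the allocation path does not block on the eviction copy and instead hands the pending fence to the newly allocated BO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy fence: just a flag. Hypothetical, not the kernel's struct dma_fence. */
struct pipe_fence {
	bool signaled;
};

/* Toy memory manager: remembers the fence of the last eviction copy. */
struct pipe_manager {
	struct pipe_fence *last_evict_fence;
};

/* Toy buffer object: "moving" must signal before the memory is usable. */
struct pipe_bo {
	struct pipe_fence *moving;
};

/* Pipelined eviction: record the copy fence and return without waiting. */
static void pipe_evict(struct pipe_manager *man, struct pipe_fence *copy)
{
	assert(copy != NULL);
	man->last_evict_fence = copy;
}

/* Allocation does not stall; the new BO inherits the pending fence. */
static void pipe_alloc(struct pipe_manager *man, struct pipe_bo *bo)
{
	bo->moving = man->last_evict_fence;
}

/* The backing store may only be touched once the inherited fence signaled. */
static bool pipe_bo_ready(const struct pipe_bo *bo)
{
	return bo->moving == NULL || bo->moving->signaled;
}
```

In this model a BO allocated right after an eviction reports "not ready" until the eviction copy fence signals, while the allocation call itself returns immediately.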
Now the problem is that it is possible that the application is terminated
before it can complete its command submission. But since resource
management only waits for the shared fences when there are some, there is
a chance that we free up memory while it is still in use.
Because of this we have some rather crude workarounds in amdgpu. For
example, IIRC we manually wait for any potential exclusive fence before
freeing memory.
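As a rough illustration of that hazard, the following user-space sketch models a reservation object with one exclusive slot and a few shared slots. It is a toy under stated assumptions: the toy_* names are hypothetical, this is not the dma_resv implementation, and the scenario (a shared fence that signals before a still-pending exclusive fence) is just one reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SHARED 4

/* Toy fence: just a flag. Hypothetical, not the kernel's struct dma_fence. */
struct toy_fence {
	bool signaled;
};

/* Toy reservation object: one exclusive slot plus a few shared slots. */
struct toy_resv {
	struct toy_fence *exclusive;
	struct toy_fence *shared[TOY_MAX_SHARED];
	size_t shared_count;
};

static bool toy_fence_done(const struct toy_fence *f)
{
	return f == NULL || f->signaled;
}

static void toy_resv_add_shared(struct toy_resv *r, struct toy_fence *f)
{
	assert(r->shared_count < TOY_MAX_SHARED);
	r->shared[r->shared_count++] = f;
}

/*
 * Idle check that looks only at the shared fences when there are some.
 * This is only safe if every shared fence signals after the exclusive one.
 */
static bool toy_resv_idle_shared_only(const struct toy_resv *r)
{
	if (r->shared_count == 0)
		return toy_fence_done(r->exclusive);
	for (size_t i = 0; i < r->shared_count; i++)
		if (!toy_fence_done(r->shared[i]))
			return false;
	return true;
}

/* Workaround: always check the exclusive fence as well before freeing. */
static bool toy_resv_idle_with_workaround(const struct toy_resv *r)
{
	return toy_fence_done(r->exclusive) && toy_resv_idle_shared_only(r);
}
```

With a pending exclusive fence and an already signaled shared fence, toy_resv_idle_shared_only() reports the object as idle although work is still outstanding, while toy_resv_idle_with_workaround() does not.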
We could enable this feature for radeon and nouveau as well with a one
line change. But that would mean we need to maintain the workarounds for
shortcomings of the dma_resv object design in those drivers as well.
To summarize I think that adding an unbound fence to protect an object
is a perfectly valid operation for resource management, but this is
restricted by the needs of implicit sync at the moment.
> The kernel move fences otoh are a bit more nasty to wring through the
> p2p dma-buf interface. That one probably needs something new.
Well the p2p interface is my least concern.
Adding the move fence means that you need to touch every place we do CS
or page flip since you now have something which is parallel to the
explicit sync fence.
Otherwise having the move fence separately wouldn't make much sense in
the first place if we always set it together with the exclusive fence.
Best regards and sorry for getting on your nerves so much,
Christian.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> -Daniel
>>>
>>>
>>>
>>>>> Are you bored enough to type this up for radv? I'll give Jason's kernel
>>>>> stuff another review meanwhile.
>>>>> -Daniel
>>>>>
>>>>>>> e->bo_va = amdgpu_vm_bo_find(vm, bo);
>>>>>>> }
>>>>>>> --
>>>>>>> 2.31.0
>>>>>>>
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch/
>