[Mesa-dev] GBM and the Device Memory Allocator Proposals

Wed Dec 6 10:38:28 UTC 2017

On 06.12.2017 08:01, James Jones wrote:
> On 12/01/2017 10:34 AM, Nicolai Hähnle wrote:
>> On 01.12.2017 18:09, Nicolai Hähnle wrote:
>> [snip]
>>>>> As for the actual transition API, I accept that some metadata may be
>>>>> required, and the metadata probably needs to depend on the memory
>>>>> layout, which is often vendor-specific. But even linear layouts need
>>>>> some transitions for caches. We probably need at least some generic
>>>>> "off-device usage" bit.
>>>>
>>>> I've started thinking of cached as a capability with a transition. I
>>>> think that helps.  Maybe it needs to somehow be more specific (i.e. if
>>>> you have two devices, both with their own cache and no coherency
>>>> between the two).
>>>
>>> As I wrote above, I'd prefer not to think of "cached" as a capability 
>>> at least for radeonsi.
>>>
>>> From the desktop perspective, I would say let's ignore caches, the 
>>> drivers know which caches they need to flush to make data visible to 
>>> other devices on the system.
>>>
>>> On the other hand, there are probably SoC cases where non-coherent 
>>> caches are shared between some but not all devices, and in that case 
>>> perhaps we do need to communicate this.
>>>
>>> So perhaps we should have two kinds of "capabilities".
>>>
>>> The first, like framebuffer compression, is a capability of the 
>>> allocated memory layout (because the compression requires a meta 
>>> surface), and devices that expose it may opportunistically use it.
>>>
>>> The second, like caches, is a capability that the device/driver will 
>>> use and you don't get a say in it, but other devices/drivers also 
>>> don't need to be aware of them.
>>>
>>> So then you could theoretically have a system that gives you:
>>>
>>> GPU:     FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
>>> Display: FOO/tiled(layout-caps=FOO/cc)
>>> Video:   FOO/tiled(dev-caps=FOO/vid-cache)
>>> Camera:  FOO/tiled(dev-caps=FOO/vid-cache)
>> [snip]
>>
>> FWIW, I think all that stuff about different caches quite likely 
>> over-complicates things. At the end of each "command submission" of 
>> whichever type of engine, the buffer must be in a state where the 
>> kernel is free to move it around for memory management purposes. This 
>> already puts a big constraint on the kind of (non-coherent) caches 
>> that can be supported anyway, so I wouldn't be surprised if we could 
>> get away with a *much* simpler approach.
> 
> I'd rather not depend on this type of cleverness if possible.  Other 
> kernels/OSes may not behave this way, and I'd like the allocator 
> mechanism to be something we can use across all or at least most of the 
> POSIX and POSIX-like OSes we support.  Also, this particular example is 
> not true of our proprietary Linux driver, and I suspect it won't always 
> be the case for other drivers.  If a particular driver or OS fits this 
> assumption, the driver is always free to return no-op transitions in 
> that case.

Agreed.

(What I wrote about memory management should be true for all systems, 
but the kernel could use an engine that goes through the relevant caches 
for memory management-related buffer moves. It just so happens that it 
doesn't do that on our hardware, but that's by no means universal.)
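
To make the two kinds of capabilities from the quoted discussion concrete,
here is a rough sketch in Python. It is not a proposed API: the names
Device, shared_layout_caps and transition are hypothetical, and the rule
"flush every device cap the destination does not share" is an assumption
made only for illustration. The point is that a driver may return an
empty (no-op) transition whenever nothing needs to be done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical model of one device's capability sets."""
    name: str
    # Properties of the allocated memory layout (e.g. compression);
    # every device touching the buffer has to understand them.
    layout_caps: frozenset = frozenset()
    # Device-private properties (e.g. caches); other devices do not
    # need to know about them unless they happen to share them.
    dev_caps: frozenset = frozenset()


def shared_layout_caps(devices):
    """Layout caps that all given devices expose (set intersection)."""
    devices = list(devices)
    if not devices:
        return frozenset()
    common = devices[0].layout_caps
    for dev in devices[1:]:
        common &= dev.layout_caps
    return common


def transition(src, dst, active_layout_caps):
    """Operations needed before handing a buffer from src to dst.

    Assumption: layout caps in use that dst does not expose must be
    resolved, and src's device caps that dst does not share must be
    flushed. An empty list is a no-op transition.
    """
    ops = [f"resolve {cap}"
           for cap in sorted(active_layout_caps - dst.layout_caps)]
    ops += [f"flush {cap}" for cap in sorted(src.dev_caps - dst.dev_caps)]
    return ops


# The example system from the quoted mail.
gpu = Device("GPU", frozenset({"FOO/cc"}), frozenset({"FOO/gpu-cache"}))
display = Device("Display", frozenset({"FOO/cc"}))
video = Device("Video", dev_caps=frozenset({"FOO/vid-cache"}))
camera = Device("Camera", dev_caps=frozenset({"FOO/vid-cache"}))
```

With these definitions, transition(video, camera, frozenset()) comes out
empty because both devices share FOO/vid-cache, while handing a compressed
buffer from the GPU to the video engine needs both a resolve and a flush.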

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it ought to be.