[Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions

Roland Scheidegger sroland at vmware.com
Wed Feb 5 17:40:42 PST 2014


Am 06.02.2014 00:49, schrieb Jose Fonseca:
> I hadn't looked at GL_ARB_buffer_storage. I need to read more closely, but at a glance it looks like GL_MAP_PERSISTENT_BIT alone is okay (the app must call glFlushMappedBufferRange to guarantee coherence), but if GL_MAP_COHERENT_BIT is set we are indeed facing the same issue... :-(
> 
> Even worse, since it is part of GL 4.4 and there is no way for the implementation to fail GL_MAP_COHERENT_BIT mappings, there is no way to avoid supporting it...
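> 
> For illustration, the difference boils down to something like this (rough,
> untested sketch; buf_explicit, buf_coherent, vertices and size are just
> placeholders):
> 
>     /* On one buffer: persistent but not coherent -- the app has to tell
>        the driver what it wrote. */
>     glBindBuffer(GL_ARRAY_BUFFER, buf_explicit);
>     glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
>                     GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT);
>     void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
>                                  GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
>                                  GL_MAP_FLUSH_EXPLICIT_BIT);
>     memcpy(ptr, vertices, size);
>     glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, size);  /* visible to GL here */
> 
>     /* On another buffer (storage flags are immutable): persistent *and*
>        coherent -- writes become visible without any GL call at all. */
>     glBindBuffer(GL_ARRAY_BUFFER, buf_coherent);
>     glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
>                     GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
>     ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
>                            GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
>                            GL_MAP_COHERENT_BIT);
>     memcpy(ptr, vertices, size);  /* nothing here for a tracer to intercept */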
> 
> Jose
> 
> Note to self: my time would be better spent reviewing extensions before they are ratified than ranting after the fact...

I don't think that would work. The reason this stuff exists is that new hw
makes it possible directly at the hw level. Some APUs might even be able to
share such buffers in the LLC (I don't know if Haswell can do that; AMD APUs
lack a common cache level, but they can actually do fully coherent memory
access from both the cpu and gpu side). With discrete chips it's not that
easy, but everybody is doing unified memory these days.
I don't know how to solve this for tracing, though; it does indeed seem
impossible...

Roland



> 
> ----- Original Message -----
>> However, GL_ARB_buffer_storage (OpenGL 4.4) with GL_MAP_PERSISTENT_BIT
>> isn't much different. The only difference I see between
>> ARB_buffer_storage and AMD_pinned_memory is that AMD_pinned_memory
>> allows mapping CPU memory to the GPU address space permanently, while
>> ARB_buffer_storage allows mapping GPU memory to the CPU address space
>> permanently. At the end of the day, both the GPU and the CPU can read
>> and modify the same buffer, and all they need for synchronization is
>> fences.
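>>
>> For instance, something along these lines (sketch only; ptr is a
>> persistently mapped range, new_vertices and size are placeholders):
>>
>>     /* After submitting the commands that read the range, drop a fence. */
>>     GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
>>
>>     /* Before the CPU overwrites that range again, wait for the GPU. */
>>     glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
>>                      1000 * 1000 * 1000 /* 1 s, in ns */);
>>     glDeleteSync(fence);
>>     memcpy(ptr, new_vertices, size);  /* safe now, no further GL calls needed */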
>>
>> Marek
>>
>> On Wed, Feb 5, 2014 at 8:10 PM, Jose Fonseca <jfonseca at vmware.com> wrote:
>>>
>>>
>>> ----- Original Message -----
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> On 05.02.2014 18:08, Jose Fonseca wrote:
>>>>>> I honestly hope that GL_AMD_pinned_memory doesn't become popular. It
>>>>>> would
>>>>>> have been alright if it wasn't for this bit in
>>>>>> http://www.opengl.org/registry/specs/AMD/pinned_memory.txt
>>>>>> which says:
>>>>>>
>>>>>>      2) Can the application still use the buffer using the CPU
>>>>>>      address?
>>>>>>
>>>>>>          RESOLVED: YES. However, this access would be completely
>>>>>>          non synchronized to the OpenGL pipeline, unless explicit
>>>>>>          synchronization is being used (for example, through glFinish
>>>>>>          or by using sync objects).
>>>>>>
>>>>>> And I'm imagining apps which are streaming vertex data doing precisely
>>>>>> just
>>>>>> that...
>>>>>>
>>>>>
>>>>> I don't understand your concern; this is exactly the same behavior
>>>>> GL_MAP_UNSYNCHRONIZED_BIT has, and apps are supposedly using that
>>>>> properly. How does apitrace handle it?
>>>>
>>>> GL_AMD_pinned_memory is nothing like GL_ARB_map_buffer_range's
>>>> GL_MAP_UNSYNCHRONIZED_BIT:
>>>>
>>>> - When an app touches memory returned by
>>>> glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT), it will communicate back to
>>>> the OpenGL driver which bytes it actually touched via
>>>> glFlushMappedBufferRange (unless the app doesn't care about performance
>>>> and doesn't call glFlushMappedBufferRange at all, which is silly as it
>>>> will force the OpenGL driver to assume the whole range changed).
>>>>
>>>>   In this case, the OpenGL driver (hence apitrace) should get all the
>>>>   information it needs about which bytes were updated between
>>>>   glMap/glUnmap.
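>>>>
>>>>   I.e. the driver sees something like this (sketch; assumes a buffer
>>>>   bound to GL_ARRAY_BUFFER, with offset/len/data as placeholders):
>>>>
>>>>       void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, offset, len,
>>>>                                    GL_MAP_WRITE_BIT |
>>>>                                    GL_MAP_UNSYNCHRONIZED_BIT |
>>>>                                    GL_MAP_FLUSH_EXPLICIT_BIT);
>>>>       memcpy(ptr, data, len);
>>>>       /* tells the driver (and apitrace) exactly which bytes changed */
>>>>       glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, len);
>>>>       glUnmapBuffer(GL_ARRAY_BUFFER);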
>>>>
>>>> - When an app touches memory bound via GL_AMD_pinned_memory outside
>>>> glMap/glUnmap, there are _no_ hints whatsoever.  The OpenGL driver might
>>>> not care, as the memory is shared between CPU and GPU, so all is good as
>>>> far as it is concerned, but all the changes the app makes are invisible
>>>> at the API level, hence apitrace will not be able to catch them unless
>>>> it resorts to onerous heuristics.
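>>>>
>>>>   By contrast, with pinned memory the interesting writes happen with no
>>>>   GL entry point in sight (sketch; mem is page-aligned client memory and
>>>>   buf/size/count are placeholders):
>>>>
>>>>       glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, buf);
>>>>       glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, size, mem,
>>>>                    GL_STREAM_DRAW);
>>>>       /* ... later, between frames ... */
>>>>       memcpy(mem, new_data, size);        /* plain CPU write, no GL call,
>>>>                                              nothing for apitrace to hook */
>>>>       glDrawArrays(GL_TRIANGLES, 0, count);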
>>>>
>>>>
>>>> So both extensions allow unsynchronized access, but lack of
>>>> synchronization is not my concern. My concern is that
>>>> GL_AMD_pinned_memory allows *hidden* access to GPU memory.
>>>
>>> Just for the record, the challenges GL_AMD_pinned_memory presents to
>>> Apitrace are much like those of old-fashioned OpenGL user array pointers:
>>> an app is free to change the contents of the memory pointed to by user
>>> array pointers at any point in time, except during a draw call.  This
>>> means that before every draw call, Apitrace needs to scavenge all the
>>> user memory pointers and write their contents to the trace file, just in
>>> case the app changed them...
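>>>
>>> That is, the classic pattern is just (sketch; update_vertices is a
>>> hypothetical app function):
>>>
>>>     static GLfloat verts[3 * 1024];
>>>     glEnableClientState(GL_VERTEX_ARRAY);
>>>     glVertexPointer(3, GL_FLOAT, 0, verts);  /* GL only keeps the pointer */
>>>     /* ... any time later ... */
>>>     update_vertices(verts);              /* arbitrary CPU writes, no GL call */
>>>     glDrawArrays(GL_TRIANGLES, 0, 1024); /* data must only be valid here */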
>>>
>>> In order to support GL_AMD_pinned_memory, for every draw call Apitrace
>>> would also need to walk over all bound GL_AMD_pinned_memory buffers (and
>>> nowadays there are loads of binding points!), check whether the data
>>> changed, and serialize it to the trace file if it did...
>>>
>>>
>>> I never cared much about the performance of Apitrace with user array
>>> pointers: it is an old paradigm; only old apps use it, or programmers who
>>> don't particularly care about performance -- either way, a
>>> performance-conscious app developer would use VBOs and hence never hit
>>> the problem at all.  My displeasure with GL_AMD_pinned_memory is that it
>>> essentially flips everything on its head -- it encourages a paradigm
>>> which apitrace will never be able to handle properly.
>>>
>>>
>>> People often complain that OpenGL development tools are poor compared
>>> with Direct3D's.  An important fact they often miss is that the Direct3D
>>> API is several orders of magnitude more tool friendly: it's clear that
>>> the Direct3D API cares about things like allowing all state to be queried
>>> back, whereas OpenGL is more fire-and-forget and never look back -- the
>>> main concern in OpenGL is ensuring that state can go from app to driver
>>> fast, but little thought is given to ensuring that one can read the whole
>>> state back, or that one can intercept all state as it goes between the
>>> app and the driver...
>>>
>>>
>>> In this particular case, if the answer to "Can the application still use
>>> the buffer using the CPU address?" were NO, the world would be a much
>>> better place.
>>>
>>>
>>> Jose
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


More information about the mesa-dev mailing list