[Mesa-dev] [PATCH v2 04/11] gallium: add endian_format field to struct pipe_resource

Thu Apr 21 09:42:29 UTC 2016

On Wed, Apr 20, 2016 at 12:35 PM, Marek Olšák <maraeo at gmail.com> wrote:
> On Wed, Apr 20, 2016 at 11:14 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>> On Wed, Apr 20, 2016 at 12:04 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>> On 20.04.2016 17:48, Oded Gabbay wrote:
>>>> On Wed, Apr 20, 2016 at 11:28 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>> On 20.04.2016 03:13, Oded Gabbay wrote:
>>>>>> On Tue, Apr 19, 2016 at 5:59 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>> On Tue, Apr 19, 2016 at 3:11 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>>>>> On Mon, Apr 18, 2016 at 6:03 PM,
>>>>>>>> Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>>>>> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>>>>>>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>>>>>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>>>>>>>>>> Wouldn't it make more sense to handle such issues in transfer_map?
>>>>>>>>>>>>> (i.e. create a staging memory area, and decode into it)? This assumes
>>>>>>>>>>>>> that the transfer_map() call has enough information to "do the right
>>>>>>>>>>>>> thing". I don't think it does today, but perhaps it could be taught?
>>>>>>>>>>>> It doesn't have all the info today, that's for sure. I imagine though
>>>>>>>>>>>> we can add parameters to it.
>>>>>>>>>>>>
>>>>>>>>>>>>> That way everything that's in a pipe_resource is in some
>>>>>>>>>>>>> tightly-controlled format, and we specify the LE <-> BE parameters
>>>>>>>>>>>>> when converting between CPU-read/written and GPU-read/written data. I
>>>>>>>>>>>>> believe this is a better match for what's really happening, too. What
>>>>>>>>>>>>> do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>>   -ilia
>>>>>>>>>>>>
>>>>>>>>>>>> Unless I'm missing something, I think, at the end of the day, it will
>>>>>>>>>>>> be the same issues as in my solution - per code path per format is a
>>>>>>>>>>>> different case. That's because you will still need to "teach"
>>>>>>>>>>>> transfer_map, per each transfer per format what to do. So one will
>>>>>>>>>>>> need to go and debug every single code path there is in mesa for
>>>>>>>>>>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5
>>>>>>>>>>>> months. It's a great learning experience but it won't give anything
>>>>>>>>>>>> generic.
>>>>>>>>>>>>
>>>>>>>>>>>> Again, for example, in st_ReadPixels, I imagine you will need to give
>>>>>>>>>>>> "different orders" to transfer_map for the two different scenarios -
>>>>>>>>>>>> H/W blit and fallback. So what's the gain here ?
>>>>>>>>>>>>
>>>>>>>>>>>> If I'm missing something, please tell me.
>>>>>>>>>>>
>>>>>>>>>>> One of us is... let's figure out which one :)
>>>>>>>>>>>
>>>>>>>>>>> Here's my proposal:
>>>>>>>>>>>
>>>>>>>>>>> All data stored inside of resources is stored in a driver-happy
>>>>>>>>>>> format. The driver ensures that it's stored in proper endianness, etc.
>>>>>>>>>>> (Much like it does today wrt proper stride.)
>>>>>>>>>>>
>>>>>>>>>>> Blitting(/copying) between resources doesn't require any additional
>>>>>>>>>>> information, since you have the format(s) of the respective resources,
>>>>>>>>>>> and it's all inside the driver, so the driver does whatever it needs
>>>>>>>>>>> to do to make it all "work".
>>>>>>>>>>>
>>>>>>>>>>> *Accessing and modifying* resources (directly) from the CPU is what
>>>>>>>>>>> becomes tricky. The state tracker may have incorrect expectations of
>>>>>>>>>>> the actual backing data. There are a few different ways to resolve
>>>>>>>>>>> this. The one I'm proposing is that you only ever return a pointer to
>>>>>>>>>>> the directly underlying data if it matches the CPU's expectations
>>>>>>>>>>> (which will only be the case for byte-oriented array formats like
>>>>>>>>>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g.
>>>>>>>>>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through
>>>>>>>>>>> a bounce buffer.
>>>>>>>>>>>
>>>>>>>>>>> At transfer map time, you convert the data from GPU-style to
>>>>>>>>>>> CPU-style, and copy back the relevant bits at unmap/flush time.
>>>>>>>>>>>
>>>>>>>>>>> This presents a nice clean boundary for this stuff. Instead of the
>>>>>>>>>>> state tracker trying to guess what the driver will do and feeding it
>>>>>>>>>>> endiannesses that it can't possibly guess properly, the tracking logic
>>>>>>>>>>> is relegated to the driver, and we extend the interfaces to allow the
>>>>>>>>>>> state tracker to access the data in a proper way.
>>>>>>>>>>>
>>>>>>>>>>> I believe the advantage of this scheme is that beyond adding format
>>>>>>>>>>> parameters to pipe_transfer_map() calls, there will not need to be any
>>>>>>>>>>> adjustments to the state trackers.
>>>>>>>>>>>
>>>>>>>>>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it
>>>>>>>>>>> maps a buffer, it's formatless (at map time), and yet the GPU will be
>>>>>>>>>>> required to interpret it correctly. We could decree that PIPE_BUFFER
>>>>>>>>>>> is just *always* an array of R8_UNORM and thus never needs any type of
>>>>>>>>>>> swapping. The driver needs to adjust accordingly to deal with accesses
>>>>>>>>>>> that don't fit that pattern (and where parameters can't be fed to the
>>>>>>>>>>> GPU to interpret it properly).
>>>>>>>>>>>
>>>>>>>>>>> I think something like the above will work. And I think it presents a
>>>>>>>>>>> cleaner barrier than your proposal, because none of the "this GPU can
>>>>>>>>>>> kinda-sorta understand BE, but not everywhere" details are ever
>>>>>>>>>>> exposed to the state tracker.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>>   -ilia
>>>>>>>>>>
>>>>>>>>>> Ilia,
>>>>>>>>>>
>>>>>>>>>> To make the GPU do a conversion during blitting, I need to configure
>>>>>>>>>> registers. This is done in a couple of functions in the r600g driver
>>>>>>>>>> (r600_translate_texformat, r600_colorformat_endian_swap,
>>>>>>>>>> r600_translate_colorformat and r600_translate_colorswap).
>>>>>>>>>>
>>>>>>>>>> The problem is that transfer_map/unmap don't call directly to those
>>>>>>>>>> functions. They call other functions which eventually call those 4
>>>>>>>>>> functions. Among those "other" functions, there are several function
>>>>>>>>>> calls which are *not* in the r600g driver. i.e. we go back to generic
>>>>>>>>>> util functions. For example:
>>>>>>>>>>
>>>>>>>>>> #0  r600_translate_colorformat
>>>>>>>>>> #1  evergreen_init_color_surface
>>>>>>>>>> #2  evergreen_set_framebuffer_state
>>>>>>>>>> #3  util_blitter_custom_depth_stencil
>>>>>>>>>> #4  r600_blit_decompress_depth
>>>>>>>>>> #5  r600_texture_transfer_map
>>>>>>>>>>
>>>>>>>>>> Am I allowed to now pass information from transfer_map/unmap all the
>>>>>>>>>> way down to the 4 functions I mentioned through all these layers as
>>>>>>>>>> additional parameters ? I preferred to put it in pipe_resource as that
>>>>>>>>>> information goes all the way down to those functions, but if I can't
>>>>>>>>>> use that, then what's an acceptable alternative ?
>>>>>>>>>>
>>>>>>>>>> This time, I would like to get an agreement *before* I implement it.
>>>>>>>>>
>>>>>>>>> Probably a good idea. And as issues are investigated, people's
>>>>>>>>> opinions on the "correct" way might shift. Let's think about this...
>>>>>>>>>
>>>>>>>>> So clearly *a* correct way to handle this would be to stop all the
>>>>>>>>> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g.
>>>>>>>>> R5G6B5 but makes no mention of the byte layout in memory for those 16
>>>>>>>>> bits. Really what we have right now is a format and an *implicit*
>>>>>>>>> endian ordering, which is the CPU's. But what happens when the CPU and
>>>>>>>>> GPU don't agree?
>>>>>>>>>
>>>>>>>>> There's a path we could take which would be to add an endianness
>>>>>>>>> alongside each format (be it by doubling formats, or an explicit
>>>>>>>>> second field). This would be a very far-reaching change though, and I
>>>>>>>>> doubt you'll want to do it. What we're left with is having a format
>>>>>>>>> and an *implicit* endianness. Which means that the consumers of the
>>>>>>>>> format need to be able to work out the implicit endianness involved.
>>>>>>>>> And the endianness will be GPU endian for regular resources, and CPU
>>>>>>>>> endian for "staging" resources. So it's definitely tempting to stick
>>>>>>>>> the endian thing into a private field of the resource, like Rob is
>>>>>>>>> suggesting - when creating a staging texture in
>>>>>>>>> transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise
>>>>>>>>> set it to gpu endian. And I think this is somewhat similar to your
>>>>>>>>> former approach.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>>   -ilia
>>>>>>>>
>>>>>>>> I don't think I have any other choice but to stick it as a private
>>>>>>>> field, because the endian parameter simple can't go through all the
>>>>>>>> function calls as an additional parameter. The reason is that
>>>>>>>> set_framebuffer_state() function types are called from
>>>>>>>> st_invalidate_state, where I don't have any idea about the "correct"
>>>>>>>> endianess, so I can't add the endian parameter to that function type.
>>>>>>>>
>>>>>>>> The only thing that is propagated through all layers is r600_texture.
>>>>>>>> I'll try to use that.
>>>>>>>>
>>>>>>>> Marek, Michel,
>>>>>>>> Do you think it is OK to add the endian mark to that private structure ?
>>>>>>>
>>>>>>> Yes, but doesn't util_format_description::is_array provide that info already?
>>>>>>
>>>>>> If you mean to say that they are interchangeable, than no.
>>>>>> It's true that for array formats we *never* need to do swaps, but for
>>>>>> non-array formats there are cases where we need to do it and cases
>>>>>> where we don't need to do it.
>>>>>>
>>>>>> For example, when writing to depth buffer through st_DrawPixels
>>>>>> (glDrawPixels) with GL_FLOAT, than the format used is
>>>>>> PIPE_FORMAT_Z32_FLOAT.
>>>>>> In the make_texture() part, we need to configure endian swap for the
>>>>>> staging buffer, but after you return from it and go into
>>>>>> st_create_texture_sampler_view(), you want the destination texture to
>>>>>> be configured without endian swapping.
>>>>>
>>>>> So the CB/DB block can't swap bytes when accessing Z32?
>>>>>
>>>> Not sure what you mean by this sentence. Could you please further
>>>> explain your question ?
>>>
>>> For packed formats / components such as Z32, the working theory is that
>>> state tracker / Mesa core / application code accesses them using the CPU
>>> byte order. So if the hardware can be programmed to always access them
>>> in the CPU byte order, that should theoretically work in all cases.
>>>
>>>
>>> --
>>> Earthling Michel Dänzer               |               http://www.amd.com
>>> Libre software enthusiast             |             Mesa and X developer
>>
>> Well, it doesn't work in practice. If that were true, than we wouldn't
>> need all this hints. We would do swaps all the time.
>>
>> And if you think about it, once you use a staging buffer in the
>> middle, than you don't want to do two swaps. If you do two swaps (from
>> CPU to staging buffer and from staging buffer to GPU), you will return
>> to the original layout, which is big-endian in our case, which the GPU
>> can't handle.
>
> Yeah, it looks like DB can't do swaps, so you need to do the swapping
> via TEX or CB.
>
> This combination of flags tells you that a texture can only be used by
> DB and TEX:
> "rtex->is_depth && !rtex->is_flushing_texture"
> This should be the only category of textures that must be in the GPU
> native format because of DB.
>
> Other textures can be in the CPU native format, because CB and TEX can
> do the swapping.
>
> R600_RESOURCE_FLAG_TRANSFER tells you which texture is a temporary
> transfer texture. Such a texture can only be used by CB and TEX. Not
> sure if this is useful, because you can already tell:
> - which texture is array or packed and set the endian flags accordingly
> - which texture can be used by DB and can't set the endian flags
>
> Marek

Hi Marek,
I think your DB analysis is spot on!
I hope I can send a new version of the patch-set today without any
need for a new flag.

Oded