[Mesa-dev] [PATCH v2 04/11] gallium: add endian_format field to struct pipe_resource

Mon Apr 18 15:03:00 UTC 2016

On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>> Wouldn't it make more sense to handle such issues in transfer_map?
>>>> (i.e. create a staging memory area, and decode into it)? This assumes
>>>> that the transfer_map() call has enough information to "do the right
>>>> thing". I don't think it does today, but perhaps it could be taught?
>>> It doesn't have all the info today, that's for sure. I imagine though
>>> we can add parameters to it.
>>>
>>>> That way everything that's in a pipe_resource is in some
>>>> tightly-controlled format, and we specify the LE <-> BE parameters
>>>> when converting between CPU-read/written and GPU-read/written data. I
>>>> believe this is a better match for what's really happening, too. What
>>>> do you think?
>>>>
>>>>   -ilia
>>>
>>> Unless I'm missing something, I think, at the end of the day, it will
>>> be the same issues as in my solution - per code path per format is a
>>> different case. That's because you will still need to "teach"
>>> transfer_map, per each transfer per format what to do. So one will
>>> need to go and debug every single code path there is in mesa for
>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5
>>> months. It's a great learning experience but it won't give anything
>>> generic.
>>>
>>> Again, for example, in st_ReadPixels, I imagine you will need to give
>>> "different orders" to transfer_map for the two different scenarios -
>>> H/W blit and fallback. So what's the gain here ?
>>>
>>> If I'm missing something, please tell me.
>>
>> One of us is... let's figure out which one :)
>>
>> Here's my proposal:
>>
>> All data stored inside of resources is stored in a driver-happy
>> format. The driver ensures that it's stored in proper endianness, etc.
>> (Much like it does today wrt proper stride.)
>>
>> Blitting(/copying) between resources doesn't require any additional
>> information, since you have the format(s) of the respective resources,
>> and it's all inside the driver, so the driver does whatever it needs
>> to do to make it all "work".
>>
>> *Accessing and modifying* resources (directly) from the CPU is what
>> becomes tricky. The state tracker may have incorrect expectations of
>> the actual backing data. There are a few different ways to resolve
>> this. The one I'm proposing is that you only ever return a pointer to
>> the directly underlying data if it matches the CPU's expectations
>> (which will only be the case for byte-oriented array formats like
>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g.
>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through
>> a bounce buffer.
>>
>> At transfer map time, you convert the data from GPU-style to
>> CPU-style, and copy back the relevant bits at unmap/flush time.
>>
>> This presents a nice clean boundary for this stuff. Instead of the
>> state tracker trying to guess what the driver will do and feeding it
>> endiannesses that it can't possibly guess properly, the tracking logic
>> is relegated to the driver, and we extend the interfaces to allow the
>> state tracker to access the data in a proper way.
>>
>> I believe the advantage of this scheme is that beyond adding format
>> parameters to pipe_transfer_map() calls, there will not need to be any
>> adjustments to the state trackers.
>>
>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it
>> maps a buffer, it's formatless (at map time), and yet the GPU will be
>> required to interpret it correctly. We could decree that PIPE_BUFFER
>> is just *always* an array of R8_UNORM and thus never needs any type of
>> swapping. The driver needs to adjust accordingly to deal with accesses
>> that don't fit that pattern (and where parameters can't be fed to the
>> GPU to interpret it properly).
>>
>> I think something like the above will work. And I think it presents a
>> cleaner barrier than your proposal, because none of the "this GPU can
>> kinda-sorta understand BE, but not everywhere" details are ever
>> exposed to the state tracker.
>>
>> Thoughts?
>>
>>   -ilia
>
> Ilia,
>
> To make the GPU do a conversion during blitting, I need to configure
> registers. This is done in a couple of functions in the r600g driver
> (r600_translate_texformat, r600_colorformat_endian_swap,
> r600_translate_colorformat and r600_translate_colorswap).
>
> The problem is that transfer_map/unmap don't call directly to those
> functions. They call other functions which eventually call those 4
> functions. Among those "other" functions, there are several function
> calls which are *not* in the r600g driver. i.e. we go back to generic
> util functions. For example:
>
> #0  r600_translate_colorformat
> #1  evergreen_init_color_surface
> #2  evergreen_set_framebuffer_state
> #3  util_blitter_custom_depth_stencil
> #4  r600_blit_decompress_depth
> #5  r600_texture_transfer_map
>
> Am I allowed to now pass information from transfer_map/unmap all the
> way down to the 4 functions I mentioned through all these layers as
> additional parameters ? I preferred to put it in pipe_resource as that
> information goes all the way down to those functions, but if I can't
> use that, then what's an acceptable alternative ?
>
> This time, I would like to get an agreement *before* I implement it.

Probably a good idea. And as issues are investigated, people's
opinions on the "correct" way might shift. Let's think about this...

So clearly *a* correct way to handle this would be to stop all the
lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g.
R5G6B5 but makes no mention of the byte layout in memory for those 16
bits. Really what we have right now is a format and an *implicit*
endian ordering, which is the CPU's. But what happens when the CPU and
GPU don't agree?

There's a path we could take which would be to add an endianness
alongside each format (be it by doubling formats, or an explicit
second field). This would be a very far-reaching change though, and I
doubt you'll want to do it. What we're left with is having a format
and an *implicit* endianness. Which means that the consumers of the
format need to be able to work out the implicit endianness involved.
And the endianness will be GPU endian for regular resources, and CPU
endian for "staging" resources. So it's definitely tempting to stick
the endian thing into a private field of the resource, like Rob is
suggesting - when creating a staging texture in
transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise
set it to gpu endian. And I think this is somewhat similar to your
former approach.

What do you think?

  -ilia