[Mesa-dev] [PATCH v2 04/11] gallium: add endian_format field to struct pipe_resource

Mon Apr 18 15:38:18 UTC 2016

On Mon, Apr 18, 2016 at 5:03 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>> Wouldn't it make more sense to handle such issues in transfer_map?
>>>>> (i.e. create a staging memory area, and decode into it)? This assumes
>>>>> that the transfer_map() call has enough information to "do the right
>>>>> thing". I don't think it does today, but perhaps it could be taught?
>>>> It doesn't have all the info today, that's for sure. I imagine though
>>>> we can add parameters to it.
>>>>
>>>>> That way everything that's in a pipe_resource is in some
>>>>> tightly-controlled format, and we specify the LE <-> BE parameters
>>>>> when converting between CPU-read/written and GPU-read/written data. I
>>>>> believe this is a better match for what's really happening, too. What
>>>>> do you think?
>>>>>
>>>>>   -ilia
>>>>
>>>> Unless I'm missing something, I think, at the end of the day, it will
>>>> be the same issues as in my solution - per code path per format is a
>>>> different case. That's because you will still need to "teach"
>>>> transfer_map, per each transfer per format what to do. So one will
>>>> need to go and debug every single code path there is in mesa for
>>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5
>>>> months. It's a great learning experience but it won't give anything
>>>> generic.
>>>>
>>>> Again, for example, in st_ReadPixels, I imagine you will need to give
>>>> "different orders" to transfer_map for the two different scenarios -
>>>> H/W blit and fallback. So what's the gain here ?
>>>>
>>>> If I'm missing something, please tell me.
>>>
>>> One of us is... let's figure out which one :)
>>>
>>> Here's my proposal:
>>>
>>> All data stored inside of resources is stored in a driver-happy
>>> format. The driver ensures that it's stored in proper endianness, etc.
>>> (Much like it does today wrt proper stride.)
>>>
>>> Blitting(/copying) between resources doesn't require any additional
>>> information, since you have the format(s) of the respective resources,
>>> and it's all inside the driver, so the driver does whatever it needs
>>> to do to make it all "work".
>>>
>>> *Accessing and modifying* resources (directly) from the CPU is what
>>> becomes tricky. The state tracker may have incorrect expectations of
>>> the actual backing data. There are a few different ways to resolve
>>> this. The one I'm proposing is that you only ever return a pointer to
>>> the directly underlying data if it matches the CPU's expectations
>>> (which will only be the case for byte-oriented array formats like
>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g.
>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through
>>> a bounce buffer.
>>>
>>> At transfer map time, you convert the data from GPU-style to
>>> CPU-style, and copy back the relevant bits at unmap/flush time.
>>>
>>> This presents a nice clean boundary for this stuff. Instead of the
>>> state tracker trying to guess what the driver will do and feeding it
>>> endiannesses that it can't possibly guess properly, the tracking logic
>>> is relegated to the driver, and we extend the interfaces to allow the
>>> state tracker to access the data in a proper way.
>>>
>>> I believe the advantage of this scheme is that beyond adding format
>>> parameters to pipe_transfer_map() calls, there will not need to be any
>>> adjustments to the state trackers.
>>>
>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it
>>> maps a buffer, it's formatless (at map time), and yet the GPU will be
>>> required to interpret it correctly. We could decree that PIPE_BUFFER
>>> is just *always* an array of R8_UNORM and thus never needs any type of
>>> swapping. The driver needs to adjust accordingly to deal with accesses
>>> that don't fit that pattern (and where parameters can't be fed to the
>>> GPU to interpret it properly).
>>>
>>> I think something like the above will work. And I think it presents a
>>> cleaner barrier than your proposal, because none of the "this GPU can
>>> kinda-sorta understand BE, but not everywhere" details are ever
>>> exposed to the state tracker.
>>>
>>> Thoughts?
>>>
>>>   -ilia
>>
>> Ilia,
>>
>> To make the GPU do a conversion during blitting, I need to configure
>> registers. This is done in a couple of functions in the r600g driver
>> (r600_translate_texformat, r600_colorformat_endian_swap,
>> r600_translate_colorformat and r600_translate_colorswap).
>>
>> The problem is that transfer_map/unmap don't call directly to those
>> functions. They call other functions which eventually call those 4
>> functions. Among those "other" functions, there are several function
>> calls which are *not* in the r600g driver. i.e. we go back to generic
>> util functions. For example:
>>
>> #0  r600_translate_colorformat
>> #1  evergreen_init_color_surface
>> #2  evergreen_set_framebuffer_state
>> #3  util_blitter_custom_depth_stencil
>> #4  r600_blit_decompress_depth
>> #5  r600_texture_transfer_map
>>
>> Am I allowed to now pass information from transfer_map/unmap all the
>> way down to the 4 functions I mentioned through all these layers as
>> additional parameters ? I preferred to put it in pipe_resource as that
>> information goes all the way down to those functions, but if I can't
>> use that, then what's an acceptable alternative ?
>>
>> This time, I would like to get an agreement *before* I implement it.
>
> Probably a good idea. And as issues are investigated, people's
> opinions on the "correct" way might shift. Let's think about this...
>
> So clearly *a* correct way to handle this would be to stop all the
> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g.
> R5G6B5 but makes no mention of the byte layout in memory for those 16
> bits. Really what we have right now is a format and an *implicit*
> endian ordering, which is the CPU's. But what happens when the CPU and
> GPU don't agree?

Not true. The memory layout is clear. There are two categories of
formats: array and packed. If a format can be an array format (all
channels are the same, bpc >= 8, bpc is a power of two), it's always
an array format. Otherwise, it's a packed format.
util_format_description::is_array says which one it is, which is
generated by one of the python scripts.

Array formats have the same layout between components, which is
endian-neutral and it's like a C array, but the components themselves
can be swapped (e.g. 8 bpc array formats are always the same in memory
regardless of endian, but 16 and 32 bpc array formats need
byte-swapping of each component's bytes, but the component order
remains the same)

Packed formats are the only formats that need byte-swapping between
components. The only gallium packed formats are:
- 10_10_10_2
- 5_5_5_1
- 5_6_5
- 4_4_4_4
- 3_3_2
- 24_8
- 8_24
- 32_8_24 (not sure
- 11_11_10
- 5_5_5_9
You can treat these like 1-component array formats (e.g. 4_4_4_4 is
like being packed in 1 uint32 component, which means you need to
byte-swap the whole uint32, instead of swapping components separately
as in 16 and 32 bpc array formats).

The nice thing about GL is that the driver doesn't have to support the
first 5 for GL2. The next 3 are depth-stencil formats, and the last 2
are only required by GL3. This means Oded can drop support for all
packed formats except 24_8, so that he only has to deal with array
formats.

Finally, there are also packed format aliases, which define packed
formats in terms of array formats. These are listed at the end of
p_format.h. For example:
- The packed format alias PIPE_FORMAT_RGBA8888_UNORM is the same as
the array format PIPE_FORMAT_R8G8B8A8_UNORM on LE.
- The packed format alias PIPE_FORMAT_ABGR8888_UNORM is the same as
the array format PIPE_FORMAT_R8G8B8A8_UNORM on BE.

As you can see, the endianness is defined very strictly and accurately
for all formats.

While Gallium has a lot of array formats and no packed formats for
8888, mesa/main has mostly packed formats for those and almost no
array formats. The reason the packed format aliases were added was to
be able to translate from mesa/main packed formats to Gallium array
formats and vice versa. It may be important to check if the format
translation between mesa/main and Gallium is correct for formats other
than 8888.

Marek