[Mesa-dev] [PATCH v2 04/11] gallium: add endian_format field to struct pipe_resource

Michel Dänzer michel at daenzer.net
Wed Apr 20 08:28:03 UTC 2016


On 20.04.2016 03:13, Oded Gabbay wrote:
> On Tue, Apr 19, 2016 at 5:59 PM, Marek Olšák <maraeo at gmail.com> wrote:
>> On Tue, Apr 19, 2016 at 3:11 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>> On Mon, Apr 18, 2016 at 6:03 PM,
>>> Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <oded.gabbay at gmail.com> wrote:
>>>>>>>> Wouldn't it make more sense to handle such issues in transfer_map?
>>>>>>>> (i.e. create a staging memory area, and decode into it)? This assumes
>>>>>>>> that the transfer_map() call has enough information to "do the right
>>>>>>>> thing". I don't think it does today, but perhaps it could be taught?
>>>>>>> It doesn't have all the info today, that's for sure. I imagine though
>>>>>>> we can add parameters to it.
>>>>>>>
>>>>>>>> That way everything that's in a pipe_resource is in some
>>>>>>>> tightly-controlled format, and we specify the LE <-> BE parameters
>>>>>>>> when converting between CPU-read/written and GPU-read/written data. I
>>>>>>>> believe this is a better match for what's really happening, too. What
>>>>>>>> do you think?
>>>>>>>>
>>>>>>>>   -ilia
>>>>>>>
>>>>>>> Unless I'm missing something, I think, at the end of the day, it will
>>>>>>> be the same issues as in my solution - per code path per format is a
>>>>>>> different case. That's because you will still need to "teach"
>>>>>>> transfer_map, per each transfer per format what to do. So one will
>>>>>>> need to go and debug every single code path there is in mesa for
>>>>>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5
>>>>>>> months. It's a great learning experience but it won't give anything
>>>>>>> generic.
>>>>>>>
>>>>>>> Again, for example, in st_ReadPixels, I imagine you will need to give
>>>>>>> "different orders" to transfer_map for the two different scenarios -
>>>>>>> H/W blit and fallback. So what's the gain here ?
>>>>>>>
>>>>>>> If I'm missing something, please tell me.
>>>>>>
>>>>>> One of us is... let's figure out which one :)
>>>>>>
>>>>>> Here's my proposal:
>>>>>>
>>>>>> All data stored inside of resources is stored in a driver-happy
>>>>>> format. The driver ensures that it's stored in proper endianness, etc.
>>>>>> (Much like it does today wrt proper stride.)
>>>>>>
>>>>>> Blitting(/copying) between resources doesn't require any additional
>>>>>> information, since you have the format(s) of the respective resources,
>>>>>> and it's all inside the driver, so the driver does whatever it needs
>>>>>> to do to make it all "work".
>>>>>>
>>>>>> *Accessing and modifying* resources (directly) from the CPU is what
>>>>>> becomes tricky. The state tracker may have incorrect expectations of
>>>>>> the actual backing data. There are a few different ways to resolve
>>>>>> this. The one I'm proposing is that you only ever return a pointer to
>>>>>> the directly underlying data if it matches the CPU's expectations
>>>>>> (which will only be the case for byte-oriented array formats like
>>>>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g.
>>>>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through
>>>>>> a bounce buffer.
>>>>>>
>>>>>> At transfer map time, you convert the data from GPU-style to
>>>>>> CPU-style, and copy back the relevant bits at unmap/flush time.
>>>>>>
>>>>>> This presents a nice clean boundary for this stuff. Instead of the
>>>>>> state tracker trying to guess what the driver will do and feeding it
>>>>>> endiannesses that it can't possibly guess properly, the tracking logic
>>>>>> is relegated to the driver, and we extend the interfaces to allow the
>>>>>> state tracker to access the data in a proper way.
>>>>>>
>>>>>> I believe the advantage of this scheme is that beyond adding format
>>>>>> parameters to pipe_transfer_map() calls, there will not need to be any
>>>>>> adjustments to the state trackers.
>>>>>>
>>>>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it
>>>>>> maps a buffer, it's formatless (at map time), and yet the GPU will be
>>>>>> required to interpret it correctly. We could decree that PIPE_BUFFER
>>>>>> is just *always* an array of R8_UNORM and thus never needs any type of
>>>>>> swapping. The driver needs to adjust accordingly to deal with accesses
>>>>>> that don't fit that pattern (and where parameters can't be fed to the
>>>>>> GPU to interpret it properly).
>>>>>>
>>>>>> I think something like the above will work. And I think it presents a
>>>>>> cleaner barrier than your proposal, because none of the "this GPU can
>>>>>> kinda-sorta understand BE, but not everywhere" details are ever
>>>>>> exposed to the state tracker.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>>   -ilia
>>>>>
>>>>> Ilia,
>>>>>
>>>>> To make the GPU do a conversion during blitting, I need to configure
>>>>> registers. This is done in a couple of functions in the r600g driver
>>>>> (r600_translate_texformat, r600_colorformat_endian_swap,
>>>>> r600_translate_colorformat and r600_translate_colorswap).
>>>>>
>>>>> The problem is that transfer_map/unmap don't call directly to those
>>>>> functions. They call other functions which eventually call those 4
>>>>> functions. Among those "other" functions, there are several function
>>>>> calls which are *not* in the r600g driver. i.e. we go back to generic
>>>>> util functions. For example:
>>>>>
>>>>> #0  r600_translate_colorformat
>>>>> #1  evergreen_init_color_surface
>>>>> #2  evergreen_set_framebuffer_state
>>>>> #3  util_blitter_custom_depth_stencil
>>>>> #4  r600_blit_decompress_depth
>>>>> #5  r600_texture_transfer_map
>>>>>
>>>>> Am I allowed to now pass information from transfer_map/unmap all the
>>>>> way down to the 4 functions I mentioned through all these layers as
>>>>> additional parameters ? I preferred to put it in pipe_resource as that
>>>>> information goes all the way down to those functions, but if I can't
>>>>> use that, then what's an acceptable alternative ?
>>>>>
>>>>> This time, I would like to get an agreement *before* I implement it.
>>>>
>>>> Probably a good idea. And as issues are investigated, people's
>>>> opinions on the "correct" way might shift. Let's think about this...
>>>>
>>>> So clearly *a* correct way to handle this would be to stop all the
>>>> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g.
>>>> R5G6B5 but makes no mention of the byte layout in memory for those 16
>>>> bits. Really what we have right now is a format and an *implicit*
>>>> endian ordering, which is the CPU's. But what happens when the CPU and
>>>> GPU don't agree?
>>>>
>>>> There's a path we could take which would be to add an endianness
>>>> alongside each format (be it by doubling formats, or an explicit
>>>> second field). This would be a very far-reaching change though, and I
>>>> doubt you'll want to do it. What we're left with is having a format
>>>> and an *implicit* endianness. Which means that the consumers of the
>>>> format need to be able to work out the implicit endianness involved.
>>>> And the endianness will be GPU endian for regular resources, and CPU
>>>> endian for "staging" resources. So it's definitely tempting to stick
>>>> the endian thing into a private field of the resource, like Rob is
>>>> suggesting - when creating a staging texture in
>>>> transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise
>>>> set it to gpu endian. And I think this is somewhat similar to your
>>>> former approach.
>>>>
>>>> What do you think?
>>>>
>>>>   -ilia
>>>
>>> I don't think I have any other choice but to stick it as a private
>>> field, because the endian parameter simple can't go through all the
>>> function calls as an additional parameter. The reason is that
>>> set_framebuffer_state() function types are called from
>>> st_invalidate_state, where I don't have any idea about the "correct"
>>> endianess, so I can't add the endian parameter to that function type.
>>>
>>> The only thing that is propagated through all layers is r600_texture.
>>> I'll try to use that.
>>>
>>> Marek, Michel,
>>> Do you think it is OK to add the endian mark to that private structure ?
>>
>> Yes, but doesn't util_format_description::is_array provide that info already?
> 
> If you mean to say that they are interchangeable, than no.
> It's true that for array formats we *never* need to do swaps, but for
> non-array formats there are cases where we need to do it and cases
> where we don't need to do it.
> 
> For example, when writing to depth buffer through st_DrawPixels
> (glDrawPixels) with GL_FLOAT, than the format used is
> PIPE_FORMAT_Z32_FLOAT.
> In the make_texture() part, we need to configure endian swap for the
> staging buffer, but after you return from it and go into
> st_create_texture_sampler_view(), you want the destination texture to
> be configured without endian swapping.

So the CB/DB block can't swap bytes when accessing Z32?


> Both those buffers are configured with PIPE_FORMAT_Z32_FLOAT, so
> that's why is_array can't be used instead of a dedicated flag.

Can't the driver figure out on its own whether it needs to swap or not
based on other available information, e.g. considering the resource
usage/bind flags and/or whether the resource is sampled from or rendered to?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the mesa-dev mailing list