[Mesa-dev] [PATCH] intel: Improve teximage perf for Google Chrome paint rects
Chad Versace
chad.versace at linux.intel.com
Thu Sep 20 12:54:32 PDT 2012
On 09/14/2012 07:01 PM, Paul Berry wrote:
> On 11 September 2012 12:04, Chad Versace <chad.versace at linux.intel.com>wrote:
>> + if (!image->mt ||
>> + image->mt->region->tiling != I915_TILING_X) {
>> + /* The algorithm below is written only for X-tiled memory.
>> + *
>> + * Memcpy'ing to Y-tiled memory is likely faster when done the
>> + * conventional way: that is, map the memory through a fence in
>> order to
>> + * acquire a linear view of the memory, then memcpy one row at a
>> time.
>> + * Due to the arrangement of Y-tiled memory, if we were to upload
>> + * a Y-tiled buffer using the algorithm below, then we could only
>> copy
>> + * one byte at a time.
>>
>
> Nit pick: actually Y tiling only rearranges octwords, so you could copy 16
> bytes at a time for y tiling. Still might not be worth it but one of these
> days we should probably investigate.
In v3, I will remove the incorrect comments about Y-tiling, and thus reduce the
comment to just the first line: "The algorithm below is written only for X-tiled
memory."
>> + /* In the tiling algorithm below, some variables are in units of
>> pixels,
>> + * others are in units of bytes, and others (such as height) are
>> unitless.
>> + * Each variable name is either prefixed or suffixed with its units.
>> + */
>>
>
> Thank you for doing this--it makes the code a lot easier to follow. It
> looks like everything uses suffixing except pixel_max_x and pixel_max_y.
> Could we just change those to max_x_pixels and max_y_pixels? Also, for
> consistency it would be nice to rename tile_height to tile_height_pixels.
pixel_x, pixel_y, byte_offset_y, and byte_offset also used prefixing, just as
pixel_max_[xy]_pixels do. In v3, I will consistently use prefixing for all.
tile_height is unitless. For example,
right: tile_size_bytes = tile_width_bytes * tile_height
wrong: tile_size_bytes2 = tile_width_bytes * tile_height_bytes
>> + uint8_t *map = bo->virtual
>> + + image_offset_y * stride_bytes
>> + + image_offset_x * cpp;
>>
>
> I don't think this will work. The way the miplevel image offsets work is
> that the parts of the mipmap other than level/layer zero are offset within
> the existing X tiling pattern. What you're doing is offsetting the origin
> of the map assuming no tiling, and then applying the X tiling pattern to
> the offset image. In the best case you'll just wind up putting data at the
> wrong location. In the worst case there's a danger of going outside the
> memory allocated for the texture.
Ouch! Thanks for catching that. In v3, I will simply restrict this function to
level 0 of BGRA 2D textures, thus avoiding image offsets altogether. Such a
restriction won't diminish the benefit that this patch gives to Chrome OS
performance, and that's the main reason I wrote this patch. Sure, it would be
nice if this fastpath supported more texture image types, and we can add that
additional support in the future, but right now I just want to get concrete
improvements for Chrome OS commited to master.
>> + intptr_t byte_offset = byte_offset_y + byte_offset_x;
>> + if (intel->has_swizzling) {
>> + bool bit6 = (byte_offset >> 6) & 1;
>> + bool bit9 = (byte_offset >> 9) & 1;
>> + bool bit10 = (byte_offset >> 10) & 1;
>> +
>> + byte_offset += 64 * (bit9 ^ bit10) * (1 - 2 * bit6);
>>
>
> Would this be faster?
>
> byte_offset ^= ((byte_offset >> 3) ^ (byte_offset >> 4)) & (1 << 6);
Clever. I'll take it.
---
Thanks for the careful review.
More information about the mesa-dev
mailing list