[PATCH 02/13] glamor: Add glamor_program based copy acceleration

Tue May 13 10:09:15 PDT 2014

On 2014-05-13 17:34, Keith Packard wrote:
> Sure, if glsl had a 'round' function I'd use it in a second :-)

round() was added in GLSL 1.30. Since you already use uvec, which was
also added in GLSL 1.30, using it is fine.

>> I'm a bit worried about this loop order. We don't change the program at
>> all, so glamor_use_program will be almost a no-op. But we change the FBO
>> in the inner loop. Large textures are quite uncommon, but this sounds
>> like a waste of time to me.
> I'm not sure what you'd hope to save; we need to bind the source texture
> and set some of the uniforms for each source FBO. The only work
> glamor_use_program does other than that is to call glUseProgram, which
> we're going to assume short-circuits setting the program to the current
> value.

I hope to save some framebuffer switching. Since a framebuffer switch
needs much more validation than a texture bind or a uniform update, it
should be moved to the outer loop.
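
To make the argument concrete, here is a toy model that only counts binds
for the two loop orders. It is a sketch, not glamor code: count_binds and
struct bind_counts are hypothetical names, and it assumes one FBO bind per
destination iteration and one texture bind per source iteration.

```c
#include <assert.h>

/* Hypothetical counters for the two kinds of state change. */
struct bind_counts {
    unsigned fbo_binds;   /* destination framebuffer switches */
    unsigned tex_binds;   /* source texture binds */
};

/* Count the binds for n_dst destination FBOs and n_src source textures.
 * dst_outer != 0 puts the destination FBO loop on the outside. */
static struct bind_counts
count_binds(unsigned n_dst, unsigned n_src, int dst_outer)
{
    struct bind_counts c = { 0, 0 };
    unsigned d, s;

    if (dst_outer) {
        for (d = 0; d < n_dst; d++) {
            c.fbo_binds++;              /* switch FBO once per destination */
            for (s = 0; s < n_src; s++)
                c.tex_binds++;          /* rebind texture for every pair */
        }
    } else {
        for (s = 0; s < n_src; s++) {
            c.tex_binds++;              /* bind texture once per source */
            for (d = 0; d < n_dst; d++)
                c.fbo_binds++;          /* switch FBO for every pair */
        }
    }
    return c;
}
```

With the destination loop outside, the number of framebuffer switches
drops from n_dst * n_src to n_dst, at the price of more texture binds,
which should be the cheaper of the two.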

>> So there is still no util function for this scissor box handling?
> Do you want something that takes x/y/w/h?

I was thinking more of a helper that loops over the boxes.

>> iirc we always have an element buffer bound for quads, so we could use
>> glDrawElements instead of this for loop.
> Except that GLES doesn't have quads...

So that's what the element buffer is for. Just emit 6 indices per quad,
as two triangles, and you'll get your quads :)
0 1 2  0 2 3   4 5 6  4 6 7   ...
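
A minimal sketch of how such an index list could be generated, assuming
16-bit indices and 4 vertices per quad. fill_quad_indices is a
hypothetical helper name, not an existing glamor function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `indices` with 6 entries per quad: two triangles (0 1 2 and 0 2 3),
 * offset by 4 vertices for each following quad. The caller must provide
 * room for 6 * nquads entries. */
static void
fill_quad_indices(uint16_t *indices, size_t nquads)
{
    static const uint16_t pattern[6] = { 0, 1, 2, 0, 2, 3 };
    size_t q, i;

    assert(indices != NULL || nquads == 0);
    /* 16-bit indices can address at most 65536 vertices, 4 per quad. */
    assert(nquads <= 65536 / 4);

    for (q = 0; q < nquads; q++)
        for (i = 0; i < 6; i++)
            indices[q * 6 + i] = (uint16_t)(q * 4 + pattern[i]);
}
```

With these indices in the bound element buffer, all boxes can go out in
one call such as glDrawElements(GL_TRIANGLES, 6 * nquads,
GL_UNSIGNED_SHORT, NULL), which also works on GLES.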

>> What would be the "last second"? I guess the first copy is always fine
>> and the second one will fail. So we always have to stall and read back
>> a texture. imo it doesn't matter that much to make a gpu based copy
>> first. In the end, it's likely faster for FB to copy without
>> overlapping.
> I'm not sure I understand your comment here; the 'last second' comment I
> stuck in there related to a copy under a plane mask, which will never
> work, and so we want to not create the temporary surface only to discard
> it and do an fb fallback for the planemask.

What I meant is that we don't have to discard the temporary copy right
away. The fb fallback can still copy from it. That may have some
advantages, but I doubt it.

>> I think a comment is required here explaining why we have to call
>> glTextureBarrierNV at all, e.g. what happens when we get two copy calls
>> in a row where neither overlaps itself, but they overlap each other.
>> The second call might then want to read the result of the first copy,
>> which isn't allowed without this barrier.
> 
> Eric provided an updated comment for this; does this have enough detail?
> 
> /**
>  * Returns TRUE if the copy has to be implemented with
>  * glamor_copy_fbo_fbo_temp() instead of glamor_copy_fbo_fbo().
>  *
>  * If the src and dst are in the same pixmap, then glamor_copy_fbo_fbo()'s
>  * sampling would give undefined results (since the same texture would be
>  * bound as an FBO destination and as a texture source).  However, if we
>  * have GL_NV_texture_barrier, we can take advantage of the exception it
>  * added:
>  *
>  *    "- If a texel has been written, then in order to safely read the result
>  *       a texel fetch must be in a subsequent Draw separated by the command
>  *
>  *       void TextureBarrierNV(void);
>  *
>  *    TextureBarrierNV() will guarantee that writes have completed and caches
>  *    have been invalidated before subsequent Draws are executed."
>  */

This comment doesn't describe why we have to call glTextureBarrierNV at
all when the copy doesn't overlap. We only need it for multiple X11 copy
calls.