[PATCH 02/13] glamor: Add glamor_program based copy acceleration

Tue May 13 08:34:08 PDT 2014

Markus Wick <markus at selfnet.de> writes:

> I think
> uvec4(round(texture2D(sampler, fill_pos) * bitmul))
> is easier to read than to add 0.5. Honestly GLSL lacks rounding + 
> convertion functions :/

Sure, if glsl had a 'round' function I'd use it in a second :-)

> This logic should be in fbCopyNtoN imo.

I think it would be called fbCopyNtoM to indicate that the two values
were different?

> Why didn't you use instancing this time? We'd be able to memcpy here.

I still need that un-instanced version for GLES, and as we expect to see
precisely one rectangle most of the time, the benefits of instancing are
pretty much washed out by the overhead of the rest of the code. The only
case where you'll see multiple rectangles is under a complex clip, which
is pretty rare, and didn't seem to justify two separate code paths here.

> I'm a bit worried about this loop order. We don't change the programm at 
> all, so glamor_use_program will be almost a no-op. But we change the FBO 
> in the inner loop. Large textures are quite uncommon, but this sounds 
> like a waste of time for me.

I'm not sure what you'd hope to save; we need to bind the source texture
and set some of the uniforms for each source FBO. The only work
glamor_use_program does other than that is to call glUseProgram, which
we're  going to assume short-circuits setting the program to the current
value.

> So there is still no util function for this scissor box handling?

Do you want something that takes x/y/w/h?

> iirc we always have an element buffer bound for quads, so we could use 
> glDrawElements instead of this for loop.

Except that GLES doesn't have quads...

> Outdated comment, glCopyPixels isn't used any more

Good catch -- Eric also mentioned this and sent an updated version of
the comment which I've used in recent updates:

/**
 * Copies from the GPU to the GPU using a temporary pixmap in between,
 * to correctly handle overlapping copies.
 */

>> +static Bool
>> +glamor_copy_fbo_fbo_temp(DrawablePtr src,
> ...
>> +    /* Sanity check state to avoid getting halfway through and bailing
>> +     * at the last second. Might be nice to have checks that didn't
>> +     * involve setting state.
>> +     */
>> +    glamor_make_current(glamor_priv);
>> +
>> +    if (gc && !glamor_set_planemask(dst_pixmap, gc->planemask))
>> +        goto bail_ctx;
>> +
>> +    if (!glamor_set_alu(screen, gc ? gc->alu : GXcopy))
>> +        goto bail_ctx;
>> +    glDisable(GL_COLOR_LOGIC_OP);
>
> What would be the "last second"? I guess the first copy is always fine 
> and the second one will fail. So we always have to stall and to readback 
> a texture. imo it doesn't matter that much to make a gpu based copy 
> first. In the end, it's likely faster for FB to copy without 
> overlapping.

I'm not sure I understand your comment here; the 'last second' comment I
stuck in there related to a copy under a plane mask, which will never
work, and so we want to not create the temporary surface only to discard
it and do an fb fallback for the planemask.

>
>> +static Bool
>> +glamor_copy_needs_temp(DrawablePtr src,
> ...
>> +    glTextureBarrierNV();
>
> I think a comment is required here why we have to call 
> glTextureBarrierNV at all. eg what happens when we get two copy calls in 
> a row, both doesn't overlap, but they overlap each other. So the second 
> call might want to read the result of the first copy which isn't allowed 
> without this barrier.

Eric provided an updated comment for this; does this have enough detail?

/**
 * Returns TRUE if the copy has to be implemented with
 * glamor_copy_fbo_fbo_temp() instead of glamor_copy_fbo_fbo().
 *
 * If the src and dst are in the same pixmap, then glamor_copy_fbo_fbo()'s
 * sampling would give undefined results (since the same texture would be
 * bound as an FBO destination and as a texture source).  However, if we
 * have GL_NV_texture_barrier, we can take advantage of the exception it
 * added:
 *
 *    "- If a texel has been written, then in order to safely read the result
 *       a texel fetch must be in a subsequent Draw separated by the command
 *
 *       void TextureBarrierNV(void);
 *
 *    TextureBarrierNV() will guarantee that writes have completed and caches
 *    have been invalidated before subsequent Draws are executed."
 */

> I'm very happy to see this copy code without any deprecated api usage
> :)

Turned out to be faster on my machines to just make a copy...

> Reviewed-by: Markus Wick <markus at selfnet.de>

Thanks much!

Let me know if you have other comments on this patch

-- 
keith.packard at intel.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 810 bytes
Desc: not available
URL: <http://lists.x.org/archives/xorg-devel/attachments/20140513/452741ca/attachment.sig>