[Bug 76862] clamp with bounds inside [0, 1] generates slow code

Tue Jun 3 18:45:11 PDT 2014

https://bugs.freedesktop.org/show_bug.cgi?id=76862

--- Comment #2 from Matt Turner <mattst88 at gmail.com> ---
Ian's description has bugs, so let me try again:

GLSL's clamp(A, B, C) clamps A to a lower bound of B and an upper bound of C.
We implement this in the compiler with min() and max() operations: min(max(a,
b), c).
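
As a minimal sketch in Python (an illustration of the min/max lowering
described above, not the compiler's actual code; the function name is just
for this example):

```python
def clamp(a, b, c):
    """Clamp a to the lower bound b and the upper bound c, as min(max(a, b), c)."""
    return min(max(a, b), c)


# Values below, inside, and above the range [0.25, 0.75].
print(clamp(0.0, 0.25, 0.75))
print(clamp(0.5, 0.25, 0.75))
print(clamp(1.0, 0.25, 0.75))
```

The inner max() corresponds to the lower bound and the outer min() to the
upper bound, which is the same order as the two sel instructions.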

i965 assembly for clamp(A, B, C) would look like

(select from A, B the argument that is greater than or equal; i.e., max)
sel.ge tmp, A, B
(select from tmp, C the argument that is less than; i.e., min)
sel.l  dst, tmp, C

Saturate is a special case of clamp, specifically when the bounds are 0.0 to
1.0 (for floating point types). Probably all GPUs can perform saturate for free
-- it's a destination modifier in i965 assembly.

The i965 backend's try_emit_saturate() function recognizes min(max(a, 0.0),
1.0) as a saturate operation, and sets the saturate modifier (or emits a MOV
instruction with saturate).

i965 assembly for clamp(A, 0.0, 1.0) (after try_emit_saturate()) would turn
into

mov.sat dst, A
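
To make the pattern concrete, here is a toy matcher in Python. It is only a
sketch of the idea and not the code of try_emit_saturate(): the tuple-based
expression encoding and the name match_saturate are assumptions made up for
this example.

```python
def match_saturate(expr):
    """Return x if expr has the shape min(max(x, 0.0), 1.0), else None.

    Expressions are hypothetical nested tuples of the form (op, arg0, arg1);
    leaves are variable names (strings) or float constants.
    """
    if not (isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "min"):
        return None
    _, inner, upper = expr
    if upper != 1.0:
        return None
    if not (isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "max"):
        return None
    _, operand, lower = inner
    if lower != 0.0:
        return None
    return operand


# min(max(a, 0.0), 1.0) matches and yields the operand to saturate.
print(match_saturate(("min", ("max", "a", 0.0), 1.0)))
# min(max(a, 0.1), 1.0) is not a plain saturate, so it does not match.
print(match_saturate(("min", ("max", "a", 0.1), 1.0)))
```

This toy version only accepts the constants in the exact positions shown; it
says nothing about which operand orders the real function handles.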

The proposed optimization idea here is that for immediate arguments that
satisfy the condition in comment #0, we can emit a single min/max instruction
with a saturate modifier instead of a min and a max instruction.

So, clamp(A, 0, 0.5) would be

(select the least of A and 0.5, saturate the result, and store in dst)
sel.l.sat dst, A, 0.5

and similarly clamp(A, 0.1, 1) -> sel.ge.sat dst, A, 0.1.
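
A quick way to convince yourself that the two rewrites are valid is to
compare them against the plain min/max form. The Python sketch below does
that for a handful of inputs. It assumes the condition from comment #0 (not
quoted here) amounts to one bound being exactly 0.0 or 1.0 and the other
lying in [0, 1]; the function names are made up for the example, and NaN
inputs are not considered.

```python
def clamp(a, b, c):
    """Reference form: min(max(a, b), c)."""
    return min(max(a, b), c)


def saturate(x):
    """Model of the saturate destination modifier: clamp to [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)


def clamp_zero_to(a, upper):
    """Model of 'sel.l.sat dst, A, upper': saturate(min(a, upper))."""
    return saturate(min(a, upper))


def clamp_to_one(a, lower):
    """Model of 'sel.ge.sat dst, A, lower': saturate(max(a, lower))."""
    return saturate(max(a, lower))


samples = [-1.0, 0.0, 0.05, 0.1, 0.3, 0.5, 0.75, 1.0, 2.0]
bounds = [0.0, 0.1, 0.5, 1.0]  # each bound lies inside [0, 1]

# True if every rewritten form agrees with the reference clamp.
all_equal = all(
    clamp_zero_to(a, bnd) == clamp(a, 0.0, bnd)
    and clamp_to_one(a, bnd) == clamp(a, bnd, 1.0)
    for a in samples
    for bnd in bounds
)
print(all_equal)
```

The saturate supplies whichever bound is exactly 0.0 or 1.0, so only the
other bound needs an explicit min or max.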
