[Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
Michel Dänzer
michel at daenzer.net
Sat Apr 16 05:51:15 UTC 2016
On 16.04.2016 11:39, Tom Stellard wrote:
> The ds_bpermute instruction allows threads to transfer data directly
> to or from the vgprs of other threads. These instructions use the lds
> hardware to transfer data, but do not read or write lds memory.
>
> DDX BEFORE:                       | DDX AFTER:
>                                   |
> v_mbcnt_lo_u32_b32_e64 v2, -1, 0  | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
> v_mbcnt_hi_u32_b32_e64 v2, -1, v2 | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
> v_lshlrev_b32_e32 v4, 2, v2       | v_and_b32_e32 v2, 0x3ffffffc, v2
> v_and_b32_e32 v2, -4, v2          | v_lshlrev_b32_e32 v2, 2, v2
> v_lshlrev_b32_e32 v3, 2, v2       | ds_bpermute_b32 v3, v2, v0
> s_mov_b32 m0, -1                  | ds_bpermute_b32 v0, v2, v0 offset:4
> ds_write_b32 v4, v0               | s_waitcnt lgkmcnt(0)
> s_waitcnt lgkmcnt(0)              |
> v_or_b32_e32 v0, 1, v2            |
> v_lshlrev_b32_e32 v0, 2, v0       |
> ds_read_b32 v1, v3                |
> ds_read_b32 v0, v0                |
> s_waitcnt lgkmcnt(0)              |
>                                   |
> LDS: 1 blocks                     | LDS: 0 blocks

Nice.

Were these intrinsics already available in LLVM 3.6? If not, the old
code needs to be kept for backwards compatibility.

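For readers following the quoted listings, the lane addressing can be modelled in a few lines of Python. This is only a rough sketch under stated assumptions, not the driver or hardware implementation: it assumes a 64-lane wavefront with every lane active, and it assumes ds_bpermute_b32 makes each lane read the source value of lane ((addr + offset) / 4) mod 64. The function names and the final subtraction are illustrative; the subtraction is not part of the quoted listings.

```python
WAVE_SIZE = 64  # assumption: 64 lanes per wavefront, all active


def ds_bpermute(addr, src, offset=0):
    """Simplified model of ds_bpermute_b32: lane i reads src from the
    lane selected by its own byte address (plus the immediate offset)."""
    return [src[((addr[i] + offset) // 4) % WAVE_SIZE]
            for i in range(WAVE_SIZE)]


def ddx_operands_bpermute(values):
    """Model of the 'DDX AFTER' column."""
    # v_and_b32 + v_lshlrev_b32: byte address of the first lane of the quad
    addr = [(lane & 0x3ffffffc) << 2 for lane in range(WAVE_SIZE)]
    left = ds_bpermute(addr, values)             # ds_bpermute_b32 v3, v2, v0
    right = ds_bpermute(addr, values, offset=4)  # ... offset:4
    return left, right


def ddx_operands_lds(values):
    """Model of the 'DDX BEFORE' column, going through LDS memory."""
    lds = {}
    for lane in range(WAVE_SIZE):
        lds[lane << 2] = values[lane]            # ds_write_b32 v4, v0
    left = [lds[(lane & -4) << 2] for lane in range(WAVE_SIZE)]
    right = [lds[((lane & -4) | 1) << 2] for lane in range(WAVE_SIZE)]
    return left, right


# Hypothetical per-lane input values, chosen only for illustration.
values = [float(lane * lane) for lane in range(WAVE_SIZE)]

# Both sequences should hand every lane the same pair of operands.
assert ddx_operands_bpermute(values) == ddx_operands_lds(values)

left, right = ddx_operands_bpermute(values)
# Assumed final step (not shown in the quoted listings): the difference
# between the second and first lane of each quad.
ddx = [r - l for l, r in zip(left, right)]
```

In this model, every lane of a quad ends up with the same pair of operands (lanes 0 and 1 of that quad), and the only difference between the two columns is whether the values pass through LDS memory on the way.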
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer