[Mesa-dev] [PATCH] nv50/ir: do not perform global membar for shared memory
Ilia Mirkin
imirkin at alum.mit.edu
Mon Oct 24 14:35:01 UTC 2016
On Mon, Oct 24, 2016 at 10:29 AM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:
> Shared memory is local to the CTA, so we only need to wait for
> prior memory writes to become visible to other threads in the
> same CTA, not at the global level. This should speed up
> compute shaders that use shared memory.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
> src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index b47fc49..621a468 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -3561,12 +3561,15 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
> geni->subOp = tgsi::opcodeToSubOp(tgsi.getOpcode());
> break;
> case TGSI_OPCODE_MEMBAR:
> + {
> + uint32_t level = tgsi.getSrc(0).getValueU32(0, info);
> geni = mkOp(OP_MEMBAR, TYPE_NONE, NULL);
> geni->fixed = 1;
> - if (tgsi.getSrc(0).getValueU32(0, info) & TGSI_MEMBAR_THREAD_GROUP)
> + if ((level & TGSI_MEMBAR_THREAD_GROUP) || level == TGSI_MEMBAR_SHARED)
Probably just level & (TGSI_MEMBAR_THREAD_GROUP | TGSI_MEMBAR_SHARED)
> geni->subOp = NV50_IR_SUBOP_MEMBAR(M, CTA);
> else
> geni->subOp = NV50_IR_SUBOP_MEMBAR(M, GL);
> + }
> break;
> case TGSI_OPCODE_ATOMUADD:
> case TGSI_OPCODE_ATOMXCHG:
> --
> 2.10.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
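The patch's condition and the suggested simplification can be compared with a small C++ sketch. It mirrors only the scope-selection logic, not the nv50 IR. The MEMBAR_* constants and their bit values, the Scope enum, and the two function names are hypothetical stand-ins for the TGSI_MEMBAR_* flags and the NV50_IR_SUBOP_MEMBAR(M, CTA/GL) scopes, chosen for illustration. The checks are static_asserts, so the file verifies itself at compile time.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the TGSI_MEMBAR_* flags. The bit values are
// assumed for illustration; check them against the real TGSI headers.
constexpr uint32_t MEMBAR_SHADER_BUFFER = 1u << 0;
constexpr uint32_t MEMBAR_ATOMIC_BUFFER = 1u << 1;
constexpr uint32_t MEMBAR_SHADER_IMAGE  = 1u << 2;
constexpr uint32_t MEMBAR_SHARED        = 1u << 3;
constexpr uint32_t MEMBAR_THREAD_GROUP  = 1u << 4;

// Hypothetical model of the two membar scopes the converter can emit.
enum class Scope { CTA, GL };

// Condition as written in the patch: CTA scope for any thread-group
// barrier, or for a barrier that covers shared memory and nothing else.
constexpr Scope scopeFromPatch(uint32_t level) {
  return ((level & MEMBAR_THREAD_GROUP) || level == MEMBAR_SHARED)
             ? Scope::CTA
             : Scope::GL;
}

// Suggested simplification: CTA scope if either flag is set at all.
constexpr Scope scopeFromReview(uint32_t level) {
  return (level & (MEMBAR_THREAD_GROUP | MEMBAR_SHARED)) ? Scope::CTA
                                                         : Scope::GL;
}

// Both forms agree on a shared-only barrier and on thread-group barriers.
static_assert(scopeFromPatch(MEMBAR_SHARED) == Scope::CTA, "patch: shared");
static_assert(scopeFromReview(MEMBAR_SHARED) == Scope::CTA, "review: shared");
static_assert(scopeFromPatch(MEMBAR_THREAD_GROUP | MEMBAR_SHADER_BUFFER) ==
                  Scope::CTA, "patch: thread group");
static_assert(scopeFromReview(MEMBAR_THREAD_GROUP | MEMBAR_SHADER_BUFFER) ==
                  Scope::CTA, "review: thread group");

// They differ when SHARED is combined with a global memory type.
static_assert(scopeFromPatch(MEMBAR_SHARED | MEMBAR_SHADER_BUFFER) ==
                  Scope::GL, "patch keeps GL");
static_assert(scopeFromReview(MEMBAR_SHARED | MEMBAR_SHADER_BUFFER) ==
                  Scope::CTA, "review picks CTA");
```

The two forms agree for a shared-only barrier, but the simplified form would also select the CTA scope when SHARED is combined with a buffer or image flag. Such a barrier presumably still needs the global scope, and the patch's level == TGSI_MEMBAR_SHARED test keeps GL in that case.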