[Mesa-dev] [PATCH] clover: Pass buffer offsets to the driver in set_global_binding()

Francisco Jerez currojerez at riseup.net
Tue Feb 18 13:47:29 PST 2014


Tom Stellard <tom at stellard.net> writes:

> On Sat, Feb 15, 2014 at 12:22:45PM +0100, Francisco Jerez wrote:
>> Tom Stellard <tom at stellard.net> writes:
>> 
>> > From: Tom Stellard <thomas.stellard at amd.com>
>> >
>> > The offsets will be stored in the handles parameter.  This makes
>> > it possible to use sub-buffers.
>> > ---
>> >  src/gallium/drivers/r600/evergreen_compute.c      |  2 +-
>> >  src/gallium/drivers/radeonsi/si_compute.c         |  1 +
>> >  src/gallium/include/pipe/p_context.h              |  9 ++++++---
>> >  src/gallium/state_trackers/clover/core/kernel.cpp | 11 +++++++++--
>> >  4 files changed, 17 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
>> > index 70efe5c..ee00a24 100644
>> > --- a/src/gallium/drivers/r600/evergreen_compute.c
>> > +++ b/src/gallium/drivers/r600/evergreen_compute.c
>> > @@ -665,7 +665,7 @@ static void evergreen_set_global_binding(
>> >  		assert(resources[i]->target == PIPE_BUFFER);
>> >  		assert(resources[i]->bind & PIPE_BIND_GLOBAL);
>> >  
>> > -		*(handles[i]) = buffers[i]->chunk->start_in_dw * 4;
>> > +		*(handles[i]) += buffers[i]->chunk->start_in_dw * 4;
>> >  	}
>> >  
>> >  	evergreen_set_rat(ctx->cs_shader_state.shader, 0, pool->bo, 0, pool->size_in_dw * 4);
>> > diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
>> > index a7f49e7..e0e6bb4 100644
>> > --- a/src/gallium/drivers/radeonsi/si_compute.c
>> > +++ b/src/gallium/drivers/radeonsi/si_compute.c
>> > @@ -109,6 +109,7 @@ static void si_set_global_binding(
>> >  		uint64_t va;
>> >  		program->global_buffers[i] = resources[i];
>> >  		va = r600_resource_va(ctx->screen, resources[i]);
>> > +		va += *handles[i];
>> >  		memcpy(handles[i], &va, sizeof(va));
>> >  	}
>> >  }
>> > diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h
>> > index 8ef6e27..eef0a63 100644
>> > --- a/src/gallium/include/pipe/p_context.h
>> > +++ b/src/gallium/include/pipe/p_context.h
>> > @@ -461,9 +461,12 @@ struct pipe_context {
>> >      *                   resources will be bound.
>> >      * \param handles    array of pointers to the memory locations that
>> >      *                   will be filled with the respective base
>> > -    *                   addresses each buffer will be mapped to.  It
>> > -    *                   should contain at least \a count elements,
>> > -    *                   unless \a resources is NULL in which case \a
>> > +    *                   addresses each buffer will be mapped to.  When
>> > +    *                   this function is called, the value at these memory
>> > +    *                   locations will be the offset in bytes from
>> > +    *                   the start of the buffer that should be used to
>> > +    *                   compute the handle.  It should contain at least \a count
>> > +    *                   elements, unless \a resources is NULL in which case \a
>> >      *                   handles should be NULL as well.
>> >      *
>> 
>> Maybe something like this would be easier to understand?
>> 
>> |    * \param handles    array of pointers to the memory locations that
>> |    *                   will be updated with the address each buffer
>> |    *                   will be mapped to.  The base memory address of
>> |    *                   each of the buffers will be added to the value
>> |    *                   pointed to by its corresponding handle to form
>> |    *                   the final address argument.  It should contain
>> |    *                   at least \a count elements, unless \a
>> |    *                   resources is NULL in which case \a handles
>> |    *                   should be NULL as well.
>> 
>> >      * Note that the driver isn't required to make any guarantees about
>> > diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp
>> > index 6d894cd..06eff85 100644
>> > --- a/src/gallium/state_trackers/clover/core/kernel.cpp
>> > +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
>> > @@ -337,8 +337,15 @@ kernel::global_argument::bind(exec_context &ctx,
>> >     align(ctx.input, marg.target_align);
>> >  
>> >     if (buf) {
>> > -      ctx.g_handles.push_back(allocate(ctx.input, marg.target_size));
>> > -      ctx.g_buffers.push_back(buf->resource(*ctx.q).pipe);
>> > +      resource &r = buf->resource(*ctx.q);
>> 
>> This could be 'const resource &'.
>> 
>> > +      ctx.g_handles.push_back(ctx.input.size());
>> > +      ctx.g_buffers.push_back(r.pipe);
>> > +
>> > +      // XXX: How to handle multi-demensional offsets?
>> 
>> They're not an issue, you can safely assume that the offset is
>> one-dimensional because CL doesn't support sub-images, only sub-buffers.
>> 
>> > +      assert(!r.offset[1] && !r.offset[2]);
>> > +      std::vector<uint8_t> offset = bytes(r.offset[0]);
>> 
>> 'auto v' for consistency?
>> 
>> > +      extend(offset, marg.ext_type, marg.target_size);
>> 
>> We should call byteswap() here.
>
> I don't think we should call byteswap here.  This value needs to be in
> host-endian order, because the driver will use it to compute the handle.
>

The driver should be doing device-endian operations to manipulate the
handle.  After all, the value it points to is contained in an input
buffer which is going to be read by the GPU eventually, so it should be
stored in the byte order that the GPU wants.  Sure, we could agree to
store global pointer arguments temporarily in host-endian order until
the driver comes and fixes it up, but mixing both orderings depending on
the semantics of the argument and whether or not the driver has been
called seems awkward to me.

Thanks.

> -Tom
>
>> 
>> > +      insert(ctx.input, offset);
>> >     } else {
>> >        // Null pointer.
>> >        allocate(ctx.input, marg.target_size);
>> 
>> Would you mind fixing constant sub-buffers too?  My plan was to pass the
>> offset in the low bits of the input argument, like:
>> 'bytes(ctx.resources.size() << 24 | r.offset[0])'.  They should be even
>> easier than global sub-buffers.
>> 
>> Thanks.
>> 
>> > -- 
>> > 1.8.1.5
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 229 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140218/8fd9347d/attachment-0001.pgp>


More information about the mesa-dev mailing list