[Mesa-dev] [PATCH v3 01/13] gallium: Basic compute interface.

Francisco Jerez currojerez at riseup.net
Tue May 8 07:08:42 PDT 2012


Tom Stellard <thomas.stellard at amd.com> writes:

> Hi,
>
> I've been testing these updated compute patches all week and they look
> good to me.  I don't think there are any outstanding complaints, so I'll
> give my ACK for merging these into master.
>
> Very nice work!
>
> -Tom Stellard
>
>
Thanks for your comments.  Does anyone else have anything to add before
I merge the gallium-compute branch to master?

>
> On Tue, May 01, 2012 at 05:27:39PM +0200, Francisco Jerez wrote:
>> Define an interface that exposes the minimal functionality required to
>> implement some of the popular compute APIs.  This commit adds entry
>> points to set the grid layout and other state required to keep track
>> of the usual address spaces employed in compute APIs, to bind a
>> compute program, and execute it on the device.
>> 
>> Reviewed-by: Marek Olšák <maraeo at gmail.com>
>> ---
>> v2: Add "start slot" argument to the resource binding driver hooks.
>> v3: Split sampler views from shader resources.
>> 
>>  src/gallium/docs/source/context.rst        |   39 +++++++++++++++
>>  src/gallium/docs/source/screen.rst         |   28 ++++++++++-
>>  src/gallium/include/pipe/p_context.h       |   73 ++++++++++++++++++++++++++++
>>  src/gallium/include/pipe/p_defines.h       |   19 +++++++-
>>  src/gallium/include/pipe/p_screen.h        |   12 +++++
>>  src/gallium/include/pipe/p_shader_tokens.h |    9 ++++
>>  src/gallium/include/pipe/p_state.h         |    7 +++
>>  7 files changed, 185 insertions(+), 2 deletions(-)
>> 
>> diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst
>> index b2872cd..cb9b8de 100644
>> --- a/src/gallium/docs/source/context.rst
>> +++ b/src/gallium/docs/source/context.rst
>> @@ -542,3 +542,42 @@ These flags control the behavior of a transfer object.
>>  ``PIPE_TRANSFER_FLUSH_EXPLICIT``
>>    Written ranges will be notified later with :ref:`transfer_flush_region`.
>>    Cannot be used with ``PIPE_TRANSFER_READ``.
>> +
>> +
>> +Compute kernel execution
>> +^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +A compute program can be defined, bound or destroyed using
>> +``create_compute_state``, ``bind_compute_state`` or
>> +``delete_compute_state`` respectively.
>> +
>> +Any of the subroutines contained within the compute program can be
>> +executed on the device using the ``launch_grid`` method.  This method
>> +will execute as many instances of the program as there are elements
>> +in the specified N-dimensional grid, hopefully in parallel.
>> +
>> +The compute program has access to four special resources:
>> +
>> +* ``GLOBAL`` represents a memory space shared among all the threads
>> +  running on the device.  An arbitrary buffer created with the
>> +  ``PIPE_BIND_GLOBAL`` flag can be mapped into it using the
>> +  ``set_global_binding`` method.
>> +
>> +* ``LOCAL`` represents a memory space shared among all the threads
>> +  running in the same working group.  The initial contents of this
>> +  resource are undefined.
>> +
>> +* ``PRIVATE`` represents a memory space local to a single thread.
>> +  The initial contents of this resource are undefined.
>> +
>> +* ``INPUT`` represents a read-only memory space that can be
>> +  initialized at ``launch_grid`` time.
>> +
>> +These resources use a byte-based addressing scheme, and they can be
>> +accessed from the compute program by means of the LOAD/STORE TGSI
>> +opcodes.
>> +
>> +In addition, normal texture sampling is allowed from the compute
>> +program: ``bind_compute_sampler_states`` may be used to set up texture
>> +samplers for the compute stage and ``set_compute_sampler_views`` may
>> +be used to bind a number of sampler views to it.
>> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
>> index 05f7e8f..5d8280a 100644
>> --- a/src/gallium/docs/source/screen.rst
>> +++ b/src/gallium/docs/source/screen.rst
>> @@ -110,7 +110,8 @@ The integer capabilities:
>>  * ``PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY``: This CAP describes
>>    a hw limitation.  If true, pipe_vertex_element::src_offset must always be
>>    aligned to 4.  If false, there are no restrictions on src_offset.
>> -
>> +* ``PIPE_CAP_COMPUTE``: Whether the implementation supports the
>> +  compute entry points defined in pipe_context and pipe_screen.
>>  
>>  
>>  .. _pipe_capf:
>> @@ -186,6 +187,29 @@ to be 0.
>>    samplers.
>>  
>>  
>> +.. _pipe_compute_cap:
>> +
>> +PIPE_COMPUTE_CAP_*
>> +^^^^^^^^^^^^^^^^^^
>> +
>> +Compute-specific capabilities. They can be queried using
>> +pipe_screen::get_compute_param.
>> +
>> +* ``PIPE_COMPUTE_CAP_GRID_DIMENSION``: Number of supported dimensions
>> +  for grid and block coordinates.  Value type: ``uint64_t``.
>> +* ``PIPE_COMPUTE_CAP_MAX_GRID_SIZE``: Maximum grid size in block
>> +  units.  Value type: ``uint64_t []``.
>> +* ``PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE``: Maximum block size in thread
>> +  units.  Value type: ``uint64_t []``.
>> +* ``PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE``: Maximum size of the GLOBAL
>> +  resource.  Value type: ``uint64_t``.
>> +* ``PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE``: Maximum size of the LOCAL
>> +  resource.  Value type: ``uint64_t``.
>> +* ``PIPE_COMPUTE_CAP_MAX_PRIVATE_SIZE``: Maximum size of the PRIVATE
>> +  resource.  Value type: ``uint64_t``.
>> +* ``PIPE_COMPUTE_CAP_MAX_INPUT_SIZE``: Maximum size of the INPUT
>> +  resource.  Value type: ``uint64_t``.
>> +
>>  .. _pipe_bind:
>>  
>>  PIPE_BIND_*
>> @@ -223,6 +247,8 @@ resources might be created and handled quite differently.
>>  * ``PIPE_BIND_SCANOUT``: A front color buffer or scanout buffer.
>>  * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
>>    process.
>> +* ``PIPE_BIND_GLOBAL``: A buffer that can be mapped into the global
>> +  address space of a compute program.
>>  
>>  .. _pipe_usage:
>>  
>> diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h
>> index 8b4a158..3c0b89e 100644
>> --- a/src/gallium/include/pipe/p_context.h
>> +++ b/src/gallium/include/pipe/p_context.h
>> @@ -63,6 +63,7 @@ struct pipe_vertex_element;
>>  struct pipe_video_buffer;
>>  struct pipe_video_decoder;
>>  struct pipe_viewport_state;
>> +struct pipe_compute_state;
>>  union pipe_color_union;
>>  union pipe_query_result;
>>  
>> @@ -141,6 +142,10 @@ struct pipe_context {
>>     void   (*bind_geometry_sampler_states)(struct pipe_context *,
>>                                            unsigned num_samplers,
>>                                            void **samplers);
>> +   void   (*bind_compute_sampler_states)(struct pipe_context *,
>> +                                         unsigned start_slot,
>> +                                         unsigned num_samplers,
>> +                                         void **samplers);
>>     void   (*delete_sampler_state)(struct pipe_context *, void *);
>>  
>>     void * (*create_rasterizer_state)(struct pipe_context *,
>> @@ -220,6 +225,10 @@ struct pipe_context {
>>                                        unsigned num_views,
>>                                        struct pipe_sampler_view **);
>>  
>> +   void (*set_compute_sampler_views)(struct pipe_context *,
>> +                                     unsigned start_slot, unsigned num_views,
>> +                                     struct pipe_sampler_view **);
>> +
>>     void (*set_vertex_buffers)( struct pipe_context *,
>>                                 unsigned num_buffers,
>>                                 const struct pipe_vertex_buffer * );
>> @@ -418,6 +427,70 @@ struct pipe_context {
>>      */
>>     struct pipe_video_buffer *(*create_video_buffer)( struct pipe_context *context,
>>                                                       const struct pipe_video_buffer *templat );
>> +
>> +   /**
>> +    * Compute kernel execution
>> +    */
>> +   /*@{*/
>> +   /**
>> +    * Define the compute program and parameters to be used by
>> +    * pipe_context::launch_grid.
>> +    */
>> +   void *(*create_compute_state)(struct pipe_context *context,
>> +				 const struct pipe_compute_state *);
>> +   void (*bind_compute_state)(struct pipe_context *, void *);
>> +   void (*delete_compute_state)(struct pipe_context *, void *);
>> +
>> +   /**
>> +    * Bind an array of buffers to be mapped into the address space of
>> +    * the GLOBAL resource.  Any buffers that were previously bound
>> +    * between [first, first + count - 1] are unbound after this call.
>> +    *
>> +    * \param first      first buffer to map.
>> +    * \param count      number of consecutive buffers to map.
>> +    * \param resources  array of pointers to the buffers to map, it
>> +    *                   should contain at least \a count elements
>> +    *                   unless it's NULL, in which case no new
>> +    *                   resources will be bound.
>> +    * \param handles    array of pointers to the memory locations that
>> +    *                   will be filled with the respective base
>> +    *                   addresses each buffer will be mapped to.  It
>> +    *                   should contain at least \a count elements,
>> +    *                   unless \a resources is NULL in which case \a
>> +    *                   handles should be NULL as well.
>> +    *
>> +    * Note that the driver isn't required to make any guarantees about
>> +    * the contents of the \a handles array being valid anytime except
>> +    * during the subsequent calls to pipe_context::launch_grid.  This
>> +    * means that the only sensible location handles[i] may point to is
>> +    * somewhere within the INPUT buffer itself.  This is done to
>> +    * accommodate implementations that lack virtual memory but
>> +    * nevertheless migrate buffers on the fly, leading to resource
>> +    * base addresses that change on each kernel invocation or are
>> +    * unknown to the pipe driver.
>> +    */
>> +   void (*set_global_binding)(struct pipe_context *context,
>> +                              unsigned first, unsigned count,
>> +                              struct pipe_resource **resources,
>> +                              uint32_t **handles);
>> +
>> +   /**
>> +    * Launch the compute kernel starting from instruction \a pc of the
>> +    * currently bound compute program.
>> +    *
>> +    * \a grid_layout and \a block_layout are arrays of size \a
>> +    * PIPE_COMPUTE_CAP_GRID_DIMENSION that determine the layout of the
>> +    * grid (in block units) and working block (in thread units) to be
>> +    * used, respectively.
>> +    *
>> +    * \a input will be used to initialize the INPUT resource, and it
>> +    * should point to a buffer of at least
>> +    * pipe_compute_state::req_input_mem bytes.
>> +    */
>> +   void (*launch_grid)(struct pipe_context *context,
>> +                       const uint *block_layout, const uint *grid_layout,
>> +                       uint32_t pc, const void *input);
>> +   /*@}*/
>>  };
>>  
>>  
>> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
>> index 8b6d00d..c4c217b 100644
>> --- a/src/gallium/include/pipe/p_defines.h
>> +++ b/src/gallium/include/pipe/p_defines.h
>> @@ -304,6 +304,7 @@ enum pipe_transfer_usage {
>>  #define PIPE_BIND_STREAM_OUTPUT        (1 << 11) /* set_stream_output_buffers */
>>  #define PIPE_BIND_CURSOR               (1 << 16) /* mouse cursor */
>>  #define PIPE_BIND_CUSTOM               (1 << 17) /* state-tracker/winsys usages */
>> +#define PIPE_BIND_GLOBAL               (1 << 18) /* set_global_binding */
>>  
>>  /* The first two flags above were previously part of the amorphous
>>   * TEXTURE_USAGE, most of which are now descriptions of the ways a
>> @@ -346,7 +347,8 @@ enum pipe_transfer_usage {
>>  #define PIPE_SHADER_VERTEX   0
>>  #define PIPE_SHADER_FRAGMENT 1
>>  #define PIPE_SHADER_GEOMETRY 2
>> -#define PIPE_SHADER_TYPES    3
>> +#define PIPE_SHADER_COMPUTE  3
>> +#define PIPE_SHADER_TYPES    4
>>  
>>  
>>  /**
>> @@ -477,6 +479,7 @@ enum pipe_cap {
>>     PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY = 65,
>>     PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY = 66,
>>     PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY = 67,
>> +   PIPE_CAP_COMPUTE = 68
>>  };
>>  
>>  /**
>> @@ -522,6 +525,20 @@ enum pipe_shader_cap
>>     PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS = 18
>>  };
>>  
>> +/**
>> + * Compute-specific implementation capabilities.  They can be queried
>> + * using pipe_screen::get_compute_param.
>> + */
>> +enum pipe_compute_cap
>> +{
>> +   PIPE_COMPUTE_CAP_GRID_DIMENSION,
>> +   PIPE_COMPUTE_CAP_MAX_GRID_SIZE,
>> +   PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE,
>> +   PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE,
>> +   PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE,
>> +   PIPE_COMPUTE_CAP_MAX_PRIVATE_SIZE,
>> +   PIPE_COMPUTE_CAP_MAX_INPUT_SIZE
>> +};
>>  
>>  /**
>>   * Composite query types
>> diff --git a/src/gallium/include/pipe/p_screen.h b/src/gallium/include/pipe/p_screen.h
>> index 45c441b..7ae7c9a 100644
>> --- a/src/gallium/include/pipe/p_screen.h
>> +++ b/src/gallium/include/pipe/p_screen.h
>> @@ -98,6 +98,18 @@ struct pipe_screen {
>>  			   enum pipe_video_profile profile,
>>  			   enum pipe_video_cap param );
>>  
>> +   /**
>> +    * Query a compute-specific capability/parameter/limit.
>> +    * \param param  one of PIPE_COMPUTE_CAP_x
>> +    * \param ret    pointer to a preallocated buffer that will be
>> +    *               initialized to the parameter value, or NULL.
>> +    * \return       size in bytes of the parameter value that would be
>> +    *               returned.
>> +    */
>> +   int (*get_compute_param)(struct pipe_screen *,
>> +			    enum pipe_compute_cap param,
>> +			    void *ret);
>> +
>>     struct pipe_context * (*context_create)( struct pipe_screen *,
>>  					    void *priv );
>>  
>> diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
>> index df2dd5e..9d08fde 100644
>> --- a/src/gallium/include/pipe/p_shader_tokens.h
>> +++ b/src/gallium/include/pipe/p_shader_tokens.h
>> @@ -166,6 +166,15 @@ struct tgsi_declaration_resource {
>>     unsigned ReturnTypeW : 6; /**< one of enum pipe_type */
>>  };
>>  
>> +/*
>> + * Special resources that don't need to be declared.  They map to the
>> + * GLOBAL/LOCAL/PRIVATE/INPUT compute memory spaces.
>> + */
>> +#define TGSI_RESOURCE_GLOBAL	0x7fff
>> +#define TGSI_RESOURCE_LOCAL	0x7ffe
>> +#define TGSI_RESOURCE_PRIVATE	0x7ffd
>> +#define TGSI_RESOURCE_INPUT	0x7ffc
>> +
>>  #define TGSI_IMM_FLOAT32   0
>>  #define TGSI_IMM_UINT32    1
>>  #define TGSI_IMM_INT32     2
>> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
>> index a459a56..74f4ebd 100644
>> --- a/src/gallium/include/pipe/p_state.h
>> +++ b/src/gallium/include/pipe/p_state.h
>> @@ -580,6 +580,13 @@ struct pipe_resolve_info
>>     unsigned mask; /**< PIPE_MASK_RGBA, Z, S or ZS */
>>  };
>>  
>> +struct pipe_compute_state
>> +{
>> +   const struct tgsi_token *tokens; /**< Compute program to be executed. */
>> +   unsigned req_local_mem; /**< Required size of the LOCAL resource. */
>> +   unsigned req_private_mem; /**< Required size of the PRIVATE resource. */
>> +   unsigned req_input_mem; /**< Required size of the INPUT resource. */
>> +};
>>  
>>  #ifdef __cplusplus
>>  }
>> -- 
>> 1.7.10
>> 
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev