[Mesa-dev] [PATCH v2 01/11] gallium: Basic compute interface.
Marek Olšák
maraeo at gmail.com
Mon Apr 2 08:13:52 PDT 2012
This looks good to me. Thank you.
Reviewed-by: Marek Olšák <maraeo at gmail.com>
Marek
On Sat, Mar 31, 2012 at 6:39 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> Define an interface that exposes the minimal functionality required to
> implement some of the popular compute APIs. This commit adds entry
> points to set the grid layout and other state required to keep track
> of the usual address spaces employed in compute APIs, to bind a
> compute program, and execute it on the device.
> ---
> v2: Add "start slot" argument to the resource binding driver hooks.
>
> src/gallium/docs/source/context.rst | 39 +++++++++++++++
> src/gallium/docs/source/screen.rst | 28 ++++++++++-
> src/gallium/include/pipe/p_context.h | 73 ++++++++++++++++++++++++++++
> src/gallium/include/pipe/p_defines.h | 21 +++++++-
> src/gallium/include/pipe/p_screen.h | 12 +++++
> src/gallium/include/pipe/p_shader_tokens.h | 9 ++++
> src/gallium/include/pipe/p_state.h | 7 +++
> 7 files changed, 186 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst
> index b2872cd..7da0aad 100644
> --- a/src/gallium/docs/source/context.rst
> +++ b/src/gallium/docs/source/context.rst
> @@ -542,3 +542,42 @@ These flags control the behavior of a transfer object.
> ``PIPE_TRANSFER_FLUSH_EXPLICIT``
> Written ranges will be notified later with :ref:`transfer_flush_region`.
> Cannot be used with ``PIPE_TRANSFER_READ``.
> +
> +
> +Compute kernel execution
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +A compute program can be defined, bound or destroyed using
> +``create_compute_state``, ``bind_compute_state`` or
> +``destroy_compute_state`` respectively.
> +
> +Any of the subroutines contained within the compute program can be
> +executed on the device using the ``launch_grid`` method. This method
> +will execute as many instances of the program as elements in the
> +specified N-dimensional grid, hopefully in parallel.
> +
> +The compute program has access to four special resources:
> +
> +* ``GLOBAL`` represents a memory space shared among all the threads
> + running on the device. An arbitrary buffer created with the
> + ``PIPE_BIND_GLOBAL`` flag can be mapped into it using the
> + ``set_global_binding`` method.
> +
> +* ``LOCAL`` represents a memory space shared among all the threads
> + running in the same working group. The initial contents of this
> + resource are undefined.
> +
> +* ``PRIVATE`` represents a memory space local to a single thread.
> + The initial contents of this resource are undefined.
> +
> +* ``INPUT`` represents a read-only memory space that can be
> + initialized at ``launch_grid`` time.
> +
> +These resources use a byte-based addressing scheme, and they can be
> +accessed from the compute program by means of the LOAD/STORE TGSI
> +opcodes.
> +
> +In addition, normal texture look-up is allowed from the compute
> +program: ``bind_compute_sampler_states`` may be used to set up texture
> +samplers for the compute stage and ``set_compute_sampler_views`` may
> +be used to bind a number of resources to it.
> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
> index 51d9464..6e1877a 100644
> --- a/src/gallium/docs/source/screen.rst
> +++ b/src/gallium/docs/source/screen.rst
> @@ -98,7 +98,8 @@ The integer capabilities:
> equivalent to a specific GLSL version. E.g. for GLSL 1.3, report 130.
> * ``PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION``: Whether quads adhere to
> the flatshade_first setting in ``pipe_rasterizer_state``.
> -
> +* ``PIPE_CAP_COMPUTE``: Whether the implementation supports the
> + compute entry points defined in pipe_context and pipe_screen.
>
>
> .. _pipe_capf:
> @@ -174,6 +175,29 @@ to be 0.
> samplers.
>
>
> +.. _pipe_compute_cap:
> +
> +PIPE_COMPUTE_CAP_*
> +^^^^^^^^^^^^^^^^^^
> +
> +Compute-specific capabilities. They can be queried using
> +pipe_screen::get_compute_param.
> +
> +* ``PIPE_COMPUTE_CAP_GRID_DIMENSION``: Number of supported dimensions
> + for grid and block coordinates. Value type: ``uint64_t``.
> +* ``PIPE_COMPUTE_CAP_MAX_GRID_SIZE``: Maximum grid size in block
> + units. Value type: ``uint64_t []``.
> +* ``PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE``: Maximum block size in thread
> + units. Value type: ``uint64_t []``.
> +* ``PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE``: Maximum size of the GLOBAL
> + resource. Value type: ``uint64_t``.
> +* ``PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE``: Maximum size of the LOCAL
> + resource. Value type: ``uint64_t``.
> +* ``PIPE_COMPUTE_CAP_MAX_PRIVATE_SIZE``: Maximum size of the PRIVATE
> + resource. Value type: ``uint64_t``.
> +* ``PIPE_COMPUTE_CAP_MAX_INPUT_SIZE``: Maximum size of the INPUT
> + resource. Value type: ``uint64_t``.
> +
> .. _pipe_bind:
>
> PIPE_BIND_*
> @@ -211,6 +235,8 @@ resources might be created and handled quite differently.
> * ``PIPE_BIND_SCANOUT``: A front color buffer or scanout buffer.
> * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
> process.
> +* ``PIPE_BIND_GLOBAL``: A buffer that can be mapped into the global
> + address space of a compute program.
>
> .. _pipe_usage:
>
> diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h
> index aaeeb81..fc3f723 100644
> --- a/src/gallium/include/pipe/p_context.h
> +++ b/src/gallium/include/pipe/p_context.h
> @@ -63,6 +63,7 @@ struct pipe_vertex_element;
> struct pipe_video_buffer;
> struct pipe_video_decoder;
> struct pipe_viewport_state;
> +struct pipe_compute_state;
> union pipe_color_union;
>
> /**
> @@ -140,6 +141,10 @@ struct pipe_context {
> void (*bind_geometry_sampler_states)(struct pipe_context *,
> unsigned num_samplers,
> void **samplers);
> + void (*bind_compute_sampler_states)(struct pipe_context *,
> + unsigned start_slot,
> + unsigned num_samplers,
> + void **samplers);
> void (*delete_sampler_state)(struct pipe_context *, void *);
>
> void * (*create_rasterizer_state)(struct pipe_context *,
> @@ -219,6 +224,10 @@ struct pipe_context {
> unsigned num_views,
> struct pipe_sampler_view **);
>
> + void (*set_compute_sampler_views)(struct pipe_context *,
> + unsigned start_slot, unsigned num_views,
> + struct pipe_sampler_view **);
> +
> void (*set_vertex_buffers)( struct pipe_context *,
> unsigned num_buffers,
> const struct pipe_vertex_buffer * );
> @@ -417,6 +426,70 @@ struct pipe_context {
> */
> struct pipe_video_buffer *(*create_video_buffer)( struct pipe_context *context,
> const struct pipe_video_buffer *templat );
> +
> + /**
> + * Compute kernel execution
> + */
> + /*@{*/
> + /**
> + * Define the compute program and parameters to be used by
> + * pipe_context::launch_grid.
> + */
> + void *(*create_compute_state)(struct pipe_context *context,
> + const struct pipe_compute_state *);
> + void (*bind_compute_state)(struct pipe_context *, void *);
> + void (*delete_compute_state)(struct pipe_context *, void *);
> +
> + /**
> + * Bind an array of buffers to be mapped into the address space of
> + * the GLOBAL resource. Any buffers that were previously bound
> + * between [first, first + count - 1] are unbound after this call.
> + *
> + * \param first first buffer to map.
> + * \param count number of consecutive buffers to map.
> + * \param resources array of pointers to the buffers to map, it
> + * should contain at least \a count elements
> + * unless it's NULL, in which case no new
> + * resources will be bound.
> + * \param handles array of pointers to the memory locations that
> + * will be filled with the respective base
> + * addresses each buffer will be mapped to. It
> + * should contain at least \a count elements,
> + * unless \a resources is NULL in which case \a
> + * handles should be NULL as well.
> + *
> + * Note that the driver isn't required to make any guarantees about
> + * the contents of the \a handles array being valid anytime except
> + * during the subsequent calls to pipe_context::launch_grid. This
> + * means that the only sensible location handles[i] may point to is
> + * somewhere within the INPUT buffer itself. This is so to
> + * accommodate implementations that lack virtual memory but
> + * nevertheless migrate buffers on the fly, leading to resource
> + * base addresses that change on each kernel invocation or are
> + * unknown to the pipe driver.
> + */
> + void (*set_global_binding)(struct pipe_context *context,
> + unsigned first, unsigned count,
> + struct pipe_resource **resources,
> + uint32_t **handles);
> +
> + /**
> + * Launch the compute kernel starting from instruction \a pc of the
> + * currently bound compute program.
> + *
> + * \a grid_layout and \a block_layout are arrays of size \a
> + * PIPE_COMPUTE_CAP_GRID_DIMENSION that determine the layout of the
> + * grid (in block units) and working block (in thread units) to be
> + * used, respectively.
> + *
> + * \a input will be used to initialize the INPUT resource, and it
> + * should point to a buffer of at least
> + * pipe_compute_state::req_input_mem bytes.
> + */
> + void (*launch_grid)(struct pipe_context *context,
> + const uint *block_layout, const uint *grid_layout,
> + uint32_t pc, const void *input);
> + /*@}*/
> };
>
>
> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
> index 889fc99..af36af3 100644
> --- a/src/gallium/include/pipe/p_defines.h
> +++ b/src/gallium/include/pipe/p_defines.h
> @@ -305,6 +305,7 @@ enum pipe_transfer_usage {
> #define PIPE_BIND_STREAM_OUTPUT (1 << 11) /* set_stream_output_buffers */
> #define PIPE_BIND_CURSOR (1 << 16) /* mouse cursor */
> #define PIPE_BIND_CUSTOM (1 << 17) /* state-tracker/winsys usages */
> +#define PIPE_BIND_GLOBAL (1 << 18) /* set_global_binding */
>
> /* The first two flags above were previously part of the amorphous
> * TEXTURE_USAGE, most of which are now descriptions of the ways a
> @@ -347,7 +348,8 @@ enum pipe_transfer_usage {
> #define PIPE_SHADER_VERTEX 0
> #define PIPE_SHADER_FRAGMENT 1
> #define PIPE_SHADER_GEOMETRY 2
> -#define PIPE_SHADER_TYPES 3
> +#define PIPE_SHADER_COMPUTE 3
> +#define PIPE_SHADER_TYPES 4
>
>
> /**
> @@ -473,7 +475,8 @@ enum pipe_cap {
> PIPE_CAP_VERTEX_COLOR_UNCLAMPED = 60,
> PIPE_CAP_VERTEX_COLOR_CLAMPED = 61,
> PIPE_CAP_GLSL_FEATURE_LEVEL = 62,
> - PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION = 63
> + PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION = 63,
> + PIPE_CAP_COMPUTE = 64
> };
>
> /**
> @@ -519,6 +522,20 @@ enum pipe_shader_cap
> PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS = 18
> };
>
> +/**
> + * Compute-specific implementation capability. They can be queried
> + * using pipe_screen::get_compute_param.
> + */
> +enum pipe_compute_cap
> +{
> + PIPE_COMPUTE_CAP_GRID_DIMENSION,
> + PIPE_COMPUTE_CAP_MAX_GRID_SIZE,
> + PIPE_COMPUTE_CAP_MAX_BLOCK_SIZE,
> + PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE,
> + PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE,
> + PIPE_COMPUTE_CAP_MAX_PRIVATE_SIZE,
> + PIPE_COMPUTE_CAP_MAX_INPUT_SIZE
> +};
>
> /**
> * Composite query types
> diff --git a/src/gallium/include/pipe/p_screen.h b/src/gallium/include/pipe/p_screen.h
> index 45c441b..7ae7c9a 100644
> --- a/src/gallium/include/pipe/p_screen.h
> +++ b/src/gallium/include/pipe/p_screen.h
> @@ -98,6 +98,18 @@ struct pipe_screen {
> enum pipe_video_profile profile,
> enum pipe_video_cap param );
>
> + /**
> + * Query a compute-specific capability/parameter/limit.
> + * \param param one of PIPE_COMPUTE_CAP_x
> + * \param ret pointer to a preallocated buffer that will be
> + * initialized to the parameter value, or NULL.
> + * \return size in bytes of the parameter value that would be
> + * returned.
> + */
> + int (*get_compute_param)(struct pipe_screen *,
> + enum pipe_compute_cap param,
> + void *ret);
> +
> struct pipe_context * (*context_create)( struct pipe_screen *,
> void *priv );
>
> diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
> index df2dd5e..9d08fde 100644
> --- a/src/gallium/include/pipe/p_shader_tokens.h
> +++ b/src/gallium/include/pipe/p_shader_tokens.h
> @@ -166,6 +166,15 @@ struct tgsi_declaration_resource {
> unsigned ReturnTypeW : 6; /**< one of enum pipe_type */
> };
>
> +/*
> + * Special resources that don't need to be declared. They map to the
> + * GLOBAL/LOCAL/PRIVATE/INPUT compute memory spaces.
> + */
> +#define TGSI_RESOURCE_GLOBAL 0x7fff
> +#define TGSI_RESOURCE_LOCAL 0x7ffe
> +#define TGSI_RESOURCE_PRIVATE 0x7ffd
> +#define TGSI_RESOURCE_INPUT 0x7ffc
> +
> #define TGSI_IMM_FLOAT32 0
> #define TGSI_IMM_UINT32 1
> #define TGSI_IMM_INT32 2
> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
> index 72ec04a..477ff3c 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -577,6 +577,13 @@ struct pipe_resolve_info
> unsigned mask; /**< PIPE_MASK_RGBA, Z, S or ZS */
> };
>
> +struct pipe_compute_state
> +{
> + const struct tgsi_token *tokens; /**< Compute program to be executed. */
> + unsigned req_local_mem; /**< Required size of the LOCAL resource. */
> + unsigned req_private_mem; /**< Required size of the PRIVATE resource. */
> + unsigned req_input_mem; /**< Required size of the INPUT resource. */
> +};
>
> #ifdef __cplusplus
> }
> --
> 1.7.9.2
>
More information about the mesa-dev
mailing list