[Mesa-dev] [PATCH 0/3] cl workdim v2

Tue Sep 2 05:36:44 PDT 2014

Jan Vesely <jan.vesely at rutgers.edu> writes:

> On Sat, 2014-08-16 at 13:13 +0300, Francisco Jerez wrote:
>> Jan Vesely <jan.vesely at rutgers.edu> writes:
>> 
>> > On Thu, 2014-08-07 at 16:02 +0300, Francisco Jerez wrote:
>> >> Jan Vesely <jan.vesely at rutgers.edu> writes:
>> >> 
>> >> > This respin includes Francisco's approach of providing implicit
>> >> > in the arg vector passed from clover, and Tom's idea of appending
>> >> > implicit args after the kernel args.
>> >> >
>> >> 
>> >> Hmmm...  Maybe it would make sense to add some sort of versioning
>> >> (e.g. as part of the target triple) to the binary interface between
>> >> clover and the kernel instead, so we can handle this sort of
>> >> non-backwards compatible changes and the compiler back-end and libclc
>> >> have some way to find out whether some specific feature is available and
>> >> e.g. some specific extension should be enabled.
>> >> 
>> >> > I assumed it's not safe to modify exec.input, so the input vector is copied
>> >> > before appending work dim.
>> >> >
>> >> 
>> >> Why wouldn't it be safe?  You just need to make sure they're appended
>> >> before the compute state is created.
>> >
>> > I thought there might be a problem when called from multiple threads,
>> > but it looks like most of the vars are local to the current call anyway.
>> >
>> > I looked at the code a bit better, and need a bit of help with what the
>> > proffered approach would be.
>> >
>> > exec_context::bind() appends all kernel args to the input vector. If the
>> > implicit args are added before bind() it shifts all other args, which is
>> > not what we want.
>> > if the implicit args are appended after, they are not accounted for in
>> > shader->input_size (and not copied by the driver).
>> >
>> > my current code modifies exec_context::bind() to preserve the content of
>> > input before binding kernel args, and append the old content after the
>> > args are bound.
>> > I have also considered passing and implicit args vector to
>> > exec_context::bind to make the trick more visible.
>> >
>> > Turning workdim into a proper arg in _args does not work either, because
>> > it is not present in module args.
>> >
>> > any thoughts?
>> >
>> 
>> I finally had a chance to take a closer look at your series.  It looks
>> like you're right: In order to implement my proposal cleanly, implicit
>> arguments would have to be part of the _args array so the compiler would
>> have to include them in the module argument lists with memory layout
>> parameters (e.g. alignment, size) suitable for the hardware, so there's
>> probably little benefit compared to your original approach that includes
>> the number of dimensions as an additional launch_grid() parameter.
>> 
>> So we don't have to change it again, can you add another array parameter
>> for the base grid offset?  That's another thing we don't pass through
>> the pipe driver API currently and CL requires. The prototype of
>> launch_grid could look like:
>> 
>> | void (*launch_grid)(struct pipe_context *context, uint dims,
>> |                     const uint *block_layout, const uint *grid_layout,
>> |                     const uint *grid_offset, uint32_t pc,
>> |                     const void *input);
>> 
>> And don't forget to update the docs. :)
>
> Hi,
>
> I wanted to explore the original idea of appending implicit args, since
> launch_grid is driver specific and would need to reimplement the same
> functionality in every driver.
>
> I came up with a solution (see the attached patch). I don't like that
> the implicit arg needs to be set in api function.

Right, if we do it this way it would probably be a better fit for the
clover::kernel code, the CL front-end doesn't really need to be aware of
implicit args.

> I also don't like that this way there is no difference between
> explicit and implicit kernel arguments. On the other hand it's simple,
> and does not need additional per driver code.
>
Yeah...  We definitely want to hide these from the user, as e.g. the
CL_KERNEL_NUM_ARGS param is required by the spec to return the number of
arguments provided by the user, and we don't want the user to set
implicit args, so it gets a bit messy.  I think I like better your
original idea of passing them as launch_grid() arguments, even though
the grid offset and dimension parameters are somewhat artificial from a
the hardware's point of view.

> thanks,
> jan
>
>> 
>> Thank you.
>> 
>> > thanks,
>> > jan
>> >
>> >
>> >> 
>> >> > Passes get-work-dim piglit on turks without any regression,
>> >> > I have not tested SI as I don't have the hw.
>> >> >
>> >> > jan
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Jan Vesely (3):
>> >> >   gallium: Pass input data size to launch_grid
>> >> >   clover: Add work dimension implicit param to input
>> >> >   r600,radeonsi: Copy implicit args provided by clover
>> >> >
>> >> >  src/gallium/drivers/ilo/ilo_gpgpu.c               |   2 +-
>> >> >  src/gallium/drivers/nouveau/nvc0/nvc0_compute.c   |   2 +-
>> >> >  src/gallium/drivers/nouveau/nvc0/nvc0_context.h   |   4 +-
>> >> >  src/gallium/drivers/nouveau/nvc0/nve4_compute.c   |   2 +-
>> >> >  src/gallium/drivers/r600/evergreen_compute.c      |  14 +-
>> >> >  src/gallium/drivers/r600/evergreen_compute.h      |   1 -
>> >> >  src/gallium/drivers/radeonsi/si_compute.c         |   6 +-
>> >> >  src/gallium/include/pipe/p_context.h              |   2 +-
>> >> >  src/gallium/state_trackers/clover/core/kernel.cpp | 162 ++++++++++++----------
>> >> >  src/gallium/tests/trivial/compute.c               |  40 +++---
>> >> >  10 files changed, 122 insertions(+), 113 deletions(-)
>> >> >
>> >> > -- 
>> >> > 1.9.3
>> >
>> > -- 
>> > Jan Vesely <jan.vesely at rutgers.edu>
>
> -- 
> Jan Vesely <jan.vesely at rutgers.edu>
> From 7ad338ebd3a67b19d4ba492fb5a4cbda418fcdad Mon Sep 17 00:00:00 2001
> From: Jan Vesely <jan.vesely at rutgers.edu>
> Date: Mon, 1 Sep 2014 19:18:12 -0400
> Subject: [PATCH RFC 1/1] clover: Append implicit work dim arg
>
> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> ---
>  src/gallium/state_trackers/clover/api/kernel.cpp     |  6 ++++++
>  .../state_trackers/clover/llvm/invocation.cpp        | 20 ++++++++++++++------
>  2 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp b/src/gallium/state_trackers/clover/api/kernel.cpp
> index 05cc392..a3b9735 100644
> --- a/src/gallium/state_trackers/clover/api/kernel.cpp
> +++ b/src/gallium/state_trackers/clover/api/kernel.cpp
> @@ -276,6 +276,9 @@ clEnqueueNDRangeKernel(cl_command_queue d_q, cl_kernel d_kern,
>     auto block_size = validate_block_size(q, kern, dims,
>                                           d_grid_size, d_block_size);
>  
> +   cl_uint work_dim_val = block_size.size();
> +   kern.args().back().set(sizeof(work_dim_val), &work_dim_val);
> +
>     validate_common(q, kern, deps);
>  
>     auto hev = create<hard_event>(
> @@ -299,6 +302,9 @@ clEnqueueTask(cl_command_queue d_q, cl_kernel d_kern,
>     auto &kern = obj(d_kern);
>     auto deps = objs<wait_list_tag>(d_deps, num_deps);
>  
> +   cl_uint work_dim_val = 1;
> +   kern.args().back().set(sizeof(work_dim_val), &work_dim_val);
> +
>     validate_common(q, kern, deps);
>  
>     auto hev = create<hard_event>(
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index 7bca0d6..a934384 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -315,17 +315,17 @@ namespace {
>           kernel_func = kernels[i];
>           kernel_name = kernel_func->getName();
>  
> -         for (llvm::Function::arg_iterator I = kernel_func->arg_begin(),
> -                                      E = kernel_func->arg_end(); I != E; ++I) {
> -            llvm::Argument &arg = *I;
>  #if HAVE_LLVM < 0x0302
> -            llvm::TargetData TD(kernel_func->getParent());
> +         llvm::TargetData TD(kernel_func->getParent());
>  #elif HAVE_LLVM < 0x0305
> -            llvm::DataLayout TD(kernel_func->getParent()->getDataLayout());
> +         llvm::DataLayout TD(kernel_func->getParent()->getDataLayout());
>  #else
> -            llvm::DataLayout TD(mod);
> +         llvm::DataLayout TD(mod);
>  #endif
>  
> +         for (llvm::Function::arg_iterator I = kernel_func->arg_begin(),
> +                                      E = kernel_func->arg_end(); I != E; ++I) {
> +            llvm::Argument &arg = *I;
>              llvm::Type *arg_type = arg.getType();
>              const unsigned arg_store_size = TD.getTypeStoreSize(arg_type);
>  
> @@ -384,6 +384,14 @@ namespace {
>              }
>           }
>  
> +         // Implicit arguments
> +         // Work dimensions (cl_uint), uint is 32 bit
> +         llvm::Type *target_type = llvm::Type::getInt32Ty(mod->getContext());
> +         args.push_back(module::argument(module::argument::scalar,
> +                        sizeof(cl_uint), TD.getTypeStoreSize(target_type),
> +                        TD.getABITypeAlignment(target_type),
> +                        module::argument::zero_ext));
> +
>           m.syms.push_back(module::symbol(kernel_name, 0, i, args ));
>        }
>  
> -- 
> 1.9.3
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140902/8605dfe1/attachment.sig>