[Mesa-dev] [PATCH 0/3] cl workdim v2
Francisco Jerez
currojerez at riseup.net
Tue Sep 2 05:36:44 PDT 2014
Jan Vesely <jan.vesely at rutgers.edu> writes:
> On Sat, 2014-08-16 at 13:13 +0300, Francisco Jerez wrote:
>> Jan Vesely <jan.vesely at rutgers.edu> writes:
>>
>> > On Thu, 2014-08-07 at 16:02 +0300, Francisco Jerez wrote:
>> >> Jan Vesely <jan.vesely at rutgers.edu> writes:
>> >>
>> >> > This respin includes Francisco's approach of providing implicit
>> >> > in the arg vector passed from clover, and Tom's idea of appending
>> >> > implicit args after the kernel args.
>> >> >
>> >>
>> >> Hmmm... Maybe it would make sense to add some sort of versioning
>> >> (e.g. as part of the target triple) to the binary interface between
>> >> clover and the kernel instead, so we can handle this sort of
>> >> non-backwards compatible changes and the compiler back-end and libclc
>> >> have some way to find out whether some specific feature is available and
>> >> e.g. some specific extension should be enabled.
>> >>
>> >> > I assumed it's not safe to modify exec.input, so the input vector is copied
>> >> > before appending work dim.
>> >> >
>> >>
>> >> Why wouldn't it be safe? You just need to make sure they're appended
>> >> before the compute state is created.
>> >
>> > I thought there might be a problem when called from multiple threads,
>> > but it looks like most of the vars are local to the current call anyway.
>> >
>> > I looked at the code a bit better, and need a bit of help with what the
>> > proffered approach would be.
>> >
>> > exec_context::bind() appends all kernel args to the input vector. If the
>> > implicit args are added before bind() it shifts all other args, which is
>> > not what we want.
>> > if the implicit args are appended after, they are not accounted for in
>> > shader->input_size (and not copied by the driver).
>> >
>> > my current code modifies exec_context::bind() to preserve the content of
>> > input before binding kernel args, and append the old content after the
>> > args are bound.
>> > I have also considered passing and implicit args vector to
>> > exec_context::bind to make the trick more visible.
>> >
>> > Turning workdim into a proper arg in _args does not work either, because
>> > it is not present in module args.
>> >
>> > any thoughts?
>> >
>>
>> I finally had a chance to take a closer look at your series. It looks
>> like you're right: In order to implement my proposal cleanly, implicit
>> arguments would have to be part of the _args array so the compiler would
>> have to include them in the module argument lists with memory layout
>> parameters (e.g. alignment, size) suitable for the hardware, so there's
>> probably little benefit compared to your original approach that includes
>> the number of dimensions as an additional launch_grid() parameter.
>>
>> So we don't have to change it again, can you add another array parameter
>> for the base grid offset? That's another thing we don't pass through
>> the pipe driver API currently and CL requires. The prototype of
>> launch_grid could look like:
>>
>> | void (*launch_grid)(struct pipe_context *context, uint dims,
>> | const uint *block_layout, const uint *grid_layout,
>> | const uint *grid_offset, uint32_t pc,
>> | const void *input);
>>
>> And don't forget to update the docs. :)
>
> Hi,
>
> I wanted to explore the original idea of appending implicit args, since
> launch_grid is driver specific and would need to reimplement the same
> functionality in every driver.
>
> I came up with a solution (see the attached patch). I don't like that
> the implicit arg needs to be set in api function.
Right, if we do it this way it would probably be a better fit for the
clover::kernel code, the CL front-end doesn't really need to be aware of
implicit args.
> I also don't like that this way there is no difference between
> explicit and implicit kernel arguments. On the other hand it's simple,
> and does not need additional per driver code.
>
Yeah... We definitely want to hide these from the user, as e.g. the
CL_KERNEL_NUM_ARGS param is required by the spec to return the number of
arguments provided by the user, and we don't want the user to set
implicit args, so it gets a bit messy. I think I like better your
original idea of passing them as launch_grid() arguments, even though
the grid offset and dimension parameters are somewhat artificial from a
the hardware's point of view.
> thanks,
> jan
>
>>
>> Thank you.
>>
>> > thanks,
>> > jan
>> >
>> >
>> >>
>> >> > Passes get-work-dim piglit on turks without any regression,
>> >> > I have not tested SI as I don't have the hw.
>> >> >
>> >> > jan
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Jan Vesely (3):
>> >> > gallium: Pass input data size to launch_grid
>> >> > clover: Add work dimension implicit param to input
>> >> > r600,radeonsi: Copy implicit args provided by clover
>> >> >
>> >> > src/gallium/drivers/ilo/ilo_gpgpu.c | 2 +-
>> >> > src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 2 +-
>> >> > src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 4 +-
>> >> > src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 2 +-
>> >> > src/gallium/drivers/r600/evergreen_compute.c | 14 +-
>> >> > src/gallium/drivers/r600/evergreen_compute.h | 1 -
>> >> > src/gallium/drivers/radeonsi/si_compute.c | 6 +-
>> >> > src/gallium/include/pipe/p_context.h | 2 +-
>> >> > src/gallium/state_trackers/clover/core/kernel.cpp | 162 ++++++++++++----------
>> >> > src/gallium/tests/trivial/compute.c | 40 +++---
>> >> > 10 files changed, 122 insertions(+), 113 deletions(-)
>> >> >
>> >> > --
>> >> > 1.9.3
>> >
>> > --
>> > Jan Vesely <jan.vesely at rutgers.edu>
>
> --
> Jan Vesely <jan.vesely at rutgers.edu>
> From 7ad338ebd3a67b19d4ba492fb5a4cbda418fcdad Mon Sep 17 00:00:00 2001
> From: Jan Vesely <jan.vesely at rutgers.edu>
> Date: Mon, 1 Sep 2014 19:18:12 -0400
> Subject: [PATCH RFC 1/1] clover: Append implicit work dim arg
>
> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> ---
> src/gallium/state_trackers/clover/api/kernel.cpp | 6 ++++++
> .../state_trackers/clover/llvm/invocation.cpp | 20 ++++++++++++++------
> 2 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp b/src/gallium/state_trackers/clover/api/kernel.cpp
> index 05cc392..a3b9735 100644
> --- a/src/gallium/state_trackers/clover/api/kernel.cpp
> +++ b/src/gallium/state_trackers/clover/api/kernel.cpp
> @@ -276,6 +276,9 @@ clEnqueueNDRangeKernel(cl_command_queue d_q, cl_kernel d_kern,
> auto block_size = validate_block_size(q, kern, dims,
> d_grid_size, d_block_size);
>
> + cl_uint work_dim_val = block_size.size();
> + kern.args().back().set(sizeof(work_dim_val), &work_dim_val);
> +
> validate_common(q, kern, deps);
>
> auto hev = create<hard_event>(
> @@ -299,6 +302,9 @@ clEnqueueTask(cl_command_queue d_q, cl_kernel d_kern,
> auto &kern = obj(d_kern);
> auto deps = objs<wait_list_tag>(d_deps, num_deps);
>
> + cl_uint work_dim_val = 1;
> + kern.args().back().set(sizeof(work_dim_val), &work_dim_val);
> +
> validate_common(q, kern, deps);
>
> auto hev = create<hard_event>(
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index 7bca0d6..a934384 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -315,17 +315,17 @@ namespace {
> kernel_func = kernels[i];
> kernel_name = kernel_func->getName();
>
> - for (llvm::Function::arg_iterator I = kernel_func->arg_begin(),
> - E = kernel_func->arg_end(); I != E; ++I) {
> - llvm::Argument &arg = *I;
> #if HAVE_LLVM < 0x0302
> - llvm::TargetData TD(kernel_func->getParent());
> + llvm::TargetData TD(kernel_func->getParent());
> #elif HAVE_LLVM < 0x0305
> - llvm::DataLayout TD(kernel_func->getParent()->getDataLayout());
> + llvm::DataLayout TD(kernel_func->getParent()->getDataLayout());
> #else
> - llvm::DataLayout TD(mod);
> + llvm::DataLayout TD(mod);
> #endif
>
> + for (llvm::Function::arg_iterator I = kernel_func->arg_begin(),
> + E = kernel_func->arg_end(); I != E; ++I) {
> + llvm::Argument &arg = *I;
> llvm::Type *arg_type = arg.getType();
> const unsigned arg_store_size = TD.getTypeStoreSize(arg_type);
>
> @@ -384,6 +384,14 @@ namespace {
> }
> }
>
> + // Implicit arguments
> + // Work dimensions (cl_uint), uint is 32 bit
> + llvm::Type *target_type = llvm::Type::getInt32Ty(mod->getContext());
> + args.push_back(module::argument(module::argument::scalar,
> + sizeof(cl_uint), TD.getTypeStoreSize(target_type),
> + TD.getABITypeAlignment(target_type),
> + module::argument::zero_ext));
> +
> m.syms.push_back(module::symbol(kernel_name, 0, i, args ));
> }
>
> --
> 1.9.3
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140902/8605dfe1/attachment.sig>
More information about the mesa-dev
mailing list