[Mesa-dev] [PATCH] clover: Pass image attributes to the kernel

Mon Jul 27 01:25:09 PDT 2015

> > Hmmm...  So you only need it as padding?  Wouldn't it be possible for
> > you to declare samplers to be 0 bytes?
> Maybe it can be done by changing the type of the sampler arg from i32
> to an empty struct. I'll have to try this, I don't know if it will
> work.

But then it would only work for the target I'm working with.

> Some hardware (e.g. Intel's) will need an index (which can probably be
> hardcoded in the shader for now) into a table of sampler configurations
> which can be set-up later on by the driver at state-emit time.

Yes, this is also the case with r600 texture fetch instructions: the
sampler configurations are emitted, and are referenced by an index in
the kernel. But if the kernel code itself needs the sampler value,
e.g. it branches and calls normalized or non-normalized texture fetch
based on the appropriate bit of the sampler, it needs the data in a
kernel-accessible place. And AFAIK, the hardware registers which set
up samplers for fetches are not accessible by the kernel.

> In any
> case we can always add a new argument semantic enum later on for the
> target to select sampler config vs index if different drivers turn out
> to need different things.

The index can indeed be hard-coded by the compiler, as you pointed it
out, so uploading the index will not be useful for any target (and
adding the code to do it is a mistake on my end). However the kernel
may need the sampler value in an accessible place. So we are in a
similar situation as image attribute metadata. But instead of adding
implicit arguments after the 0-sized sampler arg, the sampler value
could simply be uploaded as the explicit argument. The benefit of the
latter approach is simplicity, while the former only saves 4 bytes for
some targets at a cost of complexity. Does adding the sampler value to
the input vector breaks something I don't know about?

On Sun, Jul 26, 2015 at 11:37 PM, Zoltán Gilián <zoltan.gilian at gmail.com> wrote:
>> Why?  Your module::argument::image_size and ::image_format cases don't
>> touch the explicit_arg iterator at all AFAICT, so it will be left
>> pointing one past the last general semantic argument
>
> Ok, my mistake, I didn't think this through.
>
>> Hmmm...  So you only need it as padding?  Wouldn't it be possible for
>> you to declare samplers to be 0 bytes?
>
> Maybe it can be done by changing the type of the sampler arg from i32
> to an empty struct. I'll have to try this, I don't know if it will
> work.
>
> On Sun, Jul 26, 2015 at 2:40 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>> Zoltán Gilián <zoltan.gilian at gmail.com> writes:
>>
>>>> auto img = dynamic_cast<image_argument &>(**(explicit_arg - 1))
>>>
>>> Ok, so it should be (explicit_arg - 2) for the image format, I
>>> presume.
>>
>> Why?  Your module::argument::image_size and ::image_format cases don't
>> touch the explicit_arg iterator at all AFAICT, so it will be left
>> pointing one past the last general semantic argument
>>
>>> This will be incorrect, however, if the targets that need implicit
>>> argument for format metadata are indeed a strict superset of the ones
>>> that need dimension, as you mentioned before. The targets that only
>>> need format will break this code. Should I swap the order of the
>>> format and dimension implicit args to make this approach work under
>>> the aforementioned assumption?
>>>
>> Why would it be incorrect?  The kernel::_args vector explicit_arg
>> iterates in contains explicit arguments only so it shouldn't make a
>> difference in which order you emit them as long as you don't interleave
>> other explicit arguments in between.
>>
>>>> It also seems like you've got rid of the static casts you had
>>>> previously?
>>>
>>> That is a mistake, I'll fix it.
>>>
>>>> My expectation here was that the compiler would be able to hard-code
>>>> sampler indices in the shader without the API passing any actual data in
>>>> the input buffer.  Doesn't that work for you?
>>>
>>> Yes, that's correct, but this is also the case with images. If image
>>> index is uploaded explicitly, I don't see why it can't be done with
>>> sampler indices. But probably it's a better idea to send the sampler
>>> value rather than the index, in case the kernel needs it (e.g. the
>>> normalized or non-normalized nature of texture coordinates may have to
>>> be specified in the fetch instruction itself, and not by hardware
>>> registers), so I'll definitely change this.
>>
>> Some hardware (e.g. Intel's) will need an index (which can probably be
>> hardcoded in the shader for now) into a table of sampler configurations
>> which can be set-up later on by the driver at state-emit time.  In any
>> case we can always add a new argument semantic enum later on for the
>> target to select sampler config vs index if different drivers turn out
>> to need different things.
>>
>>> But the bigger problem is, that the byte offsets of the kernel
>>> arguments are computed considering the sampler argument too, so the
>>> binary expects it to be present in the input vector. Furthermore I
>>> can't erase the sampler argument from the IR, because it is needed to
>>> make it possible for get_kernel_args to detect the sampler. But if
>>> sampler_argument::bind doesn't append 4 bytes (clang compiles
>>> sampler_t to i32) to the input vector, the binary will try to load the
>>> following arguments from wrong locations.
>>
>> Hmmm...  So you only need it as padding?  Wouldn't it be possible for
>> you to declare samplers to be 0 bytes?
>>
>>>
>>>> I don't think it's a good idea to use such obvious names for the
>>>> implicit argument types. Couldn't they collide with some type declared
>>> by the user that happens to have the same name?  How about
>>> "__clover_image_size" or something similar?
>>>
>>> Indeed, that's a good point. I'd go with something like
>>> "__opencl_image_*" or "__llvm_image_*", because these strings will be
>>> added to llvm, and non-clover code may depend on them in the future.
>>>
>> Sounds good to me.
>>
>>>> (Identifiers starting with
>>>> double underscore are reserved for the implementation at least on C and
>>>> GLSL, not sure about OpenCL-C)
>>>
>>> I believe this is true for OpenCL C too, since it is an extension to
>>> C99. Identifiers starting with double underscores are reserved in C99.
>>>
>> IIRC identifiers starting with double underscore and single underscore
>> followed by some upper-case letter were reserved even in the original
>> ANSI C spec.
>>
>>> On Sat, Jul 25, 2015 at 1:06 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>> Zoltan Gilian <zoltan.gilian at gmail.com> writes:
>>>>
>>>>> Read-only and write-only image arguments are recognized and
>>>>> distinguished.
>>>>> Attributes of the image arguments are passed to the kernel as implicit
>>>>> arguments.
>>>>> ---
>>>>>  src/gallium/state_trackers/clover/core/kernel.cpp  |  46 ++++++-
>>>>>  src/gallium/state_trackers/clover/core/kernel.hpp  |  13 +-
>>>>>  src/gallium/state_trackers/clover/core/memory.cpp  |   2 +-
>>>>>  src/gallium/state_trackers/clover/core/module.hpp  |   4 +-
>>>>>  .../state_trackers/clover/llvm/invocation.cpp      | 147 ++++++++++++++++++++-
>>>>>  5 files changed, 198 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp
>>>>> index 0756f06..1a6c28f 100644
>>>>> --- a/src/gallium/state_trackers/clover/core/kernel.cpp
>>>>> +++ b/src/gallium/state_trackers/clover/core/kernel.cpp
>>>>> @@ -158,13 +158,18 @@ kernel::exec_context::bind(intrusive_ptr<command_queue> _q,
>>>>>     auto margs = find(name_equals(kern.name()), m.syms).args;
>>>>>     auto msec = find(type_equals(module::section::text), m.secs);
>>>>>     auto explicit_arg = kern._args.begin();
>>>>> +   image_argument *last_image_arg = nullptr;
>>>>>
>>>>>     for (auto &marg : margs) {
>>>>>        switch (marg.semantic) {
>>>>> -      case module::argument::general:
>>>>> +      case module::argument::general: {
>>>>> +         auto image_arg = dynamic_cast<image_argument*>(explicit_arg->get());
>>>>> +         if (image_arg) {
>>>>> +            last_image_arg = image_arg;
>>>>> +         }
>>>>>           (*(explicit_arg++))->bind(*this, marg);
>>>>>           break;
>>>>> -
>>>>> +      }
>>>>>        case module::argument::grid_dimension: {
>>>>>           const cl_uint dimension = grid_offset.size();
>>>>>           auto arg = argument::create(marg);
>>>>> @@ -182,6 +187,36 @@ kernel::exec_context::bind(intrusive_ptr<command_queue> _q,
>>>>>           }
>>>>>           break;
>>>>>        }
>>>>> +      case module::argument::image_size: {
>>>>> +         assert(last_image_arg);
>>>>> +         auto img = last_image_arg->get();
>>>>
>>>> Instead of carrying around an extra variable during the loop, you could
>>>> achieve the same effect more locally by doing:
>>>>
>>>> |            auto img = dynamic_cast<image_argument &>(**(explicit_arg - 1))
>>>> |                       .get();
>>>>
>>>> The cast to reference would also make sure that the argument is of the
>>>> specified type or otherwise throw std::bad_cast which is as good as an
>>>> assertion failure.
>>>>
>>>>> +         std::vector<cl_uint> image_size({
>>>>> +               img->width(),
>>>>> +               img->height(),
>>>>> +               img->depth()});
>>>>
>>>> Parentheses are not necessary around the curly braces, you could just:
>>>>
>>>> |         std::vector<cl_uint> image_size {
>>>> |               img->width(),
>>>> |               img->height(),
>>>> |               img->depth()
>>>> |         };
>>>>
>>>> It also seems like you've got rid of the static casts you had
>>>> previously?  These image methods return a size_t value which may be of
>>>> different size than cl_uint and cause a warning.
>>>>
>>>>> +         for (auto x: image_size) {
>>>>
>>>> Leave a space around the colon for consistency, also in the cases below.
>>>>
>>>>> +            auto arg = argument::create(marg);
>>>>> +
>>>>> +            arg->set(sizeof(x), &x);
>>>>> +            arg->bind(*this, marg);
>>>>> +         }
>>>>> +         break;
>>>>> +      }
>>>>> +      case module::argument::image_format: {
>>>>> +         assert(last_image_arg);
>>>>> +         auto img = last_image_arg->get();
>>>>> +         cl_image_format fmt = img->format();
>>>>> +         std::vector<cl_uint> image_format({
>>>>> +               fmt.image_channel_data_type,
>>>>> +               fmt.image_channel_order});
>>>>
>>>> Same comments apply here.
>>>>
>>>>> +         for (auto x: image_format) {
>>>>> +            auto arg = argument::create(marg);
>>>>> +
>>>>> +            arg->set(sizeof(x), &x);
>>>>> +            arg->bind(*this, marg);
>>>>> +         }
>>>>> +         break;
>>>>> +      }
>>>>>        }
>>>>>     }
>>>>>
>>>>> @@ -532,6 +567,13 @@ kernel::sampler_argument::set(size_t size, const void *value) {
>>>>>  void
>>>>>  kernel::sampler_argument::bind(exec_context &ctx,
>>>>>                                 const module::argument &marg) {
>>>>> +   auto v = bytes(ctx.samplers.size());
>>>>> +
>>>>> +   extend(v, module::argument::zero_ext, marg.target_size);
>>>>> +   byteswap(v, ctx.q->device().endianness());
>>>>> +   align(ctx.input, marg.target_align);
>>>>> +   insert(ctx.input, v);
>>>>> +
>>>>
>>>> My expectation here was that the compiler would be able to hard-code
>>>> sampler indices in the shader without the API passing any actual data in
>>>> the input buffer.  Doesn't that work for you?
>>>>
>>>>>     st = s->bind(*ctx.q);
>>>>>     ctx.samplers.push_back(st);
>>>>>  }
>>>>> diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp b/src/gallium/state_trackers/clover/core/kernel.hpp
>>>>> index d6432a4..007ffcc 100644
>>>>> --- a/src/gallium/state_trackers/clover/core/kernel.hpp
>>>>> +++ b/src/gallium/state_trackers/clover/core/kernel.hpp
>>>>> @@ -190,7 +190,14 @@ namespace clover {
>>>>>           pipe_surface *st;
>>>>>        };
>>>>>
>>>>> -      class image_rd_argument : public argument {
>>>>> +      class image_argument : public argument {
>>>>> +      public:
>>>>> +         const image *get() const { return img; }
>>>>
>>>> Please write the return statement on its own line for consistency.
>>>>
>>>>> +      protected:
>>>>> +         image *img;
>>>>> +      };
>>>>> +
>>>>> +      class image_rd_argument : public image_argument {
>>>>>        public:
>>>>>           virtual void set(size_t size, const void *value);
>>>>>           virtual void bind(exec_context &ctx,
>>>>> @@ -198,11 +205,10 @@ namespace clover {
>>>>>           virtual void unbind(exec_context &ctx);
>>>>>
>>>>>        private:
>>>>> -         image *img;
>>>>>           pipe_sampler_view *st;
>>>>>        };
>>>>>
>>>>> -      class image_wr_argument : public argument {
>>>>> +      class image_wr_argument : public image_argument {
>>>>>        public:
>>>>>           virtual void set(size_t size, const void *value);
>>>>>           virtual void bind(exec_context &ctx,
>>>>> @@ -210,7 +216,6 @@ namespace clover {
>>>>>           virtual void unbind(exec_context &ctx);
>>>>>
>>>>>        private:
>>>>> -         image *img;
>>>>>           pipe_surface *st;
>>>>>        };
>>>>>
>>>>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
>>>>> index 055336a..b852e68 100644
>>>>> --- a/src/gallium/state_trackers/clover/core/memory.cpp
>>>>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp
>>>>> @@ -189,7 +189,7 @@ image2d::image2d(clover::context &ctx, cl_mem_flags flags,
>>>>>                   const cl_image_format *format, size_t width,
>>>>>                   size_t height, size_t row_pitch,
>>>>>                   void *host_ptr) :
>>>>> -   image(ctx, flags, format, width, height, 0,
>>>>> +   image(ctx, flags, format, width, height, 1,
>>>>>           row_pitch, 0, height * row_pitch, host_ptr) {
>>>>>  }
>>>>>
>>>>> diff --git a/src/gallium/state_trackers/clover/core/module.hpp b/src/gallium/state_trackers/clover/core/module.hpp
>>>>> index 9d65688..5db0548 100644
>>>>> --- a/src/gallium/state_trackers/clover/core/module.hpp
>>>>> +++ b/src/gallium/state_trackers/clover/core/module.hpp
>>>>> @@ -72,7 +72,9 @@ namespace clover {
>>>>>           enum semantic {
>>>>>              general,
>>>>>              grid_dimension,
>>>>> -            grid_offset
>>>>> +            grid_offset,
>>>>> +            image_size,
>>>>> +            image_format
>>>>>           };
>>>>>
>>>>>           argument(enum type type, size_t size,
>>>>> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>>>>> index 967284d..e5cd59c 100644
>>>>> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
>>>>> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>>>>> @@ -340,18 +340,91 @@ namespace {
>>>>>        PM.run(*mod);
>>>>>     }
>>>>>
>>>>> +   // Kernel metadata
>>>>> +
>>>>> +   const llvm::MDNode *
>>>>> +   get_kernel_metadata(const llvm::Function *kernel_func) {
>>>>> +      auto mod = kernel_func->getParent();
>>>>> +      auto kernels_node = mod->getNamedMetadata("opencl.kernels");
>>>>> +      if (!kernels_node) {
>>>>> +         return nullptr;
>>>>> +      }
>>>>> +
>>>>> +      const llvm::MDNode *kernel_node = nullptr;
>>>>> +      for (unsigned i = 0; i < kernels_node->getNumOperands(); ++i) {
>>>>> +#if HAVE_LLVM >= 0x0306
>>>>> +         auto func = llvm::mdconst::dyn_extract<llvm::Function>(
>>>>> +#else
>>>>> +         auto func = llvm::dyn_cast<llvm::Function>(
>>>>> +#endif
>>>>> +                                    kernels_node->getOperand(i)->getOperand(0));
>>>>> +         if (func == kernel_func) {
>>>>> +            kernel_node = kernels_node->getOperand(i);
>>>>> +            break;
>>>>> +         }
>>>>> +      }
>>>>> +
>>>>> +      return kernel_node;
>>>>> +   }
>>>>> +
>>>>> +   llvm::MDNode*
>>>>> +   node_from_op_checked(const llvm::MDOperand &md_operand,
>>>>> +                        llvm::StringRef expect_name,
>>>>> +                        unsigned expect_num_args)
>>>>> +   {
>>>>> +      auto node = llvm::cast<llvm::MDNode>(md_operand);
>>>>> +      assert(node->getNumOperands() == expect_num_args &&
>>>>> +             "Wrong number of operands.");
>>>>> +
>>>>> +      auto str_node = llvm::cast<llvm::MDString>(node->getOperand(0));
>>>>> +      assert(str_node->getString() == expect_name &&
>>>>> +             "Wrong metadata node name.");
>>>>> +
>>>>> +      return node;
>>>>> +   }
>>>>> +
>>>>> +   struct kernel_arg_md {
>>>>> +      llvm::StringRef type_name;
>>>>> +      llvm::StringRef access_qual;
>>>>> +      kernel_arg_md(llvm::StringRef type_name_, llvm::StringRef access_qual_):
>>>>> +         type_name(type_name_), access_qual(access_qual_) {}
>>>>> +   };
>>>>> +
>>>>> +   std::vector<kernel_arg_md>
>>>>> +   get_kernel_arg_md(const llvm::Function *kernel_func) {
>>>>> +      auto num_args = kernel_func->getArgumentList().size();
>>>>> +
>>>>> +      auto kernel_node = get_kernel_metadata(kernel_func);
>>>>> +      auto aq = node_from_op_checked(kernel_node->getOperand(2),
>>>>> +                                     "kernel_arg_access_qual", num_args + 1);
>>>>> +      auto ty = node_from_op_checked(kernel_node->getOperand(3),
>>>>> +                                     "kernel_arg_type", num_args + 1);
>>>>> +
>>>>> +      auto res = std::vector<kernel_arg_md>();
>>>>> +      res.reserve(num_args);
>>>>
>>>> std::vector<kernel_arg_md> res;
>>>>
>>>>> +      for (unsigned i = 0; i < num_args; ++i) {
>>>>> +         res.push_back(kernel_arg_md(
>>>>> +            llvm::cast<llvm::MDString>(ty->getOperand(i+1))->getString(),
>>>>> +            llvm::cast<llvm::MDString>(aq->getOperand(i+1))->getString()));
>>>>> +      }
>>>>> +
>>>>> +      return res;
>>>>> +   }
>>>>> +
>>>>>     std::vector<module::argument>
>>>>>     get_kernel_args(const llvm::Module *mod, const std::string &kernel_name,
>>>>>                     const clang::LangAS::Map &address_spaces) {
>>>>>
>>>>>        std::vector<module::argument> args;
>>>>>        llvm::Function *kernel_func = mod->getFunction(kernel_name);
>>>>> +      assert(kernel_func && "Kernel name not found in module.");
>>>>> +      auto arg_md = get_kernel_arg_md(kernel_func);
>>>>>
>>>>>        llvm::DataLayout TD(mod);
>>>>> +      llvm::Type *size_type =
>>>>> +         TD.getSmallestLegalIntType(mod->getContext(), sizeof(cl_uint) * 8);
>>>>>
>>>>> -      for (llvm::Function::const_arg_iterator I = kernel_func->arg_begin(),
>>>>> -                                      E = kernel_func->arg_end(); I != E; ++I) {
>>>>> -         const llvm::Argument &arg = *I;
>>>>> +      for (const auto &arg: kernel_func->args()) {
>>>>>
>>>>>           llvm::Type *arg_type = arg.getType();
>>>>>           const unsigned arg_store_size = TD.getTypeStoreSize(arg_type);
>>>>> @@ -369,6 +442,68 @@ namespace {
>>>>>           unsigned target_size = TD.getTypeStoreSize(target_type);
>>>>>           unsigned target_align = TD.getABITypeAlignment(target_type);
>>>>>
>>>>> +         llvm::StringRef type_name = arg_md[arg.getArgNo()].type_name;
>>>>> +         llvm::StringRef access_qual = arg_md[arg.getArgNo()].access_qual;
>>>>> +
>>>>> +         // Image
>>>>> +         bool is_image2d = type_name == "image2d_t";
>>>>> +         bool is_image3d = type_name == "image3d_t";
>>>>
>>>> These should be marked const.
>>>>
>>>>> +         if (is_image2d || is_image3d) {
>>>>> +            bool is_write_only = access_qual == "write_only";
>>>>> +            bool is_read_only = access_qual == "read_only";
>>>>> +
>>>>
>>>> And these too.
>>>>
>>>>> +            typename module::argument::type marg_type;
>>>>> +            if (is_image2d && is_read_only) {
>>>>> +               marg_type = module::argument::image2d_rd;
>>>>> +            } else if (is_image2d && is_write_only) {
>>>>> +               marg_type = module::argument::image2d_wr;
>>>>> +            } else if (is_image3d && is_read_only) {
>>>>> +               marg_type = module::argument::image3d_rd;
>>>>> +            } else if (is_image3d && is_write_only) {
>>>>> +               marg_type = module::argument::image3d_wr;
>>>>> +            } else {
>>>>> +               assert(0 && "Wrong image access qualifier");
>>>>> +            }
>>>>> +
>>>>> +            args.push_back(module::argument(marg_type,
>>>>> +                                            arg_store_size, target_size,
>>>>> +                                            target_align,
>>>>> +                                            module::argument::zero_ext));
>>>>> +            continue;
>>>>> +         }
>>>>> +
>>>>> +         // Sampler
>>>>> +         if (type_name == "sampler_t") {
>>>>> +            args.push_back(module::argument(module::argument::sampler,
>>>>> +                                            arg_store_size, target_size,
>>>>> +                                            target_align,
>>>>> +                                            module::argument::zero_ext));
>>>>> +            continue;
>>>>> +         }
>>>>> +
>>>>> +         // Image size implicit argument
>>>>> +         if (type_name == "image_size") {
>>>>
>>>> I don't think it's a good idea to use such obvious names for the
>>>> implicit argument types.  Couldn't they collide with some type declared
>>>> by the user that happens to have the same name?  How about
>>>> "__clover_image_size" or something similar?  (Identifiers starting with
>>>> double underscore are reserved for the implementation at least on C and
>>>> GLSL, not sure about OpenCL-C)
>>>>
>>>>> +            args.push_back(module::argument(module::argument::scalar,
>>>>> +                                            sizeof(cl_uint),
>>>>> +                                            TD.getTypeStoreSize(size_type),
>>>>> +                                            TD.getABITypeAlignment(size_type),
>>>>> +                                            module::argument::zero_ext,
>>>>> +                                            module::argument::image_size));
>>>>> +            continue;
>>>>> +         }
>>>>> +
>>>>> +         // Image format implicit argument
>>>>> +         if (type_name == "image_format") {
>>>>> +            args.push_back(module::argument(module::argument::scalar,
>>>>> +                                            sizeof(cl_uint),
>>>>> +                                            TD.getTypeStoreSize(size_type),
>>>>> +                                            TD.getABITypeAlignment(size_type),
>>>>> +                                            module::argument::zero_ext,
>>>>> +                                            module::argument::image_format));
>>>>> +            continue;
>>>>> +         }
>>>>> +
>>>>> +         // Other types
>>>>>           if (llvm::isa<llvm::PointerType>(arg_type) && arg.hasByValAttr()) {
>>>>>              arg_type =
>>>>>                    llvm::dyn_cast<llvm::PointerType>(arg_type)->getElementType();
>>>>> @@ -413,9 +548,6 @@ namespace {
>>>>>        // Append implicit arguments.  XXX - The types, ordering and
>>>>>        // vector size of the implicit arguments should depend on the
>>>>>        // target according to the selected calling convention.
>>>>> -      llvm::Type *size_type =
>>>>> -         TD.getSmallestLegalIntType(mod->getContext(), sizeof(cl_uint) * 8);
>>>>> -
>>>>>        args.push_back(
>>>>>           module::argument(module::argument::scalar, sizeof(cl_uint),
>>>>>                            TD.getTypeStoreSize(size_type),
>>>>> @@ -744,6 +876,9 @@ clover::compile_program_llvm(const std::string &source,
>>>>>           std::vector<char> code = compile_native(mod, triple, processor,
>>>>>                                                   get_debug_flags() & DBG_ASM,
>>>>>                                                   r_log);
>>>>> +         // Target-specific passes may alter functions
>>>>> +         kernels.clear();
>>>>> +         find_kernels(mod, kernels);
>>>>
>>>> Bah...  To make this sort of mistake impossible I'd get rid of the
>>>> "kernels" argument of optimize() and build_module_native/llvm(), and
>>>> make them call find_kernels() directly.  It would also be nice to have
>>>> find_kernels() return its result as return value instead of returning
>>>> void, so you could declare kernel arrays as const and initialize them at
>>>> the same point.
>>>>
>>>>>           m = build_module_native(code, mod, kernels, address_spaces, r_log);
>>>>>           break;
>>>>>        }
>>>>> --
>>>>> 2.4.6