[Mesa-dev] Leaked hardware event if kernel launch fails?

Pierre Moreau pierre.morrow at free.fr
Sun Dec 25 20:27:58 UTC 2016


Hello,

I noticed that enqueueing a kernel whose clover module has no
`module::section::text_executable` section attached gives me a
`std::out_of_range` exception instead of the expected
CL_INVALID_PROGRAM_EXECUTABLE (see [0]; I enqueued it using
`clEnqueueNDRangeKernel()`). I modified the `kernel::exec_context::bind()`
method to catch the out-of-range exceptions and throw the proper clover
exception instead:

```
diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp
index 328323b6b0..1bb4f612cb 100644
--- a/src/gallium/state_trackers/clover/core/kernel.cpp
+++ b/src/gallium/state_trackers/clover/core/kernel.cpp
@@ -161,8 +161,18 @@ kernel::exec_context::bind(intrusive_ptr<command_queue> _q,
 
    // Bind kernel arguments.
    auto &m = kern.program().build(q->device()).binary;
-   auto margs = find(name_equals(kern.name()), m.syms).args;
-   auto msec = find(type_equals(module::section::text_executable), m.secs);
+   std::vector<module::argument> margs;
+   try {
+      margs = find(name_equals(kern.name()), m.syms).args;
+   } catch (const std::out_of_range &e) {
+      throw error(CL_INVALID_KERNEL);
+   }
+   module::section msec;
+   try {
+      msec = find(type_equals(module::section::text_executable), m.secs);
+   } catch (const std::out_of_range &e) {
+      throw error(CL_INVALID_PROGRAM_EXECUTABLE);
+   }
    auto explicit_arg = kern._args.begin();
 
    for (auto &marg : margs) {
```
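
To show the idea of the patch in isolation, here is a minimal standalone sketch of the same translation. It does not use clover: `api_error`, `find_or_throw()` and `find_or_error()` are hypothetical stand-ins for clover's `error` and `find()`, and I am assuming that `find()` throws `std::out_of_range` when nothing matches, as observed above. The two constants only mirror the numeric values from the OpenCL headers.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; the values mirror the OpenCL headers.
constexpr int INVALID_PROGRAM_EXECUTABLE = -45; // CL_INVALID_PROGRAM_EXECUTABLE
constexpr int INVALID_KERNEL = -48;             // CL_INVALID_KERNEL

// Hypothetical replacement for clover's error class: carries a CL status code.
struct api_error : std::runtime_error {
   explicit api_error(int c) : std::runtime_error("api error"), code(c) {}
   int code;
};

// Assumed behaviour of the lookup helper: throw std::out_of_range on a miss.
template<typename Pred, typename Range>
auto find_or_throw(Pred pred, const Range &r) -> decltype(*r.begin()) {
   auto it = std::find_if(r.begin(), r.end(), pred);
   if (it == r.end())
      throw std::out_of_range("no matching element");
   return *it;
}

// Same lookup, but a miss is reported as the given API status code.
template<typename Pred, typename Range>
auto find_or_error(Pred pred, const Range &r, int code) -> decltype(*r.begin()) {
   try {
      return find_or_throw(pred, r);
   } catch (const std::out_of_range &) {
      throw api_error(code);
   }
}
```

With a helper like this, the two lookups in `bind()` could stay as single `auto` declarations, each passing the status code it should fail with, rather than needing a default-constructed variable and a `try` block apiece.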

But now, when my OpenCL program exits after the error, the destruction of the
`cl::CommandQueue` object doesn’t happen in a peaceful manner; it segfaults:

```
Program received signal SIGSEGV, Segmentation fault.
0x0000000000652e40 in ?? ()
(gdb) bt
#0  0x0000000000652e40 in ?? ()
#1  0x00007ffff7b2d29a in clover::command_queue::flush (this=this@entry=0x653310) at ../../../../../mesa_spirv/src/gallium/state_trackers/clover/core/queue.cpp:77
#2  0x00007ffff7b0ebc0 in clReleaseCommandQueue (d_q=0x653318) at ../../../../../mesa_spirv/src/gallium/state_trackers/clover/api/queue.cpp:63
#3  0x0000000000405125 in cl::detail::ReferenceHandler<_cl_command_queue*>::release (queue=0x653318) at /usr/include/CL/cl.hpp:1686
#4  0x0000000000405108 in cl::detail::Wrapper<_cl_command_queue*>::release (this=0x7fffffffda98) at /usr/include/CL/cl.hpp:1863
#5  0x00000000004050c7 in cl::detail::Wrapper<_cl_command_queue*>::~Wrapper (this=0x7fffffffda98) at /usr/include/CL/cl.hpp:1802
#6  0x00000000004047e5 in cl::CommandQueue::~CommandQueue (this=0x7fffffffda98) at /usr/include/CL/cl.hpp:5482
#7  0x00000000004046f8 in main () at instruction-set_OpenCL-std.cpp:58
(gdb) up
#1  0x00007ffff7b2d29a in clover::command_queue::flush (this=this@entry=0x653310) at ../../../../../mesa_spirv/src/gallium/state_trackers/clover/core/queue.cpp:77
77               queued_events.front()().fence(fence);
```

(I am using the OpenCL C++ bindings, and a fairly recent Mesa (88b5acfa09) with
custom patches to get some OpenCL support for Nouveau.)

Looking around a bit with the debugger, it seems like the event created by the
`clEnqueueNDRangeKernel()` function still exists within the command queue:
shouldn’t it have been automatically removed from the queue as the enqueue
function failed?
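
To make the behaviour I was expecting concrete, here is a toy sketch of it. Nothing in it is clover code: `toy_queue` and `toy_event` are hypothetical, and it assumes that the event is registered with the queue before the launch action runs, which is only my guess from what the debugger shows.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <stdexcept>

// Hypothetical event: only records whether flush() reached it.
struct toy_event {
   bool fenced = false;
};

// Hypothetical queue, not clover's command_queue.
class toy_queue {
public:
   // Registers the event first, then runs the launch action. If the action
   // throws, the registration is undone before the exception propagates.
   std::shared_ptr<toy_event> enqueue(const std::function<void()> &launch) {
      auto ev = std::make_shared<toy_event>();
      pending.push_back(ev);
      try {
         launch();
      } catch (...) {
         pending.pop_back(); // roll back: the failed event must not stay queued
         throw;
      }
      return ev;
   }

   // Fences and drops every pending event, oldest first.
   void flush() {
      while (!pending.empty()) {
         pending.front()->fenced = true;
         pending.pop_front();
      }
   }

   std::size_t size() const { return pending.size(); }

private:
   std::deque<std::shared_ptr<toy_event>> pending;
};
```

In this sketch a failing launch leaves the queue empty, so a later `flush()` never touches the event of the failed enqueue.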


Thank you for your help!
Pierre


[0]: https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueNDRangeKernel.html