[Beignet] johntheripper/OpenCL clGetEventProfilingInfo issue

Fri Oct 24 23:46:06 PDT 2014

Zhigang Gong <zhigang.gong at gmail.com> writes:

> This should be an application bug, according to OpenCL 1.2 spec:
>
>   CL_PROFILING_INFO_NOT_AVAILABLE if the CL_QUEUE_PROFILING_ENABLE
>   flag is not set for the command-queue, if the execution status of
>   the command identified by event is not CL_COMPLETE or if event is
>   a user event object.
>
> To make sure an event's state is CL_COMPLETE, you need to call
> clWaitForEvents() rather than clFinish().
>
> According to the spec, clFinish():
>   blocks until all previously queued OpenCL commands in command_queue are issued
>   to the associated device and have completed.
>
> It is not meant to update the state of all the related events. It is
> also too heavy, as it waits for the commands to complete. The event's
> CL_COMPLETE state means the command has been flushed into the GPU's
> command buffer and may not have completed yet. It is used for
> synchronization on the GPU command queue side, while clFinish()
> synchronizes with the host CPU.
>
> I would recommend calling clWaitForEvents() before you call
> clGetEventProfilingInfo(). If you still have problems after that
> change, please let us know.
>

Thanks, it works. Slow, but it works. Maybe this is a problem with
their implementation.

Btw, I call clWaitForEvents for a single event every time before
calling clGetEventProfilingInfo on that event. Is that OK, or should I
call it for the whole event list?

Also, I use the i915 driver with the following args:
i915.modeset=1 i915.i915_enable_rc6=1 i915.i915_enable_fbc=1
i915.lvds_downclock=1

They shouldn't influence the speed, should they?

// Some bench output:

magnumripper_JohnTheRipper > run/john -format=Raw-MD5-opencl -te
Device 0: Intel(R) HD Graphics IvyBridge M GT2
Local worksize (LWS) 16, global worksize (GWS) 1048576
Benchmarking: Raw-MD5-opencl [MD5 OpenCL (inefficient, development use only)]... DONE
Raw:	30107K c/s real, 84468K c/s virtual

magnumripper_JohnTheRipper > run/john -format=Raw-MD5 -te       
Will run 4 OpenMP threads
Benchmarking: Raw-MD5 [MD5 128/128 AVX 12x]... (4xOMP) DONE
Raw:	40206K c/s real, 10497K c/s virtual

magnumripper_JohnTheRipper > run/john -format=ecnfs -te  
Unknown ciphertext format name requested
magnumripper_JohnTheRipper > run/john -format=encfs -te
Will run 4 OpenMP threads
Benchmarking: EncFS [PBKDF2-SHA1 AES/Blowfish 8x SSE2]... (4xOMP) DONE
Raw:	62.1 c/s real, 16.4 c/s virtual

magnumripper_JohnTheRipper > run/john -format=encfs-opencl -te
Will run 4 OpenMP threads
Device 0: Intel(R) HD Graphics IvyBridge M GT2
Local worksize (LWS) 64, global worksize (GWS) 64
Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 OpenCL 4x AES/Blowfish]... (4xOMP) DONE
Raw:	7.8 c/s real, 4266 c/s virtual

magnumripper_JohnTheRipper > run/john -format=PBKDF2-HMAC-SHA1-opencl -te
Device 0: Intel(R) HD Graphics IvyBridge M GT2
Local worksize (LWS) 64, global worksize (GWS) 8192
Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL 4x]... DONE
Raw:	12459 c/s real, 3276K c/s virtual

magnumripper_JohnTheRipper > run/john -format=PBKDF2-HMAC-SHA1 -te       
Will run 4 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 8x SSE2]... (4xOMP) DONE
Raw:	16062 c/s real, 5957 c/s virtual

Thanks.

// wbr
// alxchk