[Beignet] Random error with very low probability in Haswell platform

Gao, Sanshan gss at mail.ustc.edu.cn
Fri Apr 14 01:31:23 UTC 2017


The longest running time of a single clEnqueueNDRangeKernel() call is about 1166 ms.
The shortest running time of a single clEnqueueNDRangeKernel() call is about 335 ms.
dmesg shows no sign of a GPU hang.


When I take the same data that was decrypted incorrectly during testing and compute it again in a separate program that only performs decryption, the result is correct. The probability of a computation fault is only about 1% in benchmark testing. So for now I think the missing ECC (Error Correcting Codes) is the most likely cause, or something similar to it.
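The replay described above can be turned into a small check that separates one-off faults from reproducible ones. This is only a sketch in Python under stated assumptions: `classify_failure` and `cpu_decrypt` are hypothetical names, the `decrypt` argument stands in for whatever wrapper launches the OpenCL kernel, and the toy RSA key is for illustration only.

```python
def classify_failure(decrypt, ciphertext, expected, replays=20):
    """Replay one failing input and report how often it fails again.

    decrypt  -- callable taking an integer ciphertext, returning an integer
                (hypothetical stand-in for the GPU decryption wrapper)
    expected -- the reference plaintext, e.g. from a CPU implementation
    """
    failures = 0
    for _ in range(replays):
        if decrypt(ciphertext) != expected:
            failures += 1
    if failures == 0:
        return "transient"       # never fails again: not tied to the input
    if failures == replays:
        return "deterministic"   # always fails: points at the software path
    return "intermittent"        # fails sometimes even for a fixed input


# Toy RSA key for illustration only (p=61, q=53); real keys are far larger.
N, E, D = 3233, 17, 2753


def cpu_decrypt(ciphertext):
    """CPU reference decryption using Python's built-in modular pow."""
    return pow(ciphertext, D, N)
```

A "transient" verdict for every saved ciphertext would match the behaviour reported here, where the input decrypts correctly once it is processed on its own.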

-----Original Message-----
From: "Yang, Rong R" <rong.r.yang at intel.com>
Sent: 2017-04-13 16:36:07 (Thursday)
To: "Song, Ruiling" <ruiling.song at intel.com>, "Gao, Sanshan" <gss at mail.ustc.edu.cn>, "beignet at lists.freedesktop.org" <beignet at lists.freedesktop.org>
Cc:
Subject: RE: [Beignet] Random error with very low probability in Haswell platform



How long does your benchmark run? Does the Linux kernel reset the GPU? You can run `dmesg` to find out.
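To make the `dmesg` check repeatable after each benchmark run, the kernel log can be filtered for hang or reset messages. This is a minimal sketch in Python; the helper names are hypothetical, and the patterns are assumptions about typical i915 driver wording, which varies between kernel versions.

```python
import re
import subprocess

# Assumed message fragments; adjust to what your kernel version prints.
HANG_PATTERNS = re.compile(r"gpu hang|resetting chip", re.IGNORECASE)


def find_gpu_reset_lines(log_text):
    """Return the log lines that look like a GPU hang or reset report."""
    return [line for line in log_text.splitlines()
            if HANG_PATTERNS.search(line)]


def read_dmesg():
    """Capture the kernel log; may need elevated privileges on some systems."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return result.stdout
```

Calling `find_gpu_reset_lines(read_dmesg())` after a run and getting an empty list suggests the kernel did not report a hang, although it does not rule out silent data corruption.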

 

From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of Song, Ruiling
Sent: Tuesday, April 11, 2017 9:22
To: Gao, Sanshan <gss at mail.ustc.edu.cn>; beignet at lists.freedesktop.org
Subject: Re: [Beignet] Random error with very low probability in Haswell platform

 

By “hardware mistakes”, do you mean ECC (Error Correcting Codes) on the Intel GPU?

Intel GPUs have had one-bit ECC support on the L3 cache since Broadwell. For details, you can look at:

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol07-3d_media_gpgpu_3.pdf

I am not sure whether your problem is caused by the lack of ECC for the L3 cache on Haswell.

But I think it may help if you can find a Broadwell machine to do some testing.

 

Thanks!

Ruiling

From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of Gao, Sanshan
Sent: Friday, April 7, 2017 4:47 PM
To: beignet at lists.freedesktop.org
Subject: [Beignet] Random error with very low probability in Haswell platform

 

Hi, all,

 

I'm using Intel Iris Pro Graphics 5200 for general-purpose computing: RSA decryption with OpenCL. However, I found that the computed result is wrong with a very low probability in my benchmark. In my experiments, this probability is about 1%. When I write a ciphertext that was decrypted incorrectly in the benchmark out to a file and decrypt it on its own, the result is correct.

 

Has anyone else run into a similar situation? I suspect the problem lies with this integrated GPGPU (i.e. a fault in the hardware platform rather than in the software implementation). I remember hearing of such a conclusion before, but I ignored it because I had not encountered such an error myself.

 

--------------

Platform: Intel Iris Pro Graphics 5200, OpenCL, Beignet

Ground truth: result computed by the OpenSSL library

--------------
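For reference, a benchmark of this kind can be sketched as a loop that compares every result against the ground truth, measures the error rate, and keeps the failing ciphertexts for a later standalone run. This is a hedged Python sketch, not the actual benchmark: `run_benchmark`, `format_failures` and `save_failures` are hypothetical names, `decrypt` stands in for the OpenCL path, and the expected values are assumed to come from a CPU library such as OpenSSL.

```python
def run_benchmark(decrypt, cases):
    """Compare decrypt() against reference results.

    cases -- iterable of (ciphertext, expected_plaintext) integer pairs
    Returns (error_rate, list_of_failing_ciphertexts).
    """
    total = 0
    failed = []
    for ciphertext, expected in cases:
        total += 1
        if decrypt(ciphertext) != expected:
            failed.append(ciphertext)
    rate = len(failed) / total if total else 0.0
    return rate, failed


def format_failures(failed):
    """Render failing ciphertexts as one hexadecimal value per line."""
    return "".join(f"{c:x}\n" for c in failed)


def save_failures(path, failed):
    """Write failing ciphertexts to a file for a separate replay run."""
    with open(path, "w") as fh:
        fh.write(format_failures(failed))
```

Keeping the failing inputs as plain hex text makes it easy to feed them to a separate decryption-only program and to the CPU reference.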

 

Thanks,

Sanshan

 

 