[Bug 99488] [r600g]OpenCL driver causes ImageMagick to hang on JPEG input in Gaussian Blur kernel

bugzilla-daemon at freedesktop.org
Fri Feb 3 03:39:04 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=99488

--- Comment #3 from Jan Vesely <jv356 at scarletmail.rutgers.edu> ---
(In reply to nixscripter from comment #1)
> I'm still trying some versions in order to help you guys pin this down (it's
> not always easy to tell what reinstall is having what effect, since Arch
> Linux has three packages involved). In the mean time, I did the basics on
> the process in its hung state.
> 
> It's currently running three threads, two blocked, one continuing to run:
> 
> (gdb) info threads 
>   Id   Target Id         Frame 
> * 1    Thread 0x39ac9cdf7c0 (LWP 3806) "display" 0x0000039abefef921 in
> llvm::MachineInstr::findRegisterDefOperandIdx(unsigned int, bool, bool,
> llvm::TargetRegisterInfo const*) const () from /usr/lib/libLLVM-5.0svn.so

Can you get a backtrace of this thread?
Does it ever leave this function? You can check by setting a breakpoint on that
function and seeing whether it gets hit again.
This can be repeated going up the stack to find the function that never exits.

>   2    Thread 0x39abd04f700 (LWP 3809) "radeon_cs:0" 0x0000039ac6b0310f in
> pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
>   3    Thread 0x39abadd4700 (LWP 3814) "display" futex_wait (val=8, 
>     addr=0x25349d4)
>     at /build/gcc-multilib/src/gcc/libgomp/config/linux/x86/futex.h:44
> (gdb)
> 
> 
> What is that call to findRegisterDefOperandIdx doing?

There's a loop in it. It can't be infinite, but if the number of operands is
corrupted, it can take a very long time to finish. Can you check "p e" in gdb?
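
To illustrate why a bad operand count would look like a hang, here is a
simplified Python model of a linear operand scan. It is only a sketch: the
function name, the dict layout and the count of 1,000,000 are made up for
illustration and are not LLVM's API or values taken from this bug.

```python
def scan_for_def(operands, num_operands, reg):
    """Linear scan over num_operands slots.

    Returns (index, iterations); index is -1 when no def of reg is found.
    """
    iterations = 0
    for i in range(num_operands):
        iterations += 1
        # Past the real operand list a C++ loop would read garbage memory;
        # this model just treats those slots as empty.
        op = operands[i] if i < len(operands) else None
        if op is not None and op["is_def"] and op["reg"] == reg:
            return i, iterations
    return -1, iterations


# Hypothetical instruction with two operands and no def of register 7.
operands = [{"reg": 1, "is_def": False}, {"reg": 2, "is_def": True}]

sane = scan_for_def(operands, len(operands), reg=7)
corrupted = scan_for_def(operands, 1_000_000, reg=7)  # bogus operand count

print(sane)       # loop bound equals the real operand count
print(corrupted)  # same result, but the loop ran for the whole bogus count
```

The loop always terminates, but its running time is set entirely by the bound,
which is why the value of "e" is worth looking at.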

> It's not entirely
> clear, but it's sucking up a lot of memory. Running strace confirms that: 
> 
> strace: Process 3806 attached with 3 threads
> strace: [ Process PID=3806 runs in x32 mode. ]
> [pid  3809] futex(0x2599e64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> [pid  3814] futex(0x25349d4, FUTEX_WAIT_PRIVATE, 8, NULL <unfinished ...>
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x640f4000
> strace: [ Process PID=3806 runs in 64 bit mode. ]
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a638f3000
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a630f2000
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a628f1000
> [...]
> 
> And down the address space it goes, 0x1000 bytes (4k) a time or two per
> second.

The mmaps above show 8M (+4K, probably for bookkeeping) allocations. Are there
any others that are not shown? I haven't found anything in the mentioned
function that would need such a large amount of memory; the hang is probably
higher in the call stack.
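
The 8M + 4K figure can be checked directly from the strace output. A small
Python sketch, using only the length and the three 64-bit return addresses
quoted above (the variable names are mine):

```python
MIB = 1024 * 1024
KIB = 1024

# Length argument of every mmap call in the quoted strace output.
mmap_len = 8392704
extra = mmap_len - 8 * MIB  # what is left on top of 8 MiB

# Return values of the three consecutive 64-bit-mode mmap calls.
addrs = [0x39a638f3000, 0x39a630f2000, 0x39a628f1000]
steps = [hi - lo for hi, lo in zip(addrs, addrs[1:])]

print(extra // KIB)             # KiB on top of 8 MiB
print([hex(s) for s in steps])  # distance between consecutive mappings
```

Each mapping starts exactly one mapping length (0x801000 bytes) below the
previous one, so in this excerpt the address space is consumed in steps of
8M + 4K per call.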

> 
> Looking at the function name, I'm thinking about what Jan said on another
> bug:
> 
> > the hang is probably a separate bug. ImageMagick test suite results on my Turks GPU are:
> > # TOTAL: 86
> > # PASS:  78
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  3
> > # XPASS: 0
> > # ERROR: 5
> >
> > the errors and failures are accompanied by:
> > Assertion `i < getNumRegs() && "Register number out of range!"' failed.
> 
> Could this be perhaps the same registers that were out of range on a
> different card?

All cards of one class have the same number of architecturally available
registers.
I see you have debug symbols; is that a debug build? If not, it may be that the
assert is never hit, and the hang is just fallout.

> 
> Either way, I will continue to investigate, and hope to narrow down the
> issue soon.

thanks.
