[Bug 100712] ring 0 stalled after bytes_moved_threshold reached - Cap Verde - HD 7770
bugzilla-daemon at freedesktop.org
Wed Apr 19 12:03:32 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=100712
--- Comment #5 from Julien Isorce <julien.isorce at gmail.com> ---
(In reply to Michel Dänzer from comment #4)
> (In reply to Julien Isorce from comment #0)
> > In kernel radeon_object.c::radeon_bo_list_validate, once "bytes_moved >
> > bytes_moved_threshold" is reached (this is the case for 850 BOs in the same
> > list_for_each_entry loop), I can see that radeon_ib_schedule emits a fence
> > that takes more than the radeon.lockup_timeout to be signaled.
>
> radeon_ib_schedule is called for submitting the command stream from
> userspace, not for any BO moves directly, right?
>
> How did you determine that this hang is directly related to bytes_moved /
> bytes_moved_threshold? Maybe it's only indirectly related, e.g. due to the
> threshold preventing a BO from being moved to VRAM despite userspace's
> preference.
>
I added a trace, and the fence that is not signaled on time is always the one
emitted by radeon_ib_schedule after the bytes_moved_threshold is reached.
But you are right, it could be only indirectly related.
Here is the sequence I have (the threshold check itself is sketched below):
ioctl_radeon_cs
  radeon_bo_list_validate
    bytes_moved > bytes_moved_threshold (= 1024*1024ull)
    800 BOs are not moved from GTT to VRAM because of that.
  radeon_cs_ib_vm_chunk
    radeon_ib_schedule(rdev, &parser->ib, NULL, true);
      radeon_fence_emit on ring 0
      r600_mmio_hdp_flush
/ioctl_radeon_cs
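For reference, this is roughly the check in
radeon_object.c::radeon_bo_list_validate that keeps those BOs in GTT (a
simplified sketch paraphrased from the 4.x radeon driver, not verbatim
kernel source):

    list_for_each_entry(lobj, head, tv.head) {
        struct radeon_bo *bo = lobj->robj;
        u32 domain = lobj->preferred_domains;
        u32 current_domain =
            radeon_mem_type_to_domain(bo->tbo.mem.mem_type);

        /* If the BO would be moved and we have already moved too
         * many bytes for this CS, validate it in its current
         * domain (GTT) instead of its preferred one (VRAM). */
        if ((lobj->allowed_domains & current_domain) != 0 &&
            (domain & current_domain) == 0 &&
            bytes_moved > bytes_moved_threshold)
            domain = current_domain;

        /* ... build the placement for 'domain', call
         * ttm_bo_validate(), and accumulate bytes_moved from
         * rdev->num_bytes_moved ... */
    }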
Then anything calling ttm_bo_wait will block for more than the
radeon.lockup_timeout (10 seconds by default) because the above fence is not
signaled on time.
Could it be that something is not flushed properly? (ref:
https://patchwork.kernel.org/patch/5807141/, tlb_flush?)
Are you saying that some BOs are required to be moved from GTT to VRAM in
order for this fence to be signaled?
As you can see above, it happens when vram_usage >= half_vram, so
radeon_bo_get_threshold_for_moves returns 1024*1024, which explains why only 1
or 2 BOs can be moved from GTT to VRAM in that case and why all the others are
forced to stay in GTT.
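For context, radeon_bo_get_threshold_for_moves computes something along these
lines (paraphrased from the radeon driver of that era, not verbatim):

    static u64 radeon_bo_get_threshold_for_moves(struct radeon_device *rdev)
    {
        u64 real_vram_size = rdev->mc.real_vram_size;
        u64 vram_usage = atomic64_read(&rdev->vram_usage);

        /* Allow moving up to half of the free half of VRAM per CS,
         * with a 1 MB floor once usage crosses half of VRAM. */
        u64 half_vram = real_vram_size >> 1;
        u64 half_free_vram = vram_usage >= half_vram ?
                     0 : half_vram - vram_usage;

        return max(half_free_vram >> 1, 1024*1024ull);
    }

So once VRAM is more than half full, each CS may only blit 1 MB worth of BOs
into VRAM.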
In the same run of radeon_bo_list_validate there are many calls to
ttm_bo_validate with both domain and current_domain being VRAM; this is the
case for around 400 BOs. Maybe this causes a delay in the fence being
signaled, given that VRAM usage is high too.
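To confirm whether those validate calls actually move anything, I could add a
check around the ttm_bo_validate call, along these lines (hypothetical local
debug code, assuming the 4.x ttm_bo_validate signature):

    /* hypothetical debug check: did ttm_bo_validate() move the BO? */
    u32 before = radeon_mem_type_to_domain(bo->tbo.mem.mem_type);
    r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
    if (!r && radeon_mem_type_to_domain(bo->tbo.mem.mem_type) == before)
        DRM_INFO("validate kept bo %p in domain 0x%x\n", bo, before);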
>
> > Also it seems the fence is signaled by swapper after more than 10 seconds
> > but it is too late. It requires reducing the "15" param above to 4 to see
> > that.
>
> How does "swapper" (what is that exactly?) signal the fence?
My wording was wrong, sorry; I should have said "the first entity noticing
that the fence is signaled" by calling radeon_fence_activity. swapper is the
name of process 0 (the idle task). I changed the drm logging to print the
process name and id (current->comm, current->pid):
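The tweak was along these lines (a hypothetical reconstruction of my local
debug patch, not upstream code):

    /* local debug change: tag fence activity with the process
     * that observed it */
    DRM_INFO("fence activity seen by %s (pid %d)\n",
         current->comm, current->pid);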
>
> It might be worth looking into why this happens, though. If domain ==
> current_domain == RADEON_GEM_DOMAIN_VRAM, I wouldn't expect ttm_bo_validate
> to trigger a blit.
I will check, though I think I just got confused by a previous trace.