Long radeon stalls on recent kernels

Tue Dec 16 00:00:59 PST 2014

On 11.12.2014 14:13, Andy Lutomirski wrote:
> On Wed, Dec 10, 2014 at 8:24 PM, Michel Dänzer <michel at daenzer.net> wrote:
>> On 11.12.2014 05:28, Andy Lutomirski wrote:
>>> On Wed, Dec 10, 2014 at 1:44 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>> On 10.12.2014 06:39, Andy Lutomirski wrote:
>>>>> On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski <luto at amacapital.net> wrote:
>>>>>> On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>>>> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>>>>>>>
>>>>>>>> The relevant line from latencytop seems to be:
>>>>>>>>
>>>>>>>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>>>>>>>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>>>>>>>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>>>>>>>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>>>>>>>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>>>>>>>> radeon_bo_fault_reserve_notify [radeon]
>>>>>>>
>>>>>>> Which process is this?
>>>>>>
>>>>>> Xorg
>>>>>>
>>>>>>>
>>>>>>> Looks like CPU access to a BO in VRAM, but the BO is located outside of
>>>>>>> the CPU visible area of VRAM, so it has to be moved into the CPU visible
>>>>>>> area first.
>>
>> [...]
>>
>>>>> But I'm still waiting for the day that buggy userspace *can't* cause
>>>>> kernel graphics stalls.
>>>>
>>>> Actually, this looks more like buggy userspace stalling itself. :)
>>>
>>> I thought the stall was the kernel evicting things from vram.  Why
>>> does it need to wait for userspace for that?  Is it that userspace is
>>> actively using whatever's being evicted?
>>
>> As I explained above, the stall happens because userspace does CPU
>> access to a BO which resides in the CPU-inaccessible part of VRAM. The
>> kernel has to move the BO into the CPU accessible part of VRAM before it
>> can let userspace proceed.
>
> Sure, but why does that take nearly 500ms?  Even if the object in
> question is the entire framebuffer, that still seems extraordinarily
> slow.

It has to wait for any previously queued GPU operations and the eviction
of other buffers. Also, TTM buffer moves are currently synchronous, i.e.
TTM waits for a buffer to become idle before starting its move, which
means we don't get maximum throughput for a series of buffer moves.
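
To illustrate why waiting for idle before each move costs throughput,
here is a toy timing model in Python. It is not the TTM code and is not
derived from measurements: the function names, the single in-order GPU
queue, and the per-wait CPU wake-up cost (wake_ms) are all assumptions
made for this sketch.

```python
def synchronous_total(pending_ms, copy_ms, wake_ms):
    """Hypothetical model: the CPU blocks before and after every move.

    pending_ms: previously queued GPU work that must finish first
    copy_ms:    list of GPU copy durations, one per buffer move
    wake_ms:    assumed CPU wake-up + submission cost per blocking wait
    """
    t = pending_ms                # wait for earlier GPU operations
    for c in copy_ms:
        t += wake_ms              # GPU sits idle while the CPU wakes and submits
        t += c                    # GPU performs the copy; CPU is blocked again
    return t


def pipelined_total(pending_ms, copy_ms, wake_ms):
    """Hypothetical model: all moves are queued up front.

    The GPU runs them back to back behind the pending work, and the CPU
    blocks only once, on the last move.
    """
    t = pending_ms + sum(copy_ms)  # no idle gaps between copies
    return t + wake_ms             # single wake-up at the end


if __name__ == "__main__":
    moves = [10, 10, 10]           # made-up copy times in ms
    print("synchronous:", synchronous_total(100, moves, 2), "ms")
    print("pipelined:  ", pipelined_total(100, moves, 2), "ms")
```

In this model the gap between the two totals grows with the number of
moves, since each synchronous move adds one more idle window on the GPU.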

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer