[PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

Mon Oct 9 12:58:02 UTC 2017

On 09.10.2017 14:33, Christian König wrote:
> Am 09.10.2017 um 13:27 schrieb Nicolai Hähnle:
>> On 09.10.2017 13:12, Christian König wrote:
>>>>
>>>>> Nicolai, how hard would it be to handle ENODEV as failure for all 
>>>>> currently existing contexts?
>>>>
>>>> Impossible? "All currently existing contexts" is not a well-defined 
>>>> concept when multiple drivers co-exist in the same process.
>>>
>>> Ok, let me refine the question: I assume there are resources "shared" 
>>> between contexts like binary shader code for example which needs to 
>>> be reuploaded when VRAM is lost.
>>>
>>> How hard would it be to handle that correctly?
>>
>> Okay, that makes more sense :)
>>
>> With the current interface it's still pretty difficult, but if we 
>> could get a new per-device query ioctl which returns a "VRAM loss 
>> counter", it would be reasonably straight-forward.
> 
> The problem with the VRAM lost counter is that this isn't save either. 
> E.g. you could have an application which uploads shaders, a GPU reset 
> happens and VRAM is lost and then the application creates a new context 
> and makes submission with broken shader binaries.

Hmm. Here's how I imagined we'd be using a VRAM lost counter:

int si_shader_binary_upload(...)
{
    ...
    shader->bo_vram_lost_counter = sscreen->vram_lost_counter;
    shader->bo = pipe_buffer_create(...);
    ptr = sscreen->b.ws->buffer_map(shader->bo->buf, ...);
    ... copies ...
    sscreen->b.ws->buffer_unmap(shader->bo->buf);
}

int si_shader_select(...)
{
    ...
    r = si_shader_select_with_key(ctx->sscreen, state, ...);
    if (r) return r;

    if (state->current->bo_vram_lost_counter !=
        ctx->sscreen->vram_lost_counter) {
       ... re-upload sequence ...
    }
}

(Not shown: logic that compares ctx->vram_lost_counter with 
sscreen->vram_lost_counter and forces a re-validation of all state 
including shaders.)

That should cover this scenario, shouldn't it?

Oh... I see one problem. But it should be easy to fix: when creating a 
new amdgpu context, Mesa needs to query the vram lost counter. So then 
the sequence of events would be either:

- VRAM lost counter starts at 0
- Mesa uploads a shader binary
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1
- Mesa creates a new amdgpu context, queries the VRAM lost counter --> 1
- si_screen::vram_lost_counter is updated to 1
- Draw happens on the new context --> si_shader_select will catch the 
VRAM loss

Or:

- VRAM lost counter starts at 0
- Mesa uploads a shader binary
- Mesa creates a new amdgpu context, VRAM lost counter still 0
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1
- Draw happens on the new context and proceeds normally
...
- Mesa flushes the CS, and the kernel will return an error code because 
the device VRAM lost counter is different from the amdgpu context VRAM 
lost counter

> So I would still vote for a separate IOCTL to reset the VRAM lost state 
> which is called *before" user space starts to reupload 
> shader/descriptors etc...

The question is: is that separate IOCTL per-context or per-fd? If it's 
per-fd, then it's not compatible with multiple drivers. If it's 
per-context, then I don't see how it helps. Perhaps you could explain?

 > This way you also catch the case when another reset happens while you
 > re-upload things.

My assumption would be that the re-upload happens *after* the new amdgpu 
context is created. Then the repeat reset should be caught by the kernel 
when we try to submit a CS on the new context (this is assuming that the 
kernel context's vram lost counter is initialized properly when the 
context is created):

- Mesa prepares upload, sets shader->bo_vram_lost_counter to 0
- Mesa uploads a shader binary
- While doing this, a GPU reset happens[0], kernel increments device 
VRAM lost counter to 1
- Draw happens with the new shader, Mesa proceeds normally
...
- On flush / CS submit, the kernel detects the VRAM lost state and 
returns an error to Mesa

[0] Out of curiosity: What happens on the CPU side if the PCI / full 
ASIC reset method is used? Is there a time window where we could get a SEGV?

[snip]
>> BTW, I still don't like ENODEV. It seems more like the kind of error 
>> code you'd return with hot-pluggable GPUs where the device can 
>> physically disappear...
> 
> Yeah, ECANCELED sounds like a better alternative. But I think we should 
> still somehow note the fatality of loosing VRAM to userspace.
> 
> How about ENODATA or EBADFD?

According to the manpage, EBADFD is "File descriptor in bad state.". 
Sounds fitting :)

Cheers,
Nicolai

> 
> Regards,
> Christian.
> 
>>
>> Cheers,
>> Nicolai
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 09.10.2017 um 13:04 schrieb Nicolai Hähnle:
>>>> On 09.10.2017 12:59, Christian König wrote:
>>>>> Nicolai, how hard would it be to handle ENODEV as failure for all 
>>>>> currently existing contexts?
>>>>
>>>> Impossible? "All currently existing contexts" is not a well-defined 
>>>> concept when multiple drivers co-exist in the same process.
>>>>
>>>> And what would be the purpose of this? If it's to support VRAM loss, 
>>>> having a per-context VRAM loss counter would enable each context to 
>>>> signal ECANCELED separately.
>>>>
>>>> Cheers,
>>>> Nicolai
>>>>
>>>>
>>>>>
>>>>> Monk, would it be ok with you when we return ENODEV only for 
>>>>> existing context when VRAM is lost and/or we have a strict mode GPU 
>>>>> reset? E.g. newly created contexts would continue work as they should.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 09.10.2017 um 12:49 schrieb Nicolai Hähnle:
>>>>>> Hi Monk,
>>>>>>
>>>>>> Yes, you're right, we're only using ECANCELED internally. But as a 
>>>>>> consequence, Mesa would already handle a kernel error of ECANCELED 
>>>>>> on context loss correctly :)
>>>>>>
>>>>>> Cheers,
>>>>>> Nicolai
>>>>>>
>>>>>> On 09.10.2017 12:35, Liu, Monk wrote:
>>>>>>> Hi Christian
>>>>>>>
>>>>>>> You reject some of my patches that returns -ENODEV, with the 
>>>>>>> cause that MESA doesn't do the handling on -ENODEV
>>>>>>>
>>>>>>> But if Nicolai can confirm that MESA do have a handling on 
>>>>>>> -ECANCELED, then we need to overall align our error code, on 
>>>>>>> detail below IOCTL can return error code:
>>>>>>>
>>>>>>> Amdgpu_cs_ioctl
>>>>>>> Amdgpu_cs_wait_ioctl
>>>>>>> Amdgpu_cs_wait_fences_ioctl
>>>>>>> Amdgpu_info_ioctl
>>>>>>>
>>>>>>>
>>>>>>> My patches is:
>>>>>>> return -ENODEV on cs_ioctl if the context is detected guilty,
>>>>>>> also return -ENODEV on cs_wait|cs_wait_fences if the fence is 
>>>>>>> signaled but with error -ETIME,
>>>>>>> also return -ENODEV on info_ioctl so UMD can query if gpu reset 
>>>>>>> happened after the process created (because for strict mode we 
>>>>>>> block process instead of context)
>>>>>>>
>>>>>>>
>>>>>>> according to Nicolai:
>>>>>>>
>>>>>>> amdgpu_cs_ioctl *can* return -ECANCELED, but to be frankly 
>>>>>>> speaking, kernel part doesn't have any place with "-ECANCELED" so 
>>>>>>> this solution on MESA side doesn't align with *current* amdgpu 
>>>>>>> driver,
>>>>>>> which only return 0 on success or -EINVALID on other error but 
>>>>>>> definitely no "-ECANCELED" error code,
>>>>>>>
>>>>>>> so if we talking about community rules we shouldn't let MESA 
>>>>>>> handle -ECANCELED ,  we should have a unified error code
>>>>>>>
>>>>>>> + Marek
>>>>>>>
>>>>>>> BR Monk
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Haehnle, Nicolai
>>>>>>> Sent: 2017年10月9日 18:14
>>>>>>> To: Koenig, Christian <Christian.Koenig at amd.com>; Liu, Monk 
>>>>>>> <Monk.Liu at amd.com>; Nicolai Hähnle <nhaehnle at gmail.com>; 
>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>> Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining 
>>>>>>> fences in amd_sched_entity_fini
>>>>>>>
>>>>>>> On 09.10.2017 10:02, Christian König wrote:
>>>>>>>>> For gpu reset patches (already submitted to pub) I would make 
>>>>>>>>> kernel
>>>>>>>>> return -ENODEV if the waiting fence (in cs_wait or wait_fences 
>>>>>>>>> IOCTL)
>>>>>>>>> founded as error, that way UMD would run into robust extension 
>>>>>>>>> path
>>>>>>>>> and considering the GPU hang occurred,
>>>>>>>> Well that is only closed source behavior which is completely
>>>>>>>> irrelevant for upstream development.
>>>>>>>>
>>>>>>>> As far as I know we haven't pushed the change to return -ENODEV 
>>>>>>>> upstream.
>>>>>>>
>>>>>>> FWIW, radeonsi currently expects -ECANCELED on CS submissions and 
>>>>>>> treats those as context lost. Perhaps we could use the same error 
>>>>>>> on fences?
>>>>>>> That makes more sense to me than -ENODEV.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Nicolai
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 09.10.2017 um 08:42 schrieb Liu, Monk:
>>>>>>>>> Christian
>>>>>>>>>
>>>>>>>>>> It would be really nice to have an error code set on
>>>>>>>>>> s_fence->finished before it is signaled, use 
>>>>>>>>>> dma_fence_set_error()
>>>>>>>>>> for this.
>>>>>>>>> For gpu reset patches (already submitted to pub) I would make 
>>>>>>>>> kernel
>>>>>>>>> return -ENODEV if the waiting fence (in cs_wait or wait_fences 
>>>>>>>>> IOCTL)
>>>>>>>>> founded as error, that way UMD would run into robust extension 
>>>>>>>>> path
>>>>>>>>> and considering the GPU hang occurred,
>>>>>>>>>
>>>>>>>>> Don't know if this is expected for the case of normal process 
>>>>>>>>> being
>>>>>>>>> killed or crashed like Nicolai hit ... since there is no gpu 
>>>>>>>>> hang hit
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> BR Monk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>>>>>>>>> Behalf Of Christian K?nig
>>>>>>>>> Sent: 2017年9月28日 23:01
>>>>>>>>> To: Nicolai Hähnle <nhaehnle at gmail.com>;
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> Cc: Haehnle, Nicolai <Nicolai.Haehnle at amd.com>
>>>>>>>>> Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining
>>>>>>>>> fences in amd_sched_entity_fini
>>>>>>>>>
>>>>>>>>> Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:
>>>>>>>>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>>>>>>>>
>>>>>>>>>> Highly concurrent Piglit runs can trigger a race condition 
>>>>>>>>>> where a
>>>>>>>>>> pending SDMA job on a buffer object is never executed because the
>>>>>>>>>> corresponding process is killed (perhaps due to a crash). 
>>>>>>>>>> Since the
>>>>>>>>>> job's fences were never signaled, the buffer object was 
>>>>>>>>>> effectively
>>>>>>>>>> leaked. Worse, the buffer was stuck wherever it happened to be at
>>>>>>>>>> the time, possibly in VRAM.
>>>>>>>>>>
>>>>>>>>>> The symptom was user space processes stuck in interruptible waits
>>>>>>>>>> with kernel stacks like:
>>>>>>>>>>
>>>>>>>>>>        [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
>>>>>>>>>>        [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
>>>>>>>>>>        [<ffffffffbc5e82d2>]
>>>>>>>>>> reservation_object_wait_timeout_rcu+0x1c2/0x300
>>>>>>>>>>        [<ffffffffc03ce56f>] 
>>>>>>>>>> ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0
>>>>>>>>>> [ttm]
>>>>>>>>>>        [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
>>>>>>>>>>        [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
>>>>>>>>>>        [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
>>>>>>>>>>        [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 
>>>>>>>>>> [ttm]
>>>>>>>>>>        [<ffffffffc042f523>] 
>>>>>>>>>> amdgpu_bo_create_restricted+0x1f3/0x470
>>>>>>>>>> [amdgpu]
>>>>>>>>>>        [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
>>>>>>>>>>        [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140
>>>>>>>>>> [amdgpu]
>>>>>>>>>>        [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120
>>>>>>>>>> [amdgpu]
>>>>>>>>>>        [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
>>>>>>>>>>        [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>>>>>>>>>>        [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
>>>>>>>>>>        [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
>>>>>>>>>>        [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
>>>>>>>>>>        [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>>>>>>>> Acked-by: Christian König <christian.koenig at amd.com>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-
>>>>>>>>>>     1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>>> index 54eb77cffd9b..32a99e980d78 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>>> @@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct
>>>>>>>>>> amd_gpu_scheduler *sched,
>>>>>>>>>> amd_sched_entity_is_idle(entity));
>>>>>>>>>>         amd_sched_rq_remove_entity(rq, entity);
>>>>>>>>>>         if (r) {
>>>>>>>>>>             struct amd_sched_job *job;
>>>>>>>>>>             /* Park the kernel for a moment to make sure it isn't
>>>>>>>>>> processing
>>>>>>>>>>              * our enity.
>>>>>>>>>>              */
>>>>>>>>>>             kthread_park(sched->thread);
>>>>>>>>>>             kthread_unpark(sched->thread);
>>>>>>>>>> -        while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
>>>>>>>>>> +        while (kfifo_out(&entity->job_queue, &job, 
>>>>>>>>>> sizeof(job))) {
>>>>>>>>>> +            struct amd_sched_fence *s_fence = job->s_fence;
>>>>>>>>>> +            amd_sched_fence_scheduled(s_fence);
>>>>>>>>> It would be really nice to have an error code set on
>>>>>>>>> s_fence->finished before it is signaled, use 
>>>>>>>>> dma_fence_set_error() for this.
>>>>>>>>>
>>>>>>>>> Additional to that it would be nice to note in the subject line 
>>>>>>>>> that
>>>>>>>>> this is a rather important bug fix.
>>>>>>>>>
>>>>>>>>> With that fixed the whole series is Reviewed-by: Christian König
>>>>>>>>> <christian.koenig at amd.com>.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> + amd_sched_fence_finished(s_fence);
>>>>>>>>>> + dma_fence_put(&s_fence->finished);
>>>>>>>>>>                 sched->ops->free_job(job);
>>>>>>>>>> +        }
>>>>>>>>>>         }
>>>>>>>>>>         kfifo_free(&entity->job_queue);
>>>>>>>>>>     }
>>>>>>>>>>     static void amd_sched_entity_wakeup(struct dma_fence *f, 
>>>>>>>>>> struct
>>>>>>>>>> dma_fence_cb *cb)
>>>>>>>>>>     {
>>>>>>>>>>         struct amd_sched_entity *entity =
>>>>>>>>>>             container_of(cb, struct amd_sched_entity, cb);
>>>>>>>>>>         entity->dependency = NULL;
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> amd-gfx mailing list
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>> _______________________________________________
>>>>>>>>> amd-gfx mailing list
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>