<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<style type="text/css" style="">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<p>Mesa does not handle -ECANCELED. It only returns -ECANCELED from the Mesa winsys layer if the CS ioctl wasn't called (because the context is already lost and so the winsys doesn't submit further CS ioctls).</p>
<p><br>
</p>
<p>When the CS ioctl fails for the first time, the kernel error is returned and the context is marked as "lost".</p>
<p>The next command submission is automatically dropped by the winsys and it returns -ECANCELED.<br>
</p>
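<p>In the winsys this amounts to something like the following (a compilable sketch; the struct and function names are invented for illustration, not the actual radeonsi winsys code):</p>

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the winsys context state. */
struct ws_ctx {
    bool lost; /* set after the first failed CS ioctl */
};

/* kernel_cs_ioctl stands in for the real CS submit ioctl. */
static int ws_cs_submit(struct ws_ctx *ctx, int (*kernel_cs_ioctl)(void))
{
    int r;

    if (ctx->lost)
        return -ECANCELED; /* dropped without calling the kernel */

    r = kernel_cs_ioctl();
    if (r < 0)
        ctx->lost = true; /* first failure marks the context lost */
    return r; /* the kernel's error is passed through exactly once */
}
```

<p>So user space sees the kernel's error code once, and -ECANCELED for every submission after that.</p>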
<p><br>
</p>
<p>Marek<br>
</p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Haehnle, Nicolai<br>
<b>Sent:</b> Monday, October 9, 2017 2:58:02 PM<br>
<b>To:</b> Koenig, Christian; Liu, Monk; Nicolai Hähnle; amd-gfx@lists.freedesktop.org; Olsak, Marek<br>
<b>Cc:</b> Li, Bingley<br>
<b>Subject:</b> Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">On 09.10.2017 14:33, Christian König wrote:<br>
> Am 09.10.2017 um 13:27 schrieb Nicolai Hähnle:<br>
>> On 09.10.2017 13:12, Christian König wrote:<br>
>>>><br>
>>>>> Nicolai, how hard would it be to handle ENODEV as failure for all <br>
>>>>> currently existing contexts?<br>
>>>><br>
>>>> Impossible? "All currently existing contexts" is not a well-defined <br>
>>>> concept when multiple drivers co-exist in the same process.<br>
>>><br>
>>> Ok, let me refine the question: I assume there are resources "shared" <br>
>>> between contexts like binary shader code for example which needs to <br>
>>> be reuploaded when VRAM is lost.<br>
>>><br>
>>> How hard would it be to handle that correctly?<br>
>><br>
>> Okay, that makes more sense :)<br>
>><br>
>> With the current interface it's still pretty difficult, but if we <br>
>> could get a new per-device query ioctl which returns a "VRAM loss <br>
>> counter", it would be reasonably straightforward.<br>
> <br>
> The problem with the VRAM lost counter is that this isn't safe either. <br>
> E.g. you could have an application which uploads shaders, a GPU reset <br>
> happens and VRAM is lost, and then the application creates a new context <br>
> and makes a submission with broken shader binaries.<br>
<br>
Hmm. Here's how I imagined we'd be using a VRAM lost counter:<br>
<br>
int si_shader_binary_upload(...)<br>
{<br>
...<br>
shader->bo_vram_lost_counter = sscreen->vram_lost_counter;<br>
shader->bo = pipe_buffer_create(...);<br>
ptr = sscreen->b.ws->buffer_map(shader->bo->buf, ...);<br>
... copies ...<br>
sscreen->b.ws->buffer_unmap(shader->bo->buf);<br>
}<br>
<br>
int si_shader_select(...)<br>
{<br>
...<br>
r = si_shader_select_with_key(ctx->sscreen, state, ...);<br>
if (r) return r;<br>
<br>
if (state->current->bo_vram_lost_counter !=<br>
ctx->sscreen->vram_lost_counter) {<br>
... re-upload sequence ...<br>
}<br>
}<br>
<br>
(Not shown: logic that compares ctx->vram_lost_counter with <br>
sscreen->vram_lost_counter and forces a re-validation of all state <br>
including shaders.)<br>
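In other words, the check reduces to comparing two counters (a compilable <br>
sketch with made-up struct names; only the counters matter here):<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up stand-ins for si_screen and si_shader. */
struct screen { uint32_t vram_lost_counter; };
struct shader { uint32_t bo_vram_lost_counter; };

/* True when a VRAM loss happened after the last upload of this binary. */
static bool shader_needs_reupload(const struct screen *s,
                                  const struct shader *sh)
{
    return sh->bo_vram_lost_counter != s->vram_lost_counter;
}

/* The re-upload sequence snapshots the current counter again. */
static void shader_upload(const struct screen *s, struct shader *sh)
{
    /* ... create BO, map, copy binary, unmap ... */
    sh->bo_vram_lost_counter = s->vram_lost_counter;
}
```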
<br>
That should cover this scenario, shouldn't it?<br>
<br>
Oh... I see one problem. But it should be easy to fix: when creating a <br>
new amdgpu context, Mesa needs to query the vram lost counter. So then <br>
the sequence of events would be either:<br>
<br>
- VRAM lost counter starts at 0<br>
- Mesa uploads a shader binary<br>
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1<br>
- Mesa creates a new amdgpu context, queries the VRAM lost counter --> 1<br>
- si_screen::vram_lost_counter is updated to 1<br>
- Draw happens on the new context --> si_shader_select will catch the <br>
VRAM loss<br>
<br>
Or:<br>
<br>
- VRAM lost counter starts at 0<br>
- Mesa uploads a shader binary<br>
- Mesa creates a new amdgpu context, VRAM lost counter still 0<br>
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1<br>
- Draw happens on the new context and proceeds normally<br>
...<br>
- Mesa flushes the CS, and the kernel will return an error code because <br>
the device VRAM lost counter is different from the amdgpu context VRAM <br>
lost counter<br>
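Both sequences boil down to the same kernel-side check (a hedged sketch; <br>
the struct and function names are invented, not actual amdgpu code):<br>

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct device { uint32_t vram_lost_counter; };

struct gpu_ctx {
    const struct device *dev;
    uint32_t vram_lost_counter; /* snapshot taken at context creation */
};

static void ctx_create(struct gpu_ctx *ctx, const struct device *dev)
{
    ctx->dev = dev;
    ctx->vram_lost_counter = dev->vram_lost_counter;
}

/* CS submit fails once the device counter has moved past the snapshot. */
static int cs_submit(const struct gpu_ctx *ctx)
{
    if (ctx->vram_lost_counter != ctx->dev->vram_lost_counter)
        return -ECANCELED; /* exactly which error code is what's debated below */
    return 0;
}
```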
<br>
<br>
> So I would still vote for a separate IOCTL to reset the VRAM lost state <br>
> which is called *before* user space starts to re-upload <br>
> shaders/descriptors etc...<br>
<br>
The question is: is that separate IOCTL per-context or per-fd? If it's <br>
per-fd, then it's not compatible with multiple drivers. If it's <br>
per-context, then I don't see how it helps. Perhaps you could explain?<br>
<br>
<br>
> This way you also catch the case when another reset happens while you<br>
> re-upload things.<br>
<br>
My assumption would be that the re-upload happens *after* the new amdgpu <br>
context is created. Then the repeat reset should be caught by the kernel <br>
when we try to submit a CS on the new context (this is assuming that the <br>
kernel context's vram lost counter is initialized properly when the <br>
context is created):<br>
<br>
- Mesa prepares upload, sets shader->bo_vram_lost_counter to 0<br>
- Mesa uploads a shader binary<br>
- While doing this, a GPU reset happens[0], kernel increments device <br>
VRAM lost counter to 1<br>
- Draw happens with the new shader, Mesa proceeds normally<br>
...<br>
- On flush / CS submit, the kernel detects the VRAM lost state and <br>
returns an error to Mesa<br>
<br>
[0] Out of curiosity: What happens on the CPU side if the PCI / full <br>
ASIC reset method is used? Is there a time window where we could get a SEGV?<br>
<br>
<br>
[snip]<br>
>> BTW, I still don't like ENODEV. It seems more like the kind of error <br>
>> code you'd return with hot-pluggable GPUs where the device can <br>
>> physically disappear...<br>
> <br>
> Yeah, ECANCELED sounds like a better alternative. But I think we should <br>
> still somehow note the fatality of losing VRAM to userspace.<br>
> <br>
> How about ENODATA or EBADFD?<br>
<br>
According to the manpage, EBADFD is "File descriptor in bad state.". <br>
Sounds fitting :)<br>
<br>
Cheers,<br>
Nicolai<br>
<br>
<br>
> <br>
> Regards,<br>
> Christian.<br>
> <br>
>><br>
>> Cheers,<br>
>> Nicolai<br>
>><br>
>><br>
>>><br>
>>> Regards,<br>
>>> Christian.<br>
>>><br>
>>> Am 09.10.2017 um 13:04 schrieb Nicolai Hähnle:<br>
>>>> On 09.10.2017 12:59, Christian König wrote:<br>
>>>>> Nicolai, how hard would it be to handle ENODEV as failure for all <br>
>>>>> currently existing contexts?<br>
>>>><br>
>>>> Impossible? "All currently existing contexts" is not a well-defined <br>
>>>> concept when multiple drivers co-exist in the same process.<br>
>>>><br>
>>>> And what would be the purpose of this? If it's to support VRAM loss, <br>
>>>> having a per-context VRAM loss counter would enable each context to <br>
>>>> signal ECANCELED separately.<br>
>>>><br>
>>>> Cheers,<br>
>>>> Nicolai<br>
>>>><br>
>>>><br>
>>>>><br>
>>>>> Monk, would it be ok with you if we return ENODEV only for <br>
>>>>> existing contexts when VRAM is lost and/or we have a strict-mode GPU <br>
>>>>> reset? E.g. newly created contexts would continue to work as they should.<br>
>>>>><br>
>>>>> Regards,<br>
>>>>> Christian.<br>
>>>>><br>
>>>>> Am 09.10.2017 um 12:49 schrieb Nicolai Hähnle:<br>
>>>>>> Hi Monk,<br>
>>>>>><br>
>>>>>> Yes, you're right, we're only using ECANCELED internally. But as a <br>
>>>>>> consequence, Mesa would already handle a kernel error of ECANCELED <br>
>>>>>> on context loss correctly :)<br>
>>>>>><br>
>>>>>> Cheers,<br>
>>>>>> Nicolai<br>
>>>>>><br>
>>>>>> On 09.10.2017 12:35, Liu, Monk wrote:<br>
>>>>>>> Hi Christian<br>
>>>>>>><br>
>>>>>>> You rejected some of my patches that return -ENODEV, on the <br>
>>>>>>> grounds that MESA doesn't handle -ENODEV.<br>
>>>>>>><br>
>>>>>>> But if Nicolai can confirm that MESA does handle <br>
>>>>>>> -ECANCELED, then we need to align our error codes overall. In <br>
>>>>>>> detail, the IOCTLs below can return error codes:<br>
>>>>>>><br>
>>>>>>> Amdgpu_cs_ioctl<br>
>>>>>>> Amdgpu_cs_wait_ioctl<br>
>>>>>>> Amdgpu_cs_wait_fences_ioctl<br>
>>>>>>> Amdgpu_info_ioctl<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> My patches are:<br>
>>>>>>> return -ENODEV on cs_ioctl if the context is detected guilty,<br>
>>>>>>> also return -ENODEV on cs_wait|cs_wait_fences if the fence is <br>
>>>>>>> signaled but with error -ETIME,<br>
>>>>>>> also return -ENODEV on info_ioctl so the UMD can query whether a <br>
>>>>>>> GPU reset happened after the process was created (because in <br>
>>>>>>> strict mode we block the process instead of the context)<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> according to Nicolai:<br>
>>>>>>><br>
>>>>>>> amdgpu_cs_ioctl *can* return -ECANCELED, but frankly <br>
>>>>>>> speaking, the kernel part doesn't return "-ECANCELED" anywhere, so <br>
>>>>>>> this solution on the MESA side doesn't align with the *current* <br>
>>>>>>> amdgpu driver,<br>
>>>>>>> which only returns 0 on success or -EINVAL on other errors but <br>
>>>>>>> definitely never the "-ECANCELED" error code,<br>
>>>>>>><br>
>>>>>>> so if we're talking about community rules we shouldn't let MESA <br>
>>>>>>> handle -ECANCELED; we should have a unified error code<br>
>>>>>>><br>
>>>>>>> + Marek<br>
>>>>>>><br>
>>>>>>> BR Monk<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> -----Original Message-----<br>
>>>>>>> From: Haehnle, Nicolai<br>
>>>>>>> Sent: October 9, 2017 18:14<br>
>>>>>>> To: Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <br>
>>>>>>> <Monk.Liu@amd.com>; Nicolai Hähnle <nhaehnle@gmail.com>; <br>
>>>>>>> amd-gfx@lists.freedesktop.org<br>
>>>>>>> Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining <br>
>>>>>>> fences in amd_sched_entity_fini<br>
>>>>>>><br>
>>>>>>> On 09.10.2017 10:02, Christian König wrote:<br>
>>>>>>>>> For gpu reset patches (already submitted to pub) I would make <br>
>>>>>>>>> the kernel<br>
>>>>>>>>> return -ENODEV if the waiting fence (in the cs_wait or wait_fences <br>
>>>>>>>>> IOCTL)<br>
>>>>>>>>> is found to be in error; that way the UMD would go down the <br>
>>>>>>>>> robustness extension path<br>
>>>>>>>>> and consider that a GPU hang occurred,<br>
>>>>>>>> Well that is only closed source behavior which is completely<br>
>>>>>>>> irrelevant for upstream development.<br>
>>>>>>>><br>
>>>>>>>> As far as I know we haven't pushed the change to return -ENODEV <br>
>>>>>>>> upstream.<br>
>>>>>>><br>
>>>>>>> FWIW, radeonsi currently expects -ECANCELED on CS submissions and <br>
>>>>>>> treats those as context lost. Perhaps we could use the same error <br>
>>>>>>> on fences?<br>
>>>>>>> That makes more sense to me than -ENODEV.<br>
>>>>>>><br>
>>>>>>> Cheers,<br>
>>>>>>> Nicolai<br>
>>>>>>><br>
>>>>>>>><br>
>>>>>>>> Regards,<br>
>>>>>>>> Christian.<br>
>>>>>>>><br>
>>>>>>>> Am 09.10.2017 um 08:42 schrieb Liu, Monk:<br>
>>>>>>>>> Christian<br>
>>>>>>>>><br>
>>>>>>>>>> It would be really nice to have an error code set on<br>
>>>>>>>>>> s_fence->finished before it is signaled, use <br>
>>>>>>>>>> dma_fence_set_error()<br>
>>>>>>>>>> for this.<br>
>>>>>>>>> For gpu reset patches (already submitted to pub) I would make <br>
>>>>>>>>> the kernel<br>
>>>>>>>>> return -ENODEV if the waiting fence (in the cs_wait or wait_fences <br>
>>>>>>>>> IOCTL)<br>
>>>>>>>>> is found to be in error; that way the UMD would go down the <br>
>>>>>>>>> robustness extension path<br>
>>>>>>>>> and consider that a GPU hang occurred,<br>
>>>>>>>>><br>
>>>>>>>>> Don't know if this is expected for the case of a normal process <br>
>>>>>>>>> being<br>
>>>>>>>>> killed or crashing, like the one Nicolai hit ... since no GPU <br>
>>>>>>>>> hang was hit<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> BR Monk<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> -----Original Message-----<br>
>>>>>>>>> From: amd-gfx [<a href="mailto:amd-gfx-bounces@lists.freedesktop.org">mailto:amd-gfx-bounces@lists.freedesktop.org</a>] On<br>
>>>>>>>>> Behalf Of Christian König<br>
>>>>>>>>> Sent: September 28, 2017 23:01<br>
>>>>>>>>> To: Nicolai Hähnle <nhaehnle@gmail.com>;<br>
>>>>>>>>> amd-gfx@lists.freedesktop.org<br>
>>>>>>>>> Cc: Haehnle, Nicolai <Nicolai.Haehnle@amd.com><br>
>>>>>>>>> Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining<br>
>>>>>>>>> fences in amd_sched_entity_fini<br>
>>>>>>>>><br>
>>>>>>>>> Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:<br>
>>>>>>>>>> From: Nicolai Hähnle <nicolai.haehnle@amd.com><br>
>>>>>>>>>><br>
>>>>>>>>>> Highly concurrent Piglit runs can trigger a race condition <br>
>>>>>>>>>> where a<br>
>>>>>>>>>> pending SDMA job on a buffer object is never executed because the<br>
>>>>>>>>>> corresponding process is killed (perhaps due to a crash). <br>
>>>>>>>>>> Since the<br>
>>>>>>>>>> job's fences were never signaled, the buffer object was <br>
>>>>>>>>>> effectively<br>
>>>>>>>>>> leaked. Worse, the buffer was stuck wherever it happened to be at<br>
>>>>>>>>>> the time, possibly in VRAM.<br>
>>>>>>>>>><br>
>>>>>>>>>> The symptom was user space processes stuck in interruptible waits<br>
>>>>>>>>>> with kernel stacks like:<br>
>>>>>>>>>><br>
>>>>>>>>>> [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250<br>
>>>>>>>>>> [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0<br>
>>>>>>>>>> [<ffffffffbc5e82d2>]<br>
>>>>>>>>>> reservation_object_wait_timeout_rcu+0x1c2/0x300<br>
>>>>>>>>>> [<ffffffffc03ce56f>] <br>
>>>>>>>>>> ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0<br>
>>>>>>>>>> [ttm]<br>
>>>>>>>>>> [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]<br>
>>>>>>>>>> [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]<br>
>>>>>>>>>> [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]<br>
>>>>>>>>>> [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 <br>
>>>>>>>>>> [ttm]<br>
>>>>>>>>>> [<ffffffffc042f523>] <br>
>>>>>>>>>> amdgpu_bo_create_restricted+0x1f3/0x470<br>
>>>>>>>>>> [amdgpu]<br>
>>>>>>>>>> [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]<br>
>>>>>>>>>> [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140<br>
>>>>>>>>>> [amdgpu]<br>
>>>>>>>>>> [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120<br>
>>>>>>>>>> [amdgpu]<br>
>>>>>>>>>> [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]<br>
>>>>>>>>>> [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]<br>
>>>>>>>>>> [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0<br>
>>>>>>>>>> [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90<br>
>>>>>>>>>> [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad<br>
>>>>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff<br>
>>>>>>>>>><br>
>>>>>>>>>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com><br>
>>>>>>>>>> Acked-by: Christian König <christian.koenig@amd.com><br>
>>>>>>>>>> ---<br>
>>>>>>>>>> drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-<br>
>>>>>>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)<br>
>>>>>>>>>><br>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
>>>>>>>>>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
>>>>>>>>>> index 54eb77cffd9b..32a99e980d78 100644<br>
>>>>>>>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
>>>>>>>>>> @@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct<br>
>>>>>>>>>> amd_gpu_scheduler *sched,<br>
>>>>>>>>>> amd_sched_entity_is_idle(entity));<br>
>>>>>>>>>> amd_sched_rq_remove_entity(rq, entity);<br>
>>>>>>>>>> if (r) {<br>
>>>>>>>>>> struct amd_sched_job *job;<br>
>>>>>>>>>> /* Park the kernel for a moment to make sure it isn't<br>
>>>>>>>>>> processing<br>
>>>>>>>>>> * our entity.<br>
>>>>>>>>>> */<br>
>>>>>>>>>> kthread_park(sched->thread);<br>
>>>>>>>>>> kthread_unpark(sched->thread);<br>
>>>>>>>>>> - while (kfifo_out(&entity->job_queue, &job, sizeof(job)))<br>
>>>>>>>>>> + while (kfifo_out(&entity->job_queue, &job, <br>
>>>>>>>>>> sizeof(job))) {<br>
>>>>>>>>>> + struct amd_sched_fence *s_fence = job->s_fence;<br>
>>>>>>>>>> + amd_sched_fence_scheduled(s_fence);<br>
>>>>>>>>> It would be really nice to have an error code set on<br>
>>>>>>>>> s_fence->finished before it is signaled, use <br>
>>>>>>>>> dma_fence_set_error() for this.<br>
>>>>>>>>><br>
>>>>>>>>> Additional to that it would be nice to note in the subject line <br>
>>>>>>>>> that<br>
>>>>>>>>> this is a rather important bug fix.<br>
>>>>>>>>><br>
>>>>>>>>> With that fixed the whole series is Reviewed-by: Christian König<br>
>>>>>>>>> <christian.koenig@amd.com>.<br>
>>>>>>>>><br>
>>>>>>>>> Regards,<br>
>>>>>>>>> Christian.<br>
>>>>>>>>><br>
>>>>>>>>>> + amd_sched_fence_finished(s_fence);<br>
>>>>>>>>>> + dma_fence_put(&s_fence->finished);<br>
>>>>>>>>>> sched->ops->free_job(job);<br>
>>>>>>>>>> + }<br>
>>>>>>>>>> }<br>
>>>>>>>>>> kfifo_free(&entity->job_queue);<br>
>>>>>>>>>> }<br>
>>>>>>>>>> static void amd_sched_entity_wakeup(struct dma_fence *f, <br>
>>>>>>>>>> struct<br>
>>>>>>>>>> dma_fence_cb *cb)<br>
>>>>>>>>>> {<br>
>>>>>>>>>> struct amd_sched_entity *entity =<br>
>>>>>>>>>> container_of(cb, struct amd_sched_entity, cb);<br>
>>>>>>>>>> entity->dependency = NULL;<br>
>>>>>>>>><br>
>>>>>>>>> _______________________________________________<br>
>>>>>>>>> amd-gfx mailing list<br>
>>>>>>>>> amd-gfx@lists.freedesktop.org<br>
>>>>>>>>> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
>>>>>>>>> _______________________________________________<br>
>>>>>>>>> amd-gfx mailing list<br>
>>>>>>>>> amd-gfx@lists.freedesktop.org<br>
>>>>>>>>> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
>>>>>>>><br>
>>>>>>>><br>
>>>>>>><br>
>>>>>><br>
>>>>><br>
>>>><br>
>>><br>
>><br>
> <br>
<br>
</div>
</span></font>
</body>
</html>