[PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

Fri Jan 11 19:11:13 UTC 2019

On 11.01.19 at 16:37, Grodzovsky, Andrey wrote:
>
> On 01/11/2019 04:42 AM, Koenig, Christian wrote:
>> On 10.01.19 at 16:56, Grodzovsky, Andrey wrote:
>>> [SNIP]
>>>>>> But we will not be adding the cb back in drm_sched_stop anymore; now we
>>>>>> are only going to add back the cb in drm_sched_start after rerunning
>>>>>> those jobs in drm_sched_resubmit_jobs and assigning them a new parent
>>>>>> there anyway.
>>>>> Yeah, but when we find that we don't need to reset anything after all,
>>>>> adding the callbacks again won't be possible anymore.
>>>>>
>>>>> Christian.
>>>> I am not sure I understand; can you point me to an example of how this
>>>> would happen? I am attaching my latest patches, which wait only for
>>>> the last job's fence, just so we are on the same page regarding the code.
>> Well, the whole idea is to prepare all schedulers, then check once more
>> whether the offending job has completed in the meantime.
>>
>> If the job completed we need to be able to rollback everything and
>> continue as if nothing had happened.
>>
>> Christian.
> Oh, but this piece of functionality (skipping the HW ASIC reset in case
> the guilty job is done) is totally missing from this patch series and so
> needs to be added. So what you are actually saying is that for the case
> where we skip the HW ASIC reset because the guilty job did complete, we
> also need to skip resubmitting the jobs in drm_sched_resubmit_jobs and
> hence preserve the old parent fence pointer for reuse? If so, I would
> like to add all this functionality as a third patch, since the first 2
> patches are more about resolving the race condition with jobs in flight
> while doing reset. What do you think?

Yeah, sounds perfectly fine to me.

Christian.

>
> Andrey
>>>> Andrey
>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
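The rollback flow discussed in this thread can be modeled in userspace as a rough sketch. The flow stops all schedulers, re-checks the guilty job, and, if that job already completed, skips both the HW reset and the resubmission so the old parent fence pointers are reused. Every struct and function below (`hw_fence`, `job`, `sched`, `sched_stop`, `sched_resubmit`, `sched_start`, `recover`) is hypothetical and is not the real drm_sched or dma_fence API. The real drm_sched_stop, drm_sched_resubmit_jobs and drm_sched_start appear only in comments as the steps being modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: NOT the real struct dma_fence / drm_sched_job. */
struct hw_fence {
    bool signaled;
    int generation;           /* which submission produced this fence */
};

struct job {
    struct hw_fence *parent;  /* HW fence of the job's current submission */
    bool cb_armed;            /* completion callback installed on parent? */
};

#define MAX_JOBS 4
#define MAX_FENCES 16

struct sched {
    struct job jobs[MAX_JOBS];
    int njobs;
    bool stopped;
};

static struct hw_fence fence_pool[MAX_FENCES];
static int fence_count;
static int hw_resets;         /* number of simulated ASIC resets */

static struct hw_fence *new_fence(int generation)
{
    assert(fence_count < MAX_FENCES);
    struct hw_fence *f = &fence_pool[fence_count++];
    f->signaled = false;
    f->generation = generation;
    return f;
}

/* Models drm_sched_stop: park the scheduler and detach the callbacks,
 * but keep the old parent pointers so a rollback can reuse them. */
static void sched_stop(struct sched *s)
{
    s->stopped = true;
    for (int i = 0; i < s->njobs; i++)
        s->jobs[i].cb_armed = false;
}

/* Models drm_sched_resubmit_jobs: every job gets a fresh parent fence.
 * For simplicity this resubmits all jobs, signaled or not. */
static void sched_resubmit(struct sched *s, int generation)
{
    for (int i = 0; i < s->njobs; i++)
        s->jobs[i].parent = new_fence(generation);
}

/* Models drm_sched_start: re-arm callbacks on whatever parents the jobs
 * now hold (old ones after a rollback, new ones after a resubmit). */
static void sched_start(struct sched *s)
{
    for (int i = 0; i < s->njobs; i++)
        s->jobs[i].cb_armed = true;
    s->stopped = false;
}

/* Returns true if a HW reset was performed, false if it was rolled back. */
static bool recover(struct sched *scheds, int n, struct job *guilty)
{
    for (int i = 0; i < n; i++)
        sched_stop(&scheds[i]);

    if (guilty->parent->signaled) {
        /* Guilty job finished in the meantime: skip the reset and the
         * resubmission, keep the old parents and just restart. */
        for (int i = 0; i < n; i++)
            sched_start(&scheds[i]);
        return false;
    }

    hw_resets++;              /* simulated ASIC reset */
    for (int i = 0; i < n; i++) {
        sched_resubmit(&scheds[i], hw_resets);
        sched_start(&scheds[i]);
    }
    return true;
}
```

The point of the model is that the callbacks are re-armed only in the final "start" step. In the rollback branch, the parent pointers left untouched by the "stop" step are still valid to re-arm. Whether the real code can do this depends on the third patch proposed above.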