[PATCH 2/2] drm/scheduler: stop setting rq to NULL
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Thu Aug 9 17:39:56 UTC 2018
On 08/06/2018 04:14 AM, Christian König wrote:
> On 03.08.2018 at 20:41, Andrey Grodzovsky wrote:
>>
>>
>>
>> On 08/06/2018 08:44 AM, Christian König wrote:
>>> On 03.08.2018 at 16:54, Andrey Grodzovsky wrote:
>>>>
>>>> [SNIP]
>>>>
>>>>>>
>>>>>>>
>>>>>>> > Second of all, even after we removed the entity from rq in
>>>>>>> > drm_sched_entity_flush to terminate any subsequent submissions
>>>>>>> > to the queue, the other thread doing push job can just acquire
>>>>>>> > a run queue again from drm_sched_entity_get_free_sched and
>>>>>>> > continue submission.
>>>>>>>
>>>>>>> Hi Christian
>>>>>>>
>>>>>>> That is actually desired.
>>>>>>>
>>>>>>> When another process is now using the entity to submit jobs,
>>>>>>> adding it back to the rq is actually the right thing to do
>>>>>>> because the entity is still in use.
>>>>>>>
>>>>>>
>>>>>> Yes, no problem if it's another process. But what about another
>>>>>> thread from the same process? Is it a possible use case that two
>>>>>> threads from the same process submit to the same entity
>>>>>> concurrently? If so, and we specifically kill one, the other will
>>>>>> not stop even if we want it to, because the current code just makes
>>>>>> it acquire an rq for itself.
>>>>>
>>>>> Well, you can't kill a single thread of a process (you can only
>>>>> interrupt it); a SIGKILL will always kill the whole process.
>>>>
>>>> Is the following scenario possible and acceptable?
>>>> Two threads from the same process work on the same queue. Thread A is
>>>> currently in drm_sched_entity_flush->wait_event_timeout (the process
>>>> is being shut down because of a SIGKILL sent by the user), while
>>>> thread B is still inside drm_sched_entity_push_job, before 'if
>>>> (reschedule)'. A stops waiting because the queue became empty and
>>>> then removes the entity's queue from the scheduler's run queue, while
>>>> B goes inside 'reschedule' because it evaluates to true ('first' is
>>>> true, as are all the rest of the conditions), acquires a new rq,
>>>> later adds the entity back to a scheduler (maybe a different one) and
>>>> keeps submitting jobs as much as it likes, and can then be stuck for
>>>> up to 'timeout' time in its own drm_sched_entity_flush waiting for
>>>> them.
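
Roughly, the interleaving I had in mind, as a simplified sketch (from
memory, not verbatim kernel code; locking and most details omitted):

    /* Thread A, in drm_sched_entity_flush() during SIGKILL teardown */
    wait_event_timeout(sched->job_scheduled,
                       drm_sched_entity_is_idle(entity),
                       timeout);                       /* (1) queue drains  */
    drm_sched_rq_remove_entity(entity->rq, entity);    /* (3) entity off rq */

    /* Thread B, in drm_sched_entity_push_job(), racing with A */
    first = spsc_queue_count(&entity->job_queue) == 0;
    if (reschedule) {                                  /* (2) 'first' true  */
            rq = drm_sched_entity_get_free_sched(entity);
            entity->rq = rq;                           /* (4) new rq taken, */
    }                                                  /* undoing A's (3)   */
    /* ... spsc_queue_push() + drm_sched_rq_add_entity() then keep the
     * entity schedulable even though the process is exiting */
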
>>>
>>> I'm not 100% sure but I don't think that can happen.
>>>
>>> See, flushing the fd is done while dropping the fd, which happens
>>> only after all threads of the process in question are killed.
>>
>> Yeah, this FD handling is indeed a lot of gray area for me, but as far
>> as I remember flushing is done per thread when it exits (possibly due
>> to a signal).
>> Now signal interception and processing (as a result of which .flush
>> will get called if SIGKILL was received) is done at certain points,
>> among which is when returning from an IOCTL.
>> So if the first thread was at the very end of the CS ioctl when SIGKILL
>> was received while the other one was at the beginning, then I think we
>> might see something like the scenario above.
>
> SIGKILL isn't processed as long as any thread of the application is
> still inside the kernel. That's why we have wait_event_killable().
Can you tell me where this happens? What I see in the code is that
do_group_exit calls zap_other_threads, which just adds SIGKILL to the
signal sets of the other threads in the group and sends a wake up.
Then do_exit will close all FDs for the current thread and so .flush will
be called; when the last thread drops its refcount for the FD, .release
will be called.
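
In call-chain form, the path I'm reading looks roughly like this (a
simplified sketch from my reading of kernel/signal.c, kernel/exit.c and
fs/file.c; the exact helpers in between may be off):

    do_group_exit(SIGKILL)
        zap_other_threads(current)        /* sigaddset(SIGKILL) plus
                                           * signal_wake_up() for each
                                           * sibling thread; it does not
                                           * wait for them */

    do_exit()                             /* runs in each thread separately */
        exit_files(tsk)
            put_files_struct()
                filp_close(file, ...)
                    file->f_op->flush()   /* -> drm_sched_entity_flush()
                                           * via the driver's .flush
                                           * callback */
                    fput(file)
                        file->f_op->release()  /* on the last reference only */
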
Andrey
>
> So I don't think that the scenario above is possible, but I'm really
> not 100% sure either.
>
> Christian.
>
>>
>> Andrey
>>
>>>
>>> Otherwise the flushing wouldn't make much sense. In other words,
>>> imagine an application that is killed while one of its threads is
>>> doing a write() on an fd.
>>>
>>> The idea of the flush is to preserve the data, and that won't work if
>>> things aren't correctly ordered.
>>>
>>>> My understanding was that the introduction of entity->last is to
>>>> force immediate termination of job submissions by any thread from
>>>> the terminating process.
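
For reference, the mechanism I meant looks, if I remember correctly,
roughly like this (the field is entity->last_user in the code; exact
names from memory, so they may be slightly off):

    /* drm_sched_entity_push_job(): remember the last submitter */
    WRITE_ONCE(entity->last_user, current->group_leader);

    /* drm_sched_entity_flush(): if the last submitter is the process
     * being killed, take the entity off its run queue so nothing more
     * gets scheduled from it */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
            drm_sched_rq_remove_entity(entity->rq, entity);
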
>>>
>>> We could consider reordering that once more. Going to play out all
>>> scenarios in my head over the weekend :)
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>
>>
>>
>>
>