[PATCH 2/2] drm/scheduler: stop setting rq to NULL

Christian König ckoenig.leichtzumerken at gmail.com
Mon Aug 6 08:14:53 UTC 2018


Am 03.08.2018 um 20:41 schrieb Andrey Grodzovsky:
>
>
>
> On 08/06/2018 08:44 AM, Christian König wrote:
>> Am 03.08.2018 um 16:54 schrieb Andrey Grodzovsky:
>>>
>>> [SNIP]
>>>
>>>>>
>>>>>>
>>>>>>     >
>>>>>>     > Second of all, even after we removed the entity from rq in
>>>>>>     > drm_sched_entity_flush to terminate any subsequent submissions
>>>>>>     >
>>>>>>     > to the queue the other thread doing push job can just
>>>>>>     acquire again a
>>>>>>     > run queue
>>>>>>     >
>>>>>>     > from drm_sched_entity_get_free_sched and continue submission
>>>>>>
>>>>>> Hi Christian
>>>>>>
>>>>>>     That is actually desired.
>>>>>>
>>>>>>     When another process is now using the entity to submit jobs
>>>>>>     adding it
>>>>>>     back to the rq is actually the right thing to do cause the
>>>>>>     entity is
>>>>>>     still in use.
>>>>>>
>>>>>
>>>>> Yes, no problem if it's another process. But what about another 
>>>>> thread from same process ? Is it a possible use case that 2 
>>>>> threads from same process submit to same entity concurrently ? If 
>>>>> so and we specifically kill one, the other will not stop event if 
>>>>> we want him to because current code makes him just require a rq 
>>>>> for him self.
>>>>
>>>> Well you can't kill a single thread of a process (you can only 
>>>> interrupt it), a SIGKILL will always kill the whole process.
>>>
>>> Is the following scenario possible and acceptable ?
>>> 2 threads from same process working on same queue where thread A 
>>> currently in drm_sched_entity_flush->wait_event_timeout (the process 
>>> getting shut down because of SIGKILL sent by user)
>>> while thread B still inside drm_sched_entity_push_job before 'if 
>>> (reschedule)'. 'A' stopped waiting because queue became empty and 
>>> then removes the entity queue from scheduler's run queue while
>>> B goes inside 'reschedule' because it evaluates to true ('first' is 
>>> true and all the rest of the conditions), acquires new rq, and later 
>>> adds it back to scheduler (different one maybe) and keeps submitting 
>>> jobs as much as he likes and then can be stack for up to 'timeout' 
>>> time  in his drm_sched_entity_flush waiting for them.
>>
>> I'm not 100% sure but I don't think that can happen.
>>
>> See flushing the fd is done while dropping the fd, which happens only 
>> after all threads of the process in question are killed.
>
> Yea, this FDs handling is indeed a lot of gray area for me but as far 
> as I remember flushing is done per each thread when exits (possibly 
> due to a signal).
> Now signals interception and processing (as a result of which .flush 
> will get called if SIGKILL received) is done in some points amongst 
> which is when returning from IOCTL.
> So if first thread was at the very end of the CS ioctl when SIGKILL 
> was received while the other one at the beginning  then I think we 
> might see something like the scenario above.

SIGKILL isn't processed as long as any thread of the application is 
still inside the kernel. That's why we have wait_event_killable().

So I don't think that the scenario above is possible, but I'm really not 
100% sure either.

Christian.

>
> Andrey
>
>>
>> Otherwise the flushing wouldn't make to much sense. In other words 
>> imagine an application where a thread does a write() on a fd which is 
>> killed.
>>
>> The idea of the flush is to preserve the data and that won't work if 
>> that isn't correctly ordered.
>>
>>> My understanding was that introduction of entity->last is to force 
>>> immediate termination job submissions by any thread from the 
>>> terminating process.
>>
>> We could consider reordering that once more. Going to play out all 
>> scenarios in my head over the weekend :)
>>
>> Christian.
>>
>>>
>>> Andrey
>>
>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20180806/9fd8bfb5/attachment-0001.html>


More information about the dri-devel mailing list