[PATCH 2/2] drm/scheduler: stop setting rq to NULL

Christian König christian.koenig at amd.com
Mon Aug 6 12:44:06 UTC 2018


On 03.08.2018 at 16:54, Andrey Grodzovsky wrote:
>
> [SNIP]
>
>>>
>>>>
>>>>     >
>>>>     > Second of all, even after we removed the entity from rq in
>>>>     > drm_sched_entity_flush to terminate any subsequent submissions
>>>>     > to the queue the other thread doing push job can just acquire
>>>>     > a run queue again from drm_sched_entity_get_free_sched and
>>>>     > continue submission
>>>>
>>>> Hi Christian
>>>>
>>>>     That is actually desired.
>>>>
>>>>     When another process is now using the entity to submit jobs,
>>>>     adding it back to the rq is actually the right thing to do
>>>>     because the entity is still in use.
>>>>
>>>
>>> Yes, no problem if it's another process. But what about another
>>> thread from the same process? Is it a possible use case that 2 threads
>>> from the same process submit to the same entity concurrently? If so, and
>>> we specifically kill one, the other will not stop even if we want it
>>> to, because the current code makes it just acquire an rq for itself.
>>
>> Well, you can't kill a single thread of a process (you can only
>> interrupt it); a SIGKILL will always kill the whole process.
>
> Is the following scenario possible and acceptable?
> 2 threads from the same process are working on the same queue. Thread A
> is currently in drm_sched_entity_flush->wait_event_timeout (the process
> is being shut down because of a SIGKILL sent by the user),
> while thread B is still inside drm_sched_entity_push_job before 'if
> (reschedule)'. 'A' stops waiting because the queue became empty and then
> removes the entity queue from the scheduler's run queue, while
> B goes inside 'reschedule' because it evaluates to true ('first' is
> true, as are all the rest of the conditions), acquires a new rq, and
> later adds it back to a scheduler (a different one maybe) and keeps
> submitting jobs as much as it likes, and then can be stuck for up to
> 'timeout' time in its drm_sched_entity_flush waiting for them.
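
To make the interleaving in the scenario above concrete, here is a minimal single-threaded sketch in C that replays the two steps in the order described. It is only a toy model under stated assumptions: `toy_rq`, `toy_entity`, `toy_flush_remove`, `toy_push_job` and `toy_replay_race` are hypothetical names invented for illustration and are not the scheduler's real structures or functions. The real code also checks more conditions before rescheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); NOT struct drm_sched_rq / drm_sched_entity. */
struct toy_rq {
	int nr_entities;	/* entities currently on this run queue */
};

struct toy_entity {
	struct toy_rq *rq;	/* run queue the entity is on, or NULL */
	int queued_jobs;	/* jobs waiting in the entity's queue */
};

/* Thread A: once the queue is empty, take the entity off its run queue. */
static void toy_flush_remove(struct toy_entity *e)
{
	if (e->queued_jobs == 0 && e->rq) {
		e->rq->nr_entities--;
		e->rq = NULL;
	}
}

/* Thread B: push a job; the first job on an empty queue picks a run queue. */
static void toy_push_job(struct toy_entity *e, struct toy_rq *candidate)
{
	bool first = (e->queued_jobs == 0);

	e->queued_jobs++;
	if (first) {
		if (e->rq)
			e->rq->nr_entities--;
		e->rq = candidate;
		candidate->nr_entities++;
	}
}

/*
 * Replays the interleaving: A removes the idle entity, then B pushes.
 * Returns true if the entity ends up back on a run queue with work pending.
 */
static bool toy_replay_race(void)
{
	struct toy_rq rq0 = { .nr_entities = 1 };
	struct toy_rq rq1 = { .nr_entities = 0 };
	struct toy_entity e = { .rq = &rq0, .queued_jobs = 0 };

	toy_flush_remove(&e);		/* thread A */
	assert(e.rq == NULL && rq0.nr_entities == 0);

	toy_push_job(&e, &rq1);		/* thread B */
	return e.rq == &rq1 && e.queued_jobs == 1 && rq1.nr_entities == 1;
}
```

In this model the entity is back on a run queue after thread B's push, which is the outcome the question is about. Whether the two steps can actually overlap in the kernel is a separate matter that the sketch does not settle.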

I'm not 100% sure but I don't think that can happen.

See, flushing the fd is done while dropping the fd, which happens only
after all threads of the process in question are killed.

Otherwise the flushing wouldn't make too much sense. In other words,
imagine an application where a thread does a write() on an fd and is
then killed.

The idea of the flush is to preserve the data and that won't work if 
that isn't correctly ordered.
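
The ordering argument above can be sketched as a small reference-count model in C. This is a rough illustration, not kernel code: it assumes that all threads of the process share one fd table and that the flush runs only when the last thread lets go of it. The names `toy_files`, `toy_thread_exit` and `toy_flush_position` are hypothetical, and explicit close() calls are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical) of an fd table shared by a process's threads. */
struct toy_files {
	int users;		/* threads still sharing the table */
	int flush_calls;	/* how often the fd's flush has run */
};

/* Called once per exiting thread; returns true if this call flushed. */
static bool toy_thread_exit(struct toy_files *f)
{
	assert(f->users > 0);
	if (--f->users > 0)
		return false;	/* other threads are still alive */

	f->flush_calls++;	/* last user gone: drop the fd, flush */
	return true;
}

/*
 * Kills nr_threads (>= 1) threads one after another and returns the
 * 1-based position of the exit that triggered the flush.
 */
static int toy_flush_position(int nr_threads)
{
	struct toy_files f = { .users = nr_threads, .flush_calls = 0 };
	int position = 0;

	for (int i = 1; i <= nr_threads; i++)
		if (toy_thread_exit(&f))
			position = i;

	assert(f.flush_calls == 1);
	return position;
}
```

Under these assumptions the flush is always triggered by the last exiting thread, so no other thread of the same process is left to push jobs while the flush is waiting.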

> My understanding was that the introduction of entity->last is to force
> immediate termination of job submissions by any thread from the
> terminating process.

We could consider reordering that once more. Going to play out all 
scenarios in my head over the weekend :)

Christian.

>
> Andrey
