[RFC] Exclusive gpu access for SteamVR usecases

Wed May 31 18:46:34 UTC 2017

On 2017-05-31 02:53 AM, Christian König wrote:
>>
>>> 2. How are the priorities from processes supposed to interact with 
>>> the per context priority?
>>
>> Do you mean process niceness?
>>
>> There isn't any relationship between niceness and gpu priority.
>>
>> Let me know if you meant something different here. 
> I meant something different.
> 
> The application controls the per context priority when creating the 
> context and the compositor should control the per process priority.
> 
> Those two needs to be handled separately, otherwise we would could 
> override the context priority from the compositor and so confuse the 
> application.
> 
> I suggest to cleanly separate the two.
> 

Would you then b be okay with having a list of amdgpu_process data 
structures stored in adev? The list would be searchable by struct pid.

However, I'm not 100% convinced on having two separate priorities. It 
duplicates the concept and we need to introduce relationship semantics 
between the two.

Alternatively, since the priority patches still aren't part of drm-next 
we still have a chance make some changes there. If we make the priority 
requests reference counted we should be able to remember any old requests.

>> (ii) Job dependencies between two processes.
>>
>> This case is mentioned separately as it is probably the most common 
>> use case we will encounter for this feature. Most graphics 
>> applications enter producer/consumer relationship with the compositor 
>> process (window swapchain).
>>
>> In this case the compositor should already have all the information 
>> required to avoid a deadlock.
> 
> That is nonsense. The kernel is the supervisor of resource management, 
> so only the kernel knows how to avoid a deadlock.

> Let's imagine the following example:
> Process A is a low priority task (for example updating the clock bitmap) 
> which needs a resource for it's command submission.
> Process B is a high priority task (rendering of the VR) which needs a 
> bunch of memory for it's command submission.
> 
> Now the kernel memory management decides that it needs to evict process 
> A from VRAM to make room for the command submission of process B. To do 
> this all command submissions of process A need to finish.
> 
> In this moment the compositor hands over exclusive access to process B 
> and never gives process A a chance to run.
> 
> Now B depends on A, but A can never run because B has exclusive access 
> -> deadlock.
> 
> We somehow need to handle this inside the kernel or this whole approach 
> won't work.
> 

Thanks for pointing that out. I thought cases like these would work okay 
since we always allow PRIORITY_KERNEL work to execute. But as you 
pointed out, I overlooked the dependency that is created once the 
command buffers have their final memory addresses attached.

Let me read and think about this a bit more.

Regards,
Andres

> Regards,
> Christian.
> 
> Am 30.05.2017 um 23:38 schrieb Andres Rodriguez:
>>
>>
>> On 2017-05-30 11:19 AM, Christian König wrote:
>>> Looks like a good start, but a few notes in general:
>>>
>>> 1. Split the patches into two sets.
>>>
>>> One for implementing changing the priorities and one for limiting the 
>>> priorities.
>>>
>>
>> No problem.
>>
>>> 2. How are the priorities from processes supposed to interact with 
>>> the per context priority?
>>
>> Do you mean process niceness?
>>
>> There isn't any relationship between niceness and gpu priority.
>>
>> Let me know if you meant something different here.
>>
>>>
>>> 3. Thinking more about it we can't limit the minimum priority in the 
>>> scheduler.
>>>
>>> For example a low priority job might block resources the high 
>>> priority job needs to run. E.g. VRAM memory.
>>>
>>
>> We avoid deadlocks by making sure that all dependencies of an 
>> exclusive task are also elevated to the same priority as said task. 
>> Usermode (the DRM_MASTER) is responsible to maintain this guarantee. 
>> The kernel does provide an ioctl that makes this task simple, 
>> amdgpu_sched_process_priority_set().
>>
>> Lets take a look at this issue through three different scenarios.
>>
>> (i) Job dependencies are all process internal, i.e. multiple contexts 
>> in one process.
>>
>> This is the trivial case. A call to 
>> amdgpu_sched_process_priority_set() will change the priority of all 
>> contexts belonging to a process in lockstep.
>>
>> Once amdgpu_sched_process_priority_set() returns, it is safe to raise 
>> the minimum priority using amdgpu_sched_min_priority_get(). At this 
>> point we have a guarantee that all contexts belonging to the process 
>> will be in a runnable state, or all the contexts will be in a 
>> not-runnable state. There won't be a mix of runnable and non-runnable 
>> processes.
>>
>> Getting into that mixed state is what could cause a deadlock, a 
>> runnable context depends on a non-runnable one.
>>
>> Note: the current patchset needs a fix to provide this guarantee in 
>> multi-gpu systems.
>>
>> (ii) Job dependencies between two processes.
>>
>> This case is mentioned separately as it is probably the most common 
>> use case we will encounter for this feature. Most graphics 
>> applications enter producer/consumer relationship with the compositor 
>> process (window swapchain).
>>
>> In this case the compositor should already have all the information 
>> required to avoid a deadlock. It knows:
>>   - Itself (as a process)
>>   - The application process
>>   - The dependencies between both processes
>>
>> At this stage it is simple for the compositor to understand that if it 
>> wishes to perform an exclusive mode transition, all dependencies 
>> (which are known) should also be part of the exclusive group.
>>
>> We should be able to implement this feature without modifying a 
>> game/application.
>>
>> (iii) Job dependencies between multiple (3+) processes.
>>
>> This scenario is very uncommon for games. For example, if a game or 
>> application is split into multiple processes. Process A interacts with 
>> the compositor. Process B does some physics/compute calculations and 
>> send the results to Process A.
>>
>> To support this use case, we would require an interface for the 
>> application to communicate to the compositor its dependencies. I.e. 
>> Process A would say, "Also keep Process B's priority in sync with 
>> mine". This should be a simple bit of plumbing to allow Process A to 
>> share an fd from Process B with the compositor.
>>
>> B --[pipe_send(fdB)]--> A 
>> --[compositor_ext_priority_group_add(fdB)]--> Compositor
>>
>> Once the compositor is aware of all of A's dependencies, this can be 
>> handled in the same fashion as (ii).
>>
>> A special extension would be required for compositor protocols to 
>> communicate the dependencies fd. Applications would also need to be 
>> updated to use this extension.
>>
>> I think this case would be very uncommon. But it is something that we 
>> would be able to handle if the need would arise.
>>
>> > We need something like blocking the submitter instead (bad) or 
>> detection
>> > of dependencies in the scheduler (good, but tricky to implement).
>> >
>>
>> I definitely agree that detecting dependencies is tricky. Which is why 
>> I prefer an approach where usermode defines the dependencies. It is 
>> simple for both the kernel and usermode to implement.
>>
>> > Otherwise we can easily run into a deadlock situation with that 
>> approach.
>> >
>>
>> The current API does allow you to deadlock yourself pretty easily if 
>> misused. But so do many other APIs, like having a thread trying to 
>> grab the same lock twice :)
>>
>> Thanks for the comments,
>> Andres
>>
>>> Regards,
>>> Christian.
>>>
>>> Am 25.05.2017 um 02:00 schrieb Andres Rodriguez:
>>>> When multiple environments are running simultaneously on a system, e.g.
>>>> an X desktop + a SteamVR game session, it may be useful to sacrifice
>>>> performance in one environment in order to boost it on the other.
>>>>
>>>> This series provides a mechanism for a DRM_MASTER to provide exclusive
>>>> gpu access to a group of processes.
>>>>
>>>> Note: This series is built on the assumption that the drm lease 
>>>> patch series
>>>> will extend DRM_MASTER status to lesees.
>>>>
>>>> The libdrm we intend to provide is as follows:
>>>>
>>>> /**
>>>>   * Set the priority of all contexts in a process
>>>>   *
>>>>   * This function will change the priority of all contexts owned by
>>>>   * the process identified by fd.
>>>>   *
>>>>   * \param dev             - \c [in] device handle
>>>>   * \param fd              - \c [in] fd from target process
>>>>   * \param priority        - \c [in] target priority 
>>>> AMDGPU_CTX_PRIORITY_*
>>>>   *
>>>>   * \return  0 on success\n
>>>>   *         <0 - Negative POSIX error code
>>>>   *
>>>>   * \notes @fd can be *any* file descriptor from the target process.
>>>>   * \notes this function requires DRM_MASTER
>>>>   */
>>>> int amdgpu_sched_process_priority_set(amdgpu_device_handle dev,
>>>>                       int fd, int32_t priority);
>>>>
>>>> /**
>>>>   * Request to raise the minimum required priority to schedule a gpu 
>>>> job
>>>>   *
>>>>   * Submit a request to increase the minimum required priority to 
>>>> schedule
>>>>   * a gpu job. Once this function returns, the gpu scheduler will no 
>>>> longer
>>>>   * consider jobs from contexts with priority lower than @priority.
>>>>   *
>>>>   * The minimum priority considered by the scheduler will be the 
>>>> highest from
>>>>   * all currently active requests.
>>>>   *
>>>>   * Requests are refcounted, and must be balanced using
>>>>   * amdgpu_sched_min_priority_put()
>>>>   *
>>>>   * \param dev             - \c [in] device handle
>>>>   * \param priority        - \c [in] target priority 
>>>> AMDGPU_CTX_PRIORITY_*
>>>>   *
>>>>   * \return  0 on success\n
>>>>   *         <0 - Negative POSIX error code
>>>>   *
>>>>   * \notes this function requires DRM_MASTER
>>>>   */
>>>> int amdgpu_sched_min_priority_get(amdgpu_device_handle dev,
>>>>                   int32_t priority);
>>>>
>>>> /**
>>>>   * Drop a request to raise the minimum required scheduler priority
>>>>   *
>>>>   * This call balances amdgpu_sched_min_priority_get()
>>>>   *
>>>>   * If no other active requests exists for @priority, the minimum 
>>>> required
>>>>   * priority will decay to a lower level until one is reached with 
>>>> an active
>>>>   * request or the lowest priority is reached.
>>>>   *
>>>>   * \param dev             - \c [in] device handle
>>>>   * \param priority        - \c [in] target priority 
>>>> AMDGPU_CTX_PRIORITY_*
>>>>   *
>>>>   * \return  0 on success\n
>>>>   *         <0 - Negative POSIX error code
>>>>   *
>>>>   * \notes this function requires DRM_MASTER
>>>>   */
>>>> int amdgpu_sched_min_priority_put(amdgpu_device_handle dev,
>>>>                   int32_t priority);
>>>>
>>>> Using this app, VRComposer can raise the priority of the VRapp and 
>>>> itself. Then
>>>> it can restrict the minimum scheduler priority in order to become 
>>>> exclusive gpu
>>>> clients.
>>>>
>>>> One of the areas I'd like feedback is the following scenario. If a 
>>>> VRapp opens
>>>> a new fd and creates a new context after a call to set_priority, 
>>>> this specific
>>>> context will be lower priority than the rest. If the minimum 
>>>> required priority
>>>> is then raised, it is possible that this new context will be starved 
>>>> and
>>>> deadlock the VRapp.
>>>>
>>>> One solution I had in mind to address this situation, is to make 
>>>> set_priority
>>>> also raise the priority of future contexts created by the VRapp. 
>>>> However, that
>>>> would require keeping track of the requested priority on a 
>>>> per-process data
>>>> structure. The current design appears to steer clean of keeping any 
>>>> process
>>>> specific data, and everything instead of stored on a per-file basis. 
>>>> Which is
>>>> why I did not pursue this approach. But if this is something you'd 
>>>> like me to
>>>> implement let me know.
>>>>
>>>> One could also argue that preventing an application deadlock should 
>>>> be handled
>>>> between the VRComposer and the VRApp. It is not the kernel's 
>>>> responsibility to
>>>> babysit userspace applications and prevent themselves from shooting 
>>>> themselves
>>>> in the foot. The same could be achieved by improper usage of shared 
>>>> fences
>>>> between processes.
>>>>
>>>> Thoughts/feedback/comments on this issue, or others, are appreciated.
>>>>
>>>> Regards,
>>>> Andres
>>>>
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
>