[PATCH 2/3] drm/amdgpu: expose only the first UVD instance for now

Christian König christian.koenig at amd.com
Wed Jul 18 16:38:14 UTC 2018


On 18.07.2018 17:58, Alex Deucher wrote:
> On Wed, Jul 18, 2018 at 10:29 AM, Christian König
> <christian.koenig at amd.com> wrote:
>> On 18.07.2018 16:22, James Zhu wrote:
>>>
>>>
>>> On 2018-07-18 10:11 AM, Christian König wrote:
>>>> On 18.07.2018 15:40, James Zhu wrote:
>>>>>
>>>>> On 2018-07-18 08:55 AM, Christian König wrote:
>>>>>> Going to completely rework the context to ring mapping with Nayan's
>>>>>> GSoC work, but for now simply no longer exposing the second UVD
>>>>>> instance should do it.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 13 +++++--------
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c |  9 ++-------
>>>>>>    2 files changed, 7 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> index e451a3f25beb..75da3c41f3b3 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> @@ -286,7 +286,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>>>        struct drm_crtc *crtc;
>>>>>>        uint32_t ui32 = 0;
>>>>>>        uint64_t ui64 = 0;
>>>>>> -    int i, j, found;
>>>>>> +    int i, found;
>>>>>>        int ui32_size = sizeof(ui32);
>>>>>>
>>>>>>        if (!info->return_size || !info->return_pointer)
>>>>>> @@ -348,8 +348,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>>>                break;
>>>>>>            case AMDGPU_HW_IP_UVD:
>>>>>>                type = AMD_IP_BLOCK_TYPE_UVD;
>>>>>> -            for (i = 0; i < adev->uvd.num_uvd_inst; i++)
>>>>>> -                ring_mask |= adev->uvd.inst[i].ring.ready << i;
>>>>>> +            ring_mask |= adev->uvd.inst[0].ring.ready;
>>>>>>                ib_start_alignment = 64;
>>>>>>                ib_size_alignment = 64;
>>>>>>                break;
>>>>>> @@ -362,11 +361,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>>>                break;
>>>>>>            case AMDGPU_HW_IP_UVD_ENC:
>>>>>>                type = AMD_IP_BLOCK_TYPE_UVD;
>>>>>> -            for (i = 0; i < adev->uvd.num_uvd_inst; i++)
>>>>>> -                for (j = 0; j < adev->uvd.num_enc_rings; j++)
>>>>>> -                    ring_mask |=
>>>>>> -                    adev->uvd.inst[i].ring_enc[j].ready <<
>>>>>> -                    (j + i * adev->uvd.num_enc_rings);
>>>>>> +            for (i = 0; i < adev->uvd.num_enc_rings; i++)
>>>>>> +                ring_mask |=
>>>>>> +                    adev->uvd.inst[0].ring_enc[i].ready << i;
>>>>>>                ib_start_alignment = 64;
>>>>>>                ib_size_alignment = 64;
>>>>>>                break;
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
>>>>>> index ea9850c9224d..d8357290ad09 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
>>>>>> @@ -66,8 +66,6 @@ static int amdgpu_identity_map(struct amdgpu_device *adev,
>>>>>>                       u32 ring,
>>>>>>                       struct amdgpu_ring **out_ring)
>>>>>>    {
>>>>>> -    u32 instance;
>>>>>> -
>>>>>>        switch (mapper->hw_ip) {
>>>>>>        case AMDGPU_HW_IP_GFX:
>>>>>>            *out_ring = &adev->gfx.gfx_ring[ring];
>>>>>> @@ -79,16 +77,13 @@ static int amdgpu_identity_map(struct amdgpu_device *adev,
>>>>>>            *out_ring = &adev->sdma.instance[ring].ring;
>>>>>>            break;
>>>>>>        case AMDGPU_HW_IP_UVD:
>>>>>> -        instance = ring;
>>>>>> -        *out_ring = &adev->uvd.inst[instance].ring;
>>>>>> +        *out_ring = &adev->uvd.inst[0].ring;
>>>>>>            break;
>>>>>>        case AMDGPU_HW_IP_VCE:
>>>>>>            *out_ring = &adev->vce.ring[ring];
>>>>>>            break;
>>>>>>        case AMDGPU_HW_IP_UVD_ENC:
>>>>>> -        instance = ring / adev->uvd.num_enc_rings;
>>>>>> -        *out_ring =
>>>>>> -            &adev->uvd.inst[instance].ring_enc[ring%adev->uvd.num_enc_rings];
>>>>>> +        *out_ring = &adev->uvd.inst[0].ring_enc[ring];
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> If amdgpu_identity_map is used here, I think only the first instance's
>>>>> ring is in use.
>>>>>
>>>>> In the current Vega20 situation, should we switch to amdgpu_lru_map or
>>>>> something else to schedule which instance/ring to use?
>>>>
>>>> Yes and no. I'm going to completely remove the queue manager and replace
>>>> it with Nayan's work on the scheduler.
>>>>
>>>> So no more lru map, but instead a function which picks the best ring
>>>> based on actual load during context creation.
>>>>
>>>> Regards,
>>>> Christian.
>>> Got it. So do we completely remove the instance and ring concepts from
>>> the user space scope?
>>
>> Yes, that's at least the idea.
> I think we still may want to expose multiple queues/rings on an
> instance to allow userspace to make better decisions about
> parallelism.  E.g., if the UMD knows there are two encode rings and it
> wants to run two encode tasks in parallel, it should be possible.
> Then the scheduler can deal with scheduling across instances
> optimally.

Well it depends.

I currently plan to expose the following engines to userspace,
independent of how many instances are available for each block:

* 1 GFX ring (unchanged)
* 4 Compute rings (8 previously)
* 1 UVD decode ring (unchanged)
* 1 VCE encode ring (2/3 previously, but that doesn't make sense because
they have different priorities).
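
From the UMD point of view that just shows up in the available_rings mask
of the hw_ip query, so hiding the extra instances is transparent to
userspace. Rough sketch with libdrm (device node path and error handling
are simplified, only meant as an illustration):

/* build with: gcc ring_query.c $(pkg-config --cflags --libs libdrm_amdgpu libdrm) */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

int main(void)
{
        uint32_t major, minor;
        amdgpu_device_handle dev;
        struct drm_amdgpu_info_hw_ip ip = {0};
        /* render node path is just an example, adjust as needed */
        int fd = open("/dev/dri/renderD128", O_RDWR);

        if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
                return 1;

        /* available_rings is the ring_mask built in amdgpu_info_ioctl() */
        if (!amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_UVD_ENC, 0, &ip))
                printf("UVD_ENC rings visible to userspace: %d\n",
                       __builtin_popcount(ip.available_rings));

        amdgpu_device_deinitialize(dev);
        return 0;
}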

When userspace wants to handle multiple streams at the same time it
should create multiple contexts anyway, so even with just 1 ring for
each engine we should be on the safe side regarding parallelism.
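
To illustrate what I mean with picking the best ring based on actual load
at context creation, here is a rough standalone sketch. All structs and
names below are made up for illustration only, this is not the real amdgpu
or scheduler interface:

#include <stdio.h>

#define NUM_INSTANCES 2

/* toy stand-in for a hardware ring, only tracks outstanding work */
struct toy_ring {
        unsigned int pending_jobs;      /* jobs submitted but not yet fenced */
};

/* pick the least loaded instance at context creation time */
static struct toy_ring *pick_least_loaded(struct toy_ring *rings, int count)
{
        struct toy_ring *best = &rings[0];
        int i;

        for (i = 1; i < count; i++)
                if (rings[i].pending_jobs < best->pending_jobs)
                        best = &rings[i];
        return best;
}

int main(void)
{
        struct toy_ring uvd_dec[NUM_INSTANCES] = { { 3 }, { 1 } };
        struct toy_ring *ring = pick_least_loaded(uvd_dec, NUM_INSTANCES);

        printf("context bound to instance %td (pending jobs: %u)\n",
               ring - uvd_dec, ring->pending_jobs);
        return 0;
}

The real thing would of course look at the scheduler's actual per ring job
counts instead of a toy counter, but the point is that the selection happens
once per context and not per submission.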

Christian.

>
> Alex
>
>
>>> This design is good for use, but it loses flexibility in fine-grained
>>> resource management. That may be needed for cgroup use cases in the future.
>>
>> I'm actually doing this to allow a simpler cgroup implementation on top of it.
>>
>> In other words cgroup can still be used to limit certain applications to
>> certain hardware engines and/or sw priorities, etc...
>>
>> Regards,
>> Christian.
>>
>>
>>> Regards!
>>> James
>>>>
>>>>> James
>>>>>
>>>>>>            break;
>>>>>>        case AMDGPU_HW_IP_VCN_DEC:
>>>>>>            *out_ring = &adev->vcn.ring_dec;
>>>>>


