[PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV

Koenig, Christian Christian.Koenig at amd.com
Fri Oct 12 14:27:18 UTC 2018


Great, can I get an rb or acked-by for the patch in this case?

Thanks,
Christian.

Am 10.10.2018 um 09:52 schrieb Liu, Monk:
> Thanks Sigil
>
> Hi Christian
>
> Looks we can enable/disable ctx-switch for SDMA at will, no dependency or conflict on SRIOV
>
> /Monk
>
> -----Original Message-----
> From: Ma, Sigil
> Sent: Wednesday, October 10, 2018 3:25 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Min, Frank <Frank.Min at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV
>
> Hi Monk,
>
> AUTO_CTXSW_ENABLE is not relevant to worldswitch preemption. it only applies for ring buffer preemption. SDMA will do worldswitch whatever AUTO_CTXSW_ENABLE is 1 or 0.
>
> -----Original Message-----
> From: Liu, Monk
> Sent: Wednesday, October 10, 2018 2:54 PM
> To: Koenig, Christian <Christian.Koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Min, Frank <Frank.Min at amd.com>; Ma, Sigil <Sigil.Ma at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV
>
> Oh, that mean I remember it reversed way, according to code looks we need to enable ctx_switch to support WORLD SWITCH for SDMA engine
>
> But better let Sigil confirm it  ...
>
> Hi @Ma, Sigil can you confirm it ? what's the relationship between ctx_swich and world swich for SDMA engines ?
>
> Ctx_switch_enable() will set "SDMA0/1_CNTL's field: AUTO_CTXSW_ENABLE" to 1, can you tell us what's it for and how it go with SRIOV world switch ?
>
> Thanks
>
> /Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: Tuesday, October 9, 2018 9:03 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Min, Frank <Frank.Min at amd.com>; Ma, Sigil <Sigil.Ma at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV
>
> Hi Monk,
>
> well that doesn't make much sense to me what you say here cause context switching certainly is already enabled under SRIOV:
>
>> -               if (amdgpu_sriov_vf(adev)) { /* bare-metal sequence
>> doesn't need below to lines */
>> -                       sdma_v4_0_ctx_switch_enable(adev, true);
>> -                       sdma_v4_0_enable(adev, true);
>> -               }
> The problem is that context switching as well as the gfx ring is enabled for both SDMA0 and SDMA1 without initializing SDMA1.
>
> That's most likely causing some unwanted consequences.
>
> Christian.
>
> Am 09.10.2018 um 13:45 schrieb Liu, Monk:
>> Context switch is for preemption across different queues (gfx, rlc0/1,
>> page) under bare-metal environment, For SRIOV we didn't need it and we didn't test it yet, so we just disable it to make life easier, besides since each VF share only 6 MS slice there is in fact no benefit to enable it for SRIOV ...
>>
>> + @Ma, Sigil to confirm
>>
>> Hi Sigil
>>
>> Do you think context switch could be enabled for SRIOV VF ?? I worry that the context switch have internal crush with preemption for world switch , thanks !
>>
>> /Monk
>>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Tuesday, October 9, 2018 6:57 PM
>> To: Huang, Ray <Ray.Huang at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Min,
>> Frank <Frank.Min at amd.com>
>> Cc: amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV
>>
>> Am 09.10.2018 um 11:17 schrieb Huang Rui:
>>> On Mon, Oct 08, 2018 at 03:35:15PM +0200, Christian König wrote:
>>>> [SNIP]
>>>> -	if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>>>> -		r = sdma_v4_0_load_microcode(adev);
>>>> +	/* start the gfx rings and rlc compute queues */
>>>> +	for (i = 0; i < adev->sdma.num_instances; i++)
>>>> +		sdma_v4_0_gfx_resume(adev, i);
>>>> +
>>>> +	if (amdgpu_sriov_vf(adev)) {
>>>> +		sdma_v4_0_ctx_switch_enable(adev, true);
>>>> +		sdma_v4_0_enable(adev, true);
>>>> +	} else {
>>>> +		r = sdma_v4_0_rlc_resume(adev);
>>>>     		if (r)
>>>>     			return r;
>>>>     	}
>>> + Monk, Frank,
>>>
>>> I probably cannot judge here, under SRIOV, I saw you disable ctx
>>> switch before. Do you have any concern if we enabled it here.
>> The problem was that those calls where mixed into sdma_v4_0_gfx_resume() for the first SDMA instance.
>>
>> What was happening is that SDMA0 was initialized and while doing so enabled both SDMA0 and SDMA1. So SDMA1 was starting up before the ring buffer was even set.
>>
>> That this doesn't crashed was pure coincident and is most likely also the reason why we ran into problems when ring buffers weren't initialized.
>>
>> Regards,
>> Christian.
>>
>>> Others, looks good for me. Christian, may we know which kind of jobs
>>> will use sdma page queue(ring), you know, we just sdma gfx queue(ring) before?
>>>
>>> Thanks,
>>> Ray
>>>



More information about the amd-gfx mailing list