[PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV

Koenig, Christian Christian.Koenig at amd.com
Tue Oct 9 13:03:17 UTC 2018


Hi Monk,

well that doesn't make much sense to me what you say here cause context 
switching certainly is already enabled under SRIOV:

> -               if (amdgpu_sriov_vf(adev)) { /* bare-metal sequence 
> doesn't need below to lines */
> -                       sdma_v4_0_ctx_switch_enable(adev, true);
> -                       sdma_v4_0_enable(adev, true);
> -               }

The problem is that context switching as well as the gfx ring is enabled 
for both SDMA0 and SDMA1 without initializing SDMA1.

That's most likely causing some unwanted consequences.

Christian.

Am 09.10.2018 um 13:45 schrieb Liu, Monk:
> Context switch is for preemption across different queues (gfx, rlc0/1, page) under bare-metal environment,
> For SRIOV we didn't need it and we didn't test it yet, so we just disable it to make life easier, besides since each VF share only 6 MS slice there is in fact no benefit to enable it for SRIOV ...
>
> + @Ma, Sigil to confirm
>
> Hi Sigil
>
> Do you think context switch could be enabled for SRIOV VF ?? I worry that the context switch have internal crush with preemption for world switch , thanks !
>
> /Monk
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Tuesday, October 9, 2018 6:57 PM
> To: Huang, Ray <Ray.Huang at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Min, Frank <Frank.Min at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 2/8] drm/amdgpu: fix sdma v4 startup under SRIOV
>
> Am 09.10.2018 um 11:17 schrieb Huang Rui:
>> On Mon, Oct 08, 2018 at 03:35:15PM +0200, Christian König wrote:
>>> [SNIP]
>>> -	if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>>> -		r = sdma_v4_0_load_microcode(adev);
>>> +	/* start the gfx rings and rlc compute queues */
>>> +	for (i = 0; i < adev->sdma.num_instances; i++)
>>> +		sdma_v4_0_gfx_resume(adev, i);
>>> +
>>> +	if (amdgpu_sriov_vf(adev)) {
>>> +		sdma_v4_0_ctx_switch_enable(adev, true);
>>> +		sdma_v4_0_enable(adev, true);
>>> +	} else {
>>> +		r = sdma_v4_0_rlc_resume(adev);
>>>    		if (r)
>>>    			return r;
>>>    	}
>> + Monk, Frank,
>>
>> I probably cannot judge here, under SRIOV, I saw you disable ctx
>> switch before. Do you have any concern if we enabled it here.
> The problem was that those calls where mixed into sdma_v4_0_gfx_resume() for the first SDMA instance.
>
> What was happening is that SDMA0 was initialized and while doing so enabled both SDMA0 and SDMA1. So SDMA1 was starting up before the ring buffer was even set.
>
> That this doesn't crashed was pure coincident and is most likely also the reason why we ran into problems when ring buffers weren't initialized.
>
> Regards,
> Christian.
>
>> Others, looks good for me. Christian, may we know which kind of jobs
>> will use sdma page queue(ring), you know, we just sdma gfx queue(ring) before?
>>
>> Thanks,
>> Ray
>>



More information about the amd-gfx mailing list