[PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

Koenig, Christian Christian.Koenig at amd.com
Wed Nov 7 08:48:06 UTC 2018


> Is it in preparation for PRT (or something like kernel page fault handling
> on the CPU/MMU side)?
That is for providing a shared virtual address space (e.g. when the CPU
and GPU have the same VA view) as well as for changing our memory
management in general.

> For SRIOV, in theory any feature *not* related to hardware scheduling (MES) or OS preemption (buggy with world switch preemption) is welcome on SR-IOV; there is no reason
> not to support it as far as I know, unless it is not mature enough to enable.
The problem is that recoverable page faults in Vega10 are incompatible 
with SRIOV because a page fault can block the GPU for an undefined 
amount of time and Vega10 can't schedule those away from the hardware.

So the shader thread is blocked and can't be switched away. Under SRIOV 
that would mean that we just get killed by the hypervisor rather soon.
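
For illustration only, the page queue selection that the patch arrives at can be modelled as a small standalone C function. This is a sketch, not driver code: `enum chip` and `sdma_has_page_queue()` are hypothetical stand-ins for `adev->asic_type` and `adev->sdma.has_page_queue`, and it assumes that the first branch of the hunk (its condition is not visible in the diff context) is the Raven case and that the flag stays false when no branch sets it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's asic_type values. */
enum chip { CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20, CHIP_RAVEN };

/*
 * Models the has_page_queue decision after the patch.
 * is_sriov_vf plays the role of amdgpu_sriov_vf(adev).
 */
static bool sdma_has_page_queue(enum chip asic, bool is_sriov_vf)
{
	/* Assumption: the first branch in the hunk is the Raven case. */
	if (asic == CHIP_RAVEN)
		return false;

	/* New in the patch: no page queue on a Vega10 SR-IOV VF. */
	if (asic == CHIP_VEGA10 && is_sriov_vf)
		return false;

	/* Unchanged: Vega20 and Vega12 do not get the page queue here. */
	return asic != CHIP_VEGA20 && asic != CHIP_VEGA12;
}
```

With this model, bare-metal Vega10 keeps the page queue, while a Vega10 VF, Vega12, Vega20 and Raven all end up without it.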

Christian.

On 07.11.18 at 09:40, Liu, Monk wrote:
> Hi Christian
>
> Thanks for sharing,
> Do you also know why we need recoverable page faults? Is it in preparation for PRT (or something like kernel page fault handling on the CPU/MMU side)?
>
> For SRIOV, in theory any feature *not* related to hardware scheduling (MES) or OS preemption (buggy with world switch preemption) is welcome on SR-IOV; there is no reason
> not to support it as far as I know, unless it is not mature enough to enable.
>
> /Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: Wednesday, November 7, 2018 3:30 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Zhang, Jerry <Jerry.Zhang at amd.com>; Huang, Trigger <Trigger.Huang at amd.com>; amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
>
> Hi guys,
>
> this is necessary for recoverable page fault handling.
>
> When the normal SDMA queue is blocked because of a page fault the SDMA firmware will switch to the paging queue so that we are able to handle the fault.
>
> In general it should work on all Vega (but not Raven) components and we are going to need it when we enable recoverable page faults.
>
> The only case I can see where we don't immediately need it is SRIOV, because the current plan is not to support recoverable page faults there.
>
> Christian.
>
> On 07.11.18 at 08:21, Liu, Monk wrote:
>> Hi team
>>
>> Why do we need this page_queue in amdgpu? Can anyone share some background on its introduction to the KMD?
>> As I understand it, the gpu-scheduler already has a couple of priority levels for contexts/entities, so the work page_queue is supposed to do (presumably mapping/unmapping/moving) is already well taken care of by "KERNEL" priority entities, and all other context/entity SDMA jobs are handled after the "KERNEL" jobs ...
>>
>> So there is no real benefit in introducing page_queue (or rlc_queue) to amdgpu given the priority-aware gpu-scheduler ... unless we are going to remove the "KERNEL" priority and always do the mapping/unmapping in page_queue ...
>>
>> /Monk
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> Zhang, Jerry(Junwei)
>> Sent: Wednesday, November 7, 2018 1:26 PM
>> To: Huang, Trigger <Trigger.Huang at amd.com>;
>> amd-gfx at lists.freedesktop.org; Deucher, Alexander
>> <Alexander.Deucher at amd.com>; Koenig, Christian
>> <Christian.Koenig at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV
>> VF
>>
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, the SDMA page queue is not used under an SR-IOV VF, and this
>>> queue causes a ring test failure in the amdgpu module reload case. So just disable it.
>>>
>>> Signed-off-by: Trigger Huang <Trigger.Huang at amd.com>
>> It looks like we ran into several issues with it on Vega.
>> KFD also disabled it on Vega10 for development (but I'm not sure what the
>> detailed issue was for them).
>>
>> So maybe we should disable it for Vega10 as well?
>> Any comments? Alex, Christian, Felix.
>>
>> Regards,
>> Jerry
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>>     1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index e39a09eb0f..4edc848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>>     		adev->sdma.has_page_queue = false;
>>>     	} else {
>>>     		adev->sdma.num_instances = 2;
>>> -		if (adev->asic_type != CHIP_VEGA20 &&
>>> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
>>> +			adev->sdma.has_page_queue = false;
>>> +		else if (adev->asic_type != CHIP_VEGA20 &&
>>>     				adev->asic_type != CHIP_VEGA12)
>>>     			adev->sdma.has_page_queue = true;
>>>     	}