[PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm fault for VF
Ding, Pixel
Pixel.Ding at amd.com
Tue Feb 7 05:30:49 UTC 2017
Hi David/Christian,
I tested the proposed method which prints a warning only, and there's no problem. I will send another patch for review because it's a totally different change.
—
Sincerely Yours,
Pixel
On 06/02/2017, 4:56 PM, "Christian König" <deathsimple at vodafone.de> wrote:
>Hi Pixel,
>
>yeah, I agree with David here, but even busy waiting for the KIQ register
>read is not really an option in the interrupt handler.
>
>Additionally, when we have a VM fault we usually see a massive storm
>of them, so allocating and scheduling a work item for each fault like
>you do here will most likely just crash the whole system.
>
>For now I would say just use DRM_ERROR() to print a warning that a VM
>fault happened and that we can't decode it because we can't access the
>registers under SRIOV.
>
>Regards,
>Christian.
>
>Am 06.02.2017 um 09:36 schrieb zhoucm1:
>> Hi Pixel,
>> I got your meaning just now: your VF must use KIQ to read/write
>> registers, which uses fence_wait to wait for the register access to
>> complete. The alternative is to implement a new KIQ register
>> read/write path that uses udelay instead of fence wait when
>> reading/writing registers in interrupt context.
>>
>> Regards,
>> David Zhou
>>
>> On 2017-02-06 15:55, Ding, Pixel wrote:
>>> Thank you for your comments, David. I totally agree on your point.
>>>
>>> However, the VM fault status registers always record the latest VM
>>> fault info no matter when they're read, and that's what we care
>>> about, since we don't do much VM fault handling even on a bare metal
>>> system.
>>>
>>> On the other hand, what do you think if we insist on sleeping in
>>> interrupt context and let the driver run into a software bug?
>>>
>>> The VM registers are shared among all VFs and we don't have a
>>> per-VF copy in hardware, so I think there's no other way to access
>>> them in this case. Do you have any suggestion?
>>>
>>> —
>>> Sincerely Yours,
>>> Pixel
>>>
>>> On 06/02/2017, 3:33 PM, "Zhou, David(ChunMing)" <David1.Zhou at amd.com>
>>> wrote:
>>>
>>>> INIT_WORK(&work->base, gmc_v8_0_vm_fault_sched);
>>>> Whether a VF is used or not, a scheduled work item shouldn't handle
>>>> register reads for an interrupt, especially status registers, which
>>>> could have changed by the time you handle them in the work item after
>>>> the interrupt.
>>>>
>>>> Regards,
>>>> David Zhou
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>>>> Behalf Of Pixel Ding
>>>> Sent: Monday, February 06, 2017 3:00 PM
>>>> To: amd-gfx at lists.freedesktop.org
>>>> Cc: Ding, Pixel <Pixel.Ding at amd.com>
>>>> Subject: [PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm
>>>> fault for VF
>>>>
>>>> VF uses KIQ to access registers, which invokes fence_wait to wait
>>>> for the access to complete. When a VM fault occurs, the driver
>>>> can't sleep in interrupt context.
>>>>
>>>> For some test cases, VM fault is 'legal' and shouldn't cause driver
>>>> soft lockup.
>>>>
>>>> Signed-off-by: Pixel Ding <Pixel.Ding at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 46 ++++++++++++++++++++++++++++++++---
>>>> 1 file changed, 43 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> index 7669b32..75c913f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> @@ -1231,9 +1231,9 @@ static int gmc_v8_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
>>>> return 0;
>>>> }
>>>>
>>>> -static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>> -				      struct amdgpu_irq_src *source,
>>>> -				      struct amdgpu_iv_entry *entry)
>>>> +static int gmc_v8_0_process_vm_fault(struct amdgpu_device *adev,
>>>> +				     struct amdgpu_irq_src *source,
>>>> +				     struct amdgpu_iv_entry *entry)
>>>> {
>>>> u32 addr, status, mc_client;
>>>>
>>>> @@ -1262,6 +1262,46 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>> return 0;
>>>> }
>>>>
>>>> +struct gmc_vm_fault_work {
>>>> +	struct work_struct base;
>>>> +	struct amdgpu_device *adev;
>>>> +	struct amdgpu_irq_src *source;
>>>> +	struct amdgpu_iv_entry *entry;
>>>> +};
>>>> +
>>>> +static void gmc_v8_0_vm_fault_sched(struct work_struct *work)
>>>> +{
>>>> +	struct gmc_vm_fault_work *vm_work =
>>>> +		container_of(work, struct gmc_vm_fault_work, base);
>>>> +	struct amdgpu_device *adev = vm_work->adev;
>>>> +	struct amdgpu_irq_src *source = vm_work->source;
>>>> +	struct amdgpu_iv_entry *entry = vm_work->entry;
>>>> +
>>>> +	gmc_v8_0_process_vm_fault(adev, source, entry);
>>>> +
>>>> +	kfree(vm_work);
>>>> +}
>>>> +
>>>> +static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>> +				      struct amdgpu_irq_src *source,
>>>> +				      struct amdgpu_iv_entry *entry)
>>>> +{
>>>> +	struct gmc_vm_fault_work *work = NULL;
>>>> +
>>>> +	if (amdgpu_sriov_vf(adev)) {
>>>> +		work = kmalloc(sizeof(struct gmc_vm_fault_work), GFP_ATOMIC);
>>>> +		if (!work)
>>>> +			return -ENOMEM;
>>>> +		INIT_WORK(&work->base, gmc_v8_0_vm_fault_sched);
>>>> +		work->adev = adev;
>>>> +		work->source = source;
>>>> +		work->entry = entry;
>>>> +		return schedule_work(&work->base);
>>>> +	}
>>>> +
>>>> +	return gmc_v8_0_process_vm_fault(adev, source, entry);
>>>> +}
>>>> +
>>>> static void fiji_update_mc_medium_grain_clock_gating(struct amdgpu_device *adev,
>>>> bool enable)
>>>> {
>>>> --
>>>> 2.7.4
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
>