[PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

Kuehling, Felix Felix.Kuehling at amd.com
Mon May 13 20:31:49 UTC 2019


[Fixed Alex's email address, sorry for getting it wrong first]

On 2019-05-13 3:49 p.m., Jerome Glisse wrote:
> Andrew, can we get these 2 fixes lined up for 5.2?
>
> On Mon, May 13, 2019 at 07:36:44PM +0000, Kuehling, Felix wrote:
>> Hi Jerome,
>>
>> Do you want me to push the patches to your branch? Or are you going to
>> apply them yourself?
>>
>> Is your hmm-5.2-v3 branch going to make it into Linux 5.2? If so, do you
>> know when? I'd like to coordinate with Dave Airlie so that we can also
>> get that update into a drm-next branch soon.
>>
>> I see that Linus merged Dave's pull request for Linux 5.2, which
>> includes the first changes in amdgpu using HMM. They're currently broken
>> without these two patches.
> HMM patches do not go through any git branch; they go through the mmotm
> collection. So it is not something you can easily coordinate with the drm
> branch.
>
> By broken I expect you mean that it breaks if NUMA balancing happens?
> Or that it might sleep when you are not expecting it to?

Without the NUMA fix we'd end up using an outdated physical address in 
the GPU page table. The problem was caught by a test that got incorrect 
computation results using OpenCL on a NUMA system.

Without the FAULT_FLAG_ALLOW_RETRY patch, there can be kernel oopses due 
to incorrect locking/unlocking of mmap_sem. It breaks the promise that 
hmm_range_fault should not unlock the mmap_sem if block==true. It takes 
some memory pressure to trigger this.
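
To make the locking problem concrete, here is a rough userspace sketch. It
is a toy model, not kernel code: every toy_* name and flag value is made up
for illustration, and the handler only mimics the behaviour described in
the patch (a fault that is allowed to retry drops the lock before it
returns). The "patched" switch selects between the old and new initial
value of flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins only; these are not the kernel's definitions. */
#define TOY_FLAG_ALLOW_RETRY 0x1u
#define TOY_FLAG_REMOTE      0x2u
#define TOY_FAULT_RETRY      1

/* Hypothetical model of an address space with a single lock. */
struct toy_mm {
	bool mmap_sem_held;
};

/*
 * Models a fault that would have to wait. If the caller allowed a retry,
 * the handler drops the lock and asks the caller to try again. Otherwise
 * it completes with the lock still held.
 */
static int toy_handle_fault(struct toy_mm *mm, unsigned int flags)
{
	if (flags & TOY_FLAG_ALLOW_RETRY) {
		mm->mmap_sem_held = false;
		return TOY_FAULT_RETRY;
	}
	return 0;
}

/*
 * Models the fault path for one page. Returns true if the lock is still
 * held afterwards. "patched" selects the initial flags with or without
 * the unconditional ALLOW_RETRY bit.
 */
static bool toy_do_fault(bool block, bool patched)
{
	struct toy_mm mm = { .mmap_sem_held = true };
	unsigned int flags = patched
		? TOY_FLAG_REMOTE
		: (TOY_FLAG_ALLOW_RETRY | TOY_FLAG_REMOTE);

	/* The caller must enter with the lock held. */
	assert(mm.mmap_sem_held);

	/* Retry is only meant to be allowed for non-blocking callers. */
	if (!block)
		flags |= TOY_FLAG_ALLOW_RETRY;

	(void)toy_handle_fault(&mm, flags);
	return mm.mmap_sem_held;
}
```

In this model, toy_do_fault(true, false) comes back with the lock dropped
even though the caller asked for a blocking fault, while
toy_do_fault(true, true) keeps it held. The non-blocking case behaves the
same with or without the patch.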

Regards,
   Felix


>
> Cheers,
> Jérôme
>
>> Thanks,
>>     Felix
>>
>> On 2019-05-10 4:14 p.m., Jerome Glisse wrote:
>>> On Fri, May 10, 2019 at 07:53:24PM +0000, Kuehling, Felix wrote:
>>>> Don't set this flag by default in hmm_vma_do_fault. It is set
>>>> conditionally just a few lines below. Setting it unconditionally
>>>> can lead to handle_mm_fault doing a non-blocking fault, returning
>>>> -EBUSY and unlocking mmap_sem unexpectedly.
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>> Reviewed-by: Jérôme Glisse <jglisse at redhat.com>
>>>
>>>> ---
>>>>    mm/hmm.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>>> index b65c27d5c119..3c4f1d62202f 100644
>>>> --- a/mm/hmm.c
>>>> +++ b/mm/hmm.c
>>>> @@ -339,7 +339,7 @@ struct hmm_vma_walk {
>>>>    static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
>>>>                             bool write_fault, uint64_t *pfn)
>>>>    {
>>>> -     unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
>>>> +     unsigned int flags = FAULT_FLAG_REMOTE;
>>>>         struct hmm_vma_walk *hmm_vma_walk = walk->private;
>>>>         struct hmm_range *range = hmm_vma_walk->range;
>>>>         struct vm_area_struct *vma = walk->vma;
>>>> --
>>>> 2.17.1
>>>>


More information about the amd-gfx mailing list