[RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers
Christian König
christian.koenig at amd.com
Mon Jul 2 12:39:50 UTC 2018
On 02.07.2018 at 14:35, Michal Hocko wrote:
> On Mon 02-07-18 14:24:29, Christian König wrote:
>> On 02.07.2018 at 14:20, Michal Hocko wrote:
>>> On Mon 02-07-18 14:13:42, Christian König wrote:
>>>> On 02.07.2018 at 13:54, Michal Hocko wrote:
>>>>> On Mon 02-07-18 11:14:58, Christian König wrote:
>>>>>> On 27.06.2018 at 09:44, Michal Hocko wrote:
>>>>>>> This is v2 of the RFC, based on the feedback I've received so far. The
>>>>>>> code even compiles as a bonus ;) I haven't runtime tested it yet, mostly
>>>>>>> because I have no idea how.
>>>>>>>
>>>>>>> Any further feedback is highly appreciated of course.
>>>>>> That sounds like it should work, and at least the amdgpu changes now look
>>>>>> good to me at first glance.
>>>>>>
>>>>>> Can you split that up further in the usual way? E.g. adding the blockable
>>>>>> flag in one patch and fixing all implementations of the MMU notifier in
>>>>>> follow-up patches.
>>>>> But such code would be broken, no? Ignoring the blockable state will
>>>>> simply lead to lockups until the fixup parts get applied.
>>>> Well, to still be bisectable you only need to get the interface change
>>>> in first, fixing the function signature of the implementations.
>>> That would only work if those functions returned -EAGAIN unconditionally.
>>> Otherwise they would pretend not to block while that would be obviously
>>> incorrect. This doesn't sound correct to me.
>>>
>>>> Then add all the new code to the implementations, and only at the end
>>>> start to actually use the new interface.
>>>>
>>>> That is a pattern we use regularly and I think it's good practice to do
>>>> this.
>>> But we do rely on the proper blockable handling.
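For reference, the interface change under discussion gives
invalidate_range_start() a "blockable" parameter and lets the callback fail.
A minimal driver-side sketch of what honouring that flag could look like
(the device structure and lock names below are made up for illustration,
not taken from the actual patch):

#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

struct example_dev {
	struct mmu_notifier mn;
	struct mutex lock;
};

static int example_invalidate_range_start(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end,
					  bool blockable)
{
	struct example_dev *edev = container_of(mn, struct example_dev, mn);

	if (!blockable) {
		/*
		 * Non-blocking context (e.g. the OOM path): we must not
		 * sleep, so try the lock and back off with -EAGAIN instead
		 * of waiting.
		 */
		if (!mutex_trylock(&edev->lock))
			return -EAGAIN;
	} else {
		mutex_lock(&edev->lock);
	}

	/* ... tear down the mappings covering [start, end) ... */

	mutex_unlock(&edev->lock);
	return 0;
}

An implementation that cannot yet handle the non-blocking case would have to
return -EAGAIN unconditionally, which is exactly the intermediate step the
discussion above considers confusing.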
>> Yeah, but you could add the handling only after you have all the
>> implementations in place, couldn't you?
> Yeah, but then I would be adding code with no user, and I really
> prefer not to do that because such code is harder to reason about.
>
>>>>> Is the split-up really worth it? I was thinking about that but had a
>>>>> hard time ending up with something that would be bisectable. Well,
>>>>> except for returning -EBUSY until all notifiers are implemented, which
>>>>> I found confusing.
>>>> It at least makes reviewing changes much easier, because as a driver
>>>> maintainer I can concentrate on just the parts related to me.
>>>>
>>>> In addition, when you cause some unrelated side effect in a driver, we
>>>> can pinpoint the actual change much more easily later on when the patch
>>>> is smaller.
>>>>
>>>>>> This way I'm pretty sure Felix and I can give an rb on the amdgpu/amdkfd
>>>>>> changes.
>>>>> If you are worried about giving an r-b only for those parts, then this
>>>>> can be done even for larger patches. Just make your Reviewed-by more
>>>>> specific:
>>>>> R-b: name # For BLA BLA
>>>> Yeah, that's a possible alternative, but more work for me when I review it :)
>>> I definitely do not want to add more work for reviewers, and I completely
>>> see how massive "flag days" like this one are not popular, but I really
>>> didn't find a reasonable way around it that would be both correct and
>>> wouldn't add much more churn along the way. So if you really insist, I
>>> would really appreciate a hint on how to achieve the same without any of
>>> the above downsides.
>> Well, I don't insist on this. It's just that from my point of view this
>> doesn't need to be a single patch, but could be split up.
> Well, if there are more people with the same concern I can try to do
> that. But if your only concern is to focus on your particular part, then I
> guess it would be easier both for you and me to simply apply the patch and
> use git show $files_for_your_subsystem on your end. I have pushed the
> patch to the attempts/oom-vs-mmu-notifiers branch of my tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
I don't want to block something as important as this, so feel free to add
an Acked-by: Christian König <christian.koenig at amd.com> to the patch.
Let's rather move on to the next topic: any idea how to runtime-test this?
I mean, I can rather easily provide a test which crashes an AMD GPU, which
in turn would mean that the MMU notifier would block forever without this
patch.
But do you know a way to let the OOM killer kill a specific process?
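One possible approach, sketched here as a guess rather than anything from
this thread: make the test process the preferred OOM victim by writing the
maximum value to its /proc/<pid>/oom_score_adj and then create memory
pressure, so that the OOM killer picks exactly that task. Roughly:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	/*
	 * Make this process the preferred OOM victim.  In the GPU test
	 * described above the process holding the GPU mappings would do
	 * this write; the memory pressure could just as well come from a
	 * separate memory hog.
	 */
	FILE *f = fopen("/proc/self/oom_score_adj", "w");

	if (!f) {
		perror("oom_score_adj");
		return 1;
	}
	fputs("1000", f);
	fclose(f);

	/* Allocate and touch memory until the OOM killer fires. */
	for (;;) {
		char *p = malloc(1 << 20);

		if (!p)
			break;
		memset(p, 0xaa, 1 << 20);
	}
	return 0;
}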
Regards,
Christian.