[Intel-gfx] i915 memory allocation failure..
Linus Torvalds
torvalds at linux-foundation.org
Mon Jun 5 04:26:06 UTC 2017
So there's something wrong with the memory allocation changes since
4.11, which seem to be mostly credited to Chris Wilson.
In particular, I got this earlier today:
Xorg: page allocation failure: order:0, mode:0x14210d2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_RECLAIMABLE), nodemask=(null)
and then soon afterwards in the log I see
chrome[13102]: segfault at 968 ip 00007f472a7fda83 sp 00007fffab9a6ef0 error 4 in libX11.so.6.3.0[7f472a7d1000+138000]
gnome-session-f[13115]: segfault at 0 ip 00007f7e765ab4b9 sp 00007ffca5990470 error 4 in libgtk-3.so.0.2200.15[7f7e762cc000+6f9000]
which I assume is related to broken error handling.
So there are at least two bugs here:
(a) order-0 memory allocation failure is generally a sign of
something bad. We clearly give up *much* too easily.
(b) using __GFP_NORETRY and accepting the allocation failure, but then
not using __GFP_NOWARN, is just stupid.
Now, (b) initially made me go "I'll just add that __GFP_NOWARN
myself". Because it's true - if you intentionally tell the VM
subsystem that you'd rather get a failed allocation than try a bit
harder, then you obviously shouldn't get the warning either. I think
the VM people have talked about just considering NORETRY to imply
NOWARN.
However, the fact that this actually caused problems in downstream
user space, and the fact that this happened with an order-0 allocation
made me re-consider. That allocation clearly *is* important, and
returning NULL may "work" from a kernel standpoint, but it sure as
hell didn't work in the bigger picture, now did it?
So the warning was actually good in this case. This may in fact be an
example of why __GFP_NORETRY should *not* imply __GFP_NOWARN.
So instead of shutting up the warning, I pass it over to the i915
people. Making that allocation fail easily wasn't such a great idea
after all, was it? Maybe that NORETRY should be reconsidered, at least
for important (perhaps small?) allocations?
Also adding some VM people, because I think it's ridiculous that the
0-order allocation failed in the first place. Full report attached,
there's tons of memory that should have been trivial to free.
So I suspect __GFP_NORETRY ends up being *much* too aggressive, and
basically doesn't even try any trivial freeing.
Maybe we want some middle ground between "retry forever" and "don't
try at all". In trying to fight the "retry forever" behavior, we seem
to have gone too far in the "don't even bother, just return NULL"
direction.
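To make the "middle ground" idea concrete, here is a minimal userspace
sketch in C. It is a toy model, not kernel code: struct toy_pool,
toy_reclaim(), toy_alloc(), the retry_policy enum and MAX_RECLAIM_ROUNDS
are all hypothetical names invented for illustration, and none of them
correspond to real GFP flags or to how the page allocator is actually
implemented. It only shows the difference between failing on the first
miss and doing a bounded amount of cheap reclaim before giving up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies for this toy model; these are NOT kernel GFP flags. */
enum retry_policy {
    RETRY_NONE,    /* fail on the first miss ("don't try at all") */
    RETRY_BOUNDED  /* do a limited amount of cheap reclaim, then give up */
};

/* Assumed upper bound on reclaim attempts; the value is arbitrary. */
#define MAX_RECLAIM_ROUNDS 3

/* Toy pool: units that are free now, and units that are trivially freeable. */
struct toy_pool {
    size_t free_units;
    size_t reclaimable_units;
};

/* Cheap reclaim: turn one reclaimable unit into a free one.
 * Returns true if it made progress. */
static bool toy_reclaim(struct toy_pool *pool)
{
    assert(pool != NULL);
    if (pool->reclaimable_units == 0)
        return false;
    pool->reclaimable_units--;
    pool->free_units++;
    return true;
}

/* Take one unit from the pool. Returns true on success, false on failure. */
static bool toy_alloc(struct toy_pool *pool, enum retry_policy policy)
{
    int rounds = 0;

    assert(pool != NULL);
    for (;;) {
        if (pool->free_units > 0) {
            pool->free_units--;
            return true;
        }
        if (policy == RETRY_NONE)
            return false;              /* no reclaim attempted at all */
        if (rounds++ >= MAX_RECLAIM_ROUNDS)
            return false;              /* bounded: never loops forever */
        if (!toy_reclaim(pool))
            return false;              /* nothing left that is easy to free */
    }
}
```

With a pool that has no free units but some reclaimable ones, RETRY_NONE
fails immediately while RETRY_BOUNDED succeeds after one reclaim round,
which is roughly the situation described in the report above.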
Added a random mixture of i915 and MM people. Feel free to send this
message further if you feel I missed somebody.
Linus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfp-noretry
Type: application/octet-stream
Size: 4437 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20170604/166e56bd/attachment.obj>