[Bug 16148] page allocation failure. order:1, mode:0x50d0

Tue Jun 15 15:57:09 PDT 2010

On Tue, 2010-06-15 at 15:41 -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> (switching back to email, actually)
> 
> On Sun, 13 Jun 2010 13:01:57 GMT
> bugzilla-daemon at bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=16148
> >
> >
> > Mikko C. <mikko.cal at gmail.com> changed:
> >
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |mikko.cal at gmail.com
> >
> >
> >
> >
> > --- Comment #8 from Mikko C. <mikko.cal at gmail.com>  2010-06-13 13:01:53 ---
> > I have been getting this with 2.6.35-rc2 and rc3.
> > Could it be the same problem?
> >
> >
> > X: page allocation failure. order:0, mode:0x4
> > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1
> > Call Trace:
> >  [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680
> >  [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680
> >  [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]
> >  [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]
> >  [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]
> >  [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]
> >  [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]
> >  [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]
> >  [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]
> >  [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20
> >  [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]
> >  [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]
> >  [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]
> >  [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]
> >  [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]
> >  [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
> >  [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]
> >  [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
> >  [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100
> >  [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0
> >  [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530
> >  [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0
> >  [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80
> >  [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90
> >  [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b
> 
> That's different.  ttm_get_pages() looks pretty busted to me.  It's not
> using __GFP_WAIT and it's not using __GFP_FS.  It's using a plain
> GFP_DMA32 so it's using atomic allocations even though it doesn't need
> to.  IOW, it's shooting itself in the head.
> 
> Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS
> and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT.
> If so, it should very much do so.  If not then the function is
> misdesigned and should be altered to take a gfp_t argument so the
> caller can tell ttm_get_pages() which is the strongest allocation mode
> which it may use.
> 
> > [TTM] Unable to allocate page.
> > radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
> > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7827456,
> > 2, 4096, -12)
> 
> This bug actually broke stuff for you.
> 
> Something like this:
> 
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a
> +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page
>  	/* No pool for cached pages */
>  	if (pool == NULL) {
>  		if (flags & TTM_PAGE_FLAG_DMA32)
> -			gfp_flags |= GFP_DMA32;
> +			gfp_flags |= GFP_KERNEL|GFP_DMA32;
>  		else
>  			gfp_flags |= GFP_HIGHUSER;
>  
> _
> 
> although I wonder whether it should be using pool->gfp_flags.
> 
> 
> It's a shame that this code was developed and merged in secret :( Had
> we known, we could have looked at enhancing mempools to cover the
> requirement, or at implementing this in some generic fashion rather
> than hiding it down in drivers/gpu/drm.

It's been posted to lkml at least once or twice over the past few years,
though not as often as it was posted to dri-devel, but that was because
we had never seen anyone show any interest in it outside of kernel
hackers. Originally I was going to use the generic allocator that ia64
uses for uncached allocations, but it allocates memory ranges, not
pages, so it wasn't useful. I also suggested getting a page flag for
the uncached allocator, and I was told to go write the code in my own
corner and prove it was required. So I did; it was cleaned up and
worked on by others, and I merged it. So can we lay off the "in
secret"? The original code is nearly 2 years old at this point, and it
isn't our fault that -mm hackers choose to ignore it. Patches welcome.

So now back to the bug:

So, the page pools are set up with gfp flags. In the normal case there
are 4 pools: one of WC GFP_HIGHUSER pages, one of UC GFP_HIGHUSER
pages, one of WC GFP_USER|GFP_DMA32 pages, and one of UC
GFP_USER|GFP_DMA32 pages. So the pools are all fine. The problem here
is the same one we had before we added the pools: the normal
(non-pool) page allocation path, which needs GFP_USER added rather
than GFP_KERNEL.

That said, I've noticed a lot more page allocation failure reports in
2.6.35-rcX than we've had for a long time, in code that hasn't changed
(the AGP ones the other day, for example). Has something in the core
MM regressed (again... didn't this happen back in 2.6.31 or
something)?

(cc'ing Mel, who tracked these down before.)

Dave.