[PATCH] drm/nouveau: fix ttm move notify callback

Fri Jan 6 10:22:20 PST 2012

On Fri, Jan 06, 2012 at 11:53:35AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 06, 2012 at 11:51:03AM -0500, Jerome Glisse wrote:
> > On Fri, Jan 6, 2012 at 9:57 AM, Konrad Rzeszutek Wilk
> > <konrad.wilk at oracle.com> wrote:
> > > On Thu, Jan 05, 2012 at 09:14:10PM -0500, Konrad Rzeszutek Wilk wrote:
> > >> On Fri, Jan 06, 2012 at 07:53:13AM +1000, Ben Skeggs wrote:
> > >> > On Thu, 2012-01-05 at 13:31 -0500, j.glisse at gmail.com wrote:
> > >> > > From: Jerome Glisse <jglisse at redhat.com>
> > >> > >
> > >> > > TTM might call the move notify callback with a NULL new memory placement;
> > >> > > properly handle this case inside the nouveau move notify callback.
> > >> > This has been fixed already in a -next tree I sent to Dave.
> > >>
> > >> I just tried -next with your patch (and two other fixes that I had sent):
> > >>
> > >> drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool
> > >> drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages
> > >>
> > >> and Jerome's AGP fix:
> > >> ttm: fix agp since ttm tt rework
> > >>
> > >> and got the crash (but only with NVIDIA cards) after switching between Xorg and the VCs.
> > >> See drm-next.jpg.
> > >
> > > http://darnok.org/vga/drm-next.jpg
> > >
> > >>
> > >> With your patch removed ("drm/nouveau/ttm: fix crash as a result of a recent ttm change")
> > >> and the patch below by Jerome applied, I still get a crash (see drm-next-with-Jerome-fix-revert-Ben.jpg).
> > >
> > > http://darnok.org/vga/drm-next-with-Jerome-fix-revert-Ben.jpg
> > >
> > 
> > Anything special needed to trigger it? I can't trigger it with a simple GNOME 3
> > session (Firefox, Evince, ...).
> 
> I ran etracer, then switched over to a framebuffer console (Alt-F2) and logged in.
> Then I ran perf record and switched back to etracer. I ran a couple of laps and, when finished,
> quit perf top. On the PCIe machine it took a while (I had to run a couple of laps).
> 
> On the AGP one it happened immediately, which is no surprise since the code appears
> to be activated when we do garbage collection and that machine only had 2GB. The
> PCIe one has 8GB. Perhaps a better way would be to force the workqueue to run by
> setting the pool limits to smaller values.
>

I'm still having difficulty reproducing this. Can you reproduce it with the attached
printk debugging patch and provide the log? (Only the few printks preceding the
oops or segfault are interesting.)

Cheers,
Jerome