It appears drm-next TTM cleanup broke something . . .
Kevin Brace
kevinbrace at gmx.com
Mon Oct 19 07:23:52 UTC 2020
Hi Dave,
Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM no longer crashes because of the "ttm_tt_create" member being null.
However, X Server still fails to start due to a separate TTM-related memory allocation issue.
I think the large TTM changes made during this development cycle broke OpenChrome DRM.
I would like to follow up on the question I raised in my previous e-mail.
Shouldn't passing "false" as the "use_tt" parameter of ttm_range_man_init() disable TTM TT functionality?
I feel like that should be the expected behavior.
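To make the question concrete, here is a minimal userspace model of the behavior being argued for. It is a sketch under stated assumptions, not TTM's actual code: every struct and function below (model_bo, model_mem_type_manager, model_tt_create(), and so on) is hypothetical and only imitates the roles of the real objects. The point it shows is that a manager initialized with use_tt == false would skip TT creation entirely, and that a NULL "ttm_tt_create" hook would produce an error rather than a NULL pointer call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real TTM definitions. */
struct model_tt {
	unsigned int page_flags;
};

struct model_bo;

struct model_bo_driver {
	/* A driver without acceleration may leave this hook NULL. */
	struct model_tt *(*ttm_tt_create)(struct model_bo *bo,
					  unsigned int page_flags);
};

struct model_mem_type_manager {
	bool use_tt; /* plays the role of ttm_range_man_init()'s use_tt */
};

struct model_bo {
	const struct model_bo_driver *driver;
	const struct model_mem_type_manager *man; /* manager of the placement */
	struct model_tt *ttm;
};

/*
 * Expected behavior (as argued in the e-mail, not as implemented in TTM):
 * skip TT creation when the placement's manager has use_tt == false,
 * and never call a NULL driver hook.
 */
static int model_tt_create(struct model_bo *bo, unsigned int page_flags)
{
	assert(bo != NULL && bo->driver != NULL && bo->man != NULL);

	if (!bo->man->use_tt)
		return 0;               /* no TT needed for this placement */
	if (bo->driver->ttm_tt_create == NULL)
		return -EINVAL;         /* fail cleanly instead of oopsing */

	bo->ttm = bo->driver->ttm_tt_create(bo, page_flags);
	return bo->ttm ? 0 : -ENOMEM;
}
```

In this model, a buffer object whose manager has use_tt set to false returns 0 and leaves bo->ttm NULL even when the driver supplies no hook; only a placement whose manager has use_tt set ever reaches the driver callback.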
Again, there are only 5 to 6 days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug urgent).
Assuming my assessment is correct, I think this was not discovered earlier for the following reasons.
1) nouveau, radeon, and amdgpu already use TTM TT functionality.
2) ast uses the GEM VRAM helper, which uses TTM internally. The helper populates the "ttm_tt_create" and "ttm_tt_destroy" members, so the developers did not notice the breakage.
3) OpenChrome DRM is still not in the mainline tree, so no one other than me had noticed the problem until now.
Regarding TTM TT functionality: OpenChrome DRM does not currently support acceleration, so I did not believe it was necessary to populate the "ttm_tt_create" and "ttm_tt_destroy" members.
That implementation worked fine up to the previous development cycle's code.
Of course, I will eventually add support for acceleration, so TTM TT functionality will be used at some point.
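For reference, the shape of the workaround (a driver that populates both hooks) can be sketched in userspace. This is only a hedged model: the types and the names model_ttm_tt_create(), model_ttm_tt_destroy(), and model_driver are hypothetical, and calloc()/free() stand in for kernel allocation. It shows the general allocate-then-initialize pattern of a create hook such as bo_driver_ttm_tt_create() in drm/drm_gem_vram_helper.c, paired with a destroy hook that releases what was allocated; it does not reproduce that helper's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real TTM structures. */
struct model_bo {
	unsigned long num_pages;
};

struct model_tt {
	const struct model_bo *bo;
	unsigned long num_pages;
	unsigned int page_flags;
};

/* Create hook: allocate zeroed memory, then initialize it from the BO. */
static struct model_tt *model_ttm_tt_create(const struct model_bo *bo,
					    unsigned int page_flags)
{
	struct model_tt *tt;

	assert(bo != NULL);
	tt = calloc(1, sizeof(*tt));
	if (tt == NULL)
		return NULL;
	tt->bo = bo;
	tt->num_pages = bo->num_pages;
	tt->page_flags = page_flags;
	return tt;
}

/* Matching destroy hook: release what the create hook allocated. */
static void model_ttm_tt_destroy(struct model_tt *tt)
{
	free(tt);
}

/* Driver function table with both hooks populated. */
struct model_bo_driver {
	struct model_tt *(*ttm_tt_create)(const struct model_bo *bo,
					  unsigned int page_flags);
	void (*ttm_tt_destroy)(struct model_tt *tt);
};

static const struct model_bo_driver model_driver = {
	.ttm_tt_create = model_ttm_tt_create,
	.ttm_tt_destroy = model_ttm_tt_destroy,
};
```

Keeping the create and destroy hooks as a matched pair means whichever code path creates a TT object can always hand it back to the same driver for cleanup.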
Regards,
Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com
> Sent: Sunday, October 18, 2020 at 12:50 PM
> From: "Dave Airlie" <airlied at gmail.com>
> To: "Kevin Brace" <kevinbrace at gmx.com>, "Christian König" <ckoenig.leichtzumerken at gmail.com>
> Cc: "dri-devel" <dri-devel at lists.freedesktop.org>, "Dave Airlie" <airlied at redhat.com>
> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>
> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace at gmx.com> wrote:
> >
> > Hi Dave,
> >
> > It is a little urgent, so I am writing this right now.
> > As usual, I pulled DRM repository code into the out-of-tree OpenChrome DRM repository a few days ago.
> > While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() had been removed and replaced with ttm_range_man_init().
> > ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
> > If I leave the "ttm_tt_create" member of the ttm_bo_driver struct null by not specifying it, TTM still tries to call it and crashes with a NULL pointer dereference.
> > The workaround I have found so far is to populate the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
> > This is what the call trace looks like when the "ttm_tt_create" member is not specified (i.e., it is null).
>
> cc'ing Christian,
>
> I can't remember if we did this deliberately or if it just worked by
> accident previously.
>
> Either way, you will probably need a ttm_tt_create going forward.
>
> Dave.
>
> >
> > _______________________________________________
> > . . .
> > kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
> > kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
> > kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
> > kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode
> > kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page
> > . . .
> > kernel: [ 34.310807] Call Trace:
> > kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm]
> > kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm]
> > kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm]
> > kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
> > . . .
> > _______________________________________________
> >
> > The erroneous call to the "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
> > Currently, OpenChrome DRM's TTM implementation does not use the "ttm_tt_create" member, and this arrangement worked fine up to Linux 5.9's drm-next code.
> > It appears that Linux 5.10's drm-next code broke it.
> >
> > Regards,
> >
> > Kevin Brace
> > Brace Computer Laboratory blog
> > https://bracecomputerlab.com
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>