It appears drm-next TTM cleanup broke something . . .

Christian König ckoenig.leichtzumerken at gmail.com
Mon Oct 19 10:13:22 UTC 2020


Hi Kevin,

the basic problem you are facing is that implementing
ttm_tt_create/destroy is mandatory (it always was). Without an
implementation you won't be able to use the system domain (in addition
to the optional GTT domain).
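
To illustrate why a missing callback ends in a NULL instruction fetch, here is a minimal userspace sketch. It is not TTM code: every name in it (mock_tt, mock_bo_driver, mock_device_init, mock_bo_validate) is hypothetical and only models the pattern of a driver table with mandatory create/destroy callbacks. The init-time check is an assumption added for the sketch, not something this mail says TTM does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-buffer backing object; not struct ttm_tt. */
struct mock_tt {
	unsigned int page_flags;
};

/* Hypothetical driver table with two mandatory callbacks. */
struct mock_bo_driver {
	struct mock_tt *(*tt_create)(unsigned int page_flags);
	void (*tt_destroy)(struct mock_tt *tt);
};

/* Simplest possible implementation: allocate and fill the object. */
static struct mock_tt *mock_tt_create(unsigned int page_flags)
{
	struct mock_tt *tt = calloc(1, sizeof(*tt));

	if (!tt)
		return NULL;
	tt->page_flags = page_flags;
	return tt;
}

static void mock_tt_destroy(struct mock_tt *tt)
{
	free(tt);
}

/* Reject a driver table that leaves a mandatory callback unset. */
static int mock_device_init(const struct mock_bo_driver *drv)
{
	if (!drv || !drv->tt_create || !drv->tt_destroy)
		return -EINVAL;
	return 0;
}

/*
 * Model of a validate path that needs the backing object. Without the
 * check above, calling drv->tt_create while it is NULL would jump to
 * address 0.
 */
static int mock_bo_validate(const struct mock_bo_driver *drv,
			    unsigned int page_flags)
{
	struct mock_tt *tt;
	int ret = mock_device_init(drv);

	if (ret)
		return ret;

	tt = drv->tt_create(page_flags);
	if (!tt)
		return -ENOMEM;

	drv->tt_destroy(tt);
	return 0;
}
```

In this sketch, a table with both members left NULL is rejected with -EINVAL instead of crashing, while a table that sets tt_create and tt_destroy to the two mock functions validates successfully.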

My best guess is that the difference is that we now force
initialization of the system domain for all drivers.

If that is correct, you just never ran into this before because you
never correctly initialized TTM to support buffer moves.

I'm not sure what exactly the OpenChrome DRM driver is doing, but I 
strongly suggest dropping TTM support completely and using the GEM 
VRAM helper layer instead.

Regards,
Christian.

On 19.10.20 at 09:23, Kevin Brace wrote:
> Hi Dave,
>
> Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null.
> It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from.
> I think making huge changes to TTM during this development cycle broke OpenChrome DRM.
> Following up on the question I raised in the previous e-mail:
> Shouldn't passing "false" for the "use_tt" parameter of ttm_range_man_init() disable TTM TT functionality?
> I feel like that should be the expected behavior.
> Again, there are only 5 to 6 days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent).
> Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
>
> 1) nouveau, radeon, and amdgpu already use TTM TT functionality.
> 2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
> 3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
>
>
> Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members.
> That implementation worked fine until the previous development cycle code.
> Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
>
> Regards,
>
> Kevin Brace
> Brace Computer Laboratory blog
> https://bracecomputerlab.com
>
>
>> Sent: Sunday, October 18, 2020 at 12:50 PM
>> From: "Dave Airlie" <airlied at gmail.com>
>> To: "Kevin Brace" <kevinbrace at gmx.com>, "Christian König" <ckoenig.leichtzumerken at gmail.com>
>> Cc: "dri-devel" <dri-devel at lists.freedesktop.org>, "Dave Airlie" <airlied at redhat.com>
>> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>>
>> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace at gmx.com> wrote:
>>> Hi Dave,
>>>
>>> It is a little urgent, so I am writing this right now.
>>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
>>> While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
>>> ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
>>> If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
>>> The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
>>> This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
>> cc'ing Christian,
>>
>> I can't remember if we did this deliberately or if it just worked by
>> accident previously.
>>
>> Either way, you will probably need a ttm_tt_create going forward.
>>
>> Dave.
>>
>>> _______________________________________________
>>> . . .
>>> kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
>>> kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
>>> kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
>>> kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
>>> kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
>>> . . .
>>> kernel: [   34.310807] Call Trace:
>>> kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
>>> kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
>>> kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
>>> kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
>>> . . .
>>> _______________________________________________
>>>
>>> The erroneous call to the "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
>>> Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
>>> It appears that Linux 5.10's drm-next code broke the code.
>>>
>>> Regards,
>>>
>>> Kevin Brace
>>> Brace Computer Laboratory blog
>>> https://bracecomputerlab.com
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel


