[PATCH] drm/ttm: fix two regressions since move_notify changes

Wed Jan 25 10:51:05 PST 2012

On Wed, Jan 25, 2012 at 1:21 PM, Thomas Hellstrom <thomas at shipmail.org> wrote:
> On 01/25/2012 07:12 PM, Dave Airlie wrote:
>>
>> On Wed, Jan 25, 2012 at 5:19 PM, Thomas Hellstrom<thomas at shipmail.org>
>>  wrote:
>>>
>>> On 01/25/2012 04:37 PM, Jerome Glisse wrote:
>>>>
>>>> On Wed, Jan 25, 2012 at 10:21 AM, Ben Skeggs<skeggsb at gmail.com>
>>>>  wrote:
>>>>>
>>>>> On Wed, 2012-01-25 at 15:33 +0100, Thomas Hellstrom wrote:
>>>>>>
>>>>>> On 01/25/2012 10:41 AM, Ben Skeggs wrote:
>>>>>>>
>>>>>>> My main concern is that we blindly and unnecessarily set up GPU
>>>>>>> bindings and
>>>>>>> end up with unnecessary code in TTM, and furthermore that we
>>>>>>> communicate
>>>>>>> that bad practice to future driver writers.
>>>>>>> This "unnecessary code" is like 5 lines of cleanup if something
>>>>>>> fails,
>>>>>>> hardly anything to be jumping up and down about :)
>>>>>>
>>>>>> It's just not TTM's business, unless the GPU maps are mappable by the
>>>>>> CPU as well.
>>>>>> Also, What if the mapping setup in move_notify() fails?
>>>>>
>>>>> It can't fail, and well, in nouveau's implementation it never will.
>>>>> It's simply a "fill the ptes for all the vmas currently associated with
>>>>> a bo".
>>>>>
>>>>> And well, it's about as much TTM's business as VRAM aperture allocation
>>>>> is..  I don't see the big deal, if you wan't to do it a different way
>>>>> in
>>>>> your driver, there's nothing stopping you.  It's a lot of bother for
>>>>> essentially zero effort in TTM..
>>>>>
>>>>>>>> Thomas, what do you suggest to move forward with this? Both of these
>>>>>>>> bugs are serious regressions that make nouveau unusable with the
>>>>>>>> current
>>>>>>>> 3.3-rc series. Ben.
>>>>>>>>
>>>>>>>> My number one choice would of course be to have the drivers set up
>>>>>>>> their
>>>>>>>> private GPU mappings after a
>>>>>>>> successful validate call, as I originally suggested and you agreed
>>>>>>>> to.
>>>>>>>>
>>>>>>>> If that's not possible (I realize it's late in the release series),
>>>>>>>> I'll
>>>>>>>> ack this patch if you and Jerome agree not to block
>>>>>>>> attempts to move in that direction for future kernel releases.
>>>>>>>
>>>>>>> I can't say I'm entirely happy with the plan honestly.  To me, it
>>>>>>> still
>>>>>>> seems more efficient to handle this when a move happens
>>>>>>> (comparatively
>>>>>>> rare) and "map new backing storage into every vm that has a
>>>>>>> reference"
>>>>>>> than to (on every single buffer of every single "exec" call) go "is
>>>>>>> this
>>>>>>> buffer mapped into this channel's vm? yes, ok; no, lets go map it".
>>>>>>>
>>>>>>> I'm not even sure how exactly I plan on storing this mapping
>>>>>>> efficiently
>>>>>>> yet.. Scanning the BO's linked list of VMs it's mapped into for "if
>>>>>>> (this_vma == chan->vma)" doesn't exactly sound performant.
>>>>>>
>>>>>> As previously suggested, in the simplest case a bo could have a 'needs
>>>>>> remap' flag
>>>>>> that is set on gpu map teardown on move_notify(), and when this flag
>>>>>> is
>>>>>> detected in validate,
>>>>>> go ahead and set up all needed maps and clear that flag.
>>>>>>
>>>>>> This is the simplest case and more or less equivalent to the current
>>>>>> solution, except
>>>>>> maps aren't set up unless needed by at least one channel and there is
>>>>>> a
>>>>>> clear way
>>>>>> to handle errors when GPU maps are set up.
>>>>>
>>>>> Yes, right.  That can be done, and gives exactly the same functionality
>>>>> as I *already* achieve with move_notify() but apparently have to change
>>>>> just because you've decided nouveau/radeon are both doing the
>>>>> WrongThing(tm).
>>>>>
>>>>> Anyhow, I care more that 3.3 works than how it works.  So, whatever.
>>>>>  If
>>>>> I must agree to this in order to get a regression fix in, then I guess
>>>>> I
>>>>> really have no choice in the matter.
>>>>>
>>>>> Ben.
>>>>>
>>>>>> A simple and straightforward fix that leaves the path open (if so
>>>>>> desired) to
>>>>>> handle finer channel granularity.
>>>>>>
>>>>>> Or am I missing something?
>>>>>>
>>>> I went over the code and Ben fix is ok with me, i need to test it a
>>>> bit on radeon side.
>>>>
>>>> For long term solution why not just move most of the
>>>> ttm_bo_handle_move_mem to the driver. It would obsolete the
>>>> move_notify callback. move notify callback was introduced because in
>>>> some case the driver never knew directly that a bo moved. It's obvious
>>>> that driver need to know every time. So instead of having an ha-doc
>>>> function for that. Let just move the handle move stuff into the
>>>> driver. Yes there will be some code duplication but it will avoid
>>>> anykind of weird error path and driver will be able to perform what
>>>> ever make sense.
>>>
>>>
>>> Yes, this is a solution that eliminates the need for TTM to support
>>> private
>>> GPU map setup. Code duplication can largely be avoided if we
>>> collect common code in a small utility function.
>>
>> Please this, its reduces one more place TTM suffers from midlayer effects.
>>
>> I'd much prefer the drivers be in control and call into help common
>> code instead of the driver calling in and then lots of callbacks out.
>>
>> (and yes the drm suffers from this a lot as well).
>>
>> Dave.
>
> OK,
> let's work in this direction.
>
> /Thomas

I should have time next week to craft a patch for this.

Cheers,
Jerome