[PATCH] drm/ttm: pass buffer object for bind/unbind callback

Jerome Glisse j.glisse at gmail.com
Sat Nov 19 14:54:53 PST 2011


On Sat, Nov 19, 2011 at 5:37 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
> On 11/19/2011 10:22 PM, Jerome Glisse wrote:
>>
>> On Sat, Nov 19, 2011 at 4:01 PM, Thomas Hellstrom
>> <thellstrom at vmware.com> wrote:
>>
>>>
>>> On 11/19/2011 09:37 PM, Jerome Glisse wrote:
>>>
>>>>
>>>> On Sat, Nov 19, 2011 at 2:46 PM, Thomas Hellstrom
>>>> <thellstrom at vmware.com> wrote:
>>>>
>>>>
>>>>>
>>>>> On 11/19/2011 07:11 PM, Jerome Glisse wrote:
>>>>>
>>>>> On Sat, Nov 19, 2011 at 12:00 PM, Thomas Hellstrom
>>>>> <thellstrom at vmware.com> wrote:
>>>>>
>>>>>
>>>>> On 11/19/2011 03:53 PM, Ben Skeggs wrote:
>>>>>
>>>>>
>>>>> On Sat, 2011-11-19 at 11:07 +0100, Thomas Hellstrom wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 11/19/2011 01:26 AM, Ben Skeggs wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 2011-11-18 at 23:48 +0100, Thomas Hellstrom wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/18/2011 06:26 PM, Ben Skeggs wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 2011-11-18 at 15:30 +0100, Thomas Hellstrom wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/18/2011 02:15 PM, Ben Skeggs wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 2011-11-18 at 08:57 +0100, Thomas Hellstrom wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Jerome,
>>>>>
>>>>> I don't like this change for the following reasons
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -snip-
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> One can take advantage of the move notify callback, but there are
>>>>> corner cases where bind/unbind might be called without move notify
>>>>> being called (mostly in error paths). So to make sure that each
>>>>> virtual address space has a consistent view of whether a buffer
>>>>> object is backed by system pages or not, it's better to pass the
>>>>> buffer object to bind/unbind.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> As I discussed previously with Jerome on IRC, I believe the
>>>>> move_notify hook is sufficient.  I fixed a couple of issues with it
>>>>> back when I implemented support for this in nouveau.
>>>>>
>>>>> Jerome did point out two issues however, which I believe can be solved
>>>>> easily enough.
>>>>>
>>>>> The first was that the error path after move_notify() has been called
>>>>> doesn't reverse the move_notify() too, leaving incorrect page table
>>>>> entries.  This, I think, could easily be solved by swapping bo->mem
>>>>> and mem, and re-calling move_notify() to have the driver reverse
>>>>> whatever it did.
>>>>>
>>>>> The second issue is that apparently move_notify() doesn't get called
>>>>> in the destroy path.  Nouveau's GEM layer takes care of this for our
>>>>> userspace bos, and we don't use any kernel bos that are mapped into a
>>>>> channel's address space, so we don't hit it.  This can probably be
>>>>> solved easily enough too, I expect.
>>>>>
>>>>> Ben.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> OK. I understand. Surely if a move_notify is missing somewhere, that
>>>>> can easily be fixed.
>>>>>
>>>>> It might be good if we can outline how the virtual tables are set up.
>>>>> In my world, the following would happen:
>>>>>
>>>>> 1) Pre command submission:
>>>>>
>>>>> a) reserve bo
>>>>> b) call ttm_bo_validate to set placement. move_notify will tear down
>>>>> any existing GPU page tables for the bo.
>>>>> c) Set up GPU page tables.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Nouveau doesn't do this exactly.  I wanted to avoid having to check
>>>>> if a bo was bound to a particular address space on every command
>>>>> submission.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> It might perhaps be a good idea to revisit this?
>>>>> I think using move_notify to set up a new placement before the data
>>>>> has actually been moved is very fragile and not the intended use. That
>>>>> would also save you from having to handle error paths. Also, how do
>>>>> you handle swapouts?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I don't see how it's fragile; there's only the one error path after
>>>>> that point that needs to undo it.  And, what *is* the expected use
>>>>> then?  I see it as "tell the driver the buffer is moving so it can do
>>>>> whatever is necessary as a result".
>>>>>
>>>>> Swapouts are a missed case though, indeed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> A quick check in c) that the virtual map hasn't been torn down by a
>>>>> move_notify, and a rebuild in that case, would probably be next to
>>>>> free. The virtual GPU mapping would then be valid only after
>>>>> validation and while the bo is fenced or pinned.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> This alone doesn't solve the swapouts issue either unless you're also
>>>>> wanting us to unmap once a buffer becomes unfenced, which would lead to
>>>>> loads of completely unnecessary map/unmaps.
>>>>>
>>>>> Ben.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I think you misunderstand me a little.
>>>>> The swapout issue should be handled by calling a move_notify() to kill
>>>>> the virtual GPU mappings.
>>>>>
>>>>> However, when moving data that has page tables pointing to it, one
>>>>> should (IMHO) do:
>>>>>
>>>>> 1) Kill the old page tables
>>>>> 2) Move the object
>>>>> 3) Set up new page tables on demand.
>>>>>
>>>>> This is done by TTM for CPU page tables and I think that should be
>>>>> done also for GPU page tables. move_notify() should be responsible for
>>>>> 1), the driver execbuf for 3), and 3) only needs to be done for the
>>>>> particular ring / fifo which is executing commands that touch the
>>>>> buffer.
>>>>>
>>>>> This leaves a clear and well-defined requirement on TTM:
>>>>> Notify the driver when it needs to kill its GPU page tables.
>>>>>
>>>>>
>>>>>
>>>>> Ok.  This I don't really object to.  I read the previous mail as
>>>>> saying that GPU mappings should only exist as long as the buffer is
>>>>> fenced, which would be a ridiculous amount of overhead.
>>>>>
>>>>>
>>>>>
>>>>> I agree. What I meant was that the page tables wouldn't be considered
>>>>> valid when the fence had signaled. However, that was actually
>>>>> incorrect, because they would actually be valid until the next
>>>>> move_notify(). The intention was never to tear down the mappings once
>>>>> the fence had signaled; that would have been pretty stupid.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> With the latest patch from Jerome:
>>>>> Notify the driver when it needs to rebuild its page tables. Also on
>>>>> error paths, but not for swapins, because no driver will probably set
>>>>> up GPU page tables pointing to SYSTEM_MEMORY.
>>>>>
>>>>> This is what I mean by fragile, and I'd much rather see the other
>>>>> approach.
>>>>>
>>>>> Ben, I didn't fully understand why you didn't want to use that
>>>>> approach. Did you see a significant overhead with it?
>>>>>
>>>>>
>>>>>
>>>>> Now that I'm more clear on what you meant: no, it wouldn't be a lot of
>>>>> overhead.  However, move_notify() was never intended for the use
>>>>> you're proposing now, or the new ttm_mem_reg would never have been
>>>>> passed in as a parameter...
>>>>>
>>>>>
>>>>>
>>>>> I suppose you're right. Still, I think this is the right way to go,
>>>>> since it has the following advantages:
>>>>>
>>>>> 1) TTM doesn't need to care about the driver re-populating its GPU
>>>>> page tables. Since swapin is handled from the tt layer, not the bo
>>>>> layer, this makes it a bit easier on us.
>>>>> 2) Transition to page-faulted GPU virtual maps is straightforward and
>>>>> consistent. A non-page-faulting driver sets up the maps at CS time; a
>>>>> page-faulting driver can set them up directly from an irq handler
>>>>> without reserving, since the bo is properly fenced or pinned when the
>>>>> pagefault happens.
>>>>> 3) A non-page-faulting driver knows at CS time exactly which
>>>>> page-table-entries really do need populating, and can do this more
>>>>> efficiently.
>>>>>
>>>>> So unless there are strong objections I suggest we should go this way.
>>>>>
>>>>> Even if this changes the semantics of the TTM <-> driver interface
>>>>> compared to how Nouveau currently does things, it means that Jerome's
>>>>> current patch is basically correct and no longer makes any assumptions
>>>>> about SYSTEM memory never being used for virtual GPU maps.
>>>>>
>>>>> Thanks,
>>>>> Thomas.
>>>>>
>>>>>
>>>>> I think it's better to let the driver choose how it will handle its
>>>>> virtual page tables. For radeon I update in move_notify in order to
>>>>> minimize the number of updates. I don't want to track whether an
>>>>> object has been updated or not against each page table. Of course this
>>>>> adds possibly useless overhead to move notify, as we might update the
>>>>> page tables too many times (bo going from vram->gtt->system).
>>>>>
>>>>> I think if move notify is properly called once for each effective
>>>>> move, drivers can do their own dance behind the curtain.
>>>>>
>>>>> Cheers,
>>>>> Jerome
>>>>>
>>>>>
>>>>> Jerome,
>>>>>
>>>>> It's not really what the driver does that is the problem, it's what
>>>>> the driver expects from the driver <-> TTM interface.
>>>>>
>>>>> That's why I'd really like a maintainable interface design before
>>>>> coding patches that make the interface a set of various workarounds.
>>>>>
>>>>> Enforcing this will also force driver writers to think twice before
>>>>> implementing things in their own way, adding another load of ad hoc
>>>>> callbacks to TTM without a clear specification and with the only
>>>>> purpose of fitting a specific driver implementation. So, in a way,
>>>>> it's our responsibility to future driver writers to try to agree on a
>>>>> simple interface that works well, allows drivers to do an efficient
>>>>> implementation, and adds little maintenance burden on TTM.
>>>>>
>>>>> If a future driver writer looks at the Radeon code and replicates it
>>>>> because he thinks it's state of the art, and then finds out his code
>>>>> breaks because he can't use SYSTEM memory for GPU page tables or use
>>>>> his own private LRU, or the code breaks with partially populated TTMs,
>>>>> and finally finds it's quite inefficient too because it unnecessarily
>>>>> populates page tables, we've failed.
>>>>>
>>>>> This is really the point I'd like to make, but as a side note, I'm
>>>>> not asking you to track each bo against each page table. What I'm
>>>>> asking you is to not populate the page tables in bo_move() but in
>>>>> execbuf / pushbuf. The possibility to track a bo against each page
>>>>> table comes as a nice benefit.
>>>>>
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>
>>>>
>>>> I don't see the issue with updating the page tables in move_notify.
>>>> That's what I do for radeon virtual memory; doing it in the command
>>>> buffer ioctl is kind of too costly.
>>>>
>>>
>>> Could you describe why it is too costly? I previously asked Ben the
>>> same thing and he said it didn't come with a substantial performance
>>> penalty. Why would it be more than checking a single bool saying
>>> whether the page tables had previously been killed or not? This would
>>> really help me understand why you want to do it from move_notify.
>>>
>>
>> So we have one page table for each opener of the drm file. This page
>> table is in a bo; it's validated into vram when the virtual address
>> space is in use. There could be several active page tables at the same
>> time (the GPU working on different workloads, possibly all in different
>> virtual address spaces). The page table also covers vram (everything is
>> a page whether it's in system memory or vram). So in the end the page
>> table can be quite big, 1M or bigger. Each page table can have the same
>> bo at a different virtual address (so no sharing between page tables).
>>
>> So if we were to build/update page tables at each cs ioctl we would be
>> talking about moving around possibly several MB of memory. Also, if we
>> want to go the update-only way, that means we need to track, for each
>> page table, what the last position of each bo we want to use was; that
>> sounds like a big loop.
>>
>> With move notify we update all the page tables the bo is mapped into.
>> The assumption is that a bo doesn't move often and that when it does,
>> it will soon be used. Also, that way we don't have to track anything.
>> We know that at any point in time all the page tables have an
>> up-to-date entry for every bo.
>>
>>
>
> As mentioned previously, and in the discussion with Ben, the page tables
> would not need to be rebuilt on each CS. They would be rebuilt only on the
> first CS following a move_notify that caused a page table invalidation.
>
> move_notify:
> if (is_incompatible(new_mem_type)) {
>         bo->page_tables_invalid = true;
>         invalidate_page_tables(bo);
> }
>
> command_submission:
> if (bo->page_tables_invalid) {
>         set_up_page_tables(bo);
>         bo->page_tables_invalid = false;
> }

Why is it different from updating the page tables in move notify? I don't
see any bonus here; all the information we need is already available in
move_notify.
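
To make the comparison concrete, here is a rough sketch of the kind of
move_notify-driven update I'm describing. All the my_* names are made up
for illustration; this is not the actual radeon code, just the shape of it:

/* Hypothetical sketch: each bo remembers every per-open-file address
 * space (VM) it is mapped into, and move_notify walks that list so all
 * page tables stay current without any per-CS checks. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct my_vm { int id; };                  /* one per drm file opener */

struct my_vm_mapping {
        struct my_vm *vm;                  /* address space of this mapping */
        uint64_t gpu_va;                   /* virtual address in that space */
        struct my_vm_mapping *next;
};

struct my_bo {
        struct my_vm_mapping *mappings;    /* every VM this bo is mapped into */
};

/* Stand-ins for the driver's real page table writers. */
static void my_vm_write_ptes(struct my_vm *vm, uint64_t va, struct my_bo *bo)
{
        printf("vm %d: map bo %p at 0x%llx\n", vm->id, (void *)bo,
               (unsigned long long)va);
}

static void my_vm_clear_ptes(struct my_vm *vm, uint64_t va)
{
        printf("vm %d: point 0x%llx at the default entry\n", vm->id,
               (unsigned long long)va);
}

/* Called from the driver's move_notify hook: one pass over the bo's
 * mappings per effective move.  A move to SYSTEM invalidates the entries;
 * any other placement rewrites them for the new location. */
static void my_bo_move_notify(struct my_bo *bo, bool new_mem_is_system)
{
        struct my_vm_mapping *m;

        for (m = bo->mappings; m; m = m->next) {
                if (new_mem_is_system)
                        my_vm_clear_ptes(m->vm, m->gpu_va);
                else
                        my_vm_write_ptes(m->vm, m->gpu_va, bo);
        }
}

int main(void)
{
        struct my_vm vm0 = { 0 }, vm1 = { 1 };
        struct my_vm_mapping map1 = { &vm1, 0x200000, NULL };
        struct my_vm_mapping map0 = { &vm0, 0x100000, &map1 };
        struct my_bo bo = { &map0 };

        my_bo_move_notify(&bo, false);     /* bo moved into vram or gtt */
        my_bo_move_notify(&bo, true);      /* bo evicted to SYSTEM */
        return 0;
}

The walk happens once per effective move, so the CS ioctl never has to
check or rebuild anything.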

>
>>>> Note that nouveau also updates the page tables in move_notify, so I
>>>> think it's the right solution. The TTM interface for move notify
>>>> should just state that it will be called whenever a buffer changes
>>>> placement. Then we can add a comment on system buffers, which can see
>>>> their pages disappear into thin air at any time. Note that when the
>>>> placement is system we mark all the page table entries as invalid and
>>>> point them to the default entry (which is what nouveau is doing too).
>>>>
>>>>
>>>
>>> Well, I've previously listed a number of disadvantages with this, which
>>> I think are still valid, but I fail to see any advantages? Just
>>> proposed workarounds for the disadvantages? (FWIW vmwgfx used to have
>>> GPU page tables pointing to system memory, and I'd say it would be a
>>> pretty common use-case for drivers that don't have a mappable GART).
>>>
>>
>> Ok, there are 2 different things for me: system placement in ttm is when
>> the bo will never be used by the gpu (true for both radeon and nouveau).
>> TTM_PL_TT is when a bo is bound to a gart and backed by system pages. We
>> know that the pages will be there as long as the bo is in use. PL_TT
>> means the GPU page table points to system memory.
>>
>
> Memory types in TTM are completely orthogonal to allowed GPU usage. The GPU
> may access a bo if it's reserved, fenced or pinned, regardless of its
> placement.
>
> A TT memory type is a *single* GPU aperture that may be mapped from the
> aperture side by the CPU (AGP). It may also be used by a single unmappable
> aperture that wants to use TTM's range management and eviction (vmwgfx GMR).
> The driver can define more than one such memory type (psb), but a bo can
> only be placed in one of those at a time, so this approach is unsuitable for
> multiple apertures pointing to the same pages.

Radeon virtual memory has a special address space, the system address
space; it's managed by ttm through a TTM_TT (exact same code as the
current one). All the other address spaces are not managed by ttm, but we
require a bo to be bound to a ttm_tt to be used, though we can relax
that. That's the reason why I consider system placement as different.
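
As a side note, the reason the patch passes the buffer object down to
bind/unbind is easier to see with a sketch. Everything below is
hypothetical (these are not the real TTM or radeon interfaces); the point
is only that once bind/unbind can see the bo, the driver can keep every
virtual address space consistent even on error paths where move_notify
never ran:

#include <stdint.h>
#include <stdio.h>

struct my_bo { const char *name; };        /* stand-in buffer object */

struct my_gart_backend_ops {
        /* the bo's pages become GPU visible at gart_offset */
        int  (*bind)(void *priv, struct my_bo *bo, uint64_t gart_offset);
        /* the bo's pages go away; no VM may keep entries pointing at them */
        void (*unbind)(void *priv, struct my_bo *bo);
};

/* Stand-ins for walking the bo's per-VM mappings (as in the sketch above). */
static void my_bo_update_all_vms(struct my_bo *bo, uint64_t gart_offset)
{
        printf("%s: rewrite PTEs for gart offset 0x%llx\n", bo->name,
               (unsigned long long)gart_offset);
}

static void my_bo_invalidate_all_vms(struct my_bo *bo)
{
        printf("%s: point PTEs at the default entry\n", bo->name);
}

static int my_backend_bind(void *priv, struct my_bo *bo, uint64_t gart_offset)
{
        (void)priv;                        /* program the GART here ... */
        my_bo_update_all_vms(bo, gart_offset);
        return 0;
}

static void my_backend_unbind(void *priv, struct my_bo *bo)
{
        (void)priv;                        /* ... and tear the GART down here */
        my_bo_invalidate_all_vms(bo);
}

static const struct my_gart_backend_ops my_ops = {
        .bind   = my_backend_bind,
        .unbind = my_backend_unbind,
};

int main(void)
{
        struct my_bo bo = { "bo0" };

        my_ops.bind(NULL, &bo, 0x10000);   /* e.g. validated into TT */
        my_ops.unbind(NULL, &bo);          /* e.g. an error path or swapout */
        return 0;
}

Whether the per-VM update lives in bind/unbind or in move_notify, the bo
is what ties the two together, which is all the callback change is about.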

> Hardware with multiple page tables (one per context or file descriptor)
> would probably want to use SYSTEM memory directly (unichrome SGDMA, old
> vmwgfx GMR), unless the per-context page tables can only point to pages that
> already reside in a bigger GPU-wide translation table. (IIRC i965 might be
> an example of the latter, but I'm unsure).
>

Cheers,
Jerome

