[Intel-gfx] [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

Tue May 18 13:24:40 UTC 2021

On 5/18/21 3:08 PM, Christian König wrote:
> Am 18.05.21 um 14:52 schrieb Thomas Hellström:
>>
>> On 5/18/21 2:09 PM, Christian König wrote:
>>> Am 18.05.21 um 14:04 schrieb Thomas Hellström:
>>>>
>>>> On 5/18/21 1:55 PM, Christian König wrote:
>>>>>
>>>>>
>>>>> Am 18.05.21 um 10:26 schrieb Thomas Hellström:
>>>>>> The internal ttm_bo_util memcpy uses vmap functionality, and 
>>>>>> while it
>>>>>> probably might be possible to use it for copying in- and out of
>>>>>> sglist represented io memory, using io_mem_reserve() / io_mem_free()
>>>>>> callbacks, that would cause problems with fault().
>>>>>> Instead, implement a method mapping page-by-page using kmap_local()
>>>>>> semantics. As an additional benefit we then avoid the occasional 
>>>>>> global
>>>>>> TLB flushes of vmap() and consuming vmap space, elimination of a 
>>>>>> critical
>>>>>> point of failure and with a slight change of semantics we could 
>>>>>> also push
>>>>>> the memcpy out async for testing and async driver develpment 
>>>>>> purposes.
>>>>>> Pushing out async can be done since there is no memory allocation 
>>>>>> going on
>>>>>> that could violate the dma_fence lockdep rules.
>>>>>>
>>>>>> For copies from iomem, use the WC prefetching memcpy variant for
>>>>>> additional speed.
>>>>>>
>>>>>> Note that drivers that don't want to use struct io_mapping but 
>>>>>> relies on
>>>>>> memremap functionality, and that don't want to use scatterlists for
>>>>>> VRAM may well define specialized (hopefully reusable) iterators 
>>>>>> for their
>>>>>> particular environment.
>>>>>
>>>>> In general yes please since I have that as TODO for TTM for a very 
>>>>> long time.
>>>>>
>>>>> But I would prefer to fix the implementation in TTM instead and 
>>>>> give it proper cursor handling.
>>>>>
>>>>> Amdgpu is also using page based iomem and we are having similar 
>>>>> workarounds in place there as well.
>>>>>
>>>>> I think it makes sense to unify this inside TTM and remove the old 
>>>>> memcpy util function when done.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>
>>>> Christian,
>>>>
>>>> I was thinking when we replace the bo.mem with a pointer (and 
>>>> perhaps have a driver callback to allocate the bo->mem,
>>>> we could perhaps embed a struct ttm_kmap_iter and use it for all 
>>>> mapping in one way or another). That would mean perhaps land this 
>>>> is i915 now and sort out the unification once the struct 
>>>> ttm_resource, struct ttm_buffer_object separation has landed?
>>>
>>> That stuff is ready, reviewed and I'm just waiting for some amdgpu 
>>> changes to land in drm-misc-next to push it.
>>>
>>> But yes in general an iterator for the resource object sounds like 
>>> the right plan to me as well.
>>>
>>> Christian.
>>
>> OK, so then are you OK with landing this in i915 for now? That would 
>> also ofc mean the export you NAK'd but strictly for this memcpy use 
>> until we merge it with TTM?
>
> Well you can of course prototype that in i915, but I really don't want 
> to export the TT functions upstream.

I understand, I once had the same thoughts trying to avoid that as far 
as possible, so this function was actually then added to the ttm_bo 
interface, (hence the awkward naming) as a helper for drivers 
implementing move(), essentially a very special case of 
ttm_bo_move_accel_cleanup(), but anyway, see below:

>
> Can we cleanly move that functionality into TTM instead?

I'll take a look at that, but I think we'd initially be having iterators 
mimicing the current move_memcpy() for the
linear iomem !WC cases, hope that's OK.

/Thomas

>
> Christian.
>
>>
>>
>> /Thomas
>>
>>>
>>>>
>>>> /Thomas
>>>>
>>>>
>>>
>