[PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

Tue May 18 13:08:17 UTC 2021

Am 18.05.21 um 14:52 schrieb Thomas Hellström:
>
> On 5/18/21 2:09 PM, Christian König wrote:
>> Am 18.05.21 um 14:04 schrieb Thomas Hellström:
>>>
>>> On 5/18/21 1:55 PM, Christian König wrote:
>>>>
>>>>
>>>> Am 18.05.21 um 10:26 schrieb Thomas Hellström:
>>>>> The internal ttm_bo_util memcpy uses vmap functionality, and while it
>>>>> probably might be possible to use it for copying in- and out of
>>>>> sglist represented io memory, using io_mem_reserve() / io_mem_free()
>>>>> callbacks, that would cause problems with fault().
>>>>> Instead, implement a method mapping page-by-page using kmap_local()
>>>>> semantics. As an additional benefit we then avoid the occasional 
>>>>> global
>>>>> TLB flushes of vmap() and consuming vmap space, elimination of a 
>>>>> critical
>>>>> point of failure and with a slight change of semantics we could 
>>>>> also push
>>>>> the memcpy out async for testing and async driver develpment 
>>>>> purposes.
>>>>> Pushing out async can be done since there is no memory allocation 
>>>>> going on
>>>>> that could violate the dma_fence lockdep rules.
>>>>>
>>>>> For copies from iomem, use the WC prefetching memcpy variant for
>>>>> additional speed.
>>>>>
>>>>> Note that drivers that don't want to use struct io_mapping but 
>>>>> relies on
>>>>> memremap functionality, and that don't want to use scatterlists for
>>>>> VRAM may well define specialized (hopefully reusable) iterators 
>>>>> for their
>>>>> particular environment.
>>>>
>>>> In general yes please since I have that as TODO for TTM for a very 
>>>> long time.
>>>>
>>>> But I would prefer to fix the implementation in TTM instead and 
>>>> give it proper cursor handling.
>>>>
>>>> Amdgpu is also using page based iomem and we are having similar 
>>>> workarounds in place there as well.
>>>>
>>>> I think it makes sense to unify this inside TTM and remove the old 
>>>> memcpy util function when done.
>>>>
>>>> Regards,
>>>> Christian.
>>>
>>> Christian,
>>>
>>> I was thinking when we replace the bo.mem with a pointer (and 
>>> perhaps have a driver callback to allocate the bo->mem,
>>> we could perhaps embed a struct ttm_kmap_iter and use it for all 
>>> mapping in one way or another). That would mean perhaps land this is 
>>> i915 now and sort out the unification once the struct ttm_resource, 
>>> struct ttm_buffer_object separation has landed?
>>
>> That stuff is ready, reviewed and I'm just waiting for some amdgpu 
>> changes to land in drm-misc-next to push it.
>>
>> But yes in general an iterator for the resource object sounds like 
>> the right plan to me as well.
>>
>> Christian.
>
> OK, so then are you OK with landing this in i915 for now? That would 
> also ofc mean the export you NAK'd but strictly for this memcpy use 
> until we merge it with TTM?

Well you can of course prototype that in i915, but I really don't want 
to export the TT functions upstream.

Can we cleanly move that functionality into TTM instead?

Christian.

>
>
> /Thomas
>
>>
>>>
>>> /Thomas
>>>
>>>
>>