[RFC PATCH 2/2] drm/ttm: downgrade cached to write_combined when snooping not available

Mon Jul 1 12:10:19 UTC 2024

Am 01.07.24 um 13:52 schrieb Jiaxun Yang:
> 在2024年7月1日七月 下午12:40，Christian König写道：
> [...]
>> Ah, wait a second.
>>
>> Such a thing as non-coherent PCIe implementation doesn't exist. The PCIe
>> specification makes it mandatory for memory access to be cache coherent.
> There are some non-PCIe TTM GPU being hit by this pitfall, we have non-coherent
> Vivante GPU on some devices.

Yeah, but those are perfectly supported.

If you have a non PCIe device which needs uncached or write combined 
system memory allocations you can just specify this at memory allocation 
time.

> Handling it at TTM core makes more sense on reducing per-driver effort on dealing
> platform issues.
>
>> There are a bunch of non-compliant PCIe implementations which have
>> broken cache coherency, but those explicitly violate the specification
>> and because of that are not supported.
> I don't really understand, "doesn't exist" and "bunch of" seems to be contradicting
> with each other.

A compliant non-coherent PCIe implementation doesn't exists, that's made 
mandatory by the PCIe standard.

What does exists are some non compliant non coherent PCIe 
implementations, but as far as I know those are then not supported at 
all by Linux.

We already had tons of problems with platform chips which intentionally 
doesn't correctly implement the PCIe specification because it is cheaper.

At least for AMDs GPU driver we reverted to rejecting patches for 
platform bugs cause by incorrectly wired up PCIe root complexes.

Regards,
Christian.

>
>> Regards,
>> Christian.
>>
>>>> Unfortunately I don't think we can safely ttm_cached to ttm_write_comnined, we've
>>>> had enough drama with write combine behaviour on all different platforms.
>>>>
>>>> See drm_arch_can_wc_memory in drm_cache.h.
>>>>
>>> Yes this really sounds like an issue.
>>>
>>> Maybe the behavior of ttm_write_combined should furtherly be decided
>>> by drm_arch_can_wc_memory() in case of quirks?
> IMO for DMA mappings, use dma_pgprot at mapping makes more sense :-)
>
> Thanks
> - Jiaxun
>>>> Thanks
>>>>
>>>>> +
>>>>>    	return ttm_prot_from_caching(caching, tmp);
>>>>>    }
>>>>>    EXPORT_SYMBOL(ttm_io_prot);
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> index 7b00ddf0ce49f..3335df45fba5e 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> @@ -152,6 +152,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>>>>>    			       enum ttm_caching caching,
>>>>>    			       unsigned long extra_pages)
>>>>>    {
>>>>> +	/* Downgrade cached mapping for non-snooping devices */
>>>>> +	if (!bo->bdev->dma_coherent && caching == ttm_cached)
>>>>> +		caching = ttm_write_combined;
>>>>> +
>>>>>    	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>>>>>    	ttm->page_flags = page_flags;
>>>>>    	ttm->dma_address = NULL;
>>>>> diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
>>>>> index a18f43e93abab..f92d7911f50e4 100644
>>>>> --- a/include/drm/ttm/ttm_caching.h
>>>>> +++ b/include/drm/ttm/ttm_caching.h
>>>>> @@ -47,7 +47,8 @@ enum ttm_caching {
>>>>>
>>>>>    	/**
>>>>>    	 * @ttm_cached: Fully cached like normal system memory, requires that
>>>>> -	 * devices snoop the CPU cache on accesses.
>>>>> +	 * devices snoop the CPU cache on accesses. Downgraded to
>>>>> +	 * ttm_write_combined when the snooping capaiblity is missing.
>>>>>    	 */
>>>>>    	ttm_cached
>>>>>    };
>>>>> -- 
>>>>> 2.45.2