[Intel-gfx] [PATCH] iosys-map: Add word-sized reads

Lucas De Marchi lucas.demarchi at intel.com
Fri Jun 10 17:35:06 UTC 2022


On Fri, Jun 10, 2022 at 09:20:18AM +0200, Christian König wrote:
>Am 10.06.22 um 01:20 schrieb Lucas De Marchi:
>>Instead of always falling back to memcpy_fromio() for any size, prefer
>>using read{b,w,l}(). When reading struct members it's common to read
>>integer variables one at a time. Going through memcpy_fromio()
>>for each of them poses a high penalty.
>>
>>Employ a similar trick as __seqprop() by using _Generic() to generate
>>only the specific call based on a type-compatible variable.
>>
>>For a particular i915 workload producing GPU context switches,
>>__get_engine_usage_record() is particularly hot since the engine usage
>>is read from device local memory with dgfx, possibly multiple times
>>since it's racy. Test execution time for this test shows a ~12.5%
>>improvement with DG2:
>>
>>Before:
>>	nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07;
>>	median = 9.52548e+06; var = 526149;
>>After:
>>	nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06;
>>	median = 8.33955e+06; var = 333113;
>>
>>Other things attempted that didn't prove very useful:
>>1) Change the _Generic() on x86 to just dereference the memory address
>>2) Change __get_engine_usage_record() to do just 1 read per loop,
>>    comparing with the previous value read
>>3) Change __get_engine_usage_record() to access the fields directly as it
>>    was before the conversion to iosys-map
>>
>>(3) did give a small improvement (~3%), but doesn't seem to scale well
>>to other similar cases in the driver.
>>
>>Additional test by Chris Wilson using gem_create from igt with some
>>changes to track object creation time. This happens to accidentally
>>stress this code path:
>>
>>	Pre iosys_map conversion of engine busyness:
>>	lmem0: Creating    262144 4KiB objects took 59274.2ms
>>
>>	Unpatched:
>>	lmem0: Creating    262144 4KiB objects took 108830.2ms
>>
>>	With readl (this patch):
>>	lmem0: Creating    262144 4KiB objects took 61348.6ms
>>
>>	s/readl/READ_ONCE/
>>	lmem0: Creating    262144 4KiB objects took 61333.2ms
>>
>>So we do take a little bit more time than before the conversion, but
>>that is due to other factors: bringing the READ_ONCE back would be as
>>good as just doing this conversion.
>>
>>Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>---
>>
>>If this is acceptable we should probably add the write counterpart, too.
>>Sending here only the read for now since this fixes the issue we are
>>seeing and to gather feedback.
>
>As far as I can see looks sane to me, but the kernel test robot tears 
>the patch apart.
>
>Probably just a typo somewhere in the 32bit handling.

oh, yeah... after cleaning it up I renamed s/__iosys_map_u64_case/__iosys_map_rd_io_u64_case/
to prepare for adding the write case and forgot the other side of the
ifdef.
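
For illustration, here is a minimal userspace sketch of the same
_Generic() dispatch with the u64 case macro named consistently on both
sides of the #if. It is not the kernel code: the fake_read*() helpers,
fake_memcpy_fromio(), the SKETCH_* macros, the sketch_rd_*() wrappers and
the UINTPTR_MAX check standing in for CONFIG_64BIT are all assumptions
made so that it compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts how often the memcpy fallback runs (illustration only). */
static int fallback_calls;

/* Hypothetical stand-ins for readb/readw/readl/readq: plain byte copies. */
#define FAKE_READ(name_, type_)						\
static inline type_ name_(const volatile void *addr)			\
{									\
	type_ v;							\
	assert(addr != NULL);						\
	memcpy(&v, (const void *)addr, sizeof(v));			\
	return v;							\
}
FAKE_READ(fake_readb, uint8_t)
FAKE_READ(fake_readw, uint16_t)
FAKE_READ(fake_readl, uint32_t)
FAKE_READ(fake_readq, uint64_t)

/* Hypothetical stand-in for memcpy_fromio(). */
static inline void fake_memcpy_fromio(void *dst, const volatile void *src,
				      size_t len)
{
	fallback_calls++;
	memcpy(dst, (const void *)src, len);
}

/*
 * Assumption: UINTPTR_MAX stands in for CONFIG_64BIT. The point is that
 * both branches define the SAME macro name; the 32-bit branch expands to
 * nothing, so u64 reads fall through to the default case there.
 */
#if UINTPTR_MAX > 0xffffffffu
#define SKETCH_RD_IO_U64_CASE(val_, addr_)				\
	uint64_t: val_ = fake_readq(addr_),
#else
#define SKETCH_RD_IO_U64_CASE(val_, addr_)
#endif

/* Only the association matching the type of val_ is evaluated. */
#define SKETCH_RD_IO(val_, addr_) _Generic(val_,			\
	uint8_t: val_ = fake_readb(addr_),				\
	uint16_t: val_ = fake_readw(addr_),				\
	uint32_t: val_ = fake_readl(addr_),				\
	SKETCH_RD_IO_U64_CASE(val_, addr_)				\
	default: fake_memcpy_fromio(&(val_), addr_, sizeof(val_)))

/* Typed wrappers so the sketch avoids GNU statement expressions. */
#define SKETCH_RD_FN(name_, type_)					\
static inline type_ name_(const volatile void *addr)			\
{									\
	type_ val;							\
	SKETCH_RD_IO(val, addr);					\
	return val;							\
}
SKETCH_RD_FN(sketch_rd_u8, uint8_t)
SKETCH_RD_FN(sketch_rd_u16, uint16_t)
SKETCH_RD_FN(sketch_rd_u32, uint32_t)
SKETCH_RD_FN(sketch_rd_i64, int64_t)	/* no association: uses default */
```

In this sketch a sketch_rd_u32() call goes through fake_readl() and
leaves fallback_calls untouched, while sketch_rd_i64() has no matching
association and takes the default branch.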


>
>Apart from that looks good to me.

thanks
Lucas De Marchi

>
>Regards,
>Christian.
>
>>
>>  include/linux/iosys-map.h | 26 ++++++++++++++++++++++----
>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>
>>diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h
>>index e69a002d5aa4..4ae3e459419e 100644
>>--- a/include/linux/iosys-map.h
>>+++ b/include/linux/iosys-map.h
>>@@ -333,6 +333,20 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
>>  		memset(dst->vaddr + offset, value, len);
>>  }
>>+#ifdef CONFIG_64BIT
>>+#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_)			\
>>+	u64: val_ = readq(vaddr_iomem_),
>>+#else
>>+#define __iosys_map_u64_case(val_, vaddr_iomem_)
>>+#endif
>>+
>>+#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__,	\
>>+	u8: val__ = readb(vaddr_iomem__),				\
>>+	u16: val__ = readw(vaddr_iomem__),				\
>>+	u32: val__ = readl(vaddr_iomem__),				\
>>+	__iosys_map_rd_io_u64_case(val__, vaddr_iomem__)		\
>>+	default: memcpy_fromio(&(val__), vaddr_iomem__, sizeof(val__)))
>>+
>>  /**
>>   * iosys_map_rd - Read a C-type value from the iosys_map
>>   *
>>@@ -346,10 +360,14 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset,
>>   * Returns:
>>   * The value read from the mapping.
>>   */
>>-#define iosys_map_rd(map__, offset__, type__) ({			\
>>-	type__ val;							\
>>-	iosys_map_memcpy_from(&val, map__, offset__, sizeof(val));	\
>>-	val;								\
>>+#define iosys_map_rd(map__, offset__, type__) ({				\
>>+	type__ val;								\
>>+	if ((map__)->is_iomem) {						\
>>+		__iosys_map_rd_io(val, (map__)->vaddr_iomem + offset__, type__);\
>>+	} else {								\
>>+		memcpy(&val, (map__)->vaddr + offset__, sizeof(val));		\
>>+	}									\
>>+	val;									\
>>  })
>>  /**
>

