[Intel-xe] [PATCH 3/3] drm/xe: Correlate engine and cpu timestamps with better accuracy

Dixit, Ashutosh ashutosh.dixit at intel.com
Sat Aug 12 00:32:40 UTC 2023


On Fri, 11 Aug 2023 17:30:18 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Aug 11, 2023 at 04:32:32PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 04 Aug 2023 14:47:59 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > Hi Umesh,
> >
> >> On Fri, Aug 04, 2023 at 02:32:53PM -0700, Umesh Nerlige Ramappa wrote:
> >> > +/**
> >> > + * struct drm_xe_query_cs_cycles - correlate CPU and GPU timestamps
> >> > + *
> >> > + * If a query is made with a struct drm_xe_device_query where .query
> >> > + * is equal to DRM_XE_QUERY_CS_CYCLES, then the reply uses
> >> > + * struct drm_xe_query_cs_cycles in .data.
> >> > + *
> >> > + * struct drm_xe_query_cs_cycles is allocated by the user and .data points to
> >> > + * this allocated structure. The user must pass .eci and .clockid as inputs to
> >> > + * this query. eci determines the engine and tile info required to fetch the
> >> > + * relevant GPU timestamp. clockid is used to return the specific CPU
> >> > + * timestamp.
> >> > + *
> >> > + * The query returns the command streamer cycles and the frequency that can
> >> > + * be used to calculate the command streamer timestamp. In addition the
> >> > + * query returns a set of cpu timestamps that indicate when the command
> >> > + * streamer cycle count was captured.
> >> > + */
> >> > +struct drm_xe_query_cs_cycles {
> >> > +	/** Engine for which command streamer cycles is queried. */
> >> > +	struct drm_xe_engine_class_instance eci;
> >> > +
> >> > +	/** MBZ (pad eci to 64 bit) */
> >> > +	__u16 rsvd;
> >>
> >> I need some inputs on the rsvd field here. Looks like struct
> >> drm_xe_engine_class_instance may need padding (64 bit aligned) if used this
> >> way.  Is this the right way to pad it? Should the padding be moved to
> >> struct drm_xe_engine_class_instance? OR should struct
> >> drm_xe_engine_class_instance be packed?
> >
> > Does it need to be 64 bit aligned? The next field can be an unaligned u64
> > field, it should be ok, x86 at least should be able to handle it.
>
> Then compiler would potentially insert a pad between eci and the next
> u64. The idea is to avoid an uninitialized padding.

The compiler may or may not add padding. So the next u64 field might or
might not be 8 byte aligned. Either way it should not make a difference to
functionality, unaligned accesses may be a little slower than aligned
access. But in this case probably doesn't matter much.

> >
> > But yes, nice to have all struct's 64 bit aligned. IMO the right way to do
> > it would be to pad 'struct drm_xe_engine_class_instance' to be 64 bit
> > aligned. Because if 'struct drm_xe_engine_class_instance' changes, the
> > above '__u16 rsvd' field might not work and is error prone and will also
> > need to be modified. So better to pad 'struct drm_xe_engine_class_instance'
> > to be 64 bit aligned.
>
> Agree, padding drm_xe_engine_class_instance makes sense.

Yeah better to explicitly pad drm_xe_engine_class_instance.

Thanks.
--
Ashutosh


>> >> > +
> >> > +	/**
> >> > +	 * Command streamer cycles as read from the command streamer
> >> > +	 * register at 0x358 offset.
> >> > +	 */
> >> > +	__u64 cs_cycles;
> >> > +
> >> > +	/** Frequency of the cs cycles in Hz. */
> >> > +	__u64 cs_frequency;
> >> > +
> >> > +	/**
> >> > +	 * CPU timestamp in ns. The timestamp is captured before reading the
> >> > +	 * cs_cycles register using the reference clockid set by the user.
> >> > +	 */
> >> > +	__u64 cpu_timestamp;
> >> > +
> >> > +	/**
> >> > +	 * Time delta in ns captured around reading the lower dword of the
> >> > +	 * cs_cycles register.
> >> > +	 */
> >> > +	__u64 cpu_delta;
> >> > +
> >> > +	/**
> >> > +	 * Reference clock id for CPU timestamp. For definition, see
> >> > +	 * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> >> > +	 * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> >> > +	 * CLOCK_TAI.
> >> > +	 */
> >> > +	__s32 clockid;
> >> > +
> >> > +	/** Width of the cs cycle counter in bits. */
> >> > +	__u32 width;
> >> > +};
> >> > +
> >> > /**
> >> >  * struct drm_xe_query_mem_usage - describe memory regions and usage
> >> >  *
> >> > @@ -395,6 +471,7 @@ struct drm_xe_device_query {
> >> > #define DRM_XE_DEVICE_QUERY_GTS		3
> >> > #define DRM_XE_DEVICE_QUERY_HWCONFIG	4
> >> > #define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY	5
> >> > +#define DRM_XE_QUERY_CS_CYCLES		6
> >> >	/** @query: The type of data to query */
> >> >	__u32 query;
> >> >
> >> > @@ -737,24 +814,6 @@ struct drm_xe_exec_queue_set_property {
> >> >	__u64 reserved[2];
> >> > };
> >> >
> >> > -/** struct drm_xe_engine_class_instance - instance of an engine class */
> >> > -struct drm_xe_engine_class_instance {
> >> > -#define DRM_XE_ENGINE_CLASS_RENDER		0
> >> > -#define DRM_XE_ENGINE_CLASS_COPY		1
> >> > -#define DRM_XE_ENGINE_CLASS_VIDEO_DECODE	2
> >> > -#define DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE	3
> >> > -#define DRM_XE_ENGINE_CLASS_COMPUTE		4
> >> > -	/*
> >> > -	 * Kernel only class (not actual hardware engine class). Used for
> >> > -	 * creating ordered queues of VM bind operations.
> >> > -	 */
> >> > -#define DRM_XE_ENGINE_CLASS_VM_BIND		5
> >> > -	__u16 engine_class;
> >> > -
> >> > -	__u16 engine_instance;
> >> > -	__u16 gt_id;
> >> > -};
> >> > -
> >> > struct drm_xe_exec_queue_create {
> >> > #define XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY               0
> >> >	/** @extensions: Pointer to the first extension struct, if any */
> >> > --
> >> > 2.38.1
> >> >


More information about the Intel-xe mailing list