[Intel-xe] [PATCH 3/3] drm/xe: Correlate engine and cpu timestamps with better accuracy
Dixit, Ashutosh
ashutosh.dixit at intel.com
Sat Aug 12 00:32:40 UTC 2023
On Fri, 11 Aug 2023 17:30:18 -0700, Umesh Nerlige Ramappa wrote:
>
> On Fri, Aug 11, 2023 at 04:32:32PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 04 Aug 2023 14:47:59 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >
> > Hi Umesh,
> >
> >> On Fri, Aug 04, 2023 at 02:32:53PM -0700, Umesh Nerlige Ramappa wrote:
> >> > +/**
> >> > + * struct drm_xe_query_cs_cycles - correlate CPU and GPU timestamps
> >> > + *
> >> > + * If a query is made with a struct drm_xe_device_query where .query
> >> > + * is equal to DRM_XE_QUERY_CS_CYCLES, then the reply uses
> >> > + * struct drm_xe_query_cs_cycles in .data.
> >> > + *
> >> > + * struct drm_xe_query_cs_cycles is allocated by the user and .data points to
> >> > + * this allocated structure. The user must pass .eci and .clockid as inputs to
> >> > + * this query. eci determines the engine and tile info required to fetch the
> >> > + * relevant GPU timestamp. clockid is used to return the specific CPU
> >> > + * timestamp.
> >> > + *
> >> > + * The query returns the command streamer cycles and the frequency that can
> >> > + * be used to calculate the command streamer timestamp. In addition the
> >> > + * query returns a set of cpu timestamps that indicate when the command
> >> > + * streamer cycle count was captured.
> >> > + */
> >> > +struct drm_xe_query_cs_cycles {
> >> > + /** Engine for which command streamer cycles is queried. */
> >> > + struct drm_xe_engine_class_instance eci;
> >> > +
> >> > + /** MBZ (pad eci to 64 bit) */
> >> > + __u16 rsvd;
> >>
> >> I need some inputs on the rsvd field here. Looks like struct
> >> drm_xe_engine_class_instance may need padding (64 bit aligned) if used this
> >> way. Is this the right way to pad it? Should the padding be moved to
> >> struct drm_xe_engine_class_instance? OR should struct
> >> drm_xe_engine_class_instance be packed?
> >
> > Does it need to be 64 bit aligned? The next field can be an unaligned u64
> > field, it should be ok, x86 at least should be able to handle it.
>
> Then compiler would potentially insert a pad between eci and the next
> u64. The idea is to avoid an uninitialized padding.
The compiler may or may not add padding. So the next u64 field might or
might not be 8 byte aligned. Either way it should not make a difference to
functionality, unaligned accesses may be a little slower than aligned
access. But in this case probably doesn't matter much.
> >
> > But yes, nice to have all struct's 64 bit aligned. IMO the right way to do
> > it would be to pad 'struct drm_xe_engine_class_instance' to be 64 bit
> > aligned. Because if 'struct drm_xe_engine_class_instance' changes, the
> > above '__u16 rsvd' field might not work and is error prone and will also
> > need to be modified. So better to pad 'struct drm_xe_engine_class_instance'
> > to be 64 bit aligned.
>
> Agree, padding drm_xe_engine_class_instance makes sense.
Yeah better to explicitly pad drm_xe_engine_class_instance.
Thanks.
--
Ashutosh
>> >> > +
> >> > + /**
> >> > + * Command streamer cycles as read from the command streamer
> >> > + * register at 0x358 offset.
> >> > + */
> >> > + __u64 cs_cycles;
> >> > +
> >> > + /** Frequency of the cs cycles in Hz. */
> >> > + __u64 cs_frequency;
> >> > +
> >> > + /**
> >> > + * CPU timestamp in ns. The timestamp is captured before reading the
> >> > + * cs_cycles register using the reference clockid set by the user.
> >> > + */
> >> > + __u64 cpu_timestamp;
> >> > +
> >> > + /**
> >> > + * Time delta in ns captured around reading the lower dword of the
> >> > + * cs_cycles register.
> >> > + */
> >> > + __u64 cpu_delta;
> >> > +
> >> > + /**
> >> > + * Reference clock id for CPU timestamp. For definition, see
> >> > + * clock_gettime(2) and perf_event_open(2). Supported clock ids are
> >> > + * CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME,
> >> > + * CLOCK_TAI.
> >> > + */
> >> > + __s32 clockid;
> >> > +
> >> > + /** Width of the cs cycle counter in bits. */
> >> > + __u32 width;
> >> > +};
> >> > +
> >> > /**
> >> > * struct drm_xe_query_mem_usage - describe memory regions and usage
> >> > *
> >> > @@ -395,6 +471,7 @@ struct drm_xe_device_query {
> >> > #define DRM_XE_DEVICE_QUERY_GTS 3
> >> > #define DRM_XE_DEVICE_QUERY_HWCONFIG 4
> >> > #define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY 5
> >> > +#define DRM_XE_QUERY_CS_CYCLES 6
> >> > /** @query: The type of data to query */
> >> > __u32 query;
> >> >
> >> > @@ -737,24 +814,6 @@ struct drm_xe_exec_queue_set_property {
> >> > __u64 reserved[2];
> >> > };
> >> >
> >> > -/** struct drm_xe_engine_class_instance - instance of an engine class */
> >> > -struct drm_xe_engine_class_instance {
> >> > -#define DRM_XE_ENGINE_CLASS_RENDER 0
> >> > -#define DRM_XE_ENGINE_CLASS_COPY 1
> >> > -#define DRM_XE_ENGINE_CLASS_VIDEO_DECODE 2
> >> > -#define DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE 3
> >> > -#define DRM_XE_ENGINE_CLASS_COMPUTE 4
> >> > - /*
> >> > - * Kernel only class (not actual hardware engine class). Used for
> >> > - * creating ordered queues of VM bind operations.
> >> > - */
> >> > -#define DRM_XE_ENGINE_CLASS_VM_BIND 5
> >> > - __u16 engine_class;
> >> > -
> >> > - __u16 engine_instance;
> >> > - __u16 gt_id;
> >> > -};
> >> > -
> >> > struct drm_xe_exec_queue_create {
> >> > #define XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY 0
> >> > /** @extensions: Pointer to the first extension struct, if any */
> >> > --
> >> > 2.38.1
> >> >
More information about the Intel-xe
mailing list