[RFC PATCH] drm/pancsf: Add a new driver for Mali CSF-based GPUs

Steven Price steven.price at arm.com
Mon Feb 6 09:37:51 UTC 2023


On 03/02/2023 17:58, Alyssa Rosenzweig wrote:
>>>> +struct drm_pancsf_gpu_info {
>>>> +#define DRM_PANCSF_ARCH_MAJOR(x)		((x) >> 28)
>>>> +#define DRM_PANCSF_ARCH_MINOR(x)		(((x) >> 24) & 0xf)
>>>> +#define DRM_PANCSF_ARCH_REV(x)			(((x) >> 20) & 0xf)
>>>> +#define DRM_PANCSF_PRODUCT_MAJOR(x)		(((x) >> 16) & 0xf)
>>>> +#define DRM_PANCSF_VERSION_MAJOR(x)		(((x) >> 12) & 0xf)
>>>> +#define DRM_PANCSF_VERSION_MINOR(x)		(((x) >> 4) & 0xff)
>>>> +#define DRM_PANCSF_VERSION_STATUS(x)		((x) & 0xf)
>>>> +	__u32 gpu_id;
>>>> +	__u32 gpu_rev;
>>>> +#define DRM_PANCSF_CSHW_MAJOR(x)		(((x) >> 26) & 0x3f)
>>>> +#define DRM_PANCSF_CSHW_MINOR(x)		(((x) >> 20) & 0x3f)
>>>> +#define DRM_PANCSF_CSHW_REV(x)			(((x) >> 16) & 0xf)
>>>> +#define DRM_PANCSF_MCU_MAJOR(x)			(((x) >> 10) & 0x3f)
>>>> +#define DRM_PANCSF_MCU_MINOR(x)			(((x) >> 4) & 0x3f)
>>>> +#define DRM_PANCSF_MCU_REV(x)			((x) & 0xf)
>>>> +	__u32 csf_id;
>>>> +	__u32 l2_features;
>>>> +	__u32 tiler_features;
>>>> +	__u32 mem_features;
>>>> +	__u32 mmu_features;
>>>> +	__u32 thread_features;
>>>> +	__u32 max_threads;
>>>> +	__u32 thread_max_workgroup_size;
>>>> +	__u32 thread_max_barrier_size;
>>>> +	__u32 coherency_features;
>>>> +	__u32 texture_features[4];
>>>> +	__u32 as_present;
>>>> +	__u32 core_group_count;
>>>> +	__u64 shader_present;
>>>> +	__u64 l2_present;
>>>> +	__u64 tiler_present;
>>>> +};
>>>> +
>>>> +struct drm_pancsf_csif_info {
>>>> +	__u32 csg_slot_count;
>>>> +	__u32 cs_slot_count;
>>>> +	__u32 cs_reg_count;
>>>> +	__u32 scoreboard_slot_count;
>>>> +	__u32 unpreserved_cs_reg_count;
>>>> +};
>>>> +
>>>> +struct drm_pancsf_dev_query {
>>>> +	/** @type: the query type (see enum drm_pancsf_dev_query_type). */
>>>> +	__u32 type;
>>>> +
>>>> +	/**
>>>> +	 * @size: size of the type being queried.
>>>> +	 *
>>>> +	 * If pointer is NULL, size is updated by the driver to provide the
>>>> +	 * output structure size. If pointer is not NULL, the driver will
>>>> +	 * only copy min(size, actual_structure_size) bytes to the pointer,
>>>> +	 * and update the size accordingly. This allows us to extend query
>>>> +	 * types without breaking userspace.
>>>> +	 */
>>>> +	__u32 size;
>>>> +
>>>> +	/**
>>>> +	 * @pointer: user pointer to a query type struct.
>>>> +	 *
>>>> +	 * Pointer can be NULL, in which case, nothing is copied, but the
>>>> +	 * actual structure size is returned. If not NULL, it must point to
>>>> +	 * a location that's large enough to hold size bytes.
>>>> +	 */
>>>> +	__u64 pointer;
>>>> +};  
>>>
>>> Genuine question: is there something wrong with the panfrost 'get_param'
>>> ioctl where individual features are queried one-by-one, rather than
>>> passing a big structure back to user space?
>>
>> Well, I've just seen the Xe driver exposing things this way, and I thought
>> it was a good idea, but I don't have a strong opinion here, and if others
>> think it's preferable to stick to GET_PARAM, I'm fine with that too.
> 
> I vastly prefer the info struct, GET_PARAM isn't a great interface when
> there are large numbers of properties to query... Actually I just
> suggested to Lina that she adopt this approach for Asahi instead of the
> current GET_PARAM ioctl we have (downstream for now).

Ok, good to know there is some preference here - like I said this was a
genuine question: I'm not trying to say this is wrong.
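For concreteness, the @size/@pointer rules documented in the quoted struct drm_pancsf_dev_query amount to a two-call handshake: ask for the size with a NULL pointer, then pass a buffer and get back min(size, actual) bytes. Below is a minimal sketch of those semantics, assuming a plain function in place of the real ioctl and a made-up result struct. All demo_* names are hypothetical and the field values are arbitrary.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result struct; NOT the real drm_pancsf_csif_info layout. */
struct demo_csif_info {
	uint32_t csg_slot_count;
	uint32_t cs_slot_count;
	uint32_t cs_reg_count;
};

/* Same fields as struct drm_pancsf_dev_query in the quoted patch. */
struct demo_dev_query {
	uint32_t type;
	uint32_t size;
	uint64_t pointer;
};

/*
 * Hypothetical stand-in for the kernel side of the query ioctl:
 *   pointer == 0 -> report the full structure size, copy nothing;
 *   otherwise    -> copy min(size, actual) bytes and report that amount.
 * @type is ignored here; a real driver would dispatch on it.
 */
static int demo_handle_query(struct demo_dev_query *q)
{
	static const struct demo_csif_info info = {
		.csg_slot_count = 8,	/* arbitrary example values */
		.cs_slot_count = 8,
		.cs_reg_count = 96,
	};
	uint32_t actual = sizeof(info);

	if (!q->pointer) {
		q->size = actual;
		return 0;
	}
	if (q->size > actual)
		q->size = actual;
	memcpy((void *)(uintptr_t)q->pointer, &info, q->size);
	return 0;
}
```

An older userspace built against a shorter struct only receives the prefix it knows about, while a newer userspace with a larger struct sees @size shrink to what the kernel actually filled in, so the trailing fields can be treated as "not provided".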

> It isn't a *big* deal but GET_PARAM doesn't really seem better on any
> axes.
> 
>>> I ask because we've had issues in the past with trying to 'deprecate'
>>> registers - if a new version of the hardware stops providing a
>>> (meaningful) value for a register then it's hard to fix up the
>>> structures.
> 
> I'm not sure this is a big deal. If the register no longer exists
> (meaningfully), zero it out in the info structure and trust userspace to
> interpret it meaningfully based on the GPU. If registers are getting
> dropped between revisions, that's obviously not great. But this should
> only change at major architecture boundaries; I don't see the added
> value of doing the interpretation in kernel instead of userspace. I say
> this with my userspace hat on, of course ;-)

Just some background:

In the early days of the Midgard DDK driver there was a structure much
like the one proposed here in which the kernel dumped the various
feature registers and passed it as a large struct to user space. User
space then ended up using that struct directly in various bits of code
all over the place.

We then ended up with the problem that it was easy to add new properties
to that struct (including derived and hideously badly defined ones like
"gpu_available_memory_size") but basically impossible to remove anything
since the struct definition was shared between user and kernel. The
kernel couldn't drop anything because old user space might need it, and
user space picked up the definition from the kernel so these problematic
members were always there to tempt user space coders.

Switching to packing the data into a structured blob gave us the ability to:

 * Provide a logical separation between user and kernel - each could
have its own structure and it was definitively not uABI.

 * The kernel could provide backwards compat values for properties we
wanted to kill and user space could simply ignore them. They wouldn't be
around to tempt future user space coders.

 * New properties could be added without breaking old user space, and
without forever bloating the structure if we wanted to back out the
change (at worst you just burn an ID value for the structured blob).

 * (In theory) it's possible for user space to identify that a property
isn't present (e.g. running on an old kernel) rather than trying to
populate every field with a dummy value. As far as I remember this never
actually happened... user space just made up values for what was missing ;)
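The structured-blob idea in the list above can be sketched as a stream of (id, length, value) records. This is a hypothetical encoding chosen for illustration, not kbase's actual format; the demo_* names and property IDs are invented. Because every record carries its own length, a reader can skip IDs it doesn't recognise, a retired property simply stops being emitted, and a lookup can report "absent" instead of returning a dummy value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical property IDs. Retired IDs are never reused. */
enum demo_prop_id {
	DEMO_PROP_GPU_ID = 1,
	DEMO_PROP_L2_FEATURES = 2,
	DEMO_PROP_SHADER_PRESENT = 3,
};

/* Producer side: append one record (u32 id, u32 len, len value bytes).
 * Returns the new end offset, or 0 if the record does not fit. */
static size_t demo_put(uint8_t *buf, size_t cap, size_t off,
		       uint32_t id, const void *val, uint32_t len)
{
	if (off > cap || cap - off < 2 * sizeof(uint32_t) + (size_t)len)
		return 0;
	memcpy(buf + off, &id, sizeof(id));
	memcpy(buf + off + 4, &len, sizeof(len));
	memcpy(buf + off + 8, val, len);
	return off + 8 + len;
}

/* Consumer side: look up a 32- or 64-bit property. Returns false if the
 * blob doesn't contain it, so "missing" is distinguishable from "0".
 * Records with unknown IDs are skipped using their length field. */
static bool demo_get_u64(const uint8_t *buf, size_t size, uint32_t want,
			 uint64_t *out)
{
	size_t off = 0;

	while (size - off >= 8) {
		uint32_t id, len;

		memcpy(&id, buf + off, sizeof(id));
		memcpy(&len, buf + off + 4, sizeof(len));
		off += 8;
		if (len > size - off)
			return false;	/* truncated or corrupt blob */
		if (id == want) {
			if (len == sizeof(uint32_t)) {
				uint32_t v32;

				memcpy(&v32, buf + off, sizeof(v32));
				*out = v32;
				return true;
			}
			if (len == sizeof(uint64_t)) {
				memcpy(out, buf + off, sizeof(*out));
				return true;
			}
			return false;	/* unexpected width */
		}
		off += len;
	}
	return false;
}
```

In this sketch the blob layout is the only contract between kernel and user space; each side is free to keep its own in-memory struct, which is the separation described in the first bullet.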

>>> There is obviously overhead iterating over all the registers that user
>>> space cares about. Another option (used by kbase) is to return some form
>>> of structured data so a missing property can be encoded.
>>
>> I'll have a look at how kbase does that. Thanks for the pointer.
> 
> I'd be fine with the kbase approach but I don't really see the added
> value over what Boris proposed in the RFC, tbh.

My main concern is making that structure uABI, because 6 years
ago I had to clean up the mess that we had in the DDK ;)

Although I'll admit that most of the problems we had with the DDK were
user space developers wanting information that the GPU driver shouldn't
have been providing (maximum amount of memory available, clock speeds,
etc.) or derived properties that user space could have calculated itself
(e.g. decoded GPU ID). I guess this is also one of the problems with
developing a driver in parallel with the hardware - things get added to
try the idea out, and are not always reverted if it turns out badly (or
even cleaned up if it's a good idea).

Panfrost/PanCSF thankfully have much cleaner sets of properties exposed.

Steve


