On Mon, Apr 26, 2021 at 10:31 AM Matthew Auld matthew.auld@intel.com wrote:
On 26/04/2021 16:11, Jason Ekstrand wrote:
On Mon, Apr 26, 2021 at 4:42 AM Matthew Auld matthew.auld@intel.com wrote:
Add an entry for the new uAPI needed for DG1. Also add the overall upstream plan, including some notes for the TTM conversion.
v2(Daniel):
- include the overall upstreaming plan
- add a note for mmap, there are differences here for TTM vs i915
- bunch of other suggestions from Daniel
v3: (Daniel)
- add a note for set/get caching stuff
- add some more docs for existing query and extensions stuff
- add an actual code example for regions query
- bunch of other stuff
(Jason)
- uAPI change(!):
  - try a simpler design with the placements extension
  - rather than have a generic setparam which can cover multiple use cases,
    have each extension be responsible for one thing only
v4: (Daniel)
- add some more notes for ttm conversion
- bunch of other stuff
(Jason)
- uAPI change(!):
  - drop all the extra rsvd members for the region_query and region_info,
    just keep the bare minimum needed for padding
Signed-off-by: Matthew Auld matthew.auld@intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Thomas Hellström thomas.hellstrom@linux.intel.com
Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Cc: Lionel Landwerlin lionel.g.landwerlin@linux.intel.com
Cc: Jon Bloomfield jon.bloomfield@intel.com
Cc: Jordan Justen jordan.l.justen@intel.com
Cc: Daniel Vetter daniel.vetter@intel.com
Cc: Kenneth Graunke kenneth@whitecape.org
Cc: Jason Ekstrand jason@jlekstrand.net
Cc: Dave Airlie airlied@gmail.com
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
Acked-by: Dave Airlie airlied@redhat.com
 Documentation/gpu/rfc/i915_gem_lmem.h   | 212 ++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_gem_lmem.rst | 130 +++++++++++++++
 Documentation/gpu/rfc/index.rst         |   4 +
 3 files changed, 346 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
 create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst
diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h
new file mode 100644
index 000000000000..7ed59b6202d5
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.h
@@ -0,0 +1,212 @@
+/**
 * enum drm_i915_gem_memory_class - Supported memory classes
 */
+enum drm_i915_gem_memory_class {
/** @I915_MEMORY_CLASS_SYSTEM: System memory */
I915_MEMORY_CLASS_SYSTEM = 0,
/** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
I915_MEMORY_CLASS_DEVICE,
+};
+/**
 * struct drm_i915_gem_memory_class_instance - Identify particular memory region
 */
+struct drm_i915_gem_memory_class_instance {
/** @memory_class: See enum drm_i915_gem_memory_class */
__u16 memory_class;
/** @memory_instance: Which instance */
__u16 memory_instance;
+};
+/**
 * struct drm_i915_memory_region_info - Describes one region as known to the
 * driver.
 *
 * Note that we reserve some stuff here for potential future work. As an example
 * we might want to expose the capabilities (see @caps) for a given region, which
 * could include things like if the region is CPU mappable/accessible, what the
 * supported mapping types are, etc.
 *
 * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
 * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
 * at &drm_i915_query_item.query_id.
 */
+struct drm_i915_memory_region_info {
/** @region: The class:instance pair encoding */
struct drm_i915_gem_memory_class_instance region;
/** @pad: MBZ */
__u32 pad;
/** @caps: MBZ */
__u64 caps;
/** @probed_size: Memory probed by the driver (-1 = unknown) */
__u64 probed_size;
/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
__u64 unallocated_size;
+};
+/**
 * struct drm_i915_query_memory_regions
 *
 * The region info query enumerates all regions known to the driver by filling
 * in an array of struct drm_i915_memory_region_info structures.
 *
 * Example for getting the list of supported regions:
 *
 * .. code-block:: C
struct drm_i915_query_memory_regions *info;
struct drm_i915_query_item item = {
.query_id = DRM_I915_QUERY_MEMORY_REGIONS,
};
struct drm_i915_query query = {
.num_items = 1,
.items_ptr = (uintptr_t)&item,
};
int err, i;
// First query the size of the blob we need, this needs to be large
// enough to hold our array of regions. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...
info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with all the region info.
item.data_ptr = (uintptr_t)info;
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...
// We can now access each region in the array
for (i = 0; i < info->num_regions; i++) {
struct drm_i915_memory_region_info mr = info->regions[i];
u16 class = mr.region.memory_class;
u16 instance = mr.region.memory_instance;
....
}
free(info);
 */
+struct drm_i915_query_memory_regions {
/** @num_regions: Number of supported regions */
__u32 num_regions;
/** @pad: MBZ */
__u32 pad;
/** @regions: Info about each supported region */
struct drm_i915_memory_region_info regions[];
+};
+#define DRM_I915_GEM_CREATE_EXT		0xdeadbeaf
+#define DRM_IOCTL_I915_GEM_CREATE_EXT	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)
Here's another thought: Instead of burning a new IOCTL number, should we just re-use DRM_I915_GEM_CREATE? The different structure size should let us tell the two apart.
Yeah, it was exactly like that in the original version[1]. Scrolling through the review comments, I think the concern was with an older kernel just silently ignoring the extensions with new userspace.
[1] https://patchwork.freedesktop.org/patch/404455/?series=84344&rev=1
Yeah, I think Chris is right. I just dug through the code and there really is no way for us to reject different sized structs; they just show up zero-extended. That's aggravating.
--Jason
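For illustration, a rough sketch of the zero-extension behaviour described above; this is not the actual DRM ioctl plumbing, and the struct names here are made up:

#include <stdint.h>
#include <string.h>

// Legacy layout, what an old userspace would pass in.
struct legacy_create {
	uint64_t size;
	uint32_t handle;
	uint32_t pad;
};

// Extended layout, what a new kernel-side ioctl expects.
struct extended_create {
	uint64_t size;
	uint32_t handle;
	uint32_t flags;
	uint64_t extensions;
};

// Sketch of the copy-in step: the tail of the kernel-side struct is
// zero-filled when userspace passes a smaller struct, so an old userspace
// is indistinguishable from a new one that passed extensions = 0.
static void copy_in(struct extended_create *karg, const void *uarg, size_t usize)
{
	size_t n = usize < sizeof(*karg) ? usize : sizeof(*karg);

	memset(karg, 0, sizeof(*karg));
	memcpy(karg, uarg, n);
}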
+/**
 * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
 * extension support using struct i915_user_extension.
 *
 * Note that in the future we want to have our buffer flags here, at least for
 * the stuff that is immutable. Previously we would have two ioctls, one to
 * create the object with gem_create, and another to apply various parameters,
 * however this creates some ambiguity for the params which are considered
 * immutable. Also in general we're phasing out the various SET/GET ioctls.
 */
+struct drm_i915_gem_create_ext {
/**
* @size: Requested size for the object.
*
* The (page-aligned) allocated size for the object will be returned.
*
* Note that for some devices we might have further minimum
* page-size restrictions (larger than 4K), like for device local-memory.
* However in general the final size here should always reflect any
* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
* extension to place the object in device local-memory.
*/
__u64 size;
/**
* @handle: Returned handle for the object.
*
* Object handles are nonzero.
*/
__u32 handle;
/** @flags: MBZ */
__u32 flags;
/**
* @extensions: The chain of extensions to apply to this object.
*
* This will be useful in the future when we need to support several
* different extensions, and we need to apply more than one when
* creating the object. See struct i915_user_extension.
*
* If we don't supply any extensions then we get the same old gem_create
* behaviour.
*
* For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
* struct drm_i915_gem_create_ext_memory_regions.
*/
+#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
__u64 extensions;
+};
+/**
 * struct drm_i915_gem_create_ext_memory_regions - The
 * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
 *
 * Set the object with the desired set of placements/regions in priority
 * order. Each entry must be unique and supported by the device.
 *
 * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
 * an equivalent layout of class:instance pair encodings. See struct
 * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
 * query the supported regions for a device.
 *
 * As an example, on discrete devices, if we wish to set the placement as
 * device local-memory we can do something like:
 *
 * .. code-block:: C
struct drm_i915_gem_memory_class_instance region_lmem = {
.memory_class = I915_MEMORY_CLASS_DEVICE,
.memory_instance = 0,
};
struct drm_i915_gem_create_ext_memory_regions regions = {
.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
.regions = (uintptr_t)&region_lmem,
.num_regions = 1,
};
struct drm_i915_gem_create_ext create_ext = {
.size = 16 * PAGE_SIZE,
.extensions = (uintptr_t)&regions,
};
int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...
 * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
 * along with the final object size in &drm_i915_gem_create_ext.size, which
 * should account for any rounding up, if required.
 */
+struct drm_i915_gem_create_ext_memory_regions {
/** @base: Extension link. See struct i915_user_extension. */
struct i915_user_extension base;
/** @pad: MBZ */
__u32 pad;
/** @num_regions: Number of elements in the @regions array. */
__u32 num_regions;
/**
* @regions: The regions/placements array.
*
* An array of struct drm_i915_gem_memory_class_instance.
*/
__u64 regions;
+};
diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst
new file mode 100644
index 000000000000..462f1efd9003
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
@@ -0,0 +1,130 @@
+=========================
+I915 DG1/LMEM RFC Section
+=========================
+Upstream plan
+=============
+For upstream the overall plan for landing all the DG1 stuff and turning it on
+for real, with all the uAPI bits, is:
+* Merge basic HW enabling of DG1 (still without pciid)
+* Merge the uAPI bits behind a special CONFIG_BROKEN (or so) flag
* At this point we can still make changes, but importantly this lets us
start running IGTs which can utilize local-memory in CI
+* Convert over to TTM, make sure it all keeps working. Some of the work items:
* TTM shrinker for discrete
* dma_resv_lockitem for full dma_resv_lock, i.e. not just trylock
* Use TTM CPU pagefault handler
* Route shmem backend over to TTM SYSTEM for discrete
* TTM purgeable object support
* Move i915 buddy allocator over to TTM
* MMAP ioctl mode (see `I915 MMAP`_)
* SET/GET ioctl caching (see `I915 SET/GET CACHING`_)
+* Add pciid for DG1 and turn on uAPI for real
+New object placement and region query uAPI
+==========================================
+Starting from DG1 we need to give userspace the ability to allocate buffers from
+device local-memory. Currently the driver supports gem_create, which can place
+buffers in system memory via shmem, and the usual assortment of other
+interfaces, like dumb buffers and userptr.
+To support this new capability, while also providing a uAPI which will work
+beyond just DG1, we propose to offer three new bits of uAPI:
+DRM_I915_QUERY_MEMORY_REGIONS
+-----------------------------
+New query ID which allows userspace to discover the list of supported memory
+regions (like system-memory and local-memory) for a given device. We identify
+each region with a class and instance pair, which should be unique. The class
+here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
+like DG1.
+Side note: The class/instance design is borrowed from our existing engine uAPI,
+where we describe every physical engine in terms of its class, and the
+particular instance, since we can have more than one per class.
+In the future we also want to expose more information which can further
+describe the capabilities of a region.
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
:functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions
+GEM_CREATE_EXT
+--------------
+New ioctl which is basically just gem_create but now allows userspace to
+provide a chain of possible extensions. Note that if we don't provide any
+extensions then we get the exact same behaviour as gem_create.
+Side note: We also need to support PXP[1] in the near future, which is also
+applicable to integrated platforms, and adds its own gem_create_ext extension,
+which basically lets userspace mark a buffer as "protected".
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
:functions: drm_i915_gem_create_ext
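As a hedged sketch (not part of the proposal itself), chaining more than one extension would look roughly like the below. The second extension and its id are purely hypothetical, standing in for something like the PXP extension mentioned in the side note; only the next_extension linkage is the point:

// Placeholder id for a hypothetical future extension; not a real uAPI define.
#define HYPOTHETICAL_CREATE_EXT_ID 1

struct i915_user_extension hypothetical_second_ext = {
	.name = HYPOTHETICAL_CREATE_EXT_ID,
};

struct drm_i915_gem_memory_class_instance region_lmem = {
	.memory_class = I915_MEMORY_CLASS_DEVICE,
	.memory_instance = 0,
};

struct drm_i915_gem_create_ext_memory_regions regions = {
	.base = {
		.name = I915_GEM_CREATE_EXT_MEMORY_REGIONS,
		// Each extension points at the next one; zero ends the chain.
		.next_extension = (uintptr_t)&hypothetical_second_ext,
	},
	.regions = (uintptr_t)&region_lmem,
	.num_regions = 1,
};

struct drm_i915_gem_create_ext create_ext = {
	.size = 16 * PAGE_SIZE,
	// The ioctl only sees the head of the chain.
	.extensions = (uintptr_t)&regions,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...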
+I915_GEM_CREATE_EXT_MEMORY_REGIONS
+----------------------------------
+Implemented as an extension for gem_create_ext, we would now allow userspace to
+optionally provide an immutable list of preferred placements at creation time,
+in priority order, for a given buffer object. For the placements we expect
+them each to use the class/instance encoding, as per the output of the regions
+query. Having the list in priority order will be useful in the future when
+placing an object, say during eviction.
+.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
:functions: drm_i915_gem_create_ext_memory_regions
+One fair criticism here is that this seems a little over-engineered[2]. If we
+just consider DG1 then yes, a simple gem_create.flags or something is totally
+all that's needed to tell the kernel to allocate the buffer in local-memory or
+whatever. However looking to the future we need uAPI which can also support
+upcoming Xe HP multi-tile architecture in a sane way, where there can be
+multiple local-memory instances for a given device, and so using both class and
+instance in our uAPI to describe regions is desirable, although specifically
+for DG1 it's uninteresting, since we only have a single local-memory instance.
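To make the multi-tile point concrete, here is a hedged sketch of what a priority-ordered placement list could look like on a hypothetical device with two local-memory instances (the instance numbering is illustrative only):

struct drm_i915_gem_memory_class_instance placements[] = {
	// Most preferred placement first.
	{ .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 0 },
	{ .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 1 },
	// System memory as the last-resort fallback.
	{ .memory_class = I915_MEMORY_CLASS_SYSTEM, .memory_instance = 0 },
};

struct drm_i915_gem_create_ext_memory_regions regions = {
	.base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
	.regions = (uintptr_t)placements,
	.num_regions = sizeof(placements) / sizeof(placements[0]),
};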
+Existing uAPI issues
+====================
+Some potential issues we still need to resolve.
+I915 MMAP
+---------
+In i915 there are multiple ways to mmap a GEM object, including mapping the same
+object using different mapping types (WC vs WB), i.e. multiple active mmaps per
+object. TTM expects one MMAP at most for the lifetime of the object. If it
+turns out that we have to backpedal here, there might be some potential
+userspace fallout.
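For reference, this is roughly what multiple active mmaps per object look like today with the existing mmap_offset uAPI (a hedged sketch, with error handling elided as in the other examples; handle, size and fd are assumed to already exist):

struct drm_i915_gem_mmap_offset mmap_wc = {
	.handle = handle,
	.flags = I915_MMAP_OFFSET_WC,
};
struct drm_i915_gem_mmap_offset mmap_wb = {
	.handle = handle,
	.flags = I915_MMAP_OFFSET_WB,
};
int err;

// Ask for one fake offset per mapping type, both against the same object.
err = ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &mmap_wc);
if (err) ...
err = ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &mmap_wb);
if (err) ...

// Both mappings can then be live at the same time, with different modes.
void *ptr_wc = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, mmap_wc.offset);
void *ptr_wb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, mmap_wb.offset);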
+I915 SET/GET CACHING
+--------------------
+In i915 we have the set/get_caching ioctls. TTM doesn't let us change this, but
+DG1 doesn't support non-snooped pcie transactions, so we can just always
+allocate as WB for smem-only buffers. If/when our hw gains support for
+non-snooped pcie transactions then we must fix this mode at allocation time as
+a new GEM extension.
+This is related to the mmap problem, because in general (meaning, when we're
+not running on intel cpus) the cpu mmap must not, ever, be inconsistent with
+allocation mode.
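For reference, a hedged sketch of the existing interface in question (handle and fd assumed to exist); with TTM on discrete the mode would effectively be fixed at allocation time instead of being mutable like this:

struct drm_i915_gem_caching arg = {
	.handle = handle,
	.caching = I915_CACHING_CACHED, // or I915_CACHING_NONE
};
int err;

// Today the caching mode can still be changed after the object is created...
err = ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
if (err) ...

// ...and read back. With TTM on discrete this becomes fixed at creation.
err = ioctl(fd, DRM_IOCTL_I915_GEM_GET_CACHING, &arg);
if (err) ...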
+A possible idea is to let the kernel pick the mmap mode for userspace from the
+following table:
+smem-only: WB. Userspace does not need to call clflush.
+smem+lmem: We allocate uncached memory, and give userspace a WC mapping for
+when the buffer is in smem, and WC when it's in lmem. The GPU does snooped
+access, which is a bit inefficient.
+lmem only: always WC
+This means on discrete you only get a single mmap mode, all others must be
+rejected. That's probably going to be a new default mode or something like
+that.
+Links
+=====
+[1] https://patchwork.freedesktop.org/series/86798/
+[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index a8621f7dab8b..05670442ca1b 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -15,3 +15,7 @@ host such documentation:
- Once the code has landed move all the documentation to the right places in the main core, helper or driver sections.
+.. toctree::
+    i915_gem_lmem.rst
--
2.26.3