[PATCH 2/2] drm/xe/pf: Expose access to the VF GGTT PTEs over debugfs
Matthew Brost
matthew.brost at intel.com
Tue Nov 5 17:26:59 UTC 2024
On Tue, Nov 05, 2024 at 05:41:40PM +0100, Michal Wajdeczko wrote:
>
>
> On 05.11.2024 02:14, Matthew Brost wrote:
> > On Sun, Nov 03, 2024 at 09:16:33PM +0100, Michal Wajdeczko wrote:
> >> For feature enabling and testing purposes, allow to capture and
> >> replace VF's GGTT PTEs data using debugfs blob file.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >> ---
> >> drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 62 +++++++++++++++++++++
> >> 1 file changed, 62 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> >> index 05df4ab3514b..69ba830d9e8d 100644
> >> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> >> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> >> @@ -11,6 +11,7 @@
> >> #include "xe_bo.h"
> >> #include "xe_debugfs.h"
> >> #include "xe_device.h"
> >> +#include "xe_ggtt.h"
> >> #include "xe_gt.h"
> >> #include "xe_gt_debugfs.h"
> >> #include "xe_gt_sriov_pf_config.h"
> >> @@ -497,6 +498,64 @@ static const struct file_operations config_blob_ops = {
> >> .llseek = default_llseek,
> >> };
> >>
> >> +/*
> >> + * /sys/kernel/debug/dri/0/
> >> + * ├── gt0
> >> + * │ ├── vf1
> >> + * │ │ ├── ggtt_raw
> >> + */
> >> +
> >> +static ssize_t ggtt_raw_read(struct file *file, char __user *buf,
> >> + size_t count, loff_t *pos)
> >> +{
> >> + struct dentry *dent = file_dentry(file);
> >> + struct dentry *parent = dent->d_parent;
> >> + unsigned int vfid = extract_vfid(parent);
> >> + struct xe_gt *gt = extract_gt(parent);
> >> + struct xe_device *xe = gt_to_xe(gt);
> >> + ssize_t ret;
> >> +
> >> + xe_pm_runtime_get(xe);
> >> + mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
> >
> > + Thomas to confirm I'm making sense here.
> >
> > So this relates to this patch [1] / Thomas comment [2].
> >
> > You are adding memory allocations here under the
> > xe_gt_sriov_pf_master_mutex which renders [1] incomplete.
>
> I was assuming that using GFP_NOWAIT and then on fail having a fallback
> to fixed 64B local chunk is fine, no?
>
Yes. I realized after I typed this that you use GFP_NOWAIT, which cannot
trigger reclaim, so this is indeed fine.
> >
> > So you need to do one of two things:
> >
> > 1. Never do any memory allocations under xe_gt_sriov_pf_master_mutex. If
> > you choose this option, taint this mutex with reclaim when loading the
> > PF. It is then safe to take xe_gt_sriov_pf_master_mutex in suspend /
> > resume / reset flows.
>
> well, due to the lack of [1] there are still some allocations done during
> sending a VF config to the GuC, but hopefully we can mitigate that soon
>
Yeah, I see one in pf_push_full_vf_config using GFP_KERNEL, which would
be problematic.
> but what I found recently is that due to recent GGTT refactoring, the
> xe_ggtt_node is now allocated (with GFP_NOFS) flag under that mutex,
> which may require another round of fixes
>
Yep, if an xe_ggtt_node is allocated with GFP_NOFS that would also be
problematic. If you could preallocate the node outside of
xe_gt_sriov_pf_master_mutex, that might work?
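Something along these lines, perhaps; a rough sketch only, and the helper names (xe_ggtt_node_init / xe_ggtt_node_insert / xe_ggtt_node_fini) are assumptions based on the refactored GGTT code, not verified against the current tree:

```c
/*
 * Sketch: do the allocating step before taking the PF master mutex,
 * so nothing under the mutex can trigger reclaim.
 */
struct xe_ggtt_node *node;
int err;

node = xe_ggtt_node_init(ggtt);	/* may allocate; GFP_KERNEL is fine here */
if (IS_ERR(node))
	return PTR_ERR(node);

mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
err = xe_ggtt_node_insert(node, size, alignment);	/* no allocation inside */
mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));

if (err)
	xe_ggtt_node_fini(node);
```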
> >
> > 2. Remove xe_gt_sriov_pf_master_mutex from suspend / resume / reset
> > flows.
>
> reprovisioning (sending VFs configs to GuC) is only done as one of the
> final reset steps, and as long as it's there it will require that mutex
>
> an alternative option would be to decouple reprovisioning into an async
> worker triggered from the reset; I will take a look at this
>
This might be the safest option, as keeping memory allocations outside of
a large mutex can be difficult to maintain; but if we taint the mutex with
reclaim, we'd presumably catch such issues immediately. No idea if an
async worker here would create another set of problems, though.
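For reference, the usual lockdep priming pattern (done once at PF init) looks roughly like this; it teaches lockdep that reclaim can nest inside the mutex, so any later GFP_KERNEL allocation attempted while holding it produces an immediate splat instead of a rare deadlock (sketch only, using the fs_reclaim annotations from linux/sched/mm.h):

```c
/*
 * Prime lockdep once at PF init: record the mutex -> reclaim ordering
 * so allocations that can reclaim are flagged if ever done under it.
 */
mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
fs_reclaim_acquire(GFP_KERNEL);
fs_reclaim_release(GFP_KERNEL);
mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
```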
Matt
> >
> > In addition to above, also never allocate memory in suspend / resume /
> > reset flows.
> >
> > Not blocker here but just using this as an example to explain the
> > current SRIOV locking problems. Hope this helps.
> >
> > Matt
> >
> > [1] https://patchwork.freedesktop.org/patch/619024/?series=139801&rev=1
> > [2] https://lore.kernel.org/intel-xe/3e13401972fd49240f486fd7d47580e576794c78.camel@intel.com/
> >
> >> +
> >> + ret = xe_ggtt_node_read(gt->sriov.pf.vfs[vfid].config.ggtt_region,
> >> + buf, count, pos);
> >> +
> >> + mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
> >> + xe_pm_runtime_put(xe);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static ssize_t ggtt_raw_write(struct file *file, const char __user *buf,
> >> + size_t count, loff_t *pos)
> >> +{
> >> + struct dentry *dent = file_dentry(file);
> >> + struct dentry *parent = dent->d_parent;
> >> + unsigned int vfid = extract_vfid(parent);
> >> + struct xe_gt *gt = extract_gt(parent);
> >> + struct xe_device *xe = gt_to_xe(gt);
> >> + ssize_t ret;
> >> +
> >> + xe_pm_runtime_get(xe);
> >> + mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
> >> +
> >> + ret = xe_ggtt_node_write(gt->sriov.pf.vfs[vfid].config.ggtt_region,
> >> + buf, count, pos);
> >> +
> >> + mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
> >> + xe_pm_runtime_put(xe);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static const struct file_operations ggtt_raw_ops = {
> >> + .owner = THIS_MODULE,
> >> + .read = ggtt_raw_read,
> >> + .write = ggtt_raw_write,
> >> + .llseek = default_llseek,
> >> +};
> >> +
> >> /**
> >> * xe_gt_sriov_pf_debugfs_register - Register SR-IOV PF specific entries in GT debugfs.
> >> * @gt: the &xe_gt to register
> >> @@ -554,6 +613,9 @@ void xe_gt_sriov_pf_debugfs_register(struct xe_gt *gt, struct dentry *root)
> >> debugfs_create_file("config_blob",
> >> IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV) ? 0600 : 0400,
> >> vfdentry, NULL, &config_blob_ops);
> >> + debugfs_create_file("ggtt_raw",
> >> + IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV) ? 0600 : 0400,
> >> + vfdentry, NULL, &ggtt_raw_ops);
> >> }
> >> }
> >> }
> >> --
> >> 2.43.0
> >>
>