[Intel-gfx] [PATCH v2 3/5] drm/i915/guc: Provide debugfs for log relay sub-buf info

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Tue Mar 14 22:09:32 UTC 2023


On Thu, 2023-03-09 at 15:29 -0800, Teres Alexis, Alan Previn wrote:
> > > 
alan:snip

> > > +static int guc_log_relay_subbuf_size_get(void *data, u64 *val)
> > > +{
> > > +	struct intel_guc_log *log = data;
> > > +
> > > +	if (!log->vma)
> > > +		return -ENODEV;
> > 
> > For the record, from the other email thread, the issue here is whether this
> > check is needed.
> > 
> > Also, the issue is what happens if the relay is open and we unload the
> > module.
> > 
> I'll retest this, but I clearly remember that if the user-space app was still holding
> onto the debugfs handle, the i915 unload would go through most of the driver unload /
> unregister steps while the app doesn't get any signal. If the app then closes that
> handle afterwards (guc_log_relay_ctl_release gets called), we get an invalid pointer access
> in the kernel. Note that the logger tool runs with sudo. That said, something "like" the above check
> is required, but perhaps hanging off a still-valid pointer (like i915->foo, or maybe gt-struct validity).
> It needs to be something that is explicitly cleared on unload, not left around with stale pointers.
> 

An update on the above after some digging / testing: I believe we don't need to check
for "log->vma" validity, as you had suspected. However, I did find other legacy debugfs
functions for relay logging that DO check for it, so I must have been trying to maintain
consistency. That said, I will probably remove the check from those legacy functions as well,
so that none of them check for it, since it's not required.

However, while testing, I found an issue when connecting the relay logger tool
and then unloading the driver. On one hand, this is a debugfs interface and we may be able to fix it
later, since the use case doesn't really expect users to run this tool while unloading the driver.
On the other hand, some of my colleagues stressed that a kernel crash is something we cannot
ignore and knowingly allow. Considering that the relay logging tool does not work at all
upstream today, this patch could "unmask" that error. Finally, as part of testing /
debugging, I too occasionally forget to stop the relay logger tool before unloading, and then I can't even do
a simple soft-reboot because of how bad things get in i915. Given all of this, I'm compelled
to fix it properly now. So far, most of the time spent on this series was
tied to the intel_guc_logger side of the effort, not the kernel changes. For this fix, however, I think
more time and changes will be required on the kernel side.
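For illustration only, here is a minimal user-space model of the direction suggested in the quoted text: the release path reaches driver state through a pointer that unload explicitly clears under a lock, so a late close sees NULL instead of freed memory. Every name here (fake_guc_log, relay_handle, fake_driver_unload, fake_relay_release, the demo_* functions) is hypothetical, and a pthread mutex stands in for whatever locking the kernel would use. This is a sketch under those assumptions, not the i915 fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver-owned GuC log state. */
struct fake_guc_log {
	int relay_open_count;
};

/*
 * Hypothetical per-open handle kept alive by the user-space logger.
 * 'log' is the pointer that unload explicitly clears, so a late
 * release can tell the driver state is gone instead of chasing a
 * stale pointer.
 */
struct relay_handle {
	pthread_mutex_t lock;
	struct fake_guc_log *log;
};

/* Models driver unload: free the state and clear the pointer under the lock. */
static void fake_driver_unload(struct relay_handle *h)
{
	pthread_mutex_lock(&h->lock);
	free(h->log);
	h->log = NULL;
	pthread_mutex_unlock(&h->lock);
}

/*
 * Models the role guc_log_relay_ctl_release plays in the thread.
 * Returns 1 if it found live driver state to update, 0 if unload already ran.
 */
static int fake_relay_release(struct relay_handle *h)
{
	int touched = 0;

	pthread_mutex_lock(&h->lock);
	if (h->log) {
		h->log->relay_open_count--;
		touched = 1;
	}
	pthread_mutex_unlock(&h->lock);
	return touched;
}

/* Sets up a handle with one open relay user; returns 0 on success. */
static int fake_open(struct relay_handle *h)
{
	pthread_mutex_init(&h->lock, NULL);
	h->log = calloc(1, sizeof(*h->log));
	if (!h->log) {
		pthread_mutex_destroy(&h->lock);
		return -1;
	}
	h->log->relay_open_count = 1;
	return 0;
}

/* The problem case: unload first, the tool closes the handle later. */
int demo_unload_then_release(void)
{
	struct relay_handle h;
	int touched;

	if (fake_open(&h))
		return -1;
	fake_driver_unload(&h);
	assert(h.log == NULL); /* unload must not leave a stale pointer */
	touched = fake_relay_release(&h);
	pthread_mutex_destroy(&h.lock);
	return touched;
}

/* The normal case: the tool closes the handle before unload. */
int demo_release_then_unload(void)
{
	struct relay_handle h;
	int touched;

	if (fake_open(&h))
		return -1;
	touched = fake_relay_release(&h);
	fake_driver_unload(&h);
	pthread_mutex_destroy(&h.lock);
	return touched;
}
```

In this model, demo_unload_then_release() returns 0 because the release path finds the cleared pointer and skips the freed state, while demo_release_then_unload() returns 1. A real kernel fix would presumably also need the handle itself to outlive the module (e.g. via reference counting), which this model does not attempt to show.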


