[Intel-xe] [PATCH 03/14] drm/xe: Do not take any action if our device was removed.

Matthew Brost matthew.brost at intel.com
Tue May 2 15:40:50 UTC 2023


On Wed, Apr 26, 2023 at 04:57:02PM -0400, Rodrigo Vivi wrote:
> Unfortunately devcoredump infrastructure does not provide an
> interface for us to force the device removal upon the pci_remove
> time of our device.
> 
> The devcoredump is linked at the device level, so when in use
> it will prevent the module removal, but it doesn't prevent the
> call of the pci_remove callback. This callback cannot fail
> anyway and we end up clearing and freeing the entire pci device.
> 
> Hence, after we removed the pci device, we shouldn't allow any
> read or free operations to avoid a segmentation fault.
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_devcoredump.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> index d9531183f03a..a08929c01b75 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> @@ -42,6 +42,11 @@
>   * hang capture.
>   */
>  
> +static struct xe_device *coredump_to_xe(const struct xe_devcoredump *coredump)
> +{
> +	return container_of(coredump, struct xe_device, devcoredump);

I'm confused about how this would ever return NULL. Can you explain?
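
To illustrate the concern, here is a minimal userspace sketch. It assumes
container_of() reduces to subtracting the member offset from the pointer,
which is how I understand the kernel macro to behave. The names fake_device,
fake_coredump, coredump_to_dev() and owner_lookup_is_identity() are
hypothetical stand-ins, not xe code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct xe_devcoredump and struct xe_device. */
struct fake_coredump {
	int captured;
};

struct fake_device {
	int id;
	struct fake_coredump devcoredump;	/* embedded, not a pointer */
};

static struct fake_device *coredump_to_dev(struct fake_coredump *coredump)
{
	return container_of(coredump, struct fake_device, devcoredump);
}

/* Returns 1 if the lookup gives back the enclosing object. */
static int owner_lookup_is_identity(void)
{
	struct fake_device dev = { .id = 7 };
	struct fake_device *owner = coredump_to_dev(&dev.devcoredump);

	/* Pure address arithmetic: nothing here knows whether dev is "alive". */
	return owner == &dev && owner != NULL;
}
```

Since the coredump is embedded in the device struct, a non-NULL coredump
pointer always maps to a non-NULL device pointer, so the
!coredump_to_xe(coredump) check does not look like it can detect a removed
device on its own.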

Matt

> +}
> +
>  static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>  				   size_t count, void *data, size_t datalen)
>  {
> @@ -51,6 +56,10 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>  	struct drm_print_iterator iter;
>  	struct timespec64 ts;
>  
> +	/* Our device is gone already... */
> +	if (!data || !coredump_to_xe(coredump))
> +		return -ENODEV;
> +
>  	iter.data = buffer;
>  	iter.offset = 0;
>  	iter.start = offset;
> @@ -80,12 +89,16 @@ static ssize_t xe_devcoredump_read(char *buffer, loff_t offset,
>  static void xe_devcoredump_free(void *data)
>  {
>  	struct xe_devcoredump *coredump = data;
> -	struct xe_device *xe = container_of(coredump, struct xe_device,
> -					    devcoredump);
> +
> +	/* Our device is gone. Nothing to do... */
> +	if (!data || !coredump_to_xe(coredump))
> +		return;
> +
>  	mutex_lock(&coredump->lock);
>  
>  	coredump->faulty_engine = NULL;
> -	drm_info(&xe->drm, "Xe device coredump has been deleted.\n");
> +	drm_info(&coredump_to_xe(coredump)->drm,
> +		 "Xe device coredump has been deleted.\n");
>  
>  	mutex_unlock(&coredump->lock);
>  }
> -- 
> 2.39.2
> 

