[Intel-gfx] [RFC PATCH 0/9] drm/i915/spi: discrete graphics internal spi

Winkler, Tomas tomas.winkler at intel.com
Wed Feb 17 08:34:53 UTC 2021


> 
> )On Tue, Feb 16, 2021 at 7:26 PM Tomas Winkler <tomas.winkler at intel.com>
> wrote:
> > Because the graphic card may undergo reset at any time and basically
> > hot unplug all its child devices, this series also provides a fix to
> > the mtd framework to make the reset graceful.
> 
> Well, just because MTD does not work as you expect, it is not broken. :-)
I'm not saying it's broken by design it just didn't fit this use case. 
> 
> In your case i915_spi_remove() blindly removes the MTD, this is not allowed.
> You may remove the MTD only if there are no more users.

I'm not sure it's good idea to stall the removal on user space.
This is just asking for a deadlock as user space is not getting what it needs and may stall
I think it's better the user space will fail gracefully the hw is not accessible in that stage anyway. 
> 
> The current model in MTD is that the driver is in charge of all life cycle
> management.
> Using ->_get_device() and ->_put_device() a driver can implement
> refcounting and deny new users if the MTD is about to disappear.

Please note that this use case you are describing is still valid, I haven't removed _get_device() _put_device() handlers,
You can still stall the removal of mtd, If this is not that way it's a bug

> 
> In the upcoming MUSE driver that mechanism is used too.
> MUSE allows to implement a MTD in userspace. So the FUSE server can
> disappear at
> *any* time. Just like in your case. Even worse, it can be hostile.
> In MUSE the MTD life time is tied to the FUSE connection object,
> muse_mtd_get_device()
> increments the FUSE connection refcount, and muse_mtd_put_device()
> decrements it.
> That means if the FUSE server disappears all of a sudden but the MTD still has
> users, the MTD will stay. But in this state no new references are allowed and
> all MTD operations of existing users will fail with -ENOTCONN (via FUSE).
> As soon the last user is gone (can be userspace via /dev/mtd* or a in-kernel
> user such as UBIFS), the MTD will be removed.

But in our case whole i915 is taken hostage, it cannot reset because of misbehaving user space.

> For the full details, please see:
> https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/tree/fs/fuse/m
> use.c?h=muse_v3#n1034
> 
> Is in your case *really* not possible to do it that way?

Maybe it's possible but I don't think it's good to stall i915 removal. Also It's very easily to crash the kernel.
I've posted a sniped to the mailing list that tried to do that, the kernel still has crashed. Can you looked at?

> On the other hand, your last patch moves some part of the life cycle
> management into MTD core.
> The MTD will stay as long it has users.
> But that's only one part. The driver is still in charge to make sure that all
> operations fail immediately and that no new users arrive.

I think that case I would need to validate every HW access to make sure it's still valid.

> If we want to do all in MTD core we'd have to do it like SCSI disks.
> That means having devices states such as SDEV_RUNNING, SDEV_CANCEL,
> SDEV_OFFLINE, ....
> That way the MTD could be shutdown gracefully, first no new users are
> allowed, then ongoing operations will be cancelled, next all operation will fail
> with -EIO or such, then the device is being removed from sysfs and finally if
> the last user is gone, the MTD can be removed.

Isn't that already that way? You cannot open new handler. That I would need more of your insights.
> 
> I'm not sure whether we want to take that path.

Thanks
Tomas



More information about the Intel-gfx mailing list