[Intel-gfx] [PATCH v2 4/9] vfio/ccw: Make the FSM complete and synchronize it to the mdev

Cornelia Huck cohuck at redhat.com
Mon Sep 20 12:19:18 UTC 2021


On Thu, Sep 09 2021, Jason Gunthorpe <jgg at nvidia.com> wrote:

> The subchannel should be left in a quiescent state unless the VFIO device
> FD is opened. When the FD is opened bring the chanel to active and allow
> the VFIO device to operate. When the device FD is closed then quiesce the
> channel.
>
> To make this work the FSM needs to handle the transitions to/from open and
> closed so everything is sequenced. Rename state NOT_OPER to BROKEN and use
> it wheneven the driver has malfunctioned. STANDBY becomes CLOSED. The
> normal case FSM looks like:
>     CLOSED -> IDLE -> PROCESS/PENDING* -> IDLE -> CLOSED
>
> With a possible branch off to BROKEN from any state. Once the device is in
> BROKEN it cannot be recovered other than be reloading the driver.

Hm, not sure whether it is a good idea to conflate "something went
wrong" and "device is not operational". In the latter case, we will
eventually get a removal of the css device when the common I/O layer has
processed the channel report for the subchannel; while the former case
could mean all kind of things, but the subchannel will likely stay
around. I think NOT_OPER was always meant to be a transitional state.

>
> Delete the triply redundant calls to
> vfio_ccw_sch_quiesce(). vfio_ccw_mdev_close_device() always leaves the
> subchannel quiescent. vfio_ccw_mdev_remove() cannot return until
> vfio_ccw_mdev_close_device() completes and vfio_ccw_sch_remove() cannot
> return until vfio_ccw_mdev_remove() completes. Have the FSM code take care
> of calling cp_free() when appropriate.

I remember some serialization issues wrt cp_free() etc. coming up every
now and than; that might need extra care (I'm taking a look.)

>
> Device reset becomes a CLOSE/OPEN sequence which now properly handles the
> situation if the device becomes BROKEN.
>
> Machine shutdown via vfio_ccw_sch_shutdown() now simply tries to close and
> leaves the device BROKEN (though arguably the bus should take care to
> quiet down the subchannel HW during shutdown, not the drivers)

The problem is that there is not really a uniform thing that can be done
for shutdown; e.g. we can quiesce and then disable I/O and EADM
subchannels, but CHSC subchannels cannot be quiesced.



More information about the Intel-gfx mailing list