<html dir="ltr"><head></head><body style="text-align:left; direction:ltr;"><div>On Wed, 2021-04-28 at 10:09 +1000, Erwin van Londen wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><br></div><div><span></span></div><div><br></div><div>On Tue, 2021-04-27 at 16:41 -0400, Ewan D. Milne wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>On Tue, 2021-04-27 at 20:33 +0000, Martin Wilck wrote:<br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:<br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><br></div><div>There's no way to do that, in principle. Because there could be<br></div><div>other I/Os in flight. You might (somehow) avoid retrying an I/O<br></div><div>that got a UA until you figured out if something changed, but other<br></div><div>I/Os can already have been sent to the target, or issued before you<br></div><div>get to look at the status.<br></div></blockquote></blockquote></blockquote><div><br></div><div>If something happens on the storage side where a LUN gets its attributes changed (any of them, it doesn't matter which one), a UA should be sent. Also, all outstanding I/Os on that LUN should return an abort, as the array can no longer warrant the validity of any I/O after these changes, especially when parameters like reservations (PRs) are involved. If that does not happen on the array side, all bets are off, as the only way to get back in business is to start from scratch.</div></blockquote><div><br></div><div>Perhaps an array might abort I/Os it has received in the Device Server when</div><div>something changes. 
I have no idea if most or any arrays actually do that.</div><div><br></div><div>But what about I/O that has already been queued from the host to the</div><div>host bus adapter? I don't see how we can abort those I/Os properly.</div><div>Most high-performance HBAs have a queue of commands and a queue</div><div>of responses, so there could be lots of commands queued before we</div><div>manage to notice an interesting status. And AFAIK there is no conditional</div><div>mechanism that could hold them off (and they could be in flight on the</div><div>wire anyway).</div><div><br></div><div>I get what you are saying about what SAM describes; I just don't see how</div><div>we can guarantee we don't send any further commands after the status</div><div>with the UA is sent back, before we can understand what happened.</div><div><br></div><div>-Ewan</div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"></blockquote><div><br></div><div>Right. But in practice, a WWID change will hardly happen under full<br></div><div>I/O load. The storage side will probably have to block I/O while this<br></div><div>happens, at least for a short time period. So blocking and quiescing<br></div><div>the queue upon a UA might still work, most of the time. Even if we<br></div><div>were too late already, the sooner we stop the queue, the better.<br></div></blockquote></blockquote><div><br></div><div>I think in most cases when something happens on the array side you will see I/Os being aborted. That might be a good time to start doing TURs and, if these come back OK, do a new inquiry. 
From a host side there is only so much you can do.</div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><br></div><div>The current algorithm in multipath-tools needs to detect a path going<br></div><div>down and being reinstated. The time interval during which a WWID<br></div><div>change<br></div><div>will go unnoticed is one or more path checker intervals, typically on<br></div><div>the order of 5-30 seconds. If we could decrease this interval to a<br></div><div>sub-<br></div><div>second or even millisecond range by blocking the queue in the kernel<br></div><div>quickly, we'd have made a big step forward.<br></div></blockquote><div><br></div><div>Yes, and in many situations this may help. But in the general case<br></div><div>we can't protect against a storage array misconfiguration,<br></div><div>where something like this can happen. So I worry about people<br></div><div>believing the host software will protect them against a mistake,<br></div><div>when we can't really do that.<br></div></blockquote><div><br></div><div>My thought exactly. 
</div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><br></div><div>All it takes is one I/O (a discard) to make a thorough mess of the LUN.<br></div><div><br></div><div>-Ewan<br></div><div><br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div><br></div><div>Regards<br></div><div>Martin<br></div><div><br></div></blockquote><div><br></div><div>--<br></div><div>dm-devel mailing list<br></div><div><a href="mailto:dm-devel@redhat.com">dm-devel@redhat.com</a><br></div><div><a href="https://listman.redhat.com/mailman/listinfo/dm-devel">https://listman.redhat.com/mailman/listinfo/dm-devel</a><br></div><div><br></div></blockquote></blockquote></body></html>