<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<pre>Tomas and Daniel,
>From the discussion on this thread, I infer following understanding:
</pre>
<ul>
<li>At present(v9) I915 wants to be hard binded to mei_hdcp
device-driver binding status through components</li>
<ul>
<li>This means I915 driver load will get complete only when the
mei_hdcp's device and driver are bound.</li>
<li>if mei_hdcp device reset I915 will unregister itself from
userspace, and wait for the mei_hdcp device-deriver rebinding.</li>
<ul>
<li>Could be due to FW error or any unexpected failures those
are rare occurances.<br>
</li>
</ul>
<li>when mei_hdcp module is removed i915 will unregister itself.</li>
<li>Becasue of this, Ideally I915 dont expect the device reset
from mei for suspend and resume.</li>
</ul>
<li>At present Mei bus is designed as below:</li>
<ul>
<li>Device will disappear on FW failures, FW upgrade, suspend of
the system etc.</li>
<li>And when the errors are handled or on system resume mei
device will reappear, hence binding with corresponding driver.</li>
</ul>
<li>Mei doesn't plan to avoid the device reset(disappearance and
reappearance) for suspend and resume in near future.</li>
</ul>
<pre>Based on above understanding, I propose the below approach. Please correct or approve it.
</pre>
<ul>
<li>At present(v9) component_add from mei_hdcp indicates the
mei_hdcp's device-driver binded state.</li>
<li>Instead lets use component to indicate the mei_hdcp's module
availability,</li>
<ul>
<li>by adding the component at module_init and removing it from
module_exit.</li>
</ul>
<li>This way I915 will not be impacted due to the mei device reset
at suspend.</li>
<li>In such scenario I915 will have no idea about the
device-driver bind status of mei_hdcp.</li>
<ul>
<li>So incase of device is not available, mei_hdcp is
responsible to prune such calls with -EIO error.</li>
</ul>
<li>This approach avoid any future impact to I915, incase mei
intended to support suspend and resume.<br>
</li>
</ul>
<pre>I am aware this is not the ideal solution we want. But I feel this is the best at present we could do for this I915-mei interface.
</pre>
<pre>Best regards,
Ram
</pre>
<div class="moz-cite-prefix">On 12/17/2018 7:16 PM, Daniel Vetter
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAKMK7uEM9sCm7r4Hha0rWQ53Zp7nTukkC29sjt70vmp0uK62Fg@mail.gmail.com">
<pre class="moz-quote-pre" wrap="">On Mon, Dec 17, 2018 at 11:57 AM Winkler, Tomas <a class="moz-txt-link-rfc2396E" href="mailto:tomas.winkler@intel.com"><tomas.winkler@intel.com></a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Sat, Dec 15, 2018 at 09:20:38PM +0000, Winkler, Tomas wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
On Thu, Dec 13, 2018 at 5:27 PM Winkler, Tomas
<a class="moz-txt-link-rfc2396E" href="mailto:tomas.winkler@intel.com"><tomas.winkler@intel.com></a>
wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Thu, Dec 13, 2018 at 1:36 PM C, Ramalingam
<a class="moz-txt-link-rfc2396E" href="mailto:ramalingam.c@intel.com"><ramalingam.c@intel.com></a>
wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Tomas and Daniel,
We got an issue here.
The relationship that we try to build between I915 and
mei_hdcp is as
</pre>
</blockquote>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">follows:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
We are using the components to establish the relationship.
I915 is component master where as mei_hdcp is component.
I915 adds the component master during the module load.
mei_hdcp adds the
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">component when the driver->probe is called (on device driver binding).
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I915 forces itself such that until mei_hdcp component is added
I915_load
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">wont be complete.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Similarly on complete system, if mei_hdcp component is
removed,
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">immediately I915 unregister itself and HW will be shutdown.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
This is completely fine when the modules are loaded and unloaded.
But during suspend, mei device disappears and mei bus handles
it by
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">unbinding device and driver by calling driver->remove.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This in-turn removes the component and triggers the master
unbind of I915
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">where, I915 unregister itself.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">This cause the HW state mismatch during the suspend and resume.
Please check the powerwell mismatch errors at CI report for v9
<a class="moz-txt-link-freetext" href="https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_3412/fi-glk-j4">https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_3412/fi-glk-j4</a>
005/
igt@
<a class="moz-txt-link-abbreviated" href="mailto:gem_exec_suspend@basic-s3.html">gem_exec_suspend@basic-s3.html</a>
More over unregistering I915 during the suspend is not expected.
So how do
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">we handle this?
Bit more context from our irc discussion with Ram:
I found this very surprising, since I don't know of any other
subsystems where the devices get outright removed when going
through a
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">suspend/resume cycle.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">The device model was built to handle this stuff
correctly: First clients/devices/interfaces get suspend, then
the parent/bridge/bus. Same dance in reverse when resuming. This
even holds for lots of hotpluggable buses, where child devices
could indeed disappear on resume, but as long as they don't,
everything stays the same. It's really surprising for something
that's soldered onto the
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">board like ME.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
HDCP is an application in the ME it's not ME itself.. On the
linux side HDCP2 is a virtual device on mei client virtual bus,
the bus is teared
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">down on ME reset, which mostly happen on power transitions.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Theoretically, we could keep it up during power transitions, but
so fare it was not necessary and second it's not guarantie that
the all ME
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">applications will reappear after reset.
When does this happen that an ME application doesn't come back after e.g.
suspend/resume?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">No, this can happen in special flows such as fw updates and error conditions,
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">but is has to be supported as well.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Also, what's all the place where this reset can happen? Just
suspend/resume/hibernate and all these, or also at other times?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Also on errors and fw update, the basic assumption is here that it can happen
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">any time.
If this can happen any time, what are we supposed to do if this happens while
we're doing something with the hdcp mei? If this is such a common occurence I
guess we need to somehow wait until everyting is rebound and working again. I
think ideally mei core would handle that for us, but I guess if this just randomly
happens then we need to redo all the transactions. So does need some
involvement of the higher levels.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
It's not common occurrence, but the assumption must be it can happen any time,
In that case everything has to restarted as there is no state preserved in the ME FW.
Right MEI core cannot do it for you, it is just a channel, the logic and state of the connection
is in the mei_hdcp or gfx. Note that HDCP is not the only App over MEI.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Yes, each mei interface would need to provide suspend/resume
functions, or something like that. Or at least a reset function.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Also, how likely is it that the hdcp mei will outright disappear and not come
back after a reset?
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">How does userspace deal with the reset over s/r? I'm assuming that
at least the device node file will become invalid (or whatever
you're using as userspace api), so if userspace is accessing stuff
on the me at the same time as we do a suspend/resume, what happens?
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Also, answer to how other users handle this would be enlighting.
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Still looking to understand this here.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Aside: We'll probably need a device_link to make sure mei_hdcp
is fully resumed before i915 gets resumed, but that's kinda a
detail for later
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">on.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Frankly I don’t believe there is currently exact abstraction that
supports this model, neither components nor device_link .
So fare we used class interface for other purposes, it worked well.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I'm not clear on what class interface has to do with component or device
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">link.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">They all solve different problems, at least as far as I understand all this stuff
</pre>
</blockquote>
</blockquote>
<pre class="moz-quote-pre" wrap="">...
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">-Daniel
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
It comes instead of it, device_link is mostly used for power
management and component as we see know is not what we need as HDCP Is
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">a b it volitle.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">class_interface gives you two handlers: add and remove device, that's all
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">what is needed for the current implementation.
Well someone needs to handle the volatility of hdcp, and atm we seem to be
playing a game of pass the bucket. I still think that mei_hdcp should supply a
clean interface to i915, with all the reset madness handled internally. But
depending upon how badly this all leaks we might need to have a retry logic in
the i915 hdcp flow too.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Restart logic is must.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Ok, I guess then we need to wrap another layer on top of mei to make
this happen.
Does mei provide any signal whether a client/app has not survived a
reset? Atm there's not way for us to tell a reset apart from a
"mei_hdcp disappared for good" event. Which we kinda need to do.
Ideally a reset would be a distinct event and not implemented as an
unbind/rebind cycle like it currently is.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">device linke we'll probably need anyway, since i915 resuming when hdcp is not
yet up is not a good idea no matter what's goîng on.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I've explored device_link and I'm not sure it is suitable there is no power relationship, on suspend/resume the device disappear.
I still believe that class_interface is better choice, it this particular case.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I'm not sure what you mean with class_interface here. How are we
supposed to use that in this case here? I'm not following you at all
here.
I also noticed that resume seems to be entirely deferred to workers:
mei_restart only writes the me start command through the hbm. So all
the clients will only be re-registered somewhen later on through an
async worker (in the rescan_work). Is that understanding correct? If
that's the case we'd need a way to wait for that, so we know whether
the mei_hdcp is useable again or has disappeared for good.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">The whole issue is not yet resolved in the Linux kernel.
There was a discussion around it in ELC <a class="moz-txt-link-freetext" href="https://schd.ws/hosted_files/osseu18/0f/deferred_problem.pdf">https://schd.ws/hosted_files/osseu18/0f/deferred_problem.pdf</a>
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
There's still a bunch of open issues around deferred probe and device
driver loading, but none that would interfer with what we're trying to
do here. At least if mei wouldn't handle resets through a bind/unbind
cycle.
-Daniel
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Thanks
Tomas
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
</pre>
</blockquote>
</body>
</html>