[Intel-gfx] [PATCH v9 35/39] misc/mei/hdcp: Component framework for I915 Interface

Wed Dec 19 06:45:33 UTC 2018

Tomas and Daniel,

From the discussion on this thread, this is my understanding:

  * At present (v9), I915 wants to be hard-bound to mei_hdcp's
    device-driver binding status through components.
      o This means the I915 driver load completes only when
        mei_hdcp's device and driver are bound.
      o If the mei_hdcp device resets, I915 will unregister itself from
        userspace and wait for the mei_hdcp device-driver rebinding.
          + This could be due to a FW error or other unexpected
            failures, which are rare occurrences.
      o When the mei_hdcp module is removed, I915 will unregister itself.
      o Because of this, ideally I915 doesn't expect a device reset from
        mei for suspend and resume.
  * At present the mei bus is designed as below:
      o A device will disappear on FW failures, FW upgrade, system
        suspend, etc.
      o When the errors are handled, or on system resume, the mei device
        will reappear and bind with the corresponding driver again.
  * Mei doesn't plan to avoid the device reset (disappearance and
    reappearance) for suspend and resume in the near future.

Based on the above understanding, I propose the approach below. Please correct or approve it.

  * At present (v9), component_add from mei_hdcp indicates mei_hdcp's
    device-driver bound state.
  * Instead, let's use the component to indicate the mei_hdcp module's
    availability,
      o by adding the component at module_init and removing it at
        module_exit.
  * This way I915 will not be impacted by the mei device reset at
    suspend.
  * In this scenario I915 will have no idea about the device-driver bind
    status of mei_hdcp.
      o So in case the device is not available, mei_hdcp is responsible
        for rejecting such calls with an -EIO error.
  * This approach avoids any future impact on I915 in case mei later
    intends to support suspend and resume.
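
To make the proposal concrete, here is a rough userspace model of the intended behaviour. It is only a sketch and not kernel code: every name in it (model_hdcp_component, model_probe, model_hdcp_initiate_session, and so on) is hypothetical, and in the real driver the registration would go through component_add()/component_del(). Returning -ENODEV once the module is gone is also my own assumption; the proposal itself only specifies -EIO for the "device not available" case.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the mei client device that can vanish on reset/suspend. */
struct model_mei_device {
	int session_count;
};

/* Stand-in for the component that mei_hdcp would expose to I915. */
struct model_hdcp_component {
	bool registered;              /* tracks module lifetime, not binding */
	struct model_mei_device *dev; /* non-NULL only while device is bound */
};

/* "module_init": the component is added here, independent of any device. */
void model_module_init(struct model_hdcp_component *comp)
{
	assert(comp != NULL);
	comp->registered = true;
	comp->dev = NULL;
}

/* "module_exit": the component is removed only when the module goes away. */
void model_module_exit(struct model_hdcp_component *comp)
{
	assert(comp != NULL);
	comp->registered = false;
	comp->dev = NULL;
}

/* driver->probe: device appeared; component registration is untouched. */
void model_probe(struct model_hdcp_component *comp,
		 struct model_mei_device *dev)
{
	assert(comp != NULL);
	comp->dev = dev;
}

/* driver->remove: device disappeared (reset, suspend); component stays. */
void model_remove(struct model_hdcp_component *comp)
{
	assert(comp != NULL);
	comp->dev = NULL;
}

/* Entry point I915 would call through the component ops. */
int model_hdcp_initiate_session(struct model_hdcp_component *comp)
{
	assert(comp != NULL);
	if (!comp->registered)
		return -ENODEV; /* assumption: module is not loaded at all */
	if (comp->dev == NULL)
		return -EIO;    /* device unbound: reject the call */
	comp->dev->session_count++;
	return 0;
}
```

In this model a model_remove() followed by model_probe() (the reset seen at suspend and resume) never touches the registered flag, so the I915 side stays bound throughout and only sees -EIO for calls made while the device is gone.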

I am aware this is not the ideal solution we want, but I feel this is the best we can do at present for this I915-mei interface.

Best regards,
Ram

On 12/17/2018 7:16 PM, Daniel Vetter wrote:
> On Mon, Dec 17, 2018 at 11:57 AM Winkler, Tomas <tomas.winkler at intel.com> wrote:
>>
>>> On Sat, Dec 15, 2018 at 09:20:38PM +0000, Winkler, Tomas wrote:
>>>>> On Thu, Dec 13, 2018 at 5:27 PM Winkler, Tomas
>>>>> <tomas.winkler at intel.com>
>>>>> wrote:
>>>>>>> On Thu, Dec 13, 2018 at 1:36 PM C, Ramalingam
>>>>>>> <ramalingam.c at intel.com>
>>>>>>> wrote:
>>>>>>>> Tomas and Daniel,
>>>>>>>>
>>>>>>>> We got an issue here.
>>>>>>>>
>>>>>>>> The relationship that we try to build between I915 and
>>>>>>>> mei_hdcp is as
>>>>> follows:
>>>>>>>> We are using the components to establish the relationship.
>>>>>>>> I915 is component master where as mei_hdcp is component.
>>>>>>>> I915 adds the component master during the module load.
>>>>>>>> mei_hdcp adds the
>>>>>>> component when the driver->probe is called (on device driver binding).
>>>>>>>> I915 forces itself such that until mei_hdcp component is added
>>>>>>>> I915_load
>>>>>>> wont be complete.
>>>>>>>> Similarly on complete system, if mei_hdcp component is
>>>>>>>> removed,
>>>>>>> immediately I915 unregister itself and HW will be shutdown.
>>>>>>>> This is completely fine when the modules are loaded and unloaded.
>>>>>>>>
>>>>>>>> But during suspend, mei device disappears and mei bus handles
>>>>>>>> it by
>>>>>>> unbinding device and driver by calling driver->remove.
>>>>>>>> This in-turn removes the component and triggers the master
>>>>>>>> unbind of I915
>>>>>>> where, I915 unregister itself.
>>>>>>>> This cause the HW state mismatch during the suspend and resume.
>>>>>>>>
>>>>>>>> Please check the powerwell mismatch errors at CI report for v9
>>>>>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_3412/fi-glk-j4005/igt@gem_exec_suspend@basic-s3.html
>>>>>>>>
>>>>>>>> More over unregistering I915 during the suspend is not expected.
>>>>>>>> So how do
>>>>>>> we handle this?
>>>>>>>
>>>>>>> Bit more context from our irc discussion with Ram:
>>>>>>>
>>>>>>> I found this very surprising, since I don't know of any other
>>>>>>> subsystems where the devices get outright removed when going
>>>>>>> through a
>>>>> suspend/resume cycle.
>>>>>>> The device model was built to handle this stuff
>>>>>>> correctly: First clients/devices/interfaces get suspend, then
>>>>>>> the parent/bridge/bus. Same dance in reverse when resuming. This
>>>>>>> even holds for lots of hotpluggable buses, where child devices
>>>>>>> could indeed disappear on resume, but as long as they don't,
>>>>>>> everything stays the same. It's really surprising for something
>>>>>>> that's soldered onto the
>>>>> board like ME.
>>>>>> HDCP is an application in the ME; it's not the ME itself. On the
>>>>>> Linux side HDCP2 is a virtual device on the mei client virtual bus;
>>>>>> the bus is torn
>>>>> down on ME reset, which mostly happens on power transitions.
>>>>>> Theoretically, we could keep it up during power transitions, but
>>>>>> so far it was not necessary, and second, it's not guaranteed that
>>>>>> all the ME
>>>>> applications will reappear after reset.
>>>>>
>>>>> When does this happen that an ME application doesn't come back after e.g.
>>>>> suspend/resume?
>>>> No, this can happen in special flows such as fw updates and error conditions,
>>> but it has to be supported as well.
>>>>> Also, what's all the place where this reset can happen? Just
>>>>> suspend/resume/hibernate and all these, or also at other times?
>>>> Also on errors and fw update; the basic assumption here is that it can happen
>>> any time.
>>>
>>> If this can happen any time, what are we supposed to do if this happens while
>>> we're doing something with the hdcp mei? If this is such a common occurrence I
>>> guess we need to somehow wait until everything is rebound and working again. I
>>> think ideally mei core would handle that for us, but I guess if this just randomly
>>> happens then we need to redo all the transactions. So this does need some
>>> involvement of the higher levels.
>> It's not a common occurrence, but the assumption must be that it can happen any time.
>> In that case everything has to be restarted, as there is no state preserved in the ME FW.
>> Right, MEI core cannot do it for you; it is just a channel. The logic and state of the connection
>> is in the mei_hdcp or gfx. Note that HDCP is not the only App over MEI.
> Yes, each mei interface would need to provide suspend/resume
> functions, or something like that. Or at least a reset function.
>
>>> Also, how likely is it that the hdcp mei will outright disappear and not come
>>> back after a reset?
>>>
>>>>> How does userspace deal with the reset over s/r? I'm assuming that
>>>>> at least the device node file will become invalid (or whatever
>>>>> you're using as userspace api), so if userspace is accessing stuff
>>>>> on the me at the same time as we do a suspend/resume, what happens?
>>> Also, an answer to how other users handle this would be enlightening.
> Still looking to understand this here.
>
>>>>>>> Aside: We'll probably need a device_link to make sure mei_hdcp
>>>>>>> is fully resumed before i915 gets resumed, but that's kinda a
>>>>>>> detail for later
>>>>> on.
>>>>>> Frankly I don't believe there is currently an exact abstraction that
>>>>>> supports this model, neither components nor device_link.
>>>>>> So far we used class interface for other purposes, and it worked well.
>>>>> I'm not clear on what class interface has to do with component or device
>>> link.
>>>>> They all solve different problems, at least as far as I understand all this stuff
>>> ...
>>>>> -Daniel
>>>> It comes instead of them: device_link is mostly used for power
>>>> management, and component, as we see now, is not what we need as HDCP is
>>> a bit volatile.
>>>> class_interface gives you two handlers, add and remove device; that's all
>>> that is needed for the current implementation.
>>>
>>> Well someone needs to handle the volatility of hdcp, and atm we seem to be
>>> playing a game of pass the bucket. I still think that mei_hdcp should supply a
>>> clean interface to i915, with all the reset madness handled internally. But
>>> depending upon how badly this all leaks we might need to have a retry logic in
>>> the i915 hdcp flow too.
>>
>> Restart logic is a must.
> Ok, I guess then we need to wrap another layer on top of mei to make
> this happen.
>
> Does mei provide any signal whether a client/app has not survived a
> reset? Atm there's no way for us to tell a reset apart from a
> "mei_hdcp disappeared for good" event. Which we kinda need to do.
> Ideally a reset would be a distinct event and not implemented as an
> unbind/rebind cycle like it currently is.
>
>>> A device link we'll probably need anyway, since i915 resuming when hdcp is not
>>> yet up is not a good idea no matter what's going on.
>> I've explored device_link and I'm not sure it is suitable: there is no power relationship, and on suspend/resume the device disappears.
>> I still believe that class_interface is the better choice in this particular case.
> I'm not sure what you mean with class_interface here. How are we
> supposed to use that in this case here? I'm not following you at all
> here.
>
> I also noticed that resume seems to be entirely deferred to workers:
> mei_restart only writes the me start command through the hbm. So all
> the clients will only be re-registered sometime later on through an
> async worker (in the rescan_work). Is that understanding correct? If
> that's the case we'd need a way to wait for that, so we know whether
> the mei_hdcp is usable again or has disappeared for good.
>
>> The whole issue is not yet resolved in the Linux kernel.
>> There was a discussion around it in ELC  https://schd.ws/hosted_files/osseu18/0f/deferred_problem.pdf
> There's still a bunch of open issues around deferred probe and device
> driver loading, but none that would interfere with what we're trying to
> do here. At least if mei wouldn't handle resets through a bind/unbind
> cycle.
> -Daniel
>
>> Thanks
>> Tomas
>>
>