[Intel-gfx] [PATCH 1/7] drm/i915: Disable preemption and sleeping while using the punit sideband

Fri Jan 12 10:02:06 UTC 2018

Hans de Goede <hdegoede at redhat.com> writes:

> Hi,
>
> On 11-01-18 22:42, Hans de Goede wrote:
>> Hi,
>> 
>> On 11-01-18 22:17, Ville Syrjälä wrote:
>>> On Thu, Jan 11, 2018 at 08:53:42PM +0000, Chris Wilson wrote:
>>>> Quoting Ville Syrjälä (2018-01-11 20:10:45)
>>>>> On Wed, Jan 10, 2018 at 12:55:05PM +0000, Chris Wilson wrote:
>>>>>> While we talk to the punit over its sideband, we need to prevent the cpu
>>>>>> from sleeping in order to prevent a potential machine hang.
>>>>>>
>>>>>> Note that by itself, it appears that pm_qos_update_request (via
>>>>>> intel_idle) doesn't provide a sufficient barrier to ensure that all core
>>>>>> are indeed awake (out of Cstate) and that the package is awake. To do so,
>>>>>> we need to supplement the pm_qos with a manual ping on_each_cpu.
>>>>>>
>>>>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109051
>>>>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=102657
>>>>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=195255
>>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>>>>> Cc: Hans de Goede <hdegoede at redhat.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/i915/i915_drv.c       |  6 ++++
>>>>>>   drivers/gpu/drm/i915/i915_drv.h       |  1 +
>>>>>>   drivers/gpu/drm/i915/intel_sideband.c | 61 ++++++++++++++++++++++++-----------
>>>>>>   3 files changed, 50 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>>>>>> index 6c8da9d20c33..d4b90cc0130b 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>>>>> @@ -902,6 +902,9 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
>>>>>>        spin_lock_init(&dev_priv->uncore.lock);
>>>>>>        mutex_init(&dev_priv->sb_lock);
>>>>>> +     pm_qos_add_request(&dev_priv->sb_qos,
>>>>>> +                        PM_QOS_CPU_DMA_LATENCY, PM_QOS_DEFAULT_VALUE);
>>>>>> +
>>>>>>        mutex_init(&dev_priv->modeset_restore_lock);
>>>>>>        mutex_init(&dev_priv->av_mutex);
>>>>>>        mutex_init(&dev_priv->wm.wm_mutex);
>>>>>> @@ -953,6 +956,9 @@ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
>>>>>>        intel_irq_fini(dev_priv);
>>>>>>        i915_workqueues_cleanup(dev_priv);
>>>>>>        i915_engines_cleanup(dev_priv);
>>>>>> +
>>>>>> +     pm_qos_remove_request(&dev_priv->sb_qos);
>>>>>> +     mutex_destroy(&dev_priv->sb_lock);
>>>>>>   }
>>>>>>   static int i915_mmio_setup(struct drm_i915_private *dev_priv)
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> index a689396d0ff6..ff3f9effc0bb 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> @@ -1887,6 +1887,7 @@ struct drm_i915_private {
>>>>>>        /* Sideband mailbox protection */
>>>>>>        struct mutex sb_lock;
>>>>>> +     struct pm_qos_request sb_qos;
>>>>>>        /** Cached value of IMR to avoid reads in updating the bitfield */
>>>>>>        union {
>>>>>> diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
>>>>>> index 75c872bb8cc9..02bdd2e2cef6 100644
>>>>>> --- a/drivers/gpu/drm/i915/intel_sideband.c
>>>>>> +++ b/drivers/gpu/drm/i915/intel_sideband.c
>>>>>> @@ -22,6 +22,8 @@
>>>>>>    *
>>>>>>    */
>>>>>> +#include <asm/iosf_mbi.h>
>>>>>> +
>>>>>>   #include "i915_drv.h"
>>>>>>   #include "intel_drv.h"
>>>>>> @@ -39,18 +41,20 @@
>>>>>>   /* Private register write, double-word addressing, non-posted */
>>>>>>   #define SB_CRWRDA_NP 0x07
>>>>>> -static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn,
>>>>>> -                        u32 port, u32 opcode, u32 addr, u32 *val)
>>>>>> +static void ping(void *info)
>>>>>>   {
>>>>>> -     u32 cmd, be = 0xf, bar = 0;
>>>>>> -     bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP);
>>>>>> +}
>>>>>> -     cmd = (devfn << IOSF_DEVFN_SHIFT) | (opcode << IOSF_OPCODE_SHIFT) |
>>>>>> -             (port << IOSF_PORT_SHIFT) | (be << IOSF_BYTE_ENABLES_SHIFT) |
>>>>>> -             (bar << IOSF_BAR_SHIFT);
>>>>>> +static int vlv_sideband_rw(struct drm_i915_private *dev_priv,
>>>>>> +                        u32 devfn, u32 port, u32 opcode,
>>>>>> +                        u32 addr, u32 *val)
>>>>>> +{
>>>>>> +     const bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP);
>>>>>> +     int err;
>>>>>> -     WARN_ON(!mutex_is_locked(&dev_priv->sb_lock));
>>>>>> +     lockdep_assert_held(&dev_priv->sb_lock);
>>>>>> +     /* Flush the previous comms, just in case it failed last time. */
>>>>>>        if (intel_wait_for_register(dev_priv,
>>>>>>                                    VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
>>>>>>                                    5)) {
>>>>>> @@ -59,22 +63,43 @@ static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn,
>>>>>>                return -EAGAIN;
>>>>>>        }
>>>>>> -     I915_WRITE(VLV_IOSF_ADDR, addr);
>>>>>> -     I915_WRITE(VLV_IOSF_DATA, is_read ? 0 : *val);
>>>>>> -     I915_WRITE(VLV_IOSF_DOORBELL_REQ, cmd);
>>>>>> +     iosf_mbi_punit_acquire();
>>>>>> -     if (intel_wait_for_register(dev_priv,
>>>>>> -                                 VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
>>>>>> -                                 5)) {
>>>>>> +     /*
>>>>>> +      * Prevent the cpu from sleeping while we use this sideband, otherwise
>>>>>> +      * the punit may cause a machine hang.
>>>>>> +      */
>>>>>> +     pm_qos_update_request(&dev_priv->sb_qos, 0);
>>>>>> +     on_each_cpu(ping, NULL, 1);
>>>>>
>>>>> pm_qos_update_request() doesn't wake up the cpus on its own? I wonder
>>>>> kind of latency guarantees it can actually give without doing that.
>>>>
>>>> intel_idle plugs into the update to wake up idle cpus, but it is not
>>>> synchronous. If we don't have the on_each_cpu() here, the hangs reoccur;
>>>> the easiest answer seems to be that the on_each_cpu() is causing a
>>>> schedule of the RT migration thread which is enough to ensure that the
>>>> wakeup has occurred.
>>>>> Shouldn't we check if we're actually be talking to the punit and not
>>>>> some other unit before we do all this extra work?
>>>>
>>>> Checking port? I am suspicious of the whole iosf mechanism...
>>>
>>> I doubt iosf sb itself is busted. That would probably make the entire
>>> soc pretty much dead. I suspect the problem is either in the punit
>>> firmware, or something fishy is going on with power delivery and
>>> reducing the rate at which we change the voltages and whatnot helps
>>> keep things a bit more stable.
>>>
>>> So yeah, I would limit this to punit only by checking the port.
>>>
>>>>
>>>> Also debating whether to only apply it to j1900 or all. Is it just pure
>>>> fluke that we can easily reproduce it on the quad core Baytrail as
>>>> opposed to the dual core vlv or chv?
>>>
>>> Not sure. I guess if the problem is around power delivery it might even
>>> be board specific to some degree.
>> 
>> The wordpress (on azure.com) link here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=109051#c860
>> 
>> Certainly seems to hint at a power delivery problem (with the SoC not
>> with specific boards) and all models reported as needing
>> intel_idle.max_cstate=1 there are quad-core Bay Trails.
>> 
>> I guess it the problem happens on all quad-core Bay Trail variants when
>> the GPU and CPU both increase there power-demand (by up-clocking, resp.
>> leaving C6) at the same time and Chris' fix makes sure all CPU cores
>> leave C6 before doing the GPU up-clocking avoiding this.
>> 
>> Note this is all speculation / educated guessing.
>
> Also interesting wrt this is this comment:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=109051#c752
>
> Which speculates along the same line and the commenter has done
> extensive testing (with 2 boards) coming to the conclusion that
> the dual core part does not have whatever problem he is seeing.
>

This has been my understanding that 2 core parts are
not affected. Further, limiting cores to 2 with 4 core j1900 using
maxcpus=2 is an effective workaround.

-Mika