[Intel-gfx] [RFC 03/12] drm/i915: Support for GuC interrupts

Sat May 28 17:33:35 UTC 2016

On 5/28/2016 8:05 PM, Chris Wilson wrote:
> On Sat, May 28, 2016 at 07:15:52PM +0530, Goel, Akash wrote:
>>
>>
>> On 5/28/2016 5:43 PM, Chris Wilson wrote:
>>> On Sat, May 28, 2016 at 02:52:16PM +0530, Goel, Akash wrote:
>>>>
>>>>
>>>> On 5/28/2016 1:26 AM, Chris Wilson wrote:
>>>>> On Sat, May 28, 2016 at 01:12:54AM +0530, akash.goel at intel.com wrote:
>>>>>> +void gen8_reset_guc_interrupts(struct drm_device *dev)
>>>>>> +{
>>>>>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>>>>>> +	i915_reg_t reg = gen6_pm_iir(dev_priv);
>>>>>
>>>> >From the looks of this we have multiple shadows for the same register.
>>>>> That's very bad.
>>>>>
>>>>> Now the platforms might be mutually exclusive, but it is still a mistake
>>>>> that will catch us out.
>>>>>
>>>> Will check how it is in newer platforms.
>>>>
>>>>>> +	spin_lock_irq(&dev_priv->irq_lock);
>>>>>> +	I915_WRITE(reg, dev_priv->guc_events);
>>>>>> +	I915_WRITE(reg, dev_priv->guc_events);
>>>>>
>>>>> What? Not even the tiniest of comments to explain?
>>>>>
>>>> Sorry actually just copied these steps as is from the
>>>> gen6_reset_rps_interrupts(), considering that the same set of
>>>> registers (IIR, IER, IMR) are involved here.
>>>> So the double clearing of IIR followed by posting read could be
>>>> needed here also.
>>>
>>> Move it all to i915_irq.c and export routines to manipulate pm_iir such
>>> that multiple users do not conflict.
>>>
>> Sorry but all interrupt related stuff for rps & GuC is already
>> inside i915_irq.c file.
>
> Didn't notice, because this code didn't match my expectations for an
> interface exported from i915_irq.c
>
>> Also the IER, IMR, IIR registers are being updated in a non
>> conflicting manner, no overlap between the PM bits & GuC events
>> bits.
>
> They share a register, that mandates arbitration.
>

I think the arbitration (& serialization) is already being provided by 
irq_lock.

>> You mean to say need to have single set of routines only for interrupt
>> reset/enable/disable operations for rps & GuC.
>
> Yes.
>

Fine will make them to use a single set of low level routines.

>>>>>> +	POSTING_READ(reg);
>>>>>
>>>>> Again. Not even the tiniest of comments to explain?
>>>>>
>>>>>> +	spin_unlock_irq(&dev_priv->irq_lock);
>>>>>> +}
>>>>>> +
>>>>>> +void gen8_enable_guc_interrupts(struct drm_device *dev)
>>>>>> +{
>>>>>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>>>>>> +
>>>>>> +	spin_lock_irq(&dev_priv->irq_lock);
>>>>>> +	if (!dev_priv->guc.interrupts_enabled) {
>>>>>> +		WARN_ON(I915_READ(gen6_pm_iir(dev_priv)) &
>>>>>> +					dev_priv->guc_events);
>>>>>> +		dev_priv->guc.interrupts_enabled = true;
>>>>>> +		I915_WRITE(gen6_pm_ier(dev_priv),
>>>>>> +		      I915_READ(gen6_pm_ier(dev_priv)) | dev_priv->guc_events);
>>>>>
>>>>> ier should be known, rmw on the reg should not be required.
>>>>>
>>>> Sorry same as above, copy paste from gen6_enable_rps_interrupts().
>>>> Without rmw, would this be fine ?
>>>>
>>>> 	if (dev_priv->rps.interrupts_enabled)
>>>> 		I915_WRITE(gen6_pm_ier(dev_priv),
>>>> 			dev_priv->pm_rps_events | dev_priv->guc_events);
>>>> 	else
>>>> 		I915_WRITE(gen6_pm_ier(dev_priv), dev_priv->guc_events);
>>>
>>> Still has the presumption of owning a register that is ostensibly used
>>> by others.
>>>
>> Since pm_ier is a shared register and being used by others also, rmw
>> seem to be more suited here. Otherwise need to be aware of who all is
>> sharing it so as to update it without disturbing the bits owned by
>> others.
>
> Exactly, see above. The best interfaces from i915_irq.c do not use rmw
> on the register values.

Fine will try to do away with use rmw operation for pm_ier by
maintaining a bit mask of enabled interrupts (just like pm_irq_mask).

>
>>>>>> +static void gen8_guc2host_events_work(struct work_struct *work)
>>>>>> +{
>>>>>> +	struct drm_i915_private *dev_priv =
>>>>>> +		container_of(work, struct drm_i915_private, guc.events_work);
>>>>>> +
>>>>>> +	spin_lock_irq(&dev_priv->irq_lock);
>>>>>> +	/* Speed up work cancelation during disabling guc interrupts. */
>>>>>> +	if (!dev_priv->guc.interrupts_enabled) {
>>>>>> +		spin_unlock_irq(&dev_priv->irq_lock);
>>>>>> +		return;
>>>>>> +	}
>>>>>> +
>>>>>> +	DISABLE_RPM_WAKEREF_ASSERTS(dev_priv);
>>>>>
>>>>> This just shouts that the code is broken.
>>>>>
>>>> You mean to say that ideally the wakeref_count (& power.usage_count)
>>>> should already be non zero here.
>>>
>>> Yes. If it is not under your control, then you have a bug in your code.
>>> Existing DISABLE_RPM_WAKEREF_ASSERTS tell us where we know we have a bug
>>> (and hacks in place whilst we wait for patch review).
>>>
>>
>> This work item can also execute in a window where wakeref_count (&
>> power.usage_count) have become zero but runtime suspend has not yet
>> kicked in (due to auto-suspend delay), so "RPM wakelock ref not held
>> during HW access" warning would come.
>
> i.e. your code is buggy, as DISABLE_RPM_WAKEREF_ASSERTS implied.
>

But isn't this applicable to rps work item also ?.
If there is a way found to circumvent this, then same can be applied to 
GuC work item also. DISABLE_RPM_WAKEREF_ASSERTS is a stopgap solution.

>>>>>> void intel_crt_init(struct drm_device *dev);
>>>>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
>>>>>> index 41601c7..e20792d 100644
>>>>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>>>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>>>>> @@ -125,6 +125,11 @@ struct intel_guc {
>>>>>> 	struct intel_guc_fw guc_fw;
>>>>>> 	uint32_t log_flags;
>>>>>> 	struct drm_i915_gem_object *log_obj;
>>>>>> +	/*
>>>>>> +	 * work, interrupts_enabled are protected by dev_priv->irq_lock
>>>>>> +	 */
>>>>>> +	struct work_struct events_work;
>>>>>
>>>>> The work gets added here, yet bugs are fixed for the worker in later
>>>>> patches. Squash in the bug fixes.
>>>>
>>>> Sorry didn't get this. Are you alluding to cancellation of this
>>>> work_item Or flushing of work item from the error handling path ?
>>>>
>>>> Cancellation is being done as a part of disabling interrupt and the
>>>> call to disable interrupt is there in intel_guc_suspend(), part of
>>>> this patch only.
>>>
>>> You add a flush later, which seems unrelated to that patch.
>>
>> The flush of work item is there in the last patch, inside the error
>> handling path for forcefully getting a new flush interrupt from GuC
>> and so that is not tightly coupled with this patch.
>
> Reseting the work would seem be part of the log infrastructure.

Fine will add the work item reset part in this patch only.

Best regards
Akash
> -Chris
>