[Intel-gfx] [08/11] drm/i915/guc: Wait for doorbell to be inactive before deallocating

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Wed Mar 15 00:55:28 UTC 2017



On 13/03/17 01:56, Oscar Mateo wrote:
>
>
> On 03/13/2017 04:46 AM, Chris Wilson wrote:
>> On Fri, Mar 10, 2017 at 08:28:55AM -0800, Oscar Mateo wrote:
>>> Doorbell release flow requires that we wait for GEN8_DRB_VALID bit to go
>>> to zero after updating db_status before we call the GuC to release the
>>> doorbell.
>> Does this fix some of the '110' errors?
>> -Chris
>
> No, AFAICT this doesn't fix anything (but it's required nonetheless).
>
> The 110 errors (ENTER_S_STATE and DEALLOCATE_DOORBELL failed calls to
> GuC) are cause by us resetting the GuC right before we ask it to do
> something. E.g.: in the suspend path, we do
> i915_gem_suspend()->i915_gem_sanitize()->intel_gpu_reset() and then
> intel_guc_suspend() inmediately after. After the sanitize, the GuC is in
> reset state and without a firmware loaded, so asking it to suspend is
> pretty useless. Same thing when trying to orderly shutdown the doorbells
> right after reset (the GuC has no idea of what doorbell we are talking
> about).
>
> For now, the simple fix would be to keep doing the reset and not ask the
> GuC to do anything after. But sooner or later, we won't be able to get
> away with this. E.g.: if we enable direct submission, the GuC is the
> only one that will know about all the requests in-flight, and therefore
> the only one that can decide when to suspend. Also, surely loading the
> GuC firmware every time is slowing down the resume path by a lot?
>
> -- Oscar

To add a bit of info, the wait is required because if there are pending 
rings on the doorbell the HW may require a slightly longer time to 
disable it after we update the status. If we call into GuC before the 
disabling is completed GuC will return an error because the doorbell 
will still be active from its point of view, even if it is in the 
process of shutting down. I think that with our current flow it should 
be impossible to hit this situation so it as Oscar said it doesn't fix 
anything, but it is still worth adding it to avoid issues in the future.

In regards to the reset, I agree that we should either reorder stuff to 
make sure that the H2G messages are sent before the reset or drop the 
messages entirely.
In the particular case of suspend, it is not guaranteed that the GuC 
will survive a suspend/resume cycle (it depends on the depth of the 
sleep state). That is why the ENTER_S_STATE message exists: this message 
tells GuC to save all the information it needs in the buffer we 
provided; we can then tell GuC to restore this state with the 
EXIT_S_STATE message, which can be used on a freshly loaded FW as well.
With our current flow the GuC queues should be empty when we go to 
suspend so it shouldn't matter, but direct submission will need this to 
work.

Daniele


More information about the Intel-gfx mailing list