[Intel-gfx] [PATCH 1/3] drm/i915/guc: Add non blocking CTB send function

Michal Wajdeczko michal.wajdeczko at intel.com
Mon Nov 25 13:39:15 UTC 2019


On Fri, 22 Nov 2019 02:34:22 +0100, Matthew Brost  
<matthew.brost at intel.com> wrote:

> On Thu, Nov 21, 2019 at 04:13:25PM -0800, Matthew Brost wrote:
>> On Thu, Nov 21, 2019 at 12:43:26PM +0100, Michal Wajdeczko wrote:
>>> On Thu, 21 Nov 2019 00:56:02 +0100, <John.C.Harrison at intel.com> wrote:
>>>
>>>> From: Matthew Brost <matthew.brost at intel.com>
>>>>
>>>> Add non blocking CTB send fuction, intel_guc_send_nb. In order to
>>>> support a non blocking CTB send fuction a spin lock is needed to
>>>
>>> 2x typos
>>>
>>>> protect the CTB descriptors fields. Also the non blocking call must  
>>>> not
>>>> update the fence value as this value is owned by the blocking call
>>>> (intel_guc_send).
>>>
>>> you probably mean "intel_guc_send_ct", as intel_guc_send is just a  
>>> wrapper
>>> around guc->send
>>>
>>
>> Ah, yes.
>>
>>>>
>>>> The blocking CTB now must have a flow control mechanism to ensure the
>>>> buffer isn't overrun. A lazy spin wait is used as we believe the flow
>>>> control condition should be rare with properly sized buffer. A retry
>>>> counter is also implemented which fails H2G CTBs once a limit is
>>>> reached to prevent deadlock.
>>>>
>>>> The function, intel_guc_send_nb, is exported in this patch but unused.
>>>> Several patches later in the series make use of this function.
>>>
>>> It's likely in yet another series
>>>
>>
>> Yes, it is.
>>
>>>>
>>>> Cc: John Harrison <john.c.harrison at intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc.h    |  2 +
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 97 +++++++++++++++++++----
>>>> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 10 ++-
>>>> 3 files changed, 91 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> index e6400204a2bd..77c5af919ace 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> @@ -94,6 +94,8 @@ intel_guc_send_and_receive(struct intel_guc *guc,  
>>>> const u32 *action, u32 len,
>>>> 	return guc->send(guc, action, len, response_buf, response_buf_size);
>>>> }
>>>> +int intel_guc_send_nb(struct intel_guc_ct *ct, const u32 *action,  
>>>> u32 len);
>>>> +
>>>
>>> Hmm, this mismatch of guc/ct parameters breaks our layering.
>>> But we can keep this layering intact by introducing some flags to
>>> the existing guc_send() function. These flags could be passed as
>>> high bits in action[0], like this:
>>
>> This seems reasonable.
>>
>
> Prototyped this and I don't like it at all. First, 'action' is const, so
> what you are suggesting doesn't work unless that is changed.

I guess only the explicit clearing of that flag from the const action was
a mistake; the rest should be ok.
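E.g. the back-end could read the flag without touching the const array,
something like (untested sketch):

	/* peek at the flag instead of clearing it in the caller's array */
	bool dont_wait = action[0] & GUC_ACTION_FLAG_DONT_WAIT;

	if (dont_wait) {
		GEM_BUG_ON(response_buf);
		GEM_BUG_ON(response_buf_size);
		return ctch_send_nb(ct, ctch, action, len);
	}
	/*
	 * No need to clear bit 31 before writing the CTB header either:
	 * the header takes action[0] << GUC_CT_MSG_ACTION_SHIFT, so the
	 * top bits are dropped anyway.
	 */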

> Also, what if all bits in the DW eventually mean something?

In all guc_send functions we only use the lower 16 bits for the action code.
This action code is then passed in MMIO messages in bits 15:0 and in
CTB messages in bits 31:16.
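For illustration, roughly how the two back-ends encode it (based on the
masks/shifts quoted below):

	u32 code = action[0] & 0xffff;	/* action code, lower 16 bits */

	/* MMIO: code goes out as-is in bits 15:0 of the first register */
	u32 mmio_msg = code;

	/* CTB: code goes into bits 31:16 of the message header */
	u32 ctb_header = code << GUC_CT_MSG_ACTION_SHIFT;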

> To me, overloading a field isn't a good idea; if anything, we should add
> another argument to guc->send().

guc->send already takes 5 params:

	int (*send)(struct intel_guc *guc, const u32 *data, u32 len,
		    u32 *response_buf, u32 response_buf_size);

We can add explicit flags, but I see nothing wrong in using the free top
bits of data[0] to pass these flags there.
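For comparison, an explicit flag would mean growing the vfunc to six params:

	int (*send)(struct intel_guc *guc, const u32 *data, u32 len,
		    u32 *response_buf, u32 response_buf_size, u32 flags);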

> But I'd honestly prefer we just leave it as is. Non-blocking only applies
> to CTBs (not MMIO) and we have a GEM_BUG_ON to protect us if this function
> is called incorrectly.

Well, I guess many would still prefer to be hit by a WARN from guc-send-nop
rather than a BUG_ON from guc-send-nb.

> Doing what you
> suggest just makes everything more complicated IMO.

I'm not sold that adding GUC_ACTION_FLAG_DONT_WAIT to action[0]
(as shown below) is that complicated.

Michal

>
> Matt
>
>>>
>>> #define GUC_ACTION_FLAG_DONT_WAIT 0x80000000
>>>
>>> int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
>>> {
>>> 	u32 action[] = {
>>> 		INTEL_GUC_ACTION_AUTHENTICATE_HUC | GUC_ACTION_FLAG_DONT_WAIT,
>>> 		rsa_offset
>>> 	};
>>>
>>> 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>> }
>>>
>>> then actual back-end of guc->send can take proper steps based on this  
>>> flag:
>>>
>>> @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 *action,  
>>> u32 len,
>>>       GEM_BUG_ON(!len);
>>>       GEM_BUG_ON(len > guc->send_regs.count);
>>>
>>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT)
>>> +               return -EINVAL;
>>> +       *action &= ~GUC_ACTION_FLAG_DONT_WAIT;
>>> +
>>>       /* We expect only action code */
>>>       GEM_BUG_ON(*action & ~INTEL_GUC_MSG_CODE_MASK);
>>>
>>> @@ @@ int intel_guc_send_ct(struct intel_guc *guc, const u32 *action,  
>>> u32 len,
>>>       u32 status = ~0; /* undefined */
>>>       int ret;
>>>
>>> +       if (*action & GUC_ACTION_FLAG_DONT_WAIT) {
>>> +               GEM_BUG_ON(response_buf);
>>> +               GEM_BUG_ON(response_buf_size);
>>> +               return ctch_send_nb(ct, ctch, action, len);
>>> +       }
>>> +
>>>       mutex_lock(&guc->send_mutex);
>>>
>>>       ret = ctch_send(ct, ctch, action, len, response_buf,  
>>> response_buf_size,
>>>
>>>
>>>> static inline void intel_guc_notify(struct intel_guc *guc)
>>>> {
>>>> 	guc->notify(guc);
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index b49115517510..e50d968b15d5 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -3,6 +3,8 @@
>>>> * Copyright © 2016-2019 Intel Corporation
>>>> */
>>>> +#include <linux/circ_buf.h>
>>>> +
>>>> #include "i915_drv.h"
>>>> #include "intel_guc_ct.h"
>>>> @@ -12,6 +14,8 @@
>>>> #define CT_DEBUG_DRIVER(...)	do { } while (0)
>>>> #endif
>>>> +#define MAX_RETRY		0x1000000
>>>> +
>>>> struct ct_request {
>>>> 	struct list_head link;
>>>> 	u32 fence;
>>>> @@ -40,7 +44,8 @@ void intel_guc_ct_init_early(struct intel_guc_ct  
>>>> *ct)
>>>> 	/* we're using static channel owners */
>>>> 	ct->host_channel.owner = CTB_OWNER_HOST;
>>>> -	spin_lock_init(&ct->lock);
>>>> +	spin_lock_init(&ct->request_lock);
>>>> +	spin_lock_init(&ct->send_lock);
>>>> 	INIT_LIST_HEAD(&ct->pending_requests);
>>>> 	INIT_LIST_HEAD(&ct->incoming_requests);
>>>> 	INIT_WORK(&ct->worker, ct_incoming_request_worker_func);
>>>> @@ -291,7 +296,8 @@ static u32 ctch_get_next_fence(struct  
>>>> intel_guc_ct_channel *ctch)
>>>> static int ctb_write(struct intel_guc_ct_buffer *ctb,
>>>> 		     const u32 *action,
>>>> 		     u32 len /* in dwords */,
>>>> -		     u32 fence,
>>>> +		     u32 fence_value,
>>>> +		     bool enable_fence,
>>>
>>> maybe we can just guarantee that fence=0 will never be used as a valid
>>> fence id, then this flag could be replaced with a (fence != 0) check.
>>>
>>
>> Yes, again this seems reasonable. Initialize next_fence = 1, then increment
>> by 2 each time and this works.
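>> Something like this sketch (untested):
>>
>> 	static u32 ctch_get_next_fence(struct intel_guc_ct_channel *ctch)
>> 	{
>> 		/* next_fence starts at 1; odd values never collide with 0 */
>> 		u32 fence = ctch->next_fence;
>>
>> 		ctch->next_fence += 2;
>> 		return fence;
>> 	}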
>>
>>>> 		     bool want_response)
>>>> {
>>>> 	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> @@ -328,18 +334,18 @@ static int ctb_write(struct intel_guc_ct_buffer  
>>>> *ctb,
>>>> 	 * DW2+: action data
>>>> 	 */
>>>> 	header = (len << GUC_CT_MSG_LEN_SHIFT) |
>>>> -		 (GUC_CT_MSG_WRITE_FENCE_TO_DESC) |
>>>> +		 (enable_fence ? GUC_CT_MSG_WRITE_FENCE_TO_DESC : 0) |
>>>
>>> Hmm, even if we ask the fw not to write back the fence to the descriptor,
>>> IIRC the current firmware will unconditionally write back the return status
>>> of this non-blocking call, possibly overwriting the status of the blocked
>>> call.
>>>
>>
>> Yes, known problem with the interface that needs to be fixed.
>>
>>>> 		 (want_response ? GUC_CT_MSG_SEND_STATUS : 0) |
>>>
>>> btw, if we switch all requests to expect a reply sent back over CTB,
>>> then we can possibly drop the send_mutex in CTB paths, and block
>>> only when there is no DONT_WAIT flag and we have to wait for a response.
>>>
>>
>> Rather just wait for the GuC to fix this.
>>
>>>> 		 (action[0] << GUC_CT_MSG_ACTION_SHIFT);
>>>> 	CT_DEBUG_DRIVER("CT: writing %*ph %*ph %*ph\n",
>>>> -			4, &header, 4, &fence,
>>>> +			4, &header, 4, &fence_value,
>>>> 			4 * (len - 1), &action[1]);
>>>> 	cmds[tail] = header;
>>>> 	tail = (tail + 1) % size;
>>>> -	cmds[tail] = fence;
>>>> +	cmds[tail] = fence_value;
>>>> 	tail = (tail + 1) % size;
>>>> 	for (i = 1; i < len; i++) {
>>>> @@ -440,6 +446,47 @@ static int wait_for_ct_request_update(struct  
>>>> ct_request *req, u32 *status)
>>>> 	return err;
>>>> }
>>>> +static inline bool ctb_has_room(struct guc_ct_buffer_desc *desc, u32  
>>>> len)
>>>> +{
>>>> +	u32 head = READ_ONCE(desc->head);
>>>> +	u32 space;
>>>> +
>>>> +	space = CIRC_SPACE(desc->tail, head, desc->size);
>>>> +
>>>> +	return space >= len;
>>>> +}
>>>> +
>>>> +int intel_guc_send_nb(struct intel_guc_ct *ct,
>>>> +		      const u32 *action,
>>>> +		      u32 len)
>>>> +{
>>>> +	struct intel_guc_ct_channel *ctch = &ct->host_channel;
>>>> +	struct intel_guc_ct_buffer *ctb = &ctch->ctbs[CTB_SEND];
>>>> +	struct guc_ct_buffer_desc *desc = ctb->desc;
>>>> +	int err;
>>>> +
>>>> +	GEM_BUG_ON(!ctch->enabled);
>>>> +	GEM_BUG_ON(!len);
>>>> +	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> +	lockdep_assert_held(&ct->send_lock);
>>>
>>> hmm, does it mean that now it's the caller's responsibility to take the
>>> CT private spinlock? That is not how the other guc_send() functions work.
>>>
>>
>> Yes, that's how I would like this to work, as I feel it gives more
>> flexibility to the caller in the -EBUSY case. The caller can call
>> intel_guc_send_nb again while still holding the lock, or it can release
>> the lock and use a different form of flow control. Perhaps locking /
>> unlocking should be exposed via static inlines rather than the caller
>> directly manipulating the lock?
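>> E.g. a rough sketch (names made up; the irqsave variants may be needed
>> to match ctch_send()):
>>
>> 	static inline void intel_guc_ct_send_nb_lock(struct intel_guc_ct *ct)
>> 	{
>> 		spin_lock(&ct->send_lock);
>> 	}
>>
>> 	static inline void intel_guc_ct_send_nb_unlock(struct intel_guc_ct *ct)
>> 	{
>> 		spin_unlock(&ct->send_lock);
>> 	}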
>>
>>>> +
>>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +		ct->retry++;
>>>> +		if (ct->retry >= MAX_RETRY)
>>>> +			return -EDEADLK;
>>>> +		else
>>>> +			return -EBUSY;
>>>> +	}
>>>> +
>>>> +	ct->retry = 0;
>>>> +	err = ctb_write(ctb, action, len, 0, false, false);
>>>> +	if (unlikely(err))
>>>> +		return err;
>>>> +
>>>> +	intel_guc_notify(ct_to_guc(ct));
>>>> +	return 0;
>>>> +}
>>>> +
>>>> static int ctch_send(struct intel_guc_ct *ct,
>>>> 		     struct intel_guc_ct_channel *ctch,
>>>> 		     const u32 *action,
>>>> @@ -460,17 +507,35 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
>>>> 	GEM_BUG_ON(!response_buf && response_buf_size);
>>>> +	/*
>>>> +	 * We use a lazy spin wait loop here as we believe that if the CT
>>>> +	 * buffers are sized correctly the flow control condition should be
>>>> +	 * rare.
>>>> +	 */
>>>> +retry:
>>>> +	spin_lock_irqsave(&ct->send_lock, flags);
>>>> +	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>>>> +		spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> +		ct->retry++;
>>>> +		if (ct->retry >= MAX_RETRY)
>>>> +			return -EDEADLK;
>>>
>>> I'm not sure what's better: a secret deadlock that is hard to reproduce,
>>> or deadlocks that are easier to catch and help us get deadlock-clean
>>>
>>
>> This is covering the case where the GuC has died, to avoid deadlock.
>> Eventually we will have some GuC health check code that will trigger a
>> full GPU reset if the GuC has died. We need a way for code spinning on
>> the CTBs to exit.
>>
>> I've already tweaked this code locally a bit to use an atomic, with the
>> idea being that the GuC health check code can set this value to have all
>> code spinning on CTBs immediately return -EDEADLK when the GuC has died.
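>> Roughly (the 'dead' atomic is made up):
>>
>> 	if (unlikely(!ctb_has_room(desc, (len + 1) * 4))) {
>> 		/* set by the (future) GuC health check when the GuC dies */
>> 		if (atomic_read(&ct->dead))
>> 			return -EDEADLK;
>> 		return -EBUSY;
>> 	}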
>>
>>>> +		cpu_relax();
>>>> +		goto retry;
>>>> +	}
>>>> +
>>>> +	ct->retry = 0;
>>>> 	fence = ctch_get_next_fence(ctch);
>>>> 	request.fence = fence;
>>>> 	request.status = 0;
>>>> 	request.response_len = response_buf_size;
>>>> 	request.response_buf = response_buf;
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock(&ct->request_lock);
>>>> 	list_add_tail(&request.link, &ct->pending_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock(&ct->request_lock);
>>>> -	err = ctb_write(ctb, action, len, fence, !!response_buf);
>>>> +	err = ctb_write(ctb, action, len, fence, true, !!response_buf);
>>>> +	spin_unlock_irqrestore(&ct->send_lock, flags);
>>>> 	if (unlikely(err))
>>>> 		goto unlink;
>>>> @@ -501,9 +566,9 @@ static int ctch_send(struct intel_guc_ct *ct,
>>>> 	}
>>>> unlink:
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	list_del(&request.link);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	return err;
>>>> }
>>>> @@ -653,7 +718,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 	CT_DEBUG_DRIVER("CT: response fence %u status %#x\n", fence, status);
>>>> -	spin_lock(&ct->lock);
>>>> +	spin_lock(&ct->request_lock);
>>>> 	list_for_each_entry(req, &ct->pending_requests, link) {
>>>> 		if (unlikely(fence != req->fence)) {
>>>> 			CT_DEBUG_DRIVER("CT: request %u awaits response\n",
>>>> @@ -672,7 +737,7 @@ static int ct_handle_response(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 		found = true;
>>>> 		break;
>>>> 	}
>>>> -	spin_unlock(&ct->lock);
>>>> +	spin_unlock(&ct->request_lock);
>>>> 	if (!found)
>>>> 		DRM_ERROR("CT: unsolicited response %*ph\n", 4 * msglen, msg);
>>>> @@ -710,13 +775,13 @@ static bool ct_process_incoming_requests(struct  
>>>> intel_guc_ct *ct)
>>>> 	u32 *payload;
>>>> 	bool done;
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	request = list_first_entry_or_null(&ct->incoming_requests,
>>>> 					   struct ct_incoming_request, link);
>>>> 	if (request)
>>>> 		list_del(&request->link);
>>>> 	done = !!list_empty(&ct->incoming_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	if (!request)
>>>> 		return true;
>>>> @@ -777,9 +842,9 @@ static int ct_handle_request(struct intel_guc_ct  
>>>> *ct, const u32 *msg)
>>>> 	}
>>>> 	memcpy(request->msg, msg, 4 * msglen);
>>>> -	spin_lock_irqsave(&ct->lock, flags);
>>>> +	spin_lock_irqsave(&ct->request_lock, flags);
>>>> 	list_add_tail(&request->link, &ct->incoming_requests);
>>>> -	spin_unlock_irqrestore(&ct->lock, flags);
>>>> +	spin_unlock_irqrestore(&ct->request_lock, flags);
>>>> 	queue_work(system_unbound_wq, &ct->worker);
>>>> 	return 0;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h  
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> index 7c24d83f5c24..bc670a796bd8 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>>> @@ -62,8 +62,11 @@ struct intel_guc_ct {
>>>> 	struct intel_guc_ct_channel host_channel;
>>>> 	/* other channels are tbd */
>>>> -	/** @lock: protects pending requests list */
>>>> -	spinlock_t lock;
>>>> +	/** @request_lock: protects pending requests list */
>>>> +	spinlock_t request_lock;
>>>> +
>>>> +	/** @send_lock: protects h2g channel */
>>>> +	spinlock_t send_lock;
>>>> 	/** @pending_requests: list of requests waiting for response */
>>>> 	struct list_head pending_requests;
>>>> @@ -73,6 +76,9 @@ struct intel_guc_ct {
>>>> 	/** @worker: worker for handling incoming requests */
>>>> 	struct work_struct worker;
>>>> +
>>>> +	/** @retry: the number of times a H2G CTB has been retried */
>>>> +	u32 retry;
>>>> };
>>>> void intel_guc_ct_init_early(struct intel_guc_ct *ct);