[Intel-gfx] [bug report] drm/i915/guc: Implement multi-lrc submission
Dan Carpenter
dan.carpenter at oracle.com
Fri Sep 23 09:32:39 UTC 2022
On Thu, Sep 22, 2022 at 01:01:48PM -0700, John Harrison wrote:
> On 9/22/2022 07:26, Dan Carpenter wrote:
> > Hello Matthew Brost,
> >
> > The patch 6b540bf6f143: "drm/i915/guc: Implement multi-lrc
> > submission" from Oct 14, 2021, leads to the following Smatch static
> > checker warning:
> >
> > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:752 __guc_add_request()
> > warn: refcount leak 'ce->ref.refcount.refs.counter': lines='752'
> >
> > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > 672 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > 673 {
> > 674 int err = 0;
> > 675 struct intel_context *ce = request_to_scheduling_context(rq);
> > 676 u32 action[3];
> > 677 int len = 0;
> > 678 u32 g2h_len_dw = 0;
> > 679 bool enabled;
> > 680
> > 681 lockdep_assert_held(&rq->engine->sched_engine->lock);
> > 682
> > 683 /*
> > 684 * Corner case where requests were sitting in the priority list or a
> > 685 * request resubmitted after the context was banned.
> > 686 */
> > 687 if (unlikely(intel_context_is_banned(ce))) {
> > 688 i915_request_put(i915_request_mark_eio(rq));
> > 689 intel_engine_signal_breadcrumbs(ce->engine);
> > 690 return 0;
> > 691 }
> > 692
> > 693 GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
> > 694 GEM_BUG_ON(context_guc_id_invalid(ce));
> > 695
> > 696 if (context_policy_required(ce)) {
> > 697 err = guc_context_policy_init_v70(ce, false);
> > 698 if (err)
> > 699 return err;
> > 700 }
> > 701
> > 702 spin_lock(&ce->guc_state.lock);
> > 703
> > 704 /*
> > 705 * The request / context will be run on the hardware when scheduling
> > 706 * gets enabled in the unblock. For multi-lrc we still submit the
> > 707 * context to move the LRC tails.
> > 708 */
> > 709 if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce)))
> > 710 goto out;
> > 711
> > 712 enabled = context_enabled(ce) || context_blocked(ce);
> > 713
> > 714 if (!enabled) {
> > 715 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > 716 action[len++] = ce->guc_id.id;
> > 717 action[len++] = GUC_CONTEXT_ENABLE;
> > 718 set_context_pending_enable(ce);
> > 719 intel_context_get(ce);
> > 720 g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
> > 721 } else {
> > 722 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > 723 action[len++] = ce->guc_id.id;
> > 724 }
> > 725
> > 726 err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> > 727 if (!enabled && !err) {
> > 728 trace_intel_context_sched_enable(ce);
> > 729 atomic_inc(&guc->outstanding_submission_g2h);
> > 730 set_context_enabled(ce);
> > 731
> > 732 /*
> > 733 * Without multi-lrc KMD does the submission step (moving the
> > 734 * lrc tail) so enabling scheduling is sufficient to submit the
> > 735 * context. This isn't the case in multi-lrc submission as the
> > 736 * GuC needs to move the tails, hence the need for another H2G
> > 737 * to submit a multi-lrc context after enabling scheduling.
> > 738 */
> > 739 if (intel_context_is_parent(ce)) {
> > 740 action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > 741 err = intel_guc_send_nb(guc, action, len - 1, 0);
> >
> > Should this have a something like:
> >
> > if (err)
> > intel_context_put(ce);
> No.
>
> The ce has been marked as enabled above because the actual enable call
> succeeded.? This is a secondary call to 'poke' the context. If this fails,
> the context might not get scheduled in a timely manner but the tracking
> state is still correct. The context is now in use by GuC and therefore must
> be reference locked. And the 'context_enabled' flag is set on the i915 side
> to note that the reference count is held.
>
> If you want to unwind at this point, you would need to send a
> CONTEXT_MODE_SET(DISABLE) H2G message (amongst other things) and only once
> that call has completed, can you call intel_context_put(ce).
>
> I don't get why the analyser would be claiming a leak here. I'm not sure if
> it is just that the analyser is confused or if there is some other potential
> route to a leak. Is it possible to get more details as to how it thinks the
> leak can occur?
Thanks, this helps me a lot!
This is a Smatch static analysis thing I'm working on. It simple enough
to silence the false positive. I add this line which says that after
set_context_enabled() then ignore the reference counting.
add_function_param_key_hook("set_context_enabled", &match_ignore,
0, "$->ref.refcount.refs.counter", NULL);
The heuristic that the check is using is that if some error paths drop
the refcount then all error paths should do it. So adding the if
statement I suggested would silence the warning (but introduce a bug in
the kernel).
regards,
dan carpenter
More information about the Intel-gfx
mailing list