[Intel-gfx] [RFC 10/12] cgroup/drm: Introduce weight based drm cgroup control
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Feb 2 14:26:06 UTC 2023
On 28/01/2023 01:11, Tejun Heo wrote:
> On Thu, Jan 12, 2023 at 04:56:07PM +0000, Tvrtko Ursulin wrote:
> ...
>> +	/*
>> +	 * 1st pass - reset working values and update hierarchical weights and
>> +	 * GPU utilisation.
>> +	 */
>> +	if (!__start_scanning(root, period_us))
>> +		goto out_retry; /*
>> +				 * Always come back later if scanner races with
>> +				 * core cgroup management. (Repeated pattern.)
>> +				 */
>> +
>> +	css_for_each_descendant_pre(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +		struct cgroup_subsys_state *css;
>> +		unsigned int over_weights = 0;
>> +		u64 unused_us = 0;
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out_retry;
>> +
>> +		/*
>> +		 * 2nd pass - calculate initial budgets, mark over budget
>> +		 * siblings and add up unused budget for the group.
>> +		 */
>> +		css_for_each_child(css, &drmcs->css) {
>> +			struct drm_cgroup_state *sibling = css_to_drmcs(css);
>> +
>> +			if (!css_tryget_online(css)) {
>> +				css_put(node);
>> +				goto out_retry;
>> +			}
>> +
>> +			sibling->per_s_budget_us =
>> +				DIV_ROUND_UP_ULL(drmcs->per_s_budget_us *
>> +						 sibling->weight,
>> +						 drmcs->sum_children_weights);
>> +
>> +			sibling->over = sibling->active_us >
>> +					sibling->per_s_budget_us;
>> +			if (sibling->over)
>> +				over_weights += sibling->weight;
>> +			else
>> +				unused_us += sibling->per_s_budget_us -
>> +					     sibling->active_us;
>> +
>> +			css_put(css);
>> +		}
>> +
>> +		/*
>> +		 * 3rd pass - spread unused budget according to relative weights
>> +		 * of over budget siblings.
>> +		 */
>> +		css_for_each_child(css, &drmcs->css) {
>> +			struct drm_cgroup_state *sibling = css_to_drmcs(css);
>> +
>> +			if (!css_tryget_online(css)) {
>> +				css_put(node);
>> +				goto out_retry;
>> +			}
>> +
>> +			if (sibling->over) {
>> +				u64 budget_us =
>> +					DIV_ROUND_UP_ULL(unused_us *
>> +							 sibling->weight,
>> +							 over_weights);
>> +				sibling->per_s_budget_us += budget_us;
>> +				sibling->over = sibling->active_us >
>> +						sibling->per_s_budget_us;
>> +			}
>> +
>> +			css_put(css);
>> +		}
>> +
>> +		css_put(node);
>> +	}
>> +
>> +	/*
>> +	 * 4th pass - send out over/under budget notifications.
>> +	 */
>> +	css_for_each_descendant_post(node, &root->css) {
>> +		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
>> +
>> +		if (!css_tryget_online(node))
>> +			goto out_retry;
>> +
>> +		if (drmcs->over || drmcs->over_budget)
>> +			signal_drm_budget(drmcs,
>> +					  drmcs->active_us,
>> +					  drmcs->per_s_budget_us);
>> +		drmcs->over_budget = drmcs->over;
>> +
>> +		css_put(node);
>> +	}
>
> It keeps bothering me that the distribution logic has no memory. Maybe this
> is good enough for coarse control with long cycle durations but it likely
> will get in trouble if pushed to finer grained control. State keeping
> doesn't require a lot of complexity. The only state that needs tracking is
> each cgroup's vtime and then the core should be able to tell specific
> drivers how much each cgroup is over or under fairly accurately at any given
> time.
>
> That said, this isn't a blocker. What's implemented can work well enough
> with coarse enough time grain and that might be enough for the time being
> and we can get back to it later. I think Michal already mentioned it but it
> might be a good idea to track active and inactive cgroups and build the
> weight tree with only active ones. There are machines with a lot of mostly
> idle cgroups (> tens of thousands) and tree wide scanning even at low
> frequency can become a pretty bad bottleneck.
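
To make the vtime suggestion above concrete, here is a minimal userspace sketch of what per-cgroup vtime tracking could look like. It is not taken from the patch or from any existing controller; struct vt_group, VT_SCALE, vt_charge(), vt_fair_advance() and vt_lag() are all hypothetical names, and the scaling by a default weight of 100 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scale factor: a group with this weight sees vtime == real time. */
#define VT_SCALE 100ULL

/* Hypothetical per-cgroup state; the only thing carried between periods. */
struct vt_group {
	unsigned int weight;	/* relative share, must be non-zero */
	uint64_t vtime;		/* weighted GPU time consumed so far */
};

/* Charge used_us of GPU time; heavier groups advance vtime more slowly. */
static void vt_charge(struct vt_group *g, uint64_t used_us)
{
	assert(g->weight > 0);
	g->vtime += used_us * VT_SCALE / g->weight;
}

/*
 * How far the "fair" vtime moves when total_used_us of GPU time was
 * consumed by siblings whose weights add up to sum_weights.
 */
static uint64_t vt_fair_advance(uint64_t total_used_us, uint64_t sum_weights)
{
	assert(sum_weights > 0);
	return total_used_us * VT_SCALE / sum_weights;
}

/* Positive: the group is ahead of its share (over); negative: under. */
static int64_t vt_lag(const struct vt_group *g, uint64_t fair_vtime)
{
	return (int64_t)g->vtime - (int64_t)fair_vtime;
}
```

A caller would keep one running fair vtime per parent, add vt_fair_advance() to it each period, and compare each child against it with vt_lag(). Because vtime is never reset, a group that overshoots in one period is still seen as ahead in the next, which is the memory the stateless passes lack.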

Right, that's the kind of experience (tens of thousands) I was missing,
thank you. Another item for my TODO list then, but I have a question
first.

When you say active/inactive, what are you referring to in the cgroup
world? Offline/online? My understanding was that offline is a
temporary state while the css is being destroyed.

Also, I am postponing implementing those changes until I hear at
least something from the DRM community.

Regards,
Tvrtko
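
For reference, the arithmetic of the 2nd and 3rd passes in the quoted patch can be tried out in isolation with the userspace sketch below. It assumes a flat array of siblings instead of css iteration and drops all reference counting; struct sibling, div_round_up() and distribute() are hypothetical stand-ins, with only the field names borrowed from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened stand-in for the per-cgroup state in the patch. */
struct sibling {
	unsigned int weight;
	uint64_t active_us;		/* GPU time used during the period */
	uint64_t per_s_budget_us;	/* budget computed by distribute() */
	int over;			/* non-zero if active_us exceeds budget */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Split parent_budget_us between n siblings, then hand out the slack. */
static void distribute(uint64_t parent_budget_us, struct sibling *s, size_t n)
{
	uint64_t sum_weights = 0, unused_us = 0;
	unsigned int over_weights = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum_weights += s[i].weight;
	if (!sum_weights)
		return;

	/* 2nd pass: initial budgets, mark over budget, add up unused. */
	for (i = 0; i < n; i++) {
		s[i].per_s_budget_us =
			div_round_up(parent_budget_us * s[i].weight,
				     sum_weights);
		s[i].over = s[i].active_us > s[i].per_s_budget_us;
		if (s[i].over)
			over_weights += s[i].weight;
		else
			unused_us += s[i].per_s_budget_us - s[i].active_us;
	}

	/* 3rd pass: spread unused budget by weight among over budget ones. */
	for (i = 0; i < n; i++) {
		if (!s[i].over)
			continue;
		/* over_weights is non-zero here since s[i] contributed to it. */
		s[i].per_s_budget_us +=
			div_round_up(unused_us * s[i].weight, over_weights);
		s[i].over = s[i].active_us > s[i].per_s_budget_us;
	}
}
```

With a parent budget of 1000000 us and siblings weighted 100, 100 and 200 that used 100000, 400000 and 600000 us, the initial budgets are 250000, 250000 and 500000 us. The first sibling leaves 150000 us unused, which is split 1:2 between the other two, so the second ends at 300000 us and remains over budget while the third ends at 600000 us and no longer is.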