[igt-dev] [PATCH i-g-t v3 1/2] lib/igt_gt: Check for shared reset domain
John Harrison
john.c.harrison at intel.com
Wed Jan 19 19:37:44 UTC 2022
On 1/19/2022 11:32, Matt Roper wrote:
> On Wed, Jan 19, 2022 at 11:09:00AM -0800, John Harrison wrote:
>> On 1/18/2022 14:32, Matt Roper wrote:
>>> On Mon, Jan 17, 2022 at 12:20:42PM -0800, Dixit, Ashutosh wrote:
>>>> On Mon, 17 Jan 2022 00:56:59 -0800, <priyanka.dandamudi at intel.com> wrote:
>>>>> +bool has_shared_reset_domain(int fd, const intel_ctx_t *ctx)
>>>>> +{
>>>>> +	const struct intel_execution_engine2 *e;
>>>>> +	bool rcs0 = false;
>>>>> +	bool ccs0 = false;
>>>>> +	int ccs_count = 0;
>>>>> +
>>>>> +	for_each_ctx_engine(fd, ctx, e) {
>>>>> +		if ((rcs0 && ccs0) || (ccs_count > 1))
>>>>> +			break;
>>>>> +		else if (e->class == I915_ENGINE_CLASS_RENDER)
>>>>> +			rcs0 = true;
>>>>> +		else if (e->class == I915_ENGINE_CLASS_COMPUTE) {
>>>>> +			ccs0 = true;
>>>>> +			ccs_count++;
>>>>> +		}
>>>>> +	}
>>>>> +	return ((rcs0 && ccs0) || (ccs_count > 1));
>>>>> +}
>>>> No need for bool, just use counts. Something like:
>>>>
>>>> bool has_shared_reset_domain(int fd, const intel_ctx_t *ctx)
>>>> {
>>>> 	const struct intel_execution_engine2 *e;
>>>> 	int rcs = 0, ccs = 0;
>>>>
>>>> 	for_each_ctx_engine(fd, ctx, e) {
>>>> 		if (e->class == I915_ENGINE_CLASS_RENDER)
>>>> 			rcs++;
>>>> 		else if (e->class == I915_ENGINE_CLASS_COMPUTE)
>>>> 			ccs++;
>>>> 	}
>>>>
>>>> 	return ((rcs && ccs) || (ccs >= 2));
>>>> }
>>>>
>>>> Hmm, this can just be:
>>>>
>>>> bool has_shared_reset_domain(int fd, const intel_ctx_t *ctx)
>>>> {
>>>> 	const struct intel_execution_engine2 *e;
>>>> 	int count = 0;
>>>>
>>>> 	for_each_ctx_engine(fd, ctx, e)
>>>> 		if (e->class == I915_ENGINE_CLASS_RENDER ||
>>>> 		    e->class == I915_ENGINE_CLASS_COMPUTE)
>>>> 			count++;
>>>>
>>>> 	return count >= 2;
>>>> }
>>> Yeah, there's no reason to count RCS and CCS separately; all we care
>>> about is whether there's more than one engine in the shared reset
>>> domain.
>>>
>>> However I think any approach that involves just counting engines from
>>> userspace isn't really going to work once we start supporting multi-tile
>>> since the various RCS/CCS engines on two separate GTs do belong to
>>> separate reset domains. E.g., if you query the engine list and see just
>>> CCS0 and CCS1 (or even RCS0 and CCS0) you don't know whether those
>>> engines are both from a single GT and thus share a reset domain, or
>>> whether they come from different GTs and each has its own reset domain.
>> There is a query API to determine which engine is on which tile, isn't
>> there? The distance thing? If CCS0 and CCS1 have a distance of zero they are
>> on the same tile and dependent, otherwise they are on different tiles and
>> independent?
> If I recall correctly, the distance query is only for determining the
> distance of an engine from a specific memory region. So if you had a
> four-tile layout like
>
> 0 - 1
> | |
> 2 - 3
>
> some pairs of tiles would be the same distance from any single memory
> region you choose. You'd potentially have to compare the distance to
> multiple memory regions to figure out whether the engines truly
> originated from the same tile or not. So I guess it's doable (if/when
> the distance query arrives), but kind of ugly. It seems like userspace
> would rather have a more straightforward way to determine this, such as
> a reset ID (integer) associated with every engine. If two engines have
> the same reset ID, userspace would know they're tied together without
> jumping through any extra hoops. I don't know what the UMD plans are in
> this area though.
Yeah, there was a policy of jumping through as many hoops as possible to
hide multi-tile hardware from UMDs. I think we have moved away from that
now? So maybe we can just report the tile id for a given engine? I think
there are valid UMD reasons for wanting to know that - load balancing of
compute workloads, for example. But maybe those usages are still tied to
memory regions anyway? Either way, I don't think there is a valid reason
for UMDs to need to know about engine reset dependencies. So I can't see
any API to expose that being approved. It would have to be a debugfs
interface only. But if we can expose the tile id, that would be sufficient.
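
As a rough sketch of where that could go, a per-pair check taking two
engines might look something like the code below. This is purely
illustrative: the tile_id field, the sketch_* names and the helper itself
are hypothetical and do not exist in the i915 uAPI or in IGT. It assumes,
as discussed earlier in the thread, that RCS and CCS engines on the same
tile share a reset domain and that engines on different tiles do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the engine classes; not the real uAPI values. */
enum sketch_engine_class {
	SKETCH_CLASS_RENDER,
	SKETCH_CLASS_COPY,
	SKETCH_CLASS_VIDEO,
	SKETCH_CLASS_COMPUTE,
};

/* Hypothetical engine description, assuming a tile id were reported. */
struct sketch_engine {
	enum sketch_engine_class engine_class;
	uint16_t instance;
	uint16_t tile_id;	/* assumption: exposed by the kernel in some form */
};

/* Assumption: only RCS and CCS engines are part of the shared domain. */
static bool sketch_in_rcs_ccs_domain(const struct sketch_engine *e)
{
	return e->engine_class == SKETCH_CLASS_RENDER ||
	       e->engine_class == SKETCH_CLASS_COMPUTE;
}

/* True if resetting one of the two engines would also affect the other. */
static bool sketch_engines_share_reset_domain(const struct sketch_engine *a,
					       const struct sketch_engine *b)
{
	assert(a && b);

	return a->tile_id == b->tile_id &&
	       sketch_in_rcs_ccs_domain(a) &&
	       sketch_in_rcs_ccs_domain(b);
}
```

A test could then ask about the specific pair of engines it is about to
reset, rather than asking a context-wide question.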
John.
>
> Matt
>
>> However, I wonder if this is the right approach at all. What should a test
>> do on a multi-tile system? Test each tile independently but skip dependent
>> engines? Test them together with RCS of one tile vs RCS of the other? Just
>> skip everything? Seems like it is likely to be a different requirement for
>> different tests. So maybe the API is going to need to take two engines
>> as a parameter and say whether those two specific engines are shared or not?
>> It's all getting extremely messy.
>>
>> John.
>>
>>> Matt
>>>