[RFC PATCH 1/1] drm/mm: add ig_frag selftest

Tue Jun 2 14:13:43 UTC 2020

Hi Christian,

On 6/2/20 2:47 PM, Christian König wrote:
> Nirmoy please keep in mind that your current implementation doesn't 
> fully solve the issue the test case is exercising.
>
> In other words what you have implement is fast skipping of fragmented 
> address space for bottom-up and top-down.
>
> But what this test here exercises is the fast skipping of aligned 
> allocations. You should probably adjust the test case a bit.

Allocations with size=4k and aign = 8k is known to introduce 
fragmentation, do you mean I should only test bottom-up and top-down

for now ?

Regards,

Nirmoy

>
>
> Regards,
> Christian.
>
> Am 29.05.20 um 23:01 schrieb Nirmoy:
>>
>> On 5/29/20 5:52 PM, Chris Wilson wrote:
>>> Quoting Nirmoy (2020-05-29 16:40:53)
>>>> This works correctly most of the times but sometimes
>>
>>
>> I have to take my word back. In another machine,  20k insertions in
>>
>> best mode takes 6-9 times more than 10k insertions, all most all the 
>> time.
>>
>> evict, bottom-up and top-down modes remains in 2-5 times range.
>>
>>
>> If I reduce the insertions to 1k and 2k then scaling factor for best 
>> mode stays  below 4 most of the time.
>>
>> evict, bottom-up and top-down modes remains in 2-3 times range.
>>
>>
>> I wonder if it makes sense to test with only 1k and 2k insertions and 
>> tolerate more than error if the mode == best.
>>
>> Regards,
>>
>> Nirmoy
>>
>>>>
>>>> 20k insertions can take more than 8 times of 10k insertion time.
>>> The pressure is on to improve then :)
>>>
>>>> Regards,
>>>>
>>>> Nirmoy
>>>>
>>>> On 5/29/20 6:33 PM, Nirmoy Das wrote:
>>>>> This patch introduces fragmentation in the address range
>>>>> and measures time taken by 10k and 20k insertions. ig_frag()
>>>>> will fail if time taken by 20k insertions takes more than 4 times
>>>>> of 10k insertions as we know that insertions scale quadratically.
>>>>> Also tolerate 10% error because of kernel scheduler's jitters.
>>>>>
>>>>> Output:
>>>>> <snip>
>>>>> [ 8092.653518] drm_mm: Testing DRM range manger (struct drm_mm), 
>>>>> with random_seed=0x9bfb4117 max_iterations=8192 max_prime=128
>>>>> [ 8092.653520] drm_mm: igt_sanitycheck - ok!
>>>>> [ 8092.653525] igt_debug 0x0000000000000000-0x0000000000000200: 
>>>>> 512: free
>>>>> [ 8092.653526] igt_debug 0x0000000000000200-0x0000000000000600: 
>>>>> 1024: used
>>>>> [ 8092.653527] igt_debug 0x0000000000000600-0x0000000000000a00: 
>>>>> 1024: free
>>>>> [ 8092.653528] igt_debug 0x0000000000000a00-0x0000000000000e00: 
>>>>> 1024: used
>>>>> [ 8092.653529] igt_debug 0x0000000000000e00-0x0000000000001000: 
>>>>> 512: free
>>>>> [ 8092.653529] igt_debug total: 4096, used 2048 free 2048
>>>>> [ 8112.569813] drm_mm: best fragmented insert of 10000 and 20000 
>>>>> insertions took 504 and 1996 msecs
>>>>> [ 8112.723254] drm_mm: bottom-up fragmented insert of 10000 and 
>>>>> 20000 insertions took 44 and 108 msecs
>>>>> [ 8112.813212] drm_mm: top-down fragmented insert of 10000 and 
>>>>> 20000 insertions took 40 and 44 msecs
>>>>> [ 8112.847733] drm_mm: evict fragmented insert of 10000 and 20000 
>>>>> insertions took 8 and 20 msecs
>>>>> <snip>
>>>>>
>>>>> Signed-off-by: Nirmoy Das <nirmoy.das at amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/selftests/drm_mm_selftests.h |  1 +
>>>>>    drivers/gpu/drm/selftests/test-drm_mm.c      | 73 
>>>>> ++++++++++++++++++++
>>>>>    2 files changed, 74 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/selftests/drm_mm_selftests.h 
>>>>> b/drivers/gpu/drm/selftests/drm_mm_selftests.h
>>>>> index 6b943ea1c57d..8c87c964176b 100644
>>>>> --- a/drivers/gpu/drm/selftests/drm_mm_selftests.h
>>>>> +++ b/drivers/gpu/drm/selftests/drm_mm_selftests.h
>>>>> @@ -14,6 +14,7 @@ selftest(insert, igt_insert)
>>>>>    selftest(replace, igt_replace)
>>>>>    selftest(insert_range, igt_insert_range)
>>>>>    selftest(align, igt_align)
>>>>> +selftest(frag, igt_frag)
>>>>>    selftest(align32, igt_align32)
>>>>>    selftest(align64, igt_align64)
>>>>>    selftest(evict, igt_evict)
>>>>> diff --git a/drivers/gpu/drm/selftests/test-drm_mm.c 
>>>>> b/drivers/gpu/drm/selftests/test-drm_mm.c
>>>>> index 9aabe82dcd3a..05d8f3659b4d 100644
>>>>> --- a/drivers/gpu/drm/selftests/test-drm_mm.c
>>>>> +++ b/drivers/gpu/drm/selftests/test-drm_mm.c
>>>>> @@ -1033,6 +1033,79 @@ static int igt_insert_range(void *ignored)
>>>>>        return 0;
>>>>>    }
>>>>>    +static int get_insert_time(unsigned int num_insert,
>>>>> +                        const struct insert_mode *mode)
>>>>> +{
>>>>> +     struct drm_mm mm;
>>>>> +     struct drm_mm_node *nodes, *node, *next;
>>>>> +     unsigned int size = 4096, align = 8192;
>>>>> +     unsigned long start;
>>>>> +     unsigned int i;
>>>>> +     int ret = -EINVAL;
>>>>> +
>>>>> +     drm_mm_init(&mm, 1, U64_MAX - 2);
>>>>> +     nodes = vzalloc(array_size(num_insert, sizeof(*nodes)));
>>>>> +     if (!nodes)
>>>>> +             goto err;
>>>>> +
>>>>> +     start = jiffies;
>>> Use ktime_t start = ktime_now();
>>>
>>>>> +     for (i = 0; i < num_insert; i++) {
>>>>> +             if (!expect_insert(&mm, &nodes[i], size, align, i, 
>>>>> mode)) {
>>>>> +                     pr_err("%s insert failed\n", mode->name);
>>>>> +                     goto out;
>>>>> +             }
>>>>> +     }
>>>>> +
>>>>> +     ret = jiffies_to_msecs(jiffies - start);
>>> ret = ktime_sub(ktime_now(), start);
>>>
>>> The downside to using ktime is remembering it is s64 and so requires 
>>> care
>>> and attention in doing math.
>>>
>>>>> +out:
>>>>> +     drm_mm_for_each_node_safe(node, next, &mm)
>>>>> +             drm_mm_remove_node(node);
>>>>> +     drm_mm_takedown(&mm);
>>>>> +     vfree(nodes);
>>>>> +err:
>>>>> +     return ret;
>>>>> +
>>>>> +}
>>>>> +
>>>>> +static int igt_frag(void *ignored)
>>>>> +{
>>>>> +     const struct insert_mode *mode;
>>>>> +     unsigned int insert_time1, insert_time2;
>>>>> +     unsigned int insert_size = 10000;
>>>>> +     unsigned int scale_factor = 4;
>>>>> +     /* tolerate 10% excess insertion duration */
>>>>> +     unsigned int error_factor = 110;
>>>>> +     int ret = -EINVAL;
>>>>> +
>>>>> +     for (mode = insert_modes; mode->name; mode++) {
>>>>> +             unsigned int expected_time;
>>>>> +
>>>>> +             insert_time1 = get_insert_time(insert_size, mode);
>>>>> +             if (insert_time1 < 0)
>>>>> +                     goto err;
>>> Ah, can you propagate the actual error. I see you are returning EINVAL
>>> for ENOMEM errors. Just wait until it hits and you have to debug why :)
>>>
>>>>> +             insert_time2 = get_insert_time((insert_size * 2), 
>>>>> mode);
>>>>> +             if (insert_time2 < 0)
>>>>> +                     goto err;
>>>>> +
>>>>> +             expected_time = (scale_factor * insert_time1 *
>>>>> +                              error_factor)/100;
>>>>> +             if (insert_time2 > expected_time) {
>>>>> +                     pr_err("%s fragmented insert took more %u 
>>>>> msecs\n",
>>>>> +                            mode->name, insert_time2 - 
>>>>> expected_time);
>>>>> +                     goto err;
>>>>> +             }
>>>>> +
>>>>> +             pr_info("%s fragmented insert of %u and %u 
>>>>> insertions took %u and %u msecs\n",
>>>>> +                     mode->name, insert_size, insert_size * 2, 
>>>>> insert_time1,
>>>>> +                     insert_time2);
>>> Put the info first before the error. We always want the full details,
>>> with the error message explaining why it's unhappy.
>>> -Chris
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&data=02%7C01%7Cnirmoy.das%40amd.com%7C5c7df129b9cf44b3ae4008d803e84445%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637263643415833767&sdata=PrCQse4nhN0ZITT9OniuHhF7A5uxJD6ehk0PMjm7WMU%3D&reserved=0 
>>>
>