[Intel-gfx] [PATCH 05/16] drm/i915: Add unit tests for the breadcrumb rbtree, wakeups

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Dec 9 09:33:41 UTC 2016


On 08/12/2016 21:04, Chris Wilson wrote:
> On Thu, Dec 08, 2016 at 05:38:34PM +0000, Tvrtko Ursulin wrote:
>>
>> On 07/12/2016 13:58, Chris Wilson wrote:
>>> Third retroactive test, make sure that the seqno waiters are woken.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>> drivers/gpu/drm/i915/intel_breadcrumbs.c | 171 +++++++++++++++++++++++++++++++
>>> 1 file changed, 171 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> index fc950f7ff322..1374a54e41c9 100644
>>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> @@ -966,11 +966,182 @@ static int igt_insert_complete(void *ignore)
>>> 	return err;
>>> }
>>>
>>> +struct igt_wakeup {
>>> +	struct task_struct *tsk;
>>> +	atomic_t *ready, *set, *done;
>>> +	struct intel_engine_cs *engine;
>>> +	unsigned long flags;
>>> +	wait_queue_head_t *wq;
>>> +	u32 seqno;
>>> +};
>>> +
>>> +static int wait_atomic(atomic_t *p)
>>> +{
>>> +	schedule();
>>> +	return 0;
>>> +}
>>> +
>>> +static int wait_atomic_timeout(atomic_t *p)
>>> +{
>>> +	return schedule_timeout(10 * HZ) ? 0 : -ETIMEDOUT;
>>> +}
>>> +
>>> +static int igt_wakeup_thread(void *arg)
>>> +{
>>> +	struct igt_wakeup *w = arg;
>>> +	struct intel_wait wait;
>>> +
>>> +	while (!kthread_should_stop()) {
>>> +		DEFINE_WAIT(ready);
>>> +
>>> +		for (;;) {
>>> +			prepare_to_wait(w->wq, &ready, TASK_INTERRUPTIBLE);
>>> +			if (atomic_read(w->ready) == 0)
>>> +				break;
>>> +
>>> +			schedule();
>>> +		}
>>> +		finish_wait(w->wq, &ready);
>>
>> Have to say this is the first time I've learnt about
>> wake_up_atomic_t & co. You couldn't use wait_on_atomic_t instead of
>> the loop above?
>
> No, there's only wake_up_atomic_t and not wake_up_all_atomic_t. That was
> a few hours of staring before I realised the test was failing because
> only the first waiter was being woken.

I meant wait_on_atomic_t, not wake_up. Is that one still a problem? It is a 
bit magic in how it works and I have not dug into the details yet, so 
possibly I am wrong.
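
For reference, the declarations in question are roughly (quoting from
memory, so take with a pinch of salt):

	/* Sleep, via @action, until atomic_read(@val) reaches zero. */
	int wait_on_atomic_t(atomic_t *val,
			     int (*action)(atomic_t *), unsigned mode);

	/* Wake up waiters sleeping in wait_on_atomic_t() on @p. */
	void wake_up_atomic_t(atomic_t *p);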

>>
>>> +		if (atomic_dec_and_test(w->set))
>>> +			wake_up_atomic_t(w->set);
>>> +
>
>> Okay, all the threads have observed that all other threads have been
>> started, yes?
>
> Yes. We can do the wake_up_atomic_t after each (because the other side
> will just go back to sleep until w->set is zero), but since we have to do
> the atomic operation anyway we may as well use the result to only do the
> wakeup once.
>
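
Right, so as sketches (mine, not from the patch), the two options are:

	/* wake after every decrement; the sole waiter, sleeping in
	 * igt_wake_all_sync(), re-checks and goes back to sleep
	 * until w->set reaches zero */
	atomic_dec(w->set);
	wake_up_atomic_t(w->set);

versus what the patch does:

	/* only the thread that brings w->set to zero issues the wakeup */
	if (atomic_dec_and_test(w->set))
		wake_up_atomic_t(w->set);

Makes sense.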
>>> +		if (test_bit(0, &w->flags))
>>> +			break;
>>
>> One thread failed to start = bailout.
>
> That's for the synchronisation on exit. Required because the caller
> frees the structs. If I make the allocation per-thread and free it in
> the worker, we could have a simpler while (!kthread_should_stop()) loop,
> I think.

Doesn't matter to me hugely, as long as there are some high-level 
comments on what is happening.
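
Something like this is what I imagine you mean (hypothetical sketch of
the per-thread allocation, untested):

	static int igt_wakeup_thread(void *arg)
	{
		struct igt_wakeup *w = arg;

		while (!kthread_should_stop()) {
			/* ... wait for ready, wait on the seqno and
			 * signal done, exactly as in the patch ... */
		}

		/* the worker owns and frees its own state, so the
		 * caller no longer needs the stop bit plus the final
		 * wakeup dance before kfree */
		kfree(w);
		return 0;
	}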

>
>> Do you intend to use the flags for something more which precludes a
>> more descriptive name for its single purpose here?
>
> No, but I typically call my bitmasks flags, unless there is a clear
> reason for another name. In this case it is because I couldn't use the
> kthread->flags for my purposes.

My point is that it could have been "bool stop_thread" (or something) and 
then I wouldn't have to double-check which bits are used and for what.
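
For example (hypothetical), either:

	#define IGT_WAKEUP_STOP 0

		if (test_bit(IGT_WAKEUP_STOP, &w->flags))
			break;

or a plain "bool stop" written with WRITE_ONCE() and read with
READ_ONCE(), if the atomic bit ops are not actually needed.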

>
>>> +
>>> +		intel_wait_init(&wait, w->seqno);
>>> +		intel_engine_add_wait(w->engine, &wait);
>>> +		for (;;) {
>>> +			set_current_state(TASK_UNINTERRUPTIBLE);
>>> +			if (i915_seqno_passed(intel_engine_get_seqno(w->engine),
>>> +					      w->seqno))
>>> +				break;
>>> +
>>> +			schedule();
>>> +		}
>>> +		intel_engine_remove_wait(w->engine, &wait);
>>> +		__set_current_state(TASK_RUNNING);
>>> +
>>> +		if (atomic_dec_and_test(w->done))
>>> +			wake_up_atomic_t(w->done);
>>> +	}
>>> +
>>> +	if (atomic_dec_and_test(w->done))
>>> +		wake_up_atomic_t(w->done);
>>
>> Must be a special reason done is decremented in the loop and outside
>> the loop?
>
> Synchronisation with the kfree required the wakeup, but I couldn't find
> a pleasant contortion to only do the wakeup(w->done) once. As above, I
> could change the allocation to avoid the sync before the kfree.

I don't see how this last decrement does not underflow. Assuming all the 
threads have decremented in the loop above, you then exit and decrement 
once more. Or maybe the value doesn't matter at that point since it will 
get overwritten in the next iteration, hm. It is confusing.

>>
>>> +	return 0;
>>> +}
>>> +
>>> +static void igt_wake_all_sync(atomic_t *ready,
>>> +			      atomic_t *set,
>>> +			      atomic_t *done,
>>> +			      wait_queue_head_t *wq,
>>> +			      int count)
>>> +{
>>> +	atomic_set(set, count);
>>> +	atomic_set(done, count);
>>> +
>>> +	atomic_set(ready, 0);
>>> +	wake_up_all(wq);
>>> +
>>> +	wait_on_atomic_t(set, wait_atomic, TASK_UNINTERRUPTIBLE);
>>> +	atomic_set(ready, count);
>>> +}
>>> +
>>> +static int igt_wakeup(void *ignore)
>>> +{
>>> +	const int state = TASK_UNINTERRUPTIBLE;
>>> +	struct intel_engine_cs *engine;
>>> +	struct igt_wakeup *waiters;
>>> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>>> +	const int count = 4096;
>>> +	const u32 max_seqno = count / 4;
>>> +	atomic_t ready, set, done;
>>> +	int err = -ENOMEM;
>>> +	int n, step;
>>> +
>>> +	engine = mock_engine("mock");
>>> +	if (!engine)
>>> +		goto out;
>>> +
>>> +	waiters = drm_malloc_gfp(count, sizeof(*waiters), GFP_TEMPORARY);
>>> +	if (!waiters)
>>> +		goto out_engines;
>>> +
>>> +	atomic_set(&ready, count);
>>> +	for (n = 0; n < count; n++) {
>>> +		waiters[n].wq = &wq;
>>> +		waiters[n].ready = &ready;
>>> +		waiters[n].set = &set;
>>> +		waiters[n].done = &done;
>>> +		waiters[n].engine = engine;
>>> +		waiters[n].flags = 0;
>>> +
>>> +		waiters[n].tsk = kthread_run(igt_wakeup_thread, &waiters[n],
>>> +					     "i915/igt:%d", n);
>>> +		if (IS_ERR(waiters[n].tsk))
>>> +			goto out_waiters;
>>> +
>>> +		get_task_struct(waiters[n].tsk);
>>> +	}
>>> +
>>
>> It is time to start documenting the tests, I think, via nice comments
>> at strategic places. Probably a short commentary on the test as a
>> whole and then separately at interesting steps.
>>
>>> +	for (step = 1; step <= max_seqno; step <<= 1) {
>>> +		u32 seqno;
>>> +
>>> +		for (n = 0; n < count; n++)
>>> +			waiters[n].seqno = 1 + get_random_int() % max_seqno;
>>
>> So you have 4096 waiters but some are waiting on the same seqno,
>> since there are at most 1024 unique seqnos. Took me a while to
>> figure this one out.
>
> Yup.
>
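
(With count = 4096 and max_seqno = count / 4 = 1024, that is four
waiters per seqno on average.)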
>>> +
>>> +		mock_seqno_advance(engine, 0);
>>> +		igt_wake_all_sync(&ready, &set, &done, &wq, count);
>>> +
>>> +		for (seqno = 1; seqno <= max_seqno + step; seqno += step) {
>>
>> The first step wakes up all seqnos one by one; the other steps do it
>> in chunks with a larger and larger skip, all the way to waking the
>> whole bunch at once?
>
> Yes.
>
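
(So with max_seqno = 1024: step 1 advances the seqno 1, 2, 3, ... one
at a time, step 2 goes 1, 3, 5, ..., and the final step of 1024
completes everything in just two advances, to 1 and then 1025.)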
>>> +			usleep_range(50, 500);
>>
>> Why sleep? It should work without it, no?
>
> Yes. But we want the wakeups to happen in batches, not the first waiter
> seeing that the entire sequence is complete and waking up everyone.

Hm yes, makes sense. "Simulates the GPU execution speed" could be the 
comment.
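
E.g. something like:

		/* Simulate the GPU taking time to execute between
		 * breadcrumbs */
		usleep_range(50, 500);
		mock_seqno_advance(engine, seqno);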

>
>>> +			mock_seqno_advance(engine, seqno);
>>> +		}
>>> +		GEM_BUG_ON(intel_engine_get_seqno(engine) < 1 + max_seqno);
>>> +
>>> +		err = wait_on_atomic_t(&done, wait_atomic_timeout, state);
>>> +		if (err) {
>>> +			pr_err("Timed out waiting for %d remaining waiters\n",
>>> +			       atomic_read(&done));
>>> +			break;
>>> +		}
>>> +
>>> +		err = check_rbtree_empty(engine);
>>> +		if (err)
>>> +			break;
>>> +	}
>>> +
>>> +out_waiters:
>>> +	for (n = 0; n < count; n++) {
>>> +		if (IS_ERR(waiters[n].tsk))
>>> +			break;
>>> +
>>> +		set_bit(0, &waiters[n].flags);
>>> +	}
>>> +
>>> +	igt_wake_all_sync(&ready, &set, &done, &wq, n);
>>> +	wait_on_atomic_t(&done, wait_atomic, state);
>>
>> C-O-M-M-E-N-T-S! :D
>>
>>> +
>>> +	for (n = 0; n < count; n++) {
>>> +		if (IS_ERR(waiters[n].tsk))
>>> +			break;
>>> +
>>> +		kthread_stop(waiters[n].tsk);
>>> +		put_task_struct(waiters[n].tsk);
>>> +	}
>>> +
>>> +	drm_free_large(waiters);
>>> +out_engines:
>>> +	kfree(engine);
>>> +out:
>>> +	return err;
>>> +}
>>> +
>>> int intel_breadcrumbs_selftest(void)
>>> {
>>> 	static const struct i915_subtest tests[] = {
>>> 		SUBTEST(igt_random_insert_remove),
>>> 		SUBTEST(igt_insert_complete),
>>> +		SUBTEST(igt_wakeup),
>>> 	};
>>>
>>> 	return i915_subtests(tests, NULL);
>>>
>>
>> Phew, looks mostly OK. I think there is only one thing I am unsure of.
>>
>> This was quite smart, please start adding comments when you come up
>> with similar things. :)
>
> Bah, what you are seeing is just me throwing ideas/snippets at a wall
> and only picking the ones that stick.

Idea looks good to me!

The whole initiative is actually excellent.

Regards,

Tvrtko

