[Intel-gfx] [PATCH i-g-t v2] tests/drv_hangman: test for acthd increasing through invalid VM space

Thu Feb 25 12:04:06 UTC 2016

On 25/02/16 11:32, Chris Wilson wrote:
> On Thu, Feb 25, 2016 at 11:12:06AM +0000, Daniele Ceraolo Spurio wrote:
>>
>> On 25/02/16 10:41, Chris Wilson wrote:
>>> On Thu, Feb 25, 2016 at 10:32:11AM +0000, daniele.ceraolospurio at intel.com wrote:
>>>> +/* This test covers the case where we end up in an uninitialised area of the
>>>> + * ppgtt at an offset greater than the one where the last buffer is mapped. This
>>>> + * is particularly relevant if 48b ppgtt is enabled because the ppgtt is
>>>> + * massively bigger compared to the 32b case and it takes a lot more time to
>>>> + * wrap, so the acthd can potentially keep increasing for a long time
>>>> + */
>>>> +#define NSEC_PER_SEC	1000000000L
>>>> +static void ppgtt_walking(void)
>>>> +{
>>>> +	int fd;
>>>> +	int64_t timeout_ns = 100 * NSEC_PER_SEC; /* 100 seconds */
>>> This needs a note that this has to be greater than ~5*hangcheck.
>>>
>>>> +	struct drm_i915_gem_execbuffer2 execbuf;
>>>> +	struct drm_i915_gem_exec_object2 gem_exec;
>>>> +	uint32_t handle;
>>>> +	uint32_t batch[4];
>>>> +
>>>> +	fd = drm_open_driver(DRIVER_INTEL);
>>>> +	igt_require(gem_gtt_type(fd) > 2);
>>> Nope, just full-ppgtt is required (and provides a sensible hangcheck
>>> test if !48bit as well).
>>>
>>> Note this does require that the hangcheck is enabled, so igt_require().
>>>
>>>> +
>>>> +	/* the batch will be mapped to an offset < 4GB because the flag to allow
>>>> +	 * 48b offsets is not specified, so jump to address 0x00000001 00000000
>>>> +	 */
>>>> +	batch[0] = MI_BATCH_BUFFER_START | 1;
>>>> +	batch[1] = 0;
>>>> +	batch[2] = 1;
>>>> +	batch[3] = MI_BATCH_BUFFER_END;
>>> The vm is entirely empty. Just submit an unterminated (empty) batch, and
>>> it will walk from 0 to 1<<48bit and around and around and around and
>>> around...
>> I chose to jump instead of just leaving the batch unterminated to
>> cover the (rare) case where the rest of the allocated 4k of the
>> batch contain some random values, which could cause a hang and thus
>> falsely pass the test.
> That would be a huge kernel bug. Freshly allocated buffers have to be
> zero to avoid information leaks. I hope you are confusing allocating
> from the userspace buffer cache with a fresh kernel allocation...
> -Chris
>

Apologies for the confusion. You're correct: I was thinking about it at
the libdrm level, where a buffer can come from the userspace cache, and
not at the bare kernel level.

Daniele
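
For reference, a minimal standalone sketch of the two batch variants
discussed above. It does not touch the driver. The opcode values are
assumptions written from memory of i-g-t's intel_reg.h and should be
checked against the real header. emit_jump() and has_terminator() are
hypothetical helper names, and the terminator check is a plain dword
comparison, not a model of how the command streamer decodes opcodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed opcode values (from memory of intel_reg.h); verify locally. */
#define MI_NOOP               0u
#define MI_BATCH_BUFFER_END   (0xAu << 23)
#define MI_BATCH_BUFFER_START (0x31u << 23)

/*
 * Hypothetical helper: build the 4-dword batch from the patch for an
 * arbitrary 64-bit target. The address is split low dword first, then
 * high dword, matching batch[1] = 0, batch[2] = 1 for 0x1_0000_0000.
 */
void emit_jump(uint32_t batch[4], uint64_t target)
{
	batch[0] = MI_BATCH_BUFFER_START | 1;
	batch[1] = (uint32_t)(target & 0xffffffffu); /* address bits 31:0 */
	batch[2] = (uint32_t)(target >> 32);         /* address bits 63:32 */
	batch[3] = MI_BATCH_BUFFER_END;
}

/*
 * Hypothetical helper: return 1 if any dword equals MI_BATCH_BUFFER_END.
 * A freshly allocated, zero-filled buffer holds only zero dwords
 * (MI_NOOP), so it has no terminator.
 */
int has_terminator(const uint32_t *batch, size_t ndwords)
{
	size_t i;

	for (i = 0; i < ndwords; i++)
		if (batch[i] == MI_BATCH_BUFFER_END)
			return 1;
	return 0;
}
```

With a zeroed 4k page (1024 dwords) has_terminator() returns 0, which
is the unterminated case Chris describes. emit_jump(batch,
0x100000000ULL) reproduces the dwords from the patch.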