[igt-dev] [Intel-gfx] [PATCH i-g-t] igt/gem_userptr: Check read-only mappings

Thu Jun 28 18:09:18 UTC 2018

Quoting Tvrtko Ursulin (2018-06-28 17:56:24)
> 
> On 27/06/2018 20:44, Chris Wilson wrote:
> > +static void test_readonly(int i915)
> 
> Hm.. nice interesting and novel fd naming I think. fd, gem_fd I know we 
> have. I wonder if we have drm_fd as well somewhere. Just thinking out 
> loud...

Not that novel.

> > +{
> > +     unsigned char orig[SHA_DIGEST_LENGTH];
> > +     uint64_t aperture_size;
> > +     uint32_t whandle, rhandle;
> > +     size_t sz, total;
> > +     void *pages, *space;
> > +     int memfd;
> > +
> > +     /*
> > +      * A small batch of pages; small enough to cheaply check for stray
> > +      * writes but large enough that we don't create too many VMA pointing
> > +      * back to this set from the large arena. The limit on total number
> > +      * of VMA for a process is 65,536 (at least on this kernel).
> > +      */
> > +     sz = 16 << 12;
> 
> 12 for page size, so 16 pages? How is it related to the VMA limit
> mentioned in the comment?

A few lines later in this block.

> > +     memfd = memfd_create("pages", 0);
> > +     igt_require(memfd != -1);
> 
> igt_require_fd is available if you care for it...

Not fond of it.

> > +     igt_require(ftruncate(memfd, sz) == 0);
> 
> ...and igt_require_eq; the double-edged sword of a growing API, huh? :)

Nope, the return code is not interesting (thanks, glibc).

> > +
> > +     pages = mmap(NULL, sz, PROT_WRITE, MAP_SHARED, memfd, 0);
> > +     igt_assert(pages != MAP_FAILED);
> > +
> > +     igt_require(__gem_userptr(i915, pages, sz, true, userptr_flags, &rhandle) == 0);
> > +     gem_close(i915, rhandle);
> > +
> > +     gem_userptr(i915, pages, sz, false, userptr_flags, &whandle);
> > +
> > +     total = 2048ull << 20;
> 
> Why 2GiB? Should it be expressed in terms of the VMA limit and sz, or is
> it just accidentally half of the VMA limit?

Nah, the largest offset we can use is 4G, and we can't use the full range
as we need some extra room for batches; nor can we use the full VMA
limit without a serious slowdown and risk of exhaustion.
So we stick to a power of two.

> > +     aperture_size = gem_aperture_size(i915) / 2;
> > +     if (aperture_size < total)
> > +             total = aperture_size;
> > +     total = total / sz * sz;
> 
> There is round_down in lib/igt_primes but it would need exporting.
> 
> > +     igt_info("Using a %'zuB (%'zu pages) arena onto %zu pages\n",
> > +              total, total >> 12, sz >> 12);
> > +
> > +     /* Create an arena all pointing to the same set of pages */
> > +     space = mmap(NULL, total, PROT_READ, MAP_ANON | MAP_SHARED, -1, 0);
> 
> Allocating address space only?

Repeating set of PTEs.

> > +     igt_require(space != MAP_FAILED);
> > +     for (size_t offset = 0; offset < total; offset += sz) {
> > +             igt_assert(mmap(space + offset, sz,
> > +                             PROT_WRITE, MAP_SHARED | MAP_FIXED,
> > +                             memfd, 0) != MAP_FAILED);
> > +             *(uint32_t *)(space + offset) = offset;
> > +     }
> > +     igt_assert_eq_u32(*(uint32_t *)pages, (uint32_t)(total - sz));
> 
> Checking that "arena" somewhat works, ok..

Checking we can allocate.

> > +     igt_assert(mlock(space, total) == 0);
> > +     close(memfd);
> > +
> > +     /* Check we can create a normal userptr bo wrapping the wrapper */
> > +     gem_userptr(i915, space, total, false, userptr_flags, &rhandle);
> > +     gem_set_domain(i915, rhandle, I915_GEM_DOMAIN_CPU, 0);
> > +     for (size_t offset = 0; offset < total; offset += sz)
> > +             store_dword(i915, rhandle, offset + 4, offset / sz);
> > +     gem_sync(i915, rhandle);
> 
> Do you need to move it back to CPU domain before the asserts?

The set-domain checks we can populate the userptr.

> > +     igt_assert_eq_u32(*(uint32_t *)(pages + 0), (uint32_t)(total - sz));
> > +     igt_assert_eq_u32(*(uint32_t *)(pages + 4), (uint32_t)(total / sz - 1));
> 
> Please add a comment somewhere higher up explaining the layout - I got 
> lost what is in the first dword and what in the second of each page, and 
> who writes each.

The first dword of each page is written by the CPU with that alias's
offset into the arena. The second dword is written by the GPU with the
index of the overlap. Just checking the setup of the arena.

It's irrelevant to the rest of the test, so not sure it's worth
repeating the code.

> > +     gem_close(i915, rhandle);
> > +
> > +     /* Now enforce read-only henceforth */
> > +     igt_assert(mprotect(space, total, PROT_READ) == 0);
> 
> No writes from the CPU, ok. I suppose that is to guarantee that if there
> is a write, we know where it came from.

To check the read-only part; to import a PROT_READ set of pages, you
must use I915_USERPTR_READ_ONLY.

> Please add a high level comment what the following block will test and how.

Oh, it's just the same old test as in the kernel with the stages
explained.

> > +     SHA1(pages, sz, orig);
> > +     igt_fork(child, 1) {
> > +             const int gen = intel_gen(intel_get_drm_devid(i915));
> > +             const int nreloc = 1024;
> 
> This has a relationship to the size of the batch buffer created lower below?

It's simply the number of relocs we want to use; it only has to be small
enough to fit. 64k is interesting for something else.

> > +             struct drm_i915_gem_execbuffer2 exec;
> > +             unsigned char ref[SHA_DIGEST_LENGTH], result[SHA_DIGEST_LENGTH];
> > +             uint32_t *batch;
> > +             int i;
> > +
> > +             reloc = calloc(sizeof(*reloc), nreloc);
> > +             gem_userptr(i915, space, total, true, userptr_flags, &rhandle);
> > +
> > +
> 
> Extra newline.
> 
> > +             memset(obj, 0, sizeof(obj));
> > +             obj[0].flags = LOCAL_EXEC_OBJECT_SUPPORTS_48B;
> > +             obj[1].handle = gem_create(i915, 4096*16);
> 
> This is the size of store dw times nreloc? Relationships need to
> be clearer and expressed in one place.

Nope. It's 16 pages.

> > +             obj[1].relocation_count = nreloc;
> > +             obj[1].relocs_ptr = to_user_pointer(reloc);
> > +
> > +             batch = gem_mmap__wc(i915, obj[1].handle, 0, 4096*16, PROT_WRITE);
> > +
> > +             memset(&exec, 0, sizeof(exec));
> > +             exec.buffer_count =2;
> > +             exec.buffers_ptr = to_user_pointer(obj);
> > +
> > +             for_each_engine(i915, exec.flags) {
> > +                     /* First tweak the backing store through the write */
> > +                     i = 0;
> > +                     obj[0].handle = whandle;
> > +                     for (int n = 0; n < nreloc; n++) {
> > +                             uint64_t offset;
> > +
> > +                             reloc[n].target_handle = obj[0].handle;
> > +                             reloc[n].delta = 4*(rand() % (sz/4));
> > +                             reloc[n].offset = (i+1) * sizeof(uint32_t);
> 
> You can add spaces around operators to follow our coding style since 
> space is not constrained here.
> 
> > +                             reloc[n].presumed_offset = obj[0].offset;
> > +                             reloc[n].read_domains = I915_GEM_DOMAIN_RENDER;
> > +                             reloc[n].write_domain = I915_GEM_DOMAIN_RENDER;
> > +
> > +                             offset = reloc[n].presumed_offset + reloc[n].delta;
> > +
> > +                             batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> > +                             if (gen >= 8) {
> > +                                     batch[++i] = offset;
> > +                                     batch[++i] = offset >> 32;
> > +                             } else if (gen >= 4) {
> > +                                     batch[++i] = 0;
> > +                                     batch[++i] = offset;
> > +                                     reloc[n].offset += sizeof(uint32_t);
> > +                             } else {
> > +                                     batch[i]--;
> > +                                     batch[++i] = offset;
> > +                             }
> > +                             batch[++i] = rand();
> > +                             i++;
> > +                     }
> > +                     batch[i] = MI_BATCH_BUFFER_END;
> 
> Somehow make this possible via previously added store_dword helper 
> instead of duplicating?

There's no point making either more complicated, as I think this is very
straightforward.

> > +
> > +                     gem_execbuf(i915, &exec);
> > +                     gem_sync(i915, obj[0].handle);
> > +                     SHA1(pages, sz, ref);
> > +
> > +                     igt_assert(memcmp(ref, orig, sizeof(ref)));
> > +                     memcpy(orig, ref, sizeof(orig));
> > +
> > +                     /* Now try the same through the read-only handle */
> > +                     i = 0;
> > +                     obj[0].handle = rhandle;
> > +                     for (int n = 0; n < nreloc; n++) {
> > +                             uint64_t offset;
> > +
> > +                             reloc[n].target_handle = obj[0].handle;
> > +                             reloc[n].delta = 4*(rand() % (total/4));
> > +                             reloc[n].offset = (i+1) * sizeof(uint32_t);
> > +                             reloc[n].presumed_offset = obj[0].offset;
> > +                             reloc[n].read_domains = I915_GEM_DOMAIN_RENDER;
> > +                             reloc[n].write_domain = I915_GEM_DOMAIN_RENDER;
> > +
> > +                             offset = reloc[n].presumed_offset + reloc[n].delta;
> > +
> > +                             batch[i] = MI_STORE_DWORD_IMM | (gen < 6 ? 1 << 22 : 0);
> > +                             if (gen >= 8) {
> > +                                     batch[++i] = offset;
> > +                                     batch[++i] = offset >> 32;
> > +                             } else if (gen >= 4) {
> > +                                     batch[++i] = 0;
> > +                                     batch[++i] = offset;
> > +                                     reloc[n].offset += sizeof(uint32_t);
> > +                             } else {
> > +                                     batch[i]--;
> > +                                     batch[++i] = offset;
> > +                             }
> > +                             batch[++i] = rand();
> > +                             i++;
> > +                     }
> > +                     batch[i] = MI_BATCH_BUFFER_END;
> 
> Am I seeing a copy-pasted loop? You know what's next! :D

Not worth it, surely.

> > +
> > +                     gem_execbuf(i915, &exec);
> > +                     gem_sync(i915, obj[0].handle);
> > +                     SHA1(pages, sz, result);
> > +
> > +                     /*
> > +                      * As the writes into the read-only GPU bo should fail,
> > +                      * the SHA1 hash of the backing store should be
> > +                      * unaffected.
> > +                      */
> > +                     igt_assert(memcmp(ref, result, SHA_DIGEST_LENGTH) == 0);
> > +             }
> > +
> > +             munmap(batch, 16*4096);
> > +             gem_close(i915, obj[1].handle);
> > +             gem_close(i915, rhandle);
> > +     }
> > +     igt_waitchildren();
> > +
> > +     munmap(space, total);
> > +     munmap(pages, sz);
> > +}
> 
> Okay more or less. Just want some tweaks and high level description 
> since I (or anyone in the future) don't need/want to reverse engineer 
> the patterns.
> 
> > +
> > +static jmp_buf sigjmp;
> > +static void sigjmp_handler(int sig)
> > +{
> > +     siglongjmp(sigjmp, sig);
> > +}
> > +
> > +static void test_readonly_mmap(int i915)
> > +{
> 
> Please add high level test description since there is some trickery below.

Tests handling of readonly userptr vs mmap.

> > +     unsigned char original[SHA_DIGEST_LENGTH];
> > +     unsigned char result[SHA_DIGEST_LENGTH];
> > +     uint32_t handle;
> > +     uint32_t sz;
> > +     void *pages;
> > +     void *ptr;
> > +     int sig;
> > +
> > +     igt_require(igt_setup_clflush());
> > +
> > +     sz = 16 << 12;
> > +     pages = mmap(NULL, sz, PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> > +     igt_assert(pages != MAP_FAILED);
> > +
> > +     igt_require(__gem_userptr(i915, pages, sz, true, userptr_flags, &handle) == 0);
> > +     gem_set_caching(i915, handle, 0);
> > +
> > +     memset(pages, 0xa5, sz);
> > +     igt_clflush_range(pages, sz);
> 
> Why are cache flushes needed in this test? Because they cannot be done
> via domain management?

Because we are playing tricks here, doing things that are advised
against but not outright forbidden, and we want to catch out the
kernel/hw if it writes to the pages anyway. That is beyond the control
of the bo, so we can only check via the pages.

> > +     SHA1(pages, sz, original);
> > +
> > +     ptr = __gem_mmap__gtt(i915, handle, sz, PROT_WRITE);
> > +     igt_assert(ptr == NULL);
> > +
> > +     ptr = gem_mmap__gtt(i915, handle, sz, PROT_READ);
> > +     gem_close(i915, handle);
> > +
> > +     if (!(sig = sigsetjmp(sigjmp, 1))) {
> 
> What does this do? Comment?

It's a sigsetjmp. What's unusual?

> > +             signal(SIGBUS, sigjmp_handler);
> > +             signal(SIGSEGV, sigjmp_handler);
> > +             memset(ptr, 0x5a, sz);
> > +             igt_assert(0);
> > +     }
> > +     igt_assert_eq(sig, SIGSEGV);
> > +
> > +     igt_assert(mprotect(ptr, sz, PROT_WRITE));
> 
> Why is this needed?

? It's a test that we can't change the CPU protection from read-only to
read-write.

> > +     munmap(ptr, sz);
> > +
> > +     igt_clflush_range(pages, sz);
> > +     SHA1(pages, sz, result);
> > +     igt_assert(!memcmp(original, result, sizeof(original)));
> > +
> > +     munmap(pages, sz);
> > +}