[Mesa-dev] [PATCH 8/8] panfrost: Add backend targeting the DRM driver

Alyssa Rosenzweig alyssa at rosenzweig.io
Tue Mar 5 00:43:38 UTC 2019


Wooo!!!!!!!

Seeing this patch in my inbox has been some of the best news all day!

Without further ado, my (solicited?) comments:

> +/* Copyright 2018, Linaro, Ltd., Rob Herring <robh at kernel.org> */

Please triple check upstream that this license is okay; the other files
in include/drm-uapi are BSDish.

> +/* timeouts are specified in clock-monotonic absolute times (to simplify
> + * restarting interrupted ioctls).  The following struct is logically the
> + * same as 'struct timespec' but 32/64b ABI safe.
> + */
> +struct drm_panfrost_timespec {
> +	__s64 tv_sec;          /* seconds */
> +	__s64 tv_nsec;         /* nanoseconds */
> +};

What's this for -- is there not a shared struct for this if it's
necessary? (I'm assuming this was copied from etnaviv?)

> +	__u32 flags;          /* in, mask of ETNA_BO_x */

As Rob said, s/ETNA/PAN/g

> +	struct drm_panfrost_timespec timeout;   /* in */

(Presumably we just want a timeout in nanoseconds, milliseconds, or
seconds, not a full timespec... a 64-bit count gives a pretty wide
range even for high-precision timeouts, so I don't know that we need
the extra structure. Also, it's signed for some reason...?)

> +	struct drm_panfrost_gem_submit_atom_dep deps[2];

I'm concerned about hardcoding deps to 2 here. I know the Arm driver
does it, but I'm super uncomfortable with them doing it too. Keep in
mind that when they have complex dependencies, they insert dummy
"dependency-only" jobs into the graph, though I concede I haven't seen
this yet.

I'm not at all sure what the right number is, especially since the new
kernel interface seems to support waiting on BOs? Or am I
misinterpreting this?

Regardless, this will start to matter when we implement support for more
complex FBOs (where various FBOs sample from various other FBOs), which
I think can have arbitrarily complicated dependency graphs. So
hardcoding to [2] is inappropriate if we want to use deps for waiting on
FBOs. On the other hand, if we use a kernel-side BO wait or fences or
something to resolve dependencies, do we even need two deps, or just
one?
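If we do keep explicit deps, the usual uapi idiom for "arbitrarily
many" is a userspace pointer plus a count rather than a fixed-size
array. Roughly like this, on top of the patch's
drm_panfrost_gem_submit_atom_dep -- just a sketch, field names
hypothetical:

struct drm_panfrost_gem_submit_atom {
	__u64 jc;         /* in, GPU address of the job chain */
	__u64 deps;       /* in, pointer to an array of atom deps */
	__u32 dep_count;  /* in, number of entries at 'deps' */
	__u32 pad;
};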

> +	// TODO cache allocations
> +	// TODO properly handle errors
> +	// TODO take into account extra_flags

Not a blocker at all for merge, but these should be taken care of before
we drop support for the non-DRM stuff (since at least the
lack of GROWABLE support represents a regression compared to the Arm
driver).

> +	// TODO map and unmap on demand?

I don't know if this matters too much...? Usually if we're allocating a
slab, that means we want it mapped immediately, unless we set the
INVISIBLE flag, which means we don't want it mapped at all.

Maybe we'll have fancier scenarios in the future, but at the moment I
think we can safely cross off at least the first half of the TODO.
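Concretely, the eager policy I have in mind is a one-liner at
allocation time. A sketch, assuming the INVISIBLE flag spelling from
our non-DRM allocator and the fake offset returned by GEM_INFO:

/* Map eagerly at allocation unless the buffer is invisible,
 * in which case never map it at all. */
if (!(flags & PAN_ALLOCATE_INVISIBLE))
	cpu = mmap(NULL, size, PROT_READ | PROT_WRITE,
	           MAP_SHARED, drm->fd, gem_info.offset);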

As you know I'm not a believer in unmapping/freeing resources ever, so I
don't get an opinion there ;)

> +	gem_info.handle = gem_new.handle;
> +	ret = drmIoctl(drm->fd, DRM_IOCTL_PANFROST_GEM_INFO, &gem_info);
> +	if (ret) {
> +                fprintf(stderr, "DRM_IOCTL_PANFROST_GEM_INFO failed: %d\n", ret);
> +		assert(0);
> +	}

Maybe a question for Rob, but why isn't this info passed along with the
allocate in the first place? I guess the extra roundtrip to the kernel
isn't a huge deal, but it seems a little odd...?

> +static uint8_t
> +allocate_atom()
> +{
> +	/* Use to allocate atom numbers for jobs. We probably want to overhaul this in kernel space at some point. */
> +	static uint8_t atom_counter = 0;
> +
> +        atom_counter++;
> +
> +        /* Workaround quirk where atoms must be strictly positive */
> +
> +        if (atom_counter == 0)
> +                atom_counter++;
> +
> +        return atom_counter;
> +}

So, this routine (which is copied straight from the non-DRM code) is
specifically there to work around a quirk in the Arm driver where atom
numbers must be non-zero u8s. I'm skeptical the DRM interface needs
this.

> +		idx++;

Nice one, no idea why I didn't think of doing it this way :)

> +panfrost_fence_create(struct panfrost_context *ctx)

I'd appreciate a link to documentation about mainline fences, since I
have no idea what's happening here :)
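To check my own understanding, the lifecycle I *think* I see from
skimming xf86drm.h is roughly the following -- a minimal sketch with
all error handling omitted, corrections welcome:

uint32_t syncobj;

/* Create a syncobj for the kernel to signal. */
drmSyncobjCreate(drm->fd, 0, &syncobj);

/* ... pass it to the submit ioctl as the out-sync, so the kernel
 * signals it when the job's fence fires ... */

/* Block until signaled; INT64_MAX is effectively forever. */
drmSyncobjWait(drm->fd, &syncobj, 1, INT64_MAX, 0, NULL);

drmSyncobjDestroy(drm->fd, syncobj);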

> +static void
> +panfrost_drm_enable_counters(struct panfrost_screen *screen)
> +{
> +	fprintf(stderr, "unimplemented: %s\n", __func__);
> +}

If nobody else is taking it, would anyone mind if I play with adding
performance counters to the kernel? I want to at least dip my feet into
the other side of the stack, and that seems like a nice low-hanging
fruit (especially since I actually grok the performance counters, more
or less) :)

> +        return drmSyncobjCreate(drm->fd, DRM_SYNCOBJ_CREATE_SIGNALED,

(See fence question)

