[RFC libdrm 4/6] tegra: Add channel, job, pushbuf and fence APIs

Wed Feb 19 11:57:29 PST 2014

On Wed, Feb 19, 2014 at 5:04 PM, Thierry Reding
<thierry.reding at gmail.com> wrote:
> From: Thierry Reding <treding at nvidia.com>
>
> These functions can be used to open channels to engines, manage job
> submissions, create push buffers to store command streams in and wait
> until jobs have been completed.
>
> Signed-off-by: Thierry Reding <treding at nvidia.com>

Thanks a lot for doing this! I'm going right for the juicy patch ;)

> +drm_public
> +int drm_tegra_fence_wait_timeout(struct drm_tegra_fence *fence,
> +                                unsigned long timeout)
> +{
> +       struct drm_tegra_syncpt_wait args;
> +       int err;
> +
> +       memset(&args, 0, sizeof(args));

Nit: how about

struct drm_tegra_syncpt_wait args = { 0 };

instead?

> +       args.id = fence->syncpt;
> +       args.thresh = fence->value;
> +       args.timeout = timeout;
> +
> +       while (true) {
> +               err = ioctl(fence->drm->fd, DRM_IOCTL_TEGRA_SYNCPT_WAIT, &args);
> +               if (err < 0) {
> +                       if (errno == EINTR)
> +                               continue;
> +
> +                       drmMsg("DRM_IOCTL_TEGRA_SYNCPT_WAIT: %d\n", -errno);

What's the reason for printing the errno negated? And could we do
'...%s\n" strerror(errno));' instead?

> +int drm_tegra_job_add_reloc(struct drm_tegra_job *job,
> +                           const struct drm_tegra_reloc *reloc)
> +{
> +       struct drm_tegra_reloc *relocs;
> +       size_t size;
> +
> +       size = (job->num_relocs + 1) * sizeof(*reloc);
> +
> +       relocs = realloc(job->relocs, size);

Nit: there's no point in not assigning those while declaring them, no?

size_t size = (job->num_relocs + 1) * sizeof(*reloc);
struct drm_tegra_reloc *relocs; = realloc(job->relocs, size);

> +drm_public
> +int drm_tegra_pushbuf_new(struct drm_tegra_pushbuf **pushbufp,
> +                         struct drm_tegra_job *job,
> +                         struct drm_tegra_bo *bo,
> +                         unsigned long offset)
> +{
> +       struct drm_tegra_pushbuf_private *pushbuf;
> +       void *ptr;
> +       int err;
> +
> +       pushbuf = calloc(1, sizeof(*pushbuf));
> +       if (!pushbuf)
> +               return -ENOMEM;
> +
> +       pushbuf->bo = drm_tegra_bo_get(bo);
> +       DRMINITLISTHEAD(&pushbuf->list);
> +       pushbuf->job = job;
> +
> +       err = drm_tegra_bo_map(bo, &ptr);
> +       if (err < 0) {
> +               drm_tegra_bo_put(bo);
> +               free(pushbuf);
> +               return err;
> +       }
> +
> +       pushbuf->start = pushbuf->base.ptr = ptr + offset;
> +       pushbuf->offset = offset;
> +
> +       DRMLISTADD(&pushbuf->list, &job->pushbufs);
> +       job->num_pushbufs++;
> +
> +       *pushbufp = &pushbuf->base;
> +
> +       return 0;
> +}

It feels quite wasteful to me to have to allocate a new pushbuf in
order to be able to use a new BO. I'd much rather see the pushbuf
being a persisting object that's the interface to the command-stream
(that produces jobs).

I was thinking something like:

int drm_tegra_pushbuf_new(struct drm_tegra_pushbuf **pushbufp, struct
drm_tegra_job *job)
int drm_tegra_pushbuf_room(struct drm_tegra_pushbuf *pushbuf, int num_words);

Where room guarantees that there's space for those words in the
pushbuf. A simple implementation could just allocate a bo of that
size, but a slightly more sophisticated one can allocate larger ones
and reuse them. Even more sophisticated ones could keep old cmdbufs
around and reuse them once the hardware is done reading them, do
exponential grow-factors etc.

I've implemented the "slightly more sophisticated" approach here:

https://github.com/grate-driver/libdrm/commit/f90ea2f57ca4d8c81768402900c663ce526bac11

In my implementation, I've changed the job-structure to build the list
of cmdbufs directly rather than keeping a list of the pushbufs. Sure,
that means another allocation every time we need a new cmdbuf, but
hopefully we should be able to produce much less of them this way.

> +int drm_tegra_pushbuf_relocate(struct drm_tegra_pushbuf *pushbuf,
> +                              struct drm_tegra_bo *target,
> +                              unsigned long offset,
> +                              unsigned long shift)
> +{
> +       struct drm_tegra_pushbuf_private *priv = pushbuf_priv(pushbuf);
> +       struct drm_tegra_reloc reloc;
> +       int err;
> +
> +       memset(&reloc, 0, sizeof(reloc));
> +       reloc.cmdbuf.handle = priv->bo->handle;
> +       reloc.cmdbuf.offset = drm_tegra_pushbuf_get_offset(pushbuf);
> +       reloc.target.handle = target->handle;
> +       reloc.target.offset = offset;
> +       reloc.shift = shift;
> +
> +       err = drm_tegra_job_add_reloc(priv->job, &reloc);
> +       if (err < 0)
> +               return err;
> +
> +       return 0;
> +}

Whenever we insert a reloc, we also insert a DEADBEEF in the command
stream. Why not formalize this into this function?