[RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

Thu Nov 29 23:22:00 PST 2012

On Thu, Nov 29, 2012 at 12:39:23PM +0200, Terje Bergström wrote:
> On 29.11.2012 10:44, Thierry Reding wrote:
> >> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
> >> index 98c9c9f..025a820 100644
> >> --- a/drivers/video/tegra/host/dev.c
> >> +++ b/drivers/video/tegra/host/dev.c
> >> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
> >>  }
> >>  EXPORT_SYMBOL(host1x_syncpt_read);
> >>
> >> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
> > 
> > The choice of data types is odd here. id refers to a syncpt so a better
> > choice would have been unsigned int because the size of the variable
> > doesn't actually matter. But as I already said in my reply to patch 1,
> > these are resources and should therefore better be abstracted through an
> > opaque pointer anyway.
> > 
> > timeout is usually signed long, so this function should reflect that. As
> > for the value this is probably fine as it will effectively be set from a
> > register value. Though you also cache them in software using atomics.
> 
> 32-bits is an architectural limit for the sync point id, so that's why I
> used it here.

But given that there are only 32 syncpoints they look rather costly, so
I don't expect more than a few hundred to ever be used in hardware,
right?

> But you're right - it doesn't really matter and could be changed to
> unsigned long.

I'd still opt for unsigned int. For no other reason than that it is how
other types of resources are enumerated.

> thresh and *value reflects that sync point value is 32-bit, and I'd keep
> that as is.

Yes, that makes sense.

> Timeout should be unsigned long, yes.

It should actually be signed long to match the type used for timeouts in
the various wait_*() functions.

> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +#include <linux/interrupt.h>
> >> +#include <linux/irq.h>
> >> +#include <linux/io.h>
> >> +#include <asm/mach/irq.h>
> >> +
> >> +#include "nvhost_intr.h"
> >> +#include "host1x/host1x.h"
> >> +
> >> +/* Spacing between sync registers */
> >> +#define REGISTER_STRIDE 4
> > 
> > Erm... no. The usual way you should be doing this is either make the
> > register definitions account for the stride or use accessors that apply
> > the stride. You should be doing the latter anyway to make accesses. For
> > example:
> > 
> >         static inline void host1x_syncpt_writel(struct host1x *host1x,
> >                                                 unsigned long value,
> >                                                 unsigned long offset)
> >         {
> >                 writel(value, host1x->regs + SYNCPT_BASE + offset);
> >         }
> > 
> >         static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> >                                                         unsigned long offset)
> >         {
> >                 return readl(host1x->regs + SYNCPT_BASE + offset);
> >         }
> > 
> > Alternatively, if you want to pass the register index instead of the
> > offset, you can use just multiply the offset in that function:
> >
> >         writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> >
> > The same can also be done with the non-syncpt registers.
> 
> The register number has a stride of 4 when doing writes, and 1 when
> adding to command streams. This is why I've kept the register
> definitions as is.

Yes, that's why it makes sense to use such helpers. It allows you to
reuse the register definitions for both direct and indirect access but
doesn't require you to repeat the stride multiplication every time.

> I could add helper functions. Just as a side note, the sync register
> space has other definitions than just the syncpt registers, so the
> naming should be changed a bit.

The TRM refers to them as SYNC registers, so SYNC_BASE should be fine.

> >> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> >> +{
> >> +     struct nvhost_master *dev = dev_id;
> >> +     void __iomem *sync_regs = dev->sync_aperture;
> >> +     struct nvhost_intr *intr = &dev->intr;
> >> +     unsigned long reg;
> >> +     int i, id;
> >> +
> >> +     for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
> >> +             reg = readl(sync_regs +
> >> +                             host1x_sync_syncpt_thresh_cpu0_int_status_r() +
> >> +                             i * REGISTER_STRIDE);
> >> +             for_each_set_bit(id, &reg, BITS_PER_LONG) {
> >> +                     struct nvhost_intr_syncpt *sp =
> >> +                             intr->syncpt + (i * BITS_PER_LONG + id);
> >> +                     host1x_intr_syncpt_thresh_isr(sp);
> >> +                     queue_work(intr->wq, &sp->work);
> >> +             }
> >> +     }
> >> +
> >> +     return IRQ_HANDLED;
> >> +}
> > 
> > Maybe it would be better to call the syncpt handlers in interrupt
> > context and let them schedule work if they want to. I'm thinking about
> > the display controllers which may want to use syncpoints for VBLANK
> > support.
> 
> Display controller can use the APIs to read, increment and wait for sync
> point.

Actually for the display controller we want just a notification when the
VBLANK happens. I'm not sure if we want to do that with syncpoints at
all since it works quite well using regular interrupts.

> We could do more in isr, but then again, we've noticed that the current
> design already gives pretty good latency, so we haven't seen the need to
> move code from thread to isr.

What I'm proposing is to leave it up to each host1x client how they want
to handle this. For display controllers it may be enough to have their
callback run in interrupt context but other clients may need to do more
work so they can queue it themselves.

I know that this looks like it might be more work, but if it turns out
that many drivers need to do the exact same thing, that functionality
can be factored out into a helper. But it may just as well turn out that
the requirements for each module are slightly different that forcing a
workqueue on them could result in ugly workarounds because it doesn't
quite work for them.

> >> +/**
> >> + * Sync point threshold interrupt service function
> >> + * Handles sync point threshold triggers, in interrupt context
> >> + */
> >> +static void host1x_intr_syncpt_thresh_isr(struct nvhost_intr_syncpt *syncpt)
> >> +{
> >> +     unsigned int id = syncpt->id;
> >> +     struct nvhost_intr *intr = intr_syncpt_to_intr(syncpt);
> >> +
> >> +     void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
> >> +
> >> +     u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
> >> +
> >> +     writel(BIT_MASK(id), sync_regs +
> >> +             host1x_sync_syncpt_thresh_int_disable_r() + reg);
> >> +     writel(BIT_MASK(id), sync_regs +
> >> +             host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
> >> +}
> > 
> > So this disables all interrupts and is called from the syncpt interrupt
> > handler. Where are the interrupts reenabled? Do host1x clients have to
> > do that manually?
> 
> The thread re-enables once it's done. It checks the next value we're
> interested in, and programs that to host1x syncpt threshold.

Okay, that does make sense now. I think I'm indeed beginning to
understand how the hardware works...

> > Okay, so this is where syncpoint interrupts are reenabled. The rest of
> > this whole wait queue code looks overly complex. I'll need to go through
> > that separately and more thoroughly.
> 
> Thanks.

If we move responsibility of managing the workqueue out of host1x as I
proposed above, maybe a lot of this code can be removed. Maybe you can
explain a bit what they are used for exactly in your write-up.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20121130/fd1e020b/attachment.pgp>