[PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

Tue Feb 5 00:42:55 PST 2013

On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
> On 04.02.2013 02:30, Thierry Reding wrote:
> >> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> >> index d8f5979..8376092 100644
> >> --- a/drivers/gpu/host1x/dev.h
> >> +++ b/drivers/gpu/host1x/dev.h
> >> @@ -17,11 +17,12 @@
> >>  #ifndef HOST1X_DEV_H
> >>  #define HOST1X_DEV_H
> >>
> >> +#include <linux/platform_device.h>
> >>  #include "syncpt.h"
> >> +#include "intr.h"
> >>
> >>  struct host1x;
> >>  struct host1x_syncpt;
> >> -struct platform_device;
> > 
> > Why include platform_device.h here?
> 
> host1x_get_host() actually needs that, so this #include should've also
> been in previous patch.

No need to if you pass struct device * instead. You might need
linux/device.h instead, though.

> >> +     void (*set_syncpt_threshold)(
> >> +             struct host1x_intr *, u32 id, u32 thresh);
> >> +     void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
> >> +     void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
> >> +     void (*disable_all_syncpt_intrs)(struct host1x_intr *);
> > 
> > Can disable_all_syncpt_intrs() not be implemented generically using the
> > number of syncpoints as exposed by host1x_device_info and the
> > .disable_syncpt_intr() function?
> 
> disable_all_syncpt_intrs() disables all interrupts in one write (or one
> per 32 sync points), so it's more efficient.

Yes, I noticed that and failed to remove this comment.

> >> +{
> >> +     struct host1x_intr_syncpt *sp =
> >> +             container_of(work, struct host1x_intr_syncpt, work);
> >> +     host1x_syncpt_thresh_fn(sp);
> > 
> > Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
> > Why do we need to go through an external function declaration?
> 
> If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do
> that. That'd simplify the interrupt path.

I like simplification. =)

> >> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> >> +{
> >> +     struct host1x *host1x = dev_id;
> >> +     struct host1x_intr *intr = &host1x->intr;
> >> +     unsigned long reg;
> >> +     int i, id;
> >> +
> >> +     for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
> >> +             reg = host1x_sync_readl(host1x,
> >> +                             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
> >> +                             i * REGISTER_STRIDE);
> >> +             for_each_set_bit(id, &reg, BITS_PER_LONG) {
> >> +                     struct host1x_intr_syncpt *sp =
> >> +                             intr->syncpt + (i * BITS_PER_LONG + id);
> >> +                     host1x_intr_syncpt_thresh_isr(sp);
> > 
> > Have you considered mimicking the IRQ API and name this something like
> > host1x_intr_syncpt_thresh_handle() and name the actual ISR just
> > syncpt_thresh_isr()? Not so important but it makes things a bit clearer
> > in my opinion.
> 
> This gets a bit confusing, because we have an ISR that calls a function
> that is also called ISR. I've kept "isr" in names of both to emphasize
> that this is running in interrupt context. I'm open to renaming these to
> make it clearer.
> 
> Did you refer to chained IRQ handler in linux/irq.h when you mentioned
> IRQ API as reference for naming?

What I had in mind was more along the lines of kernel/irq/chip.c, which
has a bunch of handlers for various types of interrupts, such as
handle_nested_irq() or handle_simple_irq().

Hence my proposal to rename host1x_intr_syncpt_thresh_isr() to
host1x_intr_syncpt_handle() because it handles the interrupt from a
single syncpoint and syncpt_thresh_cascade_isr() to syncpt_thresh_isr()
to keep it shorter.

Another variant would be host1x_syncpt_irq() for the top-level handler
and something host1x_handle_syncpt() to handle individual syncpoints. I
like this one best, but this is pure bike-shedding and there's nothing
technically wrong with the names you chose, so I can't really object if
you want to stick to them.

> >> +                     queue_work(intr->wq, &sp->work);
> > 
> > Should the call to queue_work() perhaps be moved into
> > host1x_intr_syncpt_thresh_isr().
> 
> I'm not sure, either way would be ok to me. The current structure allows
> host1x_intr_syncpt_thresh_isr() to only take one parameter
> (host1x_intr_syncpt). If we move queue_work, we'd also need to pass
> host1x_intr.

I think I'd still prefer to have all the code in one function because it
make subsequent modification easier and less error-prone.

> >> +static void host1x_intr_init_host_sync(struct host1x_intr *intr)
> >> +{
> >> +     struct host1x *host1x = intr_to_host1x(intr);
> >> +     int i, err;
> >> +
> >> +     host1x_sync_writel(host1x, 0xffffffffUL,
> >> +             HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
> >> +     host1x_sync_writel(host1x, 0xffffffffUL,
> >> +             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
> >> +
> >> +     for (i = 0; i < host1x->info.nb_pts; i++)
> >> +             INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
> >> +
> >> +     err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
> >> +                             syncpt_thresh_cascade_isr,
> >> +                             IRQF_SHARED, "host1x_syncpt", host1x);
> >> +     WARN_ON(IS_ERR_VALUE(err));
> > 
> > Do we really want to continue in this case?
> 
> Hmm, we'd need to actually return an error code. There's not much the
> driver can do without syncpt interrupts.

Yeah, in that case I think we should bail out. It's not like we're
expecting any failures. If the interrupt cannot be requested, something
must seriously be wrong and we should tell users about it so that it can
be fixed. Trying to continue on a best effort basis isn't useful here, I
think.

> >> +int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
> >> +                     enum host1x_intr_action action, void *data,
> >> +                     void *_waiter,
> >> +                     void **ref)
> > 
> > Why do you pass in _waiter as void * and not struct host1x_waitlist *?
> 
> struct host1x_waitlist is defined inside intr.c, so I've chosen to pass
> void *. I could naturally just forward declare host1x_waitlist in intr.h
> and change the allocation and add_action to use that.

Yes, that's definitely better.

> > I think I've said this before. The interface doesn't seem optimal to me
> > here. Passing in an enumeration to choose which action to perform looks
> > difficult to work with (not to mention the symbols are rather long and
> > therefore result in ugly code).
> > 
> > Maybe doing this by passing around a pointer to a handler function would
> > be nicer. However since I haven't really used this yet, I can't really
> > tell. So maybe we should just merge the implementation as-is for now. We
> > can always clean it up later.
> 
> We're using the enum also to index into arrays. We do it so that we can
> remove all the completed waiters from the wait_head, and insert them
> into lists per action type. This way we can run all actions in priority
> order: first action_submit_complete, then action_wakeup, and then
> action_wakeup_interruptible.
> 
> Now, we're recently noticed that the priority order is actually wrong.
> The first priority should be to wake up non-interruptible tasks, then
> interruptible tasks. Cleaning up memory of completed submits should be
> lower priority.
> 
> I've considered this part as something private to host1x driver and it's
> not really meant to be called f.ex. from DRM. But, as you seem to have a
> need to have an asynchronous wait for a fence, we'd need to figure
> something out for that.

Okay, let's keep it as-is for now and see how it can be improved later
when we have an actual use-case for using it externally.

> >> +void *host1x_intr_alloc_waiter(void)
> >> +{
> >> +     return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
> >> +}
> > 
> > I'm not sure why this is separate from host1x_syncpt_wait() since it is
> > only used inside that function and the waiter returned never leaves the
> > scope of that function, so it might be better to allocate it directly
> > in host1x_syncpt_wait() instead.
> > 
> > Actually, it looks like the waiter doesn't ever leave scope, so you may
> > even want to allocate it on the stack.
> 
> In patch 3, at submit time we first allocate waiter, then take
> submit_lock, write submit to channel, and add the waiter while having
> the lock. I did this so that I host1x_intr_add_action() can always
> succeed. Otherwise I'd need to write another code path to handle the
> case where we wrote a job to channel, but we're not able to add a
> submit_complete action to it.

Okay. In that case why not allocate it on the stack in the first place
so you don't have to bother with allocations (and potential failure) at
all? The variable doesn't leave the function scope, so there shouldn't
be any issues, right?

Or if that doesn't work it would still be preferable to allocate memory
in host1x_syncpt_wait() directly instead of going through the wrapper.

> >> +void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
> > 
> > Here again, you pass in the waiter via a void *. Why's that?
> 
> host1x_waitlist is hidden inside intr.c.

I don't think that's necessary here. I'd rather have the compiler check
for types rather than hide the structure.

> >> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
> > 
> > Maybe you should keep the type of the irq_sync here so that it properly
> > propagates to the call to devm_request_irq().
> 
> I'm not sure what you mean. Do you mean that I should use unsigned int,
> as that's the type used in devm_request_irq()?

Yes.

> >> +void host1x_intr_stop(struct host1x_intr *intr)
> >> +{
> >> +     unsigned int id;
> >> +     struct host1x *host1x = intr_to_host1x(intr);
> >> +     struct host1x_intr_syncpt *syncpt;
> >> +     u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
> >> +
> >> +     mutex_lock(&intr->mutex);
> >> +
> >> +     host1x->intr_op.disable_all_syncpt_intrs(intr);
> > 
> > I haven't commented on this everywhere, but I think this could benefit
> > from a wrapper that forwards this to the intr_op. The same goes for the
> > sync_op.
> 
> You mean something like "host1x_disable_all_syncpt_intrs"?

Yes. I think that'd be useful for each of the op functions. Perhaps you
could even pass in a struct host1x * to make calls more uniform.

> >> +/*
> >> + * Main entrypoint for syncpoint value waits.
> >> + */
> >> +int host1x_syncpt_wait(struct host1x_syncpt *sp,
> >> +                     u32 thresh, long timeout, u32 *value)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_syncpt_wait);
> > 
> > This doesn't only seem to be the main entrypoint, but it's basically the
> > only way to currently wait for syncpoints. One actual use-case where
> > this might turn out to be a problem is video capturing. The problem is
> > that using this API you can't very well asynchronously capture frames.
> > So eventually I think we need a way to allow a generic handler to be
> > attached to syncpoints so that you can have this handler continuously
> > invoked after each frame is captured and just pass the buffer back to
> > userspace.
> 
> Yep, so far all asynchronous waits have been done in user space. We
> would probably allow attaching a handler to a syncpt value, so that we'd
> call that handler once a value is reached. In effect, similar to a
> wake_up event that is now added via host1x_intr_add_action, but simpler.
> That'd mean that the handler needs to be re-added after each frame.
> 
> We could also add the handler as persistent if re-adding would be a
> problem. That'd require some new wiring and I'll have to think how to
> implement that.

Yes, that sounds like what I had in mind. Again, no need to worry about
it now. We can cross that bridge when we come to it.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20130205/e1507241/attachment-0001.pgp>