[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

Tue Jun 11 04:43:14 PDT 2013

On 11.06.2013 14:00, Daniel Vetter wrote:
> We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
> which blew up the gpu (that one has been submitted already in a
> different ioctl call), but to be able to restart the ioctl after the
> reset has completed: We need to kick every thread which is potentially
> holding GEM locks and make sure that we block them (at a point where
> they don't hold any locks) until the reset handler completed. To avoid
> a validation nightmare we use the same codepaths as we use for signal
> interrupts, so ioctl restarting is a very natural fit for this.
> 
> Resubmitting victim workloads when a gpu crash happened is something
> the reset handler would do (kernel work item currently), not any
> userspace process doing an ioctl. But atm we don't resubmit victimized
> workloads.

I don't understand how resubmission is supposed to work end to end.
User space is not supposed to resubmit, yet EAGAIN is still returned to
user space, and drmIoctl() in user space just calls the same ioctl
again. That makes drmIoctl() sound completely wrong.
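
To make the behaviour concrete, here is a minimal sketch of that retry
loop. It follows the shape of the loop in libdrm's drmIoctl(), but the
function-pointer indirection, retry_ioctl() and fake_ioctl() are
assumptions added only so the loop can be exercised without a real
device:

```c
#include <errno.h>

/* Hypothetical stand-in for ioctl(2), so no DRM device is needed. */
typedef int (*ioctl_fn)(int fd, unsigned long request, void *arg);

/*
 * Reissue the identical call for as long as it fails with EINTR or
 * EAGAIN. Nothing is rebuilt or resubmitted here; the same request
 * and argument are simply passed down again.
 */
static int retry_ioctl(ioctl_fn do_ioctl, int fd, unsigned long request,
                       void *arg)
{
    int ret;

    do {
        ret = do_ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/* Fake driver: fails with EAGAIN while a "reset" is pending. */
static int fake_calls;

static int fake_ioctl(int fd, unsigned long request, void *arg)
{
    int *resets_pending = arg;

    (void)fd;
    (void)request;
    fake_calls++;

    if (*resets_pending > 0) {
        (*resets_pending)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

With the fake driver, the caller only ever sees the final result; the
intermediate EAGAIN returns are absorbed by the loop.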

In Tegra, when a job blows up, we reset the involved units, set the
pushbuffer pointer of host1x to point to the next job, and re-enable
the units. There's no need for anybody to resubmit anything, as the
kernel already has the queued jobs.
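
As a rough illustration of that sequence, here is a toy model. It is
not the actual host1x code; struct toy_job, struct toy_channel and
toy_recover() are made-up names, and the "reset" is reduced to a flag:

```c
#include <stddef.h>

/* Hypothetical: a job already queued in the kernel. */
struct toy_job {
    unsigned int pb_start; /* offset of the job in the pushbuffer */
    int hung;              /* set when the job was skipped after a hang */
};

/* Hypothetical: per-channel state. */
struct toy_channel {
    struct toy_job *jobs;
    size_t njobs;
    size_t current;        /* index of the job being executed */
    unsigned int pb_get;   /* where the hardware fetches next */
    int enabled;
};

/*
 * On a hang: stop the units, point the fetch pointer at the next
 * queued job, and re-enable. Returns 1 if another job is queued.
 */
static int toy_recover(struct toy_channel *ch)
{
    ch->enabled = 0;                 /* stands in for the unit reset */
    ch->jobs[ch->current].hung = 1;  /* mark the offending job */
    ch->current++;

    if (ch->current < ch->njobs)
        ch->pb_get = ch->jobs[ch->current].pb_start;

    ch->enabled = 1;
    return ch->current < ch->njobs;
}
```

The point of the sketch is that recovery only touches state the kernel
already owns, so user space never has to be involved.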

Terje