[Freedreno] Freedreno on i.MX53

Fri Jul 31 06:05:55 PDT 2015

On Thu, Jul 30, 2015 at 1:32 PM, Martin Fuzzey <mfuzzey at parkeon.com> wrote:
> On 25/06/15 18:56, Rob Clark wrote:
>
> Thanks for the pointers.
>
> I have made a little progress, not out of the woods yet though.
>
> I finally went back to the kgsl kernel driver for the moment to be able to
> compare with the blob generated command streams.
> This is not the same as the msm-kgsl one, there is no GEM and some of the
> ioctls are a little different.
>
> I instrumented the kernel side to dump the command stream, vertex buffers
> and GMEM.
>
> And then started with something even simpler - just a clear of a single
> tile, even removing gmem2mem (and just looking directly at gmem)
>
> Nothing was being drawn at all...
>
> Finally comparing the command streams showed that
>
> 1) In CP_DRAW_INDX, NUM_INDICES needs to be stored in the upper half of the
> second parameter rather than being passed as a parameter by itself
> 2) REG_A2XX_CLEAR_COLOR doesn't work, CP_SET_CONSTANT(0x00000480) needs to
> be used instead (with the colour components as 4 floats)
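
For illustration, the two changes above come down to simple bit packing. A
minimal sketch in Python, assuming the second CP_DRAW_INDX dword keeps its
other fields in the low 16 bits and that the colour is passed in r, g, b, a
order; the helper names are hypothetical and are not taken from freedreno or
from the blob:

```python
import struct


def pack_draw_indx_dword(low_fields, num_indices):
    """Place NUM_INDICES in the upper 16 bits of the second dword.

    Assumption: the remaining fields of that dword fit in the low 16 bits.
    """
    if not 0 <= num_indices <= 0xFFFF:
        raise ValueError("num_indices does not fit in 16 bits")
    if not 0 <= low_fields <= 0xFFFF:
        raise ValueError("low_fields does not fit in 16 bits")
    return (num_indices << 16) | low_fields


def float_to_dword(value):
    """Return the IEEE-754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def clear_color_dwords(r, g, b, a):
    """Colour components as four float dwords (component order is assumed)."""
    return [float_to_dword(c) for c in (r, g, b, a)]
```

How these dwords are wrapped into the CP_SET_CONSTANT packet itself is not
shown here, since the mail only gives the 0x00000480 constant.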

hmm, I *used* to have cmdstream dumps for a200/a205 vs a220/a225, but
I cannot seem to find them anymore.. probably didn't survive several
laptop migrations + disk reformats/replacements..

but I seem to recall a few differences between a20x and a22x, so it
wouldn't surprise me if we need a few if (a20x) / else if (a22x) in
the code.  Since I never had any a20x hw, I was unsure which
differences actually matter.  The CP_DRAW_INDX diff could be as much
about pm4/pfp fw version as hw version, but I guess given the lack of
data points it is reasonable to assume it is a20x vs a22x..  the
register differences otoh are certainly a20x vs a22x.

> Changing that was enough to get the GMEM filled by a clear but the gmem2mem
> draw still hung.
>
> Adding this to the initialisation (taken from the blob command stream):
> t0              write RB_BC_CONTROL
>                         RB_BC_CONTROL: { ACCUM_TIMEOUT_SELECT = 3 |
> DISABLE_LZ_NULL_ZCMD_DROP | AZ_THROTTLE_COUNT = 0 | ENABLE_CRC_UPDATE |
> ACCUM_ALLOC_MASK = 0 | ACCUM_DATA_FIFO_LIMIT = 8 | MEM_EXPORT_TIMEOUT_SELECT
> = 3 }
> 00000000:               00000f01 1c004046
>
>
> Fixed that (the important part seems to be ACCUM_DATA_FIFO_LIMIT = 8)
>
> With that a full clear of all 6 tiles worked.
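
As a cross-check of the dump above, a small sketch that packs and unpacks the
RB_BC_CONTROL fields named in it. The bit positions below are assumptions
chosen so that the listed field values reproduce 0x1c004046; they should be
verified against the register database before being relied on:

```python
# (shift, width) per field; positions are assumed, see the note above.
RB_BC_CONTROL_FIELDS = {
    "ACCUM_TIMEOUT_SELECT": (1, 2),
    "DISABLE_LZ_NULL_ZCMD_DROP": (6, 1),
    "AZ_THROTTLE_COUNT": (8, 5),
    "ENABLE_CRC_UPDATE": (14, 1),
    "ACCUM_ALLOC_MASK": (18, 4),
    "ACCUM_DATA_FIFO_LIMIT": (23, 4),
    "MEM_EXPORT_TIMEOUT_SELECT": (27, 2),
}


def decode_rb_bc_control(value):
    """Split a 32-bit register value into the named fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in RB_BC_CONTROL_FIELDS.items()
    }


def encode_rb_bc_control(fields):
    """Pack named field values back into a 32-bit register value."""
    value = 0
    for name, field_value in fields.items():
        shift, width = RB_BC_CONTROL_FIELDS[name]
        if field_value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= field_value << shift
    return value


decoded = decode_rb_bc_control(0x1C004046)
```

With these assumed positions, ACCUM_DATA_FIFO_LIMIT decodes to 8 and
re-encoding the decoded fields gives back the original dword.
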
>
> However clear + draw triangle does not work. It hangs on the DRAW_INDX for
> the first gmem2mem.
>
> Doing write scratch, wait idle, write scratch after the hanging DRAW_INDX
> shows that it is indeed the DRAW_INDX hanging (the final register value is
> the one written before the WFI)
>
> Strange thing is that dumping the render buffer (destination for gmem2mem)
> after the hang shows that gmem2mem has worked - the buffer contains a single
> tile of background and triangle as expected.
>
> The RBBM_STATUS register in the hang case is 0x91410310, versus 0x00000110
> when all is well.
> This translates to the following busy bits being set
>     GUI_ACTIVE = 1
>     SQ_CNTX0_BUSY = 1
>     SC_CNTX_BUSY = 1
>     TPC_BUSY = 1
>     CP_NRT_BUSY = 1
>     CFRQ_PENDING = 1
>
> I tried comparing the registers for the gmem2mem draw in the clear only and
> clear + triangle cases, there were a couple of differences but setting those
> registers to be the same makes no difference. I fail to see why a gmem2mem
> draw that works when preceded by just a clear hangs when preceded by clear
> + draw, given that the registers are the same...
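
The two RBBM_STATUS values quoted above can be compared mechanically. A
minimal sketch that lists the bit positions set in the hang case but not in
the good case; it deliberately does not map positions to field names, since
that mapping is not given in this mail:

```python
def set_bits(value):
    """Return the positions of all set bits in a 32-bit value."""
    return [bit for bit in range(32) if value & (1 << bit)]


def newly_set_bits(hung, idle):
    """Bits set in the hung status that are clear in the idle status."""
    return set_bits(hung & ~idle)


HUNG = 0x91410310  # RBBM_STATUS in the hang case
IDLE = 0x00000110  # RBBM_STATUS when all is well

extra = newly_set_bits(HUNG, IDLE)
print(extra)
```

Six bits differ, which matches the six busy flags listed.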

hmm.. debugging hangs is always a pain.. generally I sprinkle WFI's
everywhere to confirm which draw it is actually hanging on.  I'm not
sure about a2xx, but on a3xx there are two copies of most of the
registers (ones w/ offset >= 0x2000).  On a3xx and up we can read all
the gpu registers from the cpu, so you could dump the whole io space
and see two clusters of nearly identical registers (ie. one with
values for current draw and one with values from previous draw).  On
a3xx/a4xx they are offset by 0x1000..  see find_domain() in
envytools/rnn/demsm.c.  Without extra WFI's the CP is running ahead of
the rest of the gpu, setting up registers for the next draw in
parallel with current draw.  I suspect a2xx is probably similar.
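
A rough sketch of how such a dump could be compared, assuming it has been
read into a dict of register offset to value. The base of 0x2000 and the
stride of 0x1000 are taken from the description above and are assumptions
about the layout (find_domain() is the place to check); the function name is
hypothetical:

```python
def diff_register_banks(dump, base=0x2000, stride=0x1000):
    """Compare each register in [base, base + stride) with its copy at +stride.

    Returns {offset: (value_in_first_bank, value_in_second_bank)} for every
    register present in both banks whose values differ.
    """
    diffs = {}
    for offset in range(base, base + stride):
        other = offset + stride
        if offset in dump and other in dump and dump[offset] != dump[other]:
            diffs[offset] = (dump[offset], dump[other])
    return diffs


# Made-up values, only to show the shape of the result.
example_dump = {0x2000: 1, 0x3000: 1, 0x2001: 5, 0x3001: 7, 0x2002: 9}
example_diff = diff_register_banks(example_dump)
```

Registers that exist in only one bank are skipped rather than reported.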

Anyways, that is mostly just fyi, since you can see from the memory
dump that the gmem2mem actually did work.

One thing I noticed on a3xx, is that things needed to have a certain
alignment in the toplevel ringbuffer, or the gpu would hang.. see
comment in adreno_submit() on the kernel side.  (ofc who knows if the
type-2 packet works on a2xx so you might need to do something
differently.. or perhaps it is even hanging on that?)

I'd try putting a WFI after the gmem2mem, and then a scratch register
write after the WFI.  Could be it is hanging up on return to the
previous IB level (or on return to ringbuffer?)

BR,
-R

> Comparing with the command stream from the blob driver is complicated by the
> blob driver using the hardware binning commands.
>
>
> Regards,
>
> Martin
>