etnaviv: mmu issue after end of address space reached?

Sat Dec 10 11:00:46 UTC 2016

I'm having an issue where a long-running test eventually runs into an MMU
fault. What this test does is basically:

- while [ 1 ]; do start a program that:
    - Allocate bo A, B, C and D
    - Map bo C, update it
    - Loop
        - Map bo A, B and C, update them
        - Build command buffer
        - Submit command buffer
        - etna_cmd_stream_finish
        - Map buffer A, check output
    - Delete buffer A, B, C and D
    - Exit program
(code is here: https://github.com/etnaviv/etnaviv_gpu_tests/blob/master/src/etnaviv_verifyops.c#L735)

The curious thing is that after the fault happens once, it keeps running into
the same fault almost immediately, even after a GPU reset. This makes me suspect
that the problem is in kernel driver state rather than GPU state.

I added some debug output to etnaviv_iommu_find_iova in the kernel driver:

<4>[  549.776209] Found iova: 00000000 eff82000
<4>[  549.780712] Found iova: 00000000 eff93000
<4>[  549.785173] Found iova: 00000000 effa4000
<4>[  549.789706] Found iova: 00000000 effb5000
<4>[  549.794167] Found iova: 00000000 effc6000
<4>[  549.798686] Found iova: 00000000 effd7000
<4>[  549.803171] Found iova: 00000000 effe8000
<4>[  549.807680] last_iova <- end of range
<4>[  549.809966] Found iova: 00000000 e8783000
<3>[  549.814025] etnaviv-gpu 130000.gpu: MMU fault status 0x00000002 <- happens almost immediately
<3>[  549.819960] etnaviv-gpu 130000.gpu: MMU 0 fault addr 0xe8783040
<3>[  549.825889] etnaviv-gpu 130000.gpu: MMU 1 fault addr 0x00000000
<3>[  549.831817] etnaviv-gpu 130000.gpu: MMU 2 fault addr 0x00000000
<3>[  549.837744] etnaviv-gpu 130000.gpu: MMU 3 fault addr 0x00000000
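
As a quick sanity check on the log above, the following sketch parses a few of the lines and confirms two things: the spacing between consecutive mappings, and that the faulting address lies inside the most recently returned iova. It assumes the second hex field after "Found iova:" is the address and that GPU pages are 4 KiB; neither is stated in the log itself.

```python
import re

# A few lines copied from the log above (timestamps and prefixes dropped).
LOG = """\
Found iova: 00000000 effd7000
Found iova: 00000000 effe8000
Found iova: 00000000 e8783000
MMU 0 fault addr 0xe8783040
"""

PAGE_SIZE = 0x1000  # assumption: 4 KiB GPU pages


def parse(log):
    """Return (list of iovas, MMU 0 fault address) from the log text."""
    # Assumption: the second 8-digit hex field is the iova.
    iovas = [int(m.group(1), 16)
             for m in re.finditer(r"Found iova: [0-9a-f]{8} ([0-9a-f]{8})", log)]
    fault = re.search(r"MMU 0 fault addr 0x([0-9a-f]+)", log)
    return iovas, int(fault.group(1), 16)


iovas, fault_addr = parse(LOG)
stride = iovas[1] - iovas[0]                  # distance between two mappings
fault_page = fault_addr & ~(PAGE_SIZE - 1)    # page containing the fault

print(hex(stride), hex(fault_page), fault_page == iovas[-1])
```

So the mappings before the wrap are 0x11000 bytes apart, and the fault hits offset 0x40 in the first page of the mapping handed out right after the wrap.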

Apparently it is running out of address space.
(I changed the end of the range to 0xf0000000 instead of 0xffffffff to rule out
that the GPU dislikes certain addresses.)

In principle this shouldn't be an issue - after last_iova it starts over, with a
flushed MMU. I verified that this flush is actually being queued in etnaviv_buffer_queue.
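
The wrap-around described above can be sketched as a toy model. This is not the kernel code: the class, its method names and the tiny address range are made up for illustration, and the model assumes a simple "lowest free address at or above last_iova" search, which the kernel's real range allocator may not follow.

```python
class ToyIovaAllocator:
    """Toy next-fit allocator with wrap-around; hypothetical, not etnaviv code."""

    def __init__(self, start, end):
        self.start = start
        self.end = end              # exclusive upper bound of the range
        self.last_iova = start      # search hint, modelled on last_iova
        self.need_flush = False     # set when the search wraps around
        self.mapped = {}            # iova -> size of each live mapping

    def _fits(self, iova, size):
        if iova + size > self.end:
            return False
        return all(iova + size <= a or a + s <= iova
                   for a, s in self.mapped.items())

    def _search(self, lo, size):
        # The lowest free address >= lo is either lo itself or the end
        # of an existing mapping, so only those candidates are checked.
        candidates = [lo] + [a + s for a, s in self.mapped.items() if a + s >= lo]
        for iova in sorted(candidates):
            if self._fits(iova, size):
                return iova
        return None

    def find_iova(self, size):
        iova = self._search(self.last_iova, size)
        if iova is None and self.last_iova != self.start:
            # Nothing free above the hint: start over from the bottom and
            # request an MMU flush, since old addresses may now be reused.
            self.last_iova = self.start
            self.need_flush = True
            iova = self._search(self.last_iova, size)
        if iova is None:
            raise MemoryError("address space exhausted")
        self.mapped[iova] = size
        self.last_iova = iova + size
        return iova

    def unmap(self, iova):
        del self.mapped[iova]


# Arbitrary small range; 0x11000 is the mapping spacing seen in the log.
alloc = ToyIovaAllocator(0x0, 0x40000)
size = 0x11000
a = alloc.find_iova(size)
b = alloc.find_iova(size)
c = alloc.find_iova(size)
alloc.unmap(a)
alloc.unmap(b)
d = alloc.find_iova(size)   # no room above last_iova, so the search wraps
print(hex(d), alloc.need_flush)
```

In this model the first allocation after the wrap lands at the lowest free address and need_flush is set, which is the behaviour one would expect if the restart really searches from the bottom of the range.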

However for some reason that logic doesn't seem to be working. I have not found
out what is wrong yet. I have not verified whether the MMU flush is actually flushing,
or whether this is a problem with updating the page tables.

What I find curious, though, is that after the search presumably starts over at
0, it returns 0xe8783000 instead of an earlier address. For this reason
last_iova is stuck near the end of the address space and the problem keeps
repeating once it's been hit.

It's certainly possible that I'm doing something dumb here and am somehow
filling up the address space :)

Wladimir