Etnaviv DRM driver - oops when unloading

Wed May 27 16:03:52 PDT 2015

On Wed, May 27, 2015 at 02:49:17PM +0200, Lucas Stach wrote:
> Hi Alexander,
> 
> Am Mittwoch, den 27.05.2015, 14:45 +0200 schrieb Alexander Holler:
> > Hello,
> > 
> > I've just build and booted the Etnaviv driver as module with Kernel 4.0.4.
> > 
> > When I unload the driver with rmmod an oops happens:
> > 
> Thanks for the report.
> 
> I'm currently working on the patchstack to get it into shape for another
> submission. I'll take a look at this problem.

Lucas,

We definitely need to talk about this... please can you take a look at
my patch stack which I mentioned in my reply to Alexander.  I know that
you already based your patch set off one of my out-dated patch stacks,
but unless you grab a more recent one, you're going to be solving a lot
of bugs that I've fixed.

Also, I've heard via IRC that you've updated my DDX - which is something
I've also done (I think I did mention that I would be doing this... and
I also have a bunch of work on the Xrender backend.)  I suspect that
makes your work there redundant.

There's at least two more issues that need to be discussed, which is
concerning the command buffers, and their DMA-coherent nature, which
makes the kernel command parser expensive.  In my perf measurements,
it's right at the top of the list as the most expensive function.

I've made some improvements to the parser which reduces its perf
figure a little, but it's still up there as the number one hot
function, inspite of the code being as tight as the compiler can
manage.

This is probably because we're reading from uncached memory: the CPU
can't speculatively prefetch into the cache any of the data from memory.

It's _probably_ (I haven't benchmarked it) going to be faster to
copy_from_user() the command buffers into the kernel, either directly
into DMA coherent memory, or into cacheable memory, and then run them
through the command parser, followed by either a copy to DMA coherent
memory or using the DMA streaming API to push the cache lines out.

This has other advantages: by not directly exposing the memory which the
GPU executes its command stream into userspace, we prevent submit-then-
change "attacks" bypassing the kernel command parser.

Another issue is that we incur quite a heavy overhead if we allocate an
etnadrm buffer, and then map it into userspace.  We end up allocating
all pages for the buffer as soon as any page is faulted in (consider if
it's a 1080p buffer...) and then setup the scatterlists.  That's useless
overhead if we later decide that we're not going to pass it to the GPU
(eg, because Xorg allocated a pixmap which it then only performed CPU
operations on.)  I have "changes" for this which aren't prepared as
proper patches yet (in other words, they're currently as a playground of
changes.)

I have other improvements pending which needs proper perf analysis
before I can sort them out properly.  These change the flow for a
submitted command set - reading all data from userspace before taking
dev->struct_mutex, since holding this lock over a page fault isn't
particularly graceful.  Do we really want to stall graphics on the
entire GPU while some process' page happens to be swapped out to disk?
I suspect not.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.