[Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers

Wed Nov 1 15:09:53 UTC 2017

Lionel Landwerlin <lionel.g.landwerlin at intel.com> writes:

> On 31/10/17 23:04, Scott D Phillips wrote:
>> Lionel Landwerlin <lionel.g.landwerlin at intel.com> writes:
>>
>>> On 31/10/17 20:54, Scott D Phillips wrote:
>>>> Lionel Landwerlin <lionel.g.landwerlin at intel.com> writes:
>>>>
>>>>> We want to introduce a reader interface for accessing memory, so that
>>>>> later on we can use different ways of storing the content of the GTT
>>>>> address space that don't involve a pointer to a linear buffer.
>>>> I'm kinda sceptical that this is the best way to achieve what you want
>>>> here. It strikes me as code that we'll look at in a year and wonder
>>>> what's going on.
>>>>
>>>> If I'm understanding, it seems like the essence of what you're going for
>>>> here is in the one place where you're using the sub_struct_reader. Maybe
>>>> instead of plumbing the reader object through everywhere, you can add a
>>>> callback just in gen_print_group for fixing up offsets to pointers, and
>>>> then leave everywhere else assuming contiguous memory blocks as today.
>>> First, thanks for your time reviewing this!
>>>
>>> I should have stated that in patch 33 I introduce a sparse memory object
>>> that isn't contiguous.
>>> It's based on the data structure described here:
>>> https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>>>
>>> The idea is to split the memory into chunks of 4KiB but still make it
>>> look like it's a 64-bit address space.
>>> The trie structure allows for reuse of pages at different points in time
>>> without having an actual copy of the whole address space.
>> What I meant was that most dword reads will really be adjacent in a
>> piece of memory and leaving the simple pointer math there is
>> clearer. You will only need a callback for indirection when you're
>> chasing an offset or an address.
>>
>>> Like a couple of pages might have been written by relocations associated
>>> with the first batch buffer, then 10 batches later you overwrite them.
>>> The amount of memory we need to allocate for storing 2 snapshots is just
>>> the modified pages (+ ~12 nodes in the trie but those are less than
>>> 300 bytes).
>>> That allows the UI to decode 2 batches at the same time as well as all
>>> the associated memory with a small cost.
>> Really there's no need to manage any memory for the buffers themselves,
>> they're immutably stored in the aub file. If you mmap the entire file
>> then you would just need to have a map of gfx addrs to file addrs that
>> would help direct your decoding.
>>
> Thanks, I'll try that.

Thinking more about it, I remember that intel_aubdump will break up
buffers into 32KiB chunks. So that would cause problems for this idea for
buffers bigger than 32KiB. We could try just not doing that splitting in
aubdump and see if it has any other adverse effects.
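
To make the concern concrete, here is a rough sketch of the kind of
gfx-address-to-file-offset map I have in mind, and of where the chunking
bites. Everything in it is hypothetical: struct addr_range, find_range()
and translate_contiguous() are names made up for illustration and are not
existing decoder or aubdump code. It also assumes the ranges are sorted by
graphics address, do not overlap, and that each chunk written to the file
gets its own entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one chunk of a buffer as it sits in the mmap'ed file. */
struct addr_range {
   uint64_t gfx_addr;    /* start of the chunk in the graphics address space */
   uint64_t file_offset; /* where the chunk's bytes start in the file */
   uint64_t size;        /* length of the chunk in bytes */
};

/* Binary search for the range containing gfx_addr.
 * Assumes ranges[] is sorted by gfx_addr and non-overlapping.
 * Returns NULL if the address is not covered by any range. */
static const struct addr_range *
find_range(const struct addr_range *ranges, size_t count, uint64_t gfx_addr)
{
   size_t lo = 0, hi = count;

   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      const struct addr_range *r = &ranges[mid];

      if (gfx_addr < r->gfx_addr)
         hi = mid;
      else if (gfx_addr - r->gfx_addr >= r->size)
         lo = mid + 1;
      else
         return r;
   }

   return NULL;
}

/* Translate [gfx_addr, gfx_addr + len) to a file offset, but only when the
 * whole span lives inside a single chunk. A span that crosses a chunk
 * boundary cannot be decoded with plain pointer math, so it is rejected. */
static bool
translate_contiguous(const struct addr_range *ranges, size_t count,
                     uint64_t gfx_addr, uint64_t len, uint64_t *file_offset)
{
   const struct addr_range *r = find_range(ranges, count, gfx_addr);
   if (r == NULL)
      return false;

   uint64_t skip = gfx_addr - r->gfx_addr;
   if (len > r->size - skip)
      return false;

   *file_offset = r->file_offset + skip;
   return true;
}
```

With two adjacent 32KiB entries for the same buffer, a read that starts
near the end of the first entry and runs into the second is rejected here,
even though the graphics addresses are contiguous. That is the case the
splitting would create for buffers bigger than 32KiB.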