[Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers
Samuel Pitoiset
samuel.pitoiset at gmail.com
Thu Jan 26 11:01:10 UTC 2017
On 01/26/2017 03:45 AM, Michel Dänzer wrote:
> On 25/01/17 11:19 PM, Samuel Pitoiset wrote:
>> On 01/25/2017 03:56 AM, Michel Dänzer wrote:
>>> On 25/01/17 12:05 AM, Marek Olšák wrote:
>>>> On Tue, Jan 24, 2017 at 2:17 PM, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>> Am 24.01.2017 um 11:44 schrieb Samuel Pitoiset:
>>>>>> On 01/24/2017 11:38 AM, Nicolai Hähnle wrote:
>>>>>>> On 24.01.2017 11:34, Samuel Pitoiset wrote:
>>>>>>>> On 01/24/2017 11:31 AM, Nicolai Hähnle wrote:
>>>>>>>>> On 24.01.2017 11:25, Samuel Pitoiset wrote:
>>>>>>>>>> On 01/24/2017 07:39 AM, Michel Dänzer wrote:
>>>>>>>>>>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Useful when debugging applications which map too much VRAM.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Is the number of mapped buffers really useful, as opposed
>>>>>>>>>>> to the total size of buffer mappings? Even if it was the
>>>>>>>>>>> latter though, it doesn't show which mappings are for BOs
>>>>>>>>>>> in VRAM vs GTT, does it? Also, even the total size of
>>>>>>>>>>> mappings of BOs currently in VRAM doesn't directly reflect
>>>>>>>>>>> the pressure on the CPU visible part of VRAM — only the BOs
>>>>>>>>>>> which are actively being accessed by the CPU contribute to
>>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It's actually useful to know the number of mapped buffers,
>>>>>>>>>> but maybe it would be better to have two separate counters
>>>>>>>>>> for GTT and VRAM, although the number of mapped buffers in
>>>>>>>>>> VRAM is most of the time very high compared to GTT AFAIK.
>>>>>>>>>>
>>>>>>>>>> I will submit a follow-up patch which reduces the number of
>>>>>>>>>> mapped buffers in VRAM (for BOs which have been mapped only
>>>>>>>>>> once). This new counter helped me there.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Michel's point probably means that reducing the number/size
>>>>>>>>> of mapped VRAM buffers isn't actually that important though.
>>>>>>>>
>>>>>>>>
>>>>>>>> It seems useful for apps which map more than 256MB of VRAM.
>>>>>>>
>>>>>>>
>>>>>>> True, if all of that range is actually used by the CPU (which
>>>>>>> may well happen, of course). If I understand Michel correctly
>>>>>>> (and this was news to me as well), if 1GB of VRAM is mapped but
>>>>>>> only 64MB of that is regularly accessed by the CPU, then the
>>>>>>> kernel will migrate all of the rest into non-visible VRAM.
>>>>>>
>>>>>>
>>>>>> And this can hurt us: for example, DXMD maps over 500MB of
>>>>>> VRAM, and a bunch of those BOs are only mapped once.
>>>>>
>>>>>
>>>>> But when they are only mapped once, that won't be a problem.
>>>>>
>>>>> Again, as Michel noted, a VRAM buffer is migrated into the
>>>>> visible part of VRAM on access, not on mapping.
>>>>>
>>>>> In other words, you can map all your VRAM buffers and keep them
>>>>> mapped, and that won't hurt anybody.
>>>>
>>>> Are you saying that I can map 2 GB of VRAM and it will all stay in
>>>> VRAM and I'll get maximum performance if it's not accessed by the CPU
>>>> too much?
>>>
>>> Yes, that's how it's supposed to work.
>>>
>>>
>>>> Are you sure it won't have any adverse effects on anything?
>>>
>>> That's a pretty big statement. :) Bugs happen.
>>>
>>>
>>>> Having useless memory mappings certainly must have some negative
>>>> effect on something. It doesn't seem like a good idea to have a lot of
>>>> mapped memory that doesn't have to be mapped.
>>>
>>> I guess e.g. the bookkeeping overhead might become significant with
>>> large numbers of mappings. Maybe the issue Sam has been looking into is
>>> actually related to something like that, not to VRAM?
>>
>> Well, with some games that new query can report more than 6.8k mapped
>> buffers
>
> Right, and the new HUD query is useful for tracking this, as I can
> easily imagine such a large number of mappings incurring significant
> overhead somewhere.
>
>
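For reference, the counter boils down to something like this (a
simplified sketch with illustrative names; the real code sits in the
winsys map/unmap paths):

#include <stdatomic.h>

/* Number of buffers with a live CPU mapping, sampled by the HUD. */
static atomic_uint_fast64_t hyp_num_mapped_buffers;

static void hyp_on_buffer_map(void)
{
   /* One more buffer with a live CPU mapping. */
   atomic_fetch_add(&hyp_num_mapped_buffers, 1);
}

static void hyp_on_buffer_unmap(void)
{
   /* The mapping is gone. */
   atomic_fetch_sub(&hyp_num_mapped_buffers, 1);
}

It can then be displayed with something like
GALLIUM_HUD=num-mapped-buffers (assuming that is the final query
name).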
>> (both VRAM and GTT), but a bunch of them are in VRAM, adding up to
>> more than 1GB of mapped VRAM.
>>
>> When I look at the number of bytes moved by TTM, that counter is
>> also very high in these apps and most likely tied to the slowdowns.
>> The kernel memory manager is moving data almost all the time...
>> Presumably it's because of that 256MB aperture limit.
>
> However, my point is that these issues are not directly related to the
> number of mappings itself, and reducing the number of mappings won't
> directly affect them.
>
>
>> I would like to approach the problem by reducing the amount of VRAM
>> needed by userspace, in order to prevent TTM from moving lots of
>> data...
>
> One thing that might help there is not trying to put any buffers in
> VRAM which will (likely) be accessed by the CPU and which are larger
> than, say, 1/4 the size of CPU-visible VRAM. And maybe also keeping
> track of the total size of such buffers we're trying to put in VRAM,
> and stopping when it exceeds, say, 3/4 of that.
That could be a solution, yes. But maybe we should also try to reduce
the number of mapped VRAM buffers (those mapped only once). Hopefully
that will reduce the overhead somewhere, although it won't change
anything if we want to access more than 256MB at the same time.
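A minimal sketch of that idea (hypothetical names, not the actual
winsys code, which also has to deal with locking and the mapping
cache):

#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

struct hyp_buffer {
   void *cpu_ptr;      /* cached CPU mapping, NULL when unmapped */
   size_t size;        /* size of the mapping in bytes */
   unsigned map_count; /* how many times the buffer was mapped */
   bool in_vram;       /* buffer is currently placed in VRAM */
};

static void hyp_buffer_unmap(struct hyp_buffer *buf)
{
   /* Keep frequently-mapped buffers cached, but really unmap a VRAM
    * buffer that was mapped only once, so it stops counting against
    * the number of mapped buffers. */
   if (buf->in_vram && buf->map_count == 1 && buf->cpu_ptr) {
      munmap(buf->cpu_ptr, buf->size);
      buf->cpu_ptr = NULL;
   }
}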
By the way, shouldn't the amount of CPU-accessible VRAM be 256MB? I
added a new query which tracks that, and also something which stores
the visible VRAM size at init time. It returns:

vram_vis_size = 238 MB

That's not exactly 256MB. Any ideas?
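Roughly what that looks like (a simplified sketch with illustrative
names; the real value comes from the kernel via the winsys):

#include <stdint.h>

struct hyp_screen {
   uint64_t vram_vis_size; /* CPU-visible VRAM in bytes */
};

/* Called once at screen creation; the parameter stands in for the
 * visible VRAM size reported by the kernel driver. */
static void hyp_screen_init(struct hyp_screen *s, uint64_t vis_size)
{
   s->vram_vis_size = vis_size;
}

/* HUD query callback: just report the stored value. */
static uint64_t hyp_query_vram_vis_size(const struct hyp_screen *s)
{
   return s->vram_vis_size;
}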
>
>
>> Anyway, I'm going to push this patch.
>
> Thanks for updating the commit log.
>
>