[Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers

Christian König deathsimple at vodafone.de
Thu Jan 26 11:07:28 UTC 2017


Am 26.01.2017 um 12:01 schrieb Samuel Pitoiset:
>
>
> On 01/26/2017 03:45 AM, Michel Dänzer wrote:
>> On 25/01/17 11:19 PM, Samuel Pitoiset wrote:
>>> On 01/25/2017 03:56 AM, Michel Dänzer wrote:
>>>> On 25/01/17 12:05 AM, Marek Olšák wrote:
>>>>> On Tue, Jan 24, 2017 at 2:17 PM, Christian König
>>>>> <deathsimple at vodafone.de> wrote:
>>>>>> Am 24.01.2017 um 11:44 schrieb Samuel Pitoiset:
>>>>>>> On 01/24/2017 11:38 AM, Nicolai Hähnle wrote:
>>>>>>>> On 24.01.2017 11:34, Samuel Pitoiset wrote:
>>>>>>>>> On 01/24/2017 11:31 AM, Nicolai Hähnle wrote:
>>>>>>>>>> On 24.01.2017 11:25, Samuel Pitoiset wrote:
>>>>>>>>>>> On 01/24/2017 07:39 AM, Michel Dänzer wrote:
>>>>>>>>>>>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Useful when debugging applications which map too much VRAM.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is the number of mapped buffers really useful, as opposed to the
>>>>>>>>>>>> total size of buffer mappings? Even if it was the latter though,
>>>>>>>>>>>> it doesn't show which mappings are for BOs in VRAM vs GTT, does
>>>>>>>>>>>> it? Also, even the total size of mappings of BOs currently in
>>>>>>>>>>>> VRAM doesn't directly reflect the pressure on the CPU visible
>>>>>>>>>>>> part of VRAM — only the BOs which are actively being accessed by
>>>>>>>>>>>> the CPU contribute to that.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It's actually useful to know the number of mapped buffers, but
>>>>>>>>>>> maybe it would be better to have two separate counters for GTT
>>>>>>>>>>> and VRAM. Although the number of mapped buffers in VRAM is most
>>>>>>>>>>> of the time very high compared to GTT, AFAIK.
>>>>>>>>>>>
>>>>>>>>>>> In a follow-up patch, I will submit something which reduces the
>>>>>>>>>>> number of mapped buffers in VRAM (when a BO has been mapped only
>>>>>>>>>>> once), and this new counter helped me with that.
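
Purely as an illustration of the two-counters idea (the names below are
invented for this sketch, not taken from the patch), per-heap counters
bumped at map/unmap time could look roughly like this:

   #include <stdatomic.h>
   #include <stdbool.h>

   /* Hypothetical per-heap counters; the real patch may track this
    * differently. */
   static atomic_uint num_mapped_vram;
   static atomic_uint num_mapped_gtt;

   static void count_map(bool bo_in_vram)
   {
      atomic_fetch_add(bo_in_vram ? &num_mapped_vram : &num_mapped_gtt, 1);
   }

   static void count_unmap(bool bo_in_vram)
   {
      atomic_fetch_sub(bo_in_vram ? &num_mapped_vram : &num_mapped_gtt, 1);
   }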
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Michel's point probably means that reducing the number/size of
>>>>>>>>>> mapped VRAM buffers isn't actually that important though.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It seems useful for apps which map more than 256MB of VRAM.
>>>>>>>>
>>>>>>>>
>>>>>>>> True, if all of that range is actually used by the CPU (which may
>>>>>>>> well happen, of course). If I understand Michel correctly (and this
>>>>>>>> was news to me as well), if 1GB of VRAM is mapped, but only 64MB of
>>>>>>>> that are regularly accessed by the CPU, then the kernel will migrate
>>>>>>>> all of the rest into non-visible VRAM.
>>>>>>>
>>>>>>>
>>>>>>> And this can hurt us; for example, DXMD maps over 500MB of VRAM, and
>>>>>>> a bunch of those BOs are only mapped once.
>>>>>>
>>>>>>
>>>>>> But when they are only mapped once, that won't be a problem.
>>>>>>
>>>>>> Again, as Michel noted, a mapped VRAM buffer is migrated into the
>>>>>> visible part of VRAM on access, not on mapping.
>>>>>>
>>>>>> In other words, you can map all your VRAM buffers and keep them
>>>>>> mapped, and that won't hurt anybody.
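
To make the map-versus-access distinction concrete, here is a small sketch
using the Gallium buffer helpers (only an illustration; the comments
paraphrase the kernel behaviour described above):

   #include "util/u_inlines.h"

   static void touch_one_byte(struct pipe_context *pipe,
                              struct pipe_resource *buf)
   {
      struct pipe_transfer *xfer;
      char *ptr = pipe_buffer_map(pipe, buf, PIPE_TRANSFER_WRITE, &xfer);

      /* Only this store is an actual CPU access; a mapping that is never
       * dereferenced costs address space and bookkeeping, not CPU-visible
       * VRAM. */
      ptr[0] = 0;

      pipe_buffer_unmap(pipe, xfer);
   }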
>>>>>
>>>>> Are you saying that I can map 2 GB of VRAM and it will all stay in
>>>>> VRAM and I'll get maximum performance if it's not accessed by the CPU
>>>>> too much?
>>>>
>>>> Yes, that's how it's supposed to work.
>>>>
>>>>
>>>>> Are you sure it won't have any adverse effects on anything?
>>>>
>>>> That's a pretty big statement. :) Bugs happen.
>>>>
>>>>
>>>>> Having useless memory mappings certainly must have some negative
>>>>> effect on something. It doesn't seem like a good idea to have a lot of
>>>>> mapped memory that doesn't have to be mapped.
>>>>
>>>> I guess e.g. the bookkeeping overhead might become significant with
>>>> large numbers of mappings. Maybe the issue Sam has been looking into is
>>>> actually related to something like that, not to VRAM?
>>>
>>> Well, with some games that new query can report more than 6.8k mapped
>>> buffers
>>
>> Right, and the new HUD query is useful for tracking this, as I can
>> easily imagine such a large number of mappings incurring significant
>> overhead somewhere.
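
For reference, such counters can be watched on the HUD together with the
existing ones, e.g. (query names assumed here; the new one is presumably
exposed as "num-mapped-buffers", next to the existing "num-bytes-moved" and
"requested-VRAM" radeon queries):

   GALLIUM_HUD=num-mapped-buffers,num-bytes-moved,requested-VRAM <application>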
>>
>>
>>> (both VRAM/GTT) but a bunch are for VRAM. And more than 1GB of mapped
>>> VRAM.
>>>
>>> When I look at the number of bytes moved by TTM, the counter is also
>>> very high in these apps and most likely tied to the slowdowns. The
>>> kernel memory manager is moving data almost all the time... Presumably
>>> it's because of that aperture limit of 256MB.
>>
>> However, my point is that these issues are not directly related to the
>> number of mappings itself, and reducing the number of mappings won't
>> directly affect them.
>>
>>
>>> I would like to approach the problem by reducing the amount of VRAM
>>> needed by userspace, in order to prevent TTM from moving a lot of data...
>>
>> One thing that might help there is not trying to put any buffers in VRAM
>> which will (likely) be accessed by the CPU and which are larger than, say,
>> 1/4 the size of CPU-visible VRAM. And maybe also keeping track of the
>> total size of such buffers we're trying to put in VRAM, and stopping when
>> it exceeds, say, 3/4.
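
Sketched in code, that heuristic could look roughly like this (names and
structure are only illustrative, not from any existing code; only the 1/4
and 3/4 thresholds come from the suggestion above):

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical placement heuristic: keep likely-CPU-accessed buffers out
    * of VRAM when they are large relative to the CPU-visible window, and cap
    * the running total placed there. */
   enum bo_domain { DOMAIN_VRAM, DOMAIN_GTT };

   static uint64_t visible_vram_size;     /* e.g. the 256MB BAR */
   static uint64_t cpu_visible_vram_used; /* running total of such buffers */

   static enum bo_domain pick_initial_domain(uint64_t size,
                                             bool cpu_access_likely)
   {
      if (cpu_access_likely) {
         if (size > visible_vram_size / 4)
            return DOMAIN_GTT;   /* larger than 1/4 of the visible window */
         if (cpu_visible_vram_used + size > visible_vram_size * 3 / 4)
            return DOMAIN_GTT;   /* visible window already 3/4 committed */
         cpu_visible_vram_used += size;
      }
      return DOMAIN_VRAM;
   }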
>
> That could be a solution, yes. But maybe we should also try to reduce
> the amount of mapped VRAM (for buffers mapped only once).

For buffers mapped only once I suggest just using a bounce buffer in GART.
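
Roughly, the idea is that the VRAM BO itself is never CPU-mapped: a staging
BO in GTT takes the write, and the GPU copies it over on unmap. A sketch
only, with every type and helper name here invented for illustration:

   /* Hypothetical bounce-buffer path for a write-once VRAM BO. */
   void *map_write_once(struct bo *vram_bo)
   {
      vram_bo->staging = bo_create(DOMAIN_GTT, vram_bo->size);
      return bo_cpu_map(vram_bo->staging);      /* CPU writes land in GTT */
   }

   void unmap_write_once(struct ctx *ctx, struct bo *vram_bo)
   {
      bo_cpu_unmap(vram_bo->staging);
      gpu_copy(ctx, vram_bo, vram_bo->staging); /* DMA copy into VRAM */
      bo_destroy(vram_bo->staging);             /* VRAM BO was never mapped */
   }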

BTW: What kind of allocations are we talking about here? From the
application, or driver-internal allocations (shader code, for example)?

> Hopefully that will reduce the overhead somewhere. Although this won't
> change anything if we want to access more than 256MB at the same time.
>
> By the way, shouldn't the amount of CPU-accessible VRAM be 256MB? I
> added a new query which tracks that, and also something which stores the
> visible VRAM size at init time. And it returns:
>
> vram_vis_size = 238 MB
>
> That's not exactly 256MB. Any ideas?

The GART table is in VRAM as well, so the amount of BAR available for
applications is actually lower than 256MB.
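
As a rough worked example (the GART size here is an assumption, not a
measurement): assuming an 8-byte GART page table entry per 4 KiB page, an
8 GiB GART needs 8 GiB / 4 KiB * 8 bytes = 16 MiB of VRAM for its table.
Together with a few more MiB of other kernel reservations, a drop from
256 MB to the reported 238 MB is about what you'd expect.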

Regards,
Christian.

>
>>
>>
>>> Anyway, I'm going to push this patch.
>>
>> Thanks for updating the commit log.
>>
>>


