[Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers

Samuel Pitoiset samuel.pitoiset at gmail.com
Thu Jan 26 11:09:56 UTC 2017



On 01/26/2017 12:07 PM, Christian König wrote:
> Am 26.01.2017 um 12:01 schrieb Samuel Pitoiset:
>>
>>
>> On 01/26/2017 03:45 AM, Michel Dänzer wrote:
>>> On 25/01/17 11:19 PM, Samuel Pitoiset wrote:
>>>> On 01/25/2017 03:56 AM, Michel Dänzer wrote:
>>>>> On 25/01/17 12:05 AM, Marek Olšák wrote:
>>>>>> On Tue, Jan 24, 2017 at 2:17 PM, Christian König
>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>> Am 24.01.2017 um 11:44 schrieb Samuel Pitoiset:
>>>>>>>> On 01/24/2017 11:38 AM, Nicolai Hähnle wrote:
>>>>>>>>> On 24.01.2017 11:34, Samuel Pitoiset wrote:
>>>>>>>>>> On 01/24/2017 11:31 AM, Nicolai Hähnle wrote:
>>>>>>>>>>> On 24.01.2017 11:25, Samuel Pitoiset wrote:
>>>>>>>>>>>> On 01/24/2017 07:39 AM, Michel Dänzer wrote:
>>>>>>>>>>>>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Useful when debugging applications which map too much VRAM.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is the number of mapped buffers really useful, as opposed
>>>>>>>>>>>>> to the
>>>>>>>>>>>>> total
>>>>>>>>>>>>> size of buffer mappings? Even if it was the latter though, it
>>>>>>>>>>>>> doesn't
>>>>>>>>>>>>> show which mappings are for BOs in VRAM vs GTT, does it? Also,
>>>>>>>>>>>>> even
>>>>>>>>>>>>> the
>>>>>>>>>>>>> total size of mappings of BOs currently in VRAM doesn't
>>>>>>>>>>>>> directly
>>>>>>>>>>>>> reflect
>>>>>>>>>>>>> the pressure on the CPU visible part of VRAM — only the BOs
>>>>>>>>>>>>> which are
>>>>>>>>>>>>> actively being accessed by the CPU contribute to that.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It's actually useful to know the number of mapped buffers, but
>>>>>>>>>>>> maybe it would be better to have two separate counters for GTT
>>>>>>>>>>>> and VRAM, although the number of mapped buffers in VRAM is most
>>>>>>>>>>>> of the time much higher than for GTT, AFAIK.
>>>>>>>>>>>>
>>>>>>>>>>>> I will submit a follow-up patch which reduces the number of
>>>>>>>>>>>> mapped buffers in VRAM (when a BO has been mapped only once).
>>>>>>>>>>>> This new counter helped me with that.
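
Side note on the counters themselves: the idea is simply to bump an atomic
counter when a buffer is CPU-mapped and drop it again on unmap, optionally
with one counter per heap. A minimal sketch with made-up names (not the
actual winsys code):

#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: one counter per heap, incremented when a buffer is
 * CPU-mapped and decremented when it is unmapped. */
struct mapped_buffer_stats {
   atomic_long num_mapped_vram;
   atomic_long num_mapped_gtt;
};

static void stats_on_map(struct mapped_buffer_stats *s, bool is_vram)
{
   atomic_fetch_add(is_vram ? &s->num_mapped_vram : &s->num_mapped_gtt, 1);
}

static void stats_on_unmap(struct mapped_buffer_stats *s, bool is_vram)
{
   atomic_fetch_sub(is_vram ? &s->num_mapped_vram : &s->num_mapped_gtt, 1);
}

The HUD query would then just read the counter(s) whenever it samples.
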
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Michel's point probably means that reducing the number/size of
>>>>>>>>>>> mapped
>>>>>>>>>>> VRAM buffers isn't actually that important though.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It seems useful for apps which map more than 256MB of VRAM.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> True, if all of that range is actually used by the CPU (which may
>>>>>>>>> well
>>>>>>>>> happen, of course). If I understand Michel correctly (and this was
>>>>>>>>> news
>>>>>>>>> to me as well), if 1GB of VRAM is mapped, but only 64MB of that
>>>>>>>>> are
>>>>>>>>> regularly accessed by the CPU, then the kernel will migrate all of
>>>>>>>>> the
>>>>>>>>> rest into non-visible VRAM.
>>>>>>>>
>>>>>>>>
>>>>>>>> And this can hurt us; for example, DXMD maps over 500MB of VRAM,
>>>>>>>> and a bunch of those BOs are only mapped once.
>>>>>>>
>>>>>>>
>>>>>>> But when they are mapped only once, that won't be a problem.
>>>>>>>
>>>>>>> Again, as Michel noted, when a VRAM buffer is mapped it is migrated
>>>>>>> into the visible part of VRAM on access, not on mapping.
>>>>>>>
>>>>>>> In other words, you can map all your VRAM buffers and keep them
>>>>>>> mapped, and that won't hurt anybody.
>>>>>>
>>>>>> Are you saying that I can map 2 GB of VRAM and it will all stay in
>>>>>> VRAM and I'll get maximum performance if it's not accessed by the CPU
>>>>>> too much?
>>>>>
>>>>> Yes, that's how it's supposed to work.
>>>>>
>>>>>
>>>>>> Are you sure it won't have any adverse effects on anything?
>>>>>
>>>>> That's a pretty big statement. :) Bugs happen.
>>>>>
>>>>>
>>>>>> Having useless memory mappings certainly must have some negative
>>>>>> effect on something. It doesn't seem like a good idea to have a
>>>>>> lot of
>>>>>> mapped memory that doesn't have to be mapped.
>>>>>
>>>>> I guess e.g. the bookkeeping overhead might become significant with
>>>>> large numbers of mappings. Maybe the issue Sam has been looking
>>>>> into is
>>>>> actually related to something like that, not to VRAM?
>>>>
>>>> Well, with some games that new query can report more than 6.8k mapped
>>>> buffers
>>>
>>> Right, and the new HUD query is useful for tracking this, as I can
>>> easily imagine such a large number of mappings incurring significant
>>> overhead somewhere.
>>>
>>>
>>>> (both VRAM/GTT) but a bunch are for VRAM. And more than 1GB of
>>>> mapped VRAM.
>>>>
>>>> When I look at the number of bytes moved by TTM, the counter is also
>>>> very high in these apps and most likely tied to the slowdowns. The
>>>> kernel memory manager is moving data almost all the time... Presumably
>>>> it's because of that aperture limit of 256MB.
>>>
>>> However, my point is that these issues are not directly related to the
>>> number of mappings itself, and reducing the number of mappings won't
>>> directly affect them.
>>>
>>>
>>>> I would like to approach the problem by reducing the amount of VRAM
>>>> needed by userspace, in order to prevent TTM from moving a lot of data...
>>>
>>> One thing that might help there is not trying to put any buffers in VRAM
>>> which will (likely) be accessed by the CPU and which are larger than, say,
>>> 1/4 the size of CPU-visible VRAM. And maybe also keeping track of the
>>> total size of such buffers we're trying to put in VRAM, and stopping when
>>> it exceeds, say, 3/4.
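
If I understand the suggestion correctly, it boils down to something like
the sketch below, with made-up names and the 1/4 and 3/4 thresholds
mentioned above; this is not actual driver code:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: decide whether a buffer may be placed in VRAM.
 * vis_vram_size is the CPU-visible VRAM size; *vis_vram_used is a running
 * total of CPU-accessed buffers already placed in VRAM. */
static bool may_place_in_vram(uint64_t buf_size, bool likely_cpu_access,
                              uint64_t vis_vram_size,
                              uint64_t *vis_vram_used)
{
   if (!likely_cpu_access)
      return true;

   /* Don't put single CPU-accessed buffers larger than 1/4 of the
    * visible VRAM there at all. */
   if (buf_size > vis_vram_size / 4)
      return false;

   /* Stop once the running total would exceed 3/4 of the visible VRAM. */
   if (*vis_vram_used + buf_size > vis_vram_size * 3 / 4)
      return false;

   *vis_vram_used += buf_size;
   return true;
}

The hard part is of course guessing "likely_cpu_access" reliably.
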
>>
>> That could be a solution, yes. But maybe we should also try to reduce
>> the amount of mapped VRAM (for buffers that are mapped only once).
>
> For buffers mapped only once, I suggest just using a bounce buffer in
> GART.
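
If I follow, that would look roughly like the sketch below; all the types
and helpers here are hypothetical, not the real winsys API:

#include <stddef.h>
#include <string.h>

/* Hypothetical types/helpers, for illustration only. */
enum bo_domain { DOMAIN_VRAM, DOMAIN_GTT };
struct bo;
struct bo *bo_alloc(size_t size, enum bo_domain domain);
void *bo_cpu_map(struct bo *bo);
void bo_cpu_unmap(struct bo *bo);
void gpu_dma_copy(struct bo *dst, struct bo *src, size_t size);
void bo_free(struct bo *bo);

/* One-shot upload into a VRAM buffer through a GTT bounce buffer, so the
 * VRAM buffer itself never needs a CPU mapping. */
static void upload_once(struct bo *vram_dst, const void *data, size_t size)
{
   struct bo *bounce = bo_alloc(size, DOMAIN_GTT);
   void *ptr = bo_cpu_map(bounce);

   memcpy(ptr, data, size);              /* CPU only ever touches GTT */
   bo_cpu_unmap(bounce);
   gpu_dma_copy(vram_dst, bounce, size); /* GPU moves the data into VRAM */
   bo_free(bounce);
}

That way the VRAM BO never has to be migrated into the CPU-visible part of
VRAM at all.
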
>
> BTW: What kind of allocations are we talking about here? From the
> application, or driver-internal allocations (e.g. shader code)?
>
>> Hopefully that will reduce the overhead somewhere. However, this won't
>> change anything if we want to access more than 256MB at the same time.
>>
>> By the way, shouldn't the amount of CPU-accessible VRAM be 256MB? I
>> added a new query which tracks that, and also something which stores the
>> visible VRAM size at init time. And it returns:
>>
>> vram_vis_size = 238 MB
>>
>> That's not exactly 256MB. Any ideas?
>
> The GART table is in VRAM as well, so the amount of BAR space available
> to applications is actually lower than the full 256MB.

Ok, that makes sense. Thanks.
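
That would also roughly explain the numbers: assuming 4KB pages and 8-byte
page table entries, a 1GB GART aperture alone needs about
1GB / 4KB * 8B = 2MB of VRAM for its table, and a larger aperture plus other
pinned kernel allocations (firmware, etc.) could plausibly make up the
missing ~18MB (256MB - 238MB). That's only a back-of-the-envelope guess,
though.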

>
> Regards,
> Christian.
>
>>
>>>
>>>
>>>> Anyway, I'm going to push this patch.
>>>
>>> Thanks for updating the commit log.
>>>
>>>
>

