[Mesa-dev] [PATCH] gallium/radeon: add a new HUD query for the number of mapped buffers

Christian König deathsimple at vodafone.de
Fri Jan 27 10:05:08 UTC 2017


On 27.01.2017 at 10:33, Samuel Pitoiset wrote:
>
>
> On 01/26/2017 12:07 PM, Christian König wrote:
>> On 26.01.2017 at 12:01, Samuel Pitoiset wrote:
>>>
>>>
>>> On 01/26/2017 03:45 AM, Michel Dänzer wrote:
>>>> On 25/01/17 11:19 PM, Samuel Pitoiset wrote:
>>>>> On 01/25/2017 03:56 AM, Michel Dänzer wrote:
>>>>>> On 25/01/17 12:05 AM, Marek Olšák wrote:
>>>>>>> On Tue, Jan 24, 2017 at 2:17 PM, Christian König
>>>>>>> <deathsimple at vodafone.de> wrote:
>>>>>>>> On 24.01.2017 at 11:44, Samuel Pitoiset wrote:
>>>>>>>>> On 01/24/2017 11:38 AM, Nicolai Hähnle wrote:
>>>>>>>>>> On 24.01.2017 11:34, Samuel Pitoiset wrote:
>>>>>>>>>>> On 01/24/2017 11:31 AM, Nicolai Hähnle wrote:
>>>>>>>>>>>> On 24.01.2017 11:25, Samuel Pitoiset wrote:
>>>>>>>>>>>>> On 01/24/2017 07:39 AM, Michel Dänzer wrote:
>>>>>>>>>>>>>> On 24/01/17 05:44 AM, Samuel Pitoiset wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Useful when debugging applications which map too much VRAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is the number of mapped buffers really useful, as opposed to
>>>>>>>>>>>>>> the total size of buffer mappings? Even if it was the latter
>>>>>>>>>>>>>> though, it doesn't show which mappings are for BOs in VRAM vs
>>>>>>>>>>>>>> GTT, does it? Also, even the total size of mappings of BOs
>>>>>>>>>>>>>> currently in VRAM doesn't directly reflect the pressure on the
>>>>>>>>>>>>>> CPU visible part of VRAM; only the BOs which are actively being
>>>>>>>>>>>>>> accessed by the CPU contribute to that.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's actually useful to know the number of mapped buffers, but
>>>>>>>>>>>>> maybe it would be better to have two separate counters for GTT
>>>>>>>>>>>>> and VRAM. Although the number of mapped buffers in VRAM is most
>>>>>>>>>>>>> of the time very high compared to GTT, AFAIK.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will submit a follow-up patch which reduces the number of
>>>>>>>>>>>>> mapped buffers in VRAM (when a BO has been mapped only once).
>>>>>>>>>>>>> This new counter helped me with that.
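
As a rough illustration, such a per-domain counter could look something like
the sketch below. All names and types here are made up for illustration; the
actual radeon winsys code uses its own atomics and structures.

    /* Hypothetical sketch only -- not the actual winsys code. */
    #include <stdatomic.h>
    #include <stdint.h>

    enum buf_domain { DOMAIN_GTT = 0, DOMAIN_VRAM = 1 };

    struct mapped_buffer_stats {
       atomic_uint num_mapped[2];            /* buffers currently mapped, per domain */
       atomic_uint_fast64_t bytes_mapped[2]; /* total size of those mappings */
    };

    static struct mapped_buffer_stats stats;

    /* Called when a buffer object gets its first CPU mapping. */
    static void account_map(enum buf_domain domain, uint64_t size)
    {
       atomic_fetch_add(&stats.num_mapped[domain], 1);
       atomic_fetch_add(&stats.bytes_mapped[domain], size);
    }

    /* Called when the last CPU mapping of a buffer object goes away. */
    static void account_unmap(enum buf_domain domain, uint64_t size)
    {
       atomic_fetch_sub(&stats.num_mapped[domain], 1);
       atomic_fetch_sub(&stats.bytes_mapped[domain], size);
    }

    /* A HUD query would then simply read the counter, e.g.: */
    static uint64_t query_num_mapped(enum buf_domain domain)
    {
       return atomic_load(&stats.num_mapped[domain]);
    }

Splitting the counters by domain would let the HUD graph VRAM and GTT
mappings separately, as discussed above.
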
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Michel's point probably means that reducing the number/size of
>>>>>>>>>>>> mapped VRAM buffers isn't actually that important though.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It seems useful for apps which map more than 256MB of VRAM.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> True, if all of that range is actually used by the CPU (which may
>>>>>>>>>> well happen, of course). If I understand Michel correctly (and this
>>>>>>>>>> was news to me as well), if 1GB of VRAM is mapped, but only 64MB of
>>>>>>>>>> that are regularly accessed by the CPU, then the kernel will
>>>>>>>>>> migrate all of the rest into non-visible VRAM.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And this can hurt us; for example, DXMD maps over 500MB of VRAM,
>>>>>>>>> and a bunch of those BOs are only mapped once.
>>>>>>>>
>>>>>>>>
>>>>>>>> But when they are only mapped once, that won't be a problem.
>>>>>>>>
>>>>>>>> Again, as Michel noted, a VRAM buffer is migrated into the visible
>>>>>>>> part of VRAM on access, not on mapping.
>>>>>>>>
>>>>>>>> In other words, you can map all your VRAM buffers and keep them
>>>>>>>> mapped, and that won't hurt anybody.
>>>>>>>
>>>>>>> Are you saying that I can map 2 GB of VRAM and it will all stay in
>>>>>>> VRAM and I'll get maximum performance if it's not accessed by 
>>>>>>> the CPU
>>>>>>> too much?
>>>>>>
>>>>>> Yes, that's how it's supposed to work.
>>>>>>
>>>>>>
>>>>>>> Are you sure it won't have any adverse effects on anything?
>>>>>>
>>>>>> That's a pretty big statement. :) Bugs happen.
>>>>>>
>>>>>>
>>>>>>> Having useless memory mappings certainly must have some negative
>>>>>>> effect on something. It doesn't seem like a good idea to have a lot of
>>>>>>> mapped memory that doesn't have to be mapped.
>>>>>>
>>>>>> I guess e.g. the bookkeeping overhead might become significant with
>>>>>> large numbers of mappings. Maybe the issue Sam has been looking into is
>>>>>> actually related to something like that, not to VRAM?
>>>>>
>>>>> Well, with some games that new query can report more than 6.8k mapped
>>>>> buffers
>>>>
>>>> Right, and the new HUD query is useful for tracking this, as I can
>>>> easily imagine such a large number of mappings incurring significant
>>>> overhead somewhere.
>>>>
>>>>
>>>>> (both VRAM/GTT) but a bunch are for VRAM. And more than 1GB of
>>>>> mapped VRAM.
>>>>>
>>>>> When I look at the number of bytes moved by TTM, the counter is also
>>>>> very high in these apps and most likely tied to the slowdowns. The
>>>>> kernel memory manager is moving data almost all the time... Presumably
>>>>> it's because of that aperture limit of 256MB.
>>>>
>>>> However, my point is that these issues are not directly related to the
>>>> number of mappings itself, and reducing the number of mappings won't
>>>> directly affect them.
>>>>
>>>>
>>>>> I would like to approach the problem by reducing the amount of VRAM
>>>>> needed by userspace, in order to prevent TTM from moving a lot of
>>>>> data...
>>>>
>>>> One thing that might help there is not trying to put any buffers in VRAM
>>>> which will (likely) be accessed by the CPU and which are larger than,
>>>> say, 1/4 the size of CPU-visible VRAM. And maybe also keeping track of
>>>> the total size of such buffers we're trying to put in VRAM, and stopping
>>>> when it exceeds, say, 3/4.
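
Roughly, that heuristic could look like the following sketch. The names and
structure are invented for illustration; only the 1/4 and 3/4 thresholds come
from the suggestion above.

    /* Illustrative sketch of the placement heuristic; all names are made up. */
    #include <stdbool.h>
    #include <stdint.h>

    struct placement_state {
       uint64_t visible_vram_size;    /* CPU-visible VRAM, e.g. ~256MB */
       uint64_t cpu_accessed_in_vram; /* running total of likely-CPU-accessed
                                         buffers already placed in VRAM */
    };

    /* Returns true if the buffer should be allowed into VRAM. */
    static bool allow_vram_placement(struct placement_state *st,
                                     uint64_t buf_size, bool likely_cpu_access)
    {
       if (!likely_cpu_access)
          return true; /* GPU-only buffers are unaffected */

       /* Don't put very large CPU-accessed buffers in VRAM at all. */
       if (buf_size > st->visible_vram_size / 4)
          return false;

       /* Stop once the running total exceeds 3/4 of visible VRAM. */
       if (st->cpu_accessed_in_vram + buf_size > st->visible_vram_size / 4 * 3)
          return false;

       st->cpu_accessed_in_vram += buf_size;
       return true;
    }
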
>>>
>>> That could be a solution, yes. But maybe we should also try to reduce
>>> the amount of mapped VRAM (for buffers that are only mapped once).
>>
>> For buffers mapped only once, I suggest just using a bounce buffer in
>> GART.
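
Something along these lines, perhaps; the helpers below are hypothetical
placeholders, not real winsys entry points.

    /* Sketch of a one-shot upload through a GTT bounce buffer. */
    #include <stdint.h>
    #include <string.h>

    struct buffer; /* opaque BO handle */

    extern struct buffer *alloc_gtt_buffer(uint64_t size);
    extern void *map_buffer(struct buffer *buf);
    extern void unmap_buffer(struct buffer *buf);
    extern void gpu_copy_buffer(struct buffer *dst, struct buffer *src,
                                uint64_t size);   /* e.g. an async DMA copy */
    extern void free_buffer_deferred(struct buffer *buf);

    /* Write data once into a VRAM buffer without ever mapping VRAM: stage it
     * in GTT and let the GPU copy it over. */
    static void upload_once(struct buffer *vram_dst, const void *data,
                            uint64_t size)
    {
       struct buffer *staging = alloc_gtt_buffer(size);
       void *ptr = map_buffer(staging);

       memcpy(ptr, data, size);
       unmap_buffer(staging);

       gpu_copy_buffer(vram_dst, staging, size);
       free_buffer_deferred(staging); /* actually freed once the copy is done */
    }
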
>>
>> BTW: What kind of allocations are we talking about here? From the
>> application, or driver-internal allocations (e.g. shader code)?
>
> Sorry, I missed your question.
>
> Mainly from the application. The driver seems to unmap the buffers 
> correctly when they are mapped only once (i.e. shader code).

Yeah, but that won't affect the kernel side.

Again, it doesn't matter if userspace unmaps the BO after accessing it 
once. As long as nothing kicks the once-accessed buffer out of the 
visible BAR, it will stay there.

Regards,
Christian.

>
>>
>>> Hopefully that will reduce the overhead somewhere. Although this won't
>>> change anything if we want to access more than 256MB at the same time.
>>>
>>> By the way, shouldn't the amount of CPU-accessible VRAM be 256MB? I added
>>> a new query which tracks that, and also something which stores the
>>> visible VRAM size at init time. It returns:
>>>
>>> vram_vis_size = 238 MB
>>>
>>> That's not exactly 256MB. Any ideas?
>>
>> The GART table is in VRAM as well. So the amount of BAR available for
>> applications is actually much lower than the 256MB.
>>
>> Regards,
>> Christian.
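
(For a rough sense of the numbers, purely as an illustration: if each GART 
page table entry is 8 bytes and covers a 4 KiB page, then a 1 GiB GTT needs 
about 1 GiB / 4 KiB * 8 B = 2 MiB of VRAM for the table alone, scaling 
linearly with GTT size. The rest of the ~18 MB gap between 256 MB and the 
reported 238 MB would presumably come from other kernel-side reservations.)
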
>>
>>>
>>>>
>>>>
>>>>> Anyway, I'm going to push this patch.
>>>>
>>>> Thanks for updating the commit log.
>>>>
>>>>
>>


