[Mesa-dev] [PATCH 3/4] winsys/radeon: Keep bo statistics

Wed Jan 8 09:21:14 PST 2014

On Wed, 8 Jan 2014 15:54:04 +0100
Marek Olšák <maraeo at gmail.com> wrote:

> > On Wed, 8 Jan 2014 12:03:12 +0100
> > Marek Olšák <maraeo at gmail.com> wrote:
> >> Why don't you just set the statistics once per CS in
> >> radeon_drm_cs_flush? I don't see a value in doing it in every function
> >> that sets the resources.
> >
> > It's the only way to get accurate statistics that I can see. Doing it
> > per-cs could be off by big amounts (100x even?). Being off by that much
> > could lead to rather worse decisions.
> 
> It's not accurate at all, it's actually pretty random. The stats
> should not be called "num_reads" and "num_writes", they should be
> called "num_state_changes", and the number of resource state changes
> has nothing to do with how the resources affect GPU performance. You
> might get a pretty high score for unimportant resources with your
> approach. It's as useful as assigning a random number to each
> resource.

Yes, more accurate names would be "times_bound_for_reads" and
"times_bound_for_writes", but those names are too long for my taste ;)
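
To make "times bound" concrete, here is a rough sketch of per-bind counting.
All names below (bo_stats, bo_usage, bo_stats_note_bind) are made up for
illustration and are not the code from the patch; the timestamp is assumed
to be passed in by the caller.

```c
#include <stdint.h>

/* Hypothetical per-buffer statistics; illustrative names only. */
struct bo_stats {
    uint64_t times_bound_for_reads;   /* how often the bo was bound for reading */
    uint64_t times_bound_for_writes;  /* how often the bo was bound for writing */
    uint64_t last_bound_ms;           /* CPU-side time of the most recent bind */
};

/* Hypothetical usage flags; a bind may be read, write, or both. */
enum bo_usage {
    BO_USAGE_READ  = 1 << 0,
    BO_USAGE_WRITE = 1 << 1
};

/* Called each time a resource is bound, not once per CS flush. */
static void bo_stats_note_bind(struct bo_stats *s, unsigned usage,
                               uint64_t now_ms)
{
    if (usage & BO_USAGE_READ)
        s->times_bound_for_reads++;
    if (usage & BO_USAGE_WRITE)
        s->times_bound_for_writes++;
    s->last_bound_ms = now_ms;
}
```

These counters track binds seen on the CPU side, not actual GPU reads or
writes.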

> Another issue is that you record times when resource state changes
> happen, but rendering actually starts after radeon_drm_cs_flush is
> called. Your recorded times actually only tell you when the user
> changed states, which may be useful for CPU measurements, but it's
> useless for everything else.

The timing only needs to be accurate enough to determine "recently", i.e.
"within this frame" or "within a couple of frames". As far as I can see,
it achieves that.
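
A minimal sketch of such a "recently" check, assuming a millisecond
timestamp recorded at bind time. The window of roughly two frames at 60 Hz
and the names RECENT_WINDOW_MS and bo_used_recently are assumptions for
illustration, not values or functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window: about two frames at 60 Hz. Purely illustrative. */
#define RECENT_WINDOW_MS 33

/* Returns true if the last bind falls within the assumed recency window. */
static bool bo_used_recently(uint64_t last_bound_ms, uint64_t now_ms)
{
    /* Treat a clock that appears to run backwards as "recent". */
    if (now_ms < last_bound_ms)
        return true;
    return now_ms - last_bound_ms <= RECENT_WINDOW_MS;
}
```

With a window this coarse, the gap between the state change and the later
flush should rarely change the answer.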

> The only way to get accurate numbers of reads and writes is to use GPU
> performance counters, which we won't probably have in the open driver,
> and I don't think it's possible to record the numbers for every
> resource individually anyway.

Yes, IIRC we already had this discussion on IRC a month ago (?).

The idea is to get info that is as accurate as possible on the CPU side,
since it has a moderate-to-high correlation with actual usage, even if it's
not perfect. As you say, we cannot get perfect numbers, both because the
docs are lacking and because of the overhead it would have.

I still believe that with this info, we can get better results than we
currently get, in some cases with significant improvements (the
over-VRAM ping-pong cases).

- Lauri