[Mesa-dev] renderdoc-traces: like shader-db for runtime

Eero Tamminen eero.t.tamminen at intel.com
Tue Jun 25 12:38:19 UTC 2019


Hi,

On 24.6.2019 19.36, Elie Tournier wrote:
> Great topic. For the past few days, I was looking at a CI for Mesa:
> https://gitlab.freedesktop.org/hopetech/tracie
> OK, it's in a very very alpha stage. ;)
> 
> My idea was to use apitrace to dump and replay traces, then compare images with reference
> images or with images dumped the previous week.
> Apitrace was a good choice for a "correctness CI", maybe not for the "performance CI".
> 
> @eric Out of curiosity, did you look at apitrace or did you go straight to renderdoc?

Note: ezBench supports both Apitrace & vktrace.


> I've added some comments below based on what I learned playing with the CI.
> 
> 
> On Sat, Jun 22, 2019 at 10:59:34AM -0700, Rob Clark wrote:
>> On Thu, Jun 20, 2019 at 12:26 PM Eric Anholt <eric at anholt.net> wrote:
>>>
>>> Hey folks, I wanted to show you this follow-on to shader-db I've been
>>> working on:
>>>
>>> https://gitlab.freedesktop.org/anholt/renderdoc-traces

"On each frame drawn, renderdoccmd replay sets up the initial GL state 
again. This will include compiling programs."

Ouch.  This makes it pretty much useless for performance testing.


>>> For x86 development I've got a collection of ad-hoc scripts to capture
>>> FPS numbers from various moderately interesting open source apps so I
>>> could compare-perf them.  I was only looking at specific apps when they
>>> seemed relevant, so it would be easy to miss regressions.
>>>
>>> Starting work on freedreno, one of the first questions I ran into was
>>> "does this change to the command stream make the driver faster?".  I
>>> don't have my old set of apps on my debian ARM systems, and even less so
>>> for Chrome OS.  Ultimately, users will be judging us based on web
>>> browser and android app performance, not whatever I've got laying around
>>> on my debian system.  And, I'd love to fix that "I ignore apps unless I
>>> think of them" thing.
>>>
>>> So, I've used renderdoc to capture some traces from Android apps.  With
>>> an unlocked phone, it's pretty easy.  Tossing those in a repo (not
>>> shared here), I can then run driver changes past them to see what
>>> happens.  See
>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1134 for some
>>> results.
>>>
>>> Where is this repo going from here?
>>>
>>> - I add a runner for doing frame-to-frame consistency tests.  We could
>>>    catch UB in a lot of circumstances by replaying a few times and making
>>>    sure that results are consistent.  Comparing frames between drivers
>>>    might also be interesting, though for that you would need human
>>>    validation since pixel values and pixels lit will change on many
>>>    shader optimization changes.
> Comparing frames between drivers is hard. I tried comparing LLVMpipe, softpipe and i965.
> They all produce different frames.
> Human validation is sadly hard to avoid. One of the ideas Erik came up with
> was to use a mask.

A statistical approach could be better (similar to how error is
handled in video compression).
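
For example, a per-frame error metric with a threshold (rather than
exact pixel equality) could tolerate small driver-to-driver differences
while still flagging real breakage.  A minimal sketch, assuming frames
are dumped as same-sized RGB images and that numpy and Pillow are
available as extra dependencies (the 35 dB threshold is just a guess to
be tuned):

    # Minimal sketch: compare two dumped frames with PSNR instead of exact
    # pixel equality.  Assumes same-sized RGB images; numpy and Pillow are
    # extra dependencies; the threshold value is only a starting point.
    import math
    import numpy as np
    from PIL import Image

    def psnr(ref_path, test_path):
        ref = np.asarray(Image.open(ref_path).convert("RGB"), dtype=np.float64)
        test = np.asarray(Image.open(test_path).convert("RGB"), dtype=np.float64)
        mse = np.mean((ref - test) ** 2)
        return math.inf if mse == 0 else 20 * math.log10(255.0 / math.sqrt(mse))

    def frames_match(ref_path, test_path, threshold_db=35.0):
        return psnr(ref_path, test_path) >= threshold_db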


> I think we should first focus on comparing frames from the same driver and extend later.
> The subject is hard enough. ;)

Note that there are some benchmarks which don't produce stable rendering
results because they include random input, so you can't do automated
rendering-difference detection for them.  I would suggest just dropping
those (I no longer remember which benchmarks those were, but Martin
Peres might).


>>> - Need to collect more workloads for the public repo:
> I would be happy to help here.
> We should create a list of FOSS games/apps to dump based on their OpenGL requirements.
>>>
>>>    - I've tried to capture webgl on Chrome and Firefox on Linux with no
>>>      luck. WebGL on ff is supposed to work under apitrace, maybe I could
>>>      do that and then replay on top of renderdoc to capture.
>>
>> perhaps worth a try capturing these on android?
>>
>> I have managed to apitrace chromium-browser in the past.. it ends up a
>> bit weird because there are multiple contexts, but apitrace has
>> managed to replay them.  Maybe the multiple ctx thing is confusing
>> renderdoc?
>>
>> (tbh I've not really played w/ renderdoc yet.. I should probably do so..)
>>
>>>    - Mozilla folks tell me that firefox's WebRender display lists can be
>>>      captured in browser and then replayed from the WR repo under
>>>      apitrace or renderdoc.
>>>
>>>    - I tried capturing Mozilla's new Pathfinder (think SVG renderer), but
>>>      it wouldn't play the demo under renderdoc.
>>>
>>>    Do you have some apps that should be represented here?
>>>
>>> - Add microbenchmarks?  Looks like it would be pretty easy to grab
>>>    piglit drawoverhead results, not using renderdoc.  Capturing from
>>>    arbitrary apps expands the scope of the repo in a way I'm not sure I'm
>>>    excited about (Do we do different configs in those apps?  Then we need
>>>    config infrastructure.  Ugh).
>>>
>>> - I should probably add an estimate of "does this overall improve or
>>>    hurt perf?"  Yay doing more stats.

A good way to measure perf could be repeating a specific frame in a
trace (when that doesn't include re-compiling the shaders).

If I remember correctly, that's already supported by vktrace and the
(apitrace-based) frameretrace.
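
As a rough illustration of the kind of measurement harness I mean (the
replay command below is only a placeholder, not a real vktrace or
frameretrace invocation):

    # Rough sketch: time repeated replays of a single frame and report the
    # median and spread.  REPLAY_CMD is a placeholder for whatever tool and
    # flags end up being used to loop one frame of a trace.
    import statistics
    import subprocess
    import time

    REPLAY_CMD = ["replay-tool", "--frame", "100", "trace.rdc"]  # placeholder

    def time_replays(runs=10):
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(REPLAY_CMD, check=True, stdout=subprocess.DEVNULL)
            samples.append(time.perf_counter() - start)
        return statistics.median(samples), statistics.stdev(samples)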


> Sure. Sadly, most benchmarks I tried were unstable performance-wise.
> Caching changes the results a lot. Well, you already know that.

If the shader cache changes the results, shaders are being compiled
during benchmarking, which means it's a bad benchmark.  Shader
compilation should be benchmarked separately.

Or if you meant CPU caches...  Completely unrelated changes can
impact CPU speed because code gets aligned slightly differently in
memory, which affects cache access patterns.  I.e. a performance
change can be completely accidental and disappear with another,
completely unrelated code change.

Note also that I've found that in memory-bandwidth-bound test cases
there can be ~10% variation on Intel based on how memory mappings happen
to get aligned (which can change between boots even more than between
individual process runs, or just from LD_PRELOADing a library that isn't
even used).

Because of the latter, one sees real performance changes better by
running tests across different commits (i.e. a continuous per-commit
perf trend) than by just doing repeats with a single build.
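
To get a feel for how big that noise floor is before trusting small
deltas, repeated samples from a single build can be summarized with
something like this (just an illustration, stdlib only; the FPS numbers
are made up):

    # Illustration: estimate run-to-run noise from repeated FPS samples of
    # one build, so deltas below the noise floor aren't over-interpreted.
    import statistics

    def noise_percent(samples):
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)
        return 100.0 * stdev / mean  # coefficient of variation, in percent

    fps_runs = [61.2, 59.8, 60.5, 66.9, 60.1]  # made-up numbers
    print(f"run-to-run noise: ~{noise_percent(fps_runs):.1f}%")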


>>> - I'd love to drop scipy.  I only need it for stats.t.ppf, but it
>>>    prevents me from running run.py directly on my targets.

How much do you need PPF?  Maybe you could fall back to some simpler
statistics (e.g. from the Python 3 built-in statistics module) if the
scipy import fails?
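
For instance, something like this would keep scipy optional (just a
sketch; with more than a handful of samples the normal quantile from the
built-in statistics.NormalDist, Python 3.8+, is a reasonable stand-in
for the t quantile):

    # Sketch of a scipy-optional confidence interval: use scipy.stats.t.ppf
    # when it imports, otherwise fall back to the normal quantile from the
    # built-in statistics module (close enough for larger sample counts).
    import statistics

    def t_ppf(confidence, dof):
        try:
            from scipy import stats
            return stats.t.ppf(confidence, dof)
        except ImportError:
            return statistics.NormalDist().inv_cdf(confidence)

    def confidence_interval(samples, confidence=0.95):
        mean = statistics.mean(samples)
        sem = statistics.stdev(samples) / len(samples) ** 0.5
        half_width = t_ppf((1 + confidence) / 2, len(samples) - 1) * sem
        return mean - half_width, mean + half_width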


	- Eero

>> thoughts about adding amd_perfcntr/etc support?  I guess some of the
>> perfcntrs we have perhaps want some post-processing to turn into
>> usable numbers, and plenty of them we don't know much about what they
>> are other than the name.  But some of them are easy enough to
>> understand (like # of fs ALU cycles, etc), and being able to compare
>> that before/after shader optimizations seems useful.
>>
>> Also, it would be nice to have a way to extract "slow frames" somehow
>> (maybe out of scope for this tool, but related?).. ie. when framerate
>> suddenly drops, those are the frames we probably want to look at more
>> closely..
> +1
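
A crude way to flag such frames automatically would be to scan the
per-frame times for spikes well above the median, e.g. (stdlib-only
sketch with made-up numbers):

    # Crude sketch: flag "slow" frames whose frame time is well above the
    # median -- those are candidates for capturing and inspecting further.
    import statistics

    def slow_frames(frame_times_ms, factor=2.0):
        median = statistics.median(frame_times_ms)
        return [i for i, t in enumerate(frame_times_ms) if t > factor * median]

    times = [16.7, 16.9, 16.5, 70.3, 16.8, 17.0]  # made-up frame times (ms)
    print(slow_frames(times))  # -> [3]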


