[Mesa-dev] [PATCH 32/32] r600/radeonsi: enable glsl/tgsi on-disk cache
Marek Olšák
maraeo at gmail.com
Wed Feb 15 16:49:03 UTC 2017
On Wed, Feb 15, 2017 at 1:49 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 14.02.2017 23:51, Marek Olšák wrote:
>>
>> On Tue, Feb 14, 2017 at 11:47 PM, Timothy Arceri <tarceri at itsqueeze.com>
>> wrote:
>>>
>>>
>>>
>>> On 15/02/17 08:35, Marek Olšák wrote:
>>>>
>>>>
>>>> On Tue, Feb 14, 2017 at 9:53 PM, Timothy Arceri <tarceri at itsqueeze.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 15/02/17 02:14, Marek Olšák wrote:
>>>>>
>>>>>> On Tue, Feb 14, 2017 at 1:52 AM, Timothy Arceri
>>>>>> <tarceri at itsqueeze.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>> src/gallium/drivers/r600/r600_pipe.c | 10 ++++++++++
>>>>>>> src/gallium/drivers/radeon/r600_pipe_common.c | 2 +-
>>>>>>> src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
>>>>>>> src/gallium/drivers/radeonsi/si_pipe.c | 11 +++++++++++
>>>>>>> src/gallium/include/pipe/p_screen.h | 3 +++
>>>>>>> src/mesa/state_tracker/st_context.c | 2 ++
>>>>>>> 6 files changed, 28 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
>>>>>>> index 5290f40..bdd8c0a 100644
>>>>>>> --- a/src/gallium/drivers/r600/r600_pipe.c
>>>>>>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>>>>>>> @@ -600,6 +600,7 @@ static void r600_destroy_screen(struct pipe_screen* pscreen)
>>>>>>> compute_memory_pool_delete(rscreen->global_pool);
>>>>>>> }
>>>>>>>
>>>>>>> + disk_cache_destroy(rscreen->b.b.disk_shader_cache);
>>>>>>> r600_destroy_common_screen(&rscreen->b);
>>>>>>> }
>>>>>>>
>>>>>>> @@ -633,6 +634,15 @@ struct pipe_screen *r600_screen_create(struct radeon_winsys *ws)
>>>>>>> return NULL;
>>>>>>> }
>>>>>>>
>>>>>>> + uint32_t mesa_timestamp;
>>>>>>> + if (disk_cache_get_function_timestamp(r600_screen_create, &mesa_timestamp)) {
>>>>>>> + char *timestamp_str;
>>>>>>> + if (asprintf(&timestamp_str, "%u", mesa_timestamp) != -1) {
>>>>>>> + rscreen->b.b.disk_shader_cache = disk_cache_create(r600_get_chip_name(&rscreen->b), timestamp_str);
>>>>>>> + free(timestamp_str);
>>>>>>> + }
>>>>>>> + }
>>>>>>> +
>>>>>>> if (rscreen->b.info.chip_class >= EVERGREEN) {
>>>>>>> rscreen->b.b.is_format_supported =
>>>>>>> evergreen_is_format_supported;
>>>>>>> } else {
>>>>>>> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
>>>>>>> index 95a6a48..4e5582f 100644
>>>>>>> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
>>>>>>> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
>>>>>>> @@ -722,7 +722,7 @@ static const char* r600_get_device_vendor(struct pipe_screen* pscreen)
>>>>>>> return "AMD";
>>>>>>> }
>>>>>>>
>>>>>>> -static const char* r600_get_chip_name(struct r600_common_screen *rscreen)
>>>>>>> +const char* r600_get_chip_name(struct r600_common_screen *rscreen)
>>>>>>> {
>>>>>>> switch (rscreen->info.family) {
>>>>>>> case CHIP_R600: return "AMD R600";
>>>>>>> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
>>>>>>> index 6eff9aa..0449d4d 100644
>>>>>>> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
>>>>>>> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
>>>>>>> @@ -765,6 +765,7 @@ void radeon_save_cs(struct radeon_winsys *ws, struct radeon_winsys_cs *cs,
>>>>>>> struct radeon_saved_cs *saved);
>>>>>>> void radeon_clear_saved_cs(struct radeon_saved_cs *saved);
>>>>>>> bool r600_check_device_reset(struct r600_common_context *rctx);
>>>>>>> +const char* r600_get_chip_name(struct r600_common_screen *rscreen);
>>>>>>>
>>>>>>> /* r600_gpu_load.c */
>>>>>>> void r600_gpu_load_kill_thread(struct r600_common_screen
>>>>>>> *rscreen);
>>>>>>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
>>>>>>> index 853d850..0bb95b1 100644
>>>>>>> --- a/src/gallium/drivers/radeonsi/si_pipe.c
>>>>>>> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
>>>>>>> @@ -712,6 +712,7 @@ static void si_destroy_screen(struct pipe_screen* pscreen)
>>>>>>> }
>>>>>>> }
>>>>>>> pipe_mutex_destroy(sscreen->shader_parts_mutex);
>>>>>>> + disk_cache_destroy(sscreen->b.b.disk_shader_cache);
>>>>>>> si_destroy_shader_cache(sscreen);
>>>>>>> r600_destroy_common_screen(&sscreen->b);
>>>>>>> }
>>>>>>> @@ -801,6 +802,16 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws)
>>>>>>> return NULL;
>>>>>>> }
>>>>>>>
>>>>>>> + uint32_t mesa_timestamp, llvm_timestamp;
>>>>>>> + if (disk_cache_get_function_timestamp(radeonsi_screen_create, &mesa_timestamp) &&
>>>>>>> + disk_cache_get_function_timestamp(LLVMInitializeAMDGPUTargetInfo, &llvm_timestamp)) {
>>>>>>> + char *timestamp_str;
>>>>>>> + if (asprintf(&timestamp_str, "%u_%u", mesa_timestamp, llvm_timestamp) != -1) {
>>>>>>> + sscreen->b.b.disk_shader_cache = disk_cache_create(r600_get_chip_name(&sscreen->b), timestamp_str);
>>>>>>
>>>>>>
>>>>>> Can you please at least make some effort to get close to 80 chars per
>>>>>> line even though it's not strictly required in this driver?
>>>>>>
>>>>>>> + free(timestamp_str);
>>>>>>> + }
>>>>>>> + }
>>>>>>> +
>>>>>>> si_handle_env_var_force_family(sscreen);
>>>>>>>
>>>>>>> if (!debug_get_bool_option("RADEON_DISABLE_PERFCOUNTERS", false))
>>>>>>> diff --git a/src/gallium/include/pipe/p_screen.h b/src/gallium/include/pipe/p_screen.h
>>>>>>> index b6203f1..6fd527f 100644
>>>>>>> --- a/src/gallium/include/pipe/p_screen.h
>>>>>>> +++ b/src/gallium/include/pipe/p_screen.h
>>>>>>> @@ -42,6 +42,7 @@
>>>>>>> #include "pipe/p_format.h"
>>>>>>> #include "pipe/p_defines.h"
>>>>>>> #include "pipe/p_video_enums.h"
>>>>>>> +#include "util/disk_cache.h"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> @@ -318,6 +319,8 @@ struct pipe_screen {
>>>>>>> const void *(*get_compiler_options)(struct pipe_screen *screen,
>>>>>>> enum pipe_shader_ir ir,
>>>>>>> unsigned shader);
>>>>>>> +
>>>>>>> + struct disk_cache *disk_shader_cache;
>>>>>>> };
>>>>>>
>>>>>>
>>>>>> Changes to gallium/include/pipe must be separate commits. We don't
>>>>>> change this interface unless there is a very good reason to do so. You
>>>>>> might have to get Brian's or Roland's ack if you wanna pursue this
>>>>>> change, though the likelihood of acceptance is pretty low on this one.
>>>>>>
>>>>>> However you can ignore all the above, because there is a different
>>>>>> issue to discuss. If I understand it correctly, this only caches TGSI.
>>>>>> What's the point in putting the cache into the driver in that case?
>>>>>
>>>>>
>>>>> I was trying to land the GLSL IR and TGSI pieces and avoid getting stuck
>>>>> discussing the GCN cache patch at this point; there will definitely be a
>>>>> GCN cache.
>>>>>>
>>>>>>
>>>>>> If
>>>>>> the goal is to have a GLSL->TGSI cache, it should live in st/mesa. If
>>>>>> the goal is to have a GLSL->GCN cache, it might live in the driver.
>>>>>> However, even that is not necessary if you just expose shader binaries
>>>>>> to st/mesa, in which case st/mesa can query binaries from the driver
>>>>>> and the GLSL->GCN cache can live entirely in st/mesa. If you need the
>>>>>> lib timestamp in st/mesa, you can also add a query to pipe_screen
>>>>>> returning the lib timestamp only. Then you'll have both native shader
>>>>>> binaries and the driver compiler timestamp exposed to st/mesa. So I'd
>>>>>> like to understand why the cache is in the driver.
>>>>>
>>>>>
>>>>> This might work OK for radeonsi as you don't really have to worry about
>>>>> variants, but what about other drivers? It would seem odd to add something
>>>>> to the st if it's only used by a single driver, wouldn't it? Happy to be
>>>>> wrong about this, but this was my thought process here.
>>>>
>>>>
>>>> OK. That's a fair point. Nouveau and radeonsi don't need a cache in
>>>> the driver,
>
>
>
> Not sure what you're saying here. Having the GCN binaries in the cache
> would certainly help.
>
> That said, implementing this in the state tracker via an interface like
> below makes sense to me.
>
>
>>>> but other drivers like r600g might. However, r600g compilation is so
>>>> much faster than radeonsi's that it might not be that important. Also,
>>>> given the rate of r600g development, I don't see a
>>>> native shader cache happening there. What other drivers are there
>>>> really? We have freedreno and vc4 as the other two actively developed
>>>> drivers. So you can ask Rob or Eric if they need a shader cache inside
>>>> the driver, and if not, well, there is no point in putting the cache
>>>> instance into drivers.
>>>
>>>
>>> I asked Eric about this a while ago and, while he hadn't thought about it
>>> too much, he was thinking he would cache NIR. This is an interesting
>>> approach and something that I avoided in i965 because caching NIR seemed
>>> painful. Anyway, I'll point them to this thread for their comments.
>>>
>>>>
>>>>> Also, it's not clear to me where I can store the binary in the st that is
>>>>> acceptable and accessible by the driver; any advice on this would be
>>>>> appreciated.
>>>>
>>>>
>>>> The st is never accessible from the driver.
>>>
>>>
>>>
>>> Sure, what I mean is: if we load the radeonsi/nouveau binary in the st at
>>> link time, how would you pass that down to the driver?
>>
>>
>> I would add an interface for creating a shader from a binary. So we have
>> this:
>> - pipe_screen::get_compiler_timestamp
>
>
> I think this should return a string (and have a slightly different name) so
> we have the freedom to add additional information when required. I'm
> thinking GPU type (though that's also in the screen name) and R600_DEBUG
> flags that affect compilation.
I talked with Timothy about this on IRC and the current plan is to
keep the on-disk cache inside gallium drivers. It might directly
interact with our in-memory cache: an in-memory cache miss will fall
back to the on-disk cache, which will load the shader from disk into
the in-memory cache, and the in-memory cache will then return a hit.
Shaders stay in the in-memory cache after that. Note that the on-disk
cache shouldn't keep loaded shaders in memory, because the in-memory
cache already does that.
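
To make that lookup order concrete, here is a minimal sketch in C. The
helper names (memory_cache_get/put, disk_cache_load_blob) are made up for
illustration only and are not the actual radeonsi or util/disk_cache API:

#include <stddef.h>

struct shader_binary { void *data; size_t size; };

/* Hypothetical cache primitives, for illustration only. */
struct shader_binary *memory_cache_get(const unsigned char key[20]);
void memory_cache_put(const unsigned char key[20], struct shader_binary *bin);
struct shader_binary *disk_cache_load_blob(const unsigned char key[20]);

static struct shader_binary *lookup_shader(const unsigned char key[20])
{
   /* In-memory cache first. */
   struct shader_binary *bin = memory_cache_get(key);
   if (bin)
      return bin;

   /* Miss: fall back to the on-disk cache. The on-disk cache keeps nothing
    * resident, so the loaded blob is handed over to the in-memory cache. */
   bin = disk_cache_load_blob(key);
   if (!bin)
      return NULL; /* real miss: compile the shader */

   /* Promote to the in-memory cache; this and later lookups hit there. */
   memory_cache_put(key, bin);
   return bin;
}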
>
>
>> - pipe_context::get_shader_binary
>> - pipe_context::create_shader_from_binary
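
For illustration, a rough sketch of what such hooks could look like; the
member names below are made up and this is not an existing gallium
interface. The screen-level query returns a string rather than a bare
timestamp, so additional information (chip, debug flags) can be folded in:

/* Hypothetical sketch only -- not the actual gallium interface. */
#include <stddef.h>

struct pipe_screen;
struct pipe_context;

struct pipe_screen_shader_cache_hooks {
   /* String identifying the compiler build (Mesa/LLVM timestamps, chip,
    * compilation-affecting debug flags); any change invalidates binaries. */
   const char *(*get_compiler_id)(struct pipe_screen *screen);
};

struct pipe_context_shader_cache_hooks {
   /* Serialize a compiled shader into a driver-defined blob. */
   void *(*get_shader_binary)(struct pipe_context *ctx, void *shader,
                              size_t *size);
   /* Recreate a shader object from a blob returned by the hook above. */
   void *(*create_shader_from_binary)(struct pipe_context *ctx,
                                      const void *binary, size_t size);
};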
>
>
> Makes sense, but there's an open question: How do we store (optimized)
> variants in the cache? I admit they're not as important for the raw loading
> time, but it would be nice not to have the compiler threads busy during the
> first seconds of a game, either.
>
> We don't have to solve that all at once, but it would be nice to have an
> interface that can be extended easily.
Adding optimized variants to the on-disk cache could add a lot of
complexity to the existing code, but what would the benefit be? If
the async compilation is a problem, we can decrease the number of
threads, lower the thread priority, or do both (IMO).
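
As a rough illustration of the priority option, a compile worker could
demote itself with plain POSIX/Linux calls; this is a generic sketch, not
Mesa's actual queue code:

#define _GNU_SOURCE /* for SCHED_IDLE on Linux */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *compile_worker(void *arg)
{
   /* Run async shader compiles at the lowest possible priority so they
    * only use CPU time the rest of the process doesn't want. */
   struct sched_param param = { .sched_priority = 0 };
   if (pthread_setschedparam(pthread_self(), SCHED_IDLE, &param) != 0)
      fprintf(stderr, "could not lower compile thread priority\n");

   /* ... pull shader jobs from the queue and compile them ... */
   (void)arg;
   return NULL;
}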
Marek