[Bug 91857] Mesa linker is slow

Mon Aug 29 12:57:12 UTC 2016

https://bugs.freedesktop.org/show_bug.cgi?id=91857

Eero Tamminen <eero.t.tamminen at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eero.t.tamminen at intel.com
            Summary|Mesa 10.6.3 linker is slow  |Mesa linker is slow

--- Comment #30 from Eero Tamminen <eero.t.tamminen at intel.com> ---
"perf" profiling info for today's Mesa version:
-------------------------------------------------------------------
$ sudo apt-get install linux-tools-common
$ perf record -- frag_runner frag_shader=obsfucated_shader.frag ...
$ perf report -n --sort comm,dso
Overhead       Samples  Command      Shared Object
  95,18%        218725  frag_runner  i965_dri.so
   4,59%         10964  frag_runner  libc-2.23.so
...
$ perf report
#       Object        Symbol                              
30.99%  i965_dri.so   fs_visitor::virtual_grf_interferes                       
28.39%  i965_dri.so   ra_allocate                                              
15.80%  i965_dri.so   fs_visitor::assign_regs                                  
 4.47%  i965_dri.so   ra_add_node_adjacency                                    
 1.64%  i965_dri.so   (anonymous namespace)::ir_copy_propagation_visitor::visit
 1.54%  i965_dri.so   decrement_q.isra.2                                       
 1.37%  i965_dri.so   (anonymous namespace)::ir_copy_propagation_visitor::kill 
 1.24%  i965_dri.so   ra_add_node_interference                                 
 1.22%  i965_dri.so   brw::fs_live_variables::compute_start_end                
 1.11%  libc-2.23.so  _int_malloc                                              
 0.94%  i965_dri.so   (anonymous namespace)::ir_copy_propagation_elements_visitor::kill
 0.75%  libc-2.23.so  _int_free                                                
 0.69%  libc-2.23.so  __libc_calloc                                            
 0.63%  i965_dri.so   (anonymous namespace)::ir_copy_propagation_elements_visitor::handle_rvalue
 0.60%  libc-2.23.so  realloc                                                  
 0.45%  i965_dri.so   brw::fs_live_variables::setup_def_use                    
 0.42%  libc-2.23.so  _int_realloc                                             
 0.40%  i965_dri.so   fs_visitor::choose_spill_reg                             
 0.36%  i965_dri.so   ra_get_best_spill_node                                   
 0.35%  i965_dri.so   unsafe_free                                              
 0.34%  i965_dri.so   ir_expression::accept                                    
 0.32%  libc-2.23.so  __memset_sse2                                            
 0.27%  libc-2.23.so  malloc_consolidate                                       
 0.26%  i965_dri.so   match_value                                              
 0.25%  libc-2.23.so  __memcpy_sse2                                            
 0.25%  i965_dri.so   fs_visitor::calculate_payload_ranges                     
 0.25%  i965_dri.so   get_used_mrfs                                            
 0.24%  i965_dri.so   backend_reg::in_range                                    
 0.24%  i965_dri.so   brw::fs_live_variables::compute_live_variables           
 0.22%  i965_dri.so   visit_list_elements                                      
 0.20%  i965_dri.so   fs_visitor::spill_reg                                    
 0.13%  libc-2.23.so  __memset_avx2                                            
 0.12%  i965_dri.so   reralloc_array_size
-------------------------------------------------------------------

When using perf report interactively, one can see that:
* virtual_grf_interferes() cost goes to the couple of if checks in that function,
so it simply gets called too often.  It is called from assign_regs():
  -----------------
   for (unsigned i = 0; i < this->alloc.count; i++) {
      ...
      for (unsigned j = 0; j < i; j++) {
         if (virtual_grf_interferes(i, j)) {
            ra_add_node_interference(g, i, j);
         }
      }
   }
  -----------------
* All of the assign_regs() cost goes to running the inner loop above.
* ra_allocate() cost goes mainly to the inlined graph-node loop from ra_simplify(),
and to a lesser extent to the inlined check from pq_test().
