Mesa (master): 32 new commits

Fri Aug 26 02:12:43 UTC 2016

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=da85b5a9f1b22a8f6cae1a3b335dc5f31011bcb1
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Fri Jul 22 15:52:49 2016 -0700

    i965: Expose shader framebuffer fetch extensions on Gen9+.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=4135fc22ff735a40c36fcf051c1735fe23d154f2
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Aug 18 22:12:37 2016 -0700

    i965/fs: Hook up coherent framebuffer reads to the NIR front-end.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=be12a1f36efcdd4628f199d4e11b01cc06787e8a
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 16:56:05 2016 -0700

    i965/fs: Remove special casing of framebuffer writes in scheduler code.
    
    The reason why it was safe for the scheduler to ignore the side
    effects of framebuffer write instructions was that their side effects
    couldn't have had any influence on any other instruction in the
    program, because we weren't doing framebuffer reads, and framebuffer
    writes were always non-overlapping.  We need actual memory dependency
    analysis in order to determine whether a side-effectful instruction
    can be reordered with respect to other instructions in the program.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=3daa0fae4b39a271f50f473edbe44712b6c8f040
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Wed Jul 6 20:49:58 2016 -0700

    i965/fs: Don't CSE render target messages with different target index.
    
    We weren't checking the fs_inst::target field when comparing whether
    two instructions are equal.  For FB writes it doesn't matter because
    they aren't CSE-able anyway, but this would have become a problem with
    FB reads which are expression-like instructions.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
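
The comparison described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `struct toy_inst` and `toy_inst_equal()` are hypothetical stand-ins, and only the idea of including the render target index in the equality test comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an IR instruction.
 * Only the fields needed for the illustration are modelled. */
struct toy_inst {
    int opcode;      /* operation performed */
    int src[2];      /* source register numbers */
    uint8_t target;  /* render target index (meaningful for FB messages) */
};

/* Two instructions are CSE candidates only if every field that affects
 * the result matches, including the render target index. */
static bool
toy_inst_equal(const struct toy_inst *a, const struct toy_inst *b)
{
    return a->opcode == b->opcode &&
           a->src[0] == b->src[0] &&
           a->src[1] == b->src[1] &&
           a->target == b->target;
}
```

Without the last comparison, two reads that differ only in their render target would be treated as the same expression and merged.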

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=db123df74773f458e573a9c034ee783570a3ed0f
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 16:55:45 2016 -0700

    i965/fs: Define logical framebuffer read opcode and lower it to physical reads.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=f2f75b0cf05d2519d618c71b19d2187b8ed0d545
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 16:52:33 2016 -0700

    i965/fs: Define framebuffer read virtual opcode.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=71d639f69ee868fbeadd0a1b8bbdd76e17398b43
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Tue Jul 19 11:52:23 2016 -0700

    i965/disasm: Fix RC message type strings on Gen7+.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=26ac16fe2f73507041062f63646286dea60053da
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 19:13:55 2016 -0700

    i965/eu: Add codegen support for the Gen9+ render target read message.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=29eb8059fd7906d2595ea99bc65a27691b9fbe53
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 18:49:36 2016 -0700

    i965/eu: Take into account the target cache argument in brw_set_dp_read_message.
    
    brw_set_dp_read_message() was setting the data cache as send message
    SFID on Gen7+ hardware, ignoring the target cache specified by the
    caller.  Some of the callers were passing a bogus target cache value
    as argument relying on brw_set_dp_read_message not to take it into
    account.  Fix them too.
    
    Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8a2f19a7772c80fcac85d6bdfa8e588d6cea1beb
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Tue Jul 19 15:23:30 2016 -0700

    i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.
    
    This is not enabled on the original Gen4 part because it lacks surface
    state tile offsets so it may not be possible to sample from arbitrary
    non-zero layers of the framebuffer depending on the miptree layout (it
    should be possible to work around this by allocating a scratch surface
    and doing the same hack currently used for render targets, but meh...).
    
    On Gen9+ even though it should mostly work (feel free to force-enable
    it in order to compare the coherent and non-coherent paths in terms of
    performance), there are some corner cases like 1D array layered
    framebuffers that cannot be handled easily by the non-coherent path
    because of the incompatible layout in memory of 1D and 2D miptrees (it
    should be possible to work around this too by doing state-dependent
    recompiles, but it's hard to care enough since Gen9 has native support
    for coherent render target reads...)
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=ecc4800383fb67cd274154469d933c6050782208
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Fri Jul 1 13:54:05 2016 -0700

    i965: Implement glBlendBarrier.
    
    This is a no-op if the platform supports coherent framebuffer fetch.
    If it doesn't, we just need to flush the render cache and invalidate
    the texture cache in order for previous rendering to be visible to
    framebuffer fetch.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
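
The decision described in the commit above can be sketched as a small helper. The flag names and `toy_blend_barrier_flags()` are hypothetical and are not the real i965 PIPE_CONTROL definitions. Only the rule itself (nothing to do on coherent hardware, otherwise flush the render cache and invalidate the texture cache) is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this illustration. */
enum toy_flush_bits {
    TOY_FLUSH_RENDER_CACHE       = 1u << 0,
    TOY_INVALIDATE_TEXTURE_CACHE = 1u << 1
};

/* Returns the cache operations a blend barrier would need to emit. */
static unsigned
toy_blend_barrier_flags(bool has_coherent_fb_fetch)
{
    /* Coherent framebuffer fetch already sees previous rendering. */
    if (has_coherent_fb_fetch)
        return 0;

    /* Otherwise make earlier render target writes visible to the
     * sampler, which is what the non-coherent path reads through. */
    return TOY_FLUSH_RENDER_CACHE | TOY_INVALIDATE_TEXTURE_CACHE;
}
```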

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=786108e7b27e4728353d69ff60aa046987859d8e
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Fri Jul 1 13:56:47 2016 -0700

    i965: Upload surface state for non-coherent framebuffer fetch.
    
    This iterates over the list of attached render buffers and binds
    appropriate surface state structures to the binding table block
    allocated for shader framebuffer read.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=dc96968dbf7b359a24a991def16e382379f4b11a
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 22:23:13 2016 -0700

    i965: Implement support for overriding the texture target in brw_emit_surface_state.
    
    This allows the caller to bind a miptree using a texture target other
    than the one it was created with.  The code should work even if the
    memory layouts of the specified and original targets don't match, as
    long as the caller only intends to access a single slice of the
    miptree structure.
    
    This will be exploited by the next commit in order to support
    non-coherent framebuffer fetch of a single layer of a 3D texture
    (since some generations lack the minimum array element control for 3D
    textures bound to the sampler unit), and multiple layers of a 1D array
    texture (since binding it as an actual 1D array texture would require
    state-dependent recompiles because the same shader couldn't
    simultaneously work for 1D and 2D array textures due to the different
    texel fetch coordinate ordering).
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=49ea2bd17500cbe3cc5f39b59162eaae1278167d
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Aug 18 22:08:10 2016 -0700

    i965: Massage argument list of brw_emit_surface_state().
    
    This commit does three different things in a single pass in order to
    keep the amount of churn low: Remove the for_gather boolean argument
    which was unused, pass the isl_view argument by value rather than by
    reference since I'll have to modify it from within the function, and
    add a target argument to allow callers to bind textures using a target
    other than the original.  The prototype of the function now looks
    like:
    
     void brw_emit_surface_state(struct brw_context *brw,
                                 struct intel_mipmap_tree *mt,
                                 GLenum target, struct isl_view view,
                                 uint32_t mocs, uint32_t *surf_offset, int surf_index,
                                 unsigned read_domains, unsigned write_domains);
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=74e4baec59a5697ec2511733f16421bfd32f4145
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Mon Jul 18 18:06:02 2016 -0700

    i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.
    
    This surface state control has been supported by all hardware
    generations since G45.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=0fe732e66f10f526b9187c4d11f134282f5209c8
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 22:09:46 2016 -0700

    i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=5759eb458b6bbc85011d4f139d90018bdf6124c0
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Mon Jul 18 18:07:35 2016 -0700

    i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.
    
    The logic to calculate the right layout and dimensionality for a given
    GL texture target is going to be useful elsewhere, so factor it out from
    intel_miptree_get_isl_surf().
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=99fb167839c8c9888f8de78e3b96de23f92a1012
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Fri Jul 1 13:45:22 2016 -0700

    i965: Resolve color for non-coherent FB fetch at UpdateState time.
    
    This is required because the sampler unit used to fetch from the
    framebuffer is unable to interpret non-color-compressed fast-cleared
    single-sample texture data.  Roughly the same limitation applies for
    surfaces bound to texture or image units, but unlike texture sampling,
    non-coherent framebuffer fetch is by definition non-coherent with
    previous rendering, so the brw_render_cache_set_check_flush() call can
    be omitted except after resolve.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=071665c16191e3738f4ee173398da45c008e005a
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Fri Jul 22 18:16:45 2016 -0700

    i965: Return whether the miptree was resolved from intel_miptree_resolve_color().
    
    This will allow optimizing out the cache flush in some cases when
    resolving wasn't necessary.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
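
A minimal sketch of the optimization this return value enables, assuming a caller that only flushes when a resolve actually happened. `struct toy_miptree`, `toy_resolve_color()` and `toy_prepare_for_fetch()` are hypothetical names that do not reflect the real function signatures.

```c
#include <stdbool.h>

struct toy_miptree {
    bool needs_resolve;  /* hypothetical: fast-clear data still pending */
};

/* Resolves if needed and reports whether any work was actually done. */
static bool
toy_resolve_color(struct toy_miptree *mt)
{
    if (!mt->needs_resolve)
        return false;

    mt->needs_resolve = false;
    return true;
}

/* Only pays for a cache flush when the resolve did something.  The
 * flush is modelled as a counter so that its effect is observable. */
static int
toy_prepare_for_fetch(struct toy_miptree *mt, int *flush_count)
{
    if (toy_resolve_color(mt))
        (*flush_count)++;

    return *flush_count;
}
```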

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=f24e393bd5caee85994b00b93f141e6c4b99e273
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 21:57:00 2016 -0700

    i965/fs: Translate nir_intrinsic_load_output on a fragment output.
    
    This gets the non-coherent framebuffer fetch path hooked up to the NIR
    front-end.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=b00a236d6a6212323f77248ba923c65eeb02592b
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 21:47:45 2016 -0700

    i965/fs: Allocate fragment output temporaries on demand.
    
    This gets rid of the duplication of logic between nir_setup_outputs()
    and get_frag_output() by allocating fragment output temporaries lazily
    whenever get_frag_output() is called.  This makes nir_setup_outputs()
    a no-op for the fragment shader stage.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
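
The lazy allocation described above can be sketched as follows. This is an illustration under simplifying assumptions: `struct toy_visitor` and `toy_get_frag_output()` are hypothetical, registers are plain integers, and outputs are identified by a single small location number.

```c
#include <stdbool.h>

#define TOY_MAX_OUTPUTS 8

/* Hypothetical visitor state: one temporary register per fragment output. */
struct toy_visitor {
    int  next_reg;                    /* next free virtual register */
    bool allocated[TOY_MAX_OUTPUTS];  /* has a temporary been created yet? */
    int  reg[TOY_MAX_OUTPUTS];        /* temporary assigned to each output */
};

/* Returns the temporary for `location`, allocating it on first use.
 * The caller must pass location < TOY_MAX_OUTPUTS. */
static int
toy_get_frag_output(struct toy_visitor *v, unsigned location)
{
    if (!v->allocated[location]) {
        v->reg[location] = v->next_reg++;
        v->allocated[location] = true;
    }

    return v->reg[location];
}
```

Because every access goes through the helper, no separate setup pass is needed to create the temporaries up front.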

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=7dac8820730777756c00d7024330517848dc3b9f
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 21:26:20 2016 -0700

    i965/fs: Rework representation of fragment output locations in NIR.
    
    The problem with the current approach is that driver output locations
    are represented as a linear offset within the nir_outputs array, which
    makes it rather difficult for the back-end to figure out what color
    output and index some nir_intrinsic_load/store_output was meant for,
    because the offset of a given output within the nir_outputs array is
    dependent on the type and size of all previously allocated outputs.
    Instead this defines the driver location of an output to be the pair
    formed by its GLSL-assigned location and index (I've borrowed the
    bitfield macros from brw_defines.h in order to represent the pair of
    integers as a single scalar value that can be assigned to
    nir_variable_data::driver_location).  nir_assign_var_locations is no
    longer useful for fragment outputs.
    
    Because fragment outputs are now allocated independently rather than
    within the nir_outputs array, the get_frag_output() helper becomes
    necessary in order to obtain the right temporary register for a given
    location-index pair.
    
    The type_size helper passed to nir_lower_io is now type_size_dvec4
    rather than type_size_vec4_times_4 so that output array offsets are
    provided in terms of whole array elements rather than in terms of
    scalar components (dvec4 is the largest vector type supported by
    GLSL, so this will cause all individual fragment outputs to have a size
    of one regardless of the type).
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
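
Packing a location-index pair into a single scalar, as described above, could look roughly like this. The bit layout (index in bit 0, location in the remaining bits) and all `toy_`/`TOY_` names are assumptions made for the illustration. The commit message does not state the actual field widths or macro names.

```c
#include <stdint.h>

/* Hypothetical layout: index in bit 0, location in the bits above it. */
#define TOY_INDEX_SHIFT    0
#define TOY_INDEX_MASK     0x1u
#define TOY_LOCATION_SHIFT 1

/* Combines the GLSL-assigned location and index into one value that
 * fits in a scalar driver_location-style field. */
static uint32_t
toy_pack_driver_location(uint32_t location, uint32_t index)
{
    return (location << TOY_LOCATION_SHIFT) |
           ((index & TOY_INDEX_MASK) << TOY_INDEX_SHIFT);
}

static uint32_t
toy_unpack_location(uint32_t packed)
{
    return packed >> TOY_LOCATION_SHIFT;
}

static uint32_t
toy_unpack_index(uint32_t packed)
{
    return (packed >> TOY_INDEX_SHIFT) & TOY_INDEX_MASK;
}
```

Unlike a linear offset, the packed value does not depend on the type or size of any other output, so the back-end can recover the pair directly.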

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=4e990b67cef9a90f362e5a3791234ef779f47bea
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 21:58:56 2016 -0700

    i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.
    
    Most likely we had only ever used this macro on bitfields of less than
    31 bits -- That's going to change shortly.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
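
The class of bug named in the commit above can be illustrated with a generic mask macro. This sketch assumes the usual `(1 << width) - 1` construction and a 32-bit `int`. `TOY_MASK` is a hypothetical name, and the macro's real definition is not quoted in the commit message.

```c
#include <stdint.h>

/* Problematic form, shown only as a comment: the literal 1 is a signed
 * int, so a 31-bit field evaluates 1 << 31, which overflows a 32-bit
 * int and is undefined behaviour in C.
 *
 *   #define TOY_MASK_SIGNED(high, low) \
 *      (((1 << ((high) - (low) + 1)) - 1) << (low))
 */

/* Unsigned variant: the shift is performed on an unsigned int, so it is
 * well defined for field widths from 1 to 31 bits. */
#define TOY_MASK(high, low) (((1u << ((high) - (low) + 1)) - 1) << (low))
```

A full 32-bit field would still shift by the width of the type, so it needs separate handling in either form.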

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=f3cb2c34f29d35088879a6b8101c3ac648e0febf
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 21:25:46 2016 -0700

    i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.
    
    I'm about to change how fragment shader output locations are
    represented, so the generic nir_intrinsic_store_output implementation
    that assumes that outputs are just contiguous elements in the big
    nir_outputs array won't work anymore.  This somewhat simplified
    implementation of nir_intrinsic_store_output for fragment shaders
    should be functionally equivalent to the current fall-back one.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=af0cc743e607293146861518bb6ef96f411aeca9
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 20:25:28 2016 -0700

    i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.
    
    v2: Memoize sample ID, misc codestyle changes. (Ken)
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=fe6abb5755e0368c993e6f7cf25a0712ee6503a9
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 20:35:29 2016 -0700

    i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.
    
    This will be required for the next commit since the non-coherent path
    makes use of the fragment coordinates implicitly, so they need to be
    calculated.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=98d61ee083de57da6b97c9fcf67003f56f5f5a6b
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 16:20:07 2016 -0700

    i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.
    
    The result of a framebuffer fetch from a multisample FBO is inherently
    per-sample, so the spec requires at least those sections of the shader
    that depend on the framebuffer fetch result to be executed once per
    sample.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=08705badfe136e1782e10472104323d861185357
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Fri Jul 1 13:46:40 2016 -0700

    i965: Allocate space in the binding table for non-coherent FB fetch.
    
    Unfortunately due to the inconsistent meaning of some surface state
    structure fields, we cannot re-use the same binding table entries for
    sampling from and rendering into the same set of render buffers, so we
    need to allocate a separate binding table block specifically for
    render target reads if the non-coherent path is in use.
    
    The slight noise is due to the change of
    brw_assign_common_binding_table_offsets to return the next available
    binding table index rather than void.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
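
The allocation scheme described above can be sketched as follows. All names and the ordering of the blocks are hypothetical. The only points taken from the commit message are that the common helper returns the next free index and that a separate block is reserved for render target reads when the non-coherent path is in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical binding table layout for a fragment shader. */
struct toy_bt_layout {
    uint32_t render_target_start;  /* entries used for rendering */
    uint32_t rt_read_start;        /* separate entries used for FB reads */
};

/* Hypothetical helper for the shared part of the table: reserves
 * `count` entries starting at `first` and returns the next free index
 * instead of void, so the caller can keep allocating after it. */
static uint32_t
toy_assign_common_offsets(uint32_t first, uint32_t count)
{
    return first + count;
}

static uint32_t
toy_assign_fs_offsets(struct toy_bt_layout *l, uint32_t num_rts,
                      uint32_t num_common, bool non_coherent_fetch)
{
    uint32_t next = 0;

    l->render_target_start = next;
    next += num_rts;

    next = toy_assign_common_offsets(next, num_common);

    if (non_coherent_fetch) {
        /* The render target entries cannot be reused for sampling,
         * so reserve a second block of the same size. */
        l->rt_read_start = next;
        next += num_rts;
    } else {
        l->rt_read_start = UINT32_MAX;  /* marks the block as unused */
    }

    return next;
}
```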

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=40b23ad57e8da0fd7af21e81ad52d615f9b492ed
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 20:32:12 2016 -0700

    i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.
    
    Some of the following changes in this series are specific to the
    non-coherent path, so I need some way to tell whether the coherent or
    non-coherent path is in use.  The flag defaults to the value of the
    gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
    overridden easily on hardware that supports both framebuffer fetch
    extensions in order to test the non-coherent path, like:
    
     MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch
    
    (Of course trying to force-enable the coherent framebuffer fetch
    extension on hardware without native support won't work and will lead to
    assertion failures).
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=4a87e4ade778e56d43333c65a58752b15a00ce69
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Thu Jul 21 12:46:04 2016 -0700

    i965/fs: Get rid of fs_visitor::do_dual_src.
    
    This boolean flag was being used for two different things:
    
     - To set the brw_wm_prog_data::dual_src_blend flag.  Instead we can
       just set it based on whether the dual_src_output register is valid,
       which will be the case if the shader writes the secondary blending
       color.
    
     - To decide whether to call emit_single_fb_write() once, or in a loop
       that would iterate only once, which seems pretty useless.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=aee3d8f0d940a87dba7eae86c9462a3cb3a7d702
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Tue Jul 19 20:35:26 2016 -0700

    nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.
    
    This requires emitting a series of copies at the top of the program
    from each output variable to the corresponding temporary.  The initial
    copy can be skipped for non-framebuffer fetch outputs whose initial
    value is undefined, and the final copy needs to be skipped for
    read-only outputs (i.e. gl_LastFragData), since it would be illegal to
    emit a store output intrinsic for it.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
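
The two copy rules stated above can be written down as a pair of predicates. This is a sketch only: `struct toy_output` and both helpers are hypothetical and do not correspond to actual NIR data structures or functions.

```c
#include <stdbool.h>

/* Hypothetical summary of an output variable. */
struct toy_output {
    bool fb_fetch_output;  /* initial value comes from the framebuffer */
    bool read_only;        /* e.g. gl_LastFragData */
};

/* Should a copy from the output to its temporary be emitted at the top
 * of the program?  Only framebuffer fetch outputs have a defined
 * initial value worth copying. */
static bool
toy_needs_initial_copy(const struct toy_output *o)
{
    return o->fb_fetch_output;
}

/* Should a copy from the temporary back to the output be emitted at the
 * end of the program?  Read-only outputs must never be stored to. */
static bool
toy_needs_final_copy(const struct toy_output *o)
{
    return !o->read_only;
}
```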

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=97ac3eba58a7d11e171475f4a209cfdb3578b21d
Author: Francisco Jerez <currojerez at riseup.net>
Date:   Tue Jul 19 20:33:46 2016 -0700

    nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.
    
    The NIR representation of framebuffer fetch is the same as the GLSL
    IR's until interface variables are lowered away, at which point it
    will be translated to load output intrinsics.  The GLSL-to-NIR pass
    just needs to copy the bits over to the NIR program.
    
    Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>