<div dir="ltr"><div>16-bit varyings only make sense if they are packed, i.e. we need to fit two 16-bit 4D varyings into one vec4 slot to save IO memory. Without that, AMD (and most other GPUs?) won't benefit much from 16-bit IO.</div><div><br></div><div>16-bit uniforms would help everybody, because they open up the potential for uniform packing, which saves memory (and cache lines).<br></div><div><br></div><div>The other items only eliminate conversion instructions. For mediump to pay off, the number of vectorized 16-bit vec2 instructions must exceed the number of "conversion instructions + vec2 packing instructions". We also don't get lower register usage unless the code is vectorized, so mediump is a tough sell at the moment.<br></div><div><br></div><div>Marek<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 4, 2020 at 7:03 PM Rob Clark &lt;<a href="mailto:robdclark@gmail.com">robdclark@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, May 4, 2020 at 11:44 AM Marek Olšák &lt;<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>&gt; wrote:<br> ><br> > Hi,<br> ><br> > This is the status of mediump support in Mesa. What I listed is what AMD GPUs can do. "Yes" means Mesa currently supports it.<br> ><br>
<table>
<tr><th>Feature</th><th>FP16 support</th><th>Int16 support</th></tr>
<tr><td>ALU</td><td>Yes</td><td>No</td></tr>
<tr><td>Uniforms</td><td>No</td><td>No</td></tr>
<tr><td>VS in</td><td>No</td><td>No</td></tr>
<tr><td>VS out / FS in</td><td>No</td><td>No</td></tr>
<tr><td>FS out</td><td>No</td><td>No</td></tr>
<tr><td>TCS, TES, GS out / in</td><td>No</td><td>No</td></tr>
<tr><td>Sampler coordinates (only coord, derivs, lod, bias; not offset and compare)</td><td>No</td><td>---</td></tr>
<tr><td>Image coordinates</td><td>---</td><td>No</td></tr>
<tr><td>Return value from samplers (incl. sampler buffers)</td><td>Yes</td><td>No</td></tr>
<tr><td>Return value from image loads (incl. image buffers)</td><td>No</td><td>No</td></tr>
<tr><td>Data source for image stores (incl. image buffers)</td><td>No</td><td>No</td></tr>
<tr><td>If 16-bit sampler/image instructions are surrounded by conversions, promote them to 32 bits</td><td>No</td><td>No</td></tr>
</table>
 ><br> > Please let me know if you don't see the table correctly.<br> ><br> > I'd like to know if I can enable some of them using the existing FP16 CAP. The only drivers supporting FP16 are currently Freedreno and Panfrost.<br> ><br> <br> I think in general it should be ok.<br> <br> I think for ir3 we want 32b inputs/outputs for geom stages<br> (vs/hs/ds/gs). For frag outs we use nir_lower_mediump_outputs. Maybe<br> this is a good approach to continue: use a simple NIR lowering pass<br> for cases where a shader stage can directly take 16b input/output.<br> For frag inputs we fold the narrowing conversion into the varying<br> fetch instruction in the backend.<br> <br> int16 would be pretty useful, especially for loop counters. These can<br> have a long live range and currently wastefully occupy a full 32b reg.<br> <br> Uniforms we haven't cared too much about, since we can (usually) read<br> a 32b uniform as a 16b one and fold that directly into ALU instructions;<br> we handle that in the backend.<br> <br> Pushing mediump support further would be great, and we can definitely<br> help if it ends up needing changes in the freedreno backend. The dEQP<br> coverage in CI should give us pretty good confidence about whether or<br> not we are breaking things in the ir3 backend.<br> <br> BR,<br> -R<br> </blockquote></div></div>
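<div>Marek's point about fitting two 16-bit 4D varyings into one vec4 slot can be sketched roughly as follows, written in Python for brevity. Two 4-component fp16 varyings share one vec4 slot of 32-bit words, so the IO footprint drops from two slots (32 bytes) to one (16 bytes). The low/high-half layout and the helper names (<code>f16_bits</code>, <code>pack_varyings</code>, <code>unpack_varyings</code>, <code>SLOT_BYTES</code>) are hypothetical illustrations, not Mesa's or AMD's actual IO layout or API.</div>

```python
import struct

# IEEE 754 binary16 helpers. struct's 'e' format is half precision; '='
# selects native byte order with standard sizes. Pack and unpack use the
# same order, so the recovered bit pattern does not depend on endianness.
def f16_bits(x):
    """Return the 16-bit half-float bit pattern of x (OverflowError if |x| is too large)."""
    return struct.unpack('=H', struct.pack('=e', x))[0]


def f16_value(bits):
    """Decode a 16-bit half-float bit pattern back to a Python float."""
    return struct.unpack('=e', struct.pack('=H', bits))[0]


SLOT_BYTES = 4 * 4  # one vec4 IO slot: four 32-bit components


def pack_varyings(a, b):
    """Pack two 16-bit vec4 varyings into one 32-bit vec4 slot.

    Hypothetical layout: word i holds a[i] in its low 16 bits and b[i]
    in its high 16 bits (multiplying by 0x10000 is a left shift by 16).
    """
    return [f16_bits(x) + f16_bits(y) * 0x10000 for x, y in zip(a, b)]


def unpack_varyings(slot):
    """Split a packed slot back into its two vec4 varyings."""
    a = [f16_value(w % 0x10000) for w in slot]   # low halves
    b = [f16_value(w // 0x10000) for w in slot]  # high halves
    return a, b


a = [1.0, 0.5, -2.0, 0.25]
b = [3.0, 0.125, 1024.0, -0.5]
slot = pack_varyings(a, b)
# Every value above is exactly representable in fp16, so the round trip is lossless.
assert unpack_varyings(slot) == (a, b)
print(f"unpacked: {2 * SLOT_BYTES} bytes, packed: {len(slot) * 4} bytes")
```

<div>Running it prints 32 bytes for the unpacked case and 16 bytes for the packed one. In a driver, this packing would presumably happen in the shader compiler's IO lowering rather than in host code like this.</div>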