<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Jan 8, 2019 at 7:18 PM Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu">imirkin@alum.mit.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, Jan 8, 2019 at 6:21 PM Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank">maraeo@gmail.com</a>> wrote:<br>
><br>
> On Tue, Jan 8, 2019 at 5:25 PM Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu" target="_blank">imirkin@alum.mit.edu</a>> wrote:<br>
>><br>
>> Why does this need to be in p_state? And who is responsible for<br>
>> setting it (and how will it be set)?<br>
><br>
><br>
> Oh right, there is a way to get it out of p_state.h if needed.<br>
><br>
> It should be set to 0 by default.<br>
><br>
> If your thread block is 8x8x1, but you need to launch 10x8x1 threads, set partial_block = {2, 0, 0}. It will launch the following thread blocks:<br>
> 8x8x1<br>
> 2x8x1<br>
><br>
> It's the same as launching 16x8x1 threads and doing this at the beginning of the compute shader:<br>
> if (globalThreadID.x >= 10) return;<br>
<br>
But that all sounds like something a state tracker wouldn't care<br>
about, right? In e.g. GLSL you can specify the block to be 10x8x1 and<br>
let the backend work it all out. Should st/mesa care about this (or<br>
clover or whatever)?<br></blockquote><div><br></div>On radeonsi, the block size should be a multiple of 64 (the GCN wavefront size) to utilize all SIMD lanes. If you want to launch 8192+1 threads with a block size of 64, you need to launch 128 full blocks plus 1 partial block of size 1 at the end. OpenGL can't do this, because every work group in a dispatch has the same size.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">Marek<br></div></div>