<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Jan 8, 2019, 7:55 PM Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu">imirkin@alum.mit.edu</a> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, Jan 8, 2019 at 7:26 PM Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank" rel="noreferrer">maraeo@gmail.com</a>> wrote:<br> ><br> > On Tue, Jan 8, 2019 at 7:18 PM Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu" target="_blank" rel="noreferrer">imirkin@alum.mit.edu</a>> wrote:<br> >><br> >> On Tue, Jan 8, 2019 at 6:21 PM Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank" rel="noreferrer">maraeo@gmail.com</a>> wrote:<br> >> ><br> >> > On Tue, Jan 8, 2019 at 5:25 PM Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu" target="_blank" rel="noreferrer">imirkin@alum.mit.edu</a>> wrote:<br> >> >><br> >> >> Why does this need to be in p_state? And who is responsible for<br> >> >> setting it (and how will it be set)?<br> >> ><br> >> ><br> >> > Oh right, there is a way to get it out of p_state.h if needed.<br> >> ><br> >> > It should be set to 0 by default.<br> >> ><br> >> > If your thread block is 8x8x1, but you need to launch 10x8x1 threads, set partial_block = {2, 0, 0}. It will launch the following thread blocks:<br> >> > 8x8x1<br> >> > 2x8x1<br> >> ><br> >> > It's the same as launching 16x8x1 threads and doing this at the beginning of the compute shader:<br> >> > if (globalThreadID.x >= 10) return;<br> >><br> >> But that all sounds like something a state tracker wouldn't care<br> >> about, right? In e.g. GLSL you can specify the block to be 10x8x1 and<br> >> let the backend work it all out. Should st/mesa care about this (or<br> >> clover or whatever)?<br> ><br> ><br> > The block size should be a multiple of 64 on radeonsi to utilize all SIMD lanes. 
If you want to launch 8192+1 threads with a block size of 64, you need to launch one partial block with a block size of 1 at the end. OpenGL can't do this.<br> <br> Ohhhhhhhhhh. So the partial-ness applies to the last-executed block.<br> If you have a local_size=(2,2,1), and you want your global grid to be,<br> say, (5,4,1), with unextended GL you might run it as groups = (3,2,1)<br> which would end up invoking a bunch of bits you don't want, and so<br> this partial_block is a way to say that you don't want the last "line"<br> to be executed at all.<br> <br> That makes sense, and seems like a reasonable thing to have in<br> pipe_grid_info. The documentation did not make that clear the first<br> time I read it, but now I'm having trouble suggesting improvements to<br> it. So I think it's fine.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Yep. The partialness adds extra blocks to the grid in which the unused threads (lanes) are disabled. I might rename it to grid_padding[3]. Or I might keep the whole thing private in radeonsi.</div><div dir="auto"><br></div><div dir="auto">It's useful for e.g. compute-based image blits when the blit box is not aligned to the block size.</div><div dir="auto"><br></div><div dir="auto">Marek</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The p_state.h bits are Acked-by: Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu" target="_blank" rel="noreferrer">imirkin@alum.mit.edu</a>> .<br> Can't speak as to the radeonsi bits.<br> <br> Cheers,<br> <br> -ilia<br> </blockquote></div></div></div>
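A minimal C sketch of the block arithmetic discussed in this thread, assuming (per Marek's description) that the grid counts only full blocks and that partial_block adds one trailing block per dimension holding the remainder. The names `struct launch`, `split_launch`, `executed_threads`, and `padded_threads` are hypothetical and are not Mesa's actual pipe_grid_info or radeonsi code; `padded_threads` models the plain-GL alternative of rounding up and early-returning in the shader.

```c
#include <assert.h>

/* Hypothetical launch description for illustration only; this is NOT
 * Mesa's pipe_grid_info, just the semantics described in the thread. */
struct launch {
   unsigned block[3];         /* size of a full thread block */
   unsigned grid[3];          /* number of full blocks per dimension */
   unsigned partial_block[3]; /* size of the extra trailing block; 0 = none */
};

/* Split a total thread count into full blocks plus, in each dimension
 * where the count is not a multiple of the block size, one partial block
 * holding the remainder (assumed semantics). */
static struct launch split_launch(const unsigned threads[3],
                                  const unsigned block[3])
{
   struct launch l;
   for (int i = 0; i < 3; i++) {
      assert(block[i] != 0);
      l.block[i] = block[i];
      l.grid[i] = threads[i] / block[i];
      l.partial_block[i] = threads[i] % block[i];
   }
   return l;
}

/* Threads actually executed with partial blocks: in each dimension the
 * extent is grid * block + partial_block, so no out-of-range threads run. */
static unsigned long long executed_threads(const struct launch *l)
{
   unsigned long long total = 1;
   for (int i = 0; i < 3; i++)
      total *= (unsigned long long)l->grid[i] * l->block[i] +
               l->partial_block[i];
   return total;
}

/* Threads launched by the plain-GL alternative: round the grid up to whole
 * blocks and let the shader early-return for out-of-range IDs, e.g.
 * if (globalThreadID.x >= 10) return; */
static unsigned long long padded_threads(const unsigned threads[3],
                                         const unsigned block[3])
{
   unsigned long long total = 1;
   for (int i = 0; i < 3; i++) {
      assert(block[i] != 0);
      unsigned long long blocks =
         ((unsigned long long)threads[i] + block[i] - 1) / block[i];
      total *= blocks * block[i];
   }
   return total;
}
```

For the first example in the thread (10x8x1 threads, 8x8x1 blocks) this yields grid {1, 1, 1} and partial_block {2, 0, 0}, executing exactly 80 threads, while the padded 16x8x1 launch runs 128. For 8192+1 threads with 64-wide blocks it yields 128 full blocks plus a 1-thread partial block, instead of 129 full blocks (8256 threads).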