[Mesa-dev] [PATCH 3/4] [rfc] gallivm/llvmpipe dynamic samplers support.

Mon Apr 18 23:39:46 UTC 2016

Am 18.04.2016 um 23:10 schrieb Dave Airlie:
> On 19 April 2016 at 03:40, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 18.04.2016 um 04:49 schrieb Dave Airlie:
>>> From: Dave Airlie <airlied at redhat.com>
>>>
>>> This is a first attempt at adding support for dynamic indexing
>>> of samplers to llvmpipe. This is needed for ARB_gpu_shader5 support.
>>>
>>> This uses the sampler function generator to generate functions
>>> for all samplers, then uses if statements to pick which one to call.
>>>
>>> This passes all the tests in piglit except a couple of non-uniform
>>> ones which g
>> I can't quite parse this fully ;-).
>>
> Doh I wrote half a commit msg, got distracted by query size and forgot
> to go back.
> 
>>> +
>>> +         for (i = 0; i < sampler->dynamic_state.num_samplers; i++) {
>> Indentation.
>> And I think this should really make an effort to only do this for
>> samplers which are actually dyanmically indexed, for the possible
>> range(s) (as they need to be declared as an array if I'm not mistaken).
>> I'm not entirely sure, but I believe this could even crash otherwise
>> potentially, due to trying to construct lookup functions with
>> incompatible combinations of parameters.
> 
> No unfortunately we don't declare samplers as an array, I think I should
> fix that at some point as I've come to this point a few times and would have
> really liked to have that information, at least for bounds checking.
Not having samplers as arrays is really _insane_ for this. You NEED them
for the code to make much sense (otherwise you might just construct huge
if ladders even if from your 32 samplers only 2 can be indexed
dynamically...)
That said, it's not THAT bad actually. You could (and should) sort of
"emulate" it. Because you know the lower bound from the base reg. And
for the upper bound, you can scan the sampler dcl targets (as they have
to match the base one). That way I'd be a bit more confident you don't
try to create functions which might be impossible and hit asserts or
crashes... Of course, that might very well (and likely) give you a too
pessimistic upper bound. But again, it would make more sense to have
arrays in the first place, so not requiring hacks.

>>
>>> +         struct lp_sampler_params unit_params = *params;
>>> +
>>> +         unit_params.texture_index = i;
>>> +         unit_params.sampler_index = i;
>> Of course, the need to actually override texture and sampler index is
>> why I never even tried to implement it - for d3d10 sample opcodes, it
>> looked simply unfeasible as you'd have to generate the product of
>> num_samplers/num_textures. So I figured this needed way more work...
>> Though actually thinking about this, only need to be able to index into
>> resources, not samplers, there, so it should work as well...
>> I can't really make sense out of the msdn docs though.
>> But with d3d12 you actually could have legitimate per-pixel resource
>> indices (yuck!!!) - with the usual caveat about calculated lods.
>> But I can't actually find anything that says dynamic indexing into
>> resources even works at all for d3d11 now.
> 
> Yup I pretty much ignored SAMPLE* and D3D due to lack of any clue.
> 
> Though really generating all the functions in the world probably isn't
> going to be that brutal. but it would be good to know if you can indirect
> both.
Pretty sure you can't for samplers - so iff it's possible for resources,
it would look pretty much the same (just without overriding the sampler
index).

> 
>>
>>> +         lp_build_sample_soa(&sampler->dynamic_state.static_state[i].texture_state,
>>> +                             &sampler->dynamic_state.static_state[i].sampler_state,
>>> +                             &sampler->dynamic_state.base,
>>> +                             gallivm, &unit_params);
>> And I think really all of this should move to the actual sample code.
> 
> It's kinda messier to do that, since we passing
> dynamic_state.static_state, and I don't think
> we can access it anywhere else except here.
Ah I see. I think the fact you needed to do it in both llvmpipe and draw
exactly the same is a pretty good indication it really isn't the right
place (swr certainly would need to do it too if it wanted to support that).
My guess is we could just pass the whole static_state array easily. It
is true that there's actually different definitions now
(draw_sampler_static_state, lp_sampler_static_state,
swr_sampler_static_state) but effectively they are all the same (all
copied from llvmpipe, including the comments) anyway. So could require
the drivers to use a common definition instead.
(And pass along the range information too!)

> 
>>> +      indirect_matches = LLVMBuildBitCast(builder, indirect_matches, LLVMIntTypeInContext(gallivm->context, params->type.length), "");
>>> +      indirect_matches = LLVMBuildICmp(builder, LLVMIntNE,
>>> +                                       indirect_matches, LLVMConstNull(LLVMIntTypeInContext(gallivm->context, params->type.length)), "");
>>>
>> This isn't quite right (I suppose that's what the commit message wanted
>> to say but was cut short...).
>> From what I understand, you really need to examine the exec mask (not
>> sure if the alive pixel mask is also needed) and pick your (scalar)
>> index from a element which is in the current exec mask. I guess that's
>> another parameter for the sampler params...
>> And fwiw I think doing that with a switch statement in the end instead
>> of if ladders would make far more sense.
> 
> Ah yes I expect you might be right in that and that is why the two
> tests are failing, I'll see if I can work it out.
> 
> A switch statement in the end means having to keep track of all the
> function pointers, whereas doing it with
> separate if statements makes it a lot easier to generate the code
> without having to keep copies of all the individual
> samplers to build the switch at the end. Though I suppose it might be
> possible to wrap it at a higher level.
If you move the static_state loop into the sample code this shouldn't be
much of a problem as you can just pass all information through to the
tex func generator.
That said, we don't have any helpers for constructing switches right
now. I suppose it shouldn't be too difficult to add, but I can live with
ifs if you prefer them.

> 
>>> +   if (params.sampler_is_indirect)
>>> +       params.indirect_index = get_indirect_index(bld, inst->Src[sampler_reg].Register.File, inst->Src[sampler_reg].Register.Index, &inst->Src[sampler_reg].Indirect);
>> Does this ensure that the index doesn't exceed max sampler index?
>> Hopefully yes, but if not you need to add some MIN here.
> 
> Ah I'll check that.

I actually think it should now...

Roland