Comment #31 on bug 100613 - "Regression in Mesa 17 on s390x (zSystems)"
From: Ben Crocker <bcrocker@redhat.com>
https://bugs.freedesktop.org/show_bug.cgi?id=100613#c31
(In reply to Roland Scheidegger from comment #30)
> (In reply to Rob Clark from comment #26)
> > (In reply to Ben Crocker from comment #25)
> > >
> > > Regarding Ray's specific comment about getting scalar fetch to work
> > > with "sufficient twiddling," I think it's perfectly acceptable to
> > > introduce extra operations, as long as we restrict the extra
> > > operations to the big-endian path. PPC64 (LE or BE) is fast enough so
> > > that any performance impact will be negligible; S390 is less fast, but
> > > I imagine production machines with more memory than the one we
> > > experimented on here are fast enough.
> >
> > drive-by comment.. unless llvm is just rubbish at optimization, I don't
> > think saving a few operations in the front-end IR building should be that
> > important, even for LE.
> Well, yes and no. Yes, if it makes things conceptually simpler (which
> probably isn't really the case here).
> I'm not sure how good llvm is there with the ppc backend. But for x86, no,
> you can't assume optimization will take care of everything neatly, in
> particular for load/shuffle combinations. If you look at it, there is in
> fact lots of hack code around gathering of values (on x86), simply because
> llvm can't do some kinds of optimizations. In particular, it can't do any
> optimizations crossing scalar/vector boundaries, so whether you zero-extend
> values after a scalar load or after assembling them into vectors makes a
> large difference in generated code quality. And if you use an int load,
> llvm will not consider using float shuffles afterwards, even if that means
> using 3 shuffle instructions instead of just 1, and so on (llvm has no real
> model of domain transition penalty costs, which don't exist in these cases
> on most cpus), although that latter problem has been fixed with llvm 4.0.
> However, I would not expect these particular bits to be a problem on non-x86
> cpus. I think int/float issues are x86 specific (other simd instruction
> sets afaik don't tend to have different int/float load/store, shuffle or
> even logic op operations). So, going for the conceptually simplest solution
> should be alright (although, for instance, the scalar/vector "optimization
> barrier" is probably going to affect all backends).
>
> > But we have shader-db so it should be possible to
> > prove/disprove that theory. (Not sure if llvmpipe is instrumented for
> > shader-db but if not that should be easy to solve.)
> Yeah, I suppose I should really do that at some point...
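
To make the scalar/vector "optimization barrier" Roland describes above more
concrete, here is a minimal, hypothetical sketch using the LLVM-C API (the
same API gallivm builds IR with); the function names and layout are
illustrative, not taken from Mesa. Both variants widen four 16-bit fetches
(ptrs[] are i16*) into a <4 x i32>. Because LLVM generally will not rewrite
one form into the other across the scalar/vector boundary, whichever form the
front end emits shows up directly in the generated code.

#include <llvm-c/Core.h>

/* Variant A: zero-extend each scalar right after its load, then insert
 * the already-widened values into the result vector. */
static LLVMValueRef
widen_scalars_then_insert(LLVMContextRef ctx, LLVMBuilderRef b,
                          LLVMValueRef ptrs[4])
{
   LLVMTypeRef i32 = LLVMInt32TypeInContext(ctx);
   LLVMValueRef vec = LLVMGetUndef(LLVMVectorType(i32, 4));

   for (unsigned i = 0; i < 4; i++) {
      LLVMValueRef elem = LLVMBuildLoad(b, ptrs[i], "elem");
      elem = LLVMBuildZExt(b, elem, i32, "elem_zext");       /* scalar zext */
      vec = LLVMBuildInsertElement(b, vec, elem,
                                   LLVMConstInt(i32, i, 0), "vec");
   }
   return vec;
}

/* Variant B: assemble a <4 x i16> vector first, then do a single
 * vector-wide zero-extend at the end. */
static LLVMValueRef
insert_then_widen_vector(LLVMContextRef ctx, LLVMBuilderRef b,
                         LLVMValueRef ptrs[4])
{
   LLVMTypeRef i16 = LLVMInt16TypeInContext(ctx);
   LLVMTypeRef i32 = LLVMInt32TypeInContext(ctx);
   LLVMValueRef vec = LLVMGetUndef(LLVMVectorType(i16, 4));

   for (unsigned i = 0; i < 4; i++) {
      LLVMValueRef elem = LLVMBuildLoad(b, ptrs[i], "elem");
      vec = LLVMBuildInsertElement(b, vec, elem,
                                   LLVMConstInt(i32, i, 0), "vec");
   }
   return LLVMBuildZExt(b, vec, LLVMVectorType(i32, 4), "vec_zext"); /* vector zext */
}

Which variant produces better machine code is up to the backend and target;
the point is only that the choice has to be made when emitting the IR.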
I want to emphasize at this point that the patch I described in
Comments 28-29 is compile-time conditionalized for big-endian only.
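
As a rough illustration of what "compile-time conditionalized for big-endian
only" means in practice (this is a sketch with hypothetical helper names, not
the patch itself), the extra work can be guarded by gallium's
PIPE_ARCH_BIG_ENDIAN macro so that little-endian builds compile exactly the
code they had before:

#include <llvm-c/Core.h>
#include "pipe/p_config.h"

struct gallivm_state;

/* Hypothetical helpers, declared only so the sketch is self-contained. */
LLVMValueRef build_scalar_fetch(struct gallivm_state *gallivm, LLVMValueRef ptr);
LLVMValueRef build_byte_swap(struct gallivm_state *gallivm, LLVMValueRef value);

static LLVMValueRef
fetch_one_element(struct gallivm_state *gallivm, LLVMValueRef ptr)
{
   LLVMValueRef value = build_scalar_fetch(gallivm, ptr);

#ifdef PIPE_ARCH_BIG_ENDIAN
   /* The extra operations live only on the big-endian path, so any
    * added cost is paid only where the fix is needed. */
   value = build_byte_swap(gallivm, value);
#endif

   return value;
}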