[Mesa-dev] [PATCH 2/2] clover: fix getting struct args api size
Jan Vesely
jan.vesely at rutgers.edu
Tue Jun 28 15:30:38 UTC 2016
On Thu, 2016-06-23 at 18:03 -0700, Francisco Jerez wrote:
> Jan Vesely <jan.vesely at rutgers.edu> writes:
>
> > On Wed, 2016-06-22 at 20:22 -0700, Francisco Jerez wrote:
> > > Jan Vesely <jan.vesely at rutgers.edu> writes:
> > >
> > > > On Wed, 2016-06-22 at 17:07 -0700, Francisco Jerez wrote:
> > > > > Jan Vesely <jan.vesely at rutgers.edu> writes:
> > > > >
> > > > > > On Mon, 2016-06-13 at 17:24 -0700, Francisco Jerez wrote:
> > > > > > > Serge Martin <edb+mesa at sigluy.net> writes:
> > > > > > >
> > > > > > > > This fix getting the size of a struct arg. vec3 types
> > > > > > > > still
> > > > > > > > work
> > > > > > > > ok.
> > > > > > > > Only buit-in args need to have power of two alignment,
> > > > > > > > getTypeAllocSize
> > > > > > > > reports the correct size.
> > > > > > > > ---
> > > > > > > > src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > | 3
> > > > > > > > ++-
> > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git
> > > > > > > > a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > index 03487d6..9af51539 100644
> > > > > > > > ---
> > > > > > > > a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > +++
> > > > > > > > b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > > > > > > @@ -472,7 +472,8 @@ namespace {
> > > > > > > > // aligned to the next larger power of
> > > > > > > > two". We
> > > > > > > > need
> > > > > > > > this
> > > > > > > > // alignment for three element vectors, which
> > > > > > > > have
> > > > > > > > // non-power-of-2 store size.
> > > > > > > > - const unsigned arg_api_size =
> > > > > > > > util_next_power_of_two(arg_store_size);
> > > > > > > > + const unsigned arg_api_size = arg_type-
> > > > > > > > > isStructTy()
> > > > > > > > ?
> > > > > > > > + arg_store_size :
> > > > > > > > util_next_power_of_two(arg_store_size);
> > > > > > > >
> > > > > > > Hm... Isn't this still going to be broken if you pass a
> > > > > > > struct
> > > > > > > argument
> > > > > > > to a kernel function and the alignment of any of the
> > > > > > > struct
> > > > > > > members
> > > > > > > doesn't match the target-specific data layout? Not sure
> > > > > > > we
> > > > > > > can
> > > > > > > fix
> > > > > > > this
> > > > > > > sensibly without requiring the target's data layout to
> > > > > > > match
> > > > > > > the
> > > > > > > CL
> > > > > > > API
> > > > > > > exactly. Any suggestions Tom?
> > > > > >
> > > > > > according to 6.7.2.1 compilers can arbitrarily insert
> > > > > > padding
> > > > > > between
> > > > > > struct members (except at the beginning).
> > > > >
> > > > > What spec version are you looking at? My CL spec doesn't
> > > > > have
> > > > > any
> > > > > section labeled 6.7.2.1.
> > > >
> > > > c99 specs, I did not find anything specific for CLC (it might
> > > > be
> > > > that I
> > > > just need to look harder). CLC 2.0 adds additional constraint
> > > > that
> > > > you
> > > > can't use address space qualifiers.
> > > >
> > >
> > > I'd expect that whatever the CL spec says regarding the memory
> > > layout
> > > of
> > > CLC types (e.g. section 6.1.5 which specifies the usual alignment
> > > rules
> > > for CL types and section 6.11.1 and 6.11.3 which specify various
> > > variable and type declaration attributes giving finer control
> > > over
> > > the
> > > alignment of variable and struct member declarations) fully
> > > overrides
> > > the C99 spec.
> >
> > Right, even if we consider that none of the C99 6.7.2.1 apply (and
> > at
> > least CL2.0 6.5.6 does not make it sound so), it only gives us one
> > side, we can check that the CLC struct layout follows what we would
> > expect. We don't have means to check and enforce that the host side
> > struct layout is compatible.
> >
>
> Yes, exactly, the CL spec doesn't have anything to say about the
> host-side memory layout, that's up to the host platform's ABI to
> define.
>
> > >
> > > > >
> > > > > > Even if size/alignment of individual members match CL API
> > > > > > exactly,
> > > > > > there's no guarantee that the structure layout/size will be
> > > > > > the
> > > > > > same.
> > > > > >
> > > > > How can you exchange structured data with a CL kernel then,
> > > > > assuming
> > > > > that the layout of structure types in memory is fully
> > > > > unspecified
> > > > > as
> > > > > you
> > > > > say?
> > > >
> > > > that is my point. My understanding is that it relies on a
> > > > silent
> > > > assumption that both CLC and the host compiler will create the
> > > > same
> > > > structure layout given the same structure elements.
> > > >
> > > > big endian host can create:
> > > > struct foo {
> > > > cl_int a;
> > > > // 16 bit padding;
> > > > cl_short b;
> > > > cl_int c;
> > > > };
> > >
> > > I don't think this is a valid representation of the structure
> > > according
> > > to CL rules, my understanding is if your host happens to lay out
> > > structure fields in this way you have to either marshal things
> > > manually
> > > or use compiler-specific attributes to get the host compiler to
> > > put
> > > things at the right location (according to CL API rules). I
> > > believe
> > > that Khronos' cl_platform.h specifies alignment attributes in the
> > > cl_*
> > > host-side typedefs specifically for this purpose.
> >
> > maybe I'm missing something, but what rule does it break? shorts
> > are by
> > default 2 byte aligned, ints are 4 byte aligned, they cannot be
> > reordered, so there needs to be a 2 byte hole somewhere. is there a
> > rule that requires members to be at the nearest aligned location?
>
> That's how it works on most CPU ABIs I'm aware of, you take the first
> available offset compatible with the alignment requirements of the
> element datatype. I'm not sure if the CL spec defines the memory
> layout
> of structure types to that level of detail, but I would assume that's
> by
> omission rather than it being intentionally unspecified.
>
> > At least on the host side the compiler can insert stuff like '60
> > bytes
> > padding to avoid false sharing'.
>
> That should also be a valid transformation in a (device-side) CL
> program
> if the compiler can prove that the memory layout of a struct type is
> never visible by the host, so none of the (more or less explicitly
> specified) CL ABI applies. The same goes for a host-side C compiler,
> it
> can lay out the program's data structures any way it likes as long as
> the externally visible C ABI of the target platform is preserved.
>
> > If the application programmer needs to use specific attributes to
> > get
> > compatible layout we have no way to check that, other than checking
> > the provided size.
> >
> > what I'm trying to says is; does it make sense to have stricter
> > checks
> > on device side if we can't enforce them on the host side?
> >
> We can't but the application can as long as any binary data exchanged
> through the CL API follows some well-specified and device-independent
> memory layout. It's the application's responsibility to make sure
> that
> any binary blobs passed to the CL API comply with the CL data layout:
> one (somewhat non-portable) possibility is to rely on the host C
> compiler to lay out your data in the same way the CL API expects by
> using compiler-specific type annotations, but the CL implementation
> cannot give you any guarantees that it will be possible to achieve
> that
> on all platforms, for maximum portability the application can always
> marshal any data exchanged with CL kernels by hand.
makes sense, thanks for detailed reply. A lot of the stuff falls into
"implicit assumptions" category for me. I knew that shared data needed
to have rules for interoperability, but I expected they might be more
explicit (similar to how endianness of pointers is handled)
thanks again,
Jan
>
> > Jan
> >
> > >
> > > > while little endian device could create:
> > > > struct foo {
> > > > int a;
> > > > short b;
> > > > // 16 bit padding
> > > > int c;
> > > > };
> > > >
> > > > If cl_short/short alignment is 2bytes, the above structures and
> > > > all
> > > > the
> > > > members have the same size/alignment, yet are not compatible.
> > > >
> > > > Am I missing something that would prevent the above?
> > > >
> > > > Jan
> > > >
> > > > >
> > > > > > Jan
> > > > > >
> > > > > > >
> > > > > > > > llvm::Type *target_type = arg_type-
> > > > > > > > >isIntegerTy()
> > > > > > > > ?
> > > > > > > > TD.getSmallestLegalIntType(mod-
> > > > > > > > > getContext(),
> > > > > > > > arg_store_size * 8)
> > > > > > > > --
> > > > > > > > 2.5.5
> > > > > > --
> > > > > >
> > > > > > Jan Vesely <jan.vesely at rutgers.edu>
> > > > --
> > > >
> > > > Jan Vesely <jan.vesely at rutgers.edu>
> > --
> >
> > Jan Vesely <jan.vesely at rutgers.edu>
--
Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160628/f940769f/attachment.sig>
More information about the mesa-dev
mailing list