xserver: Branch 'master' - 12 commits

Aaron Plattner aplattner at nvidia.com
Thu Sep 28 16:25:49 PDT 2006


On Fri, Sep 29, 2006 at 12:54:35AM +0300, Daniel Stone wrote:
> On Thu, Sep 28, 2006 at 01:33:55PM -0700, Aaron Plattner wrote:
> > [the wfb commit]
> >
> > [...]
> >
> > +#define MEMCPY_WRAPPED(dst, src, size) do {                       \
> > +    size_t _i;                                                    \
> > +    CARD8 *_dst = (CARD8*)(dst), *_src = (CARD8*)(src);           \
> > +    for(_i = 0; _i < size; _i++) {                                \
> > +        WRITE(_dst +_i, READ(_src + _i));                         \
> > +    }                                                             \
> > +} while(0)
> >
> > [...]
> >
> >   * by reading/writing aligned CARD32s where it's easy
> >   */
> > [...]
> >  	    /* Do four aligned pixels at a time */
> > [and so on, and so forth]
>
> This is evidently no longer the case.  Could you please modify this to
> do aligned accesses where possible?  Doing unaligned really hurts us on
> architectures like ARM, where the cycles expended on unaligned accesses
> are actually important, unlike i386/amd64, where it doesn't particularly
> matter much.

Note that:
 a) MEMCPY_WRAPPED expands to a simple call to memcpy in libfb, so only
    libwfb uses this loop.
 b) fb24_32.c, which contains the comment you referenced, doesn't use
    MEMCPY_WRAPPED, so it will continue to do aligned CARD32 accesses.
 c) In libwfb, each of those accesses calls through a function pointer,
    which is probably far more expensive than any aligned/unaligned
    difference (I assume; I don't know much about ARM).
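Points (a) and (c) can be illustrated with a small standalone sketch. This is not the server's code: `read_memory`, `write_memory`, `linear_read`, `linear_write`, `USE_PLAIN_MEMCPY` and `wrapped_copy_calls` are hypothetical stand-ins, and the accessors here just touch linear memory while counting how often they are called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t CARD8;   /* stand-in for the X server's CARD8 */

/* Counts indirect accessor calls so the cost of the wrapped path is visible. */
static unsigned long accessor_calls;

/* Hypothetical hooks standing in for the function pointers libwfb's
 * READ/WRITE call through; this sketch only performs byte accesses. */
static uint32_t linear_read(const void *src, int size) {
    assert(size == 1);
    accessor_calls++;
    return *(const CARD8 *)src;
}

static void linear_write(void *dst, uint32_t value, int size) {
    assert(size == 1);
    accessor_calls++;
    *(CARD8 *)dst = (CARD8)value;
}

static uint32_t (*read_memory)(const void *, int) = linear_read;
static void (*write_memory)(void *, uint32_t, int) = linear_write;

#define READ(ptr)       ((*read_memory)((ptr), (int)sizeof(*(ptr))))
#define WRITE(ptr, val) ((*write_memory)((ptr), (val), (int)sizeof(*(ptr))))

#ifdef USE_PLAIN_MEMCPY
/* libfb-style: the macro is simply memcpy. */
#define MEMCPY_WRAPPED(dst, src, size) memcpy((dst), (src), (size))
#else
/* libwfb-style: one READ and one WRITE (two indirect calls) per byte. */
#define MEMCPY_WRAPPED(dst, src, size) do {                 \
    size_t _i;                                              \
    CARD8 *_dst = (CARD8 *)(dst), *_src = (CARD8 *)(src);   \
    for (_i = 0; _i < (size); _i++) {                       \
        WRITE(_dst + _i, READ(_src + _i));                  \
    }                                                       \
} while (0)
#endif

/* Copies n bytes via MEMCPY_WRAPPED and returns the accessor calls made. */
static unsigned long wrapped_copy_calls(void *dst, const void *src, size_t n) {
    accessor_calls = 0;
    MEMCPY_WRAPPED(dst, src, n);
    return accessor_calls;
}
```

With the wrapped expansion, copying 64 bytes costs 128 indirect calls, whereas the libfb expansion is a single `memcpy`.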

It's theoretically possible that some hardware could have a really funny
tile pattern where memcpys from linear to tiled memory, or vice versa,
can't be done in four-byte chunks.  In practice, I don't know whether that
will ever be the case.  If you really feel strongly about it, would
something like this be better?

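    size_t _size = (size);
    CARD32 *_dst = (CARD32*)(dst), *_src = (CARD32*)(src);
    CARD8 *_dst8, *_src8;

    /* Assumes dst and src are both CARD32-aligned */
    while(_size >= 4) {
        WRITE(_dst++, READ(_src++));
        _size -= 4;
    }
    /* Copy the remaining 0-3 bytes one at a time */
    _dst8 = (CARD8*)_dst;
    _src8 = (CARD8*)_src;
    while(_size > 0) {
        WRITE(_dst8++, READ(_src8++));
        _size--;
    }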
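One way to sanity-check a loop of this shape is to compare it against memcpy for every small size and alignment offset. The sketch below is hypothetical: `read_memory`/`write_memory` stand in for libwfb's accessors over plain linear memory, and `memcpy_wrapped_words` is an illustrative name. Unlike the snippet above, it only takes the word path when both pointers start CARD32-aligned, and falls back to byte accesses otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t  CARD8;    /* stand-ins for the X server's types */
typedef uint32_t CARD32;

static unsigned long accessor_calls;  /* indirect calls made, for the demo */

/* Hypothetical accessors over plain linear memory.  memcpy keeps the
 * 4-byte case free of strict-aliasing problems. */
static uint32_t read_memory(const void *src, int size) {
    uint32_t v = 0;
    assert(size == 1 || size == 4);
    accessor_calls++;
    if (size == 1)
        return *(const CARD8 *)src;
    memcpy(&v, src, 4);
    return v;
}

static void write_memory(void *dst, uint32_t value, int size) {
    assert(size == 1 || size == 4);
    accessor_calls++;
    if (size == 1)
        *(CARD8 *)dst = (CARD8)value;
    else
        memcpy(dst, &value, 4);
}

#define READ(ptr)       read_memory((ptr), (int)sizeof(*(ptr)))
#define WRITE(ptr, val) write_memory((ptr), (val), (int)sizeof(*(ptr)))

/* If both pointers start 4-byte aligned, copy whole CARD32s first; then
 * finish with single bytes.  Misaligned pointers use bytes throughout. */
static void memcpy_wrapped_words(void *dst, const void *src, size_t size) {
    CARD8 *dst8 = (CARD8 *)dst;
    const CARD8 *src8 = (const CARD8 *)src;

    if (((uintptr_t)dst8 & 3) == 0 && ((uintptr_t)src8 & 3) == 0) {
        while (size >= 4) {
            WRITE((CARD32 *)dst8, READ((const CARD32 *)src8));
            dst8 += 4;
            src8 += 4;
            size -= 4;
        }
    }
    while (size > 0) {        /* 0-3 bytes left if aligned, else all */
        WRITE(dst8, READ(src8));
        dst8++;
        src8++;
        size--;
    }
}
```

Copying 19 aligned bytes this way takes four word copies plus three byte copies, i.e. 14 accessor calls instead of the 38 a byte-only loop would make.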

-- Aaron