[Mesa-dev] [PATCH 28/29] i965: Drop random 32-bit assembly implementation of memcpy().

Mon Sep 30 18:43:22 PDT 2013

On 09/30/2013 05:47 PM, Roland Mainz wrote:
> On Tue, Oct 1, 2013 at 2:27 AM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 09/27/2013 04:46 PM, Kenneth Graunke wrote:
>>> This was only used for uploading batchbuffer data, and only on 32-bit
>>> systems.  If this is actually useful, we might want to use it more
>>> widely.  But more than likely, it isn't.
>>
>> This probably is still useful, alas.  The glibc memcpy wants to do an
>> Atom-friendly backwards walk of the addresses.
> 
> Erm... just curious: Are you sure this is done for Atom? Originally
> such copy-from-highest-to-lowest-address copying is (should be: "was")
> done to enable overlapping copies... but at least POSIX mandates that
> |memcpy()| is not required to support overlapping copies and users
> should use |memmove()| instead in such cases (for example Solaris uses
> the POSIX interpretation in this case... and AFAIK Apple OSX even hits
> you with an |abort()| if you attempt an overlapping copy with
> |memcpy()| (or |strcpy()|) (and AFAIK "valgrind" will complain about
> such abuse of |memcpy()|/|strcpy()|/|stpcpy()|, too)).

I was pretty sure it was Atom... though looking at the glibc source, the
backward memcpy is only used on Core i3, i5, and i7 unless
USE_AS_MEMMOVE is defined.  Hmm...
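
For reference, the memcpy()/memmove() distinction raised above can be
sketched in C as follows. This is only an illustration: regions_overlap()
and copy_checked() are hypothetical helper names, not part of glibc or
Mesa, and the overlap test assumes a flat address space in which pointers
compare meaningfully as integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [a, a+n) and [b, b+n) share any byte.
 * Assumes a flat address space (pointers compared as integers). */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    return (pa < pb) ? (pb - pa < n) : (pa - pb < n);
}

/* Hypothetical helper: memcpy() has undefined behaviour for overlapping
 * regions, so fall back to memmove() whenever the regions overlap. */
static void *copy_checked(void *dst, const void *src, size_t n)
{
    if (regions_overlap(dst, src, n))
        return memmove(dst, src, n);
    return memcpy(dst, src, n);
}
```

With buf holding "abcdefgh", copy_checked(buf + 2, buf, 4) takes the
memmove() path and leaves "ababcdgh" in buf, regardless of which
direction the underlying memcpy() implementation happens to walk.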

>> For some kinds of
>> mappings (uncached?), this breaks write combining and ruins performance.
> 
> That more or less breaks performance _everywhere_ because automatic
> prefetch obtains the next cache line and not the previous one.

Except your out-of-order CPU is really smart, and, IIRC, that makes it
usually not break.  I think.
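
As a rough illustration of the copy direction being discussed, here is a
minimal sketch of a copy that always stores in ascending address order.
copy_forward() is a hypothetical name and is not the i965 assembly
routine that the patch removes; whether ascending stores actually keep
write combining intact on a given mapping depends on the hardware and is
not established by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n bytes, storing strictly from the lowest
 * address to the highest.  The volatile destination keeps the compiler
 * from replacing the loop with a call to the library memcpy(), whose
 * copy direction is not specified.  Byte-at-a-time for clarity only;
 * this is not tuned for speed. */
static void *copy_forward(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

A real upload path would copy in wider units than single bytes; the
point here is only the store order.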

> ----
> 
> Bye,
> Roland