<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<style>
<!--
 /* Font Definitions */
 @font-face
        {font-family:SimSun;
        panose-1:2 1 6 0 3 1 1 1 1 1;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        text-align:justify;
        text-justify:inter-ideograph;
        font-size:10.5pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:Arial;
        color:windowtext;}
 /* Page Definitions */
 @page Section1
        {size:595.3pt 841.9pt;
        margin:72.0pt 90.0pt 72.0pt 90.0pt;
        layout-grid:15.6pt;}
div.Section1
        {page:Section1;}
-->
</style>

</head>

<body lang=ZH-CN link=blue vlink=purple style='text-justify-trim:punctuation'>

<div class=Section1 style='layout-grid:15.6pt'>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>Hi All<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>I found that movnti + mfence flushes cache lines faster than clflush,
as the report below shows (measured on a Core 2 platform):<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

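<table class=MsoNormalTable border=1 cellspacing=0 cellpadding=3 style='border-collapse:collapse'>
 <tr>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>Size (bytes)</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>movnti + mfence (us)</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>clflush (us)</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>Speedup (clflush / movnti)</span></font></td>
 </tr>
 <tr>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>4k</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>3.01</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>3.56</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>1.182</span></font></td>
 </tr>
 <tr>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>16k</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>12.01</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>14.23</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>1.184</span></font></td>
 </tr>
 <tr>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>32k</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>23.93</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>28.45</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>1.188</span></font></td>
 </tr>
 <tr>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>64k</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>47.92</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>56.89</span></font></td>
  <td><font size=1 face=Arial><span lang=EN-US style='font-size:9.0pt;font-family:Arial'>1.187</span></font></td>
 </tr>
</table>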

<p class=MsoNormal style='layout-grid-mode:char'><font size=1 face=Arial><span
lang=EN-US style='font-size:9.0pt;font-family:Arial'>The code for the two cases
(only the cache-line-aligned case is handled):<o:p></o:p></span></font></p>

<p class=MsoNormal style='layout-grid-mode:char'><font size=1 face=Arial><span
lang=EN-US style='font-size:9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

<p class=MsoNormal style='layout-grid-mode:char'><font size=1 face=Arial><span
lang=EN-US style='font-size:9.0pt;font-family:Arial'>movnti + mfence:<o:p></o:p></span></font></p>

<pre><span lang=EN-US style='font-size:9.0pt'>for (i = 0; i &lt; size; i += 64) {
        /* read one qword of the line, then write it back non-temporally */
        __asm__ __volatile__("movq (%0), %%rax\n\t"
                             "movnti %%rax, (%0)"
                             : : "r"(addr + i) : "rax", "memory");
}
/* drain the write-combining buffers */
__asm__ __volatile__("mfence" : : : "memory");</span></pre>

<p class=MsoNormal style='layout-grid-mode:char'><font size=1 face=Arial><span
lang=EN-US style='font-size:9.0pt;font-family:Arial'>clflush:<o:p></o:p></span></font></p>

<pre><span lang=EN-US style='font-size:9.0pt'>for (i = 0; i &lt; size; i += 64)
        clflush(addr + i);</span></pre>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>movnti invalidates the cache line before writing the data
into the write-combining buffer; at the end, mfence drains any data still left in
the write-combining buffers, so the overall behavior resembles clflush.<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>This approach only suits small buffers: when the size exceeds
about 128 KB (on my platform), movnti + mfence becomes slower than clflush because of
the extra read instruction.<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>If this theory is right, many of the flush operations in GEM
could benefit from it.<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>Thanks<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'>Ma Ling<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

<p class=MsoNormal><font size=1 face=Arial><span lang=EN-US style='font-size:
9.0pt;font-family:Arial'><o:p> </o:p></span></font></p>

</div>

</body>

</html>