<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">Hi David,<br><br>Thanks for your reply. I have tried running the code many times over, but the measured times stay in the same range. I also measured the time with clock_gettime, which gives the same result.<br> <br>I am not sure whether replacing the for loop with an orcc-generated C function is the best way to optimize this, and this is where I need some help. If it is not, should I use intrinsics instead?<br><br>Thanks<br>Prateek Mathur<br><br>--- On <b>Wed, 1/5/11, David Schleef <i>&lt;ds@entropywave.com&gt;</i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;"><br>From: David Schleef &lt;ds@entropywave.com&gt;<br>Subject: Re: [Liboil] Fw: ORC performance for NEON<br>To: "Prateek Mathur" &lt;hiprateek007@yahoo.co.in&gt;<br>Cc: liboil@lists.freedesktop.org<br>Date:
Wednesday, January 5, 2011, 3:05 AM<br><br><div class="plainMail">On Tue, Jan 04, 2011 at 02:28:37PM +0530, Prateek Mathur wrote:<br>&gt; Now when I run the ORC binary with ORC_DEBUG=3 I get the statement "compiling for target "neon" " which makes me believe that ORC is working for the correct platform.<br>&gt; But when I run both versions, the normal addition performs much better (more than 100 times better) than the ORC code.<br><br>It looks like you are running the code once. First of all, any<br>performance measurements should be done many times with an accurate<br>time counter (gettimeofday() is not). Also, the first time you<br>run an orc function, it will compile the code, which is generally<br>slower than running the resulting code.<br><br><br><br>David<br><br></div></blockquote></td></tr></table><br>
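The measurement advice in the quoted reply (time many runs, and keep the first call out of the measurement because it may include one-time compilation) could be sketched roughly as below. This is only a minimal sketch under stated assumptions: `add_s16`, `elapsed_ns` and `bench` are hypothetical names invented for illustration, and `add_s16` is a plain C loop standing in for whatever function is being measured. No ORC API is used here. `clock_gettime(CLOCK_MONOTONIC, ...)` is the timer the reply in this thread reports having tried.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* request the clock_gettime() declaration */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function under test (for example, a
 * generated kernel). Here it is just a plain C element-wise add. */
static void add_s16(int16_t *d, const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++)
        d[i] = (int16_t)(a[i] + b[i]);
}

/* Difference between two timespec values, in nanoseconds. */
static double elapsed_ns(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}

/* Average time per call in nanoseconds over `runs` timed calls.
 * One untimed warm-up call is made first so that any one-time setup
 * cost (such as run-time code generation) is not counted. */
static double bench(int16_t *d, const int16_t *a, const int16_t *b,
                    int n, int runs)
{
    struct timespec t0, t1;

    add_s16(d, a, b, n);                  /* warm-up, not timed */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < runs; r++)
        add_s16(d, a, b, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_ns(t0, t1) / runs;
}
```

Calling `bench(d, a, b, 1024, 1000)` once for the plain loop and once for the optimized function would give two per-call averages to compare. Note that an optimizing compiler may simplify the stand-in loop, so the numbers are only meaningful when the function under test is the real one.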