memcpy to AGP vs memcpy to framebuffer - which is faster?

Wed Dec 24 10:28:47 PST 2008

This is probably a very basic question, but it is important for me to know:

If I do an ordinary (non-accelerated) memcpy of a frame from system 
memory to a buffer in AGP memory (allocated via DRI), is it any faster 
than a (non-accelerated) memcpy of the exact same frame to a buffer area 
in offscreen memory in the framebuffer? Does it matter whether the 
chipset is integrated into the mainboard (and using memory stolen from 
main RAM as video memory), as opposed to being on a plug-in card in 
the AGP port?
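
To make the comparison concrete, here is a rough sketch of how the two 
copies could be timed. It is only an illustration: yv12_frame_bytes() 
and time_copies() are names made up for this mail, and the destination 
pointer is assumed to be whatever mapping the driver hands out (an AGP 
buffer or an offscreen framebuffer area). Nothing here is taken from the 
savage driver itself.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Size in bytes of one YV12 frame: a full-resolution Y plane plus two
 * chroma planes subsampled by 2 in each direction (hypothetical helper). */
size_t yv12_frame_bytes(size_t width, size_t height)
{
    return width * height + 2 * ((width / 2) * (height / 2));
}

/* Copy len bytes from src to dst `iterations` times with plain memcpy and
 * return the elapsed wall-clock time in seconds (hypothetical helper).
 * dst is assumed to point at the mapped AGP or framebuffer region. */
double time_copies(void *dst, const void *src, size_t len, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For a 640x480 YV12 frame, yv12_frame_bytes(640, 480) gives 460800 bytes 
per upload. Calling time_copies() once with an AGP mapping and once with 
a framebuffer mapping as dst, using the same source frame and iteration 
count, would give directly comparable numbers.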

I want to know because I am evaluating whether it is worthwhile to 
implement allocation of AGP buffers via DRI in the XVideo code path of 
xf86-video-savage. The "mastered image transfer" (used to transform from 
planar YV12 to packed YUV) present in the chipset can choose between 
framebuffer memory and AGP memory as a source for the conversion. 
Currently it uses an upload from system memory to an area in offscreen 
memory, followed by the conversion. However, the upload to the 
framebuffer is the main source of delay, and measurements with mplayer 
and a 640x480 movie show that software conversion (BCIForXV=off) is 
*faster* (90 seconds) than BCI-mediated conversion (110 seconds).
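
For reference, the software path amounts to something like the sketch 
below. This is not the driver's actual code: the function name is 
hypothetical, the packed target is assumed to be YUY2 (byte order 
Y0 U Y1 V), and both buffers are assumed to have no line padding 
(pitch equal to width).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one planar YV12 frame to packed YUY2 (assumed target format).
 * YV12 layout: Y plane (width*height), then V plane, then U plane, each
 * chroma plane being (width/2)*(height/2) bytes. */
void yv12_to_yuy2(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    const uint8_t *y_plane = src;
    const uint8_t *v_plane = y_plane + (size_t)width * height;
    const uint8_t *u_plane = v_plane + (size_t)(width / 2) * (height / 2);

    for (int row = 0; row < height; row++) {
        /* Two luma rows share one chroma row. */
        const uint8_t *y = y_plane + (size_t)row * width;
        const uint8_t *u = u_plane + (size_t)(row / 2) * (width / 2);
        const uint8_t *v = v_plane + (size_t)(row / 2) * (width / 2);
        uint8_t *out = dst + (size_t)row * width * 2;

        /* Each pair of pixels becomes four bytes: Y0 U Y1 V. */
        for (int col = 0; col < width; col += 2) {
            out[0] = y[col];
            out[1] = u[col / 2];
            out[2] = y[col + 1];
            out[3] = v[col / 2];
            out += 4;
        }
    }
}
```

Note that the packed output is width*height*2 bytes, so for 640x480 the 
converted frame is 614400 bytes, compared with 460800 bytes for the 
planar YV12 source.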

-- 
perl -e '$x=2.4;print sprintf("%.0f + %.0f = %.0f\n",$x,$x,$x+$x);'