[cairo] how to propose a change?

Travis Griggs tgriggs at cincom.com
Fri Aug 31 13:23:44 PDT 2007

On Aug 31, 2007, at 0:44, Behdad Esfahbod wrote:

> On Fri, 2007-08-24 at 19:18 -0400, ionous wrote:
>> thanks for the warm welcome and all the info.
>> okay. here's my proposal.  sorry in advance for the extreme length.
> Thanks for writing this down with so much level of detail.  All the  
> time
> and energy you put into this is highly appreciated.
>> only read if you're interested in cairo arcana--
>> i'll try to whip up a patch over the next few days,
>> then let the code speak for itself.
>> the general goal of my proposal is to unify path memory usage across
>> compilers, decrease potentially wasted space, and in, what i suspect
>> are the most frequent use cases, increase the number of paths cairo
>> can store in the same space it uses now.
>> the general concept is to change the dual lists of ops and points  
>> into
>> a single unified command buffer stream.  the following goes into
>> details about how it works now, how the command stream would work,  
>> and
>> how -- in most cases -- the command stream would save memory. there's
>> a couple implementation questions towards the end.
>> how it works now:
>> cairo's paths ( cairo_path_fixed_t ) store their path operations and
>> path points in chains of fixed size buffers ( cairo_path_buf_t )  on
>> all platforms each buffer holds 512 bytes. there are always the same
>> number of points and ops in each structure, however the size of the
>> ops vary per compiler, and thus the number of items each structure  
>> can
>> hold varies. under visual c 6.1 ops take 4 bytes, while via gcc each
>> op is 1 bytes. the former leads to 2 set of 38 items ( 38 points, 38
>> ops ), the latter 2 sets of 51 items.
>> with the existing algorithm whenever a new path operation would
>> overflow either the number of available points or ops, a new buffer
>> gets allocated. there is not a one to one mapping between the number
>> of ops and the number of points used in a given path.  each operation
>> takes different numbers of points: move to and lines to add 1  
>> point, a
>> curve to adds 3, the close path op adds no points.
>> the worst real world cases, memory usage wise, are curves. a new
>> buffer is needed under visual c after 12 curve segments; under gcc:
>> 17.
>> for a buffer full of curves, visaul c produces 2 wasted points and 26
>> (38-12) wasted ops; gcc no wasted points, but 34 (51-17) wasted ops.
>> that's a total of 112 (2*4 + 26*4) and 34 (34*1) wasted bytes
>> respectively; 22% and 7% of path storage memory goes unused in the
>> full curve case.  not terrible, especially under gcc, but i think
>> there's a simple way to use almost all the allocated space for path
>> storage.
> Ok, the win32 case was clearly wasteful.  I went ahead and committed a
> fix to always use a byte for the operation.
>> command buffers
>> since paths and ops are both integer types, it would be easy to store
>> a single command stream of uniform type combing the list of ops and
>> points.  each op would take 1 command stream element, while each  
>> point
>> would take 2 elements ( 1 for x, 1 for y ).  the entire command  
>> stream
>> would have 116 elements available in the space devoted formerly to  
>> the
>> separate point and op lists.
>> if every operation that is currently added to the list is directly
>> followed by the points that use it, a curve segment would have the
>> stream: [ "CURVE", curve-point-1,cp2,cp3, "CURVE",c2p1,c2p2,c2p3 ]
>> each curve would take 7 (1+3*2) elements ( 28 bytes ) and cairo could
>> store 16 curves with no wasted space. for visual c: that's good ( 16
>> vs. 12 ), for gcc not so good ( 16 vs. 17 ).  Lines of course are  
>> even
>> worse for gcc -- reducing it to 38 points from 51 points.
> So, this is not going to save us anything.
>> command buffers + counts
>> to get towards the storage savings that cairo has under gcc, the next
>> step would split the op element into two parts: the operation key and
>> a count tracking the number of times an operation has been used
>> without interruption.  in the repeating curve segment case the  
>> command
>> stream would appear something like [ ("CURVE", X), c1p1,c1p2,c1p3,
>> c2p1,c2p2,c2p3, ... ] where X is the number of curve segments in the
>> current buffer. in this case: 38 curve segments could fit.  if the
>> next path segment were also a curve there would be just 4 bytes  
>> ( 1 op
>> + 38*3 points = 115 elements used; 116 total elements-115 used els=1
>> wasted el; 1el*4bytes/el=4 ) of wasted space. [ the bookkeeping  
>> needed
>> -- an index to the current operation would be kept at the path, not
>> buffer, level ]
>> that's great: 38 curves vs 12(or 17) - a win for all compilers.
> And this is in my opinion trading simplicity for a little hardly
> beneficial memory usage improvement.  Note that path memory is a very
> transitional kind of memory.  Each cairo context at each time doesn't
> have more than one path attached to it, and a cairo context itself  
> is a
> transitional object too.
>> two new commands
>> the downside appears again however in situations where there are lots
>> of changes in the segment type -- ex. "MOVE", "LINE", "MOVE", "LINE",
>> "MOVE", "LINE" --this scheme could only fit 19 (116/(2*(1+2))) such
>> paired segments, while currently under gcc can fit 25 (51/2).
>> the next and final step, at the cost of making some changes  
>> outside of
>> the buffer code itself, would be to introduce two new commands
>> LINE_SEGMENT and CURVE_SEGMENT. (  this makes the lists very
>> 3d-accelerator like with their line strip / line list primitives ).
>> the segment commands would represent the "MOVE", "LINE" pair as a
>> single operation with two points., rather than 2 operations with 2
>> points.  in the current buffer space cairo could store: 28
>> ((116-1) /(2*4)) line list segments per buffer.
>> this isn't quite as straight forward as the other changes as there
>> would probably have to be some work to handle detecting line strip
>> startup cases ( move-line-line-... ) cleanly ( similar to the  
>> existing
>> move / line startups ).  a nice thing about this change though is  
>> that
>> separate strips would wind up shedding their separate initial move
>> operation; the move designated solely by the start of a new strip
>> operation.
>> that said thrashing between strips and lists will still eat up memory
>> slightly faster than it used to.
>> overall gains
>> i think in the majority of cases these changes as a whole will save
>> memory for all compilers, but in between fully mixed and fully  
>> chained
>> paths, there's a lot of wiggle room.  while i suspect most use cases
>> are composed of chains of similar primitives - i don't know that for
>> sure.
>> there's some smaller possibility it might make acceleration
>> compatibility easier in some sense -- given the tighter mapping to
>> gl/d3d style primitive types -- though i don't really know how/where
>> that would actually play out.
>> how should chaining work?
>> one open question is how to chain together series of same path
>> operations that span multiple buffers.  the two main options are
>> either store the complete chain length even if it crosses buffers, or
>> to store only the per buffer length, always terminating a chain at  
>> the
>> end of a buffer.  the option still tops out at 65535 operations, and
>> so some sort of chain continuation code would be needed in either
>> case.  given that the second option likely makes more sense.  that
>> said there could be some advantages in knowing the size of chains up
>> front.... not completely sure but easy to tweak later on.
> This slips over to over-engineering I would say.  Specially if it  
> needs
> internal API change.
>> are reversals a dead code path?
>> while going over the necessary changes again this morning i see that
>> the buffer's prev pointer allows _cairo_path_fixed_interpret() to
>> traverse the op and point list both forwards and backwards.
>> while the reversal traversal actually appears unused in current code,
>> the command stream would require a minor addition to support it. the
>> operator elements would further split into: command / repeat count /
>> prev op index with no more than 255 elements stored in a single
>> chain.
>> while generally i'm a firm believer in removing unused code paths,  
>> i'd
>> definitely want to hear from people with more cairo experience than
>> myself.  if reversal is unneeded that would nicely free up space for
>> another command stream element ^o^
>> if i'm wrong or insane on any front just let me know,
>> -io.
>> http://www.ionous.net
> I like the fact that you put so much effort into it, but would rather
> see some part of code get your attention that is a real performance
> (time/memory) bottleneck, showable in some kind of real-world  
> scenario.
> As for the path storage, I'm pretty happy with how simple the current
> code is and given that the bad win32 performance is gone now, I don't
> think we need to change it for now.

I would like to see the two approaches compared from a language  
binding pov when it comes to iterating paths. The current path  
mechanism was/is very difficult to implement a binding for. I kind of  
wondered if this more "streaming" storage would make it easier to  
increment through the information. But I'd have to see the structure  
definitions in more detail to get a better feel for this.

Travis Griggs
My Other Machine runs OSX. But then... so does this one.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cairographics.org/archives/cairo/attachments/20070831/35216518/attachment-0001.htm 

More information about the cairo mailing list