[HarfBuzz] HarfBuzz API design

Wed Aug 19 12:53:19 PDT 2009

On 08/19/2009 02:45 PM, Carl Worth wrote:
> Excerpts from Behdad Esfahbod's message of Tue Aug 18 16:23:50 -0700 2009:
>> [Warning: long email ahead]
>
> My reply might bounce to unsubscribed-from lists. Feel free to forward
> if you think there's anything important in here for others to read.

Thanks Carl.

>> _reference(), _destory(), and _get_reference_count().  At some point we may
>> want to add _[gs]et_user_data() also which is useful for language bindings.
>
> That should be "destroy" not "destory", of course. I normally wouldn't
> point out a typo in an email message, but I did see the same error in
> a comment that might be copied from actual code:

Ouch.  Fixed the typo.

>> /* calls destory() when not needing user_data anymore */
>> hb_face_t *
>> hb_face_create_for_tables (hb_get_table_func_t  get_table,
>>                              hb_destroy_func_t    destroy,
>>                              void                *user_data);
>
> So don't forget to spell-check your header files. :-)

Yeah, postponed that to when I add actual docs.

Humm, seems like s/writeable/writable/g is also needed.  Stupid me.  The geek 
inside me prefers writeable though.

>> typedef enum {
>>     HB_MEMORY_MODE_DUPLICATE,
>>     HB_MEMORY_MODE_READONLY,
>>     HB_MEMORY_MODE_WRITEABLE,
>>     HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,
>>     HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,
>> } hb_memory_mode_t;
> ...
>>     DUPLICATE: copy data right away and own it.
>>
>>     READONLY: the data passed in can be kept for later use, but should not be
>> modified.  If modification is needed, the blob will duplicate the data lazily.
>>
>>     WRITEABLE: data is writeable, use it freely.
>>
>>     READONLY_NEVER_DUPLICATE: data is readonly and should never be duplicated.
>>    This disables operations needing write access to data.
>>
>>     READONLY_MAY_MAKE_WRITEABLE: data is readonly but may be made writeable
>> using mprotect() or equivalent win32 calls.  It's up to the user to make sure
>> calling mprotect() or system-specific equivalents on the data is safe.  In
>> practice, that's never an issue on Linux and (according to Tor) on win32.
>
> I don't think these names are quite right yet.

That's definitely one of the most arbitrary parts of the code, thanks for 
catching!  I do think there's a slight confusion though, which if I clarify 
(as the docs will eventually will), the values make more sense.  Look at it 
this way, the mode parameter describes the characteristics of the input data 
only.  It's not a property of the blob itself, and for that reason there is no 
getter for it.

Looking at it that way, these values are accurate:

 >>     HB_MEMORY_MODE_DUPLICATE,
 >>     HB_MEMORY_MODE_READONLY,
 >>     HB_MEMORY_MODE_WRITEABLE,
 >>     HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,

This one, not so much:

 >>     HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,

As it also describe a behavior of the blob itself.  That value however is for 
very corner case uses (when you're just interested in knowing whether the data 
is sane and you want to avoid the cost of copying).

> I think the thing that strikes me first as wrong is that if you create
> a READONLY blob then it's still perfectly valid to write to it, (such
> that you have to have a separate READONLY_NEVER_DUPLICATE for
> "readonly---and I mean it).

More clear in the above context?

I also did some Hoffman encoding here.  I initially had READONLY and 
READONLY_MAY_DUPLICATE, but figured that READONLY_MAY_DUPLICATE is what I want 
most users to use, and one should think twice before using READONLY.  So I 
gave the short name to the common one and made the corner case use a longer name.

> One difficulty is that you're capturing separate notions here,
> (whether the data buffer is writeable vs. whether the created blob
> should be writeable).

Yes and no indeed.

> And there's some missing orthogonality too. For example, if I have
> data that I don't want the blob to reference, (hence DUPLICATE), might
> I not still want to create a blob in which all operations requiring
> write access are disabled? That combination is not possible to express
> from what I can see.

Initially I had that design in mind, with separate writeable.  But I figured 
the extra control is not needed, no.

> So, here's pseudocode trying to capture the decision-tree for choosing
> one of the modes currently:
>
> if (the data cannot be referenced by the blob)
> 	DUPLICATE
> else if (the data buffer can be written to)
> 	if (writing requires calling mprotect)
> 		READONLY_MAY_MAKE_WRITEABLE
> 	else
> 		WRITEABLE
> else if (the resulting blob should be readonly)
> 	READONLY_NEVER_DUPLICATE
> else
> 	READONLY;

Again, lets forget about READONLY_NEVER_DUPLICATE, my mental model for this was:

if (I don't own the data)
   DUPLICATE;
else if (I malloced it)
   WRITEABLE;
else if (it's mmapped)
   READONLY_MAY_MAKE_WRITEABLE;
else /* ie.  it's shared with someone else, I don't have perms to modify it */
   READONLY;

> So perhaps what's actually desired here is a set of flags to express
> each of those conditions independently? I don't have a good proposal
> for what those flag names might be.

I found the flag approach even more confusing as one would then think about 
(and the docs have to explain) all different kind of combinations.

> Although, having a name for the entire behavior is helpful, (rather
> than just a conjunction of flags). So maybe we just need simpler names
> to capture the behaviors of interest:
>
> COPY: Data is immediately copied by the blob. Resulting blob supports
> 	write operations. [New name for DUPLICATE]
>
> COPY_ON_WRITE: Data is referenced by the blob. Resulting blob support
> 	write operations which are implemented by lazily copying the
> 	data, (the original data is never modified). [New name for
> 	READONLY]
>
> READ_WRITE: Data is referenced by the blob. Resulting blob supports
> 	write operations which will directly modify original data.
> 	[New name for WRITEABLE. Of the new names I'm proposing, this
> 	is the one I like the least.]
>
> READONLY: Data is referenced by the blob. Resulting blob does not
> 	support write operations. [New name for READONLY_NEVER_DUPLICATE]
>
> MPROTECT: Data is referenced by the blob. Resulting blob supports
> 	write operations which are implemented by first calling
> 	mprotect or equivalent. [New name for
> 	READONLY_MAY_MAKE_WRITEABLE. Obviously this isn't an ideal
> 	name since it references an operating-system-specific
> 	feature. Suggestions welcome.]

I wanted to keep MPROTECT out of the name.  Again, please think about it again 
without the mode determining whether the blob will be writeable or not.  Just 
as describing the access mode of the input data only.  Do you think the 
current enum makes sense that way?

> I don't have much to say about the actual font-related parts of the
> proposal. But I do think it's perhaps a bad idea to reuse the same
> buffer for text input and glyph output. Is memory that tight? Might a
> user not want to be able to query the text back out even after
> shaping?

Memory is not tight, no.  Note that this is only about what the input/output 
to the hb_shape() function will be.  Users already keep the original text in 
some location anyway, and the cluster mapping is what's used to map the glyphs 
back.  All I'm doing here essentially is to let the user populate a hb-buffer 
and pass that to hb_shape(), instead of hb_shape() doing it.

The benefit of course is having one hb_shape() API, instead of 
hb_shape_utf8(), hb_shape_utf16(), and hb_shape_utf32().

Makes more sense?

Thanks again,
behdad

> -Carl