[HarfBuzz] harfbuzz: Branch 'master'

Sun Jan 3 08:03:34 PST 2016

Hi Martin, others,

On 15-12-21 01:59 AM, Martin Hosken wrote:
> Dear Behdad,
> 
>>  buf = hb.buffer_create ()
>> +class Debugger(object):
>> +	def message (self, buf, font, msg, data, _x_what_is_this):
>> +		print(msg)
>> +		return True
>> +debugger = Debugger()
>> +hb.buffer_set_message_func (buf, debugger.message, 1, 0)
>>  hb.buffer_add_utf8 (buf, text.encode('utf-8'), 0, -1)
>>  hb.buffer_guess_segment_properties (buf)
> 
> Yippee. At last, a debug interface :) (Behdad reminds me that I have been asking this once per year for the last 4 years!). Thank you.
> 
> OK. Now to make a great debug interface!
> 
> There are two ways of doing a debug interface: Event driven and One shot. There are probably more, but those are the only two that come to mind now. One shot sends all the information needed to give all the debug information for a debug point in its message. This allows the debugger not to have to keep state, but just record the results and pass them on. Event driven sends, well, events to the debugger and requires the debugger to keep state.
> 
> While one shot seems more inviting and is more in line with what Graphite does, I think for harfbuzz I would recommend an event-based debugger, where you send debug events at the start and end of every lookup, at recursion, during initial reordering and shaping, at dotted circle insertion, etc., and have an enum of events and let the debugger work out what it wants to do with that information.

Agreed about stateful.

> So, I would add an enum to the debug message to give a debug message event type.

My current thinking is that everything is transferred as a text API in
one-line messages.  The client can transform that to an enum if desired.
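
As a rough sketch of that client-side transformation, the snippet below maps a
one-line message to an enum by looking at its first word.  The message syntax
is not documented yet, so the "start ..." / "end ..." shapes, the Event enum
and the classify() helper are all assumptions made up for illustration:

```python
import enum


class Event(enum.Enum):
    # Hypothetical event kinds; the real message syntax is not fixed yet.
    START = "start"
    END = "end"
    UNKNOWN = "unknown"


def classify(msg):
    """Map a one-line debug message to an Event using its first word."""
    words = msg.split(None, 1)
    first = words[0] if words else ""
    try:
        return Event(first)
    except ValueError:
        # Anything not recognized is kept, just tagged as UNKNOWN.
        return Event.UNKNOWN
```

A client that prefers enums would call classify(msg) inside its message
callback and dispatch on the result.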

> One big question that always needs to be answered in the debugger is: where are we? Where in the buffer are we now processing. This is the idx field of the buffer. I don't think this is exposed in the public buffer interface. So it either needs to be exposed or passed as part of the debug message.

I'm unsure about this one.  We don't expose the out_buf part of the buffer, so
calling client code in the middle of a pass of transformation is harmful
currently.  Exposing all of that, on the other hand, leaks a lot of the buffer
design, which I like to avoid right now.  Indeed, we might end up changing the
buffer internals to accommodate the lookup direction proposal.

So, for now, no callbacks in the middle of a pass.  I understand that's far
from ideal, but at least we are now answering the big question: which lookup
did what.

> I suggest that rather than relying on a message to give the lookup number, that the lookup number be passed as a separate parameter (or in a struct or whatever). The lookup number can be overloaded based on event type. So we could have a starting high level phase event type and use the lookup to say whether that is initial shaping, GSUB, GPOS, etc. for example. Or we could have different event types for each one. That's up to you.

While for regular C APIs I fully agree with you, for this, I'd rather we keep
it as a simple string.  enums and tagged-union types are a headache for
language bindings and even serialization, whereas with 5 lines of code I could
get a debugger going from Python.  We just need to document the message
syntax completely and it will include all the info that the enum-and-struct
approach does; and performance is definitely not a problem here.

Plus, with a message-based API, clients can handle unknown messages to a
certain degree (eg, printing them out).

> I think we need to send a message each shaper pause when the pause occurs.

Yes, one at the beginning, another at the end.
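
With paired messages, a stateful client can reconstruct the nesting on its
own.  A minimal sketch, assuming (hypothetically) that the messages look like
"start <something>" and "end <something>"; the TraceState class is invented
here and is not part of the bindings:

```python
class TraceState(object):
    """Tracks which phases are currently open, based only on message text."""

    def __init__(self):
        self.open = []  # stack of phases that have started but not ended
        self.log = []   # (nesting depth, message) pairs, in arrival order

    def message(self, msg):
        parts = msg.split(None, 1)
        verb = parts[0] if parts else ""
        what = parts[1] if len(parts) > 1 else ""
        # Close the phase first so "end" is logged at the same depth as "start".
        if verb == "end" and self.open and self.open[-1] == what:
            self.open.pop()
        self.log.append((len(self.open), msg))
        if verb == "start":
            self.open.append(what)
        return True  # same return value as the callback in the quoted patch
```

To hook this up, the callback registered with hb.buffer_set_message_func would
take the arguments shown in the quoted patch and forward only msg to
TraceState.message().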

> For GPOS we need to be passing parameters like the two points in an attachment or the actual calculated offset in a pair or single adjustment. When doing class-based activities, we should be passing the class values involved or perhaps pointers (or offsets) to the data structures involved so that a debugger can cross-reference that back to source code.

GPOS is more friendly since the buffer structure is fully exposed.  Though,
deferred attachments won't be exposed.

> What does that look like now:
> 
> debug_message(type, buf, idx, lkupidx, void *aptr, void *bptr, msg, ...)
> 
> where aptr and bptr are defined by type and lkupidx and may point to things like an attachment point record or a lookup record in a class based contextual lookup or somesuch. They may also point to debugger specific data structures (perhaps for an attachment point one needs a pointer to the ap record and 2 floats for the resolved x,y coordinates).

That's definitely one thing I *don't* want.

> You know, if we get this right, we should be able to drop the msg, ... since debuggers really don't want to have to parse textual messages. Yes they are easy for a quick trace, but not for a real debugger. But it's welcome to stay to make such tracing programs' lives easier, but it shouldn't contain anything that isn't in the other parameters. If it does, then we need a way to pass it outside the message.

Right.  But I really don't want to add 35-and-growing different structs to
HarfBuzz, just for debugging, either.  Since debuggers can recover whatever
structs they want from the message, and this is a side API I like to keep to a
minimum in HarfBuzz, the message API wins IMO.
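
To illustrate recovering a struct from a message, here is a small sketch that
turns one kind of message into a record.  The "start lookup N" / "end lookup N"
format, the LookupEvent type and parse_lookup() are assumptions for the
example, not a documented syntax:

```python
import re
from collections import namedtuple

# Hypothetical record a debugger might want for lookup events.
LookupEvent = namedtuple("LookupEvent", ["phase", "index"])

# Assumed message shape: "start lookup 12" or "end lookup 12".
_LOOKUP_RE = re.compile(r"^(start|end) lookup (\d+)$")


def parse_lookup(msg):
    """Return a LookupEvent for a lookup message, or None for anything else."""
    m = _LOOKUP_RE.match(msg.strip())
    if m is None:
        return None
    return LookupEvent(phase=m.group(1), index=int(m.group(2)))
```

Messages that do not match simply return None, so a client can still fall
back to printing them.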

> And yes, while I'm trying to define what the kitchen sink is, I'm also trying to keep this lightweight.
> 
> I know the moment I hit send, I'll think of things I've forgotten!

lol.

I'm probably going to add shape_plan to list of arguments.  After that, if I
make a release, the API is here to stay...  So, speak very loudly if you think
for whatever reason this is not workable, i.e., that there are things that
cannot be done using a message.  I can't think of any.

Cheers,

behdad