[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Dec 4 08:36:06 PST 2013


https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #62 from Adrian Perez de Castro <aperez at igalia.com> ---
(In reply to comment #61)

> ::: poppler/MarkedContentOutputDev.cc
> @@ +43,5 @@
> > +
> > +void MarkedContentOutputDev::endSpan()
> > +{
> > +  if (currentText && currentText->getLength()) {
> > +    TextSpan span(currentText, currentFont, currentLink);
> 
> Add a comment here saying that TextSpan adopts the pointers, because at a
> first glance it looks like this method leaks memory.

Sure, I am adding a comment. There is already a note in the header file
next to the TextSpan constructor but it will not hurt to have a note
also here.

> @@ +44,5 @@
> > +void MarkedContentOutputDev::endSpan()
> > +{
> > +  if (currentText && currentText->getLength()) {
> > +    TextSpan span(currentText, currentFont, currentLink);
> > +    memcpy(&span.data->color, &currentColor, sizeof(GfxRGB));
> 
> Why don't you pass the color to the constructor too? Can't you simply do
> span.data->color = currentColor instead of the memcpy in any case?

Changed.

> @@ +73,5 @@
> > +void MarkedContentOutputDev::beginMarkedContent(char *name, Dict *properties)
> > +{
> > +  int id = -1;
> > +  if (properties && properties->lookupInt("MCID", NULL, &id) && id == mcid)
> > +    inMarkedContent = true;
> 
> I wonder what happens in case of nested marked contents, should we set
> inMarkedContent to false until the inner marked contents ends? or do we want
> also the contents of any nested marked content?

I cannot think of an use-case where one would not be interested in the
nested content. For the moment I think we can keep it as-is, and later
on if the use-case surfaces, add the change.

On a different note, this comment of yours made me notice that there
was a bug with nested marked content sequences: beginMarkedContent()
would be called again for the nested sequence, and when the inner
sequence finishes endMarkedContent() would set inMarkedContent=gFalse
even when the content from the outer marked content sequence has not
been yet extracted. I have changed this so a “std::vector<int>” is
used as a stack to keep track of the nested sequences.

> @@ +144,5 @@
> > +
> > +  if (font == currentFont)
> > +    return;
> > +
> > +  endSpan();
> 
> Do we want to call endSpan even when inMarkedContent = false?

Well, the string with the text content would by NULL or empty, so it is
harmless; but you are right that how this works is expressed better in
the code by checking inMarkedContent before calling endSpan().

> @@ +158,5 @@
> > +
> > +
> > +void MarkedContentOutputDev::updateFillColor(GfxState *state)
> > +{
> > +  GfxRGB color;
> 
> Same here, should we ignore fill color updates when not in marked content?

Not really. Color changes can affect upcoming elements, so if a color
changes before a marked content sequence, the new color also affects
it. Or am I interpreting the spec wrong?

> @@ +167,5 @@
> > +      color.b == currentColor.b)
> > +    return;
> > +
> > +  endSpan();
> > +  currentColor = color;
> 
> Note that the color of the text depends on the current text rendering mode,
> for rendering mode 1 (stroke) what we want is the stroke color instead of
> the fill color. Maybe we could get the current fill/stroke color from the
> state depending on the current render mode in drawChar instead of using
> updateFillColor and updateStrokeColor

Moved into drawChar() as suggested.

> ::: poppler/MarkedContentOutputDev.h
> @@ +25,5 @@
> > +  enum {
> > +    Italic = 1 << 0,
> > +    Bold   = 1 << 1,
> > +    Fixed  = 1 << 2,
> > +  };
> 
> This looks unused.

Removed.

> @@ +46,5 @@
> > +  }
> > +
> > +  GfxFont* getFont() const { return data->font; }
> > +  GooString* getText() const { return data->text; }
> > +  GooString* getLink() const { return data->link; }
> 
> What's exactly this link? a named destination? It doesn't seem to be
> assigned anywhere. If this si something that will be used in a future patch
> I prefer to add it in that patch instead of adding unused code here.

This is a leftover from the past, when I didn't have clear how links
would end up in the structure tree. Links get their own StructElement
so it is not needed to have this in the TextSpan in the end. I have
removed the “data->link” member and the code accessing it.

> @@ +52,5 @@
> > +
> > +private:
> > +  // NOTE: Takes ownership of strings, increases refcount for font.
> > +  TextSpan(GooString *text, GfxFont *font = NULL, GooString *link = NULL)
> > +      : data(new Data) {
> 
> Why do you need this heap allocated data struct instead of using members of
> the class directly? Why is this constructor private?

TextSpanArray does not contain pointers to TextSpans, but the TextSpans
themselves. The Data member is to share the data across them when the
TextSpans are copied/assigned.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20131204/5152dd29/attachment.html>


More information about the Poppler-bugs mailing list