[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

Mon Jun 17 15:06:21 PDT 2013

https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #14 from Adrian Perez de Castro <aperez at igalia.com> ---
(In reply to comment #11)
> Comment on attachment 80407 [details] [review]
> [PATCH v3 2/7] Tagged-PDF: Interpret the document structure
> 
> Review of attachment 80407 [details] [review]:
> -----------------------------------------------------------------
> 
> Ok, I have just a few comments for now, I'll do a full review when the patch
> is split. In general, * are placed wrongly (close to the type instead of the
> argument/variable name). Some parts of the code are hard to follow with all
> those abbreviations.
> 
> ::: poppler/Catalog.h
> @@ +124,4 @@
> >    GooString *readMetadata();
> >  
> >    // Return the structure tree root object.
> > +  StructTreeRoot* getStructTreeRoot();
> 
> Note that this is used by pdfinfo, you need to update it too

The corresponding patch for “pdfinfo” is attached to bug #64813

> ::: poppler/StructElement.cc
> @@ +231,5 @@
> > +      }                                                                 \
> > +      return okay;                                                      \
> > +    }
> > +
> > +ARRAY_CHECKER(isRGBColor_or_OptX4, isRGBColor,        4, gTrue,  gTrue );
> 
> what is OptX4?

Instead one single color value specifying e.g. the color of all the
four sides a box border, it is possible to *optionally* have four
values (hence *X4*) with one value for each side. I will add a comment
about that in the code and think about a better name.

> @@ +862,5 @@
> > +StructElement::StructElement(Dict *element, StructTreeRoot *treeRootA, StructElement *parentA):
> > +  type(Unknown),
> > +  treeRoot(treeRootA),
> > +  parent(parentA),
> > +  pageRef(),
> 
> Do you really need this? I guess it does nothing.

Explicitly initializes using the default constructor. I tend to prefer to
write code that is explicit, but I can just remove the initializer for
the “pageRef” attribute.

> @@ +863,5 @@
> > +  type(Unknown),
> > +  treeRoot(treeRootA),
> > +  parent(parentA),
> > +  pageRef(),
> > +  s(new StructData())
> 
> Why having this as a separate struct?

The idea is to make the StructElement instances smaller in memory when
just storing a MCID or an OBJR.

> @@ +900,5 @@
> > +{
> > +  if (isContent())
> > +    delete c;
> > +  else
> > +    delete s;
> 
> I see why using separate structs now. I would just keep the mcid or object
> ref in the same StructElement and parse it on demand. I find very confusing
> using variable names like c and s

Do you mean that you would keep them in the parent StructElement? IMHO that
would make the API of StructElement more cumbersome. Also, note that there
can be mixed MCID, OBJR and StructElem children in a structure object, so
that would also complicate to store the children (now it is just a simple
“std::vector<StructElement*>”). Or am I understanding your comment wrongly?

> @@ +1160,5 @@
> > +  if (element->lookup("Alt", &obj)->isString()) {
> > +    s->altText = obj.getString();
> > +    obj.initNull(); // The StructElement takes ownership of the GooString
> > +  } else if (!obj.isNull()) {
> > +    error(errSyntaxWarning, -1, "Alt object is wrong type ({0:s})", obj.getTypeName());
> 
> I think we can just ignore invalid objects for optional fields.

Do you mean to not even print warning messages?

FWIW, the rest of your comments are valid concerns (especially the ones
related to leaking objects) and I will be fixing the issues tomorrow.

Thanks for the review, and next round would be (hopefully) easier with
the split patches :)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20130617/22e65c63/attachment.html>