[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Jun 20 01:42:59 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #22 from Carlos Garcia Campos <carlosgc at gnome.org> ---
(In reply to comment #14)
> (In reply to comment #11)
> > Comment on attachment 80407 [details] [review] [review]
> > [PATCH v3 2/7] Tagged-PDF: Interpret the document structure
> > 
> > Review of attachment 80407 [details] [review] [review]:
> > -----------------------------------------------------------------
> > 
> > Ok, I have just a few comments for now, I'll do a full review when the patch
> > is split. In general, * are placed wrongly (close to the type instead of the
> > argument/variable name). Some parts of the code are hard to follow with all
> > those abbreviations.
> > 
> > ::: poppler/Catalog.h
> > @@ +124,4 @@
> > >    GooString *readMetadata();
> > >  
> > >    // Return the structure tree root object.
> > > +  StructTreeRoot* getStructTreeRoot();
> > 
> > Note that this is used by pdfinfo, you need to update it too
> 
> The corresponding patch for “pdfinfo” is attached to bug #64813

Every patch should build without depending on any other, if you change
getStructTreeRoot to return a StructTreeRoot * instead of an Object * and
pdutils still does doc->getStructTreeRoot()->isDict() this is not going to
build. 

> > ::: poppler/StructElement.cc
> > @@ +231,5 @@
> > > +      }                                                                 \
> > > +      return okay;                                                      \
> > > +    }
> > > +
> > > +ARRAY_CHECKER(isRGBColor_or_OptX4, isRGBColor,        4, gTrue,  gTrue );
> > 
> > what is OptX4?
> 
> Instead one single color value specifying e.g. the color of all the
> four sides a box border, it is possible to *optionally* have four
> values (hence *X4*) with one value for each side. I will add a comment
> about that in the code and think about a better name.

Ok, it's a bit confusing I would say.

> > @@ +862,5 @@
> > > +StructElement::StructElement(Dict *element, StructTreeRoot *treeRootA, StructElement *parentA):
> > > +  type(Unknown),
> > > +  treeRoot(treeRootA),
> > > +  parent(parentA),
> > > +  pageRef(),
> > 
> > Do you really need this? I guess it does nothing.
> 
> Explicitly initializes using the default constructor. I tend to prefer to
> write code that is explicit, but I can just remove the initializer for
> the “pageRef” attribute.

The default constructor is used in any case, I think those unneeded explicit
initializers are just noise, but I'm not C++ expert.

> > @@ +863,5 @@
> > > +  type(Unknown),
> > > +  treeRoot(treeRootA),
> > > +  parent(parentA),
> > > +  pageRef(),
> > > +  s(new StructData())
> > 
> > Why having this as a separate struct?
> 
> The idea is to make the StructElement instances smaller in memory when
> just storing a MCID or an OBJR.
>
> > @@ +900,5 @@
> > > +{
> > > +  if (isContent())
> > > +    delete c;
> > > +  else
> > > +    delete s;
> > 
> > I see why using separate structs now. I would just keep the mcid or object
> > ref in the same StructElement and parse it on demand. I find very confusing
> > using variable names like c and s
> 
> Do you mean that you would keep them in the parent StructElement? IMHO that
> would make the API of StructElement more cumbersome. Also, note that there
> can be mixed MCID, OBJR and StructElem children in a structure object, so
> that would also complicate to store the children (now it is just a simple
> “std::vector<StructElement*>”). Or am I understanding your comment wrongly?

I think we should try to delay most of this to be done in demand, note that in
most of the cases we will be parsing al this for nothing, because poppler api
users don't support structure/tagged pdfs. So, I would just keep the mcid or
object ref, and parse it on demand if it's required only.

> > @@ +1160,5 @@
> > > +  if (element->lookup("Alt", &obj)->isString()) {
> > > +    s->altText = obj.getString();
> > > +    obj.initNull(); // The StructElement takes ownership of the GooString
> > > +  } else if (!obj.isNull()) {
> > > +    error(errSyntaxWarning, -1, "Alt object is wrong type ({0:s})", obj.getTypeName());
> > 
> > I think we can just ignore invalid objects for optional fields.
> 
> Do you mean to not even print warning messages?

Yes.

> FWIW, the rest of your comments are valid concerns (especially the ones
> related to leaking objects) and I will be fixing the issues tomorrow.
> 
> Thanks for the review, and next round would be (hopefully) easier with
> the split patches :)

Thanks

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20130620/59ebdd25/attachment.html>


More information about the Poppler-bugs mailing list