[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Jun 20 03:45:49 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #24 from Carlos Garcia Campos <carlosgc at gnome.org> ---
Comment on attachment 81009
  --> https://bugs.freedesktop.org/attachment.cgi?id=81009
[PATCH v5 02/10] Tagged-PDF: Implement parsing of StructTreeRoot

Review of attachment 81009:
-----------------------------------------------------------------

Looks good in general, but you should make sure it doesn't break the build. I
have a few other comments.

::: poppler/Catalog.cc
@@ +854,5 @@
> +      return NULL;
> +    }
> +
> +    if (catalog.dictLookup("StructTreeRoot", &root)->isDict("StructTreeRoot")) {
> +      structTreeRoot = new StructTreeRoot(doc, root.getDict(), getMarkInfo() & markInfoMarked);

What should we do in case of suspects? I don't like bool parameters, maybe we
could just pass the mark info flags and let the StructTreeRoot decide what to
do depending on the flags. You don't even need to pass this as a parameter,
since the StructTreeRoot keeps a pointer to the PDFDoc, you can get the mark
inflo flags from the doc directly:

doc->getCatalog()->getMarkInfo()

::: poppler/Catalog.h
@@ +124,4 @@
>    GooString *readMetadata();
>  
>    // Return the structure tree root object.
> +  StructTreeRoot *getStructTreeRoot();

As I said, this breaks the build. It's very important that every patch builds,
since it might affect any future bisection, for example.

::: poppler/StructTreeRoot.cc
@@ +110,5 @@
> +  // Parse the children StructElements
> +  Object kids;
> +  if (root->lookup("K", &kids)->isArray()) {
> +    if (marked && kids.arrayGetLength() > 1) {
> +      error(errSyntaxWarning, -1, "K in StructTreeRoot has more than one children in a tagged PDF");

I would leave this for patches to add tagged pdf support, adding first just the
generic logical structure parsing.

@@ +118,5 @@
> +      kids.arrayGetNF(i, &ref);
> +      if (kids.arrayGet(i, &obj)->isDict()) {
> +        StructElement *child = new StructElement(obj.getDict(), this);
> +        if (child->isOk()) {
> +          if (marked && !(child->getType() == StructElement::Document ||

Ditto.

@@ +125,5 @@
> +                          child->getType() == StructElement::Div)) {
> +            error(errSyntaxWarning, -1, "StructTreeRoot element of tagged PDF is wrong type ({0:s})", child->getTypeName());
> +          }
> +          appendElement(child);
> +          if (ref.isRef()) parentTreeAdd(ref.getRef(), child);

Move the if body to its own line, please.

::: poppler/StructTreeRoot.h
@@ +36,5 @@
> +  unsigned getNumElements() const { return elements.size(); }
> +  const StructElement *getElement(int i) const { return elements.at(i); }
> +  StructElement *getElement(int i) { return elements.at(i); }
> +  void appendElement(StructElement *element)
> +  { if (element && element->isOk()) elements.push_back(element); }

We normally indent the body when it's in a new line.

@@ +39,5 @@
> +  void appendElement(StructElement *element)
> +  { if (element && element->isOk()) elements.push_back(element); }
> +
> +  const StructElement *findParentElement(unsigned index) const
> +  { return (index < parentTree.size() && parentTree[index].size() == 1) ? parentTree[index][0].element : NULL; }

Ditto. I wonder if this would be easier to read in the .cc file, though.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20130620/7ef5a7a0/attachment-0001.html>


More information about the Poppler-bugs mailing list