[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Jun 25 10:46:13 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #26 from Adrian Perez de Castro <aperez at igalia.com> ---
(In reply to comment #25)
> Comment on attachment 81010 [details] [review]
> [PATCH v5 03/10] Tagged-PDF: Implement parsing of StructElem objects
> 
> Review of attachment 81010 [details] [review]:
> -----------------------------------------------------------------
> 
> Please, try to make sure patches build before submitting them.
> 
> ::: poppler/StructElement.cc
> @@ +82,5 @@
> > +  { StructElement::Figure,     "Figure",     elementTypeUndefined },
> > +  { StructElement::Formula,    "Formula",    elementTypeUndefined },
> > +  { StructElement::Form,       "Form",       elementTypeUndefined },
> > +  { StructElement::TOC,        "TOC",        elementTypeUndefined },
> > +  { StructElement::TOCI,       "TOCI",       elementTypeUndefined },
> 
> Isn't all this also part of the standard attributes defined for tagged PDFs?
> If we leave the tagged pdf support for a different patch, maybe you can
> merge this and the StructTreeRoot into a single patch and you don't need to
> dummy impl of StructElement in the previous patch.

Yes, this is for the standard attributes, I can try to avoid having a
completely dummy StructElement class in the patch implementing the
StructTreeRoot one — but be aware that will move the parsing of
StructElement to the patch where StructTreeRoot is parsed and it
will grow bigger :-/

> @@ +237,5 @@
> > +
> > +GooString* StructElement::getText(GooString *string, GBool recursive) const
> > +{
> > +  // TODO: Dummy implementation, complete
> > +  return NULL;
> 
> Do we really need this? or can it be added in a follow up patch with the
> proper code instead?

Not really. I am changing this to not have a dummy StructElement::getText
and add it in a later patch.

> @@ +338,5 @@
> > +
> > +  // Language (optional).
> > +  if (element->lookup("Lang", &obj)->isString()) {
> > +    s->language = obj.getString();
> > +    obj.initNull(); // The StructElement takes ownership of the GooString
> 
> Same concern I had in previous review.

I have seen similar uses of Object::initNull() in other parts of the
Poppler code (to mention one example: Parser.cc, line 178), so this
is not a completely alien code pattern, I would say ;-)

Adding a Object::takeString() method (and related ones) is not really
needed. It would only make the code more self-documenting, avoiding
to have to add a comment about the ownership being transferred — which
is nice but hardly a priority, I guess.


> ::: poppler/StructElement.h
> @@ +74,5 @@
> > +  const GooString *getID() const { return isContent() ? NULL : s->id; }
> > +  GooString *getID() { return isContent() ? NULL : s->id; }
> > +
> > +  // Optional ISO language name, e.g. en_US
> > +  GooString *getLang()
> 
> getLanguage() is not that long :-)

I am changing it to getLanguage()


> @@ +75,5 @@
> > +  GooString *getID() { return isContent() ? NULL : s->id; }
> > +
> > +  // Optional ISO language name, e.g. en_US
> > +  GooString *getLang()
> > +  { return (!isContent() && s->language) ? s->language : (parent ? parent->getLang() : NULL); }
> 
> We normally indent this.

Fixing. Also, I am changing the body to this, to make it clearer:

  GooString *getLanguage() {
     if (!isContent() && s->language) return s->language;
     return parent ? parent->getLanguage() : NULL;
  }

> @@ +85,5 @@
> > +  void setRevision(Guint revision) { if (isContent()) s->revision = revision; }
> > +
> > +  // Optional element title, in human-readable form.
> > +  const GooString *getTitle() const { return isContent() ? NULL : s->title; }
> > +  GooString *getTitle() { return isContent() ? NULL : s->title; }
> 
> Do we really need the two versions of every getter? I prefer to only add the
> code that is actually used and add more later when needed.

Ideally only the “const” getter should be needed, but for that to work
GooString (or whatever the type the returned object has) must also tag
its own methods that do not modify the instance as “const”. Take for
example the following:

  const StructElement *elem = /* ... */
  printf("Language: %s\n", elem->getLanguage->getCString());

Here, getLanguage() would return a “const GooString*”, and following
the rules for constness in C++, only “const” methods can be invoked
using the returned pointer (because they are guaranteed to not modify
the GooString instance), but unfortunately GooString::getCString() is
not declared “const” (even when it does _not_ modify the GooString),
so the compiler assumes that the internals of the returned GooString
may change; and if they change, the internals of the StructElement
are also changing, therefore a “const” getter cannot be used...

The Good Solution™ for this is marking methods of existing classes
properly as “const” (where appropriate). Here I decided to try to
limit the impact in the rest of the code by having the non-“const”
getter.

There is one thing that could be to improve the readability, though.
I could rewrite the non-“const” getters like this:

  const GooString* getLanguage() const { /* ... */ }
  GooString* getLanguage() { 
     return
const_cast<GooString*>(static_cast<GooString*>(this)->getLanguage());
  }

(Not sure which one is uglier, if the original or this one I have
just suggested... IMHO it is okay to repeat the simple getters,
and maybe use this suggestion I have done here for complex getters).


> @@ +133,5 @@
> > +  //
> > +  // The text will be appended to the passed GooString. If NULL is passed,
> > +  // a new string is returned, and the ownership passed to the caller.
> > +  //
> > +  GooString* getText(GooString *string = NULL, GBool recursive = gTrue) const;
> 
> Don't include this in this first patch if it's unused and unimplemented.

Okay.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20130625/a25b300b/attachment.html>


More information about the Poppler-bugs mailing list