[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

Sun Jun 16 03:25:05 PDT 2013

https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #11 from Carlos Garcia Campos <carlosgc at gnome.org> ---
Comment on attachment 80407
  --> https://bugs.freedesktop.org/attachment.cgi?id=80407
[PATCH v3 2/7] Tagged-PDF: Interpret the document structure

Review of attachment 80407:
-----------------------------------------------------------------

Ok, I have just a few comments for now, I'll do a full review when the patch is
split. In general, * are placed wrongly (close to the type instead of the
argument/variable name). Some parts of the code are hard to follow with all
those abbreviations.

::: poppler/Catalog.h
@@ +124,4 @@
>    GooString *readMetadata();
>  
>    // Return the structure tree root object.
> +  StructTreeRoot* getStructTreeRoot();

Note that this is used by pdfinfo, you need to update it too

::: poppler/StructElement.cc
@@ +197,5 @@
> +{
> +  return value->isNum();
> +}
> +
> +static GBool isNumber_or_AutoName(Object* value)

isNumerOrAutoName I guess or even just isNumberOrAuto

@@ +231,5 @@
> +      }                                                                 \
> +      return okay;                                                      \
> +    }
> +
> +ARRAY_CHECKER(isRGBColor_or_OptX4, isRGBColor,        4, gTrue,  gTrue );

what is OptX4?

@@ +247,5 @@
> +// Maps attributes to their names and whether the attribute can be inherited.
> +struct AttributeMapEntry {
> +  Attribute::Type    type;
> +  const char*        name;
> +  const Object*      defval;

Move the * to the variable name.

@@ +862,5 @@
> +StructElement::StructElement(Dict *element, StructTreeRoot *treeRootA, StructElement *parentA):
> +  type(Unknown),
> +  treeRoot(treeRootA),
> +  parent(parentA),
> +  pageRef(),

Do you really need this? I guess it does nothing.

@@ +863,5 @@
> +  type(Unknown),
> +  treeRoot(treeRootA),
> +  parent(parentA),
> +  pageRef(),
> +  s(new StructData())

Why having this as a separate struct?

@@ +900,5 @@
> +{
> +  if (isContent())
> +    delete c;
> +  else
> +    delete s;

I see why using separate structs now. I would just keep the mcid or object ref
in the same StructElement and parse it on demand. I find very confusing using
variable names like c and s

@@ +995,5 @@
> +    assert(map);
> +
> +    const MCOpArray& ops(getMCOps());
> +    if (!ops.size())
> +      return NULL;

This leaks the map, GlobalParams::getTextEncoding() returns a new reference. I
would get the map below right before you need it.

@@ +1149,5 @@
> +
> +  // Language (optional).
> +  if (element->lookup("Lang", &obj)->isString()) {
> +    s->language = obj.getString()->getCString();
> +    obj.initNull(); // The StructElement takes ownership of the GooString

This is weird, I guess you are trying to avoid another string copy. I wonder
why you didn't do the same for title. Maybe we could add "take" methods to
Object and do something like:

language = obj.takeString();

hmm, but here you are getting the CString from the GooString and leaking the
GooString.

@@ +1160,5 @@
> +  if (element->lookup("Alt", &obj)->isString()) {
> +    s->altText = obj.getString();
> +    obj.initNull(); // The StructElement takes ownership of the GooString
> +  } else if (!obj.isNull()) {
> +    error(errSyntaxWarning, -1, "Alt object is wrong type ({0:s})", obj.getTypeName());

I think we can just ignore invalid objects for optional fields.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20130616/a3e05105/attachment.html>