[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present
bugzilla-daemon at freedesktop.org
bugzilla-daemon at freedesktop.org
Sun Jun 16 03:25:05 PDT 2013
https://bugs.freedesktop.org/show_bug.cgi?id=64815
--- Comment #11 from Carlos Garcia Campos <carlosgc at gnome.org> ---
Comment on attachment 80407
--> https://bugs.freedesktop.org/attachment.cgi?id=80407
[PATCH v3 2/7] Tagged-PDF: Interpret the document structure
Review of attachment 80407:
-----------------------------------------------------------------
Ok, I have just a few comments for now, I'll do a full review when the patch is
split. In general, * are placed wrongly (close to the type instead of the
argument/variable name). Some parts of the code are hard to follow with all
those abbreviations.
::: poppler/Catalog.h
@@ +124,4 @@
> GooString *readMetadata();
>
> // Return the structure tree root object.
> + StructTreeRoot* getStructTreeRoot();
Note that this is used by pdfinfo, you need to update it too
::: poppler/StructElement.cc
@@ +197,5 @@
> +{
> + return value->isNum();
> +}
> +
> +static GBool isNumber_or_AutoName(Object* value)
isNumerOrAutoName I guess or even just isNumberOrAuto
@@ +231,5 @@
> + } \
> + return okay; \
> + }
> +
> +ARRAY_CHECKER(isRGBColor_or_OptX4, isRGBColor, 4, gTrue, gTrue );
what is OptX4?
@@ +247,5 @@
> +// Maps attributes to their names and whether the attribute can be inherited.
> +struct AttributeMapEntry {
> + Attribute::Type type;
> + const char* name;
> + const Object* defval;
Move the * to the variable name.
@@ +862,5 @@
> +StructElement::StructElement(Dict *element, StructTreeRoot *treeRootA, StructElement *parentA):
> + type(Unknown),
> + treeRoot(treeRootA),
> + parent(parentA),
> + pageRef(),
Do you really need this? I guess it does nothing.
@@ +863,5 @@
> + type(Unknown),
> + treeRoot(treeRootA),
> + parent(parentA),
> + pageRef(),
> + s(new StructData())
Why having this as a separate struct?
@@ +900,5 @@
> +{
> + if (isContent())
> + delete c;
> + else
> + delete s;
I see why using separate structs now. I would just keep the mcid or object ref
in the same StructElement and parse it on demand. I find very confusing using
variable names like c and s
@@ +995,5 @@
> + assert(map);
> +
> + const MCOpArray& ops(getMCOps());
> + if (!ops.size())
> + return NULL;
This leaks the map, GlobalParams::getTextEncoding() returns a new reference. I
would get the map below right before you need it.
@@ +1149,5 @@
> +
> + // Language (optional).
> + if (element->lookup("Lang", &obj)->isString()) {
> + s->language = obj.getString()->getCString();
> + obj.initNull(); // The StructElement takes ownership of the GooString
This is weird, I guess you are trying to avoid another string copy. I wonder
why you didn't do the same for title. Maybe we could add "take" methods to
Object and do something like:
language = obj.takeString();
hmm, but here you are getting the CString from the GooString and leaking the
GooString.
@@ +1160,5 @@
> + if (element->lookup("Alt", &obj)->isString()) {
> + s->altText = obj.getString();
> + obj.initNull(); // The StructElement takes ownership of the GooString
> + } else if (!obj.isNull()) {
> + error(errSyntaxWarning, -1, "Alt object is wrong type ({0:s})", obj.getTypeName());
I think we can just ignore invalid objects for optional fields.
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20130616/a3e05105/attachment.html>
More information about the Poppler-bugs
mailing list