[Poppler-bugs] [Bug 64821] [TAGGEDPDF] Expose the structure tree and attributes in poppler-glib

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Feb 27 12:14:42 PST 2014


https://bugs.freedesktop.org/show_bug.cgi?id=64821

--- Comment #60 from Adrian Perez de Castro <aperez at igalia.com> ---
(In reply to comment #58)
> Comment on attachment 94018 [details] [review]
> [PATCH v16 8/9] glib: Implement accessors for element attributes
> 
> Review of attachment 94018 [details] [review]:
> -----------------------------------------------------------------
> 
> I have a lot of doubts about this API.
> 
> ::: glib/poppler-structure-element.cc
> @@ +212,5 @@
> > +  const gchar *name;
> > +  EnumType     value;
> > +
> > +  static const EnumNameValue<EnumType> values[];
> > +  static const EnumType null = static_cast<EnumType> (-1);
> 
> I don't understand this null value, all attributed have a default value, so
> we should always return the default value when the attribute is not present.

Only some attributes have default values (check the attribute maps in
poppler/StructElement.cc, line 302). The value EnumType<T>::null is
returned during attribute lookup if:

- The attribute does not have a default value, AND
- the attribute is not defined, AND
- the value for the attribute is not inherited from a parent element.

BTW, in the attributes map the items with a default value are those
that should have one according to the PDF spec.

> @@ +371,5 @@
> > +static EnumType
> > +name_to_enum (Object  *name_value,
> > +              EnumType default_value = EnumNameValue<EnumType>::null)
> > +{
> > +  if (!name_value)
> 
> Can this really be called with NULL name_value?

Yes, for example in line 501 there is:

  EnumType result = name_to_enum<EnumType> ((attr == NULL)
                         ? Attribute::getDefaultValue (attribute_type)
                         : attr->getValue ());

In this particular case, “NULL” is passed as “name_value” to
“name_to_enum()” when the attribute is not defined (“attr == NULL”)
for a structure element, and “Attribute::getDefault()” returns
“NULL” (that is, the attribute does not have a default value).

> @@ +378,5 @@
> > +  for (const EnumNameValue<EnumType> *item = EnumNameValue<EnumType>::values ; item->name; item++)
> > +    if (name_value->isName (item->name))
> > +      return item->value;
> > +
> > +  return default_value;
> 
> Can this happen? in the frontend we assume that invalid attributes are
> ignored by the core.

You are right with this one: if “name_value” is not “NULL”, then it
must be one of the values of the enum because the invalid attributes
are indeed discarded by the core.

I am adding “g_assert_not_reached()” before the “return” statement.
Unfortunately, I have to keep the “return” to make the compiler happy
and avoid it to complain about the function ending without returning
qa value.

> @@ +386,5 @@
> > +template <typename EnumType>
> > +static bool
> > +name_to_enum (Object *name_value, EnumType& result)
> > +{
> > +  return (result = name_to_enum<EnumType> (name_value)) != EnumNameValue<EnumType>::null;
> 
> name_to_enum should never return null I think

For the reason mentioned above: the “null” value is returned when
the attribute is not defined and it has no default value. Checking
for “null” is what makes it possible for the API methods to return
“FALSE” when the attribute lookup failed and there is no default.

Funny thing: I never got to use this overloaded “name_to_enum()”
version, so I am going to remove it — the actual check is done in
“attr_to_enum()”.

> @@ +410,5 @@
> > +              g_variant_builder_add (&builder, "d", item.getNum ());
> > +            }
> > +          else
> > +            {
> > +              /* Free the resources in-use by the builder */
> 
> Aren't these values already checked in the core?

Correct. This check is redundant, I will remove it.

> @@ +450,5 @@
> > +              g_free (utf8_string);
> > +            }
> > +          else
> > +            {
> > +              /* Free the resources in-use by the builder */
> 
> Ditto.

Re: Ditto :)

> @@ +468,5 @@
> > +{
> > +  Object *value = Attribute::getDefaultValue (attribute_type);
> > +  const Attribute *attr;
> > +
> > +  if ((attr = poppler_structure_element->elem->findAttribute (attribute_type, inherit)))
> 
> Are we using inherit with any attribute? Shouldn't findAttribute use the
> inherit internally only for attributes that are inheritable?

Yes, it makes sense to hide the fact that some attributes can be inherited in
the frontend.
If some use-case arises in the future that needs to know exactly where an
attribute is
defined, we can add something like poppler_structure_element_has_attribute()
later on.

> @@ +469,5 @@
> > +  Object *value = Attribute::getDefaultValue (attribute_type);
> > +  const Attribute *attr;
> > +
> > +  if ((attr = poppler_structure_element->elem->findAttribute (attribute_type, inherit)))
> > +    value = attr->getValue ();
> 
> Here we could return attr->getValue (); directly, and call
> Attribute::getDefaultValue (attribute_type) after this when the attr was not
> found.

You are right, it is better to avoid doing the call to
“Attribute::getDefault()”
when the attribute has a value. I think it will look nice like this:

  const Attribute *attr =
      poppler_structure_element->elem->findAttribute (attribute_type, inherit);
  return (attr != NULL) ? attr->getValue () : Attribute::getDefaultValue
(attribute_type);


> @@ +485,5 @@
> > +
> > +  EnumType result = name_to_enum<EnumType> ((attr == NULL)
> > +                                            ? Attribute::getDefaultValue (attribute_type)
> > +                                            : attr->getValue ());
> > +  if (result_ptr != NULL)
> 
> Can this be called with a NULL result_ptr?

Nope. It is possible, but I remember that we talked at some point on changing
the API documentation and GI-annotations to disallow passing “NULL”. I am going
to remove this check here, and in methods that use the function I will be
adding
non-NULL checks with “g_return_val_if_fail()”.

> @@ +488,5 @@
> > +                                            : attr->getValue ());
> > +  if (result_ptr != NULL)
> > +    *result_ptr = result;
> > +
> > +  return (result == EnumNameValue<EnumType>::null);
> 
> How can this be null if we are using the default value when the attribute is
> not present?
> 
> @@ +1192,5 @@
> > + */
> > +gboolean
> > +poppler_structure_element_get_placement (PopplerStructureElement    *poppler_structure_element,
> > +                                         PopplerStructurePlacement *value,
> > +                                         gboolean                   inherit)
> 
> hmm, I don't understand this generic way of getting the attributes. Since,
> all attributes have a default value, we should always return the default
> value when the attribute is not present. If we really need API to let the
> user know whether an attribute is explicitly present we could add has_foo()
> methods instead. Also, not all attributes are inheritable, why do we allow
> the user decide? I think that inheritable attributes should be always looked
> up  in their ancestors automatically, it should be transparent for the user.
> We could add in the documentation what the default value is in case of the
> attr is not found and which once are inheriable. So, I think this method
> should be something like:
> 
> PopplerStructurePlacement 
> poppler_structure_element_get_placement (PopplerStructureElement   
> *poppler_structure_element);

As discussed today in IRC, the API is going to be simplified so the frontend
returns reasonable default values for attributes which do not have a default,
or they will signal the non-existence of the attribute by other means (like
for example returning a NULL pointer and so on.)

> It's not inheritable, and when not present inline will be returned. And the
> same approach for all other methods.
> 
> @@ +1255,5 @@
> > +                                                       POPPLER_STRUCTURE_BORDER_STYLE_NONE);
> > +        }
> > +    }
> > +  else
> > +    return FALSE;
> 
> This should never fail, invalid objects should be ignored by the core and
> never passed to the forntends.

True.

> @@ +1286,5 @@
> > +      if (value != NULL)
> > +        value[0] = value[1] = value[2] = value[3] = object->getNum ();
> > +    }
> > +  else
> > +    return FALSE;
> 
> Ditto.

Yep.

> @@ +1304,5 @@
> > +        *value = object->getNum ();
> > +      return TRUE;
> > +    }
> > +  else
> > +    return FALSE;
> 
> Ditto.
> 
> @@ +1311,5 @@
> > +static inline gboolean
> > +convert_doubles_array (Object *object, gdouble **values, guint *n_values)
> > +{
> > +  if (!object->isArray ())
> > +    return FALSE;
> 
> Ditto.
> 
> @@ +1353,5 @@
> > +        *value = (guint) object->getNum ();
> > +      return TRUE;
> > +    }
> > +  else
> > +    return FALSE;
> 
> Ditto.
> 
> @@ +1360,5 @@
> > +static gboolean
> > +convert_color (Object *object, PopplerColor *color)
> > +{
> > +  if (!object->isArray () || object->arrayGetLength () != 3)
> > +    return FALSE;
> 
> Ditto, and same in all  other cases.

Certainly.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20140227/c1dbf571/attachment.html>


More information about the Poppler-bugs mailing list