[Poppler-bugs] [Bug 64821] [TAGGEDPDF] Expose the structure tree and attributes in poppler-glib

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Mar 1 01:49:11 PST 2014


https://bugs.freedesktop.org/show_bug.cgi?id=64821

--- Comment #69 from Carlos Garcia Campos <carlosgc at gnome.org> ---
Comment on attachment 94909
  --> https://bugs.freedesktop.org/attachment.cgi?id=94909
[PATCH v17 5/5] glib: Implement accessors for element attributes

Review of attachment 94909:
-----------------------------------------------------------------

This looks much better now, thanks. But still have some doubts and comments

::: glib/poppler-structure-element.cc
@@ +385,5 @@
> +  /*
> +   * Non-NULL names must always be valid because Poppler
> +   * discards the invalid attributes when parsing them.
> +   */
> +  g_assert (name_value != NULL);

This assert is duplicated.

@@ +401,5 @@
> +static EnumType
> +attr_to_enum (PopplerStructureElement *poppler_structure_element)
> +{
> +  const Attribute *attr =
> +      poppler_structure_element->elem->findAttribute (EnumNameValue<EnumType>::attribute_type, gTrue);

I wonder if we should do the same for the core api, and always inherit when the
attr is inheritable.

@@ +967,5 @@
>   *
> + * Obtains the #PopplerFormField out of an element of type
> + * %POPPLER_STRUCTURE_ELEMENT_FORM. The returned value must be freed
> + * with g_object_unref(). Note that there is one structure element
> + * per form field.

This is unrelated to this patch.

@@ +973,4 @@
>   * Return value: (transfer full): A #PopplerFormField, or %NULL if
>   *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM.
> + *
> + * Since: 0.26

Ditto.

@@ +1018,5 @@
>   *
> + * Obtains the #PopplerFormField out of an element of type
> + * %POPPLER_STRUCTURE_ELEMENT_FORM. The returned value must be freed
> + * with g_object_unref(). Note that there is one structure element
> + * per form field.

Ditto.

@@ +1024,4 @@
>   * Return value: (transfer full): A #PopplerFormFieldMapping, or %NULL if
>   *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM
> + *
> + * Since: 0.26

Ditto.

@@ +1154,5 @@
> + */
> +PopplerStructurePlacement
> +poppler_structure_element_get_placement (PopplerStructureElement *poppler_structure_element)
> +{
> +  ATTRIBUTE_GETTER_BODY (PopplerStructurePlacement);

You are going too far, this is just two lines of code

@@ +1174,5 @@
> +{
> +  ATTRIBUTE_GETTER_BODY (PopplerStructureWritingMode);
> +}
> +
> +

Nit: Extra empty line here.

@@ +1186,5 @@
> +    {
> +      g_assert (object->arrayGetLength () == 4);
> +      Object item;
> +      for (guint i = 0; i < 4; i++)
> +        values[i] = name_to_enum<PopplerStructureBorderStyle> (object->arrayGet (i, &item));

I think you should item.free() everytime.

@@ +1205,5 @@
> +    {
> +      g_assert (object->arrayGetLength () == 4);
> +      Object item;
> +      for (guint i = 0; i < 4; i++)
> +        value[i] = object->arrayGet (i, &item)->getNum ();

Ditto.

@@ +1227,5 @@
> +  gdouble* doubles = g_new (gdouble, *n_values);
> +  Object item;
> +
> +  for (guint i = 0; i < *n_values; i++)
> +    doubles[i] = object->arrayGet (i, &item)->getNum ();

Ditto.

@@ +1241,5 @@
> +  Object item;
> +
> +  color->red   = object->arrayGet (0, &item)->getNum () * 65535;
> +  color->green = object->arrayGet (1, &item)->getNum () * 65535;
> +  color->blue  = object->arrayGet (2, &item)->getNum () * 65535;

Ditto.

@@ +1262,5 @@
> + */
> +PopplerStructureBorderStyle *
> +poppler_structure_element_get_border_style (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);

Should we check also that this is called for an element than can have a border
style attr? and the same for all other attributes.

@@ +1279,5 @@
> + * are in before-after-start-end ordering (for the typical Western
> + * left-to-right writing, that is top-bottom-left-right). The returned
> + * value must be freed with g_free().
> + *
> + * Return value: (transfer full) (array fixed-size=4): An array with the

(element-type PopplerStructureBorderStyle)

@@ +1331,5 @@
> +/**
> + * poppler_structure_element_get_inline_align:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Obtains the block-alignment mode of the structure element.

next method is block_alignment

@@ +1473,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerStructureChecked
> +poppler_structure_element_get_checked (PopplerStructureElement *poppler_structure_element)

I also wonder if we should make more obvious that fact that some attributes are
only availble for some kinds for elements. For example, this method could be:

poppler_structure_element_form_get_state and the previous one
poppler_structure_element_form_get_role()

Also, I would rename PopplerStructureChecked as PopplerStructureFormState since
it described better what it is. In the core API we should follow the spec and
use the same names, to make it easier to follow the code reading the spec. But
in the public API we can change the exposed names when we think are better. And
this is definitely one of those cases.

@@ +1486,5 @@
> + * Obtains the thickness of the border of an element. The result values
> + * are in before-after-start-end ordering (for the typical Western
> + * left-to-right writing, that is top-bottom-left-right).
> + *
> + * Return value: (transfer full) (array fixed-length=4): Thickness of

(element-type gdouble)

@@ +1512,5 @@
> + *
> + * Obtains the thickness of the text decoration for the text contained
> + * in the element.
> + *
> + * Return value: Thickness of the text decoration, or %NaN if not defined.

Can the thickness be a negative number? I prefer to use something like -1
instead of NAN

@@ +1530,5 @@
> + * poppler_structure_element_get_padding:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Obtains the padding of an element (space around it). The result
> + * values are in before-after-start-end ordering (for the typical

s/typical//

@@ +1533,5 @@
> + * Obtains the padding of an element (space around it). The result
> + * values are in before-after-start-end ordering (for the typical
> + * Western left-to-right writing, that is top-bottom-left-right).
> + *
> + * Return value: (transfer full) (array fixed-size=4): Padding for

(element-type gdouble)

@@ +1550,5 @@
> +  return result;
> +}
> +
> +/**
> + * poppler_structure_element_get_table_padding:

poppler_structure_element_table_get_padding maybe?

@@ +1557,5 @@
> + * Obtains the padding of an element (space around it). The result
> + * values are in before-after-start-end ordering (for the typical
> + * Western left-to-right writing, that is top-bottom-left-right).
> + * This has to be used instead of poppler_structure_element_get_padding()
> + * for table elements.

If the name is what I propose, table_get_padding, I think it's more obvious
that this is used for table elements.

@@ +1559,5 @@
> + * Western left-to-right writing, that is top-bottom-left-right).
> + * This has to be used instead of poppler_structure_element_get_padding()
> + * for table elements.
> + *
> + * Return value: (transfer full) (array fixed-size=4): The paddings for

(element-type gdouble)

@@ +1580,5 @@
> + * poppler_structure_element_get_bounding_box:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Obtains the size of the bounding box of the element. This has to be
> + * used instead of poppler_structure_element_get_padding() for table

Why is this related to the padding?

@@ +1589,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerRectangle *
> +poppler_structure_element_get_bounding_box (PopplerStructureElement *poppler_structure_element)

This is one of the case where we could use the approach of returning a boolean
and pass a PopplerRectangle * as parameter, since that allows to use a stack
allocated rectangle.

@@ +1622,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_space_before (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

-1

@@ +1639,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_space_after (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

-1

@@ +1656,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_start_indent (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

-1

@@ +1673,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_end_indent (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

I prefer to use -1 instead of NAN everywhere.

@@ +1743,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_structure_element_get_color (PopplerStructureElement *poppler_structure_element)

This is another case where we could return boolean when not present and allow
to pass a pointer to a stack allocated color.

@@ +1768,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_structure_element_get_background_color (PopplerStructureElement *poppler_structure_element)

Ditto.

@@ +1794,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_structure_element_get_text_decoration_color (PopplerStructureElement *poppler_structure_element)

Ditto.

@@ +1856,5 @@
> + *
> + * Since: 0.26
> + */
> +gdouble
> +poppler_structure_element_get_width (PopplerStructureElement *poppler_structure_element)

Why not merging both functions and have a special value for auto? For example
when this returns -1 is because it's auto

@@ +1869,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Check whether the width of the element is to be calculated automatically
> + * depending on its contents. Note that when the width is automatic, using
> + * poppler_structure_element_get_width() will fail.

And we avoid this situation.

@@ +1908,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Check whether the height of the element is to be calculated automatically
> + * depending on its contents. Note that when the height is automatic, using
> + * poppler_structure_element_get_height() will fail.

Same for the height.

@@ +1949,5 @@
> + *
> + * Check whether the line height for the text contained in the element is
> + * to be calculated automatically depending on the font size. Note that
> + * when the line height is automatic,
> + * poppler_structure_element_get_line_height() will fail.

Same here.

@@ +1997,5 @@
> +
> +  Object *value = attr_value_or_default (poppler_structure_element, Attribute::ColumnGap);
> +  if (value == NULL)
> +    {
> +      *n_values = static_cast<guint> (-1);

The doc says this is set to 0, something I agree with.

@@ +2031,5 @@
> +
> +  Object *value = attr_value_or_default (poppler_structure_element, Attribute::ColumnWidths);
> +  if (value == NULL)
> +    {
> +      *n_values = static_cast<guint> (-1);

Return 0 here.

@@ +2063,5 @@
> +poppler_structure_element_get_description (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +
> +  Object *value = attr_value_or_default (poppler_structure_element, Attribute::Desc);

Who owns the returned Object? the element?

@@ +2069,5 @@
> +    return NULL;
> +  if (value->isString ())
> +    return _poppler_goo_string_to_utf8 (value->getString ());
> +  if (value->isName ())
> +    return g_strdup (value->getName ());

What encoding is this, we should ensure all strings returned by our api a valid
utf8

@@ +2106,5 @@
> +  if (value->isString ())
> +    return _poppler_goo_string_to_utf8 (value->getString ());
> +  if (value->isName ())
> +    return g_strdup (value->getName ());
> +

Same comments here.

@@ +2119,5 @@
> + * Obtains the color of border around the element. The result values
> + * are in before-after-start-end ordering (for the typical Western
> + * left-to-right writing, that is top-bottom-left-right).
> + *
> + * Return value: (transfer full) (array fixed-size=4): Border colors,

(element-type PopplerColor)

@@ +2141,5 @@
> +    {
> +      // One color per side.
> +      Object item;
> +      for (guint i = 0; i < 4; i++)
> +        convert_color (value->arrayGet (i, &item), &result[i]);

item.free() everytime

@@ +2174,5 @@
> + * Since: 0.26
> + */
> +gchar **
> +poppler_structure_element_get_headers (PopplerStructureElement *poppler_structure_element,
> +                                       guint                   *n_values)

We don't need this, since it's a null terminated array. You can iterate ntil
null or get the length with g_strv_length if needed

@@ +2196,5 @@
> +        result[i] = _poppler_goo_string_to_utf8 (item.getString ());
> +      else if (item.isName ())
> +        result[i] = g_strdup (item.getName ());
> +      else
> +        g_assert_not_reached ();

item.free() everytime

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20140301/c7686178/attachment-0001.html>


More information about the Poppler-bugs mailing list