[Poppler-bugs] [Bug 64815] [TAGGEDPDF] Parse the Tagged-PDF document structure tree when present

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Nov 28 03:48:14 PST 2013


https://bugs.freedesktop.org/show_bug.cgi?id=64815

--- Comment #61 from Carlos Garcia Campos <carlosgc at gnome.org> ---
Comment on attachment 89917
  --> https://bugs.freedesktop.org/attachment.cgi?id=89917
Tagged-PDF: Text content extraction from structure elements

Review of attachment 89917:
-----------------------------------------------------------------

This looks much better and simpler :-)

::: poppler/MarkedContentOutputDev.cc
@@ +43,5 @@
> +
> +void MarkedContentOutputDev::endSpan()
> +{
> +  if (currentText && currentText->getLength()) {
> +    TextSpan span(currentText, currentFont, currentLink);

Add a comment here saying that TextSpan adopts the pointers, because at a first
glance it looks like this method leaks memory.

@@ +44,5 @@
> +void MarkedContentOutputDev::endSpan()
> +{
> +  if (currentText && currentText->getLength()) {
> +    TextSpan span(currentText, currentFont, currentLink);
> +    memcpy(&span.data->color, &currentColor, sizeof(GfxRGB));

Why don't you pass the color to the constructor too? Can't you simply do
span.data->color = currentColor instead of the memcpy in any case?

@@ +73,5 @@
> +void MarkedContentOutputDev::beginMarkedContent(char *name, Dict *properties)
> +{
> +  int id = -1;
> +  if (properties && properties->lookupInt("MCID", NULL, &id) && id == mcid)
> +    inMarkedContent = true;

I wonder what happens in case of nested marked contents, should we set
inMarkedContent to false until the inner marked contents ends? or do we want
also the contents of any nested marked content?

@@ +144,5 @@
> +
> +  if (font == currentFont)
> +    return;
> +
> +  endSpan();

Do we want to call endSpan even when inMarkedContent = false?

@@ +158,5 @@
> +
> +
> +void MarkedContentOutputDev::updateFillColor(GfxState *state)
> +{
> +  GfxRGB color;

Same here, should we ignore fill color updates when not in marked content?

@@ +167,5 @@
> +      color.b == currentColor.b)
> +    return;
> +
> +  endSpan();
> +  currentColor = color;

Note that the color of the text depends on the current text rendering mode, for
rendering mode 1 (stroke) what we want is the stroke color instead of the fill
color. Maybe we could get the current fill/stroke color from the state
depending on the current render mode in drawChar instead of using
updateFillColor and updateStrokeColor

::: poppler/MarkedContentOutputDev.h
@@ +25,5 @@
> +  enum {
> +    Italic = 1 << 0,
> +    Bold   = 1 << 1,
> +    Fixed  = 1 << 2,
> +  };

This looks unused.

@@ +46,5 @@
> +  }
> +
> +  GfxFont* getFont() const { return data->font; }
> +  GooString* getText() const { return data->text; }
> +  GooString* getLink() const { return data->link; }

What's exactly this link? a named destination? It doesn't seem to be assigned
anywhere. If this si something that will be used in a future patch I prefer to
add it in that patch instead of adding unused code here.

@@ +52,5 @@
> +
> +private:
> +  // NOTE: Takes ownership of strings, increases refcount for font.
> +  TextSpan(GooString *text, GfxFont *font = NULL, GooString *link = NULL)
> +      : data(new Data) {

Why do you need this heap allocated data struct instead of using members of the
class directly? Why is this constructor private?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler-bugs/attachments/20131128/19acb996/attachment.html>


More information about the Poppler-bugs mailing list