[poppler] Is there a way to get blank cells to contain something??

Michael D. Setzer II mikes at guam.net
Sat Feb 6 23:59:38 UTC 2021


I have a PDF file that I download from a web site. 
Unfortunately, copying the data directly results in long 
columns rather than rows going across.

pdftohtml does produce output in a much more user-friendly 
format that I can easily parse, except for one issue. If a 
cell is completely empty, nothing is produced for it at all. 
Generally there are 19 fields per row, but if a cell is 
blank the row has only 18, and that is not simple to tell, 
since no row marker is included either.

I can detect a short row, since the last field of a row is 
longer than the starting field, which is a 3-digit line 
number. But there is no way to determine which of the 
fields is the empty one. Sometimes it is field 14, but 
sometimes field 17. Once that is determined, it is just a 
matter of inserting a cell in the correct position.
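
A minimal sketch of the short-row detection described above, in 
Python. It assumes the parsed cells are already available as a flat 
list of strings, that every row starts with a 3-digit line number, 
and that at most one cell per row is missing. The names split_rows 
and short_rows are made up for illustration. It only finds the short 
rows; it cannot tell which field is the empty one.

```python
import re

EXPECTED_FIELDS = 19              # fields in a complete row, per the description above
LINE_NO = re.compile(r"^\d{3}$")  # assumption: a row starts with a 3-digit line number


def split_rows(fields):
    """Group a flat list of cell strings into rows.

    A new row starts at a 3-digit field, but only once the current row
    already holds at least EXPECTED_FIELDS - 1 cells (assumption: no row
    is missing more than one cell), so 3-digit values in the middle of a
    row are not mistaken for line numbers.
    """
    rows = []
    for field in fields:
        starts_row = LINE_NO.match(field) and (
            not rows or len(rows[-1]) >= EXPECTED_FIELDS - 1
        )
        if starts_row:
            rows.append([field])
        elif rows:
            rows[-1].append(field)
    return rows


def short_rows(rows):
    """Return (row_index, field_count) for every row with too few fields."""
    return [(i, len(r)) for i, r in enumerate(rows) if len(r) < EXPECTED_FIELDS]
```

Calling short_rows(split_rows(cells)) lists the rows that need a 
cell inserted, which still has to be located by some other means.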

I don't know if there is some option or switch that would 
include something for blank cells.
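
One possible workaround, offered as an untested assumption rather 
than a known switch for blank cells: pdftohtml has an -xml output 
mode in which each piece of text is written as a <text> element 
with top and left position attributes. If the left edges of the 19 
columns are known, each piece of text can be put in its column by 
position, and blank cells simply stay empty. The sketch below 
assumes that layout, assumes cells in one table row share the same 
top value, and uses a hypothetical column_lefts list that would 
have to be measured from the real file.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right


def rows_from_xml(xml_text, column_lefts):
    """Rebuild table rows from pdftohtml -xml style output.

    column_lefts is a sorted list of the left x-coordinate of each
    column (hypothetical values, to be measured from the actual PDF).
    Cells with no text at all are left as empty strings.
    """
    root = ET.fromstring(xml_text)
    rows = {}
    for page_index, page in enumerate(root.iter("page")):
        for el in page.iter("text"):
            top = int(el.get("top"))
            left = int(el.get("left"))
            # Column = last boundary at or to the left of this text.
            col = max(bisect_right(column_lefts, left) - 1, 0)
            key = (page_index, top)  # assumption: same top means same row
            row = rows.setdefault(key, [""] * len(column_lefts))
            text = "".join(el.itertext()).strip()
            row[col] = (row[col] + " " + text).strip()
    return [rows[key] for key in sorted(rows)]
```

With 19 entries in column_lefts, every returned row has 19 cells 
and the blank ones show up as empty strings in the right position.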

Thanks.

+------------------------------------------------------------+
 Michael D. Setzer II - Computer Science Instructor (Retired)
 mailto:mikes at guam.net                            
 mailto:msetzerii at gmail.com
 Guam - Where America's Day Begins                        
 G4L Disk Imaging Project maintainer 
 http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+




