[poppler] Minor issue with PDF files created from Excel that have empty cells?
Michael D. Setzer II
mikes at guam.net
Sat Oct 9 10:47:35 UTC 2021
I have a couple of spreadsheets that my college puts on its
web site as PDF files, created from Excel spreadsheets.
Using poppler's pdftohtml or pdftotext I'm able to get
the data from the files. The only issue is that a few records
have blank cells, and this throws the columns off.
The original spreadsheet has columns A thru S, but on rows
with a blank cell the data gets shifted. I'm able to have my
program correct the issue, since a later column has only
4 different values: the program checks that column, and if a
row has a value other than those, it shifts the values over.
I don't know whether the issue is in how Excel creates the
PDF file, or whether nothing is output because the cell is
empty.
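
A minimal sketch of that kind of fix-up, in Python. This is not
the actual program: the function name, the column positions, the
marker values and the sample rows are all made up for
illustration, and it assumes the blank cell always falls in one
known column to the left of the marker column.

```python
def realign_row(cells, marker_col, marker_values, blank_col, width):
    """Return a copy of `cells` shifted so the marker value sits in marker_col.

    All parameters are hypothetical:
      marker_col    -- index of the column that only holds a few known values
      marker_values -- the set of values allowed in that column
      blank_col     -- index of the column assumed to be the one left blank
      width         -- expected number of columns (19 for A thru S)
    """
    row = list(cells)

    # Row is already aligned: the marker column holds a known value.
    if len(row) > marker_col and row[marker_col] in marker_values:
        return row + [""] * (width - len(row))

    # Otherwise look to the left for the marker value that got shifted.
    for pos in range(min(marker_col, len(row)) - 1, -1, -1):
        if row[pos] in marker_values:
            missing = marker_col - pos          # number of dropped cells
            insert_at = min(blank_col, pos)     # never insert past the marker
            row[insert_at:insert_at] = [""] * missing
            break

    # Pad short rows out to the full width; long rows are left alone.
    return row + [""] * (width - len(row))


if __name__ == "__main__":
    # Invented sample data: six columns, marker in column index 4.
    markers = {"FT", "PT", "AD", "TA"}
    shifted = ["2", "Jones", "102", "PT", "y"]
    print(realign_row(shifted, 4, markers, blank_col=2, width=6))
```

If more than one column can be blank, a single marker column is
not enough to tell which cell was dropped, so a real program
would need extra checks per column.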
I was using pdftohtml, since it tended to put cells out as
separate lines, but recently it has been randomly combining
some cells on one line.
Like I said, I have a program that automatically cleans it
all up, so it's not a real issue, but I thought I'd ask. I
tried at least one site for a paid program that has a demo
process, and it does export the data and catches the empty
cells somehow.
Thanks for all the work. Otherwise it is great.
Have a nice day.