[poppler] Minor issue with PDF files created from Excel that have empty cells?

Michael D. Setzer II mikes at guam.net
Sat Oct 9 10:47:35 UTC 2021


I have a couple of spreadsheets that my college puts on its web
site as PDF files, but they are created from Excel spreadsheets.
Using poppler's pdftohtml or pdftotext, I'm able to get
the data from the file. The only issue is that a few records
have blank cells, and this throws the columns off.
The original spreadsheet has columns A through S, but on rows
with a blank cell the data gets shifted. I'm able to have my
program correct the issue: a later column has only 4 possible
values, so the program checks that column, and if a row has a
value other than those, it shifts the values over. I don't know
whether the issue is how Excel creates the PDF file, or whether
nothing is output simply because the cell is empty.
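A minimal sketch of the realignment described above, in Python. The check column's position, its four allowed values, and the column where blanks are assumed to occur (`CHECK_COL`, `VALID_CHECK_VALUES`, `BLANK_COL`) are hypothetical placeholders, not taken from the actual spreadsheet; each extracted row is assumed to arrive as a list of cell strings.

```python
NUM_COLS = 19                               # columns A through S
CHECK_COL = 12                              # hypothetical: index of the column with only 4 possible values
VALID_CHECK_VALUES = {"A", "B", "C", "D"}   # hypothetical placeholder values
BLANK_COL = 5                               # hypothetical: column where a blank cell is assumed to occur


def realign_row(cells: list[str]) -> list[str]:
    """Insert empty cells at BLANK_COL until the check column holds a known value."""
    row = list(cells)  # work on a copy so the caller's list is untouched
    while len(row) < NUM_COLS and (
        len(row) <= CHECK_COL or row[CHECK_COL] not in VALID_CHECK_VALUES
    ):
        # A dropped blank cell shifted later values left; push them back right.
        row.insert(BLANK_COL, "")
    # Pad any trailing missing cells so every row has NUM_COLS entries.
    row.extend([""] * (NUM_COLS - len(row)))
    return row
```

Because only the check column is validated, this cannot tell which earlier cell was blank; it assumes the blanks fall in one known column, which may not hold for every file.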

I was using pdftohtml, since it tended to output cells as
separate lines, but recently it has been randomly combining
some cells onto one line.

Like I said, I have a program that automatically cleans it
all up, so it's not an issue, but I thought I'd ask. I used at
least one site for a paid program that has a demo process; it
does export the data, and it somehow catches the empty cells.
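One plausible way a tool could catch empty cells (an assumption, not a description of the demo program mentioned above) is to use the position of each piece of text rather than its order. `pdftohtml -xml` writes each text fragment as a `<text>` element with `top` and `left` attributes, so fragments can be grouped into rows by `top` and assigned to columns by `left`; any column that receives no fragment stays empty. The column edges and row tolerance below are hypothetical values that would have to be read off the real file.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical left edges (same units as pdftohtml's "left" attribute) of each column.
COLUMN_LEFT_EDGES = [30, 90, 150, 210]
ROW_TOLERANCE = 3  # fragments whose "top" differs by at most this much share a row


def rows_from_pdftohtml_xml(xml_text: str, col_edges: list[float]) -> list[list[str]]:
    """Rebuild table rows from pdftohtml -xml output, keeping empty cells as ""."""
    root = ET.fromstring(xml_text)
    rows: list[list[str]] = []
    for page in root.iter("page"):  # "top" restarts on every page
        boxes = sorted(
            (float(t.get("top")), float(t.get("left")), "".join(t.itertext()).strip())
            for t in page.iter("text")
        )
        current_top = None
        for top, left, content in boxes:
            if current_top is None or top - current_top > ROW_TOLERANCE:
                rows.append([""] * len(col_edges))  # new row: every cell starts empty
                current_top = top
            # The column is the last edge at or left of this fragment's left position.
            col = max(bisect.bisect_right(col_edges, left) - 1, 0)
            cell = rows[-1][col]
            rows[-1][col] = f"{cell} {content}".strip()
    return rows
```

Text that pdftohtml has already merged into a single fragment will still land in one column, so this alone does not fix the combined-cells problem.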

Thanks for all the work. Otherwise it is great.
Have a nice day.


