<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<title></title>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta http-equiv="Content-Style-Type" content="text/css"/>
</head>
<body>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">Have set up a system that uses the latest poppler pdftohtml on Linux, and it
works fine. Wanted to make the system available to others who use
Windows. Was able to find poppler-0.68.0 for Windows, and for my
needs, it does work fine. The output has some differences from the Linux
version, but not for the data that I extract.</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">Found a number of pages that state this is an outdated version, but no
real explanation. Why is a Windows version no longer being updated?</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">Did find a page that seemed to have a much newer Windows version,
but when I tried to run it, it asked for a number of
required DLL files, and I was unable to locate them.</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">My other question: pdftohtml works great to extract the data in a raw
format that basically goes line by line with all the data from the 5 pages
in the PDF file. When I try pdftotext, it puts out the data column
by column, which is impossible to process.</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">Use the pdftohtml to extract raw info, then use a cpp program to convert
it to a csv file. </span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">So, does what I need, but wondering why the differences in output.</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">PDF file has staffing pattern data from a spreadsheet converted to pdf.</span></font></div>
<div align="left"><font face="Times New Roman" size="4"><span style=" font-size:14pt">No access to spreadsheet, and trying to copy it directly doesn't work.</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">#!/bin/bash</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">if [[ $# -eq 0 ]] ; then</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"> echo 'Need to provide name of staffing pattern pdf saved from firefox, with extension';ls -1 *.pdf;exit 1</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">fi</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">f=${1%.*}</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">#Convert data from pdf file. Trim lines before data start. Eliminate useless lines; fix &amp;nbsp; issue</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">#Change non-break space to regular space, and change 3 byte - to regular - ; space before ;</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">time pdftohtml -nomerge -noframes -q "$f".pdf</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"># Fix 3 typos - Thought they were fixed?? Moved it into program to fix both linux and windows
process.</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">#sed -i
's/Accomodative/Accommodative/g;s/Telecomunications/Telecommunications/g;s/Administative/Administrative/g' "$f".html</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">time ./fixf2b4 "$f".html</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">#libreoffice --infilter=CSV:59,34,76,1 "$f".csv</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">real 0m0.064s</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">user 0m0.059s</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">sys 0m0.004s</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">real 0m0.005s</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">user 0m0.003s</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">sys 0m0.002s</span></font></div>
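<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">Rough sketch of the character cleanup mentioned in the script comments (non-break space to regular space, 3-byte dash to regular dash), in plain bash. This is only an illustration and not the actual fixf2b4 program. The function name normalize_line is made up, and it assumes the 3-byte dash is the UTF-8 en dash (U+2013); a different dash character would need a different byte sequence.</span></font></div>
<pre>
```shell
#!/bin/bash
# Hypothetical helper for illustration only (not the fixf2b4 program).
# Prints its first argument with two UTF-8 characters normalized.
normalize_line() {
    local line=$1
    local nbsp=$'\xc2\xa0'        # UTF-8 bytes of the non-break space (U+00A0)
    local endash=$'\xe2\x80\x93'  # ASSUMPTION: the 3-byte dash is the en dash (U+2013)

    line=${line//"$nbsp"/ }       # non-break space becomes a regular space
    line=${line//"$endash"/-}     # 3-byte dash becomes a regular hyphen
    printf '%s\n' "$line"
}
```
</pre>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">Calling normalize_line with one line of extracted text prints the cleaned line on standard output, so it could be run inside a loop over the lines of the html file.</span></font></div>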
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt"><br />
</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">Have told them of the typos, but they haven't fixed them yet.</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">Love that the pdftohtml works great to extract the raw data.</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">Sure it must take a good deal of coding.</span></font></div>
<div align="left"><font face="Times New Roman" size="2"><span style=" font-size:10pt">Thanks and be Safe.</span></font></div>
</body>
</html>