[Mesa-dev] [PATCH 02/16] docs: Add python script that converts html to rst.
eric.engestrom at intel.com
Fri May 25 11:26:05 UTC 2018
On Thursday, 2018-05-24 17:27:05 -0700, Laura Ekstrand wrote:
> Use Beautiful Soup to fix bad html, then use pandoc for converting to
> rst.
> docs/rstConverter.py | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
> create mode 100755 docs/rstConverter.py
> diff --git a/docs/rstConverter.py b/docs/rstConverter.py
> new file mode 100755
> index 0000000000..5321fdde8b
> --- /dev/null
> +++ b/docs/rstConverter.py
> @@ -0,0 +1,23 @@
> +import glob
> +import subprocess
> +from bs4 import BeautifulSoup
> +pages = glob.glob("*.html")
> +pages += glob.glob("relnotes/*.html")
> +for filename in pages:
> +    # Fix some annoyingly bad html.
> +    with open(filename) as f:
> +        soup = BeautifulSoup(f, 'html5lib')
> +    soup.find("div", "header").extract() # Get rid of old header
> +    soup.iframe.extract() # Get rid of old contents bar.
> +    soup.find("div", "content").unwrap() # Strip the content div.
Good call on using beautifulsoup to clean the html before converting it!
> +    # Write out the better html.
> +    with open(filename, 'wt') as f:
> +        f.write(str(soup))
> +    # Convert to rst with pandoc.
> +    name = filename.split(".html")[0]
> +    bashCmd = "pandoc " + filename + " -o " + name + ".rst"
> +    subprocess.run(bashCmd.split())
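Side note on the last quoted lines above: a minimal sketch of building the pandoc invocation as an argument list instead of splitting a string, so that file names containing spaces would survive. The helper name `pandoc_command` is hypothetical and not part of the patch; `-o` is the only pandoc option assumed here.

```python
import os


def pandoc_command(filename):
    """Return the argv list for converting one HTML page to reST."""
    # splitext drops only the final extension, so
    # "relnotes/18.1.0.html" becomes "relnotes/18.1.0".
    base, _ext = os.path.splitext(filename)
    return ["pandoc", filename, "-o", base + ".rst"]
```

The returned list could be handed straight to subprocess.run(), optionally with check=True so that a failing pandoc run raises instead of passing silently.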
Idea: remove the old html at the same time as we introduce the rst
(commit-wise), so that git picks it up as a rename with changes, which
hopefully would be easier to check as a 1:1 of any given conversion?
(In case this is as unclear as I think it is: I'm thinking about how we
can review individual page conversions, say index.html -> index.rst, to
see that no release has been dropped in the process. If git shows this
as a rename with changes, I expect it will be easier to check than if
one commit creates all the rst files and another deletes all the html.)
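To make the idea a bit more concrete, here is a rough sketch of how one page's conversion could be staged so that the removal and the addition land in the same commit. The helper name `staging_commands` is made up for illustration, and the 30% similarity threshold is a guess: git's default rename threshold is 50%, which a converted page may or may not reach.

```python
import os


def staging_commands(html_path):
    """Return git argv lists that stage one html -> rst conversion."""
    base, _ext = os.path.splitext(html_path)
    rst_path = base + ".rst"
    return [
        # Stage the deletion of the old page and the new page together,
        # so a single commit contains both sides of the "rename".
        ["git", "rm", "--quiet", html_path],
        ["git", "add", rst_path],
    ]


# After committing, a reviewer could ask git to pair the files up
# (hypothetical threshold, lower than git's 50% default):
REVIEW_COMMAND = ["git", "show", "--stat", "--find-renames=30%"]
```

Each argv list could be run with subprocess.run(cmd, check=True) once pandoc has written the .rst file, since `git rm` also deletes the html from the working tree.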