[Mesa-dev] [PATCH 02/16] docs: Add python script that converts html to rst.
laura at jlekstrand.net
Tue May 29 23:11:31 UTC 2018
Never mind, Jason used a git revert trick to get it to work. The v2 will
include the diff and preserve the git file history.
On Fri, May 25, 2018 at 7:58 PM, Laura Ekstrand <laura at jlekstrand.net> wrote:
> I specifically tried forcing a rename earlier, but it didn't work. Git
> sees too much change. The only way I could get it to work was manually
> renaming the HTML files to rst first, then committing, then converting to
> rst.
> The problem with that strategy is that the Pandoc command for
> converting to rst then doesn't make sense. (.rst to .rst? What?)
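If the rename-first route were retried, the .rst-to-.rst command need not rely on the file extension: pandoc guesses input and output formats from file extensions only when they are not given explicitly, so `--from=html --to=rst` makes the intent unambiguous. A minimal sketch, assuming the renamed file still contains HTML; the helper names `pandoc_inplace_cmd` and `convert_inplace` and the `.tmp` sibling file are hypothetical, not part of the patch:

```python
import subprocess
from pathlib import Path


def pandoc_inplace_cmd(path):
    """Build a pandoc argv for an HTML file that has already been renamed to .rst.

    Hypothetical helper: --from/--to override pandoc's extension-based
    guessing, and output goes to a sibling .tmp file so pandoc never
    writes over the file it is reading.
    """
    src = Path(path)
    tmp = src.with_name(src.name + ".tmp")
    cmd = ["pandoc", "--from=html", "--to=rst", str(src), "-o", str(tmp)]
    return cmd, tmp


def convert_inplace(path):
    """Hypothetical helper: run pandoc, then overwrite the original with the result."""
    cmd, tmp = pandoc_inplace_cmd(path)
    subprocess.run(cmd, check=True)  # raise if pandoc fails
    tmp.replace(path)  # replace the HTML-in-.rst file with real rst
```

For example, `pandoc_inplace_cmd("docs/index.rst")` yields a command that reads `docs/index.rst` as HTML and writes rst to `docs/index.rst.tmp`.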
> On Fri, May 25, 2018, 4:26 AM Eric Engestrom <eric.engestrom at intel.com> wrote:
>> On Thursday, 2018-05-24 17:27:05 -0700, Laura Ekstrand wrote:
>> > Use Beautiful Soup to fix bad html, then use pandoc for converting to
>> > rst.
>> > ---
>> > docs/rstConverter.py | 23 +++++++++++++++++++++++
>> > 1 file changed, 23 insertions(+)
>> > create mode 100755 docs/rstConverter.py
>> > diff --git a/docs/rstConverter.py b/docs/rstConverter.py
>> > new file mode 100755
>> > index 0000000000..5321fdde8b
>> > --- /dev/null
>> > +++ b/docs/rstConverter.py
>> > @@ -0,0 +1,23 @@
>> > +#!/usr/bin/python3
>> > +import glob
>> > +import subprocess
>> > +from bs4 import BeautifulSoup
>> > +
>> > +pages = glob.glob("*.html")
>> > +pages += glob.glob("relnotes/*.html")
>> > +for filename in pages:
>> > +    # Fix some annoyingly bad html.
>> > +    with open(filename) as f:
>> > +        soup = BeautifulSoup(f, 'html5lib')
>> > +    soup.find("div", "header").extract() # Get rid of old header
>> > +    soup.iframe.extract() # Get rid of old contents bar.
>> > +    soup.find("div", "content").unwrap() # Strip the content div.
>> Good call on using beautifulsoup to clean the html before converting it!
>> > +
>> > +    # Write out the better html.
>> > +    with open(filename, 'wt') as f:
>> > +        f.write(str(soup))
>> > +
>> > +    # Convert to rst with pandoc.
>> > +    name = filename[:-len(".html")]
>> > +    cmd = ["pandoc", filename, "-o", name + ".rst"]
>> > +    subprocess.run(cmd, check=True)
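For the conversion step, note that `str.split` returns a list, so an output name built by concatenating `.rst` onto a split result raises `TypeError`. Deriving the name with `pathlib` and passing pandoc's arguments as a list (rather than splitting a command string on spaces) avoids both problems. A minimal sketch; `rst_target`, `pandoc_cmd` and `convert` are hypothetical names, and `check=True` assumes a failed page should stop the run:

```python
import subprocess
from pathlib import Path


def rst_target(html_path):
    """Hypothetical helper: the .rst path that sits next to an .html source."""
    # with_suffix swaps only the final extension, so
    # relnotes/18.1.0.html becomes relnotes/18.1.0.rst.
    return Path(html_path).with_suffix(".rst")


def pandoc_cmd(html_path):
    """Hypothetical helper: pandoc argv as a list, no shell word-splitting."""
    return ["pandoc", str(html_path), "-o", str(rst_target(html_path))]


def convert(html_path):
    # check=True raises CalledProcessError on a non-zero pandoc exit
    # instead of silently moving on to the next page.
    subprocess.run(pandoc_cmd(html_path), check=True)
```

Keeping the dots inside release-note names intact is the reason for `with_suffix` over splitting on `"."`.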
>> Idea: remove the old html in the same commit that introduces the rst,
>> so that git picks it up as a rename with changes, which would hopefully
>> make each conversion easier to check 1:1?
>> (In case this is as unclear as I think it is, I'm thinking about how we
>> can review individual page conversions, say index.html -> index.rst, to
>> see that no release has been dropped in the process. If git shows this
>> as a rename with changes, I expect it will be easier to check than if
>> one commit creates all the rst files and another deletes all the html.)
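On why git "sees too much change": git records no rename metadata; `git diff -M` pairs a deleted file with an added one only when their similarity index (documented as the percentage of unchanged lines) reaches a threshold, 50% by default. Since pandoc rewrites markup on most lines, an html -> rst pair may often fall short of it. The sketch below is a rough stand-in using `difflib`, not git's actual scoring, for estimating whether a given page would be detected; `line_similarity` and `likely_rename` are hypothetical names:

```python
import difflib


def line_similarity(old_text, new_text):
    """Approximate percentage of lines shared by two versions of a file.

    Uses difflib's matching blocks, not git's own algorithm, so treat the
    result only as a hint about rename detection.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    total = max(len(old_lines), len(new_lines)) or 1  # avoid dividing by zero
    return 100 * shared // total


def likely_rename(old_text, new_text, threshold=50):
    """Guess whether a delete/add pair would clear a rename threshold (in %)."""
    return line_similarity(old_text, new_text) >= threshold
```

For page-by-page review, `git diff -M30%` (or `--find-renames=30%`) lowers the threshold for a single diff without changing what is committed.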