This tool is used for (almost) automatic migration of Confluence documentation into Discourse.
It handles:
- HTML to Markdown formatting conversion (many thanks to the confluence-to-markdown tool)
- Functioning links between topics
- Image and attachment uploads
The example data is documentation exported from https://confluence.origam.com (the site is no longer online).
- Install Python: https://www.python.org/
- Install pandoc: https://pandoc.org/installing.html
- Export your Confluence documentation in HTML format (only needed if you want to import Confluence data other than our example)
- FYI, https://www.npmjs.com/package/confluence-to-markdown is already included in this repository, so there is no need to install it
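Before starting, it can help to confirm that the prerequisites are reachable from the command line. The following is a minimal sketch and not part of this repository: `find_missing_tools` is a hypothetical helper name, and it assumes the executables are called `python` and `pandoc` on your PATH (on some systems Python is installed under a different name).

```python
import shutil


def find_missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to your installation.
    missing = find_missing_tools(["python", "pandoc"])
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```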
The working directory for the whole migration is node_modules/confluence-to-markdown; everything happens there.
(To be cleaner, my solution should have been created in a separate directory and not mixed into the original confluence-to-markdown directory... I know...)
Migration steps, with a description of each directory:
1. See `source_all` - exported .html files (with images and attachments) from Confluence. This is our input data. Copy the files from your own Confluence export here.
2. See `python` - these are the Python scripts that are executed by the .bat files in the following steps. No need to change them now.
3. Run `0_prepareHtmls.bat` - copies the .html files from `source_all` to `source` and does some preprocessing on them.
4. See `source` - the preprocessed .html files are now here.
5. Run `1_createMDs.bat` - transforms the .html files from `source` into .md files in `import`.
6. See `import` - the .md files.
7. Preparation for the automated upload of images and attachments (this is the most complicated part):
   1. For our example export I have prepared the files in the `upload` directory, which were copied from the Confluence export.
   2. For the following steps, the Excel workbook `excel/Discourse files (PROD).xlsx` is used (or `...(LOCAL).xlsx` if you're working in a testing/local environment).
   3. Copy all file names from the `upload` directory, sorted by file size descending (this matters for handling duplicates, see below), into the workbook: sheet `all files`, column `B` (`FileName`).
   4. Now manually upload all files from the `upload` directory into Discourse. To do this, create any topic in the Discourse forum, then drag the files into the topic text in batches of 20. Ask the Discourse admin to allow 20 files to be uploaded concurrently (that is the maximum value).
   5. During the upload, Discourse may show errors about duplicate files (it seems that you can't upload the same file to Discourse several times under different names). Ignore these errors for now.
   6. Copy the generated text from the Discourse topic into the workbook: sheet `translation`, column `C` (`Discourse text`).
   7. Now for the duplicate files. I handled them in the workbook in sheet `all files`, with the formula in column `F` (`FinalFileNameBase`): when `FileNameBase` cannot be looked up, the file name from the previous row is used. (I assume it is a file with the same content, because in step 3 we entered the files by descending size. Still, check each lookup error yourself, so that correct replacements are generated in column `F`.)
   8. If everything is filled in correctly, the workbook now produces the needed Python code in sheet `all files`, column `J` (`Final (md)`): `_newFileContent = re.sub...` etc. Copy the whole column into the file `python/prepareMDs.py`, replacing the existing lines (in the section `# images and attachments`, LOCAL or PROD environment).
   9. The result of all these steps is the images and files uploaded to the Discourse forum and a prepared Python script, `python/prepareMDs.py`.
   10. Maybe I forgot some steps here; if needed, contact me directly and we will work through your case, and then I'll update this readme.
8. Run `2_prepareMDs_PROD.bat` (or `2_prepareMDs_LOCAL.bat`) - preprocesses the .md files in the `import` directory.
9. See `import` - the preprocessed .md files are now here.
10. Run `3_runImport_PROD.bat` (or `3_runImport_LOCAL.bat`) - finally imports the topics from the prepared .md files. Before executing it, please read through the script `python/runImport.py` to understand it and set the needed variables (see the sections marked `# set >>>>>>...`). I recommend commenting out the call to `createAllTopics` during the first run, to test the connection etc.
11. For the import to run successfully, some Discourse settings will need to be changed (e.g. minimum topic title length, max. post length, etc. - just see what errors arise during the migration), together with the API throughput settings (see e.g. https://meta.discourse.org/t/available-settings-for-global-rate-limits-and-throttling/78612).
12. After the import script finishes, all processed .md files are moved into the `import_DONE` folder.
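Listing the files in the `upload` directory by descending size, and checking which of them really are duplicates, can be assisted with a short script. This is a minimal sketch and not part of this repository: the function names are hypothetical, the `upload` path is assumed to be relative to the working directory, and duplicates are assumed to be byte-identical files, which the sketch detects by SHA-256 hash instead of the Excel lookup described above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def list_files_by_size(directory):
    """Return file names in `directory`, largest first (ties sorted by name)."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    files.sort(key=lambda p: (-p.stat().st_size, p.name))
    return [p.name for p in files]


def find_duplicates(directory):
    """Group files with byte-identical content; return only groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    upload_dir = Path("upload")  # assumed location, relative to the working directory
    if upload_dir.is_dir():
        # One name per line, ready to paste into a spreadsheet column.
        print("\n".join(list_files_by_size(upload_dir)))
        for names in find_duplicates(upload_dir):
            print("Identical content:", ", ".join(names))
```

The printed list can be pasted into the `FileName` column, and the "Identical content" groups give a second opinion on the rows where the lookup in the workbook fails.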
FYI, the directories bin, src and test are not my creation; they belong to the downloaded confluence-to-markdown tool. I only slightly changed these files for the purposes of the migration (topic name generation etc.): src/Formatter.coffee, src/Page.coffee, src/Utils.coffee
To run the migration again, repeat all steps (or start from step 4 if you're importing the same .html files again).
Good luck.
There are a few things this tool doesn't handle:
- Some special characters (like `.`, `=`) are not included in the generated Discourse topic titles
- Links to chapters inside topics (anchors) do not work (can be solved by adding `<div data-theme-toc="true"></div>` to the topic text)
- No hierarchy tree of Discourse topics is generated, just a flat structure of topics
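The anchor workaround could be applied to all prepared .md files in one go before the import. The following is only a sketch under several assumptions: the function names are hypothetical, the files are assumed to be UTF-8 encoded, and the marker is placed at the very top of each file, which may need adjusting depending on how `python/runImport.py` derives the topic title. The marker is also assumed to be picked up by a table-of-contents theme component on the Discourse side.

```python
from pathlib import Path

TOC_MARKER = '<div data-theme-toc="true"></div>'


def add_toc_marker(markdown_text):
    """Prepend the TOC marker unless the text already contains it."""
    if TOC_MARKER in markdown_text:
        return markdown_text
    return TOC_MARKER + "\n\n" + markdown_text


def add_toc_marker_to_directory(directory):
    """Apply add_toc_marker to every .md file in `directory`.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(directory).glob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = add_toc_marker(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling `add_toc_marker_to_directory("import")` would then touch each file at most once, since files that already contain the marker are skipped.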
Martin Zákostelský martin.zakostelsky@seznam.cz