Commit 38cda75

Attempt 4 to handle redirecting Confluence URLs.
1 parent 3dde0c0 commit 38cda75

11 files changed

+19322
-3
lines changed

.github/workflows/publish.yml

Lines changed: 6 additions & 0 deletions
@@ -44,6 +44,12 @@ jobs:
 cat >Makefile.inc <<EOF
 ${{vars.MAKEFILE_INC}}
 EOF
+echo "::group::Building documentation"
 make
+echo "::endgroup::"
+echo "::group::Generating the redirect map"
 make temp/site/redirect_map.conf
+echo "::endgroup::"
+echo "::group::Deploying"
 make deploy
+echo "::endgroup::"

Makefile

Lines changed: 8 additions & 3 deletions
@@ -180,9 +180,14 @@ else
 	@mkdocs build -f $(BUILD_DIR)/mkdocs.yml -d $(SITE_DIR)
 endif
 
-temp/site/redirect_map.conf: temp/site/sitemap.xml
-	@echo "Creating $@ from $^"
-	@./utils/create_redirect_map.py $^ $@
+temp/site/redirect_map.conf: utils/confluence_redirects/redirect_map_static.conf utils/confluence_redirects/redirect_map_extra.conf
+	@echo "Creating $@"
+	@utils/confluence_redirects/check_map.sh $^
+	@echo "# This file is auto-generated from the files in utils/confluence_redirects" > $@
+	@echo "# Do not edit directly" >> $@
+	@echo "# Generated: $$(date -u -Is)" >> $@
+	@echo "#" >> $@
+	@cat $^ >> $@
 
 deploy: no-branch-check
 	@if [ -z "$(DEPLOY_REMOTE)" ] ; then \
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Confluence URL Redirects

## Confluence URLs

The Confluence Wiki instance previously used to host the Asterisk documentation used a URL scheme that did not reflect the tree structure of the site.

All pages started with the `/wiki/display/AST/` prefix. If a page title was unique across the site, only the page title was appended to the prefix. For instance, the URL for the "License Information" page in the "About the Project" section was simply:

```
/wiki/display/AST/License+Information
```

The API documentation has duplicate page titles by design. For instance, there's a "Hangup" page in AGI Commands, AMI Actions, AMI Events and Dialplan Applications, and there's a set of them for each Asterisk version we generate documentation for. The old documentation generation process handled this by prefixing the page titles with the Asterisk version and API type:

```
/wiki/display/AST/Asterisk+20+AGICommand_hangup
/wiki/display/AST/Asterisk+20+ManagerAction_Hangup
/wiki/display/AST/Asterisk+20+ManagerEvent_Hangup
/wiki/display/AST/Asterisk+21+AGICommand_hangup
/wiki/display/AST/Asterisk+21+ManagerAction_Hangup
/wiki/display/AST/Asterisk+21+ManagerEvent_Hangup
...
```
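
As a rough illustration of the naming pattern above, the sketch below rebuilds one of these URLs from its parts. The function name `api_page_url` and its parameters are hypothetical, and the title format is inferred only from the examples listed, not from the old generation code.

```python
def api_page_url(version: int, api_type: str, name: str) -> str:
    """Rebuild the old Confluence URL for a generated API page (illustrative only)."""
    # Title format inferred from the examples above: "Asterisk <version> <type>_<name>"
    title = f"Asterisk {version} {api_type}_{name}"
    # Display URLs encoded the spaces in a title as '+'
    return "/wiki/display/AST/" + title.replace(" ", "+")


print(api_page_url(20, "AGICommand", "hangup"))
print(api_page_url(21, "ManagerEvent", "Hangup"))
```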

Finally, if the page title contained anything other than alphanumeric characters or spaces, the URL used the page ID instead:

```
/wiki/pages/viewpage.action?pageId=5243109
```
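
Taken together, the rules in this section can be sketched as a single function. This is a minimal approximation, not Confluence's actual logic: `confluence_url` is a hypothetical name, the title "What's New?" is made up for the example, and allowing the underscore is an assumption based on the API page titles, which contain one and still used display URLs.

```python
import re

DISPLAY_PREFIX = "/wiki/display/AST/"


def confluence_url(title: str, page_id: int) -> str:
    """Approximate which URL form Confluence used for a page (illustrative only)."""
    # Assumption: letters, digits, spaces and underscores were safe in display URLs.
    if re.fullmatch(r"[A-Za-z0-9_ ]+", title):
        return DISPLAY_PREFIX + title.replace(" ", "+")
    # Any other character forced the page-ID form.
    return f"/wiki/pages/viewpage.action?pageId={page_id}"


print(confluence_url("License Information", 1))
print(confluence_url("What's New?", 5243109))  # hypothetical title, ID from the example above
```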

## Attempts to redirect them to the new docs site

### Attempt 1

The first attempt used some basic Nginx redirect rules on the oss-downloads server, which handles the wiki.asterisk.org domain that used to house Confluence. The rules stripped off the `/wiki/display/AST/` prefix and sent back a 301 redirect to the new site, docs.asterisk.org. This almost always resulted in a 404 Not Found, however. To get around this, we added redirect capability to the mkdocs.yml file, but the entries had to be maintained manually.
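
To see why the prefix-stripping approach usually ended in a 404, here is a small sketch of the rewrite it performed. The real rules were Nginx configuration that is not shown in this document, so `naive_redirect` and its fallback to the site root are assumptions made for illustration.

```python
OLD_PREFIX = "/wiki/display/AST/"
NEW_SITE = "https://docs.asterisk.org/"


def naive_redirect(old_path: str) -> str:
    """Strip the Confluence prefix and point at the new site (illustrative only)."""
    if not old_path.startswith(OLD_PREFIX):
        # Assumption: anything without the prefix just goes to the site root.
        return NEW_SITE
    return NEW_SITE + old_path[len(OLD_PREFIX):]


print(naive_redirect("/wiki/display/AST/License+Information"))
```

The result is a flat path built from the old page title. Because the old URL scheme did not reflect the site's tree structure, a path like that rarely matches where the page actually lives on the new site.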

### Attempt 2

The next attempt used the sitemap.xml file and a custom 404 page handler. When a page wasn't found, the gh-pages server returned the custom 404 page, which contained JavaScript that pulled down the sitemap.xml file, searched it for a page somewhere in the hierarchy that matched the basename of the requested page, then told the browser to navigate to that page. This worked "OK" for pages that had exactly the same name as the old page but not very well for the API documentation. It also required the client to fetch the 1.2MB sitemap.xml file for every 404, and because all the work was done on the client, it resulted in delays and page flashes. It also confused crawlers because instead of getting a 301 they could follow, they got a 404.
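
The basename search is easy to sketch. The original handler was JavaScript running in the browser; the Python below only mirrors the matching idea. `find_by_basename` is a hypothetical name and the sample sitemap entries are invented for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard sitemap namespace
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Invented sample data, not the real sitemap
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.asterisk.org/About-the-Project/</loc></url>
  <url><loc>https://docs.asterisk.org/About-the-Project/License-Information/</loc></url>
</urlset>"""


def last_segment(path: str) -> str:
    """Return the final path component, ignoring a trailing slash."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def find_by_basename(sitemap_xml: str, requested_path: str):
    """Return the first sitemap URL whose basename matches the request, else None."""
    wanted = last_segment(requested_path)
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if last_segment(urlparse(url).path) == wanted:
            return url
    return None


print(find_by_basename(SAMPLE_SITEMAP, "/License-Information"))
```

Only an exact basename match succeeds, which is consistent with the approach working for same-named pages but not for the version-prefixed API pages.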

### Attempt 3

Since we have the sitemap.xml file at build time, the next attempt took the file and, for each entry, tried to guess what the Confluence URL would have looked like. From that, an Nginx redirect map was created that redirected each Confluence URL to the new docs URL, and the map was stored in the root of the docs website. A nightly timer job on oss-downloads retrieved the file and reloaded Nginx. Now anyone attempting to visit https://wiki.asterisk.org would have the page looked up in the map and, if it was found, would be redirected to the exact page on the new docs site. If not found, they'd be redirected to the Not Found page on the new website. From a performance perspective, this worked great, but the "tried to guess" bit didn't really work for pages that were on the old site but not on the new site, like EOL Asterisk release documentation. The `create_redirect_map.py` script that was used to generate the redirect_map.conf file is still in this repo but is kept for reference only.

### Attempt 4

As luck and my mental illness would have it, I still have the Confluence extract we used to create the new site in June 2023. The extract contains a JSON document for every page, containing its page title and URL. I extracted those to the `confluence-urls.txt` file in this directory. There's also a `create_redirect_map.sh` bash script that uses a bunch of rules to find the equivalent page in the new site and, if found, create a redirect rule for it. There were 9600 entries in `confluence-urls.txt`, 9432 of which were found and written to `redirect_map_static.conf`. The rest were things that just don't exist anymore or other things that didn't warrant spending time to deal with. Now here's the good part... the `create_redirect_map.sh` script never has to be run again because Confluence is GONE and the `confluence-urls.txt` file can never change. There's also a separate `redirect_map_extra.conf` file in this directory into which additional redirect entries can be placed as needed. There are a few in there already that the script didn't catch but were easy to find manually.

## Operation

So now we have 2 redirect files in `utils/confluence_redirects`. The top-level Makefile has a rule for `temp/site/redirect_map.conf` that checks the two files for formatting and duplicate entries (which will cause Nginx to barf) and then concatenates them into that one file. The publish process makes the file available as `https://docs.asterisk.org/redirect_map.conf`.
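
For readers who find the Makefile recipe terse, the check-then-concatenate step can be sketched in Python. This is not the code the build runs (the build uses `check_map.sh` and `cat`). `build_redirect_map` is a hypothetical name, the sample entries are invented, and the regular expression is an anchored, slightly stricter variant of the one in `check_map.sh`.

```python
import re
from datetime import datetime, timezone

# Entry shape checked by check_map.sh: "<old /wiki URL> <new path>/;"
ENTRY_RE = re.compile(r"^(/wiki\S+)\s+(/.+)/;$")


def build_redirect_map(sources: list[str]) -> str:
    """Validate entries from several map files and return the combined map text."""
    seen = set()
    for text in sources:
        for line in text.splitlines():
            if not line.startswith("/wiki"):
                continue  # comments and blank lines are not checked
            match = ENTRY_RE.match(line)
            if match is None:
                raise ValueError(f"Bad entry: {line!r}")
            if match.group(1) in seen:
                raise ValueError(f"Duplicate entry: {match.group(1)}")
            seen.add(match.group(1))
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    header = (
        "# This file is auto-generated from the files in utils/confluence_redirects\n"
        "# Do not edit directly\n"
        f"# Generated: {stamp}\n"
        "#\n"
    )
    # Concatenate the sources in order, making sure each ends with a newline
    body = "".join(s if s.endswith("\n") else s + "\n" for s in sources)
    return header + body


# Invented sample entries
static_map = "/wiki/display/AST/License+Information /About-the-Project/License-Information/;\n"
extra_map = "# manual additions\n/wiki/display/AST/Home /Home/;\n"
print(build_redirect_map([static_map, extra_map]))
```

Validating before writing anything means a bad or duplicate entry stops the build instead of reaching the server.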

The `nginx-redirect-map-fetch.service` and `nginx-redirect-map-fetch.timer` files reside on oss-downloads in `/etc/systemd/system` and cause the server to download the file to `/etc/nginx/redirect_map.conf` and reload Nginx. `/etc/nginx/nginx.conf` loads the map and `/etc/nginx/conf.d/redirects.conf` has the lookup block.
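
The Nginx configuration itself is not reproduced here, so the following only simulates a map lookup in Python to show the intended behavior: an exact match on the old URL yields the new path, and anything else falls through to a default. `load_map`, `resolve`, the `/404.html` default and the sample entry are all assumptions. Which request variable the real map is keyed on is not stated in this document.

```python
def load_map(map_text: str) -> dict[str, str]:
    """Parse "<old URL> <new path>;" lines into a lookup table (illustrative only)."""
    table = {}
    for line in map_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumes the file already passed check_map.sh, so every entry has two fields
        source, target = line.rstrip(";").split(None, 1)
        table[source] = target
    return table


def resolve(table: dict[str, str], request_uri: str, not_found: str = "/404.html") -> str:
    """Return the redirect target, or the (hypothetical) Not Found path."""
    return table.get(request_uri, not_found)


# Invented sample entry
sample = "# comment\n/wiki/display/AST/License+Information /About-the-Project/License-Information/;\n"
table = load_map(sample)
print(resolve(table, "/wiki/display/AST/License+Information"))
print(resolve(table, "/wiki/display/AST/No+Such+Page"))
```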

## Maintenance

If you discover bad entries in the original `redirect_map_static.conf` file, you can correct them or comment them out, but you shouldn't remove any entries. Any new entries should go in the `redirect_map_extra.conf` file. You should also run `./check_map.sh redirect_map_static.conf redirect_map_extra.conf` to check the formatting and make sure there are no duplicates in the files. `make temp/site/redirect_map.conf` does this as well.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#!/usr/bin/bash

infile=${1:?"Usage: $(basename "$0") <input_file> [<input_file> ... ]"}

left=$(mktemp /tmp/left-XXXXX)
right=$(mktemp /tmp/right-XXXXX)
trap 'rm -f "${left}" "${right}"' EXIT  # clean up temp files on every exit path
grep --no-filename -E "^/wiki" "$@" | sort --key=1,1 > "${left}"
sort --key=1,1 -u "${left}" > "${right}"  # keep one line per source URL
diff -uprN "${left}" "${right}" || { echo "Duplicate entries found!" 1>&2 ; exit 1 ; }
grep -vE '(/wiki[^ ]+)\s+(/.+)/;$' "${right}" && { echo "Bad entries found!" 1>&2 ; exit 1 ; }

exit 0
