Duplicate content checking with SHA512 is limited: a cryptographic hash only matches when two documents are byte-for-byte identical. What do you think of swapping it out for Simhash, so that near-duplicate content could be compared by similarity?
The stopwords package already implements Simhash in Go and has a compatible license.
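
To make the idea concrete, here is a rough sketch of how a Simhash comparison could work. It is not the stopwords package's API; `simhash` and `hammingDistance` are hypothetical names, and the choices of FNV-1a as the per-word hash and lowercased, whitespace-separated words as equally weighted features are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// simhash returns a 64-bit fingerprint of text.
// Assumption: features are lowercased, whitespace-separated words,
// each with weight 1, hashed with FNV-1a.
func simhash(text string) uint64 {
	var counts [64]int
	for _, word := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New64a()
		h.Write([]byte(word))
		sum := h.Sum64()
		// Each feature votes +1 or -1 on every bit position.
		for i := 0; i < 64; i++ {
			if sum&(1<<uint(i)) != 0 {
				counts[i]++
			} else {
				counts[i]--
			}
		}
	}
	// A bit is set in the fingerprint when its votes are positive.
	var fp uint64
	for i, c := range counts {
		if c > 0 {
			fp |= 1 << uint(i)
		}
	}
	return fp
}

// hammingDistance counts the bits that differ between two fingerprints.
// A smaller distance suggests more similar content.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	a := simhash("the quick brown fox jumps over the lazy dog")
	b := simhash("the quick brown fox jumped over the lazy dog")
	c := simhash("completely unrelated text about database migrations")
	fmt.Println("near-duplicate distance:", hammingDistance(a, b))
	fmt.Println("unrelated distance:", hammingDistance(a, c))
}
```

Unlike SHA512, where a single changed byte yields an unrelated digest, two fingerprints here can be compared by Hamming distance and treated as duplicates below some threshold. The threshold would need tuning against real content.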