This repo was used to prepare the talk given by Alex Mann at Cognitect's 2016 Conj Conference. It includes a standard implementation of tSNE, examples of data rendered this way, a novel implementation of interop between Clojure and Python, a number of datasets which can be rendered into Clojure objects, and some examples of generative testing.
I want to start by citing the sources that helped me get this far. This list is by no means exhaustive; I consumed many blogs and whitepapers where the information remains but the name has fled.
- Original whitepaper by Hinton and van der Maaten
- Laurens van der Maaten's tSNE resource website
- Joseph Turian's modifications/code for tSNE
- Original whitepaper detailing architecture of SENNA by Collobert and Weston
- SENNA website
I lifted datasets from the following places:
- MNIST from Turian's github repo (link above)
- 130,000 word embeddings from Collobert's SENNA site download (link above)
- Places from hiiamrohit's countries-states-cities-database github repo
- 3000 most common words, copied and pasted from http://www.ef.com/english-resources/english-vocabulary/top-3000-words/
lein test
I got sick of starting a headless repl, so the following will start a session at port 54321.
lein nrepl
There are examples of SVG rendering in the comments of the core namespace. The gist, though, is to run data through tSNE, then pipe it into spit-svg. Pretty straightforward!
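As a rough illustration of that data flow (reduce the data to 2-D points, then write them out as SVG), here is a minimal, self-contained sketch. The names `project-2d` and `points->svg` are hypothetical stand-ins invented for this example and are not this repo's API; the real tSNE and spit-svg functions and their signatures live in the core namespace. The stand-in for tSNE does no dimensionality reduction at all; it just keeps the first two coordinates so the sketch runs on its own.

```clojure
(ns sketch.pipeline
  (:require [clojure.string :as str]))

;; Hypothetical stand-in for the tSNE step: it only keeps the first two
;; coordinates of each vector so the sketch runs without the real algorithm.
(defn project-2d [vectors]
  (mapv (fn [v] [(nth v 0) (nth v 1)]) vectors))

;; Hypothetical stand-in for the rendering step: draws each 2-D point as a
;; small circle and returns the SVG markup as a string.
(defn points->svg [points]
  (str "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"200\" height=\"200\">"
       (str/join (for [[x y] points]
                   (format "<circle cx=\"%s\" cy=\"%s\" r=\"2\"/>" x y)))
       "</svg>"))

;; Reduce the data, render it, and write the result to sketch.svg.
(->> [[10 20 0.5] [50 80 0.1] [120 40 0.9]]
     project-2d
     points->svg
     (spit "sketch.svg"))
```

Evaluating this in a REPL writes a file named sketch.svg in the working directory. The thread-last form mirrors the "run, then pipe" shape described above, with each stage taking the previous stage's output as its final argument.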