zhaovan/watcher: personalized search engine across tweets, obsidian, emails, and blog posts
Watcher: My best friend for finding local information

This is a project for CLPS1220B: Collective Cognition. I was curious about the ways we could use the internet as a transactive memory system, specifically in relation to Vannevar Bush's Memex: a "read" interface where we could instantly look through databases of information that apply to us. Specifically, this piece from Scientific American is what piqued this curiosity during the class.

Heavily inspired by Linus Lee's Monocle project and some other earlier memexes.

Applications to Collective Cognition

There are two main applications. The first is the view that individuals with effective memory recall are able to contribute more successfully to group projects: in meetings and other discussion-based activities, people who come prepared with notes and have thought about what they want to say tend to be more active contributors. Beyond the direct connection to transactive memory systems, a tool like this also creates faster read access into someone else's brain. One of the fundamental problems in collective cognition is cooperation and collaboration; this shows up even in crypto and other governance-related issues, and having quick access to other people's notes, ideas, and brain space can serve as a first step toward solving it.

Details and Implementation

This is a static web app built with create-react-app, alongside a Python library for creating an index of notes.

Data Sources

Currently this supports four main data sources: obsidian, twitter, blogs, and my email newsletter. Future implementations would look at Readwise, Pocket, and other web-based sources, but I was unable to get to these due to time constraints.

```typescript
type Doc = {
  // Identifier for blocks
  id: string;
  // A map of each token in the document to the number of times it
  // appears in the document
  tokens: Map<string, number>;
  // The document's text content
  content: string;
  // Optionally, the doc's title
  title?: string;
  // Optionally, a link to this document on the web if it exists
  href?: string;
};
```

Tokenizing

The algorithm uses the nltk package from Python for tokenizing, and filters out most common stop words and punctuation (which is helpful when we're searching later, since rarely am I searching for a specific punctuation mark). From here, this produces the doc with its token counts (as seen above).
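The step above can be sketched as follows. This is a stdlib-only illustration: the project itself uses nltk (its tokenizer and English stop-word list), but here a small inline stop-word set and a regex stand in so the sketch is self-contained.

```python
import re
from collections import Counter

# A tiny stand-in stop-word list; the project uses nltk's English list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def tokenize(text: str) -> Counter:
    """Lowercase the text, split on word characters (dropping
    punctuation), filter stop words, and count occurrences,
    producing the Doc.tokens map described above."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)
```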

Index

After all the docs have been generated, we have a list of docs for each data source. From here, we iterate through each document to create an inverted index, mapping each token to the documents that contain it. This massive JSON is then passed to the frontend for querying.

Frontend

Built on top of material-ui (one of my favorite libraries), some CSS love, and a lot of tears. We tokenize the query and search for the union of the matching docs (not the intersection, mostly because I couldn't get it to work). From here, we run the standard tf-idf algorithm on the documents to get our search results.
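The frontend itself is JavaScript, but the union-then-rank step can be sketched in Python. This assumes the classic tf-idf scoring (score = sum over query terms of tf × log(N / df)); the repo's exact weighting may differ.

```python
import math

def search(query_tokens, index, num_docs):
    """Union the posting lists for the query terms, then score each
    candidate doc by summing tf-idf over the terms it contains.
    `index` is the token -> {doc id -> count} map built earlier."""
    scores = {}
    for term in query_tokens:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Highest-scoring docs first
    return sorted(scores, key=scores.get, reverse=True)
```

Note that because this takes the union, a doc matching only one query term still appears in the results, just with a lower score than docs matching more terms.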
