Analysis
Association mining finds terms that are highly correlated with a given term. As a first pass over a large dataset, it offers a quick way to identify relevant data points. The R script sclicer.R dynamically builds an SQL query that retrieves data highly correlated with the keyword of an event, calculating the optimal lower correlation limit for finding associated terms.
The script finds the words that are most associated with a given keyword using association mining. It uses a dynamic correlation limit that varies with the size of your corpus. It then queries the database and selects the tweets that mention your keyword and its most relevant associations, dropping duplicates and retweets.
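The steps above can be sketched in Python for readers who want to see the idea end to end. This is a minimal illustration, not the R script itself: the correlation limit is passed in as a plain parameter because the script's dynamic rule is not described here, and the table name `tweets`, the column name `text`, the `'RT @%'` retweet filter and the choice to join terms with OR are all assumptions.

```python
import math
import re


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def term_associations(docs, keyword, limit):
    """Return {term: correlation} for terms whose presence correlates with
    the keyword's presence across documents at or above `limit`."""
    token_sets = [set(re.findall(r"\w+", d.lower())) for d in docs]
    vocab = set().union(*token_sets) - {keyword}
    key_vec = [1 if keyword in s else 0 for s in token_sets]
    found = {}
    for term in vocab:
        vec = [1 if term in s else 0 for s in token_sets]
        r = pearson(key_vec, vec)
        if r >= limit:
            found[term] = r
    return found


def build_query(keyword, terms):
    """Build a parameterised SQL query (hypothetical schema: tweets.text).
    DISTINCT drops duplicates; the NOT LIKE clause is an assumed retweet filter."""
    words = [keyword] + sorted(terms)
    where = " OR ".join("text LIKE ?" for _ in words)
    sql = f"SELECT DISTINCT text FROM tweets WHERE ({where}) AND text NOT LIKE 'RT @%'"
    params = [f"%{w}%" for w in words]
    return sql, params


docs = ["flood river rising", "flood river warning", "sunny day park", "flood damage"]
assoc = term_associations(docs, "flood", limit=0.5)
sql, params = build_query("flood", assoc)
```

With this toy corpus only "river" clears a limit of 0.5, so the query searches for "flood" and "river". Passing the terms as bound parameters rather than pasting them into the SQL string avoids quoting problems when a term contains an apostrophe.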
The file self.R uses the association mining technique to automatically label relevant records as TRUE and the rest as FALSE, then uses the labelled dataset to create a predictive model.
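A rough Python sketch of the automatic labelling step is given here for illustration only. It assumes that a tweet counts as relevant when it contains the keyword or any associated term; the actual criterion used by self.R is not stated here, and the function name `label_tweets` is hypothetical.

```python
import re


def label_tweets(tweets, keyword, associated_terms):
    """Pair each tweet with True if it mentions the keyword or an associated
    term, else False (assumed relevance rule, not taken from self.R)."""
    wanted = {keyword.lower()} | {t.lower() for t in associated_terms}
    labelled = []
    for tweet in tweets:
        tokens = set(re.findall(r"\w+", tweet.lower()))
        labelled.append((tweet, bool(wanted & tokens)))
    return labelled


labelled = label_tweets(
    ["Flood near the river!", "Sunny day at the park"],
    keyword="flood",
    associated_terms=["river"],
)
```

The resulting (text, label) pairs are the kind of pre-labelled dataset a classifier can then be trained on.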
Classification aims to create a predictive model that assigns text to preset categories (classes). This is normally done from a pre-labelled dataset: the labelling answers a question, and the possible answers are the classes. Classification.R tokenises the text for feature extraction and uses individual words as features for a naive Bayes predictive model. It is used to reduce noise.
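To make the idea concrete, here is a small Python sketch of a naive Bayes text classifier that uses individual words as features. It is not a port of Classification.R: the whitespace tokeniser, the add-one (Laplace) smoothing and the function names are assumptions chosen to keep the example short.

```python
import math
from collections import Counter, defaultdict


def tokenise(text):
    """Lower-case and split on whitespace (a deliberately simple tokeniser)."""
    return text.lower().split()


def train(labelled):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled:
        class_counts[label] += 1
        for word in tokenise(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenise(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # add-one smoothing so unseen class/word pairs do not zero the score
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("flood river rising", True),
    ("flood warning issued", True),
    ("sunny day park", False),
    ("nice sunny weather", False),
])
relevant = predict(model, "river flood")
irrelevant = predict(model, "sunny park")
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together, which matters once the training set grows beyond a toy example.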
Creates a parse tree of tweets for visual inspection of their grammatical structure, particularly for tweets of interest.