Machine Learning Approach for Identifying Suspicious Uniform Resource Locators (URLs) on Reddit Social Network

This repository implements a Reddit post crawler and phishing URL identification using VirusTotal.
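As a rough illustration of the URL identification half of that pipeline, the sketch below pulls URLs out of a post's text and prepares a lookup against the public VirusTotal API v3. It is not the repository's code: the project itself uses the VirusTotalNet C# package, and every name here (`extract_urls`, `vt_url_id`, `build_report_request`, `fetch_report`, `malicious_votes`) is hypothetical. The regular expression is a deliberately simple assumption and will miss some URL forms.

```python
import base64
import json
import re
import urllib.request

# Simplified pattern (an assumption): stop at whitespace, angle brackets,
# parentheses and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text: str) -> list:
    """Return the distinct http(s) URLs in a post, in order of appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def vt_url_id(url: str) -> str:
    """VirusTotal v3 URL identifier: URL-safe base64 of the URL without '=' padding."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def build_report_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the URL's analysis report."""
    return urllib.request.Request(
        f"https://www.virustotal.com/api/v3/urls/{vt_url_id(url)}",
        headers={"x-apikey": api_key},
    )


def fetch_report(url: str, api_key: str) -> dict:
    """Send the request; needs network access and a valid VirusTotal API key."""
    with urllib.request.urlopen(build_report_request(url, api_key), timeout=30) as resp:
        return json.load(resp)


def malicious_votes(report: dict) -> int:
    """Number of engines that flagged the URL as malicious in a v3 report."""
    return report["data"]["attributes"]["last_analysis_stats"]["malicious"]
```

A post could then be flagged as suspicious when `malicious_votes(fetch_report(url, api_key))` is above some threshold for any of its URLs; the threshold is a design choice that this README does not specify.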

Requirements

Install the following before building the project:

Visual Studio 2015 or higher
Microsoft SQL Server

NuGet packages

Dapper Micro-ORM
FileHelpers
LumenWorksCsvReader
Microsoft.Data.SqlClient
Newtonsoft.Json
VirusTotalNet

Python packages

pandas
praw
numpy

Move the virtual environment folder 'Reddit Post Crawler' (containing /reddit-crawler-env) to the /bin/Debug folder.
Create the database table as defined by this Schema.

Flow

Create a new Reddit app
Open the praw_scrapper.py file in the virtual environment folder (Reddit Post Crawler) and add the app credentials to initialize praw
Activate the virtual environment and run 'praw_scrapper.py'
The crawled posts are saved to new_all_posts.csv ('new' and 'all' in the file name indicate the filter values of the scraper, which can be changed in praw_scrapper.py)
Add the Microsoft SQL Server connection credentials to App.config
Build and run the Visual Studio project to start indexing and validating posts
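The crawl steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of praw_scrapper.py: the column list `FIELDS` and the helpers `output_name`, `save_posts` and `crawl` are hypothetical, and the standard-library `csv` module is swapped in for pandas so the helpers run without extra packages. It shows how the two filter values could produce the new_all_posts.csv file name.

```python
import csv

# Hypothetical column selection; the real script may store other fields.
FIELDS = ["id", "title", "selftext", "url", "created_utc"]


def output_name(sort: str, subreddit: str) -> str:
    """Build the CSV name from the two filter values, e.g. 'new' and 'all'."""
    return f"{sort}_{subreddit}_posts.csv"


def save_posts(posts, path: str) -> int:
    """Write one CSV row per post and return the number of rows written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for post in posts:
            # Missing attributes are stored as empty strings.
            writer.writerow({name: getattr(post, name, "") for name in FIELDS})
            count += 1
    return count


def crawl(client_id: str, client_secret: str, user_agent: str,
          subreddit: str = "all", sort: str = "new", limit: int = 100) -> str:
    """Fetch posts with PRAW and save them; needs network access and app credentials."""
    import praw  # imported lazily so the helpers above work without PRAW installed

    reddit = praw.Reddit(client_id=client_id, client_secret=client_secret,
                         user_agent=user_agent)
    listing = getattr(reddit.subreddit(subreddit), sort)  # e.g. .new or .hot
    path = output_name(sort, subreddit)
    save_posts(listing(limit=limit), path)
    return path
```

Calling `crawl(client_id, client_secret, user_agent)` with the credentials of the Reddit app created in the first step would write new_all_posts.csv to the current directory; passing `sort="hot"` would write hot_all_posts.csv instead.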

Results

The resulting datasets and the processed dataset are available here

soldierlytomcat/RedditCrawler