NLP Project for CVE Data Analysis

Deployed at: https://nlpproject-pl6e.onrender.com/

Project Overview

This NLP project streamlines the extraction, analysis, and accessibility of information from CVE (Common Vulnerabilities and Exposures) entries. It combines NLP models with interactive visualizations so that users can explore security vulnerabilities and integrate the data into their security workflows.

Features

Data Extraction: Automated data collection of CVEs using web scraping and APIs.
Data Cleaning: Preprocessing and cleaning of raw data for consistency and accuracy.
Database Storage: Processed data is stored in MongoDB Atlas for easy management and retrieval.
Web Application: Flask-based website with a user-friendly dashboard for CVE analysis.
Interactive Visualization: Integrated Plotly for interactive visualizations to enhance data insights.
Search Functionality: Allows users to search specific CVE IDs and access comprehensive details.
NLP Inference: Uses GROQ to run the open-source LLaMA 3.1 8B model, which generates natural language responses to questions about CVE data.
Similar Search: Users can find the top 5 most similar CVEs for a given description of a cyber attack.

Project Workflow

Data Extraction: Gathered CVE data through scraping and API integrations.
Data Cleaning: Processed data to ensure consistency, remove duplicates, and handle missing values.
Storage: Stored the cleaned data in MongoDB Atlas, making it accessible for both display and further analysis.
Web Application Development: Built a Flask-based web app with a dashboard to display CVE information.
Visualization: Added Plotly-based charts for interactive data visualization, helping users explore CVE data insights.
Search and Analysis: Users can search for CVEs by ID, view detailed information, and analyze key metrics.
NLP Model Integration: Utilized GROQ with LLaMA 3.1 8B model to support natural language responses based on CVE data.
Similar Search Integration: Applied RAG with vector search to retrieve the top 5 CVEs most similar to a given attack description.
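
The data cleaning step in the workflow above (consistency, duplicate removal, missing values) could be sketched roughly as follows. The field names (`cve_id`, `description`, `cvss_score`) and the `clean_records` helper are assumptions made for illustration; they are not taken from the project's actual schema or code.

```python
from typing import Any, Dict, List, Set


def clean_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Normalize IDs, drop duplicates, and handle missing fields.

    Field names are hypothetical, not the project's real schema.
    """
    seen: Set[str] = set()
    cleaned: List[Dict[str, Any]] = []
    for record in records:
        raw_id = record.get("cve_id")
        if not raw_id:
            continue  # a record without an ID cannot be looked up, so skip it
        cve_id = str(raw_id).strip().upper()
        if cve_id in seen:
            continue  # keep only the first occurrence of each CVE
        seen.add(cve_id)
        description = (record.get("description") or "").strip()
        cleaned.append(
            {
                "cve_id": cve_id,
                # Collapse runs of whitespace left over from scraping.
                "description": " ".join(description.split())
                or "No description available",
                # Keep a missing score as None rather than inventing a value.
                "cvss_score": record.get("cvss_score"),
            }
        )
    return cleaned
```

Leaving a missing score as `None` keeps the gap visible to later steps instead of hiding it behind a default number.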

Technologies Used

Backend: Flask
Database: MongoDB Atlas, Pinecone
Frontend: HTML, CSS, JavaScript
Visualization: Plotly
NLP Model: GROQ with LLaMA 3.1 8B model, RAG with vector search
Transformers: BERT tokenizer for tokenization, AutoModel for embeddings
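
To illustrate the idea behind the vector search listed above, here is a minimal sketch that ranks stored embeddings against a query embedding and keeps the best matches. It assumes cosine similarity as the metric and uses an in-memory dictionary in place of Pinecone; in the project, the vectors would come from the transformer model rather than being written by hand. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: List[float], index: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest similarity to query."""
    scored = [(cve_id, cosine_similarity(query, vec)) for cve_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A managed vector database performs the same ranking with approximate nearest-neighbour indexes, which matters once the collection is too large to scan linearly.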

Getting Started

Prerequisites

Python 3.12 (the version used for development)
MongoDB Atlas account
Pinecone account
API keys for data sources (contact us for data)

Installation

Clone the repository:

git clone https://github.com/nishantdeswal1810/nlpProject.git
cd nlpProject

Install dependencies:
```
pip install -r requirements.txt
```
Configure environment variables:
- Update .env with your GROQ_API_KEY.
- Update .env with your PINECONE_API_KEY.
- Update .env with your MongoDB Atlas connection string, or reach out to us for access to the data.
Run the Flask app:
```
python main.py
```
Access the Application: Open your browser and navigate to http://localhost:5000.
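
If the app fails at start-up, a missing `.env` value is a likely cause. The sketch below shows one way to check for this before launching. `GROQ_API_KEY` and `PINECONE_API_KEY` are the names used in the steps above; `MONGO_URI` is an assumed name for the MongoDB connection string, so substitute whatever name the code actually reads. It also assumes the `.env` values have already been loaded into the process environment.

```python
import os
from collections.abc import Mapping
from typing import List

# GROQ_API_KEY and PINECONE_API_KEY are named in this README;
# MONGO_URI is an assumed name for the MongoDB connection string.
REQUIRED_VARS = ("GROQ_API_KEY", "PINECONE_API_KEY", "MONGO_URI")


def missing_settings(env: Mapping = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_settings(env: Mapping = os.environ) -> None:
    """Raise a clear error if the configuration is incomplete."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_settings()` once before the Flask app starts turns a vague connection error into a message that names the missing keys.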

Usage

Dashboard: View CVE data insights through interactive visualizations.
Search: Enter CVE IDs to retrieve detailed vulnerability information.
NLP Responses: Ask questions related to CVE data, and get responses generated by the LLaMA 3.1 8B model.
Similar Search: Enter a description of a cyber attack to find the top 5 most similar CVEs.
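
CVE IDs follow the form `CVE-YYYY-NNNN`, where the sequence number has four or more digits. A search box could normalize and validate input before querying the database, as in this minimal sketch. The `normalize_cve_id` helper is hypothetical and is not taken from the project's code.

```python
import re
from typing import Optional

# "CVE", a four-digit year, then a sequence number of at least four digits.
CVE_ID_PATTERN = re.compile(r"CVE-[0-9]{4}-[0-9]{4,}")


def normalize_cve_id(user_input: str) -> Optional[str]:
    """Return the canonical upper-case CVE ID, or None if the input is not one."""
    candidate = user_input.strip().upper()
    return candidate if CVE_ID_PATTERN.fullmatch(candidate) else None
```

Rejecting malformed input early avoids a database round trip and lets the page show a specific "not a valid CVE ID" message.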

Future Improvements

Expand NLP model capabilities for broader question answering.
Enhance dashboard with additional metrics and visualizations.

Contact

For further inquiries, please contact msa23010@iiitl.ac.in, msd23007@iiitl.ac.in, msd23004@iiitl.ac.in, msa23004@iiitl.ac.in, or msd23024@iiitl.ac.in.
