Machine Learning - Binary Classification of VPN Proxy IP Address

The main goal was to participate in a Kaggle competition and develop a model capable of predicting whether an IP address is associated with a VPN or Proxy service.

Objective

The task involved:

- Training and evaluating binary classification models on an anonymized dataset of reported attacks provided by CrowdSec.
- Participating in the Kaggle competition "Binary Classification of VPN Proxy IP Address".
- Using the F1-score as the main evaluation metric, both for local validation and for Kaggle submissions.

The dataset was highly imbalanced (only about 5% of the samples belong to the positive class, i.e., VPN/Proxy), which called for careful feature engineering and validation strategies.
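With roughly 95% negative rows, accuracy is a misleading yardstick, which is why the F1-score of the positive class, F1 = 2PR / (P + R) with precision P and recall R, is the metric to watch. The sketch below computes both from scratch on synthetic labels (not the competition data) that share the ~5% positive rate. A model that never predicts VPN/Proxy and a model that finds four of the five positives both reach 95% accuracy, yet only the second one gets a non-zero F1.

```python
# Minimal sketch: accuracy vs. F1 on synthetic labels with a ~5% positive rate.
# 1 = VPN/Proxy, 0 = other. The data is illustrative, not taken from the dataset.

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 100 samples, 5 of them positive.
y_true = [1] * 5 + [0] * 95

# Trivial model: always predicts "not VPN/Proxy".
all_negative = [0] * 100
print(accuracy(y_true, all_negative))  # 0.95
print(f1_score(y_true, all_negative))  # 0.0

# Finds 4 of the 5 positives at the cost of 4 false alarms.
some_hits = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91
print(accuracy(y_true, some_hits))  # 0.95
print(round(f1_score(y_true, some_hits), 2))  # 0.62
```

The same reasoning applies to validation: a stratified split keeps the ~5% positive rate in every fold, so the F1 estimate is not distorted by folds that happen to contain almost no positives.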

Project Structure

The work was organized into three main parts:

Exploratory Data Analysis (EDA)

- Created six visualizations (bar plot, heatmap, marginal distributions, etc.) to analyze the relationship between the features and the target variable.
- Identified correlations, class-imbalance issues, and relevant patterns in the dataset.
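A correlation heatmap is usually drawn from a Pearson correlation matrix; when one of the columns is the 0/1 target, the Pearson coefficient is the point-biserial correlation. The sketch below computes such a matrix in plain Python on a few synthetic columns. The feature names (`reports_last_week`, `distinct_targets`) are hypothetical and do not reflect the dataset's actual schema; in practice a plotting library would render the resulting matrix as the heatmap.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 column it equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Undefined for a constant column.
    return float("nan") if sx == 0 or sy == 0 else cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of equally long numeric columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Synthetic columns; names and values are hypothetical.
data = {
    "reports_last_week": [3, 0, 12, 1, 0, 25, 2, 0],
    "distinct_targets": [1, 0, 7, 1, 0, 14, 2, 1],
    "target": [0, 0, 1, 0, 0, 1, 0, 0],
}
matrix = correlation_matrix(data)
for row in matrix:
    print(row.ljust(18), " ".join(f"{matrix[row][col]:+.2f}" for col in matrix))
```

Pearson only captures linear association, so the marginal distributions and per-category bar plots remain useful for spotting non-linear or categorical patterns.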

Baseline

- Implemented a simple perceptron as a benchmark model.
- Applied categorical encodings and a hyperparameter search.
- Evaluated performance on the validation and test sets to establish a reference point.
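The baseline steps (categorical encoding, a perceptron, and a hyperparameter search scored by F1) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: the single categorical column ("network type" with values such as `hosting`) and its link to the label are synthetic, and the searched hyperparameters (`epochs`, and a hypothetical `pos_weight` that scales updates on misclassified positives to counter the imbalance) are choices made for this sketch.

```python
import random
from itertools import product


def f1(y_true, y_pred):
    """Positive-class F1, written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def one_hot_fit(values):
    """Assign a column index to every category seen in the training split."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot_transform(values, vocab):
    """Encode each value as a 0/1 vector; unseen categories map to all zeros."""
    rows = []
    for v in values:
        row = [0.0] * len(vocab)
        if v in vocab:
            row[vocab[v]] = 1.0
        rows.append(row)
    return rows


def train_perceptron(X, y, epochs, pos_weight, seed=0):
    """Classic perceptron updates; mistakes on positives are scaled by pos_weight."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            pred = 1 if sum(wj * xj for wj, xj in zip(w, X[i])) + b > 0 else 0
            err = y[i] - pred  # -1, 0 or +1
            if err:
                step = err * (pos_weight if y[i] == 1 else 1.0)
                w = [wj + step * xj for wj, xj in zip(w, X[i])]
                b += step
    return w, b


def predict(model, X):
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0 for x in X]


# Synthetic stand-in data with one hypothetical categorical column.
rng = random.Random(42)
CATEGORIES = ["hosting", "isp", "mobile", "business"]


def make_split(n):
    cats, labels = [], []
    for _ in range(n):
        cat = rng.choice(CATEGORIES)
        p = 0.2 if cat == "hosting" else 0.01  # assumption: hosting IPs are more often VPN/Proxy
        cats.append(cat)
        labels.append(1 if rng.random() < p else 0)
    return cats, labels


train_cats, y_train = make_split(2000)
val_cats, y_val = make_split(500)
vocab = one_hot_fit(train_cats)  # fit the encoder on training data only
X_train = one_hot_transform(train_cats, vocab)
X_val = one_hot_transform(val_cats, vocab)

# Grid search scored by validation F1, the competition metric.
best_f1, best_params = -1.0, None
for epochs, pos_weight in product([1, 5, 20], [1.0, 5.0, 19.0]):
    model = train_perceptron(X_train, y_train, epochs, pos_weight)
    score = f1(y_val, predict(model, X_val))
    if score > best_f1:
        best_f1, best_params = score, (epochs, pos_weight)

print(f"best (epochs, pos_weight) = {best_params}, validation F1 = {best_f1:.3f}")
```

Because the weights start at zero, a global learning rate would only rescale them without changing any prediction, which is why this sketch searches over a relative class weight instead.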

Competitive Models

- Trained two different models with advanced feature engineering, including:
  - Missing-value imputation.
  - Mean encoding on selected features.
  - One-hot encoding on selected features.
  - Creation of at least five new features.
- Performed hyperparameter tuning with reproducible validation.
- Selected the best model based on F1-score (validation and Kaggle results).
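Mean (target) encoding replaces each category with the average label observed for it, which leaks the label into the feature if a row's own target is used to encode it. One common safeguard, shown here as a sketch and not necessarily the approach used in the notebooks, is out-of-fold encoding with additive smoothing toward the prior: each training row is encoded with statistics from the other folds, and validation/test rows use a table fitted on the whole training split. The function names, `alpha`, and the example categories are hypothetical.

```python
import random
from collections import defaultdict


def smoothed_means(cats, y, prior, alpha):
    """Per-category target mean shrunk toward `prior`:
    (sum of labels + alpha * prior) / (count + alpha)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(cats, y):
        sums[c] += t
        counts[c] += 1
    return {c: (sums[c] + alpha * prior) / (counts[c] + alpha) for c in counts}


def mean_encode_oof(cats, y, n_folds=5, alpha=10.0, seed=0):
    """Out-of-fold mean encoding for training rows (requires n_folds >= 2).

    Each row is encoded with statistics from the other folds, so its own label
    never leaks into its feature. Returns the encoded training column, a lookup
    table fitted on all training rows (for validation/test data), and the prior.
    """
    order = list(range(len(cats)))
    random.Random(seed).shuffle(order)
    encoded = [0.0] * len(cats)
    for k in range(n_folds):
        held_out = set(order[k::n_folds])
        fit_rows = [i for i in order if i not in held_out]
        fit_y = [y[i] for i in fit_rows]
        fold_prior = sum(fit_y) / len(fit_y)
        table = smoothed_means([cats[i] for i in fit_rows], fit_y, fold_prior, alpha)
        for i in held_out:
            # Categories unseen in the other folds fall back to the fold prior.
            encoded[i] = table.get(cats[i], fold_prior)
    prior = sum(y) / len(y)
    return encoded, smoothed_means(cats, y, prior, alpha), prior


# Tiny synthetic example; categories and labels are hypothetical.
cats = ["hosting", "isp", "hosting", "mobile", "isp", "hosting", "isp", "isp"]
y = [1, 0, 1, 0, 0, 0, 0, 0]
train_col, table, prior = mean_encode_oof(cats, y, n_folds=2, alpha=1.0)
test_col = [table.get(c, prior) for c in ["hosting", "satellite"]]
print(train_col)
print(test_col)  # [0.5625, 0.25]
```

The smoothing term keeps rare categories close to the prior instead of letting a handful of rows push them to 0 or 1; one-hot encoding remains the simpler choice for low-cardinality columns.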

Results

- Baseline (Perceptron): F1-score ≈ X.XX (validation).
- Model 1: F1-score ≈ X.XX (validation), X.XX (Kaggle).
- Model 2: F1-score ≈ X.XX (validation), X.XX (Kaggle).
- Best model selected: Model N.

Repository Contents

- Clases
- TP
- README.md
- TP2_Parte_2_Baseline.ipynb (Part 2: Baseline)
- TP2_Parte_3_Random_Forest.ipynb (Part 3: Random Forest)

Repository: lihuencarranza/BinaryClassification