lihuencarranza/BinaryClassification

Machine Learning - Binary Classification of VPN Proxy IP Address

The main goal was to participate in a Kaggle competition and develop a model capable of predicting whether an IP address is associated with a VPN or Proxy service.

Objective

The task involved:

  • Training and evaluating binary classification models on an anonymized dataset of reported attacks provided by CrowdSec.

  • Participating in the Kaggle competition: Binary Classification of VPN Proxy IP Address.

  • Using F1-Score as the main evaluation metric (both for local validation and Kaggle submissions).

The dataset was highly imbalanced (~5% positive class, i.e., VPN/Proxy), which required careful feature engineering and validation strategies.
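To illustrate why F1-Score is a more informative metric than accuracy at a ~5% positive rate, here is a minimal sketch using only the Python standard library. The labels are synthetic and the two "models" are hypothetical; none of this is taken from the competition data or from the project's code.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # With no true positives, F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Synthetic labels: 5 positives out of 100, mirroring the ~5% positive rate.
y_true = [1] * 5 + [0] * 95

# Hypothetical predictor that never flags an IP as VPN/Proxy.
always_negative = [0] * 100

# Hypothetical predictor: finds 4 of the 5 positives, with 2 false alarms.
some_model = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

for name, y_pred in [("always_negative", always_negative), ("some_model", some_model)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "F1:", round(f1_score(y_true, y_pred), 4))
```

The always-negative predictor reaches high accuracy while detecting no positives at all, so its F1 is zero. Under this kind of imbalance, accuracy alone says very little about how well VPN/Proxy addresses are being detected.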

Project Structure

The work was organized into three main parts:

  1. Exploratory Data Analysis (EDA)
    • Created 6 visualizations (bar plot, heatmap, marginal distributions, etc.) to analyze the relationship between features and the target variable.
    • Identified correlations, imbalance issues, and relevant patterns in the dataset.
  2. Baseline
    • Implemented a simple perceptron as a benchmark model.
    • Applied categorical encodings and hyperparameter search.
    • Evaluated performance on validation and test sets to establish a reference point.
  3. Competitive Models
    • Trained two different models with advanced feature engineering, including:
      • Missing value imputation.
      • Mean encoding on selected features.
      • One-hot encoding on selected features.
      • Creation of at least 5 new features.
    • Performed hyperparameter tuning and reproducible validation.
    • Selected the best model based on F1-Score (validation + Kaggle results).
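The feature engineering steps listed above (imputation, mean encoding, one-hot encoding) can be sketched as follows. This is a standard-library illustration under stated assumptions, not the project's actual code: the `country` column, the `"missing"` fill value, the `smoothing` parameter, and all function names are hypothetical. The mean encoding shown here is a smoothed variant, which is one common choice; the repository does not say which variant was used.

```python
from collections import defaultdict


def impute_missing(values, fill="missing"):
    """Replace None with an explicit placeholder category (assumed strategy)."""
    return [fill if v is None else v for v in values]


def fit_mean_encoding(categories, targets, smoothing=2.0):
    """Learn a smoothed target mean per category from training rows only."""
    prior = sum(targets) / len(targets)  # global positive rate
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # Rare categories are pulled toward the prior by the smoothing term.
    mapping = {
        c: (sums[c] + smoothing * prior) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, prior


def apply_mean_encoding(categories, mapping, prior):
    """Encode categories; values unseen during fitting fall back to the prior."""
    return [mapping.get(c, prior) for c in categories]


def one_hot(categories, vocabulary):
    """One indicator column per vocabulary entry; unseen values become all zeros."""
    return [[1 if c == v else 0 for v in vocabulary] for c in categories]


# Hypothetical training column with one missing value, plus binary targets.
train_country = impute_missing(["US", "US", None, "DE", "US", "DE"])
train_target = [0, 0, 1, 1, 0, 0]

# Fit on training rows only, then reuse the mapping on validation rows.
mapping, prior = fit_mean_encoding(train_country, train_target)
valid_country = impute_missing(["DE", "FR"])
valid_encoded = apply_mean_encoding(valid_country, mapping, prior)

vocabulary = sorted(set(train_country))
valid_one_hot = one_hot(valid_country, vocabulary)

print(valid_encoded)
print(vocabulary, valid_one_hot)
```

Fitting the encoding on training rows and only applying it to validation rows keeps target information from leaking into the validation score. In practice the training rows themselves are often encoded out-of-fold for the same reason.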

Results

  • Baseline (Perceptron): F1-score ≈ X.XX (validation).
  • Model 1: F1-score ≈ X.XX (validation), X.XX (Kaggle).
  • Model 2: F1-score ≈ X.XX (validation), X.XX (Kaggle).
  • Best model selected: Model N.
