Accurate weather predictions are crucial for farmers to optimize irrigation, planting, and harvesting. Traditional weather forecasts often lack precision at a hyper-local level, making it challenging for farmers to make informed decisions.
This project aims to develop a machine learning model that predicts whether it will rain (rain_or_not) based on historical weather data. The initial dataset consists of 310 days of daily weather observations, including temperature, humidity, wind speed, and rainfall status. However, the raw data contains missing values, errors, and inconsistencies, requiring thorough preprocessing before training the model.
The final goal is to build and optimize a predictive model that can forecast rainfall probabilities for the next 21 days, helping farmers make data-driven agricultural decisions.
- Handle missing values and incorrect entries
- Convert date formats and normalize numerical features
- Encode categorical variables if necessary
- Create new features based on domain knowledge (e.g., rolling averages, seasonal effects)
- Transform skewed features for better model performance
- Scale numerical features to standardize data
- Analyze class distribution of rain_or_not
- Identify feature correlations
- Visualize weather trends and relationships
- Feature selection based on correlation
- Hyperparameter tuning for better performance
- Address class imbalance if necessary
- The final model should provide the probability of rain for the next 21 days
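
The steps listed above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's actual notebook code. The column names (`avg_temperature`, `humidity`, `avg_wind_speed`, `date`), the synthetic data generator, the helper names, and the choice of logistic regression are all hypothetical stand-ins. The sketch also scores the last 21 days using their observed features; a true 21-day-ahead forecast would need forecasted or lagged inputs instead.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_weather(n_days=310, seed=0):
    """Synthetic stand-in for weather_data.csv (column names are assumed)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "date": pd.date_range("2023-01-01", periods=n_days, freq="D"),
        "avg_temperature": rng.normal(25, 5, n_days),
        "humidity": rng.uniform(30, 100, n_days),
        "avg_wind_speed": rng.gamma(2.0, 3.0, n_days),
    })
    # Make rain more likely on humid days so there is signal to learn.
    p_rain = 1 / (1 + np.exp(-(df["humidity"] - 65) / 10))
    df["rain_or_not"] = (rng.uniform(size=n_days) < p_rain).astype(int)
    # Blank out some readings to mimic missing values in the raw data.
    df.loc[rng.choice(n_days, 15, replace=False), "humidity"] = np.nan
    return df


def add_features(df):
    """Add a rolling average and a cyclical seasonal encoding."""
    out = df.sort_values("date").copy()
    out["humidity_roll3"] = out["humidity"].rolling(3, min_periods=1).mean()
    day_of_year = out["date"].dt.dayofyear
    out["doy_sin"] = np.sin(2 * np.pi * day_of_year / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * day_of_year / 365.25)
    return out


FEATURES = ["avg_temperature", "humidity", "avg_wind_speed",
            "humidity_roll3", "doy_sin", "doy_cos"]


def build_model():
    """Impute, scale, then classify; class_weight handles mild imbalance."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


df = add_features(make_synthetic_weather())
# Chronological split: hold out the final 21 days instead of shuffling.
train, test = df.iloc[:-21], df.iloc[-21:]
model = build_model().fit(train[FEATURES], train["rain_or_not"])
# Column 1 of predict_proba is the estimated probability of rain.
rain_proba = model.predict_proba(test[FEATURES])[:, 1]
```

Keeping imputation and scaling inside the `Pipeline` means both are fit on the training rows only, which avoids leaking statistics from the held-out days into the model.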
📂 INTELLIHACK_COGNIC_AI_01
│-- 📂 Data-Set
│ ├── 📂 processed_data # Processed training & testing datasets
│ ├── weather_train.csv # Training dataset
│ ├── weather_test.csv # Testing dataset
│ ├── weather_data.csv # Initial weather data set
│
│-- 📂 Plots # Visualizations & analysis outputs
│
│-- 📂 Scripts
│ ├── EDA.ipynb # Exploratory Data Analysis (EDA)
│ ├── Pre_Process.ipynb # Data preprocessing and feature engineering
│ ├──
│
│-- Q-1 Weather Forecasting # (Challenge Question)
│-- README.md # Project documentation
- Python 3.8
- pandas: Data manipulation and analysis
- numpy: Numerical operations
- matplotlib & seaborn: Data visualization
- os: File operations
- scipy.stats: Statistical functions and hypothesis testing
- statsmodels.api: Statistical modeling and time series analysis
- scikit-learn:
  - SimpleImputer, KNNImputer: Handling missing values
  - StandardScaler: Feature scaling
  - train_test_split: Splitting dataset into train-test sets
  - mutual_info_classif: Feature selection
- missingno: Visualizing missing values
- datetime: Handling date operations
- warnings: Suppress unnecessary warnings