Institution: Pontifical Catholic University of São Paulo (PUC-SP)
School: Faculty of Interdisciplinary Studies
Program: Humanistic AI and Data Science
Semester: 2nd Semester 2025
Professor: Professor Doctor in Mathematics Daniel Rodrigues da Silva
Important
- Projects and deliverables may be made publicly available whenever possible.
- The course emphasizes practical, hands-on experience with real datasets to simulate professional consulting scenarios in the fields of Data Analysis and Data Mining for partner organizations and institutions affiliated with the university.
- All activities comply with the academic and ethical guidelines of PUC-SP.
- Any content not authorized for public disclosure will remain confidential and securely stored in private repositories.
🎶 Prelude Suite no.1 (J. S. Bach) - Sound Design Remix
📺 For better resolution, watch the video on YouTube.
Tip
This repository is a review of the Statistics course from the undergraduate program Humanistic AI and Data Science at PUC-SP.
☞ Access Data Mining Main Repository
🇬🇧 This code performs an exploratory and clustering analysis of the dataset "Grupo4.csv" for a classroom project. It cleans and preprocesses the data, then applies three clustering algorithms (K-Means, Mean-Shift, Affinity Propagation) and visualizes and compares their results in Python.
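The description names K-Means first among the three algorithms. To make that step concrete, here is a minimal from-scratch sketch of Lloyd's K-Means in plain Python, assuming 2-D numeric points. The toy points, the deterministic farthest-first seeding, and the names `farthest_first_init` and `kmeans` are illustrative assumptions, not the project's code; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math


def farthest_first_init(points, k):
    """Assumed deterministic seeding: start from the first point, then keep
    adding the point farthest from every centroid chosen so far."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The loop stops as soon as an assignment pass leaves every label unchanged. Farthest-first seeding only keeps the example deterministic; random or k-means++ seeding is the more common choice on real data.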
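import pandas as pd
from IPython.display import display  # built into Jupyter; the import also makes display() work in plain scripts

# Load the dataset and preview the first rows
df = pd.read_csv('Grupo4.csv')
display(df.head())

# Dataset dimensions
num_rows, num_cols = df.shape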
print(f"🇬🇧 Number of rows: {num_rows}, Number of columns: {num_cols}")
print(f"🇧🇷 Número de linhas: {num_rows}, Número de colunas: {num_cols}")
display(df.describe())

# Drop the 'Unnamed: 0' column, typically a leftover index saved together with the CSV
if 'Unnamed: 0' in df.columns:
    df.drop('Unnamed: 0', axis=1, inplace=True)
    print("🇬🇧 'Unnamed: 0' column dropped. 🇧🇷 Coluna 'Unnamed: 0' removida.")
else:
    print("🇬🇧 'Unnamed: 0' column not found. 🇧🇷 Coluna 'Unnamed: 0' não encontrada.")

# Count missing values per column
print("🇬🇧 Missing values per column before filling:")
print("🇧🇷 Valores faltantes por coluna antes do preenchimento:")
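print(df.isnull().sum())

# Fill missing values with each numeric column's median; numeric_only=True skips text columns,
# which would otherwise raise a TypeError in pandas >= 2.0
column_medians = df.median(numeric_only=True)
df.fillna(column_medians, inplace=True)
print("🇬🇧 Missing values filled with medians. 🇧🇷 Valores faltantes preenchidos com as medianas.")

# Remove exact duplicate rows (the first occurrence is kept)
initial_rows = df.shape[0]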
df.drop_duplicates(inplace=True)
rows_after_duplicates = df.shape[0]
print(f"🇬🇧 Duplicates removed: {initial_rows - rows_after_duplicates}")
print(f"🇧🇷 Duplicados removidos: {initial_rows - rows_after_duplicates}")

# Inspect the cleaned data
display(df.head())
num_rows_preprocessed, num_cols_preprocessed = df.shape
print(f"🇬🇧 After preprocessing: {num_rows_preprocessed} rows, {num_cols_preprocessed} columns")
print(f"🇧🇷 Após o pré-processamento: {num_rows_preprocessed} linhas, {num_cols_preprocessed} colunas")

# Plot setup and a scatter plot of Coluna1 vs Coluna2
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
sns.set_palette('viridis')
plt.figure(figsize=(12, 8))
sns.scatterplot(data=df, x='Coluna1', y='Coluna2')
plt.title('Scatter Plot of Coluna1 vs Coluna2 / Gráfico de Dispersão Coluna1 vs Coluna2')
plt.show()
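The scatter plot of `Coluna1` against `Coluna2` is the kind of 2-D view on which clustering results can be compared. Mean-Shift, another of the three algorithms named in the description, does not need the number of clusters in advance: each point climbs toward a local density peak, and points that reach the same peak share a cluster. The following is a minimal flat-kernel sketch in plain Python; the toy points, `bandwidth=3.0`, the `bandwidth / 2` merge threshold, and the function name `mean_shift` are assumptions for illustration, not the project's code or scikit-learn's `MeanShift`.

```python
import math


def mean_shift(points, bandwidth, tol=1e-6, max_iter=300):
    """Flat-kernel Mean-Shift: repeatedly move each starting point to the mean
    of all data points within `bandwidth`, then group points whose modes coincide."""
    modes = []
    for start in points:
        x = tuple(start)
        for _ in range(max_iter):
            # Neighbours inside the flat kernel (never empty: see note below)
            neighbours = [p for p in points if math.dist(p, x) <= bandwidth]
            shifted = tuple(sum(dim) / len(neighbours) for dim in zip(*neighbours))
            if math.dist(shifted, x) < tol:  # stopped moving: reached a mode
                break
            x = shifted
        modes.append(x)

    # Assumed merge rule: modes closer than bandwidth / 2 form one cluster centre
    centers, labels = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(j)
                break
        else:  # no existing centre is close enough, so start a new cluster
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Hypothetical toy data standing in for (Coluna1, Coluna2) pairs
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = mean_shift(points, bandwidth=3.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centers)
```

The neighbour list cannot become empty: the mean of points lying within the bandwidth always has at least one of them within the bandwidth too. The bandwidth plays roughly the role that `k` plays in K-Means, since a larger bandwidth tends to merge more points into each peak and yield fewer clusters.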
21- Our Crew:
- 👨🏽‍🚀 Andson Ribeiro - Slide into my inbox
- 👩🏻‍🚀 Fabiana ⚡️ Campanari - Shoot me an email
- 👨🏽‍🚀 José Augusto de Souza Oliveira - email
- 🧑🏼‍🚀 Luan Fabiano - email
- 👨🏽‍🚀 Pedro Barrenco - email
- 🧑🏼‍🚀 Pedro Vyctor - Hit me up by email
🛸๋ My Contacts Hub
────────────── 🔭⋆ ──────────────
Copyright 2025 Quantum Software Development. Code released under the MIT License.