diff --git a/Done_Challenge__Day2.ipynb b/Done_Challenge__Day2.ipynb
new file mode 100644
index 0000000..5c0e597
--- /dev/null
+++ b/Done_Challenge__Day2.ipynb
@@ -0,0 +1,628 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "Done_Challenge_ Day2.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Done Project: Data Mining Project for X company"
+ ],
+ "metadata": {
+ "id": "zroHHWfG7V2M"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zDwep1K8Erxl"
+ },
+ "source": [
+ "**Project:** Data Mining Project for X company"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "JzIu-UWIDXHw"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d7-ii3uyI8KY"
+ },
+ "source": [
+ "The CRISP-DM Framework\n",
+ "\n",
+ "\n",
+ "The CRISP-DM methodology provides a structured approach to planning a data mining project. It is a robust and well-proven methodology with six phases:\n",
+ "* Business understanding (BU): Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan\n",
+ "\n",
+ "* Data understanding (DU): Collect Initial Data, Describe Data, Explore Data, Verify Data Quality\n",
+ "\n",
+ "* Data preparation (DP): Select Data, Clean Data, Construct Data, Integrate Data\n",
+ "\n",
+ "* Modeling (M): Select Modeling Technique, Generate Test Design, Build Model, Assess Model\n",
+ "* Evaluation (E): Evaluate Results, Review Process, Determine Next Steps\n",
+ "* Deployment (D): Plan Deployment, Plan Monitoring and Maintenance, Produce Final Report, Review Project\n",
+ "\n",
+ "\n",
+ "References:\n",
+ "\n",
+ "[What is the CRISP-DM methodology?](https://www.sv-europe.com/crisp-dm-methodology/)\n",
+ "\n",
+ "[Introduction to CRISP DM Framework for Data Science and Machine Learning](https://www.linkedin.com/pulse/chapter-1-introduction-crisp-dm-framework-data-science-anshul-roy/)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5lo7Ml7tMQOf"
+ },
+ "source": [
+ "**Data Set**\n",
+ "### The data is for company X, which is trying to control attrition.\n",
+ "### There are two sets of data: \"Existing employees\" and \"Employees who have left\". The following attributes are available for every employee.\n",
+ "\n",
+ "\n",
+ "* Satisfaction Level\n",
+ "\n",
+ "* Last evaluation\n",
+ "\n",
+ "* Number of projects\n",
+ "\n",
+ "* Average monthly hours\n",
+ "\n",
+ "* Time spent at the company\n",
+ "* Whether they have had a work accident\n",
+ "\n",
+ "\n",
+ "* Whether they have had a promotion in the last 5 years\n",
+ "\n",
+ "\n",
+ "* Department (the `sales` column)\n",
+ "\n",
+ "\n",
+ "* Salary\n",
+ "\n",
+ "\n",
+ "* Whether the employee has left\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "sjSj2A2sSph_"
+ },
+ "source": [
+ "**Your Role**\n",
+ " \n",
+ "\n",
+ "* As a member of the data science team, you have been asked by company X to answer these two questions:\n",
+ "* What type of employees are leaving?\n",
+ "\n",
+ "* Determine which employees are prone to leave next.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ajdEVA7LiBUp"
+ },
+ "source": [
+ "Business Understanding\n",
+ "\n",
+ "---\n",
+ "\n",
+ "This step focuses on understanding the business in all its aspects. It consists of the following steps.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "* Identify the goal and frame the business problem.\n",
+ "* Prepare the analytical goal, i.e., which performance metric and loss function to use\n",
+ "* Gather information on resources, constraints, assumptions, risks, etc.\n",
+ "\n",
+ "* Prepare a workflow chart"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "J4MwiCYzj2_u"
+ },
+ "source": [
+ "### Write the main objectives of this project in your own words.\n",
+ "minimum of 100 characters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "STyLda45j1Mf"
+ },
+ "source": [
+ "main_objectives = '''This project aims to help me understand data mining methodologies such as CRISP-DM in general, \n",
+ "and business and data understanding in particular. We have two classes, namely \"Existing employees\" and \"Employees who have left\".\n",
+ "The class of an employee can be identified from the values of the given attributes, on which the model will be trained.\n",
+ "'''"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "CuOlxLxKMOLI"
+ },
+ "source": [
+ "assert len(main_objectives) > 100 \n",
+ "### BEGIN HIDDEN TESTS\n",
+ "assert len(main_objectives) > 80 \n",
+ "### END HIDDEN TESTS"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NyXeNxlCkbaw"
+ },
+ "source": [
+ "### Outline the different data analysis steps you will follow to carry out the project"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "rC-tl8sUksQq"
+ },
+ "source": [
+ "dm_outline = '''According to Will Hillier, there are 7 data analysis steps [1] (https://careerfoundry.com/en/blog/data-analytics/the-data-analysis-process-step-by-step/#step-four-analyzing-the-data) \n",
+ "1. Defining the question: we have already been given two questions: 'What type of employees are leaving?' and 'Which employees are prone to leave next?'\n",
+ "2. Collecting the data: we need a large amount of data in order to train the model well, because ML generally requires a large amount of data.\n",
+ "3. Cleaning the data: unclean data leads to wrong predictions, hence cleaning the data is mandatory.\n",
+ "4. Analyzing the data: this is the main step. After cleaning the data, the next step is to analyze it or use it for training, for example with predictive analysis.\n",
+ "5. Sharing your results: once the analysis has produced insights, the next step is to share them with the X organization.\n",
+ "6. Embracing failure: failure is a sign that something needs more work, hence accepting failure and honing the ability to spot and rectify errors is essential.\n",
+ "7. Summary: the final step is to summarize what we have done.\n",
+ "'''"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "-K1mWuDoksTk"
+ },
+ "source": [
+ "assert len(dm_outline) > 100 \n",
+ "### BEGIN HIDDEN TESTS\n",
+ "assert len(dm_outline) > 70 \n",
+ "### END HIDDEN TESTS"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pmUDFG1wkzUy"
+ },
+ "source": [
+ "I will use the accuracy metric to measure the performance of this data analysis model:\n",
+ "$$\\text{accuracy} = \\frac{\\text{correct predictions}}{\\text{all predictions}}$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KCNulojKk_BP"
+ },
+ "source": [
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vLS2YHoRk_EK"
+ },
+ "source": [
+ "Why did you choose this metric? Minimum of 100 characters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LSynT14KlPSJ"
+ },
+ "source": [
+ "why_metrics = '''We are developing a model to predict whether employees of the X organization will leave, based on the data collected from it. \n",
+ "Hence, we want to build an accurate model whose outcomes lead to better decisions. Errors carry a cost, but optimizing model accuracy mitigates that cost. \n",
+ "There are many optimization algorithms for reducing the loss of a model. Improving model accuracy helps avoid wasted time, money, and undue stress.\n",
+ "'''"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "yr-Mk0E8lPVJ"
+ },
+ "source": [
+ "assert len(why_metrics) > 100 \n",
+ "### BEGIN HIDDEN TESTS\n",
+ "assert len(why_metrics) > 80 \n",
+ "### END HIDDEN TESTS"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aAo19Ip6lUtm"
+ },
+ "source": [
+ "### How would you know if your data analysis work is a success or not?\n",
+ "minimum of 100 characters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "HESsiXW5llX-"
+ },
+ "source": [
+ "how_success = '''After we have analyzed the data (or experimented with the model), we will demonstrate the results to the organization.\n",
+ "The next step is to collect their response and feedback through usability testing and quality measurements.\n",
+ "'''"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FdUoiMIOlmXq",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "cf546265-bd20-46dd-e5fe-273e53eef495"
+ },
+ "source": [
+ "assert len(how_success) > 100 \n",
+ "### BEGIN HIDDEN TESTS\n",
+ "print(len(how_success))\n",
+ "assert len(how_success) > 80 \n",
+ "### END HIDDEN TESTS"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "227\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DQE6dqo6l1TZ"
+ },
+ "source": [
+ "## What kind of challenges do you expect in your analysis?\n",
+ "List at least 3 challenges"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "WrAhBQhQl8Lh"
+ },
+ "source": [
+ "challenge_text = '''The biggest challenge will be related to the data collection process. However, we could also face other challenges.\n",
+ "Here are some challenges that I expect during data analysis.\n",
+ "1. Collecting meaningful data: identifying and collecting the data that is vital for the organization/business is one challenge.\n",
+ "2. Selecting the right tool: since the nature of the data varies with the domain we work in, selecting the right tool for the collected data may also be a challenge.\n",
+ "3. Consolidating data from multiple sources: data can be collected from different sources, hence their structure will differ; putting these data together and using them is another challenge.\n",
+ "'''\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "EedHa-Pll8X7",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "1b8df4ac-173a-42aa-d0f7-df15a5844714"
+ },
+ "source": [
+ "assert len(challenge_text) > 100 \n",
+ "### BEGIN HIDDEN TESTS\n",
+ "print(len(challenge_text))\n",
+ "assert len(challenge_text) > 80 \n",
+ "### END HIDDEN TESTS"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "663\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZcJ8M6uWDeSE"
+ },
+ "source": [
+ "