Diffstat (limited to 'Data Prediction/Tele Churn/.ipynb_checkpoints/Customer-Churn-Prediction-checkpoint.ipynb')
| -rw-r--r-- | Data Prediction/Tele Churn/.ipynb_checkpoints/Customer-Churn-Prediction-checkpoint.ipynb | 3283 |
1 files changed, 3283 insertions, 0 deletions
diff --git a/Data Prediction/Tele Churn/.ipynb_checkpoints/Customer-Churn-Prediction-checkpoint.ipynb b/Data Prediction/Tele Churn/.ipynb_checkpoints/Customer-Churn-Prediction-checkpoint.ipynb new file mode 100644 index 0000000..b601aff --- /dev/null +++ b/Data Prediction/Tele Churn/.ipynb_checkpoints/Customer-Churn-Prediction-checkpoint.ipynb @@ -0,0 +1,3283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction \n", + "## Customer Churn Prediction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Customer attrition or churn, is when customers stop doing business with a company. It can have a significant impact on a company's revenue and it's crucial for businesses to find out the reasons why customers are leaving and take steps to reduce the number of customers leaving. One way to do this is by identifying customer segments that are at risk of leaving, and implementing retention strategies to keep them. Also, by using data and machine learning techniques, companies can predict which customers are likely to leave in the future and take actions to keep them before they decide to leave.\n", + "\n", + "We are going to build a basic model for predicting customer churn using [Telco Customer Churn dataset](https://www.kaggle.com/blastchar/telco-customer-churn). We are using some classification algorithm to model customers who have left, using Python tools such as pandas for data manipulation and matplotlib for visualizations.\n", + "\n", + "\n", + "Let's get started." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Steps Involved to Predict Customer Churn\n", + "- Importing Libraries\n", + "- Loading Dataset\n", + "- Exploratory Data Analysis\n", + "- Outliers using IQR method\n", + "- Cleaning and Transforming Data\n", + " - One-hot Encoding\n", + " - Rearranging Columns\n", + " - Feature Scaling\n", + " - Feature Selection\n", + "- Prediction using Logistic Regression\n", + "- Prediction using Support Vector Classifier\n", + "- Prediction using Decision Tree Classifier\n", + "- Prediction using KNN Classifier" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Importing Libraries\n", + "\n", + "First of all, we will import the necessary libraries." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "#import platform\n", + "import pandas as pd\n", + "import sklearn\n", + "import numpy as np\n", + "#import graphviz\n", + "import seaborn as sns\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "# import plotly.express as px\n", + "# import plotly.graph_objects as go\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loading Dataset\n", + "We use pandas to read the dataset and preprocess it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(7043, 21)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')\n", + "df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exploratory Data Analysis" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>customerID</th>\n", + " <th>gender</th>\n", + " <th>SeniorCitizen</th>\n", + " <th>Partner</th>\n", + " <th>Dependents</th>\n", + " <th>tenure</th>\n", + " <th>PhoneService</th>\n", + " <th>MultipleLines</th>\n", + " <th>InternetService</th>\n", + " <th>OnlineSecurity</th>\n", + " <th>...</th>\n", + " <th>DeviceProtection</th>\n", + " <th>TechSupport</th>\n", + " <th>StreamingTV</th>\n", + " <th>StreamingMovies</th>\n", + " <th>Contract</th>\n", + " <th>PaperlessBilling</th>\n", + " <th>PaymentMethod</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " <th>Churn</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>7590-VHVEG</td>\n", + " <td>Female</td>\n", + " <td>0</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>1</td>\n", + " <td>No</td>\n", + " <td>No phone service</td>\n", + " <td>DSL</td>\n", + " <td>No</td>\n", + " <td>...</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " 
<td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Electronic check</td>\n", + " <td>29.85</td>\n", + " <td>29.85</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>5575-GNVDE</td>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>34</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>...</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>One year</td>\n", + " <td>No</td>\n", + " <td>Mailed check</td>\n", + " <td>56.95</td>\n", + " <td>1889.5</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>3668-QPYBK</td>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>2</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>...</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Mailed check</td>\n", + " <td>53.85</td>\n", + " <td>108.15</td>\n", + " <td>Yes</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " <td>7795-CFOCW</td>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>45</td>\n", + " <td>No</td>\n", + " <td>No phone service</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>...</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>One year</td>\n", + " <td>No</td>\n", + " <td>Bank transfer (automatic)</td>\n", + " <td>42.30</td>\n", + " <td>1840.75</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>9237-HQITU</td>\n", + " <td>Female</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>2</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>Fiber optic</td>\n", + " <td>No</td>\n", + " <td>...</td>\n", 
+ " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Electronic check</td>\n", + " <td>70.70</td>\n", + " <td>151.65</td>\n", + " <td>Yes</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "<p>5 rows × 21 columns</p>\n", + "</div>" + ], + "text/plain": [ + " customerID gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", + "0 7590-VHVEG Female 0 Yes No 1 No \n", + "1 5575-GNVDE Male 0 No No 34 Yes \n", + "2 3668-QPYBK Male 0 No No 2 Yes \n", + "3 7795-CFOCW Male 0 No No 45 No \n", + "4 9237-HQITU Female 0 No No 2 Yes \n", + "\n", + " MultipleLines InternetService OnlineSecurity ... DeviceProtection \\\n", + "0 No phone service DSL No ... No \n", + "1 No DSL Yes ... Yes \n", + "2 No DSL Yes ... No \n", + "3 No phone service DSL Yes ... Yes \n", + "4 No Fiber optic No ... No \n", + "\n", + " TechSupport StreamingTV StreamingMovies Contract PaperlessBilling \\\n", + "0 No No No Month-to-month Yes \n", + "1 No No No One year No \n", + "2 No No No Month-to-month Yes \n", + "3 Yes No No One year No \n", + "4 No No No Month-to-month Yes \n", + "\n", + " PaymentMethod MonthlyCharges TotalCharges Churn \n", + "0 Electronic check 29.85 29.85 No \n", + "1 Mailed check 56.95 1889.5 No \n", + "2 Mailed check 53.85 108.15 Yes \n", + "3 Bank transfer (automatic) 42.30 1840.75 No \n", + "4 Electronic check 70.70 151.65 Yes \n", + "\n", + "[5 rows x 21 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.018231Z", + "iopub.status.busy": "2021-11-09T03:52:43.017819Z", + "iopub.status.idle": "2021-11-09T03:52:43.052282Z", + "shell.execute_reply": "2021-11-09T03:52:43.051336Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.018175Z" + } + }, + "outputs": [ 
+ { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>customerID</th>\n", + " <th>gender</th>\n", + " <th>SeniorCitizen</th>\n", + " <th>Partner</th>\n", + " <th>Dependents</th>\n", + " <th>tenure</th>\n", + " <th>PhoneService</th>\n", + " <th>MultipleLines</th>\n", + " <th>InternetService</th>\n", + " <th>OnlineSecurity</th>\n", + " <th>...</th>\n", + " <th>DeviceProtection</th>\n", + " <th>TechSupport</th>\n", + " <th>StreamingTV</th>\n", + " <th>StreamingMovies</th>\n", + " <th>Contract</th>\n", + " <th>PaperlessBilling</th>\n", + " <th>PaymentMethod</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " <th>Churn</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>7038</th>\n", + " <td>6840-RESVB</td>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>24</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>...</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>One year</td>\n", + " <td>Yes</td>\n", + " <td>Mailed check</td>\n", + " <td>84.80</td>\n", + " <td>1990.5</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7039</th>\n", + " <td>2234-XADUH</td>\n", + " <td>Female</td>\n", + " <td>0</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>72</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Fiber optic</td>\n", + " <td>No</td>\n", + " <td>...</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>One 
year</td>\n", + " <td>Yes</td>\n", + " <td>Credit card (automatic)</td>\n", + " <td>103.20</td>\n", + " <td>7362.9</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7040</th>\n", + " <td>4801-JZAZL</td>\n", + " <td>Female</td>\n", + " <td>0</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>11</td>\n", + " <td>No</td>\n", + " <td>No phone service</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>...</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Electronic check</td>\n", + " <td>29.60</td>\n", + " <td>346.45</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7041</th>\n", + " <td>8361-LTMKD</td>\n", + " <td>Male</td>\n", + " <td>1</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>4</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Fiber optic</td>\n", + " <td>No</td>\n", + " <td>...</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Mailed check</td>\n", + " <td>74.40</td>\n", + " <td>306.6</td>\n", + " <td>Yes</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7042</th>\n", + " <td>3186-AJIEK</td>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>66</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>Fiber optic</td>\n", + " <td>Yes</td>\n", + " <td>...</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>Two year</td>\n", + " <td>Yes</td>\n", + " <td>Bank transfer (automatic)</td>\n", + " <td>105.65</td>\n", + " <td>6844.5</td>\n", + " <td>No</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "<p>5 rows × 21 columns</p>\n", + "</div>" + ], + "text/plain": [ + " customerID gender SeniorCitizen Partner Dependents tenure \\\n", + "7038 6840-RESVB Male 0 Yes Yes 24 \n", + "7039 2234-XADUH Female 0 Yes Yes 72 \n", 
+ "7040 4801-JZAZL Female 0 Yes Yes 11 \n", + "7041 8361-LTMKD Male 1 Yes No 4 \n", + "7042 3186-AJIEK Male 0 No No 66 \n", + "\n", + " PhoneService MultipleLines InternetService OnlineSecurity ... \\\n", + "7038 Yes Yes DSL Yes ... \n", + "7039 Yes Yes Fiber optic No ... \n", + "7040 No No phone service DSL Yes ... \n", + "7041 Yes Yes Fiber optic No ... \n", + "7042 Yes No Fiber optic Yes ... \n", + "\n", + " DeviceProtection TechSupport StreamingTV StreamingMovies Contract \\\n", + "7038 Yes Yes Yes Yes One year \n", + "7039 Yes No Yes Yes One year \n", + "7040 No No No No Month-to-month \n", + "7041 No No No No Month-to-month \n", + "7042 Yes Yes Yes Yes Two year \n", + "\n", + " PaperlessBilling PaymentMethod MonthlyCharges TotalCharges \\\n", + "7038 Yes Mailed check 84.80 1990.5 \n", + "7039 Yes Credit card (automatic) 103.20 7362.9 \n", + "7040 Yes Electronic check 29.60 346.45 \n", + "7041 Yes Mailed check 74.40 306.6 \n", + "7042 Yes Bank transfer (automatic) 105.65 6844.5 \n", + "\n", + " Churn \n", + "7038 No \n", + "7039 No \n", + "7040 No \n", + "7041 Yes \n", + "7042 No \n", + "\n", + "[5 rows x 21 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.tail()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.079833Z", + "iopub.status.busy": "2021-11-09T03:52:43.078995Z", + "iopub.status.idle": "2021-11-09T03:52:43.090558Z", + "shell.execute_reply": "2021-11-09T03:52:43.089462Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.079771Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7043, 21)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have 2 types of features in the dataset: categorical (two or more values and without any 
order) and numerical. Most of the feature names are self-explanatory, except for:\n", + " - Partner: whether the customer has a partner or not (Yes, No),\n", + " - Dependents: whether the customer has dependents or not (Yes, No),\n", + " - OnlineBackup: whether the customer has online backup or not (Yes, No, No internet service),\n", + " - tenure: number of months the customer has stayed with the company,\n", + " - MonthlyCharges: the amount charged to the customer monthly,\n", + " - TotalCharges: the total amount charged to the customer.\n", + " \n", + "There are 7043 customers in the dataset and 19 features, excluding customerID (non-informative) and the Churn column (target variable). Most of the categorical features have 4 or fewer unique values." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.093002Z", + "iopub.status.busy": "2021-11-09T03:52:43.092646Z", + "iopub.status.idle": "2021-11-09T03:52:43.101858Z", + "shell.execute_reply": "2021-11-09T03:52:43.100608Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.092944Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "147903" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.size" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.055811Z", + "iopub.status.busy": "2021-11-09T03:52:43.055339Z", + "iopub.status.idle": "2021-11-09T03:52:43.065207Z", + "shell.execute_reply": "2021-11-09T03:52:43.064137Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.055751Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "customerID object\n", + "gender object\n", + "SeniorCitizen int64\n", + "Partner object\n", + "Dependents object\n", + "tenure int64\n", + "PhoneService object\n", + "MultipleLines object\n", + "InternetService object\n", + 
"OnlineSecurity object\n", + "OnlineBackup object\n", + "DeviceProtection object\n", + "TechSupport object\n", + "StreamingTV object\n", + "StreamingMovies object\n", + "Contract object\n", + "PaperlessBilling object\n", + "PaymentMethod object\n", + "MonthlyCharges float64\n", + "TotalCharges object\n", + "Churn object\n", + "dtype: object" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Totalcharges is given as object datatype but it is float datatype" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.067769Z", + "iopub.status.busy": "2021-11-09T03:52:43.067117Z", + "iopub.status.idle": "2021-11-09T03:52:43.076918Z", + "shell.execute_reply": "2021-11-09T03:52:43.075769Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.067723Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',\n", + " 'tenure', 'PhoneService', 'MultipleLines', 'InternetService',\n", + " 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',\n", + " 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',\n", + " 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],\n", + " dtype='object')" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.105839Z", + "iopub.status.busy": "2021-11-09T03:52:43.104115Z", + "iopub.status.idle": "2021-11-09T03:52:43.143193Z", + "shell.execute_reply": "2021-11-09T03:52:43.142163Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.105792Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 7043 entries, 0 to 7042\n", + "Data columns (total 21 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 customerID 7043 non-null object \n", + " 1 gender 7043 non-null object \n", + " 2 SeniorCitizen 7043 non-null int64 \n", + " 3 Partner 7043 non-null object \n", + " 4 Dependents 7043 non-null object \n", + " 5 tenure 7043 non-null int64 \n", + " 6 PhoneService 7043 non-null object \n", + " 7 MultipleLines 7043 non-null object \n", + " 8 InternetService 7043 non-null object \n", + " 9 OnlineSecurity 7043 non-null object \n", + " 10 OnlineBackup 7043 non-null object \n", + " 11 DeviceProtection 7043 non-null object \n", + " 12 TechSupport 7043 non-null object \n", + " 13 StreamingTV 7043 non-null object \n", + " 14 StreamingMovies 7043 non-null object \n", + " 15 Contract 7043 non-null object \n", + " 16 PaperlessBilling 7043 non-null object \n", + " 17 PaymentMethod 7043 non-null object \n", + " 18 MonthlyCharges 7043 non-null float64\n", + " 19 TotalCharges 7043 non-null object \n", + " 20 Churn 7043 non-null object \n", + "dtypes: float64(1), int64(2), object(18)\n", + "memory usage: 1.1+ MB\n" + ] + } + ], + "source": [ + "df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.176933Z", + "iopub.status.busy": "2021-11-09T03:52:43.176295Z", + "iopub.status.idle": "2021-11-09T03:52:43.202429Z", + "shell.execute_reply": "2021-11-09T03:52:43.201454Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.176874Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "customerID 0\n", + "gender 0\n", + "SeniorCitizen 0\n", + "Partner 0\n", + "Dependents 0\n", + "tenure 0\n", + "PhoneService 0\n", + "MultipleLines 0\n", + "InternetService 0\n", + "OnlineSecurity 0\n", + "OnlineBackup 0\n", + "DeviceProtection 0\n", + "TechSupport 0\n", + 
"StreamingTV 0\n", + "StreamingMovies 0\n", + "Contract 0\n", + "PaperlessBilling 0\n", + "PaymentMethod 0\n", + "MonthlyCharges 0\n", + "TotalCharges 0\n", + "Churn 0\n", + "dtype: int64" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.isnull().sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.205070Z", + "iopub.status.busy": "2021-11-09T03:52:43.203846Z", + "iopub.status.idle": "2021-11-09T03:52:43.233001Z", + "shell.execute_reply": "2021-11-09T03:52:43.231899Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.205022Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.duplicated().sum()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Basic Data Cleaning: \n", + "As we have already observered in above cell that Totalcharges is given as object datatype but it is float datatype. We will fix it here." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('O')" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['TotalCharges'].dtype" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.290044Z", + "iopub.status.busy": "2021-11-09T03:52:43.289662Z", + "iopub.status.idle": "2021-11-09T03:52:43.301523Z", + "shell.execute_reply": "2021-11-09T03:52:43.300033Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.289998Z" + } + }, + "outputs": [], + "source": [ + "df['TotalCharges'] = pd.to_numeric(df['TotalCharges'],errors = 'coerce')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dtype('float64')" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['TotalCharges'].dtype" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "categorical_features = [\n", + " \"gender\",\n", + " \"SeniorCitizen\",\n", + " \"Partner\",\n", + " \"Dependents\",\n", + " \"PhoneService\",\n", + " \"MultipleLines\",\n", + " \"InternetService\",\n", + " \"OnlineSecurity\",\n", + " \"OnlineBackup\",\n", + " \"DeviceProtection\",\n", + " \"TechSupport\",\n", + " \"StreamingTV\",\n", + " \"StreamingMovies\",\n", + " \"Contract\",\n", + " \"PaperlessBilling\",\n", + " \"PaymentMethod\",\n", + "]\n", + "numerical_features = [\"tenure\", \"MonthlyCharges\", \"TotalCharges\"]\n", + "target = \"Churn\"" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.235534Z", + "iopub.status.busy": "2021-11-09T03:52:43.234920Z", + "iopub.status.idle": "2021-11-09T03:52:43.262979Z", + 
"shell.execute_reply": "2021-11-09T03:52:43.261969Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.235471Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "SeniorCitizen 1.833633\n", + "tenure 0.239540\n", + "MonthlyCharges -0.220524\n", + "TotalCharges 0.961642\n", + "dtype: float64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.skew(numeric_only= True)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:52:43.269333Z", + "iopub.status.busy": "2021-11-09T03:52:43.268524Z", + "iopub.status.idle": "2021-11-09T03:52:43.287626Z", + "shell.execute_reply": "2021-11-09T03:52:43.286653Z", + "shell.execute_reply.started": "2021-11-09T03:52:43.269284Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>SeniorCitizen</th>\n", + " <th>tenure</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>SeniorCitizen</th>\n", + " <td>1.000000</td>\n", + " <td>0.016567</td>\n", + " <td>0.220173</td>\n", + " <td>0.102411</td>\n", + " </tr>\n", + " <tr>\n", + " <th>tenure</th>\n", + " <td>0.016567</td>\n", + " <td>1.000000</td>\n", + " <td>0.247900</td>\n", + " <td>0.825880</td>\n", + " </tr>\n", + " <tr>\n", + " <th>MonthlyCharges</th>\n", + " <td>0.220173</td>\n", + " <td>0.247900</td>\n", + " <td>1.000000</td>\n", + " <td>0.651065</td>\n", + " </tr>\n", + " <tr>\n", + " 
<th>TotalCharges</th>\n", + " <td>0.102411</td>\n", + " <td>0.825880</td>\n", + " <td>0.651065</td>\n", + " <td>1.000000</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " SeniorCitizen tenure MonthlyCharges TotalCharges\n", + "SeniorCitizen 1.000000 0.016567 0.220173 0.102411\n", + "tenure 0.016567 1.000000 0.247900 0.825880\n", + "MonthlyCharges 0.220173 0.247900 1.000000 0.651065\n", + "TotalCharges 0.102411 0.825880 0.651065 1.000000" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.corr(numeric_only= True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Feature distribution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We plot distributions for numerical and categorical features to check for outliers and compare feature distributions with target variable." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Numerical features distribution\n", + "\n", + "Numeric summarizing techniques (mean, standard deviation, etc.) don't show us spikes, shapes of distributions and it is hard to observe outliers with it. That is the reason we use histograms." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>tenure</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>count</th>\n", + " <td>7043.000000</td>\n", + " <td>7043.000000</td>\n", + " <td>7032.000000</td>\n", + " </tr>\n", + " <tr>\n", + " <th>mean</th>\n", + " <td>32.371149</td>\n", + " <td>64.761692</td>\n", + " <td>2283.300441</td>\n", + " </tr>\n", + " <tr>\n", + " <th>std</th>\n", + " <td>24.559481</td>\n", + " <td>30.090047</td>\n", + " <td>2266.771362</td>\n", + " </tr>\n", + " <tr>\n", + " <th>min</th>\n", + " <td>0.000000</td>\n", + " <td>18.250000</td>\n", + " <td>18.800000</td>\n", + " </tr>\n", + " <tr>\n", + " <th>25%</th>\n", + " <td>9.000000</td>\n", + " <td>35.500000</td>\n", + " <td>401.450000</td>\n", + " </tr>\n", + " <tr>\n", + " <th>50%</th>\n", + " <td>29.000000</td>\n", + " <td>70.350000</td>\n", + " <td>1397.475000</td>\n", + " </tr>\n", + " <tr>\n", + " <th>75%</th>\n", + " <td>55.000000</td>\n", + " <td>89.850000</td>\n", + " <td>3794.737500</td>\n", + " </tr>\n", + " <tr>\n", + " <th>max</th>\n", + " <td>72.000000</td>\n", + " <td>118.750000</td>\n", + " <td>8684.800000</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " tenure MonthlyCharges TotalCharges\n", + "count 7043.000000 7043.000000 7032.000000\n", + "mean 32.371149 64.761692 2283.300441\n", + "std 24.559481 30.090047 2266.771362\n", + "min 0.000000 
18.250000 18.800000\n", + "25% 9.000000 35.500000 401.450000\n", + "50% 29.000000 70.350000 1397.475000\n", + "75% 55.000000 89.850000 3794.737500\n", + "max 72.000000 118.750000 8684.800000" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[numerical_features].describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[<AxesSubplot: title={'center': 'tenure'}>,\n", + " <AxesSubplot: title={'center': 'MonthlyCharges'}>],\n", + " [<AxesSubplot: title={'center': 'TotalCharges'}>, <AxesSubplot: >]],\n", + " dtype=object)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x700 with 4 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "df[numerical_features].hist(bins=30, figsize=(10, 7))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We look at distributions of numerical features in relation to the target variable. We can observe that the greater TotalCharges and tenure are the less is the probability of churn." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([<AxesSubplot: title={'center': 'tenure'}>,\n", + " <AxesSubplot: title={'center': 'MonthlyCharges'}>,\n", + " <AxesSubplot: title={'center': 'TotalCharges'}>], dtype=object)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1400x400 with 3 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "fig, ax = plt.subplots(1, 3, figsize=(14, 4))\n", + "df[df.Churn == \"No\"][numerical_features].hist(bins=30, color=\"blue\", alpha=0.5, ax=ax)\n", + "df[df.Churn == \"Yes\"][numerical_features].hist(bins=30, color=\"red\", alpha=0.5, ax=ax)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Categorical feature distribution\n", + "\n", + "To analyze categorical features, we use bar charts. We observe that Senior citizens and customers without phone service are less represented in the data." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1900x1900 with 16 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "ROWS, COLS = 4, 4\n", + "fig, ax = plt.subplots(ROWS,COLS, figsize=(19,19))\n", + "row, col = 0, 0,\n", + "for i, categorical_feature in enumerate(categorical_features):\n", + " if col == COLS - 1:\n", + " row += 1\n", + " col = i % COLS\n", + " df[categorical_feature].value_counts().plot(kind='bar', ax=ax[row, col]).set_title(categorical_feature)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to look at categorical features in relation to the target variable. We do this only for contract feature. 
Users who have a month-to-month contract are more likely to churn than users with long-term contracts." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0.5, 1.0, 'churned')" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1200x400 with 2 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "feature = 'Contract'\n", + "fig, ax = plt.subplots(1, 2, figsize=(12, 4))\n", + "df[df.Churn == \"No\"][feature].value_counts().plot(kind='bar', ax=ax[0]).set_title('not churned')\n", + "df[df.Churn == \"Yes\"][feature].value_counts().plot(kind='bar', ax=ax[1]).set_title('churned')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Target variable distribution" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0.5, 1.0, 'churned')" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 640x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "df[target].value_counts().plot(kind='bar').set_title('churned')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The target variable distribution shows that we are dealing with an imbalanced problem: there are many more non-churned users than churned users. A model could reach high accuracy simply by predicting the majority class most of the time (users who didn't churn, in our example), so accuracy alone can be misleading.\n", + "\n", + "A few things we can do to minimize the influence of an imbalanced dataset:\n", + "- resample the data,\n", + "- collect more samples,\n", + "- use precision and recall as evaluation metrics instead of relying on accuracy."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Outlier Analysis with IQR Method" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:14.876626Z", + "iopub.status.busy": "2021-11-09T03:53:14.875430Z", + "iopub.status.idle": "2021-11-09T03:53:14.900303Z", + "shell.execute_reply": "2021-11-09T03:53:14.899071Z", + "shell.execute_reply.started": "2021-11-09T03:53:14.876576Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "No outliers in tenure\n", + "No outliers in MonthlyCharges\n" + ] + } + ], + "source": [ + "x = ['tenure','MonthlyCharges']\n", + "a = []  # columns in which outliers were found\n", + "def count_outliers(data, col):\n", + "    # first and third quartiles and the interquartile range of the column\n", + "    q1 = data[col].quantile(0.25, interpolation='nearest')\n", + "    q3 = data[col].quantile(0.75, interpolation='nearest')\n", + "    IQR = q3 - q1\n", + "    global LLP\n", + "    global ULP\n", + "    # lower and upper limit points: values outside [LLP, ULP] count as outliers\n", + "    LLP = q1 - 1.5*IQR\n", + "    ULP = q3 + 1.5*IQR\n", + "    if data[col].min() >= LLP and data[col].max() <= ULP:\n", + "        print(\"No outliers in\", col)\n", + "    else:\n", + "        print(\"There are outliers in\", col)\n", + "        n_low = data[data[col]<LLP][col].size\n", + "        n_high = data[data[col]>ULP][col].size\n", + "        a.append(col)\n", + "        print('Number of outliers:', n_low + n_high)\n", + "for i in x:\n", + "    count_outliers(df, i)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleaning and Transforming Data" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:14.902614Z", + "iopub.status.busy": "2021-11-09T03:53:14.902166Z", + "iopub.status.idle": "2021-11-09T03:53:14.911726Z", + "shell.execute_reply": "2021-11-09T03:53:14.910394Z", + "shell.execute_reply.started": "2021-11-09T03:53:14.902565Z" + } + }, + "outputs": [], +
"source": [ + "df.drop(['customerID'],axis = 1,inplace = True)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:14.914366Z", + "iopub.status.busy": "2021-11-09T03:53:14.914012Z", + "iopub.status.idle": "2021-11-09T03:53:14.952158Z", + "shell.execute_reply": "2021-11-09T03:53:14.951160Z", + "shell.execute_reply.started": "2021-11-09T03:53:14.914319Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>gender</th>\n", + " <th>SeniorCitizen</th>\n", + " <th>Partner</th>\n", + " <th>Dependents</th>\n", + " <th>tenure</th>\n", + " <th>PhoneService</th>\n", + " <th>MultipleLines</th>\n", + " <th>InternetService</th>\n", + " <th>OnlineSecurity</th>\n", + " <th>OnlineBackup</th>\n", + " <th>DeviceProtection</th>\n", + " <th>TechSupport</th>\n", + " <th>StreamingTV</th>\n", + " <th>StreamingMovies</th>\n", + " <th>Contract</th>\n", + " <th>PaperlessBilling</th>\n", + " <th>PaymentMethod</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " <th>Churn</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>Female</td>\n", + " <td>0</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>1</td>\n", + " <td>No</td>\n", + " <td>No phone service</td>\n", + " <td>DSL</td>\n", + " <td>No</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Electronic check</td>\n", + " <td>29.85</td>\n", + " 
<td>29.85</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>34</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>One year</td>\n", + " <td>No</td>\n", + " <td>Mailed check</td>\n", + " <td>56.95</td>\n", + " <td>1889.50</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>2</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Mailed check</td>\n", + " <td>53.85</td>\n", + " <td>108.15</td>\n", + " <td>Yes</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " <td>Male</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>45</td>\n", + " <td>No</td>\n", + " <td>No phone service</td>\n", + " <td>DSL</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>Yes</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>One year</td>\n", + " <td>No</td>\n", + " <td>Bank transfer (automatic)</td>\n", + " <td>42.30</td>\n", + " <td>1840.75</td>\n", + " <td>No</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>Female</td>\n", + " <td>0</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>2</td>\n", + " <td>Yes</td>\n", + " <td>No</td>\n", + " <td>Fiber optic</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>No</td>\n", + " <td>Month-to-month</td>\n", + " <td>Yes</td>\n", + " <td>Electronic check</td>\n", + " <td>70.70</td>\n", + " <td>151.65</td>\n", + " <td>Yes</td>\n", + " 
</tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " gender SeniorCitizen Partner Dependents tenure PhoneService \\\n", + "0 Female 0 Yes No 1 No \n", + "1 Male 0 No No 34 Yes \n", + "2 Male 0 No No 2 Yes \n", + "3 Male 0 No No 45 No \n", + "4 Female 0 No No 2 Yes \n", + "\n", + " MultipleLines InternetService OnlineSecurity OnlineBackup \\\n", + "0 No phone service DSL No Yes \n", + "1 No DSL Yes No \n", + "2 No DSL Yes Yes \n", + "3 No phone service DSL Yes No \n", + "4 No Fiber optic No No \n", + "\n", + " DeviceProtection TechSupport StreamingTV StreamingMovies Contract \\\n", + "0 No No No No Month-to-month \n", + "1 Yes No No No One year \n", + "2 No No No No Month-to-month \n", + "3 Yes Yes No No One year \n", + "4 No No No No Month-to-month \n", + "\n", + " PaperlessBilling PaymentMethod MonthlyCharges TotalCharges \\\n", + "0 Yes Electronic check 29.85 29.85 \n", + "1 No Mailed check 56.95 1889.50 \n", + "2 Yes Mailed check 53.85 108.15 \n", + "3 No Bank transfer (automatic) 42.30 1840.75 \n", + "4 Yes Electronic check 70.70 151.65 \n", + "\n", + " Churn \n", + "0 No \n", + "1 No \n", + "2 Yes \n", + "3 No \n", + "4 Yes " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### customerID is dropped because it is a unique identifier and is not needed for modeling" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### One-Hot Encoding" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:14.954613Z", + "iopub.status.busy": "2021-11-09T03:53:14.953998Z", + "iopub.status.idle": "2021-11-09T03:53:15.014837Z", + "shell.execute_reply": "2021-11-09T03:53:15.013920Z", + "shell.execute_reply.started": "2021-11-09T03:53:14.954564Z" + } + }, + "outputs": [], + "source": [ + "df1=pd.get_dummies(data=df,columns=['gender', 'Partner',
'Dependents', \n", + " 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',\n", + " 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',\n", + " 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'Churn'], drop_first=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>SeniorCitizen</th>\n", + " <th>tenure</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " <th>gender_Male</th>\n", + " <th>Partner_Yes</th>\n", + " <th>Dependents_Yes</th>\n", + " <th>PhoneService_Yes</th>\n", + " <th>MultipleLines_No phone service</th>\n", + " <th>MultipleLines_Yes</th>\n", + " <th>...</th>\n", + " <th>StreamingTV_Yes</th>\n", + " <th>StreamingMovies_No internet service</th>\n", + " <th>StreamingMovies_Yes</th>\n", + " <th>Contract_One year</th>\n", + " <th>Contract_Two year</th>\n", + " <th>PaperlessBilling_Yes</th>\n", + " <th>PaymentMethod_Credit card (automatic)</th>\n", + " <th>PaymentMethod_Electronic check</th>\n", + " <th>PaymentMethod_Mailed check</th>\n", + " <th>Churn_Yes</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>29.85</td>\n", + " <td>29.85</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", 
+ " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>0</td>\n", + " <td>34</td>\n", + " <td>56.95</td>\n", + " <td>1889.50</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>0</td>\n", + " <td>2</td>\n", + " <td>53.85</td>\n", + " <td>108.15</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>1</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " <td>0</td>\n", + " <td>45</td>\n", + " <td>42.30</td>\n", + " <td>1840.75</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>0</td>\n", + " <td>2</td>\n", + " <td>70.70</td>\n", + " <td>151.65</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "<p>5 rows × 31 columns</p>\n", + "</div>" + 
], + "text/plain": [ + " SeniorCitizen tenure MonthlyCharges TotalCharges gender_Male \\\n", + "0 0 1 29.85 29.85 0 \n", + "1 0 34 56.95 1889.50 1 \n", + "2 0 2 53.85 108.15 1 \n", + "3 0 45 42.30 1840.75 1 \n", + "4 0 2 70.70 151.65 0 \n", + "\n", + " Partner_Yes Dependents_Yes PhoneService_Yes \\\n", + "0 1 0 0 \n", + "1 0 0 1 \n", + "2 0 0 1 \n", + "3 0 0 0 \n", + "4 0 0 1 \n", + "\n", + " MultipleLines_No phone service MultipleLines_Yes ... StreamingTV_Yes \\\n", + "0 1 0 ... 0 \n", + "1 0 0 ... 0 \n", + "2 0 0 ... 0 \n", + "3 1 0 ... 0 \n", + "4 0 0 ... 0 \n", + "\n", + " StreamingMovies_No internet service StreamingMovies_Yes \\\n", + "0 0 0 \n", + "1 0 0 \n", + "2 0 0 \n", + "3 0 0 \n", + "4 0 0 \n", + "\n", + " Contract_One year Contract_Two year PaperlessBilling_Yes \\\n", + "0 0 0 1 \n", + "1 1 0 0 \n", + "2 0 0 1 \n", + "3 1 0 0 \n", + "4 0 0 1 \n", + "\n", + " PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check \\\n", + "0 0 1 \n", + "1 0 0 \n", + "2 0 0 \n", + "3 0 0 \n", + "4 0 1 \n", + "\n", + " PaymentMethod_Mailed check Churn_Yes \n", + "0 0 0 \n", + "1 1 0 \n", + "2 1 1 \n", + "3 0 0 \n", + "4 0 1 \n", + "\n", + "[5 rows x 31 columns]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',\n", + " 'gender_Male', 'Partner_Yes', 'Dependents_Yes', 'PhoneService_Yes',\n", + " 'MultipleLines_No phone service', 'MultipleLines_Yes',\n", + " 'InternetService_Fiber optic', 'InternetService_No',\n", + " 'OnlineSecurity_No internet service', 'OnlineSecurity_Yes',\n", + " 'OnlineBackup_No internet service', 'OnlineBackup_Yes',\n", + " 'DeviceProtection_No internet service', 'DeviceProtection_Yes',\n", + " 'TechSupport_No internet service', 
'TechSupport_Yes',\n", + " 'StreamingTV_No internet service', 'StreamingTV_Yes',\n", + " 'StreamingMovies_No internet service', 'StreamingMovies_Yes',\n", + " 'Contract_One year', 'Contract_Two year', 'PaperlessBilling_Yes',\n", + " 'PaymentMethod_Credit card (automatic)',\n", + " 'PaymentMethod_Electronic check', 'PaymentMethod_Mailed check',\n", + " 'Churn_Yes'],\n", + " dtype='object')" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Rearranging Columns" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "_kg_hide-input": true, + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.018322Z", + "iopub.status.busy": "2021-11-09T03:53:15.017423Z", + "iopub.status.idle": "2021-11-09T03:53:15.028617Z", + "shell.execute_reply": "2021-11-09T03:53:15.027469Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.018273Z" + } + }, + "outputs": [], + "source": [ + "df1 = df1[['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',\n", + " 'gender_Male', 'Partner_Yes', 'Dependents_Yes',\n", + " 'PhoneService_Yes', 'MultipleLines_No phone service',\n", + " 'MultipleLines_Yes', 'InternetService_Fiber optic',\n", + " 'InternetService_No', 'OnlineSecurity_No internet service',\n", + " 'OnlineSecurity_Yes', 'OnlineBackup_No internet service',\n", + " 'OnlineBackup_Yes', 'DeviceProtection_No internet service',\n", + " 'DeviceProtection_Yes', 'TechSupport_No internet service',\n", + " 'TechSupport_Yes', 'StreamingTV_No internet service', 'StreamingTV_Yes',\n", + " 'StreamingMovies_No internet service', 'StreamingMovies_Yes',\n", + " 'Contract_One year', 'Contract_Two year', 'PaperlessBilling_Yes',\n", + " 'PaymentMethod_Credit card (automatic)',\n", + " 'PaymentMethod_Electronic check', 'PaymentMethod_Mailed check','Churn_Yes']]" + ] + }, + { + "cell_type": "code", + "execution_count": 
31, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.031710Z", + "iopub.status.busy": "2021-11-09T03:53:15.030868Z", + "iopub.status.idle": "2021-11-09T03:53:15.064625Z", + "shell.execute_reply": "2021-11-09T03:53:15.063618Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.031661Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>SeniorCitizen</th>\n", + " <th>tenure</th>\n", + " <th>MonthlyCharges</th>\n", + " <th>TotalCharges</th>\n", + " <th>gender_Male</th>\n", + " <th>Partner_Yes</th>\n", + " <th>Dependents_Yes</th>\n", + " <th>PhoneService_Yes</th>\n", + " <th>MultipleLines_No phone service</th>\n", + " <th>MultipleLines_Yes</th>\n", + " <th>...</th>\n", + " <th>StreamingTV_Yes</th>\n", + " <th>StreamingMovies_No internet service</th>\n", + " <th>StreamingMovies_Yes</th>\n", + " <th>Contract_One year</th>\n", + " <th>Contract_Two year</th>\n", + " <th>PaperlessBilling_Yes</th>\n", + " <th>PaymentMethod_Credit card (automatic)</th>\n", + " <th>PaymentMethod_Electronic check</th>\n", + " <th>PaymentMethod_Mailed check</th>\n", + " <th>Churn_Yes</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>29.85</td>\n", + " <td>29.85</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " 
<td>0</td>\n", + " <td>0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>0</td>\n", + " <td>34</td>\n", + " <td>56.95</td>\n", + " <td>1889.50</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>0</td>\n", + " <td>2</td>\n", + " <td>53.85</td>\n", + " <td>108.15</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>1</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " <td>0</td>\n", + " <td>45</td>\n", + " <td>42.30</td>\n", + " <td>1840.75</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>0</td>\n", + " <td>2</td>\n", + " <td>70.70</td>\n", + " <td>151.65</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>...</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " <td>0</td>\n", + " <td>1</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "<p>5 rows × 31 columns</p>\n", + "</div>" + ], + "text/plain": [ + " SeniorCitizen 
tenure MonthlyCharges TotalCharges gender_Male \\\n", + "0 0 1 29.85 29.85 0 \n", + "1 0 34 56.95 1889.50 1 \n", + "2 0 2 53.85 108.15 1 \n", + "3 0 45 42.30 1840.75 1 \n", + "4 0 2 70.70 151.65 0 \n", + "\n", + " Partner_Yes Dependents_Yes PhoneService_Yes \\\n", + "0 1 0 0 \n", + "1 0 0 1 \n", + "2 0 0 1 \n", + "3 0 0 0 \n", + "4 0 0 1 \n", + "\n", + " MultipleLines_No phone service MultipleLines_Yes ... StreamingTV_Yes \\\n", + "0 1 0 ... 0 \n", + "1 0 0 ... 0 \n", + "2 0 0 ... 0 \n", + "3 1 0 ... 0 \n", + "4 0 0 ... 0 \n", + "\n", + " StreamingMovies_No internet service StreamingMovies_Yes \\\n", + "0 0 0 \n", + "1 0 0 \n", + "2 0 0 \n", + "3 0 0 \n", + "4 0 0 \n", + "\n", + " Contract_One year Contract_Two year PaperlessBilling_Yes \\\n", + "0 0 0 1 \n", + "1 1 0 0 \n", + "2 0 0 1 \n", + "3 1 0 0 \n", + "4 0 0 1 \n", + "\n", + " PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check \\\n", + "0 0 1 \n", + "1 0 0 \n", + "2 0 0 \n", + "3 0 0 \n", + "4 0 1 \n", + "\n", + " PaymentMethod_Mailed check Churn_Yes \n", + "0 0 0 \n", + "1 1 0 \n", + "2 1 1 \n", + "3 0 0 \n", + "4 0 1 \n", + "\n", + "[5 rows x 31 columns]" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(7043, 31)" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.067076Z", + "iopub.status.busy": "2021-11-09T03:53:15.066454Z", + "iopub.status.idle": "2021-11-09T03:53:15.080022Z", + "shell.execute_reply": "2021-11-09T03:53:15.078954Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.067027Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.impute import 
SimpleImputer\n", + "\n", + "# The imputer will replace missing values with the mean of the non-missing values for the respective columns\n", + "\n", + "imputer = SimpleImputer(missing_values=np.nan, strategy=\"mean\")\n", + "\n", + "df1.TotalCharges = imputer.fit_transform(df1[\"TotalCharges\"].values.reshape(-1, 1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Feature Scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.082462Z", + "iopub.status.busy": "2021-11-09T03:53:15.082111Z", + "iopub.status.idle": "2021-11-09T03:53:15.103525Z", + "shell.execute_reply": "2021-11-09T03:53:15.102463Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.082399Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.preprocessing import StandardScaler\n", + "scaler = StandardScaler()" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "scaler.fit(df1.drop(['Churn_Yes'],axis = 1))\n", + "scaled_features = scaler.transform(df1.drop('Churn_Yes',axis = 1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Feature Selection" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.106000Z", + "iopub.status.busy": "2021-11-09T03:53:15.105329Z", + "iopub.status.idle": "2021-11-09T03:53:15.116525Z", + "shell.execute_reply": "2021-11-09T03:53:15.115285Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.105952Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X = scaled_features\n", + "Y = df1['Churn_Yes']\n", + "X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.3,random_state=44)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prediction using Logistic Regression" + ] + }, + 
{ + "cell_type": "code", + "execution_count": 37, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.228616Z", + "iopub.status.busy": "2021-11-09T03:53:42.227007Z", + "iopub.status.idle": "2021-11-09T03:53:42.319319Z", + "shell.execute_reply": "2021-11-09T03:53:42.318141Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.228565Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "<style>#sk-container-id-1 {color: black;background-color: white;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: 
absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 
div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>LogisticRegression()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div class=\"sk-toggleable__content\"><pre>LogisticRegression()</pre></div></div></div></div></div>" + ], + "text/plain": [ + "LogisticRegression()" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.metrics import classification_report,accuracy_score ,confusion_matrix\n", + "\n", + "logmodel = LogisticRegression()\n", + "logmodel.fit(X_train,Y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.328549Z", + "iopub.status.busy": "2021-11-09T03:53:42.325493Z", + "iopub.status.idle": "2021-11-09T03:53:42.338505Z", + "shell.execute_reply": "2021-11-09T03:53:42.337265Z", + "shell.execute_reply.started": 
"2021-11-09T03:53:42.328497Z" + } + }, + "outputs": [], + "source": [ + "predLR = logmodel.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "predLR" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5616 0\n", + "2937 0\n", + "1355 0\n", + "5441 1\n", + "3333 0\n", + " ..\n", + "2797 1\n", + "412 0\n", + "174 0\n", + "5761 0\n", + "5895 0\n", + "Name: Churn_Yes, Length: 2113, dtype: uint8" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Y_test" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.348885Z", + "iopub.status.busy": "2021-11-09T03:53:42.344785Z", + "iopub.status.idle": "2021-11-09T03:53:42.381860Z", + "shell.execute_reply": "2021-11-09T03:53:42.380863Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.348824Z" + }, + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " precision recall f1-score support\n", + "\n", + " 0 0.84 0.90 0.87 1557\n", + " 1 0.65 0.53 0.58 556\n", + "\n", + " accuracy 0.80 2113\n", + " macro avg 0.74 0.71 0.73 2113\n", + "weighted avg 0.79 0.80 0.79 2113\n", + "\n" + ] + } + ], + "source": [ + "print(classification_report(Y_test, predLR))" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1200x400 with 4 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# calculate the classification report\n", + "report = classification_report(Y_test, 
predLR, target_names=['Churn_No', 'Churn_Yes'])\n", + "\n", + "# split the report into lines\n", + "lines = report.split('\\n')\n", + "\n", + "# split each line into parts\n", + "parts = [line.split() for line in lines[2:-5]]\n", + "\n", + "# extract the metrics for each class\n", + "class_metrics = dict()\n", + "for part in parts:\n", + " class_metrics[part[0]] = {'precision': float(part[1]), 'recall': float(part[2]), 'f1-score': float(part[3]), 'support': int(part[4])}\n", + "\n", + "# create a bar chart for each metric\n", + "fig, ax = plt.subplots(1, 4, figsize=(12, 4))\n", + "metrics = ['precision', 'recall', 'f1-score', 'support']\n", + "for i, metric in enumerate(metrics):\n", + " ax[i].bar(class_metrics.keys(), [class_metrics[key][metric] for key in class_metrics.keys()])\n", + " ax[i].set_title(metric)\n", + "\n", + "# display the plot\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "confusion_matrix_LR = confusion_matrix(Y_test, predLR)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 480x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# create a heatmap of the matrix using matshow()\n", + "\n", + "plt.matshow(confusion_matrix(Y_test, predLR))\n", + "\n", + "# add labels for the x and y axes\n", + "plt.xlabel('Predicted Class')\n", + "plt.ylabel('Actual Class')\n", + "\n", + "for i in range(2):\n", + " for j in range(2):\n", + " plt.text(j, i, confusion_matrix_LR[i, j], ha='center', va='center')\n", + "\n", + "\n", + "# Add custom labels for x and y ticks\n", + "plt.xticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.yticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "execution": { + 
"iopub.execute_input": "2021-11-09T03:53:42.390863Z", + "iopub.status.busy": "2021-11-09T03:53:42.388123Z", + "iopub.status.idle": "2021-11-09T03:53:42.405849Z", + "shell.execute_reply": "2021-11-09T03:53:42.404464Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.390782Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8062880324543611" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "logmodel.score(X_train, Y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8002839564600095" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "accuracy_score(Y_test, predLR)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prediction using Support Vector Classifier" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.527574Z", + "iopub.status.busy": "2021-11-09T03:53:42.526756Z", + "iopub.status.idle": "2021-11-09T03:53:43.842686Z", + "shell.execute_reply": "2021-11-09T03:53:43.841678Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.527527Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.svm import SVC\n", + "\n", + "svc = SVC()\n", + "svc.fit(X_train, Y_train)\n", + "y_pred_svc = svc.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:43.862493Z", + "iopub.status.busy": "2021-11-09T03:53:43.861822Z", + "iopub.status.idle": "2021-11-09T03:53:43.877207Z", + "shell.execute_reply": "2021-11-09T03:53:43.876226Z", + "shell.execute_reply.started": "2021-11-09T03:53:43.862445Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " precision recall f1-score support\n", 
+ "\n", + " 0 0.83 0.92 0.87 1557\n", + " 1 0.67 0.48 0.56 556\n", + "\n", + " accuracy 0.80 2113\n", + " macro avg 0.75 0.70 0.71 2113\n", + "weighted avg 0.79 0.80 0.79 2113\n", + "\n" + ] + } + ], + "source": [ + "print(classification_report(Y_test, y_pred_svc))" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:43.844696Z", + "iopub.status.busy": "2021-11-09T03:53:43.844279Z", + "iopub.status.idle": "2021-11-09T03:53:43.858729Z", + "shell.execute_reply": "2021-11-09T03:53:43.857478Z", + "shell.execute_reply.started": "2021-11-09T03:53:43.844652Z" + } + }, + "outputs": [], + "source": [ + "confusion_matrix_svc = confusion_matrix(Y_test, y_pred_svc)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n",
+ "text/plain": [ + "<Figure size 480x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# create a heatmap of the matrix using matshow()\n", + "\n", + "plt.matshow(confusion_matrix_svc)\n", + "\n", + "# add labels for the x and y axes\n", + "plt.xlabel('Predicted Class')\n", +
"plt.ylabel('Actual Class')\n", + "\n", + "for i in range(2):\n", + " for j in range(2):\n", + " plt.text(j, i, confusion_matrix_svc[i, j], ha='center', va='center')\n", + "\n", + " \n", + "# Add custom labels for x and y ticks\n", + "plt.xticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.yticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8170385395537525" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "svc.score(X_train,Y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:43.879144Z", + "iopub.status.busy": "2021-11-09T03:53:43.878814Z", + "iopub.status.idle": "2021-11-09T03:53:43.885927Z", + "shell.execute_reply": "2021-11-09T03:53:43.884870Z", + "shell.execute_reply.started": "2021-11-09T03:53:43.879102Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8012304779933743" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "accuracy_score(Y_test, y_pred_svc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prediction using Decision Tree Classifier" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.414719Z", + "iopub.status.busy": "2021-11-09T03:53:42.412027Z", + "iopub.status.idle": "2021-11-09T03:53:42.465457Z", + "shell.execute_reply": "2021-11-09T03:53:42.464395Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.414670Z" + } + }, + "outputs": [], + "source": [ + "from sklearn.tree import DecisionTreeClassifier\n", + "\n", + "dtc = DecisionTreeClassifier()\n", + "\n", + "dtc.fit(X_train, Y_train)\n", + "y_pred_dtc = dtc.predict(X_test)" + ] + }, + { + 
"cell_type": "code", + "execution_count": 54, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.485884Z", + "iopub.status.busy": "2021-11-09T03:53:42.485243Z", + "iopub.status.idle": "2021-11-09T03:53:42.506139Z", + "shell.execute_reply": "2021-11-09T03:53:42.505038Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.485837Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " precision recall f1-score support\n", + "\n", + " 0 0.81 0.80 0.81 1557\n", + " 1 0.47 0.48 0.47 556\n", + "\n", + " accuracy 0.72 2113\n", + " macro avg 0.64 0.64 0.64 2113\n", + "weighted avg 0.72 0.72 0.72 2113\n", + "\n" + ] + } + ], + "source": [ + "print(classification_report(Y_test, y_pred_dtc))" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.468239Z", + "iopub.status.busy": "2021-11-09T03:53:42.467658Z", + "iopub.status.idle": "2021-11-09T03:53:42.483494Z", + "shell.execute_reply": "2021-11-09T03:53:42.482335Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.468197Z" + } + }, + "outputs": [], + "source": [ + "confusion_matrix_dtc = confusion_matrix(Y_test, y_pred_dtc)" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 480x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# create a heatmap of the matrix using matshow()\n", + "\n", + "plt.matshow(confusion_matrix_dtc)\n", + "\n", + "# add labels for the x and y axes\n", + "plt.xlabel('Predicted Class')\n", + "plt.ylabel('Actual Class')\n", + "\n", + "for i in range(2):\n", + " for j in range(2):\n", + " plt.text(j, i, confusion_matrix_dtc[i, j], ha='center', va='center')\n", + "\n", + "\n", + "# Add custom labels for x and y ticks\n", + "plt.xticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + 
"plt.yticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9987829614604462" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dtc.score(X_train,Y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:42.512579Z", + "iopub.status.busy": "2021-11-09T03:53:42.511696Z", + "iopub.status.idle": "2021-11-09T03:53:42.524237Z", + "shell.execute_reply": "2021-11-09T03:53:42.523090Z", + "shell.execute_reply.started": "2021-11-09T03:53:42.512525Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.718409843823947" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "accuracy_score(Y_test, y_pred_dtc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prediction using KNN Classifier" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.119418Z", + "iopub.status.busy": "2021-11-09T03:53:15.118718Z", + "iopub.status.idle": "2021-11-09T03:53:15.188313Z", + "shell.execute_reply": "2021-11-09T03:53:15.187419Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.119360Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "<style>#sk-container-id-2 {color: black;background-color: white;}#sk-container-id-2 pre{padding: 0;}#sk-container-id-2 div.sk-toggleable {background-color: white;}#sk-container-id-2 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-2 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: 
#696969;}#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-2 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-2 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-2 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-2 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-2 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-2 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-2 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-2 div.sk-item {position: 
relative;z-index: 1;}#sk-container-id-2 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-2 div.sk-item::before, #sk-container-id-2 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-2 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-2 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-2 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-2 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-2 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-2 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-2 div.sk-label-container {text-align: center;}#sk-container-id-2 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-2 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KNeighborsClassifier(n_neighbors=30)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. 
<br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" checked><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">KNeighborsClassifier</label><div class=\"sk-toggleable__content\"><pre>KNeighborsClassifier(n_neighbors=30)</pre></div></div></div></div></div>" + ], + "text/plain": [ + "KNeighborsClassifier(n_neighbors=30)" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.neighbors import KNeighborsClassifier\n", + "\n", + "knn = KNeighborsClassifier(n_neighbors = 30)\n", + "knn.fit(X_train,Y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.190286Z", + "iopub.status.busy": "2021-11-09T03:53:15.189853Z", + "iopub.status.idle": "2021-11-09T03:53:15.800866Z", + "shell.execute_reply": "2021-11-09T03:53:15.799696Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.190238Z" + } + }, + "outputs": [], + "source": [ + "pred_knn = knn.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.840171Z", + "iopub.status.busy": "2021-11-09T03:53:15.839811Z", + "iopub.status.idle": "2021-11-09T03:53:40.333004Z", + "shell.execute_reply": "2021-11-09T03:53:40.332162Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.840125Z" + } + }, + "outputs": [], + "source": [ + "error_rate= []\n", + "for i in range(1,40):\n", + " knn = KNeighborsClassifier(n_neighbors = i)\n", + " knn.fit(X_train,Y_train)\n", + " pred_i = knn.predict(X_test)\n", + " error_rate.append(np.mean(pred_i != Y_test))" + ] + }, + { + 
"cell_type": "code", + "execution_count": 62, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:40.334926Z", + "iopub.status.busy": "2021-11-09T03:53:40.334639Z", + "iopub.status.idle": "2021-11-09T03:53:40.729899Z", + "shell.execute_reply": "2021-11-09T03:53:40.728891Z", + "shell.execute_reply.started": "2021-11-09T03:53:40.334874Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'Error Rate')" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x600 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize = (10,6))\n", + "plt.plot(range(1,40),error_rate,color = 'blue',linestyle = '--',marker = 'o',markerfacecolor='red',markersize = 10)\n", + "plt.title('Error Rate vs K')\n", + "plt.xlabel('K')\n", + "plt.ylabel('Error Rate')" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.820436Z", + "iopub.status.busy": "2021-11-09T03:53:15.820173Z", + "iopub.status.idle": "2021-11-09T03:53:15.838086Z", + "shell.execute_reply": "2021-11-09T03:53:15.837096Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.820382Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " precision recall f1-score support\n", + "\n", + " 0 0.84 0.88 0.86 1557\n", + " 1 0.62 0.55 0.58 556\n", + "\n", + " accuracy 0.79 2113\n", + " macro avg 0.73 0.71 0.72 2113\n", + "weighted avg 0.79 0.79 0.79 2113\n", + "\n" + ] + } + ], + "source": [ + "print(classification_report(Y_test,pred_knn))" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:15.803343Z", + "iopub.status.busy": "2021-11-09T03:53:15.803004Z", + "iopub.status.idle": "2021-11-09T03:53:15.818621Z", + 
"shell.execute_reply": "2021-11-09T03:53:15.817622Z", + "shell.execute_reply.started": "2021-11-09T03:53:15.803297Z" + } + }, + "outputs": [], + "source": [ + "confusion_matrix_knn = confusion_matrix(Y_test,pred_knn)" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 480x480 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# create a heatmap of the matrix using matshow()\n", + "\n", + "plt.matshow(confusion_matrix_knn)\n", + "\n", + "# add labels for the x and y axes\n", + "plt.xlabel('Predicted Class')\n", + "plt.ylabel('Actual Class')\n", + "\n", + "for i in range(2):\n", + " for j in range(2):\n", + " plt.text(j, i, confusion_matrix_knn[i, j], ha='center', va='center')\n", + "\n", + "# Add custom labels for x and y ticks\n", + "plt.xticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.yticks([0, 1], [\"Not Churned\", \"Churned\"])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8008113590263691" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "knn.score(X_train,Y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": { + "execution": { + "iopub.execute_input": "2021-11-09T03:53:40.732823Z", + "iopub.status.busy": "2021-11-09T03:53:40.731412Z", + "iopub.status.idle": "2021-11-09T03:53:42.225267Z", + "shell.execute_reply": "2021-11-09T03:53:42.224304Z", + "shell.execute_reply.started": "2021-11-09T03:53:40.732768Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.792238523426408" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "accuracy_score(Y_test, pred_knn)" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "# Conclusion\n", + "Thank you for sticking with me until the end. If you want to go further with this dataset, you can try other classification models such as AdaBoost, Gradient Boosting, Stochastic Gradient Boosting (SGB), CatBoost, and XGBoost. You can also tune each model's hyperparameters with techniques like GridSearchCV. I am not covering those topics here, but feel free to explore them." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} |
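The conclusion mentions GridSearchCV as a follow-up. A minimal sketch of how it could pick `n_neighbors` for the KNN model is below. It is not part of the notebook: the data is a synthetic stand-in generated with `make_classification` (the notebook's own `X_train`, `Y_train` are assumed to come from the Telco dataset), and the class weights, sample size, and 5-fold setting are illustrative assumptions. Unlike the error-rate loop above, which scores each `k` against `X_test`, the search here uses only cross-validation on the training split, so the test set stays untouched until the final score. Also note that the loop rebinds `knn`, so if the cells run in order, the later `knn.score(X_train,Y_train)` likely reflects the last model fitted in the loop (`n_neighbors=39`) rather than the `n_neighbors=30` model behind `pred_knn`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the churn features (assumption: roughly 27% positives).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.73, 0.27], random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Search the same range of k as the notebook's loop, but score each candidate
# with 5-fold cross-validation on the training split only.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 40))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, Y_train)

best_k = search.best_params_["n_neighbors"]
best_knn = search.best_estimator_  # refit on the full training split

# The held-out test set is used once, for the final estimate.
test_accuracy = best_knn.score(X_test, Y_test)
print("best k:", best_k)
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the tuned model in its own name (`best_knn`) avoids overwriting an earlier fitted estimator, so later cells that call `.score` or `.predict` refer to the model you intend.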
