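From ff372087d9554c42840e87c6d80c55bd325f30d5 Mon Sep 17 00:00:00 2001 From: pmqvrsp4i <2522769846@qq.com> Date: Tue, 10 May 2022 23:06:00 +0800 Subject: [PATCH] ADD file via upload --- Kaggle_tatinic_logistic .ipynb | 3009 ++++++++++++++++++++++++++++++++ 1 file changed, 3009 insertions(+) create mode 100644 Kaggle_tatinic_logistic .ipynb diff --git a/Kaggle_tatinic_logistic .ipynb b/Kaggle_tatinic_logistic .ipynb new file mode 100644 index 0000000..307965e --- /dev/null +++ b/Kaggle_tatinic_logistic .ipynb @@ -0,0 +1,3009 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "2D3940F0AF89478480A8B8B999B95629", + "mdEditEnable": false + }, + "source": [ + "# Kaggle: The Titanic Disaster\n", + "\n", + "\n", + "\n", + "\n", + "### About the Titanic disaster\n", + "\n", + "\n", + "- Background of the Titanic problem\n", + "\n", + "\t- This is the familiar 'Jack and Rose' story: the luxury liner went down and everyone scrambled to escape, but there were not enough lifeboats for everyone. The first officer gave the order 'ladies and kids first!', so survival was not random; it followed a priority order based on each passenger's circumstances.\n", + "\n", + "\t- The training data contains some passengers' personal information together with whether they survived; the test data contains the same personal information without the survival status. The goal is to build a suitable model from the training data and use it to predict whether the other passengers survived.\n", + "\n", + "\t- This is a binary classification problem, which falls within the scope of the logistic regression we discussed earlier.\n", + "\t\n", + "## Notes\n", + "\n", + "- 'There is more than one way, and more than one line of thinking, to solve a problem.'\n", + "- 'No machine learning algorithm is simply better or worse than another, and none delivers the best performance everywhere; there are only algorithms that suit a particular scenario, dataset, and set of features better.'\n", + "\n", + "## How to approach it\n", + "Andrew Ng seems to have said on Coursera that when applying machine learning you should never try to be perfect from the start. Build a baseline model quickly, then analyze and improve it step by step. Those follow-up steps may include analyzing the model's current state (underfitting or overfitting), measuring how much each feature contributes, performing feature selection, and examining the model's bad cases and their causes.\n", + "\n", + "Top Kaggle competitors have also shared some experience:\n", + "\n", + "- 'Understanding the data is crucial!'\n", + "- 'Analyzing and handling special points and outliers in the data is crucial!'\n", + "- 'Feature engineering is crucial!'\n", + "- 'Do model ensembling!'\n", + "\n", + "## A first look at the data\n", + "\n", + "pandas is a widely used Python data-processing library. It reads a CSV file into a DataFrame; in the IPython notebook, data_train looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "DFC95DE04483465083C0140B0D735524", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "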
" + ], + "text/plain": [ + " PassengerId Survived Pclass \\\n", + "0 1 0 3 \n", + "1 2 1 1 \n", + "2 3 1 3 \n", + "3 4 1 1 \n", + "4 5 0 3 \n", + "\n", + " Name Sex Age SibSp \\\n", + "0 Braund, Mr. Owen Harris male 22.0 1 \n", + "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", + "2 Heikkinen, Miss. Laina female 26.0 0 \n", + "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", + "4 Allen, Mr. William Henry male 35.0 0 \n", + "\n", + " Parch Ticket Fare Cabin Embarked \n", + "0 0 A/5 21171 7.2500 NaN S \n", + "1 0 PC 17599 71.2833 C85 C \n", + "2 0 STON/O2. 3101282 7.9250 NaN S \n", + "3 0 113803 53.1000 C123 S \n", + "4 0 373450 8.0500 NaN S " + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import warnings\n", + "warnings.filterwarnings(\"ignore\", message=\"numpy.dtype size changed\")\n", + "warnings.filterwarnings(\"ignore\", message=\"numpy.ufunc size changed\")\n", + "#https://stackoverflow.com/q/40845304/10704205\n", + "#Ignore RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility\n", + "\n", + "import pandas as pd # data analysis\n", + "import numpy as np # scientific computing\n", + "from pandas import Series,DataFrame\n", + "\n", + "data_train = pd.read_csv('train.csv',engine = 'python',encoding='UTF-8')\n", + "data_train.head() # first 5 rows, returned as a DataFrame" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0C9407007EEC48D78DEA889D00BFFFB5", + "mdEditEnable": false + }, + "source": [ + "\n", + "There are 12 columns in total. The Survived field indicates whether the passenger survived; the remaining columns hold the passenger's personal information:\n", + "\n", + "- PassengerId => passenger ID\n", + "- Pclass => passenger class (1st/2nd/3rd class)\n", + "- Name => passenger name\n", + "- Sex => sex\n", + "- Age => age\n", + "- SibSp => number of siblings/spouses aboard\n", + "- Parch => number of parents/children aboard\n", + "- Ticket => ticket number\n", + "- Fare => ticket fare\n", + "- Cabin => cabin number\n", + "- Embarked => port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)\n", + "\n", + "Next, let the DataFrame describe itself:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": 
"878F32156AC54EFD832D9C6E816038C8", + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 891 entries, 0 to 890\n", + "Data columns (total 12 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 PassengerId 891 non-null int64 \n", + " 1 Survived 891 non-null int64 \n", + " 2 Pclass 891 non-null int64 \n", + " 3 Name 891 non-null object \n", + " 4 Sex 891 non-null object \n", + " 5 Age 714 non-null float64\n", + " 6 SibSp 891 non-null int64 \n", + " 7 Parch 891 non-null int64 \n", + " 8 Ticket 891 non-null object \n", + " 9 Fare 891 non-null float64\n", + " 10 Cabin 204 non-null object \n", + " 11 Embarked 889 non-null object \n", + "dtypes: float64(2), int64(5), object(5)\n", + "memory usage: 83.7+ KB\n" + ] + } + ], + "source": [ + "data_train.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2E77A7E73003473DBE95365B98118688", + "mdEditEnable": false + }, + "source": [ + "The output above shows that the training data contains 891 passengers in total, but some attributes are incomplete, for example:\n", + "\n", + "- Age is recorded for only 714 passengers\n", + "- Cabin is known for only 204 passengers\n", + "- Embarked is missing for 2 passengers (889 non-null)\n", + "\n", + "To look at the actual values, the method below gives summary statistics for the numeric columns. Text attributes such as Name and categorical attributes such as Embarked do not appear in its output:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "5DDB3F0E16054E048EA1C5EDC1BB8C6D", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PassengerId Survived Pclass Age SibSp \\\n", + "count 891.000000 891.000000 891.000000 714.000000 891.000000 \n", + "mean 446.000000 0.383838 2.308642 29.699118 0.523008 \n", + "std 257.353842 0.486592 0.836071 14.526497 1.102743 \n", + "min 1.000000 0.000000 1.000000 0.420000 0.000000 \n", + "25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n", + "50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n", + "75% 668.500000 1.000000 3.000000 38.000000 1.000000 \n", + "max 891.000000 1.000000 3.000000 80.000000 8.000000 \n", + "\n", + " Parch Fare \n", + "count 891.000000 891.000000 \n", + "mean 0.381594 32.204208 \n", + "std 0.806057 49.693429 \n", + "min 0.000000 0.000000 \n", + "25% 0.000000 7.910400 \n", + "50% 0.000000 14.454200 \n", + "75% 0.000000 31.000000 \n", + "max 6.000000 512.329200 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_train.describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B6D0B24DA7CD46538040DB87EFDEA1BC", + "mdEditEnable": false + }, + "source": [ + "The mean row shows that about 38.4% of passengers (0.383838) survived. It also shows that 2nd- and 3rd-class passengers outnumber 1st-class passengers (the mean Pclass is about 2.31). The average passenger age is about 29.7 years (missing ages are skipped in this calculation), and so on." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7AAF8DDD496A4E429B47B60927BC95D5", + "mdEditEnable": false + }, + "source": [ + "## Preliminary data analysis\n", + "\n", + "**- 'Understanding the data is crucial!'**\n", + "\n", + "**- 'Understanding the data is crucial!'**\n", + "\n", + "**- 'Understanding the data is crucial!'**\n", + "\n", + "The overview above still doesn't give us ideas or a direction on its own. Let's dig deeper into the data and look at how each attribute, or combination of attributes, relates to the final Survived outcome.\n", + "\n", + "### Distribution of passenger attributes" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "7D84F87CCD8D4DECA515BA51E3931BDF", + "scrolled": false + }, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAaIAAAEaCAYAAABTklN3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAABpyklEQVR4nO2dd3wVRdeAn5NGCARCCSUECEiTIi10UESRpggIgh07KogC8qmoWAHrq6KoKIIKCmIBFQVBQelIld4JHWmhhJaQ8/0xm3BzuWmQ3JsyD78ld2ennN2dnTNzpomqYrFYLBaLr/DztQAWi8Viyd9YRWSxWCwWn2IVkcVisVh8ilVEFovFYvEpVhFZLBaLxadYRWSxeEBEKvtaBk+IiL+P0y+UxrXCIlLMm/LkNHz9fnIreUoRiciLItLX5by3iLyVit+mIjJRRIaLyA2O2+Mi0tuD32Yi8ooH90UiEpmFt5AUb7iIrBeRMi5uzUVks4gUdvM7XEQqOL/bi8isDKZxyO18h3vcLtemi0gHERnrFDYBzr2Li5+yjszhmblXXyCGEBEpk/TsPPCBiAxMI44AEVktIjXd3P8RkQaphHlCRKpmUtYyIjLJkbk5MDUDYRqJyD2ZScdDHE1F5Ds3twBgdRr3MAi46DtJI43CIrLIza2qiMzLRBx+TjwRnt5ldpUJLmG88n7yepkQkBEBczIi8jHQ0jktBSSISB/nPAwIFpH2zvk/wP0YBewPCDAS6Ar8DtwCPO4hmQJAMeeluL7UWsBPInLOOZ8GrAFUVaeISBSwHohxiy8QOKqq0anc1gjgfVXdLyIRwFdAbef+3FkEzBaRTkAR4KynCEXkKuBrF6cwEVnjch4BLBGRROf8XmBFUnBgJzAPCAeigH3qMglNVfeJyEhH9vtTuS+vIiLTgeJA0j2FYe7lEHAGOAVsFZGxwBi34GFATRG5zc39YVVdBjwChACfu3x7MzDPcZSL25+q+qyI1ADuAz4SkXec30eBOMdfVaCRqv7rlt5HwAJVVRFZCESISHtVnZ7GrQ8C/krjeqYQkeuASs7pRmCAiCxzzpeq6koRKQg8BpwVkevTiG6Aqv4qIhOBB4DaIrIU+FlVXwLuAk6KSGNVXeKk7w8sB4K58C7DgcNAPOY9xgFbgQe8VCYk4a33k6fLBMkLE1pFpKCqnhaRBzEF/HeO+43AFar6novf54GeQEGgBLDbudQLWAlscYl6GvAG0BQYDjwPKJBU83oCGAscc86nAkHAn5iC6l9goqo2dZM3CvjOkyJyag//ApWde7oSuBn4BVgNhKrqSbcwNziydwauUdW7UnlOwap6xvk9Q1XbuVz7Ceipqqed8xJc+FgqAvuAc8AUoBCmwHCtQXUA9gPbgKtU9aAnGbyFiFyB+UBc6Y4pbCa5uQcB96lqb6emWlxV/+chznHAOCeO/wN6YD7kWzAF5CqgHfA+cL+qHnbCCUZJvYAp0DdiCqPRqjrHeceTgbqqet4lvaed+K5PcheReph82UZVNzpuwzEFJ5iCpwgX8rU7c1X1wVSuJaU7HLgNo8R3Ahswhf5WLigCgGuAxar6qoi8AZxT1edc4pkOvKqqF7VwHOXTGpiV9H2ISBAmf92Fea4dVDXRudYZ857A5L93gYcw32MScar6m+M/28oEVR3ohPPK+8kXZYKq5uoDk3mWAFUwtdpeLtdGANcDXwDXOW4lMS2ZpsBETE2oAabWM8Lx0wT40fn9HPAq8JkTfxOgvXOsB+50OQ93wjTEKKMrgEUeZI7C1CQ93U8vYHIqYRQo7OI2x3nRScdxTO1wv9vxnOO/H/A6UBf4Gwh03MtgMvV1mNpgqON+jfN3OlADqAkUw7Twkvz8hilAk2T6DpN5fZ0vrgP6Osc7zt9vHfn6Av9zuX4bcK2TJ85gCp6lbsfLTpzlnOdY0knnced9DwLuddx6AO1cZBmEqaT0wCihMCfv9XSuTwZucZP/UWAzUNrDvd3mvNc
2bu5+wFygu3N+L+B/ic+vKaayBMbcFg08AzzvllfvBtoCa4ECbnFMB1qmEv9SoDAu3wfwmvM+ijlpDXWTp7XznjZhWlOt3Y6m3igTvP1+yAdlgk8Li6w6MDWzOOBNN/cxQEcno72HMYld6Xw0rZ1MNxhTy90GfOiEewB4y/n9OdAb+AC4kZSF0wZgmcv5j27pR5F5RTQceCYjmc6Dn2lAVze354GXnd8C/IEpaKu6+CkPLHF+98QpFDHN/wecTNcco3jfcORIUro7kzKgc/4sMCwH5IkKzvu6EYhx3PoCTzi/dzgf2ZfATZiKxEZMBeIxl3iuw7QEGrjFP9stL+wDtru5zXf8dsIUav9xofCbyoWCfi7GpJvonL/ipBmVxv11xrRS3nRxexaY5HK+Hwi+xOeXrIhc3NYBTTz4be74/9W5j6TjBOa7cnWr5Dzzlc61vc7vNcACoAUwHvOtLgX6YyoBSc/0iPPu3CsKScdnXigTvPp+yAdlQl7oI+qMqUn9BRwRkfuBJ53L5TAFSVKz9TlVHSoiXwJXYTJhHYx5rQim1gPQDPMCwTQ9dwLRqvqLiJzB2GbdSVDVD1zkKuH8rCsiOzG14INAWUyhdTSVWyqBKdAyhYgEA62Ae0WkrarOdC4FA3EiUhx4iwv24n2Onb8A5jmUd7UPi+kwHoQxJ4F5jsMw5q1fgGYisgKIVdUTLqIcwjTbfU0xTI0NUvaFJo1qCsYooG8whdAEjJnlIDBeRFpinlV1oJOqbnCNXFWvdT0XkRHABlUd5y6Iqk4TkWeAMar6h+NcFCgjIuVUtZWTXlJH/68Yk8ssl74md97G5N2iTvqdgJeAoyKSJGtJYJWIKKY/a4Gq9kotQjeKAfVE5COgGuadVgS+cGQKBBIwprm7HRmqYb6TWOfco2lORPph3kkbTIH3CdAFWKmqa0VkiHNf3YBr1ZjRXhCRxpg+2jcwCsmduaq6xwtlgrffT94vEy6ltpSTDkxrJRJTc9rqdm0p8JKHMCGYjLoAp7aAaTbPw3wc+4FCLv5bAx84vz9zXsSNbscBF/8VMcomAlO43OKEq8KFzr3XUrmf94F+ma39APdgPtJCwHygr+P+DuYjjHSeVTFMTe8ul7A3YprvhTzEe53zHEdgPio/jCnmB+BN4AU3//2B93JAvvDDKJEeGHu8YFoeGzB9DTEYpRTt3EslTE15OMasM9c5VgIfAn2AWzE1/7GOu6djnfOeks4nYDrOTzvhfsNUZLZgapofO/J+iwfzhfPe17u5vQD0dzlv5eS3ScAgF/fkGjfGHDUxg8/uS0zBtxin1u/kraTWXDAuphiXcFuAMJdzj6Y5jDn7CUwr8R1Htk89PMv1QIQTpqZzj3MwNXL341egi7fKBG++H/JBmZDrW0SqOs6pAXUHVERKq+oBMcOqCwKdReQVVU0AEJHyGJvlP5im+xARiVDVD0XkPczD/EhV4zynCBgbcEc3txCX349gbLp7geedztCtLjIfBIakEvceTC0xwzidvC9i+ijinNrX7yIyE5NRNqjqbjHDcX/EjKppjmlmg8l06zHP8AuXePs7bucxhfZs575nYTJcY4wt3ZVI5x58zZtAfWChcz4QUzE4jun3m4n5+M5gzENfYAYbLAP+p6r/QXJHcUtM7bYJMF1V700tUacWukVV67m4lQVuxyjB7Zh8eRbTN9RfRIYClZ1zdxpgWmyuhDgyJ9EMo3BbkjX0xyjxQaq6Q0TqYvrBCouIH6b/5idNWeu9FG7GPItNwGG3Z9YDU0juczreP8TU3gdjFI87yd+Ml8sEb7yfPF8m5HpFJCLPYbTxzZgPORw4gOmAew+oh/mw3naC3I1pIh/B1MpuBYaKGSJaGtOBeo60eVLdhmeKyG7nbxhmaG4j57wQplPyZjf/ZYBmqvqjW9zzXGTNKB8Dq1R1DoCqxopIU1VNFDPB8IQzUm8q5mPegXkmiEgtTA2nPTBdRKao6jERKeXcQzvMhzg
GY8e/CjOCxx8zdDbYTZargQGZlD/LUWdkE4CIbMK0aEZiWkY3AxNUNUWBJiJbMIVGXzeTS9Jw+9aOv44YkwSYgu20azRAKRFZ6Zy/pKo/ishyTMv6/zDKbrrzfl7DmDXuUmeEmIs8gunXGuV2e2FcMC2hqm84/tMq6IqQcsRbqqjqUdf7V9VVjjJ6ApN/imMG4iTJ+SWmAEoAFrmEjQQmiIjr89mDaRnejmlxfouxFkwTkZsw3+4xYCimY1zFDH3vhDOcGPNdu5M8PNhbZYIX30/eLxMy0lTPyQfGzBGAMbdswmTWNzFzAPwwzc7twIOO/wDH/Q7gK8etMGY46BLMB7YAGI1jZuBi09x6TA3C9TjhXH/NJd4yGDv1O875FVzoAGwDTPVwP37OfVzp5h7FxSNkAjDN7HVAEcetACZTC6bDfj+mhhII1HT83IuZ/3AlplXQxnEfhDHHlHGJqwDGHn6F43a1I9/9mJbhdpwRYhjzySbAz9f5wuW9foiZg1ICZ7CCkyeWYwrEcBf/W1KJJxKY48E90In/NRe34sAuN3/vYEYkPYPpRN/s5IVaGBPXxxg7ehe3uD9wwvlzYapFMObjj/Ygz9NcbPq5F1P4nwL6ZOLZXYMzUgujbLs7skzEFIobMS2WVEd9kbZpbggXRmh1wJh5GmNMWDvwPCiiNUZ59fVw/M4F05w3ygSvvR/yQZng88Licg8ns+11MvDbwE/A90CIi58qmObzM06mT8A0wZOGUS50woY45yGY2vMwlw/AVRG1d5OhALDf+X2D8zIbOC/8GRd/wZiaw2ZMAdQ9lXu6DbcRSx78CGaE1zwcO7rjHo2pWZ13Mvd7LteecTL9BuBB53dHt3ifd9zDMAo2AfMRBmA699cCrVz8t8KYtEo4z/02X+cJR67HMYX7B1yww7uOmiuIsb2fwhnZhOdh20ud+5uTRlohGDNILGak2Cdp+B2PqYH2w4yiSxrCfa2Th9/DFMgbMeaipMJkOWYU1kHg01TivhWXYeBO/gvGFGQZrhxgCqRYzATVh5y8+glQ38XPFZi5UYPTiCc1RXST8x0swxT0a7kwTLq2I/cYHEXlEq6188w8vaN9XFBE2Vom+OL9kMfLBJ8XGFl1cKFGUjaV6wVwm+fgcs3jEEqXOFvjQRFhWjVbMTWI993C+gEVL+N+HgVKpeOnofvH6nItMEn+VO7JL7X4kz4uD+5+QJCnODEzvB/1dT5wkakWUN3NLVkRubiVx0wUhEy2iC5RrtbOB9wDqOR2LRxTQAbhoUbt+v4ymNYlD992iaNAWoUkEJDGtVTnEbn5ux5HEbm8u4Xuedt5dh+nEscIXFqUrs8qq8sEX72fvFwm5ImVFSwWS95CRPzVZZUJS97GKiKLxWKx+JQ8tfq2xWKxWHIfuX74dhIlS5bUqKgoX4thAZYtW3ZIVbNkOwj7XnMOWflewb7bnEJWv9dLIc8ooqioKJYuXeprMSyAiMRkVVz2veYcsvK9gn23OYWsfq+XQp5RRBaLxWJJnSkr9vDmjI3sjT1NRFhBnmpXnS71y/laLMAqIovFYsnzTFmxh2d+WM3peDMQcU/saZ75YbWPpbqAHaxgsVgseZw3Z2xMVkJJnI4/z5szNvpIopTk6xZR1NPT0vd0GewY0Slb47cYLvc92vdkyevsjT2dKXdvY1tEFovFkseJCCuYKXdvYxWRxWKx5HGealedgoH+KdwKBvrzVLvqPpIoJVYRWSyWDCMid4rIFpcjTkR6iMgxF7eXfS2nJSVd6pdjeLc6lAsriADlwgoyvFsdO2rOYrHkPlR1PGYFcUSkKGal55+Anapa51LizMnDivMSXeqXy7HP1Soii8VyqQzAbA9RAjiakQAi8hBmawlKly7NlN9msufoaXqVV2cP0hPsWb+MKfvXEVYwMLvktuQwrCKyWCyZxtkS/U7M7pwVgVoishWzaeQTqrrFUzhVHY3ZYI7o6Gh9c5U/e2L9L/JXLsyf+U+
3zibpLTkN20dksVguhZ7Ab6oap6rrVLUEUBWYDXyR0Uhy+rBii3ewishisVwKtwGTXR1UNRFjqquV0UhCgi5uDaXlbsmbWEVksVgyhYgUwuwEOs85L+24gTHXLcloXHHnPO99l5q7JW9i+4gsFktmqQesddlBtTIwUUQSgC3Ag74SzJI7sYrIYrFkClWdD7R2OV+IGbBgsVwS1jRnsVgsFp9iFZHFYvEZLa4onil3S97EKiKLxeIzekRXyJS7JW9iFZEly+jYsSMPPPAAACLSX0R2ishGEemQ5EdERojIbhFZLSINfSasJUfw1OSVmXK35E3sYAVLljBjxgxWrlxJREQEQAHgMcx8kvLALBGpCLQCWgJRwLXAGMwILEs+JT4xc+6WS+e5Kav5ZvEuzqviL8JtTcrzapdLWh4wy7GKyHLZxMXFMXToUAYPHsyaNWsAwoAvVPUEsE5EdmDmnXQDxqlqAjBTRMJFpIyq7veV7BZLfuC5KasZv2hn8vl51RTnvsaa5iyXTf/+/RkwYABhYWFJTkFAjIuX3UBZTOvI1X2P454CEXlIRJaKyNKDBw9mj9AWSz4iNaWTU5SRVUSWy+KLL75ARLj11ltdnQVwNa4kAucxCsqTewpUdbSqRqtqdHh4eDZIbbFYchLWNGe5LEaOHElsbCw1atTg2LFjnD59GiAccN34JBLYBexzc4/AtJYsFks+xraILJfF0qVL2bJlCxs2bGD48OF0794dzFYAvUQkRESuBIoDK4FpwD0i4i8ibYFNqnrEZ8JbLJYcgW0RWbKDU8APwFrgDPCAqqqI/AhcA2wDDgO3+05Ei8WSU/BJi0hEfhWRz5zfdr5JHqF379589tlnAKjqMFWtpKpXOmuToaqJqvq4qlZU1QaqusGnAlsslhyB11tEItIOM3dkr4hcgZ1vYrFYLPkar7aInD1LXgLecJy6At+q6glVXQfswG2+iarOBMJFpIw3ZbVYLBaLd/C2ae494B0g1jl3n1di55tYLBZLPsNrikhE7gFUVb91cU5tXomdb2KxWCz5BG/2EfUDwkRkA1AUKAgUwcwtScLON7FYLJZ8htdaRE7LpYqq1gCeAb4DGmHnm1gsFku+xqfziFR1mYiMx843sVgslnyLT+YRqeo4VX3A+W3nm1gsuQwRWSsiW5zjc8fN45xAiyU97MoKFovlUiigqlWSTlKbE6iq8b4S0JJ7sGvNWSyWS0HdzlObE2ixpIttEVkslkzhTEwvLSLbMPP9BmNaQWtcvCXNCXQP+xDwEEDp0qUZUich1XTmzJmTdULncwam8Zwf96IcqWEVkcViyRSqGoeZeoGI9AB+BH4mg3P/gNEA0dHR+vbq1IugHXe0zjKZ8zu9n57maxHSxCqiXE5UNmewHSM6ZWv8ltyNqk4WkY+4eO5f0pxAiyVdbB+RxWLJFCJSVERKOL87AEcwc/88zQm0WNIlU4pIRDqJyN0iUsTNXUTkOxGpkLXiWbzBqa3/cHLNHySePZXCXVXp3r07O3fmjH3tLVnLzz//DMDixYszG7Q4sEREtgLPAT1UdRmQNCfwB+BBVXUf0GCxeCSzprkw4C6gt4gUBBYCHwFPAP+pqi2xciGJZ04St2Y2J1f/gSacpUBEDULrd+T40p8o1bQCFSrY+kVeIiEhgXfffZeuXbsC0KFDB2677TZGjhzJtm3bqFKlSprhVXU7cIUH92HAsOyQ2ZK3ybAiEpHamCGbI1X1JxEJAN4CNgKLVLV5NsloyUbOHdyBiBDa8CZCqjZBE89z9M8x7P20DwUiqjNq1C++FtGShXz99desW7eOvn37MnPmTM6cOUPdunW56aabuOWWW/Dz8+O7775DRHwtqiUfkSHTnKN0+gPPYtaAmwj8ARQA6gLbROT5bJPSki1o4nlOLP2JYwu/JW7NHxyc+joHJg5Bz8dT9r6RBISV4ZVXXvG1mJYsZPPmzRw6dIh9+/axc+dOoqKiEBHat28PwJtvvmm
VkMXrZKhFpKoJIvKKqu4UkYeBXkB/VV0JICJ3Ab+JSGNVXZJ94lqyEvHzp2iLXgQUKcWJlb8Rt/5vil/3EEGlKwNQ4saBzJ//AUuWLKFx48Y+ltaSFQwdOpSEhATefvtt6tWrR6FChQA4cOAA3bp1o3Llyj6W0JIfyUwfUTcRKQC0A4YDt4rIS5gRM69iVsi2SiiXcWrjQvR8PKe3L6do0x7EbZhL7LwJ+AWHUrR5T6pVq2aVUB7jtddeY+jQoTRq1IipU6dSpEgRnn/+eVSVuXPnMnr0aF+LaMlnZGbU3C7gAFAKeBn4CwgFRgAzgL+zXDpLtuNfpCT+hcJIPBVL7NwJBJevTeK50xRt2p3/vn2Bq6++2tciWrKQmJgY/vjjD3bu3EnRokV56623eP7559m7dy+PPfYYMTEx6UdisWQxmVFEYZgWVCQQjxk1UxOzlMdBwA5WyIUknolDE8+TcOIw4u9PQux+4g/vxD+0JH4hRViwYIGvRbRkIe+++y4NGzbkf//7H0uXLuWRRx6hffv2tGzZknr16tn+IYtPyIwiigOKASuAmcBOzMKG84FXgCgRCc1qAS3Zi19gARLPnCSodGWCo+rjXyScgCKl2T/hKcKa92LHjh2cOHHC12Jasoj77ruPsLAwwsLCqF+/PkOHDuXPP/9k/fr1fPPNNyQmJqYficWSxWRGES1R1TcxSuhVoDFmHlErYDpQA6ie5RJaspWgiOoUbXILBaPqU7R5T87t20SBcjUoc/vrBFduyIYNG9i4caOvxbRkERUrViQgIIDu3btTsGBBxo8fz7fffsvJkydZt24dPXv29LWIlnxIhgcrqOo252/ShLUXkq6JyNNALLA8K4WzZD+BYWUAKNrsVgDCWt2ZfO3YosmUCwujQYMGPpHNkvUUKVKEGTNmMGTIEE6cOMHLL78MwNVXX22H6lt8xmUveioiz2KGc7dTVduuzyMcW/gtcev/5vtVC/Dzs0sS5nXsajwWX3LJikhEmgCvA1uAlqp6PMuksviMs3s3cnTOWAKLRVDmjjcoW/aiLWUseYDExEREBFVN8VtVbcXD4nUys8TP08BeoCJwHbAOM6l1VTbJZvECxxZNxr9wcRKO/ceZmFUElqxA8esfIqiUndiYV1m1ahUtW7Zk7dq1tGjRAlVl/fr1NG/eHBGxIyUtXiczLaJDmAEJlYEKmHlFdpRcLsevYBHiD+8mIXY/548fJKBIOInnTvtaLEs2UrduXf7++29atWrF3LlzAVL8tli8TWYGK3yW9NtZe64j8JqIHAQeVtXD2SCfJZsJrdsu+bcmnuf01qXE/v0V/gWLULx933TDnzt3jieeeIJZs2ahqowYMQIAEekPDAROA0+o6m+O+wjgTuAo0NvZPiBXkxWbE3prA8L4+Pjk365zhuz8IYsvuaQ+IlVNAH4SkZ+BZ4C/RKSj3QYidyN+/oRUbULBKo05vmgyB75+mp0DW6S5DcSRI0do06YNo0aNYtOmTUnLARUAHgNqAeWBWSJSETPUvyUQBVwLjAHqZetNWZKJi4ujUaNGBAYGkpiYSFxcHM8++6wdqGDxOZc1as7Z+GqYiBwGfhSRpqoan144S85GRCja7Fb8CobStWtXFi1aRGBgoEe/ZcqUoXv37gBUq1aNgIAAMBunjVXVE8A6EdkBNAS6AeOcisxMEQkXkTKqut8Lt5XvKVSoEIsXL2b48OG0bduW1157jdBQY12fP3++j6Wz5Gcue/g2gKp+IiLlgTLYferzDKH1OtAhIIz9+/dTvnz5dP2PHTuWq666itmzZwcArouW7QbKYlpHU13c9zjuKRSRiDwEPATYTfmymNDQUIYNG8aMGTNITEykRYsWAOluhmexZCdZNk5TVZ9T1VSVkIgEicgoEdkkIptF5BbHvb+I7BSRjSLSwcX/CBHZLSKrRaRhVslpyRyvvvpqhpTQiBEjeP/995kwYQKAAK5zyhKB80BQKu4pUNXRqhqtqtH
h4eGXI74lFdq1a0eHDsmfG59//rkPpbHkd7KkRZRBigN/quqjIlINs+f9GmxfQq7nscceIy4ujvnz5xMSEgJmUdxyLl4iMS3lfW7uEZjWksViycd4TRE5/QDfOb83iUgCZkWGb21fQu5l0aJFbNy4kVmzZrk6HwN6ichbmHlnxYGVwDTgURGZALTB7GF1xMsi50m8NXJPRIKAd4HrMS3fp1X1exE5hlmFH+BrVX0hlSgslovwZosoGRG5F/gXU0Ctcblk+xJyGStXrmTp0qXufQwBwDhgLXAGeEBVVUR+BK4BtgGHgdu9LK7l8vFk2fgF2KmqdXwsmyWX4vW1PJwVGh4H7iD1PgPbl5BL6NOnD7GxsWzZsiX5AI6r6jBVraSqV6rqfABVTVTVx1W1oqo2UNUNvpXekllUdb+qJls2gATMIKWjPhXMkqvxaotIRD4ECgEtVPWUiLj3Gdi+BIsll+Bi2SgE1BKRrcB6zATmLamESbZilC5dmiF1ElKNf86cOVktcr5lYBrP+XEvypEaXlNEItIUqK6q17s4TwO+sn0JFkvuwrFs9AQ6quo+oISI+AFPAl8ALTyFU9XRwGiA6OhofXt16kXQjjtaZ7HU+ZfeWdCHmJ14s0VUD4gWEdeaUl9gPLYvwWLJNbhbNpLcVTVRRD4BnveZcJZciTdHzX0MfOzh0nRgmJvfREyLMSe0Gi0Wi4Mny4aIlAZOqmocZh3BJb6Sz5I78cmoOYvFkmupx8WWjTFAH2dKxhbgQV8IZsm9WEVksVgyTBqWjeHelsWSd7BbMVosFovFp1hFZLFYLBafYhWRxWKxWHyKVUQWi8Vi8SlWEVksFovFp1hFZLFYLBafYhWRxWKxWHyKVUQWi8Vi8SlWEVksFp8R7C+ZcrfkTawislgsPuPMec2UuyVvYhWRxWKxWHyKVUQWi8Vi8SlWEVksFovFp1hFZLFYLBafYhWRxWKxWHyKVUQWi8Vi8SlWEVksFp+R2mwhO4sof2EVkcVi8RmpzRays4jyF1YRWSwWi8WnWEVksVgsFp9iFZHFYskSRORWEdkuIltE5D5fy2PJPQT4WgCLxZL7EZFQ4G2gKXAeWCkiP6vqQd9KZskNWEVksViygnbAX6q6B0BE/gSuAya6ehKRh4CHAEqXLs2QOgmpRjhnzpzskjXfMTCN5/y4F+VIDauILBZLVlAeiHE53w2UdfekqqOB0QDR0dF6oFBlxi/aeVFkdzatQL/WdbJJ1PxH76en+VqENMnRfUTW5pw3se81TxIEJLqcJ2JMdGnyapc63Nm0Av5iZg75i3Bn0wq82sUqoazk3Z71MuXubXJsi8janPMm9r3mWfYBrV3OI4HFGQn4apc6VvFkM13qlwPgzRkb2Rt7moiwgjzVrnqyu68R1Zw5dUxEugNdVPVO5/xr4CdVnejiJ9neDFQHNmazWCWBQ9mcRnbjjXuoqKrhni746L3mhPfmaxmyIv203mtpYDlQH2NpWQDUUdW41CITkYOkNOf5+hldCnlB5lTfq7fIsS0iMmBzdrU3ewMRWaqq0d5KLzvIAffg9feaA+7Z5zJkd/qqekBEhgALHaeBaSkhJ0yKws/Xz+hSsDJnDTlZEV2SzdmS47HvNY+iquOAcT4Ww5ILycmDFfYBrgbMSGCXj2SxZB32vVoslhTkZEU0A2gnIqVEpAzQHPjdxzJ50wwYmE1Re+0eUiHV9yoildMLLCKFRaRYJtO8rHsWEf/LCZ8VMuSB9JNJI2/nGBkzQa6WWUQCsih/XxY5VhGp6gEgyeY8nwzYnL0gk1cynYgEA2tEpFo6/gJEZLWI1HRz/0dEGqQSLEREqmZCljdE5BaXc38n/pKZiENEJMRRPAVI/b1+ICID04luEPBKJtIuDNzn5lZVROalEaaMiExy5G4OTM1AOo1E5J7Urnsr7+TU9JNIK2+nJ6OIPCQiBZ18/6iIhLhdvz+NfJ+liEhrEfkuI89VRMJEREUkza6Qy5F
fRB4TEY+yiEhREWnv/N6iqqNFpL2IlAJ6kgOUaU7uI8rPNufBQGXgJxGPO7PcrqrLgUeAEOBzF38zgAhglIvbn6r6rIjUwBTKH4nIO87vo0CSIqgKNFLVfyG55noH8JZL2tcA/sBJp1ABiFfV8yIyHSjOhT6gMMzWMoeAM8ApYKuqPiEiK4AxwBCnkzvJf00Ruc3tfh9W1WUiUhB4DDgrIten+vRggKr+KiITgQeA2iKyFPhZVV8C7nLkb6yqSzyE/whYoKoqIguBCBFpr6rT00hzEPBXGtcthnTzNiY/TnJxqwQMBYoA7wANMc+6ACZPISKLMSMs94nIdlXtKCJvAT2AE0AUUFtVdzj+3wK6pyPrj6r6pOO/PdBdVR/w5NH5tj5T1ZbpxOmRzMrvgXCM2dsTCowUkY5OWlUxz7EFZsj9/EuROUtRVXvkoAO4AfgXWA00dnEX4F1gDhCMWT7ld6AoMM1xCwLWAxWAKUAJt/C/Y+bv3AY0AL4GWjvXrwTWAP7OeRfgJLDDOc4A9TDKYzuwwTmOYgr7K5y4XY+3gP95cK8JtATGOWndAzyZyvMY5yLjG8CrbtenAy1TCbsUKAwscnELAn4FSmCUtp9bmKeB2UnPwXGrB+wBqru4DXd5BnudZ7UhleNTX+ernHBkIm9fBTzhXGsJrMNUUkKBLcD1GCWyEmji+NsA3IKx8mxw3P4HdHZ+zwKi3PJV7zRk7Q2Mdzlv4ci+0jlWuZ2vB+alElcYRhkEpJFepuT3EP4b4Pk0rrfEVBK3AHUwJnEBtgFfACOc4yFf5I0c3SLKb4hIE+ATjJIJBb4XkfuBtZjmczFMxjzjmONuV9VjIjIDqAVcC7yhqjtFZAIQjSlsAQYCpzHDp18EmmAyfWnn+svAUFV1HcH2nar2dmSbgxlY0BaoqqpnHfdPgbOYGtuVTrjKmAxewUkjGqOotjrX92BMc1+ISFPnnneLyB1uj+RX4Ctgs4i0BTphFOjlMNSJNxFT8D0PvOTcy6PA/RjFlvwcVHWliAwC/hKR21X1T1V9BnhGRPwwtfP3VPU7EbkX+NLtOeZ7Mpm3VwN3ichUTJ6ahKkYAYwHvsTkr07qrG3n0IKU82NqOH4vG1Wdj1GQHnGsAxUvM5nLkb8h5ru7yGwtIi8DnTEKsTzmmwL4DqMgf3POP3P8eR2riFJBREpgPpqymOHFu4DfVfV0Nia7F+ipqtscGe4A/sAU9D8CPZIKOFUdKSKzxaxUAHA3ZjTaGRF5zOU+XlDVFpga2zngQ+A2VY11TF09MB96GeAlEZmsqqn1HbYH3iflBpoBjnybgYKO21OqWtExvQSo6gcisgNTyN8DTFbV/SJSD/gYM/nxe1X90JH5OkzhNEWNCRIRqeiE/1FEKrikXxH4UkROubjdhFEuAZhWXpCIrHTOjwNPASOBe4GFIhKLmeR3O3Cdmv7JFKjqNyISB0wWkc9V9Snn0tPAXlX9zjkfjqmd+lQRiemADlHVE27uN6iqLwb9JOdtEbkVY1KejTGtTcTJ2yJSG5iMqcQcxhTODV3iWYxplX+FqciMV2PCx3ELA3AqBOGY1gqY+WpjReRXVX0zs8KLyGSgK+a9HgRiXS6XxZgNE0XkqKqWSSWaQ27myHWq2tzl/JLkF5EqTvpnRaSCUxEtgDG936CqXUVkFfAw5htdj1FC44FAzDN+HvhPVf/M2BPJWqwi8oCTCYZhlMAeTO25CfC2iAxQ1Z+yI11V3SUix0XkBkwzvSMmc+7BFMLzROQXzEoDG1T1Wje5Rzju4zzEPU1EngHGqOofjnNRoIyIlFPVViLSkrQHAnyGqaVNEJFb1bT5gzCKqJhzDVLmq6QROcEYBfENsErMIIIhmFWbDwLjnfTPYmzlnVR1g4v8C5x7rAZEq2qscz4dY66b57SahgHPYswMUzCmuURMf1plYKWqrnX6pYoC3TAtySUYk9GsVPouwCxNVMcJh4h0wii8oyKSJGtJ5/4UU9g
uUNVeaTzTLEdEumEK63gR+Re4w0W5fgCkOQgmO3DJ2zcDn2MqBF9i8nlbLuTtOzBm4H4Y0+5sUhb6SWzFVKoKi8h3mArJp861IEyrq6uTR8Gs1FEdOJZZ2Z3K3g2Y1vQYjHK4Vp1lqZy+yLdUdWk6UZVU1YuWwc4C+e/FVCZ3AX0x/XAbgRVAqIiMxXwDDwOvYvoz/8FUonpgrAyPYvK/b/CFPTCnH8AmoLQH9zLA5mxMdwymw3EiJnMVdrnmh/kYhmE+zrFcsE+7H+swrZak8wkYG/FpJ9xvQG2Mvbgn8LGTxreYWisYU8gxLvRznML0lYgT/nnH32RMS8kP86H0wNTeBJjrhA3CrKbgjzHTfef4r4RRtsMximCuc6zEFDJ9gFuBpi7PYQsQ5nKe3EeEqem1w3y0e5zn+ThG+a3BfOjuz2o9EOH2HqKA9W5uLwD9Xc5bOe9qEjDIxX0/EOz8vh6Y6IP8uxFjPsXJR8uAAs55tuXfDObteZgWcGHH/WtMn+UNmH6iM8ArzrWjSXnTLa4wYI2b23onHwY4ea60885djzku/seRwT4iTH/Uv5iBMq85cu1w3vVujFJdB7yWSlxhpN9HlCn5XcIVx1TkrgJKYVqRFZ00W2P6lkIcv/MxSzCFA7sctzcwZcIy13zs7cO2iDwTgClI3UlIxT2r6I/p+N+HKfT/z0Pt/AqgmKqeTC0Sx169BWNHL4zJeK9iCqhqGIX0Pqb1MRnoLyJDMS2GyS5R/agp+4hQVRWRO4FxjmmvoBPfm5hMnrzEC2b03nHgOWCmI88ZjKKYiFHsqzAfwf9U9T8nrXCM4qyDaYlOBxal+eQMfqo6w4njbUwN72ZHxlLAYVWt5/KcemAKHPfRRg0wfReuhGDMS0k0wyhdT6OkJooZWlwcKCciKUxhqnpDBu7lslDVzc7fsWLWgfsIY6rx1eKSSXn7GMZEvNTJ2+EYE/hRTN5eqqrPO2HO42ErCXecfBjvxJM0YjMQOKSqrV387bhE2ctjKmKnVHWImNGk+zAt40NAL4xZvKOIrFPVCZmJ/DLl/x9mNGjSSNf3MBWuji5+3ndGzAlG6RTCtNzBVKQmYEyh32dG7qzEKiLPPA0scQrfGEzmiMTU2oakEe6ySFIuIoKq1vDkJykzOhlrmOOcpAySvWEK3kaYJv5bmJZGaYyCq49p9a1Q1UQReQ34BbhLVV2X30lNzsMYM1vSPJ1Tqpo8/0dENmFaMyMdWW4GJmjKoa9/OX63YAr1vm5KNxA4mvQhisiXQGNMZWCRi99IjKnwNBApIpuBncDPmD6fuZha4RQnnpuAA5gCcShwjTpVQ+e6YMwbo9xuOwwzMi7pGbzh+PekiP7A1Eb/dv6+5sFPdjJLRLqp6g8AqjpCRD4QkZlcKIC8ikveFuCjJGXjmJP3qur7IrKXlMs/7QWud/r3XCmBMTslEYlpFfyMaYVmNUGYPpimYua5lcYojgBMfjyCGXJeGDM1YqmqZmah3kuS35GlDcbKkMSbmBbc584Bxiy9EjOtYYaIvI/5JlEzLSIAiFXV7ZmQOWvxVVMspx+YTHUL8ATG5noXUMpLae9P49oOUprsAjFmrNdc3Ipj7MVBmKHWSfMv/gaewcwF2oypgdbCKNuPMbW7Lk4cXfA8fDsA0yIDo+T2Y1bvTXpmH2JWYS6BKdCfwPQfLcfUxsLd7mdLKvcZiQdThAd/rqY5cXG/E1NpCMSYAPti+o0aY2qzO3CG/ro9yw+c5+SfFB+mf2sNpm/KPf2nudg09wCmgDoF9PFB3g3Es2m5CWZOli+/q1jgc5fz8Zi+EJw8u93l2gMYk+3wpPeLMfE95xZnr6Q4MSbfjZ7yD7DD5fc4jALZncpxhAumuXsxFb1wV5kxo0/7uqXxHc435OIWRhqmuczKj/kGh2FakbVT+XZ2YcyJCxy3azAVsulOXt7muPdw3skpoIGv8oVtEaWCmhqcz5qqIrImlUsRrieqGg88Jmblghc
xBb8/8LWqnsMUwgADXOIej1ktoCNmtEw/VZ0kIpOAr0XkWkw/lPvwbTDK7ZDLKLWpqhojIo9j+lEmAs3VDMNNkvGoiLQAXgdiRORDvTDqDDGTTd0JxHxoGUadL8vhGKb/qxvGxFMIY8JY4gxqmAU8JCLLVTVeRBpjBoasBm5UM4JruTNaLxEzgs+TnNu4eITceJzaqGaghZnVOHnC08i/xWRwj6Bs5BxmaHYzjGKpDNQXkVcwLY3NIvI9ppJUBVNp6C0iHZzw5YFdYrYTATNXrDjwu4g8i+l0n+1ca+T2HQW5/D6PmavkcXi0iNyNMZWBMQ+KcU5eluphoK7j9wpV3erklSbAeREZ5xblMS4eNQemz+xcJuV/DrgRaKYuA3qSUNXdYobL/wKUFZEiGDN4QYzJuShQ2jHjdcdUSqdiWtKpPpNsxZe1I3t4PshEi+gS42+NqVX1ACq5XQvHFABBqaWD+SgL4nSAO261cJnw6bj1xZmY6OJWHqjscp5lLaJ0/F0PjHCTdyEQ6JwH4aHFk3S/mXl3OIMV7JHq8+mNGfW2FdOy6IoZybUDM5R4NaZG/0oG4xwClHV9V57yDy4tokzK/DrGjOsu8+9OHp+PsTysA269hPgzJT+mhZ5uHsOMBp2FaUW+5cg6yLl2C2bFhr1cGBxyDWbqhdfzRY7dGM9iyW5ExF/txFOLxedYRWSxWCwWn5JjV9+2WCwWS/4gzwxWKFmypEZFRflaDIslz7Js2bJD6ra99+Vgv9mcQVa/10shzyiiqKgoli5Nb4UNi8VyqYhITFbGZ7/ZnEFWv9dLwZrmLBaLxeJTrCKyeERVWX1wNduObfO1KBaLJY+TZ0xzlqwjPjGeAbMHMGf3HABuq3EbzzR+hjRWpbZYLJZLxioiy0V8sfYL5uyeQ7/6/Th0+hDfbPiGUiGleKCOx12S8xTx8fHs3r2bM2fO+FoUnxEcHExkZCSBgYE+lSPq6WmXFX7HiE5ZJIklu7GKyJKCE+dO8Pnqz7km8hoeuuohVJXDpw8zauUo2pRvQ+Wwyr4WMVvZvXs3oaGhREVF5csWoKpy+PBhdu/eTaVKldIPYLFkAbaPyJKCGTtmcCL+BA9d9RBgVgJ/tsmzBAcE8/ayt30sXfZz5swZSpQokS+VEJj3XaJEiXzdIrR4H6uILCn4eevPVC5amTol6yS7lShYgvtq38ffu//m34P/+lA675BflVASrvd/7NgxevXqRbly5QBqi0iQiPQXkZ0istFlMVJEZISI7BaR1SLS0EPUFotHrCKyJHPs7DFWHlxJ24ptLyqMb69xO8UKFOPTfz9NJbQlL9KvXz9q167N7t27wazcXB6zU2ktzMKfY0QkUETaYDYJjMKs9D7GNxJbciNWEVmSWbhvIYmaSMtyF+/1FhIYQs8aPflr91/sOr7LB9JZLpXz5y9tXdf9+/ezYMECnn322aSKiWKUz7eqekJV12FWzG6I2W5jnKomqOpMINzZMsFiSReriCzJzN8zn9CgUGqXrO3x+q3VbsVf/Pl6w9delix/ceLECV544QVOnjQbwl599dUcPHgwhZ/ffvuN0qVL07p1a1q3bs1PP/1EqVKlks8fffRRABISEujSpUtyuPj4eBo1asRLL72Urhxr166lUqVK3HLLLVSvXh3M1gTlMRspJrEbs1+Pu/sePGzzLSIPichSEVnqfk+W/ItVRJZklh9YTqPSjQjw8zyYMjwknBuibmDKlimcij/l0Y/l8gkNDaVRo0Zcd911zJkzB1XF398/+fq8efOYPXs27777LsOGDWP27NmMHTuWmTNncuONNzJnzhxGjRpFQkICkZGRnD59mqpVqwLw8ssvM2DAAJYtW8bixWnvkffff/+xbt06Ro4cyfLly8HswNuZlNt5J2I2mQtKxT0FqjpaVaNVNTo83KfLm1lyEHb4tgWAo2eOsvPETrpV7Zamv141evHr9l+Z+etj3Hz0IJS6EloNgtDSXpLUe7z
081rW7T2epXHWjCjC0JtqpXr9xIkTDB8+nGeeeYYWLVpw9OhR4uPjCQi48Klu2bKFl19+mXPnzjFt2jQ2bdpE165dqVu3LkuWLEkRX7NmzWjUqBHLly9n2LBhHDt2jNtuu4127drRpUsXHn74Ye644w6PspQqVYqGDRsSGRmZ5HQcs8V2ORdvSdtS73Nzj8C0liyWdLGKyALA6kOrAbgq/Ko0/dULrUSFRGHqvnnc7FcOlo2D9b/A/TMgrIIXJM3bhIaGEh0dTZs2bXjllVdo3749p06dIiQkJNlPjx49uPHGG5PP9+7di5+fH19+aXZ4njRpEp988gkVK1akQIECVKlShX/++QeA2NhY+vTpA0CTJk0oXTr1CkTTpk15+OGH2bt3LyVKlAAoApwE7heRt4CKmG26VwLTgEdFZALQBtikqkey6rlY8jZWEVkAo4j8xI9aJVKvraOKTH2MzseP8UFYEXZ3G0/kycMw7kaYeAc8+Cf4+3Y2flaSVsslO+nWrRvXXnstu3aZQSFnz55N0SKaOnUq119/ffL5hg0b8Pf3Tza/Afz+++9s2LCBOXPmcOjQIWJiYvjoo494+umn+fjjj0lMTKRz5868+eabqcpRqFAhRo4cSdu2bTl79izAMVV9W0QKYEbQnQEeUFUVkR8xW01vAw4Dt2fdE7HkdawisgDw78F/qRpWlZDAkNQ9bfwNNvxC59aD+TBmEj9v/ZlH6j0CN38I394Fiz+G5v28J3QepF+/fqxYsSL5/PTp0+zbt4+WLS+MZKxfvz4jR468KOyUKVPYsmULgwYNAuDcuXPs27ePevXqJfXxAGb1hI0bN6bZGkqiQ4cOdOhgpgqJyD4n/DBgmKs/VU0EHncOiyVTWEVkAWDDkQ20Lt86dQ+q8OcrULIaZVsNpvHZ7UzbPo0+dfsgV94EVW+Av9+Ehr2hQKi3xM5zuCuYfv36kZCQwFdffZW85M6PP/5I69atOX36NAUKFMDPz4w5OnToEKdPn+aXX36hZ8+edOjQgQoVKhAbG5sizvvuu49t27bx+uuv8+2333Lrrbd65d4sltSwo+YsHD1zlCNnjlC5aBrryG3+Hf5bZwYm+AdyQ9QNxByPYXPsZhCB1k/DmWPwj53HmBWsXLmSTp06cfz4cUaOHMldd91Fv379+O+//+jatStz5szh9ddfp0SJEvz888/MmTOHF198kUcffZQ5c+bwyCOPsHz5cqpUqcIbb7yBquLn58e5c+cYO3Ysf/31FyVKlGDy5Mm+vlWLxSoiC8l7Dl0RdkXqnhaNgiKRUNuMqmtToQ1+4sfMmJnmermGULm1Mc+dT8hmifMuX3/9NU2bNuXxxx+nX79+fPHFF1x99dXMnTuXhg0b0rJlS6ZPnw5A69at+eSTT5g5cyb16tVj4MCBNGx4YWWdKVOm0KlTJ66//vrkNeQ2b95M8+bNadmyJT169OCBB/L+iuqWnI81zVmSFVGqLaLYXbDtL9PqcQYjlCxYkoalGzJzx0weq/eY8dfoAZh0J2z9A6q184boeY4GDRowfvx4qlSpksJdROjduzfdu3dPsfxSqVKl6NatG126dEFEUlwbM2YMgYGBzJo1K9ktafScxZKTsC0iC9tit1EwoCBlCqWyIsu/EwGFur1SOLet2Jatx7ayNXarcajWHgqFw/Ivs1fgPEyNGjUuUkKuFC5cmEKFCl3k7ufnd9H6gL7eT8hiyShWEVnYdmwblYpWwk88ZAdVWDURKraEYlEpLl1X4ToA/tz5p3HwD4S6t5nRdXGHsllqi8WSV7CKyMLW2K1cUTSV/qGDG+HwluS+IVdKhZTiyuJXMm/PvAuOdXqAnocNv2STtBaLJa9hFVE+5+S5kxw4dSD1nVeTFEr1jh4vt4psxcqDKzl29phxKFMHileGtVOyXliLxZInsYoon7P92HYgjYEKG6ZBuWgoctFCygC0KteKRE1k4d6FxkEEat4M2/+GuMPZIbLFYsljWEW
Uz0lzxNyxPbB3OdTolGr4OiXrULRAUebumXvBsWYXY57bOC2LpbVYLHkRq4jyOVuPbSXQL5DI0MiLL26bbf6mMRTb38+fFhEtmLdnHonq7AJQti4UKQebZmSDxJb0uNSN8CwWX2EVUT5nW+w2Khap6HkPoh3zoWBxCL8yzThaRbbiyJkjrD+83jiImCV/ts6GhLPZIHXe5dy5czz66KNUq1aNqlWr8v333wPw0UcfUaFCheSN737++WdKliyZfP7yyy8DZg+h++67Lzm+o0ePEhUVxRdffOGT+7FYMoKd0JrP2XZsGzVL1PR8MWYeVGwOfmnXV5qWbQqYrcZrlXRWrK7WDpaNhZj5cEWbrBQ5T3PkyBHatGnDqFGj2LRpE40bNyYhIYG4uDgGDhxIp06diIyMpEePHkyZMoWYmJjk/YS2bNnC9ddfT5UqVWjVqhVz585lwIABjB07lqFDh3L11Vcnr1dnseQkrCLKx5xJOMPuE7u5sfKNF1+M3QWxO6Hpo+nGU7JgSaoWq8rifYt5oI6zZEylq8G/AGz6Pfcqot+ehv2rszbOMnWgw4jUL5cpQ/fu3QGoVq0aAQEB7N+/n4EDB7J7927WrFnDhAkTGDRoEM2bN2fTpk0pwrds2ZJrrrmG3377jUceeYTatWtz7bXXEhUVRc+ePRk2bFiKLSQslpyAVUT5mJjjMSjqeeh2zHzzt2KLDMXVpEwTJm+azNnzZyngXwCCCkGlVrB5RpoFryV1xo4dy1VXXcWNN95I27Ztk923bt3KL7/8QtGiRQGzPt3UqVMBiIyMpGjRopw8eZKaNWuyevVq+vTpg6rSsWNHChYs6JN7sVjSwiqifEzS0jweR8ztmAfBRaF0xjaHa1q2KePXj2flfytpUraJcazaDn57Cg5tgZKpL1uTY/GhAh0xYgSTJk3i119/ZcqUKSlaMUWLFqV8+fKUKXNhSaapU6cyadIkVq1axdKlSzl79iw33ngj06dP58UXX2Tbtm289dZbtGiRsYqFxeJNrCLKx2w7tg0/8SOqSNTFF2PmQ4Xm4Oefobgalm6Iv/izeN/iC4qo2g1GEW2ekTsVkY947LHHiIuLY/78+YSEhPDII4949Pfuu+8SGRmZbMqLjo7m9ddfp3HjxsycOTPZX9JGeGXLep4LZrH4GquI8jHbjm2jfGh5gvyDUl44vg+ObIPo+zwH9EDhoMLULlmbxfsWX3AsFgUlq8PmmdDssawROo+zaNEiNm7cmGLF7I8++ohJkyYRFxeXYsHT3bt3U6BAAT744AP69+9PsWLFqF+/Plu3bk0RZ4cOHUhISOC1115j1qxZto/IkuOww7fzMdtit3k2y2WyfyiJpmWbsubwGlbt2c9r09bRbdR8fjh5JQnb5zF3zXZUNQukztusXLmSpUuXUqVKleSjUqVKzJkzh4EDB1KxYkX++OMP5syZQ58+fXj55ZeZM2cOXbt2TXUjvOnTpzNr1iyOHz/OjBl2bpcl52FbRPmU+MR4Yo7HeN4efMc8KFAEylyVqTjrlowmUT+h+7gv4VRtGlQoxuYizQj4bwrjvv6KUVE38PatdYkIsx3mqdGnTx/69Onj8VqvXr2Sh3aPGTOG2NjYFIpl3rx5dOzYkTZt2lCrVi2qVq3Kjz/+yN9//w3AiRMn+Oqrr7xyHxZLZrCKKJ+y68QuEjTB866sMfOhQlPwz3j22HEojpe/O4mGBVKz8n+Muek6ShYuAAkN0NdfZnD5XXSLiaXDe3P56I4GNK9SMgvvJv9QqlQp+vXrx6OPPoq/f8r+u0mTJhEYGMiff/6Z7LZ+/Xpvi2ixZBprmsunbItNZY25k//BoU2ZMsv9s+MInT+Yx8ETidQsXhcpuMUoIYCAAkjl1lQ/vohp/VpSKrQA94xdwtSVe7LqVvIl7koIsm8jvI4dOyZvKS4i/UVkp4hsFJEOSX5EZISI7BaR1SLSMNXILBYPWEWUT0la7LRSUbeZ9kn9Q1EtMxT
PnI3/cdeYxZQsXICfHmtJ+ytasfXYVg6eOnjBU9Xr4dhOotjDd32a06BCMfpPXMmHs7fYfqMczowZM1i5cmXSaQHgMaAW0BUYIyKBItIGaAlEAQOAMd6X1JKbsYoon7I1ditlC5UlJDAk5YUd8yGwkFm4NB2m/buPB79cSuWShfm2TzMqlAhJXu5n0b5FFzxWcSZjbv6doiGBfHl/Y26uF8GbMzYy+Lt/OZeQmFW3ZclC4uLiGDp0KIMHD05yCgO+VdUTqroO2AE0BLoB41Q1QVVnAuEi4nHfeRF5SESWisjSgwcPevJiyYdYRZRP2X5se+orKlRoYrb9ToOJS3bS75vl1I0M45uHmiab4moUr0GxAsVYsHfBBc9h5c3CqZvN3JYCAf6827Me/a+ryuRlu7lzzGL2xp7OsnuzZA39+/dnwIABhIWFJTkFATEuXnYDZYHybu57HPeLUNXRqhqtqtHh4eFZL7QlV5KuIhKRG0XkShF5IpXrfUWkkKdrlpxJoiYaReTePxR3GP5bl27/0Nj523n6h9W0rBrOV/c3oWjBC0rLT/xoFtGMBXsXXNgWAqBqW4hZAGdPACAiPNm2Gu/2rMeaPcdo9+7f/LB8tzXVZTHTpk3jyy+/5Pjx4yncVZXu3buzc+dOj+GmTJmCiHDrrbe6Ogvg2nxNBM5jFJQnd4slQ2SkRVQZKAYk7xUtIg+ISEXntDtwKq0IRORWEdkuIltE5D63a7VFZJWIxIjISBHxc9xfFpENTjg7GzIL2XNyD2fOn+GKom4j5nY6rZg0+oe+WLCDl35eR7tapfns7mgKBl3cad6iXAuOnDnCxiMbLzhWbQuJ8WbnVhe61C/Hb/1bUb10KAO+XcU9Y/9h15E0s5MlE8TGxvLVV1/RpUsXmjVrxoABA9i8eTN9+/alVKlSVKhQwWO4CRMmMHv2bGrUqMEzzzzDd999BxAOlHPxFgnsAva5uUdgWksWS4bIiCIS4FGggYj8ISJfAj2Bz0UkEjitaVRjRSQUeBvTmdkSGCYirm3yUcDTGIV3FdDZcd8P1AQaAy84aVmygC1HtwBQpZjbsjs75kNAQYho4DHcV4tiGPrTWtrWLM3I2xoQFOA5+zSPaA7A/L3zLziWbwpBobD594v8VyxRiG8fbsbLN9di2Y4j3PC/v/ls7jYSztu+o0vl/PnzrFmzBhGhX79+/Pnnn8ydO5fExESqV6/OihUrGDVqVKrhJ0+ezJYtW9iwYQPDhw9PWkZoPdBLREJE5EqgOLASmAbcIyL+ItIW2KSqR7L/Li15hYwqoo+AFcB9wEZAgf8DZgGT0wnfDvhLVfeo6n7gT+A6AEchVVLV31T1PDABaA+gqqNUNVFVD2JqVyUye3MWz2w9ZpaAuahFFDMPyjeCgKCLwkz7dx/PT1nD9VeW4sPbU1dCYLaFqF6sesp+ooAgqHwNbJ4FHuotfn7C3c2imDngGlpUKcGr09bTddQC1u49dmk3mYtJTEykbdu2VKtWjerVqydPWs3M5ni9e/fmvffeY9iwYXz66acUKlSI2rVrc/bsWVatWkXlypV55ZVXMivaKWA8sBb4AXjQqYT+6LhtA14H+mbJg7DkG9JURCLSA3gWo3iSjiRWAIHA0nTScO/ITOrgBNO035nKtSQZ2gCFgTUe5LMjcC6BzUc3U7ZQWQoHFb7gePoo7F8DFS82yy2LOcqT364kumIxPkhHCSXRvFxzVhxYQVx83AXHqjfA8d1wcEOq4SLCCvLp3dF8eHsD9h07TecP5vP9svxl5RERvvzySzZt2sR7773HkCFD+OGHH5I3x/vss8+YPn06o0ePZsqUKTz44IPMmTOHF154gS1bttC4cWP27dvHypUrWbNmDYcPH6ZatWqEhIQwePBg6tSpw1dffcX8+fNZsmRJuvL07t2bzz77DABVHaaqlVT1SlWd77glqurjqlpRVRuoauov2GLxQHpT5/8ARgD1gFJAB4zyAGNuGw/cCQz
2FNghrY7MNDs5ReQeYAhws9NiSoGqjgZGA0RHR9te7gyyNXbrxSsq7FwEKESlHKhw8MRZHv5qGWWLBjP67miCAzO2GneLiBaMXTOWxfsW06aCszFeFWexzc2/Q6nUtx8XETpdVZYWVUrw6ITlDJy8ivOJyq2Nymf0FrOE15e8zoYjWVum1iheg/9r/H9p+hGR5JWyY2JiqFu3LgcPHsz05ngjR46kadOmHDp0iFGjRjFlyhQaNGhA48aNGTVqFNWqVaNx48ZZen8Wy6WQpiJS1SMich4IxUxm8wf+AqoAi1R1oojMTieNfUBrl/NIYLHLNU+dn4jIU8C1QHNVPZShu7GkS0JiAtuObUvux0lmxzyzo2q56GSnxERl4ORVnDgTz/gHGlO80MUmu9RoUKoBoYGhzN41+4IiKloOStc2w7hb9E83jrCQID7v3YgHv1zKMz+upkzRYK6ulj+G/L7xxhu8/vrrhIeHM2PGDBISEjK9OZ6IEB4ezsaNG7nrrrto2LAhYWFh9O7dm3bt2jFihN2w0JIzyOhiYnOBWqo6CkBEiqvqROfabhGpqKoxqYSdAQwXkVIYU2Bz4GEAVd0pInEi0tpJ4y5giIiUB+4GGqhq/CXclyUVdp3YRXxi/MUtopj5EBkNgcHJTl8tiuHvTQd5rWttapQpkql0Av0Duab8NczeNZv4xHgC/Zwh3lXbwoKRcOoIhBRPN57gQH9G3dGAHh8v5LEJy/m1fyvKFw9JN1xWkF7LJTsZPHgwgwcP5ocffqBdu3b0798/05vjnTp1ijp16jBz5kxq165N06ZNmTNnDp07d+b9999nwYIFyXsZWSy+JCOKKBZ4EzgnIn9iBi/4i8h1GOUxFjPCzSOqekBEhgALHaeBwA0icoWqvgXcA3yBmbU9TlXniUh7oBKwXkSSohquqnbpkMtkS6yHEXNnjsO+VdBqULLT/mNneHPGRq6uFs7tjT0P8U2P6ytezy/bfmHp/qU0i2hmHK/sDPP+Bxt/hfp3Ziie0OBAPr07mo7vzaX/xBV8+3AzAvzzx1zsbt268fjjj9OjRw9Klrx4odi0Nsf7/PPPOX78OGXLluX6668nMTGREiVK0KJFC4YNG8Znn33GiRMnCA0N9fZtWSwpSFcRqeoXznIde4EpwD2q+oGIlMb0Gf0fZomPs2nEMQ4Yl8q15UAdN7fpmAEKlixmy9EtCJJyMuuuxaCJKfqHXvllHfHnE3nl5lq4VAYyRfOI5hQMKMismFkXFFFEfQirAGunZFgRAZQvHsJr3erw+DcreP+PzQy4ofolyZQb2LZtGyEhIZQpU4aFCxcSHBzM5MmTM705XrFixbjzzjtZv349CxYsYNOmTbRs2ZLbbruN9u3bM2jQIDZu3Eh0dHQa0lgs2U9GTXN3Y4Zkvs4FhfIiEAzsUdU4z8EsOY31R9ZTsUhFCga47Am0Yx74BUKk6bheFnOEaav38eT11ahY4tIXzSgYUJBW5VoxM2YmTzd+mkD/QBCBWl1h4YcZNs8l0bluBHM2/seHc7bSvnZZakZkzlyYW4iNjaV9+/acP3+eUqVKMWnSJBo2bMgjjzzCxIkT+e233/j888/x9/fnrbfeolKlStxyyy0AvPPOO9SsWZO+fftyyy234OfnR5MmTRg2bBgAM2fO5Pfff2fdunWEhYXRoIHnOWMWizfJqH3jCPAI8B7wvIhcA7wMPI+ZdGrJJaw9vJaaJdxe2Y55UK4BBIWgqrwxfSMlCwfx4NWVPEeSCW6ucjNHzx5lzu45FxxrdoHEBGOeyyTPd6pJWMFAnvnhX84n5s2Bkg0aNGDTpk1s3bqVhQsX0rDhhV0VevXqxZtvvsmoUaOoV68eH3zwAbVr106+Pm/ePKKiomjTpg0RERHJm+O1bNmSli1bMmjQIM6ePcvXX3/N999/j59f/jBxWnI2mcmFE4CnMH06Q1R1n6ruBs5li2SWLOfQ6UP8d+q/lIro7EnYtzJ5fbl5Ww6
xePsR+l5bhZCgy983sUVEC0qFlOKHzT9ccIyoD2EVYXV6c6EvplihIIZ2rsWq3ccYt2DHZcuXG0naHG/ZsmXs2LGD6tUvmCknTZpEjRo1+PPPPxk5ciTFihVj/fr1zJs3j7fffptixYpx6tQp5s2blzxE3GLxNemWNCLSGTOR9QxmyfcQoKiI3I0ZuGB7OnMJ6w6vA6BWiVoXHHcuMq2TSq0AGPnnFiKKBnNbk0sboOCOv58/N19xM2PWjGHPyT2UK1zOmOfq3gZ/vQ6xO02fUSa46aqy/Lh8N2//vpH2tctQLp9uPZ7e5ngjRowgIiKCmJgY/vjjD2rWrMl7771H3brpb/FhsXiTjLSIamH6groDfYB+Trhg57i0nmyL11l7eC2CcGUJl8mk2/8y/UPlm7JqVyxLth/hvpaVKBCQsYmrGeHW6rfiJ358ufbLC4717zB/V0zIdHwiwitdaqMKQ6eusSt2p0LJkiXZsGEDa9euZefOnZw4cYITJ074WiyL5SLSVUSqOhwzIm4eZh24nsACZ1+RT4DjaYW35BxWH1xNpaKVKBToMgBhx1yIbARBIXw6dxuhwQH0usTh2qlRplAZbqx8Iz9s/oEjZ5y1MMMqwBXXwsoJkJj5HQMii4XwZNuqzFr/HzPWHshSefOKYnvggQcYNmwYEydOZNOmTfTo0YMhQ4bQvXt3Dh8+nGq4vHL/ltxDRvuIFKOIlmCW9pkvIqEiUjQTcVh8yPnE86z4bwUNSruMkjoda+YPVWrFriOn+HX1Pm5vUoHCBS6/b8ide2vdy9nzZ/l89ecXHOvfBcd2wZZZlxZni0pcWbYIL/60lpNnE7JEzuDgYA4fPpznCuOAgAA6d+7MnDlzaNCgAddcc43HvYhUlcOHDxMcHOwhFosle8hoiVMcqIoxzf2OaRl1AQoCm1IPZskpbDq6iZPxJ4ku7TJnJGaBM3+oFd8sMYVS7+ZR2ZJ+5bDK3FzlZr7e8DU9q/ekfJHycOVNEBphVlqo1i7TcQb6+zGsa226fbSAt3/fyNCbaqUfKB0iIyPZvXs3eXkR3a5du3Lu3Dk6dOjAN998k6JfCYwyjoy0u65YvEdGFdFkIFJVb3Y2xAtX1RezTyxLVrP0gFkkvWHpC0OB2TEXAoJJiGjId18voHX1UpQtmn0d//3q92PGjhm8s+wd/nft/8x25E0fgZnPw94VZjRdJqlfoRh3NqnIFwt20K1+JHUii16WjIGBgVSqdPnD1nM6L774IgkJCRQrVozy5b27mKzF4k6GzGqq+pKqfub8jrFKKPcxf898KhapSJlCF9YmY/tcKN+Yv7ad4L8TZ+mZzatblwopxQN1HmDWzlnM3umslduwNxQoAnPfvuR4n2pfnRKFC/DMj//azfQywauvvmqVkCVHYPt38gFx8XEs2b+E1pGtLzge3wcHVkPl1kz6ZxclCxegTY1S2S7LvbXupWqxqry66FVOnDsBwUWgWV9Y/zPs+ueS4iwSHMjQm2qyZs9xvlyY2tq7Foslp2IVUT5g/p75xCfG07p86wuOzgCBI+Wu5c8N/3FLg3IEemEh0UD/QF5u/jKHzhzinWXvGMdmj0GhUsZEd4mDBDrVKUvr6uG8/ftG9h07nYUSWyyW7MYqonzAL9t+oURwCeqVqnfBcfMMCI1g8s4iJCQqPaK9Z6KpXbI2d9e8m+82fcc/+/+BAoWhzRDYuRBWfHVJcYoIr9xcm/OqvPjT2iyW2GKxZCdWEeVxDsQd4O/df3NzlZsJ8HPGpiScg61z0KptmbRsN9EVi1GllHcXO3+03qNEFo7k5YUvc+78Oah/N0S1ghlD4NilbQ1evngI/a+rxoy1B5i5LmvnFlksluzDKqI8zudrzLyd7tVcNkDbuQDOnWBr0eZsOxjn9S24wazM/VzT59hxfAdj1owBPz/oPNJMbv32bog/c0nxPtCqEtVLhzJ06hrismhukcViyV6sIsrDrD28lm83fkuXKl0oH+q
ibNZOgcAQPt9fiUJB/nSq45vFL1uUa0H7qPZ89u9nxByPgeKVoOvHsGcZ/NQPEjM/Ai7Q349h3Wqz99gZ3p1lp7hdKufOnePRRx+lWrVqVK1ale+//x4AEekvIjtFZKOIdEjyLyIjRGS3iKwWkYapRmyxeMAqojxAoiZy+PRhjp45SqImkqiJLNq3iL5/9KV4weI82fDJC57PJ8D6n4ivcgM/rjnKTXUjKJQNKylklMGNBhPkH8Rri14zqxnU7AxtnofV316yMmpYsTi3N6nAmHnbWb37WDZInfc5cuQIbdq0YdOmTUybNo37778foADwGGb9ya7AGBEJFJE2QEsgChgA2J2ULZnCdyWQ5bI5n3ier9Z9xVfrvuK/0/8BIAgiQqImUq5wOUZdN4qiBVwmee74G04dZlHB1pyOP+8Ts5wr4SHh9K3flxFLRjB3z1yujrwarh4E5+PhrxFw+ih0+wQKZG6R9/9rX4NZ6w7wf9//y9S+LbwyIjAvUaZMmeTtx6tVq0ZAQACYFVbGquoJYJ2I7MCsyN8NGKeqCcBMEQkXkTKqut830ltyG1YR5VLOJ57n/+b+HzN2zKB5RHPuq3MfQHKrqFLRSrSt2JbgALc1w1Z+AwWKMHJnRaqW8qd++TDvC+/GrdVu5ZsN3/DO0ndoHtHcDKq49hmze+v0Z+CzttBrApS4IsNxFi0YyMs316bP+GV8Nnc7j7TOeFhLSsaOHctVV13F7NmzAwDXiVq7gbJAeWCqi/sex/0iRSQiDwEPAVSokLWL61pyL1YR5VI+WvURM3bM4MmGT3JvrXsRycBuHHGHYN0Ujta8gyX/nOa5TldmLFw2E+gfyJMNnuSJOU/w45Yf6VGth7nQ5GEIrw6Te8Poa03/UY2OGY63fe0ytK9VhndnbaJ97TJUKnnp257nV0aMGMGkSZP49ddfiYiIEMDVVpoInAeCUnG/CFUdDYwGiI6Ozlsry1ouGWuvyIVsOLKBz1Z/RucrOnNf7fsyrkyWfwHnz/HN+esI9Be61i+XvYJmgjYV2tCgVAM+XPEhcfFxFy5Ubg0P/WUGMky8DWa9lKltI166uRZBAX4888O/eW5F7ezmscceY8OGDcyfPz9pN9d4wDXTRAK7gH1u7hGY1pLFkiGsIsqFvLvsXUKDQhncaHDGA509AQs+4Hzlaxm9oQA31CpDicIFsk/ITCIiDIoexOEzh/li7RcpLxarCPfNgAb3wLx34KuupnWXAUoXCebZjleyaNsRvl26Kxskz5ssWrSIjRs3Mm7cOEJCQpKcjwG9RCRERK7E9BmtBKYB94iIv4i0BTap6hGfCG7JlVjTXC5j2YFlzN87n4ENB6YchJAeCz+E00eYV/5hYtfFc1ujnGefrxNeh3ZR7Ri3dhw9qvUgPCT8wsXAYOj8PkRGw7RB8MnVcMd3ULpmuvH2jC7PlBV7eG3aeq6tXopSRexeO+mxcuVKli5dSpUqVVydA4BxwFrgDPCAqqqI/AhcA2wDDgO3e1ncbCPq6WmXFX7HiE5ZJEnexraIchnj1oyjeHBxetXoZdZlO7IdDm6ChLOpB9q/Gv5+C2p15aPNYZQvXpDmV5TwntCZoH/9/sQnxvPRqo88e2hwN9w/w5jnxnUyG/ulg5+fMLxbHc4kJDLULv+TIfr06UNsbCxbtmxJPoDjqjpMVSup6pWqOh9AVRNV9XFVraiqDVR1g2+lt+Q2rCLKRew5uYe/dv9F96rdCV41Cd6tA+/Xgw8bwfDy8GUXWPIpHNtzIdCBdTDhVggpwY4mL7Fo2xF6NaqAn5/vByl4onyR8vSs3pMfNv/Atthtnj1F1Id7f4WgQvDFTWYvo3SoHF6YJ66vym9r9jN9jR1VbLHkJKwiykVM2jgJP/Gjx+ZF8PPjUCQCbnwXuo6GRg+Ybbd/HQT/qwmjW5u+lE+uhsR4uPN7Pl9xgiB/P3pE5+zdNx+66iGCA4J5d/m7qXsqcYVRRgWKwvjucHhruvE
+2KoyV5YtwgtT13DsdHzWCWyxWC4Lq4hyCWcSzvDj5h9p41eEMut/getfNB340fdC3Z7Qfhj0WwaP/QPXDQW/QDh5EBo/CI8sILZINSYv3c3N9SIoFZqz+0iKBxfn/tr3M3vXbJYdWJa6x7AKcNePgMJXXcweS2kQ6O/H67fU4dDJs7w+3VqPLJacglVEuYTpO6YTezaW23ZvMEqo5ZPgadh2eDVoNQAemAmPzIP2w6FwKSYs3snp+PPc3yp3bIN9Z807KVWwFO8sfSftYdclq5hBC6eOwPhb4HRsmvFeFRnG/S0r8fXinSzadjhrhbZYLJeEVUS5hImrx1LlXDzRUW2hxROZCnvqXAJj5++gVdWS1ChTJHsEzGIKBhSkb/2+/HvoX2bGzEzbc7kGZuWFQ5vgm14Qn/bGeE+2rUb54gV55ofVnInP+Jwki8WSPVhFlAtYc2AFa49v49azitz0nueWUBp8sSCGQyfP0v+6qtkkYfbQ+YrOVAmrwrvL3+Xs+TRGBYKZ+NptNOxcBN/dZxZ3TYWQoACGd72K7YfieP+PzVkrtMViyTRWEeUCvvn7BUISE7mp9TAoVDJTYY+djufjv7bSuno40VHFs0nC7MHfz5+nGj3FrhO7GP3v6PQD1O4GHd+Ejb/CL/3T3Ha8ZdWS9GgYycd/beXvTQezUGqLxZJZrCLK4cRun8P0uO3cFFyOwrVvyXT4t2Zs5MSZeAbdUD0bpMt+mkc056bKN/H56s/ZdDQD+ws1fhCuHgwrxsMfL6fp9aWba1G1VCj9vlnBriOnskhii8WSWawiysmcO8WU35/gnAi3Xjsi08GXxRxl/OIY7m4WRe1ymViFIYfxVKOnCA0K5YX5LxB/PgPDrq99Fhr2NssBzR6WassoJCiAT+5qiKry4JdL7ZBui8VHWEWUg4mf+QITAs7SsEgVqpVpkKmwR+PO0e/r5ZQLK8jAG6plk4TeoVhwMV5o9gJrD6/lnWXvpB9ABDq9A/XvhL9eh9/+L9UN9qJKFuLDOxqw9eBJ7hv3D6fO2e3FLRZvYxVRTmXzLH5Z/zX7AwK4v9GATAU9dS6Bh75ayqGT5xh1RwNCgwOzSUjvcX3F67njyjsYv348v23/Lf0Afv7Q+QNo1heWfAITbzeb7HmgVdVw3utVnxU7j9J77D8cO2VbRhaLN7GKKCdy8iDxUx/lsxIlubJYDVqWa5nhoP+dOMPdY5awLOYo7/Ssy1WRYdknp5cZ0HAADUo14Nl5zzJ/z/z0A4jADa9Cx7dgyyyz2sQOz+E61inLu44y6vHJAttnZLF4Ebv6dk4j4Rx8exffBJxjp18BPqjfN0P7DZ2JP8/3y3fzzu+biDuXwPu31efGqyK8ILD3CPIPYuR1I7lv+n08/ufjvNryVTpU6pB2IBEzgKFsXfj+fhjX0Zjsrh1ilkhyoXPdCEoWDuLhr5bR8b25vNKlNjfXi8gRmwda8ieXu/o35I4VwG2LKCeReB6mPsa+PUv4qERJWpRrwdWRV6fqfW/saX5YvpunJq+iybA/GPLjGiqVLMTUx1rmOSWURJGgInx6w6fULlmbwX8PZsi8IRyIO5B+wPKN4dFF0PxxWDUR3qsL0wbCwY0pvDW/oiS/9W9F9TKhPDFpJXd8tpi1e49l091YLBawLaKcQ/wZmPoo59Z8z6AaDdHEUwxpPCRFbfzQybMs3HqYBVsPsWDrYWIOG/NRWEgg11QLp1ej8jS7okSer8EXCy7Gpzd8yserPmbsmrFM2zaNZhHNaFi6IeUKl6NEcAkKBxWmcGBhCgUWonBQYYL8gpCgQnDDK2aB2Llvw7Iv4J/PoEIzqHc7VO8EhUoQWSyEiQ81ZcLinfxv1iY6vT+PFlVKcFfTKFpXDyc40N/Xj8BiyVNYRZQT2L8GpvTh1IE1DK7Vgn9P7eKta96iSGAZfl+7nwVbD7Nw62E2HjgBQGiBAJpULsHdzaJoVrkENcqE5thtHbKLIP8gHm/wOF2
rduXbjd8yZ9cc5u2Zl6r/AAmgUFAhigcXp2yhspQtWZayHZ6l7KFtlN0+nzK/PkGJX54gpEJzqNGJgErXcE+zK+lSrxzjF8fw1cIY+oxfRkiQP1dXDadRpeI0rFiMmmWLEBRgDQsWy+XgFUUkIrcCrwPngWGq+rnLtdrABCAM+Anor6qJInIN8CkQCHymqq95Q1avce4UxMyHlROIXzuF2WEleadaPfae2k39kPsY+XMIj+z9nUSF4EA/GkUV5+b6ETS/oiS1I4oQ4G8LP4DyoeUZGD2QgdEDOX7uOP/F/ceRM0eIi4/jZPzJFH9PnDvBkTNH2HtyLxuObODIGWc360JAoXIABGsMYWvfp9i/7xKGP2EhJSkdWp6+LatyUq5gzYFiLN+6j+lrzZ5GAX5CVMlCVC1VmCqlClM5vBDlwkKICAumdJFgAu17sljSJdsVkYiEAm8DTTGKaKWI/KyqSeuqjAKeBn4H/gQ6i8hU4DPgFmArsEJEpqnqysuRZfqa/ew6cgpFWX18BvGJp1Gg1LGViCa4THxU8y/pXM35BXcQktxASXS8aXLopP9xcfPXcwScP02BxFMEJxzjmJ+wO6AAqytWJN4vkcQTwpn9D7Akvir1ywfSt01Vml9RgvoVwigQYM1B6VEkqAhFgjK+qOuZhDMcOHWAvSf3sj9uP4fPHCb2TCxHT+wmNnYHsXEH2HXuCH8cOcK52H8vBCwDZUoL4QQRdr4AIQkFOBfrz/KDwopEP/zUD0EAITDAn6AAPwL8/QjwM3/9/QQ/QEQQEfzE+e0mX1ptXAEOhdYgPqCwx+vB/oWpGXpdhp5DRFhBOtYpmyG/Fkt24I0WUTvgL1XdAyAifwLXARNFJByopKq/OdcmAO2B3cABVf3Xcf/OcV/pGrGIPAQ8BFChQoV0BZn0z05mbzT6r1DV8fgFnEg7gLj9TcurqlP0pAwugLhO7A8wPpSi+GsB/ChLCb8oqhdtTMtyLahRJsyae7xEcEAwFYtUpGKRimn6Szz5H0d2L2Lf3qXsPbad/acOsPdsLHsTz7Cf0+wIhLNBEC9CwqX2z6Wx00WqHP831Uvnz4bzw7aMDVhpfkUJq4gsPsUbiqg8EONyvhtIyvWRwE63a51SCXPRYmmqOhoYDRAdHZ3up/zhHQ04n6iICKfiW4CAIPidO4mf+CHi59RSzW8/Pz/88EP8jLufnx+CH35+/oj4gVOjteRt/AqXomSNzpSs0Zk6qXlSBVUS9Tzx589yPtFpYWuiORJTbjdxoTUNCYlK/PnEZDeSr3henSjZKSDETNyFi/ZsEvGjUGChDOk3f5uHLT7GG4ooCHBdXyURY6JL61paYS6ZkKALt1u4QNiFC4Vy7zpslhyCCIjghx8F/HP/ShaXQlp9wRZLWnjD/rMPKOdyHgnsSudaWmEsFksOw6UvuKVzDHNM7xZLunhDEc0A2olIKREpAzTHDExAVXcCcSLSWkT8gbuAycAioLqIVBeRQkA34AcvyGqxWC6N5L5gVd2PGXiUsdESlnxPtpvmVPWAiAwBFjpOA4EbROQKVX0LuAf4AjN8e5yqzgMQkfuBnzFmutdVNeaiyF1YtmzZIRFJ008WUhI45KW0bNq+T9vX6eeUtNMa1ZFWX3AyrgOMgJMistHdzyXKdhHy+mXEnL9kSHu0jhcQ905OS/qIyFJVjbZp54+0fZ1+bkhbRP4PKKyqzzvnI4C9qvq+r2XLTqwMWYMdI2yxWLIC269ruWSsIrJYLFlBqn3BFkt62LXmLo3RNu18lbav08/xaXvqC1bVuOwTC/B9ngArQ5Zg+4gsFovF4lOsac5isVgsPsUqIovFYrH4FKuILBZLrkREaohIpJfSinIm17u7FxWRHLEdsojk2rWlrCLKBCLSVURWi8gOERnjrAaBiIwSkd0iskVE1mZj+reKyHYnnfuyKx0nrSDnvjaJyGYRucVxP+akv0VEXs7G9Ne6pPO549ZfRHaKyEYR6ZBN6d7
pku4WEYkTkR7Zed8iUkBEHhGRH93cPd6viIxw8ttqEWmY1Wk7hetE572vEZGrHfdSInLK5Tk8fDlpZ1LOpSJSweX8A8xIvT9F5F4viDATKODBvTDwvRfSB5K/y94i8raIvC4id7kooGybs5XtqKo9MngA9wEhmM36ZgG3O+4TgehsTjsUMy+jHFAG2A+EZ2N6ZYDuzu9qQCzmQ1ztpWe9xe38CmCT8xxqAnuBwGyWoSiwOrvvG9gB/AjMSu9+gTbAPMyI17bAymxIuw5wjfP7WmCT87sG8Is33r8nOV1+twPWOd9iKLDKC+lvT+PaJi89gyrARmAs0Ad4GBjnuL0ILPHFu8mKww7fzgSacmfZVUDSoo7FgSPZnHyq+zplR2Jq1gv7zvm9SUQSMMrpaHak50kEt/OuwLeqegJYJyI7gIaYdQmziwHAJ0AJsve+6znHcy5uqd1vN8xSWAnATBEJF5EyzvvKkrRVdbXL9aV4N5+nxkkRSVqV/01gsKqeguQFV7MbFZGCqnra1VFEAoCM78Z4ebwPPKeqk13cPhGRXsCHQG0vyZHlWNPcJeDYhLsBvzpOIcAfIrJCRO7IpmQztJZXduCYPv7FbKpdS0S2isgvIlIlm9IrBJQWkW0iMltEGuHl+xeRYOBOTO0zjGy8b1WN9eCc2v26u+/hMp5DKmm7MgjTYgLTQmznPIdvRKT0paZ7CXwK/O0cm1X1FwARqQgkeCH9CcC7Ihdt3vQcZoFXb3CVmxICQFUnAsdUdZ+X5MhybIvIAyLyCab26cr9qrpKRKKBr4GnVHUzgKq2dMLVAmY5az9dzmKOnsiWPZrSQ0SeBnoCHZ2MXkJE/IAnMYvVtsjqNNVMhCzipN8DUxD+jHfvvyfwmyPLOrxw3254da8ud5ya/juYWvbNAGp2Ui7t9Em84VzPropXClT1PRFZgmmVzXC5FArc7wURXga+Af4VkT+As8DVmJb7zV5IH0yrzE9VXd8/Tr4M8pIM2YJtEXlAVR9W1Wi3Y5XTafs50ENVv/MQbi0wH7gyG8Ty+lpeIvIhpl+ghWtty/kQPgFqZWf6TlqTgWC8f/+3YbYkcZXFa/dNxvfqisC0lrIMp9b/AxAH3OCYB5NR1XhgDN55Dq7pLlTVaY5ZMsltjarO9ULa8araHdNPHIN5D8+ranNVPZjd6TvMwvQFuTMYyPZnkJ3YFlHm+BjooG5bUojZ0mKrYyZogjFnZDUzgOEiUgpTgWiO6azMFkSkKVBdVa93cSsNnHRaCXcCS7Ip7aJAgKoedkaLHQGmAV+JyFuYZeuLAyuzKf1CmBZx0pYkXrlvN1K732nAoyIyATNwYZOqZnW/TU/goKo+4+ooIuWBA0A8piXkjeeQo1DVf4B/fJT8AGCSiPwD/IVpDbfCmCa7+kimLMEqogwiIgUxo8f+cDET/6mqDwFfOv1GpzFrbO3I6vTV+2t51QOiRWSLi9sYoI8zcGEL8GA2pV0cY+IEMzqwh9MiHQ+sBc4AD6gzlCgbqAesVdUkk1dlYKIX7jsZVV3m6X6dYdbXANuAw8Dt2ZB8PaCz27vvAlQH3gXOAcswI7csXkJVj2L2cmsO1MWY455V1dm+lezysWvNWSwWi8Wn2D4ii8VisfgUq4gsFovF4lOsIrJYLBaLT7GKyGKxWCw+xSoii8VisfgUq4gsFovF4lOsIrJkGyJS2NcyWCyWnI9VRJZsQUSKAP84KyZ7ul5URK4SkXdFJMrF/cVsXDjWYrHkQOzKCpbs4mnM0iN/OiskXIVZwTuJGYC/83u8s5Dkq5iVpINEJNBZ08xiseRxrCKyZDki0h5oBtRPWqBSRBYBrV3OawPdnSB3quoOZ6Xv0ZhN4J7A7DtjsVjyOHaJH0uWIyKdMMvjvwqccpzdW0QnMPs41cdsvlYEeAmzztlOYAFQU1Wnekdqi8XiK6wismQLItISuFFVn3bOFwEtXVpErwLjgaHAI8BbqvqAM8AhSlXX+Eh0i8X
iZaxpzuJLgoAbgCpAuIjMSbogInUxm/EtTCWsxWLJI9gWkSXLEZGxQFU3Z3fT3EHgJLAes9tp0kZj+5w9iDqr6iPekNdisfgWq4gsXiEV09x6VZ0gIvWBxzC7THYFBLjFdSdOi8WSd7HziCy+Ygdwr4j8DrwBoKpfAAMxO09W9p1oFovFm9gWkSXHISL+LrujWiyWPI5VRBaLxWLxKdY0Z7FYLBafYhWRxWKxWHyKVUQWi8Vi8SlWEVksFovFp1hFZLFYLBafYhWRxWKxWHzK/wPISsT4hB8l1QAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "%matplotlib inline \n", + "import matplotlib.pyplot as plt\n", + "fig = plt.figure()\n", + "fig.set(alpha=0.2) # set the figure's alpha (transparency)\n", + "plt.rcParams['font.sans-serif']=['Microsoft YaHei'] # CJK-capable font so Chinese labels render\n", + "plt.rcParams['axes.unicode_minus']=False # keep minus signs readable with that font\n", + "\n", + "plt.subplot2grid((2,3),(0,0)) # place several subplots on one large figure\n", + "data_train.Survived.value_counts().plot(kind='bar') # bar chart\n", + "plt.title(\"Survival (1 = survived)\")\n", + "plt.ylabel(\"Count\") \n", + "\n", + "plt.subplot2grid((2,3),(0,1))\n", + "data_train.Pclass.value_counts().plot(kind=\"bar\")\n", + "plt.ylabel(\"Count\")\n", + "plt.title(\"Passenger class distribution\")\n", + "\n", + "plt.subplot2grid((2,3),(0,2))\n", + "plt.scatter(data_train.Survived, data_train.Age)\n", + "plt.ylabel(\"Age\") # y-axis label\n", + "plt.grid(True, which='major', axis='y') # flag passed positionally: newer Matplotlib no longer accepts b=\n", + "plt.title(\"Age vs. survival (1 = survived)\")\n", + "\n", + "\n", + "plt.subplot2grid((2,3),(1,0), colspan=2)\n", + "data_train.Age[data_train.Pclass == 1].plot(kind='kde') \n", + "data_train.Age[data_train.Pclass == 2].plot(kind='kde')\n", + "data_train.Age[data_train.Pclass == 3].plot(kind='kde')\n", + "plt.xlabel(\"Age\") # x-axis label\n", + "plt.ylabel(\"Density\") \n", + "plt.title(\"Age distribution by passenger class\")\n", + "plt.legend(('1st class', '2nd class', '3rd class'), loc='best') # one legend entry per KDE curve\n", + "\n", + "\n", + "plt.subplot2grid((2,3),(1,2))\n", + "data_train.Embarked.value_counts().plot(kind='bar')\n", + "plt.title(\"Passengers per port of embarkation\")\n", + "plt.ylabel(\"Count\") \n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A329BAF0557149368662B041B9D39B74", + "mdEditEnable": false + }, + "source": [ + "The plots show that a little over 300 passengers survived, fewer than half; third-class passengers far outnumber the other classes; the ages of both victims and survivors span a wide range; the age distributions of the three classes have a broadly similar shape, with passengers in their twenties most common in 2nd and 3rd class and passengers around 40 most common in 1st class; and the number of passengers per port of embarkation decreases in the order S, C, Q, with S far ahead of the other two.\n", + "\n", + "This suggests a few hypotheses:\n", + "\n", + "- Cabin/passenger class probably reflects wealth and social status, so the survival probability may differ between classes.\n", + "\n", + "- Age almost certainly affects survival as well; after all, as mentioned above, the first mate ordered 'lady and kid first!'.\n", + 
"- Does the port of embarkation matter? Perhaps passengers who boarded at different ports came from different social backgrounds?\n", + "\n", + "Let's tabulate these attributes and see how they are distributed." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A097A57249B140889E39306B2C44A70B", + "mdEditEnable": false + }, + "source": [ + "### Relating attributes to survival" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A8B155CD2A154348A15AB35464861C24", + "mdEditEnable": false + }, + "source": [ + "#### Survival by passenger class" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "2CC0D1CFD0E54BCE8486ADB8FE0E2EAD", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEUCAYAAADEGSquAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgDklEQVR4nO3deZRU1b328e9jMyuIIIgCl3aIAYwJiTgGBxRilPd64xCCwxVjFGcwr1c0Ro04R6M4RMUBLiaYV9QETYKRoIm6zJWriKCigoYItqI0oBEHDNi/9499OBRtMylV1cPzWasXfXadOrWrmnWe2sM5WxGBmZkZwGblroCZmdUfDgUzM8s5FMzMLOdQMDOznEPBzMxyDgUrC0nNS/AaOxT7NcwaG4eClZykVsBLknb+gs+XpDaSukj6t3Xs+ktJ56zjOM0kvSipd63yZyV9ay3POVvSVzairtdIOrJguyI7/tbreM4lks4s2D5B0i/Wsu9eku6VdJWk72RlwyWdUMe+e0u6rI7yaZK6beh7ssatWbkrYE3SSGAH4PeS6nr8mIiYIekRoANQk5W3BwQsBpYDHwN/B86W9A1gbK3jtAd6Szq6VvkpEfEccBrQBhhXUI8pwHbArQVlf4mICyT1BE4EbpN0ffb7e8BH2X5fAXaPiBcgbw0dCxSe0PcHKoAPs3AEWAHcAvTLtjsDKyWdWvA+Wkn6brb9LPAj0pe6iuwzuRk4HPgzcCQwnM9rCWwlaQvg0YLyXUh/i39l25OBl4CIiAclVQKvAPNrHa858F5E9K3jtayBcihYSWXfZo8CXgV+FBHPZOUCRgN9gJcl7QhcUuvpR5FOghNrHbM30BZ4KSJOkDQU6BARo+t4/fFAW0kHAf8OfAv4DelEWgPMAvYGbsrqt6SgfjcBw4AjgAlAF+COiHhcUi/gfmB2tv/3sn0WA89I6gLsRQqJrYCZWZW2Ac6NiFMltY6ITySdTDrZPpAd6/8AO0bEjQXv4yLgB0BroCMpDJD0BPBtYGJBqE0GrgG2BA4EDs62V7WyzgYeAv6ZbT8EtAD+IqkGeAGYFRF71fosK4EHan/G1rA5FKxkJO0J3A4cRDqJ/1bSj0gn0jtIJ8vDImJ5dsLplT11B2Ae6SS2GdAX2JHUSgB4C3gauFvSXtlrVEk6tlYVHgZ+DbxGOrEfExH/lDSF9G25P3BNRCyQdE/2OlOy554DfAJ0J4XVnlldtskevxT4WUR8VvB6D0TECdl7fxzoBgwEvhIRn2bldwKfSmoNPCHpGFJ4TC04Tj/gUUl3A7+KiMeA24DfZZ/j2cAxpEA9EPhFRJyffd7nR8Q5ki4EWgH/Axyafd4fZ8f/JPssF2fbH0ZEtaTDgGuBk7Emw6FgpfQ28IOImAeQnbQfAz4FJgHfLzipvkb6Fgzpm3SP7Jtvs4j4paQ3SF0oQ4H7I+IdSX2AMaQT328j4pbsdQ4inQQfjIgZ2TFvlvRXSW2z7eOBrsBySWesqrCkiyPi26Tuk3+RunmOjoj3sxP590ktly7AKEn3R8Taxuq+S2ptFN5bphnwadZCOJfUUrk1Iu4t2KcT6Zv7OcBFkp7Myu4DVtX1v4DewH7An7KyXVkdnDsAT5K6oh7J3kf+NoEfF9TrTeDwrIvtwCygrYlwKFjJRMSbkj7IupCOJH1j/TXpm/6PgKck/RGYQwqFntlTC/+fVmT/tiJ1//w/YFbWT/5TUtdINTBBUj9S4HwVGBQRr9aqT//CbUlXA69GxPg66j5Z0k+Asdk3dUjdMV0kdY2IfbPX+9xAboG7svd0j6TBkW481oLUUjgMuAJ4AliataB+nD2vK6l19WG2fWFE/EzSr4Cvk/r2dwX+G2hHasVA6gabnP3+L2AB0Dci/ihpOfC1Ouq4Mi
J+WfCZdMx+/YakBaRQqQa2BRaSxlSsEZFviGelImksKQieIHXL3B8RH2aPbQYMAA4gncyeIvXJf5001tCd9E23U1b2Gunb7zeB84HBQA9Sl9O+pJPop9lLtwX+BrwILCWdHE/JnluXFtlxZmXbs0ndNVOBe0mtgnOBB0lB1D8bE7iP1EKZmI0p3E06cULq+tonO+bDwP9ExGWS7icNkHchDf4eCQyPiB0LPrfpwOSI+Fmtz7PNqtcHro2ISdnn+CRwMWmsZMeI+Cjb/wDgqIg4U9Jd2ecwgzWNjYhtsv17ANOA3UiD8jOBQ4CrgfGkge2zI+Kna/kcrQFyS8FKaQRwEulE2Qc4r47ZRzuSxhZGkU7ET2fl55BmBX0AXEg6Qc8jzUIaxuqT9SzgOWB0RCwCkNSJ1C+/K+lb9CMR8cO1VTKbFfR6RPQpKNuW1G//KvAPUtfWp6TB5RGSfkYKqfsLDjWp1pgCERGSjgPGZ91PrYFPImJ81lo4CghJ20TEu9lU0dbAYZIui4iV2fG6kwZ5nyXNfvqppO0i4hZJN5LGG25bFQhrcTQppAu1Kfj9NFJwv03qtrqI1d1RREQ1KZSsEXEoWMkUtAqIiJ517ZONFRAR5xSUzQVOJU27FPAfwD0RcVLBU5/I9n2d1NI4s1bgrJo+eUC236HAldljrUmDrflLAp0lzcy2R2XfwmeQWjLnkYLnkYiokXQF8EfgPyOihvXIZjT9e1aPLYCPs4Hggdl7u5/UInoX+DlwIylERwDXZYc5nhScS0kDzYOBn0mqIA1+b0HqMlqXH0fEI4UFkqqyf9uTptzunm1vDvwwq1/h/l2AvSNi0vretzUMDgWrt7IT5s9JJ/mBpG+2zUjdFo9J2gcYmX1jzdU1bz77xj2hYJ+HgYezawluAN5f1Q0iqQNpCmafgudfT5qN9CdS3/1dwHcl7QLcSprxdIOkDyPiwexpR2VdNpBaMUhqBrSNiPeyQe6ewCJSV9TVpBlN3UnjCteSwuEu0vjFDEkfRMSd2edSk30mK7LjXUgKjX1I4yi/zlo4IyPi/To+4tGSLqlVtmX277nAlIiYn534J5IG6l9Umi7cItuvN3ACaaKANQIOBSsLSS+t5aHtsseHk/rF7wX2yaapApCdAL9NOjHOl3RLRJxbcOzpdRy3OXUMikbECuAMpSukLyF9664g9ccX7vd/C44/gTSX/1DgIuCsbBxhIvAbSf2Bv/L5KamQTqaLJa2aDvpQduJtQerjV/baY0gXtX0va328J2kg8JDS1dAzgT+QurCGZseaSpp5dUBEfCxpQPYZjQQuqOMzWaOlIKklqy9QewJ4U+nK7oeBGyPiquyxt0gX073G6llR1kh4oNlKTtI7EdFlLY+9QZoV04M0E2ZOwWNnkqak3lBQ1h1oXjDN9fWI2KmO43YDJqzqPvqS9T+ANBB+ODA9Iv5R8Fgn0rftBUCLVV1mtZ4v0uypmlXXKxQ+lo07bBsRC+t4bkuA2s/LHmsVEcvrer3smAew5kDzAxHxiKQDgTtJgfTHiBhe8NzNgO4RUftqZmukHApmZpbzDfHMzCznUDAzs1yDH2jeeuuto7KystzVMDNrUJ577rnFEdGpdnmDD4XKykqmT69rsomZma2NpDonD7j7yMzMcg4FMzPLlSQUJM2W9Hr2My4rGyFpgaQ5kg4p2PdqSVVKyyTuVor6mZlZUqoxhZaFFxRll8mfQVrYpDtpAZEepLtb9gMqSXd+HEu658tGWbFiBVVVVSxf/rnreJqMVq1a0a1bN5o3b17uqphZA1KqUKh9hdzhwH0RsYy09OIbpNvzHgGMz+4EOVVSJ0ldIuKdjXmxqqoq2rZtS2VlJXXchbPRiwiWLFlCVVUV22+/fbmrY2YNSNG7j7K7K24jaZ7SSle7k1oHhSPfVaRFO2qXv5WV1z7mMEnTJU2vrq6u/TDLly+nY8eOTTIQIN2FtGPHjk26pWRmX0zRQyEiPoqIdhGxA+
lukpNIN9EqvMVwDfDZOsprH/OOiOgbEX07dfrcNFuAJhsIqzT1929mX0xJZx9FxP2kG4EtJC0xuEo30rqwtcu3I7UizMysBErRfbSlsnVes1lGS0nrxg7JblfcC+hAuhXwZGCopIrsNsFzI2JpsetoZmZJKQaaO5BmFwG8A3w/ImZl96SfTVpO8aTs1r6TgP1JyywuIS1/+KVVnj95/TtthDeuHrRR+48ZM4b27dszZMiQzz22ePFiTjrpJB588EEAjjvuOC6//HJq37pjyJAhnHrqqcyaNYsRI0Zw3HHHMW7cOFq0aPG5Y5o1Spdsuf59GrJL/lnuGgAlCIXsXvM71lF+JauXQ1xVVgMMz34arHnz5nHMMavzbNGiRVRUVHDDDTfkZb1792bu3LmsXLmS1157jX79+gEwd+5c5syZQ8uWLenZsyc9evRgypQpzJkzh/nz51NRUcGAAQOYMmUKBx54IAC77747o0ePLul7NLPGqcHf+6g+2mGHHZg2bVq+XVdLobq6mtmzZ9OnTx8uvfRSrr/+egDOP/98jj32WLbZZhs6d+7Mq6++ytChQxk5ciRjxozh+eef5/HHH+fuu+/m0EMPpX///g4EM9tkfJuLMtl6660ZP348gwcP5uijj87Lp02bRufOnbniiitYunQpK1euZPjw1HCaOHEiEydO5LrrrmPRokWsWLECL5JkZpuSWwpFcPrppzNjxox8+4UXXqBFixZrdB/9/e9/Z9999+WTTz6hWbNmeffRzJkzOfLIIwEYPXo0l112Gaeddhpjx46lY8eOnH322fTu3Zsnn3ySrl27sueee5b0vZlZ4+ZQKIJbb701//3mm2+mVatWrFixghEjRjB48GAAZsyYQceOHdlvv/149913eeqpp/jggw/o0KEDd911Fz179gTS1clTp05l5syZvPfee9x8880MHz6c/fffn6uuuoobb7yxLO/RzBondx8VybJlyzjttNPYaqutGDJkCGeddRYvvfQSI0eO5KOPPiIiGDZsGCNHjmT27NkAjBs3jhEjRnDdddflxznvvPOorKzkW9/6FldddRWnnHIKK1euZL/99mPJkiXsuuuu5XqLZtYINYmWwsZOIf2y7rnnHiZMmMCoUaPYY489GDNmDACXXnopkydPZsCAAZxwwgnce++9/OlPf2LRokU8+eSTPPTQQ0ydOpUTTzyRO++8k5NPPplRo0bRunVrpkyZws4778xjjz3GuHHjmDZtGgcddBAnn3wyN910E61bty7pezSzxqlJhEKp9e7dm4ceeqjOawgGDRrEwIEDmTZtGgMGDGDFihUcffTRXHjhhfz2t7+lWbNm3H777QwZMoSFCxfSq1cvLr74YrbcckvatWvHWWedhST+8Ic/0KpVK0aPHs15553HTTfdVIZ3amaNjRr67JW+fftG7eU4X3nlFXr16lWmGm28ZcuW0aZNGyoqKtYoX758Oa1atVqjbMWKFRt8O+yG9jmYrZMvXtukJD0XEX1rl7ulUA+0bdu2zvLagQB4fQQzKyoPNJuZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmuaYxJXVTz2/ewPnEG7KAjhfPMbP6pGmEQomdfvrpvPDCC+tcQOfpp59m77339uI5ZlavOBSKYNSoUetdQGfp0qW0adPGi+eYWb3iMYUi2JAFdBYtWuTFc8ys3nFLoQhuu+02Pvjgg3UuoNOuXTsvnmNm9Y5DoQj22msvBg0atM4FdCKCc88914vnmFm94u6jItiQBXS8eI6Z1UdNo6VQ4lvSzpw5c70L6HjxHDOrj5pGKJRY//7917uAzttvv829997rxXPMrF7xIjtFtqEL6GzM4jkbqj59DmZfmhfZ2aS8yE6ZbOgCOl48x8zqAw80m5lZrtGGQkPvFvuymvr7N7MvplGGQqtWrViyZEmTPTFGBEuWLKlzjWczs3VplGMK3bp1o6qqiurq6nJXpWxatWpFt27dyl0NM2
tgGmUoNG/enO23377c1TAza3AaZfeRmZl9MQ4FMzPLlSwUJD0s6a7s9xGSFkiaI+mQgn2ullQl6UVJu5WqbmZmlpRkTEHSwUAf4G1JOwJnALsA3YFHJfUA9gX6AZVAf2Bs9hwzMyuRorcUJG0OjAKuyYoOB+6LiGUR8TLwBrAbcAQwPiJWRsRUoJOkLsWun5mZrVaK7qMbgeuB97Pt7sD8gsergG3rKH8rK/8cScMkTZc0vSlPOzUz29SKGgqShgIREfcVFLcAagq2a4DP1lH+ORFxR0T0jYi+nTp12sS1NjNruoo9pnAW0F7Sq8CWQGugHbCwYJ9uwJtZWdeC8u1IrQgzMyuRorYUsm/zO0VET+AnwAPA7sAQSW0k9QI6ADOBycBQSRWSBgJzI2JpMetnZmZrKvkVzRHxnKQJwGxgOXBSRISkScD+wDxgCXBMqetmZtbUlSwUImI8MD77/UrgylqP1wDDsx8zMysDX9FsZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUa5cprZtb4VC7/TbmrUFRvlLsCGbcUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHJFDwVJm0maKmmupDmSDs7KR0hakJUdUrD/1ZKqJL0oabdi18/MzFZrVoLXCOD4iFgo6bvAFZJeB84AdgG6A49K6gHsC/QDKoH+wFigTwnqaGZmlKClEMnCbLMHMAs4HLgvIpZFxMvAG8BuwBHA+IhYGRFTgU6SutQ+pqRhkqZLml5dXV3st2Bm1mSUZExB0khJS4AfA5eSWgfzC3apArato/ytrHwNEXFHRPSNiL6dOnUqXsXNzJqYkoRCRFwTER2BC4ApQAugpmCXGuCzdZSbmVkJlHT2UUT8DtgCWAh0LXioG/BmHeXbkVoRZmZWAqWYfbTDqnEBSXsDy4HJwBBJbST1AjoAM7PyoZIqJA0E5kbE0mLX0czMklLMPmoPPCKpAlgE/CAinpM0AZhNComTIiIkTQL2B+YBS4BjSlA/MzPLFD0UImIGsHMd5VcCV9YqqwGGZz9mZlZivqLZzMxyDgUzM8s5FMzMLOdQMDOz3AaFgqRBko6X1K5WuSQ9IOnfilM9MzMrpQ2dfdQe+E/gBEmtgaeB24CzgUURsaAotTMzs5Jab0tB0tdIdzq9OSIOJN3JdDNgDvDNiDi9uFU0M7NSWWcoSGoGjCDds2iopHuBx4CWwDeAeZIuKnotzcysJNYZChGxErgsIr4G/BnYBhgREadFxIukLqVvS9qj+FU1M7Ni25AxhSMktQQOBq4CBksaBSwFLifdn+iZItbRzMxKZENmH70JvAt0Jq2F8ATQFriadBvsJ4tWOzMzK6kNCYX2pBZFN2AFsCPQm3RL62pgn2JVzszMSmtDQuEjYCvgeWAqsIC0fObfgMuASklti1VBMzMrnQ0JhWci4lpSIFwO7EG6TmFf4BGgJ/DVotXQzMxKZr0DzRExL/t31W2uL171mKTzgfeBGcWonJmZldYXXk9B0gXAEODgbB0EMzNr4Db6hniS9pT0OLAD0C8iFm7yWpmZWVmst6WQdRG9DfQADgJeJl3ANqvIdTMzsxLbkJbCYtJg8i7Av5GuUfBsIzOzRmhDBprvWvV7di+kQ4ErJFUDp0TEkiLWz8zMSmijxhQiYmVE/B44gDTj6AmvpWBm1nh8odlHERHAlZKWAJMk7RURKzZt1czMrNS+8JRUgIi4XVJ3oAvpHklmZtaAfalQAIiICzdFRczMrPw2+joFMzNrvBwKZmaWcyiYmVnOoWBmZjmHgpmZ5b707KMm55Ity12D4rrkn+WugZmVkVsKZmaWcyiYmVnOoW
BmZrmih4KkFpJulTRX0muSjszKR0haIGmOpEMK9r9aUpWkFyXtVuz6mZnZaqUYaO4A/CUiTpe0M/CMpJeAM0hrNHQHHpXUA9gX6AdUAv2BsUCfEtTRzMwoQUshIt6JiAey3+cCK0lrO98XEcsi4mXgDWA34AhgfHaL7qlAJ0ldil1HMzNLSjqmIOmHwAuk1sP8goeqgG1JrYbC8rey8trHGSZpuqTp1dXVRayxmVnTUrJQyNZ6Hg4cC7QAagoergE+W0f5GiLijojoGxF9O3XqVLxKm5k1MSW5eE3SLcDmwLcj4mNJC4GuBbt0I63HULt8O1IrwszMSqAUs4/2Ar4aESdExMdZ8WRgiKQ2knqRupNmZuVDJVVIGgjMjYilxa6jmZklpWgp9AH6Snq9oOxMYAIwG1gOnBQRIWkSsD8wD1gCHFOC+pmZWabooRARY4AxdTz0CHBlrX1rSOMOw4tdLzMz+zzfEM+aDt/M0Gy9fJsLMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMckUPBUktJZ0maVKt8hGSFkiaI+mQgvKrJVVJelHSbsWun5mZrdasBK8xB3geaLuqQNKOwBnALkB34FFJPYB9gX5AJdAfGAv0KUEdzcyM0nQf9QFurFV2OHBfRCyLiJeBN4DdgCOA8RGxMiKmAp0kdSlBHc3MjBKEQkS8X0dxd2B+wXYVsG0d5W9l5WuQNEzSdEnTq6urN2FtzcyatlJ0H9WlBVBTsF0DfLaO8jVExB3AHQB9+/aN4lXTGpPK5b8pdxWK6o1yV8AahXLNPloIdC3Y7ga8WUf5dqRWhJmZlUC5QmEyMERSG0m9gA7AzKx8qKQKSQOBuRGxtEx1NDNrcsrSfRQRz0maAMwGlgMnRURk01b3B+YBS4BjylE/M7OmqiShEBGPA4/XKrsSuLJWWQ0wPPsxM7MS8xXNZmaWcyiYmVmuXFNSGyxPazSzxswtBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMws51AwM7OcQ8HMzHIOBTMzyzkUzMwsV+9CQdJgSf+Q9LqkE8tdHzOzpqRZuStQSFJb4DpgL+AzYKakP0REdXlrZmbWNNS3lsLBwBMR8VZEvAP8BTiozHUyM2sy6lVLAegOzC/YrgK2rb2TpGHAsGzzQ0lzSlC3ctkaWFyqF9PPS/VKTYL/dg1bY//79airsL6FQgugpmC7htSNtIaIuAO4o1SVKidJ0yOib7nrYRvPf7uGran+/epb99FCoGvBdjfgzTLVxcysyalvoTAFOFhSZ0ldgH2AP5e5TmZmTUa96j6KiHcl/RR4Ois6JyI+Kmed6oEm0U3WSPlv17A1yb+fIqLcdTAzs3qivnUfmZlZGTkUzMws51AwM7OcQ8FsE5FUkd2qpXb5d8pRH/tyJHUodx3KwaFgtglIOoJ09evfJT0qaZuCh39ZpmrZBpI0UNKzku6U1F3SK8AsSXMlfbPc9Sslh4LZpnEVsEdEdAbuAR6W1DJ7TOWrlm2gm4ALgWdJ10tdEBHdgaE0sVD3lNR6RNLvgC3W9nhEuBuinpI0JyK+WrB9PrBzRJwoaW5E7FzG6tl6FP79JC2OiK0LHnstIr5SvtqVVr26eM2YCgwgfWuxhuVRSUdExO8AIuJqSb+UNJV0YzWr3+ZL2iYi3g
XyW9Nl3YA1a39a4+OWQj0iqQUwB/hGRHxQ7vrYhpPUHOiQnVQKy/cE+kTE7eWpmW0ISYo6ToaStgfaR8TzZahWWTgUzMws54FmMzPLORTM1kPSZpLaSupax2MHS2r/JY+/paTvf5ljmG0q7j4yy0jaijSd9DPS4OK2QJDW+fgY+BC4CNi14Gn/AbQDfl1QNjsiFko6BBgPvJKVXw/cBbycbb8cEadLugB4nfQlbTTwNvAJaSXCURExbtO+U7O1cyiYZSTtDRRedHYg8C/gqYKyrwGVwDTgBNJJvtCewP+STvKHAc+Tlph9GvgdcAkwMCJ+kb1mB+BXwHBgL+AA4NqIeC2bufTDiKjaRG/RbL08JdVstdak60S2JbUMWpK+vXfNHnuHdCHaBOAY4CvAcbWOcSuwFNgJuJi0xOwgYGdgUkTMkrRHwf7XkELnWuBy4CCgU3bh22IHgpWaQ8FsTTuRVvy7HvgIWA68D/woe3wCcAqwAJgODAOWAZcCn5JO/CFpc+CPBcfdDqiRdDyApB9kx5kMHAy8HhHPS2oNHA78BeguaRnQ1VOUrVQcCmarzQDmAc2z7Y5ABSkU3gXOA04CxgDPAccCs0ljEL8itRKaAStIYw2PFhy7Z7bfawVl3wHuy455jqQ2pAvdOpLGLp4B7ncgWCk5FMxWOxH4LmmA923gXNKYwjlAH+BKUnfSocAHwN+Ab5JaFIOAC4AdJT0bET+p6wUkfQ/YqWBM4UxSN9UtwH8D7UmtlBHAVsBtm/xdmq2DB5rNMpI6A0tItxoZTeoq+ggYDNwIvEpqGXyddAJfXOsQW5MGhp+XdDjpxN6a1K1UU7BPa+BNYGJE3CZpM1K4dAeGAEeTbnlCRAwoyps1Wwu3FMxW2wL4L9KU04HZz3LgB6QQ+AdpyuqnwIMRcUnhkyVdQvrWT0RMAiZJOoAUDsdHxDJJRwE9IuK67Dk/z17vf4H9Sffd2R3oBFRIOiwifl+0d2xWi0PBbLX9gb9GxEgAKd3xOpsBNFDSQGBzUigcL6lfrefvADxSWBARj0t6OXv+xaQuoaEFu/w0IlZKGkSazroT8EPge6Suq/slfSMiLtuk79RsLdx9ZFYiWTdRrOXGa5uTpq92B+ZExKdZeQXQLiLeK2llrclyKJiZWc73PjIzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPL/X94725DWKRYwgAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "fig, ax = plt.subplots() # create the figure and its axes explicitly\n", + "fig.set(alpha=0.2) # set the figure's alpha (transparency)\n", + "\n", + "Survived_0 = data_train.Pclass[data_train.Survived == 0].value_counts()\n", + "Survived_1 = data_train.Pclass[data_train.Survived == 1].value_counts()\n", + "df=pd.DataFrame({'Not survived':Survived_0, 'Survived':Survived_1})\n", + "df.plot(kind='bar', stacked=True, ax=ax) # draw on ax instead of opening a second, empty figure\n", + "plt.title(\"Survival by passenger class\")\n", + "plt.xlabel(\"Passenger class\") \n", + "plt.ylabel(\"Count\") \n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "613662C50043454394E17B230BE8EA05", + "mdEditEnable": false + }, + "source": [ + "Passengers in class 1 clearly had a much higher chance of being rescued, so passenger class is clearly one of the features that affect the final outcome." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4A247574DA87475C872ABC0755B32F90", + "mdEditEnable": false + }, + "source": [ + "#### Survival by sex" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "CF47A03A505D43C6B34753EF10403A4E", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEUCAYAAADEGSquAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbAklEQVR4nO3dfZRU1Z3u8e8jrypEArYCNgrxBZE4YS7EIGkGTBR04nIGzcS3GA0hxrdBczPJVaOGkBGJxns1TsBxJGJ0JUpYEHUwKsYBdRQVEAeVgIagNKK2oAZfMED/7h/n9KYoGmheqqqhn89aveizz66qX7HgPLX3PueUIgIzMzOAvSpdgJmZNR8OBTMzSxwKZmaWOBTMzCxxKJiZWeJQsN2SpH230N6q3LWY7UkcCtasSRooaWpRW2tgoaTDi9oPA/5QsL13/ucYSZft4Ov/UVLPbfTZS1IHSd0lHdzI/jGSLinYPk/Sz7bwXAMl3SPpOknD8rbRks5rpO+xkn7SSPscSdXbfndmm2td6QLMmkrSl4Fe+eZi4H9LmpdvzwU+KHrIA40dNAuebz3wakHTgcC3gWeBYyJiakHfVsB8oD1QnzdXAauAdcBHwIfAn4BRkm4FavJ+BwDrJV2Qb3cC2ks6Md9+DvgW2Ye0VoCAW4ARwCPAacDoRt5CO+DTkjoAjxa09wXul/TXfHsG8CIQEfG7POQWAa8VPV8b4N2IGNDIa1kL4VCwZkvSdcCZQGdJLwJ/JDsI/4mNI4JOwBCgK3AP0EPSaWQH02OBecBxW3iJ9yLiyILXuzX/9TPAeUAKhYjYIOlqoG3etC9wE3AVUHgF6Id5/wsk7R0RH0v6NtnBdmr+OicDh0bEzQWvfTVwOrA30CWvH0mzgS8C90pq6D4DuB7YD/gSMDzfbhilXAbcB7yfb9+X1/2YpHrgf4AXImJg4V9GHhabjMqs5XEoWLMVEVdIug/4l4j4av6p/z7gBKBdRPwEQFIt2UHvXLJP7NeSjSg+BuaQfVLfIGlU/tT/HhG3kH1av6DgJY9i00/cxd4mGykIGAd8H6gr6rM2r2lvYLaks4CBwMyCPjXAo5LuBH4VEX8AJgLTgI5kB/WzgH5kB/2fRcTlkr4AXB4R35N0VV7LU8DfA7fl7538ff8JeCff/iAi6iSdAtxANhoya5RDwXYbEXE1gKRfAd8saL9H0g1sHElcA0wB/jEinpQ0hmxUcFPxUwLdgL+STaWs29JrSxpLdvCFbCTxF+CCLfRdEBGjJH0feAGYEBH3FHSpIgux7wFXS3o8b5sCXJz3+ReykPo74Pd529FkB/uGGh4nGyk9BPyisATgu2wcwSwHRkTEPOBL21ojsZbNoWDN3aeBfpImAkcAh+Q/d+bTKW2A9cAS4C7gHLL1hR9HxJPbeO5DgUuAtyPibkn3A5+QTTttIiKuAa6RdAzZaOJ6YHUjz/lERKzIP5VfC8wGVkv6FtmBGuAg4MtsXAO5KiJ+lIfd3+Tv6WjgDuBTwBfyfseSTR1BFmSvAwMi4j8lrQU+20g96yPi3xo2JHXJf/2cpNfJQqWOLBxXAu82+jdlLYZDwZqt/CA5mGza5qcRsUzSo8B3IuIPktqTLTj3i4g1+dlHRMRHklrn6xCw+fTRd4DOZIu7RwEfSPpKvu9W8imgRuo5imz6aj7ZQblDUZdT88euyJ//JPJF4og4FJiUP89cYEZE/Kjo8bcAPyQ7QN8QEY9JmgU8LulLwFfIF5zzNYuhBY89gywk5hc95yTg3/LXPYRsOq0/8DNgQV7jeGAy2cL2ZY29d2s5HArWnF0K9CZbU1gm6XNkB7QOkvYCrgDuj4g1xQ/M1wxugeyUUIqmjyTtR3YGzk3Ab4BnyD7JH8TGKRoK+g8jm6L5GfADYF
RxH6BHwetPzkcLXwVC0oER8VZ+qujewCmSfhIR6/Pn70G2yPsc2WL1DyV1j4hfSLqZbL1hYkR8uJW/rzPZOMXVYJ+C3y8EfhsRb5BNW11d+F4joo4slKwFcyhYsxUR7xaccUNEvJAHw2VkB+fOZFNAO/Lc7xdMufw+IlZL+piNi7XFjiT7pN6d7JTVmxvp862GX/KF4BOAfwB+S7Zm8Bbw0/yx/chC78b8Id8Afkw2JXUZ8DXgR/mpsAeSjUoaTjHdku9GxEOFDfkiPJI6ASOBz+fb+5Kty/xDUf+uwLERMX0br2V7KIeCNXftyBdM8zN6jgEGkJ1qugJ4Jj919a7teVJlaXMzGwNBQB/goYiYBczKRyOdgA0R8fP8cd3ztiMbedouBb//jmxaZi+yEcTqfDG8Crid7HTS+ZL+EhH/QRYW9WSf9tflgXgVWWgMIhsx3SWpG/CDiHivkdf/f/moqNB++Z/fBx6OiNfyA/+9wO8iYqGkQ9l4qu1RZKfjOhRaKIeCNVv54vKZZFMp55NNbTwEXBoRz+d9JgATyNYNpuVtf5e3NSheU1gBvER2YL9E0htk/xdeAv47n6v/Fdk0zztkC7CFepMdOIsdVPD7ErI5fgG/JlurWEd2RlQ98K6kE4D7JO1PNr//ANlC97n5c8wkO+V0aL5OcjxZePwAuLKR199kpCCpHRsvUJsNLJf0v4AHgZsj4rqCv4/2kl5h41lR1kLJ37xmu4P8ALcuP6A2tr810BO4PSKGNuH5ugLvNMzpF+1rQ7bYC/BGYZ88MM6IiM1OR5U0HpgTEb8raFNEhKRuEVEcLg3vi4j4pJF97SNis0XvguccCnw1Ii6RdDswNSIeyhel/4MskP4zIkYXPHYvoEdEFF/NbAY4FMzMrIBviGdmZolDwczMkt1+oXn//fePnj17VroMM7Pdyrx5896JiKri9t0+FHr27MncuXMrXYaZ2W5FUqMnG3j6yMzMEoeCmZklDgUzM0t2+zUFM7OdtW7dOmpra1m7ttEb5O7W2rdvT3V1NW3atGlSf4eCmbV4tbW1dOzYkZ49e1J4E8bdXUSwatUqamtr6dWr17YfgKePzMxYu3YtXbp02aMCAUASXbp02a4RkEPBzAz2uEBosL3vy6FgZmaJQ8HMbDdRX19PqW9i6oXmchmz37b7WNOMeb/SFdgeruflM3bp8y0b/5Vt9jnyyCPp2rUrAG+99RYRscn2okWLGDt2LEOHDmXo0KHMmTOHvn37ctJJJ/Hkk0/uslodCmZmzUDXrl2ZNWsWAHfffTfr16/nvPPOA+D4449P/Z555hmGDh3K6NGjU/9dyaFgZtYMvPnmmwwdOhTYOFKYPHly2r7ppptYtmwZixYtoqqqipUrVzJs2DAWLlxITU0NACNHjmTkyJE7VYdDwcysGejQoUMaGTz99NNs2LAhHewnTpzIc889R69evTjllFOYOHEi9957L4MGDaKmpsbTR2Zme5qBAwemEHj88ce58sqNX8M9f/58jjrqKN588006d+7MlVdeyaBBg0pSh0PBzKzC7rrrLl588UVGjRoFZCGwdOnSTfp8/etf58EHHwSy23I0BEjh9NF1113H4MGDd6oWh4KZWYWdc845nHPOOQBMnTqVuXPnMn78+LR/9OjRm1yENmzYMIYNGwbg6SMzs1Jryimku1p9fT2/+c1vuPHGG3n44Yc32bdgwQKuueaastThUDAzq7ApU6Ywbtw4hgwZwmOPPUanTp244YYb+OUvf8mGDRs47rjj2H///VP/hukigFdffTVtH3744dxxxx07VYtKfXVcqQ0YMCB2i6/j9MVru44vXrNdbNGiRfTp06dir79mzRratm1Lu3btNtsXETt9X6bG3p+keRExoLivRwpmZhXWsWPHLe4r9436fO8jMzNLHApmZpY4FMzMLHEomJlZ4lAwM7OkLKEg6SVJr+Y/v8zbLpX0uqTFkk4q6DteUq2khZL6l6M+MzPLlO
uU1HYRcVjDhqRDgYuBvkAP4FFJhwCDgRqgJ3AcMAnoV6Yazcwyu/q6ou24tuaNN97ggw8+4IgjjthsX319PZJKeppquaaPiq+QGwFMiYg1EfEysAzoD5wKTI6I9RExE6iS1LX4ySSdL2mupLl1dXWlrt3MrGwmTJjA7NmzG903duzYtG/OnDmsWbNmk6ubd4WSh4KkfYEDJS2V9F+SPk82OnitoFst0K2R9hV5+yYi4raIGBARA6qqqkpYvZlZec2YMYMRI0Zscf8zzzwDZDfJa9Wq1S5//ZJPH0XEh8CnACT9EzAdeACoL+hWD2wA2m6h3cxsj3XRRRcxf/58PvroI2prazn55JMB+Otf/8p7773HAQccwPPPP8+ZZ565Z33zWkT8VtJEYCVwUMGuamB5I+3dyUYRZmZ7rAkTJgBw+eWXc/TRR3P22WcD8MADD/Dcc88xduxYzj77bKqrq0v+zWvlmD7aT1KX/PeTgNXADOAMSftI6gN0Bhbk7edKaiXpBGBJRKwudY1mZpX2ySefcP/993PaaaeltunTp3PiiScCMHjwYFq3bl3yb14rx0JzZ+BZSX8CrgL+KSLmAXcDLwHTgG9HdrvW6XnbUuCnwCVlqM/MrOLmzJlD69ateeGFF4DszqavvPJKOvhfcMEFqW/DN6/V1NSk6aOamhqeeOKJna6jHGsKfwYObaR9HDCuqK0eGJ3/mJlVRgVuzz5kyBCmTZvG9ddfz9VXX82KFSu45557Gu1bym9e8xXNZmbNRMeOHTn44INZvXo1xxxzDBdeeOFm38JWav4+BTOzCps0aRJ33nknGzZsYOTIkTz11FO0bduWxYsXc8UVV3Dttdcyc+bM1L+U37zmUDAzq7CamhqGDx9OdXX1Ju29e/dm2rRp1NbW0q5dO8aMGQOwS6eLijkUzMwqrHfv3lvdXxwWpeQ1BTMzsu9C3hNt7/vySKFMeq79daVL2GMsq3QBtsdp3749q1atokuXLmX/TuRSighWrVpF+/btm/wYh4KZtXjV1dXU1tayJ95gs3379ts1/eRQMLMWr02bNvTq1avSZTQLXlMwM7PEoWBmZolDwczMEoeCmZklDgUzM0scCmZmljgUzMwscSiYmVniUDAzs8ShYGZmiUPBzMwSh4KZmSUOBTMzSxwKZmaWOBTMzCxxKJiZWeJQMDOzxKFgZmaJQ8HMzBKHgpmZJWULBUkPSro9//1SSa9LWizppII+4yXVSlooqX+5ajMzs0zrcryIpOFAP+ANSYcCFwN9gR7Ao5IOAQYDNUBP4DhgUv4YMzMrk5KPFCTtC/wYuD5vGgFMiYg1EfEysAzoD5wKTI6I9RExE6iS1LXU9ZmZ2UblmD66Gfi/wHv5dg/gtYL9tUC3RtpX5O2bkXS+pLmS5tbV1e3ygs3MWqqShoKkc4GIiCkFzW2B+oLtemDDVto3ExG3RcSAiBhQVVW1i6s2M2u5Sr2m8M9AJ0l/BPYD9gY+Baws6FMNLM/bDipo7042ijAzszIp6Ugh/zR/WEQcCVwBTAU+D5whaR9JfYDOwAJgBnCupFaSTgCWRMTqUtZnZmabKsvZR4UiYp6ku4GXgLXAqIgISdOBIcBSYBVwVrlrMzNr6coWChExGZic/z4OGFe0vx4Ynf+YmVkF+IpmMzNLHApmZpY4FMzMLHEomJlZ4lAwM7PEoWBmZolDwczMEoeCmZklDgUzM0scCmZmljgUzMwscSiYmVniUDAzs8ShYGZmiUPBzMwSh4KZmSVl/+Y1M2tmxuxX6Qr2LGPer3QFO8UjBTMzSxwKZmaWOBTMzCxxKJiZWeJQMDOzxKFgZmaJQ8HMzBKHgpmZJQ4FMzNLHApmZpY4FMzMLHEomJlZ0qRQkPQVSd+Q9KmidkmaKungrTx2L0kzJS2RtFjS8Lz9Ukmv520nFfQfL6lW0kJJ/Xf0jZmZ2fZr6l1SOwHnAOdJ2ht4GpgIXAa8HRGvb+WxAXwjIlZKOhG4Vt
KrwMVAX6AH8KikQ4DBQA3QEzgOmAT02763ZGZmO2qbIwVJnyU7sN8SEV8iO3DvBSwG/jYiLtra4yOzMt88BHgBGAFMiYg1EfEysAzoD5wKTI6I9RExE6iS1HXH3pqZmW2vrYaCpNbApcCVwLmS7gH+ALQDPgcslXT1tl5E0g8krQK+C4wlGx28VtClFujWSPuKvL34+c6XNFfS3Lq6um29vJmZNdFWQyEi1gM/iYjPAo8ABwKXRsSFEbGQbErpi5KO2cbzXB8RXcjC5WGgLVBf0KUe2LCV9uLnuy0iBkTEgKqqqm29RzMza6KmLDSfKun/AKcD1wFfk3SfpDuAzwBLIuLZprxYREwDOgArgYMKdlUDyxtp7042ijAzszJoSigsB94CDiCb+pkNdATGk33qf3xrD5b0mYZ1AUnHAmuBGcAZkvaR1AfoDCzI28+V1ErSCWSBs3pH3piZmW2/ppx91AkQ2af5hcChwFFkn+DrgEHA1G08/iFJrYC3gdMjYp6ku4GXyEJiVESEpOnAEGApsAo4awfek5mZ7aCmhMKHZAvAzwP/BbxOdrbQf5OtEYyS1DEi1jT24IiYDxzRSPs4YFxRWz0wOv8xM7Mya8r00bMRcQMwE/hX4Biy6xQGAw8BRwK9S1ahmZmVzTZHChGxNP+z4VP9NQ37JF0OvAfML0VxZmZWXk29onkzkq4EzgCG59M+Zma2m9vuG+JJ+oKkWWSno9YUXK1sZma7uW2OFPIpojfIblHxZeBlsgvYXihxbWZmVmZNGSm8Q7aY3Bc4mOwahY6lLMrMzCqjKQvNtzf8nt8L6e/J7nRaB3wnIlaVsD4zMyuj7VpTyO9eej8wlOyMo9lb+y4FMzPbvezQ2UcREcC4/M6n0yUNjIh1u7Y0MzMrtx0+JRUgIv5dUg+gK9k9kszMbDe2U6EAEBFX7YpCzMys8rb7OgUzM9tzORTMzCxxKJiZWeJQMDOzxKFgZmaJQ8HMzBKHgpmZJQ4FMzNLHApmZpY4FMzMLHEomJlZ4lAwM7PEoWBmZolDwczMEoeCmZklDgUzM0scCmZmljgUzMwsKXkoSGoraYKkJZJekXRa3n6ppNclLZZ0UkH/8ZJqJS2U1L/U9ZmZ2UY7/R3NTdAZeCwiLpJ0BPCspBeBi4G+QA/gUUmHAIOBGqAncBwwCehXhhrNzIwyjBQi4s2ImJr/vgRYD5wBTImINRHxMrAM6A+cCkyOiPURMROoktS1+DklnS9prqS5dXV1pX4LZmYtRlnXFCR9E/gfstHDawW7aoFuZKOGwvYVefsmIuK2iBgQEQOqqqpKWLGZWctStlCQdDkwGjgbaAvUF+yuBzZspd3MzMqgHGsKSPoFsC/wxYj4SNJK4KCCLtXAcqC4vTvZKMLMzMqgHGcfDQR6R8R5EfFR3jwDOEPSPpL6kE0nLcjbz5XUStIJwJKIWF3qGs3MLFOOkUI/YICkVwvaLgHuBl4C1gKjIiIkTQeGAEuBVcBZZajPzMxyJQ+FiLgVuLWRXQ8B44r61pOtO4wudV1mZrY5X9FsZmaJQ8HMzBKHgpmZJQ4FMzNLHApmZpY4FMzMLHEomJlZ4lAwM7PEoWBmZolDwczMEoeCmZklDgUzM0scCmZmljgUzMwscSiYmVniUDAzs8ShYGZmiUPBzMwSh4KZmSUOBTMzSxwKZmaWOBTMzCxxKJiZWeJQMDOzxKFgZmaJQ8HMzBKHgpmZJQ4FMzNLWpf6BSS1A0YCwyJiREH7pcD3gI+ByyLi93n7eODrwLvAeRExr9Q1mrVkPdf+utIl7FGWVbqAnVTyUAAWA88DHRsaJB0KXAz0BXoAj0o6BBgM1AA9geOASUC/MtRoZmaUZ/qoH3BzUdsIYEpErImIl8nCtT9wKjA5ItZHxEygSlLXMtRoZmaUIRQi4r1GmnsArxVs1wLdGmlfkbdvQtL5kuZKmltXV7cLqzUza9kqtdDcFqgv2K4HNmylfRMRcVtEDIiIAV
VVVSUt1MysJalUKKwEDirYrgaWN9LenWwUYWZmZVCpUJgBnCFpH0l9gM7Agrz9XEmtJJ0ALImI1RWq0cysxSnH2UebiYh5ku4GXgLWAqMiIiRNB4YAS4FVwFmVqM/MrKUqSyhExCxgVlHbOGBcUVs9MDr/MTOzMvMVzWZmljgUzMwscSiYmVniUDAzs8ShYGZmiUPBzMwSh4KZmSUOBTMzSxwKZmaWOBTMzCxxKJiZWeJQMDOzxKFgZmaJQ8HMzBKHgpmZJQ4FMzNLHApmZpY4FMzMLHEomJlZ4lAwM7PEoWBmZolDwczMEoeCmZklDgUzM0scCmZmljgUzMwscSiYmVniUDAzs8ShYGZmSbMLBUlfk/RnSa9KGlnpeszMWpLWlS6gkKSOwI3AQGADsEDSAxFRV9nKzMxahuY2UhgOzI6IFRHxJvAY8OUK12Rm1mI0q5EC0AN4rWC7FuhW3EnS+cD5+eYHkhaXobaWYn/gnUoXsTX6aaUrsApp9v82Ybf693lIY43NLRTaAvUF2/Vk00ibiIjbgNvKVVRLImluRAyodB1mxfxvszya2/TRSuCggu1qYHmFajEza3GaWyg8DAyXdICkrsAg4JEK12Rm1mI0q+mjiHhL0g+Bp/Om70XEh5WsqQXytJw1V/63WQaKiErXYGZmzURzmz4yM7MKciiYmVniUDAzs6RZLTRbeUnqQnbFeDey60GWA49ExMcVLczMKsYjhRZK0jeBF4FTgO5k14d8FVgo6ZRK1mZmleOzj1ooSUuAwRHxVlF7V+CJiDi8MpWZWSV5+qjlag2okfb1W2g3KxtJ04AOW9ofEcPKWE6L4lBouS4HnpU0i+wmhPVktxUZBvywgnWZAcwEjgd+XulCWhpPH7VgkjqQ3a68B9nNCFcCD0fE2xUtzFo8SW2BxcDnIuIvla6nJXEomJlZ4rOPzMwscSiYmVniUDDbAZL2kuSztGyP41Aw2zHXAEMAJA2U1FHSkw07JR0m6S1JT+Y/f5Z0sqRqSftJGiPpHytVvNmWOBTMdtwX8j9/TiNfGwv8PiJqIqIG+EXedgFwbDmKM9sRvk7BbDtJugzoCfSRVEd276hHgKMLRgu/AvpJGpNvDwL+WN5KzbafQ8Fs+30e+DNwP3AhcHpEPCXpyXxUgKRPAwH0yfu9QhYKAytTslnTOBTMtt8TQFdgNTAuIp5qpM/hQDuyGw4uzdvalKc8sx3nNQWz7RQRtxZstmlYTCafPsp/3xt4G/gIeBM4ChhQ/mrNto9HCmY7ISIeIVtPoHD6KN/uDzwVEVPzu8/6eyqs2fNIwawE8msY/hWYnDcdDLwbEVdFxENk95ryPWas2fFIwWwnFF6bABxWsP0KMB8YLulWYB0wVtI/A6cDvYG7ylqsWRP4hnhmJSCpS0SsaqR9X6AT8H5EfFD2wsy2waFgZmaJ1xTMzCxxKJiZWeJQMDOzxKFgZmaJQ8HMzBKHgpmZJf8f4ePkAbyETC4AAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "fig = plt.figure()\n", + "fig.set(alpha=0.2) # 设定图表颜色alpha参数\n", + "\n", + "Survived_m = data_train.Survived[data_train.Sex == 'male'].value_counts()\n", + "Survived_f = data_train.Survived[data_train.Sex == 'female'].value_counts()\n", + "df=pd.DataFrame({u'男性':Survived_m, u'女性':Survived_f})\n", + "df.plot(kind='bar', stacked=True)\n", + "plt.title(u\"按性别看获救情况\")\n", + "plt.xlabel(u\"性别\") \n", + "plt.ylabel(u\"人数\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "C89799DF55AB47B283D543D89B63F3A3", + "mdEditEnable": false + }, + "source": [ + "歪果盆友果然很尊重lady,lady first践行得不错。性别无疑也要作为重要特征加入最后的模型之中。" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "159A753C26E24D25845EFCBE173B3D37", + "mdEditEnable": false + }, + "source": [ + "再来个详细版的好了。" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "409F149602FF49C19E127C9C074FE6B4", + "scrolled": false + }, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXwAAAEKCAYAAAARnO4WAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAwpUlEQVR4nO3deZgU1fX/8fdhR3ZwEBUFV1QU+emoRFFQEtFEjTFi3PWLSnCNuESjIsYoGqMGYwKKGy5xIUYMStSYIBgFIwOCIsoiArLpDDqyCzNzfn/c6rEZZunu6YaB+ryep5/pWrrq1umaU7du3ao2d0dERLZ/9bZ2AUREZMtQwhcRiQklfBGRmFDCFxGJCSV8EZGYUMKXrDCzBlu7DLVhZvW3dhlEck0JPybMrLWZFaQ474Fm9nKaq7jRzC6qsJyhZrag4sHAzBqY2UdmdkCF8VPM7JAqynS1me2TYtkHmNmfzOw/ZjbPzPaq4TN7A/9JGm4a/b3NzK6uaZ1VLPNTM+ucNHyPmf08abh+tL07VrOM28zsiqThC83s3irm7WFmz5vZXWZ2fDTuKjO7sJJ5f2Bmv6tk/Htm1jHVbZRtzzZdK5PMmdlaYE6F0c3dfe8K8z0DHAasqzDvLkB/d3/VzJ4CTgKWmdnh7v5LM+sC/AR4BvgNkJxgLgV2AB43s8S4N6JlDk8aN97dbzKz/YD+wAgzuz96/w2wJppvH+Awd/8Q2ANoCxwAPA885u5lZvYZkHzTSTPgdHd/t5LwvFJZQkyKSQkwL2nUTsAlwPvA4e7+YoX5GwLnAMnJuhdQH1htZk2icRuBvwA9o+H2QImZDYyGWwNNzOyEaHgKcBGh4lYfMOBB4GfAv4CfA1dVsgmNgTZm1hz4d9L4rsBYM9sQDY8DZgLu7i9HB7BPgIUVltcQ+Mbd8ytZl9QhSvgxYGY/AfYEdjSzi4HJwFJ3715hvnlm1g/oCxxgZndHk85x94IK8yYnrwOAU4AZhGTZBvgrcAUhCb5jZp+7+zNm1gc4GTgEeJaQlMqiz/4A+BNwkbuviNZj0bgBwGmEA0gHYKS7TzCz/YG/AR8DuPsrURk6AoXuXhaVsZ27t04q/8uERJWwW1QD/1dUjqnAsVWEtNjd90ta1kPR2z2BC4EXk6adGpW5CHjfzDoAPQgHgDbA9GjWnYDr3X2gmTV193Vmdgkhkb4YLeskYC93fyBp+YOBXwBNgXZR+TGzicBRwAtJB9BxwD1AK+A4wvd8D7B7NP1q4B/At9HwP4BGwHgzKwM+BGa4e4/kYEQHgk0OclI3KeHHwxJgL+A7Qs30W2AXM5teybxTCDXGvQn/xFcDfzWzdYSa4X6E5LwLMCHpcycCOxKSyb+Bp939vwBmdjLwn6hp5WvgbHf/1szeINQqjwXucfdFZvZXIJ9Q4we4lnB2sRtwG3BEVL6doum3A0PcvTRa1wtAF2BX4CdmdhuQSMibMbNTCGcwa4E7CWcI64D3CDXs0uggCfCwuz9IqGUPTFrMAWxaU67oRXe/MFrfBKAj8CNgH3f/Lhr/CPBd1Jw00czOJhwY3kxaTk/g32b2JPCUu/8HGAG8BLQgfFdnA90JCf1ed7/RzI4AbnT3a83sFqAJMAn4MTAy2nai7f6McHACWO3uhVGM/kA4i5FtmBJ+DLj79Kj2ugPQ2N0Xm1mlNXx3XxCd6he7e0FUOzwnet8BeN7de5vZ0cDipI8/TUiWJcBY4KSoRprwODDJ3Seb2Vtm1iIafz4hOa83s8uTynKrux9FaELYQGjqOMvdi6Ok2A94gVDb/62Z/c3d67n7L8xsB8JBbpC7vxwtL3G2UtHRwKfACuBWYDRwqru/Ex0sit19WMWQAjtH5VpIaIpJxwmEs5bkJqYGwHdRzf56wkF1uLs/nzRPHqHGfS0w2MzejsaNBhKxu45wADoGeC0adxAhkUM4C3mb0Dz0OiGuCQYMSirXF8DP3H0qcFzyNQnZNinhx8d+QDFwX9R
mvIuZzaxi3uZAFzP7PaE2najhNwA6J50ZLDWz2cD+hGaLesBK4AlC7bFfNN/pQEt3nwzg7ps0lUTJ+FN3H1WxIO4+zsx+Q2iLT1xYbQV0MLNd3f1oM+vJptcIzou24Rozm+vuH1cTlyaEs5rzgALgt+7+TjXzQzhbugL4KmqmGks4e/pBDZ9LeJTwffzVzM7w8ECrRoQa/imEM42JwNcWLoQPij63K9AHWB0N3+LuQ6JrKN0ITVQHEeLfknA2RFSucdH7DcAiID+6/rIeOLCSMpa4+58TA2bWLnp7sJktIhwwCgkHvmWEaypSx6mXTgyY2Y+AuYSkdD3houZSwqn/OuAgdz+QkGCGEy78rSW0sbcETozOBn4IFLh79+j1Y3cfRKiFX0poRkj4zt0Xu/tiQhv9N1FZnjCz6ckvQo33ajObZWaeNO2vUTK/lZDgXzOzAwlNIkOAwdG6riJqtjGzxsCVhPbntwnt+QdVFRt3vzLaVtx9LdDAzGZGB8PLgBsSw2Z2VNQ8NRI4E7g0uhbwFFW39wP8zEKvnU+Bw6NxLxAOSrdEw42i76ctoXnsDeBid3/M3Q+Mvp/PgCcTw+4+JPrsg4Qmrp2Bl9x9POEgu97MjiNcPH8j2saB0feRcCbhO55X4ZWILWbWiXDxdgPhwvMgwllFX2AaoenpDaTOUw0/Hq4i/AMf7+6vmVk94GZ3L4maBY4ysw8JFyrHAMOBO9x9RtSMc5KZnU/YX/ay77t3XujuMwlt+ycRasjlomabEkLb8zMA7v5/VRUyOvOYl9zUZGY7E9qlPwU+J1yc/I5wofZXZjaE0Ezxt+gjNwDvROstIHS3XJlqoKI2+gejdd9GhSYdM2tFOMANA54D/kdIgLvyfbNJRWMqtOHj7m5m5wKjoiaqpsA6dx8V1fJPB9zMdnL3L6OL0E2BU8zsd+5eEi1vN8K1limEXks3m9ku7v4XM3uA0L4/wt3XULWzCGdkyXZIen8p8Dd3X0poShqcvK3uXgjcXM3ypY5Qwo+Hh4AFiYGom2Ji8B5CMt4JeIxwIbYLQNQWvoFQ67w3uT3ZzEYBzS0sqDGhd0ifCuu9iHCh7wPg4yiRlQBDo+lN2bS7pwHtk5qMfuvuY8xsGtCbkMynAq9H23An8CpwXlJvnAMIzS23Rdv6VlTeFMJUs+hic6IZ5DV3/zpq7lpbw0crW9YKQo8lousma6OLqj8Cfko4iOUBXwK/Bx4gnJX9CrgvWsz5wG8JF8OvBs4Ahli4kWwnwllEoptlVQa5++vJI8xscfS3NaEb7GHRcDPg/6LyJc/fAfiBu49JPQKypSnhx0DUDt66ism7EZoXjgAejBJpc8Jp/xmE2mPLahbfC/jA3d81s48JTRslQIuoZjwMwMwuAHq5e3/gnxb6pg8j1KBvjuZpS+j21z2xcAv97vMJFyCfILR/n2BmXQlnIg8Dw8xsdXSB9nx331BZgo+aVBJ2TZQtHdEB7gG+T/ZGuIbxurtPACZEZ1CtgdLoY6ebWe/ofYdoOQ2iGH0TnQntB3wFvAzcTWhu3Y3QzPYHQuJ/lHD9YpqZrXT3RwgHgjJCLX1jtLxbCAeEIwkH76ejM6Vfu3txJZv1x+hsJlmr6O/1wBvuvjBK6i8AL7v7RxZ6XTWK5juA0CVVCb8OU8KPoSiht4uac9a4+4lmtgfwLzNbRWiCuR04l1CTux6408yuS1pMZ8KZw4XAw2Z2F6Gf/MuEGum3Fm52WkdIXjsQ+p4D4O4bgcvNbIco2VxNuHno2eSyuvs1SeV+htA2/2NCE9WV7v6Cha6Yz5rZse7+q6q2u0Lf+ZcricsxhINIQsVumUsI/f33A64ws6WE/6GPgXejpP4U4cyliHAxEzbvlgkhURZZuAEO4B9RUm1EuKhqUSweIvQCOjU6i/kmuibzDwt36U4HXiE0c10QLetNQrfL3u6+1sx+SDgw/Bq4qZLQbFLDj66DJG6umgh8YeEO6H8CD7j7XUnxaGJ
mc/m+95DUZe6uVwxehBpnQfS+C6HNt1eFeS4ABiYNN0xhuTskvbccb0NvQoLtB+xRYVoesHfS8J8JSTKV5e4NTEhx3g5AgyqmNSTcxLR7Yh5CImxexfxGODg0rmxa9HfnKj7buLLPRdOaVLW+pDj+OXr/KHBC9P44Qtv8fOBPFT5bD+i0JfdZvbL/SuwAIiKynUu7W6aZNTazS82s0rY6Cw+vmmFmC83swag9U0REtrJMkvFs4HjCrdyVGQ7cSOgq143wjBUREdnKMrlo2z163VJxgpnlEdpWX4uG/0q4qeblCvMNIDwMi3bt2h3auXPnDIoRH9OnT6d79+5pf27BggUottXLNLag+NZEsc2tqVOnurunVWlPO+F7eJZJVZM7EnoYJCwm3OVXcRkjCXcrkp+f7wUFKT2mPbby8/PJJEaZfi5OahMjxbd6im1uRfd/pCXb7euN2PS27TK+74ssIiJbUbYT/jLCDS0JHQlP3BMRka0sqwnf3RcBa8ysd3Rr93l8/4wTERHZilLqhz916tT2DRo0eJTw/JB63333XZPVq1e3ateu3ZcA69at26G0tLRB8+bNV27cuLFRcXHxjmVlZfWaNm26umXLlsXVLXvFihWddt5552xsyzbL3SktDS1f9evX3+y5L8uWLSOTGGX6ue1JrmJb289uDxTbLaNJkyZ07NiRhg0bbjLezNa6e7N0lpXSRdsGDRo82qFDh/3z8vK+qVevXuIIsbiajyxNtQCzZs3qtP/++6c6+3bp888/p0WLFrRs2RJ3p3HjxptMd3cyiVGmn9ue5Cq2tf3s9kCxzT13Z8WKFSxevJg99tij1stLtUnnwLy8vJVJyV6yaP369bRr146GDRtSVlZW8wckZYpt7ii2uWdmtGvXjvXr12dleTXW8M2s3n/+85+dZ86cuQvAbrvttqhNmzYrly5d2r6wsHAnM/OOHTsuatu27UqAhQsX7vrNN9+0q1+/fknnzp0XtGjRIu3HxsZRth7fK5tTbHNHsc29bMY4lRq+t27duqhbt24zO3bsuGjJkiW7rlu3rnFRUVH7rl27frzXXnvNW7RoUeeysjIrLi5usWbNmubdunX7sGPHjosXLlzYOWslFRGRWqkx4bu7169fvxRgw4YNjZo2bbr266+/bt26deuvGzRoUNasWbP1jRo12rB69eodvvnmmzZt27ZdUa9ePdq0abOypKSkwYYNG7L+COalS5cyZ86cSqeVlZVR2wfCrV69mqVLq74M8eijj6a0nMQFrW1JrmI7efJk5s2bt9n4devWUVhYWD68Pce2Otpvc0v7bpBSMl69enXLDz74oGP9+vU37rvvvnPrH3T7QQAbw29osk+Ybb+O0fwboRNA1zB48MZqlr0PwNyh1cyxueHDh9OpUyf23Xffzabdfvvt9O7dm969e/Pee+/RtWtXTjzxRN55p/Lfpe7bty/33HMPX3zx/e0Cn376Ka+88grXX3/9JvP+8Ic/ZNq0aUybNo3Zs2fz05/+lJKSEjp06MDKlSvp3r07Tz31VPn8Z555Jk888QTNmzcHYODAgSxYsIDXXnut+tO0iZveYXhAJeNSUf65Xvkpfybbsb3//vvp2rUrQ4YM4fe//z1HHXUU69atY+3atRQVFdGoUSN69erFc889x6RJk3Ie25dmL9t0RP02fFpxXKrqtwkxrsF+++1Hhw4dAPjyyy9x902GP/nkk+1iv+37u3Gbj/z75ylEqHJvHFB9dOcsLQbghGMOJ699ewCKCgtx902GX5v4Px68724O/0FPjjiyJ9OnTmHvLvtxybn9eO7l19l3l9abb0sd3HezIaWE37x585UHH3zw/KKiotZz5szZZ2tfOx83bhxvvvlmldP/97//0bt3b6666iomTJhQ5XxffPEFq1atwswoLi4uH9+hQwcuueSSTcZBqCXcc889DB8+nKeeeopbb72VNWvWcMkll3Dbbbdt0qugT58+mBndunVj7NixLF+
+HDOjd+/eDBs2jEGDBmW6+TmV7dh27dqVTz75hLy8PNq0aUP79u25//77GT16NHvuuSf9+vUr/8z2GtsOHTqUx+qZZ56hpKSECy+8EAjJOEH7bWby2rfn6RdfBeAff3+B0pJSTvvF2QBc+ItTy+eb8UEBRxzZkztuvZGnX3ylyuVtz/tuWs0tO+64Y/HixYt3z1VhqnPZZZcxbdo01q5dy+LFiznppJMA2LBhA8XFxbRv354PPviAs846q/xLWrZsGccffzwfffQRPXv2BKB///70798fgFGjRnHBBRfQrVs3unXrxiGHHELLlpv+mt+3337LT37yE+644w6eeeYZmjdvzvDhw9lpp50oLS2lsLCQjRs38tprr3HLLd8/T65NmzYccMAB9OzZkzfffJNx48YxduxYmjZtykUXXcQ111zDvffeS716W//p0bmMLcCvf/1rzj33XABatWrFHnvsQceOHVm79vvr+dtrbAGWL19O7969ge9r+KNGjSofHjZsGAsWLNB+m6HCr77ivNPDPpuo4Y/527Plw6MeGcGSLxbx2bw5tG23I4VfLqf/Wacx59NZnHXqCTRt1CA2+24qvXT2nDp1an2AlStXNqtXr95W6X81fHj45bkbb7yRgw46iHPOCb+W98orrzBlyhRuv/12zjnnHDp27Mgpp5zCiBEjeOGFFzjyyCPp2bPnZqfG7s7o0aM3Gd+0adPyLzchue1u3333ZenSpYwdO5a3336bRx99lDFjxtC/f38aNmxI586dGTFiBCeffDIAPXr0YPz48axatYq9996ba64Jv9a3YcMGevbsWWf+aXIZ25deeomZM2dSv359AF577TV69uzJihUr2LhxI4888ggAQ4cOLa/1bE+xBWjevHl5jX7y5MmUlpaWJ/IRI0YwZcoU9thjD+23GdqhWTN+1i/U6KdPfZ/SsjIOPawHAM89/TgfzZhGx906cdzxJ/LcU4/zxxGPc8hhR3DWqSds1qSzve+7qdTwW69YsaLDhx9+mFe/fv2SPfbYYz7hR5u3uO+++46xY8dy2223lY8bM2YMF18cfnL06KOPZvny5bRt25abbrqJI488ssplTZw4kYMPPphWrcJvNf/85z9n0qRJfPzxx5vM9+233wLQtm1bBg0axLBhw3jkkUdYu3YthYWFlJaWsnTpUt555x2OOeYYjj76aEaNGsV///tfiouLmT9/PmPHjuW6667j9dfDz4aecMIJnHbaadkMTa3lKrbr16/nhhtuKJ924oknMmrUKN5//33GjBnDXXeFn0d1dx566KHtMrY9evQoT/Bvv/02N930/c/KTps2jQMOOED7bS10PySfQw8PCX7K/yYx8Mryn0Fm1swZ7LVPF4oKv6JV6zb88sprOOSwI6pc1va+79aY8N192owZM5Z069atKDGuuouwufTee+/RoEEDZsyYwRFHHMEnn3zC3Llzy/9BBg4cWJ6wNm7cWP5PlnxqfNddd3H00Ufz+OOPl5/CQfgHcXfeeust1qxZUz5+r732omnTptxxxx1MnTqVuXPnMnToUHr16sXs2bO5++67uf322xk4cCAHHXQQrVu35sILL+TVV1+lW7du5euFsGNs3LgxazdRZFOuYnv22WeXN18k23vvvZk7d2758PYa26effpqZM2eWHzinTZvG/PnzN5nn3HPP5Z///Ceg/TZdL7/4PHNnf8It118FwMcffcjiRQs2meeU085g4vhwXaqkZCNnnXoCwCZNOnHZd7PeZTKXevXqxUsvvcQ999zD4MGDWbJkCc8//3yl8x5//PEcf/zxAJudGq9atYpp06bx5JNPbva5IUOGMGDAACA8z2PChAlcccUVQHhG95QpU/jss89YsWIFw4YN48knn2TChAkMHDiQadOmlS/H3WnWrNkmp9Z//vOfefbZZ7n44osZPXo0p59+ep05Pd4SsU0oKipi4cKFfPXVV5SUlDBp0iS6du26Xcb2vPPO47zzzgP
gxRdfpKCggLvvvrt8+lVXXbVJzwztt+k59fQzOfX0MwF4/dV/MPPDD7juptvKp98x+AZIim/PXsfRs9dxAJs16cRh380o4TecO3Rqtgowa9asQ1Pp2pbQokULdt99d6ZNm8bhhx/OpZdeyuDBg+nbt2/Ky3j++efp169fpV2giouLy/vTrl27lmOOOYbvvvuOevXq8eyzzzJq1Cj2339/NmzYwMknn0xRURHjx4+nffv2PPjggwwePJiioiKaNm3Ktddey5AhQ4BQc7vyyiu58sorcXcOOeQQzjjjjMoLWKEb5axZszighi5qlUn3c7mO7Zo1a5g8eTKnnnoql112GSeddBIPPvggc+fOZdddd2XixIk5j+1pXTZ9IFemsU18NuqZXK2ysjKee+457rvvPt54441Npk2fPp1bb701pfXV9f32jcGb/tZR7WObmrKyMl59+UWeePgvPPbs3zeZ9snHH3HZoF+ntJy6vu9mwzZTw3/sscd48sknKS0tpX///kyaNIlGjRoxe/ZsfvOb33DnnXdu0p0w+ZRp3rx55cP77LMPc+bM4dlnn610PX/84x/p06cPEL6M888/n379+nHTTTdx+umnc/bZZ7Ns2TL69OnD0KFDOe6447j33ns59thjufzyy+nbty8//vGPueiiixgwYABTp07l2muvpUuXLhx66KE0bdqU9evX16m20C0V29WrV3PWWWdxww03MHDgQG6++WZuueUWPv/8cx544AFuvvnm7S62o0ePLj/VHz9+PK1bt+YPf/gDjz/+OKWlpRx77LHsuOOO5fNrv03PP8eO4eEH7+ewHxzFk6PH0rJVKx4d8Sf+/vxfKSst5Ygje9K2bbvy+RPNOQCLFnxe3qQTl303pccjz5gxY8HBBx9cVOOMGZg1a9ahqdQCZs+eTbNmzejYsWOl0xcvXlzltGTr16/niSee4NJLL027rBXXZ2bUq1dvk8e4FhUV0apVq80eZVqdTz75pLyv7rp162jatOkm03Ndw1dsM6+F1vTZVatW0ahRo82eJAnh9D7VG20U283NWVrM6tWraNSwEY1qEd99d2ldJ+ObLDnWCTl7PHJd0KVLl2qnp5KQIDxburZfanXrS66tbSsU29xp0aJFldPSuatSsa1c8+aKbzrqxpUXERHJuVRr+GVlZWWm5+HnTm0fnCVVU2xzR7HNvWzGONUa/szCwsJWZWVlevh1DjRp0oSioiI2btxYZ7q7bS8U29xRbHMv8YtXTZo0ycryUqrhl5SUXLx8+fJHly9ffiBZbgZasWJF7H9Ewd1ZvXo1y5cvr/S3QRMPWEpXpp/bnuQqtrX97PZga8f2y+J1GS27otJvm9Y801aU+E3bbEipl04u5efne0FB+o/+jZP8/HwyiVGmn4uT2sRI8a1ermNb6eOYM1Dx/oFtRSa9dHQeJiISE0r4IiIxoYQvIhITSvgiIjGhhC8iEhNK+CIiMaGELyISE0r4IiIxoYQvIhITSvgiIjGhhC8iEhNK+CIiMaGELyISE0r4IiIxkXbCN7MzzOxzM5tnZv0rTDvczKaY2admNszM6mevqCIiUhtpJXwzawHcB/SMXkPNLC9plhHARcABQCfgxCyVU0REaqnGhG9mjcxsuJnNAeYA8919CTAbaAZ8ZGa3R7MvAn4d/e0DtMxNsUVEJF2p1PDbAuPdfV/gCeAIM2tMSOojgN+7+63RvK8DZwHNgamE5L8ZMxtgZgVmpp8LEhHZQmpM+O6+3N1fjAa/BcqADsA30ftSADNrCvwe+B2QB3wJdDKzDtUtv7CwMOPCy+ZGjhxJfn4++fn5im0OKL65o9jmXroXbfcGiglNOV2By4HzzWxv4CDCAeBdd98IPB19ZueKC3H3ke6e7+75eXl5FSdLLQwYMICCggIKCgpQbLNP8c0dxTb3Uk74ZnYj0AMwoIiQ8AuBvwNPAp8TDgQ7Wvi5+VOAdURnACIisnU1SGUmM/sLIZkfAZwBTI4
mXQs0AQ5x90IzmwjcT2jaeZ+Q7Bdnu9AiIpK+VHrp9AC6uPuF7r7W3UcBRwLd3H0M0Ar4bzT7w8AsYI/o/Wx3/zonJRcRkbSkUsPvDuSb2bykcY8BA82sBJgHXBKNHwP0AuYDK4Czs1dUERGpjRoTvrs/BDxUyaS7Kpm3DLgqeomISB2iZ+mIiMSEEr6ISEwo4YuIxIQSvohITCjhi4jEhBK+iEhMKOGLiMSEEr6ISEwo4YuIxIQSvohITCjhi4jEhBK+iEhMKOGLiMSEEr6ISEwo4YuIxIQSvohITCjhi4jEhBK+iEhMKOGLiMSEEr6ISEwo4YuIxIQSvohITCjhi4jEhBK+iEhMKOGLiMREjQnfzBqZ2XAzm2Nmc83s59H4X5nZIjObbWYnJs1/t5ktNrOPzOzQXBZeRERS1yCFedoC4939MjPbF3jfzGYClwNdgd2Af5tZJ+BooCfQGTgWeAzonoNyi4hImmqs4bv7cnd/MXo/BygBzgRGu/sqd58FLAAOBU4DRrl7ibu/CeSZWYeclV5ERFKWVhu+mf0f8CGh1r8wadJiYGdCbT95/JJofMXlDDCzAjMrKCwsTLvQUrWRI0eSn59Pfn4+im32Kb65o9jmXsoJ38xuBK4CzgEaAWVJk8uA0mrGb8LdR7p7vrvn5+XlZVJuqcKAAQMoKCigoKAAxTb7FN/cUWxzL5U2fMzsL0Az4Ch3X2tmy4Bdk2bpCHwBVBy/C6H2LyIiW1kqvXR6AF3c/UJ3XxuNHgecaWY7mNn+hCae6dH4C8ysvpn9CJjj7l/nqOwiIpKGVGr43YF8M5uXNO4K4BngY2A9cLG7u5mNAXoB84EVwNnZLa6IiGSqxoTv7g8BDyWGzewMYAShbf537v540rxlZvY14cyhbfQSEZE6IKU2/AQzawHcB/QgJPzpZvaKuxdG0/sD+cC+hJp/4+wWV0REMpXuoxX6AhPdfYm7LwfGA32Spg8Crnb3dR6sr2whyd0yMyu2iIikK92EX7GffaL/PWbWEOgA9I8etzDGzNpVtpDkbpmZFFpERNKXbsKvrp/9jkAb4C1gP2ARcHNtCygiItmRbsKvqv89QBGw2t3fdHcH/gF0qX0RRUQkG9JN+G8Afc2sffSMnCOBfwG4+0bgf2Z2QjTvScCUrJVURERqJa2E7+5fEpppJgPvAtcCx5vZddEslwKDoz77OwN/yGJZRUSkFtLqlgng7qOAUVVMmw8cVbsiiYhILugXr0REYkIJX0QkJpTwRURiQglfRCQmlPBFRGJCCV9EJCaU8EVEYkIJX0QkJtK+8UpEUvfS7GVZWc5pXXbOynIk3lTDFxGJCSV8EZGYUMIXEYkJJXwRkZjQRdsMbNznpqwtq+HcoVlblohIdVTDFxGJCSV8EZGYUMIXEYkJJXwRkZioMeGbWWMzu9TMxlQY/62ZzYtetyeNv9vMFpvZR2Z2aC4KLSIi6Uull85s4AOgRWKEmTUGFrn7QckzmtlxQE+gM3As8BjQPUtlFRGRWkilSac78ECFce2AbyqZ9zRglLuXuPubQJ6Zdag4k5kNMLMCMysoLCxMt8xSjZEjR5Kfn09+fj6KbfYpvrmj2OZejQnf3YsrGd0a6Gpmn5nZq2a2dzR+N2Bh0nxLgM2e+uTuI909393z8/Ly0i+1VGnAgAEUFBRQUFCAYpt9im/uKLa5l9FFW3ef5e7tgH2At4Ano0mNgLKkWcuA0lqVUEREsqJWvXTcvQx4GOgajVoG7Jo0yy7A4tqsQ0REsiOjhG9mO5lZs2jwXOD96P044AIzq29mPwLmuPvXWSiniIjUUqbP0tkTeN7MSoB5wCXR+DFAL2A+sAI4u9YlFBGRrEgp4bv7BGBC0vBkoFMl85UBV0UvERGpQ3SnrYhITCjhi4jEhBK+iEhMKOGLiMSEEr6ISEwo4YuIxIQSvohITKSd8M3sDDP7PHoOfv8q5vm1mc2rffFERCRb0rrT1sxaAPcBPQgPRZt
uZq+4e2HSPDsBF2S1lCIiUmvp1vD7AhPdfYm7LwfGA30qzPMnYGg2CiciItmTbsKv+Lz7xSQ9797MziU8Q+fd6haS/AMoaa5fREQylG7Cr/J592bWFbgMuKamhST/AEqa6xcRkQylm/ArPu++I/BF9H5ANG068B9gdzObWdsCiohIdqSb8N8A+ppZ++i3ao8E/gXg7r9y907uvh+hXX+Rux+Y3eKKiEim0uql4+5fmtnNwORo1LXA8Wa2l7vfm/XSiYhI1qT9AyjuPgoYVcM8C4C9q5tHRES2LN1pKyISE0r4IiIxoYQvIhITSvgiIjGhhC8iEhNK+CIiMaGELyISE0r4IiIxoYQvIhITNSZ8M2tsZpea2ZgK439lZovMbLaZnZg0/m4zW2xmH5nZobkotIiIpC+VRyvMBj4AWiRGmNlewOVAV8Iz8v9tZp2Ao4GeQGfgWOAxoHtWSywiIhlJpUmnO/BAhXE/A0a7+yp3nwUsAA4FTgNGuXuJu78J5EVP1RQRka2sxoTv7sWVjK7ql68qjl9C0i9iJST/4lVhYWHFyVILI0eOJD8/n/z8fBTb7FN8c0exzb1ML9pW9ctXVf4iVrLkX7zKy8vLsAhSmQEDBlBQUEBBQQGKbfYpvrmj2OZepgm/ql++qjh+F0LtX0REtrJME/444Ewz28HM9gfaEn7acBxwgZnVN7MfAXPc/evsFFVERGoj7R9AAXD3qWb2DPAxsB642N096rrZC5gPrADOzlpJRUSkVlJK+O4+AZhQYdxQYGiFcWXAVdFLRETqEN1pKyISE0r4IiIxoYQvIhITSvgiIjGhhC8iEhNK+CIiMaGELyISE0r4IiIxoYQvIhITSvgiIjGhhC8iEhNK+CIiMaGELyISExk9Hlm2IxMLsrOcXvnZWQ7UzTKJbAdUwxcRiQklfBGRmFDCFxGJibQTvpmdYWafm9k8M+tfYdqlZvaxmS00szuzV0wREamttC7amlkL4D6gB1AKTDezV9y9MJqlDOgONALeN7Nx7j4pi+UVEZEMpVvD7wtMdPcl7r4cGA/0SUx094fdfaO7rwE+BfIqW4iZDTCzAjPLUncMERGpSbrdMncDFiYNLwZ2rjiTmXUFDgcuqmwh7j4SGAmQn5/vaZZBRDL00uxlWVnOaV02+7eXbUC6NfxGhGabhDJC0045MzsBGAuc7e7FtSqdiIhkTboJfxmwa9JwR+CLxICZnQkMAfq4+39rXzwREcmWdBP+G0BfM2tvZh2AI4F/AZhZY2AocIK7L8hqKUVEpNbSasN39y/N7GZgcjTqWuB4M9sLeJVQ+59qZomPPO3uv81WYUVEJHNpP0vH3UcBo6qY3Lg2hRERkdyp1Z220U1W86LX49G4X5nZIjObbWYnZqeYIiJSW7V9WmZjd987MRA17VwOdCV04fy3mXVy9421XI+IiNRSbZ+lU7EP/c+A0e6+yt1nAQuAQyt+KPnGq8LCwoqTpRZGjhxJfn4++fn5KLbZp/jmjmKbexknfDNrBuxkZvPN7C0zO4wUb8xy95Hunu/u+Xl5ld6MKxkaMGAABQUFFBQUoNhmn+KbO4pt7mXcpBM9PqElgJn1A8YAr1DDjVkiItujvr8bl7VlvTH4J1lbVrKsPB7Z3f8GNKGGG7NERGTrqU2TTiszaxe9PxH4GhgHnGlmO5jZ/kBbYHo2CioiIrVTm146bQm9cACWA/3cfYaZPQN8DKwHLnZ3PRxNRKQOqE0b/ufAXpWMH0p4xIKIiNQh+olDEZGYUMIXEYkJJXwRkZhQwhcRiQklfBGRmFDCFxGJCSV8EZGYUMIXEYkJJXwRkZhQwhcRiQklfBGRmFDCFxGJCSV8EZGYUMIXEYkJJXwRkZhQwhcRiQklfBGRmFDCFxGJCSV8EZGYUMIXEYkJJXwRkZhQwhcRiYkGW7sAqdi4z01ZWU7DuUOzshwRkW2RavgiIjGhhC8iEhNpJ3wzO8PMPjezeWbWv8K0A81sgZm
VmFmxmV2UvaKKiEhtpJXwzawFcB/QM3oNNbO8pFkeBpoBewCzgHsrTBcRka0k3Rp+X2Ciuy9x9+XAeKAPQJTYuwBvuPsXwCigKDFdRES2rnR76ewGLEwaXgzsHL3vCKxKmr44+rszFZjZ08BpScNr0yxHZRoAJdXOYXdlYTUpq7k8ULFMVX2mkZnNSHG9OwKJs6omZrY+xc9VJ7Vtya6a1pmtMqUTW9g68d3S8a/LsU23bDXOb7emsbTaS6n8KZapaSYrT0cjoCxpuAwoTZpG0vSyCtPLuft5wHkAZlbg7vlplmMz2VpOtmRSnrq2DQlbo1w1rbOuxioTdW1b63Js0y1bXduWbJbHzArS/Uy6TTrLgF2ThjsCXyRNa5Y0vSPgSdNFRGQrSjfhvwH0NbP2ZtYBOBL4F4C7LwJWACdH0/oDLRLTRURk60or4bv7l8DNwGTgXeBa4Hgzuy6a5RxgPaH9fj/gCndfU8NiR6ZV4twvJ1syKU9d24aErVGumtZZV2OVibq2rXU5tumWra5tSzbLk/ayzN2zuH4REamrdKetiEhMKOGLiMSEEr6ISExssYSfwjN4ZpjZQjN70MzqReMHmtmZVSxvRzN7OWn4GTPrXMl8z5tZbzP7VdJ8jSrOl8H21Lj+qtZdQywuNbOPo1jcWdtyVlP+OhHbqmKRXAYz+7WZrUo1vumsP1cyiW/FWFSyfR+a2RfR86qOrEXZMt53M11nBmWs7n/kbjP7zMw+NbNhZlY/ado2mxeqWWbjKC+MqWJ6pfmz0nm3xEVbC8/gmQX0INyINR04yN0Lo+lvA3cBc4FphD793wDto/lXJC1uFrAv4aaxfYBPovH7Eu7y/Q74NHrfl/C4h3nRcn4JTABmR5+Z4u6D0tyW4UC3Gtb/A0JPpsrWPQ84BHgSuLWSWPwSeJxwI9v7wCXuPimdMlZR7j2BZ5NGbfXYVrFf/LNCGeYBBwKNgQ+oOb4Zf7e1kYX4fkHYzr9H7wcBKwn/E6XA68BgYCrhhsZp7n5VmmWs7b67RWJbyX7xUVS+xB2q/w9YTsgT+wJfAZPYtvNCles2swWEfb+Fu/+wknUn8ue/CI+7+aO7v1xpQd095y/gdOCZpOFngTOj93nAF0nTBgAPRe8HJuZLmp4H9AZaA/cnjb8bOAhoHw3vB+wOPB/NeyxwOfDjaPpbGW5Ljeuvbt1RLJZXFotK1vV34Kc5+k62emyr2C8uSS4D8AKhu+83qcS3Nt/tVo7vNcDfkuL7MiGJJbZvCXDR1tx3t1Rsq9gvzkwa/gC4jVApGkc4AGzTeaG6dUfz9gb+XcV6K82flb22VJNOTc/gWVTFtMoUARcCo4Hnksb3IBzpbzaztoQj7Z+iab+IXtcC7c2sIWCZbEgq6yd8uZWuG+hEONonVLq9ZtYVOByYmGE5M7GlY1vZftE8qQzfEGrI7wJNSCG+tfxuc626+DYDOiXFtythm38BnAV0AE41s9nAgWbWLsvrr0uxrS5fQDgLvIlQ5lXu/gHbeF6obt3uXlzNetPKn1vqJw5regZPGZSfFh0L7GRm7xFOkTaY2dVJn90L+C/hwUElZvZONL47oUYMMMjdB5vZCOAiQtIYRjhNPIZQW/pfhttyKdCyhvWvBKpadynfP1gOKnnekJmdAPwFOLuGLztlUWwPSRpVF2Jb2X7xA8J+2Y5Q01sN/I3QpJNKfGvz3WYsC/HtBNTn+/hOIzQNrACeJmxrQ+AKwun7zYSzgnTUdt/dUrGtuF/8EGhrZr8gJMXDCPvFbOCHZjafcFf/tpwXMl13dbl1c1vo9PZ84PGk4WeAn0XvdwfmJ027GHgAuJLQtvUucEbS9EMI/xwLgROicS0J7Xv7Jc1nwL3AHODNxDRCjXk8od08k22pcf01rHsWMKayWETDZxLa+Trn8PuoE7GtYr+4NirDSuBLQrvrfMJzmebk8rvdyvG9IRGLaPtmAUuj7TsQKE7avouAcVth390isa1iv0j
ki3uBbxOxBX4CjN3S++6Wji1VN+lUmj+rWs6WatKp6Rk8a6Kr1vUJp0W7E07nnwceJJzC3mNmzQhBGwncQzjthfDcngcIySLh98ACwkXg3wAPm1kD4G2gnbt/lOG2pLL+6tbtwOGVxcLMGgNDCTvMggzLV3XBzVpEtZu6EtvK9ov3ojL8BrjH3fcj/MMX833zVq6+21qpZXwPIooF8GfC6f+7hO37C+EC/hLCmU9XYEomRaxm/ansu1sqtpXtF5Oi2H4WbcdzhAPhnYQKwbaeFzJadyX58zzCGXGVH9hStZ4LCV/WZ8DPotd1SUfHjwinOfOAw6Px5Re+CEfyyYSr2m2As4GrCadDbxGaAZ4i9GoBaBr9fYVwpG1AuKDxJuFi4COJedLcjotqWn8K655A2Ek3iQXhos53UQwSryFZiv85wGt1LbaV7Bd/JvRGSS7D5CgWqcY3o++2DsT3iSgO86NYTCE03TQgtFkXEw4m8wgJYmvsu1skthX2iz8SkuQD0bQHCT10FgHvEA6G20NeqHLdVKjhU3n+/AL4XbXl3JL/FCkE7f8BjZKGN+npQGivOobQNe1DwlHybSAv8WUC/yB0d+xH6Br1XtJO8iegSTQ8CPhTBmXcM4X1D8nFurf32Cq+2ncV29zGdpt+eFrUX3etu1e86NnE3ddXGNfQ3TdujfXnYt25trVjm04ZFN/crV+xzd36t0Zst+mELyIiqdOzdEREYkIJX0QkJpTwRURiQglfRCQmlPBFRGJCCV9EJCb+P+hDbHbjpdL+AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "#然后我们再来看看各种舱级别情况下各性别的获救情况\n", + "fig=plt.figure()\n", + "fig.set(alpha=0.65) # 设置图像透明度,无所谓\n", + "plt.title(u\"根据舱等级和性别的获救情况\")\n", + "\n", + "ax1=fig.add_subplot(141)\n", + "data_train.Survived[data_train.Sex == 'female'][data_train.Pclass != 3].value_counts().sort_index().plot(kind='bar', label=\"female highclass\", color='#FA2479')\n", + "ax1.set_xticks([0,1])\n", + "ax1.set_xticklabels([u\"未获救\", u\"获救\"], rotation=0)\n", + "ax1.legend([u\"女性/高级舱\"], loc='best')\n", + "\n", + "ax2=fig.add_subplot(142, sharey=ax1)\n", + "data_train.Survived[data_train.Sex == 'female'][data_train.Pclass == 3].value_counts().sort_index().plot(kind='bar', label='female, low class', color='pink')\n", + "ax2.set_xticklabels([u\"未获救\", u\"获救\"], rotation=0)\n", + "plt.legend([u\"女性/低级舱\"], loc='best')\n", + "\n", + "ax3=fig.add_subplot(143, sharey=ax1)\n", + "data_train.Survived[data_train.Sex == 'male'][data_train.Pclass != 3].value_counts().sort_index().plot(kind='bar', label='male, high class',color='lightblue')\n", + "ax3.set_xticklabels([u\"未获救\", u\"获救\"], rotation=0)\n", + "plt.legend([u\"男性/高级舱\"], loc='best')\n", + "\n", + "ax4=fig.add_subplot(144, sharey=ax1)\n", + "data_train.Survived[data_train.Sex == 'male'][data_train.Pclass == 3].value_counts().sort_index().plot(kind='bar', label='male low class', color='steelblue')\n", + "ax4.set_xticklabels([u\"未获救\", u\"获救\"], rotation=0)\n", + "plt.legend([u\"男性/低级舱\"], loc='best')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A8D55FBCD7EA43BF8C7CF42851098D95", + "mdEditEnable": false + }, + "source": [ + "#### 我们看看各登船港口的获救情况" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "18C26D1BD0BC41CE8C1CFF996D2225B2", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAiN0lEQVR4nO3de7xUdb3/8ddbYLMhEQW3IkJsLylgFicwb3gNM7VTeTkEakmp5CXBfiVaWYEXtIuhkobXqLCDpqEZRwktNU+SgeEFE1QC3IiyBS94oQPuz++P79rLYdzARpiZDfv9fDzmsWe+s2bNZ83Aes/3u26KCMzMzAC2qnQBZmbWcjgUzMws51AwM7OcQ8HMzHIOBTMzyzkUbJOQ1K7SNbR0knatdA1m6+NQsI0mqRp4StIeH+C1wyV1kNRW0lmSOhY9f6qkTzRjPh+RNLHw9ZKeK7j/B0kf3dD6mkNJR0ndJH14HZP+TNI31zOvH0k6vuBxG0l/l7T9Ol4zWtLXCx4Pk/STtUy7n6TJki6T9OmsbYSkYU1Mu7+ki5tonyGpx7qWwzZfbStdgG0RRgG7Ar+X1NTzJwJvAbcWtO0C/ADYBvgp0B94EGgPvA0g6W/AnsASSf+KiKOzld1/ASuAWuCjEbEAeC573V8k3QYMBXpKml3wfrdLWgn0j4h3s3mdsJ5lmxIR38jquRfoAjRkz20LCHgFWJm9//PAuZI+DtxUNK9tgb6Shha1fy0iZmW9rZOAwhX6IUAb4M0sfAFWAdcAA7PHOwCrJZ1R8D7Vkj6TPf47cCrpR2CbrObxwLHAH4HjgRFNLHt7YDtJWwP3FbTvRfqu/y97PBV4CoiIuFNSLfBPYGHR/NoBr0bEgCbey1oIh4JtlOzX5gnAM8CpEfFo1i5gHNAPeBrYA5gYEVdKGghcD0wE3gX+AZxBWpn9WdLXIuJvQGfSymxKNg9IK7VzIuL3kvIVVUSEpLOBwRFxK/BDSc9FRL+snj8AF0TEUwXlbw+MjoiJa1m2YcCg7P5uwOiiSU7I6rm16HV9gU7AUxExTNIpQJeIGNfEe0wEOkn6AjCJFDCPSuoG7EcKie2A2dlLdgTOi4gzJHWIiHcknU5a2d6ezfOzwG4RcVXB+3wP+CLQAehKCgMkPQgcCNxaEOhTgR+RPv/DgSOzx429oHOBu4DXs8d3AVXAnyQ1AE8Aj0fEfkXLWgvcXvwZWMviULAPTNK+wHXAp0grwTsknQrMIa30twM+FxErJT0JfEnSXUAf0or0C9msJgG/AuYDx0TE4oK3OZC0omzUO5u2uJZPA7OyQCiFWlLdkHpF80krya2AAcBupF4CwGLgEeCXkvYjfUZ1kk4qmuf/AL8GngX2AW6PiGHZ8jwA9ACOAD4SEf/O2m8A/i2pA/CgpBNJ4TG9YL4Dgfsk/RL4VUTcD/wc+B3pezqX1HvrR1rp/yQiLsi+zwsi4puSLgSqgb8CR5O+z7ez+b+TLWvj9/JmRNRL+hzwY+D0Zn2i1iJ5m4JtjBeBL0bE/Ih4nPSrdiowF1gODIqIN7JpdyWt5DqSVkSzgNey29+AfYHVpBXpsIL3+DXZv1NJXwFqeO9Xcx3wC0nnATsDj0g6StJsSU8BPbL7s0kryruyx2ew4Z4FFmS34yPiZ8BDwMPZ/WNJQyafBFZHxEuk4Zdfklas4yJiQDZ0cj4pMO+MiPuLQrDQZ4CrgcJz0bQF/h0R7wDnAY8DyyNicsE0NaRf7t8EPpcNS9UAt5FW9ADfIg0ZnUsKCoC9eS/YdiUNya0k9dSuAS7JbgK+UfD4eoCImBURh5N6f7aZck/BPrCIeEHSG9mv9ONJvyh/TfqlfCrwcDZsM5c0PHQLMBi4cy2zfJ608tla0u1AL+CG7Lkq0srr2HjvhF
3DSdscXo+IRZLmAN0jop+knwE/i4gJAJIOBC4DPhMRb7PhtiP1UmDN/zdtsr/VwH8C/w08no3Df5c09FIPTMqGzf6d1XxMRDyznve8MXvPWyQNzpa7itRT+BxwKWk7zPKsh/aN7HU7k3pvb2aPL4yIH0j6FfAx0tj+3sAvSNt09s2m258U6gD/BywCBkTEH7JtMU1tqF+dhSIAkrpmdz8uaRFpSLAe2AlYAry6nmW2SosI33z7QDfShtQlwGTgK8DWBc9tBXwaGAv8Gbg4a69by7y2JY3BF7b9k/RruS1pm8WOpA2ahbcHCqY/kLTCHUfaGPw0MDO7vUAaYpkB7JlNPxEYto7lGwZMKliePUkbuetIv5b/ktVVRdqo2oY0lHR7Nv0upKC8DHg0m/4vpJ7ONaSgHEwa/vkCaYz+mez2Nml4R8A9wPeyOn6bfSbDSD2vkcDzRXXPBMY0sTwdSUHyV1K4Ni7Xw6Te20vAhwqmP5QUrJAC6vvAZ4tuLxdM3yv799AduJj0Q+FGYPfsPWqASyv979a3dd/cU7CNMRI4jbQi6Aec38TeR7sB20XEm8VPrEs2Zr6K9Iu3cW+fdsArEXFowXQLiuppXEH/Drg8ImZm032LNFRVR1pRz92Qekhj5f9B2lYAaWimO/AGcCEpcOaThluGk4KyG2l4ZxZp+GhpVksNaThrb9Kv9HtJPYgpseY2BSIiJJ0MTMw+kw7AOxExMestnACEpB0j4uVsV9EOpGGjiyNidTa/nqSw+jtpT7DvSuoeEddIuir7vH4eEW+t4zMYmn12hQp3IT4T+G1EvAh8L9u43TgcRUTUk3pP1oI5FOwDa1zRSyIiejc1TdFKG+C5rO21ovaupL2QGvUgDTvcTbYHUDMcABwYEQslTQZ+JalxqGhHYAzp3/y1zZxfLiLy4wskzSP9yh9P+iX/eeCWiDit4CUPZtM+RxqW+XpRYDbunnloNt0X1vHey0hDU2TDUm9nG4KPyN77t6Rf4S8DPwSuIoX0SOCKbDZfJi3/ctJ2hMHADyS1IX02W5OGjNblGxFxb2GDpLrs77bAV0kbzJH0IVLv8fNF03cD9o+IKet5L6sQh4KVVUQcKulSYKuI+Ha26+qNwL8i4pKCSfsDCyPiIUkPk3anXCtJu6fZR+O+8e8CZ0ZE48q5J6kXcR9pGKjRTyVdQtM6kvYQanyPrUkr3f1JK+ShpP9DxwL3SzoAGJX9Ii5c5vftl5/9op9U1HyCpEOz+92y6doCnSLiVUmdSNsYlpK2y1xOGv7pSdqu8GNSONxI2p30MUlvRMQNWd0NWc2rsvldSAqNA7LP5NeSdsqW4bUmPo9xkkYXtXXO/p4HTMsCuRtp77I7I+LJbHfeqmy6vqShL4dCC+VQsE0i29unKd2z568GDi5o31HSUdn9nsALkhoPJJtJOkjsj5K+A5xF2i4BsE/RezWubD4N/G9B+92kjbvLCto6k4ZIGn8RvwucGxHv28U1q/nLpOErJI0gjalPBg6ItJstANkK9kDSinehpGsi4ryC+cxsYvbteP9G1+JdUhuX75WCHs9d2Yq3irQhWMBvgAmk4bYvREQD8KqkI0h7XG1P2o5xN2mY6pRsXtNJ2xcOjYi3JQ3KlmEU8J0mal6jpyCpPe8doPYg6Tv8BClIr4qIy7LnFpMOpnuW9/aKshZKEb7ymm0cSS9FRLe1PLeAdNTxhm5T+C5wc0QskaRsbL0HacPvoYXzj4ja7Fd814KewiYlaS/SnjZzC9q+DrSNiCsL2noC7SJifvb4uYjYvYn5rbEs2Uq+qqnPKetNVQMNkR2vUPhc9tnsFBFLmnhte4Di12XPVUfEyqbeL5vnocAJEfF1STeSQuteSYeT9goT8IeIGFHw2q2AnqX6Hqz0HApmZpbzwWtmZpZzKJiZWW6z39C8/fbbR21tbaXLMDPbrMyaNeuViKgpbt/sQ6G2tpaZM5vaucPMzNZGUpM7A3j4yMzMcg4FMz
PLORTMzCy32W9TMDNbm1WrVlFXV8fKle87Rq/VqK6upkePHrRr165Z0zsUzGyLVVdXR6dOnaitraWJM/hu8SKCZcuWUVdXxy677NKs13j4yMy2WCtXrqRr166tMhAgncG4a9euG9RTciiY2RattQZCow1dfoeCmZnlHApmZpbzhuYNNbrz+qfZnI1+vdIVmJVM7QVTN+n8Flx+zAZNP2HCBLbddluGDBnyvudeeeUVTjvtNO68804ATj75ZC655BKKT+MzZMgQzjjjDB5//HFGjhzJySefzM0330xVVdX75vlBOBTMzEpk/vz5nHjiifnjpUuX0qZNG6688sq8rW/fvsybN4/Vq1fz7LPPMnDgQADmzZvH3Llzad++Pb1796ZXr15MmzaNuXPnsnDhQtq0acOgQYOYNm0ahx9+OAD77LMP48aN26iaHQpmZiWy6667MmPGjPxxUz2F+vp65syZQ79+/bjooov46U9/CsAFF1zASSedxI477sgOO+zAM888wymnnMKoUaOYMGEC//jHP3jggQf45S9/ydFHH81hhx220YEA3qZgZlZR22+/PRMnTmTw4MEMHTo0b58xYwY77LADl156KcuXL2f16tWMGJEucnfrrbdy6623csUVV7B06VJWrVrFprpgmnsKZmYlctZZZ/HYY4/lj5944gmqqqrWGD56/vnnOeigg3jnnXdo27ZtPnw0e/Zsjj/+eADGjRvHxRdfzJlnnslNN91E165dOffcc+nbty8PPfQQO++8M/vuu+8mqdmhYGZWItdee21+f/z48VRXV7Nq1SpGjhzJ4MGDAXjsscfo2rUrBx98MC+//DIPP/wwb7zxBl26dOHGG2+kd+/eQDo6efr06cyePZtXX32V8ePHM2LECA455BAuu+wyrrrqqk1Ss4ePzMxKaMWKFZx55plst912DBkyhHPOOYennnqKUaNG8dZbbxERDB8+nFGjRjFnzhwAbr75ZkaOHMkVV1yRz+f888+ntraWT3ziE1x22WV87WtfY/Xq1Rx88MEsW7aMvffee5PU656CmbUaG7oL6ca65ZZbmDRpEmPGjOGTn/wkEyZMAOCiiy5i6tSpDBo0iGHDhjF58mTuueceli5dykMPPcRdd93F9OnT+epXv8oNN9zA6aefzpgxY+jQoQPTpk1jjz324P777+fmm29mxowZfOpTn+L000/n6quvpkOHDhtVs0PBzKxE+vbty1133dXkMQTHHHMMRxxxBDNmzGDQoEGsWrWKoUOHcuGFF3LHHXfQtm1brrvuOoYMGcKSJUvo06cP3//+9+ncuTPbbLMN55xzDpK4++67qa6uZty4cZx//vlcffXVG1WzNtUW60oZMGBAlPVynD54zWyz8c9//pM+ffpUuoxmW7FiBR07dqRNmzZrtK9cuZLq6uo12latWtXs02E39TlImhURA4qndU/BzKyF6NSpU5PtxYEANDsQNlRZNjRL6ixpsqTFkp6XVCVppKRFkuZKOqpg2ssl1Ul6UlL/ctRnZmZJuXoK44GngKFAe6AncDawV3b/Pkm9gIOAgUAtcBhwE9CvTDWambV6Je8pSOoGHACMjWQlcCxwW0SsiIingQVAf+A4YGJErI6I6UBN9nozMyuDcgwf7QX8C7gjGyr6Cal3sLBgmjpgpybaF2fta5A0XNJMSTPr6+tLV7mZWStTjuGjHYC+wL7Aq8B9QDfgiYJpGoB3garsfnH7GiLieuB6SHsflaRqM7NWqByhsBSYFRF1AJKmk1b0OxdM0wN4AVhS1N6d1IswM7MyKEcozACuk9QdWAYMAu4ATs2GknoBXYDZwFTgLEm3AIcD8yJieRlqNLPWYFMfZ9TM43qacwGdUl88p7lKHgoR8Zakc4DppD2PJkbEFZLaA3OAlcBpERGSpgCHAPNJAXLi2uZrZtbSnXXWWTzxxBPrvIDOI488wv7771/yi+c0V1l2SY2Ie4B7itrGAmOL2hqAEdnNzGyzNm
bMmPVeQGf58uV07Nix5BfPaS6fJdXMrESacwGdpUuXluXiOc3l01yYmZXIz3/+c9544411XkBnm222KcvFc5rLoWBmViL77bcfxxxzzDovoBMRnHfeeSW/eE5zefjIzKxEmnMBnXJdPKe53FMws9ajzKeGnz179novoFOui+c0l0PBzKxEDjvssPVeQOfFF19k8uTJJb94TnP5IjsbyhfZMdtstKSL7DT3AjobcvGc5vJFdszMWpjmXkCnVBfPaS5vaDYzs5xDwcy2aJv7EPnG2tDldyiY2RarurqaZcuWtdpgiAiWLVvW5DWe18bbFMxsi9WjRw/q6upozRfjqq6upkePHs2e3qFgZlusdu3ascsuu1S6jM2Kh4/MzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs1xZQkHSHEnPZbebs7aRkhZJmivpqIJpL5dUJ+lJSf3LUZ+ZmSXlOvdR+4jYvfGBpN2As4G9gJ7AfZJ6AQcBA4Fa4DDgJqBfmWo0M2v1yjV8VHze2mOB2yJiRUQ8DSwA+gPHARMjYnVETAdqJHUrU41mZq1eyUNB0oeAHSXNl/RnSfuQegcLCyarA3Zqon1x1l48z+GSZkqa2ZpPiWtmtqmVPBQi4q2I2CYidgWuBaYAVUBDwWQNwLvraC+e5/URMSAiBtTU1JSueDOzVqasex9FxG+BamAJsHPBUz2AF5po707qRZiZWRmUY/ios6Su2f2jgOXAVGCIpI6S+gBdgNlZ+ymS2kg6ApgXEctLXaOZmSXl2PuoC2nvIoCXgP+KiMclTQLmACuB0yIiJE0BDgHmA8uAE8tQn5mZZUoeChHxL2C3JtrHAmOL2hqAEdnNzMzKzEc0m5lZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZrhyX49yi1K78TaVLKKkFlS7AzCrKPQUzM8s5FMzMLOdQMDOznEPBzMxyDgUzM8s5FMzMLFe2UJD0P5JuzO6PlLRI0lxJRxVMc7mkOklPSupfrtrMzCwpy3EKko4E+gEvStoNOBvYC+gJ3CepF3AQMBCoBQ4DbspeY2ZmZVLynoKkDwFjgB9lTccCt0XEioh4mnS8VH/gOGBiRKyOiOlAjaRua5nncEkzJc2sr68v9SKYmbUa5Rg+ugr4KfBa9rgnsLDg+TpgpybaF2ft7xMR10fEgIgYUFNTs8kLNjNrrUoaCpJOASIibitorgIaCh43AO+uo93MzMqk1NsUzgG2lfQM0BnoAGwDLCmYpgfwQta2c0F7d1IvwszMyqSkPYVsiGf3iOgNfBu4HdgHGCKpo6Q+QBdgNjAVOEVSG0lHAPMiYnkp6zMzszWV/SypETFL0iRgDrASOC0iQtIU4BBgPrAMOLHctZmZtXZlC4WImAhMzO6PBcYWPd8AjMhuZmZWAT6i2czMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLNSsUJB0j6cuStilql6TbJX24NOWZmVk5Nfc4hW2BLwHDJHUAHgF+DpwLLI2IRSWpzszMymq9PQVJHwUCGB8Rh5Oue7AVMBf4j4g4q7QlmplZuawzFCS1BUYC3yGdl2gycD/QHvg4MF/S90pepZmZlcU6QyEiVgMXR8RHgT8COwIjI+LMiHiSNKR0oKRPlr5UMzMrteZsUzhOUnvgSOAyYLCkMcBy4BLS2UwfLWGNZmZWJs3Z++gF4GVgB+Ai4EGgE3A5MA14qGTVmZlZWTUnFLYl9Sh6AKuA3YC+pAvg1AMHlKo4MzMrr+aEwlvAdsA/gOnAImAB8L/AxUCtpE6lKtDMzMqnOaHwaET8mBQIlwCfJB2ncBBwL9Ab2LNkFZqZWdmsd0NzRMzP/jZeFOf7jc9JugB4DXisFMWZmVl5feArr0
[base64-encoded PNG omitted: stacked bar chart of passenger counts per port of embarkation, survived vs. not survived]\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "fig = plt.figure()\n", + "fig.set(alpha=0.2)  # set the figure's alpha (transparency) value\n", + "\n", + "# passenger counts per port of embarkation, split by outcome\n", + "Survived_0 = data_train.Embarked[data_train.Survived == 0].value_counts()\n", + "Survived_1 = data_train.Embarked[data_train.Survived == 1].value_counts()\n", + "df = pd.DataFrame({'Not survived': Survived_0, 'Survived': Survived_1})\n", + "# draw on the figure created above; without ax=, pandas opens a second figure and leaves this one empty\n", + "df.plot(kind='bar', stacked=True, ax=fig.gca())\n", + "plt.title(\"Survival by port of embarkation\")\n", + "plt.xlabel(\"Port of embarkation\")\n", + "plt.ylabel(\"Number of passengers\")\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "908B60F9DD894D929DA6D5728CD1929A", + "mdEditEnable": false + }, + "source": [ + "#### Does the number of siblings/spouses (SibSp) or parents/children (Parch) aboard affect survival?" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "D5C98FE13EBB435F85FAF4F21938EA2C", + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " PassengerId\n", + "SibSp Survived \n", + "0 0 398\n", + " 1 210\n", + "1 0 97\n", + " 1 112\n", + "2 0 15\n", + " 1 13\n", + "3 0 12\n", + " 1 4\n", + "4 0 15\n", + " 1 3\n", + "5 0 5\n", + "8 0 7\n", + " PassengerId\n", + "Parch Survived \n", + "0 0 445\n", + " 1 233\n", + "1 0 53\n", + " 1 65\n", + "2 0 40\n", + " 1 40\n", + "3 0 2\n", + " 1 3\n", + "4 0 4\n", + "5 0 4\n", + " 1 1\n", + "6 0 1\n" + ] + } + ], + "source": [ + "# number of passengers for each (SibSp, Survived) pair\n", + "gg = data_train.groupby(['SibSp','Survived'])\n", + "df = pd.DataFrame(gg.count()['PassengerId'])\n", + "print(df)\n", + "\n", + "# number of passengers for each (Parch, Survived) pair\n", + "gp = data_train.groupby(['Parch','Survived'])\n", + "df = pd.DataFrame(gp.count()['PassengerId'])\n", + "print(df)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "id": "630CA44341D84F3D85B265A347016F96", + "mdEditEnable": false + }, + "source": [ + "No dramatic pattern, but the counts do hint at one: passengers with exactly one sibling/spouse aboard survived more often (112 of 209, about 54%) than those with none (210 of 608, about 35%), the same holds for one parent/child (65 of 118, about 55%) versus none (233 of 678, about 34%), and passengers with three or more siblings/spouses fared worst. Let's keep SibSp and Parch as candidate features and come back to them later." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "18CB6E8DC7194E9490C41F27B755EBB7", + "mdEditEnable": false + }, + "source": [ + "#### Ticket and Cabin analysis\n", + "Ticket is the ticket number. It is mostly distinct per passenger (groups travelling together sometimes share one) and is unlikely to say much about the outcome, so we leave it out of the feature set for now.\n", + "\n", + "Cabin has a value for only 204 passengers; let's first look at its distribution." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "52EFF0E17D9141F28A47633B2969D050", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "B96 B98 4\n", + "G6 4\n", + "C23 C25 C27 4\n", + "C22 C26 3\n", + "F33 3\n", + " ..\n", + "E34 1\n", + "C7 1\n", + "C54 1\n", + "E36 1\n", + "C148 1\n", + "Name: Cabin, Length: 147, dtype: int64" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_train.Cabin.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AE065A86E10C46A78140AE25C0C9E628", + "mdEditEnable": false + }, + "source": [ + "Cabin is a categorical attribute, but most of its values are missing and the rest are scattered across 147 distinct cabins.\n", + "One-hot encoding it directly would spread it too thin: each resulting dummy feature would probably get almost no weight. 
With so many missing values, it makes more sense to start from whether Cabin is present at all (although a missing value may simply mean the record was lost rather than never taken, so this is not necessarily sound) and look at Survived at that coarse level first." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "C46525090059436083CF94D6599456E0", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "[base64-encoded PNG omitted: stacked bar chart of survival counts for passengers with and without a Cabin record]\n", + "text/plain": [ + "
cBl0k6r8Jrh5K+VRMRU4GpJTV+mjSs0gP4SUTcW+FzHEMan3+UdFA+sGyTc4H1pKGcfsBZZJPEETGcbEhLUgswJyI+Wfb6W0jDUgOBz0fE/ZIeAB6SdDrwZrIJ54j4QDYf02kCKSQeLXvP24EvZfsdBjwMjAJuJM1VnEU6U2smqed2afnntr2HQ8GqaSpwFPCRLBBeSzqYHJgd5K8EfhgRayX9kjRk80HSwabcG4EzgCMj4iVJ7yaNh3c6SFLPiNiQLY9gBz2PCj4LLCUdNL+8o55CWfsA0oTxaODjpG/nnev6kkLsRNIQzY3A5Wx5plOnoZ1PImJm1ls4DwhJh0fEM9mpovsDb5X0mYjYmO1nKHA36UykdcDVkgZFxJezXs33ga9ExLrt/B28g81DXJ36ljy/ELgrIv5IGra6hs3DUUREGymUbC/lULCqiYjnS4dRIuKxLBguJR0Y+wHDs3UbJF0MfEvS30iTzx1AM7CJdFDtCwyR1AFcXLa7/YDPSbqWFDzvBf5hJ+t9AiCr+aId9RQ6ZRPC3wVuiYgnKwwdHUcaKrqb9E19EKm3c3P5hqTTSjvf9+PAuOxz3EWaM3gG+Fz22pGk4L0pe8l7gE+RhqQuJc2/fDKbcD+c1CvpPMV0Wz4UEeWfrzX78xDgArIwVroAcau/5ywgT46I2TvYl3VDDgWrtt5kk5XZwfME0oF+AWmI5JfZqavfjoi7Ja0lnR75eVIYLCVNps4lnW+/iPTN9NtsnlCFNI7dRpoQbSf1Tn6yG3V3qaegdKHZ3aRhlOuz5heBASWbHQ20RsS/Z68ZRBqH3+LsqUz/kuc/IA3L9CCF0RpJnyeFw23AwcCjkv4cEV8nhUUH6dv+hiyUP04KjVNIvbZvSxoIXB4RL1TY/79lwVrq4OzPjwJzI+Kp7MD/XeAHEbFI0nA2D+cdA0wCHAp7o4jww4+qPEhn8bxAOnhPIV3o9DXg+JJthpMO+JfXut6Smi4lTdx2LjeQJsz3Bx4A3pK1f4AURB8ue/3ppHmLxdmjFTijZP1Y0jxLS4XHauAfs+16AX/M2m4Cfgh8D+hb8l6vzPZxJWlsfyNp6Oi8bP0vstf2zZb7kuYdppXU8qXs+W3AmWWfpTfwdPb8DNKw3N9ln+/Kku36kK5d+F323/m8Wv939GPXHu4pWNVExIWkMWgk9QZui4iOsm1+D4xXuvVFd9WbNNzTQTroPZi1zyedVbSqdOOIuJ8tewqVfD8itjodVdL0kvd5CRgkSRERkgZGxOqyfT0p6e+y5y+yde//tCiZuI+IvwIXq8IYV1kdp5OuKRHpNFcimzjP5oNOjIj8auZsH8ft4DPbXsBXNJuZWc5XNJuZWc6hYGZmue48jtslhx56aDQ1NdW6DDOzvcqCBQuei4jG8va9PhSamppoaWmpdRlmZnsVSeW3PQc8fGRmZiUcCmZmlqt6KEh6l6QnSx7rJP2zpKnZve+XSDqrZPvpklolLZI0qtr1mZnZZlWfU4iIO4A7ACQdDPw/0pWP15F+8m8oMD+7++KppBuKNQGnke7OOHJn97lhwwZaW1tZv36rm212G3369GHIkCH07Nmz1qWYmeWKnmj+MOk2B2cDsyJiLfC4pOWkm5idC8yMdNfHeZIaJQ2IiKe3+Y4VtLa2ctBBB9HU1MQOLtysiYigvb2d1tZWjjjiiB2/wMysIIXNKWT3zn8X6Uc/hrLlD363ku7/Xt6+Kmsvf68pSj+p2NLW1rbVvtavX0///v27ZSBAugtn//79u3VPxszqU5ETzW8HfhzpXu692PKH2ztId8TcVvsWImJGRDRHRHNj41an2QJ020Do1N3rM7P6VGQovIN0T3hId30cXLJuCOmWx+Xtg0i9CDMzK0AhoZD9GMco0iQzpN+InSCpr6QRpB9bWZi1T5TUIGkcsDQiKv2GrZmZVUFRE80jgcURsQkgIhZIuoN0H/j1wOTs1sCzgTGk38ltB87fEztvumLOjjfaCc
unv3mH28yaNYuPfexjNDQ0cNVVV3HBBRfs0RrM9phrD97xNtZ11/5px9t0Y4WEQkT8jPRjHqVt04BpZW0dpN/HvaSIuqpl7dq1XHbZZTz88MM0NDQwcuRIzj77bLY1/2Fm1l34iuYqmDt3LmPGjGHw4MEMGDCA008/nfvuu6/WZZmZ7ZBDoQpWrlzJsGHD8uUhQ4awevXq7bzCzKx7cChUwUsvvUSPHpv/anv06EFDQ0MNKzIz6xqHQhUMHDiQVas2/2xva2srQ4cOrWFFZmZd41CogvHjxzN37lyeffZZnn76aX7+859zxhln1LosM7Md2ut/ZKcrunIK6Z50+OGHc91113HyyScDcNNNN3HAAQcUWoOZ2a6oi1CohUmTJjFp0qRal2FmtlM8fGRmZjmHgpmZ5RwKZmaWcyiYmVnOoWBmZjmHgpmZ5RwKZmaWcyiYmVmuPi5e29M/ItKFH9F48cUX+cY3vsG9997L7Nmz9+z+zcyqpD5CoQaOOuoojj/+eNauXVvrUszMuszDR1WycOFCpk6dWusyzMx2ikOhSg455JBal2BmttMcCmZmlnMomJlZrpBQkHSwpDslrZL0e0m9JE2VtELSEklnlWw7XVKrpEWSRhVRn5mZJUWdfXQL8BvgHUBvYChwEXBs9ny+pGHAqcBooAk4DbgdGLnbe+/CKaRmZlZAKEgaAJwCTIqIANZLOgeYFRFrgcclLQdGAecCMyNiIzBPUqOkARHxdLXrrIaxY8cyduzYWpdhZtZlRQwfHQv8AfheNlR0I6l38FTJNq3AwArtq7L2LUiaIqlFUktbW1v1KjczqzNFDB8dBhwDnAg8D8wHBgC/LtmmA9gE9Mqel7dvISJmADMAmpuboypVm5nVoSJC4VlgQUS0AkiaRzrQDy7ZZgiwElhd1j6I1IswM7MCFDF89DBwjKRBknoDbwT+AkyQ1FfSCKAfsBCYA0yU1CBpHLA0Itbsyk7T9EX31d3rM7P6VPWeQkSsk3QxMI905tHMiLgpC4jFwHpgckSEpNnAGGAZ0A6cvyv77NOnD+3t7fTv3x9Je+aD7EERQXt7O3369Kl1KWZmW9De/o21ubk5WlpatmjbsGEDra2trF+/vkZV7VifPn0YMmQIPXv2rHUpVu/29F2E691ecgq8pAUR0Vzevk/eJbVnz54cccQRtS7DzGyv49tcmJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmlnMomJlZzqFgZmY5h4KZmeUcCmZmliskFCQtlvRk9vhG1jZV0gpJSySdVbLtdEmtkhZJGlVEfWZmluxX0H56R8QrOxckDQcuAo4FhgLzJQ0DTgVGA03AacDtwMiCajQzq3tFDR9F2fI5wKyIWBsRjwPLgVHAucDMiNgYEfOARkkDCqrRzKzuVT0UJB0AHC5pmaSfSHodqXfwVMlmrcDACu2rsvby95wiqUVSS1tbWxWrNzOrL1UPhYhYFxEvi4gjgVuB2UAvoKNksw5g03bay99zRkQ0R0RzY2Nj9Yo3M6szhZ59FBF3AX2A1cDgklVDgJUV2geRehFmZlaAIoaPDpbUP3t+FrAGmANMkNRX0gigH7Awa58oqUHSOGBpRKypdo1mZpYUcfZRP9LZRQBPA/8cEY9JugNYDKwHJkdESJoNjAGWAe3A+QXUZ2ZmmaqHQkT8ARheoX0aMK2srQO4JHuYmVnBirpOwcy6qab136l1CfuU5bUuYDf5NhdmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpYrLBQk/UjSbdnzqZJWSFoi6aySbaZLapW0SNKoomozM7
NkvyJ2Imk8MBL4o6ThwEXAscBQYL6kYcCpwGigCTgNuD17jZmZFaTqPQVJBwCfAm7Ims4BZkXE2oh4HFgOjALOBWZGxMaImAc0ShqwjfecIqlFUktbW1u1P4KZWd0oYvjoZuALwAvZ8lDgqZL1rcDACu2rsvatRMSMiGiOiObGxsY9XrCZWb2qaihImghERMwqae4FdJQsdwCbttNuZmYFqfacwsXAIZKeAA4G9gdeBqwu2WYIsDJrG1zSPojUizAzs4JUtaeQDfG8MiKOBq4E7gZeB0yQ1FfSCKAfsBCYA0yU1CBpHLA0ItZUsz4zM9tSl3oKkt4M9Ad+EBF/LmkXcBfw4YhY0ZX3iogFku4AFgPrgckREZJmA2OAZUA7cP5OfRIzM9ttXR0+OgR4NzBJ0v7AL4CvAJcCz3YlECJiJjAzez4NmFa2vgO4JHuYmVkN7HD4SNKrgQBuiYjTSdcT9ACWAMdHxL9Wt0QzMyvKdkNB0n7AVOAq0nj/ncB9QG/gtcAySddUvUozMyvEdkMhIjYCn4mIVwP3AocDUyPiwohYRBpSer2kE6pfqpmZVVtX5hTOldQbGA9cD7xN0qeANcBnSWcJPVLFGs3MrCBdOSV1JfAMcBjwaeBB4CBgOjAXeKhq1ZmZWaG6EgqHkHoUQ4ANwHDgGNKFZW3AKdUqzszMitWVUFgHvBz4X2AesIJ0E7ufAZ8BmiQdVK0CzcysOF0JhUci4vOkQPgscALpOoVTgXuAo4GjqlahmZkVZocTzRGxLPuz82KzT3Suk3QF6e6nj1ajODMzK9Yu3xBP0lXABGB8djWymZnt5Xb6hniSTpT0AHAkMDoiVu/gJWZmtpfYYU8hGyL6IzAMeAPwOOkCtseqXJuZmRWsKz2F50iTyccCryBdo+CzjczM9kFdmWi+rfN5di+kNwHXSWoD3h8R7VWsz8zMCrRTcwoRsTEifgiMJZ1x9KCkV1SjMDMzK94unX0UEQFMk9QOzJZ0UkRs2LOlmZlZ0XbrN5oj4muShgIDSPdIMjOzvdhuhQJARHx8TxRiZma1t9PXKZiZ2b7LoWBmZjmHgpmZ5aoeCpJ6SJonaamkJZLGZ+1TJa3I2s4q2X66pFZJiySNqnZ9Zma22W5PNHdBAO+JiNWSziRd+PYkcBHpKumhwHxJw0i34x4NNAGnAbcDIwuo0czMKKCnEEnnTfOGAY8B5wCzImJtRDxO+tGeUcC5wMzsIrl5QKOkAeXvKWmKpBZJLW1tbdX+CGZmdaOQOQVJl2cXun2I9DvPQ4GnSjZpBQZWaF+VtW8hImZERHNENDc2NlavcDOzOlNIKETEDRHRH7gKmAv0Akp/g6ED2LSddjMzK0ChZx9FxPeBA4HVwOCSVUNIV0SXtw8i9SLMzKwARZx9dGTnvICkk4H1wBxggqS+kkYA/YCFWftESQ2SxgFLI2JNtWs0M7OkiLOPDgHukdQAPAu8PSIWSLoDWEwKickREZJmA2OAZUA7cH4B9ZmZWabqoRARjwKvqtA+DZhW1tYBXJI9zMysYL6i2czMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs5xDwczMclUPBUm9JN0qaamk30n6p6x9qqQVkpZIOqtk++mSWiUtkjSq2vWZmdlm+xWwj37A/RHxr5JeBTwi6TfARcCxwFBgvqRhwKnAaKAJOA24HRhZQI3Vd+3Bta5g33Htn2pdgdk+q+o9hYh4OiLuzp4vBTYCE4BZEbE2Ih4HlgOjgHOBmRGxMSLmAY2SBlS7RjMzSwqdU5D0XuDXpN7DUyWrWoGBpF5DafuqrL38faZIapHU0tbWVsWKzczqS2GhIOkK4BLgnUAvoKNkdQewaTvtW4iIGRHRHBHNjY2N1SvazKzOFDGngKQvAw
cAr4+Iv0paDQwu2WQIsBIobx9E6kWYmVkBijj76CTgqIiYFBF/zZrnABMk9ZU0gjSctDBrnyipQdI4YGlErKl2jWZmlhTRUxgJNEt6sqTtg8AdwGJgPTA5IkLSbGAMsAxoB84voD4zM8tUPRQi4qvAVyusugeYVrZtB2ne4ZJq12VmZlvzFc1mZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpareihI6i3pQkmzy9qnSlohaYmks0rap0tqlbRI0qhq12dmZpvtV8A+lgD/CxzU2SBpOHARcCwwFJgvaRhwKjAaaAJOA24HRhZQo5mZUczw0Ujg5rK2c4BZEbE2Ih4HlgOjgHOBmRGxMSLmAY2SBhRQo5mZUUAoRMQLFZqHAk+VLLcCAyu0r8ratyBpiqQWSS1tbW17sFozs/pWq4nmXkBHyXIHsGk77VuIiBkR0RwRzY2NjVUt1MysntQqFFYDg0uWhwArK7QPIvUizMysALUKhTnABEl9JY0A+gELs/aJkhokjQOWRsSaGtVoZlZ3ijj7aCsRsUDSHcBiYD0wOSIiO211DLAMaAfOr0V9Zmb1qpBQiIgHgAfK2qYB08raOoBLsoeZmRXMVzSbmVnOoWBmZjmHgpmZ5Woy0VyPmtZ/p9Yl7DOW17oAs32YewpmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpZzKJiZWc6hYGZmOYeCmZnlHApmZpbrdqEg6W2S/iDpSUkX1LoeM7N60q1+jlPSQcBNwEnAJmChpP+OiLbaVmZmVh+6W09hPPBgRKyKiKeB+4E31LgmM7O60a16CsBQ4KmS5VZgYPlGkqYAU7LFv0haUkBt9eJQ4LlaF7E9+lytK7Aa6fb/NmGv+vc5rFJjdwuFXkBHyXIHaRhpCxExA5hRVFH1RFJLRDTXug6zcv63WYzuNny0GhhcsjwEWFmjWszM6k53C4W5wHhJh0kaAJwC3FvjmszM6ka3Gj6KiGckXQ38Imu6LCLW1bKmOuRhOeuu/G+zAIqIWtdgZmbdRHcbPjIzsxpyKJiZWc6hYGZmOYdCHZPkiTvrdiSp5PlXSp5/TtJPJH2gNpXVB0801zFJvwAuAD4MlP5D6AB+GBE/qklhVtckPRIRJ5Q/z5b7AT+NiGNrVuA+zj0FGwr8CngtcHP259eBq2tZlNU1bWtFRKwB2guspe50q+sUrGaeAtZFxGJJ6yJigaSXal2U1a0thi+y4aSh2eKBVLgfmu05DgUr1/kt7eKaVmH1TJJGsPkWNwcDnbeZ2wR8uiZV1QmHQh3KbiEykbJvZKUi4jfFVWS2latJ9z4bAdwCfNtzXMXwnEJ9egFYQ5o/eDdbhoPPPLBai4h4V0SMBZ4A/h14u6T7JQ3d/kttd/nsozom6ZfAfOD9pMm7NUC/7M+IiFNqWJ7VqW2dfSTpjcAXgbdGxLIalrhP8/BRfeuIiKslPQmcDLw//C3Bau9PJc/zM5EiYr6kC4E7JZ0SERuLL23f5+Gj+jYHICK+CawA3ljbcswgIsaVLM4tW/dT4A5gUKFF1REPH5mZWc49BTMzyzkUzMws51Aw2wmSHpA0ukL7myV9aCfe5zBJB+7Z6sx2n88+sronqRfwCeCdpLNdNgHjdua0x4iYQzZx30VfJJ1//3CFeoYD393OaxdGxOSd2JdZlzkUzOA7wPPAayLiL5IOBV7c0zuRNJt0355DgJcDR5bcJbrTtyLiVqC55HW/jojj9nQ9ZpU4FKyuSTqBdCuF4yJiE0BEPCdpsKQ5wGDShX3nRMSq7GWnSfoS6UK/j0XEf0maBIyOiMmSZgLPAacCw4CLI+KuiDhH0n
6kCwa/CTxWVs5jEbG6rL4GYP2e/+RmlXlOwerdaOC+zkAoEcAFETEceAh4X8m6o4FRwDjg5m3MDRwDvB54F3Ad5Af4r2XrywPhbOAN2Xb3SGqR1AK0AEd1Lpc8puzi5zXbLvcUrN4FlYeKngHeKemjwClsOfY/MwuRJZKWAEdVeP33ImKjpAeBYZJ6A3cDC0gBcEXZ9oOBXwBExJmdjZIuJw1t/SAi2nblA5rtDIeC1bvfAG+v0H4dafx/OvBr4PiSdaW3V9gf+FuF178IEBEbJDVExIuSbgV+CxwdERNKN5b0wfI3yCbA3w2cBvwPcFJXP5TZrvLwkdW7+QCSPi2pZ/Z8CPBq4B5gKTC+7DXnZdu9DmgEfteVHUXEj7Onp2entuYPYGqFl3wCmB0Rz5U2SjpW0nu6sk+zneWegtW1iAhJ/0A6RbRV0lrSxPKXgC+Qhnl+DvQsedlLkhYBDcC7s97Azuz2/go9hY+Qfhu7c/lS4ETgLVlTn5LNjwKG78wOzbrK9z4yK5CkJmB6Zyhk3/gnAr2Bd0XEckl3kQLigohYl233NdKdbP8MHJZt+0gNPoLt4xwKZt2MpKMiYkmt67D65FAwM7OcJ5rNzCznUDAzs5xDwczMcg4FMzPLORTMzCznUDAzs9z/B27neE8+3guhAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
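" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "fig = plt.figure()\n", + "fig.set(alpha=0.2)  # set the figure's alpha (transparency) value\n", + "\n", + "# survival counts for passengers with and without a Cabin record\n", + "Survived_cabin = data_train.Survived[pd.notnull(data_train.Cabin)].value_counts()\n", + "Survived_nocabin = data_train.Survived[pd.isnull(data_train.Cabin)].value_counts()\n", + "df = pd.DataFrame({'Has Cabin': Survived_cabin, 'No Cabin': Survived_nocabin}).transpose()\n", + "# draw on the figure created above instead of letting pandas open a new one\n", + "df.plot(kind='bar', stacked=True, ax=fig.gca())\n", + "plt.title(\"Survival by presence of a Cabin record\")\n", + "plt.xlabel(\"Cabin recorded?\")\n", + "plt.ylabel(\"Number of passengers\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "C0F627D831324F4A8E02278E49A895B6", + "mdEditEnable": false + }, + "source": [ + "Passengers with a Cabin record appear to have a somewhat higher survival rate. We'll leave it at that for now." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2BED89CF8A7D45E6832F5D54E84CAF75", + "mdEditEnable": false + }, + "source": [ + "## Simple data preprocessing\n", + "\n", + "\n", + "Data preprocessing covers much of the feature engineering that Kagglers love to talk about.\n", + "\n", + "- **Feature engineering is hugely important!**\n", + "- **Feature engineering is hugely important!**\n", + "- **Feature engineering is hugely important!**\n", + "\n", + "\n", + "Start with the two attributes whose problems stand out most, Cabin and Age: their missing values get in the way of everything that follows.\n", + "\n", + "Cabin: as discussed above, for now we simply map it to two values, Yes and No, depending on whether a value is recorded.\n", + "\n", + "Age:\n", + "\n", + "There are several common ways to deal with missing values:\n", + "\n", + "- If a very large share of samples lack the value, we may simply drop the attribute; adding it as a feature could bring in noise and hurt the final result.\n", + "- If a moderate share is 
missing and the attribute is not continuous (e.g. categorical), treat NaN as one more category.\n", + "- If a moderate share is missing and the attribute is continuous, choose a step (for Age, say every 2 or 3 years), discretize the values into bins, and then add NaN as one more bin.\n", + "- If only a few values are missing, we can fit a model on the known values and use it to fill in the gaps.\n", + "\n", + "Here either of the last two approaches should work. Let's try model-based imputation first (although there is not much background information to fit on, so this is not necessarily a great choice).\n", + "\n", + "We use scikit-learn's RandomForest to fit the missing ages. (A random forest draws different samples from the original data, builds many decision trees on them, and averages their outputs, which reduces overfitting and improves results; we'll cover it later.)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "77DB5AD318FB481C8F098E7DC145348D", + "scrolled": false + 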
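The third option in the list above (bin a continuous attribute and give missing values their own bin) might look like this in pandas. It is only a sketch of that alternative, not what the notebook does next. The sample is the first eight training Ages, with row 5 (Moran, Mr. James) left as NaN because his Age is missing in the raw data; the value shown for him in the later table is imputed. The 3-year step is one of the step sizes suggested above.

```python
import numpy as np
import pandas as pd

# First eight training Ages; row 5 (Moran, Mr. James) has no recorded Age
ages = pd.Series([22, 38, 26, 35, 35, np.nan, 54, 2], name='Age')

step = 3                                          # bin width in years
edges = np.arange(0, 81 + step, step)             # bin edges 0, 3, 6, ..., 81
age_bin = pd.cut(ages, bins=edges, right=False)   # left-closed bins such as [21, 24)

# Give missing ages their own category instead of imputing a number
age_bin = age_bin.cat.add_categories(['Missing']).fillna('Missing')

# One-hot encode the bins (all 27 intervals plus 'Missing') for a linear model
age_dummies = pd.get_dummies(age_bin, prefix='Age', dtype=int)
```

Because `age_bin` is categorical, `get_dummies` creates a column for every bin, including empty ones. The training and test sets therefore get the same columns as long as they share the same `edges`.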
}, + "outputs": [], + "source": [ + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "### Fill in the missing Age values with a RandomForestRegressor\n", + "def set_missing_ages(df):\n", + "\n", + "    # take the existing numeric features to feed into the random forest regressor\n", + "    age_df = df[['Age','Fare', 'Parch', 'SibSp', 'Pclass']]\n", + "\n", + "    # split passengers into known-age and unknown-age groups\n", + "    known_age = age_df[age_df.Age.notnull()].values\n", + "    unknown_age = age_df[age_df.Age.isnull()].values\n", + "\n", + "    # y: the target, i.e. Age\n", + "    y = known_age[:, 0]\n", + "\n", + "    # X: the feature values\n", + "    X = known_age[:, 1:]\n", + "\n", + "    # fit the RandomForestRegressor\n", + "    rfr = RandomForestRegressor(random_state=0, n_estimators=2000, n_jobs=-1)\n", + "    rfr.fit(X, y)\n", + "\n", + "    # predict the unknown ages with the fitted model\n", + "    predictedAges = rfr.predict(unknown_age[:, 1:])\n", + "\n", + "    # write the predictions back into the missing entries\n", + "    df.loc[(df.Age.isnull()), 'Age'] = predictedAges\n", + "\n", + "    return df, rfr\n", + "\n", + "def set_Cabin_type(df):\n", + "    # collapse Cabin to \"Yes\" (recorded) or \"No\" (missing)\n", + "    df.loc[(df.Cabin.notnull()), 'Cabin'] = \"Yes\"\n", + "    df.loc[(df.Cabin.isnull()), 'Cabin'] = \"No\"\n", + "    return df\n", + "\n", + "data_train, rfr = set_missing_ages(data_train)\n", + "data_train = set_Cabin_type(data_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "5036CC74FAE84F658FB4EF8FB8998E2A", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " PassengerId Survived Pclass \\\n", + "0 1 0 3 \n", + "1 2 1 1 \n", + "2 3 1 3 \n", + "3 4 1 1 \n", + "4 5 0 3 \n", + "5 6 0 3 \n", + "6 7 0 1 \n", + "7 8 0 3 \n", + "8 9 1 3 \n", + "9 10 1 2 \n", + "\n", + " Name Sex Age \\\n", + "0 Braund, Mr. Owen Harris male 22.000000 \n", + "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.000000 \n", + "2 Heikkinen, Miss. Laina female 26.000000 \n", + "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.000000 \n", + "4 Allen, Mr. William Henry male 35.000000 \n", + "5 Moran, Mr. James male 23.838953 \n", + "6 McCarthy, Mr. Timothy J male 54.000000 \n", + "7 Palsson, Master. Gosta Leonard male 2.000000 \n", + "8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.000000 \n", + "9 Nasser, Mrs. Nicholas (Adele Achem) female 14.000000 \n", + "\n", + " SibSp Parch Ticket Fare Cabin Embarked \n", + "0 1 0 A/5 21171 7.2500 No S \n", + "1 1 0 PC 17599 71.2833 Yes C \n", + "2 0 0 STON/O2. 3101282 7.9250 No S \n", + "3 1 0 113803 53.1000 Yes S \n", + "4 0 0 373450 8.0500 No S \n", + "5 0 0 330877 8.4583 No Q \n", + "6 0 0 17463 51.8625 Yes S \n", + "7 3 1 349909 21.0750 No S \n", + "8 0 2 347742 11.1333 No S \n", + "9 1 0 237736 30.0708 No C " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_train.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "954507CD225B48F19CC9BC688024355A", + "mdEditEnable": false + }, + "source": [ + "Logistic regression needs every input feature to be numeric, so we first factorize (one-hot encode) the categorical features.\n", + "What does factorizing mean? Take Cabin as an example:\n", + "\n", + "Cabin is a single attribute whose value is either 'Yes' or 'No'; we expand it into two attributes, 'Cabin_Yes' and 'Cabin_No'.\n", + "\n", + "- A passenger whose Cabin was Yes gets 1 in Cabin_Yes and 0 in Cabin_No.\n", + "- A passenger whose Cabin was No gets 0 in Cabin_Yes and 1 in Cabin_No.\n", + "\n", + "pandas' `get_dummies` does this for us; we then concatenate the 
new columns with `data_train` into a new frame `df`, as shown below." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "10DE8C414A214AD9842CA4C772644829", + 
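Here is a minimal sketch of the factorization just described. It uses three made-up Cabin values rather than the real data. The `dtype=int` argument is an addition for readability: recent pandas versions return True/False dummies by default, while the outputs in this notebook show 0/1.

```python
import pandas as pd

# Cabin after set_Cabin_type(): every value is either 'Yes' or 'No'
cabin = pd.Series(['No', 'Yes', 'No'], name='Cabin')

# One 0/1 column per distinct value (sorted: Cabin_No, Cabin_Yes);
# each row has a 1 in exactly one of them
dummies = pd.get_dummies(cabin, prefix='Cabin', dtype=int)
print(dummies)
```

The first and third rows get `Cabin_No = 1, Cabin_Yes = 0` and the second row the reverse. This is the same pattern as the `Cabin_No`/`Cabin_Yes` columns in the table that follows.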
"scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
\n", + "
" + ], + "text/plain": [ + " PassengerId Survived Age SibSp Parch Fare Cabin_No Cabin_Yes \\\n", + "0 1 0 22.0 1 0 7.2500 1 0 \n", + "1 2 1 38.0 1 0 71.2833 0 1 \n", + "2 3 1 26.0 0 0 7.9250 1 0 \n", + "3 4 1 35.0 1 0 53.1000 0 1 \n", + "4 5 0 35.0 0 0 8.0500 1 0 \n", + "\n", + " Embarked_C Embarked_Q Embarked_S Sex_female Sex_male Pclass_1 \\\n", + "0 0 0 1 0 1 0 \n", + "1 1 0 0 1 0 1 \n", + "2 0 0 1 1 0 0 \n", + "3 0 0 1 1 0 1 \n", + "4 0 0 1 0 1 0 \n", + "\n", + " Pclass_2 Pclass_3 \n", + "0 0 1 \n", + "1 0 0 \n", + "2 0 1 \n", + "3 0 0 \n", + "4 0 1 " + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# one-hot encode each categorical column\n", + "dummies_Cabin = pd.get_dummies(data_train['Cabin'], prefix='Cabin')\n", + "\n", + "dummies_Embarked = pd.get_dummies(data_train['Embarked'], prefix='Embarked')\n", + "\n", + "dummies_Sex = pd.get_dummies(data_train['Sex'], prefix='Sex')\n", + "\n", + "dummies_Pclass = pd.get_dummies(data_train['Pclass'], prefix='Pclass')\n", + "\n", + "# append the dummy columns, then drop the original categorical columns and the unused Name/Ticket\n", + "df = pd.concat([data_train, dummies_Cabin, dummies_Embarked, dummies_Sex, dummies_Pclass], axis=1)\n", + "df.drop(['Pclass', 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], axis=1, inplace=True)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DFF9132E01D14C268CD38CB8B1C56419", + "mdEditEnable": false + }, + "source": [ + "All of these categorical attributes are now 0/1 numeric attributes.\n", + "\n", + "So it looks as if we have every attribute we need, and all of them are numeric.\n", + "\n", + "But look closely at Age and Fare: their values span a far wider range than the other attributes. 
Features on such different scales can slow down the optimizer's convergence, or even keep it from converging.\n", + "So we first scale these two columns with scikit-learn's `preprocessing` module. The `StandardScaler` used below standardizes each feature to zero mean and unit variance; it does not squeeze values into [-1, 1], and values beyond ±1 are perfectly normal." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "FBA4537B8FDB444DA85DF162F8A28D95", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
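The sketch below shows what standardization does. It uses the first five training Ages and the first five test Ages shown in this notebook's output tables. It illustrates two points. First, `StandardScaler` maps values to zero mean and unit variance, not into [-1, 1]. Second, the test set should be transformed with the statistics learned on the training set: fit once, then call `transform`, rather than refitting the scaler on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five Ages of the training set and of the test set, as shown in this notebook's tables
train_age = np.array([22.0, 38.0, 26.0, 35.0, 35.0]).reshape(-1, 1)
test_age = np.array([34.5, 47.0, 62.0, 27.0, 22.0]).reshape(-1, 1)

# Learn mean_ and scale_ (the population standard deviation) from the training data only
scaler = StandardScaler().fit(train_age)
train_scaled = scaler.transform(train_age)
test_scaled = scaler.transform(test_age)   # reuse the training statistics; no refitting

print(scaler.mean_, scaler.scale_)
print(train_scaled.ravel())  # zero mean and unit std, yet Age 22 maps to about -1.5
print(test_scaled.ravel())
```

With only five samples the numbers are extreme, but the two properties are the same for the full columns.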
\n", + "
\n", + "
" + ], + "text/plain": [ + " PassengerId Survived Age SibSp Parch Fare Cabin_No Cabin_Yes \\\n", + "0 1 0 22.0 1 0 7.2500 1 0 \n", + "1 2 1 38.0 1 0 71.2833 0 1 \n", + "2 3 1 26.0 0 0 7.9250 1 0 \n", + "3 4 1 35.0 1 0 53.1000 0 1 \n", + "4 5 0 35.0 0 0 8.0500 1 0 \n", + "\n", + " Embarked_C Embarked_Q Embarked_S Sex_female Sex_male Pclass_1 \\\n", + "0 0 0 1 0 1 0 \n", + "1 1 0 0 1 0 1 \n", + "2 0 0 1 1 0 0 \n", + "3 0 0 1 1 0 1 \n", + "4 0 0 1 0 1 0 \n", + "\n", + " Pclass_2 Pclass_3 Age_scaled Fare_scaled \n", + "0 0 1 -0.561377 -0.502445 \n", + "1 0 0 0.613173 0.786845 \n", + "2 0 1 -0.267740 -0.488854 \n", + "3 0 0 0.392945 0.420730 \n", + "4 0 1 0.392945 -0.486337 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import sklearn.preprocessing as preprocessing\n", + "\n", + "# StandardScaler.fit() returns the scaler itself, so fitting one shared scaler twice would leave\n", + "# age_scale_param and fare_scale_param pointing to the same object, holding the Fare statistics.\n", + "# Fit one scaler per column and keep both for transforming the test set later.\n", + "age_scale_param = preprocessing.StandardScaler().fit(df['Age'].values.reshape(-1,1))\n", + "df['Age_scaled'] = age_scale_param.transform(df['Age'].values.reshape(-1,1))\n", + "fare_scale_param = preprocessing.StandardScaler().fit(df['Fare'].values.reshape(-1,1))\n", + "df['Fare_scaled'] = fare_scale_param.transform(df['Fare'].values.reshape(-1,1))\n", + "\n", + "# unfitted scaler instance that the test-set cell below still uses\n", + "scaler = preprocessing.StandardScaler()\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F6D48EE010B042EC870A17B8FFFD61B5", + "mdEditEnable": false + }, + "source": [ + "Much better. Everything is ready except the model itself, and we're about to see results.\n", + "\n", + "## Logistic regression modeling\n", + "\n", + "We pull out the feature columns we need, convert them to a NumPy array, and fit scikit-learn's `LogisticRegression`." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + 
"metadata": { + "id": "DCE1A3BC05B7443E8810C1EF8B5C14C8", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "LogisticRegression(penalty='l1', solver='liblinear', tol=1e-06)" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn import linear_model\n", + "\n", + "# use a regex to keep Survived plus the feature columns; raw Age/Fare do not match Age_.*/Fare_.*, so only the scaled versions are kept\n", + "train_df = df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')\n", + "train_np = train_df.values\n", + "\n", + "# y: column 0, the Survived label\n", + "y = train_np[:, 0]\n", + "\n", + "# X: columns 1 onward, the feature values\n", + "X = train_np[:, 1:]\n", + "\n", + "# fit the LogisticRegression model\n", + "clf = linear_model.LogisticRegression(solver='liblinear', C=1.0, penalty='l1', tol=1e-6)\n", + "clf.fit(X, y)\n", + "\n", + "clf" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D5A6805C05604A8D8DDB2382248F29D0", + "mdEditEnable": false + }, + "source": [ + "Good, that went smoothly: we now have a model.\n", + "\n", + "Not so fast, though! You can't just throw test.csv straight into the model and expect predictions. The test data has to go through exactly the same preprocessing as the training data!" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "E984C35272844F5182912EE3898B3A81", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
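This pitfall is worth guarding against when the preprocessing is repeated on the test set. `get_dummies` only creates columns for values that actually occur. If a category appears in the training data but not in the test data, or the other way round, the two matrices end up with different columns, and the model's feature positions no longer line up. Below is a minimal sketch of one common fix: reindex the test dummies to the training columns. The tiny Embarked samples are made up for illustration.

```python
import pandas as pd

# Hypothetical samples: port 'C' occurs in training but not in this test batch
train_embarked = pd.Series(['S', 'C', 'Q', 'S'], name='Embarked')
test_embarked = pd.Series(['Q', 'S', 'S'], name='Embarked')

train_dummies = pd.get_dummies(train_embarked, prefix='Embarked', dtype=int)
test_dummies = pd.get_dummies(test_embarked, prefix='Embarked', dtype=int)
print(list(test_dummies.columns))   # Embarked_C is missing here

# Force the test frame onto the training columns: absent columns are filled with 0,
# columns never seen in training are dropped, and the column order matches training
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(list(test_aligned.columns))
```

In the full pipeline, the same `reindex` would be applied to the whole feature frame (for example `df_test` against the training feature columns) before calling `predict`.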
\n", + "
\n", + "
" + ], + "text/plain": [ + " PassengerId Age SibSp Parch Fare Cabin_No Cabin_Yes Embarked_C \\\n", + "0 892 34.5 0 0 7.8292 1 0 0 \n", + "1 893 47.0 1 0 7.0000 1 0 0 \n", + "2 894 62.0 0 0 9.6875 1 0 0 \n", + "3 895 27.0 0 0 8.6625 1 0 0 \n", + "4 896 22.0 1 1 12.2875 1 0 0 \n", + "\n", + " Embarked_Q Embarked_S Sex_female Sex_male Pclass_1 Pclass_2 Pclass_3 \\\n", + "0 1 0 0 1 0 0 1 \n", + "1 0 1 1 0 0 0 1 \n", + "2 1 0 0 1 0 1 0 \n", + "3 0 1 0 1 0 0 1 \n", + "4 0 1 1 0 0 0 1 \n", + "\n", + " Age_scaled Fare_scaled \n", + "0 0.307526 -0.496637 \n", + "1 1.256242 -0.511497 \n", + "2 2.394702 -0.463335 \n", + "3 -0.261704 -0.481704 \n", + "4 -0.641190 -0.416740 " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_test = pd.read_csv(\"test.csv\")\n", + "data_test.loc[ (data_test.Fare.isnull()), 'Fare' ] = 0\n", + "# 接着我们对test_data做和train_data中一致的特征变换\n", + "# 首先用同样的RandomForestRegressor模型填上丢失的年龄\n", + "tmp_df = data_test[['Age','Fare', 'Parch', 'SibSp', 'Pclass']]\n", + "null_age = tmp_df[data_test.Age.isnull()].values\n", + "# 根据特征属性X预测年龄并补上\n", + "X = null_age[:, 1:]\n", + "predictedAges = rfr.predict(X)\n", + "data_test.loc[ (data_test.Age.isnull()), 'Age' ] = predictedAges\n", + "\n", + "data_test = set_Cabin_type(data_test)\n", + "dummies_Cabin = pd.get_dummies(data_test['Cabin'], prefix= 'Cabin')\n", + "dummies_Embarked = pd.get_dummies(data_test['Embarked'], prefix= 'Embarked')\n", + "dummies_Sex = pd.get_dummies(data_test['Sex'], prefix= 'Sex')\n", + "dummies_Pclass = pd.get_dummies(data_test['Pclass'], prefix= 'Pclass')\n", + "\n", + "\n", + "df_test = pd.concat([data_test, dummies_Cabin, dummies_Embarked, dummies_Sex, dummies_Pclass], axis=1)\n", + "df_test.drop(['Pclass', 'Name', 'Sex', 'Ticket', 'Cabin', 'Embarked'], axis=1, inplace=True)\n", + "df_test['Age_scaled'] = scaler.fit_transform(df_test['Age'].values.reshape(-1,1), age_scale_param)\n", + "df_test['Fare_scaled'] = 
scaler.fit_transform(df_test['Fare'].values.reshape(-1,1), fare_scale_param)\n", + "df_test.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "25901C4D9F6B47678E87FC4D898D1CDA", + "mdEditEnable": false + }, + "source": [ + "不错不错,数据很OK,差最后一步了。 \n", + "下面就做预测取结果吧!!" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "E140D03943F24EE6BC56983AFC1D8182", + "scrolled": false + }, + "outputs": [], + "source": [ + "test = df_test.filter(regex='Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')\n", + "predictions = clf.predict(test)\n", + "result = pd.DataFrame({'PassengerId':data_test['PassengerId'].values, 'Survived':predictions.astype(np.int32)})\n", + "result.to_csv(\"logistic_regression_predictions.csv\", index=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "5AD6B7EEB61B420C894E2964A3AB7993", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PassengerId Survived\n", + "0 892 0\n", + "1 893 0\n", + "2 894 0\n", + "3 895 0\n", + "4 896 1" + ], + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv(\"logistic_regression_predictions.csv\").head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6D6B7CAACAEF4264824F93794CC731FE", + "mdEditEnable": false + }, + "source": [ + "\n", + "The format is correct.\n", + "\n", + "Submit the result on Kaggle's *Make a submission* page.  \n", + "\n", + "The score is 0.7635, which is just a baseline model produced after some simple analysis and preprocessing.\n", + "\n", + "## Improving the logistic regression system\n", + "### Analyzing the model coefficients\n", + "\n", + "Andrew Ng's Machine Learning course says that at this point we should analyze the model's current state: is it overfitting or underfitting? That tells us whether we need more features, more data, or some other change.\n", + "\n", + "The well-known tool for this is the learning curve.\n", + "\n", + "In our situation, though, there is no rush to do that: this baseline is still rough, so let's dig a little deeper first.\n", + "\n", + "First, we dropped the Name and Ticket attributes entirely (almost every record has a different value for them, and we didn't find a straightforward way to use them).\n", + "\n", + "Second, imputing age by regression is not necessarily reliable, since the other attributes can't predict the unknown ages very well. Also, everyday experience suggests that children and the elderly probably got more help. A continuous Age feature with a single fixed coefficient can only express a monotonic (positive or negative) relationship with age, so it can't capture both ends of the age range being favored. Discretizing age into ranges and treating it as a categorical feature would probably be more appropriate.\n", + "\n", + "Let's pair the fitted model's coefficients with their features.\n", + "\n", + "**LR model coefficients:**" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "34F8D229D4764822AF8E21C87060A4B0", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " columns coef\n", + "0 SibSp [-0.34423350629771865]\n", + "1 Parch [-0.10491782795293876]\n", + "2 Cabin_No [0.0]\n", + "3 Cabin_Yes [0.9020907454858608]\n", + "4 Embarked_C [0.0]\n", + "5 Embarked_Q [0.0]\n", + "6 Embarked_S [-0.4172600741014519]\n", + "7 Sex_female [1.9565674266199975]\n", + "8 Sex_male [-0.677419771178524]\n", + "9 Pclass_1 [0.34116840933660547]\n", + "10 Pclass_2 [0.0]\n", + "11 Pclass_3 [-1.1941311197849658]\n", + "12 Age_scaled [-0.5237628105223407]\n", + "13 Fare_scaled [0.08443592417790625]" + ], + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.DataFrame({\"columns\":list(train_df.columns)[1:], \"coef\":list(clf.coef_.T)})" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CAEA1C315A1D43E0B045C0FBF9C2C4F3", + "mdEditEnable": false, + "scrolled": false + }, + "source": [ + "A positive coefficient means the feature raises the predicted probability of survival; a negative coefficient means it lowers it.\n", + "\n", + "For the features whose weights have large absolute values, the model says:\n", + "\n", + "- Sex: being female greatly raises the probability of survival, while being male lowers it substantially.\n", + "- Pclass: first-class passengers get a higher probability of survival, while third class lowers it sharply.\n", + "- Having a Cabin value raises the probability of survival considerably. (There may be more to find here: the earlier plot of Survived by whether a Cabin is recorded shows that some passengers with a Cabin record still died, so we probably haven't exploited this attribute enough.)\n", + "- Age is negatively correlated: in our model, the younger the passenger, the higher the rescue priority. (We should go back to the raw data to check whether that is reasonable.)\n", + "- Embarking at port S lowers the probability of survival considerably, while the other two ports get zero weight. (This is actually quite strange, since the earlier statistics plots didn't show a particularly low survival rate for port S. Note that with an L1 penalty and one-hot dummies, a zero weight can simply mean that a category acts as the reference level, so the weight on S is relative to C and Q. Either way, it may be worth trying to drop the port feature.)\n", + "- Fare has a slight positive correlation. (That doesn't necessarily mean the feature matters little; we may not have refined it enough. For example, perhaps we should discretize it and then break it down by passenger class?)\n", + "\n", + "OK, inspection done, and we now have some ideas. But how do we know which optimizations are promising?\n", + "\n", + "test.csv has no Survived column, so we can't evaluate our algorithm on that data.\n", + "\n", + "And making a submission after every tweak, then judging the tweak by the score, isn't a workable approach either.\n", + "\n", + "### Cross validation\n", + "\n", + "\n", + "- **Do cross validation!**\n", + "- **Do cross validation!**\n", + "- **Do cross validation!**\n", + "\n", + "The usual way to do cross validation is to split train.csv into two parts: one to train the model, the other to measure how well its predictions perform. With k-fold cross validation (`cv=5` below), this is repeated k times, each time holding out a different fold.\n", + "\n", + "scikit-learn's `model_selection` module (which replaced the old `cross_validation` module) does this work for us on a small dataset like ours.\n", + "\n", + "First, a quick look at the cross-validation scores:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "id": "0399477A313B4B0484FF4E558E6F06A8", + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0.81564246 0.80898876 0.78651685 0.78651685 0.81460674]\n" + ] + } + ], + "source": [ + "# from sklearn import cross_validation\n", + "# See https://blog.csdn.net/cheneyshark/article/details/78640887 : sklearn.cross_validation was deprecated in 0.18 (and removed in 0.20),\n", + "# so import cross_val_score and train_test_split from model_selection instead\n", + "from sklearn.model_selection import cross_val_score, train_test_split\n", + "\n", + "# Quick look at the cross-validation scores\n", + "clf = linear_model.LogisticRegression(solver='liblinear', C=1.0, penalty='l1', tol=1e-6)\n", + "all_data = df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')\n", + "X = all_data.values[:,1:]\n", + "y = all_data.values[:,0]\n", + "# print(cross_validation.cross_val_score(clf, X, y, cv=5))\n", + "print(cross_val_score(clf, X, y, cv=5))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "325C5FFAE140492B829C3750CC0F2C3B", + "mdEditEnable": false + }, + "source": [ + "The result looks like this:  \n", + "[0.81564246 0.80898876 0.78651685 0.78651685 0.81460674]\n", + "\n", + "That averages about 0.80, a bit higher than the Kaggle score, which is unsurprising since the two are not evaluated on the same data.\n", + "\n", + "Wait: since we're doing cross validation anyway, let's first pull out the bad cases (misclassified passengers) and inspect them by eye, to see whether we can spot clues about what information we've ignored that caused these passengers to be misclassified. Then we combine the ideas from the bad cases with those from the coefficient analysis above and try them one by one.\n", + "\n", + "Below, we split the data and take a look at the bad cases in the original dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "FDB2637F9C574ED88B41F198D3C677C8", + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PassengerId Survived Pclass \\\n", + "23 24 1 1 \n", + "25 26 1 3 \n", + "49 50 0 3 \n", + "55 56 1 1 \n", + "65 66 1 3 \n", + "78 79 1 2 \n", + "81 82 1 3 \n", + "118 119 0 1 \n", + "139 140 0 1 \n", + "165 166 1 3 \n", + "\n", + " Name Sex Age SibSp \\\n", + "23 Sloper, Mr. William Thompson male 28.00 0 \n", + "25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... female 38.00 1 \n", + "49 Arnold-Franchi, Mrs. Josef (Josefine Franchi) female 18.00 1 \n", + "55 Woolner, Mr. Hugh male NaN 0 \n", + "65 Moubarek, Master. Gerios male NaN 1 \n", + "78 Caldwell, Master. Alden Gates male 0.83 0 \n", + "81 Sheerlinck, Mr. Jan Baptist male 29.00 0 \n", + "118 Baxter, Mr. Quigg Edmond male 24.00 0 \n", + "139 Giglio, Mr. Victor male 24.00 0 \n", + "165 Goldsmith, Master. Frank John William \"Frankie\" male 9.00 0 \n", + "\n", + " Parch Ticket Fare Cabin Embarked \n", + "23 0 113788 35.5000 A6 S \n", + "25 5 347077 31.3875 NaN S \n", + "49 0 349237 17.8000 NaN S \n", + "55 0 19947 35.5000 C52 S \n", + "65 1 2661 15.2458 NaN C \n", + "78 2 248738 29.0000 NaN S \n", + "81 0 345779 9.5000 NaN S \n", + "118 1 PC 17558 247.5208 B58 B60 C \n", + "139 0 PC 17593 79.2000 B86 C \n", + "165 2 363291 20.5250 NaN S " + ], + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Split the data: training : CV = 7 : 3\n", + "# split_train, split_cv = cross_validation.train_test_split(df, test_size=0.3, random_state=0)\n", + "split_train, split_cv = train_test_split(df, test_size=0.3, random_state=42)\n", + "\n", + "train_df = split_train.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')\n", + "# Build the model\n", + "clf = linear_model.LogisticRegression(solver='liblinear', C=1.0, penalty='l1', tol=1e-6)\n", + "clf.fit(train_df.values[:,1:], train_df.values[:,0])\n", + "\n", + "# Predict on the cross-validation split\n", + "\n", + "cv_df = 
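split_cv.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass_.*')\n", + "predictions = clf.predict(cv_df.values[:,1:])\n", + "\n", + "origin_data_train = pd.read_csv(\"train.csv\")\n", + "bad_cases = origin_data_train.loc[origin_data_train['PassengerId'].isin(split_cv[predictions != cv_df.values[:,0]]['PassengerId'].values)]\n", + "bad_cases.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FCBECB2990084A9587F17BB87BE509ED", + "mdEditEnable": false + }, + "source": [ + "Try running it yourself, then look carefully at the bad cases. You will come up with some guesses and ideas; some of them may confirm the hunches from the coefficient analysis, and those optimization ideas deserve a higher priority.\n", + "\n", + "We now have two data splits, `train_df` and `cv_df`: the former trains the model, the latter is used to evaluate and select models. Time to start experimenting in earnest.\n", + "\n", + "Here are some optimizations we could try, in no particular order:\n", + "\n", + "- Instead of the current regression-based imputation, fill in missing Age values with the mean age for each title in the name (Mr, Mrs, Miss, and so on).\n", + "- Rather than keeping Age as a continuous attribute, discretize it with a fixed step size into a categorical feature.\n", + "- Refine Cabin further: for records with a Cabin value, split it into the leading letter (probably location or deck information) and the trailing number (probably the room number; interestingly, if you look closely at the raw data, larger numbers seem to come with a higher chance of survival).\n", + "- Pclass and Sex are so important that it's worth trying a combined feature built from the two; this is another form of refinement.\n", + "- Add a Child field: 1 if Age <= 12, otherwise 0 (look at the data: children really did get high priority).\n", + "- If the name contains Mrs and Parch > 1, she is probably a mother and likely had a higher chance of survival, so add a Mother field: 1 in that case, 0 otherwise.\n", + "- Try dropping the port of embarkation (Q and C get zero weight anyway, and the weight on S is suspicious).\n", + "- Add up SibSp (siblings/spouses), Parch and the passenger themself into a Family_size field (large families may affect the outcome).\n", + "- Name is an attribute we haven't touched yet. Some simple processing could help: for example, map men whose names contain certain titles (‘Capt’, ‘Don’, ‘Major’, ‘Sir’) to a single Title, and do the same for women.\n", + "\n", + "If you keep digging you will probably find more to refine. I'll stop the list here; now we can use `train_df` and `cv_df` to test whether these feature engineering tricks actually work.\n", + "\n", + "Experimenting is slow and takes patience. We often find ourselves in an awkward spot: inspiration strikes, we come up with a feature we're convinced will help, and after testing it the result is worse than before. It takes persistence, patience, and continued digging.\n", + "\n", + "My best result came from `Survived~C(Pclass)+C(Title)+C(Sex)+C(Age_bucket)+C(Cabin_num_bucket)+Mother+Fare+Family_Size`. (Sorry: I accidentally closed the page when I submitted it, so I didn't get a screenshot. The image below was taken when I resubmitted these results after getting my best score; it scored 0.79426, which is not my current best, so my ranking didn't change.)\n", + "\n", + "![Result after the feature engineering adjustments](https://www.z4a.net/images/2018/11/28/result_3.jpg)\n", + "\n", + "### 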
Learning curves\n", + "\n", + "One likely problem: as we keep doing feature engineering, we produce more and more features, and a model trained on them fits our training set better and better while possibly losing its ability to generalize, so it performs poorly on the data we actually need to predict. That is overfitting.\n", + "\n", + "Conversely, if the model performs poorly on the data to be predicted, the cause may not be overfitting but underfitting: the model doesn't fit even the training set all that well.\n", + "\n", + "Hmm, how to explain underfitting and overfitting? Think of it this way:\n", + "\n", + "- Overfitting is like that rigid classmate in math class who memorized every problem the teacher went through word for word: give them the same problem again and they get the exact answer in minutes, but math exams always bring new problems, so their grades are mediocre.\n", + "- Underfitting is like, ahem, a weak student at roughly the author's level: they can't even remember the practice problems the teacher covered, so they flunk even the weekly quiz that reuses those problems, never mind the exam.\n", + "\n", + "In machine learning, overfitting and underfitting call for different remedies.\n", + "\n", + "For overfitting, the following strategies usually help:\n", + "\n", + "- Do feature selection: pick a subset of the better features to train on.\n", + "- Provide more data, which makes up for the sampling quirks of the original data (more data reduces variance), so the learned model is more accurate.\n", + "\n", + "For underfitting, we usually need more features or a more complex model to improve accuracy.\n", + "\n", + "The well-known learning curve helps us determine which state our model is in. With the number of training samples on the x-axis and the error rate on the training and cross-validation sets on the y-axis, the two states look like the two figures below: overfitting (high variance) and underfitting (high bias).\n", + "\n", + "![Overfitting](https://www.z4a.net/images/2018/11/28/high_variance.jpg)\n", + "![Underfitting](https://www.z4a.net/images/2018/11/28/10067a39f8c5849405a.jpg)\n", + "\n", + "We can also replace the error rate with accuracy (the score) to get another form of learning curve; that is what sklearn does.\n", + "\n", + "Back to our problem: we use scikit-learn's `learning_curve` to tell which state our model is in. As an example, let's plot the learning curve of the baseline model we built first.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "id": "C76CD67974F947BC8B7F5D32E2134DD3", + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/conda/lib/python3.6/site-packages/sklearn/model_selection/_split.py:2053: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.\n", + " warnings.warn(CV_WARNING, FutureWarning)\n" + ] + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "(0.8065696844854024, 0.018258876711338634)" + ] + }, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "# from sklearn.learning_curve import learning_curve  (old import, replaced to fix a DeprecationWarning)\n", + "from sklearn.model_selection import learning_curve\n", + "\n", + "# Get training and cross-validation scores from sklearn's learning_curve, then plot the learning curve with matplotlib\n", + "def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None, n_jobs=1,\n", + "                        train_sizes=np.linspace(.05, 1., 20), verbose=0, plot=True):\n", + "    \"\"\"\n", + "    Plot the learning curve of an estimator on the given data.\n", + "    Parameters\n", + "    ----------\n", + "    estimator : the classifier to evaluate.\n", + "    title : title of the plot.\n", + "    X : input features, as a numpy array\n", + "    y : target vector\n", + "    ylim : (ymin, ymax) tuple setting the lower and upper limits of the y-axis\n", + "    cv : number of cross-validation folds; each fold serves once as the CV set while the other n-1 folds are used for training\n", + "         (None uses scikit-learn's default: 5 folds since 0.22, 3 before)\n", + "    n_jobs : number of parallel jobs (default 1)\n", + "    Returns (midpoint, diff) between the top of the training-score band and the bottom of the CV-score band at the largest training size.\n", + "    \"\"\"\n", + "    train_sizes, train_scores, test_scores = learning_curve(\n", + "        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes, verbose=verbose)\n", + "\n", + "    train_scores_mean = np.mean(train_scores, axis=1)\n", + "    train_scores_std = np.std(train_scores, axis=1)\n", + "    test_scores_mean = np.mean(test_scores, axis=1)\n", + "    test_scores_std = np.std(test_scores, axis=1)\n", + "\n", + "    if plot:\n", + "        plt.figure()\n", + "        plt.title(title)\n", + "        if ylim is not None:\n", + "            plt.ylim(*ylim)\n", + "        plt.xlabel(\"Number of training samples\")\n", + "        plt.ylabel(\"Score\")\n", + "        plt.grid()\n", + "\n", + "        # Shaded bands: mean score +/- one standard deviation across the CV folds\n", + "        plt.fill_between(train_sizes, train_scores_mean - train_scores_std, train_scores_mean + train_scores_std,\n", + "                         alpha=0.1, color=\"b\")\n", + "        plt.fill_between(train_sizes, test_scores_mean - test_scores_std, test_scores_mean + test_scores_std,\n", + "                         alpha=0.1, color=\"r\")\n", + "        plt.plot(train_sizes, train_scores_mean, 'o-', color=\"b\", label=\"Training score\")\n", + "        plt.plot(train_sizes, test_scores_mean, 'o-', color=\"r\", label=\"Cross-validation score\")\n", + "\n", + "        plt.legend(loc=\"best\")\n", + "        plt.show()\n", + "\n", + "    midpoint = ((train_scores_mean[-1] + train_scores_std[-1]) + (test_scores_mean[-1] - test_scores_std[-1])) / 2\n", + "    diff = (train_scores_mean[-1] + train_scores_std[-1]) - (test_scores_mean[-1] - test_scores_std[-1])\n", + "    return midpoint, diff\n", + "\n", + "plot_learning_curve(clf, \"Learning curve\", X, y, cv=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2B1321FDECB743C78976C4E94B469645", + "mdEditEnable": false + }, + "source": [ + "On real data, the learning curve is not as smooth as theory suggests, but you can roughly see that the training and cross-validation score curves follow the expected trend.\n", + "\n", + "Judging by the current curves, our model is not overfitting (overfitting usually shows up as a high training score with a much lower cross-validation score, i.e. a large gap between the two). So we can do more feature engineering and add newly derived or combined features to the model.\n", + "\n", + "## Model ensembles\n", + "\n", + "\n", + "- **Model ensembles matter!**\n", + "- **Model ensembles matter!**\n", + "- **Model ensembles matter!**\n", + "\n", + "\n", + "Probably the simplest kind of ensemble, for a classification problem, works like this: given several classifiers trained on the same dataset (for example logistic regression, SVM, KNN, random forest, neural networks), let each of them make its own prediction, then take a vote and use the class with the most votes as the final result.\n", + "\n", + "Ensembles can do a good job of mitigating overfitting that arises during training, which helps improve accuracy somewhat.\n", + "\n", + "So far we have only used logistic regression. How can we still use the ensemble idea to improve the result?\n", + "\n", + "Since we have no other models to choose from here, we can vary the data instead.\n", + "\n", + "Rather than training on the full training set, train each model on a random subset of it. Even though every model uses the same learning algorithm, the resulting models differ. And since no model sees the complete data, any overfitting happens on a subset rather than on the whole training set, so combining the models may help the final result.\n", + "\n", + "We implement this idea with scikit-learn's bagging ensemble, which fits each copy of the estimator on a random sample of the training set:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "id": "0B5FC2631C72456186DCEC6929924625" + }, + "outputs": [], + "source": [ + "from sklearn.ensemble import BaggingClassifier\n", + "\n", + "train_df = df.filter(regex='Survived|Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass.*|Mother|Child|Family|Title')\n", + "train_np = train_df.values\n", + "\n", + "# y is the Survived label\n", + "y = train_np[:, 0]\n", + "\n", + "# X holds the feature values\n", + "X = train_np[:, 1:]\n", + "\n", + "# Fit a BaggingClassifier: it combines the members' predicted probabilities into class labels.\n", + "# (A BaggingRegressor would average the 0/1 predictions, and astype(np.int32) below would then truncate e.g. 0.95 to 0.)\n", + "#clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)\n", + "# 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty;\n", + "# 'liblinear' and 'saga' also handle the L1 penalty\n", + "clf = linear_model.LogisticRegression(C=1, penalty='l1', solver='liblinear')\n", + "bagging_clf = BaggingClassifier(clf, n_estimators=20, max_samples=0.8, max_features=1.0, bootstrap=True, bootstrap_features=False, n_jobs=-1)\n", + "bagging_clf.fit(X, y)\n", + "\n", + "test = df_test.filter(regex='Age_.*|SibSp|Parch|Fare_.*|Cabin_.*|Embarked_.*|Sex_.*|Pclass.*|Mother|Child|Family|Title')\n", + "predictions = bagging_clf.predict(test.values)\n", + "result = pd.DataFrame({'PassengerId':data_test['PassengerId'].values, 'Survived':predictions.astype(np.int32)})\n", + "result.to_csv(\"logistic_regression_bagging_predictions.csv\", index=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "id": "7EABF7D54061460CA6CA37873DC2424F" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PassengerId Survived\n", + "0 892 0\n", + "1 893 0\n", + "2 894 0\n", + "3 895 0\n", + "4 896 0\n", + "5 897 0\n", + "6 898 1\n", + "7 899 0\n", + "8 900 1\n", + "9 901 0" + ], + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv(\"logistic_regression_bagging_predictions.csv\").head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4FD550B25C30489B8204F4C788E79F18", + "mdEditEnable": false + }, + "source": [ + "Well, it does help the result.\n", + "\n", + "\n", + "## Summary\n", + "\n", + "For any machine learning problem, don't try to make things perfect from the start. First build a baseline model with an algorithm you know, then analyze it and improve it step by step.\n", + "\n", + "While solving the problem:\n", + "\n", + "- **Understanding the data is crucial!**\n", + "- **Analyzing and handling special points/outliers in the data is crucial!**\n", + "- **Feature engineering is crucial!**\n", + "- **Model ensembles are crucial!**\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}