|
45 | 45 | "#### Objective\n", |
46 | 46 | "In this challenge, we will complete the analysis of what sorts of people were likely to survive. \n", |
47 | 47 | "\n", |
48 | | - "In addition, we will build a regression model to predict ticket price(Fare).\n", |
| 48 | + "In addition, we will build a regression model to predict ticket price (Fare).\n", |
49 | 49 | "\n" |
50 | 50 | ] |
51 | 51 | }, |
|
754 | 754 | "source": [ |
755 | 755 | "#How many survived\n", |
756 | 756 | "f,ax=plt.subplots(figsize=(5,5))\n", |
757 | | - "sns.countplot('Survived',data=df_titanic, ax = ax)\n", |
| 757 | + "sns.countplot(x='Survived',data=df_titanic, ax = ax)\n", |
758 | 758 | "ax.set_title('Perished vs. Survived')\n", |
759 | 759 | "#Not strictly necessary; plt.show() just suppresses the text output of the last expression\n", |
760 | 760 | "plt.show()" |
|
857 | 857 | "source": [ |
858 | 858 | "#Male vs. Female\n", |
859 | 859 | "f,ax=plt.subplots(figsize=(5,5))\n", |
860 | | - "sns.countplot('Sex',data=df_titanic,ax=ax)\n", |
| 860 | + "sns.countplot(x='Sex',data=df_titanic,ax=ax)\n", |
861 | 861 | "ax.set_title('Male vs. Female')\n", |
862 | 862 | "plt.show()" |
863 | 863 | ] |
|
961 | 961 | "source": [ |
962 | 962 | "#Perished vs. Survived for male/female\n", |
963 | 963 | "f,ax=plt.subplots(figsize=(5,5))\n", |
964 | | - "sns.countplot('Sex',hue='Survived',data=df_titanic,ax=ax)\n", |
| 964 | + "sns.countplot(x='Sex',hue='Survived',data=df_titanic,ax=ax)\n", |
965 | 965 | "ax.set_title('Gender: Perished vs. Survived')\n", |
966 | 966 | "plt.show()" |
967 | 967 | ] |
|
1077 | 1077 | "source": [ |
1078 | 1078 | "#bar plot and seaborn countplot\n", |
1079 | 1079 | "f,ax=plt.subplots(figsize=(5,5))\n", |
1080 | | - "sns.countplot('Pclass',hue='Survived',data=df_titanic,ax=ax)\n", |
| 1080 | + "sns.countplot(x='Pclass',hue='Survived',data=df_titanic,ax=ax)\n", |
1081 | 1081 | "ax.set_title('Pclass:Perished vs. Survived')\n", |
1082 | 1082 | "plt.show()" |
1083 | 1083 | ] |
|
1727 | 1727 | "### Feature Engineering\n", |
1728 | 1728 | "We'll create a new column, FamilySize. Two columns relate to family size: Parch is the number of parents or children aboard, and SibSp is the number of siblings or spouses aboard.\n", |
1729 | 1729 | "\n", |
1730 | | - "Take one name 'Asplund' as example, we can see that total family size is 7 (Parch + SibSp + 1), and each family member has same Fare, which means the Fare is for the whole group. So family size will be an important feature to predict Fare. There're only 4 Asplunds out of 7 in the dataset becasue the dataset is only a subset of all passengers." |
| 1730 | + "Take the surname 'Asplund' as an example: the total family size is 7 (Parch + SibSp + 1), and each family member has the same Fare value, which means the Fare covers the whole group. Family size should therefore be an important feature for predicting Fare. Only 4 of the 7 Asplunds appear in the dataset because it is only a subset of all passengers." |
1731 | 1731 | ] |
1732 | 1732 | }, |
1733 | 1733 | { |
|
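The FamilySize definition in the hunk above (Parch + SibSp + 1) can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual cell: the `toy` frame and its rows are hypothetical stand-ins for `df_titanic`, and only the column names come from the notebook text.

```python
import pandas as pd

# Hypothetical rows; the notebook itself works on df_titanic.
toy = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "SibSp": [4, 0, 1],   # siblings / spouses aboard
    "Parch": [2, 0, 0],   # parents / children aboard
})

# Family size = parents/children + siblings/spouses + the passenger.
toy["FamilySize"] = toy["Parch"] + toy["SibSp"] + 1

print(toy[["Name", "SibSp", "Parch", "FamilySize"]])
```

The `+ 1` counts the passenger, so someone travelling alone gets a FamilySize of 1 rather than 0.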
2054 | 2054 | "\n", |
2055 | 2055 | "## Step 4: Modeling\n", |
2056 | 2056 | "\n", |
2057 | | - "Now we have a relatively clean dataset (except for the **Cabin** column which has many missing values). We can do a classification on Survived to predict whether a passenger could survive the disaster or a regression on Fare to predict ticket fare. This dataset is not a good dataset for regression. But since we don't talk about classification in this workshop we will construct a linear regression on Fare in this exercise." |
| 2057 | + "Now we have a relatively clean dataset (except for the **Cabin** column, which has many missing values). We can either run a classification on Survived to predict whether a passenger survived the disaster, or a regression on Fare to predict the ticket fare. This dataset is not well suited to regression, but since this workshop does not cover classification, we will fit a linear regression on Fare in this exercise." |
| 2058 | + ] |
| 2059 | + }, |
| 2060 | + { |
| 2061 | + "cell_type": "markdown", |
| 2062 | + "metadata": {}, |
| 2063 | + "source": [ |
| 2064 | + "##### Task16: Construct a regression on Fare\n", |
| 2065 | + "Construct a regression model with statsmodels.\n", |
| 2066 | + "\n", |
| 2067 | + "Use Pclass, Embarked, and FamilySize as the independent variables." |
2058 | 2068 | ] |
2059 | 2069 | }, |
2060 | 2070 | { |
|
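One possible shape for Task16 is sketched below using the statsmodels formula interface. This is an assumption about how the task might be approached, not the notebook's reference solution: the `toy` frame is synthetic (random values, not Titanic data), and treating Pclass and Embarked as categorical via `C(...)` is one modeling choice among several.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for df_titanic; every value here is made up.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),             # classes 1, 2, 3
    "Embarked": rng.choice(["C", "Q", "S"], size=n),  # port codes
    "FamilySize": rng.integers(1, 8, size=n),
})
# Synthetic Fare: decreases with class number, grows with group size, plus noise.
toy["Fare"] = (
    90 - 25 * toy["Pclass"] + 8 * toy["FamilySize"] + rng.normal(0, 5, size=n)
)

# C(...) asks the formula interface to dummy-encode a categorical column.
model = smf.ols("Fare ~ C(Pclass) + C(Embarked) + FamilySize", data=toy)
result = model.fit()

print(result.summary())
```

On the real data the same formula would be passed `data=df_titanic`. The summary table reports one coefficient per non-baseline level of Pclass and Embarked, plus a single slope for FamilySize.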