Commit a9f5926

Merge pull request #97 from chennesy/jt14den-patch-2
removing date from data-visualisation.md
2 parents d209bec + 9629030 commit a9f5926

1 file changed

Lines changed: 0 additions & 80 deletions

episodes/data-visualisation.md

@@ -33,86 +33,6 @@ Let’s look at the data:
df_long.head()
```

| | branch | address | city | zip code | ytd | year | month | circulation |
|-----|----------------|-------------------------|---------|----------|--------|------|---------|-------------|
| 0 | Albany Park | 5150 N. Kimball Ave. | Chicago | 60625.0 | 120059 | 2011 | january | 8427 |
| 1 | Altgeld | 13281 S. Corliss Ave. | Chicago | 60827.0 | 9611 | 2011 | january | 1258 |
| 2 | Archer Heights | 5055 S. Archer Ave. | Chicago | 60632.0 | 101951 | 2011 | january | 8104 |
| 3 | Austin | 5615 W. Race Ave. | Chicago | 60644.0 | 25527 | 2011 | january | 1755 |
| 4 | Austin-Irving | 6100 W. Irving Park Rd. | Chicago | 60634.0 | 165634 | 2011 | january | 12593 |

## Convert year and month to datetime

To plot this data over time, we need to prepare it in two steps. First, we combine the year and month columns into a single [datetime](https://docs.python.org/3/library/datetime.html) column using the Pandas `to_datetime` function. Second, we set that date column as the index of the DataFrame.

``` python
# Join year and month into strings such as '2011-january', then parse them as dates
df_long['date'] = pd.to_datetime(df_long['year'].astype(str) + '-' + df_long['month'], format='%Y-%B')
```

Let's unpack that code:

- `df_long['date']` - First, we create a new `date` column.
- `pd.to_datetime()` - Next, we pass the combined year and month strings to `pd.to_datetime()`, which converts them to datetime values.
- `df_long['year'].astype(str)` - We use the `.astype(str)` method to convert the year column to strings.
- `+ '-' + df_long['month']` - We concatenate a `-` to the year string as a separator, followed by the month column.
- `format='%Y-%B'` - We pass the `format` parameter to tell Pandas to expect a 4-digit year (`%Y`), followed by a dash, followed by the month's full name (`%B`).

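The steps in the list above can be tried on a small stand-in table. This is a minimal sketch: the `toy` DataFrame and its values are hypothetical and only mimic the `year` and `month` columns of `df_long`.

```python
import pandas as pd

# Hypothetical toy data shaped like the year and month columns of df_long
toy = pd.DataFrame({
    "year": ["2011", "2011", "2012"],
    "month": ["january", "february", "december"],
})

# Step 1: build one string per row, for example "2011-january"
date_strings = toy["year"].astype(str) + "-" + toy["month"]

# Step 2: parse the strings; %Y is a 4-digit year, %B a full month name
toy["date"] = pd.to_datetime(date_strings, format="%Y-%B")

print(date_strings.tolist())
print(toy["date"].dt.month.tolist())
```

Building the strings in a separate variable first makes it easy to inspect exactly what `to_datetime` receives before it is parsed.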
If we take a look at the date column, we'll see that Pandas automatically sets the day to `01`, because our input did not specify a day.

```python
df_long['date']
```
```output
0       2011-01-01
1       2011-01-01
2       2011-01-01
3       2011-01-01
4       2011-01-01
           ...
11551   2022-12-01
11552   2022-12-01
11553   2022-12-01
11554   2022-12-01
11555   2022-12-01
Name: date, Length: 11556, dtype: datetime64[ns]
```

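The default day can also be checked on a single value. The sketch below assumes nothing beyond `pd.to_datetime` itself; the input strings are made up for illustration.

```python
import pandas as pd

# No day in the input: the parsed timestamp falls on the first of the month
ts = pd.to_datetime("2011-january", format="%Y-%B")
print(ts.day)

# Supplying the day explicitly (with %d in the format) gives the same timestamp
ts_explicit = pd.to_datetime("2011-january-01", format="%Y-%B-%d")
print(ts == ts_explicit)
```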
``` python
df_long.info()
```

``` output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11556 entries, 0 to 11555
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   branch       11556 non-null  object
 1   address      7716 non-null   object
 2   city         7716 non-null   object
 3   zip code     7716 non-null   float64
 4   ytd          11556 non-null  int64
 5   year         11556 non-null  object
 6   month        11556 non-null  object
 7   circulation  11556 non-null  int64
 8   date         11556 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2), object(5)
memory usage: 812.7+ KB
```

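Besides reading the `Dtype` column of `.info()`, the column type can be checked in code. This is a minimal sketch on a hypothetical `toy` DataFrame, using the `is_datetime64_any_dtype` helper from `pandas.api.types`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical toy data; the real lesson uses df_long
toy = pd.DataFrame({"year": [2011, 2012], "month": ["january", "december"]})
toy["date"] = pd.to_datetime(
    toy["year"].astype(str) + "-" + toy["month"], format="%Y-%B"
)

# True for the parsed column, False for the original month strings
date_is_datetime = is_datetime64_any_dtype(toy["date"])
month_is_datetime = is_datetime64_any_dtype(toy["month"])
print(date_is_datetime, month_is_datetime)
```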
That worked! The `date` column now has the `datetime64[ns]` dtype, so we can make it the index of our DataFrame. In the Pandas episode we looked at the default numerical index, but we can also use `.set_index()` to make a specific column the index. A datetime index will make it easier to plot the DataFrame over time. The first parameter of `.set_index()` is the column name, and `inplace=True` modifies the DataFrame directly instead of returning a new one that we would have to assign to a variable.

``` python
df_long.set_index('date', inplace=True)
```

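For comparison, here is a sketch of the same step without `inplace=True`, where `.set_index()` returns a new DataFrame. The `toy` data and its circulation values are made up for illustration. The last line shows one convenience of a datetime index: selecting rows with a partial date string.

```python
import pandas as pd

# Hypothetical toy data with an already-parsed date column
toy = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-01", "2011-02-01", "2012-01-01"]),
    "circulation": [100, 200, 300],  # made-up values
})

# Without inplace=True, set_index returns a new DataFrame
indexed = toy.set_index("date")

# toy keeps its default numerical index; indexed has a datetime index
print(type(toy.index).__name__)
print(type(indexed.index).__name__)

# With a datetime index, a year string selects all rows in that year
rows_2011 = indexed.loc["2011"]
print(len(rows_2011))
```

Assigning the result to a new name keeps the original DataFrame available, which can be handy while experimenting.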
If we look at the data again, we will see that the index is now `date`.

``` python
df_long.head()
```

| | branch | address | city | zip code | ytd | year | month | circulation |
|------------|----------------|-------------------------|---------|----------|--------|------|---------|-------------|
| date | | | | | | | | |
