Reading and Writing Data
Writing Data
Pandas offers many options for writing data, including to CSV (usually the most popular), to Excel, to JSON, and others. The most basic usage is to call to_csv on a DataFrame with the file path as the argument. If you specify only a file name, the file is written to the current working directory.
import pandas as pd

# Create test data
test_data = pd.DataFrame([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]],
                         index=['A', 'B', 'C'],
                         columns=["Col 1", "Col 2", "Col 3"])

# Write the test data to csv
test_data.to_csv("TestData.csv")
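As a minimal sketch of the path behavior described above, the same frame can be written to an explicit location, here with both to_csv and to_json. The temporary directory and the JSON file name are illustrative assumptions, not files used elsewhere in this lesson.

```python
import os
import tempfile

import pandas as pd

test_data = pd.DataFrame([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]],
                         index=['A', 'B', 'C'],
                         columns=["Col 1", "Col 2", "Col 3"])

# Hypothetical output location: a throwaway temporary directory
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "TestData.csv")
json_path = os.path.join(out_dir, "TestData.json")

# Passing a full path controls where the file lands,
# instead of defaulting to the current working directory
test_data.to_csv(csv_path)
test_data.to_json(json_path)

written_files = sorted(os.listdir(out_dir))
print(written_files)
```

Building the path with os.path.join keeps the example portable across operating systems.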
Reading Data
Pandas provides read_csv and similar functions for reading data. Calling read_csv with a path to a file returns a DataFrame. For example, let's read in the data we just wrote.
df = pd.read_csv("TestData.csv")
print(df)
  Unnamed: 0  Col 1  Col 2  Col 3
0          A      1      2      3
1          B      4      5      6
2          C      7      8      9
When reading, you can specify which column holds the index. As shown above, if we do not, pandas assumes there is no index column and treats every column as data, which is why the saved index shows up as a column named "Unnamed: 0". Instead, let's read the file with the index_col argument.
df = pd.read_csv("TestData.csv", index_col=0)
print(df)
   Col 1  Col 2  Col 3
A      1      2      3
B      4      5      6
C      7      8      9
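One way to confirm that index_col=0 restores the original layout is a quick round trip. This is only a sketch: it writes to an in-memory io.StringIO buffer rather than TestData.csv, which is an assumption made to keep the example self-contained.

```python
import io

import pandas as pd

test_data = pd.DataFrame([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]],
                         index=['A', 'B', 'C'],
                         columns=["Col 1", "Col 2", "Col 3"])

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
test_data.to_csv(buffer)
buffer.seek(0)  # rewind so read_csv starts at the beginning

# Treat the first column as the index when reading back
round_trip = pd.read_csv(buffer, index_col=0)

print(round_trip.index.tolist())
print(round_trip.columns.tolist())
```

The index labels and column names printed here should match those of test_data.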
Toggling Index and Header
Writer functions such as to_csv accept the optional arguments index and header, which toggle whether the index and the column names are written. Let's make a second file without index or header.
test_data.to_csv("TestData2.csv", index=False, header=False)
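To see what these toggles do to the file contents, it can help to look at the raw text. The sketch below relies on to_csv returning the CSV as a string when no path is given, so nothing is written to disk; the variable names are illustrative.

```python
import pandas as pd

test_data = pd.DataFrame([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]],
                         index=['A', 'B', 'C'],
                         columns=["Col 1", "Col 2", "Col 3"])

# With no path argument, to_csv returns the CSV text instead of writing a file
with_both = test_data.to_csv()
without_both = test_data.to_csv(index=False, header=False)

print(with_both)
print(without_both)

# Split into rows for easier inspection
rows_with_both = with_both.splitlines()
rows_without_both = without_both.splitlines()
```

With both toggles off, only the numeric values remain, one row per line.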
Now read in the data that we just wrote and see what happens.
df = pd.read_csv("TestData2.csv")
print(df)
   1  2  3
0  4  5  6
1  7  8  9
By default, pandas assumes that the first row is the header, so the first row of data was consumed as column names. To read the data without a header, pass the argument header=None.
df = pd.read_csv("TestData2.csv", header=None)
print(df)
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
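With header=None the columns are simply numbered. If the labels are known, they can be supplied while reading through the names argument of read_csv. The following is a sketch that goes slightly beyond the lesson: the CSV text is embedded as a string (an assumption, standing in for TestData2.csv) and the labels are reused from the earlier test data.

```python
import io

import pandas as pd

# Headerless CSV text, standing in for the contents of TestData2.csv
csv_text = "1,2,3\n4,5,6\n7,8,9\n"

# header=None keeps the first row as data; names supplies the column labels
df = pd.read_csv(io.StringIO(csv_text),
                 header=None,
                 names=["Col 1", "Col 2", "Col 3"])
print(df)
```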