Pandas Basics
Pandas
Pandas is one of the most widely used data science libraries in Python. It lets you work with datasets to run computations quickly, organize data, and apply functions to it. A major benefit of pandas is speed. Explicit loops in pure Python are typically much slower than equivalent loops in compiled languages such as C++. Pandas narrows that gap: it is built on NumPy, and its core operations run in compiled code (largely C and Cython) rather than in the Python interpreter, so an operation over a whole series is much faster than a Python loop over the same values. In this first lesson we will explore some of the most basic elements of pandas.
Pandas Series
In regular Python, a list holds one-dimensional data. Pandas has the Series data type, which is similar except that it also carries an index (which can be different than the usual 0, 1, 2, etc. positions) and comes with many more built-in functions that we can use.
Start with our basic list.
#In regular python we have a list
l = [1,2,3,4,5,6]
import pandas as pd
#Create a pandas series by passing the list
s = pd.Series(l)
print(s)
A few things to notice. On the left side is the index, which defaults to an integer index starting at 0. On the right are the values. The last line shows the dtype, which tells us what kind of data is held within the series. Pandas detected that the list contained only integers, so it assigned the dtype int64.
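To make the dtype inference concrete, here is a minimal sketch with three small series. The variable names (ints, floats, words) are made up for illustration, and the exact dtype reported for strings depends on the pandas version installed.

```python
import pandas as pd

# Only integers: pandas picks an integer dtype
ints = pd.Series([1, 2, 3])

# A single float is enough to make the whole series floating point
floats = pd.Series([1, 2.5, 3])

# Strings are not numeric; older pandas versions report "object",
# newer ones may report a dedicated string dtype
words = pd.Series(["a", "b", "c"])

print(ints.dtype)
print(floats.dtype)
print(words.dtype)
```

A series has exactly one dtype, so mixing integers and floats promotes every value to float.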
Specifying an Index
We don't have to keep the default index. If we pass the index argument when creating our series, pandas uses those labels instead.
#Create a series with a custom index
s = pd.Series(l, index=["a", "b", "c", "d", "e", "f"])
print(s)
Above we see that the left-hand side is now our index of letters. We can also set an index after the fact by assigning to the index attribute. For example, if we created our series without an index, we can change it afterwards like below:
#Create the series
s = pd.Series(l)
#Set the index
s.index=["a", "b", "c", "d", "e", "f"]
print(s)
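Two things follow from having a labeled index, sketched below under the same setup as above. First, a value can be looked up by its label. Second, a new index must have exactly one label per value; as far as I know pandas raises a ValueError otherwise. The flag mismatch_raised is just a helper name for this example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])
s.index = ["a", "b", "c", "d", "e", "f"]

# Look up a single value by its label
print(s["c"])  # 3

# Assigning an index with the wrong number of labels fails
try:
    s.index = ["a", "b", "c"]
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("Could not set index:", err)
```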
One benefit of pandas is that we can apply math operations to a whole series at once instead of iterating over every element. For example, the code below will square each value.
#Square all the values
s = s ** 2
print(s)
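For comparison, here is a short sketch of the same squaring done with a plain Python list comprehension next to the series version. The names squared_list, squared_series and shifted are illustrative only.

```python
import pandas as pd

l = [1, 2, 3, 4, 5, 6]
s = pd.Series(l)

# Plain Python: visit every element explicitly
squared_list = [x ** 2 for x in l]

# Pandas: one expression applied to every element
squared_series = s ** 2

print(squared_series.tolist() == squared_list)  # True

# Other arithmetic operators work element by element too
shifted = s * 10 + 1
print(shifted.tolist())  # [11, 21, 31, 41, 51, 61]
```

Both approaches give the same numbers; the series version avoids the explicit loop and keeps the index attached to the result.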
If we want just the values in the series, without the index, we can use the values attribute. This returns a NumPy array, which we will cover in more detail in future lessons.
#Get the values of the series
print(s.values)
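A quick way to confirm what comes back is to check its type. The sketch below uses a small stand-in series and also shows to_numpy(), a Series method that returns the same data as a NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9], index=["a", "b", "c"])

arr = s.values
print(isinstance(arr, np.ndarray))  # True

# The array holds only the values; the index labels are left behind
print(arr.tolist())  # [1, 4, 9]

# to_numpy() gives the same data
print(s.to_numpy().tolist())  # [1, 4, 9]
```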
Indexing
Indexing can be done in two different ways. With iloc, you index by position, similar to how you would index a list, so a slice stops before its end position. With loc, you index by the labels in the index instead, and a slice includes everything up to and including the end label. First, let's use iloc to get the first three values.
#Get the first three values
print(s.iloc[:3])
Using loc, we can get the same three values by slicing from the label a to the label c.
print(s.loc['a':'c'])
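The difference in how the two slices treat their end point is easy to miss, so here is a small side-by-side sketch. It rebuilds the squared series with the letter index so it can run on its own; by_position and by_label are illustrative names.

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16, 25, 36], index=["a", "b", "c", "d", "e", "f"])

# iloc slices by position and excludes the end position
by_position = s.iloc[:3]    # positions 0, 1, 2

# loc slices by label and includes the end label
by_label = s.loc["a":"c"]   # labels a, b, c

print(by_position.equals(by_label))  # True

# Same start and "end", different lengths
print(s.iloc[0:2].tolist())      # [1, 4]
print(s.loc["a":"c"].tolist())   # [1, 4, 9]

# Both also accept a single position or label
print(s.iloc[0], s.loc["a"])
```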