Web Scraping Part 2
To get the first cell of each row, we will use find instead of findall. Not every row contains a td cell (a header row, for example); for those rows find returns None, so we skip them.
for x in rows:
    firstCell = x.find("td")  # first td in the row, or None if there is none
    if firstCell is not None:
        print(firstCell.text_content())
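The text_content() method returns the text inside the element, including the text of any child elements, as a single string.
Let’s modify the loop to collect the values in a list (stocksArray) instead of printing them.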
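The difference between find and findall can be seen on a tiny example. This is a minimal sketch, not the lesson's scraper: it uses the standard library's xml.etree.ElementTree, whose find and findall behave the same way for this purpose, and the two rows are made-up stand-ins for the scraped table. Since text_content() is specific to lxml.html, the sketch reads the plain .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for one header row and one data row of the table.
header_row = ET.fromstring("<tr><th>Symbol</th><th>Security</th></tr>")
data_row = ET.fromstring("<tr><td>AAA</td><td>Example Corp</td></tr>")

# findall returns a list of every matching child (empty if there are none).
all_cells = data_row.findall("td")
print(len(all_cells))

# find returns only the first match...
first_cell = data_row.find("td")
print(first_cell.text)

# ...or None when the row has no td at all, which is why the check is needed.
missing = header_row.find("td")
print(missing is None)
```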
stocksArray = []
for x in rows:
    firstCell = x.find("td")
    if firstCell is not None:
        stocksArray.append(firstCell.text_content())
print(stocksArray)
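Scraped cell text often carries stray whitespace such as a trailing newline, so it may be worth stripping each value as it is collected. The sketch below shows one way to do that with a list comprehension. It assumes a made-up table parsed with the standard library's xml.etree.ElementTree, and cell_text is a hypothetical helper standing in for lxml's text_content().

```python
import xml.etree.ElementTree as ET

# Hypothetical table: a header row (th cells only) followed by two data rows.
table = ET.fromstring(
    "<table>"
    "<tr><th>Symbol</th><th>Security</th></tr>"
    "<tr><td><a>AAA</a>\n</td><td>Example Corp</td></tr>"
    "<tr><td><a>BBB</a>\n</td><td>Sample Inc</td></tr>"
    "</table>"
)
rows = table.findall("tr")


def cell_text(cell):
    # Stand-in for text_content(): join the text of the cell and its children.
    return "".join(cell.itertext())


# Take the first td of every row, skip rows without one, and strip whitespace.
first_cells = (row.find("td") for row in rows)
stocks = [cell_text(cell).strip() for cell in first_cells if cell is not None]
print(stocks)
```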
We need to save the stocks list so that we can access it later. This works a little differently from before because we aren’t writing a dataframe to CSV.
import csv
with open("stocksArray.csv", "w", newline="") as f:  # newline="" avoids blank lines between rows on Windows
    writer = csv.writer(f)
    writer.writerow(stocksArray)
The second line opens our CSV file in write mode so that we can write to it. The third line creates a writer, and the fourth line writes a single row containing the list we just created.
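To access the saved list later, the file can be read back with csv.reader. The following is a minimal sketch of that round trip; the tickers are placeholders, and it writes to a temporary directory (an assumption made here so the example does not touch your working folder).

```python
import csv
import os
import tempfile

stocks = ["AAA", "BBB", "CCC"]  # placeholder tickers, not real scraped data

# Write the list as a single CSV row, as in the lesson.
path = os.path.join(tempfile.mkdtemp(), "stocksArray.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(stocks)

# Read it back: the file has one row, so take the first row from the reader.
with open(path, newline="") as f:
    loaded = next(csv.reader(f))

print(loaded)
```

Because everything was written as one row, the whole list comes back as the first (and only) row of the reader.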
Challenge
Scrape the entire table of stocks into a dataframe and save it to CSV so that we also have the industries, names, etc.
Hint: You can feed an array of arrays to pandas in pd.DataFrame() to create a dataframe.
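As a small illustration of this hint, here is how a list of lists becomes a dataframe. The rows and column names below are invented placeholders, not the scraped data, so the scraping part of the challenge is still left to you.

```python
import pandas as pd

# Each inner list is one row; the values are made-up placeholders.
rows = [
    ["AAA", "Example Corp", "Industrials"],
    ["BBB", "Sample Inc", "Health Care"],
]

# Column names are assumptions chosen for this illustration.
df = pd.DataFrame(rows, columns=["Symbol", "Name", "Industry"])

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a filename to to_csv instead would write the same text to disk.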