Web Scraping Part 2
Now, to get the first cell of each row, we will use find instead of findall. Not every row actually contains a <td> cell; when a row has none, find returns None, so we skip those rows.
for x in rows:
    firstCell = x.find("td")
    if firstCell is not None:
        print(firstCell.text_content())
The text_content() method returns the text inside the element, including the text of any child elements, as a single string.
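To see find and text_content() in isolation, here is a minimal sketch using lxml on a made-up HTML snippet (the table string is invented for illustration):

```python
from lxml import html

# Hypothetical HTML standing in for the scraped page
row = html.fromstring("<table><tr><td><b>AAPL</b> Apple Inc.</td></tr></table>")

cell = row.find(".//td")      # first <td> anywhere under the table
print(cell.text_content())    # text of the cell and its children, combined
```

Note that text_content() pulls in the text of the nested <b> element as well, printing "AAPL Apple Inc.".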
Let’s modify what we did to create an array.
stocksArray = []
for x in rows:
    firstCell = x.find("td")
    if firstCell is not None:
        stocksArray.append(firstCell.text_content())
print(stocksArray)
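As a side note, a loop like this is often written as a list comprehension. A self-contained sketch, using a made-up table in place of the scraped page:

```python
from lxml import html

# Made-up table: the header row has a <th> but no <td>, so it gets skipped
doc = html.fromstring(
    "<table>"
    "<tr><th>Symbol</th></tr>"
    "<tr><td>AAPL</td></tr>"
    "<tr><td>MSFT</td></tr>"
    "</table>"
)
rows = doc.findall(".//tr")

# Same filtering and collecting as the loop, in one expression
stocksArray = [r.find("td").text_content() for r in rows if r.find("td") is not None]
print(stocksArray)  # ['AAPL', 'MSFT']
```

Both versions behave the same; the comprehension just avoids the separate append calls.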
We need to save the stocks array so that we can access it later. This will be a little different from before, because we aren't writing a DataFrame to CSV; we'll use Python's built-in csv module instead.
import csv

with open("stocksArray.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(stocksArray)
The with statement opens our CSV file in write mode (passing newline="" is recommended with the csv module to avoid extra blank lines on Windows). The next line creates a writer, and writer.writerow() writes the entire array as a single row of the file.
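As a quick sanity check, you can read the file back with csv.reader. This sketch uses hypothetical ticker values in place of the scraped list:

```python
import csv

# Hypothetical values standing in for the scraped stocksArray
stocksArray = ["AAPL", "MSFT", "GOOG"]

with open("stocksArray.csv", "w", newline="") as f:
    csv.writer(f).writerow(stocksArray)

# Read it back: writerow stored the whole list as one CSV row
with open("stocksArray.csv", newline="") as f:
    rows_back = list(csv.reader(f))
print(rows_back)  # [['AAPL', 'MSFT', 'GOOG']]
```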
Challenge
Get the entire dataframe of the stocks, and save it to csv so that we have the industries, names, etc.
Hint: You can feed an array of arrays to pandas in pd.DataFrame() to create a dataframe.
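In practice, the hint looks roughly like this; the column names and values below are made up for illustration, not the actual scraped data:

```python
import pandas as pd

# Made-up array of arrays: one inner list per stock
data = [
    ["MMM", "3M", "Industrials"],
    ["AAPL", "Apple Inc.", "Information Technology"],
]
df = pd.DataFrame(data, columns=["Symbol", "Name", "Industry"])
df.to_csv("stocks.csv", index=False)  # index=False keeps the row numbers out of the file
```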