Web Scraping Part 2
Now, to get the first cell of each row, we will use find instead of findall. Not every row actually contains a <td> cell; when a row has none, find returns None, so we skip those rows.
for x in rows:
    firstCell = x.find("td")
    if firstCell is not None:
        print(firstCell.text_content())
The text_content() method returns the text inside the element, including the text of any child elements, as a single string.
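To see find and text_content() in isolation, here is a minimal sketch using lxml on a made-up HTML snippet (the table string is invented for illustration):

```python
from lxml import html

# Hypothetical HTML standing in for the scraped page
row = html.fromstring("<table><tr><td><b>AAPL</b> Apple Inc.</td></tr></table>")

cell = row.find(".//td")      # first <td> anywhere under the table
print(cell.text_content())    # text of the cell and its children, combined
```

Note that text_content() pulls in the text of the nested <b> element as well, printing "AAPL Apple Inc.".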
Let’s modify what we did to create an array.
stocksArray = []
for x in rows:
    firstCell = x.find("td")
    if firstCell is not None:
        stocksArray.append(firstCell.text_content())
print(stocksArray)
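As a side note, a loop like this is often written as a list comprehension. A self-contained sketch, using a made-up table in place of the scraped page:

```python
from lxml import html

# Made-up table: the header row has a <th> but no <td>, so it gets skipped
doc = html.fromstring(
    "<table>"
    "<tr><th>Symbol</th></tr>"
    "<tr><td>AAPL</td></tr>"
    "<tr><td>MSFT</td></tr>"
    "</table>"
)
rows = doc.findall(".//tr")

# Same filtering and collecting as the loop, in one expression
stocksArray = [r.find("td").text_content() for r in rows if r.find("td") is not None]
print(stocksArray)  # ['AAPL', 'MSFT']
```

Both versions behave the same; the comprehension just avoids the separate append calls.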
We need to save the stocks array so that we can access it later. This will be a little different from before, because we aren't writing a DataFrame to CSV; we'll use Python's built-in csv module instead.
import csv

with open("stocksArray.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(stocksArray)
The with statement opens our CSV file in write mode (passing newline="" is recommended with the csv module to avoid extra blank lines on Windows). The next line creates a writer, and writer.writerow() writes the entire array as a single row of the file.
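As a quick sanity check, you can read the file back with csv.reader. This sketch uses hypothetical ticker values in place of the scraped list:

```python
import csv

# Hypothetical values standing in for the scraped stocksArray
stocksArray = ["AAPL", "MSFT", "GOOG"]

with open("stocksArray.csv", "w", newline="") as f:
    csv.writer(f).writerow(stocksArray)

# Read it back: writerow stored the whole list as one CSV row
with open("stocksArray.csv", newline="") as f:
    rows_back = list(csv.reader(f))
print(rows_back)  # [['AAPL', 'MSFT', 'GOOG']]
```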
Challenge
Get the entire dataframe of the stocks, and save it to csv so that we have the industries, names, etc.
Hint: You can feed an array of arrays to pandas in pd.DataFrame() to create a dataframe.
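In practice, the hint looks roughly like this; the column names and values below are made up for illustration, not the actual scraped data:

```python
import pandas as pd

# Made-up array of arrays: one inner list per stock
data = [
    ["MMM", "3M", "Industrials"],
    ["AAPL", "Apple Inc.", "Information Technology"],
]
df = pd.DataFrame(data, columns=["Symbol", "Name", "Industry"])
df.to_csv("stocks.csv", index=False)  # index=False keeps the row numbers out of the file
```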