Solution

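import pandas as pd

# `tree` is assumed to be the lxml HTML tree of the S&P 500 list page, built in an earlier step
# (e.g. with lxml.html.fromstring). Select the first table in the page's main content area.
table = tree.xpath('//*[@id="mw-content-text"]/div/table[1]')[0]

# ".//tr" also finds rows wrapped in a <tbody> element, which a plain "tr" would miss.
rows = table.findall(".//tr")
rows = rows[1:]  # skip the header row

cellsAr = []
for row in rows:
    cells = row.findall("td")
    # text_content() returns all text inside the cell; strip() removes stray newlines
    cellsAr.append([cell.text_content().strip() for cell in cells])

df = pd.DataFrame(cellsAr)
df.columns = ["Ticker", "Security", "SEC Filings", "GICS Sector", "GICS Sub Industry", "Address", "Date Added", "CIK"]
df.to_csv("SP500.csv", encoding="UTF-8")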

We drop the first row, which is the table header, by slicing the list from index 1: rows = rows[1:]. Everything else should be a review.
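To illustrate the header-skipping step, here is a small sketch that uses only the standard library's xml.etree.ElementTree on a hypothetical, well-formed miniature table (not the real page, and not the lxml HTML parser used above). It compares slicing off the first row with an alternative, hypothetical helper that keeps only rows containing <td> cells, since header rows use <th>.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature table: one header row (<th>) and two data rows (<td>).
SAMPLE = """
<table>
  <tr><th>Ticker</th><th>Security</th></tr>
  <tr><td>AAA</td><td>Alpha Corp</td></tr>
  <tr><td>BBB</td><td>Beta Inc</td></tr>
</table>
"""


def cell_text(cell):
    """Join all text inside a cell and trim surrounding whitespace."""
    return "".join(cell.itertext()).strip()


def rows_by_slicing(table):
    """Drop the first <tr>, assuming it is the header (the approach in the text)."""
    rows = table.findall("tr")[1:]
    return [[cell_text(td) for td in row.findall("td")] for row in rows]


def rows_by_cell_type(table):
    """Keep only rows that contain <td> cells, so <th>-only header rows are skipped."""
    out = []
    for row in table.findall("tr"):
        cells = row.findall("td")
        if cells:
            out.append([cell_text(td) for td in cells])
    return out


table = ET.fromstring(SAMPLE.strip())
print(rows_by_slicing(table))    # [['AAA', 'Alpha Corp'], ['BBB', 'Beta Inc']]
print(rows_by_cell_type(table))  # [['AAA', 'Alpha Corp'], ['BBB', 'Beta Inc']]
```

Both give the same rows here. The cell-type filter does not depend on the header being exactly one row at the top, so it would not drop a data row if a table had no header.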

Source Code

The full notebook, Clustering Industries 2.ipynb, is hosted on GitHub.
