Solution
import pandas as pd

# `tree` is the parsed page built in the earlier steps.
table = tree.xpath('//*[@id="mw-content-text"]/div/table[1]')
table = table[0]
rows = table.findall("tr")
rows = rows[1:]  # skip the header row
cellsAr = []
for row in rows:
    cells = row.findall("td")
    # Keep only the text of each cell, dropping any nested markup.
    cells = [cell.text_content() for cell in cells]
    cellsAr.append(cells)
df = pd.DataFrame(cellsAr)
df.columns = ["Ticker", "Security", "SEC Filings", "GICS Sector", "GICS Sub Industry", "Address", "Date Added", "CIK"]
df.to_csv("SP500.csv", encoding="UTF-8")
We drop the first row, which is the header, by slicing the list of rows from index 1: “rows = rows[1:]”. Everything else should be review.
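To see the header-skipping step in isolation, here is a minimal sketch that runs without fetching the page. It makes several assumptions that are not part of the solution above: the tiny table in SAMPLE is made up, the helper name extract_rows is hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml (it offers the same findall call, but has no text_content(), so the cell text is joined with itertext() instead).

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature table standing in for the real page (assumption).
SAMPLE = """
<table>
  <tr><th>Ticker</th><th>Security</th></tr>
  <tr><td>AAA</td><td>Example <a href="#">Corp</a></td></tr>
  <tr><td>BBB</td><td>Sample Inc</td></tr>
</table>
"""


def extract_rows(table_markup):
    """Return the text of every data cell, one list per row, header excluded."""
    table = ET.fromstring(table_markup.strip())
    rows = table.findall("tr")
    rows = rows[1:]  # the first <tr> is the header, so start at index 1
    result = []
    for row in rows:
        cells = row.findall("td")
        # itertext() walks nested tags too, much like lxml's text_content().
        result.append(["".join(cell.itertext()).strip() for cell in cells])
    return result


if __name__ == "__main__":
    for record in extract_rows(SAMPLE):
        print(record)
```

Each inner list lines up with one table row, which is the shape pd.DataFrame expects in the solution above.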
Source Code