Solution
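import pandas as pd
import requests
from lxml import html

# Assumed setup from the previous lecture: download the S&P 500 list page and parse it into `tree`
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
tree = html.fromstring(requests.get(url, headers={"User-Agent": "sp500-tutorial"}).content)

# First table inside the article body
table = tree.xpath('//*[@id="mw-content-text"]/div/table[1]')
table = table[0]
rows = table.xpath(".//tr")  # ".//tr" also finds rows wrapped in a <tbody> element
rows = rows[1:]  # drop the header row
cellsAr = []
for row in rows:
    cells = row.findall("td")
    cells = [cell.text_content().strip() for cell in cells]  # strip stray newlines around each value
    cellsAr.append(cells)
df = pd.DataFrame(cellsAr)
# These names match the table layout used in this course; the live page's columns may differ
df.columns = ["Ticker", "Security", "SEC Filings", "GICS Sector", "GICS Sub Industry", "Address", "Date Added", "CIK"]
df.to_csv("SP500.csv", encoding="UTF-8")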
We drop the first row because it holds the table header: slicing the list from index 1, “rows = rows[1:]”, keeps only the data rows. Everything else should be review from the previous lectures.
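To check the header-skipping logic without downloading the live page, here is a minimal offline sketch. The HTML snippet, its ticker values, and the helper name `table_to_frame` are hypothetical stand-ins, not real S&P 500 data; the snippet simply mimics the `mw-content-text` structure the XPath above expects, with the rows wrapped in a `<tbody>`.

```python
import pandas as pd
from lxml import html

# Hypothetical sample that mimics the page structure; not real S&P 500 rows
SAMPLE = """
<div id="mw-content-text"><div>
<table>
  <tbody>
    <tr><th>Ticker</th><th>Security</th></tr>
    <tr><td>AAA</td><td>Example Corp A</td></tr>
    <tr><td>BBB</td><td>Example Corp B</td></tr>
  </tbody>
</table>
</div></div>
"""


def table_to_frame(page_html, columns):
    """Parse the first table under #mw-content-text into a DataFrame, skipping the header row."""
    tree = html.fromstring(page_html)
    table = tree.xpath('//*[@id="mw-content-text"]/div/table[1]')[0]
    rows = table.xpath(".//tr")[1:]  # row 0 holds the <th> headers, so slice it off
    data = [[cell.text_content().strip() for cell in row.findall("td")] for row in rows]
    return pd.DataFrame(data, columns=columns)


df = table_to_frame(SAMPLE, ["Ticker", "Security"])
print(df)
```

Because the header row uses `<th>` cells rather than `<td>`, keeping it would add an empty row to the DataFrame, which is why the slice matters.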