Description
Like many traders, I started by scraping Wikipedia for S&P 500 tickers. It worked… until it didn’t. In this post, I’ll share a cleaner, more professional, and repeatable approach: using iShares ETF data and Python to build a reliable, extensible list of stocks to start your initial analysis.
Intro
As quantitative traders and data-driven developers, we care about reliability and repeatability. Scraping Wikipedia pages is fragile: the page format can change, symbols can be inconsistent, and HTML parsing introduces an unnecessary point of failure. For something as fundamental as the S&P 500, there’s a cleaner, more reliable way: iShares, the issuer of the iShares Core S&P 500 ETF (IVV), one of the most popular ETFs tracking the index.
The great thing about iShares is that they provide official ETF holdings in a structured, downloadable CSV file — updated regularly, consistently, and perfect for Python.
Get the Data from iShares
The holdings of the iShares Core S&P 500 ETF (IVV) are listed on the following link:
👉 https://www.ishares.com/us/products/239726/ishares-core-sp-500-etf
From that page, you can download the CSV file manually, or you can automate the download with Python, which is the approach we prefer.
```python
import os

import pandas as pd

ETF_URLS = {
    "SP500": "https://www.ishares.com/us/products/239726/ishares-core-sp-500-etf/1467271812596.ajax?fileType=csv&fileName=IVV_holdings&dataType=fund"
}

def fetch_etf_holdings(index_name, url):
    try:
        df = pd.read_csv(url, skiprows=9)  # Skip the fund metadata rows above the header
        df = df[["Ticker", "Name"]]
        df = df.dropna()
        return df
    except Exception as e:
        print(f"Failed to fetch {index_name}: {e}")
        return pd.DataFrame()

if __name__ == "__main__":
    outputfile_path = "<YOUR_PATH>"  # Directory where the CSV files will be saved
    for index_name, url in ETF_URLS.items():
        df = fetch_etf_holdings(index_name, url)
        print(f"✅ Total {index_name} tickers: {len(df)}")
        output_filename = f"{index_name}.csv"
        output_path = os.path.join(outputfile_path, output_filename)
        df.to_csv(output_path, index=False, header=True)
        print(f"💾 File saved: {output_path}")
```
The code saves the list as a CSV file in the directory of your choice. The list is loaded into a pandas DataFrame, so you’ll have a clean dataset that includes tickers and company names.
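One detail worth handling before using these tickers with other data sources: iShares writes share-class tickers with a dot (e.g. BRK.B), while Yahoo Finance expects a dash (BRK-B). A minimal normalization sketch (`to_yahoo_symbol` is a hypothetical helper, not part of the script above, and the sample rows below just mirror the saved CSV's columns):

```python
import pandas as pd

def to_yahoo_symbol(ticker: str) -> str:
    """Convert an iShares-style ticker to Yahoo Finance's convention.

    iShares writes share classes with a dot (e.g. "BRK.B"),
    while Yahoo Finance uses a dash ("BRK-B").
    """
    return ticker.strip().replace(".", "-")

# Hypothetical rows mirroring the Ticker/Name columns of the saved CSV
df = pd.DataFrame({
    "Ticker": ["AAPL", "BRK.B", "BF.B"],
    "Name": ["Apple Inc", "Berkshire Hathaway Class B", "Brown-Forman Class B"],
})
df["YahooTicker"] = df["Ticker"].apply(to_yahoo_symbol)
print(df["YahooTicker"].tolist())  # → ['AAPL', 'BRK-B', 'BF-B']
```

Applying this once, right after the download, saves confusing lookup failures later in the pipeline.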
Make It Reusable
You can wrap the same logic into a reusable tool that downloads ticker lists for other iShares ETFs. The same method works beautifully for other indexes as well. For example:
Nasdaq 100 → iShares NASDAQ 100 UCITS ETF (CNDX)
Dow Jones Industrial Average → iShares Dow Jones Industrial Average UCITS ETF
European indexes → iShares Core DAX® UCITS ETF or iShares STOXX Europe 600 UCITS ETF
Each of these ETFs provides an official holdings CSV you can download and integrate into your workflow.
Indeed, the iShares NASDAQ 100 UCITS ETF (CNDX), or even a European index fund such as the iShares Core DAX® UCITS ETF, can easily be added to the ETF_URLS dictionary, and the same loop will generate and store a CSV file for each.
Enrich Your Dataset with yfinance
Here’s where things get powerful. The iShares CSV gives you the raw list, but you can easily enrich it with live financial data using the yfinance library. For each ticker, you can retrieve details such as sector, market capitalization, currency, price-to-earnings ratio, latest price, and much more.
Let’s add a simple function that enriches the list with a few additional fields from Yahoo Finance.
```python
import yfinance as yf

def get_info_from_yfinance(ticker):
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        sector = info.get("sector", None)
        currency = info.get("currency", None)
        market_cap = info.get("marketCap", None)
        pe_ratio = info.get("trailingPE", None)
        return sector, currency, market_cap, pe_ratio
    except Exception:
        return None, None, None, None
```
With just a few lines, you’ll have a DataFrame that combines structural ETF data with live market fundamentals — clean, comprehensive, and easily updatable.
```python
import os

import pandas as pd
import yfinance as yf
from tqdm import tqdm

tqdm.pandas()  # Register progress_apply with pandas

ETF_URLS = {
    "SP500": "https://www.ishares.com/us/products/239726/ishares-core-sp-500-etf/1467271812596.ajax?fileType=csv&fileName=IVV_holdings&dataType=fund",
    "Nasdaq100": "https://www.ishares.com/uk/individual/en/products/253741/ishares-nasdaq-100-ucits-etf/1506575576011.ajax?fileType=csv&fileName=CNDX_holdings&dataType=fund"
}

def fetch_etf_holdings(index_name, url):
    try:
        # The two fund pages prepend a different number of metadata rows
        if index_name == "Nasdaq100":
            df = pd.read_csv(url, skiprows=2)
        else:
            df = pd.read_csv(url, skiprows=9)
        df = df[["Ticker", "Name"]]
        df = df.dropna()
        return df
    except Exception as e:
        print(f"Failed to fetch {index_name}: {e}")
        return pd.DataFrame()

def get_info_from_yfinance(ticker):
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        sector = info.get("sector", None)
        currency = info.get("currency", None)
        market_cap = info.get("marketCap", None)
        pe_ratio = info.get("trailingPE", None)
        return sector, currency, market_cap, pe_ratio
    except Exception:
        return None, None, None, None

if __name__ == "__main__":
    outputfile_path = "<YOUR_PATH>"  # Directory where the CSV files will be saved
    for index_name, url in ETF_URLS.items():
        df = fetch_etf_holdings(index_name, url)
        print(f"Fetching info from yfinance (this may take some time) for {index_name} ...")
        df[["Sector", "Currency", "Market Cap", "PE Ratio(TTM)"]] = (
            df["Ticker"]
            .progress_apply(lambda x: pd.Series(get_info_from_yfinance(x)))
        )
        print(f"✅ Total {index_name} tickers: {len(df)}")
        output_filename = f"{index_name}.csv"
        output_path = os.path.join(outputfile_path, output_filename)
        df.to_csv(output_path, index=False, header=True)
        print(f"💾 File saved: {output_path}")
```
This turns a static list into a mini stock database — the perfect foundation for a simple stock screener or research pipeline.
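Once the enriched CSV exists, a first screening pass is just boolean indexing on the DataFrame. A minimal sketch (the column names match the script above; the sample rows and thresholds are arbitrary illustrations, not real market data):

```python
import pandas as pd

# Hypothetical rows standing in for the enriched CSV produced above
df = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "Sector": ["Technology", "Energy", "Technology"],
    "Market Cap": [5e11, 8e10, 2e10],
    "PE Ratio(TTM)": [28.0, 9.5, 55.0],
})

# Example screen: large-cap technology names trading below 40x trailing earnings
screen = df[
    (df["Sector"] == "Technology")
    & (df["Market Cap"] > 1e11)
    & (df["PE Ratio(TTM)"] < 40)
]
print(screen["Ticker"].tolist())  # → ['AAA']
```

In practice you would replace the inline DataFrame with `pd.read_csv("SP500.csv")` and tune the filters to your own universe and criteria.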
Final Thoughts
In quantitative trading, the edge often lies in how you manage and structure your data.
A robust data pipeline doesn’t start with scraping — it starts with clean, verifiable, and easily automated sources.
By leveraging ETF holdings from trusted providers and enhancing them with market data from yfinance, you can build a solid foundation for backtesting, screening, and systematic strategy development.
This is the mindset behind The Quantitative Edge — simple ideas, implemented cleanly, that scale into powerful tools for data-driven trading.
Take care!
