Commit 50a708c

Remove Exercises
1 parent 6e2d434 commit 50a708c

1 file changed: +0 -239 lines changed

lectures/polars.md

Lines changed: 0 additions & 239 deletions
@@ -592,243 +592,4 @@ Note that polars offers many other file type alternatives.
 
 Polars has [a wide variety](https://docs.pola.rs/user-guide/io/) of methods that we can use to read excel, json, parquet or plug straight into a database server.
 
-## Exercises
-
-```{exercise-start}
-:label: pl_ex1
-```
-
-With these imports:
-
-```{code-cell} ipython3
-import datetime as dt
-import yfinance as yf
-```
-
-Write a program to calculate the percentage price change over 2021 for the following shares:
-
-```{code-cell} ipython3
-ticker_list = {'INTC': 'Intel',
-               'MSFT': 'Microsoft',
-               'IBM': 'IBM',
-               'BHP': 'BHP',
-               'TM': 'Toyota',
-               'AAPL': 'Apple',
-               'AMZN': 'Amazon',
-               'C': 'Citigroup',
-               'QCOM': 'Qualcomm',
-               'KO': 'Coca-Cola',
-               'GOOG': 'Google'}
-```
-
-Here's the first part of the program
-
-```{note}
-Many python packages will return Pandas DataFrames by default. In this example we use the `yfinance` package and convert the data to a polars DataFrame
-```
-
-```{code-cell} ipython3
-def read_data(ticker_list,
-              start=dt.datetime(2021, 1, 1),
-              end=dt.datetime(2021, 12, 31)):
-    """
-    This function reads in closing price data from Yahoo
-    for each tick in the ticker_list.
-    """
-
-    all_data = []
-
-    for tick in ticker_list:
-        stock = yf.Ticker(tick)
-        prices = stock.history(start=start, end=end)
-
-        # Convert to polars DataFrame
-        df = pl.from_pandas(prices.reset_index())
-        df = df.with_columns([
-            pl.col('Date').cast(pl.Date),
-            pl.lit(tick).alias('ticker')
-        ]).select(['Date', 'ticker', 'Close'])
-
-        all_data.append(df)
-
-    # Combine all data
-    ticker_df = pl.concat(all_data)
-
-    # Pivot to have tickers as columns
-    ticker_df = ticker_df.pivot(values='Close', index='Date', on='ticker')
-
-    return ticker_df
-
-ticker = read_data(ticker_list)
-```
-
-Complete the program to plot the result as a bar graph like this one:
-
-```{image} /_static/lecture_specific/pandas/pandas_share_prices.png
-:scale: 80
-:align: center
-```
-
-```{exercise-end}
-```
-
-```{solution-start} pl_ex1
-:class: dropdown
-```
-
-There are a few ways to approach this problem using Polars to calculate
-the percentage change.
-
-First, you can extract the data and perform the calculation such as:
-
-```{code-cell} ipython3
-# Get first and last prices for each ticker
-first_prices = ticker[0] # First row
-last_prices = ticker[-1] # Last row
-
-# Convert to pandas for easier calculation, excluding Date column to avoid type errors
-numeric_cols = [col for col in ticker.columns if col != 'Date']
-first_pd = ticker.head(1).select(numeric_cols).to_pandas().iloc[0]
-last_pd = ticker.tail(1).select(numeric_cols).to_pandas().iloc[0]
-
-price_change = (last_pd - first_pd) / first_pd * 100
-price_change
-```
-
-Alternatively you can use polars expressions to calculate percentage change:
-
-```{code-cell} ipython3
-# Calculate percentage change using polars
-change_df = ticker.select([
-    ((pl.col(col).last() - pl.col(col).first()) / pl.col(col).first() * 100).alias(f'{col}_pct_change')
-    for col in ticker.columns if col != 'Date'
-])
-
-# Convert to series for plotting
-price_change = change_df.to_pandas().iloc[0]
-price_change.index = [col.replace('_pct_change', '') for col in price_change.index]
-price_change
-```
-
-Then to plot the chart
-
-```{code-cell} ipython3
-price_change.sort_values(inplace=True)
-price_change.rename(index=ticker_list, inplace=True)
-```
-
-```{code-cell} ipython3
-fig, ax = plt.subplots(figsize=(10,8))
-ax.set_xlabel('stock', fontsize=12)
-ax.set_ylabel('percentage change in price', fontsize=12)
-price_change.plot(kind='bar', ax=ax)
-plt.show()
-```
-
-```{solution-end}
-```
-
-
-```{exercise-start}
-:label: pl_ex2
-```
-
-Using the method `read_data` introduced in {ref}`pl_ex1`, write a program to obtain year-on-year percentage change for the following indices:
-
-```{code-cell} ipython3
-indices_list = {'^GSPC': 'S&P 500',
-                '^IXIC': 'NASDAQ',
-                '^DJI': 'Dow Jones',
-                '^N225': 'Nikkei'}
-```
-
-Complete the program to show summary statistics and plot the result as a time series graph like this one:
-
-```{image} /_static/lecture_specific/pandas/pandas_indices_pctchange.png
-:scale: 80
-:align: center
-```
-
-```{exercise-end}
-```
-
-```{solution-start} pl_ex2
-:class: dropdown
-```
-
-Following the work you did in {ref}`pl_ex1`, you can query the data using `read_data` by updating the start and end dates accordingly.
-
-```{code-cell} ipython3
-indices_data = read_data(
-    indices_list,
-    start=dt.datetime(1971, 1, 1),
-    end=dt.datetime(2021, 12, 31)
-)
-```
-
-Then, calculate the yearly returns using polars:
-
-```{code-cell} ipython3
-# Combine all yearly returns using concat and pivot approach
-all_yearly_data = []
-
-for index_col in indices_data.columns:
-    if index_col != 'Date':
-        yearly_data = (indices_data
-            .with_columns(pl.col('Date').dt.year().alias('year'))
-            .group_by('year')
-            .agg([
-                pl.col(index_col).first().alias('first_price'),
-                pl.col(index_col).last().alias('last_price')
-            ])
-            .with_columns(
-                ((pl.col('last_price') - pl.col('first_price') + 1e-10)
-                / (pl.col('first_price') + 1e-10)).alias('return')
-            )
-            .with_columns(pl.lit(indices_list[index_col]).alias('index_name'))
-            .select(['year', 'index_name', 'return']))
-
-        all_yearly_data.append(yearly_data)
-
-# Concatenate all data
-combined_data = pl.concat(all_yearly_data)
-
-# Pivot to get indices as columns
-yearly_returns = combined_data.pivot(values='return', index='year', on='index_name')
-
-yearly_returns
-```
-
-Next, you can obtain summary statistics by using the method `describe`.
-
-```{code-cell} ipython3
-yearly_returns.select(pl.exclude('year')).describe()
-```
-
-Then, to plot the chart
-
-```{code-cell} ipython3
-# Convert to pandas for plotting
-yearly_returns_pd = yearly_returns.to_pandas().set_index('year')
-
-fig, axes = plt.subplots(2, 2, figsize=(10, 8))
-
-# Flatten 2-D array to 1-D array
-for iter_, ax in enumerate(axes.flatten()):
-    if iter_ < len(yearly_returns_pd.columns):
-
-        # Get index name per iteration
-        index_name = yearly_returns_pd.columns[iter_]
-
-        # Plot pct change of yearly returns per index
-        ax.plot(yearly_returns_pd[index_name])
-        ax.set_ylabel("percent change", fontsize = 12)
-        ax.set_title(index_name)
-
-plt.tight_layout()
-```
-
-```{solution-end}
-```
-
 [^mung]: Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.
