Commit 8226757

add exercises and solutions
1 parent 50a708c commit 8226757

1 file changed

+227
-0
lines changed

lectures/polars.md

Lines changed: 227 additions & 0 deletions
@@ -592,4 +592,231 @@ Note that polars offers many other file type alternatives.

Polars has [a wide variety](https://docs.pola.rs/user-guide/io/) of methods that we can use to read Excel, JSON, and Parquet files, or to connect directly to a database server.

## Exercises

```{exercise-start}
:label: pl_ex1
```

With these imports:

```{code-cell} ipython3
import datetime as dt
import yfinance as yf
```

Write a program to calculate the percentage price change over 2021 for the following shares using Polars:

```{code-cell} ipython3
ticker_list = {'INTC': 'Intel',
               'MSFT': 'Microsoft',
               'IBM': 'IBM',
               'BHP': 'BHP',
               'TM': 'Toyota',
               'AAPL': 'Apple',
               'AMZN': 'Amazon',
               'C': 'Citigroup',
               'QCOM': 'Qualcomm',
               'KO': 'Coca-Cola',
               'GOOG': 'Google'}
```

Here's the first part of the program that reads data into a Polars DataFrame:

```{code-cell} ipython3
def read_data_polars(ticker_list,
                     start=dt.datetime(2021, 1, 1),
                     end=dt.datetime(2021, 12, 31)):
    """
    This function reads in closing price data from Yahoo
    for each tick in the ticker_list and returns a Polars DataFrame.
    """
    # Start with an empty list to collect DataFrames
    dataframes = []

    for tick in ticker_list:
        stock = yf.Ticker(tick)
        prices = stock.history(start=start, end=end)

        # Create a Polars DataFrame from the closing prices
        df = pl.DataFrame({
            'Date': pd.to_datetime(prices.index.date),
            tick: prices['Close'].values
        })
        dataframes.append(df)

    # Full (outer) join on Date; coalesce=True keeps a single Date column
    # (assumes a recent Polars release where how='full' replaces how='outer')
    result = dataframes[0]
    for df in dataframes[1:]:
        result = result.join(df, on='Date', how='full', coalesce=True)

    # Sort by date so that first()/last() pick the earliest/latest prices
    return result.sort('Date')

ticker = read_data_polars(ticker_list)
```

Complete the program to compute the percentage changes with Polars operations and plot the result as a bar graph with matplotlib.

```{exercise-end}
```

```{solution-start} pl_ex1
:class: dropdown
```

Here's a solution using Polars operations to calculate percentage changes:

```{code-cell} ipython3
price_change_df = ticker.select([
    # Name each result after its ticker so it can be mapped to a company below
    (pl.col(tick).last() / pl.col(tick).first() * 100 - 100).alias(tick)
    for tick in ticker_list.keys()
]).transpose(include_header=True, header_name='ticker', column_names=['pct_change'])

# Add company names (unmatched tickers are left unchanged) and sort
price_change_df = price_change_df.with_columns([
    pl.col('ticker').replace(ticker_list).alias('company')
]).sort('pct_change')

print(price_change_df)
```

Now plot the results:

```{code-cell} ipython3
# Convert to pandas for plotting (as demonstrated in the lecture)
df_pandas = price_change_df.to_pandas().set_index('company')

fig, ax = plt.subplots(figsize=(10,8))
ax.set_xlabel('stock', fontsize=12)
ax.set_ylabel('percentage change in price', fontsize=12)
df_pandas['pct_change'].plot(kind='bar', ax=ax)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

```{solution-end}
```


```{exercise-start}
:label: pl_ex2
```

Using the function `read_data_polars` introduced in {ref}`pl_ex1`, write a program to obtain the year-on-year percentage change for the following indices using Polars operations:

```{code-cell} ipython3
indices_list = {'^GSPC': 'S&P 500',
                '^IXIC': 'NASDAQ',
                '^DJI': 'Dow Jones',
                '^N225': 'Nikkei'}
```

Complete the program to show summary statistics and plot the result as a time series graph, using Polars for the data manipulation.

```{exercise-end}
```

```{solution-start} pl_ex2
:class: dropdown
```

Following the work you did in {ref}`pl_ex1`, you can query the data using `read_data_polars` by updating the start and end dates accordingly.

```{code-cell} ipython3
728+
indices_data = read_data_polars(
729+
indices_list,
730+
start=dt.datetime(1971, 1, 1), # Common Start Date
731+
end=dt.datetime(2021, 12, 31)
732+
)
733+
734+
# Add year column for grouping
735+
indices_data = indices_data.with_columns(
736+
pl.col('Date').dt.year().alias('year')
737+
)
738+
739+
print("Data shape:", indices_data.shape)
740+
print("\nFirst few rows:")
741+
print(indices_data.head())
742+
```

Calculate yearly returns using Polars `group_by` operations:

```{code-cell} ipython3
# Calculate first and last price for each year and each index.
# drop_nulls() skips dates on which a given market did not trade.
yearly_returns = indices_data.group_by('year').agg([
    *[pl.col(index).drop_nulls().first().alias(f"{index}_first") for index in indices_list.keys()],
    *[pl.col(index).drop_nulls().last().alias(f"{index}_last") for index in indices_list.keys()]
])

# Calculate the yearly return (as a fraction) for each index
for index in indices_list.keys():
    yearly_returns = yearly_returns.with_columns(
        ((pl.col(f"{index}_last") - pl.col(f"{index}_first")) / pl.col(f"{index}_first"))
        .alias(indices_list[index])
    )

# Select only the year and return columns
yearly_returns = yearly_returns.select([
    'year',
    *list(indices_list.values())
]).sort('year')

print("Yearly returns shape:", yearly_returns.shape)
print("\nYearly returns:")
print(yearly_returns.head(10))
```

Generate summary statistics using Polars:

```{code-cell} ipython3
# Summary statistics for all indices
summary_stats = yearly_returns.select(list(indices_list.values())).describe()
print("Summary Statistics:")
print(summary_stats)
```

Plot the time series:

```{code-cell} ipython3
# Convert to pandas for plotting
df_pandas = yearly_returns.to_pandas().set_index('year')

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for iter_, ax in enumerate(axes.flatten()):
    if iter_ < len(indices_list):
        index_name = list(indices_list.values())[iter_]
        ax.plot(df_pandas.index, df_pandas[index_name])
        # Returns are stored as fractions, not percentages
        ax.set_ylabel("yearly return", fontsize=12)
        ax.set_xlabel("year", fontsize=12)
        ax.set_title(index_name)
        ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

Alternatively, create a single plot with all indices:

```{code-cell} ipython3
# Single plot with all indices
fig, ax = plt.subplots(figsize=(12, 8))

for index_name in indices_list.values():
    ax.plot(df_pandas.index, df_pandas[index_name], label=index_name, linewidth=2)

ax.set_xlabel("year", fontsize=12)
ax.set_ylabel("yearly return", fontsize=12)
ax.set_title("Yearly Returns of Major Stock Indices", fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```

```{solution-end}
```

[^mung]: Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.