<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Week 3: Advanced Data Handling</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
<link rel="stylesheet" href="css/styles.css">
<link rel="icon" href="images/favicon.ico">
</head>
<body>
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container-fluid">
<div class="navbar-header">
<a class="navbar-brand" href="index.html">Data Analytics I</a>
</div>
<ul class="nav navbar-nav">
<li><a href="index.html">Home</a></li>
<li><a href="about.html">About</a></li>
<li><a href="contact.html">Contact</a></li>
</ul>
</div>
</nav>
<div style="height: 80px;"></div>
<main class="container">
<h1>Week 3: Advanced Data Handling</h1>
<section>
<h2>Topics Covered:</h2>
<ul>
<li>Advanced Pandas for Financial Data</li>
<li>AI-Enhanced Data Processing</li>
<li>SAS to Python Migration</li>
<li>Efficient Data Processing Techniques</li>
<li>Financial Time Series Operations</li>
<li>Data Quality and Validation</li>
</ul>
</section>
<section>
<h2>Core Concepts and Implementation:</h2>
<ul>
<li><b>AI-Powered Pandas Tools</b>
<ul>
<li>pandasai: AI-powered data analysis</li>
<li>pandas-ai: Natural language queries for DataFrames</li>
<li>dataprep: Automated data preparation</li>
<li>autoviz: Automated visualization generation</li>
</ul>
</li>
<li><b>SAS to Python Migration</b>
<ul>
<li>Reading SAS (.sas7bdat) files with pandas</li>
<li>SAS to Python syntax conversion</li>
<li>WRDS data handling best practices</li>
<li>Performance optimization for large SAS datasets</li>
</ul>
</li>
<li><b>Advanced Pandas Operations</b>
<ul>
<li>MultiIndex and hierarchical indexing</li>
<li>Advanced groupby operations</li>
<li>Rolling and expanding windows</li>
<li>Efficient memory usage with categorical data</li>
</ul>
</li>
<li><b>Financial Data Processing</b>
<ul>
<li>Handling missing data in financial time series</li>
<li>Adjusting for corporate actions</li>
<li>Working with different time zones</li>
<li>Managing point-in-time data</li>
</ul>
</li>
<li><b>Performance Optimization</b>
<ul>
<li>Vectorized operations</li>
<li>Efficient data types</li>
<li>Chunked data processing</li>
<li>Using pandas.eval() for large datasets</li>
</ul>
</li>
</ul>
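<p>As a rough illustration of the SAS migration and advanced pandas items listed above, the sketch below translates a few common SAS steps into pandas and then applies hierarchical indexing, grouped aggregation, window functions, and a categorical column. The panel, its column names (<code>ticker</code>, <code>date</code>, <code>ret</code>), and all values are made up for the example; they are not course data.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical daily panel: two tickers, six business days each
dates = pd.bdate_range("2024-01-01", periods=6)
panel = pd.DataFrame({
    "ticker": np.repeat(["AAA", "BBB"], len(dates)),
    "date": np.tile(dates, 2),
    "ret": [0.01, -0.02, 0.03, 0.00, 0.01, -0.01,
            0.02, 0.01, -0.01, 0.00, 0.02, 0.01],
})

# PROC SORT BY ticker date  ->  sort_values
panel = panel.sort_values(["ticker", "date"])

# MultiIndex (ticker, date) gives hierarchical access to one firm's series
indexed = panel.set_index(["ticker", "date"])
aaa = indexed["ret"].xs("AAA", level="ticker")

# Categorical dtype stores each distinct ticker once, saving memory
panel["ticker"] = panel["ticker"].astype("category")

# PROC MEANS with CLASS ticker  ->  groupby + named aggregation
stats = panel.groupby("ticker", observed=True)["ret"].agg(
    mean_ret="mean", vol="std", n="count"
)

# Rolling 3-day mean and expanding mean, computed within each ticker
grouped = panel.groupby("ticker", observed=True)["ret"]
panel["roll3"] = grouped.transform(lambda s: s.rolling(3).mean())
panel["expanding_mean"] = grouped.transform(lambda s: s.expanding().mean())

print(stats)
```

<p>With a real SAS extract, the DataFrame would typically come from <code>pd.read_sas(path, format="sas7bdat")</code>, optionally with <code>encoding</code> and <code>chunksize</code> arguments for large files, rather than being built by hand.</p>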
</section>
<section>
<h2>AI-Enhanced Data Analysis Setup:</h2>
<ul>
<li><b>LLM Options</b>
<ul>
<li>Ollama (Free, provided on department GPU server)</li>
<li>OpenAI API (Student purchase required, ~$5-20/month)</li>
<li>Anthropic API (Student purchase required, pricing varies)</li>
</ul>
</li>
<li><b>AI-Powered Tools</b>
<ul>
<li>pandasai: Supports multiple LLM backends</li>
<li>pandas-ai: Natural language DataFrame operations</li>
<li>dataprep &amp; autoviz: Automated analysis tools</li>
</ul>
</li>
</ul>
</section>
<section>
<h2>Getting Started:</h2>
<ol>
<li><b>Ollama Setup</b>
<ul>
<li>Access provided on department GPU server</li>
<li>See <a href="https://ollama.ai/library">Ollama Model Library</a> for available models and usage</li>
<li>Follow the <a href="https://github.com/ollama/ollama">official documentation</a> for model commands</li>
</ul>
</li>
<li><b>Optional Commercial APIs</b>
<pre><code># If you choose to purchase API access:
# 1. Create accounts at openai.com or anthropic.com
# 2. Purchase credits (student discounts may be available)
# 3. Create a .env file (never commit this!)
OPENAI_API_KEY=your_purchased_key
ANTHROPIC_API_KEY=your_purchased_key</code></pre>
</li>
<li><b>Install Required Packages</b>
<pre><code>pip install python-dotenv pandasai pandas-ai dataprep autoviz</code></pre>
</li>
</ol>
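<p>One way to keep a script usable with or without purchased keys is to pick the backend from whatever is present in the environment. The helper below is a minimal sketch, not part of any library: <code>pick_backend</code> is a hypothetical name, and it assumes the keys use the variable names shown in the <code>.env</code> example above. If you use python-dotenv, call <code>load_dotenv()</code> first so the <code>.env</code> entries are copied into <code>os.environ</code>.</p>

```python
import os


def pick_backend(env=None):
    """Return the name of the first LLM backend with a non-empty API key.

    Falls back to 'ollama', which needs no key in this course's setup.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "ollama"


print(pick_backend())
```

<p>Passing a plain dictionary instead of <code>os.environ</code> makes the function easy to check without touching real credentials.</p>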
</section>
<section>
<h2>Using AI Tools:</h2>
<div class="alert alert-warning">
<strong>Important Note:</strong> The code examples below are for demonstration purposes only. They illustrate the general approach but are not production-ready. You will need to:
<ul>
<li>Debug and adapt the code to your specific use case</li>
<li>Handle errors and edge cases</li>
<li>Test with your actual data structure</li>
<li>Refer to the latest documentation as APIs may change</li>
</ul>
</div>
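<pre><code># Example code - requires debugging and adaptation
# Assumes pandasai 2.x; import paths differ in other releases
import os

import pandas as pd
from dotenv import load_dotenv
from pandasai import SmartDataframe
from pandasai.llm import OpenAI
from pandasai.llm.local_llm import LocalLLM

# Placeholder data - replace with your own DataFrame
your_dataframe = pd.DataFrame({'ticker': ['AAA', 'BBB'], 'ret': [0.01, -0.02]})

# Using Ollama (available to all students)
# LocalLLM talks to Ollama's OpenAI-compatible endpoint; api_base here is an
# assumed local address, so use the department GPU server address instead
llm_local = LocalLLM(api_base='http://localhost:11434/v1', model='llama2')  # See Ollama docs for available models
df_local = SmartDataframe(your_dataframe, config={'llm': llm_local})
result_local = df_local.chat('Generate summary statistics')
print(result_local)

# If you've purchased API access: load keys from .env into the environment
load_dotenv()

# OpenAI example (if purchased)
llm = OpenAI(api_token=os.getenv('OPENAI_API_KEY'))
df = SmartDataframe(your_dataframe, config={'llm': llm})
result = df.chat('Generate summary statistics')</code></pre>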
<p class="text-muted">Note: These examples assume certain package versions and configurations. Always check the current documentation and be prepared to debug integration issues.</p>
</section>
<section>
<h2>Model Comparison:</h2>
<ul>
<li><b>Ollama (Provided)</b>
<ul>
<li>Free access via department GPU server</li>
<li>Good for initial development and testing</li>
<li>Suitable for most course assignments</li>
<li>No usage limits or costs</li>
</ul>
</li>
<li><b>Commercial APIs (Optional)</b>
<ul>
<li>Typically higher accuracy, but require payment</li>
<li>OpenAI: Strong general performance (~$5-20/month)</li>
<li>Anthropic: Detailed analysis (pricing varies)</li>
<li>Consider for advanced projects or research</li>
</ul>
</li>
</ul>
</section>
<section>
<h2>Weekly Assignment</h2>
<div class="alert alert-info">
<strong>Due:</strong> End of Week 3
</div>
<h3>Tasks:</h3>
<ol>
<li>Data Analysis Setup
<ul>
<li>Install pandas-ai and related packages</li>
<li>Configure Ollama access</li>
<li>Test basic functionality</li>
</ul>
</li>
<li>Financial Data Analysis
<ul>
<li>Load and clean sample financial data</li>
<li>Perform basic statistical analysis</li>
<li>Create time series visualizations</li>
</ul>
</li>
<li>AI-Enhanced Analysis
<ul>
<li>Use Ollama for data exploration</li>
<li>Generate automated insights</li>
<li>Compare with traditional analysis</li>
</ul>
</li>
</ol>
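<p>For the financial data analysis task, the traditional (non-AI) baseline might look roughly like the sketch below: fill a short gap, back-adjust for a stock split, compute returns, convert time zones, and summarize. The prices, the split date, and the 2-for-1 ratio are invented for illustration; real corporate-action data would come from your data source.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes with one missing quote and a 2-for-1 split
idx = pd.date_range("2024-03-04", periods=6, freq="B", tz="America/New_York")
close = pd.Series([100.0, 102.0, np.nan, 104.0, 52.5, 53.0],
                  index=idx, name="close")

# Carry the last quote forward across at most one missing day
close = close.ffill(limit=1)

# Back-adjust pre-split prices so the split does not look like a -50% return
split_date = idx[4]   # first trading day on the post-split basis
split_ratio = 2.0
adj = close.where(close.index >= split_date, close / split_ratio)

# Simple daily returns on the adjusted series (no implicit filling)
ret = adj.pct_change(fill_method=None).dropna()

# Same timestamps expressed in UTC, e.g. before merging with other markets
ret_utc = ret.tz_convert("UTC")

print(ret.describe())
```

<p>The <code>describe()</code> output gives a reference point for the "Compare with traditional analysis" step: summary statistics produced by an LLM-backed tool should agree with it.</p>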
<div class="alert alert-warning">
<strong>Submit:</strong> As instructed in the weekly assignment
</div>
</section>
<section>
<h2>Week 3 Projects:</h2>
<ol>
<li><b>Market Analysis (Using Ollama)</b>
<ul>
<li>Build data processing pipeline</li>
<li>Implement natural language queries</li>
<li>Generate automated reports</li>
<li>Optional: Compare with commercial API results</li>
</ul>
</li>
<li><b>Data Processing Pipeline</b>
<ul>
<li>Automate data cleaning with AI assistance</li>
<li>Create interactive analysis system</li>
<li>Implement quality checks</li>
<li>Generate comprehensive reports</li>
</ul>
</li>
</ol>
<p>Note: All course assignments can be completed with the provided Ollama setup, though its results will not be perfect. Commercial APIs are optional; exploring them is encouraged but left to each student's discretion.</p>
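<p>The "Implement quality checks" step in the pipeline project could start from something like the following. This is only a sketch: <code>quality_report</code> is a hypothetical helper, and the three checks (duplicate keys, missing prices, non-positive prices) are examples rather than a required list.</p>

```python
import pandas as pd


def quality_report(df, key_cols, price_col):
    """Count common data problems; returns a dict of issue name -> row count."""
    return {
        "duplicate_keys": int(df.duplicated(subset=key_cols).sum()),
        "missing_price": int(df[price_col].isna().sum()),
        "non_positive_price": int((df[price_col] <= 0).sum()),
    }


# Hypothetical sample with one problem of each kind
sample = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-03",
                            "2024-01-02", "2024-01-03"]),
    "price": [10.0, 10.5, 10.5, None, -1.0],
})

report = quality_report(sample, key_cols=["ticker", "date"], price_col="price")
print(report)
```

<p>Returning counts instead of raising errors lets the pipeline log every issue in one pass and decide afterwards whether to stop or continue.</p>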
</section>
<section>
<h2>Best Practices:</h2>
<ul>
<li><b>Resource Management</b>
<ul>
<li>Check GPU server status before running jobs</li>
<li>Use batch processing for large datasets</li>
<li>Monitor GPU memory usage</li>
</ul>
</li>
<li><b>If Using Commercial APIs</b>
<ul>
<li>Monitor usage costs carefully</li>
<li>Use .env files for API keys</li>
<li>Never commit API keys to version control</li>
</ul>
</li>
</ul>
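<p>Batch processing of a large file can be sketched with the <code>chunksize</code> argument of <code>pd.read_csv</code>, combined with compact dtypes and a vectorized expression. The in-memory CSV, its columns, and the chunk size of four rows are stand-ins chosen so the example runs anywhere; <code>DataFrame.eval</code> is used here as the per-frame counterpart of <code>pandas.eval()</code>.</p>

```python
import io

import pandas as pd

# Stand-in for a large CSV on disk (hypothetical columns)
csv_text = "ticker,price,shares\n" + "".join(
    f"{'AAA' if i % 2 == 0 else 'BBB'},{10 + i},{100}\n" for i in range(10)
)

totals = {}
# Read four rows at a time so only one chunk is in memory at once
reader = pd.read_csv(
    io.StringIO(csv_text),
    chunksize=4,
    dtype={"ticker": "category", "price": "float32", "shares": "int32"},
)
for chunk in reader:
    # Vectorized column arithmetic instead of a Python-level row loop
    chunk = chunk.eval("value = price * shares")
    per_ticker = chunk.groupby("ticker", observed=True)["value"].sum()
    for ticker, value in per_ticker.items():
        totals[ticker] = totals.get(ticker, 0.0) + float(value)

print(totals)
```

<p>Only the running totals persist between chunks, so peak memory depends on the chunk size rather than on the file size.</p>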
</section>
<section>
<h2>Additional Resources:</h2>
<ul>
<li><a href="https://ollama.ai/library">Ollama Model Library</a></li>
<li><a href="https://github.com/pandas-ai/pandas-ai">PandasAI Documentation</a></li>
<li><a href="https://platform.openai.com/docs">OpenAI API Documentation</a></li>
<li><a href="https://docs.anthropic.com/claude/docs">Anthropic Claude Documentation</a></li>
</ul>
<p>Check the department's GPU server status page for Ollama availability and usage guidelines. For those interested in commercial APIs, compare pricing and features before purchasing.</p>
</section>
<div style="text-align: center; margin-top: 20px;">
<a href="week2.html">← Previous Week</a>
|
<a href="week4.html">Next Week →</a>
</div>
<div class="back-to-home">
<a href="index.html">← Back to Home</a>
</div>
</main>
<footer class="container text-center">
<p>© 2025 Cinder Zhang. All rights reserved. Contact us at <a href="mailto:xzhang@walton.uark.edu">xzhang at walton uark</a></p>
</footer>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
</body>
</html>