
Commit 4cf38f5

Refactor data processing in pipeline.py to enhance clarity and efficiency. Extract relevant columns, map indicator codes, and ensure proper data types for date and value. Update pivoting method for improved data structure. Additionally, modify pipeline.yml to set PREFECT_DEPLOYMENT environment variable for better deployment management.
1 parent 700c921 commit 4cf38f5
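
The per-indicator cleaning steps named in the commit message (select columns, map indicator codes, coerce types) can be sketched as below. This is a minimal illustration, not the repository's code: the `indicators` mapping, the sample records, and the helper name `clean_indicator_frame` are hypothetical, and the records are assumed to be flat dictionaries (nested API payloads would need flattening first).

```python
import pandas as pd

# Hypothetical code-to-description mapping; the real `indicators` dict is not shown in the diff.
indicators = {"IND.A": "Indicator A"}

# Hypothetical flat records with one extra column and one unparseable value.
data = [
    {"country": "X", "date": "2019-01-01", "value": "2620.5", "indicator": "IND.A", "unit": ""},
    {"country": "X", "date": "2020-01-01", "value": "n/a", "indicator": "IND.A", "unit": ""},
]


def clean_indicator_frame(records, indicators):
    """Select, map, and type-convert one indicator's records."""
    df = pd.DataFrame(records)
    # Keep only the columns the pipeline uses; copy to avoid chained-assignment warnings.
    df = df[["country", "date", "value", "indicator"]].copy()
    # Replace indicator codes with descriptions; unknown codes become NaN.
    df["indicator"] = df["indicator"].map(indicators)
    # Parse dates, and turn unparseable values into NaN instead of raising.
    df["date"] = pd.to_datetime(df["date"])
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df


cleaned = clean_indicator_frame(data, indicators)
```

Because `errors='coerce'` converts bad values to NaN silently, counting `cleaned["value"].isna()` after this step is a cheap way to notice upstream data problems.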

2 files changed: +17 -8 lines changed


.github/workflows/pipeline.yml

Lines changed: 2 additions & 0 deletions
@@ -29,4 +29,6 @@ jobs:
         prefect cloud login --key ${{ secrets.PREFECT_API_KEY }} --workspace ${{ secrets.PREFECT_WORKSPACE }}

     - name: Run Pipeline
+      env:
+        PREFECT_DEPLOYMENT: "true"
       run: python pipeline.py
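
The diff does not show how pipeline.py consumes `PREFECT_DEPLOYMENT`. One plausible way to read such a flag is sketched below; the helper name `is_prefect_deployment` is hypothetical. Environment variables always reach the process as strings, so the quoted `"true"` in the workflow has to be compared as text rather than treated as a boolean.

```python
import os


def is_prefect_deployment(env=None):
    """Return True when PREFECT_DEPLOYMENT is set to "true" (case-insensitive).

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return env.get("PREFECT_DEPLOYMENT", "").strip().lower() == "true"


if __name__ == "__main__":
    # An unset variable is treated as "not a deployment run".
    print("deployment run:", is_prefect_deployment())
```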

pipeline.py

Lines changed: 15 additions & 8 deletions
@@ -33,23 +33,30 @@ def fetch_energy_data():

         # Extract data from response
         data = response.json()[1]
+
+        # Convert to DataFrame
         df = pd.DataFrame(data)
-        df['indicator'] = indicators[indicator]
+
+        # Extract relevant columns
+        df = df[['country', 'date', 'value', 'indicator']]
+
+        # Map indicator code to description
+        df['indicator'] = df['indicator'].map(indicators)
+
+        # Convert date and value to appropriate types
+        df['date'] = pd.to_datetime(df['date'])
+        df['value'] = pd.to_numeric(df['value'], errors='coerce')
+
         dfs.append(df)

     # Combine all indicators
     combined_df = pd.concat(dfs, ignore_index=True)

-    # Clean and process the data
-    combined_df['date'] = pd.to_datetime(combined_df['date'])
-    combined_df['value'] = pd.to_numeric(combined_df['value'], errors='coerce')
-
     # Pivot the data to have indicators as columns
-    df_pivot = combined_df.pivot_table(
+    df_pivot = combined_df.pivot(
         index=['country', 'date'],
         columns='indicator',
-        values='value',
-        aggfunc='first'
+        values='value'
     ).reset_index()

     return df_pivot
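
Switching from `pivot_table(..., aggfunc='first')` to `pivot(...)` changes behavior when the same (country, date, indicator) combination appears more than once: `pivot_table` keeps the first value, while `pivot` raises a `ValueError`. The sketch below shows the difference on made-up data and one possible way to keep the old "first wins" behavior explicitly. The sample frame is hypothetical; whether the real data can contain duplicates is not visible from the diff.

```python
import pandas as pd

# Hypothetical long-format data with a duplicate (country, date, indicator) entry.
combined_df = pd.DataFrame({
    "country": ["A", "A", "A"],
    "date": pd.to_datetime(["2020-01-01"] * 3),
    "indicator": ["x", "x", "y"],
    "value": [1.0, 2.0, 3.0],
})

# Old behavior: duplicates are aggregated, keeping the first value per cell.
wide_first = combined_df.pivot_table(
    index=["country", "date"], columns="indicator", values="value", aggfunc="first"
).reset_index()

# New behavior: pivot refuses to reshape when index/column pairs repeat.
try:
    combined_df.pivot(index=["country", "date"], columns="indicator", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# One option: drop duplicates explicitly, then pivot.
deduped = combined_df.drop_duplicates(
    subset=["country", "date", "indicator"], keep="first"
)
wide = deduped.pivot(
    index=["country", "date"], columns="indicator", values="value"
).reset_index()
```

Failing loudly on duplicates can be the desired outcome, since it surfaces data problems that `aggfunc='first'` would hide. The explicit `drop_duplicates` step is only needed if duplicates are expected and harmless.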
