Add FIFA data parser support#339

Open
ouyang1030 wants to merge 3 commits into Alek050:develop from ouyang1030:add/fifa

Conversation

@ouyang1030

Support for FIFA event data

@codecov

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 10.76923% with 232 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.09%. Comparing base (e4ce2b3) to head (892d568).
⚠️ Report is 75 commits behind head on develop.

Files with missing lines                                Patch %   Lines
...lpy/data_parsers/event_data_parsers/fifa_parser.py   10.11%    231 Missing ⚠️
databallpy/utils/get_game.py                            50.00%    1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (10.76%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #339      +/-   ##
===========================================
- Coverage    99.22%   95.09%   -4.14%     
===========================================
  Files           49       66      +17     
  Lines         3736     5258    +1522     
===========================================
+ Hits          3707     5000    +1293     
- Misses          29      258     +229     

Owner

@Alek050 Alek050 left a comment

Hi @ouyang1030,

Thanks for opening this PR. I am really happy with the quality and style of your code, and I appreciate that you kept the parser's format and architecture very similar to what DataBallPy already uses. Before we can merge it, a few minor points and one major point need to be addressed.

  1. Consider removing inline comments as much as possible. The code should speak for itself. I think the code is clear enough on its own; any necessary additions can go in the docstrings instead.
  2. It seems like you build events that have NaT as their datetime. Since the datetime objects are used in synchronisation, I would like your thoughts on a fallback date option. For example, if a game has no start time, can we just say it starts at a hardcoded time?
  3. This is a big one: next to the code implementation we need unit tests to make sure everything works as expected. This requires testing the individual functions to check that the outcome is as expected. Let me know if you need help with this.
  4. Last, please also update the changelog, documentation and README in this PR to reflect the newly available data provider.
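
To make point 2 concrete, here is a rough sketch of what a fallback could look like. It is only an illustration, not the parser's actual code: the column names `datetime` and `seconds`, the function name, and the fallback date itself are all assumptions and would need to match the real event data.

```python
import pandas as pd

# Hypothetical hardcoded kickoff; the actual default would be a project decision.
FALLBACK_KICKOFF = pd.Timestamp("1980-01-01 15:00:00", tz="UTC")


def fill_missing_datetimes(event_data: pd.DataFrame, kickoff) -> pd.DataFrame:
    """Replace NaT values in 'datetime' with an estimate.

    The estimate is the kickoff time (or FALLBACK_KICKOFF when the kickoff
    is missing) plus the elapsed seconds of the event. Both column names
    are assumptions made for this sketch.
    """
    start = FALLBACK_KICKOFF if pd.isna(kickoff) else kickoff
    estimated = start + pd.to_timedelta(event_data["seconds"], unit="s")
    filled = event_data.copy()
    # Only rows that are NaT are overwritten; known datetimes are kept.
    filled["datetime"] = filled["datetime"].fillna(estimated)
    return filled


events = pd.DataFrame(
    {
        "seconds": [0.0, 10.0],
        "datetime": pd.to_datetime([None, "2026-01-20 18:00:10"], utc=True),
    }
)
filled_events = fill_missing_datetimes(events, kickoff=pd.NaT)
```

With something like this, synchronisation would always receive a valid datetime, at the cost of the date being artificial whenever the fallback is used.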

Other than that, I have added some small questions on specific parts of the code, because I do not have access to the FIFA data. I would appreciate it if you could answer the questions and resolve the comments that you fix in the next PR.

Looking forward to finishing this PR soon!

@@ -0,0 +1,782 @@
from ast import In
Owner

Where and for what is this used?

Author

I will delete it

from databallpy.utils.logging import logging_wrapper


# FIFA event type mappings to databallpy events
Owner

Please minimise inline comments; usually they can be left out and replaced by readable code. In this case I think the mappings are perfectly clear without them. Consider removing.

Author

okay

"attempt_at_goal": "shot",
"own_goal": "own_goal",
"tackle": "tackle",
"no_event": "tackle",
Owner

Why would we map a 'no_event' to a 'tackle'?

Author

an error
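
A small unit test would catch this kind of mapping slip. The sketch below is only an illustration under stated assumptions: the dictionary is a hypothetical excerpt (the real mapping lives in fifa_parser.py and may have a different name), `map_event_type` is a made-up helper, and falling back to "other" for unmapped types is an assumption rather than confirmed DataBallPy behaviour.

```python
import unittest

# Hypothetical excerpt of the mapping; "no_event" is deliberately left out.
FIFA_EVENT_MAP = {
    "attempt_at_goal": "shot",
    "own_goal": "own_goal",
    "tackle": "tackle",
}


def map_event_type(fifa_type: str) -> str:
    """Translate a FIFA event type; unmapped types fall back to 'other' (assumed)."""
    return FIFA_EVENT_MAP.get(fifa_type, "other")


class TestFifaEventMap(unittest.TestCase):
    def test_known_types_are_mapped(self):
        self.assertEqual(map_event_type("attempt_at_goal"), "shot")
        self.assertEqual(map_event_type("tackle"), "tackle")

    def test_no_event_is_not_a_tackle(self):
        self.assertNotEqual(map_event_type("no_event"), "tackle")
```

Tests written like this can be collected by `python -m unittest` or pytest, and would also count towards the patch coverage.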

away_formation=away_formation,
country=country,
)
metadata.kickoff_utc = kickoff_time
Owner

Why do we still need the property kickoff_utc if it is already present in the periods dataframe? I would prefer not to add new properties if it is not necessary.

Author

ok

Comment on lines +216 to +217
home_formation = metadata_json["home_formation"]
away_formation = metadata_json["away_formation"]
Owner

Are the FIFA formations directly in the format DataBallPy expects them to be?

# Ensure boolean dtype
event_data["is_successful"] = event_data["is_successful"].astype("boolean")
event_data.loc[event_data["period_id"] > 5, "period_id"] = -1
event_data = event_data.drop(columns=["outcome_additional"])
Owner

I see you drop it here. Is there a reason why you did not just do it in the loop? Are there any performance issues?

Comment on lines +456 to +457
"offer_events": offer_events,
"other_events": other_events,
Owner

I think you can drop these, although I am not sure what 'offer_events' are. I would prefer to also have 'dribble_events' in here.

Author

okay

body_part = BODY_PART_MAP.get(event.get("body_type", "other"), "unspecified")

# Get set piece
origin = event.get("origin", "")
Owner

Does origin refer to set piece information or to the way the possession started?

Author

set piece information

Comment on lines 636 to 659
x_norm = event.get("x_location_start") if "x_location_start" in event else event.get("x")
y_norm = event.get("y_location_start") if "y_location_start" in event else event.get("y")

if x_norm is None or pd.isna(x_norm):
    x_norm = 0.5
if y_norm is None or pd.isna(y_norm):
    y_norm = 0.5

x_norm = max(0.0, min(1.0, float(x_norm)))
y_norm = max(0.0, min(1.0, float(y_norm)))

if period_id == 1 and flip_first_half:
    x_norm = 1.0 - x_norm
    y_norm = 1.0 - y_norm
elif period_id == 2 and flip_second_half:
    x_norm = 1.0 - x_norm
    y_norm = 1.0 - y_norm

x_start = (x_norm * pitch_dimensions[0]) - (pitch_dimensions[0] / 2.0)
y_start = (y_norm * pitch_dimensions[1]) - (pitch_dimensions[1] / 2.0)

if event.get("team_id") == away_team_id:
    x_start *= -1
    y_start *= -1
Owner

This part of the code seems to be written out multiple times. Consider creating a separate function for this functionality so you minimise duplicated code.

Author

sure
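
As a starting point, the repeated block could be pulled into one helper along the lines of the sketch below. The function names and the `flip` and `mirror` parameters are hypothetical. The sketch uses `math.isnan` to stay free of dependencies; in the parser itself `pd.isna` would probably be the more natural check.

```python
import math


def _clean(value):
    """Clamp a normalised value to [0, 1]; missing values become the centre (0.5)."""
    if value is None or math.isnan(float(value)):
        return 0.5
    return max(0.0, min(1.0, float(value)))


def to_pitch_coordinates(x_norm, y_norm, pitch_dimensions, flip=False, mirror=False):
    """Convert normalised [0, 1] coordinates to centred pitch coordinates.

    flip:   invert the normalised values (playing direction for this half).
    mirror: negate the result (the event belongs to the away team).
    """
    x_norm, y_norm = _clean(x_norm), _clean(y_norm)
    if flip:
        x_norm, y_norm = 1.0 - x_norm, 1.0 - y_norm
    x = x_norm * pitch_dimensions[0] - pitch_dimensions[0] / 2.0
    y = y_norm * pitch_dimensions[1] - pitch_dimensions[1] / 2.0
    if mirror:
        x, y = -x, -y
    return x, y


# Example call with made-up values standing in for the parser's variables.
period_id, flip_first_half, flip_second_half = 1, True, False
flip = (period_id == 1 and flip_first_half) or (period_id == 2 and flip_second_half)
x_start, y_start = to_pitch_coordinates(1.0, 0.0, (106.0, 68.0), flip=flip)
```

Each call site would then only need to work out `flip` and `mirror`, and the helper is small enough to be covered by a handful of unit tests.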

Owner

Please remove this file, and consider adding *.lnk to the .gitignore.


2 participants