Conversation

@haotian1028

No description provided.

haotianzhang and others added 2 commits April 1, 2025 01:56
- Added API folder
- Train and Test notebook for each part of the process, to break it up
@connorn-dev connorn-dev requested a review from carlosparadis May 3, 2025 23:34
- Cleaned up imports

@carlosparadis carlosparadis left a comment

Thank you for moving the code around to the api folder. Only a few minor changes. When you looked at the original file, were there pip commands somewhere?

I am not seeing anything that installs the dependencies. Could you create an env.yml file here without using the export? (see my remark on the env.yml file in the process mining project)
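For what it's worth, a hand-written env.yml (as opposed to a full `conda env export`) would look something like the sketch below. The package list here is a placeholder guess, not taken from the notebooks:

```yaml
# Hypothetical env.yml sketch -- list only the top-level packages the
# notebooks actually import, rather than exporting the full solved env.
name: project-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas
  - pip
  - pip:
      - transformers
```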

Can you combine this with test.py and rename the result to model.py?
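The merge itself would amount to something like this skeleton (function names are placeholders; keep whatever train.py and test.py already define):

```python
# model.py -- hypothetical layout after merging train.py and test.py.

def train(train_data, **hyperparams):
    """Formerly in train.py: fit the model and return it."""
    ...


def test(model, test_data):
    """Formerly in test.py: evaluate the fitted model and return metrics."""
    ...
```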

"id": "RAXtSnSK4LPr"
},
"source": [
"# Tokenlized\n",

Typo?

@carlosparadis carlosparadis May 6, 2025

rename file to tokenize_statistics.ipynb

Note to self: this notebook does not tokenize anything currently; it just adds a column of 1s associated with the tokenizer. Not sure why; will figure it out in the future.
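For reference, the effect described reads as roughly the following. This is a reconstruction of the behavior, not the notebook's actual code; the file and column names are guesses:

```python
import pandas as pd

# Instead of producing real token counts, the notebook appears to attach
# a constant column of 1s per row for the tokenizer.
df = pd.read_csv("so-dataset.csv")    # one of the source datasets
df["tokenized"] = 1                   # every row gets a 1, not an actual count
df.to_csv("so-dataset_tokenized.csv", index=False)
```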

@carlosparadis carlosparadis left a comment

We confirmed that the only files that will truly be needed in this notebook are:

  • so-dataset.csv
  • gh-dataset.csv
  • crossplatform_sf_dataset.csv

and also the one-column ones (a loading sketch follows this list):

  • so-dataset_tokenized.csv
  • gh-dataset_tokenized.csv
  • crossplatform_sf_dataset_tokenized.csv
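A minimal sketch of loading these with pandas; nothing about their columns is assumed beyond what is listed above:

```python
import pandas as pd

# The three source datasets.
so = pd.read_csv("so-dataset.csv")
gh = pd.read_csv("gh-dataset.csv")
cross = pd.read_csv("crossplatform_sf_dataset.csv")

# Their one-column tokenized counterparts.
so_tok = pd.read_csv("so-dataset_tokenized.csv")
gh_tok = pd.read_csv("gh-dataset_tokenized.csv")
cross_tok = pd.read_csv("crossplatform_sf_dataset_tokenized.csv")
```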

Note to self: the filter.py functions are not used anywhere in the original notebook, so they are not used in these 3 refactored notebooks either.

Note to self: the tokenizer.py functions are not used anywhere in the original notebook, so they are not used in these 3 refactored notebooks either.

Note to self: this notebook does not tokenize anything currently; it just adds a column of 1s associated with the tokenizer. Not sure why; will figure it out in the future.

@carlosparadis

So just to confirm: I just need your pip dependencies, if possible as an env.yml, and the api files train.py and test.py merged into model.py.

- train.py and test.py now exist in model.py
- env added for required packages
- Minor typo changes

Signed-off-by: Connor Narowetz <cnarowetz@gmail.com>
- __init__.py added
- docs for model.py added

Signed-off-by: Connor Narowetz <cnarowetz@gmail.com>
@connorn-dev

pdoc docs attached:

docs.zip
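For the record, output like this is typically generated with pdoc; assuming the package lives in api/, the invocation would be along these lines (the exact command used here isn't recorded):

```sh
pip install pdoc
pdoc ./api -o docs   # render HTML docs for the api package into docs/
```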

@carlosparadis

@connorn-dev thank you for remembering this!
