Skip to content

Add cache functionality#50

Open
pdvass wants to merge 5 commits intofabiobatalha:masterfrom
pdvass:master
Open

Add cache functionality#50
pdvass wants to merge 5 commits intofabiobatalha:masterfrom
pdvass:master

Conversation

@pdvass
Copy link

@pdvass pdvass commented Mar 21, 2023

The proposed changes are based on Crossref's REST API documentation recommendation of caching the responses.

NOTE: The cache that unittest creates is persistent. To tackle that, I have written .bat and .sh files to run develop and test from setup.py, then delete the cache created. If needed, I can provide them too.

@pdvass pdvass marked this pull request as draft April 12, 2023 09:51
@pdvass pdvass marked this pull request as ready for review April 12, 2023 09:51
@fabiobatalha
Copy link
Owner

Hello @pdvass

Thanks for the commit.

In my point of view, what we would cache is the http requests made to the Crossref API.

So, I think any cache implementation should be in the HTTPRequest class, in specific the do_http_request method.

Besides that, I'm not sure about the benefits to include a caching implementation in this library, once it is mostly used for data harvesting, and the chances to reuse or benefit from a cache is almost null, but maybe I could be wrong. Even though, it is fine for me to include a cache layer in the terms described above.

@pdvass
Copy link
Author

pdvass commented Apr 13, 2023

Hello @fabiobatalha

Thank you for your feedback.

The main motive behind this PR is that I had to create a dataset that I didn't know from the beginning the size that it should be. My approach was to wait for the responses, convert them and then save them, but it was an expensive conversion for a bigger dataset. So, by caching I was able to save the responses and then add more, but not the same, if needed. Also, I find it easier to transfer it from device to device, to not rerun everything and save time.

As of the implementation details, this is how I managed to get it working, because I needed a way to choose a backend.


def __init__(
self,
backend=None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend parameter should be the last one. Some users don't use named parameters and your commit will break their scripts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought, since all the parameters had default values, if I gave backend also a default value it wouldn't break anything. But it is something that I can change it, so OK.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave a second thought, and I understood what you are saying. You're right. I will change it as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants