-
Notifications
You must be signed in to change notification settings - Fork 33
major update: Python 3, NDB datastore, Flask, pytest #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
(they still don't actually WORK)
The email field in a slack user profile isn't necessarily trustworthy - some workspaces allow users to edit it themselves. It also doesn't necessarily match the user's Google account email. This change adds a `slack_id` attribute to User entities in the datastore, and the bot prompts users to manually configure this when they first use a /snippets command in slack. We can improve the onboarding UX in the future - I'm sure there is a way of establishing Google logins from a slack app without copy-and-paste.
Cloud NDB is strongly consistent by default now
- Use pytest as the runner so we can use its fixture decorators and monkeypatch tricks - Switch to flask's own testing features instead of webtest - Add an in-memory stub implementation of the ndb API so we don't have to involve the datastore emulator binary from the google SDK
|
@benley -- I love seeing this, thank you so much for doing it! This service has needed such an upgrade for a long time. I'm sorry I haven't reviewed it yet, I've been on a pretty long vacation. Once I have my head above water again I'll take a look. In the meantime, can you sign the Khan Academy CLA if you haven't already? https://www.khanacademy.org/r/cla |
csilvers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an amazing piece of work! Thank you so much for taking the time to do it. The new code looks really good to me.
I do have to admit, though, that I'm somewhat uncomfortable having this mondo PR that changes everything at once. Now that you've made the work to convert everything, I'm wondering if maybe there is a way to split some things into a separate PR. For instance, could the changes to add slack_id as a setting, and all the changes that come with that, be its own PR? What about the change to move to flask?
I know it's more work given that you already have this PR, and I'm on the fence about how much of a difference it would make. Let me know what you think.
"Request changes" for a bunch of small things -- nothing major, except maybe the breaking change in how we handle slack-ids. (Which is one reason it might be nice to separate that part, at least, out to its own PR.) I'm excited to see this land!
| </div> | ||
|
|
||
| <div class="user-settings-block"> | ||
| <label class="user-settings-label" for="slack-id">Slack User ID</label> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include the leading @? (I assume by user-id you mean the text you use to at-mention someone in slack, e.g. @csilvers for me.) I'm guessing not? It might be worthwhile to add some documentation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually an opaque identifier that slack assigns (e.g. U01CT7ABT0K), and you're right that it warrants an explanation! If I recall correctly I had some workflow where the slack bot informs you of your userid and tells you where to put it in the app, but there should be inline documentation here. Will do.
| def _get_user_email(uid): | ||
| """Retrieve the email address for a specific userid from the Slack Web API. | ||
| Raises ValueError if could not be retrieved. | ||
| """ | ||
| reply = _web_api('users.info', {'user': uid}) # possible ValueError | ||
| email = reply.get('user', {}).get('profile', {}).get('email', None) | ||
| if email is None: | ||
| raise ValueError('Slack user profile did not have email') | ||
| return email | ||
|
|
||
|
|
||
| def _get_user_email_cached(uid, force_refresh=False): | ||
| """Retrieve the email address for a specific user id, with a cache. | ||
| Results are stored in memcache for up to a day. | ||
| If force_refresh parameter is specified, cached data will be refreshed. | ||
| Raises ValueError if could not be retrieved. | ||
| """ | ||
| key = 'slack_profile_email_' + uid | ||
| cached_data = memcache.get(key) | ||
| if (cached_data is None) or force_refresh: | ||
| logging.debug("cache miss/refresh for slack email lookup %s", uid) | ||
| email = _get_user_email(uid) # possible ValueError | ||
| if not memcache.set(key=key, value=email, time=86400): | ||
| logging.error('memcache set failed!') | ||
| return email | ||
| else: | ||
| logging.debug("cache hit for slack email lookup %s", uid) | ||
| return cached_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you get rid of this feature for automatically getting the slack username, and instead move it to settings? Is it so you wouldn't have to deal with memcache/memorystore with python3? or are there other reasons too?
I worry about this because it means existing clients of snippets will need to update their settings to add their slack-id explicitly, making this upgrade a breaking change. It would be nice to avoid that if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into an issue at my company where slack user emails don't always match the user's google workspace email, which made me realize there is also a security concern here. From a commit message (fa6ff22):
Authenticate slack users by their slack ID, don't trust emails
The email field in a slack user profile isn't necessarily trustworthy -
some workspaces allow users to edit it themselves. It also doesn't
necessarily match the user's Google account email.This change adds a
slack_idattribute to User entities in the
datastore, and the bot prompts users to manually configure this when
they first use a /snippets command in slack.We can improve the onboarding UX in the future - I'm sure there is a way
of establishing Google logins from a slack app without copy-and-paste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think that's a reasonable change to make, but should definitely be a separate PR. I'm ok with you doing that PR first (before this one), or after, whatever is easier for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also have to think about backward compatibility: how we don't break existing users using this feature when we upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we add a toggle in AdminSettings between the old and new behavior?
- Trust Slack user email addresses?
If enabled, the Snippets server will assume that a Slack user's email address can be safely mapped to a Google account with the same address. Disable this if your users may have mismatched email addresses between Slack and Google, or if your Slack workspace permits users to arbitrarily edit their own email address field.
We could start lazily populating the User.slack_id property when the toggle is enabled, which would eliminate the need for memcache. And if the toggle is disabled, users would need to manually associate their Slack & Google accounts with the bot's instructions:
<user> /snippets list
<SnippetsBot> You don't seem to be logged in! Please configure your Slack user ID (which isU01CT7ABH0R) in the snippets server's settings page: https://snippets.example.com/settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think adding another setting for this is a fine solution! All the more reason to do it in a separate PR though :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll open another one to do that.
|
Thank you for the detailed review! I also just got back from vacation and am gradually catching up, so I will do my best to finish responding to comments in the next day or so. |
Okay actually it will be another few days before I can actually spend time on this. But it is not abandoned, I'm coming back to finish when I can. |
|
No rush at all! |
The app doesn't work on actual appengine without this!
as suggested in python docs: https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts
We're using time_machine to handle datetime in tests now.
csilvers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of good progress! I couldn't tell if you were ready for another look yet or not, but I did one anyway. There are still a few things that (I think!) remain to be done so I still have "request changes".
- Update the README to give new instructions, esp. for using nix
- Add back in
required=Truefordomains - Keep
_get_user_emailin slacklib.py, for now. I'm ok with getting rid of it, but it should be its own PR because it's going to be a bit tricky to roll out
I think that's it!
| default_private = db.BooleanProperty(default=False) # new-user default | ||
| default_markdown = db.BooleanProperty(default=True) # new-user default | ||
| default_email = db.BooleanProperty(default=True) # new-user default | ||
| domains = ndb.StringProperty(repeated=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be done.
|
I know there are still a few things left to address - I will get there soon :) |
Just to over-communicate: I'm waiting for you to give the signal before I review this again. No rush at all, just don't want you to be waiting on me while I'm waiting on you. :-) |
Noted! I will be able to continue work on this in about a week, I think. Thank you for your patience. |
|
Sorry this is still unfinished - I just wanted to say it's not abandoned, I do still intend to come back and finish this PR in the near future. |
|
Thanks for the update! No rush. |
I apologize for this being such a giant change all at once, but I found it to be impractical to do all this as incremental steps. My goal was to modernize the code just enough to use Python 3, which required updating the web framework, and Flask made for a pretty good upgrade path from webapp2. Switching to the NDB datastore was not strictly required, but that turns out to be an easy process.
Unit tests now use pytest with test fixtures from an in-memory datastore library so no external services are needed to urn tests.
Library and tool dependencies have their versions pinned using Nix. This has no effect on the appengine runtime, but it makes setting up a development environment much easier.
There should be no incompatible changes to the data models, so upgrading an existing site will theoretically be easy.
Some README updates are probably warranted. Please let me now if you see any major omissions, or if any of the changes I've made are no good; I am open to reworking things as needed.