Skip to content

Conversation

@benley
Copy link
Contributor

@benley benley commented Jun 16, 2023

I apologize for this being such a giant change all at once, but I found it to be impractical to do all this as incremental steps. My goal was to modernize the code just enough to use Python 3, which required updating the web framework, and Flask made for a pretty good upgrade path from webapp2. Switching to the NDB datastore was not strictly required, but that turns out to be an easy process.

Unit tests now use pytest with test fixtures from an in-memory datastore library so no external services are needed to urn tests.

Library and tool dependencies have their versions pinned using Nix. This has no effect on the appengine runtime, but it makes setting up a development environment much easier.

There should be no incompatible changes to the data models, so upgrading an existing site will theoretically be easy.

Some README updates are probably warranted. Please let me now if you see any major omissions, or if any of the changes I've made are no good; I am open to reworking things as needed.

  • python 3 update
  • Flask framework to replace webapp2
  • Cloud NDB instead of legacy appengine datastore
  • Associate slack user IDs with snippets user entities (don't blindly trust slack profile email addresses)
  • Email reminders
  • Use GCP cloud logging
  • http POST instead of GET for modifying state
  • Fix unit tests
  • hide/unhide/delete users (with javascript enabled)
  • hide/unhide/delete users WITHOUT javascript
  • Dependency version pinning with nix for development

benley added 30 commits November 3, 2022 20:47
(they still don't actually WORK)
The email field in a slack user profile isn't necessarily trustworthy -
some workspaces allow users to edit it themselves.  It also doesn't
necessarily match the user's Google account email.

This change adds a `slack_id` attribute to User entities in the
datastore, and the bot prompts users to manually configure this when
they first use a /snippets command in slack.

We can improve the onboarding UX in the future - I'm sure there is a way
of establishing Google logins from a slack app without copy-and-paste.
Cloud NDB is strongly consistent by default now
- Use pytest as the runner so we can use its fixture decorators and
  monkeypatch tricks
- Switch to flask's own testing features instead of webtest
- Add an in-memory stub implementation of the ndb API so we don't have
  to involve the datastore emulator binary from the google SDK
@csilvers
Copy link
Member

@benley -- I love seeing this, thank you so much for doing it! This service has needed such an upgrade for a long time.

I'm sorry I haven't reviewed it yet, I've been on a pretty long vacation. Once I have my head above water again I'll take a look.

In the meantime, can you sign the Khan Academy CLA if you haven't already? https://www.khanacademy.org/r/cla

Copy link
Member

@csilvers csilvers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an amazing piece of work! Thank you so much for taking the time to do it. The new code looks really good to me.

I do have to admit, though, that I'm somewhat uncomfortable having this mondo PR that changes everything at once. Now that you've made the work to convert everything, I'm wondering if maybe there is a way to split some things into a separate PR. For instance, could the changes to add slack_id as a setting, and all the changes that come with that, be its own PR? What about the change to move to flask?

I know it's more work given that you already have this PR, and I'm on the fence about how much of a difference it would make. Let me know what you think.

"Request changes" for a bunch of small things -- nothing major, except maybe the breaking change in how we handle slack-ids. (Which is one reason it might be nice to separate that part, at least, out to its own PR.) I'm excited to see this land!

</div>

<div class="user-settings-block">
<label class="user-settings-label" for="slack-id">Slack User ID</label>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this include the leading @? (I assume by user-id you mean the text you use to at-mention someone in slack, e.g. @csilvers for me.) I'm guessing not? It might be worthwhile to add some documentation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually an opaque identifier that slack assigns (e.g. U01CT7ABT0K), and you're right that it warrants an explanation! If I recall correctly I had some workflow where the slack bot informs you of your userid and tells you where to put it in the app, but there should be inline documentation here. Will do.

Comment on lines -75 to -106
def _get_user_email(uid):
"""Retrieve the email address for a specific userid from the Slack Web API.
Raises ValueError if could not be retrieved.
"""
reply = _web_api('users.info', {'user': uid}) # possible ValueError
email = reply.get('user', {}).get('profile', {}).get('email', None)
if email is None:
raise ValueError('Slack user profile did not have email')
return email


def _get_user_email_cached(uid, force_refresh=False):
"""Retrieve the email address for a specific user id, with a cache.
Results are stored in memcache for up to a day.
If force_refresh parameter is specified, cached data will be refreshed.
Raises ValueError if could not be retrieved.
"""
key = 'slack_profile_email_' + uid
cached_data = memcache.get(key)
if (cached_data is None) or force_refresh:
logging.debug("cache miss/refresh for slack email lookup %s", uid)
email = _get_user_email(uid) # possible ValueError
if not memcache.set(key=key, value=email, time=86400):
logging.error('memcache set failed!')
return email
else:
logging.debug("cache hit for slack email lookup %s", uid)
return cached_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you get rid of this feature for automatically getting the slack username, and instead move it to settings? Is it so you wouldn't have to deal with memcache/memorystore with python3? or are there other reasons too?

I worry about this because it means existing clients of snippets will need to update their settings to add their slack-id explicitly, making this upgrade a breaking change. It would be nice to avoid that if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into an issue at my company where slack user emails don't always match the user's google workspace email, which made me realize there is also a security concern here. From a commit message (fa6ff22):

Authenticate slack users by their slack ID, don't trust emails
The email field in a slack user profile isn't necessarily trustworthy -
some workspaces allow users to edit it themselves. It also doesn't
necessarily match the user's Google account email.

This change adds a slack_id attribute to User entities in the
datastore, and the bot prompts users to manually configure this when
they first use a /snippets command in slack.

We can improve the onboarding UX in the future - I'm sure there is a way
of establishing Google logins from a slack app without copy-and-paste.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think that's a reasonable change to make, but should definitely be a separate PR. I'm ok with you doing that PR first (before this one), or after, whatever is easier for you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also have to think about backward compatibility: how we don't break existing users using this feature when we upgrade.

Copy link
Contributor Author

@benley benley Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we add a toggle in AdminSettings between the old and new behavior?

  • Trust Slack user email addresses?
    If enabled, the Snippets server will assume that a Slack user's email address can be safely mapped to a Google account with the same address. Disable this if your users may have mismatched email addresses between Slack and Google, or if your Slack workspace permits users to arbitrarily edit their own email address field.

We could start lazily populating the User.slack_id property when the toggle is enabled, which would eliminate the need for memcache. And if the toggle is disabled, users would need to manually associate their Slack & Google accounts with the bot's instructions:

<user> /snippets list
<SnippetsBot> You don't seem to be logged in! Please configure your Slack user ID (which is U01CT7ABH0R) in the snippets server's settings page: https://snippets.example.com/settings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think adding another setting for this is a fine solution! All the more reason to do it in a separate PR though :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll open another one to do that.

@benley
Copy link
Contributor Author

benley commented Jul 11, 2023

Thank you for the detailed review! I also just got back from vacation and am gradually catching up, so I will do my best to finish responding to comments in the next day or so.

@benley
Copy link
Contributor Author

benley commented Jul 13, 2023

Thank you for the detailed review! I also just got back from vacation and am gradually catching up, so I will do my best to finish responding to comments in the next day or so.

Okay actually it will be another few days before I can actually spend time on this. But it is not abandoned, I'm coming back to finish when I can.

@csilvers
Copy link
Member

No rush at all!

Copy link
Member

@csilvers csilvers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good progress! I couldn't tell if you were ready for another look yet or not, but I did one anyway. There are still a few things that (I think!) remain to be done so I still have "request changes".

  1. Update the README to give new instructions, esp. for using nix
  2. Add back in required=True for domains
  3. Keep _get_user_email in slacklib.py, for now. I'm ok with getting rid of it, but it should be its own PR because it's going to be a bit tricky to roll out

I think that's it!

default_private = db.BooleanProperty(default=False) # new-user default
default_markdown = db.BooleanProperty(default=True) # new-user default
default_email = db.BooleanProperty(default=True) # new-user default
domains = ndb.StringProperty(repeated=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be done.

@benley
Copy link
Contributor Author

benley commented Jul 26, 2023

I know there are still a few things left to address - I will get there soon :)

@csilvers
Copy link
Member

csilvers commented Aug 5, 2023

I know there are still a few things left to address - I will get there soon :)

Just to over-communicate: I'm waiting for you to give the signal before I review this again. No rush at all, just don't want you to be waiting on me while I'm waiting on you. :-)

@benley
Copy link
Contributor Author

benley commented Aug 9, 2023

I know there are still a few things left to address - I will get there soon :)

Just to over-communicate: I'm waiting for you to give the signal before I review this again. No rush at all, just don't want you to be waiting on me while I'm waiting on you. :-)

Noted! I will be able to continue work on this in about a week, I think. Thank you for your patience.

@benley
Copy link
Contributor Author

benley commented Sep 22, 2023

Sorry this is still unfinished - I just wanted to say it's not abandoned, I do still intend to come back and finish this PR in the near future.

@csilvers
Copy link
Member

Thanks for the update! No rush.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants