- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 102
feature/handle-similar-messages-as-scam #1292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
2842442    to
    c160a4d      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, thanks! I got some remarks regarding readability mostly 👍
        
          
                application/src/main/java/org/togetherjava/tjbot/config/MessageInfo.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/config/MessageInfo.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/config/ScamBlockerConfig.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                application/src/main/java/org/togetherjava/tjbot/features/moderation/scam/ScamBlocker.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      …ngs and refactored existing code
c160a4d    to
    70375f7      
    Compare
  
    | 
 | 
| if (alreadyFlaggedUsers.contains(userId)) { | ||
| return true; | ||
| } | ||
| if (shouldIgnore(event.getMessage())) { | ||
| return false; | ||
| } | ||
| String hash = addToMessageCache(event).messageHash(); | ||
| if (hasPostedTooManySimilarMessages(userId, hash)) { | ||
| alreadyFlaggedUsers.add(userId); | ||
| return true; | ||
| } else { | ||
| return false; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure if that alreadyFlaggedUsers user thing really works.. the bot has a really long uptime usually, sometimes many months.
the way i read the code is that if a user got flagged once, they will then be immune to this check until the bot is restarted again. u could argue that its probably not an issue in practice but it smells a bit. you would need a time based cleanup for alreadyFlaggedUsers as well.
at which point u have like 30 lines of code dealing with what caffeeine gets done in one line, i guess.
private final Cache<String, Instant> userIdToAskedAtCache = Caffeine.newBuilder()
  .maximumSize(1_000)
  .expireAfterWrite(Duration.of(10, ChronoUnit.SECONDS))
  .build();the interface is essentially that of a Map, so you have various get and put variants. internally its a LRU map, so it kicks out the entries that werent touched the longest when it reaches the max limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do i make it a set ? if the cache already handles he cleanup, then storing the instant is pointless.
edit: i saw your other comment, i'll see that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caffeine only has a Map. sometimes in the past we also only needed a Set and then mapped it to something. for example to itself, i.e. key == value.
| * @param text the UTF 8 text to hash | ||
| * @return the computed hash | ||
| */ | ||
| public static byte[] hashUTF8(String method, String text) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: should probably be renamed into hashUtf8 for better camelCase readability
| private MessageInfo addToMessageCache(MessageReceivedEvent event) { | ||
| long userId = event.getAuthor().getIdLong(); | ||
| long channelId = event.getChannel().getIdLong(); | ||
| String messageHash = getHash(event.getMessage()); | ||
| Instant timestamp = event.getMessage().getTimeCreated().toInstant(); | ||
| MessageInfo messageInfo = new MessageInfo(userId, channelId, messageHash, timestamp); | ||
| messageCache.add(messageInfo); | ||
| return messageInfo; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id still refactor this and move all the stuff dealing with MessageInfo into the record. so:
record MessageInfo(...) {
  static MessageInfo fromMessageEvent(MessageReceivedEvent event) {...}
}then this addToMessageCache method becomes obsolete and can be replaced with just messageCache.add(MessageInfo.fromMessageEvent(event)); at the call-site
the hash creation would then also move into the records fromMessageEvent method (or as private static helper in the record)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite like adding a unnecessary dependencies to a record, and even less logic like how to create a hash or how to create it from an event imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mh... but its a factory to create instances of this record. so imo this is exactly the place it should be. what do others here think?
| * @param event the message event | ||
| * @return true if the user spammed the message in several channels, false otherwise | ||
| */ | ||
| public boolean doSimilarMessageCheck(MessageReceivedEvent event) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could u do some reordering so the flow reads top to bottom. this method would then be at the top, as entry-point. and the private methods then below it in the order they are used. the runRoutine can stay at the very bottom, i guess, since its a detached flow.
| "suspiciousAttachmentNamePattern": "(image|\\d{1,2})\\.[^.]{0,5}" | ||
| "suspiciousAttachmentNamePattern": "(image|\\d{1,2})\\.[^.]{0,5}", | ||
| "maxAllowedSimilarMessages": 2, | ||
| "similarMessagesWindow": 1, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing time unit. please add it to the name.
| "maxAllowedSimilarMessages": 2, | ||
| "similarMessagesWindow": 1, | ||
| "similarMessageLengthIgnore": 10, | ||
| "similarMessagesWhitelist": [] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think we need that, please remove.
the UX for it for maintainers would be really weird, having to bloat the config with 100 character long messages and whatnot. and then it didnt work bc of a smiley or other stuff that ends up slightly different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 4? or just the whitelist ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, the preview didnt turn out the way i wanted. i was referring to the message whitelist.
| isSafe = false; | ||
| } | ||
|  | ||
| if (isSafe && similarMessagesDetector.doSimilarMessageCheck(event)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doSimilarMessageCheck should be renamed. maybe hasPostedSimilarMessageTooOften(event)
if you dont want it to start with a question-word bc of the side effects, maybe handleHasPostedSimilarMessagesTooOften(event).
but the name should clarify what the boolean it returns means and how it fits into the logic of ScamBlocker
| /** | ||
| * Has to be called often to clear the cache. | ||
| */ | ||
| public void runRoutine() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to clearCache or cleanupCache or something.
| return similarMessageCount > scamBlockerConfig.getMaxAllowedSimilarMessages(); | ||
| } | ||
|  | ||
| private boolean isObsolete(MessageInfo messageInfo) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to isOutdated or isExpired
| return true; | ||
| } | ||
| return scamBlockerConfig.getSimilarMessagesWhitelist() | ||
| .contains(message.getContentRaw().toLowerCase()); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.toLowerCase needs Locale.US



From in #1283 with those differences: