Analyzer suggestion for duplicate test methods #6973
src/Analyzers/MSTest.Analyzers/DoNotDuplicateTestMethodAnalyzer.cs
Outdated
```csharp
    context => CollectTestMethod(context, testClassAttributeSymbol, testMethodAttributeSymbol, testMethodsFound),
    SymbolKind.Method);

compilationContext.RegisterCompilationEndAction(
```
Instead of a compilation end action, could this be a symbol start/end analyzer? First RegisterSymbolStartAction for named types, then RegisterOperationBlockAction to analyze the individual methods, and finally RegisterSymbolEndAction to do the final analysis of the collected methods.
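The symbol start/end shape suggested above could look roughly like the following. This is a minimal sketch, not the PR's code: the class name, the diagnostic ID `SKETCH0001`, and the fingerprint (a pre-order list of `OperationKind` values) are hypothetical placeholders. It assumes a Roslyn 4.x package that exposes `IOperation.ChildOperations`, and it only matches `TestMethodAttribute` exactly (derived attributes are ignored). Because it walks `IOperation` trees rather than reflecting over syntax nodes, the same code applies to C# and VB.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

// Hypothetical sketch: ID, title and fingerprint are placeholders, not the PR's implementation.
[DiagnosticAnalyzer(LanguageNames.CSharp, LanguageNames.VisualBasic)]
public sealed class DuplicateTestMethodSketchAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "SKETCH0001",
        title: "Possible duplicate test method",
        messageFormat: "Test method '{0}' looks like a duplicate of '{1}'",
        category: "Design",
        defaultSeverity: DiagnosticSeverity.Info,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.RegisterCompilationStartAction(compilationContext =>
        {
            INamedTypeSymbol? testMethodAttribute = compilationContext.Compilation
                .GetTypeByMetadataName("Microsoft.VisualStudio.TestTools.UnitTesting.TestMethodAttribute");
            if (testMethodAttribute is null)
            {
                return;
            }

            // One start/end scope per type: the collected state only lives while that type is analyzed.
            compilationContext.RegisterSymbolStartAction(typeContext =>
            {
                var fingerprints = new ConcurrentBag<(IMethodSymbol Method, string Fingerprint)>();

                // Runs once per method body in the type (possibly concurrently, hence ConcurrentBag).
                typeContext.RegisterOperationBlockAction(blockContext =>
                {
                    if (blockContext.OwningSymbol is not IMethodSymbol method
                        || !method.GetAttributes().Any(a => SymbolEqualityComparer.Default.Equals(a.AttributeClass, testMethodAttribute)))
                    {
                        return;
                    }

                    var builder = new StringBuilder();
                    foreach (IOperation block in blockContext.OperationBlocks)
                    {
                        AppendKinds(block, builder);
                    }

                    fingerprints.Add((method, builder.ToString()));
                });

                // Runs after every member of the type has been analyzed.
                typeContext.RegisterSymbolEndAction(endContext =>
                {
                    foreach (var group in fingerprints.GroupBy(f => f.Fingerprint).Where(g => g.Count() > 1))
                    {
                        var ordered = group.OrderBy(f => f.Method.Name, StringComparer.Ordinal).ToList();
                        for (int i = 1; i < ordered.Count; i++)
                        {
                            endContext.ReportDiagnostic(Diagnostic.Create(
                                Rule, ordered[i].Method.Locations.FirstOrDefault(), ordered[i].Method.Name, ordered[0].Method.Name));
                        }
                    }
                });
            }, SymbolKind.NamedType);
        });
    }

    // Placeholder fingerprint: pre-order list of operation kinds (ignores names and literal values).
    private static void AppendKinds(IOperation operation, StringBuilder builder)
    {
        builder.Append((int)operation.Kind).Append(',');
        foreach (IOperation child in operation.ChildOperations)
        {
            AppendKinds(child, builder);
        }
    }
}
```

Grouping by an exact key in the end action keeps the work per type roughly linear in the number of methods, instead of comparing every pair.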
```csharp
private static SyntaxNode? GetMethodBody(SyntaxNode methodNode)
{
    // Try block body first
    System.Reflection.PropertyInfo? bodyProperty = methodNode.GetType().GetProperty("Body");
```
Using reflection here doesn't seem like a good idea. It might also fail for VB, which the analyzer claims to support (VB method blocks expose their statements through `Statements`, not a `Body` property).
```csharp
for (int i = 0; i < methods.Count; i++)
{
    for (int j = i + 1; j < methods.Count; j++)
    {
```
This looks like a very expensive computation. There is probably something we can do better here.
@CyrusNajmabadi, could you suggest something here, please? This PR attempts to implement #6813.
Yup, I have major concerns here:
- n^2 in the number of methods, which can be huge. We definitely have test classes with thousands of items.
- Edit distance on the string representation, which can also be huge and is itself quadratic in the string lengths.
- Major allocations for each pair. Things like `int[,] d = new int[s1.Length + 1, s2.Length + 1];` can be multi-MB allocations per pair.
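The per-pair allocation in the last bullet is the easiest of the three to reduce, because Levenshtein distance only ever reads the previous row of the matrix. A minimal sketch (the `EditDistance` class is hypothetical, not from the PR) that keeps two rows sized to the shorter string:

```csharp
using System;

public static class EditDistance
{
    // Levenshtein distance using two rows: O(min(m, n)) memory instead of an
    // (m + 1) x (n + 1) int[,] matrix, which matters when the strings are whole method bodies.
    public static int Levenshtein(string s1, string s2)
    {
        // Make s2 the shorter string so the rows are as small as possible.
        if (s1.Length < s2.Length)
        {
            (s1, s2) = (s2, s1);
        }

        int[] previous = new int[s2.Length + 1];
        int[] current = new int[s2.Length + 1];
        for (int j = 0; j <= s2.Length; j++)
        {
            previous[j] = j; // distance from the empty prefix of s1
        }

        for (int i = 1; i <= s1.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= s2.Length; j++)
            {
                int substitutionCost = s1[i - 1] == s2[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1), // deletion, insertion
                    previous[j - 1] + substitutionCost);           // substitution or match
            }

            // The row just computed becomes "previous" for the next iteration.
            (previous, current) = (current, previous);
        }

        return previous[s2.Length];
    }
}
```

This only removes the large `int[,]` allocation. Each comparison still takes O(m·n) time, and the number of pairs is still quadratic in the number of methods, so the first two concerns remain.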
I don't know of a great solution here off the top of my head, but I think some research needs to be done into state-of-the-art duplication detection, to see what might apply here.
As an example, if each method were fingerprinted somehow (maybe with a vector embedding), then only the methods with the closest embeddings would be compared, etc.
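The simplest form of the fingerprinting idea is to bucket methods by a cheap key and only compare methods that share a bucket. The sketch below swaps in an exact, whitespace-insensitive text fingerprint in place of the vector embedding mentioned above; the `FingerprintBuckets` class and its members are hypothetical, and method bodies are plain strings standing in for whatever representation an analyzer would actually use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FingerprintBuckets
{
    // Hypothetical fingerprint: the body text with all whitespace removed.
    public static string Fingerprint(string body) => Regex.Replace(body, @"\s+", string.Empty);

    // Groups methods by fingerprint in a single pass. Only methods sharing a bucket
    // are candidates for comparison, instead of all n * (n - 1) / 2 pairs.
    public static List<List<string>> CandidateGroups(IReadOnlyDictionary<string, string> bodiesByMethod)
    {
        return bodiesByMethod
            .GroupBy(kv => Fingerprint(kv.Value), StringComparer.Ordinal)
            .Where(g => g.Count() > 1)
            .Select(g => g.Select(kv => kv.Key).OrderBy(name => name, StringComparer.Ordinal).ToList())
            .ToList();
    }
}
```

Exact buckets only catch near-verbatim copies. Catching merely similar bodies would need a locality-sensitive scheme (for example MinHash over token shingles), where similar inputs land in the same bucket with high probability; choosing such a scheme is the research the comment asks for.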
…r.cs Co-authored-by: Youssef Victor <youssefvictor00@gmail.com>
I observed another property. I will be changing the DiagnosticId to
@AtolagbeMuiz, based on the comment from @CyrusNajmabadi, we won't merge the PR as-is; we would need to do some research on state-of-the-art algorithms for distance computation. Given we don't have much time, I will go ahead and close this PR. Thanks for the time invested!
This pull request fixes #6813
This new analyzer implementation allows similar unit test methods with different names within the same test class to be detected as duplicates.
NB: Because this is a new analyzer feature, I declared a new DiagnosticId value of MSTEST0059 as DoNotDuplicateTestMethodRuleId.