@AtolagbeMuiz
Contributor

This pull request fixes #6813

This new analyzer implementation allows similar unit test methods with different names within the same test class to be detected as duplicates.

NB: Because this is a new analyzer feature, I declared a new DiagnosticId value, MSTEST0059, as DoNotDuplicateTestMethodRuleId.

context => CollectTestMethod(context, testClassAttributeSymbol, testMethodAttributeSymbol, testMethodsFound),
SymbolKind.Method);

compilationContext.RegisterCompilationEndAction(
Member

Instead of using a compilation end action, could this be a symbol start/end analyzer? You would first call RegisterSymbolStartAction for named types, then RegisterOperationBlockAction to analyze the individual methods, and then RegisterSymbolEndAction to do the final analysis of the collected methods.
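
For illustration, the registration pattern suggested here could be sketched roughly as follows. This is not the PR's code: the class name, the `SKETCH0001` descriptor, and the `Fingerprint` helper are hypothetical placeholders, the test-attribute check is omitted, and it assumes a reference to the Microsoft.CodeAnalysis package in a version that provides `RegisterSymbolStartAction`.

```csharp
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

[DiagnosticAnalyzer(LanguageNames.CSharp, LanguageNames.VisualBasic)]
public sealed class DuplicateTestMethodSketchAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical descriptor for illustration; not the ID used in the PR.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        "SKETCH0001", "Possible duplicate test method",
        "Method '{0}' has the same operation shape as '{1}'",
        "Design", DiagnosticSeverity.Info, isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        context.RegisterSymbolStartAction(startContext =>
        {
            // State is scoped to one named type, so nothing waits for compilation end.
            var collected = new ConcurrentBag<(IMethodSymbol Method, int Fingerprint)>();

            startContext.RegisterOperationBlockAction(blockContext =>
            {
                // A real analyzer would also check for the test method attribute here.
                if (blockContext.OwningSymbol is IMethodSymbol method
                    && method.MethodKind == MethodKind.Ordinary)
                {
                    collected.Add((method, Fingerprint(blockContext.OperationBlocks)));
                }
            });

            startContext.RegisterSymbolEndAction(endContext =>
            {
                // Final analysis: only methods sharing a fingerprint are reported.
                var groups = collected
                    .Where(m => m.Method.Locations.Length > 0)
                    .GroupBy(m => m.Fingerprint);
                foreach (var group in groups)
                {
                    var ordered = group
                        .OrderBy(m => m.Method.Locations[0].SourceSpan.Start)
                        .ToList();
                    for (int i = 1; i < ordered.Count; i++)
                    {
                        endContext.ReportDiagnostic(Diagnostic.Create(
                            Rule, ordered[i].Method.Locations[0],
                            ordered[i].Method.Name, ordered[0].Method.Name));
                    }
                }
            });
        }, SymbolKind.NamedType);
    }

    // Placeholder fingerprint: hashes the sequence of IOperation kinds, which is
    // language-agnostic and needs no reflection. Hash collisions are possible.
    private static int Fingerprint(ImmutableArray<IOperation> blocks)
    {
        int hash = 17;
        foreach (IOperation block in blocks)
        {
            foreach (IOperation operation in block.DescendantsAndSelf())
            {
                hash = unchecked((hash * 31) + (int)operation.Kind);
            }
        }
        return hash;
    }
}
```

Working on `IOperation` rather than syntax nodes is a deliberate choice in this sketch: the same traversal then applies to both C# and VB method bodies.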

private static SyntaxNode? GetMethodBody(SyntaxNode methodNode)
{
    // Try block body first
    System.Reflection.PropertyInfo? bodyProperty = methodNode.GetType().GetProperty("Body");
Member

Using reflection here doesn't seem like a good idea. It might also fail for VB, which the analyzer claims to support.

Comment on lines +200 to +203
for (int i = 0; i < methods.Count; i++)
{
    for (int j = i + 1; j < methods.Count; j++)
    {
Member

This looks like a very expensive computation. There is probably something better we can do here.

Member

@CyrusNajmabadi could you suggest something here, please? This PR attempts to implement #6813.

yup. i have major concerns here.

  1. n^2 in the number of methods, which can be huge. We def have test classes with thousands of items.
  2. edit distance on the string representation, which can also be huge and is itself n^2.
  3. major allocs for each pair. Things like int[,] d = new int[s1.Length + 1, s2.Length + 1]; can be multi-MB allocs per pair.
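
To make point 3 concrete, here is a minimal sketch of an edit distance that keeps two rolling rows instead of a full matrix and gives up once a caller-supplied bound is exceeded. The `EditDistanceSketch` class, the `BoundedLevenshtein` name, and the `maxDistance` parameter are illustrative assumptions, not anything from the PR. It only reduces the per-pair allocation; the time per pair is still quadratic, and points 1 and 2 remain.

```csharp
using System;

public static class EditDistanceSketch
{
    // Levenshtein distance with two rows sized to the shorter string,
    // instead of a (|a|+1) x (|b|+1) matrix.
    // Returns -1 when the distance is greater than maxDistance.
    public static int BoundedLevenshtein(string a, string b, int maxDistance)
    {
        if (a.Length < b.Length)
        {
            (a, b) = (b, a); // make b the shorter string
        }

        // The length difference is a lower bound on the distance.
        if (a.Length - b.Length > maxDistance)
        {
            return -1;
        }

        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            int rowMin = current[0];
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
                rowMin = Math.Min(rowMin, current[j]);
            }

            // Every alignment passes through this row, so the final
            // distance cannot be smaller than the row minimum.
            if (rowMin > maxDistance)
            {
                return -1;
            }

            (previous, current) = (current, previous);
        }

        int distance = previous[b.Length];
        return distance <= maxDistance ? distance : -1;
    }
}
```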

I don't know of a great solution here off the top of my head. But i think some research needs to be done into state-of-the-art duplication detection, to see what might apply here.

As an example, if each method was fingerprinted somehow (maybe a vector embedding), then only the methods with the closest embedding values would be compared, etc.
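
The simplest form of the fingerprinting idea is an exact fingerprint: normalize each body, bucket methods by the result in a single pass, and look only at methods that share a bucket. The sketch below assumes method bodies are already available as strings; `FingerprintSketch`, `Normalize`, and `FindCandidateGroups` are hypothetical names. It finds only bodies that are identical after whitespace removal. Near-duplicates would need a similarity-preserving fingerprint, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class FingerprintSketch
{
    // Drops all whitespace so formatting differences do not change the key.
    // This is deliberately crude: it would also merge "int x" and "intx".
    public static string Normalize(string body)
    {
        var builder = new StringBuilder(body.Length);
        foreach (char c in body)
        {
            if (!char.IsWhiteSpace(c))
            {
                builder.Append(c);
            }
        }

        return builder.ToString();
    }

    // One pass over the methods: O(n) dictionary insertions instead of
    // comparing every pair. Returns only the buckets with two or more methods.
    public static List<List<string>> FindCandidateGroups(
        IEnumerable<(string Name, string Body)> methods)
    {
        var buckets = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (var (name, body) in methods)
        {
            string key = Normalize(body);
            if (!buckets.TryGetValue(key, out List<string> names))
            {
                names = new List<string>();
                buckets[key] = names;
            }

            names.Add(name);
        }

        return buckets.Values.Where(group => group.Count > 1).ToList();
    }
}
```

Any more expensive comparison, such as an edit distance, would then run only within a bucket rather than across every pair in the class.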

@AtolagbeMuiz
Contributor Author

AtolagbeMuiz commented Dec 14, 2025

I observed that another rule ID, DoNotUseParallelizeAndDoNotParallelizeTogetherRuleId, already uses the value MSTEST0059.

I will be changing the DiagnosticId to MSTEST0062 @Youssef1313

@Evangelink
Member

@AtolagbeMuiz based on the comment from @CyrusNajmabadi, we won't merge the PR as-is; we would first need to research state-of-the-art algorithms for computing distance. Given that we don't have much time, I will go ahead and close this PR.

Thanks for the time invested!

@Evangelink Evangelink closed this Dec 15, 2025
Development

Successfully merging this pull request may close these issues.

[Analyzer suggestion]: Duplicate test methods

4 participants