The Roslyn Code Analyzer is a command-line tool built on the .NET Compiler Platform (Roslyn). It analyzes C# solutions (`.sln`) or projects (`.csproj`) and generates a detailed graph representation of the codebase. The graph is made up of nodes (code elements) and edges (relationships) and is emitted as a JSON object. This makes it well suited for downstream analysis tools, particularly Retrieval-Augmented Generation (RAG) systems.
By parsing the syntax and leveraging semantic analysis, the tool extracts rich information including namespaces, types (classes, interfaces, enums, structs), members (methods, properties, fields), documentation comments, code snippets, and structural relationships like inheritance and containment.
It is primarily meant to be consumed by https://github.com/devfire/lightrag-csharp for upload to Neo4j. It can also be used with other graph databases (AWS Neptune, etc.).
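For readers who load the output into Neo4j without lightrag-csharp, here is a minimal sketch (not part of this tool or of lightrag-csharp). It turns the analyzer's JSON (nodes keyed by `id`, edges with `sourceId`, `targetId` and `type`) into parameterized Cypher statements.

Several choices are assumptions:
- The `CodeNode` label and the helper name `build_cypher_batches` are hypothetical.
- One query is issued per edge type, because Cypher does not accept a relationship type as a query parameter.

```python
import re
from collections import defaultdict

# Relationship types cannot be passed as Cypher parameters, so they are
# interpolated into the query text; restrict them to a safe character set first.
_REL_TYPE = re.compile(r"^[A-Z][A-Z0-9_]*$")

# Hypothetical label "CodeNode"; every JSON property is copied onto the node.
NODE_QUERY = (
    "UNWIND $nodes AS n "
    "MERGE (c:CodeNode {id: n.id}) "
    "SET c += n"
)


def build_cypher_batches(graph: dict) -> list[tuple[str, dict]]:
    """Return (query, params) pairs that upsert all nodes, then all edges."""
    batches = [(NODE_QUERY, {"nodes": graph["nodes"]})]

    # Group edges by relationship type so each type gets its own query.
    edges_by_type = defaultdict(list)
    for edge in graph["edges"]:
        edges_by_type[edge["type"]].append(
            {"source": edge["sourceId"], "target": edge["targetId"]}
        )

    for rel_type, edges in sorted(edges_by_type.items()):
        if not _REL_TYPE.match(rel_type):
            raise ValueError(f"unsafe relationship type: {rel_type!r}")
        # MATCH (not MERGE) on the endpoints: an edge whose source or target
        # is missing from the node list is silently skipped by Neo4j.
        query = (
            "UNWIND $edges AS e "
            "MATCH (s:CodeNode {id: e.source}) "
            "MATCH (t:CodeNode {id: e.target}) "
            f"MERGE (s)-[:{rel_type}]->(t)"
        )
        batches.append((query, {"edges": edges}))
    return batches
```

Each `(query, params)` pair can then be passed to a driver session, for example `session.run(query, params)` with the official Neo4j Python driver. Nodes are sent first so that the edge queries can match their endpoints.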
Key features:
- Roslyn-Powered Analysis: Uses the official .NET Compiler Platform (Roslyn) for accurate syntax and semantic analysis of C# code.
- Graph Representation: Models the codebase as a graph of `CodeNode` objects (representing code elements) and `CodeEdge` objects (representing relationships).
- Rich Metadata Extraction: Captures fully qualified names, element types, file locations, XML documentation comments, code snippets, and method signatures.
- Relationship Mapping: Identifies and records relationships such as `CONTAINS`, `INHERITS_FROM`, and `IMPLEMENTS`.
- Solution & Project Support: Can analyze both individual `.csproj` files and entire `.sln` solutions.
- JSON Output: Exports the code graph in a clean, machine-readable JSON format (using camelCase property names).
- Targeted for RAG: Designed to produce structured data suitable for feeding into RAG pipelines for code understanding and generation tasks.
The analysis process follows these steps:
- Workspace Loading (`Program.cs`): The application takes the path to a `.sln` or `.csproj` file as a command-line argument. It uses `Microsoft.CodeAnalysis.MSBuild.MSBuildWorkspace` to load the specified solution or project, including referenced projects and metadata.
- Project Iteration (`Program.cs`): It iterates through each project in the loaded workspace. Projects can be analyzed in parallel for efficiency.
- Document Processing (`Program.cs`): For each C# document in a project, it retrieves the `SyntaxTree` and `SemanticModel`.
- Code Traversal (`CodeStructureWalker.cs`): An instance of `CodeStructureWalker` (a `CSharpSyntaxWalker`) traverses the syntax tree of each document.
- Node & Edge Creation (`CodeStructureWalker.cs`): As the walker visits syntax nodes (such as `ClassDeclarationSyntax` and `MethodDeclarationSyntax`), it uses the `SemanticModel` to obtain detailed symbol information. It creates:
  - `CodeNode` objects for namespaces, classes, interfaces, methods, properties, fields, enums, and enum members. Each node stores metadata (ID, type, name, location, comment, snippet, signature).
  - `CodeEdge` objects for relationships such as `CONTAINS` (e.g., a class contains a method), `INHERITS_FROM` (class inheritance), and `IMPLEMENTS` (a class or struct implementing an interface).
- Aggregation (`Program.cs`): The nodes and edges collected from all documents and projects are combined into one node list and one edge list.
- JSON Serialization (`Program.cs`, `DataModel.cs`): The aggregated nodes and edges are wrapped in an `AnalysisResult` object and serialized to JSON using `System.Text.Json`. The JSON is written to standard output (stdout). Progress and error messages are written to standard error (stderr).
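To illustrate the traversal and `CONTAINS`-edge steps above, here is a simplified model written in Python rather than the tool's C# code.

It rests on three assumptions:
- A toy tree of `(kind, name, children)` tuples stands in for Roslyn syntax nodes.
- The hypothetical `walk` function tracks the enclosing element on a stack. Each visited element therefore gets a dotted ID and a `CONTAINS` edge from its parent.
- The real `CodeStructureWalker` works on Roslyn syntax trees and the `SemanticModel`. It may derive IDs and containers from symbols instead, and it produces method IDs that include parameter types, as in the JSON example. `INHERITS_FROM` and `IMPLEMENTS` need semantic information and are not modeled here.

```python
def walk(element, graph, parents=None):
    """Depth-first walk that records one node per element plus CONTAINS edges.

    element is a hypothetical (kind, name, children) tuple; graph is a dict
    with "nodes" and "edges" lists, mirroring the analyzer's JSON shape.
    """
    parents = [] if parents is None else parents
    kind, name, children = element
    # Qualify the name with the enclosing element's ID, e.g. "MyNamespace.MyClass".
    node_id = f"{parents[-1]}.{name}" if parents else name
    graph["nodes"].append({"id": node_id, "type": kind, "name": name})
    if parents:
        # The innermost enclosing element contains this one.
        graph["edges"].append(
            {"sourceId": parents[-1], "targetId": node_id, "type": "CONTAINS"}
        )
    parents.append(node_id)  # entering this element's scope
    for child in children:
        walk(child, graph, parents)
    parents.pop()  # leaving this element's scope
    return graph


# Toy input standing in for a parsed C# file.
tree = ("Namespace", "MyNamespace", [
    ("Class", "MyClass", [
        ("Method", "MyMethod", []),
        ("Property", "Name", []),
    ]),
])

graph = walk(tree, {"nodes": [], "edges": []})
```

Keeping the parent stack in the walker means each element only needs to know its immediate container to produce correct `CONTAINS` edges.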
The tool outputs a single JSON object to stdout. The structure is defined in `DataModel.cs`:
```json
{
  "nodes": [
    {
      "id": "string (Fully Qualified Name)",
      "type": "string (e.g., Class, Method, Namespace)",
      "name": "string (Simple Name)",
      "filePath": "string",
      "startLine": number,
      "endLine": number,
      "comment": "string | null (XML Doc Summary)",
      "signature": "string | null (Method Signature)",
      "codeSnippet": "string | null (Source Code Text)"
    }
    // ... more nodes
  ],
  "edges": [
    {
      "sourceId": "string (ID of source node)",
      "targetId": "string (ID of target node)",
      "type": "string (e.g., CONTAINS, INHERITS_FROM, IMPLEMENTS)"
    }
    // ... more edges
  ]
}
```
Example snippet:
```json
{
  "nodes": [
    {
      "id": "MyNamespace.MyClass",
      "type": "Class",
      "name": "MyClass",
      "filePath": "/path/to/MyClass.cs",
      "startLine": 10,
      "endLine": 55,
      "comment": "This is a sample class.",
      "signature": null,
      "codeSnippet": "public class MyClass : IMyInterface\n{\n // ... members ...\n}"
    },
    {
      "id": "MyNamespace.MyClass.MyMethod(int)",
      "type": "Method",
      "name": "MyMethod",
      "filePath": "/path/to/MyClass.cs",
      "startLine": 25,
      "endLine": 35,
      "comment": "Performs an important task.",
      "signature": "MyMethod(int value)",
      "codeSnippet": "public void MyMethod(int value)\n {\n // ... implementation ...\n }"
    }
  ],
  "edges": [
    {
      "sourceId": "MyNamespace.MyClass",
      "targetId": "MyNamespace.MyClass.MyMethod(int)",
      "type": "CONTAINS"
    },
    {
      "sourceId": "MyNamespace.MyClass",
      "targetId": "MyNamespace.IMyInterface",
      "type": "IMPLEMENTS"
    }
  ]
}
```
The primary goal of this tool is to generate structured data about a codebase that Retrieval-Augmented Generation (RAG) systems can use effectively. The JSON output provides:
- Nodes: Discrete units of code (classes, methods) with their source code (`codeSnippet`) and documentation (`comment`).
- Edges: Relationships between these units, providing context about inheritance, implementation, and structure.
With this graph, a RAG system can retrieve relevant code snippets together with their context in the larger codebase. That context can make code generation or analysis more accurate and context-aware.
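Below is a sketch of the retrieval side. The helper `build_context` is hypothetical and not part of this tool. It takes the analyzer's JSON and a node ID and assembles a plain-text context block containing:
- the node's location, summary comment, signature and snippet;
- its direct neighbours in the graph: the element that `CONTAINS` it and any `INHERITS_FROM` or `IMPLEMENTS` targets.

How a real RAG pipeline chooses the starting node (for example, by embedding similarity over snippets) is outside this sketch.

```python
def build_context(graph: dict, node_id: str) -> str:
    """Assemble a plain-text context block for one node and its neighbours."""
    nodes = {n["id"]: n for n in graph["nodes"]}
    node = nodes[node_id]
    lines = [
        f"{node['type']} {node['id']} "
        f"({node['filePath']}:{node['startLine']}-{node['endLine']})"
    ]
    if node.get("comment"):
        lines.append(f"Summary: {node['comment']}")
    if node.get("signature"):
        lines.append(f"Signature: {node['signature']}")
    for edge in graph["edges"]:
        if edge["targetId"] == node_id and edge["type"] == "CONTAINS":
            lines.append(f"Contained in: {edge['sourceId']}")
        elif edge["sourceId"] == node_id and edge["type"] in ("INHERITS_FROM", "IMPLEMENTS"):
            # The target may not appear in "nodes", so report it by ID only.
            lines.append(f"{edge['type']}: {edge['targetId']}")
    if node.get("codeSnippet"):
        lines.append(node["codeSnippet"])
    return "\n".join(lines)
```

For the example snippet above, `build_context(graph, "MyNamespace.MyClass")` would list the class's summary, its `IMPLEMENTS: MyNamespace.IMyInterface` edge, and its source text. Child members are left out to keep the context small.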
Prerequisites:
- .NET 8.0 SDK or later.
Installation:
- Clone the repository:
  ```bash
  git clone https://github.com/your-username/RoslynCodeAnalyzer.git # Replace with actual URL if available
  cd RoslynCodeAnalyzer
  ```
- Restore dependencies:
  ```bash
  dotnet restore
  ```
Build the project using the .NET CLI:
```bash
dotnet build
```
This command compiles the project. Use `-c Release` for a release build:
```bash
dotnet build -c Release
```
Run the analyzer from the command line, providing the path to the solution or project file as an argument:
```bash
# Analyze a solution file
dotnet run --project ./RoslynCodeAnalyzer.csproj -- /path/to/your/solution.sln > output.json

# Analyze a project file
dotnet run --project ./RoslynCodeAnalyzer.csproj -- /path/to/your/project.csproj > output.json

# Using the built executable (e.g., after 'dotnet build -c Release')
./bin/Release/net8.0/RoslynCodeAnalyzer /path/to/your/solution.sln > output.json
```
- Replace `/path/to/your/solution.sln` or `/path/to/your/project.csproj` with the actual path to your target file.
- The `>` redirects the JSON output (stdout) to a file named `output.json`.
- Progress messages and errors are still printed to the console (stderr).
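When the analyzer is driven from another program instead of a shell, the same stdout/stderr split keeps the JSON clean. The following is a minimal Python sketch.

It assumes:
- the analyzer command is passed as an argument list, such as one of the invocations above;
- a non-zero exit code signals failure (this README does not document the tool's exit codes).

`run_analyzer` is a hypothetical helper name.

```python
import json
import subprocess
import sys


def run_analyzer(command: list[str]) -> dict:
    """Run the analyzer, relay its stderr, and parse the JSON from its stdout."""
    result = subprocess.run(command, capture_output=True, encoding="utf-8")
    # Progress and error messages arrive on stderr; pass them through unchanged.
    sys.stderr.write(result.stderr)
    # Assumption: a non-zero exit code means the analysis failed.
    if result.returncode != 0:
        raise RuntimeError(f"analyzer exited with code {result.returncode}")
    return json.loads(result.stdout)
```

Usage, for example: `run_analyzer(["./bin/Release/net8.0/RoslynCodeAnalyzer", "/path/to/your/solution.sln"])`. This returns a dict with `nodes` and `edges` lists.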
Data model (`DataModel.cs`):
- `CodeNode`: Represents a code element.
  - `Id`: Fully qualified name (unique identifier).
  - `Type`: Category (e.g., "Class", "Method").
  - `Name`: Simple name of the element.
  - `FilePath`: Path to the source file.
  - `StartLine`, `EndLine`: Location within the file.
  - `Comment`: XML documentation summary.
  - `Signature`: Formal signature (primarily for methods).
  - `CodeSnippet`: The raw source code text of the element.
- `CodeEdge`: Represents a relationship between two nodes.
  - `SourceId`: ID of the node where the relationship originates.
  - `TargetId`: ID of the node where the relationship terminates.
  - `Type`: Type of relationship (e.g., "CONTAINS", "INHERITS_FROM", "IMPLEMENTS").
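Python consumers can mirror the two records as dataclasses. This is a sketch that assumes the camelCase JSON property names from the schema above. The class names copy the C# types for readability, but these are not the tool's own types, and `load_graph` and `dangling_edges` are hypothetical helpers.

The second helper reports edges whose endpoints have no matching entry in `nodes`. This can happen, for instance, if an edge points at a type defined outside the analyzed code, or in a partial output like the example snippet, where `MyNamespace.IMyInterface` is referenced but not listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeNode:
    id: str
    type: str
    name: str
    file_path: str
    start_line: int
    end_line: int
    comment: Optional[str] = None
    signature: Optional[str] = None
    code_snippet: Optional[str] = None


@dataclass
class CodeEdge:
    source_id: str
    target_id: str
    type: str


def load_graph(data: dict) -> tuple[list[CodeNode], list[CodeEdge]]:
    """Map the camelCase JSON objects onto the snake_case dataclasses."""
    nodes = [
        CodeNode(
            id=n["id"], type=n["type"], name=n["name"],
            file_path=n["filePath"], start_line=n["startLine"], end_line=n["endLine"],
            comment=n.get("comment"), signature=n.get("signature"),
            code_snippet=n.get("codeSnippet"),
        )
        for n in data["nodes"]
    ]
    edges = [CodeEdge(e["sourceId"], e["targetId"], e["type"]) for e in data["edges"]]
    return nodes, edges


def dangling_edges(nodes: list[CodeNode], edges: list[CodeEdge]) -> list[CodeEdge]:
    """Return edges whose source or target ID has no matching node."""
    known = {n.id for n in nodes}
    return [e for e in edges if e.source_id not in known or e.target_id not in known]
```

For a file on disk: `nodes, edges = load_graph(json.load(open("output.json", encoding="utf-8")))`, after `import json`.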
Potential future enhancements:
- `CALLS` Edges: Analyze method invocation expressions (`VisitInvocationExpression`) to create `CALLS` edges showing which methods call others.
- Attribute Analysis: Extract information from attributes decorating code elements.
- More Granular Snippets: Add an option for smaller, more focused code snippets (e.g., method body only).
- Configuration Options: Allow configuration via a file or command-line arguments (e.g., filtering elements, choosing output details).
Contributions are welcome! Please follow the standard fork-and-pull-request workflow, keep the code style consistent, and add tests where appropriate.
- Fork the repository.
- Create a feature branch (`git checkout -b feature/my-new-feature`).
- Commit your changes (`git commit -am 'Add some feature'`).
- Push to the branch (`git push origin feature/my-new-feature`).
- Open a Pull Request.
This project is licensed under the MIT License.