[Phase 2] Add Semantic Tokens for Advanced Highlighting #34

@vogella

Description

Implement semantic tokens in the LSP server to provide advanced syntax highlighting beyond what TextMate grammars offer. This enables context-aware highlighting for attribute references, macros, special AsciiDoc constructs, and more.

Current State

  • Syntax highlighting: a TextMate grammar handles basic syntax
  • Missing: semantic understanding (attribute references, resolved vs. unresolved, etc.)

Background

Semantic tokens provide language-aware highlighting based on semantic analysis, not just regex patterns. Examples:

  • Highlight undefined attribute references differently than defined ones
  • Different colors for broken vs. valid links
  • Macro names vs. regular text
  • Block attributes vs. inline attributes
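
For illustration, a short AsciiDoc snippet where semantic analysis adds information that a regex-based grammar cannot (the attribute names here are made up):

:project-name: Example Project

== Overview

{project-name} resolves against the definition above, while {unknown-attr}
does not and could be rendered in a warning color.

image::diagram.png[Architecture]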

Required Changes

1. Add Semantic Tokens Capability

File: AsciidocLanguageServer.java

ServerCapabilities capabilities = new ServerCapabilities();

SemanticTokensWithRegistrationOptions semanticTokensOptions = new SemanticTokensWithRegistrationOptions();

// Define token types
semanticTokensOptions.setLegend(new SemanticTokensLegend(
    Arrays.asList(
        "namespace",    // Attribute definitions
        "class",        // Headers
        "function",     // Macros (image::, include::, etc.)
        "parameter",    // Attribute references {name}
        "variable",     // Block attributes [source, java]
        "string",       // Quoted strings
        "comment",      // Comment blocks
        "keyword",      // Special keywords
        "operator",     // Delimiters
        "type"          // Types/roles
    ),
    Arrays.asList(
        "declaration",  // Attribute declarations
        "definition",   // Definitions
        "readonly",     // Built-in attributes
        "deprecated",   // Deprecated syntax
        "documentation" // Documentation blocks
    )
));

semanticTokensOptions.setFull(true);
semanticTokensOptions.setRange(false);

capabilities.setSemanticTokensProvider(semanticTokensOptions);
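
The tokenType and tokenModifiers integers used by the provider below are indices into this legend; modifiers form a bit set, so the modifier at legend index i contributes 1 << i. To avoid magic numbers in the tokenizers, the indices could be captured as constants, for example (a sketch, the constant names are ours):

// Token type indices into the legend above
private static final int TYPE_NAMESPACE = 0; // attribute definitions
private static final int TYPE_CLASS     = 1; // headers
private static final int TYPE_FUNCTION  = 2; // macros
private static final int TYPE_PARAMETER = 3; // attribute references
private static final int TYPE_VARIABLE  = 4; // block attributes

// Token modifier bit masks (bit i corresponds to legend index i)
private static final int MOD_DECLARATION = 1 << 0; // 1
private static final int MOD_DEFINITION  = 1 << 1; // 2
private static final int MOD_READONLY    = 1 << 2; // 4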

2. Implement Semantic Tokens Provider

File: AsciidocTextDocumentService.java

@Override
public CompletableFuture<SemanticTokens> semanticTokensFull(SemanticTokensParams params) {
    String uri = params.getTextDocument().getUri();
    AsciidocDocumentModel model = documentCache.get(uri);
    
    if (model == null) {
        return CompletableFuture.completedFuture(new SemanticTokens(Collections.emptyList()));
    }
    
    List<Integer> data = new ArrayList<>();
    List<String> lines = model.getLines();
    
    // Extract defined attributes for validation
    Set<String> definedAttributes = extractDefinedAttributes(model);
    
    int prevLine = 0;
    int prevChar = 0;
    
    for (int i = 0; i < lines.size(); i++) {
        String line = lines.get(i);
        
        // Tokenize line
        List<SemanticToken> tokens = tokenizeLine(line, i, definedAttributes);
        
        // The per-line tokenizers run independently, so sort by start column;
        // LSP requires tokens in ascending document order for delta encoding
        tokens.sort(Comparator.comparingInt(t -> t.startChar));
        
        // Encode tokens in LSP format (five integers per token)
        for (SemanticToken token : tokens) {
            int deltaLine = token.line - prevLine;
            int deltaChar = (deltaLine == 0) ? (token.startChar - prevChar) : token.startChar;
            
            data.add(deltaLine);
            data.add(deltaChar);
            data.add(token.length);
            data.add(token.tokenType);
            data.add(token.tokenModifiers);
            
            prevLine = token.line;
            prevChar = token.startChar;
        }
    }
    
    return CompletableFuture.completedFuture(new SemanticTokens(data));
}
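
extractDefinedAttributes is used above but not shown. A minimal sketch, assuming AsciidocDocumentModel exposes the same getLines() used elsewhere, collects every :name: declaration in the document:

private Set<String> extractDefinedAttributes(AsciidocDocumentModel model) {
    Set<String> attributes = new HashSet<>();
    Pattern definition = Pattern.compile("^:([^:]+):");
    
    for (String line : model.getLines()) {
        Matcher matcher = definition.matcher(line.trim());
        if (matcher.find()) {
            attributes.add(matcher.group(1).trim());
        }
    }
    
    return attributes;
}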

3. Tokenize Line

private List<SemanticToken> tokenizeLine(String line, int lineNum, Set<String> definedAttributes) {
    List<SemanticToken> tokens = new ArrayList<>();
    
    // Attribute definitions (:name: value)
    tokens.addAll(tokenizeAttributeDefinitions(line, lineNum));
    
    // Attribute references {name}
    tokens.addAll(tokenizeAttributeReferences(line, lineNum, definedAttributes));
    
    // Headers (=, ==, ===)
    tokens.addAll(tokenizeHeaders(line, lineNum));
    
    // Macros (image::, include::, link:)
    tokens.addAll(tokenizeMacros(line, lineNum));
    
    // Block attributes [source, java]
    tokens.addAll(tokenizeBlockAttributes(line, lineNum));
    
    // Inline formatting (*bold*, _italic_, `mono`)
    tokens.addAll(tokenizeInlineFormatting(line, lineNum));
    
    return tokens;
}

private static class SemanticToken {
    int line;
    int startChar;
    int length;
    int tokenType;
    int tokenModifiers;
    
    SemanticToken(int line, int startChar, int length, int tokenType, int tokenModifiers) {
        this.line = line;
        this.startChar = startChar;
        this.length = length;
        this.tokenType = tokenType;
        this.tokenModifiers = tokenModifiers;
    }
}

4. Tokenize Attribute Definitions

private List<SemanticToken> tokenizeAttributeDefinitions(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    String trimmed = line.trim();
    
    if (trimmed.startsWith(":") && trimmed.contains(":") && trimmed.lastIndexOf(':') > 0) {
        Pattern pattern = Pattern.compile("^:([^:]+):");
        Matcher matcher = pattern.matcher(trimmed);
        
        if (matcher.find()) {
            int startPos = line.indexOf(':');
            String attrName = matcher.group(1);
            
            // Token for attribute name (namespace + declaration modifier)
            tokens.add(new SemanticToken(
                lineNum,
                startPos + 1,
                attrName.length(),
                0, // namespace
                1  // declaration modifier
            ));
        }
    }
    
    return tokens;
}

5. Tokenize Attribute References

private List<SemanticToken> tokenizeAttributeReferences(String line, int lineNum, 
                                                         Set<String> definedAttributes) {
    List<SemanticToken> tokens = new ArrayList<>();
    Pattern pattern = Pattern.compile("\\{([^}]+)\\}");
    Matcher matcher = pattern.matcher(line);
    
    while (matcher.find()) {
        String attrName = matcher.group(1);
        int startPos = matcher.start() + 1; // Skip opening {
        
        boolean isDefined = definedAttributes.contains(attrName) || isBuiltInAttribute(attrName);
        
        // Token for attribute reference (parameter + modifier based on definition)
        tokens.add(new SemanticToken(
            lineNum,
            startPos,
            attrName.length(),
            3, // parameter
            isDefined ? 4 : 0 // readonly modifier (bit 4) if defined or built-in, none if undefined
        ));
    }
    
    return tokens;
}
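
isBuiltInAttribute is also referenced but not defined. A simple sketch is a lookup against an allow-list of intrinsic AsciiDoc attributes (the list below is illustrative, not exhaustive; Set.of requires Java 9+):

private static final Set<String> BUILT_IN_ATTRIBUTES = Set.of(
    "doctitle", "docdate", "doctime", "author", "email",
    "imagesdir", "icons", "toc", "sectnums", "idprefix");

private boolean isBuiltInAttribute(String name) {
    return BUILT_IN_ATTRIBUTES.contains(name);
}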

6. Tokenize Headers

private List<SemanticToken> tokenizeHeaders(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    String trimmed = line.trim();
    
    if (trimmed.startsWith("=")) {
        int level = 0;
        while (level < trimmed.length() && trimmed.charAt(level) == '=') {
            level++;
        }
        
        // Require a space and text after the '=' run so that block delimiter
        // lines such as "====" are not mistaken for headers
        if (level < trimmed.length() && trimmed.charAt(level) == ' ') {
            String headerText = trimmed.substring(level).trim();
            int textStart = line.indexOf(headerText);
            
            // Token for header text (class)
            if (!headerText.isEmpty() && textStart >= 0) {
                tokens.add(new SemanticToken(
                    lineNum,
                    textStart,
                    headerText.length(),
                    1, // class
                    2  // definition modifier
                ));
            }
        }
    }
    
    return tokens;
}
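
For example, the line "== Installation" produces one token covering "Installation" (start column 3, length 12, token type 1 = class, modifier 2 = definition).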

7. Tokenize Macros

private List<SemanticToken> tokenizeMacros(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    
    // Block macros (image::, include::) take two colons, inline macros
    // (link:, kbd:, btn:, menu:) take one, so accept either form
    Pattern pattern = Pattern.compile("\\b(image|include|link|kbd|btn|menu):{1,2}");
    Matcher matcher = pattern.matcher(line);
    
    while (matcher.find()) {
        String macroName = matcher.group(1);
        int startPos = matcher.start();
        
        // Token for macro name (function)
        tokens.add(new SemanticToken(
            lineNum,
            startPos,
            macroName.length(),
            2, // function
            0
        ));
    }
    
    return tokens;
}
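
Note that block macros (image::, include::) use two colons while inline macros (link:, kbd:, btn:, menu:) use one; the pattern above accepts either form, and the emitted token covers only the macro name, e.g. "image" in image::diagram.png[].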

8. Tokenize Block Attributes

private List<SemanticToken> tokenizeBlockAttributes(String line, int lineNum) {
    List<SemanticToken> tokens = new ArrayList<>();
    String trimmed = line.trim();
    
    if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
        Pattern pattern = Pattern.compile("\\[([^\\]]+)\\]");
        Matcher matcher = pattern.matcher(trimmed);
        
        if (matcher.find()) {
            String attributes = matcher.group(1);
            int startPos = line.indexOf('[') + 1;
            
            // Token for block attributes (variable)
            tokens.add(new SemanticToken(
                lineNum,
                startPos,
                attributes.length(),
                4, // variable
                0
            ));
        }
    }
    
    return tokens;
}

Testing Checklist

Attribute Tokens

  • Attribute definitions highlighted differently
  • Defined attribute references colored correctly
  • Undefined attribute references stand out
  • Built-in attributes recognized

Header Tokens

  • Header text highlighted semantically
  • Different header levels distinguishable

Macro Tokens

  • Macro names (image::, include::) highlighted
  • Distinguishable from regular text

Block Attribute Tokens

  • Block attributes [source, java] highlighted
  • Roles and options colored correctly

General

  • Semantic highlighting updates on edit
  • No conflicts with TextMate grammar
  • Colors configured in Eclipse theme
  • Performance acceptable
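
A unit-level test can exercise the encoding without an editor. A minimal JUnit 5 sketch (assuming a no-arg service constructor; openDocument is a hypothetical test helper that seeds the service's document cache):

@Test
void attributeReferenceTokensAreEmitted() throws Exception {
    AsciidocTextDocumentService service = new AsciidocTextDocumentService();
    openDocument(service, "file:///test.adoc", ":name: value\n{name} and {missing}\n");
    
    SemanticTokensParams params = new SemanticTokensParams(
        new TextDocumentIdentifier("file:///test.adoc"));
    SemanticTokens tokens = service.semanticTokensFull(params).get();
    
    // Each token is encoded as five integers:
    // deltaLine, deltaStartChar, length, tokenType, tokenModifiers
    assertEquals(0, tokens.getData().size() % 5);
    assertFalse(tokens.getData().isEmpty());
}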

Files to Modify

  • com.vogella.lsp.asciidoc.server/src/.../AsciidocLanguageServer.java
  • com.vogella.lsp.asciidoc.server/src/.../AsciidocTextDocumentService.java

Dependencies

Success Criteria

  1. ✅ Attribute definitions highlighted
  2. ✅ Attribute references colored by status (defined/undefined)
  3. ✅ Headers highlighted semantically
  4. ✅ Macros stand out from text
  5. ✅ Block attributes colored correctly
  6. ✅ Performance acceptable
  7. ✅ Works with Eclipse color themes

Estimated Effort

2-3 days (complex feature)

Priority

Low - Nice enhancement, not critical

Related Issues

Notes

  • Semantic tokens supplement the TextMate grammar rather than replace it
  • LSP4E semantic token support may have limitations - test thoroughly
  • Token types/modifiers should map to Eclipse theme colors
  • Consider performance with very large documents
  • May need incremental updates (range support) for better performance
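
If range support turns out to be needed, the capability can be enabled with semanticTokensOptions.setRange(true) and a range variant added. A sketch reusing the per-line tokenizer from above (the exact method split is an assumption):

@Override
public CompletableFuture<SemanticTokens> semanticTokensRange(SemanticTokensRangeParams params) {
    String uri = params.getTextDocument().getUri();
    AsciidocDocumentModel model = documentCache.get(uri);
    if (model == null) {
        return CompletableFuture.completedFuture(new SemanticTokens(Collections.emptyList()));
    }
    
    List<String> lines = model.getLines();
    int startLine = params.getRange().getStart().getLine();
    int endLine = Math.min(params.getRange().getEnd().getLine(), lines.size() - 1);
    
    Set<String> definedAttributes = extractDefinedAttributes(model);
    List<Integer> data = new ArrayList<>();
    int prevLine = 0;
    int prevChar = 0;
    
    for (int i = startLine; i <= endLine; i++) {
        List<SemanticToken> tokens = tokenizeLine(lines.get(i), i, definedAttributes);
        tokens.sort(Comparator.comparingInt(t -> t.startChar));
        
        for (SemanticToken token : tokens) {
            // Same delta encoding as semanticTokensFull; the first token's
            // deltaLine is still relative to line 0 of the document
            int deltaLine = token.line - prevLine;
            int deltaChar = (deltaLine == 0) ? (token.startChar - prevChar) : token.startChar;
            data.add(deltaLine);
            data.add(deltaChar);
            data.add(token.length);
            data.add(token.tokenType);
            data.add(token.tokenModifiers);
            prevLine = token.line;
            prevChar = token.startChar;
        }
    }
    
    return CompletableFuture.completedFuture(new SemanticTokens(data));
}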
