Add trailing comments by skinnynerd · Pull Request #280 · jnikula/hawkmoth

skinnynerd · 2025-06-12T20:54:53Z

Adds trailing comments using ///< style. Addresses #233 with inclusion of tests and docs.

In my opinion, some form of this is absolutely required to allow long struct and enum definitions to be readable in the source code. Hopefully, this is a simple enough implementation that it makes sense to include it. Only one comment type is supported per item, either a leading comment or a trailing comment, and only one line of trailing comment is allowed.

Things to review:

I reorganized the comment extraction function. Right now, it gets the cursor from each token regardless of whether we will need that cursor later. That might be a slight performance hit, but it wasn't noticeable to me from running the tests. If there is an easy way to profile this it might be good to check.
The order of cursors when a definition of the form typedef struct { ... } name; is encountered is STRUCT_DECL -> TYPEDEF_DECL. That means a leading comment points to the struct (what we want), and a trailing comment points to the typedef (not what we want). I worked around this artificially in the parser by skipping attaching trailing comments to TYPEDEF_DECL cursors with an internal STRUCT, ENUM, or UNION with the same name. There might be a better place to do that though.
In a similar vein, for anonymous structs and unions (struct { ... } varname; without the typedef), the leading comment will apply to the struct type, and the trailing comment will apply to the variable created with the struct type. That seems like useful behavior so I didn't mess with it, but I did not include it in the documentation.
I put most cases that I could think of for trailing comments into a single test file. They may need to be split up. I'm not sure what the appropriate size of tests are for this project.

jnikula · 2025-06-14T10:09:46Z

Thanks for the contribution!

First, I'm afraid I'm a bit too busy to properly dig into this right now. Sorry about that. Just some high level comments to follow.

I've always disliked the way Doxygen allows any number of different comment styles mixed in the source. This is completely subjective and opinionated, but I always thought it was very ugly. Which is why I originally chose to only support /** and nothing else.

The other reason was that I wanted to minimize the parsing of comment contents in Hawkmoth. Just extract and pass on to Sphinx. So the only two things are 1) is it a documentation comment, and 2) strip the comment markers. And this is why Javadoc/Doxygen compatibility is now in extensions, to not bloat the core code with that.

And this makes me wonder if the comment identification and comment marker stripping shouldn't be extensions too. Support some things built-in, maybe even have a built-in extension for the myriad of Doxygen comment styles. I think this would make the core implementation cleaner too, though I acknowledge getting the design right for handling the extension can be tricky.

The extension interface should handle:

Is this a documentation comment?
Is this a leading or a trailing comment?
Should two or more consecutive documentation comments be joined?
Strip the comment markers

I'm not sure if those should happen in one go or multiple.

I understand this may be overwhelming, and you've already put in a bunch of work here. I don't expect you to implement all of this yourself either. But I do want to find the best long-term solution, or at least one that gets us closer to it. (And again, I haven't had an in-depth look at the code here.)

Thoughts?

@BrunoMSantos further, uh, comments on this? I am starting to think we need to be more receptive to multiple comment styles, as well as trailing comments eventually.

skinnynerd · 2025-06-14T18:53:09Z

Personally I would like some trailing comment style, regardless of what it is. There's just no way to create the source code formatting that I would like to have without it. If it makes sense with the architecture you're proposing, you could add support for different comment styles, but I didn't have an issue with the opinionated comment style, just that I was not able to write comments in the source code in the location I wanted them.

Thanks for the review. I'm not sure if I want to dive into this deep enough to fully re-architect the comment parsing right now, so I'll leave the merge request as a vote for this feature in the future.

BrunoMSantos · 2025-06-16T10:35:52Z

@BrunoMSantos further, uh, comments on this? I am starting to think we need to be more receptive to multiple comment styles, as well as trailing comments eventually.

Well, I've always been 'receptive' to the feature, it just hasn't been a priority. Otherwise, the linked issue has all my concerns listed already.

As for the code here, I just got back from vacation, I'll try to have a good look by the end of the week.

And thanks for the initiative @skinnynerd ;)

BrunoMSantos

Having had a look, I'm not sure I caught everything, but I sure caught a couple of worrying issues among a few other simple things:

Too many edge cases taken as features. I think they're bugs / serious liabilities. Though you did a great job of exposing them! I hadn't thought of some of them.
Ignored documentation comments will not issue warnings. Preferably there should be a few distinct cases like duplicate docstring for symbol X, spurious docstring and so on. These should be unit tested.

On the subject of the extension, I'm probably more accommodating than @jnikula : I wouldn't mind hard coding support for /***/ and /**<*/ now and even extend it to /// and ///< later. But not one more. If there were significant popular demand for even more styles, then I'd be more of a hard liner. But I don't think that's where the real difficulty of this task lies anyway. It's in point 1 above.

doc/syntax.rst

BrunoMSantos · 2025-06-19T10:36:43Z

test/cpp/trailing-comment.yaml

+  directive: autodoc
+  arguments:
+  - trailing-comment.cpp
+expected: trailing-comment.rst


Missing new line at EOF.

test/c/trailing-comment.c

src/hawkmoth/parser.py

BrunoMSantos · 2025-06-19T11:31:17Z

test/c/trailing-comment.c

@@ -0,0 +1,106 @@
+//////////////////////////


I'd like to see a test case for an inline docstring at the begining of a file too while at it.

BrunoMSantos · 2025-06-19T11:32:14Z

test/c/trailing-comment.c

+// Handle different locations inside function definitions
+//////////////////////////
+
+int sample_func(int a, int b) ///< function with trailing comment after line


This is questionable behaviour. I think we should disallow this sort of abuse. I fear we're taking implementation quirks for features.

I might take this further and allow trailing comments only for 1) struct/class/union members and 2) enums. I honestly don't see the benefit of supporting trailing comments for anything else. Very opinionated, of course, but I'm not looking to be drop-in replacement for quirky Doxygen documentation.

Well...

int a; /**< why not? */

Not to say we couldn't start there anyway, but I do see the benefit... in so far as there is a benefit to this anywhere else at least.

test/c/trailing-comment.c

BrunoMSantos · 2025-06-19T11:37:27Z

test/c/trailing-comment.c

+    struct inner_struct {
+        int innera;
+        int innerb; ///< inner trailing comment, prior not documented
+    } a; ///< trailing doc comment for inner struct member


So we can comment the type and the variable? I did not contemplate that, but I'm not sure I disagree with it either.

You should add tests with anonymous types though.

test/c/trailing-comment.c

skinnynerd · 2025-06-23T20:42:23Z

I may pick this up later this week and see if there's a better way to go about the comment extraction. To summarize comments:

Change to /**< */ since that's the already supported comment type
Warning whenever a doc comment is dropped
Separate tests, less verbose style, and only test one feature per test file

Implementation quirks/flaky styles that shouldn't be allowed or should at least generate a warning:

Trailing comments for struct or class definitions should have to be after the entire construct, no trailing comments inside the name, between braces, or on the line after the name but before the definition.
Probably just disallow trailing comments for function definitions, since there's no place that they could be that wouldn't be confusing.
Trailing comment should have to start on the same line as the construct.

I agree with pretty much all of this.

BrunoMSantos · 2025-06-23T21:54:06Z

Separate tests, less verbose style, and only test one feature per test file

I wouldn't say necessarily separate. it's probably ok to have all the trailing comments' tests in a single file.

jnikula · 2025-06-25T21:42:41Z

On the subject of the extension, I'm probably more accommodating than @jnikula : I wouldn't mind hard coding support for /***/ and /**<*/ now and even extend it to /// and ///< later. But not one more. If there were significant popular demand for even more styles, then I'd be more of a hard liner. But I don't think that's where the real difficulty of this task lies anyway. It's in point 1 above.

Okay, let's take the pragmatic approach, and not waste the momentum here. Let's add built-in support for /**< */ for starters, no extension required, and (if you both agree) only for members/enumerations. It's always easier to start small and extend from there than to backtrack something.

In theory /**< already breaks existing documentation if someone's been using that as a regular documentation comment marker...

jnikula · 2025-08-09T07:26:29Z

@skinnynerd Hey, I've been away, vacations and all that. Not to push you, but just to check, are you working on this? Any issues that we might be able to help with?

skinnynerd · 2025-08-13T21:36:43Z

I haven't had much time to work on this, but I've pushed what I have been able to do without really modifying any architecture.

Instead of "only members" I just went with "only things that fit on one line," which I think accomplishes the intent there but also lets you have trailing comments for lists of variables for example.
The way multiline trailing comments work out with the current processing is a little weird, but doing the indent processing to let people align everything requires bringing in more information than I think we have in that function right now, and I don't expect multiple lines of trailing comments to be used that often anyways.
I added a warning for dropped comments, but I would appreciate suggestions for the specific wording of the warning.

BrunoMSantos · 2025-08-19T00:12:42Z

Many thanks, sorry for the delayed response. I'll try to have a look tomorrow.

[...] I don't expect multiple lines of trailing comments to be used that often anyways.

Just so I don't forget, you mean this (quite common actually, maybe the best thing about this sort of comments even):

int a; /**< a */
int b; /**< b */

Or this:

int a; /**< a
            still a */

We can / should probably ban the 2nd one, no? Also with a warning of course ;)

skinnynerd · 2025-08-19T16:36:16Z

Yeah, I'm thinking about the second one. Right now, it has to be written like this to indent correctly because we don't know what column the first line of the comment starts on:

int a; /**<
        * a
        * still a
        */

Which seems silly. We could just prohibit any trailing comment that has multiple lines?

BrunoMSantos

Thanks for the changes, they're directionally correct.

Final thing, and I'm sorry to drop it on you at this point, the fixes should be worked into the original branch history instead of adding new commits on top and each commit should pass the unit tests. Basically, it's nicer to maintain (e.g. bisects better).

I'll try to find an opening to help you get this over the line. Don't wait up though 😅

doc/syntax.rst

test/c/trailing-comment.c

src/hawkmoth/parser.py

BrunoMSantos · 2025-08-19T21:51:22Z

src/hawkmoth/parser.py

-            if current_trailing_token is not None:
+            if (
+                current_trailing_token is not None
+                and current_trailing_token.location.line == token.location.line


This is part of the solution for sure, but here are some edge cases to think about:

void foo(void) /**< does this pass? */ { return; } void bar(void) /**< this does(?) */ {return;} void baz( /**< this does too(?) */ void) {return;} struct qux { int bar; } /**< I think this is still qux, so it goes... */ quux; /**< and this is quux I think, so both comments are allowed */

Not sure which ones should be admissible, but I'm wondering if checking that they're in the same line, that the token spans a single line and that the trailing comment is the last thing on the line is enough or if it still presents any pitfalls I may have missed above.

The thing I'm afraid is to find out some construct where the comment gets applied to the wrong symbol or something.

I added some more restrictions and changed the warning messages to be more specific.

Regarding the examples, with new checking applied, this shakes out as:

/* I agree this is bad style, but I couldn't find a good way to restrict this. We're ignoring function bodies, so this is equivalent to a bare declaration and there's not a good way to determine if there is a body without looking at the actual data in the file */ void foo(void) /**< Yes, applies to foo. */ { return; } void bar(void) /**< Comment gets dropped; not the last thing on the line */ {return;} void baz( /**< Comment gets dropped; comment in the middle of a construct */ void) {return;} struct qux { int bar; } /**< Applies to qux, since the definition for qux ends at } and only occupies one line */ quux; /**< This gets dropped, since the declaration of quux occupies the lines starting with struct and ending with ; */

skinnynerd · 2025-08-22T18:37:43Z

Thanks for the changes, they're directionally correct.

Final thing, and I'm sorry to drop it on you at this point, the fixes should be worked into the original branch history instead of adding new commits on top and each commit should pass the unit tests. Basically, it's nicer to maintain (e.g. bisects better).

I'll try to find an opening to help you get this over the line. Don't wait up though 😅

Makes sense. I can circle back and do a history rewriting pass in a few days, but I wanted to get the changes up to be reviewed.

jnikula · 2025-08-22T19:14:53Z

Makes sense. I can circle back and do a history rewriting pass in a few days, but I wanted to get the changes up to be reviewed.

There's a catch-22. I know you'll want to do the rewrite only after you've gotten all the feedback, but we'll want to do the review when the history consists of small incremental commits! 😉

I'll try to list some small changes that could be done first:

Make is_doc() only handle leading docs and rename to is_leading_doc() everywhere.
Add is_trailing_doc().
Add class TokenType and refactor _comment_extract() to work in terms of that. (Don't add anything about trailing docs yet. Just preparation.)
Pass errors to _comment_extract().
Only now add the actual logic for handling trailing comments.
Add tests exercising the code.

I very much prefer small independent commits that are easy to review and test. See for example #287.

The changes here aren't that big, but they touch the core token parser loop, and I'd really like to be able to easily understand the changes.

I'll also comment on some things in the actual changes, but I'll postpone the review of the token parser loop until the history rewrite.

jnikula · 2025-08-22T19:20:59Z

src/hawkmoth/parser.py

    comments = {}
    current_leading_comment = None
-    current_trailing_token = None
+    current_trailing_cursor = None


It's really confusing to me that current_leading_comment is a token and current_trailing_cursor is, well, a cursor. Feels like they should both be the same type.

src/hawkmoth/parser.py

jnikula · 2025-08-22T19:23:32Z

src/hawkmoth/parser.py

+            elif (current_trailing_cursor.extent.end.column > token.extent.start.column):
+                errorString = 'trailing comment in the middle of a construct was dropped.'
+            elif (token.extent.start.line != token.extent.end.line):
+                errorString = 'multiline trailing comment was dropped.'


I do wonder if we could abstract the checks out of the loop into a function.

jnikula · 2025-08-22T19:27:07Z

src/hawkmoth/parser.py

+                  != current_trailing_cursor.extent.end.line):
+                errorString = 'trailing comment for multiline construct was dropped.'
+            elif (current_trailing_cursor.extent.start.line
+                  != token.extent.start.line):


There's a bug in not too old versions of clang that trip over if there's a macro instantiation in the cursor extent... I really hope this doesn't trigger the issue.

See DocCursor.get_tokens().

Just from looking at the get_tokens function: It looks like the bug is that comparing extents does not work as expected, even though their starts and ends are the same? Basically, extent.start and extent.end should do what we expect, but to correctly get tokens in a range, we need to create a new range from start and end instead of using extent directly?

Can you point me to any more information about that bug in clang so that I can test with an old version that would produce it?

jnikula · 2025-08-22T19:32:30Z

There were a bunch of trailing spaces in the test files that aren't visible in github code review, but get highlighted with git show.

jnikula · 2025-08-22T19:46:10Z

@skinnynerd Also, we do appreciate the feature and your effort. We just maintain a fairly high standard for code and tests, and even the git history. If that gets to be too much, please just let us know before (╯°□°)╯︵ ┻━┻ and we'll figure it out.

Rename is_doc to is_leading_doc and exclude comments starting with "/**<", in preparation for adding handling for trailing comments.

Allow the _comment_extract function to generate parse warnings. Trailing comments have many edge cases that require warnings to be generated.

Restructure the _comment_extract function to first determine the token type (i.e.. comment, documentable construct, or something else that either does or doesn't break the comment context), and then extract comments based on the token type that was determined. This just separates the logic for grouping comments from the logic that decides what's an applicable comment.

Addresses jnikula#233 by adding support for trailing comments. Only one comment type is supported per item, either a leading comment or a trailing comment, and only the first trailing comment is applied. There's several strange formats that trailing comments can have, that should not be allowed. A future commit will address these.

Adds tests and syntax documentation for trailing comment parsing

Adds some additional restrictions on trailing comments to prevent edge cases. Each prohibited comment is ignored and marked by a warning message: - comment with nothing to document - second comment for a single construct - comment refering to a multi-line cursor - comment on separate line from documented construct - comment in the middle of a construct - multiline trailing comments

skinnynerd · 2025-08-25T20:59:51Z

Cool, I went through and rewrote the commit history, let me know how that looks.

BrunoMSantos · 2025-08-30T16:32:45Z

FYI, I was playing around with this a bit and, not gonna lie, this is still a minefield of edge cases. I'd say it's a nice draft, but it will require quite a bit more.

I just stumbled on one more related to anonymous types. I'll put it here for future reference:

/** This type won't be the same... */
struct {int a;} bar; /**< ...as bar's type */

That is an immediate reference error as Sphinx tries to cross link them:

bar's type is not in the form of @anonymous_hash as it should be: we never had a way of documenting a variable with an anonymous type before, so no surprise there. It's an easy fix though...
Except then the way we determine the hashes, the hashes wouldn't match either. They're based on the comment, not the type itself. Probably not terrible to solve either, but exceeds my budget for today.

The 2nd point might be worthy of a fix independent of this PR.

And FYI,

struct {int a;} baz; /**< This is a guaranteed reference error no matter what */

Note that there's no other way to separately document the type above, but .. c:var:: @anonymous_hash baz will try to cross link it all the same.

I think this is fine though. Let the user shoot himself in the foot, the error message is there. There's a symmetric problem with forward documentation comments already actually.

skinnynerd force-pushed the trailing-comments branch from 2e0cf4a to 48604a5 Compare June 13, 2025 15:36

BrunoMSantos requested changes Jun 19, 2025

View reviewed changes

skinnynerd force-pushed the trailing-comments branch from 7735447 to 35865d2 Compare August 13, 2025 21:24

skinnynerd force-pushed the trailing-comments branch from 4d775ba to cc3cbcd Compare August 13, 2025 21:54

BrunoMSantos reviewed Aug 19, 2025

View reviewed changes

skinnynerd force-pushed the trailing-comments branch from d91be10 to e4e9b17 Compare August 22, 2025 18:29

jnikula requested changes Aug 22, 2025

View reviewed changes

evanblum added 3 commits August 25, 2025 14:52

trailing-comments: Modify is_doc to select only leading comments

8e4c4d1

Rename is_doc to is_leading_doc and exclude comments starting with "/**<", in preparation for adding handling for trailing comments.

trailing-comments: Pass errors into comment extraction function

e0434c0

Allow the _comment_extract function to generate parse warnings. Trailing comments have many edge cases that require warnings to be generated.

skinnynerd force-pushed the trailing-comments branch from e4e9b17 to fd87446 Compare August 25, 2025 20:39

evanblum added 3 commits August 25, 2025 15:42

trailing-comments: add tests and documentation

ad9051d

Adds tests and syntax documentation for trailing comment parsing

skinnynerd force-pushed the trailing-comments branch from fd87446 to d67f815 Compare August 25, 2025 21:03

Conversation

skinnynerd commented Jun 12, 2025

Uh oh!

jnikula commented Jun 14, 2025

Uh oh!

skinnynerd commented Jun 14, 2025

Uh oh!

BrunoMSantos commented Jun 16, 2025

Uh oh!

BrunoMSantos left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

skinnynerd commented Jun 23, 2025

Uh oh!

BrunoMSantos commented Jun 23, 2025

Uh oh!

jnikula commented Jun 25, 2025

Uh oh!

jnikula commented Aug 9, 2025

Uh oh!

skinnynerd commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BrunoMSantos commented Aug 19, 2025

Uh oh!

skinnynerd commented Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BrunoMSantos left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BrunoMSantos Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

skinnynerd commented Aug 22, 2025

Uh oh!

jnikula commented Aug 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnikula commented Aug 22, 2025

Uh oh!

jnikula commented Aug 22, 2025

Uh oh!

skinnynerd commented Aug 25, 2025

Uh oh!

BrunoMSantos commented Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

skinnynerd commented Aug 13, 2025 •

edited

Loading

skinnynerd commented Aug 19, 2025 •

edited

Loading

BrunoMSantos Aug 19, 2025 •

edited

Loading

BrunoMSantos commented Aug 30, 2025 •

edited

Loading