Skip to content

Add trailing comments#280

Open
skinnynerd wants to merge 6 commits intojnikula:masterfrom
skinnynerd:trailing-comments
Open

Add trailing comments#280
skinnynerd wants to merge 6 commits intojnikula:masterfrom
skinnynerd:trailing-comments

Conversation

@skinnynerd
Copy link
Contributor

Adds trailing comments using ///< style. Addresses #233 with inclusion of tests and docs.

In my opinion, some form of this is absolutely required to allow long struct and enum definitions to be readable in the source code. Hopefully, this is a simple enough implementation that it makes sense to include it. Only one comment type is supported per item, either a leading comment or a trailing comment, and only one line of trailing comment is allowed.

Things to review:

  • I reorganized the comment extraction function. Right now, it gets the cursor from each token regardless of whether we will need that cursor later. That might be a slight performance hit, but it wasn't noticeable to me from running the tests. If there is an easy way to profile this it might be good to check.
  • The order of cursors when a definition of the form typedef struct { ... } name; is encountered is STRUCT_DECL -> TYPEDEF_DECL. That means a leading comment points to the struct (what we want), and a trailing comment points to the typedef (not what we want). I worked around this artificially in the parser by skipping attaching trailing comments to TYPEDEF_DECL cursors with an internal STRUCT, ENUM, or UNION with the same name. There might be a better place to do that though.
  • In a similar vein, for anonymous structs and unions (struct { ... } varname; without the typedef), the leading comment will apply to the struct type, and the trailing comment will apply to the variable created with the struct type. That seems like useful behavior so I didn't mess with it, but I did not include it in the documentation.
  • I put most cases that I could think of for trailing comments into a single test file. They may need to be split up. I'm not sure what the appropriate size of tests are for this project.

@jnikula
Copy link
Owner

jnikula commented Jun 14, 2025

Thanks for the contribution!

First, I'm afraid I'm a bit too busy to properly dig into this right now. Sorry about that. Just some high level comments to follow.

I've always disliked the way Doxygen allows any number of different comment styles mixed in the source. This is completely subjective and opinionated, but I always thought it was very ugly. Which is why I originally chose to only support /** and nothing else.

The other reason was that I wanted to minimize the parsing of comment contents in Hawkmoth. Just extract and pass on to Sphinx. So the only two things are 1) is it a documentation comment, and 2) strip the comment markers. And this is why Javadoc/Doxygen compatibility is now in extensions, to not bloat the core code with that.

And this makes me wonder if the comment identification and comment marker stripping shouldn't be extensions too. Support some things built-in, maybe even have a built-in extension for the myriad of Doxygen comment styles. I think this would make the core implementation cleaner too, though I acknowledge getting the design right for handling the extension can be tricky.

The extension interface should handle:

  • Is this a documentation comment?
  • Is this a leading or a trailing comment?
  • Should two or more consecutive documentation comments be joined?
  • Strip the comment markers

I'm not sure if those should happen in one go or multiple.


I understand this may be overwhelming, and you've already put in a bunch of work here. I don't expect you to implement all of this yourself either. But I do want to find the best long-term solution, or at least one that gets us closer to it. (And again, I haven't had an in-depth look at the code here.)

Thoughts?

@BrunoMSantos further, uh, comments on this? I am starting to think we need to be more receptive to multiple comment styles, as well as trailing comments eventually.

@skinnynerd
Copy link
Contributor Author

Personally I would like some trailing comment style, regardless of what it is. There's just no way to create the source code formatting that I would like to have without it. If it makes sense with the architecture you're proposing, you could add support for different comment styles, but I didn't have an issue with the opinionated comment style, just that I was not able to write comments in the source code in the location I wanted them.

Thanks for the review. I'm not sure if I want to dive into this deep enough to fully re-architect the comment parsing right now, so I'll leave the merge request as a vote for this feature in the future.

@BrunoMSantos
Copy link
Collaborator

@BrunoMSantos further, uh, comments on this? I am starting to think we need to be more receptive to multiple comment styles, as well as trailing comments eventually.

Well, I've always been 'receptive' to the feature, it just hasn't been a priority. Otherwise, the linked issue has all my concerns listed already.

As for the code here, I just got back from vacation, I'll try to have a good look by the end of the week.

And thanks for the initiative @skinnynerd ;)

Copy link
Collaborator

@BrunoMSantos BrunoMSantos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having had a look, I'm not sure I caught everything, but I sure caught a couple of worrying issues among a few other simple things:

  1. Too many edge cases taken as features. I think they're bugs / serious liabilities. Though you did a great job of exposing them! I hadn't thought of some of them.
  2. Ignored documentation comments will not issue warnings. Preferably there should be a few distinct cases like duplicate docstring for symbol X, spurious docstring and so on. These should be unit tested.

On the subject of the extension, I'm probably more accommodating than @jnikula : I wouldn't mind hard coding support for /***/ and /**<*/ now and even extend it to /// and ///< later. But not one more. If there were significant popular demand for even more styles, then I'd be more of a hard liner. But I don't think that's where the real difficulty of this task lies anyway. It's in point 1 above.

directive: autodoc
arguments:
- trailing-comment.cpp
expected: trailing-comment.rst No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing new line at EOF.

@@ -0,0 +1,106 @@
//////////////////////////
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a test case for an inline docstring at the begining of a file too while at it.

// Handle different locations inside function definitions
//////////////////////////

int sample_func(int a, int b) ///< function with trailing comment after line
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is questionable behaviour. I think we should disallow this sort of abuse. I fear we're taking implementation quirks for features.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might take this further and allow trailing comments only for 1) struct/class/union members and 2) enums. I honestly don't see the benefit of supporting trailing comments for anything else. Very opinionated, of course, but I'm not looking to be drop-in replacement for quirky Doxygen documentation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well...

int a; /**< why not? */

Not to say we couldn't start there anyway, but I do see the benefit... in so far as there is a benefit to this anywhere else at least.

struct inner_struct {
int innera;
int innerb; ///< inner trailing comment, prior not documented
} a; ///< trailing doc comment for inner struct member
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can comment the type and the variable? I did not contemplate that, but I'm not sure I disagree with it either.

You should add tests with anonymous types though.

@skinnynerd
Copy link
Contributor Author

I may pick this up later this week and see if there's a better way to go about the comment extraction. To summarize comments:

  • Change to /**< */ since that's the already supported comment type
  • Warning whenever a doc comment is dropped
  • Separate tests, less verbose style, and only test one feature per test file

Implementation quirks/flaky styles that shouldn't be allowed or should at least generate a warning:

  • Trailing comments for struct or class definitions should have to be after the entire construct, no trailing comments inside the name, between braces, or on the line after the name but before the definition.
  • Probably just disallow trailing comments for function definitions, since there's no place that they could be that wouldn't be confusing.
  • Trailing comment should have to start on the same line as the construct.

I agree with pretty much all of this.

@BrunoMSantos
Copy link
Collaborator

  • Separate tests, less verbose style, and only test one feature per test file

I wouldn't say necessarily separate. it's probably ok to have all the trailing comments' tests in a single file.

@jnikula
Copy link
Owner

jnikula commented Jun 25, 2025

On the subject of the extension, I'm probably more accommodating than @jnikula : I wouldn't mind hard coding support for /***/ and /**<*/ now and even extend it to /// and ///< later. But not one more. If there were significant popular demand for even more styles, then I'd be more of a hard liner. But I don't think that's where the real difficulty of this task lies anyway. It's in point 1 above.

Okay, let's take the pragmatic approach, and not waste the momentum here. Let's add built-in support for /**< */ for starters, no extension required, and (if you both agree) only for members/enumerations. It's always easier to start small and extend from there than to backtrack something.

In theory /**< already breaks existing documentation if someone's been using that as a regular documentation comment marker...

@jnikula
Copy link
Owner

jnikula commented Aug 9, 2025

@skinnynerd Hey, I've been away, vacations and all that. Not to push you, but just to check, are you working on this? Any issues that we might be able to help with?

@skinnynerd
Copy link
Contributor Author

skinnynerd commented Aug 13, 2025

I haven't had much time to work on this, but I've pushed what I have been able to do without really modifying any architecture.

  • Instead of "only members" I just went with "only things that fit on one line," which I think accomplishes the intent there but also lets you have trailing comments for lists of variables for example.
  • The way multiline trailing comments work out with the current processing is a little weird, but doing the indent processing to let people align everything requires bringing in more information than I think we have in that function right now, and I don't expect multiple lines of trailing comments to be used that often anyways.
  • I added a warning for dropped comments, but I would appreciate suggestions for the specific wording of the warning.

@BrunoMSantos
Copy link
Collaborator

Many thanks, sorry for the delayed response. I'll try to have a look tomorrow.

[...] I don't expect multiple lines of trailing comments to be used that often anyways.

Just so I don't forget, you mean this (quite common actually, maybe the best thing about this sort of comments even):

int a; /**< a */
int b; /**< b */

Or this:

int a; /**< a
            still a */

We can / should probably ban the 2nd one, no? Also with a warning of course ;)

@skinnynerd
Copy link
Contributor Author

skinnynerd commented Aug 19, 2025

Yeah, I'm thinking about the second one. Right now, it has to be written like this to indent correctly because we don't know what column the first line of the comment starts on:

int a; /**<
        * a
        * still a
        */

Which seems silly. We could just prohibit any trailing comment that has multiple lines?

Copy link
Collaborator

@BrunoMSantos BrunoMSantos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, they're directionally correct.

Final thing, and I'm sorry to drop it on you at this point, the fixes should be worked into the original branch history instead of adding new commits on top and each commit should pass the unit tests. Basically, it's nicer to maintain (e.g. bisects better).

I'll try to find an opening to help you get this over the line. Don't wait up though 😅

if current_trailing_token is not None:
if (
current_trailing_token is not None
and current_trailing_token.location.line == token.location.line
Copy link
Collaborator

@BrunoMSantos BrunoMSantos Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the solution for sure, but here are some edge cases to think about:

void foo(void) /**< does this pass? */
{
        return;
}

void bar(void) /**< this does(?) */ {return;}

void baz( /**< this does too(?) */ void) {return;}

struct qux { int bar; } /**< I think this is still qux, so it goes... */
quux; /**< and this is quux I think, so both comments are allowed */

Not sure which ones should be admissible, but I'm wondering if checking that they're in the same line, that the token spans a single line and that the trailing comment is the last thing on the line is enough or if it still presents any pitfalls I may have missed above.

The thing I'm afraid is to find out some construct where the comment gets applied to the wrong symbol or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more restrictions and changed the warning messages to be more specific.

Regarding the examples, with new checking applied, this shakes out as:

/* I agree this is bad style, but I couldn't find a good way to restrict this.
We're ignoring function bodies, so this is equivalent to a bare declaration and
there's not a good way to determine if there is a body without looking at the
actual data in the file */
void foo(void) /**< Yes, applies to foo. */
{
        return;
}

void bar(void) /**< Comment gets dropped; not the last thing on the line */ {return;}

void baz( /**< Comment gets dropped; comment in the middle of a construct */ void) {return;}

struct qux { int bar; } /**< Applies to qux, since the definition for qux ends at } and only occupies one line */
quux; /**< This gets dropped, since the declaration of quux occupies the lines starting with struct and ending with ; */

@skinnynerd
Copy link
Contributor Author

Thanks for the changes, they're directionally correct.

Final thing, and I'm sorry to drop it on you at this point, the fixes should be worked into the original branch history instead of adding new commits on top and each commit should pass the unit tests. Basically, it's nicer to maintain (e.g. bisects better).

I'll try to find an opening to help you get this over the line. Don't wait up though 😅

Makes sense. I can circle back and do a history rewriting pass in a few days, but I wanted to get the changes up to be reviewed.

@jnikula
Copy link
Owner

jnikula commented Aug 22, 2025

Makes sense. I can circle back and do a history rewriting pass in a few days, but I wanted to get the changes up to be reviewed.

There's a catch-22. I know you'll want to do the rewrite only after you've gotten all the feedback, but we'll want to do the review when the history consists of small incremental commits! 😉

I'll try to list some small changes that could be done first:

  1. Make is_doc() only handle leading docs and rename to is_leading_doc() everywhere.
  2. Add is_trailing_doc().
  3. Add class TokenType and refactor _comment_extract() to work in terms of that. (Don't add anything about trailing docs yet. Just preparation.)
  4. Pass errors to _comment_extract().
  5. Only now add the actual logic for handling trailing comments.
  6. Add tests exercising the code.

I very much prefer small independent commits that are easy to review and test. See for example #287.

The changes here aren't that big, but they touch the core token parser loop, and I'd really like to be able to easily understand the changes.

I'll also comment on some things in the actual changes, but I'll postpone the review of the token parser loop until the history rewrite.

comments = {}
current_leading_comment = None
current_trailing_token = None
current_trailing_cursor = None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really confusing to me that current_leading_comment is a token and current_trailing_cursor is, well, a cursor. Feels like they should both be the same type.

elif (current_trailing_cursor.extent.end.column > token.extent.start.column):
errorString = 'trailing comment in the middle of a construct was dropped.'
elif (token.extent.start.line != token.extent.end.line):
errorString = 'multiline trailing comment was dropped.'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if we could abstract the checks out of the loop into a function.

!= current_trailing_cursor.extent.end.line):
errorString = 'trailing comment for multiline construct was dropped.'
elif (current_trailing_cursor.extent.start.line
!= token.extent.start.line):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bug in not too old versions of clang that trip over if there's a macro instantiation in the cursor extent... I really hope this doesn't trigger the issue.

See DocCursor.get_tokens().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just from looking at the get_tokens function: It looks like the bug is that comparing extents does not work as expected, even though their starts and ends are the same? Basically, extent.start and extent.end should do what we expect, but to correctly get tokens in a range, we need to create a new range from start and end instead of using extent directly?

Can you point me to any more information about that bug in clang so that I can test with an old version that would produce it?

@jnikula
Copy link
Owner

jnikula commented Aug 22, 2025

There were a bunch of trailing spaces in the test files that aren't visible in github code review, but get highlighted with git show.

@jnikula
Copy link
Owner

jnikula commented Aug 22, 2025

@skinnynerd Also, we do appreciate the feature and your effort. We just maintain a fairly high standard for code and tests, and even the git history. If that gets to be too much, please just let us know before (╯°□°)╯︵ ┻━┻ and we'll figure it out.

Rename is_doc to is_leading_doc and exclude comments starting with "/**<", in preparation for adding handling for trailing comments.
Allow the _comment_extract function to generate parse warnings.  Trailing comments have many edge cases that require warnings to be generated.
Restructure the _comment_extract function to first determine the token type (i.e.. comment, documentable construct, or something else that either does or doesn't break the comment context), and then extract comments based on the token type that was determined. This just separates the logic for grouping comments from the logic that decides what's an applicable comment.
Addresses jnikula#233 by adding support for trailing comments. Only one comment type is supported per item, either a leading comment or a trailing comment, and only the first trailing comment is applied.

There's several strange formats that trailing comments can have, that should not be allowed.  A future commit will address these.
Adds tests and syntax documentation for trailing comment parsing
Adds some additional restrictions on trailing comments to prevent edge cases. Each prohibited comment is ignored and marked by a warning message:

- comment with nothing to document
- second comment for a single construct
- comment refering to a multi-line cursor
- comment on separate line from documented construct
- comment in the middle of a construct
- multiline trailing comments
@skinnynerd
Copy link
Contributor Author

Cool, I went through and rewrote the commit history, let me know how that looks.

@BrunoMSantos
Copy link
Collaborator

BrunoMSantos commented Aug 30, 2025

FYI, I was playing around with this a bit and, not gonna lie, this is still a minefield of edge cases. I'd say it's a nice draft, but it will require quite a bit more.

I just stumbled on one more related to anonymous types. I'll put it here for future reference:

/** This type won't be the same... */
struct {int a;} bar; /**< ...as bar's type */

That is an immediate reference error as Sphinx tries to cross link them:

  1. bar's type is not in the form of @anonymous_hash as it should be: we never had a way of documenting a variable with an anonymous type before, so no surprise there. It's an easy fix though...
  2. Except then the way we determine the hashes, the hashes wouldn't match either. They're based on the comment, not the type itself. Probably not terrible to solve either, but exceeds my budget for today.

The 2nd point might be worthy of a fix independent of this PR.

And FYI,

struct {int a;} baz; /**< This is a guaranteed reference error no matter what */

Note that there's no other way to separately document the type above, but .. c:var:: @anonymous_hash baz will try to cross link it all the same.

I think this is fine though. Let the user shoot himself in the foot, the error message is there. There's a symmetric problem with forward documentation comments already actually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants