
Fix #12271: Integrity checker for year, location, and page numbers in booktitle #15465

Merged
subhramit merged 16 commits into JabRef:main from Chiragsd13:fix/12271-booktitle-integrity-checker
Apr 16, 2026

Conversation

@Chiragsd13
Contributor

@Chiragsd13 Chiragsd13 commented Apr 1, 2026

Related issues and pull requests

Closes #12271

PR Description

Adds three new integrity checks on the booktitle field that warn when content belongs in a dedicated field instead:

  • Year — flags a 4-digit year (1xxx / 2xxx) that is not embedded in a larger alphanumeric token, so ICML2015 is left alone but ECCTD 2015 is flagged.
  • Location (country) — flags any country name from the JDK's ISO-3166 list (Locale.getISOCountries() + Locale.of("", code).getDisplayCountry(Locale.ENGLISH)), with the same alphanumeric-boundary guard so USA2015 is not flagged.
  • Page numbers — flags explicit page markers such as pp. 1234-1242 or pages 1234.

All four checkers (the original BooktitleChecker plus the three new ones) are registered against StandardField.BOOKTITLE in FieldCheckers.
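
As a rough illustration of the location check described above, the sketch below builds a predicate from the JDK's ISO-3166 country list. The class name `CountryNameSketch`, the case-sensitive matching, and the use of `Locale.Builder` (chosen so the sketch also compiles on JDKs older than 19, where `Locale.of` is unavailable) are assumptions for illustration and are not taken from the PR's code.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CountryNameSketch {

    // Builds one regex alternation from the English display names of all ISO-3166 codes known to the JDK.
    static Predicate<String> buildCountryPredicate() {
        String alternation = Arrays.stream(Locale.getISOCountries())
                // Assumption: Locale.Builder stands in for Locale.of("", code) so older JDKs can run this
                .map(code -> new Locale.Builder().setRegion(code).build().getDisplayCountry(Locale.ENGLISH))
                .filter(name -> !name.isBlank())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Alphanumeric guards: a name glued to letters or digits (e.g. "Norway2015") does not match
        return Pattern.compile("(?<!\\p{Alnum})(?:" + alternation + ")(?!\\p{Alnum})").asPredicate();
    }

    static final Predicate<String> CONTAINS_COUNTRY = buildCountryPredicate();

    public static void main(String[] args) {
        System.out.println(CONTAINS_COUNTRY.test("{ECCTD} 2015, Trondheim, Norway"));
        System.out.println(CONTAINS_COUNTRY.test("Proceedings Norway2015"));
    }
}
```

Compiling the alternation once into a static predicate keeps the per-value check cheap; whether the merged checker is structured this way is not shown in this conversation.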

Steps to test

  1. Open JabRef and create a new library.
  2. Add an @inproceedings entry and set the booktitle field to:
    • 2015 {IEEE} International Conference on Digital Signal Processing, {DSP} 2015, Singapore → flagged for year and location
    • European Conference on Circuit Theory and Design, {ECCTD} 2015, Trondheim, Norway → flagged for year and location
    • Advances in Neural Information Processing Systems, pp. 1234-1242 → flagged for page numbers
    • International Conference on Machine Learning → no warning
  3. Run Quality → Check Integrity and verify the expected warnings appear.
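
The page-number case in the steps above can be sketched with a small predicate. The PR's actual regular expression is not quoted in this conversation, so the pattern and the class name `PageMarkerSketch` below are hypothetical and only show one plausible way to detect markers such as `pp. 1234-1242` or `pages 1234`.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class PageMarkerSketch {

    // Hypothetical pattern: "p.", "pp.", "page" or "pages" followed by a number.
    // The lookbehind stops the marker from matching at the end of a longer word such as "App."
    static final Predicate<String> CONTAINS_PAGES =
            Pattern.compile("(?i)(?<!\\p{Alnum})(?:pp?\\.|pages?)\\s*\\d+").asPredicate();

    public static void main(String[] args) {
        System.out.println(CONTAINS_PAGES.test("Advances in Neural Information Processing Systems, pp. 1234-1242"));
        System.out.println(CONTAINS_PAGES.test("International Conference on Machine Learning"));
    }
}
```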
image

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • I added screenshots in the PR description (if change is visible to the user)
  • I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • I checked the user documentation for up to dateness and submitted a pull request to our user documentation repository

@github-actions github-actions bot added the good second issue Issues that involve a tour of two or three interweaved components in JabRef label Apr 1, 2026
@github-actions github-actions bot added the status: changes-required Pull requests that are not yet complete label Apr 1, 2026
@Chiragsd13 Chiragsd13 marked this pull request as ready for review April 1, 2026 04:32
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Enhance BooktitleChecker with year, location, and page detection

✨ Enhancement


Walkthroughs

Description
• Adds three new integrity checks to BooktitleChecker for year, location, and page numbers
• Detects 4-digit years (1000–2999) in booktitle fields
• Detects country names from UN-recognized list using pre-compiled regex
• Detects explicit page-number patterns (pp., p., pages keywords)
• Creates Countries utility class with hard-coded country name set
• Adds comprehensive parameterized tests covering all new checks
• Adds localization keys for new warning messages
Diagram
flowchart LR
  BC["BooktitleChecker"]
  YC["Year Check<br/>1000-2999"]
  CC["Country Check<br/>UN list"]
  PC["Page Check<br/>pp/pages"]
  CO["Countries<br/>utility class"]
  L10N["Localization<br/>keys"]
  
  BC --> YC
  BC --> CC
  BC --> PC
  CC --> CO
  BC --> L10N

File Changes

1. jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java ✨ Enhancement +33/-0

Add year, country, and page detection logic

• Adds three static Predicate fields for year, country, and page-number detection
• Implements year detection using regex pattern for 4-digit numbers (1000–2999)
• Implements country detection using pre-compiled regex from Countries class
• Implements page-number detection for patterns like "pp.", "p.", "pages"
• Adds three new integrity check conditions in checkValue method
• Returns localized warning messages for each detected issue

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java


2. jablib/src/main/java/org/jabref/logic/integrity/Countries.java ✨ Enhancement +61/-0

Create Countries utility class with country names

• New utility class holding hard-coded set of UN-recognized country names
• Contains 195+ country names stored in lower-case for case-insensitive matching
• Includes both standard country names and common aliases (e.g., "czechia" and "czech republic")
• Designed to be used by BooktitleChecker for location detection in booktitle fields

jablib/src/main/java/org/jabref/logic/integrity/Countries.java


3. jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java 🧪 Tests +66/-3

Add comprehensive tests for new integrity checks

• Reorganizes tests with clear section comments for existing and new checks
• Adds year detection tests covering middle, start positions, and no-year cases
• Adds country detection tests for Austria, Singapore, and no-country cases
• Adds page-number detection tests for "pp." and "pages" patterns
• Removes year from existing test cases to isolate the "conference on" check
• Fixes typo in test method name from "DoesNotAccepts" to "DoesNotAccept"

jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java


4. jablib/src/main/resources/l10n/JabRef_en.properties Localization +3/-0

Add localization strings for new checks

• Adds three new localization keys for booktitle integrity warnings
• "booktitle should not contain a location" for country detection
• "booktitle should not contain a year" for year detection
• "booktitle should not contain page numbers" for page-number detection

jablib/src/main/resources/l10n/JabRef_en.properties



@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects bot commented Apr 1, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. assertNotEquals(Optional.empty()) used 📘
Description
The new/modified tests only assert that an Optional is non-empty (via
assertNotEquals(Optional.empty(), ...)) instead of asserting the exact expected Optional
value/message. This weak predicate-style check can miss regressions where the checker returns the
wrong warning text.
Code

jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[R26-87]

+    void booktitleDoesNotAcceptIfItEndsWithConferenceOn() {
+        assertNotEquals(Optional.empty(), checker.checkValue("Digital Information and Communication Technology and its Applications (DICTAP), Fourth International Conference on"));
     }
     @Test
     void booktitleIsBlank() {
         assertEquals(Optional.empty(), checker.checkValue(" "));
     }
+
+    // ------------------------------------------------------------------
+    // Year detection
+    // ------------------------------------------------------------------
+
+    @Test
+    void booktitleFlagsYearInMiddle() {
+        // Example from the issue: year embedded inside a booktitle
+        assertNotEquals(Optional.empty(), checker.checkValue("European Conference on Circuit Theory and Design, {ECCTD} 2015, Trondheim, Norway"));
+    }
+
+    @Test
+    void booktitleFlagsYearAtStart() {
+        assertNotEquals(Optional.empty(), checker.checkValue("2015 {IEEE} International Conference on Digital Signal Processing"));
+    }
+
+    @Test
+    void booktitleAcceptsWhenNoYear() {
+        assertEquals(Optional.empty(), checker.checkValue("International Conference on Software Engineering"));
+    }
+
+    // ------------------------------------------------------------------
+    // Location (country) detection
+    // ------------------------------------------------------------------
+
+    @Test
+    void booktitleFlagsCountryName() {
+        // "Austria" is a country and should be flagged
+        assertNotEquals(Optional.empty(), checker.checkValue("Service-Oriented Computing, Fifth International Conference, Vienna, Austria, Proceedings"));
+    }
+
+    @Test
+    void booktitleFlagsCountryNameSingapore() {
+        assertNotEquals(Optional.empty(), checker.checkValue("{IEEE} International Conference on Digital Signal Processing, Singapore, Proceedings"));
+    }
+
+    @Test
+    void booktitleAcceptsWhenNoCountry() {
+        assertEquals(Optional.empty(), checker.checkValue("International Conference on Machine Learning Proceedings"));
+    }
+
+    // ------------------------------------------------------------------
+    // Page-number detection
+    // ------------------------------------------------------------------
+
+    @Test
+    void booktitleFlagsPagesPattern() {
+        assertNotEquals(Optional.empty(), checker.checkValue("Advances in Neural Information Processing Systems, pp. 1234-1242"));
+    }
+
+    @Test
+    void booktitleFlagsPagesKeyword() {
+        assertNotEquals(Optional.empty(), checker.checkValue("Advances in Neural Information Processing Systems, pages 1234-1242"));
+    }
Evidence
PR Compliance ID 28 requires asserting exact expected values/structures rather than weak predicate
checks; multiple newly added/modified assertions only verify that the Optional is non-empty. PR
Compliance ID 20 also discourages weak boolean/predicate-style assertions in tests when content
assertions are possible.

AGENTS.md
jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[26-27]
jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[40-48]
jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[60-68]
jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[79-87]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Tests use weak assertions like `assertNotEquals(Optional.empty(), ...)` which only check presence, not the exact expected warning message.
## Issue Context
`BooktitleChecker#checkValue` returns an `Optional<String>` containing a specific localized warning string. Tests should verify the exact returned `Optional` value (e.g., `Optional.of("booktitle should not contain a year")`) so regressions in message selection/order are caught.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[26-27]
- jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[40-48]
- jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[60-68]
- jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java[79-87]
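
To make the difference concrete, here is a minimal plain-Java sketch (no JUnit, so it runs on its own). `regressedCheckValue` is a hypothetical checker with a deliberate bug, and the two helper methods only mimic what `assertNotEquals(Optional.empty(), ...)` and `assertEquals(Optional.of(...), ...)` would verify; none of these names come from JabRef.

```java
import java.util.Optional;

public class ExactAssertionSketch {

    static final String YEAR_MESSAGE = "booktitle should not contain a year";
    static final String LOCATION_MESSAGE = "booktitle should not contain a location";

    // Hypothetical checker with a deliberate regression: a year triggers the location message
    static Optional<String> regressedCheckValue(String value) {
        return value.contains("2015") ? Optional.of(LOCATION_MESSAGE) : Optional.empty();
    }

    // Mimics assertNotEquals(Optional.empty(), actual): only presence is verified
    static boolean weakAssertionPasses(Optional<String> actual) {
        return actual.isPresent();
    }

    // Mimics assertEquals(Optional.of(expected), actual): the message text is verified too
    static boolean exactAssertionPasses(String expected, Optional<String> actual) {
        return Optional.of(expected).equals(actual);
    }

    public static void main(String[] args) {
        Optional<String> actual = regressedCheckValue("{ECCTD} 2015");
        System.out.println("weak assertion passes:  " + weakAssertionPasses(actual));
        System.out.println("exact assertion passes: " + exactAssertionPasses(YEAR_MESSAGE, actual));
    }
}
```

The weak form lets the wrong message through, while the exact form rejects it, which is the regression risk this finding describes.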



2. Booktitle warns only once 🐞
Description
BooktitleChecker returns on the first matching condition (year, then country, then pages), so a
single booktitle containing multiple kinds of embedded metadata will only ever report one integrity
message and silently skip the others.
Code

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[R42-52]

+        if (CONTAINS_YEAR.test(value)) {
+            return Optional.of(Localization.lang("booktitle should not contain a year"));
+        }
+
+        if (CONTAINS_COUNTRY.test(value)) {
+            return Optional.of(Localization.lang("booktitle should not contain a location"));
+        }
+
+        if (CONTAINS_PAGES.test(value)) {
+            return Optional.of(Localization.lang("booktitle should not contain page numbers"));
+        }
Evidence
BooktitleChecker exits early on the first match, making subsequent checks unreachable for the same
value. The integrity system can emit multiple messages for the same field by registering multiple
ValueCheckers (FieldCheckers uses a Multimap), but this implementation keeps all checks in one
ValueChecker that can only return one Optional message.

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[38-52]
jablib/src/main/java/org/jabref/logic/integrity/ValueChecker.java[5-10]
jablib/src/main/java/org/jabref/logic/integrity/FieldChecker.java[23-27]
jablib/src/main/java/org/jabref/logic/integrity/FieldCheckers.java[18-55]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`BooktitleChecker.checkValue` returns after the first matched rule, so a booktitle containing both a year and a location (or pages) will only produce one warning.
### Issue Context
The integrity framework already supports multiple `ValueChecker`s per field via `FieldCheckers`’ `Multimap<Field, ValueChecker>`; each checker can emit one `IntegrityMessage`.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[38-52]
- jablib/src/main/java/org/jabref/logic/integrity/FieldCheckers.java[18-55]
### Suggested fix
1. Keep `BooktitleChecker` for the existing "ends with conference on" rule (or convert it into one focused checker).
2. Create three new `ValueChecker` implementations:
- `BooktitleContainsYearChecker`
- `BooktitleContainsCountryChecker`
- `BooktitleContainsPagesChecker`
3. Register all of them for `StandardField.BOOKTITLE` in `FieldCheckers.getAllMap(...)` using multiple `put(...)` calls.
4. Add/adjust tests to assert that a booktitle with both year and country yields *two* messages when run through the integrity pipeline (e.g., via `FieldCheckers.getForField(StandardField.BOOKTITLE)` + applying each checker).
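
The suggested split can be sketched in isolation as follows. `ValueChecker` here is a hypothetical one-method stand-in for JabRef's interface, the plain `List` replaces the `Multimap` registration, and the three-country pattern is a placeholder; the point is only that independent checkers let one value produce several messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class MultipleCheckersSketch {

    // Hypothetical stand-in for JabRef's ValueChecker: each checker yields at most one message
    interface ValueChecker {
        Optional<String> checkValue(String value);
    }

    static ValueChecker warnIfFound(Pattern pattern, String message) {
        return value -> pattern.matcher(value).find() ? Optional.of(message) : Optional.empty();
    }

    // Hypothetical registration list standing in for the per-field Multimap entries
    static final List<ValueChecker> BOOKTITLE_CHECKERS = List.of(
            warnIfFound(Pattern.compile("(?<!\\p{Alnum})[12][0-9]{3}(?!\\p{Alnum})"),
                    "booktitle should not contain a year"),
            // Placeholder country list, for illustration only
            warnIfFound(Pattern.compile("(?<!\\p{Alnum})(?:Norway|Austria|Singapore)(?!\\p{Alnum})"),
                    "booktitle should not contain a location"));

    // Runs every registered checker, so no match hides another
    static List<String> collectMessages(String value) {
        List<String> messages = new ArrayList<>();
        for (ValueChecker checker : BOOKTITLE_CHECKERS) {
            checker.checkValue(value).ifPresent(messages::add);
        }
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(collectMessages(
                "European Conference on Circuit Theory and Design, {ECCTD} 2015, Trondheim, Norway"));
    }
}
```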



3. Country regex matches tokens 🐞
Description
The country detection regex only checks adjacent letters for word boundaries, so it will match
country abbreviations inside alphanumeric tokens (e.g., "USA2015"), producing incorrect “location”
warnings despite the intent to match whole words.
Code

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[R26-30]

+        String alternation = Countries.COUNTRY_NAMES.stream()
+                                                    .map(Pattern::quote)
+                                                    .collect(Collectors.joining("|"));
+        CONTAINS_COUNTRY = Pattern.compile("(?i)(?<![a-z])(" + alternation + ")(?![a-z])").asPredicate();
+    }
Evidence
The pattern uses (?<![a-z]) and (?![a-z]), which allow digits immediately after a match. With
(?i) enabled, USA2015 matches usa because 2 is not a letter. The countries list explicitly
includes short abbreviations like usa, uk, and uae, increasing the chance of these false
matches.

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[25-30]
jablib/src/main/java/org/jabref/logic/integrity/Countries.java[52-55]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`CONTAINS_COUNTRY` claims to do whole-word matching, but its boundary checks only exclude letters, not digits/underscores. This makes it match abbreviations like `usa` inside `USA2015`.
### Issue Context
Short abbreviations (`usa`, `uk`, `uae`) exist in `Countries.COUNTRY_NAMES`, so alphanumeric conference tokens can be mis-flagged as locations.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[25-30]
- jablib/src/main/java/org/jabref/logic/integrity/Countries.java[52-55]
### Suggested fix
Replace the `[a-z]`-based lookarounds with real word boundaries, e.g.:
- `Pattern.compile("(?i)\\b(" + alternation + ")\\b")`
Alternatively, if you want Unicode-aware boundaries and to treat digits as part of tokens, use:
- `(?<!\\p{Alnum})(...)(?!\\p{Alnum})`
Add a regression test ensuring strings like `"Proceedings USA2015"` do **not** trigger the location warning.
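
The three boundary styles mentioned above can be compared side by side. In this sketch the alternation is shortened to the three abbreviations named in the finding, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class CountryBoundarySketch {

    // Boundary from the reviewed code: only letters are excluded around the match
    static final Pattern LETTER_GUARD = Pattern.compile("(?i)(?<![a-z])(usa|uk|uae)(?![a-z])");

    // Alternative 1: regex word boundaries (digits and underscore count as word characters)
    static final Pattern WORD_BOUNDARY = Pattern.compile("(?i)\\b(usa|uk|uae)\\b");

    // Alternative 2: explicit alphanumeric guards (underscore is not treated as part of a token)
    static final Pattern ALNUM_GUARD = Pattern.compile("(?i)(?<!\\p{Alnum})(usa|uk|uae)(?!\\p{Alnum})");

    static boolean flags(Pattern pattern, String value) {
        return pattern.matcher(value).find();
    }

    public static void main(String[] args) {
        String value = "Proceedings USA2015";
        System.out.println("letter guard:  " + flags(LETTER_GUARD, value));
        System.out.println("word boundary: " + flags(WORD_BOUNDARY, value));
        System.out.println("alnum guard:   " + flags(ALNUM_GUARD, value));
    }
}
```

The two alternatives agree on `USA2015` but differ on underscores: `\b` treats `USA_2015` as one token, while the `\p{Alnum}` guards do not.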




Remediation recommended

4. Trivial // utility class 📘
Description
The new Countries utility class adds a trivial comment // utility class that restates what the
private constructor already implies. This violates the comment hygiene rule to avoid
trivial/restating comments.
Code

jablib/src/main/java/org/jabref/logic/integrity/Countries.java[R58-60]

+    private Countries() {
+        // utility class
+    }
Evidence
PR Compliance ID 4 forbids adding trivial comments that restate code; the comment // utility class
provides no additional intent beyond the private constructor.

AGENTS.md
jablib/src/main/java/org/jabref/logic/integrity/Countries.java[58-60]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A trivial comment (`// utility class`) was added in the private constructor of `Countries`, restating what the code already makes clear.
## Issue Context
Comment hygiene guidelines require comments to explain intent (“why”), not restate obvious code.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/integrity/Countries.java[58-60]



5. Year regex matches tokens 🐞
Description
The year regex is bounded only by non-digits, so it also matches 4-digit sequences embedded in
alphanumeric tokens (e.g., "ICML2015"), contradicting the “standalone” intent and potentially
creating noisy warnings.
Code

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[R14-16]

+    // Matches a standalone 4-digit year in the range 1000–2999
+    private static final Predicate<String> CONTAINS_YEAR =
+            Pattern.compile("(?<![0-9])[12][0-9]{3}(?![0-9])").asPredicate();
Evidence
(?<![0-9]) / (?![0-9]) only prevent adjacent digits; letters are allowed. Therefore any
occurrence like ICML2015 (letter immediately before the digits) still matches the predicate and
will be flagged as containing a year.

jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[14-16]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`CONTAINS_YEAR` matches 4-digit sequences even when directly attached to letters (e.g., `ICML2015`), even though the comment says it matches a standalone year.
### Issue Context
Current pattern: `(?<![0-9])[12][0-9]{3}(?![0-9])`.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java[14-16]
### Suggested fix
If the intent is truly a standalone year token, switch to an alphanumeric-aware boundary, e.g.:
- `Pattern.compile("\\b[12]\\d{3}\\b")`
Then add a unit test demonstrating that `"ICML2015"` is not flagged while `"ICML 2015"` is flagged.
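
A small sketch of the behaviour in question: the first pattern is the one quoted under "Current pattern", and the second uses explicit alphanumeric guards, the form the author later reports switching to in this thread. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class YearBoundarySketch {

    // Pattern under review: only adjacent digits block a match
    static final Pattern DIGIT_GUARD = Pattern.compile("(?<![0-9])[12][0-9]{3}(?![0-9])");

    // Alphanumeric guards: adjacent letters block a match as well
    static final Pattern ALNUM_GUARD = Pattern.compile("(?<!\\p{Alnum})[12][0-9]{3}(?!\\p{Alnum})");

    static boolean containsYear(Pattern pattern, String value) {
        return pattern.matcher(value).find();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"Proceedings ICML2015", "Proceedings ICML 2015"}) {
            System.out.println(value
                    + " | digit guard: " + containsYear(DIGIT_GUARD, value)
                    + " | alnum guard: " + containsYear(ALNUM_GUARD, value));
        }
    }
}
```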




@Chiragsd13
Contributor Author

The guard-review check is failing due to a missing .github/actions/pr-gate/action.yml in the main repository. This appears to be a CI infrastructure issue unrelated to the changes in this PR. All 51 other checks are passing.

Comment thread jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java Outdated
Comment thread jablib/src/main/java/org/jabref/logic/integrity/BooktitleChecker.java Outdated
@Chiragsd13
Contributor Author

CI Infrastructure Issue: guard-review job failing

Hi maintainers, I wanted to flag that the guard-review check is failing on this PR with the following error:

Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under
'/home/runner/work/jabref/jabref/.github/actions/pr-gate'.
Did you forget to run actions/checkout before running your local action?

Root cause: The workflow .github/workflows/remove-ready-for-review.yml references a local composite action at .github/actions/pr-gate, but that directory/file does not appear to exist in the repository.

Impact on this PR: None — this is unrelated to the code changes here. All 51 other checks are passing. This job only triggered because the PR was just converted from Draft to Ready for Review.

This seems to be a pre-existing infrastructure issue on the main repository side. Happy to proceed with review whenever convenient.

@testlens-app

testlens-app bot commented Apr 1, 2026

✅ All tests passed ✅

🏷️ Commit: 7a50c37
▶️ Tests: 10226 executed
⚪️ Checks: 49/49 completed


Learn more about TestLens at testlens.app.

@github-actions github-actions bot added status: no-bot-comments status: changes-required Pull requests that are not yet complete and removed status: changes-required Pull requests that are not yet complete status: no-bot-comments labels Apr 1, 2026
@Chiragsd13
Contributor Author

I would appreciate it if someone could review and merge it, please

Comment thread jablib/src/main/java/org/jabref/logic/integrity/Countries.java Outdated
Comment thread jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java Outdated
@Siedlerchr Siedlerchr added the status: changes-required Pull requests that are not yet complete label Apr 8, 2026
@github-actions github-actions bot added status: no-bot-comments status: changes-required Pull requests that are not yet complete and removed status: no-bot-comments status: changes-required Pull requests that are not yet complete labels Apr 8, 2026
Enhance BooktitleChecker to flag booktitle values that contain:
- A 4-digit year (e.g. 2015)
- A country name (e.g. Norway, Austria, Singapore)
- Explicit page-number patterns (e.g. "pp. 1–10", "pages 3-7")

Add Countries.java with a hard-coded set of all UN-recognised country
names used for the country-presence check.  The set is built as a single
pre-compiled regex alternation so the pattern is compiled only once.

Update BooktitleCheckerTest with parameterised tests covering all three
new integrity rules and the blank-value / valid-value edge cases.

Closes JabRef#12271
@Chiragsd13 Chiragsd13 force-pushed the fix/12271-booktitle-integrity-checker branch from f31059c to 5108512 Compare April 10, 2026 01:37
- Remove separate Countries.java utility class; inline the
  Locale-derived country set directly into
  BooktitleContainsCountryChecker
- Use Locale.of() instead of Locale.Builder (matches existing
  JabRef convention in Language.java)
- Remove javadoc from new checkers to match existing checker
  style (BracketChecker, TitleChecker, YearChecker have none)
@Chiragsd13 Chiragsd13 force-pushed the fix/12271-booktitle-integrity-checker branch from 6d73d94 to 2268a62 Compare April 10, 2026 01:50
HoussemNasri
HoussemNasri previously approved these changes Apr 10, 2026
Previously the year regex used digit-only lookarounds, so tokens like
ICML2015 matched the 2015 and triggered a false positive. Switching the
boundary to \p{Alnum} mirrors the country checker and prevents matching
years that are embedded inside larger alphanumeric tokens.

Also adds a regression test and a CHANGELOG entry under [Unreleased].
@Chiragsd13
Contributor Author

Pushed 11048eae addressing the remaining review items:

Qodo review finding 5 — year regex over-matches digits inside tokens

The previous pattern (?<![0-9])[12][0-9]{3}(?![0-9]) still matched 2015 inside ICML2015, because the lookaround only blocked adjacent digits. Switched to (?<!\p{Alnum})[12][0-9]{3}(?!\p{Alnum}) so letters next to the year also break the match. This brings the year checker in line with the boundary already used by BooktitleContainsCountryChecker.

Added a regression test in BooktitleCheckerTest:

@Test
void booktitleYearNotFlaggedInsideAlphanumericToken() {
    // "ICML2015" should NOT be flagged — the digits are part of a larger token
    assertEquals(Optional.empty(), yearChecker.checkValue("Proceedings ICML2015"));
}

CHANGELOG entry

Added one line under [Unreleased], in the Added section:

We added integrity checks that warn when the booktitle field contains a year, a country/location, or page numbers that should live in dedicated fields. #12271

@Siedlerchr — your earlier inline feedback

Both points should now be resolved on the current HEAD:

  • Countries.java is gone — the country checker enumerates names via Locale.getISOCountries() + Locale.of("", code).getDisplayCountry(Locale.ENGLISH).
  • The test file uses // region / // endregion IntelliJ fold markers around each feature area.

Windows CI red check on this PR

The Windows-side failure (RelatedWorkInserterTest#insertMatchedRelatedWorkAppendsToExistingUserSpecificCommentField) looks unrelated to this change — it's the known Windows test flake tracked in #15537 with the fix in #15538. Happy to rebase once that lands if it's still blocking this PR.

Ready for re-review.

@github-actions github-actions bot added status: changes-required Pull requests that are not yet complete and removed status: no-bot-comments labels Apr 16, 2026
@subhramit
Member

Please avoid using AI to communicate with us. It is against our guidelines for contribution/interaction.

@Chiragsd13
Contributor Author

Status update after the last push:

  • CHANGELOG.md needs to be modified if indicated — now passing. The checklist box was [/] instead of [x], which the workflow reads as unticked; fixed in the PR description edit above.
  • Unit tests (Windows) (jablib) — still red, but the failing test is RelatedWorkInserterTest#insertMatchedRelatedWorkAppendsToExistingUserSpecificCommentField at line 61, which is the known Windows flake tracked in Fix failing Windows tests #15537 (fix in Fix failed test on Windows #15538). BooktitleCheckerTest itself passes. I don't have re-run permissions on this repo; a maintainer re-running that single job (or merging Fix failed test on Windows #15538) should clear it.

All other checks are green. Ready for re-review.

@subhramit
Member

As mentioned above, this would be a final warning - please respect our code of conduct.
Letting this PR in because maintainers have already spent energy reviewing.

@Chiragsd13
Contributor Author

Sorry about that, you're right. It won't happen again. I'll write my own replies from here on.

@subhramit subhramit added this pull request to the merge queue Apr 16, 2026
@github-actions github-actions bot added the status: to-be-merged PRs which are accepted and should go into the merge-queue. label Apr 16, 2026
Merged via the queue into JabRef:main with commit 3e8e99c Apr 16, 2026
53 checks passed

Labels

good second issue Issues that involve a tour of two or three interweaved components in JabRef status: changes-required Pull requests that are not yet complete status: to-be-merged PRs which are accepted and should go into the merge-queue.

Development

Successfully merging this pull request may close these issues.

New integrity checker for booktitle

4 participants