Add Row Text Splitter node for line-based document chunking #6138
Dexterity104 wants to merge 2 commits into FlowiseAI:main
Code Review
This pull request introduces the RowTextSplitter component, which enables splitting text into individual rows based on a configurable separator. The implementation includes options for trimming whitespace and filtering empty lines, as well as a node wrapper for integration. A review comment suggests optimizing the line-splitting logic by using a regular expression to handle both LF and CRLF line endings more efficiently during the split operation, rather than performing replacements within a loop.
    const rawLines = text.split(this.lineSeparator)
    const lines: string[] = []

    for (let raw of rawLines) {
        if (this.lineSeparator === '\n') {
            raw = raw.replace(/\r$/, '')
        }
The current implementation of line splitting and carriage return handling can be optimized. Performing a regex replacement inside a loop for every line is less efficient than handling it during the split operation, especially for large documents like logs or CSVs. Using a regex in the split method when the separator is a newline is a more performant and idiomatic approach to handle both LF and CRLF line endings.
    - const rawLines = text.split(this.lineSeparator)
    - const lines: string[] = []
    - for (let raw of rawLines) {
    -     if (this.lineSeparator === '\n') {
    -         raw = raw.replace(/\r$/, '')
    -     }
    + const rawLines = this.lineSeparator === '\n' ? text.split(/\r?\n/) : text.split(this.lineSeparator)
    + const lines: string[] = []
    + for (const raw of rawLines) {
References
- Prioritize code readability and understandability over conciseness. A series of simple, chained operations can be preferable to a single, more complex one (e.g., a complex regex with a replacer function) if it improves understandability and reduces the potential for future errors.
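To make the trade-off in this suggestion concrete, here is a minimal sketch that puts the two approaches side by side. The function names `splitWithLoop` and `splitWithRegex` and the sample text are hypothetical and do not come from the PR; only the split and replace expressions mirror the diff above.

```typescript
// Hypothetical helper mirroring the original code: split first, then strip a trailing CR per line.
function splitWithLoop(text: string, lineSeparator: string): string[] {
    const lines: string[] = []
    for (let raw of text.split(lineSeparator)) {
        if (lineSeparator === '\n') {
            raw = raw.replace(/\r$/, '') // remove the CR left behind by a CRLF ending
        }
        lines.push(raw)
    }
    return lines
}

// Hypothetical helper mirroring the suggestion: let the split itself consume an optional CR.
function splitWithRegex(text: string, lineSeparator: string): string[] {
    return lineSeparator === '\n' ? text.split(/\r?\n/) : text.split(lineSeparator)
}

// Made-up sample mixing CRLF and LF endings.
const mixedEndings = 'id,name\r\n1,Ada\n2,Grace\r\n'
console.log(splitWithLoop(mixedEndings, '\n'))
console.log(splitWithRegex(mixedEndings, '\n'))
```

For this sample both helpers return the same rows. One difference worth checking before adopting the suggestion: if the text ends with a bare `\r` that is not followed by `\n`, the loop version strips it from the last row while the regex version leaves it in place. With trimming enabled this should not matter, since trimming removes the stray `\r` either way.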
…fferent line separators
Summary
Add a new Row Text Splitter node under Text Splitters in packages/components. It's meant for line-based content (CSV-style data, table exports, logs) where each row should become its own document.
Closes #6112
What's included
- RowTextSplitter node wired into the existing TextSplitter ecosystem
- Configurable line separator (default \n, supports escaped values like \r\n, \t)
- Built on TextSplitter, so it works with:
  - splitText
  - splitDocuments
  - createDocuments
  - loc info

Behavior
For input:
With defaults (separator \n, trim on, empty lines off), the splitter produces four chunks:
Each line is a separate document, aligned exactly with the original rows.
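The default behavior described above (separator \n, trim on, empty lines off) can be sketched as a small standalone function. This is an illustration under assumptions, not the PR's implementation: the option names `lineSeparator`, `trimWhitespace`, and `keepEmptyLines`, the helper `decodeSeparator`, and the sample input are all hypothetical, and the sample is not the input used in the PR description.

```typescript
// Hypothetical option names; the PR's actual field names may differ.
interface RowSplitOptions {
    lineSeparator?: string // may arrive in escaped form, e.g. "\\r\\n" or "\\t"
    trimWhitespace?: boolean
    keepEmptyLines?: boolean
}

// Convert escaped sequences typed into a text field into the real characters.
function decodeSeparator(value: string): string {
    return value.replace(/\\r/g, '\r').replace(/\\n/g, '\n').replace(/\\t/g, '\t')
}

// Split text into rows, one chunk per row.
function splitRows(text: string, options: RowSplitOptions = {}): string[] {
    const separator = decodeSeparator(options.lineSeparator ?? '\n')
    const trim = options.trimWhitespace ?? true
    const keepEmpty = options.keepEmptyLines ?? false

    const rows: string[] = []
    for (const raw of text.split(separator)) {
        const row = trim ? raw.trim() : raw
        if (row.length === 0 && !keepEmpty) continue // drop empty rows unless asked to keep them
        rows.push(row)
    }
    return rows
}

// Made-up CSV-style sample with trailing whitespace and a blank line.
const exampleInput = 'id,name\n1,Ada \n\n2,Grace\n'
console.log(splitRows(exampleInput))
```

With the defaults, the blank line and the empty trailing segment are dropped and the trailing space after "Ada" is trimmed, so the sample yields three rows. Passing `{ keepEmptyLines: true }` keeps the empty rows, and an escaped separator such as `'\\t'` is decoded to a real tab before splitting.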