Add options to loadtxt and use stream reading for speed #1160
fiolj wants to merge 5 commits into fortran-lang:master
Codecov Report

Coverage diff, master vs. #1160:

| Metric | master | #1160 | +/- |
|---|---|---|---|
| Coverage | 68.00% | 67.93% | -0.08% |
| Files | 404 | 404 | |
| Lines | 12935 | 12906 | -29 |
| Branches | 1392 | 1382 | -10 |
| Hits | 8797 | 8768 | -29 |
| Misses | 4138 | 4138 | |
Pull request overview
This PR updates `stdlib_io:loadtxt` to support more flexible text-file formats (comments, blank lines, column selection) and reimplements parsing using stream-based reading for performance.
Changes:
- Extend `loadtxt` API to accept either a filename or an open unit, plus new options: `comments`, `delimiter`, `skiplines`, `max_rows`, `usecols`.
- Rework `loadtxt` internals to read the whole file via stream I/O and parse numerics from an in-memory buffer.
- Update tests, examples, and IO specification docs to match the new interface and behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `src/io/stdlib_io.fypp` | Adds new `loadtxt` overloads/options and stream-based parsing implementation. |
| `test/io/test_loadtxt.f90` | Adjusts existing tests and adds coverage for comments/blank lines and new options. |
| `test/io/test_loadtxt_qp.fypp` | Updates QP tests to use tolerance-based comparisons. |
| `example/io/example_loadtxt.f90` | Refreshes usage example to the new interface (removes `fmt` usage). |
| `doc/specs/stdlib_io.md` | Updates `loadtxt` specification to document the new arguments/overloads. |

    max_rows_ = min(optval(max_rows, nrows_effective), nrows_effective)
    ! If there is no data we will return an empty array
    if ((max_rows_ <= 0) .or. (ncols == 0)) then
        allocate (d(0, 0))
        return

`max_rows` is documented to allow negative values meaning “read all data”, but `max_rows_ = min(optval(max_rows, nrows_effective), nrows_effective)` makes any negative value produce `max_rows_ <= 0`, returning an empty array. Handle `max_rows < 0` as `nrows_effective` explicitly before taking `min`.
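
One way to express the fix, as a minimal sketch rather than the PR's actual code: `effective_max_rows` is a hypothetical helper (the name and the module are assumptions) that treats an absent or negative `max_rows` as "all rows" and only then clamps to the number of available rows.

```fortran
module max_rows_sketch
    implicit none
    private
    public :: effective_max_rows
contains
    !> Hypothetical helper: number of data rows to read.
    !> An absent or negative max_rows means "read everything".
    pure function effective_max_rows(nrows_effective, max_rows) result(n)
        integer, intent(in) :: nrows_effective
        integer, intent(in), optional :: max_rows
        integer :: n
        n = nrows_effective
        if (present(max_rows)) then
            ! Clamp only non-negative requests; negative values keep n as is
            if (max_rows >= 0) n = min(max_rows, nrows_effective)
        end if
    end function effective_max_rows
end module max_rows_sketch
```

With this shape, `max_rows=0` still yields zero rows, so the existing empty-array branch stays reachable for that case.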

    elemental function shift_to_nonwhitespace(s) result(p)
        !! move string to position of the next non white space character
        character(len=*), intent(in) :: s !! character chain
        integer :: p !! position
        !----------------------------------------------
        ! p = verify(s, blanks//nl)
        p = verify(s, blanks)
        if (p == 0) p = len(s)
    end function shift_to_nonwhitespace

Blank-line detection and leading-whitespace skipping don’t treat CR/LF as whitespace (`shift_to_nonwhitespace` uses `verify(s, blanks)` where `blanks` is space+TAB). For CRLF empty lines, `line_start` becomes 1 (CR) and `line_start == line_end` is false, so the line can be miscounted/parsed as data. Include CR/LF in the whitespace set (e.g., `verify(s, blanks//nl)` or expand `blanks`) so empty/whitespace-only lines are handled correctly across LF and CRLF files.
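
A minimal sketch of the idea, assuming a whitespace set of space, TAB, CR and LF. `first_nonwhitespace` is a hypothetical name, and unlike the PR's `shift_to_nonwhitespace` it returns 0 for an all-whitespace string so the caller can test for a blank line directly.

```fortran
module whitespace_sketch
    implicit none
    private
    public :: first_nonwhitespace
    ! Assumed whitespace set: space, TAB, CR, LF
    character(len=*), parameter :: whitespace = &
        ' '//achar(9)//achar(13)//achar(10)
contains
    !> Position of the first character of s that is not whitespace,
    !> or 0 when s is empty or contains only whitespace.
    pure function first_nonwhitespace(s) result(p)
        character(len=*), intent(in) :: s
        integer :: p
        p = verify(s, whitespace)
    end function first_nonwhitespace
end module whitespace_sketch
```

A line consisting only of CR, LF, spaces or TABs then reports 0 on both LF and CRLF files, independent of where the line terminator sits.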

    if (starts_with(ffp(line_start:), comment_) .or. &
        (line_start == line_end)) then
        ffp => ffp(line_end + 1:) ! Skip comment lines and blank lines
        cycle
    end if

    istart = 0
    do j = 1, ncol - 1
        iend = index(line(istart+1:), delimiter_)
        read (line(istart+1:istart+iend-1),*,iostat=ios,iomsg=iomsg) d(i, j)
        if (ios/=0) then
            write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
            call error_stop(msg=trim(msgout))
        end if
        istart = istart + iend

    row_effective = row_effective + 1
    do j = 1, ncols ! Read a row
    #:if 'complex' in t1
        reval = to_num_from_stream(ffp, reval)
        if (in_delim(ffp, delim_)) then
            ffp => ffp(shift_to_nondelim(ffp, delim_):)
        end if
        imval = to_num_from_stream(ffp, imval)
        if (in_delim(ffp, delim_)) then
            ffp => ffp(shift_to_nondelim(ffp, delim_):)
        end if
        cols(j) = cmplx(reval, imval, kind(cols(j)))
    #:else
        cols(j) = to_num_from_stream(ffp, cols(j))
        if (in_delim(ffp, delim_)) then
            ffp => ffp(shift_to_nondelim(ffp, delim_):)
        end if
    #:endif

        if (scan(ffp(1:1), nl) /= 0) then ! If EOL => no more cols
            exit
        end if
    end do

The `comments` argument is documented as “from comment symbol until line end everything else will be ignored”, but the parser only skips lines that start with the comment marker. Inline comments (e.g. `1 2 3 # note`) will be fed into `to_num_from_stream` and can be mis-parsed as extra numeric columns. Add logic to stop parsing a row when the next non-whitespace char begins with `comment_` (or strip the comment substring before numeric parsing).
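
The second option (stripping the comment before numeric parsing) could look roughly like this. It is a sketch under assumptions: `data_length` is a hypothetical helper that works on a single row already isolated from the buffer, and an empty marker is treated as "no comments".

```fortran
module inline_comment_sketch
    implicit none
    private
    public :: data_length
contains
    !> Length of the data part of row: everything before the first
    !> occurrence of the comment marker, or the whole row if there is none.
    pure function data_length(row, comment) result(n)
        character(len=*), intent(in) :: row      !! one line of the file
        character(len=*), intent(in) :: comment  !! comment marker, e.g. '#'
        integer :: n, pos
        n = len(row)
        if (len(comment) == 0) return   ! empty marker: nothing to strip
        pos = index(row, comment)
        if (pos > 0) n = pos - 1
    end function data_length
end module inline_comment_sketch
```

The parser would then restrict numeric parsing to `row(:data_length(row, comment_))`; a result of 0 marks a comment-only line.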

    s = open(filename)
    comment_ = optval(comments, comment_default)
    delim_ = optval(delimiter, delimiter_default)

`delimiter`/`comments` are accepted as `character(len=*)`, but an empty string is currently unsafe: `starts_with(..., "")` is always true and `index(row, "")` returns 1, which would treat every line as a comment and/or make delimiter handling degenerate. Either reject empty `comments`/`delimiter` with a clear error, or normalize `len_trim(delim_)==0` to the whitespace-delimiter behavior and `len_trim(comment_)==0` to “no comments”.

Suggested change:

    - delim_ = optval(delimiter, delimiter_default)
    + delim_ = optval(delimiter, delimiter_default)
    + ! Normalize/validate empty comment and delimiter values
    + if (len_trim(comment_) == 0) then
    +     if (present(comments)) then
    +         write(msgout,'(a)') 'loadtxt error: empty comment string is not allowed'
    +         call error_stop(msg=trim(msgout))
    +     else
    +         comment_ = comment_default
    +     end if
    + end if
    + if (len_trim(delim_) == 0) then
    +     delim_ = delimiter_default
    + end if

    if (present(usecols)) then ! user set columns to extract
        usecols_ = usecols
    else ! extract all columns
        usecols_ = [(j, j=1, ncols)]
    end if

    end do

    ! Default to format used for savetxt if fmt not specified.
    #:if 'real' in t1
    fmt_ = optval(fmt, "(*"//FMT_REAL_${k1}$(1:len(FMT_REAL_${k1}$)-1)//",:,1x))")
    #:elif 'complex' in t1
    fmt_ = optval(fmt, "(*"//FMT_COMPLEX_${k1}$(1:len(FMT_COMPLEX_${k1}$)-1)//",:,1x))")
    #:else
    fmt_ = optval(fmt, "*")
    #:endif

    if ( fmt_ == '*' ) then
        ! Use list directed read if user has specified fmt='*'
        if (is_blank(delimiter_) .or. delimiter_ == ",") then
            do i = 1, max_rows_
                read (s,*,iostat=ios,iomsg=iomsg) d(i, :)

                if (ios/=0) then
                    write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
                    call error_stop(msg=trim(msgout))
                end if

            enddo
        ! Otherwise read each value separately
        else
            do i = 1, max_rows_
                call get_line(s, line, ios, iomsg_)
                if (ios/=0) then
                    write(msgout,2) trim(iomsg_),size(d,2),i,trim(filename)
                    call error_stop(msg=trim(msgout))
    allocate (d(max_rows_, size(usecols_)))
    allocate (cols(ncols)) ! Used to hold each row

    row_effective = 0
    ffp => ff(start_effective:) ! Reset pointer to the beginning of the file after skiplines
    nrows = nrows - skiplines_

    do row = 1, nrows
        line_end = shift_to_eol(ffp)
        line_start = shift_to_nonwhitespace(ffp(:line_end)) ! Avoid initial blanks in line

        if (starts_with(ffp(line_start:), comment_) .or. &
            (line_start == line_end)) then
            ffp => ffp(line_end + 1:) ! Skip comment lines and blank lines
            cycle
        end if

        istart = 0
        do j = 1, ncol - 1
            iend = index(line(istart+1:), delimiter_)
            read (line(istart+1:istart+iend-1),*,iostat=ios,iomsg=iomsg) d(i, j)
            if (ios/=0) then
                write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
                call error_stop(msg=trim(msgout))
            end if
            istart = istart + iend

        row_effective = row_effective + 1
        do j = 1, ncols ! Read a row
    #:if 'complex' in t1
            reval = to_num_from_stream(ffp, reval)
            if (in_delim(ffp, delim_)) then
                ffp => ffp(shift_to_nondelim(ffp, delim_):)
            end if
            imval = to_num_from_stream(ffp, imval)
            if (in_delim(ffp, delim_)) then
                ffp => ffp(shift_to_nondelim(ffp, delim_):)
            end if
            cols(j) = cmplx(reval, imval, kind(cols(j)))
    #:else
            cols(j) = to_num_from_stream(ffp, cols(j))
            if (in_delim(ffp, delim_)) then
                ffp => ffp(shift_to_nondelim(ffp, delim_):)
            end if
    #:endif

            if (scan(ffp(1:1), nl) /= 0) then ! If EOL => no more cols
                exit
            end if
        end do
        ! Copy the columns of the current row to d(row_effective,:)
        do j = 1, size(usecols_)
            d(row_effective, j) = cols(usecols_(j))
        end do

`usecols_` indices are used directly in `cols(usecols_(j))` without validation. If a caller passes 0, a negative index, or a value > `ncols`, this will be an out-of-bounds access. Validate that all `usecols_` are within `[1, ncols]` (and consider whether duplicates are allowed) before allocating/filling `d`.
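
A minimal sketch of such a range check. `usecols_valid` is a hypothetical helper, and the sketch assumes duplicates are allowed, since the PR's own test passes `usecols=[3, 1, 1, 2]`.

```fortran
module usecols_sketch
    implicit none
    private
    public :: usecols_valid
contains
    !> True when every requested column index lies in [1, ncols].
    !> Duplicates are accepted; an empty selection is trivially valid.
    pure function usecols_valid(usecols, ncols) result(ok)
        integer, intent(in) :: usecols(:)  !! requested column indices
        integer, intent(in) :: ncols       !! columns present in the file
        logical :: ok
        ok = all(usecols >= 1 .and. usecols <= ncols)
    end function usecols_valid
end module usecols_sketch
```

Calling this before `allocate (d(max_rows_, size(usecols_)))` and stopping with an error on `.false.` would replace the out-of-bounds access with a clear message.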

    !----------------------------------------- Load file in a single string
    inquire (unit=unit, size=fsze)
    if(fsze == 0) then
        write (msgout,'(a,i0,a)') 'loadtxt error: file empty'

The internal write for the empty-file error uses format `'(a,i0,a)'` but only supplies a single character argument, which will raise a runtime formatting error. Use a single `'(a)'` format (or supply the missing items) so the error path is reliable.

Suggested change:

    - write (msgout,'(a,i0,a)') 'loadtxt error: file empty'
    + write (msgout,'(a)') 'loadtxt error: file empty'

    ! first argument is unit
    inquire (unit=unit, opened=opened, action=readable)
    if((.not. opened) .or. (readable(1:1) /= 'R')) then
        write (msgout,'(a,i0,a)') 'loadtxt error: unit ',unit,' not open for reading'
        call error_stop(msg=trim(msgout))
    end if
    write(fout,'(i0)') unit
    fout = adjustl(fout) ! fout is used for unified error message later
    #:endif

    !----------------------------------------- Load file in a single string
    inquire (unit=unit, size=fsze)
    if(fsze == 0) then
        write (msgout,'(a,i0,a)') 'loadtxt error: file empty'
        call error_stop(msg=trim(msgout))
    else
        allocate (character(fsze) :: ff)
        read (unit, iostat=iostat, iomsg=iomsg) ff
        if (iostat /= 0) then

The unit overload assumes the passed unit supports `INQUIRE(..., SIZE=...)` and an unformatted stream `read(unit) ff`. This will fail for a typical formatted sequential text unit, and also depends on the current file position (no `rewind` / `pos=1`). Either (1) require and validate `access='stream'` + `form='unformatted'` (and read with `pos=1`), or (2) implement a formatted line-reading fallback for non-stream units.
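
Option (1) could start from a check like the one below. This is only a sketch: `is_unformatted_stream` is a hypothetical helper built on the standard `INQUIRE` specifiers `OPENED=`, `ACCESS=` and `FORM=`, and it assumes the caller passes a valid unit number.

```fortran
module stream_unit_sketch
    implicit none
    private
    public :: is_unformatted_stream
contains
    !> True when unit is connected for unformatted stream access.
    function is_unformatted_stream(unit) result(ok)
        integer, intent(in) :: unit
        logical :: ok
        logical :: opened
        character(len=16) :: access, form
        ok = .false.
        inquire (unit=unit, opened=opened)
        if (.not. opened) return
        ! INQUIRE reports these values in upper case
        inquire (unit=unit, access=access, form=form)
        ok = (trim(access) == 'STREAM') .and. (trim(form) == 'UNFORMATTED')
    end function is_unformatted_stream
end module stream_unit_sketch
```

When the check passes, reading with `read (unit, pos=1, iostat=iostat, iomsg=iomsg) ff` makes the result independent of the current file position; when it fails, the routine can either stop with a clear error or fall back to formatted line reading.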

    elemental function shift_to_eol(s) result(p)
        !! move string to position of the next end-of-line character
        character(len=*), intent(in) :: s !! character chain
        integer :: p !! position
        !----------------------------------------------
        p = scan(s, nl)
        if (p < len(s)) then ! If CRLF, move to LF
            if (s(p:p + 1) == nl) p = p + 1
        end if

    end function shift_to_eol

`shift_to_eol` returns `scan(s,nl)` directly. When no CR/LF exists (e.g., last line without a trailing newline), `scan` returns 0; callers then slice `ffp(:line_end)` and `ffp(line_end+1:)`, which is invalid. Make `shift_to_eol` return `len(s)` (or `len(s)+1` sentinel) when no EOL is found, and guard the CRLF check for `p==0`.
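
A sketch of the guarded version, using the `len(s)` convention for a final line without a terminator. `end_of_line` is a hypothetical stand-in for `shift_to_eol`, and the sketch assumes the line terminators are CR and LF.

```fortran
module eol_sketch
    implicit none
    private
    public :: end_of_line
    character(len=*), parameter :: cr = achar(13), lf = achar(10)
contains
    !> Position of the last character of the first line in s:
    !> the LF of an LF or CRLF terminator, or len(s) when s has no terminator.
    pure function end_of_line(s) result(p)
        character(len=*), intent(in) :: s
        integer :: p
        p = scan(s, cr//lf)
        if (p == 0) then
            p = len(s)                           ! unterminated last line
        else if (p < len(s)) then
            if (s(p:p + 1) == cr//lf) p = p + 1  ! CRLF: point at the LF
        end if
    end function end_of_line
end module eol_sketch
```

With this convention `ffp(:line_end)` is always a valid slice, and `ffp(line_end + 1:)` becomes a zero-length string at the end of the buffer instead of an invalid reference.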

    call loadtxt(fname, expected, skiplines=4) ! Skip comment and first line
    call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
    if (allocated(error)) return
    ! Test max_rows option
    call loadtxt(fname, expected, max_rows=4) ! Skip comment and first line
    call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
    if (allocated(error)) return
    ! Test usecols option
    call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Skip comment and first line

This comment says “Skip comment and first line”, but `max_rows=4` is passed without `skiplines`, so no skipping happens here. Update the comment (or add `skiplines=` if that’s what is intended) to avoid misleading future readers.

Suggested change:

    - call loadtxt(fname, expected, skiplines=4) ! Skip comment and first line
    - call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
    - if (allocated(error)) return
    - ! Test max_rows option
    - call loadtxt(fname, expected, max_rows=4) ! Skip comment and first line
    - call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
    - if (allocated(error)) return
    - ! Test usecols option
    - call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Skip comment and first line
    + call loadtxt(fname, expected, skiplines=4) ! Skip comment and first data line
    + call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
    + if (allocated(error)) return
    + ! Test max_rows option
    + call loadtxt(fname, expected, max_rows=4) ! Read only first 4 data rows
    + call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
    + if (allocated(error)) return
    + ! Test usecols option
    + call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Read selected columns

    `fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read.
    `skiplines` (optional): Skip the first `skiplines` lines from file, including comments. If skipping more lines than present, a 0-sized array will be returned. The default is 0.

    `max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiprows` will be read. A negative value results in reading all data. The default is to read all lines of data.

The `max_rows` argument description still refers to `skiprows`, but the option was renamed to `skiplines`. Update the doc text to match the new argument name so users aren’t confused.

Suggested change:

    - `max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiprows` will be read. A negative value results in reading all data. The default is to read all lines of data.
    + `max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiplines` will be read. A negative value results in reading all data. The default is to read all lines of data.

This PR modifies `loadtxt()` to address two issues:

The new specs would be:

`loadtxt` - load a 2D array from a text file

Status

Experimental

Description

Loads a rank-2 `array` from a text file.

Syntax

`call [[stdlib_io(module):loadtxt(interface)]](filename, array [, comments] [, delimiter] [, skiplines] [, max_rows] [, usecols])`

`call [[stdlib_io(module):loadtxt(interface)]](unit, array [, comments] [, delimiter] [, skiplines] [, max_rows] [, usecols])`

Arguments

`filename` or `unit`: Shall be a character expression containing the file name or an integer containing the unit of an already open file from which to load the rank-2 `array`.

`array`: Shall be an allocatable rank-2 array of type `real`, `complex` or `integer`.

`comments` (optional): Shall be a character expression of any length used to indicate the start of a comment. Default: `#`.

`delimiter` (optional): Shall be a character expression of length 1 that contains the delimiter used to separate the columns. The default is an empty string `''`, indicating that any amount of whitespace will be considered a delimiter.

`skiplines` (optional): Skip the first `skiplines` lines from file, including comments. If skipping more lines than present, a 0-sized array will be returned. The default is 0.

`max_rows` (optional): Shall be an integer indicating that `max_rows` rows of data after `skiplines` will be read. A negative value results in reading all data. The default is to read all lines of data.

`usecols` (optional): Shall be an integer array indicating which columns will be read. For example, `usecols = (1,3,5)` will extract the first, third and fifth columns. The default is to read all columns.

Return value

Returns an allocated rank-2 `array` with the content of the file.