Add options to loadtxt and use stream reading for speed#1160

Draft
fiolj wants to merge 5 commits into fortran-lang:master from fiolj:loadtxt-stream

fiolj commented Mar 27, 2026

This PR modifies loadtxt() in three ways:

  1. It makes the accepted file format more flexible by allowing comments and empty lines.
  2. It adds optional arguments to handle different types of files; in particular, it is compatible with the proposed changes to savetxt.
  3. It reimplements the reader, in collaboration with @jalvesz, using stream access (as discussed here), which produces speed-ups by factors between 2 and 3.

The new specs would be:


loadtxt - load a 2D array from a text file

Status

Experimental

Description

Loads a rank-2 array from a text file.

Syntax

call [[stdlib_io(module):loadtxt(interface)]] (filename, array [, comments] [, delimiter] [, skiplines] [, max_rows] [, usecols])

call [[stdlib_io(module):loadtxt(interface)]] (unit, array [, comments] [, delimiter] [, skiplines] [, max_rows] [, usecols])

Arguments

filename or unit: Shall be a character expression containing the file name or an integer containing the unit of an already open file from which to load the rank-2 array.

array: Shall be an allocatable rank-2 array of type real, complex or integer.

comments (optional): Shall be a character expression of any length used to indicate the start of a comment. Default: #.

delimiter (optional): Shall be a character expression of length 1 that contains the delimiter used to separate the columns. The default is an empty string '', which means that any amount of whitespace is treated as a delimiter.

skiplines (optional): Skip the first skiplines lines from file, including comments. If skipping more lines than present, a 0-sized array will be returned. The default is 0.

max_rows (optional): Shall be an integer indicating that max_rows rows of data after skiplines will be read. A negative value results in reading all data. The default is to read all lines of data.

usecols (optional): Shall be an integer array indicating which columns will be read. For example, usecols = [1, 3, 5] will extract the first, third and fifth columns. The default is to read all columns.

Return value

Returns an allocated rank-2 array with the content of the file.

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.93%. Comparing base (fb63d7e) to head (db19b92).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| example/io/example_loadtxt.f90 | 0.00% | 3 Missing ⚠️ |
| test/io/test_loadtxt.f90 | 97.22% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1160      +/-   ##
==========================================
- Coverage   68.00%   67.93%   -0.08%     
==========================================
  Files         404      404              
  Lines       12935    12906      -29     
  Branches     1392     1382      -10     
==========================================
- Hits         8797     8768      -29     
  Misses       4138     4138              

Copilot AI left a comment

Pull request overview

This PR updates stdlib_io:loadtxt to support more flexible text-file formats (comments, blank lines, column selection) and reimplements parsing using stream-based reading for performance.

Changes:

  • Extend loadtxt API to accept either a filename or an open unit, plus new options: comments, delimiter, skiplines, max_rows, usecols.
  • Rework loadtxt internals to read the whole file via stream I/O and parse numerics from an in-memory buffer.
  • Update tests, examples, and IO specification docs to match the new interface and behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| src/io/stdlib_io.fypp | Adds new loadtxt overloads/options and stream-based parsing implementation. |
| test/io/test_loadtxt.f90 | Adjusts existing tests and adds coverage for comments/blank lines and new options. |
| test/io/test_loadtxt_qp.fypp | Updates QP tests to use tolerance-based comparisons. |
| example/io/example_loadtxt.f90 | Refreshes usage example to the new interface (removes fmt usage). |
| doc/specs/stdlib_io.md | Updates loadtxt specification to document the new arguments/overloads. |

Comment on lines +232 to +236
max_rows_ = min(optval(max_rows, nrows_effective), nrows_effective)
! If there is no data we will return an empty array
if ((max_rows_ <= 0) .or. (ncols == 0)) then
    allocate (d(0, 0))
    return

Copilot AI Mar 29, 2026

max_rows is documented to allow negative values meaning “read all data”, but max_rows_ = min(optval(max_rows, nrows_effective), nrows_effective) makes any negative value produce max_rows_ <= 0, returning an empty array. Handle max_rows < 0 as nrows_effective explicitly before taking min.
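
One way to express the explicit negative-value handling is sketched below. It is a minimal illustration, not the PR's code: the module `loadtxt_max_rows_sketch` and the helper `resolve_max_rows` are hypothetical names, and only `max_rows` and `nrows_effective` are taken from the snippet above.

```fortran
module loadtxt_max_rows_sketch
    implicit none
    private
    public :: resolve_max_rows
contains
    !> Hypothetical helper: number of data rows that should be read.
    pure function resolve_max_rows(nrows_effective, max_rows) result(n)
        integer, intent(in) :: nrows_effective     !! data rows available
        integer, intent(in), optional :: max_rows  !! user request, may be negative
        integer :: n

        n = nrows_effective                        ! default: read everything
        if (present(max_rows)) then
            ! A negative value means "no limit"; otherwise clamp to what exists.
            if (max_rows >= 0) n = min(max_rows, nrows_effective)
        end if
    end function resolve_max_rows
end module loadtxt_max_rows_sketch
```

With this shape, `resolve_max_rows(10, -1)` gives 10, `resolve_max_rows(10, 4)` gives 4, and only an explicit `max_rows=0` leads to the empty-array branch.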

Comment on lines +389 to +397
elemental function shift_to_nonwhitespace(s) result(p)
    !! move string to position of the next non white space character
    character(len=*), intent(in) :: s !! character chain
    integer :: p !! position
    !----------------------------------------------
    ! p = verify(s, blanks//nl)
    p = verify(s, blanks)
    if (p == 0) p = len(s)
end function shift_to_nonwhitespace

Copilot AI Mar 29, 2026

Blank-line detection and leading-whitespace skipping don’t treat CR/LF as whitespace (shift_to_nonwhitespace uses verify(s, blanks) where blanks is space+TAB). For CRLF empty lines, line_start becomes 1 (CR) and line_start == line_end is false, so the line can be miscounted/parsed as data. Include CR/LF in the whitespace set (e.g., verify(s, blanks//nl) or expand blanks) so empty/whitespace-only lines are handled correctly across LF and CRLF files.
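
A small sketch of the expanded whitespace set follows. The module `loadtxt_whitespace_sketch` and the helpers `first_nonwhitespace` and `is_blank_line` are hypothetical names chosen for illustration; the only assumption carried over from the PR is that `verify` is used to locate the first non-blank character.

```fortran
module loadtxt_whitespace_sketch
    implicit none
    private
    public :: first_nonwhitespace, is_blank_line
    ! Space, TAB, CR and LF are all treated as whitespace in this sketch.
    character(len=*), parameter :: whitespace = &
        ' '//achar(9)//achar(13)//achar(10)
contains
    !> Position of the first non-whitespace character, or 0 if there is none.
    pure function first_nonwhitespace(s) result(p)
        character(len=*), intent(in) :: s
        integer :: p
        p = verify(s, whitespace)
    end function first_nonwhitespace

    !> True when the line holds nothing but whitespace or line terminators.
    pure function is_blank_line(s) result(blank)
        character(len=*), intent(in) :: s
        logical :: blank
        blank = first_nonwhitespace(s) == 0
    end function is_blank_line
end module loadtxt_whitespace_sketch
```

Because CR and LF are in the set, an empty CRLF line and an empty LF line are classified the same way, without comparing `line_start` against `line_end`.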

Comment on lines +255 to +283
+ if (starts_with(ffp(line_start:), comment_) .or. &
+     (line_start == line_end)) then
+     ffp => ffp(line_end + 1:) ! Skip comment lines and blank lines
+     cycle
+ end if

- istart = 0
- do j = 1, ncol - 1
-     iend = index(line(istart+1:), delimiter_)
-     read (line(istart+1:istart+iend-1),*,iostat=ios,iomsg=iomsg) d(i, j)
-     if (ios/=0) then
-         write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
-         call error_stop(msg=trim(msgout))
-     end if
-     istart = istart + iend

+ row_effective = row_effective + 1
+ do j = 1, ncols ! Read a row
+ #:if 'complex' in t1
+     reval = to_num_from_stream(ffp, reval)
+     if (in_delim(ffp, delim_)) then
+         ffp => ffp(shift_to_nondelim(ffp, delim_):)
+     end if
+     imval = to_num_from_stream(ffp, imval)
+     if (in_delim(ffp, delim_)) then
+         ffp => ffp(shift_to_nondelim(ffp, delim_):)
+     end if
+     cols(j) = cmplx(reval, imval, kind(cols(j)))
+ #:else
+     cols(j) = to_num_from_stream(ffp, cols(j))
+     if (in_delim(ffp, delim_)) then
+         ffp => ffp(shift_to_nondelim(ffp, delim_):)
+     end if
+ #:endif

+     if (scan(ffp(1:1), nl) /= 0) then ! If EOL => no more cols
+         exit
+     end if
+ end do

Copilot AI Mar 29, 2026

The comments argument is documented as “from comment symbol until line end everything else will be ignored”, but the parser only skips lines that start with the comment marker. Inline comments (e.g. 1 2 3 # note) will be fed into to_num_from_stream and can be mis-parsed as extra numeric columns. Add logic to stop parsing a row when the next non-whitespace char begins with comment_ (or strip the comment substring before numeric parsing).
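
The "strip the comment substring" option could look roughly like the sketch below. It is only an illustration under stated assumptions: the module `loadtxt_comment_sketch` and the helper `data_length` are hypothetical, and the sketch works on a single line held in a character variable rather than on the PR's stream pointer.

```fortran
module loadtxt_comment_sketch
    implicit none
    private
    public :: data_length
contains
    !> Hypothetical helper: length of the part of `line` that precedes the
    !> comment marker, i.e. the span that should be parsed as numbers.
    pure function data_length(line, comment) result(n)
        character(len=*), intent(in) :: line
        character(len=*), intent(in) :: comment
        integer :: n, p

        n = len(line)                  ! no marker found: parse the whole line
        if (len(comment) == 0) return  ! empty marker: nothing to strip
        p = index(line, comment)
        if (p > 0) n = p - 1           ! keep only what comes before the marker
    end function data_length
end module loadtxt_comment_sketch
```

For the line `1 2 3 # note` with marker `#`, the helper returns 6, so only `line(1:6)` would reach the numeric parser. A line that starts with the marker yields 0 and can be skipped as before.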


s = open(filename)
comment_ = optval(comments, comment_default)
delim_ = optval(delimiter, delimiter_default)

Copilot AI Mar 29, 2026

delimiter/comments are accepted as character(len=*), but an empty string is currently unsafe: starts_with(..., "") is always true and index(row, "") returns 1, which would treat every line as a comment and/or make delimiter handling degenerate. Either reject empty comments/delimiter with a clear error, or normalize len_trim(delim_)==0 to the whitespace-delimiter behavior and len_trim(comment_)==0 to “no comments”.

Suggested change
- delim_ = optval(delimiter, delimiter_default)
+ delim_ = optval(delimiter, delimiter_default)
+ ! Normalize/validate empty comment and delimiter values
+ if (len_trim(comment_) == 0) then
+     if (present(comments)) then
+         write(msgout,'(a)') 'loadtxt error: empty comment string is not allowed'
+         call error_stop(msg=trim(msgout))
+     else
+         comment_ = comment_default
+     end if
+ end if
+ if (len_trim(delim_) == 0) then
+     delim_ = delimiter_default
+ end if

Comment on lines +239 to 287
+ if (present(usecols)) then ! user set columns to extract
+     usecols_ = usecols
+ else ! extract all columns
+     usecols_ = [(j, j=1, ncols)]
+ end if

- end do

- ! Default to format used for savetxt if fmt not specified.
- #:if 'real' in t1
- fmt_ = optval(fmt, "(*"//FMT_REAL_${k1}$(1:len(FMT_REAL_${k1}$)-1)//",:,1x))")
- #:elif 'complex' in t1
- fmt_ = optval(fmt, "(*"//FMT_COMPLEX_${k1}$(1:len(FMT_COMPLEX_${k1}$)-1)//",:,1x))")
- #:else
- fmt_ = optval(fmt, "*")
- #:endif

- if ( fmt_ == '*' ) then
-     ! Use list directed read if user has specified fmt='*'
-     if (is_blank(delimiter_) .or. delimiter_ == ",") then
-         do i = 1, max_rows_
-             read (s,*,iostat=ios,iomsg=iomsg) d(i, :)

-             if (ios/=0) then
-                 write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
-                 call error_stop(msg=trim(msgout))
-             end if

-         enddo
-     ! Otherwise read each value separately
-     else
-         do i = 1, max_rows_
-             call get_line(s, line, ios, iomsg_)
-             if (ios/=0) then
-                 write(msgout,2) trim(iomsg_),size(d,2),i,trim(filename)
-                 call error_stop(msg=trim(msgout))
+ allocate (d(max_rows_, size(usecols_)))
+ allocate (cols(ncols)) ! Used to hold each row

+ row_effective = 0
+ ffp => ff(start_effective:) ! Reset pointer to the beginning of the file after skiplines
+ nrows = nrows - skiplines_

+ do row = 1, nrows
+     line_end = shift_to_eol(ffp)
+     line_start = shift_to_nonwhitespace(ffp(:line_end)) ! Avoid initial blanks in line

+     if (starts_with(ffp(line_start:), comment_) .or. &
+         (line_start == line_end)) then
+         ffp => ffp(line_end + 1:) ! Skip comment lines and blank lines
+         cycle
+     end if

-             istart = 0
-             do j = 1, ncol - 1
-                 iend = index(line(istart+1:), delimiter_)
-                 read (line(istart+1:istart+iend-1),*,iostat=ios,iomsg=iomsg) d(i, j)
-                 if (ios/=0) then
-                     write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
-                     call error_stop(msg=trim(msgout))
-                 end if
-                 istart = istart + iend

+     row_effective = row_effective + 1
+     do j = 1, ncols ! Read a row
+ #:if 'complex' in t1
+         reval = to_num_from_stream(ffp, reval)
+         if (in_delim(ffp, delim_)) then
+             ffp => ffp(shift_to_nondelim(ffp, delim_):)
+         end if
+         imval = to_num_from_stream(ffp, imval)
+         if (in_delim(ffp, delim_)) then
+             ffp => ffp(shift_to_nondelim(ffp, delim_):)
+         end if
+         cols(j) = cmplx(reval, imval, kind(cols(j)))
+ #:else
+         cols(j) = to_num_from_stream(ffp, cols(j))
+         if (in_delim(ffp, delim_)) then
+             ffp => ffp(shift_to_nondelim(ffp, delim_):)
+         end if
+ #:endif

+         if (scan(ffp(1:1), nl) /= 0) then ! If EOL => no more cols
+             exit
+         end if
+     end do
+     ! Copy the columns of the current row to d(row_effective,:)
+     do j = 1, size(usecols_)
+         d(row_effective, j) = cols(usecols_(j))
+     end do

Copilot AI Mar 29, 2026

usecols_ indices are used directly in cols(usecols_(j)) without validation. If a caller passes 0, a negative index, or a value > ncols, this will be an out-of-bounds access. Validate that all usecols_ are within [1, ncols] (and consider whether duplicates are allowed) before allocating/filling d.
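
The range check described here can be written as a single array expression. The sketch below assumes nothing about the PR beyond the names `usecols` and `ncols`; the module `loadtxt_usecols_sketch` and the function `usecols_in_range` are hypothetical.

```fortran
module loadtxt_usecols_sketch
    implicit none
    private
    public :: usecols_in_range
contains
    !> Hypothetical helper: true when every requested column index lies
    !> in [1, ncols]. Duplicates and arbitrary ordering are accepted.
    pure function usecols_in_range(usecols, ncols) result(ok)
        integer, intent(in) :: usecols(:)
        integer, intent(in) :: ncols
        logical :: ok
        ok = all(usecols >= 1 .and. usecols <= ncols)
    end function usecols_in_range
end module loadtxt_usecols_sketch
```

A caller could run this check before allocating the output array and stop with an error message when it returns `.false.`. The test's `usecols=[3, 1, 1, 2]` passes for a three-column file, so repeated columns remain allowed under this check.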

!----------------------------------------- Load file in a single string
inquire (unit=unit, size=fsze)
if(fsze == 0) then
write (msgout,'(a,i0,a)') 'loadtxt error: file empty'

Copilot AI Mar 29, 2026

The internal write for the empty-file error uses format '(a,i0,a)' but only supplies a single character argument, which will raise a runtime formatting error. Use a single '(a)' format (or supply the missing items) so the error path is reliable.

Suggested change
- write (msgout,'(a,i0,a)') 'loadtxt error: file empty'
+ write (msgout,'(a)') 'loadtxt error: file empty'

Comment on lines +171 to +189
! first argument is unit
inquire (unit=unit, opened=opened, action=readable)
if((.not. opened) .or. (readable(1:1) /= 'R')) then
    write (msgout,'(a,i0,a)') 'loadtxt error: unit ',unit,' not open for reading'
    call error_stop(msg=trim(msgout))
end if
write(fout,'(i0)') unit
fout = adjustl(fout) ! fout is used for unified error message later
#:endif

!----------------------------------------- Load file in a single string
inquire (unit=unit, size=fsze)
if(fsze == 0) then
    write (msgout,'(a,i0,a)') 'loadtxt error: file empty'
    call error_stop(msg=trim(msgout))
else
    allocate (character(fsze) :: ff)
    read (unit, iostat=iostat, iomsg=iomsg) ff
    if (iostat /= 0) then

Copilot AI Mar 29, 2026

The unit overload assumes the passed unit supports INQUIRE(..., SIZE=...) and an unformatted stream read(unit) ff. This will fail for a typical formatted sequential text unit, and also depends on the current file position (no rewind / pos=1). Either (1) require and validate access='stream' + form='unformatted' (and read with pos=1), or (2) implement a formatted line-reading fallback for non-stream units.
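
Option (1) might be sketched as follows. This is an assumption-laden illustration rather than the PR's implementation: the module `loadtxt_stream_sketch`, the subroutine `read_whole_unit`, and the error code 1 are hypothetical. It relies only on standard `inquire` specifiers and on `pos=` for stream reads.

```fortran
module loadtxt_stream_sketch
    implicit none
    private
    public :: read_whole_unit
contains
    !> Hypothetical helper: read an entire unformatted-stream unit into a string,
    !> regardless of where the unit is currently positioned.
    subroutine read_whole_unit(unit, contents, iostat)
        integer, intent(in) :: unit
        character(len=:), allocatable, intent(out) :: contents
        integer, intent(out) :: iostat
        character(len=16) :: access_mode, form_mode
        integer :: fsize

        inquire (unit=unit, access=access_mode, form=form_mode, size=fsize)
        if (access_mode /= 'STREAM' .or. form_mode /= 'UNFORMATTED') then
            iostat = 1      ! hypothetical code: unit is not an unformatted stream
            contents = ''
            return
        end if

        ! SIZE= may be negative when the size cannot be determined.
        allocate (character(len=max(fsize, 0)) :: contents)
        iostat = 0
        ! pos=1 makes the read independent of the current file position.
        if (fsize > 0) read (unit, pos=1, iostat=iostat) contents
    end subroutine read_whole_unit
end module loadtxt_stream_sketch
```

A formatted sequential unit is rejected up front here instead of failing inside the read; a line-reading fallback, option (2), would replace that early return.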

Comment on lines +333 to +343
elemental function shift_to_eol(s) result(p)
    !! move string to position of the next end-of-line character
    character(len=*), intent(in) :: s !! character chain
    integer :: p !! position
    !----------------------------------------------
    p = scan(s, nl)
    if (p < len(s)) then ! If CRLF, move to LF
        if (s(p:p + 1) == nl) p = p + 1
    end if

end function shift_to_eol

Copilot AI Mar 29, 2026

shift_to_eol returns scan(s,nl) directly. When no CR/LF exists (e.g., last line without a trailing newline), scan returns 0; callers then slice ffp(:line_end) and ffp(line_end+1:), which is invalid. Make shift_to_eol return len(s) (or len(s)+1 sentinel) when no EOL is found, and guard the CRLF check for p==0.
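
A guarded variant could look like the sketch below, which takes the "return len(s)" route. The module `loadtxt_eol_sketch` and the function `end_of_line` are hypothetical names, and the sketch assumes the end-of-line set is exactly CR and LF.

```fortran
module loadtxt_eol_sketch
    implicit none
    private
    public :: end_of_line
    character(len=*), parameter :: cr = achar(13), lf = achar(10)
contains
    !> Hypothetical helper: position of the last character of the first line,
    !> including its terminator when one is present.
    pure function end_of_line(s) result(p)
        character(len=*), intent(in) :: s
        integer :: p

        p = scan(s, cr//lf)
        if (p == 0) then
            p = len(s)      ! no terminator: the line runs to the end of the buffer
        else if (p < len(s)) then
            if (s(p:p + 1) == cr//lf) p = p + 1   ! CRLF: land on the LF
        end if
    end function end_of_line
end module loadtxt_eol_sketch
```

With `p = len(s)` as the fallback, a slice such as `s(p + 1:)` is a valid zero-length string, so a final line without a trailing newline no longer produces an invalid section.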

Comment on lines +191 to +199
call loadtxt(fname, expected, skiplines=4) ! Skip comment and first line
call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
if (allocated(error)) return
! Test max_rows option
call loadtxt(fname, expected, max_rows=4) ! Skip comment and first line
call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
if (allocated(error)) return
! Test usecols option
call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Skip comment and first line

Copilot AI Mar 29, 2026

This comment says “Skip comment and first line”, but max_rows=4 is passed without skiplines, so no skipping happens here. Update the comment (or add skiplines= if that’s what is intended) to avoid misleading future readers.

Suggested change
- call loadtxt(fname, expected, skiplines=4) ! Skip comment and first line
- call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
- if (allocated(error)) return
- ! Test max_rows option
- call loadtxt(fname, expected, max_rows=4) ! Skip comment and first line
- call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
- if (allocated(error)) return
- ! Test usecols option
- call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Skip comment and first line
+ call loadtxt(fname, expected, skiplines=4) ! Skip comment and first data line
+ call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
+ if (allocated(error)) return
+ ! Test max_rows option
+ call loadtxt(fname, expected, max_rows=4) ! Read only first 4 data rows
+ call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
+ if (allocated(error)) return
+ ! Test usecols option
+ call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Read selected columns

- `fmt` (optional): Fortran format specifier for the text read. Defaults to the write format for the data type. Setting fmt='*' will specify list directed read.
+ `skiplines` (optional): Skip the first `skiplines` lines from file, including comments. If skipping more lines than present, a 0-sized array will be returned. The default is 0.

+ `max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiprows` will be read. A negative value results in reading all data. The default is to read all lines of data.

Copilot AI Mar 29, 2026

The max_rows argument description still refers to skiprows, but the option was renamed to skiplines. Update the doc text to match the new argument name so users aren’t confused.

Suggested change
- `max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiprows` will be read. A negative value results in reading all data. The default is to read all lines of data.
+ `max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiplines` will be read. A negative value results in reading all data. The default is to read all lines of data.
