Add options to loadtxt and use stream reading for speed by fiolj · Pull Request #1160 · fortran-lang/stdlib

fiolj · 2026-03-27T21:19:29Z

This PR modifies loadtxt() to address two issues:

- It allows flexibility in the format of the data files, accepting comments and empty lines.
- It adds optional arguments to handle different types of files; in particular, it is compatible with the proposed changes to savetxt.

The code was implemented in collaboration with @jalvesz using stream access (as discussed here), producing speed-ups by factors between 2 and 3.

The new specs would be:

`loadtxt` - load a 2D array from a text file

Status

Experimental

Description

Loads a rank-2 array from a text file.

Syntax

call [[stdlib_io(module):loadtxt(interface)]] (filename, array [, comments] [, delimiter] [, skiplines] [, max_rows] [, usecols])

call [[stdlib_io(module):loadtxt(interface)]] (unit, array [, comments] [, delimiter] [, skiplines] [, max_rows] [, usecols])

Arguments

filename or unit: Shall be a character expression containing the file name or an integer containing the unit of an already open file from which to load the rank-2 array.

array: Shall be an allocatable rank-2 array of type real, complex or integer.

comments (optional): Shall be a character expression of any length used to indicate the start of a comment. Default: #.

delimiter (optional): Shall be a character expression of length 1 that contains the delimiter used to separate the columns. The default is an empty string '', indicating that any amount of whitespace is treated as a delimiter.

skiplines (optional): Skip the first skiplines lines from file, including comments. If skipping more lines than present, a 0-sized array will be returned. The default is 0.

max_rows (optional): Shall be an integer indicating that max_rows rows of data after skiplines will be read. A negative value results in reading all data. The default is to read all lines of data.

usecols (optional): Shall be an integer array indicating which columns will be read. For example, usecols = [1, 3, 5] will extract the first, third and fifth columns. The default is to read all columns.

Return value

Returns an allocated rank-2 array with the content of the file.

More extensive testing is still needed.

codecov · 2026-03-27T22:07:29Z

Codecov Report

❌ Patch coverage is 93.33333% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.93%. Comparing base (fb63d7e) to head (db19b92).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| example/io/example_loadtxt.f90 | 0.00% | 3 Missing ⚠️ |
| test/io/test_loadtxt.f90 | 97.22% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1160      +/-   ##
==========================================
- Coverage   68.00%   67.93%   -0.08%     
==========================================
  Files         404      404              
  Lines       12935    12906      -29     
  Branches     1392     1382      -10     
==========================================
- Hits         8797     8768      -29     
  Misses       4138     4138


Copilot

Pull request overview

This PR updates stdlib_io:loadtxt to support more flexible text-file formats (comments, blank lines, column selection) and reimplements parsing using stream-based reading for performance.

Changes:

- Extend the loadtxt API to accept either a filename or an open unit, plus new options: comments, delimiter, skiplines, max_rows, usecols.
- Rework loadtxt internals to read the whole file via stream I/O and parse numerics from an in-memory buffer.
- Update tests, examples, and IO specification docs to match the new interface and behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.

| File | Description |
|---|---|
| `src/io/stdlib_io.fypp` | Adds new `loadtxt` overloads/options and stream-based parsing implementation. |
| `test/io/test_loadtxt.f90` | Adjusts existing tests and adds coverage for comments/blank lines and new options. |
| `test/io/test_loadtxt_qp.fypp` | Updates QP tests to use tolerance-based comparisons. |
| `example/io/example_loadtxt.f90` | Refreshes usage example to the new interface (removes `fmt` usage). |
| `doc/specs/stdlib_io.md` | Updates `loadtxt` specification to document the new arguments/overloads. |

Copilot · 2026-03-29T12:59:24Z

src/io/stdlib_io.fypp

+        max_rows_ = min(optval(max_rows, nrows_effective), nrows_effective)
+        ! If there is no data we will return an empty array
+        if ((max_rows_ <= 0) .or. (ncols == 0)) then
+            allocate (d(0, 0))
+            return


max_rows is documented to allow negative values meaning “read all data”, but max_rows_ = min(optval(max_rows, nrows_effective), nrows_effective) makes any negative value produce max_rows_ <= 0, returning an empty array. Handle max_rows < 0 as nrows_effective explicitly before taking min.
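
A minimal sketch of the normalization suggested above, assuming a small helper that resolves the number of rows to read. The name `effective_max_rows` is hypothetical and is not part of the PR or of stdlib; it only illustrates treating a negative or absent `max_rows` as "read everything" before clamping.

```fortran
program max_rows_demo
    implicit none

    ! Negative values and an absent argument both mean "read all rows".
    if (effective_max_rows(-1, 10) /= 10) error stop 'negative should read all'
    if (effective_max_rows(nrows_effective=10) /= 10) error stop 'default should read all'
    ! Non-negative values are clamped to the rows actually available.
    if (effective_max_rows(4, 10) /= 4) error stop 'limit not applied'
    if (effective_max_rows(25, 10) /= 10) error stop 'overshoot not clamped'
    if (effective_max_rows(0, 10) /= 0) error stop 'zero should read nothing'

    print '(a)', 'max_rows normalization ok'

contains

    ! Hypothetical helper (assumption, not the PR's code).
    pure integer function effective_max_rows(max_rows, nrows_effective) result(n)
        integer, intent(in), optional :: max_rows
        integer, intent(in) :: nrows_effective

        n = nrows_effective
        if (present(max_rows)) then
            if (max_rows >= 0) n = min(max_rows, nrows_effective)
        end if
    end function effective_max_rows

end program max_rows_demo
```

With this ordering, only an explicit `max_rows=0` (or a file without data rows) leads to the empty-array branch.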

Copilot · 2026-03-29T12:59:24Z

src/io/stdlib_io.fypp

+    elemental function shift_to_nonwhitespace(s) result(p)
+    !! move string to position of the next non white space character
+        character(len=*), intent(in) :: s !! character chain
+        integer :: p !! position
+        !----------------------------------------------
+        ! p = verify(s, blanks//nl)
+        p = verify(s, blanks)
+        if (p == 0) p = len(s)
+    end function shift_to_nonwhitespace


Blank-line detection and leading-whitespace skipping don’t treat CR/LF as whitespace (shift_to_nonwhitespace uses verify(s, blanks) where blanks is space+TAB). For CRLF empty lines, line_start becomes 1 (CR) and line_start == line_end is false, so the line can be miscounted/parsed as data. Include CR/LF in the whitespace set (e.g., verify(s, blanks//nl) or expand blanks) so empty/whitespace-only lines are handled correctly across LF and CRLF files.
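
The difference between the two whitespace sets can be checked in isolation. The sketch below assumes, as the comment states, that `blanks` is space plus TAB and that `nl` holds CR and LF; the constants are redefined locally so the program stands alone.

```fortran
program blank_line_demo
    implicit none
    character(len=*), parameter :: blanks = ' '//achar(9)      ! space + TAB
    character(len=*), parameter :: nl = achar(13)//achar(10)   ! CR + LF
    character(len=:), allocatable :: line

    ! Whitespace-only line terminated by CRLF: two spaces, TAB, CR, LF.
    line = '  '//achar(9)//nl

    ! With only space and TAB in the set, verify stops on the CR at position 4.
    if (verify(line, blanks) /= 4) error stop 'expected the CR to be reported'

    ! With CR and LF included, verify returns 0: the line is pure whitespace.
    if (verify(line, blanks//nl) /= 0) error stop 'blank line not detected'

    ! A data line still reports the position of its first significant character.
    line = '  1.5 2.5'//nl
    if (verify(line, blanks//nl) /= 3) error stop 'data start not found'

    print '(a)', 'blank-line detection ok'
end program blank_line_demo
```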

Copilot · 2026-03-29T12:59:25Z

src/io/stdlib_io.fypp

+            if (starts_with(ffp(line_start:), comment_) .or. &
+                (line_start == line_end)) then
+                ffp => ffp(line_end + 1:) ! Skip comment lines and blank lines
+                cycle
            end if
-
-            istart = 0
-            do j = 1, ncol - 1
-              iend = index(line(istart+1:), delimiter_)
-              read (line(istart+1:istart+iend-1),*,iostat=ios,iomsg=iomsg) d(i, j)
-              if (ios/=0) then 
-                 write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
-                 call error_stop(msg=trim(msgout))
-              end if
-              istart = istart + iend
+
+            row_effective = row_effective + 1
+            do j = 1, ncols     ! Read a row
+              #:if 'complex' in t1
+                reval = to_num_from_stream(ffp, reval)
+                if (in_delim(ffp, delim_)) then
+                    ffp => ffp(shift_to_nondelim(ffp, delim_):)
+                end if
+                imval = to_num_from_stream(ffp, imval)
+                if (in_delim(ffp, delim_)) then
+                    ffp => ffp(shift_to_nondelim(ffp, delim_):)
+                end if
+                cols(j) = cmplx(reval, imval, kind(cols(j)))
+              #:else  
+                cols(j) = to_num_from_stream(ffp, cols(j))
+                if (in_delim(ffp, delim_)) then
+                    ffp => ffp(shift_to_nondelim(ffp, delim_):)
+                end if
+              #:endif
+
+                if (scan(ffp(1:1), nl) /= 0) then ! If EOL => no more cols
+                    exit
+                end if
+            end do


The comments argument is documented as “from comment symbol until line end everything else will be ignored”, but the parser only skips lines that start with the comment marker. Inline comments (e.g. 1 2 3 # note) will be fed into to_num_from_stream and can be mis-parsed as extra numeric columns. Add logic to stop parsing a row when the next non-whitespace char begins with comment_ (or strip the comment substring before numeric parsing).
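
One way to honor inline comments is to cut each row at the comment marker before parsing. The sketch below is an illustration under assumptions: `data_length` is a hypothetical helper, and a list-directed internal read stands in for the PR's `to_num_from_stream`, which is not reproduced here.

```fortran
program inline_comment_demo
    implicit none
    character(len=:), allocatable :: line
    integer :: n
    real :: vals(3)

    line = '1 2 3 # note'
    n = data_length(line, '#')
    if (line(:n) /= '1 2 3 ') error stop 'comment not stripped'

    ! Only the data part of the row is handed to the numeric parser.
    read (line(:n), *) vals
    if (any(vals /= [1.0, 2.0, 3.0])) error stop 'unexpected values'

    ! A row that starts with the marker has no data at all.
    if (data_length('# header only', '#') /= 0) error stop 'comment row not empty'
    ! A row without the marker is kept whole.
    if (data_length('4 5 6', '#') /= 5) error stop 'plain row truncated'

    print '(a)', 'inline comment handling ok'

contains

    ! Hypothetical helper: number of leading characters that belong to data.
    pure integer function data_length(row, comment) result(n)
        character(len=*), intent(in) :: row, comment
        integer :: pos

        n = len(row)
        if (len(comment) == 0) return   ! empty marker: treat as "no comments"
        pos = index(row, comment)
        if (pos > 0) n = pos - 1
    end function data_length

end program inline_comment_demo
```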

Copilot · 2026-03-29T12:59:25Z

src/io/stdlib_io.fypp


-      s = open(filename)
+        comment_ = optval(comments, comment_default)
+        delim_ = optval(delimiter, delimiter_default)


delimiter/comments are accepted as character(len=*), but an empty string is currently unsafe: starts_with(..., "") is always true and index(row, "") returns 1, which would treat every line as a comment and/or make delimiter handling degenerate. Either reject empty comments/delimiter with a clear error, or normalize len_trim(delim_)==0 to the whitespace-delimiter behavior and len_trim(comment_)==0 to “no comments”.

Suggested change

-        delim_ = optval(delimiter, delimiter_default)
+        delim_   = optval(delimiter, delimiter_default)
+        ! Normalize/validate empty comment and delimiter values
+        if (len_trim(comment_) == 0) then
+            if (present(comments)) then
+                write(msgout,'(a)') 'loadtxt error: empty comment string is not allowed'
+                call error_stop(msg=trim(msgout))
+            else
+                comment_ = comment_default
+            end if
+        end if
+        if (len_trim(delim_) == 0) then
+            delim_ = delimiter_default
+        end if

Copilot · 2026-03-29T12:59:25Z

src/io/stdlib_io.fypp

+        if (present(usecols)) then ! user set columns to extract
+            usecols_ = usecols
+        else                    ! extract all columns
+            usecols_ = [(j, j=1, ncols)]
        end if
-
-      end do
-
-      ! Default to format used for savetxt if fmt not specified.
-      #:if 'real' in t1
-      fmt_ = optval(fmt, "(*"//FMT_REAL_${k1}$(1:len(FMT_REAL_${k1}$)-1)//",:,1x))")
-      #:elif 'complex' in t1
-      fmt_ = optval(fmt, "(*"//FMT_COMPLEX_${k1}$(1:len(FMT_COMPLEX_${k1}$)-1)//",:,1x))")
-      #:else
-      fmt_ = optval(fmt, "*")
-      #:endif      
-
-      if ( fmt_ == '*' ) then
-        ! Use list directed read if user has specified fmt='*'
-        if (is_blank(delimiter_) .or. delimiter_ == ",") then
-          do i = 1, max_rows_
-            read (s,*,iostat=ios,iomsg=iomsg) d(i, :)
-
-            if (ios/=0) then 
-              write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
-              call error_stop(msg=trim(msgout))
-            end if          
-
-          enddo
-        ! Otherwise read each value separately
-        else
-          do i = 1, max_rows_
-            call get_line(s, line, ios, iomsg_)
-            if (ios/=0) then 
-               write(msgout,2) trim(iomsg_),size(d,2),i,trim(filename)
-               call error_stop(msg=trim(msgout))
+        allocate (d(max_rows_, size(usecols_)))
+        allocate (cols(ncols))  ! Used to hold each row
+
+        row_effective = 0
+        ffp => ff(start_effective:) ! Reset pointer to the beginning of the file after skiplines
+        nrows = nrows - skiplines_
+
+        do row = 1, nrows
+            line_end = shift_to_eol(ffp)
+            line_start = shift_to_nonwhitespace(ffp(:line_end)) ! Avoid initial blanks in line
+
+            if (starts_with(ffp(line_start:), comment_) .or. &
+                (line_start == line_end)) then
+                ffp => ffp(line_end + 1:) ! Skip comment lines and blank lines
+                cycle
            end if
-
-            istart = 0
-            do j = 1, ncol - 1
-              iend = index(line(istart+1:), delimiter_)
-              read (line(istart+1:istart+iend-1),*,iostat=ios,iomsg=iomsg) d(i, j)
-              if (ios/=0) then 
-                 write(msgout,2) trim(iomsg),size(d,2),i,trim(filename)
-                 call error_stop(msg=trim(msgout))
-              end if
-              istart = istart + iend
+
+            row_effective = row_effective + 1
+            do j = 1, ncols     ! Read a row
+              #:if 'complex' in t1
+                reval = to_num_from_stream(ffp, reval)
+                if (in_delim(ffp, delim_)) then
+                    ffp => ffp(shift_to_nondelim(ffp, delim_):)
+                end if
+                imval = to_num_from_stream(ffp, imval)
+                if (in_delim(ffp, delim_)) then
+                    ffp => ffp(shift_to_nondelim(ffp, delim_):)
+                end if
+                cols(j) = cmplx(reval, imval, kind(cols(j)))
+              #:else  
+                cols(j) = to_num_from_stream(ffp, cols(j))
+                if (in_delim(ffp, delim_)) then
+                    ffp => ffp(shift_to_nondelim(ffp, delim_):)
+                end if
+              #:endif
+
+                if (scan(ffp(1:1), nl) /= 0) then ! If EOL => no more cols
+                    exit
+                end if
+            end do
+            ! Copy the columns of the current row to d(row_effective,:)
+            do j = 1, size(usecols_)
+                d(row_effective, j) = cols(usecols_(j))
            end do


usecols_ indices are used directly in cols(usecols_(j)) without validation. If a caller passes 0, a negative index, or a value > ncols, this will be an out-of-bounds access. Validate that all usecols_ are within [1, ncols] (and consider whether duplicates are allowed) before allocating/filling d.
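
The bounds check could look roughly like the following. `usecols_valid` is a hypothetical name used only for illustration; duplicates are deliberately accepted here because the PR's own test calls `loadtxt` with `usecols=[3, 1, 1, 2]`.

```fortran
program usecols_demo
    implicit none

    ! Reordered and repeated columns are fine as long as they are in range.
    if (.not. usecols_valid([3, 1, 1, 2], 3)) error stop 'valid selection rejected'
    ! Zero, negative and too-large indices must be rejected before indexing.
    if (usecols_valid([0, 1], 3)) error stop 'zero index accepted'
    if (usecols_valid([-1], 3)) error stop 'negative index accepted'
    if (usecols_valid([1, 4], 3)) error stop 'index beyond ncols accepted'

    print '(a)', 'usecols validation ok'

contains

    ! Hypothetical helper: true when every requested column lies in [1, ncols].
    pure logical function usecols_valid(usecols, ncols) result(ok)
        integer, intent(in) :: usecols(:)
        integer, intent(in) :: ncols

        ok = all(usecols >= 1 .and. usecols <= ncols)
    end function usecols_valid

end program usecols_demo
```

In `loadtxt` the failing case would presumably go through `error_stop` with a message naming the offending index, in line with the other error paths.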

Copilot · 2026-03-29T12:59:26Z

src/io/stdlib_io.fypp

+        !----------------------------------------- Load file in a single string
+        inquire (unit=unit, size=fsze)
+        if(fsze == 0) then
+            write (msgout,'(a,i0,a)') 'loadtxt error: file empty'


The internal write for the empty-file error uses format '(a,i0,a)' but only supplies a single character argument, which will raise a runtime formatting error. Use a single '(a)' format (or supply the missing items) so the error path is reliable.

Suggested change

-            write (msgout,'(a,i0,a)') 'loadtxt error: file empty'
+            write (msgout,'(a)') 'loadtxt error: file empty'

Copilot · 2026-03-29T12:59:26Z

src/io/stdlib_io.fypp

+        ! first argument is unit
+        inquire (unit=unit, opened=opened, action=readable) 
+        if((.not. opened) .or. (readable(1:1) /= 'R')) then
+            write (msgout,'(a,i0,a)') 'loadtxt error: unit ',unit,' not open for reading'
+            call error_stop(msg=trim(msgout))
+        end if
+        write(fout,'(i0)') unit
+        fout = adjustl(fout)  ! fout is used for unified error message later
+      #:endif
+
+        !----------------------------------------- Load file in a single string
+        inquire (unit=unit, size=fsze)
+        if(fsze == 0) then
+            write (msgout,'(a,i0,a)') 'loadtxt error: file empty'
+            call error_stop(msg=trim(msgout))
+        else
+            allocate (character(fsze) :: ff)
+            read (unit, iostat=iostat, iomsg=iomsg) ff
+            if (iostat /= 0) then


The unit overload assumes the passed unit supports INQUIRE(..., SIZE=...) and an unformatted stream read(unit) ff. This will fail for a typical formatted sequential text unit, and also depends on the current file position (no rewind / pos=1). Either (1) require and validate access='stream' + form='unformatted' (and read with pos=1), or (2) implement a formatted line-reading fallback for non-stream units.
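
Option (1) can be sketched as a standalone program: check the connection with `inquire`, then read the whole file from `pos=1` so the result does not depend on the current file position. The scratch file name and variable names are assumptions made for this example and do not come from the PR.

```fortran
program stream_read_demo
    implicit none
    character(len=*), parameter :: fname = 'stream_read_demo.txt'  ! scratch file (assumption)
    character(len=:), allocatable :: ff
    character(len=16) :: access_mode, form_mode
    integer :: unit, fsze, iostat

    ! Create a small text file to read back.
    open (newunit=unit, file=fname, status='replace', action='write')
    write (unit, '(a)') '1 2 3'
    write (unit, '(a)') '4 5 6'
    close (unit)

    open (newunit=unit, file=fname, status='old', action='read', &
          access='stream', form='unformatted')

    ! Validate what a caller-supplied unit would have to satisfy.
    inquire (unit=unit, access=access_mode, form=form_mode, size=fsze)
    if (access_mode /= 'STREAM' .or. form_mode /= 'UNFORMATTED') &
        error stop 'unit is not connected for unformatted stream access'
    if (fsze <= 0) error stop 'file is empty or its size is unknown'

    ! pos=1 makes the read independent of where the unit currently points.
    allocate (character(len=fsze) :: ff)
    read (unit, pos=1, iostat=iostat) ff
    if (iostat /= 0) error stop 'stream read failed'
    close (unit, status='delete')

    if (index(ff, '1 2 3') /= 1) error stop 'first row missing'
    if (index(ff, '4 5 6') == 0) error stop 'second row missing'
    print '(a,i0,a)', 'read ', fsze, ' bytes in one stream read'
end program stream_read_demo
```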

Copilot · 2026-03-29T12:59:26Z

src/io/stdlib_io.fypp

+    elemental function shift_to_eol(s) result(p)
+        !! move string to position of the next end-of-line character
+        character(len=*), intent(in) :: s !! character chain
+        integer :: p !! position
+        !----------------------------------------------
+        p = scan(s, nl)
+        if (p < len(s)) then ! If CRLF, move to LF
+            if (s(p:p + 1) == nl) p = p + 1
+        end if
+
+    end function shift_to_eol


shift_to_eol returns scan(s,nl) directly. When no CR/LF exists (e.g., last line without a trailing newline), scan returns 0; callers then slice ffp(:line_end) and ffp(line_end+1:), which is invalid. Make shift_to_eol return len(s) (or len(s)+1 sentinel) when no EOL is found, and guard the CRLF check for p==0.
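
A guarded variant might look like the sketch below. `safe_eol` is a hypothetical stand-in for `shift_to_eol`, and the CR/LF constants are defined locally on the assumption that `nl` in the PR is CR followed by LF. Returning `len(s)` when no terminator exists keeps both `s(:p)` and `s(p + 1:)` valid, the latter being a zero-length substring.

```fortran
program eol_demo
    implicit none
    character(len=*), parameter :: cr = achar(13), lf = achar(10)

    ! LF-terminated line: the LF sits at position 4.
    if (safe_eol('1 2'//lf//'3 4') /= 4) error stop 'LF case failed'
    ! CRLF-terminated line: report the LF at position 5, not the CR.
    if (safe_eol('1 2'//cr//lf//'3 4') /= 5) error stop 'CRLF case failed'
    ! Last line without a trailing newline: fall back to the string length.
    if (safe_eol('3 4') /= 3) error stop 'missing EOL case failed'

    print '(a)', 'end-of-line handling ok'

contains

    ! Hypothetical replacement for shift_to_eol (assumption, not the PR's code).
    pure integer function safe_eol(s) result(p)
        character(len=*), intent(in) :: s

        p = scan(s, cr//lf)
        if (p == 0) then
            p = len(s)                      ! no terminator found
        else if (p < len(s)) then
            ! If the terminator is CRLF, move from the CR to the LF.
            if (s(p:p) == cr .and. s(p + 1:p + 1) == lf) p = p + 1
        end if
    end function safe_eol

end program eol_demo
```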

Copilot · 2026-03-29T12:59:26Z

test/io/test_loadtxt.f90

+        call loadtxt(fname, expected, skiplines=4) ! Skip comment and first line
+        call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
+        if (allocated(error)) return
+        ! Test max_rows option
+        call loadtxt(fname, expected, max_rows=4) ! Skip comment and first line
+        call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
+        if (allocated(error)) return
+        ! Test usecols option
+        call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Skip comment and first line


This comment says “Skip comment and first line”, but max_rows=4 is passed without skiplines, so no skipping happens here. Update the comment (or add skiplines= if that’s what is intended) to avoid misleading future readers.

Suggested change

-        call loadtxt(fname, expected, skiplines=4) ! Skip comment and first line
+        call loadtxt(fname, expected, skiplines=4) ! Skip comment and first data line
         call check(error, all(abs(input(2:, :) - expected) < eps), 'skiplines read failed')
         if (allocated(error)) return
         ! Test max_rows option
-        call loadtxt(fname, expected, max_rows=4) ! Skip comment and first line
+        call loadtxt(fname, expected, max_rows=4) ! Read only first 4 data rows
         call check(error, all(abs(input(:4, :) - expected) < eps), 'max_rows read failed')
         if (allocated(error)) return
         ! Test usecols option
-        call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Skip comment and first line
+        call loadtxt(fname, expected, usecols=[3, 1, 1, 2]) ! Read selected columns

Copilot · 2026-03-29T12:59:26Z

doc/specs/stdlib_io.md

-`fmt` (optional): Fortran format specifier for the text read.  Defaults to the write format for the data type.  Setting fmt='*' will specify list directed read.   
+`skiplines` (optional): Skip the first `skiplines` lines from file, including comments. If skipping more lines than present, a 0-sized array will be returned. The default is 0.
+
+`max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiprows` will be read. A negative value results in reading all data. The default is to read all lines of data.


The max_rows argument description still refers to skiprows, but the option was renamed to skiplines. Update the doc text to match the new argument name so users aren’t confused.

Suggested change

-`max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiprows` will be read. A negative value results in reading all data. The default is to read all lines of data.
+`max_rows` (optional): Shall be an integer indicating that `max_rows` **rows of data** after `skiplines` will be read. A negative value results in reading all data. The default is to read all lines of data.

fiolj and others added 5 commits March 15, 2026 04:33

Started to migrate to loadtxt-stream

396914f

Modified also tests and example to make them work

1e1d635

We need to make more extensive testing

Improved specs and tests on loadtxt

a9de934

Merge branch 'master' into loadtxt-stream

ecfcafc

Fixed bug in conversion to complex

db19b92

jalvesz requested a review from Copilot March 29, 2026 12:54

Copilot started reviewing on behalf of jalvesz March 29, 2026 12:54 View session

fiolj wants to merge 5 commits into fortran-lang:master from fiolj:loadtxt-stream

2 participants
