
Conversation

@masyukun

Missing feature -- figured I'd add it to the existing tool rather than writing yet another one-off.

@masyukun masyukun requested a review from a team as a code owner December 19, 2025 20:23
@masyukun masyukun requested review from FGasper and removed request for a team December 19, 2025 20:23
@FGasper
Collaborator

FGasper commented Jan 6, 2026

@masyukun Thank you for your submission.

What benefit does this confer versus running multiple import processes in parallel, e.g.:

ls *.json | parallel --jobs 4 'mongoimport --db myDB --collection myColl --file {}'


@FGasper FGasper left a comment


See earlier question.

Thank you!

@masyukun
Author

> @masyukun Thank you for your submission.
>
> What benefit does this confer versus running multiple import processes in parallel, e.g.:
>
> ls *.json | parallel --jobs 4 'mongoimport --db myDB --collection myColl --file {}'

Hi FGasper -- it's the same set of benefits you get from doing a single insertMany/bulkWrite instead of 1,000 individual inserts:

  • a single connection/session to the database instead of a volatile number of them (a spot-check sketch follows below)
  • orders of magnitude faster transfer of many documents (minutes vs hours)
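
For what it's worth, here is a minimal way to observe the connection-count difference yourself -- a sketch only, not part of this PR, assuming mongosh is installed and a real URI replaces the placeholder below:

# Sample the server's connection gauge while each method runs
# (placeholder URI -- substitute your own).
mongosh "mongodb+srv://user:pass@cluster.example.mongodb.net/" --quiet \
  --eval 'printjson(db.serverStatus().connections)'

Sampled during the parallel run, this should report roughly one client connection per concurrent mongoimport process; during a single batched import it should stay at one session.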

@FGasper
Collaborator

FGasper commented Jan 12, 2026

@masyukun Do you have some reproducible benchmarks to demonstrate the performance benefit of your branch over parallel restorations?

@masyukun
Author

masyukun commented Jan 12, 2026

@FGasper Sure -- unless you have a preferred test harness, I have a folder full of files and can run the two commands on a timer, then submit screenshots.

@FGasper
Collaborator

FGasper commented Jan 12, 2026

@masyukun What we’d like to see is reproducible results comparing parallel invocation versus your submission.

@masyukun
Author

masyukun commented Jan 12, 2026

Data folder

  • Contents: 18,247 JSON documents
  • Size: 131,917,828 bytes (168.3 MB on disk)
  • ZIP too large to attach (59.8 MB exceeds the 25 MB limit), but available on request

Method 1: parallel using *NIX pipes

Command

Original command: ls *.json | parallel --jobs 4 'mongoimport --db myDB --collection myColl --file {}'

Executed command: time find /Users/matthewroyal/Documents/GitHub/health-inspect/inspection_reports -name '*.json' | parallel --jobs 4 './mongoimport --uri="mongodb+srv://***:***@fooddata.8huxa.mongodb.net/" --db myDB --collection myCollParallel --file {}'

  • Error: zsh: argument list too long: ls (the *.json glob expands past the kernel's argument-length limit)
    • My folder has only 18,247 JSON files -- many real-world environments have millions to billions of files.
    • Since the suggested command evidently hadn't been run in zsh against a folder with this many JSON files, I substituted ls with the more scalable find: find /Users/matthewroyal/Documents/GitHub/health-inspect/inspection_reports -name '*.json' (a hardened variant of this pipeline is sketched after this list)
  • Prefixed the command with time to measure wall-clock execution time
  • Used the mongo-tools build included in the root project folder (./mongoimport) instead of the standard command on PATH
  • Customized the database and collection names and anonymized the URI string
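
As an aside, a more robust version of the find pipeline -- a sketch, not what was benchmarked here -- NUL-delimits the file list so paths containing spaces or newlines survive the pipe (find's -print0 and parallel's -0/--null are standard options; the directory path is a placeholder):

find /path/to/json_dir -name '*.json' -print0 \
  | parallel -0 --jobs 4 'mongoimport --db myDB --collection myColl --file {}'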

Prerequisites

  • Installed parallel on Mac using brew install parallel

Output

  • Ingestion time was 57:16.55 (3,436.55 s) -- nearly an hour, 67.5x longer than the --dir option
  • See attached output log "myCollParallel-output.log"

Evaluation

  • Running mongoimport per file from the shell via parallel is impractical for real-world workloads
  • Many customers restrict installation of third-party tools and may not have access to the parallel package
  • 25 transient connection errors occurred while uploading the folder of JSON files
    • This method masks which file was being imported when an error occurred
    • Handling or summarizing errors is difficult and labor-intensive without a report of which files failed (a partial mitigation is sketched after this list)
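
One partial mitigation for the error-attribution problem, assuming GNU parallel is available -- a sketch, untested in this benchmark: parallel's --joblog writes one line per job, including its exit value, which at least identifies which file's import failed.

find /path/to/json_dir -name '*.json' \
  | parallel --jobs 4 --joblog import.joblog \
      'mongoimport --db myDB --collection myColl --file {}'
# Column 7 of the joblog is the exit value; non-zero rows are failed files.
awk 'NR > 1 && $7 != 0' import.joblog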

Method 2: --dir flag

Command

time ./bin/mongoimport --dir=/Users/matthewroyal/Documents/GitHub/health-inspect/inspection_reports --uri="mongodb+srv://***:***@fooddata.8huxa.mongodb.net/" --db=myDB --collection=myCollDirflag

Prerequisites

  • Clone the PR branch, build, and run the command from the root of the checkout

Output

  • Ingestion time was 50.912 s -- under a minute (98.52% faster than the baseline mongoimport run in parallel with a third-party tool; arithmetic below)
  • See attached output log "myCollDirflag-output.log"
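
For the record, the arithmetic behind those two figures (57:16.55 = 3,436.55 s for Method 1 vs 50.912 s for Method 2), checked with bc:

echo 'scale=2; 3436.55 / 50.912' | bc               # 67.49 -> ~67.5x
echo 'scale=4; 100 * (1 - 50.912 / 3436.55)' | bc   # 98.5200 -> 98.52% faster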

Evaluation

  • Execution time was dramatically improved over even parallel runs of single-file insertions by using the insertMany/bulkWrite construct.
    • This is a common use case in real-world systems, as evidenced by our own forums; the current workaround is to abandon mongoimport and write a custom one-off data loader, which adds friction and maintenance burden for DevOps users.
  • Failures noted during ingestion: none

Performance metrics comparison

(Screenshot: performance metrics comparison, 2026-01-12)

myCollDirflag-output.log

myCollParallel-output.log

@masyukun masyukun requested a review from FGasper January 12, 2026 23:37