@Zazama (Owner) commented Apr 6, 2023

This PR implements file streaming for reading and writing files in the NodeID3.read, NodeID3.write (and NodeID3.update) functions.
This lets us read and write big files while going easy on memory.

  • We read/write the file in chunks of at most 20 megabytes (if you run Node.js, you probably have 20 MB to spare)
  • The reader will automatically detect the start of an ID3 Tag and extract it fully into a buffer
    • This assumes that ID3 Tags are small enough to fit into memory
    • If it were bigger, reading would fail anyway, because the parsed information would not fit into memory either
  • Some code is duplicated to support both the sync and async APIs. I haven't found a better way yet
  • On write, we create a temp file next to the original one
    • This uses the "tmp" module because we need collision handling, and I think it's good to keep the temp file in the same folder so it can be found easily if something fails.
    • First, the new ID3 Tag is written at the beginning
    • Then the rest of the old file is streamed into the new one. If an old ID3 Tag is found, it is skipped
  • The type definition of the NodeID3.write callback should not provide a buffer when a file is being written. This is also how it is documented in the README.md
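Once the reader has found an ID3 header, it needs the total tag size to extract the whole tag into a buffer. A minimal sketch of that computation, assuming an ID3v2.3/v2.4 header as defined by the ID3v2 specification (10-byte header, 4-byte syncsafe size that excludes the header and the optional v2.4 footer). `getTotalTagSize` is a hypothetical helper, not the PR's `getId3TagSize`:

```typescript
// Decode a 4-byte "syncsafe" integer: each byte carries only its low 7 bits.
function decodeSyncsafe(bytes: Uint8Array): number {
  return (
    ((bytes[0] & 0x7f) << 21) |
    ((bytes[1] & 0x7f) << 14) |
    ((bytes[2] & 0x7f) << 7) |
    (bytes[3] & 0x7f)
  )
}

// Total size in bytes of the tag whose 10-byte header starts at header[0],
// or null if `header` does not look like an ID3v2 header.
function getTotalTagSize(header: Uint8Array): number | null {
  if (header.length < 10) return null
  // Bytes 0-2 must be the ASCII marker "ID3".
  if (header[0] !== 0x49 || header[1] !== 0x44 || header[2] !== 0x33) return null
  // Bytes 6-9 hold the syncsafe size; bit 7 must be clear in every byte.
  const sizeBytes = header.subarray(6, 10)
  if (sizeBytes.some((b) => (b & 0x80) !== 0)) return null
  // Byte 5 is the flags byte; 0x10 signals a 10-byte footer (ID3v2.4).
  const hasFooter = (header[5] & 0x10) !== 0
  return 10 + decodeSyncsafe(sizeBytes) + (hasFooter ? 10 : 0)
}
```

With this size, the reader knows exactly how many bytes to copy into the tag buffer, even when the tag continues past the current chunk.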

Related issue: #161

@Zazama Zazama linked an issue Apr 6, 2023 that may be closed by this pull request
@Zazama Zazama removed a link to an issue Apr 6, 2023
@Zazama Zazama changed the title Move file reading to streaming Move file reading and writing to streaming Apr 7, 2023

@pbricout (Contributor) left a comment

I did a partial review and covered most of the points here. This is still a work in progress and there is a bit more to do, but it is a good start.

```typescript
streamOriginalIntoNewFileSync(readFileDescriptor, writeFileDescriptor)
})
})
fs.unlinkSync(filepath)
```

This is unnecessary: rename overwrites the destination, and it is safer because the operation is atomic.
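A minimal sketch of the suggested final step, assuming the temp file has already been fully written and closed; `replaceWithTmpFile` is a hypothetical name. Node's `fs.rename` documentation states that an existing destination is overwritten, so the separate unlink (and the window in which the original file is missing) goes away:

```typescript
import * as fs from "fs"
import * as os from "os"
import * as path from "path"

// Replace `filepath` with the finished temp file in one step. renameSync
// overwrites an existing destination, so no unlink is needed. On POSIX systems
// the replacement is atomic when both paths are on the same filesystem, which
// is one reason to create the temp file next to the original.
function replaceWithTmpFile(tmpFile: string, filepath: string): void {
  fs.renameSync(tmpFile, filepath)
}

// Usage in a scratch directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "rename-demo-"))
const original = path.join(dir, "song.mp3")
const tmpFile = path.join(dir, "song.mp3.tmp-abc123")
fs.writeFileSync(original, "old contents")
fs.writeFileSync(tmpFile, "new contents")
replaceWithTmpFile(tmpFile, original)
```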

Comment on lines +51 to +67
```typescript
function getTmpFilePathSync(filepath: string): string {
  const parsedPath = path.parse(filepath)
  return tmp.tmpNameSync({
    tmpdir: parsedPath.dir,
    template: `${parsedPath.base}.tmp-XXXXXX`,
  })
}

function getTmpFileAsync(filepath: string, callback: tmp.TmpNameCallback) {
  const parsedPath = path.parse(filepath)
  tmp.tmpName({
    tmpdir: parsedPath.dir,
    template: `${parsedPath.base}.tmp-XXXXXX`,
  }, (err, filename) => {
    callback(err, filename)
  })
}
```

The common part could be factored out into a function getTmpNameOptions(filepath), then used as tmp.tmpName(getTmpNameOptions(filepath), callback).
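A sketch of that refactoring. `getTmpNameOptions` is the name suggested above; the `TmpNameOptions` interface is declared locally here (limited to the two fields the PR uses) so the snippet runs without the `tmp` package:

```typescript
import * as path from "path"

// Local stand-in for the `tmp` package's options type.
interface TmpNameOptions {
  tmpdir: string
  template: string
}

// Shared options for tmp.tmpNameSync and tmp.tmpName: create the temp file next
// to the original, e.g. "/music/song.mp3.tmp-XXXXXX" (tmp replaces the X's
// with random characters).
function getTmpNameOptions(filepath: string): TmpNameOptions {
  const parsedPath = path.parse(filepath)
  return {
    tmpdir: parsedPath.dir,
    template: `${parsedPath.base}.tmp-XXXXXX`,
  }
}

const options = getTmpNameOptions("/music/song.mp3")
```

Both call sites then shrink to `tmp.tmpNameSync(getTmpNameOptions(filepath))` and `tmp.tmpName(getTmpNameOptions(filepath), callback)`.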


```typescript
function getTmpFileAsync(filepath: string, callback: tmp.TmpNameCallback) {
const parsedPath = path.parse(filepath)
tmp.tmpName({
```

Could we promisify this one too, so we can use await and simplify writeId3TagToFileAsync()?
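One way to do this is a small `Promise` wrapper. In this sketch `getTmpFileAsync` is a hypothetical stand-in that builds a random name with `crypto.randomBytes` instead of calling `tmp.tmpName` (so it runs without the `tmp` package and skips the collision handling `tmp` provides), and `getTmpFilePromise` is likewise a hypothetical name:

```typescript
import * as crypto from "crypto"
import * as path from "path"

type TmpNameCallback = (err: Error | null, name: string) => void

// Stand-in for the PR's getTmpFileAsync: same callback shape, no tmp package.
function getTmpFileAsync(filepath: string, callback: TmpNameCallback): void {
  const { dir, base } = path.parse(filepath)
  const suffix = crypto.randomBytes(3).toString("hex") // 6 hex characters
  setImmediate(() => callback(null, path.join(dir, `${base}.tmp-${suffix}`)))
}

// Promise wrapper so async code can simply `await` the temp file name.
function getTmpFilePromise(filepath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    getTmpFileAsync(filepath, (err, name) => (err ? reject(err) : resolve(name)))
  })
}
```

Inside `writeId3TagToFileAsync` this becomes `const tmpFile = await getTmpFilePromise(filepath)`. Because the callback is error-first and comes last, `util.promisify(getTmpFileAsync)` would work as well.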

```typescript
await streamOriginalIntoNewFileAsync(readFileDescriptor, writeFileDescriptor)
})
}).then(async () => {
await fsUnlinkPromise(filepath)
```

This is unnecessary: rename overwrites the destination, and it is safer because the operation is atomic.

Comment on lines +4 to +9
```typescript
export const fsOpenPromise = promisify(fs.open)
export const fsReadPromise = promisify(fs.read)
export const fsClosePromise = promisify(fs.close)
export const fsWritePromise = promisify(fs.write)
export const fsUnlinkPromise = promisify(fs.unlink)
export const fsRenamePromise = promisify(fs.rename)
```

We could export an object:

```typescript
export const fsPromise = {
    open: promisify(fs.open),
    // ...
}
```
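Filled in, the object could look like the sketch below. It keeps the `promisify` approach because the streaming code passes numeric file descriptors around; Node's built-in `fs.promises` API is an alternative, but it works with `FileHandle` objects rather than descriptors. The `demo` function is only there to show usage:

```typescript
import * as fs from "fs"
import * as os from "os"
import * as path from "path"
import { promisify } from "util"

// One object instead of separate fsOpenPromise, fsReadPromise, ... exports.
export const fsPromise = {
  open: promisify(fs.open),
  read: promisify(fs.read), // resolves to { bytesRead, buffer }
  close: promisify(fs.close),
  write: promisify(fs.write), // resolves to { bytesWritten, buffer }
  unlink: promisify(fs.unlink),
  rename: promisify(fs.rename),
}

// Usage: write a few bytes through a descriptor, read them back, clean up.
async function demo(): Promise<string> {
  const file = path.join(os.tmpdir(), `fs-promise-demo-${process.pid}.txt`)
  const writeFd = await fsPromise.open(file, "w")
  await fsPromise.write(writeFd, Buffer.from("hello"))
  await fsPromise.close(writeFd)

  const readFd = await fsPromise.open(file, "r")
  const { bytesRead, buffer } = await fsPromise.read(readFd, Buffer.alloc(16), 0, 16, 0)
  await fsPromise.close(readFd)

  await fsPromise.unlink(file)
  return buffer.subarray(0, bytesRead).toString("utf8")
}
```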

```typescript
import * as fs from 'fs'
import { findId3TagPosition, getId3TagSize } from "./id3-tag"

const FileBufferSize = 20 * 1024 * 1024
```

This should be an option with a default value; that would allow us to write tests with smaller files.
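A minimal sketch, assuming a hypothetical `FileOptions` object with a `fileBufferSize` field (neither name is from the PR) that falls back to the current 20 MiB constant:

```typescript
// Hypothetical options object; only the default value comes from the PR.
export interface FileOptions {
  fileBufferSize?: number
}

const DefaultFileBufferSize = 20 * 1024 * 1024 // the PR's current FileBufferSize

// Fill in defaults and reject values that cannot work as a chunk size.
export function resolveFileOptions(options: FileOptions = {}): Required<FileOptions> {
  const fileBufferSize = options.fileBufferSize ?? DefaultFileBufferSize
  if (!Number.isInteger(fileBufferSize) || fileBufferSize <= 0) {
    throw new RangeError(`fileBufferSize must be a positive integer, got ${fileBufferSize}`)
  }
  return { fileBufferSize }
}
```

The streaming code would then allocate `Buffer.alloc(resolveFileOptions(options).fileBufferSize)`, and a test could pass a chunk size of a few bytes to exercise the chunk-boundary paths with tiny files.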


```typescript
const FileBufferSize = 20 * 1024 * 1024

export function writeId3TagToFileSync(filepath: string, id3Tag: Buffer) {
```

The temp file is not deleted in case of error.
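A sketch of one way to guarantee cleanup in the sync path, assuming the temp file name has already been generated: open it, let a hypothetical `writeContents` callback (standing in for the PR's tag-writing and streaming steps) fill it, rename it over the original, and remove it if any of those steps throws. The temp file is opened with `"wx"` before the `try`, so an unrelated file that happens to have that name is never deleted:

```typescript
import * as fs from "fs"
import * as os from "os"
import * as path from "path"

function writeViaTmpFileSync(
  filepath: string,
  tmpFile: string,
  writeContents: (writeFileDescriptor: number) => void,
): void {
  // "wx" fails if tmpFile already exists; doing this outside the try means the
  // cleanup below only ever removes a file this call created.
  const writeFileDescriptor = fs.openSync(tmpFile, "wx")
  try {
    try {
      writeContents(writeFileDescriptor)
    } finally {
      fs.closeSync(writeFileDescriptor)
    }
    fs.renameSync(tmpFile, filepath)
  } catch (error) {
    // Remove the partial temp file, then surface the original error.
    fs.rmSync(tmpFile, { force: true })
    throw error
  }
}

// Usage: a failing write leaves the original untouched and no temp file behind.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "id3-sync-"))
const target = path.join(dir, "song.mp3")
const tmpFile = `${target}.tmp-000001`
fs.writeFileSync(target, "original")
let failed = false
try {
  writeViaTmpFileSync(target, tmpFile, () => {
    throw new Error("simulated failure")
  })
} catch {
  failed = true
}
const tmpLeftBehind = fs.existsSync(tmpFile)
const contentsAfterFailure = fs.readFileSync(target, "utf8")

// A successful write replaces the original.
writeViaTmpFileSync(target, tmpFile, (fd) => {
  fs.writeSync(fd, "updated")
})
const contentsAfterSuccess = fs.readFileSync(target, "utf8")
```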

```typescript
fs.renameSync(tmpFile, filepath)
}

export function writeId3TagToFileAsync(filepath: string, id3Tag: Buffer, callback: (err: Error|null) => void) {
```

The temp file is not deleted in case of error.
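The async path can follow the same rule with `async`/`await`. This sketch uses Node's `fs/promises` API rather than the PR's promisified descriptor functions, and `writeContents` is again a hypothetical stand-in for the tag-writing and streaming steps:

```typescript
import * as os from "os"
import * as path from "path"
import { mkdtemp, open, readFile, rename, rm, stat, writeFile } from "fs/promises"
import type { FileHandle } from "fs/promises"

async function writeViaTmpFileAsync(
  filepath: string,
  tmpFile: string,
  writeContents: (handle: FileHandle) => Promise<void>,
): Promise<void> {
  // As in the sync version, open with "wx" before the try so the cleanup never
  // removes a file this call did not create.
  const handle = await open(tmpFile, "wx")
  try {
    try {
      await writeContents(handle)
    } finally {
      await handle.close()
    }
    await rename(tmpFile, filepath)
  } catch (error) {
    await rm(tmpFile, { force: true }) // drop the partial temp file
    throw error
  }
}

// Usage: a failing write leaves no temp file and keeps the original intact.
async function demo(): Promise<{ tmpLeftBehind: boolean; contents: string }> {
  const dir = await mkdtemp(path.join(os.tmpdir(), "id3-async-"))
  const target = path.join(dir, "song.mp3")
  const tmpFile = `${target}.tmp-000001`
  await writeFile(target, "original")
  await writeViaTmpFileAsync(target, tmpFile, async () => {
    throw new Error("simulated failure")
  }).catch(() => undefined) // the failure is expected here
  const tmpLeftBehind = await stat(tmpFile).then(() => true, () => false)
  return { tmpLeftBehind, contents: await readFile(target, "utf8") }
}
```

The existing callback signature could be kept on top of it, e.g. `writeViaTmpFileAsync(...).then(() => callback(null), callback)`.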

```diff
 */
 export function write(
-tags: WriteTags, filebuffer: string | Buffer, callback: WriteCallback
+tags: WriteTags, filebuffer: string | Buffer, callback: WriteFileCallback | WriteCallback
```

Let's not make the existing API even more complicated. Let's take this opportunity to create a new release with a simplified API.


```typescript
export function writeId3TagToFileSync(filepath: string, id3Tag: Buffer) {
const tmpFile = getTmpFilePathSync(filepath)
processFile(filepath, 'r', (readFileDescriptor) => {
```

This fails if the source file does not exist, but we want to support that case: if there is no source file, we should just write the tag.
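A sketch of one way to support that case in the sync path: treat `ENOENT` from `fs.openSync` as "no source file" and let the caller write only the new tag. Checking the error code, rather than calling `fs.existsSync` first, avoids a race between the check and the open. `openSourceIfExistsSync` is a hypothetical helper:

```typescript
import * as fs from "fs"
import * as os from "os"
import * as path from "path"

// Returns a read descriptor for `filepath`, or null if the file does not exist.
// Any other error (for example EACCES) is still thrown.
function openSourceIfExistsSync(filepath: string): number | null {
  try {
    return fs.openSync(filepath, "r")
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === "ENOENT") {
      return null
    }
    throw error
  }
}

// Usage: the caller always writes the new tag, and streams old data only when
// a source descriptor was returned.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "id3-source-"))
const missingFd = openSourceIfExistsSync(path.join(dir, "missing.mp3"))
const existingPath = path.join(dir, "song.mp3")
fs.writeFileSync(existingPath, "audio")
const existingFd = openSourceIfExistsSync(existingPath)
if (existingFd !== null) {
  fs.closeSync(existingFd)
}
```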

```typescript
fs.renameSync(tmpFile, filepath)
}

export function writeId3TagToFileAsync(filepath: string, id3Tag: Buffer, callback: (err: Error|null) => void) {
```

Ideally we could optimize by processing the buffer while an async read/write operation is in progress, but this is a bit more complicated; let's leave it for another PR.

```typescript
const buffer = Buffer.alloc(FileBufferSize)
let data
while((data = getNextBufferSubarraySync(readFileDescriptor, buffer)).length) {
const id3TagPosition = findId3TagPosition(data)
```

There is a bug: we may miss the tag if it spans two reads. We need to use the same trick as in read.
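One common way to handle this (which may or may not be the trick `read` uses) is to carry the last few bytes of each chunk into the next search. The sketch below works on in-memory chunks standing in for successive reads and looks only for the three-byte `"ID3"` marker, so it carries two bytes; a check that validates the full 10-byte header would need to carry nine. `findMarkerAcrossChunks` is a hypothetical helper:

```typescript
// Returns the stream offset of the first `marker`, or -1 if it never occurs.
function findMarkerAcrossChunks(chunks: readonly Buffer[], marker = "ID3"): number {
  const carryLength = marker.length - 1 // a split marker leaves at most this many bytes behind
  let carry: Buffer = Buffer.alloc(0)
  let windowStart = 0 // stream offset of the first byte of the current search window
  for (const chunk of chunks) {
    const searchWindow = Buffer.concat([carry, chunk])
    const index = searchWindow.indexOf(marker)
    if (index !== -1) {
      return windowStart + index
    }
    // Keep the tail: it may be the start of a marker completed by the next chunk.
    const keep = Math.min(carryLength, searchWindow.length)
    carry = searchWindow.subarray(searchWindow.length - keep)
    windowStart += searchWindow.length - keep
  }
  return -1
}

// Usage: "ID3" split as "abI" | "D3x" is still found at stream offset 2.
const offset = findMarkerAcrossChunks([Buffer.from("abI"), Buffer.from("D3x")])
```

In the streaming code, `Buffer.concat` would more likely be replaced by copying the carried bytes to the start of the read buffer and reading the next chunk after them, which avoids an extra copy of each chunk.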


3 participants