-
Notifications
You must be signed in to change notification settings - Fork 12
Clickhouse integration #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
src/clickhouseEngine.ts
Outdated
| const processors: { | ||
| [partnerId: string]: undefined | ((rawTx: unknown) => StandardTx) | ||
| } = { | ||
| banxa: processBanxaTx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the processTx routnies should only be used for a one time migration. After that the clickhouseEngine can directly read the database StandardTx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment wasn't addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ask: don't run processor functions at all in the clickhouse loop. Just use the StandardTx type to migrate from couch to clickhouse
a77ba76 to
3a9b53b
Compare
src/clickhouseEngine.ts
Outdated
| start_key: 0, | ||
| end_key: config.clickhouseIndexVersion, | ||
| inclusive_end: false | ||
| skip: PAGE_SIZE * i++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use couchdb bookmarks instead. far more efficient
src/clickhouseEngine.ts
Outdated
| let standardTx: StandardTx | ||
| try { | ||
| standardTx = processor(row.doc?.rawTx) | ||
| standardTx = processor(doc.rawTx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ignored my previous comment that we do NOT need to use each providers processor during the migration engine.
The processTx only needs to happen whenever we wish to change the schema based on the rawTx. This should be a one time migration. After that, just grab the standardTx from the database to populate clickhouse. This prevents needing to update the engine code everytime we add a new provider.
| // We've reached the end of the view index, so we'll continue but with a | ||
| // delay so as not to thrash the couchdb unnecessarily. | ||
| if (response.rows.length !== PAGE_SIZE) { | ||
| if (response.docs.length !== PAGE_SIZE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use couchdb bookmarks as mentioned
| Math.round(standardTx.timestamp), | ||
| standardTx.usdValue, | ||
| standardTx.indexVersion | ||
| config.clickhouseIndexVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this wouldn't be needed anymore now that we have the updatetime. Removing this prevents extra writes to couchdb
src/clickhouseEngine.ts
Outdated
| }) | ||
| ) | ||
|
|
||
| lastDocUpdateTime = standardTx.updateTime.toISOString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better efficiency:
const txUpdateTime = standardTx.updateTime.toISOString()
if (lastDocUpdateTime == null || lastDocUpdateTime < txUpdateTime) {
lastDocUpdateTime = txUpdateTime
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so just max(last, curr)
src/clickhouseEngine.ts
Outdated
| if (response.docs.length !== PAGE_SIZE) { | ||
| i = 0 | ||
| if (lastDocUpdateTime != null) { | ||
| afterTime = lastDocUpdateTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afterTime should be saved in couch or clickhouse somewhere. Otherwise we'll migrate all of couch to clickhouse on reboot of the server.
reports_settings would be a good db in couch to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense.
src/clickhouseEngine.ts
Outdated
| const processors: { | ||
| [partnerId: string]: undefined | ((rawTx: unknown) => StandardTx) | ||
| } = { | ||
| banxa: processBanxaTx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment wasn't addressed
167301d to
c1bb57a
Compare
| const { partnerId } = getDocumentIdentifiers(doc._id) | ||
| const processor = processors[partnerId] | ||
|
|
||
| if (processor == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would a processor every be null?
| try { | ||
| standardTx = processor(doc.rawTx) | ||
| } catch (error) { | ||
| datelog(`Error processing ${doc._id}`, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should throw otherwise this document won't be migrated and we'd need to fix it before continuing.
| standardTx.payoutCurrency, | ||
| standardTx.payoutAmount, | ||
| standardTx.status, | ||
| Math.round(standardTx.timestamp), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why round. our timestamp is in seconds so there should be a few decimals to account for MS.
| timestamp: number | ||
|
|
||
| /** When the document was created. */ | ||
| createTime?: Date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be called updateTime since it's not when the doc was created but when it was last updated. the reports engine will continually update the doc as it changes status and this would change this time value.
| values: newRows | ||
| }) | ||
| // Update all documents processed | ||
| await dbTransactions.bulk({ docs: newDocs }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we updating the couchdb docs? Nothing was changed in the docs?
CHANGELOG
Does this branch warrant an entry to the CHANGELOG?
Dependencies
Description
none