-
Notifications
You must be signed in to change notification settings - Fork 15
feat: URL store data model and data access #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
packages/spacecat-shared-data-access/src/models/audit-url/audit-url.schema.js
Fixed
Show fixed
Hide fixed
- Add rank and traffic fields to AuditUrl schema (optional, nullable) - Implement sortAuditUrls static method for sorting by multiple fields - Add allBySiteIdSorted and allBySiteIdAndSourceSorted methods - Update allBySiteIdAndAuditType to support sorting - Add TypeScript definitions for new fields and methods - Add 20 comprehensive unit tests for sorting functionality - All tests passing (1147 total) - Code coverage: 98.01% for audit-url module
- Add NODE_OPTIONS with --max-old-space-size=4096 to prevent heap out of memory errors - Fixes FATAL ERROR: JavaScript heap out of memory in CI pipeline
Add platformType field to AuditUrl schema to categorize URLs as primary-site or offsite platforms (Wikipedia, YouTube, social media, etc.). Changes: - Add platformType attribute with 11 supported platform types - Add GSI for efficient querying by siteId and platformType - Add collection methods: allBySiteIdAndPlatform(), allOffsiteUrls() - Add model helper methods: isOffsitePlatform(), isPlatformType() - Export PLATFORM_TYPES constant - Update TypeScript definitions - Add 33 comprehensive unit tests Platform types supported: primary-site, wikipedia, youtube-channel, reddit-community, facebook-page, twitter-profile, linkedin-company, instagram-account, tiktok-account, github-org, medium-publication All methods support sorting and pagination.
nitinja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, cant find any issues. Modern JS optional chaining can be used to shorten some of the functions like
this.getPlatformType?.() ?? this.platformType
|
I briefly discussed this with @ravkiran |
This reverts commit 483b27a.
Replace the 'source' string attribute with 'byCustomer' boolean to simplify the data model and clearly distinguish between customer-added (true) and system-added (false) URLs. Changes: - Schema: Replace 'source' with 'byCustomer' (boolean, default: true) - Model: Replace isManualSource() with isCustomerUrl() - Collection: Replace allBySiteIdAndSource with allBySiteIdByCustomer - Collection: Replace allBySiteIdAndSourceSorted with allBySiteIdByCustomerSorted - Collection: Replace removeForSiteIdAndSource with removeForSiteIdByCustomer - Update GSI index from siteId+source to siteId+byCustomer - Update TypeScript definitions Migration mapping: - source='manual' → byCustomer=true - source='sitemap'/'discovery'/other → byCustomer=false
- Update test fixtures to use byCustomer instead of source - Update model tests to use isCustomerUrl instead of isManualSource - Update collection tests for byCustomer methods
- Update all unit tests for model and collection - Update all integration tests - Replace source with byCustomer throughout - Replace isManualSource with isCustomerUrl - Replace allBySiteIdAndSource with allBySiteIdByCustomer - Replace removeForSiteIdAndSource with removeForSiteIdByCustomer
Resolve conflict in test/it/util/db.js: - Use main branch's approach for disabling debug logging - Keep console as default logger and disable debug inline - Both approaches achieve same goal of preventing memory issues
Please ensure your pull request adheres to the following guidelines:
Related Issues
Thanks for contributing!