diff --git a/.env.example b/.env.example new file mode 100644 index 0000000..6fa60f2 --- /dev/null +++ b/.env.example @@ -0,0 +1 @@ +NOTION_API_SECRET=your_notion_api_token_here \ No newline at end of file diff --git a/.gitignore b/.gitignore index a547bf3..a985730 100644 --- a/.gitignore +++ b/.gitignore @@ -22,3 +22,8 @@ dist-ssr *.njsproj *.sln *.sw? + +.env + +notion-data +public/notion-data diff --git a/README-KR.md b/README-KR.md index 60033ff..dcba613 100644 --- a/README-KR.md +++ b/README-KR.md @@ -1,59 +1,145 @@ -# notion-dump +# NotionPresso CLI Notion 페이지의 데이터를 추출하여 로컬에 JSON 형식으로 저장하는 CLI 도구입니다. -## 기능 +## 주요 기능 -- Notion 페이지의 전체 데이터를 JSON 파일로 저장 -- 간단한 명령어로 Notion 페이지 데이터 백업 +- **모든 페이지 자동 추출**: `--all` 옵션으로 권한 있는 모든 페이지 일괄 처리 +- **개별 페이지 추출**: 특정 페이지 URL로 단일 페이지 처리 +- **환경변수 지원**: `.env` 파일에서 API 키 자동 읽기 +- **스마트 파일명**: 페이지 제목 기반의 읽기 쉬운 파일명 생성 +- **이미지 다운로드**: 페이지 내 이미지 자동 다운로드 및 로컬 저장 +- **북마크 메타데이터**: 확장된 북마크 정보 추출 +- **선택적 업데이트**: 변경된 페이지만 자동으로 감지하여 업데이트 ## 설치 -현재 `notion-dump`는 npm에 배포되지 않았습니다. 로컬에서 직접 사용하려면 아래의 단계를 따라주세요. +### npm 설치 -1. 프로젝트 클론 및 디렉토리 이동 +```bash +npm install -g @notionpresso/cli +``` - ```bash - git clone https://github.com/notionpresso/cli.git - ``` +### 로컬 빌드 -2. 의존성 설치 +```bash +git clone https://github.com/notionpresso/cli.git +cd cli +npm install +npm run build +``` - ```bash - npm install - ``` +## 사용 방법 -3. 프로젝트 빌드 - ```bash - npm run build - ``` +### 1. 환경변수 설정 (권장) -## 사용 방법 +프로젝트 루트에 Notion API 토큰을 포함한 `.env` 파일을 생성하세요: + +```bash +echo "NOTION_API_SECRET=secret_your_internal_integration_secret_here" > .env +``` + +> 💡 **API 토큰 얻는 방법:** +> +> 1. [Notion 통합 페이지](https://www.notion.so/my-integrations) 방문 +> 2. 새 통합 생성 또는 기존 통합 선택 +> 3. "Internal Integration Token" 복사 (`secret_`으로 시작) +> 4. 노션 페이지를 해당 통합과 공유 + +### 2. 모든 페이지 추출 + +```bash +npresso --all +``` -1. Notion API 토큰 발급받기 -2. 통합을 Notion 페이지에 연결하기 -3. Notion 페이지 URL 얻기 -4. 스크립트 실행 - ```bash - node ./dist/notionpresso.es.js --page --auth - ``` +### 3. 
개별 페이지 추출 -## 옵션 설명 +```bash +npresso --page -- `--page`: (필수) Notion 페이지의 URL -- `--auth`: (필수) Notion API 통합의 토큰 +# 토큰을 직접 지정하는 경우 +npresso --page --auth +``` + +## 명령어 옵션 + +- `--all`: 권한 있는 모든 페이지를 자동으로 찾아 추출 +- `--page `: 특정 페이지 URL 또는 ID 지정 +- `--auth `: Notion API 토큰 (환경변수 `NOTION_API_SECRET` 사용 권장) +- `--output-dir `: JSON 파일 출력 디렉토리 (기본값: `notion-data`) +- `--image-dir `: 이미지 파일 출력 디렉토리 (기본값: `public/notion-data`) ## 출력 결과 -- 현재 작업 디렉토리에 `content/[page-id]/index.json` 파일이 생성됩니다. -- `content` 폴더가 없으면 자동으로 생성됩니다. -- 동일한 페이지 ID의 파일이 이미 있으면 덮어쓰기됩니다. +### 파일 구조 + +``` +notion-data/ +├── pages.json ← 모든 페이지 목록 (--all 사용시) +├── my-blog-post.json ← 개별 페이지 데이터 (제목 기반 파일명) +└── about-me.json + +public/notion-data/ +├── my-blog-post/ ← 페이지별 이미지 폴더 +│ ├── image1.png +│ └── image2.jpg +└── about-me/ + └── profile.jpg +``` + +### pages.json 형식 + +```json +{ + "pages": [ + { + "id": "page-id", + "title": "My Blog Post", + "last_edited_time": "2024-01-01T10:00:00.000Z", + "fileName": "my-blog-post" + } + ] +} +``` + +## 사용 예시 + +### 초기 설정 + +```bash +# 1. API 키 설정 +echo "NOTION_API_SECRET=secret_your_internal_integration_secret_here" > .env + +# 2. 모든 페이지 추출 +npresso --all +``` + +### 개별 페이지 처리 + +```bash +npresso --page https://notion.so/user/My-Page-abc123def456 +``` + +### 커스텀 디렉토리 + +```bash +npresso --all --output-dir custom-data --image-dir assets/images +``` + +## 주요 개선사항 + +- ✅ **자동 페이지 발견**: Search API로 수동 URL 입력 불필요 +- ✅ **읽기 쉬운 파일명**: ID 대신 페이지 제목 기반 파일명 +- ✅ **환경변수 지원**: 매번 토큰 입력 불필요 +- ✅ **이미지 다운로드**: 완전한 오프라인 백업 +- ✅ **증분 업데이트**: 변경된 페이지만 처리로 빠른 동기화 +- ✅ **페이지 목록 생성**: 프론트엔드 연동용 `pages.json` 제공 ## 주의사항 -- 이미지 다운로드 기능은 아직 구현되지 않았습니다. -- 출력 디렉토리와 파일명은 기본값으로 설정되며, 사용자 지정 옵션은 지원하지 않습니다. -- Notion 데이터베이스 추출 기능은 지원하지 않습니다. 
+- Notion API의 Search 기능 사용으로 모든 페이지가 발견되지 않을 수 있음 +- 페이지가 통합(integration)과 공유되어 있어야 접근 가능 +- 대량의 페이지 처리시 시간이 소요될 수 있음 ## 기여 diff --git a/README.md b/README.md index e69de29..a77882f 100644 --- a/README.md +++ b/README.md @@ -0,0 +1,150 @@ +# NotionPresso CLI + +A CLI tool to extract Notion page data and save it locally in JSON format. + +## Key Features + +- **Automatic Page Discovery**: Extract all accessible pages with `--all` option +- **Individual Page Extraction**: Process specific pages using URL +- **Environment Variable Support**: Auto-read API keys from `.env` file +- **Smart File Naming**: Readable file names based on page titles +- **Image Download**: Automatic download and local storage of page images +- **Bookmark Metadata**: Extended bookmark information extraction +- **Smart Updates**: Selective updates for changed pages only + +## Installation + +### npm Installation + +```bash +npm install -g @notionpresso/cli +``` + +### Local Build + +```bash +git clone https://github.com/notionpresso/cli.git +cd cli +npm install +npm run build +``` + +## Usage + +### 1. Environment Setup (Recommended) + +Create a `.env` file in your project root with your Notion API token: + +```bash +echo "NOTION_API_SECRET=secret_your_internal_integration_secret_here" > .env +``` + +> 💡 **How to get your API token:** +> +> 1. Go to [Notion Integrations](https://www.notion.so/my-integrations) +> 2. Create a new integration or select existing one +> 3. Copy the "Internal Integration Token" (starts with `secret_`) +> 4. Share your Notion pages with the integration + +### 2. Extract All Pages + +```bash +npresso --all +``` + +### 3. 
Extract Individual Page + +```bash +npresso --page + +# With direct token specification +npresso --page --auth +``` + +## Command Options + +- `--all`: Automatically find and extract all accessible pages +- `--page `: Specify a particular page URL or ID +- `--auth `: Notion API token (recommend using `NOTION_API_SECRET` env var) +- `--output-dir `: JSON file output directory (default: `notion-data`) +- `--image-dir `: Image file output directory (default: `public/notion-data`) + +## Output Structure + +### File Structure + +``` +notion-data/ +├── pages.json ← List of all pages (when using --all) +├── my-blog-post.json ← Individual page data (title-based filename) +└── about-me.json + +public/notion-data/ +├── my-blog-post/ ← Per-page image folder +│ ├── image1.png +│ └── image2.jpg +└── about-me/ + └── profile.jpg +``` + +### pages.json Format + +```json +{ + "pages": [ + { + "id": "page-id", + "title": "My Blog Post", + "last_edited_time": "2024-01-01T10:00:00.000Z", + "fileName": "my-blog-post" + } + ] +} +``` + +## Usage Examples + +### Initial Setup + +```bash +# 1. Set API key +echo "NOTION_API_SECRET=secret_your_internal_integration_secret_here" > .env + +# 2. 
Extract all pages +npresso --all +``` + +### Individual Page Processing + +```bash +npresso --page https://notion.so/user/My-Page-abc123def456 +``` + +### Custom Directories + +```bash +npresso --all --output-dir custom-data --image-dir assets/images +``` + +## Key Improvements + +- ✅ **Automatic Page Discovery**: No manual URL input needed with Search API +- ✅ **Readable File Names**: Title-based filenames instead of IDs +- ✅ **Environment Variable Support**: No need to input token every time +- ✅ **Image Download**: Complete offline backup +- ✅ **Incremental Updates**: Fast sync by processing only changed pages +- ✅ **Page List Generation**: `pages.json` for frontend integration + +## Limitations + +- Notion API Search functionality may not discover all pages +- Pages must be shared with the integration to be accessible +- Processing large numbers of pages may take time + +## Contributing + +Contributions are welcome! Please see the [Contributing Guide](./CONTRIBUTING.md) for details. 
+ +## License + +MIT License diff --git a/package-lock.json b/package-lock.json index 1030756..43ff390 100644 --- a/package-lock.json +++ b/package-lock.json @@ -18,6 +18,7 @@ "devDependencies": { "@ryoppippi/unplugin-typia": "^1.0.6", "@types/node": "^22.7.7", + "dotenv": "^17.2.1", "ts-patch": "^3.2.1", "typescript": "^5.6.2", "typia": "^6.10.2", @@ -1312,6 +1313,19 @@ "dev": true, "license": "Apache-2.0" }, + "node_modules/dotenv": { + "version": "17.2.1", + "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-17.2.1.tgz", + "integrity": "sha512-kQhDYKZecqnM0fCnzI5eIv5L4cAe/iRI+HqMbO/hbRdTAeXDG+M9FjipUxNfbARuEg4iHIbhnhs78BCHNbSxEQ==", + "dev": true, + "license": "BSD-2-Clause", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://dotenvx.com" + } + }, "node_modules/drange": { "version": "1.1.1", "resolved": "https://registry.npmjs.org/drange/-/drange-1.1.1.tgz", diff --git a/package.json b/package.json index 76764b0..53714d5 100644 --- a/package.json +++ b/package.json @@ -1,18 +1,20 @@ { "name": "@notionpresso/cli", "private": false, - "version": "0.0.2", + "version": "0.0.3", "type": "module", "scripts": { "dev": "vite", "build": "tsc && vite build", "preview": "vite preview", "test": "vitest", - "prepare": "ts-patch install && typia patch" + "prepare": "ts-patch install && typia patch", + "cli": "node dist/notionpresso.es.js" }, "devDependencies": { "@ryoppippi/unplugin-typia": "^1.0.6", "@types/node": "^22.7.7", + "dotenv": "^17.2.1", "ts-patch": "^3.2.1", "typescript": "^5.6.2", "typia": "^6.10.2", diff --git a/src/lib/download-image.ts b/src/lib/download-image.ts index 08718c1..53ad647 100644 --- a/src/lib/download-image.ts +++ b/src/lib/download-image.ts @@ -1,12 +1,8 @@ -import * as fs from "fs"; -import * as path from "path"; -import { getFileExtension } from "./file-extension-utils"; -import { Block } from "@cozy-blog/notion-client"; -import { - getImageUrl, - isImageBlock, - updateImageUrl, -} from "./download-image.helper"; 
+import * as fs from 'fs'; +import * as path from 'path'; +import { getFileExtension } from './file-extension-utils'; +import { Block } from '@cozy-blog/notion-client'; +import { getImageUrl, isImageBlock, updateImageUrl } from './download-image.helper'; async function downloadImage({ url, @@ -19,7 +15,7 @@ async function downloadImage({ const arrayBuffer = await response.arrayBuffer(); await fs.promises.writeFile(outputPath, Buffer.from(arrayBuffer)); - return response.headers.get("Content-Type") || ""; + return response.headers.get('Content-Type') || ''; } async function updateImageOnBlock( @@ -32,11 +28,12 @@ async function updateImageOnBlock( imageDir: string; pageId: string; }, - imageCounter: { count: number }, + imageCounter: { count: number } ): Promise { if (isImageBlock(block)) { const originalUrl = getImageUrl(block); const imageName = `image_${imageCounter.count}`; + await fs.promises.mkdir(imageDir, { recursive: true }); const tempPath = path.join(imageDir, `${imageName}_temp`); try { @@ -82,9 +79,9 @@ export async function updateImageOnBlocks({ pageId: string; imageCounter?: { count: number }; }): Promise { - const updatePromises = blocks.map((block) => - updateImageOnBlock({ block, imageDir, pageId }, imageCounter), + const updatePromises = blocks.map(block => + updateImageOnBlock({ block, imageDir, pageId }, imageCounter) ); await Promise.all(updatePromises); -} \ No newline at end of file +} diff --git a/src/lib/dump-page.ts b/src/lib/dump-page.ts index 2784661..91b8cd9 100644 --- a/src/lib/dump-page.ts +++ b/src/lib/dump-page.ts @@ -1,40 +1,43 @@ -import { Client } from "@cozy-blog/notion-client"; -import * as fs from "fs"; -import * as path from "path"; -import { updateImageOnBlocks } from "./download-image"; +import { Client } from '@cozy-blog/notion-client'; +import * as fs from 'fs'; +import * as path from 'path'; +import { updateImageOnBlocks } from './download-image'; +import { getPageTitle, sanitizeFileName } from './page-utils'; export 
async function fetchAndSavePageData({ client, pageId, outputDir, imageOutDir, + fileName, + checkExisting = false, }: { client: Client; pageId: string; outputDir: string; imageOutDir: string; -}): Promise<void> { - // Fetch full page data + fileName?: string; + checkExisting?: boolean; +}): Promise<{ title: string; skipped: boolean }> { const fullPage = await client.fetchFullPage(pageId); + const title = getPageTitle(fullPage); + const finalFileName = fileName || sanitizeFileName(title, pageId); - // Create image directory - fs.mkdirSync(imageOutDir, { recursive: true }); + const outputFile = path.join(outputDir, `${finalFileName}.json`); + const finalImageOutDir = path.join(imageOutDir, finalFileName); + + if (checkExisting && fs.existsSync(outputFile)) { + return { title, skipped: true }; + } await updateImageOnBlocks({ blocks: fullPage.blocks, - imageDir: imageOutDir, - pageId, // pageId 전달 + imageDir: finalImageOutDir, + pageId, }); - // Define the output file path - const outputFile = path.join(outputDir, `${pageId}.json`); - - // Create the directory if it doesn't exist fs.mkdirSync(outputDir, { recursive: true }); + fs.writeFileSync(outputFile, JSON.stringify(fullPage, null, 2), 'utf-8'); - // Write the updated data to index.json (overwrite if it exists) - fs.writeFileSync(outputFile, JSON.stringify(fullPage, null, 2), "utf-8"); - - console.log(`Page data saved to ${outputFile}`); - console.log(`Images saved to ${imageOutDir}`); -} \ No newline at end of file + return { title, skipped: false }; +} diff --git a/src/lib/file-extension-utils.ts b/src/lib/file-extension-utils.ts index f7e2a0f..6bec9ca 100644 --- a/src/lib/file-extension-utils.ts +++ b/src/lib/file-extension-utils.ts @@ -1,48 +1,40 @@ -import * as path from "path"; +import * as path from 'path'; export type SupportedImageMimeType = - | "image/jpeg" - | "image/png" - | "image/gif" - | "image/webp" - | "image/svg+xml"; - -export type SupportedImageExtension = - | ".jpg" - | ".png" - | ".gif" - | ".webp" 
- | ".svg"; - -export const DEFAULT_IMAGE_EXTENSION: SupportedImageExtension = ".jpg"; - -const mimeTypeToExtensionMap: Record< - SupportedImageMimeType, - SupportedImageExtension -> = { - "image/jpeg": ".jpg", - "image/png": ".png", - "image/gif": ".gif", - "image/webp": ".webp", - "image/svg+xml": ".svg", + | 'image/jpeg' + | 'image/png' + | 'image/gif' + | 'image/webp' + | 'image/svg+xml'; + +export type SupportedImageExtension = '.jpg' | '.png' | '.gif' | '.webp' | '.svg'; + +export const DEFAULT_IMAGE_EXTENSION: SupportedImageExtension = '.jpg'; + +const mimeTypeToExtensionMap: Record<SupportedImageMimeType, SupportedImageExtension> = { + 'image/jpeg': '.jpg', + 'image/png': '.png', + 'image/gif': '.gif', + 'image/webp': '.webp', + 'image/svg+xml': '.svg', }; export function getFileExtensionFromContentType( - contentType: string, + contentType: string ): SupportedImageExtension | undefined { return mimeTypeToExtensionMap[contentType as SupportedImageMimeType]; } export function getFileExtensionFromUrl(url: string): string { - const urlSegments = url.split("/"); + const urlSegments = url.split('/'); const filenameWithQuery = urlSegments[urlSegments.length - 1]; - const filename = filenameWithQuery.split("?")[0]; + const filename = filenameWithQuery.split('?')[0]; return path.extname(filename); } export function getFileExtension( contentType: string, - originalUrl: string, + originalUrl: string ): SupportedImageExtension { const extensionFromContentType = getFileExtensionFromContentType(contentType); if (extensionFromContentType) return extensionFromContentType; @@ -50,12 +42,10 @@ export function getFileExtension( const extensionFromUrl = getFileExtensionFromUrl(originalUrl); if ( extensionFromUrl && - Object.values(mimeTypeToExtensionMap).includes( - extensionFromUrl as SupportedImageExtension, - ) + Object.values(mimeTypeToExtensionMap).includes(extensionFromUrl as SupportedImageExtension) ) { return extensionFromUrl as SupportedImageExtension; } return DEFAULT_IMAGE_EXTENSION; -} \ No newline at end of 
file +} diff --git a/src/lib/file-manager.ts b/src/lib/file-manager.ts new file mode 100644 index 0000000..b83cf8a --- /dev/null +++ b/src/lib/file-manager.ts @@ -0,0 +1,47 @@ +import * as fs from 'fs'; +import * as path from 'path'; +import type { PageInfo, PagesData } from './types'; + +export function loadExistingPagesData(outputDir: string): PagesData | null { + try { + const pagesFile = path.join(outputDir, 'pages.json'); + if (!fs.existsSync(pagesFile)) { + return null; + } + const content = fs.readFileSync(pagesFile, 'utf-8'); + return JSON.parse(content); + } catch { + return null; + } +} + +export function savePagesData(pages: PageInfo[], outputDir: string): void { + const pagesData: PagesData = { pages }; + const outputFile = path.join(outputDir, 'pages.json'); + + fs.mkdirSync(outputDir, { recursive: true }); + fs.writeFileSync(outputFile, JSON.stringify(pagesData, null, 2), 'utf-8'); +} + +export function getPagesToUpdate( + newPages: PageInfo[], + existingData: PagesData | null, + outputDir: string +): PageInfo[] { + if (!existingData) { + return newPages; + } + + const existingMap = new Map(existingData.pages.map(page => [page.id, page.last_edited_time])); + + return newPages.filter(page => { + const existingTime = existingMap.get(page.id); + const jsonFile = path.join(outputDir, `${page.fileName}.json`); + + if (!fs.existsSync(jsonFile)) { + return true; + } + + return !existingTime || new Date(page.last_edited_time) > new Date(existingTime); + }); +} diff --git a/src/lib/index.ts b/src/lib/index.ts index ff0c62c..64a274b 100644 --- a/src/lib/index.ts +++ b/src/lib/index.ts @@ -1,83 +1,100 @@ #!/usr/bin/env node +import { Command } from 'commander'; +import { Client } from '@cozy-blog/notion-client'; +import { fetchAndSavePageData } from './dump-page'; +import { syncAllPages } from './sync-all-pages'; +import dotenv from 'dotenv'; +import * as path from 'path'; -import { Command } from "commander"; -import { Client } from "@cozy-blog/notion-client"; 
-import { extractPageIdFromUrl } from "./page-id-extractor"; -import * as path from "path"; -import { fetchAndSavePageData } from "./dump-page"; -import typia from "typia"; - -const DEFAULT_OUTPUT_DIR = "notion-data"; -const DEFAULT_IMAGE_OUT_DIR = "public/notion-data"; - -interface CLIOptions { - page: string; - auth: string; - dir?: string; - imageDir?: string; -} +dotenv.config({ path: path.join(process.cwd(), '.env') }); const program = new Command(); program .name('npresso') - .description('CLI tool for downloading Notion pages and their assets') - .version('0.0.2') - -program - .requiredOption( - '--page ', - 'Notion page ID or URL (e.g., myblog/page-id-123 or just page-id-123). Note: You don\'t need to include "https://www.notion.so/"' - ) - .requiredOption( - '--auth ', - 'Notion API integration token (See tutorial: https://notionpresso.com/en/tutorial, or create one at https://www.notion.so/my-integrations)' - ) - .option( - '--dir ', - 'Directory where the page content will be saved', - 'notion-data' - ) - .option( - '--image-dir ', - 'Directory where the page images will be saved', - 'public/notion-data' - ) - .addHelpText('after', ` -Example: - $ npresso --page myblog/page-id-123 --auth secret_token... - $ npresso --page page-id-123 --auth secret_token... 
--dir custom-dir --image-dir images - `); - -program.parse(process.argv); + .description('NotionPresso CLI - Export Notion pages to JSON') + .version('1.0.0') + .option('--all', 'Get all accessible pages') + .option('--page ', 'Specific page URL or ID') + .option('--auth ', 'Notion API token (or set NOTION_API_SECRET env var)') + .option('--output-dir ', 'Output directory for JSON files', 'notion-data') + .option('--image-dir ', 'Output directory for images', 'public/notion-data'); + +program.parse(); const options = program.opts(); -if (!typia.is(options)) { - console.error("Invalid options", options); - process.exit(1); -} +async function main() { + try { + const apiToken = options.auth || process.env.NOTION_API_SECRET; -const pageId = extractPageIdFromUrl(options.page); + if (!apiToken) { + console.error('❌ Error: No API token provided.'); + console.error(' Set NOTION_API_SECRET environment variable or use --auth option'); + console.error(' Example: npresso --auth your_api_token'); + process.exit(1); + } -const outputDir = path.join(process.cwd(), options.dir || DEFAULT_OUTPUT_DIR); + const client = new Client({ + auth: apiToken, + }); -const imageOutDir = path.join( - process.cwd(), - options.imageDir || DEFAULT_IMAGE_OUT_DIR, - pageId, -); + if (options.all) { + await syncAllPages(client, options.outputDir, options.imageDir); + } else if (options.page) { + const pageId = extractPageId(options.page); -const client = new Client({ auth: options.auth }); + const result = await fetchAndSavePageData({ + client, + pageId, + outputDir: options.outputDir, + imageOutDir: options.imageDir, + checkExisting: true, + }); -/** - * fetch and save page data - */ -(async () => { - try { - await fetchAndSavePageData({ client, pageId, outputDir, imageOutDir }); - } catch (error: any) { - console.error("Error fetching page data:", error.message); + if (result.skipped) { + console.log('✅ Page is up to date'); + } else { + console.log(`📥 Processing: ${result.title}`); + 
console.log('✅ Page processing completed!'); + } + } else { + console.error('❌ Error: Either --all or --page option is required'); + console.error(' Examples:'); + console.error(' npresso --all'); + console.error(' npresso --page your-page-url'); + process.exit(1); + } + } catch (error) { + console.error('❌ Unexpected error:', error); process.exit(1); } -})(); \ No newline at end of file +} + +function extractPageId(pageUrl: string): string { + const uuidPattern = /([a-f0-9]{32})/i; + const match = pageUrl.match(uuidPattern); + + if (match) { + return match[1]; + } + + const hyphenUuidPattern = /([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})/i; + const hyphenMatch = pageUrl.match(hyphenUuidPattern); + + if (hyphenMatch) { + return hyphenMatch[1].replace(/-/g, ''); + } + + const parts = pageUrl.split('-'); + if (parts.length > 0) { + const lastPart = parts[parts.length - 1]; + if (/^[a-f0-9]{32}$/i.test(lastPart)) { + return lastPart; + } + } + + return pageUrl; +} + +main(); diff --git a/src/lib/page-utils.ts b/src/lib/page-utils.ts new file mode 100644 index 0000000..113aedb --- /dev/null +++ b/src/lib/page-utils.ts @@ -0,0 +1,51 @@ +import { FILE_CONSTANTS } from './types'; +import type { PageInfo } from './types'; + +export function getPageTitle(page: any): string { + if (page.properties?.title?.title?.[0]?.plain_text) { + return page.properties.title.title[0].plain_text; + } + + for (const prop of Object.values(page.properties || {})) { + if ((prop as any)?.type === 'title' && (prop as any)?.title?.[0]?.plain_text) { + return (prop as any).title[0].plain_text; + } + } + + return 'Untitled'; +} + +export function sanitizeFileName(title: string, fallbackId: string): string { + const sanitized = title + .replace(FILE_CONSTANTS.INVALID_FILENAME_CHARS, '-') + .replace(FILE_CONSTANTS.WHITESPACE, '-') + .replace(FILE_CONSTANTS.MULTIPLE_DASHES, '-') + .replace(FILE_CONSTANTS.LEADING_TRAILING_DASHES, '') + .toLowerCase(); + + if (!sanitized) { + return 
fallbackId; + } + + if (sanitized.length > FILE_CONSTANTS.MAX_FILENAME_LENGTH) { + const truncated = sanitized + .slice(0, FILE_CONSTANTS.MAX_FILENAME_LENGTH) + .replace(FILE_CONSTANTS.LEADING_TRAILING_DASHES, ''); + + return truncated || fallbackId; + } + + return sanitized; +} + +export function createPageInfo(page: any): PageInfo { + const title = getPageTitle(page); + const fileName = sanitizeFileName(title, page.id); + + return { + id: page.id, + title, + last_edited_time: page.last_edited_time, + fileName, + }; +} diff --git a/src/lib/sync-all-pages.ts b/src/lib/sync-all-pages.ts new file mode 100644 index 0000000..a642531 --- /dev/null +++ b/src/lib/sync-all-pages.ts @@ -0,0 +1,83 @@ +import { Client } from '@cozy-blog/notion-client'; +import { fetchAndSavePageData } from './dump-page'; +import type { PageInfo } from './types'; +import { createPageInfo } from './page-utils'; +import { loadExistingPagesData, savePagesData, getPagesToUpdate } from './file-manager'; + +export async function syncAllPages( + client: Client, + outputDir: string, + imageDir: string +): Promise<void> { + const pages = await searchAllPages(client); + + if (pages.length === 0) { + console.log('ℹ️ No pages to sync'); + return; + } + + const existingPagesData = loadExistingPagesData(outputDir); + const pagesToUpdate = getPagesToUpdate(pages, existingPagesData, outputDir); + + savePagesData(pages, outputDir); + + if (pagesToUpdate.length === 0) { + console.log('✅ All pages are up to date'); + return; + } + + console.log(`📥 Syncing ${pagesToUpdate.length}/${pages.length} pages...`); + + await processPages(pagesToUpdate, client, outputDir, imageDir); + + console.log('✅ Sync completed!'); +} + +async function searchAllPages(client: Client): Promise<PageInfo[]> { + try { + const searchResponse = await client.search({ + filter: { + value: 'page', + property: 'object', + }, + }); + + if (!searchResponse.results || searchResponse.results.length === 0) { + return []; + } + + const pages: PageInfo[] = 
searchResponse.results + .filter((result: any) => result.object === 'page') + .map((page: any) => createPageInfo(page)); + + console.log(`✅ Found ${pages.length} pages`); + return pages; + } catch (error) { + console.error('❌ Error searching pages:', error); + throw error; + } +} + +async function processPages( + pages: PageInfo[], + client: Client, + outputDir: string, + imageDir: string +): Promise<void> { + for (let i = 0; i < pages.length; i++) { + const page = pages[i]; + console.log(`[${i + 1}/${pages.length}] Processing: ${page.title}`); + + try { + await fetchAndSavePageData({ + client, + pageId: page.id, + outputDir: outputDir, + imageOutDir: imageDir, + fileName: page.fileName, + }); + } catch (error) { + console.error(`❌ Failed: ${page.title}`, String(error)); + } + } +} diff --git a/src/lib/types.ts b/src/lib/types.ts new file mode 100644 index 0000000..2d48bfa --- /dev/null +++ b/src/lib/types.ts @@ -0,0 +1,28 @@ +export interface PageInfo { + id: string; + title: string; + last_edited_time: string; + fileName: string; +} + +export interface PagesData { + pages: PageInfo[]; +} + +export const FILE_CONSTANTS = { + INVALID_FILENAME_CHARS: /[<>:"/\\|?*]/g, + MAX_FILENAME_LENGTH: 50, + WHITESPACE: /\s+/g, + MULTIPLE_DASHES: /-+/g, + LEADING_TRAILING_DASHES: /^-|-$/g, +} as const; + +export const LOG_SYMBOLS = { + START: '🚀', + SEARCH: '🔍', + SUCCESS: '✅', + ERROR: '❌', + INFO: 'ℹ️', + SYNC: '📥', + FILE: '📄', +} as const;
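
Note on the filename rules: the sketch below is a standalone copy of `FILE_CONSTANTS` (src/lib/types.ts) and `sanitizeFileName` (src/lib/page-utils.ts) as they appear in this diff, reproduced only to show how a page title is likely to map to a file name. The sample titles and the `fallback-id` value are made up for illustration and do not come from the repository.

```typescript
// Standalone copy of the rules added in this diff, for illustration only.
const FILE_CONSTANTS = {
  INVALID_FILENAME_CHARS: /[<>:"/\\|?*]/g,
  MAX_FILENAME_LENGTH: 50,
  WHITESPACE: /\s+/g,
  MULTIPLE_DASHES: /-+/g,
  LEADING_TRAILING_DASHES: /^-|-$/g,
} as const;

function sanitizeFileName(title: string, fallbackId: string): string {
  const sanitized = title
    .replace(FILE_CONSTANTS.INVALID_FILENAME_CHARS, '-') // reserved characters become dashes
    .replace(FILE_CONSTANTS.WHITESPACE, '-') // each whitespace run becomes one dash
    .replace(FILE_CONSTANTS.MULTIPLE_DASHES, '-') // collapse runs of dashes
    .replace(FILE_CONSTANTS.LEADING_TRAILING_DASHES, '') // trim one dash at each end
    .toLowerCase();

  // Nothing usable left in the title: fall back to the page ID.
  if (!sanitized) {
    return fallbackId;
  }

  // Long titles are cut at the limit, then any dash left at the cut is trimmed.
  if (sanitized.length > FILE_CONSTANTS.MAX_FILENAME_LENGTH) {
    const truncated = sanitized
      .slice(0, FILE_CONSTANTS.MAX_FILENAME_LENGTH)
      .replace(FILE_CONSTANTS.LEADING_TRAILING_DASHES, '');
    return truncated || fallbackId;
  }

  return sanitized;
}

// Hypothetical sample titles.
console.log(sanitizeFileName('My Blog Post', 'fallback-id')); // my-blog-post
console.log(sanitizeFileName('What? / Why: Notes', 'fallback-id')); // what-why-notes
console.log(sanitizeFileName('???', 'fallback-id')); // fallback-id
```

Because the result is lowercased and punctuation is collapsed, the first sample matches the `my-blog-post` entry shown in the README's `pages.json` example.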