feat: get document content by tabs #2

Open
juansepl wants to merge 5 commits into main from feat/get-content-google-docs-by-tabs
@juansepl
Collaborator

🚀 Enhanced Google Docs Content Extraction with Tables and Tabs Support
Summary
This PR introduces significant improvements to Google Docs content extraction capabilities, adding support for processing tables, tabs, and other document elements with better formatting and structure preservation.

🆕 New Features

  1. New Tool: get_doc_content_with_tabs
    Added a new MCP tool specifically designed for documents containing tabs
    Uses Google Docs API's includeTabsContent=True parameter to retrieve complete tab structure
    Processes both main document content and all tab content recursively
    Supports child tabs and nested tab structures
  2. Enhanced Content Processing
    Table Support: Added comprehensive table processing with proper markdown-style formatting
      Extracts table rows and cells with proper alignment
      Includes header row separation
      Preserves table structure and cell content
    Advanced Paragraph Processing: Improved paragraph handling with support for:
      Bullet lists and numbered lists with proper indentation
      Nested list structures
      Rich text formatting preservation
    Additional Element Support:
      Section breaks detection
      Table of contents recognition
      Better handling of various document elements
  3. Improved Document Structure
    Hierarchical Processing: Recursive processing of document elements
    Better Content Organization: Clear separation between main content and tabs
    Enhanced Metadata: Detailed tab information including titles, indices, and IDs
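
For reviewers unfamiliar with the tabs response shape, here is a minimal sketch of the recursive tab traversal described in item 1. It is not the code in this PR: the helper names `iter_tabs` and `fetch_tabs` and the output keys are hypothetical, and `service` is assumed to be a Docs API client built with `googleapiclient.discovery.build("docs", "v1", ...)`. The `tabs`, `tabProperties`, `childTabs`, and `documentTab` fields come from the Docs API response when `includeTabsContent=True` is set.

```python
from typing import Any, Dict, Iterator, List


def iter_tabs(tabs: List[Dict[str, Any]], depth: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield one metadata dict per tab, depth-first, including child tabs."""
    for tab in tabs:
        props = tab.get("tabProperties", {})
        yield {
            "tab_id": props.get("tabId"),
            "title": props.get("title", ""),
            "index": props.get("index"),
            "depth": depth,  # 0 for top-level tabs, 1 for their children, ...
            # Structural elements of the tab body; empty if there is no documentTab.
            "content": tab.get("documentTab", {}).get("body", {}).get("content", []),
        }
        # Recurse into nested tabs before moving on to the next sibling.
        yield from iter_tabs(tab.get("childTabs", []), depth + 1)


def fetch_tabs(service: Any, document_id: str) -> List[Dict[str, Any]]:
    """Fetch a document with tab content and flatten its tab tree.

    Hypothetical helper: `service` is assumed to be an authenticated
    Docs API client; no authentication is handled here.
    """
    doc = service.documents().get(
        documentId=document_id, includeTabsContent=True
    ).execute()
    return list(iter_tabs(doc.get("tabs", [])))
```

Keeping the traversal separate from the API call means the flattening logic can be exercised on a plain dictionary without credentials.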
🔧 Technical Improvements
Authentication & Service Management
    Updated service decorators to use require_multiple_services for better service management
    Improved parameter handling in start_google_auth function
    Enhanced error handling for document processing operations
Code Structure
    Modular content processing functions:
      process_paragraph(): Handles paragraph elements with formatting
      process_table(): Extracts and formats table content
      process_content_elements(): Recursively processes mixed content types
    Better separation of concerns between content extraction and formatting
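
To make the division of labor between the three functions concrete, the following is a rough sketch of how they could fit together. It reuses the function names listed above, but the bodies are assumptions for illustration, not the implementation in gdocs/docs_tools.py. It relies only on Docs API structural element fields (`paragraph.elements[].textRun.content`, `paragraph.bullet.nestingLevel`, `table.tableRows[].tableCells[].content`, `tableOfContents.content`). This sketch renders every list item with `- ` and silently skips section breaks, which may differ from what the PR does.

```python
from typing import Any, Dict, List


def process_paragraph(paragraph: Dict[str, Any]) -> str:
    """Join a paragraph's text runs; indent list items by nesting level."""
    text = "".join(
        el.get("textRun", {}).get("content", "")
        for el in paragraph.get("elements", [])
    ).strip()
    bullet = paragraph.get("bullet")
    if bullet is not None and text:
        # The API omits nestingLevel for top-level list items.
        indent = "  " * bullet.get("nestingLevel", 0)
        return f"{indent}- {text}"
    return text


def process_table(table: Dict[str, Any]) -> str:
    """Render a table as markdown-style rows, treating the first row as header."""
    rows: List[List[str]] = []
    for row in table.get("tableRows", []):
        rows.append([
            # A cell holds structural elements, so recurse; escape literal pipes.
            " ".join(process_content_elements(cell.get("content", []))).replace("|", "\\|")
            for cell in row.get("tableCells", [])
        ])
    if not rows:
        return ""
    lines = ["| " + " | ".join(cells) + " |" for cells in rows]
    # Header separator sized to the first row's column count.
    lines.insert(1, "|" + " --- |" * len(rows[0]))
    return "\n".join(lines)


def process_content_elements(elements: List[Dict[str, Any]]) -> List[str]:
    """Recursively convert structural elements into a list of text blocks."""
    out: List[str] = []
    for el in elements:
        if "paragraph" in el:
            text = process_paragraph(el["paragraph"])
            if text:
                out.append(text)
        elif "table" in el:
            table_md = process_table(el["table"])
            if table_md:
                out.append(table_md)
        elif "tableOfContents" in el:
            out.extend(process_content_elements(el["tableOfContents"].get("content", [])))
        # sectionBreak elements carry no text and are skipped in this sketch.
    return out
```

Because table cells contain structural elements themselves, `process_table` calls back into `process_content_elements`, which is where the recursion over mixed content comes from.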
📊 Features Added
Content Types Now Supported:

✅ Tables with proper markdown formatting
✅ Bullet and numbered lists with indentation
✅ Document tabs and child tabs
✅ Section breaks and document structure elements
✅ Table of contents sections
✅ Rich text formatting preservation
API Enhancements:

✅ includeTabsContent parameter usage
✅ Recursive tab processing
✅ Enhanced metadata extraction
✅ Better error handling and logging
🎯 Use Cases
This enhancement enables:

Complex Document Processing: Handle large documents with multiple tabs and sections
Structured Content Extraction: Extract tables and lists with proper formatting
Document Navigation: Access specific tabs and sections within documents
Content Analysis: Better understanding of document structure and organization
🧪 Testing
Tested with documents containing multiple tabs and child tabs
Verified table extraction with various table structures
Confirmed proper handling of nested content elements
Validated formatting preservation across different content types
📝 Files Modified
gdocs/docs_tools.py: Added new tool and enhanced content processing
core/server.py: Updated authentication parameter handling
This PR significantly enhances the Google Docs integration capabilities, making it possible to extract and process complex document structures with high fidelity.

@juansepl juansepl requested a review from alejandrorico9 June 27, 2025 22:55
@juansepl juansepl self-assigned this Jun 27, 2025