🚀 Enhanced Google Docs Content Extraction with Tables and Tabs Support
Summary
This PR improves Google Docs content extraction: it adds support for tables, tabs, and other document elements, and preserves more of the original formatting and structure in the extracted output.
🆕 New Features
get_doc_content_with_tabs
Added a new MCP tool specifically designed for documents containing tabs
Uses Google Docs API's includeTabsContent=True parameter to retrieve complete tab structure
Processes both main document content and all tab content recursively
Supports child tabs and nested tab structures
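To illustrate the recursive tab handling described above, here is a minimal sketch, not the code from this PR. It assumes the response shape returned by the Docs API when includeTabsContent=True is set (a top-level `tabs` list whose entries carry `tabProperties` and optional `childTabs`). The helper names `fetch_document`, `iter_tabs`, and `summarize_tabs` are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def fetch_document(service: Any, document_id: str) -> Dict[str, Any]:
    """Fetch a document with tab content included.

    `service` is assumed to be an authenticated Docs API client
    built with google-api-python-client.
    """
    return (
        service.documents()
        .get(documentId=document_id, includeTabsContent=True)
        .execute()
    )


def iter_tabs(
    tabs: List[Dict[str, Any]], depth: int = 0
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, tab) for every tab and child tab, depth-first."""
    for tab in tabs:
        yield depth, tab
        # Child tabs are nested under their parent, so recurse one level deeper.
        yield from iter_tabs(tab.get("childTabs", []), depth + 1)


def summarize_tabs(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect title, ID, index, and nesting depth for each tab."""
    summary = []
    for depth, tab in iter_tabs(document.get("tabs", [])):
        props = tab.get("tabProperties", {})
        summary.append(
            {
                "depth": depth,
                "title": props.get("title", ""),
                "tab_id": props.get("tabId", ""),
                "index": props.get("index", 0),
            }
        )
    return summary
```

Keeping the traversal separate from the API call means the recursion can be exercised against a plain dictionary, without credentials.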
Table Support: Added comprehensive table processing with proper markdown-style formatting
Extracts table rows and cells with proper alignment
Includes header row separation
Preserves table structure and cell content
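As a rough sketch of markdown-style table formatting with header row separation, the following assumes the Docs API table shape (`tableRows`, each with `tableCells`, each holding `content` made of paragraphs with `textRun` elements). The function names `cell_text` and `table_to_markdown` are hypothetical, and the PR's actual output format may differ.

```python
from typing import Any, Dict, List


def cell_text(cell: Dict[str, Any]) -> str:
    """Join all text runs in a cell into a single line of text."""
    parts: List[str] = []
    for element in cell.get("content", []):
        paragraph = element.get("paragraph")
        if not paragraph:
            continue
        for run in paragraph.get("elements", []):
            parts.append(run.get("textRun", {}).get("content", ""))
    # Collapse newlines and repeated spaces, then escape pipes so they
    # do not break the markdown column layout.
    return " ".join("".join(parts).split()).replace("|", "\\|")


def table_to_markdown(table: Dict[str, Any]) -> str:
    """Render a table as markdown, treating the first row as the header."""
    rows = [
        [cell_text(cell) for cell in row.get("tableCells", [])]
        for row in table.get("tableRows", [])
    ]
    if not rows:
        return ""
    # Pad short rows so every row has the same number of columns.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(row: List[str]) -> str:
        return "| " + " | ".join(row) + " |"

    lines = [fmt(rows[0]), fmt(["---"] * width)]
    lines.extend(fmt(row) for row in rows[1:])
    return "\n".join(lines)
```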
Advanced Paragraph Processing: Improved paragraph handling with support for:
Bullet lists and numbered lists with proper indentation
Nested list structures
Rich text formatting preservation
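The list handling can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it assumes a list paragraph carries a `bullet` object with `listId` and an optional `nestingLevel`, and that the document's `lists` map exposes a `glyphType` for numbered levels. The names `paragraph_text`, `is_ordered`, and `render_paragraph` are hypothetical, and every numbered item is rendered as `1.` for simplicity.

```python
from typing import Any, Dict


def paragraph_text(paragraph: Dict[str, Any]) -> str:
    """Concatenate the text runs of a paragraph, dropping the trailing newline."""
    return "".join(
        element.get("textRun", {}).get("content", "")
        for element in paragraph.get("elements", [])
    ).rstrip("\n")


def is_ordered(lists: Dict[str, Any], list_id: str, level: int) -> bool:
    """Guess whether a list level is numbered, based on its glyph type."""
    levels = (
        lists.get(list_id, {}).get("listProperties", {}).get("nestingLevels", [])
    )
    if level >= len(levels):
        return False
    glyph_type = levels[level].get("glyphType")
    return glyph_type not in (None, "GLYPH_TYPE_UNSPECIFIED", "NONE")


def render_paragraph(paragraph: Dict[str, Any], lists: Dict[str, Any]) -> str:
    """Render a paragraph, indenting list items two spaces per nesting level."""
    text = paragraph_text(paragraph)
    bullet = paragraph.get("bullet")
    if bullet is None:
        return text
    # nestingLevel is omitted from the JSON for top-level items.
    level = bullet.get("nestingLevel", 0)
    marker = "1." if is_ordered(lists, bullet.get("listId", ""), level) else "-"
    return f"{'  ' * level}{marker} {text}"
```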
Additional Element Support:
Section breaks detection
Table of contents recognition
Better handling of various document elements
Hierarchical Processing: Recursive processing of document elements
Better Content Organization: Clear separation between main content and tabs
Enhanced Metadata: Detailed tab information including titles, indices, and IDs
🔧 Technical Improvements
Authentication & Service Management
Updated service decorators to use require_multiple_services for better service management
Improved parameter handling in the start_google_auth function
Enhanced error handling for document processing operations
Code Structure
Modular content processing functions:
process_paragraph(): Handles paragraph elements with formatting
process_table(): Extracts and formats table content
process_content_elements(): Recursively processes mixed content types
Better separation of concerns between content extraction and formatting
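A minimal sketch of recursive processing over mixed content types is shown below. It assumes the Docs API structural element keys `paragraph`, `table`, `tableOfContents`, and `sectionBreak`. The function name `extract_lines` and the bracketed placeholder labels are hypothetical and do not reflect the functions or output format in this PR.

```python
from typing import Any, Dict, List


def extract_lines(elements: List[Dict[str, Any]], depth: int = 0) -> List[str]:
    """Walk structural elements and return one indented line per text block."""
    lines: List[str] = []
    pad = "  " * depth
    for element in elements:
        if "paragraph" in element:
            text = "".join(
                run.get("textRun", {}).get("content", "")
                for run in element["paragraph"].get("elements", [])
            ).strip()
            if text:
                lines.append(pad + text)
        elif "table" in element:
            lines.append(pad + "[TABLE]")
            # Table cells contain their own structural elements, so recurse.
            for row in element["table"].get("tableRows", []):
                for cell in row.get("tableCells", []):
                    lines.extend(extract_lines(cell.get("content", []), depth + 1))
        elif "tableOfContents" in element:
            lines.append(pad + "[TABLE OF CONTENTS]")
            toc_content = element["tableOfContents"].get("content", [])
            lines.extend(extract_lines(toc_content, depth + 1))
        elif "sectionBreak" in element:
            lines.append(pad + "[SECTION BREAK]")
    return lines
```

Dispatching on the element key keeps each content type's handling in one branch, so supporting a new element type means adding a branch rather than changing the traversal.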
📊 Features Added
Content Types Now Supported:
✅ Tables with proper markdown formatting
✅ Bullet and numbered lists with indentation
✅ Document tabs and child tabs
✅ Section breaks and document structure elements
✅ Table of contents sections
✅ Rich text formatting preservation
API Enhancements:
✅ includeTabsContent parameter usage
✅ Recursive tab processing
✅ Enhanced metadata extraction
✅ Better error handling and logging
🎯 Use Cases
This enhancement enables:
Complex Document Processing: Handle large documents with multiple tabs and sections
Structured Content Extraction: Extract tables and lists with proper formatting
Document Navigation: Access specific tabs and sections within documents
Content Analysis: Better understanding of document structure and organization
🧪 Testing
Tested with documents containing multiple tabs and child tabs
Verified table extraction with various table structures
Confirmed proper handling of nested content elements
Validated formatting preservation across different content types
📝 Files Modified
gdocs/docs_tools.py: Added new tool and enhanced content processing
core/server.py: Updated authentication parameter handling
This PR extends the Google Docs integration so that complex document structures (tabs and child tabs, tables, and nested lists) can be extracted with their structure and formatting preserved.