feat: get document content by tabs by juansepl · Pull Request #2 · bukhr/google_workspace_mcp

juansepl · 2025-06-27T22:55:25Z

🚀 Enhanced Google Docs Content Extraction with Tables and Tabs Support
Summary
This PR extends Google Docs content extraction to handle tables, tabs, and other document elements, preserving their formatting and structure in the extracted text.

🆕 New Features

New Tool: get_doc_content_with_tabs
Added a new MCP tool specifically designed for documents containing tabs
Uses the Google Docs API's includeTabsContent=True parameter to retrieve the complete tab structure
Processes both main document content and all tab content recursively
Supports child tabs and nested tab structures
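
As a rough illustration of the recursive tab handling described above, the sketch below walks a response dict shaped like the one the Docs API returns when includeTabsContent is set (tabs, tabProperties, childTabs, documentTab). The helper names iter_tabs, tab_text, and summarize_tabs are hypothetical and are not taken from this PR; the sample dict is made up for the example.

```python
from typing import Any, Iterator

Tab = dict[str, Any]


def iter_tabs(tabs: list[Tab], depth: int = 0) -> Iterator[tuple[int, Tab]]:
    """Yield (depth, tab) for every tab and child tab, depth-first."""
    for tab in tabs:
        yield depth, tab
        yield from iter_tabs(tab.get("childTabs", []), depth + 1)


def tab_text(tab: Tab) -> str:
    """Concatenate the text runs of a tab's body paragraphs (tables are skipped here)."""
    body = tab.get("documentTab", {}).get("body", {})
    parts = []
    for element in body.get("content", []):
        for run in element.get("paragraph", {}).get("elements", []):
            parts.append(run.get("textRun", {}).get("content", ""))
    return "".join(parts)


def summarize_tabs(document: dict[str, Any]) -> list[str]:
    """Return one indented line per tab with its title, ID, and index."""
    lines = []
    for depth, tab in iter_tabs(document.get("tabs", [])):
        props = tab.get("tabProperties", {})
        lines.append(
            f"{'  ' * depth}- {props.get('title', '')} "
            f"(id={props.get('tabId', '')}, index={props.get('index', 0)})"
        )
    return lines


# In real use the dict would come from a call along the lines of:
#   service.documents().get(documentId=doc_id, includeTabsContent=True).execute()
# The sample below is invented for illustration only.
sample = {
    "tabs": [
        {
            "tabProperties": {"tabId": "t.0", "title": "Overview", "index": 0},
            "documentTab": {"body": {"content": [
                {"paragraph": {"elements": [{"textRun": {"content": "Hello\n"}}]}}
            ]}},
            "childTabs": [
                {
                    "tabProperties": {"tabId": "t.1", "title": "Details", "index": 0},
                    "documentTab": {"body": {"content": []}},
                }
            ],
        }
    ]
}

print("\n".join(summarize_tabs(sample)))
```

The depth-first order keeps each child tab directly under its parent, which matches how nested tabs appear in the Docs sidebar.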
Enhanced Content Processing
Table Support: Added comprehensive table processing with proper markdown-style formatting
Extracts table rows and cells with proper alignment
Includes header row separation
Preserves table structure and cell content
Advanced Paragraph Processing: Improved paragraph handling with support for:
Bullet lists and numbered lists with proper indentation
Nested list structures
Rich text formatting preservation
Additional Element Support:
Section breaks detection
Table of contents recognition
Better handling of various document elements
Improved Document Structure
Hierarchical Processing: Recursive processing of document elements
Better Content Organization: Clear separation between main content and tabs
Enhanced Metadata: Detailed tab information including titles, indices, and IDs
🔧 Technical Improvements
Authentication & Service Management
Updated service decorators to use require_multiple_services for better service management
Improved parameter handling in the start_google_auth function
Enhanced error handling for document processing operations
Code Structure
Modular content processing functions:
process_paragraph(): Handles paragraph elements with formatting
process_table(): Extracts and formats table content
process_content_elements(): Recursively processes mixed content types
Better separation of concerns between content extraction and formatting
📊 Features Added
Content Types Now Supported:

✅ Tables with proper markdown formatting
✅ Bullet and numbered lists with indentation
✅ Document tabs and child tabs
✅ Section breaks and document structure elements
✅ Table of contents sections
✅ Rich text formatting preservation
API Enhancements:

✅ includeTabsContent parameter usage
✅ Recursive tab processing
✅ Enhanced metadata extraction
✅ Better error handling and logging
🎯 Use Cases
This enhancement enables:

Complex Document Processing: Handle large documents with multiple tabs and sections
Structured Content Extraction: Extract tables and lists with proper formatting
Document Navigation: Access specific tabs and sections within documents
Content Analysis: Better understanding of document structure and organization
🧪 Testing
Tested with documents containing multiple tabs and child tabs
Verified table extraction with various table structures
Confirmed proper handling of nested content elements
Validated formatting preservation across different content types
📝 Files Modified
gdocs/docs_tools.py: Added new tool and enhanced content processing
core/server.py: Updated authentication parameter handling
With these changes, the Google Docs integration can extract and process complex document structures, including tabs, tables, and nested lists, while keeping their structure intact.

feat: get document content by tabs

6fcd239

juansepl requested a review from alejandrorico9 June 27, 2025 22:55

juansepl self-assigned this Jun 27, 2025

juansepl added 4 commits July 1, 2025 16:22

feat: get user google email from environment var

ccc1a7b

feat: remove unnecessary tool

81e6713

feat: read sub-tabs

77b1c72

feat: tabs by link

6c5d372

juansepl wants to merge 5 commits into main from feat/get-content-google-docs-by-tabs
