@dpruessner

Description

Enables tools to return DocumentBlock, ImageBlock, and VideoBlock content directly to multi-modal models.

Previously, tools could only return strings or JSON. This PR adds support for rich media blocks, allowing more efficient processing of documents, images, and videos through the Bedrock Converse API.

Key Changes:

  • Added ToolReturnValue type supporting media blocks
  • Enhanced tool() helper to accept DocumentBlock/ImageBlock/VideoBlock returns
  • Updated Bedrock formatting to handle media content natively
  • Maintained full backward compatibility
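As a rough illustration of the change described above, the sketch below models how a tool's return value might be normalized into a result content block when media blocks are allowed alongside strings and JSON. Every name here (`MediaBlockLike`, `ToolReturnValueLike`, `isMediaBlock`, `wrapToolResult`) is a hypothetical stand-in written for this example; the field shapes are assumptions and do not reproduce the SDK's actual definitions.

```typescript
// Hypothetical stand-ins for the SDK's block types; names and fields are assumptions.
type TextBlockLike = { type: 'textBlock'; text: string }
type JsonBlockLike = { type: 'jsonBlock'; json: unknown }
type DocumentBlockLike = {
  type: 'documentBlock'
  name: string
  format: string
  source: { bytes: Uint8Array }
}
type ImageBlockLike = { type: 'imageBlock'; format: string; source: { bytes: Uint8Array } }
type VideoBlockLike = { type: 'videoBlock'; format: string; source: { bytes: Uint8Array } }

type MediaBlockLike = DocumentBlockLike | ImageBlockLike | VideoBlockLike
type ToolResultContentLike = TextBlockLike | JsonBlockLike | MediaBlockLike

// What a tool callback may return in this sketch: a string, a media block, or plain JSON.
type ToolReturnValueLike = string | MediaBlockLike | Record<string, unknown>

// Type guard: true when the value carries one of the three media discriminants.
function isMediaBlock(value: unknown): value is MediaBlockLike {
  if (typeof value !== 'object' || value === null) return false
  const t = (value as { type?: unknown }).type
  return t === 'documentBlock' || t === 'imageBlock' || t === 'videoBlock'
}

// Strings become text blocks, media blocks pass through unchanged,
// and everything else is wrapped as JSON (the pre-existing behavior).
function wrapToolResult(value: ToolReturnValueLike): ToolResultContentLike {
  if (typeof value === 'string') return { type: 'textBlock', text: value }
  if (isMediaBlock(value)) return value
  return { type: 'jsonBlock', json: value }
}
```

Passing media blocks through untouched is what would keep existing string and JSON tools working while letting a provider-specific formatter decide how to serialize the media.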

Related Issues

Closes #395

Documentation PR

Type of Change

New feature

Testing

How have you tested the change?

  • I ran npm run check

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@strands-agent
Collaborator

👋 Welcome @dpruessner and thanks for this contribution!

This looks like an interesting enhancement allowing tools to return richer content types (DocumentBlock, ImageBlock, VideoBlock). This would enable more sophisticated tool outputs beyond simple text.

Hoping maintainers can take a look when they have a chance! 👀


🤖 This comment was generated by an AI agent using strands-agents. Workflow Run: 20944495454

expect(result.type).toBe('documentBlock')
expect(result.name).toBe('RESULT')
expect(result.format).toBe('md')
})

Issue: Test coverage is incomplete for media block returns.

Only the DocumentBlock return path is tested. Consider adding tests for ImageBlock and VideoBlock returns to ensure consistent behavior across all media types.

Suggestion: Add similar tests for ImageBlock and VideoBlock:

it('handles ImageBlock return', async () => {
  const { ImageBlock } = await import('../../types/media.js')

  const imgTool = tool({
    name: 'create_image',
    description: 'Creates an image',
    inputSchema: z.object({ data: z.string() }),
    callback: (input) => {
      return new ImageBlock({
        format: 'png',
        source: { bytes: new TextEncoder().encode(input.data) },
      })
    },
  })

  const result = await imgTool.invoke({ data: 'test' })
  expect(result.type).toBe('imageBlock')
  expect(result.format).toBe('png')
})

@github-actions

Review Summary

Assessment: Comment (Request minor changes before approval)

Key Themes

Strengths:

  • Clean implementation that extends the existing type system naturally
  • Proper handling in Bedrock model with _formatDocumentSource and _formatMediaSource
  • Good backward compatibility maintained
  • TSDoc documentation is present

Areas Needing Attention:

  1. OpenAI Compatibility: The OpenAI model provider silently drops media blocks in tool results and substitutes empty strings. Users who switch between providers could see tool output disappear without any warning.

  2. Test Coverage: Only DocumentBlock return type is tested. Adding tests for ImageBlock and VideoBlock would improve confidence in the implementation.

Overall

This is a valuable enhancement that enables richer tool outputs. The core implementation in Bedrock and the type system changes look solid. The main suggestions focus on cross-provider compatibility and test coverage to ensure a robust feature.

Nice work on this contribution! 🎉

@github-actions

🔗 Cross-Reference: Issue #443

This PR is directly related to issue #443 ("Solidify plan for ToolUse and ToolResponse messages").

OpenAI Compatibility Concern

In src/models/openai.ts (lines 705-722), the tool result content handling returns an empty string for any block type it does not recognize:

const contentText = toolResult.content
  .map((c) => {
    if (c.type === 'textBlock') { return c.text }
    else if (c.type === 'jsonBlock') { return JSON.stringify(c.json) }
    return ''  // ⚠️ Silent data loss for DocumentBlock, ImageBlock, VideoBlock
  })
  .join('')

Suggestion: Add explicit handling for media blocks:

else if (c.type === 'documentBlock' || c.type === 'imageBlock' || c.type === 'videoBlock') {
  logger.warn(`content_type=<${c.type}> | OpenAI does not support media blocks in tool results, content omitted`)
  return `[${c.type}: content not supported by OpenAI]`
}

This ensures users are aware of the limitation rather than experiencing silent data loss.
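Putting the two fragments above together, a self-contained sketch of the flattening step might look like the following. The `ToolResultContent` type and the `flattenToolResultContent` function are hypothetical names invented for this example, and the `warn` parameter stands in for the project's `logger.warn` so the sketch has no SDK dependency; the warning and placeholder strings are taken from the suggestion above.

```typescript
// Hypothetical minimal content types; the real SDK types carry more fields.
type ToolResultContent =
  | { type: 'textBlock'; text: string }
  | { type: 'jsonBlock'; json: unknown }
  | { type: 'documentBlock' | 'imageBlock' | 'videoBlock' }

// Flattens tool result content to a single string.
// `warn` is injected (defaulting to console.warn) as a stand-in for the SDK logger.
function flattenToolResultContent(
  content: ToolResultContent[],
  warn: (message: string) => void = console.warn
): string {
  return content
    .map((c) => {
      switch (c.type) {
        case 'textBlock':
          return c.text
        case 'jsonBlock':
          return JSON.stringify(c.json)
        case 'documentBlock':
        case 'imageBlock':
        case 'videoBlock':
          // Emit a visible placeholder instead of silently returning ''.
          warn(`content_type=<${c.type}> | OpenAI does not support media blocks in tool results, content omitted`)
          return `[${c.type}: content not supported by OpenAI]`
      }
    })
    .join('')
}
```

Using an exhaustive `switch` over the `type` discriminant means that adding a new block type to the union would surface as a compile-time error here rather than as another silent empty string.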

Development

Successfully merging this pull request may close these issues.

[FEATURE] Tools can return DocumentBlock, ImageBlock, VideoBlock