Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
CrewAI supports multimodal agents that can process both text and non-text content like images. This guide will show you how to enable and use multimodal capabilities in your agents.
To create a multimodal agent, simply set the `multimodal` parameter to `True` when initializing your agent:
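A minimal sketch, assuming the standard `Agent` constructor; the role, goal, and backstory values are placeholders:

```python
from crewai import Agent

image_analyst = Agent(
    role="Image Analyst",  # placeholder role
    goal="Analyze images and report key details",
    backstory="An expert in visual content analysis",
    multimodal=True,  # enables image handling, including the AddImageTool
)
```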
When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
The multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don't need to add this tool manually; it's included automatically when you enable multimodal capabilities.
Here’s a complete example showing how to use a multimodal agent to analyze an image:
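The sketch below is illustrative rather than canonical: the image URL is a placeholder, and it assumes a vision-capable model is configured for the agent.

```python
from crewai import Agent, Task, Crew

# Multimodal agent; multimodal=True wires in the image-handling tools
image_analyst = Agent(
    role="Product Analyst",
    goal="Analyze product images and provide detailed descriptions",
    backstory="An expert in visual product analysis",
    multimodal=True,
)

# Reference the image directly in the task description
analysis_task = Task(
    description=(
        "Analyze the product image at https://example.com/product.jpg "
        "and describe its key features."
    ),
    expected_output="A detailed description of the product shown in the image",
    agent=image_analyst,
)

crew = Crew(agents=[image_analyst], tasks=[analysis_task])
result = crew.kickoff()
print(result)
```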
You can provide additional context or specific questions about the image when creating tasks for multimodal agents. The task description can include specific aspects you want the agent to focus on:
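For example, a task description might enumerate the aspects to cover; the values below are hypothetical, and the task reuses the `image_analyst` agent defined above.

```python
from crewai import Task

detailed_task = Task(
    description=(
        "Analyze the product image at https://example.com/product.jpg.\n"
        "Focus on:\n"
        "1. Visual quality and composition\n"
        "2. Branding elements and visible text\n"
        "3. Any apparent defects or damage"
    ),
    expected_output="A structured report covering each requested aspect",
    agent=image_analyst,  # the multimodal agent from the previous example
)
```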
When working with multimodal agents, the `AddImageTool` is automatically configured with the following schema:
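The exact definition can vary between CrewAI versions; the sketch below reconstructs it as a Pydantic model from the `image_url` and `action` parameters referenced in this guide, so treat the field names as an approximation.

```python
from typing import Optional
from pydantic import BaseModel, Field

class AddImageToolSchema(BaseModel):
    """Approximate input schema for AddImageTool."""
    image_url: str = Field(..., description="URL or path of the image to process")
    action: Optional[str] = Field(
        default=None,
        description="Optional context or a specific question about the image",
    )
```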
The multimodal agent will automatically handle the image processing through its built-in tools, retrieving the referenced image, passing it to a vision-capable language model, and incorporating the resulting analysis into its task output.
When working with multimodal agents, keep these best practices in mind:

- Image Access: make sure the image is reachable by the agent, for example via a valid, publicly accessible URL.
- Task Description: be specific about what you want analyzed, and consider using the optional `action` parameter for focused analysis.
- Resource Management: image processing is heavier than text-only work, so budget model usage accordingly.
- Environment Setup: confirm that the configured language model actually supports image input.
- Error Handling: account for failures such as unreachable images or unsupported formats (a minimal sketch follows this list).
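For the error-handling point, one simple pattern is to wrap the kickoff call and surface failures such as an unreachable image URL; this assumes the `crew` object from the complete example above.

```python
try:
    result = crew.kickoff()
    print(result)
except Exception as exc:  # e.g. unreachable image URL or an unsupported model
    print(f"Image analysis failed: {exc}")
```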