Skip to content

Multimodal Capabilities

AiMo Network supports multiple input modalities beyond text, allowing you to send images to compatible models through our unified API. This enables rich multimodal interactions for a wide variety of use cases.

Supported Modalities

Images

Send images to vision-capable models for analysis, description, OCR, and more. AiMo Network supports multiple image formats and both URL-based and base64-encoded images.

Image Generation

Generate images from text prompts using AI models with image output capabilities. AiMo Network supports various image generation models that can create high-quality images based on your descriptions.

Getting Started

All multimodal inputs use the same /api/v1/chat/completions endpoint with the messages parameter. Different content types are specified in the message content array:

  • Images: Use image_url content type

You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.

Model Compatibility

Not all models support every modality. AiMo Network automatically filters available models based on your request content:

  • Vision models: Required for image processing

Go to our Marketplace to find models that support your desired input modalities.

Input Format Support

AiMo Network supports both direct URLs and base64-encoded data for multimodal inputs:

URLs (Recommended for public content)

  • Images: https://example.com/image.jpg

Base64 Encoding (Required for local files)

  • Images: data:image/jpeg;base64,{base64_data}