MiMo-V2-Omni coding
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window.
Capabilities
Context Window 262k tokens
Max Output 65k tokens
Inputs
Outputs
Pricing (per 1M tokens)
Input $0.40
Output $2.00
Cache Read $0.08
Cache Write -
Supported Parameters
frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningresponse_formatstoptemperaturetool_choicetoolstop_p