MiMo-V2-Omni chat

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Capabilities

Context Window 262k tokens

Max Output 65k tokens

Inputs

Outputs

Pricing (per 1M tokens)

Input $0.40

Output $2.00

Cache Read $0.08

Cache Write -

Supported Parameters

frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningresponse_formatstoptemperaturetool_choicetoolstop_p