Qwen: Qwen3.5-Flash chat
The Qwen3.5-Flash models are native vision-language models built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design for higher inference efficiency. Compared to the Qwen3 series, they deliver a marked improvement on both pure-text and multimodal tasks, offering fast response times while balancing inference speed against overall quality.
Capabilities
Context window: 1M tokens
Max output: 65k tokens
Inputs: text, image
Outputs: text
Pricing (per 1M tokens)
Input: $0.07
Output: $0.26
Cache read: -
Cache write: -
Supported Parameters
include_reasoning, max_tokens, presence_penalty, reasoning, response_format, seed, structured_output, temperature, tool_choice, tools, top_p
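A minimal sketch of a chat-completion request body that exercises several of the supported parameters. This assumes an OpenAI-compatible endpoint; the model slug `qwen/qwen3.5-flash` and the exact shape of the `reasoning` field are illustrative assumptions, not official identifiers.

```python
import json

# Illustrative request body for an OpenAI-compatible chat endpoint.
# The model slug and the "reasoning" field shape are assumptions.
payload = {
    "model": "qwen/qwen3.5-flash",
    "messages": [
        {"role": "user", "content": "Summarize linear attention in one sentence."}
    ],
    "max_tokens": 512,        # well under the 65k output cap
    "temperature": 0.7,
    "top_p": 0.9,
    "seed": 42,               # for reproducible sampling
    "tool_choice": "auto",    # let the model decide whether to call tools
    "reasoning": {"enabled": True},
}
print(json.dumps(payload, indent=2))
```

Sending the payload is left out so the sketch stays self-contained; in practice it would be POSTed with an API key to the provider's chat-completions URL.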