Midjourney Moves Beyond Still Images, Launches First AI Video Model
The AI image powerhouse launches V1 video, transforming stills into motion and building toward real-time interactive simulations.
June 19, 2025

Midjourney, a prominent player in the AI image generation space, has officially entered the rapidly expanding world of video creation with the debut of its first video model. The new tool, designated V1, represents a significant evolution for the company, known for producing stylistically distinct and often surreal still images.[1][2] This move signals not just an expansion of features but a foundational step in a long-term strategy aimed at developing real-time, interactive AI simulations.[3][4][5] The introduction of video capabilities places Midjourney in direct competition with established and emerging forces in the AI video market, including OpenAI's Sora, Google's Veo, and offerings from companies like Runway and Pika.[1][6]
The initial version of Midjourney's video generator operates on an image-to-video workflow.[1][7] Users can take an image, either one they have previously generated on the platform or one they upload, and animate it to create a short video clip.[1][3] This approach leverages the company's core strength and its user base of an estimated 20 million-plus members, who are already adept at crafting detailed prompts to produce high-quality images.[8][7] The V1 model generates four five-second video clips from a single image input.[1] Users can extend these clips in four-second increments up to four times, for a maximum video length of 21 seconds.[8][1] This initial clip length is comparable to competitors like Google's Veo 3 and OpenAI's Sora, which currently generate videos up to 20 seconds long.[8]
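The clip-length arithmetic above can be sketched as follows; the figures (a five-second base clip, four-second extensions, up to four of them) come from the article, while the function name itself is illustrative:

```python
def max_clip_length(base_seconds=5, extension_seconds=4, max_extensions=4):
    """Return the longest video V1 can produce under the stated limits."""
    return base_seconds + extension_seconds * max_extensions

# A 5-second clip extended four times in 4-second increments:
# 5 + 4 * 4 = 21 seconds, matching the stated maximum.
print(max_clip_length())  # → 21
```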
Control and accessibility are key components of the V1 release. Users can opt for an automatic animation setting that applies random motion, or a manual mode that lets them describe the desired movement with a text prompt.[1][4][2] Further customization is available through "low motion" and "high motion" settings, which control the intensity of camera and subject movement.[8][9] Although Midjourney built its community on Discord, the video functionality itself is currently limited to the web platform.[1][10] This strategic decision encourages users to transition from Discord's command-based interface to Midjourney's more robust, still-developing web interface, which is better suited to the complexities of video editing and interaction.[10] The cost of generating video is significantly higher than for images, with Midjourney stating that a video job consumes about eight times the GPU resources of an image generation.[9][3]
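The roughly 8x GPU-cost figure implies a concrete trade-off for subscribers working within a fixed generation budget. This sketch is illustrative only: the budget units and numbers are hypothetical, and only the 8x multiplier comes from Midjourney's stated figure:

```python
IMAGE_COST = 1               # arbitrary unit: the cost of one image job
VIDEO_COST = 8 * IMAGE_COST  # the roughly 8x figure Midjourney cites

def jobs_affordable(budget, cost_per_job):
    """How many jobs of a given cost fit within a fixed budget."""
    return budget // cost_per_job

budget = 240  # hypothetical monthly budget in the same arbitrary units
print(jobs_affordable(budget, IMAGE_COST))  # prints 240 (image jobs)
print(jobs_affordable(budget, VIDEO_COST))  # prints 30 (video jobs)
```

In other words, every video generated costs the user roughly eight image generations' worth of capacity, which helps explain why video is priced as a premium feature.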
The launch of a video model is a deliberate and methodical step in Midjourney CEO David Holz's ambitious long-term vision.[3][5] He has articulated a roadmap where the company builds foundational components—visuals through image models, movement through video models, and spatial navigation through forthcoming 3D models—with the ultimate goal of creating "real-time open-world simulations."[8][5] This grand objective envisions interactive digital environments generated and rendered by AI in real-time.[4][5] By releasing the video model as a stepping stone, Midjourney gathers crucial data and user feedback that will inform the development of these more advanced systems.[4][5] The company has acknowledged that insights from the V1 video model will directly contribute to improving its image models and paving the way for 3D and real-time capabilities.[4]
Midjourney's entry into the video market comes at a time of both intense competition and legal scrutiny. The AI video generation market is projected to grow significantly, making it a highly strategic area for expansion.[7] While competitors like Runway and Google are developing sophisticated tools with advanced camera controls aimed at commercial and professional filmmaking, Midjourney's initial video offering maintains its focus on artistic and creative expression.[1][11] The output from V1 has been described as otherworldly and artistic rather than strictly photorealistic, aligning with the brand's established aesthetic.[1][2] This focus on a distinct visual style could be a key differentiator in a crowded field.[10] However, the launch also occurs as Midjourney faces a significant copyright infringement lawsuit from major Hollywood studios, which could have broad implications for the entire generative AI industry.[12][10]
In conclusion, Midjourney's debut of its first video AI model is a pivotal moment for the company and the broader generative AI landscape. By building on its expertise in image generation and introducing an image-to-video workflow, Midjourney has given its vast user base a new creative dimension.[1][7] The V1 model, with its initial set of features and controls, serves as both a competitive entry into the burgeoning AI video market and a critical building block for the company's ambition of creating real-time, interactive worlds.[6][5] As the technology develops and the market matures, Midjourney's distinct artistic style and its methodical progression toward real-time simulation will be crucial in defining its role in the next chapter of AI-driven content creation.[4][10]