Generative AI with Google’s Veo 3.1 – Text-to-Video

Generative AI (GAI), like other AI models, learns from training data to generate new data. Google’s Veo is a GAI that creates videos. A content creator may use three forms of input—text, frames, and ingredients—in combination to help Veo generate the clips for a movie.

Text to Video

Text-to-video converts written descriptions into moving visual scenes, frequently referred to as “videos”, “clips”, or “shorts”. The model interprets text prompts to determine objects, actions, and settings, then generates corresponding video frames. This process allows creators to visualize ideas directly from language without manual filming or animation.
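The flow above — compose a text prompt, submit it, and poll until frames are ready — can be sketched with the `google-genai` Python SDK. This is a minimal sketch, not an official recipe: the model identifier `"veo-3.1-generate-preview"`, the `build_prompt` helper, and the polling interval are assumptions; check Google's current documentation for exact names.

```python
# Hedged sketch: turning a text prompt into a video clip via the
# google-genai SDK. Model id and polling details are assumptions.
import os
import time


def build_prompt(subject: str, action: str, setting: str) -> str:
    """Compose a simple prompt from objects, actions, and settings."""
    return f"{subject} {action} in {setting}."


def generate_clip(prompt: str):
    """Submit the prompt and poll the long-running video operation.

    Requires the google-genai package and a GOOGLE_API_KEY in the
    environment; the generate_videos call runs asynchronously.
    """
    from google import genai

    client = genai.Client()
    operation = client.models.generate_videos(
        model="veo-3.1-generate-preview",  # assumed model id
        prompt=prompt,
    )
    while not operation.done:  # generation takes a while; poll
        time.sleep(10)
        operation = client.operations.get(operation)
    return operation.response.generated_videos[0]


if __name__ == "__main__":
    prompt = build_prompt("a grandma", "dances with a yeti", "a snowy forest")
    print(prompt)
    if os.environ.get("GOOGLE_API_KEY"):
        video = generate_clip(prompt)  # only runs when credentials exist
```

The prompt-building helper is deliberately trivial: as the samples below show, even a one-line description is enough input for the model.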

Sample 1 – Grandma and Yeti

Text prompts are simply words you provide as input. A prompt may be as simple as “zebra cow” or “grandma dances with yeti”, allowing even children to create videos.

Sample 2 – The Great Awakening

More imaginative text prompts may produce more interesting results while maintaining realism. Our Halloween-themed examples continue with The Great Awakening.

  • Prompt: “A close-up of a pumpkin in a pumpkin patch. The pumpkin begins to wiggle, finally exposing an arm and other body parts of a living human with a pumpkin head.”
  • Video: The Great Awakening

Sample 3 – Jack-o-Lanterns Align

Here’s a generated video, Jack-o-Lanterns Align, demonstrating Veo’s ability to model physics.

  • Prompt: “flatbed pickup truck drives down a dusty bumpy, dirt road at dusk. On the bed is a load of carved and lit jack-o-lanterns. As he bumps along the road, one by one, they fall off the bed and perfectly align.”
  • Video: Jack-o-Lanterns Align

Final Notes:

Read the prompts and watch the videos.
