AI-Generated Video, Locally

Things are getting crazy...

Dec 19, 2024

The speed of progress in AI is simply amazing. Looking back, it’s hard to believe it has been only two years since the public launch of ChatGPT. At the time, I thought it was revolutionary. I sensed that artificial intelligence would act as a multiplier for progress, particularly in programming and data analysis, but I’m still in awe of the AI models being released seemingly every week, often for free. Take the newly released FastHunyuan and FastMochi, or Sana, Nvidia Labs’ new diffusion model, all of which became available just last week.

I wrote before about how an offline AI image generator GUI can be installed and used, but for some time now there has been Pinokio. It’s a virtual environment manager that helps you install, test, and use all kinds of AI tools without cluttering your system with thousands of orphaned files. Alongside the now-standard Stable Diffusion image generator GUIs, you can install all sorts of AI models and apps in Pinokio: echomimic2, which creates a talking avatar; ComfyUI, one of the best, if not the best, user interfaces for all kinds of models (complicated, yes, but powerful); RMBG-2-Studio, which removes backgrounds; FaceFusion, which replaces faces in videos; or even Cogs, which generates full web applications, backends included, from a simple prompt.

This is the result of a five-minute session with FaceFusion installed inside Pinokio. Default settings, default models, no optimizations.

What’s wild is that these are just a small sample of the apps and models Pinokio can install with ease.
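Pinokio’s internals aren’t covered here, but the general idea behind “no orphaned files” (every app lives in its own folder with its own environment) can be sketched with Python’s standard `venv` module. This is a minimal illustration under that assumption, not how Pinokio actually works; the folder layout and the app name `facefusion-demo` are hypothetical.

```python
"""Sketch: one isolated environment per app, removed by deleting one folder.

Assumption: this mirrors the general idea behind tools like Pinokio, not
Pinokio's actual implementation. Uses only the Python standard library.
"""
import shutil
import tempfile
import venv
from pathlib import Path


def install_app(root: Path, app_name: str) -> Path:
    """Create a self-contained virtual environment for a hypothetical app."""
    env_dir = root / app_name / "env"
    # with_pip=False keeps the sketch offline; a real installer would add pip
    # and then install the app's own dependencies into this folder only.
    venv.create(env_dir, with_pip=False)
    return env_dir


def uninstall_app(root: Path, app_name: str) -> None:
    """Remove an app completely by deleting its folder; nothing outside it is touched."""
    shutil.rmtree(root / app_name)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp(prefix="apps-"))
    env = install_app(root, "facefusion-demo")  # hypothetical app name
    print("created:", (env / "pyvenv.cfg").exists())
    uninstall_app(root, "facefusion-demo")
    print("leftovers:", list(root.iterdir()))
    root.rmdir()
```

Because each app’s dependencies live under its own folder, uninstalling is a single directory delete rather than a hunt for stray files scattered across the system.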

Now combine the ease of use of Pinokio with the complexity of ComfyUI, add modern AI models, and you get this (just replace the installation notes for standalone ComfyUI with the ones from Pinokio):

It has long been said that one day we would be able to customize our movie experience. Well, it looks like we will be getting personalized movies sooner than expected.

I’m once again reminded of the scene in the TV show Westworld where Dolores creates entertainment by issuing voice commands. It was supposed to be science fiction, but it has become reality. Combine voice recognition models like OpenAI’s Whisper with all sorts of diffusion models for image and video generation, add Gaussian Splatting, and you end up with exactly what she was doing.

Two years ago I wrote about NeRFs, but Gaussian Splatting has since overtaken them. I plan to revisit the technology, because now, unlike with the old NeRFs, we can extract actual 3D information and create 3D models from the scenes, which makes them far more useful.

We are in for an interesting decade.

Barn Lab
