
My fail at making a Goat gif with Animation Diffusion Evo

submitted by symbolic to AI_art 4 months ago (Jan 9, 2024 09:24:58) (+12/-0) (files.catbox.moe)

https://files.catbox.moe/d1p9od.gif

The issue is that I didn't make the workflow, and I have a limited understanding of the material right now. I was running into an error saying mat1 and mat2 cannot be multiplied; this happens at the point where the two input streams that Stable Diffusion reinterprets get merged. The issue was solved by removing the connections to the CNET (ControlNet) nodes. This leads me to think that my CNET models are for SD 1.5 and not SDXL, which is (I think) what this workflow is set up for. I'll try again later today and post the results.
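For reference, that error is the classic symptom of a model-generation mismatch: an SD 1.5 ControlNet emits conditioning at a different width than an SDXL pipeline expects, so a matrix multiply inside the cross-attention layers fails. A minimal PyTorch sketch of the failure; the 768 and 2048 widths are the SD 1.5 and SDXL text-conditioning sizes, and the other shapes are made up for the demo:

    import torch

    # SD 1.5 layers expect 768-wide text conditioning; SDXL produces 2048-wide.
    # Feeding one family's tensors into the other breaks the matmul.
    sdxl_context = torch.randn(77, 2048)  # tokens x hidden (SDXL text context)
    sd15_weight = torch.randn(768, 320)   # projection weight from an SD 1.5 layer

    torch.mm(sdxl_context, sd15_weight)
    # RuntimeError: mat1 and mat2 shapes cannot be multiplied (77x2048 and 768x320)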


17 comments


[ - ] Anus_Expander 2 points 4 months ago (Jan 9, 2024 12:06:52) (+2/-0)

I like it, fits perfectly with the typical Voat user schizophrenia

[ - ] symbolic [op] 1 point 4 months ago (Jan 9, 2024 12:37:48) (+1/-0)

AHAHHHHHHH! Oh my god - VOAT! I found your calling! Node-based editors and synth patch banks!

[ - ] x0x7 2 points 4 months ago (Jan 9, 2024 11:28:44) (+2/-0)

I think people are taking the wrong approach to AI video. You can see that in every frame the images are very distinct. Everyone wants to adapt 2D AI to video. But I think the right way to do it is to perform diffusion on a 3D tensor.

Of course then resolution becomes an issue. Maybe the answer is Gaussian splatting, considering that's already working effectively in 3D. That, or maybe you could use techniques like in upscaled MiDaS, where you zoom into different levels and reprocess.

But if we did diffusion in 3D to make video, you could get some pretty trippy videos where backwards and forwards are symmetric, with backwards stuff and forwards stuff in the same frame. Or you could have symmetries in the x and y dimensions with time, and that might be cool.

Hopefully it can learn that things like to have an arching shape in time (parabolic motion, feet stepping off the ground and back on), but that the spatial dimensions aren't quite as archy.
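Roughly, the idea as a toy sketch: treat a clip as a single (frames, height, width) volume and denoise it with 3D convolutions, so time is just another axis the kernel mixes over. A real model would need attention, timestep conditioning, and an actual diffusion schedule; this only shows the tensor shapes involved:

    import torch
    import torch.nn as nn

    # Toy 3D denoiser: a Conv3d kernel mixes information across frames
    # as well as across pixels, so time is just a third spatial axis.
    class Denoiser3D(nn.Module):
        def __init__(self, ch=3, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(ch, hidden, kernel_size=3, padding=1),
                nn.SiLU(),
                nn.Conv3d(hidden, ch, kernel_size=3, padding=1),
            )

        def forward(self, x):  # x: (batch, ch, frames, H, W)
            return self.net(x)

    video = torch.randn(1, 3, 16, 64, 64)          # a 16-frame clip as one tensor
    noisy = video + 0.5 * torch.randn_like(video)  # noise the whole volume
    print(Denoiser3D()(noisy).shape)               # torch.Size([1, 3, 16, 64, 64])

And since the kernel is symmetric in time, forwards and backwards get treated identically, which is where the time-symmetry would come from.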

What software was that graph UI coded in?

[ - ] symbolic [op] 0 points 4 months ago (Jan 9, 2024 12:36:37) (+0/-0)

I ran some experiments a month or two ago using panoramic video output by Maya and then reinterpreting that footage with Stable Diffusion, and that gave some interesting results. Using "panoramic" as a prompt will kick out 360 video stills, and injecting that with 360 metadata does make a VR video; however, there were issues with the "position" of the "camera" that I could not figure out.

Temporal diffusion models are attempting to address the continuity issue that comes with VAE interpretation of noise. At least I THINK I know what I am talking about. Loopback gave good results, as it blends the prior frame with the current output, but that leads to ye ol' Hindu gods (5-armed lobster woman). I think if one wants to solve continuity right now, then generating images of famous people (for web comics, for example) should give the model something to work from, so that every frame has your subject.

I know very little Python (I gather it's mostly script kiddies relying on libraries to do their work, nothing wrong with that though), and thus my understanding of the TensorFlow side is limited. I am interested in implementing my own seq2seq as a novel approach to LLM and GPT design, so I guess I've got a lot to read. Any suggestions on material to consume?
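For anyone wanting to try loopback, the core of it is just a blend before each img2img step. A sketch, where img2img stands in for whatever backend you use, alpha is the blend weight, and frames are assumed to be arrays supporting arithmetic:

    # Loopback: each frame's init is a blend of the previous output and
    # the new source frame. Higher alpha = smoother motion, but more
    # "5-armed lobster woman" ghosting as outputs feed back on themselves.
    def loopback(frames, img2img, alpha=0.4):
        out, prev = [], None
        for src in frames:
            init = src if prev is None else alpha * prev + (1 - alpha) * src
            prev = img2img(init)  # any img2img backend works here
            out.append(prev)
        return out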

[ - ] x0x7 0 points 4 months ago (Jan 9, 2024 15:40:21) (+0/-0)*

Maybe there could be something useful where we do sub-frame interpolation that is a little more creative than direct interpolation.

Give frame 0; give frame 100; give frame 50 from frames 0 and 100 with temperature 10;
give frame 25 from 0 and 50 with temperature 5;
give frame 75 from 50 and 100 with temperature 5;
give frame 12 from 0 and 25 with temperature 2.5;

IDK.
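That scheme is basically recursive binary subdivision with a temperature proportional to the gap size. A sketch, where interpolate() is a hypothetical creative-interpolation call and frames is a dict with only the endpoints filled in:

    # Fill the midpoint of each gap with temperature scaled to the gap,
    # then recurse into both halves: frame 50 at temp 10, frames 25/75
    # at temp 5, frame 12 at temp 2.5, and so on.
    def fill(frames, lo, hi, interpolate):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        temp = (hi - lo) / 10  # gap 100 -> temp 10, gap 50 -> temp 5, ...
        frames[mid] = interpolate(frames[lo], frames[hi], temperature=temp)
        fill(frames, lo, mid, interpolate)
        fill(frames, mid, hi, interpolate)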

I want to see that 5-armed woman. Maybe in seeking perfection we lose the best art. Next we'll have hyper-realist models and people will be banging their heads trying to get them to generate multi-armed people, or abstract art more generally.

[ - ] x0x7 0 points 4 months ago (Jan 9, 2024 15:57:23) (+0/-0)

Separate reply because I realized I completely missed part of what you wrote.

Interesting that you want to do some seq2seq. I'd love to hear about it. I've thought about that stuff a bit. Where I'm racking my brain is that I understand how transformers work; I've even written some code to do it directly, but haven't had a chance to test it yet. Where I'm at a loss is the consumer side of transformers. Yeah, it seems like a great system for understanding what someone wrote, but how do you generate text from that? If you have ideas about that end of things, that could be quite useful.
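For reference, the standard answer on the generation side is autoregressive decoding: run the model on everything produced so far, turn the last position's logits into a distribution, sample one token, append it, and repeat. A sketch assuming a causal model that maps (batch, seq) token ids to (batch, seq, vocab) logits:

    import torch

    def generate(model, tokens, steps=50, temperature=1.0, eos_id=None):
        # tokens: 1D LongTensor of prompt token ids
        for _ in range(steps):
            logits = model(tokens.unsqueeze(0))[0, -1]  # logits for the next token
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, 1)       # sample instead of argmax
            tokens = torch.cat([tokens, next_id])
            if eos_id is not None and next_id.item() == eos_id:
                break
        return tokens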

For my own projects I'm likely to just modify llama so I guess I don't need to know that. But still your ideas might be useful if I ever wanted to start from scratch.

Let me know if you need to understand transformers. I'll give you a write up and link you to a few things.

[ - ] purityspiral 0 points 4 months ago (Jan 10, 2024 01:51:21) (+0/-0)

In Deforum I have a value I can slide around to determine how much of each original frame is kept, which keeps frames indistinct; there's also a cadence setting that makes for more fluid movement. I use 3D interpolation, which sounds similar, but I don't know all the terms so well there. Pretty sure he is using Comfy.

[ - ] Niggly_Puff 2 points 4 months ago (Jan 9, 2024 11:24:50) (+2/-0)

My first test with animation

https://files.catbox.moe/y61xqr.webp

[ - ] symbolic [op] 0 points 4 months ago (Jan 9, 2024 12:39:54) (+0/-0)

That was very well done. Did you use Gradio or ComfyUI? Did you mask the still yourself and use the upload mask, or did you use CNET and have it do it for you?

[ - ] Niggly_Puff 1 point 4 months ago (Jan 9, 2024 13:15:09) (+1/-0)

I used Automatic1111 for the image and ComfyUI for the animation.

[ - ] symbolic [op] 0 points 4 months ago (Jan 9, 2024 13:41:22) (+0/-0)

Nice, what was your custom node for the animation diffusor, and what SD model did you use? My apologies for not checking your image for metadata; I'm busy shitposting.

[ - ] Niggly_Puff 1 point 4 months ago (Jan 9, 2024 13:45:30) (+1/-0)

These 2 examples work very well. Download the workflows.

https://comfyanonymous.github.io/ComfyUI_examples/video/

One is for txt-to-animation and another for img-to-animation. About 8 nodes or so; no idea what you had going on there lol

[ - ] symbolic [op] 0 points 4 months ago (Jan 9, 2024 13:53:47) (+0/-0)

Very cool, I will check those out after my work block. https://files.catbox.moe/n5c2hm.png: the metadata has the workflow that I was using. It's hyper overkill, hence the spaghetti. However, it taught me a lot about what is going on.

[ - ] x0x7 0 points 4 months ago (Jan 9, 2024 13:01:05) (+0/-0)

Very cool. I want to make that shit.

[ - ] Niggly_Puff 0 points 4 months ago (Jan 9, 2024 13:14:38) (+0/-0)

It's surprisingly easy. You can do most things with a 3060; I haven't run into any limitations except running large LLMs.

[ - ] purityspiral 0 points 4 months ago (Jan 10, 2024 01:48:00) (+0/-0)

I can't live with those workflows. I get it, I just can't dive into them like prompting in Automatic1111. I've spent the most time in that one.

[ - ] The_Reunto 0 points 4 months ago (Jan 9, 2024 14:01:36) (+0/-0)

This reminds me of web pages you used to see on the internet.