We use a text-to-image model called Stable Diffusion for the majority of operations. Currently we run Stable Diffusion version 2.1, which has a native output resolution of 512 x 512.
Artsmart’s text-to-image generation is built on diffusion neural networks.
A neural network uses interconnected processing units called "neurons" to analyze data and make decisions based on that data.
It’s called a “neural network” because it’s inspired by the way the human brain uses a network of neurons to think. It can be used for tasks such as image and speech recognition, language translation, and making predictions.
Neural networks are particularly good at recognizing patterns and making decisions based on those patterns.
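To make "neuron" concrete, here is a minimal sketch of a single artificial neuron in NumPy. The weights and bias are made-up values chosen purely for illustration; they make the neuron "fire" only when both inputs are high, a simple example of recognizing a pattern.

```python
import numpy as np

# A single "neuron": a weighted sum of its inputs passed through
# a sigmoid activation, producing a decision between 0 and 1.
def neuron(inputs, weights, bias):
    return 1.0 / (1.0 + np.exp(-(np.dot(inputs, weights) + bias)))

# Hypothetical weights that make the neuron respond only when
# both inputs are present (an AND-like pattern).
w = np.array([10.0, 10.0])
b = -15.0

print(round(neuron(np.array([1.0, 1.0]), w, b), 2))  # fires: ~0.99
print(round(neuron(np.array([0.0, 1.0]), w, b), 2))  # stays quiet: ~0.01
```

A real network chains millions of these neurons together and learns the weights from data instead of having them hand-picked.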
To avoid your eyes glazing over, we’ll give a rough overview. This example is heavily borrowed from a Reddit post by PhyrexianSpaghetti.
We take a picture of a thing, let’s say a dog, and we tell the neural network: “Hey computer, please gradually turn this picture into noise and memorize every step while doing it 504 times.”
Now we take a picture of actual random noise and we tell the computer: “Hey computer, please play the ‘dog to noise’ algorithm, but reversed.”
Wow, it’s not the same dog!
Now we teach the computer about the color black, and we ask: “Hey computer, please play the ‘color black to noise’ algorithm reversed, and the ‘dog to noise’ algorithm reversed, at the same time.”
☝And just like that we have a Black Dog!
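The steps above can be sketched as a toy version of the forward and reverse diffusion process. This is NOT the real Stable Diffusion algorithm, just an illustration of the "memorize the noise, then play it backwards" idea: the image sizes, step count, and noise scale are all arbitrary, and a real model *predicts* the noise with a neural network rather than memorizing it.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "picture of a dog": an 8x8 grayscale image.
dog = rng.random((8, 8))

# Forward process: gradually turn the picture into noise,
# memorizing the noise added at every step.
steps = 10
memorized_noise = []
x = dog.copy()
for _ in range(steps):
    eps = rng.normal(scale=0.1, size=x.shape)
    memorized_noise.append(eps)
    x = x + eps  # the image drifts toward pure noise

# Reverse process: play the "dog to noise" recording backwards,
# removing the memorized noise one step at a time.
for eps in reversed(memorized_noise):
    x = x - eps

# In this toy we recover the exact original picture.
print(np.allclose(x, dog))  # True
```

Because a real model only learns to predict the noise rather than memorize it, running the reverse process from fresh random noise produces a new dog, not the same dog, and conditioning on a text prompt (like "black") steers which dog comes out.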
If you want to deep dive a bit further, here’s a great video (starting at 5:50) about how these types of technology work.