The implications of Stable Diffusion and similar technologies
For the past few days I've been experimenting with Stable Diffusion. It took some time to set things up, but now I can generate an endless supply of nice images locally from a text prompt.
The first thing that surprised me was that the model is (only) 4.5GB. It feels like magic to me that I can generate unlimited images from a model of that size.
On my limited hardware (a 6GB video card) it takes about 1-2 minutes to generate an image. I wrote a quick script that reads a list of prompts from a text file, so I can leave it running and come back later to a nice surprise. (This is addictive!)
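A minimal sketch of what such a batch script could look like. This is not the actual script: the file layout (one prompt per line, `#` for comments), the function names, and the `generate` callable are all assumptions made for illustration. `generate` stands in for whatever Stable Diffusion front end is installed locally and is expected to write an image to the path it receives.

```python
from pathlib import Path
from typing import Callable, List


def load_prompts(path: str) -> List[str]:
    """Return one prompt per non-empty line, skipping lines that start with '#'."""
    prompts = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            prompts.append(line)
    return prompts


def run_batch(
    prompts: List[str],
    generate: Callable[[str, Path], None],  # hypothetical: wraps the real image generator
    out_dir: str,
) -> List[Path]:
    """Call generate(prompt, target_path) once per prompt and return the target paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, prompt in enumerate(prompts):
        target = out / f"{index:04d}.png"
        generate(prompt, target)
        written.append(target)
    return written
```

Keeping the generator behind a plain callable means the loop can be tried out with a dummy function first, before committing a couple of minutes of GPU time per prompt.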
What happens when everyone can generate cool images?
I think we are still not aware of the consequences of this technology. In the short to medium term, I can see these tools becoming available to more non-technical folks and to people with cheap hardware.
Similar to what happened with mobile cameras, these new capabilities will enable creative expression in ways that are difficult to grasp right now.
I think we are going to see an explosion in multimedia creations that were simply not possible before.
- More writing will be accompanied by related images that enrich the reader's experience.
- Users will be able to generate videos (and movies?) based on descriptions.
- Some people may get very good at this, and new markets will emerge.
- On-the-fly generation of images for "choose your own adventure" type games.
- Forks will also be possible, so a movie could branch and take the story to very different places.
- What else?
A new field emerges: Prompt design
When experimenting with Stable Diffusion, I never felt like an artist myself. But I can see how, with some practice, I can get better at communicating with the tool to achieve better results.
It's impossible for the tool to know what I have in mind, and when my prompt is poor, the result is not something I particularly enjoy. However, with some refinement I can give more hints and indirectly guide my robotic assistant.
I can see how some people can get very skilled at this. It feels like there's some intuition that can be developed and that will differentiate the outputs produced by different people.
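One way to practice that kind of refinement is to try the same subject with different hints and compare the results. The sketch below only builds the prompt strings. The helper name, the comma-separated prompt layout, and the example modifiers are assumptions for illustration, not a format the tool requires.

```python
from itertools import product
from typing import List


def prompt_variations(subject: str, styles: List[str], details: List[str]) -> List[str]:
    """Combine a subject with every (style, detail) pair, skipping empty parts."""
    return [
        ", ".join(part for part in (subject, style, detail) if part)
        for style, detail in product(styles, details)
    ]


# Hypothetical modifiers, chosen only to show the shape of the output.
variations = prompt_variations(
    "a lighthouse on a cliff",
    ["oil painting", "watercolor"],
    ["", "golden hour"],
)
```

Each string in `variations` can be written as one line of a prompt file and run as a batch, which makes it easier to see which hint actually changed the image.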
Tools like Stable Diffusion make me appreciate artists more
In the end, Stable Diffusion can only create from patterns it has seen before, following the styles it was trained on.
A common recommendation for getting good-quality results is to specify the name of an artist, so the tool can use characteristics of that style in the creation.
In my quest for high-quality results I've searched for artist names and styles and have discovered some really talented people. I can generate images that contain elements of their style, but I've never felt like I have their talent or thought of myself as an artist. My appreciation of artists has increased, not decreased, as a result of my experiments with Stable Diffusion.
How can it impact artists?
I've seen a lot of criticism coming from the artist community. For some, generating an image in someone else's style is theft. I think this is an area that needs more thought. People and companies who need placeholder images and don't really care about authenticity may now use a tool instead of buying images from a stock site.
But for more "important" or meaningful work, artists will still be in demand.
An analogy with mobile cameras and professional photographers comes to mind: I can take thousands of pictures of stuff I like, but if I need high-quality shots I would pay a professional.
As a tool to assist artists
Instead of fighting against the technology, I can see how some artists will embrace it and start experimenting with it:
- Looking for references.
- Using the image-to-image feature to extend their own works.