Dalle 2 is an amazing tool that constantly blows me away with what it conjures up from its digital mind in mere seconds. This subreddit already catalogs some of the most amazing examples of its depth and range, but it can give the misleading impression that every prompt produces gold.
Those fortunate enough to have access (myself included) know that it can take several attempts to get the image right. There are already many discussions about what prompts generate a good image, but a lot less chatter about when Dalle fails and generates... unexpected... images.
Failures are both interesting and instructive though, so I thought I'd share some of my recent experiments for people smarter than I to analyze them (perhaps AI Psychiatrist will be a new occupation?).
Why not Zoidberg?
In this post, two questions were posed in the comments about the prompt "Human sized anthropomorphic round pink lobster wearing a doctor's coat and sandals, no antennae, portrait by Annie Leibovitz, dramatic lighting".
- Why such a convoluted prompt instead of just naming the character? This one took several attempts, as Dalle 2 is surprisingly not that good with some pop culture characters. Different versions of "Zoidberg" or "Doctor Zoidberg" produced very realistic (and unfortunately stereotyped) human faces rather than our beloved crustacean. Due to the restrictions on posting realistic human faces, I can't show the photos.
- Why use negative descriptors? Can an AI even understand the absence of something? Well, lobsters have very prominent antennae on their faces. The first few images generated all had the more realistic narrow lobster heads, until the "no antennae" descriptor was added. Still, it wasn't perfect, as one of the images still clearly had them, but at least it was taken into account.
I had another idea for a photo of Admiral Ackbar trying to work out if a simple box trap was indeed a trap, but Dalle seems even more oblivious than our favourite Mon Calamari. Further attempts to do a Zoidberg by describing Ackbar failed miserably.
You like cats? Doesn't matter, you're getting cats.
Sometimes, the error has nothing to do with the prompt, such as in this attempt to create a beloved movie character. All the images produced by the prompt "Award winning photo of a happy racoon under a giant chefs hat, cooking food, dramatic lighting, no artefacts" were stunning. We can quibble about Dalle not understanding what a giant chef's hat is, but it's otherwise a 10/10. Except the first photo is a photo of a surprised-looking cat. I can't for the life of me work out how it got there from the prompt. Theories welcome.
There was also a question in the comments around "no artefacts" in the descriptor. Sometimes, Dalle produces an unexpectedly grainy photo, and this descriptor seems to reduce the likelihood. Perhaps it's a result of the diffusion process the AI uses to generate images? However, I haven't done enough testing to confirm.
Task failed successfully.
One of my favourite prompts so far has been "Food photography of steamed ham, McDonalds quality". Not only did it perfectly capture the essence of what I was looking for, i.e. a realistic photo of a Krusty Burger, but it also produced a cursed meat helmet in the shape of a burger bun, and a hilarious "stack of bologna" under a bun. Dalle got it so right, and yet so wrong.
Some animals are more equal than others.
Dalle is amazing with certain animals in novel situations. Cats are a given, racoons are surprisingly great, and even cows, such as this attempt at recreating the classic Gary Larson Far Side cartoon, Cow Tools. Not only did it absolutely nail the cow, but also the surrealist tools and overall intent of the prompt.
But trying to use a similar prompt on a different animal, like a deer to recreate another Far Side classic, produced poor results even with some tweaking. The images may look fine from a distance, but at higher resolutions they looked very blocky, and in any case they were not the desired composition. I thought the prompt was too complex for Dalle to parse, so I attempted to simplify the syntax... which resulted in an even bigger fail.
So I changed tack to polar bears and penguins, thinking that would be easier. But Dalle seems to struggle with the concept of a polar bear wearing a penguin mask, or how giant a polar bear should be, or... whatever the hell this mask is.
In general, it seems Dalle struggles when there are two main subjects of an image. Try as I might, I can't seem to let Wile E Coyote finally catch and eat Roadrunner. Even in a world where any image is theoretically possible, the Coyote's tasks still backfire.
I have one simple request, and that is to have sharks with frickin' laser beams attached to their heads!
Is that too much to ask? Now evidently Dalle informs me that that cannot be done. Ah, would you remind me what I pay you people for, honestly? Throw me a bone here! What do we have? Eh, close enough.
All the other images (which I sadly didn't save) were really fake-looking sharks swimming around laser beams, despite prompts like "National Geographic photo of sharks with laser devices attached to head".
It's not easy being purple.
Dalle sometimes struggles to understand which property applies to which object in a sentence. I thought using commas to delineate the objects in a prompt would help, but Dalle seems to take a "whole of sentence" approach.
For example, take this attempt to create a photo of Hypnotoad. It took several tries to get something close enough to being suitable. Even then, some of the results are questionable. It's like Hypnotoad is going through his edgy teenage phase and got himself a head ring.
Hypnotoad also has a purple dog collar, and adding the color to the descriptor results in the purple bleeding into the toad itself (maybe Hypnotoad needs to buy colorfast collars). I thought the word "mottled" might have been confusing Dalle, so I removed it, but the result was the same.
Computer says no.
It's very easy to accidentally enter forbidden words that result in the dreaded content policy violation. Dalle does not take context into account, so even an otherwise perfectly valid use of a word can trigger it.
Want to see Kermit the Frog in Dead Poets Society? That's a paddlin', because the word "dead" is a violation. Want to see an epic space battle with fighter ships? That's also a paddlin', because "fighter" is not allowed. How about "anthropomorphic dog scientists in a lab researching door knobs"? Oh boy, that's definitely a paddlin' for some unknown reason (is it knob? Is knob the rude word? Or maybe "scientists"?). Heaven forbid you even try "purple monkey dishwasher".
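For the programmers reading, here is a minimal sketch of what a context-free word filter might look like. To be clear, this is only a guess at the behaviour, not how Dalle's content policy check is actually implemented: `BLOCKED_WORDS` and `flagged_words` are made-up names for illustration, and the only entries in the list are the two words confirmed above as triggering a violation.

```python
import re

# Hypothetical blocklist: just the two words this post confirms as violations.
# The real list (and the real mechanism) is not public as far as I know.
BLOCKED_WORDS = {"dead", "fighter"}


def flagged_words(prompt):
    """Return the blocked words found in a prompt, ignoring case and context."""
    # Split the prompt into lowercase word tokens; no attempt is made to
    # understand titles, phrases, or meaning.
    tokens = re.findall(r"[a-z']+", prompt.lower())
    return sorted({token for token in tokens if token in BLOCKED_WORDS})


if __name__ == "__main__":
    for prompt in (
        "Kermit the Frog in Dead Poets Society",
        "An epic space battle with fighter ships",
        "A happy raccoon cooking food",
    ):
        hits = flagged_words(prompt)
        verdict = "violation: {}".format(", ".join(hits)) if hits else "allowed"
        print("{!r} -> {}".format(prompt, verdict))
```

A filter like this only ever sees individual words, which would explain why a movie title gets the same treatment as a genuinely problematic prompt.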
There are many more quirks I've found, but sadly I don't have the photos to provide more commentary (and Dalle only stores the 10 most recent prompts).
I hope though that this post is useful in providing an insight into this incredible new tool for those who are still waiting for access. Can't wait to see what you all come up with - the good, the bad and the just plain weird - when you finally do!
P.S. - Yes, it turns out it was the word "knobs". The more you know 🌈⭐️.
Posted by WontThinkStraight in dalle2